Issue Number | 274 |
---|---|
Summary | EBMS Statistics - Password/Login Problems |
Created | 2015-02-05 09:08:15 |
Issue Type | Task |
Submitted By | Juthe, Robin (NIH/NCI) [E] |
Assigned To | alan |
Status | Closed |
Resolved | 2015-03-25 00:29:08 |
Resolution | Fixed |
Path | /home/bkline/backups/jira/oceebms/issue.146486 |
We are trying to compile some data to illustrate the login problems our Board members are facing. I'm wondering if you have access to any of the following information:
-A monthly total number of unique Board members who logged into the EBMS from May 2013 through February 2015
---Might be worth stratifying this by NCI and non-NCI Board members if this is possible
-A monthly total number of active EBMS accounts for the same time period (so we can calculate the number of unique Board members logging in as a percentage of the total)
-A monthly total of unique Board members who had failed login attempts
---Do we know what subset of these resulted in locked accounts?
Combing through the database and the software I have not found any stored data that would allow us to generate the information we need for the reports from activities in the past. In light of that, here is my thinking about the issue so far:
To create the reports going forward we would have to modify the software and install the modifications in production. Two types of modification are required: changes to the login software to record more information (the most important part), and new reports to use it. The new report development could be deferred for a while, since the reports won't tell us anything until the data gathering has been in place for some time.
It would also be necessary to add one new table to the database. I'm thinking we would want a table with one row for every login attempt. At a minimum, the table might contain:
Userid (always available for a successful login, sometimes but not always determinable from an unsuccessful one)
Datetime of attempt
Success or failure
This information might also be needed:
Affiliation - If people change their affiliation (NCI, non-NCI) we might want to record what it was at the time of the attempt. If not, we can get it from existing data.
Login method - If people change their login method (edir, drupal, or sso) independently of changing their affiliation (don't know why we'd do that but maybe we do), we'd want to know that. If not we can get this from existing data too.
Information about failed SSO user logins ("NIH Login") may be impossible to get. We send the user to an NIH login page, but if the login fails I don't know that we can find out who the user was, how many times he or she tried to log in, or whether the account was locked.
Once such a table was defined, we could write all of the requested reports for users with edir or drupal logins (though we probably don't care about drupal logins). For sso users we could get some information, but probably only for successful logins, not failures.
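The table described above could be sketched as follows. This is only an illustration, not the EBMS schema: the table name `login_attempt`, its column names, and the `record_attempt` helper are all hypothetical, and SQLite stands in for whatever database the EBMS actually uses.

```python
import sqlite3
from datetime import datetime

# Hypothetical schema: one row per login attempt.
# user_id is nullable because a failed attempt cannot always be tied to an account.
SCHEMA = """
CREATE TABLE login_attempt (
    attempt_id   INTEGER PRIMARY KEY,
    user_id      INTEGER,
    attempted_at TEXT NOT NULL,
    succeeded    INTEGER NOT NULL CHECK (succeeded IN (0, 1)),
    login_method TEXT
)
"""


def record_attempt(conn, user_id, when, succeeded, login_method=None):
    """Store one attempt; login_method would be 'edir', 'drupal', or 'sso'."""
    conn.execute(
        "INSERT INTO login_attempt (user_id, attempted_at, succeeded, login_method) "
        "VALUES (?, ?, ?, ?)",
        (user_id, when.isoformat(sep=" "), int(succeeded), login_method),
    )


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
record_attempt(conn, 42, datetime(2015, 2, 12, 15, 10, 47), True, "edir")
record_attempt(conn, None, datetime(2015, 2, 12, 15, 12, 3), False, "edir")

# Monthly count of unique users with at least one successful login.
monthly_unique = conn.execute(
    "SELECT substr(attempted_at, 1, 7) AS month, COUNT(DISTINCT user_id) "
    "FROM login_attempt WHERE succeeded = 1 GROUP BY month"
).fetchall()
print(monthly_unique)
```

With a table of this shape, each of the requested monthly totals reduces to a grouped count over the rows.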
The main question that comes to my mind about this is, Should we go ahead with it given that we cannot get retrospective information and might not be able to put anything in place to gather prospective information for at least a few weeks, with no useful information derived for at least a month or two after that?
If the answer is yes, I'd need to go over the design of the existing login logic with Bob, prepare a design for the modifications, and implement it. The actual logic doesn't look as hard (for me) as figuring out the key places to insert modifications and figuring out what information edir and sso can provide us. Bob has worked with this stuff before and may be able to give me a lot of that info much faster than I could figure it out.
Thanks for looking into this, Alan. I think we should proceed with capturing this information. It's okay if we can't gather information about the SSO users - they aren't the ones having problems. I had mentioned them above only because I wanted to make sure we looked at non-NIH users separately. I think the table should include login method but I don't think we need to include the affiliation.
Robin,
Here's what I'm working on. Please let me know if this meets the
requirements or whether I've missed something.
I'll present an input form with the following fields to fill in:
Start date: (A calendar widget)
End date: (Another calendar widget)
Details checkbox (see below)
The dates will be adjusted to the actual minimum and maximum dates
for which we have data. So if you want data beginning Jan 1 but the
earliest data we have is Feb 15, we'll start with Feb 15 and let you
know that.
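The date adjustment described above might look something like this. It is a minimal sketch: the function name `clamp_date_range` and the wording of the notices are made up for illustration, and the year in the example is assumed.

```python
from datetime import date


def clamp_date_range(requested_start, requested_end, earliest, latest):
    """Narrow the requested range to the dates for which data exists.

    Returns (start, end, notes); notes lists any adjustment made so the
    report can tell the user about it.
    """
    start = max(requested_start, earliest)
    end = min(requested_end, latest)
    if start > end:
        raise ValueError("no data available in the requested range")
    notes = []
    if start != requested_start:
        notes.append(f"Start date adjusted to {start.isoformat()} (earliest data available)")
    if end != requested_end:
        notes.append(f"End date adjusted to {end.isoformat()} (latest data available)")
    return start, end, notes


# Requesting data from Jan 1 when the earliest data is from Feb 15.
start, end, notes = clamp_date_range(
    date(2015, 1, 1), date(2015, 3, 31), date(2015, 2, 15), date(2015, 3, 20)
)
print(start, end)
for note in notes:
    print(note)
```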
The report will do the following:
Search the log files for the data.
Present the following results:
For each month:
For board members using edir login:
Total board members currently in that category
(this may not be accurate for the dates chosen)
(currently there are 118 board members in edir)
Total login attempts
Total successful logins
Total failed
Percent successful
For board members using SSO:
As much of the same information as I can get. See notes.
If we don't want the SSO users, I'll leave this out
entirely.
Notes:
I see log messages showing failed logins for a user (a CBIIT
sysadmin) with an SSO login method. But I don't know that
the messages came from an SSO attempt. I spoke to him and I
know that he tried both methods, edir and SSO (I wouldn't be
surprised if a lot of board members do this too.) It may be
that the edir failures were the only ones logged. If that's
true, the failure rate is probably worse than the numbers
will indicate.
I don't know if I can get the number who were blocked. I
see the place where the block occurs in the core drupal
code. It's a place we wouldn't want to modify unless it were
critical to do so. I may still be able to trap and log the
failure outside the core module. I'll talk to Bob about
that, but I will keep working on what I know I can do
without that.
If the "Details" checkbox is checked:
Produce a list of login names:
For each member name show:
Number of successful login attempts over the entire period.
Number of failures over the entire period (maybe edir only).
Notes:
We have a record of the name used in edir login attempts.
These names might not be 100% accurate. For example, if
"John Doe" fails in an attempt to login, he might think he's
got the name field wrong and try other patterns, for
example:
John Doe
John Q Doe
Doe, John
etc.
So you might see a report that says:
John Doe - successful 2, failed 2
John Q Doe - failed 1
Doe, John - failed 1
The detailed report might show you a lot about the problems.
However if you don't need it, let me know and I won't bother
with it.
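To make the two parts of the report concrete, here is a rough sketch of the monthly totals and the per-name details. The record layout (timestamp, name as typed, success flag) and both function names are assumptions for illustration; the sample data is invented to reproduce the "John Doe" example above.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical parsed log records: (timestamp, name as typed, succeeded).
attempts = [
    (datetime(2015, 2, 16, 9, 0), "John Doe", False),
    (datetime(2015, 2, 16, 9, 1), "John Q Doe", False),
    (datetime(2015, 2, 16, 9, 2), "Doe, John", False),
    (datetime(2015, 2, 16, 9, 3), "John Doe", False),
    (datetime(2015, 2, 16, 9, 5), "John Doe", True),
    (datetime(2015, 3, 2, 14, 0), "John Doe", True),
]


def monthly_summary(attempts):
    """Total, successful, and failed attempts per month, plus percent successful."""
    totals = defaultdict(lambda: {"attempts": 0, "successful": 0, "failed": 0})
    for when, _name, ok in attempts:
        row = totals[when.strftime("%Y-%m")]
        row["attempts"] += 1
        row["successful" if ok else "failed"] += 1
    summary = {}
    for month, row in sorted(totals.items()):
        percent = round(100.0 * row["successful"] / row["attempts"], 1)
        summary[month] = dict(row, percent_successful=percent)
    return summary


def per_name_details(attempts):
    """Successes and failures for each name exactly as it was typed."""
    details = defaultdict(Counter)
    for _when, name, ok in attempts:
        details[name]["successful" if ok else "failed"] += 1
    return {name: dict(counts) for name, counts in details.items()}


for month, row in monthly_summary(attempts).items():
    print(month, row)
for name, counts in per_name_details(attempts).items():
    print(name, counts)
```

Because names are tallied as typed, one person can appear under several spellings, which is the effect described in the notes above.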
I reported earlier that I would have to insert new logging code into the EBMS. However, after reading more code, I found that login successes and failures are already logged - but in a temporary revolving log that only keeps a limited number of entries.
I got CBIIT to remove the limit on the log message count so these messages can be used for the future. I had them bump up the number so that about 2 years worth will be saved.
The earliest datetime in the production log file is now 2015-02-12 15:10:47. Our report should be able to generate data from then on.
Hi Alan,
I think the report you've proposed sounds good. I have just a few comments.
1. I think we can scrap the SSO users from the report. Since that information is harder (and perhaps some of it is impossible) to obtain, I don't think we should worry about it unless we learn that the SSO users are having difficulty logging in.
2. I think the detailed report would be very useful. It would be nice to pinpoint who is repeatedly having problems so that we can help them. Seeing the different usernames they attempt to log in with could also help to determine if username or password issues predominate.
3. If we're able to get the names and number of blocked users, that would also be helpful. I know you said that's something you might have to talk to Bob about. It would also be nice to know precisely how many attempts (from a single IP address or with a single username) result in a blocked account AND how long the blocked period is (I think it's somewhere around 6 hrs maybe?). We often get that question from Board members.
Thanks!
Robin
I will drop the SSO users from the design, keep the detailed list of failures, and investigate the blocked users.
According to the documentation I found on the Drupal site (all from users making guesses) the controls are:
Maximum of 5 consecutive failures by one user name before blocking.
Maximum of 50 consecutive failures from one IP address.
Outage time = 6 hours.
However, reading the code, I found a place where it's set to 5 failures but only 1 hour.
These can be overridden, but it looks to me like we'd have to add another user contributed user module or else write our own override code. It doesn't look hard but, unfortunately, it doesn't come out of the box, so we'd have to write the code and get CBIIT to install it.
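For discussion, the effect of a failure limit and a blocking window can be sketched as below. This is not Drupal's code: it assumes a simple rule that counts failures inside a time window, the function name `is_blocked` is hypothetical, and the limit and window are parameters precisely because the values above (6 hours versus 1 hour) are still uncertain.

```python
from datetime import datetime, timedelta


def is_blocked(failure_times, now, limit=5, window=timedelta(hours=6)):
    """Return True if at least `limit` failures fall within `window` before `now`."""
    recent = [t for t in failure_times if now - window < t <= now]
    return len(recent) >= limit


# Five failed attempts, one minute apart, starting at 09:00.
base = datetime(2015, 2, 16, 9, 0)
failures = [base + timedelta(minutes=i) for i in range(5)]
two_hours_later = base + timedelta(hours=2)

# With a 6-hour window the user is still blocked two hours later;
# with a 1-hour window the failures have already aged out.
print(is_blocked(failures, two_hours_later, window=timedelta(hours=6)))
print(is_blocked(failures, two_hours_later, window=timedelta(hours=1)))
```

The difference between the two windows is what determines how long a Board member has to wait before trying again.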
Thanks for investigating this, Alan. I think we should discuss overriding that code (if CBIIT will allow it). I know it's extremely frustrating to Board members to get blocked - that's typically the point at which they contact us and we have to tell them that they won't be able to get back in for at least 6 hrs... As this is a separate issue from this request for statistics, I will put in a new ticket.
I think the report is working on DEV.
I will attach a text document that describes how it works and
explains the meanings of the rows and columns in the various
tables. If we ever get a real user manual for board managers and
others who aren't members, the material in that document should
be included in the manual. Until then, we've got this.
Testing is DIFFICULT. A board manager can test to see if the
report runs without crashing and produces plausible looking
numbers, but I don't know how an EBMS user can test to find out
if the report is accurate. If the report says there were 77
successful logins and 29 failures, how can a user check to see if
that's right?
I spent many hours testing by running different SQL queries
against the database and comparing the results to what the report
showed. The testing showed that simple minded approaches didn't
work and I often needed multiple complicated queries to produce a
single number on the report.
However, to the best of my knowledge, all of the numbers are now
accurate and match what the documentation says they should show.
To try out the report click:
Reports > Board Management Reports > Board Member Login Attempts
Documentation for the Board Member Login Attempts report.
This is ready for testing on DEV.
This looks really good so far. Thank you for the detailed write-up, too. I've asked Victoria to take a look as well.
One question she had: will the default start date always adjust to the earliest date for which we have login data available? Thinking ahead to when we have 100,000 entries, will it adjust accordingly to the date for the 1st of those available entries?
The default start date is determined dynamically by finding the earliest record in the Drupal log table. When older data starts being chopped off, the default date will automatically adjust to show you the earliest date available at the time the program runs.
As we get closer to the time that data starts to be chopped off, you may wish to increase the number of log entries kept so that data continues to accumulate. Another, maybe less functional, approach would be to print out reports from time to time and keep them on paper. I say that's less functional because, if you ever want a modification of the report, it won't be possible to go back and run the modified software against the data that's gone.
This sounds good. I think about two years worth of data is fine for now but we can revisit this down the road. Thanks, Alan.
I edited the log comment for revision 13135 (where the changes for this ticket were checked in), adding the ID for this ticket so that JIRA can find it and link it to this page (and so we can find this ticket in the future when we're looking at the history and logs in Subversion). As a general practice, it's a good idea to add the ID of the issue tracking ticket to any revision checking in work done for a specific ticket (or for several tickets, for that matter).
This report seems to be working well on QA, although we noticed one possible bug, and we would like to revise the wording of the report options to be a bit clearer.
Possible bug:
We are unable to run the report for a single day (e.g., today). Today's
date cannot be entered as both the start date and the end date.
Wording changes:
We would like to change "INCLUDE MEMBER DETAILS" to "REPORT
OPTIONS"
Please change "None" to "Statistics Only"
Please change "Only for login failures" to "Board Members With Failed
Login Attempts (Excludes NIH Board Members)"
Please change "All login details" to "All Board Member Login Attempts
(Includes NIH Board Members)"
Thanks!
One more thing. Could we please add a line at the bottom of the report (when the 2nd or 3rd options are chosen) to say what the * means? This could say:
Recognized EBMS username
(My star was automatically converted to a bullet, but you get the idea)
Sorry, one more thing. Could we remove the "Total Edir Users" from the first table? Since this report is just looking at Board member logins, I don't think the total Edir user number is meaningful to us and will likely introduce confusion. The number we care about (and will use for calculations) is the Total Board member Edir users. Thanks.
I have made all of the changes, checked them in to version control and installed them on DEV and QA.
Ready for QA.
Thanks, Alan! Looking these over now and came across one thing so far:
The third checkbox should say Includes NIH Board Members.
Oops. Give me a minute.
Fixed on DEV, QA, and version control archive.
Thanks, Alan. Could you please remove the "Total Edir Users" from the first table? As I explained above, I don't think this number is too meaningful to us.
Us squirrels have short attention spans.
Sigh.
I'll fix it sometime tonight.
Done.
Verified on QA.
(In some browsers, some of the column lines are just a tiny, tiny bit off as you scroll down the page, but that isn't affecting the data.)
Verified on PROD.
File Name | Posted | User |
---|---|---|
ReportDoc.txt | 2015-03-25 00:28:09 |