Issue Number | 701 |
---|---|
Summary | Article Search Page - Author search error |
Created | 2023-01-18 16:11:44 |
Issue Type | Bug |
Submitted By | Boggess, Cynthia (NIH/NCI) [C] |
Assigned To | Kline, Bob (NIH/NCI) [C] |
Status | Closed |
Resolved | 2023-01-27 08:29:43 |
Resolution | Fixed |
Path | /home/bkline/backups/jira/oceebms/issue.336574 |
On the Article Search Page when searching the author field in the first position without an initial, EBMS4 retrieves zero results while Prod retrieves citations. For example:
In EBMS Prod:
Author only search in first position using Kuhlen retrieves 15 citations
Author only search in first position using %Kuhlen% retrieves 16 citations
Author only search in first position using Kuhlen M retrieves 15 citations
Author only search in first position using %Kuhlen M% retrieves 15 citations
In EBMS 4:
Author only search in first position using Kuhlen retrieves 0 citations
Author only search in first position using %Kuhlen% retrieves 16 citations
Author only search in first position using Kuhlen M retrieves 15 citations
Author only search in first position using %Kuhlen M% retrieves 15 citations
The old system stores the surname separately from the initials. The only use made of that separate storage is on the search page, using flawed search logic which assumes that no author will have a compound surname (when in fact, 67,882 authors, responsible for authoring or co-authoring 163,573 articles, have compound surnames). When the user enters "von Doblen" in the Author field the search will report that no articles were found with this author, even when "Any" is selected for the Author Position, failing to find PMID 20197772, co-authored by U. von Doblen (whose name we represent as "von Doblen U"). This is because the current software splits the name entered by the user on the first space, and treats the left side of that space ("von" in this case) as the surname and everything else ("Doblen" for this example) as the author's initials. I noticed this problem, and not wanting to retain the flaw in the new system, decided to store the full surname plus initials together in a single field, so that the user could either search for "von Doblen%" (less precision), or for "von Doblen U" (more precision). I'm assuming we don't want to carry forward the flawed logic from the old EBMS, but we could (at the price of some precision) have the software always append a trailing wildcard ("%") to the name(s) entered on the search page by the user.
Your thoughts?
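To make the difference concrete, here is a minimal sketch of the two behaviors described above. It is not the EBMS code: the function names are hypothetical, the rule that an empty initials part matches any initials is an assumption inferred from the Prod counts, and only a trailing "%" is handled.

```python
# Hypothetical illustration only; none of these names come from the EBMS code.

def legacy_parse(query):
    """Old behavior as described: split the entered name on the first space."""
    surname, _, initials = query.partition(" ")
    return surname, initials


def legacy_match(query, stored_surname, stored_initials):
    """Compare the split query against separately stored surname and initials."""
    surname, initials = legacy_parse(query)
    if surname.lower() != stored_surname.lower():
        return False
    # Assumption: when no initials are entered, any initials are accepted.
    return not initials or initials.lower() == stored_initials.lower()


def single_field_match(query, stored_name):
    """New behavior as described: match against 'Surname Initials' as one string.

    Simplification: only a trailing % wildcard is supported in this sketch.
    """
    q = query.lower()
    name = stored_name.lower()
    if q.endswith("%"):
        return name.startswith(q[:-1])
    return name == q


# "von Doblen" is split into surname "von" and initials "Doblen", so the
# old logic cannot find the author stored as surname "von Doblen".
print(legacy_parse("von Doblen"))
print(legacy_match("von Doblen", "von Doblen", "U"))
# With a single field, both the wildcard and the exact form find the author.
print(single_field_match("von Doblen%", "von Doblen U"))
print(single_field_match("von Doblen U", "von Doblen U"))
# The trade-off reported in this issue: a bare surname no longer matches.
print(single_field_match("Kuhlen", "Kuhlen M"))
```

The last call shows why EBMS 4 returns zero citations for a bare "Kuhlen": without a wildcard, the whole stored string has to match.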
I should probably have mentioned this fix for the flawed logic earlier, and I apologize for not doing so. (I don't have a real excuse; I just forgot.) Two more observations:
my count of 67,882 authors doesn't include authors with hyphenated surnames; these are the authors with spaces in the surname field
the only way I can think of to restore the functionality you thought you had in the old system (the ability to find articles written by M Kuhlen without picking up articles written by ML Kuhlenschmidt, when you don't know Kuhlen's first name initial) would be to (a) revert to storing the surnames separately from the initials, modifying the data structures and rewriting the import and migration software, as well as the software which assembles the display of articles; and (b) modify the search interface, making the user enter the surname and initials for each author in separate fields in repeating groups (or come up with some complicated syntax which the user would enter to distinguish the surname from the initials, such as "SN:Smith,INIT:JK;SN:Jones,INIT:I"). I'm going to guess those solutions don't appeal to anyone very much.
Does my convoluted explanation of the change in behavior make any sense? 😛
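The pattern of counts reported above (0, 16, 15, 15) can be reproduced in miniature with SQL LIKE matching against a single name field. The sketch below uses an in-memory SQLite database purely as a stand-in; the table name, column name, and sample rows are assumptions, and the actual EBMS schema and database engine may differ.

```python
import sqlite3

# Stand-in data: one row per author name, stored as "Surname Initials".
# Table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE author (full_name TEXT)")
conn.executemany(
    "INSERT INTO author (full_name) VALUES (?)",
    [("Kuhlen M",), ("Kuhlenschmidt ML",), ("von Doblen U",)],
)


def count_matches(conn, pattern):
    """Count rows whose full_name matches the LIKE pattern (% = any run of characters)."""
    row = conn.execute(
        "SELECT COUNT(*) FROM author WHERE full_name LIKE ?", (pattern,)
    ).fetchone()
    return row[0]


# Patterns taken from the issue, plus the trailing-wildcard forms discussed above.
for pattern in ["Kuhlen", "%Kuhlen%", "Kuhlen M", "%Kuhlen M%", "Kuhlen%", "von Doblen%"]:
    print(pattern, count_matches(conn, pattern))
```

In this toy data set the bare "Kuhlen" finds nothing, "Kuhlen M" finds only M Kuhlen, and any wildcard form of the bare surname also picks up Kuhlenschmidt, which is the precision trade-off described above.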
My main concern with the author search is not really precision but rather the zero result when just Kuhlen was searched even though there were at least 15 citations that could have been retrieved as we saw in prod. I would rather lean toward less precision than for a user to miss a citation due to a search technicality.
Jeff was the one who identified this issue. I did not notice because I have become accustomed to always using % at the beginning and end of my search terms and did not think to try without.
I don't have a problem with automatically tacking on a wildcard at the end of the author search. And if I understand all that you have mentioned above, a trailing wildcard should not affect names like von Doblen, given your correction to how the names are now stored.
If you think that automatically tacking on a wildcard at the end of the author search will cause problems, then we will probably want to add user instruction for wildcard use with the author search field on the Article search page. Yes, it will also be in the help documentation, but in this case would be needed in both places.
The only concern with the automatic wildcard for the author field is the resulting inconsistency with the title field behavior, where the wildcard(s) must be specified in order to be applied. So I would lean toward the option of adding to the instructions (which I agree would be helpful). I have added some trial language to the search page on my server. Your thoughts, ~vshields?
Here are a few suggestions from Jeff regarding the trial language for us to consider:
For Author, give an additional example of the wildcards appearing before and after the name (for example %smith%)
For Journal Title, give an example similar to the one above (since the user may not think to look at the above field). This would also be useful since most people associate wildcard characters with asterisks and exclamation points, rather than percentage signs.
I have applied Jeff's suggestions on https://ebms.rksystems.com.
Looks good to Jeff and me. Let's see what others think.
I have entered a provisional LOE for the story points based tentatively on the proposed solution, with the suggested amendments to the wording of the on-form help text, but ~vshields and ~juther will be consulting with the other board member for their input and possible confirmation that this is the desired solution.
The users have confirmed that the current behavior of EBMS4 for searching by author is what they want.
Verified on QA. Thanks!