EBMS Tickets

Issue Number 180
Summary [Queue] Add filtering by string patterns in article titles for Med Librarian Queue
Created 2014-05-28 18:37:29
Issue Type Improvement
Submitted By Juthe, Robin (NIH/NCI) [E]
Assigned To Kline, Bob (NIH/NCI) [C]
Status Closed
Resolved 2014-08-01 17:43:17
Resolution Fixed
Path /home/bkline/backups/jira/oceebms/issue.127443
Description

We would like to add the ability to filter by string patterns in article titles on the medical librarian queue page. Low priority.

Comment entered 2014-05-28 18:38:14 by Juthe, Robin (NIH/NCI) [E]

Adding Cynthia and Minaxi to this issue.

Comment entered 2014-05-29 08:25:07 by Kline, Bob (NIH/NCI) [C]

Could we get specific requirements for this request?

Comment entered 2014-05-29 08:49:30 by Boggess, Cynthia (NIH/NCI) [C]

On the medical librarian's queue, at the top of the page alongside the existing filter features (board, sumtop, journal, and journal ID), add a feature to filter the queue to citations with specific words in the title. Place this feature in the filter list above journal and journal ID.

Comment entered 2014-05-29 09:10:01 by Kline, Bob (NIH/NCI) [C]

Thanks. To fill in the gaps: we will ignore case in matching words specified, but we will not do stemming (e.g., "mouse" will not match "mice" nor will "advancing" match "advance"). All words entered in the new field (separated by a space character) must be present (in any order) in the displayed title in order for the article to be included in the queue. Embedded punctuation must match exactly ("cancers" will not match "cancer's" and "cancer's" will not match "cancer’s" [with Microsoft Word's fancy replacement for the apostrophe]), but leading and trailing punctuation will be stripped before matching (we don't want a period at the end of a title or abbreviation to muddle the logic). A hyphenated sequence (e.g., "long-running") will be considered as a single word.
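The word-matching rules proposed above can be sketched roughly as follows. This is only an illustration of the rules as stated in the comment, not code from the EBMS; the function names `title_words` and `matches_all_words` are hypothetical, and only ASCII punctuation is stripped from the ends of words, which is a simplifying assumption.

```python
import string


def title_words(text):
    """Lowercase the text, split on whitespace, and strip leading and
    trailing ASCII punctuation from each token (assumption: ASCII only).
    Embedded punctuation, including hyphens and apostrophes, is kept."""
    words = set()
    for token in text.lower().split():
        word = token.strip(string.punctuation)
        if word:
            words.add(word)
    return words


def matches_all_words(query, title):
    """True if every word in the query is a whole word in the title,
    in any order. No stemming is attempted."""
    return title_words(query) <= title_words(title)


print(matches_all_words("Mouse", "Studies in the mouse."))            # case and trailing period ignored
print(matches_all_words("mouse", "Studies in mice"))                  # no stemming
print(matches_all_words("trial long-running", "A long-running trial"))  # any order, hyphenated word kept whole
print(matches_all_words("cancers", "The cancer's origin"))            # embedded punctuation must match
```

Under these rules a hyphenated sequence is one word, so a query for "running" alone would not match a title containing only "long-running".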

Comment entered 2014-05-29 09:17:23 by Juthe, Robin (NIH/NCI) [E]

Ideally, this search logic should match the logic we used for the title field on the search the database page, for consistency and ease of understanding. We use wildcards (%) there, but I'm not sure what else is different about that search from what you've proposed here.

Comment entered 2014-05-29 09:24:05 by Boggess, Cynthia (NIH/NCI) [C]

Yes, I agree. This would make use and training easier.

Comment entered 2014-05-29 09:41:17 by Kline, Bob (NIH/NCI) [C]

I just spoke with Robin. Her understanding is that we're not going to implement search by words at all. Instead, we will use the same string pattern filtering as is used on the search page. So you can disregard all of the discussion above about looking for "specific words" and ignoring word order.

Comment entered 2014-08-01 17:43:17 by Kline, Bob (NIH/NCI) [C]

Ready for user testing on DEV.

I followed the example of the journal title abbreviation field on the same filtering form, supplying implied SQL wildcards on both ends of the value entered by the user in the new field. There are other places in the EBMS where filtering by string patterns works slightly differently: in those places the user can supply the wildcards, but they aren't implied, so exact matches are supported, as well as patterns with wildcards. As always, wildcards can be inserted in the middle of the field's value to make the filtering more flexible.
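A minimal sketch of the difference between implied and user-supplied wildcards, using SQLite only because it ships with Python. The ticket does not show the EBMS schema or query code, so the `article` table, its `title` column, and the helper names are assumptions made for illustration. SQLite's LIKE ignores case for ASCII letters by default, which is convenient here.

```python
import sqlite3


def implied_pattern(value):
    """Wrap the user's value in % on both ends (the behavior described
    for the new field). Hypothetical helper, not EBMS code."""
    return "%" + value.strip() + "%"


def filter_titles(conn, pattern):
    """Return the titles matching a LIKE pattern, bound as a parameter."""
    rows = conn.execute(
        "SELECT title FROM article WHERE title LIKE ? ORDER BY id",
        (pattern,),
    )
    return [title for (title,) in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO article (title) VALUES (?)",
    [
        ("Prognosis in breast cancer",),
        ("Prognostic factors in early breast cancer",),
        ("Breast cancer",),
    ],
)

# Implied wildcards: the value may appear anywhere in the title.
implied = filter_titles(conn, implied_pattern("breast cancer"))

# No implied wildcards: a bare value must match the whole title.
exact = filter_titles(conn, "breast cancer")

# User-supplied wildcard in the middle of the value.
middle = filter_titles(conn, "progno%cancer")

print(len(implied), len(exact), len(middle))
```

With implied wildcards all three titles are returned; without them, the bare value matches only the title that is exactly "Breast cancer".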

I think we've discussed this before, but the heading for the form on this page is misleading: the form is not used for sorting (as the heading implies) but for filtering. I'll leave it the way it is, though, unless I'm explicitly asked to change it.

Comment entered 2014-08-04 11:39:40 by Boggess, Cynthia (NIH/NCI) [C]

I have tested this and it is successfully retrieving citations with specific words in the title. I think maybe "Title Words" would be better than "Title Pattern".

Comment entered 2014-08-04 12:24:32 by Kline, Bob (NIH/NCI) [C]

I was told by Robin (back in May) that we weren't going to implement searching by title words (which is much more complex). Was that wrong?

Comment entered 2014-08-04 12:33:03 by Juthe, Robin (NIH/NCI) [E]

If this is using the same string pattern search as is supported by the TITLE field on the search database page, perhaps we should just call it TITLE?

Bob, could you please briefly explain the difference between searching by title words and searching by title patterns? I know we discussed it, and I think what you've done is fine since Cynthia seems to be getting what was expected, but I can't remember the difference in meaning. Thanks.

Comment entered 2014-08-04 12:36:13 by Boggess, Cynthia (NIH/NCI) [C]

:)Call it what you want...but what you did was to allow us to search for specific words in the title. For example...I want to filter my queue by citations with the word prognosis in the title. I search %prognos% and I get those citations. It works with more than one word as well. %phase% %advanc% will retrieve citations in my queue with the words phase and advanced in the title.

Comment entered 2014-08-04 12:53:36 by Kline, Bob (NIH/NCI) [C]

Title word searching (which is much more expensive, in terms of both implementation and response time) involves extracting all of the separate words from the titles of all the articles (with complicated rules for how to deal with spacing and punctuation variants), extracting each of the words in the value entered by the user in the form field, and ensuring that all of the words in the latter set are found (in any order) in the former set. So "surgical" would not match "surgically" (for example). What we've implemented here just looks for articles which have the string entered by the user anywhere in the title. The "pattern" part allows the user to put wildcards in the form field's string to make the search more flexible. This is useful (for example) if you're not sure how a word was spelled (so 'labo%r pains' matches 'labor pains' as well as 'labour pains') or you're not sure what intervening words might be present (so 'relapsed%non-hodgkins lymphoma' would match 'relapsed non-Hodgkins lymphoma' as well as 'relapsed aggressive non-Hodgkins lymphoma'). If you don't use wildcards, you can think of this field as just title substrings (and 'surgical' matches 'surgically').
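The contrast between the two approaches can be illustrated with a short sketch. It translates a SQL LIKE pattern into a regular expression so the examples above can be tried without a database; `like_to_regex`, `title_matches`, and `word_matches` are hypothetical names, the word comparison is greatly simplified (no punctuation handling), and case-insensitive matching is assumed.

```python
import re


def like_to_regex(pattern):
    """Translate a SQL LIKE pattern to a compiled regex:
    % matches any run of characters, _ matches any single character."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE | re.DOTALL)


def title_matches(user_value, title):
    """Pattern filter with implied wildcards on both ends of the value."""
    return like_to_regex("%" + user_value + "%").fullmatch(title) is not None


def word_matches(user_value, title):
    """Whole-word comparison in any order, shown only for contrast."""
    return set(user_value.lower().split()) <= set(title.lower().split())


print(title_matches("labo%r pains", "Management of labour pains"))
print(title_matches("relapsed%non-hodgkins lymphoma",
                    "Relapsed aggressive non-Hodgkins lymphoma in adults"))
print(title_matches("surgical", "Surgically treated tumors"))  # substring match
print(word_matches("surgical", "Surgically treated tumors"))   # whole words only
```

The last two calls show the key difference: the pattern filter treats 'surgical' as a substring of 'surgically', while a whole-word comparison does not.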

Does this help? We can change the field label to 'Title' if you prefer (and you're not worried that it would be misleading).

Comment entered 2014-08-04 13:00:03 by Kline, Bob (NIH/NCI) [C]

:)Call it what you want...but what you did was to allow us to search for specific words in the title. For example...I want to filter my queue by citations with the word prognosis in the title. I search %prognos% and I get those citations. It works with more than one word as well. %phase% %advanc% will retrieve citations in my queue with the words phase and advanced in the title.

Yes, but as implemented 'phase' has to come before 'advanced' in the title in order to show up in the results. Furthermore, 'advance' will also match 'advanced' - which it would not do if we were doing matching of words (instead of substrings and patterns).
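The ordering constraint follows from how a pattern with % wildcards works: the literal pieces between the % signs must occur in the title from left to right. A rough sketch of that idea follows; `fragments_in_order` is a hypothetical helper, it assumes implied wildcards on both ends and case-insensitive matching, and it does not handle the single-character `_` wildcard.

```python
def fragments_in_order(user_value, title):
    """True if the literal pieces between % signs occur in the title
    in the same left-to-right order, without overlapping."""
    haystack = title.lower()
    position = 0
    for fragment in user_value.lower().split("%"):
        if not fragment:
            continue  # empty piece from a leading, trailing, or doubled %
        found = haystack.find(fragment, position)
        if found < 0:
            return False
        position = found + len(fragment)
    return True


# 'phase' before 'advanced': included in the results.
print(fragments_in_order("%phase% %advanc%", "Phase II trial in advanced melanoma"))

# 'advanced' before 'phase': filtered out.
print(fragments_in_order("%phase% %advanc%", "Advanced melanoma: a phase II trial"))

# Substring behavior: 'advance' is found inside 'advanced'.
print(fragments_in_order("advance", "Advanced melanoma"))
```

A user who needs both orders could filter twice, once with each ordering of the fragments.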

Comment entered 2014-08-04 13:24:38 by Boggess, Cynthia (NIH/NCI) [C]

Yes, so it seems pattern matching will be more useful to us in this case. I am usually searching my queue using single words, so I don't think the word order issue you mention above will be a problem for this particular case.

Comment entered 2014-08-04 13:37:28 by Juthe, Robin (NIH/NCI) [E]

Thanks for the explanation, Bob. Sounds like we're all on the same page, then. Let's change the label to TITLE to match the search page.

Comment entered 2014-08-04 14:23:29 by Kline, Bob (NIH/NCI) [C]

Let's change the label to TITLE ...

Done.

Comment entered 2014-08-04 14:29:18 by Boggess, Cynthia (NIH/NCI) [C]

Great! Thanks, this feature will make my reviewing more flexible!

Comment entered 2014-09-17 14:54:24 by Juthe, Robin (NIH/NCI) [E]

Verified on QA.

Comment entered 2014-10-29 08:55:59 by trivedim

Verified on Prod.
