EBMS Tickets

Issue Number 203
Summary [Search Database] Add Option to View Unpublished ONLY, NOT listed ONLY, and Rejected ONLY
Created 2014-06-09 17:25:15
Issue Type Improvement
Submitted By Juthe, Robin (NIH/NCI) [E]
Assigned To Kline, Bob (NIH/NCI) [C]
Status Closed
Resolved 2014-08-03 08:04:00
Resolution Fixed
Path /home/bkline/backups/jira/oceebms/issue.129188
Description

The medical librarians would like to have the ability to search the database for ONLY unpublished, NOT listed, and/or Rejected citations. I think the best way to accomplish this would be to add the following new checkboxes to the Search the Database page:

__ NOT LISTED ONLY
__ REJECTED ONLY
__ UNPUBLISHED ONLY

These options would limit the search results to ONLY yield citations that had been not listed, rejected, and/or had not been published. The checkboxes should be added to the Administrator Search section. Please leave them unchecked by default.

Comment entered 2014-06-09 17:25:54 by Juthe, Robin (NIH/NCI) [E]

Adding Cynthia and Minaxi.

Comment entered 2014-08-03 08:04:00 by Kline, Bob (NIH/NCI) [C]

Ready for user review on DEV.

Comment entered 2014-08-04 11:41:54 by Boggess, Cynthia (NIH/NCI) [C]

I have tested this in dev and it seems to be successfully retrieving the ONLY citations.

Comment entered 2014-09-12 10:35:41 by Shields, Victoria (NIH/NCI) [E]

Cynthia and Minaxi have tested this one and provided the following feedback:

Rejected Only checkbox – seems to be working when limited by topic or board (see comments from OCEEBMS 204)
NOT Listed Only checkbox – We calculate the total number of citations excluded by NOT List manually by subtracting the number in Queue immediately after import from the total number of citations retrieved in search strategies. For Jan 2014 this number was calculated to be 850. Using the NOT Listed Only checkbox we got 1201. How is this number being generated because in theory it should be 850, right?
Unpublished Only Checkbox – this seems to be working and is in fact retrieving citations that are not listed, rejected or not yet reviewed by med lib.

Something else we found in the process of testing these new checkboxes…
In general the numbers generated in the citations reports should match the numbers that we can generate using the checkboxes in the search the dataset feature. This does not seem to be the case.
To get the statistics for “Total Citations Retrieved in Search Strategies”, we generated the numbers from search the database by checking the boxes “include unpublished”, “include not listed” and “include rejected” for the Sept 2014 review cycle (which comprises new data we imported for testing) and compared these numbers with the “citations imported report”.
These numbers do not match. See below chart:

Sept 2014 review cycle

Total Citations Retrieved in Search Strategies: 350

Board                     Citations Imported report    Search (EBMS QA)
Adult Treatment           179                          254
Pediatric Treatment       72                           76
Screening & Prevention    90                           92
Cancer Genetics           55                           59

(Note that I attached a Word version of the table because the formatting was lost when I pasted it above.)

Comment entered 2014-09-12 16:38:15 by Kline, Bob (NIH/NCI) [C]

I believe this is the same bug I described in my last comment posted to OCEEBMS-204. Let's see if fixing that resolves these discrepancies.

Comment entered 2014-09-15 12:52:08 by Kline, Bob (NIH/NCI) [C]

Cancer Genetics 55 59

The Citations Imported report looks for articles which got the "Ready for Initial Review" status. That would exclude the four articles in the batch which didn't get that status because they were published in journals that were "not-listed" for the board, and which hadn't already been imported by other jobs.

Comment entered 2014-09-17 11:51:13 by Kline, Bob (NIH/NCI) [C]

Sorry it's taken me so long to address all of your comments above. You've raised some important questions, and I'd like very much to have us all come away satisfied with the answers, and understanding exactly how the system works.

Starting with the count of articles which were rejected because they were published in a journal which the board in question does not care to use: a total of 4,062 articles were "imported" for the January 2014 cycle (where "imported" in this context means that at least one topic was assigned to the article for that cycle). Of those articles, 1,198 were rejected for at least one of those topics because the board for that topic doesn't want to see articles published in the journals in which the articles appeared. That's the number I get for the "not-listed" articles for January when I submit a database query directly, and it's what I get when I do a search on QA for the January 2014 cycle with the "NOT LISTED ONLY" box checked, so I'm going to guess you were looking at the number for a tier other than QA, since the number you reported was 1,201, or three higher than what I got on QA.

I can think of two conditions which would cause that number to be higher than the result you would get if you were to subtract the number of articles in your queue, after all of the import jobs for the cycle were done, from the total number retrieved. One would be if there were articles in your queue which were left over from the previous cycle, still awaiting a decision from you. The other condition would be the presence of articles which were picked up by the NOT LISTED ONLY search because one of the topics selected for the articles was for a board which had the articles' journals on its "NOT" list, but other topics were assigned for a different board, which didn't want to automatically reject articles from those journals, so the articles would also show up in your queue anyway. (A variation on this second condition would be if some of the import jobs were done with the "NOT LIST" box checked, which would suppress the rejection based on the articles' journals.)

I don't have a way to see what the size of your queue was at the time you're describing. I assume you just wrote down the number back then, right?
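
To make the second condition concrete, here is a minimal Python sketch. The data layout (a list of article/board/outcome tuples), the outcome labels, and the article IDs are all hypothetical and invented for illustration; this is not the EBMS schema or the query the search actually runs. It also assumes the queue is counted per article. It only shows how a count of articles rejected by a NOT list for at least one topic can come out higher than the manual "total minus queue" figure.

```python
# Hypothetical import results for one cycle: (article_id, board, outcome).
# "not_listed" = the board's NOT list rejected the article's journal;
# "queued"     = the article/topic went to the librarians' review queue.
assignments = [
    (1, "Adult Treatment", "queued"),
    (2, "Adult Treatment", "not_listed"),
    (3, "Adult Treatment", "not_listed"),  # rejected for one board ...
    (3, "Cancer Genetics", "queued"),      # ... but queued for another
]

imported = {article for article, _, _ in assignments}
queued = {article for article, _, outcome in assignments if outcome == "queued"}
not_listed_any = {
    article for article, _, outcome in assignments if outcome == "not_listed"
}

# Manual method: total articles retrieved minus queue size after import.
manual_count = len(imported) - len(queued)

# Search-style count: rejected by a NOT list for at least one topic.
search_count = len(not_listed_any)

print(manual_count, search_count)
```

Article 3 is counted by the search-style count but is also sitting in the queue, so the manual subtraction leaves it out.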

For the second issue you described in that same comment, I have corrected the problem I noted in my September 15 comment, so the four Genetics articles which were missed by the "Citations Imported" report are now included, and both the report and the search show 59 for that board and the September cycle. However, there are still some discrepancies which I don't totally understand, and they appear to be caused by articles which are assigned a "Published" state for a topic without having been given any earlier states. For example, the search for Pediatric Treatment board articles for the September cycle returns 76 articles (as you show in the table you posted), but the "Citations Imported" report shows only 75 for that board/cycle combination. So we're closer, but still one off. The article which was missed by the report but found by the search is EBMS ID# 328390 ("Relationship between CYP1A1 polymorphisms and invasion and metastasis of breast cancer"; clearly this was done by a developer or member of the QA team who wasn't paying any attention to the actual relevance of the article to the topic/board selected, as the topic was "Childhood Brain & Spinal Cord Tumors" - the user was recorded as Test Board Manager). There's only one row in the state table for this article for Peds in September, and that's for the Published state. There are no rows for earlier states for this article/topic combination, which is why the "Citations Imported" report doesn't pick it up. Are there places other than the "Publish Citations" page where you can publish an article/topic combination?

Alan: Do you have any ideas about how this state row could be created without the earlier states being recorded for the article/topic combo? The only way for it to show up on the "Publish Citations" page is if the combination appears in the state table with a current state of "Ready for Initial Review" (as far as I can determine).

Comment entered 2014-09-18 05:54:07 by Kline, Bob (NIH/NCI) [C]

OK, I have figured out what's going on. If you bring up the "Full Citation" page for an article which is already in the EBMS, and you add a topic to the article (either for a board which already has another topic associated with the article, or a different board which you add yourself), then the article is put directly into the Published state, and it never gets a row in the state table for ReadyInitReview. I guess that makes sense, as the decision to add the topic there obviates the need for the initial review.

So I need to know if such an article-topic combination should be reflected in the counts of articles shown in the "Citations Imported" report. On the face of it, this action doesn't seem very "import"-like. In the more granular Import Report we don't include an article which had been imported previously and for which a new topic was added in the "ARTICLES IMPORTED" count (they instead get picked up for the "DUPLICATE ARTICLES" count, as well as the "ARTICLES WITH TOPICS ADDED" count). On the other hand, we've already muddied the linguistic waters a little bit by having the assignment of new topics for existing articles happen in the context of an import, and we were including such articles in the less granular "Citations Imported" report's counts (see my earlier comment, where I wrote that "'imported' in this context [the 'Citations Imported' report] means that at least one topic was assigned to the article for that cycle"). Weighing in favor of reflecting articles in the "Citations Imported" report based on the addition of a new topic to the article on the "Full Citation" page would be the librarians' assumption that "In general the numbers generated in the citations reports should match the numbers that we can generate using the checkboxes in the search the dataset feature" (see earlier comment above). I guess I need to know whether the "In general ..." part of that sentence means that there are exceptions to the "should match" rule, and we wouldn't expect the numbers from the search and from the report to match in such cases. I can make the report behave either way. Just let me know what the consensus is.
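
The mismatch can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the row layout, the article IDs, the topic names, and the two counting functions are hypothetical stand-ins, not the actual EBMS state table or report code. The only details taken from this ticket are the two state names and the rule that the report looks for the "Ready for Initial Review" state.

```python
# Hypothetical state-table rows: (article_id, topic, state).
state_rows = [
    (101, "Topic A", "ReadyInitReview"),  # normal import path
    (101, "Topic A", "Published"),
    (102, "Topic B", "Published"),        # topic added on "Full Citation" page
]


def report_count(rows):
    """Count articles the way the report is described: ReadyInitReview rows only."""
    return len({article for article, _, state in rows if state == "ReadyInitReview"})


def search_count(rows):
    """Count articles that have any state row at all for the cycle."""
    return len({article for article, _, _ in rows})


print(report_count(state_rows), search_count(state_rows))
```

Article 102 has only a Published row, so a report keyed on ReadyInitReview skips it while a search over all rows finds it.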

Comment entered 2014-09-18 12:00:19 by trivedim

This is a combined reply to your previous two comments.

Regarding the count of articles rejected by NOT journals, your logic makes sense to us. Of the conditions that can cause higher numbers, the first (“One would be if there were articles in your queue which were left over from the previous cycle”) is not valid, as the queue is always empty before the importing begins. The number in the queue is written down as soon as the import is completed for the review cycle. These numbers should be correct unless someone else imports a citation and does not use the tag “fast tracked”. The other condition seems valid: “The other condition would be the presence of articles which were picked up by the NOT LISTED ONLY search because one of the topics selected for the articles was for a board which had the articles' journals on its "NOT" list, but other topics were assigned for a different board, which didn't want to automatically reject articles from those journals, so the articles would also show up in your queue anyway.”
What this means for us is that we will be depending on the manual count to have the exact number of citations rejected by NOT Journals rather than search the database.
For the second issue, now you have figured out what is going on and we understand why the search the database numbers may not match the numbers in the reports.
The present “Citations imported report” is exactly what we want and we would not like to add the other citations with topics added to that report. So the bottom line is “we wouldn't expect the numbers from the search and from the report to match”

Comment entered 2014-09-18 12:35:21 by Kline, Bob (NIH/NCI) [C]

What this means for us is that we will be depending on the manual count to have the exact number of citations rejected by NOT Journals rather than search the database.

It sounds like for your purposes "number of citations rejected by NOT Journals" means "number of articles for which ALL topics assigned during import were rejected by the NOT lists." But we still want to have the search return all articles rejected by a NOT list for ANY topic assigned during import, right?

Have we covered all of your concerns, or did I miss any outstanding loose threads?

Comment entered 2014-09-18 13:12:12 by Boggess, Cynthia (NIH/NCI) [C]

Minaxi and I agree that the search the database is fine as is. What it is doing makes sense to us especially when broken down by board. All other statistics except NOT journal count have a report as well. Perhaps we could have a report that would reflect the data that we want to have in the monthly report. So total number of citations NOTed out by NOT Lists at the time of import broken down by boards and then a grand total that reflects the number of unique citations completely NOTed out.
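
A report along the lines described above might be sketched as follows. This is only an illustration under stated assumptions: the input layout, the outcome labels, the article IDs, and the function name `not_list_report` are all hypothetical, and "completely NOTed out" is taken to mean that every topic assigned to the article during import was rejected by a NOT list.

```python
from collections import defaultdict

# Hypothetical import results for one cycle: (article_id, board, outcome).
assignments = [
    (1, "Adult Treatment", "queued"),
    (2, "Adult Treatment", "not_listed"),
    (3, "Adult Treatment", "not_listed"),
    (3, "Cancer Genetics", "queued"),
    (4, "Adult Treatment", "not_listed"),
    (4, "Cancer Genetics", "not_listed"),
]


def not_list_report(rows):
    """Return (per-board NOT-list counts, count of articles fully NOTed out)."""
    per_board = defaultdict(set)   # board -> articles rejected for that board
    outcomes = defaultdict(set)    # article -> set of outcomes across topics
    for article, board, outcome in rows:
        outcomes[article].add(outcome)
        if outcome == "not_listed":
            per_board[board].add(article)
    board_counts = {board: len(ids) for board, ids in per_board.items()}
    # Fully NOTed out: "not_listed" is the only outcome the article received.
    fully_noted = {a for a, seen in outcomes.items() if seen == {"not_listed"}}
    return board_counts, len(fully_noted)


board_counts, grand_total = not_list_report(assignments)
print(board_counts, grand_total)
```

Article 3 contributes to the Adult Treatment count but not to the grand total, because it was still queued for another board.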

Comment entered 2014-09-18 13:39:04 by Kline, Bob (NIH/NCI) [C]

If you decide you want such a report, add a ticket to the backlog and we'll try to implement it for the next release.

Comment entered 2014-09-18 16:44:24 by Juthe, Robin (NIH/NCI) [E]

Cynthia/Minaxi,

Can we move this issue to QA Verified? It looks like everything has been addressed in the comments (aside from a possible new report in a future release, which will have its own ticket), but I just wanted to be sure before we close this out.

Thanks,
Robin

Comment entered 2014-09-18 17:03:13 by Boggess, Cynthia (NIH/NCI) [C]

Yes, I think we have sorted this issue out. We can work on the report as a separate issue.

Comment entered 2014-09-18 17:16:32 by Juthe, Robin (NIH/NCI) [E]

OK, great. Thank you all for the thorough testing and troubleshooting! I'm marking this verified on QA.

Cynthia/Minaxi, please enter a new issue for the report whenever you're ready.

Comment entered 2014-10-29 08:57:44 by trivedim

Verified on Prod.

Attachments
File Name Posted User
OCEEBMS-203.doc 2014-09-12 10:34:55 Shields, Victoria (NIH/NCI) [E]
