Issue Number | 571 |
---|---|
Summary | Journal NOT list not excluding all journals on lists |
Created | 2020-08-28 14:30:54 |
Issue Type | Bug |
Submitted By | Boggess, Cynthia (NIH/NCI) [C] |
Assigned To | Kline, Bob (NIH/NCI) [C] |
Status | Closed |
Resolved | 2020-09-02 12:30:17 |
Resolution | Fixed |
Path | /home/bkline/backups/jira/oceebms/issue.273833 |
We have identified a problem with the NOT journal filter not working for some journals.
While testing a Thyroid cancer file to identify the NOT journals, we imported the file for the Adult Board, Thyroid cancer summary, in TEST mode.
The import report showed only 3 journals as NOT journals. Because we had doubts about some journals in the file, we checked all of the file's journals in the EBMS.
In the attached thyroid cancer file only the following journals should have remained...all other journals should have been displayed as NOT journals, but were not.
Oncology
Semin Oncol
JAMA Oncol
Eur J Surg Oncol
Oral Oncol
J Surg Oncol
Head Neck
We suspect that this is a problem with import as well as the test mode, and for many more journals than just those represented in this thyroid file, as we have experienced this issue with at least 6 other files in the past 2 days. Also, we have only tested this with the adult treatment board. We suspect it will be an issue for all boards but will test the others and follow up here.
Can you give me a specific PMID for an article which should have been not-listed but was not?
I ran the PubMed search results file on DEV in test and then in live mode. I am unable to find any cases in which the software failed to identify the journal in which an article appeared as on the "NOT" list. Because I was on DEV, which is behind the production server, there were many more articles in the batch which had already been imported into the EBMS on PROD than on DEV, but all three of the articles in the Adult Treatment test for this search results set on PROD which had not yet been imported are identified as on the "NOT" list for both PROD and DEV, as far as I can tell. It's important to remember that the "NOT" list doesn't prevent import of the article, it just prevents it from getting into the review queues (unless you check the box for overriding the "NOT" list). And (again, as far as I can tell) this has been true since the beginning. If I've missed something, please let me know.
One other thing which might be muddying the waters is that we have a business rule that we don't apply the "NOT" list to an article which has ever been assigned a state before (so we don't override a human decision). Let me do some more digging to make sure that's what's going on for the other articles.
Hmm, I think we actually do have a problem. It is true that all but the three which were marked as not-listed already did have a state assigned to them. However, for 51 of those 86 other articles the state was Rejected by NOT list and only 35 of them had some other state (in every case, a rejection at some point along the way, either in the initial review or by the board manager).
I can think of two possible explanations.
The original business requirement that articles with an existing state not be excluded by the "NOT" list was stipulated (and implemented) before the decision to treat "NOT" list exclusion as a state (perhaps it was not in the original cite management system; I don't recall), and the requirement didn't get revisited after that decision.
Alan assumed that the users wouldn't try to import the same document for the same topic more than once.
I think the first explanation is the more likely of the two (Alan didn't tend to make such risky assumptions).
In any case, this is a bug which needs to be fixed.
Presumably, this has never been caught before because in the unlikely event that Alan had made the assumption behind that second possible explanation above, he would have been mostly right. 😉
@robin - Is this something which should be included in Canyonlands?
Thanks, Bob. Sorry I failed to see your updates above, I thought I added myself to this issue but apparently I hadn't! In any event, thank you for tracking down this problem. I think it should go into Canyonlands if possible.
In addition to the monthly searches, we run special search requests that may cover as far back as 10 years. Many citations from these special searches are already in the EBMS and do in fact get imported for the same summary topic but are assigned a new review cycle.
Well, I thought I had identified the problem, but according to Cynthia, she's seeing behavior which would not be explained by what I found in Alan's code. I believe what I saw should cause the import job to ignore the "NOT" list for articles which already have a state, regardless of the test flag. If she's really seeing differing "NOT" list behavior depending on whether we're running in test mode, then I have more digging to do. I'll need specific cases (with times run, and PMIDs) to examine.
Here are the results of a test Minaxi ran this morning.
I am attaching a file of 26 citations for September 2020 adult ALL. In
PROD, Minaxi imported this citation file using the test mode check box
on import page. Test mode indicated that 24 citations would be imported
and that 13 would be eliminated by the NOT journal filter. She then
imported the citations to the database and the import report indicated
that 24 were imported and 13 were NOT listed. She then imported again
using the test mode check box which indicated that 26 were duplicates, 0
were imported and that 0 were NOT listed. Our question is why test mode
is not showing the 13 NOT listed citations? Regardless of whether they
are duplicates, they should still be NOT listed and show as such in the
test mode import report.
Some more context…
The thyroid file we attached above contains the results of a special
search request going back 5 years. The vast majority of the citations we
retrieved for this retro search we knew were already in the EBMS. Before
sending Victoria these results, we imported the file using the test mode
so that we could identify how many were NOT listed. And using the import
report from the test mode we can gather all the PMIDS for the citations
NOT listed and then manually remove them from our text file. When we did
this we noticed that only a few citations were NOT listed for the
thyroid file even though we recognized many more that were in fact NOT
journals. We ended up having to search every journal title for all the
citations using the EBMS’s Journal Maintenance feature and then removed
all the NOT listed citations.
adult_ALL_Sept20.txt
Often these special search requests do not get imported into the EBMS because the BM wants a file to send directly to specific board members instead. And since these searches go back 5-10 years most of the citations have already been picked up by our monthly searches and are therefore already in the database. So we use the import test mode for these searches so that we can eliminate all of the citations from NOT listed journals.
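The manual clean-up step described above (removing the NOT-listed PMIDs from the search results text file) could be scripted. The following is only a minimal sketch: it assumes the file is in the PubMed (MEDLINE) text format, with each record containing a `PMID- ` line and records separated by blank lines, which this thread does not confirm. The function name and sample data are hypothetical.

```python
import re

# Assumption: each record carries a line such as "PMID- 12345678".
PMID_LINE = re.compile(r"^PMID\s*-\s*(\d+)", re.MULTILINE)


def drop_records(medline_text, excluded_pmids):
    """Return the text without the records whose PMID is in excluded_pmids."""
    # Assumption: records are separated by one or more blank lines.
    records = re.split(r"\n\s*\n", medline_text.strip())
    kept = []
    for record in records:
        match = PMID_LINE.search(record)
        if match and match.group(1) in excluded_pmids:
            continue  # this citation was reported as NOT listed
        kept.append(record)
    return "\n\n".join(kept) + "\n"


# Hypothetical two-record sample; real PMIDs would come from the import report.
sample = (
    "PMID- 1001\nTI  - First title\nJT  - Journal A\n\n"
    "PMID- 1002\nTI  - Second title\nJT  - Journal B\n"
)
result = drop_records(sample, {"1001"})
```

The set of excluded PMIDs is passed as strings because that is how they are read from the file. The result is only as complete as the list of PMIDs supplied to it.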
I'm confused. The activity you just described does not sound like test mode was behaving differently than live mode (ignoring the discrepancy between 24 vs. 26, which I assume was just a typographical error). In both cases the software identified as rejected by the "NOT" list only those articles which were published in journals on that list AND did not already have a state in the database.
Can you point me to a pair of import requests (with and without test mode) which illustrate the behavior you described in the email thread:
My testing this morning seems to indicate that the NOT journals are getting recognized upon true import but not in the test mode…at least for the adult board.
What I need includes:
the time(s) when the import requests were submitted
at least one PMID for an article which did not already have a state in the database and was rejected by the "NOT" list for one of the requests in the pair but not for the other request.
No typo, 26 in the file 24 were imported. Two of the 26 were duplicates.
I cannot give you these examples because they do not exist. Yes, these leukemia citations and the thyroid citations are excluded by the NOT journal list. We can see that in the full record for each in the EBMS. So if they are assigned that state and that state remains why would the above test mode import ever not indicate and list them as such? As I mentioned above, the test mode first listed 13 of the ALL citations as NOT listed and then the second time it listed 0. And when we see something like this we have to ask if the NOT journal filters were working.
Although, if I am understanding correctly, I think the issue then is with the test mode display. We would like it to always show us the citations NOT listed.
So if they are assigned that state and that state remains why would the above test mode import ever not indicate and list them as such?
I tried to explain what's going on in an earlier comment above. Let me try again.
When Alan implemented the logic for applying the "NOT" list filters he was working with a business rule which told him not to use those filters for any article which already had a state in the database for the import request's topic. As he wrote in his comments above the code, "If it was previously processed for this topic, we don't want to reject it now. That could override a human's decision."
After he wrote that code, a decision was made to track this "NOT" list filtering as a state, and record it as such in the database's state table, but the logic above was not revisited to account for this change. This results in the behavior you reported above for the thirteen articles which had not yet been recorded in the state table before the sequence of requests you described.
The first import request is submitted. No states are found for the articles in the database, and the articles were flagged as rejected by the "NOT" list. Since this request was in test mode, no states were recorded.
The second import request is submitted. No states exist in the database for the articles, so the articles are again flagged as rejected by the "NOT" list. This request is submitted in live mode, so these rejections are recorded in the state table. Now those articles have a recorded state for the request's topic.
The third import request is submitted. Because Alan finds an existing state for each of the articles he avoids applying the "NOT" list filter, and a new state is assigned as the current state.
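The three requests above can be replayed with a small model. This is a sketch of the behavior as described in this comment, not Alan's actual code: the function names, the state table modeled as a dictionary, and the PMIDs and journal names are all hypothetical, and only "NOT" list rejections are recorded (a real live import would record other states as well).

```python
REJECTED_BY_NOT_LIST = "Rejected by NOT list"


def filter_applies(pmid, topic, states):
    """Rule as described above: skip the filter if any state already exists."""
    return (pmid, topic) not in states


def run_import(batch, topic, not_list, states, test_mode):
    """Return the PMIDs reported as NOT listed for one import request."""
    rejected = []
    for pmid, journal in batch:
        if filter_applies(pmid, topic, states) and journal in not_list:
            rejected.append(pmid)
            if not test_mode:
                # Only live mode records the rejection in the state table.
                states[(pmid, topic)] = REJECTED_BY_NOT_LIST
    return rejected


# Hypothetical data: one journal on the "NOT" list, one not.
batch = [(1001, "Journal A"), (1002, "Journal B")]
not_list = {"Journal A"}
states = {}

first = run_import(batch, "thyroid", not_list, states, test_mode=True)
second = run_import(batch, "thyroid", not_list, states, test_mode=False)
third = run_import(batch, "thyroid", not_list, states, test_mode=True)
print(first, second, third)  # [1001] [1001] []
```

In this model the third request reports nothing because the second request recorded a state, not because test mode filters differently from live mode.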
As I noted in an earlier comment on this ticket, this is a bug which needs to be corrected. The business rule which Alan's code implements should be to avoid applying the "NOT" list filter for an article/topic combination which already has a state in the database unless the current state for the article/topic combination is Rejected by NOT list.
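The corrected rule might look something like the following. This is a sketch under stated assumptions rather than the actual fix: the function names are hypothetical, the state table is modeled as a dictionary keyed by (PMID, topic), and "Rejected in initial review" stands in for whatever name the database uses for a human rejection.

```python
REJECTED_BY_NOT_LIST = "Rejected by NOT list"


def should_apply_not_list(current_state):
    """Apply the filter when there is no state yet, or when the current
    state is itself a "NOT" list rejection (no human decision to override)."""
    return current_state is None or current_state == REJECTED_BY_NOT_LIST


def is_not_listed(pmid, topic, journal, not_list, states):
    """True if the article should be reported as rejected by the "NOT" list."""
    current = states.get((pmid, topic))
    return should_apply_not_list(current) and journal in not_list


# Hypothetical state table: 1001 was rejected by the filter earlier,
# 1002 was rejected by a person, and 1003 has no state for the topic.
states = {
    (1001, "thyroid"): REJECTED_BY_NOT_LIST,
    (1002, "thyroid"): "Rejected in initial review",
}
not_list = {"Journal A"}
results = {
    pmid: is_not_listed(pmid, "thyroid", "Journal A", not_list, states)
    for pmid in (1001, 1002, 1003)
}
print(results)  # {1001: True, 1002: False, 1003: True}
```

Article 1002 is still left alone, so a human decision is never overridden, while 1001 continues to be reported as NOT listed on every subsequent request.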
However, I am reluctant to proceed with the modification of the import logic until we are all confident that there is not a second bug, which causes the live mode requests to apply the "NOT" list filters which would not have been applied for the same request if the mode had been test instead of live.
Unless I have missed something in my own examination of the code, I believe that as it is currently written, the import software will NEVER apply the "NOT" list filters to an article if it finds an existing state for the article/topic combination, regardless of whether the request is made in test or in live mode, and the software will ALWAYS apply those filters to an article if it finds no existing state for the article/topic combination, again regardless of whether the request is made in test or in live mode.
If you agree with this description of the current behavior of the software, I will go ahead and fix the bug so that the import software applies the "NOT" list filters to an article if the current state is recorded as Rejected by NOT list.
But if you still believe that there is a second bug, which results in test and live mode behaving differently for the same request, then we need to track down the cause of that bug (and for that we'll need to be able to reproduce it).
Does this more detailed explanation help?
So it seems Minaxi and I have been misinterpreting the results listed in the test mode. We have been using it to identify citations excluded by NOT journal lists, but it is now clear that the test mode is only reporting the state changes that would be made if imported, and doing so accurately. In most cases reporting state changes has been equivalent to identifying NOT listed citations, but not in the case where the majority of the citations are already in the EBMS and already have the NOT listed state...as with the case of our thyroid citations.
We do not have any evidence to indicate that the NOT journal filters are not working as they should. Our only issue was with the numbers displayed in the import test mode that we thought may be a bigger problem.
I think we should consider this issue resolved. Unfortunately the test mode has lost some of its utility for our purposes and we will need to figure out a way to address that.
I think we should still fix the bug Bob reported above having to do with the NOT list filter being bypassed upon subsequent import if an article has been previously rejected by the NOT list. As Bob proposes, we should avoid applying the "NOT" list filter for an article/topic combination that already has a state in the database unless the current state for the article/topic combination is Rejected by NOT list.
Cynthia, we should talk some more about how you might be able address the lost utility of the test import in another way. Some initial thoughts are to develop a new report for this purpose (likely for a future release since we're running out of time for Canyonlands) or to import all special search citations into the EBMS. I can ask the Board managers about pros/cons of the latter approach from our perspective. Personally, I prefer to have everything in the EBMS to review but I realize others may feel differently. Thanks again for bringing this up!
Yes, I agree the bug that Bob has identified should still be addressed, but it is not part of the issue that Minaxi and I originally reported. It is a complicated issue that will require more thought. I think the original setup may have been in place to account for journals that get removed from the NOT list after having been on the list for a while. For monthly searches this is less of an issue, as the latest NOT journal list should be reflected, but for retro searches going back 5-10 years, changes to NOT journal lists may be more relevant. What state do you apply to a citation that has been in the database for 2 years as a NOT journal when it gets imported again as a special search under a revised NOT journal list where it is no longer excluded? Does it remain as a NOT journal or should it have a state change? I am not sure how often journals are removed from the NOT journal lists, but I have seen citations that came in as NOT journals that later were fast tracked. What does fast tracking do to the NOT listed state?
And yes, importing all results to the EBMS and not sending text files to BMs would eliminate the need for us to use the test mode to try to identify NOT journal citations to eliminate manually from such files. BMs may be more in favor of this policy if the EBMS had a way to display lists of citations with abstracts rather than just links to abstracts. Currently there is a display option on the search the database screen for Board Member Version but it does not include the abstract just a link.
BMs may be more in favor of this policy if the EBMS had a way to display lists of citations with abstracts rather than just links to abstracts.
You mean something like this?
Yes, but available for search results, as this feature is currently only available from my queue and the BMs' queue. Of course the BMs would need to decide if this could work instead of text files, and they may want a means to download the list as a file to send to their board members.
I think I have a fix in place on DEV and QA for the bug identified above. I ran the thyroid batch attached to the issue three times on QA: once in test mode, the second time in live mode, and finally in test mode again. The same articles were "NOT LISTED" all three times.
Just to make sure everyone understands: this bug fix will have no effect on articles which, because of this bug, had a current status other than Rejected by NOT list assigned before the bug was fixed.
The same adult ALL file with 26 citations (for which on PROD the first test run showed 13 NOT listed, the live run showed 13 NOT listed, and the second test run showed 0) now shows 13 NOT listed on QA for all three runs: test, live, and second test.
I just did a second live import in QA with the adult ALL file. Still seeing 13 NOT listed and in the full record display under import process the second live import is now displaying NOT listed in addition to Duplicate, Not Imported. Currently in PROD, duplicate citations only show as NOT listed on the first import and only show Duplicate, Not Imported on the second.
Verified on QA.
File Name | Posted | User |
---|---|---|
adult_ALL_Sept20.txt | 2020-08-31 12:33:07 | Boggess, Cynthia (NIH/NCI) [C] |
image-2020-08-28-17-44-48-327.png | 2020-08-28 17:44:49 | Kline, Bob (NIH/NCI) [C] |
image-2020-09-01-14-51-54-697.png | 2020-09-01 14:51:54 | Kline, Bob (NIH/NCI) [C] |
image-2020-09-02-12-29-49-183.png | 2020-09-02 12:29:49 | Kline, Bob (NIH/NCI) [C] |
NOTlisted_appears_second_import.docx | 2020-09-02 16:54:24 | Boggess, Cynthia (NIH/NCI) [C] |
thyroid_reviews_102cits_pubmed.txt | 2020-08-28 14:29:07 | Boggess, Cynthia (NIH/NCI) [C] |