Issue Number | 568 |
---|---|
Summary | [Import] Consider having related citations imported and linked programmatically |
Created | 2020-07-15 13:56:55 |
Issue Type | Improvement |
Submitted By | Juthe, Robin (NIH/NCI) [E] |
Assigned To | Kline, Bob (NIH/NCI) [C] |
Status | Closed |
Resolved | 2021-06-04 14:39:44 |
Resolution | Fixed |
Path | /home/bkline/backups/jira/oceebms/issue.266316 |
Adding as a placeholder for now. This needs more discussion before requirements are determined and a solution is implemented.
LOE will be created when requirements have been determined.
We discussed this issue yesterday and thought it might be helpful to get a better sense of the number and type of citations that would be imported if we go ahead with this enhancement. I will create a separate ticket with specs for a potential ad hoc report to gather more information.
The report has been generated.
There are a number of questions to think about here.
Do we want to expand the current import batch with the additional related articles we find which weren't in that original batch, obscuring the distinction between the articles explicitly requested and those added programmatically?
Would we instead prefer to create a new batch for the programmatically added articles, so we can preserve that distinction?
Do we want to recurse, examining the added articles for references to related articles which aren't already picked up by looking inside the explicitly requested articles?
Should this recursion be unlimited?
Should the user get to review and confirm (or reject) the added articles? And possibly change some of the import properties (such as the topic or the comments) for the added articles?
Or is the import unconditional, with the list of additionally imported articles (and the assigned topic, etc.) unchangeable?
Are we going to get wildly fancy, and look not just for articles we don't already have, but also for articles we do have, but for different topics?
Do we want to examine only the articles we are importing for the first time for related articles? Or do we want to repeat this process every time an "import" request is made for an article we already have, so that a new topic can be added (creating a difference in outcome depending on the path taken to add a topic)?
Do we want to create a new tag for this new type of import?
Without answering all of these questions, here is one possible approach.
1. The import software examines all of the newly imported articles to find related articles we don't already have.
2. The requested articles are imported.
3. The user is presented with the form and the links to the status of the imported articles.
4. The form is pre-populated with the values from the original import, except that the article selection now reflects the related articles found.
5. The user reviews the form, makes any desired changes, and submits the request, if appropriate.
6. The process repeats, with the possibility (unlikely though it may be) that this second import finds articles which the first pass didn't find.
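The "find articles we don't already have" step above boils down to a set difference. Here is a minimal sketch of one pass, assuming a hypothetical `related_map` (PMID to related PMIDs, standing in for what the NLM XML lists) and a hypothetical `in_system` set of PMIDs already stored; neither name comes from the actual system.

```python
def find_related_candidates(imported, related_map, in_system):
    """Return sorted PMIDs related to the imported articles that we lack.

    imported: PMIDs just imported by the user's request.
    related_map: hypothetical mapping of PMID -> related PMIDs, standing in
        for what the NLM XML for each imported article lists.
    in_system: PMIDs already stored in the system.
    """
    candidates = set()
    for pmid in imported:
        for related in related_map.get(pmid, ()):
            # Skip anything we already have or just imported in this batch.
            if related not in in_system and related not in imported:
                candidates.add(related)
    return sorted(candidates)


# The follow-up form would be pre-populated with these PMIDs.
print(find_related_candidates({"100"}, {"100": ["101", "102"]}, {"102"}))
# prints ['101']
```

The result is what would pre-populate the article selection on the re-presented form; the user still decides whether to submit it.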
This approach is safest, somewhat easier to implement, and provides the user with maximum flexibility and control.
OK, I think we've come up with a proposed plan:
We'd like to implement this as you proposed but only for the following Boards: Adult, Cancer Genetics, and Screening & Prevention. We don't want to implement this for citations that are only associated with IACT, Supportive & Palliative Care, or Pediatric treatment.
Additionally, we'd like to implement this only for core journals, using the same list of 13 core journals that we use elsewhere in the system (let me know if you need me to send this list).
Also, as we discussed, we'd like the "related citation" link to be added automatically. Please use the link type "OTHER" with the following comment: "linked programmatically upon import" followed by the current date (YYYY-MM-DD). We can always go back and edit the link type if needed.
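The requested link comment can be built from the fixed prefix plus an ISO date. A minimal sketch, assuming a single space separates the prefix from the date and using a hypothetical dict to represent the link row (the real link table is not described in this ticket):

```python
from datetime import date

LINK_TYPE = "OTHER"  # link type requested in the ticket
COMMENT_PREFIX = "linked programmatically upon import"


def make_link_comment(when=None):
    """Build the comment for a programmatically created related-citation link."""
    when = when or date.today()
    # date.isoformat() yields YYYY-MM-DD, the format the ticket asks for.
    return f"{COMMENT_PREFIX} {when.isoformat()}"


def make_link(from_pmid, to_pmid, when=None):
    """Return a hypothetical dict describing the link row to create."""
    return {
        "from": from_pmid,
        "to": to_pmid,
        "type": LINK_TYPE,
        "comment": make_link_comment(when),
    }


print(make_link_comment(date(2020, 9, 24)))
# prints linked programmatically upon import 2020-09-24
```

Keeping the type and prefix as constants makes it easy to find and re-edit these links later, as the ticket anticipates.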
It's fine for this to be recursive since, as we discussed, this is unlikely to turn up additional related citations given the reciprocal relationship. And we'd like for this to be a separate import in the manner you had proposed. A separate tag for these citations is not needed.
Please let me know if you have any additional questions. Thank you!!
Can we assume that "for core journals" refers to the journal in which the imported article appears, rather than the journal in which the related article appears (which would require an extra retrieval from NLM)? In most cases, the two articles will be in the same journal, so unlikely to have much of an impact on the results.
Yes, we can base this on the journal in which the imported article appears. Thanks!
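Taken together, the board and core-journal rules amount to a two-part eligibility check keyed on the imported article's own journal. A minimal sketch; the function and parameter names are hypothetical, the enabled boards would really come from the database, and the 13 core journals are passed in because the list is not reproduced here:

```python
# Hypothetical stand-in: in the system the enabled boards live in the
# database and are editable in the administrative interface.
ENABLED_BOARDS = frozenset({"Adult", "Cancer Genetics", "Screening & Prevention"})


def should_check_related(article_journal, topic_boards, core_journals,
                         enabled_boards=ENABLED_BOARDS):
    """Decide whether an import should look for related articles.

    article_journal: journal of the imported article itself, not of the
        related article (so no extra NLM retrieval is needed).
    topic_boards: boards of the topics assigned in this import request.
    core_journals: the existing core journal list (not reproduced here).
    """
    if article_journal not in core_journals:
        return False
    # Citations associated only with boards outside the enabled set
    # (e.g., IACT, Supportive & Palliative Care, Pediatric) are skipped.
    return any(board in enabled_boards for board in topic_boards)


core = {"N Engl J Med"}  # sample value only, not the real core list
print(should_check_related("N Engl J Med", ["Cancer Genetics"], core))  # True
print(should_check_related("N Engl J Med", ["IACT"], core))  # False
```

Using `any()` reflects the "only associated with" wording: one enabled board among the import's topics is enough to trigger the check.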
The story points reflect database modifications to track which boards get this new functionality and to make that setting editable in the administrative interface.
OK, I think I've got this working. Installed on DEV and QA.
We're still testing this on QA, but I had expected it to work for duplicate citations. For example, I imported 32302077 (a NEJM article) and was presented with a screen to import related citations. However, before I imported those related citations, I opened another tab and re-imported the same article. It was a "duplicate", so no actions were taken. I had assumed that related citations would still be identified and imported. Is this possible? Do you see a danger in importing related citations of duplicates? I think this would be helpful, but there may be repercussions I'm not thinking of.
Ah, I thought you had said you wanted to go with my proposed approach, which was to just examine the new imported articles. Back to the drawing board. 🙂
Victoria noticed another issue with the related citations import today. If the first article you import was given a Fast Track flag with the placement level "On agenda" with a particular meeting date, when the page is refreshed with the related article(s), On agenda remains selected but no meeting is selected and none are available on the picklists from which to select a meeting date.
Minaxi also came across a citation today that we think should have had a related citation called up for import. The PMID is 32955176, which is a NEJM article that she imported for a Cancer Genetics topic on QA. It has a commentary linked in PubMed (PMID 32955175). This related citation wasn't presented for import after successfully importing the article. I replicated this on DEV.
Well, this is tricky. We have to make decisions about how to handle the import based on the XML we get from NLM at the time of import, not what they may add to the document later on. Right now it's true that the XML for this article has the CommentsCorrections block we would need in order to piggyback the related article on the first import.
However, this is what the software got at the time the import was requested:
As you can see, looking at this earlier version of the XML, the CommentsCorrections block isn't there. I totally understand that this makes it difficult to tell if the software has behaved correctly. I guess you would have to pull down the XML from NLM yourself before the import request as well as after to have a better chance of figuring it out.
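To make the before/after difference concrete, here is a minimal sketch of pulling related PMIDs out of CommentsCorrections elements with Python's standard `xml.etree.ElementTree`. The two records are illustrative, trimmed-down stand-ins (the `CommentIn` RefType is chosen for the example), not the actual NLM XML for this article:

```python
import xml.etree.ElementTree as ET


def related_pmids(pubmed_xml):
    """Return (RefType, PMID) pairs from the CommentsCorrections elements."""
    root = ET.fromstring(pubmed_xml)
    pairs = []
    for cc in root.iter("CommentsCorrections"):
        pmid = cc.findtext("PMID")
        if pmid:  # some entries carry only a RefSource and no PMID
            pairs.append((cc.get("RefType"), pmid.strip()))
    return pairs


# Illustrative records only; not the actual NLM XML.
BEFORE = """<PubmedArticle><MedlineCitation>
  <PMID Version="1">32955176</PMID>
</MedlineCitation></PubmedArticle>"""

AFTER = """<PubmedArticle><MedlineCitation>
  <PMID Version="1">32955176</PMID>
  <CommentsCorrectionsList>
    <CommentsCorrections RefType="CommentIn">
      <RefSource>(citation omitted)</RefSource>
      <PMID Version="1">32955175</PMID>
    </CommentsCorrections>
  </CommentsCorrectionsList>
</MedlineCitation></PubmedArticle>"""

print(related_pmids(BEFORE))  # []
print(related_pmids(AFTER))   # [('CommentIn', '32955175')]
```

Whatever the software does with the result, it can only act on the block if NLM had already added it when the XML was retrieved.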
This is why I had warned early in our discussions for this enhancement that it might not work as you hoped if you import an article as soon as it gets into NLM's database, but before they get around to adding the relationships to other articles. You had expressed optimism that they generally do everything at once for most related articles, but this evidence seems to confirm that it doesn't always happen that way. 🙁
The only thing I can think of that might help with this fly in the ointment would be to have the job which refreshes the XML from NLM look for related articles and report them to you (restricted, I assume, to the refresh of articles published in "core" journals linked to boards which are interested in this feature). In another release, I think.
I have modified the logic so that we apply this enhancement to articles in core journals every time they are imported, not just the first time. The problem Victoria stumbled on with the meetings has been a bug in the system from the start. Fixed, I think. These changes are on DEV. If they look OK to you on that tier, I'll commit the changes and deploy them to QA.
That does make it hard to test, since the PubMed record now includes the comment. From talking with Cynthia and Minaxi, the timing of adding those comments is hard to predict, so this may not be an uncommon occurrence. It seems like a good idea to have the XML refresh job pick up related articles, restricted to those we're interested in. I'll put a ticket in the backlog to consider that in the next release. I think this will be a learning experience in many ways, so we'll be better informed about potential changes by the time of the next release. Thanks for the explanation.
I'm testing the import of a duplicate citation on DEV, and thought I'd use the article Minaxi had reported (PMID 32955176) as a good test case, since it's a duplicate now but the XML has been updated to include a related citation. However, it still isn't picking up the related citation, so I want to make sure I understand how this works.
Do we always rely only on the XML from the original import, even in the case of duplicate citations that are re-imported? If that's the case, I can see that applying this related-citation rule to re-imported citations will be most beneficial for older citations, rather than for newer articles that may only recently have had a related citation added to their XML in PubMed.
Am I understanding this correctly? Thanks again. This is complicated!
We do not rely on the XML from the original import, but refresh it when a duplicate is brought in. There was one more modification I needed to make to implement this change in direction, and I've made it now, so I'd like you to give it another try (still on DEV). I think that the only thing which would prevent your test from succeeding now would be if the related article had already come into the system.
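The refresh-on-duplicate behavior can be pictured as always fetching a current copy before scanning for related articles. A minimal sketch, assuming (this ticket doesn't say) that the XML comes from NCBI's E-utilities `efetch` endpoint; `stored_xml` and `fetch` are hypothetical stand-ins for the stored documents and an HTTP download:

```python
from urllib.parse import urlencode

# NCBI E-utilities efetch endpoint; using it here is an assumption, since
# the ticket doesn't say how the system retrieves XML from NLM.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(pmids):
    """Build a URL that returns current PubMed XML for the given PMIDs."""
    params = {"db": "pubmed", "id": ",".join(pmids), "retmode": "xml"}
    return f"{EFETCH}?{urlencode(params)}"


def refresh_xml(pmid, stored_xml, fetch):
    """Fetch current XML for pmid, replacing any copy from an earlier import.

    stored_xml: hypothetical dict of PMID -> XML saved by earlier imports.
    fetch: hypothetical callable that downloads a URL and returns its text.
    For a duplicate, relationships NLM added after the first import are
    then visible when we look for related articles.
    """
    stored_xml[pmid] = fetch(efetch_url([pmid]))
    return stored_xml[pmid]


print(efetch_url(["32955176"]))
# prints https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=32955176&retmode=xml
```

The design point is simply that the stored XML from the first import is never trusted as the source of relationships for a re-import.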
That's good to know, thanks.
I just tried reimporting PMID 32955176 on DEV but I'm still not seeing a prompt to import a related article. I checked and the related article (PMID: 32955175) is not in the system.
Found the typo. It works now, but you'll have to find another related article to test with, because 32955175 is now in the system. 🙂
I just tried re-importing a duplicate citation (PMID:32955176) and the related citation was added to the import screen, but the message at the top of the page included a lot of extra information in the green box that I'm not used to seeing. See screenshot below.
Ah, sorry about that. I had removed the debugging calls I used while getting this to work properly from the version in GitHub, but neglected to deploy that final version to the live code set. Please try again.
Thanks, Bob. The message in the green box is cleaned up, but I'm now caught in an endless loop. The system keeps identifying related (duplicate) citations. I imported 32955176 and then the related citation 32955175, but then was prompted to import 32955176 again, and then 32955175 again. This pattern continues with the same two citations.
Aargh! We're going to get this right eventually. Please try again (on DEV).
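The ping-pong described above is the classic symptom of a recursive expansion that doesn't remember what it has already handled, which matters here because related articles usually point back at each other. The ticket doesn't say how the fix was made; this is only a minimal sketch of one common approach, with a hypothetical `related_map` and `in_system` set, that repeats passes until nothing new turns up:

```python
def expand_related(start, related_map, in_system):
    """Repeat related-article passes until no new PMIDs turn up.

    Returns the PMIDs proposed at each pass. `seen` holds everything
    already in the system, imported, or proposed, so two articles that
    point at each other (A -> B, B -> A) cannot be offered again and again.
    """
    seen = set(in_system) | set(start)
    batch = set(start)
    passes = []
    while batch:
        found = set()
        for pmid in batch:
            for related in related_map.get(pmid, ()):
                if related not in seen:
                    found.add(related)
        seen |= found
        if found:
            passes.append(sorted(found))
        batch = found
    return passes


reciprocal = {"32955176": ["32955175"], "32955175": ["32955176"]}
print(expand_related(["32955176"], reciprocal, set()))
# prints [['32955175']]
```

Because `seen` only grows and each pass proposes only unseen PMIDs, the loop must terminate once the related articles run out.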
I think this is working as expected now. Could you please apply these changes to QA and I'll ask Cynthia and Minaxi to do some additional testing there as well? Thank you!
All of the modifications are checked in, and I have re-run the deployment script on QA, so that tier is ready for the librarians to do their testing. Thanks for your patience while I thrashed through this one. 🙂
Verified on QA with individual citations, batches of citations, and re-imported duplicate citations. I think we're ready to proceed!
Verified on PROD.
Reopening this to discuss the possibility of limiting the programmatically identified related citations to only those that are in the same journal as the original imported article. We've found that sometimes the related articles are found in various lesser journals and we don't necessarily want to see those/share them with our Boards.
We decided to open a separate ticket for this purpose. See OCEEBMS-600.
File Name | Posted | User |
---|---|---|
image-2020-09-24-21-32-53-852.png | 2020-09-24 21:32:54 | Kline, Bob (NIH/NCI) [C] |
image-2020-09-24-21-34-09-921.png | 2020-09-24 21:34:10 | Kline, Bob (NIH/NCI) [C] |
image-2020-09-29-09-25-23-167.png | 2020-09-29 09:25:23 | Juthe, Robin (NIH/NCI) [E] |