EBMS Tickets

Issue Number 568
Summary [Import] Consider having related citations imported and linked programmatically
Created 2020-07-15 13:56:55
Issue Type Improvement
Submitted By Juthe, Robin (NIH/NCI) [E]
Assigned To Kline, Bob (NIH/NCI) [C]
Status Closed
Resolved 2021-06-04 14:39:44
Resolution Fixed
Path /home/bkline/backups/jira/oceebms/issue.266316
Description

Adding as a placeholder for now. This needs more discussion before requirements are determined and a solution is implemented.

Comment entered 2020-07-15 14:07:51 by Kline, Bob (NIH/NCI) [C]

LOE will be created when requirements have been determined.

Comment entered 2020-08-06 09:16:56 by Juthe, Robin (NIH/NCI) [E]

We discussed this issue yesterday and thought it might be helpful to get a better sense of the number and type of citations that would be imported if we go ahead with this enhancement. I will create a separate ticket with specs for a potential ad-hoc report to gather some more information.

Comment entered 2020-08-10 16:53:16 by Kline, Bob (NIH/NCI) [C]

The report has been generated.

Comment entered 2020-08-18 15:28:24 by Kline, Bob (NIH/NCI) [C]

There are a number of questions to think about here.

  1. Do we want to expand the current import batch with the additional related articles we find which weren't in that original batch, obscuring the distinction between the articles explicitly requested and those added programmatically?

  2. Would we instead prefer to create a new batch for the programmatically added articles, so we can preserve that distinction?

  3. Do we want to recurse, examining the added articles for references to related articles which aren't already picked up by looking inside the explicitly requested articles?

  4. Should this recursion be unlimited?

  5. Should the user get to review and confirm (or reject) the added articles? And possibly change some of the import properties (such as the topic or the comments) for the added articles?

  6. Or is the import unconditional, with the list of additionally imported articles (and the assigned topic, etc.) unchangeable?

  7. Are we going to get wildly fancy, and not just look for articles we don't already have, but also articles we do have, but for different topics?

  8. Do we want to examine only articles we are importing for the first time for related articles? Or do we want to repeat this process every time an "import" request is made for an article we already have so that a new topic can be added (creating a difference in outcome depending on the path taken to add a topic)?

  9. Do we want to create a new tag for this new type of import?

Without answering all of these questions, here is one possible approach.

  • the import software examines all of the newly imported articles to find related articles we don't already have

  • the requested articles are imported

  • the user is presented the form and the links to the status of the imported articles

  • the form is pre-populated with the values for the original import, except that the article selection now reflects the related articles found

  • the user reviews the form, makes any desired changes, and submits the request, if appropriate

  • the process repeats, with the possibility (unlikely though it may be) that this second import may find articles which the first pass didn't find

This approach is safest, somewhat easier to implement, and provides the user with maximum flexibility and control.
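The proposed flow above can be sketched roughly as follows. This is a minimal illustration, not the EBMS implementation: the `ImportRequest` class, the `related_lookup` mapping (standing in for whatever the NLM XML reports), and the `known` set (standing in for the articles already in the system) are all hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass
class ImportRequest:
    """Values the import form carries (hypothetical, simplified)."""
    pmids: list
    topic: str
    comment: str = ""


def find_new_related(imported, related_lookup, known):
    """Collect related PMIDs we don't already have, in first-seen order."""
    found = []
    for pmid in imported:
        for rel in related_lookup.get(pmid, []):
            if rel not in known and rel not in found:
                found.append(rel)
    return found


def run_import(request, related_lookup, known):
    """Import the requested articles, then build the follow-up request.

    Returns a request pre-populated with the original values but with
    the article selection replaced by the related articles found, or
    None when nothing new turned up. The user would review and submit
    that follow-up request; nothing is imported unconditionally.
    """
    newly_imported = [p for p in request.pmids if p not in known]
    known.update(newly_imported)
    related = find_new_related(newly_imported, related_lookup, known)
    if not related:
        return None
    return ImportRequest(pmids=related, topic=request.topic,
                         comment=request.comment)
```

Calling `run_import` again with the returned request models the "process repeats" step: the second pass normally returns None, because the related article usually points back only to the article that was just imported.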

Comment entered 2020-08-26 11:00:10 by Juthe, Robin (NIH/NCI) [E]

OK, I think we've come up with a proposed plan:

 

We'd like to implement this as you proposed but only for the following Boards: Adult, Cancer Genetics, and Screening & Prevention. We don't want to implement this for citations that are only associated with IACT, Supportive & Palliative Care, or Pediatric treatment.

 

Additionally, we'd like to implement this only for core journals, using the same list of 13 core journals that we use elsewhere in the system (let me know if you need me to send this list).

 

Also, as we discussed, we'd like the "related citation" link to be added automatically. Please use the link type "OTHER" with the following comment: "linked programmatically upon import" followed by the current date (YYYY-MM-DD). We can always go back and edit the link type if needed.

 

It's fine for this to be recursive since, as we discussed, this is unlikely to turn up additional related citations given the reciprocal relationship. And we'd like for this to be a separate import in the manner you had proposed. A separate tag for these citations is not needed.

 

Please let me know if you have any additional questions. Thank you!!
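As an illustration of the link requirement above, the record for a programmatically added link might be built like this. The dictionary keys and the function name are hypothetical, and the single space between the comment text and the date is an assumption, since the request only says the date follows the comment.

```python
from datetime import date

LINK_TYPE = "OTHER"
COMMENT_PREFIX = "linked programmatically upon import"


def build_related_link(from_pmid, to_pmid, today=None):
    """Return a link record with the requested type and dated comment.

    The record layout is made up for this sketch; only the link type,
    the comment wording, and the YYYY-MM-DD date format come from the
    request.
    """
    today = today or date.today()
    return {
        "from": from_pmid,
        "to": to_pmid,
        "type": LINK_TYPE,
        # date.isoformat() yields YYYY-MM-DD.
        "comment": f"{COMMENT_PREFIX} {today.isoformat()}",
    }
```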

Comment entered 2020-08-26 16:21:37 by Kline, Bob (NIH/NCI) [C]

Can we assume that "for core journals" refers to the journal in which the imported article appears, rather than the journal in which the related article appears (which would require an extra retrieval from NLM)? In most cases, the two articles will be in the same journal, so this is unlikely to have much of an impact on the results.

Comment entered 2020-08-26 17:32:10 by Juthe, Robin (NIH/NCI) [E]

Yes, we can base this on the journal in which the imported article appears. Thanks!
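The agreed rule (the imported article's own journal must be a core journal, and at least one of its boards must have the feature enabled) could be expressed along these lines. The function name, the article dictionary, and the journal names are placeholders; the actual list of 13 core journals is not given in this ticket.

```python
def qualifies_for_related_import(article, enabled_boards, core_journals):
    """Decide whether to look for related citations for this article.

    The check uses the journal of the imported article, not of the
    related article, so no extra retrieval from NLM is needed. An
    article associated only with boards outside `enabled_boards`
    does not qualify.
    """
    in_core_journal = article["journal"] in core_journals
    on_enabled_board = bool(set(article["boards"]) & set(enabled_boards))
    return in_core_journal and on_enabled_board


# Boards named in the request; the journal names are placeholders.
ENABLED_BOARDS = {"Adult", "Cancer Genetics", "Screening & Prevention"}
CORE_JOURNALS = {"Journal A", "Journal B"}
```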

Comment entered 2020-08-26 17:54:00 by Kline, Bob (NIH/NCI) [C]

Story points reflect database modifications to track which boards get this new functionality and to make that setting editable in the administrative interface.

Comment entered 2020-09-02 11:23:47 by Kline, Bob (NIH/NCI) [C]

OK, I think I've got this working. Installed on DEV and QA.

Comment entered 2020-09-21 14:20:07 by Juthe, Robin (NIH/NCI) [E]

We're still testing this on QA, but I had expected it to work for duplicate citations. For example, I imported 32302077 (a NEJM article) and was presented with a screen to import related citations. However, before I imported those related citations, I opened another tab and re-imported the same article. It was a "duplicate", so no actions were taken. I had assumed that related citations would still be identified and imported. Is this possible? Do you see a danger in importing related citations of duplicates? I think this would be helpful, but there may be repercussions I'm not thinking of.

Comment entered 2020-09-21 14:54:14 by Kline, Bob (NIH/NCI) [C]

Ah, I thought you had said you wanted to go with my proposed approach, which was to just examine the newly imported articles. Back to the drawing board. 🙂

Comment entered 2020-09-24 16:58:33 by Juthe, Robin (NIH/NCI) [E]

Victoria noticed another issue with the related citations import today. If the first article you import was given a Fast Track flag with the placement level "On agenda" with a particular meeting date, when the page is refreshed with the related article(s), On agenda remains selected but no meeting is selected and none are available on the picklists from which to select a meeting date.

Comment entered 2020-09-24 20:07:59 by Juthe, Robin (NIH/NCI) [E]

Minaxi also came across a citation today that we think should have had a related citation called up for import. The PMID is 32955176, which is a NEJM article that she imported for a Cancer Genetics topic on QA. It has a commentary linked in PubMed (PMID 32955175). This related citation wasn't presented for import after successfully importing the article. I replicated this on DEV.

Comment entered 2020-09-24 21:38:33 by Kline, Bob (NIH/NCI) [C]

Well, this is tricky. We have to make decisions about how to handle the import based on the XML we get from NLM at the time of import, not what they may add to the document later on. Right now it's true that the XML for this article has the CommentsCorrections block we would need in order to piggyback the related article on the first import.

However, this is what the software got at the time the import was requested:

As you can see, looking at this earlier version of the XML, the CommentsCorrections block isn't there. I totally understand that this makes it difficult to tell if the software has behaved correctly. I guess you would have to pull down the XML from NLM yourself before the import request as well as after to have a better chance of figuring it out.
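For readers unfamiliar with the block in question, here is a small sketch of pulling related PMIDs out of the CommentsCorrections elements of a PubMed record. The sample XML is a hand-trimmed illustration, not the actual NLM record, and the function name is hypothetical. The sketch accepts every RefType; the ticket does not say whether the EBMS filters on it.

```python
import xml.etree.ElementTree as ET

# Hand-written, heavily trimmed sample (not the real record from NLM).
SAMPLE_WITH_BLOCK = """
<PubmedArticle>
  <MedlineCitation>
    <PMID>32955176</PMID>
    <CommentsCorrectionsList>
      <CommentsCorrections RefType="CommentIn">
        <PMID>32955175</PMID>
      </CommentsCorrections>
    </CommentsCorrectionsList>
  </MedlineCitation>
</PubmedArticle>
"""

SAMPLE_WITHOUT_BLOCK = """
<PubmedArticle>
  <MedlineCitation>
    <PMID>32955176</PMID>
  </MedlineCitation>
</PubmedArticle>
"""


def related_pmids(pubmed_xml):
    """Return the PMIDs named in CommentsCorrections blocks, in order."""
    root = ET.fromstring(pubmed_xml)
    pmids = []
    for node in root.iter("CommentsCorrections"):
        # Only the PMID directly inside the block, not the article's own.
        pmid = (node.findtext("PMID") or "").strip()
        if pmid and pmid not in pmids:
            pmids.append(pmid)
    return pmids
```

Running `related_pmids` on the XML saved before and after an import request is one way to check whether the block was present when the software made its decision.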

Comment entered 2020-09-25 06:35:25 by Kline, Bob (NIH/NCI) [C]

This is why I had warned early in our discussions for this enhancement that it might not work as you hoped if you import an article as soon as it gets into NLM's database, but before they get around to adding the relationships to other articles. You had expressed optimism that they generally do everything at once for most related articles, but this evidence seems to confirm that it doesn't always happen that way. 🙁

Comment entered 2020-09-25 06:48:23 by Kline, Bob (NIH/NCI) [C]

The only thing I can think of that might help with this fly in the ointment would be to have the job which refreshes the XML from NLM go through looking for related articles and report them to you (restricted, I assume, to the refresh of articles published in "core" journals linked to boards which are interested in this feature). In another release, I think.

Comment entered 2020-09-25 07:32:56 by Kline, Bob (NIH/NCI) [C]

I have modified the logic so that we apply this enhancement to articles in core journals every time they are imported, not just the first time. The problem Victoria stumbled on with the meetings has been a bug in the system from the start. Fixed, I think. These changes are on DEV. If they look OK to you on that tier, I'll commit the changes and deploy them to QA.

Comment entered 2020-09-25 10:14:12 by Juthe, Robin (NIH/NCI) [E]

That does make it hard to test, since the PubMed record now includes the comment. From talking with Cynthia and Minaxi, the timing of adding those comments is hard to predict, so this may not be too uncommon of an occurrence. It seems like a good idea to have the XML refresh job pick up related articles, restricted to those we're interested in. I'll put a ticket in the backlog to consider that in the next release. I think this will be a learning experience in many ways so we'll be better informed about potential changes by the time of the next release. Thanks for the explanation.

Comment entered 2020-09-25 10:55:12 by Juthe, Robin (NIH/NCI) [E]

I'm testing the import of a duplicate citation on DEV, and thought I'd use the article Minaxi had reported (PMID 32955176) as a good test case since it's a duplicate now but the XML has been updated to now include a related citation. However, it still isn't picking up the related citation, so I want to make sure I understand how this works.

Do we always rely only on the XML from the original import, even in the case of duplicate citations that are re-imported? If that's the case, I can see that applying this related citation rule to re-imported citations will be most beneficial for older citations, rather than those newer articles that may have just recently had a related citation added to the XML in PubMed.

Am I understanding this correctly? Thanks again. This is complicated!

Comment entered 2020-09-25 11:55:45 by Kline, Bob (NIH/NCI) [C]

We do not rely on the XML from the original import, but refresh it when a duplicate is brought in. There was one more modification I needed to make to implement this change in direction, and I've made it now so I'd like you to give it another try (still on DEV). I think that the only thing which would prevent your test from succeeding now would be if the related article has already come into the system.

Comment entered 2020-09-25 15:03:35 by Juthe, Robin (NIH/NCI) [E]

That's good to know, thanks.

 

I just tried reimporting PMID 32955176 on DEV but I'm still not seeing a prompt to import a related article. I checked and the related article (PMID: 32955175) is not in the system.

Comment entered 2020-09-26 08:17:18 by Kline, Bob (NIH/NCI) [C]

Found the typo. It works now, but you'll have to find another related article to test with, because 32955175 is now in the system. 🙂

Comment entered 2020-09-29 09:26:31 by Juthe, Robin (NIH/NCI) [E]

I just tried re-importing a duplicate citation (PMID:32955176) and the related citation was added to the import screen, but the message at the top of the page included a lot of extra information in the green box that I'm not used to seeing. See screenshot below.

Comment entered 2020-09-29 09:42:34 by Kline, Bob (NIH/NCI) [C]

Ah, sorry about that. I had backed out from GitHub the debugging calls I used while I was getting this to work properly, but neglected to deploy the final version to the live code set. Please try again.

Comment entered 2020-09-29 10:42:45 by Juthe, Robin (NIH/NCI) [E]

Thanks, Bob. The message in the green box is cleaned up, but I'm now caught in an endless loop. The system keeps identifying related (duplicate) citations. I imported 32955176 and then the related citation 32955175 but then was prompted to again import 32955176 and then again 32955175. This pattern continues with the same two citations.

Comment entered 2020-09-29 11:00:42 by Kline, Bob (NIH/NCI) [C]

Aargh! We're going to get this right eventually. Please try again (on DEV).
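The ticket does not record how the loop was fixed. One common guard against this kind of ping-pong between two reciprocally linked citations, sketched here purely as an assumption, is to remember which PMIDs have already been imported or offered during the current chain of follow-up imports. The function and parameter names are hypothetical.

```python
def next_related_batch(imported, related_lookup, already_offered):
    """Return related PMIDs not yet imported or offered in this chain.

    `already_offered` is a set carried from one follow-up import to
    the next; it is updated in place so that a reciprocal link (A
    points to B, B points back to A) cannot trigger another prompt.
    """
    already_offered.update(imported)
    batch = []
    for pmid in imported:
        for rel in related_lookup.get(pmid, []):
            if rel not in already_offered:
                already_offered.add(rel)
                batch.append(rel)
    return batch
```

With the two citations from the report above, the first call for 32955176 offers 32955175, and the next call for 32955175 returns an empty batch, which ends the chain.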

Comment entered 2020-09-30 10:49:30 by Juthe, Robin (NIH/NCI) [E]

I think this is working as expected now. Could you please apply these changes to QA and I'll ask Cynthia and Minaxi to do some additional testing there as well? Thank you!

Comment entered 2020-09-30 11:41:21 by Kline, Bob (NIH/NCI) [C]

All of the modifications are checked in, and I have re-run the deployment script on QA, so that tier is ready for the librarians to do their testing. Thanks for your patience while I thrashed through this one. 🙂

Comment entered 2020-10-01 21:24:48 by Juthe, Robin (NIH/NCI) [E]

Verified on QA with individual citations, batches of citations, and re-imported duplicate citations. I think we're ready to proceed!

Comment entered 2020-10-12 16:26:49 by Juthe, Robin (NIH/NCI) [E]

Verified on PROD.

Comment entered 2021-06-02 23:11:48 by Juthe, Robin (NIH/NCI) [E]

Reopening this to discuss the possibility of limiting the programmatically identified related citations to only those that are in the same journal as the original imported article. We've found that sometimes the related articles are found in various lesser journals and we don't necessarily want to see those/share them with our Boards.

Comment entered 2021-06-04 14:39:11 by Juthe, Robin (NIH/NCI) [E]

We decided to open a separate ticket for this purpose. See OCEEBMS-600.

Attachments
File Name Posted User
image-2020-09-24-21-32-53-852.png 2020-09-24 21:32:54 Kline, Bob (NIH/NCI) [C]
image-2020-09-24-21-34-09-921.png 2020-09-24 21:34:10 Kline, Bob (NIH/NCI) [C]
image-2020-09-29-09-25-23-167.png 2020-09-29 09:25:23 Juthe, Robin (NIH/NCI) [E]
