EBMS Tickets

Issue Number 600
Summary [Import] Revise related citation programmatic identification to match original journal
Created 2021-06-04 14:36:59
Issue Type Improvement
Submitted By Juthe, Robin (NIH/NCI) [E]
Assigned To Kline, Bob (NIH/NCI) [C]
Status Closed
Resolved 2021-08-02 17:16:50
Resolution Fixed
Path /home/bkline/backups/jira/oceebms/issue.291712
Description

We'd like to discuss the possibility of limiting the programmatically identified related citations to only those that are in the same journal as the original imported article. We've found that sometimes the related articles are found in various lesser journals and we don't necessarily want to see those/share them with our Boards.

Comment entered 2021-07-13 14:28:36 by Kline, Bob (NIH/NCI) [C]

Can we have whatever discussion is still needed at this Thursday's weekly status meeting?

Comment entered 2021-07-23 10:25:19 by Juthe, Robin (NIH/NCI) [E]

I don't think we need further discussion from our end since you've already answered that this is a possibility. 🙂 We'd like to proceed with this limitation. Thanks.

Comment entered 2021-07-29 13:08:47 by Kline, Bob (NIH/NCI) [C]

To implement this, I think we will need to fetch each of the related articles from NLM twice: once (to find out which journal the article is in) before putting the PMID on the followup import page, and a second time when the user submits that page.
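
A minimal sketch of the journal check described above, assuming the related articles have already been fetched from NLM as a PubmedArticleSet document and that journals are compared by the NlmUniqueID element under MedlineJournalInfo. The function names and the sample values are made up for illustration; the ticket does not say how EBMS actually performs the comparison.

```python
import xml.etree.ElementTree as ET


def journal_ids(pubmed_xml):
    """Map each PMID in a PubmedArticleSet document to its NlmUniqueID."""
    root = ET.fromstring(pubmed_xml)
    ids = {}
    for article in root.iter("PubmedArticle"):
        citation = article.find("MedlineCitation")
        if citation is None:
            continue
        # Direct child only, so PMIDs nested in other blocks are ignored.
        pmid = citation.findtext("PMID")
        journal = citation.findtext("MedlineJournalInfo/NlmUniqueID")
        if pmid and journal:
            ids[pmid.strip()] = journal.strip()
    return ids


def same_journal_pmids(original_journal_id, related_xml):
    """Keep only the related PMIDs published in the original article's journal."""
    found = journal_ids(related_xml)
    return [pmid for pmid, jid in found.items() if jid == original_journal_id]


# Made-up, heavily trimmed sample; real records carry many more elements.
SAMPLE = """<PubmedArticleSet>
  <PubmedArticle><MedlineCitation><PMID>111</PMID>
    <MedlineJournalInfo><NlmUniqueID>JOURNAL-A</NlmUniqueID></MedlineJournalInfo>
  </MedlineCitation></PubmedArticle>
  <PubmedArticle><MedlineCitation><PMID>222</PMID>
    <MedlineJournalInfo><NlmUniqueID>JOURNAL-B</NlmUniqueID></MedlineJournalInfo>
  </MedlineCitation></PubmedArticle>
</PubmedArticleSet>"""

print(same_journal_pmids("JOURNAL-A", SAMPLE))  # prints ['111']
```

Only the PMIDs which survive this filter would be placed on the followup import page.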

Comment entered 2021-07-30 16:33:12 by Kline, Bob (NIH/NCI) [C]

Some thoughts about this ticket:

  • I may not have to fetch the article XML from NLM an extra time, as there's a RefSource sibling element next to the PMID element in the CommentsCorrections block, which appears to have the short title abbreviation as the first substring in the text content for that element. I'm doing some analysis to see if I can determine how reliable that would be.

  • While doing this analysis I noticed that in the overwhelming majority of cases the related article is in the same journal as the original article. That got me wondering about the tradeoff between the risks inherent in implementing this (suppose whatever approach we adopt isn't as foolproof as we think it is, and we inadvertently miss importing some related documents we wanted) and the relatively low percentage of articles which would be filtered out. Of course my sample may not be sufficiently representative of what you see in your more extensive experience.
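
The RefSource idea in the first bullet could be sketched roughly as follows. This assumes the title abbreviation is everything before the first period in RefSource; the function name and the sample record (PMIDs and the second citation string) are invented for illustration.

```python
import xml.etree.ElementTree as ET


def related_from_comments(citation_xml):
    """Return (pmid, leading RefSource text) for each CommentsCorrections block."""
    root = ET.fromstring(citation_xml)
    pairs = []
    for block in root.iter("CommentsCorrections"):
        pmid = block.findtext("PMID")
        ref_source = block.findtext("RefSource") or ""
        # Assumption: the journal abbreviation ends at the first period.
        abbrev = ref_source.split(".", 1)[0].strip()
        if pmid:
            pairs.append((pmid.strip(), abbrev))
    return pairs


# Invented sample record, trimmed to the elements this sketch reads.
SAMPLE = """<MedlineCitation>
  <PMID>100</PMID>
  <CommentsCorrectionsList>
    <CommentsCorrections RefType="CommentIn">
      <RefSource>Blood. 2009 Jun 11;113(24):6265</RefSource>
      <PMID>200</PMID>
    </CommentsCorrections>
    <CommentsCorrections RefType="CommentIn">
      <RefSource>N Engl J Med. 2010 Jan 1;1(1):1</RefSource>
      <PMID>300</PMID>
    </CommentsCorrections>
  </CommentsCorrectionsList>
</MedlineCitation>"""

for pmid, abbrev in related_from_comments(SAMPLE):
    print(pmid, abbrev)
```

The appeal of this approach is that it needs no additional request to NLM, since the original article's record already contains the RefSource text.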

Comment entered 2021-07-30 17:26:38 by Kline, Bob (NIH/NCI) [C]

Looks like I'm going to have to abandon the more efficient approach to checking the journals for the related articles. For well over 99% of the articles, NLM uses the format I described above, with the brief journal title followed by a period as the first portion of the RefSource element's text content. But every once in a while you come across a block where someone has put something like this in that element:

Abdelkefi A, Ladeb S, Torjman L, Ben Othman T, Lakhal A, Ben Romdhane N, Elloumi M, Jeddi R, Aissaouï L, Ben Hassen A, Msadek F, Saad A, Hsaïri M. Blood. 2009 Jun 11;113(24):6265

So this is not a reliable approach, and as feared it could wrongly reject an article that we would have wanted to import.
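
To make the failure concrete, here is a small sketch that applies the "text before the first period" rule to the RefSource value quoted above. The helper name and the "usual format" string are constructed for illustration; only the long string comes from the ticket.

```python
def leading_title(ref_source):
    """Return the text before the first period, taken to be the journal title."""
    return ref_source.split(".", 1)[0].strip()


# Constructed example of the usual format: brief journal title first.
usual = "Blood. 2009 Jun 11;113(24):6265"

# The irregular value quoted in the ticket: an author list comes first.
odd = (
    "Abdelkefi A, Ladeb S, Torjman L, Ben Othman T, Lakhal A, "
    "Ben Romdhane N, Elloumi M, Jeddi R, Aissaouï L, Ben Hassen A, "
    "Msadek F, Saad A, Hsaïri M. Blood. 2009 Jun 11;113(24):6265"
)

print(leading_title(usual) == "Blood")  # prints True
print(leading_title(odd) == "Blood")    # prints False
```

With the irregular value, the rule yields the author list rather than the journal title, so a comparison against "Blood" fails even though the related article is in that journal.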

Comment entered 2021-08-02 17:16:50 by Kline, Bob (NIH/NCI) [C]

OK, I think I've got this working on DEV (after I finally figured out that the failures I was getting were caused by the fact that NLM will not accept more than 3 requests per second). Testing is trickier than for most tickets, because you have to track down core journal articles which have related articles, some of which are in the same journal and some of which are not.
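
One way to stay under a limit of 3 requests per second is to space the requests evenly. The sketch below is an assumption about how that could be done, not a description of the fix actually deployed; the class name is hypothetical, and the clock and sleep functions are injectable only so the behavior can be checked without real delays.

```python
import time


class Throttle:
    """Space calls so that no more than max_per_second happen per second."""

    def __init__(self, max_per_second=3, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self.clock = clock
        self.sleep = sleep
        self.last = None  # time of the previous call, if any

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        now = self.clock()
        if self.last is not None:
            remaining = self.min_interval - (now - self.last)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last = now


# Usage: call wait() immediately before each request to NLM.
throttle = Throttle(max_per_second=3)
for pmid in ["111", "222", "333"]:  # made-up PMIDs
    throttle.wait()
    # ... send the request for this PMID here ...
```

Even spacing is slightly more conservative than the limit requires, which leaves some margin for clock jitter.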

Comment entered 2021-08-27 16:19:51 by Boggess, Cynthia (NIH/NCI) [C]

I was able to identify two non-core related citations commenting on core citations from previous review cycles in prod. I imported the corresponding text files in dev. After import, neither was included in the list of PMIDs for related citations. So it looks like this is working.

Verified in dev.

Comment entered 2021-09-14 14:36:33 by Boggess, Cynthia (NIH/NCI) [C]

Tested on QA; see 33200890 and 29554195, both of which are core journal articles with comments (related citations) from the same core journal as well as from other journals. Only same-journal related citations were imported. This seems to be working correctly.

Comment entered 2021-09-21 21:52:24 by Juthe, Robin (NIH/NCI) [E]

Thanks for your comment, Cynthia! Verified on QA.

Comment entered 2021-10-18 14:00:09 by Osei-Poku, William (NIH/NCI) [C]

"We have tested it and it seems to be working fine, but we will need to watch it for a few more review cycles before we know for sure. I think it can be closed for now."

Comment entered 2021-10-21 13:19:58 by Shields, Victoria (NIH/NCI) [E]

Agree, we can't really test this on PROD but we can reopen the issue if needed. Closing issue.
