Issue Number | 600 |
---|---|
Summary | [Import] Revise related citation programmatic identification to match original journal |
Created | 2021-06-04 14:36:59 |
Issue Type | Improvement |
Submitted By | Juthe, Robin (NIH/NCI) [E] |
Assigned To | Kline, Bob (NIH/NCI) [C] |
Status | Closed |
Resolved | 2021-08-02 17:16:50 |
Resolution | Fixed |
Path | /home/bkline/backups/jira/oceebms/issue.291712 |
We'd like to discuss the possibility of limiting the programmatically identified related citations to only those that are in the same journal as the original imported article. We've found that sometimes the related articles are found in various lesser journals and we don't necessarily want to see those/share them with our Boards.
Can we have whatever discussion is still needed at this Thursday's weekly status meeting?
I don't think we need further discussion from our end since you've already answered that this is a possibility. 🙂 We'd like to proceed with this limitation. Thanks.
To implement this, I think we will need to fetch each of the related articles from NLM twice: once (to find out which journal the article is in) before putting the PMID on the followup import page, and a second time when the user submits that page.
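A minimal sketch of the first fetch described above, assuming the comparison is made on the NlmUniqueID element of each PubMed record. The ticket does not say which field is actually compared, and `journal_id`, `same_journal_pmids`, `fetch_xml`, and `fake_record` are hypothetical names. The records are fabricated stand-ins for what NLM would return.

```python
import xml.etree.ElementTree as ET


def journal_id(article_xml):
    """Return the NlmUniqueID found in a PubMed article record, or None."""
    root = ET.fromstring(article_xml)
    value = root.findtext(".//MedlineJournalInfo/NlmUniqueID")
    return value.strip() if value else None


def same_journal_pmids(original_xml, related_pmids, fetch_xml):
    """Keep only the related PMIDs whose journal matches the original's.

    fetch_xml is a caller-supplied function (hypothetical here) which takes
    a PMID and returns that article's XML.
    """
    wanted = journal_id(original_xml)
    if wanted is None:
        # Assumption: if the journal is unknown, filter nothing out.
        return list(related_pmids)
    return [p for p in related_pmids if journal_id(fetch_xml(p)) == wanted]


def fake_record(pmid, nlm_id):
    """Build a fabricated, stripped-down record for demonstration only."""
    return (
        "<PubmedArticle><MedlineCitation>"
        f"<PMID>{pmid}</PMID>"
        f"<MedlineJournalInfo><NlmUniqueID>{nlm_id}</NlmUniqueID>"
        "</MedlineJournalInfo>"
        "</MedlineCitation></PubmedArticle>"
    )


# Made-up PMIDs and journal IDs; a real caller would fetch from NLM instead.
RECORDS = {
    "1": fake_record("1", "J-AAA"),
    "2": fake_record("2", "J-AAA"),
    "3": fake_record("3", "J-BBB"),
}
kept = same_journal_pmids(RECORDS["1"], ["2", "3"], RECORDS.__getitem__)
print(kept)
```

Passing the fetch function in as a parameter keeps the filtering logic testable without network access. Returning the list unfiltered when the original's journal cannot be determined is a choice made for this sketch, so that nothing is dropped silently.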
Some thoughts about this ticket:
I may not have to fetch the article XML from NLM an extra time, as there's a RefSource sibling element next to the PMID element in the CommentsCorrections block, which appears to have the short title abbreviation as the first substring in the text content for that element. I'm doing some analysis to see if I can determine how reliable that would be.
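The idea in this comment could be sketched roughly as follows, assuming the abbreviation is whatever precedes the first period in RefSource. The function names are hypothetical, and the sample XML is a hand-written fragment with made-up PMIDs, not real NLM output.

```python
import xml.etree.ElementTree as ET

# Hand-written fragment for illustration; the PMIDs are made up.
SAMPLE = """\
<CommentsCorrectionsList>
  <CommentsCorrections RefType="CommentIn">
    <RefSource>Blood. 2009 Jun 11;113(24):6265</RefSource>
    <PMID Version="1">11111111</PMID>
  </CommentsCorrections>
  <CommentsCorrections RefType="CommentIn">
    <RefSource>J Clin Oncol. 2010 Jan 1;28(1):1-2</RefSource>
    <PMID Version="1">22222222</PMID>
  </CommentsCorrections>
</CommentsCorrectionsList>
"""


def leading_abbreviation(ref_source):
    """Return the text before the first period, or None if there is none."""
    head, sep, _ = ref_source.strip().partition(".")
    head = head.strip()
    return head if sep and head else None


def related_in_same_journal(xml_text, journal_abbrev):
    """Return PMIDs whose RefSource starts with the given abbreviation."""
    root = ET.fromstring(xml_text)
    wanted = journal_abbrev.strip().lower()
    pmids = []
    for block in root.iter("CommentsCorrections"):
        source = block.findtext("RefSource", default="")
        pmid = block.findtext("PMID")
        abbrev = leading_abbreviation(source)
        if pmid and abbrev and abbrev.lower() == wanted:
            pmids.append(pmid.strip())
    return pmids


print(related_in_same_journal(SAMPLE, "Blood"))
```

This avoids a second round trip to NLM, but it is only as reliable as the formatting of the RefSource text.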
While doing this analysis I noticed that the overwhelming majority of the related articles are in the same journal as the original article. That got me wondering about the tradeoff between the risks inherent in implementing this (suppose whatever approach we adopt isn't as foolproof as we think it is, and we inadvertently miss importing some related documents we wanted) and the relatively low percentage of articles which would be filtered out. Of course my sample may not be sufficiently representative of what you see in your more extensive experience.
Looks like I'm going to have to abandon the more efficient approach to checking the journals for the related articles. For well over 99% of the articles, NLM uses the format I describe above, with the brief journal title followed by a period as the first portion of the RefSource element's text content. But then every once in a while you come across a block where someone has put something like this in that element:
, Ladeb S, Torjman L, Ben Othman T, Lakhal A, Ben Romdhane N, Elloumi M, Jeddi R, Aissaouï L, Ben Hassen A, Msadek F, Saad A, Hsaïri M. Blood. 2009 Jun 11;113(24):6265 Abdelkefi A
So this is not a reliable approach, and as feared it could wrongly reject an article that we would have wanted to import.
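To make the failure concrete, here is a small check, assuming the heuristic takes the text before the first period as the journal abbreviation (`first_segment` is a hypothetical name). The malformed string is the one quoted above; the well-formed string is the same citation in the usual layout.

```python
def first_segment(ref_source):
    """Return the text before the first period (the assumed abbreviation)."""
    return ref_source.strip().partition(".")[0].strip()


well_formed = "Blood. 2009 Jun 11;113(24):6265"
malformed = (
    ", Ladeb S, Torjman L, Ben Othman T, Lakhal A, Ben Romdhane N, "
    "Elloumi M, Jeddi R, Aissaouï L, Ben Hassen A, Msadek F, Saad A, "
    "Hsaïri M. Blood. 2009 Jun 11;113(24):6265 Abdelkefi A"
)

# The usual layout yields the journal; the malformed one yields author names,
# so an article in Blood would be rejected by a same-journal filter.
print(first_segment(well_formed) == "Blood")
print(first_segment(malformed) == "Blood")
```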
OK, I think I've got this working on DEV (after I finally figured out that failures I was getting were caused by the fact that NLM will not accept more than 3 requests per second). Testing is trickier than for most tickets, because you have to track down core journal articles which have related articles some of which are in the same journal and some of which are not.
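One way to stay under a limit of 3 requests per second is to space calls evenly, as in the sketch below. The ticket does not say how the problem was actually solved on DEV, and the `Throttle` class is a hypothetical helper; the clock and sleep functions are injectable only so the behavior can be checked without real delays.

```python
import time


class Throttle:
    """Space out calls so no more than max_per_second start in any second."""

    def __init__(self, max_per_second=3, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = None  # earliest time the next call may start

    def wait(self):
        """Block until the next request is allowed, then reserve a slot."""
        now = self.clock()
        if self.next_allowed is not None and now < self.next_allowed:
            self.sleep(self.next_allowed - now)
            now = max(self.clock(), self.next_allowed)
        self.next_allowed = now + self.min_interval


# Demonstration with a fake clock, so nothing actually sleeps.
fake_now = [0.0]
slept = []


def fake_sleep(seconds):
    slept.append(seconds)
    fake_now[0] += seconds


throttle = Throttle(max_per_second=2, clock=lambda: fake_now[0], sleep=fake_sleep)
for _ in range(3):
    throttle.wait()  # a real caller would issue its NLM request after this
print(slept)
```

In real use, `Throttle()` with the defaults would be created once and `wait()` called immediately before each request to NLM.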
I was able to identify two non-core related citations commenting on core citations from previous review cycles in prod. I imported the corresponding text files in dev. After import, neither was included in the list of PMIDs for related citations. So it looks like this is working.
Verified in dev.
Tested on QA; see 33200890 and 29554195, both of which are core journal articles with comments (related citations) from the same core journal as well as from other journals. Only same-journal related citations were imported. This seems to be working correctly.
Thanks for your comment, Cynthia! Verified on QA.
"We have tested and seems to be working fine but we will need to watch it for a few more review cycles before we know for sure. I think it can be closed for now".
Agree, we can't really test this on PROD but we can reopen the issue if needed. Closing issue.