EBMS Tickets

Issue Number 570
Summary Ad-hoc report for related citations
Created 2020-08-06 09:18:16
Issue Type Task
Submitted By Juthe, Robin (NIH/NCI) [E]
Assigned To Kline, Bob (NIH/NCI) [C]
Status Closed
Resolved 2020-08-06 15:58:53
Resolution Fixed
Path /home/bkline/backups/jira/oceebms/issue.271925
Description

In order to get a better understanding of the breadth and volume of related citations to inform requirements for OCEEBMS-568, would it be possible to generate an ad-hoc spreadsheet showing related citations for all articles imported into the EBMS since Jan 1, 2020? It would be helpful if the spreadsheet could include the following information:

  • PMID of the citation in the EBMS

  • the journal abbreviation for the citation in the EBMS

  • the date of publication of the citation 

  • the date the citation was imported into the EBMS

  • the PMID of the related citation

  • the type of related citation (if this is something PubMed provides; it would be something like editorial/comment, supplement, or errata)

  • the journal abbreviation for the related citation

  • the date of publication of the related citation

Comment entered 2020-08-06 15:58:53 by Kline, Bob (NIH/NCI) [C]

Added a couple of extra columns, just in case ... 😃

Comment entered 2020-08-07 10:56:10 by Juthe, Robin (NIH/NCI) [E]

Thanks, Bob. I think maybe I wasn't clear that we wanted to see all related citations from the perspective of PubMed, not just those that have been imported into the EBMS. (Just guessing that these are all in the EBMS based on the length of the spreadsheet and the fact that each has a relationship type.) We're trying to get a sense of what would be imported if we were to implement the automatic import of related citations from PubMed.

 

Is it possible to do this? Thank you!

Comment entered 2020-08-07 11:13:04 by Kline, Bob (NIH/NCI) [C]

Ah! Good thing I didn't estimate the base ticket. 😛 I was picturing new fields on the import page where you would be specifying the related articles and having the software create the links as part of the import action. I will investigate.

Comment entered 2020-08-07 11:31:46 by Juthe, Robin (NIH/NCI) [E]

Ah, okay. Maybe this is more than we should do in Drupal 7? Anyway, would be good to get some more info with which to make a decision. Thanks for taking a look.

Comment entered 2020-08-07 13:07:23 by Kline, Bob (NIH/NCI) [C]

I'll ask NLM if it's possible to query for the relationships. If you have any searching techniques which work for this purpose, please pass them on.

Comment entered 2020-08-07 15:43:59 by Juthe, Robin (NIH/NCI) [E]

I don't have any searching tricks, although the related citations usually appear on the same page as the original citation. For example, on this page (https://pubmed.ncbi.nlm.nih.gov/31995683/) there is a section beneath the abstract called "Comment in" that lists several editorials/commentaries related to the main article on the page.

Comment entered 2020-08-10 11:56:49 by Kline, Bob (NIH/NCI) [C]

How's this (it's what I can get without going out to NLM for each one to fetch the XML for the related article)?

Comment entered 2020-08-10 12:14:15 by Kline, Bob (NIH/NCI) [C]

Replaced the report (wasn't sure I had saved my reformatting and sorting before dragging it onto the issue). Be sure to use the one posted today.

Comment entered 2020-08-14 15:07:34 by Kline, Bob (NIH/NCI) [C]

Just a heads-up that CBIIT has repointed ebms-qa to the new CentOS 7 server, so please use that name instead of ebms-qa2 when testing on that tier.

Comment entered 2020-08-21 12:15:52 by Kline, Bob (NIH/NCI) [C]

Do you want the new Board column to reflect all the boards connected to the article, or just the one associated with the topic used when the article was first imported?

Comment entered 2020-08-21 13:10:55 by Juthe, Robin (NIH/NCI) [E]

All of the associated Boards, please. Thanks!

Comment entered 2020-08-21 14:52:45 by Kline, Bob (NIH/NCI) [C]

New report added, using the requirements we came up with in this morning's WebEx conference. I ran a script to assemble two sets of counts: the first for the number of times a given board is linked to the article represented in the first column, and the second for the same count restricted to rows where we don't already have the related article represented in the fifth column. You're right, there are many more for Adult Treatment than I expected. You'll see in the output below that there is one article in the system which has no rows in the ebms_article_state table, which is a little bit baffling.

 

NO BOARDS FOR PMID 30817959: HOW CAN THIS BE???
COUNTS
 2664 Adult Treatment
  666 Cancer Genetics
   49 Integrative, Alternative, and Complementary Therapies
  331 Pediatric Treatment
 1124 Screening and Prevention
  510 Supportive and Palliative Care
COUNTS FOR RELATED ARTICLES NOT IN THE EBMS
 1974 Adult Treatment
  437 Cancer Genetics
   28 Integrative, Alternative, and Complementary Therapies
  213 Pediatric Treatment
  747 Screening and Prevention
  374 Supportive and Palliative Care
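
A minimal sketch of how two sets of counts like the ones above could be assembled. This is not the script that was actually run. The row layout (PMID, list of boards, related PMID), the function names, and the sample values are all assumptions made for illustration.

```python
from collections import Counter


def count_boards(rows, pmids_in_ebms):
    """Tally board links for every report row.

    rows: iterable of (pmid, boards, related_pmid) tuples (hypothetical layout).
    pmids_in_ebms: set of PMIDs already imported into the system.
    Returns (all_counts, missing_counts), where missing_counts only
    includes rows whose related article is not yet in the system.
    """
    all_counts = Counter()
    missing_counts = Counter()
    for pmid, boards, related_pmid in rows:
        if not boards:
            # Flag articles with no board at all (e.g. internal-only articles).
            print(f"NO BOARDS FOR PMID {pmid}")
            continue
        all_counts.update(boards)
        if related_pmid not in pmids_in_ebms:
            missing_counts.update(boards)
    return all_counts, missing_counts


def print_counts(title, counts):
    """Print counts sorted by board name, right-aligning the numbers."""
    print(title)
    for board in sorted(counts):
        print(f"{counts[board]:5d} {board}")


# Made-up sample data, not taken from the real report.
sample_rows = [
    ("100", ["Adult Treatment", "Cancer Genetics"], "200"),
    ("100", ["Adult Treatment", "Cancer Genetics"], "300"),
    ("101", ["Adult Treatment"], "100"),
    ("102", [], "400"),
]
sample_in_ebms = {"100", "101", "102", "300"}

all_counts, missing_counts = count_boards(sample_rows, sample_in_ebms)
print_counts("COUNTS", all_counts)
print_counts("COUNTS FOR RELATED ARTICLES NOT IN THE EBMS", missing_counts)
```

Because an article with several related citations appears on several rows, each of its boards is counted once per row, which matches the "number of times a given board is linked" wording above.
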
Comment entered 2020-08-21 14:53:49 by Juthe, Robin (NIH/NCI) [E]

Maybe it's an internal article? I'll take a look. Thank you!!

Comment entered 2020-08-21 15:27:50 by Juthe, Robin (NIH/NCI) [E]

It is an internal article only.

Comment entered 2020-08-21 15:34:12 by Juthe, Robin (NIH/NCI) [E]

We just reviewed the new spreadsheet briefly and talked through this a bit more and identified a few more questions (below). We're concerned about the potential increase in work (caused by additional citations in our queue) so we're considering ways to limit the number of additional citations.

 

1) Would it be possible to implement this only for certain journals (e.g., core journals)? 

2) Would it be possible to implement this only for certain Boards? (Another possibility we discussed is just to ignore the additional import citation page when importing citations for certain Boards.)

3) A different (but related) enhancement idea.... Would it be feasible to have the system automatically apply related citation links for existing citations in our system, if we have both related citations already in the system? Maybe this would be done in some kind of a scheduled or ad-hoc job? I can see this being a big effort in terms of the amount of data we'd be querying at NLM, especially the first time it is done.

Comment entered 2020-08-21 16:19:47 by Kline, Bob (NIH/NCI) [C]

1) Yes.

2) Yes.

3) Yes. Not for this release. 😉 We can examine the XML we already have if we're concerned about heavy-duty retrievals from NLM. Or we could use the XML I pulled from NLM not that long ago while I was scratching my head to figure out what the Drupal 9 version of the system might look like.
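
A rough sketch of what examining the XML already on hand might look like, assuming the stored documents follow the PubMed article format, where related citations are recorded as `CommentsCorrections` elements with a `RefType` attribute (for example `CommentIn` or `ErratumIn`). The function names, the sample document, and the PMIDs in it are made up for illustration and are not part of the EBMS code.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the PMIDs and journal are invented.
SAMPLE = """<PubmedArticle>
  <MedlineCitation>
    <PMID Version="1">11111111</PMID>
    <CommentsCorrectionsList>
      <CommentsCorrections RefType="CommentIn">
        <RefSource>J Example. 2020 Jan;1(1):1-2</RefSource>
        <PMID Version="1">22222222</PMID>
      </CommentsCorrections>
      <CommentsCorrections RefType="ErratumIn">
        <RefSource>J Example. 2020 Feb;1(2):99</RefSource>
      </CommentsCorrections>
    </CommentsCorrectionsList>
  </MedlineCitation>
</PubmedArticle>"""


def related_citations(xml_text):
    """Return one dict per related citation found in the article XML."""
    root = ET.fromstring(xml_text)
    results = []
    for node in root.iter("CommentsCorrections"):
        results.append({
            "type": node.get("RefType"),
            "source": (node.findtext("RefSource") or "").strip(),
            # Some related citations carry no PMID; report those as None.
            "pmid": (node.findtext("PMID") or "").strip() or None,
        })
    return results


def linkable(related, pmids_in_ebms):
    """Keep only related citations whose PMID is already in the system."""
    return [r for r in related if r["pmid"] in pmids_in_ebms]


related = related_citations(SAMPLE)
for r in linkable(related, {"11111111", "22222222"}):
    print(r["type"], r["pmid"], r["source"])
```

Working from XML that is already stored avoids a round trip to NLM for each article, at the cost of missing relationships that NLM added after the stored copy was fetched.
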

Comment entered 2020-08-26 11:01:51 by Juthe, Robin (NIH/NCI) [E]

Thanks, Bob. This was all really helpful in clarifying our requirements for this issue. I just added a comment in OCEEBMS-568 with a summary of our requirements. I think we'll definitely want to implement something like my question #3 above in a future release, so I'll add a ticket to the backlog for that.

Attachments
File Name Posted User
oceebms-570.xlsx 2020-08-21 14:46:38 Kline, Bob (NIH/NCI) [C]
oceebms-570.xlsx 2020-08-10 12:13:16 Kline, Bob (NIH/NCI) [C]
oceebms-570.xlsx 2020-08-06 15:58:06 Kline, Bob (NIH/NCI) [C]
