EBMS Tickets

Issue Number 717
Summary Duplicate PMIDs
Created 2023-01-30 11:30:59
Issue Type Improvement
Submitted By Kline, Bob (NIH/NCI) [C]
Assigned To Kline, Bob (NIH/NCI) [C]
Status Closed
Resolved 2023-02-04 14:46:10
Resolution Fixed
Path /home/bkline/backups/jira/oceebms/issue.337514
Description

There are six pairs of articles in the production EBMS sharing PMIDs. Can we eliminate the duplicates as part of the Everglades rewrite? I have listed the pairs below with the relevant history for each record, as well as proposed resolutions for each pair, to be implemented as part of the migration to EBMS4. This will allow us to prevent duplicates going forward. Of course, we will need to monitor the final days of EBMS3 to detect any additional duplicates which are created.

PMID 16442793

  • EBMS 91933 - imported 2006-06-05; final board decision for Genetics of Skin Cancer 2009-09-25; assigned to a Late Effects of Treatment for Childhood Cancer packet for review 2006-06-22, but no reviews posted; passed abstract review for Unusual Cancers of Childhood 2006-07-07 but got no further

  • EBMS 182374 - imported 2009-09-25; rejected in initial librarian review for Genetics of Skin Cancer 2009-09-25 (drop this conflicting record?)

PMID 24365049

  • EBMS 392403 - imported 2015-06-03; rejected after abstract review for Non-Small Cell Lung Cancer 2015-06-26

  • EBMS 392404 - imported 2015-06-03; rejected after abstract review for Non-Small Cell Lung Cancer 2015-06-26 (drop?)

PMID 31596791

  • EBMS 730644 - imported 2021-03-28; rejected in initial librarian review for Childhood Extracranial Germ Cell Tumors and Cancer Screening Overview 2021-04-09

  • EBMS 730645 - imported 2021-03-28; rejected after abstract review for General Pediatric Treatment 2021-04-27; rejected in initial librarian review for Cancer Screening Overview 2021-04-21 (merge with 730644?)

PMID 31749144

  • EBMS 730646 - imported 2021-03-28; rejected in initial librarian review for Cancer Screening Overview 2021-04-21; rejected after abstract review for Cancer Prevention Overview 2021-08-16

  • EBMS 730647 - imported 2021-03-28; rejected in initial librarian review for Cancer Screening Overview 2021-04-21; rejected after abstract review for Cancer Prevention Overview 2021-08-16 (drop?)

PMID 31785152

  • EBMS 730648 - imported 2021-03-28; rejected in initial librarian review for Cancer Screening Overview 2021-04-16

  • EBMS 730649 - imported 2021-03-28; rejected in initial librarian review for Cancer Screening Overview 2021-04-16 (drop?)

PMID 36318132

  • EBMS 878549 - imported 2022-11-07; published 2022-11-07; linked to 878548

  • EBMS 878550 - imported 2022-11-07; published 2022-11-07; linked to 878548 (drop?)
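
Pairs like those listed above could be found with a grouping query. The following is only a minimal sketch: the `article` table, its `article_id` and `source_id` columns, and the use of SQLite are assumptions made for illustration, not the actual EBMS schema or database engine.

```python
import sqlite3


def find_duplicate_pmids(conn):
    """Return {pmid: [article ids]} for every PMID stored on more than one row.

    Assumes a hypothetical `article` table with `article_id` and `source_id`
    (the PMID) columns; the real EBMS schema may differ.
    """
    rows = conn.execute(
        """
        SELECT source_id, article_id
          FROM article
         WHERE source_id IN (SELECT source_id
                               FROM article
                              GROUP BY source_id
                             HAVING COUNT(*) > 1)
         ORDER BY source_id, article_id
        """
    ).fetchall()
    duplicates = {}
    for pmid, article_id in rows:
        duplicates.setdefault(pmid, []).append(article_id)
    return duplicates


def demo_duplicates():
    """Load two of the pairs from this ticket plus one made-up unique row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE article ("
        "article_id INTEGER PRIMARY KEY, source_id TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO article (article_id, source_id) VALUES (?, ?)",
        [
            (91933, "16442793"),
            (182374, "16442793"),
            (392403, "24365049"),
            (392404, "24365049"),
            (1, "00000001"),  # made-up row with no duplicate
        ],
    )
    return find_duplicate_pmids(conn)
```

Running the same query periodically during the final days of EBMS3 would be one way to catch any newly created duplicates before the migration.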

Comment entered 2023-01-30 11:31:41 by Kline, Bob (NIH/NCI) [C]

Added watchers.

Comment entered 2023-02-01 09:35:51 by Kline, Bob (NIH/NCI) [C]

It would be helpful to be able to fold work on this issue into the other changes I'm making to the migration software. Can you provide any feedback on my proposed plan for eliminating the duplicates? Ideally, we would be able to test the results of merging the article records with the next migration build.

Comment entered 2023-02-01 11:18:34 by Boggess, Cynthia (NIH/NCI) [C]

The Librarian opinion:

I have taken a close look at these citations and I think that dropping the 5 records you have designated above will not result in any data loss as all data is included in the record that will be retained.

Re: PMID 16442793: this citation came up in two different review cycles 3 years apart but was reviewed in the earlier one, so record 91933 has the accurate review data. My guess is that when 182374 came in for review, I saw that it had already been reviewed and rejected it so that it would not get reviewed again. Regardless, I think 182374 can be dropped.

I also think the two records for PMID 31596791 can be merged. I am not sure how the duplicate was created but in terms of my review process it looks like I rejected this citation for peds extra germ cell because I wanted to publish it as general peds instead. So combining these two records would create a complete record of the review process.

Comment entered 2023-02-02 08:43:04 by Kline, Bob (NIH/NCI) [C]

I'm doing fresh migrations on the QA server (when CBIIT cleans up its misconfiguration on that VM) and on my own server as part of the work on OCEEBMS-719. I have tentatively folded in (but not committed, pending sign-off from and/or ) the approach to eliminating the duplicate documents described here (and implementing the database constraints to prevent future duplicates).
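
A database constraint of the kind mentioned above might look like the following. This is only a sketch under stated assumptions: the `article` table, the `source` and `source_id` column names, the `try_import` helper, and the use of SQLite are all hypothetical and are not taken from the EBMS4 code.

```python
import sqlite3


def create_article_table(conn):
    """Create a hypothetical article table that refuses duplicate PMIDs."""
    conn.execute(
        """
        CREATE TABLE article (
            article_id INTEGER PRIMARY KEY,
            source     TEXT NOT NULL,
            source_id  TEXT NOT NULL,
            UNIQUE (source, source_id)
        )
        """
    )


def try_import(conn, article_id, pmid):
    """Insert an article; return False if its PMID is already present."""
    try:
        # The connection context manager commits on success and rolls
        # back if the insert violates the UNIQUE constraint.
        with conn:
            conn.execute(
                "INSERT INTO article (article_id, source, source_id) "
                "VALUES (?, 'Pubmed', ?)",
                (article_id, pmid),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

With such a constraint in place, a second import of PMID 36318132 would be rejected by the database itself instead of silently creating another record.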

Comment entered 2023-02-02 15:08:36 by Juthe, Robin (NIH/NCI) [E]

Victoria and I discussed this issue and we agree with the proposed approach. Thank you for the explanation!

Comment entered 2023-02-04 14:46:10 by Kline, Bob (NIH/NCI) [C]

The migration scripts have been modified to remove the duplicates, merging information as needed.
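
The drop and merge steps can be pictured roughly as follows. This is an illustrative sketch, not the actual migration code: the `article` and `article_state` tables, their columns, the function names, and the use of SQLite are assumptions. Only the article IDs, PMIDs, and review history are taken from the description above.

```python
import sqlite3

# Resolutions proposed in this ticket: (PMID, kept ID, removed ID, action).
RESOLUTIONS = [
    ("16442793", 91933, 182374, "drop"),
    ("24365049", 392403, 392404, "drop"),
    ("31596791", 730644, 730645, "merge"),
    ("31749144", 730646, 730647, "drop"),
    ("31785152", 730648, 730649, "drop"),
    ("36318132", 878549, 878550, "drop"),
]


def resolve(conn, kept, removed, action):
    """Remove one duplicate, first copying its history when merging."""
    with conn:  # a single transaction per pair
        if action == "merge":
            # Copy state rows the kept record does not already have.
            conn.execute(
                """
                INSERT INTO article_state (article_id, topic, state, entered)
                SELECT ?, s.topic, s.state, s.entered
                  FROM article_state AS s
                 WHERE s.article_id = ?
                   AND NOT EXISTS (SELECT 1
                                     FROM article_state AS k
                                    WHERE k.article_id = ?
                                      AND k.topic = s.topic
                                      AND k.state = s.state
                                      AND k.entered = s.entered)
                """,
                (kept, removed, kept),
            )
        conn.execute(
            "DELETE FROM article_state WHERE article_id = ?", (removed,)
        )
        conn.execute("DELETE FROM article WHERE article_id = ?", (removed,))


def apply_resolutions(conn):
    """Apply every resolution listed in the ticket."""
    for _pmid, kept, removed, action in RESOLUTIONS:
        resolve(conn, kept, removed, action)


def demo_merge():
    """Build the PMID 31596791 pair in memory and apply the resolutions."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE article (
            article_id INTEGER PRIMARY KEY, source_id TEXT NOT NULL);
        CREATE TABLE article_state (
            article_id INTEGER NOT NULL, topic TEXT NOT NULL,
            state TEXT NOT NULL, entered TEXT NOT NULL);
        """
    )
    conn.executemany(
        "INSERT INTO article VALUES (?, ?)",
        [(730644, "31596791"), (730645, "31596791")],
    )
    librarian = "rejected in initial librarian review"
    abstract = "rejected after abstract review"
    conn.executemany(
        "INSERT INTO article_state VALUES (?, ?, ?, ?)",
        [
            (730644, "Childhood Extracranial Germ Cell Tumors", librarian,
             "2021-04-09"),
            (730644, "Cancer Screening Overview", librarian, "2021-04-09"),
            (730645, "General Pediatric Treatment", abstract, "2021-04-27"),
            (730645, "Cancer Screening Overview", librarian, "2021-04-21"),
        ],
    )
    apply_resolutions(conn)
    return conn
```

In this sketch a "drop" discards the removed record's history, while a "merge" first copies any history the kept record lacks, so that 730644 ends up carrying the review record of both originals.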

Comment entered 2023-02-08 11:18:11 by Boggess, Cynthia (NIH/NCI) [C]

I reviewed these citations in EBMS4; the drops and the one merge seem to have worked correctly. Looks good.
