Issue Number | 717 |
---|---|
Summary | Duplicate PMIDs |
Created | 2023-01-30 11:30:59 |
Issue Type | Improvement |
Submitted By | Kline, Bob (NIH/NCI) [C] |
Assigned To | Kline, Bob (NIH/NCI) [C] |
Status | Closed |
Resolved | 2023-02-04 14:46:10 |
Resolution | Fixed |
Path | /home/bkline/backups/jira/oceebms/issue.337514 |
There are six pairs of articles in the production EBMS sharing PMIDs. Can we eliminate the duplicates as part of the Everglades rewrite? I have listed the pairs below with the relevant history for each record, as well as proposed resolutions for each pair, to be implemented as part of the migration to EBMS4. Cleaning up the existing duplicates will allow us to prevent new ones going forward. Of course, we will need to monitor the final days of EBMS3 to detect any additional duplicates that are created.
EBMS 91933 - imported 2006-06-05; final board decision for Genetics of Skin Cancer 2009-09-25; assigned to a Late Effects of Treatment for Childhood Cancer packet for review 2006-06-22, but no reviews posted; passed abstract review for Unusual Cancers of Childhood 2006-07-07 but got no further
EBMS 182374 - imported 2009-09-25; rejected in initial librarian review for Genetics of Skin Cancer 2009-09-25 (drop this conflicting record?)
EBMS 392403 - imported 2015-06-03; rejected after abstract review for Non-Small Cell Lung Cancer 2015-06-26
EBMS 392404 - imported 2015-06-03; rejected after abstract review for Non-Small Cell Lung Cancer 2015-06-26 (drop?)
EBMS 730644 - imported 2021-03-28; rejected in initial librarian review for Childhood Extracranial Germ Cell Tumors and Cancer Screening Overview 2021-04-09
EBMS 730645 - imported 2021-03-28; rejected after abstract review for General Pediatric Treatment 2021-04-27; rejected in initial librarian review for Cancer Screening Overview 2021-04-21 (merge with 730644?)
EBMS 730646 - imported 2021-03-28; rejected in initial librarian review for Cancer Screening Overview 2021-04-21; rejected after abstract review for Cancer Prevention Overview 2021-08-16
EBMS 730647 - imported 2021-03-28; rejected in initial librarian review for Cancer Screening Overview 2021-04-21; rejected after abstract review for Cancer Prevention Overview 2021-08-16 (drop?)
EBMS 730648 - imported 2021-03-28; rejected in initial librarian review for Cancer Screening Overview 2021-04-16
EBMS 730649 - imported 2021-03-28; rejected in initial librarian review for Cancer Screening Overview 2021-04-16 (drop?)
EBMS 878549 - imported 2022-11-07; published 2022-11-07; linked to 878548
EBMS 878550 - imported 2022-11-07; published 2022-11-07; linked to 878548 (drop?)
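For illustration only, here is a minimal sketch of how pairs like the ones listed above could be detected. It assumes a hypothetical `article` table with `id` and `pmid` columns in an in-memory SQLite database; the real EBMS schema and database engine are not described in this ticket. The two PMIDs are the ones named in the comments below, and the third row is a made-up non-duplicate.

```python
import sqlite3


def find_duplicate_pmids(conn):
    """Return {pmid: [article ids]} for every PMID found on more than one record."""
    rows = conn.execute(
        "SELECT pmid, id FROM article WHERE pmid IN ("
        "  SELECT pmid FROM article GROUP BY pmid HAVING COUNT(*) > 1"
        ") ORDER BY pmid, id"
    ).fetchall()
    duplicates = {}
    for pmid, article_id in rows:
        duplicates.setdefault(pmid, []).append(article_id)
    return duplicates


# Hypothetical schema and sample rows (assumptions, not the real EBMS tables).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE article (id INTEGER PRIMARY KEY, pmid TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO article (id, pmid) VALUES (?, ?)",
    [
        (91933, "16442793"),
        (182374, "16442793"),
        (730644, "31596791"),
        (730645, "31596791"),
        (100, "11111111"),  # invented record with a PMID that is not shared
    ],
)
duplicates = find_duplicate_pmids(conn)
```

The same query could be rerun during the final days of EBMS3 to catch any newly created duplicates before the migration.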
Added watchers.
~vshields and ~juther: It would be helpful to be able to fold work on this issue into the other changes I'm making to the migration software. Can you provide any feedback on my proposed plan for eliminating the duplicates? Ideally, we would be able to test the results of merging the article records with the next migration build.
The Librarian opinion:
I have taken a close look at these citations and I think that dropping the 5 records you have designated above will not result in any data loss as all data is included in the record that will be retained.
Re: PMID 16442793: this citation came up in two different review cycles three years apart but was reviewed in the earlier one, so record 91933 has the accurate review data. My guess is that when 182374 came in for review, I saw that it had already been reviewed and rejected it so that it would not get reviewed again. Regardless, I think 182374 can be dropped.
I also think the two records for PMID 31596791 can be merged. I am not sure how the duplicate was created, but in terms of my review process, it looks like I rejected this citation for peds extra germ cell because I wanted to publish it as general peds instead. So combining these two records would create a complete record of the review process.
I'm doing fresh migrations on the QA server (when CBIIT cleans up its misconfiguration on that VM) and on my own server as part of the work on OCEEBMS-719. I have tentatively folded in (but not committed, pending sign-off from ~vshields and/or ~juther) the approach to eliminating the duplicate documents described here (and implementing the database constraints to prevent future duplicates).
Victoria and I discussed this issue and we agree with the proposed approach. Thank you for the explanation!
The migration scripts have been modified to remove the duplicates, merging information as needed.
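As a rough sketch of the approach (not the actual migration code), merging one duplicate into the record being kept and then adding a uniqueness constraint might look like the following. The `article` and `article_state` tables, their columns, and both function names are hypothetical, and SQLite stands in for whatever database EBMS4 uses.

```python
import sqlite3


def merge_duplicate(conn, keep_id, drop_id):
    """Move the dropped record's history to the kept record, then delete it."""
    with conn:  # commit on success, roll back on error
        conn.execute(
            "UPDATE article_state SET article_id = ? WHERE article_id = ?",
            (keep_id, drop_id),
        )
        conn.execute("DELETE FROM article WHERE id = ?", (drop_id,))


def forbid_duplicate_pmids(conn):
    """Add a unique index so a second record with the same PMID is rejected."""
    with conn:
        conn.execute("CREATE UNIQUE INDEX article_pmid_unique ON article (pmid)")


# Hypothetical schema and rows modeled on the 730644/730645 pair.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE article (id INTEGER PRIMARY KEY, pmid TEXT NOT NULL);
    CREATE TABLE article_state (article_id INTEGER NOT NULL, note TEXT NOT NULL);
    INSERT INTO article VALUES (730644, '31596791');
    INSERT INTO article VALUES (730645, '31596791');
    INSERT INTO article_state VALUES (730644, 'rejected in initial librarian review');
    INSERT INTO article_state VALUES (730645, 'rejected after abstract review');
    """
)

merge_duplicate(conn, keep_id=730644, drop_id=730645)
forbid_duplicate_pmids(conn)

# With the index in place, importing the same PMID again should fail.
try:
    conn.execute("INSERT INTO article VALUES (999999, '31596791')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

The unique index can only be created after the existing duplicates are gone, which is why the cleanup and the constraint belong in the same migration step.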
I reviewed these citations in EBMS4. The drops and the one merge seem to have worked correctly. Looks good.