CDR Tickets

Issue Number: 4337
Summary: Upcoming NLM PubMed DTD updates - schema changes
Created: 2017-11-09 12:43:39
Issue Type: Bug
Submitted By: Osei-Poku, William (NIH/NCI) [C]
Assigned To: Kline, Bob (NIH/NCI) [C]
Status: Closed
Resolved: 2017-12-05 09:15:54
Resolution: Fixed
Path: /home/bkline/backups/jira/ocecdr/issue.216741
Description

I have posted below the email from NLM, which appears to indicate changes that may affect the import of PubMed citations. Please take a look to see whether we need to make any schema changes.

Dear NCBI PubMed E-Utilities Users,

We anticipate updating the PubMed E-Utilities DTD for 2018 in late November, approximately November 27, 2017.

The forthcoming DTD is now available:

http://dtd.nlm.nih.gov/ncbi/pubmed/out/pubmed_180101.dtd

The following describes the substantive changes to PubMed DTD and PubMed XML:

1. The DateCreated element will be deleted.

2. The valid value Organism will be added to the Type attribute of the SupplMeshName element.

DTD:
<!ELEMENT SupplMeshName (#PCDATA) >
<!ATTLIST SupplMeshName
Type (Disease | Protocol | Organism) #REQUIRED
UI CDATA #REQUIRED >

3. Change to baseline and update file names (ftp://ftp.ncbi.nlm.nih.gov/pubmed/baseline and ftp://ftp.ncbi.nlm.nih.gov/pubmed/updatefiles/):

Because NLM exports citations other than MEDLINE records, file names for the ftp server will be corrected beginning with the 2018 baseline.
a. Baseline files will begin with pubmed18n0001.xml.gz
b. Daily update files will continue with this naming convention: pubmed18nxxxx.xml.gz
c. Associated .md5 files will follow this convention beginning with pubmed18n0001.xml.gz.md5
d. Stats files will follow this convention beginning with pubmed18n0001_stats.html

Thank you,
PubMed Development Team
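
For anyone scripting against the FTP files, the naming convention in item 3 of the notice can be sketched roughly as follows. This is only an illustration: the helper name `pubmed_file_names`, the pattern `PUBMED_FILE`, and the assumption that the sequence number is always four digits (inferred from the example `pubmed18n0001`) are not part of the NLM notice.

```python
import re

# Pattern inferred from the examples in the NLM notice. The four-digit
# sequence width is an assumption based on "pubmed18n0001".
PUBMED_FILE = re.compile(
    r"^pubmed(\d{2})n(\d{4})(\.xml\.gz(\.md5)?|_stats\.html)$"
)


def pubmed_file_names(year: int, seq: int) -> dict:
    """Build the data, checksum, and stats file names for one sequence number."""
    stem = f"pubmed{year % 100:02d}n{seq:04d}"
    return {
        "data": f"{stem}.xml.gz",
        "md5": f"{stem}.xml.gz.md5",
        "stats": f"{stem}_stats.html",
    }


names = pubmed_file_names(2018, 1)
for kind, name in names.items():
    print(kind, name)
```

The same pattern can be used to filter a directory listing, for example `[n for n in listing if PUBMED_FILE.match(n)]`, where `listing` is whatever list of names the caller has retrieved.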

Comment entered 2017-11-09 13:51:41 by Osei-Poku, William (NIH/NCI) [C]

We will make changes on DEV to test these updates. The DateCreated element will be made optional but preserved.

Comment entered 2017-11-09 14:25:47 by Kline, Bob (NIH/NCI) [C]

Changes installed on DEV.

Comment entered 2017-11-22 11:41:35 by Juthe, Robin (NIH/NCI) [E]

I successfully imported a new citation on DEV, validated it, removed the date created field, and validated it again. I also validated an existing citation. Is there anything else we should do to test this?

Comment entered 2017-11-22 11:43:43 by Juthe, Robin (NIH/NCI) [E]

I just saw your note (in an email message) that this schema change is on all tiers, so we've actually done a lot more testing than I thought. I think we can consider this verified, although the true test will be after NLM makes its changes. Thanks!

Comment entered 2017-11-22 12:44:48 by Kline, Bob (NIH/NCI) [C]

"... anything else we should do to test this?"

I guess you could stick the new attribute value in and confirm that the document still passes validation.

Comment entered 2017-11-22 12:50:33 by Juthe, Robin (NIH/NCI) [E]

Still valid. Thanks!
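
The manual checks described in this thread (validate with DateCreated, validate without it, validate with the new Organism value) could be approximated in a small script along these lines. This is a rough sketch and not the CDR's actual schema validation: the function `check_citation`, the sample documents, and the UI value `C000000` are made up for illustration, and only the two DTD changes from the NLM notice are checked.

```python
import xml.etree.ElementTree as ET

# Allowed values copied from the ATTLIST in the NLM notice.
ALLOWED_SUPPL_MESH_TYPES = {"Disease", "Protocol", "Organism"}


def check_citation(xml_text: str) -> list:
    """Return a list of problems; an empty list means the checks passed."""
    root = ET.fromstring(xml_text)
    problems = []
    # DateCreated is treated as optional: zero or one occurrence is fine.
    if len(root.findall("DateCreated")) > 1:
        problems.append("more than one DateCreated element")
    for node in root.iter("SupplMeshName"):
        type_value = node.get("Type")
        if type_value not in ALLOWED_SUPPL_MESH_TYPES:
            problems.append(f"unexpected SupplMeshName Type: {type_value!r}")
        if node.get("UI") is None:
            problems.append("SupplMeshName is missing the required UI attribute")
    return problems


# Hypothetical sample documents, heavily trimmed.
DATE = "<DateCreated><Year>2017</Year><Month>11</Month><Day>09</Day></DateCreated>"
TEMPLATE = (
    "<MedlineCitation><PMID>1</PMID>{date}"
    '<SupplMeshList><SupplMeshName Type="{type}" UI="C000000">'
    "Example</SupplMeshName></SupplMeshList></MedlineCitation>"
)

with_date = check_citation(TEMPLATE.format(date=DATE, type="Disease"))
without_date = check_citation(TEMPLATE.format(date="", type="Disease"))
organism = check_citation(TEMPLATE.format(date="", type="Organism"))
bad_type = check_citation(TEMPLATE.format(date="", type="Unknown"))

print(with_date, without_date, organism, bad_type)
```

The first three results should come back empty, matching the outcome reported above, while the last one shows how an unrecognized Type value would be flagged.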

Comment entered 2017-11-22 13:30:15 by Osei-Poku, William (NIH/NCI) [C]

Agreed. So far, we haven't seen any problems with this schema change.

Comment entered 2017-12-05 09:15:54 by Kline, Bob (NIH/NCI) [C]

Go ahead and close this if everything looks OK.

Comment entered 2017-12-05 09:17:17 by Kline, Bob (NIH/NCI) [C]

Never mind. I got this mixed up with another schema-change ticket. I need to push this to the upper tiers. More to follow. :-)

Comment entered 2017-12-05 09:28:30 by Kline, Bob (NIH/NCI) [C]

The changes are now on all the tiers. Please verify, and if everything looks OK, close this ticket.

Comment entered 2018-01-04 13:04:15 by Osei-Poku, William (NIH/NCI) [C]

Verified on PROD. Thanks!
