Issue Number | 4337 |
---|---|
Summary | Upcoming NLM PubMed DTD updates - schema changes |
Created | 2017-11-09 12:43:39 |
Issue Type | Bug |
Submitted By | Osei-Poku, William (NIH/NCI) [C] |
Assigned To | Kline, Bob (NIH/NCI) [C] |
Status | Closed |
Resolved | 2017-12-05 09:15:54 |
Resolution | Fixed |
Path | /home/bkline/backups/jira/ocecdr/issue.216741 |
~bkline I have posted below the email from NLM, which describes changes that may affect the import of PubMed citations. Please take a look to see whether we need to make any schema changes.
Dear NCBI PubMed E-Utilities Users,
We anticipate updating the PubMed E-Utilities DTD for 2018 in late November, approximately November 27, 2017.
The forthcoming DTD is now available:
http://dtd.nlm.nih.gov/ncbi/pubmed/out/pubmed_180101.dtd
The following describes the substantive changes to PubMed DTD and PubMed XML:
1. The DateCreated element will be deleted.
2. The valid value Organism will be added to the Type attribute of the SupplMeshName element.
DTD:
<!ELEMENT SupplMeshName (#PCDATA) >
<!ATTLIST SupplMeshName
Type (Disease | Protocol | Organism) #REQUIRED
UI CDATA #REQUIRED >
3. Change to baseline and update file names (ftp://ftp.ncbi.nlm.nih.gov/pubmed/baseline and ftp://ftp.ncbi.nlm.nih.gov/pubmed/updatefiles/):
Because NLM exports citations other than MEDLINE records, file names
for the ftp server will be corrected beginning with the 2018
baseline.
a. Baseline files will begin with pubmed18n0001.xml.gz
b. Daily update files will continue with this naming convention:
pubmed18nxxxx.xml.gz
c. Associated .md5 files will follow this convention beginning with
pubmed18n0001.xml.gz.md5
d. Stats files will follow this convention beginning with
pubmed18n0001_stats.html
Thank you,
PubMed Development Team
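For anyone scripting against the FTP directories, the naming convention in item 3 of the announcement could be recognized with something like the sketch below. The four-digit sequence number is an assumption drawn from the "xxxx" placeholder in the email, and the function name and the "data"/"md5"/"stats" labels are hypothetical, not part of any NLM or CDR tooling.

```python
import re

# Pattern for the 2018 naming convention quoted in the announcement:
# "pubmed18n", a sequence number (assumed to be four digits), then one
# of the three suffixes the email lists.
PUBMED_2018_NAME = re.compile(
    r"^pubmed18n(?P<seq>\d{4})(?P<suffix>\.xml\.gz|\.xml\.gz\.md5|_stats\.html)$"
)

# Hypothetical labels for the three kinds of file.
KINDS = {".xml.gz": "data", ".xml.gz.md5": "md5", "_stats.html": "stats"}


def parse_pubmed_name(name):
    """Return (sequence number, kind) for a 2018-style name, else None."""
    match = PUBMED_2018_NAME.match(name)
    if match is None:
        return None
    return int(match.group("seq")), KINDS[match.group("suffix")]
```

A name that does not follow the 2018 convention simply yields None, so a caller can decide how to handle files it does not recognize.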
~bkline will make the schema changes on DEV so they can be tested. The DateCreated element will be made optional but preserved.
Changes installed on DEV.
I successfully imported a new citation on DEV, validated it, removed the date created field, and validated it again. I also validated an existing citation. Is there anything else we should do to test this?
I just saw your note (in an email message) that this schema change is on all tiers, so we've actually done a lot more testing than I thought. I think we can consider this verified, although the true test will be after NLM makes its changes. Thanks!
... anything else we should do to test this?
I guess you could stick the new attribute value in and confirm that the document still passes validation.
Still valid. Thanks!
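For reference, the two conditions exercised in this ticket (a citation with no DateCreated element, and a SupplMeshName carrying the new Organism value) can be approximated outside the CDR with a small standalone check. This is only a sketch using the Python standard library, not the CDR's schema validation; the function name, the sample fragment, and the UI value in it are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Allowed Type values, copied from the ATTLIST in the NLM announcement.
ALLOWED_SUPPL_MESH_TYPES = {"Disease", "Protocol", "Organism"}


def check_citation(xml_text):
    """Return a list of problems found in one citation fragment."""
    root = ET.fromstring(xml_text)
    problems = []
    # DateCreated is treated as optional: zero or one occurrence is accepted.
    if len(root.findall(".//DateCreated")) > 1:
        problems.append("more than one DateCreated element")
    for node in root.iter("SupplMeshName"):
        if node.get("Type") not in ALLOWED_SUPPL_MESH_TYPES:
            problems.append("unexpected SupplMeshName Type: %r" % node.get("Type"))
        if node.get("UI") is None:
            problems.append("SupplMeshName is missing the UI attribute")
    return problems


# Illustrative fragment: no DateCreated, new Type value, made-up UI.
sample = """<MedlineCitation>
  <SupplMeshNameList>
    <SupplMeshName Type="Organism" UI="C000000">hypothetical organism</SupplMeshName>
  </SupplMeshNameList>
</MedlineCitation>"""

print(check_citation(sample))
```

An empty list means the fragment satisfies both conditions; each string in a non-empty list names one violation.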
Agreed. So far, we haven't seen any problems with this schema change.
Go ahead and close this if everything looks OK.
Never mind. I got this mixed up with another schema-change ticket. I need to push this to the upper tiers. More to follow. :-)
The changes are now on all the tiers. Please verify, and if everything looks OK, close this ticket.
Verified on PROD. Thanks!