Issue Number | 4236 |
---|---|
Summary | citations import error |
Created | 2017-03-09 17:51:37 |
Issue Type | Improvement |
Submitted By | Osei-Poku, William (NIH/NCI) [C] |
Assigned To | Kline, Bob (NIH/NCI) [C] |
Status | Closed |
Resolved | 2017-08-08 13:04:37 |
Resolution | Fixed |
Path | /home/bkline/backups/jira/ocecdr/issue.204469 |
We're having trouble importing this citation (PMID 26460295) without a validation error. It appears to have a new element, <CoiStatement>, which is causing the import to fail on PROD and DEV. I tried to import a different citation and that worked, so it doesn't appear to be a global issue. The error seems to pertain to this particular PMID (26460295): the CDR ID on PROD is CDR0000787538 and on DEV it is CDR0000779828.
NLM has a new DTD.
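When a citation fails like this, one quick diagnostic is to list the element names in the downloaded PubMed record that the schema doesn't yet know about. A minimal Python sketch, assuming the record is available as an XML string. `KNOWN_ELEMENTS` and `unknown_elements` are hypothetical names, the set is a heavily trimmed stand-in for what the Citation schema actually declares, and the sample record is invented for illustration:

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed stand-in for the element names the
# Citation schema declares; the real schema has many more.
KNOWN_ELEMENTS = {
    "PubmedArticle", "MedlineCitation", "PMID", "Article", "ArticleTitle",
    "KeywordList", "Keyword", "SpaceFlightMission",
}


def unknown_elements(xml_text: str, known: set[str] = KNOWN_ELEMENTS) -> list[str]:
    """Return the sorted element names in xml_text that are not in known."""
    root = ET.fromstring(xml_text)
    # root.iter() walks the root element and all of its descendants.
    return sorted({elem.tag for elem in root.iter()} - known)


# Invented sample record (not real PubMed data).
sample = """<PubmedArticle>
  <MedlineCitation>
    <PMID>12345678</PMID>
    <Article><ArticleTitle>Example title</ArticleTitle></Article>
    <CoiStatement>The authors declare no conflicts.</CoiStatement>
  </MedlineCitation>
</PubmedArticle>"""

print(unknown_elements(sample))  # ['CoiStatement']
```

A non-empty result points at elements that a newer NLM DTD has introduced and the schema has not caught up with yet.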
Raised the priority of this ticket as more citation imports fail. Without a fix, we may not be able to update many summaries. ~mbeckwit ~juther
Bob, can you remind me what we need to do when NLM updates the DTD? Specifically, do the updates on our end require a release and/or hot fix? Adding Amy so she is aware of this.
Oops, now I've added ~duganal.
We need to deploy a modified schema for the Citation document type, something we can't do without the assistance of CBIIT. You might consider adding a ticket for implementing the scaffolding to allow us to deploy schema changes without having to get CBIIT involved. Back when Lakshmi decided we'd track the NLM data structure directly in our schema, no one ever imagined we wouldn't be able to do that directly ourselves on the production system. :-)
I added a new ticket without realizing that you were not replying directly to my comments. Please feel free to revise the ticket or delete it if necessary. OCECDR-4239
Try importing the problem citation on DEV, please.
I added a new ticket without realizing that you were not replying directly to my comments.
Yes, that's a bug in JIRA, which doesn't always show replies in context. I've reported the bug to CBIIT, but they've told me it probably won't be fixed.
I was able to import CDR0000779850, which has the new element, without any errors, and the document is valid. However, the import took a long time (more than 2 minutes).
I tried a few more imports and they all imported fast enough.
I can try to shoehorn the schema change into the set which will be deployed to production later in the week, provided Amy approves.
~duganal: I am reasonably confident that the schema change will be benign. I have limited the change to the minimum needed (the addition of a single element) to unblock the import of the citations William has been reporting, so that we can be as confident as possible that it won't break anything. There are other changes in NLM's latest DTD which I have not applied to our Citation schema, and which I would be uncomfortable trying to squeeze in without extensive user testing. Please let me know how you would like me to proceed. If you think I should delay the change to a separate patch, that's what I'll do. If you want me to slip this into the release, I can do that easily. If that's what we do, STAGE will be out of sync with the other tiers until we address the rest of this ticket.
Here's the patch:
Index: CitationSchema.xml
===================================================================
--- CitationSchema.xml (revision 14668)
+++ CitationSchema.xml (working copy)
@@ -8,6 +8,7 @@
   BZIssue::4952 (catching up with NLM DTD changes)
   JIRA::OCECDR-3664 (more breakage by NLM)
   JIRA::OCECDR-3825 (more changes at NLM)
+  JIRA::OCECDR-4236 (more changes at NLM)
  -->
 <schema xmlns = 'http://www.w3.org/2001/XMLSchema'>
@@ -166,6 +167,9 @@
                type = 'KeywordList'
                minOccurs = '0'
                maxOccurs = 'unbounded'/>
+      <element name = 'CoiStatement'
+               type = 'NlmInlineFormatting'
+               minOccurs = '0'/>
       <element name = 'SpaceFlightMission'
                type = 'NotEmptyString'
                minOccurs = '0'
Note that the added element is optional (minOccurs = '0').
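Because the new element is optional and sits at a fixed position in the sequence, documents without it still match the content model, while documents containing it match only the updated model. A toy Python sketch of that occurrence check (not the CDR's validator, and only a small subset of XML Schema sequence semantics). The model fragments are hypothetical: the `KeywordList` and `CoiStatement` bounds follow the patch (the first element's name is assumed from its type, and `CoiStatement` gets the default maxOccurs of 1), while the unbounded maxOccurs for `SpaceFlightMission` is an assumption:

```python
from typing import Optional

# (element name, minOccurs, maxOccurs); None stands for maxOccurs='unbounded'.
Particle = tuple[str, int, Optional[int]]


def matches_sequence(children: list[str], model: list[Particle]) -> bool:
    """Check child element names against a simple xs:sequence of elements.

    Greedy matching is enough here because every particle names a different
    element, so each child can belong to at most one particle.
    """
    i = 0
    for name, min_occurs, max_occurs in model:
        count = 0
        while (i < len(children) and children[i] == name
               and (max_occurs is None or count < max_occurs)):
            count += 1
            i += 1
        if count < min_occurs:
            return False
    # Children left over did not fit anywhere in the model.
    return i == len(children)


# Hypothetical fragments around the insertion point
# (SpaceFlightMission's maxOccurs is assumed to be unbounded).
OLD_MODEL: list[Particle] = [("KeywordList", 0, None), ("SpaceFlightMission", 0, None)]
NEW_MODEL: list[Particle] = [("KeywordList", 0, None), ("CoiStatement", 0, 1),
                             ("SpaceFlightMission", 0, None)]

old_doc = ["KeywordList", "SpaceFlightMission"]
new_doc = ["KeywordList", "CoiStatement", "SpaceFlightMission"]

print(matches_sequence(old_doc, OLD_MODEL))  # True
print(matches_sequence(new_doc, OLD_MODEL))  # False: CoiStatement not allowed
print(matches_sequence(old_doc, NEW_MODEL))  # True: the element is optional
print(matches_sequence(new_doc, NEW_MODEL))  # True
```

The third call is the case that matters for existing data: making the element optional is what keeps previously valid citations valid.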
Thanks,
Bob
The probability that the slow performance of your earlier test and the change to the schema were related to each other is very close to zero.
~bkline, is there anything that we could have one of our QA folks test out on dev that might, however unlikely, be impacted by the above schema change? Thanks!
Hmmm, nothing's coming to mind. ~juther or ~oseipokuw - can you think of any tests which could be run at the user level on DEV to smoke test the schema change? Best I can come up with is not at the user level – I could write a script to run all of the citations which are currently marked as valid on DEV to revalidate them against the new schema. Shall I go ahead and do that?
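A revalidation pass along those lines could look roughly like this. A minimal sketch, assuming each document's XML and its previously recorded validity flag are available. `CitationDoc`, `revalidate`, and the `validate` callable are hypothetical names standing in for whatever the CDR actually provides, and `toy_validate` exists only for illustration. Splitting the failures matters because only documents that were valid before the change and fail now would implicate the schema update:

```python
from typing import Callable, Iterable, NamedTuple


class CitationDoc(NamedTuple):
    """Hypothetical record for one Citation document."""
    cdr_id: str
    xml: str
    was_valid: bool  # validity status recorded before the schema change


def revalidate(
    docs: Iterable[CitationDoc],
    validate: Callable[[str], list[str]],
) -> tuple[int, dict[str, list[str]], dict[str, list[str]]]:
    """Revalidate every document and sort failures into two buckets.

    `validate` returns a list of error messages (empty means valid).
    Returns (documents checked, regressions, preexisting failures), where a
    regression is a document that was marked valid before but fails now.
    """
    checked = 0
    regressions: dict[str, list[str]] = {}
    preexisting: dict[str, list[str]] = {}
    for doc in docs:
        checked += 1
        errors = validate(doc.xml)
        if errors:
            bucket = regressions if doc.was_valid else preexisting
            bucket[doc.cdr_id] = errors
    return checked, regressions, preexisting


# Stand-in validator for illustration only: flags an empty <Author/> element.
def toy_validate(xml: str) -> list[str]:
    if "<Author/>" in xml:
        return ["Expected child elements for empty Author element"]
    return []


docs = [
    CitationDoc("CDR0000000001",
                "<Citation><Author><LastName>X</LastName></Author></Citation>", True),
    CitationDoc("CDR0000000002", "<Citation><Author/></Citation>", False),
]
checked, regressions, preexisting = revalidate(docs, toy_validate)
print(checked, regressions, preexisting)
```

An empty `regressions` result is the outcome we'd want; anything in `preexisting` was already broken before the schema change.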
I was going to suggest spot-checking some of the older citations to make sure they are still valid, but if you can write a program to do that on a larger scale without too much difficulty, then I think you should.
~oseipokuw, did you try linking any of the new citations in a summary and validating the summary? I can't imagine that wouldn't work, but it would be worth confirming.
Otherwise, I can't think of anything else on the user end.
I can't think of any tests besides checking the validity and publishability of the imported document, which I am able to verify manually. What I have not done is add the citation to summaries and check that the summaries are still publishable.
I didn't go as far as testing that but I can do that now just to make sure. I can also run some of the citations reports to make sure they are running okay.
I linked a new citation to a summary and checked PP, the QC reports, and other citation reports; they all checked out okay.
The script to validate the Citation documents on DEV has finished. Of the 44,287 documents checked (I didn't look at any blocked documents, if indeed there were any), nine had validation errors, unrelated to the schema change. None of the nine documents had been marked as valid before the schema change.
CDR54984: Missing required attribute cdr:ref in element FullTextArticle
CDR586713: Expected child elements for empty Author element of type Author
CDR610077: Missing required attribute cdr:ref in element FullTextArticle
CDR732225: Missing required attribute cdr:ref in element FullTextArticle
CDR750673: Missing required attribute cdr:ref in element FullTextArticle
CDR758791: Missing required attribute cdr:ref in element FullTextArticle
CDR762086: Missing required attribute cdr:ref in element FullTextArticle
CDR767786: Missing required attribute cdr:ref in element FullTextArticle
CDR772261: Missing required attribute cdr:ref in element FullTextArticle
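Tallying the nine failures above by message shows that they fall into just two kinds, with the missing cdr:ref on FullTextArticle accounting for eight of them. A small Python sketch over the errors exactly as reported; `ERRORS` and `tally` are names introduced here for illustration:

```python
from collections import Counter

# Validation errors reported above for the nine Citation documents on DEV.
ERRORS = {
    "CDR54984": "Missing required attribute cdr:ref in element FullTextArticle",
    "CDR586713": "Expected child elements for empty Author element of type Author",
    "CDR610077": "Missing required attribute cdr:ref in element FullTextArticle",
    "CDR732225": "Missing required attribute cdr:ref in element FullTextArticle",
    "CDR750673": "Missing required attribute cdr:ref in element FullTextArticle",
    "CDR758791": "Missing required attribute cdr:ref in element FullTextArticle",
    "CDR762086": "Missing required attribute cdr:ref in element FullTextArticle",
    "CDR767786": "Missing required attribute cdr:ref in element FullTextArticle",
    "CDR772261": "Missing required attribute cdr:ref in element FullTextArticle",
}


def tally(errors: dict[str, str]) -> list[tuple[str, int]]:
    """Count documents per distinct error message, most common first."""
    return Counter(errors.values()).most_common()


for message, count in tally(ERRORS):
    print(f"{count}  {message}")
```

Neither message involves CoiStatement, which is consistent with the errors being unrelated to the schema change.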
Thanks for testing, ~oseipokuw!
Let's plan to include this schema update with the Einstein deployment.
I have deployed the update to QA and inserted it into the deployment set for production. As noted above, STAGE will not have this change until we complete and deploy the rest of the work for this ticket.
On all tiers.
File Name | Posted | User |
---|---|---|
pubmed_170101.dtd | 2017-03-10 08:10:25 | Kline, Bob (NIH/NCI) [C] |