CDR Tickets

Issue Number 4236
Summary citations import error
Created 2017-03-09 17:51:37
Issue Type Improvement
Submitted By Osei-Poku, William (NIH/NCI) [C]
Assigned To Kline, Bob (NIH/NCI) [C]
Status Closed
Resolved 2017-08-08 13:04:37
Resolution Fixed
Path /home/bkline/backups/jira/ocecdr/issue.204469
Description

We're having trouble importing the citation for PMID 26460295 without a validation error. It appears to have a new element, <CoiStatement>, which is causing the import to fail on PROD and DEV. I tried importing a different citation and that worked, so it doesn't appear to be a global issue. The error seems specific to PMID 26460295: the CDR ID on PROD is CDR0000787538 and on DEV it is CDR0000779828.

Comment entered 2017-03-10 08:10:39 by Kline, Bob (NIH/NCI) [C]

NLM has a new DTD.

Comment entered 2017-03-11 12:07:39 by Osei-Poku, William (NIH/NCI) [C]

Raised the priority of this ticket as more citation imports fail. Without a fix, we may not be able to update a lot of summaries.

Comment entered 2017-03-13 13:23:59 by Juthe, Robin (NIH/NCI) [E]

Bob, can you remind me what we need to do when NLM updates the DTD? Specifically, do the updates on our end require a release and/or hot fix? Adding Amy so she is aware of this.

Comment entered 2017-03-13 13:24:28 by Juthe, Robin (NIH/NCI) [E]

Oops, now I've added Amy.

Comment entered 2017-03-13 14:17:46 by Kline, Bob (NIH/NCI) [C]

We need to deploy a modified schema for the Citation document type, something we can't do without the assistance of CBIIT. You might consider adding a ticket for implementing the scaffolding to allow us to deploy schema changes without having to get CBIIT involved. Back when Lakshmi decided we'd track the NLM data structure directly in our schema, no one ever imagined we wouldn't be able to do that directly ourselves on the production system. :-)

Comment entered 2017-03-13 15:05:16 by Osei-Poku, William (NIH/NCI) [C]

I added a new ticket without realizing that you were not replying directly to my comments. Please feel free to revise the ticket or delete it if necessary. OCECDR-4239

Comment entered 2017-03-13 16:11:49 by Kline, Bob (NIH/NCI) [C]

Try importing the problem citation on DEV, please.

Comment entered 2017-03-13 16:13:40 by Kline, Bob (NIH/NCI) [C]

"I added a new ticket without realizing that you were not replying directly to my comments."

Yes, that's a bug in JIRA, which doesn't always show replies in context. I've reported the bug to CBIIT, but they've told me it probably won't be fixed.

Comment entered 2017-03-13 16:40:55 by Osei-Poku, William (NIH/NCI) [C]

I was able to import CDR0000779850, which has the new element, without any errors, and the document is valid. However, it took a long time to import (more than 2 minutes).

Comment entered 2017-03-13 16:59:26 by Osei-Poku, William (NIH/NCI) [C]

I tried a few more imports and they all imported fast enough.

Comment entered 2017-03-13 17:03:54 by Kline, Bob (NIH/NCI) [C]

I can try to shoehorn the schema change into the set which will be deployed to production later in the week, provided Amy approves.

I am reasonably confident that the schema change will be benign. To keep the risk of breaking anything as low as possible, I have limited the change to the minimum needed (the addition of a single element) to unblock the import of the citations William has been reporting. There are other changes in NLM's latest DTD which I have not applied to our Citation schema, and which I would be uncomfortable trying to squeeze in without extensive user testing. Please let me know how you would like me to proceed. If you think I should delay the change to a separate patch, that's what I'll do. If you want me to slip this into the release, I can do that easily. If that's what we do, STAGE will be out of sync with the other tiers until we address the rest of this ticket.

Here's the patch:

Index: CitationSchema.xml
===================================================================
--- CitationSchema.xml  (revision 14668)
+++ CitationSchema.xml  (working copy)
@@ -8,6 +8,7 @@
     BZIssue::4952 (catching up with NLM DTD changes)
     JIRA::OCECDR-3664 (more breakage by NLM)
     JIRA::OCECDR-3825 (more changes at NLM)
+    JIRA::OCECDR-4236 (more changes at NLM)
   -->

 <schema              xmlns           = 'http://www.w3.org/2001/XMLSchema'>
@@ -166,6 +167,9 @@
                      type            = 'KeywordList'
                      minOccurs       = '0'
                      maxOccurs       = 'unbounded'/>
+      <element       name            = 'CoiStatement'
+                     type            = 'NlmInlineFormatting'
+                     minOccurs       = '0'/>
       <element       name            = 'SpaceFlightMission'
                      type            = 'NotEmptyString'
                      minOccurs       = '0'

Note that the added element is optional (minOccurs = '0').

Thanks,
Bob
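As a rough illustration of why the import failed before the patch and succeeds after it, the sketch below checks the child elements of a citation fragment against a set of known element names. The element sets are only the three names visible in the patch context, the parent element name and both XML fragments are hypothetical, and a real schema validator also checks element order, types, and occurrence counts, none of which is modeled here.

```python
import xml.etree.ElementTree as ET

# Illustrative subset of element names taken from the patch context.
# This is NOT the full Citation schema.
ALLOWED_BEFORE = {"KeywordList", "SpaceFlightMission"}
ALLOWED_AFTER = ALLOWED_BEFORE | {"CoiStatement"}


def unknown_children(xml_text, allowed):
    """Return the sorted names of child elements not present in `allowed`."""
    root = ET.fromstring(xml_text)
    return sorted({child.tag for child in root} - allowed)


# Hypothetical fragments; real PubMed records carry many more elements,
# and the parent element name here is an assumption.
WITH_COI = (
    "<MedlineCitation>"
    "<KeywordList/>"
    "<CoiStatement>No conflicts declared.</CoiStatement>"
    "</MedlineCitation>"
)
WITHOUT_COI = "<MedlineCitation><KeywordList/></MedlineCitation>"

print(unknown_children(WITH_COI, ALLOWED_BEFORE))    # ['CoiStatement']
print(unknown_children(WITH_COI, ALLOWED_AFTER))     # []
print(unknown_children(WITHOUT_COI, ALLOWED_AFTER))  # []
```

The last call reflects the point about minOccurs = '0': a citation without the new element is still accepted, so documents imported before the change should be unaffected.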

Comment entered 2017-03-13 17:06:55 by Kline, Bob (NIH/NCI) [C]

The probability that the slow performance of your earlier test and the change to the schema were related to each other is very close to zero.

Comment entered 2017-03-13 17:23:00 by Dugan, Amy (NIH/NCI) [C]

Is there anything that we could have one of our QA folks test out on DEV that might, however unlikely, be impacted by the above schema change? Thanks!

Comment entered 2017-03-13 17:27:28 by Kline, Bob (NIH/NCI) [C]

Hmmm, nothing's coming to mind. Can either of you think of any tests which could be run at the user level on DEV to smoke-test the schema change? The best I can come up with is not at the user level: I could write a script to revalidate, against the new schema, all of the citations which are currently marked as valid on DEV. Shall I go ahead and do that?
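The revalidation idea could be sketched roughly as follows. This is not the script that was actually run: the CitationDoc record, the find_regressions helper, the toy validator, and the document IDs are all hypothetical stand-ins, and a real run would fetch documents from the CDR and validate them against the deployed schema.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass
class CitationDoc:
    doc_id: str         # hypothetical CDR ID
    marked_valid: bool  # validation status recorded before the schema change
    xml: str            # serialized document


def find_regressions(
    docs: Iterable[CitationDoc],
    validate: Callable[[str], List[str]],
) -> List[Tuple[str, List[str]]]:
    """Revalidate documents marked valid; return those which now fail."""
    regressions = []
    for doc in docs:
        if not doc.marked_valid:
            continue  # already invalid, so not a regression
        errors = validate(doc.xml)
        if errors:
            regressions.append((doc.doc_id, errors))
    return regressions


def toy_validate(xml: str) -> List[str]:
    """Stand-in validator: rejects any document containing <Bogus>."""
    return ["Unexpected element Bogus"] if "<Bogus" in xml else []


docs = [
    CitationDoc("CDR0000000001", True, "<Citation><CoiStatement/></Citation>"),
    CitationDoc("CDR0000000002", False, "<Citation><Bogus/></Citation>"),
    CitationDoc("CDR0000000003", True, "<Citation><Bogus/></Citation>"),
]

for doc_id, errors in find_regressions(docs, toy_validate):
    print(f"{doc_id}: {'; '.join(errors)}")
```

Only documents which were valid before and fail now are reported, since those are the ones a schema change could have broken.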

Comment entered 2017-03-13 17:39:10 by Juthe, Robin (NIH/NCI) [E]

I was going to suggest spot-checking some of the older citations to make sure they are still valid, but if you can write a program to do that on a larger scale without too much difficulty, then I think you should.

Did you try linking any of the new citations in a summary and validating the summary? I can't imagine that wouldn't work, but it would be worth confirming.

Otherwise, I can't think of anything else on the user end.

Comment entered 2017-03-13 17:41:44 by Osei-Poku, William (NIH/NCI) [C]

I can't think of any tests besides the validity and publishability of the imported document, which I am able to test and verify manually. What I have not done is add the citation to summaries and make the summaries publishable.

Comment entered 2017-03-13 17:45:35 by Osei-Poku, William (NIH/NCI) [C]

I didn't go as far as testing that but I can do that now just to make sure. I can also run some of the citations reports to make sure they are running okay.

Comment entered 2017-03-13 19:13:52 by Osei-Poku, William (NIH/NCI) [C]

I linked a new citation to a summary, checked PP, QC Reports and other citations reports and they all checked out okay.

Comment entered 2017-03-14 08:21:39 by Kline, Bob (NIH/NCI) [C]

The script to validate the Citation documents on DEV has finished. Of the 44,287 documents checked (I didn't look at any blocked documents, if indeed there were any), nine had validation errors, unrelated to the schema change. None of the nine documents had been marked as valid before the schema change.

CDR54984: Missing required attribute cdr:ref in element FullTextArticle
CDR586713: Expected child elements for empty Author element of type Author
CDR610077: Missing required attribute cdr:ref in element FullTextArticle
CDR732225: Missing required attribute cdr:ref in element FullTextArticle
CDR750673: Missing required attribute cdr:ref in element FullTextArticle
CDR758791: Missing required attribute cdr:ref in element FullTextArticle
CDR762086: Missing required attribute cdr:ref in element FullTextArticle
CDR767786: Missing required attribute cdr:ref in element FullTextArticle
CDR772261: Missing required attribute cdr:ref in element FullTextArticle
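For a quick overview of a report like the one above, the errors can be grouped by message. The sketch below is a minimal example which assumes the "ID: message" line format shown in the list; the tally_errors name is illustrative.

```python
from collections import Counter

REPORT = """\
CDR54984: Missing required attribute cdr:ref in element FullTextArticle
CDR586713: Expected child elements for empty Author element of type Author
CDR610077: Missing required attribute cdr:ref in element FullTextArticle
CDR732225: Missing required attribute cdr:ref in element FullTextArticle
CDR750673: Missing required attribute cdr:ref in element FullTextArticle
CDR758791: Missing required attribute cdr:ref in element FullTextArticle
CDR762086: Missing required attribute cdr:ref in element FullTextArticle
CDR767786: Missing required attribute cdr:ref in element FullTextArticle
CDR772261: Missing required attribute cdr:ref in element FullTextArticle
"""


def tally_errors(report):
    """Count report lines per error message, ignoring the document ID."""
    counts = Counter()
    for line in report.splitlines():
        # Split on the first ": " only; "cdr:ref" has no space after its colon.
        doc_id, sep, message = line.partition(": ")
        if sep:
            counts[message] += 1
    return counts


for message, count in tally_errors(REPORT).most_common():
    print(f"{count} x {message}")
```

Eight of the nine failures share the same missing cdr:ref attribute on FullTextArticle, and the remaining one is an empty Author element.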

Comment entered 2017-03-14 08:22:08 by Dugan, Amy (NIH/NCI) [C]

Thanks for testing!

Comment entered 2017-03-14 08:36:09 by Dugan, Amy (NIH/NCI) [C]

Let's plan to include this schema update with the Einstein deployment.

Comment entered 2017-03-14 08:54:02 by Kline, Bob (NIH/NCI) [C]

I have deployed the update to QA and inserted it into the deployment set for production. As noted above, STAGE will not have this change until we complete and deploy the rest of the work for this ticket.

Comment entered 2017-08-08 13:04:37 by Kline, Bob (NIH/NCI) [C]

On all tiers.

Attachments
File Name Posted User
pubmed_170101.dtd 2017-03-10 08:10:25 Kline, Bob (NIH/NCI) [C]
