CDR Tickets

Issue Number 3253
Summary Validation Errors in CDR 62855
Created 2010-10-22 08:53:58
Issue Type Improvement
Submitted By Juthe, Robin (NIH/NCI) [E]
Assigned To alan
Status Closed
Resolved 2010-11-12 11:57:21
Resolution Fixed
Path /home/bkline/backups/jira/ocecdr/issue.107581
Description

BZISSUE::4943
BZDATETIME::2010-10-22 08:53:58
BZCREATOR::Robin Juthe
BZASSIGNEE::Alan Meyer
BZQACONTACT::William Osei-Poku

Putting in an issue for a topic discussed in yesterday's CDR meeting about validation errors in CDR62855 on Bach. The errors appear to be related to Summary Fragment Refs within the document that link to text that is in Proposed Deletion tags.

Please revise the component if necessary.

Comment entered 2010-10-25 09:04:12 by Juthe, Robin (NIH/NCI) [E]

BZDATETIME::2010-10-25 09:04:12
BZCOMMENTOR::Robin Juthe
BZCOMMENT::1

I'm upping the priority on this issue because these validation errors are holding up our ability to republish the document in order to run the October mailer.

Alan, if you are not the correct assignee, please change this. Thank you!

Comment entered 2010-10-25 13:13:53 by Englisch, Volker (NIH/NCI) [C]

BZDATETIME::2010-10-25 13:13:53
BZCOMMENTOR::Volker Englisch
BZCOMMENT::2

Alan, since Robin said this problem is holding up stuff I took a look at the document since you are probably still recovering from Orbs. :-)

Robin, the target to the fragment _1234 is completely missing from the document.
There is the proposed deleted summary section which has the fragment-ID=_2104 and the proposed inserted text with the fragment-ID=_2064.

The summary fragment link in the proposed inserted text does correctly link to _2064 but the summary fragment link in the proposed deleted text will need to link to _2104 for the validation to succeed.

Unfortunately, I can't tell why the fragment-ID=_1234 disappeared because it obviously used to exist (since I'm assuming you didn't manually change the link-ID).

Please try to change the fragment-ID from _1234 to _2104 and try to revalidate.

The second error is similar: the link to fragment-ID=_2072 in the changes section points to the proposed inserted text instead of the proposed deleted text. You probably want to put the text in the changes section within proposed inserted markup.

The same is true for the third error linking to the paragraph 'Retrospective and prospective studies...'

The errors 4-6 are a direct result of the first three errors.

Comment entered 2010-10-25 13:27:41 by Juthe, Robin (NIH/NCI) [E]

BZDATETIME::2010-10-25 13:27:41
BZCOMMENTOR::Robin Juthe
BZCOMMENT::3

Thanks! I've asked Aman to correct these three summary fragment refs and then we will try revalidating the document.

Comment entered 2010-10-25 15:33:50 by alan

BZDATETIME::2010-10-25 15:33:50
BZCOMMENTOR::Alan Meyer
BZCOMMENT::4

Sorry, I was planning to look at this tomorrow and didn't see the priority change until just now, and I also see that Volker has taken care of it.

Comment entered 2010-10-25 15:36:55 by Englisch, Volker (NIH/NCI) [C]

BZDATETIME::2010-10-25 15:36:55
BZCOMMENTOR::Volker Englisch
BZCOMMENT::5

(In reply to comment #4)
> I also see that Volker has taken care of it.

"Has taken care of it" is an overstatement but we're getting there.

Comment entered 2010-10-25 16:30:31 by Juthe, Robin (NIH/NCI) [E]

BZDATETIME::2010-10-25 16:30:31
BZCOMMENTOR::Robin Juthe
BZCOMMENT::6

A valid publishable version has been created, and Bonnie just ran a successful mailer job.

The validation including proposed markup is still finding errors in Summary Fragment Refs though. I'm adjusting the priority since it is no longer holding up our other work, but I'm still troubled by these errors.

Comment entered 2010-10-25 19:02:32 by alan

BZDATETIME::2010-10-25 19:02:32
BZCOMMENTOR::Alan Meyer
BZCOMMENT::7

I'll look into the errors tomorrow (Tuesday, Oct. 26).

Comment entered 2010-10-26 22:39:37 by alan

BZDATETIME::2010-10-26 22:39:37
BZCOMMENTOR::Alan Meyer
BZCOMMENT::8

I believe I found the cause of the error messages reported in
proposed Deletion markup in CDR62855. It is a bug in the CDR
link validation software in the CdrServer. It's a bug that has
existed for a long time but didn't become a real problem until we
modified the link validation a couple of years ago to support
link checking that used publishable versions instead of the
current working documents.

The basic problem is this:

Validation of links is done by reading tables of link data in
the database. That is the wrong thing to do when validating
internal links, i.e., a link from inside the document to
another place in the same document. In those cases we should
be checking the document for internal link consistency and
ignoring the database.

The validation problem occurs because the last publishable
version has links to a number of cdr:ids that are in sections
that will be deleted. When we delete them, e.g., when
validating with proposed deletions applied, the resultant
document is missing cdr:ids that the publishable version
still links to.
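
A minimal sketch of the failure mode described above, not the actual
CdrServer code. It assumes a simplified link table (the hypothetical
LinkRow) and hypothetical fragment IDs _10 and _20, and contrasts a
check that trusts stored rows for internal links with one that skips
them:

```python
from typing import NamedTuple


class LinkRow(NamedTuple):
    """One row of a hypothetical link table (not the real CDR schema)."""
    source_doc: int
    target_doc: int
    target_frag: str


def stale_targets(doc_id, frag_ids, link_table, skip_internal):
    """Return stored rows that point at fragments missing from the document.

    frag_ids is the set of cdr:id values present in the document being
    validated, after any deletion markup has been applied.
    """
    errors = []
    for row in link_table:
        if row.target_doc != doc_id:
            continue  # row targets some other document
        if skip_internal and row.source_doc == doc_id:
            continue  # internal link: the document itself is the authority
        if row.target_frag not in frag_ids:
            errors.append(row)
    return errors


# Rows recorded for the last publishable version (hypothetical values).
link_table = [
    LinkRow(62855, 62855, "_10"),
    LinkRow(62855, 62855, "_20"),
]
# The section carrying _10 has been removed by proposed deletion markup.
ids_after_deletion = {"_20"}

with_db = stale_targets(62855, ids_after_deletion, link_table, skip_internal=False)
without_db = stale_targets(62855, ids_after_deletion, link_table, skip_internal=True)
```

With skip_internal=False the stale row for _10 is reported even though
the edited document no longer contains any reference to it.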

Implementing a fix will require some analysis and design. It's
not a rewrite of a few lines of code. It will have to be
thought out.

I'll work on that on Thursday, but this is pretty central code
and it may be a couple of weeks before we have a fix that's
tested and ready to put into production.

I'm not sure how the lack of a fix will affect publishing in the
meantime. In case of dire need I can think of a workaround that
will enable us to make a publishable document even without a fix.
I believe that we can circumvent the problem by applying the
deletion markup in two stages as follows:

1. In stage 1, delete the references from within the document
to the cdr:ids within the document, but DO NOT delete the
referenced cdr:ids, i.e., the targets.

2. Save a publishable version. The document will validate
because the cdr:ids that the old publishable version
references will still be there.

3. In stage 2, delete the referenced, target cdr:ids. The
just saved publishable version will not reference them, so
they can now be removed without causing any validation
errors.

4. Save a publishable version. This is the one we'll
actually publish.
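
The reasoning behind the two-stage workaround can be sketched with a
toy model. This is an illustration only: the Version class, the
fragment IDs, and the rule in can_save_publishable are assumptions
that mimic the behavior described in this comment, not the CdrServer
logic.

```python
class Version:
    """Toy document version: the cdr:ids it defines and the ones it links to."""

    def __init__(self, ids, refs):
        self.ids = set(ids)
        self.refs = set(refs)


def can_save_publishable(last_pub, candidate):
    """Model of the described behavior (an assumption, not real code).

    The candidate fails if the last publishable version still links to an
    id the candidate no longer has, or if the candidate itself has a
    dangling internal link.
    """
    stale = last_pub.refs - candidate.ids
    dangling = candidate.refs - candidate.ids
    return not stale and not dangling


# Last publishable version: section _b exists and is linked to internally.
last_pub = Version(ids={"_a", "_b"}, refs={"_b"})

# Deleting the link and its target in one step trips the stale-link check.
one_step = Version(ids={"_a"}, refs=set())
one_step_ok = can_save_publishable(last_pub, one_step)

# Stage 1: remove only the link, keep the target, save publishable.
stage1 = Version(ids={"_a", "_b"}, refs=set())
stage1_ok = can_save_publishable(last_pub, stage1)

# Stage 2: the newest publishable version no longer links to _b,
# so the target can now be removed as well.
stage2 = Version(ids={"_a"}, refs=set())
stage2_ok = can_save_publishable(stage1, stage2)
```

Under these assumptions the one-step save is rejected while both
stages of the workaround are accepted.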

Here is a second workaround that should also work:

1. Apply the deletion markup.

2. Save the modified document without making it publishable.
This creates index entries in the database that do not
have the links to the missing cdr:ids.

3. Use the administrative interface to modify the link type
definition for SummaryFragmentRef, and any other link
types involved. The modification is to change the "Link
Target Is" property from "Published version" to either of
the other values.

4. Save again as a publishable version. Validation now works
because we no longer require the last publishable version
to be free of references to the missing ids.

5. Use the admin interface to restore the link type
definition to the correct value.

I hope to get everything working in time to avoid using
workarounds, but I wanted to document them here in case a new
version of a document with this problem has to be published before
the modified software is ready.

There is a secondary problem that is similar to this one and has
a similar cause, but we may never actually encounter it because
it can't be generated through XMetal. I only discovered it
because I wrote a test program that didn't use XMetal in order to
better understand the XMetal problem, and it hit the second bug.
The second bug is:

If a document has no doc ID, it is internally assigned a
document ID of 0 during link validation. An internal link,
for example to "CDR0000062855#_2064", thus appears to be a
link from CDR0000000000 -> CDR0000062855. The software, not
knowing that it has 62855 in its hands, checks the indexes in
the database and discovers, in this particular case, that the
publishable version of 62855 doesn't have fragment _2064,
which was only added in the current working document.
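
A small sketch of why a document ID of 0 makes an internal link look
external. The helper names parse_link and is_internal are hypothetical,
and the link format is inferred from the example above:

```python
import re

# "CDR" followed by digits, optionally "#" and a fragment ID.
LINK_RE = re.compile(r"^CDR(\d+)(?:#(.+))?$")


def parse_link(href):
    """Split a link such as 'CDR0000062855#_2064' into (doc_id, fragment)."""
    match = LINK_RE.match(href)
    if match is None:
        raise ValueError(f"not a CDR link: {href!r}")
    return int(match.group(1)), match.group(2)


def is_internal(current_doc_id, href):
    """A link is internal when it targets the document being validated."""
    target_doc, _fragment = parse_link(href)
    return target_doc == current_doc_id


href = "CDR0000062855#_2064"
known_id = is_internal(62855, href)   # document ID supplied by the caller
unknown_id = is_internal(0, href)     # no doc ID, so 0 is used internally
```

When the caller supplies 62855 the link is recognized as internal; with
the placeholder ID 0 it is treated as a link to another document and
would be checked against the database instead.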

I doubt that this second problem can ever occur in production
because we probably always have a CDR document ID when validating
for real. However there may be some situation I haven't thought
about where we have a string of XML to validate without knowing
its doc ID. I think the fix is the same as for the first problem
so I might as well fix both together.

Comment entered 2010-10-27 09:48:56 by Englisch, Volker (NIH/NCI) [C]

BZDATETIME::2010-10-27 09:48:56
BZCOMMENTOR::Volker Englisch
BZCOMMENT::9

(In reply to comment #8)
> I'm not sure how the lack of a fix will affect publishing in the
> meantime. In case of dire need I can think of a work around that
> will enable us to make a publishable document even without a fix.

It's not a problem for publishing. It is only a problem when testing QC reports. Since board managers are using proposed markup more and more they also want to check if the used links are valid or not (after all, we do provide the option of link validation with proposed markup).

Comment entered 2010-11-02 23:42:15 by alan

BZDATETIME::2010-11-02 23:42:15
BZCOMMENTOR::Alan Meyer
BZCOMMENT::10

I think I have this fixed.

It wasn't as broken as I thought. I wound up changing just one
line of code, though it took me some time to figure out what that
line was.

I've installed a new CdrServer on Mahler with the fix. I tested
by modifying a Summary document to delete a section that was
referenced from inside the same document, then validated. It
appeared to work properly.

Here are the specifics of my test:

Called up CDR62856 in XMetal on Mahler.

Located a cdr:href="CDR0000062856#_179" in the document.

Located the section on "Prognostic factors", which has the
cdr:id="_179".

Wrapped the section with the cdr:id in Deletion markup, with
RevisionLevel="proposed".

Tried link validation with "Include approved markup".
  Passed.
  The deletion was not approved, so it was not deleted
  from the point of view of validation.

Tried link validation with "Include approved and proposed markup".
  Failed.
  The deletion was performed prior to validation, so
  the href link failed.

Wrapped the element containing the href in Deletion markup
with RevisionLevel="approved".

Tried link validation with "Include approved markup".
  Passed.

Tried link validation with "Include approved and proposed markup".
  Passed.
  This is the one that failed on Bach with CDR62855.

I did not save my changes to CDR62856.
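
The test matrix above can be reproduced with a short sketch. It is a
simplified model, not the CdrServer validator: the namespace URI, the
Summary, Para, SummarySection and SummaryFragmentRef element names, and
both helper functions are assumptions made for illustration.

```python
import copy
import xml.etree.ElementTree as ET

CDR_NS = "cips.nci.nih.gov/cdr"  # assumed namespace URI for the cdr: prefix
ID = f"{{{CDR_NS}}}id"
HREF = f"{{{CDR_NS}}}href"


def apply_deletions(root, levels):
    """Return a copy with Deletion elements at the given levels removed."""
    root = copy.deepcopy(root)
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == "Deletion" and child.get("RevisionLevel") in levels:
                parent.remove(child)
    return root


def broken_internal_links(root, doc_id):
    """List internal hrefs whose fragment ID is absent from the document."""
    ids = {el.get(ID) for el in root.iter() if el.get(ID)}
    prefix = f"CDR{doc_id:010d}#"
    return [
        el.get(HREF)
        for el in root.iter()
        if el.get(HREF, "").startswith(prefix)
        and el.get(HREF)[len(prefix):] not in ids
    ]


REF = '<SummaryFragmentRef cdr:href="CDR0000062856#_179">see</SummaryFragmentRef>'


def build(ref_deleted):
    """Build the toy document, optionally with the link in approved Deletion."""
    ref = f'<Deletion RevisionLevel="approved">{REF}</Deletion>' if ref_deleted else REF
    return ET.fromstring(
        f'<Summary xmlns:cdr="{CDR_NS}"><Para>{ref}</Para>'
        '<Deletion RevisionLevel="proposed">'
        '<SummarySection cdr:id="_179"/></Deletion></Summary>'
    )


APPROVED = {"approved"}
BOTH = {"approved", "proposed"}

# Only the target section is wrapped in proposed Deletion markup.
doc1 = build(ref_deleted=False)
r1 = broken_internal_links(apply_deletions(doc1, APPROVED), 62856)
r2 = broken_internal_links(apply_deletions(doc1, BOTH), 62856)

# The link is additionally wrapped in approved Deletion markup.
doc2 = build(ref_deleted=True)
r3 = broken_internal_links(apply_deletions(doc2, APPROVED), 62856)
r4 = broken_internal_links(apply_deletions(doc2, BOTH), 62856)
```

An empty list corresponds to "Passed" in the steps above, and a
non-empty list to "Failed". Because the check looks only at the
document in hand, the last case passes without consulting any stored
link data.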

I'm going to declare this resolved-fixed, ready for QA testing on
Mahler.

It may be difficult to test since XMetal users don't have an easy
way to search for attributes.

If it's too difficult to test this on Mahler, I can help out, or
I can install it on Bach alongside the production server and run
it on a different port, not 2010. Or I can copy CDR62855 or other
documents from Bach to Mahler.

Comment entered 2010-11-03 13:16:43 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2010-11-03 13:16:43
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::11

(In reply to comment #10)
> If it's too difficult to test this on Mahler, I can help out, or
> I can install it on Bach alongside the production server and run
> it on a different port, not 2010. Or I can copy CDR62855 or other
> documents from Bach to Mahler.

We tested this on Mahler and it seems to be working fine without errors. Please promote to Bach.

Comment entered 2010-11-04 11:15:37 by alan

BZDATETIME::2010-11-04 11:15:37
BZCOMMENTOR::Alan Meyer
BZCOMMENT::12

(In reply to comment #11)
> We tested this on Mahler and it seems to be working fine without errors. Please
> promote to Bach.

Promoting the code requires a replacement of the CdrServer. I'll do that tonight when users are no longer heavily using the system.

Comment entered 2010-11-04 22:51:01 by alan

BZDATETIME::2010-11-04 22:51:01
BZCOMMENTOR::Alan Meyer
BZCOMMENT::13

I promoted the new CdrServer to Bach and Franck.

I tested on each system by validating CDR62855 with and without proposed markup. Everything was fine on Bach. 17 validation errors were reported on Franck when using proposed markup. The software should be exactly the same on the two servers, so I presume that the version of 62855 on Franck is an earlier version for which many proposed citation links had been added, but the citations did not yet have publishable versions in the older Franck database.

If I am mistaken about that, please let me know and I'll investigate. Otherwise I think everything should be fine and we can close the issue.

Comment entered 2010-11-12 11:57:21 by Juthe, Robin (NIH/NCI) [E]

BZDATETIME::2010-11-12 11:57:21
BZCOMMENTOR::Robin Juthe
BZCOMMENT::14

Thanks, Alan. I verified that the document validates on Bach with and without proposed markup. Closing issue.
