CDR Tickets

Issue Number 3340
Summary Global Change Protocol Links - Check for old schema/filter mismatch errors
Created 2011-04-07 20:41:15
Issue Type Bug
Submitted By alan
Assigned To alan
Status Closed
Resolved 2011-09-22 09:49:07
Resolution Fixed
Path /home/bkline/backups/jira/ocecdr/issue.107668
Description

BZISSUE::5033
BZDATETIME::2011-04-07 20:41:15
BZCREATOR::Alan Meyer
BZASSIGNEE::Alan Meyer
BZQACONTACT::William Osei-Poku

OCECDR-3324 dealt with a problem in the global change for protocol terminology links. It turned out that the XSLT filter used to add or replace terms in a protocol was designed to work with InScopeProtocol and CTGovProtocol schemas that changed after the program went into production. The schema changes did not prevent the global change program from running, but did cause it to do the wrong thing in some cases.

The purpose of this task is to analyze old protocol versions to determine whether any of the global changes produced incorrect differences between the pre-change and post-change versions. If they did, the next step will be to look for ways to find and fix the errors automatically or, if they can't be fixed automatically, at least to find and report them.

Comment entered 2011-07-28 23:53:10 by alan

BZDATETIME::2011-07-28 23:53:10
BZCOMMENTOR::Alan Meyer
BZCOMMENT::1

I wrote a draft program that does the following:

  Locate all documents and versions that were at risk for loss of data elements.

  For each one:
      Get the pre-change version.
      Count the elements that were at risk for disappearance.
      Get the post-change version.
      Count the same elements.
      If the counts differ:
          Report the CDR ID and version number.
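The detection loop above could look roughly like the Python sketch below. It is not the program that was actually run. It assumes the pre-change and post-change XML for each version has already been fetched from the CDR, and the fetching step is omitted. The `AT_RISK_TAGS` tuple, which lists only Gender, is a hypothetical stand-in for the real list of at-risk elements. Element counting uses the standard library's `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

# HYPOTHETICAL: the real list of at-risk element names is not given here.
AT_RISK_TAGS = ("Gender",)


def count_at_risk(xml_text, tags=AT_RISK_TAGS):
    """Count occurrences of each at-risk element anywhere in one version."""
    root = ET.fromstring(xml_text)
    return {tag: sum(1 for _ in root.iter(tag)) for tag in tags}


def find_losses(version_pairs, tags=AT_RISK_TAGS):
    """Compare pre-change and post-change counts for each version.

    version_pairs is an iterable of (cdr_id, version, pre_xml, post_xml).
    Returns (cdr_id, version, pre_counts, post_counts) for every pair
    whose counts differ.
    """
    report = []
    for cdr_id, version, pre_xml, post_xml in version_pairs:
        before = count_at_risk(pre_xml, tags)
        after = count_at_risk(post_xml, tags)
        if before != after:
            report.append((cdr_id, version, before, after))
    return report
```

The function compares whole count dictionaries, so a change in any listed element is reported, not just a change in Gender.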

I ran it on Bach and found 0 errors.

That is either terrific news, or else it's too good to be true.

I'll go over the program next week and, if I don't see any problems with it myself, I'll walk through it with Bob or Volker to see whether they spot any holes through which errors might have slipped undetected.

Comment entered 2011-08-12 00:00:21 by alan

BZDATETIME::2011-08-12 00:00:21
BZCOMMENTOR::Alan Meyer
BZCOMMENT::2

It was too good to be true (sigh).

As near as I can tell, 192 Gender elements were lost in InScopeProtocols. Of these, one (CDR0000065893) lost Gender="Female"; all of the rest lost Gender="Both".

I didn't find any errors in CTGovProtocols.

The next thing is to decide what to do about it. I can write a global change to put the Gender element back in each of them. I'd probably do the one with Gender=Female by hand.

Some of the 192 are probably now blocked, so we have to decide whether to fix those as well.

I'll wait for a decision from Margaret, or from someone else who understands the significance of the losses better than I do, before doing any more.

Comment entered 2011-08-25 15:39:20 by alan

BZDATETIME::2011-08-25 15:39:20
BZCOMMENTOR::Alan Meyer
BZCOMMENT::3

We decided at the status meeting today to proceed with a global change to fix the 192 InScopeProtocols. We'll fix all of them, whether or not they are currently blocked (it's no extra labor to fix them all).

I think I may as well do the work for the global under this issue rather than creating a new one.

Comment entered 2011-08-25 22:16:10 by alan

BZDATETIME::2011-08-25 22:16:10
BZCOMMENTOR::Alan Meyer
BZCOMMENT::4

I wrote the global change, experimented with it, learned a bit,
modified it, and ran it again in test mode on Mahler.

The test results can be seen here:

http://mahler.nci.nih.gov/cgi-bin/cdr/ShowGlobalChangeTestResults.py?dir=2011-08-25_20-56-56

The log file for the run, which also has interesting information,
is attached.

The global does the following:

  Select the 192 documents identified by the previous program
  run. These are documents that lost a Gender element in the
  course of a previous global change.

  For each document in the list:

      For each unique version in (CWD, LASTV, or LASTP):

          Retrieve the XML from the CDR server.

          Check to see if it is a CTGovProtocol. These are
          protocols that were once InScopeProtocols but have
          been transferred to CTGov.

          If it is a CTGovProtocol, skip it.

          Else:

              Check to see if it already has a Gender element.
              I believe that these are documents that lost a
              Gender element during a global change and someone
              noticed that it was missing Gender and added it
              back.

              If it has a Gender element, skip it.

              Else:

                  Insert a Gender element.
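The per-version decision in the global change could be sketched in Python as shown below. This is an illustration under stated assumptions, not the actual global change code. The functions `fix_version` and `fix_document` are hypothetical, and so is the `parent_tag` default of `Eligibility`, because the ticket does not say where Gender sits in the InScopeProtocol schema. Retrieval from the CDR server is omitted. The sketch approximates "unique version" by comparing the XML text of the CWD, LASTV, and LASTP copies. The default value "Both" reflects comment #2, which reports that all but one of the lost values were Both.

```python
import xml.etree.ElementTree as ET


def fix_version(xml_text, gender="Both", parent_tag="Eligibility"):
    """Decide what to do with one document version; return (action, xml).

    action is "skip-ctgov", "skip-has-gender", or "added".
    parent_tag is a HYPOTHETICAL placeholder for Gender's real parent.
    """
    root = ET.fromstring(xml_text)
    # Protocols already transferred to CTGov are left alone.
    if root.tag == "CTGovProtocol":
        return "skip-ctgov", xml_text
    # Someone may already have restored Gender by hand.
    if root.find(".//Gender") is not None:
        return "skip-has-gender", xml_text
    parent = root.find(".//" + parent_tag)
    if parent is None:
        raise ValueError("no %s element to hold Gender" % parent_tag)
    # SubElement appends Gender as the last child of parent.
    ET.SubElement(parent, "Gender").text = gender
    return "added", ET.tostring(root, encoding="unicode")


def fix_document(versions):
    """Apply fix_version once per distinct version of one document.

    versions maps a label ("CWD", "LASTV", "LASTP") to that version's XML.
    Labels with identical XML share a single result (a simplification).
    """
    results = {}
    processed = {}
    for label, xml_text in versions.items():
        if xml_text not in processed:
            processed[xml_text] = fix_version(xml_text)
        results[label] = processed[xml_text]
    return results
```

`ET.SubElement` always appends the new element as the last child of its parent. If the schema requires Gender at a specific position among its siblings, a real fix would need to insert it there instead so that the document stays valid.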

NOTES on skipped versions (CTGovProtocols or InScopeProtocols with
restored Gender elements):

Skipped versions show up in the log file with the word
"skipping".

Skipped versions show up in the Global Change Test Results
with a 20-character .diff file containing the string:

" – No differences – "

The results were as follows on Mahler:

26 document versions were found to be CTGovProtocols.

52 document versions were found to already have a Gender element.

397 document versions had a Gender element added.

The total is larger than 192 because many documents had two or
three different versions that were examined.
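The totals can be checked quickly, assuming that each examined version fell into exactly one of the three outcomes listed above:

```python
ctgov_skipped = 26        # versions that were CTGovProtocols
already_had_gender = 52   # versions that already had a Gender element
gender_added = 397        # versions that had a Gender element added
documents = 192

versions_examined = ctgov_skipped + already_had_gender + gender_added
print(versions_examined)                        # 475
print(round(versions_examined / documents, 2))  # 2.47
```

An average of about 2.5 examined versions per document is consistent with many documents having two or three distinct versions.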

Some of the documents that had a Gender element added might
actually be CTGovProtocol documents today, but were transferred
to CTGov before we adopted our present policy of retaining the
CDR ID for the CTGov version. I didn't check for that situation;
doing so would have added some complexity that I'd rather avoid
unless we need it.

The output is ready for test inspection. We need to determine:

1. Is what was done correct?

2. Do we need to do more?

The "more" I'm thinking of is to do something about the
documents that were transferred to CTGov without a Gender
element.

I presume that we aren't allowed to do anything with the
CTGov versions of those documents because we no longer own
them. And I presume that, for the 26 CTGov documents that
showed up in this list, there's no particular value in
transforming the old InScopeProtocol versions that existed
prior to the change.

3. Do we need to do less?

The program DOES transform old InScopeProtocols that were
transferred to CTGov and given a new CDR ID. Is it wrong to
transform them to a state that they didn't have at the point
of transfer to CTGov?

I'm guessing that it really doesn't matter one way or the
other. It is still possible to find the state of the
document at the time of the transfer using the version
archive.

Comment entered 2011-08-25 22:16:10 by alan

Attachment Request5033a.log has been added with description: Log file from test run on Mahler

Comment entered 2011-09-01 21:29:08 by alan

BZDATETIME::2011-09-01 21:29:08
BZCOMMENTOR::Alan Meyer
BZCOMMENT::5

Our decision at today's status meeting was to run the global change as is, neither more nor less.

I'll do that when William completes testing.

Comment entered 2011-09-06 14:51:13 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2011-09-06 14:51:13
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::6

(In reply to comment #5)
> Our decision at today's status meeting was to run the global change as is,
> neither more nor less.
>
> I'll do that when William completes testing.

Verified. Please run in live mode on Mahler.

Comment entered 2011-09-07 00:21:18 by alan

BZDATETIME::2011-09-07 00:21:18
BZCOMMENTOR::Alan Meyer
BZCOMMENT::7

(In reply to comment #6)

> Verified. Please run in live mode on Mahler.

Done. The log file is attached.

Comment entered 2011-09-07 00:21:18 by alan

Attachment Request5033b.log has been added with description: Log file from live run on Mahler

Comment entered 2011-09-07 12:59:16 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2011-09-07 12:59:16
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::8

(In reply to comment #7)
> Created attachment 2152
> Log file from live run on Mahler
>
> (In reply to comment #6)
>
> > Verified. Please run in live mode on Mahler.
>
> Done. The log file is attached.

Verified on Mahler. Please run in test mode on Bach.

Comment entered 2011-09-08 14:48:36 by alan

BZDATETIME::2011-09-08 14:48:36
BZCOMMENTOR::Alan Meyer
BZCOMMENT::9

I've run the global in test mode on Bach. The log file is attached.

In comment #2 I noted that one document, CDR0000065893, lost Gender=Female; all the rest were Gender=Both. I put a test for that document in the program, but it turns out that the protocol is now a CTGovProtocol, so I didn't process it. However, someone must have caught the loss at some point, since the document now has CTGender=Female.

It's ready for review at:

http://bach.nci.nih.gov/cgi-bin/cdr/ShowGlobalChangeTestResults.py?dir=2011-09-08_14-19-08

Comment entered 2011-09-08 14:48:36 by alan

Attachment Request5033test.log has been added with description: Log file from test run on Bach

Comment entered 2011-09-14 05:51:19 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2011-09-14 05:51:19
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::10

(In reply to comment #9)
> Created attachment 2154
> Log file from test run on Bach
>
> I've run the global in test mode on Bach. The log file is attached.
>
> In comment #2 I noted that one document, CDR0000065893, lost Gender=Female.
> All the rest were Gender=Both. I put a test for that document in the program
> but it turns out that the protocol is now a CTGovProtocol, so I didn't process
> it. However someone caught it somewhere since CTGender=Female.
>
> It's ready for review at:
>
> http://bach.nci.nih.gov/cgi-bin/cdr/ShowGlobalChangeTestResults.py?dir=2011-09-08_14-19-08

Verified. Please run in live mode on Bach.

Comment entered 2011-09-20 15:28:55 by alan

BZDATETIME::2011-09-20 15:28:55
BZCOMMENTOR::Alan Meyer
BZCOMMENT::11

I've run the fix in live mode on Bach.

The log file is attached.

Comment entered 2011-09-20 15:28:55 by alan

Attachment Request5033.log has been added with description: Log file from live run on Bach

Comment entered 2011-09-21 13:44:02 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2011-09-21 13:44:02
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::12

(In reply to comment #11)
> Created attachment 2159
> Log file from live run on Bach
>
> I've run the fix in live mode on Bach.
>
> The log file is attached.

Verified this on Bach. We'll fix the validation errors.
I've marked this issue as Resolved. Should I go ahead and close it?

Comment entered 2011-09-21 15:33:56 by alan

BZDATETIME::2011-09-21 15:33:56
BZCOMMENTOR::Alan Meyer
BZCOMMENT::13

(In reply to comment #12)
> (In reply to comment #11)
> > Created attachment 2159
> > Log file from live run on Bach
> >
> > I've run the fix in live mode on Bach.
> >
> > The log file is attached.
>
> Verified this on Bach. We'll fix the validation errors.
> I've marked this issue as Resolved. Should I go ahead and close it?

I think we're done. You can close it.

Comment entered 2011-09-22 09:49:07 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2011-09-22 09:49:07
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::14

(In reply to comment #13)
> I think we're done. You can close it.

Bug closed. Thank you!

Attachments
File Name              Posted                 User
Request5033.log        2011-09-20 15:28:55
Request5033a.log       2011-08-25 22:16:10
Request5033b.log       2011-09-07 00:21:18
Request5033test.log    2011-09-08 14:48:36
