CDR Tickets

Issue Number 3045
Summary Global for Trials that don't need to be transferred
Created 2009-12-17 16:53:36
Issue Type Improvement
Submitted By Osei-Poku, William (NIH/NCI) [C]
Assigned To alan
Status Closed
Resolved 2010-04-19 14:17:08
Resolution Fixed
Path /home/bkline/backups/jira/ocecdr/issue.107373
Description

BZISSUE::4721
BZDATETIME::2009-12-17 16:53:36
BZCREATOR::William Osei-Poku
BZASSIGNEE::Alan Meyer
BZQACONTACT::William Osei-Poku

We need a global to assign a value of “Transfer not required” in the CTGovOwnershipTransferContactLog block for all InScope protocols that have a status of “Completed” and a completion date prior to 2007-09-27.

The global will look for:

1. All InScope protocols
2. That have a status of “Completed”
3. With a CompletionDate that is prior to 2007-09-27
4. Insert a CTGovOwnershipTransferContactLog block
5. Assign a value of “Transfer not required” in the CTGovOwnershipTransferContactResponse field
6. Insert the current date (the day the global is run) in the Date field.
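Steps 4 to 6 can be sketched with Python's standard `xml.etree.ElementTree`. This is an illustration under assumptions, not the CDR implementation. It assumes the new block is appended directly under the document root, and it writes the subelements in the order shown in Kim's email below (response, then date). The helper name `add_transfer_not_required` is hypothetical.

```python
import datetime
import xml.etree.ElementTree as ET
from typing import Optional


def add_transfer_not_required(protocol: ET.Element,
                              run_date: Optional[datetime.date] = None) -> ET.Element:
    """Append a CTGovOwnershipTransferContactLog block (steps 4-6).

    Assumption: the block goes directly under the document root; the real
    schema may require a specific position.
    """
    run_date = run_date or datetime.date.today()  # step 6: the day the global runs
    log = ET.SubElement(protocol, "CTGovOwnershipTransferContactLog")  # step 4
    ET.SubElement(log, "CTGovOwnershipTransferContactResponse").text = (
        "Transfer not required")  # step 5
    ET.SubElement(log, "Date").text = run_date.isoformat()  # step 6
    return log


doc = ET.fromstring("<InScopeProtocol/>")
add_transfer_not_required(doc, datetime.date(2009, 12, 28))
print(ET.tostring(doc, encoding="unicode"))
```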

I am including the email exchange between Lakshmi and Kim about this issue:

From: Eckley, Kimberly A kimberly.a.eckley@lmco.com
Sent: Wednesday, November 25, 2009 12:31 PM
To: Grama, Lakshmi (NIH/NCI) [E]; Jackson, Andrea (NIH/NCI) [C]
Cc: Leech, Mark J; Beckwith, Margaret (NIH/NCI) [E]; Osei-Poku, William
Subject: RE: Trials that don't need to be transferred

Yes! Please! Here are the elements we can update to facilitate this. If it can be a global, that will help immensely.

<CTGovOwnershipTransferContactLog>
<CTGovOwnershipTransferContactResponse> ---> insert “Transfer not required”
<Date>---> insert Date of global program

Kimberly A. Eckley, MA, PMP
Lockheed Martin (Contractor)
Program Manager - NCI-Clinical Trials Reporting Office Support

301-519-6511 - office
kimberly.a.eckley@lmco.com

From: Grama, Lakshmi (NIH/NCI) [E] lgrama@mail.nih.gov
Sent: Wednesday, November 25, 2009 12:28 PM
To: Eckley, Kimberly A; Jackson, Andrea (NIH/NCI) [C]
Cc: Leech, Mark J; Beckwith, Margaret (NIH/NCI) [E]; 'Osei-Poku, William'
Subject: Trials that don't need to be transferred

I looked at the large transfer related report today and found 1390 trials that have a completion date prior to 9/27/2007. I think these do not need to be transferred since they predate the FDAAA requirements. Should we mark these globally in some way as not needing transfer?

Lakshmi

Comment entered 2009-12-28 20:21:50 by alan

BZDATETIME::2009-12-28 20:21:50
BZCOMMENTOR::Alan Meyer
BZCOMMENT::1

I have completed a program for this global change and run it in
test mode on Franck. As with OCECDR-3044 I thought the Franck
data would be better for testing than the Mahler data.

There are some surprising results.

The first surprise is that my selection found far fewer documents
than the number mentioned in Lakshmi's email.
Lakshmi found 1390 trials. I found:

Mahler: 244
Franck: 272
Bach: 418

Here's the SQL query I'm using:

    -- Find InScopeProtocols that were completed prior to 2007-09-27
    SELECT qStat.doc_id
    FROM query_term qStat
    JOIN query_term qDate
    ON qStat.doc_id = qDate.doc_id
    WHERE qStat.path = '/InScopeProtocol/ProtocolAdminInfo/CurrentProtocolStatus'
    AND qStat.value = 'Completed'
    AND qDate.path = '/InScopeProtocol/ProtocolAdminInfo/CompletionDate'
    AND qDate.value < '2007-09-27'
    ORDER BY qStat.doc_id
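One way to see how this query can undercount is to run it against a toy in-memory SQLite table. The table mimics only the three `query_term` columns the query uses, and all rows are invented. The self-join requires a CompletionDate row, so a Completed trial with no CompletionDate (doc 3) is silently dropped. ISO `yyyy-mm-dd` strings compare correctly as text, which is why the `<` test works on dates.

```python
import sqlite3

STATUS_PATH = "/InScopeProtocol/ProtocolAdminInfo/CurrentProtocolStatus"
DATE_PATH = "/InScopeProtocol/ProtocolAdminInfo/CompletionDate"

# Same logic as the query above, with the paths passed as parameters.
QUERY = """
    SELECT qStat.doc_id
      FROM query_term qStat
      JOIN query_term qDate
        ON qStat.doc_id = qDate.doc_id
     WHERE qStat.path = ? AND qStat.value = 'Completed'
       AND qDate.path = ? AND qDate.value < '2007-09-27'
     ORDER BY qStat.doc_id
"""


def completed_before_cutoff(conn: sqlite3.Connection) -> list:
    return [row[0] for row in conn.execute(QUERY, (STATUS_PATH, DATE_PATH))]


# Toy stand-in for query_term with invented rows: (doc_id, path, value).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE query_term (doc_id INTEGER, path TEXT, value TEXT)")
conn.executemany("INSERT INTO query_term VALUES (?, ?, ?)", [
    (1, STATUS_PATH, "Completed"), (1, DATE_PATH, "2005-03-01"),  # selected
    (2, STATUS_PATH, "Completed"), (2, DATE_PATH, "2008-01-15"),  # completed too late
    (3, STATUS_PATH, "Completed"),  # no CompletionDate row: dropped by the join
    (4, STATUS_PATH, "Active"), (4, DATE_PATH, "2004-06-30"),     # not completed
])
print(completed_before_cutoff(conn))  # [1]
```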

Have I done something wrong? Is there perhaps some other status
in addition to 'Completed', or some other date that I should be
looking at?

More in the next comment.

Comment entered 2009-12-28 20:30:32 by alan

BZDATETIME::2009-12-28 20:30:32
BZCOMMENTOR::Alan Meyer
BZCOMMENT::2

Here was another possible surprise:

I wrote the program with the following logic:

If there is no CTGovOwnershipTransferContactLog:

    Create one.

    Add the two specified subelements.

Else (there is a ContactLog):

    Write a message to the log file and skip this document.

I didn't know if there would be any documents with ContactLogs in
the selected set (Completed trials, completed before the
specified date), but thought it would be a good idea to not
modify any documents that had ContactLog information for two
reasons:

1. Possibly, the document has actually been transferred, or
will be transferred, if the ContactLog element exists.

2. If, for any reason, we run the global more than once,
subsequent runs should not modify a document that has
already been modified.
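The skip rule, and in particular reason 2 (re-running the global is harmless), can be illustrated with a small ElementTree sketch. The function name and the placement of the log block directly under the root are assumptions. The log message text is the one quoted later in this ticket.

```python
import datetime
import xml.etree.ElementTree as ET

LOG_TAG = "CTGovOwnershipTransferContactLog"


def process(doc: ET.Element, run_date: datetime.date, messages: list) -> bool:
    """Add the contact log only if none exists; return True if doc changed."""
    # Reason 1: an existing log may mean the trial is (or will be) transferred.
    # Reason 2: a document changed by an earlier run already has a log,
    # so a second run leaves it alone.
    if doc.find(LOG_TAG) is not None:
        messages.append("Contact log exists. Skipping this document.")
        return False
    log = ET.SubElement(doc, LOG_TAG)
    ET.SubElement(log, "CTGovOwnershipTransferContactResponse").text = "Transfer not required"
    ET.SubElement(log, "Date").text = run_date.isoformat()
    return True


doc = ET.Element("InScopeProtocol")
msgs = []
first = process(doc, datetime.date(2009, 12, 28), msgs)
second = process(doc, datetime.date(2009, 12, 29), msgs)
print(first, second, len(doc.findall(LOG_TAG)))  # True False 1
```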

It turns out that 33 of the 272 documents selected on Franck do
have ContactLogs. These were not modified.

I have attached the log file from the test mode run on Franck.
To find examples of documents that already have ContactLog
elements, search for ContactLog in the log file, then look at the
document in the Global Change report to see the contents.

The global change is dated: 2009-12-28 19:59:38.

Comment entered 2009-12-28 20:30:32 by alan

Attachment Request4721.log has been added with description: Log file for test run on Franck.

Comment entered 2009-12-29 08:19:27 by eckleyk

BZDATETIME::2009-12-29 08:19:27
BZCOMMENTOR::Kim Eckley
BZCOMMENT::3

(In reply to comment #1)
> I have completed a program for this global change and run it in
> test mode on Franck. As with OCECDR-3044 I thought the Franck
> data would be better for testing than the Mahler data.
> There are some surprising results.
> The first surprise is that my selection found far fewer than the
> number of documents that were mentioned in Lashmi's email.
> Have I done something wrong? Is there perhaps some other status
> in addition to 'Completed', or some other date that I should be
> looking at?
> More in the next comment.

I think I know what went wrong. I have a long list of completed trials that meet the criteria. Luckily for me, the first one I looked at, 449938, is not in the log file.

I looked at the records for one document found in the log and for the example above. The difference is the presence of the <CompletionDate> element. Not all trials completed prior to 9/27/07 have the <CompletionDate> element. 449938 does not, but in its CurrentOrgStatus block the StatusDate is in the right range.

Now, I'm not the technical expert, but I see two options.
1 - Run another global to insert the CompletionDate element, with the DateType attribute set to 'Actual', for trials with a CurrentOrgStatus of Completed and a StatusDate prior to 9/27/07, then proceed with this global.

2 - Work the parameters into this global so that it also looks at CurrentOrgStatus and StatusDate.

I'm sure you may think of another option!
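The comparison in option 2 could look like the following sketch, which models a protocol as plain Python objects rather than XML. The class and function names are hypothetical. Treating the first lead org as the primary one is an assumption; a later comment in this ticket says the primary lead org's CurrentOrgStatus/StatusDate was the comparison date actually used.

```python
from dataclasses import dataclass, field
from typing import List, Optional

CUTOFF = "2007-09-27"  # ISO dates compare correctly as strings


@dataclass
class LeadOrg:
    current_org_status: str
    status_date: str  # yyyy-mm-dd


@dataclass
class Protocol:
    current_protocol_status: str
    completion_date: Optional[str] = None
    lead_orgs: List[LeadOrg] = field(default_factory=list)


def completed_before_cutoff(p: Protocol) -> bool:
    """Option 2: use CompletionDate when present, else fall back to the
    CurrentOrgStatus/StatusDate of the primary lead org."""
    if p.current_protocol_status != "Completed":
        return False
    if p.completion_date is not None:
        return p.completion_date < CUTOFF
    # Assumption: lead_orgs[0] stands in for the primary lead org.
    if p.lead_orgs and p.lead_orgs[0].current_org_status == "Completed":
        return p.lead_orgs[0].status_date < CUTOFF
    return False


# A trial like the one Kim describes: no CompletionDate, but an old StatusDate.
no_date = Protocol("Completed", None, [LeadOrg("Completed", "2006-05-01")])
print(completed_before_cutoff(no_date))  # True
```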

Comment entered 2009-12-29 11:16:23 by alan

BZDATETIME::2009-12-29 11:16:23
BZCOMMENTOR::Alan Meyer
BZCOMMENT::4

(In reply to comment #3)

> ... I see two options.
>
> 1 - run another global insert the CompletionDate element with
> attribute DateType set to 'Actual' for trials with a
> CurrentOrgStatus of Completed with StatusDate prior to 9/27/07
> then proceed with this global.
>
> 2 - work the parameters into this global to also look at
> CurrentOrgStatus and StatusDate.
>
> I'm sure you may think of another option!

Those two options look reasonable to me. I propose we choose
between them in the following way:

1. Run another global.

I would be inclined towards this approach if it is desirable
to update the CompletionDate regardless of other
considerations.

A global change to do that need not look at specific dates.
If a protocol were completed after 2007-09-27 but didn't have
an updated overall CompletionDate, we should probably still
update the CompletionDate.

Presumably, we would want to do this independently of any
CTGov involvement. The document may already have
CTGovOwnershipTransferContactLog information, or may already have been
supplanted by a new CTGovProtocol with a different CDR ID.
But we would still update the CompletionDate if the trial were
completed.

If all of that is the case, then I would recommend this
solution. It will update many additional documents.

See also the discussion below under "a. There is no existing
overall CompletionDate."

2. Work the parameters into this global.

I would be inclined towards this approach only if updating
CompletionDates is thought to be tricky or risky and we only
want to add the new CTGovOwnershipTransferContactLog
information.

For either solution, we will also need to pin down the exact
algorithm for determining the CompletionDate. Based on Kim's
suggestion and after discussing this with Bob, my thinking is
that we update the CompletionDate when:

a. There is no existing overall CompletionDate.

If we calculate completion dates from LeadOrg status
information for documents that already have an overall
CompletionDate, we might well come up with a different date,
or with an "Actual" instead of "Projected" date.

This might or might not be a good thing to do. Maybe we
should replace CompletionDates that are "Projected" with
"Actual". Or maybe we should just leave them all alone.

b. Every single LeadOrgProtocolStatuses/CurrentProtocolStatus =
"Completed".

If any LeadOrg has a different CurrentProtocolStatus, the
trial is not completed.

We would then choose the latest date from among all the status
dates. For Kim's solution 1, we'd copy it into CompletionDate
and set the DateType attribute to "Actual". For Kim's solution
2, we'd just use it to determine whether or not the document
needs updating for the CTGovOwnershipTransferContactLog.
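Rules a and b, together with the choice of the latest status date, can be expressed as a small pure function. This sketch assumes the inputs have already been extracted from the document. The function name and the tuple representation are hypothetical. For option 2, a caller would compare the returned date with the cutoff instead of writing it back.

```python
from typing import List, Optional, Tuple

LeadOrgStatus = Tuple[str, str]  # (CurrentOrgStatus, StatusDate as yyyy-mm-dd)


def proposed_completion_date(existing: Optional[str],
                             lead_orgs: List[LeadOrgStatus]) -> Optional[Tuple[str, str]]:
    """Return (date, DateType) to write, or None to leave the document alone."""
    if existing is not None:          # rule a: only fill in missing dates
        return None
    if not lead_orgs:
        return None
    if any(status != "Completed" for status, _ in lead_orgs):
        return None                   # rule b: every lead org must be Completed
    latest = max(date for _, date in lead_orgs)  # latest status date wins
    return latest, "Actual"


print(proposed_completion_date(None, [("Completed", "2006-01-10"),
                                      ("Completed", "2006-04-02")]))
# ('2006-04-02', 'Actual')
```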

Comment entered 2010-01-25 15:35:54 by alan

BZDATETIME::2010-01-25 15:35:54
BZCOMMENTOR::Alan Meyer
BZCOMMENT::5

This issue seems to have fallen through the cracks. I'll need
answers to the questions posed in comment #4 in order to proceed.

Should I write a global to install and/or update CompletionDate
in InScopeProtocols, independently of this global (option 1
above)? Or should I just add logic to this global change (option
2 above)?

For either solution, I'll also need guidance on the algorithm to
use for determining the completion date based on (possibly)
multiple ProtocolLeadOrgs, where all, or not all, have completion
dates. If we choose to update the CompletionDate element, I'll
also need guidance on the setting of the DateType attribute to
"Actual" or "Projected".

Comment entered 2010-01-26 17:07:05 by alan

BZDATETIME::2010-01-26 17:07:05
BZCOMMENTOR::Alan Meyer
BZCOMMENT::6

Lakshmi and I discussed the issues in comment #4 and comment #5,
and I have done some research to find out more about what is
going on in the existing system. Here are some conclusions
from our discussions, verified by research.

1. CurrentProtocolStatus is auto-updated.

The cdr::CdrDoc::updateProtocolStatus function in the server
sets the value of CurrentProtocolStatus = CurrentOrgStatus for
the lead organizations any time an InScopeProtocol is saved.

2. All lead organizations must have the same CurrentOrgStatus in
order for an InScopeProtocol to be valid.

The same function examines the CurrentOrgStatus for every lead
organization. If they are not all the same, it produces an
error message and invalidates the document - if validation is
being performed. Hence every publishable version will have
consistent values for CurrentProtocolStatus and for all
occurrences of lead org CurrentOrgStatus.

According to comments in the code, this has been unchanged
since 2003.

3. Not all protocols have consistent values for
CurrentProtocolStatus and CurrentOrgStatus. There are 77
Completed InScopeProtocols with inconsistent date values. Two
lists are attached, one showing the documents in doc ID order,
the other in descending date order by CompletionDate.

We don't index the DateType attribute in SQL Server and I
haven't written a program to check the DateType for these
records. Presumably, some still have a Projected
CompletionDate with an actual CurrentOrgStatus/StatusDate.

Some of these are old.

Comment entered 2010-01-26 17:07:05 by alan

Attachment protdates.txt has been added with description: Mismatched CompletionDate and CurrentOrgStatus/StatusDate

Comment entered 2010-01-26 17:11:49 by alan

BZDATETIME::2010-01-26 17:11:49
BZCOMMENTOR::Alan Meyer
BZCOMMENT::7

The version I uploaded had the wrong column headers. This
has the corrected headers for the document attached to
comment #6.

Comment entered 2010-01-26 17:11:49 by alan

Attachment protdates.txt has been added with description: Fixed the column headers for previous attachment

Comment entered 2010-01-29 08:23:24 by eckleyk

BZDATETIME::2010-01-29 08:23:24
BZCOMMENTOR::Kim Eckley
BZCOMMENT::8

(In reply to comment #6)

> 2. All lead organizations must have the same CurrentOrgStatus in
> order for an InScopeProtocol to be valid.
> The same function examines the CurrentOrgStatus for every lead
> organization. If they are not all the same, it produces an
> error message and invalidates the document - if validation is
> being performed. Hence every publishable version will have
> consistent values for CurrentProtocolStatus and for all
> occurrences of lead org CurrentOrgStatus.
> According to comments in the code, this has been unchanged
> since 2003.

I wanted to double-check this statement:

To my knowledge, this is incorrect. CurrentOrgStatuses are not required to match throughout the document. If there are multiple lead orgs with varying CurrentOrgStatuses, there is an existing algorithm (below) that sets the CurrentProtocolStatus. If there is a "mismatch" between CurrentOrgStatuses, the user sees a warning but not an error. The trial can still be valid with mismatched org-level statuses.

Following is the CDR algorithm regarding the programmatic setting of Current Protocol Status:

• If there is at least one CurrentOrgStatus of Active, set status to Active.
• If there is at least one CurrentOrgStatus of Temporarily closed, set status to Temporarily closed.
• If there is at least one CurrentOrgStatus of Completed, set status to Completed.
• If there is at least one CurrentOrgStatus of Closed, set status to Closed.
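Read as a priority list, these rules can be sketched as follows. The sketch assumes the rules are applied top to bottom with the first match winning, which is how the list reads. The function name is hypothetical, and the real server code may differ.

```python
from typing import Iterable, Optional

# Rules from the list above, checked in order; the first match wins.
PRIORITY = ("Active", "Temporarily closed", "Completed", "Closed")


def current_protocol_status(org_statuses: Iterable[str]) -> Optional[str]:
    present = set(org_statuses)
    for status in PRIORITY:
        if status in present:
            return status
    # Assumption: statuses the list does not cover are handled elsewhere.
    return None


print(current_protocol_status(["Closed", "Completed"]))  # Completed
print(current_protocol_status(["Completed", "Active"]))  # Active
```

Under this reading, a CurrentProtocolStatus of Completed only guarantees that at least one lead org is Completed and that none is Active or Temporarily closed. Other lead orgs may still be Closed.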

Comment entered 2010-02-05 00:25:29 by alan

BZDATETIME::2010-02-05 00:25:29
BZCOMMENTOR::Alan Meyer
BZCOMMENT::9

I'm running the global in test mode on Franck. It's still
running as I write this, but output will be in
"2010-02-04_23-06-10".

I am using the CurrentOrgStatus/StatusDate associated with the
primary lead org as the comparison date. I'll attach a log file
tomorrow when it's complete.

Some of the documents already have a contact log. They will be
identified in the log file with the phrase:

"Contact log exists. Skipping this document."

They show up in the global change report with "No differences" in
the diff file.

Comment entered 2010-02-05 02:21:09 by alan

BZDATETIME::2010-02-05 02:21:09
BZCOMMENTOR::Alan Meyer
BZCOMMENT::10

This run uses selection criteria independent of
the CompletionDate. Log file attached.

Comment entered 2010-02-05 02:21:09 by alan

Attachment Request4721a.log has been added with description: Log file for another test run on Franck

Comment entered 2010-02-05 08:40:56 by alan

BZDATETIME::2010-02-05 08:40:56
BZCOMMENTOR::Alan Meyer
BZCOMMENT::11

Don't bother looking at the output. I'm getting too many
documents. I have some ideas why. I'll do some research
and run again sometime over the weekend.

Comment entered 2010-02-05 15:13:32 by alan

BZDATETIME::2010-02-05 15:13:32
BZCOMMENTOR::Alan Meyer
BZCOMMENT::12

I have some more questions that I'd like to ask before I proceed
with this.

The usual global changes that we do are applied to up to three
different versions of a document:

Current Working Document (CWD)
Last Version
Last Publishable Version (if different from the last version)

I have found documents (e.g., CDR63397) that have already been
transferred and that have transfer contact information in the CWD
but not in the publishable version. This makes sense to me. As I
recall the process, we decide to transfer a document, block it
from export to CTGov, and mark the CWD with contact and blocking
information. I'm not sure that we would necessarily make a
version or a publishable version of the CWD so marked.

So, if the CWD has been marked as transferred, or in the process
of transfer, do we want to make any changes to the last version or
the last publishable version?

I would think not.

That would mean that the program cannot independently evaluate
the last publishable and last versions, as is normally done in a
global change, but would have to look only at the CWD to
determine if the document should get a "Transfer not required"
value. If the CWD should not get a "Transfer not required"
value, then neither of the other versions should get it.

Is that right?
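If the answer is yes, the control flow differs from a normal global change in one place: the decision is made once, from the CWD, and then applied to every version. The following is a hypothetical sketch with versions reduced to simple objects. The class, its fields, and the function name are illustrative only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Version:
    label: str             # "CWD", "last version", "last publishable version"
    has_contact_log: bool
    modified: bool = False


def apply_cwd_decision(cwd: Version, others: List[Version]) -> List[str]:
    """Decide from the CWD alone, then apply that decision to every version."""
    if cwd.has_contact_log:
        return []          # CWD already marked: leave all versions alone
    changed = []
    for version in [cwd, *others]:
        version.modified = True   # stands in for adding "Transfer not required"
        changed.append(version.label)
    return changed


cwd = Version("CWD", has_contact_log=True)
last_pub = Version("last publishable version", has_contact_log=False)
print(apply_cwd_decision(cwd, [last_pub]), last_pub.modified)  # [] False
```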

I would also like to know the most foolproof way to determine
which documents are already transferred or might someday be
transferred.

One document I looked at, CDR63397, was marked as completed
eleven years ago in 1999, but it was transferred anyway. So the
date alone is insufficient.

In other tasks like this I found matching InScope and CTGov
documents by finding matching NCT_IDs. But those tasks were
updating the CTGov versions and so only needed to run on a
document when the CTGov transfer was complete. This task is a
little different in that we may have documents that are in the
process of transfer. In some cases the process is at a very
early stage, not yet involving sending anything to NLM.

So I'd like some criteria that are also useful at those early
stages. Ideally, the first indication that an InScopeProtocol
might some day be transferred to NLM should prevent my program
from adding the "Transfer not required" value.

If the indications are in a query_term indexed field, that is
ideal. If not, well, my program will just have to go through the
documents one by one and inspect them.

Thanks.

Comment entered 2010-02-09 15:24:32 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2010-02-09 15:24:32
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::13

I have attempted to answer the questions with the hope that Lakshmi will be able to review my answers to see if they are sound.

(In reply to comment #12)
> I have some more questions that I'd like to ask before I proceed
> with this.
> The usual global changes that we do are applied to up to three
> different versions of a document:
> Current Working Document (CWD)
> Last Version
> Last Publishable Version (if different from tlast version.)
> I have found documents (e.g., CDR63397) that have already been
> transferred that have transfer contact information in the CWD but
> not the publishable version. This makes sense to me. As I
> recall the process, we decide to transfer a document, block it
> from export to CTGov, and mark the CWD with contact and blocking
> information. I'm not sure that we would necessarily make a
> version or a publishable version of the so marked CWD.
> So, if the CWD has been marked as transferred, or in the process
> of transfer, do we want to make any changes to the publishable or
> last publishable version?
> I would think not.

I believe the above analysis is right. The contact log response has 5 values. As long as none of the transferred protocols is tagged as “Transfer not required”, it is accurate. And as you noted already, some trials will fall into this category. Also, in almost all cases, the current status of the trial (which is critical in determining the transferability of the trial) would not have changed between the publishable version and CWD which has the transfer information.

> That would mean that the program cannot independently evaluate
> the last publishable and last versions, as is normally done in a
> global change, but would have to look only at the CWD to
> determine if the document should get a "Transfer not required"
> value. If the CWD should not get a "Transfer not required"
> value, then neither of the other versions should get it.
> Is that right?

This is also right. I believe that at this point we use the contact log information for reporting purposes only and that it does not affect the publishing of the document, so that should not be a problem.

> I would also like to know the most foolproof way to determine
> which documents are already transferred or might someday be
> transferred.

Transfer documents (InScope) can be identified by a combination of the following:
1. The document has a transfer block and
2. The document has an NCT ID and
3. The document is blocked from publication or
4. The document has BlockedFromCTGov element and also blocked from publication and
5. The document had previously been publishable.
Such documents will typically have corresponding CTGov documents which are not blocked from publication. This may not be true for trials that have just received the transfer blocks on the same day that you run the global, for example. Those trials will be transformed into CTGov documents by the next day. Or there may be trials that don’t get transformed because they are on the CTGov duplicate list.

Documents that might be transferred can also be identified by the following:
1. Has an NCT ID and no BlockedFromCTGov element
2. The document has a status of closed (this may not be the case for all trials)
3. The document may or may not have a Contact log block. If the document has Contact log block, the response value should be a value other than "Transfer not required"
4. The document is not blocked from publication
5. The document is publishable

Please note that these requirements for documents that might be transferred may change in the future and also may not be applicable in all situations because in some cases transfer trials may be active.

> One document I looked at, CDR63397, was marked as completed
> eleven years ago in 1999, but it was transferred anyway. So the
> date alone is insufficient.

Documents that fall into the above category should have been tagged with a value other than “Transfer not required” for their CTGovOwnershipTransferContactResponse element. For the most part, the value should be “PRS information received”, as in the example you provided above. Additionally, I believe this will not be a problem as long as we know not to include these trials when generating reports. In other words, these trials need not be transferred. However, if they have been transferred, skip them when performing the global. Also, when we generate reports to identify trials that need not be transferred, we need to remember to exclude trials that were completed prior to 2007-09-27 but have the transfer block. Alternatively, you can report these trials for manual review.

> In other tasks like this I found matching InScope and CTGov
> documents by finding matching NCT_IDs. But those tasks were
> updating the CTGov versions and so only needed to run on a
> document when the CTGov tranfer was complete. This task is a
> little different in that we may have documents that in the
> process of transfer. In some cases the process is at a very
> early stage, not yet involving sending anything to NLM.
> So I'd like some criteria that are also useful at those early
> stages. Ideally, the first indication that an InScopeProtocol
> might some day be transferred to NLM should prevent my program
> from adding the "Transfer not required" value.
> If the indications are in a query_term indexed field, that is
> ideal. If not, well, my program will just have to go through the
> documents one by one and inspect them.
> Thanks.

Comment entered 2010-02-12 16:50:32 by Grama, Lakshmi (NIH/NCI) [E]

BZDATETIME::2010-02-12 16:50:32
BZCOMMENTOR::Lakshmi Grama
BZCOMMENT::14

I found it a little difficult to follow William's comments, so I am trying to answer the questions independently.

> I would also like to know the most foolproof way to determine
> which documents are already transferred or might someday be
> transferred.

I am attaching a document that provides some clarifications of what will not be transferred. In addition, as William points out, documents that are already transferred will have a CTGOVOwnershipTransfer block in the CWD if the document is still an InScopeProtocol.

Comment entered 2010-02-12 16:53:06 by Grama, Lakshmi (NIH/NCI) [E]

BZDATETIME::2010-02-12 16:53:06
BZCOMMENTOR::Lakshmi Grama
BZCOMMENT::15

Comment entered 2010-02-12 16:53:06 by Grama, Lakshmi (NIH/NCI) [E]

Attachment Protocols that will not be transferred.doc has been added with description: Lakshmi's write up of criteria for adding Transfer not required

Comment entered 2010-02-12 17:03:55 by Grama, Lakshmi (NIH/NCI) [E]

BZDATETIME::2010-02-12 17:03:55
BZCOMMENTOR::Lakshmi Grama
BZCOMMENT::16

> This task is a little different in that we may have documents that in the
> process of transfer. In some cases the process is at a very
> early stage, not yet involving sending anything to NLM.
> So I'd like some criteria that are also useful at those early
> stages. Ideally, the first indication that an InScopeProtocol
> might some day be transferred to NLM should prevent my program
> from adding the "Transfer not required" value.
> If the indications are in a query_term indexed field, that is
> ideal. If not, well, my program will just have to go through the
> documents one by one and inspect them.

The CTGOVOwnershipTransferContactLog element should actually help with this. As I indicated in my write-up, CTGOVOwnershipTransferContactLogResponse values of “PRS information received”, “Transfer not required”, and “Waiting for response” all indicate that the global should ignore the record. If the field is not indexed in the query_term table, could you add it to the table and index it to help run the global?

Comment entered 2010-02-15 17:26:21 by alan

BZDATETIME::2010-02-15 17:26:21
BZCOMMENTOR::Alan Meyer
BZCOMMENT::17

After reading the writeup, I think the most straightforward
approach to the problem is to produce five separate selections,
each with its own criteria and comment explaining why the
document was marked as not needing transfer.

I would run each selection as a separate global change ModifyDocs
job, though all five can be started from one program that has
common code for them all. When running in test mode, that would
cause each of the five groups to appear on its own line in the
global change test results. I think that will make testing
easier since, if I make an error in one of the selections, it
should be easier to spot if all of the selections of one type are
together. If the number of documents for one of the selections
is too large or too small, we're more likely to see that than if
they are all in one big pot.

The writeup has two notes at the end, one for

"Protocols that will be transferred with some exceptions
(non-FDA regulated)"

and one for:

"Protocols that may be transferred (presumed to be need until
notified otherwise by Responsible Party)"

I presume those notes are for informational purposes only and
that I'm not supposed to make any changes to documents that they
discuss. Is that right?

In answer to the question of whether I can add an index entry if
needed to simplify selections, the answer is yes. I'll do that
if necessary.

Comment entered 2010-02-16 17:35:35 by Grama, Lakshmi (NIH/NCI) [E]

BZDATETIME::2010-02-16 17:35:35
BZCOMMENTOR::Lakshmi Grama
BZCOMMENT::18

> I presume those notes are for informational purposes only and
> that I'm not supposed to make any changes to documents that they
> discuss. Is that right?

Yes

Comment entered 2010-02-16 23:53:31 by alan

BZDATETIME::2010-02-16 23:53:31
BZCOMMENTOR::Alan Meyer
BZCOMMENT::19

I have written and tested five queries on Bach to match the five
sets of criteria in Lakshmi's document. Before testing a global
change I thought it would be useful to see the counts produced by
the queries and make sure that the results are near the expected
numbers.

The counts of documents retrieved by each query on Bach, as of
this date, are listed below. Groups 3, 4, and 5 have some documents
also selected in group 2. The overlap count is also listed.

1. No NCTID present.
Count = 12,499 (8,504 docs have NCTIDs)

2. NCTID present, but doc is blocked from CTGov.
Count = 1,064

3. NCTID present, Completed before 2007-09-27.
Count = 1,624
Overlap with group 2 = 143 docs.

4. NCTID present, but "Withdrawn from PDQ"
Count = 169
Overlap with group 2 = 20 docs.

5. NCTID present, "Withdrawn" before 2007-09-27
Count = 80
Overlap with group 2 = 7 docs.
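Once each selection has produced a set of doc IDs, counting each group and its overlap with group 2 is plain set arithmetic. The sketch below uses invented documents and simplified, hypothetical field names (`nct_id`, `blocked`, `status`, `date`) in place of the real SQL queries, which are in the attached file. Treating "Withdrawn from PDQ" as a status value is an assumption.

```python
CUTOFF = "2007-09-27"

# Hypothetical, simplified view of each InScopeProtocol; all values invented.
docs = {
    101: dict(nct_id=None,    blocked=False, status="Active",             date="2009-01-01"),
    102: dict(nct_id="NCT-A", blocked=True,  status="Completed",          date="2005-06-01"),
    103: dict(nct_id="NCT-B", blocked=False, status="Completed",          date="2006-02-01"),
    104: dict(nct_id="NCT-C", blocked=True,  status="Withdrawn from PDQ", date="2008-03-01"),
    105: dict(nct_id="NCT-D", blocked=False, status="Withdrawn",          date="2007-01-01"),
}

# One predicate per group, following the five criteria listed above.
GROUPS = {
    1: lambda d: d["nct_id"] is None,
    2: lambda d: d["nct_id"] is not None and d["blocked"],
    3: lambda d: d["nct_id"] is not None and d["status"] == "Completed" and d["date"] < CUTOFF,
    4: lambda d: d["nct_id"] is not None and d["status"] == "Withdrawn from PDQ",
    5: lambda d: d["nct_id"] is not None and d["status"] == "Withdrawn" and d["date"] < CUTOFF,
}

selected = {g: {doc_id for doc_id, d in docs.items() if pred(d)}
            for g, pred in GROUPS.items()}
for g in (3, 4, 5):
    overlap = selected[g] & selected[2]
    print(f"group {g}: count={len(selected[g])}, overlap with group 2={len(overlap)}")
```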

Questions:

1. Do these counts look about right?

2. What should be done about placing comments in the docs that
overlap group 2?

Options might include:

a. Put in a single comment, in priority order, e.g., use
the comment for groups 3, 4, or 5. Only if none has been
applied do we insert the comment for group 2.

b. Add two comments to these documents.

This would seem to give the most information.
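Option a amounts to a fixed priority lookup. This is a sketch in which the comment strings and the position of group 1 in the order are hypothetical. Option b would instead return one comment for every group a document belongs to.

```python
from typing import Optional, Set

# Hypothetical comment text for each selection group.
COMMENTS = {
    1: "No NCT ID",
    2: "NCT ID present, blocked from CTGov",
    3: "Completed before 2007-09-27",
    4: "Withdrawn from PDQ",
    5: "Withdrawn before 2007-09-27",
}
# Option a: prefer the group 3, 4, or 5 comment; use group 2's only as a fallback.
PRIORITY = (3, 4, 5, 2, 1)


def pick_comment(groups: Set[int]) -> Optional[str]:
    """Return the single comment to insert for a document in these groups."""
    for g in PRIORITY:
        if g in groups:
            return COMMENTS[g]
    return None


print(pick_comment({2, 3}))  # Completed before 2007-09-27
print(pick_comment({2}))     # NCT ID present, blocked from CTGov
```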

I've attached a file containing the SQL queries for each of the
five groups.

Comment entered 2010-02-16 23:53:31 by alan

Attachment Request4721.sql has been added with description: SQL queries to select InScopeProtocols not needing transfer

Comment entered 2010-02-18 10:31:00 by alan

BZDATETIME::2010-02-18 10:31:00
BZCOMMENTOR::Alan Meyer
BZCOMMENT::20

I woke up in the middle of the night last night and realized that
my queries all left out a critical part. The counts are all
wrong.

I'll post a corrected count and queries soon.

Comment entered 2010-02-18 16:27:57 by alan

BZDATETIME::2010-02-18 16:27:57
BZCOMMENTOR::Alan Meyer
BZCOMMENT::21

The problem with my earlier counts (in comment #19) was that they
included documents that had one of the TransferContactResponse
elements indicating that their disposition is already known.

Here are the numbers with those deleted, just showing the
document count for those that need the new "Transfer not
required" value. There are also slight differences due to
activity in the database between last Tuesday and today.

1. No NCTID present.
Count = 12,500

2. NCTID present, but doc is blocked from CTGov.
Count = 184

3. NCTID present, Completed before 2007-09-27.
Count = 988

4. NCTID present, but "Withdrawn from PDQ"
Count = 162

5. NCTID present, "Withdrawn" before 2007-09-27
Count = 75

Comment entered 2010-02-18 16:31:38 by alan

BZDATETIME::2010-02-18 16:31:38
BZCOMMENTOR::Alan Meyer
BZCOMMENT::22

(In reply to comment #19)
...
> Questions:
>
> 1. Do these counts look about right?
>
> 2. What should be done about placing comments in the docs that
> overlap group 2?

The answers we agreed upon at the meeting today are:

1. The counts look right.

2. We will not put two comments in documents that are in
both group 2 and one of the other groups.

We also decided that I should leave all existing comments alone and
not add new ones where a comment already exists. The reason is
that existing comments were added by users and should already
contain any useful descriptive information.

I'm going to proceed to work on implementing a new version
of the global in light of all decisions taken.

Comment entered 2010-02-18 16:33:13 by alan

BZDATETIME::2010-02-18 16:33:13
BZCOMMENTOR::Alan Meyer
BZCOMMENT::23

(In reply to comment #22)
...
leave
> We also decided that I should ^ all existing comments alone

Comment entered 2010-02-18 21:25:48 by alan

BZDATETIME::2010-02-18 21:25:48
BZCOMMENTOR::Alan Meyer
BZCOMMENT::24

I have another question.

"CTGovOwnershipTransferContactResponse" can have any of five
values:

'No response'
'PRS information received'
'Transfer not required'
'Unable to locate'
'Waiting for response'

We've said what to do when a document has one of three of those
values, but we haven't said exactly what to do if the value is
'No response' or 'Unable to locate'.

I can think of several possibilities:

1. Do nothing to this document.

2. Replace the contact response, losing the previous
information.

3. Add a new element with the "Transfer not required" value.

This requires that we add another container element,
"CTGovOwnershipTransferContactLog", with a new date and
comment.

Option 1 looks most attractive to me. If someone tried to
contact the organization running the trial but has not gotten a
response, it seems that we originally thought that the trial
should be transferred. Saying "Transfer not required" might be
just wrong.

Option 2 has two possible disadvantages. It says that transfer
is not required when in fact we know that someone once thought
that it might be. It also destroys the information we already
entered.

Option 3 looks like a possibility, but do we really want to say a
transfer is not required when one might actually be required?

I checked to see how many times we used each value. Here are the
current counts on Bach:

Unable to locate 16
Waiting for response 95
No response 116
Transfer not required 519
PRS information received 919

I also checked to see if any of the documents currently have more
than one contact response. As of tonight, none do. They all
have exactly one or none.

If we choose option 3, should we put the new contact information
before or after the existing one?

What should we do?

Comment entered 2010-02-19 08:41:48 by Grama, Lakshmi (NIH/NCI) [E]

BZDATETIME::2010-02-19 08:41:48
BZCOMMENTOR::Lakshmi Grama
BZCOMMENT::25

Use Option 1. That is why I did not include those values as part of the criteria - the value that already exists is meaningful as is.

Comment entered 2010-02-19 17:02:34 by alan

BZDATETIME::2010-02-19 17:02:34
BZCOMMENTOR::Alan Meyer
BZCOMMENT::26

(In reply to comment #25)
> Use Option 1. That is why I did not include those values as part of the
> criteria - the value that already exists is meaningful as is.

In that case, I think our conclusion is quite simple. If there
is any TransferContactResponse element, leave the document alone.
We don't have to examine the values at all.

That's what I'll do.

Comment entered 2010-02-23 23:36:21 by alan

BZDATETIME::2010-02-23 23:36:21
BZCOMMENTOR::Alan Meyer
BZCOMMENT::27

I have run the global change in test mode on Bach. I wasn't sure
the data on any other server would be up to date enough for a
good test.

A complete test would modify (as of tonight) 13,897 documents.
That's more than anyone can look at for test purposes, so I
limited the processing of any one category of document to a
maximum of 200 individual docs - which is still more than we
really need, but I wanted to do enough to give some confidence
that it really is working right.

As explained in comment #12, only current working documents were
included in the results. The global will not save any modified
documents in the version archive. No last versions and no last
publishable versions are changed.

As explained in comment #17, the outputs are in five different
global change test results, named:

2010-02-23 23:16:39
2010-02-23 23:13:02
2010-02-23 23:10:17
2010-02-23 23:07:06
2010-02-23 23:04:50

Each represents a different category of InScopeProtocol, where
the categories are explained in Lakshmi's attachment to comment
#15.

The earliest date ran first. If some document could qualify for
processing under more than one of the five selection criteria, it
was processed only in the earliest one for which it qualified in the
time order given above. If the order should change (I doubt if
it really makes a difference), I can re-arrange the order of
processing.
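The "earliest category wins" rule above can be sketched as follows. This is a hypothetical illustration, not the actual global change script: it assumes each category's selection is already available as a list of CDR IDs, listed in the order the batches are run, and the category names and IDs are made up.

```python
from typing import Dict, Iterable, List, Set, Tuple


def assign_to_first_category(
    categories: Iterable[Tuple[str, Iterable[int]]],
) -> Dict[str, List[int]]:
    """Give each document to the first category (in run order) that selects it.

    `categories` is a sequence of (category_name, selected_doc_ids) pairs in
    the order the batches run. A document that qualifies for several
    categories is processed only in the earliest one.
    """
    already_processed: Set[int] = set()
    result: Dict[str, List[int]] = {}
    for name, doc_ids in categories:
        batch = []
        for doc_id in doc_ids:
            if doc_id not in already_processed:
                already_processed.add(doc_id)
                batch.append(doc_id)
        result[name] = batch
    return result


if __name__ == "__main__":
    # Hypothetical selections; doc 102 qualifies for both categories.
    runs = [
        ("no NCTID", [101, 102, 103]),
        ("blocked from CTGov", [102, 104]),
    ]
    print(assign_to_first_category(runs))
```

In this sketch, reordering the batches only changes which category picks up an overlapping document (and so which category-specific comment it would receive), not whether it is processed.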

If everything looks good, I can either run the global in test
mode on the full 13,897 documents as a further test, or go
straight to a live run - whichever I am asked to do.

Comment entered 2010-03-02 11:17:14 by alan

BZDATETIME::2010-03-02 11:17:14
BZCOMMENTOR::Alan Meyer
BZCOMMENT::28

Has anyone looked at the test run yet?

Should I run it live or do a fuller test?

My impression of the test run I did is that the changes
are fine. The most important thing to check is probably
the selection. Were the protocols selected correctly to
get the new "Transfer not required" assignment?

Comment entered 2010-03-02 13:29:02 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2010-03-02 13:29:02
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::29

I have looked at several of the protocols in each category and did not find any problems. I compared the protocols in each of the test results with the categories (provided by Lakshmi) and also looked at individual records to determine if they should be assigned the "Transfer not required" value or not.

Kim,
Do you want to take a look at some of the records before the live run?

Comment entered 2010-03-03 13:48:56 by eckleyk

BZDATETIME::2010-03-03 13:48:56
BZCOMMENTOR::Kim Eckley
BZCOMMENT::30

(In reply to comment #28)
> Has anyone looked at the test run yet?
> Should I run it live or do a fuller test?
> My impression of the test run I did is that the changes
> are fine. The most important thing to check is probably
> the selection. Were the protocols selected correctly to
> get the new "Transfer not required" assignment?

OK - I think there is a problem with the selection.

From Lakshmi's comments (comment 15):
Primary Criteria – InscopeProtocols should not already have a CTGOVOwnershipTransferInfo block or a CTGOVOwnershipTransferContactLog with “PRS information received”, “Transfer not required”, or “Waiting for response”

The following trials appeared in the 2010-02-23 23:07:06 global run, trials with "blocked from CTGov"
66509
580360
551972
490989

All of these trials do have the BlockFromCTGov element, but also DO have the
CTGOVOwnershipTransferInfo block. They do not have the CTGOVOwnershipTransferContactLog, however, because the log block did not exist when these trials were transferred.

Perhaps we will need to look at a global selection to add "Transfer Info Received" when there IS a CTGOVOwnershipTransferInfo block but not a CTGOVOwnershipTransferContactLog block.

We will continue with our QC.

Comment entered 2010-03-03 16:35:52 by alan

BZDATETIME::2010-03-03 16:35:52
BZCOMMENTOR::Alan Meyer
BZCOMMENT::31

(In reply to comment #30)

> OK - I think there is a problem with the selection.

Thanks for the careful checking, Kim.

I will look at these tomorrow morning and do whatever needs
done to fix them.

I'll plan on doing another test run tomorrow afternoon or
evening.

Comment entered 2010-03-04 09:20:14 by eckleyk

BZDATETIME::2010-03-04 09:20:14
BZCOMMENTOR::Kim Eckley
BZCOMMENT::32

(In reply to comment #31)
> (In reply to comment #30)
> > OK - I think there is a problem with the selection.
> Thanks for the careful checking Kim.
> I will look at these tomorrow morning and do whatever needs
> done to fix them.
> I'll plan on doing another test run tomorrow afternoon or
> evening.

Thanks Alan.
I also tried working backwards - finding a trial that if the logic wasn't right - that it would have been on the list - but that backfired - it wasn't on the list! Not sure what was going on with these then.

Comment entered 2010-03-04 10:17:44 by alan

BZDATETIME::2010-03-04 10:17:44
BZCOMMENTOR::Alan Meyer
BZCOMMENT::33

(In reply to comment #32)

> I also tried working backwards - finding a trial that if the
> logic wasn't right - that it would have been on the list - but
> that backfired - it wasn't on the list! Not sure what was going
> on with these then.

That could just have been due to the sample size. I limited the
test to 200 documents in each of the five categories, and a
couple of them had many more documents that would have qualified
but didn't make it into the 200.

The docs were retrieved by CDR ID. So for example in the group
at:
http://bach.nci.nih.gov/cgi-bin/cdr/ShowGlobalChangeTestResults.py?dir=2010-02-23_23-04-50

the selected docs ranged from CDR0000063215 to CDR0000063428.

If I'm missing a document that should have qualified and was in
that range, then there's a bug in my selection query. However if
the CDR ID is above 63428, it would not have been selected just
because of the 200 document test batch size.
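The reasoning above about the 200-document cap can be turned into a small check. This is a sketch only: it assumes, as described, that qualifying documents were taken in CDR ID order, and the function names are hypothetical.

```python
from typing import Iterable, List, Optional, Tuple


def capped_test_batch(qualifying_ids: Iterable[int],
                      limit: int = 200) -> Tuple[List[int], Optional[int]]:
    """Take the first `limit` qualifying docs in CDR ID order.

    Returns the batch and the highest CDR ID it contains (None if empty).
    """
    batch = sorted(qualifying_ids)[:limit]
    return batch, (batch[-1] if batch else None)


def diagnose_missing(doc_id: int, batch: List[int], limit: int = 200) -> str:
    """Classify an expected document that a reviewer looked for in a test batch."""
    if doc_id in batch:
        return "included"
    if batch and len(batch) >= limit and doc_id > max(batch):
        # The batch was full and the doc sorts after its last ID, so the cap
        # alone could explain its absence.
        return "possibly beyond test cap"
    # Either the batch was not full (every qualifying doc should be there) or
    # the ID falls inside the batch's range: the selection query is suspect.
    return "selection query suspect"
```

A batch with fewer than `limit` documents can never be truncated, so in that case any missing document points at the query rather than the cap.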

Once I've fixed the other problem, it may be best for me to do a
complete test run, unlimited in number of documents. That way
you can check if the documents that you found that should
have been included were or were not included in the full
selection.

I'll do that tonight, after users have gone home and nightly
publishing is done, so that the long run won't bog down the
system for production use.

Comment entered 2010-03-04 10:24:21 by eckleyk

BZDATETIME::2010-03-04 10:24:21
BZCOMMENTOR::Kim Eckley
BZCOMMENT::34

(In reply to comment #33)
> (In reply to comment #32)
> > I also tried working backwards - finding a trial that if the
> > logic wasn't right - that it would have been on the list - but
> > that backfired - it wasn't on the list! Not sure what was going
> > on with these then.
> That could just have been due to the sample size. I limited the
> test to 200 documents in each of the five categories, and a
> couple of them had many more documents that would have qualified
> but didn't make it into the 200.

That was the weird thing - the query I looked at only had 179 trials, so I thought it would be inclusive.

We'll look again after the full test.

Comment entered 2010-03-04 10:28:13 by alan

BZDATETIME::2010-03-04 10:28:13
BZCOMMENTOR::Alan Meyer
BZCOMMENT::35

(In reply to comment #34)

> That was the weird thing - the query I looked at only had 179 trials, so I
> thought it would be inclusive.

In that case, the missing doc(s) should have been included.

Please send me the CDR IDs for any such docs. I'll check them out
before I run tonight. Hopefully I can find the problem before
the big test.

Comment entered 2010-03-04 11:16:51 by alan

BZDATETIME::2010-03-04 11:16:51
BZCOMMENTOR::Alan Meyer
BZCOMMENT::36

I searched the database for docs that have a
CTGovOwnershipTransferInfo/CTGovOwnerOrganization, but no
CTGovOwnershipTransferContactLog/CTGovOwnershipTransferContactResponse.

Listed below are the doc IDs and the name of the owner
organization. There are currently 70 of them.

65519 UNC_Lineberger
66390 UNC_Lineberger
66483 BurzynskiRI
66485 BurzynskiRI
66489 BurzynskiRI
66490 BurzynskiRI
66492 BurzynskiRI
66504 BurzynskiRI
66505 BurzynskiRI
66507 BurzynskiRI
66509 BurzynskiRI
66510 BurzynskiRI
66512 BurzynskiRI
66513 BurzynskiRI
66514 BurzynskiRI
66531 BurzynskiRI
66537 BurzynskiRI
66538 BurzynskiRI
66552 BurzynskiRI
66554 BurzynskiRI
66570 BurzynskiRI
66578 BurzynskiRI
66582 BurzynskiRI
66585 BurzynskiRI
67840 UNC_Lineberger
67973 MasonicCC
68119 UNC_Lineberger
68120 UNC_Lineberger
68121 UNC_Lineberger
341468 UNC_Lineberger
367486 StJudeCRH
377732 UNC_Lineberger
385684 WakeForest
429486 Mdanderson
437085 StJudeCRH
439444 UNC_Lineberger
440120 UNC_Lineberger
445077 OregonHSU
450904 MasonicCC
451886 MasonicCC
452043 MasonicCC
470861 Mdanderson
490989 MasonicCC
491154 Mdanderson
491190 MasonicCC
504457 StJudeCRH
531778 Mdanderson
531905 MasonicCC
532938 Mdanderson
543871 StJudeCRH
549772 UNC_Lineberger
550090 UNC_Lineberger
550159 UNC_Lineberger
551972 UNC_Lineberger
553137 UNC_Lineberger
557580 Mdanderson
561610 UNC_Lineberger
561613 UNC_Lineberger
561620 UNC_Linberger
562062 UNC_Lineberger
564370 MasonicCC
579819 UNC_Lineberger
580360 StJudeCRH
586510 MasonicCC
586671 MasonicCC
588191 Mdanderson
593406 StJudeCRH
599895 Beth_IsraelMC
642276 MasonicCC
649128 UNC_Lineberger

All of the docs listed by Kim are on this list.

I will add a qualifier to all of my selection queries to say that
if the CTGovOwnerOrganization exists, don't include the document
in the set to be transformed.
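A qualifier like this is usually written as a `NOT EXISTS` subquery. The sketch below runs against an in-memory SQLite table loosely modeled on a CDR-style `query_term` index (doc_id, path, value). The table layout, the XPath-style paths, and the "Completed" status condition (standing in for whichever criteria a given category uses) are all assumptions for illustration, not the production query. It also folds in the earlier decision to leave alone any document that already has a contact response.

```python
import sqlite3
from typing import List

# Hypothetical index paths; the real CDR paths may differ.
STATUS_PATH = "/InScopeProtocol/ProtocolAdminInfo/CurrentProtocolStatus"
OWNER_ORG_PATH = ("/InScopeProtocol/CTGovOwnershipTransferInfo"
                  "/CTGovOwnerOrganization")
RESPONSE_PATH = ("/InScopeProtocol/CTGovOwnershipTransferContactLog"
                 "/CTGovOwnershipTransferContactResponse")

SELECT_SQL = """
    SELECT DISTINCT t.doc_id
      FROM query_term t
     WHERE t.path = :status_path
       AND t.value = 'Completed'
       -- New qualifier: skip docs that already name a CT.gov owner org.
       AND NOT EXISTS (SELECT 1 FROM query_term o
                        WHERE o.doc_id = t.doc_id AND o.path = :owner_path)
       -- Earlier rule: skip docs that already have any contact response.
       AND NOT EXISTS (SELECT 1 FROM query_term r
                        WHERE r.doc_id = t.doc_id AND r.path = :response_path)
     ORDER BY t.doc_id
"""


def select_candidates(conn: sqlite3.Connection) -> List[int]:
    params = {"status_path": STATUS_PATH, "owner_path": OWNER_ORG_PATH,
              "response_path": RESPONSE_PATH}
    return [row[0] for row in conn.execute(SELECT_SQL, params)]


def demo_connection() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE query_term (doc_id INTEGER, path TEXT, value TEXT)")
    conn.executemany("INSERT INTO query_term VALUES (?, ?, ?)", [
        (1, STATUS_PATH, "Completed"),       # qualifies
        (2, STATUS_PATH, "Completed"),
        (2, OWNER_ORG_PATH, "MasonicCC"),    # already has an owner org
        (3, STATUS_PATH, "Completed"),
        (3, RESPONSE_PATH, "No response"),   # already has a contact response
        (4, STATUS_PATH, "Active"),          # does not meet the stand-in criterion
    ])
    return conn
```

With the demo data, only document 1 survives both exclusions.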

Comment entered 2010-03-05 00:04:41 by alan

BZDATETIME::2010-03-05 00:04:41
BZCOMMENTOR::Alan Meyer
BZCOMMENT::37

I ran the global change in test mode for all documents, producing
37,504 files and 277,030,180 bytes of output. We should probably
delete all of it when testing is done and the live global is
complete. It's occupying a lot of space.

Here are the document selection counts from the first global
change that did not incorporate the correction identified by Kim.
The processing was limited to 200 documents per category, but the
selection counts show what would have happened in a full,
unlimited, test run.

2010-02-23 23:05:25: 12507 documents selected
2010-02-23 23:07:06: 180 documents selected
2010-02-23 23:10:42: 975 documents selected
2010-02-23 23:13:02: 162 documents selected
2010-02-23 23:16:45: 73 documents selected

Here are the new document counts after incorporating the
corrections:

2010-03-04 20:51:33: 12504 documents selected
2010-03-04 22:58:29: 114 documents selected
2010-03-04 23:07:37: 968 documents selected
2010-03-04 23:24:24: 163 documents selected
2010-03-04 23:29:12: 73 documents selected

Some of the changes in counts are undoubtedly due not to the
query correction but to changes in the database. The increase
from 162 to 163 (the 23:13:02 and 23:24:24 batches) must have been
of this type.

The documents can be examined using the global change test report
for any of the above date/time values.

Comment entered 2010-03-15 14:47:13 by eckleyk

BZDATETIME::2010-03-15 14:47:13
BZCOMMENTOR::Kim Eckley
BZCOMMENT::38

(In reply to comment #37)
> I ran the global change in test mode for all documents, producing
> 37,504 files and 277,030,180 bytes of output. We should probably
> delete all of it when testing is done and the live global is
> complete. It's occupying a lot of space.
> ...
> of this type.

I was about to say that this all looked OK, but found another minor snafu.

For the criteria "InscopeProtocols that do not have an NCTID", that should not include trials w/o a publishable version. There are trials in CDR that have a processing status that includes "Needs administrative information" and the element <MissingRequiredInformation>. These trials have not YET been published, and therefore do not have an NCTID. They are a different animal than all of the 5-digit CDRID trials that are appropriately in this bucket.

These are in the query results for: 2010-03-04 20:51:24.
Sample trial: 666482

At this point, the other queries/results look OK.

Comment entered 2010-03-16 16:39:06 by alan

BZDATETIME::2010-03-16 16:39:06
BZCOMMENTOR::Alan Meyer
BZCOMMENT::39

(In reply to comment #38)
> ...
> I was about to say that this all looked OK, but found another
> minor snafu.
>
> For the criteria "InscopeProtocols that do not have an NCTID",
> that should not include trials w/o a publishable version. There
> are trials in CDR that have a processing status that includes
> "Needs administrative information" and the element
> <MissingRequiredInformation>. These trials have not YET been
> published, and therefore do not have an NCTID. They are a
> different animal than all of the 5-digit CDRID trials that are
> appropriately in this bucket.

In order to find these documents and exclude them from the global
change I'll need to add a couple of entries into our
query_term_def table - the definitions that tell the software
what XML elements to index. Then I'll need to re-index all
InScopeProtocols.

I'll do this first on Mahler. Then, tonight, when publishing is
done, I'll do the update and re-indexing on Bach. It's a big
batch process and I don't want it to interfere with interactive
users or with publishing.

Franck will get the updates when we next update Franck from Bach.

Comment entered 2010-03-17 01:11:09 by alan

BZDATETIME::2010-03-17 01:11:09
BZCOMMENTOR::Alan Meyer
BZCOMMENT::40

(In reply to comment #38)
> ...
> For the criteria "InscopeProtocols that do not have an NCTID",
> that should not include trials w/o a publishable version. There
> are trials in CDR that have a processing status that includes
> "Needs administrative information" and the element
> <MissingRequiredInformation>. These trials have not YET been
> published, and therefore do not have an NCTID. They are a
> different animal than all of the 5-digit CDRID trials that are
> appropriately in this bucket.
>
> These are in the query results for: 2010-03-04 20:51:24.
> Sample trial: 666482

I've revised the query for the documents in light of the above as
follows:

If the document has a ProcessingStatus (in either the old or
the new element used for that) with a value of "Needs
administrative information"

AND

If the document has a MissingRequiredInformation/MissingInformation
element, with any value at all.

THEN

Don't include it in the query.
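The revised rule can be expressed as a predicate over a protocol's XML. This sketch uses the standard `xml.etree.ElementTree` module; the tag name `ProcessingStatus` is an assumption, and `STATUS_TAGS` is where the name of the old processing-status element would also be listed so that either location is checked.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, List

NEEDS_ADMIN = "Needs administrative information"
# Assumed tag names; add the old processing-status element's name here too.
STATUS_TAGS = ("ProcessingStatus",)


def _texts(root: ET.Element, tags: Iterable[str]) -> List[str]:
    """Collect the stripped text of every descendant with one of `tags`."""
    return [(el.text or "").strip() for tag in tags for el in root.iter(tag)]


def exclude_from_global(xml_text: str) -> bool:
    """True if the doc has status 'Needs administrative information' AND any
    MissingRequiredInformation/MissingInformation element (any value)."""
    root = ET.fromstring(xml_text)
    needs_admin = NEEDS_ADMIN in _texts(root, STATUS_TAGS)
    missing_info = root.find(
        ".//MissingRequiredInformation/MissingInformation") is not None
    return needs_admin and missing_info
```

Because both conditions must hold, a document whose status is something else (for example Hold) but which does have a MissingInformation element is still selected under this rule.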

We don't need to test by re-running the global because there is
no change in the way documents are processed. All we need to do
is run the old and new queries and look at the differences.

Here are the differences. The following documents are excluded
by the new query that would have been found by the old query. If
these are correct, we're ready to roll.

560423
657208
661794
662682
663557
663832
664115
664529
665195
665319
665359
665860
666327
666482
666745
666959
666999
668246
668388

I must say that Kim has sharp eyes to have spotted one of these
and seen that it was wrong in among a total of about 12,500
documents!

Comment entered 2010-03-17 11:49:21 by eckleyk

BZDATETIME::2010-03-17 11:49:21
BZCOMMENTOR::Kim Eckley
BZCOMMENT::41

(In reply to comment #40)
> Here are the differences. The following documents are excluded
> by the new query that would have been found by the old query. If
> these are correct, we're ready to roll.
> 560423
> 657208
> 661794
> 662682
> 663557
> 663832
> 664115
> 664529
> 665195
> 665319
> 665359
> 665860
> 666327
> 666482
> 666745
> 666959
> 666999
> 668246
> 668388
> I must say that Kim has sharp eyes to have spotted one of these
> and seen that it was wrong in among a total of about 12,500
> documents!

I just prepared a long comment, and lost it. Urgg.

Thanks Alan - I knew that 6-digit numbers in that quantity wasn't good....

We looked at a few more trials, as there are still quite a few 6-digit trials remaining. It looks like some do not have the processing status of Needs administrative information, but do have the MissingInformation element. They have a processing status of Hold.

We should probably look at adding (to the criteria) the processing status of HOLD as another flag to exclude the trial from selection.
Samples:
600208 - on hold for missing information
617024 - on hold for missing information

Hope that makes sense.

Comment entered 2010-03-17 12:40:48 by alan

BZDATETIME::2010-03-17 12:40:48
BZCOMMENTOR::Alan Meyer
BZCOMMENT::42

(In reply to comment #41)
...
> We should probably look at adding (to the criteria) the processing status of
> HOLD as another flag to exclude the trial from selection.
> Samples:
> 600208 - on hold for missing information
> 617024 - on hold for missing information
>
> Hope that makes sense.

That will be easy to do. But before I do it I better give you
the complete list of processing statuses in case there are any
others that we ought to exclude. Here they are from the
CommonProtocolInfo schema:

<!-- Deprecated; to be removed after old docs are flushed. -->
<enumeration value = 'Edit HP Abstract'/>
<enumeration value = 'Edit PT Abstract'/>
<enumeration value = 'Write HP Abstract'/>
<enumeration value = 'Write PT Abstract'/>
<enumeration value = 'Ready to merge'/>

<!-- These values are from the InScopeProtocol schema. -->
<enumeration value = 'Disapproved by PDQ Editorial Board'/>
<enumeration value = 'Approved by PDQ Editorial Board'/>
<enumeration value = 'Under PDQ Editorial Board review'/>
<enumeration value = 'Processing complete'/>
<enumeration value = 'Pending'/>
<enumeration value = 'Duplicate'/>
<enumeration value = 'Withdrawn'/>
<enumeration value = 'Hold'/>
<enumeration value = 'Abstract in review'/>
<enumeration value = 'Merged'/>
<enumeration value = 'Needs administrative information'/>
<enumeration value = 'Research study'/>
<enumeration value = 'Legacy - Do not publish'/>
<enumeration value = 'Needs OCCM review'/>
<enumeration value = 'Needs PI/PC comment'/>
<enumeration value = 'Needs scientist comment'/>
<enumeration value = 'Needs scientific information'/>

<!-- Use these instead for new documents. -->
<enumeration value = 'Final QC/Publishing'/>
<enumeration value = 'First edit'/>
<enumeration value = 'Patient Abstract'/>
<enumeration value = 'Patient QC'/>
<enumeration value = 'Requires second edit'/>
<enumeration value = 'Second edit'/>
<enumeration value = 'Write HP draft'/>
<enumeration value = 'Requires PT Abstract'/>
<enumeration value = 'Requires patient QC'/>
<enumeration value = 'Requires first edit'/>

Also, we probably need to reconsider the role of the
MissingInformation element. Some options are:

If there is a MissingInformation element in a document,
exclude that document from the list of documents to process,
no matter what the processing status is.

OR

For certain status values, only exclude the documents if a
MissingInformation element exists.

OR

Don't use the MissingInformation element at all.

Comment entered 2010-03-17 12:56:27 by eckleyk

BZDATETIME::2010-03-17 12:56:27
BZCOMMENTOR::Kim Eckley
BZCOMMENT::43

(In reply to comment #42)
Exclude trials with the following processing statuses:

<enumeration value = 'Pending'/>
<enumeration value = 'Hold'/>
<enumeration value = 'Abstract in review'/>
<enumeration value = 'Merged'/>
<enumeration value = 'Needs administrative information'/>

I don't think we will see more than merged, needs admin info and hold, but let's play it safe this time. Thanks for the list.

> Also, we probably need to reconsider the role of the
> MissingInformation element. Some options are:
> If there is a MissingInformation element in a document,
> exclude that document from the list of documents to process,
> no matter what the processing status is.
> OR
> For certain status values, only exclude the documents if a
> MissingInformation element exists.
> OR
> Don't use the MissingInformation element at all.

For this, let's go with the first option.
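Taken together, the decisions in this comment turn the exclusion test for the no-NCTID category into an OR: any MissingInformation element excludes the trial, and so does any of the five listed statuses. A minimal sketch, assuming the processing statuses and the presence of a MissingInformation element have already been extracted from the document; the function name is hypothetical.

```python
from typing import Iterable

# Statuses Kim asked to exclude (comment #43).
EXCLUDED_STATUSES = frozenset({
    "Pending",
    "Hold",
    "Abstract in review",
    "Merged",
    "Needs administrative information",
})


def exclude_no_nctid_trial(statuses: Iterable[str],
                           has_missing_information: bool) -> bool:
    """Return True if the trial should be left out of the 'no NCTID' batch."""
    if has_missing_information:
        # First option chosen: MissingInformation excludes regardless of status.
        return True
    return any(status in EXCLUDED_STATUSES for status in statuses)
```

Compared with the earlier AND rule, a trial on Hold with a MissingInformation element (like the samples in comment #41) is now excluded on either ground.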

Comment entered 2010-03-17 13:18:43 by alan

BZDATETIME::2010-03-17 13:18:43
BZCOMMENTOR::Alan Meyer
BZCOMMENT::44

I'll implement the changes tomorrow and generate a new list
of document IDs that will be excluded from the main list.

Comment entered 2010-03-18 10:42:17 by alan

BZDATETIME::2010-03-18 10:42:17
BZCOMMENTOR::Alan Meyer
BZCOMMENT::45

(In reply to comment #44)
> I'll implement the changes tomorrow and generate a new list
> of document IDs that will be excluded from the main list.

There are now 191 docs that are excluded from the list of
documents with no NCTID. I've attached the CDR IDs.

Comment entered 2010-03-18 10:42:17 by alan

Attachment diffout2 has been added with description: IDs of docs with no NCTID that will not be marked as not needing transfer

Comment entered 2010-03-18 13:49:28 by alan

BZDATETIME::2010-03-18 13:49:28
BZCOMMENTOR::Alan Meyer
BZCOMMENT::46

Email message from Andrea:

Hi Alan,

Kim asked me to take a look at the 191 trials that were excluded from the new list. As far as I can see, things look good. Unfortunately, I don’t have Bugzilla access, so I wasn’t able to leave the comment there.

Thanks,
Andrea

Comment entered 2010-03-19 00:25:56 by alan

BZDATETIME::2010-03-19 00:25:56
BZCOMMENTOR::Alan Meyer
BZCOMMENT::47

I started the global change tonight in live mode on Bach and let
it run while I worked on another problem. But something has gone
wrong. After processing 4,889 documents, I noticed that only 22
had been saved.

I think the 22 should have been saved, so no harm was done, but I
don't yet know why the other 4,867 documents were not saved.

I have done some testing. I suspect there is a bug in the
ModifyDocs module (the global change "harness" that performs the
testing and saving of documents in all global changes) that is
hit when a calling program specifies that no saving is to be done
for previous versions, only for the CWD. I can't remember that
we ever actually used that capability before. It appeared to
work fine in test mode, but not in live mode. It's possible that
it has only ever been tested in test mode. I remember creating
the capability because I thought we might use it for something,
but then we didn't. So this task might be the first real use.

It appears that the proximate cause of the bug is that, somehow,
the ModifyDocs module replaces the in-memory copy of the CWD with
the transformed version too early in the process. It then
compares the transformed version of the current working document
with itself, instead of with the old CWD. The transformation
actually occurs, but the saveDoc logic doesn't see that anything
has changed and so does not save the document back to the
database.
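The failure described above is a general pattern: if the in-memory copy is overwritten before the comparison, the "did anything change?" test compares the new text with itself. The sketch below is a hypothetical reduction of that pattern, not the actual ModifyDocs code; `Doc`, `transform`, and `save` are illustrative stand-ins.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Doc:
    cwd: str  # in-memory copy of the current working document


def save_if_changed_buggy(doc: Doc, transform: Callable[[str], str],
                          save: Callable[[str], None]) -> bool:
    new_xml = transform(doc.cwd)
    doc.cwd = new_xml            # BUG: old CWD replaced before the comparison
    if new_xml != doc.cwd:       # always False: compares new_xml with itself
        save(new_xml)
        return True
    return False


def save_if_changed_fixed(doc: Doc, transform: Callable[[str], str],
                          save: Callable[[str], None]) -> bool:
    old_xml = doc.cwd
    new_xml = transform(old_xml)
    if new_xml == old_xml:       # compare against the untouched original
        return False
    save(new_xml)
    doc.cwd = new_xml            # update the in-memory copy only after saving
    return True
```

In the buggy version the transformation still happens in memory, but nothing reaches `save`, which matches the symptom of documents being processed but not stored.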

At this point, I need to do a number of things:

1. Figure out why the CWD is replaced too early.

It may have something to do with the mechanism for turning
off the save of the publishable version. The sequence of
processing, saving, and remembering what was done is by far
the most complicated part of the ModifyDocs module. It is
possible that some part of the logic knows that it shouldn't
save the document, but doesn't know that it shouldn't replace
the in-memory copy of the CWD.

2. Figure out what is different about the 22 documents that
caused them to be saved.

The superficially obvious explanation - that they don't have
publishable versions - isn't right. I will have to do some
digging. I don't want to "fix" anything until I fully
understand why it occasionally worked.

3. Work out a fix, testing on small samples on Mahler.

4. Walk through it with Bob and/or Volker to see if they spot
any problems that I missed.

5. Test thoroughly on Mahler or Franck.

6. Run on Bach, but watching carefully and stopping it quickly
if it isn't doing the right thing.

Since the heavily used ModifyDocs module is involved, I'm bumping
up the priority of this task from 5 to 3, and will work on it as
my main task next Tuesday until I get it fixed (or someone tells
me this isn't high priority.)

Comment entered 2010-03-19 09:59:45 by Englisch, Volker (NIH/NCI) [C]

BZDATETIME::2010-03-19 09:59:45
BZCOMMENTOR::Volker Englisch
BZCOMMENT::48

(In reply to comment #47)
> It appears that the proximate cause of the bug is that, somehow,
> the ModifyDocs module replaces the in-memory copy of the CWD with
> the transformed version too early in the process. It then
> compares the transformed version of the current working document
> with itself, instead of with the old CWD. The transformation
> actually occurs, but the saveDoc logic doesn't see that anything
> has changed and so does not save the document back to the
> database.

Somehow this process/problem rings a bell for me. I sort of remember that Bob and I had been looking at a problem like this possibly while you were on vacation, Alan.
Is it possible that we fixed the problem only for the test mode?

Comment entered 2010-03-24 00:34:08 by alan

BZDATETIME::2010-03-24 00:34:08
BZCOMMENTOR::Alan Meyer
BZCOMMENT::49

(In reply to comment #47)
> ...
> At this point, I need to do a number of things:
>
> 1. Figure out why the CWD is replaced too early.

Done. The cause was in our complex mechanism for treating
changes to stored versions as changes to the CWD. The
logic we had was right when processing the CWD plus the
last version(s), but wrong when we weren't processing last
versions.

> 2. Figure out what is different about the 22 documents that
> caused them to be saved.

Done. What made the 22 documents different was that they each
had a CWD that was not the same as the last version. All
the other docs had a CWD that was identical to the last
version.

> 3. Work out a fix, testing on small samples on Mahler.

Done. I've tested on small samples on Mahler in both test and
live mode. It appears to be working correctly.

> 4. Walk through it with Bob and/or Volker to see if they spot
> any problems that I missed.

Ready. I hope to walk through the changes with Bob and Volker on
Thursday.

I made changes both to fix the problems, and to better
document the issues to help me, or whoever works on this
in the future, to fully understand what's going on.

I think the new code will be faster and simpler in the
specific case of processing CWDs only, and unchanged in
the more general case.

To achieve simplicity, I eliminated the possibility of
separately specifying that we should process CWDs plus
last versions only, or CWDs plus last publishable versions
only. It was a risky technique that made things very
complicated. The new code allows a programmer to specify
only two possibilities:

Process the CWD and last version and last publishable
version.

Only the minimum required set is actually processed, as
before. If the last version is both publishable and
identical to the CWD, processing it effectively
processes all three states of the document.

or:

Process only the CWD.

I don't think we have a use case for separating last and
last publishable versions and only processing one of
those.
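The two remaining choices can be illustrated with a small planning function. This is a hypothetical simplification, not the ModifyDocs interface: it assumes we know whether the CWD is identical to the last version, plus the version numbers of the last and last publishable versions, and it returns the distinct document states that would need transforming.

```python
from typing import List, Optional


def states_to_process(cwd_only: bool, cwd_same_as_last: bool,
                      last_ver: Optional[int],
                      last_pub_ver: Optional[int]) -> List[str]:
    """Return the minimum set of document states to transform."""
    states = ["CWD"]
    if cwd_only:
        return states
    # A last version identical to the CWD is covered by processing the CWD.
    if last_ver is not None and not cwd_same_as_last:
        states.append(f"version {last_ver}")
    # The last publishable version needs separate work only when it is not
    # also the last version.
    if last_pub_ver is not None and last_pub_ver != last_ver:
        states.append(f"version {last_pub_ver}")
    return states
```

When the last version is publishable and identical to the CWD, the function returns only the CWD, mirroring the "one save covers all three states" case described above.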

> 5. Test thoroughly on Mahler or Franck.

To be done.

> 6. Run on Bach, but watching carefully and stopping it quickly
> if it isn't doing the right thing.

To be done.

Comment entered 2010-03-24 00:47:56 by alan

BZDATETIME::2010-03-24 00:47:56
BZCOMMENTOR::Alan Meyer
BZCOMMENT::50

(In reply to comment #47)

> ... I can't remember that
> we ever actually used that capability before. ...

It appears that we actually did use this once before. It was in
December 2006, for issue #2747, "Convert three 'bajo estudio'
phrases to 'en estudio'".

If there were documents that had a CWD that was identical to the
last version, then unless this worked before but got broken later
(which I doubt), those documents never got converted.

According to the log files, 98 documents were converted. Perhaps
there are more that should have been. But I won't go back and
look for them unless someone says I should.

Comment entered 2010-03-25 19:27:44 by alan

BZDATETIME::2010-03-25 19:27:44
BZCOMMENTOR::Alan Meyer
BZCOMMENT::51

At our status meeting today we decided:

1. I will test thoroughly in live mode on Franck before running on Bach.

2. I will not revisit the "bajo estudio" issue.

Bob, Volker and I completed the code walkthrough and identified a number of changes intended to make the code more readable and fix a backward compatibility issue. I have implemented all of the changes but won't have time to test tonight.

Comment entered 2010-03-31 00:52:51 by alan

BZDATETIME::2010-03-31 00:52:51
BZCOMMENTOR::Alan Meyer
BZCOMMENT::52

I've been testing the new ModifyDocs code on Mahler. It seems
to be almost right but not exactly right. It's now properly
storing all of the same CWD documents in live mode that it
transforms in test mode. However on a test of 250 documents,
it also transformed 4 versions. I'll figure it out on Thursday.

Comment entered 2010-04-02 00:56:25 by alan

BZDATETIME::2010-04-02 00:56:25
BZCOMMENTOR::Alan Meyer
BZCOMMENT::53

Light has dawned over Marblehead.

The saved versions in my last message were mostly okay, saving
old CWDs that were different from the last version. Only one was
really wrong.

There was a bug in my selection queries that caused one of the
document IDs that I was processing for this request to be
selected twice. I was missing a "DISTINCT" doc ID qualifier.
Once I realized the problem I saw that the evidence was plain
from the very beginning, but after making significant changes to
the ModifyDocs module last week I spent all my time trying to
figure out why that module went wrong. But it didn't go wrong.
It was just handed an incorrect list of document IDs.
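A missing `DISTINCT` produces duplicate IDs whenever the selection condition matches more than one index row for the same document (for example, a repeated element): a plain `SELECT` returns that document once per matching row. The sketch uses an in-memory SQLite table with a hypothetical layout and path, and adds a defensive de-duplication helper on the Python side.

```python
import sqlite3
from typing import Iterable, List

REPEATED_PATH = "/InScopeProtocol/SomeRepeatedElement"  # hypothetical path


def selected_ids(conn: sqlite3.Connection, distinct: bool) -> List[int]:
    keyword = "DISTINCT " if distinct else ""
    sql = (f"SELECT {keyword}doc_id FROM query_term "
           "WHERE path = ? ORDER BY doc_id")
    return [row[0] for row in conn.execute(sql, (REPEATED_PATH,))]


def dedupe_preserving_order(ids: Iterable[int]) -> List[int]:
    # dict keys keep insertion order, so the first occurrence wins.
    return list(dict.fromkeys(ids))


def demo_connection() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE query_term (doc_id INTEGER, path TEXT, value TEXT)")
    conn.executemany("INSERT INTO query_term VALUES (?, ?, ?)", [
        (100, REPEATED_PATH, "first occurrence"),
        (100, REPEATED_PATH, "second occurrence"),  # same doc, second match
        (200, REPEATED_PATH, "only occurrence"),
    ])
    return conn
```

Without `DISTINCT`, document 100 would be handed to the global change twice; either the SQL keyword or the Python-side helper removes the duplicate.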

I've started the test again on Mahler. I'm doing a test mode
run, then a live mode run. If it looks right this time, I'll do
a test on Franck on Tuesday.

Bear with me. We'll get there Real Soon Now.

Comment entered 2010-04-08 17:55:30 by alan

BZDATETIME::2010-04-08 17:55:30
BZCOMMENTOR::Alan Meyer
BZCOMMENT::54

Everything looked fine to me in the tests on Mahler.

I'm currently running a full test mode run on Franck. When it's
complete, I'll do some checking, then run a full live mode run on
Franck tonight. If everything still looks good I'll post a
message tonight asking users to do any review that seems
appropriate and tell me if there are problems or else authorize
a run on Bach.

I can start the job on Bach from home if desired to get it done
as soon as possible. If authorized tomorrow, I'll wait until
after users have gone home and weekly publishing is complete -
perhaps running the job on Saturday.

Comment entered 2010-04-09 10:35:07 by alan

BZDATETIME::2010-04-09 10:35:07
BZCOMMENTOR::Alan Meyer
BZCOMMENT::55

I've run twice on Franck. The first run was in test mode.
Results are in the global change test results here:

2010-04-08 19:39:05
2010-04-08 19:36:24
2010-04-08 19:16:51
2010-04-08 19:14:15
2010-04-08 17:07:35

The second run was live. The log file for that is
attached.

The changes to the documents are easy to verify in the test
mode run. More difficult would be verifying that the right
versions were saved in the live mode, and that the selections
in either mode were correct.

If everything looks good, give me the go ahead and I'll start
either a test or live mode run on Bach this weekend.

Comment entered 2010-04-09 10:35:07 by alan

Attachment 4721,log has been added with description: Log file for live run on Franck

Comment entered 2010-04-12 11:28:54 by eckleyk

BZDATETIME::2010-04-12 11:28:54
BZCOMMENTOR::Kim Eckley
BZCOMMENT::56

(In reply to comment #55)
> Created an attachment (id=1888) [details]
> Log file for live run on Franck
> I've run twice on Franck. The first run was in test mode.
> Results are in the global change test results here:
> 2010-04-08 19:39:05
> 2010-04-08 19:36:24
> 2010-04-08 19:16:51
> 2010-04-08 19:14:15
> 2010-04-08 17:07:35
> The second run was live. The log file for that is
> attached.
> The changes to the documents are easy to verify in the test
> mode run. More difficult would be verifying that the right
> versions were saved in the live mode, and that the selections
> in either mode were correct.
> If everything looks good, give me the go ahead and I'll start
> either a test or live mode run on Bach this weekend.

In 2010-04-08 17:07:35 - trial 653070 has the contact log added as not required for reason "Never registered in CTGov". However, this trial has a processing status of Hold and has the missing required information element.

Trial 666327 also has the missing req. info element and the status of Needs admin information. We were going to exclude these as part of comment #43.

Comment entered 2010-04-13 10:22:38 by alan

BZDATETIME::2010-04-13 10:22:38
BZCOMMENTOR::Alan Meyer
BZCOMMENT::57

(In reply to comment #56)
...
> In 2010-04-08 17:07:35 - trial 653070 has the contact log added
> as not required for reason "Never registered in CTGov".
> However, this trial has a processing status of Hold and the
> missing required information elements.
>
> Trial 666327 also has the missing req. info element and the
> status of Needs admin information. We were going to exclude
> these as part of comment #43.

I see that I misinterpreted my instructions there.

I added the requirement for that to the "Never registered in
CTGov" category. But you probably intended for me to add it to
"Blocked From CTGov" as well.

I can do that. To be clear, I will add the restriction that
documents with a ProcessingStatus of:

'Pending',
'Hold',
'Abstract in review',
'Merged',
'Needs administrative information'

are all to be excluded from the categories:

"Never registered in CTGov",
"Blocked From CTGov"

I would think that this doesn't affect the remaining three
categories, but if you think it could, let me know and I'll add
it to whichever ones require it:

"Completed prior to Sept 27, 2007",
"Withdrawn from PDQ",
"Withdrawn prior to Sept 27, 2007"

I can't re-run on Franck again because the documents there now
already have the "Transfer not required" information added to
them and will not be selected again.

What I'll do as a (hopefully final) test is run in test mode on
Bach tonight, after publishing completes.

I won't be changing the transformation script, just the selection
criteria. So the only QA we should need is to look for documents
that should not have been selected and were, or should have been
and were not.

I'm sorry this is taking so long. I think part of the problem is
that we're changing many thousands of documents, with a complex
set of selection criteria, and it's hard to know whether we got
the criteria exactly right.

Comment entered 2010-04-13 10:30:09 by eckleyk

BZDATETIME::2010-04-13 10:30:09
BZCOMMENTOR::Kim Eckley
BZCOMMENT::58

(In reply to comment #57)
> (In reply to comment #56)
> ...
> > In 2010-04-08 17:07:35 - trial 653070 has the contact log added
> > as not required for reason "Never registered in CTGov".
> > However, this trial has a processing status of Hold and the
> > missing required information elements.
> >
> > Trial 666327 also has the missing req. info element and the
> > status of Needs admin information. We were going to exclude
> > these as part of comment #43.
> I see that I misinterpreted my instructions there.
> I added the requirement for that to the "Never registered in
> CTGov" category. But you probably intended for me to add it to
> "Blocked From CTGov" as well.
> I can do that. To be clear, I will add the restriction that
> documents with a ProcessingStatus of:
> 'Pending',
> 'Hold',
> 'Abstract in review',
> 'Merged',
> 'Needs administrative information'
> are all to be excluded from the categories:
> "Never registered in CTGov",
> "Blocked From CTGov"

Just to make sure we are both on the same page - the trials with those statuses should not have ANY call log added. They should be left alone.

> What I'll do as a (hopefully final) test is run in test mode on
> Bach tonight, after publishing completes.

Sounds good.

> I'm sorry this is taking so long. I think part of the problem is
> that we're changing many thousands of documents, with a complex
> set of selection criteria, and it's hard to know whether we got
> the criteria exactly right.

I think that is exactly the problem - just so many documents and complex requirements. No worries.

Comment entered 2010-04-13 11:10:32 by alan

BZDATETIME::2010-04-13 11:10:32
BZCOMMENTOR::Alan Meyer
BZCOMMENT::59

(In reply to comment #58)
...
> > I can do that. To be clear, I will add the restriction that
> > documents with a ProcessingStatus of:
> > 'Pending',
> > 'Hold',
> > 'Abstract in review',
> > 'Merged',
> > 'Needs administrative information'
> > are all to be excluded from the categories:
> > "Never registered in CTGov",
> > "Blocked From CTGov"
>
> Just to make sure we are both on the same page - the trials
> with those statuses should not have ANY call log added. They
> should be left alone.

Right. If they have one of the two categories "Never registered
in CTGov" or "Blocked From CTGov", and have any of the five
listed ProcessingStatus values, they will not be selected for the
global change and will not have a contact log block added.

> > What I'll do as a (hopefully final) test is run in test mode
> > on Bach tonight, after publishing completes.
>
> Sounds good.

I'll run it tonight.

> > I'm sorry this is taking so long. I think part of the
> > problem is that we're changing many thousands of documents,
> > with a complex set of selection criteria, and it's hard to
> > know whether we got the criteria exactly right.
>
> I think that is exactly the problem - just so many documents
> and complex requirements. No worries.

I appreciate your careful testing on this. I know you're looking
for needles in the haystack here.

Comment entered 2010-04-13 23:37:33 by alan

BZDATETIME::2010-04-13 23:37:33
BZCOMMENTOR::Alan Meyer
BZCOMMENT::60

I didn't think deeply enough about Kim's error report when I
posted my earlier comments.

It turns out that there are a number of issues to be resolved.

The cause of the selection of the documents Kim identified that
shouldn't have been selected had to do with missing indexing on
Franck. A query that worked on Bach and Mahler made errors on
Franck because "MissingInformation" was missing from the index
there. Kind of poetic isn't it?

What I said about applying the ProcessingStatus filter to another
category of documents had nothing to do with the particular
errors Kim saw, but it turns out that not applying the filter
there probably did cause other errors.

When I added the filter to the Blocked From CTGov category, it
did indeed change the results. Seven documents were excluded
that had not been excluded before. They were:

299686
331922
459498
573460
579896
579908

These would have been included in the old query but not the new.
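Checking what a change to the selection criteria does is a set comparison between the old and new results. A minimal sketch, assuming each query's output has been saved as a list of doc IDs; the function name and the sample IDs below are hypothetical.

```python
def compare_selections(old_ids, new_ids):
    """Report doc IDs that one version of a selection query picks up
    and the other does not."""
    old, new = set(old_ids), set(new_ids)
    return {
        "dropped": sorted(old - new),  # selected before, excluded now
        "added": sorted(new - old),    # excluded before, selected now
    }


print(compare_selections([101, 102, 103, 104], [101, 103, 105]))
# {'dropped': [102, 104], 'added': [105]}
```

Looking at both directions matters here: a tightened filter should only drop documents, so anything in "added" would point to a bug in the rewritten query.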

But there's yet another problem too. It occurs to me that I am
probably selecting ProcessingStatus values incorrectly.

By dumb luck, I found one document that got excluded because it
had a ProcessingStatus of "Pending", but it turned out that
wasn't the latest processing status. The latest status is
"Processing complete".

When Kim explained the requirement to me to not select documents
that were Pending, Hold, etc., she didn't say to only look at the
latest ProcessingStatus, probably because that was so obvious it
didn't need saying.

But alas, since I don't work with the actual content enough, it
wasn't obvious to me.

So I'm not ready to run another test tonight. I need to better
understand and rethink the portion of the query dealing with
ProcessingStatus before we test again. Here are some questions
relating to that:

1. What is the relationship between the two kinds of
ProcessingStatus elements?

The two are:

a. '/InScopeProtocol/ProtocolProcessingDetails/ProcessingStatuses/ProcessingStatusInfo/ProcessingStatus'

b. '/InScopeProtocol/ProtocolProcessingDetails/ProcessingStatus'

Can they conflict?

Does one of them take precedence over the other? Should one
be ignored if the other exists? Or is the relationship more
complicated than that?

2. How do I evaluate the ProcessingStatus that's part of the
list of statuses?

Is it always the most recent one that counts? Does that one
always supersede all the other status values, or does it ever
supplement one?

Is the most recent one always the one that's physically at
the top of the list in the order in the document? If it is,
it's easier to find than if I need to look at all of them and
order them by date.

3. Are we sure that it's just the two categories of documents
for which we have to evaluate ProcessingStatus, namely:

Never registered in CTGov
and
Blocked From CTGov

My naive understanding of the other three tells me that
ProcessingStatus doesn't matter for them, but maybe that's
too naive:

Completed prior to Sept 27, 2007
Withdrawn from PDQ
Withdrawn prior to Sept 27, 2007

I'll update the indexing on Franck, though it probably doesn't
matter at this point since testing is no longer practical there
after running in live mode.

And I'll wait for answers to my three questions above before I
update the program further and run a new test.

Comment entered 2010-04-14 08:24:36 by eckleyk

BZDATETIME::2010-04-14 08:24:36
BZCOMMENTOR::Kim Eckley
BZCOMMENT::61

(In reply to comment #60)
> A query that worked on Bach and Mahler made errors on
> Franck because "MissingInformation" was missing from the index
> there. Kind of poetic isn't it?

🙂

> When Kim explained the requirement to me to not select documents
> that were Pending, Hold, etc., she didn't say to only look at the
> latest ProcessingStatus, probably because that was so obvious it
> didn't need saying.
> But alas, since I don't work with the actual content enough, it
> wasn't obvious to me.

I should have made that more clear - see the answer to your question below.

> Here are some questions relating to that:
> 1. What is the relationship between the two kinds of
> ProcessingStatus elements?
> The two are:
> a. '/InScopeProtocol/ProtocolProcessingDetails/ProcessingStatuses/
> ProcessingStatusInfo/ProcessingStatus'
> b. '/InScopeProtocol/ProtocolProcessingDetails/ProcessingStatus'
> Can they conflict?
> Does one of them take precedence over the other? Should one
> be ignored if the other exists? Or is the relationship more
> complicated than that?

They shouldn't conflict. Since I am too close to the content, I don't always remember things the way they were, but the way they ARE. Of the two elements above, (b) is an older version of processing status. We wanted to track the true processing status in a bit more detail, so we came up with (a). Older documents may have (b), but all newer documents (since the newer elements were created) can only have (a) (because the new elements were added rather than substituted, both are still valid). Any documents that were in active processing when we made the change I believe WERE updated to (a). So we shouldn't see any docs with both types.
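Given this answer, a selection step can simply read whichever of the two forms a document carries. A sketch using Python's xml.etree.ElementTree, assuming un-namespaced element names and an InScopeProtocol root element; it collects both forms rather than choosing one, since per the answer above a document should not have both. The function and constant names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Paths are relative to the InScopeProtocol root element.
NEW_STYLE = ("ProtocolProcessingDetails/ProcessingStatuses/"
             "ProcessingStatusInfo/ProcessingStatus")     # form (a)
OLD_STYLE = "ProtocolProcessingDetails/ProcessingStatus"  # form (b)


def processing_statuses(xml_text: str) -> list[str]:
    """Return all ProcessingStatus values: new-style entries first, in
    document order, then any old-style one. A document is expected to
    carry only one of the two forms."""
    root = ET.fromstring(xml_text)
    found = root.findall(NEW_STYLE) + root.findall(OLD_STYLE)
    return [elem.text.strip() for elem in found if elem.text]


doc = ("<InScopeProtocol><ProtocolProcessingDetails><ProcessingStatuses>"
       "<ProcessingStatusInfo><ProcessingStatus>Pending</ProcessingStatus>"
       "</ProcessingStatusInfo><ProcessingStatusInfo>"
       "<ProcessingStatus>Hold</ProcessingStatus></ProcessingStatusInfo>"
       "</ProcessingStatuses></ProtocolProcessingDetails></InScopeProtocol>")
print(processing_statuses(doc))  # ['Pending', 'Hold']
```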

> 2. How do I evaluate the ProcessingStatus that's part of the
> list of statuses?
> Is it always the most recent one that counts?
This is where your initial assumption was correct - it doesn't have to be the last status.

> Does that one always supersede all the other status values, or does it ever
> supplement one?

They are entered in processing order - so this is a tricky question. None really supplement another; but the trump status could be first - the needs admin info; or it could be last - the Hold status.

> Is the most recent one always the one that's physically at
> the top of the list in the order in the document? If it is,
> it's easier to find than if I need to look at all of them and
> order them by date.

I don't think date should matter - just as long as it's present. Not sure if that makes it easier/harder on you.

> 3. Are we sure that it's just the two categories of documents
> for which we have to evaluate ProcessingStatus, namely:
> Never registered in CTGov
> and
> Blocked From CTGov
> My naive understanding of the other three tells me that
> ProcessingStatus doesn't matter for them, but maybe that's
> too naive:
> Completed prior to Sept 27, 2007
> Withdrawn from PDQ
> Withdrawn prior to Sept 27, 2007

You are correct. The last three imply that the trial was published at some point - and the presence of one of those processing statuses SHOULD mean it was never published.

In actuality, only the "Never registered in CTGov" category may apply, as Blocked from CTGov also implies that the trial was at least published to cancer.gov. Trials that have any of the following statuses at any point in the processing queue shouldn't have been published, and therefore shouldn't be modified by this query.

'Hold',
'Abstract in review',
'Needs administrative information'

You'll notice that 'Pending' and 'Merged' are now missing from the statuses above. After thinking your questions through and getting to this point, I realized these are a bit trickier.

The only time pending needs to be looked at is if it's the ONLY processing status. Nearly all trials start in a pending status, so that would wipe out a large chunk of trials!

As for Merged, after thinking more, this is always present at the end of the steps, but should be OK overall UNLESS the trial is in review, hold, or missing information - and we have those covered. We should NOT consider Merged.

For our processing, we keep all processing statuses, in order, except hold and needs admin info. Those we remove when warranted.
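Put together, the rules above amount to a small predicate over a trial's full list of ProcessingStatus values. This is a hypothetical sketch of the rule as stated in this comment, not the code that was eventually run; which categories it applies to ("Never registered in CTGov" only, or "Blocked From CTGov" as well) was still being settled at this point in the thread, so it is left out.

```python
# Statuses that exclude a trial wherever they appear in its history.
BLOCKING_STATUSES = {"Hold", "Abstract in review",
                     "Needs administrative information"}


def excluded_by_processing_status(statuses: list[str]) -> bool:
    """Hypothetical predicate: True if the trial should be left alone.

    - Any blocking status excludes the trial, regardless of its
      position or date (the latest status is not the only one that counts).
    - 'Pending' excludes the trial only when it is the only status.
    - 'Merged' is never considered.
    """
    if BLOCKING_STATUSES.intersection(statuses):
        return True
    return set(statuses) == {"Pending"}


print(excluded_by_processing_status(["Pending", "Merged"]))          # False
print(excluded_by_processing_status(["Pending", "Hold", "Merged"]))  # True
```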

Because these are so tricky, I'm wondering whether, when we run live, we could have a report (and if this is too difficult, say so!) telling us which trials were NOT changed based on this set of criteria. We can then go back and review if necessary. Or, in reverse, before we run, is it possible for us to see what would be excluded based on processing status so we could take a quick look?

> I'll update the indexing on Franck, though it probably doesn't
> matter at this point since testing is no longer practical there
> after running in live mode.
> And I'll wait for answers to my three questions above before I
> update the program further and run a new test.

Thanks, Alan for all your patience on this one. If you and I keep asking such probing questions and finding every needle, I wonder if this will ever get done! 🙂 That said, I really, really want it done - but don't want to sacrifice the data integrity!! Thanks!

Comment entered 2010-04-15 11:39:25 by alan

BZDATETIME::2010-04-15 11:39:25
BZCOMMENTOR::Alan Meyer
BZCOMMENT::62

(In reply to comment #61)
...
> I'm wondering at this point b/c these are so tricky, if when we
> run live, if we could have a report (and if this is too
> difficult say so!) to tell us which trials were NOT changed
> based on this set of criteria. We can then go back to review if
> necessary. Or, in reverse, before we run, is it possible for
> us to see what would be excluded based on processing status and
> we could take a quick look?

That seems like an excellent idea, and is not hard to do.

Producing the report will have two advantages:

1. It will enable you to zero in on the trickiest parts of the
data, making it easier to find potential errors than if you
had to look through a sea of 13,000 documents for ones that
would change but should not, or through 7,000 others for ones
that would not change but should.

2. The time spent producing the report could be much less than
time spent cleaning up errors that we missed because we
didn't have it.

I propose to write queries to produce the following reports:

Separately for each of the two categories:

Never registered in CTGov
Blocked From CTGov

For each ProcessingStatus:

Pending
Hold
Abstract in review
Merged
Needs administrative information

Print a sorted list of document IDs

You might decide you only want three ProcessingStatus values
examined in the final global change, or don't want "Blocked From
CTGov" docs to look at ProcessingStatus, but by having the report
you'll be better able to confirm that's what you want.

Since ProcessingStatus is a multiply occurring element, it is
entirely possible that some doc IDs will appear on more than
one list.
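The proposed report is a grouping of doc IDs by (category, ProcessingStatus), with a document listed under every status it carries. A sketch, assuming each document has already been reduced to a (doc_id, category, statuses) tuple; `build_report`, `print_report`, and that tuple layout are illustrative, not part of the actual program.

```python
from collections import defaultdict

STATUSES = {"Pending", "Hold", "Abstract in review", "Merged",
            "Needs administrative information"}


def build_report(docs):
    """docs: iterable of (doc_id, category, statuses) tuples.
    Returns {(category, status): sorted doc IDs}."""
    report = defaultdict(set)
    for doc_id, category, statuses in docs:
        # A multiply occurring status puts the doc on several lists.
        for status in STATUSES.intersection(statuses):
            report[(category, status)].add(doc_id)
    return {key: sorted(ids) for key, ids in report.items()}


def print_report(report):
    for (category, status), ids in sorted(report.items()):
        print(f"{category} / {status}: {', '.join(map(str, ids))}")
```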

I'll start work on this. Let me know if it makes sense, and
whether you have any suggestions for improving it.

Maybe, for example, you really need to see the complete list of
ProcessingStatus values for every document that appears on one
of the lists?

Comment entered 2010-04-15 12:16:26 by eckleyk

BZDATETIME::2010-04-15 12:16:26
BZCOMMENTOR::Kim Eckley
BZCOMMENT::63

(In reply to comment #62)

> I propose to write queries to produce the following reports:
> Separately for each of the two categories:
> Never registered in CTGov
> Blocked From CTGov
> For each ProcessingStatus:
> Pending
> Hold
> Abstract in review
> Merged
> Needs administrative information
> Print a sorted list of document IDs
> You might decide you only want three ProcessingStatus values
> examined in the final global change, or don't want "Blocked From
> CTGov" docs to look at ProcessingStatus, but by having the report
> you'll be better able to confirm that's what you want.
> Since ProcessingStatus is a multiply occurring element, it is
> entirely possible that some doc ID's will appear on more than
> one list.
> I'll start work on this. Let me know if it makes sense, and
> whether you have any suggestions for improving it.
> Maybe, for example, you really need to see the complete list of
> ProcessingStatus values for every document that appears on one
> of the lists?

I'm thinking we can skip the queries on the first two - by category. It's really the processing status aspect that's throwing us for a loop.

For the query, yes, I think a complete list of the trials with any of the following statuses would be best.

> Hold
> Abstract in review
> Needs administrative information

We can exclude Pending, and if we can get a separate report for Merged, that would be good. Most documents will have Pending and be absolutely fine, so I don't want to see those. I'm almost as confident that Merged would be the same, but not so confident that I don't want to see a report. I think most of my analysis will be on the single query with the three statuses.

Does that make sense?

Comment entered 2010-04-15 12:46:09 by alan

BZDATETIME::2010-04-15 12:46:09
BZCOMMENTOR::Alan Meyer
BZCOMMENT::64

(In reply to comment #63)
> (In reply to comment #62)
>
> > I propose to write queries to produce the following reports:
> > Separately for each of the two categories:
> > Never registered in CTGov
> > Blocked From CTGov
> > For each ProcessingStatus:
> > Pending
> > Hold
> > Abstract in review
> > Merged
> > Needs administrative information
> > Print a sorted list of document IDs
...

> I'm thinking we can skip the queries on the first two - by category. It's
> really the processing status aspect that's throwing us for the loop.

I understand. I won't include any document on the list that doesn't
have one of the ProcessingStatus values of interest. But I'll still
say, for each of those docs, whether it was never registered or
whether it's blocked.
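The revised report drops the per-category breakdown: one list of trials carrying any of the three statuses of interest, each shown with its category and full status history, plus a separate Merged list. A hypothetical sketch, assuming each document has been reduced to a (doc_id, category, statuses) tuple; the names are illustrative.

```python
FOCUS_STATUSES = {"Hold", "Abstract in review",
                  "Needs administrative information"}


def revised_reports(docs):
    """docs: iterable of (doc_id, category, statuses) tuples.
    Returns (focus_rows, merged_rows), each sorted by doc ID, where a
    row is (doc_id, category, full status list). Pending is not
    reported on its own."""
    focus, merged = [], []
    for doc_id, category, statuses in docs:
        row = (doc_id, category, list(statuses))
        if FOCUS_STATUSES.intersection(statuses):
            focus.append(row)
        if "Merged" in statuses:
            merged.append(row)
    by_id = lambda row: row[0]
    return sorted(focus, key=by_id), sorted(merged, key=by_id)
```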

> For the query, yes, I think the complete list with the trials with the
> following status would be best.
>
> > Hold
> > Abstract in review
> > Needs administrative information
>
> We can exclude pending, and if we can get a separate report for merged, that
> would be good. Most documents will have pending and be absolutely fine, so I
> don't want to see those. I'm almost as confident that merged would be the same,
> but not at the same level that I don't want to see a report. I think the most
> of my analysis will be in the single query with the three statuses.
>
> Does that make sense?

It does.

One of the good things about a report, as opposed to a global
change, is that if I haven't understood, at least I haven't
changed any data.

I've got a meeting to go to but I'll start working on the report
this afternoon and should have it done today.

Comment entered 2010-04-15 16:35:16 by alan

BZDATETIME::2010-04-15 16:35:16
BZCOMMENTOR::Alan Meyer
BZCOMMENT::65

Here are the queries and results as discussed in the preceding
two comments.

Comment entered 2010-04-15 16:35:16 by alan

Attachment 4791ProcStat.txt has been added with description: Docs that would be excluded because of ProcessingStatus on Bach

Comment entered 2010-04-16 12:35:59 by eckleyk

BZDATETIME::2010-04-16 12:35:59
BZCOMMENTOR::Kim Eckley
BZCOMMENT::66

(In reply to comment #65)
> Created attachment 1896 [details]
> Docs that would be excluded because of ProcessingStatus on Bach
> Here are the queries and results as discussed in the preceding
> two comments.

Andrea and I have reviewed the query results two ways.

1 - making sure the trials in the results should be excluded.
2 - Looking at the large trial processing status report, and making sure the trials on the list are either in the query results or rightfully excluded from the query for another reason (presence of the MissingInformation element).
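The second check is a reconciliation between three sets of doc IDs: anything on the processing status report that is neither in the query results nor explained by a MissingInformation element needs a closer look. A sketch, assuming each list is available as doc IDs; the function name is hypothetical.

```python
def unexplained_trials(status_report_ids, query_result_ids, missing_info_ids):
    """Trials on the processing status report that are neither in the
    exclusion query results nor excluded for having MissingInformation."""
    return sorted(set(status_report_ids)
                  - set(query_result_ids)
                  - set(missing_info_ids))


print(unexplained_trials([1, 2, 3, 4], [1, 2], [3]))  # [4]
```

An empty result is what "we are happy" means in set terms.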

And we are happy! Things look good.

I'm nervous to say it, of course, but Alan, if you agree we've covered all bases, I'm ready to go live on Bach if you are.

Comment entered 2010-04-16 13:41:21 by alan

BZDATETIME::2010-04-16 13:41:21
BZCOMMENTOR::Alan Meyer
BZCOMMENT::67

(In reply to comment #66)
...
> And we are happy! Things look good.
>
> I'm nervous to say it of course, but Alan, if you agree we've covered all
> bases, I'm ready to go live in Bach if you are.

I think I won't do anything until publishing is over tonight. Then, tonight or tomorrow, I'll run the job in test mode and look over the results as best I can. Then, unless you want one more review, I'll run the job in live mode, probably on Sunday.

I'm not sure what the consequences of errors are. If we fail to mark something as not needing transfer, I should think the worst that would happen is that a human will review it at some point and realize that it doesn't really need transfer even though it doesn't have the contact log entry that says so. That doesn't sound too serious to me.

On the other side, we might mark something as not needing transfer when it does. Well, I guess at this point we've done as much checking as we can on 13,000+ documents. Hopefully there won't be any errors. Or at least not too many.

Comment entered 2010-04-16 14:35:27 by eckleyk

BZDATETIME::2010-04-16 14:35:27
BZCOMMENTOR::Kim Eckley
BZCOMMENT::68

(In reply to comment #67)

> I think I won't do anything until publishing is over tonight. Then, tonight or
> tomorrow, I'll run the job in test mode and look over the results as best I
> can. Then, unless you want one more review, I'll run the job in live mode,
> probably on Sunday.

Sounds good - go live on Sunday if you are able. I think we've looked at everything possible!

> Well, I guess at this point we've done as much checking as we can on
> 13,000+ documents. Hopefully there won't be any errors. Or at least not too
> many.

I agree!

Comment entered 2010-04-18 22:32:13 by alan

BZDATETIME::2010-04-18 22:32:13
BZCOMMENTOR::Alan Meyer
BZCOMMENT::69

I ran the test mode on Saturday. When it was done I checked
various things. The counts looked about right, the diffs
were right, I searched the files for strings that should or
shouldn't be there, and they all looked right.

So, looking over the edge at the ocean far below, I stepped
back, crossed my fingers, ran forward and leaped over the
cliff.

The live mode is running now. It should finish tonight.

Comment entered 2010-04-19 09:38:39 by alan

BZDATETIME::2010-04-19 09:38:39
BZCOMMENTOR::Alan Meyer
BZCOMMENT::70

The job is complete. There was one doc that was locked and
therefore not processed, and lots and lots of validation
warnings, which is not unexpected with so many long untouched,
closed protocols.

The log file, a very large one, is attached.

Comment entered 2010-04-19 09:38:39 by alan

Attachment 4721b.log has been added with description: Log file for live run on Bach

Comment entered 2010-04-19 09:39:20 by alan

BZDATETIME::2010-04-19 09:39:20
BZCOMMENTOR::Alan Meyer
BZCOMMENT::71

I'm marking this resolved-fixed.

Comment entered 2010-04-19 14:17:08 by eckleyk

BZDATETIME::2010-04-19 14:17:08
BZCOMMENTOR::Kim Eckley
BZCOMMENT::72

(In reply to comment #71)
> I'm marking this resolved-fixed.

And I'm marking this closed. Thank you so much Alan!

Attachments
File Name Posted User
4721,log 2010-04-09 10:35:07
4721b.log 2010-04-19 09:38:39
4791ProcStat.txt 2010-04-15 16:35:16
diffout2 2010-03-18 10:42:17
protdates.txt 2010-01-26 17:11:49
protdates.txt 2010-01-26 17:07:05
Protocols that will not be transferred.doc 2010-02-12 16:53:06 Grama, Lakshmi (NIH/NCI) [E]
Request4721.log 2009-12-28 20:30:32
Request4721.sql 2010-02-16 23:53:31
Request4721a.log 2010-02-05 02:21:09
