CDR Tickets

Issue Number 3322
Summary Documents on Cancer.gov not accounted for
Created 2011-03-15 13:31:57
Issue Type Improvement
Submitted By Englisch, Volker (NIH/NCI) [C]
Assigned To Englisch, Volker (NIH/NCI) [C]
Status Closed
Resolved 2011-04-21 16:38:40
Resolution Fixed
Path /home/bkline/backups/jira/ocecdr/issue.107650
Description

BZISSUE::5015
BZDATETIME::2011-03-15 13:31:57
BZCREATOR::Volker Englisch
BZASSIGNEE::Volker Englisch
BZQACONTACT::Alan Meyer

Before we moved to Percussion the WCM team frequently requested a summary and drug info summary data load from FRANCK to test the Gatekeeper loading procedures.

Right before the switch to Percussion, a new test load was requested, to be submitted from FRANCK with the most current data possible. After the data had been loaded to their test server and everything had been tested, the WCM team used that same data for production. In effect, the information about the documents that had been updated/submitted to Cancer.gov was registered on FRANCK and not on BACH.
As a result, I found a document that had been published to Cancer.gov for which we have no entry for the publishing event in our database.

We will need to identify how many documents are affected that are on Cancer.gov but have no record of the publishing event in our CDR database.

Comment entered 2011-03-16 14:51:33 by Englisch, Volker (NIH/NCI) [C]

BZDATETIME::2011-03-16 14:51:33
BZCOMMENTOR::Volker Englisch
BZCOMMENT::1

Bob, I have a question for you:

We have this tool that reports the status of a CDR document on Gatekeeper. The tool comes in three falvors:
a) submit a CDR-ID and report the existence on Staging, Preview, or Live
b) submit a Job-ID and report the existence of all documents within that push job on Staging, Preview, or Live
c) Report the status of all documents on Gatekeeper

If you supply a CDR-ID for a document that hasn't been published yet (or has been removed) the message 'Not Present' appears for the individual stages. This makes sense.
If I, however, submit the full status request to Gatekeeper it reports back CDR-IDs with a status of 'Not Present' as well. I don't understand this and was hoping you might remember the reason for this.
If I'm asking: Show me all the IDs that you have from us. I wouldn't expect the answer to be: Here are all the IDs you gave me but the IDs from this subset I do not have.

Would this be a question for Blair?

Comment entered 2011-03-16 14:52:39 by Englisch, Volker (NIH/NCI) [C]

BZDATETIME::2011-03-16 14:52:39
BZCOMMENTOR::Volker Englisch
BZCOMMENT::2

(In reply to comment #1)
> The tool comes in three falvors:

Correction:
falvors ==> flavors

Comment entered 2011-03-16 15:29:15 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2011-03-16 15:29:15
BZCOMMENTOR::Bob Kline
BZCOMMENT::3

(In reply to comment #1)

> If I, however, submit the full status request to Gatekeeper it reports back
> CDR-IDs with a status of 'Not Present' as well. I don't understand this and
> was hoping you might remember the reason for this.
> If I'm asking: Show me all the IDs that you have from us. I wouldn't expect
> the answer to be: Here are all the IDs you gave me but the IDs from this
> subset I do not have.
>
> Would this be a question for Blair?

By all means consult with Blair to verify, but here's what the latest version of the "Gatekeeper Communication Protocol" document says about the "DocumentLocation" flavor of the status request command, which looks like what you're trying to use:

================================= START QUOTE ===============================

2.2.3

RequestStatusResult format for statusType = “DocumentLocation”

If RequestStatus is called with statusType = ‘DocumentLocation’, it will return a RequestStatusResult listing all
documents in the system and the CDR job IDs currently in place on the Gatekeeper, Preview and Live systems.

Note: The requestID parameter is not present for DocumentLocation reports.

RequestStatusResult = ‘<Response>
<docCount>docCount</docCount >
<detailedMessage>detailedMessage</detailedMessage>
</Response>’

Parameter description:
o docCount represents the total number of documents in the system.
o detailedMessage contains additional information about the status in the format shown below.

<documentList>
<document cdrid=“cdrID1” gatekeeper=“jobID” gatekeeperDateTime=“date/time”
preview=“jobID” previewDateTime=“date/time” live=“jobID” liveDateTime=“date/time”
/>
<document cdrid=“cdrID2” gatekeeper=“jobID” gatekeeperDateTime=“date/time”
preview=“jobID” previewDateTime=“date/time” live=“jobID” liveDateTime=“date/time” />
:
<document cdrid=“cdrIDn” gatekeeper=“jobID” gatekeeperDateTime=“date/time”
preview=“jobID” previewDateTime=“date/time” live=“jobID” liveDateTime=“date%

Comment entered 2011-03-16 18:28:16 by Englisch, Volker (NIH/NCI) [C]

BZDATETIME::2011-03-16 18:28:16
BZCOMMENTOR::Volker Englisch
BZCOMMENT::6

FYI:
I've talked to Blair about this. His report does list all documents we have ever submitted, and for those that have been removed, Gatekeeper's response for all three stages is 'Not Present'.

I've adjusted my report accordingly since those documents should not be listed as errors.

Comment entered 2011-03-16 22:45:50 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2011-03-16 22:45:50
BZCOMMENTOR::Bob Kline
BZCOMMENT::7

(In reply to comment #1)

> If I, however, submit the full status request to Gatekeeper it reports back
> CDR-IDs with a status of 'Not Present' as well. I don't understand this and
> was hoping you might remember the reason for this.
> If I'm asking: Show me all the IDs that you have from us. I wouldn't expect
> the answer to be: Here are all the IDs you gave me but the IDs from this
> subset I do not have.
>
> Would this be a question for Blair?

By all means consult with Blair to verify, but here's what the latest version of the "Gatekeeper Communication Protocol" document says about the "DocumentLocation" flavor of the status request command, which looks like what you're trying to use:

================================= START QUOTE ===============================

2.2.3

RequestStatusResult format for statusType = “DocumentLocation”

If RequestStatus is called with statusType = ‘DocumentLocation’, it will return a RequestStatusResult listing all
documents in the system and the CDR job IDs currently in place on the Gatekeeper, Preview and Live systems.

Note: The requestID parameter is not present for DocumentLocation reports.

RequestStatusResult = ‘<Response>
<docCount>docCount</docCount >
<detailedMessage>detailedMessage</detailedMessage>
</Response>’

Parameter description:
o docCount represents the total number of documents in the system.
o detailedMessage contains additional information about the status in the format shown below.

<documentList>
<document cdrid=“cdrID1” gatekeeper=“jobID” gatekeeperDateTime=“date/time”
preview=“jobID” previewDateTime=“date/time” live=“jobID” liveDateTime=“date/time”
/>
<document cdrid=“cdrID2” gatekeeper=“jobID” gatekeeperDateTime=“date/time”
preview=“jobID” previewDateTime=“date/time” live=“jobID” liveDateTime=“date/time” />
:
<document cdrid=“cdrIDn” gatekeeper=“jobID” gatekeeperDateTime=“date/time”
preview=“jobID” previewDateTime=“date/time” live=“jobID” liveDateTime=“date/time” />
</documentList>

cdrIDx is a CDR document ID.
jobID is the CDR job id that contained the version of the document which currently resides at that level.
(E.g. gatekeeper and preview might contain the version from job 20060502, but live would contain the older
20060301.)
date/time is the date and time the document was promoted to a given level, reported in the format
yyyy-mm-ddThh:mi:ss.mmm (no spaces).

In the event that a document has never been promoted to a given level, both the jobID and date/time contains
the text “Not Present.”

In the event that a document has been deleted from a level, the jobID contains the text “Not Present” and the
date/time contains the date and time of the deletion.

================================== END QUOTE ================================

So it would seem that what you're seeing would reflect the condition in which a document was sent to the Gatekeeper but either didn't make it successfully to any of the levels because of processing or data errors, or was subsequently deleted from all levels for some reason. I don't know what circumstances would warrant the deletion of a document from a level. At any rate, it would seem that for the purpose of determining what should be in the pub_proc_cg table you could ignore any documents which are not reported as being on the "live" level. That would only reflect a temporary state of the systems, though, since it's always possible that they could nudge a document that's sitting in one of the earlier stages to the "live" stage without any action on our part, so we'd probably want to perform this exercise periodically.

Again, confirming all this with Blair is probably the prudent thing to do.
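
To make the three cases in the quoted section concrete (present, never promoted, deleted), here is a minimal Python sketch of how a DocumentLocation report could be interpreted. It assumes the literal marker is "Not Present" without the trailing period, and that the documentList XML has already been extracted from detailedMessage. The names LevelState, parse_document_locations and live_ids, as well as the CDR IDs and job IDs in SAMPLE, are made up for illustration and are not taken from the Gatekeeper code or from GatekeeperStatus.py.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

NOT_PRESENT = "Not Present"  # assumed literal marker, without trailing period
LEVELS = ("gatekeeper", "preview", "live")


@dataclass
class LevelState:
    job_id: str     # CDR job ID, or "Not Present"
    date_time: str  # promotion/deletion timestamp, or "Not Present"

    @property
    def status(self):
        # A job ID means the document currently resides at this level.
        if self.job_id != NOT_PRESENT:
            return "present"
        # No job ID and no timestamp: never promoted to this level.
        if self.date_time == NOT_PRESENT:
            return "never promoted"
        # No job ID but a timestamp: deleted from this level at that time.
        return "deleted"


def parse_document_locations(xml_text):
    """Map each CDR ID to its per-level state."""
    root = ET.fromstring(xml_text)
    docs = {}
    for node in root.iter("document"):
        docs[node.get("cdrid")] = {
            level: LevelState(node.get(level, NOT_PRESENT),
                              node.get(level + "DateTime", NOT_PRESENT))
            for level in LEVELS
        }
    return docs


def live_ids(docs):
    """CDR IDs that currently have a version on the live level."""
    return sorted(cdr_id for cdr_id, levels in docs.items()
                  if levels["live"].status == "present")


# Hypothetical sample data; IDs, job IDs and timestamps are invented.
SAMPLE = """<documentList>
<document cdrid="100001" gatekeeper="7001" gatekeeperDateTime="2011-03-01T10:00:00.000"
 preview="7001" previewDateTime="2011-03-01T10:05:00.000"
 live="7001" liveDateTime="2011-03-01T10:10:00.000" />
<document cdrid="100002" gatekeeper="Not Present" gatekeeperDateTime="Not Present"
 preview="Not Present" previewDateTime="Not Present"
 live="Not Present" liveDateTime="Not Present" />
<document cdrid="100003" gatekeeper="7002" gatekeeperDateTime="2011-03-02T08:00:00.000"
 preview="7002" previewDateTime="2011-03-02T08:05:00.000"
 live="Not Present" liveDateTime="2011-02-01T09:30:00.000" />
</documentList>"""

docs = parse_document_locations(SAMPLE)
for cdr_id in sorted(docs):
    print(cdr_id, {level: docs[cdr_id][level].status for level in LEVELS})
print("live:", live_ids(docs))
```

With this reading, only the IDs returned by live_ids would be candidates for comparison against pub_proc_cg; documents that are 'Not Present' on every level would be skipped rather than flagged.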

Comment entered 2011-04-20 14:51:29 by Englisch, Volker (NIH/NCI) [C]

BZDATETIME::2011-04-20 14:51:29
BZCOMMENTOR::Volker Englisch
BZCOMMENT::8

I've updated the GatekeeperStatus.py report to include a test identifying whether a document reported by Gatekeeper is also recorded in the CDR in the pub_proc_cg table.
I made minor changes to set the proper Gatekeeper server and to improve the readability of the help section. Also, an option has been added to suppress the display of documents that are OK, meaning that only documents with errors will be displayed.
An option has been added to the publishing reports menu.

The following programs have been copied to FRANCK and BACH:
GateKeeperStatus.py - R10068
PublishReports.py - R10067
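
For illustration only, the kind of check described above can be sketched in Python as follows. This is not the code in GatekeeperStatus.py; the function name check_against_cdr, the errors_only flag and the sample IDs are hypothetical, and the sketch assumes the two ID lists (documents live on Gatekeeper, documents recorded in pub_proc_cg) have already been collected.

```python
def check_against_cdr(live_on_gatekeeper, recorded_in_cdr, errors_only=False):
    """Report whether each document live on Gatekeeper is recorded in the CDR.

    live_on_gatekeeper: CDR IDs Gatekeeper reports on the live level.
    recorded_in_cdr:    CDR IDs found in the pub_proc_cg table.
    errors_only:        if True, suppress documents that are OK.
    """
    recorded = set(recorded_in_cdr)
    rows = []
    for cdr_id in sorted(set(live_on_gatekeeper)):
        ok = cdr_id in recorded
        if ok and errors_only:
            continue
        rows.append((cdr_id, "OK" if ok else "not in pub_proc_cg"))
    return rows


# Hypothetical IDs for demonstration.
gatekeeper_live = [100001, 100004, 100007]
cdr_recorded = [100001, 100007, 100009]

for cdr_id, status in check_against_cdr(gatekeeper_live, cdr_recorded,
                                        errors_only=True):
    print(cdr_id, status)
```

An ID that appears only in the second list (100009 in this sample) is deliberately not reported here, since the question in this issue is which documents are on Cancer.gov without a publishing record, not the reverse.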

Comment entered 2011-04-21 16:38:40 by Englisch, Volker (NIH/NCI) [C]

BZDATETIME::2011-04-21 16:38:40
BZCOMMENTOR::Volker Englisch
BZCOMMENT::9

As discussed at today's status meeting I'm closing this issue myself.
