Issue Number | 3322 |
---|---|
Summary | Documents on Cancer.gov not accounted for |
Created | 2011-03-15 13:31:57 |
Issue Type | Improvement |
Submitted By | Englisch, Volker (NIH/NCI) [C] |
Assigned To | Englisch, Volker (NIH/NCI) [C] |
Status | Closed |
Resolved | 2011-04-21 16:38:40 |
Resolution | Fixed |
Path | /home/bkline/backups/jira/ocecdr/issue.107650 |
BZISSUE::5015
BZDATETIME::2011-03-15 13:31:57
BZCREATOR::Volker Englisch
BZASSIGNEE::Volker Englisch
BZQACONTACT::Alan Meyer
Before we moved to Percussion, the WCM team frequently requested a summary and drug info summary data load from FRANCK to test the Gatekeeper loading procedures.
Right before the switch to Percussion, a new test load was requested, to be
submitted from FRANCK with the most current data possible. After the data
had been loaded to their test server and everything had been tested, the
WCM team used that same data for production. In effect, the publishing
information for the documents that had been updated/submitted to
Cancer.gov was registered on FRANCK and not on BACH.
As a result, I found a document that had been published to Cancer.gov
for which we don't have an entry for the publishing event in our
database.
We will need to identify how many documents are on Cancer.gov but have no record of the publishing event in our CDR database.
BZDATETIME::2011-03-16 14:51:33
BZCOMMENTOR::Volker Englisch
BZCOMMENT::1
Bob, I have a question for you:
We have this tool that reports the status of a CDR document on
Gatekeeper. The tool comes in three falvors:
a) submit a CDR-ID and report the existence on Staging, Preview, or
Live
b) submit a Job-ID and report the existence of all documents within that
push job on Staging, Preview, or Live
c) Report the status of all documents on Gatekeeper
If you supply a CDR-ID for a document that hasn't been published yet
(or has been removed) the message 'Not Present' appears for the
individual stages. This makes sense.
If I, however, submit the full status request to Gatekeeper it reports
back CDR-IDs with a status of 'Not Present' as well. I don't understand
this and was hoping you might remember the reason for this.
If I'm asking: Show me all the IDs that you have from us. I wouldn't
expect the answer to be: Here are all the IDs you gave me but the IDs
from this subset I do not have.
Would this be a question for Blair?
BZDATETIME::2011-03-16 14:52:39
BZCOMMENTOR::Volker Englisch
BZCOMMENT::2
(In reply to comment #1)
> The tool comes in three falvors:
Correction:
falvors ==> flavors
BZDATETIME::2011-03-16 18:28:16
BZCOMMENTOR::Volker Englisch
BZCOMMENT::6
FYI:
I've talked to Blair about this. His report does list all documents we
have ever submitted, and for those that have been removed, Gatekeeper's
response for all three stages is 'Not Present'.
I've adjusted my report accordingly, since those documents should not be listed as errors.
BZDATETIME::2011-03-16 22:45:50
BZCOMMENTOR::Bob Kline
BZCOMMENT::7
(In reply to comment #1)
> If I, however, submit the full status request to Gatekeeper it reports back
> CDR-IDs with a status of 'Not Present' as well. I don't understand this and
> was hoping you might remember the reason for this.
> If I'm asking: Show me all the IDs that you have from us. I wouldn't expect
> the answer to be: Here are all the IDs you gave me but the IDs from this
> subset I do not have.
>
> Would this be a question for Blair?
By all means consult with Blair to verify, but here's what the latest version of the "Gatekeeper Communication Protocol" document says about the "DocumentLocation" flavor of the status request command, which looks like what you're trying to use:
================================= START QUOTE ===============================
2.2.3
RequestStatusResult format for statusType = "DocumentLocation"
If RequestStatus is called with statusType = 'DocumentLocation', it
will return a RequestStatusResult listing all documents in the system
and the CDR job IDs currently in place on the Gatekeeper, Preview and
Live systems.
Note: The requestID parameter is not present for DocumentLocation reports.
RequestStatusResult = '<Response>
<docCount>docCount</docCount>
<detailedMessage>detailedMessage</detailedMessage>
</Response>'
Parameter description:
o docCount represents the total number of documents in the system.
o detailedMessage contains additional information about the status in
the format shown below.
<documentList>
<document cdrid="cdrID1" gatekeeper="jobID" gatekeeperDateTime="date/time"
preview="jobID" previewDateTime="date/time"
live="jobID" liveDateTime="date/time" />
<document cdrid="cdrID2" gatekeeper="jobID" gatekeeperDateTime="date/time"
preview="jobID" previewDateTime="date/time"
live="jobID" liveDateTime="date/time" />
:
<document cdrid="cdrIDn" gatekeeper="jobID" gatekeeperDateTime="date/time"
preview="jobID" previewDateTime="date/time"
live="jobID" liveDateTime="date/time" />
</documentList>
cdrIDx is a CDR document ID.
jobID is the CDR job id that contained the version of the document which
currently resides at that level. (E.g. gatekeeper and preview might
contain the version from job 20060502, but live would contain the older
20060301.)
date/time is the date and time the document was promoted to a given
level, reported in the format yyyy-mm-ddThh:mi:ss.mmm (no spaces).
In the event that a document has never been promoted to a given
level, both the jobID and date/time contains the text "Not Present."
In the event that a document has been deleted from a level, the jobID
contains the text "Not Present" and the date/time contains the date and
time of the deletion.
================================== END QUOTE ================================
So it would seem that what you're seeing reflects the condition in which a document was sent to the Gatekeeper but either didn't make it successfully to any of the levels because of processing or data errors, or was subsequently deleted from all levels for some reason. I don't know what circumstances would warrant the deletion of a document from a level. At any rate, for the purpose of determining what should be in the pub_proc_cg table, it would seem that you could ignore any documents which are not reported as being on the "live" level. That would only reflect a temporary state of the systems, though, since it's always possible that they could nudge a document that's sitting in one of the earlier stages to the "live" stage without any action on our part. We'd probably want to perform this exercise periodically.
Again, confirming all this with Blair is probably the prudent thing to do.
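A minimal sketch of how the rules quoted above could be applied when reading a DocumentLocation report, assuming the documentList arrives as well-formed XML with plain quotes. The function names and the sample IDs, job numbers and timestamps are hypothetical and are not taken from the actual report code; treating "Not Present" and "Not Present." as the same value is also an assumption.

```python
import xml.etree.ElementTree as ET

LEVELS = ("gatekeeper", "preview", "live")


def is_not_present(value):
    # Assumption: accept the text with or without a trailing period, since
    # the quoted section leaves it unclear whether the period is included.
    return value is None or value.strip().rstrip(".") == "Not Present"


def level_state(job_id, date_time):
    """Classify one level following the quoted section 2.2.3."""
    if not is_not_present(job_id):
        return "present"
    if is_not_present(date_time):
        return "never promoted"
    # jobID is "Not Present" but date/time holds the deletion time.
    return "deleted"


def parse_document_list(xml_text):
    """Map each cdrid to its state on the three levels."""
    root = ET.fromstring(xml_text)
    states = {}
    for node in root.iter("document"):
        states[node.get("cdrid")] = {
            level: level_state(node.get(level), node.get(level + "DateTime"))
            for level in LEVELS
        }
    return states


def live_ids(states):
    """IDs that are currently on the live level."""
    return {cdr_id for cdr_id, levels in states.items()
            if levels["live"] == "present"}


# Hypothetical sample data, for illustration only.
SAMPLE = """<documentList>
  <document cdrid="101" gatekeeper="9001" gatekeeperDateTime="2011-03-01T10:00:00.000"
            preview="9001" previewDateTime="2011-03-01T10:05:00.000"
            live="9001" liveDateTime="2011-03-01T10:10:00.000" />
  <document cdrid="102" gatekeeper="Not Present" gatekeeperDateTime="Not Present"
            preview="Not Present" previewDateTime="Not Present"
            live="Not Present" liveDateTime="Not Present" />
  <document cdrid="103" gatekeeper="9002" gatekeeperDateTime="2011-03-02T09:00:00.000"
            preview="9002" previewDateTime="2011-03-02T09:05:00.000"
            live="Not Present" liveDateTime="2011-03-03T08:00:00.000" />
</documentList>"""

if __name__ == "__main__":
    states = parse_document_list(SAMPLE)
    print(sorted(live_ids(states)))
```

Only the documents returned by live_ids() would then be candidates for the comparison against pub_proc_cg; documents that were never promoted or were deleted from all levels would be skipped rather than reported as errors.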
BZDATETIME::2011-04-20 14:51:29
BZCOMMENTOR::Volker Englisch
BZCOMMENT::8
I've updated the GatekeeperStatus.py report to include a test that
checks whether a document reported by Gatekeeper is also recorded in
the pub_proc_cg table in the CDR.
I also made minor changes to set the proper Gatekeeper server and to
improve the readability of the help section. In addition, an option has
been added to suppress the display of documents that are OK, so that
only documents with errors are displayed.
An option has been added to the publishing reports menu.
The following programs have been copied to FRANCK and BACH:
GateKeeperStatus.py - R10068
PublishReports.py - R10067
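The check described in this comment might look roughly like the sketch below. It is not the code of the report itself: the function name, the status strings and the sample IDs are hypothetical, and in practice the recorded IDs would come from a query against the pub_proc_cg table, which is not shown here.

```python
def report_rows(gatekeeper_ids, recorded_ids, errors_only=False):
    """Pair each ID reported by Gatekeeper with a status.

    gatekeeper_ids: document IDs Gatekeeper reports as present.
    recorded_ids:   document IDs recorded in pub_proc_cg (assumed input).
    errors_only:    when True, rows with status "OK" are suppressed.
    """
    recorded = set(recorded_ids)
    rows = []
    for doc_id in sorted(set(gatekeeper_ids)):
        status = "OK" if doc_id in recorded else "not in pub_proc_cg"
        if errors_only and status == "OK":
            continue
        rows.append((doc_id, status))
    return rows


if __name__ == "__main__":
    # Hypothetical IDs, for illustration only.
    for doc_id, status in report_rows([3, 1, 2], [1, 3], errors_only=True):
        print(doc_id, status)
```

Using a set for the recorded IDs keeps the membership test cheap even when the full Gatekeeper status report covers every document ever submitted.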
BZDATETIME::2011-04-21 16:38:40
BZCOMMENTOR::Volker Englisch
BZCOMMENT::9
As discussed at today's status meeting I'm closing this issue myself.