CDR Tickets

Issue Number 3394
Summary [Internal] Create Re-Verification Tool
Created 2011-07-20 14:12:02
Issue Type Improvement
Submitted By Englisch, Volker (NIH/NCI) [C]
Assigned To Englisch, Volker (NIH/NCI) [C]
Status Closed
Resolved 2011-11-11 18:31:20
Resolution Fixed
Path /home/bkline/backups/jira/ocecdr/issue.107722
Description

BZISSUE::5087
BZDATETIME::2011-07-20 14:12:02
BZCREATOR::Volker Englisch
BZASSIGNEE::Volker Englisch
BZQACONTACT::Alan Meyer

Occasionally we submit a publishing job to Gatekeeper that will not be processed immediately or will not be pushed all the way to the Live site (i.e. if we want to preview the data before it goes live).
Because our publishing process marks a push job as 'Failed' when the documents have not completed the publishing process to the Live site within 8 hours, a job is occasionally marked as 'Failed' even though it eventually completes publishing successfully.
For these types of 'preview-first' jobs, we would like to be able to re-verify a publishing job. This would be a job that could be run manually after the developers have identified that a publishing job finished so that the documents will be properly marked as successful publications within the CDR.

Comment entered 2011-09-20 15:38:59 by Englisch, Volker (NIH/NCI) [C]

BZDATETIME::2011-09-20 15:38:59
BZCOMMENTOR::Volker Englisch
BZCOMMENT::1

I'm in the process of writing this re-verification tool and I'd like to get your opinions about something.

The general idea is the following:
a) A publishing job is pushed to GK but doesn't complete within
8 hours. Maybe we stopped processing at the preview site for review, which
causes the status of the job in pub_proc to be set to 'Stalled'.
The tool will, by default, look for all stalled jobs, re-verify them, and
update the failure status of the documents and/or the status of the
push job.

b) A publishing job pushed the documents to GK and a few documents, due to
a bug on Gatekeeper, failed processing. Our push job has been set to
Success, because not all of the documents failed.
We will be able to supply the job status and Job-ID to re-verify just this
one specific job. After the Gatekeeper bug has been fixed and the
documents have been reprocessed we should be able to re-verify.
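
The two cases above could be sketched roughly as follows. This is only an illustration of the selection logic, not the code in ReverifyPushJob.py: the PushJob class, the select_jobs function, and the job IDs in the usage note are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PushJob:
    """Hypothetical stand-in for a row of the pub_proc table."""
    job_id: int
    status: str  # e.g. 'Stalled' or 'Success'


def select_jobs(jobs: List[PushJob], status: str = "Stalled",
                job_id: Optional[int] = None) -> List[PushJob]:
    """Return the jobs that would be re-verified.

    Case (a): no job_id given, so every job with the requested status
    (by default 'Stalled') is selected.
    Case (b): a job_id is given, so only that job is selected, and only
    if its status matches the one supplied.
    """
    selected = [job for job in jobs if job.status == status]
    if job_id is not None:
        selected = [job for job in selected if job.job_id == job_id]
    return selected
```

For example, given jobs 1 ('Success'), 2 ('Stalled') and 3 ('Stalled'), select_jobs(jobs) picks jobs 2 and 3, while select_jobs(jobs, "Success", 1) picks only job 1.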

My question is this:
The pub_proc table has a 'Completed' column. Do you think its time stamp should be updated when the re-verification tool runs, or should some other information (perhaps an addition to the messages field) indicate that the re-verification tool ran on a particular job?

Comment entered 2011-09-22 18:15:19 by Englisch, Volker (NIH/NCI) [C]

BZDATETIME::2011-09-22 18:15:19
BZCOMMENTOR::Volker Englisch
BZCOMMENT::2

I created this tool and tested it successfully on MAHLER and FRANCK.
ReverifyPushJob.py - 10204

Testing the tool may be a little difficult because the push jobs on Gatekeeper and in the pub_proc table must be set up in just the right combination for the tool to pick up anything. Maybe it would be best if I sit with you, Alan, to guide you through the testing.

The program is started from the command prompt either with
ReverifyPushJob.py --runmode --user=uid --passwd=pwd
or
ReverifyPushJob.py --runmode --user=uid --passwd=pwd --status=stat --jobid=12

where --runmode stands for either --testmode or --livemode.
In the first form the tool only looks at push jobs in the pub_proc table with a status of 'Stalled'. In the second form, one can re-verify jobs that finished processing on Gatekeeper with errors.
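
As a rough illustration of these options, the command line could be parsed along the following lines. This is a sketch using argparse, not the parsing code actually used in ReverifyPushJob.py; the parse_args function, the "runmode" attribute name, and the choice of 'Stalled' as the default for --status are assumptions based on the description above.

```python
import argparse


def parse_args(argv=None):
    """Parse the options described above (hypothetical sketch)."""
    parser = argparse.ArgumentParser(prog="ReverifyPushJob.py")

    # Exactly one of --testmode / --livemode must be given.
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument("--testmode", dest="runmode",
                      action="store_const", const="test")
    mode.add_argument("--livemode", dest="runmode",
                      action="store_const", const="live")

    parser.add_argument("--user", required=True)
    parser.add_argument("--passwd", required=True)

    # Assumed default: without --status, only 'Stalled' jobs are examined.
    parser.add_argument("--status", default="Stalled")
    # Optional: restrict re-verification to one specific push job.
    parser.add_argument("--jobid", type=int, default=None)

    return parser.parse_args(argv)
```

Calling parse_args(["--testmode", "--user=uid", "--passwd=pwd"]) corresponds to the first form; adding "--status=stat" and "--jobid=12" corresponds to the second.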

A sample of the output/changes or the messages on FRANCK can be seen at
http://franck.nci.nih.gov/cgi-bin/cdr/PubStatus.py?id=8925

This is ready for testing on MAHLER or FRANCK.

Comment entered 2011-10-07 13:33:53 by Englisch, Volker (NIH/NCI) [C]

BZDATETIME::2011-10-07 13:33:53
BZCOMMENTOR::Volker Englisch
BZCOMMENT::3

These are the documents that will need to be re-verified:
select *
from pub_proc_doc
where pub_proc = 9057
and failure is not null
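
The same lookup could be run from a script. The sketch below uses an in-memory SQLite database purely as a stand-in for the CDR database; the doc_id column, the sample rows, and the failed_docs helper are made up for illustration, and only the pub_proc and failure columns come from the query above.

```python
import sqlite3

# In-memory stand-in for the CDR database (assumption, for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pub_proc_doc (pub_proc INTEGER, doc_id INTEGER, failure TEXT)"
)
# Made-up sample rows: job 1 has two failed documents, job 2 has one.
conn.executemany(
    "INSERT INTO pub_proc_doc VALUES (?, ?, ?)",
    [(1, 100, None), (1, 101, "Y"), (1, 102, "Y"), (2, 200, "Y")],
)


def failed_docs(conn, job_id):
    """Return the IDs of documents in a push job still flagged as failed."""
    rows = conn.execute(
        "SELECT doc_id FROM pub_proc_doc "
        "WHERE pub_proc = ? AND failure IS NOT NULL "
        "ORDER BY doc_id",
        (job_id,),
    )
    return [row[0] for row in rows]
```

Passing the job ID as a bound parameter rather than formatting it into the SQL string keeps the query reusable for any job.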

Comment entered 2011-10-11 11:42:04 by Englisch, Volker (NIH/NCI) [C]

BZDATETIME::2011-10-11 11:42:04
BZCOMMENTOR::Volker Englisch
BZCOMMENT::4

Just look how lucky we are: here is another job that "failed" over the weekend and needs to be re-verified.

More than 8 hours have elapsed since completion of the push of CDR
documents for publishing job 9079, and loading of the documents
has still not completed.

Comment entered 2011-11-10 09:58:16 by Englisch, Volker (NIH/NCI) [C]

BZDATETIME::2011-11-10 09:58:16
BZCOMMENTOR::Volker Englisch
BZCOMMENT::5

There is a job in production that needs to be re-verified, so this issue should go to production soon before we forget.
I'm increasing the priority.

Comment entered 2011-11-10 18:13:29 by Englisch, Volker (NIH/NCI) [C]

BZDATETIME::2011-11-10 18:13:29
BZCOMMENTOR::Volker Englisch
BZCOMMENT::6

Alan and I tested the tool on FRANCK successfully. The program is now versioned as
ReverifyPushJob.py - R10204
and it is located in the directory
\cdr\publishing

Comment entered 2011-11-10 18:46:48 by alan

BZDATETIME::2011-11-10 18:46:48
BZCOMMENTOR::Alan Meyer
BZCOMMENT::7

As Volker said, we verified this. I asked all the questions I could think of and Volker never once had to say "er, uh, um, oops." So I'm marking it verified fixed.

Comment entered 2011-11-11 15:54:13 by Englisch, Volker (NIH/NCI) [C]

BZDATETIME::2011-11-11 15:54:13
BZCOMMENTOR::Volker Englisch
BZCOMMENT::8

I've copied the program to BACH and run it to re-verify the push jobs that I had identified as having failed documents.

We can now close this issue.

Comment entered 2011-11-11 18:31:20 by alan

BZDATETIME::2011-11-11 18:31:20
BZCOMMENTOR::Alan Meyer
BZCOMMENT::9

Closing
