CDR Tickets

Issue Number 4130
Summary Failure verifying Push Job (on lower tiers)
Created 2016-07-07 18:42:54
Issue Type Bug
Submitted By Englisch, Volker (NIH/NCI) [C]
Assigned To Englisch, Volker (NIH/NCI) [C]
Status Closed
Resolved 2018-07-06 11:41:12
Resolution Won't Fix
Path /home/bkline/backups/jira/ocecdr/issue.187729
Description

I'm seeing frequent publishing failures on the lower tiers with a message that the previous job is still in process.
This happens because the previous job does not get updated during the verification process, so it is still reported with a status of 'Verifying' in the pub_proc database table instead of being updated to a status of 'Success'.
This problem started about 2 months ago, and it appears that the process is losing network connectivity, likely at the GK end; the verification process then continues until the status for that particular job is manually set to 'Failure'.
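
The blocking condition described above can be sketched roughly as follows. This is only an illustration, not the CDR publishing code: the Job record and the blocking_jobs helper are hypothetical, the job IDs are made up, and treating every status other than 'Success' or 'Failure' as "still in process" is an assumption. Only the table name pub_proc and the three status values come from this ticket.

```python
from dataclasses import dataclass

# Statuses named in this ticket that mean a job has finished.
# Treating every other status as "still in process" is an assumption.
FINISHED = {"Success", "Failure"}


@dataclass
class Job:
    """Hypothetical stand-in for one row of the pub_proc table."""
    job_id: int
    status: str


def blocking_jobs(jobs):
    """Return the jobs that would cause a new push job to report
    that the previous job is still in process."""
    return [job for job in jobs if job.status not in FINISHED]


# Made-up job IDs: job 102 never left the 'Verifying' state.
jobs = [Job(101, "Success"), Job(102, "Verifying")]
stuck = blocking_jobs(jobs)

# Manual cleanup as described in the ticket: mark the stuck job as failed.
for job in stuck:
    job.status = "Failure"
remaining = blocking_jobs(jobs)
```

After the manual update no job is left in an unfinished state, so the next push job would no longer be blocked under these assumptions.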

Comment entered 2016-07-12 18:14:22 by Englisch, Volker (NIH/NCI) [C]

Tracking Failures:

  • 07/11 - STAGE

Comment entered 2016-07-13 11:37:20 by Englisch, Volker (NIH/NCI) [C]

The problem has been identified and a ticket created on the GK side.

Comment entered 2016-07-13 11:43:43 by Englisch, Volker (NIH/NCI) [C]

I probably forgot, but this problem had already been identified as a GK issue that turned out to be a result of OCECDR-3910. It is now showing up more frequently because the lower tiers of the GK server have been refreshed.

A temporary solution - at least for DEV and QA - is to reset the Job-ID in the pub_proc table using the following command:

 DBCC CHECKIDENT (pub_proc, RESEED, NNNN)

where NNNN should be larger than the latest Job-ID on gatekeeper with the Source entry of CDR-PROD.
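
As a rough sketch of how NNNN might be chosen, the helper below picks the largest gatekeeper Job-ID whose source is CDR-PROD and adds a safety margin before building the statement. The function name, the (job_id, source) input shape, the margin of 100, the sample IDs, and the second source name are all made up for illustration; only the DBCC CHECKIDENT statement and the CDR-PROD source come from this ticket.

```python
def build_reseed_statement(gk_jobs, source="CDR-PROD", margin=100):
    """Build the DBCC CHECKIDENT statement for pub_proc.

    gk_jobs is an iterable of (job_id, source) pairs as they might be
    read from gatekeeper (hypothetical input shape). margin is an
    arbitrary gap so the new seed is safely above the latest Job-ID.
    """
    prod_ids = [job_id for job_id, src in gk_jobs if src == source]
    if not prod_ids:
        raise ValueError(f"no gatekeeper jobs found for source {source!r}")
    new_seed = max(prod_ids) + margin
    return f"DBCC CHECKIDENT (pub_proc, RESEED, {new_seed})"


# Made-up gatekeeper records; "CDR-QA" is an invented source name.
statement = build_reseed_statement(
    [(14800, "CDR-PROD"), (15020, "CDR-PROD"), (9000, "CDR-QA")]
)
print(statement)
```

The generated statement would still need to be run by someone with sufficient permissions on the target database.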

Comment entered 2016-11-09 14:43:15 by Kline, Bob (NIH/NCI) [C]

Story points include:

  • nagging WCMS team to pick up ticket WCMSGK-47

  • manual cleanup of the data while waiting for that to happen

Comment entered 2017-03-30 14:00:19 by Englisch, Volker (NIH/NCI) [C]

This ticket is related to OCECDR-4215. If we're able to implement that ticket (creating a web interface to reset the Job-ID counter) we will be able to close this ticket.
I will have to find out whether we have the permissions needed to update the Job-ID counter on STAGE in order to make this problem disappear.

Comment entered 2017-08-09 09:14:42 by Kline, Bob (NIH/NCI) [C]

: see comment on OCECDR-4215.

Comment entered 2018-07-06 11:20:32 by Kline, Bob (NIH/NCI) [C]

: please close this ticket, unless there's a reason to keep it open (in which case please explain).

Comment entered 2018-07-06 11:41:12 by Englisch, Volker (NIH/NCI) [C]

The status "Resolve as Won't Fix Issue" isn't exactly right because we've implemented a workaround.
Closing ticket.
