CDR Tickets

Issue Number 3895
Summary Publishing Error on Lower Tiers after Database Refresh
Created 2015-04-16 18:32:55
Issue Type Improvement
Submitted By Englisch, Volker (NIH/NCI) [C]
Assigned To Kline, Bob (NIH/NCI) [C]
Status Closed
Resolved 2016-03-23 14:13:20
Resolution Fixed
Path /home/bkline/backups/jira/ocecdr/issue.159104
Description

When the CDR database on a lower tier is refreshed, a publishing push job can fail because documents or directories from a publishing job generated before the refresh still exist. The resulting error message is not very descriptive; improving this would make it easier in the future either to prevent the problem or to help the developer identify the source of the publishing failure.
Several options are available:
a) Prior to creating the output directory 'Job12345.InProcess', ensure that the directory 'Job12345' doesn't already exist, and exit if it does.
b) In case the directory 'Job12345' already exists, delete it.
c) Create a more descriptive error message.
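
Options (a) and (c) together amount to a pre-flight check that stops the job with a message naming the likely cause. A minimal sketch, assuming the job's output path is known before the job starts; `check_output_dir` and `StaleOutputDirectoryError` are hypothetical names, not part of the CDR publishing code:

```python
import os


class StaleOutputDirectoryError(Exception):
    """Hypothetical: a job's output directory already exists before the job starts."""


def check_output_dir(output_dir):
    """Options (a) and (c): refuse to start if output_dir or its .InProcess
    working directory already exists, and say why in the error message."""
    for candidate in (output_dir, output_dir + ".InProcess"):
        if os.path.exists(candidate):
            raise StaleOutputDirectoryError(
                "Output directory %s already exists. It was probably left by "
                "a publishing job that ran before the database was refreshed; "
                "move or delete it and resubmit the job." % candidate
            )
```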

Comment entered 2015-04-24 13:37:22 by Kline, Bob (NIH/NCI) [C]

I plan to modify the publishing software to rename JobNNNN to JobNNNN-YYYYMMDDHHMMSS when this happens.

Comment entered 2015-04-24 18:51:31 by Englisch, Volker (NIH/NCI) [C]

One caveat of this approach is that FileSweeper currently has no step to clean up these renamed directories, but such a step would be easy to add. Since we shouldn't run into this situation frequently, the risk of running out of disk space because of the duplicated directories should be small.
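
A cleanup step of the kind described could look like the following Python sketch. It does not use FileSweeper's actual configuration format (which is not shown in this ticket); the function name, the 30-day retention period, and the directory-name pattern derived from the JobNNNN-YYYYMMDDHHMMSS rename plan are all assumptions:

```python
import os
import re
import shutil
import time

# Renamed collision directories, e.g. "Job13498-20160323140408" or
# "Job13498.InProcess-20160323140408" (assumed naming convention).
RENAMED_JOB_DIR = re.compile(r"Job\d+(?:\.\w+)?-(\d{14})")


def sweep_renamed_job_dirs(output_root, max_age_days=30, dry_run=False, now=None):
    """Delete renamed job directories whose timestamp suffix is older than
    max_age_days. Returns the paths removed (or, with dry_run, that would be)."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    removed = []
    for name in sorted(os.listdir(output_root)):
        path = os.path.join(output_root, name)
        match = RENAMED_JOB_DIR.fullmatch(name)
        if not match or not os.path.isdir(path):
            continue
        try:
            # The suffix is a local time, as written by time.strftime().
            stamp = time.mktime(time.strptime(match.group(1), "%Y%m%d%H%M%S"))
        except ValueError:
            continue  # 14 digits that are not a valid timestamp: leave it alone
        if stamp < cutoff:
            if not dry_run:
                shutil.rmtree(path)
            removed.append(path)
    return removed
```

Running it first with `dry_run=True` lists what would be deleted without touching the disk.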

Comment entered 2016-02-18 12:06:44 by Learn, Blair (NIH/NCI) [C]

Estimate 5 points

Comment entered 2016-03-23 14:01:20 by Kline, Bob (NIH/NCI) [C]

Added at the bottom of the Publish class's initializer:

        # Prevent collisions resulting from mismatch between database
        # and file system (typically caused by a database refresh from
        # prod on the lower tiers). Let failures throw exceptions.
        # See https://tracker.nci.nih.gov/browse/OCECDR-3895.
        if not self.__isCgPushJob() and self.__outputDir:
            for path in glob.glob(self.__outputDir + "*"):
                if os.path.isdir(path) and "-" not in os.path.basename(path):
                    stat = os.stat(path)
                    stamp = time.strftime("%Y%m%d%H%M%S",
                                          time.localtime(stat.st_mtime))
                    os.rename(path, "%s-%s" % (path, stamp))
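
For experimenting outside the Publish class, the same rename logic can be pulled into a standalone function. This is a sketch, not the committed code: the function name is hypothetical, and it adds one extra filter based on an assumption about intent. For job 13498 the committed pattern `self.__outputDir + "*"` also matches other jobs' directories such as Job134980, which the committed "-" test does not exclude; the sketch keeps only the job's own directory and dotted variants such as Job13498.InProcess:

```python
import glob
import os
import time


def move_aside_stale_job_dirs(output_dir):
    """Rename leftover directories for this job to <name>-YYYYMMDDHHMMSS.

    output_dir is the job's intended output path, e.g. .../Output/Job13498.
    Returns a list of (old_path, new_path) pairs; rename failures propagate.
    """
    base_len = len(os.path.basename(output_dir))
    renamed = []
    for path in sorted(glob.glob(glob.escape(output_dir) + "*")):
        name = os.path.basename(path)
        suffix = name[base_len:]
        # Keep "Job13498" and "Job13498.InProcess"; skip already-renamed
        # directories ("-" in the name) and other jobs such as "Job134980".
        if "-" in name or (suffix and not suffix.startswith(".")):
            continue
        if os.path.isdir(path):
            mtime = os.stat(path).st_mtime
            stamp = time.strftime("%Y%m%d%H%M%S", time.localtime(mtime))
            new_path = "%s-%s" % (path, stamp)
            os.rename(path, new_path)
            renamed.append((path, new_path))
    return renamed
```

As in the committed code, the suffix comes from the stale directory's modification time rather than the current time, so it records roughly when the orphaned job last wrote to it.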

Comment entered 2016-03-23 14:38:44 by Kline, Bob (NIH/NCI) [C]

/branches/Darwin/lib/Python/cdrpub.py@13806

Installed on DEV and tested with job 13498. I created the directory

d:\cdr\Output\Job13498

after determining that 13497 was the last job ID in the pub_proc table. I then submitted a hotfix publishing job request.

NCIWS-D141-V-M:D:\home\bkline\sandboxes\Darwin\lib\Python>dir \cdr\Output\job13498*
 Volume in drive D is New Volume
 Volume Serial Number is F4F1-A98D

 Directory of D:\cdr\Output

03/23/2016  02:07 PM    <DIR>          Job13498
03/23/2016  02:04 PM    <DIR>          Job13498-20160323140408
               0 File(s)              0 bytes
               2 Dir(s)  113,107,124,224 bytes free

Comment entered 2016-04-29 12:44:19 by Englisch, Volker (NIH/NCI) [C]

I created a directory Job12345, knowing that the next publishing job would need to create exactly that directory. I then started the publishing job; the directory I had created was renamed, allowing the job to finish properly instead of failing at the end.

This behaves like a fourth option, (d), not listed in the description, and is better than options (a) through (c).

Comment entered 2016-05-13 15:06:08 by Englisch, Volker (NIH/NCI) [C]

The problem only occurs on the lower tiers, where the changes have been successfully tested.

Closing ticket.
