Issue Number | 3895 |
---|---|
Summary | Publishing Error on Lower Tiers after Database Refresh |
Created | 2015-04-16 18:32:55 |
Issue Type | Improvement |
Submitted By | Englisch, Volker (NIH/NCI) [C] |
Assigned To | Kline, Bob (NIH/NCI) [C] |
Status | Closed |
Resolved | 2016-03-23 14:13:20 |
Resolution | Fixed |
Path | /home/bkline/backups/jira/ocecdr/issue.159104 |
When the CDR database on a lower tier is refreshed, a publishing push
job can fail because documents/directories still exist from a
publishing job that was generated before the database cleanup. The
error message produced is not very descriptive. It would help in the
future either to prevent the problem or to make it easier for the
developer to identify the source of the publishing failure.
Several options are available:
a) Prior to creating the output directory 'Job12345.InProcess', ensure
the directory 'Job12345' doesn't already exist, and exit if it
does.
b) If the directory 'Job12345' already exists, delete it.
c) Create a more descriptive error message.
I plan to modify the publishing software to rename JobNNNN to JobNNNN-YYYYMMDDHHMMSS when this happens.
One caveat of this approach is that FileSweeper currently has no step to clean up these renamed directories, but such a step is easy to add. Since we shouldn't run into this situation frequently, there should be only a small risk of running out of disk space because of the duplicated directories.
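A minimal sketch of what such a cleanup step might look like, assuming the renamed directories follow the JobNNNN-YYYYMMDDHHMMSS pattern described above. The function name `sweep_renamed_job_dirs`, its parameters, and the age threshold are hypothetical and are not taken from FileSweeper itself.

```python
import os
import re
import shutil
import time

# Matches only directories renamed by the collision fix, e.g.
# Job13498-20160323140408 (hypothetical pattern based on this ticket).
RENAMED_JOB_DIR = re.compile(r"^Job\d+-\d{14}$")


def sweep_renamed_job_dirs(output_root, max_age_days, now=None):
    """Delete renamed job directories older than max_age_days.

    Age is judged by the directory's modification time. Returns the
    names of the directories that were removed.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    removed = []
    for name in sorted(os.listdir(output_root)):
        path = os.path.join(output_root, name)
        if not (RENAMED_JOB_DIR.match(name) and os.path.isdir(path)):
            continue
        if os.stat(path).st_mtime < cutoff:
            shutil.rmtree(path)
            removed.append(name)
    return removed
```

Directories without the timestamp suffix are never touched, so output from regular publishing jobs would be left to whatever retention rules already apply.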
Estimate 5 points
Added at the bottom of the Publish class's initializer:
        # Prevent collisions resulting from mismatch between database
        # and file system (typically caused by a database refresh from
        # prod on the lower tiers). Let failures throw exceptions.
        # See https://tracker.nci.nih.gov/browse/OCECDR-3895.
        if not self.__isCgPushJob() and self.__outputDir:
            for path in glob.glob(self.__outputDir + "*"):
                if os.path.isdir(path) and "-" not in os.path.basename(path):
                    stat = os.stat(path)
                    stamp = time.strftime("%Y%m%d%H%M%S",
                                          time.localtime(stat.st_mtime))
                    os.rename(path, "%s-%s" % (path, stamp))
/branches/Darwin/lib/Python/cdrpub.py@13806
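For experimenting with the rename logic outside of the publishing system, the same steps can be sketched as a standalone function. This is only an illustration: the name `rename_stale_job_dirs` and the returned list are assumptions added here, and the check for CG push jobs in the real initializer is omitted.

```python
import glob
import os
import time


def rename_stale_job_dirs(output_dir):
    """Rename leftover directories matching output_dir*.

    Each directory whose base name has no "-" is renamed by appending
    "-YYYYMMDDHHMMSS", built from its last-modified time (local time).
    Failures raise exceptions, as in the original snippet.
    Returns the list of new paths (an addition for this sketch).
    """
    renamed = []
    for path in glob.glob(output_dir + "*"):
        if os.path.isdir(path) and "-" not in os.path.basename(path):
            stat = os.stat(path)
            stamp = time.strftime("%Y%m%d%H%M%S",
                                  time.localtime(stat.st_mtime))
            new_path = "%s-%s" % (path, stamp)
            os.rename(path, new_path)
            renamed.append(new_path)
    return renamed
```

Because already-renamed directories contain a "-" in their base name, calling the function a second time with the same argument leaves them alone.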
Installed on DEV and tested with job 13498. I created the directory
d:\cdr\Output\Job13498
after determining that 13497 was the last job ID in the pub_proc table. I then submitted a hotfix publishing job request.
NCIWS-D141-V-M:D:\home\bkline\sandboxes\Darwin\lib\Python>dir \cdr\Output\job13498*
 Volume in drive D is New Volume
 Volume Serial Number is F4F1-A98D

 Directory of D:\cdr\Output

03/23/2016  02:07 PM    <DIR>          Job13498
03/23/2016  02:04 PM    <DIR>          Job13498-20160323140408
               0 File(s)              0 bytes
               2 Dir(s)  113,107,124,224 bytes free
I created a directory Job12345 knowing that the next publishing job would want to create just this directory. I started the publishing job and the directory I had created was renamed, allowing the publishing job to finish properly instead of failing at the end of the job.
This works like a fourth option, (d), beyond those listed in the description, i.e. better than (a) - (c).
The problem only exists on the lower tiers, where the changes have been successfully tested.
Closing ticket.