CDR Tickets

Issue Number 3359
Summary Have disk cleanup utility keep Bach from running low on disk space
Created 2011-05-14 13:43:42
Issue Type Bug
Submitted By Kline, Bob (NIH/NCI) [C]
Assigned To Englisch, Volker (NIH/NCI) [C]
Status Closed
Resolved 2011-06-10 16:03:29
Resolution Fixed
Path /home/bkline/backups/jira/ocecdr/issue.107687
Description

BZISSUE::5052
BZDATETIME::2011-05-14 13:43:42
BZCREATOR::Bob Kline
BZASSIGNEE::Volker Englisch
BZQACONTACT::Alan Meyer

From an earlier email exchange in March:

============================== snip ===================================

On 03/28/2011 12:51 PM, Bob Kline wrote:
> On 03/28/2011 12:46 PM, Alan Meyer wrote:
>> I suggest that we run a du on Bach, if you haven't already done it, to
>> find out where the biggest space savings can be had.
> That's in progress.
>
That run aborted for some reason, so I ran another (after more
cleanup). Here's an overview of disk usage for the D: drive on Bach
right now:

Total size of disk: 717GB
Free space: 83GB
DBMS stores: 271GB (mostly cdr and version archive dbs)
/cdr/output: 218GB (mostly publishing output)
/cdr/Utilities: 101GB (mostly CT.gov downloads)
/home: 8GB
/cdr/Mailers: 8GB
/cdr/GlobalChange: 3GB
/cygwin: 3GB
/usr: 2GB
/Inetpub: 1.5GB
/downloads: 1GB
/tmp: 1GB
/Program Files: 1GB
everything else: 15GB

So the only really big-ticket space users are the DBMS stores, the
publishing output, and the CT.gov import files. Of those, the only
things that really need to be on the D: drive permanently are the DBMS
stores. The publishing output and CT.gov downloads can be siphoned off
to some other storage, I would think. Bottom line: one way or another,
we need to have a system whose disk usage doesn't increase, with the
exception of the unavoidable increase in the version archive database.
We should never have a "zero bytes free" condition again.

============================== snip ===================================
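
For reference, a per-directory overview like the one in the email above could be produced with something along these lines. This is only a sketch of a du-style scan, not the command that was actually run on Bach; the function names directory_sizes and report are hypothetical.

```python
import os
from pathlib import Path


def directory_sizes(root):
    """Return {top-level entry name: total bytes} for everything under root."""
    sizes = {}
    for entry in Path(root).iterdir():
        if entry.is_symlink():
            continue  # do not follow links out of the tree being measured
        if entry.is_file():
            sizes[entry.name] = entry.stat().st_size
            continue
        total = 0
        for dirpath, _dirnames, filenames in os.walk(entry):
            for name in filenames:
                try:
                    total += os.lstat(os.path.join(dirpath, name)).st_size
                except OSError:
                    # Skip files that vanish or cannot be read mid-scan.
                    pass
        sizes[entry.name] = total
    return sizes


def report(sizes):
    """Format the sizes as text lines in GB, largest first."""
    ordered = sorted(sizes.items(), key=lambda item: item[1], reverse=True)
    return [f"{name}: {size / 1024 ** 3:.1f}GB" for name, size in ordered]
```

Calling report(directory_sizes("D:/")) would give one line per top-level entry, which makes the few big-ticket directories easy to spot.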

Let's follow up on this. Please have the Hoover configuration file modified so that the publishing output and the CT.gov downloads are moved off of D: on Bach regularly. The "work" directories for the CT.gov downloads should be kept for some limited amount of time (say, somewhere between one and three months), but can be discarded after that. The zip files for those downloads should be retained.
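
The retention rule requested here (discard old "work" directories, keep the zip files) could be sketched roughly as follows. This is not the Hoover/FileSweeper implementation. It assumes the work directories sit directly under the download directory and that modification time is an acceptable measure of age; the name expired_work_dirs and the 60-day default (chosen from the one-to-three-month range above) are hypothetical.

```python
import time
from pathlib import Path

SECONDS_PER_DAY = 24 * 60 * 60


def expired_work_dirs(download_root, max_age_days=60, now=None):
    """List directories under download_root older than max_age_days.

    Only directories are returned. Zip files (and any other plain files)
    are never candidates, since the ticket asks that they be retained.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    expired = []
    for entry in sorted(Path(download_root).iterdir()):
        if entry.is_symlink() or not entry.is_dir():
            continue  # keep zip files and anything that is not a real directory
        if entry.stat().st_mtime < cutoff:
            expired.append(entry)
    return expired
```

The function only reports candidates. A caller would review the list and then remove each directory, for example with shutil.rmtree.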

Comment entered 2011-05-16 16:26:43 by Englisch, Volker (NIH/NCI) [C]

BZDATETIME::2011-05-16 16:26:43
BZCOMMENTOR::Volker Englisch
BZCOMMENT::1

(In reply to comment #0)
> Please have the Hoover configuration file modified so
> that the publishing output and the CT.gov downloads are moved off of D: on Bach
> regularly.

Did you already have a specific location where to put these files in mind?

Comment entered 2011-05-16 16:30:37 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2011-05-16 16:30:37
BZCOMMENTOR::Bob Kline
BZCOMMENT::2

You'll need to work with Mauricio's team to determine the best location.

Comment entered 2011-05-16 16:51:11 by alan

BZDATETIME::2011-05-16 16:51:11
BZCOMMENTOR::Alan Meyer
BZCOMMENT::3

(In reply to comment #1)
...
> Did you already have a specific location where to put these files in mind?

I would think somewhere on the SAN (R drive) would be best.

However, wherever we put them, we need to think about backup strategies. Most of the files in question are easy to back up because, once written, they never change. They're also ideal candidates for a 2-3 TB USB drive for backups, since they can be backed up slowly without having to lock users out of anything at all. It could take some years to fill one drive, and then we can take it offline and pay 49.95 for a new one. But it may not be practical for non-technical reasons to do this.

Comment entered 2011-05-25 14:37:06 by Englisch, Volker (NIH/NCI) [C]

BZDATETIME::2011-05-25 14:37:06
BZCOMMENTOR::Volker Englisch
BZCOMMENT::4

I have modified the FileSweeper config file to include deleting all work-directories under the directory
/cdr/utilities/CTGovDownloads
that are older than a month.
I'm also deleting the vendor output data from the directory
/cdr/output/LicenseeDocs
that are older than a month.

I ran the FileSweeper on BACH and FRANCK to free up additional space based on the changes.

I am now waiting for the OPS team to identify a location for me to move the FileSweeper output to.

Comment entered 2011-06-07 17:57:55 by Englisch, Volker (NIH/NCI) [C]

BZDATETIME::2011-06-07 17:57:55
BZCOMMENTOR::Volker Englisch
BZCOMMENT::5

I've picked the SAN to store our Hoover output and I've created a directory structure mimicking what's on BACH for the location of the FileSweeper output:
R:/Backup/Bach/cdr/Output/JobArchive
../GlobalChange/JobArchive
../Franck/cdr/Output/JobArchive
...
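
The mirroring convention above can be expressed as a small path mapping. This is only an illustration of the layout, assuming each archive lives in a JobArchive directory under R:/Backup/<server>/<original path>. It reads the abbreviated "../GlobalChange" entry as following the same pattern, and the helper name san_archive_dir is hypothetical.

```python
from pathlib import PureWindowsPath

# SAN location taken from the comment above.
SAN_BACKUP_ROOT = PureWindowsPath("R:/Backup")


def san_archive_dir(server, local_dir):
    """Map a local directory on a server to its mirrored JobArchive on the SAN."""
    local = PureWindowsPath(local_dir)
    # Drop the drive and root (for example "D:\") so only relative parts remain.
    relative_parts = local.parts[1:] if local.anchor else local.parts
    return SAN_BACKUP_ROOT.joinpath(server, *relative_parts, "JobArchive")
```

For example, san_archive_dir("Bach", "D:/cdr/Output").as_posix() gives "R:/Backup/Bach/cdr/Output/JobArchive", matching the first entry listed above.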

For the biggest space hogs, the files that were stored on BACH are now stored on the SAN, and the FileSweeper config file has been adjusted to write directly to the SAN for future sweeps.
I'm still monitoring the sweeps to make sure everything is created where it's supposed to be, and I'm adjusting some of the locations.
At this point we have around 20% free disk space on BACH. I think that's a little better than "zero bytes free".
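
A free-space figure like this can be checked automatically rather than by hand. The sketch below uses Python's standard shutil.disk_usage. The function names and the 10% default threshold are assumptions for illustration and are not part of the FileSweeper.

```python
import shutil


def free_fraction(path):
    """Return the fraction (0.0 to 1.0) of the disk holding path that is free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def is_low_on_space(path, minimum_free=0.10):
    """Report whether free space has dropped below the given fraction."""
    return free_fraction(path) < minimum_free
```

A scheduled job could call is_low_on_space("D:/") and send a warning well before the disk reaches zero bytes free.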

Comment entered 2011-06-08 16:07:47 by Englisch, Volker (NIH/NCI) [C]

BZDATETIME::2011-06-08 16:07:47
BZCOMMENTOR::Volker Englisch
BZCOMMENT::6

Bob, in the Mailer/Output directory there are a lot of *.tar.bz2 files that are not used anymore but were not created by the FileSweeper process.
These files are named
FailedJob?— ?.tar.bz2
Job
-e.tar.bz2
PrintFilesForJob
.tar.bz2
SupportFilesForJob
??.tar.bz2

Should I keep them where they are or move them, along with the Hoover files MailerJobs.*.tar.bz2, to the SAN?

Comment entered 2011-06-10 16:02:08 by Englisch, Volker (NIH/NCI) [C]

BZDATETIME::2011-06-10 16:02:08
BZCOMMENTOR::Volker Englisch
BZCOMMENT::7

The FileSweeper has been adjusted to create the backup files on the SAN.
Most of the existing backup files have been moved to the SAN, and BACH now has almost 200GB of disk space available.

Comment entered 2011-06-10 16:03:29 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2011-06-10 16:03:29
BZCOMMENTOR::Bob Kline
BZCOMMENT::8

Hallelujah! Thank you!
