Issue Number | 3359 |
---|---|
Summary | Have disk cleanup utility keep Bach from running low on disk space |
Created | 2011-05-14 13:43:42 |
Issue Type | Bug |
Submitted By | Kline, Bob (NIH/NCI) [C] |
Assigned To | Englisch, Volker (NIH/NCI) [C] |
Status | Closed |
Resolved | 2011-06-10 16:03:29 |
Resolution | Fixed |
Path | /home/bkline/backups/jira/ocecdr/issue.107687 |
BZISSUE::5052
BZDATETIME::2011-05-14 13:43:42
BZCREATOR::Bob Kline
BZASSIGNEE::Volker Englisch
BZQACONTACT::Alan Meyer
From an earlier email exchange in March:
============================== snip ===================================
On 03/28/2011 12:51 PM, Bob Kline wrote:
> On 03/28/2011 12:46 PM, Alan Meyer wrote:
>> I suggest that we run a du on Bach, if you haven't already done it, to
>> find out where the biggest space savings can be had.
> That's in progress.
That run aborted for some reason, so I ran another (after more cleanup). Here's an overview of disk usage for the D: drive on Bach right now:
Total size of disk: 717GB
Free space: 83GB
DBMS stores: 271GB (mostly cdr and version archive dbs)
/cdr/output: 218GB (mostly publishing output)
/cdr/Utilities: 101GB (mostly CT.gov downloads)
/home: 8GB
/cdr/Mailers: 8GB
/cdr/GlobalChange: 3GB
/cygwin: 3GB
/usr: 2GB
/Inetpub: 1.5GB
/downloads: 1GB
/tmp: 1GB
/Program Files: 1GB
everything else: 15GB
So the only really big-ticket space users are the DBMS stores, the publishing output, and the CT.gov import files. Of those, the only things that really need to be on the D: drive permanently are the DBMS stores. The publishing output and CT.gov downloads can be siphoned off to some other storage, I would think. Bottom line: one way or another, we need to have a system whose disk usage doesn't increase, with the exception of the unavoidable increase in the version archive database.
We should never have a "zero bytes free" condition again.
============================== snip ===================================
Let's follow up on this. Please have the Hoover configuration file modified so that the publishing output and the CT.gov downloads are moved off of D: on Bach regularly. The "work" directories for the CT.gov downloads should be kept for some limited amount of time (say, somewhere between one and three months), but can be discarded after that. The zip files for those downloads should be retained.
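The disk-usage overview in the email above is essentially a du summary of the top-level directories on D:. Below is a minimal Python sketch of producing that kind of ranking. The function names are hypothetical and are not part of any CDR tool. The sketch sums the apparent sizes of regular files (it does not replicate du's block accounting) and skips anything it cannot read.

```python
import os
from pathlib import Path
from typing import List, Tuple


def dir_size(path: Path) -> int:
    """Sum of the apparent sizes (bytes) of all files under path."""
    total = 0
    # onerror swallows unreadable directories instead of aborting the walk
    for dirpath, _dirnames, filenames in os.walk(path, onerror=lambda e: None):
        for name in filenames:
            try:
                total += os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                continue  # file vanished or is unreadable; skip it
    return total


def usage_report(root: Path) -> List[Tuple[str, int]]:
    """Size of each top-level entry under root, largest first."""
    rows = []
    for entry in root.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            rows.append((entry.name, dir_size(entry)))
        elif entry.is_file():
            rows.append((entry.name, entry.stat().st_size))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows
```

For a report like the one above, the sizes could be printed in GB with `for name, size in usage_report(Path("D:/")): print(f"{name}: {size / 2**30:.1f}GB")`. The `D:/` root is an assumption about how the drive is addressed.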
BZDATETIME::2011-05-16 16:26:43
BZCOMMENTOR::Volker Englisch
BZCOMMENT::1
(In reply to comment #0)
> Please have the Hoover configuration file modified so that the publishing output and the CT.gov downloads are moved off of D: on Bach regularly.
Did you already have a specific location in mind for these files?
BZDATETIME::2011-05-16 16:30:37
BZCOMMENTOR::Bob Kline
BZCOMMENT::2
You'll need to work with Mauricio's team to determine the best location.
BZDATETIME::2011-05-16 16:51:11
BZCOMMENTOR::Alan Meyer
BZCOMMENT::3
(In reply to comment #1)
...
> Did you already have a specific location where to put these files in mind?
I would think somewhere on the SAN (R drive) would be best.
However, wherever we put them, we need to think about backup strategies. Most of the files in question are easy to back up because, once written, they never change. They're also ideal candidates for backing up to a 2-3 TB USB drive, since they can be backed up slowly without locking users out of anything at all. It could take some years to fill one drive, and then we can take it offline and pay $49.95 for a new one. But this may not be practical for non-technical reasons.
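Because these files are write-once, an incremental backup pass only needs to copy files that are not already on the backup drive. No content comparison is required. Here is a minimal Python sketch of that idea. The function name is hypothetical, and the sketch assumes the write-once property holds: a file that already exists at the destination is treated as backed up and is never recopied.

```python
import shutil
from pathlib import Path
from typing import List


def backup_new_files(src_root: Path, dest_root: Path) -> List[Path]:
    """Copy files under src_root that do not yet exist under dest_root.

    Relies on the files being write-once: an existing destination file
    is assumed to be current and is skipped. Returns the relative paths
    that were copied on this pass.
    """
    copied = []
    for src in sorted(src_root.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(src_root)
        dest = dest_root / rel
        if dest.exists():
            continue  # already backed up on an earlier pass
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dest)  # copy2 also preserves timestamps
        copied.append(rel)
    return copied
```

Each pass copies only what arrived since the previous one. That matches the "backed up slowly" approach, because the pass can be run as often or as rarely as convenient.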
BZDATETIME::2011-05-25 14:37:06
BZCOMMENTOR::Volker Englisch
BZCOMMENT::4
I have modified the FileSweeper config file to delete all work directories under the directory
/cdr/utilities/CTGovDownloads
that are older than a month.
I'm also deleting vendor output data older than a month from the directory
/cdr/output/LicenseeDocs
I ran the FileSweeper on BACH and FRANCK to free up additional space based on these changes.
I am now waiting for the OPS team to identify a location to which I can move the FileSweeper output.
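The FileSweeper config syntax isn't shown in this issue. As a rough illustration of the retention rule (discard CT.gov work directories after about a month, always keep the zip files), here is a minimal Python sketch. The function name is hypothetical, and several assumptions apply: "a month" is taken as 30 days, the work directories are the immediate subdirectories of the download directory, and each directory's own modification time stands in for its age.

```python
import shutil
import time
from pathlib import Path
from typing import List, Optional

SECONDS_PER_DAY = 86400


def sweep_old_work_dirs(root: Path, max_age_days: int = 30,
                        dry_run: bool = True,
                        now: Optional[float] = None) -> List[Path]:
    """Remove subdirectories of root not modified in max_age_days days.

    Only immediate subdirectories are considered; regular files directly
    under root (for example the download .zip archives) are never touched.
    Returns the directories that were removed (or would be, if dry_run).
    """
    if now is None:
        now = time.time()
    cutoff = now - max_age_days * SECONDS_PER_DAY
    stale = []
    for entry in sorted(root.iterdir()):
        if entry.is_dir() and not entry.is_symlink() and entry.stat().st_mtime < cutoff:
            stale.append(entry)
            if not dry_run:
                shutil.rmtree(entry)
    return stale
```

The sketch defaults to a dry run, so it only reports what it would delete until it is called with `dry_run=False`.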
BZDATETIME::2011-06-07 17:57:55
BZCOMMENTOR::Volker Englisch
BZCOMMENT::5
I've picked the SAN to store our Hoover output, and I've created a directory structure for the FileSweeper output that mimics what's on BACH:
R:/Backup/Bach/cdr/Output/JobArchive
../GlobalChange/JobArchive
../Franck/cdr/Output/JobArchive
...
For the biggest space hogs, the files that were stored on BACH are now stored on the SAN, and the FileSweeper config file has been adjusted to write directly to the SAN for future sweeps.
I'm still monitoring the sweeps to make sure everything is created where it's supposed to be, and I'm adjusting some of the locations.
At this point we have around 20% free disk space on BACH. I think that's
a little better than "zero bytes free".
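Mirroring BACH's layout under a per-host SAN directory (for example, R:/Backup/Bach/cdr/Output/JobArchive) amounts to a simple path mapping. Here is a minimal Python sketch of that mapping and of moving a file into it. Both function names are hypothetical. The sketch only illustrates the layout described above and does not show how FileSweeper writes to the SAN.

```python
import shutil
from pathlib import Path, PurePath


def mirror_path(local_root: PurePath, san_root: Path, host: str,
                local_path: PurePath) -> Path:
    """Map local_path to the same relative location under san_root/host.

    For example, /cdr/Output/JobArchive/x on host Bach maps to
    <san_root>/Bach/cdr/Output/JobArchive/x.
    """
    return san_root / host / local_path.relative_to(local_root)


def move_to_san(local_file: Path, local_root: Path, san_root: Path, host: str) -> Path:
    """Move local_file into the mirrored SAN tree, creating directories as needed."""
    target = mirror_path(local_root, san_root, host, local_file)
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(local_file), str(target))
    return target
```

Keeping the host name as the first path component lets output from BACH and FRANCK share one SAN tree without collisions.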
BZDATETIME::2011-06-08 16:07:47
BZCOMMENTOR::Volker Englisch
BZCOMMENT::6
Bob, in the Mailer/Output directory there are a lot of *.tar.bz2 files that are no longer used but were not created by the FileSweeper process.
These files are named
FailedJob*.tar.bz2
Job*-e.tar.bz2
PrintFilesForJob*.tar.bz2
SupportFilesForJob*.tar.bz2
Should I keep them where they are, or move them to the SAN along with the Hoover files MailerJobs.*.tar.bz2?
BZDATETIME::2011-06-10 16:02:08
BZCOMMENTOR::Volker Englisch
BZCOMMENT::7
The FileSweeper has been adjusted to create the backup files on the SAN.
Most of the existing backup files have been moved to the SAN, and BACH now has almost 200GB of free disk space.
BZDATETIME::2011-06-10 16:03:29
BZCOMMENTOR::Bob Kline
BZCOMMENT::8
Hallelujah! Thank you!