CDR Tickets

Issue Number 3415
Summary Shrink CDR DB by moving Data to CDR_Archived_Versions DB
Created 2011-09-13 09:32:18
Issue Type Improvement
Submitted By Englisch, Volker (NIH/NCI) [C]
Assigned To alan
Status Closed
Resolved 2012-06-07 18:50:47
Resolution Fixed
Path /home/bkline/backups/jira/ocecdr/issue.107743
Description

BZISSUE::5108
BZDATETIME::2011-09-13 09:32:18
BZCREATOR::Volker Englisch
BZASSIGNEE::Alan Meyer
BZQACONTACT::Bob Kline

+++ This bug was initially created as a clone of OCECDR-2949 +++
As a follow-on to Issue #3069, we need to periodically migrate
versions of documents from the cdr database doc_version table to
the cdr_archived_versions database.

To do that, we need to create a script that can be run to do the
job smoothly and safely.

-----------------------------------------------------------------

Believe it or not, the last time we ran these scripts was a year ago.
The cdr database is currently about the same size as the cdr_archived_versions DB, so we should move the archived versions out of the CDR DB.
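
For reference, the migration is conceptually a two-phase operation. A minimal T-SQL sketch is below; it is not the actual CDR script, and the archive table and column names (doc_version_xml, id, num, xml) are assumptions for illustration. The cdr table name all_doc_versions is taken from a later comment in this issue.

    -- Phase 1: copy versioned XML into the archive database
    INSERT INTO cdr_archived_versions.dbo.doc_version_xml (id, num, xml)
    SELECT v.id, v.num, v.xml
      FROM cdr.dbo.all_doc_versions AS v
     WHERE v.xml IS NOT NULL
       AND NOT EXISTS (SELECT 1
                         FROM cdr_archived_versions.dbo.doc_version_xml AS a
                        WHERE a.id = v.id AND a.num = v.num);

    -- Phase 2: null out the XML that was copied, freeing space in cdr
    UPDATE v
       SET v.xml = NULL
      FROM cdr.dbo.all_doc_versions AS v
     WHERE v.xml IS NOT NULL
       AND EXISTS (SELECT 1
                     FROM cdr_archived_versions.dbo.doc_version_xml AS a
                    WHERE a.id = v.id AND a.num = v.num);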

Comment entered 2011-09-20 16:52:38 by alan

BZDATETIME::2011-09-20 16:52:38
BZCOMMENTOR::Alan Meyer
BZCOMMENT::1

To do this, I'll want to review all of the previous Bugzilla postings, re-read all of the existing scripts, look at any extant log files, etc.

So this will percolate for a while before I do anything.

Comment entered 2011-09-29 22:17:39 by alan

BZDATETIME::2011-09-29 22:17:39
BZCOMMENTOR::Alan Meyer
BZCOMMENT::2

I have re-read the script for archiving versions, the comments in
OCECDR-2949 and the documentation in:

http://bach.nci.nih.gov/cgi-bin/cdr/Filter.py?DocId=CDR0000683474&Filter=name:Documentation+Help+Screens+Filter

As near as I can tell, the documentation is still accurate except
that NAGIOS needs to be turned off at the appropriate time
instead of Big Brother, and I saw no mention of running the DBCC
SHRINKDATABASE command on the version archive data.

I propose to run a test on Franck to be sure the process still
works, and to get timing information.

I looked through the old log files and in Bugzilla for timing
information. In our last run on Bach on June 6, 2010, timings
were approximately as follows:

Copy xml to cdr_archived_versions = 7 hours.
Null out xml that was copied = 3.5 hours.
Shrink the cdr database = 7.25 hours.

As of this writing, Franck has:

Unique documents: 69937
Unique versions: 258497

Last year's numbers were:

Unique documents: 83619
Unique versions: 378680

So we should go as fast as, or faster than, last year.

If the plan is acceptable, Volker and Bob need to pick a time
that doesn't interfere with any other use of Franck. We'll also
need to coordinate with the Ops Team to be sure that no backups
or other Ops Team tasks are scheduled for that time, and that any
NAGIOS or other monitoring that expects SQL Server or other
services to be up is turned off.

If everything works, the sequence for Bach would be:

1. Coordinate with the Ops Team on backups and monitoring.
2. Coordinate with users.
3. Get a full backup of the cdr database.
4. Run phase 1.
5. Check.
6. Run phase 2.
7. Check.
8. Run a DBCC SHRINKDATABASE on each of the databases.
9. Backup the two newly modified databases.
10. Allow users back in.

We might start on a Friday night and finish sometime on Sunday,
depending on when we can get the backups done.

Comment entered 2011-09-30 09:23:48 by Englisch, Volker (NIH/NCI) [C]

BZDATETIME::2011-09-30 09:23:48
BZCOMMENTOR::Volker Englisch
BZCOMMENT::3

Would you like me to refresh Franck first? If yes, when would you like me to do this?

Comment entered 2011-10-15 11:21:13 by alan

BZDATETIME::2011-10-15 11:21:13
BZCOMMENTOR::Alan Meyer
BZCOMMENT::4

I completed the operation to copy data from the cdr database to the cdr_archived_versions database on Franck and then null out the xml data in cdr that had been copied. I ran the second part in multi-user mode because it wouldn't go into single user mode. But that shouldn't have been a problem, and apparently wasn't. It completed normally and nulled out the expected number of items.
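
For the record, forcing a database into single-user mode for a step like this would look roughly as follows. This is standard T-SQL, not the actual CDR procedure:

    -- Take exclusive access, rolling back any other open connections
    ALTER DATABASE cdr SET SINGLE_USER WITH ROLLBACK IMMEDIATE;

    -- ... run the phase 2 (null-out) script here ...

    -- Return the database to normal multi-user operation
    ALTER DATABASE cdr SET MULTI_USER;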

It looks like 45.6 GB was transferred from cdr to cdr_archived_versions.

In addition to the extra growth in the cdr_archived_versions database, the transaction logs grew considerably. Both databases had increases of about 29 GB in their transaction logs. So the total extra space required was:

46 + 29 + 29 = 104 GB

I lucked out on disk space. There were 33+ GB free at the end. We'll have to consider the disk space carefully before doing this on Bach. Bach currently has 159 GB free, which looks like enough now. If we consume much space between now and the time we run the archiver we'll need to recover some first.
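
A quick way to check sizes and free disk space before committing to the run (standard SQL Server procedures; only a sketch, not part of the documented procedure):

    -- Allocated vs. unallocated space inside the cdr database
    USE cdr;
    EXEC sp_spaceused;

    -- Size and location of every database
    EXEC sp_helpdb;

    -- Free megabytes on each local drive
    -- (undocumented but commonly available extended procedure)
    EXEC master..xp_fixeddrives;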

I'm now running a DBCC SHRINKDATABASE('cdr'). That's also running in multiuser mode.
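
For next year's notes, the shrink commands themselves look like this (documented DBCC syntax; the 10 percent target free space is an assumption, not what was actually specified):

    -- Shrink the main database, leaving roughly 10% free space
    DBCC SHRINKDATABASE ('cdr', 10);

    -- Same for the archive database
    DBCC SHRINKDATABASE ('cdr_archived_versions', 10);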

Comment entered 2011-10-17 16:32:00 by alan

BZDATETIME::2011-10-17 16:32:00
BZCOMMENTOR::Alan Meyer
BZCOMMENT::5

I'm adding this information to the Bugzilla record so it will be recorded for next year.

The shrinkdatabase command failed after 8 hours and 20 minutes with an out-of-disk-space error. SQL Server had used up all 33 GB of free space while shrinking the database. Unfortunately, the temporary space was added to the transaction log, which makes things that much harder to fix now. Also unfortunately, although I knew about this from past experience, nowhere in the Microsoft documentation that I saw does it say how much extra space is required, or even that any is required, to recover unused space inside a database.

Some options that could be tried include:

1. Free more space, possibly by copying large databases onto the SAN, and try again.
2. Try the DBCC SHRINKFILE command instead of the SHRINKDATABASE command.
3. Try the SQL Server Management Studio GUI tool to recover the space interactively.

If Min is in on Tuesday, I'll ask her advice on the best way to proceed.

Comment entered 2011-10-18 16:04:22 by alan

BZDATETIME::2011-10-18 16:04:22
BZCOMMENTOR::Alan Meyer
BZCOMMENT::6

Min met with Bob, Volker, and me and went over what we had to do to recover the space on Franck. She has an excellent understanding of SQL Server and saved us a lot of time and frustration.

The first thing we did was to change the recovery model from "Full" to "Simple". In simple recovery the transaction log keeps records only until the transactions are committed, after which the space is reused. That would probably have kept the transaction log from growing as it did during the data migration and during the shrink operation.

Then we ran a shrinkfile (not a shrinkdatabase) on the log files. Each one was truncated to half a gigabyte in a matter of seconds. That recovered about 115 GB.
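
For next year's run, the recovery-model change and the log shrink would look roughly like this in T-SQL; the logical log file names are assumptions and should be confirmed with sp_helpfile first:

    -- Keep the transaction logs from ballooning during the migration
    ALTER DATABASE cdr SET RECOVERY SIMPLE;
    ALTER DATABASE cdr_archived_versions SET RECOVERY SIMPLE;

    -- Truncate each log file down to about half a gigabyte
    USE cdr;
    DBCC SHRINKFILE (cdr_log, 500);                     -- name assumed

    USE cdr_archived_versions;
    DBCC SHRINKFILE (cdr_archived_versions_log, 500);   -- name assumed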

I'm now running a dbcc shrinkdatabase on the main cdr database. I presume it will take many hours, as before, but will not run out of disk space. When it's done I'll do the same for the cdr_archived_versions. It looks like we can recover almost 16 GB there.

I'll update the documentation to reflect the new knowledge gained.

Comment entered 2011-10-18 19:27:50 by alan

BZDATETIME::2011-10-18 19:27:50
BZCOMMENTOR::Alan Meyer
BZCOMMENT::7

I have updated the documentation at:

http://bach.nci.nih.gov/cgi-bin/cdr/Filter.py?DocId=CDR0000683474&Filter=name:Documentation+Help+Screens+Filter

to reflect the latest experience on Franck.

The shrinkdatabase command is still running on the cdr database. The transaction log file has not ballooned upward as it had before Min advised us to put the database into "simple" recovery mode. That should enable things to go a lot more smoothly on Bach.

Comment entered 2011-10-18 23:52:07 by alan

BZDATETIME::2011-10-18 23:52:07
BZCOMMENTOR::Alan Meyer
BZCOMMENT::8

The shrink of the CDR database completed in 4 hours and change. The size went from 166,825,689,088 bytes to 99,809,558,528 bytes.

The shrink is still running on the cdr_archived_versions database. It should complete sometime in the wee hours of the morning.

I think we should run a weekly publishing job (no push) as a test of the process. Publishing will need to extract many document versions that previously came from the cdr but will come from cdr_archived_versions after this. We'll be looking to find out whether everything works and if there is any change in performance.

Comment entered 2011-10-19 13:30:37 by alan

BZDATETIME::2011-10-19 13:30:37
BZCOMMENTOR::Alan Meyer
BZCOMMENT::9

The shrinkdatabase completed in 11 hours and 21 minutes on the cdr_archived_versions database. It had no effect at all on the file size.

In theory, that's as it should be. There were no holes in the data because nothing has ever been deleted from it - though there might be some automatically assigned room for expansion that could have been compressed out if I had known the right parameters to use.

We're ready for testing on Franck. I suggest running a weekly publishing job with no push.

If everything looks good we can run for real on Bach this weekend, though I'd rather wait until the next weekend or even one more in order to have more testing.

Comment entered 2011-10-20 15:29:54 by alan

BZDATETIME::2011-10-20 15:29:54
BZCOMMENTOR::Alan Meyer
BZCOMMENT::10

I made a few more notes in the documentation and reorganized it a bit. By next year I will have forgotten a lot so we'll find out then whether the documentation is accurate.

Comment entered 2011-12-03 08:52:49 by alan

BZDATETIME::2011-12-03 08:52:49
BZCOMMENTOR::Alan Meyer
BZCOMMENT::11

Copying of versions from the cdr database to the cdr_archived_versions database is complete. The count verification showed that all counts were correct.

290,213 unique versions were copied in 4 hours and 55 minutes.
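
A sketch of the kind of count check referred to above, using the same assumed table names as in the earlier sketch:

    -- Versions whose XML has been moved out of the cdr database
    SELECT COUNT(*) AS nulled_in_cdr
      FROM cdr.dbo.all_doc_versions
     WHERE xml IS NULL;

    -- Should match the total number of rows in the archive table
    SELECT COUNT(*) AS archived
      FROM cdr_archived_versions.dbo.doc_version_xml;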

I've started the second step - to delete the copied data from the cdr database. I'm running this one in single-user mode too. I don't expect it to finish until tonight. Then I'll restore services and begin the database shrink programs to recover the space that was freed by the copying.

Comment entered 2011-12-03 12:21:02 by alan

BZDATETIME::2011-12-03 12:21:02
BZCOMMENTOR::Alan Meyer
BZCOMMENT::12

The removal of versions from the cdr database completed in 3 hours and 22 minutes. CDR services are restored and the database is back in multi-user mode. It's now possible for people to resume any critical work. However, I will be performing database shrink and backup procedures that will use a lot of computer time.

I'll make more postings as various parts of the process are completed.

Comment entered 2011-12-03 12:22:12 by alan

BZDATETIME::2011-12-03 12:22:12
BZCOMMENTOR::Alan Meyer
BZCOMMENT::13

I'm adding Robin and William to the CC list for this issue so that they'll get notifications when everything is done.

Comment entered 2011-12-04 00:30:46 by alan

BZDATETIME::2011-12-04 00:30:46
BZCOMMENTOR::Alan Meyer
BZCOMMENT::14

All of the archiving is complete and the database shrinks have
been done on both databases.

I'm now running a backup of the newly archived database to a disk
file at:

'd:\home\alan\cdr\database\cdr_archived_versions2011-12-04.bak'

When it's done, I'll compress it.
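
The backup itself was roughly of this form (documented BACKUP syntax; the WITH options are assumptions):

    BACKUP DATABASE cdr_archived_versions
        TO DISK = 'd:\home\alan\cdr\database\cdr_archived_versions2011-12-04.bak'
        WITH INIT,        -- overwrite any existing backup in the file
             STATS = 10;  -- report progress every 10 percent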

The results of the shrink operations were completely different
from what happened on Franck.

cdr database:

After the shrink, the database size was 109 GB, 10 GB more
than we saw on Franck. I don't know why. We almost
certainly did not have 10 GB more data.

Even more surprising, I ran a shrink on the
cdr_archived_versions database and by the time it was done
the cdr database had increased in size again. As of this
writing, it's now 120.6 GB!

cdr_archived_versions database:

This one compressed smaller than on Franck. On Franck the
shrink operation took 11 hours and 21 minutes and resulted in
no change to the database size, leaving the database at 183.6
GB.

On Bach, the shrink took 1 hour and 34 minutes and shrunk the
database down to just under 170 GB which is the amount I
would have predicted.

Here are the results of the shrink reported by the dbcc
command (in case Min or Qian can interpret them for us):

6 1 20749056 80 20746984 20746984
6 2 63 63 56 56

I have no results from the cdr database because I used the
GUI tool for that shrink and it didn't report results.

There were differences in the way the programs were run and in
the environments. I shrank the cdr database on Bach using the
SQL Server Management Studio GUI. I used the SQL DBCC commands
on Franck and also on the cdr_archived_versions database on Bach.

There were also scheduled jobs running on Bach while the shrinks
were underway. These resulted in 97 document versions
representing 95 documents being saved. These are now in the cdr
all_doc_versions table, not in the cdr_archived_versions.

However, I don't know why any of the above would have made any
difference. My guess is that the real differences
had to do with different configurations of SQL Server on Franck
and Bach, for example Bach may be configured to keep more free
space inside the cdr database, and perhaps the shrinkdatabase
silently failed on Franck. But I have no evidence one way or the
other for that. I also don't know if database configuration
information is backed up when a database is backed up and then
restored when it is restored. If so, then maybe any
configuration difference would have been eliminated when Franck
was last refreshed.

It may be that one of our DBAs can shed more light on this. All
I can say is that, according to all of my tests, the CDR seems to
be working fine and the archiving really did take place. It all
worked.

I'll report again after the database backup is complete.

Comment entered 2011-12-04 06:36:27 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2011-12-04 06:36:27
BZCOMMENTOR::Bob Kline
BZCOMMENT::15

(In reply to comment #14)

> However I don't know why any of the above would have made any
> difference to anything. My guess is that the real differences
> had to do with different configurations of SQL Server on Franck
> and Bach, for example Bach may be configured to keep more free
> space inside the cdr database, and perhaps the shrinkdatabase
> silently failed on Franck.

Remember, we're running different versions of SQL Server on the two machines.

Comment entered 2011-12-04 09:58:50 by alan

BZDATETIME::2011-12-04 09:58:50
BZCOMMENTOR::Alan Meyer
BZCOMMENT::16

(In reply to comment #15)

> Remember, we're running different versions of SQL Server on the two machines.

Yes, that might be the cause.

The backup completed in 52.6 minutes. (I'm recording the times of everything
here so that next year we'll have estimates of how long everything takes.)

Bzip2 compression is underway.

Comment entered 2011-12-04 16:48:48 by alan

BZDATETIME::2011-12-04 16:48:48
BZCOMMENTOR::Alan Meyer
BZCOMMENT::17

(In reply to comment #16)

> Bzip2 compression is underway.

I made the mistake of requesting maximum bz2 compression ("-9") and I'm
guessing that it won't complete until late tomorrow or Tuesday. So
I've killed that and started a zip backup. I'll copy it off tonight.

In any case, the cdr is now open for business. Anyone who wants to use
it can.

Comment entered 2012-02-02 14:05:03 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2012-02-02 14:05:03
BZCOMMENTOR::Bob Kline
BZCOMMENT::18

This will be done on Mahler when Alan returns from vacation in March.

Comment entered 2012-05-03 14:05:58 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2012-05-03 14:05:58
BZCOMMENTOR::Bob Kline
BZCOMMENT::19

Not going to do anything more on this until we know more about the CBIIT migration.

Comment entered 2012-06-07 14:40:08 by Englisch, Volker (NIH/NCI) [C]

BZDATETIME::2012-06-07 14:40:08
BZCOMMENTOR::Volker Englisch
BZCOMMENT::20

(In reply to comment #19)
> Not going to do anything more on this until we know more about the CBIIT
> migration.

Adding a dependency because MAHLER cannot be refreshed successfully without the archived_versions database being updated.
I'll start the update of the archived_versions DB shortly.

Comment entered 2012-06-07 18:50:47 by Englisch, Volker (NIH/NCI) [C]

BZDATETIME::2012-06-07 18:50:47
BZCOMMENTOR::Volker Englisch
BZCOMMENT::21

The cdr_archived_versions database on MAHLER has been refreshed.

I used the backup file from 2011-12-06 that is currently located at

mahler\d:\db_backup\bach\cdr_archived_versions.zip

I believe this issue can now be closed.
