CDR Tickets

Issue Number 3344
Summary Rebuild Linux Servers Verdi and Schubert
Created 2011-05-04 11:18:08
Issue Type Bug
Submitted By priced
Assigned To Englisch, Volker (NIH/NCI) [C]
Status Closed
Resolved 2011-08-09 17:21:57
Resolution Fixed
Path /home/bkline/backups/jira/ocecdr/issue.107672
Description

BZISSUE::5037
BZDATETIME::2011-05-04 11:18:08
BZCREATOR::Admin (VEnglisch)
BZASSIGNEE::Volker Englisch
BZQACONTACT::Bob Kline

The two servers Verdi and Schubert need to be rebuilt, and the infrastructure team has asked for assistance with this task.

The services affected are the following:

On the production server (Schubert == pdqupdate.cancer.gov):

  • clinical trial update portal

  • clinical trial submission portal

  • glossifier service

On the dev server (Verdi):

  • Bugzilla

  • PDQ Wiki

  • development of Electronic Board Member system

  • backup of subversion repository

Comment entered 2011-05-04 16:37:03 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2011-05-04 16:37:03
BZCOMMENTOR::Bob Kline
BZCOMMENT::1

Making progress. Bugzilla is installed on Verdi (I'll switch over the database tonight). The glossifier service and the Clinical Trials Submission portal are working on Schubert. I have tested the glossifier but not the CTS. CIAT is no longer generating protocol emailers, so I'm not going to worry about rebuilding the clinical trial update portal (though the database has been restored).

Comment entered 2011-05-04 18:35:56 by Englisch, Volker (NIH/NCI) [C]

BZDATETIME::2011-05-04 18:35:56
BZCOMMENTOR::Volker Englisch
BZCOMMENT::2

I've reset all of the parameters on the new installation and copied over our Cdr skin. Bugzilla should function the same as the old installation now.
If that's not the case please let me know.

Comment entered 2011-05-04 21:54:58 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2011-05-04 21:54:58
BZCOMMENTOR::Bob Kline
BZCOMMENT::3

I shut down the temporary bugzilla a few minutes ago, backed up the database, moved it back to verdi, restored it on verdi, and turned bugzilla back on. Let me know if you run into any odd behavior.

Comment entered 2011-05-04 22:13:01 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2011-05-04 22:13:01
BZCOMMENTOR::Bob Kline
BZCOMMENT::4

Volker:

I had to change the admin login from admin@verdi.nci.nih.gov because it caused mail delivery for newly posted comments to fail if that account was tied to the issue (as is the case for this one), so I set it to Volker@Englisch.us. We can change that later if you want.

Comment entered 2011-05-05 10:45:10 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2011-05-05 10:45:10
BZCOMMENTOR::Bob Kline
BZCOMMENT::5

I've got the status report working on Verdi now.

Comment entered 2011-05-05 15:24:15 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2011-05-05 15:24:15
BZCOMMENTOR::Bob Kline
BZCOMMENT::6

The PDQ wiki is back up:

http://verdi.nci.nih.gov/pdqwiki/

Comment entered 2011-05-05 16:14:23 by priced

BZDATETIME::2011-05-05 16:14:23
BZCOMMENTOR::Admin (VEnglisch)
BZCOMMENT::7

Testing admin email. Bob, are you getting this email?

Comment entered 2011-05-05 22:47:32 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2011-05-05 22:47:32
BZCOMMENTOR::Bob Kline
BZCOMMENT::8

The new CTS URLs are working on Verdi now (see issue #5035):

http://verdi.nci.nih.gov/submission
http://verdi.nci.nih.gov/liaison-office

Comment entered 2011-05-09 17:23:33 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2011-05-09 17:23:33
BZCOMMENTOR::Bob Kline
BZCOMMENT::9

I've got the emailer tracking, lookup tables, and glossifier refresh jobs working again on schubert.

Comment entered 2011-05-10 10:36:04 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2011-05-10 10:36:04
BZCOMMENTOR::Bob Kline
BZCOMMENT::10

I have some of the db backup jobs reinstated, but I'm blocked on the other backup jobs by the expiration of our RedHat entitlements, which Mauricio is working on getting restored. As I understand it, we're still not covered by tape backup procedures for the systems themselves.

Comment entered 2011-05-13 18:42:46 by Englisch, Volker (NIH/NCI) [C]

BZDATETIME::2011-05-13 18:42:46
BZCOMMENTOR::Volker Englisch
BZCOMMENT::11

Bob, do you know if the whine reports are starting automatically or if we need to turn something on for those reports to run automatically?

I was expecting a report at 3pm this afternoon but that didn't get created (or sent).

Comment entered 2011-05-16 10:07:00 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2011-05-16 10:07:00
BZCOMMENTOR::Bob Kline
BZCOMMENT::12

(In reply to comment #11)
> Bob, do you know if the whine reports are starting automatically or if we need
> to turn something on for those reports to run automatically?
>
> I was expecting a report at 3pm this afternoon but that didn't get created (or
> sent).

I've been hoping that I could address things like this after I had access to the full filesystems of the original servers, but I guess it's going to take a while before that happens. I've set up a cron job to check every 15 minutes.

Comment entered 2011-05-16 11:46:40 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2011-05-16 11:46:40
BZCOMMENTOR::Bob Kline
BZCOMMENT::13

The entitlements have been restored and I was able to finish setting up the rest of the local backups. We still need offline backups to be set up.

I noticed that the web, db, and time servers were not configured to start automatically, so we would have had mysterious failures on the next reboot. I've corrected that problem. The clocks on both machines had drifted seriously behind.

The drupal development sites have been reinstalled, and I've tested some of them to verify that they survived intact.

I think all of our services are back in place, including the GP mailer system, which I neglected to mention in the list I gave earlier. Can you confirm that the GP mailers are working correctly, William?

Comment entered 2011-05-16 12:43:13 by alan

BZDATETIME::2011-05-16 12:43:13
BZCOMMENTOR::Alan Meyer
BZCOMMENT::14

(In reply to comment #13)

> ... We still need offline backups to be set up. ...

I've got a 320 GB (maybe 290 GB usable) USB 2.0 drive at home that I can spare temporarily. Would that be of use for storing backups until new hardware arrives? Or is it too small? If it's useful, I can bring it in tomorrow.

Comment entered 2011-05-16 13:18:42 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2011-05-16 13:18:42
BZCOMMENTOR::Bob Kline
BZCOMMENT::15

How large are the VM images, Volker? Does the VM have to be shut down to be cloned?

Comment entered 2011-05-17 11:45:45 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2011-05-17 11:45:45
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::16

(In reply to comment #13)
> I think all of our services are back in place, including the GP mailer system,
> which I neglected to mention in the list I gave earlier. Can you confirm that
> the GP mailers are working correctly, William?

Should I test on Mahler or Franck or both? We have temporarily stopped sending out mailers so I cannot confirm on Bach.

Comment entered 2011-05-18 14:23:54 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2011-05-18 14:23:54
BZCOMMENTOR::Bob Kline
BZCOMMENT::17

(In reply to comment #16)
> (In reply to comment #13)
> > I think all of our services are back in place, including the GP mailer system,
> > which I neglected to mention in the list I gave earlier. Can you confirm that
> > the GP mailers are working correctly, William?
>
> Should I test on Mahler or Franck or both? We have temporarily stopped sending
> out mailers so I cannot confirm on Bach.

I think I've done as much testing of the GP mailers as we need to do right now.

Comment entered 2011-06-09 07:06:08 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2011-06-09 07:06:08
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::18

We decided in today's meeting to close this issue since the Liaison office has been using the portal without any problems.
*Marking as resolved.

Comment entered 2011-06-09 07:06:32 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2011-06-09 07:06:32
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::19

(In reply to comment #18)
> We decided in today's meeting to close this issue since the Liaison office has
> been using the portal without any problems.
> *Marking as resolved.

Closing issue.

Comment entered 2011-06-09 07:27:26 by Englisch, Volker (NIH/NCI) [C]

BZDATETIME::2011-06-09 07:27:26
BZCOMMENTOR::Volker Englisch
BZCOMMENT::20

Wasn't it issue 5035 that you wanted to close?

Comment entered 2011-06-09 07:31:22 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2011-06-09 07:31:22
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::21

(In reply to comment #20)
> Wasn't it issue 5035 that you wanted to close?

That is correct. I have reopened this one and closed 5035. Thanks for catching that :-)

Comment entered 2011-07-20 09:53:28 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2011-07-20 09:53:28
BZCOMMENTOR::Bob Kline
BZCOMMENT::22

Next step for this task is for Volker to move the backup tars for the incarnations of Verdi and Schubert which died this past spring to more permanent storage so we can reclaim the space. Then, after we're confident that Mauricio's team is backing up the servers appropriately, we can close this issue.

Comment entered 2011-07-27 17:54:52 by Englisch, Volker (NIH/NCI) [C]

BZDATETIME::2011-07-27 17:54:52
BZCOMMENTOR::Volker Englisch
BZCOMMENT::23

The 1 TB disk mounted to Verdi is nearly full and has only 42GB of space left. I doubt that Carbie would be too happy if I copied another 25GB to it. I've sent him another message to see where to put those tar files from Verdi.
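
The space check behind this concern can be sketched in Python. This is only an illustration using the figures quoted above (42GB free, 25GB to copy); the function names, the decimal-gigabyte unit, and the 20GB reserve in the usage note are assumptions, not anything that was actually run on Verdi.

```python
import shutil

GB = 10**9  # decimal gigabytes; an assumption about the units quoted in the comment


def fits_with_margin(free_bytes: int, needed_bytes: int, reserve_bytes: int = 0) -> bool:
    """Return True if copying needed_bytes still leaves at least reserve_bytes free."""
    return free_bytes - needed_bytes >= reserve_bytes


def can_copy_to(mount_point: str, needed_bytes: int, reserve_bytes: int = 0) -> bool:
    """Apply the same check to the free space currently reported for mount_point."""
    free = shutil.disk_usage(mount_point).free
    return fits_with_margin(free, needed_bytes, reserve_bytes)
```

With 42GB free, a 25GB copy fits (`fits_with_margin(42 * GB, 25 * GB)` is true) but would leave only 17GB, so the check fails if a hypothetical 20GB reserve is requested.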

Comment entered 2011-07-29 18:18:43 by Englisch, Volker (NIH/NCI) [C]

BZDATETIME::2011-07-29 18:18:43
BZCOMMENTOR::Volker Englisch
BZCOMMENT::24

I've copied all files from my home directory on Verdi to
/mnt/mn2/Verdi/Archive

I ran a cmp between the original and the copied files and then removed the original files from my home directory.

Still need to do Schubert.
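
The copy, compare, delete sequence described in this comment can be sketched in Python as follows. This is a minimal illustration, not the commands that were actually run; the function name `move_verified` is hypothetical, and `filecmp.cmp` with `shallow=False` stands in for the `cmp` utility.

```python
import filecmp
import shutil
from pathlib import Path


def move_verified(src: Path, dest_dir: Path) -> Path:
    """Copy src into dest_dir, compare the two files byte for byte, then delete src."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / src.name
    shutil.copy2(src, dest)
    # shallow=False forces a comparison of file contents, like cmp(1),
    # instead of trusting size and modification time alone.
    if not filecmp.cmp(src, dest, shallow=False):
        raise OSError(f"copy of {src} does not match the original")
    # Only remove the original once the copy has been verified.
    src.unlink()
    return dest
```

The original is deleted only after the comparison succeeds, so a failed or truncated copy leaves the source file in place.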

Comment entered 2011-08-01 14:52:19 by Englisch, Volker (NIH/NCI) [C]

BZDATETIME::2011-08-01 14:52:19
BZCOMMENTOR::Volker Englisch
BZCOMMENT::25

I've copied the tar files from my home directory on Schubert to the mounted drive
/mnt/mn2/Schubert/Archive
and removed these files from my directory after comparing the original with the copied version.
We have two tar files for 'home'. One is 22GB and another one (home2) is 3.8GB.
I've only copied the smaller version of the two. The larger file includes many older tar backups that were located in Stas' directory.

Let me know if you'd like me to copy the larger home tar file also.

Comment entered 2011-08-01 14:54:10 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2011-08-01 14:54:10
BZCOMMENTOR::Bob Kline
BZCOMMENT::26

(In reply to comment #25)

> Let me know if you'd like me to copy the larger home tar file also.

If that's the only difference, then no. Otherwise yes.

Comment entered 2011-08-01 16:37:21 by Englisch, Volker (NIH/NCI) [C]

BZDATETIME::2011-08-01 16:37:21
BZCOMMENTOR::Volker Englisch
BZCOMMENT::27

These are the files that were excluded from the smaller home.tar.bz2 file. Everything else is identical:

> home/sfirstov/home.tar.gz
> home/sfirstov/mysql_backup_02_21_2008.dmp
> home/sfirstov/mysql_backup_12-10-07.dmp
> home/dbbackup/schubert/dropbox2011-04-26_01h00m.bz2
> home/dbbackup/schubert/cts2011-04-20_01h00m.bz2
> home/dbbackup/schubert/mysql2011-04-21_01h00m.bz2
> home/dbbackup/schubert/emailers2011-04-20_01h00m.bz2
> home/dbbackup/schubert/mysql2011-04-26_01h00m.bz2
> home/dbbackup/schubert/outcomes2011-04-26_01h00m.bz2
> home/dbbackup/schubert/cts2011-04-25_01h00m.bz2
> home/dbbackup/schubert/emailers2011-04-19_01h00m.bz2
> home/dbbackup/schubert/glossifier2011-04-21_01h00m.bz2
> home/dbbackup/schubert/dropbox2011-04-25_01h00m.bz2
> home/dbbackup/schubert/mysql2011-04-23_01h00m.bz2
> home/dbbackup/schubert/cts2011-04-22_01h00m.bz2
> home/dbbackup/schubert/emailers2011-04-26_01h00m.bz2
> home/dbbackup/schubert/wikidb2011-04-26_01h00m.bz2
> home/dbbackup/schubert/cts2011-04-23_01h00m.bz2
> home/dbbackup/schubert/glossifier2011-04-19_01h00m.bz2
> home/dbbackup/schubert/cts2011-04-26_01h00m.bz2
> home/dbbackup/schubert/glossifier2011-04-25_01h00m.bz2
> home/dbbackup/schubert/emailers2011-04-22_01h00m.bz2
> home/dbbackup/schubert/wikidb2011-04-21_01h00m.bz2
> home/dbbackup/schubert/mysql2011-04-25_01h00m.bz2
> home/dbbackup/schubert/dropbox2011-04-21_01h00m.bz2
> home/dbbackup/schubert/dropbox2011-04-23_01h00m.bz2
> home/dbbackup/schubert/wikidb2011-04-19_01h00m.bz2
> home/dbbackup/schubert/emailers2011-04-25_01h00m.bz2
> home/dbbackup/schubert/glossifier2011-04-24_01h00m.bz2
> home/dbbackup/schubert/glossifier2011-04-20_01h00m.bz2
> home/dbbackup/schubert/wikidb2011-04-24_01h00m.bz2
> home/dbbackup/schubert/outcomes2011-04-23_01h00m.bz2
> home/dbbackup/schubert/mysql2011-04-24_01h00m.bz2
> home/dbbackup/schubert/glossifier2011-04-22_01h00m.bz2
> home/dbbackup/schubert/glossifier2011-04-26_01h00m.bz2
> home/dbbackup/schubert/emailers2011-04-24_01h00m.bz2
> home/dbbackup/schubert/mysql2011-04-20_01h00m.bz2
> home/dbbackup/schubert/dropbox2011-04-24_01h00m.bz2
> home/dbbackup/schubert/outcomes2011-04-24_01h00m.bz2
> home/dbbackup/schubert/outcomes2011-04-25_01h00m.bz2
> home/dbbackup/schubert/wikidb2011-04-20_01h00m.bz2
> home/dbbackup/schubert/outcomes2011-04-21_01h00m.bz2
> home/dbbackup/schubert/wikidb2011-04-22_01h00m.bz2
> home/dbbackup/schubert/emailers2011-04-23_01h00m.bz2
> home/dbbackup/schubert/outcomes2011-04-19_01h00m.bz2
> home/dbbackup/schubert/dropbox2011-04-22_01h00m.bz2
> home/dbbackup/schubert/outcomes2011-04-22_01h00m.bz2
> home/dbbackup/schubert/outcomes2011-04-20_01h00m.bz2
> home/dbbackup/schubert/mysql2011-04-19_01h00m.bz2
> home/dbbackup/schubert/cts2011-04-21_01h00m.bz2
> home/dbbackup/schubert/mysql2011-04-22_01h00m.bz2
> home/dbbackup/schubert/glossifier2011-04-23_01h00m.bz2
> home/dbbackup/schubert/wikidb2011-04-23_01h00m.bz2
> home/dbbackup/schubert/cts2011-04-24_01h00m.bz2
> home/dbbackup/schubert/emailers2011-04-21_01h00m.bz2
> home/dbbackup/schubert/dropbox2011-04-20_01h00m.bz2
> home/dbbackup/schubert/wikidb2011-04-25_01h00m.bz2
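
A list like the one above can be produced by comparing the member names of the two archives. The sketch below shows one way to do that with Python's `tarfile` module; the function name `members_only_in` is hypothetical, and this is not necessarily how the list in this comment was generated.

```python
import tarfile


def members_only_in(larger: str, smaller: str) -> list[str]:
    """Return names of regular files present in `larger` but missing from `smaller`."""

    def file_names(path: str) -> set[str]:
        # Mode "r:*" lets tarfile detect gzip or bzip2 compression on its own.
        with tarfile.open(path, "r:*") as archive:
            return {m.name for m in archive.getmembers() if m.isfile()}

    return sorted(file_names(larger) - file_names(smaller))
```

This compares names only. Confirming that the shared members are identical would additionally require comparing sizes or checksums of the extracted contents.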

Comment entered 2011-08-02 16:15:39 by Englisch, Volker (NIH/NCI) [C]

BZDATETIME::2011-08-02 16:15:39
BZCOMMENTOR::Volker Englisch
BZCOMMENT::28

All tar files have been copied from Verdi and Schubert to
/mnt/mn2/Verdi/Archive
and
/mnt/mn2/Schubert/Archive

Comment entered 2011-08-09 17:20:46 by Englisch, Volker (NIH/NCI) [C]

BZDATETIME::2011-08-09 17:20:46
BZCOMMENTOR::Volker Englisch
BZCOMMENT::29

What was left to do here? Didn't we decide this issue is ready to close?

Comment entered 2011-08-09 17:21:57 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2011-08-09 17:21:57
BZCOMMENTOR::Bob Kline
BZCOMMENT::30

All done. :-)
