Issue Number | 4101 |
---|---|
Summary | Need a mechanism for automatically verifying that all tiers are configured identically |
Created | 2016-05-17 14:25:13 |
Issue Type | Improvement |
Submitted By | Learn, Blair (NIH/NCI) [C] |
Assigned To | Kline, Bob (NIH/NCI) [C] |
Status | Closed |
Resolved | 2016-12-21 20:52:28 |
Resolution | Fixed |
Path | /home/bkline/backups/jira/ocecdr/issue.184309 |
We have determined that not all CDR tiers are configured identically. Ideally there would be a mechanism in place which could be used at deployment (or other arbitrary) time to verify that each tier meets a specific configuration.
Some of the items to consider:

- Python version
- Python package versions
- Path and other environment variables
- File paths
- IIS configuration
- Database versions (SQL Server and MySQL)
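As a rough sketch of what such a verification tool might collect, here is a minimal Python example. The function names are illustrative assumptions, not part of any existing CDR tool; it captures the Python version, platform, PATH, and installed package versions:

```python
import os
import platform
import subprocess
import sys

def parse_freeze(freeze_output):
    """Turn `pip freeze` output ('name==version' lines) into a dict."""
    return dict(
        line.split("==", 1)
        for line in freeze_output.splitlines()
        if "==" in line
    )

def collect_settings():
    """Gather a snapshot of local configuration for later comparison."""
    frozen = subprocess.run(
        [sys.executable, "-m", "pip", "freeze"],
        capture_output=True, text=True, check=True,
    ).stdout
    return {
        "python_version": sys.version.split()[0],
        "platform": platform.platform(),
        "path": os.environ.get("PATH", ""),
        "packages": parse_freeze(frozen),
    }
```

A snapshot like this, saved per tier, is the kind of artifact the comparisons below could consume.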
This will require changes which need to be deployed by CBIIT, so I've put this in Einstein.
~LearnB: can you elaborate on "File paths" from your issue description? Thanks.
~LearnB: Is the attached report what you had in mind? I can obviously only run it on the lower two tiers, but the software is set up to compare up to all four tiers once it's been deployed. The tool is run from the command line. If you give it a single tier, it just fetches and saves the settings for that tier. If you specify more than one tier, it compares each adjacent pair of tiers, creating a separate worksheet for each comparison. It's also possible to have it do the comparisons from saved settings files as an alternative to connecting to the tiers' servers. The attached report has more noise than there would normally be after a deployment, as DEV and QA will be expected to diverge during development. I'll try to get the DBA team to reconcile the discrepancies in the MySQL configurations between DEV and QA, which should cut down on the verbosity a bit. I deliberately left the pip versions different between the two tiers to test the Python part of the report.
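The pairwise comparison described above might look something like this sketch. The function names and status wording are illustrative assumptions, not taken from the actual tool:

```python
def diff_settings(first, second):
    """Compare two flat {name: value} settings dicts.

    Returns (name, status) rows suitable for one worksheet, one row per
    setting that is missing from a tier or differs between the tiers.
    """
    rows = []
    for name in sorted(set(first) | set(second)):
        if name not in second:
            rows.append((name, "only on first tier"))
        elif name not in first:
            rows.append((name, "only on second tier"))
        elif first[name] != second[name]:
            rows.append((name, "different"))
    return rows

def compare_adjacent(tiers, settings_by_tier):
    """Compare each adjacent pair of tiers, as the tool is described
    to do, yielding one result set (worksheet) per pair."""
    return {
        (a, b): diff_settings(settings_by_tier[a], settings_by_tier[b])
        for a, b in zip(tiers, tiers[1:])
    }
```

So comparing DEV, QA, and STAGE would produce two result sets: (DEV, QA) and (QA, STAGE).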
I deliberately made this a command-line tool, rather than web-based, because I need separate sessions for each tier from which I'm collecting settings. That means I can't just piggy-back on the session for a web-based login which uses the browser's tools for collecting and submitting your credentials. Presumably a user would trust their browser's authentication mechanism more than they would trust my CGI script, whose source code they might not be able to inspect (whereas the user could always inspect the source code for a command-line script they are running).
usage: tier-settings.py [-h] [-u USER] [-p PASSWORD] tier [tier ...]
Compare cdr tiers
positional arguments:
tier
optional arguments:
-h, --help show this help message and exit
-u USER, --user USER NIH domain user ID
-p PASSWORD, --password PASSWORD
NIH domain password
If a single tier is named, its settings will be fetched and saved to a file
whose name contains the tier and a timestamp. If more than one tier is named,
the settings will be saved in separate files as for a single tier, and in
addition each adjacent pair of tiers will be compared and the differences
reported in a worksheet, to be saved as part of a single Excel workbook (with
a timestamped file name). You can specify a password on the command line, but
this is discouraged as insecure. If you supply a user name but no password
(the most common usage) you will be prompted for a password, which will not be
displayed as you type it. For some or all of the tiers, you may follow the
tier name with a file path identifying settings for a tier captured from a
previous run of the program. Separate the tier name from the path with a
colon. You can also provide a CDR session ID for each tier instead of giving
your NIH user name and password. Each tier must have its own session ID, valid
for that tier, and the session ID is separated from the tier name by a colon.
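A minimal sketch of how the command-line interface described above could be wired up, assuming Python's argparse and getpass modules; the helper names are hypothetical and the real tier-settings.py may well differ:

```python
import argparse
import getpass
import os

def parse_tier_arg(arg):
    """Split a 'TIER[:EXTRA]' argument into (tier, path, session).

    EXTRA is either a path to a saved settings file or a CDR session ID
    for that tier, distinguished here (illustratively) by whether it
    names an existing file.
    """
    tier, _, extra = arg.partition(":")
    if not extra:
        return tier, None, None
    if os.path.exists(extra):
        return tier, extra, None   # saved settings from a previous run
    return tier, None, extra       # CDR session ID for this tier

def main():
    parser = argparse.ArgumentParser(description="Compare cdr tiers")
    parser.add_argument("-u", "--user", help="NIH domain user ID")
    parser.add_argument("-p", "--password", help="NIH domain password")
    parser.add_argument("tier", nargs="+")
    opts = parser.parse_args()
    if opts.user and not opts.password:
        # The most common usage: prompt without echoing the password.
        opts.password = getpass.getpass("NIH domain password: ")
    return [parse_tier_arg(t) for t in opts.tier]
```

Note that partitioning on the first colon leaves Windows-style paths such as `QA:D:\settings.json` intact, since tier names contain no colon.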
I have asked the DBA team to resolve the MySQL configuration discrepancies between DEV and QA (DBATEAM-2661).
"File paths" was meant to refer to where an executable is stored (e.g. is CdrServer.exe at the same path on each tier) or similar for the location of a given configuration file. Ideally, files live at a consistent path across the tiers.
The report is pretty much what I had in mind. There's likely some room for conversation about checksums versus version numbers for the various executables. It seems less likely that anyone would randomly edit the Python scripts than that something would be overlooked. (Though detecting an edit is not a bad thing either.)
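If checksums were chosen over version numbers, a chunked file hash along these lines would catch any edit to an executable or script. This is a sketch only, not necessarily what the tool does:

```python
import hashlib

def file_checksum(path, algorithm="sha256", chunk_size=65536):
    """Checksum a file (e.g. CdrServer.exe) in fixed-size chunks so
    large executables are not read into memory all at once."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as fp:
        for chunk in iter(lambda: fp.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Comparing the hex digests across tiers flags both version differences and stray local edits, which version numbers alone would miss.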
~volker: I've got this tool and the build/deploy software about where I want them for Einstein, and the tool verifies that QA and DEV are configured identically (after running deploy-all.py on QA) except for these four files:
Path | Status |
---|---|
/cdr/ClientFiles/Rules/DrugInformationSummary.ctm | different on DEV vs. QA |
/cdr/ClientFiles/Template/Cdr/DrugInformationSummary.xml | different on DEV vs. QA |
/cdr/ClientFiles/Template/Cdr/Term.xml | different on DEV vs. QA |
/cdr/Publishing/test-ftp2.py | only on DEV |
Can you confirm that these differences will find their way into svn soon?
Thanks!
I'm guessing the comparison can only be run between DEV and QA at this time, right?
It would be nice to indicate in the help output the allowed values for the tiers; I'm assuming DEV, QA, STAGE, and PROD. Any other value (e.g. development) results in an error message:
File "D:\Python\lib\site-packages\requests\adapters.py", line 487, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='cdr-development.cancer.gov', port=443):
Max retries exceeded with url: /cgi-bin/secure/login.py (Caused by
NewConnectionError('<requests.packages.urllib3.connection.VerifiedHTTPSConnection object at
0x00000000032E9E80>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))
Other than the help message, should we add this to the CDR documentation or Collaborate?
The program seems to be working as expected. When run against DEV and QA I'm seeing around 95 file differences of which 17 are not part of the glossifier or emailer.
I just checked in an enhanced version which explicitly identifies the valid tier values in the help message, and gives a better error message if an invalid tier value is supplied.
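One way to implement that validation (a sketch only; the actual checked-in change may differ) is an argparse type callback that checks the tier name before the optional colon suffix, so an invalid tier fails with a clear message instead of a raw requests ConnectionError:

```python
import argparse

VALID_TIERS = ("DEV", "QA", "STAGE", "PROD")

def tier_arg(value):
    """argparse type callback: accept TIER or TIER:EXTRA, where TIER
    must be one of the valid tier names."""
    tier = value.partition(":")[0].upper()
    if tier not in VALID_TIERS:
        raise argparse.ArgumentTypeError(
            "invalid tier %r (choose from %s)" % (tier, ", ".join(VALID_TIERS)))
    return value

def make_parser():
    """Build a parser whose help output lists the valid tier values."""
    parser = argparse.ArgumentParser(description="Compare cdr tiers")
    parser.add_argument("tier", nargs="+", type=tier_arg,
                        metavar="{%s}" % "|".join(VALID_TIERS))
    return parser
```

With this, `tier-settings.py development` would report "invalid tier 'DEVELOPMENT'" rather than attempting a connection to a nonexistent host.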
... I'm seeing around 95 file differences of which 17 are not part of the glossifier or emailer.
I've got the Linux deployments down to two scripts, one for each server, so Einstein will be much easier for CBIIT. I've run them on DEV and QA (you were cc'd on the logs, meaning you should have received a couple of messages from the scripts for each server), so you'll probably see many fewer deltas next time you run the tier comparison script.
While we have the glossifier and ftp server on the same virtual host, we might consider taking advantage of that arrangement and have the web server collect information about the ftp server's configuration, assuming we can get CBIIT to allow the glossifier account (under which Apache is running on the machine) to see the cdroperator account's files, which it can't right now. What do you think?
I just checked in an enhanced version
This means it is ready to be re-tested as part of Einstein IT-2?
While we have the glossifier and ftp server on the same virtual host
This is only true on the lower tiers, isn't it?
Nah.
DEV
nciws-d165-v (Glossifier, Emailers, FTP)
QA
nciws-q181-v (Glossifier, Emailers, FTP)
STAGE
nciws-203-v (Glossifier, FTP)
nciws-204-v (Emailers)
PROD
nciws-p194-v (Emailers)
nciws-p195-v (Glossifier, FTP)
Sure, in the sense that it's ready (though the rest of Einstein IT-2 isn't).
Verified on QA.
File Name | Posted | User |
---|---|---|
tier-settings-20161114152348.xls | 2016-11-14 15:29:14 | Kline, Bob (NIH/NCI) [C] |