Issue Number | 4101 |
---|---|
Summary | Need a mechanism for automatically verifying that all tiers are configured identically |
Created | 2016-05-17 14:25:13 |
Issue Type | Improvement |
Submitted By | Learn, Blair (NIH/NCI) [C] |
Assigned To | Kline, Bob (NIH/NCI) [C] |
Status | Closed |
Resolved | 2016-12-21 20:52:28 |
Resolution | Fixed |
Path | /home/bkline/backups/jira/ocecdr/issue.184309 |
We have determined that not all CDR tiers are configured identically. Ideally there would be a mechanism in place which could be used at deployment (or other arbitrary) time to verify that each tier meets a specific configuration.
Some of the items to consider:

- Python version
- Python package versions
- Path and other environment variables
- File paths
- IIS configuration
- Database versions (SQL Server and MySQL)
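As a rough sketch of what such a verification tool might collect, here is a minimal Python example. The function names are illustrative assumptions, not part of any existing CDR tool; it captures the Python version, platform, PATH, and installed package versions:

```python
import os
import platform
import subprocess
import sys

def parse_freeze(freeze_output):
    """Turn `pip freeze` output ('name==version' lines) into a dict."""
    return dict(
        line.split("==", 1)
        for line in freeze_output.splitlines()
        if "==" in line
    )

def collect_settings():
    """Gather a snapshot of local configuration for later comparison."""
    frozen = subprocess.run(
        [sys.executable, "-m", "pip", "freeze"],
        capture_output=True, text=True, check=True,
    ).stdout
    return {
        "python_version": sys.version.split()[0],
        "platform": platform.platform(),
        "path": os.environ.get("PATH", ""),
        "packages": parse_freeze(frozen),
    }
```

A snapshot like this, saved per tier, is the kind of artifact the comparisons below could consume.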
This will require changes which need to be deployed by CBIIT, so I've put this in Einstein.
~LearnB: can you elaborate on "File paths" from your issue description? Thanks.
~LearnB: Is the attached report what you had in mind? I can obviously only run it on the lower two tiers, but the software is set up to compare up to all four tiers once it's been deployed. The tool is run from the command line. If you give it a single tier, it just fetches and saves the settings for that tier. If you specify more than one tier, it compares each adjacent pair of tiers, creating a separate worksheet for each comparison. It's also possible to have it do the comparisons from saved settings files as an alternative to connecting to the tiers' servers. The attached report has more noise than there would normally be after a deployment, as DEV and QA will be expected to diverge during development. I'll try to get the DBA team to reconcile the discrepancies in the MySQL configurations between DEV and QA, which should cut down on the verbosity a bit. I deliberately left the pip versions different between the two tiers to test the Python part of the report.
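The pairwise comparison described above might look something like this sketch. The function names and status wording are illustrative assumptions, not taken from the actual tool:

```python
def diff_settings(first, second):
    """Compare two flat {name: value} settings dicts.

    Returns (name, status) rows suitable for one worksheet, one row per
    setting that is missing from a tier or differs between the tiers.
    """
    rows = []
    for name in sorted(set(first) | set(second)):
        if name not in second:
            rows.append((name, "only on first tier"))
        elif name not in first:
            rows.append((name, "only on second tier"))
        elif first[name] != second[name]:
            rows.append((name, "different"))
    return rows

def compare_adjacent(tiers, settings_by_tier):
    """Compare each adjacent pair of tiers, as the tool is described
    to do, yielding one result set (worksheet) per pair."""
    return {
        (a, b): diff_settings(settings_by_tier[a], settings_by_tier[b])
        for a, b in zip(tiers, tiers[1:])
    }
```

So comparing DEV, QA, and STAGE would produce two result sets: (DEV, QA) and (QA, STAGE).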
I deliberately made this a command-line tool, rather than web-based, because I need separate sessions for each tier from which I'm collecting settings. That means I can't just piggy-back on the session for a web-based login which uses the browser's tools for collecting and submitting your credentials. Presumably a user would trust their browser's authentication mechanism more than they would trust my CGI script, whose source code they might not be able to inspect (whereas the user could always inspect the source code for a command-line script they are running).
usage: tier-settings.py [-h] [-u USER] [-p PASSWORD] tier [tier ...]
Compare cdr tiers
positional arguments:
tier
optional arguments:
-h, --help show this help message and exit
-u USER, --user USER NIH domain user ID
-p PASSWORD, --password PASSWORD
NIH domain password
If a single tier is named, its settings will be fetched and saved to a file
whose name contains the tier and a timestamp. If more than one tier is named,
the settings will be saved in separate files as for a single tier, and in
addition each adjacent pair of tiers will be compared and the differences
reported in a worksheet, to be saved as part of a single Excel workbook (with
a timestamped file name). You can specify a password on the command line, but
this is discouraged as insecure. If you supply a user name but no password
(the most common usage) you will be prompted for a password, which will not be
displayed as you type it. For some or all of the tiers, you may follow the
tier name with a file path identifying settings for a tier captured from a
previous run of the program. Separate the tier name from the path with a
colon. You can also provide a CDR session ID for each tier instead of giving
your NIH user name and password. Each tier must have its own session ID, valid
for that tier, and the session ID is separated from the tier name by a colon.
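A minimal sketch of how the command-line interface described above could be wired up, assuming Python's argparse and getpass modules; the helper names are hypothetical and the real tier-settings.py may well differ:

```python
import argparse
import getpass
import os

def parse_tier_arg(arg):
    """Split a 'TIER[:EXTRA]' argument into (tier, path, session).

    EXTRA is either a path to a saved settings file or a CDR session ID
    for that tier, distinguished here (illustratively) by whether it
    names an existing file.
    """
    tier, _, extra = arg.partition(":")
    if not extra:
        return tier, None, None
    if os.path.exists(extra):
        return tier, extra, None   # saved settings from a previous run
    return tier, None, extra       # CDR session ID for this tier

def main():
    parser = argparse.ArgumentParser(description="Compare cdr tiers")
    parser.add_argument("-u", "--user", help="NIH domain user ID")
    parser.add_argument("-p", "--password", help="NIH domain password")
    parser.add_argument("tier", nargs="+")
    opts = parser.parse_args()
    if opts.user and not opts.password:
        # The most common usage: prompt without echoing the password.
        opts.password = getpass.getpass("NIH domain password: ")
    return [parse_tier_arg(t) for t in opts.tier]
```

Note that partitioning on the first colon leaves Windows-style paths such as `QA:D:\settings.json` intact, since tier names contain no colon.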
I have asked the DBA team to resolve the MySQL configuration discrepancies between DEV and QA (DBATEAM-2661).
"File paths" was meant to refer to where an executable is stored (e.g. is CdrServer.exe at the same path on each tier) or similar for the location of a given configuration file. Ideally, files live at a consistent path across the tiers.
The report is pretty much what I had in mind. There's likely some room for conversation about checksums versus version numbers for the various executables. It seems less likely that anyone would randomly edit the Python scripts than that something would be overlooked. (Though detecting an edit is not a bad thing either.)
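If checksums were chosen over version numbers, a chunked file hash along these lines would catch any edit to an executable or script. This is a sketch only, not necessarily what the tool does:

```python
import hashlib

def file_checksum(path, algorithm="sha256", chunk_size=65536):
    """Checksum a file (e.g. CdrServer.exe) in fixed-size chunks so
    large executables are not read into memory all at once."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as fp:
        for chunk in iter(lambda: fp.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Comparing the hex digests across tiers flags both version differences and stray local edits, which version numbers alone would miss.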
~volker: I've got this tool and the build/deploy software about where I want them for Einstein, and the tool verifies that QA and DEV are configured identically (after running deploy-all.py on QA) except for these four files:
Path | Status |
---|---|
/cdr/ClientFiles/Rules/DrugInformationSummary.ctm | different on DEV vs. QA |
/cdr/ClientFiles/Template/Cdr/DrugInformationSummary.xml | different on DEV vs. QA |
/cdr/ClientFiles/Template/Cdr/Term.xml | different on DEV vs. QA |
/cdr/Publishing/test-ftp2.py | only on DEV |
Can you confirm that these differences will find their way into svn soon?
Thanks!
I'm guessing the comparison can only be run between DEV and QA at this time, right?
It would be nice to indicate in the help output the allowed values for the tiers; I'm assuming DEV, QA, STAGE, and PROD. Any other value (e.g. development) results in an error message:
File "D:\Python\lib\site-packages\requests\adapters.py", line 487, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='cdr-development.cancer.gov', port=443):
Max retries exceeded with url: /cgi-bin/secure/login.py (Caused by
NewConnectionError('<requests.packages.urllib3.connection.VerifiedHTTPSConnection object at
0x00000000032E9E80>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))
Other than the help message, should we add this to the CDR documentation or Collaborate?
The program seems to be working as expected. When run against DEV and QA I'm seeing around 95 file differences of which 17 are not part of the glossifier or emailer.
I just checked in an enhanced version which explicitly identifies the valid tier values in the help message, and gives a better error message if an invalid tier value is supplied.
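One way to implement that validation (a sketch only; the actual checked-in change may differ) is an argparse type callback that checks the tier name before the optional colon suffix, so an invalid tier fails with a clear message instead of a raw requests ConnectionError:

```python
import argparse

VALID_TIERS = ("DEV", "QA", "STAGE", "PROD")

def tier_arg(value):
    """argparse type callback: accept TIER or TIER:EXTRA, where TIER
    must be one of the valid tier names."""
    tier = value.partition(":")[0].upper()
    if tier not in VALID_TIERS:
        raise argparse.ArgumentTypeError(
            "invalid tier %r (choose from %s)" % (tier, ", ".join(VALID_TIERS)))
    return value

def make_parser():
    """Build a parser whose help output lists the valid tier values."""
    parser = argparse.ArgumentParser(description="Compare cdr tiers")
    parser.add_argument("tier", nargs="+", type=tier_arg,
                        metavar="{%s}" % "|".join(VALID_TIERS))
    return parser
```

With this, `tier-settings.py development` would report "invalid tier 'DEVELOPMENT'" rather than attempting a connection to a nonexistent host.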
... I'm seeing around 95 file differences of which 17 are not part of the glossifier or emailer.
I've got the Linux deployments down to two scripts, one for each server, so Einstein will be much easier for CBIIT. I've run them on DEV and QA (you were cc'd on the logs, meaning you should have received a couple of messages from the scripts for each server), so you'll probably see many fewer deltas next time you run the tier comparison script.
While we have the glossifier and ftp server on the same virtual host, we might consider taking advantage of that arrangement and have the web server collect information about the ftp server's configuration, assuming we can get CBIIT to allow the glossifier account (under which Apache is running on the machine) to see the cdroperator account's files, which it can't right now. What do you think?
I just checked in an enhanced version
This means it is ready to be re-tested as part of Einstein IT-2?
While we have the glossifier and ftp server on the same virtual host
This is only true on the lower tiers, isn't it?
Nah.
DEV
nciws-d165-v (Glossifier, Emailers, FTP)
QA
nciws-q181-v (Glossifier, Emailers, FTP)
STAGE
nciws-203-v (Glossifier, FTP)
nciws-204-v (Emailers)
PROD
nciws-p194-v (Emailers)
nciws-p195-v (Glossifier, FTP)
Sure, in the sense that it's ready (though the rest of Einstein IT-2 isn't).
Verified on QA.
File Name | Posted | User |
---|---|---|
tier-settings-20161114152348.xls | 2016-11-14 15:29:14 | Kline, Bob (NIH/NCI) [C] |