CDR Tickets

Issue Number: 4114
Summary: [Python] Upgrade Python on CDR Windows servers
Created: 2016-05-27 07:04:31
Issue Type: Improvement
Submitted By: Kline, Bob (NIH/NCI) [C]
Assigned To: Englisch, Volker (NIH/NCI) [C]
Status: Closed
Resolved: 2016-12-08 16:15:36
Resolution: Fixed
Path: /home/bkline/backups/jira/ocecdr/issue.185076
Description

We are currently running Python 2.7.2:

2.7.2 (default, Jun 24 2011, 12:22:14) [MSC v.1500 64 bit (AMD64)]

The version currently available from ActiveState is 2.7.10.

Use of the older version is causing security warnings when running the package installer. See https://urllib3.readthedocs.org/en/latest/security.html for more information. I'm giving this an elevated priority, since it involves security.
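If a check like this is ever scripted, the version strings have to be compared numerically, because a plain string comparison would rank 2.7.2 above 2.7.10. A minimal sketch follows; the function names are illustrative and not part of the CDR code.

```python
def parse_version(text):
    """Turn a dotted version string such as '2.7.10' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def needs_upgrade(installed, available):
    """Return True when the installed version is older than the available one."""
    return parse_version(installed) < parse_version(available)


# Versions taken from the description above.
print(needs_upgrade("2.7.2", "2.7.10"))  # True
```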

Comment entered 2016-10-05 16:20:39 by Kline, Bob (NIH/NCI) [C]

I have attached a script that can be run to install most of the third-party modules we need using pip. The following modules can't be installed by pip (at least not on servers without compilers):

  • ndscheduler

  • PIL

  • pychecker (I'm inclined to skip this and use pylint instead)

  • MySQL-python

  • pycrypto

I'll come up with an up-to-date set of instructions for installing these and attach it here.
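One way to confirm which modules still need manual installation after the pip script has run is to try importing each of them. The sketch below is hypothetical (it is not the attached script) and assumes the usual import names, which differ from the package names in some cases: MySQL-python imports as MySQLdb and pycrypto as Crypto.

```python
import importlib


def missing_modules(names):
    """Return the subset of module names that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


# Import names (assumed) for the packages listed above; the result depends
# on what is installed on the machine where this runs.
print(missing_modules(["ndscheduler", "PIL", "MySQLdb", "Crypto"]))
```

Only ImportError is caught, so a module that is present but fails for some other reason will still raise, which is probably what you want during an upgrade check.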

I've assigned this ticket to Volker, as he's shepherding the related issue https://tracker.nci.nih.gov/browse/WEBTEAM-9279.

Comment entered 2016-10-05 18:33:25 by Englisch, Volker (NIH/NCI) [C]

The log file has been copied to
D:\CDR\logs

There were a couple of warnings, and two packages couldn't be installed because they need to be built: pylint and sqlalchemy

Comment entered 2016-10-05 18:36:49 by Kline, Bob (NIH/NCI) [C]

Sorry, you must have missed my note to Seth, telling him I'll have the instructions ready tomorrow. I just posted a replacement version of the script which forces pre-built binaries.

Comment entered 2016-10-05 18:40:11 by Kline, Bob (NIH/NCI) [C]

I have a suspicion that pycrypto isn't used any more (I see indications on the web that it's a dead package), and as I noted above we can probably abandon pychecker (I've never liked it much, as it runs your code, which is sometimes dangerous; pylint is better). And PIL (which is also looking pretty dead) can, I think, be replaced by pillow. That leaves ndscheduler and MySQL-python, which I think we have instructions for somewhere.

Comment entered 2016-10-05 19:08:54 by Englisch, Volker (NIH/NCI) [C]

I've never used pychecker myself except when Alan asked me to test something. I have no problem with removing those other packages you mentioned.

No, I didn't miss your note to Seth but I missed that you created an updated run-pip.bat. I'll wait until tomorrow.

Comment entered 2016-10-06 08:30:30 by Kline, Bob (NIH/NCI) [C]

Replacing PIL with pillow requires modifications to the import statements which use that module. Oddly, the fork (named "pillow" instead of "PIL") requires from PIL import Image or from PIL import ImageEnhance, where the original package (somewhat sloppily) exposed Image and ImageEnhance as global names. Here are the four files which need the modification:

  • ./Inetpub/wwwroot/cgi-bin/cdr/GetCdrImage.py

  • ./Inetpub/wwwroot/cgi-bin/cdr/ResizeImage.py

  • ./Inetpub/wwwroot/cgi-bin/cdr/TestPythonUpgrade.py

  • ./Mailers/cdrlatexlib.py
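The import change described above could be applied mechanically along these lines. This is only a sketch under stated assumptions: the helper name is hypothetical, it handles just the two module names mentioned in this comment, and it only rewrites simple one-module "import X" lines (not "import Image, ImageEnhance" or "import Image as I").

```python
import re

# Module names the original PIL exposed globally, per the comment above.
LEGACY_MODULES = ("Image", "ImageEnhance")

# Matches a whole line of the form "import Image", keeping any indentation.
_LEGACY_IMPORT = re.compile(
    r"^(?P<indent>[ \t]*)import[ \t]+(?P<name>%s)[ \t]*$" % "|".join(LEGACY_MODULES),
    re.MULTILINE,
)


def modernize_pil_imports(source):
    """Rewrite 'import Image' style lines to 'from PIL import Image'."""
    return _LEGACY_IMPORT.sub(r"\g<indent>from PIL import \g<name>", source)


old = "import sys\nimport Image\nimport ImageEnhance\n"
print(modernize_pil_imports(old))
```

Code that refers to Image or ImageEnhance by name keeps working after the rewrite, since the imported names are unchanged.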

Comment entered 2016-10-06 08:44:56 by Kline, Bob (NIH/NCI) [C]

I think pycrypto was formerly used to meet a dependency in paramiko for which pip now uses the cryptography package.

Comment entered 2016-10-06 08:48:52 by Kline, Bob (NIH/NCI) [C]

I have attached the installer file for the MySQL-python package (pip doesn't have it); it can be run directly.

I have also attached ndscheduler.tar.bz2 which needs to be unpacked and installed in a command window as follows:

D:
cd \tmp
tar xjf \original\location\ndscheduler.tar.bz2
cd ndscheduler
python setup.py install

Comment entered 2016-10-06 08:53:54 by Kline, Bob (NIH/NCI) [C]

CBIIT ticket to perform the parts of this for which the development team has insufficient permissions.

Comment entered 2016-10-07 10:41:07 by Kline, Bob (NIH/NCI) [C]

There's a fifth script which needs modification (the behavior of the cStringIO module has changed):

Index: DownloadCTGovProtocols.py
===================================================================
--- DownloadCTGovProtocols.py   (revision 14236)
+++ DownloadCTGovProtocols.py   (working copy)
@@ -73,6 +73,8 @@
         root = etree.XML(rows[0][0].encode("utf-8"))
         self.transform = etree.XSLT(root)
     def normalize(self, doc):
+        if isinstance(doc, unicode):
+            doc = doc.encode("utf-8")
         fp = cStringIO.StringIO(doc)
         tree = etree.parse(fp)
         return etree.tostring(self.transform(tree))
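The idea behind the patch (encode text to UTF-8 bytes before wrapping it in a byte stream) can be shown in isolation. The sketch below swaps in io.BytesIO for cStringIO so that it runs on both Python 2.7 and Python 3; the helper name is hypothetical and is not part of DownloadCTGovProtocols.py.

```python
import io

# The text type is 'unicode' on Python 2 and 'str' on Python 3.
TEXT_TYPE = type(u"")


def to_byte_stream(doc):
    """Return a file-like object of UTF-8 bytes for a text or bytes document."""
    if isinstance(doc, TEXT_TYPE):
        doc = doc.encode("utf-8")
    return io.BytesIO(doc)


stream = to_byte_stream(u"<name>Caf\u00e9</name>")
print(stream.read() == b"<name>Caf\xc3\xa9</name>")  # True
```

Bytes that are passed in are left untouched, so callers that already hold an encoded document are not double-encoded.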

Comment entered 2016-10-07 22:26:46 by Kline, Bob (NIH/NCI) [C]

I have packaged up the post-processing steps as much as I believe is possible. The steps will now be:

  1. Back up D:\Python

  2. Uninstall the existing Python and remove D:\Python

  3. Run
    nciis-p401.nci.nih.gov\Group03\OCPL\OCPL_Cross\CDR\CdrBuild\ActivePython-2.7.10.12-win64-x64.msi and install to D:\Python

  4. Make everything at and under D:\Python world-readable

  5. Run
    nciis-p401.nci.nih.gov\Group03\OCPL\OCPL_Cross\CDR\CdrBuild\Scripts\python-upgrade-postprocess.bat

We will be able to run step 5 ourselves on QA. CBIIT will need to run them all on STAGE and PROD. The last step takes care of replacing the five files which needed to be modified (see above – I've created a branch for these), installing all of the third-party Python packages, and registering the COM ADO libraries. I've tested that last script pretty thoroughly on one of my own systems, and I'm reasonably confident it will work on QA and the upper tiers. Still waiting on CBIIT to fix the permissions on DEV (I don't want to start downtime on QA before DEV is working again). Seems we're down to one response per day again on that ticket.

Comment entered 2016-10-08 05:14:47 by Kline, Bob (NIH/NCI) [C]

I have removed one more obstacle from the build process. I was trying to figure out a way for the MySQL-python installer to run synchronously without a GUI (I had needed to put that step at the very end because the batch file would keep going while the GUI installer for the prebuilt binary launched asynchronously). It didn't look as if there was any way to provide a command-line option to control the installer's behavior, so I decided to create my own wheel, so we could install with pip. I succeeded, and the result is in the CdrBuild directory on the L: drive. Wasn't easy, but an extra benefit was that I was able to upgrade from 1.2.3 to 1.2.5 (the latest version, if you don't count the Debian/Ubuntu fork).

Here's what I needed to do in order to build the wheel:

  1. Go to https://pypi.python.org/pypi/MySQL-python and download and unpack source for the latest version (MySQL-python-1.2.5.zip in this case)

  2. Go to https://www.microsoft.com/en-us/download/details.aspx?id=44266 and download and install Microsoft's C++ compiler for Python 2.7.

  3. Go to http://dev.mysql.com/downloads/connector/c/6.0.html and download and install the 64-bit MSI installer (mysql-connector-c-6.0.2-winx64.msi, not the one for VS2005). It's important to use version 6.0.2, because later versions are incompatible with expectations in Dustman's source code. I have filed a bug report for this incompatibility, which appears to be an oversight on the part of the MySQL devs.

  4. Open a console window in the directory created in step #1 (e.g., Downloads\MySQL-python-1.2.5)

  5. Edit site.cfg and change

    connector = C:\Program Files (x86)\MySQL\MySQL Connector C 6.0.2

    to

    connector = C:\Program Files\MySQL\MySQL Connector C 6.0.2

  6. Run

    pip install wheel

  7. Run

    python setup.py bdist_wheel

    The wheel will be in the dist directory. You can install it with

    cd dist
    pip install MySQL_python-1.2.5-cp27-cp27m-win_amd64.whl

    I have copied the wheel I built, as well as the tools needed to build it (Dustman's 1.2.5 source code, the MySQL Connector C 6.0.2, and Microsoft's C++ compiler for Python 2.7) to the CdrBuild directory on the L: drive:

  • MySQL-python-1.2.5.zip

  • VCForPython27.msi

  • mysql-connector-c-6.0.2-winx64.msi
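If the wheel ever has to be rebuilt, the site.cfg edit in step 5 could be scripted rather than done by hand. The following is a rough sketch only: the function name is hypothetical, and it assumes site.cfg holds a single uncommented "connector = ..." line as shown in step 5.

```python
def set_connector_path(cfg_text, new_path):
    """Return cfg_text with the value of the 'connector' setting replaced."""
    lines = []
    for line in cfg_text.splitlines():
        # Only touch lines of the form "connector = <something>".
        if "=" in line and line.split("=", 1)[0].strip() == "connector":
            line = "connector = " + new_path
        lines.append(line)
    return "\n".join(lines) + "\n"


# Paths taken from step 5 above.
before = r"connector = C:\Program Files (x86)\MySQL\MySQL Connector C 6.0.2"
after = set_connector_path(before, r"C:\Program Files\MySQL\MySQL Connector C 6.0.2")
print(after)
```

Commented-out lines such as "#connector = ..." are left alone, because the key comparison is exact.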

Comment entered 2016-10-08 07:52:56 by Kline, Bob (NIH/NCI) [C]

I created a wheel for ndscheduler. I think this makes the process more likely to succeed, and I know it removes dependencies on drive letter mapping for the network drive and speeds things up significantly. I put it on the L: drive in the CdrBuild directory:

  • ndscheduler-0.1.1-py2-none-any.whl
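The two wheel file names in this ticket show the difference between a compiled and a pure-Python package. Wheel names follow the pattern name-version[-build]-python-abi-platform.whl, so the tags can be read off with a small parser. The sketch below uses hypothetical helper names and is not a substitute for pip's own handling.

```python
def parse_wheel_filename(filename):
    """Split a wheel file name into its name, version, and tag components."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file name: %r" % filename)
    parts = filename[:-len(".whl")].split("-")
    if len(parts) == 5:
        name, version, python_tag, abi_tag, platform_tag = parts
        build = None
    elif len(parts) == 6:
        name, version, build, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError("unexpected wheel file name: %r" % filename)
    return {"name": name, "version": version, "build": build,
            "python": python_tag, "abi": abi_tag, "platform": platform_tag}


def is_pure_python(filename):
    """True when the wheel is not tied to a specific ABI or platform."""
    info = parse_wheel_filename(filename)
    return info["abi"] == "none" and info["platform"] == "any"


print(is_pure_python("ndscheduler-0.1.1-py2-none-any.whl"))           # True
print(is_pure_python("MySQL_python-1.2.5-cp27-cp27m-win_amd64.whl"))  # False
```

The MySQL-python wheel is tagged for CPython 2.7 on 64-bit Windows, which is why it had to be built with the compiler and connector listed earlier, while the ndscheduler wheel installs on any platform with Python 2.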

Comment entered 2016-11-09 11:19:00 by Kline, Bob (NIH/NCI) [C]

Python and all third-party modules have been upgraded on all of the non-production CDR Windows servers. Volker and I have done some testing of publishing, the new scheduler, and some other behind-the-scenes functionality.

Before we promote the upgrades to production, it would be prudent if you checked at least the most critical of your reports on DEV or QA.

Thanks,
Bob

Comment entered 2016-11-21 11:28:33 by Juthe, Robin (NIH/NCI) [E]

I have checked the reports listed on the OCCM Board Managers page in the Admin menus (with the exception of the General Use reports) and didn't run into any problems. However, the PCIB stats report is showing the older version (prior to the enhancements made in OCECDR-4096)--I wasn't sure if that was to be expected.

Also, I only tested one Board member correspondence mailer. Bob, if you think I need to check all of the mailers, please let me know.

William, let me know if you need help testing the other reports. Thanks.

Comment entered 2016-11-21 12:03:09 by Kline, Bob (NIH/NCI) [C]

QA had the newer version of the PCIB stats report, but DEV had reverted. I have restored the newer version on DEV. Testing any of the board member correspondence mailers should be sufficient.

Thanks.

Comment entered 2016-11-21 15:37:26 by Osei-Poku, William (NIH/NCI) [C]

The audio import report produced a Python script error when I tried to load an existing file (1.Week_115.zip).

A problem occurred in a Python script.

D:\cdr\Log\tmpcdykix.html contains the description of this error.

Comment entered 2016-11-21 15:41:52 by Osei-Poku, William (NIH/NCI) [C]

The audio review report also produced the following error. However, these are all old existing files so I am not sure if they actually exist on the FTP server.

These are the files that are generating the error message:

Week_113.zip
Week_115.zip

"Error opening zipfile 'd:/cdr/Audio_from_CIPSFTP/Week_113.zip':<br /> Exception Type: <class 'zipfile.BadZipfile'></br /> Exception msg: File is not a zip file"

Comment entered 2016-11-21 15:44:10 by Kline, Bob (NIH/NCI) [C]

Was this on DEV?

Comment entered 2016-11-21 15:46:00 by Osei-Poku, William (NIH/NCI) [C]

Was this on DEV?

Yes.

Comment entered 2016-11-21 15:47:35 by Englisch, Volker (NIH/NCI) [C]

You are seeing these errors because the files you're trying to download do not exist on the DEV FTP server.

Comment entered 2016-11-21 15:48:43 by Kline, Bob (NIH/NCI) [C]

Those two files are empty, hence the message saying that they're not zip files. So this would be unrelated to the Python upgrade. Thanks for checking, though.

Comment entered 2016-11-21 15:49:53 by Kline, Bob (NIH/NCI) [C]

The files do exist, but they're both zero bytes in length.
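A zero-length file can be told apart from a corrupt archive before the report tries to open it. The sketch below is an illustration only; the function name is hypothetical, and it is not the code used by the audio reports.

```python
import os
import zipfile


def describe_archive(path):
    """Classify a file as 'empty file', 'not a zip file', or 'zip file'."""
    if os.path.getsize(path) == 0:
        return "empty file"
    if not zipfile.is_zipfile(path):
        return "not a zip file"
    return "zip file"
```

Called on either of the two files named above, a check like this would report "empty file" rather than surfacing the less specific BadZipfile message.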

Comment entered 2016-11-21 15:50:40 by Osei-Poku, William (NIH/NCI) [C]

I am getting the following error while running the Bounced Emailers report:

"502 - Web server received an invalid response while acting as a gateway or proxy server.

There is a problem with the page you are looking for, and it cannot be displayed. When the Web server (while acting as a gateway or proxy) contacted the upstream content server, it received an invalid response from the content server."

Comment entered 2016-11-21 15:57:40 by Englisch, Volker (NIH/NCI) [C]

Oh, I see. I was looking on the Linux side thinking the program is trying to retrieve the files from the FTP server but the program accesses the files once they have already been downloaded to the CDR server.

Comment entered 2016-11-21 16:07:01 by Kline, Bob (NIH/NCI) [C]

That was the result of work on OCECDR-4107 for Einstein, for which I added the requirement for a valid CDR session on the Linux side, but neglected to take care of the comparable change on the Windows side. Fixed on DEV (but still broken on QA for the moment). Won't be a problem on the upper tiers, because none of the Einstein modifications will have been promoted yet.

Comment entered 2016-11-21 16:12:05 by Osei-Poku, William (NIH/NCI) [C]

Fixed on DEV (but still broken on QA for the moment).

Verified on DEV.

Comment entered 2016-11-21 16:26:14 by Osei-Poku, William (NIH/NCI) [C]

I am getting the following error message for the CTGovProtocols vs. Early EntryDate report:

"Server Error

502 - Web server received an invalid response while acting as a gateway or proxy server.

There is a problem with the page you are looking for, and it cannot be displayed. When the Web server (while acting as a gateway or proxy) contacted the upstream content server, it received an invalid response from the content server."

Comment entered 2016-11-21 17:09:18 by Kline, Bob (NIH/NCI) [C]

This is a report you told us we could remove earlier this year. The script is gone, but the menu entry still needs to be dropped.

Comment entered 2016-11-21 17:53:24 by Osei-Poku, William (NIH/NCI) [C]

This is a report you told us we could remove earlier this year. The script is gone, but the menu entry still needs to be dropped.

That is right. Should I create a ticket to remove it from the menu or wait until all the protocol menus get removed eventually?

Comment entered 2016-11-21 17:54:20 by Osei-Poku, William (NIH/NCI) [C]

I am done testing. All the reports appear to work well.

Comment entered 2016-11-22 07:34:13 by Kline, Bob (NIH/NCI) [C]

I have dropped it from the menu in the Einstein branch (and on DEV).

Comment entered 2016-12-08 16:15:36 by Kline, Bob (NIH/NCI) [C]

This is in production.
