PDQ Issues

Issue Number	4129
Summary	EOL for Python 2.7
Created	2016-06-30 10:59:37
Issue Type	Improvement
Submitted By	Kline, Bob (NIH/NCI) [C]
Assigned To	Kline, Bob (NIH/NCI) [C]
Status	Closed
Resolved	2019-11-13 17:43:29
Resolution	Fixed
Path	/home/bkline/backups/jira/ocecdr/issue.187173

Description

Python 2.7 support will end January 1, 2020. This is a separate ticket from OCECDR-4114 because upgrading to Python 3.x is harder but less urgent. "Utilities" is the best I could find for a component.

Comment entered 2019-02-09 10:59:13 by Kline, Bob (NIH/NCI) [C]

Comment entered 2019-04-04 13:43:59 by Kline, Bob (NIH/NCI) [C]

We should start this no later than the beginning of November, if not sooner.

Comment entered 2019-09-13 14:31:09 by Englisch, Volker (NIH/NCI) [C]

Time for an update 🙂

Comment entered 2019-09-13 14:43:45 by Kline, Bob (NIH/NCI) [C]

😛

Comment entered 2019-09-13 14:44:32 by Kline, Bob (NIH/NCI) [C]

I think we're going to make it. 😃

Comment entered 2019-09-13 15:00:08 by Englisch, Volker (NIH/NCI) [C]

What's in it for me? Anything you want me to look at?

Comment entered 2019-09-14 07:00:21 by Kline, Bob (NIH/NCI) [C]

Ah, your plate's not full? If so, let me take a look at my task list. I had assumed between keeping the CDR home fires burning, ramping up on Drupal, and the vendor and subsidiary site support they had piled on you, you'd be swamped. If that's not true, the first little thing it would be good for you to tackle would be to back out the obsolete document types (and their schemas) which had been purged recently, but somehow managed to be dragged back from the grave on DEV by the dev-data restoration script.

(Update: doesn't look like there are document types for these, so you just need to mark the schemas as deleted.)

ID	DOCTYPE	USER	SAVED	TITLE
799139	schema	volker	2019-08-30 13:34:23	EmailerDocument.xml
799138	schema	volker	2019-08-30 13:34:21	SubmittedTrial.xml
799137	schema	volker	2019-08-30 13:34:19	HereditaryCancerSyndrome.xml
799136	schema	volker	2019-08-30 13:34:17	EligibilityCriterion.xml
799135	schema	volker	2019-08-30 13:34:14	EmailerManifest.xml
799134	schema	volker	2019-08-30 13:34:12	EmailerRecipient.xml
799133	schema	volker	2019-08-30 13:34:09	GP.xml

(Here's a cool tip. You may know this already, but that table was created by running a query in the CDR ad-hoc query interface (New Docs, a query I just created and saved), selecting some rows (including the headers), pasting into the Visual view of this comment editor, and using the toolbar commands to eliminate the two rows I didn't need. Jira has added some nifty functionality recently! 🙂)

Comment entered 2019-09-14 07:26:04 by Kline, Bob (NIH/NCI) [C]

After driving the wooden stake into those retired document types/schemas, the next task to tackle, if you should choose to accept your mission, would be to try and swap out the database layer currently used by the scheduler (pymssql, based on the freetds project) and replace it with pyodbc. While you're in the scheduler, you could tackle the task of running the 2to3 upgrade tool on our own code in that repository. There are different ways of using that tool. One is to blindly run it in "make the changes" mode and then see if the software works under Python 3. I prefer to run it without the -w switch, making the recommended changes with which I agree by hand. Sometimes when there are lots of changes to be made to a file, I will use the -w flag, but redirect the output to a file which I can then bring up to review, backing out or modifying any changes I don't think were right. Examples of changes I won't use:

- the addition of an unnecessary extra set of parentheses for print() (so print("foo %s" % bar) becomes print(("foo %s" % bar)))
- unnecessary wrapping of iterables with list() when I know I'm using the iterable in a way which doesn't need list()
- replacing basestring with str, which changes the semantics, and in some cases breaks code (the other two examples are just ugly annoyances, but this one is dangerous)
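
The basestring hazard can be illustrated with a minimal sketch. Under Python 2, basestring matched both str (bytes) and unicode values, so a mechanical rewrite to str silently stops matching bytes under Python 3. The function names below are hypothetical and are not taken from the CDR code.

```python
def is_text_naive(value):
    """What a mechanical basestring -> str rewrite yields (bytes no longer match)."""
    return isinstance(value, str)


def is_stringlike(value):
    """Closer to the Python 2 meaning of basestring: text or bytes."""
    return isinstance(value, (str, bytes))


def to_text(value, encoding="utf-8"):
    """Normalize a string-like value to str, decoding bytes if necessary."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


naive_result = is_text_naive(b"abc")
stringlike_result = is_stringlike(b"abc")
decoded = to_text(b"caf\xc3\xa9")
```

Whether a given call site should accept bytes at all is a per-case decision, which is why these changes are better reviewed by hand than applied blindly.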

Comment entered 2019-09-14 09:33:58 by Kline, Bob (NIH/NCI) [C]

And then your next task for this ticket would be to eliminate the encoding declaration from the xml decl at the top of any documents (filters, mostly) and from any of our code which puts it there. The encoding="utf-8" part of <?xml version="1.0" encoding="utf-8"?> is redundant, because that's the default encoding assumed if none is specified. And for Python 3, the XML parsers (or at least lxml, the one we use) will generate a Unicode string with etree.tostring(root, encoding="unicode"). We will want to move toward always manipulating string values as unencoded str objects, only encoding them at the moment they're being serialized for export. The parser will balk at etree.tostring(root, encoding="unicode"), however, if the document has an encoding of utf-8 attached to it, so we need to eliminate that encoding declaration from our documents and code.

This doesn't apply to the charset="utf-8" which we still want in the meta tag of HTML pages we generate, of course.
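
A rough sketch of the idea, assuming the declaration sits at the very start of the document text. The helper name strip_encoding_declaration is hypothetical, and the standard library's xml.etree.ElementTree stands in for lxml here so the example has no third-party dependency (both accept encoding="unicode" in tostring()).

```python
import re
import xml.etree.ElementTree as ET

# Matches the encoding pseudo-attribute inside a leading XML declaration.
ENCODING_ATTR = re.compile(
    r"""\A(<\?xml\b[^>]*?)\s+encoding\s*=\s*(["'])[^"']*\2"""
)


def strip_encoding_declaration(xml_text):
    """Drop encoding="..." from the XML declaration, leaving the rest intact."""
    return ENCODING_ATTR.sub(r"\1", xml_text, count=1)


original = '<?xml version="1.0" encoding="utf-8"?>\n<filter>café</filter>'
cleaned = strip_encoding_declaration(original)

# Work with str throughout, and encode only when the document leaves the process.
root = ET.fromstring(cleaned)
as_text = ET.tostring(root, encoding="unicode")
as_bytes = as_text.encode("utf-8")
```

The regular expression only touches the declaration itself, so an HTML meta tag with charset="utf-8" elsewhere in a page would be left alone.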

Comment entered 2019-09-16 11:13:21 by Englisch, Volker (NIH/NCI) [C]

Ah, your plate's not full? If so, let me take a look at my task list. I had assumed between keeping the CDR home fires burning, ramping up on Drupal, and the vendor and subsidiary site support they had piled on you, you'd be swamped.

You know how that works: You're swamped until the water recedes and you're waiting for the next tsunami. It could roll in tomorrow or next Friday. Besides, I don't want you to create all the bugs yourself. 🙂

Comment entered 2019-09-16 18:51:39 by Kline, Bob (NIH/NCI) [C]

... tackle the task of running ...

Looks like I had already done that for the scheduler repo. Doesn't mean all the code will run correctly under Python 3, but it will when you're finished. 🙂

Comment entered 2019-09-17 08:06:00 by Kline, Bob (NIH/NCI) [C]

Here's another sticky issue with the scheduler running on Python 3. I was working on the TestPythonUpgrade.py CGI smoke test, and it reported that the apns package hadn't yet been installed. So I installed it using pip (making sure I got the package we've been using with the existing servers, as there's more than one APNs implementation floating around). I couldn't import it with Python 3, as it still had syntax which only works on Python 2.x (for example, except Exception, e instead of except Exception as e). I dug into the project's issues, and found that this had been reported as a bug (more than once: https://github.com/djacobs/PyAPNs/issues/163 and https://github.com/djacobs/PyAPNs/issues/177). Apparently, the version on PyPI is out of date, and no one seems interested in addressing that problem. So in order to use this package with Python 3 we would have to install it directly from GitHub instead of using pip. Not an appealing path. Please investigate and determine (a) whether this package really is needed by ndscheduler and (b) whether it would be feasible to swap in one of the other apns implementations which actually supports Python 3.
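
One way to spot the Python 2 form of the except clause before trying an import is a line-based scan. This is only a heuristic sketch: the pattern and the function name find_old_style_excepts are assumptions made for illustration, and a regular expression is no substitute for actually compiling the module.

```python
import re

# Heuristic: "except <something>, <name>:" with no "as" clause.
OLD_EXCEPT = re.compile(r"^\s*except\s+.+,\s*\w+\s*:")


def find_old_style_excepts(source):
    """Return the 1-based line numbers which look like 'except X, e:'."""
    return [
        number
        for number, line in enumerate(source.splitlines(), start=1)
        if OLD_EXCEPT.match(line)
    ]


sample = """\
try:
    connect()
except Exception, e:
    handle(e)
try:
    connect()
except (IOError, OSError) as e:
    handle(e)
"""

hits = find_old_style_excepts(sample)
```

Only the third line of the sample is flagged. The parenthesized tuple with an "as" clause is the form that works on both Python 2.6+ and Python 3.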

Comment entered 2019-09-17 08:19:15 by Kline, Bob (NIH/NCI) [C]

Digging a little further myself, I'm starting to think we don't really need the apns package. In https://github.com/NCIOCPL/cdr-scheduler/blob/master/requirements.txt I see that this package is a

... dependencies for simple_scheduler only

and that's just a demo example we're not using. So I think we can just drop apns.

Comment entered 2019-09-17 13:44:32 by Englisch, Volker (NIH/NCI) [C]

Are we creating separate sub-tasks/branches for these individual items or is this all going to be under OCECDR-4129?

Comment entered 2019-09-17 14:15:37 by Englisch, Volker (NIH/NCI) [C]

There is another schema I've been thinking of getting rid of which also got reinstated by a former refresh: xxtest (doc_type = 40, schema = 531742, title filter = 792271)
Can I just delete these documents (mark as deleted) from the document table and set the row in the doc_type table as inactive?

Comment entered 2019-09-17 14:20:48 by Kline, Bob (NIH/NCI) [C]

I've been tracking the tasks for this ticket in a separate list. Here's what that list looks like right now.

Create new tools to facilitate the upgrade work ✔
- tool to find all invocations of a function, method, constructors, etc. (has side benefit of checking for syntax errors in all Python code) ✔
- tool to find all imports of a module ✔
- tool to report unused installed modules ✔
- make the client APIs/libraries work on non-Windows machines ✔
- make non-IIS server for testing on MacBooks ✔
- enhance XML normalization tools to make document comparison more useful ✔
- eliminate encoding statement from <?xml ... declarations ✔
Run 2to3 on all Python source code (has to be done with inspection, as the tool makes some mistakes) ✔
- core API ✔
- legacy libraries ✔
- CGI scripts ✔
- publishing scripts ✔
- mailers ✔
- glossifier ✔
- licensee ✔
- scheduler ✔
- database ✔
- filters ✔
- api tunnel ✔
- bin ✔
- build ✔
- dev tools ✔
- utilities ✔
- report bugs in 2to3 tool to Python core team ✔
Get the https tunneling working under Python 3 ✔
Eliminate extra database packages ✔
- scheduler (switch to pyodbc) ✔
- change %s placeholders to ? ✔
- remove home-grown database exception classes ✔
- move timeouts from execute() to connect() ✔
- make all connect() arguments keyword args ✔
- use context blocks ("with conn.cursor() ...:") whenever changing code
Consolidate multiple functions for sending email ✔
- create new class ✔
- modify all calls ✔
- test
Reduce logging to the class based on the standard library ✔
- eliminate other logging classes/functions ✔
- rewrite uses of those obsolete approaches ✔
Replace Page class so that it doesn't use a mix of Unicode and bytes
- write new class ✔
- test ✔
- plug into advanced search page software ✔
- add style rules to cdr.css for advanced pages ✔
- update other uses (possibly spread over time, beyond release)
Fix handling of Unicode/bytes in communication with web server
- always get incoming CGI parameters as strings, not bytes ✔
- always work with strings, not bytes, during processing
- use objects instead of strings while building HTML/XML
- defer encoding of strings until they are going out the door
- use urlencode() and parse_qs() for query parameter strings
- replace unicodeToLatin1() ✔
- in general, follow https://docs.python.org/3/howto/unicode.html
- rewrite cdrcgi.header() ✔
- fix sending spreadsheets ✔
Replace deprecated cgi.escape() ✔
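
The database items in the list above ("?" placeholders, timeouts on connect(), keyword arguments, context blocks) can be sketched roughly as follows. The standard library's sqlite3 module stands in for pyodbc so the example runs without a SQL Server instance; both drivers use the "qmark" parameter style. The docs table and the fetch_titles function are hypothetical, not part of the CDR schema or code.

```python
import sqlite3
from contextlib import closing


def fetch_titles(conn, doc_type):
    """Return the titles for one document type, using a "?" placeholder."""
    query = "SELECT title FROM docs WHERE doc_type = ? ORDER BY title"
    # sqlite3 cursors are not context managers, so closing() provides the
    # "with" block here.
    with closing(conn.cursor()) as cursor:
        cursor.execute(query, (doc_type,))
        return [row[0] for row in cursor.fetchall()]


# The timeout is given to connect() by keyword, not to execute().
conn = sqlite3.connect(":memory:", timeout=30)
conn.execute("CREATE TABLE docs (title TEXT, doc_type TEXT)")
conn.executemany(
    "INSERT INTO docs (title, doc_type) VALUES (?, ?)",
    [
        ("GP.xml", "schema"),
        ("EmailerManifest.xml", "schema"),
        ("Some Filter", "Filter"),
    ],
)
titles = fetch_titles(conn, "schema")
conn.close()
```

Passing values separately from the SQL text also means the driver handles quoting, whichever placeholder style it uses.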

If you would feel more comfortable creating separate tickets for the individual tasks, feel free to do so.
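
For the web-facing items on that list (urlencode() and parse_qs() for query strings, a replacement for cgi.escape(), and deferring encoding until the data goes out the door), here is a minimal standard-library sketch. The function names are hypothetical, and html.escape() is shown as the standard-library successor to cgi.escape(). Note that html.escape() also escapes quotes by default, which cgi.escape() did not.

```python
import html
from urllib.parse import parse_qs, urlencode


def build_query(params):
    """Serialize parameters into a query string (UTF-8, percent-encoded)."""
    return urlencode(params)


def read_query(query_string):
    """Parse a query string into str values, keeping the first of each name."""
    return {name: values[0] for name, values in parse_qs(query_string).items()}


def render_cell(text):
    """Build markup from str values, escaping special characters."""
    return "<td>{}</td>".format(html.escape(text))


def send(page):
    """Encode only at the last moment, when the page leaves the process."""
    return page.encode("utf-8")


query = build_query({"title": "Café & Co", "id": 42})
fields = read_query(query)
cell = render_cell(fields["title"])
payload = send(cell)
```

Everything between read_query() and send() handles str values only, so there is a single place where bytes are produced.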

Comment entered 2019-09-17 14:22:09 by Kline, Bob (NIH/NCI) [C]

I think that should do it.

Comment entered 2019-09-17 19:10:40 by Englisch, Volker (NIH/NCI) [C]

so you just need to mark the schemas as deleted.)

Stupid question: This means active_status = 'D' and not active_status = 'I', right? I'm looking at it after the update and started wondering why I'm still seeing the records as part of the document view.

Comment entered 2019-09-17 19:16:01 by Kline, Bob (NIH/NCI) [C]

This means active_status = 'D' and not active_status = 'I', right?

Right. "I" corresponds to "Inactive" (also referred to as "blocked" by the users).

Comment entered 2019-09-25 06:30:21 by Kline, Bob (NIH/NCI) [C]

~oseipokuw - Any reason we shouldn't drop Protocols from the Reports menu? The only item on it is the Warehouse Box Number Report, and that will never have anything to report, now that the InScopeProtocol documents have been removed.

Comment entered 2019-11-13 17:43:29 by Kline, Bob (NIH/NCI) [C]

Ready for UAT.

Comment entered 2020-02-07 13:47:36 by Englisch, Volker (NIH/NCI) [C]

Attachments

File Name	Posted	User
Screen Shot 2019-02-09 at 10.56.54 AM.png	2019-02-09 10:58:46	Kline, Bob (NIH/NCI) [C]
Screen Shot 2019-09-13 at 14.28.53.png	2019-09-13 14:29:51	Englisch, Volker (NIH/NCI) [C]
Screen Shot 2020-02-07 at 1.32.53 PM.png	2020-02-07 13:47:23	Englisch, Volker (NIH/NCI) [C]
