CDR Tickets

Issue Number: 4127
Summary: React to NCBI switch to HTTPS
Created: 2016-06-22 09:51:22
Issue Type: Improvement
Submitted By: Kline, Bob (NIH/NCI) [C]
Assigned To: Kline, Bob (NIH/NCI) [C]
Status: Closed
Resolved: 2016-08-08 17:35:23
Resolution: Fixed
Path: /home/bkline/backups/jira/ocecdr/issue.186639
Description

NLM has announced that they will be switching to HTTPS on September 1, and that when they do, POST requests will break. From their announcement:

Starting on September 1st, when you visit NCBI pages, you'll see a green lock and https:// in the address bar instead of http://. This lets you know that you are really on an NCBI page - that our server identity is confirmed - and that your communication with our server is encrypted and private.

Here's what to expect if you're a general user or a scripter:

For general users
You will see the changes mentioned above - https:// and a green lock in the address bar - but you don't have to update or change anything.

You don't need to clear your cache or update any links to NCBI pages that you've put on your own webpages or shared with people. We will redirect all our pages to https://.

For scripters
To keep calls from failing, use https:, not http:.

Scripts that use HTTP POST to send data will not work once we transition from HTTP to HTTPS on September 1st.

If you'd like to know more about this change to HTTPS, please read The HTTPS-Only Standard https://https.cio.gov/ from the Federal Chief Information Officers website.

We currently use POST requests to retrieve articles from PubMed. According to the announcement, if we don't change the requests to use the GET verb our software will no longer work.

Comment entered 2016-07-01 07:52:43 by Kline, Bob (NIH/NCI) [C]

Got another email message from NLM this morning:

Please note, starting on October 1, 2016 – we will no longer be supporting HTTP.

Thank you,
PRS TEAM
ClinicalTrials.gov

If you did not get an adequate answer to your question or your problem has not been resolved, please email us back at Register@ClinicalTrials.gov.

Investigator's Login Page: https://register.clinicaltrials.gov
Study Record Managers’ Information: https://www.clinicaltrials.gov/ct2/help/for-manager
Protocol Detailed Review Items: https://prsinfo.clinicaltrials.gov/ProtocolDetailedReviewItems.pdf

This may mean that they've pushed back the deadline we're dealing with. Or (more likely?) the October 1 deadline refers to the service for registering trials, and we still need to hit the September 1 deadline for modifying the software that deals with the retrieval service.

Added as a watcher so this is on her radar.

Comment entered 2016-07-07 13:39:56 by Pizzillo, Bryan (NIH/NCI) [C]

I believe they are saying that a POST request to http:// will not work because the POST body is not carried over when the request is redirected to https. This is a common issue when setting up redirects. That being said, a POST request sent directly to https should not have an issue. So it sounds like you just need to update your URLs to HTTPS.

I do have two questions:

  1. Don't they currently have HTTPS versions of the URLs we use that we can test against now?

  2. What are we POSTing to NCBI anyway?

Thanks,
Bryan

Comment entered 2016-07-07 15:54:05 by Kline, Bob (NIH/NCI) [C]

Answers:

  1. Yes, they do (and yes, POST works against HTTPS – as you predicted it would).

  2. The query parameters, the length of which is unpredictable (there are size limits on GET requests which are not imposed on POST requests).

So the work for this task should be pretty easy (just change the protocol string).
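A minimal sketch of what that change amounts to, assuming the retrieval goes through the NCBI E-utilities efetch endpoint. The ticket does not name the URL or the parameters the CDR scripts actually use, so the endpoint, the parameter names, and the helper `build_fetch_request` are assumptions for illustration, not code from the branch. The request keeps the POST verb so a long ID list is not subject to URL length limits; only the scheme changes.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed endpoint: the ticket does not name the URL the CDR scripts use.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_fetch_request(pmids, url=EFETCH_URL):
    """Build (but do not send) a POST request for a list of PubMed IDs."""
    if not url.startswith("https://"):
        # A POST to http:// would lose its body when redirected to https.
        raise ValueError("NCBI requests must use https://")
    body = urlencode({
        "db": "pubmed",
        "id": ",".join(str(pmid) for pmid in pmids),
        "retmode": "xml",
    }).encode("ascii")
    # Supplying a body (and method="POST") keeps the parameters out of the URL.
    return Request(url, data=body, method="POST")


request = build_fetch_request([27557410, 18559331])
print(request.get_method(), request.full_url)
# Sending it would be: urllib.request.urlopen(request).read()
```

The guard on the scheme is a design choice for the sketch: it makes an accidental http:// URL fail immediately rather than after a redirect has dropped the POST body.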

Comment entered 2016-07-21 14:26:18 by Kline, Bob (NIH/NCI) [C]

Bumping up the priority, since this has to get taken care of within the next few weeks.

Comment entered 2016-08-08 15:13:51 by Kline, Bob (NIH/NCI) [C]

Here's the list of places I've found where changes have to be made:

./DevTools/GlobalChange/GetReplacementCitations.py
./Filters/CDR0000000105.xml
./Filters/CDR0000000124.xml
./Inetpub/wwwroot/cgi-bin/cdr/CiteSearch.py
./Inetpub/wwwroot/cgi-bin/cdr/NewCitations.py
./Inetpub/wwwroot/cgi-bin/cdr/SummaryCitations.py
./Inetpub/wwwroot/cgi-bin/cdr/UpdatePreMedlineCitations.py
./XMetaL/Macros/Cdr.mcr

I have created branch ocecdr-4127 for the work on this ticket.

I believe you had said you wanted me to work on this when I returned from vacation. Please let me know if that's not right.

Comment entered 2016-08-08 16:35:56 by Kline, Bob (NIH/NCI) [C]

Looks like we might also need to have some URLs cleaned up in some of the summaries:

256666.xml
256685.xml
256697.xml
256716.xml
256757.xml
256758.xml
269596.xml
299612.xml
453795.xml
517309.xml
552637.xml
574548.xml
62675.xml
62756.xml
62779.xml
62789.xml
62824.xml
62855.xml
62856.xml
62863.xml
62872.xml
62876.xml
62879.xml
62881.xml
62890.xml
62910.xml
681246.xml
687776.xml
688139.xml
719335.xml
733624.xml
736855.xml
744468.xml
761538.xml
765469.xml
765470.xml
770386.xml
770611.xml
773656.xml
773657.xml
773658.xml
774921.xml
775868.xml
778123.xml
778124.xml

Comment entered 2016-08-08 17:26:14 by Kline, Bob (NIH/NCI) [C]

I have made all the necessary code changes in the branch. I have installed everything except the global change (which is in DevTools, so nothing to install) and the two filters (need to check with first to make sure this won't mess up anything he's working on in the filters). Ready for user testing:

  • CIAT/CIPS Staff > Advanced Search > Citation

  • CIAT/CIPS Staff > Reports > Citations > New Citations Report

  • CIAT/CIPS Staff > Reports > Summaries and Miscellaneous Documents > Summaries Citations

  • CIAT/CIPS Staff > Update Pre-Medline Citations

  • "Search PubMed" on XMetaL Citation Toolbar

I'll work with Volker on getting the filter changes tested, and I'm beginning to suspect the global change script is obsolete (so testing won't be appropriate).

Comment entered 2016-08-24 16:13:38 by Kline, Bob (NIH/NCI) [C]

Just realized the users weren't watchers on this ticket. I have installed the changes on QA so you can test with reasonably fresh data.

I installed the two filters (105 and 124). Do you need to run your tests? Or can these be tested by the users since the changes are in QC report filters?

Comment entered 2016-08-24 17:29:51 by Englisch, Volker (NIH/NCI) [C]

Are you referring to the diff reports I often run? These are run with a publishing job run before and after the filter update.
If all you did was to change the protocol from http to https I'm not sure we need to run diff reports but it's easy enough to do.

Comment entered 2016-08-24 17:36:06 by Kline, Bob (NIH/NCI) [C]

I agree that the protocol change doesn't seem to call for full publishing job runs for testing (though I think I remember that you have publishing jobs set up for the QC reports). Perhaps better to have the users test by using the QC reports and clicking on the links that got changed to see if they still work.

Comment entered 2016-08-24 17:37:15 by Kline, Bob (NIH/NCI) [C]

Robin Juthe and William Osei-Poku: will CIAT take care of the cleanup of the URLs in the summaries?

Comment entered 2016-08-24 18:42:49 by Osei-Poku, William (NIH/NCI) [C]

Robin Juthe and William Osei-Poku: will CIAT take care of the cleanup of the URLs in the summaries?

Yes, we will do a cleanup of the URLs in the summaries. I assume we don't have to clean up all the URLs on QA, just enough to test and clean all of them up on PROD. Is that okay?

Comment entered 2016-08-24 21:07:37 by Kline, Bob (NIH/NCI) [C]

Yes, that's fine.

Comment entered 2016-08-25 09:00:41 by Juthe, Robin (NIH/NCI) [E]

Bob, could you please clarify what URL changes you're asking be handled manually?

Comment entered 2016-08-25 09:17:06 by Kline, Bob (NIH/NCI) [C]

For example,

<ExternalRef cdr:xref="http://www.ncbi.nlm.nih.gov/pubmed/20159818">PUBMED Abstract</ExternalRef>

needs to become

<ExternalRef cdr:xref="https://www.ncbi.nlm.nih.gov/pubmed/20159818">PUBMED Abstract</ExternalRef>

(CDR256666).
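If this kind of edit were scripted rather than done by hand, it could look roughly like the sketch below. It assumes the only change wanted is the scheme in `cdr:xref` attributes pointing at www.ncbi.nlm.nih.gov or ncbi.nlm.nih.gov; the pattern and the function name `upgrade_xrefs` are hypothetical and are not taken from any existing CDR global change script.

```python
import re

# Illustrative pattern: matches only cdr:xref attributes on the NCBI host.
XREF_PATTERN = re.compile(
    r'(cdr:xref=")http://((?:www\.)?ncbi\.nlm\.nih\.gov/)'
)


def upgrade_xrefs(xml_text):
    """Return (new_text, count) with matching cdr:xref URLs switched to https."""
    return XREF_PATTERN.subn(r"\g<1>https://\g<2>", xml_text)


before = ('<ExternalRef cdr:xref="http://www.ncbi.nlm.nih.gov/pubmed/20159818">'
          'PUBMED Abstract</ExternalRef>')
after, changed = upgrade_xrefs(before)
print(changed)
print(after)
```

Running the function a second time on its own output changes nothing, so it would be safe to apply to documents that have already been partly cleaned up by hand.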

Comment entered 2016-08-25 12:11:36 by Osei-Poku, William (NIH/NCI) [C]

I made changes in 256666 and 256697 and they seem okay. I didn't see any ncbi URL in 256685. Can you point me to the exact location of the URL in 256685? I searched all external refs and also did a quick search in text view but still didn't see it.

Comment entered 2016-08-25 12:27:38 by Kline, Bob (NIH/NCI) [C]

It's been a while since that script was run. I'll get you a fresh set later today.

Comment entered 2016-08-25 12:46:40 by Kline, Bob (NIH/NCI) [C]

I verified that 256685 had such links (four of them) back on the 8th of August when the script was previously run, but has none of them any more. Here's a fresh list:

256666.xml
256685.xml
256694.xml
256697.xml
256716.xml
256757.xml
256758.xml
269596.xml
299612.xml
453795.xml
517309.xml
552637.xml
574548.xml
62675.xml
62756.xml
62779.xml
62789.xml
62824.xml
62855.xml
62856.xml
62863.xml
62872.xml
62876.xml
62879.xml
62881.xml
62890.xml
62910.xml
681246.xml
687776.xml
688139.xml
695927.xml
719335.xml
733624.xml
736855.xml
744468.xml
758151.xml
761538.xml
765469.xml
765470.xml
770386.xml
770611.xml
773656.xml
773657.xml
773658.xml
774921.xml
775868.xml
778123.xml
778124.xml

Comment entered 2016-08-25 13:14:13 by Juthe, Robin (NIH/NCI) [E]

I've imported a citation on QA and linked it in a summary and that all worked fine. I noticed that the PubMed abstract links in the summary reference lists are still pointing to "http://", but I assume this will be handled in the filter changes you alluded to earlier, Bob. Is that right?

I also tested the Summaries Citations report on QA and the new links are working well.

William, are Cynthia and/or Minaxi testing the other citation reports?

Comment entered 2016-08-25 15:04:48 by Kline, Bob (NIH/NCI) [C]

Action items for Bob:

  1. find out where the http:// link Robin describes in the previous comment came from

  2. create a fresh list of documents needing manual changes (analyzing CDR docs, not exported XML; change the search to pick up URLs with either "ncbi.nlm" or "clinicaltrials.gov")
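The widened search in item 2 could be sketched as follows. This is only an illustration of the matching rule (plain http:// URLs containing either "ncbi.nlm" or "clinicaltrials.gov"); how the documents are pulled from the CDR is not shown, and the names `HOST_FRAGMENTS`, `URL_PATTERN`, and `find_http_urls` are hypothetical.

```python
import re

# Substrings named in the action item above.
HOST_FRAGMENTS = ("ncbi.nlm", "clinicaltrials.gov")
# Matches plain-http URLs only; "https://" does not contain "http://".
URL_PATTERN = re.compile(r'http://[^\s"<>]+')


def find_http_urls(xml_text, fragments=HOST_FRAGMENTS):
    """Return the plain-http URLs in xml_text that contain any fragment."""
    return [
        url for url in URL_PATTERN.findall(xml_text)
        if any(fragment in url.lower() for fragment in fragments)
    ]


sample = (
    '<ExternalRef cdr:xref="http://www.ncbi.nlm.nih.gov/pubmed/20159818">x</ExternalRef>'
    '<ExternalRef cdr:xref="http://clinicaltrials.gov/show/NCT00898079">y</ExternalRef>'
    '<ExternalRef cdr:xref="https://www.ncbi.nlm.nih.gov/pubmed/1">z</ExternalRef>'
    '<ExternalRef cdr:xref="http://example.org/page">w</ExternalRef>'
)
for url in find_http_urls(sample):
    print(url)
```

Because the match runs over the whole document text, it also picks up URLs in comments and abstract text, which would then need to be filtered by element.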

Comment entered 2016-08-25 17:47:43 by Kline, Bob (NIH/NCI) [C]

Well, the good news is that widening the net was successful. The bad news is that it was too successful. See attached spreadsheet.

Could you give me the CDR ID(s) for the document(s) involved in your previous comment?

Thanks.

Comment entered 2016-08-25 17:50:48 by Englisch, Volker (NIH/NCI) [C]

Did you mean to send the question to Robin Baldwin instead of Robin Juthe?

Comment entered 2016-08-25 17:53:49 by Kline, Bob (NIH/NCI) [C]

No, it was meant for RJ. I typed the at sign followed by "Robin" and both names showed up in a picklist, with RB first (and selected) and RJ second, so I pressed the down arrow (and watched the selection move down to RJ) and pressed the Enter key. The entries must have switched positions somewhere in that split-second window. I love JIRA so much. :-)

Comment entered 2016-08-25 18:15:30 by Juthe, Robin (NIH/NCI) [E]

Got it. Bob, the CDR ID is 517309. (Cancer Genetics Overview summary). I added the newly imported citation to the end of the first paragraph in that document.

That is quite a list. Almost 11,000 rows... A few comments/questions upon my quick review (by DocType):

1. Citations: We can ignore the Citation ones that fall within the AbstractText as that is taken directly from PubMed. (we only need to worry about the External Refs that we have control over). By my count, 19 links would require updating.
2. CTgov protocols: Can we ignore the URLs that are within CTgov protocols? Do these go to Cancer.gov? Those represent the overwhelming majority of hits - in the neighborhood of 9,700.
3. Glossary: We can ignore the GlossaryTermConcept hits (at least for now) since those are all within the DefinitionResource field, which is only used internally. It would be nice to fix them at some point so they aren't broken links if we try to use them, but I would consider this a low priority (165 links)
4. Summaries: We need to fix the Summary ExternalRefs, but can ignore the ones in Comment fields. Unfortunately, this does include a lot of OMIM links, as we suspected. (315 links)
5. Terms: These all look like they are internal, but would need Mary/William to confirm as I'm not too familiar with Term docs. (646 links)

Comment entered 2016-08-25 18:30:11 by Kline, Bob (NIH/NCI) [C]

Can we ignore the URLs that are within CTgov protocols?

I would think so for the ones which are mapped directly from values directly imported from the CTRP documents (which would be almost all of them).

Comment entered 2016-08-26 10:47:56 by Osei-Poku, William (NIH/NCI) [C]

I would think so for the ones which are mapped directly from values directly imported from the CTRP documents (which would be almost all of them).

The CTGov protocols URLs are the same URLs at the end of the trials on Cancer.gov that point directly to the corresponding trials on clinicaltrials.gov. I am not sure how they are connected. That is, whether you use what is stored in the CDR to create what is on Cancer.gov. If the URL stored in the CTGov document is not used to create the ones on Cancer.gov, then it looks like we can ignore them.

Also, it seems like the URLs for the trials on clinicaltrials.gov currently are using https.

https://clinicaltrials.gov/show/NCT00898079 - clinicaltrials.gov
http://clinicaltrials.gov/show/NCT00898079 - CDR document

Comment entered 2016-08-26 14:07:04 by Osei-Poku, William (NIH/NCI) [C]

5. Terms: These all look like they are internal, but would need Mary/William to confirm as I'm not too familiar with Term docs. (646 links)

That is correct. The majority of the URLs in the Term docs come from the Comment element, so they can be ignored. The remaining URLs were pulled from the ReferenceSource element, which I believe is for internal purposes only. So they can be ignored as well.

I noticed that the links from the drug dictionary on Cancer.gov to the thesaurus are being redirected to the https address:
http://ncit.nci.nih.gov/ncitbrowser/ConceptReport.jsp?dictionary=NCI%20Thesaurus&code=C48398 - link on Cancer.gov
https://ncit.nci.nih.gov/ncitbrowser/ConceptReport.jsp?dictionary=NCI%20Thesaurus&code=C48398 - thesaurus site

We said in the review meeting that the terms may not be affected but it looks like we probably need to check with EVS to be sure they won't be affected.

Comment entered 2016-08-26 16:27:37 by Juthe, Robin (NIH/NCI) [E]

Before we get too far into the details with these URL updates, I think we should split off the change related to importing citations and promote that to PROD ahead of the Sept 1 deadline.

As for the URLs, we need to get some clarification on exactly which websites are affected and when the current redirects will expire (if they will expire). The messages from NLM that Bob pasted above are really unclear. For example, as we discussed yesterday, what about medlineplus.gov URLs (which we link to from DIS)? Will the current redirects expire Sept 1 (the date mentioned in the NCBI memo) or Oct 1 (the date mentioned in the ClinicalTrials.gov memo), or at another time? Would it be possible to reach back out to whoever sent these messages to get some clarification? Or would you happen to know who we should ask about this? Given all that's going on right now with clinical trials, in particular, I think we need to determine whether we need to worry about those ~9,700 trial URL updates.

William also raises the point about NCIT links currently redirecting to https. It seems like we will continue to identify URLs that are redirecting and may need updating given the HHS requirement to move our websites over to https. Is there also an HHS mandate to redirect the URLs until a certain date? I think we need to consider everything that's going to be affected by this broader HHS change and then determine if we can handle all of the updates at one time or if it makes sense to strategically split them up in some way. Doing them piecemeal as we receive these announcements seems inefficient (and we're more likely to miss something).

Comment entered 2016-08-26 16:55:44 by Osei-Poku, William (NIH/NCI) [C]

William, are Cynthia and/or Minaxi testing the other citation reports?

We've tested the new citations report as well as importing and adding new citations to summary documents. We will continue testing other citations reports on Monday morning.

Comment entered 2016-08-29 06:29:16 by Kline, Bob (NIH/NCI) [C]

I can create a global change script to do some of these (as long as we can nail down a consensus on which ones need to change) if that would be helpful. Wouldn't need to happen before the deadline.

Comment entered 2016-08-29 07:35:16 by Kline, Bob (NIH/NCI) [C]

I've imported a citation on QA and linked it in a summary and that all worked fine. I noticed that the PubMed abstract links in the summary reference lists are still pointing to "http://", but I assume this will be handled in the filter changes you alluded to earlier, Bob. Is that right?

I ran the modified summary through the Vendor Summary Set, and as far as I can tell, this is what we send to GateKeeper:

<ReferenceSection>
<Citation idx="1" PMID="27557410">
Zhang M, Lin O: Molecular Testing of Thyroid Nodules: A Review of Current Available Tests for Fine-Needle Aspiration Specimens. Arch Pathol Lab Med : , 2016.
</Citation>
<Citation idx="2" PMID="18559331">
Lindor NM, McMaster ML, Lindor CJ, et al.: Concise handbook of familial cancer susceptibility syndromes - second edition. J Natl Cancer Inst Monogr (38): 1-93, 2008.
</Citation>
</ReferenceSection>

So I'm going to guess that the URLs are constructed further downstream from us, and we'd have to get the WCMS team to modify their software to use the HTTPS protocol. Can you verify this, Volker Englisch?

Comment entered 2016-08-29 13:12:51 by Osei-Poku, William (NIH/NCI) [C]

William, are Cynthia and/or Minaxi testing the other citation reports?

Yes, new citations were imported and added to summaries, and the citations reports were run. Everything has been tested and no problems were noticed.

Comment entered 2016-08-30 10:41:46 by Kline, Bob (NIH/NCI) [C]

I believe the only outstanding question is the one I posed in my previous comment to Volker Englisch. As soon as I get confirmation from him that the HTTP links are under the WCMS team's control, and not ours, I'll submit the ticket to CBIIT to deploy the changes for this ticket, unless someone still wants to do more testing.

Comment entered 2016-08-30 13:16:30 by Englisch, Volker (NIH/NCI) [C]

Can you verify this, Volker Englisch?

Yes and yes:

  • Yes, I can verify this.

  • Yes, the URL is created in the Gatekeeper rendering filter.

Does this mean we need to have a GK ticket to modify the GK filter?

Comment entered 2016-08-30 13:18:39 by Kline, Bob (NIH/NCI) [C]

Yes. Would you add one, please?

Comment entered 2016-08-30 13:30:23 by Englisch, Volker (NIH/NCI) [C]

A GK ticket has been created: WCMSGK-53

Comment entered 2016-08-30 14:24:38 by Kline, Bob (NIH/NCI) [C]

https://tracker.nci.nih.gov/browse/WEBTEAM-9060 has been submitted for the deployment to production (tested first on STAGE).

Comment entered 2016-09-01 15:52:49 by Kline, Bob (NIH/NCI) [C]

The patch has been deployed to STAGE and PROD. We will need to keep this ticket open (and I will keep the branch open) until we have completed the updates needed for existing documents. We'll need some input from to guide some of that work (see Robin's questions posted above in her comment from the 26th).

Comment entered 2016-09-08 14:06:40 by Osei-Poku, William (NIH/NCI) [C]

From the Review meeting, we don't have to worry about the static URLs in the data. The translation to https will be handled by the browser.

Comment entered 2016-11-01 15:25:58 by Kline, Bob (NIH/NCI) [C]

From the Review meeting, we don't have to worry about the static URLs in the data. The translation to https will be handled by the browser

Does that mean this ticket can be closed?

Comment entered 2016-11-01 18:08:26 by Osei-Poku, William (NIH/NCI) [C]

I guess so. We didn't close it on that day because we decided to wait for Robin to confirm and close it.

Comment entered 2016-11-08 14:47:14 by Kline, Bob (NIH/NCI) [C]

Robin's going to do a little more research.

Comment entered 2016-11-10 13:47:09 by Osei-Poku, William (NIH/NCI) [C]

Closed in the status meeting. URLs should continue to be rerouted to the new https URL.

Attachments
File Name Posted User
ocecdr-4127.xlsx 2016-08-25 17:47:43 Kline, Bob (NIH/NCI) [C]
