CDR Tickets

Issue Number 4551
Summary Global to convert external refs to protocol refs
Created 2018-11-20 22:12:14
Issue Type Improvement
Submitted By Osei-Poku, William (NIH/NCI) [C]
Assigned To Englisch, Volker (NIH/NCI) [C]
Status Closed
Resolved 2019-01-24 19:09:25
Resolution Fixed
Path /home/bkline/backups/jira/ocecdr/issue.236489
Description

We'd like to explore the possibility of doing a global change to convert all external refs in summaries pointing to clinical trials on clinicaltrials.gov, to Protocol Refs. The URLs of the external refs are mostly in this format https://clinicaltrials.gov/ct2/show/NCT00433589 . It would be good to know if this is possible considering the fact that the NCT IDs do not stand alone but are rather part of the URL.

Comment entered 2018-11-21 07:06:14 by Kline, Bob (NIH/NCI) [C]

It would certainly be possible to transform the ones which have a predictable format.

Comment entered 2018-11-29 12:39:30 by Osei-Poku, William (NIH/NCI) [C]

Thanks! I think all of them are in a predictable format. Let's discuss this in the Review meeting before moving forward with it.

Comment entered 2019-01-22 18:33:56 by Englisch, Volker (NIH/NCI) [C]

"The URLs of the external refs are mostly in this format https://clinicaltrials.gov/ct2/show/NCT00433589"

Would you be able to specify what you mean by "mostly"?
I understand that you'd like to update all ExternalRef elements with a URL that starts with "https://clinicaltrials.gov/ct2/show/" and ends with an NCT ID.
What are you expecting to happen with URLs that start with "https://clinicaltrials.gov/ct2/show/" and contain an NCT ID like the following

https://clinicaltrials.gov/ct2/show/study/NCT00866918?show_desc=Y#desc

or any URL similar to this?
Are we excluding this type of URL from the global change?

Comment entered 2019-01-22 18:40:44 by Osei-Poku, William (NIH/NCI) [C]

Please include all those URLs as long as there is an NCT ID in it. However, we would like to review all URLs if possible.

Comment entered 2019-01-22 19:04:22 by Englisch, Volker (NIH/NCI) [C]

I've attached a spreadsheet with all of the external refs starting with _https://clinicaltrials.gov/_ and containing an NCT-ID.
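
The selection rule described above (an external ref whose URL starts with https://clinicaltrials.gov/ and contains an NCT ID) could be sketched in Python roughly as follows. This is only an illustration, not the script used for the global change: the function name extract_nct_id is hypothetical, and the pattern assumes an NCT ID is "NCT" followed by exactly eight digits, as in the examples quoted in this ticket.

```python
import re

# Assumption: an NCT ID is "NCT" followed by exactly eight digits.
NCT_ID = re.compile(r"NCT\d{8}(?!\d)")

# Assumption: only https URLs on this host are of interest.
CTGOV_PREFIX = "https://clinicaltrials.gov/"


def extract_nct_id(url):
    """Return the NCT ID embedded in a ClinicalTrials.gov URL, or None."""
    if not url.startswith(CTGOV_PREFIX):
        return None
    match = NCT_ID.search(url)
    return match.group(0) if match else None


if __name__ == "__main__":
    for url in (
        "https://clinicaltrials.gov/ct2/show/NCT00433589",
        "https://clinicaltrials.gov/ct2/show/study/NCT00866918?show_desc=Y#desc",
    ):
        print(url, "->", extract_nct_id(url))
```

Searching for the ID anywhere in the URL, rather than requiring it at the end, is what lets the second form (with /study/ and a query string) be picked up as well.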

Comment entered 2019-01-24 11:10:21 by Osei-Poku, William (NIH/NCI) [C]

I have reviewed several of the URLs, especially the ones that do not follow the known format and they all appear to go to the same clinical trials page. So, please proceed and change all of them to protocol refs.
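
For illustration, the conversion itself might look something like the sketch below, which rewrites matching ExternalRef elements as ProtocolRef elements in place. The attribute names xref and nct_id and the function name convert_external_refs are assumptions made for this example; the real summary schema and the actual global change script may differ (for instance, by using namespaced attributes).

```python
import re
import xml.etree.ElementTree as ET

# Assumption: an NCT ID is "NCT" followed by eight digits.
NCT_ID = re.compile(r"NCT\d{8}")
CTGOV_PREFIX = "https://clinicaltrials.gov/"


def convert_external_refs(root, url_attr="xref", id_attr="nct_id"):
    """Turn ClinicalTrials.gov ExternalRef elements into ProtocolRef elements.

    The attribute names are hypothetical placeholders. Returns the number
    of elements converted; all other ExternalRef elements are left alone.
    """
    converted = 0
    # Materialize the list first so renaming tags cannot affect the traversal.
    for node in list(root.iter("ExternalRef")):
        url = node.get(url_attr, "")
        match = NCT_ID.search(url)
        if url.startswith(CTGOV_PREFIX) and match:
            node.tag = "ProtocolRef"
            node.attrib.pop(url_attr)
            node.set(id_attr, match.group(0))
            converted += 1
    return converted


if __name__ == "__main__":
    sample = (
        '<Para>See <ExternalRef xref="https://clinicaltrials.gov/ct2/show/'
        'NCT00433589">NCT00433589</ExternalRef>.</Para>'
    )
    root = ET.fromstring(sample)
    print(convert_external_refs(root), ET.tostring(root, encoding="unicode"))
```

Renaming the element in place keeps its text content and its position in the surrounding paragraph unchanged, so only the element name and the attribute differ between the old and new versions.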

Comment entered 2019-01-24 19:09:12 by Englisch, Volker (NIH/NCI) [C]

I ran the global for a set of 50 summaries on DEV. The result can be inspected here:
https://cdr-dev.cancer.gov/cgi-bin/cdr/ShowGlobalChangeTestResults.py?dir=2019-01-24_18-49-54

Comment entered 2019-01-25 12:48:07 by Osei-Poku, William (NIH/NCI) [C]

I reviewed all the external refs in the first summary on the test result and all of them appear to have been transformed correctly. Thanks!

However, the diff report is confusing, especially in cases where there are a lot of external refs. It includes most of the surrounding text in the summary, so looking for the changes is very difficult. Can you please make the diff report display only the changes?

Comment entered 2019-01-25 13:30:49 by Osei-Poku, William (NIH/NCI) [C]

This is just a note regarding the Closed Protocols on Cancer.gov we talked about in the meeting yesterday. I just came across 3 of them:

https://www.cancer.gov/about-cancer/treatment/clinical-trials/search/v?id=NCT01371981&r=1
https://www.cancer.gov/about-cancer/treatment/clinical-trials/search/v?id=NCT02538965&r=1
https://www.cancer.gov/about-cancer/treatment/clinical-trials/search/v?id=NCT02642965&r=1

It looks like they have been intentionally included, since they have been appropriately labeled either "Status: Closed to Accrual and Intervention" or "Status: Closed to Accrual".

Comment entered 2019-01-25 19:01:06 by Englisch, Volker (NIH/NCI) [C]

I understand that diff reports can be confusing, but I didn't make any changes to the program that lets you look at the diffs between the original and the modified version, so I don't want to mix these two issues in one ticket.
Please submit a separate ticket if you feel the presentation of the diffs needs to be adjusted.

As a side note: modifying the program ShowGlobalChangeTestResults.py is not a release-independent task.

Comment entered 2019-01-25 19:02:19 by Englisch, Volker (NIH/NCI) [C]

I started a test run for all summaries on DEV for this global change.

Comment entered 2019-01-29 14:36:53 by Osei-Poku, William (NIH/NCI) [C]

We have reviewed several summaries from the test run and they all looked good. Please proceed to run in live mode on DEV. Thanks!

Comment entered 2019-01-29 16:39:58 by Englisch, Volker (NIH/NCI) [C]

The Live run on DEV completed in about 40 minutes.

Comment entered 2019-01-30 17:09:34 by Osei-Poku, William (NIH/NCI) [C]

Looks good on DEV. Please run in test mode on QA.

Comment entered 2019-02-06 14:38:54 by Englisch, Volker (NIH/NCI) [C]

The global change in test mode finished on QA.
Please review.

Comment entered 2019-02-08 15:28:22 by Osei-Poku, William (NIH/NCI) [C]

Verified. Please run in live mode on QA. Thanks!

Comment entered 2019-02-08 17:50:32 by Englisch, Volker (NIH/NCI) [C]

The live job finished running on QA.
Please verify.

Comment entered 2019-02-08 18:03:25 by Englisch, Volker (NIH/NCI) [C]

I noticed the following validation messages in the log file on QA.

2019-02-08 16:45:11.429 [WARNING] CDR0000062687: b'Fragment _327 not found in target document'
2019-02-08 16:50:04.852 [WARNING] CDR0000062829: b'/Summary/AltTitle[4]: AltTitle exceeds allowed length (max 64/Short; 100/Navlabel)'
2019-02-08 16:58:06.169 [WARNING] CDR0000062910: b"Element 'Para': This element is not expected. Expected is ( ListItem )."
2019-02-08 16:58:06.169 [WARNING] CDR0000062910: b"Element 'ItemizedList': Missing child element(s). Expected is one of ( ListTitle, ListItem )."
2019-02-08 16:58:18.480 [WARNING] CDR0000062911: b'/Summary/SummarySection[12]/SummarySection[3]/OrderedList[2]/ListItem[3]: This element must have text content.'
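
Warning lines like the ones above can be grouped by document ID to see which summaries need attention. The following is a minimal sketch that assumes the line layout shown here (a timestamp, [WARNING], a CDR ID, then a message printed as a Python bytes literal); the function name warnings_by_document is hypothetical and is not part of the global change software.

```python
import re
from collections import defaultdict

# Matches e.g.  [WARNING] CDR0000062687: b'Fragment _327 not found ...'
# The message is wrapped in b'...' or b"..." exactly as it appears in the log.
WARNING = re.compile(
    r"\[WARNING\] (?P<doc>CDR\d+): b(?P<quote>['\"])(?P<msg>.*)(?P=quote)\s*$"
)


def warnings_by_document(lines):
    """Map each CDR document ID to the list of its warning messages."""
    grouped = defaultdict(list)
    for line in lines:
        match = WARNING.search(line)
        if match:
            grouped[match.group("doc")].append(match.group("msg"))
    return dict(grouped)


if __name__ == "__main__":
    sample = [
        "2019-02-08 16:45:11.429 [WARNING] CDR0000062687: "
        "b'Fragment _327 not found in target document'",
    ]
    for doc_id, messages in warnings_by_document(sample).items():
        print(doc_id, len(messages), messages)
```

Lines that are not warnings (for example [INFO] lines) are simply skipped.
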
Comment entered 2019-02-11 15:44:57 by Englisch, Volker (NIH/NCI) [C]

Something odd happened with the live job on QA. I'm looking at the logs on QA and I noticed that not all documents are listed. Only 12 of the 97 modified docs are listed with their diffs. After checking the database I can see that all documents have been updated - a second run of the global change shows 0 documents selected. Since the job ran on my DEV-VM and I had to copy the log files from my DEV-VM to the QA server, it is possible (maybe even likely) that I copied the wrong log directory or overwrote the correct directory with older log files.

At this point we could

  • Ignore the fact that we won't have all log and diff files available (after all, the test run finished successfully and the documents have been updated),

  • Restore the database on QA and start over, or

  • Run this global change job again on the STAGE server.

Do you have any preference?

Comment entered 2019-02-11 16:08:09 by Osei-Poku, William (NIH/NCI) [C]

Let's go with option 1 as I don't look at the diff report again after the live run. We will just review the documents to make sure the changes were as expected. We can then verify all the other aspects of the logs and diff report on STAGE.

Comment entered 2019-02-12 14:29:57 by Englisch, Volker (NIH/NCI) [C]

Did you want me to run a test run of the global on STAGE or are you still looking at the results on QA? I wasn't sure from your last comment if you wanted me to go ahead with a run on STAGE.

Comment entered 2019-02-12 15:23:23 by Osei-Poku, William (NIH/NCI) [C]

We are currently reviewing the changes on QA. I will let you know when we are done before we move to STAGE.

Comment entered 2019-02-20 17:21:27 by Osei-Poku, William (NIH/NCI) [C]

We have reviewed several summaries on QA and they all look good. We are ready for STAGE and PROD. Thanks!

Comment entered 2019-02-21 16:18:57 by Englisch, Volker (NIH/NCI) [C]

I ran the global in test mode on STAGE and copied the logs to our DEV server (2019-02-21_13-08-11).

Comment entered 2019-02-22 12:59:35 by Osei-Poku, William (NIH/NCI) [C]

The test results look good. Please run in live mode on STAGE. Thanks!

Comment entered 2019-02-22 14:02:23 by Englisch, Volker (NIH/NCI) [C]

The live run on STAGE completed.
There were three warnings:

  • [WARNING] CDR0000062829: b'/Summary/AltTitle[4]: AltTitle exceeds allowed length (max 64/Short; 100/Navlabel)'
  • [WARNING] CDR0000062910: b"Element 'Para': This element is not expected. Expected is ( ListItem )."
  • [WARNING] CDR0000062910: b"Element 'ItemizedList': Missing child element(s). Expected is one of ( ListTitle, ListItem )."
Comment entered 2019-02-25 13:18:31 by Osei-Poku, William (NIH/NCI) [C]

Thanks! We'll look to see if these problems are on PROD and fix them.

Comment entered 2019-02-25 13:22:50 by Osei-Poku, William (NIH/NCI) [C]

Verified on STAGE. Please run in test mode on PROD. When done, please enter the stats from PROD in this ticket.

Comment entered 2019-02-26 11:59:45 by Englisch, Volker (NIH/NCI) [C]

Test run on PROD completed.

2019-02-26 11:43:39.886 [INFO] Run completed.
   Docs examined    = 123
   Docs changed     = 0
   Versions changed = 369
   Could not lock   = 0
   Errors           = 0
   Time             = 0:36:40.703200
Comment entered 2019-02-26 13:27:06 by Englisch, Volker (NIH/NCI) [C]

The logs for the PROD test run have been copied to DEV.

Comment entered 2019-02-26 15:03:21 by Osei-Poku, William (NIH/NCI) [C]

Looks good. Thanks! We are ready for the live run on PROD.

Comment entered 2019-02-26 19:37:11 by Englisch, Volker (NIH/NCI) [C]

The live run finished on PROD. Please review.

2019-02-26 19:36:48.867 [INFO] Run completed.
   Docs examined    = 123
   Docs changed     = 122
   Versions changed = 272
   Could not lock   = 1
   Errors           = 0
   Time             = 1:00:23.906000
Specific versions saved:
  new cwd = 21
  new pub = 122
  new ver = 28
  old cwd = 56
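
For anyone who wants to compare the figures from several runs, the "name = value" lines in a summary like the one above can be read into a dictionary. This is a rough sketch that assumes the layout shown in this ticket; parse_run_stats is a hypothetical helper, not part of the CDR software.

```python
import re

# A statistics line looks like "   Docs examined    = 123" or "  new cwd = 21".
# Names must start with a letter, so timestamped log lines are ignored.
STAT = re.compile(r"^\s*(?P<name>[A-Za-z][A-Za-z ]*?)\s*=\s*(?P<value>\S+)\s*$")


def parse_run_stats(text):
    """Collect 'name = value' lines; whole-number values are converted to int."""
    stats = {}
    for line in text.splitlines():
        match = STAT.match(line)
        if match:
            value = match.group("value")
            stats[match.group("name")] = int(value) if value.isdigit() else value
    return stats


if __name__ == "__main__":
    sample = """2019-02-26 19:36:48.867 [INFO] Run completed.
   Docs examined    = 123
   Docs changed     = 122
   Could not lock   = 1
   Time             = 1:00:23.906000
"""
    print(parse_run_stats(sample))
```

With the live-run figures above, the parsed values make it easy to confirm that the documents examined (123) are accounted for by the documents changed (122) plus the one that could not be locked.
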
Comment entered 2019-02-27 08:16:01 by Osei-Poku, William (NIH/NCI) [C]

Thank you! Would you be able to provide me with the list of summaries with new cwd = 21, new ver = 28 and old cwd = 56 ?

Comment entered 2019-02-27 10:52:01 by Englisch, Volker (NIH/NCI) [C]

I attached the log file. Here are the highlights:

  • 1 document failed validation (CDR62923)

  • 1 document locked (CDR256757)

Comment entered 2019-02-28 11:12:37 by Osei-Poku, William (NIH/NCI) [C]

Thank you for the files. The changes look good but I will close the ticket after we've had the chance to review some of the changes on Cancer.gov after Friday's publishing.

Comment entered 2019-02-28 11:15:38 by Englisch, Volker (NIH/NCI) [C]

Would you like to "hot-fix" one or two summaries ahead of Friday's publishing job to confirm with a small sample?

Comment entered 2019-02-28 11:20:44 by Osei-Poku, William (NIH/NCI) [C]

Sure. I will do that later today. Thanks!

Comment entered 2019-03-01 09:48:03 by Osei-Poku, William (NIH/NCI) [C]

The two summaries checked out okay on Cancer.gov. I will close this ticket now. Thank you!

Attachments
File Name Posted User
ExternalRef-CTGov.xlsx 2019-01-22 19:03:00 Englisch, Volker (NIH/NCI) [C]
ProtocolRef.txt 2019-02-27 10:48:59 Englisch, Volker (NIH/NCI) [C]
