Issue Number | 4551 |
---|---|
Summary | Global to convert external refs to protocol refs |
Created | 2018-11-20 22:12:14 |
Issue Type | Improvement |
Submitted By | Osei-Poku, William (NIH/NCI) [C] |
Assigned To | Englisch, Volker (NIH/NCI) [C] |
Status | Closed |
Resolved | 2019-01-24 19:09:25 |
Resolution | Fixed |
Path | /home/bkline/backups/jira/ocecdr/issue.236489 |
We'd like to explore the possibility of doing a global change to convert all external refs in summaries that point to clinical trials on clinicaltrials.gov into Protocol Refs. The URLs of the external refs are mostly in this format: https://clinicaltrials.gov/ct2/show/NCT00433589. It would be good to know whether this is possible, given that the NCT IDs do not stand alone but are part of the URL.
It would certainly be possible to transform the ones which have a predictable format.
Thanks! I think all of them are in a predictable format. Let's discuss this in the Review meeting before moving forward with it.
The URLs of the external refs are mostly in this format https://clinicaltrials.gov/ct2/show/NCT00433589
~oseipokuw, would you be able to specify what you mean when you're saying "mostly"?
I understand that you'd like to update all ExternalRef elements with a URL that starts with "https://clinicaltrials.gov/ct2/show/" and ends with an NCT ID.
What are you expecting to happen with URLs that start with "https://clinicaltrials.gov/ct2/show/" and contain an NCT ID, like the following
https://clinicaltrials.gov/ct2/show/study/NCT00866918?show_desc=Y#desc
or any URL similar to this?
Are we excluding this type of URL from the global change?
Please include all those URLs as long as there is an NCT ID in them. However, we would like to review all URLs if possible.
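For reference, the selection rule discussed above (any clinicaltrials.gov URL that contains an NCT ID, wherever the ID sits in the URL) could be sketched roughly as follows. This is only an illustration, not the code of the actual global change job: the function name `extract_nct_id` is hypothetical, and the pattern assumes an NCT ID is "NCT" followed by exactly eight digits.

```python
import re

# Assumption: an NCT ID is "NCT" followed by exactly eight digits.
NCT_ID = re.compile(r"NCT\d{8}(?!\d)")
CTGOV_PREFIX = "https://clinicaltrials.gov/"


def extract_nct_id(url):
    """Return the NCT ID embedded in a clinicaltrials.gov URL, or None.

    Hypothetical helper: the ID may appear anywhere after the site
    prefix, so URLs with extra path segments, query strings, or
    fragments are still accepted.
    """
    if not url.startswith(CTGOV_PREFIX):
        return None
    match = NCT_ID.search(url)
    return match.group(0) if match else None
```

Under these assumptions, both https://clinicaltrials.gov/ct2/show/NCT00433589 and https://clinicaltrials.gov/ct2/show/study/NCT00866918?show_desc=Y#desc would be selected, while URLs on other sites would be skipped.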
I've attached a spreadsheet with all of the external refs starting with _https://clinicaltrials.gov/_ and containing an NCT-ID.
I have reviewed several of the URLs, especially the ones that do not follow the known format and they all appear to go to the same clinical trials page. So, please proceed and change all of them to protocol refs.
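As a rough illustration of what changing an external ref into a protocol ref involves, the sketch below renames matching elements in place and replaces the URL attribute with the extracted NCT ID. It is not the actual global change filter: the function `convert_external_refs` and the attribute names `xref` and `nct_id` are assumptions made for the example, and the real CDR schema (including any namespaces) may differ.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: an NCT ID is "NCT" followed by exactly eight digits.
NCT_ID = re.compile(r"NCT\d{8}(?!\d)")
CTGOV_PREFIX = "https://clinicaltrials.gov/"


def convert_external_refs(root, url_attr="xref", id_attr="nct_id"):
    """Rename matching ExternalRef elements to ProtocolRef in place.

    Attribute names are hypothetical placeholders. Returns the number
    of elements converted. Element text and tail are left untouched.
    """
    changed = 0
    # Materialize the list first so renaming does not disturb iteration.
    for node in list(root.iter("ExternalRef")):
        url = node.get(url_attr, "")
        match = NCT_ID.search(url)
        if url.startswith(CTGOV_PREFIX) and match:
            node.tag = "ProtocolRef"
            del node.attrib[url_attr]
            node.set(id_attr, match.group(0))
            changed += 1
    return changed
```

External refs pointing anywhere other than clinicaltrials.gov are left as they are, so the count returned gives a quick check against the attached spreadsheet.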
I ran the global for a set of 50 summaries on DEV. The result can be inspected here:
https://cdr-dev.cancer.gov/cgi-bin/cdr/ShowGlobalChangeTestResults.py?dir=2019-01-24_18-49-54
I reviewed all the external refs in the first summary on the test result and all of them appear to have been transformed correctly. Thanks!
However, the diff report is confusing, especially in cases where there are a lot of external refs. It includes most of the surrounding text in the summary, so looking for the changes is very difficult. Can you please make the diff report display only the changes?
This is just a note for ~mbeckwit and ~juther with regards to the Closed Protocols on Cancer.gov we talked about in the meeting yesterday. I just came across 3 of them:
https://www.cancer.gov/about-cancer/treatment/clinical-trials/search/v?id=NCT01371981&r=1
https://www.cancer.gov/about-cancer/treatment/clinical-trials/search/v?id=NCT02538965&r=1
https://www.cancer.gov/about-cancer/treatment/clinical-trials/search/v?id=NCT02642965&r=1
It looks like they have been intentionally included, since they are appropriately labeled either "Status: Closed to Accrual and Intervention" or "Status: Closed to Accrual".
I understand that diff reports can be confusing, but I didn't make any changes to the program that helps you look at the diffs between the original and the modified version; therefore I don't want to mix these two issues here in one ticket.
Please submit a separate ticket if you feel the presentation of the diffs needs to be adjusted.
As a side note: Modifying the program ShowGlobalChangeTestResults.py is not a release-independent task.
I started a test run for all summaries on DEV for this global change.
We have reviewed several summaries from the test run and they all looked good. Please proceed to run in live mode on DEV. Thanks!
The Live run on DEV completed in about 40 minutes.
Looks good on DEV. Please run in test mode on QA.
The global change in test mode finished on QA.
Please review.
Verified. Please run in live mode on QA. Thanks!
The live job finished running on QA.
Please verify.
~oseipokuw, I noticed the following validation messages in the log file on QA.
2019-02-08 16:45:11.429 [WARNING] CDR0000062687: b'Fragment _327 not found in target document'
2019-02-08 16:50:04.852 [WARNING] CDR0000062829: b'/Summary/AltTitle[4]: AltTitle exceeds allowed length (max 64/Short; 100/Navlabel)'
2019-02-08 16:58:06.169 [WARNING] CDR0000062910: b"Element 'Para': This element is not expected. Expected is ( ListItem )."
2019-02-08 16:58:06.169 [WARNING] CDR0000062910: b"Element 'ItemizedList': Missing child element(s). Expected is one of ( ListTitle, ListItem )."
2019-02-08 16:58:18.480 [WARNING] CDR0000062911: b'/Summary/SummarySection[12]/SummarySection[3]/OrderedList[2]/ListItem[3]: This element must have text content.'
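To see at a glance which documents these validation messages belong to, the warning lines can be grouped by CDR ID. The sketch below assumes the line layout shown in the excerpt above (timestamp, "[WARNING]", CDR ID, colon, message); the function name `warnings_by_doc` is made up for the example and is not part of the CDR tooling.

```python
import re
from collections import defaultdict

# Assumption: warning lines look like
#   <timestamp> [WARNING] CDR0000062687: <message>
WARNING_LINE = re.compile(r"\[WARNING\] (CDR\d+): (.*)$")


def warnings_by_doc(lines):
    """Group warning messages by CDR document ID (hypothetical helper).

    Lines that are not warnings (for example [INFO] lines) are ignored.
    """
    grouped = defaultdict(list)
    for line in lines:
        match = WARNING_LINE.search(line.rstrip())
        if match:
            grouped[match.group(1)].append(match.group(2))
    return dict(grouped)
```

Applied to the five lines above, this would yield four document IDs, with two messages recorded for CDR0000062910.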
~oseipokuw, something odd happened with the live job on QA. Looking at the logs on QA, I noticed that not all documents are listed: only 12 of the 97 modified docs are listed with their diffs. After checking the database I can see that all documents have been updated; a second run of the global change shows 0 documents selected. Since the job ran on my DEV-VM and I had to copy the log files from there to the QA server, it is possible (maybe even likely) that I copied the wrong log directory or overwrote the correct directory with older log files.
At this point we could:
1. Ignore the fact that we won't have all log and diff files available; after all, the test run finished successfully and the documents have been updated,
2. Restore the database on QA and start over, or
3. Run this global change job again on the STAGE server.
Do you have any preference, ~oseipokuw?
Let's go with option 1 as I don't look at the diff report again after the live run. We will just review the documents to make sure the changes were as expected. We can then verify all the other aspects of the logs and diff report on STAGE.
Did you want me to run a test run of the global on STAGE or are you still looking at the results on QA? I wasn't sure from your last comment if you wanted me to go ahead with a run on STAGE.
We are currently reviewing the changes on QA. I will let you know when we are done before we move to STAGE.
We have reviewed several summaries on QA and they all look good. We are ready for STAGE and PROD. Thanks!
I ran the global in test mode on STAGE and copied the logs to our DEV server (2019-02-21_13-08-11).
The test results look good. Please run in live mode on STAGE. Thanks!
The live run on STAGE completed.
There were three warnings:
[WARNING] CDR0000062829: b'/Summary/AltTitle[4]: AltTitle exceeds allowed length (max 64/Short; 100/Navlabel)'
[WARNING] CDR0000062910: b"Element 'Para': This element is not expected. Expected is ( ListItem )."
[WARNING] CDR0000062910: b"Element 'ItemizedList': Missing child element(s). Expected is one of ( ListTitle, ListItem )."
Thanks! We'll check whether these problems also exist on PROD and fix them.
Verified on STAGE. Please run in test mode on PROD. When done, please enter the stats from PROD in this ticket.
Test run on PROD completed.
2019-02-26 11:43:39.886 [INFO] Run completed.
Docs examined = 123
Docs changed = 0
Versions changed = 369
Could not lock = 0
Errors = 0
Time = 0:36:40.703200
The logs for the PROD test run have been copied to DEV.
Looks good. Thanks! We are ready for the live run on PROD.
The live run finished on PROD. Please review.
2019-02-26 19:36:48.867 [INFO] Run completed.
Docs examined = 123
Docs changed = 122
Versions changed = 272
Could not lock = 1
Errors = 0
Time = 1:00:23.906000
Specific versions saved:
new cwd = 21
new pub = 122
new ver = 28
old cwd = 56
Thank you! Would you be able to provide me with the list of summaries with new cwd = 21, new ver = 28, and old cwd = 56?
I attached the log file. Here are the highlights:
1 document failed validation (CDR62923)
1 document locked (CDR256757)
Thank you for the files. The changes look good but I will close the ticket after we've had the chance to review some of the changes on Cancer.gov after Friday's publishing.
Would you like to "hot-fix" one or two summaries ahead of Friday's publishing job to confirm with a small sample?
Sure. I will do that later today. Thanks!
The two summaries checked out okay on Cancer.gov. I will close this ticket now. Thank you!
File Name | Posted | User |
---|---|---|
ExternalRef-CTGov.xlsx | 2019-01-22 19:03:00 | Englisch, Volker (NIH/NCI) [C] |
ProtocolRef.txt | 2019-02-27 10:48:59 | Englisch, Volker (NIH/NCI) [C] |