CDR Tickets

Issue Number 3877
Summary Using clinical trials to maintain the Drug Dictionary
Created 2015-02-25 19:22:57
Issue Type New Feature
Submitted By Osei-Poku, William (NIH/NCI) [C]
Assigned To Kline, Bob (NIH/NCI) [C]
Status Closed
Resolved 2015-06-01 14:08:22
Resolution Fixed
Path /home/bkline/backups/jira/ocecdr/issue.155892
Description

In anticipation of turning off the CTGov import and download jobs in the near future when we start getting data from CTRP, CIAT met with Margaret on Tuesday to discuss the implications of this on maintaining the Drug Dictionary.

The major source of information for the Drug Dictionary is clinical trials: new drug terms are identified while processing and reviewing them. This means that we will continue to need access to trials from clinicaltrials.gov after the jobs are turned off in order to keep the Drug Dictionary up to date. That is because the clinical trials from CTRP are only a subset of what we get from clinicaltrials.gov, and there are plans to stop getting CTRP data into the CDR in the future.

Because of the above, CIAT recommended the following to Margaret on Tuesday for consideration, and Margaret suggested adding it to Jira so that we can discuss it in the CDR/EBMS meeting.

CIAT proposed/recommended that the same search criteria used to import trials from clinicaltrials.gov be used to retrieve and display the results in a new 'simple' report interface with links that go back to the individual trials on clinicaltrials.gov.

The idea is that no protocol records would be created in the CDR for the trials that are retrieved and only new trials would be displayed for the user to review (no need to check for updates). After that, the results can be discarded. While we do not want to create or keep the protocol records, being able to get some statistics on the retrieved trials would be beneficial for reporting purposes.

We also considered the possibility of leaving the existing import job to run as usual but not the download job. That is, continuing to display new trials for users to review but not mark them for import. The benefit of using the existing report is that the existing report has all the information users need to review a trial and look for new drug terms for the dictionary and users are already familiar with the report. It may also be less work for the developers. However, we are also aware of the recent history of import failures so we're not completely sure how beneficial this option would be compared to the new report proposed above.

Comment entered 2015-03-31 12:04:41 by Kline, Bob (NIH/NCI) [C]

The idea is that no protocol records would be created in the CDR for the trials that are retrieved and only new trials would be displayed for the user to review (no need to check for updates). After that, the results can be discarded. While we do not want to create or keep the protocol records, being able to get some statistics on the retrieved trials would be beneficial for reporting purposes.

It seems risky to discard the information for a trial as soon as it has been reported once. (I can imagine, for example, CBIIT doing an app-scan which brings up the report page, causing the trials displayed on the report to be dropped from future displays of the report, which would result in CIAT never seeing those trials.)

I propose instead a report which would show the ID, title, and link for trials first received by NLM in a specific date range, defaulting to something like the most recent 30 days.

This would be implemented using a new database table, along these lines:

  CREATE TABLE ctgov_trials
       (nct_id VARCHAR(11) NOT NULL PRIMARY KEY,
   trial_title NVARCHAR(1024) NOT NULL,
first_received DATETIME)

I'll use the brief title if there is one, falling back on the official title if there's no brief title (unless you'd prefer that I reverse the default).

Does this approach sound reasonable? If so, I would estimate the level of effort to implement this report at a day or less.
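
For illustration only, the proposed report could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the code written for this ticket: SQLite stands in for the CDR database, the function names (pick_title, recent_trials), the sample rows, and the link format are all hypothetical, and dates are assumed to be stored as ISO strings.

```python
import sqlite3
from datetime import date, timedelta

# Assumption: SQLite stands in for the CDR database in this sketch.
SCHEMA = """
CREATE TABLE ctgov_trials (
    nct_id         VARCHAR(11)    NOT NULL PRIMARY KEY,
    trial_title    NVARCHAR(1024) NOT NULL,
    first_received DATETIME
)
"""

def pick_title(brief_title, official_title):
    """Prefer the brief title; fall back on the official title."""
    return brief_title or official_title

def recent_trials(conn, start=None, end=None, days=30):
    """Return (nct_id, title, url) for trials first received in [start, end].

    With no dates given, the range defaults to the most recent `days` days.
    """
    end = end or date.today()
    start = start or end - timedelta(days=days)
    rows = conn.execute(
        "SELECT nct_id, trial_title FROM ctgov_trials"
        " WHERE first_received BETWEEN ? AND ?"
        " ORDER BY first_received DESC, nct_id",
        (start.isoformat(), end.isoformat()),
    )
    # Assumption: the link format is a placeholder, not taken from the ticket.
    base = "https://clinicaltrials.gov/show/"
    return [(nct_id, title, base + nct_id) for nct_id, title in rows]

conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.executemany("INSERT INTO ctgov_trials VALUES (?, ?, ?)", [
    ("NCT00000001", pick_title("", "Official title only"), "2015-03-20"),
    ("NCT00000002", pick_title("Brief title", "Long official title"), "2015-01-05"),
])
# Range defaults to the 30 days ending 2015-03-31.
report = recent_trials(conn, end=date(2015, 3, 31))
```

Because trials stay in the table after they are displayed, running the report twice (or having a scanner hit the page) does not hide anything from later runs.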

Comment entered 2015-03-31 15:46:36 by Osei-Poku, William (NIH/NCI) [C]

This sounds good to me. We'd also like to be able to retrieve the report in Excel for workflow purposes, if that is possible.

Comment entered 2015-05-27 18:13:26 by Kline, Bob (NIH/NCI) [C]
Comment entered 2015-05-28 13:07:46 by Osei-Poku, William (NIH/NCI) [C]

Thanks, Bob! The report looks good on DEV. Could you please add the following columns/data?

1. Phase
2. Sponsorship (or sponsor/collaborator)
3. Other IDs

Comment entered 2015-06-01 10:28:24 by Kline, Bob (NIH/NCI) [C]

Starting over. Moved ticket back to "In Progress" to work with new requirements.

Comment entered 2015-06-01 11:57:23 by Kline, Bob (NIH/NCI) [C]

Replaced original table for this ticket with three new ones:

  CREATE TABLE ctgov_trial
         (nct_id VARCHAR(11) NOT NULL PRIMARY KEY,
     trial_title NVARCHAR(1024) NOT NULL,
     trial_phase NVARCHAR(20) NULL,
  first_received DATETIME NOT NULL)

  CREATE TABLE ctgov_trial_sponsor
       (nct_id VARCHAR(11) NOT NULL REFERENCES ctgov_trial,
      position INTEGER NOT NULL,
       sponsor VARCHAR(1024) NOT NULL,
   PRIMARY KEY (nct_id, position))

  CREATE TABLE ctgov_trial_other_id
       (nct_id VARCHAR(11) NOT NULL REFERENCES ctgov_trial,
      position INTEGER NOT NULL,
      other_id VARCHAR(1024) NOT NULL,
   PRIMARY KEY (nct_id, position))

  GRANT SELECT ON ctgov_trial TO CdrGuest
  GRANT SELECT ON ctgov_trial_sponsor TO CdrGuest
  GRANT SELECT ON ctgov_trial_other_id TO CdrGuest

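
As a rough illustration of how a report might read these three tables, the sketch below gathers each trial's sponsors and other IDs in position order. It is not the code in RecentCTGovProtocols.py: SQLite stands in for the CDR database, the helper names (child_values, trial_rows) and the sample rows are hypothetical, and dates are assumed to be stored as ISO strings.

```python
import sqlite3

# Assumption: SQLite stands in for the CDR database in this sketch.
DDL = """
CREATE TABLE ctgov_trial
  (nct_id VARCHAR(11) NOT NULL PRIMARY KEY,
   trial_title NVARCHAR(1024) NOT NULL,
   trial_phase NVARCHAR(20) NULL,
   first_received DATETIME NOT NULL);
CREATE TABLE ctgov_trial_sponsor
  (nct_id VARCHAR(11) NOT NULL REFERENCES ctgov_trial,
   position INTEGER NOT NULL,
   sponsor VARCHAR(1024) NOT NULL,
   PRIMARY KEY (nct_id, position));
CREATE TABLE ctgov_trial_other_id
  (nct_id VARCHAR(11) NOT NULL REFERENCES ctgov_trial,
   position INTEGER NOT NULL,
   other_id VARCHAR(1024) NOT NULL,
   PRIMARY KEY (nct_id, position));
"""

def child_values(conn, table, column, nct_id):
    """Return one trial's values from a child table, ordered by position."""
    # table and column are fixed names passed by trial_rows, never user input.
    sql = "SELECT %s FROM %s WHERE nct_id = ? ORDER BY position" % (column, table)
    return [row[0] for row in conn.execute(sql, (nct_id,))]

def trial_rows(conn, start, end):
    """Return one dict per trial first received in [start, end]."""
    trials = conn.execute(
        "SELECT nct_id, trial_title, trial_phase, first_received"
        "  FROM ctgov_trial"
        " WHERE first_received BETWEEN ? AND ?"
        " ORDER BY first_received, nct_id", (start, end)).fetchall()
    return [{
        "nct_id": nct_id, "title": title, "phase": phase, "received": received,
        "sponsors": child_values(conn, "ctgov_trial_sponsor", "sponsor", nct_id),
        "other_ids": child_values(conn, "ctgov_trial_other_id", "other_id", nct_id),
    } for nct_id, title, phase, received in trials]

# Made-up sample data; sponsors are inserted out of order on purpose.
conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
conn.execute("INSERT INTO ctgov_trial VALUES"
             " ('NCT00000001', 'Sample trial', 'Phase 2', '2015-05-20')")
conn.executemany("INSERT INTO ctgov_trial_sponsor VALUES (?, ?, ?)", [
    ("NCT00000001", 2, "Example University"),
    ("NCT00000001", 1, "Example Sponsor"),
])
conn.execute("INSERT INTO ctgov_trial_other_id VALUES ('NCT00000001', 1, 'EX-001')")
rows = trial_rows(conn, "2015-05-01", "2015-05-31")
```

Keeping sponsors and other IDs in child tables keyed on (nct_id, position) lets a trial carry any number of each while preserving their original order.
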
Comment entered 2015-06-01 11:58:13 by Kline, Bob (NIH/NCI) [C]

The original report will not work while I am rewriting it. I'll let you know when it's ready for testing again.

Comment entered 2015-06-01 14:07:24 by Kline, Bob (NIH/NCI) [C]

I have finished the re-implementation for this ticket. Please test (on DEV):

https://cdr.dev.cancer.gov/cgi-bin/cdr/RecentCTGovProtocols.py?Session=guest

Comment entered 2015-06-01 18:29:03 by Osei-Poku, William (NIH/NCI) [C]

Verified on DEV. Looks great. Thank you!!!

Comment entered 2015-06-02 14:58:47 by alan

Reminder: We'll want to update tables.sql with the new table definitions, if it hasn't already been done.

Comment entered 2015-06-02 15:14:43 by Kline, Bob (NIH/NCI) [C]

It has, just not in Subversion yet (remember?).

Comment entered 2015-06-18 09:31:34 by Kline, Bob (NIH/NCI) [C]

In Subversion for this ticket:

  • R13198 /branches/Curie/Database/CreateLogins.sql

  • R13198 /branches/Curie/Database/tables.sql

  • R13199 /branches/Curie/Inetpub/wwwroot/cgi-bin/cdr/RecentCTGovProtocols.py

  • R13201 /branches/Curie/Bin/GetRecentCTGovProtocols.cmd

  • R13200 /branches/Curie/Utilities/GetRecentCTGovProtocols.py

Will require CBIIT to install a scheduled job to populate the new tables periodically.

The ticket to create the new tables on STAGE and PROD is DBATEAM-1922.

Comment entered 2015-06-19 08:52:43 by Kline, Bob (NIH/NCI) [C]

The new tables are on PROD and STAGE. The ticket for creating the scheduled job on QA and DEV is WEBTEAM-6588 (separate WEBTEAM tickets will be needed for STAGE and PROD).

Comment entered 2015-06-19 13:23:53 by Kline, Bob (NIH/NCI) [C]

The scheduled job has been installed on DEV and QA. It will kick off at 4am tomorrow morning, so you'll be able to test the report on Monday (or tomorrow, if you're really eager :-) ).

Comment entered 2015-06-22 10:03:25 by Osei-Poku, William (NIH/NCI) [C]

Running the report in Web Page format produces a Python script error:

A problem occurred in a Python script.
D:\cdr\Log\tmpkkokbp.html contains the description of this error

This is on QA.

Comment entered 2015-06-22 10:12:54 by Osei-Poku, William (NIH/NCI) [C]

Also, running the report in Worksheet mode appears to display trials that were received on only the selected Start Date. That is, if you choose,

Start Date: 2015-06-02
End Date: 2015-06-22

all the trials that are displayed would have a Received date of 2015-06-02. If you choose another Start Date, all the trials displayed would have that new Start Date. It does not appear to be retrieving the trials within the date range specified.

Comment entered 2015-06-22 10:29:00 by Kline, Bob (NIH/NCI) [C]

So, this worked correctly on DEV, but it's broken on QA?

Comment entered 2015-06-22 10:35:23 by Osei-Poku, William (NIH/NCI) [C]

That is correct. I didn't see any of these problems on DEV.

Comment entered 2015-06-22 11:08:34 by Kline, Bob (NIH/NCI) [C]

I'm rebuilding and repopulating the tables on QA. Will let you know when they're ready.

Comment entered 2015-06-22 11:21:58 by Kline, Bob (NIH/NCI) [C]

Please test again.

Comment entered 2015-06-22 11:34:26 by Osei-Poku, William (NIH/NCI) [C]

It worked. Thanks!

Comment entered 2015-06-22 11:35:00 by Osei-Poku, William (NIH/NCI) [C]

Verified on QA.

Comment entered 2015-06-22 11:36:07 by Kline, Bob (NIH/NCI) [C]

The report was showing your start date instead of the received date for all of the trials. It was showing you the correct trials, but the wrong date in the second column. The bug was present on both DEV and QA. Fixed on both tiers.
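
The class of bug described above can be sketched as follows. This is a hypothetical reconstruction for illustration, not the actual report code; the function name report_rows and the sample trials are made up. The point is that the date column must come from each trial's own received date, not from the Start Date the user entered.

```python
def report_rows(trials, start, end):
    """Build report rows for trials first received in [start, end].

    trials: iterable of (nct_id, title, first_received), ISO date strings.
    """
    rows = []
    for nct_id, title, received in trials:
        if start <= received <= end:
            # A buggy version would put `start` in the second column here,
            # so every row would appear to share the user's Start Date.
            rows.append((nct_id, received, title))
    return rows

# Made-up sample data.
sample = [
    ("NCT00000001", "First sample trial", "2015-06-02"),
    ("NCT00000002", "Second sample trial", "2015-06-15"),
    ("NCT00000003", "Out-of-range trial", "2015-05-01"),
]
rows = report_rows(sample, "2015-06-02", "2015-06-22")
```

With the fix, the second column varies across the selected range while the set of trials returned stays the same.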

Comment entered 2015-09-01 13:46:36 by Osei-Poku, William (NIH/NCI) [C]

Verified on PROD. Thank you!
