CDR Tickets

Issue Number 3890
Summary Possible discrepancies in count of Dictionary Terms in PCIB Status Report
Created 2015-04-01 13:47:14
Issue Type Inquiry
Submitted By Osei-Poku, William (NIH/NCI) [C]
Assigned To Englisch, Volker (NIH/NCI) [C]
Status Closed
Resolved 2015-06-09 16:26:55
Resolution Fixed
Path /home/bkline/backups/jira/ocecdr/issue.158053
Description

It appears that the PCIB report isn't counting all Drug Dictionary terms that are newly added to the dictionary. For example, in the month of February, the following new terms were reported as new terms added to the dictionary:

dextroamphetamine sulfate
levobupivacaine hydrochloride
nitazoxanide
perampanel
sugammadex sodium

...and in the month of March, the following terms were reported as new terms added to the dictionary:

P-cadherin inhibitor PCA062
USP14/UCHL5 inhibitor VLX1570
abacavir sulfate
anti-CD40 monoclonal antibody SEA-CD40
ropidoxuridine

Meanwhile the following terms were created in February but definitions were not added to them until March and so should have been reported in the March Report but they weren't.

CDR0000769608 nanocell-encapsulated miR-16-based microRNA mimic
CDR0000769589 anti-FGFR2 antibody-drug conjugate BAY1187982
CDR0000769590 BET inhibitor BAY1238097
CDR0000769566 anti-CD47 monoclonal antibody CC-90002

This is possibly because some of these terms are added to protocols as soon as they are created and so get 'published' right away in the month they are created. But because they do not get added to the Drug Dictionary until a later month, they do not get counted either in the month they are first published with protocols or later when they get definitions and are 're-published' to the Drug Dictionary. Essentially they are counted as updates at the time that they get definitions and then published to the Drug Dictionary.

Could you please review the program to verify that all terms added to the Drug Dictionary are counted correctly?

Comment entered 2015-04-01 19:42:17 by Englisch, Volker (NIH/NCI) [C]

Meanwhile the following terms were created in February but definitions were not added to them until March and so should have been reported in the March Report but they weren't.

That is correct. Since this is a time-based report, the SQL query looks at the first_pub date to determine when the term was first published. I guess the assumption was that a drug term cannot be published without a definition text. Therefore, even if a term was published during the selected time frame, it isn't included in this count unless it also has a definition text.
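
A minimal sketch of the counting rule described above, to show why such a term falls through both monthly reports. It is not the code in ICRDBStatsReport.py. The Term class, the definition_added field, the run_date parameter, and the dates are all assumptions made for illustration; only first_pub and the CDR ID come from this ticket.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Term:
    cdr_id: str
    first_pub: date                    # date the term was first published
    definition_added: Optional[date]   # hypothetical field; None until a definition exists


def counted_as_new(term: Term, start: date, end: date, run_date: date) -> bool:
    """Original rule: first_pub inside the period AND a definition present at run time."""
    in_period = start <= term.first_pub <= end
    has_definition = (
        term.definition_added is not None and term.definition_added <= run_date
    )
    return in_period and has_definition


# Illustrative dates only: published with a protocol in February, defined in March.
term = Term("CDR0000769608", date(2015, 2, 20), date(2015, 3, 10))

february = counted_as_new(term, date(2015, 2, 1), date(2015, 2, 28), run_date=date(2015, 3, 1))
march = counted_as_new(term, date(2015, 3, 1), date(2015, 3, 31), run_date=date(2015, 4, 1))
print(february, march)
```

Under these assumptions the term is skipped in February because it has no definition yet, and skipped in March because its first_pub date lies outside the reporting period.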

I'd like to include Margaret in this discussion since we created the report based on her requirements.

Comment entered 2015-04-01 19:46:50 by Englisch, Volker (NIH/NCI) [C]

This is possibly because some of these terms are added to protocols as soon as they are created and so get 'published' right away in the month they are created.

What does it mean in this context for a term to be published? I'm guessing a drug term can't be considered published without a definition. We either need to drop the requirement that a published drug term have a definition text, or we may want to add a date for when the definition text was created.

Comment entered 2015-06-09 16:26:42 by Englisch, Volker (NIH/NCI) [C]

The SQL script has been updated and the existence of the definition is not required anymore for a term to be counted.
The updated Python program is

  • ICRDBStatsReport.py

This is ready for testing on DEV.
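
The change described in this comment can be sketched as a query that filters on first_pub alone. This is a hedged illustration using an in-memory SQLite database; the term table, its columns other than first_pub, the sample rows, and the require_definition flag are hypothetical and do not reflect the actual CDR schema or the SQL in ICRDBStatsReport.py.

```python
import sqlite3

# Hypothetical, simplified table; the real CDR schema is not shown in this ticket.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE term (cdr_id TEXT PRIMARY KEY, first_pub TEXT, definition TEXT)"
)
conn.executemany(
    "INSERT INTO term VALUES (?, ?, ?)",
    [
        ("T1", "2015-02-20", None),               # published, no definition yet
        ("T2", "2015-02-25", "some definition"),  # published with a definition
        ("T3", "2015-03-05", None),               # published the following month
    ],
)


def count_new_terms(conn, start, end, require_definition=False):
    """Count terms first published in [start, end] (ISO date strings)."""
    sql = "SELECT COUNT(*) FROM term WHERE first_pub BETWEEN ? AND ?"
    if require_definition:
        # Old behavior: terms without a definition text are left out.
        sql += " AND definition IS NOT NULL"
    return conn.execute(sql, (start, end)).fetchone()[0]


old_count = count_new_terms(conn, "2015-02-01", "2015-02-28", require_definition=True)
new_count = count_new_terms(conn, "2015-02-01", "2015-02-28")
print(old_count, new_count)
```

With the definition condition removed, a term is counted once, in the month of its first_pub date, whether or not a definition has been added yet.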

Comment entered 2015-06-09 17:50:11 by Juthe, Robin (NIH/NCI) [E]

"the existence of the definition is not required anymore for a term to be counted"

Is this what we want? Based on William's comment above, it sounds like we are interested in a term counting as a new term in the drug dictionary when it has been published WITH a definition, but I may have that wrong.

Comment entered 2015-06-09 18:19:52 by Osei-Poku, William (NIH/NCI) [C]

When we discussed this, we agreed that all the terms would eventually end up in the dictionary, so to avoid the problem we don't have to require that a term have a definition before it is counted.

Comment entered 2015-06-09 18:24:13 by Juthe, Robin (NIH/NCI) [E]

Ah, OK. That makes sense. Thanks!

Comment entered 2015-06-12 17:21:03 by Osei-Poku, William (NIH/NCI) [C]

These test documents are ready for publishing on DEV.

CDR0000759417
CDR0000759418
CDR0000759419
CDR0000759420
CDR0000759421
CDR0000759422
CDR0000759423
CDR0000759425

Comment entered 2015-06-12 17:48:16 by Englisch, Volker (NIH/NCI) [C]

The documents have been published on DEV.

Comment entered 2015-06-12 18:03:19 by Osei-Poku, William (NIH/NCI) [C]

It looks like all were published with the exception of CDR0000759418 so 7 out of 8 were reported.

Comment entered 2015-06-12 18:16:28 by Englisch, Volker (NIH/NCI) [C]

That's not exactly right. All terms were published, but only 7 of those were reported as a Drug/Agent term. The missing one, CDR0000759422 (orbital/periocular basal cell carcinoma), has been entered as an index term.

Is this what you were expecting?

Comment entered 2015-06-12 18:24:31 by Osei-Poku, William (NIH/NCI) [C]

That is right. I looked at the doc history of the terms on DEV and didn't see at the time that CDR0000759418 had been marked as published and that confused me. I just checked again and they all have publication dates now. I think that is it. The report is displaying the correct number.

Comment entered 2015-06-12 18:24:59 by Osei-Poku, William (NIH/NCI) [C]

Verified on DEV.

Comment entered 2015-06-18 12:43:46 by Englisch, Volker (NIH/NCI) [C]

The following program has been saved to subversion:

  • R13207: ICRDBStatsReport.py

Comment entered 2015-06-22 18:01:16 by Osei-Poku, William (NIH/NCI) [C]

I am using these documents to test on QA.
CDR0000771349
CDR0000771348
CDR0000771347
CDR0000771346
CDR0000771345
CDR0000771344
CDR0000771343
CDR0000771342

Comment entered 2015-06-22 18:05:18 by Osei-Poku, William (NIH/NCI) [C]

Please run a publishing job for the documents above on QA.

Comment entered 2015-06-23 11:17:35 by Englisch, Volker (NIH/NCI) [C]

The documents have been published on QA.

Comment entered 2015-06-23 14:20:57 by Osei-Poku, William (NIH/NCI) [C]

Verified on QA. Thanks!

Comment entered 2015-09-01 13:41:43 by Englisch, Volker (NIH/NCI) [C]

We were in luck because Curie had just been released before the new report got created.
Please verify the new numbers and close this ticket.

Comment entered 2015-09-01 13:59:27 by Osei-Poku, William (NIH/NCI) [C]

Verified on PROD. Thanks!
