Issue Number | 3890 |
---|---|
Summary | Possible discrepancies in count of Dictionary Terms in PCIB Status Report |
Created | 2015-04-01 13:47:14 |
Issue Type | Inquiry |
Submitted By | Osei-Poku, William (NIH/NCI) [C] |
Assigned To | Englisch, Volker (NIH/NCI) [C] |
Status | Closed |
Resolved | 2015-06-09 16:26:55 |
Resolution | Fixed |
Path | /home/bkline/backups/jira/ocecdr/issue.158053 |
It appears that the PCIB report isn't counting all Drug Dictionary terms that are newly added to the dictionary. For example, in the month of February, the following new terms were reported as new terms added to the dictionary:
dextroamphetamine sulfate
levobupivacaine hydrochloride
nitazoxanide
perampanel
sugammadex sodium
...and in the month of March, the following terms were reported as new terms added to the dictionary:
P-cadherin inhibitor PCA062
USP14/UCHL5 inhibitor VLX1570
abacavir sulfate
anti-CD40 monoclonal antibody SEA-CD40
ropidoxuridine
Meanwhile the following terms were created in February but definitions were not added to them until March and so should have been reported in the March Report but they weren't.
CDR0000769608 nanocell-encapsulated miR-16-based microRNA mimic
CDR0000769589 anti-FGFR2 antibody-drug conjugate BAY1187982
CDR0000769590 BET inhibitor BAY1238097
CDR0000769566 anti-CD47 monoclonal antibody CC-90002
This is possibly because some of these terms are added to protocols as soon as they are created and so get 'published' right away in the month they are created. But because they are not added to the Drug Dictionary until a later month, they are not counted either in the month they are first published with protocols or later, when they get definitions and are 're-published' to the Drug Dictionary. Essentially, they are counted as updates at the time they get definitions and are published to the Drug Dictionary.
Could you please review the program to verify that all terms added to the Drug Dictionary are counted correctly?
"Meanwhile the following terms were created in February but definitions were not added to them until March and so should have been reported in the March Report but they weren't."
That is correct. Since this is a time-based report, the SQL query looks at the first_pub date to determine when the term was first published. I guess the assumption was that a drug term cannot be published without a definition text. Therefore, even if a term was published during the selected time frame, it isn't included in this count unless it also has a definition text.
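To illustrate the gap, here is a minimal sketch of the counting rule as described above. It is not the actual report query: the `term` table, its `definition` column, and the sample rows are hypothetical, and only the `first_pub` name comes from the real report.

```python
import sqlite3

# Hypothetical, simplified schema; the real CDR tables and columns may differ.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE term (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        first_pub TEXT,   -- ISO date the term was first published
        definition TEXT   -- NULL until a definition is added
    );
""")
conn.executemany(
    "INSERT INTO term (id, name, first_pub, definition) VALUES (?, ?, ?, ?)",
    [
        (1, "term published with a definition", "2015-03-10", "definition text"),
        (2, "term published with a protocol only", "2015-02-20", None),
    ],
)


def count_new_terms(conn, start, end):
    """Count terms first published in [start, end) that have a definition."""
    row = conn.execute(
        """
        SELECT COUNT(*)
          FROM term
         WHERE first_pub >= ? AND first_pub < ?
           AND definition IS NOT NULL
        """,
        (start, end),
    ).fetchone()
    return row[0]


# February report: term 2 was published in February but has no definition yet.
february = count_new_terms(conn, "2015-02-01", "2015-03-01")

# The definition is added in March; first_pub still points at February.
conn.execute(
    "UPDATE term SET definition = ? WHERE id = ?",
    ("definition added in March", 2),
)

# March report: term 2 is skipped again because its first_pub is in February.
march = count_new_terms(conn, "2015-03-01", "2015-04-01")
```

In this sketch term 2 is never counted as new: it fails the definition check in February and the date check in March.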
I'd like to include Margaret in this discussion since we created the report based on her requirements.
"This is possibly because some of these terms are added to protocols as soon as they are created and so get 'published' right away in the month they are created."
What does it mean in this context for a term to be published? I'm guessing a drug term can't be considered published without a definition. We either need to drop the requirement that a published drug term have a definition text, or we may want to add the date when the definition text was created.
The SQL script has been updated and the existence of the definition
is not required anymore for a term to be counted.
The updated Python program is
ICRDBStatsReport.py
This is ready for testing on DEV.
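For testers, a rough sketch of the revised counting rule follows, assuming the report now counts every term by the month of its first publication regardless of definition. The record layout and the CDR IDs are placeholders, not real documents, and the code is not taken from ICRDBStatsReport.py.

```python
from collections import Counter
from datetime import date

# Placeholder records: (CDR ID, first publication date, has a definition).
terms = [
    ("CDR0000000001", date(2015, 2, 20), False),
    ("CDR0000000002", date(2015, 3, 10), True),
    ("CDR0000000003", date(2015, 3, 15), False),
]


def new_terms_by_month(terms):
    """Count terms by (year, month) of first publication, ignoring definitions."""
    counts = Counter()
    for _cdr_id, first_pub, _has_definition in terms:
        counts[(first_pub.year, first_pub.month)] += 1
    return counts


counts = new_terms_by_month(terms)
```

Under this rule every published term is counted exactly once, in the month it was first published, so a term that gets its definition later no longer drops out of the totals.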
"the existence of the definition is not required anymore for a term to be counted"
~MBeckwit/~oseipokuw, is this what we want? Based on William's comment above, it sounds like we are interested in a term counting as a new term in the drug dictionary when it has been published WITH a definition, but I may have that wrong.
When we discussed this we agreed that all the terms would eventually end up in the dictionary so to avoid the problem, we don't have to require that a term have a definition before it is counted.
Ah, OK. That makes sense. Thanks!
These test documents are ready for publishing on DEV.
CDR0000759417
CDR0000759418
CDR0000759419
CDR0000759420
CDR0000759421
CDR0000759422
CDR0000759423
CDR0000759425
The documents have been published on DEV.
It looks like all were published with the exception of CDR0000759418 so 7 out of 8 were reported.
That's not exactly right. All terms were published but only 7 of those were reported as a Drug/Agent term. The missing one, CDR759422 (orbital/periocular basal cell carcinoma), has been entered as an index term.
Is this what you were expecting?
That is right. I looked at the doc history of the terms on DEV and didn't see at the time that CDR0000759418 had been marked as published and that confused me. I just checked again and they all have publication dates now. I think that is it. The report is displaying the correct number.
Verified on DEV.
The following program has been saved to subversion:
R13207: ICRDBStatsReport.py
I am using these documents to test on QA.
CDR0000771349
CDR0000771348
CDR0000771347
CDR0000771346
CDR0000771345
CDR0000771344
CDR0000771343
CDR0000771342
Please run a publishing job for the documents above on QA.
The documents have been published on QA.
Verified on QA. Thanks!
We were in luck because Curie had just been released before the new
report got created.
Please verify the new numbers and close this ticket.
Verified on PROD. Thanks!