Issue Number | 4916 |
---|---|
Summary | Create Ad-hoc Query for Drug Terms |
Created | 2020-11-13 17:12:26 |
Issue Type | Improvement |
Submitted By | Englisch, Volker (NIH/NCI) [C] |
Assigned To | Englisch, Volker (NIH/NCI) [C] |
Status | Closed |
Resolved | 2020-11-13 19:48:19 |
Resolution | Fixed |
Path | /home/bkline/backups/jira/ocecdr/issue.278607 |
Could you please generate an ad hoc query for all drug terms with the following fields:
Preferred name
Definition
CDR ID
NCI Thesaurus ID
This should be for all drug terms that have currently been published to Cancer.gov. If possible, would you include on a second tab all drug terms that are in the CDR but have not been published to Cancer.gov, along with their status (blocked or not)? Also, please indicate which ones have been made publishable but have not been published to Cancer.gov yet. I know a second tab might not be possible, so a second query would be fine.
Thank you!
William
Names for the ad-hoc queries:
Drug Terms - published to Drug Dictionary
Drug Terms - NOT published
The following path has to be added to the list of query terms. All term documents will need to be re-indexed on all tiers:
/Term/Definition/DefinitionText
Hi Volker,
Would you be able to modify the query to include Other Names in a column, separated by semicolons?
Thanks,
William
The queries have been created on all tiers. The drug terms have been reindexed.
Please check out the 2 new queries.
Thanks ~volker. When I run the first query, for drug terms published to Cancer.gov, it retrieves 7261 terms; however, on Cancer.gov the total appears to be 7845. I am not sure where the discrepancy is coming from, but it appears to be significant.
I see that Cancer.gov lists brand names as separate entries, which inflates its count compared to ours: a drug is counted multiple times, once for the preferred name and once for the brand name.
I'm still looking at the possibility that some terms aren't listing the C-code properly in the field for the NCIThesaurusConcept. A term without this element would be dropped from the list.
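A minimal sketch of how a report with the requested columns could be assembled, and why a term without a C-code drops out of it. Everything here is an assumption for illustration: the table is a simplified in-memory stand-in for an index of (document, path, value) rows, the paths other than /Term/Definition/DefinitionText are hypothetical, and the sample rows are made up. It is not the query that was actually created.

```python
import sqlite3

# Hypothetical, simplified stand-in for an index of (document, path, value) rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE query_term (doc_id INTEGER, path TEXT, value TEXT)")
rows = [  # made-up sample data
    (1, "/Term/PreferredName", "example drug A"),
    (1, "/Term/Definition/DefinitionText", "Definition of example drug A."),
    (1, "/Term/NCIThesaurusConcept", "C00001"),
    (1, "/Term/OtherName/OtherTermName", "brand A"),
    (1, "/Term/OtherName/OtherTermName", "code name A"),
    (2, "/Term/PreferredName", "example drug B"),
    (2, "/Term/Definition/DefinitionText", "Definition of example drug B."),
    # Document 2 has no NCIThesaurusConcept row.
]
conn.executemany("INSERT INTO query_term VALUES (?, ?, ?)", rows)

SQL = """
SELECT n.value  AS preferred_name,
       d.value  AS definition,
       n.doc_id AS cdr_id,
       c.value  AS nci_thesaurus_id,
       (SELECT group_concat(o.value, '; ')
          FROM query_term o
         WHERE o.doc_id = n.doc_id
           AND o.path = '/Term/OtherName/OtherTermName') AS other_names
  FROM query_term n
  JOIN query_term d ON d.doc_id = n.doc_id
                   AND d.path = '/Term/Definition/DefinitionText'
  JOIN query_term c ON c.doc_id = n.doc_id
                   AND c.path = '/Term/NCIThesaurusConcept'
 WHERE n.path = '/Term/PreferredName'
 ORDER BY n.doc_id
"""
report = conn.execute(SQL).fetchall()
for row in report:
    print(row)
```

Because the C-code is pulled in with an inner join, document 2 does not appear in the output. A LEFT JOIN on that column would keep such terms and show an empty C-code instead.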
I looked through the output for one letter (Z) to compare and find any differences between the Cancer.gov list and the ad-hoc query output:
The total number of drugs listed on Cancer.gov is 113
55 of those are brand names and need to be excluded: Total on Cancer.gov: 58
2 terms are missing on Cancer.gov because of an incorrect SemanticType (this is a bug in our export software)
Total on Cancer.gov: 60
Ad-hoc query lists 59
1 term is excluded from Cancer.gov because of a missing definition (missing ReviewStatus)
Total ad-hoc: 58
2 terms are excluded from ad-hoc query because of missing C-code
Total ad-hoc: 60
It appears the numbers are matching between both lists when the missing data elements are considered.
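The reconciliation for the letter Z can be written out as plain arithmetic. The counts are the ones given in the comparison; the variable names are only illustrative.

```python
# Counts for the letter Z, taken from the comparison in this ticket.
cancer_gov_listed = 113
brand_name_entries = 55              # listed separately on Cancer.gov, excluded here
missing_on_cancer_gov = 2            # incorrect SemanticType (export bug)

adhoc_listed = 59
in_adhoc_not_on_cancer_gov = 1       # missing definition (missing ReviewStatus)
missing_from_adhoc = 2               # missing C-code

# Adjust each list for the entries the other side does not count.
cancer_gov_total = cancer_gov_listed - brand_name_entries + missing_on_cancer_gov
adhoc_total = adhoc_listed - in_adhoc_not_on_cancer_gov + missing_from_adhoc

print(cancer_gov_total, adhoc_total)
```

Both adjusted totals come out to 60, which is why the two lists are considered to match.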
These are the terms with issues:
803278 - zelenoleucel: SemanticType
803287 - zirconium Zr 89-DFO-fianlimab: SemanticType
764414 - zirconium Zr 89-girentuximab: missing ReviewStatus
zirconium Zr 89-desferrioxamine B monoclonal antibody huJ591: missing C-code
zirconium Zr 89-labeled anti-PIGF monoclonal antibody RO5323441: missing C-code
zirconium Zr 89˗DFO˗REGN3504
For this term, the dash ('-') character is not a regular hyphen and is displayed as a question mark in the ad-hoc query.
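A small sketch of how such look-alike characters could be found in term names. The helper name `find_non_ascii` is hypothetical, and the sample string uses U+02D7 only as an assumed example of a dash look-alike; the last line shows one common way a character can end up as a question mark (encoding to a character set that lacks it), which may or may not be what happens in the ad-hoc query output.

```python
import unicodedata

def find_non_ascii(text):
    """Return (index, code point, Unicode name) for each non-ASCII character in text."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(text)
        if ord(ch) > 127
    ]

# Illustrative name with a look-alike dash written as an explicit escape
# (assumption: U+02D7 stands in for whatever character the real term contains).
sample = "Zr 89\u02d7DFO"
for index, code_point, name in find_non_ascii(sample):
    print(index, code_point, name)

# Encoding to a character set that lacks the character, with errors="replace",
# substitutes "?" for it.
print(sample.encode("latin-1", errors="replace"))
```

Running a check like this over all preferred names would list the terms that need the character replaced with a plain hyphen.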
Could you please include only terms with a Drug/Agent semantic type? Currently the spreadsheet includes terms that are not drug terms, such as:
37766 | stage 0 chronic lymphocytic leukemia |
37769 | adult lymphoblastic lymphoma |
37790 | Waldenström macroglobulinemia |
The query has been updated on PROD.
The errors have been fixed. Thank you!
Looks good on PROD. Thank you!
Adding offline conversation about this ticket.
From: Englisch, Volker (NIH/NCI) [C] <volker@mail.nih.gov>
Sent: Thursday, November 19, 2020 5:14 PM
To: Osei-Poku, William <William.Osei-Poku@icf.com>
Subject: Re: CDR oddity this week
In that case, why don’t we remove the definition from the query but keep the query with a comment and delete the query term?
Deleting the query_term will prevent the warning message from popping up with every save, and the comment should remind us what to do the next time we need to run this query if the definition needs to be included.
Plus, we should add a comment to the ticket.
Thanks,
Volker
Volker Englisch
NCI OCPL – Office of Communications & Public Liaison
Contractor: publicis sapient
NCI: 240-276-6583
From: "Osei-Poku, William" <William.Osei-Poku@icf.com>
Date: Thursday, November 19, 2020 at 3:48 PM
To: Volker Englisch <volker@mail.nih.gov>
Subject: RE: CDR oddity this week
We can remove it from the query terms and the report for now as we’ve already generated a spreadsheet to send to EVS. However, I am sure at some point, they will ask for another one so we may have to repeat this whole thing again.
Thanks,
William
From: Englisch, Volker (NIH/NCI) [C] <volker@mail.nih.gov>
Sent: Thursday, November 19, 2020 3:46 PM
To: Osei-Poku, William <William.Osei-Poku@icf.com>
Subject: Re: CDR oddity this week
I think the only way to avoid this would be to remove the DrugDefinitionText from the query terms and remove it from the new ad-hoc report or create the report via a Python script.
It’s just a notification and therefore not really something to be concerned about but I can see how it might get annoying.
Thanks,
Volker
From: "Osei-Poku, William" <William.Osei-Poku@icf.com>
Date: Thursday, November 19, 2020 at 3:22 PM
To: Volker Englisch <volker@mail.nih.gov>
Subject: RE: CDR oddity this week
Should we just ignore it?
Thanks,
William
From: Englisch, Volker (NIH/NCI) [C] <volker@mail.nih.gov>
Sent: Thursday, November 19, 2020 2:57 PM
To: Osei-Poku, William <William.Osei-Poku@icf.com>
Subject: Re: CDR oddity this week
I don’t think it’s a limit on drug definitions but on indexed terms. I had to add the drug definitions to the query_term index table for your ad-hoc report. That’s likely why you’re now seeing this because the query_term table gets updated with every save.
Thanks,
Volker
From: "Osei-Poku, William" <William.Osei-Poku@icf.com>
Date: Thursday, November 19, 2020 at 2:46 PM
To: Volker Englisch <volker@mail.nih.gov>
Subject: FW: CDR oddity this week
Hi Volker,
Are you aware that there is an 800-character limit on Drug definitions?
Thanks,
William
From: Barnstead, Mary (NIH/NCI) [C] <mary.barnstead@nih.gov>
Sent: Thursday, November 19, 2020 2:44 PM
To: Osei-Poku, William <William.Osei-Poku@icf.com>
Subject: RE: CDR oddity this week
Hi William,
I first started seeing it yesterday.
Thanks
Mary
From: Osei-Poku, William <William.Osei-Poku@icf.com>
Sent: Thursday, November 19, 2020 2:27 PM
To: Barnstead, Mary (NIH/NCI) [C] <mary.barnstead@nih.gov>
Subject: RE: CDR oddity this week
Hi Mary,
Is this the first time you’re seeing this, or have you seen it before?
Thanks,
William
From: Barnstead, Mary (NIH/NCI) [C] <mary.barnstead@nih.gov>
Sent: Thursday, November 19, 2020 8:53 AM
To: Osei-Poku, William <William.Osei-Poku@icf.com>
Subject: CDR oddity this week
Hi William,
In term records, I’ve started getting this message when there is a long definition:
Is this anything to worry about?
Thanks!
Mary
______________________________________________________________________________________
Mary Barnstead, MS PMP
CIAT Terminology and Drug Information Manager – NCI/ICF
(301) 407-6640 (office) (240) 449-9762 (cell)