Issue Number | 3588 |
---|---|
Summary | Links from Drug Dictionary to NCI Thesaurus |
Created | 2013-03-06 15:13:46 |
Issue Type | Improvement |
Submitted By | Osei-Poku, William (NIH/NCI) [C] |
Assigned To | Englisch, Volker (NIH/NCI) [C] |
Status | Closed |
Resolved | 2013-09-26 09:02:57 |
Resolution | Fixed |
Path | /home/bkline/backups/jira/ocecdr/issue.107916 |
BZISSUE::5287
BZDATETIME::2013-03-06 15:13:46
BZCREATOR::William Osei-Poku
BZASSIGNEE::Volker Englisch
BZQACONTACT::William Osei-Poku
NCI Thesaurus concept codes/IDs are added to Term documents (using the NCIThesaurusConcept element), which are in turn used to create links from the Drug Dictionary to the NCI Thesaurus on Cancer.gov. However, there is always a lag between when we manually add the concept code to the CDR Term document and when the ID (term) is made available in the thesaurus (by NCI Thesaurus staff). During this period, the link from the Drug Dictionary on Cancer.gov to the NCI Thesaurus doesn't work. As suggested by Margaret in the CIAT meeting yesterday, we want to implement a solution similar to the one for Glossary terms that have not been published yet but have been linked to by other CDR documents. We would like to discuss this and explore other solutions to the problem.
Let's talk about the options at this Thursday's meeting.
This will involve:
1. A new attribute on the Term document (thesaurus record is public)
2. Filter change to avoid creating link in the published doc if thesaurus record isn't public
3. Report of Term documents with thesaurus links not marked public
4. Possible enhanced version of 3 to include realtime check of thesaurus
5. Global change job to populate existing Term docs with the new attribute
Mary and I discussed this report, and she indicated that it will be helpful if there is a real-time check of the thesaurus as suggested in #4 above. The inclusion of the possible new attribute and the changes to the vendor filter will fix the dead links on Cancer.gov. Reporting thesaurus links not marked public (#3) will mean checking about 30 to 40 terms each week to see whether they are available on the public web site of the NCI Thesaurus. Currently she is only checking the 'high profile' terms, which is just a few terms.
Volker:
Can I help with parts of this? How about if I do the schema change, the report, and the global change, leaving you to wrestle with the filter changes?
That would certainly help a great deal. Aren't you already working on the next EBMS release?
Not yet. I'll jump in on those parts of this task.
Schema has been modified on DEV, and a test-mode global change has been run. Please review the results:
https://cdr.dev.cancer.gov/cgi-bin/cdr/ShowGlobalChangeTestResults.py?dir=2013-09-24_17-26-01
In order to keep on track for the release schedule, we'll need to keep the momentum going on this issue. I'm going to proceed with a live run of the global change on DEV soon, without waiting too much longer for the review of the test run.
William, could you give me a sample of one of these terms with a broken link on Cancer.gov?
Ignore my last comment. I found what I was looking for.
When there is a NCIThesaurusConcept ID the text displayed is:
Check for active clinical trials or closed clinical trials using this agent. (NCI Thesaurus)
I thought we wanted to remove the link like we do with the glossary
terms but that wouldn't make sense here since the text (NCI
Thesaurus) would still be displayed.
Am I correct that we want to remove the text (NCI Thesaurus)
when the concept ID isn't public?
Yes, I think that text should be removed if we can't make the link.
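The display rule agreed on above can be sketched as follows. This is only an illustration in Python; the real change lives in the XSLT vendor filter, and the function name `trial_link_text` and its boolean parameter are hypothetical.

```python
BASE_TEXT = ("Check for active clinical trials or closed clinical trials "
             "using this agent.")


def trial_link_text(concept_is_public: bool) -> str:
    """Return the sentence shown for a drug term.

    The "(NCI Thesaurus)" suffix is kept only when the concept is
    public; otherwise it is dropped along with the link.
    """
    if concept_is_public:
        return BASE_TEXT + " (NCI Thesaurus)"
    return BASE_TEXT
```

Dropping the suffix together with the link avoids leaving the words "(NCI Thesaurus)" on the page with nothing to click.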
The following filter has been modified in order to remove the NCI Thesaurus link:
CDR000134.xsl: Vendor Filter: Term
I will run a couple of tests once the data has been updated with the new attribute.
I have reviewed several of the terms in the test run and the live documents, and the new attribute appears to have been added correctly. But I do have a question about the timing of the global. Was the global applied only to terms that are in the thesaurus, or was it applied to all terms?
I'm not sure what the relationship between scope and timing would be, but since we decided to use the same Public attribute we have on email addresses, the only valid value is "Yes". So yes, the attribute was only added to terms which are in the publicly available database, not the ones you get in advance of publication on the spreadsheets sent to you. If you would prefer to change the definition of this attribute to allow values of both "Yes" and "No", we can make the attribute required and apply it to all NCIThesaurusConcept elements. That would actually make the report software for this issue more efficient.
One tip for the folks entering the concept IDs: they are treated as case-sensitive by the thesaurus software, so you may find some marked as not yet public which you didn't expect. For example, c97336 is not found, but C97336 is.
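As a rough illustration of the two points above (the `Public` attribute and the case-sensitive concept IDs), here is a minimal Python sketch. It is not the CDR report code. The document layout (the concept code stored as the text of `NCIThesaurusConcept`) and the helper names `normalize_concept_code` and `concepts_not_public` are assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from typing import List


def normalize_concept_code(code: str) -> str:
    """Trim whitespace and upper-case a concept code (c97336 -> C97336)."""
    return code.strip().upper()


def concepts_not_public(term_xml: str) -> List[str]:
    """Return normalized codes for concepts not marked Public="Yes".

    Assumes each NCIThesaurusConcept element holds the concept code
    as its text content (hypothetical layout for this sketch).
    """
    root = ET.fromstring(term_xml)
    missing = []
    for node in root.iter("NCIThesaurusConcept"):
        if node.get("Public") != "Yes":
            missing.append(normalize_concept_code(node.text or ""))
    return missing
```

Normalizing the code before looking it up would avoid reporting a concept as "not yet public" only because it was typed in lower case.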
Volker,
For test purposes, I have modified the following terms so that you can
run your tests:
CDR0000539704
CDR0000539695
CDR0000539100
CDR0000038786
CDR0000540668
CDR0000539705
CDR0000543726
CDR0000544572
CDR0000544718
CDR0000544742
CDR0000544743
CDR0000041197
Bob,
I asked the question because from my testing, I didn't come across any
terms that had not been assigned the new attribute so I assumed that all
the terms with concept IDs had been assigned the new value (whether they
were in the thesaurus or not). One more question for you, what is the
easiest way to find terms with concept ids that are not yet in the
thesaurus?
I'll answer that question as soon as I have finished implementing step 3 above. :-)
Report has been implemented on DEV (including real-time check of thesaurus for item 4 above, reflected in third column of report). Ready for user testing:
All five of the tasks for this issue have been implemented (on DEV).
Please add two columns to the report and display data from the DateLastModified and SemanticType elements.
For test purposes, I have modified the following terms so that you can run your tests:
On which machine is this - QA or DEV?
On DEV.
I finished testing. The link to the NCI Thesaurus is dropped on DEV and listed on QA.
The two new columns have been added to the report.
Verified on DEV.
Did you want the value to be applied to all the NCIThesaurusConcept elements, using "Yes" and "No" as the valid values?
The following filter has been versioned in SVN:
R12050: CDR000134.xml (Vendor Filter: Term)
The filter needs to be installed using the following command (typed on one line):
$ updateFilter.py <username> <password> CDR0000000134.xml --docid=134 --version=Y
  --publishable=Y
  --comment="R12050 (OCECDR-3588): Suppress NCI Thesaurus link"
That will be good. It will leave little or no room for misinterpretation. Also, please place the report on the reports menu under Terminology/Other Reports.
Changes made; ready for review on DEV. You have CDR38786 locked, William, so the global wasn't able to update that document.
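For readers unfamiliar with what the global change does to each document, the transformation can be sketched roughly as below. This is an assumption-laden illustration, not the actual global change job: the function name `mark_public`, the `public_codes` set (standing in for a lookup against the thesaurus), and the document layout are all hypothetical.

```python
import xml.etree.ElementTree as ET


def mark_public(term_xml, public_codes):
    """Set Public="Yes" or "No" on every NCIThesaurusConcept element.

    public_codes is a set of concept codes known to be public; the
    comparison is case-sensitive, mirroring the thesaurus behavior
    described earlier in this thread.
    """
    root = ET.fromstring(term_xml)
    for node in root.iter("NCIThesaurusConcept"):
        code = (node.text or "").strip()
        node.set("Public", "Yes" if code in public_codes else "No")
    return ET.tostring(root, encoding="unicode")
```

A document that is locked by a user, as with CDR38786 here, would simply be skipped by such a job and need a later pass.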
I tried to manually update one record to set the value to 'No' but I am getting a schema validation error (DEV).
Did you log out of XMetaL and back in?
I just did and I can now see the new value. All changes are now verified.
The report has also been installed on the admin menu as requested (DEV).
Yes. I did verify the report on DEV yesterday.
I get the following error when I run the report from the admin menu:
502 - Web server received an invalid response while acting as a gateway or proxy server.
There is a problem with the page you are looking for, and it cannot be displayed. When the Web server (while acting as a gateway or proxy) contacted the upstream content server, it received an invalid response from the content server.
I was able to run the report without any problems, so perhaps this was a temporary glitch (possibly caused by sluggish performance on the QA server, which is abysmally slow). Could you give it another try, please?
Yes. I am able to run the report successfully now. Thanks!
Verified on QA.
The global change job is running on PROD, but it will take a while (almost 10K docs to update). Hold off on trying the report until I let you know the global change job has finished.
The global change job completed successfully. You can run the report on production now.
It appears the schema changes are not installed yet. We are getting DTD validation errors when accessing some of the terminology files with the new attribute, such as this term: CDR0000042613. I have logged out and logged back into XMetaL with the same results.
The repository had the schema changes (so the global change job did not invalidate documents which would otherwise be valid), but somehow the job to update the DTD for the client didn't take. The problem has been corrected. Please log out, log back in, and try again.
Verified on Prod. and Cancer.gov