Issue Number | 4226 |
---|---|
Summary | [Term] Use current API for retrieval of concepts from the NCI Thesaurus |
Created | 2017-02-02 11:31:29 |
Issue Type | Improvement |
Submitted By | Kline, Bob (NIH/NCI) [C] |
Assigned To | Kline, Bob (NIH/NCI) [C] |
Status | Closed |
Resolved | 2017-02-08 16:07:55 |
Resolution | Fixed |
Path | /home/bkline/backups/jira/ocecdr/issue.202403 |
Tracy Safran (EVS Support) tells us that we have been using an unsupported version of the API to retrieve information from the NCI Thesaurus. We will need to migrate to the current API before too long. I have asked Tracy to give us an idea of the timeframe we're working with. The level of effort on our end would not be trivial. We will have to change the host and request syntax for retrieving the concept information. In addition, the data structures returned are completely different from those we've been working with, so our parser will have to be rewritten. I estimate a day or two of development, plus testing and deployment.
While working on this task, I have discovered a number of bugs, unexplained inconsistencies, and other anomalies in the software that creates and updates CDR Term documents from the NCI Thesaurus. I understand that several years ago there were a number of requests for custom logic in the import software, and that logic may no longer be appropriate. I suggest that we rationalize it so that the OtherName and Definition blocks are the same after an update as they would be if the Term document were being newly created. I would also like to drop the special handling for drug terms, for which I haven't been able to find a documented rationale. For example, the documents for drugs suppress the distinction between a synonym and a lexical variant. All the other Term documents preserve that distinction. ~oseipokuw: is this a reasonable proposal?
To summarize what we do for new Term documents:
- We create an OtherName block for each FULL_SYN, CAS_Registry, and NSC_Code property that has a unique normalized name. FULL_SYN properties with a source other than "NCI" are ignored. We no longer receive IND_Code properties, because CIAT requested that they be suppressed as containing confidential information.
- We create a Definition block for each unique definition that has a source of "NCI".
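The two rules above can be sketched in Python as follows. This is a minimal illustration, not the actual import code. The `Property` record, its field names, and the `normalize()` rule are hypothetical. The `normalize()` rule collapses whitespace and ignores case. The ticket does not say how names are normalized or how the concept record is structured in the new API.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Property types that yield OtherName blocks (per the ticket).
NAME_PROPERTIES = {"FULL_SYN", "CAS_Registry", "NSC_Code"}


@dataclass
class Property:
    name: str                     # property type, e.g. "FULL_SYN"
    value: str                    # the name or registry code itself
    source: Optional[str] = None  # term source; only checked for FULL_SYN


def normalize(text: str) -> str:
    """Hypothetical normalization: collapse whitespace and ignore case."""
    return re.sub(r"\s+", " ", text).strip().lower()


def other_names(properties: List[Property]) -> List[Property]:
    """Keep one name property per normalized value, in first-seen order."""
    seen = set()
    kept = []
    for prop in properties:
        if prop.name not in NAME_PROPERTIES:
            continue
        if prop.name == "FULL_SYN" and prop.source != "NCI":
            continue  # synonyms from other sources are ignored
        key = normalize(prop.value)
        if key not in seen:
            seen.add(key)
            kept.append(prop)
    return kept


def definitions(defs: List[Tuple[str, str]]) -> List[str]:
    """Keep each unique (normalized) definition whose source is NCI."""
    seen = set()
    kept = []
    for text, source in defs:
        key = normalize(text)
        if source == "NCI" and key not in seen:
            seen.add(key)
            kept.append(text)
    return kept
```

With this sketch, building the blocks for a new Term document and rebuilding them on update become the same call. That is the consistency proposed above.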
A couple of additional observations/questions:
1. There's a note in the code saying that we should start populating the NCIThesaurusConcept element, which was added a while back. Should we have the software start doing that, ~oseipokuw?
2. In retrospect, it's unfortunate that the name "OtherName" was adopted for the name properties. It's confusing to have the preferred name itself appear in the "OtherName" list, since it isn't really a different name. I doubt there's anything we can do about that at this point without a great deal of expensive rework.
1. Yes, please populate the NCIThesaurusConcept element; we currently do that manually. We will continue to add it manually if it is not present at the time of the initial import or during an update.
That pretty much can't happen. If the NCI Thesaurus were to give us a concept record without a concept ID (which in itself is hard to imagine) we'd refuse to import it, because we wouldn't have the confirmation we require that they gave us back the right record to match the import requested (by code) by the user.
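The confirmation check described here can be sketched as follows. The record is assumed to be a parsed dictionary, and the `"code"` key is a hypothetical field name for the concept ID. The sketch refuses the import when the code is missing and when it differs from the code the user requested.

```python
from typing import Any, Dict


class ConceptMismatch(Exception):
    """The returned record cannot be confirmed as the requested concept."""


def confirm_concept(requested_code: str, record: Dict[str, Any]) -> Dict[str, Any]:
    """Return the record only if it carries the concept code that was requested.

    The "code" key is a hypothetical field name for the concept ID.
    """
    returned = (record.get("code") or "").strip()
    if not returned:
        # No concept ID at all: we cannot confirm this is the right record.
        raise ConceptMismatch(f"no concept code in record requested as {requested_code}")
    if returned != requested_code.strip():
        raise ConceptMismatch(f"requested {requested_code}, received {returned}")
    return record
```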
I'm proceeding with modifying the software to create the same OtherName and Definition blocks on update as would be created if the Term document were being newly imported (see comment above). The one piece we don't update is the PreferredName element. As a safety measure (to avoid updating from the wrong concept record), the software fails the update if the PreferredName element in the CDR Term document being updated does not match the preferred name in the concept record. In the rare case in which EVS changes the preferred name of a concept, the CDR user must first manually update the Term document's PreferredName to match the preferred name in the concept record. The user can then proceed with the refresh request.
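A minimal sketch of the refresh rule follows. It models the Term document and concept record as plain dictionaries with hypothetical keys. It also assumes that "match" means exact string equality, which the comment does not specify. The name blocks are rebuilt wholesale, as for a new import. PreferredName is only checked and is never written.

```python
from typing import Any, Dict


class PreferredNameMismatch(Exception):
    """The Term document's PreferredName differs from the concept record's."""


def refresh_term(term: Dict[str, Any], concept: Dict[str, Any]) -> Dict[str, Any]:
    """Return an updated copy of the term, rebuilt from the concept record.

    OtherName and Definition data are replaced wholesale (as for a new
    import); PreferredName is never changed, only compared (exact match
    is an assumption).
    """
    if term["preferred_name"] != concept["preferred_name"]:
        raise PreferredNameMismatch(
            f"Term has {term['preferred_name']!r}, concept has "
            f"{concept['preferred_name']!r}; fix PreferredName manually first"
        )
    updated = dict(term)  # shallow copy; the caller's dict is left untouched
    updated["other_names"] = list(concept["other_names"])
    updated["definitions"] = list(concept["definitions"])
    return updated
```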
Would I be correct in assuming the software should always set the Public attribute to Yes?
This is ready for testing on DEV.
Documentation for the new module (for the development team):
Yes.
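The exchange above establishes two points. The software populates NCIThesaurusConcept on import and update, and the Public attribute is always "Yes". A minimal sketch of that step uses the standard library's ElementTree. The element and attribute names come from this ticket. The code value is a placeholder. Appending the element at the end of the document is an assumption, because the CDR schema may require a specific position.

```python
import xml.etree.ElementTree as ET


def set_thesaurus_concept(term: ET.Element, concept_code: str) -> ET.Element:
    """Create or update the NCIThesaurusConcept child with Public="Yes".

    Appending at the end of the Term element is an assumption; the real
    schema may require the element at a specific position.
    """
    elem = term.find("NCIThesaurusConcept")
    if elem is None:
        elem = ET.SubElement(term, "NCIThesaurusConcept")
    elem.text = concept_code
    elem.set("Public", "Yes")
    return elem


if __name__ == "__main__":
    root = ET.fromstring("<Term><PreferredName>aspirin</PreferredName></Term>")
    set_thesaurus_concept(root, "C12345")  # placeholder concept code
    print(ET.tostring(root, encoding="unicode"))
```

Calling the function again on an update reuses the existing element rather than adding a second one.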
So we should expect the element to always be populated when an import is successful, right? If so, does that mean we won't need the "Thesaurus Concepts Not Marked Public" in the future?
You would only need it for documents created manually.
I have attached the spreadsheet William requested, listing mismatched preferred names. Bear in mind that this is data from DEV, not PROD.
I confirmed that the OtherName values in our Term documents are used in the support for searching clinical trials on the web site.
Verified on DEV.
Verified on QA.
Can you please run the attached report on PROD and include a column for Semantic Type?
Please open a separate ticket. Are we just going to be looking for mismatched preferred names, ignoring discrepancies in the other names?
File Name | Posted | User |
---|---|---|
mismatched-preferred-terms.xlsx | 2017-02-09 15:31:42 | Kline, Bob (NIH/NCI) [C] |