CDR Tickets

Issue Number 4226
Summary [Term] Use current API for retrieval of concepts from the NCI Thesaurus
Created 2017-02-02 11:31:29
Issue Type Improvement
Submitted By Kline, Bob (NIH/NCI) [C]
Assigned To Kline, Bob (NIH/NCI) [C]
Status Closed
Resolved 2017-02-08 16:07:55
Resolution Fixed
Path /home/bkline/backups/jira/ocecdr/issue.202403
Description

Tracy Safran (EVS Support) tells us that we have been using an unsupported version of the API to retrieve information from the NCI Thesaurus. We will need to migrate to the current API before too long. I have asked Tracy to give us an idea of the timeframe we're working with. The level of effort on our end would not be trivial, as we will not only have to change the host and request syntax for retrieving the concept information, but the data structures returned are completely different than those we've been working with, so our parser will have to be rewritten (so somewhere on the order of a day or two of development, plus testing and deployment).

Comment entered 2017-02-06 08:13:52 by Kline, Bob (NIH/NCI) [C]

While working on this task, I have discovered a number of bugs, unexplained inconsistencies, and other anomalies in the software which creates/updates CDR Term documents from the NCI Thesaurus. I understand that several years ago there were a number of requests for custom logic in the import software which may no longer be appropriate. I suggest that we rationalize the logic in the software so that the OtherName and Definition blocks are the same after an update as they would have been if the Term document were being newly created. I would also like to drop the different handling for drug terms, for which I haven't been able to find a documented rationale (for example, the documents for drugs suppress the distinction between a synonym and a lexical variant, in contrast with all the other term documents, which make that distinction). Is this a reasonable proposal?

To summarize what we do for new Term documents:

  1. We create an OtherName block for each FULL_SYN, CAS_Registry, and NSC_Code property which has a unique normalized name, ignoring FULL_SYN properties with a source other than "NCI" (we no longer receive IND_Code properties, since CIAT requested that they be suppressed as containing confidential information).

  2. We create a Definition block for each unique definition which has a source of "NCI."
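The two rules above can be sketched roughly as follows. This is only an illustration, not the actual import code: the dictionary layout of the properties and definitions, the function names, and the normalization rule (collapse whitespace and fold case) are all assumptions made for the example.

```python
import re

# Property types that produce an OtherName block (from rule 1 above).
NAME_PROPERTIES = {"FULL_SYN", "CAS_Registry", "NSC_Code"}


def normalize(name):
    """Hypothetical normalization: collapse whitespace and fold case."""
    return re.sub(r"\s+", " ", name).strip().lower()


def select_other_names(properties):
    """Pick the properties which would each get an OtherName block.

    `properties` is assumed to be a list of dicts with "type", "value",
    and (for FULL_SYN) "source" keys.
    """
    seen = set()
    selected = []
    for prop in properties:
        if prop["type"] not in NAME_PROPERTIES:
            continue  # e.g. IND_Code, which we no longer receive
        if prop["type"] == "FULL_SYN" and prop.get("source") != "NCI":
            continue  # ignore synonyms from other sources
        key = normalize(prop["value"])
        if key in seen:
            continue  # keep only the first occurrence of a normalized name
        seen.add(key)
        selected.append(prop)
    return selected


def select_definitions(definitions):
    """Pick the unique definition strings whose source is NCI.

    `definitions` is assumed to be a list of dicts with "source" and
    "text" keys.
    """
    seen = set()
    selected = []
    for definition in definitions:
        if definition.get("source") != "NCI":
            continue
        text = definition["text"].strip()
        if text in seen:
            continue
        seen.add(text)
        selected.append(text)
    return selected
```

Because the same two functions would be called for both a new import and an update, the resulting blocks would be identical in either case, which is the point of the proposal.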

A couple of additional observations/questions:

  1. There's a note in the code that we should start populating the NCIThesaurusConcept element which was added a while back; should we have the software start doing that?

  2. In retrospect, it's unfortunate that the name "OtherName" was adopted for the name properties, as it's confusing to have the preferred term name itself appearing in the "OtherName" list, since it isn't really a different name. I doubt there's anything we can do about that at this point, without a very expensive amount of work.

Comment entered 2017-02-07 13:12:16 by Osei-Poku, William (NIH/NCI) [C]

1. Yes, please populate the NCIThesaurusConcept element, as we currently do that manually. We will continue to add it manually if it is not present at the time of the initial import or during an update.

Comment entered 2017-02-07 18:11:58 by Kline, Bob (NIH/NCI) [C]

That pretty much can't happen. If the NCI Thesaurus were to give us a concept record without a concept ID (which in itself is hard to imagine) we'd refuse to import it, because we wouldn't have the confirmation we require that they gave us back the right record to match the import requested (by code) by the user.
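A minimal sketch of that confirmation step, assuming the concept record has been parsed into a dictionary with a "code" key (the function name, the key, and the case-insensitive comparison are hypothetical, not taken from the actual module):

```python
def confirm_concept(requested_code, concept):
    """Refuse the import unless the record carries the requested code.

    Returns the concept ID from the record when it matches the code the
    user asked for; raises ValueError otherwise.
    """
    returned = (concept.get("code") or "").strip()
    if not returned:
        raise ValueError("concept record has no concept ID; import refused")
    # Comparing case-insensitively is an assumption made for this sketch.
    if returned.upper() != requested_code.strip().upper():
        raise ValueError(
            f"requested {requested_code!r} but received {returned!r}"
        )
    return returned
```

With a check like this in front of the import, a successful import always has a concept ID available to store in the NCIThesaurusConcept element.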

Comment entered 2017-02-07 18:48:05 by Kline, Bob (NIH/NCI) [C]

I'm proceeding with modifying the software to create the same OtherName and Definition blocks on update as would be created if the Term document were being newly imported (see comment above). The one piece we don't update is the PreferredName element. As a safety measure (to avoid updating from the wrong concept record), the software fails the update if the PreferredName element in the CDR Term document being updated does not match the preferred name in the concept record. In the rare case in which EVS changes the preferred name in a concept, the CDR user must first manually update the Term document's PreferredName to match the preferred name in the concept record, and then proceed with the refresh request.
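The safety check described above might look something like the following sketch. The function name and the comparison rule (ignoring case and differences in whitespace) are assumptions for illustration; the real software may compare the names differently.

```python
def check_preferred_name(cdr_preferred_name, concept_preferred_name):
    """Fail the update if the two preferred names do not match.

    Hypothetical comparison: case and runs of whitespace are ignored.
    """
    def fold(name):
        return " ".join(name.split()).lower()

    if fold(cdr_preferred_name) != fold(concept_preferred_name):
        raise ValueError(
            "PreferredName mismatch: CDR document has "
            f"{cdr_preferred_name!r}, concept record has "
            f"{concept_preferred_name!r}; update the Term document's "
            "PreferredName manually, then retry the refresh"
        )
```

The check only blocks the update; it never rewrites PreferredName itself, so the manual correction remains a deliberate user action.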

Comment entered 2017-02-07 21:10:33 by Kline, Bob (NIH/NCI) [C]

Would I be correct in assuming the software should always set the Public attribute to Yes?

Comment entered 2017-02-08 16:07:55 by Kline, Bob (NIH/NCI) [C]

This is ready for testing on DEV.

Comment entered 2017-02-09 09:16:19 by Kline, Bob (NIH/NCI) [C]

Documentation for new module (for the development team):

http://cdr-dev.cancer.gov/nci_thesaurus.html

Comment entered 2017-02-09 12:37:38 by Osei-Poku, William (NIH/NCI) [C]

Yes.

Comment entered 2017-02-09 12:42:52 by Osei-Poku, William (NIH/NCI) [C]

So we should expect to always have the element populated when an import is successful, right? If that is the case, does that mean we won't need the "Thesaurus Concepts Not Marked Public" in the future?

Comment entered 2017-02-09 15:32:35 by Kline, Bob (NIH/NCI) [C]

You would only need it for documents created manually.

Comment entered 2017-02-09 15:33:16 by Kline, Bob (NIH/NCI) [C]

I have attached the spreadsheet William requested for mismatched preferred name values. Bear in mind that this is data on DEV, not PROD.

Comment entered 2017-02-14 15:41:02 by Kline, Bob (NIH/NCI) [C]

I confirmed that the OtherName values in our Term documents are used in the support for searching clinical trials on the web site.

Comment entered 2017-02-16 18:29:22 by Osei-Poku, William (NIH/NCI) [C]

Verified on DEV.

Comment entered 2017-02-28 13:17:27 by Osei-Poku, William (NIH/NCI) [C]

Verified on QA.

Comment entered 2017-05-01 19:05:02 by Osei-Poku, William (NIH/NCI) [C]

Can you please run the attached report on PROD and include a column for Semantic Type?

Comment entered 2017-05-01 19:22:11 by Kline, Bob (NIH/NCI) [C]

Please open a separate ticket. We're just going to be looking for mismatched preferred names, ignoring discrepancies in the other names?

Attachments
File Name Posted User
mismatched-preferred-terms.xlsx 2017-02-09 15:31:42 Kline, Bob (NIH/NCI) [C]
