Issue Number | 4973 |
---|---|
Summary | Connection to EVS Server Failing |
Created | 2021-04-27 17:02:56 |
Issue Type | Improvement |
Submitted By | Englisch, Volker (NIH/NCI) [C] |
Assigned To | Englisch, Volker (NIH/NCI) [C] |
Status | Closed |
Resolved | 2021-04-30 17:45:53 |
Resolution | Fixed |
Path | /home/bkline/backups/jira/ocecdr/issue.289703 |
Our nightly job for processing the terminology documents and updating the Drug Dictionary entries needs to be updated. As it turns out, we've experienced three different errors in the past week that we want to handle.
We access the EVS to extract the "preferred name", which makes this name independent of edits in the CDR. We send the C-code to retrieve this information. When the term document doesn't include a C-code, the software reports an error.
As happened yesterday, when there are network issues we may not get a response from the EVS. In this case, the software reports:
"failure resolving C1222"
If we send a nonexistent C-code, as we do for CDR800686 and CDR802479, we get a third type of error message:
"The concept code C1666138 is invalid."
We should catch these errors and report invalid data.
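One way to catch these errors and report invalid data is to give each of the three failure modes its own exception, so a caller can tell a missing C-code from a network outage from a nonexistent code. The sketch below is hypothetical: the exception classes, the `lookup_preferred_name` function, and the injected `fetch` callable (a stand-in for the real EVS request) are illustrative assumptions, not CDR code. Only the two error messages come from this ticket.

```python
import re
from typing import Callable, Optional


class ConceptLookupError(Exception):
    """Base class for the hypothetical lookup failures sketched here."""


class MissingConceptCode(ConceptLookupError):
    """The term document has no C-code, so there is nothing to look up."""


class EVSUnavailable(ConceptLookupError):
    """No usable response from the EVS, e.g. because of a network problem."""


class InvalidConceptCode(ConceptLookupError):
    """The EVS does not recognize the C-code."""


# Assumption: unknown codes come back with the message quoted in this ticket.
INVALID_CODE = re.compile(r"The concept code (C\d+) is invalid\.")


def lookup_preferred_name(code: Optional[str], fetch: Callable[[str], str]) -> str:
    """Return the preferred name for `code`, raising a specific error on failure.

    `fetch` is a hypothetical stand-in for the EVS call. It is assumed to
    return text (the preferred name, or the "is invalid" message) and to
    raise an OSError subclass (ConnectionError, urllib's URLError, ...)
    when the network fails.
    """
    code = (code or "").strip()
    if not code:
        # Case 1: no C-code in the term document; don't call the EVS at all.
        raise MissingConceptCode("term document has no C-code")
    try:
        text = fetch(code)
    except OSError as exc:
        # Case 2: network problem; keep the original exception as the cause.
        raise EVSUnavailable(f"failure resolving {code}") from exc
    if INVALID_CODE.search(text):
        # Case 3: the EVS answered, but the code does not exist.
        raise InvalidConceptCode(text)
    return text
```

A caller (for example, the nightly job) can then catch `ConceptLookupError` once and report the specific subclass and message for each term.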
~oseipokuw, the CDR-IDs I listed above are incorrect. The correct CDR-IDs are 800069 (C167220 instead of C167720) and 799903 (C166138 instead of C1666138).
We could exclude any term document without a C-code from the filter query. That would eliminate those terms from being included in the vendor data and the Cancer.gov data.
I assume you mean the publishing query, not the filter query. Sounds good to me, as long as the terms which would be dropped aren't used elsewhere (such as by data partners to back up terms found in summaries).
I will make the modifications we decide we want for the `cdrapi.docs` module, and then reassign the ticket to you (~volker) to make any filter or publishing query enhancements. At the very least, I will optimize the case in which I get no concept ID from the filter, and not bother invoking the NCI/T API when that happens.
For all errors encountered by the `ncit-pn` command in the `cdrapi.docs` module, I am returning an empty string. Would you prefer that I throw an exception with an error message in some of the cases you identified above?
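A minimal sketch of the "empty string for every error" policy, assuming a hypothetical Python `ncit_pn` helper with an injected `fetch` callable for the EVS request (this is not the actual `cdrapi.docs` implementation). It skips the NCI/T API entirely when no concept ID is supplied, and still logs a distinct reason for each failure so the cases remain distinguishable in the log.

```python
import logging
import re
from typing import Callable, Optional

LOGGER = logging.getLogger("ncit-pn-sketch")  # hypothetical logger name

# Assumption: unknown codes come back with the message quoted in this ticket.
INVALID = re.compile(r"The concept code C\d+ is invalid\.")


def ncit_pn(code: Optional[str], fetch: Callable[[str], str]) -> str:
    """Return the NCIt preferred name, or "" if it can't be determined."""
    code = (code or "").strip()
    if not code:
        # Optimization: we know the API call would fail, so skip it.
        LOGGER.warning("ncit-pn: no concept code supplied")
        return ""
    try:
        result = fetch(code)
    except OSError:
        # Network failure (ConnectionError, URLError, timeouts, ...).
        LOGGER.error("ncit-pn: failure resolving %s", code)
        return ""
    if INVALID.search(result):
        LOGGER.error("ncit-pn: %s", result)
        return ""
    return result
```

Compared with raising exceptions, this keeps the filter producing output for every term, at the cost of having to read the log to learn why a name is empty.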
Yes, we'll have to discuss with ~mbeckwit and ~oseipokuw how those terms might be used before we attempt to drop them.
What exactly happens further downstream with those terms that have an empty string? Will those terms be ignored for the drug dictionary or will they be included with a non-preferred name? I haven't tested that yet and would like to know exactly what damage this situation is causing.
Once you stop calling the NCI/T API without a C-code, the number of errors in the log file will drop dramatically and it will be much easier to identify real issues. However, I was a little surprised that we've been seeing the same error for weeks and it was silently ignored because we didn't know to look for it. I suggest either creating a weekly report that scans the log files for these errors (invalid C-code or connection error), or reporting them at the end of each publishing run so they can be addressed immediately.
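The suggested report could start as a simple log scan. This sketch assumes the log contains the two messages quoted earlier in this ticket ("failure resolving <code>" and "The concept code <code> is invalid."); the log format and the `summarize_evs_errors` name are assumptions. The message for a missing C-code isn't quoted in the ticket, so it isn't matched here.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, Iterable

# Assumption: these are the messages that appear in the publishing log.
PATTERNS = {
    "connection error": re.compile(r"failure resolving (C\d+)"),
    "invalid C-code": re.compile(r"The concept code (C\d+) is invalid"),
}


def summarize_evs_errors(lines: Iterable[str]) -> Dict[str, Counter]:
    """Map each error category to a Counter of the C-codes that hit it."""
    report: Dict[str, Counter] = defaultdict(Counter)
    for line in lines:
        for category, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                report[category][match.group(1)] += 1
    return dict(report)
```

Because an open text file iterates line by line, the function can be handed the log file directly; the resulting counts could be printed at the end of a publishing run or mailed weekly.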
The codes have been corrected.
Reporting incorrect concept codes so that we can correct them should be fine with us. We add the concept codes manually; they are used to create links to the Thesaurus, and in many cases this happens a few days, or a week or two, after the term is created and made publishable. I am not sure whether you're suggesting dropping all changes to a term because of a missing or incorrect code, but I think changes to the term should be published even if its concept code is missing or incorrect.
> What exactly happens further downstream with those terms that have an empty string? Will those terms be ignored for the drug dictionary or will they be included with a non-preferred name?
I believe they're still sent to the Elasticsearch index, but they just won't have an `nci_concept_name` property (and probably not an `nci_concept_id` property, either). That means you would be reducing the number of entries in the drug dictionary if you filtered out the documents without a concept ID in the publishing filter, which may not be a desirable outcome after all.
> Once you prevent calling the NCI/T API without a C-code it will cut down the number of errors in the log file dramatically and it will be much easier to identify real issues.
Well, if your filter continues to call the extension function even when you don't have a concept ID, I'll still be logging that condition, even though I'm optimizing away the invocation of the NCI/T API (since I know the call would fail in that case). It's true that the log entry will be smaller than it would be if I invoked the API and triggered an exception, but the log entry will still be there.
I recommend that you modify the filter to avoid calling the `ncit-pn` extension function when you see that you don't have a concept ID to pass to the function.
The `ncit-pn` function is called by the filter? I didn't realize that. I thought all the extension functions were part of the C++ code.
They were, originally. But the server doesn't have any C++ anymore. I re-implemented all of those extension functions in Python. Here's the relevant template:
```xml
<!--
===================================================================
Template to retrieve and display the NCI Thesaurus preferred name
for the concept code passed in $cname
=================================================================== -->
<xsl:template name = "getNCITName">
  <xsl:param name = "cname"
             select = "''"/>
  <xsl:element name = "NCITName">
    <xsl:value-of select = "document(concat('cdrutil:ncit-pn/', $cname))"/>
  </xsl:element>
</xsl:template>
```
> What exactly happens further downstream with those terms that have an empty string? Will those terms be ignored for the drug dictionary or will they be included with a non-preferred name?
I can answer my question now. From what I can find, the NCIConceptName is not used at all. If the concept name is empty but the concept ID exists, it can be assumed that the concept ID is invalid (unless the EVS lookup failed because of a network problem). The term, however, is still loaded to ES, and the drug term is still part of the dictionary and includes a link to the NCI Thesaurus, which is built with the C-code:
https://ncit.nci.nih.gov/ncitbrowser/ConceptReport.jsp?dictionary=NCI%20Thesaurus&code=C2039
as well as a link to clinical trials:
https://www.cancer.gov/about-cancer/treatment/clinical-trials/intervention/C101261
Both links will fail for an invalid C-code.
If the C-code (concept ID) is missing, we obviously can't retrieve a concept name from the NCI Thesaurus. The term is still included in the dictionary (though not every such term is), and the two links mentioned above are suppressed.
In other words, a missing NCIConceptName has no effect on Cancer.gov (as far as I can determine).
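Both links depend only on the C-code, which is why an invalid code yields well-formed but broken links, while a missing code suppresses them. The sketch below reproduces the two URL patterns from the examples quoted above; `concept_links` is a hypothetical helper for illustration, not the Cancer.gov front-end code.

```python
from typing import Dict, Optional
from urllib.parse import quote, urlencode

# URL patterns taken from the example links in this ticket.
NCIT_BROWSER = "https://ncit.nci.nih.gov/ncitbrowser/ConceptReport.jsp"
TRIALS_BASE = "https://www.cancer.gov/about-cancer/treatment/clinical-trials/intervention/"


def concept_links(concept_id: Optional[str]) -> Dict[str, str]:
    """Build the NCIt and clinical-trials links, or none if there is no C-code."""
    if not concept_id:
        # Missing C-code: both links are suppressed.
        return {}
    # quote_via=quote encodes the space as %20, matching the example URL.
    query = urlencode({"dictionary": "NCI Thesaurus", "code": concept_id},
                      quote_via=quote)
    return {
        "ncit": f"{NCIT_BROWSER}?{query}",
        "clinical_trials": TRIALS_BASE + quote(concept_id),
    }
```

Note that nothing here can detect an invalid code: `concept_links("C1666138")` produces links just as well-formed as a valid code would, which is why the EVS lookup is the only place such errors surface.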
I saw that some of the drugs without a concept ID were included in the dictionary while others were not. I still need to determine which of these terms are the lucky ones. It may be the existence of a definition block.
I talked to Blair, who confirmed that the NCIConceptName isn't displayed on the front end, and I can confirm that a term is included only if a definition exists.
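The findings above amount to two simple rules: a term appears in the dictionary only if it has a definition, and a term with a concept ID but an empty concept name is a candidate for review. A sketch, where the `DrugTerm` record and both function names are hypothetical:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DrugTerm:
    """Hypothetical, simplified view of a term document."""
    cdr_id: int
    name: str
    definition: Optional[str] = None
    concept_id: Optional[str] = None
    concept_name: str = ""  # NCIConceptName; not displayed on the front end


def in_drug_dictionary(term: DrugTerm) -> bool:
    """A term is included in the dictionary only if it has a definition."""
    return bool(term.definition and term.definition.strip())


def needs_concept_code_review(term: DrugTerm) -> bool:
    """Flag terms whose C-code produced no name.

    This usually means the C-code is invalid, but an EVS outage during
    publishing would produce the same symptom, so treat it as a hint.
    """
    return bool(term.concept_id) and not term.concept_name
```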
The following filter has been updated to avoid calling the function ncit-pn() if we don't have a concept ID:
CDR000134: Vendor Filter: Term
https://github.com/NCIOCPL/cdr-server/commit/5cc57b5
I ran a publishing job and a drug dictionary load. Looking at the log file I can confirm that the function ncit-pn() isn't called anymore for those terms without a concept ID.
The filter changes have been tested on the QA server. The function ncit-pn() doesn't get called if we don't have a concept ID.
The "ncit_pn" errors have been gone from PROD since Newton was implemented. Closing ticket.