PDQ Issues

Issue Number	4973
Summary	Connection to EVS Server Failing
Created	2021-04-27 17:02:56
Issue Type	Improvement
Submitted By	Englisch, Volker (NIH/NCI) [C]
Assigned To	Englisch, Volker (NIH/NCI) [C]
Status	Closed
Resolved	2021-04-30 17:45:53
Resolution	Fixed
Path	/home/bkline/backups/jira/ocecdr/issue.289703

Description

Our nightly job to process the terminology documents and update the entries for the Drug Dictionary needs to be updated. As it turns out, we've experienced three different errors in the past week that we want to handle.

We are accessing the EVS system to extract the "preferred name", which makes this name independent of edits in the CDR. We send the C-code to retrieve this information. When the term document doesn't include the C-code, the software reports an error.
As happened yesterday, when there are network issues we may not get a response from the EVS. In this case, the software reports:

failure resolving C1222
If we send a non-existent C-code, as we do for CDR800686 and CDR802479, we get a third type of error message:

"The concept code C1666138 is invalid."

We should catch these errors and report invalid data.

Comment entered 2021-04-27 17:15:05 by Englisch, Volker (NIH/NCI) [C]

~oseipokuw, the CDR-IDs I listed above are incorrect. The correct CDR-IDs are 800069 (C167220 instead of C167720) and 799903 (C166138 instead of C1666138).

Comment entered 2021-04-27 18:00:06 by Englisch, Volker (NIH/NCI) [C]

We could exclude any term document without a C-code from the filter query. That would eliminate those terms from being included in the vendor data and the Cancer.gov data.

Comment entered 2021-04-28 07:37:14 by Kline, Bob (NIH/NCI) [C]

I assume you mean the publishing query, not the filter query. Sounds good to me, as long as the terms which would be dropped aren't used elsewhere (such as by data partners to back up terms found in summaries).

Comment entered 2021-04-28 11:00:02 by Kline, Bob (NIH/NCI) [C]

I will make the modifications we decide we want for the cdrapi.docs module, and then reassign the ticket to you (~volker) to make any filter or publishing query enhancements. At the very least, I will optimize the case in which I get no concept ID from the filter, and not bother with the invocation of the NCI/T API when that happens.

For all errors encountered by the ncit-pn command in the cdrapi.docs module, I am returning an empty string. Would you prefer that I throw an exception with an error message in some of the cases you identified above?
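A minimal sketch of the behavior described in this comment, not the actual cdrapi.docs code: skip the lookup when there is no concept ID, and return an empty string for any failure. The names `ncit_preferred_name` and `fetch_name` are hypothetical; `fetch_name` stands in for whatever call really talks to the NCI/T API.

```python
import logging

logger = logging.getLogger("ncit-pn-sketch")


def ncit_preferred_name(concept_id, fetch_name):
    """Return the preferred name for concept_id, or "" on any problem.

    fetch_name is a hypothetical callable standing in for the real
    NCI/T API request; it takes a concept ID and returns a name.
    """
    concept_id = (concept_id or "").strip()
    if not concept_id:
        # No C-code from the filter: the lookup is known to fail,
        # so log the condition and skip the API call entirely.
        logger.warning("ncit-pn called without a concept ID")
        return ""
    try:
        return fetch_name(concept_id) or ""
    except Exception:
        # Network failures and invalid codes both end up here.
        logger.exception("failure resolving %s", concept_id)
        return ""
```

Raising an exception instead of returning "" would only require replacing the two `return ""` lines in the failure paths, so the choice can be made per error case.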

Comment entered 2021-04-28 11:07:49 by Englisch, Volker (NIH/NCI) [C]

Yes, we'll have to discuss with ~mbeckwit and ~oseipokuw how those terms might be used before we attempt to drop them.

Comment entered 2021-04-28 11:20:23 by Englisch, Volker (NIH/NCI) [C]

What exactly happens further downstream with those terms that have an empty string? Will those terms be ignored for the drug dictionary or will they be included with a non-preferred name? I haven't tested that yet and would like to know exactly the type of damage this situation is causing.

Once you prevent calling the NCI/T API without a C-code, it will cut down the number of errors in the log file dramatically and it will be much easier to identify real issues. However, I was a little surprised that we've been seeing the same error for weeks and that it has been silently ignored because we didn't know to look for it. I suggest creating a weekly report that scans the log files for these errors (invalid C-code or connection error), or reporting those errors at the end of a publishing run so that they can be addressed immediately.
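The report suggested above could start as something like the following sketch. It only matches the two message texts quoted in this ticket; the layout of the rest of each log line is not assumed, and the function name `summarize_ncit_errors` is made up for illustration.

```python
import re
from collections import defaultdict

# Patterns built from the two messages quoted in this ticket.
INVALID_CODE = re.compile(r"The concept code (C\d+) is invalid")
UNRESOLVED = re.compile(r"failure resolving (C\d+)")


def summarize_ncit_errors(lines):
    """Group concept IDs by error type for a short report."""
    report = defaultdict(set)
    for line in lines:
        match = INVALID_CODE.search(line)
        if match:
            report["invalid C-code"].add(match.group(1))
            continue
        match = UNRESOLVED.search(line)
        if match:
            report["failure resolving"].add(match.group(1))
    # Sorted lists make the report stable from run to run.
    return {kind: sorted(codes) for kind, codes in report.items()}
```

Because the function accepts any iterable of strings, an open log file can be passed to it directly, and the resulting dictionary can be formatted for a weekly email or appended to the end of a publishing run.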

Comment entered 2021-04-28 11:38:46 by Osei-Poku, William (NIH/NCI) [C]

The codes have been corrected.

Comment entered 2021-04-28 11:53:12 by Osei-Poku, William (NIH/NCI) [C]

Reporting incorrect concept codes so we can correct them should be OK with us. We add the concept codes manually so that they can be used to create links to the Thesaurus, and in many cases this happens a few days or a week or two after the term is created and made publishable. I am not sure whether you're suggesting dropping all changes to the term because of a missing or incorrect code, but I think changes to the term should be published even if a concept code is missing or incorrect.

Comment entered 2021-04-28 11:53:20 by Kline, Bob (NIH/NCI) [C]

What exactly happens further downstream with those terms that have an empty string? Will those terms be ignored for the drug dictionary or will they be included with a non-preferred name?

I believe they're still sent to the ElasticSearch index, but they just won't have an nci_concept_name property (probably not an nci_concept_id property, either). That would mean that you would be reducing the number of entries in the drug dictionary if you filter out the documents without a concept ID in the publishing filter. May not be a desirable outcome after all.

Once you prevent calling the NCI/T API without a C-code it will cut down the number of errors in the log file dramatically and it will be much easier to identify real issues.

Well, if your filter continues to call the extension function even when you don't have a concept ID, I'll still be logging that condition, even though I'm optimizing away the invocation of the NCI/T API (since I know that I'll fail in that case). It's true that the log entry will be smaller than it would be if I invoke the API triggering an exception, but the log entry will still be there.

I recommend that you modify the filter to avoid calling the ncit-pn extension function when you see that you don't have a concept ID to pass to the function.

Comment entered 2021-04-28 12:25:56 by Englisch, Volker (NIH/NCI) [C]

The ncit-pn function is called by the filter? I didn't realize that. I thought all the extension functions were part of the C++ code.

Comment entered 2021-04-28 12:55:41 by Kline, Bob (NIH/NCI) [C]

They were, originally. But the server doesn't have any C++ any more. I re-implemented all of those extension functions in Python. Here's the relevant template.

  <!--
 ===================================================================
 Template to retrieve and display the URL for the linked
 DrugInfo Summary
 =================================================================== -->
 <xsl:template                    name = "getNCITName">
  <xsl:param                      name = "cname"
                                select = "''"/>
  <xsl:element                    name = "NCITName">
   <xsl:value-of                select = "document(concat('cdrutil:ncit-pn/', $cname))"/>
  </xsl:element>
 </xsl:template>

Comment entered 2021-04-28 16:26:03 by Englisch, Volker (NIH/NCI) [C]

What exactly happens further downstream with those terms that have an empty string? Will those terms be ignored for the drug dictionary or will they be included with a non-preferred name?

I can answer my question now. From what I can find, the NCIConceptName is not used at all. Therefore, if the concept name is empty but the concept ID exists, it can be assumed that the concept ID is invalid. The term, however, is still loaded into ES and the drug term is still part of the dictionary and includes a link to the NCI Thesaurus, which is built with the C-code
https://ncit.nci.nih.gov/ncitbrowser/ConceptReport.jsp?dictionary=NCI%20Thesaurus&code=C2039

as well as a link to clinical trials
https://www.cancer.gov/about-cancer/treatment/clinical-trials/intervention/C101261

and these links will fail for that invalid C-code.

If the C-code (concept ID) is missing we obviously won't be able to retrieve a concept name from the NCI Thesaurus. The term is still included in the dictionary (not all terms though) and the two links mentioned above are suppressed.

In other words, a missing NCIConceptName has no effect on Cancer.gov (as far as I can determine).
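To illustrate the two links discussed above, here is a small sketch that builds them from a C-code and suppresses both when the code is missing. The URL patterns are copied from the examples in this comment; the function name `concept_links` is hypothetical, and this is not the code the front end actually uses.

```python
from urllib.parse import quote, urlencode

NCIT_BROWSER = "https://ncit.nci.nih.gov/ncitbrowser/ConceptReport.jsp"
TRIALS_BASE = (
    "https://www.cancer.gov/about-cancer/treatment/"
    "clinical-trials/intervention/"
)


def concept_links(concept_id):
    """Return the Thesaurus and clinical-trials links for a C-code.

    An empty dict means there is no concept ID, so no links are shown.
    """
    concept_id = (concept_id or "").strip()
    if not concept_id:
        return {}
    # quote_via=quote encodes the space as %20, as in the URL above.
    query = urlencode(
        {"dictionary": "NCI Thesaurus", "code": concept_id},
        quote_via=quote,
    )
    return {
        "thesaurus": f"{NCIT_BROWSER}?{query}",
        "trials": TRIALS_BASE + quote(concept_id),
    }
```

Note that nothing here can tell a valid C-code from an invalid one: a mistyped code still produces two well-formed URLs, which is why the broken links only show up when someone follows them.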

I saw that some of the drugs without a concept ID were included in the dictionary while others were not. I still need to identify what determines which of these terms are the lucky ones. It may be the existence of a definition block.

Comment entered 2021-04-29 19:48:37 by Englisch, Volker (NIH/NCI) [C]

I talked to Blair, and he confirmed that the NCIConceptName isn't displayed on the front end. I can also confirm that a term will only be included if a definition exists.

Comment entered 2021-04-29 19:52:18 by Englisch, Volker (NIH/NCI) [C]

The following filter has been updated to avoid calling the function ncit-pn() if we don't have a concept ID:

CDR000134: Vendor Filter: Term
https://github.com/NCIOCPL/cdr-server/commit/5cc57b5

Comment entered 2021-04-30 17:45:40 by Englisch, Volker (NIH/NCI) [C]

I ran a publishing job and a drug dictionary load. Looking at the log file I can confirm that the function ncit-pn() isn't called anymore for those terms without a concept ID.

Comment entered 2021-05-25 19:07:16 by Englisch, Volker (NIH/NCI) [C]

The filter changes have been tested on the QA server. The function ncit-pn() doesn't get called if we don't have a concept ID.

Comment entered 2021-06-22 09:59:29 by Englisch, Volker (NIH/NCI) [C]

The "ncit_pn" errors are gone on PROD since Newton was implemented. Closing ticket.
