PDQ Issues

Issue Number	119
Summary	Import Citation: Exception error when entering a 9-character PubMed ID
Created	2013-11-26 10:54:51
Issue Type	Bug
Submitted By	tanguturisk
Assigned To	alan
Status	Closed
Resolved	2014-01-06 10:30:05
Resolution	Fixed
Path	/home/bkline/backups/jira/oceebms/issue.115443

Description

Got an exception error when entering a 9-character PubMed ID on Import Citation.

Screenshot attached for more detail.

Error Message:
exception 'Exception' with message 'EbmsImport.store: Nothing to store' in /local/content/web/appdev/sites/ebms.nci.nih.gov/modules/custom/ebms/EbmsImport.inc:574
Stack trace:
#0 /local/content/web/appdev/sites/ebms.nci.nih.gov/modules/custom/ebms/EbmsImport.inc(993): Ebms\ImportBatch->store()
#1 /local/content/web/appdev/sites/ebms.nci.nih.gov/modules/custom/ebms/import.inc(419): Ebms\importArticlesFromNLM('live', Array, '51', '139', '', true, 'R')
#2 /local/content/web/appdev/includes/form.inc(1464): pdq_ebms_import_form_submit(Array, Array)
#3 /local/content/web/appdev/includes/form.inc(860): form_execute_handlers('submit', Array, Array)
#4 /local/content/web/appdev/includes/form.inc(374): drupal_process_form('pdq_ebms_import...', Array, Array)
#5 /local/content/web/appdev/includes/form.inc(131): drupal_build_form('pdq_ebms_import...', Array)
#6 /local/content/web/appdev/sites/ebms.nci.nih.gov/modules/custom/ebms/import.inc(81): drupal_get_form('pdq_ebms_import...', NULL)
#7 /local/content/web/appdev/sites/ebms.nci.nih.gov/modules/custom/ebms/import.inc(32): EbmsImport->run()
#8 [internal function]: pdq_ebms_import()
#9 /local/content/web/appdev/includes/menu.inc(517): call_user_func_array('pdq_ebms_import', Array)
#10 /local/content/web/appdev/index.php(25): menu_execute_active_handler()
#11 {main}

Comment entered 2013-12-06 00:37:49 by alan

The cause of this exception is an attempt to store records when there
are no records to store.  In this particular case, it happened because
of an attempt to download an article from Pubmed using a 9-digit Pubmed
ID that doesn't match any article at Pubmed.

The same error will also occur whenever Pubmed cannot return any
articles matching a request.  For example, a request for:

    Pubmed ID = 'abc'
    
will produce the same error.

We think, but cannot easily test, that the exception will also occur if
NLM has a problem and Pubmed is up but not answering requests properly
(which we think may have happened in OCEEBMS-121, which has been marked
as a duplicate of this bug.)

After discussing this with Bob, we think the proper fix should include:

 1. Do not raise an Exception.

    Exception messages should be used when we think there was an
    internal error of some kind that should never happen as a result of
    regular operations.  However, sending a single wrong Pubmed ID to
    NLM is perfectly possible in regular operation.

 2. Save the output from NLM for post-mortem analysis.

    In some cases it could be useful to see exactly what NLM returned
    for the search, so we'll find a way to save that when this type of
    error occurs.

 3. Compare the count of requested articles to the count of articles
    downloaded from NLM and notify users if not all were found.

    Currently, the Exceptions seen by Sridhar and Minaxi occurred
    because zero articles were successfully downloaded from NLM.
    However, if two articles were requested and only one had a bad Pubmed
    ID, the program would import the good article and silently ignore
    the one that was not found.

    If NLM were to return an error message for the article, we'd see
    that and report it.  But NLM is not doing that.  Apparently they 
    return the articles that were found and silently ignore the
    requested IDs that were not found.

    We should be able to do something more useful for those that aren't
    found.

 4. Display a more useful message to users.

    That is, something more like a regular EBMS error message explaining
    that NLM returned no articles matching our query.
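
A minimal sketch of the comparison described in item 3, written in Python
for brevity rather than in the PHP of the EBMS module.  The function name
and the message wording are hypothetical, not taken from EbmsImport.inc;
the only idea shown is comparing the requested IDs against the IDs that
actually came back.

```python
def summarize_import(requested, downloaded):
    """Return a user-facing message, or None if every article was found.

    `requested` and `downloaded` are lists of Pubmed ID strings.
    Hypothetical helper; not part of the EBMS code base.
    """
    wanted = list(dict.fromkeys(requested))  # de-duplicate, keep order
    got = set(downloaded)
    missing = [pmid for pmid in wanted if pmid not in got]

    if not missing:
        return None
    if len(missing) == len(wanted):
        # The case that currently raises "Nothing to store".
        return "NLM returned no articles matching the request."
    return "NLM returned {} of {} requested articles; not found: {}".format(
        len(wanted) - len(missing), len(wanted), ", ".join(missing))
```

With this shape, the zero-articles case becomes an ordinary message rather
than an exception, and a partially successful import names the IDs that
were silently dropped.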

I'll stop at this point and not do any more work on this until Robin or
someone prioritizes the issue.

Comment entered 2013-12-30 15:53:05 by alan

It turns out that this problem could also affect the issue Bob is working on, refreshing the article XML we get from NLM, since certain kinds of errors could occur there too. I'll consider this issue in light of the requirements for refreshing the data as well as for the original import.

A key difference between the refresh program and the imports done by Minaxi is that the refresh program will probably run unattended. Errors need to be handled in the edge cases where they are now ignored, and the handling can't simply display an error message on the screen and assume that a user is present to see it and decide on the spot whether something needs to be done.

Comment entered 2013-12-30 19:00:38 by alan

I've been doing some testing.  There seem to be four general
cases of errors that can occur.  Currently, some of these errors
are silently ignored in my code.

 1. Internet communications failure.

    Possible causes include DNS being down, a changed URL at NLM,
    NLM being down, or an Internet connection that failed or was
    closed by the other side during processing.

    In line with Bob's request, I propose to handle these as
    follows:

        Throw an exception containing the error message:

        "Unable to retrieve data from NLM: {curl error message}"

    For example:

        "Unable to retrieve data from NLM: Couldn't resolve host
        'eutils.ncbi.nlm.nih.gov'"

    For the import program, the calling program can either let
    this hit the screen (it should be very rare), or trap it and
    produce a less infelicitous message than an Exception report.

 2. Connection established but NLM returned a general error
    message:

    I'm inclined towards the same solution but with the message
    "Error returned by NLM ..."

    Example:

        "Error returned by NLM: Database: pubmed - is down for
        maintenance"

    That error message is found by parsing XML that looks like
    this:

        <?xml version="1.0" encoding="UTF-8"?>
        <eFetchResult>
              <ERROR>Database: pubmed - is down for maintenance</ERROR>
        </eFetchResult>

    I'm not sure that we'll always get back a parsable XML
    response in the expected format so, as a backup, if I don't
    recognize the error message format, but can see that I'm not
    getting the response I expect, I'll just dump the whole
    response string (or some reasonably sized excerpt) into my
    Exception message.

    Again, the import program might choose to trap this exception
    and present it to the user without the exception trappings.

 3. Pubmed finds a requested PMID but there is a problem with it.

    An example is an article that was in Pubmed, has been taken
    away for some reason (publisher withdrew it, it was a
    duplicate, etc.), but an error record is available.

    I propose to handle these, as now, using the error mechanism
    currently in place.  The PMID and the message will be
    included in the list of individual errors, one per PMID, in
    the ImportBatch object that is returned.

 4. A requested article ID is not known to Pubmed.

    This occurs if we send a wrong PMID, e.g., with an extra
    digit or an alpha character, but it appears that it may also
    occur with an article for which both the article and the PMID
    have gone up in smoke for unknown reasons.

    Pubmed appears to silently ignore these.  They always return
    a <PubmedArticleSet> XML container element which can have 0
    or more articles in it.  If I ask for one article and it
    isn't found, I get an empty element.  If I ask for three and
    one isn't found, I get a container with two articles in it
    and no mention whatever of the missing one.

    I propose to handle these using the same mechanism as in 3
    above.  I will put code in place to detect and identify a
    PMID for which no response was received and create an error
    record for it with an error message like:

        "PMID not recognized by Pubmed as a Pubmed ID".

Does that meet all requirements for both interactive importing
and batch refreshes?
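
To make cases 2 and 4 concrete, here is a rough Python sketch (not the
PHP actually used in EbmsImport.inc).  The names NlmError and
check_response are hypothetical, and the PubmedArticle/MedlineCitation/PMID
path is an assumption about the layout of the article XML.  A batch-level
problem raises an exception; an unknown PMID becomes a per-PMID error
record.

```python
import xml.etree.ElementTree as ET


class NlmError(Exception):
    """Hypothetical exception for failures that affect the whole batch."""


def check_response(xml_text, requested_pmids, excerpt_len=200):
    """Return (found_pmids, per_pmid_errors) for one NLM response."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        # Fallback: unrecognized format, so report an excerpt of the response.
        raise NlmError("Error returned by NLM: " + xml_text[:excerpt_len]) from None

    # Case 2: NLM answered, but with a general error message.
    if root.tag == "eFetchResult":
        error = root.find("ERROR")
        detail = error.text.strip() if error is not None and error.text else ""
        raise NlmError("Error returned by NLM: " + (detail or xml_text[:excerpt_len]))

    if root.tag != "PubmedArticleSet":
        raise NlmError("Error returned by NLM: " + xml_text[:excerpt_len])

    # Case 4: the container may silently omit IDs Pubmed does not know.
    found = {
        el.text.strip()
        for el in root.findall("./PubmedArticle/MedlineCitation/PMID")
        if el.text
    }
    errors = {
        pmid: "PMID not recognized by Pubmed as a Pubmed ID"
        for pmid in requested_pmids
        if pmid not in found
    }
    return found, errors
```

An interactive caller could catch NlmError and show its text as a normal
error message, while an unattended refresh could log it along with the
per-PMID errors.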

Comment entered 2014-01-06 10:30:05 by alan

All of the tasks were completed and tested at the end of last week.

As a result of additional discussions between Bob and me, the number of cases in which an exception is thrown has been narrowed further than my last comment would indicate.

There is significantly more error checking in place now, and Bob has also added some up-front checking to reject obviously invalid Pubmed IDs before they are ever sent to Pubmed.
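
For illustration only, an up-front check of this kind might look like the
Python sketch below.  The rule used here (digits only, no leading zero) and
the comma-or-space separated input format are assumptions; the ticket does
not say which checks Bob actually installed, and split_pmids is a
hypothetical name.

```python
import re

# Assumed rule: a Pubmed ID is a positive integer written in ASCII digits.
_PMID_PATTERN = re.compile(r"[1-9][0-9]*")


def split_pmids(raw):
    """Split user input into (valid, rejected) lists of ID strings."""
    valid, rejected = [], []
    for token in raw.replace(",", " ").split():
        if _PMID_PATTERN.fullmatch(token):
            valid.append(token)
        else:
            rejected.append(token)
    return valid, rejected
```

A check like this catches input such as 'abc' before any request is made.
It cannot catch a well-formed ID that Pubmed does not know, so the
comparison of requested and returned IDs is still needed.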

Comment entered 2014-01-06 11:58:03 by Kline, Bob (NIH/NCI) [C]

Promoted to QA.

Comment entered 2014-01-17 11:49:08 by Juthe, Robin (NIH/NCI) [E]

Verified on QA.

Comment entered 2014-02-25 16:26:37 by Juthe, Robin (NIH/NCI) [E]

Verified on prod.

Attachments

File Name	Posted	User
screenshot.gif	2013-11-26 10:54:51
