Issue Number | 119 |
---|---|
Summary | Import Citation: Exception Error when enter 9 char of PubMed ID |
Created | 2013-11-26 10:54:51 |
Issue Type | Bug |
Submitted By | tanguturisk |
Assigned To | alan |
Status | Closed |
Resolved | 2014-01-06 10:30:05 |
Resolution | Fixed |
Path | /home/bkline/backups/jira/oceebms/issue.115443 |
Got an exception when entering a 9-character PubMed ID on Import Citation.
Screenshot attached for more detail.
Error Message:
exception 'Exception' with message 'EbmsImport.store: Nothing to store' in /local/content/web/appdev/sites/ebms.nci.nih.gov/modules/custom/ebms/EbmsImport.inc:574
Stack trace:
#0 /local/content/web/appdev/sites/ebms.nci.nih.gov/modules/custom/ebms/EbmsImport.inc(993): Ebms\ImportBatch->store()
#1 /local/content/web/appdev/sites/ebms.nci.nih.gov/modules/custom/ebms/import.inc(419): Ebms\importArticlesFromNLM('live', Array, '51', '139', '', true, 'R')
#2 /local/content/web/appdev/includes/form.inc(1464): pdq_ebms_import_form_submit(Array, Array)
#3 /local/content/web/appdev/includes/form.inc(860): form_execute_handlers('submit', Array, Array)
#4 /local/content/web/appdev/includes/form.inc(374): drupal_process_form('pdq_ebms_import...', Array, Array)
#5 /local/content/web/appdev/includes/form.inc(131): drupal_build_form('pdq_ebms_import...', Array)
#6 /local/content/web/appdev/sites/ebms.nci.nih.gov/modules/custom/ebms/import.inc(81): drupal_get_form('pdq_ebms_import...', NULL)
#7 /local/content/web/appdev/sites/ebms.nci.nih.gov/modules/custom/ebms/import.inc(32): EbmsImport->run()
#8 [internal function]: pdq_ebms_import()
#9 /local/content/web/appdev/includes/menu.inc(517): call_user_func_array('pdq_ebms_import', Array)
#10 /local/content/web/appdev/index.php(25): menu_execute_active_handler()
#11 {main}
The cause of this exception is an attempt to store records when there
are no records to store. In this particular case, it happened because
of an attempt to download an article from Pubmed using a 9 digit Pubmed
ID which doesn't match any article at Pubmed.
The same error will also occur whenever Pubmed cannot return any
articles matching a request. For example, a request for:
Pubmed ID = 'abc'
will produce the same error.
We think, but cannot easily test, that the exception will also occur if
NLM has a problem and Pubmed is up but not answering requests properly.
We think this may be what happened in OCEEBMS-121, which has been marked
as a duplicate of this bug.
After discussing this with Bob, we think the proper fix should include:
1. Do not raise an Exception.
Exception messages should be used when we think there was an
internal error of some kind that should never happen as a result of
regular operations. However, sending a single wrong Pubmed ID to
NLM is perfectly possible in regular operation.
2. Save the output from NLM for post-mortem analysis.
In some cases it could be useful to see exactly what NLM returned
for the search, so we'll find a way to save that when this type of
error occurs.
3. Compare the count of requested articles to the count of articles
downloaded from NLM and notify users if not all were found.
Currently, the Exceptions seen by Sridhar and Minaxi occurred
because zero articles were successfully downloaded from NLM.
However if two articles were requested and only one had a bad Pubmed
ID, the program would import the good article and silently ignore
the one that was not found.
If NLM were to return an error message for the article, we'd see
that and report it. But NLM is not doing that. Apparently they
return the articles that were found and silently ignore the
requested IDs that were not found.
We should be able to do something more useful for those that aren't
found.
4. Display a more useful message to users.
i.e., something more like a regular EBMS error message explaining
that NLM returned no articles matching our query.
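Points 1, 3 and 4 above could be sketched roughly as follows. This is an illustration in Python, not the EBMS code (which is PHP); the function name summarize_import and the message wording are assumptions made for the example. It compares the requested IDs with the IDs actually returned and produces an ordinary message instead of raising an exception.

```python
def summarize_import(requested_ids, fetched_ids):
    """Return (ok, message) describing how a fetch matched the request.

    Hypothetical helper: the name and messages are illustrative only.
    """
    # De-duplicate the request while keeping the order the user typed.
    requested = list(dict.fromkeys(requested_ids))
    fetched = set(fetched_ids)
    missing = [pmid for pmid in requested if pmid not in fetched]

    if not requested:
        return False, "No Pubmed IDs were requested."
    if len(missing) == len(requested):
        # The "Nothing to store" case: report it, do not raise.
        return False, "NLM returned no articles matching the request."
    if missing:
        found = len(requested) - len(missing)
        return False, "NLM returned %d of %d requested articles; not found: %s" % (
            found, len(requested), ", ".join(missing))
    return True, "All %d requested articles were returned by NLM." % len(requested)
```

With a single unknown ID, summarize_import(["123456789"], []) reports that nothing matched; with one good and one bad ID, the message names the ID that was not found rather than silently ignoring it.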
I'll stop at this point and not do any more work on this until Robin or
someone prioritizes the issue.
It turns out that the issue Bob is working on, refreshing the article XML we get from NLM, could run into some of the same kinds of errors. I'll consider this issue in light of the requirements for refreshing the data as well as for the original import.
A key difference between the refresh program and the imports done by Minaxi is that the refresh program will probably run unattended. Errors in the edge cases that are ignored now need to be handled, but the handling can't simply put an error message on the screen and assume that a user is present to see it and decide on the spot whether something needs to be done.
I've been doing some testing. There seem to be four general
cases of errors that can occur. Currently, some of these errors
are silently ignored in my code.
1. Internet communications failure.
Possible causes can include things like DNS is down, a URL
changed at NLM, NLM is down, or an Internet connection failed
or was closed by the other side during processing.
In line with Bob's request, I propose to handle these as
follows:
Throw an exception containing the error message:
"Unable to retrieve data from NLM: {curl error message}"
For example:
"Unable to retrieve data from NLM: Couldn't resolve host
'eutils.ncbi.nlm.nih.gov'"
For the import program, the calling program can either let
this hit the screen (it should be very rare), or trap it and
produce a less infelicitous message than an Exception report.
2. Connection established but NLM returned a general error
message:
I'm inclined towards the same solution but with the message
"Error returned by NLM ..."
Example:
"Error returned by NLM: Database: pubmed - is down for
maintenance"
That error message is found by parsing XML that looks like
this:
<?xml version="1.0" encoding="UTF-8"?>
<eFetchResult>
<ERROR>Database: pubmed - is down for maintenance</ERROR>
</eFetchResult>
I'm not sure that we'll always get back a parsable XML
response in the expected format so, as a backup, if I don't
recognize the error message format, but can see that I'm not
getting the response I expect, I'll just dump the whole
response string (or some reasonable sized excerpt) into my
Exception message.
Again, the import program might choose to trap this exception
and present it to the user without the exception trappings.
3. Pubmed finds a requested PMID but there is a problem with it.
An example is an article that was in Pubmed, has been taken
away for some reason (publisher withdrew it, it was a
duplicate, etc.), but an error record is available.
I propose to handle these, as now, using the error mechanism
currently in place. The PMID and the message will be
included in the list of individual errors, one per PMID, in
the ImportBatch object that is returned.
4. An article ID was requested that Pubmed does not know exists.
This occurs if we send a wrong PMID, e.g., with an extra
digit or an alpha character, but it appears that it may also
occur with an article for which both the article and the PMID
have gone up in smoke for unknown reasons.
Pubmed appears to silently ignore these. They always return
a <PubmedArticleSet> XML container element which can have 0
or more articles in it. If I ask for one article and it
isn't found, I get an empty element. If I ask for three and
one isn't found, I get a container with two articles in it
and no mention whatever of the missing one.
I propose to handle these using the same mechanism as in 3
above. I will put code in place to detect and identify a
PMID for which no response was received and create an error
record for it with an error message like:
"PMID not recognized by Pubmed as a Pubmed ID".
Does that meet all requirements for both interactive importing
and batch refreshes?
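Cases 2 and 4 above can be sketched as follows. This is a Python illustration under stated assumptions, not the EBMS implementation (which is PHP): the names NlmError and check_response and the excerpt_len parameter are hypothetical, and the sketch only looks for PMID elements at PubmedArticle/MedlineCitation/PMID inside the PubmedArticleSet container.

```python
import xml.etree.ElementTree as ET

NOT_FOUND = "PMID not recognized by Pubmed as a Pubmed ID"


class NlmError(Exception):
    """Hypothetical exception for a general failure reported by NLM."""


def check_response(requested_ids, response_text, excerpt_len=200):
    """Return {pmid: message} for requested PMIDs absent from the response.

    Raises NlmError when NLM reports a general error (case 2) or when the
    response is not in a recognized format.
    """
    excerpt = "Error returned by NLM: " + response_text[:excerpt_len]
    try:
        root = ET.fromstring(response_text)
    except ET.ParseError:
        # Not parsable XML: fall back to dumping an excerpt of the response.
        raise NlmError(excerpt) from None

    if root.tag == "eFetchResult":
        error = root.find("ERROR")
        if error is not None:
            raise NlmError("Error returned by NLM: " + (error.text or "").strip())
    if root.tag != "PubmedArticleSet":
        raise NlmError(excerpt)

    # Case 4: Pubmed silently omits unknown IDs, so compare against the request.
    found = {
        el.text.strip()
        for el in root.findall("PubmedArticle/MedlineCitation/PMID")
        if el.text
    }
    return {pmid: NOT_FOUND for pmid in requested_ids if pmid not in found}
```

The returned dictionary plays the role of the per-PMID error list described in case 3, so a caller, interactive or unattended, could handle missing IDs and withdrawn articles through one mechanism.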
All of the tasks were completed and tested at the end of last week.
As a result of additional discussions between Bob and myself, the number of cases where an exception would be thrown has been narrowed down further than my last comment would indicate.
There is significantly more error checking in place now, and Bob has also installed some up-front error checking to reject obviously invalid Pubmed IDs before they are ever sent to Pubmed.
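As an illustration only, an up-front check of this kind might look like the Python sketch below. The actual rules installed in EBMS are not described in this issue; the function name split_valid_pmids, the digits-only rule, and the max_digits limit of 8 are all assumptions made for the example.

```python
import re


def split_valid_pmids(raw_ids, max_digits=8):
    """Separate plausible PMIDs from obviously invalid ones.

    Assumed rule (not taken from the EBMS code): a PMID is a positive
    integer with no leading zero and at most max_digits digits.
    """
    pattern = re.compile(r"[1-9][0-9]{0,%d}" % (max_digits - 1))
    valid, rejected = [], []
    for raw in raw_ids:
        candidate = raw.strip()
        if pattern.fullmatch(candidate):
            valid.append(candidate)
        else:
            rejected.append(candidate)
    return valid, rejected
```

Under these assumptions, both the 9-digit ID from the original report and the 'abc' example would be rejected before any request goes to NLM.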
Promoted to QA.
Verified on QA.
Verified on prod.
File Name | Posted | User |
---|---|---|
screenshot.gif | 2013-11-26 10:54:51 |