EBMS Tickets

Issue Number: 307
Summary: Error while importing to the EBMS
Created: 2015-08-04 16:11:42
Issue Type: Bug
Submitted By: trivedim
Assigned To: alan
Status: Closed
Resolved: 2015-08-04 19:52:55
Resolution: Won't Fix
Path: /home/bkline/backups/jira/oceebms/issue.166738
Description

While importing citations for the August 2015 review cycle, I got the attached error twice. The first time, I got the error when I had just begun importing.
I was able to import one file, Adrenocortical Carcinoma, with 7 citations (one was a duplicate). While importing the next file, Adult ALL, I got the attached message. I logged out, logged in again, and was able to import about 32 files. Now I have gotten the error again when I tried to import the Head and Neck Cancer file. Please let me know whether it is safe to keep importing after logging out and logging in again.

Comment entered 2015-08-04 17:14:23 by Kline, Bob (NIH/NCI) [C]

Alan:

Please take a look at this.

Comment entered 2015-08-04 17:20:19 by alan

I'll look at it tonight and post something about what I found or didn't find.

Comment entered 2015-08-04 19:48:27 by alan

It appears that there were three failures today. As near as I can tell, all of them were errors originating at NLM: two 503 "Service Unavailable" errors and one 502 "Bad Gateway" error. I presume these were caused by transient problems at NLM with their web server, database, network, or some other component.

At our end, each of these errors occurred at the start of an import. No articles were imported and nothing went wrong in our database; no partial or corrupt article records were stored. Even if we had imported some articles and failed on others, everything should still be okay in our database. The articles imported successfully would be stored. If a mangled article came in (very unlikely; I think these transactions either succeed or fail completely), it would almost certainly be discarded because of an XML parse error. If the search were re-run, the articles that had been imported successfully in the interrupted job would come in again but simply be marked as duplicates, with no harm done, and any that were missed in the first import would be picked up by the second.
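The reasoning above relies on the import being idempotent: articles are keyed by PubMed ID, a record that fails XML parsing is dropped rather than stored, and a re-run adds only what was missed. The sketch below illustrates that behavior under stated assumptions; the function name `import_batch`, the in-memory `store` dict standing in for the database, and the `PMID` element lookup are hypothetical, not the EBMS's actual code.

```python
import xml.etree.ElementTree as ET


def import_batch(store, raw_articles):
    """Hypothetical sketch: import raw XML article records into `store`.

    `store` maps PMID -> parsed XML element (a stand-in for the EBMS
    database). Returns counts of imported, duplicate, and discarded records.
    """
    counts = {"imported": 0, "duplicate": 0, "discarded": 0}
    for raw in raw_articles:
        try:
            root = ET.fromstring(raw)
        except ET.ParseError:
            # A mangled record fails to parse and is skipped, never stored.
            counts["discarded"] += 1
            continue
        pmid = root.findtext(".//PMID")
        if pmid is None:
            counts["discarded"] += 1
            continue
        if pmid in store:
            # Already stored by an earlier (possibly interrupted) run.
            counts["duplicate"] += 1
            continue
        store[pmid] = root
        counts["imported"] += 1
    return counts
```

Running the same batch twice leaves the store unchanged on the second pass except for records the first pass missed, which is why re-running an interrupted import does no harm.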

Therefore, I think it's perfectly safe to go on importing. If this happens again I would suggest waiting a few minutes to give NLM a chance to recognize and fix any problems at their end, and then try again. If it keeps happening, I'd wait longer than a few minutes. But whatever we do, I don't believe these kinds of errors will cause any problems other than inconvenience at our end.
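The advice above (wait a few minutes, try again, and wait longer if it keeps happening) amounts to retrying with increasing delays, which is only safe because re-running an import is harmless. A minimal sketch, assuming the fetch step raises `urllib.error.HTTPError` on failure; `fetch_with_retry`, the delay values, and the set of codes treated as transient are illustrative choices, not EBMS settings.

```python
import time
import urllib.error

# Status codes treated as transient, server-side failures (an assumption).
RETRYABLE = {502, 503, 504}


def fetch_with_retry(fetch, delays=(60, 300, 900), sleep=time.sleep):
    """Call `fetch()`; on a transient HTTP error, wait and try again.

    `delays` lists the pause (in seconds) before each retry, so the wait
    grows if the problem persists. Non-transient errors, and the last
    transient error once all retries are used up, are re-raised.
    """
    for delay in list(delays) + [None]:
        try:
            return fetch()
        except urllib.error.HTTPError as err:
            if err.code not in RETRYABLE or delay is None:
                raise
            sleep(delay)
```

The `sleep` parameter is injectable so the waiting can be skipped when testing; a 404 or other client error is raised immediately, since waiting would not help.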

Comment entered 2015-08-04 19:52:55 by alan

I'm resolving this as "Won't Fix" because I think the fixing needs to be done at NLM, not at our end. They're pretty reliable there. I'm optimistic that they've already fixed their problem and that we won't see the problems tomorrow.

Comment entered 2015-08-20 11:46:47 by Juthe, Robin (NIH/NCI) [E]

I just got this error too, when trying to import a single citation. I received the error 4-5 times, logged out and logged back in, and it worked. Just wanted to document that it's still happening, but it's sporadic and seems to be resolving itself relatively quickly.

Comment entered 2015-08-20 12:05:33 by alan

Oddly, I only see one error in the log file for production from today, at 11:39 am. Were all of the attempts made in PROD?

As before, it was a "Service Unavailable" error coming from NLM. In theory, your logout and login shouldn't have had any effect on this; it was just a coincidence that NLM resolved the problem sometime just before your re-login.

I'll do a little noodling around at NLM to see if they provide any tracking info for when they've had problems.

Comment entered 2015-08-20 12:15:28 by Juthe, Robin (NIH/NCI) [E]

Yes, all attempts were on PROD in relatively quick succession (over the next few minutes) from the same page. I guess it's good to know that hitting "SUBMIT" again after a failure doesn't act as a new submission.

Comment entered 2015-08-20 12:16:31 by alan

I've sent a message to the PubMed Help Desk asking whether they had an outage at 11:39:47 today. Hopefully they have a record they can use to confirm or deny that the problem was at their end.

I think PubMed has an automatic failover to a backup site, just as we do for cancer.gov. I've also asked them about that and how long their failover takes.

I'll post the answers if I get any.

Comment entered 2015-08-20 17:55:42 by alan

As I think I mentioned in the status meeting, I did get an answer. The Help desk person was not aware of any outages. She asked for more information about it and I sent what we have.

Our interaction with PubMed is through a front-end web service, "E-utilities" (eutils), which in turn talks to the PubMed database. It's possible that the error originated at NLM, but in the E-utilities service rather than in the PubMed search and retrieval system itself.
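For context, E-utilities is addressed with plain HTTP requests; for example, NCBI's `efetch.fcgi` endpoint returns PubMed records as XML given `db=pubmed`, a comma-separated `id` list, and `retmode=xml`. A 502 or 503 returned by that endpoint reflects a problem in E-utilities or something behind it, not in the caller. The sketch below only builds such a URL; the helper name `efetch_url` is hypothetical, and this is not a description of how the EBMS actually constructs its requests.

```python
from urllib.parse import urlencode

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(pmids):
    """Build an E-utilities efetch URL for a list of PubMed IDs (sketch)."""
    params = {
        "db": "pubmed",
        "id": ",".join(str(p) for p in pmids),  # comma-separated PMIDs
        "retmode": "xml",
    }
    return f"{EFETCH}?{urlencode(params)}"
```

`urlencode` percent-encodes the commas as `%2C`, which the server decodes back to the comma-separated list.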

It will be reassuring if NLM is able to confirm that there was indeed an error today. In any case, I'm going to create another issue for improving error management in the EBMS. Its scope is somewhat larger than just addressing these particular errors, which I still think are transient errors from NLM.

Attachments
File Name Posted User
EBMS error Aug 4, 2015.docx 2015-08-04 16:11:42
screenshot-1.png 2015-08-20 11:43:24 Juthe, Robin (NIH/NCI) [E]
