Issue Number | 3509 |
---|---|
Summary | CDR system Errors |
Created | 2012-05-15 12:26:12 |
Issue Type | Bug |
Submitted By | Osei-Poku, William (NIH/NCI) [C] |
Assigned To | alan |
Status | Closed |
Resolved | 2012-05-17 10:11:36 |
Resolution | Fixed |
Path | /home/bkline/backups/jira/ocecdr/issue.107837 |
BZISSUE::5204
BZDATETIME::2012-05-15 12:26:12
BZCREATOR::William Osei-Poku
BZASSIGNEE::Alan Meyer
BZQACONTACT::William Osei-Poku
We've been getting error messages while trying to run reports or perform other tasks in the CDR. It started with pub preview not working and producing errors such as this one, mostly for Treatment summaries:
"CDRPreview web service error: Xml data validation error,The 'Err' element is not declared.Validation error occurred when validating the instance document.,6,3"
QC reports - Mostly Bold/Underline reports initially produced the following error:
"Unexpected exception caught."
and later, the following:
"---------- Exception Report Date: Tue May 15 12:09:57 2012 ProcessID: 1664 ExceptionCode: c0000005 ExceptionFlags: 0 ExceptionAddr: 10002B8E NumberParams: 2 Exception: EXCEPTION_ACCESS_VIOLATION Access violation: Memory address 00000000 could not be written ---------- Exception end"
Also, while attaching audio files to CDR documents, we get the following error:
"Failure reading byte count from server"
The Media Caption and Content Report is also producing the following error message:
"Failure retrieving filtered doc for doc ID=732668
Error: Unexpected exception caught."
This appears to be a system-wide problem, which is why I have combined all the error messages into this bug. Hopefully, the errors will be helpful in diagnosing the problem.
BZDATETIME::2012-05-15 15:24:12
BZCOMMENTOR::Alan Meyer
BZCOMMENT::1
It appears that at least part of this, and maybe all of it, is related to code that Volker has been working on. He's looking at it now. I'll coordinate with him to help or take over depending on what he finds.
BZDATETIME::2012-05-15 15:24:59
BZCOMMENTOR::Alan Meyer
BZCOMMENT::2
I'm adding Volker and Bob to the cc list for this bug.
BZDATETIME::2012-05-15 15:37:52
BZCOMMENTOR::Volker Englisch
BZCOMMENT::3
Would you have a general idea of when the problems started and when it was still working correctly last time? I see you reported the problem originally around 11:30a
BZDATETIME::2012-05-15 15:40:01
BZCOMMENTOR::Volker Englisch
BZCOMMENT::4
Also, in terms of the reports not working, could you tell me if only summaries are affected as far as you know?
BZDATETIME::2012-05-15 15:57:17
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::5
(In reply to comment #3)
> Would you have a general idea of when the problems started and when it was
> still working correctly last time? I see you reported the problem originally
> around 11:30a
I got to know of the Media caption and content report error by 9:58 AM but that was pointing to a particular document so I investigated it for a while and then at about 10:05 AM I received another report of pub preview for a summary document not working. I investigated that for a while also and didn't find anything that could point to a possible user error. In terms of when it last worked successfully, I asked around and it looks like pub preview was working for all those who tried it this morning until just before 10:00 AM. So, generally it was working this morning before 10:00AM. It also looks like no one experienced problems with either QC reports or pub preview yesterday.
(In reply to comment #4)
> Also, in terms of the reports not working, could you tell me if only summaries
> are affected as far as you know?
At least one media report is not working also. It is the Media Caption and Content report.
BZDATETIME::2012-05-15 16:18:29
BZCOMMENTOR::Volker Englisch
BZCOMMENT::6
Is it possible that all summaries that are failing are those that include an image? I've been able to track down the problem to the existence of a MediaLink element in the document.
BZDATETIME::2012-05-15 16:36:58
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::7
(In reply to comment #6)
> Is it possible that all summaries that are failing are those that include an
> image? I've been able to track down the problem to the existence of a
> MediaLink element in the document.
At least one of the sample documents I have does not contain any image as far as I can tell. CDR0000062870.
Also, I have started receiving reports about protocol documents failing validation and producing the following error:
"Unexpected exception caught()"
Samples - CDR0000579151 and CDR0000619334
It might not be related but it seems weird that all of these problems appear to be happening around the same time.
BZDATETIME::2012-05-15 17:24:53
BZCOMMENTOR::Alan Meyer
BZCOMMENT::8
Volker and I traced through part of this and it appears to be failing in an attempt to perform a "fast denormalization" filter. This is something used everywhere. It works fine on Mahler. The filters on Mahler and Bach are identical. The CdrServers on each have been running unchanged since March 2011. So it's all very puzzling.
I may try bouncing the CdrServer in a bit, before publishing starts at 6 pm, though I'm not hopeful that that will fix anything.
I'll avoid working on this while publishing is running (which might fail due to this problem) and try again afterward.
BZDATETIME::2012-05-15 17:38:00
BZCOMMENTOR::Alan Meyer
BZCOMMENT::9
Something misleading is happening. The fast denormalization filter runs fine on Bach through the CdrServer, so the error message we're getting is not caused by the filtering process itself - although that's what the error trace seems to indicate.
I'll keep quiet for a while now and keep trying to find out what's going on.
BZDATETIME::2012-05-15 19:35:12
BZCOMMENTOR::Alan Meyer
BZCOMMENT::10
Publishing failed tonight. The failure looks like something new, not like the problem that Volker worked on recently having to do with over speeding. The error messages in the log were unique. They don't occur anywhere else in the current uncompressed portion of the log file, which goes back to November 7, 2011.
The specific error message was:
"[Errno 10054] An existing connection was forcibly closed by the remote host"
It happened while filtering a protocol document and the remote host in this case was just the CdrServer on Bach ("remote" only from the point of view of the publishing program, not the hardware.) We never got as far as talking to Gatekeeper.
It's not the error message I would have expected if the problem were related to the one in this issue, but I think it's possible that it's the same.
I'm continuing to try to figure this out.
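For reference, 10054 is the Windows Sockets code WSAECONNRESET, meaning the peer (here the CdrServer) dropped the connection. A minimal sketch of how a client script might recognize that condition follows, assuming Python 3; the helper name `is_connection_reset` is hypothetical and is not taken from the publishing software.

```python
import errno

# Windows Sockets code seen in the publishing log ("[Errno 10054]").
WSAECONNRESET = 10054


def is_connection_reset(exc):
    """Return True if exc looks like a connection reset by the peer.

    Hypothetical helper for illustration only.
    """
    # Python 3 raises ConnectionResetError for resets it recognizes.
    if isinstance(exc, ConnectionResetError):
        return True
    # Fall back to the numeric codes for other OSError-like exceptions.
    codes = {errno.ECONNRESET, WSAECONNRESET}
    return (getattr(exc, "errno", None) in codes
            or getattr(exc, "winerror", None) == WSAECONNRESET)
```

A caller could use this to separate "the server went away" from other failures when deciding whether to log, retry, or abort a job.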
BZDATETIME::2012-05-15 19:55:58
BZCOMMENTOR::Alan Meyer
BZCOMMENT::11
I haven't figured it out but I may have fixed it.
I had decided not to stop and restart the CdrServer before publishing ran partly because I didn't believe that would really help and partly because I didn't want to change anything before nightly publishing. However after publishing crashed I stopped and restarted the Cdr service. I then ran the Media Caption and Content Report that reliably crashed every time I ran it before the restart. To my surprise, it now seems to run fine.
According to the CdrService log, the last full shutdown and restart of the server was on December 3, 2011, though individual server processes were started since then. I speculate that the service or the server was in a corrupt state, possibly some resource had been exhausted, or some deadlock existed, and just shutting everything down and restarting cleared the problem.
That's not a very satisfactory analysis. It's a little like slapping the side of the TV set to fix the broken picture (I know TVs don't work that way anymore but some of you may remember that traditional analog experience.) But that's all I've got, and most importantly, the system appears to be working again.
I talked to Blair, who was also here when publishing failed, and he said it's okay if we want to restart publishing tonight. However I'll leave that to Volker to decide since he knows how to do that and whether it's best to do it or just wait until tomorrow.
In the meantime, I'm hoping that CIAT will be working without errors tomorrow.
BZDATETIME::2012-05-15 20:46:06
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::12
(In reply to comment #11)
>
> In the meantime, I'm hoping that CIAT will be working without errors tomorrow.
I was able to successfully run pub preview and QC reports for all the summaries that failed today. I was also able to validate the protocols that failed. However, I am still unable to run the Media Caption and Content Report. I didn't expect to encounter any problems with this report since it worked for you, but I got the following error message:
"Failure retrieving filtered doc for doc ID=732714
Error: Invalid or expired session: None "
I will have someone else try running it tomorrow morning to see if this is peculiar to me since it says something about an invalid or expired session.
Thanks for taking care of this.
BZDATETIME::2012-05-15 20:59:48
BZCOMMENTOR::Alan Meyer
BZCOMMENT::13
(In reply to comment #12)
...
> "Failure retrieving filtered doc for doc ID=732714
> Error: Invalid or expired session: None "
>
> I will have someone else try running it tomorrow morning to see if this is
> peculiar to me since it says something about invalid or expired session.
That's a different error from what we had and I'm hoping it's unrelated.
One way to create an error like that is to log in to the CDR, duplicate the browser tab or window containing the logged-in session, and log out of one of the tabs or windows. The session ID in the other window is the same as the one that was logged out and is therefore invalid.
That sounds like something that no one would do, but I've done it more than once when I was testing something and had two tabs open in order to see the effects of a change in one tab as compared to something in the other.
The report still runs fine for me - unless I do something like the above, in which case it fails with the same error you got.
I suggest logging out of any of your open sessions, then logging in fresh and trying again.
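As an illustration of the duplicated-tab scenario above, here is a minimal sketch, assuming a simple in-memory session table. The `SessionStore` class and its method names are hypothetical and are not taken from the CDR code; the point is only that two tabs holding the same session ID share one server-side entry.

```python
import secrets


class SessionStore:
    """Toy in-memory session table (hypothetical, not the CDR's implementation)."""

    def __init__(self):
        self._active = {}

    def login(self, user):
        # Issue a fresh session ID and remember which user owns it.
        session_id = secrets.token_hex(8)
        self._active[session_id] = user
        return session_id

    def logout(self, session_id):
        # Logging out removes the single server-side entry for this ID.
        self._active.pop(session_id, None)

    def run_report(self, session_id):
        # Every request is checked against the table of live sessions.
        if session_id not in self._active:
            raise PermissionError("Invalid or expired session: %s" % session_id)
        return "report for %s" % self._active[session_id]


store = SessionStore()
tab_one = store.login("tester")
tab_two = tab_one                          # duplicated tab carries the same ID
before_logout = store.run_report(tab_two)  # works while the session is live
store.logout(tab_one)                      # log out in the first tab only
try:
    store.run_report(tab_two)
    outcome = "ok"
except PermissionError as err:
    outcome = str(err)                     # second tab now fails
```

Logging in again issues a new session ID, which is why starting from a fresh login clears this kind of error.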
BZDATETIME::2012-05-15 22:41:16
BZCOMMENTOR::Volker Englisch
BZCOMMENT::14
I am able to run the Media Caption report as well, and I've been able to run the RS and BU reports.
It looks like bouncing the server was all that needed to be done.
BZDATETIME::2012-05-17 10:10:16
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::15
(In reply to comment #13)
> (In reply to comment #12)
> ...
> > "Failure retrieving filtered doc for doc ID=732714
> > Error: Invalid or expired session: None "
> >
> > I will have someone else try running it tomorrow morning to see if this is
> > peculiar to me since it says something about invalid or expired session.
>
> That's a different error from what we had and I'm hoping it's unrelated.
>
> One way to create an error like that is to log in to the CDR, duplicate the
> browser tab or window containing the logged-in session, and log out of one of
> the tabs or windows. The session ID in the other window is the same as the one
> that was logged out and is therefore invalid.
>
> That sounds like something that no one would do, but I've done it more than
> once when I was testing something and had two tabs open in order to see the
> effects of a change in one tab as compared to something in the other.
>
> The report still runs fine for me - unless I do something like the above, in
> which case it fails with the same error you got.
>
> I suggest logging out of any of your open sessions, then logging in fresh and
> trying again.
Some of the CDR reports don't run in my installation of IE, so I run them in FF instead. I copy the starting URL of the report menu to FF and navigate to the specific report I need. In most cases this works, but occasionally, as in this case, it fails.
By the way, everything has been working well since yesterday, so we are back in business.
I am marking this issue as resolved.
BZDATETIME::2012-05-17 10:11:36
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::16
(In reply to comment #15)
> I am marking this issue as resolved.
Closing issue. Thanks!