CDR Tickets

Issue Number 5014
Summary [XMetal] Unable to initialize DOM error
Created 2021-08-05 17:15:58
Issue Type Bug
Submitted By Osei-Poku, William (NIH/NCI) [C]
Assigned To Kline, Bob (NIH/NCI) [C]
Status Closed
Resolved 2023-01-12 09:05:00
Resolution Fixed
Path /home/bkline/backups/jira/ocecdr/issue.295925
Description

At least two users have encountered a new error message in the CDR on PROD, starting yesterday. The latest occurrence happened to Tana about an hour ago. The error message is "Unable to initialize DOM".
 
The error happens when the user clicks the save icon, and it prevents the user from proceeding further. The only option is to close the CDR and log in again.
The error happened to two different users, Carolyn and Tana, working on two different documents at different times. It happened to Carolyn yesterday and to Tana today.
I am including the screenshots of the error message and attempts to check for validation.

Comment entered 2021-08-05 17:39:04 by Kline, Bob (NIH/NCI) [C]

Pretty sure this is a duplicate of OCECDR-4991.

Comment entered 2021-08-05 18:45:50 by Osei-Poku, William (NIH/NCI) [C]

If that is the case, I can mark this as a duplicate. However, the reported cases in OCECDR-4991 happened at the time of logging in and usually, closing XMetal and logging back in resolved the problem, temporarily in some cases. Users didn't lose any data as a result of that.

The two cases reported in this ticket happened after the users had already made significant changes in the CDR but were unable to save them. In the first case (Carolyn), we were able to get all her changes back by following the process to restore the document. In the second case (Tana), we weren't able to restore her changes because the document appeared corrupted. This error appears to be more serious, so it will be helpful to address it as quickly as possible.

Comment entered 2021-08-08 09:59:22 by Kline, Bob (NIH/NCI) [C]

The new DLL is on STAGE and PROD. Users need to close XMetaL and log back in to use the new DLL. Let me know if the problems reported here can be reproduced with the new DLL.

Comment entered 2021-08-11 14:39:51 by Osei-Poku, William (NIH/NCI) [C]

Thanks, Bob! No further issues have been reported since the hot-fix. We will continue to monitor and report if any issues come up.

Comment entered 2021-08-19 10:35:19 by Osei-Poku, William (NIH/NCI) [C]

Closing this ticket because no issues have been reported since the hot-fix. Thanks!

Comment entered 2021-08-24 16:56:19 by Osei-Poku, William (NIH/NCI) [C]

Looks like I closed this ticket too quickly. Carolyn experienced the same problem again this afternoon.

Comment entered 2021-08-25 11:37:44 by Osei-Poku, William (NIH/NCI) [C]

Hi Bob, is this what you are looking for?

Comment entered 2021-08-25 11:41:15 by Kline, Bob (NIH/NCI) [C]

Close enough. I asked for the file size in bytes, not kilobytes, but this will do.

Comment entered 2021-08-30 15:10:47 by Englisch, Volker (NIH/NCI) [C]

"Looks like I closed this ticket too quickly. Carolyn experienced the same problem again this afternoon."

Is Carolyn the same person who reported the error message originally? In the ticket description you're talking about "Carolyn and Tana".

Comment entered 2021-08-30 15:32:05 by Osei-Poku, William (NIH/NCI) [C]

Yes, Carolyn is one of only two users who have received this error message. Tana has not experienced this message since I last reported it, but Carolyn has received it two times now, including the most recent one this past Friday, 8/27/2021. She said she saw both the message reported in this ticket and the other bug mentioned in this ticket, OCECDR-4991. Unfortunately, I was offline when these happened, so I couldn't take screenshots. I have asked Carolyn to take screenshots if it happens again.

Comment entered 2021-08-30 15:49:48 by Englisch, Volker (NIH/NCI) [C]

Was Carolyn working on the same document when the two problems occurred?  If so I will want to use the same document for testing.

Comment entered 2021-08-30 16:31:22 by Kline, Bob (NIH/NCI) [C]

The only document she was working on the first day that the error occurred was CDR755559. She was working on the same document on the 24th (the day the ticket was reopened).

Comment entered 2021-08-30 17:37:05 by Osei-Poku, William (NIH/NCI) [C]

From Carolyn:

"The summaries I worked on Friday were:

62739
62736
755559

I can’t remember which error appeared on which summary."

Comment entered 2021-09-07 19:08:54 by Osei-Poku, William (NIH/NCI) [C]

The other user, Tana Smith, experienced the same problem on Friday. So the only two users who have experienced this problem have both reported it again. This time, Tana mentioned that the problem appears to happen when she has been disconnected from the VPN for a long time while the CDR is still open, then comes back after a few hours, reconnects to the VPN, and attempts to continue editing the already open CDR document. I experimented with this for a relatively short period of time and could not confirm the behavior Tana described.

Comment entered 2021-09-08 09:58:02 by Englisch, Volker (NIH/NCI) [C]

This is an interesting bit of information and I will try to recreate the situation later this evening by disconnecting from the VPN for a while.

Comment entered 2021-09-09 12:43:19 by Englisch, Volker (NIH/NCI) [C]

I tried to replicate the situation by disconnecting from my VPN, waiting several hours, reconnecting, and then trying to edit and save a summary. I did not experience the problem. However, I don't think my test truly simulated the situation of the other users.

I have XMetaL installed on my Windows DEV-VM.  In order to connect to my DEV-VM I have to connect my VPN.  When I disconnect my VPN I'm unable to connect to my DEV-VM but XMetaL running on that VM is still connected to the network.  I will have to install XMetaL on my Parallels desktop and try again.

Comment entered 2021-09-14 18:16:00 by Englisch, Volker (NIH/NCI) [C]

I installed XMetaL on my Parallels Windows 10 installation, which I have running on my Mac. I edited a summary document, turned off the VPN overnight, continued to edit the summary the next morning, and saved the changes. In other words, I was unable to reproduce the reported issue. It was worth a shot!

Comment entered 2021-11-04 12:34:47 by Osei-Poku, William (NIH/NCI) [C]

It may be time to close this ticket. It's been about 3 weeks or more since the users who were affected by this error got their new laptops, and there have been no reports of this problem since. Ning experienced this same problem (on an older laptop), but in her case XMetal kept freezing on each save action after the initial DOM error. Reinstalling XMetal fixed the problem.

Comment entered 2021-11-30 17:43:44 by Osei-Poku, William (NIH/NCI) [C]

We had the first case of this error on a new laptop, and it involved one of the users (Carolyn) who experienced this problem in the past on the old laptop. One thing I noticed was that her VPN was not "fully" connected: it showed the notification "Establishing VPN connection...." instead of the usual "Connected to NIH VPN". XMetal froze in the process of trying to save her changes, after the "Unable to initialize DOM" error message. Restarting her laptop, reconnecting to the VPN, and making sure that it was "fully" connected appears to have fixed the problem.

Comment entered 2022-01-20 11:49:25 by Osei-Poku, William (NIH/NCI) [C]

Since there have not been any new issues since most users got new laptops, we decided to close this ticket and reopen it if a new issue is reported.

Comment entered 2022-04-06 16:40:54 by Osei-Poku, William (NIH/NCI) [C]

One CDR user (Carolyn) ran into the "Unable to Initialize DOM" error message this afternoon at about 12:30 PM on PROD. Before getting that error message, she ran into the "Invalid at the top level of the document" error message (OCECDR-4991). Please note the earlier comments above about the same user experiencing these errors.

Below is a copy of exchanges from our Teams chat:

[WILLIAM] Following up on the private chat I sent: essentially, XMetal froze, so we had to restart the user's laptop, but it is getting interesting because the laptop is one of the newer ones. The first message the user got was along the lines of "XML document must have a top level element." Then she also got the DOM message after several tries. I checked the network connection to the VPN and everything seemed OK; I could reach the CDR from the browser without any issues. Luckily, the user had not made a lot of changes, so no significant work was lost, and we didn't try hard to recover anything because the work she had done was minimal.

[BOB] It may be that we need to harden the DLL software to survive intermittent VPN failures, by detecting the failure and retrying.

[BOB] Can you tell me what the user was doing when it happened? Also, if the cdr-dll-trace.log hasn't been wiped out by a subsequent session, please post that. Let's do our primary communication through the Jira tickets. Neither of us pays as much attention to Teams as we do to the email messages sent by Jira, it seems.

If this happens again, grab and send me the cdr-dll-trace.log file from the user's XMetaL directory BEFORE starting a new session.
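
The idea raised in the chat above, detecting an intermittent failure and retrying, could be sketched roughly as follows. This is only an illustration in Python, not the DLL's actual code: `TransientNetworkError`, `send_with_retry`, and all of the parameters are hypothetical names, and the sketch assumes the client can tell a dropped connection apart from other errors.

```python
import time


class TransientNetworkError(Exception):
    """Hypothetical marker for a failure worth retrying (e.g. a dropped VPN link)."""


def send_with_retry(send, attempts=3, delay=1.0, backoff=2.0, sleep=time.sleep):
    """Call send() until it succeeds or the attempts are exhausted.

    send     -- zero-argument callable that performs one client-server command
    attempts -- maximum number of tries (must be at least 1)
    delay    -- seconds to wait before the second try
    backoff  -- multiplier applied to the delay after each failure
    sleep    -- injectable sleep function, so tests need not actually wait
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return send()
        except TransientNetworkError:
            if attempt == attempts:
                raise  # out of retries: let the caller report the failure
            sleep(delay)
            delay *= backoff
```

Only the error type assumed to be transient is retried; any other exception propagates immediately, so a genuine bug is not hidden behind repeated attempts.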

Comment entered 2022-04-06 16:43:43 by Osei-Poku, William (NIH/NCI) [C]


[BOB] Can you tell me what the user was doing when it happened? Also, if the cdr-dll-trace.log hasn't been wiped out by a subsequent session, please post that. Let's do our primary communication through the Jira tickets. Neither of us pays as much attention to Teams as we do to the email messages sent by Jira, it seems.

The user had made a few edits in the summary 802836 on PROD and was attempting to save the changes. 


If this happens again, grab and send me the cdr-dll-trace.log file from the user's XMetaL directory BEFORE starting a new session.

Sure.

Comment entered 2022-05-05 16:29:26 by Osei-Poku, William (NIH/NCI) [C]

A user experienced this error message at about 3:55 PM today. I am attaching the trace file (cdr-dll-trace.log). Besides the "Unable to Initialize DOM" error message, other error messages showed up after the first message was cleared. I am attaching screenshots of the error messages. One of them appeared to indicate that there was not enough memory, but I checked the system memory usage and it was at about 54%.

Comment entered 2022-05-12 12:15:30 by Osei-Poku, William (NIH/NCI) [C]

This same user (Mariana) experienced this problem again yesterday at about 6:15 PM.  This time around, there were no messages referencing insufficient memory.

Comment entered 2022-05-12 12:21:50 by Osei-Poku, William (NIH/NCI) [C]

Another user (Isabel) ran into what looks like a different problem, but with the same result of not being able to save changes and the CDR eventually crashing. This happened on Tuesday (PROD) at about 5:20 PM. The error message is shown in the attached screenshot (image-2022-05-12-12-21-10-845.png).

Comment entered 2022-05-12 12:44:22 by Kline, Bob (NIH/NCI) [C]

Again, we don't know for sure, but our best theory for the explanation of these failures is that security layers are interfering with correct behavior of the software. If we're lucky, the ticket(s) you have open with CBIIT for other XMetaL-related failures will uncover information which will shed light on all or at least some of the problems documented in this ticket. If not, I recommend that we add this ticket to Pauling, and we will add more debug logging to the DLL, which might slow things down a bit, but might provide more clues to the nature of the underlying causes.

Comment entered 2022-05-19 13:23:23 by Osei-Poku, William (NIH/NCI) [C]

Another user experienced this problem this week. I am attaching the dll trace file here. cdr-dll-trace.log

Comment entered 2022-05-19 14:41:28 by Kline, Bob (NIH/NCI) [C]

Thanks, William.

Comment entered 2022-09-01 08:01:24 by Osei-Poku, William (NIH/NCI) [C]

A few users have recently reported that they encountered this error while working in the CDR so it is still an ongoing issue. Thanks!

Comment entered 2022-09-09 15:03:34 by Englisch, Volker (NIH/NCI) [C]

A user reported another issue with XMetaL that may or may not be related to this ticket, but I'm adding a comment here anyway just in case.

After working on her Breast Cancer HP summary (CDR62787) and trying to save she received the following error:

"XML document must have a top level element"

Trying to open a previous version of the document produced the following error:

"Failure sending command error"

When trying to save the document locally, XMetaL crashed.

Comment entered 2022-10-27 19:47:07 by Englisch, Volker (NIH/NCI) [C]

As discussed at today's status meeting, the next step for this ticket would be to add additional log information to the DLL with the hope that we may catch some additional information pointing us in the right direction to identify the problem.  I'm assigning the ticket to Bob, who generally handles DLL changes in the CDR family. 🙂

Comment entered 2022-11-04 07:30:59 by Kline, Bob (NIH/NCI) [C]

I have added more trace logging to the DLL, particularly around the two actions recorded as the last operations performed in the two trace logs attached to this ticket (retrieving a document and launching the browser). This new version of the DLL is installed on CDR DEV. When we get around to testing, it will be important, whenever one of these DOM error failures occurs, that the trace log is retrieved and posted to this ticket BEFORE XMetaL is launched again. If XMetaL has crashed, the log will still be in the user's XMetaL 9.0 directory (or 17.0 after we upgrade).

You'll want to canvass your staff to identify any actions which were being performed when one of these failures occurred, and stress test XMetaL by performing those actions repeatedly. If we're able to reproduce the failures on the lower tiers, but the trace logs still don't provide enough diagnostic information, we'll be able to easily add more logging around the places where the failures occurred. If we're not able to reproduce the failures on the lower tiers and they only show up when Pauling is in production, it will be more difficult (but still possible, with mini-releases) to add more logging.
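
Since the trace log has to be saved before XMetaL is launched again, the retrieval step could be scripted along these lines. This is a hedged sketch, not an existing CDR tool: `preserve_trace_log` and its parameters are hypothetical, and the real location of the user's XMetaL directory would have to be passed in.

```python
import shutil
from datetime import datetime
from pathlib import Path


def preserve_trace_log(xmetal_dir, archive_dir, name="cdr-dll-trace.log", now=None):
    """Copy the trace log to archive_dir under a timestamped name.

    Returns the path of the copy, or None if there is no log to preserve.
    """
    source = Path(xmetal_dir) / name
    if not source.is_file():
        return None
    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
    target_dir = Path(archive_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / f"{source.stem}-{stamp}{source.suffix}"
    shutil.copy2(source, target)  # copy2 also preserves the modification time
    return target
```

The original file is left in place, and the timestamp in the copy's name keeps logs from separate failures from overwriting one another.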

Comment entered 2023-01-05 08:16:32 by Kline, Bob (NIH/NCI) [C]

Now that the macros have all been rewritten in Python, our logging coverage is much more extensive. All exceptions should be caught and logged now, so if the problems reported by this ticket persist, it should be easier to track them down and address them. Ideally, the switch from C++ and a binary DLL will have eliminated the failures. We'll see what happens during testing.
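
One common way to make sure every exception is caught and logged is a decorator like the one sketched below. This is an assumption about the general pattern, not the CDR client's actual code: the logger name, `logged`, and `save_document` are all hypothetical.

```python
import functools
import logging

logger = logging.getLogger("cdr-client-sketch")  # hypothetical logger name


def logged(func):
    """Log any exception raised by func, with its traceback, then re-raise it."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception:
            # logger.exception logs at ERROR level and appends the traceback.
            logger.exception("%s failed", func.__name__)
            raise
    return wrapper


@logged
def save_document(doc_id):
    """Hypothetical macro body; it always fails, to illustrate the logging."""
    raise RuntimeError(f"unable to save CDR{doc_id}")
```

Re-raising after logging leaves the caller free to decide how to report the failure to the user, while the log keeps the full traceback for later diagnosis.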

Comment entered 2023-01-12 09:05:00 by Kline, Bob (NIH/NCI) [C]

Moving this into the "Development Complete" column, as the development work is done, and what's left is testing.

Comment entered 2023-05-09 12:39:31 by Osei-Poku, William (NIH/NCI) [C]

We have not come across this error on the VM. Marking as QA verified.

Comment entered 2023-06-01 10:47:40 by Osei-Poku, William (NIH/NCI) [C]

This issue is difficult to test but so far, no one has reported this issue since the release. Closing ticket.

Attachments
File Name Posted User
CDRDLL.PNG 2021-08-25 11:37:18 Osei-Poku, William (NIH/NCI) [C]
cdr-dll-trace.log 2022-05-05 16:27:42 Osei-Poku, William (NIH/NCI) [C]
cdr-dll-trace-1.log 2022-05-19 13:23:21 Osei-Poku, William (NIH/NCI) [C]
Error.docx 2021-08-05 17:15:36 Osei-Poku, William (NIH/NCI) [C]
image-2022-05-12-12-21-10-845.png 2022-05-12 12:21:11 Osei-Poku, William (NIH/NCI) [C]
MicrosoftTeams-image.png 2022-05-05 16:29:09 Osei-Poku, William (NIH/NCI) [C]
MicrosoftTeams-image (1).png 2022-05-05 16:28:46 Osei-Poku, William (NIH/NCI) [C]
