CDR Tickets

Issue Number 3783
Summary Fix dropped trailing text attached to Comment elements in XML file generated for Spanish translation
Created 2014-07-14 13:54:29
Issue Type Improvement
Submitted By Osei-Poku, William (NIH/NCI) [C]
Assigned To Kline, Bob (NIH/NCI) [C]
Status Closed
Resolved 2014-09-25 15:22:01
Resolution Fixed
Path /home/bkline/backups/jira/ocecdr/issue.134973
Description

Please exclude Comment elements from the XML files generated for Spanish translation in World Server. Linda reports that users are inconsistent in the way they use the Comment element, and depending on whether the comments are placed in insertion tags or not, this creates translation problems in World Server. Generally, when comments are not placed in insertion tags, part of the comment appears to be truncated, making it difficult to get an accurate translation in World Server. According to Linda, they can refer to all the comments in the English summary in XMetal.

Comment entered 2014-09-25 15:22:01 by Kline, Bob (NIH/NCI) [C]

I have implemented the logic to do this. If you give me the CDR ID (and, optionally, a version number) of a smaller summary that has comments, I will generate an exported version without the Comment elements and post it to this ticket.

Comment entered 2014-09-29 11:34:26 by Osei-Poku, William (NIH/NCI) [C]

Please use summary 62852, version 220. It is not a small summary, but it does have the problem we're trying to resolve. Since this is being done on PROD, we are limited to existing real cases, and they happen to be in bigger summaries.

Comment entered 2014-09-29 11:38:48 by Kline, Bob (NIH/NCI) [C]

It sounds like we'd get better use cases if I ran this on a lower tier. I can do that.

Comment entered 2014-09-29 11:51:31 by Osei-Poku, William (NIH/NCI) [C]

Yes, let's do it on a lower tier.

Comment entered 2014-09-29 12:02:11 by Kline, Bob (NIH/NCI) [C]

Sure; let me know when you're ready (and which tier and CDR ID).

Comment entered 2014-09-29 13:30:42 by Osei-Poku, William (NIH/NCI) [C]

These two summaries (269321 and 435963) are ready on QA.

Comment entered 2014-09-29 14:02:37 by Kline, Bob (NIH/NCI) [C]

Exported documents have been attached.

Comment entered 2014-09-30 14:34:13 by Osei-Poku, William (NIH/NCI) [C]

Comment elements are being excluded as we want. However, text following the (now removed) Comment elements continues to be truncated from the XML documents, as it was on PROD. This appears to happen only to comments that were inserted into text within a Para element; Comment elements outside a Para element appear to be okay.

Here are examples:

CDR 269321: Text is missing from the XML file before the term "conditions".

<KeyPoint cdr:id="_16">Possible signs of thymoma and thymic carcinoma include a cough and chest pain. </KeyPoint> <Para cdr:id="_17">Sometimes thymoma and thymic carcinoma do not cause <GlossaryTermRef cdr:href="CDR0000045022">symptoms</GlossaryTermRef>. The cancer may be found during a routine <GlossaryTermRef cdr:href="CDR0000304687">chest x-ray</GlossaryTermRef>. <GlossaryTermRef cdr:href="CDR0000651193">conditions</GlossaryTermRef>. Check with your doctor if you have any of the following problems:</Para>

CDR 435963: A sentence is missing from the XML file after the term "white blood cells".

<ListItem><StandardWording><Strong><GlossaryTermRef cdr:href="CDR0000046641">Urinalysis</GlossaryTermRef></Strong>: A test to check the color of urine and its contents, such as sugar, <GlossaryTermRef cdr:href="CDR0000046092">protein</GlossaryTermRef>, blood, and</StandardWording><GlossaryTermRef
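The pattern in these examples (text lost only after comments sitting inside a Para) is what you would expect if the export removes each Comment with a plain "remove this child from its parent" call. In Python's ElementTree, as in lxml, the text that follows an element up to its next sibling is stored on that element's `tail`, so removing the element removes that text as well. Inside a Para the tail is real sentence text; between block-level elements it is usually just whitespace, so nothing visible is lost. The sketch below reproduces this with the standard library; it is an assumption about the mechanism (the export script's code is not part of this ticket), and `drop_naively` is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def drop_naively(root, tag):
    """Remove every <tag> element the simple way: parent.remove(child).

    ElementTree stores the text that follows an element on that element's
    .tail, so this also throws away whatever text came after each match.
    """
    # Collect (parent, child) pairs first; mutating while iterating is unsafe.
    pairs = [(parent, child)
             for parent in root.iter()
             for child in parent
             if child.tag == tag]
    for parent, child in pairs:
        parent.remove(child)
    return root


# Mixed content: the comment sits inside running text in a Para.
para = ET.fromstring(
    "<Para>Some text <Comment>check this</Comment>more text.</Para>")
print(ET.tostring(drop_naively(para, "Comment"), encoding="unicode"))
# <Para>Some text </Para>   ("more text." was the Comment's tail)

# Element-only content: the comment sits between block-level elements.
section = ET.fromstring(
    "<SummarySection><Comment>note</Comment>\n<Para>Body.</Para></SummarySection>")
print(ET.tostring(drop_naively(section, "Comment"), encoding="unicode"))
# <SummarySection><Para>Body.</Para></SummarySection>   (only a newline lost)
```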

Also, it appears that certain elements, such as MediaLink, and certain sections, such as the Changes to Summary section, are included in the XML, but they should be excluded.

Comment entered 2014-10-01 11:14:37 by Kline, Bob (NIH/NCI) [C]

See if these are better.

Comment entered 2014-10-01 17:31:20 by Osei-Poku, William (NIH/NCI) [C]

They are better. The truncation problem appears to have been resolved in these new files. However, the following elements, which are not translated, are still in the files: TranslationOf, Patient version of, and MediaLink.

Comment entered 2014-10-02 09:25:23 by Kline, Bob (NIH/NCI) [C]

I think we've been working at cross purposes. I assumed that you wanted a new tool for getting the English summaries, one which only dropped the Comment elements, as the existing tool already dropped those elements, along with a bunch of others. Our travels down the wrong path haven't been completely worthless, though, as they highlighted the fact that the existing tool also dropped the trailing text attached to the Comment elements, and that the users don't want that to happen. I have modified the script on DEV to suppress dropping of the tail text. We can't install any changes on QA right now, because it's being used for a scan for the security fixes we just implemented, so you'll need to do your testing on DEV.
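One way to stop dropping the tail text is to move each removed element's tail onto whatever precedes it: the previous kept sibling's `tail`, or the parent's `text` when the element was the first child. The sketch below does that with Python's standard `xml.etree.ElementTree`; it is an illustration under that assumption, not the change actually made to the DEV script, and `strip_keep_tail` is a hypothetical name. Taking a set of tag names lets one pass handle other excluded elements as well.

```python
import xml.etree.ElementTree as ET


def strip_keep_tail(root, tags):
    """Remove every element whose tag is in `tags`, keeping its tail text.

    The removed element's own content is discarded, but the text that
    followed it (.tail) is appended to the previous kept sibling's .tail,
    or to the parent's .text if no kept sibling precedes it.
    """
    tags = set(tags)
    # Snapshot the tree so removals don't disturb the iteration.
    for parent in list(root.iter()):
        previous = None  # last child of `parent` that we are keeping
        for child in list(parent):
            if child.tag not in tags:
                previous = child
                continue
            tail = child.tail or ""
            if previous is None:
                parent.text = (parent.text or "") + tail
            else:
                previous.tail = (previous.tail or "") + tail
            parent.remove(child)
    return root


para = ET.fromstring(
    "<Para>Some text <Comment>check this</Comment>more text.</Para>")
print(ET.tostring(strip_keep_tail(para, {"Comment"}), encoding="unicode"))
# <Para>Some text more text.</Para>

para = ET.fromstring(
    "<Para>See <Term>x-ray</Term>.<Comment>note</Comment> Check.</Para>")
print(ET.tostring(strip_keep_tail(para, {"Comment"}), encoding="unicode"))
# <Para>See <Term>x-ray</Term>. Check.</Para>
```

If the script is built on lxml rather than ElementTree, `lxml.etree.strip_elements(root, "Comment", with_tail=False)` removes the matching elements while keeping their tail text in a single call.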

If you had mentioned https://cdr.cancer.gov/cgi-bin/cdr/get-english-summary.py (https://cdr.dev.cancer.gov/cgi-bin/cdr/get-english-summary.py on DEV), I might have realized you were asking for a tweak to the behavior of the existing tool (though the real request, unless I still misunderstand it, wouldn't be to add the exclusion of Comment elements, but rather to tweak that exclusion so it isn't so aggressive).

This will need to wait for a release to get into production.

Are we on the same page now? 🙂

Comment entered 2014-10-02 10:56:20 by Osei-Poku, William (NIH/NCI) [C]

Sorry for the confusion. I also assumed that you would know I was talking about the existing tool, since I did not specifically ask for a new tool for getting the XML files. I thought the inclusion of those elements (MediaLink, etc.) was because we had moved testing from PROD to QA, so I didn't bother to mention that they were fine on PROD. I think we are on the same page now, but it might be better to give the tool a name and place it on the Admin menu so that I can refer to it by name in future tickets and avoid any confusion.

Comment entered 2014-10-02 14:26:28 by Kline, Bob (NIH/NCI) [C]

William will get Linda to provide examples of documents exported from PROD with Comment elements still included, and he will attach them to this issue.

Comment entered 2014-10-03 10:13:56 by Osei-Poku, William (NIH/NCI) [C]

Linda could not confirm that Comment elements were included in previous files, and they don't appear to be in the files, so it looks like the only problem is the missing text after the Comment elements.

Comment entered 2014-10-03 10:24:49 by Kline, Bob (NIH/NCI) [C]

Good. In that case, the version I've got on DEV should give her what she wants.

Comment entered 2014-10-09 11:45:28 by Osei-Poku, William (NIH/NCI) [C]

I updated the title to reflect the problem that was fixed.

Comment entered 2015-01-22 12:09:40 by Osei-Poku, William (NIH/NCI) [C]

Verified on QA.

Comment entered 2015-02-26 13:54:50 by Osei-Poku, William (NIH/NCI) [C]

Verified on PROD.
