Issue Number | 3783 |
---|---|
Summary | Fix dropped trailing text attached to Comment elements in XML file generated for Spanish translation |
Created | 2014-07-14 13:54:29 |
Issue Type | Improvement |
Submitted By | Osei-Poku, William (NIH/NCI) [C] |
Assigned To | Kline, Bob (NIH/NCI) [C] |
Status | Closed |
Resolved | 2014-09-25 15:22:01 |
Resolution | Fixed |
Path | /home/bkline/backups/jira/ocecdr/issue.134973 |
Please exclude Comment elements from the XML files generated for Spanish translation in World Server. Linda reports that users are inconsistent in how they use the Comment element, and depending on whether the comments are placed in insertion tags or not, this creates translation problems in World Server. Generally, when the comments are not placed in insertion tags, part of the comment appears to be truncated, making it difficult to get an accurate translation in World Server. According to Linda, they can refer to all the comments in the English summary in XMetal.
I have implemented the logic to do this. If you give me the CDR ID (and optionally a version number) for a smaller summary which has comments, I will generate an exported version without the Comment elements and post it to this ticket.
Please use summary 62852, version 220. It is not a small summary but it does have the problem we're trying to resolve. Since this is being done on PROD, we are limited to existing true cases and they happen to be in bigger summaries.
It sounds like we'd get better use cases if I ran this on a lower tier. I can do that.
Yes, let's do it on a lower tier.
Sure; let me know when you're ready (and which tier and CDR ID).
These two summaries (269321 and 435963) are ready on QA.
Exported documents have been attached.
The Comment elements are being excluded as we want. However, text following the (now removed) Comment elements continues to be truncated from the XML documents, as on PROD. This appears to happen only to comments that were inserted in text within a Para element. Comment elements outside of a Para element appear to be okay.
Here are examples:
CDR 269321: Text is missing from XML file before the term "conditions"
<KeyPoint cdr:id="_16">Possible signs of thymoma and thymic carcinoma include a cough and chest pain. </KeyPoint> <Para cdr:id="_17">Sometimes thymoma and thymic carcinoma do not cause <GlossaryTermRef cdr:href="CDR0000045022">symptoms</GlossaryTermRef>. The cancer may be found during a routine <GlossaryTermRef cdr:href="CDR0000304687">chest x-ray</GlossaryTermRef>. <GlossaryTermRef cdr:href="CDR0000651193">conditions</GlossaryTermRef>. Check with your doctor if you have any of the following problems:</Para>
CDR 435963: A sentence is missing from the XML file after the term white blood cells.
ListItem><StandardWording><Strong><GlossaryTermRef cdr:href="CDR0000046641">Urinalysis</GlossaryTermRef></Strong>: A test to check the color of urine and its contents, such as sugar, <GlossaryTermRef cdr:href="CDR0000046092">protein</GlossaryTermRef>, blood, and</StandardWording><GlossaryTermRef
Also, it appears that certain elements, like the MediaLink element, and certain sections, like the Changes to Summary section, are included in the XML, but they should be excluded.
See if these are better.
They are better. The truncation problem appears to be resolved in these new files. However, the following elements, which are not translated, are still in the file: TranslationOf, Patient version of, MediaLink.
I think we've been working at cross purposes. I assumed that you wanted a new tool for getting the English summaries, one which only dropped the Comment elements, as the existing tool already dropped those elements, along with a bunch of others. Our travels down the wrong path haven't been completely worthless, though, as they highlighted the fact that the existing tool also dropped the trailing text attached to the Comment elements, and that the users don't want that to happen. I have modified the script on DEV to suppress dropping of the tail text. We can't install any changes on QA right now, because it's being used for a scan for the security fixes we just implemented, so you'll need to do your testing on DEV.
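For readers unfamiliar with why removing an element can take following text with it: in ElementTree-style APIs, the text that follows an element's closing tag is stored on that element as its `tail`, so deleting the element deletes the tail too. The sketch below illustrates the idea of keeping the tail while dropping the element. It is not the code from the actual script; the function name `strip_elements_keep_tail` and the sample markup are hypothetical, and it uses only Python's standard library.

```python
import xml.etree.ElementTree as ET


def strip_elements_keep_tail(root, tag):
    """Remove every `tag` element under `root`, preserving its tail text.

    Hypothetical helper for illustration only. The tail of a removed
    element is appended to the previous sibling's tail, or to the
    parent's text when there is no previous sibling.
    """
    for parent in list(root.iter()):
        previous = None  # last sibling that was kept
        for child in list(parent):
            if child.tag == tag:
                tail = child.tail or ""
                if previous is None:
                    parent.text = (parent.text or "") + tail
                else:
                    previous.tail = (previous.tail or "") + tail
                parent.remove(child)
            else:
                previous = child
    return root


# Made-up sample: a Comment in the middle of mixed content in a Para.
SAMPLE = (
    "<Para>No symptoms at first. <Comment>check wording</Comment>"
    "It may be linked to other <Ref>conditions</Ref>.</Para>"
)

# Naive removal: the text after the Comment disappears with it.
naive = ET.fromstring(SAMPLE)
naive.remove(naive.find("Comment"))
naive_xml = ET.tostring(naive, encoding="unicode")

# Tail-preserving removal: only the Comment itself is dropped.
fixed = strip_elements_keep_tail(ET.fromstring(SAMPLE), "Comment")
fixed_xml = ET.tostring(fixed, encoding="unicode")

print(naive_xml)
print(fixed_xml)
```

This also matches the observation that only comments embedded in running text inside a Para were affected: a Comment sitting between block-level elements typically has only whitespace as its tail, so nothing visible is lost. If the real script is built on lxml (an assumption), `lxml.etree.strip_elements` takes a `with_tail` argument that controls the same behavior.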
If you had mentioned https://cdr.cancer.gov/cgi-bin/cdr/get-english-summary.py (https://cdr.dev.cancer.gov/cgi-bin/cdr/get-english-summary.py on DEV) I might have realized you were asking for a tweak to the behavior of the existing tool (though the real request - unless I still misunderstand it - wouldn't be adding the exclusion of Comment elements, but rather tweaking that exclusion so it wasn't so aggressive).
This will need to wait for a release to get into production.
Are we on the same page now? 🙂
Sorry for the confusion. I assumed you would know I was talking about the existing tool, since I did not specifically ask for a new tool for getting the XML files. I thought the inclusion of those elements (MediaLink, etc.) was because we moved testing from PROD to QA, and I didn't bother to mention that they were fine on PROD. I think we are on the same page now, but it might be better to give the tool a name and place it on the Admin menu, so that I can mention the name in future tickets and avoid any confusion.
William will get Linda to provide examples of documents exported from PROD with Comment elements still included, and he will attach them to this issue.
Linda could not confirm that the Comment elements were included in previous files, and they don't appear to be in the files, so it looks like the only problem is the missing text after the Comment elements.
Good. In that case, the version I've got on DEV should give her what she wants.
I updated the title to reflect the problem that was fixed.
Verified on QA.
Verified on PROD.