CDR Tickets

Issue Number 4683
Summary [Summaries] Modify XML Utility for World Server Translation
Created 2019-10-24 14:06:05
Issue Type Improvement
Submitted By Osei-Poku, William (NIH/NCI) [C]
Assigned To Kline, Bob (NIH/NCI) [C]
Status Closed
Resolved 2020-04-17 11:20:58
Resolution Fixed
Path /home/bkline/backups/jira/ocecdr/issue.251521
Description

Please modify the XML tool used for generating XML documents for translation in World Server to exclude text marked up with an "approved" Insertion and to ignore "approved" Deletion markup.

Comment entered 2020-02-27 12:52:05 by Juthe, Robin (NIH/NCI) [E]

Need to check on how approved deleted text is being handled. This text still needs to be translated.

Comment entered 2020-03-19 15:01:49 by Kline, Bob (NIH/NCI) [C]

I assume we need to get an answer to Robin's implied question before we implement this modification. It seems possible that the answer might be that we shouldn't make this modification after all, because it would prevent us from translating the approved text.

Comment entered 2020-03-19 15:13:28 by Juthe, Robin (NIH/NCI) [E]

Right, this is something that  is checking on. While it makes sense to exclude text in approved insertion elements, we want to be sure that text that is in approved deletion markup still gets translated since it is published text.

Comment entered 2020-03-23 14:42:04 by Osei-Poku, William (NIH/NCI) [C]

I have modified the original request to include approved deletion markup as well even though this is rarely found in summaries.

Comment entered 2020-03-23 15:13:25 by Juthe, Robin (NIH/NCI) [E]

, are you sure this is what you intended? I think we want to include text that is within approved deletion markup (this is published text) and exclude text that is within approved insertion markup.

Comment entered 2020-03-23 15:19:05 by Kline, Bob (NIH/NCI) [C]

There are a couple of things I don't understand here.

  1. Why would we not want to translate text which someone has proposed for insertion, if that proposal has been approved? Does "approved" mean something different from "we've decided we want this inserted text/element/whatever"?

  2. Why would we want to treat insertions and deletions the same way, when those two actions are the opposite of each other?

Comment entered 2020-03-23 15:36:51 by Kline, Bob (NIH/NCI) [C]

Really? If I propose that we add a paragraph, and you approve my proposal, why wouldn't we want that paragraph translated? Similarly, if I suggest "let's get rid of that other paragraph" and you approve my suggestion, why would we bother to translate the paragraph, if you've decided we're getting rid of it?

Comment entered 2020-03-23 15:43:47 by Osei-Poku, William (NIH/NCI) [C]

Sorry, that is not what I intended. That is my mistake. This is actually the current behavior of the program, as I stated in OCECDR-4587 (quoted below): text marked up with an approved deletion is deleted, and we want that markup to be ignored instead.

"However, we ran into another problem that has to do with "Approved" Insertion and Deletion markup in one of the documents on DEV, CDR62932 version 399. The "Approved" Insertion markup text in the summary is included in the XML while the "Approved" Deletion text is deleted from the XML. Is this the expected behavior of the program? While users assign "Approved" revision level attributes to the text, they are really not ready to publish yet and they should not be translated yet."

Comment entered 2020-03-23 15:52:46 by Osei-Poku, William (NIH/NCI) [C]

1. We had a lengthy discussion about this when talking about OCECDR-4587. The short answer is that in certain cases, we don't follow the markup procedure the way it was intended. Essentially, the markup has to be accepted in the document for it to be considered final.

2. That was a mistake on my part and I have corrected it. I did not go back to look at the implication of what I had edited. I apologize for the confusion.

Comment entered 2020-03-23 19:01:59 by Kline, Bob (NIH/NCI) [C]

OK. To word the requirements more precisely:

  1. For Insertion elements with a RevisionLevel attribute value of "publish" the Insertion markup will be removed, and the contents of those elements will be retained.

  2. All other Insertion elements will be discarded with their contents.

  3. Deletion elements with a RevisionLevel attribute value of "publish" will be discarded with their contents.

  4. For all other Deletion elements the Deletion markup will be removed, and the contents of those elements will be retained.

These rules are applied recursively. So, for example, in the following snippet:

<Insertion RevisionLevel="approved">
  <Para>
    We're throwing this away <Insertion RevisionLevel="publish">even though ....</Insertion>
  </Para>
</Insertion>

the entire paragraph and its Insertion wrapper will be discarded.
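
As a rough illustration of the four rules above, here is a minimal Python sketch using only the standard library's xml.etree.ElementTree. It is not the code of the actual CDR utility: the function names (keep_contents, resolve) and the sample Section document are made up for this example, and it assumes the document root is not itself an Insertion or Deletion element.

```python
import xml.etree.ElementTree as ET

MARKUP = ("Insertion", "Deletion")


def keep_contents(elem):
    """True: strip the wrapper but keep its contents. False: drop it all."""
    level = elem.get("RevisionLevel")
    if elem.tag == "Insertion":
        return level == "publish"   # rules 1 and 2
    return level != "publish"       # rules 3 and 4 (Deletion)


def _append_text(parent, last, text):
    """Attach text after the last kept child, or to parent.text if none."""
    if not text:
        return
    if last is None:
        parent.text = (parent.text or "") + text
    else:
        last.tail = (last.tail or "") + text


def resolve(parent):
    """Apply the rules recursively to everything below parent, in place."""
    children = list(parent)
    for child in children:
        parent.remove(child)
    last = None
    for child in children:
        if child.tag in MARKUP:
            if keep_contents(child):
                resolve(child)  # nested markup is resolved first
                _append_text(parent, last, child.text)
                for grandchild in list(child):
                    parent.append(grandchild)
                    last = grandchild
            # Text following the wrapper is outside it, so always keep it.
            _append_text(parent, last, child.tail)
        else:
            resolve(child)
            parent.append(child)
            last = child
    return parent


# Hypothetical sample; the first block mirrors the snippet above.
sample = ET.fromstring(
    '<Section>'
    '<Insertion RevisionLevel="approved"><Para>Discarded '
    '<Insertion RevisionLevel="publish">even though ....</Insertion>'
    '</Para></Insertion>'
    '<Para>Kept'
    '<Insertion RevisionLevel="publish"> inserted</Insertion>'
    '<Deletion RevisionLevel="approved"> still published</Deletion>'
    '<Deletion RevisionLevel="publish"> removed</Deletion>.</Para>'
    '</Section>'
)
resolve(sample)
print(ET.tostring(sample, encoding="unicode"))
# prints: <Section><Para>Kept inserted still published.</Para></Section>
```

Because a discarded wrapper is dropped before its children are ever examined, the nested "publish" Insertion in the first paragraph disappears along with its "approved" parent, which is the recursive behavior described above.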

Comment entered 2020-03-24 08:00:19 by Kline, Bob (NIH/NCI) [C]

"I have modified the original request to include approved deletion markup as well even though this is rarely found in summaries."

There are indeed fewer approved Deletion elements than approved Insertion elements in summaries, but I would say it would be a mistake to characterize them as "rare." There are 2,219 approved Insertion elements and 1,896 approved Deletion elements in summaries on PROD. So that's over 85% as many Deletion elements as Insertion elements.

Comment entered 2020-03-24 12:13:07 by Osei-Poku, William (NIH/NCI) [C]

Would you mind sharing the numbers in the last published versions of summaries, which is typically the version used for generating the XML for world server?

Comment entered 2020-03-24 12:19:48 by Osei-Poku, William (NIH/NCI) [C]

Also, if you can provide some of the CDR IDs of the ones with approved deletions, that would be good. Thanks!

Comment entered 2020-03-24 14:13:39 by Kline, Bob (NIH/NCI) [C]
('Deletion', 'proposed') 1027
('Insertion', 'proposed') 1028
('Deletion', 'approved') 130
('Insertion', 'approved') 149
   1 CDR62729 version 108
  11 CDR62740 version 82
   9 CDR62781 version 49
  11 CDR62782 version 90
   1 CDR62785 version 50
   1 CDR62829 version 383
   9 CDR62903 version 223
   2 CDR62928 version 42
  21 CDR62938 version 116
   3 CDR62960 version 211
   5 CDR258102 version 200
   3 CDR258195 version 237
   2 CDR334406 version 11
   1 CDR350260 version 107
   3 CDR658500 version 67
   2 CDR668479 version 24
  32 CDR763423 version 26
   1 CDR780119 version 12
   1 CDR790949 version 15
   3 CDR790961 version 17
   2 CDR798740 version 21
   3 CDR798746 version 21
   2 CDR798749 version 16
   1 CDR799642 version 13

The first four rows have:

  • element name

  • RevisionLevel attribute value

  • number of occurrences in the current publishable summary versions

The remaining rows contain:

  • number of Deletion elements in the document with RevisionLevel of "approved"

  • document ID

  • number of the latest publishable version

As you can see, there's not much difference in the number of approved Deletion and Insertion elements. Certainly not enough to characterize one as "rare" compared with the other.
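
For reference, a tally like the first four rows could be produced along the following lines. This is only a sketch, not the script actually used to pull the numbers above: it assumes the publishable versions have already been fetched as XML strings, and the function name and the two sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_revision_markup(xml_docs):
    """Count Insertion/Deletion elements by (element name, RevisionLevel)."""
    counts = Counter()
    for xml in xml_docs:
        for node in ET.fromstring(xml).iter():
            if node.tag in ("Insertion", "Deletion"):
                counts[(node.tag, node.get("RevisionLevel"))] += 1
    return counts


# Hypothetical stand-ins for publishable summary versions.
samples = [
    '<Summary><Para>a<Deletion RevisionLevel="approved">b</Deletion></Para></Summary>',
    '<Summary><Para><Insertion RevisionLevel="proposed">c</Insertion>'
    '<Deletion RevisionLevel="approved">d</Deletion></Para></Summary>',
]
for key, total in count_revision_markup(samples).items():
    print(key, total)
```

Each printed line has the same shape as the first four rows: the (element name, RevisionLevel) pair followed by the number of occurrences.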

Comment entered 2020-03-24 15:20:58 by Osei-Poku, William (NIH/NCI) [C]

Thanks, Bob! I was thinking in terms of the approved deletion element at the summary level rather than the number of occurrences in each summary when I said they were rare. However, I do understand that even at 24 summaries (if that is the total number of affected summaries), that is too many to call rare. Well, at least they are not in the two thousands 😃. Thanks again for providing these stats.

Comment entered 2020-03-31 13:06:19 by Osei-Poku, William (NIH/NCI) [C]

Yes, this is what we expect. Thanks!

Comment entered 2020-04-17 11:20:58 by Kline, Bob (NIH/NCI) [C]

Installed on DEV.

Comment entered 2020-04-22 11:04:16 by Osei-Poku, William (NIH/NCI) [C]

Verified on DEV. Thanks!

Comment entered 2020-06-11 11:16:22 by Osei-Poku, William (NIH/NCI) [C]

Verified on QA. Thanks!

Comment entered 2020-07-15 09:36:00 by Kline, Bob (NIH/NCI) [C]

Working as expected on PROD?

Comment entered 2020-07-16 13:02:24 by Osei-Poku, William (NIH/NCI) [C]

We have not been able to test this fix on PROD yet. I am closing the ticket and will reopen if necessary.
