CDR Tickets

Issue Number 3218
Summary [TMS] Connecting CDR to World Server
Created 2010-09-09 11:50:22
Issue Type Improvement
Submitted By Juthe, Robin (NIH/NCI) [E]
Assigned To Kline, Bob (NIH/NCI) [C]
Status Closed
Resolved 2013-03-12 12:02:29
Resolution Fixed
Path /home/bkline/backups/jira/ocecdr/issue.107546
Description

BZISSUE::4906
BZDATETIME::2010-09-09 11:50:22
BZCREATOR::Robin Juthe
BZASSIGNEE::Bob Kline
BZQACONTACT::Margaret Beckwith

Setting up an umbrella issue for starting the process of integrating World Server SDL translation software with the CDR. The first task will be to acquire the design document for review.

Bob, I'm not sure if this is the most appropriate component, so please revise if need be.

Comment entered 2010-09-09 12:01:08 by Grama, Lakshmi (NIH/NCI) [E]

BZDATETIME::2010-09-09 12:01:08
BZCOMMENTOR::Lakshmi Grama
BZCOMMENT::1

I edited the title since we are not really talking of true integration! Integration is a politically loaded word around here. Also I would recommend using TMS as the abbreviation - Translation Management System.

Comment entered 2010-09-30 10:51:43 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2010-09-30 10:51:43
BZCOMMENTOR::Bob Kline
BZCOMMENT::2

[From email message sent to Mauricio 2010-09-28]

Thanks, Mauricio. I have reviewed the document you copied for me. It appears to be a proposal in response to an RFP, rather than a design document. I have a few questions:

1. Could I have a copy of WS.A40 (Technical Architecture and Design)?
2. Are there really no deliverables for stage II (Solutions Design and Build)?
3. What stage has the process reached?
4. How will the PDF review output fit into the translation workflow?
5. Who will create the filters to generate that PDF?
6. Are you the project manager on the NCI side?
7. Who is the PM assigned by SDL?
8. Do you have a definition for the "Push-Poll" connector?
9. Where will the file system used by this connector be located?

My primary (only?) goal at this point is to learn the details of how markup in the original repository documents will be preserved when newly translated content is merged back into the repository.

Thanks!

Comment entered 2010-09-30 10:53:03 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2010-09-30 10:53:03
BZCOMMENTOR::Bob Kline
BZCOMMENT::3

[Email response from Mauricio 2010-09-28]

Hi Bob,

We have not gotten that far yet. We are only at the data-gathering stage, pre-implementation. Maybe the attached document can answer some of your questions, but it is still a working document and may change a little. Right now, this is all the information we have.

Yes, I will be the PM for NCI and the main point of contact between NCI and the vendor. SDL has not assigned anybody yet because this process has not started.

Will keep you updated.

Thanks

Mauricio

Comment entered 2010-10-21 10:16:05 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2010-10-21 10:16:05
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::4

Added my name to the cc list

Comment entered 2011-04-08 12:45:55 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2011-04-08 12:45:55
BZCOMMENTOR::Bob Kline
BZCOMMENT::5

Margaret and I had a conference call with Pavel this morning to clarify how WorldServer's system works. It turns out that they don't expect us to give them the current version of the Spanish summary documents at all. The only thing they want is the current version of the English summary document, which they'll give back to us with the translatable parts translated, and which they expect us to drop in as a complete replacement version for the Spanish summary document, thus discarding any work the editors have ever done on that document outside the translation memory system.

The only possible use of the WorldServer system I can envision for our documents, given the limitations of that system, would be a process in which we give WS the latest translatable parts of the English document, the translators translate it in WS, and the results are manually pasted back into the Spanish document. It's hard to imagine that the manual reintegration effort would be made worthwhile by the translation work saved by WS.

If summary documents (not to mention the even more problematic structure of the glossary documents) were more like CTGovProtocol documents, we might want to consider a similar approach to this problem. In those documents we fence off most of the content as untouchable except by the import software, leaving a very tightly delimited sandbox in which PDQ editors can make changes that the import process will preserve. But everything in the summary documents is fair game for editing, and translatable content is so tightly interwoven with non-translatable content that reintegrating a newly translated version into an existing document would be a dangerous nightmare.

We will meet with Mauricio to share our concerns.

Comment entered 2011-04-18 11:46:07 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2011-04-18 11:46:07
BZCOMMENTOR::Bob Kline
BZCOMMENT::6

We met with Mauricio and gave him an earful. He's reluctant to give up, so we agreed to try a pilot project with just a few summaries so we can get empirical data on how difficult it would be for the translators to lose the ability to work with the Spanish summaries in the CDR. Using WorldServer for glossary document translation is off the table.

Comment entered 2011-04-26 10:11:20 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2011-04-26 10:11:20
BZCOMMENTOR::Bob Kline
BZCOMMENT::7

Mauricio will set up a meeting with the technical folks at SDL to discuss next steps.

Comment entered 2011-09-01 12:30:15 by Beckwith, Margaret (NIH/NCI) [E]

BZDATETIME::2011-09-01 12:30:15
BZCOMMENTOR::Margaret Beckwith
BZCOMMENT::8

Just thought I would update this a bit:
1. May 25-26: SDL was onsite for a 2-day meeting to discuss implementation for CDR, Percussion, and ad hoc documents. The outcome was to break implementation into phases, with Phase I ending sometime in Feb. 2012. For CDR, Phase I entails doing alignment on 10 documents and authoring 2 new documents in Trados. The challenges of using the system to update existing summaries were not worked out.
2. Aug. 24-25: Training for users and administrators was given on the use of Trados and WorldServer. The software has been installed on CIAT translator computers.
3. Sept. 1: There will be a follow-up telephone call today to talk about next steps.

Comment entered 2011-09-02 07:25:30 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2011-09-02 07:25:30
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::9

Linda would need the XML file for the Lung Cancer Screening HP summary - CDR 62832 to begin testing translations in SDL World Server and Trados next week or so. We also need to explore ways of getting the translated document back into the CDR.

Comment entered 2011-09-02 07:38:35 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2011-09-02 07:38:35
BZCOMMENTOR::Bob Kline
BZCOMMENT::10

Has the location where I'm supposed to place the file for pickup by Trados been established?

Comment entered 2011-09-06 09:26:57 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2011-09-06 09:26:57
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::11

(In reply to comment #10)
> Has the location where I'm supposed to place the file for pickup by Trados been
> established?

Not yet. Until we decide on a place, you can attach it to this issue, since it is only one file, or you can place it on the ftp server and we will download it from there.

Comment entered 2011-09-06 09:32:48 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2011-09-06 09:32:48
BZCOMMENTOR::Bob Kline
BZCOMMENT::12

Attachment 62832.xml has been added with description: Document for Linda to translate in standalone mode

Comment entered 2011-09-22 09:45:25 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2011-09-22 09:45:25
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::13

Linda needs the xml file for the Lung Cancer Screening summary - 258019 - for testing.

Comment entered 2011-09-22 10:33:35 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2011-09-22 10:33:35
BZCOMMENTOR::Bob Kline
BZCOMMENT::14

(In reply to comment #13)
> Linda needs the xml file for the Lung Cancer Screening summary - 258019 - for
> testing.

Any particular version, or should I just give her the current working document?

Comment entered 2011-09-22 10:43:39 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2011-09-22 10:43:39
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::15

(In reply to comment #14)
> (In reply to comment #13)
> > Linda needs the xml file for the Lung Cancer Screening summary - 258019 - for
> > testing.
>
> Any particular version, or should I just give her the current working document?

She wants the Last Publishable Version.

Comment entered 2011-09-23 11:42:34 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2011-09-23 11:42:34
BZCOMMENTOR::Bob Kline
BZCOMMENT::16

Attachment 258019v46.xml has been added with description: Another standalone test document for Linda

Comment entered 2011-09-27 15:20:06 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2011-09-27 15:20:06
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::17

Linda has access to the "Translation Management System" folder on the L:Drive. Path=
nci6116g.nci.nih.gov\group\OCE_CROSS\Translation Management System

Future requests for xml files should be placed in this folder.

Comment entered 2011-09-30 10:12:57 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2011-09-30 10:12:57
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::18

(In reply to comment #17)
> Linda has access to the "Translation Management System" folder on the L:Drive.
> Path=
> nci6116g.nci.nih.gov\group\OCE_CROSS\Translation Management System
>
> Future requests for xml files should be placed in this folder.

Linda needs the xml file for 62765. Please place it in the above folder. Please expedite this request if you can. She needs it by this afternoon.

Comment entered 2011-09-30 11:09:54 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2011-09-30 11:09:54
BZCOMMENTOR::Bob Kline
BZCOMMENT::19

I created a subfolder called Documents and dropped the XML file there.

Comment entered 2011-09-30 12:24:03 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2011-09-30 12:24:03
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::20

(In reply to comment #19)
> I created a subfolder called Documents and dropped the XML file there.

Thank you. Linda is working from home and for some reason she could not access the drive even though she logged in to the VPN. Please attach the file to this issue.

Comment entered 2011-09-30 12:27:28 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2011-09-30 12:27:28
BZCOMMENTOR::Bob Kline
BZCOMMENT::21

Attachment CDR62765V165.xml has been added with description: Third test document

Comment entered 2011-09-30 12:28:58 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2011-09-30 12:28:58
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::22

(In reply to comment #21)
> Created attachment 2165 [details]
> Third test document

Thank you!

Comment entered 2011-10-06 11:55:42 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2011-10-06 11:55:42
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::23

We decided that instead of requesting individual summaries each time Linda needs one for testing, she will request 5 to 10 at a time. They should be placed in the folder on the L drive, and when she has exhausted them she can request more.

Comment entered 2011-10-14 09:52:29 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2011-10-14 09:52:29
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::24

Linda needs the XML for 62751 for testing in WS. Please place it in the folder you created on the L drive.

Comment entered 2011-10-14 14:48:02 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2011-10-14 14:48:02
BZCOMMENTOR::Bob Kline
BZCOMMENT::25

(In reply to comment #24)
> Linda needs the XML for 62751 for testing in WS. Please place it in the folder
> you created on the L drive.

Done.

Comment entered 2011-10-21 15:52:15 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2011-10-21 15:52:15
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::26

Here are the elements Linda wanted stripped from the XML files:

Comments
Media Links
Section Type Metadata
Replacement for
Changes to this summary section
PDQ Key
SummaryFrag Refs ?
DateLastModified
ComprehensiveReviewDate

Also, Bob will be loading some xml documents from World Server onto Franck for Linda to review.

Comment entered 2011-10-24 13:07:45 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2011-10-24 13:07:45
BZCOMMENTOR::Bob Kline
BZCOMMENT::27

(In reply to comment #26)

> Bob will be loading some xml documents from World Server onto Franck for
> Linda to review.

Why don't you look at them on Mahler before I create them on Franck, just in case changes to the software are needed.

http://mahler.nci.nih.gov/cgi-bin/cdr/show-cdr-doc.py?id=696910
http://mahler.nci.nih.gov/cgi-bin/cdr/show-cdr-doc.py?id=696911

(Or just bring up CDR696910 and CDR696911 in XMetaL.)

I noticed that the UTF-8 encoding was garbled for the trademark symbol in the SummaryUrl element for CDR696910. I checked the original English summary document, and the encoding is correct there, so it seems that the corruption occurred somewhere in the translation process, something which might need to be discussed with Trados.
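
A minimal sketch of one common way a UTF-8 trademark sign ends up garbled, assuming the bytes were at some point decoded as Windows-1252 instead of UTF-8. That cause is an assumption; nothing in this issue establishes where or how the corruption actually happened. The function names are hypothetical and are not part of the CDR or Trados.

```python
# Illustration only: simulate, detect, and undo a UTF-8 / Windows-1252 mix-up.
# The codec choice is an assumption, not a finding about the translation tools.

TRADEMARK = "\u2122"  # the trademark sign


def misdecode(text: str, wrong_encoding: str = "cp1252") -> str:
    """Encode text as UTF-8, then decode those bytes with the wrong codec."""
    return text.encode("utf-8").decode(wrong_encoding)


def looks_garbled(text: str) -> bool:
    """Heuristic: does text contain the misdecoded form of the trademark sign?"""
    return misdecode(TRADEMARK) in text


def repair(text: str, wrong_encoding: str = "cp1252") -> str:
    """Reverse the misdecoding; return the input unchanged if that fails."""
    try:
        return text.encode(wrong_encoding).decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text
```

A check along the lines of `looks_garbled(url_text)` on the SummaryUrl content before import would flag this particular pattern; it is a heuristic and would not catch other kinds of corruption.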

Comment entered 2011-10-26 12:30:07 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2011-10-26 12:30:07
BZCOMMENTOR::Bob Kline
BZCOMMENT::28

(In reply to comment #26)
> Here are the elements Linda wanted stripped from the XML files:
> :
> :
> PDQ Key

So, strip the PDQKey elements, but not the PDQKey attributes?

> SummaryFrag Refs ?

What does the question mark mean here? Are the users still thinking about whether they want these stripped? Would it possibly be better to leave them in the documents, and have the software replace the cdr:href values which contain the CDR ID of the English document with a comparable value containing the CDR ID of the (possibly new) Spanish document? This only works, of course, if the cdr:id values of the English document are carried over intact into the Spanish documents, which I assume they would be.
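
The replacement described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the code that was eventually used: it assumes cdr:href values have the form `CDR0000062832#_12`, that the cdr prefix is bound to the namespace URI shown in the code, and that the fragment part after `#` is carried over unchanged. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

CDR_NS = "cips.nci.nih.gov/cdr"  # assumed namespace URI for the cdr prefix
HREF = f"{{{CDR_NS}}}href"


def normalize(cdr_id: str) -> str:
    """Turn '62832' or 'CDR62832' into the ten-digit form 'CDR0000062832'."""
    digits = "".join(ch for ch in cdr_id if ch.isdigit())
    return f"CDR{int(digits):010d}" if digits else cdr_id


def retarget_fragment_links(root, english_id, spanish_id):
    """Point fragment links to the English document at the Spanish document.

    Only values with a '#fragment' part whose document ID matches the
    English ID are changed. Returns the number of attributes rewritten.
    """
    old, new = normalize(english_id), normalize(spanish_id)
    changed = 0
    for node in root.iter():
        value = node.get(HREF)
        if value is None or "#" not in value:
            continue
        doc_id, fragment = value.split("#", 1)
        if normalize(doc_id) == old:
            node.set(HREF, f"{new}#{fragment}")
            changed += 1
    return changed
```

Links to other documents, and links without a fragment, are left alone, so the rewrite only touches references a summary makes to its own sections.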

Comment entered 2011-10-26 15:20:10 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2011-10-26 15:20:10
BZCOMMENTOR::Bob Kline
BZCOMMENT::29

I have created a web interface which Linda can use to retrieve an English summary document with the Comment, MediaLink, SectMetaData, ReplacementFor, PdqKey, DateLastModified, and ComprehensiveReviewDate elements stripped. She enters the CDR ID and optionally a version (which can be 'pub' to request the most recent publishable version). There are two options for output: the default is to display the document in the browser, and the other option lets her save the raw XML document to her computer, from which she can then pass it on to Trados. With this tool she can get documents for testing without waiting for me to fetch and process them. Here's the URL:

http://bach.nci.nih.gov/cgi-bin/cdr/get-english-summary.py
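
For readers unfamiliar with this kind of filtering, the stripping step might look something like the sketch below. It is not the source of get-english-summary.py; it only illustrates removing the listed elements with Python's standard library, and the function names are hypothetical. It assumes the elements carry no namespace prefix.

```python
import xml.etree.ElementTree as ET

# Element names as listed in the comment above.
STRIP = {
    "Comment", "MediaLink", "SectMetaData", "ReplacementFor",
    "PdqKey", "DateLastModified", "ComprehensiveReviewDate",
}


def remove_keeping_tail(parent, child):
    """Remove child from parent without losing the text that follows it."""
    tail = child.tail or ""
    index = list(parent).index(child)
    if index == 0:
        parent.text = (parent.text or "") + tail
    else:
        previous = parent[index - 1]
        previous.tail = (previous.tail or "") + tail
    parent.remove(child)


def strip_elements(node, names=STRIP):
    """Remove descendants of node whose tag is in names; return how many."""
    removed = 0
    for child in list(node):  # copy, because the loop modifies node
        if child.tag in names:
            remove_keeping_tail(node, child)
            removed += 1
        else:
            removed += strip_elements(child, names)
    return removed
```

The tail handling matters for inline elements: without it, removing a Comment in the middle of a paragraph would also drop the text that follows the Comment.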

Comment entered 2011-10-27 12:59:36 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2011-10-27 12:59:36
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::30

(In reply to comment #28)
> (In reply to comment #26)
> > Here are the elements Linda wanted stripped from the XML files:
> > :
> > :
> > PDQ Key
>
> So, strip the PDQKey elements, but not the PDQKey attributes?
>
> > SummaryFrag Refs ?
>
> What does the question mark mean here? Are the users still thinking about
> whether they want these stripped? Would it possibly be better to leave them in

The question mark was to indicate whether it was possible to strip the SummaryFrag Refs.

> the documents, and have the software replace the cdr:href values which contain
> the CDR ID of the English document with a comparable value containing the CDR
> ID of the (possibly new) Spanish document? This only works, of course, if the
> cdr:id values of the English document are carried over intact into the Spanish
> documents, which I assume they would be.

The fragment ids in the Spanish document would be different from the fragment ids of the English version. Would it be possible to still do the replacement?

(In reply to comment #29)

> for me to fetch and process them. Here's the URL:
>
> http://bach.nci.nih.gov/cgi-bin/cdr/get-english-summary.py

Thank you! This is very helpful. Linda has started using it already.
Could you remove the PDQBoardMember elements from the xml? Also strip all occurrences of MainTopics except the very first occurrence of it?

Comment entered 2011-10-28 09:18:42 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2011-10-28 09:18:42
BZCOMMENTOR::Bob Kline
BZCOMMENT::31

(In reply to comment #30)

> The fragment ids in the Spanish document would be different from the fragment
> ids of the English version. Would it be possible to still do the replacement?

We agreed in the meeting that you would investigate whether Linda might have been confused about the fragment IDs being altered by Trados.

> Could you remove the PDQBoardMember elements from the xml? Also strip all
> occurrences of MainTopics except the very first occurrence of it?

Done. Please give it a try.
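
A rough sketch of the "keep only the first occurrence" rule requested for MainTopics, using the standard library. This is an illustration, not the change that was made to the tool; the function name is hypothetical, and "first" is taken to mean first in document order.

```python
import xml.etree.ElementTree as ET


def keep_first_only(root, tag):
    """Remove every element named tag except the first in document order.

    Returns the number of elements removed. Whitespace following a removed
    element is dropped with it, which is harmless for block-level elements.
    """
    # ElementTree has no parent pointers, so build a child-to-parent map.
    parents = {child: parent for parent in root.iter() for child in parent}
    matches = [node for node in root.iter(tag) if node is not root]
    extras = matches[1:]
    for extra in extras:
        parents[extra].remove(extra)
    return len(extras)
```

Removing every PDQBoardMember element is the simpler case of deleting all matches rather than all but the first.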

Comment entered 2011-10-28 12:36:37 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2011-10-28 12:36:37
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::32

(In reply to comment #31)
> (In reply to comment #30)
>
> > The fragment ids in the Spanish document would be different from the fragment
> > ids of the English version. Would it be possible to still do the replacement?
>
> We agreed in the meeting that you would investigate whether Linda might have
> been confused about the fragment IDs being altered by Trados.
>
I spoke with Linda this morning, and after looking at a few examples it seems to me that it would work the way you described in comment #28 and explained in yesterday's meeting. In view of that, please proceed to make the needed changes.

> > Could you remove the PDQBoardMember elements from the xml? Also strip all
> > occurrences of MainTopics except the very first occurrence of it?
>
> Done. Please give it a try.

All changes have been confirmed. Thank you!

Comment entered 2011-11-04 12:01:51 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2011-11-04 12:01:51
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::33

Linda has completed translating the attached document and wants it imported into Mahler for her to review.

Comment entered 2011-11-04 12:01:51 by Osei-Poku, William (NIH/NCI) [C]

Attachment assets.zip has been added with description: document to transfer to Mahler

Comment entered 2011-11-04 14:42:26 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2011-11-04 14:42:26
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::34

(In reply to comment #33)
> Created attachment 2180 [details]
> document to transfer to Mahler
>
> Linda has completed translating the attached document and wants it imported
> into Mahler for her to review.

Bob, is it possible to expedite this for Linda today?

Comment entered 2011-11-04 15:06:26 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2011-11-04 15:06:26
BZCOMMENTOR::Bob Kline
BZCOMMENT::35

(In reply to comment #34)
> (In reply to comment #33)
> > Created attachment 2180 [details]
> > document to transfer to Mahler
> >
> > Linda has completed translating the attached document and wants it imported
> > into Mahler for her to review.
>
> Bob, is it possible to expedite this for Linda today?

Imported as CDR696914. Very odd document; lots of the text content has disappeared.

Comment entered 2012-03-08 14:16:03 by Beckwith, Margaret (NIH/NCI) [E]

BZDATETIME::2012-03-08 14:16:03
BZCOMMENTOR::Margaret Beckwith
BZCOMMENT::36

Lowering priority until further notice.

Comment entered 2012-12-31 11:25:40 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2012-12-31 11:25:40
BZCOMMENTOR::Bob Kline
BZCOMMENT::37

For some reason Linda's not a user in Bugzilla. She has asked for the import of seven new Spanish summaries created in Trados. Bumping up priority so I can handle her request.

Comment entered 2012-12-31 11:29:05 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2012-12-31 11:29:05
BZCOMMENTOR::Bob Kline
BZCOMMENT::38

Attachment newpdqspanishsummariestranslatedinworldservertrados.zip has been added with description: XML files for new summaries to be created in the CDR

Comment entered 2012-12-31 11:31:09 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2012-12-31 11:31:09
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::39

(In reply to comment #37)
> For some reason Linda's not a user in Bugzilla.
A folder was created on the L:Drive for her to drop the XML files into and then let you know about them. She is having a problem with the folder because she is temporarily using a loaner laptop, which is why she emailed you the files. She should be sorted out soon and will start using the folder on the L:Drive.

Comment entered 2012-12-31 11:35:17 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2012-12-31 11:35:17
BZCOMMENTOR::Bob Kline
BZCOMMENT::40

Documents have been imported on Franck. Please have Linda take a look and make sure they appear to have been created correctly before I add them to the production system.

CDR739849
CDR739850
CDR739851
CDR739852
CDR739853
CDR739854
CDR739855

Comment entered 2012-12-31 13:56:04 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2012-12-31 13:56:04
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::41

Linda reviewed them and they all look good. Please promote to Bach.

Comment entered 2012-12-31 14:04:39 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2012-12-31 14:04:39
BZCOMMENTOR::Bob Kline
BZCOMMENT::42

(In reply to comment #41)
> Linda reviewed them and they all look good. Please promote to Bach.

Done. CDR0000744468 through CDR0000744474.

Comment entered 2013-01-03 11:16:34 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2013-01-03 11:16:34
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::43

I have attached a new file to be imported/created in the CDR

Comment entered 2013-01-03 11:16:34 by Osei-Poku, William (NIH/NCI) [C]

Attachment Colorectal Cancer Screening - HP.xml has been added with description: XML file summary to be created in the CDR.

Comment entered 2013-01-03 12:26:08 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2013-01-03 12:26:08
BZCOMMENTOR::Bob Kline
BZCOMMENT::44

(In reply to comment #43)

> I have attached a new file to be imported/created in the CDR

Added as CDR0000744628.

Comment entered 2013-03-12 12:02:04 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2013-03-12 12:02:04
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::45

Closing this issue. I will enter a new issue if Linda needs a new file.

Marked as Resolved - Fixed

Comment entered 2013-03-12 12:02:29 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2013-03-12 12:02:29
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::46

Closed.

Attachments
File Name Posted User
258019v46.xml 2011-09-23 11:42:34
62832.xml 2011-09-06 09:32:48
assets.zip 2011-11-04 12:01:51 Osei-Poku, William (NIH/NCI) [C]
CDR62765V165.xml 2011-09-30 12:27:28
Colorectal Cancer Screening - HP.xml 2013-01-03 11:16:34 Osei-Poku, William (NIH/NCI) [C]
newpdqspanishsummariestranslatedinworldservertrados.zip 2012-12-31 11:29:05
