CDR Tickets

Issue Number 3176
Summary [HP Summary Section] Global to populate the PurposeText element
Created 2010-06-11 16:02:01
Issue Type Improvement
Submitted By Beckwith, Margaret (NIH/NCI) [E]
Assigned To alan
Status Closed
Resolved 2010-08-30 11:22:28
Resolution Fixed
Path /home/bkline/backups/jira/ocecdr/issue.107504
Description

BZISSUE::4863
BZDATETIME::2010-06-11 16:02:01
BZCREATOR::Margaret Beckwith
BZASSIGNEE::Alan Meyer
BZQACONTACT::William Osei-Poku

Every HP summary is having an element added to the metadata called PurposeText. This will be used to fill in text in the first paragraph of the new section. The text in the paragraph says:

"...evidence-based information about [Placeholder]."

Since all treatment summaries will have the same text:

"...information about the treatment of X cancer." where X comes from the main topic of the summary, it seems that we could plug that in to each of the summaries programmatically.

We could even do the same thing for the rest of the summaries and then CIAT or the Board Manager would just need to go in and edit the text as opposed to having to add the element and type everything in. Or perhaps we can come up with text for each type of summary that would be more appropriate. I will work on that.

Comment entered 2010-06-17 23:02:23 by alan

BZDATETIME::2010-06-17 23:02:23
BZCOMMENTOR::Alan Meyer
BZCOMMENT::1

It was decided at our status meeting today that Margaret would
define PurposeText prefix strings for each of the SummaryType
values:

Treatment
Supportive care
Screening
Prevention
Genetics
Complementary and alternative medicine

Comment entered 2010-06-17 23:24:35 by alan

BZDATETIME::2010-06-17 23:24:35
BZCOMMENTOR::Alan Meyer
BZCOMMENT::2

Here are some more issues in this global change:

1. SummaryTitle suffix deletions.

We'll also need to massage the SummaryTitle a bit to fit it in
to the PurposeText. For example, the SummaryTitle for Adult
Hodgkin Lymphoma is:

"Adult Hodgkin Lymphoma Treatment"

I assume we need to drop the " Treatment" and prepend a string
to produce something like:

"the treatment of Adult Hodgkin Lymphoma"

Other words like "Treatment" that might need to be dropped
include:

"Screening"
"Prevention"

2. Blocked documents?

Should we modify blocked Summaries? I should think we should.
I don't see any harm in it. If so, I'll need to delete the
string "BLOCKED" from the SummaryTitle

3. Spanish.

Finally (for this comment anyway) it looks like we'll need to
have Spanish versions of the prefix strings and parts to drop
from SummaryTitles. Is that right? Or are we leaving the
Spanish alone?

If we modify the Spanish, in addition to prefix strings we'll
also need any deletion suffixes, like ": Tratamiento", and
perhaps others.
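
A minimal sketch of the title-based approach described in this comment, in Python. The function name, the suffix tuple, and the handling of "BLOCKED" are illustrative assumptions, not the actual global change program; the ticket does not say exactly how "BLOCKED" appears in a title, so the sketch simply removes the bare word.

```python
import re

# Hypothetical suffix list: the comment names only these three candidates.
TITLE_SUFFIXES = ("Treatment", "Screening", "Prevention")


def phrase_from_title(title: str, prefix: str) -> str:
    """Build a PurposeText phrase from a copy of the SummaryTitle."""
    # Operate on a copy of the title; the stored SummaryTitle is not changed.
    text = re.sub(r"\bBLOCKED\b", "", title)
    # Collapse any whitespace left behind by the removal.
    text = " ".join(text.split())
    # Drop one trailing summary-type word, if present.
    for suffix in TITLE_SUFFIXES:
        if text.endswith(" " + suffix):
            text = text[: -len(suffix) - 1]
            break
    return f"{prefix} {text}"


print(phrase_from_title("Adult Hodgkin Lymphoma Treatment", "the treatment of"))
```

Spanish titles would need their own suffix list (for example ": Tratamiento"), which this sketch does not attempt.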

Comment entered 2010-06-17 23:43:42 by alan

BZDATETIME::2010-06-17 23:43:42
BZCOMMENTOR::Alan Meyer
BZCOMMENT::3

(In reply to comment #2)
> ... If so, I'll need to delete the
> string "BLOCKED" from the SummaryTitle

Just to be clear. I won't delete the string from the
actual SummaryTitle. I'll only delete it from the copy
of the SummaryTitle that is placed into the PurposeText.

Comment entered 2010-06-18 09:01:36 by Beckwith, Margaret (NIH/NCI) [E]

BZDATETIME::2010-06-18 09:01:36
BZCOMMENTOR::Margaret Beckwith
BZCOMMENT::4

(In reply to comment #2)
> Here are some more issues in this global change:
> 1. SummaryTitle suffix deletions.
> We'll also need to massage the SummaryTitle a bit to fit it in
> to the PurposeText. For example, the SummaryTitle for Adult
> Hodgkin Lymphoma is:
> "Adult Hodgkin Lymphoma Treatment"
> I assume we need to drop the " Treatment" and prepend a string
> to produce something like:
> "the treatment of Adult Hodgkin Lymphoma"
> Other words like "Treatment" that might need to be dropped
> include:
> "Screening"
> "Prevention"

Actually, I was thinking that we would use the main topic (metadata terminology link) to fill in the name of the cancer, not the title. So the global would put in "the treatment of" and then append the main topic of "bladder cancer".

> 2. Blocked documents?
> Should we modify blocked Summaries? I should think we should.
> I don't see any harm in it. If so, I'll need to delete the
> string "BLOCKED" from the SummaryTitle

It is fine to modify the blocked documents, but again, I wasn't thinking we would use the title, so the issue with the string BLOCKED doesn't matter if we go with the main topic.

> 3. Spanish.
> Finally (for this comment anyway) it looks like we'll need to
> have Spanish versions of the prefix strings and parts to drop
> from SummaryTitles. Is that right? Or are we leaving the
> Spanish alone?
> If we modify the Spanish, in addition to prefix strings we'll
> also need any deletion suffixes, like ": Tratamiento", and
> perhaps others.

I talked to Linda Saucedo about the Spanish and we pretty much decided that they will need to go into every summary and put in the text. I could ask her to provide some common text for each summary type to at least get them started, but just having the element put in would be a big help.

Comment entered 2010-06-18 11:30:59 by Beckwith, Margaret (NIH/NCI) [E]

BZDATETIME::2010-06-18 11:30:59
BZCOMMENTOR::Margaret Beckwith
BZCOMMENT::5

Here is what I came up with for the text to populate the PurposeText element for each type of summary:

Treatment: ...the treatment of X. (where X= main topic)

Genetics: ...the genetics of X.

Screening: ...the screening of X.

Prevention: ...the prevention of X.

CAM: ...the use of X as a treatment for people with cancer.

Supportive Care: NO good way to do this.

I am actually thinking that the best way to do this might be to create a spreadsheet with each summary and the PurposeText text, and then just populate the element from the spreadsheet. What do you think?
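
For illustration only, the per-type patterns listed in this comment could be expressed as a lookup table in Python. The dictionary and function names are hypothetical, "CAM" is spelled out using the SummaryType value from comment #1, and the trailing period is left off here as an assumption. Supportive care has no entry because no workable pattern was proposed for it.

```python
from typing import Optional

# Patterns taken from the list in this comment; {topic} stands for X.
PURPOSE_TEMPLATES = {
    "Treatment": "the treatment of {topic}",
    "Genetics": "the genetics of {topic}",
    "Screening": "the screening of {topic}",
    "Prevention": "the prevention of {topic}",
    "Complementary and alternative medicine":
        "the use of {topic} as a treatment for people with cancer",
}


def purpose_text(summary_type: str, main_topic: str) -> Optional[str]:
    """Return the phrase for a summary, or None if its type has no pattern."""
    template = PURPOSE_TEMPLATES.get(summary_type)
    if template is None:
        return None  # e.g. Supportive care: must be written by hand
    return template.format(topic=main_topic)


print(purpose_text("Treatment", "bladder cancer"))
```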

Comment entered 2010-06-18 11:57:25 by Beckwith, Margaret (NIH/NCI) [E]

BZDATETIME::2010-06-18 11:57:25
BZCOMMENTOR::Margaret Beckwith
BZCOMMENT::6

I am creating a spreadsheet for all of the summaries so forget about the text patterns I put in earlier. This way all of the Board Managers can check the text before it goes into the summaries and Linda can create the Spanish text for all of the Spanish summaries.

Comment entered 2010-06-18 13:11:03 by alan

BZDATETIME::2010-06-18 13:11:03
BZCOMMENTOR::Alan Meyer
BZCOMMENT::7

(In reply to comment #6)
> I am creating a spreadsheet for all of the summaries so forget about the text
> patterns I put in earlier. This way all of the Board Managers can check the
> text before it goes into the summaries and Linda can create the Spanish text
> for all of the Spanish summaries.

That sounds like an excellent idea. It enables the Board Managers
to review every single string in a single document instead of
having to review hundreds of documents separately. Since the
context is the same for all of them (the boilerplate that Volker
will insert at publishing time), there's no need to look at the
individual documents.

I presume we'll have something like:

1 row per document with two columns:

CDR ID | PurposeText

Comment entered 2010-06-18 15:09:58 by Beckwith, Margaret (NIH/NCI) [E]

BZDATETIME::2010-06-18 15:09:58
BZCOMMENTOR::Margaret Beckwith
BZCOMMENT::8

Right now the spreadsheet has 3 columns: CDRID, Summary Title, and PurposeText.

Comment entered 2010-07-07 14:49:50 by Beckwith, Margaret (NIH/NCI) [E]

BZDATETIME::2010-07-07 14:49:50
BZCOMMENTOR::Margaret Beckwith
BZCOMMENT::9

PurposeText spreadsheet: Workbook 1 is English, Workbook 2 is Spanish. Columns are CDRID, summary title, and PurposeText.

Comment entered 2010-07-07 14:49:50 by Beckwith, Margaret (NIH/NCI) [E]

Attachment PurposeTextFINALEngSpan.xls has been added with description: PurposeText spreadsheet

Comment entered 2010-07-21 15:51:46 by Beckwith, Margaret (NIH/NCI) [E]

BZDATETIME::2010-07-21 15:51:46
BZCOMMENTOR::Margaret Beckwith
BZCOMMENT::10

Raising the priority on this so we can possibly get it done next week.

Comment entered 2010-07-21 16:51:37 by alan

BZDATETIME::2010-07-21 16:51:37
BZCOMMENTOR::Alan Meyer
BZCOMMENT::11

(In reply to comment #10)
> Raising the priority on this so we can possibly get it done next week.

Okay. I finished a first draft of the program last night and got a clean compile but haven't yet done any testing. I'll make it a priority tomorrow.

There were some irregularities in the spreadsheets - occasional blank lines and a little different format between the English (1 row per document) and Spanish (2 rows per doc) that required a little extra programming, but I've got that done. I'll see if I can get everything tested tomorrow.
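
The kind of row normalization described here might look like the following Python sketch. It is not the actual program. It assumes the rows have already been read from the spreadsheet as (CDRID, title, PurposeText) string tuples, and it assumes that in the two-row Spanish layout the second row has an empty CDRID cell; the ticket does not spell out that layout. The function name and the sample IDs are made up.

```python
def merge_rows(rows):
    """Collapse spreadsheet rows into a {cdr_id: purpose_text} dictionary.

    A row with a CDR ID starts a new document. A later row with an empty
    ID cell is treated as a continuation of that document (assumed layout).
    """
    result = {}
    current_id = None
    for cdr_id, _title, purpose in rows:
        cdr_id, purpose = cdr_id.strip(), purpose.strip()
        if not cdr_id and not purpose:
            continue  # blank line, or a line carrying only a title
        if cdr_id:
            current_id = cdr_id
            result.setdefault(current_id, "")
        if purpose:
            if current_id is None:
                raise ValueError("PurposeText found before any CDR ID")
            result[current_id] = purpose
    return result


# Hypothetical sample data: one English row, one blank row, one Spanish pair.
english = [("1001", "Title A", "the treatment of A"), ("", "", "")]
spanish = [("2001", "Titulo B", ""), ("", "", "el tratamiento de B")]
print(merge_rows(english))
print(merge_rows(spanish))
```

Handling both layouts in one pass avoids separate code paths for the two workbooks, at the cost of relying on the empty-ID assumption.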

Comment entered 2010-07-22 17:24:28 by alan

BZDATETIME::2010-07-22 17:24:28
BZCOMMENTOR::Alan Meyer
BZCOMMENT::12

The last entry in the Spanish text had the PurposeText in the wrong row. I fixed that. If anyone needs to edit the text, please use this version in order to include this fix.

I also noticed that all of the English texts have periods (".") at the end, but the Spanish do not. Was this by design, or did it just happen because two different people prepared the data? If desired, I can change the program so that they are all the same - all with periods or all without, if that is appropriate.

Comment entered 2010-07-22 17:24:28 by alan

Attachment Request4863.xls has been added with description: PurposeText spreadsheet with correction

Comment entered 2010-07-22 17:35:15 by alan

BZDATETIME::2010-07-22 17:35:15
BZCOMMENTOR::Alan Meyer
BZCOMMENT::13

I've finished the program and run it in test mode on Mahler. The
results are in two separate places:

English results:
http://mahler.nci.nih.gov/cgi-bin/cdr/ShowGlobalChangeTestResults.py?dir=2010-07-22_17-09-41

Spanish results:
http://mahler.nci.nih.gov/cgi-bin/cdr/ShowGlobalChangeTestResults.py?dir=2010-07-22_16-54-42

In both cases, I only transformed documents that did not have "NO
SECTION" or some case variant thereof, in the PurposeText column.
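
A small Python sketch of that selection rule, with a hypothetical function name. Trimming surrounding whitespace before comparing is an added assumption.

```python
def should_transform(purpose_text: str) -> bool:
    """Return False for "NO SECTION" in any letter case, True otherwise."""
    return purpose_text.strip().upper() != "NO SECTION"


print(should_transform("No Section"))
print(should_transform("the treatment of bladder cancer"))
```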

I have attached the log files for the two runs as a single file.
The first group is Spanish, the next English. There were no
errors except a few of the usual locked documents.

The next step might be to run in live mode on Mahler. If QA on
the test mode run is done before I am in again on Tuesday, I can
run this from home in order to move the testing forward more
quickly.

Comment entered 2010-07-22 17:35:15 by alan

Attachment Request4863.log has been added with description: Log file from two test runs (Spanish + English) on Mahler

Comment entered 2010-07-28 14:26:13 by Beckwith, Margaret (NIH/NCI) [E]

BZDATETIME::2010-07-28 14:26:13
BZCOMMENTOR::Margaret Beckwith
BZCOMMENT::14

I added a new Supportive Care summary called Family Caregivers to the spreadsheet. I tried to make the change to Alan's attachment since he had made an additional change to the Spanish, but when I saved it the change didn't show up. Sorry Alan, you will either need to remake your change to the Spanish in my attachment or add the extra English summary to yours!

Comment entered 2010-07-28 14:26:13 by Beckwith, Margaret (NIH/NCI) [E]

Attachment PurposeTextFINALEngSpan.xls has been added with description: Updated spreadsheet

Comment entered 2010-07-28 14:32:29 by Beckwith, Margaret (NIH/NCI) [E]

BZDATETIME::2010-07-28 14:32:29
BZCOMMENTOR::Margaret Beckwith
BZCOMMENT::15

I looked at a bunch of these, and I think it is fine to run it in live mode on Mahler. At that point, William, could you ask Linda to take a look at the Spanish?

Comment entered 2010-07-28 16:36:39 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2010-07-28 16:36:39
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::16

(In reply to comment #15)
> I looked at a bunch of these, and I think it is fine to run it in live mode on
> Mahler. At that point, William, could you ask Linda to take a look at the
> Spanish?

Sure - will do.

Comment entered 2010-07-29 15:01:38 by Beckwith, Margaret (NIH/NCI) [E]

BZDATETIME::2010-07-29 15:01:38
BZCOMMENTOR::Margaret Beckwith
BZCOMMENT::17

Added Robin H as a cc.

Comment entered 2010-07-29 22:25:59 by alan

BZDATETIME::2010-07-29 22:25:59
BZCOMMENTOR::Alan Meyer
BZCOMMENT::18

Here's the log file for the live run for the English docs on Mahler.

After running it, I remembered that Margaret added another document
to the spreadsheet. It's not included, but will be included when
I do the test run on Bach.

Comment entered 2010-07-29 22:25:59 by alan

Attachment Request4863English.log has been added with description: Log file from live run on Mahler - English only

Comment entered 2010-07-29 22:26:53 by alan

BZDATETIME::2010-07-29 22:26:53
BZCOMMENTOR::Alan Meyer
BZCOMMENT::19

Log file for live run on Mahler, Spanish documents only.

Comment entered 2010-07-29 22:26:53 by alan

Attachment Request4863Spanish.log has been added with description: Log file from live run on Mahler - Spanish only

Comment entered 2010-07-29 23:20:32 by alan

BZDATETIME::2010-07-29 23:20:32
BZCOMMENTOR::Alan Meyer
BZCOMMENT::20

Test run on Bach, English only.

Comment entered 2010-07-29 23:20:32 by alan

Attachment Request4863English.log has been added with description: Log file for test run on Bach - English only

Comment entered 2010-07-29 23:21:39 by alan

BZDATETIME::2010-07-29 23:21:39
BZCOMMENTOR::Alan Meyer
BZCOMMENT::21

Test run on Bach, Spanish only.

Comment entered 2010-07-29 23:21:39 by alan

Attachment Request4863Spanish.log has been added with description: Log file for test run on Bach - Spanish only

Comment entered 2010-07-30 13:54:32 by Juthe, Robin (NIH/NCI) [E]

BZDATETIME::2010-07-30 13:54:32
BZCOMMENTOR::Robin Juthe
BZCOMMENT::22

I verified the new PurposeText in several summaries on Mahler (in English).

The English test run on Bach looks good, too.

Comment entered 2010-07-30 14:05:44 by Juthe, Robin (NIH/NCI) [E]

BZDATETIME::2010-07-30 14:05:44
BZCOMMENTOR::Robin Juthe
BZCOMMENT::23

(In reply to comment #12)
> I also noticed that all of the English texts have periods (".") at the end, but
> the Spanish do not. Was this by design, or did it just happen because two
> different people prepared the data? If desired, I can change the program so
> that they are all the same - all with periods or all without, if that is
> appropriate.

I don't know whether this question has been answered, but I think periods should be added to the Spanish entries such that they are all the same. (I have not seen the Spanish translation of the Purpose paragraph, but I am thinking this text would also come at the end of the sentence in Spanish.) William, could you please ask Linda to verify this? Thanks.

Comment entered 2010-08-03 13:38:47 by alan

BZDATETIME::2010-08-03 13:38:47
BZCOMMENTOR::Alan Meyer
BZCOMMENT::24

(In reply to comment #23)
> (In reply to comment #12)
> > I also noticed that all of the English texts have periods (".") at the end, but
> > the Spanish do not. Was this by design, or did it just happen because two
> > different people prepared the data? If desired, I can change the program so
> > that they are all the same - all with periods or all without, if that is
> > appropriate.
>
> I don't know whether this question has been answered, but I think periods
> should be added to the Spanish entries such that they are all the same. (I have
> not seen the Spanish translation of the Purpose paragraph, but I am thinking
> this text would also come at the end of the sentence in Spanish.) William,
> could you please ask Linda to verify this? Thanks.

I can do that very easily in the program.

In thinking about it, maybe we should take off the periods from the English instead. The rationale for that is that the PurposeText is a phrase, not a complete sentence. It gets put into a complete sentence by the publishing filters. In the admittedly unlikely event that some suffix could be added as well as a prefix, it might make sense for a publishing filter to be responsible for the final punctuation since it is the agent that knows what the final disposition of the phrase is.

Comment entered 2010-08-03 13:43:28 by Juthe, Robin (NIH/NCI) [E]

BZDATETIME::2010-08-03 13:43:28
BZCOMMENTOR::Robin Juthe
BZCOMMENT::25

I think that's a good solution, Alan. Let's keep the punctuation outside of the placeholder text in both languages. Thanks.
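
The agreed rule (no final punctuation inside PurposeText in either language) can be sketched in Python as below. The function name is hypothetical, and removing only a single trailing period is an assumption about how the real program behaves.

```python
def strip_final_period(text: str) -> str:
    """Remove surrounding whitespace and at most one trailing period."""
    text = text.strip()
    if text.endswith("."):
        return text[:-1].rstrip()
    return text


print(strip_final_period("the treatment of bladder cancer."))
```

Applying the same function to the Spanish strings leaves them unchanged, since they had no final period.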

Comment entered 2010-08-03 17:14:30 by alan

BZDATETIME::2010-08-03 17:14:30
BZCOMMENTOR::Alan Meyer
BZCOMMENT::26

(In reply to comment #25)
> I think that's a good solution, Alan. Let's keep the punctuation outside of the
> placeholder text in both languages. Thanks.

This is done. I tested on Bach with a sample of 5 records in English and 5 in Spanish.

I think we're ready to run this live whenever we wish. I'd like Volker's confirmation of this, but I think this part doesn't require coordination with Volker's task since adding the PurposeText shouldn't affect anything in publishing until Volker promotes the changes he made to the filters.

Comment entered 2010-08-06 16:32:59 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2010-08-06 16:32:59
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::27

(In reply to comment #16)
> (In reply to comment #15)
> > I looked at a bunch of these, and I think it is fine to run it in live mode on
> > Mahler. At that point, William, could you ask Linda to take a look at the
> > Spanish?
>
> Sure - will do.

(In reply to comment #19)
> Created attachment 1967 [details]
> Log file from live run on Mahler - Spanish only
>
> Log file for live run on Mahler, Spanish documents only.

Linda looked at the live run on Mahler and she said everything looked good.

Comment entered 2010-08-10 11:00:40 by alan

BZDATETIME::2010-08-10 11:00:40
BZCOMMENTOR::Alan Meyer
BZCOMMENT::28

Volker and I have discussed whether it is practical to run this global change first, without waiting for other changes.

We think it is. However, as I test, I will run the global change in live mode on Franck. Volker will run publishing jobs before and after to be sure that the change did not affect anything.

Comment entered 2010-08-10 11:05:19 by alan

BZDATETIME::2010-08-10 11:05:19
BZCOMMENTOR::Alan Meyer
BZCOMMENT::29

(In reply to comment #28)
> ... as I test ...
Should be " ... as a test ... "

Comment entered 2010-08-10 15:15:27 by alan

BZDATETIME::2010-08-10 15:15:27
BZCOMMENTOR::Alan Meyer
BZCOMMENT::30

The test on Franck was successful. Adding the PurposeText element had no effect on current publishing. We can do this on Bach whenever we wish.

Unless there are some changes to the spreadsheets listing English and Spanish texts, I suggest we go ahead and run this in live mode on Bach.

I can do that tonight if desired, or whenever I'm authorized to do it.

Comment entered 2010-08-10 17:58:17 by Juthe, Robin (NIH/NCI) [E]

BZDATETIME::2010-08-10 17:58:17
BZCOMMENTOR::Robin Juthe
BZCOMMENT::31

I tried to verify that everything looked okay in a few summaries on Franck, but I received an error message that the PurposeText element was not allowed by the DTD.

Comment entered 2010-08-11 00:25:12 by alan

BZDATETIME::2010-08-11 00:25:12
BZCOMMENTOR::Alan Meyer
BZCOMMENT::32

(In reply to comment #31)
> I tried to verify that everything looked okay in a few summaries on Franck, but
> I received an error message that the PurposeText element was not allowed by the
> DTD.

There's a program that generates a new DTD from any changed XML schemas. Since the version of XMetal we use doesn't support schemas, we use that program to generate a DTD from our schemas.

The program had not been run for a while on Franck (we rarely use XMetal with Franck) and so XMetal didn't know about the PurposeText element in the modified Summary schema. I ran it, and everything should now look fine in XMetal on Franck. If anyone logged into Franck with XMetal today and didn't log out, they'll have to log out and log in again to see the change.

Comment entered 2010-08-11 12:14:17 by Juthe, Robin (NIH/NCI) [E]

BZDATETIME::2010-08-11 12:14:17
BZCOMMENTOR::Robin Juthe
BZCOMMENT::33

Everything does look good on Franck now. Thanks, Alan. Please go ahead with the live run on Bach.

Comment entered 2010-08-19 17:09:19 by alan

BZDATETIME::2010-08-19 17:09:19
BZCOMMENTOR::Alan Meyer
BZCOMMENT::34

I've re-run these on Franck with the last refresh from Bach.

Everything is ready for Volker's publishing test on Franck.

I'll add some more information in the OCECDR-3151 Bugzilla entry.

Comment entered 2010-08-27 00:26:53 by alan

BZDATETIME::2010-08-27 00:26:53
BZCOMMENTOR::Alan Meyer
BZCOMMENT::35

The final run is complete. Log files are attached to the
Bugzilla entry for OCECDR-3151.

Marking this resolved-fixed.

Comment entered 2010-08-27 00:38:46 by alan

BZDATETIME::2010-08-27 00:38:46
BZCOMMENTOR::Alan Meyer
BZCOMMENT::36

I forgot to note here that I checked the PurposeText global change.

It should be safe to run when PurposeText already exists in the
Summaries. In those cases, it will do nothing to the document.
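
The "do nothing if PurposeText already exists" behavior could be sketched in Python with the standard library's ElementTree, as below. This is not the actual global change code: the function name is hypothetical, and both the parent element name (SummaryMetaData) and appending the new element as its last child are assumptions about the Summary schema.

```python
import xml.etree.ElementTree as ET


def add_purpose_text(summary_xml: str, purpose: str) -> str:
    """Add a PurposeText element unless the document already has one."""
    root = ET.fromstring(summary_xml)
    if root.find(".//PurposeText") is not None:
        return summary_xml  # already present: return the input untouched
    # Assumed parent element name and insertion position.
    metadata = root.find("SummaryMetaData")
    if metadata is None:
        raise ValueError("no SummaryMetaData element found")
    ET.SubElement(metadata, "PurposeText").text = purpose
    return ET.tostring(root, encoding="unicode")


doc = ("<Summary><SummaryMetaData><SummaryType>Treatment</SummaryType>"
       "</SummaryMetaData></Summary>")
once = add_purpose_text(doc, "the treatment of bladder cancer")
twice = add_purpose_text(once, "some other text")
print(once == twice)
```

Because the second call returns its input unchanged, rerunning the change over already processed documents is harmless in this sketch.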

Comment entered 2010-08-30 11:22:28 by Beckwith, Margaret (NIH/NCI) [E]

BZDATETIME::2010-08-30 11:22:28
BZCOMMENTOR::Margaret Beckwith
BZCOMMENT::37

Everything live on Cancer.gov. Issue closed.

Attachments
File Name Posted User
PurposeTextFINALEngSpan.xls 2010-07-28 14:26:13
PurposeTextFINALEngSpan.xls 2010-07-07 14:49:50
Request4863.log 2010-07-22 17:35:15
Request4863.xls 2010-07-22 17:24:28
Request4863English.log 2010-07-29 23:20:32
Request4863English.log 2010-07-29 22:25:59
Request4863Spanish.log 2010-07-29 23:21:39
Request4863Spanish.log 2010-07-29 22:26:53
