Issue Number | 3351 |
---|---|
Summary | [DIS] Global to replace Description Text in Metadata of Drug Combination Summaries |
Created | 2011-05-05 11:34:13 |
Issue Type | Improvement |
Submitted By | Juthe, Robin (NIH/NCI) [E] |
Assigned To | alan |
Status | Closed |
Resolved | 2011-05-23 13:44:24 |
Resolution | Fixed |
Path | /home/bkline/backups/jira/ocecdr/issue.107679 |
BZISSUE::5044
BZDATETIME::2011-05-05 11:34:13
BZCREATOR::Robin Juthe
BZASSIGNEE::Alan Meyer
BZQACONTACT::William Osei-Poku
We are changing the description text in DIS document templates for new
DIS (see issue 5042). We would like to also make a global change to all
existing DIS documents to have consistency in the wording. The current
and new description text is provided below.
Current Wording:
This page contains brief information from the National Cancer Institute
(NCI) about the drug combination called {Enter the DCS regimen} and
lists the drugs included in the combination. Links to NCI's Drug
Information Summaries about the individual drugs in the combination are
included, when available.
New Wording:
This page contains brief information about the drug combination called
{Enter the DCS regimen}. The drugs in the combination are listed, and
links to individual drug summaries are included.
BZDATETIME::2011-05-09 20:59:39
BZCOMMENTOR::Alan Meyer
BZCOMMENT::1
For this global I first extracted the relevant Description
strings and examined them, then I made four different
substitutions based on what I found in the record. I think all
of the ones I did were intended, though only two of them were
explicitly stated in the issue description.
Here are the substitutions I made:
1. Replaced:
" from the National Cancer Institute (NCI)"
with: nothing, i.e., deleted it from the Description.
2. Replaced:
" and lists the drugs included in the combination. Links to
NCI's Drug Information Summaries about the individual drugs in
the combination are included, when available."
3. Replaced:
" and lists the drugs included in the combination. Links to
NCI's Drug Information Summaries about the individual drugs in
the combination are included."
4. Replaced:
" and lists the drugs included in the combination. Links to
NCI's Drug Information Summaries about the individual drugs in
the combination are also included."
Items 2-4 all got the same replacement string:
". The drugs in the combination are listed, and links to
individual drug summaries are included."
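The four substitutions above could be sketched roughly as follows. This is only an illustration, not the script that was actually run: the function name `rewrite_description` and the whitespace normalization are assumptions, and accepting a curly apostrophe in "NCI's" is an extra added here for illustration.

```python
import re

# Replacement shared by substitutions 2-4 (text taken from the comment above).
NEW_TAIL = (". The drugs in the combination are listed, and links to "
            "individual drug summaries are included.")

# Substitution 1: this phrase is simply deleted.
NCI_PHRASE = " from the National Cancer Institute (NCI)"

# Substitutions 2-4 differ only near the end: ", when available",
# nothing extra, or "also ". One pattern covers all three.
# Assumption: a curly apostrophe in "NCI's" is also accepted.
OLD_TAIL = re.compile(
    r" and lists the drugs included in the combination\. Links to "
    r"NCI['\u2019]s Drug Information Summaries about the individual drugs "
    r"in the combination are (?:also )?included(?:, when available)?\."
)


def rewrite_description(text):
    """Return the Description text with the four substitutions applied."""
    text = " ".join(text.split())          # collapse line breaks and runs of spaces
    text = text.replace(NCI_PHRASE, "")    # substitution 1
    return OLD_TAIL.sub(NEW_TAIL, text)    # substitutions 2-4
```

A Description that matches none of the patterns passes through unchanged (apart from whitespace), so any document with a different wording would still need a separate look.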
To select documents I looked for Drug Information Summaries that
had "drug combination" somewhere in the record. There were 27 of
those. 26 got modified by the global change.
Results are in:
http://mahler.nci.nih.gov/cgi-bin/cdr/ShowGlobalChangeTestResults.py?dir=2011-05-09_20-10-02
BZDATETIME::2011-05-10 14:23:37
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::2
I am looking at the results and wondering if I am interpreting them correctly. Are the lines with the + sign the current wording to be populated when you run in live mode?
BZDATETIME::2011-05-10 14:30:23
BZCOMMENTOR::Alan Meyer
BZCOMMENT::3
(In reply to comment #2)
> I am looking at the results and wondering if I am interpreting them
> correctly. Are the lines with the + sign the current wording to be
> populated when you run in live mode?
Ooops. Yes.
Let me fix that and get back to you.
BZDATETIME::2011-05-10 16:49:34
BZCOMMENTOR::Alan Meyer
BZCOMMENT::4
(In reply to comment #3)
> Let me fix that and get back to you.
I looked at this one. It's yet a different pattern from the
others and it's further complicated by having a non-ascii
character for the single quote in "NCI's".
I wonder if we're going about this the wrong way. Instead of
doing a search and replace, attempting to recognize similar but
slightly different search texts, maybe we should just replace all
of the Description elements with perfectly standardized text,
with just the drug name or drug combination name varying between
them? I could do this for both this issue and the one for Bug
#5043.
Is that what we really want? It would make everything uniform, no
matter what is in the Description text now.
Here's a possible algorithm:
    Select all Drug Information Summaries.
    For each one:
        title = /DrugInformationSummary/Title
        If /DrugInformationSummary/DrugInfoMetaData/DrugInfoType/@Combination = "Yes"
            drugCombo = true
        Else
            drugCombo = false
        If drugCombo is true
            Use the pattern given for OCECDR-3351 plugging in the
            title of the drug.
        Else
            Use the pattern given for OCECDR-3350 plugging in the
            title of the drug.
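A minimal Python sketch of the per-document step in that algorithm, assuming each summary is available as an XML string. The function name `new_description` and the `{name}` placeholder are hypothetical. The combination wording comes from this issue's description; the OCECDR-3350 wording is not quoted here, so the sketch takes it as a parameter rather than guessing at it.

```python
import xml.etree.ElementTree as ET

# New wording for drug combination summaries, from this issue's description.
COMBO_PATTERN = (
    "This page contains brief information about the drug combination "
    "called {name}. The drugs in the combination are listed, and links "
    "to individual drug summaries are included."
)


def new_description(doc_xml, single_drug_pattern):
    """Build the standardized Description text for one summary.

    single_drug_pattern is the OCECDR-3350 wording with a {name}
    placeholder; it must be supplied by the caller.
    """
    root = ET.fromstring(doc_xml)  # root element is DrugInformationSummary
    title = (root.findtext("Title") or "").strip()
    info_type = root.find("DrugInfoMetaData/DrugInfoType")
    drug_combo = info_type is not None and info_type.get("Combination") == "Yes"
    pattern = COMBO_PATTERN if drug_combo else single_drug_pattern
    return pattern.format(name=title)
```

Selecting the documents and saving the new Description back into each one are left out, since they depend on the global change framework.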
To make it easier to decide, I've extracted the Description text
for all DrugInformationSummary docs on Bach. It is attached as a
plain text file.
Attachment disDesc.txt has been added with description: Current values of DIS Descriptions on Bach
BZDATETIME::2011-05-10 17:07:54
BZCOMMENTOR::Robin Juthe
BZCOMMENT::5
(In reply to comment #4)
I agree - this is a better approach. Please proceed with this algorithm (as opposed to the search & replace method) in test mode. Thanks, Alan.
BZDATETIME::2011-05-10 20:06:08
BZCOMMENTOR::Alan Meyer
BZCOMMENT::6
These things always have twists and turns.
I wrote the program and it works, but taking the name of the drug
from the Title element produces a title with possibly
inappropriate capitalization. The title capitalizes the first
letter of each word but that doesn't look right in the text of
the description.
I could use the terminology or glossary link but they can have
extra words like "regimen" in "ABVD regimen".
So I tried the following:
    For each word in the Title string:
        If it has 2 capital letters together, or is a single capital:
            Leave it alone.
        Else lower case it.
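The capitalization rule above might look like this in Python. It is a sketch only: the function name `soften_title` is hypothetical, and it assumes "capital letter" means ASCII A-Z and that words are separated by whitespace.

```python
import re

# Two capital letters in a row anywhere in the word, e.g. "ABVD".
TWO_CAPS = re.compile(r"[A-Z]{2}")


def soften_title(title):
    """Lower-case each word unless it looks like an acronym or a lone capital."""
    words = []
    for word in title.split():
        if TWO_CAPS.search(word) or (len(word) == 1 and word.isupper()):
            words.append(word)          # leave it alone
        else:
            words.append(word.lower())  # e.g. "Hydrochloride" -> "hydrochloride"
    return " ".join(words)
```

Under this rule a title written entirely in capitals, such as a combination named by its individual drugs, is left exactly as it is.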
The output results of the global are in:
http://mahler.nci.nih.gov/cgi-bin/cdr/ShowGlobalChangeTestResults.py?dir=2011-05-10_20-01-49
To see what the capitalization did with data from Bach, see the
attached file showing:
CDR ID of the document.
Title string from the document.
(Possibly) transformed title string.
I notice that for a few drugs, the entire string appears to be
inappropriately capitalized.
I also notice that some of the customization that people put into
the Descriptions is probably appropriate. For example, in our
boilerplate text we refer to "drugs", but in some cases the
original hand-constructed description referred to "vaccines".
Maybe the best thing to do is to run the global, but edit a few
of the descriptions afterward by hand.
Attachment disCaps.txt has been added with description: Summary of capitalizations of terms on Bach.
BZDATETIME::2011-05-10 22:13:39
BZCOMMENTOR::Alan Meyer
BZCOMMENT::7
BZDATETIME::2011-05-10 22:16:39
BZCOMMENTOR::Alan Meyer
BZCOMMENT::8
In case anyone is interested, there were three vaccines in
the set of drug info summaries. They are:
CDR0000658573
CDR0000658589
CDR0000672821
If we want to edit any of them by hand to change "drug" to
"vaccine" after the global change, those are the ones to look
at.
Alternatively, I can build this into the software - which is
probably not justified if it's just for these three, but might
be if we want to keep this global and use it in the future as
the basis for revising future drug info summaries.
BZDATETIME::2011-05-11 08:17:29
BZCOMMENTOR::Margaret Beckwith
BZCOMMENT::9
These can definitely be fixed by hand. I like the red text in bugzilla!
BZDATETIME::2011-05-12 09:45:40
BZCOMMENTOR::Robin Juthe
BZCOMMENT::10
(In reply to comment #9)
> These can definitely be fixed by hand.
I agree - any variations will be fixed by hand. It does appear that your capitalization rule worked fine.
In addition to the few vaccines, I noticed the following docs that may need to be edited by hand to fix capitalization errors afterwards (just noting them here - I will share these with Deb/Diana):
632971
686550
698393
I verified the results of the global on Mahler. Please run on BACH.
BZDATETIME::2011-05-12 10:40:12
BZCOMMENTOR::Alan Meyer
BZCOMMENT::11
(In reply to comment #10)
> ...
> I verified the results of the global on Mahler. Please run on BACH.
I've run in test mode on Bach.
BZDATETIME::2011-05-12 13:24:21
BZCOMMENTOR::Robin Juthe
BZCOMMENT::12
(In reply to comment #11)
> I've run in test mode on Bach.
Verified the test run. Please do a live run on Bach, and then users will fix the few documents that need it.
As decided in today's meeting, please also assemble the diff results of the test run into a single file so users can identify which changes need to be made. Thanks.
BZDATETIME::2011-05-12 15:14:52
BZCOMMENTOR::Alan Meyer
BZCOMMENT::13
Thinking this might be generally useful, I wrote a little program
(DiffReport.py) that will combine all of the diffs output by a
test mode global change.
I ran it to just output diffs from current working docs. A copy
is attached.
The program can combine diffs from any of CWDs, last published
versions, and last versions.
I'm thinking that this may be useful enough that we want to put
it on the menus to enable users to run it on any global change
test results to enable them to see all of the outputs at once.
It might be a faster way to spot problems than checking outputs
one document at a time.
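To illustrate the idea of a combined report (this is not the actual DiffReport.py, whose inputs and options are not shown in this issue), a sketch using Python's standard difflib might look like the following. The function name `combined_diff_report` and the (doc_id, old_text, new_text) input shape are assumptions made for the example.

```python
import difflib


def combined_diff_report(changes):
    """Combine per-document diffs into a single report string.

    changes: iterable of (doc_id, old_text, new_text) tuples.
    Documents whose text did not change are omitted.
    """
    sections = []
    for doc_id, old_text, new_text in changes:
        diff_lines = difflib.unified_diff(
            old_text.splitlines(),
            new_text.splitlines(),
            fromfile=f"{doc_id} (old)",
            tofile=f"{doc_id} (new)",
            lineterm="",
        )
        body = "\n".join(diff_lines)
        if body:  # unified_diff yields nothing for identical inputs
            sections.append(body)
    return "\n\n".join(sections)
```

Each section begins with the document ID in the diff header, so a reviewer can scan a single file and still tell which document a change belongs to.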
I'll do some other things for a bit in case someone spots
anything untoward in the output. If I don't hear of any
problems, I'll run the live mode global change after 5 pm
tonight.
Attachment 5044Diff.txt has been added with description: Combined diff report for current working docs
BZDATETIME::2011-05-13 12:45:55
BZCOMMENTOR::Alan Meyer
BZCOMMENT::14
(In reply to comment #12)
> (In reply to comment #11)
> > I've run in test mode on Bach.
>
> Verified the test run. Please do a live run on Bach, and then users
> will fix the few documents that need it.
I forgot to do this last night. I'll do it tonight, after 5 pm.
Doing it in the evening often works better because there are less
likely to be documents locked by other users.
BZDATETIME::2011-05-13 18:08:49
BZCOMMENTOR::Alan Meyer
BZCOMMENT::15
(In reply to comment #14)
> (In reply to comment #12)
> > (In reply to comment #11)
> > > I've run in test mode on Bach.
> >
> > Verified the test run. Please do a live run on Bach, and then
> > users will fix the few documents that need it.
>
> I forgot to do this last night. I'll do it tonight, after 5 pm.
> Doing it in the evening often works better because there are less
> likely to be documents locked by other users.
Well... maybe not.
Weekly publishing is going to run tonight. We had a problem with the last publishing job. If I run this global, 130 or so protocols will be added to tonight's publishing job. Maybe it's safer to wait until we see what happens during tonight's run, without adding a new, possibly complicating, factor.
I'll hold off and run it later in the weekend.
BZDATETIME::2011-05-15 00:51:31
BZCOMMENTOR::Alan Meyer
BZCOMMENT::16
The program ran live on Bach. A log file is attached.
Attachment xxx has been added with description: Log file from live run on Bach
BZDATETIME::2011-05-16 12:40:19
BZCOMMENTOR::Robin Juthe
BZCOMMENT::17
(In reply to comment #16)
> The program ran live on Bach. A log file is attached.
Verified on Bach. The errors in the log file seem to all be in unpublished documents. I've sent this file as well as the combination diff report to Deb, Diana, and Erin so they can make the few manual changes to vaccines, etc. as mentioned earlier.
BZDATETIME::2011-05-16 12:41:47
BZCOMMENTOR::Robin Juthe
BZCOMMENT::18
I plan to wait until next Monday to take a look at the descriptions on Cancer.gov before closing this issue.
BZDATETIME::2011-05-23 13:40:32
BZCOMMENTOR::Robin Juthe
BZCOMMENT::19
(In reply to comment #18)
> I plan to wait until next Monday to take a look at the descriptions
> on Cancer.gov before closing this issue.
I checked a few DIS and a few DCS on Cancer.gov and the descriptions look good. Closing this issue. Thanks!
BZDATETIME::2011-05-23 13:44:24
BZCOMMENTOR::Robin Juthe
BZCOMMENT::20
One last note: I learned that the drug combination summaries are always in all caps, even when they are named by individual drugs (e.g., GEMCITABINE-CISPLATIN). This is a style convention and not an error (as I had thought in comment 10), so these will not be manually updated. The vaccines, however, were updated.
File Name | Posted | User |
---|---|---|
5044Diff.txt | 2011-05-12 15:14:52 | |
disCaps.txt | 2011-05-10 20:06:08 | |
disDesc.txt | 2011-05-10 16:49:34 | |
xxx | 2011-05-15 00:51:31 | |