Issue Number | 3265 |
---|---|
Summary | Global Change for substage terms |
Created | 2010-11-18 12:15:07 |
Issue Type | Improvement |
Submitted By | Osei-Poku, William (NIH/NCI) [C] |
Assigned To | alan |
Status | Closed |
Resolved | 2011-02-03 11:36:37 |
Resolution | Fixed |
Path | /home/bkline/backups/jira/ocecdr/issue.107593 |
BZISSUE::4955
BZDATETIME::2010-11-18 12:15:07
BZCREATOR::William Osei-Poku
BZASSIGNEE::Alan Meyer
BZQACONTACT::William Osei-Poku
1. We are currently creating more substages (IA, IIB, etc.) for some existing indexing terms due to the AJCC project. Normally when a substage term is created, a search is conducted to find affected trials to manually reindex. In many cases, hundreds of trials would be identified for manual reindexing. The re-indexing may involve replacing the plain term (stage II, for example) with all the substages (IIA, IIB etc) (or possibly, the addition of other substages to the Plain term).
If re-indexing involves replacing one term with another, we use the Simple Links Global since it does a one to one replacement. What we need is to be able to replace one term with multiple terms or the ability to globally add terms to protocols, for example, without replacing existing ones.
Would it be possible to:
A. Modify the Global Change Simple Links global to allow the
replacement of one term with multiple terms? Or
B. Create another global similar to the Global Change Simple Links
global which will either:
i. Allow the addition of terms without replacement? In this case, the
new global will have to be used in conjunction with the Global Change
Simple Links global.
ii. Allow the replacement of one term with two or more terms (Same as A.
above but the new global will be used only in these special cases
described above.)?
The global is likely to affect protocol documents more than any other document type.
2. Genetics Professional Mailer (687050) appears to have been
submitted by the professional multiple times, one with changes and the
other without changes. Judging from the dates in the emails, they appear
to have been submitted within one minute of each other. The mailer was
generated or sent on 2010-10-08T12:09:09
First response was received back - October 13, 2010 4:38 PM - without
changes
Second response was received back - Wednesday, October 13, 2010 4:39 PM
- with changes.
This appears to be an isolated case.
3. Request from a Genetics Professional to add additional groups/Gen. Societies: Collaborative Group of the Americas on Inherited Colorectal Cancer (CGA-ICC) and International Society for Gastrointestinal Hereditary Tumors (InSiGHT)
BZDATETIME::2010-11-18 16:57:52
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::1
I modified the title of this issue so that it will be used for the global changes. I will provide details of our requests soon and also create another issue for the 3rd request in comment #1.
BZDATETIME::2010-11-18 17:47:51
BZCOMMENTOR::Alan Meyer
BZCOMMENT::2
At our meeting, we said that writing a specific global change
program would be dramatically easier than modifying the general
purpose simple link global, or creating a new general purpose
global.
However, in thinking about it, I don't want to rule out working
with general purpose software. In the first place, it may not
involve as much drama as we thought, and in the second place,
general purpose programs tend to be re-used at times that we
don't envision when we write them and the long term benefit can be
high.
So, I think getting more details on the requirements for the
global is necessary, but let's not assume yet that we'll have to
write one or more special purpose globals.
Here is a basic question:
William said at the meeting that when a stage splits into two
substages e.g., IIa and IIb, we sometimes need to remove II
and add IIa and IIb, and we sometimes need to retain all
three (II, IIa, IIb.) How do we know which is which?
1. From the ID of the link to be changed, e.g.,
Change all CDR37761 (stage II colon cancer) to two new
terms for stage IIa and IIb.
Leave all CDR43716 (stage II childhood liver cancer)
alone but add two new terms for stage IIa and IIb.
2. From some other criterion that can be read
programmatically in the document we are changing, e.g.,
If (obviously I'm making this up):
/CTGovProtocol/PDQIndexing/StudyCategory/StudyCategoryName
= "Treatment"
Split into two terms.
Else:
Add two terms leaving the existing one alone.
3. There is no straightforward programmatic way to determine
which trial should be treated which way. A human has to
read the trial document and apply human judgment.
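Option 1 amounts to a lookup table keyed by the CDR ID of the old term. A minimal Python sketch of that idea follows. The names `LinkRule`, `RULES`, and `plan_for` are hypothetical, and the IDs 900001 through 900004 are invented placeholders for substage terms, not real CDR documents.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkRule:
    new_ids: tuple   # CDR IDs of the substage terms to add
    keep_old: bool   # True: add alongside the old term; False: replace it


# Keys are the old term IDs from the example above.
# The new IDs (900001-900004) are placeholders made up for illustration.
RULES = {
    37761: LinkRule(new_ids=(900001, 900002), keep_old=False),  # replace
    43716: LinkRule(new_ids=(900003, 900004), keep_old=True),   # keep and add
}


def plan_for(old_id):
    """Return the rule for an old term ID, or None if the term is unaffected."""
    return RULES.get(old_id)
```

With a table like this, the program needs no knowledge of the document being changed; the ID of the link alone decides between replacing and adding.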
Here's another question that may help us in determining whether
it's worthwhile to use a general purpose program.
Are there other uses that might benefit from this capability
besides the splitting of stages?
[I just said above that there tend to be uses for general
purpose software that we don't envision in advance, but if we
can envision some that gives us both more reason to write the
software in a general way, and more ideas about what the
generality should be.]
BZDATETIME::2010-12-20 12:51:52
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::3
I believe (1) above will meet our needs. Let's take the case where new substage terms are replacing a plain term, which is a major part of what we will be using the global for. (The other case of adding new substage terms to existing plain terms will rarely be used)
For example (and this is a real example of what we could use the
global for)
New specific substage terms have been created for stage II prostate
cancer 38784.
They are 688983 stage IIA prostate cancer
and 688984 stage IIB prostate cancer. The newly created terms (688983
and 688984) are to replace 38784 in all protocols (CTGov and InScope)
indexed with 38784.
So we know:
i. the CDR ID of the term to be replaced (38784)
ii. the CDR ID of the newly created substage or specific terms that are
replacing existing term - (688983 and 688984)
iii. that the substage terms (688983 and 688984) are replacing the plain
term (38784) (as opposed to adding them)
iv. through a linked doc report, the affected trials (CTGovProtocol and
InScopeProtocol and perhaps in the near future, CTRPProtocol)
The second part of this request is when we need to add specific terms to either existing plain terms or existing substage terms. But as I stated above, only in rare situations will we want to do this.
BZDATETIME::2010-12-20 16:15:40
BZCOMMENTOR::Alan Meyer
BZCOMMENT::4
I'm thinking about a design like the following that starts with
the Simple Link program but supplements its functionality:
1. Have the user choose whether to delete the existing link and
replace it, or leave it alone and add more links.
This might be a pair of radio buttons on the screen on which
the user specifies the existing "Old" link, for example:
(o) Replace this link with one or more others.
( ) Keep this link and add one or more others.
"Replace..." would be the default.
Or we could have a single check box:
[ ] Keep this link and add one or more others.
If checked, we keep the old link. Else we delete the old
link. The default would be to leave the box unchecked,
causing the old link to be deleted/replaced.
2. Allow a user to enter more than one "New" (add or replace)
term.
Again there could be radio buttons or a check box. For
example:
( ) Add more terms after this one.
(o) This is the last term.
or alternatively:
[ ] Add more terms after this one.
If "Add more terms..." is selected or checked, then after the
user enters the data for a new term and resolves any name
choices, another screen comes up just like the previous one
to enter another "new" link.
I thought about putting multiple input boxes on one page so that
a user could enter multiple terms all at once. That is faster
and more convenient, but may only be so if we're entering CDR
IDs, not strings. If we allow a user to enter a string like
"stage IIA prostate cancer" and then resolve it to a link, it
could be tricky to have multiple picklists returned to choose
from.
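The effect of the two choices on a single document's list of index terms can be sketched in Python as follows. This is only an illustration of the intended behavior, not the program itself. The function name `change_links` is hypothetical, and skipping new terms that the document already carries is an assumption, not something decided in this issue.

```python
def change_links(links, old_id, new_ids, keep_old=False):
    """Return a new list of term IDs with old_id replaced or supplemented.

    links    -- term IDs currently indexed on one document, in order
    old_id   -- the "Old" link chosen by the user
    new_ids  -- one or more "New" links
    keep_old -- False: replace old_id (the default); True: keep it and add
    """
    result = []
    for link in links:
        if link != old_id:
            result.append(link)
            continue
        if keep_old:
            result.append(link)
        for new_id in new_ids:
            # Assumption: never add a term the document already has.
            if new_id not in links and new_id not in result:
                result.append(new_id)
    return result
```

For example, `change_links([100, 38784, 200], 38784, (688983, 688984))` puts the two substage terms where 38784 was, while passing `keep_old=True` leaves 38784 in place and adds them after it.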
If we implement this approach, the new revised program will be
able to do everything that simple link replacement did, but also
be able to replace terms with more than one replacement and add
terms as well. I'd be inclined to eliminate the old program
entirely. Instead of:
CDR Global Change Simple Links
we'd just have:
CDR Global Change Links
Here are some questions:
Is this a good plan?
Can someone think of a better one?
Is there any benefit to the existing capability that allows a
user to enter a string for a new term and find a CDR ID in a
pick list, or do users always know the CDR ID of the term
they're adding? If the latter, we can simplify the user
interface a little by putting a collection of, say, five
input boxes on the page for entering new links and allow the
process to be done in one fell swoop.
If we do switch to CDR ID input only, I would still produce a
confirmation page with the full document title for each CDR
ID and allow the user to check that everything is right
before committing a change.
BZDATETIME::2010-12-21 11:44:08
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::5
(In reply to comment #4)
> I'm thinking about a design like the following that starts with
> the Simple Link program but supplements its functionality:
> 1. Have the user choose whether to delete the existing link and
> replace it, or leave it alone and add more links.
> This might be a pair of radio buttons on the screen on which
> the user specifies the existing "Old" link, for example:
> (o) Replace this link with one or more others.
> ( ) Keep this link and add one or more others.
> "Replace..." would be the default.
A pair of radio buttons is preferred.
> Or we could have a single check box:
> [ ] Keep this link and add one or more others.
> If checked, we keep the old link. Else we delete the old
> link. The default would be to leave the box unchecked,
> causing the old link to be deleted/replaced.
> 2. Allow a user to enter more than one "New" (add or replace)
> term.
> Again there could be radio buttons or a check box. For
> example:
> ( ) Add more terms after this one.
> (o) This is the last term.
> or alternatively:
> [ ] Add more terms after this one.
> If "Add more terms..." is selected or checked, then after the
> user enters the data for a new term and resolved any name
> choices, another screen comes up just like the previous one
> to enter another "new" link.
Two radio buttons is again preferred due to clarity.
Would the new screen that comes up (after Add more terms after this one
is selected), have the previously entered new term(s)? I thought it
would be good to have it that way.
> Here are some questions:
> Is this a good plan?
Yes. I think it is.
> Can someone think of a better one?
> Is there any benefit to the existing capability that allows a
> user to enter a string for a new term and find a CDR ID in a
> pick list, or do users always know the CDR ID of the term
> they're adding? If the latter, we can simplify the user
> interface a little by putting a collection of, say, five
> input boxes on the page for entering new links and allow the
> process to be done in one fell swoop.
> If we do switch to CDR ID input only, I would still produce a
> confirmation page with the full document title for each CDR
> ID and allow the user to check that everything is right
> before committing a change.
There are benefits to entering the string as opposed to the CDR IDs. For example, before the user can get the CDR ID, he or she will need to do a search to retrieve it, then write it down or copy and paste it. The ability to do a string search on the same page shortens the process and makes it seamless. However, not many users will be using this global and it won't be used frequently. So I think it is OK to replace it with CDR ID input only.
BZDATETIME::2010-12-23 22:26:08
BZCOMMENTOR::Alan Meyer
BZCOMMENT::6
I have gotten pretty far along with the design of the new,
general purpose global change when I ran into a problem. The
problem has been there all along with the original Simple Links
global change, but we never noticed it before.
The problem is that we never gave sufficient thought to the
handling of attributes other than cdr:ref and cdr:href in the
simple links program.
Looking back at Bugzilla issue #4586 I see that, at that time, I
thought it was right to preserve all of the attributes on a link
element when we changed its value. That's what the software
does. But the only attribute I seem to have thought about was
cdr:id.
It does seem right to preserve a cdr:id, but there are other
attributes for which it's not right. One is PdqKey - for which
it's always wrong to preserve it - though PdqKey no longer has
any real meaning and can be harmlessly wrong.
And what about the url attribute on an href? Sometimes it might
be right to preserve it and sometimes not.
At this point I can think of various possible changes we can make
even to the existing simple link replacement program. I've given
some relative difficulty ratings to each one, where each one
might be the number of hours required to implement it. These are
purely ballpark numbers. I won't know real numbers until I've
done it.
1. Have the program always preserve certain specific attributes
and always discard others. Presumably the list of attributes
is small.
Hours = 5
1.a. Have the list of attributes to preserve or discard be
document type and/or link type specific - possibly specifying
them in the link type tables.
Hours = 25
2. Have the user explicitly specify attributes he wants
preserved or discarded. Again more programming.
Hours = 15
3. Always preserve everything (the current technique).
Hours = 0. We already have this.
4. Always discard everything.
Hours = 1. It's very easy.
Testing will be proportionately difficult for users. The harder
to implement options will also be harder to test.
If we do preserve attributes, the same questions arise regarding
propagating them to new elements. If link A splits into A1 and
A2, do A's attributes attach to A1, both A1 and A2, or neither?
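Options 1, 3, and 4 differ only in which of the old element's attributes a replacement element inherits, so they can be sketched as one small Python function. The function name `carry_over` and the policy strings are hypothetical, and the attribute names in the usage note are examples only.

```python
def carry_over(attrs, policy="discard_all", keep=()):
    """Return the attributes a replacement link element should inherit.

    attrs  -- dict of attribute name -> value on the old link element
    policy -- "preserve_all" (option 3), "discard_all" (option 4),
              or "keep_listed" (option 1)
    keep   -- attribute names to preserve under "keep_listed"
    """
    if policy == "preserve_all":
        return dict(attrs)
    if policy == "discard_all":
        return {}
    if policy == "keep_listed":
        return {name: value for name, value in attrs.items() if name in keep}
    raise ValueError("unknown policy: %r" % (policy,))
```

Given `{"cdr:id": "_12", "PdqKey": "CDR0000038784"}`, the "keep_listed" policy with `keep=("cdr:id",)` would carry the cdr:id forward and drop PdqKey. It does not answer the propagation question: when one link splits into two, copying the same cdr:id onto both new elements would duplicate it.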
In light of this issue, it's possible that we don't want to
pursue the general purpose global link change that I had in mind,
and just write a custom global change for substage terms. But I
don't want to throw out the general purpose approach if we decide
on a simple solution to the problem (see below).
If we do go for the complex general purpose global link change,
I'm inclined toward the following simple solution.
a. Implement technique number 4 above in all cases.
4 makes the most sense when a term splits into two terms, and
it provides some consistency to do it in all cases. It might
also be a better choice for most simple one -> one link
cases.
b. Document what will happen in the user interface so that users
are more aware of the issue.
If users aren't sure what will happen, they should use test
mode to find out (as I'm sure they do now.)
c. Don't use the (modified) simple link program where the above
rules don't work.
I think we should discuss this before I get further into the
program.
BZDATETIME::2010-12-28 11:41:15
BZCOMMENTOR::Alan Meyer
BZCOMMENT::7
(In reply to comment #6)
> ...
> I think we should discuss this before I get further into the
> program.
It looks like, with everyone away for the holidays, there won't
be any quick answers to the questions I raised in comment #6. In
order to make progress I've therefore decided on a redesign of
the transformation technique in the program that should make
implementation of any of the solutions above fairly
straightforward.
I've written some lxml code to replace the XSLT transform that I
used in the original Simple Link global. The new lxml code is
easier to modify and test to handle the different cases.
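For readers unfamiliar with this style of transform, here is a rough sketch of a tree-based link replacement using technique 4 (discard all attributes). It is not the code in the program. To stay dependency-free it uses the standard library's `xml.etree.ElementTree` in place of lxml. The namespace URI, the function names, and the choice to leave the new element's text empty are all assumptions.

```python
import xml.etree.ElementTree as ET

CDR_NS = "cips.nci.nih.gov/cdr"   # assumed namespace URI for the cdr: prefix
REF = "{%s}ref" % CDR_NS          # fully qualified name of the cdr:ref attribute


def cdr_id(num):
    """Format an integer as a CDR document ID string, e.g. CDR0000038784."""
    return "CDR%010d" % num


def replace_links(root, old_id, new_ids, keep_old=False):
    """Replace (or supplement) every element whose cdr:ref equals old_id.

    Each new element gets only a cdr:ref attribute; every other attribute
    of the old element is discarded, and the new element's text is left
    empty. Returns the number of old links found.
    """
    old_ref = cdr_id(old_id)
    changed = 0
    # Snapshot the elements first so edits do not disturb the traversal.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.get(REF) != old_ref:
                continue
            pos = list(parent).index(child)
            if keep_old:
                pos += 1              # insert the new links after the old one
            else:
                parent.remove(child)  # new links take the old one's place
            for offset, new_id in enumerate(new_ids):
                new = ET.Element(child.tag, {REF: cdr_id(new_id)})
                new.tail = child.tail
                parent.insert(pos + offset, new)
            changed += 1
    return changed
```

Because each case is a few lines of ordinary Python, switching to a different attribute policy later would mean changing how `new` is built and nothing else.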
I'm going to proceed with the solution I recommended and get
everything working if I can. If we want to change things and
adopt different answers to the questions than the answers I
implement, it should be relatively easy to do.
BZDATETIME::2010-12-28 12:28:57
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::8
Sounds good to me. I was waiting for us to discuss this at the meeting as you suggested. I just wanted to say that so far we have used the Simple Links Global for links with the cdr:ref attribute and not for the cdr:href attribute. This is likely to be the case in the future but I agree that the problem you identified should be addressed, just in case.
BZDATETIME::2010-12-31 00:23:21
BZCOMMENTOR::Alan Meyer
BZCOMMENT::9
Progress report:
I've finished the program. It's not right yet. There's a lot of testing still to do, but it's getting there. I expect it to be ready some time next week.
BZDATETIME::2011-01-06 11:17:32
BZCOMMENTOR::Alan Meyer
BZCOMMENT::10
This is complete and ready for test on Mahler.
I have tested only in test mode, not live mode, but since the mechanisms for test/live processing have not changed I didn't see a need to save modified documents with incorrect data.
BZDATETIME::2011-01-06 17:36:22
BZCOMMENTOR::Alan Meyer
BZCOMMENT::11
The program is currently accessible for testing through the following menu:
CDR Admin
Developers/System Administrators
Global Changes
Global Change Links
If someone who needs to test can't get to that I can add it to the CIAT/OCCM menu.
I now think that, once this is tested and approved, we should remove the old Global Change Simple Links program and only use this one. The reason is that we shouldn't have two ways to do the same thing. We risk making an update to one of them one day and either doubling the maintenance work to apply it to the other or, worse, not remembering to do it in the other.
BZDATETIME::2011-01-11 13:31:30
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::12
I am able to run the global successfully in test mode but not live mode. Below is the message I get when I run in live mode:
GlobalChangeLinkBatch failed: Failure saving changes for CDR0000063882: User woseipoku has not checked out this document. Storing document not allowed
BZDATETIME::2011-01-11 14:33:40
BZCOMMENTOR::Alan Meyer
BZCOMMENT::13
(In reply to comment #12)
> I am able to run the global successfully in test mode but not live mode. Below
> is the message I get when I run in live mode:
>
> GlobalChangeLinkBatch failed: Failure saving changes for CDR0000063882: User
> woseipoku has not checked out this document. Storing document not allowed
Maybe I broke something in the ModifyDocs modules (OCECDR-3210).
I'm not at the office today but I'll check it out and fix it on Thursday.
BZDATETIME::2011-01-11 14:34:46
BZCOMMENTOR::Alan Meyer
BZCOMMENT::14
(In reply to comment #13)
> Maybe I broke something in the ModifyDocs modules (OCECDR-3210).
That should have said OCECDR-3209.
BZDATETIME::2011-01-13 18:34:43
BZCOMMENTOR::Alan Meyer
BZCOMMENT::15
I believe that I found the cause of the problem. I was using the values True and False when I should have been using 'Y' and 'N' to tell the system to, or not to, checkout the document before processing it.
However it's not ready for testing yet. I discovered what looks like two other bugs that I hadn't seen before.
I'll post a message when it's ready.
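One way to guard against this kind of mix-up is to normalize the flag in a single place before it reaches the system. The helper below is a hypothetical sketch, not part of the CDR libraries.

```python
def yn_flag(value):
    """Convert a Python bool to the 'Y'/'N' string form; pass 'Y'/'N' through."""
    if isinstance(value, bool):
        return "Y" if value else "N"
    if value in ("Y", "N"):
        return value  # already in the expected form
    raise ValueError("expected bool, 'Y' or 'N', got %r" % (value,))
```

Rejecting anything else, rather than guessing, makes a wrong value fail loudly at the call site instead of silently skipping the checkout.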
BZDATETIME::2011-01-20 11:41:27
BZCOMMENTOR::Alan Meyer
BZCOMMENT::16
This is ready for testing again. I fixed the bug that William reported and, as often happens, found two other subtle bugs and fixed those too.
BZDATETIME::2011-01-20 14:27:08
BZCOMMENTOR::Alan Meyer
BZCOMMENT::17
For the historical archives:
----------------------------------------------------------------------
M:\home\alan\cdr\trunk\Lib\Python>"C:\Program Files (x86)\CollabNet\Subversion Client\svn.exe" commit GlobalChangeLinkBatch.py
Sending GlobalChangeLinkBatch.py
Transmitting file data .
Committed revision 10000.
----------------------------------------------------------------------
That was our 10,000th source code commit to our version control system.
BZDATETIME::2011-01-24 15:12:00
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::18
I have done a lot of tests and the global seems to be working fine. However, I found a few things that may need fixing:
1. When adding multiple terms (either to an existing term or
replacing an existing term), a bug appears to prevent me from entering a
cdr ID. I am only successful at entering the title string. This is the
error message I get
"Please enter either an id OR a string, not both. ID=CDR0000039905
Name=acral lentiginous malignant melanoma;Index term;Legacy-cellular
type;Cancer diagnosis"
Steps to duplicate the error:
i. I enter the old CDR ID without a problem
ii. I enter the first new CDR ID without a problem
iii. I enter the second new CDR ID and click next.
Then the above message displays. The only way I am able to proceed is to
go back, clear the cdr ID and enter the title of the term. From the
error message, you would think that I had both a string and CDR ID in
the input boxes but that was not the case. Initially, I thought that it
was a caching problem with the browser since I am testing with the same
set of index terms but I varied the terms just to verify this and I
still got this error message.
2. This is not a bug but I think it will be good to implement. When a
change will break the link target rule, would it be possible to either
(1), prevent the global from working or (2) Display a warning message
for the user to determine the next line of action?
For example, when doing a global change to replace a diagnosis term with
a drug term; typically, this should not happen but it appears that the
global will run 'successfully' but invalidate the records. I only tried
in test mode and I was presented with errors in the results.
Example: Replacing CDR0000038784 with CDR0000561135 should not work
since they have different semantic types and the semantic type for
561135 does not appear to have been defined for the diagnosis linking
element.
At this point, my preference is to prevent the global from running. If
this cannot be done then, users will have to be careful before running
the global because that could invalidate hundreds of records.
3. This one is also not a bug, but it would be good to warn the user when it happens. The global allowed me to add the same term multiple times in one run of the global. I do not see the need for this right now but we may need it in the future. However, it is possible for a user to type in the same cdr id multiple times thinking that he/she is adding two or more terms. It looks like a warning message will do in this case. If that is not possible or if a warning message is not adequate, we can prevent this from working since there is no value in adding one term multiple times.
I am still testing. I have done a lot of testing in test mode and am starting to do live mode testing....
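The warning requested in item 3 could be driven by a check along these lines. This is a sketch only. The function names are hypothetical, and the accepted input forms for a CDR ID (with or without the "CDR" prefix and leading zeros) are an assumption.

```python
import re


def parse_cdr_id(text):
    """Accept '39905', 'CDR39905' or 'CDR0000039905' and return the integer ID."""
    match = re.fullmatch(r"\s*(?:CDR)?(\d+)\s*", text, re.IGNORECASE)
    if not match:
        raise ValueError("not a CDR ID: %r" % (text,))
    return int(match.group(1))


def duplicate_ids(ids):
    """Return the IDs that occur more than once, in order of first repeat."""
    seen = set()
    dups = []
    for doc_id in ids:
        if doc_id in seen and doc_id not in dups:
            dups.append(doc_id)
        seen.add(doc_id)
    return dups
```

Comparing parsed integers rather than raw strings means that `CDR0000688983` and `688983` are recognized as the same term before the warning is shown.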
BZDATETIME::2011-01-25 16:42:21
BZCOMMENTOR::Alan Meyer
BZCOMMENT::19
(In reply to comment #18)
> I have done a lot of tests and the global seems to be working fine. However, I
> found a few things that may need fixing:
Thanks for your excellent testing William.
I have fixed the bug you reported in your item number 1, and implemented the warning you requested in item 3.
I haven't done number 2, the check for link properties.
This is feasible but not easy. Right now, the only validation that the server supports is validating an entire document, not a single element or a single link. To implement it I'd have to write some pretty significant code in the server as well as in the client side global change modules. So I'm inclined to think that this may be more trouble than we wish - especially given that problems of this kind will be revealed if a user runs first in test mode.
If you think it's important, let's discuss it in Thursday's meeting.
Meanwhile, the other changes are in place on Mahler and can be tested.
BZDATETIME::2011-01-28 00:53:31
BZCOMMENTOR::Alan Meyer
BZCOMMENT::20
Notes to myself:
How to put this into production:
o Replace Global Change Simple Links with the new Global Change Links
in the menu structures.
o Install any modified files in Subversion (file list below).
o Promote:
Revised cgi/menu files.
cgi/GlobalChangeLink.py
lib/Python/ModifyDocs.py
lib/Python/GlobalChangeLinkBatch.py
lib/Python/cdr.py
BZDATETIME::2011-01-28 14:42:42
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::21
(In reply to comment #19)
> (In reply to comment #18)
> > I have done a lot of tests and the global seems to be working fine. However, I
> > found a few things that may need fixing:
>
> Thanks for your excellent testing William.
>
> I have fixed the bug you reported in your item number 1, and implemented the
> warning you requested in item 3.
I tested this one and it works now. Thanks!
>
> I haven't done number 2, the check for link properties.
>
> This is feasible but not easy. Right now, the only validation that the server
> supports is validating an entire document, not a single element or a single
> link. To implement it I'd have to write some pretty significant code in the
> server as well as in the client side global change modules. So I'm inclined to
> think that this may be more trouble than we wish - especially given that
> problems of this kind will be revealed if a user runs first in test mode.
>
> If you think it's important, let's discuss it in Thursday's meeting.
I think it is OK to shelve this for now. The global will still be useful
without this enhancement. If it becomes necessary to have it, I will bring it
up at the appropriate time.
>
> Meanwhile, the other changes are in place on Mahler and can be tested.
I tested # 2 also and it works also. Thanks!
BZDATETIME::2011-02-01 15:30:41
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::22
I have completed testing the global and I did not find any more problems. Previously, I said I wanted to also test on Franck before you promote the changes to Bach but I have done enough tests on Mahler and I am confident that if there were any problems with the global I would have found them so please promote to BACH and FRANCK. I would rather run the first global change on Franck before running it on Bach but I don't expect to run into any problems.
I also tested the global with href linking elements to verify the issue you brought up about the text content and that is indeed the case. However, the warning message beneath the refs and hrefs check boxes should be enough to warn users to be careful when making href global changes.
BZDATETIME::2011-02-01 15:39:23
BZCOMMENTOR::Alan Meyer
BZCOMMENT::23
I will promote the changes to Bach and Franck tonight.
BZDATETIME::2011-02-03 00:29:22
BZCOMMENTOR::Alan Meyer
BZCOMMENT::24
I have promoted everything to Franck and Bach. I tested up
through the point of submitting a job, but canceled there.
Just for safety, I suggest that the first time a global change
in production is made using the new software, that it be run in
test mode first - though that may already be standard practice.
BZDATETIME::2011-02-03 11:36:37
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::25
(In reply to comment #24)
> I have promoted everything to Franck and Bach. I tested up
> through the point of submitting a job, but canceled there.
>
> Just for safety, I suggest that the first time a global change
> in production is made using the new software, that it be run in
> test mode first - though that may already be standard practice.
I have verified this exists on Bach. Since I will not be running any
globals now, I am closing this issue and will re-open if I run into any
problems when I run my first global. I have also noted your suggestion
of running first in test mode.
Thank you, Alan!