CDR Tickets

Issue Number 2696
Summary Add CTGovInterventionType to CTGov exports
Created 2008-11-06 15:47:44
Issue Type Improvement
Submitted By alan
Assigned To Kline, Bob (NIH/NCI) [C]
Status Closed
Resolved 2009-02-17 20:21:51
Resolution Fixed
Path /home/bkline/backups/jira/ocecdr/issue.107024
Description

BZISSUE::4367
BZDATETIME::2008-11-06 15:47:44
BZCREATOR::Alan Meyer
BZASSIGNEE::Bob Kline
BZQACONTACT::Lakshmi Grama

When the global change to add CTGovInterventionType elements
to Term documents is completed, the new element should be
included in protocols sent to NLM.

Comment entered 2008-11-13 10:56:48 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2008-11-13 10:56:48
BZCOMMENTOR::Bob Kline
BZCOMMENT::1

Lakshmi:

Any instructions on how to do this?

Comment entered 2008-11-13 13:19:17 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2008-11-13 13:19:17
BZCOMMENTOR::Bob Kline
BZCOMMENT::2

Next step is for Bob to post a description of what we currently do for intervention, then Lakshmi will provide instructions for how this will be modified.

Comment entered 2008-11-14 10:43:22 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2008-11-14 10:43:22
BZCOMMENTOR::Bob Kline
BZCOMMENT::3

Here's what the documentation for the Intervention mapping class says:

Transformation to convert all of the original document's
Intervention elements to the structure specified by NLM's
DTD, based on the semantic types of the terms found in the
Intervention elements. Implemented as a post-process since
the XSL/T filter which is fed the vendor XML document must
retrieve the semantic type information from the CDR, as it
is not exported with the vendor document.

For mapping logic, see mapping.xls (Lakshmi Grama, 2002-12-12,
revised 2002-12-13), attached to issue #1892 with comment #42.
Intervention type parents suppressed 2007-10-24 at Lakshmi's
request. Further revision of the mapping logic posted by Lakshmi
2008-05-18 with comment #43 of issue #4076.

Here's a link to the most recent version of that mapping logic:

http://verdi.nci.nih.gov/tracker/attachment.cgi?id=1473

Comment entered 2008-11-25 09:42:19 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2008-11-25 09:42:19
BZCOMMENTOR::Bob Kline
BZCOMMENT::4

Lakshmi and I met this morning to go over the revised logic for exporting intervention information to CT.gov.

LET PROTOCOL-DOC = VENDOR IN-SCOPE-PROTOCOL DOCUMENT
LET AOG-BLOCKS = ARRAY OF ARM-OR-GROUP ELEMENT BLOCKS OF PROTOCOL-DOC
FOR EACH INTERVENTION ELEMENT IN PROTOCOL-DOC:
  LET I-DESC = VALUE OF INTERVENTION-DESCRIPTION ELEMENT
  LET TYPE-DOC = TERM DOCUMENT LINKED BY INTERVENTION-TYPE ELEMENT
  LET CTG-TYPE = CTGOV-INTERVENTION-TYPE FROM TYPE-DOC
  LET AOG-LINKS = ARRAY OF VALUES FROM ARM-OR-GROUP-LINK ELEMENTS
  FOR EACH AOG-LINK IN AOG-LINKS:
    IF AOG-LINK NOT FOUND AS LABEL IN ANY MEMBER OF AOG-BLOCKS:
      RAISE EXCEPTION, FAILING EXPORT OF DOCUMENT
  IF AOG-BLOCKS IS EMPTY OR AOG-LINKS IS NOT EMPTY:
    FOR EACH INTERVENTION-NAME-LINK ELEMENT:
      LET LINKED-TERM-DOC = VENDOR DOCUMENT TARGET OF ELEMENT LINK
      LET S-TYPES = ARRAY OF SEMANTIC TYPES FROM LINKED-TERM-DOC
      IF NONE OF S-TYPES IS 'DRUG/AGENT COMBINATION':
        LET OUTPUT-NAME = PREFERRED-NAME FROM LINKED-TERM-DOC
        CREATE NEW INTERVENTION BLOCK IN OUTPUT DOCUMENT
        ADD INTERVENTION-TYPE CHILD TO BLOCK WITH VALUE FROM CTG-TYPE
        ADD INTERVENTION-NAME CHILD TO BLOCK WITH VALUE FROM OUTPUT-NAME
        IF AOG-BLOCKS IS NOT EMPTY:
          IF I-DESC IS MISSING:
            RAISE EXCEPTION
          ADD INTERVENTION-DESCRIPTION CHILD TO BLOCK WITH VALUE FROM I-DESC
        FOR EACH AOG-LINK IN AOG-LINKS:
          ADD ARM-GROUP-LABEL CHILD TO BLOCK WITH VALUE OF AOG-LINK
    IF NO INTERVENTION-NAME-LINK ELEMENTS ARE PRESENT:
      LET OUTPUT-NAME = PREFERRED-NAME FROM TYPE-DOC
      CREATE NEW INTERVENTION BLOCK IN OUTPUT DOCUMENT
      ADD INTERVENTION-TYPE CHILD TO BLOCK WITH VALUE FROM CTG-TYPE
      ADD INTERVENTION-NAME CHILD TO BLOCK WITH VALUE FROM OUTPUT-NAME
      IF AOG-BLOCKS IS NOT EMPTY:
        IF I-DESC IS MISSING:
          RAISE EXCEPTION
        ADD INTERVENTION-DESCRIPTION CHILD TO BLOCK WITH VALUE FROM I-DESC
      FOR EACH AOG-LINK IN AOG-LINKS:
        ADD ARM-GROUP-LABEL CHILD TO BLOCK WITH VALUE OF AOG-LINK

This logic will replace the mapping spreadsheets cited in the previous comment. Please let me know if I have captured what we said accurately, Lakshmi.

We also discussed a report which will contain the following columns:

  • Name of Term document linked by InterventionType element

  • Name of Term document linked by InterventionNameLink element
    ("None" for Intervention blocks without any InterventionNameLink children)

  • Mapped value for intervention_type element in output document

  • Mapped value for intervention_name element in output document

  • Count of unique occurrences for this combination of values in the other
    four columns

I will capture the raw data so that if the report shows anomalies in some of the mapping values Lakshmi can be provided with the list of InScopeProtocol documents for which the suspect mappings occurred.
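
The report columns above can be sketched as a simple tally. This is a minimal illustration, not the actual report code: the record layout (one tuple per exported intervention block) and the function name `tally_mappings` are assumptions made for the example.

```python
from collections import Counter, defaultdict


def tally_mappings(records):
    """Count each mapping combination and track the protocols that produced it.

    Each record is a hypothetical tuple:
    (protocol_id, type_term_name, name_term_name_or_None, out_type, out_name)
    """
    counts = Counter()
    protocols = defaultdict(set)
    for protocol_id, type_term, name_term, out_type, out_name in records:
        # "None" stands in for Intervention blocks without InterventionNameLink children
        key = (type_term, name_term or "None", out_type, out_name)
        counts[key] += 1
        protocols[key].add(protocol_id)
    return counts, protocols
```

Keeping the protocol IDs for each combination is what would allow a suspect row to be traced back to the InScopeProtocol documents involved.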

Comment entered 2008-11-25 10:24:35 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2008-11-25 10:24:35
BZCOMMENTOR::Bob Kline
BZCOMMENT::5

Here's a revision of the logic, consolidating some duplicated information:

LET PROTOCOL-DOC = VENDOR IN-SCOPE-PROTOCOL DOCUMENT
LET AOG-BLOCKS = ARRAY OF ARM-OR-GROUP ELEMENT BLOCKS OF PROTOCOL-DOC
FOR EACH INTERVENTION ELEMENT IN PROTOCOL-DOC:
  LET I-DESC = VALUE OF INTERVENTION-DESCRIPTION CHILD ELEMENT
  LET TYPE-DOC = TERM DOCUMENT LINKED BY INTERVENTION-TYPE CHILD ELEMENT
  LET AOG-LINKS = ARRAY OF VALUES FROM ARM-OR-GROUP-LINK CHILD ELEMENTS
  LET CTG-TYPE = CTGOV-INTERVENTION-TYPE FROM TYPE-DOC
  FOR EACH AOG-LINK IN AOG-LINKS:
    IF AOG-LINK NOT FOUND AS LABEL IN ANY MEMBER OF AOG-BLOCKS:
      RAISE EXCEPTION, FAILING EXPORT OF DOCUMENT
  IF AOG-BLOCKS IS EMPTY OR AOG-LINKS IS NOT EMPTY:
    IF AOG-BLOCKS IS NOT EMPTY AND I-DESC IS MISSING:
      RAISE EXCEPTION, FAILING EXPORT OF DOCUMENT
    FOR EACH INTERVENTION-NAME-LINK ELEMENT E:
      LET LINKED-TERM-DOC = VENDOR DOCUMENT TARGET OF E
      LET S-TYPES = ARRAY OF SEMANTIC TYPES FROM LINKED-TERM-DOC
      IF NONE OF S-TYPES IS 'DRUG/AGENT COMBINATION':
        LET OUTPUT-NAME = PREFERRED-NAME FROM LINKED-TERM-DOC
        CREATE NEW INTERVENTION BLOCK IN OUTPUT DOCUMENT
        ADD INTERVENTION-TYPE CHILD TO BLOCK WITH VALUE FROM CTG-TYPE
        ADD INTERVENTION-NAME CHILD TO BLOCK WITH VALUE FROM OUTPUT-NAME
        IF AOG-BLOCKS IS NOT EMPTY:
          ADD INTERVENTION-DESCRIPTION CHILD TO BLOCK WITH VALUE FROM I-DESC
        FOR EACH AOG-LINK IN AOG-LINKS:
          ADD ARM-GROUP-LABEL CHILD TO BLOCK WITH VALUE OF AOG-LINK
    IF NO INTERVENTION-NAME-LINK ELEMENTS ARE PRESENT:
      LET OUTPUT-NAME = PREFERRED-NAME FROM TYPE-DOC
      CREATE NEW INTERVENTION BLOCK IN OUTPUT DOCUMENT
      ADD INTERVENTION-TYPE CHILD TO BLOCK WITH VALUE FROM CTG-TYPE
      ADD INTERVENTION-NAME CHILD TO BLOCK WITH VALUE FROM OUTPUT-NAME
      IF AOG-BLOCKS IS NOT EMPTY:
        ADD INTERVENTION-DESCRIPTION CHILD TO BLOCK WITH VALUE FROM I-DESC
      FOR EACH AOG-LINK IN AOG-LINKS:
        ADD ARM-GROUP-LABEL CHILD TO BLOCK WITH VALUE OF AOG-LINK
REMOVE DUPLICATE INTERVENTION BLOCKS IN OUTPUT (IGNORING I-DESC DIFFERENCES)
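
For readers who find the pseudocode easier to follow as running code, here is a rough Python sketch of the same steps. It is an illustration under stated assumptions, not the export software itself: the dictionary keys, the `ExportFailure` class and the function name are hypothetical, and treating the arm labels as part of the duplicate key is a guess, since the pseudocode only says that description differences are ignored.

```python
class ExportFailure(Exception):
    """Raised to fail the export of a single protocol document."""


def map_interventions(interventions, arm_labels):
    """Build output intervention blocks for one protocol document.

    interventions: list of dicts with hypothetical keys "description",
        "type_doc", "arm_links" and "name_docs".
    Term dicts use hypothetical keys "name", "ctgov_type", "semantic_types".
    arm_labels: set of labels taken from the protocol's ArmOrGroup blocks.
    """
    blocks = []
    for item in interventions:
        desc = item.get("description")
        type_doc = item["type_doc"]
        links = item.get("arm_links", [])
        for link in links:
            if link not in arm_labels:
                raise ExportFailure(f"arm/group link {link!r} matches no label")
        if arm_labels and not links:
            continue  # the protocol has arms, but this intervention uses none
        if arm_labels and not desc:
            raise ExportFailure("intervention description is missing")
        name_docs = item.get("name_docs", [])
        if name_docs:
            names = [doc["name"] for doc in name_docs
                     if "Drug/agent combination" not in doc["semantic_types"]]
        else:
            names = [type_doc["name"]]
        for name in names:
            block = {"type": type_doc["ctgov_type"], "name": name, "arms": tuple(links)}
            if arm_labels:
                block["description"] = desc
            blocks.append(block)
    # Remove duplicate blocks, ignoring differences in the description
    seen = set()
    unique = []
    for block in blocks:
        key = (block["type"], block["name"], block["arms"])
        if key not in seen:
            seen.add(key)
            unique.append(block)
    return unique
```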

Comment entered 2008-11-28 18:20:14 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2008-11-28 18:20:14
BZCOMMENTOR::Bob Kline
BZCOMMENT::6

Implemented on Mahler. Here's the report described in comment #4:

http://mahler.nci.nih.gov/InterventionMappings-20081128164537.html

and here are the results of a test run on Mahler:

http://mahler.nci.nih.gov/cgi-bin/cdr/ViewCTGovExports.py?job=20081128160210

Comment entered 2008-12-01 08:36:02 by Grama, Lakshmi (NIH/NCI) [E]

BZDATETIME::2008-12-01 08:36:02
BZCOMMENTOR::Lakshmi Grama
BZCOMMENT::7

I would like Mary and Doug to review the first report, particularly those that are marked as Biologic/vaccine. I think some of these may need to be tagged as drug. I am also looking at the list and will send the spreadsheet with my questions to Mary.

Comment entered 2008-12-01 09:04:53 by Grama, Lakshmi (NIH/NCI) [E]

BZDATETIME::2008-12-01 09:04:53
BZCOMMENTOR::Lakshmi Grama
BZCOMMENT::8

Need CDR ID for this combination in report
cardiotoxicity attenuation None Drug cardiotoxicity attenuation

Comment entered 2008-12-01 10:18:56 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2008-12-01 10:18:56
BZCOMMENTOR::Bob Kline
BZCOMMENT::9

(In reply to comment #8)
> Need CDR ID for this combination in report
> cardiotoxicity attenuation None Drug cardiotoxicity attenuation
>

CDR601334

Comment entered 2008-12-01 11:23:25 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2008-12-01 11:23:25
BZCOMMENTOR::Bob Kline
BZCOMMENT::10

Here's a version of the mapping report which allows you to see which documents were involved in any particular mapping combination by just clicking on the Count column for any row in the table, which will display the CDR IDs for the documents in which the mapping represented by that row was performed.

http://mahler.nci.nih.gov/cgi-bin/cdr/InterventionMappings.py

Comment entered 2008-12-01 16:01:37 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2008-12-01 16:01:37
BZCOMMENTOR::Bob Kline
BZCOMMENT::11

Here's what the results on Bach look like:

http://bach.nci.nih.gov/cgi-bin/cdr/InterventionMappings.py

Comment entered 2008-12-01 17:25:10 by Grama, Lakshmi (NIH/NCI) [E]

BZDATETIME::2008-12-01 17:25:10
BZCOMMENTOR::Lakshmi Grama
BZCOMMENT::12

(In reply to comment #10)
> Here's a version of the mapping report which allows you to see which documents
> were involved in any particular mapping combination by just clicking on the
> Count column for any row in the table, which will display the CDR IDs for the
> documents in which the mapping represented by that row was performed.
> http://mahler.nci.nih.gov/cgi-bin/cdr/InterventionMappings.py

It was a little hard to see this because I had to keep going to the top of the page to see the IDs and I kept losing my place. Could the IDs just show in another window?

Comment entered 2008-12-02 10:00:31 by Grama, Lakshmi (NIH/NCI) [E]

BZDATETIME::2008-12-02 10:00:31
BZCOMMENTOR::Lakshmi Grama
BZCOMMENT::13

Bob and I discussed possible modifications to see if we could resolve some of the problems with mappings for terms that have a semantic type of drug/agent.

Also, in the context of CTGOV, we should only export an InterventionName as part of a single "intervention_type, intervention_name" pair. CTGOV is looking at intervention type as the inherent property of the intervention rather than its use as a modality or method of action (for drugs). In this paradigm, it is difficult to think of the same substance showing up in the same record as a Drug and a Dietary Supplement - which could possibly happen.

To avoid this, I am recommending that in CTGOV export, we only allow an intervention name to appear in one intervention_type/intervention_name pair. In cases where there are multiple possibilities, I recommend that we use a precedence table -
1. Drug
2. Biologic/Vaccine
3. Dietary Supplement.

This does not address issues where a dietary supplement such as White button mushroom extract is paired with an intervention type of aromatase inhibition therapy - it will be mapped to Drug since aromatase inhibition therapy is mapped to Drug.

But it would be worth trying to see what, if any, really problematic instances are identified with this tweak to the logic.

Comment entered 2008-12-02 14:32:36 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2008-12-02 14:32:36
BZCOMMENTOR::Bob Kline
BZCOMMENT::14

(In reply to comment #12)

> It was a little hard to see this because I had to keep going to the top of the
> page to see the IDs and I kept losing my place. Could the IDs just show in
> another window?

Done. And here's the URL for the mappings using the modified logic based on the hierarchy of CT.gov intervention type values:

http://bach.nci.nih.gov/cgi-bin/cdr/InterventionMappings.py?suffix=20081202124117

Comment entered 2008-12-02 14:41:32 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2008-12-02 14:41:32
BZCOMMENTOR::Bob Kline
BZCOMMENT::15

This is an excerpt from the export logs for failures caused when the presence of multiple occurrences of the same intervention name could not be eliminated by using the new hard-wired intervention type precedence table.

Comment entered 2008-12-02 14:41:32 by Kline, Bob (NIH/NCI) [C]

Attachment multiple-types.log has been added with description: List of failures caused by duplicate intervention names

Comment entered 2008-12-02 14:51:16 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2008-12-02 14:51:16
BZCOMMENTOR::Bob Kline
BZCOMMENT::16

(In reply to comment #15)
> Created an attachment (id=1588) [details]
> List of failures caused by duplicate intervention names
>
> This is an excerpt from the export logs for failures caused when the presence
> of multiple occurrences of the same intervention name could not be eliminated
> by using the new hard-wired intervention type precedence table.
>

Looks like most (though not all) of the lines in this list of failures were caused by a typo in comment 13. I'm going to change "Biologic/Vaccine" to "Biological/Vaccine" and run a fresh test export job.

Comment entered 2008-12-02 15:51:59 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2008-12-02 15:51:59
BZCOMMENTOR::Bob Kline
BZCOMMENT::17

(In reply to comment #16)

> Looks like most (though not all) of the lines in this list of failures were
> caused by a typo in comment 13. I'm going to change "Biologic/Vaccine" to
> "Biological/Vaccine" and run a fresh test export job.
>

Here are the mappings from this latest run:

http://bach.nci.nih.gov/cgi-bin/cdr/InterventionMappings.py?suffix=20081202152603

Comment entered 2008-12-02 15:58:15 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2008-12-02 15:58:15
BZCOMMENTOR::Bob Kline
BZCOMMENT::18

Comment entered 2008-12-02 15:58:15 by Kline, Bob (NIH/NCI) [C]

Attachment failures.log has been added with description: Problems encountered during test export job.

Comment entered 2008-12-04 13:02:45 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2008-12-04 13:02:45
BZCOMMENTOR::Bob Kline
BZCOMMENT::19

Increased priority at status meeting.

Comment entered 2008-12-09 11:04:01 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2008-12-09 11:04:01
BZCOMMENTOR::Bob Kline
BZCOMMENT::20

This is a report requested by Lakshmi off-line.

Comment entered 2008-12-09 11:04:01 by Kline, Bob (NIH/NCI) [C]

Attachment InterventionNameSemanticTypes.xls has been added with description: Semantic types for intervention names

Comment entered 2008-12-11 07:55:00 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2008-12-11 07:55:00
BZCOMMENTOR::Bob Kline
BZCOMMENT::21

Lakshmi:

Could you post a summary here of what the new mapping logic will be? Thanks!

Comment entered 2008-12-11 16:04:25 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2008-12-11 16:04:25
BZCOMMENTOR::Bob Kline
BZCOMMENT::22

Comment entered 2008-12-11 16:04:25 by Kline, Bob (NIH/NCI) [C]

Attachment InterventionNameSemanticTypes.xls has been added with description: Same spreadsheet, but with the "Drug/agent combination" lines deleted

Comment entered 2008-12-17 08:56:39 by Grama, Lakshmi (NIH/NCI) [E]

BZDATETIME::2008-12-17 08:56:39
BZCOMMENTOR::Lakshmi Grama
BZCOMMENT::23

Bob and I discussed modifications to logic based on the additional mapping of drug/agent terms to CTGOVInterventionType. Issue needs to be updated with the revised logic.

Comment entered 2008-12-17 09:32:23 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2008-12-17 09:32:23
BZCOMMENTOR::Bob Kline
BZCOMMENT::24

The notes we were scribbling the other day for the latest logic for mapping intervention values are a bit cryptic. Here's what I have been able to decipher (with the help of some fuzzy memory):

for each Intervention element in the trial document:
  find the InterventionType child of the Intervention element
  find the document (IT) linked by that element
  for each InterventionNameLink child:
    find the document (INL) linked by that child element
    get the preferred name (NAME) from that document
    get the semantic types for the INL document
    if any of these semantic types is 'Drug/agent combination':
      do nothing
    otherwise, if any of these semantic types is 'Drug/agent':
      find the CTGovInterventionType value from the INL document
      use it as intervention_type, with NAME as intervention_name
    otherwise:
      find the CTGovInterventionType value in the IT document
      use it as intervention_type, with NAME as intervention_name
  if the Intervention element has no InterventionNameLink children:
    find the CTGovInterventionType value in the IT document
    use it as intervention_type
    use preferred name of IT document as intervention_name

I've elided logic for the checks for arms and intervention descriptions, which hasn't changed.

Does this look right to you?
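
As a cross-check on the wording, the type-selection rule above can be written out as a short Python sketch. The term dictionaries and their keys ("name", "ctgov_type", "semantic_types") are hypothetical stand-ins for the IT and INL documents, and the function name is invented for the example.

```python
def mapped_pairs(type_doc, name_docs):
    """Return (intervention_type, intervention_name) pairs for one Intervention element.

    type_doc: hypothetical dict for the IT document.
    name_docs: list of hypothetical dicts for the INL documents.
    """
    pairs = []
    for doc in name_docs:
        semantic_types = doc["semantic_types"]
        if "Drug/agent combination" in semantic_types:
            continue  # do nothing for drug/agent combinations
        if "Drug/agent" in semantic_types:
            ctgov_type = doc["ctgov_type"]       # type comes from the INL document
        else:
            ctgov_type = type_doc["ctgov_type"]  # type comes from the IT document
        pairs.append((ctgov_type, doc["name"]))
    if not name_docs:
        # No InterventionNameLink children: fall back to the IT document
        pairs.append((type_doc["ctgov_type"], type_doc["name"]))
    return pairs
```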

Comment entered 2008-12-24 11:30:01 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2008-12-24 11:30:01
BZCOMMENTOR::Bob Kline
BZCOMMENT::25

(In reply to comment #24)

> ... Does this look right to you?

Lakshmi:

I'm holding off on the actual implementation until I get confirmation that you've looked over the logic and approve it.

Comment entered 2009-01-05 08:19:16 by Grama, Lakshmi (NIH/NCI) [E]

BZDATETIME::2009-01-05 08:19:16
BZCOMMENTOR::Lakshmi Grama
BZCOMMENT::26

(In reply to comment #24)
I think the logic is correct. Please go ahead.

Comment entered 2009-01-05 11:06:13 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2009-01-05 11:06:13
BZCOMMENTOR::Bob Kline
BZCOMMENT::27

The code has been modified to reflect the new logic. As soon as the global change for issue #4414 has been run in live mode and I have confirmation that the results are correct I'll run a test export job on Bach.

Just to confirm: we still have the extra code to make sure only one intervention block goes out for a given intervention name, using the hard-wired hierarchy of intervention types (looking first for 'Drug' then for 'Biological/Vaccine' then for 'Dietary Supplement').
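
The hard-wired hierarchy described here could look roughly like the following sketch. The function name and the use of `ValueError` are assumptions for illustration; only the three type names and their order come from this issue.

```python
# Order of preference when one intervention name maps to several types
PRECEDENCE = ("Drug", "Biological/Vaccine", "Dietary Supplement")


def pick_type(name, types):
    """Choose a single CT.gov intervention type for an intervention name.

    Raises ValueError when the candidates cannot be reduced to one type.
    """
    candidates = set(types)
    if len(candidates) == 1:
        return next(iter(candidates))
    for preferred in PRECEDENCE:
        if preferred in candidates:
            return preferred
    listed = "; ".join(repr(t) for t in sorted(candidates))
    raise ValueError(f"intervention {name!r} has multiple types {listed}")
```

With this shape, a name that arrives as both 'Drug' and 'Dietary Supplement' goes out as 'Drug', while a name whose types are all outside the table cannot be resolved and the document fails.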

Comment entered 2009-01-05 14:39:27 by Grama, Lakshmi (NIH/NCI) [E]

BZDATETIME::2009-01-05 14:39:27
BZCOMMENTOR::Lakshmi Grama
BZCOMMENT::28

> Just to confirm: we still have the extra code to make sure only one
> intervention block goes out for a given intervention name, using the
> hard-wired hierarchy of intervention types (looking first for 'Drug' then for
> 'Biological/Vaccine' then for 'Dietary Supplement').

Actually we may be able to do away with that code - I would like to see the results with and without this extra code so I can confirm.

Comment entered 2009-01-08 08:18:34 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2009-01-08 08:18:34
BZCOMMENTOR::Bob Kline
BZCOMMENT::29

[Writeup of high-level test plan requested by Lakshmi in another issue]

The tests for this change will be set up as follows:

[ ] Create test output set with old code (set A)
[ ] Create test output set with new code (set B)
[ ] Create test output set with part of new code [1] (set C)
[ ] Create diff report comparing set A with set B (Report D)
[ ] Create diff report comparing set B with set C (Report E)
[ ] CIAT (and possibly Lakshmi) reviews report D
[ ] Lakshmi reviews report E
[ ] Lakshmi and CIAT review sample documents from sets B and C
[ ] Lakshmi decides whether to keep code used to generate set B but not set C
[ ] Lakshmi and CIAT decide whether the new code is ready for production

[1] omitting the code to apply a hard-wired hierarchy of intervention types to
ensure that no more than one intervention block is exported for a given
intervention name
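
A diff report of the kind listed in the plan could be produced with something like the sketch below. It is only an approximation of `diff -ru` for two flat directories of exported documents; the function name is hypothetical, and the real reports may well have been generated with the command-line tool instead.

```python
import difflib
import os


def diff_sets(dir_a, dir_b):
    """Return report lines comparing two directories of exported documents."""
    names_a = set(os.listdir(dir_a))
    names_b = set(os.listdir(dir_b))
    report = []
    for name in sorted(names_a | names_b):
        if name not in names_b:
            report.append(f"Only in {dir_a}: {name}")
        elif name not in names_a:
            report.append(f"Only in {dir_b}: {name}")
        else:
            path_a = os.path.join(dir_a, name)
            path_b = os.path.join(dir_b, name)
            with open(path_a, encoding="utf-8") as fa, open(path_b, encoding="utf-8") as fb:
                lines_a, lines_b = fa.readlines(), fb.readlines()
            # unified_diff yields nothing when the two files are identical
            diff = difflib.unified_diff(lines_a, lines_b, fromfile=path_a, tofile=path_b)
            report.extend(line.rstrip("\n") for line in diff)
    return report
```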

Comment entered 2009-01-12 10:43:58 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2009-01-12 10:43:58
BZCOMMENTOR::Bob Kline
BZCOMMENT::30

William has closed issue #4414, so I have begun the first test run of the export software using the existing code ("set A" in the test plan above).

Comment entered 2009-01-12 14:50:47 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2009-01-12 14:50:47
BZCOMMENTOR::Bob Kline
BZCOMMENT::31

(In reply to comment #30)
> William has closed issue #4414, so I have begun the first test run of the
> export software using the existing code ("set A" in the test plan above).

I have generated all three sets, and the two diff reports. However, unless you object, I'm going to run the jobs to create the three sets again, with slight modifications to the intervention output code to make the intervention blocks come out in a predictable order, and then generate the diff reports again. Otherwise, I think you'll find that the diff reports will be difficult to read, as a lot of the diff output reflects reordering of the intervention blocks, rather than real changes to the output. Another approach I could take would be to write custom code to compare the intervention block sets between the different runs. The advantage of this second approach is that we don't have to tamper with the code we're testing. The disadvantages are that it will take a little longer to write the extra code, and it might suppress evidence of inadvertent changes to other parts of the documents (though I didn't notice any such changes in my cursory review of the first set of reports). Which approach would you prefer?
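
To make the second approach concrete, here is a minimal sketch of what custom comparison code might look like. It assumes the exported documents contain `intervention` blocks with `intervention_type` and `intervention_name` children, and it ignores everything else in the document, which is exactly the drawback noted above. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def intervention_pairs(xml_text):
    """Return the set of (intervention_type, intervention_name) pairs in a document."""
    root = ET.fromstring(xml_text)
    return {
        (block.findtext("intervention_type", default=""),
         block.findtext("intervention_name", default=""))
        for block in root.iter("intervention")
    }


def compare_interventions(old_xml, new_xml):
    """Return (pairs only in old, pairs only in new), ignoring block order."""
    old_pairs = intervention_pairs(old_xml)
    new_pairs = intervention_pairs(new_xml)
    return sorted(old_pairs - new_pairs), sorted(new_pairs - old_pairs)
```

Because the blocks are compared as sets, reordering alone produces no differences, and the code under test does not need to be modified to sort its output.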

Comment entered 2009-01-12 16:35:37 by Grama, Lakshmi (NIH/NCI) [E]

BZDATETIME::2009-01-12 16:35:37
BZCOMMENTOR::Lakshmi Grama
BZCOMMENT::32

> However, unless you object, I'm going to run the jobs to create the three sets
> again, with slight modifications to the intervention output code to make the
> intervention blocks come out in a predictable order, and then generate the diff
> reports again.

Please go ahead and do this

Comment entered 2009-01-13 10:29:07 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2009-01-13 10:29:07
BZCOMMENTOR::Bob Kline
BZCOMMENT::33

Here is the report comparing the first two sets:

http://bach.nci.nih.gov/issue4367-sets-a-and-b.html

There's still some distracting juggling that may be making it less straightforward to review the results, even though I sorted the blocks to make sure they weren't coming out in random order. The sorting was by intervention type, and under that by intervention name. It may be that if the sort had been by intervention name first and then by intervention type it would be easier still to review the results. I'll run the tests again with that modification if you think it's needed.

The third set is still being generated.

Comment entered 2009-01-13 14:00:00 by Grama, Lakshmi (NIH/NCI) [E]

BZDATETIME::2009-01-13 14:00:00
BZCOMMENTOR::Lakshmi Grama
BZCOMMENT::34

Noticed this in my review - it seems odd

diff -ru set-a/CDR256897.xml set-b/CDR256897.xml
--- set-a/CDR256897.xml Tue Jan 13 08:56:53 2009
+++ set-b/CDR256897.xml Tue Jan 13 09:34:43 2009
@@ -77,11 +77,11 @@
 <condition>myelodysplastic syndromes</condition>
 <condition>myelodysplastic/myeloproliferative diseases</condition>
 <intervention>
-<intervention_type>Procedure</intervention_type>
+<intervention_type>Procedure/Surgery</intervention_type>
 <intervention_name>chromosomal translocation analysis</intervention_name>
 </intervention>
 <intervention>
-<intervention_type>Procedure</intervention_type>
+<intervention_type>Procedure/Surgery</intervention_type>
 <intervention_name>cytogenetic analysis</intervention_name>
 </intervention>
 <eligibility>

I checked the CDR term record on BACH for chromosomal translocation analysis and it is correctly mapped to CTGOVInterventionType of "Genetic" and yet here it is showing up with Intervention_type of Procedure/Surgery.

Could you check. Same with cytogenetic analysis

Comment entered 2009-01-13 14:46:26 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2009-01-13 14:46:26
BZCOMMENTOR::Bob Kline
BZCOMMENT::35

The diff report between sets B and C had only one line in it, showing that CDR532941.xml only appeared in set C, not in set B. The log for set B has the following line:

!5404 Tue Jan 13 09:46:02 2009: failure processing CDR532941: intervention 'mutation carrier screening' has multiple types 'Genetic'; 'Procedure/Surgery'

The code we dropped for set C included logic to fail processing if the hard-coded hierarchy of types was unable to winnow down the number of intervention blocks for any given intervention name to one.

There are a number of "Only in ..." lines in the report of differences between sets A and B:

Only in set-a: CDR256871.xml
Only in set-a: CDR256919.xml
Only in set-a: CDR331829.xml
Only in set-a: CDR378183.xml
Only in set-b: CDR502363.xml
Only in set-a: CDR532941.xml
Only in set-a: CDR629778.xml
Only in set-a: CDR630380.xml
Only in set-a: CDR65713.xml
Only in set-a: CDR67380.xml
Only in set-a: CDR68093.xml
Only in set-a: CDR68106.xml

Here are the error messages (with the timestamps stripped) for the ones which were dropped with the new intervention mapping code:

failure processing CDR256871: missing CT.gov intervention type for CDR531923
failure processing CDR256919: missing CT.gov intervention type for CDR37779
failure processing CDR331829: missing CT.gov intervention type for CDR39187
failure processing CDR378183: missing CT.gov intervention type for CDR380753
failure processing CDR532941: intervention 'mutation carrier screening' has multiple types 'Genetic'; 'Procedure/Surgery'
failure processing CDR629778: missing CT.gov intervention type for CDR630382
failure processing CDR630380: missing CT.gov intervention type for CDR630596
failure processing CDR65713: missing CT.gov intervention type for CDR41911
failure processing CDR67380: missing CT.gov intervention type for CDR37779
failure processing CDR68093: missing CT.gov intervention type for CDR37779
failure processing CDR68106: missing CT.gov intervention type for CDR37779

The one that was missing from the first set was caused by a database query timeout error, which is probably extremely rare when the job is running during off hours without CIAT users working.
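
Since several of these failures point at the same Term document, it can help to group the messages by term. The following is a small sketch assuming the log lines have the form shown above; the regular expression and function name are written for this example and are not part of the export software.

```python
import re
from collections import defaultdict

# Matches the "missing CT.gov intervention type" messages shown above
MISSING = re.compile(
    r"failure processing (CDR\d+): missing CT\.gov intervention type for (CDR\d+)"
)


def protocols_by_missing_term(log_lines):
    """Map each Term document lacking a CT.gov type to the protocols it blocked."""
    grouped = defaultdict(list)
    for line in log_lines:
        match = MISSING.search(line)
        if match:
            protocol_id, term_id = match.groups()
            grouped[term_id].append(protocol_id)
    return dict(grouped)
```

Grouping this way would show at a glance that fixing a single Term document can unblock several protocols at once.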

Comment entered 2009-01-13 14:59:01 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2009-01-13 14:59:01
BZCOMMENTOR::Bob Kline
BZCOMMENT::36

(In reply to comment #34)
> Noticed this in my review - it seems odd
>
> ...
>
> I checked the CDR term record on BACH for chromosomal translocation analysis
> and it is correctly mapped to CTGOVInterventionType of "Genetic" and yet here
> it is showing up with Intervention_type of Procedure/Surgery.
>
> Could you check. Same with cytogenetic analysis

Perhaps I have misunderstood what you wanted for the mapping logic. I came away from the meeting we had in my office a few days ago with the idea that we were supposed to use the CTGovInterventionType from the document linked by the protocol's InterventionNameLink element only if the semantic types of that document included "Drug/agent"; otherwise we were supposed to use the CTGovInterventionType from the document linked by the InterventionType element in the protocol document. I think this understanding matches the logic I wrote up in comment #24. Let me know if that's not right.

Comment entered 2009-01-14 09:57:04 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2009-01-14 09:57:04
BZCOMMENTOR::Bob Kline
BZCOMMENT::37

Lakshmi tried to post a reply to my last comment, but Internet Explorer couldn't find Verdi. We came up with a new version of the mapping logic, which I am posting here:

  for each Intervention element in the trial document:
    find the InterventionType child of the Intervention element
    find the document (IT) linked by that element
    for each InterventionNameLink child:
      find the document (INL) linked by that child element
      get the preferred name (NAME) from that document
      get the semantic types for the INL document
      if any of these semantic types is 'Drug/agent combination':
        do nothing
      otherwise:
        find the CTGovInterventionType value from the INL document
        use it as intervention_type, with NAME as intervention_name
    if the Intervention element has no InterventionNameLink children:
      find the CTGovInterventionType value in the IT document
      use it as intervention_type
      use preferred name of IT document as intervention_name

In addition, we decided to retain the extra code which looks at trials whose documents have multiple intervention blocks with the same intervention name but different intervention types, but instead of trying to pick one of the blocks using the hard-coded hierarchy of preferred types the software will always fail the export of such trial documents.

Lakshmi:

Please let me know if I've captured this accurately.
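
For reference, the revised rule and the retained conflict check can be combined in one short Python sketch. The term dictionaries, their keys and the `ExportFailure` class are hypothetical; the error wording is modeled on the export log messages quoted earlier in this issue, but the real code may differ.

```python
from collections import defaultdict


class ExportFailure(Exception):
    """Raised to fail the export of one trial document."""


def required_type(term):
    """Return the term's CT.gov intervention type, failing if it is missing."""
    if not term.get("ctgov_type"):
        raise ExportFailure(f"missing CT.gov intervention type for {term['cdr_id']}")
    return term["ctgov_type"]


def export_pairs(interventions):
    """Map (IT term, [INL terms]) tuples to (intervention_type, intervention_name) pairs."""
    pairs = []
    for type_term, name_terms in interventions:
        for term in name_terms:
            if "Drug/agent combination" in term["semantic_types"]:
                continue  # do nothing for drug/agent combinations
            pairs.append((required_type(term), term["name"]))
        if not name_terms:
            pairs.append((required_type(type_term), type_term["name"]))
    # Fail the document if one intervention name ends up with more than one type
    types_by_name = defaultdict(set)
    for ctgov_type, name in pairs:
        types_by_name[name].add(ctgov_type)
    for name, types in types_by_name.items():
        if len(types) > 1:
            listed = "; ".join(repr(t) for t in sorted(types))
            raise ExportFailure(f"intervention {name!r} has multiple types {listed}")
    return pairs
```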

Comment entered 2009-01-14 16:03:25 by Grama, Lakshmi (NIH/NCI) [E]

BZDATETIME::2009-01-14 16:03:25
BZCOMMENTOR::Lakshmi Grama
BZCOMMENT::38

The logic is OK

Comment entered 2009-01-14 16:04:50 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2009-01-14 16:04:50
BZCOMMENTOR::Bob Kline
BZCOMMENT::39

Thanks. Do you want me to hold off on running another test set until CIAT has addressed the failures from the global change?

Comment entered 2009-01-14 16:21:14 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2009-01-14 16:21:14
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::40

(In reply to comment #39)
> Thanks. Do you want me to hold off on running another test set until CIAT has
> addressed the failures from the global change?
>
Last comments in Issue 4414 should be here. I just copied the comments to this issue:

CIAT has looked at the errors but we thought that if there is an intervention
name, CTGovInterventionType supersedes that of the intervention type it is
paired with?

Comment entered 2009-01-15 09:36:20 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2009-01-15 09:36:20
BZCOMMENTOR::Bob Kline
BZCOMMENT::41

(In reply to comment #40)
> (In reply to comment #39)
> > Thanks. Do you want me to hold off on running another test set until CIAT has
> > addressed the failures from the global change?
> >
> Last comments in Issue 4414 should be here. I just copied the comments to this
> issue:
>
>
> CIAT has looked at the errors but we thought that if there is an intervention
> name, CTGovInterventionType supersedes that of the intervention type it is
> paired with?
>

Actually, Issue #4414 is the right home for this thread. See Lakshmi's latest comment (#19 in that issue).

I'm holding off on further testing of the export mapping until the loose ends in issue #4414 are resolved.

Comment entered 2009-01-16 13:16:06 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2009-01-16 13:16:06
BZCOMMENTOR::Bob Kline
BZCOMMENT::42

I ran a new set with the latest logic and compared the results to the first set (set A): http://bach.nci.nih.gov/issue4367-sets-a-and-d.html. However, you may decide that enough changes to the documents have been made since Tuesday that the noise makes this comparison report difficult to use, so I'm running a new base set with the production code, and will post the comparison against that set when it's done. The only way to get a pure test result is to stop CIAT from editing documents while I'm running the two sets, or clone the database to Franck and run the tests there. If you think that's necessary, let me know. I know you're anxious to get this done, but I don't know for sure how you view the tradeoff of that urgency with the need to get a clean test comparison.

Comment entered 2009-01-16 14:07:27 by Grama, Lakshmi (NIH/NCI) [E]

BZDATETIME::2009-01-16 14:07:27
BZCOMMENTOR::Lakshmi Grama
BZCOMMENT::43

It seems like we needed to refresh Franck for some other reason - maybe we should just go ahead and do that and run the reports there.

Comment entered 2009-01-16 14:30:42 by Grama, Lakshmi (NIH/NCI) [E]

BZDATETIME::2009-01-16 14:30:42
BZCOMMENTOR::Lakshmi Grama
BZCOMMENT::44

If I wanted to look at the CTGOV export XML file for a document, where do I have to go?

Comment entered 2009-01-16 14:58:40 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2009-01-16 14:58:40
BZCOMMENTOR::Bob Kline
BZCOMMENT::45

Volker:

Could you go ahead and refresh Franck so I can run these tests there? Let me know when it's ready.

Comment entered 2009-01-16 15:02:22 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2009-01-16 15:02:22
BZCOMMENTOR::Bob Kline
BZCOMMENT::46

(In reply to comment #44)
> If I wanted to look at the CTGOV export XML file for a document, where do I
> have to go?

http://bach.nci.nih.gov/cgi-bin/cdr/ViewCTGovExports.py?job=test-set-a
http://bach.nci.nih.gov/cgi-bin/cdr/ViewCTGovExports.py?job=test-set-b
http://bach.nci.nih.gov/cgi-bin/cdr/ViewCTGovExports.py?job=test-set-c
http://bach.nci.nih.gov/cgi-bin/cdr/ViewCTGovExports.py?job=test-set-d

Comment entered 2009-01-16 16:16:48 by Englisch, Volker (NIH/NCI) [C]

BZDATETIME::2009-01-16 16:16:48
BZCOMMENTOR::Volker Englisch
BZCOMMENT::47

(In reply to comment #45)
> Let me know when it's ready.

It's ready now.

Comment entered 2009-01-16 17:59:37 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2009-01-16 17:59:37
BZCOMMENTOR::Bob Kline
BZCOMMENT::48

(In reply to comment #43)
> It seems like we needed to refresh Franck for some other reason - maybe we
> should just go ahead and do that and run the reports there.
>

Here you go:

http://franck.nci.nih.gov/issue4367-sets-e-and-f.html

Here are the documents:

http://bach.nci.nih.gov/cgi-bin/cdr/ViewCTGovExports.py?job=test-set-e
http://bach.nci.nih.gov/cgi-bin/cdr/ViewCTGovExports.py?job=test-set-f

Comment entered 2009-01-21 12:21:17 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2009-01-21 12:21:17
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::49

(In reply to comment #48)
> (In reply to comment #43)
> > It seems like we needed to refresh Franck for some other reason - maybe we
> > should just go ahead and do that and run the reports there.
> >
>
> Here you go:
>
> http://franck.nci.nih.gov/issue4367-sets-e-and-f.html
>
> Here are the documents:
>
> http://bach.nci.nih.gov/cgi-bin/cdr/ViewCTGovExports.py?job=test-set-e
> http://bach.nci.nih.gov/cgi-bin/cdr/ViewCTGovExports.py?job=test-set-f
>

Bob,
Are you expecting comments from CIAT at this point?

Comment entered 2009-01-21 13:53:55 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2009-01-21 13:53:55
BZCOMMENTOR::Bob Kline
BZCOMMENT::50

(In reply to comment #49)

> Are you expecting comments from CIAT at this point?

See comment #29 for an outline of the test plan. You need to review the report whose link I posted in comment #48 (sets E and F are the Franck equivalents of sets A and B in the Bach tests; we switched to Franck so the diff report wouldn't have any noise created by concurrent editing of the documents between the two test runs), as well as sample documents from set F.
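
The kind of before/after comparison described here can be sketched in a few lines of Python. This is only an illustration, not the script that produced the reports linked in this issue; the function name `diff_export_sets` and its inputs (dictionaries mapping a document ID to the exported XML text) are assumptions made for the example.

```python
import difflib


def diff_export_sets(base, changed):
    """Return {doc_id: unified diff lines} for documents that differ.

    Both arguments are hypothetical: dicts mapping a document ID to the
    XML text exported for that document in one test run.
    """
    report = {}
    # Only documents present in both runs can be compared line by line.
    for doc_id in sorted(set(base) & set(changed)):
        before = base[doc_id].splitlines()
        after = changed[doc_id].splitlines()
        delta = list(difflib.unified_diff(
            before, after,
            fromfile=f"{doc_id} (base)",
            tofile=f"{doc_id} (new)",
            lineterm="",
        ))
        if delta:
            report[doc_id] = delta
    return report
```

When both runs read from a frozen copy of the data, every entry in such a report comes from the code change rather than from edits made between the runs.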

Comment entered 2009-01-21 16:57:42 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2009-01-21 16:57:42
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::51

(In reply to comment #50)
> (In reply to comment #49)
>
> > Are you expecting comments from CIAT at this point?
>
> See comment #29 for an outline of the test plan. You need to review the report
> whose link I posted in comment #48 (sets E and F are the Franck equivalents of
> sets A and B in the Bach tests; we switched to Franck so the diff report
> wouldn't have any noise created by concurrent editing of the documents between
> the two test runs), as well as sample documents from set F.
>

We compared the two reports and they look fine. We did not see anything that was not expected.

Comment entered 2009-01-22 09:32:32 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2009-01-22 09:32:32
BZCOMMENTOR::Bob Kline
BZCOMMENT::52

As soon as you've had a chance to review the test results and I get the green light from you, Lakshmi, I'll put this into production.

Comment entered 2009-01-22 11:55:26 by Grama, Lakshmi (NIH/NCI) [E]

BZDATETIME::2009-01-22 11:55:26
BZCOMMENTOR::Lakshmi Grama
BZCOMMENT::53

Were there any error messages in the logs?

Comment entered 2009-01-22 14:18:21 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2009-01-22 14:18:21
BZCOMMENTOR::Bob Kline
BZCOMMENT::54

(In reply to comment #53)

> Were there any error messages in the logs?

!1120 Fri Jan 16 17:10:36 2009: failure processing CDR269315: no match for study category 'BIOMARKER/LABORATORY ANALYSIS' found
!1120 Fri Jan 16 17:14:09 2009: failure processing CDR378088: no match for study category 'BIOMARKER/LABORATORY ANALYSIS' found
!1120 Fri Jan 16 17:19:20 2009: failure processing CDR485360: no match for study category 'BIOMARKER/LABORATORY ANALYSIS' found
!1120 Fri Jan 16 17:22:04 2009: failure processing CDR547101: no match for study category 'BIOMARKER/LABORATORY ANALYSIS' found
!1120 Fri Jan 16 17:22:25 2009: failure processing CDR554708: missing intervention description
!1120 Fri Jan 16 17:23:36 2009: failure processing CDR574195: no match for study category 'BIOMARKER/LABORATORY ANALYSIS' found
!1120 Fri Jan 16 17:24:06 2009: failure processing CDR581165: no match for study category 'TISSUE COLLECTION/REPOSITORY' found
!1120 Fri Jan 16 17:26:17 2009: failure processing CDR613100: no match for study category 'TISSUE COLLECTION/REPOSITORY' found
!1120 Fri Jan 16 17:26:34 2009: failure processing CDR617990: missing intervention description
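
For a longer log, the failures can be grouped by reason with a short script. The sketch below assumes every failure line has the same shape as the ones quoted above ("failure processing CDRnnn: reason"); the regular expression and the function name `summarize_failures` are illustrative and not part of the export software.

```python
import re
from collections import Counter

# Pattern inferred from the log lines quoted above; the real log may contain
# other line shapes, which this sketch simply skips.
FAILURE = re.compile(r"failure processing (CDR\d+): (.+)$")


def summarize_failures(lines):
    """Return (Counter of reasons, dict mapping each reason to its doc IDs)."""
    counts = Counter()
    docs_by_reason = {}
    for line in lines:
        match = FAILURE.search(line.strip())
        if match is None:
            continue
        doc_id, reason = match.groups()
        counts[reason] += 1
        docs_by_reason.setdefault(reason, []).append(doc_id)
    return counts, docs_by_reason
```

Calling `summarize_failures(open(path))` on a log file (where `path` is wherever the log happens to live) gives one count per distinct failure reason, which makes it easier to hand each group of documents to the right person.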

Comment entered 2009-01-26 09:45:40 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2009-01-26 09:45:40
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::55

(In reply to comment #54)
> (In reply to comment #53)
>
> > Were there any error messages in the logs?
>

The errors have been fixed. According to Mary:

"I didn’t see any problems that would preclude CT.gov intervention mapping with the following docs:
485360
547101
"

Comment entered 2009-01-26 13:23:36 by Grama, Lakshmi (NIH/NCI) [E]

BZDATETIME::2009-01-26 13:23:36
BZCOMMENTOR::Lakshmi Grama
BZCOMMENT::56

According to Mary:
> "I didn’t see any problems that would preclude CT.gov intervention mapping
> with the following docs:
> 485360
> 547101
> "

Well, she needs to look at issues a little more carefully. The error message indicates that 485360 cannot be processed for export because there is no match for the Biomarker/Lab Analysis value, since we do not export these studies to CTGOV. I would ask her to review the previous versions of this trial. At some point it went from being a Research study with a primary type of Biomarker/Lab Analysis to a Clinical Trial with a primary type of Biomarker/Lab Analysis.

Somehow the trial is already on CTGOV - as a treatment study. We may have made some code changes since the time the study was originally published, and now the trial is really not being updated. Given that it is already registered and JHOC may not be happy if we pull this trial, we may have to adopt a slightly different data standard for this trial. Please ask Mary to call if she has questions.

She may want to look closely at the other trial as well.

Comment entered 2009-01-27 11:05:40 by Grama, Lakshmi (NIH/NCI) [E]

BZDATETIME::2009-01-27 11:05:40
BZCOMMENTOR::Lakshmi Grama
BZCOMMENT::57

I talked to Bob and we will upload the export job from Franck to PRS Test. Volker will need to let us know about any error messages that show up in the PRS Admin screen as a result of this load.

Comment entered 2009-01-27 14:49:02 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2009-01-27 14:49:02
BZCOMMENTOR::Bob Kline
BZCOMMENT::58

I've been looking at the results of the PRS test and have discovered something fishy in the output the new software generated. I'm seeing trials with multiple intervention blocks, some with arm_group_label children and some without. Not supposed to happen. I am digging in to find out why it did.
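
A check for the symptom described here could look roughly like the following. It is a sketch only: it assumes each exported document is available as an XML string and that the intervention blocks are elements named `intervention` (the element name and the function name are assumptions; `arm_group_label` is taken from the comment above).

```python
import xml.etree.ElementTree as ET


def has_mixed_intervention_blocks(xml_text):
    """True if some, but not all, intervention blocks have an arm_group_label."""
    root = ET.fromstring(xml_text)
    # "intervention" as the element name is an assumption for this sketch.
    blocks = list(root.iter("intervention"))
    labeled = sum(
        1 for block in blocks if block.find("arm_group_label") is not None
    )
    # Mixed means at least one block with a label and at least one without.
    return 0 < labeled < len(blocks)
```

Running this over every document in a test set would list the trials that need a closer look before the set is uploaded.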

Comment entered 2009-01-27 15:58:36 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2009-01-27 15:58:36
BZCOMMENTOR::Bob Kline
BZCOMMENT::59

Found the bug. Will fix and do another test run.

Comment entered 2009-01-27 18:27:46 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2009-01-27 18:27:46
BZCOMMENTOR::Bob Kline
BZCOMMENT::60

(In reply to comment #59)
> Found the bug. Will fix and do another test run.
>

http://bach.nci.nih.gov/cgi-bin/cdr/ViewCTGovExports.py?job=test-set-g

Volker:

Please upload this to PRS Test.

Comment entered 2009-01-28 10:11:06 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2009-01-28 10:11:06
BZCOMMENTOR::Bob Kline
BZCOMMENT::61

Here's the diff file for the latest run with the fixed code:

http://bach.nci.nih.gov/issue4367-sets-e-and-g.html

Comment entered 2009-01-30 09:41:28 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2009-01-30 09:41:28
BZCOMMENTOR::Bob Kline
BZCOMMENT::62

(In reply to comment #60)

> Please upload this to PRS Test.

The test came out clean.

Comment entered 2009-02-03 09:37:19 by Grama, Lakshmi (NIH/NCI) [E]

BZDATETIME::2009-02-03 09:37:19
BZCOMMENTOR::Lakshmi Grama
BZCOMMENT::63

Go ahead and move changes to production so we can publish with the new code tonight.

Comment entered 2009-02-03 09:47:19 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2009-02-03 09:47:19
BZCOMMENTOR::Bob Kline
BZCOMMENT::64

Promoted to Bach.

Comment entered 2009-02-04 11:02:09 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2009-02-04 11:02:09
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::65

(In reply to comment #56)
> Somehow the trial is already on CTGOV - as a treatment study. We may have
> made some code changes since the time the study was originally published and
> now the trial is really not being updated. Given that it is already registered
> and JHOC may not be happy if we pull this trial, we may have to adopt a
> slightly different data standard to this trial. Please ask Mary to call if she
> has questions.
>
> She may want to look closely at the other trial as well.
>

Lakshmi:
Mary mentioned to me yesterday that you had decided to map the research trials that fail to export to the intervention type "Other" so that they can be exported successfully. Should I open a separate issue for this case, or will it be taken care of under this issue?

Comment entered 2009-02-04 15:39:01 by Grama, Lakshmi (NIH/NCI) [E]

BZDATETIME::2009-02-04 15:39:01
BZCOMMENTOR::Lakshmi Grama
BZCOMMENT::66

I think the suggestion from Mary was to map all Laboratory Analysis/Biomarker studies to have study_type of Interventional and interventional_subtype of Other. There are some other issues with regard to mapping these as above:

1. Make sure that these studies have Number of Arms value mapped correctly - they should all be single arm.

2. They should have Outcome measures, PrimaryCompletion Dates, FDA Regulated information blocks - at least for the ones that were active as of the Dec 26, 2007 cutoff date

3. Make sure these trials will all have Interventions

Here are the requirements for interventional studies from the CTGOV DTD:
Primary and secondary outcomes are required for interventional studies,
optional for observational studies.

Primary completion date and type are required for interventional studies,
optional for observational studies.

For interventional studies, if number_of_arms > 1, the corresponding
number of arm_group tags must be included.
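
The three rules quoted above could be checked before export with something like the sketch below. The `Study` class is a hypothetical in-memory summary of one exported trial, not the real document schema; only `study_type`, `number_of_arms` and `arm_group` come from this discussion, and the remaining field names are made up for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Study:
    """Hypothetical summary of one exported study (not the real schema)."""
    study_type: str
    number_of_arms: int = 1
    arm_groups: List[str] = field(default_factory=list)
    primary_outcomes: List[str] = field(default_factory=list)
    secondary_outcomes: List[str] = field(default_factory=list)
    primary_completion_date: Optional[str] = None
    primary_completion_type: Optional[str] = None


def check_interventional_requirements(study):
    """Return a list of problems; an empty list means the rules are met."""
    if study.study_type.lower() != "interventional":
        # These items are optional for observational studies.
        return []
    problems = []
    if not study.primary_outcomes:
        problems.append("missing primary outcome")
    if not study.secondary_outcomes:
        problems.append("missing secondary outcome")
    if not (study.primary_completion_date and study.primary_completion_type):
        problems.append("missing primary completion date or type")
    if study.number_of_arms > 1 and len(study.arm_groups) != study.number_of_arms:
        problems.append("arm_group count does not match number_of_arms")
    return problems
```

A single-arm study mapped to Interventional with no outcome measures or completion date would come back with three problems, which is the data cleanup the numbered points above are asking for.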

Comment entered 2009-02-05 08:24:17 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2009-02-05 08:24:17
BZCOMMENTOR::Bob Kline
BZCOMMENT::67

Lakshmi:

I wasn't sure how much of the previous comment was directed to me (as "go ahead and make the software take care of this") and how much to CIAT (for them to do data cleanup and QA), and whether you intended William to create a new issue for handling these trials, or to piggy-back the additional work on this issue.

Comment entered 2009-02-17 20:21:51 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2009-02-17 20:21:51
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::68

Closing this issue per CDR Meeting of 02/12/2009.

A new issue #4487 has been created to address the mapping of Biomarker/Lab Analysis studies.

Attachments
File Name Posted User
failures.log 2008-12-02 15:58:15
InterventionNameSemanticTypes.xls 2008-12-11 16:04:25
InterventionNameSemanticTypes.xls 2008-12-09 11:04:01
multiple-types.log 2008-12-02 14:41:32
