PDQ Issues

Issue Number	2769
Summary	[CTGov] Include additional elements in CTGOV imports
Created	2009-01-14 15:14:58
Issue Type	Improvement
Submitted By	Osei-Poku, William (NIH/NCI) [C]
Assigned To	Kline, Bob (NIH/NCI) [C]
Status	Closed
Resolved	2010-09-15 13:30:27
Resolution	Fixed
Path	/home/bkline/backups/jira/ocecdr/issue.107097

Description

BZISSUE::4444
BZDATETIME::2009-01-14 15:14:58
BZCREATOR::William Osei-Poku
BZASSIGNEE::Bob Kline
BZQACONTACT::Lakshmi Grama

There are several elements in CTGOV that are not currently included in the imports into the CDR but would be useful to have. We would like CTGOV imports to include the following elements:
1. Primary Outcome Measures
2. Secondary Outcome Measures (When applicable)
3. Arms and Assigned Interventions (when applicable). This information is usually in a table format on CTGOV, under the Arms, Groups and Interventions elements.
4. Groups/Cohorts information (when applicable). This information is usually in a table format on CTGOV, under the Arms, Groups and Interventions elements.

The following two elements would be helpful to have but are not as important as the four above:

5. Biospecimen Retention
6. Biospecimen Description

Comment entered 2009-01-15 09:31:30 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2009-01-15 09:31:30
BZCOMMENTOR::Bob Kline
BZCOMMENT::1

Please provide schema change and mapping specs for the new elements you want to have under CTGovIndexing. I can come up with proposals for those myself if that's needed, but in the past CIAT and OCCM have preferred to take the lead for determining the changes to the document structures and import/export mapping logic.

Comment entered 2009-02-03 08:48:43 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2009-02-03 08:48:43
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::2

I have included proposed schema changes and mapping specs in the attached file.

Comment entered 2009-02-03 08:48:43 by Osei-Poku, William (NIH/NCI) [C]

Attachment ClinicalTrialsSchemaChanges.doc has been added with description: Proposed Schema Changes and Mapping Specs

Comment entered 2009-02-03 09:39:53 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2009-02-03 09:39:53
BZCOMMENTOR::Bob Kline
BZCOMMENT::3

Some questions.

Lakshmi: I've made you the QA contact so you can make sure you're happy with the end result here.

It's not clear to me why the Outcomes element should be multiply-occurring. If we do make it multiply-occurring, how would the import software know which incoming primary_outcome and secondary_outcome elements get placed in which Outcomes blocks in the imported document?

It's also not clear why the incoming measure element (whose name is singular) should map to an element whose name is plural (OutcomeMeasures). Along the same lines, why would primary_outcome and secondary_outcome, which can only contain a single outcome child, be mapped to an element whose name is plural (PrimaryOutcomeMeasures)?

I'd raise the same concern about the fact that the name ArmsAssignedInterventions is plural when it itself is multiply occurring and each only contains a single ArmGroup child element, but it's not clear why the name should have "Intervention" in it at all (singular or plural). If the semantics of this part of the incoming document bear any resemblance to those for documents we're sending to ClinicalTrials.gov ourselves, arms and interventions are not the same thing. Instead arms and interventions are given separately, with links in the interventions to show which arms they're associated with (if any), but there's no indication in the mapping table that we're supposed to implement the logic to extract these associations, nor is there any place to put these connections in the new elements to be added to the schema.

For BiospecimenRetention, is there any reason we're only allowing two of the three values they're telling us to expect?

Are you sure the names of the incoming elements are "biospec_retention" and "biospec_descr"? I see "biospecimen_retention" and "biospecimen_description" in the version of the DTD I'm looking at [1], but not "biospec_retention" or "biospec_descr". If the incoming biospecimen_retention (or biospec_retention) is a singly-occurring child of the observational_design element, of which there can be only one in the entire incoming document according to this version of the DTD, why would our schema want to allow multiple occurrences of BiospecimenRetention, to which the incoming element is to be mapped? Same question for biospecimen_description (or biospec_descr).

[1] https://register.clinicaltrials.gov/prs/html/clinical_study.dtd

Comment entered 2009-02-03 10:21:53 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2009-02-03 10:21:53
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::4

(In reply to comment #3)
> Some questions.
>
> Lakshmi: I've made you the QA contact so you can make sure you're happy with
> the end result here.
>
>
> It's also not clear why the incoming measure element (whose name is singular)
> should map to an element whose name is plural (OutcomeMeasures). Along the
> same lines, why would primary_outcome and secondary_outcome, which can only
> contain a single outcome child, be mapped to an element whose name is plural
> (PrimaryOutcomeMeasures)?

The display (in protocols on CTGOV) shows these elements as Primary Outcome Measures and Secondary Outcome Measures. I thought it would be good to have the elements in the CDR and CTGOV look as similar as possible, since in many cases users will be comparing the two documents while abstracting.
>
> I'd raise the same concern about the fact that the name
> ArmsAssignedInterventions is plural when it itself is multiply occurring and
> each only contains a single ArmGroup child element, but it's not clear why the
> name should have "Intervention" in it at all (singular or plural). If the
> semantics of this part of the incoming document bear any resemblance to those
> for documents we're sending to ClinicalTrials.gov ourselves, arms and
> interventions are not the same thing. Instead arms and interventions are given
> separately, with links in the interventions to show which arms they're
> associated with (if any), but there's no indication in the mapping table that
> we're supposed to implement the logic to extract these associations, nor is
> there any place to put these connections in the new elements to be added to the
> schema.

This data is presented in a table form (as mentioned in my first post) in CTGov with one column titled "Arms" and the other column titled "Assigned Interventions".

>
> For BiospecimenRetention, is there any reason we're only allowing two of the
> three values they're telling us to expect?

This was an oversight. I thought the first option was a comment instead of an option.

>
> Are you sure the names of the incoming elements are "biospec_retention" and
> "biospec_descr" (I see "biospecimen_retention" and "biospecimen_description" in
> the version of the DTD I'm looking at [1], but not "biospec_retention" or
> "biospec_descr"). If the incoming biospecimen_retention (or biospec_retention)
> is a singly-occurring child of the observational_design element, of which there
> can be only one in the entire incoming document according to this version of
> the DTD, why would our schema want to allow multiple occurrences of
> BiospecimenRetention, to which the incoming element is to be mapped? Same
> question for biospecimen_description (or biospec_descr).
>
> [1] https://register.clinicaltrials.gov/prs/html/clinical_study.dtd

It looks like I was using a different DTD than the one above:

http://clinicaltrials.gov/ct2/html/images/info/public.dtd
>

Comment entered 2009-02-03 10:49:26 by Grama, Lakshmi (NIH/NCI) [E]

BZDATETIME::2009-02-03 10:49:26
BZCOMMENTOR::Lakshmi Grama
BZCOMMENT::5

If a data element exists in InscopeProtocol and in CTGOV, we want to match the schema to the InscopeProtocol as much as possible. That way we can be consistent. I will look at the specifics and get back to you.

Comment entered 2009-02-19 15:10:50 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2009-02-19 15:10:50
BZCOMMENTOR::Bob Kline
BZCOMMENT::6

Lakshmi asked me to find out if CT.gov are already including the 'type' attribute on the enrollment elements in what they send us. They aren't.

Comment entered 2009-02-23 15:39:15 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2009-02-23 15:39:15
BZCOMMENTOR::Bob Kline
BZCOMMENT::7

(In reply to comment #6)
> Lakshmi asked me to find out if CT.gov are already including the 'type'
> attribute on the enrollment elements in what they send us. They aren't.

She also asked me to determine whether we have CTExpectedEnrollment in any of our documents. We do, in 176 of them.

Comment entered 2009-02-24 10:34:12 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2009-02-24 10:34:12
BZCOMMENTOR::Bob Kline
BZCOMMENT::8

I am unable to install the expanded schema on Mahler, because Sheri has it locked. I am also unable to add Sheri as a CC to this tracker issue for some reason, so William, please communicate with Sheri to get her to check the schema back in.

Comment entered 2009-02-24 10:36:35 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2009-02-24 10:36:35
BZCOMMENTOR::Bob Kline
BZCOMMENT::9

Right. I had forgotten that part of this new feature involved a change to the CDR Server. Unless this is urgent, I plan to install the modified server after working hours tonight.

Comment entered 2009-02-24 10:44:03 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2009-02-24 10:44:03
BZCOMMENTOR::Bob Kline
BZCOMMENT::10

(In reply to comment #9)
> Right. I had forgotten that part of this new feature involved a change to the
> CDR Server. Unless this is urgent, I plan to install the modified server after
> working hours tonight.

Wrong issue. Please ignore comment.

Comment entered 2009-02-24 10:49:16 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2009-02-24 10:49:16
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::11

(In reply to comment #8)
> I am unable to install the expanded schema on Mahler, because Sheri has it
> locked. I am also unable to add Sheri as a CC to this tracker issue for some
> reason, so William, please communicate with Sheri to get her to check the
> schema back in.

The schema is checked-in now. Sheri finally resigned last month.

Comment entered 2009-02-24 14:21:12 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2009-02-24 14:21:12
BZCOMMENTOR::Bob Kline
BZCOMMENT::12

I have expanded the CTGovProtocol schema along the lines discussed in Thursday's meeting, based on the notes I took. I guess there are two things for the users to verify: (1) that I have represented the structures for the new information appropriately; and (2) that the changes to the schema don't invalidate existing documents which were valid against the previous version of the schema.

http://bach.nci.nih.gov/cgi-bin/cdr/GetSchema.py?id=349686

Comment entered 2009-02-24 14:25:31 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2009-02-24 14:25:31
BZCOMMENTOR::Bob Kline
BZCOMMENT::13

(In reply to comment #12)
> I have expanded the CTGovProtocol schema along the lines discussed in
> Thursday's meeting, based on the notes I took. I guess there are two things
> for the users to verify: (1) that I have represented the structures for the new
> information appropriately; and (2) that the changes to the schema don't
> invalidate existing documents which were valid against the previous version of
> the schema.
>
> http://bach.nci.nih.gov/cgi-bin/cdr/GetSchema.py?id=349686

Wrong server. URL should be:

http://mahler.nci.nih.gov/cgi-bin/cdr/GetSchema.py?id=349686

Comment entered 2009-03-03 17:57:03 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2009-03-03 17:57:03
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::14

(In reply to comment #13)
> (In reply to comment #12)
> > I have expanded the CTGovProtocol schema along the lines discussed in
> > Thursday's meeting, based on the notes I took. I guess there are two things
> > for the users to verify: (1) that I have represented the structures for the new
> > information appropriately; and (2) that the changes to the schema don't
> > invalidate existing documents which were valid against the previous version of
> > the schema.
> >

Tested on Mahler. I will enter a new issue to modify the CSS Template.
Another level of testing would be to force import a new document on Mahler (after the schema changes have been verified by Lakshmi).

Comment entered 2009-03-04 10:13:42 by Englisch, Volker (NIH/NCI) [C]

BZDATETIME::2009-03-04 10:13:42
BZCOMMENTOR::Volker Englisch
BZCOMMENT::15

(In reply to comment #8)
> I am also unable to add Sheri as a CC to this tracker issue for some
> reason

The reason is that her user account has been disabled since all of her email to Lockheed bounced back after she resigned.

Comment entered 2009-03-06 11:07:46 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2009-03-06 11:07:46
BZCOMMENTOR::Bob Kline
BZCOMMENT::16

Next step is for Lakshmi to review the schema changes and (if satisfied) give the green light to proceed with implementing the changes to the import software.

Comment entered 2009-03-18 18:59:53 by Grama, Lakshmi (NIH/NCI) [E]

BZDATETIME::2009-03-18 18:59:53
BZCOMMENTOR::Lakshmi Grama
BZCOMMENT::17

I think the schema looks OK -

>>that the changes to the schema don't
> > invalidate existing documents which were valid against the previous version of
> > the schema.

Is there a way you can test this perhaps on Franck?

Comment entered 2009-03-19 08:00:26 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2009-03-19 08:00:26
BZCOMMENTOR::Bob Kline
BZCOMMENT::18

(In reply to comment #17)

> Is there a way you can test this perhaps on Franck?

I installed the new version of the schema on Franck and ran the CheckValidity.py script for the CTGovProtocol documents, and none of the validation statuses changed.
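
For illustration only, the before/after comparison described above could be sketched as follows. This is not what CheckValidity.py does internally; the function name and the "V"/"I" status codes are hypothetical, and the inputs are assumed to be plain dictionaries mapping document IDs to validation status.

```python
def changed_validation_statuses(before, after):
    """Return {doc_id: (old_status, new_status)} for documents whose
    validation status differs between the two runs.

    Both arguments are assumed to be dicts of doc_id -> status string.
    """
    return {
        doc_id: (before[doc_id], after.get(doc_id))
        for doc_id in before
        if before[doc_id] != after.get(doc_id)
    }


# Hypothetical status codes: "V" = valid, "I" = invalid.
old_run = {101: "V", 102: "I"}
new_run = {101: "V", 102: "I"}
print(changed_validation_statuses(old_run, new_run))
```

An empty result corresponds to the outcome reported here: the schema change did not alter the status of any existing document.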

Comment entered 2009-03-24 16:06:35 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2009-03-24 16:06:35
BZCOMMENTOR::Bob Kline
BZCOMMENT::19

Should I assume I have the green light to start implementing the changes to the import software?

Comment entered 2009-03-26 08:14:18 by Grama, Lakshmi (NIH/NCI) [E]

BZDATETIME::2009-03-26 08:14:18
BZCOMMENTOR::Lakshmi Grama
BZCOMMENT::20

Yes - please go ahead.

Comment entered 2009-03-26 16:24:01 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2009-03-26 16:24:01
BZCOMMENTOR::Bob Kline
BZCOMMENT::21

I just noticed the following comment up in the history section of NLM's DTD:

Moved and renamed expected_enrollment to enrollment

Our schema currently has:

CTGovProtocol
RequiredHeader
....
CTEnrollment
....
CTEligibility
....
CTExpectedEnrollment
....
....

Should we:

1. Keep this structure, mapping enrollment to CTEnrollment?
2. Drop CTExpectedEnrollment from the schema, do a global change
to move the values to CTEnrollment, and map enrollment to
CTEnrollment for future imports?
3. Drop CTEnrollment from the top-level children, and map
enrollment to CTEligibility/CTExpectedEnrollment?
4. Napalm NLM?

Comment entered 2009-03-26 16:29:20 by Englisch, Volker (NIH/NCI) [C]

BZDATETIME::2009-03-26 16:29:20
BZCOMMENTOR::Volker Englisch
BZCOMMENT::22

Option (4) seems to be appealing but I'm afraid it's not politically correct. :-)

Comment entered 2009-03-27 15:59:23 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2009-03-27 15:59:23
BZCOMMENTOR::Bob Kline
BZCOMMENT::23

I noticed that in the new CTGovProtocol schema we have mapped link/url to RelatedWebsites. I must have copied this from the common protocol schema and the terminology schema, but the use of the plural name for an element which contains a link for a single web site is surely wrong. Did we already have this discussion?

Comment entered 2009-03-27 16:25:40 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2009-03-27 16:25:40
BZCOMMENTOR::Bob Kline
BZCOMMENT::24

Note that this mapping table reflects a couple of outstanding mapping questions (see comments #21 and #23).

[Volker: Why does Bugzilla no longer include a MIME type for Excel spreadsheets when making an attachment for an issue?]

Comment entered 2009-03-27 16:25:40 by Kline, Bob (NIH/NCI) [C]

Attachment task-4444-mapping.xls has been added with description: Proposed mappings for import software

Comment entered 2009-03-27 16:56:04 by Englisch, Volker (NIH/NCI) [C]

BZDATETIME::2009-03-27 16:56:04
BZCOMMENTOR::Volker Englisch
BZCOMMENT::25

(In reply to comment #24)
> [Volker: Why does Bugzilla no longer include a MIME type for Excel spreadsheets
> when making an attachment for an issue?]

Because it wasn't part of the installation; we had added the MIME type ourselves. I didn't think it would be necessary to add it to the current installation, since Bugzilla is now smart enough to auto-detect Excel spreadsheets.

Comment entered 2009-05-07 14:17:42 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2009-05-07 14:17:42
BZCOMMENTOR::Bob Kline
BZCOMMENT::26

Lakshmi says we can keep the incorrectly named RelatedWebsites. Bob will follow up with John G. to find out what their intentions are for enrollment values (are they currently implied to be "expected" values? will we get multiple occurrences if/when they start giving us "type" attribute values? when we start getting "type" attributes, will we just get "actual" with "expected" implied, or will the attribute be required?).

Comment entered 2009-05-07 15:35:48 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2009-05-07 15:35:48
BZCOMMENTOR::Bob Kline
BZCOMMENT::27

Sent the following note to John Gillen:

John:

Could you give us a little guidance on the use of the enrollment element in XML documents retrieved from ClinicalTrials.gov? The DTD indicates that it was "moved and renamed" from expected_enrollment, and there is also a comment which says "In future: expect 'type' attribute." Is it correct to interpret the values we are currently getting (without the type attribute) as "expected"? When the type attribute is included, will the values be "expected" and "actual"? Any idea on the time frame when this change might occur?

Thanks!

Comment entered 2009-05-12 07:43:59 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2009-05-12 07:43:59
BZCOMMENTOR::Bob Kline
BZCOMMENT::28

Here's John's initial reply [2009-05-09]:

Not sure why this has not been added to the public xml. Will take
this up with others on the software team and one of us will get back
to you shortly.

We are collecting enrollment type in the PRS, with values "Actual" or
"Anticipated". The basic idea is to specify anticipated enrollment
up front, update it along the way, and then to replace it with actual
enrollment after study completion.

That said, enrollment type is one of those FDAAA-required data
elements, so it is not present in all records (long story). Once
this is added to the public site xml, the absence of it can only
safely be interpreted to mean "unknown", except in those cases where
it is impossible to have actual enrollment (i.e., a study that is
recruiting or not-yet-recruiting).

Comment entered 2009-05-13 10:22:38 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2009-05-13 10:22:38
BZCOMMENTOR::Bob Kline
BZCOMMENT::29

[John G., 2009-05-12]

Just to follow up...
We will add the "type" attribute to the enrollment tag in the
ClinicalTrials.gov XML. Value will either be "Actual" or
"Anticipated", when the data provider has supplied this
information. The changes should be in place by the end of
the week.
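
A minimal sketch, not the actual import code, of how the enrollment value and its optional type attribute might be read once NLM adds it. It assumes enrollment is a direct child of the document's root element; the function name read_enrollment is hypothetical.

```python
import xml.etree.ElementTree as ET


def read_enrollment(xml_text):
    """Return (value, type) for the incoming enrollment element.

    type is None when the data provider did not supply the attribute;
    value is None when the element is missing or empty.
    """
    root = ET.fromstring(xml_text)
    node = root.find("enrollment")  # assumed to be a top-level child
    if node is None:
        return None, None
    value = (node.text or "").strip() or None
    return value, node.get("type")


sample = '<clinical_study><enrollment type="Actual">120</enrollment></clinical_study>'
print(read_enrollment(sample))
```

The attribute is treated as optional because, per the note above, it is present only when the data provider has supplied the information.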

Comment entered 2009-06-29 15:57:33 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2009-06-29 15:57:33
BZCOMMENTOR::Bob Kline
BZCOMMENT::30

(In reply to comment #21)
> I just noticed the following comment up in the history section of NLM's DTD:
>
> Moved and renamed expected_enrollment to enrollment
>
> Our schema currently has:
>
> CTGovProtocol
> RequiredHeader
> ....
> CTEnrollment
> ....
> CTEligibility
> ....
> CTExpectedEnrollment
> ....
> ....
>
> Should we:
>
> 1. Keep this structure, mapping enrollment to CTEnrollment?
> 2. Drop CTExpectedEnrollment from the schema, do a global change
> to move the values to CTEnrollment, and map enrollment to
> CTEnrollment for future imports?
> 3. Drop CTEnrollment from the top-level children, and map
> enrollment to CTEligibility/CTExpectedEnrollment?
> 4. Napalm NLM?

It was never decided what we want to do here. Option #4 was obviously just a joke, and option #3 no longer seems appropriate in light of the clarification we have from John G. (that is, some values for the new incoming enrollment element are expected, some are actual, and some are not identified as either type of value). So we need to decide whether we'll keep the existing CTExpectedEnrollment element in the schema (and the values in the documents), just using the new CTEnrollment element for new import jobs, or move the existing data to the new element with a global change, setting the Type attribute to 'Anticipated' (or 'Expected' if that's the term we'd rather use), dropping the old element from the schema.

Currently the proposed schema has the 'Type' attribute as a string. Should we leave it that way, just sticking whatever value we get from NLM into the attribute? Or should we constrain the attribute with an enumerated valid values list? If the latter, do we want to have 'Anticipated' (or 'Expected'), 'Actual' and 'Unknown' (reflecting the language in John's reply) or just leave it out if we can't determine what the value should be? Should we use the logic he alluded to of determining that the value must be 'Anticipated' if the status is such that they can't possibly know yet what the actual enrollment numbers are?

What would you like us to do, Lakshmi?

Comment entered 2009-07-23 08:08:45 by Grama, Lakshmi (NIH/NCI) [E]

BZDATETIME::2009-07-23 08:08:45
BZCOMMENTOR::Lakshmi Grama
BZCOMMENT::31

>>> So we need to decide whether we'll keep the existing
>>>CTExpectedEnrollment element in the schema (and the values in the documents),
>>>just using the new CTEnrollment element for new import jobs, or move the
>>>existing data to the new element with a global change, setting the Type
>>>attribute to 'Anticipated' (or 'Expected' if that's the term we'd rather use),
>>>dropping the old element from the schema.

1. My preference is to create a new element CTEnrollment with a Type attribute taking one of three values: Anticipated, Actual, Unknown.

2. Since we have not imported the new element before, will we be able to just import the data for all trials from the new element in CTGOV into the new element? That would seem to negate the need to move the existing data to the new element and set the type to Anticipated. Otherwise, I agree we need to move the data.

3. Can you confirm that the CTGOV DTD for licensees does not include CTExpectedEnrollment? If it does, we need to coordinate with Volker so that CTExpectedEnrollment uses data from the new field, and only the values whose type is Anticipated.

Reply to
"Currently the proposed schema has the 'Type' attribute as a string. Should we
leave it that way, just sticking whatever value we get from NLM into the
attribute? Or should we constrain the attribute with an enumerated valid
values list? If the latter, do we want to have 'Anticipated' (or 'Expected'),
'Actual' and 'Unknown' (reflecting the language in John's reply) or just leave
it out if we can't determine what the value should be?"

Use Anticipated, Actual and just leave it out if we don't know what the value is. I don't see us displaying Unknown with the data on Cancer.gov

Reply to
"Should we use the logic he alluded to of determining that the value must be 'Anticipated' if the status is such that they can't possibly know yet what the actual enrollment numbers are?"

If they are implementing the check on their end, why do we need to implement it as well? We should be able to take their values and use them as is.

Comment entered 2009-07-23 09:19:21 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2009-07-23 09:19:21
BZCOMMENTOR::Bob Kline
BZCOMMENT::32

(In reply to comment #31)

> 2.Since we have not imported the new element before, will we be able to just
> import the data for all trials from the new element in CTGOV into the new
> element? That would seem to negate the need to move the existing data to the
> new element and set the type to Anticipated. Otherwise, I agree we need to
> move the data.

We can:

1. Strip the old element out (from schema and documents)
2. Move the data and drop the old element from the schema
3. Leave the obsolete data in the documents (and the schema).

I'm not sure which of these you're asking for (maybe I just need to finish my coffee and it will make sense :-) ).

>
> 3. Can you confirm that the CTGOV DTD for licensees does not include
> CTExpectedenrollment.

No, just ExpectedEnrollment (Volker may be mapping CTExpectedEnrollment into this element).

> Reply to
> "Should we use the logic he alluded to of determining that the value must be
> 'Anticipated' if the status is such that they can't possibly know yet what the
> actual enrollment numbers are?"
>
> If they are implementing the check on their end, why do we need to implement it
> as well. We should be able to take their values and use them as is.

I wasn't thinking so much about a validation check. I was instead asking about taking advantage of his assurance about what the absence of the attribute would mean when the status of the trial indicated that accrual numbers couldn't be actual (because the trial wasn't finished recruiting), in which case we could set the attribute to "Anticipated" ourselves.
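
The two approaches under discussion can be sketched side by side. This is a hedged illustration, not the import filter: the function name, the infer flag, and the exact status strings ("Recruiting", "Not yet recruiting") are assumptions. With infer=False the incoming value is used as is and the attribute is omitted when unknown; with infer=True a missing type is set to "Anticipated" when the trial status rules out actual enrollment.

```python
# Values NLM said it would send for the type attribute.
KNOWN_TYPES = {"Anticipated", "Actual"}

# Assumed status strings for trials that cannot yet have actual enrollment.
STILL_ACCRUING = {"Recruiting", "Not yet recruiting"}


def resolve_enrollment_type(incoming_type, overall_status, infer=False):
    """Return the Type value to store, or None to omit the attribute."""
    if incoming_type in KNOWN_TYPES:
        return incoming_type
    if infer and overall_status in STILL_ACCRUING:
        # The trial is not finished recruiting, so the number
        # can only be an anticipated one.
        return "Anticipated"
    return None


print(resolve_enrollment_type(None, "Recruiting"))
print(resolve_enrollment_type(None, "Recruiting", infer=True))
```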

Comment entered 2009-09-01 11:30:13 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2009-09-01 11:30:13
BZCOMMENTOR::Bob Kline
BZCOMMENT::33

NLM added nct_alias to the id_info block back in 2007. We're not picking up that element. Should we be?

Comment entered 2009-09-03 12:48:59 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2009-09-03 12:48:59
BZCOMMENTOR::Bob Kline
BZCOMMENT::34

LG: go ahead and pick up (store) nct_alias.
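
A rough sketch of picking up the aliases, assuming id_info is a direct child of the document's root element and that nct_alias may occur more than once; the function name is hypothetical and this is not the production import filter.

```python
import xml.etree.ElementTree as ET


def read_nct_aliases(xml_text):
    """Return the non-empty nct_alias values found in the id_info block."""
    root = ET.fromstring(xml_text)
    return [
        node.text.strip()
        for node in root.findall("id_info/nct_alias")
        if node.text and node.text.strip()
    ]


sample = (
    "<clinical_study><id_info>"
    "<nct_id>NCT00000001</nct_id>"
    "<nct_alias>NCT00000002</nct_alias>"
    "</id_info></clinical_study>"
)
print(read_nct_aliases(sample))
```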

Comment entered 2009-09-08 10:52:59 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2009-09-08 10:52:59
BZCOMMENTOR::Bob Kline
BZCOMMENT::35

NLM's current DTD has both completion_date and primary_completion_date elements (both optional, singly-occurring top-level elements).

The comment at the top of the DTD says:

<!-- 05/12/09 added tag <completion_date> -->
<!-- This field is the last followup date, when available, -->
<!-- otherwise, end date. <end_date> tag is now obsolete. -->

We have PrimaryCompletionDate and EndDate in our modified schema, but no CompletionDate (our schema modifications were made before May, when NLM made this change). What should we do about the new completion_date element, and about the fact that end_date is now obsolete?

Comment entered 2009-09-10 13:03:56 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2009-09-10 13:03:56
BZCOMMENTOR::Bob Kline
BZCOMMENT::36

(In reply to comment #35)
> NLM's current DTD has both completion_date and primary_completion_date elements
> (both optional, singly-occurring top-level elements).
>
> The comment at the top of the DTD says:
>
> <!-- 05/12/09 added tag <completion_date> -->
> <!-- This field is the last followup date, when available, -->
> <!-- otherwise, end date. <end_date> tag is now obsolete. -->
>
> We have PrimaryCompletionDate and EndDate in our modified schema, but no
> CompletionDate (our schema modifications were made before May, when NLM made
> this change). What should we do about the new completion_date element, and
> about the fact that end_date is now obsolete?

LG: Ask John G if the data that used to be in end_date is now being stored in completion_date. If not, ask him what we should do with our end_date values.

Comment entered 2009-09-17 10:58:38 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2009-09-17 10:58:38
BZCOMMENTOR::Bob Kline
BZCOMMENT::37

(In reply to comment #36)

> LG: Ask John G if the data that used to be in end_date is now being stored in
> completion_date. If not, ask him what we should do with our end_date values.

Sent question to John G. Will post his reply when it's received.

Comment entered 2009-09-21 11:09:36 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2009-09-21 11:09:36
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::38

The import software was modified to accommodate changes in expanded access status in OCECDR-445. This needs to be tested along with changes in this issue.

Specifically, "Temporarily not available" was mapped to "Temporarily closed"
and "Approved for marketing" was mapped to "Completed".
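
For testers, the two mappings can be expressed as a small lookup. This is only a sketch with hypothetical names; it covers just the two values named here and returns None for anything else, since the handling of other statuses is not described in this issue.

```python
# The two expanded access status mappings named above.
EXPANDED_ACCESS_STATUS_MAP = {
    "Temporarily not available": "Temporarily closed",
    "Approved for marketing": "Completed",
}


def map_expanded_access_status(status):
    """Return the mapped status, or None if the value is not one of the two above."""
    return EXPANDED_ACCESS_STATUS_MAP.get(status.strip())


print(map_expanded_access_status("Approved for marketing"))
```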

Comment entered 2009-09-21 11:13:32 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2009-09-21 11:13:32
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::39

(In reply to comment #38)
> The import software was modified to accommodate changes in expanded access
> status in OCECDR-445. This needs to be tested along with changes in this issue.
> Specifically, "Temporarily not available" was mapped to "Temporarily closed."
> and "Approved for marketing" was mapped to "Completed"

Correction: The changes were done in OCECDR-2824 and not OCECDR-445. Thanks!

Comment entered 2009-09-30 11:59:19 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2009-09-30 11:59:19
BZCOMMENTOR::Bob Kline
BZCOMMENT::40

(In reply to comment #37)
> (In reply to comment #36)
>
> > LG: Ask John G if the data that used to be in end_date is now being stored in
> > completion_date. If not, ask him what we should do with our end_date values.
>
> Sent question to John G. Will post his reply when it's received.

Here's his answer (at the end of a multi-message exchange):

======================= START QUOTED MESSAGE ===========================

> As for the original question, you advise us to "ignore end date." That
> still leaves the question of what to do with the data we have from that
> element which we have in previously imported documents. Should we strip
> that out? Move the information into the new CTCompletionDate element?

This is difficult for us to answer without fully understanding your
application, but maybe explaining more about the history of these data
elements will help. On the PRS side, the old end date data element was
"deprecated" some time ago in favor of the more precisely defined "last
follow-up date" (now labeled study completion date in the PRS web
interface). Because we had old records with end date values that did
not necessarily equate to LFUD, we could not simply rename the original
end date field.

On the public site, our approach is to use the LFUD for completion date
if it has been specified. If not, and the end date has been specified,
we use that. So the completion date can be thought of as the "best
available" completion date.

So if you have a record where LFUD was used as completion date, it is
best to use that. If you have a record where end date was used, that
has effectively already been copied over for you. In either case, there
should be no need to hold onto the old end date value.

Hope this helps.

======================= END QUOTED MESSAGE ===========================

So I guess this means we should take out the deprecated element from the schema, and modify the import software to avoid creating it. Presumably this will obviate the need for a global change to strip the element out, since the change in the import software should result in modifications to all of the imported documents (or at least the ones that NLM hasn't dropped from the set we get from them).
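
The "best available" rule John describes for the public site can be sketched as below. It is shown only to make the rule concrete: NLM applies it on their side before we receive completion_date, so the import software should not need to reimplement it. The function and parameter names are hypothetical.

```python
def best_completion_date(last_followup_date, end_date):
    """Prefer the last follow-up date (LFUD); fall back to the old end date.

    Empty strings are treated the same as missing values.
    """
    return last_followup_date or end_date or None


print(best_completion_date("2009-06", "2008-12"))
print(best_completion_date(None, "2008-12"))
```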

Comment entered 2009-10-29 17:36:36 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2009-10-29 17:36:36
BZCOMMENTOR::Bob Kline
BZCOMMENT::41

After discussion between Bob, Alan, Volker, and Lakshmi, we have decided that I will back out the work I have done on this task and create a branch for those changes in the source version control system. This will allow me to proceed with work on other enhancement requests which have been coming in involving the same source code affected by this request (and which have been assigned higher priority than this one has) without having to bypass the version control system and manually patch the production system.

We'll resume this task later on, when the dust has settled on the more urgent protocol-related work. At that point, I'll merge the branch back into the trunk, Volker will implement the changes to the publishing scripts, CSS, etc., and CIAT will test the results.

Don't forget to create the task for Volker to modify the CSS connected with these changes.

Comment entered 2009-10-30 08:51:40 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2009-10-30 08:51:40
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::42

(In reply to comment #41)
>
> Don't forget to create the task for Volker to modify the CSS connected with
> these changes.

I have created a new task - OCECDR-3007 - for the CSS.

Comment entered 2009-10-30 16:55:28 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2009-10-30 16:55:28
BZCOMMENTOR::Bob Kline
BZCOMMENT::43

(In reply to comment #41)
> After discussion between Bob, Alan, Volker, and Lakshmi, we have decided that I
> will back out the work I have done on this task and create a branch for those
> changes in the source version control system.

Branch created at https://imbncipf01.nci.nih.gov/svn/CDR/branches/Task4444. The trunk now contains all of the other patches which have been applied to the production system, but none of the work done for this task, so work can now proceed on the other tasks involving the CTGovProtocol schema and the XSL/T import filter.

Comment entered 2009-11-10 08:25:44 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2009-11-10 08:25:44
BZCOMMENTOR::Bob Kline
BZCOMMENT::44

Even though this issue is at the top of my task priority queue of issues for which I'm not blocked waiting for responses from CIAT or ICRDB, we agreed that we would defer further work on this issue until the other changes for CT.gov export currently being tested have been promoted (and Cancer.gov is closer to being ready to use the results of this task). So I'm lowering the priority to remove it from the front of my queue. Feel free to raise it again when you're ready to resume work on this task.

Comment entered 2010-04-29 14:10:38 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2010-04-29 14:10:38
BZCOMMENTOR::Bob Kline
BZCOMMENT::45

Bumped up priority at Lakshmi's suggestion.

Comment entered 2010-05-11 11:39:53 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2010-05-11 11:39:53
BZCOMMENTOR::Bob Kline
BZCOMMENT::46

This task is now back at the top of my queue. Are you ready for me to install the new schema, filter, and import script and start testing import of the new elements? If so, should that be done on Mahler first or Franck?

Comment entered 2010-05-12 12:10:17 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2010-05-12 12:10:17
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::47

(In reply to comment #46)
> This task is now back at the top of my queue. Are you ready for me to install
> the new schema, filter, and import script and start testing import of the new
> elements? If so, should that be done on Mahler first or Franck?

Yes. Let's start with Mahler first before going to Franck.

Comment entered 2010-05-12 15:08:01 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2010-05-12 15:08:01
BZCOMMENTOR::Bob Kline
BZCOMMENT::48

(In reply to comment #47)

> Yes. Let's start with Mahler first before going to Franck.

OK. I have installed the new version of the schema and the modified XSL/T import filter on Mahler, and I ran an import job in test mode. Test mode stops after importing 10 trials, and that way we can look at the partial results, find and fix any problems in that subset, and run another test job.

!8416 Wed May 12 14:08:11 2010: Updated CDR360653 from NCT00001469
!8416 Wed May 12 14:08:14 2010: Updated CDR647606 from NCT00002786
!8416 Wed May 12 14:08:22 2010: Updated CDR651987 from NCT00002864
!8416 Wed May 12 14:11:23 2010: Updated CDR65921 from NCT00003140
!8416 Wed May 12 14:11:26 2010: Updated CDR647609 from NCT00003145
!8416 Wed May 12 14:11:28 2010: Updated CDR66633 from NCT00003567
!8416 Wed May 12 14:11:29 2010: Updated CDR68001 from NCT00005997
!8416 Wed May 12 14:11:37 2010: Updated CDR68027 from NCT00006018
!8416 Wed May 12 14:11:39 2010: Updated CDR68197 from NCT00006261
!8416 Wed May 12 14:11:40 2010: Updated CDR68862 from NCT00023790
!8416 Wed May 12 14:11:42 2010: Updated CDR360755 from NCT00031551

Here are the links for viewing the imported documents:

Here's a link for viewing the schema:

http://mahler.nci.nih.gov/cgi-bin/cdr/GetSchema.py?id=349686

Please review the new schema and the imported documents and let me know if you see any anomalies that need to be addressed. Once we've done enough of the small batches to satisfy you that we're ready to run the rest of the import job, I'll kick it off without the testing throttle.
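
A rough sketch of what a testing throttle like this looks like, assuming the import loop simply stops once a fixed number of trials has been processed. The names `run_import`, `import_one`, and `max_docs` are hypothetical, and the real import script may count its batch differently.

```python
from typing import Callable, Iterable, List, Optional


def run_import(nct_ids: Iterable[str],
               import_one: Callable[[str], None],
               max_docs: Optional[int] = None) -> List[str]:
    """Import trials in order, stopping early when a throttle is set.

    max_docs=None means a full import; max_docs=10 mimics "test mode".
    """
    imported: List[str] = []
    for nct_id in nct_ids:
        if max_docs is not None and len(imported) >= max_docs:
            break  # throttle reached; leave the rest for a later job
        import_one(nct_id)
        imported.append(nct_id)
    return imported
```

Running with `max_docs=10` gives a small batch to review; dropping the argument runs the job without the throttle.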

Comment entered 2010-06-03 11:11:26 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2010-06-03 11:11:26
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::49

(In reply to comment #48)
> (In reply to comment #47)
>
> > Yes. Let's start with Mahler first before going to Franck.
>
> OK. I have installed the new version of the schema and the modified XSL/T
> import filter on Mahler, and I ran an import job in test mode. Test mode stops
> after importing 10 trials, and that way we can look at the partial results,
> find and fix any problems in that subset, and run another test job

I am reviewing the schema and the imported trials. So far, I have not come across any problems. I will post anything I find.

Comment entered 2010-06-15 15:56:01 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2010-06-15 15:56:01
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::50

Should I also be looking at the imported documents in XMetal on Mahler? When I try to retrieve the documents, I get the "Document does not conform to DTD or XML Schema" dialog box.

Comment entered 2010-06-15 16:34:23 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2010-06-15 16:34:23
BZCOMMENTOR::Bob Kline
BZCOMMENT::51

So much time had elapsed since I ran that test import that the provisional schema for this issue has been overwritten by the changes required for issues #4837 and #4850. Are you ready for me to re-install this version (understanding that if other issues require changes to the CTGovProtocol schema before this testing is complete it will disappear again)?

Comment entered 2010-06-15 16:51:08 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2010-06-15 16:51:08
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::52

(In reply to comment #51)
> #4837 and #4850. Are you ready for me to re-install this version
> (understanding that if other issues require changes to the CTGovProtocol schema
> before this testing is complete it will disappear again)?

Yes, please. Proceed to install it, I should finish testing in the next couple of days.

Comment entered 2010-06-16 09:39:14 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2010-06-16 09:39:14
BZCOMMENTOR::Bob Kline
BZCOMMENT::53

Schema for this issue reinstalled on Mahler. Ready for user testing.

Comment entered 2010-06-16 14:52:52 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2010-06-16 14:52:52
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::54

1. An empty ArmsOrGroups block appears to be dropped in whenever there is no data present. Examples, CDR0000360653 and CDR0000647606.

2. Also, CTNumberOfArms appears to default to a value of '1' for some of the trials even though there is no Arms information. For example, CDR0000066633 and CDR0000068001.

Please run the next test.

Comment entered 2010-06-16 15:20:29 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2010-06-16 15:20:29
BZCOMMENTOR::Bob Kline
BZCOMMENT::55

(In reply to comment #54)
> (In reply to comment #48)
> > (In reply to comment #47)
> >
> > > Yes. Let's start with Mahler first before going to Franck.
> >
> > OK. I have installed the new version of the schema and the modified XSL/T
> > import filter on Mahler, and I ran an import job in test mode. Test mode stops
> > after importing 10 trials, and that way we can look at the partial results,
> > find and fix any problems in that subset, and run another test job.
> >
>
> 1. An empty ArmsOrGroups block appears to be dropped in whenever there is no
> data present. Examples, CDR0000360653 and CDR0000647606.
>
> 2. Also the CTNumberOfArms appears to default to a value of '1' for some of
> the trials even though there is not Arms information. For example CDR0000066633
> and CDR0000068001.
>
> Please run the next test.

Before I "fix" anything, let's speak with Lakshmi about how these elements are supposed to work. My understanding is that absence of explicit ARM info implies that a single ARM is used for a trial.

Comment entered 2010-06-17 12:46:02 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2010-06-17 12:46:02
BZCOMMENTOR::Bob Kline
BZCOMMENT::56

Lakshmi will take a look on Monday.

Comment entered 2010-07-08 17:27:16 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2010-07-08 17:27:16
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::57

(In reply to comment #55)

> > Please run the next test.
>
> Before I "fix" anything, let's speak with Lakshmi about how these elements are
> supposed to work. My understanding is that absence of explicit ARM info
> implies that a single ARM is used for a trial.

We decided to proceed with importing new trials for testing since what I reported as empty ArmsOrGroups elements were actually not empty. They did have the SingleArmOrGroup = 'Yes' attribute. Also, the CTNumberOfArms is supposed to default to '1' in the cases I reported from my testing.
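
The behavior agreed on here can be sketched as follows. This is an illustration in Python, not the actual XSL/T import filter. `ArmsOrGroups`, `SingleArmOrGroup`, and `CTNumberOfArms` come from this thread; the child element names `ArmOrGroup` and `ArmOrGroupLabel`, and the choice to count explicit arms with `len()`, are assumptions made only for the example.

```python
import xml.etree.ElementTree as ET


def arms_elements(arm_labels):
    """Return (ArmsOrGroups, CTNumberOfArms) elements for a trial.

    With no explicit arm information the trial is treated as single-arm:
    ArmsOrGroups gets SingleArmOrGroup='Yes' and CTNumberOfArms is '1'.
    """
    arms = ET.Element("ArmsOrGroups")
    count = ET.Element("CTNumberOfArms")
    if not arm_labels:
        arms.set("SingleArmOrGroup", "Yes")
        count.text = "1"
    else:
        for label in arm_labels:
            # Hypothetical child structure, for illustration only.
            arm = ET.SubElement(arms, "ArmOrGroup")
            ET.SubElement(arm, "ArmOrGroupLabel").text = label
        count.text = str(len(arm_labels))
    return arms, count
```

So an ArmsOrGroups element with no children is not really empty; the attribute carries the single-arm information.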

Comment entered 2010-07-12 15:09:56 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2010-07-12 15:09:56
BZCOMMENTOR::Bob Kline
BZCOMMENT::58

Another 10 test trials have been imported on Mahler:

!9172 Mon Jul 12 15:03:49 2010: Updated CDR360649 from NCT00001159
!9172 Mon Jul 12 15:03:50 2010: Updated CDR360650 from NCT00001160
!9172 Mon Jul 12 15:03:51 2010: Updated CDR360897 from NCT00001254
!9172 Mon Jul 12 15:03:57 2010: Updated CDR360652 from NCT00001397
!9172 Mon Jul 12 15:03:59 2010: Updated CDR360653 from NCT00001469
!9172 Mon Jul 12 15:04:00 2010: Updated CDR662657 from NCT00001582
!9172 Mon Jul 12 15:04:01 2010: Updated CDR360655 from NCT00001595
!9172 Mon Jul 12 15:04:03 2010: Updated CDR360866 from NCT00001620
!9172 Mon Jul 12 15:04:04 2010: Updated CDR360656 from NCT00001852
!9172 Mon Jul 12 15:04:06 2010: Updated CDR360898 from NCT00001872
!9172 Mon Jul 12 15:04:08 2010: Updated CDR75917 from NCT00002463

Here are the URLs for looking at the documents:

Comment entered 2010-07-28 19:20:59 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2010-07-28 19:20:59
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::59

Please run another import job. The trials in the last set are not good candidates for testing because they don't have many of the new elements.

Comment entered 2010-07-29 12:35:26 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2010-07-29 12:35:26
BZCOMMENTOR::Bob Kline
BZCOMMENT::60

I've run another batch of ten.

Comment entered 2010-07-30 14:07:59 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2010-07-30 14:07:59
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::61

(In reply to comment #60)
> I've run another batch of ten.

Will you post the results here, or should I use the ctgov report to review them?

Comment entered 2010-07-30 14:16:30 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2010-07-30 14:16:30
BZCOMMENTOR::Bob Kline
BZCOMMENT::62

!7476 Thu Jul 29 12:33:58 2010: Updated CDR78316 from NCT00002524
!7476 Thu Jul 29 12:34:05 2010: Updated CDR63860 from NCT00002604
!7476 Thu Jul 29 12:34:06 2010: Updated CDR652417 from NCT00002790
!7476 Thu Jul 29 12:34:07 2010: Updated CDR65044 from NCT00002835
!7476 Thu Jul 29 12:34:09 2010: Updated CDR65081 from NCT00002844
!7476 Thu Jul 29 12:34:12 2010: Updated CDR67433 from NCT00004192
!7476 Thu Jul 29 12:34:14 2010: Updated CDR67440 from NCT00004197
!7476 Thu Jul 29 12:34:18 2010: Updated CDR67441 from NCT00004198
!7476 Thu Jul 29 12:34:20 2010: Updated CDR360901 from NCT00004847
!7476 Thu Jul 29 12:34:21 2010: Updated CDR652223 from NCT00005592
!7476 Thu Jul 29 12:34:23 2010: Updated CDR652158 from NCT00005594

It appears that an 11th trial was imported (presumably one that CIAT marked on the review interface).

Comment entered 2010-08-02 11:34:33 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2010-08-02 11:34:33
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::63

Thanks!

Please, run another import job.

Comment entered 2010-08-02 11:39:22 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2010-08-02 11:39:22
BZCOMMENTOR::Bob Kline
BZCOMMENT::64

(In reply to comment #63)

> Please, run another import job.

Do you want me to continue throttling the job, or should I run a full import (on Mahler) for the next job?

Comment entered 2010-08-02 11:45:40 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2010-08-02 11:45:40
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::65

(In reply to comment #64)
> (In reply to comment #63)
>
> > Please, run another import job.
>
> Do you want me to continue throttling the job, or should I run a full import
> (on Mahler) for the next job?

I think throttling it will still be good if you double the documents imported per import job. After about two runs I will ask for full import before promoting to Bach.

Comment entered 2010-08-02 11:59:25 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2010-08-02 11:59:25
BZCOMMENTOR::Bob Kline
BZCOMMENT::66

(In reply to comment #65)

> I think throttling it will still be good if you double the documents imported
> per import job. After about two runs I will ask for full import before
> promoting to Bach.

!1932 Mon Aug 02 11:55:49 2010: Updated CDR360902 from NCT00005902
!1932 Mon Aug 02 11:55:51 2010: Updated CDR360904 from NCT00005927
!1932 Mon Aug 02 11:55:53 2010: Updated CDR68307 from NCT00006478
!1932 Mon Aug 02 11:55:55 2010: Updated CDR662658 from NCT00006518
!1932 Mon Aug 02 11:55:57 2010: Updated CDR68306 from NCT00007865
!1932 Mon Aug 02 11:55:58 2010: Updated CDR662660 from NCT00026884
!1932 Mon Aug 02 11:55:59 2010: Updated CDR360755 from NCT00031551
!1932 Mon Aug 02 11:56:01 2010: Updated CDR258139 from NCT00052286
!1932 Mon Aug 02 11:56:04 2010: Updated CDR389483 from NCT00060541
!1932 Mon Aug 02 11:56:05 2010: Updated CDR352403 from NCT00060710
!1932 Mon Aug 02 11:56:06 2010: Updated CDR305817 from NCT00063934
!1932 Mon Aug 02 11:56:07 2010: Updated CDR662661 from NCT00068003
!1932 Mon Aug 02 11:56:11 2010: Updated CDR350343 from NCT00068146
!1932 Mon Aug 02 11:56:15 2010: Updated CDR350108 from NCT00071045
!1932 Mon Aug 02 11:56:19 2010: Updated CDR478803 from NCT00071799
!1932 Mon Aug 02 11:56:21 2010: Updated CDR350451 from NCT00073073
!1932 Mon Aug 02 11:56:22 2010: Updated CDR363923 from NCT00080262
!1932 Mon Aug 02 11:56:25 2010: Updated CDR365561 from NCT00080301
!1932 Mon Aug 02 11:56:44 2010: Updated CDR357423 from NCT00080912
!1932 Mon Aug 02 11:56:53 2010: Updated CDR662662 from nct00081354

Comment entered 2010-08-04 16:39:51 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2010-08-04 16:39:51
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::67

I am not finding enough trials to test the new elements so I am suggesting that CIAT will search clinicaltrials.gov for trials that are more likely to contain the elements we have added and send you the NCT IDs for the trials so that you can import. We will then review the trials. Is this Okay?

Comment entered 2010-08-05 08:52:56 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2010-08-05 08:52:56
BZCOMMENTOR::Bob Kline
BZCOMMENT::68

I could write a custom version of the import program to bring in specific trials for this task, but you may want to consider just having me run a full import job.

Comment entered 2010-08-05 10:57:44 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2010-08-05 10:57:44
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::69

(In reply to comment #68)
> I could write a custom version of the import program to bring in specific
> trials for this task, but you may want to consider just having me run a full
> import job.

Please proceed to run a full import job. It may take us a longer time to review but that's Okay.

Comment entered 2010-08-05 11:28:07 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2010-08-05 11:28:07
BZCOMMENTOR::Bob Kline
BZCOMMENT::70

Well, you wouldn't have to review every single imported trial, right? Couldn't you just look at the ones that would be on the list you described in comment #67?

Comment entered 2010-08-10 10:18:12 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2010-08-10 10:18:12
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::71

Bob:
I see CTSource in the new CTGovProtocol schema on Mahler. What does it map to in clinicaltrials.gov?

Also, are these all the new elements added to the schema?

1. CTOversightInfo
2. ReasonStopped
3. PrimaryCompletionDate
4. CTStudyType
5. CTOutcomes
6. CTNumberOfArms
7. CTEnrollment
8. ArmsOrGroups
9. BiospecRetention
10. BiospecDescription
11. ProtocolRelatedLinks
12. CTReference
13. CTResultsReference
14. FirstReceivedDate
15. CTSource
16. CTAcronym
17. CTStudyPop
18. CTSamplingMethod

Comment entered 2010-08-10 10:33:37 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2010-08-10 10:33:37
BZCOMMENTOR::Bob Kline
BZCOMMENT::72

(In reply to comment #71)

> I see CTSource in the new CTGovProtocol schema on Mahler. What does it map to
> in clinicaltrials.gov?

http://verdi.nci.nih.gov/tracker/attachment.cgi?id=1653

> Also, are these all the new elements added to the schema? ....

Yes.

http://mahler.nci.nih.gov/cgi-bin/cdr/GetSchema.py?id=349686

Comment entered 2010-08-10 10:53:40 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2010-08-10 10:53:40
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::73

(In reply to comment #72)
> (In reply to comment #71)
>
> > I see CTSource in the new CTGovProtocol schema on Mahler. What does it map to
> > in clinicaltrials.gov?
>
> http://verdi.nci.nih.gov/tracker/attachment.cgi?id=1653
>
Thanks!
I was unable to find the 'Source' element in the NLM DTD provided as a link in comment #3. I also don't see it mentioned in their data element definitions
http://prsinfo.clinicaltrials.gov/definitions.html
Is it possible that it goes by another name? In clinicaltrials.gov, the closest display name I can find is the 'Information Provided by' field.

Comment entered 2010-08-10 11:16:36 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2010-08-10 11:16:36
BZCOMMENTOR::Bob Kline
BZCOMMENT::74

It's there in their DTD [1], but our software is looking for it as part of the required_header block (we create it as part of the RequiredHeader block in our imported document), even though the element is a top-level element. I have modified the import filter to look for it at the correct level. Do we need to adjust the placement in our own document, or did we deliberately put Source in our RequiredHeader block?

[1] http://clinicaltrials.gov/ct2/html/images/info/public.dtd
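
To illustrate the difference between the two lookup levels, here is a minimal Python sketch (the real fix was made in the XSL/T import filter). It assumes the NLM element is spelled `source` and is a direct child of the document's root element; `find_source` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def find_source(clinical_study: ET.Element):
    """Return the text of the top-level source element, or None.

    Assumption: 'source' is a direct child of the root element,
    not a child of required_header.
    """
    node = clinical_study.find("source")
    if node is None or not node.text:
        return None
    return node.text.strip()
```

A lookup written as `required_header/source` finds nothing in a document shaped this way, which would explain why the value had been missing.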

Comment entered 2010-08-10 11:43:20 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2010-08-10 11:43:20
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::75

(In reply to comment #74)
> It's there in their DTD [1], but our software is looking for it as part of the
> required_header block (we create it as part of the RequiredHeader block in our
> imported document), even though the element is a top-level element. I have
> modified the import filter to look for it at the correct level. Do we need to
> adjust the placement in our own document, or did we deliberately put Source in
> our RequiredHeader block?
>
I don't remember all the details of the discussions of the elements but this element appears to be in the right place in our document.

Comment entered 2010-08-10 15:19:09 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2010-08-10 15:19:09
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::76

We have identified the 'test trials'. I believe some of the trials may not be picked up for this import job because they may not meet the criteria for importing. Can we do a force download of those trials to review?

Comment entered 2010-08-10 15:22:36 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2010-08-10 15:22:36
BZCOMMENTOR::Bob Kline
BZCOMMENT::77

(In reply to comment #76)

> Can we do a force download of those trials to review?

Yes. I'll have to run a real download job from Franck (not just use the download from production), but I don't think they'll be too upset at having an extra download job run in the daytime if it's just this once.

Comment entered 2010-08-10 16:41:27 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2010-08-10 16:41:27
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::78

(In reply to comment #77)
> (In reply to comment #76)
>
> > Can we do a force download of those trials to review?
>
> Yes. I'll have to run a real download job from Franck (not just use the
> download from production), but I don't think they'll be too upset at having an
> extra download job run in the daytime if it's just this once.

Great. I think we can proceed with the full import job when you are ready.

Comment entered 2010-08-10 17:15:11 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2010-08-10 17:15:11
BZCOMMENTOR::Bob Kline
BZCOMMENT::79

(In reply to comment #77)

> Yes. I'll have to run a real download job from Franck ....

I misspoke when I said Franck (I meant Mahler); on which server did you flag the trials for forced download?

Comment entered 2010-08-10 17:20:09 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2010-08-10 17:20:09
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::80

(In reply to comment #79)
> (In reply to comment #77)
>
> > Yes. I'll have to run a real download job from Franck ....
>
> I misspoke when I said Franck (I meant Mahler); on which server did you flag
> the trials for forced download?

We have not flagged them yet. We can flag them this evening so that you can do the download tomorrow, if that is Okay?

Comment entered 2010-08-10 17:23:39 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2010-08-10 17:23:39
BZCOMMENTOR::Bob Kline
BZCOMMENT::81

That's fine. Just remember, I'll be leaving town mid-day tomorrow, and won't be back until next week.

Comment entered 2010-08-11 07:23:04 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2010-08-11 07:23:04
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::82

(In reply to comment #81)
> That's fine. Just remember, I'll be leaving town mid-day tomorrow, and won't
> be back until next week.

Please proceed with the download job. We have flagged about 18 trials for force download.

Comment entered 2010-08-11 11:54:17 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2010-08-11 11:54:17
BZCOMMENTOR::Bob Kline
BZCOMMENT::83

The download job completed a while ago. The import job is still running (mostly done). You can check periodically for the individual documents you're testing with to see if they've finished importing.

Comment entered 2010-08-12 15:58:24 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2010-08-12 15:58:24
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::84

I went back after the review meeting to re-check some of the documents we reviewed yesterday and I was surprised to see that many of the new data are now in there. When I looked at the doc history, it turned out that we did most of our reviews about an hour before (between 2:15 and 3:00) the import job completed (about 4:16). I did not realize it would take that long for the job to complete.

Now, I can say that a lot of the data are present in the documents and they are where we wanted to see them, but there are still a few of them we have not been able to see in the documents because we have not seen trials that are supposed to have them (but they are in the minority). This is because some of the trials we identified and did a force download for, did not get downloaded as expected. Four of those documents ended up on the review page and we have marked them to be imported. They are:
NCT01168791
NCT01169220
NCT01143545
NCT01040949
I believe Bob will have to manually start another download job so that we can see them.

Other data elements we have not been able to test because the documents we identified did not get imported are:

CTAcronym
We will be able to test with one of the 4 above

CTSamplingMethod
NCT00643929
NCT00727571

CTStudyPop
NCT00870376
NCT01126567

BiospecRetention
NCT00989846
NCT01126567

BiospecDescription
NCT00710307
NCT01126567
NCT00989846

We will continue to review the imports to see if we can identify the above data in some of the trials. If we are not successful or if it takes too long, we may have to find a way to get the above documents imported or we can identify another set of trials to force download to test the remaining elements.

Comment entered 2010-08-12 15:59:37 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2010-08-12 15:59:37
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::85

Added Margaret to this issue.

Comment entered 2010-08-23 15:00:27 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2010-08-23 15:00:27
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::86

It looks like the schema has been overwritten again. I am getting the "Document does not conform to DTD or XML Schema" dialog box when I attempt to retrieve and review documents.

Comment entered 2010-08-23 15:15:48 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2010-08-23 15:15:48
BZCOMMENTOR::Bob Kline
BZCOMMENT::87

Right. That will keep happening as long as we delay deployment of these changes while making other changes to the same schema (which need to be installed, tested, and promoted). Probably task #4891 this time. Changes for this task restored on Mahler.

Comment entered 2010-08-24 16:16:05 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2010-08-24 16:16:05
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::88

1. In the ArmsOrGroups block when there is no data for the ArmsOrGroupType, a blank element is dropped in. Is it possible not to display the element when data is not present? Should I put this under OCECDR-3007?

2. It looks like not all of the relevant information for the ArmsOrGroups block is brought in.
For example

CDR0000671043 - NCT01103323 -

http://www.clinicaltrials.gov/ct2/show/NCT01103323?term=NCT01103323&rank=1

In the record on ctgov, there is an "Assigned Interventions" column. However, this information is not included in the CDR.

Same issue with CDR0000671899 - NCT01108484

http://www.clinicaltrials.gov/ct2/show/NCT01108484?term=NCT01108484&rank=1

Comment entered 2010-08-25 10:17:54 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2010-08-25 10:17:54
BZCOMMENTOR::Bob Kline
BZCOMMENT::89

Doesn't look like that's in the proposed mapping table that I posted for review back in March of 2009. Do you want to have it added?

Comment entered 2010-08-25 11:13:50 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2010-08-25 11:13:50
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::90

(In reply to comment #89)
> Doesn't look like that's in the proposed mapping table that I posted for review
> back in March of 2009. Do you want to have it added?

Yes please.

Comment entered 2010-08-25 11:30:13 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2010-08-25 11:30:13
BZCOMMENTOR::Bob Kline
BZCOMMENT::91

In addition to arm_group_label, NLM's intervention block also contains description and other_name elements. Do you want these mapped as well?

Comment entered 2010-08-25 11:38:18 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2010-08-25 11:38:18
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::92

(In reply to comment #91)
> In addition to arm_group_label, NLM's intervention block also contains
> description and other_name elements. Do you want these mapped as well?

For the description, there is already an element in the PDQ ArmsOrGroups block called ArmOrGroupDescription. Are you referring to that or a different element? And yes, please map the other_name element.

Comment entered 2010-08-25 12:36:51 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2010-08-25 12:36:51
BZCOMMENTOR::Bob Kline
BZCOMMENT::93

(In reply to comment #92)

> For the description, there is already an element in the PDQ ArmsOrGroups
> blocked called ArmOrGroupDescription. Are you referring to that or a different
> element ....

That's a description of the arm or group; this is a description of the intervention. I'm looking at the intervention block because that's where the link is back to the arm/group which makes it possible for CT.gov to populate the Assigned Interventions column of the arms table on its web site.
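
The linkage described above can be sketched like this: each intervention names the arms it belongs to through arm_group_label, so collecting interventions per label reproduces the Assigned Interventions column. The element names `arm_group_label`, `description`, and `other_name` are from this thread; `intervention` as the block name and `intervention_name` as the name child are assumptions about the NLM document, and the helper itself is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def interventions_by_arm(clinical_study: ET.Element):
    """Map each arm/group label to the interventions assigned to it."""
    by_arm = defaultdict(list)
    for intervention in clinical_study.findall("intervention"):
        info = {
            # 'intervention_name' is an assumed element name.
            "name": intervention.findtext("intervention_name", default=""),
            "description": intervention.findtext("description", default=""),
            "other_names": [n.text for n in intervention.findall("other_name")],
        }
        # One intervention may be linked to several arms.
        for label in intervention.findall("arm_group_label"):
            by_arm[label.text].append(info)
    return dict(by_arm)
```

Note that the `description` picked up here describes the intervention, which is separate from the description of the arm or group itself.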

Comment entered 2010-08-25 12:41:06 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2010-08-25 12:41:06
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::94

(In reply to comment #93)
> (In reply to comment #92)
> That's a description of the arm or group; this is a description of the
> intervention. I'm looking at the intervention block because that's where the
> link is back to the arm/group which makes it possible for CT.gov to populate
> the Assigned Interventions column of the arms table on its web site.

Yes. Please include that also. Thanks!

Comment entered 2010-08-26 10:56:20 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2010-08-26 10:56:20
BZCOMMENTOR::Bob Kline
BZCOMMENT::95

Revised schema at http://mahler.nci.nih.gov/cgi-bin/cdr/GetSchema.py?id=349686; please take a look and see if there are any other mappings you'd like to have included.

Comment entered 2010-08-26 10:56:20 by Kline, Bob (NIH/NCI) [C]

Attachment task-4444-mapping (1).xls has been added with description: Added more children for CTIntervention

Comment entered 2010-08-30 12:13:29 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2010-08-30 12:13:29
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::96

(In reply to comment #95)
> Created attachment 1985 [details]
> Added more children for CTIntervention
>
> Revised schema at http://mahler.nci.nih.gov/cgi-bin/cdr/GetSchema.py?id=349686;
> please take a look and see if there are any other mappings you'd like to have
> included.

There is a Keywords section in the display on clinicaltrials.gov that we would like to include. In the link below the heading for this information is:
"Keywords provided by National Institutes of Health Clinical Center (CC):"

http://www.clinicaltrials.gov/ct2/show/NCT00070785?term=NCT00070785&rank=1

Comment entered 2010-08-30 13:17:16 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2010-08-30 13:17:16
BZCOMMENTOR::Bob Kline
BZCOMMENT::97

http://mahler.nci.nih.gov/cgi-bin/cdr/GetSchema.py?id=349686

If there are no further additions, I will finish the modifications to the import filter and we can run another test.

Comment entered 2010-08-30 13:17:16 by Kline, Bob (NIH/NCI) [C]

Attachment task-4444-mapping-20100830.xls has been added with description: Added CTKeyword as child of CTGovIndexing block

Comment entered 2010-08-31 09:40:07 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2010-08-31 09:40:07
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::98

(In reply to comment #97)
> Created attachment 1988 [details]
> Added CTKeyword as child of CTGovIndexing block
>
> http://mahler.nci.nih.gov/cgi-bin/cdr/GetSchema.py?id=349686
>
> If there are no further additions, I will finish the modifications to the
> import filter and we can run some another test.

Yes. Please proceed. No further additions.

Comment entered 2010-08-31 11:33:35 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2010-08-31 11:33:35
BZCOMMENTOR::Bob Kline
BZCOMMENT::99

I've installed the code to pick up the new elements and I ran a complete download and import job suite on Mahler. If a document hasn't changed since the previous copy we got from NLM it won't have been marked for re-import, so if you have any specific trials you need to have re-imported with the new filter, let me know and I'll flag them by hand.

Comment entered 2010-08-31 14:22:42 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2010-08-31 14:22:42
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::100

(In reply to comment #99)
> if you have any specific trials you need to have re-imported with the new
> filter, let me know and I'll flag them by hand.

Please, flag these:

NCT01188759
NCT01188187
NCT01186263
NCT01103323
NCT01108484

Comment entered 2010-08-31 15:17:52 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2010-08-31 15:17:52
BZCOMMENTOR::Bob Kline
BZCOMMENT::101

!1092 Tue Aug 31 15:16:09 2010: Updated CDR671043 from NCT01103323
!1092 Tue Aug 31 15:16:10 2010: Updated CDR671899 from NCT01108484
!1092 Tue Aug 31 15:16:12 2010: Added NCT01186263 as CDR0000672011
!1092 Tue Aug 31 15:16:12 2010: Added NCT01188187 as CDR0000672012
!1092 Tue Aug 31 15:16:12 2010: Added NCT01188759 as CDR0000672013
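For spot-checking, log lines in the two shapes shown above ("Updated CDR... from NCT..." and "Added NCT... as CDR...") could be pulled apart with a small parser. This is a minimal sketch that assumes those two shapes are the only ones of interest; the function name `parse_import_line` is hypothetical and not part of the CDR code.

```python
import re

# Patterns for the two line shapes seen in the import log extract.
UPDATED = re.compile(r"Updated CDR(\d+) from (NCT\d+)\s*$")
ADDED = re.compile(r"Added (NCT\d+) as CDR(\d+)\s*$")


def parse_import_line(line):
    """Return (action, nct_id, cdr_id) for a recognized line, else None."""
    match = UPDATED.search(line)
    if match:
        return ("updated", match.group(2), int(match.group(1)))
    match = ADDED.search(line)
    if match:
        return ("added", match.group(1), int(match.group(2)))
    return None
```

Converting the CDR ID to an integer means `CDR671043` and a zero-padded form such as `CDR0000671043` compare as the same document.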

Comment entered 2010-09-01 13:40:02 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2010-09-01 13:40:02
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::102

Could you move the ArmsOrGroups block and place it right before the CTGovIndexing block? Is this possible?

Comment entered 2010-09-01 14:40:31 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2010-09-01 14:40:31
BZCOMMENTOR::Bob Kline
BZCOMMENT::103

Done.

Comment entered 2010-09-02 09:32:51 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2010-09-02 09:32:51
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::104

(In reply to comment #103)
> Done.

Can you please do a full download job so that we can do a final review?

Comment entered 2010-09-07 14:10:26 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2010-09-07 14:10:26
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::105

We have finished reviewing the documents. They look good. Please promote the changes to Bach.

Comment entered 2010-09-08 13:15:01 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2010-09-08 13:15:01
BZCOMMENTOR::Bob Kline
BZCOMMENT::106

Hurrah! We started this task back in January 2009, so it feels good to finally be putting it to bed!

I have installed the new versions of the filter and schema on Bach, and merged the version control branch back into the trunk.

Please keep a close eye on the imported documents over the next few days to make sure nothing is broken.

Be aware that the new elements won't be picked up for a given document until NLM sends us a new version of that document. If that presents problems, then we may want to discuss manually setting all of the 'imported' dispositions to 'import requested' in the ctgov_import table to force a re-import. Might be preferable to let things just take their normal course, though, at least for a while, in order to minimize the amount of republished trials we end up pushing to Cancer.gov.
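If a forced re-import is ever wanted, the disposition change mentioned above might look something like the sketch below. It assumes a simplified `ctgov_import` table with only `nlm_id` and `disposition` text columns and uses an in-memory SQLite database so it can run anywhere; the real table layout, the database driver, and the function name `request_reimport` are all assumptions, and the sample IDs and the 'not yet reviewed' value are made up for the demonstration.

```python
import sqlite3


def request_reimport(conn, nlm_ids=None):
    """Set 'imported' rows back to 'import requested'; return rows changed.

    With nlm_ids=None every imported trial is requeued; otherwise only
    the listed NCT IDs are touched.
    """
    sql = ("UPDATE ctgov_import SET disposition = 'import requested' "
           "WHERE disposition = 'imported'")
    params = []
    if nlm_ids is not None:
        if not nlm_ids:
            return 0  # an empty list means nothing to do
        placeholders = ", ".join("?" for _ in nlm_ids)
        sql += f" AND nlm_id IN ({placeholders})"
        params = list(nlm_ids)
    cursor = conn.execute(sql, params)
    conn.commit()
    return cursor.rowcount


def make_demo_db():
    """Build a throwaway table with made-up rows (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ctgov_import (nlm_id TEXT PRIMARY KEY, disposition TEXT)"
    )
    conn.executemany(
        "INSERT INTO ctgov_import VALUES (?, ?)",
        [
            ("NCT00000001", "imported"),
            ("NCT00000002", "imported"),
            ("NCT00000003", "not yet reviewed"),
        ],
    )
    return conn
```

Passing a list of IDs keeps the change small, which fits the concern about limiting how many republished trials get pushed to Cancer.gov.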

Comment entered 2010-09-08 14:24:59 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2010-09-08 14:24:59
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::107

(In reply to comment #106)
> Be aware that the new elements won't be picked up for a given document until
> NLM sends us a new version of that document. If that presents problems, then
> we may want to discuss manually setting all of the 'imported' dispositions to
> 'import requested' in the ctgov_import table to force a re-import. Might be
> preferable to let things just take their normal course, though, at least for a
> while, in order to minimize the amount of republished trials we end up pushing
> to Cancer.gov.

In the short term, I think it is okay to let things take their normal course as you suggested since the records that don’t get updated won’t be invalidated.

Comment entered 2010-09-09 11:10:46 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2010-09-09 11:10:46
BZCOMMENTOR::Bob Kline
BZCOMMENT::108

Looks as if this morning's import job processed 99 trials, 88 of them updates and the rest new. Don't see evidence of any problems. I've attached the list of the IDs so you can spot-check a few to make sure they look OK.

Comment entered 2010-09-09 11:10:46 by Kline, Bob (NIH/NCI) [C]

Attachment task4444-20100909.log has been added with description: Extract from import log

Comment entered 2010-09-15 13:30:27 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2010-09-15 13:30:27
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::109

(In reply to comment #108)
> Created attachment 1996 [details]
> Extract from import log
>
> Looks as if this morning's import job processed 99 trials, 88 of them updates
> and the rest new. Don't see evidence of any problems. I've attached the list
> of the IDs so you can spot-check a few to make sure they look OK.

Yes. We have not come across any problems.
I have entered a new issue, OCECDR-3224, to take care of documents that may not be updated during the normal daily updates.

This issue is now closed. Thank you!!

Attachments

File Name	Posted	User
ClinicalTrialsSchemaChanges.doc	2009-02-03 08:48:43	Osei-Poku, William (NIH/NCI) [C]
task4444-20100909.log	2010-09-09 11:10:46
task-4444-mapping.xls	2009-03-27 16:25:40
task-4444-mapping (1).xls	2010-08-26 10:56:20
task-4444-mapping-20100830.xls	2010-08-30 13:17:16
