Issue Number | 2769 |
---|---|
Summary | [CTGov] Include additional elements in CTGOV imports |
Created | 2009-01-14 15:14:58 |
Issue Type | Improvement |
Submitted By | Osei-Poku, William (NIH/NCI) [C] |
Assigned To | Kline, Bob (NIH/NCI) [C] |
Status | Closed |
Resolved | 2010-09-15 13:30:27 |
Resolution | Fixed |
Path | /home/bkline/backups/jira/ocecdr/issue.107097 |
BZISSUE::4444
BZDATETIME::2009-01-14 15:14:58
BZCREATOR::William Osei-Poku
BZASSIGNEE::Bob Kline
BZQACONTACT::Lakshmi Grama
Several elements in CTGOV are not currently
included in the imports into the CDR but would be useful to have. We
would like CTGOV imports to include the following elements:
1. Primary Outcome Measures
2. Secondary Outcome Measures (When applicable)
3. Arms and Assigned Interventions (when applicable). This information
is usually in a table format on CTGOV under the Arms, Groups
and Interventions elements.
4. Groups/Cohorts information (when applicable). This information is
usually in a table format on CTGOV under the Arms, Groups and
Interventions elements.
The following two elements will be helpful to have but not as important as the four above:
5. Biospecimen Retention
6. Biospecimen Description
BZDATETIME::2009-01-15 09:31:30
BZCOMMENTOR::Bob Kline
BZCOMMENT::1
Please provide schema change and mapping specs for the new elements you want to have under CTGovIndexing. I can come up with proposals for those myself if that's needed, but in the past CIAT and OCCM have preferred to take the lead for determining the changes to the document structures and import/export mapping logic.
BZDATETIME::2009-02-03 08:48:43
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::2
I have included proposed schema changes and mapping specs in the attached file.
Attachment ClinicalTrialsSchemaChanges.doc has been added with description: Proposed Schema Changes and Mapping Specs
BZDATETIME::2009-02-03 09:39:53
BZCOMMENTOR::Bob Kline
BZCOMMENT::3
Some questions.
Lakshmi: I've made you the QA contact so you can make sure you're happy with the end result here.
It's not clear to me why the Outcomes element should be multiply-occurring. If we do make it multiply-occurring, how would the import software know which incoming primary_outcome and secondary_outcome elements get placed in which Outcomes blocks in the imported document?
It's also not clear why the incoming measure element (whose name is singular) should map to an element whose name is plural (OutcomeMeasures). Along the same lines, why would primary_outcome and secondary_outcome, which can only contain a single outcome child, be mapped to an element whose name is plural (PrimaryOutcomeMeasures)?
I'd raise the same concern about the fact that the name ArmsAssignedInterventions is plural when it itself is multiply occurring and each only contains a single ArmGroup child element, but it's not clear why the name should have "Intervention" in it at all (singular or plural). If the semantics of this part of the incoming document bear any resemblance to those for documents we're sending to ClinicalTrials.gov ourselves, arms and interventions are not the same thing. Instead, arms and interventions are given separately, with links in the interventions to show which arms they're associated with (if any), but there's no indication in the mapping table that we're supposed to implement the logic to extract these associations, nor is there any place to put these connections in the new elements to be added to the schema.
For BiospecimenRetention, is there any reason we're only allowing two of the three values they're telling us to expect?
Are you sure the names of the incoming elements are "biospec_retention" and "biospec_descr" (I see "biospecimen_retention" and "biospecimen_description" in the version of the DTD I'm looking at [1], but not "biospec_retention" or "biospec_descr"). If the incoming biospecimen_retention (or biospec_retention) is a singly-occurring child of the observational_design element, of which there can be only one in the entire incoming document according to this version of the DTD, why would our schema want to allow multiple occurrences of BiospecimenRetention, to which the incoming element is to be mapped? Same question for biospecimen_description (or biospec_descr).
[1] https://register.clinicaltrials.gov/prs/html/clinical_study.dtd
BZDATETIME::2009-02-03 10:21:53
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::4
(In reply to comment #3)
> Some questions.
>
> Lakshmi: I've made you the QA contact so you can make sure you're happy with
> the end result here.
>
> It's also not clear why the incoming measure element (whose name is singular)
> should map to an element whose name is plural (OutcomeMeasures). Along the
> same lines, why would primary_outcome and secondary_outcome, which can only
> contain a single outcome child, be mapped to an element whose name is plural
> (PrimaryOutcomeMeasures)?
The display (in protocols on CTGOV) labels these elements Primary
Outcome Measures and Secondary Outcome Measures. I thought it
would be good to have the elements in CDR and CTgov look as close as
possible since in many cases users would be comparing the two documents
while abstracting.
>
> I'd raise the same concern about the fact that the name
> ArmsAssignedInterventions is plural when it itself is multiply occurring and
> each only contains a single ArmGroup child element, but it's not clear why the
> name should have "Intervention" in it at all (singular or plural). If the
> semantics of this part of the incoming document bear any resemblance to those
> for documents we're sending to ClinicalTrials.gov ourselves, arms and
> interventions are not the same thing. Instead arms and interventions are given
> separately, with links in the interventions to show which arms they're
> associated with (if any), but there's no indication in the mapping table that
> we're supposed to implement the logic to extract these associations, nor is
> there any place to put these connections in the new elements to be added to the
> schema.
This data is presented in table form (as mentioned in my first post) in CTGov, with one column titled "Arms" and the other column titled "Assigned Interventions".
>
> For BiospecimenRetention, is there any reason we're only allowing two of the
> three values they're telling us to expect?
This was an oversight. I thought the first option was a comment instead of an option.
>
> Are you sure the names of the incoming elements are "biospec_retention" and
> "biospec_descr" (I see "biospecimen_retention" and "biospecimen_description" in
> the version of the DTD I'm looking at [1], but not "biospec_retention" or
> "biospec_descr"). If the incoming biospecimen_retention (or biospec_retention)
> is a singly-occurring child of the observational_design element, of which there
> can be only one in the entire incoming document according to this version of
> the DTD, why would our schema want to allow multiple occurrences of
> BiospecimenRetention, to which the incoming element is to be mapped? Same
> question for biospecimen_description (or biospec_descr).
>
> [1] https://register.clinicaltrials.gov/prs/html/clinical_study.dtd
It looks like I was using a different DTD than the one above.
BZDATETIME::2009-02-03 10:49:26
BZCOMMENTOR::Lakshmi Grama
BZCOMMENT::5
If a data element exists in InscopeProtocol and in CTGOV, we want to match the schema to the InscopeProtocol as much as possible. That way we can be consistent. I will look at the specifics and get back to you.
BZDATETIME::2009-02-19 15:10:50
BZCOMMENTOR::Bob Kline
BZCOMMENT::6
Lakshmi asked me to find out if CT.gov are already including the 'type' attribute on the enrollment elements in what they send us. They aren't.
BZDATETIME::2009-02-23 15:39:15
BZCOMMENTOR::Bob Kline
BZCOMMENT::7
(In reply to comment #6)
> Lakshmi asked me to find out if CT.gov are already including the 'type'
> attribute on the enrollment elements in what they send us. They aren't.
She also asked me to determine whether we have CTExpectedEnrollment in any of our documents. We do, in 176 of them.
BZDATETIME::2009-02-24 10:34:12
BZCOMMENTOR::Bob Kline
BZCOMMENT::8
I am unable to install the expanded schema on Mahler, because Sheri has it locked. I am also unable to add Sheri as a CC to this tracker issue for some reason, so William, please communicate with Sheri to get her to check the schema back in.
BZDATETIME::2009-02-24 10:36:35
BZCOMMENTOR::Bob Kline
BZCOMMENT::9
Right. I had forgotten that part of this new feature involved a change to the CDR Server. Unless this is urgent, I plan to install the modified server after working hours tonight.
BZDATETIME::2009-02-24 10:44:03
BZCOMMENTOR::Bob Kline
BZCOMMENT::10
(In reply to comment #9)
> Right. I had forgotten that part of this new feature involved a change to the
> CDR Server. Unless this is urgent, I plan to install the modified server after
> working hours tonight.
Wrong issue. Please ignore comment.
BZDATETIME::2009-02-24 10:49:16
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::11
(In reply to comment #8)
> I am unable to install the expanded schema on Mahler, because Sheri has it
> locked. I am also unable to add Sheri as a CC to this tracker issue for some
> reason, so William, please communicate with Sheri to get her to check the
> schema back in.
The schema is checked-in now. Sheri finally resigned last month.
BZDATETIME::2009-02-24 14:21:12
BZCOMMENTOR::Bob Kline
BZCOMMENT::12
I have expanded the CTGovProtocol schema along the lines discussed in Thursday's meeting, based on the notes I took. I guess there are two things for the users to verify: (1) that I have represented the structures for the new information appropriately; and (2) that the changes to the schema don't invalidate existing documents which were valid against the previous version of the schema.
BZDATETIME::2009-02-24 14:25:31
BZCOMMENTOR::Bob Kline
BZCOMMENT::13
(In reply to comment #12)
> I have expanded the CTGovProtocol schema along the lines discussed in
> Thursday's meeting, based on the notes I took. I guess there are two things
> for the users to verify: (1) that I have represented the structures for the new
> information appropriately; and (2) that the changes to the schema don't
> invalidate existing documents which were valid against the previous version of
> the schema.
>
> http://bach.nci.nih.gov/cgi-bin/cdr/GetSchema.py?id=349686
Wrong server. URL should be:
http://mahler.nci.nih.gov/cgi-bin/cdr/GetSchema.py?id=349686
BZDATETIME::2009-03-03 17:57:03
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::14
(In reply to comment #13)
> (In reply to comment #12)
> > I have expanded the CTGovProtocol schema along the lines discussed in
> > Thursday's meeting, based on the notes I took. I guess there are two things
> > for the users to verify: (1) that I have represented the structures for the new
> > information appropriately; and (2) that the changes to the schema don't
> > invalidate existing documents which were valid against the previous version of
> > the schema.
> >
Tested on Mahler. I will enter a new issue to modify the CSS
Template.
Another level of testing would be to force import a new document on
Mahler (after the schema changes have been verified by Lakshmi).
BZDATETIME::2009-03-04 10:13:42
BZCOMMENTOR::Volker Englisch
BZCOMMENT::15
(In reply to comment #8)
> I am also unable to add Sheri as a CC to this tracker issue for some
> reason
The reason is that her user account has been disabled since all of her email to Lockheed bounced back after she resigned.
BZDATETIME::2009-03-06 11:07:46
BZCOMMENTOR::Bob Kline
BZCOMMENT::16
Next step is for Lakshmi to review the schema changes and (if satisfied) give the green light to proceed with implementing the changes to the import software.
BZDATETIME::2009-03-18 18:59:53
BZCOMMENTOR::Lakshmi Grama
BZCOMMENT::17
I think the schema looks OK -
> > that the changes to the schema don't
> > invalidate existing documents which were valid against the previous version of
> > the schema.
Is there a way you can test this perhaps on Franck?
BZDATETIME::2009-03-19 08:00:26
BZCOMMENTOR::Bob Kline
BZCOMMENT::18
(In reply to comment #17)
> Is there a way you can test this perhaps on Franck?
I installed the new version of the schema on Franck and ran the CheckValidity.py script for the CTGovProtocol documents. None of the validation statuses changed.
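A before/after comparison of this kind can be sketched as follows. This is only an illustration, not the logic of CheckValidity.py itself: the function name, the status codes ("V", "I") and the sample document IDs are assumptions made for the example.

```python
def changed_statuses(before, after):
    """Return {doc_id: (old, new)} for documents whose status differs.

    `before` and `after` map a document ID to its validation status, as
    recorded before and after installing the new schema (hypothetical
    data layout).
    """
    return {
        doc_id: (before[doc_id], after.get(doc_id))
        for doc_id in before
        if after.get(doc_id) != before[doc_id]
    }


# Hypothetical sample: "V" = valid, "I" = invalid.
before = {360653: "V", 647606: "V", 65921: "I"}
after = {360653: "V", 647606: "V", 65921: "I"}
print(changed_statuses(before, after))
```

An empty result corresponds to the outcome reported here: the schema change did not alter any document's validation status.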
BZDATETIME::2009-03-24 16:06:35
BZCOMMENTOR::Bob Kline
BZCOMMENT::19
Should I assume I have the green light to start implementing the changes to the import software?
BZDATETIME::2009-03-26 08:14:18
BZCOMMENTOR::Lakshmi Grama
BZCOMMENT::20
Yes - please go ahead.
BZDATETIME::2009-03-26 16:24:01
BZCOMMENTOR::Bob Kline
BZCOMMENT::21
I just noticed the following comment up in the history section of NLM's DTD:
Moved and renamed expected_enrollment to enrollment
Our schema currently has:
CTGovProtocol
RequiredHeader
....
CTEnrollment
....
CTEligibility
....
CTExpectedEnrollment
....
....
Should we:
1. Keep this structure, mapping enrollment to CTEnrollment?
2. Drop CTExpectedEnrollment from the schema, do a global change
to move the values to CTEnrollment, and map enrollment to
CTEnrollment for future imports?
3. Drop CTEnrollment from the top-level children, and map
enrollment to CTEligibility/CTExpectedEnrollment?
4. Napalm NLM?
BZDATETIME::2009-03-26 16:29:20
BZCOMMENTOR::Volker Englisch
BZCOMMENT::22
Option (4) seems to be appealing but I'm afraid it's not politically correct. :-)
BZDATETIME::2009-03-27 15:59:23
BZCOMMENTOR::Bob Kline
BZCOMMENT::23
I noticed that in the new CTGovProtocol schema we have mapped link/url to RelatedWebsites. I must have copied this from the common protocol schema and the terminology schema, but the use of the plural name for an element which contains a link for a single web site is surely wrong. Did we already have this discussion?
BZDATETIME::2009-03-27 16:25:40
BZCOMMENTOR::Bob Kline
BZCOMMENT::24
Note that this mapping table reflects a couple of outstanding mapping questions (see comments #21 and #23).
[Volker: Why does Bugzilla no longer include a MIME type for Excel spreadsheets when making an attachment for an issue?]
Attachment task-4444-mapping.xls has been added with description: Proposed mappings for import software
BZDATETIME::2009-03-27 16:56:04
BZCOMMENTOR::Volker Englisch
BZCOMMENT::25
(In reply to comment #24)
> [Volker: Why does Bugzilla no longer include a MIME type for Excel spreadsheets
> when making an attachment for an issue?]
Because it wasn't part of the installation; we had added the MIME type ourselves. I didn't think it would be necessary to add it with the current installation, since Bugzilla is now smart enough to auto-detect Excel spreadsheets.
BZDATETIME::2009-05-07 14:17:42
BZCOMMENTOR::Bob Kline
BZCOMMENT::26
Lakshmi says we can keep the incorrectly named RelatedWebsites. Bob will follow up with John G. to find out what their intentions are for enrollment values (are they currently implied to be "expected" values? will we get multiple occurrences if/when they start giving us "type" attribute values? when we start getting "type" attributes, will we just get "actual" with "expected" implied, or will the attribute be required?).
BZDATETIME::2009-05-07 15:35:48
BZCOMMENTOR::Bob Kline
BZCOMMENT::27
Sent the following note to John Gillen:
John:
Could you give us a little guidance on the use of the enrollment element in XML documents retrieved from ClinicalTrials.gov? The DTD indicates that it was "moved and renamed" from expected_enrollment, and there is also a comment which says "In future: expect 'type' attribute." Is it correct to interpret the values we are currently getting (without the type attribute) as "expected"? When the type attribute is included, will the values be "expected" and "actual"? Any idea on the time frame when this change might occur?
Thanks!
BZDATETIME::2009-05-12 07:43:59
BZCOMMENTOR::Bob Kline
BZCOMMENT::28
Here's John's initial reply [2009-05-09]:
Not sure why this has not been added to the public xml. Will take this up with others on the software team and one of us will get back to you shortly.
We are collecting enrollment type in the PRS, with values "Actual" or "Anticipated". The basic idea is to specify anticipated enrollment up front, update it along the way, and then to replace it with actual enrollment after study completion.
That said, enrollment type is one of those FDAAA-required data elements, so it is not present in all records (long story). Once this is added to the public site xml, the absence of it can only safely be interpreted to mean "unknown", except in those cases where it is impossible to have actual enrollment (i.e., a study that is recruiting or not-yet-recruiting).
BZDATETIME::2009-05-13 10:22:38
BZCOMMENTOR::Bob Kline
BZCOMMENT::29
[John G., 2009-05-12]
Just to follow up...
We will add the "type" attribute to the enrollment tag in the ClinicalTrials.gov XML. Value will either be "Actual" or "Anticipated", when the data provider has supplied this information. The changes should be in place by the end of the week.
BZDATETIME::2009-06-29 15:57:33
BZCOMMENTOR::Bob Kline
BZCOMMENT::30
(In reply to comment #21)
> I just noticed the following comment up in the history section of NLM's DTD:
>
> Moved and renamed expected_enrollment to enrollment
>
> Our schema currently has:
>
> CTGovProtocol
> RequiredHeader
> ....
> CTEnrollment
> ....
> CTEligibility
> ....
> CTExpectedEnrollment
> ....
> ....
>
> Should we:
>
> 1. Keep this structure, mapping enrollment to CTEnrollment?
> 2. Drop CTExpectedEnrollment from the schema, do a global change
> to move the values to CTEnrollment, and map enrollment to
> CTEnrollment for future imports?
> 3. Drop CTEnrollment from the top-level children, and map
> enrollment to CTEligibility/CTExpectedEnrollment?
> 4. Napalm NLM?
It was never decided what we want to do here. Option #4 was obviously just a joke, and option #3 no longer seems appropriate in light of the clarification we have from John G. (that is, some values for the new incoming enrollment element are expected, some are actual, and some are not identified as either type of value). So we need to decide whether we'll keep the existing CTExpectedEnrollment element in the schema (and the values in the documents), just using the new CTEnrollment element for new import jobs, or move the existing data to the new element with a global change, setting the Type attribute to 'Anticipated' (or 'Expected' if that's the term we'd rather use), dropping the old element from the schema.
Currently the proposed schema has the 'Type' attribute as a string. Should we leave it that way, just sticking whatever value we get from NLM into the attribute? Or should we constrain the attribute with an enumerated valid values list? If the latter, do we want to have 'Anticipated' (or 'Expected'), 'Actual' and 'Unknown' (reflecting the language in John's reply) or just leave it out if we can't determine what the value should be? Should we use the logic he alluded to of determining that the value must be 'Anticipated' if the status is such that they can't possibly know yet what the actual enrollment numbers are?
What would you like us to do, Lakshmi?
BZDATETIME::2009-07-23 08:08:45
BZCOMMENTOR::Lakshmi Grama
BZCOMMENT::31
> So we need to decide whether we'll keep the existing
> CTExpectedEnrollment element in the schema (and the values in the documents),
> just using the new CTEnrollment element for new import jobs, or move the
> existing data to the new element with a global change, setting the Type
> attribute to 'Anticipated' (or 'Expected' if that's the term we'd rather use),
> dropping the old element from the schema.
1. My preference is to create a new element CTEnrollment whose Type attribute takes three values: Anticipated, Actual, Unknown.
2. Since we have not imported the new element before, will we be able to just import the data for all trials from the new element in CTGOV into the new element? That would seem to negate the need to move the existing data to the new element and set the type to Anticipated. Otherwise, I agree we need to move the data.
3. Can you confirm that the CTGOV DTD for licensees does not include CTExpectedEnrollment? If it does, we need to coordinate with Volker so that CTExpectedEnrollment uses data from the new field, taking only values whose type is Anticipated.
Reply to
"Currently the proposed schema has the 'Type' attribute as a string. Should we leave it that way, just sticking whatever value we get from NLM into the attribute? Or should we constrain the attribute with an enumerated valid values list? If the latter, do we want to have 'Anticipated' (or 'Expected'), 'Actual' and 'Unknown' (reflecting the language in John's reply) or just leave it out if we can't determine what the value should be?"
Use Anticipated and Actual, and just leave the attribute out if we don't know what the value is. I don't see us displaying Unknown with the data on Cancer.gov.
Reply to
"Should we use the logic he alluded to of determining that the value
must be 'Anticipated' if the status is such that they can't possibly
know yet what the actual enrollment numbers are?"
If they are implementing the check on their end, why do we need to implement it as well? We should be able to take their values and use them as is.
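A minimal sketch of the rule described in this comment (keep Anticipated or Actual as received, otherwise omit the attribute) might look like the following. The real import is done by an XSL/T filter, so this Python version is only illustrative; the function name and the sample input are assumptions.

```python
import xml.etree.ElementTree as ET

# Values to be carried over as-is; anything else means "leave Type out".
KNOWN_TYPES = {"Anticipated", "Actual"}


def build_ct_enrollment(enrollment):
    """Map an incoming <enrollment> element to a CTEnrollment element."""
    ct = ET.Element("CTEnrollment")
    ct.text = (enrollment.text or "").strip()
    value = enrollment.get("type")
    if value in KNOWN_TYPES:
        ct.set("Type", value)  # otherwise the attribute is simply omitted
    return ct


# Hypothetical incoming fragment.
incoming = ET.fromstring('<enrollment type="Actual">120</enrollment>')
print(ET.tostring(build_ct_enrollment(incoming), encoding="unicode"))
```

No 'Unknown' value is ever written, which matches the preference not to display Unknown on Cancer.gov.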
BZDATETIME::2009-07-23 09:19:21
BZCOMMENTOR::Bob Kline
BZCOMMENT::32
(In reply to comment #31)
> 2.Since we have not imported the new element before, will we be able to just
> import the data for all trials from the new element in CTGOV into the new
> element? That would seem to negate the need to move the existing data to the
> new element and set the type to Anticipated. Otherwise, I agree we need to
> move the data.
We can:
1. Strip the old element out (from schema and documents)
2. Move the data and drop the old element from the schema
3. Leave the obsolete data in the documents (and the schema).
I'm not sure which of these you're asking for (maybe I just need to finish my coffee and it will make sense :-) ).
>
> 3. Can you confirm that the CTGOV DTD for licensees does not include
> CTExpectedenrollment.
No, just ExpectedEnrollment (Volker may be mapping CTExpectedEnrollment into this element).
> Reply to
> "Should we use the logic he alluded to of determining that the value must be
> 'Anticipated' if the status is such that they can't possibly know yet what the
> actual enrollment numbers are?"
>
> If they are implementing the check on their end, why do we need to implement it
> as well. We should be able to take their values and use them as is.
I wasn't thinking so much about a validation check. I was instead asking about taking advantage of his assurance about what the absence of the attribute would mean when the status of the trial indicated that accrual numbers couldn't be actual (because the trial wasn't finished recruiting), in which case we could set the attribute to "Anticipated" ourselves.
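If that inference were adopted (the thread does not record a decision), it could be sketched roughly as below. The status strings are assumptions based on John's wording ("recruiting or not-yet-recruiting"); the exact values in the incoming XML would need to be checked, and the function name is hypothetical.

```python
# Assumed status strings for trials that cannot yet have actual enrollment.
CANNOT_BE_ACTUAL = {"Recruiting", "Not yet recruiting"}


def effective_enrollment_type(type_attr, overall_status):
    """Return the Type value to store, or None to leave the attribute out."""
    if type_attr in ("Anticipated", "Actual"):
        return type_attr  # NLM supplied a usable value; keep it as is
    if overall_status in CANNOT_BE_ACTUAL:
        return "Anticipated"  # inferred: accrual numbers can't be actual yet
    return None  # otherwise unknown; omit the attribute


print(effective_enrollment_type(None, "Recruiting"))
```

A value supplied by NLM always wins; the inference only fills the gap when the attribute is absent.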
BZDATETIME::2009-09-01 11:30:13
BZCOMMENTOR::Bob Kline
BZCOMMENT::33
NLM added nct_alias to the id_info block back in 2007. We're not picking up that element. Should we be?
BZDATETIME::2009-09-03 12:48:59
BZCOMMENTOR::Bob Kline
BZCOMMENT::34
LG: go ahead and pick up (store) nct_alias.
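Picking up the aliases could be sketched like this. The sample document and its identifier values are invented for illustration, the function name is hypothetical, and the code assumes nct_alias may occur zero or more times within id_info.

```python
import xml.etree.ElementTree as ET

# Invented sample; the identifier values are placeholders.
SAMPLE = """<clinical_study>
  <id_info>
    <org_study_id>EXAMPLE-001</org_study_id>
    <nct_id>NCT00000001</nct_id>
    <nct_alias>NCT00000002</nct_alias>
    <nct_alias>NCT00000003</nct_alias>
  </id_info>
</clinical_study>"""


def get_nct_aliases(study):
    """Return the text of every id_info/nct_alias child, in document order."""
    return [(node.text or "").strip()
            for node in study.findall("id_info/nct_alias")]


print(get_nct_aliases(ET.fromstring(SAMPLE)))
```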
BZDATETIME::2009-09-08 10:52:59
BZCOMMENTOR::Bob Kline
BZCOMMENT::35
NLM's current DTD has both completion_date and primary_completion_date elements (both optional, singly-occurring top-level elements).
The comment at the top of the DTD says:
<!-- 05/12/09 added tag <completion_date> -->
<!-- This field is the last followup date, when available, -->
<!-- otherwise, end date. <end_date> tag is now obsolete. -->
We have PrimaryCompletionDate and EndDate in our modified schema, but no CompletionDate (our schema modifications were made before May, when NLM made this change). What should we do about the new completion_date element, and about the fact that end_date is now obsolete?
BZDATETIME::2009-09-10 13:03:56
BZCOMMENTOR::Bob Kline
BZCOMMENT::36
(In reply to comment #35)
> NLM's current DTD has both completion_date and primary_completion_date elements
> (both optional, singly-occurring top-level elements).
>
> The comment at the top of the DTD says:
>
> <!-- 05/12/09 added tag <completion_date> -->
> <!-- This field is the last followup date, when available, -->
> <!-- otherwise, end date. <end_date> tag is now obsolete. -->
>
> We have PrimaryCompletionDate and EndDate in our modified schema, but no
> CompletionDate (our schema modifications were made before May, when NLM made
> this change). What should we do about the new completion_date element, and
> about the fact that end_date is now obsolete?
LG: Ask John G if the data that used to be in end_date is now being stored in completion_date. If not, ask him what we should do with our end_date values.
BZDATETIME::2009-09-17 10:58:38
BZCOMMENTOR::Bob Kline
BZCOMMENT::37
(In reply to comment #36)
> LG: Ask John G if the data that used to be in end_date is now being stored in
> completion_date. If not, ask him what we should do with our end_date values.
Sent question to John G. Will post his reply when it's received.
BZDATETIME::2009-09-21 11:09:36
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::38
The import software was modified to accommodate changes in expanded access status in OCECDR-445. This needs to be tested along with changes in this issue.
Specifically, "Temporarily not available" was mapped to "Temporarily closed" and "Approved for marketing" was mapped to "Completed".
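For testers, the two mappings can be written down as a lookup table. This is a sketch only; passing other status values through unchanged is an assumption, and the names are hypothetical rather than taken from the import software.

```python
# The two expanded access status mappings named in this comment.
EXPANDED_ACCESS_STATUS_MAP = {
    "Temporarily not available": "Temporarily closed",
    "Approved for marketing": "Completed",
}


def map_status(incoming):
    """Return the CDR status for an incoming status value.

    Values not in the table are returned unchanged (an assumption).
    """
    return EXPANDED_ACCESS_STATUS_MAP.get(incoming, incoming)


print(map_status("Approved for marketing"))
```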
BZDATETIME::2009-09-21 11:13:32
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::39
(In reply to comment #38)
> The import software was modified to accommodate changes in expanded access
> status in OCECDR-445. This needs to be tested along with changes in this issue.
> Specifically, "Temporarily not available" was mapped to "Temporarily closed"
> and "Approved for marketing" was mapped to "Completed"
Correction: The changes were done in OCECDR-2824 and not OCECDR-445. Thanks!
BZDATETIME::2009-09-30 11:59:19
BZCOMMENTOR::Bob Kline
BZCOMMENT::40
(In reply to comment #37)
> (In reply to comment #36)
>
> > LG: Ask John G if the data that used to be in end_date is now being stored in
> > completion_date. If not, ask him what we should do with our end_date values.
>
> Sent question to John G. Will post his reply when it's received.
Here's his answer (at the end of a multi-message exchange):
======================= START QUOTED MESSAGE ===========================
> As for the original question, you advise us to "ignore end date." That
> still leaves the question of what to do with the data we have from that
> element which we have in previously imported documents. Should we strip
> that out? Move the information into the new CTCompletionDate element?
This is difficult for us to answer without fully understanding your application, but maybe explaining more about the history of these data elements will help. On the PRS side, the old end date data element was "deprecated" some time ago in favor of the more precisely defined "last follow-up date" (now labeled study completion date in the PRS web interface). Because we had old records with end date values that did not necessarily equate to LFUD, we could not simply rename the original end date field.
On the public site, our approach is to use the LFUD for completion date if it has been specified. If not, and the end date has been specified, we use that. So the completion date can be thought of as the "best available" completion date.
So if you have a record where LFUD was used as completion date, it is best to use that. If you have a record where end date was used, that has effectively already been copied over for you. In either case, there should be no need to hold onto the old end date value.
Hope this helps.
======================= END QUOTED MESSAGE ===========================
So I guess this means we should take out the deprecated element from the schema, and modify the import software to avoid creating it. Presumably this will obviate the need for a global change to strip the element out, since the change in the import software should result in modifications to all of the imported documents (or at least the ones that NLM hasn't dropped from the set we get from them).
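For reference, the "best available" rule John describes for NLM's public site amounts to the following. This is a sketch of his description only, not NLM's code; the function and parameter names and the sample date strings are assumptions.

```python
def best_available_completion_date(last_followup_date, end_date):
    """Prefer the last follow-up date (LFUD); fall back to the old end date.

    Either argument may be None or empty when the value was not specified.
    Returns None when neither is available.
    """
    return last_followup_date or end_date or None


# Hypothetical values for illustration.
print(best_available_completion_date("2009-06", "2008-12"))
print(best_available_completion_date(None, "2008-12"))
```

Because the fallback has already been applied on NLM's side, the import only needs to read completion_date and can ignore end_date.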
BZDATETIME::2009-10-29 17:36:36
BZCOMMENTOR::Bob Kline
BZCOMMENT::41
After discussion between Bob, Alan, Volker, and Lakshmi, we have decided that I will back out the work I have done on this task and create a branch for those changes in the source version control system. This will allow me to proceed with work on other enhancement requests which have been coming in involving the same source code affected by this request (and which have been assigned higher priority than this one has) without having to bypass the version control system and manually patch the production system.
We'll resume this task later on, when the dust has settled on the more urgent protocol-related work. At that point, I'll merge the branch back into the trunk, Volker will implement the changes to the publishing scripts, CSS, etc., and CIAT will test the results.
Don't forget to create the task for Volker to modify the CSS connected with these changes.
BZDATETIME::2009-10-30 08:51:40
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::42
(In reply to comment #41)
>
> Don't forget to create the task for Volker to modify the CSS connected with
> these changes.
I have created a new task - OCECDR-3007 - for the CSS.
BZDATETIME::2009-10-30 16:55:28
BZCOMMENTOR::Bob Kline
BZCOMMENT::43
(In reply to comment #41)
> After discussion between Bob, Alan, Volker, and Lakshmi, we have decided that I
> will back out the work I have done on this task and create a branch for those
> changes in the source version control system.
Branch created at https://imbncipf01.nci.nih.gov/svn/CDR/branches/Task4444. The trunk now contains all of the other patches which have been applied to the production system, but none of the work done for this task, so work can now proceed on the other tasks involving the CTGovProtocol schema and the XSL/T import filter.
BZDATETIME::2009-11-10 08:25:44
BZCOMMENTOR::Bob Kline
BZCOMMENT::44
Even though this issue is at the top of my task priority queue of issues for which I'm not blocked waiting for responses from CIAT or ICRDB, we agreed that we would defer further work on this issue until the other changes for CT.gov export currently being tested have been promoted (and Cancer.gov is closer to being ready to use the results of this task). So I'm lowering the priority to remove it from the front of my queue. Feel free to raise it again when you're ready to resume work on this task.
BZDATETIME::2010-04-29 14:10:38
BZCOMMENTOR::Bob Kline
BZCOMMENT::45
Bumped up priority at Lakshmi's suggestion.
BZDATETIME::2010-05-11 11:39:53
BZCOMMENTOR::Bob Kline
BZCOMMENT::46
This task is now back at the top of my queue. Are you ready for me to install the new schema, filter, and import script and start testing import of the new elements? If so, should that be done on Mahler first or Franck?
BZDATETIME::2010-05-12 12:10:17
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::47
(In reply to comment #46)
> This task is now back at the top of my queue. Are you ready for me to install
> the new schema, filter, and import script and start testing import of the new
> elements? If so, should that be done on Mahler first or Franck?
Yes. Let's start with Mahler first before going to Franck.
BZDATETIME::2010-05-12 15:08:01
BZCOMMENTOR::Bob Kline
BZCOMMENT::48
(In reply to comment #47)
> Yes. Let's start with Mahler first before going to Franck.
OK. I have installed the new version of the schema and the modified XSL/T import filter on Mahler, and I ran an import job in test mode. Test mode stops after importing 10 trials, and that way we can look at the partial results, find and fix any problems in that subset, and run another test job.
!8416 Wed May 12 14:08:11 2010: Updated CDR360653 from NCT00001469
!8416 Wed May 12 14:08:14 2010: Updated CDR647606 from NCT00002786
!8416 Wed May 12 14:08:22 2010: Updated CDR651987 from NCT00002864
!8416 Wed May 12 14:11:23 2010: Updated CDR65921 from NCT00003140
!8416 Wed May 12 14:11:26 2010: Updated CDR647609 from NCT00003145
!8416 Wed May 12 14:11:28 2010: Updated CDR66633 from NCT00003567
!8416 Wed May 12 14:11:29 2010: Updated CDR68001 from NCT00005997
!8416 Wed May 12 14:11:37 2010: Updated CDR68027 from NCT00006018
!8416 Wed May 12 14:11:39 2010: Updated CDR68197 from NCT00006261
!8416 Wed May 12 14:11:40 2010: Updated CDR68862 from NCT00023790
!8416 Wed May 12 14:11:42 2010: Updated CDR360755 from NCT00031551
Here are the links for viewing the imported documents:
http://mahler.nci.nih.gov/cgi-bin/cdr/ShowDocXml.py?DocId=360653
http://mahler.nci.nih.gov/cgi-bin/cdr/ShowDocXml.py?DocId=647606
http://mahler.nci.nih.gov/cgi-bin/cdr/ShowDocXml.py?DocId=651987
http://mahler.nci.nih.gov/cgi-bin/cdr/ShowDocXml.py?DocId=65921
http://mahler.nci.nih.gov/cgi-bin/cdr/ShowDocXml.py?DocId=647609
http://mahler.nci.nih.gov/cgi-bin/cdr/ShowDocXml.py?DocId=66633
http://mahler.nci.nih.gov/cgi-bin/cdr/ShowDocXml.py?DocId=68001
http://mahler.nci.nih.gov/cgi-bin/cdr/ShowDocXml.py?DocId=68027
http://mahler.nci.nih.gov/cgi-bin/cdr/ShowDocXml.py?DocId=68197
http://mahler.nci.nih.gov/cgi-bin/cdr/ShowDocXml.py?DocId=68892
http://mahler.nci.nih.gov/cgi-bin/cdr/ShowDocXml.py?DocId=360755
Here's a link for viewing the schema:
http://mahler.nci.nih.gov/cgi-bin/cdr/GetSchema.py?id=349686
Please review the new schema and the imported documents and let me know if you see any anomalies that need to be addressed. Once we've done enough of the small batches to satisfy you that we're ready to run the rest of the import job, I'll kick it off without the testing throttle.
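The test-mode throttle described in this comment can be sketched roughly as follows. This is a minimal illustration, not the actual import script: the names `import_trials`, `import_one`, and `test_limit` are hypothetical, and the real job may select and count trials differently.

```python
from itertools import islice


def import_trials(pending, import_one, test_mode=False, test_limit=10):
    """Import pending trials; in test mode stop after `test_limit` of them.

    `pending` is any iterable of NCT IDs and `import_one` is a callable
    that imports a single trial (both are assumptions for this sketch).
    """
    batch = islice(pending, test_limit) if test_mode else pending
    imported = []
    for nct_id in batch:
        import_one(nct_id)
        imported.append(nct_id)
    return imported
```

Running with `test_mode=True` leaves the rest of the queue untouched, so a later unthrottled run can pick up where the small batches left off.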
BZDATETIME::2010-06-03 11:11:26
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::49
(In reply to comment #48)
> (In reply to comment #47)
>
> > Yes. Let's start with Mahler first before going to Franck.
>
> OK. I have installed the new version of the schema and the modified XSL/T
> import filter on Mahler, and I ran an import job in test mode. Test mode stops
> after importing 10 trials, and that way we can look at the partial results,
> find and fix any problems in that subset, and run another test job
I am reviewing the schema and the imported trials. So far, I have not come across any problems. I will post anything I find.
BZDATETIME::2010-06-15 15:56:01
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::50
Should I also be looking at the imported documents in XMetal on Mahler? When I try to retrieve the documents, I get the "Document does not conform to DTD or XML Schema" dialog box.
BZDATETIME::2010-06-15 16:34:23
BZCOMMENTOR::Bob Kline
BZCOMMENT::51
So much time had elapsed since I ran that test import that the provisional schema for this issue has been overwritten by the changes required for issues #4837 and #4850. Are you ready for me to re-install this version (understanding that if other issues require changes to the CTGovProtocol schema before this testing is complete it will disappear again)?
BZDATETIME::2010-06-15 16:51:08
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::52
(In reply to comment #51)
> #4837 and #4850. Are you ready for me to re-install this version
> (understanding that if other issues require changes to the CTGovProtocol schema
> before this testing is complete it will disappear again)?
Yes, please. Proceed to install it, I should finish testing in the next couple of days.
BZDATETIME::2010-06-16 09:39:14
BZCOMMENTOR::Bob Kline
BZCOMMENT::53
Schema for this issue reinstalled on Mahler. Ready for user testing.
BZDATETIME::2010-06-16 14:52:52
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::54
(In reply to comment #48)
> (In reply to comment #47)
>
> > Yes. Let's start with Mahler first before going to Franck.
>
> OK. I have installed the new version of the schema and the modified XSL/T
> import filter on Mahler, and I ran an import job in test mode. Test mode stops
> after importing 10 trials, and that way we can look at the partial results,
> find and fix any problems in that subset, and run another test job.
>
1. An empty ArmsOrGroups block appears to be dropped in whenever there is no data present. Examples: CDR0000360653 and CDR0000647606.
2. Also, CTNumberOfArms appears to default to a value of '1' for some of the trials even though there is no Arms information. Examples: CDR0000066633 and CDR0000068001.
Please run the next test.
BZDATETIME::2010-06-16 15:20:29
BZCOMMENTOR::Bob Kline
BZCOMMENT::55
(In reply to comment #54)
> (In reply to comment #48)
> > (In reply to comment #47)
> >
> > > Yes. Let's start with Mahler first before going to Franck.
> >
> > OK. I have installed the new version of the schema and the modified XSL/T
> > import filter on Mahler, and I ran an import job in test mode. Test mode stops
> > after importing 10 trials, and that way we can look at the partial results,
> > find and fix any problems in that subset, and run another test job.
> >
>
> 1. An empty ArmsOrGroups block appears to be dropped in whenever there is no
> data present. Examples, CDR0000360653 and CDR0000647606.
>
> 2. Also the CTNumberOfArms appears to default to a value of '1' for some of
> the trials even though there is not Arms information. For example CDR0000066633
> and CDR0000068001.
>
> Please run the next test.
Before I "fix" anything, let's speak with Lakshmi about how these elements are supposed to work. My understanding is that absence of explicit ARM info implies that a single ARM is used for a trial.
BZDATETIME::2010-06-17 12:46:02
BZCOMMENTOR::Bob Kline
BZCOMMENT::56
Lakshmi will take a look on Monday.
BZDATETIME::2010-07-08 17:27:16
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::57
(In reply to comment #55)
> > Please run the next test.
>
> Before I "fix" anything, let's speak with Lakshmi about how these elements are
> supposed to work. My understanding is that absence of explicit ARM info
> implies that a single ARM is used for a trial.
We decided to proceed with importing new trials for testing since what I reported as empty ArmsOrGroups elements were actually not empty. They did have the SingleArmOrGroup = 'Yes' attribute. Also, the CTNumberOfArms is supposed to default to '1' in the cases I reported from my testing.
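The behavior agreed on here (no explicit arm information means a single-arm trial, marked with SingleArmOrGroup = 'Yes' and CTNumberOfArms defaulting to '1') can be sketched as below. The production mapping lives in the XSL/T import filter; this Python version only illustrates the rule. The NLM element names `arm_group` and `number_of_arms`, the child names `ArmOrGroup` and `ArmOrGroupLabel`, and the fallback to counting arms when no number is declared are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def build_arms_block(study):
    """Return (ArmsOrGroups element, CTNumberOfArms element) for one trial."""
    arms = study.findall("arm_group")                          # assumed NLM name
    declared = (study.findtext("number_of_arms") or "").strip()  # assumed NLM name

    block = ET.Element("ArmsOrGroups")
    if not arms:
        # No explicit arm information: treat the trial as single-arm.
        block.set("SingleArmOrGroup", "Yes")
    for arm in arms:
        child = ET.SubElement(block, "ArmOrGroup")        # hypothetical wrapper
        label = ET.SubElement(child, "ArmOrGroupLabel")   # hypothetical child
        label.text = arm.findtext("arm_group_label")

    count = ET.Element("CTNumberOfArms")
    # Default to '1' only when nothing is declared and no arms are listed;
    # falling back to the arm count otherwise is an assumption of this sketch.
    count.text = declared or ("1" if not arms else str(len(arms)))
    return block, count
```

Under this rule an ArmsOrGroups element with no children is not an error as long as it carries the SingleArmOrGroup attribute.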
BZDATETIME::2010-07-12 15:09:56
BZCOMMENTOR::Bob Kline
BZCOMMENT::58
Another 10 test trials have been imported on Mahler:
!9172 Mon Jul 12 15:03:49 2010: Updated CDR360649 from NCT00001159
!9172 Mon Jul 12 15:03:50 2010: Updated CDR360650 from NCT00001160
!9172 Mon Jul 12 15:03:51 2010: Updated CDR360897 from NCT00001254
!9172 Mon Jul 12 15:03:57 2010: Updated CDR360652 from NCT00001397
!9172 Mon Jul 12 15:03:59 2010: Updated CDR360653 from NCT00001469
!9172 Mon Jul 12 15:04:00 2010: Updated CDR662657 from NCT00001582
!9172 Mon Jul 12 15:04:01 2010: Updated CDR360655 from NCT00001595
!9172 Mon Jul 12 15:04:03 2010: Updated CDR360866 from NCT00001620
!9172 Mon Jul 12 15:04:04 2010: Updated CDR360656 from NCT00001852
!9172 Mon Jul 12 15:04:06 2010: Updated CDR360898 from NCT00001872
!9172 Mon Jul 12 15:04:08 2010: Updated CDR75917 from NCT00002463
Here are the URLs for looking at the documents:
http://mahler.nci.nih.gov/cgi-bin/cdr/ShowDocXml.py?DocId=360649
http://mahler.nci.nih.gov/cgi-bin/cdr/ShowDocXml.py?DocId=360650
http://mahler.nci.nih.gov/cgi-bin/cdr/ShowDocXml.py?DocId=360897
http://mahler.nci.nih.gov/cgi-bin/cdr/ShowDocXml.py?DocId=360652
http://mahler.nci.nih.gov/cgi-bin/cdr/ShowDocXml.py?DocId=360653
http://mahler.nci.nih.gov/cgi-bin/cdr/ShowDocXml.py?DocId=662657
http://mahler.nci.nih.gov/cgi-bin/cdr/ShowDocXml.py?DocId=360655
http://mahler.nci.nih.gov/cgi-bin/cdr/ShowDocXml.py?DocId=360866
http://mahler.nci.nih.gov/cgi-bin/cdr/ShowDocXml.py?DocId=360656
http://mahler.nci.nih.gov/cgi-bin/cdr/ShowDocXml.py?DocId=360898
http://mahler.nci.nih.gov/cgi-bin/cdr/ShowDocXml.py?DocId=75917
BZDATETIME::2010-07-28 19:20:59
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::59
Please run another import job. The trials in the last set are not good candidates for testing because they don't have many of the new elements.
BZDATETIME::2010-07-29 12:35:26
BZCOMMENTOR::Bob Kline
BZCOMMENT::60
I've run another batch of ten.
BZDATETIME::2010-07-30 14:07:59
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::61
(In reply to comment #60)
> I've run another batch of ten.
Will you post the results here, or should I use the ctgov report to review them?
BZDATETIME::2010-07-30 14:16:30
BZCOMMENTOR::Bob Kline
BZCOMMENT::62
!7476 Thu Jul 29 12:33:58 2010: Updated CDR78316 from NCT00002524
!7476 Thu Jul 29 12:34:05 2010: Updated CDR63860 from NCT00002604
!7476 Thu Jul 29 12:34:06 2010: Updated CDR652417 from NCT00002790
!7476 Thu Jul 29 12:34:07 2010: Updated CDR65044 from NCT00002835
!7476 Thu Jul 29 12:34:09 2010: Updated CDR65081 from NCT00002844
!7476 Thu Jul 29 12:34:12 2010: Updated CDR67433 from NCT00004192
!7476 Thu Jul 29 12:34:14 2010: Updated CDR67440 from NCT00004197
!7476 Thu Jul 29 12:34:18 2010: Updated CDR67441 from NCT00004198
!7476 Thu Jul 29 12:34:20 2010: Updated CDR360901 from NCT00004847
!7476 Thu Jul 29 12:34:21 2010: Updated CDR652223 from NCT00005592
!7476 Thu Jul 29 12:34:23 2010: Updated CDR652158 from NCT00005594
It appears that an 11th trial was imported (presumably one that CIAT marked on the review interface).
BZDATETIME::2010-08-02 11:34:33
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::63
Thanks!
Please, run another import job.
BZDATETIME::2010-08-02 11:39:22
BZCOMMENTOR::Bob Kline
BZCOMMENT::64
(In reply to comment #63)
> Please, run another import job.
Do you want me to continue throttling the job, or should I run a full import (on Mahler) for the next job?
BZDATETIME::2010-08-02 11:45:40
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::65
(In reply to comment #64)
> (In reply to comment #63)
>
> > Please, run another import job.
>
> Do you want me to continue throttling the job, or should I run a full import
> (on Mahler) for the next job?
I think throttling will still be useful if you double the number of documents imported per job. After about two runs I will ask for a full import before promoting to Bach.
BZDATETIME::2010-08-02 11:59:25
BZCOMMENTOR::Bob Kline
BZCOMMENT::66
(In reply to comment #65)
> I think throttling it will still be good if you double the documents imported
> per import job. After about two runs I will ask for full import before
> promoting to Bach.
!1932 Mon Aug 02 11:55:49 2010: Updated CDR360902 from NCT00005902
!1932 Mon Aug 02 11:55:51 2010: Updated CDR360904 from NCT00005927
!1932 Mon Aug 02 11:55:53 2010: Updated CDR68307 from NCT00006478
!1932 Mon Aug 02 11:55:55 2010: Updated CDR662658 from NCT00006518
!1932 Mon Aug 02 11:55:57 2010: Updated CDR68306 from NCT00007865
!1932 Mon Aug 02 11:55:58 2010: Updated CDR662660 from NCT00026884
!1932 Mon Aug 02 11:55:59 2010: Updated CDR360755 from NCT00031551
!1932 Mon Aug 02 11:56:01 2010: Updated CDR258139 from NCT00052286
!1932 Mon Aug 02 11:56:04 2010: Updated CDR389483 from NCT00060541
!1932 Mon Aug 02 11:56:05 2010: Updated CDR352403 from NCT00060710
!1932 Mon Aug 02 11:56:06 2010: Updated CDR305817 from NCT00063934
!1932 Mon Aug 02 11:56:07 2010: Updated CDR662661 from NCT00068003
!1932 Mon Aug 02 11:56:11 2010: Updated CDR350343 from NCT00068146
!1932 Mon Aug 02 11:56:15 2010: Updated CDR350108 from NCT00071045
!1932 Mon Aug 02 11:56:19 2010: Updated CDR478803 from NCT00071799
!1932 Mon Aug 02 11:56:21 2010: Updated CDR350451 from NCT00073073
!1932 Mon Aug 02 11:56:22 2010: Updated CDR363923 from NCT00080262
!1932 Mon Aug 02 11:56:25 2010: Updated CDR365561 from NCT00080301
!1932 Mon Aug 02 11:56:44 2010: Updated CDR357423 from NCT00080912
!1932 Mon Aug 02 11:56:53 2010: Updated CDR662662 from nct00081354
BZDATETIME::2010-08-04 16:39:51
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::67
I am not finding enough trials to test the new elements, so I suggest that CIAT search clinicaltrials.gov for trials that are more likely to contain the elements we have added and send you the NCT IDs so that you can import them. We will then review the trials. Is this okay?
BZDATETIME::2010-08-05 08:52:56
BZCOMMENTOR::Bob Kline
BZCOMMENT::68
I could write a custom version of the import program to bring in specific trials for this task, but you may want to consider just having me run a full import job.
BZDATETIME::2010-08-05 10:57:44
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::69
(In reply to comment #68)
> I could write a custom version of the import program to bring in specific
> trials for this task, but you may want to consider just having me run a full
> import job.
Please proceed to run a full import job. It may take us longer to review, but that's okay.
BZDATETIME::2010-08-05 11:28:07
BZCOMMENTOR::Bob Kline
BZCOMMENT::70
Well, you wouldn't have to review every single imported trial, right? Couldn't you just look at the ones that would be on the list you described in comment #67?
BZDATETIME::2010-08-10 10:18:12
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::71
Bob:
I see CTSource in the new CTGovProtocol schema on Mahler. What does it
map to in clinicaltrials.gov?
Also, are these all the new elements added to the schema?
1. CTOversightInfo
2. ReasonStopped
3. PrimaryCompletionDate
4. CTStudyType
5. CTOutcomes
6. CTNumberOfArms
7. CTEnrollment
8. ArmsOrGroups
9. BiospecRetention
10. BiospecDescription
11. ProtocolRelatedLinks
12. CTReference
13. CTResultsReference
14. FirstReceivedDate
15. CTSource
16. CTAcronym
17. CTStudyPop
18. CTSamplingMethod
BZDATETIME::2010-08-10 10:33:37
BZCOMMENTOR::Bob Kline
BZCOMMENT::72
(In reply to comment #71)
> I see CTSource in the new CTGovProtocol schema on Mahler. What does it map to
> in clinicaltrials.gov?
http://verdi.nci.nih.gov/tracker/attachment.cgi?id=1653
> Also, are these all the new elements added to the schema? ....
Yes.
http://mahler.nci.nih.gov/cgi-bin/cdr/GetSchema.py?id=349686
BZDATETIME::2010-08-10 10:53:40
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::73
(In reply to comment #72)
> (In reply to comment #71)
>
> > I see CTSource in the new CTGovProtocol schema on Mahler. What does it map to
> > in clinicaltrials.gov?
>
> http://verdi.nci.nih.gov/tracker/attachment.cgi?id=1653
>
Thanks!
I was unable to find the 'Source' element in the NLM DTD provided as a
link in comment #3. I also don't see it mentioned in their data element
definitions
http://prsinfo.clinicaltrials.gov/definitions.html
Is it possible that it goes by another name? In clinicaltrials.gov, the
closest display name I can find is the 'Information Provided by'
field.
BZDATETIME::2010-08-10 11:16:36
BZCOMMENTOR::Bob Kline
BZCOMMENT::74
It's there in their DTD [1], but our software is looking for it as part of the required_header block (we create it as part of the RequiredHeader block in our imported document), even though the element is a top-level element. I have modified the import filter to look for it at the correct level. Do we need to adjust the placement in our own document, or did we deliberately put Source in our RequiredHeader block?
[1] http://clinicaltrials.gov/ct2/html/images/info/public.dtd
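The lookup problem described in this comment is easy to reproduce outside the filter. The sketch below uses Python's ElementTree rather than the actual XSL/T, and the sample document, the lowercase element name `source`, and the function names are assumptions made purely for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up sample record: the source element sits at the top level,
# next to (not inside) the required_header block.
SAMPLE = """\
<clinical_study>
  <required_header>
    <url>http://clinicaltrials.gov/show/NCT00000000</url>
  </required_header>
  <source>Example Sponsor</source>
</clinical_study>
"""


def source_from_header(study):
    """Old lookup: expects the element inside required_header."""
    return study.findtext("required_header/source")


def source_from_top_level(study):
    """Corrected lookup: the element is a direct child of the root."""
    return study.findtext("source")


study = ET.fromstring(SAMPLE)
```

With this sample, the header-relative lookup finds nothing while the top-level lookup returns the value, which would explain an empty Source in the imported document even though the data is present in NLM's record.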
BZDATETIME::2010-08-10 11:43:20
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::75
(In reply to comment #74)
> It's there in their DTD [1], but our software is looking for it as part of the
> required_header block (we create it as part of the RequiredHeader block in our
> imported document), even though the element is a top-level element. I have
> modified the import filter to look for it at the correct level. Do we need to
> adjust the placement in our own document, or did we deliberately put Source in
> our RequiredHeader block?
>
I don't remember all the details of the discussions of the elements but
this element appears to be in the right place in our document.
BZDATETIME::2010-08-10 15:19:09
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::76
We have identified the 'test trials'. I believe some of the trials may not be picked up for this import job because they may not meet the criteria for importing. Can we do a force download of those trials to review?
BZDATETIME::2010-08-10 15:22:36
BZCOMMENTOR::Bob Kline
BZCOMMENT::77
(In reply to comment #76)
> Can we do a force download of those trials to review?
Yes. I'll have to run a real download job from Franck (not just use the download from production), but I don't think they'll be too upset at having an extra download job run in the daytime if it's just this once.
BZDATETIME::2010-08-10 16:41:27
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::78
(In reply to comment #77)
> (In reply to comment #76)
>
> > Can we do a force download of those trials to review?
>
> Yes. I'll have to run a real download job from Franck (not just use the
> download from production), but I don't think they'll be too upset at having an
> extra download job run in the daytime if it's just this once.
Great. I think we can proceed with the full import job when you are ready.
BZDATETIME::2010-08-10 17:15:11
BZCOMMENTOR::Bob Kline
BZCOMMENT::79
(In reply to comment #77)
> Yes. I'll have to run a real download job from Franck ....
I misspoke when I said Franck (I meant Mahler); on which server did you flag the trials for forced download?
BZDATETIME::2010-08-10 17:20:09
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::80
(In reply to comment #79)
> (In reply to comment #77)
>
> > Yes. I'll have to run a real download job from Franck ....
>
> I misspoke when I said Franck (I meant Mahler); on which server did you flag
> the trials for forced download?
We have not flagged them yet. We can flag them this evening so that you can do the download tomorrow, if that is Okay?
BZDATETIME::2010-08-10 17:23:39
BZCOMMENTOR::Bob Kline
BZCOMMENT::81
That's fine. Just remember, I'll be leaving town mid-day tomorrow, and won't be back until next week.
BZDATETIME::2010-08-11 07:23:04
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::82
(In reply to comment #81)
> That's fine. Just remember, I'll be leaving town mid-day tomorrow, and won't
> be back until next week.
Please proceed with the download job. We have flagged about 18 trials for force download.
BZDATETIME::2010-08-11 11:54:17
BZCOMMENTOR::Bob Kline
BZCOMMENT::83
The download job completed a while ago. The import job is still running (mostly done). You can check periodically for the individual documents you're testing with to see if they've finished importing.
BZDATETIME::2010-08-12 15:58:24
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::84
I went back after the review meeting to re-check some of the documents we reviewed yesterday and I was surprised to see that much of the new data is now there. When I looked at the doc history, it turned out that we did most of our reviews (between 2:15 and 3:00) about an hour before the import job completed (about 4:16). I did not realize it would take that long for the job to complete.
Now, I can say that a lot of the data are present in the documents
and they are where we wanted to see them, but there are still a few of
them we have not been able to see in the documents because we have not
seen trials that are supposed to have them (but they are in the
minority). This is because some of the trials we identified and did a
force download for, did not get downloaded as expected. Four of those
documents ended up on the review page and we have marked them to be
imported. They are:
NCT01168791
NCT01169220
NCT01143545
NCT01040949
I believe Bob will have to manually start another download job so that
we can see them.
Other data elements we have not been able to test because the documents we identified did not get imported are:
CTAcronym: we will be able to test with one of the 4 above
CTSamplingMethod: NCT00643929, NCT00727571
CTStudyPop: NCT00870376, NCT01126567
BiospecRetention: NCT00989846, NCT01126567
BiospecDescription: NCT00710307, NCT01126567, NCT00989846
We will continue to review the imports to see if we can identify the above data in some of the trials. If we are not successful or if it takes too long, we may have to find a way to get the above documents imported or we can identify another set of trials to force download to test the remaining elements.
BZDATETIME::2010-08-12 15:59:37
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::85
Added Margaret to this issue.
BZDATETIME::2010-08-23 15:00:27
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::86
It looks like the schema has been overwritten again. I am getting the "Document does not conform to DTD or XML Schema" dialog box when I attempt to retrieve and review documents.
BZDATETIME::2010-08-23 15:15:48
BZCOMMENTOR::Bob Kline
BZCOMMENT::87
Right. That will keep happening as long as we delay deployment of these changes while making other changes to the same schema (which need to be installed, tested, and promoted). Probably task #4891 this time. Changes for this task restored on Mahler.
BZDATETIME::2010-08-24 16:16:05
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::88
1. In the ArmsOrGroups block, when there is no data for ArmsOrGroupType, a blank element is dropped in. Is it possible to omit the element when no data is present? Should I put this under OCECDR-3007?
2. It looks like not all relevant information for the ArmsOrGroups
block is brought in.
For example
CDR0000671043 - NCT01103323 -
http://www.clinicaltrials.gov/ct2/show/NCT01103323?term=NCT01103323&rank=1
In the record on ctgov, there is an "Assigned Interventions" column. However, this information is not included in the CDR.
Same issue with CDR0000671899 - NCT01108484
http://www.clinicaltrials.gov/ct2/show/NCT01108484?term=NCT01108484&rank=1
BZDATETIME::2010-08-25 10:17:54
BZCOMMENTOR::Bob Kline
BZCOMMENT::89
Doesn't look like that's in the proposed mapping table that I posted for review back in March of 2009. Do you want to have it added?
BZDATETIME::2010-08-25 11:13:50
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::90
(In reply to comment #89)
> Doesn't look like that's in the proposed mapping table that I posted for review
> back in March of 2009. Do you want to have it added?
Yes please.
BZDATETIME::2010-08-25 11:30:13
BZCOMMENTOR::Bob Kline
BZCOMMENT::91
In addition to arm_group_label, NLM's intervention block also contains description and other_name elements. Do you want these mapped as well?
BZDATETIME::2010-08-25 11:38:18
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::92
(In reply to comment #91)
> In addition to arm_group_label, NLM's intervention block also contains
> description and other_name elements. Do you want these mapped as well?
For the description, there is already an element in the PDQ ArmsOrGroups block called ArmOrGroupDescription. Are you referring to that or a different element? And yes, please map the other_name element.
BZDATETIME::2010-08-25 12:36:51
BZCOMMENTOR::Bob Kline
BZCOMMENT::93
(In reply to comment #92)
> For the description, there is already an element in the PDQ ArmsOrGroups
> blocked called ArmOrGroupDescription. Are you referring to that or a different
> element ....
That's a description of the arm or group; this is a description of the intervention. I'm looking at the intervention block because that's where the link is back to the arm/group which makes it possible for CT.gov to populate the Assigned Interventions column of the arms table on its web site.
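The link described here, from each intervention back to its arms through arm_group_label, is what lets an "Assigned Interventions" column be rebuilt per arm. A rough Python sketch of that grouping follows; `arm_group_label`, `description`, and `other_name` come from this thread, while `intervention_name` and the function name are assumptions for illustration, and the real mapping is done in the import filter.

```python
from collections import defaultdict
import xml.etree.ElementTree as ET


def interventions_by_arm(study):
    """Group a trial's interventions under each arm label they reference."""
    by_arm = defaultdict(list)
    for intervention in study.findall("intervention"):
        info = {
            # intervention_name is an assumed element name
            "name": intervention.findtext("intervention_name"),
            "description": intervention.findtext("description"),
            "other_names": [e.text for e in intervention.findall("other_name")],
        }
        # One intervention may be assigned to several arms.
        for label in intervention.findall("arm_group_label"):
            by_arm[label.text].append(info)
    return dict(by_arm)
```

Each key of the returned dictionary corresponds to one row of the arms table, and its list holds the interventions that would appear in that row.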
BZDATETIME::2010-08-25 12:41:06
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::94
(In reply to comment #93)
> (In reply to comment #92)
> That's a description of the arm or group; this is a description of the
> intervention. I'm looking at the intervention block because that's where the
> link is back to the arm/group which makes it possible for CT.gov to populate
> the Assigned Interventions column of the arms table on its web site.
Yes. Please include that also. Thanks!
BZDATETIME::2010-08-26 10:56:20
BZCOMMENTOR::Bob Kline
BZCOMMENT::95
Revised schema at http://mahler.nci.nih.gov/cgi-bin/cdr/GetSchema.py?id=349686; please take a look and see if there are any other mappings you'd like to have included.
Attachment task-4444-mapping (1).xls has been added with description: Added more children for CTIntervention
BZDATETIME::2010-08-30 12:13:29
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::96
(In reply to comment #95)
> Created attachment 1985 [details]
> Added more children for CTIntervention
>
> Revised schema at http://mahler.nci.nih.gov/cgi-bin/cdr/GetSchema.py?id=349686;
> please take a look and see if there are any other mappings you'd like to have
> included.
There is a Keywords section in the display on clinicaltrials.gov that
we would like to include. In the link below the heading for this
information is:
"Keywords provided by National Institutes of Health Clinical Center
(CC):"
http://www.clinicaltrials.gov/ct2/show/NCT00070785?term=NCT00070785&rank=1
BZDATETIME::2010-08-30 13:17:16
BZCOMMENTOR::Bob Kline
BZCOMMENT::97
http://mahler.nci.nih.gov/cgi-bin/cdr/GetSchema.py?id=349686
If there are no further additions, I will finish the modifications to the import filter and we can run another test.
Attachment task-4444-mapping-20100830.xls has been added with description: Added CTKeyword as child of CTGovIndexing block
BZDATETIME::2010-08-31 09:40:07
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::98
(In reply to comment #97)
> Created attachment 1988 [details]
> Added CTKeyword as child of CTGovIndexing block
>
> http://mahler.nci.nih.gov/cgi-bin/cdr/GetSchema.py?id=349686
>
> If there are no further additions, I will finish the modifications to the
> import filter and we can run some another test.
Yes. Please proceed. No further additions.
BZDATETIME::2010-08-31 11:33:35
BZCOMMENTOR::Bob Kline
BZCOMMENT::99
I've installed the code to pick up the new elements and I ran a complete download and import job suite on Mahler. If a document hasn't changed since the previous copy we got from NLM it won't have been marked for re-import, so if you have any specific trials you need to have re-imported with the new filter, let me know and I'll flag them by hand.
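The reason unchanged trials are skipped can be illustrated with a small decision function. This is only a sketch of the rule stated above: the function name, the `forced` flag, and the plain string comparison are assumptions, and the actual download job may compare documents differently.

```python
def needs_import(stored_xml, incoming_xml, forced=False):
    """Decide whether a downloaded trial should be queued for import."""
    if forced:
        return True   # flagged by hand, so import regardless of changes
    if stored_xml is None:
        return True   # no previous copy from NLM on record
    # Otherwise queue it only if NLM's copy differs from the previous one.
    return stored_xml != incoming_xml
```

Under this rule a change to the import filter alone does not cause a trial to be re-imported, which is why specific trials have to be flagged by hand.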
BZDATETIME::2010-08-31 14:22:42
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::100
(In reply to comment #99)
> if you have any specific trials you need to have re-imported with the new
> filter, let me know and I'll flag them by hand.
Please, flag these:
NCT01188759
NCT01188187
NCT01186263
NCT01103323
NCT01108484
BZDATETIME::2010-08-31 15:17:52
BZCOMMENTOR::Bob Kline
BZCOMMENT::101
!1092 Tue Aug 31 15:16:09 2010: Updated CDR671043 from NCT01103323
!1092 Tue Aug 31 15:16:10 2010: Updated CDR671899 from NCT01108484
!1092 Tue Aug 31 15:16:12 2010: Added NCT01186263 as CDR0000672011
!1092 Tue Aug 31 15:16:12 2010: Added NCT01188187 as CDR0000672012
!1092 Tue Aug 31 15:16:12 2010: Added NCT01188759 as CDR0000672013
BZDATETIME::2010-09-01 13:40:02
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::102
Could you move the ArmsOrGroups block and place it right before the CTGovIndexing block? Is this possible?
BZDATETIME::2010-09-01 14:40:31
BZCOMMENTOR::Bob Kline
BZCOMMENT::103
Done.
BZDATETIME::2010-09-02 09:32:51
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::104
(In reply to comment #103)
> Done.
Can you please do a full download job so that we can do a final review?
BZDATETIME::2010-09-07 14:10:26
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::105
We have finished reviewing the documents. They look good. Please promote the changes to Bach.
BZDATETIME::2010-09-08 13:15:01
BZCOMMENTOR::Bob Kline
BZCOMMENT::106
Hurrah! We started this task back in January 2009, so it feels good to finally be putting it to bed!
I have installed the new versions of the filter and schema on Bach, and merged the version control branch back into the trunk.
Please keep a close eye on the imported documents over the next few days to make sure nothing is broken.
Be aware that the new elements won't be picked up for a given document until NLM sends us a new version of that document. If that presents problems, then we may want to discuss manually setting all of the 'imported' dispositions to 'import requested' in the ctgov_import table to force a re-import. Might be preferable to let things just take their normal course, though, at least for a while, in order to minimize the amount of republished trials we end up pushing to Cancer.gov.
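If a forced re-import is ever wanted, the bulk disposition change mentioned above might look something like the following. This is a sketch under loud assumptions: it uses Python's sqlite3 module for illustration, and it assumes a simplified ctgov_import table with a text `disposition` column. Only the table name and the two disposition values come from this thread; the real table layout may well differ (for example, by storing dispositions as numeric keys).

```python
import sqlite3


def request_reimport(conn):
    """Flip every 'imported' row to 'import requested'; return rows changed.

    Assumes a simplified ctgov_import table with a text `disposition`
    column, which is a hypothetical layout for this sketch.
    """
    cursor = conn.execute(
        "UPDATE ctgov_import SET disposition = 'import requested' "
        "WHERE disposition = 'imported'"
    )
    conn.commit()
    return cursor.rowcount
```

Returning the row count gives a quick estimate of how many trials would be republished to Cancer.gov as a result, which is the cost weighed in the comment above.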
BZDATETIME::2010-09-08 14:24:59
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::107
(In reply to comment #106)
> Be aware that the new elements won't be picked up for a given document until
> NLM sends us a new version of that document. If that presents problems, then
> we may want to discuss manually setting all of the 'imported' dispositions to
> 'import requested' in the ctgov_import table to force a re-import. Might be
> preferable to let things just take their normal course, though, at least for a
> while, in order to minimize the amount of republished trials we end up pushing
> to Cancer.gov.
In the short term, I think it is okay to let things take their normal course as you suggested since the records that don’t get updated won’t be invalidated.
BZDATETIME::2010-09-09 11:10:46
BZCOMMENTOR::Bob Kline
BZCOMMENT::108
Looks as if this morning's import job processed 99 trials, 88 of them updates and the rest new. Don't see evidence of any problems. I've attached the list of the IDs so you can spot-check a few to make sure they look OK.
Attachment task4444-20100909.log has been added with description: Extract from import log
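A tally of updated versus newly added trials can be derived from log lines in the two formats quoted earlier in this thread ("Updated CDR... from NCT..." and "Added NCT... as CDR..."). The sketch below is one way to do that; the function name and the case-insensitive matching are choices made for this example, not a description of how the counts above were produced.

```python
import re
from collections import Counter

# Patterns for the two log-line shapes seen in this thread.
UPDATED = re.compile(r"Updated (CDR\d+) from (NCT\d+)", re.IGNORECASE)
ADDED = re.compile(r"Added (NCT\d+) as (CDR\d+)", re.IGNORECASE)


def tally(lines):
    """Count 'updated' and 'added' entries in import-log lines."""
    counts = Counter()
    for line in lines:
        if UPDATED.search(line):
            counts["updated"] += 1
        elif ADDED.search(line):
            counts["added"] += 1
    return counts
```

Matching case-insensitively lets the sketch accept IDs logged in lowercase, as in one of the entries from the August 2 batch.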
BZDATETIME::2010-09-15 13:30:27
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::109
(In reply to comment #108)
> Created attachment 1996 [details]
> Extract from import log
>
> Looks as if this morning's import job processed 99 trials, 88 of them updates
> and the rest new. Don't see evidence of any problems. I've attached the list
> of the IDs so you can spot-check a few to make sure they look OK.
Yes. We have not come across any problems.
I have entered a new issue, OCECDR-3224, to take care of documents that may not
be updated during the normal daily updates.
This issue is now closed. Thank you!!
File Name | Posted | User |
---|---|---|
ClinicalTrialsSchemaChanges.doc | 2009-02-03 08:48:43 | Osei-Poku, William (NIH/NCI) [C] |
task4444-20100909.log | 2010-09-09 11:10:46 | |
task-4444-mapping.xls | 2009-03-27 16:25:40 | |
task-4444-mapping (1).xls | 2010-08-26 10:56:20 | |
task-4444-mapping-20100830.xls | 2010-08-30 13:17:16 | |