PDQ Issues

Issue Number	3272
Summary	[CTRP] Modify CTgov trials schema to take RSS sites from CTRP
Created	2010-12-06 14:42:41
Issue Type	Improvement
Submitted By	Beckwith, Margaret (NIH/NCI) [E]
Assigned To	Kline, Bob (NIH/NCI) [C]
Status	Closed
Resolved	2012-10-05 09:59:57
Resolution	Fixed
Path	/home/bkline/backups/jira/ocecdr/issue.107600

Description

BZISSUE::4962
BZDATETIME::2010-12-06 14:42:41
BZCREATOR::Margaret Beckwith
BZASSIGNEE::Bob Kline
BZQACONTACT::Lakshmi Grama

We talked about needing to develop a schema for CTRP trials in order to be able to import them. A suggestion was made to ask them whether we could get a schema from them as a starting point. Maybe we can ask Charles, and also get some idea of the time frame for exporting the set of trials to them now that the dates have shifted again. We expect the transfer of active Coop. Group trials to begin Feb. 1, so we need to be able to import them back in from CTRP at that time.

Comment entered 2010-12-07 11:13:28 by Englisch, Volker (NIH/NCI) [C]

BZDATETIME::2010-12-07 11:13:28
BZCOMMENTOR::Volker Englisch
BZCOMMENT::1

I've added my task as a dependency to keep up with the changes.

Comment entered 2010-12-14 16:13:03 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2010-12-14 16:13:03
BZCOMMENTOR::Bob Kline
BZCOMMENT::2

I've asked Charles Yaghmour to send us the schema for what they'll be exporting to us.

Comment entered 2011-01-15 12:22:40 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2011-01-15 12:22:40
BZCOMMENTOR::Bob Kline
BZCOMMENT::3

The first step was to write software which parsed the schema we got from them and generated the equivalent schema which our homegrown subset schema implementation is able to use. The results can be viewed at http://mahler.nci.nih.gov/cgi-bin/cdr/GetSchema.py?id=686743. I will post the corresponding DTD for those who prefer to review that format.

The next question is whether we want to:

1. use the DTD as it is, augmented with blocks which will hold PDQ info; or
2. programmatically convert the element names to our own style; or
3. manually rework the schema to make it more closely resemble the schema
for CTGovProtocol documents.

The first option offers the easiest and quickest path to implementing the import software. The second option (automatically changing names like arm_group_label to ArmGroupLabel) is not much more difficult, because I could write a program to generate the XSL/T script that would transform their element names to the names in the schema. The third option would involve significantly more work to write the import software, but might reduce Volker's work in creating the publishing filter.
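A minimal Python sketch of the second option, assuming the renaming is purely mechanical (split on underscores, capitalize each part) plus a hand-maintained override table for names such as po_id, where the mechanical result (PoId) may not be what we want. The function names and the override mechanism are illustrative assumptions, not the actual conversion program. The generated stylesheet is a plain XSLT 1.0 identity transform with one renaming template per CTRP element name.

```python
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"


def camel_name(name: str) -> str:
    """Mechanically convert a CTRP element name to CDR style,
    e.g. 'arm_group_label' -> 'ArmGroupLabel'."""
    return "".join(part.capitalize() for part in name.split("_") if part)


def build_rename_xslt(names, overrides=None) -> str:
    """Generate an XSLT 1.0 stylesheet that copies a document unchanged
    except that each element listed in `names` is renamed.

    `overrides` (hypothetical) maps a CTRP name to a hand-picked CDR name
    for cases where camel_name() gives an unwanted result."""
    overrides = overrides or {}
    lines = [
        f'<xsl:stylesheet version="1.0" xmlns:xsl="{XSL_NS}">',
        '  <!-- Identity template: copy everything not matched below. -->',
        '  <xsl:template match="@*|node()">',
        '    <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>',
        '  </xsl:template>',
    ]
    for name in sorted(set(names)):
        new_name = overrides.get(name, camel_name(name))
        # Literal result element carrying the new name; its attributes
        # and children are then handled by the identity template.
        lines += [
            f'  <xsl:template match="{name}">',
            f'    <{new_name}><xsl:apply-templates select="@*|node()"/></{new_name}>',
            '  </xsl:template>',
        ]
    lines.append('</xsl:stylesheet>')
    return "\n".join(lines)
```

For example, build_rename_xslt(["arm_group_label", "brief_title"]) yields templates that rename those two elements to ArmGroupLabel and BriefTitle and copy everything else as-is.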

Assuming the end-of-month deadline holds, this decision will need to be made very quickly if we're to have any hope of meeting it. A command decision from Margaret or Lakshmi is probably best.

Here are the problems I have reported to Charles so far:

1. There's a discrepancy between the data and the schema for the
structure of the resp_party_person element; NCI-2009-00045.xml
has a name child (with text content) where the schema has articulated
first_name, middle_initial, and last_name elements.

2. In the same document, the sponsors/collaborator block has a single
child element (agency, with text content), whereas the schema
only allows the child elements name, po_id, ctep_id, address, phone,
fax, or email.

3. Many of the dates are incomplete (for example, "2004-12" in the
current_trial_status_date element); didn't we say we'd be getting
complete ISO formatted dates?

4. The organization of the condition_info block is unfortunate; in
NCI-2009-00045.xml, for example, there are 18 sibling elements
of types disease_code, menu_display_name, and preferred_name
all in the same block. Best practice is to group name and code
elements which belong to the same condition in a wrapper element,
rather than relying on assumptions about the order of alternating
elements.

5. The use of plural and singular names for elements is misleading in
places. For example, the disease_condition element name is
singular, but more than one condition can appear in the block for
that element. Similarly (with the mistake in the other direction),
each "criteria" element holds a single criterion. Wouldn't it be
more sensible to have a structure along these lines?

<criteria>
  <criterion/>
  <criterion/>
  <criterion/> ...
</criteria>

6. The use of textblock elements seems arbitrary. Why, for example,
would "- WBC >= 3,000/mm^3" need to be wrapped in a textblock
element, when "- No symptomatic congestive heart failure" in
the same set of criteria is not?

7. Some of the "criteria" elements are empty.

8. I noticed an email address ("sisherma@mdanderson.org") in one of
the phone elements for NCI-2009-00045.xml.

9. For the contact block, the schema allows a 'middle_initials'
child element but I see 'middle_name' elements in the data.

10. Is 'null.' really the middle name for Jacqueline Jonklass?
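To illustrate item 4, here is a Python sketch of the regrouping we would otherwise have to do on our side. It assumes (and this is exactly the fragile, order-dependent assumption the item objects to) that each condition's run of siblings starts with a disease_code element. The <condition> wrapper name is a hypothetical choice, not something taken from CTRP's schema.

```python
import copy
import xml.etree.ElementTree as ET


def group_conditions(block: ET.Element, starter: str = "disease_code",
                     wrapper: str = "condition") -> ET.Element:
    """Return a new condition_info element in which each condition's
    flat sibling elements are wrapped in their own <wrapper> element.

    Assumes every condition begins with a `starter` element; any
    elements before the first starter get a wrapper of their own."""
    grouped = ET.Element(block.tag, block.attrib)
    current = None
    for child in block:
        # Start a new wrapper at each starter element (or at the very
        # first child, if the block does not begin with a starter).
        if current is None or child.tag == starter:
            current = ET.SubElement(grouped, wrapper)
        current.append(copy.deepcopy(child))
    return grouped
```

If a single condition were ever missing its disease_code, this logic would silently merge it into the previous condition, which is why an explicit wrapper element in the export is the better fix.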

Comment entered 2011-01-15 12:24:03 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2011-01-15 12:24:03
BZCOMMENTOR::Bob Kline
BZCOMMENT::4

Comment entered 2011-01-15 12:24:03 by Kline, Bob (NIH/NCI) [C]

Attachment ctrp.dtd has been added with description: Here's the DTD I promised

Comment entered 2011-02-01 16:30:59 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2011-02-01 16:30:59
BZCOMMENTOR::Bob Kline
BZCOMMENT::5

William:

Lakshmi suggested that you do the preliminary work on coming up with mappings from the structure we got from CTRP into our own structure which more closely resembles a CTGovProtocol document, going with option #3 from comment #3.

Comment entered 2011-02-01 16:55:31 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2011-02-01 16:55:31
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::6

(In reply to comment #5)
> William:
>
> Lakshmi suggested that you do the preliminary work on coming up with mappings
> from the structure we got from CTRP into our own structure which more closely
> resembles a CTGovProtocol document, going with option #3 from comment #3.

For clarification:
Considering the following option,
(3. manually rework the schema to make it more closely resemble the schema
for CTGovProtocol documents.), I should use the schema you generated here (http://mahler.nci.nih.gov/cgi-bin/cdr/GetSchema.py?id=686743) to provide, in a spreadsheet, a mapping to a structure that resembles CTGovProtocol documents. Is that right?

Comment entered 2011-02-03 11:41:44 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2011-02-03 11:41:44
BZCOMMENTOR::Bob Kline
BZCOMMENT::7

Here's the latest message from Charles Yaghmour:

Hi Bob,

We have completed the changes related to the PDQ export as described below. You can find the updated XSD at:

https://ncisvn.nci.nih.gov/svn/coppa/trunk/code/pa/public/

Sample exported trials could be found at:

https://trials-dev.nci.nih.gov/pa/pdqgetFileByDate.action?date=CTRP-TRIALS-2011-02-01-T-01-24-58.zip

https://trials-dev.nci.nih.gov/pa/pdqgetFileByDate.action?date=CTRP-TRIALS-2011-02-02-T-19-00-00.zip

Please take a look and let me know if you have any questions.

Thank you,

Charles

Comment entered 2011-02-04 10:35:58 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2011-02-04 10:35:58
BZCOMMENTOR::Bob Kline
BZCOMMENT::8

I posted this here in case Mahler is unavailable later today (they're still working on system maintenance for that server).

Comment entered 2011-02-04 10:35:58 by Kline, Bob (NIH/NCI) [C]

Attachment ctrp.xml has been added with description: Incorporates modifications made in response to our initial feedback

Comment entered 2011-02-07 19:44:26 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2011-02-07 19:44:26
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::9

I have attached the spreadsheet of the mapping.

1. There were elements or sets of elements from the CTRP schema that matched some of the elements or blocks of elements in the CTGovProtocol schema exactly.
For example: brief_title = BriefTitle
collaborator = Collaborator

2. There are many cases where CTRP provides more information for an element or a set of elements that are already in the CTGovProtocol schema:
For example, for the trial_status elements, CTRP provides these additional elements:
current_trial_status_date
current_trial_start_date_type
current_trial_completion_date_type
Also, for the person and organization or location information, the CTRP schema provides additional information: po_id and the ctep_id.

3. There are elements from the CTRP schema that are new. For these elements, I was not able to identify matching CTGovProtocol elements for them.
For example: resp_party
trial_ind_ide

4. There are a few elements that are captured in the PDQ Indexing block of the CTGovProtocol document, but my guess is that you would not want to map them that way, so I noted them in the comments column (the fourth column).

The spreadsheet identifies the four categories of elements mentioned above. I should also mention that there were elements that are in the CTGovProtocol schema but are not in the CTRP protocol schema. I did not go to the trouble of identifying all those elements. Please let me know if you want me to identify them.

I created two tabs in the attached spreadsheet because presenting all the information in one tab seemed confusing to me:

TAB 1. NEW BLOCKS OR ELEMENTS - This tab highlights cases where CTRP is providing us with new elements or blocks of elements and you can find this information in the second column titled NOT IN CTGOVPROTOCOL SCHEMA. Please note that in most cases, I mention only the wrapper or parent element.
The first column (CLINICAL TRIALS REPORTING PROGRAM) of this spreadsheet captures all the CTRP elements that have matching CTGovProtocol elements, and those matching CTGovProtocol elements can be found in the third column, titled CLINICALTRIALS GOV.

TAB 2. The second tab is called ADDITIONAL ELEMENTS. This tab highlights elements that have matching CTGovProtocol elements but CTRP is providing additional elements and this new information is also found in the second column: NOT IN CTGOVPROTOCOL SCHEMA
For example:
The Phase element is the same for both schemas, but CTRP is providing one additional element called "phase_additional_qualifier", so you will find this element listed in the second column under phase.

Please let me know if you have questions or if you want me to present the information in a different way.

Comment entered 2011-02-07 19:44:26 by Osei-Poku, William (NIH/NCI) [C]

Attachment CTRP_CTGOV_MAP.xls has been added with description: CTRP_CTGOV MAPPING

Comment entered 2011-02-08 10:35:05 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2011-02-08 10:35:05
BZCOMMENTOR::Bob Kline
BZCOMMENT::10

What is the significance of the very large empty gaps in the sheets?

Comment entered 2011-02-08 12:06:22 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2011-02-08 12:06:22
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::11

I thought I had deleted all the elements after the gap but it turned out not to be the case. I have attached a new spreadsheet that does not include the duplicate elements after the gap.

Comment entered 2011-02-08 12:06:22 by Osei-Poku, William (NIH/NCI) [C]

Attachment CTRP_CTGOV_MAP (1).xls has been added with description: CTRP_CTGOV MAPPING

Comment entered 2011-02-08 17:33:23 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2011-02-08 17:33:23
BZCOMMENTOR::Bob Kline
BZCOMMENT::12

Perhaps I don't understand what you're representing in the spreadsheets. Starting at the top, for example, CTRP has elements like

/clinical_study/id_info/ctep_id

and

/clinical_study/id_info/dcp_id

which I would have expected to see on the second sheet, which you describe as the sheet on which you identify "elements that have matching CTGovProtocol elements but CTRP is providing additional elements" but I don't see those on either of the first two sheets (not sure what the third sheet is for).

Also, the mapping document's primary purpose would be to tell us what you want us to do with the elements which don't already have a corresponding place in the CTGovProtocol schema. Where, for example, do you want us to put the trial owners?

Comment entered 2011-02-08 18:04:05 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2011-02-08 18:04:05
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::13

Well, the IDs part was a little tricky for me because it looks like we do have all the IDs they are providing us, but they present them differently. For example, we do have NCTID as an element, but CTRP represents this using an id_domain element with a value of NCT and an id_type element with a value of nct-id:
<secondary_id>
<id>NCT00020306</id>
<id_type>nct-id</id_type>
<id_domain>NCT</id_domain>
</secondary_id>

Since we did not have the equivalent of that presentation, I included the id_domain element in the second column of the second sheet (and also in the first column, as mapping to NCTID) as if they were providing more information. In a nutshell, the IDs part was not as straightforward to me as the other parts. Please let me know if you want me to represent it in a better way.
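A minimal Python sketch of how the import could recover the value for our NCTID element from CTRP's generic secondary_id blocks. It assumes, based only on the single example above, that an id_domain of NCT reliably marks a ClinicalTrials.gov ID; the function name is illustrative.

```python
import xml.etree.ElementTree as ET


def nct_ids(doc: ET.Element) -> list:
    """Collect the <id> values of secondary_id blocks whose id_domain
    is NCT (assumed to identify ClinicalTrials.gov IDs)."""
    found = []
    for block in doc.iter("secondary_id"):
        domain = (block.findtext("id_domain") or "").strip().upper()
        value = (block.findtext("id") or "").strip()
        if domain == "NCT" and value:
            found.append(value)
    return found
```

Checking id_type (nct-id) as well would be a cheap extra safeguard if the two fields can ever disagree.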

>Also, the mapping document's primary purpose would be to tell us what you want
>us to do with the elements which don't already have a corresponding place in
>the CTGovProtocol schema. Where, for example, do you want us to put the trial
>owners?

Sure. I was thinking that we would first discuss which of the new elements to include and which ones to leave out before deciding where they should go, but I can work from the assumption that we will include all the new elements and provide you with another document stating where each new element should go.

Comment entered 2011-02-08 18:07:57 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2011-02-08 18:07:57
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::14

>
> which I would have expected to see on the second sheet, which you describe as
> the sheet on which you identify "elements that have matching CTGovProtocol
> elements but CTRP is providing additional elements" but I don't see those on
> either of the first two sheets (not sure what the third sheet is for).

I have attached a new spreadsheet without the third sheet.

Comment entered 2011-02-08 18:07:57 by Osei-Poku, William (NIH/NCI) [C]

Attachment CTRP_CTGOV_MAP.xls has been added with description: CTRP_CTGOV_MAP

Comment entered 2011-02-09 16:21:15 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2011-02-09 16:21:15
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::15

I am attaching another spreadsheet with a tab called "LAYOUT" and a column called "LAYOUT/DISPLAY", which contains comments about where each of the new elements should go. This is just a draft, since there are a few things I need to know before coming up with a complete set of elements for the CTRP document indicating where each one should go. For example, since I don't know what the new elements would be called, I used the same element names as the ones in the CTRP schema, except in a few places where I was a little creative. Maybe after we've discussed this further, we will come up with a better structure for the CTRP document.

Comment entered 2011-02-09 16:21:15 by Osei-Poku, William (NIH/NCI) [C]

Attachment CTRP_CTGOV_MAP_Layout_Display.xls has been added with description: Layout

Comment entered 2011-02-15 09:45:54 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2011-02-15 09:45:54
BZCOMMENTOR::Bob Kline
BZCOMMENT::16

Thanks, William. I have taken what you prepared and expanded it to a mapping [1] I was able to use to convert the documents we got from Charles [2]. I compiled some additional notes and questions [3] which I was hoping to review with Lakshmi, but I haven't heard back from her. I'll need guidance from you and her about what you need in the schema for information owned by PDQ (as opposed to imported from CTRP), as well as what custom logic is needed for processing those blocks.

The question on top of the pile concerns maintenance of information on persons and organizations. We have drifted steadily in the direction of abandoning active work to keep our Person and Organization documents up to date, and I assume we're hoping to let CTRP take over that responsibility. I can think of a couple of ways this might work.

The most straightforward approach would be to just store the IDs and name strings we get from CTRP, and for those occasions on which we need more information we would hook into APIs provided by CTRP for retrieving that information in real time. Such occasions may never actually arise, as we're getting all of the contact information (postal address, phone, fax, and email) in the trial documents themselves.

A more complicated approach would be something similar to what we do for trial information imported from NLM, who aren't as careful about tracking the identities of persons and organizations, and who don't provide as much contact information in what they export. For those we maintain our own Person and Organization documents and attempt (not always successfully) to map what we get from them to those documents using the external map table.

Which approach are we planning to take? Or is there some third option being considered? As soon as I get the answers to the questions above I'll create a schema document reflecting those answers and the structures implied by the mapping outline and the converted trial documents.

Please review the converted documents and let me know if you need anything changed. I have already noted a number of anomalies in the sample data we got from CTRP, and I have reported the problems to Charles. The worst problem at this point is the garbling they've done with the entry criteria information.

[1] http://bach.nci.nih.gov/ctrp-to-pdq.html
[2] http://bach.nci.nih.gov/cgi-bin/cdr/view-converted-ctrp-docs.py
[3] http://bach.nci.nih.gov/ctrp-mapping-notes.html

Comment entered 2011-02-16 15:30:15 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2011-02-16 15:30:15
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::17

I met with my colleagues here at CIAT to discuss these and have come up with the following suggested responses (attached). I believe Margaret and Lakshmi need to take a look at our suggestions before they are implemented or maybe we can discuss them in our meeting tomorrow.

Comment entered 2011-02-16 15:30:15 by Osei-Poku, William (NIH/NCI) [C]

Attachment CTRP Mapping Notes.doc has been added with description: suggested responses

Comment entered 2011-02-21 17:48:58 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2011-02-21 17:48:58
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::18

I am attaching the updated copy of the suggested responses with notes in red font taking into consideration our discussions last Thursday in the CDR Meeting.

Comment entered 2011-02-21 17:48:58 by Osei-Poku, William (NIH/NCI) [C]

Attachment CTRP Mapping Notes_Updated.doc has been added with description: suggested responses updated

Comment entered 2011-02-22 14:01:23 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2011-02-22 14:01:23
BZCOMMENTOR::Bob Kline
BZCOMMENT::19

(In reply to comment #18)

> I am attaching the updated copy of the suggested responses with notes in red
> font taking into consideration our discussions last Thursday in the CDR
> Meeting.

One thing I believe we did agree on, but which I didn't see reflected in the updated response document, was the decision to keep the blocks of related information intact. So, for example, instead of scattering the elements contained in the study design block around in different places in our own documents, I believe we decided we'd keep those elements together as CTRP does. Does this match your understanding of what we said in the meeting?

Comment entered 2011-02-23 11:06:56 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2011-02-23 11:06:56
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::20

(In reply to comment #19)
> (In reply to comment #18)
>
> > I am attaching the updated copy of the suggested responses with notes in red
> > font taking into consideration our discussions last Thursday in the CDR
> > Meeting.
>
> One thing I believe we did agree on, but which I didn't see reflected in the
> updated response document, was the decision to keep the blocks of related
> information intact. So, for example, instead of scattering the elements
> contained in the study design block around in different places in our own
> documents, I believe we decided we'd keep those elements together as CTRP does.
> Does this match your understanding of what we said in the meeting?

Yes, it does. I did not update it because our initial response already agrees with your original suggestion to keep the elements of the block together as CTRP does.

Comment entered 2011-03-07 12:52:29 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2011-03-07 12:52:29
BZCOMMENTOR::Bob Kline
BZCOMMENT::21

Here's the proposed schema:

http://franck.nci.nih.gov/cgi-bin/cdr/GetSchema.py?id=694781

Note that I have the PDQIndexing block temporarily set as optional, so that I could run some validation checks on the rest of the document without getting noise about missing or invalid PDQIndexing blocks.

Here are the person and organization documents created by the import software:

http://franck.nci.nih.gov/cgi-bin/cdr/view-ctrp-orgs-and-persons.py

The reconverted trials can be reviewed here:

http://bach.nci.nih.gov/cgi-bin/cdr/view-converted-ctrp-docs.py

Comment entered 2011-03-10 08:59:52 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2011-03-10 08:59:52
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::22

(In reply to comment #21)

I have not finished reviewing yet but here are two quick observations:
1. It looks like new persons and organizations were created even if we already have records for them. Can we re-use the existing records instead? This will eliminate a lot of confusion down the line, since we continue to work with person and organization records and having duplicates always creates problems. Also, because what we expect from CTRP in terms of persons and organizations is all RSS data, we already have records created for the majority of them, if not all of them.

2. The OrgName element continues to have the following string for all the ones I reviewed: “replace with PRS Organization Name you log in with”, which I believe is the instruction provided in the XML for the cancer centers to register with CTGov.

Comment entered 2011-03-10 15:11:42 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2011-03-10 15:11:42
BZCOMMENTOR::Bob Kline
BZCOMMENT::23

(In reply to comment #22)
> (In reply to comment #21)
>
> I have not finished reviewing yet but here are two quick observations:
> 1. It looks like new persons and organizations were created even if we already
> have records for them. Can we re-use the existing records instead? This will
> eliminate a lot of confusion down the line since we continue to work with
> person and organization records and having duplicates always creates problems.
> Also, because what we expect from CTRP in terms of persons and organizations
> are all RSS data, we already have records created for majority of them, if not
> all of them.

We discussed this problem in this afternoon's status meeting, and I came away with the action item to investigate whether the fact that the conversion software was creating so many new documents for persons and organizations which CIAT believes are already in the CDR and have mappings to CTEP IDs was the result of:

(a) a bug in the conversion software;
(b) missing CTEP IDs in the CTRP trial documents; or
(c) mismatches in the CTEP IDs we use versus what they give us.

The good news is that the conversion software appears to be behaving correctly. If you look at

http://bach.nci.nih.gov/cgi-bin/cdr/view-converted-ctrp-docs.py?doc=NCI-2009-00038

which is the converted document for the example we were looking at in the status meeting you'll see that the conversion software actually found the mapping to CDR31539 we were hoping it would. The reason a new document was created for the Dana-Farber Center is that while some of the trial documents did include the CTEP ID for the center, others did not (that's the bad news). If you look, for example, at

http://bach.nci.nih.gov/cgi-bin/cdr/view-converted-ctrp-docs.py?doc=NCI-2009-00042

you'll find that this time the center appears with a CTRP ID (po_id="3456936") but without a CTEP ID, and with a slight variation in the name of the center ("Dana-Farber Harvard Cancer Center" instead of "Dana-Farber Cancer Institute", which is the string they use for the ones which do have the CTEP ID).
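The lookup order discussed above can be sketched in Python as follows. The two dictionaries stand in for external-map lookups keyed by CTEP ID and by CTRP PO ID; both the names and the fallback to the PO ID are assumptions for illustration, not a description of the actual conversion code. Name strings are never consulted, so a variant such as "Dana-Farber Harvard Cancer Center" only leads to a new document when neither ID is mapped.

```python
from typing import Dict, Optional


def resolve_org(ctep_id: Optional[str], po_id: Optional[str],
                ctep_map: Dict[str, str],
                po_map: Dict[str, str]) -> Optional[str]:
    """Return the CDR ID of an existing Organization document, or None
    if the importer would have to create a new one.

    ctep_map: CTEP ID -> CDR ID (hypothetical stand-in for the mapping table)
    po_map:   CTRP PO ID -> CDR ID (hypothetical, populated by CIAT)"""
    if ctep_id and ctep_id in ctep_map:
        return ctep_map[ctep_id]
    if po_id and po_id in po_map:
        return po_map[po_id]
    return None  # no usable ID: a new document would be created
```

Under these assumptions, the NCI-2009-00042 case (PO ID only, no CTEP ID) resolves to CDR31539 as soon as a mapping for po_id 3456936 exists, regardless of how the name string is spelled.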

> 2. The OrgName element continue to have the following string for all the
> ones I reviewed “replace with PRS Organization Name you log in with”
> which I believe is the instruction provided in the XML for the cancer
> centers to register with CTGov.

I have reported this problem to Charles Y.

Comment entered 2011-03-15 11:07:56 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2011-03-15 11:07:56
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::24

I am wondering if we could limit the test data to only cooperative group trials, since those are the priority. It also seems that if the test data is just cooperative group trials, we will have fewer issues with missing CTEP IDs and duplicates, because the RSS service provided CTEP IDs most of the time.

Comment entered 2011-03-15 11:46:50 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2011-03-15 11:46:50
BZCOMMENTOR::Bob Kline
BZCOMMENT::25

(In reply to comment #24)
> I am wondering if we could limit the test data to only cooperative group trials
> since those are the priority. It seems also that if the test data is just
> cooperative trials we will have fewer issues with missing CTEP ids and
> duplicates because the RSS service provided CTEP ids most of the times.

We can do that, but you'll need to supply the CTRP trial IDs (e.g., NCI-2009-00002) for the ones you want included. Since one of the problems we're trying to report on is the inability to map to an existing CDR Organization document, we can't tell in such cases whether the Organization document would have the OrganizationType values we look for when we want to find cooperative groups ("US clinical trials group" or "Non-US clinical trials group"). Classic "Catch 22" problem.

Comment entered 2011-03-15 22:09:01 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2011-03-15 22:09:01
BZCOMMENTOR::Bob Kline
BZCOMMENT::26

(In reply to comment #23)

> > 2. The OrgName element continue to have the following string for all the
> > ones I reviewed “replace with PRS Organization Name you log in with”
> > which I believe is the instruction provided in the XML for the cancer
> > centers to register with CTGov.
>
> I have reported this problem to Charles Y.

Response from Charles:

We do not store this field in CTRP since the responsible party will be registering the trial in ct.gov. that's why we have that value in the xml file defaulted to "replace with PRS Organization Name you log in with" for the submitter to replace with their organization name.

Comment entered 2011-03-16 13:35:07 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2011-03-16 13:35:07
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::27

(In reply to comment #26)

> Response from Charles:
>
> We do not store this field in CTRP since the responsible party will be
> registering the trial in ct.gov. that's why we have that value in the xml file
> defaulted to "replace with PRS Organization Name you log in with" for the
> submitter to replace with their organization name.

It looks like we can do without this information, because in all cases they (CTRP) are providing us with the Lead Org, which is what we typically use to identify the owner of the trial; for the Coop Group trials this is typically the Primary Lead Org (North Central Cancer Treatment Group (NCCTG), for example). But it is possible for this information to be different from what will be on clinicaltrials.gov.

I propose that we remove the OrgName element from the (our) schema.

(In reply to comment #25)
> (In reply to comment #24)
> > I am wondering if we could limit the test data to only cooperative group
> We can do that, but you'll need to supply the CTRP trial IDs (e.g.,
> NCI-2009-00002) for the ones you want included. Since one of the problems
> we're trying to report on is the inability to map to an existing CDR
> Organization document we can't tell in such cases whether the Organization
> document would have the OrganizationType values we look for when we want to
> find cooperative groups ("US clinical trials group" or "Non-US clinical trials
> group"). Classic "Catch 22" problem.

I was hoping for a programmatic approach to identifying the cooperative group trials, since they are all trials we provided to CTRP. However, if that is not possible, we can manually identify a subset of them to use for testing.

Comment entered 2011-03-18 11:17:29 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2011-03-18 11:17:29
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::28

We agreed in the meeting yesterday that I will provide Bob with the list of the names of cooperative groups. Bob will then use this list to identify trials that have the LeadOrg/PDQOrganization/Name value set to any of the names provided below.

1. American College of Surgeons Oncology Group
2. Cancer and Leukemia Group B
3. Eastern Cooperative Oncology Group
4. Radiation Therapy Oncology Group
5. National Surgical Adjuvant Breast and Bowel Project
6. North Central Cancer Treatment Group
7. NCIC-Clinical Trials Group
8. Gynecologic Oncology Group
9. Southwest Oncology Group

Comment entered 2011-03-18 12:22:45 by Grama, Lakshmi (NIH/NCI) [E]

BZDATETIME::2011-03-18 12:22:45
BZCOMMENTOR::Lakshmi Grama
BZCOMMENT::29

Just remember that the number and names of coop groups will change in the next two years. That program is being revamped! Identifying by name is risky!

Comment entered 2011-03-18 13:06:14 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2011-03-18 13:06:14
BZCOMMENTOR::Bob Kline
BZCOMMENT::30

(In reply to comment #29)
> Just remember that the number and names of coop groups will change in the next
> two years. That program is being revamped! Identifying by name is risky!

My understanding is that William is asking for the use of the list of name strings to restrict the set of converted trials CIAT needs to review for problems. I warned him that even for this limited purpose, CTRP's track record of garbling our data makes it likely we'd be skipping over trials that should be taken into consideration during this review process, but William expressed "100% confidence" that this can't possibly happen, so I agreed to create the reduced test set.

Comment entered 2011-03-18 14:20:56 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2011-03-18 14:20:56
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::31

I am wondering how we will identify cooperative group trials when we go live with this project. Will CTRP provide us with only cooperative group trials data when we are ready to put this in production? If that is the case, then it looks like we need to ask CTRP to provide us with only cooperative group data for testing at this point, because if we cannot reliably identify cooperative group trials from the data they provide now, then we may not be able to identify them when we are ready to go live. But from the beginning we have said we will concentrate only on cooperative group trials for the first phase of this project. For testing purposes, as I said in comment #27, CIAT can manually identify a subset of cooperative group trials from the list of converted trials and use them for testing. But even in this case, I am not sure how reliable our tests will be.

Comment entered 2011-03-21 11:22:47 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2011-03-21 11:22:47
BZCOMMENTOR::Bob Kline
BZCOMMENT::32

(In reply to comment #31)
> I am wondering how we will identify cooperative group trials when we go live
> with this project.

Here's one possible approach: CIAT could populate the CTRP external mappings with CTRP IDs for the cooperative groups for which we expect to receive trials. The import program can report on the lead orgs found in trials we get from CTRP and which have CTRP IDs which are not in the CTRP external mappings populated by CIAT. CIAT can then review these to identify any lead orgs which should be added to the mappings so that their trials will be imported by the next run of the job. A second mapping table usage can be established where we can record the names which CIAT has determined are not cooperative groups (by using the "unmappable" flag) so that CIAT won't have to keep reviewing the same names over and over.

To assist you in thinking over the merits of this approach I have attached a list of the lead org names which don't match any of the ones supplied in comment #28. I think you'll agree that some of the names in the attachment corroborate the warnings Lakshmi and I gave about the dangers of relying on string matching to identify which trials should be imported.

Comment entered 2011-03-21 11:22:47 by Kline, Bob (NIH/NCI) [C]

Attachment unmatched-org-names.txt has been added with description: lead org names not found in William's list of cooperative groups

Comment entered 2011-03-21 11:36:48 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2011-03-21 11:36:48
BZCOMMENTOR::Bob Kline
BZCOMMENT::33

(In reply to comment #32)

> ... A second mapping table usage can be established where we can record
> the names which CIAT has determined are not cooperative groups (by using
> the "unmappable" flag) so that CIAT won't have to keep reviewing the same
> names over and over.

A better, less convoluted approach would be to have CIAT add mappings for the CTRP IDs for the non-cooperative-group organizations. The import will know to skip over their trials because their Organization documents will not have the organization type which identifies them as cooperative groups. Not only does this approach avoid the odd creation of an external mapping usage whose sole purpose is to record unmappable strings, but it also reduces the workload for CIAT, who won't have to deal with variations of the organizations' names once mappings have been entered for the CTRP IDs for those organizations, no matter how many ways CTRP alters the organization name strings.

Comment entered 2011-03-21 12:06:48 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2011-03-21 12:06:48
BZCOMMENTOR::Bob Kline
BZCOMMENT::34

If you think my proposal is worth pursuing, you might want to take the attached list of CTRP IDs for the lead orgs found in their trials and add them to the mapping table on Franck with the usage CTRP_PO_ID. Then I'll modify the import script to skip over trials whose lead orgs don't have one of the organization type values you want to target.

When we're preparing to go live, I'll propagate the mappings to Bach for those organizations whose documents existed on Bach before the latest refresh of Franck, so you don't have to enter them twice.
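The filtering step proposed above can be sketched roughly as follows. This is a minimal illustration, not the import script: it assumes the CTRP_PO_ID mappings have been loaded into a dict from CTRP PO ID to CDR ID, that the organization type of each mapped Organization document has been loaded into a dict from CDR ID to type string, and the type value COOPERATIVE_GROUP is a hypothetical placeholder for whichever organization type values CIAT decides to target.

```python
from dataclasses import dataclass, field

# Hypothetical placeholder: substitute the organization type value(s) CIAT targets.
TARGET_ORG_TYPES = frozenset({"COOPERATIVE_GROUP"})


@dataclass
class LeadOrgFilterResult:
    imported: list = field(default_factory=list)  # lead org mapped to a target type
    skipped: list = field(default_factory=list)   # lead org mapped, but not a target type
    pending: list = field(default_factory=list)   # lead org's PO ID not mapped yet
    unmapped_po_ids: set = field(default_factory=set)  # for CIAT to review and map


def filter_by_lead_org(trials, po_id_map, org_types, target_types=TARGET_ORG_TYPES):
    """Partition (trial_id, lead_org_po_id) pairs by the lead org's mapped type.

    po_id_map: CTRP PO ID -> CDR ID (the CTRP_PO_ID external mappings).
    org_types: CDR ID -> organization type from the Organization document.
    The decision keys on the PO ID, not the name string, so variations in
    how CTRP spells an organization's name make no difference.
    """
    result = LeadOrgFilterResult()
    for trial_id, po_id in trials:
        cdr_id = po_id_map.get(po_id)
        if cdr_id is None:
            result.pending.append(trial_id)
            result.unmapped_po_ids.add(po_id)
        elif org_types.get(cdr_id) in target_types:
            result.imported.append(trial_id)
        else:
            result.skipped.append(trial_id)
    return result
```

Mapping the non-cooperative-group organizations as well (as suggested in comment #33) moves their trials from pending to skipped, so only genuinely new lead orgs end up on CIAT's review list.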

Comment entered 2011-03-21 12:06:48 by Kline, Bob (NIH/NCI) [C]

Attachment lead-orgs.txt has been added with description: CTRP IDs for lead orgs in sample trial documents

Comment entered 2011-03-21 12:15:00 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2011-03-21 12:15:00
BZCOMMENTOR::Bob Kline
BZCOMMENT::35

Here's a spreadsheet for the CTEP org IDs, in case you'd prefer to have me populate the external map table programmatically from values you enter in the third column of the sheet.

Comment entered 2011-03-21 12:15:00 by Kline, Bob (NIH/NCI) [C]

Attachment ctep-lead-orgs.xls has been added with description: Excel version of previous attachment

Comment entered 2011-03-22 11:19:58 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2011-03-22 11:19:58
BZCOMMENTOR::Bob Kline
BZCOMMENT::36

(In reply to comment #27)

> I propose that we remove the OrgName element from the (our) schema.

Done (also removed the creation of the element from the conversion software).

http://franck.nci.nih.gov/cgi-bin/cdr/GetSchema.py?id=694781

Comment entered 2011-03-22 12:01:25 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2011-03-22 12:01:25
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::37

In addition to one of the options you provided, can we also do some mapping or checks at the trial level? Initially, we know which cooperative group trials we expect from CTRP, at least by one of the trial IDs (it would be good if they echoed back the CDR IDs, at least for the trials). That way, if a trial is missing from the list, it will be easier to tell which one.

For new trials added going forward, we should receive them, assuming our mapping is correct and CTRP is providing the data we expect. But because we can't be absolutely sure about the data, it would be even better if we could continue to get from CTSU/RSS a list of the trials they are updating (using the existing service, with some modification, obviously). The idea is that if we are getting the trial, we can also be sure we are getting all of its sites, since they are all included in the protocol document.

I will let you know our choice of the two options you provided above.

Comment entered 2011-03-22 16:11:38 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2011-03-22 16:11:38
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::38

(In reply to comment #33)
> (In reply to comment #32)
>
> > ... A second mapping table usage can be established where we can record
> > the names which CIAT has determined are not cooperative groups (by using
> > the "unmappable" flag) so that CIAT won't have to keep reviewing the same
> > names over and over.
>
> A better, less convoluted approach would be to have CIAT add mappings for the
> CTRP IDs for the non-cooperative-group organizations. The import will know to
> skip over their trials because their Organization documents will not have the
> organization type which identifies them as cooperative groups. Not only does
> this approach avoid the odd creation of an external mapping usage whose sole
> purpose is to record unmappable strings, but it also reduces the workload for
> CIAT, who won't have to deal with variations of the organizations' names once
> mappings have been entered for the CTRP IDs for those organizations, no matter
> how many ways CTRP alters the organization name strings.

I like this approach, but how will the program know to skip a new trial whose lead organization is not a cooperative group and has not been mapped yet? If I understand correctly, it looks like we will have to keep updating the mapping table manually, since we will likely be receiving a lot of new non-cooperative-group trials.

Also, on the issue of getting existing cooperative groups trials from CTRP, we agreed in OCECDR-3252 comment #4 that we will import the CTRP document directly on top of the InScopeProtocol document. So, it seems the best solution will be for CTRP to echo back the CDR IDs of the cooperative groups.

Comment entered 2011-03-22 16:54:06 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2011-03-22 16:54:06
BZCOMMENTOR::Bob Kline
BZCOMMENT::39

If there's a finite set of organizations which are cooperative groups, and this set won't expand, then yes, we can either (a) make sure CTRP has CDR IDs for all of those organizations and echoes them back with the trial documents or (b) populate the external mapping table with the CTRP IDs for those organizations. Either way, we can ignore trials with lead_org elements which don't match one of the IDs in that fixed set.

If it's not true that the set of cooperative groups is fixed, then neither approach will prevent you from having to look at new organizations to see if they're a cooperative group.

Of course, you could try to convince CTRP to flag trials which are cooperative group trials. This would only be useful if you're convinced that CTRP will use the same definition of a cooperative group trial as you do.

Comment entered 2011-03-24 11:35:33 by Beckwith, Margaret (NIH/NCI) [E]

BZDATETIME::2011-03-24 11:35:33
BZCOMMENTOR::Margaret Beckwith
BZCOMMENT::40

I am absolutely sure that I am missing something in this discussion, especially since I missed the last meeting, but I have a question about this. It seems to me that the set of trials we are interested in getting from CTRP (at least for now) is just the trials that are updated through RSS, so that we can get the site information. Isn't there some way for CTRP to let us know which trials are being updated through RSS? Then it doesn't matter to us whether they are Coop. Group trials or not (even though they should all be). Like I said, I may be missing the main point here, but thought I would throw in my 2 cents' worth!

Comment entered 2011-03-25 10:20:28 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2011-03-25 10:20:28
BZCOMMENTOR::Bob Kline
BZCOMMENT::41

The next step will be to ask CTRP (in this afternoon's meeting) to identify the trials we should be importing, either by flagging trials as "RSS" trials or by flagging the ones which are cooperative group trials. Ideally, I would think, we should get them to do both if they have the information.

Comment entered 2011-03-29 08:23:11 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2011-03-29 08:23:11
BZCOMMENTOR::Bob Kline
BZCOMMENT::42

They're going to add to attributes to the root Element, one for flagging a cooperative group trial, and the other to indicate a trial updated by RSS.

Comment entered 2011-03-29 08:29:48 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2011-03-29 08:29:48
BZCOMMENTOR::Bob Kline
BZCOMMENT::43

(In reply to comment #42)
> They're going to add to attributes ....

They're going to add two attributes .... :-)

Comment entered 2011-04-01 11:44:20 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2011-04-01 11:44:20
BZCOMMENTOR::Bob Kline
BZCOMMENT::44

Following up on last week's meeting with CTRP I wrote up a proposed spec for how the criterion elements will use string formatting to convey structure:

1. A line whose first character is an asterisk (Unicode code point U+002A) represents a list item.
2. An unbroken sequence of two or more asterisks at the beginning of a line represents a list item with a nesting level conveyed by the number of asterisks in that sequence.
3. The appearance of a list item at a given nesting level is considered to close off lower levels of nesting; a nested list item (that is, whose line begins with two or more asterisks) must be preceded in the same list by a list item whose nesting level is one less than the current list item and whose nesting level has not been closed off.[1]
4. Whitespace characters (as defined by the Unicode specification) immediately following the asterisks used as list item delimiters, as well as trailing whitespace characters in a list item, are insignificant and may be discarded.
5. A list item must contain at least one non-whitespace character following the leading asterisk sequence and any optional whitespace characters.
6. Each line in the text content of a criterion element is terminated by a single line feed character (Unicode code point U+000A); this rule, in combination with the first rule, implies that a list item may not contain embedded line feed characters.
7. Any line following a list item (including a line which consists of a single line feed character, that is, an empty line) which does not start with an asterisk indicates the end of the list in which the list item is a member; this implies that list items cannot embed separate paragraphs, as is possible in XML markup for lists; a subsequent line which does begin with an asterisk represents a new list.
8. All occurrences of the character sequence carriage return, line feed (U+000D, U+000A) will be replaced by a single line feed character before the application of the rules above; similarly, a carriage return which is not followed by a line feed character will be replaced with a line feed character.

[1] In other words, this sequence is not allowed:

*Top
**Nested
***Subnested
*Another top-level list item
***Can't do this: the ** group has been closed off
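To make the rules above concrete, here is a minimal Python sketch of a validator for the proposed syntax. It is an illustration under stated assumptions, not code from CTRP or the CDR: the function and class names are hypothetical, and Python's str.strip() is used as an approximation of "whitespace as defined by the Unicode specification" (rule 4).

```python
from dataclasses import dataclass


@dataclass
class ListItem:
    level: int  # number of leading asterisks (rule 2)
    text: str   # item text, surrounding whitespace discarded (rule 4)


def parse_criterion(text):
    """Split criterion text into blocks: a plain str for each non-list line,
    or a list of ListItem objects for each run of list items.
    Raises ValueError for an empty item or an illegal nesting jump."""
    # Rule 8: CR LF becomes LF, then any remaining lone CR becomes LF.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    lines = text.split("\n")  # Rule 6: LF terminates each line.
    if lines and lines[-1] == "":
        lines.pop()  # the final LF terminates the last line; no extra line follows
    blocks = []
    current = None  # the list being built, or None when outside a list
    for line in lines:
        if not line.startswith("*"):
            # Rule 7: any line not starting with an asterisk ends the list.
            current = None
            blocks.append(line)
            continue
        body = line.lstrip("*")
        level = len(line) - len(body)
        item_text = body.strip()
        if not item_text:  # Rule 5
            raise ValueError(f"empty list item: {line!r}")
        if current is None:  # Rules 1 and 7: an asterisk line starts a new list
            current = []
            blocks.append(current)
        # Rule 3: after an item at level k, levels 1..k are open and deeper
        # levels are closed, so the next item may go at most one level deeper.
        open_depth = current[-1].level if current else 0
        if level > open_depth + 1:
            raise ValueError(f"level {level} item with no open level {level - 1}: {line!r}")
        current.append(ListItem(level, item_text))
    return blocks
```

For example, "*Top" followed by "***Too deep" raises ValueError, because no level-2 item is open when the level-3 item appears.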

Comment entered 2011-04-11 14:28:25 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2011-04-11 14:28:25
BZCOMMENTOR::Bob Kline
BZCOMMENT::45

(In reply to comment #44)
> Following up on last week's meeting with CTRP I wrote up a proposed spec for
> how the criterion elements will use string formatting to convey structure:...

It was decided that we can't use the spec developed last week, because CTRP told us they don't have control over whether that syntax will be used in all cases.

We also decided in last Friday's conference call that we will use the po_id values in the list of cooperative groups provided by Charles to identify which trials are cooperative group trials. They will identify someone at CTRP who will be responsible for sending us an email message whenever the composition of that list changes. For now I have told them to send the email messages to bkline@rksystems.com; should I have them also include another mailing list (such as pdqupdate@cancer.gov)? In the long run they will modify the export schema to include a top-level attribute explicitly identifying cooperative group trials as such, and they will populate the attribute appropriately.

Should I run a new conversion job on the last set of documents we got from them? If so, should I use the information we received from them about how to identify RSS trials or should I use the po_id values from the list Charles gave us for cooperative group lead orgs?

Comment entered 2011-04-12 15:00:29 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2011-04-12 15:00:29
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::46

(In reply to comment #45)
> (In reply to comment #44)
> We also decided in last Friday's conference call that we will use the po_id
> values in the list of cooperative groups provided by Charles to identify which
> trials are cooperative group trials. They will identify someone at CTRP who
> will be responsible for sending us an email message whenever the composition of
> that list changes. For now I have told them to send the email messages to
> bkline@rksystems.com; should I have them also include another mailing list
> (such as pdqupdate@cancer.gov)? In the long run they will modify the export

I think it is OK to include the pdqupdate address for now, but I am not sure that mailbox will be actively managed in the future, when PDQ stops processing new trials and stops updating existing trials, so it may be safer to include Margaret's address as well.

> schema to include a top-level attribute explicitly identifying cooperative
> group trials as such, and they will populate the attribute appropriately.
>
> Should I run a new conversion job on the last set of documents we got from
> them? If so, should I use the information we received from them about how to
> identify RSS trials or should I use the po_id values from the list Charles gave
> us for cooperative group lead orgs?
If I am reading your comments correctly, it looks like you decided in the conference call to use the po_id of the cooperative groups Charles provided to identify the trials. Given the options we have, I think that is the better approach, so I vote for using the po_ids of the cooperative groups to run the conversion on the last set of trials from CTRP.

Could you please post the cooperative groups Charles provided? Could you also post the information they provided about how to identify RSS trials?

Comment entered 2011-04-12 17:40:21 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2011-04-12 17:40:21
BZCOMMENTOR::Bob Kline
BZCOMMENT::47

[From email message sent by Charles Y. on 2011-04-07]

A few items to follow up on our last conversation:

1. Eligibility Criteria Formatting: for the most part, we are following the specs that you listed in your last note on the subject to import the eligibility criteria from PDQ. Please note that this format will be preserved as long as no changes are made to the eligibility criteria using the CTRP UI. CTRP currently does not have a way to enforce this format in the UI.

2. RSS Trials: the XML file contains trial_owners tags. If one of the owners is equal to “CTEPRSS RSS”, then the participating sites data for that trial came from RSS. See the attached example, NCI-2009-00290.

3. Cooperative Groups Trials: Those trials can be identified by interrogating the lead_org tag in the xml file. If the lead organization is one of the organizations listed below, the trial is a “Cooperative Group” trial.

Cooperative Group Name (CTEP_ID PO_ID)

American College of Radiology Imaging Network (ACRIN 24037)
American College of Surgeons Oncology Trials Group (ACOSOG 23977)
Cancer and Leukemia Group B (CALGB 56231)
Cancer and Leukemia Group B (CALGB) Research Base (RSB016 214184)
Children’s Oncology Group (COG 60948)
Eastern Cooperative Oncology Group (ECOG 68593)
Eastern Cooperative Oncology Group (ECOG) Research Base (RSB004 213880)
European Organization for Research and Treatment of Cancer (EORTC 69348)
Gynecologic Oncology Group (GOG 88642)
Gynecologic Oncology Group (GOG) Research Base (RSB006 213941)
National Cancer Institute of Canada Clinical Trials Group (NCIC 154526)
National Surgical Adjuvant Breast and Bowel Project (NSABP) Research Base (RSB015 214153)
National Surgical Adjuvant Breast and Bowel Project (NSABP 168015)
North Central Cancer Treatment Group (NCCTG 154345)
North Central Cancer Treatment Group (NCCTG) Research Base (RSB010 214002)
Radiation Therapy Oncology Group (RTOG 214245)
Radiation Therapy Oncology Group (RTOG) Research Base (RSB012 214062)
Southwest Oncology Group (SWOG 220068)
Southwest Oncology Group (SWOG) Research Base (RSB013 214092)
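The two selection rules in Charles's message can be sketched with Python's standard xml.etree.ElementTree module. The element layout is an assumption: the message names only the trial_owners and lead_org tags, so the sketch compares every text chunk inside any trial_owners element against the owner string, and it assumes (hypothetically) that the lead organization's PO ID sits in a po_id child of lead_org. The PO IDs are the ones listed above.

```python
import xml.etree.ElementTree as ET

RSS_OWNER = "CTEPRSS RSS"

# PO IDs from the cooperative group list in Charles's message (2011-04-07).
COOP_GROUP_PO_IDS = frozenset({
    "24037", "23977", "56231", "214184", "60948", "68593", "213880",
    "69348", "88642", "213941", "154526", "214153", "168015", "154345",
    "214002", "214245", "214062", "220068", "214092",
})


def is_rss_trial(root):
    """True if any trial owner is "CTEPRSS RSS" (item 2 of the message)."""
    for owners in root.iter("trial_owners"):
        # itertext() covers both <trial_owners>name</trial_owners> and owner
        # names held in child elements; the real layout is unconfirmed.
        if any(chunk.strip() == RSS_OWNER for chunk in owners.itertext()):
            return True
    return False


def is_coop_group_trial(root):
    """True if the lead org's PO ID is on the list (item 3 of the message).

    Assumes a hypothetical lead_org/po_id layout.
    """
    po_id = root.findtext(".//lead_org/po_id")
    return po_id is not None and po_id.strip() in COOP_GROUP_PO_IDS
```

Keying on PO IDs rather than names sidesteps name-string variations, but as the list shows, each research base has its own PO ID and has to be included or excluded deliberately.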

Comment entered 2011-04-13 11:54:22 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2011-04-13 11:54:22
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::48

> If I am reading your comments correctly, it looks like you decided in the
> conference call to use the po_id of the cooperative groups Charles provided to
> identify the trials. Given the options we have, I think that is a better
> approach so I vote for using the po_ids for the cooperative groups to run the
> conversion on the last set of trials from CTRP.
>
> Could you please post the cooperative groups Charles provided? Could you also
> post the information they provided about how to identify RSS trials?

Looking at the list of cooperative groups provided by CTRP, it includes research bases for the cooperative groups, for example Gynecologic Oncology Group (GOG) Research Base (RSB006 213941). In PDQ we do not have a separate organization record for this research base, or for many, if not all, of the other cooperative group research bases. Besides, all RSS updates for the Gynecologic Oncology Group (GOG) are directed to trials under the main group, not the research base. It seems to me that using the po_id alone may not be a good approach, since we will likely be including trials that are not updated by RSS. So I am changing my vote to using the “CTEPRSS RSS” designation.

Comment entered 2011-04-14 08:12:47 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2011-04-14 08:12:47
BZCOMMENTOR::Bob Kline
BZCOMMENT::49

(In reply to comment #48)

> ... So I am changing my vote to using the “CTEPRSS RSS” designation.

Well, I had already implemented the other approach, so you get to determine by inspection which approach produces better results. Using the RSS approach gets almost the same documents as the previous approach of converting all of the trials they gave us (726 of the 751 documents). Selecting the ones with a lead org on Charles's list of cooperative groups gets a much smaller set (422 documents). I have attached a list showing which trials showed up in only one of the sets. To review the sets separately:

http://bach.nci.nih.gov/cgi-bin/cdr/view-converted-ctrp-docs.py?set=cg
(for the cooperative group selection method)

http://bach.nci.nih.gov/cgi-bin/cdr/view-converted-ctrp-docs.py?set=rss
(for the RSS method).
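A diff like the attached set-diffs can be produced with plain set operations. This is a minimal sketch assuming each selection method yields a collection of trial IDs; the function name and report keys are hypothetical, not the script that generated the attachment.

```python
def compare_selections(cg_ids, rss_ids):
    """Report the trials selected by only one of the two methods."""
    cg, rss = set(cg_ids), set(rss_ids)
    return {
        "cooperative_group_only": sorted(cg - rss),
        "rss_only": sorted(rss - cg),
        "in_both": len(cg & rss),
    }
```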

Comment entered 2011-04-14 08:12:47 by Kline, Bob (NIH/NCI) [C]

Attachment set-diffs has been added with description: Differences between the two selection methods

Comment entered 2011-04-14 17:49:29 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2011-04-14 17:49:29
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::50

(In reply to comment #49)
> http://bach.nci.nih.gov/cgi-bin/cdr/view-converted-ctrp-docs.py?set=cg
> (for the cooperative group selection method)
>
> http://bach.nci.nih.gov/cgi-bin/cdr/view-converted-ctrp-docs.py?set=rss
> (for the RSS method).

I have reviewed many of the trials and here are my observations:
Using the cooperative group approach appears to have worked well in that it brings in most, if not all, of the cooperative group trials. However, as we have already pointed out, not all cooperative groups update their trials through RSS, and a group like COG updates only some of its trials through RSS, so if we use this method we will likely retrieve trials that are not updated through RSS in addition to the ones that are. For example, I identified at least 35 COG trials on this list, but I have no way of knowing whether those trials are updated by RSS or by the COG service, and 35 (possibly more) appears to be too high a number for the trials COG updates through RSS.

Using the "CTEPRSS RSS" designation is the desired approach, but this method also brings in a lot of trials that are neither RSS-updated trials nor cooperative group trials. I see that some of the NCI trials that we now get from clinicaltrials.gov have been assigned "CTEPRSS RSS". I suspect that CTRP is marking all CTEP trials as being updated by CTEP RSS. Examples are below:

NCI-2009-00002 [CDR0000068270] - This trial is an NCI Clinical Center trial, not a cooperative group trial, and it is not updated through RSS. We actually now get this trial through CTGOV, so it will probably drop off when Bob gives them a new set of data, but the question remains why it is tagged as an RSS-updated trial.

NCI-2009-00011 [CDR0000067889] - This is another NCI trial that is not updated by RSS but is marked as a CTEPRSS RSS trial by CTRP.

There is also another set of trials that should have been marked as CTEPRSS RSS trials but are not. These are among the trials in the diff report. Examples:

NCI-2009-01172 - [CALGB-40603 - CDR0000636850] - This is a CALGB trial that is currently updated by RSS in PDQ but CTRP has not tagged it as CTEPRSS RSS.

NCI-2009-01178 - [NCCTG-N0745 CDR0000637866] - Same as the trial above.

In conclusion, it looks like this is a data-error issue and not a problem with the approach or method we are using to identify the trials. Also, while we still want to use the “CTEPRSS RSS” approach, if CTRP does not tag trials correctly, we will either miss trials or import trials that we really shouldn't be importing.

Comment entered 2011-04-15 10:02:41 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2011-04-15 10:02:41
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::51

I am wondering if we need a vendor filter issue for the CTRP trials. We have other issues related to the CTRP trials but there is none for the vendor filter. If we need one, I will create a new issue for that.

Comment entered 2011-04-15 12:21:30 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2011-04-15 12:21:30
BZCOMMENTOR::Bob Kline
BZCOMMENT::52

Yes, we will need a vendor filter.

Comment entered 2011-04-15 14:46:17 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2011-04-15 14:46:17
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::53

(In reply to comment #52)
> Yes, we will need a vendor filter.

Created OCECDR-3341 for this.

Comment entered 2011-05-05 10:30:38 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2011-05-05 10:30:38
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::54

We discussed in one of the CDR meetings whether we should be importing Completed (and closed trials as new trials?) trials from CTRP and Margaret was going to discuss this with Lakshmi.

Comment entered 2011-05-06 11:31:52 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2011-05-06 11:31:52
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::55

(In reply to comment #54)
> We discussed in one of the CDR meetings whether we should be importing
> Completed (and closed trials as new trials?) trials from CTRP and Margaret was
> going to discuss this with Lakshmi.

Margaret mentioned in yesterday's meeting that we should not import new trials which have a status of Closed or Completed.
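One reading of that rule, sketched minimally: the status check applies only to trials not already in the CDR, so updates to existing trials still flow through. That interpretation is an assumption, as are the status strings; CTRP's actual status vocabulary (and how "Closed" maps onto it) would need to be confirmed.

```python
# Hypothetical status values; confirm against CTRP's actual vocabulary.
EXCLUDED_NEW_TRIAL_STATUSES = frozenset({"closed", "completed"})


def should_import(status, already_in_cdr):
    """Apply the no-new-Closed/Completed-trials rule to one CTRP trial.

    Trials already in the CDR are always imported (so their updates are
    kept); new trials are skipped if their status is Closed or Completed.
    """
    if already_in_cdr:
        return True
    return status.strip().lower() not in EXCLUDED_NEW_TRIAL_STATUSES
```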

Comment entered 2011-05-20 14:34:18 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2011-05-20 14:34:18
BZCOMMENTOR::Bob Kline
BZCOMMENT::56

On the CTRP phone call today I reminded Charles that we're still blocked in our efforts to finalize the schema by the need for a reliable method for determining which trials should be imported and reviewed with the proposed schema. Charles wondered if the cooperative group approach to selecting the trials was acceptable, except that it wasn't picking up all the trials you expected to see (which seems to be in alignment with what you said in comment #50). He said that if that's true then it's likely we're just not seeing some of the trials because they haven't imported everything we've sent them yet. Is the absence of those trials an obstacle to proceeding with the work on determining what should be in the schema? If so, I need you to help me formulate an explanation of why we're stuck that I can pass on to Charles.

Comment entered 2011-05-23 10:01:39 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2011-05-23 10:01:39
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::57

(In reply to comment #56)
Actually, what I found from my review was that with the cooperative group approach we were getting slightly more trials than expected (compared with the CTEP RSS approach, which retrieved significantly more than expected). So with this approach, barring data issues such as CTRP not tagging the trials appropriately for each cooperative group, we will get all the trials we need (RSS trials), but we will also get cooperative group trials that are not necessarily RSS-updated trials.

If I am not mistaken, the goal is to get only trials that are updated through the RSS service and not just all cooperative group trials. However, because a cooperative group like COG does not update the bulk of its trials through the RSS service, we will be retrieving all its trials irrespective of their mode of update, if we use the cooperative group approach.

The second issue with using the cooperative group approach is that the cooperative group list Charles provided (comment #47) included groups that typically do not update their trials through RSS. For example, the National Cancer Institute of Canada Clinical Trials Group (NCIC 154526) does not update its trials through RSS; the cooperative group approach retrieved MA.17 (CDR0000078629), but this trial is not updated through RSS in PDQ. We could easily fix this by excluding the National Cancer Institute of Canada from the list, but if one of their trials does get updated through RSS, we will miss that trial.

At this point, I think what we need is a better way of identifying only trials updated through RSS, and this is true for both approaches.
If we can't get CTRP to provide a better way of identifying only RSS trials, then, as I suggested in comment #37, we could continue to use the existing RSS service, this time not to update the trials but to verify that we are getting all the trials we need and only the trials we need. If we find any differences, we can ask CTRP to make the appropriate changes. Getting the information from the source would ensure that we are getting exactly what we need.

Comment entered 2011-06-08 10:11:07 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2011-06-08 10:11:07
BZCOMMENTOR::Bob Kline
BZCOMMENT::58

OK, Charles gave me a list of CTEP IDs for trials which he said they received from the RSS team. I extracted NCT IDs and CTEP IDs from all of our published InScopeProtocol documents, and used the NCT IDs from the trials we got from CTRP to find the CDR ID of the corresponding InScopeProtocol document, which in turn gave me that document's CTEP ID. If that CTEP ID was in the set of IDs I just got from Charles I included the trial in a report which you can review at http://bach.nci.nih.gov/ctrp-rss-trials.html . Please take a look and let me know if this is close enough to what you would expect in order to have me convert that subset of the trials for the purpose of verifying whether the proposed schema is satisfactory.
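The matching chain described above (NCT ID to CDR ID to CTEP ID, then membership in the list from Charles) can be sketched as below. It assumes the two lookups have already been extracted from the published InScopeProtocol documents into dicts; the function name is hypothetical. A trial that fails at any link lands in the unmatched list with whatever CTEP ID was found, or None.

```python
def match_rss_trials(ctrp_trials, nct_to_cdr, cdr_to_ctep, rss_ctep_ids):
    """Split CTRP trials into those on the RSS list and those that aren't.

    ctrp_trials: iterable of (file_name, nct_id) from the CTRP batch.
    nct_to_cdr: NCT ID -> CDR ID of the published InScopeProtocol document.
    cdr_to_ctep: CDR ID -> CTEP ID from that same document.
    rss_ctep_ids: CTEP IDs the RSS team reported to CTRP.
    """
    matched, unmatched = [], []
    for file_name, nct_id in ctrp_trials:
        cdr_id = nct_to_cdr.get(nct_id)
        ctep_id = cdr_to_ctep.get(cdr_id) if cdr_id is not None else None
        if ctep_id is not None and ctep_id in rss_ctep_ids:
            matched.append((file_name, ctep_id))
        else:
            # ctep_id is None if either link in the chain is missing.
            unmatched.append((file_name, ctep_id))
    return matched, unmatched
```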

Comment entered 2011-06-08 13:52:48 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2011-06-08 13:52:48
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::59

I reviewed some of the trials on the report and also did a quick scan of all the IDs on the report and it appears to me that all the trials on the report would be ones that we expect to be updated by RSS. All the ones I reviewed did have an update mode of RSS so it looks like we are getting the right trials.

Also, I just wanted to mention that the total number of trials in this report (243) is less than half of what is currently in the manifest file (552 for today's update). Maybe this does not matter at this point.

Comment entered 2011-06-15 15:54:38 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2011-06-15 15:54:38
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::60

Since I don't have access to the manifest file, I used the most recent RSS updates from the import/update statistics report to find trials that RSS is currently updating in PDQ but are not on the list from the link you provided in comment #58. I found the following trials:

CDR0000288823 RTOG-0232
CDR0000475774 E1505
CDR0000597665 NCCTG-N07C2
CDR0000631962 NCCTG-N08C1
CDR0000637947 CALGB-70604
CDR0000640898 E1508
CDR0000641815 RTOG-0539
CDR0000643361 N0871
CDR0000646724 N0849
CDR0000647146 RTOG-0831
CDR0000649174 CALGB-90601
CDR0000654472 GOG-0086P
CDR0000683076 NSABP-C-11

Comment entered 2011-06-24 12:52:30 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2011-06-24 12:52:30
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::61

We agreed that CIAT will continue the process of validating the schema and not wait until CTRP accurately identifies the set of RSS trials we expect to get. I believe the next step is for Bob to reconvert the trials based on the new method of identifying RSS trials. Also, since we have been testing and tracking the creation of new organization and person records in this issue (comment #21 onwards), we may want to reconvert the organizations and persons based on the sites in this new set of RSS trials.

Comment entered 2011-08-04 15:40:19 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2011-08-04 15:40:19
BZCOMMENTOR::Bob Kline
BZCOMMENT::62

The review process for approving this schema has stretched on longer than anticipated, with the result that database refreshes wipe out the draft schema between CIAT visits. Therefore I decided to install the schema on Bach so it will remain available indefinitely.

http://bach.nci.nih.gov/cgi-bin/cdr/GetSchema.py?id=708123

Comment entered 2011-08-08 10:09:16 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2011-08-08 10:09:16
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::63

There seems to be a problem with the links to the converted files. They bring up a page with just the heading and no trials are displayed.

Comment entered 2011-08-08 12:01:16 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2011-08-08 12:01:16
BZCOMMENTOR::Bob Kline
BZCOMMENT::64

(In reply to comment #63)

> There seems to be a problem with the links to the converted files. They bring
> up a page with just the heading and no trials are displayed.

That's because the server was refreshed between the time the job was run and the review. I will generate a new set as soon as we get an answer back from CTRP about why most of the trials are missing from the most recent batches.

Comment entered 2011-08-23 08:31:26 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2011-08-23 08:31:26
BZCOMMENTOR::Bob Kline
BZCOMMENT::65

The approach we're using to identify which trials to import isn't producing very encouraging results. I just ran a conversion job on the latest set from CTRP, which had an already reduced number of trials (because they're only giving us abstracted trials now), and of those only 6 were identifiable as RSS trials. To start with, another half-dozen or so trials are unparseable because the top of the document starts with non-XML along these lines:

Exception in generating PDQ XML for Study 245346
cvc-complex-type.2.4.b: The content of element 'secondary_id' is not complete. One of '{id_type}' is expected.

For most of the ones which are parseable, we can't find a CTEP ID that's on the spreadsheet they gave us, and for most of those, it's because we can't find a CTEP ID at all. That's probably because they're no longer InScopeProtocols in the CDR.

I've attached a list of the trials I could parse but couldn't identify as RSS trials. Each line in the list represents a single trial, with the name of the file from CTRP followed by the CTEP ID we were able to map (if any) by following the chain NCT ID -> CDR ID -> CTEP ID.
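The identification chain described above (NCT ID -> CDR ID -> CTEP ID, followed by a lookup in the RSS spreadsheet) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual conversion job: the lookup tables, the `classify_trial` and `tally` functions, and the `nct_id` element name are all hypothetical, and the bucket names simply mirror the categories used when summarizing these results for CTRP.

```python
from collections import Counter
from typing import AbstractSet, Iterable, Mapping
import xml.etree.ElementTree as ET


def classify_trial(
    ctrp_xml: str,
    nct_to_cdr: Mapping[str, int],    # hypothetical: NCT ID -> CDR ID of an InScopeProtocol doc
    cdr_to_ctep: Mapping[int, str],   # hypothetical: CDR ID -> CTEP ID
    rss_ctep_ids: AbstractSet[str],   # CTEP IDs taken from the RSS spreadsheet
) -> str:
    """Return the bucket a single CTRP trial document falls into."""
    try:
        root = ET.fromstring(ctrp_xml)
    except ET.ParseError:
        # e.g. a document that begins with an exception message instead of XML
        return "malformed XML"
    # Assumption: the NCT ID is stored in an element named nct_id somewhere in the doc.
    nct_id = (root.findtext(".//nct_id") or "").strip()
    if not nct_id:
        return "no NCT ID"
    cdr_id = nct_to_cdr.get(nct_id)
    if cdr_id is None:
        return "NCT ID not in an InScopeProtocol"
    ctep_id = cdr_to_ctep.get(cdr_id)
    if ctep_id is None:
        return "no CTEP ID"
    if ctep_id not in rss_ctep_ids:
        return "CTEP ID not on RSS list"
    return "importable"


def tally(docs: Iterable[str], nct_to_cdr, cdr_to_ctep, rss_ctep_ids) -> Counter:
    """Count how many documents land in each bucket."""
    return Counter(
        classify_trial(doc, nct_to_cdr, cdr_to_ctep, rss_ctep_ids) for doc in docs
    )
```

Each failure exits at the first broken link in the chain, so every document lands in exactly one bucket and the bucket counts always add up to the size of the batch.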

I think we're going to need to start a conversation about an alternate method for identifying which trials to import from CTRP.

Comment entered 2011-08-23 08:31:26 by Kline, Bob (NIH/NCI) [C]

Attachment not-rss has been added with description: Trials we can't identify as RSS trials

Comment entered 2011-08-23 12:44:05 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2011-08-23 12:44:05
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::66

As discussed in last week's CDR meeting, we would like the following changes to the schema. The first 7 items pertain to moving elements and blocks around to help in processing and workflow management. #8 is about removing two values that we no longer use and that have recently been removed from the InScopeProtocol and CTGovProtocol schemas. #10 is a question which we probably need to discuss on Thursday, and #11 pertains to a duplicate value.

1. Move the ProtocolProcessingDetails block to in front of the “BriefSummary” element.
2. Move the “ArmsOrGroups” block to after the “DetailedDescription” element.
3. Move the “CTEnrollment” element to after the “ArmsOrGroups” block.
4. Move the “CTStudyDesign” block to after the “Eligibility” element.
5. Move the “Outcomes” block to after the “StudyDesign” block.
6. Move the CTLocation block to between the PDQIndexing and VerificationDate elements.
7. Move the “NumberOfArms” element from the CTStudyDesign block into the “ArmsOrGroups” block. It could be the very first element in the block.
8. Remove “Psychosocial” and “Methods Development” from the list of values for the StudyCategoryName element.
9. Does CTRP capture Last Changed Date or Date Last Modified information? If they do, we would like to have that information included in the data they send us.
10. Under PDQSponsorship, you have the following question: “<!-- XXX Uncertain whether we'll need this -->”. It looks like we are going to need this, because the PDQSponsorship data determines whether a trial is marked as NCI sponsored on Cancer.gov (Volker may have to confirm this).
11. 'NIGMS' is listed twice in the list of values for the PDQSponsorship element.

Comment entered 2011-08-25 12:12:05 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2011-08-25 12:12:05
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::67

(In reply to comment #65)
> For most of the ones which are parseable, we can't find a CTEP ID that's on the
> spreadsheet they gave us, and for most of those, it's because we can't find a
> CTEP ID at all. That's probably because they're no longer InScopeProtocols in
> the CDR.
> I've attached a list of the trials I could parse but couldn't identify as RSS
> trials. Each line in the list represents a single trial, with the name of the
> file from CTRP followed by the CTEP ID we were able to map (if any) by
> following the chain NCT ID -> CDR ID -> CTEP ID.
> I think we're going to need to start a conversation about an alternate method
> for identifying which trials to import from CTRP.

Could it be that CTRP is not sending us RSS trials?
I have reviewed the attached list. Some of the ones you found a match for are certainly RSS trials. They are:
AHOD0521
GOG-0243
GOG-0188
GOG-0087M
GOG-0076EE
CALGB-10001
ANHL0131
AHOD0431
AEWS07P1
AEWS0521

However, none of the above is found in the earlier conversion you ran in comment #58 (which produced 243 trials which were all RSS trials).
Also, I looked up others that you found a match for that didn't look like RSS trials. Some of the CTEP IDs are:
5876
5965
6082

These CTEP IDs correspond to trials that are not cooperative group trials and, for that matter, not RSS trials:
5876: CDR0000363562; MAYO-MC0261; Mayo Clinic Cancer Center
5965: CDR304460; UCCRC-12209B; University of Chicago Cancer Research Center
6082: CDR301644; UCSD-040749; Rebecca and John Moores UCSD Cancer Center

This leads me to believe that CTRP is not providing you with RSS trials and this may be why you are not able to identify the RSS trials.

Comment entered 2011-08-25 12:52:22 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2011-08-25 12:52:22
BZCOMMENTOR::Bob Kline
BZCOMMENT::68

Bob will create a summary email message for Charles explaining the current state of this task, noting that 6 trials is not what we expected to import.

Comment entered 2011-08-30 10:03:57 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2011-08-30 10:03:57
BZCOMMENTOR::Bob Kline
BZCOMMENT::69

(In reply to comment #68)
> Bob will create a summary email message for Charles explaining the current
> state of this task, noting that 6 trials is not what we expected to import.

Here's what I sent him:

Charles:

It doesn't look as if the approach we settled on for identifying which trials to import from CTRP into PDQ is working as well as we had hoped. The set I most recently retrieved to test the logic contained a total of 293 documents. Only six of them were successfully identified as importable. Of the remaining 287 documents:

6 were malformed XML
21 contained no NCT ID
222 contained no NCT ID which could be found in a CDR InScopeProtocol document
38 mapped to a CTEP ID which was not found on the list of RSS trials you gave us

It hardly seems worthwhile to build an elaborate set of tools to import six documents. Perhaps we need to revisit the topic of how we identify RSS trials we should be importing. It may be that we're stuck until CTRP is ready to embed the information about which trials are RSS trials within the exported documents themselves.

Comment entered 2011-09-06 08:36:30 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2011-09-06 08:36:30
BZCOMMENTOR::Bob Kline
BZCOMMENT::70

Here is an interface you can use to view the converted documents for the six trials we are able to identify as RSS trials using the current method. The conversion was performed on Franck, and Person and Organization documents were created on that server for persons and organizations not found in the CTRP mapping table (which will be everything for the initial imports unless we seed the table in advance). At some point earlier in the project I recall that CIAT was having second thoughts about the approach of having the import job create new person and organization documents (since that is likely to result in multiple documents for the same person or organization), but I don't think this discussion was ever completed.

http://bach.nci.nih.gov/cgi-bin/cdr/view-converted-ctrp-docs.py?set=rss

Comment entered 2011-09-06 11:01:15 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2011-09-06 11:01:15
BZCOMMENTOR::Bob Kline
BZCOMMENT::71

I have asked Charles if it would be possible to add a LastModified date element to the trial documents exported by CTRP, to record when the last significant change was made to each document. Do you want to provide specific guidance about what "significant" means in this context, or would you prefer to leave it to the judgement of the staff maintaining the CTRP documents?

Comment entered 2011-09-08 11:28:14 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2011-09-08 11:28:14
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::72

(In reply to comment #70)
> Here is an interface you can use to view the converted documents for the six
> trials we are able to identify as RSS trials using the current method.

It appears the schema changes we requested in comment #66 have not been done yet. It would be good for us to have the new documents reflect the proposed schema changes.

> At some point earlier in the project I recall that CIAT
> was having second thoughts about the approach of having the import job create
> new person and organization documents (since that is likely to result in
> multiple documents for the same person or organization), but I don't think this
> discussion was ever completed.
>
That is correct. What we suggested was an interface which will allow us to determine whether a new person or organization is a duplicate. For example, the current conversion on Franck created 5 new person records. However, when I did a duplicate search, there were existing records for each of them that appeared to be duplicates.

Comment entered 2011-09-08 11:46:17 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2011-09-08 11:46:17
BZCOMMENTOR::Bob Kline
BZCOMMENT::73

Schema modifications installed on Franck:

http://franck.nci.nih.gov/cgi-bin/cdr/GetSchema.py?id=706082

Please review. After I get confirmation that this is what you want I'll modify the conversion software to conform with the new version of the schema.

Comment entered 2011-09-12 10:55:45 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2011-09-12 10:55:45
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::74

(In reply to comment #73)
> Schema modifications installed on Franck:
>
> http://franck.nci.nih.gov/cgi-bin/cdr/GetSchema.py?id=706082
>
> Please review. After I get confirmation that this is what you want I'll modify
> the conversion software to conform with the new version of the schema.

It looks like the refresh on Franck wiped out the schema you just installed.

Comment entered 2011-09-12 12:59:58 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2011-09-12 12:59:58
BZCOMMENTOR::Bob Kline
BZCOMMENT::75

I have restored the CTRP schema.

Comment entered 2011-09-12 13:13:24 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2011-09-12 13:13:24
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::76

(In reply to comment #75)
> I have restored the CTRP schema.

Is it still on Franck? I am still getting the python script error.

A problem occurred in a Python script.

d:\cdr\Log\tmptkpb9o.html contains the description of this error. d:\python\lib\cgitb.py:173: DeprecationWarning: BaseException.message has been deprecated as of Python 2.6 value = pydoc.html.repr(getattr(evalue, name))

Comment entered 2011-09-12 13:48:35 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2011-09-12 13:48:35
BZCOMMENTOR::Bob Kline
BZCOMMENT::77

Sorry, the ID changed.

http://franck.nci.nih.gov/cgi-bin/cdr/GetSchema.py?id=708123

Comment entered 2011-09-13 15:25:27 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2011-09-13 15:25:27
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::78

>4.Move the “CTStudyDesign” block to after the “Eligibility” element.
“Eligibility” above should have been “CTEligibility”.

>5.Move the “Outcomes” block to after “StudyDesign” block.
“StudyDesign” above should have been “CTStudyDesign”, and “Outcomes” should have been “CTOutcomes”.

Sorry about these.

Currently “CTStudyDesign” and “CTOutcomes” have been removed from the Document Type section of the schema perhaps because of the confusion about the names of the elements. It was not our intention to either rename or remove them. We just want to rearrange them.

Comment entered 2011-09-13 17:14:41 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2011-09-13 17:14:41
BZCOMMENTOR::Bob Kline
BZCOMMENT::79

(In reply to comment #78)

> It was not our intention to either rename or remove them. We just
> want to rearrange them.

Well, they were indeed rearranged (not removed), just not to where you thought they'd be. :-)

Schema modified again:

http://franck.nci.nih.gov/cgi-bin/cdr/GetSchema.py?id=708123

Comment entered 2011-09-14 14:05:18 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2011-09-14 14:05:18
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::80

(In reply to comment #79)

>
> Schema modified again:
>
> http://franck.nci.nih.gov/cgi-bin/cdr/GetSchema.py?id=708123

The schema is good to go. Please run the conversion based on this schema. There is only one element we are not so sure about, the SubGroups element; we will wait until we see data from CTRP to confirm its placement.

Comment entered 2011-09-14 15:21:32 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2011-09-14 15:21:32
BZCOMMENTOR::Bob Kline
BZCOMMENT::81

(In reply to comment #80)

> The schema is good to go. Please run the conversion based on this schema.

Done.

http://bach.nci.nih.gov/cgi-bin/cdr/view-converted-ctrp-docs.py?set=rss

Comment entered 2011-09-15 13:01:25 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2011-09-15 13:01:25
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::82

(In reply to comment #81)
> (In reply to comment #80)
>
> > The schema is good to go. Please run the conversion based on this schema.
>
> Done.
>
> http://bach.nci.nih.gov/cgi-bin/cdr/view-converted-ctrp-docs.py?set=rss

Out of the 8 converted trials, none matches any of 'our' RSS trials. Only one of them appears to be a cooperative group trial (for COG), but even that one is not currently marked as an RSS trial in PDQ. In fact, some of the 8 trials are now CTGov trials, so they are not trials we expect to receive from CTRP at this point.

**Bob, you mentioned that CTRP provided you with a list of trials that they have identified as RSS. Is it possible to attach it to this issue? We can review the list against the trials we currently have that are RSS trials.

Comment entered 2011-09-15 13:07:43 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2011-09-15 13:07:43
BZCOMMENTOR::Bob Kline
BZCOMMENT::83

(In reply to comment #82)

> **Bob, you mentioned that CTRP provided you with a list of trials that they
> have identified as RSS. Is it possible to attach it to this issue? We can
> review the list against the trials we currently have that are RSS trials.

Here's the spreadsheet Charles gave us on June 6 with the note "Attached is the list of RSS trials CTEP IDs we received from the RSS team."

Comment entered 2011-09-15 13:07:43 by Kline, Bob (NIH/NCI) [C]

Attachment RSS Trials.xlsx has been added with description: Spreadsheet from Charles

Comment entered 2011-09-16 10:32:17 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2011-09-16 10:32:17
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::84

The attached spreadsheet contains the 8 trials you converted. I added the CTEP IDs, NCT IDs and CDR IDs (among other columns). I found the corresponding PDQ trials by taking the NCT IDs in the converted files and searching the CDR. Each search retrieved one matching trial.

I also took the CTEP IDs in the PDQ trials and searched the spreadsheet provided by Charles (comment #83), expecting to see matching CTEP IDs. But I found only one matching CTEP ID from the list of 8 trials (AALL08P1). I couldn't find matching CTEP IDs for the remaining 7 trials.

There were three CTGOV trials from the list of 8 converted trials and I expected to see that all three of them would be converted trials but one of them appears NOT to be a converted/transferred trial (CDR0000585133). If I found the right trial in this case, and if it was indeed never converted, then it shouldn't have come up on the list of converted trials because the trial would not have been an InScopeProtocol in the past and so would not at any point have had a CTEP ID.

Could you please confirm that the approach I am using to find matching trials between what was converted and what Charles provided is the right approach?

Comment entered 2011-09-16 10:32:17 by Osei-Poku, William (NIH/NCI) [C]

Attachment CTRP _CONVERTED_TRIALS.xlsx has been added with description: CTRP converted trials

Comment entered 2011-09-16 10:44:09 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2011-09-16 10:44:09
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::85

(In reply to comment #83)
> Created attachment 2156
> Spreadsheet from Charles
>
> (In reply to comment #82)
>
> > **Bob, you mentioned that CTRP provided you with a list of trials that they
> > have identified as RSS. Is it possible to attach it to this issue? We can
> > review the list against the trials we currently have that are RSS trials.
>
> Here's the spreadsheet Charles gave us on June 6 with the note "Attached is the
> list of RSS trials CTEP IDs we received from the RSS team."

By looking at the IDs alone, I can confirm that at least 95% of them are RSS trials. Also, the total number, 559, is really close to the number of RSS trials we expect to get back from CTRP.

Comment entered 2011-10-04 09:18:18 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2011-10-04 09:18:18
BZCOMMENTOR::Bob Kline
BZCOMMENT::86

I think, in light of the recent decision to import the CTRP RSS trials as CTGovProtocols and append site information (and possibly other data at some point in the future) from CTRP to those documents, this task needs to be replaced by one for the appropriate modifications to the CTGovProtocol schema to accommodate the CTRP information to be pulled into the CTGovProtocol document. I assume we'll use the Location elements designed for the abandoned CTRPProtocol schema, which I recommend we wrap in a CTRPInfo block element (to centralize the portion of the CTGovProtocol document controlled by CTRP). I have no preference as to whether this tracking issue or a newly-created separate issue is used for the new task. Similar modifications will be needed for the existing CTRP CSS and vendor filter tasks.

Comment entered 2011-10-06 12:25:40 by Beckwith, Margaret (NIH/NCI) [E]

BZDATETIME::2011-10-06 12:25:40
BZCOMMENTOR::Margaret Beckwith
BZCOMMENT::87

I think we should use this issue just because it has the history of our decision making. So, I think what we need to do is to modify the schema for CT.gov trials to allow us to add RSS sites from trials that originate in CTRP. We need to create a CTRPInfo block of elements and use the Location elements designed for the abandoned CTRPProtocol schema. Does that summarize it? I also changed the issue summary title.

Comment entered 2011-10-25 14:37:03 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2011-10-25 14:37:03
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::88

I am attaching the spreadsheet containing the mapping provided by CTRP for the CTEP (including RSS) trials.

Comment entered 2011-10-25 14:37:03 by Osei-Poku, William (NIH/NCI) [C]

Attachment CTEP DCP Trials and identifiers 20111020.xlsx has been added with description: CDR - CTRP RSS Mapping

Comment entered 2011-10-25 14:56:19 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2011-10-25 14:56:19
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::89

Here are brief notes from last Friday's call:

1. We need to suppress email addresses before we go live with CTRP data. That is, until the email address error is corrected.
2. CTRP is in the process of changing SPONSOR information for NCI sponsored trials (Changing from CTEP to NCI).
3. RSS - CTRP Integration to be completed by mid November.
4. There were also discussions on the timeline for transfers but I don't remember that there was an agreement on when it would be completed.

Comment entered 2011-10-26 15:14:12 by Beckwith, Margaret (NIH/NCI) [E]

BZDATETIME::2011-10-26 15:14:12
BZCOMMENTOR::Margaret Beckwith
BZCOMMENT::90

Upped priority.

Comment entered 2011-10-27 08:44:36 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2011-10-27 08:44:36
BZCOMMENTOR::Bob Kline
BZCOMMENT::91

New CTGovProtocol schema installed on Mahler. Ready for user review.

http://mahler.nci.nih.gov/cgi-bin/cdr/GetSchema.py?id=349686

Comment entered 2011-10-31 12:12:16 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2011-10-31 12:12:16
BZCOMMENTOR::Bob Kline
BZCOMMENT::92

Comment entered 2011-10-31 12:12:16 by Kline, Bob (NIH/NCI) [C]

Attachment mappings.xls has been added with description: New workbook created for William

Comment entered 2011-11-14 16:46:23 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2011-11-14 16:46:23
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::93

I have finished reviewing the spreadsheet (attached). As you know already, this spreadsheet is a merger of spreadsheet1 (list of RSS trials provided by Charles) and spreadsheet2 (mappings provided by Sulekha). I sorted the spreadsheet based on column G (Import Sites) and compared the total number of trials that have been marked for import (X) with the list of RSS trials provided by Charles. The numbers matched: 459. I also reviewed the IDs in column C (CTEP) to see whether they correspond to cooperative group trials or trials updated by RSS, verifying some of them in the CDR. All of the trials marked for import in this spreadsheet (except 6 highlighted with a blue background) are currently cooperative group trials or RSS trials in PDQ. I reviewed these 6 trials that are not cooperative group trials in PDQ and found that they are currently not being updated by RSS. Therefore, CTRP needs to confirm whether they are RSS trials or were marked in error. For the rest of the trials on the spreadsheet, I again reviewed the ID column (CTEP) to see if there are trials that should be marked as RSS trials but have not been marked as such. There were approximately 100 Approved-not yet active, Active and Temporarily Closed trials that were not marked for import as RSS trials (highlighted with a red background). I retrieved all of these trials in PDQ/CDR to verify their statuses and recorded some of the statuses in a new column I created (Comment). Going by the RSS spreadsheet (spreadsheet1) that CTRP (Charles) provided, we shouldn't import sites from CTRP for them, but there is no indication that they are not RSS trials. We need to know from CTRP why these trials are not marked as RSS trials.
Further checks also revealed many other trials (approximately 110) that are currently Closed or Completed in PDQ and were also not marked as RSS trials on the spreadsheet CTRP provided; they are highlighted with a lighter red background, which looks more like brown on the spreadsheet. Generally it should be okay to exclude these trials from the import. Because they are either closed or completed, excluding them wouldn't make a difference, since in most cases RSS drops all the sites from a closed or completed trial. However, there is no reason not to continue to mark them as RSS trials regardless of the status of the trial. There was one trial (highlighted in yellow) that should have been marked as RSS, but it does not have any CDR ID on the spreadsheet; according to Charles, rows without CDR IDs are for trials that were included in an initial export but were later dropped from subsequent exports. The trial is closed, but I don't see why we would not include it in the export. We should investigate this one. Lastly, the cooperative group that posed a problem during this review is COG. COG updates some of their trials through their own service and others through RSS. Unlike the other cooperative groups, it is difficult to determine which of their trials are updated by RSS and which ones are not.

Legend and summary of questions:
GREEN background - These trials are good to go. They are cooperative group trials that appear to have been correctly marked for RSS updates.
BLUE background - These are non-cooperative group trials that have been marked for RSS updates. However, in PDQ, these trials are not updated by the RSS service. Will they start updates after switching the service?
WHITE background - These trials are also good to go. They are mostly non-cooperative group trials or cooperative group trials that are withdrawn or blocked from publication. Generally, we don’t have to do anything with these trials for now.
LIGHT RED/BROWN background - RSS trials that are mostly closed or completed but were not marked for RSS updates. Some of them are currently being updated by the RSS service in PDQ. Would updates stop after the switch?
RED background - These are mostly Approved-not yet active, Active, Temporarily closed trials, some of which are receiving active updates from the RSS service. Why were they not marked for RSS updates?
YELLOW background – Only one trial – It is closed in PDQ and it does not have a corresponding CDR number. Why did we drop it from our export to CTRP?

Bob:
In last week's meeting, you suggested that the attachment should go into a particular issue, but I can't remember which issue you were referring to.

Comment entered 2011-11-14 16:46:23 by Osei-Poku, William (NIH/NCI) [C]

Attachment mappings.xls has been added with description: Review of mappings

Comment entered 2011-11-15 08:20:39 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2011-11-15 08:20:39
BZCOMMENTOR::Bob Kline
BZCOMMENT::94

(In reply to comment #93)

> BLUE background - These are non-cooperative group trials that have been marked
> for RSS updates. However, in PDQ, these trials are not updated by the RSS
> service. Will they start updates after switching the service?

If we are told to continue running the RSS site/status import job, and those trials are included in the feed we get from RSS, then yes. This is perhaps a question you were directing more at someone else.

> lIGHT RED/BROWN background - RSS trials that are mostly closed or completed but
> were not marked for RSS updates. Some of them are currently being updated by
> the RSS service in PDQ. Would updates stop after the switch?

Again, you didn't identify who you expected to answer this question, but it probably wasn't me.

> RED background - These are mostly Approved-not yet active, Active, Temporarily
> closed trials, some of which are receiving active updates from the RSS service.
> Why were they not marked for RSS updates?

You might want to identify to whom you were directing these questions, so you don't run into the problem of everyone reading the issue assuming that someone else will reply.

> YELLOW background – Only one trial – It is closed in PDQ and it does not have a
> corresponding CDR number. Why did we drop it from out export to CTRP?
>
> Bob:
> In last week's meeting, you suggested that the attachment should go into a
> particular issue but I can remember which issue you were referring to.

Well, this is the issue for changing the schema, and task #4942 is the one for actually pulling in the site information from CTRP documents. Since the spreadsheet is to be used for identifying those documents and connecting them with the CDR documents into which the site information will be imported, the spreadsheet is more relevant to that task. In fact, all of these questions have less (or nothing) to do with the schema changes, and everything to do with the import of the site information, so you might consider posting future attachments and comments along these lines to #4942 instead of here.

Comment entered 2011-12-01 12:47:07 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2011-12-01 12:47:07
BZCOMMENTOR::Bob Kline
BZCOMMENT::95

Need to modify the schema to make address information for the contact and PI optional.

Comment entered 2011-12-01 15:08:40 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2011-12-01 15:08:40
BZCOMMENTOR::Bob Kline
BZCOMMENT::96

(In reply to comment #95)
> Need to modify the schema to make address information for the contact and PI
> optional.

Modification installed on Mahler and Franck.

Comment entered 2012-01-30 11:47:00 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2012-01-30 11:47:00
BZCOMMENTOR::Bob Kline
BZCOMMENT::97

William decided he wants to expand the schema to include the overall official contact information (see recent comments in issue #4942). We agreed at last Thursday's status meeting that it would be best if CIAT reviewed the schema we came up with a couple of months ago to identify any further changes which will be needed so we can address all of the required changes at once, in order to reduce the impact on other projects as much as possible.

Comment entered 2012-03-12 10:44:31 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2012-03-12 10:44:31
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::98

(In reply to comment #97)
> William decided he wants to expand the schema to include the overall official
> contact information (see recent comments in issue #4942). We agreed at last
> Thursday's status meeting that it would be best if CIAT reviewed the schema we
> came up with a couple of months ago to identify any further changes which will
> be needed so we can address all of the required changes at once, in order to
> reduce the impact on other projects as much as possible.

After reviewing the schema, we decided to add only the overall official. Please proceed to add it to the CTGovProtocol schema.

Comment entered 2012-03-13 08:06:19 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2012-03-13 08:06:19
BZCOMMENTOR::Bob Kline
BZCOMMENT::99

(In reply to comment #98)

> After reviewing the schema, we decided to add only the overall official. Please
> proceed to add it to the CTGovProtocol schema.

CTRPOverallOfficial block added to schema on Mahler. Ready for user review. You'll need corresponding changes to the CSS and export filters.

Comment entered 2012-03-29 11:47:18 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2012-03-29 11:47:18
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::100

(In reply to comment #99)
> (In reply to comment #98)
>
> > After reviewing the schema, we decided to add only the overall official. Please
> > proceed to add it to the CTGovProtocol schema.
>
> CTRPOverallOfficial block added to schema on Mahler. Ready for user review.
> You'll need corresponding changes to the CSS and export filters.

Verified on Franck. Thanks!

Comment entered 2012-05-10 13:56:22 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2012-05-10 13:56:22
BZCOMMENTOR::Bob Kline
BZCOMMENT::101

Decided at status meeting:

Modify schema to make person mapping optional.

Comment entered 2012-06-14 11:57:14 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2012-06-14 11:57:14
BZCOMMENTOR::Bob Kline
BZCOMMENT::102

(In reply to comment #101)
> Decided at status meeting:
>
> Modify schema to make person mapping optional.

Schema modified on Mahler and Franck.

Comment entered 2012-08-27 13:03:39 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2012-08-27 13:03:39
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::103

Please promote schema changes to Bach.

Comment entered 2012-08-27 13:10:35 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2012-08-27 13:10:35
BZCOMMENTOR::Bob Kline
BZCOMMENT::104

(In reply to comment #103)
> Please promote schema changes to Bach.

Done.

Comment entered 2012-08-27 13:11:06 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2012-08-27 13:11:06
BZCOMMENTOR::Bob Kline
BZCOMMENT::105

Schema promoted to Bach.

Comment entered 2012-10-05 09:59:57 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2012-10-05 09:59:57
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::106

(In reply to comment #105)
> Schema promoted to Bach.

Verified on Bach. Issue closed. Thank you!

Attachments

File Name	Posted	User
CTEP DCP Trials and identifiers 20111020.xlsx	2011-10-25 14:37:03	Osei-Poku, William (NIH/NCI) [C]
ctep-lead-orgs.xls	2011-03-21 12:15:00
CTRP _CONVERTED_TRIALS.xlsx	2011-09-16 10:32:17	Osei-Poku, William (NIH/NCI) [C]
CTRP_CTGOV_MAP_Layout_Display.xls	2011-02-09 16:21:15	Osei-Poku, William (NIH/NCI) [C]
CTRP_CTGOV_MAP.xls	2011-02-08 18:07:57	Osei-Poku, William (NIH/NCI) [C]
CTRP_CTGOV_MAP.xls	2011-02-07 19:44:26	Osei-Poku, William (NIH/NCI) [C]
CTRP_CTGOV_MAP (1).xls	2011-02-08 12:06:22	Osei-Poku, William (NIH/NCI) [C]
ctrp.dtd	2011-01-15 12:24:03
ctrp.xml	2011-02-04 10:35:58
CTRP Mapping Notes_Updated.doc	2011-02-21 17:48:58	Osei-Poku, William (NIH/NCI) [C]
CTRP Mapping Notes.doc	2011-02-16 15:30:15	Osei-Poku, William (NIH/NCI) [C]
lead-orgs.txt	2011-03-21 12:06:48
mappings.xls	2011-11-14 16:46:23	Osei-Poku, William (NIH/NCI) [C]
mappings.xls	2011-10-31 12:12:16
not-rss	2011-08-23 08:31:26
RSS Trials.xlsx	2011-09-15 13:07:43
set-diffs	2011-04-14 08:12:47
unmatched-org-names.txt	2011-03-21 11:22:47
