Issue Number | 3252 |
---|---|
Summary | [CTRP] Merge RSS site info from CTRP trials into CT.gov record |
Created | 2010-10-21 13:40:19 |
Issue Type | Improvement |
Submitted By | Kline, Bob (NIH/NCI) [C] |
Assigned To | Kline, Bob (NIH/NCI) [C] |
Status | Closed |
Resolved | 2012-10-05 09:59:00 |
Resolution | Fixed |
Path | /home/bkline/backups/jira/ocecdr/issue.107580 |
BZISSUE::4942
BZDATETIME::2010-10-21 13:40:19
BZCREATOR::Bob Kline
BZASSIGNEE::Bob Kline
BZQACONTACT::Lakshmi Grama
Develop software to retrieve and store clinical trial documents from CTRP.
BZDATETIME::2010-10-21 13:43:20
BZCOMMENTOR::Bob Kline
BZCOMMENT::1
[From my email message sent to Lakshmi and Margaret 2010-10-14]
Mostly looks OK to me. Some notes:
1. It would be good to have the ability to request a full set,
ignoring condition 2 ("The trial record was changed since the last
export") in section 4.1.
2. I'm not familiar with "PA module pages" (item #6 in section
4.2).
3. "All data element ..." (item #6 of section 4.2) should read "All data
elements ...."
4. "Trial Data to be Exclude" (item #6.b of section 4.2) should read
"Trial Data to be Excluded."
5. I'm a little confused by item #6.a of section 4.2: "[... except ...]
Elements already included in the ct.gov xml file"; does this mean that
we'll need to retrieve NLM's document for the trial and CTRP's document
for the trial, match them up and merge them?
6. The document seems inconsistent about whether IND/IDE information
will be included.
7. If we're going to the trouble to insert the "T" in the archive name
for the export set, to mirror the format specified by ISO 8601, we
should probably omit the hyphens surrounding the "T" character
(following that same standard).
Attachment CTRP to cancer gov daily update - Requirements lg.doc has been added with description: Requirements document from Charles Y, edited by LG
BZDATETIME::2010-10-28 16:42:33
BZCOMMENTOR::Margaret Beckwith
BZCOMMENT::2
Notes from status meeting today to try and capture basic process and flow of trial info.
Case I
Active
Cooperative Group Trial
Currently updated via RSS/amendments in the CDR
The ultimate goal with these trials is to transfer ownership to the Responsible Party, have all updates, including RSS, be done in the CTRP system, and for the CDR to get updated data directly from CTRP to publish on Cancer.gov.
Assumptions:
1. CTRP will have RSS functionality up and running
2. CDR will have the schema/dtd for the CTRP document type and will have
developed the capability to accept XML documents from CTRP and insert
them "on top of" a CT.gov document; time needs to be allowed for this to
happen
3. CTRP will have a calendar determining when each set of trials will be
processed
Steps:
1. CDR sends CTRP file with updated information on all active, In Scope,
CTEP/DCP studies; this includes Cooperative Group Trials.
2. CTRP identifies subset of Coop. Group Trials that will be transferred
in a particular time period.
3. CTRP works on those trials to supply missing info and clean them up;
they create XML which will be sent to the Responsible Party.
4. CTRP transfers ownership to Responsible Party in CT.gov.
5. CDR imports the CT.gov version of the trial and places it on top of
the In Scope document.
6. Current process: Z-Tech processes the CT.gov version (reviews indexing, etc.) and creates a publishable version. [NOTE: Z-Tech may not need to do this since the CTRP version will be inserted on top of this CT.gov version.]
7. CDR imports trial from CTRP
8. CDR creates CTRP version on top of the CT.gov version; this gets saved as a non-Pub. version; can do matching on NCT ID since all of these trials will have one. [NOTE: Question about whether we should carry over the PDQ indexing block from the CT.gov version since we will also be getting indexing information from CTRP.]
9. Z-Tech reviews the CTRP version and makes it publishable.
10. This trial gets updated with data from CTRP, including RSS
information.
Case II
New Cooperative Group Trial
No In-scope or CT.gov version in CDR
1. CTRP processes trial and creates XML to give to the Responsible
Party.
2. The RP uploads XML to CT.gov so CT.gov version gets created.
3. CDR imports the CT.gov version as a non-pub document, which will have
an NCT ID and the unique CTRP ID
4. CDR then imports the corresponding document from CTRP and inserts it
on top of the CT.gov version, matching on the unique CTRP ID
5. Z-Tech reviews document and creates a Pub. version
6. This trial gets updated with data from CTRP, including RSS
information.
For Case II, we didn't go through it at the meeting, so I may be making stuff up, but these steps make sense to me. Okay, that's about it for my brain right now on this topic (remember those old ads that said "Here is your brain on drugs" and showed an egg frying?)
BZDATETIME::2010-11-18 11:42:16
BZCOMMENTOR::Margaret Beckwith
BZCOMMENT::3
Adding comments from our phone call on 11/17/10 (new text marked by **):
Assumptions:
1. CTRP will have RSS functionality up and running
2. CTRP will be able to process/update trials in their system before we transfer ownership
3. CDR will have the schema/dtd for the CTRP document type and will have developed the capability to accept XML documents from CTRP and insert them "on top of" a CT.gov document; time needs to be allowed for this to happen
4. CTRP will have a calendar determining when each set of trials will be processed
Case I
Active
Cooperative Group Trial
Currently updated via RSS/amendments in the CDR
The ultimate goal with these trials is to transfer ownership to the Responsible Party, have all updates, including RSS, be done in the CTRP system, and for the CDR to get updated data directly from CTRP to publish on Cancer.gov.
Steps:
1. CDR sends CTRP file with updated information on all active, In Scope, CTEP/DCP studies; this includes Cooperative Group Trials. NOTE: We need to determine when we are going to send them the final data load from the CDR.
2. CTRP identifies subset of Coop. Group Trials that will be transferred in a particular time period.
3. CTRP works on those trials to supply missing info and clean them up; they create XML which will be sent to the Responsible Party.
4. CTRP *(PDQ)* transfers ownership to Responsible Party in CT.gov. NOTE: As part of the transfer block that gets added, we need to have a way to indicate trials that we want to import from CTRP; add a flag "import from CTRP".
5. CDR imports the CT.gov version of the trial and places it on top of the In Scope document.
6. Current process: Z-Tech processes the CT.gov version (reviews indexing, etc.) and creates a publishable version. [NOTE: Z-Tech may not need to do this since the CTRP version will be inserted on top of this CT.gov version.]
7. CDR imports trial from CTRP. NOTE: CTRP import will need to happen after the CT.gov nightly import as part of the scheduled job.
8. CDR creates CTRP version on top of the CT.gov version; this gets saved as a non-Pub. version; can do matching on NCT ID since all of these trials will have one. [NOTE: Question about whether we should carry over the PDQ indexing block from the CT.gov version since we will also be getting indexing information from CTRP. YES, we want to bring in the PDQ indexing block.]
9. Z-Tech reviews the CTRP version and makes it publishable.
10. This trial gets updated with data from CTRP, including RSS information.
Case II
New Cooperative Group Trial
No In-scope or CT.gov version in CDR
1. CTRP processes trial and creates XML to give to the Responsible Party.
2. The RP uploads XML to CT.gov so CT.gov version gets created.
3. CDR imports the CT.gov version as a non-pub document, which will have an NCT ID and the unique CTRP ID.
4. CDR then imports the corresponding document from CTRP and inserts it on top of the CT.gov version, matching on the unique CTRP ID.
5. Z-Tech reviews document and creates a Pub. version.
6. This trial gets updated with data from CTRP, including RSS information.
The above process assumes that the Responsible Party will use the CTRP XML file to enter the trial into CT.gov. Z-Tech will need to look at the CT.gov import and identify cooperative group trials that don't have a CTRP ID; these will need to be imported from CTRP manually.
**NEW ISSUES:
1. Schema for CTRP trials
2. CSS/Template for CTRP trials
3. Create new filters for export to Cancer.gov of CTRP trials
BZDATETIME::2010-11-18 16:19:11
BZCOMMENTOR::Bob Kline
BZCOMMENT::4
(In reply to comment #3)
> 5. CDR imports the CT.gov version of the trial and places it on top of the In Scope document.
We decided we would skip this step and instead just import the CTRP document directly on top of the InScopeProtocol document (step 8). We'll use a new table (ctrp_import) to queue up and track imports, similar to the way we use ctgov_import for trials imported from NLM. So the PDQ indexing block (and possibly the processing status block, depending on what's in the schema) will be carried over directly from the InScopeProtocol document.
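For what it's worth, a minimal sketch of what such a queue table might look like, written as Python issuing SQL Server DDL through pyodbc; the column names and types are assumptions modeled on the usage described above, not the actual ctrp_import (or ctgov_import) schema.

# Hypothetical sketch of a ctrp_import queue table, modeled loosely on the
# role ctgov_import plays for trials imported from NLM. Column names and
# types are illustrative assumptions, not the production schema.
import pyodbc

DDL = """
CREATE TABLE ctrp_import (
    ctrp_id        VARCHAR(32)   NOT NULL PRIMARY KEY, -- e.g. NCI-2011-02038
    nct_id         VARCHAR(16)       NULL,             -- NCT ID when known
    cdr_id         INTEGER           NULL,             -- matched CTGovProtocol doc
    disposition    VARCHAR(32)   NOT NULL,             -- 'import requested', 'rejected', ...
    needs_mappings CHAR(1)       NOT NULL DEFAULT 'N', -- flag set by the download job
    doc_xml        NVARCHAR(MAX)     NULL,             -- latest document pulled from CTRP
    downloaded     DATETIME      NOT NULL,
    imported       DATETIME          NULL
)"""

def create_queue_table(conn_str):
    """Create the (hypothetical) ctrp_import queue table."""
    with pyodbc.connect(conn_str) as conn:
        conn.cursor().execute(DDL)
        conn.commit()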
BZDATETIME::2011-05-06 11:47:32
BZCOMMENTOR::Bob Kline
BZCOMMENT::5
[copied from issue #4962 to this issue, where it really belongs]
Margaret mentioned in yesterday's meeting that we should not import new trials which have a status of Closed or Completed.
BZDATETIME::2011-09-06 08:56:56
BZCOMMENTOR::Bob Kline
BZCOMMENT::6
Will we need the same approval process for importing new CTRP trials (splitting the processing into "download" and "import" steps) as we do for CT.gov protocols? Or should the import software assume we're going to import everything we get from CTRP which matches the import criteria?
BZDATETIME::2011-09-08 12:40:09
BZCOMMENTOR::Bob Kline
BZCOMMENT::7
(In reply to comment #6)
> Will we need the same approval process for importing new CTRP trials (splitting the processing into "download" and "import" steps) as we do for CT.gov protocols? Or should the import software assume we're going to import everything we get from CTRP which matches the import criteria?
William and Margaret said in today's status meeting that we will need separate download, review/approval, and import steps for the CTRP trials.
BZDATETIME::2011-09-12 09:10:56
BZCOMMENTOR::Bob Kline
BZCOMMENT::8
William:
You were going to write up the details of how you wanted unmapped persons and organizations to be handled when importing new and previously imported trials. You can do that here or create a new task for this issue.
BZDATETIME::2011-09-12 19:34:09
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::9
(In reply to comment #8)
> William:
>
> You were going to write up the details of how you wanted unmapped persons and organizations to be handled when importing new and previously imported trials.
> You can do that here or create a new task for this issue.
This is what I remember from our discussions and the notes I took:
To prevent many duplicate person and organization records from being created by the import program, we suggested that we (CIAT) make the final determination whether or not to create new person or organization records. That is, if no mappings are found for them during import, the unmapped records will be reported to us and we will create new records manually* (and map them accordingly) if they don't exist already, or if the records exist but need to be mapped to different CDR IDs, then we will update the mapping table accordingly. This will be done after we have reviewed and completed duplicate searches of the records that are presented to us in a new user interface (Interface 1).
Interface 1 will have person and organization records that the import program did not find any mapping for. Meanwhile, the program will proceed to map records that it finds mappings for. When all mappings are completed for a trial, Interface 2 will allow us to import the trial into the CDR. We discussed that the program will prevent us from importing a trial until all organization and person mappings have been completed for a particular trial. This will give us cleaner data.
Handling updated trials will be a little different since the trials have been imported already. Existing trials with new contact information or updated sites and person records that have mapping issues will also show up on Interface 1 for review and mapping by CIAT users. The trial will however show up in a different report that would be generated. When the site and person mappings are completed for a particular trial, the trial will drop off this report.
*Further notes in Interface 1:
For new records that need to be created, just like Interface 2 it would
be good to have a feature in Interface 1 that will permit us to import
the records automatically into the CDR since the program has all the
contact information that is needed to create the record.
BZDATETIME::2011-09-13 08:30:04
BZCOMMENTOR::Bob Kline
BZCOMMENT::10
Here's what I plan to implement.
The CTRP download job will examine each new or changed trial to determine whether there are any persons, organizations, states, or countries which cannot be mapped, and if so, the trial will be flagged as "needs mappings."
Each new trial will be presented for CIAT to decide the trial's disposition, with any mapping problems displayed below the trial's listing. For example:
NCI-2009-00002 Active Immunization of Sibling Stem Cell Transplant
...
( ) Import
( ) Reject
Unmapped:
Person 838232 Michael R Bishop 9200 West Wisconsin Avenue Milwaukee WI 53226-3596 United States
Org 155161 National Cancer Institute Medicine Branch 9000 Rockville Pike Bethesda MD 20892 United States
State Ontar. Canada
A separate report will be available showing each trial whose disposition is "import requested" and which is flagged as "needs mappings." The report display will be identical to the interface for approval of new trials shown above, except that the buttons for determining whether the trials should be imported will not be present. This report will include previously imported trials for which new mapping problems are present, as well as new trials which were approved for import using the first interface above, without resolution of all of the mapping problems for those trials.
Note that neither of these interfaces will duplicate the existing functionality implemented by the mapping table update tool: these interfaces will only report the mapping gaps, which will be resolved using the existing "Update Mapping Table" interface.
Each time the import job runs, each trial with a disposition of "import requested" will be examined to make a fresh determination as to which trials have mapping problems at the time of import. By doing this, any work by CIAT to resolve mapping problems will allow trials which no longer have mapping problems to be imported, and any new mapping problems which have crept in (for example, by having an entry in the mapping table removed or edited) will be detected and will prevent import of the affected trials as appropriate.
Note that I have added states (political subdivisions) and countries. I realized, after William pointed out a problem with one of the state mappings, that we will need mappings for the strings CTRP uses for these geographical entities whenever they differ from the strings we use in our own documents.
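As a rough illustration only, the gap check described above might look something like this in Python; the element names and the lookup callable are stand-ins, not the real CTRP document structure or CDR mapping API.

# Illustrative sketch of the "needs mappings" check run by the download job
# and repeated at import time. Element names and the `lookup` callable are
# assumptions; the real code would consult the CDR external mapping tables.
from lxml import etree

def find_mapping_gaps(trial_xml, lookup):
    """Return (entity_type, value) pairs in the trial with no CDR mapping.

    trial_xml is the CTRP trial document as bytes; lookup(entity_type, value)
    returns a CDR ID or None.
    """
    root = etree.fromstring(trial_xml)
    checks = (
        ("Person",  [p.get("po_id") for p in root.iter("person")]),
        ("Org",     [o.get("po_id") for o in root.iter("facility")]),
        ("State",   [s.text for s in root.iter("state_or_province")]),
        ("Country", [c.text for c in root.iter("country")]),
    )
    gaps = []
    for entity_type, values in checks:
        for value in values:
            if value and lookup(entity_type, value) is None:
                gaps.append((entity_type, value))
    return gaps

def ready_for_import(trial_xml, lookup):
    # A trial whose disposition is "import requested" is imported only when
    # no mapping gaps remain at the time the import job runs.
    return not find_mapping_gaps(trial_xml, lookup)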
Does this make sense? Will it accomplish what you need?
BZDATETIME::2011-09-14 05:46:59
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::11
(In reply to comment #10)
> Does this make sense? Will it accomplish what you need?
Yes. It does make sense. Thanks!
(In reply to comment #9)
> *Further notes in Interface 1:
> For new records that need to be created, just like Interface 2 it would be good to have a feature in Interface 1 that will permit us to import the records automatically into the CDR since the program has all the contact information that is needed to create the record.
The part I didn't see in your description is what I suggested in comment #9. This is certainly a feature that will be helpful.
BZDATETIME::2011-09-14 08:30:36
BZCOMMENTOR::Bob Kline
BZCOMMENT::12
Ah, I bet you mean new Person and Organization documents, not new records in the mapping table (which I was already planning to insert automatically). Yes, we can look into the feasibility of doing something like that.
BZDATETIME::2011-09-14 14:55:00
BZCOMMENTOR::Bob Kline
BZCOMMENT::13
(In reply to comment #12)
> Ah, I bet you mean new Person and Organization documents, not new records in the mapping table (which I was already planning to insert automatically).
> Yes, we can look into the feasibility of doing something like that.
I'm investigating approaches to meeting this request, and one of the difficulties has to do with the representation of addresses when creating new Person documents. Our schema requires us to know whether an address is for a private practice location, but we don't have enough information in the incoming CTRP document to make that determination. In the software I had written to implement the decision to automatically create new documents when no mapping was present, I had side-stepped this problem by taking advantage of the fact that all three of the address block types are optional in the schema. However, I had overlooked the fact that the CIPSContact element is required and must link to an existing address block, so in effect at least one address is required. That means the approach isn't really available unless we modify the schema to make the CIPSContact element optional.
Let's look at an example. NCI-2009-00002 (not one of the ones included in the latest conversion, I believe) has several blocks for Michael R. Bishop, with an address in Milwaukee, Wisconsin. There is no organization listed for the Wisconsin address, but I have no way of knowing whether that's because CTRP's document structure doesn't have anywhere to put an organization with the address, or because this individual has a private practice in Wisconsin. [1]
The example is further complicated by the fact that one of the blocks for Bishop has an affiliation sub-block giving "National Cancer Institute Medicine Branch" with an address in Bethesda, Maryland. Clearly, this organization doesn't belong with the Wisconsin address, so presumably I'd be faced with a decision about which of Bishop's blocks I should use to create his Person document: one with the second address for the affiliation, or one with just the Wisconsin address. That raises a number of questions:
1. If I'm always supposed to go through the entire trial document (or worse, search through all of the queued trial documents with mapping problems) and find the one with the most addresses for an individual, what should the software do if none of the blocks found contains a proper superset of all of the addresses?
2. What logic should be used to compare address blocks to determine if they represent the same address? Just pick one version at random?
3. If we decide we'll pick up the affiliation addresses as well (as OtherPracticeLocation blocks), what should the software do for such blocks which have organizations which themselves are not mapped?
4. Should the software pretend that we know that any address which has no organization named (for example, Bishop's Wisconsin address) is for a private practice and just create a PrivatePractice block for those?
5. If we have a block for a person with both a directly included address block as well as an affiliation block with an organization and its own nested address block, and the address blocks are identical, do we create two contact blocks, one a PrivatePractice block and the other an OtherPracticeLocation block? What if the address blocks are very similar, but with minor differences? What would be the definition of "minor" in this context?
Any guidance on how we should resolve these issues? Perhaps the software should just pick a random block and address for an individual when there are multiple to choose from, arbitrarily treating the person's address as a private practice location, ignoring nested affiliation blocks, report the newly created documents and let CIAT clean up the loose ends. Or (simpler) just create an invalid Person document with no address blocks at all (as I was originally doing) and let CIAT fill in the locations.
It didn't do anything to clear the fog when I looked in the CDR to find that we already have a document for this same Michael R. Bishop, without the Wisconsin address at all. :-)
[1] Actually, after writing that I did some digging and, using the host name in his email address, tracked down the address given in the block to one of the campuses of the Medical College of Wisconsin. So this isn't a private practice location. But I don't think anyone expects that the import software should be able to figure that out.
BZDATETIME::2011-09-14 15:29:48
BZCOMMENTOR::Bob Kline
BZCOMMENT::14
(In reply to comment #13)
> Any guidance on how we should resolve these issues? Perhaps the software should just pick a random block and address for an individual when there are multiple to choose from, arbitrarily treating the person's address as a private practice location, ignoring nested affiliation blocks, report the newly created documents and let CIAT clean up the loose ends. Or (simpler) just create an invalid Person document with no address blocks at all (as I was originally doing) and let CIAT fill in the locations.
Or (another possibility) we could just create support for a command to create a new Organization document, and leave the trickier Person documents as a manual task for CIAT.
BZDATETIME::2011-09-15 16:11:42
BZCOMMENTOR::Bob Kline
BZCOMMENT::15
(In reply to comment #13)
> Any guidance on how we should resolve these issues?
We decided in this afternoon's status meeting that we will take advantage of the fact that the specific contact information for all persons and organizations will be included directly in the converted trial documents. This means that while it will be useful for CIAT to have some contact information saved in a newly created Person or Organization document (to help with subsequent research to identify duplicates), it's not so critical that we capture all variants of all addresses and other contact information when the software creates those documents. So it will be sufficient to pick the first block which matches the po_id value in the current trial document for the person or organization for which CIAT wants a new CDR document created, and use the address information associated directly with the person or organization in the selected block. Among other advantages, this avoids having to create invalid Person documents, which could have happened if we attempted to create OtherPracticeLocations with organizations for which we don't yet have a mapping in the CDR. We'll pretend that the contact blocks for the newly created Person documents are for private practices, and for those documents for which CIAT determines that the contact block is not for a private practice, they can do the research necessary to track down the organization information needed to convert the block to an OtherPracticeLocation block, if and when they determine that this is worth the trouble.
Does this match what you remember of our discussion?
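A condensed sketch of that block-selection rule follows, with hypothetical element names on both the CTRP side and the Person side (the real schemas will differ): take the first block whose po_id matches and copy its directly attached address into a private practice location.

# Sketch of the agreed approach: pick the first block in the current trial
# document whose po_id matches, and treat its own address as a PrivatePractice
# location in the new Person document. All element names are illustrative.
from lxml import etree

def build_person_doc(trial_root, po_id):
    """Return a skeleton Person document for the person with this CTRP po_id."""
    for block in trial_root.iter("person"):
        if block.get("po_id") != po_id:
            continue
        person = etree.Element("Person")
        name = etree.SubElement(person, "PersonNameInformation")
        etree.SubElement(name, "GivenName").text = block.findtext("first_name", "")
        etree.SubElement(name, "SurName").text = block.findtext("last_name", "")
        locs = etree.SubElement(person, "PersonLocations")
        practice = etree.SubElement(locs, "PrivatePractice")
        for tag in ("address_line", "city", "state_or_province",
                    "postal_code", "country"):
            value = block.findtext(tag)
            if value:
                etree.SubElement(practice, tag).text = value
        return person  # first matching block wins; remaining blocks are ignored
    return None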
BZDATETIME::2011-09-15 16:27:48
BZCOMMENTOR::Bob Kline
BZCOMMENT::16
One more note about the mapping gaps: although the software will include geographical mapping problems in the list of mapping gaps which would prevent import of a CTRP trial document, I'm not planning to implement the software to create new Country or PoliticalSubUnit documents. For one thing, the need to create such new documents will arise so much less frequently than for new persons and organizations that I don't believe it would be worth the additional development effort. For another thing, I wouldn't have all the information I'd need to create those documents anyway.
On another topic, have we resolved the question of what the software should do when a trial we have been importing as an RSS trial drops out of scope because it is subsequently determined not to be an RSS trial?
BZDATETIME::2011-09-16 07:31:12
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::17
(In reply to comment #15)
> (In reply to comment #13)
> Does this match what you remember of our discussion?
Yes it does. Thanks!
> (In reply to comment #16)
> On another topic, have we resolved the question of what the software should do when a trial we have been importing as an RSS trial drops out of scope because it is subsequently determined not to be an RSS trial?
I suggest that such trials be reported to us so that we can follow up with CTRP staff to determine the reason for dropping the RSS designation.
BZDATETIME::2011-09-16 08:18:08
BZCOMMENTOR::Bob Kline
BZCOMMENT::18
(In reply to comment #17)
> I suggest that such trials be reported to us so that we can follow up with CTRP staff to determine the reason for dropping the RSS designation.
OK. Margaret also suggested that we stop importing the trials, so we'll do both (report the change and stop importing). We also agreed that it would be good for Lakshmi to review all of the decisions we have made about how to handle the CTRP imports.
BZDATETIME::2011-09-16 10:39:26
BZCOMMENTOR::Bob Kline
BZCOMMENT::19
This question is most appropriate for this issue, but it first arose in the most recent comment for issue #4958 (q.v.): What is the most appropriate primary identifier to use for the trials in the ctrp_import table? I had been thinking it would be the secondary ID found in the document with id_type of "Registry Identifier" and id_domain of "CTRP (Clinical Trial Reporting Program)" (and also embedded in the file name for the trial document we download from CTRP), but the requirements for #4958 look like we'd need to use the NCT ID. Is that correct?
BZDATETIME::2011-09-16 18:25:43
BZCOMMENTOR::Bob Kline
BZCOMMENT::20
I have created CDR documents (not in the repository, just in the file system for testing) for all of the persons and organizations in all of the CTRP trials contained in the last set I pulled down (except for one individual whose state was given as "UK"; I reported this anomaly to Charles). You can review the documents with the interface here:
http://bach.nci.nih.gov/cgi-bin/cdr/view-converted-ctrp-po-docs.py
BZDATETIME::2011-09-22 14:59:06
BZCOMMENTOR::Bob Kline
BZCOMMENT::21
Here are the responses from CTRP on the most recently reported issues, along with my replies:
On 9/22/2011 1:23 PM, Sulekha Avasthi wrote:
>
> 1. RSS Ownership
>
> 1. Majority of RSS owned trials have not finished abstraction yet. Therefore these are not present in Export.
>
> 2. Previously some trials were present as RSS owned trials in error. A SQL script was executed to fix the ownership on Aug 31. After Aug 31 the issue is resolved and we don’t see RSS as an owner for non-RSS trials. The script was based on a list of CTEP ids we have received from RSS. There are some trials in CTRP without CTEP ID. Even if they are a good candidate for RSS owned trials the script did not update the ownership for these trials. Therefore on Export they don’t show the RSS as an owner.
>
I guess this means we have no trials we can use for our testing/development at this stage, which might have an impact on the deployment schedule.
> 2. PDQ Export Zip File name without Time stamp
>
> I talked to the developer. This is doable. Please confirm if we need to do it in 3.6.1 release (mid Nov) or in 3.7 (Dec end).
>
Since we agreed back in December of last year that this was how we would address the "discoverability" problem, I would hope they'd be able to implement it before a whole year has passed.
> 3. Data Anomaly
>
> CTRP does not validate the address stateorprovince code. Please note the mentioned person belongs to United Kingdom. StateorProvince = UK is fine for CTRP. There are many other cases where we have StateorProvince = UK, empty, just a major city name or so. Most of these cases belong to the country outside USA.
I'll pass this along to the data maintainers at this end for them to decide what they want to do with bogus geographical information.
BZDATETIME::2011-09-22 17:30:22
BZCOMMENTOR::Bob Kline
BZCOMMENT::22
I started a wiki page to capture Lakshmi's modest proposal for handling CTRP trial information. What I've got is incomplete and possibly wrong in places, but I wanted to get the process rolling. Please take a stab at fleshing it out and fixing what's not right.
http://verdi.nci.nih.gov/pdqwiki/index.php/Ctrp
BZDATETIME::2011-09-23 12:38:26
BZCOMMENTOR::Lakshmi Grama
BZCOMMENT::23
(In reply to comment #22)
> I started a wiki page to capture Lakshmi's modest proposal for handling CTRP trial information. What I've got is incomplete and possibly wrong in places, but I wanted to get the process rolling. Please take a stab at fleshing it out and fixing what's not right.
>
> http://verdi.nci.nih.gov/pdqwiki/index.php/Ctrp
Bob, for all your hesitation, I think you captured it correctly. Just want to make sure we have thought through every use case.
BZDATETIME::2011-09-23 13:19:22
BZCOMMENTOR::Margaret Beckwith
BZCOMMENT::24
Your write-up matches exactly what I have in my notes! Are we planning to propose this to CTRP on our call at 2:00 today?
BZDATETIME::2011-09-23 13:46:08
BZCOMMENTOR::Bob Kline
BZCOMMENT::25
(In reply to comment #24)
> Your write-up matches exactly what I have in my notes! Are we planning to propose this to CTRP on our call at 2:00 today?
I believe that's what Lakshmi said she was planning to do.
BZDATETIME::2011-09-30 10:55:06
BZCOMMENTOR::Bob Kline
BZCOMMENT::26
I have expanded the Wiki writeup of the proposed approach for matching trials and merging site information into the CTGovProtocol documents, based on conversations with Lakshmi and Margaret yesterday afternoon. Please review the document and let me know if it accurately reflects our current plan.
http://verdi.nci.nih.gov/pdqwiki/index.php/Ctrp
BZDATETIME::2011-10-04 10:57:38
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::27
(In reply to comment #26)
> I have expanded the Wiki writeup of the proposed approach for matching trials and merging site information into the CTGovProtocol documents, based on conversations with Lakshmi and Margaret yesterday afternoon. Please review the document and let me know if it accurately reflects our current plan.
>
> http://verdi.nci.nih.gov/pdqwiki/index.php/Ctrp
I added two comments to the wiki.
BZDATETIME::2011-10-13 12:26:41
BZCOMMENTOR::Margaret Beckwith
BZCOMMENT::28
Changed issue title.
BZDATETIME::2011-10-26 15:13:13
BZCOMMENTOR::Margaret Beckwith
BZCOMMENT::29
Upped priority.
BZDATETIME::2011-10-27 11:04:17
BZCOMMENTOR::Bob Kline
BZCOMMENT::30
Responding to the more urgent priority, I have installed the new CTGovProtocol schema on Franck, added the required link definitions, and written the code to merge the site information from an incoming CTRP document into an existing CTGovProtocol document. The timing is unfortunate (as all of you are already aware): there's only one document represented in the mappings we got from CTRP which is already a CTGovProtocol document, so we can only test with a single case. CTRP has also been very slow in cleaning up its bugs, so I've had to shortcut around some of the steps that will be required in the production process, and I had to do quite a bit of hand mapping and cleanup of their data, which makes all of this much more expensive than it should be (and would be in a rationally scheduled implementation – not an option available to us, unfortunately) and renders the testing results much less informative than they should be. The converted test document is CDR687126. The document passes validation and can be viewed in XMetaL on Franck, or in your web browser:
http://franck.nci.nih.gov/cgi-bin/cdr/show-cdr-doc.py?id=687126
Ready for user review.
BZDATETIME::2011-10-27 12:22:11
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::31
(In reply to comment #30)
> The converted test document is CDR687126. The document passes validation and can be viewed in XMetaL on Franck, or in your web browser:
>
> http://franck.nci.nih.gov/cgi-bin/cdr/show-cdr-doc.py?id=687126
>
> Ready for user review.
The converted document is not an RSS trial. It doesn't look like it is one of the trials CTRP has identified as RSS. Actually, at this point, we don't expect that any of the RSS updated trials would be a CTGovProtocol.
BZDATETIME::2011-10-28 16:24:59
BZCOMMENTOR::Bob Kline
BZCOMMENT::32
(In reply to comment #31)
> The converted document is not an RSS trial. It doesn't look like it is one of the trials CTRP has identified as RSS. Actually, at this point, we don't expect that any of the RSS updated trials would be a CTGovProtocol.
As discussed at length in yesterday's status meeting, this presents a serious problem for development and testing of the software for download, review, merge of site information, and publishing of the new structure for the CTGovProtocol documents. The worst-case scenario is that we have what we need to begin development and testing at the same point as we are expected to have all of this in production. Unfortunately, NCI has laid out the process for this transition in such a way as to bring about exactly this scenario and (also unfortunate) my time travel machine is broken. To try and avoid at least some of the nightmare, I propose that CIAT produce a second mapping table, in addition to the one William is creating based on the latest mapping spreadsheet we got from CTRP. This second version would supply pretend mappings between random existing CTRP documents and random existing CTGovProtocol documents, ignoring the fact that none of the existing CTGovProtocol documents are actually documents for which we will be performing the site merge in the real system, and also ignoring the fact that none of the documents we are currently getting from CTRP are documents whose site information we will be using when this goes into production. It's all just pretend mappings: we'll be importing site information into documents for trials which in reality have nothing to do with those sites, but we'll be doing it on Franck, not in the production system, and it will give us something to test the import and the modified publishing filters with. It's a kludge, it's ugly, it's not a good test, but it's the best I can think of to do. After I get the pretend mappings I'll create a report that shows the person and organization mappings that CIAT will need to supply in order for the site imports to succeed.
Can anyone think of a better plan than this? Can you think of any reasons why this won't help get us closer to the goal?
BZDATETIME::2011-11-02 10:41:04
BZCOMMENTOR::Bob Kline
BZCOMMENT::33
Since the priority on this task has been raised, it would be good to have feedback soon on the proposed approach to development and testing laid out in the previous comment.
Note that some of the tests described in the Wiki document will be suppressed in this phase of development and testing (that's one of the reasons the testing will be incomplete). Specifically, we cannot verify that the org_study_id contains the CDR ID or the CTRP ID from the mapping table row for a trial's NCT ID. Instead we will pretend that we know that the information in the mapping table is correct. For both the "pretend" mapping table, as well as the one to be used in production, the check of the CTRP document to verify that RSS is one of the owners will not be performed for trials present in the original mappings we get from CIAT (we can't do that for the mappings we're using during development, because none of the trial documents we're currently getting from CTRP are marked as RSS, and we won't need to do it for the trials in the original production mappings, because CIAT will have already verified that the trials in that original mapping document will all be RSS trials).
Finally, note that the Wiki document includes a section which says:
"If the org_study_id is a CTRP ID which is not found in the mapping table, we will assume that the trial is a new one for which we do not have an InScopeProtocol document needing conversion, and we will use the existing CTGovImport logic to determine whether to import the trial, and if so, will add the CTRP and NCT IDs to the mapping table. If the trial is approved for import the CDR ID for the new CTGovProtocol will be added to the mapping table. When the corresponding CTRP document is fetched, it will be examined to determine whether it is an RSS trial, and if so, the site information will be imported into the CTGovProtocol document."
This last sentence implies that we will not need a review interface for CIAT to decide which trials will have sites imported from CTRP documents. Can I get confirmation for this implication? This will mean that the only review interface we'll need to implement will be the one for CIAT to supply the missing mappings for persons and organizations (including buttons to create new Person or Organization documents programmatically from the CTRP information) that are blocking import of the site information.
Again, I'd like to get rolling on the implementation of all of these pieces as soon as possible, so early feedback on these questions will be very much appreciated. :-)
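The org_study_id rule quoted above reduces to a small decision flow; the sketch below uses hypothetical callables standing in for the mapping table and the existing CTGovImport logic, and is not existing code.

# Rough sketch of the quoted Wiki rule. The three callables are assumptions
# standing in for real CDR machinery, not existing functions.
def handle_ctgov_trial(org_study_id, nct_id, in_mapping_table,
                       run_ctgov_import_logic, add_mapping):
    if in_mapping_table(ctrp_id=org_study_id):
        return "already tracked"      # normal update path; nothing to add
    # New trial: no InScopeProtocol needs conversion, so fall back to the
    # existing CTGovImport logic to decide whether to import it at all.
    cdr_id = run_ctgov_import_logic(nct_id)  # CDR ID if approved, else None
    if cdr_id is None:
        return "not imported"
    add_mapping(ctrp_id=org_study_id, nct_id=nct_id, cdr_id=cdr_id)
    # The RSS check and site merge happen later, when the corresponding
    # CTRP document is fetched.
    return "mapped"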
BZDATETIME::2011-11-02 13:17:25
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::34
(In reply to comment #32)
> I propose that CIAT produce a second mapping table, in addition to the one William is creating based on the latest mapping spreadsheet we got from CTRP. This second version would supply pretend mappings between
I am okay with the proposed test plan. I am attaching the spreadsheet which contains the pretend mapping for CTGov and CTRP trials. I left empty the columns I thought would not be necessary for the mapping.
Attachment pretend_mapping.xls has been added with description: Pretend mapping for testing
BZDATETIME::2011-11-02 13:34:14
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::35
(In reply to comment #33)
> ... will add the CTRP and NCT IDs to the mapping table. If the trial is approved for import the CDR ID for the new CTGovProtocol will be added to the mapping table. When the corresponding CTRP document is fetched, it will be examined to determine whether it is an RSS trial, and if so, the site information will be imported into the CTGovProtocol document."
>
> This last sentence implies that we will not need a review interface for CIAT to decide which trials will have sites imported from CTRP documents. Can I get confirmation for this implication? This will mean that the only review interface we'll need to implement will be the one for CIAT to supply the missing mappings for persons and organizations (including buttons to create new Person or Organization documents programmatically from the CTRP information) that are blocking import of the site information.
>
Yes. This is correct. That makes our reliance on the mapping table all the more important.
BZDATETIME::2011-11-03 12:26:03
BZCOMMENTOR::Bob Kline
BZCOMMENT::36
The interface for reviewing the CTRP mapping gaps (and optionally creating new Person and Organization documents) has been implemented on Franck.
BZDATETIME::2011-11-07 16:06:34
BZCOMMENTOR::Bob Kline
BZCOMMENT::37
Should the import software create a publishable version of the CTGovProtocol with the newly merged CTRP site information?
BZDATETIME::2011-11-07 16:17:14
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::38
(In reply to comment #37)
> Should the import software create a publishable version of the CTGovProtocol with the newly merged CTRP site information?
My vote is yes. But it should not create the first publishable version.
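That rule is easy to state in code; a tiny sketch, assuming hypothetical has_publishable_version and save_version helpers rather than real CDR API calls:

# Save the merged CTGovProtocol as a publishable version only if the document
# already has one, so the import job never creates the *first* publishable
# version. Both helpers here are hypothetical stand-ins.
def save_merged_doc(cdr_id, merged_xml, has_publishable_version, save_version):
    make_publishable = has_publishable_version(cdr_id)
    save_version(cdr_id, merged_xml, publishable=make_publishable,
                 comment="CTRP site merge")
    return make_publishable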
BZDATETIME::2011-11-07 19:16:56
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::39
(In reply to comment #36)
> The interface for reviewing the CTRP mapping gaps (and optionally creating new Person and Organization documents) has been implemented on Franck.
We have finished reviewing all the mapping gaps on Franck so we can proceed with the next steps. The interface looks great and it is easy to use.
With regard to the mapping table for the CTRP_PO_ID mapping usage, could you also display the names of the organizations and persons instead of just the IDs?
BZDATETIME::2011-11-08 08:40:48
BZCOMMENTOR::Bob Kline
BZCOMMENT::40
We don't want to store more than the CTRP po_id as the value in the mapping table, because that would mean the mapping would fail every time there was a variation in the name of the person or organization, a problem you're faced with in most of the other mappings (that's why we're so happy to have an actual unique ID from the source for these entities).
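To illustrate the point (with made-up lookups, not the real mapping table), the key is the stable po_id; names are only resolved for display, never matched on.

# The mapping key is the stable CTRP po_id, never the name, so name variants
# cannot break the mapping; display names can be resolved from the mapped CDR
# document at report time. The dict below is illustrative only.
ctrp_org_map = {
    "199035": 36874,   # po_id -> CDR ID (example pairing)
}

def map_org(po_id):
    cdr_id = ctrp_org_map.get(po_id)
    if cdr_id is None:
        return None    # reported as a mapping gap for CIAT to resolve
    return cdr_id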
BZDATETIME::2011-11-09 16:13:54
BZCOMMENTOR::Bob Kline
BZCOMMENT::41
(In reply to comment #39)
> We have finished reviewing all the mapping gaps on Franck so we can proceed with the next steps.
I'm having a hard time testing the import software I'm working on, because every document I try either has mapping problems or is locked by another user. Any possibility you could get at least some of the documents selected for testing checked back in?
BZDATETIME::2011-11-09 16:23:14
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::42
(In reply to comment #41)
> (In reply to comment #39)
>
> > We have finished reviewing all the mapping gaps on Franck so we can proceed with the next steps.
>
> I'm having a hard time testing the import software I'm working on, because every document I try either has mapping problems or is locked by another user. Any possibility you could get at least some of the documents selected for testing checked back in?
This is on Franck, right? Also, are you referring to the protocol documents? I checked all the protocol documents and they don't appear to be checked out to anyone.
BZDATETIME::2011-11-09 17:44:38
BZCOMMENTOR::Bob Kline
BZCOMMENT::43
(In reply to comment #42)
> This is on Franck, right? Also, are you referring to the protocol documents? I checked all the protocol documents and they don't appear to be checked out to anyone.
Yes, Franck. My fault: I forgot to adjust the CDR permissions for the ExternalImporter account to reflect the fact that instead of writing to CTRPProtocol documents it will now need to be able to modify CTGovProtocol documents. Once I fixed that problem I was back on the road. I have two new publishable versions for CDR573235 and CDR560722, with CTRPInfo blocks merged into the documents on Franck. Please take a look before I test with any more of the trials.
BZDATETIME::2011-11-15 11:31:02
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::44
Please attach the XML documents of the two CTRP trials you converted for testing.
BZDATETIME::2011-11-15 15:38:53
BZCOMMENTOR::Bob Kline
BZCOMMENT::45
(In reply to comment #44)
> Please attach the XML documents of the two CTRP trials you converted for testing.
http://franck.nci.nih.gov/cgi-bin/cdr/show-ctrp-doc.py?id=NCI-2011-02038
http://franck.nci.nih.gov/cgi-bin/cdr/show-ctrp-doc.py?id=NCI-2011-02042
BZDATETIME::2011-11-15 17:08:21
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::46
This is a duplicate of recent comments which were wrongly placed in OCECDR-3272
I have finished reviewing the spreadsheet (attached). As you know already, this spreadsheet is a merger of spreadsheet1 (list of RSS trials provided by Charles) and spreadsheet2 (mappings provided by Sulekha). I sorted the spreadsheet based on column G (Import Sites) and compared the total number of trials that have been marked for import (X) with the list of RSS trials provided by Charles. The numbers matched - 459. I also reviewed the IDs in column C (CTEP) to see if they are IDs that correspond to cooperative groups or trials updated by RSS, while also verifying some of them in the CDR. All of the trials marked for import in this spreadsheet (except 6 highlighted in blue background color) are cooperative group trials or RSS trials in PDQ currently. I reviewed these 6 trials that are not cooperative group trials in PDQ and found that they are currently not being updated by RSS. Therefore, CTRP needs to confirm that they are RSS trials or they were marked in error. For the rest of the trials on the spreadsheet, I again reviewed the ID column (CTEP) to see if there are trials that should be marked as RSS trials but have not been marked as such. There were approximately 100 Approved-not yet active, Active and Temp. Closed trials that were not marked for import as RSS trials (highlighted with red background color). I retrieved all of these trials in PDQ/CDR to verify their statuses and recorded some of the statuses in the new column I created (Comment). Going by the RSS spreadsheet (spreadsheet1) that CTRP (Charles) provided, we shouldn't import sites from CTRP for them, but there is no indication that they are not RSS trials. We need to know from CTRP why these trials are not marked as RSS trials. Further checks also revealed many other trials (approximately 110 trials) that are currently Closed or Completed in PDQ; they are highlighted with a lighter red background color, which looks more like brown on the spreadsheet, and were also not marked as RSS trials on the spreadsheet CTRP provided. Generally it should be okay to exclude these trials from the import. They are either closed or completed and wouldn't make a difference since in most cases, RSS drops all the sites from a closed or completed trial. However, there is no reason not to continue to mark them as RSS trials in spite of the status of the trial. There was one trial (yellow highlighted) that should have been marked as RSS but it does not have any CDR ID on the spreadsheet, and according to Charles, rows without CDR IDs are for trials that were included in an initial export but were later dropped from subsequent exports. The trial is closed but I don't see why we would not include it in the export. We should investigate this one. Lastly, the cooperative group that posed a problem during this review is COG. COG updates some of their trials through their own service and others through RSS. Unlike the other cooperative groups, it is difficult to determine which of their trials are updated by RSS and which ones are not.
Legend and summary of questions:
GREEN background - These trials are good to go. They are cooperative group trials that appear to have been correctly marked for RSS updates.
BLUE background - These are non-cooperative group trials that have been marked for RSS updates. However, in PDQ, these trials are not updated by the RSS service. Will they start updates after switching the service?
WHITE background - These trials are also good to go. They are mostly non-cooperative group trials or cooperative group trials that are withdrawn or blocked from publication. Generally, we don’t have to do anything with these trials for now.
LIGHT RED/BROWN background - RSS trials that are mostly closed or completed but were not marked for RSS updates. Some of them are currently being updated by the RSS service in PDQ. Would updates stop after the switch?
RED background - These are mostly Approved-not yet active, Active, Temporarily closed trials, some of which are receiving active updates from the RSS service. Why were they not marked for RSS updates?
YELLOW background – Only one trial – It is closed in PDQ and it does not have a corresponding CDR number. Why did we drop it from our export to CTRP?
Bob:
In last week's meeting, you suggested that the attachment should go into a particular issue but I can't remember which issue you were referring to.
Attachment mappings.xls has been added with description: Review of mappings
BZDATETIME::2011-11-15 17:09:47
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::47
**This is a duplicate of recent comments which were wrongly placed in bug #4962** (Below is the response from Bob)
(In reply to comment #93)
> BLUE background - These are non-cooperative group trials that have been marked for RSS updates. However, in PDQ, these trials are not updated by the RSS service. Will they start updates after switching the service?
If we are told to continue running the RSS site/status import job, and those trials are included in the feed we get from RSS, then yes. This is perhaps a question you were directing more at someone else.
> LIGHT RED/BROWN background - RSS trials that are mostly closed or completed but were not marked for RSS updates. Some of them are currently being updated by the RSS service in PDQ. Would updates stop after the switch?
Again, you didn't identify who you expected to answer this question, but it probably wasn't me.
> RED background - These are mostly Approved-not yet active, Active, Temporarily closed trials, some of which are receiving active updates from the RSS service. Why were they not marked for RSS updates?
You might want to identify to whom you were directing these questions, so you don't run into the problem of everyone reading the issue assuming that someone else will reply.
> YELLOW background – Only one trial – It is closed in PDQ and it does not have a corresponding CDR number. Why did we drop it from our export to CTRP?
>
> Bob:
> In last week's meeting, you suggested that the attachment should go into a particular issue but I can't remember which issue you were referring to.
BZDATETIME::2011-11-15 17:20:19
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::48
All the questions are for CTRP staff except this one:
> YELLOW background – Only one trial – It is closed in PDQ and it does not have a corresponding CDR number. Why did we drop it from our export to CTRP?
Also, should I email CTRP with the question about my review?
BZDATETIME::2011-11-16 11:32:37
BZCOMMENTOR::Bob Kline
BZCOMMENT::49
(In reply to comment #46)
> YELLOW background – Only one trial – It is closed in PDQ and it does not have a corresponding CDR number. Why did we drop it from our export to CTRP?
There are multiple rows with a yellow background. To which are you referring?
BZDATETIME::2011-11-16 11:33:26
BZCOMMENTOR::Bob Kline
BZCOMMENT::50
(In reply to comment #48)
> Also, should I email CTRP with the question about my review?
Seems like a good idea.
BZDATETIME::2011-11-16 12:51:59
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::51
(In reply to comment #49)
> (In reply to comment #46)
>
> > YELLOW background – Only one trial – It is closed in PDQ and it does not have a corresponding CDR number. Why did we drop it from our export to CTRP?
>
> There are multiple rows with a yellow background. To which are you referring?
Actually, there are two you need to investigate:
NCI-2009-00573 (NCT00379145) CDR0000502192
NCI-2009-00436 (NCT00039377) CDR0000069378
I couldn't find the rest of the trials with yellow background in the CDR.
BZDATETIME::2011-11-16 14:04:35
BZCOMMENTOR::Bob Kline
BZCOMMENT::52
(In reply to comment #51)
> Actually, there are two you need to investigate:
> NCI-2009-00573 (NCT00379145) CDR0000502192
> NCI-2009-00436 (NCT00039377) CDR0000069378
You'll find the explanations for why those weren't included (as well as answers to many similar questions) in the report I posted in the issue for our export to CTRP at http://verdi.nci.nih.gov/tracker/attachment.cgi?id=2010 .
BZDATETIME::2011-11-17 11:38:53
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::53
(In reply to comment #45)
> (In reply to comment #44)
> > Please attach the XML documents of the two CTRP trials you converted for testing.
>
> http://franck.nci.nih.gov/cgi-bin/cdr/show-ctrp-doc.py?id=NCI-2011-02038
> http://franck.nci.nih.gov/cgi-bin/cdr/show-ctrp-doc.py?id=NCI-2011-02042
Thanks! Could you show the CTRP identifier as part of the CTRP INFO block? It could also go with the IDs at the beginning of the document if that is more appropriate.
Apart from this, everything seems fine to me. There are still some data issues that we need to bring to the attention of CTRP. They are providing the names of the cooperative groups as the facility or location, but I believe the facility or location should rather be the principal investigator's location. That is where the recruitment is taking place. The principal investigator's address information is provided but there is no mention of the name of his/her institution. I would like to review more data from the conversion to confirm this.
BZDATETIME::2011-11-17 14:41:24
BZCOMMENTOR::Bob Kline
BZCOMMENT::54
(In reply to comment #53)
> Could you show the CTRP identifier as part of the CTRP INFO block?
Added ctrp_id attribute to CTRPInfo element.
> I would like to review more data ....
More imports to review:
http://franck.nci.nih.gov/cgi-bin/cdr/show-cdr-doc.py?id=492831
http://franck.nci.nih.gov/cgi-bin/cdr/show-ctrp-doc.py?id=NCI-2011-02207
http://franck.nci.nih.gov/cgi-bin/cdr/show-cdr-doc.py?id=490122
http://franck.nci.nih.gov/cgi-bin/cdr/show-ctrp-doc.py?id=NCI-2011-02216
http://franck.nci.nih.gov/cgi-bin/cdr/show-cdr-doc.py?id=487571
http://franck.nci.nih.gov/cgi-bin/cdr/show-ctrp-doc.py?id=NCI-2011-02224
BZDATETIME::2011-11-23 14:33:33
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::55
I have reviewed all the converted documents and didn't find any problems with the conversion itself. The only problem I found had to do with the data, and CTRP said it should be fixed when they receive real data from RSS.
I think at this point we can proceed to convert more documents for
further testing.
BZDATETIME::2011-11-23 16:19:00
BZCOMMENTOR::Bob Kline
BZCOMMENT::56
(In reply to comment #55)
> I think at this point we can proceed to convert more documents for further testing.
http://franck.nci.nih.gov/cgi-bin/cdr/show-cdr-doc.py?id=481416
http://franck.nci.nih.gov/cgi-bin/cdr/show-ctrp-doc.py?id=NCI-2011-02598
http://franck.nci.nih.gov/cgi-bin/cdr/show-cdr-doc.py?id=467205
http://franck.nci.nih.gov/cgi-bin/cdr/show-ctrp-doc.py?id=NCI-2011-02599
http://franck.nci.nih.gov/cgi-bin/cdr/show-cdr-doc.py?id=463533
http://franck.nci.nih.gov/cgi-bin/cdr/show-ctrp-doc.py?id=NCI-2011-02600
http://franck.nci.nih.gov/cgi-bin/cdr/show-cdr-doc.py?id=451948
http://franck.nci.nih.gov/cgi-bin/cdr/show-ctrp-doc.py?id=NCI-2011-02602
http://franck.nci.nih.gov/cgi-bin/cdr/show-cdr-doc.py?id=446842
http://franck.nci.nih.gov/cgi-bin/cdr/show-ctrp-doc.py?id=NCI-2011-02603
http://franck.nci.nih.gov/cgi-bin/cdr/show-cdr-doc.py?id=446841
http://franck.nci.nih.gov/cgi-bin/cdr/show-ctrp-doc.py?id=NCI-2011-02644
http://franck.nci.nih.gov/cgi-bin/cdr/show-cdr-doc.py?id=355910
http://franck.nci.nih.gov/cgi-bin/cdr/show-ctrp-doc.py?id=NCI-2011-02653
http://franck.nci.nih.gov/cgi-bin/cdr/show-cdr-doc.py?id=368777
http://franck.nci.nih.gov/cgi-bin/cdr/show-ctrp-doc.py?id=NCI-2011-02654
BZDATETIME::2011-12-01 11:17:34
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::57
(In reply to comment #56)
We have reviewed all the converted documents and didn't find any problem
with the conversion.
BZDATETIME::2011-12-06 12:08:39
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::58
I have attached a new made up mapping for five trials to test the new changes to the schema. Please convert these trials. Hopefully we will be able to get some data to test the new changes.
Attachment pretend_mapping_one.xls has been added with description: second pretend mapping for testing
BZDATETIME::2011-12-08 07:23:55
BZCOMMENTOR::Bob Kline
BZCOMMENT::59
New rows have been added to the import table for the test pairings. Next step is to use the mapping gaps tool to provide the missing Person and Organization mappings.
BZDATETIME::2011-12-14 13:40:49
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::60
(In reply to comment #59)
> New rows have been added to the import table for the test pairings. Next step is to use the mapping gaps tool to provide the missing Person and Organization mappings.
This is done. There were no conversion errors.
BZDATETIME::2011-12-14 15:09:03
BZCOMMENTOR::Bob Kline
BZCOMMENT::61
Looks like the second set of pretend mappings was for InScopeProtocol documents instead of CTGovProtocol documents, so the conversions failed. Please go back and re-read comment #32.
BZDATETIME::2011-12-14 15:26:10
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::62
(In reply to comment #61)
> Looks like the second set of pretend mappings was for InScopeProtocol documents instead of CTGovProtocol documents, so the conversions failed.
> Please go back and re-read comment #32.
They are indeed InScopeProtocol documents. Sorry, I missed that part. Is it okay to update the spreadsheet with new CTGovProtocol documents, or would you prefer a completely new mapping?
BZDATETIME::2011-12-14 15:39:36
BZCOMMENTOR::Bob Kline
BZCOMMENT::63
(In reply to comment #62)
> Is it okay to update the spreadsheet with new CTGovProtocol documents, or would you prefer a completely new mapping?
Let's do different trials on both ends.
BZDATETIME::2011-12-14 16:02:14
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::64
I have attached a new pretend mapping with CTGovProtocol documents.
Attachment pretend_mapping_two.xls has been added with description: third pretend mapping
BZDATETIME::2011-12-21 12:18:42
BZCOMMENTOR::Bob Kline
BZCOMMENT::65
(In reply to comment #64)
> Created attachment 2198 [details]
> third pretend mapping
>
> I have attached a new pretend mapping with ctgov protocol documents.
All but one failed, because the CTRP IDs you provided were not found in the sets we're getting from them:
NCI-2011-02219: (2, 'No such file or directory')
NCI-2011-02220: (2, 'No such file or directory')
NCI-2011-02222: (2, 'No such file or directory')
NCI-2011-02223: (2, 'No such file or directory')
NCI-2011-02225: (2, 'No such file or directory')
inserting row for NCI-2009-00324
NCI-2009-00330: (2, 'No such file or directory')
I even tried going back to the October set that I used for your first set of bogus mappings, but got the same failures.
So I backed out the one that succeeded, and I suggest that you use IDs from the attached list (avoiding the IDs we've already used for the bogus mappings).
Attachment trial-ids-2011-12-20 has been added with description: Available CTRP IDs
BZDATETIME::2011-12-22 09:21:49
BZCOMMENTOR::Bob Kline
BZCOMMENT::66
(In reply to comment #65)
> So I backed out the one that succeeded, and I suggest that you use IDs
> from the attached list (avoiding the IDs we've already used for the
> bogus mappings).
Or, better still, use the IDs from the list of RSS trials that I just posted.
Attachment rss-trial-ids-2011-12-21 has been added with description: RSS trial IDs
BZDATETIME::2011-12-27 11:59:59
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::67
I am attaching another pretend mapping of 20 trials. This is probably too many for what we are testing now so you may convert about half of them. If we have to test another enhancement, we can use the remaining half of the trials.
Attachment pretend_mapping_three.xls has been added with description: Pretend mapping
BZDATETIME::2011-12-27 14:20:54
BZCOMMENTOR::Bob Kline
BZCOMMENT::68
You dropped the population of the Import Sites column, so I modified the software to import every trial on the sheet, regardless of what was in that column.
My plan was to let you do the throttling of the number of trials which get imported at any one time, by filling in the mapping gaps for a subset of the ones that were loaded into the table, so I loaded them all. However, the mapping gap interface no longer works correctly, because CTRP is now leaving out the po_id elements in lots of places. I have reported the problem to them.
BZDATETIME::2012-01-12 07:54:02
BZCOMMENTOR::Bob Kline
BZCOMMENT::69
CTRP has corrected the problem with the missing po_id elements. I dropped the rows from your latest spreadsheet and reloaded them with the fixed versions of the documents. Next step is for CIAT to add the missing mappings for as many trials from the set as you want to test at one time (I believe you indicated you wanted to do half at first and half later). When you're ready let me know and I'll run the import job.
BZDATETIME::2012-01-13 15:51:51
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::70
I have started reviewing the mapping gaps report but the number of
mapping gaps reported appears to be too many for 24 trials. Could you
please check to see if the program is first finding a match before
reporting a mapping gap? For example:
Organization 199035 Fox Chase Cancer Center 333 Cottman Avenue
Philadelphia PA 19111-2497 United States
We have this organization as CDR0000036874 with the same address but it
is being reported on the mapping gap page.
BZDATETIME::2012-01-14 07:54:46
BZCOMMENTOR::Bob Kline
BZCOMMENT::71
The software is not matching the CTRP persons and orgs the way a human would, by comparing names and addresses. Until there is a mapping between the po_id and the CDR ID, you'll see the row on the mapping table. There will be a fair amount of work up front to get the mapping table populated, but that work will drop off dramatically once the existing CTRP po_id values are mapped. We should be careful to preserve the mappings we capture on Franck for CDR IDs which are lower than or equal to the highest CDR ID assigned on Bach the last time we refreshed Franck, so you won't have to repeat that work (assuming you've been using real mappings instead of randomly plugging in IDs). That won't help for Person and Org docs created on Franck without a counterpart on Bach with the same CDR ID, but presumably those will be the minority.
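A minimal sketch of that preservation rule, assuming the Franck mappings can be read out as (po_id, CDR ID) pairs and that we know the highest CDR ID that existed on Bach at the last refresh; names and values here are illustrative, not the actual CDR code:

# Minimal sketch of the preservation rule described above, not production code.
def preservable_mappings(franck_mappings, highest_bach_cdr_id):
    """Keep only mappings whose CDR documents must also exist on Bach."""
    return [(po_id, cdr_id)
            for po_id, cdr_id in franck_mappings
            if cdr_id is not None and cdr_id <= highest_bach_cdr_id]

# Example (values illustrative): the second mapping points to a document
# created on Franck after the refresh, so it is dropped, since the same
# CDR ID may belong to a different document on Bach.
print(preservable_mappings([("170827", 29082), ("9999999", 739362)],
                           highest_bach_cdr_id=715000))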
BZDATETIME::2012-01-17 10:03:15
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::72
(In reply to comment #71)
> The software is not matching the CTRP persons and orgs the way a human
> would, by comparing names and addresses. Until there is a mapping
> between the po_id and the CDR ID, you'll see the row on the mapping table.
Can you please post the files from CTRP for the 24 test trials? In earlier files that I looked at, it seemed to me that CTRP was providing us with the CDR IDs we provided them for these sites and investigators. If that is the case, can you have the program first populate the mapping table with the CDR ID and the po_id? We would then take care of the failures.
BZDATETIME::2012-01-17 14:09:16
BZCOMMENTOR::Bob Kline
BZCOMMENT::73
(In reply to comment #72)
> Can you please post the files from CTRP for the 24 test trials?
http://franck.nci.nih.gov/cgi-bin/cdr/show-ctrp-test-docs.py
> In earlier files that I looked at, it seemed to me that CTRP was
> providing us with the CDR IDs we provided them for these sites and
> investigators.
We don't get CDR IDs from them in their documents.
BZDATETIME::2012-01-17 14:53:59
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::74
(In reply to comment #73)
> (In reply to comment #72)
>
> > Can you please post the files from CTRP for the 24 test trials?
>
> http://franck.nci.nih.gov/cgi-bin/cdr/show-ctrp-test-docs.py
Thanks!
>
> > In earlier files that I looked at, it seemed to me that CTRP was
> > providing us with the CDR IDs we provided them for these sites and
> > investigators.
>
> We don't get CDR IDs from them in their documents.
I am thinking we should ask CTRP to give us the CDR IDs of the sites and persons for the initial set of coop group trials we sent them. This would be very helpful and would save us a lot of time, because some of the coop group protocols have over 250 sites, and mapping each one of them manually would take a long time to finish. For example, the 24 test documents currently have roughly several hundred mapping gaps.
BZDATETIME::2012-01-17 14:59:06
BZCOMMENTOR::Bob Kline
BZCOMMENT::75
(In reply to comment #74)
> I am thinking we should ask CTRP to give us the CDR IDs of the sites
> and persons for the initial set of coop group trials we sent them.
Can't hurt to ask. Go ahead and send them the request.
BZDATETIME::2012-01-17 21:59:09
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::76
From Sulekha's email, it looks like we don't export the CDR IDs for the persons and organizations to CTRP, so we can't get the CDR IDs from CTRP.
I think our current mapping for RSS trial site information uses CTEP IDs and CDR IDs (I will confirm this tomorrow morning). Assuming CTRP can export CTEP IDs for site and persons info, would you be able to use that to populate the CTRP mapping table?
BZDATETIME::2012-01-18 09:39:46
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::77
(In reply to comment #76)
> From Sulekha's email, it looks like we don't export the CDR IDs for
> the persons and organizations to CTRP, so we can't get the CDR IDs
> from CTRP.
>
> I think our current mapping for RSS trial site information uses CTEP
> IDs and CDR IDs (I will confirm this tomorrow morning). Assuming CTRP
> can export CTEP IDs for site and persons info, would you be able to
> use that to populate the CTRP mapping table?
I just checked the mapping tables. RSS seems to be using CTSU_Person_ID and CTSU_Institution_Code mapping usages among others, which I believe are CTEP IDs. So I am assuming if CTRP can provide the CTEP IDs for the contacts they are sending us, we can map them to the existing mapping usages that we have. I will wait for you to confirm that this is possible before I ask CTRP to send us the CTEP IDs.
BZDATETIME::2012-01-18 10:05:53
BZCOMMENTOR::Bob Kline
BZCOMMENT::78
It probably would have been a good idea to mention this to CTRP back in December when they asked us if we needed the CTEP IDs.
BZDATETIME::2012-01-19 09:17:30
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::79
Please run the import job. Only two trials are ready at this point but it will be good to take a look at them.
BZDATETIME::2012-01-19 10:32:41
BZCOMMENTOR::Bob Kline
BZCOMMENT::80
(In reply to comment #79)
> Please run the import job.
http://franck.nci.nih.gov/cgi-bin/cdr/show-cdr-doc.py?id=712085
http://franck.nci.nih.gov/cgi-bin/cdr/show-ctrp-doc.py?id=NCI-2009-00449
http://franck.nci.nih.gov/cgi-bin/cdr/show-cdr-doc.py?id=450985
http://franck.nci.nih.gov/cgi-bin/cdr/show-ctrp-doc.py?id=NCI-2009-00460
BZDATETIME::2012-01-20 10:25:24
BZCOMMENTOR::Bob Kline
BZCOMMENT::81
CTRP reported that they have filed a ticket for giving us CTEP IDs. We agreed in yesterday's status meeting that we will use these to create a one-time seed of the mapping table for the existing persons and orgs, but we will not rewrite the import software to use CTEP IDs for mapping as part of the import job.
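A rough sketch of what such a one-time seed could look like, assuming we can read (po_id, CTEP ID) pairs out of the CTRP documents and that the existing CTEP/CTSU-based usages give us a lookup from CTEP ID to already-mapped CDR IDs; the names below are hypothetical, not the actual implementation:

# Rough sketch of the one-time seed, not the actual implementation.
# ctrp_entities: (po_id, ctep_id) pairs pulled from the CTRP documents.
# ctep_to_cdr: lookup from CTEP ID to the CDR IDs already mapped to it
#              (e.g. via the CTSU_Person_ID / CTSU_Institution_Code
#              usages mentioned earlier); the shape is an assumption.
def seed_ctrp_po_mappings(ctrp_entities, ctep_to_cdr):
    seed = {}
    for po_id, ctep_id in ctrp_entities:
        if not ctep_id:
            continue                      # CTRP had no CTEP ID for this entity
        cdr_ids = ctep_to_cdr.get(ctep_id, [])
        if len(cdr_ids) == 1:             # seed only unambiguous matches
            seed[po_id] = cdr_ids[0]
    return seed                           # po_id -> CDR ID, to be reviewed by CIAT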
BZDATETIME::2012-01-20 13:42:43
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::82
(In reply to comment #81)
> CTRP reported that they have filed a ticket for giving us CTEP IDs.
> We agreed in yesterday's status meeting that we will use these to
> create a one-time seed of the mapping table for the existing persons
> and orgs, but we will not rewrite the import software to use CTEP IDs
> for mapping as part of the import job.
Since the plan is to import only trials that have been transferred and converted to CTGovProtocols, I assume you will retrieve the CTEP IDs before the trials (sites) are ready to be imported into the CDR? Also, since CTRP has still not been able to identify all RSS trials, we may have to broaden the scope of the trials from which we get CTEP IDs to include ones that have not yet been tagged as RSS trials, in order to get as many CTEP IDs as possible. On the other hand, we can wait until at least most of the RSS trials have been identified and tagged as such before getting the CTEP IDs, since this is going to be a one-time seed.
BZDATETIME::2012-01-23 09:42:36
BZCOMMENTOR::Bob Kline
BZCOMMENT::83
Since there's nothing to prevent us from sucking all of the IDs out of every document in their feed, regardless of whether we plan to import the documents, and creating the one-time seed of the mapping table, the only drawback I can see to mapping every person and organization for which they give us a CTEP ID we can trace back unambiguously to a CDR document is that CIAT will have more mappings to review to verify that they are correct. Can't do anything yet, of course, until they get the CTEP IDs into their documents. We can talk further about how long we want to defer this seeding event.
BZDATETIME::2012-01-25 09:25:11
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::84
(In reply to comment #80)
> (In reply to comment #79)
>
> > Please run the import job.
>
> http://franck.nci.nih.gov/cgi-bin/cdr/show-cdr-doc.py?id=712085
> http://franck.nci.nih.gov/cgi-bin/cdr/show-ctrp-doc.py?id=NCI-2009-00449
> http://franck.nci.nih.gov/cgi-bin/cdr/show-cdr-doc.py?id=450985
> http://franck.nci.nih.gov/cgi-bin/cdr/show-ctrp-doc.py?id=NCI-2009-00460
CTRP is sending us data for the <overall_official> but this is missing from the CTRP Info block of the converted document. Please add this information.
BZDATETIME::2012-01-25 11:03:52
BZCOMMENTOR::Bob Kline
BZCOMMENT::85
There's lots of information in the CTRP documents that is not being imported into the CTGovProtocol documents. The overall_official element is an example of such excluded information. You can see what we're pulling in by looking at the definition of the CTRPInfo block in the CTGovProtocol schema.
BZDATETIME::2012-01-25 11:58:25
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::86
(In reply to comment #85)
> There's lots of information in the CTRP documents that is not being
> imported into the CTGovProtocol documents. The overall_official
> element is an example of such excluded information. You can see what
> we're pulling in by looking at the definition of the CTRPInfo block in
> the CTGovProtocol schema.
Yes. I am aware that we don't include a lot of information from the CTRP document, but the overall_official is one piece of information we should not have excluded, since we publish it to cancer.gov. We can get this information from the CTGovProtocol, but for consistency we should be getting all contact information from the CTRP document.
BZDATETIME::2012-01-25 12:11:51
BZCOMMENTOR::Bob Kline
BZCOMMENT::87
(In reply to comment #86)
> ... we should be getting all contact information from the CTRP document.
That's not consistent with our decision to restrict our import to the site information.
BZDATETIME::2012-01-25 12:26:32
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::88
(In reply to comment #87)
> (In reply to comment #86)
>
> > ... we should be getting all contact information from the CTRP
> > document.
>
> That's not consistent with our decision to restrict our import to the
> site information.
It looks like overall_official is considered “site information” because it seems we publish that information under the broad heading of "Trial Sites" on cancer.gov.
BZDATETIME::2012-02-13 11:04:34
BZCOMMENTOR::Bob Kline
BZCOMMENT::89
(In reply to comment #81)
> CTRP reported that they have filed a ticket for giving us CTEP IDs.
> We agreed in yesterday's status meeting that we will use these to
> create a one-time seed of the mapping table for the existing persons
> and orgs, but we will not rewrite the import software to use CTEP IDs
> for mapping as part of the import job.
It appears that most of the orgs will be mapped (in fact, it looks like most if not all of the ones that aren't mapped are unmapped because CTRP didn't have a CTEP ID for the org), but well under half of the persons have mappings.
Attachment one-time-ctrp-po-mappings.xls has been added with description: Mappings gleaned from CTEP IDs
BZDATETIME::2012-02-20 10:59:47
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::90
We've reviewed several of the mappings contained in the spreadsheet and didn't see anything out of the ordinary. Please proceed to load this on Franck so that we can continue testing. We may have to wait until we're ready to go live to load the mappings on Bach.
BZDATETIME::2012-02-22 08:41:17
BZCOMMENTOR::Bob Kline
BZCOMMENT::91
(In reply to comment #90)
> Please proceed to load this on Franck so that we can continue testing.
Done. One mapping didn't load because the CDR document (715609) is on Bach but not yet on Franck.
BZDATETIME::2012-02-23 09:18:31
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::92
(In reply to comment #91)
> (In reply to comment #90)
>
> > Please proceed to load this on Franck so that we can continue
> > testing.
>
> Done. One mapping didn't load because the CDR document (715609) is on
> Bach but not yet on Franck.
Please run the import job. We have about three trials ready to review.
BZDATETIME::2012-02-24 14:13:04
BZCOMMENTOR::Bob Kline
BZCOMMENT::93
(In reply to comment #92)
> Please run the import job. We have about three trials ready to review.
Done:
22 CTRP trials queued for site import
Trial 'NCI-2009-00315' skipped (mapping problems)
Trial 'NCI-2009-00434' skipped (mapping problems)
Trial 'NCI-2009-00495' skipped (mapping problems)
Trial 'NCI-2009-00496' skipped (mapping problems)
Trial 'NCI-2009-00654' skipped (mapping problems)
Trial 'NCI-2009-00791' skipped (mapping problems)
Trial 'NCI-2009-01173' skipped (mapping problems)
Trial 'NCI-2009-01664' skipped (mapping problems)
Trial 'NCI-2011-01981' skipped (mapping problems)
Trial 'NCI-2011-01982' skipped (mapping problems)
Trial 'NCI-2011-02016' skipped (mapping problems)
Trial 'NCI-2011-02023' skipped (mapping problems)
Trial 'NCI-2011-02029' skipped (mapping problems)
merging sites from CTRP trial NCI-2011-02041 into CDR712087
creating publishable version of CDR712087 from sites in CTRP trial
NCI-2011-02041
Trial 'NCI-2011-02052' skipped (mapping problems)
merging sites from CTRP trial NCI-2011-02053 into CDR454735
creating publishable version of CDR454735 from sites in CTRP trial
NCI-2011-02053
Trial 'NCI-2011-02067' skipped (mapping problems)
Trial 'NCI-2011-02187' skipped (mapping problems)
Trial 'NCI-2011-02609' skipped (mapping problems)
Trial 'NCI-2011-02615' skipped (mapping problems)
Trial 'NCI-2011-02619' skipped (mapping problems)
merging sites from CTRP trial NCI-2011-02678 into CDR582858
creating publishable version of CDR582858 from sites in CTRP trial
NCI-2011-02678
Updated 3 trials
Skipped 19 trials
BZDATETIME::2012-02-28 11:27:57
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::94
(In reply to comment #93)
> (In reply to comment #92)
>
> > Please run the import job. We have about three trials ready to
review.
>
> Done:
> NCI-2011-02678
> Updated 3 trials
> Skipped 19 trials
Please post the links to the xml files for the ones that have imported.
BZDATETIME::2012-03-01 09:11:20
BZCOMMENTOR::Bob Kline
BZCOMMENT::95
(In reply to comment #94)
> Please post the links to the xml files for the ones that have imported.
http://franck.nci.nih.gov/cgi-bin/cdr/show-cdr-doc.py?id=712087
http://franck.nci.nih.gov/cgi-bin/cdr/show-ctrp-doc.py?id=NCI-2011-02041
http://franck.nci.nih.gov/cgi-bin/cdr/show-cdr-doc.py?id=454735
http://franck.nci.nih.gov/cgi-bin/cdr/show-ctrp-doc.py?id=NCI-2011-02053
http://franck.nci.nih.gov/cgi-bin/cdr/show-cdr-doc.py?id=582858
http://franck.nci.nih.gov/cgi-bin/cdr/show-ctrp-doc.py?id=NCI-2011-02678
BZDATETIME::2012-03-21 09:49:01
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::96
Please run the import job again and post the xml files for the two completed trials we just reviewed and mapped.
BZDATETIME::2012-03-21 11:14:27
BZCOMMENTOR::Bob Kline
BZCOMMENT::97
(In reply to comment #96)
> Please run the import job again and post the xml files for the two
> completed trials we just reviewed and mapped.
http://franck.nci.nih.gov/cgi-bin/cdr/show-cdr-doc.py?id=593237
http://franck.nci.nih.gov/cgi-bin/cdr/show-ctrp-doc.py?id=NCI-2011-01982
http://franck.nci.nih.gov/cgi-bin/cdr/show-cdr-doc.py?id=712088
http://franck.nci.nih.gov/cgi-bin/cdr/show-ctrp-doc.py?id=NCI-2011-02615
Looks like you've got a mapping from po_id 1529517 (Jondavid Pollock) to org document CDR30708 (Aurora Health Center - Kenosha), which caused one of the documents (CDR593237) to have a non-publishable version created instead of a publishable version. Since the CDR ID for the org document is so low, I assume it's not a problem caused by mapping to a document so new that the CDR IDs are different between the test and production servers.
BZDATETIME::2012-03-21 11:52:43
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::98
(In reply to comment #97)
> (In reply to comment #96)
>
> > Please run the import job again and post the xml files for the two
> > completed trials we just reviewed and mapped.
>
> http://franck.nci.nih.gov/cgi-bin/cdr/show-cdr-doc.py?id=593237
> http://franck.nci.nih.gov/cgi-bin/cdr/show-ctrp-doc.py?id=NCI-2011-01982
> http://franck.nci.nih.gov/cgi-bin/cdr/show-cdr-doc.py?id=712088
> http://franck.nci.nih.gov/cgi-bin/cdr/show-ctrp-doc.py?id=NCI-2011-02615
>
Thanks!
> Looks like you've got a mapping from po_id 1529517 (Jondavid Pollock)
> to org document CDR30708 (Aurora Health Center - Kenosha), which
> caused one of the documents (CDR593237) to have a non-publishable
> version created instead of a publishable version.
This error has been fixed in the mapping table.
> Since the CDR ID for the org document is so low, I assume it's not a
> problem caused by mapping to a document so new that the CDR IDs are
> different between the test and production servers.
I am not sure I understand the above. Could you please explain further ?
Also, I vaguely remember that we've discussed why the Person records don't have addresses in the mapping gaps report. Do you remember why no address information is included for the person records?
BZDATETIME::2012-03-21 12:06:01
BZCOMMENTOR::Bob Kline
BZCOMMENT::99
(In reply to comment #98)
> > Since the CDR ID for the org document is so low, I assume it's not
> > a problem caused by mapping to a document so new that the CDR IDs
> > are different between the test and production servers.
>
> I am not sure I understand the above. Could you please explain
> further?
I can imagine a situation in which you found a newer person document on Bach which matched a po_id and you plugged the CDR ID into the mapping table on Franck, but the document was so new that it was created after Franck had been refreshed, and some other document (in this case, an org document) was created on Franck with that CDR ID. I knew that couldn't be the explanation in this case because the CDR ID was so low that it had to represent a document which was already on Bach before the last time Franck was refreshed. Does that help?
> Also, I vaguely remember that we've discussed why the Person records
> don't have addresses in the mapping gaps report. Do you remember why
> no address information is included for the person records?
Because we're no longer getting it from CTRP. They were sending us bogus values for the address information and some of the values were so obviously fake that even the software recognized the problems and refused to validate the results. When we asked them to fix the bad data they said the only fix they could think of was to drop the address information. Not the solution I would have picked, but it wasn't my decision to make. :-)
BZDATETIME::2012-03-21 12:37:56
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::100
(In reply to comment #99)
> (In reply to comment #98)
> >
> > I am not sure I understand the above. Could you please explain
> > further?
>
> I can imagine a situation in which you found a newer person document
> on Bach which matched a po_id and you plugged the CDR ID into the
> mapping table on Franck, but the document was so new that it was
> created after Franck had been refreshed, and some other document (in
> this case, an org document) was created on Franck with that CDR ID. I
> knew that couldn't be the explanation in this case because the CDR ID
> was so low that it had to represent a document which was already on
> Bach before the last time Franck was refreshed. Does that help?
Yes it does. Thanks!
>
> > Also, I vaguely remember that we've discussed why the Person records
> > don't have addresses in the mapping gaps report. Do you remember why
> > no address information is included for the person records?
>
> Because we're no longer getting it from CTRP. They were sending us
> bogus values for the address information and some of the values were
> so obviously fake that even the software recognized the problems and
> refused to validate the results. When we asked them to fix the bad
> data they said the only fix they could think of was to drop the
> address information. Not the solution I would have picked, but it
> wasn't my decision to make. :-)
Thanks! I thought they were dropping the addresses for persons with the UNKNOWN or fake values only, and not for all addresses that weren't fake. If we continue this way, the person mapping we are doing will not be reliable, and it will also defeat the purpose of mapping the person records. Should I contact Sulekha and Charles about this?
BZDATETIME::2012-03-21 13:31:19
BZCOMMENTOR::Bob Kline
BZCOMMENT::101
(In reply to comment #100)
> Thanks! I thought they were dropping the addresses for persons with
> the UNKNOWN or fake values only, and not for all addresses that
> weren't fake. If we continue this way, the person mapping we are
> doing will not be reliable, and it will also defeat the purpose of
> mapping the person records. Should I contact Sulekha and Charles
> about this?
Sure. Good luck!
BZDATETIME::2012-03-21 14:02:52
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::102
(In reply to comment #101)
> (In reply to comment #100)
>
> > Thanks! I thought they were dropping the addresses for persons with
> > the UNKNOWN or fake values only, and not for all addresses that
> > weren't fake. If we continue this way, the person mapping we are
> > doing will not be reliable, and it will also defeat the purpose of
> > mapping the person records. Should I contact Sulekha and Charles
> > about this?
>
> Sure. Good luck!
If a PI is always assigned to an Org in the data CTRP is currently providing, that should be helpful. In that case, for each person with a mapping gap, if you provide us with the name and address of the org, that should solve the problem. Having the names by themselves, as we currently do, makes finding the correct person record difficult, since we have to make a lot of guesses.
BZDATETIME::2012-03-22 09:56:04
BZCOMMENTOR::Bob Kline
BZCOMMENT::103
(In reply to comment #102)
> If a PI is always assigned to an Org in the data CTRP is currently
> providing, that should be helpful.
Most PIs we get from them do not have an affiliation block.
BZDATETIME::2012-03-26 11:16:51
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::104
Adding email communication with CTRP -
From: Sulekha Avasthi savasthi@samvit-solutions.com
Sent: Thursday, March 22, 2012 3:57 PM
To: Osei-Poku, William
Cc: cyaghmour@samvit-solutions.com; 'Bob Kline'; 'Beckwith, Margaret
(NIH/NCI) [E]'; 'edmond mulaire'
Subject: RE: Getting address info. for person records
Hi William
We do have some unknowns for Organizations and Persons addresses in
our Database which we need to cleanup. It is a known issue.
However, we are still sending the addresses in the export regardless of
‘Unknowns’ for Organization.
We don’t send the addresses at person level because requirement document
does not state to include address for persons.
The following is an excerpt from Requirement Document.
• For all persons and organizations provide the following
information:
a. PO-ID
b. CTEP-ID (when available)
c. Full address for Organizations
d. Phone number, and extension when available
e. Email address
We can discuss the alternative options to locate the ‘Persons addresses’ in tomorrow’s call.
Thanks
Sulekha
From: Osei-Poku, William wOsei-Poku@icfi.com
Sent: Thursday, March 22, 2012 2:16 PM
To: savasthi@samvit-solutions.com
Cc: cyaghmour@samvit-solutions.com; Bob Kline; Beckwith, Margaret
(NIH/NCI) [E]
Subject: Getting address info. for person records
Hi Sulekha,
Would it be possible for you to export addresses for PIs in all cases
except when they are Unknown (that is, only drop the address block for
records with bogus or Unknown values?). We ran into a few problems while
trying to identify some of the person records to map them to existing
CDR records. We do this for two main purposes; to avoid creating
duplicate records and also to get better or accurate results when users
search for trials using PI names on Cancer.gov. It looks like it was
decided during one of the conference calls to not export addresses for
person records due to the issue of the Unknown or bogus addresses we
received initially and to rely on the affiliation addresses instead.
However, currently most PIs don’t have an affiliation block in the
current data which makes getting the addresses for the PIs from you very
important.
Thanks,
William
William Osei-Poku
Cancer Information Analysis and Tracking – CIAT
ICF International, Contractor
Office: 301-407-6647
Cell: 240-506-5860
BZDATETIME::2012-03-26 11:22:06
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::105
> ------Original Message
> From: Osei-Poku, William
> Sent: Friday, March 23, 2012 3:01 PM
> To: 'Bob Kline'; Sulekha Avasthi
> Cc: cyaghmour@samvit-solutions.com; edmond mulaire
> Subject: RE: FW: Getting address info. for person records
>
> [Osei-Poku, William]
>
> Like I said on the call, the best solution for me is to provide the
> actual addresses of the PIs if they are available.
> However, if that is not possible, using the addresses of the
> facilities each PI is associated with should help somewhat. It will
> not be the perfect solution but it would be better than the current
> situation.
>
> Thanks,
> William
>
>
> > ------Original Message
> > From: Bob Kline bkline@rksystems.com
> > Sent: Friday, March 23, 2012 2:54 PM
> > To: Sulekha Avasthi
> > Cc: Osei-Poku, William; cyaghmour@samvit-solutions.com; edmond
mulaire
> >
> > That's more a question for William, who's having to figure out
which
> > John Smith he's dealing with when he's filling in mapping
gaps. It's
> > an issue of how reliable the link between the site
participating in
> > the trial and the person is as distinguishing information.
That's
> > subject matter with which William is much more familiar than I
am.
> >
> > --
> > Bob Kline
> > http://www.rksystems.com
> > mailto:bkline@rksystems.com
>
>
> [Osei-Poku, William] From: Sulekha Avasthi savasthi@samvit-solutions.com
> Sent: Friday, March 23, 2012 2:34 PM
> To: 'Bob Kline'
> Cc: Osei-Poku, William; cyaghmour@samvit-solutions.com; 'edmond
mulaire'
> Subject: FW: Getting address info. for person records
>Hi Bob
> We just finished the PDQ Call.
> We have Principal Investigator’s address available under Overall Official. For Overall Official we also have affiliated organization address. These are available only at Study level.
> At each location/participating site level, even if we don’t have addresses for principal investigators available, we do have addresses of facility. Can we use facility address as affiliated organization address for Principal Investigator at Location (Participating Site level)?
>Please let me know if you have any questions.
>Thanks
>Sulekha
>From: Sulekha Avasthi savasthi@samvit-solutions.com
>Sent: Thursday, March 22, 2012 3:57 PM
>To: 'Osei-Poku, William'
>Cc: 'cyaghmour@samvit-solutions.com'; 'Bob Kline'; 'Beckwith,
Margaret (NIH/NCI) [E]'; 'edmond mulaire'
>Subject: RE: Getting address info. for person records
>Hi William
>We do have some unknowns for Organizations and Persons addresses
in our Database which we need to cleanup. It is a >known issue.
>However, we are still sending the addresses in the export regardless
of ‘Unknowns’ for Organization.
>We don’t send the addresses at person level because requirement
document does not state to include address for >persons.
>The following is an excerpt from Requirement Document.
>• For all persons and organizations provide the following
information:
>a. PO-ID
>b. CTEP-ID (when available)
>c. Full address for Organizations
>d. Phone number, and extension when available
>e. Email address
>We can discuss the alternative options to locate the ‘Persons addresses’ in tomorrow’s call.
>Thanks
>Sulekha
BZDATETIME::2012-03-26 11:39:24
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::106
(In reply to comment #103)
> (In reply to comment #102)
>
> > If a PI is always assigned to an Org in the data CTRP is currently
> > providing, that should be helpful.
>
> Most PIs we get from them do not have an affiliation block.
On last Friday's call, Sulekha said that they can provide us with the addresses of the PIs but they will not be adding them to the existing XML files. I thought that getting the addresses separately from the XML files would not be the best solution unless Bob can extract the information and incorporate it in the interface for reviewing the mapping gaps. If this is not possible to do programmatically, then it will be even more difficult for us to review the person records by referencing another spreadsheet in the process. The other solution, which is not necessarily better but may be helpful, is to use the names and addresses of the facilities of each PI to aid in searching for the PIs, since at least for trials that are active and have participating sites, each PI will have a facility assigned to him or her.
BZDATETIME::2012-04-05 08:59:48
BZCOMMENTOR::Bob Kline
BZCOMMENT::107
(In reply to comment #106)
> ... The other solution, which is not necessarily better but may be
> helpful is to use the names and addresses of the facilities of each
> PI to aid in searching for the PIs ....
Are you asking me to implement this, or is this just in the speculation stage (that is, you're still weighing whether it would actually be helpful)? If you are asking for this enhancement, can you confirm that you only want the facility address for persons who appear in the location/investigator blocks with role of "Principal Investigator"?
BZDATETIME::2012-04-12 10:10:44
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::108
(In reply to comment #107)
> (In reply to comment #106)
>
> > ... The other solution, which is not necessarily better but may be
> > helpful is to use the names and addresses of the facilities of each
> > PI to aid in searching for the PIs ....
>
> Are you asking me to implement this, or is this just in the
> speculation stage (that is, you're still weighing whether it would
> actually be helpful)? If you are asking for this enhancement, can you
> confirm that you only want the facility address for persons who appear
> in the location/investigator blocks with role of "Principal
> Investigator"?
In our last CDR meeting we agreed that it would be helpful for Margaret to put in a word for adding the PIs' addresses to the schema, and for that matter the existing XML files, instead of providing them in a separate file as Sulekha suggested. That would be our first choice and it would be most helpful in finding the correct records in the CDR. Displaying the organization addresses in the web interface would be the last resort. While that would be helpful in certain cases, it would not be helpful in the vast majority of cases.
BZDATETIME::2012-05-10 13:54:10
BZCOMMENTOR::Bob Kline
BZCOMMENT::109
Decided at the status meeting:
* drop CTRP overall official from our publishing filters
* make the person mapping optional
* keep the org mapping required
* keep the person mapping gaps on the mapping gap report
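A small sketch of what these decisions could mean for the site merge step (the names below are hypothetical, not the code in ImportCtrpSites.py): a missing organization mapping still blocks the trial, while a missing person mapping is simply skipped but continues to appear on the mapping gap report.

# Hypothetical sketch only; not the actual import code.
class MappingGap(Exception):
    """Raised when a required (organization) mapping is missing."""

def merge_site(site, org_map, person_map):
    # site: {"org_po_id": ..., "investigator_po_ids": [...]} from a CTRP trial
    org_cdr_id = org_map.get(site["org_po_id"])
    if org_cdr_id is None:
        raise MappingGap("unmapped org po_id %s" % site["org_po_id"])
    # Person mapping is optional: unmapped investigators are left out here
    # (they still show up on the mapping gap report).
    investigators = [person_map[po_id]
                     for po_id in site["investigator_po_ids"]
                     if po_id in person_map]
    return {"org": org_cdr_id, "investigators": investigators}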
BZDATETIME::2012-06-14 11:58:18
BZCOMMENTOR::Bob Kline
BZCOMMENT::110
(In reply to comment #109)
> Decided at the status meeting:
>
> * drop CTRP overall official from our publishing filters
> * make the person mapping optional
> * keep the org mapping required
> * keep the person mapping gaps on the mapping gap report
The schema modification has been made on Mahler and Franck. I estimate that I can complete the rest of the work for these changes in 4 hours or less.
BZDATETIME::2012-06-14 15:44:02
BZCOMMENTOR::Bob Kline
BZCOMMENT::111
Are we ready to push the CTRP mapping values from Franck to Bach? Or did you want to do some more work on those before I do that? We definitely want to take care of this before the next refresh of Franck, though Volker tells me there are no immediate plans for such a refresh. Of the 5,394 rows in the mapping table with CTRP_PO_ID as the usage, 3,621 have mappings to CDR documents. Of those, 21 are to documents created on Franck without counterparts on Bach, so very little of the mapping work done on Franck will be lost.
BZDATETIME::2012-06-14 16:22:47
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::112
(In reply to comment #111)
> Are we ready to push the CTRP mapping values from Franck to Bach? Or
> did you want to do some more work on those before I do that? We
> definitely want to take care of this before the next refresh of
> Franck, though Volker tells me there are no immediate plans for such a
> refresh. Of the 5,394 rows in the mapping table with CTRP_PO_ID as
> the usage, 3,621 have mappings to CDR documents. Of those, 21 are to
> documents created on Franck without counterparts on Bach, so very
> little of the mapping work done on Franck will be lost.
We would like to do some more mapping on Franck quickly before you push the values to Bach.
BZDATETIME::2012-06-14 17:21:37
BZCOMMENTOR::Bob Kline
BZCOMMENT::113
(In reply to comment #112)
> We would like to do some more mapping on Franck quickly before you
> push the values to Bach.
OK. Be aware, though, that there's nothing I can think of that would prohibit creating mappings on Bach. They just wouldn't be used by any software until we roll out the CTRP download and import jobs.
BZDATETIME::2012-06-14 17:31:05
BZCOMMENTOR::Bob Kline
BZCOMMENT::114
We said in this afternoon's status meeting that the initial CTRP import activity would be for new trials. I printed out this issue and started to read through it to refresh my memory on the logic for how the download software knows which new trials to put in the queue for importing sites. I quickly realized that it would be more efficient and reliable to have William or Margaret provide a succinct statement of what that logic should be (at least for the new trials that we'll have to deal with at the outset) than for me to piece it together from this 35-page printout. I did get far enough in my reading of the issue's comments to find this gem: "The worst-case scenario is that we have what we need to begin development and testing at the same point as we are expected to have all of this in production." I assume the testing part will still be true, right? That is, the test will be what happens when we are in production.
BZDATETIME::2012-06-15 10:28:04
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::115
(In reply to comment #113)
> (In reply to comment #112)
>
> > We would like to do some more mapping on Franck quickly before you
> > push the values to Bach.
>
> OK. Be aware, though, that there's nothing I can think of that would
> prohibit creating mappings on Bach. They just wouldn't be used by any
> software until we roll out the CTRP download and import jobs.
In that case, please proceed to push the values to Bach. I was concerned we wouldn't be able to map on Bach.
BZDATETIME::2012-06-15 11:32:55
BZCOMMENTOR::Bob Kline
BZCOMMENT::116
(In reply to comment #115)
> In that case, please proceed to push the values to Bach.
Before I do, you might want to check the 10 mappings for the CTRP_PO_ID usage and let me know whether you want them preserved (don't know if they duplicate or conflict with mappings on Franck).
BZDATETIME::2012-06-15 12:25:10
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::117
(In reply to comment #116)
> (In reply to comment #115)
>
> > In that case, please proceed to push the values to Bach.
>
> Before I do, you might want to check the 10 mappings for the
> CTRP_PO_ID usage and let me know whether you want them preserved
> (don't know if they duplicate or conflict with mappings on Franck).
I am not sure about the 10 mappings you're referring to. Are these the first 10 mappings that show up on the CTRP_PO_ID mapping usage table? On the other hand if we will be losing just 10 mappings, that should not pose any serious problems.
BZDATETIME::2012-06-15 12:48:04
BZCOMMENTOR::Bob Kline
BZCOMMENT::118
(In reply to comment #117)
> I am not sure about the 10 mappings you're referring to. Are these
> the first 10 mappings that show up on the CTRP_PO_ID mapping usage
> table?
They're the only ones that show up for that usage on Bach.
BZDATETIME::2012-06-15 13:30:45
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::119
(In reply to comment #118)
> (In reply to comment #117)
>
> > I am not sure about the 10 mappings you're referring to. Are these
> > the first 10 mappings that show up on the CTRP_PO_ID mapping usage
> > table?
>
> They're the only ones that show up for that usage on Bach.
Found them. Thanks! I was still looking on Franck :-). I have been trying to find out how the table got populated on Bach since we haven't actively populated the table, as far as I can remember. At this time, I am still inclined to either delete or overwrite them since we have not started using the table on Bach yet and it doesn't look like losing the 10 mappings would cause any problems. If you agree, please proceed to copy the mapping values from Franck to Bach and overwrite the 10 mappings.
BZDATETIME::2012-06-15 16:03:47
BZCOMMENTOR::Bob Kline
BZCOMMENT::120
(In reply to comment #119)
> Found them. Thanks! I was still looking on Franck :-). I have been
> trying to find out how the table got populated on Bach since we
> haven't actively populated the table, as far as I can remember. At
> this time, I am still inclined to either delete or overwrite them
> since we have not started using the table on Bach yet and it doesn't
> look like losing the 10 mappings would cause any problems. If you
> agree, please proceed to copy the mapping values from Franck to Bach
> and overwrite the 10 mappings.
"Delete" and "overwrite" aren't exactly the same thing. Earlier in your comment you said "delete or overwrite" whereas later you said "overwrite" the 10 mappings. If you can confirm that it makes no difference to you which approach I use I'll just delete them from Bach, as that would simplify the process.
BZDATETIME::2012-06-15 16:17:20
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::121
(In reply to comment #120)
> If you can confirm that it makes no difference to you which approach
> I use I'll just delete them from Bach, as that would simplify the
> process.
Confirmed. Please delete them from Bach.
BZDATETIME::2012-06-15 16:32:10
BZCOMMENTOR::Bob Kline
BZCOMMENT::122
OK, I wiped out the 10 mappings which were there and have installed the 3,600 mappings from Franck which pointed to documents which were also on Bach. Please look them over carefully to confirm that the transfer was successful. Also, please make sure everyone knows that any future work on CTRP mapping which they expect to carry forward into production must be done on BACH.
BZDATETIME::2012-06-19 10:38:15
BZCOMMENTOR::Bob Kline
BZCOMMENT::123
[Email communication 2012-06-19 from Edmond Mulaire (CTRP)]
Bob,
We are preparing our QA1 environment for your team to resume testing of the nightly XML file dumps. This is the environment you tested on earlier.
We believe that we've included all the agreed upon changes and that the nightly job should now work as expected.
I anticipate we will have QA1 ready for your team in 1-3 days. We are just waiting at this point on getting a refresh of QA1 with latest from PROD DB so that we can have you test against most current data.
Sulekha or I will let you know once the environment is ready and will set up a call to discuss the testing and next steps. In the meantime, please let us know if you have any questions for us or recommendations for the testing.
Thanks,
Edmond
BZDATETIME::2012-06-19 10:43:37
BZCOMMENTOR::Bob Kline
BZCOMMENT::124
(In reply to comment #114)
> ... it would be more efficient and reliable to have William or
> Margaret provide a succinct statement of what that logic should be
> (at least for the new trials that we'll have to deal with at the
> outset) ....
In light of Edmond's email message (posted in the previous comment) talking about an upcoming phone call to discuss testing, it would probably be good to nail down the logic for identifying new CTRP trials which should be queued for merge of trial information into CTGovProtocol documents pretty soon. Also, can you tell me if for new trials we'll be in a position to actually test with the QA data they're preparing for us? Or will we not have the corresponding CTGovProtocol documents in our system?
BZDATETIME::2012-06-22 16:51:44
BZCOMMENTOR::Bob Kline
BZCOMMENT::125
I have checked the latest versions of the code into version control, and they are installed on Franck. Nothing's set up in either of the schedulers. For one thing, so many aspects of this project are extremely volatile, and it makes much more sense to me that we would run the jobs by hand at the outset, monitoring them very closely for unexpected behavior. For another thing, it probably makes more sense to have Volker plug this into the Windows scheduler (with which he's familiar and I'm not) than to have me install it with the SQL Server scheduler. The three source files are:
Utilities/bin/DownloadCtrpTrials.py
Utilities/bin/ImportCtrpSites.py
lib/Python/ctrp.py
The first two (the main scripts) have adequate documentation in the code. The third has very skimpy documentation, which I plan to flesh out after the dust has settled on the requirements for the system. I'm hoping that Volker and Alan won't have need to dig into that module too much in my absence.
I have also updated and checked in tables.sql and CreateLogins.sql.
Alan and Volker:
Please take a look at the download and import scripts and let me know if you think it's clear enough how things work, at least at a high level, in anticipation of the possibility that you may have to step in to fix things if we get something unexpected from CTRP, or if the fact that I haven't been able to do any end-to-end testing means I've missed some bugs (that's a certainty). One light of hope is that I just heard from CTRP that their launch will likely be delayed (I guess I'm less surprised than Margaret will be). If we're really lucky, things won't really get started until at least after July (and if we're really, really lucky, it will never happen).
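For what it's worth, a hand run at the outset might be wrapped in something like the following. This is a sketch only; it assumes the two scripts can be invoked directly with no arguments, which should be checked against the documentation in the scripts themselves before relying on it.

# Hypothetical wrapper for the manual runs described above; invocation
# details (no arguments, plain exit codes) are assumptions.
import datetime
import subprocess
import sys

SCRIPTS = ("Utilities/bin/DownloadCtrpTrials.py",
           "Utilities/bin/ImportCtrpSites.py")

def run_ctrp_jobs():
    for script in SCRIPTS:
        print("%s starting %s" % (datetime.datetime.now(), script))
        rc = subprocess.call([sys.executable, script])
        if rc:
            # Stop here so unexpected CTRP data can be examined before
            # anything is imported.
            raise RuntimeError("%s failed with exit code %d" % (script, rc))

if __name__ == "__main__":
    run_ctrp_jobs()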
BZDATETIME::2012-06-28 11:40:32
BZCOMMENTOR::Margaret Beckwith
BZCOMMENT::126
Well, I had heard that there was likely a delay at the CTRP meeting last week and forgot to pass it along since I left town right after the meeting. I was a little surprised, but also really glad to hear it! I don't want it to be delayed forever because I really, really, want to get done with this for good.
BZDATETIME::2012-08-07 12:50:25
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::127
I have attached the spreadsheet of trials from CTRP which will be used to create mappings of CTRP and CDR trials. This should help us check to see if a trial from CTRP is new or not.
Attachment CTRP Trials 8-1-12.xlsx has been added with description: list of trials from CTRP
BZDATETIME::2012-08-28 14:44:45
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::128
The Mapping Gaps page on Bach is returning a CGI error
"CGI Error
The specified CGI application misbehaved by not returning a complete set
of HTTP headers."
Just a friendly reminder that I have attached the spreadsheet to be used to check for existing trials (comment #127).
BZDATETIME::2012-08-29 09:24:44
BZCOMMENTOR::Bob Kline
BZCOMMENT::129
(In reply to comment #127)
> I have attached the spreadsheet of trials from CTRP which will be
> used to create mappings of CTRP and CDR trials. This should help us
> check to see if a trial from CTRP is new or not.
I'm planning on ignoring rows that don't have all three IDs (CTRP, CDR, and NCT). Does that sound right? I assume there's an explanation for rows that are on the spreadsheet but don't have all three IDs.
Just to refresh everyone's memory for how I'll match trials from CTRP up with our CDR CTGovProtocol documents, here's the approach the download program will take:
For each trial in the download set:
  Check to see if the CTRP ID is already in the ctrp_import table
    If it is, we have already matched up the trial to the right document
  Otherwise, look in the mapping table derived from this spreadsheet
    If we find a match, we use it and add a row to the ctrp_import table
  If we still don't have a match, look for CTRP ID in the OrgStudyId
  or SecondaryID element of a CTGovProtocol doc
    If we find exactly one such document, accept that CTGovProtocol
    document as the match, and add a row to the ctrp_import table
Let me know if you see any gaps or other problems with this approach.
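As a sketch of that logic, the matching might look like this; the lookup helpers below are stand-ins for the real ctrp_import table, the spreadsheet-derived table, and the CTGovProtocol ID query, not actual CDR functions.

# Sketch of the matching approach described above; helper callables are
# placeholders for the real database and document queries.
def match_trial(ctrp_id, import_table_lookup, spreadsheet_map, ctgov_docs_with_id):
    """Return the CDR ID of the matching CTGovProtocol document, or None."""
    # 1. Already matched on a previous run?
    cdr_id = import_table_lookup(ctrp_id)
    if cdr_id:
        return cdr_id
    # 2. One-time table built from the spreadsheet.
    cdr_id = spreadsheet_map.get(ctrp_id)
    if cdr_id:
        return cdr_id
    # 3. CTRP ID appearing in OrgStudyId or SecondaryID; accept only an
    #    unambiguous (single) match.
    candidates = ctgov_docs_with_id(ctrp_id)
    if len(candidates) == 1:
        return candidates[0]
    return None   # unmatched (or ambiguous); the trial is not queued

A match found in step 2 or 3 would also get a new row added to the ctrp_import table, as described above.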
Before I can proceed any further, we need to address the problem that the spreadsheet maps more than one CTRP ID to the same CDR ID in a number of cases, and also maps more than one CTRP ID to the same NCT ID:
duplicate CDR ID in mapping table: 396777
duplicate NCT ID in mapping table: NCT00098839
duplicate CDR ID in mapping table: 445095
duplicate NCT ID in mapping table: NCT00238264
duplicate CDR ID in mapping table: 658554
duplicate NCT ID in mapping table: NCT01011478
duplicate CDR ID in mapping table: 658554
duplicate NCT ID in mapping table: NCT01011478
duplicate CDR ID in mapping table: 66277
duplicate NCT ID in mapping table: NCT00003325
duplicate CDR ID in mapping table: 643316
duplicate NCT ID in mapping table: NCT00951184
duplicate CDR ID in mapping table: 594334
duplicate NCT ID in mapping table: NCT01119560
duplicate CDR ID in mapping table: 656038
duplicate CDR ID in mapping table: 719956
duplicate NCT ID in mapping table: NCT01493817
If we don't fix this problem, I'm faced with the following choices:
1. Ignore those trials
2. Arbitrarily pick one of the CTRP trials to merge into the CTGov doc
3. Import from one, then overwrite from another CTRP trial
4. Modify the schema/software to merge multiple CTRP docs into 1 CDR doc
None of these options is very appealing, particularly the last.
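For reference, the duplicate checks behind the messages above amount to something like this, assuming the spreadsheet rows can be read as (CTRP ID, CDR ID, NCT ID) tuples:

# Sketch of the duplicate check; the row shape is an assumption about how
# the spreadsheet is read, not the actual code.
from collections import Counter

def report_duplicates(rows):
    # rows: iterable of (ctrp_id, cdr_id, nct_id) tuples
    cdr_counts = Counter(row[1] for row in rows if row[1])
    nct_counts = Counter(row[2] for row in rows if row[2])
    for cdr_id, count in cdr_counts.items():
        if count > 1:
            print("duplicate CDR ID in mapping table: %s" % cdr_id)
    for nct_id, count in nct_counts.items():
        if count > 1:
            print("duplicate NCT ID in mapping table: %s" % nct_id)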
While we're chewing on the above, should we have Volker refresh Franck so we can do some testing? It looks like we actually have some CTGovProtocols into which we can merge CTRP site information.
BZDATETIME::2012-08-29 11:02:13
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::130
(In reply to comment #129)
>
> I'm planning on ignoring rows that don't have all three IDs (CTRP,
> CDR, and NCT). Does that sound right? I assume there's an explanation
> for rows that are on the spreadsheet but don't have all three IDs.
It is okay to ignore rows without all three IDs.
>
> Just to refresh everyone's memory for how I'll match trials from CTRP
> up with our CDR CTGovProtocol documents, here's the approach the
> download program will take:
>
> For each trial in the download set:
>   Check to see if the CTRP ID is already in the ctrp_import table
>     If it is, we have already matched up the trial to the right document
>   Otherwise, look in the mapping table derived from this spreadsheet
>     If we find a match, we use it and add a row to the ctrp_import table
>   If we still don't have a match, look for CTRP ID in the OrgStudyId
>   or SecondaryID element of a CTGovProtocol doc
>     If we find exactly one such document, accept that CTGovProtocol
>     document as the match, and add a row to the ctrp_import table
>
> Let me know if you see any gaps or other problems with this approach.
This sounds right to me.
>
> Before I can proceed any further, we need to address the problem that
> the spreadsheet maps more than one CTRP ID to the same CDR ID in a
> number of cases, and also maps more than one CTRP ID to the same NCT
> ID:
>
Please ignore these for now. If I am recollecting correctly, you will
give us the ability to update the mapping table so we will investigate
and fix these anomalies.
> While we're chewing on the above, should we have Volker refresh Franck
> so we can do some testing? It looks like we actually have some
> CTGovProtocols into which we can merge CTRP site information.
Yes. That would be good. We have started getting some of these trials from CTGov already so we will have real data to test with on Franck.
BZDATETIME::2012-08-29 11:12:59
BZCOMMENTOR::Bob Kline
BZCOMMENT::131
(In reply to comment #130)
> > Before I can proceed any further, we need to address the problem
> > that the spreadsheet maps more than one CTRP ID to the same CDR ID
> > in a number of cases, and also maps more than one CTRP ID to the
> > same NCT ID:
> >
> Please ignore these for now. If I am recollecting correctly, you will
> give us the ability to update the mapping table so we will investigate
> and fix these anomalies.
This table is not the mapping table for which you will have a user interface. We're talking about the one-time table used to match up trials which were already in PDQ when we cut over to importing CTRP site information, based on the spreadsheet you give me. In other words, this is a static table. We can fix the spreadsheet before I put it on Bach, but don't think of it as a fluid, editable table.
> > ... should we have Volker refresh Franck ....
>
> Yes. That would be good. We have started getting some of these trials
> from CTGov already so we will have real data to test with on Franck.
Volker:
Could you do a refresh of Franck at some time when it's convenient for you? (You've probably been eager to get fresh data on Franck for a while now.) I believe I will either get what's already on Franck for CTRP (person, org mappings, etc.) from Bach, since we took the trouble to promote those mappings, or (for table definitions) I'll recreate the rest from SQL scripts (though I believe I've already got the tables on Bach, too).
Thanks!
BZDATETIME::2012-08-29 11:54:18
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::132
(In reply to comment #131)
> (In reply to comment #130)
>
> > > Before I can proceed any further, we need to address the problem
> > > that the spreadsheet maps more than one CTRP ID to the same CDR
> > > ID in a number of cases, and also maps more than one CTRP ID to
> > > the same NCT ID:
> > >
> > Please ignore these for now. If I am recollecting correctly, you
> > will give us the ability to update the mapping table so we will
> > investigate and fix these anomalies.
>
> This table is not the mapping table for which you will have a user
> interface. We're talking about the one-time table used to match up
> trials which were already in PDQ when we cut over to importing CTRP
> site information, based on the spreadsheet you give me. In other
> words, this is a static table. We can fix the spreadsheet before I
> put it on Bach, but don't think of it as a fluid, editable table.
Sounds good to me. We can update the spreadsheet before you do the final mapping. However, it is possible that we may want to update the mapping when we find errors in the future so I was under the impression that we would be able to update it without having to come to you each time we find something wrong.
BZDATETIME::2012-08-29 13:56:18
BZCOMMENTOR::Bob Kline
BZCOMMENT::133
(In reply to comment #132)
> ... I was under the impression that we would be able to update it
> without having to come to you each time we find something wrong.
Well, it's ultimately up to Margaret, balancing the requirements of all the projects we have going on with the available resources. My own picture of the equation is that, since this table's usefulness is temporary (it only applies to trials that are already in PDQ, not to any future trials), allocating resources to build a user interface for "fixing" the mapping in the table would be less appealing than if the table were a permanent feature of the CTRP process. Am I missing something?
BZDATETIME::2012-08-29 14:32:41
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::134
(In reply to comment #133)
> (In reply to comment #132)
>
> > ... I was under the impression that we would be able to update it
> > without having to come to you each time we find something wrong.
>
> Well, it's ultimately up to Margaret, balancing the requirements of
> all the projects we have going on with the available resources. My
> own picture of the equation is that, since this table's usefulness is
> temporary (it only applies to trials that are already in PDQ, not to
> any future trials), allocating resources to build a user interface
> for "fixing" the mapping in the table would be less appealing than if
> the table were a permanent feature of the CTRP process. Am I missing
> something?
I don't think you're missing anything and I am okay with going without an interface to update the mapping. However, I can't confirm that every single trial that needs to be transferred and ultimately updated by CTRP has been captured in the spreadsheet. Also, the mapping was done by CTRP and we reviewed a considerable number to confirm that the mapping was done right. It is conceivable that some trials are missing from the spreadsheet, for example. If we find out about this and other potential ID issues later, we will have to depend on you to update the table since we won't have access to it.
BZDATETIME::2012-08-29 16:14:23
BZCOMMENTOR::Bob Kline
BZCOMMENT::135
I have run a download job on Franck, resulting in 69 CTRP trial documents queued to have their site information imported. I then ran the import job. Two documents were imported, the rest had mapping problems (use the CTRP mapping interface to review these). The mapping interface shows missing person mappings, but the import job ignores those. You can see the documents represented in the ctrp_import table using
http://franck.nci.nih.gov/cgi-bin/cdr/show-ctrp-doc.py
The IDs in the first column link to a page showing you the XML we got from CTRP. The second column shows the current disposition for the trial. The IDs in the third column link to a page showing the corresponding CDR document. The IDs in the last column link to the CT.gov page for the trial.
BZDATETIME::2012-08-29 16:36:05
BZCOMMENTOR::Bob Kline
BZCOMMENT::136
(In reply to comment #135)
> Two documents were imported, the rest had mapping problems (use the
> CTRP mapping interface to review these).
Just a reminder: you probably don't want to invest a lot of time filling in the mapping gaps on Franck. Do it on Bach instead, and then we can have Volker do another refresh of Franck if you want to do more testing. You can do a little mapping on Franck if you like, but just remember that you'll have to do it all over again on Bach.
BZDATETIME::2012-08-29 19:37:09
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::137
There appear to be access issues with the mapping table on Franck. I am unable to add anything to it, and even when I click on the View button to see existing mappings, I am getting the following error:
CDR Web Interface
An error has occured
Failure inserting parms: The SELECT permission was denied on the
object 'url_parm_set', database 'cdr', schema 'dbo'.
Content-type: text/html
CDR Web Interface
An error has occured
Something went wrong
BZDATETIME::2012-08-29 22:55:52
BZCOMMENTOR::Bob Kline
BZCOMMENT::138
(In reply to comment #137)
> Something went wrong
Looks like an older version of the script to update the database permission got used for the refresh of Franck. I have updated the version of the script on all three servers from Subversion. Please try the mapping interface again.
BZDATETIME::2012-08-30 07:30:00
BZCOMMENTOR::Bob Kline
BZCOMMENT::139
I took one of the trials on Franck which had only a couple of missing Organization mappings and a handful of missing Person mappings. I mapped CTRP's po_id 170827 to the existing Organization document CDR0000029082, and I used the mapping gaps interface (pretending that there wasn't already an existing Organization document) to create the new Organization document CDR0000739362. I then ran the import job again and the sites for that trial (2009-00361) were imported:
http://franck.nci.nih.gov/cgi-bin/cdr/show-cdr-doc.py?id=507414
BZDATETIME::2012-08-30 09:42:55
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::140
I have added a few more mappings. Could you please run the import job again?
BZDATETIME::2012-08-30 09:51:25
BZCOMMENTOR::Bob Kline
BZCOMMENT::141
(In reply to comment #140)
> I have added a few more mappings. Could you please run the import
> job again?
Done. Two more trials made it in.
BZDATETIME::2012-08-30 10:05:06
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::142
(In reply to comment #129)
>
> Before I can proceed any further, we need to address the problem
that the
> spreadsheet maps more than one CTRP ID to the same CDR ID in a
number of cases,
> and also maps more than one CTRP ID to the same NCT ID:
>
I am attaching the updated spreadsheet. Besides fixing the errors you
reported, I found additional mapping errors among the 69 trials you
imported on Franck. Two trials had been mapped to blocked trials. I
have fixed the errors in the spreadsheet so that they will be mapped to
the right trials. Could you please run the import job again once you
incorporate the spreadsheet so that I can double-check the mappings?
Attachment 8-30-12 CTRP List.xlsx has been added with description: list of trials from CTRP
BZDATETIME::2012-08-30 11:37:56
BZCOMMENTOR::Bob Kline
BZCOMMENT::143
(In reply to comment #142)
> Could you please run the import job again once you incorporate the
> spreadsheet so that I can double-check the mappings?
I assumed you meant the download job, since that's the step which uses the table created from your spreadsheet. I ran it:
total trials downloaded: 835
RSS trials downloaded: 487
non-RSS trials downloaded: 348
new RSS trials queued: 0
changed RSS trials queued: 0
unchanged RSS trials skipped: 69
unmatched RSS trials skipped: 418
trials failed processing: 0
BZDATETIME::2012-08-30 11:55:52
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::144
(In reply to comment #143)
> (In reply to comment #142)
>
> > Could you please run the import job again once you incorporate the
> > spreadsheet so that I can double-check the mappings?
>
> I assumed you meant the download job, since that's the step which
> uses the table created from your spreadsheet. I ran it:
Yes. Thanks! Also, please run the import job again.
BZDATETIME::2012-08-30 16:40:25
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::145
Please run another import job. Hopefully, the last import job for today.
BZDATETIME::2012-08-30 16:50:07
BZCOMMENTOR::Bob Kline
BZCOMMENT::146
(In reply to comment #145)
> Please run another import job. Hopefully, the last import job for
> today.
No problem; only two keystrokes. Done.
BZDATETIME::2012-08-30 16:59:33
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::147
(In reply to comment #146)
> (In reply to comment #145)
> > Please run another import job. Hopefully, the last import job for
> > today.
>
> No problem; only two keystrokes. Done.
I expected CDR0000600217 NCI-2009-00312 to be imported but it did not.
BZDATETIME::2012-08-30 17:26:33
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::148
(In reply to comment #147)
> (In reply to comment #146)
> > (In reply to comment #145)
> > > Please run another import job. Hopefully, the last import job
> > > for today.
> >
> > No problem; only two keystrokes. Done.
>
> I expected CDR0000600217 NCI-2009-00312 to be imported but it did
> not.
It appears these data problems are what is causing it not to import.
Country None
State/Province None|New Zealand
State/Province None|None
BZDATETIME::2012-08-31 10:00:35
BZCOMMENTOR::Bob Kline
BZCOMMENT::149
(In reply to comment #148)
> It appears these data problems are what is causing it not to import.
> Country None
> State/Province None|New Zealand
> State/Province None|None
Right. You'll need to get CTRP to fix their data for those.
BZDATETIME::2012-08-31 10:18:03
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::150
(In reply to comment #149)
> (In reply to comment #148)
>
> > It appears these data problems are what is causing it not to
> > import.
> > Country None
> > State/Province None|New Zealand
> > State/Province None|None
>
> Right. You'll need to get CTRP to fix their data for those.
I looked at their XML and couldn't tell if they were including the text "None" or they were leaving the tags empty. I searched for the word "None" and came up empty. Can you please confirm exactly what they are sending us?
BZDATETIME::2012-08-31 10:19:56
BZCOMMENTOR::Bob Kline
BZCOMMENT::151
They're sending empty elements.
BZDATETIME::2012-08-31 10:30:17
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::152
(In reply to comment #151)
> They're sending empty elements.
Can you ignore empty elements only when they are addresses? Are the empty elements part of the organization information, person information or both? If they are part of the person information only, I think we can safely ignore them. I think they said there were cases where they did not have the data and we asked them not to send bogus data.
BZDATETIME::2012-08-31 11:03:30
BZCOMMENTOR::Bob Kline
BZCOMMENT::153
(In reply to comment #152)
> (In reply to comment #151)
> > They're sending empty elements.
>
> Can you ignore empty elements only when they are addresses? Are the
> empty elements part of the organization information, person
> information or both? If they are part of the person information
> only, I think we can safely ignore them. I think they said there
> were cases where they did not have the data and we asked them not to
> send bogus data.
I will modify the software to pretend the empty address elements aren't there at all. That doesn't solve all your problems, though, because the country element is required.
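For context, a minimal sketch of that kind of change (not the production ImportCtrpTrials.py code): drop empty child elements from a CTRP address block before merging. PostalAddress and Street appear in the log output later in this issue; the other element names here are illustrative assumptions.

    # Sketch only; element names other than PostalAddress/Street are assumed.
    from lxml import etree

    def strip_empty_address_children(address):
        """Drop child elements of an address block that carry no text."""
        for child in list(address):
            if child.text is None or not child.text.strip():
                address.remove(child)

    doc = etree.fromstring("""\
    <PostalAddress>
      <Street>123 Main St</Street>
      <City></City>
      <State/>
      <Country>New Zealand</Country>
    </PostalAddress>""")
    strip_empty_address_children(doc)
    # Only Street and Country remain; the empty City and State are gone.
    print(etree.tostring(doc, pretty_print=True).decode())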
BZDATETIME::2012-08-31 11:16:08
BZCOMMENTOR::Bob Kline
BZCOMMENT::154
(In reply to comment #153)
> I will modify the software to pretend the empty address elements
> aren't there at all.
Done:
merging sites from CTRP trial NCI-2009-00312 into CDR600217
creating publishable version of CDR600217 from sites in CTRP trial
NCI-2009-00312
No match found in content model for type CTAddress with child elements
of PostalAddress element (Street); stopped at element Street',
'Non-publishable version will be created.
BZDATETIME::2012-08-31 11:50:50
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::155
(In reply to comment #154)
> (In reply to comment #153)
>
> > I will modify the software to pretend the empty address elements
> > aren't there at all.
>
> Done:
>
> merging sites from CTRP trial NCI-2009-00312 into CDR600217
> creating publishable version of CDR600217 from sites in CTRP trial
> NCI-2009-00312
> No match found in content model for type CTAddress with child
> elements of PostalAddress element (Street); stopped at element
> Street', 'Non-publishable version will be created.
Thanks! It looks like this error is being raised at the CTRPOverallOfficial level, which I believe we no longer use, if I remember correctly. I think it will be safe here also not to require the address information (under CTRPOverallOfficial) in the schema, if we really don't use that information for publishing.
(In reply to comment #153)
>
> I will modify the software to pretend the empty address elements
> aren't there at all. That doesn't solve all your problems, though,
> because the country element is required.
Yes. That will be a problem. I can't imagine publishing a site to Cancer.gov without the country. So far, all the trials I have reviewed are either closed or completed. I am actually having difficulty identifying an Active trial for Volker to test publishing. Hopefully, we won't see this kind of data in Active trials.
BZDATETIME::2012-08-31 12:20:48
BZCOMMENTOR::Bob Kline
BZCOMMENT::156
(In reply to comment #155)
> it will be safe here also not to require the address information
I modified the software again, this time to drop an address block if all the elements are empty. Tried to re-import the trial, but you've got it locked.
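A sketch of that follow-on change, for illustration only: drop the whole address block when every one of its child elements is empty. Again, this is not the actual import code, and the element name is an assumption.

    # Sketch only; "PostalAddress" is an assumed element name.
    from lxml import etree

    def drop_all_empty_address_blocks(root, tag="PostalAddress"):
        """Remove each address element whose children are all empty."""
        for address in root.findall(".//%s" % tag):
            if all((child.text is None or not child.text.strip())
                   for child in address):
                address.getparent().remove(address)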
BZDATETIME::2012-08-31 12:25:53
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::157
(In reply to comment #156)
> (In reply to comment #155)
>
> > it will be safe here also not to require the address information
>
> I modified the software again, this time to drop an address block if
> all the elements are empty. Tried to re-import the trial, but you've
> got it locked.
Sorry. It is unlocked now.
BZDATETIME::2012-08-31 12:29:34
BZCOMMENTOR::Bob Kline
BZCOMMENT::158
I re-imported. This time a publishable version was created.
BZDATETIME::2012-08-31 13:55:31
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::159
(In reply to comment #158)
> I re-imported. This time a publishable version was created.
Thanks. I am trying to map a few more trials but the Mapping Gaps page has become very slow. It takes about 25 seconds to add a site, plus an additional 2 minutes 25 seconds for the page to completely reload to the point where I can add another site.
BZDATETIME::2012-08-31 14:42:05
BZCOMMENTOR::Bob Kline
BZCOMMENT::160
(In reply to comment #159)
> Thanks. I am trying to map a few more trials but the Mapping Gaps
> page has become very slow. It takes about 25 seconds to add a site,
> plus an additional 2 minutes 25 seconds for the page to completely
> reload to the point where I can add another site.
I assume that will be a temporary problem which will be alleviated once the initial pile of unmapped values has been taken care of, and we only have to deal with the (by comparison) occasional new Organization or geographic name. Would it be helpful if I temporarily throttled the program so that it only shows you one or two of the trials in the import queue? Right now it has to retrieve and parse every document in the queue.
BZDATETIME::2012-08-31 14:52:25
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::161
(In reply to comment #160)
> (In reply to comment #159)
>
> > Thanks. I am trying to map a few more trials but the Mapping Gaps
> > page has become very slow. It takes about 25 seconds to add a
> > site, plus an additional 2 minutes 25 seconds for the page to
> > completely reload to the point where I can add another site.
>
> I assume that will be a temporary problem which will be alleviated
> once the initial pile of unmapped values has been taken care of, and
> we only have to deal with the (by comparison) occasional new
> Organization or geographic name. Would it be helpful if I
> temporarily throttled the program so that it only shows you one or
> two of the trials in the import queue? Right now it has to retrieve
> and parse every document in the queue.
Yes. For now, if you can show only these two, that will be okay:
NCI-2009-00469 597649 NCT00693992
NCI-2011-02070 582632 NCT00626990
BZDATETIME::2012-08-31 15:59:05
BZCOMMENTOR::Bob Kline
BZCOMMENT::162
(In reply to comment #161)
> Yes. For now, if you can show only these two, that will be okay:
> NCI-2009-00469 597649 NCT00693992
> NCI-2011-02070 582632 NCT00626990
Well, that's not quite what I had in mind by "throttle." I was just going to have it give you the next trial or two in the queue, rather than modifying the software each time we needed to look at different specific trials. So instead I've modified the code so that if you append (for example)
&trial=NCI-2009-00469
to the end of the url for the report, you'll get the mapping gaps just for that trial. Of course, you'll want to take the line "1 Trials Queued For Site Import (1 With Mapping Gaps)" with a grain of salt: there's really more than 1; we've just artificially made the program find only one of them.
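A rough sketch of that kind of change, for illustration: an optional "trial" URL parameter restricts the mapping-gaps report to a single CTRP trial. The queue and reporting helpers below are hypothetical stand-ins for whatever the real report script does.

    # Sketch only; get_import_queue and report_mapping_gaps are stand-ins.
    import cgi

    def get_import_queue():
        """Stand-in for the real lookup of queued CTRP trial documents."""
        return [{"ctrp_id": "NCI-2009-00469"}, {"ctrp_id": "NCI-2011-02070"}]

    def report_mapping_gaps(trial):
        """Stand-in for the real per-trial mapping-gap report."""
        print("mapping gaps for %s ..." % trial["ctrp_id"])

    fields = cgi.FieldStorage()
    requested = fields.getvalue("trial")   # e.g. &trial=NCI-2009-00469
    queue = get_import_queue()
    if requested:
        queue = [t for t in queue if t["ctrp_id"] == requested]
    for trial in queue:
        report_mapping_gaps(trial)
    print("%d Trials Queued For Site Import" % len(queue))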
BZDATETIME::2012-08-31 16:19:46
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::163
(In reply to comment #162)
> (In reply to comment #161)
>
> > Yes. For now, if you can show only these two, that will be okay:
> > NCI-2009-00469 597649 NCT00693992
> > NCI-2011-02070 582632 NCT00626990
>
> Well, that's not quite what I had in mind by "throttle." I was just
> going to have it give you the next trial or two in the queue, rather
> than modifying the software each time we needed to look at different
> specific trials. So instead I've modified the code so that if you
> append (for example)
>
> &trial=NCI-2009-00469
>
It does display one trial until a document is added, and then it reloads
with all 61 trials again. I am working around it by pasting the URL back
into the address bar each time it reloads.
BZDATETIME::2012-08-31 16:24:25
BZCOMMENTOR::Bob Kline
BZCOMMENT::164
OK, I have hard-coded the two IDs into the program.
BZDATETIME::2012-08-31 16:25:59
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::165
(In reply to comment #164)
> OK, I have hard-coded the two IDs into the program.
Yes. I can already see the effect. It is more than 100% faster. Thank you!
BZDATETIME::2012-08-31 16:32:18
BZCOMMENTOR::Bob Kline
BZCOMMENT::166
It looks as if you're using the "Add Doc" link for all of the missing mappings, even for Organizations that I know we've already got in the repository. Can I assume you're just doing this for testing purposes, and that when we're in production CIAT will be using that link only when they've determined that we don't yet have a particular organization in the CDR?
BZDATETIME::2012-08-31 16:34:41
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::167
(In reply to comment #166)
> It looks as if you're using the "Add Doc" link for all of the
> missing mappings, even for Organizations that I know we've already
> got in the repository. Can I assume you're just doing this for
> testing purposes, and that when we're in production CIAT will be
> using that link only when they've determined that we don't yet have
> a particular organization in the CDR?
That is correct. We need at least two Active trials for Volker to test publishing. So this is just for testing purposes.
BZDATETIME::2012-08-31 17:13:14
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::168
Please run another import job.
BZDATETIME::2012-08-31 18:40:49
BZCOMMENTOR::Bob Kline
BZCOMMENT::169
(In reply to comment #168)
> Please run another import job.
Done.
BZDATETIME::2012-09-13 10:59:50
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::170
Please promote this to Bach.
BZDATETIME::2012-09-13 11:52:10
BZCOMMENTOR::Bob Kline
BZCOMMENT::171
(In reply to comment #170)
> Please promote this to Bach.
Promoted:
d:/cdr/Utilities/bin/DownloadCtrpTrials.py
d:/cdr/Utilities/bin/ImportCtrpTrials.py
d:/cdr/Utilities/bin/CTRPDownloads
Turning this over to Volker to set up the Windows scheduler. I recommend kicking off the download job a little before midnight so we don't have it waste time trying to pull down a set that CTRP hasn't had a chance to create yet. Then schedule the import job to follow. Probably best to have them in the same job, so the import kicks off after the download has finished.
BZDATETIME::2012-09-14 11:12:22
BZCOMMENTOR::Volker Englisch
BZCOMMENT::172
(In reply to comment #171)
> Promoted:
>
> d:/cdr/Utilities/bin/DownloadCtrpTrials.py
> d:/cdr/Utilities/bin/ImportCtrpTrials.py
> d:/cdr/Utilities/bin/CTRPDownloads
Is there any specific reason why you are collecting the downloaded content in the bin directory instead of the /cdr/Utilities directory?
BZDATETIME::2012-09-14 11:34:49
BZCOMMENTOR::Bob Kline
BZCOMMENT::173
Just following the pattern of making the path to the documents folder a subdirectory of the script's directory, as we do for all other such imports.
BZDATETIME::2012-09-14 18:49:21
BZCOMMENTOR::Volker Englisch
BZCOMMENT::174
Could these jobs be run on FRANCK for testing without any problems or would it mess up something if the jobs are run multiple times?
BZDATETIME::2012-09-14 18:58:28
BZCOMMENTOR::Volker Englisch
BZCOMMENT::175
I've created a scheduled job on FRANCK that will submit these two
programs one after another. The program is located in the
D:\cdr\publishing
directory and I've named the job
JobmasterCTRP.py
If there are any other maintenance tasks, we can just add them to this
script. The job has been set up to be submitted at 23:30 each night but
has not been enabled yet. I will first need to know if it's OK to just
run it.
BZDATETIME::2012-09-14 20:46:20
BZCOMMENTOR::Bob Kline
BZCOMMENT::176
(In reply to comment #174)
> Could these jobs be run on FRANCK for testing without any problems
> or would it mess up something if the jobs are run multiple times?
They can be run as many times as you want. The download job won't do anything after it has pulled in the most recent set CTRP has on its server. The download job is run with no command-line arguments. The import job is run with the command-line argument --live.
BZDATETIME::2012-09-17 11:31:50
BZCOMMENTOR::Bob Kline
BZCOMMENT::177
(In reply to comment #176)
> (In reply to comment #174)
> > Could these jobs be run on FRANCK for testing without any problems
> > or would it mess up something if the jobs are run multiple times?
>
> They can be run as many times as you want. The download job won't do
> anything after it has pulled in the most recent set CTRP has on its
> server. The download job is run with no command-line arguments. The
> import job is run with the command-line argument --live.
I've been trying to tinker with the JobmasterCTRP.py script. I changed the command-line argument for the import job to --live. I changed the path for the two CTRP scripts from PUBPATH to UBIN. I added a line
os.chdir(UBIN)
... since the way the download script is written there is an assumption that CTRPDownloads is a subdirectory of the current working directory. I added myself to the Operator Publishing Notification group.
I'm now working on removing some of the residual code from the cloned publishing stuff. Will keep you posted.
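To make the sequencing concrete, here is a minimal sketch (not the actual JobmasterCTRP.py) of the flow described above: change into the Utilities/bin directory, run the download job with no arguments, then run the import job with --live. UBIN comes from the comment above; the exact invocation mechanism is an assumption.

    # Sketch only; the real Jobmaster script differs.
    import os
    import subprocess

    UBIN = r"d:\cdr\Utilities\bin"

    def run_ctrp_jobs():
        # The download script assumes CTRPDownloads is a subdirectory of
        # the current working directory, so change into UBIN first.
        os.chdir(UBIN)
        subprocess.check_call(["python",
                               os.path.join(UBIN, "DownloadCtrpTrials.py")])
        subprocess.check_call(["python",
                               os.path.join(UBIN, "ImportCtrpTrials.py"),
                               "--live"])

    if __name__ == "__main__":
        run_ctrp_jobs()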
BZDATETIME::2012-09-17 12:07:08
BZCOMMENTOR::Bob Kline
BZCOMMENT::178
(In reply to comment #177)
> I'm now working on removing some of the residual code from the
> cloned publishing stuff. Will keep you posted.
I think it's working now. I like this scheduler better than the one that comes with SQL Server. For one thing, the job status panel updates itself automatically as the job moves through the process.
One remaining fly in the ointment: the job script assumes that if it sees the string "Failure" in the output, the job failed to run. However, the download job's logic is to start with the current date and work backwards until it finds a set that it hasn't already pulled down. In the process it tries to pull down the set for a given day, then tries to unpack the zip file. For sets which haven't been created, the file that got downloaded won't be a zip file, and we record "Failure opening [name of file] ...." Let's discuss whether it makes more sense to rewrite the download program or the Jobmaster script.
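For illustration, a sketch of that backwards-scan logic under stated assumptions (this is not the real DownloadCtrpTrials.py; the archive naming, the fetch helper, and the logger are hypothetical):

    # Sketch only; fetch_set, log, and the archive name format are assumed.
    import datetime
    import zipfile

    def find_new_set(already_downloaded, fetch_set, log, max_days_back=30):
        day = datetime.date.today()
        for _ in range(max_days_back):
            name = day.strftime("CTRP-TRIALS-%Y-%m-%d.zip")  # assumed naming
            if name in already_downloaded:
                return None                  # nothing newer than what we have
            path = fetch_set(name)           # hypothetical downloader
            if zipfile.is_zipfile(path):
                return path                  # most recent real set found
            log("Failure opening %s" % name) # set not created yet by CTRP
            day -= datetime.timedelta(days=1)
        return None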
I've set the schedule back to 23:30. I think if it runs successfully tonight we can promote it to Bach the next day.
Volker:
Please review the changes I made to the JobmasterCTRP.py script and make
sure I didn't do any serious damage. :-)
BZDATETIME::2012-09-18 12:19:46
BZCOMMENTOR::Volker Englisch
BZCOMMENT::179
The job ran on FRANCK last night and downloaded data.
Bob, do you want to double-check that the data downloaded is what you
expected and that the import job did what it was supposed to do?
I would then move the job to BACH.
As for the 'Failure' test, it is probably better to modify the Jobmaster test for that. I've never been too happy with it but left it due to lack of time to come up with better ideas.
BZDATETIME::2012-09-18 14:14:06
BZCOMMENTOR::Bob Kline
BZCOMMENT::180
(In reply to comment #179)
> The job ran on FRANCK last night and downloaded data.
>
> Bob, do you want to double-check that the data downloaded is what
> you expected and that the import job did what it was supposed to do?
Looks good!
BZDATETIME::2012-09-18 14:30:11
BZCOMMENTOR::Bob Kline
BZCOMMENT::181
Here are the first two sets of imported sites in production:
http://bach.nci.nih.gov/cgi-bin/cdr/show-cdr-doc.py?id=585700
http://bach.nci.nih.gov/cgi-bin/cdr/show-ctrp-doc.py?id=NCI-2009-00363
http://bach.nci.nih.gov/cgi-bin/cdr/show-cdr-doc.py?id=486425
http://bach.nci.nih.gov/cgi-bin/cdr/show-ctrp-doc.py?id=NCI-2009-00407
BZDATETIME::2012-09-18 15:27:36
BZCOMMENTOR::Volker Englisch
BZCOMMENT::182
I've copied the scheduled job to FRANCK and BACH:
JobmasterCTRP.py - R10622
This will run Mon-Fri at 11:30pm.
BZDATETIME::2012-09-19 09:27:59
BZCOMMENTOR::Volker Englisch
BZCOMMENT::183
Bob/William, did either of you receive the email notification for the CTRP download/import job? According to the log files the job ran overnight but I didn't see the email notification.
By the way, who should get notified? The default is to send the message to William and myself.
BZDATETIME::2012-09-19 09:50:40
BZCOMMENTOR::Bob Kline
BZCOMMENT::184
(In reply to comment #183)
> By the way, who should get notified?
I believe the Jobmaster script uses the Operator Publishing Notification group as the list of recipients for the notification.
BZDATETIME::2012-09-19 10:02:42
BZCOMMENTOR::Volker Englisch
BZCOMMENT::185
That is true, but I am thinking about changing it and optionally using a
different DL if necessary. The publishing DL currently notifies the
operator, William, and myself.
BZDATETIME::2012-09-19 10:11:32
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::186
(In reply to comment #183)
> Bob/William, did either of you receive the email notification for
> the CTRP download/import job? According to the log files the job ran
> overnight but I didn't see the email notification.
>
I did not receive any notification last night. However, I did receive what appeared to be test publishing notifications yesterday afternoon.
BZDATETIME::2012-09-19 10:14:16
BZCOMMENTOR::Bob Kline
BZCOMMENT::187
(In reply to comment #185)
> That is true, but I am thinking about changing it and optionally
> using a different DL if necessary. The publishing DL currently
> notifies the operator, William, and myself.
So we have a mystery: why didn't you get notified? Could this be related to the NIH mailer server problem Alan referred to last night? It would be unfortunate (to put it in very polite terms) if NIH didn't just defer delivery of messages when it experienced difficulties, but actually threw the messages away.
BZDATETIME::2012-09-19 11:21:23
BZCOMMENTOR::Alan Meyer
BZCOMMENT::188
(In reply to comment #187)
> ... It would be unfortunate (to put it in very polite terms) if NIH
> didn't just defer delivery of messages when it experienced
> difficulties, but actually threw the messages away.
I still have not received the messages I sent last night. My guess is that the NIH mail server went totally down and never even accepted mail, much less queued it for later delivery.
BZDATETIME::2012-09-19 11:28:59
BZCOMMENTOR::Volker Englisch
BZCOMMENT::189
(In reply to comment #188)
> I still have not received the messages I sent last night. My guess
> is that the NIH mail server went totally down and never even
> accepted mail, much less queued it for later delivery.
My guess is that the client wouldn't have sent the message if the
server didn't accept messages. I wasn't aware that you had email
problems last night. This may indeed be related.
BZDATETIME::2012-09-19 11:41:29
BZCOMMENTOR::Alan Meyer
BZCOMMENT::190
(In reply to comment #189)
> (In reply to comment #188)
> > I still have not received the messages I sent last night. My guess
> > is that the NIH mail server went totally down and never even
> > accepted mail, much less queued it for later delivery.
>
> My guess is that the client wouldn't have sent the message if the
> server didn't accept messages.
That's correct. The messages never got out. I guess I should have said something stronger than "my guess".
> I wasn't aware that you had email problems last night. This may
> indeed be related.
An email from Bugzilla on EBMS printing was rejected. I tried it again from my Comcast account, not remembering for sure whether my desktop and Bugzilla were both using the same NIH mail server. That too failed - demonstrating that it was an NIH problem and not a problem with our mail forwarding software on icicsun as used by Bugzilla. So I sent the email again via the Web from a Yahoo account.
BZDATETIME::2012-09-19 18:09:21
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::191
It looks like I am the only one at CIAT with rights to the 'CTRP protocols'. Everyone else is getting a schema validation error. It looks like it is the same for the External Mapping Table: I am the only one who is able to add values. I looked to see if a group had been created for the 'CTRP protocols' but I didn't see any.
BZDATETIME::2012-09-19 18:20:31
BZCOMMENTOR::Bob Kline
BZCOMMENT::192
We decided we weren't going to create a separate CTRPProtocol document type. These are still CTGovProtocol documents.
BZDATETIME::2012-09-19 18:23:13
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::193
(In reply to comment #192)
> We decided we weren't going to create a separate CTRPProtocol
> document type. These are still CTGovProtocol documents.
Yes. I am aware of that. But what I was saying is that for CTGov protocols that have CTRP data, I am the only one who is able to access them without getting a schema validation error and I am the only one here who is able to add values to the mapping table. Everyone else gets an error.
BZDATETIME::2012-09-19 18:28:36
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::194
(In reply to comment #193)
> (In reply to comment #192)
> > We decided we weren't going to create a separate CTRPProtocol
> > document type. These are still CTGovProtocol documents.
>
> Yes. I am aware of that. But what I was saying is that for CTGov
> protocols that have CTRP data, I am the only one who is able to
> access them without getting a schema validation error and I am the
> only one here who is able to add values to the mapping table.
> Everyone else gets an error.
They get the following error message when they open a CTGovProtocol with CTRP data: "document does not conform to DTD or XML schema". They get the following error when they try to update the mapping table: "User XXXXX not authorized to edit CTRP_PO_ID mappings". However, I am able to view CTGov documents with CTRP data on Bach without any problems and I am able to update the mapping table on Bach without any problems.
BZDATETIME::2012-09-19 19:32:51
BZCOMMENTOR::Bob Kline
BZCOMMENT::195
Are they in a group which has the EDIT CTRP MAP authorization?
BZDATETIME::2012-09-20 10:07:21
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::196
(In reply to comment #195)
> Are they in a group which has the EDIT CTRP MAP authorization?
Which group has the EDIT CTRP MAP authorization? I looked at the groups and the only obvious one is the "CTRPProtocol Import Job" which is used by the import job.
BZDATETIME::2012-09-20 10:23:50
BZCOMMENTOR::Bob Kline
BZCOMMENT::197
Probably that group and the Developer group. You probably want to grant that permission to one of the existing groups, or create a new group if that's more appropriate, and put the right people in that group.
BZDATETIME::2012-09-20 10:49:50
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::198
(In reply to comment #197)
> Probably that group and the Developer group. You probably want to
> grant that permission to one of the existing groups, or create a new
> group if that's more appropriate, and put the right people in that
> group.
That worked, and the "User XXXXX not authorized to edit CTRP_PO_ID
mappings" error is now fixed. How do I fix the other schema validation
error?
BZDATETIME::2012-09-20 12:15:59
BZCOMMENTOR::Bob Kline
BZCOMMENT::199
Have you confirmed that they're still getting the validation errors now that they have permission to update the external mappings table?
BZDATETIME::2012-09-21 10:02:29
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::200
It looks like the CTGov Import program is overwriting the CTRPInfo
block whenever there is an update. Examples:
486425
585700
BZDATETIME::2012-09-21 10:14:56
BZCOMMENTOR::Volker Englisch
BZCOMMENT::201
Since the scheduling is done (and working) now I'll throw this back to Bob.
BZDATETIME::2012-09-21 10:57:54
BZCOMMENTOR::Bob Kline
BZCOMMENT::202
I modified the CT.gov import software to preserve the CTRPInfo block when we get a new version of the document from NLM. I reimported the CTRP sites, set the disposition of the two trials (the IDs you gave weren't just examples; they're the only two we've imported from CTRP so far) to "Import requested" and ran the CT.gov import job again. I then confirmed that the CTRPInfo block is still present in both documents. Please review the documents and confirm that they look correct to you.
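For illustration, a simplified sketch of that idea (not the actual CT.gov import code): before saving the freshly imported CTGovProtocol document, carry the CTRPInfo block over from the current CDR version so the new NLM document does not wipe it out. Everything here other than the CTRPInfo element name is an assumption about document structure.

    # Sketch only; insertion position and parsing details are assumed.
    from lxml import etree

    def preserve_ctrp_info(current_doc_xml, new_doc_xml):
        """Copy the CTRPInfo block from the current document into the new one."""
        current = etree.fromstring(current_doc_xml)
        new = etree.fromstring(new_doc_xml)
        ctrp_info = current.find("CTRPInfo")
        if ctrp_info is not None and new.find("CTRPInfo") is None:
            # Real code would insert at the schema-required position.
            new.append(ctrp_info)
        return etree.tostring(new)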
BZDATETIME::2012-10-05 09:59:00
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::203
We have finally received one new trial that was successfully updated with CTRP sites. As we discussed in yesterday's CDR meeting I am closing this issue. Thank you, Bob!
File Name | Posted | User |
---|---|---|
8-30-12 CTRP List.xlsx | 2012-08-30 10:05:06 | Osei-Poku, William (NIH/NCI) [C] |
CTRP to cancer gov daily update - Requirements lg.doc | 2010-10-21 13:43:20 | |
CTRP Trials 8-1-12.xlsx | 2012-08-07 12:50:25 | Osei-Poku, William (NIH/NCI) [C] |
mappings.xls | 2011-11-15 17:08:21 | Osei-Poku, William (NIH/NCI) [C] |
one-time-ctrp-po-mappings.xls | 2012-02-13 11:04:34 | |
pretend_mapping_one.xls | 2011-12-06 12:08:39 | Osei-Poku, William (NIH/NCI) [C] |
pretend_mapping_three.xls | 2011-12-27 11:59:59 | Osei-Poku, William (NIH/NCI) [C] |
pretend_mapping_two.xls | 2011-12-14 16:02:14 | Osei-Poku, William (NIH/NCI) [C] |
pretend_mapping.xls | 2011-11-02 13:17:25 | Osei-Poku, William (NIH/NCI) [C] |
rss-trial-ids-2011-12-21 | 2011-12-22 09:21:49 | |
trial-ids-2011-12-20 | 2011-12-21 12:18:42 |