Issue Number | 3140 |
---|---|
Summary | [CTgov Transfer] Table for PDQ Admin Info |
Created | 2010-05-07 11:38:14 |
Issue Type | Improvement |
Submitted By | Osei-Poku, William (NIH/NCI) [C] |
Assigned To | alan |
Status | Closed |
Resolved | 2010-07-01 15:22:40 |
Resolution | Won't Fix |
Path | /home/bkline/backups/jira/ocecdr/issue.107468 |
BZISSUE::4823
BZDATETIME::2010-05-07 11:38:14
BZCREATOR::William Osei-Poku
BZASSIGNEE::Alan Meyer
BZQACONTACT::William Osei-Poku
There are over 800 trials that were transferred by creating a new CTGov Protocol document without converting the InScope Protocol. We did a global that copied all the ID information and other elements into the PDQAdminInfo block of the CTGov document. Bob suggested creating a static table that contains all the elements of the PDQAdminInfo block of the CTGov Protocol document (for transferred trials). I think Bob mentioned that the benefit is that, it will speed up a lot of the reports.
BZDATETIME::2010-05-07 11:52:24
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::1
Added Bob to this issue.
BZDATETIME::2010-05-07 12:29:19
BZCOMMENTOR::Bob Kline
BZCOMMENT::2
(In reply to comment #0)
> Bob suggested creating a static table that contains all the
> elements of the PDQAdminInfo block of the CTGov Protocol document
(for
> transferred trials).
That's not exactly what I suggested (though I'm not saying that might not also be useful). What I was suggesting was a table which identified (by document IDs) which documents were transferred by creating a new CTGovProtocol document (with a separate CDR ID) instead of creating a new CTGovProtocol version using the existing CDR document whose earlier versions are InScopeProtocols (as we do now). What I had in mind was a table with two columns, perhaps along these lines:
CREATE TABLE old_trial_transfers
(old_id INTEGER NOT NULL PRIMARY KEY REFERENCES all_docs,
new_id INTEGER NOT NULL UNIQUE REFERENCES all_docs)
BZDATETIME::2010-05-10 17:48:09
BZCOMMENTOR::Alan Meyer
BZCOMMENT::3
I'll create the table that Bob indicated that he needed.
Bob, if you decide you want any of the other stuff in it
that William suggested, let me know. It will probably be
safer to get that data from the query_term table in case
anything changes - although changes may be pretty unlikely.
BZDATETIME::2010-05-10 19:01:54
BZCOMMENTOR::Alan Meyer
BZCOMMENT::4
I've written a script to run Bob's table definition and to
populate it. It ran on Mahler, creating 980 rows.
I ran into a little problem with one of the InScopeProtocols.
It
turns out that there were two InScopeProtocols that pointed to
the same CTGovProtocol - violating the uniqueness constraint for
the mapping. One document had been blocked as a duplicate, but I
couldn't use the fact that it was blocked to reject it because
all of these InScopeProtocols are now blocked. Since there was
only one, rather than try something based on dates or other
criteria, it seemed safer to inspect the two documents by eye and
determine which one was the bad one. I put the doc ID for that
one (CDR0000484669) into an exception list that is checked in the
SQL. I think that should work fine when we run on Bach.
Attachment old_trial_transfers.sql has been added with description: SQL to create and populate the table.
BZDATETIME::2010-05-10 19:02:17
BZCOMMENTOR::Alan Meyer
BZCOMMENT::5
I'm marking the task as resolved-fixed, and will run the script
on Bach and Franck when requested.
BZDATETIME::2010-05-24 10:58:22
BZCOMMENTOR::Alan Meyer
BZCOMMENT::6
It occurs to me that William has no straightforward way to test the table that was produced. To simplify that I'm attaching two dumps of the table, one sorted by new CDR ID, and one sorted by the old ID. The data is from Mahler, which should match what's on Bach since we stopped using the ID change mechanism before the last Mahler refresh.
Some of the trials are suspicious to me. I appended a list of 46 trials where the old_id has a higher numbered CDR ID than the new_id, seemingly indicating that a newly entered InScopeProtocol was replaced by an older CTGovProtocol. The ones I checked all shared a common NCT ID and so, in theory, are the same trial, but the count of 46 is higher than I expected. One I looked at: old_id=589172, new_id=554900, looked particularly suspicious. The trials didn't look the same to me. Could this be a data error?
In any case, when QC'ing the output, those 46 trials might be fruitful places to sample for possible errors.
Attachment old_trial_transfer.txt has been added with description: Table output for testing
BZDATETIME::2010-05-24 11:24:24
BZCOMMENTOR::Alan Meyer
BZCOMMENT::7
I hadn't included the 46 trials referenced in the previous comment in the saved attachment. Here it is again with the additional list.
Attachment old_trial_transfer.txt has been added with description: Table output for testing
BZDATETIME::2010-05-24 14:22:37
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::8
(In reply to comment #4)
> Created attachment 1917 [details]
> SQL to create and populate the table.
>
>
> I've written a script to run Bob's table definition and to
> populate it. It ran on Mahler, creating 980 rows.
>
> I ran into a little problem with one of the InScopeProtocols.
It
> turns out that there were two InScopeProtocols that pointed
to
> the same CTGovProtocol - violating the uniqueness constraint
for
> the mapping. One document had been blocked as a duplicate, but
I
> couldn't use the fact that it was blocked to reject it
because
> all of these InScopeProtocols are now blocked. Since there
was
> only one, rather than try something based on dates or other
> criteria, it seemed safer to inspect the two documents by eye
and
> determine which one was the bad one. I put the doc ID for
that
> one (CDR0000484669) into an exception list that is checked in
the
> SQL. I think that should work fine when we run on Bach.
We received and entered two related trials (CDR0000484655 and CDR0000484669). They were actually two studies under the same IRB. However, when the responsible party registered the trial on clinicaltrials.gov, he registered it as a single trial. When this was discovered, we 'merged' our records into one (CDR0000484655). So CDR0000484655 is the good one and CDR0000484669 is the bad one.
BZDATETIME::2010-05-25 09:49:08
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::9
(In reply to comment #6)
> Created attachment 1929 [details]
> Table output for testing
>
> It occurs to me that William has no straightforward way to test the
table that
> was produced. To simplify that I'm attaching two dumps of the
table, one
> sorted by new CDR ID, and one sorted by the old ID. The data is
from Mahler,
> which should match what's on Bach since we stopped using the ID
change
> mechanism before the last Mahler refresh.
>
> Some of the trials are suspicious to me. I appended a list of 46
trials where
> the old_id has a higher numbered CDR ID than the new_id, seemingly
indicating
> that a newly entered InScopeProtocol was replaced by an older
CTGovProtocol.
Actually, these trials are OKay. They were trials that we initially imported the ctgov versions (from NLM) and later received direct submissions from cancer centers. What we did then was to block the ctgov versions and create a new InScope protocols to replace the ctgov versions. When it came time for us to transfer these trials to the responsible parties, we activated the ctgov trials and blocked the Inscope trials.
> The ones I checked all shared a common NCT ID and so, in theory,
are the same
> trial, but the count of 46 is higher than I expected. One I looked
at:
> old_id=589172, new_id=554900, looked particularly suspicious. The
trials
> didn't look the same to me. Could this be a data error?
>
They are actually the same trial. I suspect the reason they look
different is because the protocols were written by two different
institutions.
> In any case, when QC'ing the output, those 46 trials might be
fruitful places
> to sample for possible errors.
I have checked a lot of these trials and they are all Okay. For most of these trials, the InScope trials have a CTGovDuplicate element that references the CTGov protocol. OCECDR-3141 is addressing some of these duplicate issues.
BZDATETIME::2010-05-25 11:26:41
BZCOMMENTOR::Alan Meyer
BZCOMMENT::10
Thanks for checking the outputs.
I can go ahead and build the new table on Bach and Franck
as soon as someone says to do it.
There's really no risk in doing this. It doesn't disturb
any existing data, just adds a new, static table.
BZDATETIME::2010-05-27 16:33:07
BZCOMMENTOR::Alan Meyer
BZCOMMENT::11
(In reply to comment #10)
> Thanks for checking the outputs.
>
> I can go ahead and build the new table on Bach and Franck
> as soon as someone says to do it.
>
> There's really no risk in doing this. It doesn't disturb
> any existing data, just adds a new, static table.
I'll simplify things and assert here that I will go ahead and
build the new table on Bach and Franck unless someone says
NOT to do it.
Unless I hear otherwise, I'll do it next Tuesday.
BZDATETIME::2010-06-10 22:46:57
BZCOMMENTOR::Alan Meyer
BZCOMMENT::12
I ran the script stored in the attachment to comment #4 on both
Bach and Franck.
To my surprise, 10 more documents were found on Bach than on
Franck. I had thought we refreshed Franck since the last time we
transferred any trials using the old technique, but maybe I'm
wrong.
I looked at a number of them and they looked legitimate, but in
case anyone wants to investigate further, here is the list of
trials found on Bach but not Franck.
old_id new_id
---—
68983 674585
316263 674586
354500 674656
355413 674587
442403 674588
570249 674362
599724 675378
617786 674658
638557 674589
650556 674368
10 row(s)
BZDATETIME::2010-06-11 09:52:11
BZCOMMENTOR::Bob Kline
BZCOMMENT::13
(In reply to comment #12)
> To my surprise, 10 more documents were found on Bach than on Franck.
We should determine the reason for the discrepancy. If the contents of the new table are not stable, as we thought they would be, then we may not be able to rely on using it. I will see if I can figure out what happened; if anyone else beats me to it, please post a note here.
BZDATETIME::2010-06-11 10:30:18
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::14
We have the ability to manually copy the admin info from the InScope documents to the CTGov document. We do this for some of the older trials that already have CTgov versions and will continue to do this. I will review all the trials and confirm this.
BZDATETIME::2010-06-17 11:37:04
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::15
(In reply to comment #14)
> We have the ability to manually copy the admin info from the
InScope documents
> to the CTGov document. We do this for some of the older trials that
already
> have CTgov versions and will continue to do this. I will review all
the trials
> and confirm this.
I can confirm that we recently added the PDQAdminInfo block manually to the trials identified by Alan in comment #12. They were trials that did not convert normally and users had to transfer the admin info manually form the InScope documents.
BZDATETIME::2010-06-28 12:03:19
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::16
We agreed in the cdr meeting of (6/17/10) that it is preferable to programmatically convert inscope documents into ctgov documents in all cases where there are problem transfers and for which CIAT would have to manually transfer the trial by copying admin information from the inscope document to the ctgov document. I think this will be OK in most cases where there are no duplicate CTGOV documents for a particular inscope trial. When there is a duplicate ctgov document of the trial that needs to be transferred, the conversion may not work as expected, at least not until we have implemented additional software changes.
In cases where there is also a ctgov document of the inscope document that needs to be transferred, CIAT would have initially imported the ctgov document ( ) which was registered in ctgov by a cancer center. We also would have received the same trial from the cancer center, processed it and asked ctgov to replace the cancer center's trial with the one being registered by PDQ, to avoid duplicate registration. We should have also blocked the CTGov trial and published the inscope trial. NCI now owns the trial in ctgov and would have to transfer the trial to the responsible party. So we do have the same trial as an inscope trial and also a ctgov trial, which is blocked from publication. This type of delicate will typically be captured by the ctgov duplicate report.
Example:
CDR0000495296 - inscope (currently on cancer.gov and
clinicaltrials.gov)
CDR0000465509 - ctgov (blocked from publication)
While it may be programmatically possible to convert the ctgov document on top of the inscope trial, in essence it will be a suppression of the inscope trial in favor of the ctgov trial since they have different CDR IDS and we are able to achieve this by blocking the inscope trial and copying the admin information into the ctgov document and re-activating the ctgov trial.
BZDATETIME::2010-06-29 11:23:22
BZCOMMENTOR::Alan Meyer
BZCOMMENT::17
It seems to me that, at this point, we have several options for
how to proceed:
1. Give up this lookup table.
We've done without it in the past. I don't know how much of
a time saver it will actually be.
2. Re-run the global change SQL script from time to time.
The script was not written to be run multiple times. I'd
want to make some revisions and repackage it if we wanted to
do that.
3. Update the table by hand.
If CIAT notified us each time a CTGovProtocol were
re-activated to replace an InScopeProtocol, we could simply
add it by hand to the table using SQL.
4. Update the table by hand, with a user interface.
We could provide a very simple user interface to allow
William or whoever he wishes to do it to update the table by
entering the CTGov and InScope CDR IDs of the two documents
into a web form.
5. Try to complete all of the updates at once, drop the
existing
table, and re-create it.
If CIAT can determine that there are exactly N protocols for
which this needs to be done (whatever N is), and if it make
sense for them all to be done now instead of waiting and
doing them individually from time to time, we could get them
all done, then re-create the lookup table.
If the assumptions in option 5 are valid, it would seem to be
the
best option. If we have to convert all of these trials anyway,
and if we know what they are, and if we can do them now instead
of waiting, then we can get it all over with without doing any
extra work (because we'll have to do the work eventually anyway,
we'll just be doing it now instead of later).
However I'm not sure the assumptions are valid, and I'm not
sure
that we have enough information to even know whether they're
valid or not.
If it turns out that option 5 is not practical, then I am
attracted to option 4 - but it depends on someone remembering to
use it when needed - which is also a problem with options 2 and
3. We can automate option 2, but I'm not enthusiastic about
adding yet another program that runs every night and sweeps the
data looking for one specific condition that will hardly ever
occur.
If we do choose option 2, and do want to automate it, I'd be
tempted to produce a data integrity sweep supervisor program that
we plug modules into to do each of the things we want done, and
to log results and send notifications in one centralized way.
The script to update this table could be one of the modules.
BZDATETIME::2010-07-01 14:16:03
BZCOMMENTOR::Bob Kline
BZCOMMENT::18
We have tentatively decided that we will go with option 1 (dispense with creating the table at all), in light of the fact that my assumption that the table would be stable was invalid. Alan will drop the table.
BZDATETIME::2010-07-01 15:22:40
BZCOMMENTOR::Alan Meyer
BZCOMMENT::19
(In reply to comment #18)
> We have tentatively decided that we will go with option 1 (dispense
with
> creating the table at all), in light of the fact that my assumption
that the
> table would be stable was invalid. Alan will drop the table.
I have dropped the tables on all three servers and am marking the issue resolved-wontfix.
File Name | Posted | User |
---|---|---|
old_trial_transfer.txt | 2010-05-24 11:24:24 | |
old_trial_transfer.txt | 2010-05-24 10:58:22 | |
old_trial_transfers.sql | 2010-05-10 19:01:54 |
Elapsed: 0:00:00.002110