Issue Number | 3141 |
---|---|
Summary | [CTgov Transfer] Identify Duplicate Trials |
Created | 2010-05-07 11:42:14 |
Issue Type | Improvement |
Submitted By | Osei-Poku, William (NIH/NCI) [C] |
Assigned To | Kline, Bob (NIH/NCI) [C] |
Status | Closed |
Resolved | 2010-08-09 16:20:33 |
Resolution | Fixed |
Path | /home/bkline/backups/jira/ocecdr/issue.107469 |
BZISSUE::4824
BZDATETIME::2010-05-07 11:42:14
BZCREATOR::William Osei-Poku
BZASSIGNEE::Bob Kline
BZQACONTACT::William Osei-Poku
Trials that have been marked for transfer but have corresponding CTGov duplicates in the CDR pose a significant challenge to CIAT staff because most of these trials do not convert as expected, and it takes a significant amount of time to research them, determine what is wrong, and process them. Some of these trials can be found on the CTGov Duplicate report but others cannot be found on the report because they have not been marked as such.
In order to identify the trials that fall into the category of trials that have corresponding CTGov duplicates, we need to identify the following set of trials. These trials will usually have some or all of the following characteristics:
i. They have NCT IDs.
ii. They have not been blocked from publication (I am not sure why we suggested this because some of the trials will indeed be blocked. That is, in cases where we kept the CTGov Protocol and blocked the InScope Protocol. I guess we want to be able to identify these trials also).
iii. They have been published (Also, I am not sure if this should really be required. I think it is OK to look at the trial regardless of the publication status. It may well happen that users identified the trial as a duplicate just before the trial was published.)
iv. They have a string that in most cases reads ‘Duplicate of CDRXXXXX’ or ‘Dupe of CDRXXXX’ or ‘Dup of CDRXXXX’ in the comments field of the ProtocolIDs block of the InScope protocol document.
To identify these records, look for the existence of the Dupe/Duplicate/Dup string in the comments field of the ProtocolIDs block, take the NCT ID in the InScope Protocol, if present, and search to see if there is an existing CTGov document with the same NCT ID.
Additional information that may be helpful:
i. For newer trials, there will be a CTGovDuplicate element with a link to the CTGov document. For older trials, the above element will not be present.
ii. The CDR IDs of the CTGov Protocol document may be in the comments string of the ProtocolIDs block.
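A minimal sketch of the identification rule described above, for a single InScopeProtocol document. The element names (ProtocolIDs, Comment, OtherID, IDType, IDString) come from this issue; the sample document, its primary ID, and the function name are made up for illustration, and the regular expression is only one guess at how the "Dup/Dupe/Duplicate of CDR..." wording might be matched.

```python
import re
import xml.etree.ElementTree as ET

# Matches "Duplicate of CDR12345", "Dupe of CDR12345", "Dup of CDR12345"
# (case-insensitive); captures the CDR ID without leading zeros.
DUP_PATTERN = re.compile(r"\bdup(?:e|licate)?\s+of\s+CDR0*(\d+)", re.IGNORECASE)


def find_dup_candidates(xml_text):
    """Return (nct_id, cdr_ids_mentioned_in_dup_comments) for one document."""
    root = ET.fromstring(xml_text)
    ids_block = root.find("ProtocolIDs")
    if ids_block is None:
        return None, []
    mentioned = []
    # Look at every Comment anywhere inside the ProtocolIDs block.
    for comment in ids_block.iter("Comment"):
        mentioned += [int(m) for m in DUP_PATTERN.findall(comment.text or "")]
    nct_id = None
    for other in ids_block.findall("OtherID"):
        if other.findtext("IDType") == "ClinicalTrials.gov ID":
            value = (other.findtext("IDString") or "").strip()
            if value.startswith("NCT"):
                nct_id = value
    return nct_id, mentioned


# Hypothetical sample document (IDs are placeholders, not real trials).
SAMPLE = """<InScopeProtocol>
  <ProtocolIDs>
    <PrimaryID><IDString>ABC-123</IDString><Comment>Dupe of CDR12345</Comment></PrimaryID>
    <OtherID><IDType>ClinicalTrials.gov ID</IDType><IDString>NCT00000000</IDString></OtherID>
  </ProtocolIDs>
</InScopeProtocol>"""

print(find_dup_candidates(SAMPLE))
```

The NCT ID returned here would then be looked up among the CTGovProtocol documents; free-form comments that do not follow the "... of CDRnnnn" wording would be missed by this pattern.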
BZDATETIME::2010-05-07 12:07:53
BZCOMMENTOR::Kim Eckley
BZCOMMENT::1
Adding offline email thread on this topic that Bob already started. Initial results included.
Hi Bob-
The search should be in all the comment fields in the various ID fields
(not through all the comment in the protocol document), not just the
primary ID field. The other criteria sound correct though.
OK. http://bach.nci.nih.gov/DupeComments2.html
From: Bob Kline bkline@rksystems.com
Sent: Friday, May 07, 2010 10:37 AM
To: Lakshmi Grama; Margaret Beckwith; Volker Englisch; Ning Yu; Judy
Morris; Alan Meyer; Eckley, Kimberly A
Subject: Protocols with 'dup' comments
I have collected all of the comments with 'dup' and posted them at http://bach.nci.nih.gov/DupeComments.html for review. Whoever has the best handle on what should be done with this information should create an issue in Bugzilla capturing that understanding. Just to make sure we're all on the same page, what I remember from yesterday was that we're only looking at comments whose XPath is /InScopeProtocol/ProtocolIDs/PrimaryID/Comment in documents which are published (i.e., actually on Cancer.gov) and which have an OtherID block with type 'ClinicalTrials.gov ID' and ID string beginning 'NCT' - right?
BZDATETIME::2010-05-11 11:29:17
BZCOMMENTOR::Bob Kline
BZCOMMENT::2
(In reply to comment #0)
I'm hoping for clarification of the "I'm not sure ..." passages in the original request. Also, I'd like to verify that the following description:
> To identify these records, look for the existence of the Dupe/Duplicate/Dup
> string in the comments field of the ProtocolIDs block, take the NCT ID in the
> InScope Protocol, if present, and search to see if there is an existing CTGov
> document with the same NCT ID.
... means: for all documents containing the string "dup" in a Comment descendant element of the ProtocolIDs block, find the OtherID/IDString in that document with a sibling OtherID/IDType having the value 'ClinicalTrials.gov ID' and look for that value in the query_term table with a path of '/CTGovProtocol/IDInfo/NCTID'; is that right?
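The lookup described above could be sketched as a self-join on query_term. This is only an illustration run against an in-memory SQLite table: the column names (doc_id, path, value, node_loc), the use of a shared node_loc prefix to pair sibling IDType/IDString nodes, and all of the sample rows are assumptions, not the actual CDR schema or data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE query_term (doc_id INTEGER, path TEXT, value TEXT, node_loc TEXT)"
)
rows = [
    # Hypothetical InScopeProtocol 1001: dup comment plus a CT.gov OtherID.
    (1001, "/InScopeProtocol/ProtocolIDs/PrimaryID/Comment", "Dup of CDR2002", "000100010002"),
    (1001, "/InScopeProtocol/ProtocolIDs/OtherID/IDType", "ClinicalTrials.gov ID", "000100020001"),
    (1001, "/InScopeProtocol/ProtocolIDs/OtherID/IDString", "NCT00000001", "000100020002"),
    # Hypothetical CTGovProtocol 2002 carrying the same NCT ID.
    (2002, "/CTGovProtocol/IDInfo/NCTID", "NCT00000001", "00010001"),
    # Hypothetical InScopeProtocol 1003: dup comment, but no CTGov match.
    (1003, "/InScopeProtocol/ProtocolIDs/OtherID/Comment", "duplicate of CDR9", "000100020003"),
    (1003, "/InScopeProtocol/ProtocolIDs/OtherID/IDType", "ClinicalTrials.gov ID", "000100020001"),
    (1003, "/InScopeProtocol/ProtocolIDs/OtherID/IDString", "NCT00000009", "000100020002"),
]
conn.executemany("INSERT INTO query_term VALUES (?, ?, ?, ?)", rows)

# c = dup comment, t = IDType, s = sibling IDString, g = matching CTGov doc.
QUERY = """
SELECT DISTINCT c.doc_id, s.value, g.doc_id
  FROM query_term c
  JOIN query_term t ON t.doc_id = c.doc_id
   AND t.path = '/InScopeProtocol/ProtocolIDs/OtherID/IDType'
   AND t.value = 'ClinicalTrials.gov ID'
  JOIN query_term s ON s.doc_id = t.doc_id
   AND s.path = '/InScopeProtocol/ProtocolIDs/OtherID/IDString'
   AND substr(s.node_loc, 1, 8) = substr(t.node_loc, 1, 8)
  LEFT JOIN query_term g ON g.path = '/CTGovProtocol/IDInfo/NCTID'
   AND g.value = s.value
 WHERE c.path LIKE '/InScopeProtocol/ProtocolIDs/%Comment'
   AND lower(c.value) LIKE '%dup%'
 ORDER BY c.doc_id
"""
matches = conn.execute(QUERY).fetchall()
print(matches)
```

The LEFT JOIN keeps InScope documents whose NCT ID has no CTGovProtocol counterpart, so those rows show up with an empty third column instead of disappearing from the result.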
> i. For newer trials, there will be a CTGovDuplicate element with a link to the
> CTGov document. For older trials, the above element will not be present.
>
> ii. The CDR IDs of the CTGov Protocol document may be in the comments string of
> the ProtocolIDS block.
What should I do with this information (if anything)?
BZDATETIME::2010-05-14 11:56:12
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::3
I have attached the notes from yesterday's discussion. Please review and let me know if I missed anything or if there is anything I did not present well.
Attachment Duplicate Table.docx has been added with description: Notes from 5132010 meeting
BZDATETIME::2010-05-14 16:26:13
BZCOMMENTOR::Lakshmi Grama
BZCOMMENT::4
William, thanks for the notes. I will post a version with my edits.
Next steps seem to be:
1. Change the CTGOVDuplicate element to have one of two attributes (cdr:ref) or a string for an NCTID. Make sure that the attribute values can be seen in the CSS for the InscopeProtocol document. (Bob, Volker)
2. For the 111 rows in the CTGOV Import table that have a disposition of "Duplicate" with a CDRID for a CTGOV Protocol document, globally add the CDRID in the cdr:ref attribute for the CTGOVDuplicate element. (LM CIAT will have to follow the procedures we have outlined where they check for the presence of the element and notify Bob and William that the row be deleted from the CTGOVImport table so that the CTGOV import program that deals with transferred trials can do its trick.) William or ZTECH CIAT will double check to make sure that the CTGOV trial document is blocked.
3. Add NCTID to the duplicate identification report. Bob will check if the NCTID in the report are matched in the 592 Inscope Protocol rows identified as dups in CTGOV Import table. These trials should have CTGOVDup element added with the NCTID in the string. Prior to this step, ZTECH CIAT should do some manual checks that this is the appropriate thing to do. Once this is done, LM CIAT will follow the same procedure as we have outlined before to inform Bob and William whenever there is a transferred trial with a CTGOV Dup element so that Bob can delete the row from the table.
4. For the remaining rows in the duplicate report, CIAT will do a review; hopefully these will not require any additional work, since the duplicate comment may not be relevant to the CTGOV record.
William, Kim, Bob, and others please review and let me know if this makes sense.
BZDATETIME::2010-05-17 12:56:49
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::5
(In reply to comment #4)
> 3. Add NCTID to the duplicate identification report. Bob will check if the
> NCTID in the report are matched in the 592 Inscope Protocol rows identified as
> dups in CTGOV Import table. These trials should have CTGOVDup element added
> with the NCTID in the string. Prior to this step, ZTECH CIAT should do some
> manual checks that this is the appropriate thing to do. Once this is done, LM
> CIAT will follow the same procedure as we have outlined before to inform Bob
> and William whenever there is a transferred trial with a CTGOV Dup element so
> that Bob can delete the row from the table.
>
Some of the trials on the duplicate identification report do not need to be transferred and they can be identified by the CTGovOwnershipTransferContactResponse value of 'Transfer not required' in the CTGovOwnershipTransferContactLog block. It will be good to have another column in the duplicate identification report which reports any value (not just the 'Transfer not required' value) in the CTGovOwnershipTransferContactResponse element, when the element is present. Bob does not need to do anything with this information but it will be helpful to users.
BZDATETIME::2010-05-17 17:57:34
BZCOMMENTOR::Lakshmi Grama
BZCOMMENT::6
That is a good idea, William. Kim, please review ASAP
BZDATETIME::2010-05-18 08:22:35
BZCOMMENTOR::Bob Kline
BZCOMMENT::7
(In reply to comment #4)
> 1. Change the CTGOVDuplicate element to have one of two attributes
> (cdr:ref) or a string for an NCTID.
There was discussion of capturing both when we have both (that is, "... and/or a string for an NCT ID"); did we decide against this, or would we want to have both values in the document when appropriate?
BZDATETIME::2010-05-18 08:45:37
BZCOMMENTOR::Kim Eckley
BZCOMMENT::8
(In reply to comment #4)
> William, thanks for the notes. I will post a version with my edits.
Lakshmi- Did you post a version? I want to make sure I'm reviewing the current files.
BZDATETIME::2010-05-18 09:31:40
BZCOMMENTOR::Kim Eckley
BZCOMMENT::9
(In reply to comment #4)
> William, thanks for the notes. I will post a version with my edits.
> Next steps seem to be:
> 2. For the 111 rows in the CTGOV Import table that have a disposition of
> "Duplicate" with a CDRID for a CTGOV Protocol document, globally add the CDRID
> in the cdr:ref attribute for the CTGOVDuplicate element.
What document are you referring to here? I'm a little confused on this one. Sounds like the CTGovProtocol document, but I didn't think the duplicate element was in that document.
BZDATETIME::2010-05-19 11:32:04
BZCOMMENTOR::Lakshmi Grama
BZCOMMENT::10
(In reply to comment #9)
> (In reply to comment #4)
> > William, thanks for the notes. I will post a version with my edits.
> > Next steps seem to be:
> > 2. For the 111 rows in the CTGOV Import table that have a disposition of
> > "Duplicate" with a CDRID for a CTGOV Protocol document, globally add the CDRID
> > in the cdr:ref attribute for the CTGOVDuplicate element.
>
> What document are you referring to here? I'm a little confused on this one.
> Sounds like the CTGovProtocol document, but I didn't think the duplicate
> element was in that document.
The CTGOVDuplicate element in the InscopeProtocol document
BZDATETIME::2010-05-19 11:33:35
BZCOMMENTOR::Lakshmi Grama
BZCOMMENT::11
(In reply to comment #7)
> (In reply to comment #4)
>
> > 1. Change the CTGOVDuplicate element to have one of two attributes
> > (cdr:ref) or a string for an NCTID.
>
> There was discussion of capturing both when we have both (that is, "... and/or
> a string for an NCT ID"); did we decide against this, or would we want to have
> both values in the document when appropriate?
Ok with me if it works easier for CIAT and for programs to have both
BZDATETIME::2010-05-21 12:50:29
BZCOMMENTOR::Bob Kline
BZCOMMENT::12
Waiting for feedback from Kim.
BZDATETIME::2010-05-21 13:24:38
BZCOMMENTOR::Kim Eckley
BZCOMMENT::13
(In reply to comment #12)
> Waiting for feedback from Kim.
The next steps in comment #4 seem appropriate.
Regarding comments #7 and #11: seeing the ref and string when appropriate is helpful.
BZDATETIME::2010-05-21 15:19:14
BZCOMMENTOR::Bob Kline
BZCOMMENT::14
(In reply to comment #10)
> > > 2. For the 111 rows in the CTGOV Import table that have a disposition of
> > > "Duplicate" with a CDRID for a CTGOV Protocol document, globally add the
> > > CDRID in the cdr:ref attribute for the CTGOVDuplicate element.
> >
> > What document are you referring to here? I'm a little confused on this one.
> > Sounds like the CTGovProtocol document, but I didn't think the duplicate
> > element was in that document.
>
> The CTGOVDuplicate element in the InscopeProtocol document
I'm afraid I'm no less confused than Kim on this one. I tried to understand the instructions for this by manually following the trail of an actual case. One of the 111 rows in the ctgov_import table with disposition of duplicate and cdr_id matching a CTGovProtocol document is NCT00001804, which has 66965 in the cdr_id column for the row, and CDR66965 is a CTGovProtocol document, so this case should match what you're describing here. So now my next step is presumably to find an InScopeProtocol document for which I should put something in the CTGovDuplicate element. How would I go about finding such a document? The query
SELECT *
FROM query_term
WHERE value in ('NCT00001804', 'NCT00019708')
comes up with only the following row:
66965 /CTGovProtocol/IDInfo/NCTID NCT00019708 19708 00010005
... in other words, the CTGovProtocol document we found in the ctgov_import table, marked as the one of which NCT00001804 is a duplicate.
BZDATETIME::2010-05-21 17:41:40
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::15
(In reply to comment #14)
It looks like the reason you are getting this result is because 66965 is already a transferred trial and *generally should not be in the ctgov_import table as a duplicate.
The NCT ID for the InScopeProtocol version of this trial (66965) was changed (from NCT00001804 to NCT00019708) before it was converted to a ctgov document with the same cdr id 66965. From the ctgov duplicate report, NCT00001804 is marked as a duplicate of the cdr id 66965. NCT00019708 is in the Doc Title column of the ctgov import table. So this record has had two NCT IDs assigned to it in the past and the one that is currently marked as a duplicate is NCT00001804 and not NCT00019708. I believe this is why the conversion worked without any problems.
Typically, if a trial is in the ctgov_import table as duplicate, it will not convert automatically. We will have to remove it from the table before it converts. Prior to version 36, there was no NCT ID. In version 36 NCT00001804 was initially added as the NCT ID “per request #1601”. In Version 53, the NCT ID was changed to NCT00019708, by “Inserting NCTID from CTGovProtocol download Job”, which I assume was the time when the NCT ID was changed on clinicaltrials.gov.
This trial falls into another category of transfer trials we have not discussed extensively. These are trials that were duplicates in clinicaltrials.gov and NLM ‘merged’ the records and marked our record's NCT ID as the duplicate (or alias) NCT ID.
Question:
On the CTGov Duplicate report, NCT00001804 is under the “Trials marked
as duplicates prior to CTGOV imports” label. Could this be the reason
why the transfer worked? I believe when we mark a trial as a duplicate
it is placed under “Trials marked as duplicates by CIAT” label.
BZDATETIME::2010-05-28 11:08:34
BZCOMMENTOR::Bob Kline
BZCOMMENT::16
(In reply to comment #15)
> It looks like the reason you are getting this result is because 66965 is
> already a transferred trial and *generally should not be in the ctgov_import
> table as a duplicate.
OK, let's back up. My goal is to understand exactly what I'm supposed to do for the instructions:
2. For the 111 rows in the CTGOV Import table that have a
disposition of "Duplicate" with a CDRID for a CTGOV
Protocol document, globally add the CDRID in the cdr:ref
attribute for the CTGOVDuplicate element.
I attempted to accomplish that goal by trying to walk through the process mentally for one of the trials. I picked the first trial I found in the set, and I gather that what you're telling me in comment #15 is that I had the misfortune of picking a trial that by chance didn't match the conditions appropriate for the instructions. So maybe the best thing to do next is for you to identify a trial to which the instructions would be applied from the set of 111 trials. Then I'll look at that trial and see if I can figure out what I'm supposed to do.
Thanks!
BZDATETIME::2010-05-28 11:15:05
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::17
(In reply to comment #16)
> (In reply to comment #15)
>
> > It looks like the reason you are getting this result is because 66965 is
> > already a transferred trial and *generally should not be in the ctgov_import
> > table as a duplicate.
>
> OK, let's back up. My goal is to understand exactly what I'm supposed to do
> for the instructions:
>
> 2. For the 111 rows in the CTGOV Import table that have a
> disposition of "Duplicate" with a CDRID for a CTGOV
> Protocol document, globally add the CDRID in the cdr:ref
> attribute for the CTGOVDuplicate element.
>
> I attempted to accomplish that goal by trying to walk through the process
> mentally for one of the trials. I picked the first trial I found in the set,
> and I gather that what you're telling me in comment #15 is that I had the
> misfortune of picking a trial that by chance didn't match the conditions
> appropriate for the instructions. So maybe the best thing to do next is for
> you to identify a trial to which the instructions would be applied from the set
> of 111 trials. Then I'll look at that trial and see if I can figure out what
> I'm supposed to do.
>
> Thanks!
Can you post a link to the 111 trials you identified?
BZDATETIME::2010-05-28 11:36:56
BZCOMMENTOR::Bob Kline
BZCOMMENT::18
(In reply to comment #17)
> Can you post a link to the 111 trials you identified?
Sure.
BZDATETIME::2010-05-28 13:43:44
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::19
(In reply to comment #17)
> > So maybe the best thing to do next is for
> > you to identify a trial to which the instructions would be applied from the set
> > of 111 trials. Then I'll look at that trial and see if I can figure out what
> > I'm supposed to do.
> >
I found only one example on the list to which the instruction may apply:
NCT00109031 430886.
The instructions may apply to this one because 430886 has a corresponding InScope trial 437057 with the same NCT ID. However, I manually searched by the NCT IDs on the list for a lot of the trials on the list of 111 and did not find ones with corresponding InScope trials.
BZDATETIME::2010-05-28 14:42:37
BZCOMMENTOR::Bob Kline
BZCOMMENT::20
(In reply to comment #19)
> I found only one example on the list that the instruction may apply: ....
In that case, would a global change job really be appropriate?
BZDATETIME::2010-06-01 09:10:35
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::21
(In reply to comment #20)
> (In reply to comment #19)
>
> > I found only one example on the list that the instruction may apply: ....
>
> In that case, would a global change job really be appropriate?
That would not be appropriate since there are so few with matching Inscope protocols. We may need to go with the plan in the notes (under 'Actions needed')
i. Generate a report which includes NCT IDs
ii. CIAT-ZTECH Reviews the report and takes out rows that are not relevant
iii. Bob takes the NCT IDS and looks for them in CTGov import table
iv. Some will have CTGov protocols docs – i.e. CTGov CDR IDs.
v. Add CTGovDuplicate element to these
vi. Some will NOT have CTGOV protocol records
vii. Add CTGovDuplicate element
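Steps iii through vii amount to splitting the reviewed NCT IDs into those with a CTGov CDR ID in the import table and those without. A rough sketch, assuming the relevant part of the import table has already been loaded into a plain dictionary; the function name, the dictionary shape, and the sample IDs are all hypothetical.

```python
def partition_by_import_table(report_nct_ids, import_rows):
    """Split report NCT IDs by whether the import table has a CTGov CDR ID.

    import_rows maps NCT ID -> CDR ID of the CTGovProtocol document, or None
    when the row exists but no document was imported (assumed shape).
    """
    with_doc, without_doc = {}, []
    for nct_id in report_nct_ids:
        cdr_id = import_rows.get(nct_id)
        if cdr_id is not None:
            with_doc[nct_id] = cdr_id      # steps iv and v
        else:
            without_doc.append(nct_id)     # steps vi and vii
    return with_doc, without_doc


# Placeholder values, not real trials.
import_rows = {"NCT00000001": 2002, "NCT00000002": None}
report = ["NCT00000001", "NCT00000002", "NCT00000003"]
with_doc, without_doc = partition_by_import_table(report, import_rows)
print(with_doc, without_doc)
```

IDs missing from the table and IDs present without a CDR ID land in the same bucket here; whether those two situations should be treated alike is a question for CIAT.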
BZDATETIME::2010-06-08 14:20:39
BZCOMMENTOR::Bob Kline
BZCOMMENT::22
(In reply to comment #2)
> I'm hoping for clarification of the "I'm not sure ..." passages in the
> original request.
I never did get any response to that request, so I interpreted William's questions as requests that the original report be modified along the lines of his speculation, and since no one objected, I went ahead and made those modifications (ignoring publication or blocking statuses), handling the first step in William's comment #21:
> i. Generate a report which includes NCT IDs
BZDATETIME::2010-06-08 14:31:11
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::23
(In reply to comment #22)
> (In reply to comment #2)
>
> > I'm hoping for clarification of the "I'm not sure ..." passages in the
> > original request.
>
> I never did get any response to that request, so I interpreted William's
> questions as requests that the original report be modified along the lines of
> his speculation, and since no one objected, I went ahead and made those
> modifications (ignoring publication or blocking statuses), handling the first
> step in William's comment #21:
>
Actually, we discussed this in the CDR meeting and Lakshmi agreed that
we should ignore publication and blocking statuses. Sorry, I should have
updated the issue.
BZDATETIME::2010-06-18 14:01:46
BZCOMMENTOR::Bob Kline
BZCOMMENT::24
William indicated that CIAT are in the process of reviewing the content of the most recent report. Lakshmi urged an expedited schedule for this review, in light of the fact that applying the result will resolve many of the loose ends that CIAT is chasing, as well as a very confusing set of scenarios.
The attached image captures the whiteboard notes summarizing all of the different possible cases involved. Perhaps William can post a comment providing a specific example trial for each case, including all CDR and NCT IDs associated with the trial, and chronology for the relevant events for the trial (with rationale for why actions were taken where it wouldn't be obvious). I think such examples will go a long way toward lifting the fog which has obscured these trials for so many of us.
Attachment P1020224.JPG has been added with description: Lakshmi's whiteboard notes
BZDATETIME::2010-06-30 10:26:50
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::25
I am attaching the report. I have removed all the rows that do not appear to be relevant.
Attachment Duplicate comments.xlsx has been added with description: Duplicate comments report
BZDATETIME::2010-06-30 11:10:25
BZCOMMENTOR::Bob Kline
BZCOMMENT::26
How's progress on the set of example trials for each of the cases identified in Lakshmi's whiteboard notes (see comment #24)?
BZDATETIME::2010-06-30 15:48:04
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::27
(In reply to comment #26)
1. InScope/CTGov duplicate with one blocked and the other active
i. When the InScope protocol has a corresponding CTGov protocol in the CDR that is blocked from publication.
InScope CDR 550133 - NCT00380029
CTGov CDR 511871 - NCT00380029
We first received the ctgov protocol, processed and published it. We later received the InScope trial, processed and published it. The CTGov protocol was blocked almost at the same time that the InScope trial was published. That was how we dealt with duplicate ctgov trials then. In this case, the NCT ID for the two trials is the same, however, in some cases, the NCT IDs may be different, with one marked as an alias by NLM.
ii. When the CTGov protocol has a corresponding InScope protocol in the CDR that is blocked from publication.
Inscope CDR 549851 - NCT00448201
CTGov CDR 542709 - NCT00448201
Fairly recently, the rules were reversed: instead of keeping the InScope version of the protocol as the active trial, we now keep the ctgov trial active and block the InScope protocol. Generally, trials in this category should not pose a problem because they do not need to be transferred, but the issue is that NCI owns some of these trials in CTGov. As a result, they will need to be marked for transfer.
In the above two examples, the records may or may not be in the CTGov duplicate report.
2. When a CTGov protocol had previously been marked as a duplicate of an existing InScope protocol (with the CTGov protocol not imported).
InScope CDR 69344, NCT00036959. The CTGov version (NCT00032266) was marked as a duplicate in 2004 and not imported. When CTGov is searched with NCT00032266, it redirects to NCT00036959. NCT00032266 is an obsolete identifier.
A trial in this category will likely be in the ctgov duplicate report.
3. Transfer without PDQ Notification
CDR0000355155 - NCT00079313
When this happens, during the Friday full load publication, we receive a notification from the PRS system that the trial is a duplicate. What we do is to mark the trial for transfer and block it from ctgov and from publication. If we are lucky and the responsible party has not changed the CDR ID as the Org_Study_ID, the conversion takes place normally, otherwise, the automatic conversion does not work (like the above example). It looks like in such cases, Lakshmi has asked NLM to inform PDQ to initiate transfer process.
Please, let me know if I am missing anything.
BZDATETIME::2010-07-08 08:20:10
BZCOMMENTOR::Bob Kline
BZCOMMENT::28
I researched the timelines for the four cases given by William:
> 1. InScope/CTGov duplicate with one blocked and the other active
>
> i. When the InScope protocol has a corresponding CTGov protocol in the CDR
> that is blocked from publication.
>
> InScope CDR 550133 - NCT00380029
> CTGov CDR 511871 - NCT00380029
UNC-LCCC-0521
=============
2006-09-22: received from UNC by NLM as NCT00380029
2006-09-27: CTGovProtocol CDR511871 created
2006-09-28: publishable version created
2006-10-06: CDR511871 published
2007-05-04: protocol received by CIAT
2007-05-12: CDR550133 created (with NCT00380029 as CT.gov OtherID)
2007-05-17: site info imported from Oncore
2007-05-23: CDR550133 info sent to NLM (before publication in PDQ!!)
2007-06-11: publishable version created
2007-06-15: CDR550133 published; CDR511871 blocked/pulled from cancer.gov
> ii. When the CTGov protocol has a corresponding InScope protocol in the CDR
> that is blocked from publication.
>
> Inscope CDR 549851 - NCT00448201
> CTGov CDR 542709 - NCT00448201
UNC-LCCC-0306
=============
2007-04-04: NCT00448201 imported as CTGovProtocol CDR542709
2007-04-12: CDR542709 published
2007-05-10: InScopeProtocol CDR549851 created
2007-06-07: CDR549851 published
2007-06-08: CDR542709 blocked
2010-04-16: CDR549851 blocked; CDR542709 re-published
> 2. When a CTGov protocol had previously been marked as a duplicate of an
> existing InScope protocol (with the CTGov protocol not imported).
NCI-02-C-0141L
==============
2002-03-22: PDQ protocol 16594 created (in Oracle-based system)
2002-06-22: imported as InScopeProtocol CDR63944
2004-02-07: NCT00032266 received from NLM; marked as duplicate of CDR63944
2005-06-23: CDR63944 exported to NLM; assigned ID NCT00036959
2007-06-19: 'NCT00036959' added to CDR63944 as CT.gov OtherID
> 3.Transfer without PDQ Notification
>
> CDR0000355155 - NCT00079313
NHLBI-04-H-0090
===============
2004-01-30: InScopeProtocol CDR355155 created
2004-02-20: CDR355155 published
2005-06-23: CDR355155 exported to NLM, assigned ID NCT00079313
2010-06-21: owner of NCT00079313 changed in CT.gov to NIHCC
2010-06-25: NCT00079313 received from NLM, imported as CTGovProtocol CDR680977
2010-06-28: CDR355155 blocked; CDR680977 published
BZDATETIME::2010-07-08 13:49:31
BZCOMMENTOR::Bob Kline
BZCOMMENT::29
Still doing my best to figure out what I'm supposed to do here.
(In reply to comment #21)
> ....
> i. Generate a report which includes NCT IDs
> ii. CIAT-ZTECH Reviews the report and takes out rows that are not relevant
> iii. Bob takes the NCT IDS and looks for them in CTGov import table
> iv. Some will have CTGov protocols docs – i.e. CTGov CDR IDs.
> v. Add CTGovDuplicate element to these
> vi. Some will NOT have CTGOV protocol records
> vii. Add CTGovDuplicate element
Well, I started to implement the global change which would perform that insertion of the CTGovDuplicate elements, but it wasn't clear to me what that would accomplish. So I figured the best way to clear away the fog would be to just manually walk through the rows of the spreadsheet until I was satisfied that I understood what the global change would do. After only two rows, I'm beginning to think there might not be any way to do anything useful with such a global change. The two trials in those rows are DFCI-99052 (Phase III Randomized Study of Neoadjuvant Cisplatin and Fluorouracil With or Without Docetaxel Followed By Chemoradiotherapy in Patients With Locally Advanced Squamous Cell Carcinoma of the Head and Neck) and DMS-9939 (Phase I Study of Neoadjuvant Docetaxel and Carboplatin Followed by Capecitabine and Docetaxel With Concurrent External Beam Radiotherapy in Patients With Stage II or III Carcinoma of the Esophagus or Gastroesophageal Junction). Here's the chronology for the trials:
DFCI-99052
==========
1999-10-29: PDQ protocol 14602 created (in Oracle-based system)
2002-06-22: imported as InScopeProtocol CDR67407
2004-11-05: 'NCT00004214' added to CDR67407 (V15) as CT.gov OtherID
2005-06-23: NLM creates first [sic] version of NCT00004214 ❗
2006-01-06: NLM gets EFC6043 - RP-56976-V-324 from Sanofi-Aventis; NCT00273546 created
2008-07-04: NCT00273546 imported from NLM as CTGovProtocol CDR600247
2008-07-07: CDR67407, version 24: CT.gov OtherID changed to 'NCT00273546'
2008-07-08: CDR67407 pulled from cancer.gov; CDR600247 published
ctgov_import table has 'NCT00273546' -> CDR600247 (disposition 'imported')
DMS-9939
========
2001-02-16: PDQ protocol 15772 created (in Oracle-based system)
2002-06-22: imported as InScopeProtocol CDR68542
2004-11-05: 'NCT00014417' added to CDR68542 (V7) as CT.gov OtherID
2005-06-23: NLM creates first [sic] version of NCT00014417 ❗
2005-09-09: NLM receives D-9939 from Dartmouth-Hitchcock; creates NCT00153881
2010-05-19: NCT00153881 received from NLM; imported as CDR674027 & published
2010-05-25: CDR68542 blocked
ctgov_import table has 'NCT00153881' -> CDR674027 (disposition 'imported')
So for the first trial the CT.gov OtherID has 'NCT00273546' which the global change would be able to find in the ctgov_import table, where it would be able to pick up the CDR ID 600247. In that case it might be reasonable to add a CTGovDuplicate element with 'NCTID' attribute of 'NCT00273546' and 'cdr:ref' attribute of 'CDR0000600247'. But for the second trial the CT.gov OtherID has 'NCT00014417', for which the global change won't find any rows in the ctgov_import table. So the "best" the program could do would be to add a CTGovDuplicate element with 'NCTID' attribute of 'NCT00014417'. The first question that comes to mind here is: If we just copy the OtherID value into the NCTID attribute of the CTGovDuplicate element, aren't we just duplicating information without adding any useful value? The next question is: What does "Duplicate" mean in this context, if we stuff NCT IDs in the attribute even when the ID refers to a document created by NLM as a copy of an InScopeProtocol document we exported?
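The two outcomes described in the paragraph above can be illustrated with a short sketch. The NCT IDs and CDR IDs are the ones from the chronologies in this comment; the function name, the dictionary standing in for the ctgov_import table, and the namespace URI bound to the cdr: prefix are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the cdr: prefix; adjust to the real schema.
CDR_NS = "cips.nci.nih.gov/cdr"
ET.register_namespace("cdr", CDR_NS)


def make_dup_element(nct_id, import_rows):
    """Build a CTGovDuplicate element for the NCT ID found in the OtherID."""
    elem = ET.Element("CTGovDuplicate", {"NCTID": nct_id})
    cdr_id = import_rows.get(nct_id)
    if cdr_id is not None:
        # Only add cdr:ref when the import table knows a CTGovProtocol doc.
        elem.set("{%s}ref" % CDR_NS, "CDR%010d" % cdr_id)
    return elem


# Stand-in for the two ctgov_import rows quoted above.
import_rows = {"NCT00273546": 600247, "NCT00153881": 674027}

first = make_dup_element("NCT00273546", import_rows)   # DFCI-99052
second = make_dup_element("NCT00014417", import_rows)  # DMS-9939
print(ET.tostring(first, encoding="unicode"))
print(ET.tostring(second, encoding="unicode"))
```

For the second trial the element carries nothing beyond the NCT ID already present in the OtherID block, which is exactly the duplication concern raised above.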
A human might be able to determine what to do on a case-by-case basis, relying in part on the contents of the comments in the spreadsheet, but there is too much variation in the wording of the comments to expect a program to be able to interpret every case correctly. I think we need to discuss next steps a little more in depth at this afternoon's meeting.
BZDATETIME::2010-07-12 15:12:40
BZCOMMENTOR::Bob Kline
BZCOMMENT::30
What are the next steps for this issue?
BZDATETIME::2010-07-16 11:53:48
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::31
(In reply to comment #30)
> What are the next steps for this issue?
I am not sure if my recollection of our previous discussion is accurate, but the next step should have been for you to take the NCT IDs and look for them in the CTGov import table and add the CTGovDuplicate element to the affected trials. That is, for the trials you find in the ctgov_import table. However, you raised some questions about finding trials (that were referenced in the comments) that are not really duplicates, and I believe we agreed that even in those cases, it is important for the user to know the trial mentioned in the comments.
BZDATETIME::2010-07-27 07:40:08
BZCOMMENTOR::Bob Kline
BZCOMMENT::32
(In reply to comment #31)
> (In reply to comment #30)
> > What are the next steps for this issue?
>
> I am not sure if my recollection of our previous discussion is accurate but the
> next step should have been for you to take the NCT IDs and look for them in the
> CTGov import table and add the CTGovDuplicate element to the affected trials.
> That is, for the trials you find in the ctgov_import table. However, you raised
> some questions about finding trials (that were referenced in the comments) that
> are not really duplicates and I believe we agreed that even in those cases, it
> is important for the user to know the trial the trial mentioned in the
> comments.
I thought we all understood that it is not feasible to expect software to recognize the mention of a trial in the comments, given the variety of wording in those comments.
BZDATETIME::2010-07-27 16:10:12
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::33
(In reply to comment #32)
> I thought we all understood that it is not feasible to expect software to
> recognize the mention of a trial in the comments, given the variety of wording
> in those comments.
That is correct. I was thinking that since all the likely candidates have been identified already, it may be possible to make the comments for those candidate trials stand out, in a different font for example. So for all the trials for which you are unable to find a match in the ctgov_import table, you leave it up to the user to research further, but you aid the user by making the comments stand out.
On the other hand, the most important thing is to find matching trials in the ctgov_import table. Depending on how many you are able to find, it may not be necessary to pursue the trials with ambiguous comments.
BZDATETIME::2010-07-28 13:46:24
BZCOMMENTOR::Bob Kline
BZCOMMENT::34
(In reply to comment #33)
> (In reply to comment #32)
> > I thought we all understood that it is not feasible to expect software to
> > recognize the mention of a trial in the comments, given the variety of
> > wording in those comments.
>
> That is correct. I was thinking that since all the likely candidates have been
> identified already, it may be possible to make the comments for those candidate
> trials stand out, in a different font for example. So for all the trials that
> you are unable to find any matching in the ctgov_import table, you leave it up
> to the user to research further but you aid the user by making the comments
> stand out.
I thought we had this discussion, and agreed that Volker is unable to have different CSS instructions based on the text content of comments. I propose that I run a global change to insert an empty CTGovDuplicate element into the documents listed in your pruned copy of the spreadsheet, and we have Volker display some prominent text when that element is present. Will that be acceptable (I think it's what we decided to do at the last meeting anyway)?
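A rough sketch of what such a global change does to a single document, under stated assumptions: the function name add_duplicate_marker is hypothetical, the sample document is a stripped-down stand-in, and appending the element as the last child of the root is a guess, since the real schema determines where CTGovDuplicate is allowed.

```python
import xml.etree.ElementTree as ET


def add_duplicate_marker(doc_xml):
    """Return doc_xml with an empty CTGovDuplicate element, added at most once."""
    root = ET.fromstring(doc_xml)
    # Skip documents that already carry the marker so reruns are harmless.
    if root.find("CTGovDuplicate") is None:
        # Placement is an assumption; the real schema dictates the position.
        root.append(ET.Element("CTGovDuplicate"))
    return ET.tostring(root, encoding="unicode")


# Stripped-down stand-in for an InScopeProtocol document.
sample = "<InScopeProtocol><ProtocolIDs/></InScopeProtocol>"
once = add_duplicate_marker(sample)
twice = add_duplicate_marker(once)
print(once)
```

Checking for an existing element first means the change can be run again on the same set of documents without stacking up markers, and the display layer only needs to test for the element's presence.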
BZDATETIME::2010-07-28 19:22:22
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::35
(In reply to comment #34)
> I thought we had this discussion, and agreed that Volker is unable to have
> different CSS instructions based on the text content of comments. I propose
> that I run a global change to insert an empty CTGovDuplicate element to the
> documents reflected by your pruned copy of the spreadsheet, and we have Volker
> display some prominent text when that element is present. Will that be
> acceptable (I think it's what we decided to do at the last meeting anyway)?
Yes. This is OK with me. Let's proceed with that.
BZDATETIME::2010-08-02 16:34:56
BZCOMMENTOR::Bob Kline
BZCOMMENT::36
I have added the element to the documents on Mahler. I guess the next step would be to have Volker implement something in the CSS which will make these documents stand out. Want to give it a shot, Volker?
BZDATETIME::2010-08-03 10:27:35
BZCOMMENTOR::Bob Kline
BZCOMMENT::37
Volker's out, so I modified the CSS myself on Mahler. Please take a look.
BZDATETIME::2010-08-03 11:21:26
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::38
(In reply to comment #37)
> Volker's out, so I modified the CSS myself on Mahler. Please take a look.
Would you be able to add more text? The text could be "Review Before Transfer" and the font for the new text can be smaller than the existing one.
BZDATETIME::2010-08-03 15:35:00
BZCOMMENTOR::Bob Kline
BZCOMMENT::39
CSS modified; please review.
BZDATETIME::2010-08-04 10:55:49
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::40
Verified on Mahler. Please promote to Bach.
BZDATETIME::2010-08-04 11:42:52
BZCOMMENTOR::Bob Kline
BZCOMMENT::41
The global change has been run on Bach and the modified CSS has been promoted. Please verify (and close if OK).
BZDATETIME::2010-08-09 16:20:33
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::42
(In reply to comment #41)
> The global change has been run on Bach and the modified CSS has been promoted.
> Please verify (and close if OK).
Verified on Bach. Issue closed. Thank you!
File Name | Posted | User |
---|---|---|
Duplicate comments.xlsx | 2010-06-30 10:26:50 | Osei-Poku, William (NIH/NCI) [C] |
Duplicate Table.docx | 2010-05-14 11:56:12 | Osei-Poku, William (NIH/NCI) [C] |
P1020224.JPG | 2010-06-18 14:01:46 |