CDR Tickets

Issue Number 1806
Summary Develop a mechanism to create Clinical Trial Search strings programmatically
Created 2006-01-20 15:09:09
Issue Type Improvement
Submitted By Grama, Lakshmi (NIH/NCI) [E]
Assigned To Kline, Bob (NIH/NCI) [C]
Status Closed
Resolved 2006-02-16 14:02:03
Resolution Fixed
Path /home/bkline/backups/jira/ocecdr/issue.106134
Description

BZISSUE::1957
BZDATETIME::2006-01-20 15:09:09
BZCREATOR::Lakshmi Grama
BZASSIGNEE::Bob Kline
BZQACONTACT::Sheri Khanna

We need a general purpose mechanism to create clinical trial search strings programmatically in various documents, for example, Cancer information summaries, terminology documents.

Cancer.gov has provided a mechanism whereby we can create canned searches for clinical trials. I am attaching the functional requirements document that they provided for this.

Here are examples of two use cases where we want to implement this search string:

1. For term records that have SemanticType = Drug/Agent and have a Definition element with a Review status of Reviewed, when the document is published, we want to programmatically populate an ExternalRef data element (for the short term) that has the text content "List of active clinical trials using this agent" with an xref attribute of http://www.cancer.gov/Search/ClinicalTrialsLink.aspx?id=1234&idtype=1, where id will be equal to the CDRID of the Drug term record, when the drug is being used on active trials that have a publishable version. We also want another text string, "List of closed clinical trials using this agent", with an xref attribute of http://www.cancer.gov/Search/ClinicalTrialsLink.aspx?id=1234&idtype=1&Closed=1. We plan to subsequently add a new element to the common schema, "ClinicalTrialSearchString", which we will use in both Term and Summary documents where we will create this link.

2. For Cancer information summaries, we want to be able to programmatically populate a similar clinical trials search string. This time, however, the search parameter will be a diagnosis, and the id will be the CDRID of the diagnosis in the section-level metadata. The vendor filter task will specify which sections need to have this link. The idea is that if the summary is a treatment summary, then in specified sections of the document we will need to create a search string that contains the appropriate parameters for the search. All searches will be for active trials. The tricky issue here is that the diagnosis term in the metadata may not be the term that is used on the protocol, so in some cases we may need to use the terminology hierarchy to determine whether to put the link in or not.

Once the general purpose mechanism is developed, I will add vendor filter specific tasks.

Comment entered 2006-01-20 15:10:35 by Grama, Lakshmi (NIH/NCI) [E]

BZDATETIME::2006-01-20 15:10:35
BZCOMMENTOR::Lakshmi Grama
BZCOMMENT::1

I used dummy IDs in the previous comment. Here are some real search strings:

Active trials
http://www.cancer.gov/Search/ClinicalTrialsLink.aspx?id=43649&idtype=1

Closed trials
http://www.cancer.gov/Search/ClinicalTrialsLink.aspx?id=43649&idtype=1&closed=1
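For illustration only, here is a minimal Python sketch of how search strings of this shape could be assembled. The base URL and the id, idtype, and closed parameters come from the examples in this issue; the function name trial_search_url and its defaults are hypothetical. The description writes the flag as Closed=1 while the real example uses closed=1; the sketch follows the real example, and whether the Cancer.gov endpoint treats the name case-sensitively is not stated here.

```python
from urllib.parse import urlencode

# Base URL taken from the examples in this issue.
BASE_URL = "http://www.cancer.gov/Search/ClinicalTrialsLink.aspx"


def trial_search_url(cdr_id, idtype=1, closed=False):
    """Build a canned clinical trial search URL (hypothetical helper).

    cdr_id -- CDR ID of the term document, as an integer
    idtype -- value for the idtype parameter (1 in the examples above)
    closed -- when True, append closed=1 to search closed trials
    """
    params = [("id", cdr_id), ("idtype", idtype)]
    if closed:
        params.append(("closed", 1))
    return f"{BASE_URL}?{urlencode(params)}"


active_url = trial_search_url(43649)
closed_url = trial_search_url(43649, closed=True)
```

With the ID from the examples above, active_url and closed_url reproduce the two search strings listed in this comment.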

Comment entered 2006-01-20 15:12:34 by Grama, Lakshmi (NIH/NCI) [E]

BZDATETIME::2006-01-20 15:12:34
BZCOMMENTOR::Lakshmi Grama
BZCOMMENT::2

Cancer.gov functional specs document

Comment entered 2006-01-20 15:12:34 by Grama, Lakshmi (NIH/NCI) [E]

Attachment ClinicalTrialsLinkingFRD_v1 1.doc has been added with description: Specs for Clinical Trial search URIs

Comment entered 2006-01-31 14:58:03 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2006-01-31 14:58:03
BZCOMMENTOR::Bob Kline
BZCOMMENT::3

Does the software also need to conditionally add the closed trial search string for drug terms only when at least one closed trial with a publishable version exists (the description above specifies this condition for the active trial search but not for the closed trial search)? Is a similar restriction required for the search strings inserted into summaries? For the summaries, which way will the hierarchy need to be traversed? (That is, will the term in the summary be narrower than that applied to the trial or broader?)

Comment entered 2006-02-02 14:39:42 by Grama, Lakshmi (NIH/NCI) [E]

BZDATETIME::2006-02-02 14:39:42
BZCOMMENTOR::Lakshmi Grama
BZCOMMENT::4

I made this a P2 - Bob, could you discuss approach with Volker so that he can work on this as soon as possible.

Comment entered 2006-02-02 15:24:59 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2006-02-02 15:24:59
BZCOMMENTOR::Bob Kline
BZCOMMENT::5

(In reply to comment #4)
> I made this a P2 - Bob, could you discuss approach with Volker so that he can
> work on this as soon as possible.
>

Should I reassign it to him?

I talked with him about this earlier today. There seems to be some confusion about whether the CDs should have HTML or PDF or something else, and we weren't sure about what the URL was for.

Comment entered 2006-02-02 15:50:11 by Englisch, Volker (NIH/NCI) [C]

BZDATETIME::2006-02-02 15:50:11
BZCOMMENTOR::Volker Englisch
BZCOMMENT::6

Bob, I guess you added the last comment to the wrong issue, right? Do you have an issue for this? I thought it's not even in Bugzilla at this point.

Comment entered 2006-02-02 16:09:42 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2006-02-02 16:09:42
BZCOMMENTOR::Bob Kline
BZCOMMENT::7

Oops, sorry, I was responding to Lakshmi's comment on this issue, but wasn't paying attention to which issue it was.

Lakshmi:

Do you have answers to the questions in comment #3?

Comment entered 2006-02-03 11:36:35 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2006-02-03 11:36:35
BZCOMMENTOR::Bob Kline
BZCOMMENT::8

Here is a proposed approach to supporting the functionality requested in this issue, as well as an identification of some of the possible problems which might be encountered.

First, a restatement of the requirements, couched in slightly more general terms than used for the original request. As I understand it, we need to add Cancer.gov search URLs (along with some explanatory text) to some of the exported CDR documents when certain conditions are met. These URLs are used to identify clinical trials which are marked with a concept represented by a specific Term document (or with a narrower concept for the term) in the CDR. The URLs should only be added to the exported documents when the Cancer.gov search will find at least one clinical trial meeting the search criteria. In some cases the search is to find active published trials, and in other cases the search is for closed published trials.

XSL/T filters can insert the links, but cannot by themselves make the determination as to whether there are any trials which link to a given term (or a child of that term). We will add a function to the CDR Filter Module which will take the following parameters and return a boolean result indicating whether a trial search would result in a non-empty result set:

  • CDR ID for Term document

  • flag indicating whether the search should find active or closed trials

  • flag indicating whether to include narrower terms

The logic for determining the boolean result to be returned can be summarized as follows:

1. Assemble a set of CDR IDs for the specified term and its children
2. Find trials linking to any of the IDs in this set
3. Determine whether any of these trials have the specified status
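The three steps above can be sketched in Python as follows. This is only an illustration over in-memory dictionaries, not the CDR Filter Module's code: the function names, the children mapping (parent term ID to narrower term IDs), the trial_terms mapping (trial ID to linked term IDs), the trial_closed mapping, and all sample IDs are assumptions made up for the example.

```python
def term_and_descendants(term_id, children):
    """Step 1: collect the term ID plus the IDs of all narrower terms."""
    seen = set()
    stack = [term_id]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guard against cycles or repeated children
        seen.add(current)
        stack.extend(children.get(current, ()))
    return seen


def search_has_results(term_id, want_closed, include_narrower,
                       children, trial_terms, trial_closed):
    """Return True if at least one trial would match the search."""
    if include_narrower:
        term_ids = term_and_descendants(term_id, children)
    else:
        term_ids = {term_id}
    for trial_id, linked_terms in trial_terms.items():
        # Step 2: does the trial link to any term in the set?
        # Step 3: does the trial have the requested status?
        if linked_terms & term_ids and trial_closed[trial_id] == want_closed:
            return True
    return False


# Hypothetical sample data: term 100 has children 101 and 102,
# and term 101 has child 103.
children = {100: [101, 102], 101: [103]}
trial_terms = {"T1": {103}, "T2": {200}}
trial_closed = {"T1": False, "T2": True}

with_narrower = search_has_results(100, False, True,
                                   children, trial_terms, trial_closed)
without_narrower = search_has_results(100, False, False,
                                      children, trial_terms, trial_closed)
```

In this sample, the active trial T1 links only to term 103, so a search on term 100 finds it only when narrower terms are included.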

There are two basic approaches the CDR server's Filter Module can use to implement this logic. One is to use the query term table and the other is to retrieve the most recent published version of each term and protocol document and parse the documents to extract the term hierarchy and linking information directly. The disadvantage of the first approach is that the results may not match what is published when non-publishable changes have been made to any of the documents involved. The disadvantage of the second approach is that it takes much longer. Caching can be used to mitigate some of the performance hit this would involve. After looking at the existing term caching in the CDR server, I don't think it will be useful for this purpose, because this cache stores a term and its parents, whereas what we need here is a term and its children.

We have at various times discussed possible plans to extend the query term mechanism to have at least one additional table to store query terms extracted from the most recent publishable versions of the documents. Such an extension might be useful for this project.

If we don't choose the approach of using the query_term table for implementing this capability, in addition to the basic problems that the existing term cache needed to solve (when does the cache become invalid? how is it invalidated? how do we prevent conflicts from access by simultaneous threads?) we would have the problem that this cache, unlike the existing cache, could not be built up incrementally as we go along: we would need to parse all of the protocol documents in advance to find all of the terminology links from the latest publishable versions of those documents.
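As a rough illustration of the caching concerns listed above, here is a Python sketch of an index that is built in a single pass rather than incrementally, guarded by a lock, and explicitly invalidated. It is not the CDR server's existing cache: the class name TermLinkIndex and the loader callable (assumed to yield pairs of trial ID and term ID extracted from the latest publishable protocol versions) are hypothetical.

```python
import threading


class TermLinkIndex:
    """Maps each term ID to the trials linking to it (hypothetical sketch)."""

    def __init__(self, loader):
        # loader is an assumed callable returning (trial_id, term_id) pairs.
        self._loader = loader
        self._lock = threading.Lock()
        self._index = None  # None means the index must be (re)built

    def invalidate(self):
        """Discard the index, e.g. after a protocol or term changes."""
        with self._lock:
            self._index = None

    def trials_for(self, term_id):
        """Return the trials linked to term_id, building the index if needed."""
        with self._lock:
            if self._index is None:
                # Full rebuild: every link is read up front, because the
                # index cannot be filled in one term at a time.
                index = {}
                for trial_id, linked_term in self._loader():
                    index.setdefault(linked_term, set()).add(trial_id)
                self._index = index
            return frozenset(self._index.get(term_id, ()))
```

Holding the lock for the whole rebuild keeps simultaneous threads from seeing a partial index, at the cost of blocking lookups while the rebuild runs.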

No matter how we implement this, it appears that there will be the potential for what we have determined (about whether a given search will have a non-empty result set) to become obsolete as protocols and terms change. We also discussed the question of whether the work to make the determination is worth the time and effort. If a search URL for which no trials would be found is inserted into a CDR term or summary document, the harm that is done is relatively minor, particularly if it would be possible to have Cancer.gov display a custom explanation for this case.

Questions for Lakshmi:

1. Do we need to include CTGovProtocols?
2. Am I right in thinking that in the first use case you give (links from drug terms) the reason we're not looking for links to more specific terms (unlike the handling for the summaries) is not because we don't want them, but because we don't expect there to be any for the drug terms? If I'm right, we might be able to simplify the logic slightly (by treating all cases the same in terms of using the term hierarchy). This might not be a net advantage, depending on which approach we choose and the performance implications of that choice.

Alan:

Please review this and let me know if I have come to any incorrect conclusions about how the current terminology caching mechanism works.

Volker:

Please tell me if you agree with my thinking about the division of labor between the XSL/T filters and the CDR Server.

Comment entered 2006-02-07 15:35:33 by Englisch, Volker (NIH/NCI) [C]

BZDATETIME::2006-02-07 15:35:33
BZCOMMENTOR::Volker Englisch
BZCOMMENT::9

(In reply to comment #8)
...
> Volker:
>
> Please tell me if you agree with my thinking about the division of labor
> between the XSL/T filters and the CDR Server.

I don't think I replied to your question yet.
Yes, I agree that creating a module which the filter can call to get the information needed for creating the requested URLs seems to be a good approach.

Comment entered 2006-02-14 13:08:40 by Grama, Lakshmi (NIH/NCI) [E]

BZDATETIME::2006-02-14 13:08:40
BZCOMMENTOR::Lakshmi Grama
BZCOMMENT::10

We are going with a simple solution for now. The CDR does not need to conditionally insert the trial link based on whether the search will retrieve results or not. I am going to close this issue for now and add an issue for vendor filter fixes for each document type.

Comment entered 2006-02-16 14:01:41 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2006-02-16 14:01:41
BZCOMMENTOR::Bob Kline
BZCOMMENT::11

Closed at status meeting per LG.

Comment entered 2006-02-16 14:02:03 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2006-02-16 14:02:03
BZCOMMENTOR::Bob Kline
BZCOMMENT::12

Really closing.

Attachments
File Name Posted User
ClinicalTrialsLinkingFRD_v1 1.doc 2006-01-20 15:12:34 Grama, Lakshmi (NIH/NCI) [E]
