CDR Tickets

Issue Number 2992
Summary Add function for deduplicating protocol IDs to the filtering module
Created 2009-10-20 13:37:35
Issue Type Improvement
Submitted By Kline, Bob (NIH/NCI) [C]
Assigned To alan
Status Closed
Resolved 2010-02-24 15:40:45
Resolution Fixed
Path /home/bkline/backups/jira/ocecdr/issue.107320
Description

BZISSUE::4668
BZDATETIME::2009-10-20 13:37:35
BZCREATOR::Bob Kline
BZASSIGNEE::Alan Meyer
BZQACONTACT::Volker Englisch

We need to be able to publish alternate protocol IDs for CT.gov protocols without repeating IDs. Comparison of IDs should follow rules provided by Cancer.gov for normalization. In addition to removing duplicate IDs from the set of secondary IDs passed in by the filtering code, the function needs to drop IDs which are the same (after normalization) as IDs which will appear elsewhere in the published document. One possible interface would have the filtering code pass in an XML fragment that looks something like:

<ids>
<primary>
<id>a123</id>
<id>b-123</id>
</primary>
<other>
<id>A123</id>
<id>A1234</id>
<id>b 123</id>
<id>b 321</id>
<id>B 321</id>
</other>
</ids>

for which the server would return:

<result>
<id>A1234</id>
<id>b 321</id>
</result>

It doesn't have to look like that; use whatever you and Volker determine between you to be the most effective way to accomplish what has to happen.
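To make the expected behavior concrete, here is a minimal Python sketch of the deduplication described above. It is an illustration only, not the server implementation. The Cancer.gov normalization rules are not spelled out in this ticket, so the `normalize` helper below assumes a rule that is merely consistent with the example: ignore case and drop every character that is not an ASCII letter or digit. The function names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET


def normalize(protocol_id):
    # ASSUMPTION: the real Cancer.gov rules are not given in this ticket.
    # This stand-in ignores case and strips anything that is not an ASCII
    # letter or digit, which is enough to reproduce the example above.
    return re.sub(r"[^0-9a-z]", "", protocol_id.lower())


def dedup_fragment(fragment):
    """Return a <result> element holding the surviving secondary IDs."""
    root = ET.fromstring(fragment)

    # IDs that will appear elsewhere in the published document count as
    # already seen, so matching secondary IDs are dropped.
    seen = {normalize(e.text or "") for e in root.findall("./primary/id")}

    result = ET.Element("result")
    for element in root.findall("./other/id"):
        key = normalize(element.text or "")
        if key and key not in seen:
            # Keep the first spelling encountered for each normalized ID.
            seen.add(key)
            ET.SubElement(result, "id").text = element.text
    return result
```

Applied to the `<ids>` fragment above, this sketch keeps `A1234` and `b 321`: `A123` and `b 123` collide with the primary IDs after normalization, and `B 321` collides with the earlier `b 321`.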

Comment entered 2009-10-20 16:55:40 by alan

BZDATETIME::2009-10-20 16:55:40
BZCOMMENTOR::Alan Meyer
BZCOMMENT::1

I've coordinated with Bob and Volker and am working on this now.
I'm going to try to have it working by sometime on Thursday.

Volker will be working on modifications to the output filter
for CTGovProtocols to incorporate the new enhancement.

Comment entered 2009-10-21 00:32:17 by alan

BZDATETIME::2009-10-21 00:32:17
BZCOMMENTOR::Alan Meyer
BZCOMMENT::2

I've completed the function to do this, but can't compile or
test yet. I've first got to convert our server makefile from
cvs to svn.

Comment entered 2009-10-22 23:59:29 by alan

BZDATETIME::2009-10-22 23:59:29
BZCOMMENTOR::Alan Meyer
BZCOMMENT::3

I believe this is now working. I've installed it on Mahler.

We decided to use tilde-delimited strings instead of the
serialized XML proposed in the description of this issue in
order to simplify Volker's immediate task of interfacing this
to the vendor filter. The parser for the input is separate, so
we can swap in a different one if we decide later to use the
serialized XML approach.

An example calling sequence for the function from XSLT is:

document("cdrutil:/dedup-ids/swog1234~ecog-123~~NCI-442~SWOG 1234~ecog-123~Pf-99")

The return value for the above input is:

<?xml version='1.0' encoding='UTF-8'?>
<result>
<id>NCI-442</id>
<id>Pf-99</id>
</result>

This example and other details can be found in the server
CdrFilter.cpp source code comment prolog for the function
execXsltDedupIDs().
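For readers without access to the server source, the following Python sketch shows how the tilde-delimited form could be handled with the parser kept separate from the deduplication step. It is not the C++ code in CdrFilter.cpp. Two things are assumptions inferred only from the example above: that the empty field (`~~`) separates the primary IDs from the other IDs, and that normalization ignores case and non-alphanumeric characters. All function names are hypothetical.

```python
import re
from xml.sax.saxutils import escape


def normalize(protocol_id):
    # ASSUMPTION: stand-in for the Cancer.gov normalization rules.
    return re.sub(r"[^0-9a-z]", "", protocol_id.lower())


def split_ids(part):
    # Split one half of the argument on single tildes, skipping blanks.
    return [s for s in part.split("~") if s.strip()]


def parse_tilde_args(arg):
    # ASSUMPTION: the first "~~" separates primary IDs from other IDs.
    primary_part, _, other_part = arg.partition("~~")
    return split_ids(primary_part), split_ids(other_part)


def dedup_ids(arg, parser=parse_tilde_args):
    # The parser is a parameter so that a different input format (for
    # example an XML fragment) could be swapped in later.
    primary, other = parser(arg)
    seen = {normalize(p) for p in primary}
    kept = []
    for candidate in other:
        key = normalize(candidate)
        if key and key not in seen:
            seen.add(key)
            kept.append(candidate)
    return kept


def to_result_xml(ids):
    # Serialize the surviving IDs in the shape shown in the example.
    items = "".join("<id>%s</id>" % escape(i) for i in ids)
    return "<result>%s</result>" % items
```

With the argument from the example calling sequence, `dedup_ids` returns `NCI-442` and `Pf-99`, matching the documented return value: `SWOG 1234` and the second `ecog-123` normalize to the same keys as the two primary IDs.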

Comment entered 2009-10-23 14:04:19 by Englisch, Volker (NIH/NCI) [C]

BZDATETIME::2009-10-23 14:04:19
BZCOMMENTOR::Volker Englisch
BZCOMMENT::4

Yes, I was able to submit a set of protocol IDs and received back a node listing all of the IDs that did not already appear as a primary or other ID.

Comment entered 2010-02-24 15:40:45 by Englisch, Volker (NIH/NCI) [C]

BZDATETIME::2010-02-24 15:40:45
BZCOMMENTOR::Volker Englisch
BZCOMMENT::5

Per discussion at our last status meeting, we're closing this issue and will create a new one to modify the CTGov vendor filters to dedupe the secondary IDs.
