Issue Number | 2992 |
---|---|
Summary | Add function for deduplicating protocol IDs to the filtering module |
Created | 2009-10-20 13:37:35 |
Issue Type | Improvement |
Submitted By | Kline, Bob (NIH/NCI) [C] |
Assigned To | alan |
Status | Closed |
Resolved | 2010-02-24 15:40:45 |
Resolution | Fixed |
Path | /home/bkline/backups/jira/ocecdr/issue.107320 |
BZISSUE::4668
BZDATETIME::2009-10-20 13:37:35
BZCREATOR::Bob Kline
BZASSIGNEE::Alan Meyer
BZQACONTACT::Volker Englisch
We need to be able to publish alternate protocol IDs for CT.gov protocols without repeating IDs. Comparison of IDs should follow rules provided by Cancer.gov for normalization. In addition to removing duplicate IDs from the set of secondary IDs passed in by the filtering code, the function needs to drop IDs which are the same (after normalization) as IDs which will appear elsewhere in the published document. One possible interface would have the filtering code pass in an XML fragment that looks something like:
<ids>
  <primary>
    <id>a123</id>
    <id>b-123</id>
  </primary>
  <other>
    <id>A123</id>
    <id>A1234</id>
    <id>b 123</id>
    <id>b 321</id>
    <id>B 321</id>
  </other>
</ids>
for which the server would return:
<result>
  <id>A1234</id>
  <id>b 321</id>
</result>
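A minimal C++ sketch of the comparison logic implied by this example follows. The Cancer.gov normalization rules aren't spelled out in this issue, so the sketch assumes they amount to case-folding and ignoring everything that is not a letter or digit, which is enough to reproduce the example. `normalizeId()` and `dedupIds()` are hypothetical names, not the functions in the CDR server.

```cpp
#include <cctype>
#include <string>
#include <unordered_set>
#include <vector>

// ASSUMPTION: stand-in for the Cancer.gov rules. Upper-cases letters and
// drops everything that is not a letter or digit, so "a123" == "A123"
// and "b-123" == "b 123".
std::string normalizeId(const std::string& id) {
    std::string key;
    for (unsigned char c : id) {
        if (std::isalnum(c)) {
            key += static_cast<char>(std::toupper(c));
        }
    }
    return key;
}

// Hypothetical helper: returns the "other" IDs that survive deduplication,
// in input order, keeping the original spelling of the first occurrence.
std::vector<std::string> dedupIds(const std::vector<std::string>& primary,
                                  const std::vector<std::string>& other) {
    // Seed the set with the IDs that will appear elsewhere in the document.
    std::unordered_set<std::string> seen;
    for (const auto& id : primary) {
        seen.insert(normalizeId(id));
    }
    std::vector<std::string> result;
    for (const auto& id : other) {
        // insert().second is false when the normalized key is already present.
        if (seen.insert(normalizeId(id)).second) {
            result.push_back(id);
        }
    }
    return result;
}
```

With the example above, `dedupIds({"a123", "b-123"}, {"A123", "A1234", "b 123", "b 321", "B 321"})` returns `{"A1234", "b 321"}`: the primary keys seed the set, so `A123` and `b 123` are dropped, and `B 321` is dropped because `b 321` was already kept.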
The interface doesn't have to look like that; use whatever you and Volker decide between you is the most effective way to accomplish this.
BZDATETIME::2009-10-20 16:55:40
BZCOMMENTOR::Alan Meyer
BZCOMMENT::1
I've coordinated with Bob and Volker and am working on this
now. I hope to have it working sometime on Thursday.
Volker will be working on modifications to the output filter
for CTGovProtocols to incorporate the new enhancement.
BZDATETIME::2009-10-21 00:32:17
BZCOMMENTOR::Alan Meyer
BZCOMMENT::2
I've completed the function, but can't compile or test it
yet. I first have to convert our server makefile from CVS
to Subversion.
BZDATETIME::2009-10-22 23:59:29
BZCOMMENTOR::Alan Meyer
BZCOMMENT::3
I believe this is now working. I've installed it on Mahler.
We decided to use tilde-delimited strings instead of the
serialized XML proposed in the description of this issue, to
simplify Volker's immediate task of interfacing this to the
vendor filter. The input parser is kept separate, so we can
swap in a different one if we later decide to use the
serialized XML approach.
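One way to keep the input parser separate, sketched in C++: the deduplication code consumes a neutral pair of ID lists, and each input format supplies its own parser. The `IdSets` struct, the `IdParser` alias, and `parseTildeIds()` are hypothetical. The rule that the first empty field (`~~`) separates the IDs published elsewhere from the candidate secondary IDs is inferred from the example calling sequence in this comment, not taken from the server source.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical parsed input: IDs that will appear elsewhere in the
// document (primary) and candidate secondary IDs (other).
struct IdSets {
    std::vector<std::string> primary;
    std::vector<std::string> other;
};

// Any input format can be plugged in as long as it yields an IdSets.
using IdParser = std::function<IdSets(const std::string&)>;

// ASSUMED format: fields separated by '~'; the first empty field ("~~")
// switches from primary IDs to other IDs.
IdSets parseTildeIds(const std::string& input) {
    IdSets sets;
    bool inOther = false;
    std::string::size_type start = 0;
    while (start <= input.size()) {
        std::string::size_type end = input.find('~', start);
        if (end == std::string::npos) {
            end = input.size();
        }
        std::string field = input.substr(start, end - start);
        if (field.empty()) {
            inOther = true;  // empty field acts as the section separator
        } else if (inOther) {
            sets.other.push_back(field);
        } else {
            sets.primary.push_back(field);
        }
        start = end + 1;
    }
    return sets;
}
```

A parser for the serialized XML format would only need to return the same `IdSets`, leaving the comparison code untouched. Note that in this sketch an input with no `~~` is treated as all primary IDs, so nothing would be returned.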
An example calling sequence for the function from XSLT is:
document("cdrutil:/dedup-ids/swog1234~ecog-123~~NCI-442~SWOG 1234~ecog-123~Pf-99")
The return value for the above input is:
<?xml version='1.0' encoding='UTF-8'?>
<result>
  <id>NCI-442</id>
  <id>Pf-99</id>
</result>
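A sketch of how the returned document could be assembled once the surviving IDs are known. `buildResultXml()` and `xmlEscape()` are hypothetical helpers, not the server's code. Escaping `&`, `<`, and `>` matters because protocol IDs are free text, and the standard XML declaration attribute is `encoding`.

```cpp
#include <string>
#include <vector>

// Escapes characters that cannot appear verbatim in XML text content.
std::string xmlEscape(const std::string& s) {
    std::string out;
    for (char c : s) {
        switch (c) {
            case '&': out += "&amp;"; break;
            case '<': out += "&lt;"; break;
            case '>': out += "&gt;"; break;
            default:  out += c;
        }
    }
    return out;
}

// Hypothetical helper: wraps each surviving ID in an <id> element
// inside a <result> root, preceded by an XML declaration.
std::string buildResultXml(const std::vector<std::string>& ids) {
    std::string xml = "<?xml version='1.0' encoding='UTF-8'?>\n<result>\n";
    for (const auto& id : ids) {
        xml += "  <id>" + xmlEscape(id) + "</id>\n";
    }
    xml += "</result>\n";
    return xml;
}
```

`buildResultXml({"NCI-442", "Pf-99"})` produces a `<result>` document with the same two `<id>` elements as the return value shown above.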
This example and other details can be found in the comment
prolog for the function execXsltDedupIDs() in the server
source file CdrFilter.cpp.
BZDATETIME::2009-10-23 14:04:19
BZCOMMENTOR::Volker Englisch
BZCOMMENT::4
Yes, I was able to submit a set of protocol IDs and received back a node listing all of the IDs that were not already present as a primary or other ID.
BZDATETIME::2010-02-24 15:40:45
BZCOMMENTOR::Volker Englisch
BZCOMMENT::5
Per discussion at our last status meeting, we're closing this issue and creating a new one to modify the CTGov vendor filters to dedupe the secondary IDs.