CDR Tickets

Issue Number 3025
Summary Modify cancer.gov glossifier service
Created 2009-11-13 15:26:26
Issue Type Improvement
Submitted By Kline, Bob (NIH/NCI) [C]
Assigned To Kline, Bob (NIH/NCI) [C]
Status Closed
Resolved 2010-02-23 08:45:26
Resolution Fixed
Path /home/bkline/backups/jira/ocecdr/issue.107353
Description

BZISSUE::4701
BZDATETIME::2009-11-13 15:26:26
BZCREATOR::Bob Kline
BZASSIGNEE::Bob Kline
BZQACONTACT::Bob Kline

Bryan has requested that we make some changes to the glossifier service we provide for them.

Here's the tail end of the email message I sent to him capturing the discussion we had about what he needs:

============================ snip ======================================

If I understand correctly, you want me to:

1. add a requirement to mask out anything which matches the pattern
for an HTML comment
2. retain the requirement to mask out anchor elements (tags and
content) as well as anything marked ...
3. add a requirement to mask out anything left unmasked by the
previous two passes and which starts with '<' and ends with '>'

Sound right?

The corresponding regular expressions I'm using are:

1. <!--.*?-->
2. <a\s[^>]+>.*?</a>|.*?
3. <[^>]*>

The first regular expression is used with the flag that makes the dot match newlines. The second one uses that flag as well as the "ignore case" flag and the "use the Unicode notion of what whitespace is" flag.

If this looks good to you, let me enable the changes and have you (or Jay) throw some tests at it.
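For reference, the three masking passes described in this message could be sketched roughly as follows. This is only an illustration and not the service's actual code. It assumes Python's re module, with the patterns written out as best they can be reconstructed from the three requirements. The second alternative of pattern 2 is not recoverable from this ticket and is left out. The function name mask and its fill parameter are hypothetical.

```python
import re

# Pass 1: HTML comments. DOTALL lets the dot match newlines.
COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)

# Pass 2: anchor elements, tags and content. The ticket's second
# alternative (other marked-up content) is truncated, so it is omitted.
# re.UNICODE is already the default for str patterns in Python 3.
ANCHOR = re.compile(r"<a\s[^>]+>.*?</a>",
                    re.DOTALL | re.IGNORECASE | re.UNICODE)

# Pass 3: anything still unmasked which starts with '<' and ends with '>'.
TAG = re.compile(r"<[^>]*>")


def mask(text, fill="\x00"):
    """Blank out each masked span with same-length filler (hypothetical helper).

    Keeping the length unchanged means offsets of any term found in the
    masked copy still line up with the original text. The filler
    character is an assumption; any character that cannot occur in a
    glossary term would do.
    """
    for pattern in (COMMENT, ANCHOR, TAG):
        text = pattern.sub(lambda m: fill * len(m.group(0)), text)
    return text
```

With a fragment such as 'See <A HREF="x">cancer</A> and <b>tumor</b>.', the anchor and its content are blanked out, while "tumor" stays visible to the term matcher because only the surrounding tags are masked.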

Comment entered 2009-11-13 16:08:02 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2009-11-13 16:08:02
BZCOMMENTOR::Bob Kline
BZCOMMENT::1

Modifications implemented and installed on verdi for Bryan and Jay to test. Mini has scheduled a meeting on Tuesday from 12 to 12:30 to discuss the requirements. Will defer update of the documentation of the service's behavior embedded in the WSDL until after that meeting.

URL for the test WSDL is http://verdi.nci.nih.gov/u/glossify.

Comment entered 2009-11-16 10:35:57 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2009-11-16 10:35:57
BZCOMMENTOR::Bob Kline
BZCOMMENT::2

(In reply to comment #1)
> Modifications implemented and installed on verdi for Bryan and Jay to test.
> Mini has scheduled a meeting on Tuesday from 12 to 12:30 to discuss the
> requirements. Will defer update of the documentation of the service's behavior
> embedded in the WSDL until after that meeting.
>
> URL for the test WSDL is http://verdi.nci.nih.gov/u/glossify.

Jay reported that the modifications are working correctly.

Comment entered 2009-11-19 11:32:18 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2009-11-19 11:32:18
BZCOMMENTOR::Bob Kline
BZCOMMENT::3

The new requirements as quoted above were approved in Tuesday's meeting. Documentation of the service inside the WSDL document has been modified to reflect compliance with these new requirements. The changed documentation has been installed on Verdi (the test system) but not yet on the production server (won't do that until testing is complete and the modifications to the service's behavior have been promoted to production).

http://verdi.nci.nih.gov/u/glossify

Bryan indicated privately (after the meeting) that some time in the next couple of months he would be sending us requirements for a new service. That service would take well-formed XML (along the lines we originally proposed to Olga, so that we would not need to attempt our own parsing of a possibly malformed input fragment) and return a version marked up to indicate candidate glossification of matches with terms in the dictionary. A separate issue will be created for that version of the service when we get those requirements.

Comment entered 2010-02-23 08:45:26 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2010-02-23 08:45:26
BZCOMMENTOR::Bob Kline
BZCOMMENT::4

Testing completed; this is in production.
