CDR Tickets

Issue Number 2246
Summary External Mapping table - normalizing data
Created 2007-06-15 16:16:01
Issue Type Improvement
Submitted By priced
Assigned To Kline, Bob (NIH/NCI) [C]
Status Closed
Resolved 2007-06-27 17:09:10
Resolution Won't Fix
Path /home/bkline/backups/jira/ocecdr/issue.106574
Description

BZISSUE::3330
BZDATETIME::2007-06-15 16:16:01
BZCREATOR::Sheri Khanna
BZASSIGNEE::Bob Kline
BZQACONTACT::Sheri Khanna

We need to normalize the data mapping rules so that they ignore spaces, periods, and hyphens in Facility names, to improve matching where possible.
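The request amounts to computing a normalized key for each Facility name before comparing it with entries in the mapping table. The following is a minimal sketch of that idea, assuming the key is simply the name with whitespace, periods, and hyphens removed. The function name `normalize_facility_name` and the sample names are hypothetical and are not part of the CDR code base.

```python
import re

# Characters the request asks the mapping rules to ignore:
# any run of whitespace, periods, or hyphens.
_IGNORED = re.compile(r"[\s.\-]+")


def normalize_facility_name(name: str) -> str:
    """Return a hypothetical match key for a Facility name.

    Strips whitespace, periods, and hyphens so that variants such as
    "M.D. Anderson Cancer Center" and "MD Anderson Cancer Center"
    produce the same key. Case and all other punctuation are left alone.
    """
    return _IGNORED.sub("", name)
```

With this rule, "M.D. Anderson Cancer Center", "MD Anderson Cancer Center", and "M D Anderson Cancer-Center" all reduce to the same key. Existing mapping rows would need the same normalization applied for lookups to stay consistent.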

I will attach an example of a Facility that is in the mapping table several times because of these issues.

Comment entered 2007-06-15 16:16:52 by priced

BZDATETIME::2007-06-15 16:16:52
BZCOMMENTOR::Sheri Khanna
BZCOMMENT::1

Attachment Mapping_Normalizingproblem_example.doc has been added with description: CTGov Facility example

Comment entered 2007-06-16 05:02:53 by Grama, Lakshmi (NIH/NCI) [E]

BZDATETIME::2007-06-16 05:02:53
BZCOMMENTOR::Lakshmi Grama
BZCOMMENT::2

Some of the issues in your example are not related to spacing, hyphens, or periods. They relate to the data itself: e.g., some ZIP codes are in 5+4 format, and some parts of the name are dropped in other variations. We can certainly try to normalize spaces and commas and see how much that cuts down the variants, but it seems to me that the primary problem is that we cannot try to match on the name of the organization alone. We need the other information.
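One way to act on the point about using "the other information" is to match on a composite key rather than the name alone. The following is a minimal sketch, assuming a U.S. ZIP code is available alongside the name and that reducing it to its first five digits is acceptable. The names `FacilityKey` and `facility_key` are hypothetical.

```python
import re
from typing import NamedTuple

# Whitespace, periods, and hyphens are ignored in names, as requested in the ticket.
_IGNORED = re.compile(r"[\s.\-]+")


class FacilityKey(NamedTuple):
    """Hypothetical composite match key: normalized name plus 5-digit ZIP."""
    name: str
    zip5: str


def facility_key(name: str, postal_code: str) -> FacilityKey:
    """Build a match key from a facility name and a U.S. ZIP code.

    Keeps only the digits of the ZIP code and truncates them to five,
    so "77030" and "77030-4009" (ZIP+4) produce the same key.
    """
    digits = re.sub(r"\D", "", postal_code)
    return FacilityKey(_IGNORED.sub("", name), digits[:5])
```

A key like this reconciles the 5+4 ZIP variants, but it cannot reconcile variants where parts of the name are missing entirely. Those would still need fuzzier matching or manual review.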

Comment entered 2007-06-21 10:25:10 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2007-06-21 10:25:10
BZCOMMENTOR::Bob Kline
BZCOMMENT::3

I thought I had posted a comment similar to Lakshmi's last week, but I must have missed a Bugzilla "you're not logged in" message. I had investigated the case you posted as an example and found that, although there are nine entries in the external_map table for the same organization, the modification you're requesting would eliminate only one of them. As Lakshmi points out, the overwhelming majority of the differences stem from discrepancies in the data provided by the source, which cannot be normalized away. I can implement the change you request (we'll need to modify the existing data, too), but I question whether the payoff would be worth it, at least based on the cited example of the problem. Do you still want me to proceed with the work on this request?

Comment entered 2007-06-27 17:09:10 by priced

BZDATETIME::2007-06-27 17:09:10
BZCOMMENTOR::Sheri Khanna
BZCOMMENT::4

(In reply to comment #3)
> but I question whether the payoff would be worth it, at least based on the
> cited example of the problem. Do you still want me to proceed with the work
> on this request?

Normalizing the mapping was an issue that OCCM and CIAT had discussed at one of the Prot. Admin meetings, so it was on our list of issues to address. It doesn't sound like the end result in this case would be worth the work, so I will close this issue for now.

Comment entered 2013-12-16 17:44:31 by Englisch, Volker (NIH/NCI) [C]

Resolution of the issue has been set to "Won't fix".

Setting status to 'Closed'.

Attachments
File Name Posted User
Mapping_Normalizingproblem_example.doc 2007-06-15 16:16:52