CDR Tickets

Issue Number 1826
Summary Area Code Lookup File
Created 2006-02-07 08:39:14
Issue Type Improvement
Submitted By Grama, Lakshmi (NIH/NCI) [E]
Assigned To Beckwith, Margaret (NIH/NCI) [E]
Status Closed
Resolved 2013-07-12 09:45:05
Resolution Won't Fix
Path /home/bkline/backups/jira/ocecdr/issue.106154
Description

BZISSUE::1979
BZDATETIME::2006-02-07 08:39:14
BZCREATOR::Lakshmi Grama
BZASSIGNEE::Alan Meyer
BZQACONTACT::Lakshmi Grama

It would be helpful to get a subscription, similar to the ZIP code lookup, that would help us keep area codes accurate. Of course, given that we store our phone numbers as strings, this may be a problem. But we could look into splitting the phone structure for U.S. and Canadian phones into area codes and phone numbers. Since area codes change, it is possible that we have some wrong area codes in the CDR.

Could Bob or Alan look at options, including a subscription service that we can use every quarter? We would want to make sure that the data comes from an authenticated source.

Comment entered 2006-02-07 08:40:44 by Grama, Lakshmi (NIH/NCI) [E]

BZDATETIME::2006-02-07 08:40:44
BZCOMMENTOR::Lakshmi Grama
BZCOMMENT::1

Here's an example that I found

http://www.zipcodeworld.com/zipcodebasic.htm

Comment entered 2006-02-07 10:20:00 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2006-02-07 10:20:00
BZCOMMENTOR::Bob Kline
BZCOMMENT::2

I'm not sure how useful this would be. For one thing, I looked up my own zip code in the example you found. It shows area code 703, which is the area code traditionally given out for my area (and is my own area code), but the phone company has been forced to assign 571 to many of its customers in this area because 703 was running out of space. It might be possible to deal with this problem by using a service capable of storing many-to-many relationships for the codes (and keeping more current with what the phone companies are doing). That still wouldn't handle the more serious problem presented by mobile phones, which allow organizations to do business with phones that have area codes from anywhere. At best we might want to consider a report of phone numbers to be checked manually, but I wouldn't recommend invalidating documents with phone numbers whose area codes didn't match a lookup table.

Similarly, I would recommend against restructuring the phone data, which could introduce more problems than it would be worth when dealing with imported phone numbers. We'll see if we can find a service with more reliable data.

Comment entered 2006-02-07 12:25:01 by alan

BZDATETIME::2006-02-07 12:25:01
BZCOMMENTOR::Alan Meyer
BZCOMMENT::3

I obtained some sample "basic" area code files from ZipCodeWorld.
Assuming the data is accurate, it does appear to be potentially
useful.

They provide the data in three formats, Excel, MS Access, and
comma separated value - which is probably the simplest for us to
use. The data lists each area code and exchange in the North
American numbering system (U.S., Canada, and some islands in the
Caribbean and Pacific), e.g., 301 + 435 for our local phone
numbers, and provides for each the country, State or territory or
province, City, Zip code, and some statistical information of no
interest to us. The "premium" and "gold" editions add
information of no interest to us.

The cost for one year of quarterly updates is $150, which seems
reasonable to me. [Oddly, the company advertises monthly updates
but only quarterly downloads. I'll write to them about that.]

In theory, this would enable us to validate that any
particular area code + exchange is a valid combination, and
might also allow us to validate by state or zip code.

The programming for a report shouldn't be too difficult, but we'd
have to actually write the program and experiment with it to find
out how effective it would be.

I recommend that we order either the $150 annual, or $80 one-time
download, write the program, and test it.

The program might extract phone number, state, and zip code from
each CDR address, validate them, and report all errors with an
indication of which type(s) of validation failed. The possible
failures would be:

area code / exchange mismatch
area code / state mismatch
area code / zip code mismatch

Someone would then have to do some checking to find out which, if
any, of the reported mismatches were reliable enough to include
in a production report.
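
A minimal sketch of the three checks listed above, assuming the area
code data has already been loaded into simple lookup structures. The
names (Address, validate, exchanges, states_by_area, zips_by_area) and
the phone pattern are hypothetical, not part of the CDR or of any
vendor file format.

```python
import re
from typing import NamedTuple

# Loose pattern for a 10-digit North American number; extensions are ignored.
PHONE = re.compile(r"\(?(\d{3})\)?[\s.-]*(\d{3})[\s.-]*\d{4}")


class Address(NamedTuple):
    """One extracted address (hypothetical shape)."""
    doc_id: int
    phone: str
    state: str
    zip_code: str


def validate(addr, exchanges, states_by_area, zips_by_area):
    """Return the list of validation failures for one address.

    exchanges      -- set of (area_code, exchange) pairs known to be valid
    states_by_area -- dict mapping area code to a set of state codes
    zips_by_area   -- dict mapping area code to a set of zip codes
    """
    match = PHONE.search(addr.phone)
    if match is None:
        return ["unparseable phone number"]
    area, exchange = match.groups()
    failures = []
    if (area, exchange) not in exchanges:
        failures.append("area code / exchange mismatch")
    if addr.state not in states_by_area.get(area, set()):
        failures.append("area code / state mismatch")
    if addr.zip_code not in zips_by_area.get(area, set()):
        failures.append("area code / zip code mismatch")
    return failures
```

Each address yields zero or more failure labels, so a reviewer can see
which type(s) of validation failed for a given document.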

Comment entered 2006-02-07 12:28:07 by alan

BZDATETIME::2006-02-07 12:28:07
BZCOMMENTOR::Alan Meyer
BZCOMMENT::4

Implementation note:

The full CSV file is 31 MB. It might not be practical to create
an in-memory dictionary of the file. However, it might be
straightforward and fast to sort extracted address information
and area code information into the same sort order and then do a
"sort merge" comparison to find all the errors.

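The "sort merge" idea might look roughly like the sketch below. It
assumes both inputs are already sorted on the same (area code,
exchange) key, so neither has to be held in a dictionary. The function
name and record shapes are hypothetical.

```python
def merge_missing(sorted_extracted, sorted_valid):
    """Yield extracted records whose key is absent from the valid list.

    sorted_extracted -- iterable of (key, record), sorted by key
    sorted_valid     -- iterable of keys, sorted the same way
    A key here is an (area_code, exchange) tuple of strings.
    """
    valid = iter(sorted_valid)
    current = next(valid, None)
    for key, record in sorted_extracted:
        # Advance the valid list until it catches up with this key.
        while current is not None and current < key:
            current = next(valid, None)
        if current != key:
            yield key, record


if __name__ == "__main__":
    extracted = sorted([
        (("703", "555"), "doc 2"),
        (("301", "435"), "doc 1"),
        (("999", "000"), "doc 3"),
    ])
    valid = sorted([("301", "435"), ("571", "200"), ("703", "555")])
    for key, record in merge_missing(extracted, valid):
        print(record, "has unknown area code/exchange", key)
```

Both sides are read once, front to back, so memory use stays constant
no matter how large the area code file is.
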
Comment entered 2006-02-07 14:24:31 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2006-02-07 14:24:31
BZCOMMENTOR::Bob Kline
BZCOMMENT::5

Reassigned to Alan.

Comment entered 2006-02-14 12:07:15 by alan

BZDATETIME::2006-02-14 12:07:15
BZCOMMENTOR::Alan Meyer
BZCOMMENT::6

ZipCodeWorld has not responded to the email I sent last Thursday
and their phone number leads to a voice mailbox.

I searched the web for companies selling area code information,
but all I found was ZipCodeWorld and some even smaller and less
confidence-inspiring home businesses.

Then I called Verizon to see what they could offer. After the
usual Verizon series of waits, transfers, off-shore customer
service people who suggested I call 411 to get area code
information, etc., I eventually got through to someone in New
York who told me they didn't sell any data appropriate to my
usage. They could offer a complete listing of all phone numbers
in the U.S. at $.06 per phone number - that's what they sell to
phone number publishers, but nothing else. However they
suggested I look at NANPA, the North American Numbering Plan
Administrator, and after much searching on their website, I
believe I hit the jackpot.

There is a free report there listed under "All States" "utilized
codes". It can be downloaded from the page listing all reports
at:

http://www.nanpa.com/reports/reports_cocodes_assign.html

I'm guessing that this report is the primary source used by
ZipCodeWorld/AreaCodeWorld for their area code reports. The
report is an 18 MB text file with fixed length fields and tab
separators. It does not have zip codes, but does have area
codes, local exchanges, states, current status (one of five
status codes, two or three of which (I'm not sure which) indicate
that the area code and exchange are usable by customers) and some
other information that may only be of interest to phone
companies.
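
Loading a file of this general shape (tab-separated, padded fields)
could be sketched as follows. The column names and the status code in
the example are placeholders, not taken from the actual NANPA report;
the set of "usable" status codes is passed in because it is not yet
known which codes qualify.

```python
import csv


def load_exchanges(lines, usable_statuses,
                   columns=("State", "NPA", "NXX", "Status")):
    """Map (area_code, exchange) to state for rows with a usable status.

    lines   -- any iterable of text lines, e.g. an open file
    columns -- header names for state, area code, exchange, and status
               (assumed names; adjust to the real report header)
    """
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    # Fixed-length fields are padded, so strip every value.
    header = [name.strip() for name in next(reader)]
    state_i, area_i, exch_i, status_i = (header.index(c) for c in columns)
    table = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        fields = [field.strip() for field in row]
        if fields[status_i] in usable_statuses:
            table[(fields[area_i], fields[exch_i])] = fields[state_i]
    return table
```

With the real file, the call would look something like
load_exchanges(open(path, encoding="utf-8"), {...}) once the header
names and usable status codes have been confirmed.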

As compared to AreaCodeWorld, this data has the following
limitations:

No Canadian numbers are included.
No zip codes are included.

I think Canadian codes may also be available in a different
report, but I haven't found it yet.

Despite the advantages of the AreaCodeWorld data, this source
seems like a better bet for the following reasons:

1. It's authoritative.

NANPA assigns area codes and exchanges. I presume their
data is the best there is.

2. We can download it any time we want.

We aren't limited to quarterly updates. I don't know how
often the report is updated, but if we run the validation
report monthly, or weekly, we can just download the latest
version, whatever it is, for the purpose. We don't have
to wait for AreaCodeWorld to publish the data.

3. It's free.

I propose to use it to write a validation program.

Comment entered 2006-02-14 12:43:52 by Grama, Lakshmi (NIH/NCI) [E]

BZDATETIME::2006-02-14 12:43:52
BZCOMMENTOR::Lakshmi Grama
BZCOMMENT::7

Sounds like the best data source. We did not want to programmatically validate our numbers. After discussion with Bob, it seemed that a report would be the best bet for us so that we could identify the problem numbers that needed to be reviewed manually.

Comment entered 2006-02-14 18:10:53 by alan

BZDATETIME::2006-02-14 18:10:53
BZCOMMENTOR::Alan Meyer
BZCOMMENT::8

I'll proceed with this.

I also found some files that have both U.S. and Canadian
area codes - though without the exchange information. One
useful one is in Microsoft Access MDB format and another in
HTML.

I won't worry about Canadian numbers for now. We can deal
with them later if U.S. validation proves useful.

Comment entered 2006-02-14 18:13:25 by alan

BZDATETIME::2006-02-14 18:13:25
BZCOMMENTOR::Alan Meyer
BZCOMMENT::9

I have searched through our schemas to identify document types
and elements that contain phone numbers. There are quite a few.

Document types include:

Person
Organization
InScopeProtocol
CTGovProtocol
PDQBoardMemberInfo

Some, but not all of the phone numbers, are stored in elements of
type ContactDetail. Elements include Phone, TollFreePhone, and
Fax.

For the ones that are in ContactDetail elements, I can use:

PostalAddress/PoliticalSubunit_Link and
PostalAddress/CountryLink

to link to the appropriate table to find the state and country.

For the other phone numbers, there are some variations on all
that.

The extraction of phone numbers and their associated state and
country information will probably turn out to be more complicated
than their validation.
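
As a rough illustration of the simplest part of the extraction,
the sketch below pulls every Phone, TollFreePhone, and Fax value out
of one document. The sample document structure in the usage note is
invented for the example, and finding the matching state and country
for each number is not attempted here.

```python
import xml.etree.ElementTree as ET

# Element names mentioned above as carrying phone numbers.
PHONE_TAGS = {"Phone", "TollFreePhone", "Fax"}


def phone_elements(xml_text):
    """Return (tag, value) pairs for phone-like elements, in document order."""
    root = ET.fromstring(xml_text)
    return [
        (element.tag, (element.text or "").strip())
        for element in root.iter()
        if element.tag in PHONE_TAGS
    ]
```

For example, a document containing one Phone and one Fax element
inside a ContactDetail block would produce two pairs, one per element.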

I therefore propose the following:

1. Pick one or a few phone number types to validate.

I propose to start with the approximately 15,000
Organizations, taking one or two phone numbers from each.

2. Write a command line program to validate them and produce a
simple report.

The program will take an input file that I downloaded by hand
from NANPA, and validate the phone numbers, reporting errors
in a text file.
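
A command line program along these lines might be structured as in
the sketch below. The input formats (one area code and exchange per
line, and one document ID and phone number per line) are simplified
stand-ins chosen for the example, not the NANPA layout or a CDR
export format.

```python
import argparse
import re

# Loose pattern for a 10-digit North American number; extensions are ignored.
PHONE = re.compile(r"\(?(\d{3})\)?[\s.-]*(\d{3})[\s.-]*\d{4}")


def main(argv=None):
    """Write a text report of suspect phone numbers; return the error count."""
    parser = argparse.ArgumentParser(
        description="Report phone numbers with unknown area code/exchange.")
    parser.add_argument("codes_file",
                        help="lines of 'area_code<TAB>exchange' (assumed format)")
    parser.add_argument("phones_file",
                        help="lines of 'doc_id<TAB>phone' (assumed format)")
    parser.add_argument("report_file", help="text file to write errors to")
    args = parser.parse_args(argv)

    with open(args.codes_file, encoding="utf-8") as fp:
        valid = {tuple(line.split()) for line in fp if line.strip()}

    errors = 0
    with open(args.phones_file, encoding="utf-8") as src, \
            open(args.report_file, "w", encoding="utf-8") as out:
        for line in src:
            if not line.strip():
                continue
            doc_id, phone = line.rstrip("\n").split("\t", 1)
            match = PHONE.search(phone)
            if match is None:
                problem = "unparseable phone number"
            elif match.groups() not in valid:
                problem = "unknown area code/exchange"
            else:
                continue
            out.write(f"{doc_id}\t{phone}\t{problem}\n")
            errors += 1
    return errors
```

Calling main(["codes.txt", "phones.txt", "report.txt"]) runs it on
three hypothetical files; the returned count gives a quick sense of
how many numbers would need manual review.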

We'll look at the results. Only if they are valuable, i.e., a
useful number of errors are found with relatively few false error
reports, should we go on to develop the full program.

If the results are valuable and we decide to develop the program,
I can expand it to include more document types and elements. I
can also include automatic download of the latest area code file,
and perhaps add browser based components to initiate the report
and to view the results.

I'll proceed with this plan unless someone suggests otherwise.

Comment entered 2006-04-27 14:35:03 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2006-04-27 14:35:03
BZCOMMENTOR::Bob Kline
BZCOMMENT::10

Priority dropped at status meeting.

Comment entered 2006-07-27 20:25:32 by alan

BZDATETIME::2006-07-27 20:25:32
BZCOMMENTOR::Alan Meyer
BZCOMMENT::11

I've written and tested about half of a test program to do
this task but, alas, higher priority work is in the queue.

I've written down the design and will get back to it when
I can.

Comment entered 2007-01-18 13:25:08 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2007-01-18 13:25:08
BZCOMMENTOR::Bob Kline
BZCOMMENT::12

Dropping priority at Lakshmi's request.

Comment entered 2010-10-01 10:47:11 by Englisch, Volker (NIH/NCI) [C]

BZDATETIME::2010-10-01 10:47:11
BZCOMMENTOR::Volker Englisch
BZCOMMENT::13

Is this task still needed, or should it be closed and reopened in the future if needed?

Comment entered 2012-01-03 10:57:25 by Englisch, Volker (NIH/NCI) [C]

BZDATETIME::2012-01-03 10:57:25
BZCOMMENTOR::Volker Englisch
BZCOMMENT::14

This task is pretty dusty after being forgotten in the P10 dungeon for 5 years.

Should we keep it open, Lakshmi, or is it OK to close this issue?

Comment entered 2013-07-12 09:45:06 by Beckwith, Margaret (NIH/NCI) [E]

Decided at status meeting that we will not ever need this.

Comment entered 2013-07-12 10:52:28 by Beckwith, Margaret (NIH/NCI) [E]

I had in my notes from the meeting to close this issue, so I set it to Resolved/Won't fix. But I see that Bob had put it on hold. So now I am a little confused...
