CDR Tickets

Issue Number 1812
Summary Create Statistics from the CIPSFTP Log Files
Created 2006-01-25 16:26:05
Issue Type Improvement
Submitted By Englisch, Volker (NIH/NCI) [C]
Assigned To Englisch, Volker (NIH/NCI) [C]
Status Closed
Resolved 2006-03-20 15:48:04
Resolution Fixed
Path /home/bkline/backups/jira/ocecdr/issue.106140

BZDATETIME::2006-01-25 16:26:05
BZCREATOR::Volker Englisch
BZASSIGNEE::Volker Englisch
BZQACONTACT::Lakshmi Grama

Lakshmi would like to see statistics on who is accessing our CDR data, which individual files are being retrieved, and how often.

The exact format and frequency of this report will be determined later.

Comment entered 2006-01-25 18:57:46 by Englisch, Volker (NIH/NCI) [C]

BZDATETIME::2006-01-25 18:57:46
BZCOMMENTOR::Volker Englisch

I am using a modified Perl script that I found on the Internet to extract some data from the FTP log files. The result can be viewed at

This output shows about one week's worth of data and lists the following information:

  • Summary table

  • Downloads by domain

  • Downloads by vendor (i.e. user account). This is not split if a vendor account
    is being used from multiple hosts.

  • Downloads from the 'monthly' directory. The vendor has either downloaded
    supporting documents (the DTD file, documentation, etc.) or has downloaded
    an entire document type with all of its documents

  • same as above but split by vendor (user account)

  • Downloads from the 'monthly' directory. The vendor downloaded individual
    CDR documents rather than retrieving an entire directory.

  • same as above but split by vendor (user account)

  • Downloads from the 'weekly' directory. The vendor downloaded individual CDR
    documents. The entry 'ProtocolActive' indicates downloads by the NLM. All
    other entries are regular vendor downloads.

  • same as above but split by vendor (user account)

  • all files/directories being retrieved.
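
For illustration only, the per-vendor, per-domain, and per-directory counts listed above could be produced along the following lines. This is a minimal Python sketch, not the Perl script actually used for this report. It assumes the FTP server writes the common xferlog format (as wu-ftpd and ProFTPD do), which this ticket does not confirm, and the sample host names, account names, and file names are hypothetical.

```python
from collections import Counter

# Hypothetical sample lines in xferlog format (an assumption; the actual
# CIPSFTP log format is not stated in this ticket). Field layout after
# whitespace splitting: 0-4 timestamp, 5 transfer time, 6 remote host,
# 7 size, 8 path, 9 type, 10 action flag, 11 direction, 12 access mode,
# 13 user name, 14 service, 15 auth method, 16 auth user id, 17 status.
SAMPLE = [
    "Thu Jan 26 10:15:02 2006 3 ftp.vendor-a.example 10240 /monthly/Summary.tar.gz b _ o r vendor_a ftp 0 * c",
    "Thu Jan 26 10:16:40 2006 1 ftp.vendor-a.example 2048 /weekly/CDR0000012345.xml b _ o r vendor_a ftp 0 * c",
    "Thu Jan 26 11:02:11 2006 1 192.0.2.17 2048 /weekly/CDR0000067890.xml b _ o r vendor_b ftp 0 * c",
    "Thu Jan 26 11:05:00 2006 2 192.0.2.17 4096 /incoming/upload.xml b _ i r vendor_b ftp 0 * c",
]


def domain_of(host):
    """Reduce a host name to its last two labels; leave IP addresses whole."""
    labels = host.split(".")
    if labels[-1].isdigit():
        return host
    return ".".join(labels[-2:])


def summarize(lines):
    """Count downloads by user account, by domain, and by top-level directory."""
    by_user, by_domain, by_top_dir = Counter(), Counter(), Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 18:
            continue  # skip malformed or truncated lines
        host, path, direction, user = fields[6], fields[8], fields[11], fields[13]
        if direction != "o":
            continue  # "o" marks an outgoing transfer, i.e. a download
        by_user[user] += 1
        by_domain[domain_of(host)] += 1
        by_top_dir[path.strip("/").split("/")[0]] += 1
    return by_user, by_domain, by_top_dir


if __name__ == "__main__":
    for title, counts in zip(("By vendor", "By domain", "By directory"),
                             summarize(SAMPLE)):
        print(title)
        for key, count in counts.most_common():
            print(f"  {key}: {count}")
```

As in the report, counts by vendor are keyed on the user account alone, so an account used from several hosts is not split.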

Please let me know if this is all you need to see or if there are any other views you'd like to see of the data.

If this is OK I could run the report for all log files available reaching back a few months.

For my own reference, the Perl script is located on CIPSFTP under

Comment entered 2006-02-08 18:51:28 by Englisch, Volker (NIH/NCI) [C]

BZDATETIME::2006-02-08 18:51:28
BZCOMMENTOR::Volker Englisch

I have created the statistics output for the months of Oct'05 through Jan'06:
(Please note that these files are very large since they list each individual file. The most interesting data is probably at the top of the report, so you can click the 'Stop loading this Page' button once the top part has loaded.)

Please let me know if there is anything else we need for this issue.

Comment entered 2006-03-09 15:02:44 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2006-03-09 15:02:44

Closed at status meeting per LG.

Comment entered 2006-03-20 15:48:04 by Englisch, Volker (NIH/NCI) [C]

BZDATETIME::2006-03-20 15:48:04
BZCOMMENTOR::Volker Englisch

Per request, I have added the statistics output for the month of Feb '06:

(Also testing if I can add 'Hours Worked' to a bug with status 'Closed'.)
