CDR Tickets

Issue Number 4297
Summary Create Scheduled Task for PDQAccess Report
Created 2017-08-03 13:38:16
Issue Type Improvement
Submitted By Englisch, Volker (NIH/NCI) [C]
Assigned To Kline, Bob (NIH/NCI) [C]
Status Closed
Resolved 2017-08-21 11:49:55
Resolution Fixed
Path /home/bkline/backups/jira/ocecdr/issue.212413
Description

We have a "Production" report running on the DEV Linux application server. It's the PDQAccess report, which creates a spreadsheet listing all downloads by PDQ partners, grouped by month.
With the removal of the Linux servers we will have to migrate this report to run as another scheduled job.

Comment entered 2017-08-09 11:15:46 by Kline, Bob (NIH/NCI) [C]

I'm assigning this to myself, as I fear that if I don't implement it myself, my understanding of what it's doing will never be any better than it is now (which is pretty shaky). :-)

Here's a crude stab at a spec:

CBIIT provides us access to a snapshot extracted from the SFTP logs, with any information they do not wish us to see removed. That snapshot will be generated and placed at TBD. The scheduled CDR task will create a spreadsheet report as close as possible to that created on the Linux server by ~cdroperator/prod/bin/PDQAccessReport.py.

: the only remaining work for you to do on this ticket is to fix the previous attempt at a spec so that it's correct, including replacement of TBD.

Comment entered 2017-08-09 12:36:31 by Englisch, Volker (NIH/NCI) [C]

I'm having trouble keeping up with your questions and remarks. :-)

I'll rewrite your spec with additional information:

CBIIT provides us access to a snapshot extracted from the SFTP logs. This snapshot will be located on the FTP server under

~cdroperator/logs

The name of the log-file is

pdq.log

The monthly log files, created shortly after midnight on the first of each month, are named

pdq.log-YYYYMMDD.gz

That snapshot is also stored in the same directory as the pdq.log file.
We will copy this file using rsync or ssh to

D:\cdr\sftp_log

We will then create a spreadsheet report for the latest monthly log file, as close as possible to that created on the Linux server by ~cdroperator/prod/bin/PDQAccessReport.py.
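
A minimal sketch of how "the latest monthly log file" might be picked from a directory listing, assuming only the pdq.log-YYYYMMDD.gz naming pattern given above. The function name latest_monthly_log is hypothetical and is not taken from PDQAccessReport.py.

```python
import re
from datetime import datetime

# Naming pattern from the spec: pdq.log-YYYYMMDD.gz
LOG_NAME = re.compile(r"^pdq\.log-(\d{8})\.gz$")


def latest_monthly_log(names):
    """Return the file name with the most recent valid date stamp, or None."""
    dated = []
    for name in names:
        match = LOG_NAME.match(name)
        if not match:
            continue  # skips pdq.log itself and any unrelated files
        try:
            stamp = datetime.strptime(match.group(1), "%Y%m%d").date()
        except ValueError:
            continue  # eight digits that do not form a real calendar date
        dated.append((stamp, name))
    # Tuples sort by date first, so max() yields the newest file.
    return max(dated)[1] if dated else None
```

It could be called with os.listdir(r"D:\cdr\sftp_log"). Selecting by the parsed date, rather than by an expected file name for the first of the month, means a file stamped on a different day is still found.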

FYI: I have copied all of our past log files to the directory D:\cdr\sftp_log

Comment entered 2017-08-09 12:54:02 by Kline, Bob (NIH/NCI) [C]

Awesome! This is excellent. Looking at what you've put in that directory on DEV, it appears that we didn't actually lose any information when their cron job failed to kick off, right?

Comment entered 2017-08-09 13:09:09 by Englisch, Volker (NIH/NCI) [C]

That is correct but I don't remember if I manually corrected the log files. I think I did because I see two files for July 2016!

There were a few problems that arose with the missing log file:

  1. Our scheduled job to copy the latest log file to a local drive failed because the file didn't exist

  2. Once the file got created the filename didn't follow our expected date pattern (it was created on the 6th instead of the first)

  3. The spreadsheet created included some entries from the current month and not just the previous month.

None of this was a major issue, though.
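
Problems 2 and 3 above suggest deriving the reporting window from the log file's date rather than assuming the file was created on the first. A rough sketch under that assumption follows; the function names are hypothetical, and parsing of the individual log entries is not shown because the log format is not described in this ticket.

```python
from datetime import date


def report_month_bounds(file_date):
    """Return (start, end) of the month before file_date; end is exclusive."""
    end = file_date.replace(day=1)
    if end.month == 1:
        # January file: the previous month is December of the prior year.
        start = end.replace(year=end.year - 1, month=12)
    else:
        start = end.replace(month=end.month - 1)
    return start, end


def in_report_month(entry_date, file_date):
    """True if entry_date falls in the month preceding file_date."""
    start, end = report_month_bounds(file_date)
    return start <= entry_date < end
```

For a file stamped 2016-08-06, the window is 2016-07-01 up to but not including 2016-08-01, so entries from the first days of August would be left out of the July spreadsheet.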

Comment entered 2017-08-10 17:42:31 by Englisch, Volker (NIH/NCI) [C]

Note:
I heard earlier today from Stamen that they may not be setting up the cdroperator account to use an SSH key pair. That would mean we'd need to modify our programs that access the data on the SFTP server and re-test them.

Comment entered 2017-08-10 17:59:31 by Kline, Bob (NIH/NCI) [C]

: You need to be aware that CBIIT is already talking about changing John's approach in ways which could jeopardize our work. It would be good to get John to nail down the approach we've worked out with him before he goes. See Volker's previous comment on this ticket.

Comment entered 2017-08-21 11:49:55 by Kline, Bob (NIH/NCI) [C]

This has been tested by Bob and Volker.

Comment entered 2017-09-08 13:00:49 by Englisch, Volker (NIH/NCI) [C]

The report ran successfully on the first of the month.

Closing ticket.
