Issue Number | 4297 |
---|---|
Summary | Create Scheduled Task for PDQAccess Report |
Created | 2017-08-03 13:38:16 |
Issue Type | Improvement |
Submitted By | Englisch, Volker (NIH/NCI) [C] |
Assigned To | Kline, Bob (NIH/NCI) [C] |
Status | Closed |
Resolved | 2017-08-21 11:49:55 |
Resolution | Fixed |
Path | /home/bkline/backups/jira/ocecdr/issue.212413 |
We have a "Production" report running on the DEV Linux application
server. It's the PDQAccess report which creates a spreadsheet listing
all downloads by PDQ partners by month.
With the removal of the Linux servers we will have to migrate this
report to run as another scheduled job.
I'm assigning this to myself, as I fear that if I don't implement it myself, my understanding of what it's doing will never be any better than it is now (which is pretty shaky). :-)
Here's a crude stab at a spec:
CBIIT provides us access to a snapshot extracted from the SFTP logs, with any information they do not want us to see removed. That snapshot will be generated and placed at TBD. The scheduled CDR task will create a spreadsheet report as close as possible to that created on the Linux server by ~cdroperator/prod/bin/PDQAccessReport.py.
~volker: the only remaining work for you to do on this ticket is to fix the previous attempt at a spec so that it's correct, including replacement of TBD.
I'm having trouble keeping up with your questions and remarks. :-)
I'll rewrite your spec with additional information:
CBIIT provides us access to a snapshot extracted from the SFTP logs. This snapshot will be located on the FTP server under
~cdroperator/logs
The name of the log file is
pdq.log
The monthly log files, created shortly after midnight on the first of each month, are named
pdq.log-YYYYMMDD.gz
That snapshot is also stored in the same directory as the
pdq.log file.
We will copy this file using rsync or ssh to
D:\cdr\sftp_log
We will then create a spreadsheet report for the latest monthly log file, as close as possible to that created on the Linux server by ~cdroperator/prod/bin/PDQAccessReport.py.
FYI: I have copied all of our past log files to the directory D:\cdr\sftp_log
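For illustration only, here is a minimal sketch of how a scheduled task might pick the latest monthly log file out of that directory, going by the pdq.log-YYYYMMDD.gz naming described above. The function name `latest_monthly_log` is hypothetical and this is not taken from PDQAccessReport.py; it only assumes that the date in the file name is the one to sort on.

```python
import re
from datetime import date
from pathlib import Path

# Naming convention from this ticket: pdq.log-YYYYMMDD.gz
MONTHLY_LOG = re.compile(r"^pdq\.log-(\d{4})(\d{2})(\d{2})\.gz$")


def latest_monthly_log(directory):
    """Return the Path of the newest monthly log in `directory`, or None.

    Hypothetical helper: "newest" is decided by the date embedded in the
    file name, not by the file's modification time.
    """
    dated = []
    for path in Path(directory).iterdir():
        match = MONTHLY_LOG.match(path.name)
        if match is None:
            continue  # e.g. the live pdq.log file
        try:
            stamp = date(*(int(part) for part in match.groups()))
        except ValueError:
            continue  # digits that do not form a real calendar date
        dated.append((stamp, path))
    if not dated:
        return None
    return max(dated, key=lambda pair: pair[0])[1]
```

A call such as `latest_monthly_log(r"D:\cdr\sftp_log")` would return the path to hand to the report, or `None` when no monthly file has arrived yet. Matching on the pattern rather than on an exact expected date means a file created late in the month would still be found.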
Awesome! This is excellent. Looking at what you've put in that directory on DEV, it appears that we didn't actually lose any information when their cron job failed to kick off, right?
That is correct but I don't remember if I manually corrected the log files. I think I did because I see two files for July 2016!
There were a few problems that arose with the missing log file:
Our scheduled job to copy the latest log file to a local drive failed because the file didn't exist.
Once the file was created, its name didn't follow our expected date pattern (it was created on the 6th instead of the 1st).
The spreadsheet created included some entries from the current month and not just the previous month.
None of this was a major issue, though.
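One way to guard against the third problem is to filter entries by date rather than trusting the file boundaries. The following is only a sketch under stated assumptions: both function names are hypothetical, and the entries are assumed to have already been parsed into `datetime` values (the log line format is not given in this ticket, so parsing is not shown).

```python
from datetime import date, datetime, timedelta


def previous_month_bounds(today):
    """Return (start, end) datetimes for the month before `today`.

    `start` is inclusive and `end` is exclusive. Hypothetical helper.
    """
    first_of_this_month = date(today.year, today.month, 1)
    last_of_previous = first_of_this_month - timedelta(days=1)
    start = datetime(last_of_previous.year, last_of_previous.month, 1)
    end = datetime(first_of_this_month.year, first_of_this_month.month, 1)
    return start, end


def in_previous_month(timestamps, today):
    """Keep only the timestamps that fall in the month before `today`."""
    start, end = previous_month_bounds(today)
    return [ts for ts in timestamps if start <= ts < end]
```

With this approach, a log file rotated on the 6th would still yield a report covering only the previous calendar month, because entries from the first days of the current month are dropped.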
Note:
I heard earlier today from Stamen that they may not be setting up the
cdroperator account to use an SSH key pair. That would mean we'd
need to modify our programs that access the data on the SFTP server and
retest them.
~duganal: You need to be aware that CBIIT is already talking about changing John's approach in ways which could jeopardize our work. It would be good to get John to nail down the approach we've worked out with him before he goes. See Volker's previous comment on this ticket.
This has been tested by Bob and Volker.
The report ran successfully on the first of the month.
Closing ticket.