Issue Number | 4096 |
---|---|
Summary | [Scheduler] Add Scheduled Jobs to CDR Scheduler |
Created | 2016-05-13 13:11:33 |
Issue Type | Improvement |
Submitted By | Englisch, Volker (NIH/NCI) [C] |
Assigned To | Kline, Bob (NIH/NCI) [C] |
Status | Closed |
Resolved | 2017-08-23 07:45:43 |
Resolution | Fixed |
Path | /home/bkline/backups/jira/ocecdr/issue.184166 |
We've created the ability to control scheduled jobs ourselves. Now we want to add the current scheduled jobs to the CDR scheduler.
Here are the active jobs running on the CDR Windows PROD server under the Windows Scheduler right now:
CTEP Org Report (runs every Sunday morning, but has been failing consistently for years; we should either just turn this off, or check with CTEP to find out if they really care about this report – unlikely, as they never noticed the failures until we pointed them out and got them to fix the configuration problems, which we no longer bother to do)
CTGov Nightly Tasks (plugged into the new CDR scheduler on DEV and QA; waiting for promotion to upper tiers)
Emailer Tracking Update (syncs GP mailer tracking information; runs early every morning)
GovDelivery Weekly Reports (runs every Sunday morning at 1am)
Hoover Cleanup (runs every morning at 1:15; plugged into the new CDR scheduler on DEV and QA; waiting for promotion to the upper tiers)
ICRDB Stats Report (runs early in the morning on the 1st of each month)
Jobmaster Nightly (weekday publishing jobs; plugged into the new CDR scheduler on DEV and QA; waiting for promotion to the upper tiers)
Jobmaster Weekly (Friday publishing job; plugged into the new CDR scheduler on DEV and QA; waiting for promotion to the upper tiers)
Jobmaster911 (runs on demand; plugged into the new CDR scheduler on DEV and QA; waiting for promotion to the upper tiers)
Licensee List (runs early in the morning on the 1st of each month)
Monitor Blocking DB Processes (runs every 5 minutes)
Report CDR Scheduled Tasks (runs every 5 minutes; will no longer be needed if all of the other jobs listed here are migrated to the new scheduler)
Restart CDR Service (runs every Thursday evening at 9:30)
Track New Clinical Trials at NLM (runs every morning at 4am)
Zip_HTTPERR_Logs (run on demand)
Zip_IIS_Logs (run on demand)
Maintenance\Start CDR (run on demand)
~henryec: I have completed integration of the jobs you assigned me (see OCECDR-4136 and OCECDR-4137). Do you want me to work on getting the other jobs (the ones in blue) plugged into the new scheduler?
Yes! Please move the remaining jobs into the new scheduler. Additionally, while it is great to have the list on this ticket, can you put this list somewhere on the Collaborate wiki, as a record of all the jobs that are (or will be) in the scheduler? Thank you.
Blair already did that: https://collaborate.nci.nih.gov/display/OCECTBWIKI/Move+Scheduled+Tasks+to+CDR+Scheduler
I believe I have finished the development work for this task. Here are the jobs I'm going to skip:
CTEP Org Report (Margaret says they don't use this any more)
Monitor Blocking DB Processes (the scripts aren't there, so this doesn't do anything)
Report CDR Scheduled Tasks (no longer needed)
Zip_HTTPERR_Logs (CBIIT created this for themselves; run on demand)
Zip_IIS_Logs (CBIIT created this for themselves; run on demand)
Maintenance\Start CDR (CBIIT created this for themselves; run on demand)
Here is the status of the jobs we will have turned off in the Windows scheduler on the upper tiers and installed under the new scheduler:
CTGov Nightly Tasks (has been running successfully on DEV; ready for promotion)
Emailer Tracking Update (tested by developer on DEV; ready for promotion, though users may want to do more testing)
GovDelivery Weekly Reports (tested by Margaret, Robin, and Volker; ready for promotion, unless Kevin B. wants to weigh in)
Hoover Cleanup (has been running successfully on DEV; ready for promotion)
ICRDB Stats Report (users and Volker are currently reviewing/testing)
Jobmaster Nightly (has been running successfully on DEV; need Volker's green light for promotion)
Jobmaster Weekly (has been running successfully on DEV; need Volker's green light for promotion)
Jobmaster911 (needs more testing by Volker?)
Licensee List (ready for testing by Volker)
Restart CDR Service (tested by developer; needs more testing/review by Volker)
Track New Clinical Trials at NLM (currently being tested on DEV)
Hi Bob,
A few comments on the PCIB stats report:
1) On the report interface, please change "PDQ Drug Terms" to "NCI
Drug Terms".
2) Since the pronunciation audio files are automatically included with
the glossary, we don't think they need to be listed separately. Instead,
please add "(including audio pronunciations)" after Glossary and remove
the Pronunciation Audio option.
3) Please check the option to include a column for CDR IDs by
default.
4) Please include CDR IDs in the automated report that Margaret gets on
the 1st of each month, as well as the list of individual docs.
Drug Terms label changed
Separate audio option removed from web admin interface (retained internally, in case Volker needs it for something)
Glossary label altered as requested
IDs column option checked by default
Defaults for monthly report now include the doc tables with the ID column
Ready for testing on QA.
Here is a snapshot of the scheduled jobs on DEV in the new scheduler:
I will repopulate the scheduler's jobs on QA (which got wiped out by the refresh).
What's the sort order of the list? I'm guessing it's the next publishing date, right?
What is "Bare Publishing"? Is it publishing only or does it include the push?
I will repopulate the scheduler's jobs on QA (which got wiped out by the refresh).
Done.
What's the sort order of the list? I'm guessing it's the next publishing date, right?
The columns are click-sortable. So clicking on the third column ("Next Run") sorts the jobs with the next job to run at the top. Clicking on it again will reverse the sort. However, clicking on the first column (the default sort) does not order the rows by what's displayed in that column, but instead by the class name which implements the task (one of a number of oddities in this UI).
What is "Bare Publishing"? Is it publishing only or does it include the push?
It launches SubmitPubJob.py but none of the other scripts in the Publishing directory.
Including the CDR IDs in the report output shouldn't change the sort for each of the tables in this report - it makes it really hard to find what we're looking for in the lists. Could you please keep the lists alphabetical when the CDR IDs are displayed?
Could you please keep the lists alphabetical when the CDR IDs are displayed?
Done (ignoring case in the comparison). Ready for testing on QA.
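For reference, a case-insensitive alphabetical sort of this kind can be sketched as follows. This is only an illustration: the row layout, the sample IDs and titles, and the function name `sort_by_title` are made up for the example and are not taken from the report's actual code.

```python
# Hypothetical rows of (CDR ID, title); the IDs and titles are invented for
# illustration and the report's real data structure is not shown in this ticket.
rows = [
    (1003, "breast cancer"),
    (1001, "Adult Hodgkin Lymphoma"),
    (1004, "Zollinger-Ellison syndrome"),
    (1002, "anal cancer"),
]


def sort_by_title(rows):
    """Return the rows ordered alphabetically by title, ignoring case."""
    # casefold() normalizes case so "anal" sorts before "breast" even though
    # a plain comparison would put every uppercase title first.
    return sorted(rows, key=lambda row: row[1].casefold())


for cdr_id, title in sort_by_title(rows):
    print(f"{cdr_id:>6}  {title}")
```

Sorting on the title alone keeps the lists alphabetical whether or not the ID column is displayed.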
Licensee List (ready for testing by Volker)
This report looks good. I would just like to have the Partner
Name field be a little wider to avoid wrapping names over 5 or 6
lines. Maybe the CDR-ID could be displayed without the CDR0000
prefix?
Of course, this is a very minor issue.
Prefix dropped for CDR IDs in the licensee report.
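A minimal sketch of dropping the prefix, assuming the IDs arrive either as integers or as strings of the form `CDR` followed by zero-padded digits. The helper name `short_cdr_id` is hypothetical and is not the function used in the licensee report.

```python
import re


def short_cdr_id(value):
    """Strip an optional 'CDR' prefix and any zero padding from a CDR ID."""
    # Accepts "CDR0000012345", "0000012345", "12345", or the integer 12345.
    match = re.fullmatch(r"(?:CDR)?0*(\d+)", str(value).strip())
    if match is None:
        raise ValueError(f"not a CDR ID: {value!r}")
    return match.group(1)


print(short_cdr_id("CDR0000012345"))  # prints 12345
```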
Track New Clinical Trials at NLM (runs every morning at 4am)
Tested this today. It appears to be working well and it is picking up more trials on DEV than currently on PROD. Thanks!
Emailer Tracking Update (syncs GP mailer tracking information; runs early every morning)
I generated 4 mailer documents which were successfully processed on DEV, so I expected to receive the mailer update requests early this morning, but it doesn't look like they were sent.
As part of the work for OCECDR-4092 the mailer script was modified to send lower-tier emailers to CancerGovTest@mail.nih.gov. We'll need to get you on that distribution list. In the meantime, you can get to the mailers using https://gpmailers-dev.cancer.gov/cgi-bin/ListGPEmailers.
Thanks! I was able to access and update the mailers from the GP Emailers list. The responses were sent to the geneticsdirectory@cancer.gov mailbox.
~volker: I have documented all of the parameters supported by the jobs implemented under the new scheduler at https://collaborate.nci.nih.gov/display/OCECTBWIKI/Options+for+the+scheduled+jobs. Please review and improve as appropriate.
Thanks.
~volker: I agree with your offline proposal that we drop FileSweeper.cfg from its old location. Please review these two versions of that file and create a reconciled version which is (to the best of your understanding) correct:
~volker: I can't find any commits tagged with this issue's number, so I assume you're still working on the reconciliation of the two configuration files, right?
That's correct. The FileSweeper.cfg file had been updated as
part of the ticket OCECDR-4147. The file under the Scheduler
directory is the most recent updated file.
I do still have to remove the config file and the Python program from
the Utilities/bin directory.
Now that the scheduler appears to be more reliable on all the tiers (keeping our fingers crossed) it's time to revisit this ticket so we can get all of our scheduled tasks out of the Windows Scheduler and into the CDR Scheduler. This table only includes the tasks we're keeping (or adding).
Task | Prod | Stage | QA | Dev | Notes |
---|---|---|---|---|---|
Batch Job Queue | CS | CS | CS | CS | OCECDR-4266 promoted ✔ |
CTGov Nightly Tasks | CS | CS | CS | CS | "Clinical Trials" in CS ✔ |
Emailer Tracking Update | CS | CS | CS | CS | Every morning at 3:30 am ✔ |
Glossifier Service Refresh | CS | CS | CS | CS | Every night at 11 pm ✔ |
GovDelivery Weekly Reports | CS | CS | CS | CS | Sunday mornings at 1 am ✔ |
Hoover Cleanup | CS | CS | CS | CS | "Hoover" in CS ✔ |
ICRDB Stats Report | CS | CS | CS | CS | "PCIB Report" in CS ✔ |
Jobmaster Nightly | CS | CS | CS | CS | Nightly publishing jobs ✔ |
Jobmaster Weekly | CS | CS | CS | CS | Weekly full publishing job ✔ |
Jobmaster911 | CS | CS | CS | CS | Bare/Post Publishing manual jobs in CS ✔ |
Licensee List | CS | CS | CS | CS | PDQ Partner List in CS ✔ |
Publishing Queue Check | CS | CS | CS | CS | OCECDR-4266: promoted ✔ |
Push Job Verifier | CS | CS | CS | CS | OCECDR-4266: promoted ✔ |
Restart CDR Service | CS | CS | CS | CS | Thursday evenings at 9:30 pm ✔ |
Track New Clinical Trials at NLM | CS | CS | CS | CS | Recent CT.gov Trials in CS ✔ |
Translation Job Notification | CS | CS | CS | CS | Every Monday at 5 am ✔ |
WS=Windows Scheduler
CS=CDR Scheduler
PS=Publishing Service
✔=Running successfully on PROD
👍=looks good for promotion to the next tier
⚑=problems to be resolved
❓=awaiting testing
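As a rough illustration of the fixed run times listed in the Notes column, the sketch below computes the next run for the jobs whose times are spelled out there. The `(weekday, hour, minute)` tuple layout and the `next_run` helper are assumptions made for this example; they are not the CDR Scheduler's actual configuration format or API.

```python
from datetime import datetime, timedelta

# Run times copied from the Notes column of the table above. The tuple layout
# is hypothetical. weekday follows datetime.weekday(): Monday=0 ... Sunday=6;
# None means the job runs every day.
SCHEDULES = {
    "Emailer Tracking Update": (None, 3, 30),
    "Glossifier Service Refresh": (None, 23, 0),
    "GovDelivery Weekly Reports": (6, 1, 0),
    "Restart CDR Service": (3, 21, 30),
    "Translation Job Notification": (0, 5, 0),
}


def next_run(schedule, now):
    """Return the first run time strictly after `now` for a schedule tuple."""
    weekday, hour, minute = schedule
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    # Step forward one day at a time until the candidate is in the future
    # and (for weekly jobs) falls on the required weekday.
    while candidate <= now or (weekday is not None and candidate.weekday() != weekday):
        candidate += timedelta(days=1)
    return candidate


now = datetime(2017, 8, 23, 7, 45)  # a Wednesday morning
for name, schedule in SCHEDULES.items():
    print(f"{name:30} {next_run(schedule, now):%a %Y-%m-%d %H:%M}")
```

Sorting the jobs by this value gives the same ordering as clicking the "Next Run" column in the scheduler's job list.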
Next steps:
resolve problems with weekly publishing task
confirm successful running of the jobs on DEV and QA
move jobs from WS to CS on STAGE
confirm successful running of the jobs on STAGE
move jobs from WS to CS on PROD
I reported a DB failure logged Saturday evening at 9:52 pm on ncidb-d111
Error 5128:
Write to sparse file 'G:\Data\Microsoft SQL Server\Data\MSSQLSG_BLUE\CDR.mdf:MSSQL_DBCC23' failed due to lack of disk space.
DB-Lib error message 20018, severity 17:
General SQL Server error: Check messages from the SQL Server
Ticket number INC3178040.
~volker, ~duganal or ~bryanp: can you think of any reason why I shouldn't go ahead and put in the ticket to have CBIIT turn off the tasks marked 👍 in the table above in the Windows Scheduler on STAGE and install them in the CDR Scheduler on that tier?
No, I believe we're ready for STAGE.
On to STAGE!
Since we stopped copying documents to the QA-Linux server I cancelled (commented out) the cronjobs that were running:
Jobmaster Nightly
Jobmaster Weekly
Disk cleanup (removing old files)
Done! :-)
The STAGE server is struggling. I can't get into the administrative interface for the CDR scheduler, which is timing out. I have filed a ticket with CBIIT. I told them that an accurate diagnosis of the cause of the problems is more important than a solution, and asked them not to experiment with turning things on and off, but instead to examine things like the Event Log and the Process Manager and look for clues and report back.
Is it possible STAGE is currently publishing Summary documents?
No, the publishing job (including the push) finished last night before 9 pm. And this morning's clinical trial download and import jobs completed successfully well before 7 am. Shawn tells me that there's a Python process which is using up a lot of the system's memory.
Without any explanation, the system is responding again. I know it's not because the scheduler got bounced, because (a) there's no record of that having happened in the logs; (b) we didn't get the email notification which is sent out when that happens; and (c) the process ID for the service hasn't changed. Shawn did imply that the server's only got 3GB of RAM, so I've asked if we can bump that up (this isn't Linux, after all). :-)
Shawn has put in a request to bring the RAM on STAGE up to the level on PROD (8GB). The exact times when the scheduler was not logging any activity were from 7:11:00.428 until 9:23:44.693.
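For the record, the length of that logging gap can be worked out as below. This is a quick sketch that assumes both timestamps fall on the same day, as the comment above implies.

```python
from datetime import datetime

# Timestamps quoted above: the last log entry before the gap and the first after it.
fmt = "%H:%M:%S.%f"
start = datetime.strptime("7:11:00.428", fmt)
end = datetime.strptime("9:23:44.693", fmt)

gap = end - start
print(gap)  # prints 2:12:44.265000
```

So the scheduler logged nothing for a little over two hours and twelve minutes.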
All sixteen of the jobs to be migrated have been turned off in the Windows Scheduler on the production server and have been plugged into the new CDR Scheduler. Ten of the jobs have run successfully at least once, and none of those have failed. Of the remaining six, three are weekly jobs, two are monthly jobs, and one (split into two separate tasks in the CDR scheduler) is an unscheduled manually run task (Jobmaster911 in the Windows Scheduler). So we will have confirmation in a little over a week for everything except the manual task. I'll declare success even without testing of the manual task (though if Volker has occasion to use it during the next week that would be a bonus – from the perspective of this ticket). The table above in the July 31 comment has been updated to reflect the new statuses of the tasks.
All sixteen of the jobs to be migrated have been turned on in the Windows Scheduler
I'm sure you meant to say: turned off :-)
Yup! Fixed.
I was able to check off the "Restart CDR Service" task, which ran successfully as scheduled yesterday evening. However, ~volker, ~bryanp, and ~duganal: I should point out that the logs also record a restart of the service around midnight, when network failures broke the connection to the database.
Only two tasks remaining to be confirmed on PROD. They're the monthly tasks, and since they both run on the first of the month, we'll be looking at them tomorrow.
There seems to be a minor problem with the PDQ Content Partner report. It ran successfully but was delivered to the wrong distribution list. The last report was delivered to Margaret, Nanci, Operator, and myself. This month's report was delivered to Bob and myself.
That's an anomaly which is being corrected by Feynman. Please forward this month's report as appropriate.
All the tasks have green checks. I think we're done with this ticket.
The scheduled tasks are running successfully on PROD.
Closing ticket.
File Name | Posted | User |
---|---|---|
2016-08-25 09_11_57-Scheduler.png | 2016-08-25 10:30:30 | Kline, Bob (NIH/NCI) [C] |