CDR Tickets

Issue Number 4096
Summary [Scheduler] Add Scheduled Jobs to CDR Scheduler
Created 2016-05-13 13:11:33
Issue Type Improvement
Submitted By Englisch, Volker (NIH/NCI) [C]
Assigned To Kline, Bob (NIH/NCI) [C]
Status Closed
Resolved 2017-08-23 07:45:43
Resolution Fixed
Path /home/bkline/backups/jira/ocecdr/issue.184166
Description

We've created the ability to control scheduled jobs ourselves. Now we want to add the current scheduled jobs to the CDR scheduler.

Comment entered 2016-08-08 14:06:36 by Kline, Bob (NIH/NCI) [C]

Here are the active jobs running on the CDR Windows PROD server under the Windows Scheduler right now:

  1. CTEP Org Report (runs every Sunday morning, but has been failing consistently for years; we should either turn this off or check with CTEP to find out whether they really care about this report – unlikely, as they never noticed the failures until we pointed them out and got them to fix the configuration problems, which we no longer bother to do)

  2. CTGov Nightly Tasks (plugged into the new CDR scheduler on DEV and QA; waiting for promotion to upper tiers)

  3. Emailer Tracking Update (syncs GP mailer tracking information; runs early every morning)

  4. GovDelivery Weekly Reports (runs every Sunday morning at 1am)

  5. Hoover Cleanup (runs every morning at 1:15; plugged into the new CDR scheduler on DEV and QA; waiting for promotion to the upper tiers)

  6. ICRDB Stats Report (runs early in the morning on the 1st of each month)

  7. Jobmaster Nightly (weekday publishing jobs; plugged into the new CDR scheduler on DEV and QA; waiting for promotion to the upper tiers)

  8. Jobmaster Weekly (Friday publishing job; plugged into the new CDR scheduler on DEV and QA; waiting for promotion to the upper tiers)

  9. Jobmaster911 (runs on demand; plugged into the new CDR scheduler on DEV and QA; waiting for promotion to the upper tiers)

  10. Licensee List (runs early in the morning on the 1st of each month)

  11. Monitor Blocking DB Processes (runs every 5 minutes)

  12. Report CDR Scheduled Tasks (runs every 5 minutes; will no longer be needed if all of the other jobs listed here are migrated to the new scheduler)

  13. Restart CDR Service (runs every Thursday evening at 9:30)

  14. Track New Clinical Trials at NLM (runs every morning at 4am)

  15. Zip_HTTPERR_Logs (run on demand)

  16. Zip_IIS_Logs (run on demand)

  17. Maintenance\Start CDR (run on demand)

I have completed integration of the jobs you assigned me (see OCECDR-4136 and OCECDR-4137). Do you want me to work on getting the other jobs (the ones in blue) plugged into the new scheduler?

Comment entered 2016-08-10 09:23:28 by henryec

Yes! Please move the remaining jobs into the new scheduler. Additionally, while it is great to have the list on this ticket, can you also put it somewhere on the Collaborate wiki as the list of all the jobs that are (or will be) in the scheduler? Thank you.

Comment entered 2016-08-10 10:43:36 by Kline, Bob (NIH/NCI) [C]
Comment entered 2016-08-23 17:04:17 by Kline, Bob (NIH/NCI) [C]

I believe I have finished the development work for this task. Here are the jobs I'm going to skip:

  • CTEP Org Report (Margaret says they don't use this any more)

  • Monitor Blocking DB Processes (the scripts aren't there, so this doesn't do anything)

  • Report CDR Scheduled Tasks (no longer needed)

  • Zip_HTTPERR_Logs (CBIIT created this for themselves; run on demand)

  • Zip_IIS_Logs (CBIIT created this for themselves; run on demand)

  • Maintenance\Start CDR (CBIIT created this for themselves; run on demand)

Here is the status of the jobs we will have turned off in the Windows scheduler on the upper tiers and installed under the new scheduler:

  • CTGov Nightly Tasks (has been running successfully on DEV; ready for promotion)

  • Emailer Tracking Update (tested by developer on DEV; ready for promotion, though users may want to do more testing)

  • GovDelivery Weekly Reports (tested by Margaret, Robin, and Volker; ready for promotion, unless Kevin B. wants to weigh in)

  • Hoover Cleanup (has been running successfully on DEV; ready for promotion)

  • ICRDB Stats Report (users and Volker are currently reviewing/testing)

  • Jobmaster Nightly (has been running successfully on DEV; need Volker's green light for promotion)

  • Jobmaster Weekly (has been running successfully on DEV; need Volker's green light for promotion)

  • Jobmaster911 (needs more testing by Volker?)

  • Licensee List (ready for testing by Volker)

  • Restart CDR Service (tested by developer; needs more testing/review by Volker)

  • Track New Clinical Trials at NLM (currently being tested on DEV)

Comment entered 2016-08-24 10:22:44 by Juthe, Robin (NIH/NCI) [E]

Hi Bob,

A few comments on the PCIB stats report:

1) On the report interface, please change "PDQ Drug Terms" to "NCI Drug Terms".
2) Since the pronunciation audio files are automatically included with the glossary, we don't think they need to be listed separately. Instead, please add "(including audio pronunciations)" after Glossary and remove the Pronunciation Audio option.
3) Please check the option to include a column for CDR IDs by default.
4) Please include CDR IDs in the automated report that Margaret gets on the 1st of each month, as well as the list of individual docs.

Comment entered 2016-08-24 13:55:42 by Kline, Bob (NIH/NCI) [C]

  1. Drug Terms label changed

  2. Separate audio option removed from web admin interface (retained internally, in case Volker needs it for something)

  3. Glossary label altered as requested

  4. IDs column option checked by default

  5. Defaults for monthly report now include the doc tables with the ID column

Ready for testing on QA.

Comment entered 2016-08-25 10:33:30 by Kline, Bob (NIH/NCI) [C]

Here is a snapshot of the scheduled jobs on DEV in the new scheduler:

I will repopulate the scheduler's jobs on QA (which got wiped out by the refresh).

Comment entered 2016-08-25 10:45:32 by Englisch, Volker (NIH/NCI) [C]

What's the sort order of the list? I'm guessing it's the next publishing date, right?

What is "Bare Publishing"? Is it publishing only or does it include the push?

Comment entered 2016-08-25 11:08:29 by Kline, Bob (NIH/NCI) [C]

I will repopulate the scheduler's jobs on QA (which got wiped out by the refresh).

Done.

What's the sort order of the list? I'm guessing it's the next publishing date, right?

The columns are click-sortable. So clicking on the third column ("Next Run") sorts the jobs with the next job to run at the top. Clicking on it again will reverse the sort. However, clicking on the first column (the default sort) does not order the rows by what's displayed in that column, but instead by the class name which implements the task (one of a number of oddities in this UI).

What is "Bare Publishing"? Is it publishing only or does it include the push?

It launches SubmitPubJob.py but none of the other scripts in the Publishing directory.

Comment entered 2016-08-25 13:43:06 by Juthe, Robin (NIH/NCI) [E]

Including the CDR IDs in the report output shouldn't change the sort for each of the tables in this report - it makes it really hard to find what we're looking for in the lists. Could you please keep the lists alphabetical when the CDR IDs are displayed?

Comment entered 2016-08-25 13:49:22 by Kline, Bob (NIH/NCI) [C]

Could you please keep the lists alphabetical when the CDR IDs are displayed?

Done (ignoring case in the comparison). Ready for testing on QA.
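A case-insensitive alphabetical sort of this kind can be sketched in Python as follows. This is only an illustration, not the report's actual code: the rows, the CDR IDs, the titles, and the `sort_by_title` helper are all made up for the example.

```python
# Hypothetical rows: (CDR ID, document title) pairs invented for illustration.
rows = [
    (62787, "breast cancer"),
    (62855, "Adult Hodgkin Lymphoma"),
    (62902, "Colon Cancer"),
]


def sort_by_title(rows):
    """Return the rows ordered alphabetically by title, ignoring case."""
    # casefold() normalizes case so "breast" sorts between "Adult" and "Colon".
    return sorted(rows, key=lambda row: row[1].casefold())


for cdr_id, title in sort_by_title(rows):
    print(f"{cdr_id:>8}  {title}")
```

Sorting on the folded title rather than on the ID column keeps the lists in the same order whether or not the CDR IDs are displayed.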

Comment entered 2016-08-25 13:50:11 by Englisch, Volker (NIH/NCI) [C]

Licensee List (ready for testing by Volker)

This report looks good. I would just like to have the Partner Name field be a little wider to avoid wrapping names over 5 or 6 lines. Maybe the CDR-ID could be displayed without the CDR0000 prefix?
Of course, this is a very minor issue.

Comment entered 2016-08-25 13:55:25 by Kline, Bob (NIH/NCI) [C]

Prefix dropped for CDR IDs in the licensee report.
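For illustration only, dropping the prefix might look like the sketch below. It assumes IDs of the form "CDR" followed by zero-padded digits; the `short_cdr_id` name and the sample values are hypothetical and are not taken from the report's code.

```python
import re


def short_cdr_id(doc_id: str) -> str:
    """Strip the 'CDR' prefix and zero padding, e.g. 'CDR0000012345' -> '12345'."""
    # Assumed format: the letters CDR followed by a zero-padded integer.
    match = re.fullmatch(r"CDR0*(\d+)", doc_id.strip())
    if match is None:
        raise ValueError(f"not a CDR ID: {doc_id!r}")
    return match.group(1)


print(short_cdr_id("CDR0000012345"))
```

Rejecting anything that does not match the assumed format avoids silently mangling values that were never CDR IDs in the first place.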

Comment entered 2016-08-26 14:38:38 by Osei-Poku, William (NIH/NCI) [C]

Track New Clinical Trials at NLM (runs every morning at 4am)

Tested this today. It appears to be working well and it is picking up more trials on DEV than currently on PROD. Thanks!

Comment entered 2016-08-26 14:43:28 by Osei-Poku, William (NIH/NCI) [C]

Emailer Tracking Update (syncs GP mailer tracking information; runs early every morning)

I generated 4 mailer documents which were successfully processed on DEV, so I expected to receive the mailer update requests early this morning, but it doesn't look like they were sent.

Comment entered 2016-08-29 06:38:03 by Kline, Bob (NIH/NCI) [C]

As part of the work for OCECDR-4092 the mailer script was modified to send lower-tier emailers to CancerGovTest@mail.nih.gov. We'll need to get you on that distribution list. In the meantime, you can get to the mailers using https://gpmailers-dev.cancer.gov/cgi-bin/ListGPEmailers.

Comment entered 2016-08-29 13:54:05 by Osei-Poku, William (NIH/NCI) [C]

Thanks! I was able to access and update the mailers from the GP Emailers list. The responses were sent to the geneticsdirectory@cancer.gov mailbox.

Comment entered 2016-09-06 09:26:04 by Kline, Bob (NIH/NCI) [C]

I have documented all of the parameters supported by the jobs implemented under the new scheduler at https://collaborate.nci.nih.gov/display/OCECTBWIKI/Options+for+the+scheduled+jobs. Please review and improve as appropriate.

Thanks.

Comment entered 2016-09-06 09:39:31 by Kline, Bob (NIH/NCI) [C]

I agree with your offline proposal that we drop FileSweeper.cfg from its old location. Please review these two versions of that file and create a reconciled version which is (to the best of your understanding) correct:

Comment entered 2016-11-01 09:26:32 by Kline, Bob (NIH/NCI) [C]

I can't find any commits tagged with this issue's number, so I assume you're still working on the reconciliation of the two configuration files, right?

Comment entered 2016-11-01 13:06:49 by Englisch, Volker (NIH/NCI) [C]

That's correct. The FileSweeper.cfg file was updated as part of ticket OCECDR-4147. The file under the Scheduler directory is the most recently updated version.
I still have to remove the config file and the Python program from the Utilities/bin directory.

Comment entered 2017-07-31 08:12:52 by Kline, Bob (NIH/NCI) [C]

Now that the scheduler appears to be more reliable on all the tiers (keeping our fingers crossed) it's time to revisit this ticket so we can get all of our scheduled tasks out of the Windows Scheduler and into the CDR Scheduler. This table only includes the tasks we're keeping (or adding).

Task                              Prod  Stage  QA  Dev  Notes
Batch Job Queue                   CS    CS     CS  CS   OCECDR-4266 promoted ✔
CTGov Nightly Tasks               CS    CS     CS  CS   "Clinical Trials" in CS ✔
Emailer Tracking Update           CS    CS     CS  CS   Every morning at 3:30 am ✔
Glossifier Service Refresh        CS    CS     CS  CS   Every night at 11 pm ✔
GovDelivery Weekly Reports        CS    CS     CS  CS   Sunday mornings at 1 am ✔
Hoover Cleanup                    CS    CS     CS  CS   "Hoover" in CS ✔
ICRDB Stats Report                CS    CS     CS  CS   "PCIB Report" in CS ✔
Jobmaster Nightly                 CS    CS     CS  CS   Nightly publishing jobs ✔
Jobmaster Weekly                  CS    CS     CS  CS   Weekly full publishing job ✔
Jobmaster911                      CS    CS     CS  CS   Bare/Post Publishing manual jobs in CS ✔
Licensee List                     CS    CS     CS  CS   PDQ Partner List in CS ✔
Publishing Queue Check            CS    CS     CS  CS   OCECDR-4266: promoted ✔
Push Job Verifier                 CS    CS     CS  CS   OCECDR-4266: promoted ✔
Restart CDR Service               CS    CS     CS  CS   Thursday evenings at 9:30 pm ✔
Track New Clinical Trials at NLM  CS    CS     CS  CS   Recent CT.gov Trials in CS ✔
Translation Job Notification      CS    CS     CS  CS   Every Monday at 5 am ✔

  • WS=Windows Scheduler

  • CS=CDR Scheduler

  • PS=Publishing Service

  • ✔=Running successfully on PROD

  • 👍=looks good for promotion to the next tier

  • ⚑=problems to be resolved

  • ❓=awaiting testing

Next steps:

  1. resolve problems with weekly publishing task

  2. confirm successful running of the jobs on DEV and QA

  3. move jobs from WS to CS on STAGE

  4. confirm successful running of the jobs on STAGE

  5. move jobs from WS to CS on PROD
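As a rough aid for checking the timed entries in the table's Notes column against actual runs, the next expected run time of each job can be worked out with the Python standard library. This is a sketch under stated assumptions, not a description of how the CDR Scheduler itself computes run times: the `SCHEDULES` mapping and the `next_run` helper are hypothetical, and only the jobs whose Notes give an explicit time are included.

```python
from datetime import datetime, timedelta

# (weekday, hour, minute); weekday None means every day.
# Weekdays follow datetime.weekday(): Monday is 0, Sunday is 6.
SCHEDULES = {
    "Emailer Tracking Update": (None, 3, 30),
    "Glossifier Service Refresh": (None, 23, 0),
    "GovDelivery Weekly Reports": (6, 1, 0),
    "Restart CDR Service": (3, 21, 30),
    "Translation Job Notification": (0, 5, 0),
}


def next_run(schedule, now):
    """Return the first scheduled time strictly after `now`."""
    weekday, hour, minute = schedule
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now:
        candidate += timedelta(days=1)
    # Step forward a day at a time until the weekday matches (at most six steps).
    while weekday is not None and candidate.weekday() != weekday:
        candidate += timedelta(days=1)
    return candidate


now = datetime(2017, 7, 31, 8, 12)  # a Monday morning
for name, schedule in SCHEDULES.items():
    print(f"{name:<30} {next_run(schedule, now):%a %Y-%m-%d %H:%M}")
```

Comparing these expected times with the scheduler's log is one way to confirm that a job ran when it should have on each tier.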

Comment entered 2017-07-31 09:14:40 by Kline, Bob (NIH/NCI) [C]

I reported a DB failure logged Saturday evening at 9:52 pm on ncidb-d111

Error 5128:
Write to sparse file 'G:\Data\Microsoft SQL Server\Data\MSSQLSG_BLUE\CDR.mdf:MSSQL_DBCC23' failed due to lack of disk space.
DB-Lib error message 20018, severity 17:
General SQL Server error: Check messages from the SQL Server

Ticket number INC3178040.

Comment entered 2017-08-01 15:19:03 by Kline, Bob (NIH/NCI) [C]

Can you think of any reason why I shouldn't go ahead and put in the ticket to have CBIIT turn off the tasks marked 👍 in the table above in the Windows Scheduler on STAGE and install them in the CDR Scheduler on that tier?

Comment entered 2017-08-01 15:25:24 by Englisch, Volker (NIH/NCI) [C]

No, I believe we're ready for STAGE.

Comment entered 2017-08-01 15:53:03 by Dugan, Amy (NIH/NCI) [C]

On to STAGE!

Comment entered 2017-08-01 16:31:51 by Englisch, Volker (NIH/NCI) [C]

Since we stopped copying documents to the QA-Linux server I cancelled (commented out) the cronjobs that were running:

  • Jobmaster Nightly

  • Jobmaster Weekly

  • Disk cleanup (removing old files)

Comment entered 2017-08-01 17:33:03 by Kline, Bob (NIH/NCI) [C]

Done! :-)

Comment entered 2017-08-02 08:33:44 by Kline, Bob (NIH/NCI) [C]

The STAGE server is struggling. I can't get into the administrative interface for the CDR scheduler, which is timing out. I have filed a ticket with CBIIT. I told them that an accurate diagnosis of the cause of the problems is more important than a solution, and asked them not to experiment with turning things on and off, but instead to examine things like the Event Log and the Process Manager and look for clues and report back.

Comment entered 2017-08-02 10:33:51 by Englisch, Volker (NIH/NCI) [C]

Is it possible STAGE is currently publishing Summary documents?

Comment entered 2017-08-02 10:56:05 by Kline, Bob (NIH/NCI) [C]

No, the publishing job (including the push) finished last night before 9 pm. And this morning's clinical trial download and import jobs completed successfully well before 7 am. Shawn tells me that there's a Python process which is using up a lot of the system's memory.

Comment entered 2017-08-02 11:03:11 by Kline, Bob (NIH/NCI) [C]

Without any explanation, the system is responding again. I know it's not because the scheduler got bounced, because (a) there's no record of that having happened in the logs; (b) we didn't get the email notification which is sent out when that happens; and (c) the process ID for the service hasn't changed. Shawn did imply that the server's only got 3GB of RAM, so I've asked if we can bump that up (this isn't Linux, after all). :-)

Comment entered 2017-08-02 12:13:53 by Kline, Bob (NIH/NCI) [C]

Shawn has put in a request to bring the RAM on STAGE up to the level on PROD (8GB). The exact times when the scheduler was not logging any activity were from 7:11:00.428 until 9:23:44.693.
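A silent window like this can be found by scanning the log's timestamps for unusually long intervals between consecutive entries. The following is a minimal sketch, not the tooling actually used: the `find_gaps` helper, the ten-minute threshold, the timestamp format, the calendar date, and the two surrounding entries are assumptions; only the two boundary times come from the comment above.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, threshold=timedelta(minutes=10)):
    """Yield (start, end, duration) for consecutive entries further apart than threshold."""
    ordered = sorted(timestamps)
    for earlier, later in zip(ordered, ordered[1:]):
        if later - earlier > threshold:
            yield earlier, later, later - earlier


FORMAT = "%Y-%m-%d %H:%M:%S.%f"  # assumed log timestamp format
stamps = [
    datetime.strptime(value, FORMAT)
    for value in (
        "2017-08-02 07:10:00.102",  # hypothetical entry before the gap
        "2017-08-02 07:11:00.428",  # last activity before the gap (date assumed)
        "2017-08-02 09:23:44.693",  # first activity after the gap (date assumed)
        "2017-08-02 09:24:44.701",  # hypothetical entry after the gap
    )
]

for start, end, duration in find_gaps(stamps):
    print(start.time(), "->", end.time(), "gap of", duration)
```

With the two times reported above, the gap works out to 2 hours, 12 minutes, and 44.265 seconds.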

Comment entered 2017-08-23 07:45:23 by Kline, Bob (NIH/NCI) [C]

All sixteen of the jobs to be migrated have been turned off in the Windows Scheduler on the production server and have been plugged into the new CDR Scheduler. Ten of the jobs have run successfully at least once, and none of those have failed. Of the remaining six, three are weekly jobs, two are monthly jobs, and one (split into two separate tasks in the CDR scheduler) is an unscheduled manually run task (Jobmaster911 in the Windows Scheduler). So we will have confirmation in a little over a week for everything except the manual task. I'll declare success even without testing of the manual task (though if Volker has occasion to use it during the next week that would be a bonus – from the perspective of this ticket). The table above in the July 31 comment has been updated to reflect the new statuses of the tasks.

Comment entered 2017-08-23 10:50:07 by Englisch, Volker (NIH/NCI) [C]

All sixteen of the jobs to be migrated have been turned on in the Windows Scheduler

I'm sure you meant to say: turned off :-)

Comment entered 2017-08-23 11:08:21 by Kline, Bob (NIH/NCI) [C]

Yup! Fixed.

Comment entered 2017-08-25 11:46:38 by Kline, Bob (NIH/NCI) [C]

I was able to check off the "Restart CDR Service" task, which ran successfully as scheduled yesterday evening. However, I should point out that the logs also record a restart of the service around midnight, when network failures broke the connection to the database, , , and .

Comment entered 2017-08-31 09:38:08 by Kline, Bob (NIH/NCI) [C]

Only two tasks remaining to be confirmed on PROD. They're the monthly tasks, and since they both run on the first of the month, we'll be looking at them tomorrow.

Comment entered 2017-09-01 11:22:47 by Englisch, Volker (NIH/NCI) [C]

There seems to be a minor problem with the PDQ Content Partner report. It ran successfully but was delivered to the wrong distribution list. The last report was delivered to Margaret, Nanci, Operator, and myself. This month's report was delivered to Bob and myself.

Comment entered 2017-09-01 13:52:11 by Kline, Bob (NIH/NCI) [C]

That's an anomaly which is being corrected by Feynman. Please forward this month's report as appropriate.

Comment entered 2017-09-01 13:53:33 by Kline, Bob (NIH/NCI) [C]

All the tasks have green checks. I think we're done with this ticket.

Comment entered 2017-09-08 12:56:43 by Englisch, Volker (NIH/NCI) [C]

The scheduled tasks are running successfully on PROD.

Closing ticket.

Attachments
File Name Posted User
2016-08-25 09_11_57-Scheduler.png 2016-08-25 10:30:30 Kline, Bob (NIH/NCI) [C]
