=============================================================================== SUMMARY =============================================================================== Thu Dec 15 Scheduler installed; WSM tries to start multiple instances at once Sat Dec 17 Scheduler running with a single instance Sat Dec 17 Scheduler restarts 3 times starting at 7:40 pm; shutdown not logged Tue Jan 10 Exception thrown: "Not connected to any MS SQL Server" Sat Jan 21 Scheduler restarted at 9 pm; still no shutdown logging Tue Mar 07 Scheduler restarted at 3:44 pm; no shutdown logging Wed Mar 15 Einstein deployed (with zombie cleanup); scheduler restarts itself Fri Mar 17 Job for translation queue notification (TQN) installed Mon Mar 20 TQN job runs successfully as scheduled at 5 am Mon Mar 20 Job to bounce the CDR Publishing service is installed (disabled) Sat Mar 25 Scheduler restarted twice at 8:40 pm; still no shutdown logging Mon Mar 27 TQN job runs successfully as scheduled at 5 am Mon Apr 03 TQN job runs successfully as scheduled at 5 am Fri Apr 07 Weekly publishing job installed and run; fails (DB unavailable) Mon Apr 10 TQN job starts but interrupted by scheduler restart; marked FAILED Mon Apr 17 Exception "Not connected to any MS SQL Server" at 5 am (TQN job) Mon Apr 17 Scheduler restart fails first admin connection; 2nd attempt succeeds Sat Apr 22 Scheduler restarts 6:04 pm; no shutdown logging Sun Apr 23 Scheduler restarts 2:11 am; no shutdown logging Mon Apr 24 Exception "Not connected to any MS SQL Server" at 5 am (TQN job) Mon Apr 24 Gov Delivery report installed (during which scheduler restarted) Mon May 01 Exception "Not connected to any MS SQL Server" at 5 am (TQN job) Mon May 01 Trial tracking table refresh job installed Sat May 06 Scheduler restarts 8:11 pm; no shutdown logging Mon May 08 TQN job runs successfully as scheduled at 5 am Tue May 09 Scheduler restarts 5:57 pm; no shutdown logging Tue May 09 Scheduler restarts 2x, starting 7:32 pm; no shutdown logging Mon May 15 TQN job runs successfully as scheduled at 5 am Wed May 17 Pub queue check (PQC) installed; runs every 2 minutes Sat May 20 MSSQLDatabaseException: "SHUTDOWN ..." at 7:16 pm Sat May 20 Scheduler restarts (twice) and resumes processing (11:14 pm) Mon May 22 TQN job runs successfully as scheduled at 5 am Mon May 22 Scheduler logs unknown database failure but continues Mon May 29 TQN job runs successfully as scheduled at 5 am Tue May 30 Nightly glossifier job installed; no restarts logged Thu Jun 01 Scheduler loses db connection and stops processing Fri Jun 09 CBIIT starts 2 scheduler instances; 1 keeps trying to restart Mon Jun 12 TQN job runs successfully as scheduled at 5 am Mon Jun 12 Second instance stops logging restarts every 2-3 seconds Mon Jun 19 TQN job runs successfully as scheduled at 5 am Mon Jun 26 TQN job runs successfully as scheduled at 5 am Wed Jun 28 Scheduler restarts itself; WCM tries to launch 2nd instance Thu Jun 29 Scheduler loses db connect; WCM keeps trying to launch another Fri Jun 30 CBIIT shuts down the service =============================================================================== ANNOTATED LOGS =============================================================================== Installed on PROD 2016-12-15: 2016-12-15 10:33:47.533 [INFO] Scheduler started 2016-12-15 10:33:47.533 [INFO] Scheduler started 2016-12-15 10:33:47.598 [INFO] Running server at 127.0.0.1:8888 ... 2016-12-15 10:33:47.601 [INFO] *** You can access scheduler web ui at http://localhost:8888 *** 2016-12-15 10:33:47.602 [DEBUG] Looking for jobs to run 2016-12-15 10:33:47.619 [DEBUG] No jobs; waiting until a job is added 2016-12-15 10:33:50.400 [INFO] Scheduler started 2016-12-15 10:33:52.895 [INFO] Scheduler started 2016-12-15 10:33:55.214 [INFO] Scheduler started 2016-12-15 10:33:57.578 [INFO] Scheduler started 2016-12-15 10:33:59.757 [INFO] Scheduler started .... making it appear that the Windows Service Manager tried launching two instances of the scheduler simultaneously, with the second instance failing the call to listen on the port that the first instance had already claimed. The failed second instance shuts down (without logging the failure) and the Windows Service Manager (I'm guessing that's what doing this) tries to start it again, over and over, about 2 seconds between attempts. This goes on for about 2 and 1/2 days until about 6:30 pm Saturday December 17, when a successful start of the scheduler service is logged (perhaps the first instance was aborted without any log entry?). 2016-12-17 18:34:41.624 [INFO] Running server at 127.0.0.1:8888 ... 2016-12-17 18:34:41.642 [INFO] *** You can access scheduler web ui at http://localhost:8888 *** 2016-12-17 18:34:41.644 [DEBUG] Looking for jobs to run 2016-12-17 18:34:41.674 [DEBUG] No jobs; waiting until a job is added Then (again without a log entry for a shutdown), the service starts again: 2016-12-17 19:40:06.154 [INFO] Scheduler started 2016-12-17 19:40:06.187 [INFO] Running server at 127.0.0.1:8888 ... 2016-12-17 19:40:06.187 [INFO] *** You can access scheduler web ui at http://localhost:8888 *** 2016-12-17 19:40:06.188 [DEBUG] Looking for jobs to run 2016-12-17 19:40:06.223 [DEBUG] No jobs; waiting until a job is added Then again 8 and 1/2 minutes later: 2016-12-17 19:48:30.665 [INFO] Scheduler started 2016-12-17 19:48:30.724 [INFO] Running server at 127.0.0.1:8888 ... 2016-12-17 19:48:30.730 [INFO] *** You can access scheduler web ui at http://localhost:8888 *** 2016-12-17 19:48:30.730 [DEBUG] Looking for jobs to run 2016-12-17 19:48:30.749 [DEBUG] No jobs; waiting until a job is added And again a little over 2 minutes later: 2016-12-17 20:01:46.464 [INFO] Scheduler started 2016-12-17 20:01:46.519 [INFO] Running server at 127.0.0.1:8888 ... 2016-12-17 20:01:46.519 [INFO] *** You can access scheduler web ui at http://localhost:8888 *** 2016-12-17 20:01:46.520 [DEBUG] Looking for jobs to run 2016-12-17 20:01:46.536 [DEBUG] No jobs; waiting until a job is added After that, absolutely nothing in the logs for a couple of weeks; then someone brings up the user interface: 2016-12-30 19:04:29.947 [INFO] 200 GET /?Session=5866F606-A0F30E-107-TDNF5LDMMPBJ (127.0.0.1) 214.00ms 2016-12-30 19:04:31.599 [INFO] 200 GET /api/v1/jobs?Session=5866F606-A0F30E-107-TDNF5LDMMPBJ (127.0.0.1) 15.00ms Then silence again for about a dozen days, after which another hit for the web interface, this time failing with a nice meaty stack trace: 2017-01-10 10:31:36.835 [INFO] 200 GET /?Session=5874FE50-64C4F0-107-E1V8H9GPGCXI (127.0.0.1) 19.00ms 2017-01-10 10:31:38.653 [ERROR] Exception during reset or similar Traceback (most recent call last): File "D:\Python\lib\site-packages\sqlalchemy\pool.py", line 567, in _finalize_fairy fairy._reset(pool) File "D:\Python\lib\site-packages\sqlalchemy\pool.py", line 701, in _reset pool._dialect.do_rollback(self) File "D:\Python\lib\site-packages\sqlalchemy\engine\default.py", line 412, in do_rollback dbapi_connection.rollback() File "pymssql.pyx", line 330, in pymssql.Connection.rollback (pymssql.c:5166) File "pymssql.pyx", line 314, in pymssql.Connection.rollback (pymssql.c:5089) File "_mssql.pyx", line 980, in _mssql.MSSQLConnection.execute_non_query (_mssql.c:11361) File "_mssql.pyx", line 1001, in _mssql.MSSQLConnection.execute_non_query (_mssql.c:11183) File "_mssql.pyx", line 1156, in _mssql.MSSQLConnection.format_and_run_query (_mssql.c:12584) File "_mssql.pyx", line 712, in _mssql.MSSQLConnection.cancel (_mssql.c:7494) File "_mssql.pyx", line 1635, in _mssql.assert_connected (_mssql.c:17597) MSSQLDriverException: Not connected to any MS SQL server 2017-01-10 10:31:38.686 [ERROR] Uncaught exception GET /api/v1/jobs?Session=5874FE50-64C4F0-107-E1V8H9GPGCXI (127.0.0.1) HTTPServerRequest(protocol='http', host='localhost:8888', method='GET', uri='/api/v1/jobs?Session=5874FE50-64C4F0-107-E1V8H9GPGCXI', version='HTTP/1.1', remote_ip='127.0.0.1', headers={'Accept-Encoding': 'gzip, deflate', 'Host': 'localhost:8888', 'Connection': 'keep-alive', 'Accept': '*/*', 'User-Agent': 'python-requests/2.12.3'}) Traceback (most recent call last): File "D:\Python\lib\site-packages\tornado\web.py", line 1401, in _stack_context_handle_exception raise_exc_info((type, value, traceback)) File "D:\Python\lib\site-packages\tornado\stack_context.py", line 314, in wrapped ret = fn(*args, **kwargs) File "D:\Python\lib\site-packages\tornado\gen.py", line 196, in final_callback if future.result() is not None: File "D:\Python\lib\site-packages\tornado\concurrent.py", line 232, in result raise_exc_info(self._exc_info) File "D:\Python\lib\site-packages\tornado\gen.py", line 1014, in run yielded = self.gen.throw(*exc_info) File "D:\Python\lib\site-packages\ndscheduler\server\handlers\jobs.py", line 64, in get_jobs_yield return_json = yield self.get_jobs() File "D:\Python\lib\site-packages\tornado\gen.py", line 1008, in run value = future.result() File "D:\Python\lib\site-packages\concurrent\futures\_base.py", line 398, in result return self.__get_result() File "D:\Python\lib\site-packages\concurrent\futures\thread.py", line 55, in run result = self.fn(*self.args, **self.kwargs) File "D:\Python\lib\site-packages\ndscheduler\server\handlers\jobs.py", line 59, in get_jobs return self._get_jobs() File "D:\Python\lib\site-packages\ndscheduler\server\handlers\jobs.py", line 21, in _get_jobs jobs = self.scheduler_manager.get_jobs() File "D:\Python\lib\site-packages\ndscheduler\core\scheduler_manager.py", line 126, in get_jobs return self.sched.get_jobs() File "D:\Python\lib\site-packages\apscheduler\schedulers\base.py", line 458, in get_jobs jobs.extend(store.get_all_jobs()) File "D:\Python\lib\site-packages\apscheduler\jobstores\sqlalchemy.py", line 73, in get_all_jobs return self._get_jobs() File "D:\Python\lib\site-packages\apscheduler\jobstores\sqlalchemy.py", line 122, in _get_jobs for row in self.engine.execute(selectable): File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 1978, in execute return connection.execute(statement, *multiparams, **params) File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 914, in execute return meth(self, multiparams, params) File "D:\Python\lib\site-packages\sqlalchemy\sql\elements.py", line 323, in _execute_on_connection return connection._execute_clauseelement(self, multiparams, params) File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 1010, in _execute_clauseelement compiled_sql, distilled_params File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 1146, in _execute_context context) File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 1325, in _handle_dbapi_exception self._autorollback() File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 791, in _autorollback self._root._rollback_impl() File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 670, in _rollback_impl self._handle_dbapi_exception(e, None, None, None, None) File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 1263, in _handle_dbapi_exception exc_info File "D:\Python\lib\site-packages\sqlalchemy\util\compat.py", line 199, in raise_from_cause reraise(type(exception), exception, tb=exc_tb) File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 668, in _rollback_impl self.engine.dialect.do_rollback(self.connection) File "D:\Python\lib\site-packages\sqlalchemy\engine\default.py", line 412, in do_rollback dbapi_connection.rollback() File "pymssql.pyx", line 330, in pymssql.Connection.rollback (pymssql.c:5166) File "pymssql.pyx", line 314, in pymssql.Connection.rollback (pymssql.c:5089) File "_mssql.pyx", line 980, in _mssql.MSSQLConnection.execute_non_query (_mssql.c:11361) File "_mssql.pyx", line 1001, in _mssql.MSSQLConnection.execute_non_query (_mssql.c:11183) File "_mssql.pyx", line 1156, in _mssql.MSSQLConnection.format_and_run_query (_mssql.c:12584) File "_mssql.pyx", line 712, in _mssql.MSSQLConnection.cancel (_mssql.c:7494) File "_mssql.pyx", line 1635, in _mssql.assert_connected (_mssql.c:17597) DBAPIError: (_mssql.MSSQLDriverException) Not connected to any MS SQL server Then a 500 error page is sent back to the user: 2017-01-10 10:31:38.878 [ERROR] 500 GET /api/v1/jobs?Session=5874FE50-64C4F0-107-E1V8H9GPGCXI (127.0.0.1) 246.00ms The user waits about 10 seconds and tries again, this time successfully: 2017-01-10 10:31:48.753 [INFO] 200 GET /?Session=5874FE50-64C4F0-107-E1V8H9GPGCXI (127.0.0.1) 16.00ms 2017-01-10 10:31:50.203 [INFO] 200 GET /api/v1/jobs?Session=5874FE50-64C4F0-107-E1V8H9GPGCXI (127.0.0.1) 40.00ms Then radio silence for another eleven days, when the scheduler is started again: 2017-01-21 20:59:57.980 [INFO] Scheduler started 2017-01-21 20:59:58.092 [INFO] Running server at 127.0.0.1:8888 ... 2017-01-21 20:59:58.094 [INFO] *** You can access scheduler web ui at http://localhost:8888 *** 2017-01-21 20:59:58.095 [DEBUG] Looking for jobs to run 2017-01-21 20:59:58.131 [DEBUG] No jobs; waiting until a job is added Another couple of checks of the web front end a couple of days later: 2017-01-23 09:22:36.546 [INFO] 200 GET /?Session=588610D0-C6197B-002-PPMDFBGKC8EV (127.0.0.1) 124.00ms 2017-01-23 09:22:38.394 [INFO] 200 GET /api/v1/jobs?Session=588610D0-C6197B-002-PPMDFBGKC8EV (127.0.0.1) 13.00ms 2017-01-23 11:27:52.598 [INFO] 200 GET /?Session=58862F00-287E35-107-IBMBR3EU64VO (127.0.0.1) 7.00ms 2017-01-23 11:27:55.016 [INFO] 200 GET /api/v1/jobs?Session=58862F00-287E35-107-IBMBR3EU64VO (127.0.0.1) 44.00ms Then nothing until March: 2017-03-07 15:44:15.480 [INFO] Scheduler started 2017-03-07 15:44:15.523 [INFO] Running server at 127.0.0.1:8888 ... 2017-03-07 15:44:15.525 [INFO] *** You can access scheduler web ui at http://localhost:8888 *** 2017-03-07 15:44:15.525 [DEBUG] Looking for jobs to run 2017-03-07 15:44:15.542 [DEBUG] No jobs; waiting until a job is added Then nothing for another week or so, when Einstein was deployed, with an update of a module used by the scheduler: 2017-03-15 19:24:37.644 [INFO] D:\Python\lib\site-packages\pytz\exceptions.py modified; restarting server 2017-03-15 19:24:45.269 [INFO] Scheduler started 2017-03-15 19:24:45.286 [INFO] Scheduler started 2017-03-15 19:24:45.359 [INFO] Running server at 127.0.0.1:8888 ... 2017-03-15 19:24:45.361 [INFO] *** You can access scheduler web ui at http://localhost:8888 *** 2017-03-15 19:24:45.365 [DEBUG] Looking for jobs to run 2017-03-15 19:24:45.418 [DEBUG] No jobs; waiting until a job is added Notice that we again have multiple instances starting at the same time, but here we don't see the second one trying over and over to bind to 8888. Then it gets started again after about eight minutes: 2017-03-15 19:32:02.907 [INFO] found 0 zombie executions 2017-03-15 19:32:02.910 [INFO] Scheduler started 2017-03-15 19:32:02.917 [INFO] Running server at 127.0.0.1:8888 ... 2017-03-15 19:32:02.917 [INFO] *** You can access scheduler web ui at http://localhost:8888 *** The new sweep for zombie jobs (OCECDR-4064) has been added as part of Einstein. Then as a manual step for the Einstein release, the Monday morning job for notifying CIAT's Spanish translation staff about what's in their queues is set up: 2017-03-17 23:07:57.280 [INFO] 200 GET /?Session=58CCA47A-144802-002-EHL7APDIS1OM (127.0.0.1) 122.00ms 2017-03-17 23:08:00.058 [INFO] 200 GET /api/v1/jobs?Session=58CCA47A-144802-002-EHL7APDIS1OM (127.0.0.1) 57.00ms 2017-03-17 23:11:10.750 [INFO] Added job "Translation Job Notification" to job store "default" 2017-03-17 23:11:10.823 [INFO] 201 POST /api/v1/jobs (127.0.0.1) 364.00ms 2017-03-17 23:11:12.105 [INFO] 200 GET /api/v1/jobs?Session=58CCA47A-144802-002-EHL7APDIS1OM (127.0.0.1) 15.00ms And the following Monday morning it runs successfully: 2017-03-20 05:00:00.513 [INFO] Running job "Translation Job Notification (trigger: cron[month='*', day='*', day_of_week='mon', hour='5', minute='0'], next run at: 2017-03-20 05:00:00 EDT)" (scheduled at 2017-03-20 05:00:00-04:00) 2017-03-20 05:00:02.724 [INFO] Job "Translation Job Notification (trigger: cron[month='*', day='*', day_of_week='mon', hour='5', minute='0'], next run at: 2017-03-27 05:00:00 EDT)" executed successfully Later that morning the job to bounce the CDR publishing service is installed: 2017-03-20 09:51:40.980 [INFO] 200 GET /?Session=58CFDE58-AE5416-002-XDERAHG4OHVL (127.0.0.1) 27.00ms 2017-03-20 09:51:43.168 [INFO] 200 GET /api/v1/jobs?Session=58CFDE58-AE5416-002-XDERAHG4OHVL (127.0.0.1) 22.00ms 2017-03-20 09:52:30.806 [INFO] Added job "Bounce CDR Publishing Service" to job store "default" 2017-03-20 09:52:30.819 [INFO] 201 POST /api/v1/jobs (127.0.0.1) 32.00ms 2017-03-20 09:52:32.219 [INFO] 200 GET /api/v1/jobs?Session=58CFDE58-AE5416-002-XDERAHG4OHVL (127.0.0.1) 20.00ms 2017-03-20 09:52:38.838 [INFO] 200 PATCH /api/v1/jobs/7629ae5e0d7411e78fa70050569d4f34 (127.0.0.1) 41.00ms 2017-03-20 09:52:40.114 [INFO] 200 GET /api/v1/jobs?Session=58CFDE58-AE5416-002-XDERAHG4OHVL (127.0.0.1) 19.00ms Then on Thursday one of us looks at the web interface (probably me showing Volker the installed jobs): 2017-03-23 14:27:55.624 [INFO] 200 GET /?Session=58D40FF2-101679-002-NNW73EG3RURN (127.0.0.1) 16.00ms 2017-03-23 14:27:57.279 [INFO] 200 GET /api/v1/jobs?Session=58D40FF2-101679-002-NNW73EG3RURN (127.0.0.1) 12.00ms Then the following Saturday night the scheduler gets restarted: 2017-03-25 20:39:20.851 [INFO] found 0 zombie executions 2017-03-25 20:39:20.858 [INFO] Scheduler started 2017-03-25 20:39:20.903 [INFO] Running server at 127.0.0.1:8888 ... 2017-03-25 20:39:20.904 [INFO] *** You can access scheduler web ui at http://localhost:8888 *** And for some reason restarted again about five minutes later: 2017-03-25 20:43:46.032 [INFO] found 0 zombie executions 2017-03-25 20:43:46.058 [INFO] Scheduler started 2017-03-25 20:43:46.121 [INFO] Running server at 127.0.0.1:8888 ... 2017-03-25 20:43:46.121 [INFO] *** You can access scheduler web ui at http://localhost:8888 *** Notice that we're still not getting any logging from a clean shutdown. Maybe this is one of the things that makes the ndscheduler package unsupported on Windows: on Linux the services get sent the proper signals to shut down. When we run from the command line on Windows we can send a SIGINT signal (with Ctrl+C) and the scheduler catches the signal, logs it, and shuts down cleanly. Perhaps the Windows Service Manager uses something more nuclear. :-) At any rate, the next two Monday mornings the Translation Job Notification job runs without a hitch: 2017-03-27 05:00:00.086 [INFO] Running job "Translation Job Notification (trigger: cron[month='*', day='*', day_of_week='mon', hour='5', minute='0'], next run at: 2017-03-27 05:00:00 EDT)" (scheduled at 2017-03-27 05:00:00-04:00) 2017-03-27 05:00:00.584 [INFO] Job "Translation Job Notification (trigger: cron[month='*', day='*', day_of_week='mon', hour='5', minute='0'], next run at: 2017-04-03 05:00:00 EDT)" executed successfully 2017-04-03 05:00:00.125 [INFO] Running job "Translation Job Notification (trigger: cron[month='*', day='*', day_of_week='mon', hour='5', minute='0'], next run at: 2017-04-03 05:00:00 EDT)" (scheduled at 2017-04-03 05:00:00-04:00) 2017-04-03 05:00:00.624 [INFO] Job "Translation Job Notification (trigger: cron[month='*', day='*', day_of_week='mon', hour='5', minute='0'], next run at: 2017-04-10 05:00:00 EDT)" executed successfully Then on Friday evening, April 7, Volker installs and runs the Jobmaster Weekly job, because the original job had failed. The new job is marked as disabled in the CDR Scheduler, to be only run manually when problems like this occur, until we're ready to switch over from the Windows Scheduler. The publishing job failed running under the CDR scheduler, probably for the same reason (database problems) that the original job failed: 2017-04-07 18:33:16.098 [INFO] 200 GET /?Session=58E813A3-7442F5-107-QV110ES5VN3F (127.0.0.1) 55.00ms 2017-04-07 18:33:17.762 [INFO] 200 GET /api/v1/jobs?Session=58E813A3-7442F5-107-QV110ES5VN3F (127.0.0.1) 24.00ms 2017-04-07 18:35:38.345 [INFO] Added job "Jobmaster Weekly" to job store "default" 2017-04-07 18:35:38.391 [INFO] 201 POST /api/v1/jobs (127.0.0.1) 371.00ms 2017-04-07 18:35:39.203 [INFO] 200 GET /api/v1/jobs?Session=58E813A3-7442F5-107-QV110ES5VN3F (127.0.0.1) 21.00ms 2017-04-07 18:35:47.447 [INFO] 200 PATCH /api/v1/jobs/85d88d4f1be211e7afb10050569d4f34 (127.0.0.1) 33.00ms 2017-04-07 18:35:48.255 [INFO] 200 GET /api/v1/jobs?Session=58E813A3-7442F5-107-QV110ES5VN3F (127.0.0.1) 7.00ms 2017-04-07 18:36:38.266 [INFO] weekly publishing task started 2017-04-07 18:36:38.266 [INFO] sending 'started' notification 2017-04-07 18:36:38.890 [INFO] python D:/cdr/publishing/SubmitPubJob.py --livemode --export 2017-04-07 21:04:54.131 [ERROR] SubmitPubJob.py 2017-04-07 21:04:54.839 [ERROR] failure in SubmitPubJob.py Traceback (most recent call last): File "D:\cdr\Scheduler\jobs\basic_job.py", line 31, in run status = task.Perform() File "D:\cdr\Scheduler\tasks\publishing_task.py", line 27, in Perform self.control.run() File "D:\cdr\Scheduler\tasks\publishing_task.py", line 128, in run self.launch("SubmitPubJob.py", merge_output=True) File "D:\cdr\Scheduler\tasks\publishing_task.py", line 224, in launch self.report_error(script) File "D:\cdr\Scheduler\tasks\publishing_task.py", line 182, in report_error raise TaskException("failure in SubmitPubJob.py") TaskException: failure in SubmitPubJob.py 2017-04-07 21:04:54.875 [ERROR] failure in SubmitPubJob.py Traceback (most recent call last): File "D:\Python\lib\site-packages\ndscheduler\job.py", line 100, in run_job job.run(*args, **kwargs) File "D:\cdr\Scheduler\jobs\basic_job.py", line 31, in run status = task.Perform() File "D:\cdr\Scheduler\tasks\publishing_task.py", line 27, in Perform self.control.run() File "D:\cdr\Scheduler\tasks\publishing_task.py", line 128, in run self.launch("SubmitPubJob.py", merge_output=True) File "D:\cdr\Scheduler\tasks\publishing_task.py", line 224, in launch self.report_error(script) File "D:\cdr\Scheduler\tasks\publishing_task.py", line 182, in report_error raise TaskException("failure in SubmitPubJob.py") TaskException: failure in SubmitPubJob.py 2017-04-07 21:04:54.887 [ERROR] Exception during reset or similar Traceback (most recent call last): File "D:\Python\lib\site-packages\sqlalchemy\pool.py", line 567, in _finalize_fairy fairy._reset(pool) File "D:\Python\lib\site-packages\sqlalchemy\pool.py", line 701, in _reset pool._dialect.do_rollback(self) File "D:\Python\lib\site-packages\sqlalchemy\engine\default.py", line 412, in do_rollback dbapi_connection.rollback() File "pymssql.pyx", line 330, in pymssql.Connection.rollback (pymssql.c:5166) File "pymssql.pyx", line 314, in pymssql.Connection.rollback (pymssql.c:5089) File "_mssql.pyx", line 980, in _mssql.MSSQLConnection.execute_non_query (_mssql.c:11361) File "_mssql.pyx", line 1001, in _mssql.MSSQLConnection.execute_non_query (_mssql.c:11183) File "_mssql.pyx", line 1156, in _mssql.MSSQLConnection.format_and_run_query (_mssql.c:12584) File "_mssql.pyx", line 712, in _mssql.MSSQLConnection.cancel (_mssql.c:7494) File "_mssql.pyx", line 1635, in _mssql.assert_connected (_mssql.c:17597) MSSQLDriverException: Not connected to any MS SQL server 2017-04-07 21:04:54.921 [ERROR] (_mssql.MSSQLDriverException) Not connected to any MS SQL server Traceback (most recent call last): File "D:\Python\lib\site-packages\ndscheduler\core\scheduler\base.py", line 45, in run_job cls.run_scheduler_job(job_class, job_id, execution_id, *args, **kwargs) File "D:\Python\lib\site-packages\ndscheduler\core\scheduler\base.py", line 65, in run_scheduler_job job_class.run_job(job_id, execution_id, *args, **kwargs) File "D:\Python\lib\site-packages\ndscheduler\job.py", line 107, in run_job description=cls.get_failed_description()) File "D:\Python\lib\site-packages\ndscheduler\core\datastore\providers\base.py", line 79, in update_execution self.engine.execute(execution_update) File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 1978, in execute return connection.execute(statement, *multiparams, **params) File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 914, in execute return meth(self, multiparams, params) File "D:\Python\lib\site-packages\sqlalchemy\sql\elements.py", line 323, in _execute_on_connection return connection._execute_clauseelement(self, multiparams, params) File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 1010, in _execute_clauseelement compiled_sql, distilled_params File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 1146, in _execute_context context) File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 1325, in _handle_dbapi_exception self._autorollback() File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 791, in _autorollback self._root._rollback_impl() File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 670, in _rollback_impl self._handle_dbapi_exception(e, None, None, None, None) File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 1263, in _handle_dbapi_exception exc_info File "D:\Python\lib\site-packages\sqlalchemy\util\compat.py", line 199, in raise_from_cause reraise(type(exception), exception, tb=exc_tb) File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 668, in _rollback_impl self.engine.dialect.do_rollback(self.connection) File "D:\Python\lib\site-packages\sqlalchemy\engine\default.py", line 412, in do_rollback dbapi_connection.rollback() File "pymssql.pyx", line 330, in pymssql.Connection.rollback (pymssql.c:5166) File "pymssql.pyx", line 314, in pymssql.Connection.rollback (pymssql.c:5089) File "_mssql.pyx", line 980, in _mssql.MSSQLConnection.execute_non_query (_mssql.c:11361) File "_mssql.pyx", line 1001, in _mssql.MSSQLConnection.execute_non_query (_mssql.c:11183) File "_mssql.pyx", line 1156, in _mssql.MSSQLConnection.format_and_run_query (_mssql.c:12584) File "_mssql.pyx", line 712, in _mssql.MSSQLConnection.cancel (_mssql.c:7494) File "_mssql.pyx", line 1635, in _mssql.assert_connected (_mssql.c:17597) DBAPIError: (_mssql.MSSQLDriverException) Not connected to any MS SQL server 2017-04-07 21:04:55.020 [ERROR] Exception during reset or similar Traceback (most recent call last): File "D:\Python\lib\site-packages\sqlalchemy\pool.py", line 567, in _finalize_fairy fairy._reset(pool) File "D:\Python\lib\site-packages\sqlalchemy\pool.py", line 701, in _reset pool._dialect.do_rollback(self) File "D:\Python\lib\site-packages\sqlalchemy\engine\default.py", line 412, in do_rollback dbapi_connection.rollback() File "pymssql.pyx", line 330, in pymssql.Connection.rollback (pymssql.c:5166) File "pymssql.pyx", line 314, in pymssql.Connection.rollback (pymssql.c:5089) File "_mssql.pyx", line 980, in _mssql.MSSQLConnection.execute_non_query (_mssql.c:11361) File "_mssql.pyx", line 1001, in _mssql.MSSQLConnection.execute_non_query (_mssql.c:11183) File "_mssql.pyx", line 1156, in _mssql.MSSQLConnection.format_and_run_query (_mssql.c:12584) File "_mssql.pyx", line 712, in _mssql.MSSQLConnection.cancel (_mssql.c:7494) File "_mssql.pyx", line 1635, in _mssql.assert_connected (_mssql.c:17597) MSSQLDriverException: Not connected to any MS SQL server 2017-04-07 21:04:55.043 [ERROR] Uncaught exception POST /api/v1/executions/85d88d4f1be211e7afb10050569d4f34 (127.0.0.1) HTTPServerRequest(protocol='http', host='localhost:8888', method='POST', uri='/api/v1/executions/85d88d4f1be211e7afb10050569d4f34', version='HTTP/1.1', remote_ip='127.0.0.1', headers={'Content-Length': '2', 'Accept-Encoding': 'gzip, deflate', 'Connection': 'keep-alive', 'Accept': '*/*', 'User-Agent': 'python-requests/2.13.0', 'Host': 'localhost:8888', 'Content-Type': 'application/json'}) Traceback (most recent call last): File "D:\Python\lib\site-packages\tornado\web.py", line 1401, in _stack_context_handle_exception raise_exc_info((type, value, traceback)) File "D:\Python\lib\site-packages\tornado\stack_context.py", line 314, in wrapped ret = fn(*args, **kwargs) File "D:\Python\lib\site-packages\tornado\gen.py", line 196, in final_callback if future.result() is not None: File "D:\Python\lib\site-packages\tornado\concurrent.py", line 232, in result raise_exc_info(self._exc_info) File "D:\Python\lib\site-packages\tornado\gen.py", line 1014, in run yielded = self.gen.throw(*exc_info) File "D:\Python\lib\site-packages\ndscheduler\server\handlers\executions.py", line 154, in run_job_yield return_json = yield self.run_job(job_id) File "D:\Python\lib\site-packages\tornado\gen.py", line 1008, in run value = future.result() File "D:\Python\lib\site-packages\concurrent\futures\_base.py", line 398, in result return self.__get_result() File "D:\Python\lib\site-packages\concurrent\futures\thread.py", line 55, in run result = self.fn(*self.args, **self.kwargs) File "D:\Python\lib\site-packages\ndscheduler\server\handlers\executions.py", line 145, in run_job return self._run_job(job_id) File "D:\Python\lib\site-packages\ndscheduler\server\handlers\executions.py", line 126, in _run_job job_id, *args, **kwargs) File "D:\Python\lib\site-packages\ndscheduler\core\scheduler\base.py", line 50, in run_job description=job.JobBase.get_scheduled_error_description()) File "D:\Python\lib\site-packages\ndscheduler\core\datastore\providers\base.py", line 79, in update_execution self.engine.execute(execution_update) File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 1978, in execute return connection.execute(statement, *multiparams, **params) File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 914, in execute return meth(self, multiparams, params) File "D:\Python\lib\site-packages\sqlalchemy\sql\elements.py", line 323, in _execute_on_connection return connection._execute_clauseelement(self, multiparams, params) File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 1010, in _execute_clauseelement compiled_sql, distilled_params File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 1146, in _execute_context context) File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 1325, in _handle_dbapi_exception self._autorollback() File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 791, in _autorollback self._root._rollback_impl() File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 670, in _rollback_impl self._handle_dbapi_exception(e, None, None, None, None) File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 1263, in _handle_dbapi_exception exc_info File "D:\Python\lib\site-packages\sqlalchemy\util\compat.py", line 199, in raise_from_cause reraise(type(exception), exception, tb=exc_tb) File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 668, in _rollback_impl self.engine.dialect.do_rollback(self.connection) File "D:\Python\lib\site-packages\sqlalchemy\engine\default.py", line 412, in do_rollback dbapi_connection.rollback() File "pymssql.pyx", line 330, in pymssql.Connection.rollback (pymssql.c:5166) File "pymssql.pyx", line 314, in pymssql.Connection.rollback (pymssql.c:5089) File "_mssql.pyx", line 980, in _mssql.MSSQLConnection.execute_non_query (_mssql.c:11361) File "_mssql.pyx", line 1001, in _mssql.MSSQLConnection.execute_non_query (_mssql.c:11183) File "_mssql.pyx", line 1156, in _mssql.MSSQLConnection.format_and_run_query (_mssql.c:12584) File "_mssql.pyx", line 712, in _mssql.MSSQLConnection.cancel (_mssql.c:7494) File "_mssql.pyx", line 1635, in _mssql.assert_connected (_mssql.c:17597) DBAPIError: (_mssql.MSSQLDriverException) Not connected to any MS SQL server 2017-04-07 21:04:55.131 [ERROR] 500 POST /api/v1/executions/85d88d4f1be211e7afb10050569d4f34 (127.0.0.1) 8896964.00ms Even though the SQL Server connection was unavailable for that Friday job, the scheduler was able -- without having to be restarted -- to wake up on time the next Monday morning to pull the translation queue notification job from the database successfully: 2017-04-10 05:00:00.063 [INFO] Running job "Translation Job Notification (trigger: cron[month='*', day='*', day_of_week='mon', hour='5', minute='0'], next run at: 2017-04-17 05:00:00 EDT)" (scheduled at 2017-04-10 05:00:00-04:00) ... but we don't see the line that says that the notifications were sent out successfully, and a few seconds later the scheduler restarts: 2017-04-10 05:00:11.082 [INFO] found 2 zombie executions 2017-04-10 05:00:11.095 [INFO] marking zombie 13ca530f1dcc11e798400050569d4f34 as FAILED 2017-04-10 05:00:11.109 [INFO] marking zombie a9b37a001be211e786870050569d4f34 as FAILED 2017-04-10 05:00:11.117 [INFO] Scheduler started 2017-04-10 05:00:11.190 [INFO] Running server at 127.0.0.1:8888 ... 2017-04-10 05:00:11.190 [INFO] *** You can access scheduler web ui at http://localhost:8888 *** No activity logged for a week, and then when it's time for the following Monday's translation queue notification job, the dreaded "Not connected to any MS SQL server" error: 2017-04-17 05:00:00.039 [ERROR] Exception during reset or similar Traceback (most recent call last): File "D:\Python\lib\site-packages\sqlalchemy\pool.py", line 567, in _finalize_fairy fairy._reset(pool) File "D:\Python\lib\site-packages\sqlalchemy\pool.py", line 701, in _reset pool._dialect.do_rollback(self) File "D:\Python\lib\site-packages\sqlalchemy\engine\default.py", line 412, in do_rollback dbapi_connection.rollback() File "pymssql.pyx", line 330, in pymssql.Connection.rollback (pymssql.c:5166) File "pymssql.pyx", line 314, in pymssql.Connection.rollback (pymssql.c:5089) File "_mssql.pyx", line 980, in _mssql.MSSQLConnection.execute_non_query (_mssql.c:11361) File "_mssql.pyx", line 1001, in _mssql.MSSQLConnection.execute_non_query (_mssql.c:11183) File "_mssql.pyx", line 1156, in _mssql.MSSQLConnection.format_and_run_query (_mssql.c:12584) File "_mssql.pyx", line 712, in _mssql.MSSQLConnection.cancel (_mssql.c:7494) File "_mssql.pyx", line 1635, in _mssql.assert_connected (_mssql.c:17597) MSSQLDriverException: Not connected to any MS SQL server 2017-04-17 05:00:00.076 [ERROR] Exception in callback Traceback (most recent call last): File "D:\Python\lib\site-packages\tornado\ioloop.py", line 600, in _run_callback ret = callback() File "D:\Python\lib\site-packages\tornado\stack_context.py", line 275, in null_wrapper return fn(*args, **kwargs) File "D:\Python\lib\site-packages\apscheduler\schedulers\tornado.py", line 59, in wakeup wait_seconds = self._process_jobs() File "D:\Python\lib\site-packages\ndscheduler\core\scheduler\base.py", line 181, in _process_jobs return super(apscheduler_tornado.TornadoScheduler, self)._process_jobs() File "D:\Python\lib\site-packages\apscheduler\schedulers\base.py", line 801, in _process_jobs for job in jobstore.get_due_jobs(now): File "D:\Python\lib\site-packages\apscheduler\jobstores\sqlalchemy.py", line 64, in get_due_jobs return self._get_jobs(self.jobs_t.c.next_run_time <= timestamp) File "D:\Python\lib\site-packages\apscheduler\jobstores\sqlalchemy.py", line 122, in _get_jobs for row in self.engine.execute(selectable): File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 1978, in execute return connection.execute(statement, *multiparams, **params) File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 914, in execute return meth(self, multiparams, params) File "D:\Python\lib\site-packages\sqlalchemy\sql\elements.py", line 323, in _execute_on_connection return connection._execute_clauseelement(self, multiparams, params) File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 1010, in _execute_clauseelement compiled_sql, distilled_params File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 1146, in _execute_context context) File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 1325, in _handle_dbapi_exception self._autorollback() File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 791, in _autorollback self._root._rollback_impl() File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 670, in _rollback_impl self._handle_dbapi_exception(e, None, None, None, None) File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 1263, in _handle_dbapi_exception exc_info File "D:\Python\lib\site-packages\sqlalchemy\util\compat.py", line 199, in raise_from_cause reraise(type(exception), exception, tb=exc_tb) File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 668, in _rollback_impl self.engine.dialect.do_rollback(self.connection) File "D:\Python\lib\site-packages\sqlalchemy\engine\default.py", line 412, in do_rollback dbapi_connection.rollback() File "pymssql.pyx", line 330, in pymssql.Connection.rollback (pymssql.c:5166) File "pymssql.pyx", line 314, in pymssql.Connection.rollback (pymssql.c:5089) File "_mssql.pyx", line 980, in _mssql.MSSQLConnection.execute_non_query (_mssql.c:11361) File "_mssql.pyx", line 1001, in _mssql.MSSQLConnection.execute_non_query (_mssql.c:11183) File "_mssql.pyx", line 1156, in _mssql.MSSQLConnection.format_and_run_query (_mssql.c:12584) File "_mssql.pyx", line 712, in _mssql.MSSQLConnection.cancel (_mssql.c:7494) File "_mssql.pyx", line 1635, in _mssql.assert_connected (_mssql.c:17597) DBAPIError: (_mssql.MSSQLDriverException) Not connected to any MS SQL server Later that morning William wrote "The translators are complaining that they’ve not gotten the scheduled email notifications last Monday and Today," so I logged on to the web interface and ran the job by hand. It failed the first time I tried and succeeded when I tried it again: 2017-04-17 12:53:02.309 [INFO] 200 GET /?Session=58F4F2D4-02203E-002-M8HSU8DKWXTS (127.0.0.1) 92.00ms 2017-04-17 12:53:04.880 [INFO] 200 GET /api/v1/jobs?Session=58F4F2D4-02203E-002-M8HSU8DKWXTS (127.0.0.1) 334.00ms 2017-04-17 12:53:17.730 [INFO] 200 GET /api/v1/logs?Session=58F4F2D4-02203E-002-M8HSU8DKWXTS (127.0.0.1) 36.00ms 2017-04-17 12:53:34.529 [INFO] 200 GET /api/v1/logs?Session=58F4F2D4-02203E-002-M8HSU8DKWXTS&time_range_start=2017-04-16T16%253A53%253A29.657Z&time_range_end=2017-04-17T16%253A53%253A29.657Z (127.0.0.1) 9.00ms 2017-04-17 12:54:43.529 [INFO] 200 GET /api/v1/jobs?Session=58F4F2D4-02203E-002-M8HSU8DKWXTS (127.0.0.1) 11.00ms 2017-04-17 13:03:02.779 [INFO] 200 GET /?Session=58F4F2D4-02203E-002-M8HSU8DKWXTS (127.0.0.1) 6.00ms 2017-04-17 13:03:05.460 [INFO] 200 GET /api/v1/jobs?Session=58F4F2D4-02203E-002-M8HSU8DKWXTS (127.0.0.1) 62.00ms 2017-04-17 13:11:13.914 [INFO] 200 GET /?Session=58F4F721-12EB30-002-5EU2Z7HFQLJ8 (127.0.0.1) 33.00ms 2017-04-17 13:11:16.358 [INFO] 200 GET /api/v1/jobs?Session=58F4F721-12EB30-002-5EU2Z7HFQLJ8 (127.0.0.1) 21.00ms 2017-04-17 13:11:35.802 [INFO] found 1 zombie executions 2017-04-17 13:11:35.819 [INFO] marking zombie e3f61eb0239011e787920050569d4f34 as FAILED 2017-04-17 13:11:35.831 [INFO] Scheduler started 2017-04-17 13:11:35.880 [INFO] Running server at 127.0.0.1:8888 ... 2017-04-17 13:11:35.881 [INFO] *** You can access scheduler web ui at http://localhost:8888 *** 2017-04-17 13:11:35.897 [WARNING] Run time of job "Translation Job Notification (trigger: cron[month='*', day='*', day_of_week='mon', hour='5', minute='0'], next run at: 2017-04-17 05:00:00 EDT)" was missed by 8:11:35.897000 2017-04-17 13:11:51.375 [INFO] 200 GET /api/v1/logs?Session=58F4F721-12EB30-002-5EU2Z7HFQLJ8 (127.0.0.1) 12.00ms 2017-04-17 13:11:58.483 [INFO] 200 GET /api/v1/logs?Session=58F4F721-12EB30-002-5EU2Z7HFQLJ8&time_range_start=2017-04-17T17%253A01%253A55.363Z&time_range_end=2017-04-17T17%253A11%253A55.363Z (127.0.0.1) 18.00ms 2017-04-17 13:16:37.276 [INFO] 200 GET /api/v1/jobs?Session=58F4F721-12EB30-002-5EU2Z7HFQLJ8 (127.0.0.1) 16.00ms 2017-04-17 13:16:39.329 [INFO] 200 GET /api/v1/logs?Session=58F4F721-12EB30-002-5EU2Z7HFQLJ8 (127.0.0.1) 12.00ms 2017-04-17 13:16:44.565 [INFO] 200 GET /api/v1/logs?Session=58F4F721-12EB30-002-5EU2Z7HFQLJ8&time_range_start=2017-04-16T17%253A16%253A42.890Z&time_range_end=2017-04-17T17%253A16%253A42.890Z (127.0.0.1) 16.00ms 2017-04-17 13:16:57.490 [INFO] 200 GET /api/v1/jobs?Session=58F4F721-12EB30-002-5EU2Z7HFQLJ8 (127.0.0.1) 22.00ms 2017-04-17 13:17:14.078 [INFO] 200 POST /api/v1/executions/8945f2cf0b8811e797150050569d4f34 (127.0.0.1) 403.00ms Later in the day I checked the web interface to see if I got another error message (I didn't): 2017-04-17 15:13:42.509 [INFO] 200 GET /?Session=58F513CE-82F090-002-PZHI9SPAED2M (127.0.0.1) 10.00ms 2017-04-17 15:13:44.474 [INFO] 200 GET /api/v1/jobs?Session=58F513CE-82F090-002-PZHI9SPAED2M (127.0.0.1) 10.00ms Then the following Saturday the scheduler restarted shortly after 6 pm: 2017-04-22 18:04:26.345 [INFO] found 0 zombie executions 2017-04-22 18:04:26.377 [INFO] Scheduler started 2017-04-22 18:04:26.563 [INFO] Running server at 127.0.0.1:8888 ... 2017-04-22 18:04:26.576 [INFO] *** You can access scheduler web ui at http://localhost:8888 *** Then again the next morning (shortly after 2 am) another bounce: 2017-04-23 02:11:39.789 [INFO] found 0 zombie executions 2017-04-23 02:11:39.809 [INFO] Scheduler started 2017-04-23 02:11:39.858 [INFO] Running server at 127.0.0.1:8888 ... 2017-04-23 02:11:39.859 [INFO] *** You can access scheduler web ui at http://localhost:8888 *** Then Monday morning another failure ("Not connected to any MS SQL server") at 5 am when it woke up to handle the translation queue notification job: 2017-04-24 05:00:00.028 [ERROR] Exception during reset or similar Traceback (most recent call last): File "D:\Python\lib\site-packages\sqlalchemy\pool.py", line 567, in _finalize_fairy fairy._reset(pool) File "D:\Python\lib\site-packages\sqlalchemy\pool.py", line 701, in _reset pool._dialect.do_rollback(self) File "D:\Python\lib\site-packages\sqlalchemy\engine\default.py", line 412, in do_rollback dbapi_connection.rollback() File "pymssql.pyx", line 330, in pymssql.Connection.rollback (pymssql.c:5166) File "pymssql.pyx", line 314, in pymssql.Connection.rollback (pymssql.c:5089) File "_mssql.pyx", line 980, in _mssql.MSSQLConnection.execute_non_query (_mssql.c:11361) File "_mssql.pyx", line 1001, in _mssql.MSSQLConnection.execute_non_query (_mssql.c:11183) File "_mssql.pyx", line 1156, in _mssql.MSSQLConnection.format_and_run_query (_mssql.c:12584) File "_mssql.pyx", line 712, in _mssql.MSSQLConnection.cancel (_mssql.c:7494) File "_mssql.pyx", line 1635, in _mssql.assert_connected (_mssql.c:17597) MSSQLDriverException: Not connected to any MS SQL server 2017-04-24 05:00:00.062 [ERROR] Exception in callback Traceback (most recent call last): File "D:\Python\lib\site-packages\tornado\ioloop.py", line 600, in _run_callback ret = callback() File "D:\Python\lib\site-packages\tornado\stack_context.py", line 275, in null_wrapper return fn(*args, **kwargs) File "D:\Python\lib\site-packages\apscheduler\schedulers\tornado.py", line 59, in wakeup wait_seconds = self._process_jobs() File "D:\Python\lib\site-packages\ndscheduler\core\scheduler\base.py", line 181, in _process_jobs return super(apscheduler_tornado.TornadoScheduler, self)._process_jobs() File "D:\Python\lib\site-packages\apscheduler\schedulers\base.py", line 801, in _process_jobs for job in jobstore.get_due_jobs(now): File "D:\Python\lib\site-packages\apscheduler\jobstores\sqlalchemy.py", line 64, in get_due_jobs return self._get_jobs(self.jobs_t.c.next_run_time <= timestamp) File "D:\Python\lib\site-packages\apscheduler\jobstores\sqlalchemy.py", line 122, in _get_jobs for row in self.engine.execute(selectable): File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 1978, in execute return connection.execute(statement, *multiparams, **params) File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 914, in execute return meth(self, multiparams, params) File "D:\Python\lib\site-packages\sqlalchemy\sql\elements.py", line 323, in _execute_on_connection return connection._execute_clauseelement(self, multiparams, params) File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 1010, in _execute_clauseelement compiled_sql, distilled_params File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 1146, in _execute_context context) File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 1325, in _handle_dbapi_exception self._autorollback() File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 791, in _autorollback self._root._rollback_impl() File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 670, in _rollback_impl self._handle_dbapi_exception(e, None, None, None, None) File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 1263, in _handle_dbapi_exception exc_info File "D:\Python\lib\site-packages\sqlalchemy\util\compat.py", line 199, in raise_from_cause reraise(type(exception), exception, tb=exc_tb) File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 668, in _rollback_impl self.engine.dialect.do_rollback(self.connection) File "D:\Python\lib\site-packages\sqlalchemy\engine\default.py", line 412, in do_rollback dbapi_connection.rollback() File "pymssql.pyx", line 330, in pymssql.Connection.rollback (pymssql.c:5166) File "pymssql.pyx", line 314, in pymssql.Connection.rollback (pymssql.c:5089) File "_mssql.pyx", line 980, in _mssql.MSSQLConnection.execute_non_query (_mssql.c:11361) File "_mssql.pyx", line 1001, in _mssql.MSSQLConnection.execute_non_query (_mssql.c:11183) File "_mssql.pyx", line 1156, in _mssql.MSSQLConnection.format_and_run_query (_mssql.c:12584) File "_mssql.pyx", line 712, in _mssql.MSSQLConnection.cancel (_mssql.c:7494) File "_mssql.pyx", line 1635, in _mssql.assert_connected (_mssql.c:17597) DBAPIError: (_mssql.MSSQLDriverException) Not connected to any MS SQL server That afternoon, Volker wrote: "This is the first time the GovDelivery report failed in almost 1 1/2 years. The report failed because it couldn’t connect to the database. I’ll have to submit a ticket to CBIIT to have this report rerun manually." I replied: "Hi, Volker. Isn't this one of the reports​ slated to run under the new CDR scheduler (meaning we should be able to ring it manually without assistance from CBIIT)?" So that's what we did (with a little stumbling while Volker wrestled with the very awkward job configuration syntax: 2017-04-24 13:02:05.700 [INFO] 200 GET /?Session=58FE2F7F-70124B-107-1U55N335X7WL (127.0.0.1) 78.00ms 2017-04-24 13:02:07.240 [INFO] 200 GET /api/v1/jobs?Session=58FE2F7F-70124B-107-1U55N335X7WL (127.0.0.1) 95.00ms 2017-04-24 13:34:21.180 [INFO] 200 GET /?Session=58FE2F7F-70124B-107-1U55N335X7WL (127.0.0.1) 6.00ms 2017-04-24 13:34:42.842 [INFO] 200 GET /api/v1/jobs?Session=58FE2F7F-70124B-107-1U55N335X7WL (127.0.0.1) 12.00ms 2017-04-24 14:30:28.696 [INFO] 200 GET /?Session=58FE443E-C11D4B-107-14N1QEJJ5BEI (127.0.0.1) 6.00ms 2017-04-24 14:30:30.403 [INFO] 200 GET /api/v1/jobs?Session=58FE443E-C11D4B-107-14N1QEJJ5BEI (127.0.0.1) 300.00ms 2017-04-24 14:31:43.536 [INFO] Added job "GovDelivery Weekly Reports" to job store "default" 2017-04-24 14:31:43.609 [INFO] 201 POST /api/v1/jobs (127.0.0.1) 366.00ms 2017-04-24 14:31:43.642 [WARNING] Run time of job "Translation Job Notification (trigger: cron[month='*', day='*', day_of_week='mon', hour='5', minute='0'], next run at: 2017-04-24 05:00:00 EDT)" was missed by 9:31:43.642000 2017-04-24 14:31:44.526 [INFO] 200 GET /api/v1/jobs?Session=58FE443E-C11D4B-107-14N1QEJJ5BEI (127.0.0.1) 12.00ms 2017-04-24 14:31:51.267 [INFO] 200 PATCH /api/v1/jobs/43dcb3cf291c11e78d5d0050569d4f34 (127.0.0.1) 51.00ms 2017-04-24 14:31:52.187 [INFO] 200 GET /api/v1/jobs?Session=58FE443E-C11D4B-107-14N1QEJJ5BEI (127.0.0.1) 17.00ms 2017-04-24 14:34:18.645 [INFO] 200 PUT /api/v1/jobs/43dcb3cf291c11e78d5d0050569d4f34 (127.0.0.1) 2.00ms 2017-04-24 14:34:19.415 [INFO] 200 GET /api/v1/jobs?Session=58FE443E-C11D4B-107-14N1QEJJ5BEI (127.0.0.1) 11.00ms 2017-04-24 14:34:44.065 [INFO] found 1 zombie executions 2017-04-24 14:34:44.069 [INFO] marking zombie aba2dd4f291c11e7b5300050569d4f34 as FAILED 2017-04-24 14:34:44.079 [INFO] Scheduler started 2017-04-24 14:34:44.130 [INFO] Running server at 127.0.0.1:8888 ... 2017-04-24 14:34:44.131 [INFO] *** You can access scheduler web ui at http://localhost:8888 *** 2017-04-24 14:37:10.506 [INFO] 200 GET /api/v1/jobs?Session=58FE443E-C11D4B-107-14N1QEJJ5BEI (127.0.0.1) 15.00ms 2017-04-24 14:37:14.311 [INFO] 200 GET /api/v1/executions?Session=58FE443E-C11D4B-107-14N1QEJJ5BEI (127.0.0.1) 69.00ms 2017-04-24 14:37:50.983 [INFO] 200 GET /api/v1/jobs/43dcb3cf291c11e78d5d0050569d4f34?Session=58FE443E-C11D4B-107-14N1QEJJ5BEI (127.0.0.1) 7.00ms 2017-04-24 14:39:27.756 [INFO] 200 POST /api/v1/executions/43dcb3cf291c11e78d5d0050569d4f34 (127.0.0.1) 3981.00ms 2017-04-24 14:40:33.900 [INFO] 200 GET /api/v1/executions?Session=58FE443E-C11D4B-107-14N1QEJJ5BEI (127.0.0.1) 30.00ms 2017-04-24 16:33:42.921 [INFO] 200 GET /?Session=58FE6120-31EECA-002-Z823IYPKSAH2 (127.0.0.1) 9.00ms 2017-04-24 16:33:44.502 [INFO] 200 GET /api/v1/jobs?Session=58FE6120-31EECA-002-Z823IYPKSAH2 (127.0.0.1) 7.00ms 2017-04-24 16:40:14.173 [INFO] 200 POST /api/v1/executions/8945f2cf0b8811e797150050569d4f34 (127.0.0.1) 353.00ms 2017-04-24 16:40:27.474 [INFO] 200 GET /api/v1/executions?Session=58FE6120-31EECA-002-Z823IYPKSAH2 (127.0.0.1) 22.00ms 2017-04-24 16:40:31.306 [INFO] 200 GET /api/v1/jobs/8945f2cf0b8811e797150050569d4f34?Session=58FE6120-31EECA-002-Z823IYPKSAH2 (127.0.0.1) 12.00ms A couple of things to note in that patch of entries. You can see by the [WARNING] message that the scheduler is aware that it dropped the ball. Also, the scheduler restarted itself in the middle of our session to set up and run the GovDelivery report. Nothing more until the following morning at 5 am when it was time for the translation queue notification report: 2017-05-01 05:00:00.019 [ERROR] Exception during reset or similar Traceback (most recent call last): File "D:\Python\lib\site-packages\sqlalchemy\pool.py", line 567, in _finalize_fairy fairy._reset(pool) File "D:\Python\lib\site-packages\sqlalchemy\pool.py", line 701, in _reset pool._dialect.do_rollback(self) File "D:\Python\lib\site-packages\sqlalchemy\engine\default.py", line 412, in do_rollback dbapi_connection.rollback() File "pymssql.pyx", line 330, in pymssql.Connection.rollback (pymssql.c:5166) File "pymssql.pyx", line 314, in pymssql.Connection.rollback (pymssql.c:5089) File "_mssql.pyx", line 980, in _mssql.MSSQLConnection.execute_non_query (_mssql.c:11361) File "_mssql.pyx", line 1001, in _mssql.MSSQLConnection.execute_non_query (_mssql.c:11183) File "_mssql.pyx", line 1156, in _mssql.MSSQLConnection.format_and_run_query (_mssql.c:12584) File "_mssql.pyx", line 712, in _mssql.MSSQLConnection.cancel (_mssql.c:7494) File "_mssql.pyx", line 1635, in _mssql.assert_connected (_mssql.c:17597) MSSQLDriverException: Not connected to any MS SQL server 2017-05-01 05:00:00.065 [ERROR] Exception in callback Traceback (most recent call last): File "D:\Python\lib\site-packages\tornado\ioloop.py", line 600, in _run_callback ret = callback() File "D:\Python\lib\site-packages\tornado\stack_context.py", line 275, in null_wrapper return fn(*args, **kwargs) File "D:\Python\lib\site-packages\apscheduler\schedulers\tornado.py", line 59, in wakeup wait_seconds = self._process_jobs() File "D:\Python\lib\site-packages\ndscheduler\core\scheduler\base.py", line 181, in _process_jobs return super(apscheduler_tornado.TornadoScheduler, self)._process_jobs() File "D:\Python\lib\site-packages\apscheduler\schedulers\base.py", line 801, in _process_jobs for job in jobstore.get_due_jobs(now): File "D:\Python\lib\site-packages\apscheduler\jobstores\sqlalchemy.py", line 64, in get_due_jobs return self._get_jobs(self.jobs_t.c.next_run_time <= timestamp) File "D:\Python\lib\site-packages\apscheduler\jobstores\sqlalchemy.py", line 122, in _get_jobs for row in self.engine.execute(selectable): File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 1978, in execute return connection.execute(statement, *multiparams, **params) File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 914, in execute return meth(self, multiparams, params) File "D:\Python\lib\site-packages\sqlalchemy\sql\elements.py", line 323, in _execute_on_connection return connection._execute_clauseelement(self, multiparams, params) File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 1010, in _execute_clauseelement compiled_sql, distilled_params File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 1146, in _execute_context context) File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 1325, in _handle_dbapi_exception self._autorollback() File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 791, in _autorollback self._root._rollback_impl() File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 670, in _rollback_impl self._handle_dbapi_exception(e, None, None, None, None) File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 1263, in _handle_dbapi_exception exc_info File "D:\Python\lib\site-packages\sqlalchemy\util\compat.py", line 199, in raise_from_cause reraise(type(exception), exception, tb=exc_tb) File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 668, in _rollback_impl self.engine.dialect.do_rollback(self.connection) File "D:\Python\lib\site-packages\sqlalchemy\engine\default.py", line 412, in do_rollback dbapi_connection.rollback() File "pymssql.pyx", line 330, in pymssql.Connection.rollback (pymssql.c:5166) File "pymssql.pyx", line 314, in pymssql.Connection.rollback (pymssql.c:5089) File "_mssql.pyx", line 980, in _mssql.MSSQLConnection.execute_non_query (_mssql.c:11361) File "_mssql.pyx", line 1001, in _mssql.MSSQLConnection.execute_non_query (_mssql.c:11183) File "_mssql.pyx", line 1156, in _mssql.MSSQLConnection.format_and_run_query (_mssql.c:12584) File "_mssql.pyx", line 712, in _mssql.MSSQLConnection.cancel (_mssql.c:7494) File "_mssql.pyx", line 1635, in _mssql.assert_connected (_mssql.c:17597) DBAPIError: (_mssql.MSSQLDriverException) Not connected to any MS SQL server I checked in after lunch, greeted with another failure + reboot: 2017-05-01 12:49:40.605 [INFO] 200 GET /?Session=59074BE0-2B35F8-002-2ALI81IW4SRG (127.0.0.1) 18.00ms 2017-05-01 12:49:42.782 [INFO] 200 GET /api/v1/jobs?Session=59074BE0-2B35F8-002-2ALI81IW4SRG (127.0.0.1) 44.00ms 2017-05-01 12:50:00.039 [INFO] found 1 zombie executions 2017-05-01 12:50:00.048 [INFO] marking zombie 326a4af02e8e11e78bf30050569d4f34 as FAILED 2017-05-01 12:50:00.059 [INFO] Scheduler started 2017-05-01 12:50:00.108 [INFO] Running server at 127.0.0.1:8888 ... 2017-05-01 12:50:00.118 [INFO] *** You can access scheduler web ui at http://localhost:8888 *** 2017-05-01 12:50:00.190 [WARNING] Run time of job "Translation Job Notification (trigger: cron[month='*', day='*', day_of_week='mon', hour='5', minute='0'], next run at: 2017-05-01 05:00:00 EDT)" was missed by 7:50:00.191000 2017-05-01 12:51:05.055 [INFO] 200 GET /api/v1/executions?Session=59074BE0-2B35F8-002-2ALI81IW4SRG (127.0.0.1) 86.00ms 2017-05-01 12:51:16.947 [INFO] 200 GET /api/v1/jobs/8945f2cf0b8811e797150050569d4f34?Session=59074BE0-2B35F8-002-2ALI81IW4SRG (127.0.0.1) 11.00ms 2017-05-01 12:51:22.926 [INFO] 200 POST /api/v1/executions/8945f2cf0b8811e797150050569d4f34 (127.0.0.1) 411.00ms Later that evening, I added a job to run the refresh of the table for recent CT.gov trials (used by William for terminology work), and this I didn't get an error or restart: 2017-05-01 19:05:58.799 [INFO] 200 GET /?Session=5907BCCD-E452FF-002-YDBFQQ4ZEZWK (127.0.0.1) 62.00ms 2017-05-01 19:06:01.095 [INFO] 200 GET /api/v1/jobs?Session=5907BCCD-E452FF-002-YDBFQQ4ZEZWK (127.0.0.1) 15.00ms 2017-05-01 19:12:20.530 [INFO] Added job "Recent CT.gov Trials (manual)" to job store "default" 2017-05-01 19:12:20.546 [INFO] 201 POST /api/v1/jobs (127.0.0.1) 234.00ms 2017-05-01 19:12:21.437 [INFO] 200 GET /api/v1/jobs?Session=5907BCCD-E452FF-002-YDBFQQ4ZEZWK (127.0.0.1) 32.00ms 2017-05-01 19:14:11.030 [INFO] 200 GET /api/v1/executions?Session=5907BCCD-E452FF-002-YDBFQQ4ZEZWK (127.0.0.1) 32.00ms 2017-05-01 19:14:45.092 [INFO] 200 GET /?Session=5907BCCD-E452FF-002-YDBFQQ4ZEZWK (127.0.0.1) 0.00ms 2017-05-01 19:14:47.092 [INFO] 200 GET /api/v1/executions?Session=5907BCCD-E452FF-002-YDBFQQ4ZEZWK (127.0.0.1) 31.00ms 2017-05-01 19:14:47.124 [INFO] 200 POST /api/v1/executions/a06da48f2ec311e7b0590050569d4f34 (127.0.0.1) 131452.00ms 2017-05-01 19:15:53.421 [INFO] 200 GET /api/v1/jobs/a06da48f2ec311e7b0590050569d4f34?Session=5907BCCD-E452FF-002-YDBFQQ4ZEZWK (127.0.0.1) 0.00ms 2017-05-01 19:15:58.124 [INFO] 200 PATCH /api/v1/jobs/a06da48f2ec311e7b0590050569d4f34 (127.0.0.1) 32.00ms 2017-05-01 19:15:59.013 [INFO] 200 GET /api/v1/jobs?Session=5907BCCD-E452FF-002-YDBFQQ4ZEZWK (127.0.0.1) 15.00ms Then nothing until the end of the week, when just after 8 pm on Saturday the scheduler was restarted: 2017-05-06 20:11:00.842 [INFO] found 0 zombie executions 2017-05-06 20:11:00.849 [INFO] Scheduler started 2017-05-06 20:11:01.020 [INFO] Running server at 127.0.0.1:8888 ... 2017-05-06 20:11:01.029 [INFO] *** You can access scheduler web ui at http://localhost:8888 *** About 10 minutes later I checked into the web interface: 2017-05-06 20:20:39.446 [INFO] 200 GET /?Session=590E65B3-1E54F1-002-N9BLLG2VEIG0 (127.0.0.1) 65.00ms 2017-05-06 20:20:41.720 [INFO] 200 GET /api/v1/jobs?Session=590E65B3-1E54F1-002-N9BLLG2VEIG0 (127.0.0.1) 102.00ms We had been preparing to install at least one stub job to try and keep the database connection alive, but hadn't been able yet to coordinate with CBIIT. But even without the stub job the Monday morning (May 8) translation queue notification job ran successfully: 2017-05-08 05:00:00.065 [INFO] Running job "Translation Job Notification (trigger: cron[month='*', day='*', day_of_week='mon', hour='5', minute='0'], next run at: 2017-05-15 05:00:00 EDT)" (scheduled at 2017-05-08 05:00:00-04:00) 2017-05-08 05:00:00.835 [INFO] Job "Translation Job Notification (trigger: cron[month='*', day='*', day_of_week='mon', hour='5', minute='0'], next run at: 2017-05-15 05:00:00 EDT)" executed successfully Another restart of the scheduler just before 6 pm the next evening: 2017-05-09 17:56:51.763 [INFO] found 0 zombie executions 2017-05-09 17:56:51.778 [INFO] Scheduler started 2017-05-09 17:56:51.861 [INFO] Running server at 127.0.0.1:8888 ... 2017-05-09 17:56:51.867 [INFO] *** You can access scheduler web ui at http://localhost:8888 *** I checked in to the web interface early the next morning: 2017-05-10 06:32:39.823 [INFO] 200 GET /?Session=591254B4-11AB3C-002-0HIH6AZULPCV (127.0.0.1) 145.00ms 2017-05-10 06:32:42.835 [INFO] 200 GET /api/v1/jobs?Session=591254B4-11AB3C-002-0HIH6AZULPCV (127.0.0.1) 144.00ms Less than a half hour later there were two more restarts of the scheduler. Note that we haven't seen the problem with two instances trying to run at the same time for a while. 2017-05-10 06:55:09.835 [INFO] found 0 zombie executions 2017-05-10 06:55:09.852 [INFO] Scheduler started 2017-05-10 06:55:09.930 [INFO] Running server at 127.0.0.1:8888 ... 2017-05-10 06:55:09.944 [INFO] *** You can access scheduler web ui at http://localhost:8888 *** 2017-05-10 07:00:40.841 [INFO] found 0 zombie executions 2017-05-10 07:00:40.476 [INFO] Scheduler started 2017-05-10 07:00:40.589 [INFO] Running server at 127.0.0.1:8888 ... 2017-05-10 07:00:40.591 [INFO] *** You can access scheduler web ui at http://localhost:8888 *** The following Monday another successful run of the translation queue notification job (so two weeks in a row without the database connection exception): 2017-05-15 05:00:00.125 [INFO] Running job "Translation Job Notification (trigger: cron[month='*', day='*', day_of_week='mon', hour='5', minute='0'], next run at: 2017-05-15 05:00:00 EDT)" (scheduled at 2017-05-15 05:00:00-04:00) 2017-05-15 05:00:00.651 [INFO] Job "Translation Job Notification (trigger: cron[month='*', day='*', day_of_week='mon', hour='5', minute='0'], next run at: 2017-05-22 05:00:00 EDT)" executed successfully I checked in later in the day to verify the job run: 2017-05-15 17:47:15.461 [INFO] 200 GET /?Session=591A219D-0C9E20-002-FTWF4SKLRHWN (127.0.0.1) 90.00ms 2017-05-15 17:47:17.572 [INFO] 200 GET /api/v1/jobs?Session=591A219D-0C9E20-002-FTWF4SKLRHWN (127.0.0.1) 11.00ms 2017-05-15 17:47:31.868 [INFO] 200 GET /api/v1/executions?Session=591A219D-0C9E20-002-FTWF4SKLRHWN (127.0.0.1) 119.00ms 2017-05-15 17:47:36.484 [INFO] 200 GET /api/v1/executions?Session=591A219D-0C9E20-002-FTWF4SKLRHWN&time_range_start=2017-05-14T21%253A47%253A34.730Z&time_range_end=2017-05-15T21%253A47%253A34.730Z (127.0.0.1) 89.00ms 2017-05-15 17:48:11.171 [INFO] 200 GET /api/v1/jobs?Session=591A219D-0C9E20-002-FTWF4SKLRHWN (127.0.0.1) 13.00ms Then on Wednesday CBIIT added the script for the publishing queue check stub job to the server, and I installed the job: 2017-05-17 09:05:35.092 [INFO] 200 GET /?Session=591C374D-31564F-002-QP2069OG1DLQ (127.0.0.1) 22.00ms 2017-05-17 09:05:37.338 [INFO] 200 GET /api/v1/jobs?Session=591C374D-31564F-002-QP2069OG1DLQ (127.0.0.1) 131.00ms 2017-05-17 09:06:40.960 [INFO] Added job "Publishing Queue Check (Stub)" to job store "default" 2017-05-17 09:06:41.005 [INFO] 201 POST /api/v1/jobs (127.0.0.1) 516.00ms 2017-05-17 09:06:42.516 [INFO] 200 GET /api/v1/jobs?Session=591C374D-31564F-002-QP2069OG1DLQ (127.0.0.1) 266.00ms 2017-05-17 09:06:47.109 [INFO] 200 GET /api/v1/executions?Session=591C374D-31564F-002-QP2069OG1DLQ (127.0.0.1) 16.00ms 2017-05-17 09:08:40.428 [INFO] 200 GET /api/v1/executions?Session=591C374D-31564F-002-QP2069OG1DLQ&time_range_start=2017-05-17T12%253A58%253A38.530Z&time_range_end=2017-05-17T13%253A08%253A38.530Z (127.0.0.1) 14.00ms 2017-05-17 09:08:53.961 [INFO] 200 GET /api/v1/jobs?Session=591C374D-31564F-002-QP2069OG1DLQ (127.0.0.1) 10.00ms 2017-05-17 09:09:07.940 [INFO] 200 PUT /api/v1/jobs/aad08d913b0111e7b4550050569d4f34 (127.0.0.1) 1.00ms 2017-05-17 09:09:08.914 [INFO] 200 GET /api/v1/jobs?Session=591C374D-31564F-002-QP2069OG1DLQ (127.0.0.1) 10.00ms 2017-05-17 09:09:21.467 [INFO] 200 GET /api/v1/executions?Session=591C374D-31564F-002-QP2069OG1DLQ (127.0.0.1) 16.00ms And it started running, every two minutes: 2017-05-17 09:10:00.089 [INFO] Running job "Publishing Queue Check (Stub) (trigger: cron[month='*', day='*', day_of_week='*', hour='*', minute='*/2'], next run at: 2017-05-17 09:10:00 EDT)" (scheduled at 2017-05-17 09:10:00-04:00) 2017-05-17 09:10:00.187 [INFO] Job "Publishing Queue Check (Stub) (trigger: cron[month='*', day='*', day_of_week='*', hour='*', minute='*/2'], next run at: 2017-05-17 09:12:00 EDT)" executed successfully 2017-05-17 09:12:00.061 [INFO] Running job "Publishing Queue Check (Stub) (trigger: cron[month='*', day='*', day_of_week='*', hour='*', minute='*/2'], next run at: 2017-05-17 09:12:00 EDT)" (scheduled at 2017-05-17 09:12:00-04:00) 2017-05-17 09:12:00.177 [INFO] Job "Publishing Queue Check (Stub) (trigger: cron[month='*', day='*', day_of_week='*', hour='*', minute='*/2'], next run at: 2017-05-17 09:14:00 EDT)" executed successfully ........... This goes on like this very smoothly, with an occasional check-in from the web interface: 2017-05-17 10:08:17.964 [INFO] 200 GET /api/v1/executions?Session=591C374D-31564F-002-QP2069OG1DLQ&time_range_start=2017-05-17T13%253A58%253A15.551Z&time_range_end=2017-05-17T14%253A08%253A15.551Z (127.0.0.1) 114.00ms 2017-05-17 10:09:01.578 [INFO] 200 GET /api/v1/executions?Session=591C374D-31564F-002-QP2069OG1DLQ&time_range_start=2017-05-16T14%253A08%253A59.423Z&time_range_end=2017-05-17T14%253A08%253A59.423Z (127.0.0.1) 145.00ms Then three days later, at about quarter past 7 Saturday evening, CBIIT shuts down the database server. It would be nice if they were to stop the services that depend on the database cleanly, but that's not how they roll. 2017-05-20 19:16:00.049 [ERROR] Exception during reset or similar Traceback (most recent call last): File "D:\Python\lib\site-packages\sqlalchemy\pool.py", line 567, in _finalize_fairy fairy._reset(pool) File "D:\Python\lib\site-packages\sqlalchemy\pool.py", line 701, in _reset pool._dialect.do_rollback(self) File "D:\Python\lib\site-packages\sqlalchemy\engine\default.py", line 412, in do_rollback dbapi_connection.rollback() File "pymssql.pyx", line 330, in pymssql.Connection.rollback (pymssql.c:5166) File "pymssql.pyx", line 314, in pymssql.Connection.rollback (pymssql.c:5089) File "_mssql.pyx", line 980, in _mssql.MSSQLConnection.execute_non_query (_mssql.c:11361) File "_mssql.pyx", line 1001, in _mssql.MSSQLConnection.execute_non_query (_mssql.c:11183) File "_mssql.pyx", line 1175, in _mssql.MSSQLConnection.format_and_run_query (_mssql.c:12746) File "_mssql.pyx", line 1586, in _mssql.check_cancel_and_raise (_mssql.c:16880) File "_mssql.pyx", line 1630, in _mssql.maybe_raise_MSSQLDatabaseException (_mssql.c:17524) MSSQLDatabaseException: (6005, 'SHUTDOWN is in progress.DB-Lib error message 20018, severity 14:\nGeneral SQL Server error: Check messages from the SQL Server\n') 2017-05-20 19:16:00.134 [ERROR] Exception in callback Traceback (most recent call last): File "D:\Python\lib\site-packages\tornado\ioloop.py", line 600, in _run_callback ret = callback() File "D:\Python\lib\site-packages\tornado\stack_context.py", line 275, in null_wrapper return fn(*args, **kwargs) File "D:\Python\lib\site-packages\apscheduler\schedulers\tornado.py", line 59, in wakeup wait_seconds = self._process_jobs() File "D:\Python\lib\site-packages\ndscheduler\core\scheduler\base.py", line 181, in _process_jobs return super(apscheduler_tornado.TornadoScheduler, self)._process_jobs() File "D:\Python\lib\site-packages\apscheduler\schedulers\base.py", line 801, in _process_jobs for job in jobstore.get_due_jobs(now): File "D:\Python\lib\site-packages\apscheduler\jobstores\sqlalchemy.py", line 64, in get_due_jobs return self._get_jobs(self.jobs_t.c.next_run_time <= timestamp) File "D:\Python\lib\site-packages\apscheduler\jobstores\sqlalchemy.py", line 122, in _get_jobs for row in self.engine.execute(selectable): File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 1978, in execute return connection.execute(statement, *multiparams, **params) File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 914, in execute return meth(self, multiparams, params) File "D:\Python\lib\site-packages\sqlalchemy\sql\elements.py", line 323, in _execute_on_connection return connection._execute_clauseelement(self, multiparams, params) File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 1010, in _execute_clauseelement compiled_sql, distilled_params File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 1146, in _execute_context context) File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 1325, in _handle_dbapi_exception self._autorollback() File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 791, in _autorollback self._root._rollback_impl() File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 670, in _rollback_impl self._handle_dbapi_exception(e, None, None, None, None) File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 1263, in _handle_dbapi_exception exc_info File "D:\Python\lib\site-packages\sqlalchemy\util\compat.py", line 199, in raise_from_cause reraise(type(exception), exception, tb=exc_tb) File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 668, in _rollback_impl self.engine.dialect.do_rollback(self.connection) File "D:\Python\lib\site-packages\sqlalchemy\engine\default.py", line 412, in do_rollback dbapi_connection.rollback() File "pymssql.pyx", line 330, in pymssql.Connection.rollback (pymssql.c:5166) File "pymssql.pyx", line 314, in pymssql.Connection.rollback (pymssql.c:5089) File "_mssql.pyx", line 980, in _mssql.MSSQLConnection.execute_non_query (_mssql.c:11361) File "_mssql.pyx", line 1001, in _mssql.MSSQLConnection.execute_non_query (_mssql.c:11183) File "_mssql.pyx", line 1175, in _mssql.MSSQLConnection.format_and_run_query (_mssql.c:12746) File "_mssql.pyx", line 1586, in _mssql.check_cancel_and_raise (_mssql.c:16880) File "_mssql.pyx", line 1630, in _mssql.maybe_raise_MSSQLDatabaseException (_mssql.c:17524) DBAPIError: (_mssql.MSSQLDatabaseException) (6005, 'SHUTDOWN is in progress.DB-Lib error message 20018, severity 14:\nGeneral SQL Server error: Check messages from the SQL Server\n') We don't get any news from the scheduler until 4 and 1/2 hours later, when it restarts, warns us that a job got skipped, does three successful runs of the stub publishing queue check (spaced two minutes apart as scheduled), then restarts again and resumes the every-two-minutes routine: 2017-05-20 23:14:28.857 [INFO] found 0 zombie executions 2017-05-20 23:14:28.878 [INFO] Scheduler started 2017-05-20 23:14:28.950 [INFO] Running server at 127.0.0.1:8888 ... 2017-05-20 23:14:28.950 [INFO] *** You can access scheduler web ui at http://localhost:8888 *** 2017-05-20 23:14:29.079 [WARNING] Run time of job "Publishing Queue Check (Stub) (trigger: cron[month='*', day='*', day_of_week='*', hour='*', minute='*/2'], next run at: 2017-05-20 19:16:00 EDT)" was missed by 3:58:29.080000 2017-05-20 23:16:00.381 [INFO] Running job "Publishing Queue Check (Stub) (trigger: cron[month='*', day='*', day_of_week='*', hour='*', minute='*/2'], next run at: 2017-05-20 23:16:00 EDT)" (scheduled at 2017-05-20 23:16:00-04:00) 2017-05-20 23:16:01.351 [INFO] Job "Publishing Queue Check (Stub) (trigger: cron[month='*', day='*', day_of_week='*', hour='*', minute='*/2'], next run at: 2017-05-20 23:18:00 EDT)" executed successfully 2017-05-20 23:18:00.711 [INFO] Running job "Publishing Queue Check (Stub) (trigger: cron[month='*', day='*', day_of_week='*', hour='*', minute='*/2'], next run at: 2017-05-20 23:18:00 EDT)" (scheduled at 2017-05-20 23:18:00-04:00) 2017-05-20 23:18:00.967 [INFO] Job "Publishing Queue Check (Stub) (trigger: cron[month='*', day='*', day_of_week='*', hour='*', minute='*/2'], next run at: 2017-05-20 23:20:00 EDT)" executed successfully 2017-05-20 23:20:00.325 [INFO] Running job "Publishing Queue Check (Stub) (trigger: cron[month='*', day='*', day_of_week='*', hour='*', minute='*/2'], next run at: 2017-05-20 23:20:00 EDT)" (scheduled at 2017-05-20 23:20:00-04:00) 2017-05-20 23:20:00.533 [INFO] Job "Publishing Queue Check (Stub) (trigger: cron[month='*', day='*', day_of_week='*', hour='*', minute='*/2'], next run at: 2017-05-20 23:22:00 EDT)" executed successfully 2017-05-20 23:22:14.854 [INFO] found 0 zombie executions 2017-05-20 23:22:14.855 [INFO] Scheduler started 2017-05-20 23:22:14.884 [INFO] Running server at 127.0.0.1:8888 ... 2017-05-20 23:22:14.885 [INFO] *** You can access scheduler web ui at http://localhost:8888 *** 2017-05-20 23:22:14.928 [INFO] Running job "Publishing Queue Check (Stub) (trigger: cron[month='*', day='*', day_of_week='*', hour='*', minute='*/2'], next run at: 2017-05-20 23:22:00 EDT)" (scheduled at 2017-05-20 23:22:00-04:00) 2017-05-20 23:22:15.237 [INFO] Job "Publishing Queue Check (Stub) (trigger: cron[month='*', day='*', day_of_week='*', hour='*', minute='*/2'], next run at: 2017-05-20 23:24:00 EDT)" executed successfully 2017-05-20 23:24:00.071 [INFO] Running job "Publishing Queue Check (Stub) (trigger: cron[month='*', day='*', day_of_week='*', hour='*', minute='*/2'], next run at: 2017-05-20 23:24:00 EDT)" (scheduled at 2017-05-20 23:24:00-04:00) 2017-05-20 23:24:00.181 [INFO] Job "Publishing Queue Check (Stub) (trigger: cron[month='*', day='*', day_of_week='*', hour='*', minute='*/2'], next run at: 2017-05-20 23:26:00 EDT)" executed successfully 2017-05-20 23:26:00.039 [INFO] Running job "Publishing Queue Check (Stub) (trigger: cron[month='*', day='*', day_of_week='*', hour='*', minute='*/2'], next run at: 2017-05-20 23:26:00 EDT)" (scheduled at 2017-05-20 23:26:00-04:00) 2017-05-20 23:26:00.180 [INFO] Job "Publishing Queue Check (Stub) (trigger: cron[month='*', day='*', day_of_week='*', hour='*', minute='*/2'], next run at: 2017-05-20 23:28:00 EDT)" executed successfully ............ It continues chugging along through the rest of the weekend, and slips in another successful run of the Monday 5 am translation queue notification job: 2017-05-22 04:58:00.038 [INFO] Running job "Publishing Queue Check (Stub) (trigger: cron[month='*', day='*', day_of_week='*', hour='*', minute='*/2'], next run at: 2017-05-22 05:00:00 EDT)" (scheduled at 2017-05-22 04:58:00-04:00) 2017-05-22 04:58:00.095 [INFO] Job "Publishing Queue Check (Stub) (trigger: cron[month='*', day='*', day_of_week='*', hour='*', minute='*/2'], next run at: 2017-05-22 05:00:00 EDT)" executed successfully 2017-05-22 05:00:00.032 [INFO] Running job "Translation Job Notification (trigger: cron[month='*', day='*', day_of_week='mon', hour='5', minute='0'], next run at: 2017-05-22 05:00:00 EDT)" (scheduled at 2017-05-22 05:00:00-04:00) 2017-05-22 05:00:00.051 [INFO] Running job "Publishing Queue Check (Stub) (trigger: cron[month='*', day='*', day_of_week='*', hour='*', minute='*/2'], next run at: 2017-05-22 05:00:00 EDT)" (scheduled at 2017-05-22 05:00:00-04:00) 2017-05-22 05:00:00.174 [INFO] Job "Publishing Queue Check (Stub) (trigger: cron[month='*', day='*', day_of_week='*', hour='*', minute='*/2'], next run at: 2017-05-22 05:02:00 EDT)" executed successfully 2017-05-22 05:00:00.497 [INFO] Job "Translation Job Notification (trigger: cron[month='*', day='*', day_of_week='mon', hour='5', minute='0'], next run at: 2017-05-29 05:00:00 EDT)" executed successfully 2017-05-22 05:02:00.049 [INFO] Running job "Publishing Queue Check (Stub) (trigger: cron[month='*', day='*', day_of_week='*', hour='*', minute='*/2'], next run at: 2017-05-22 05:04:00 EDT)" (scheduled at 2017-05-22 05:02:00-04:00) 2017-05-22 05:02:00.612 [INFO] Job "Publishing Queue Check (Stub) (trigger: cron[month='*', day='*', day_of_week='*', hour='*', minute='*/2'], next run at: 2017-05-22 05:04:00 EDT)" executed successfully ...... [continuing similarly] In the middle of the day that Monday it stumbled with an unexplained database failure, but managed to recover and even finish the job it was in the middle of successfully. This is how we would like the scheduler to behave all the time in the face of unreliable environmental factors. 2017-05-22 12:36:00.112 [INFO] Running job "Publishing Queue Check (Stub) (trigger: cron[month='*', day='*', day_of_week='*', hour='*', minute='*/2'], next run at: 2017-05-22 12:38:00 EDT)" (scheduled at 2017-05-22 12:36:00-04:00) 2017-05-22 12:36:00.170 [INFO] Job "Publishing Queue Check (Stub) (trigger: cron[month='*', day='*', day_of_week='*', hour='*', minute='*/2'], next run at: 2017-05-22 12:38:00 EDT)" executed successfully 2017-05-22 12:38:00.119 [INFO] Running job "Publishing Queue Check (Stub) (trigger: cron[month='*', day='*', day_of_week='*', hour='*', minute='*/2'], next run at: 2017-05-22 12:40:00 EDT)" (scheduled at 2017-05-22 12:38:00-04:00) 2017-05-22 12:38:12.463 [ERROR] Connection to the database failed for an unknown reason. Traceback (most recent call last): File "D:\Python\lib\site-packages\ndscheduler\job.py", line 100, in run_job job.run(*args, **kwargs) File "D:\cdr\Scheduler\jobs\basic_job.py", line 31, in run status = task.Perform() File "D:\cdr\Scheduler\tasks\publishing_queue.py", line 21, in Perform cursor = cdrdb.connect("CdrPublishing").cursor() File "D:\cdr\Lib\Python\cdrdb2.py", line 96, in connect as_dict=as_dict) File "pymssql.pyx", line 644, in pymssql.connect (pymssql.c:10892) InterfaceError: Connection to the database failed for an unknown reason. 2017-05-22 12:38:12.506 [INFO] Job "Publishing Queue Check (Stub) (trigger: cron[month='*', day='*', day_of_week='*', hour='*', minute='*/2'], next run at: 2017-05-22 12:40:00 EDT)" executed successfully 2017-05-22 12:40:00.039 [INFO] Running job "Publishing Queue Check (Stub) (trigger: cron[month='*', day='*', day_of_week='*', hour='*', minute='*/2'], next run at: 2017-05-22 12:42:00 EDT)" (scheduled at 2017-05-22 12:40:00-04:00) 2017-05-22 12:40:00.095 [INFO] Job "Publishing Queue Check (Stub) (trigger: cron[month='*', day='*', day_of_week='*', hour='*', minute='*/2'], next run at: 2017-05-22 12:42:00 EDT)" executed successfully 2017-05-22 12:42:00.039 [INFO] Running job "Publishing Queue Check (Stub) (trigger: cron[month='*', day='*', day_of_week='*', hour='*', minute='*/2'], next run at: 2017-05-22 12:42:00 EDT)" (scheduled at 2017-05-22 12:42:00-04:00) 2017-05-22 12:42:00.114 [INFO] Job "Publishing Queue Check (Stub) (trigger: cron[month='*', day='*', day_of_week='*', hour='*', minute='*/2'], next run at: 2017-05-22 12:44:00 EDT)" executed successfully ..... [etc.] It continues on through the rest of the week and on into the weekend flawlessly, and for the first time I can remember, even makes it through a Saturday evening without a restart, sailing right through another successful translation queue notification job: 2017-05-29 04:58:00.073 [INFO] Running job "Publishing Queue Check (Stub) (trigger: cron[month='*', day='*', day_of_week='*', hour='*', minute='*/2'], next run at: 2017-05-29 05:00:00 EDT)" (scheduled at 2017-05-29 04:58:00-04:00) 2017-05-29 04:58:00.161 [INFO] Job "Publishing Queue Check (Stub) (trigger: cron[month='*', day='*', day_of_week='*', hour='*', minute='*/2'], next run at: 2017-05-29 05:00:00 EDT)" executed successfully 2017-05-29 05:00:00.072 [INFO] Running job "Translation Job Notification (trigger: cron[month='*', day='*', day_of_week='mon', hour='5', minute='0'], next run at: 2017-05-29 05:00:00 EDT)" (scheduled at 2017-05-29 05:00:00-04:00) 2017-05-29 05:00:00.098 [INFO] Running job "Publishing Queue Check (Stub) (trigger: cron[month='*', day='*', day_of_week='*', hour='*', minute='*/2'], next run at: 2017-05-29 05:00:00 EDT)" (scheduled at 2017-05-29 05:00:00-04:00) 2017-05-29 05:00:00.190 [INFO] Job "Publishing Queue Check (Stub) (trigger: cron[month='*', day='*', day_of_week='*', hour='*', minute='*/2'], next run at: 2017-05-29 05:02:00 EDT)" executed successfully 2017-05-29 05:00:00.482 [INFO] Job "Translation Job Notification (trigger: cron[month='*', day='*', day_of_week='mon', hour='5', minute='0'], next run at: 2017-06-05 05:00:00 EDT)" executed successfully 2017-05-29 05:02:00.099 [INFO] Running job "Publishing Queue Check (Stub) (trigger: cron[month='*', day='*', day_of_week='*', hour='*', minute='*/2'], next run at: 2017-05-29 05:04:00 EDT)" (scheduled at 2017-05-29 05:02:00-04:00) 2017-05-29 05:02:00.167 [INFO] Job "Publishing Queue Check (Stub) (trigger: cron[month='*', day='*', day_of_week='*', hour='*', minute='*/2'], next run at: 2017-05-29 05:04:00 EDT)" executed successfully Things are looking hopeful for the solution of continuously hitting the database to keep the connection alive (though it's worth noting that we had two successful translation queue notification jobs in a row right *before* installing the every other minute stub job). More smooth performance as we come into the end of May. On Tuesday we (with CBIIT's assistance) move the glossifier service from Linux to Windows, and I install the nightly job to refresh the glossifier data. The scheduler doesn't miss a beat. 2017-05-30 07:00:00.076 [INFO] Running job "Publishing Queue Check (Stub) (trigger: cron[month='*', day='*', day_of_week='*', hour='*', minute='*/2'], next run at: 2017-05-30 07:02:00 EDT)" (scheduled at 2017-05-30 07:00:00-04:00) 2017-05-30 07:00:00.365 [INFO] Job "Publishing Queue Check (Stub) (trigger: cron[month='*', day='*', day_of_week='*', hour='*', minute='*/2'], next run at: 2017-05-30 07:02:00 EDT)" executed successfully 2017-05-30 07:01:34.664 [INFO] 200 GET /?Session=592D50B5-9609F2-002-FK87JX7LX6E4 (127.0.0.1) 83.00ms 2017-05-30 07:01:37.275 [INFO] 200 GET /api/v1/jobs?Session=592D50B5-9609F2-002-FK87JX7LX6E4 (127.0.0.1) 18.00ms 2017-05-30 07:02:00.056 [INFO] Running job "Publishing Queue Check (Stub) (trigger: cron[month='*', day='*', day_of_week='*', hour='*', minute='*/2'], next run at: 2017-05-30 07:02:00 EDT)" (scheduled at 2017-05-30 07:02:00-04:00) 2017-05-30 07:02:00.160 [INFO] Job "Publishing Queue Check (Stub) (trigger: cron[month='*', day='*', day_of_week='*', hour='*', minute='*/2'], next run at: 2017-05-30 07:04:00 EDT)" executed successfully 2017-05-30 07:04:00.052 [INFO] Running job "Publishing Queue Check (Stub) (trigger: cron[month='*', day='*', day_of_week='*', hour='*', minute='*/2'], next run at: 2017-05-30 07:06:00 EDT)" (scheduled at 2017-05-30 07:04:00-04:00) 2017-05-30 07:04:00.142 [INFO] Job "Publishing Queue Check (Stub) (trigger: cron[month='*', day='*', day_of_week='*', hour='*', minute='*/2'], next run at: 2017-05-30 07:06:00 EDT)" executed successfully 2017-05-30 07:05:09.078 [INFO] Added job "Glossifier Refresh Service" to job store "default" 2017-05-30 07:05:09.130 [INFO] 201 POST /api/v1/jobs (127.0.0.1) 347.00ms 2017-05-30 07:05:10.555 [INFO] 200 GET /api/v1/jobs?Session=592D50B5-9609F2-002-FK87JX7LX6E4 (127.0.0.1) 283.00ms 2017-05-30 07:05:38.585 [INFO] 200 GET /api/v1/executions?Session=592D50B5-9609F2-002-FK87JX7LX6E4 (127.0.0.1) 279.00ms 2017-05-30 07:06:00.023 [INFO] Running job "Publishing Queue Check (Stub) (trigger: cron[month='*', day='*', day_of_week='*', hour='*', minute='*/2'], next run at: 2017-05-30 07:06:00 EDT)" (scheduled at 2017-05-30 07:06:00-04:00) 2017-05-30 07:06:00.121 [INFO] Job "Publishing Queue Check (Stub) (trigger: cron[month='*', day='*', day_of_week='*', hour='*', minute='*/2'], next run at: 2017-05-30 07:08:00 EDT)" executed successfully 2017-05-30 07:06:10.062 [INFO] 200 GET /api/v1/executions?Session=592D50B5-9609F2-002-FK87JX7LX6E4&time_range_start=2017-05-30T10%253A56%253A08.251Z&time_range_end=2017-05-30T11%253A06%253A08.251Z (127.0.0.1) 49.00ms 2017-05-30 07:06:20.433 [INFO] 200 POST /api/v1/executions/d7fd6b00452711e7b4f10050569d4f34 (127.0.0.1) 54376.00ms 2017-05-30 07:06:21.026 [INFO] 200 GET /api/v1/executions?Session=592D50B5-9609F2-002-FK87JX7LX6E4&time_range_start=2017-05-30T10%253A56%253A19.250Z&time_range_end=2017-05-30T11%253A06%253A19.250Z (127.0.0.1) 94.00ms 2017-05-30 07:08:00.052 [INFO] Running job "Publishing Queue Check (Stub) (trigger: cron[month='*', day='*', day_of_week='*', hour='*', minute='*/2'], next run at: 2017-05-30 07:08:00 EDT)" (scheduled at 2017-05-30 07:08:00-04:00) 2017-05-30 07:08:00.160 [INFO] Job "Publishing Queue Check (Stub) (trigger: cron[month='*', day='*', day_of_week='*', hour='*', minute='*/2'], next run at: 2017-05-30 07:10:00 EDT)" executed successfully 2017-05-30 07:10:00.069 [INFO] Running job "Publishing Queue Check (Stub) (trigger: cron[month='*', day='*', day_of_week='*', hour='*', minute='*/2'], next run at: 2017-05-30 07:12:00 EDT)" (scheduled at 2017-05-30 07:10:00-04:00) 2017-05-30 07:10:00.358 [INFO] Job "Publishing Queue Check (Stub) (trigger: cron[month='*', day='*', day_of_week='*', hour='*', minute='*/2'], next run at: 2017-05-30 07:12:00 EDT)" executed successfully [... simile] The glossifier job kicks in right on time: 2017-05-30 23:00:00.055 [INFO] Running job "Publishing Queue Check (Stub) (trigger: cron[month='*', day='*', day_of_week='*', hour='*', minute='*/2'], next run at: 2017-05-30 23:02:00 EDT)" (scheduled at 2017-05-30 23:00:00-04:00) 2017-05-30 23:00:00.073 [INFO] Running job "Glossifier Refresh Service (trigger: cron[month='*', day='*', day_of_week='*', hour='23', minute='0'], next run at: 2017-05-31 23:00:00 EDT)" (scheduled at 2017-05-30 23:00:00-04:00) 2017-05-30 23:00:00.141 [INFO] Job "Publishing Queue Check (Stub) (trigger: cron[month='*', day='*', day_of_week='*', hour='*', minute='*/2'], next run at: 2017-05-30 23:02:00 EDT)" executed successfully 2017-05-30 23:00:15.345 [INFO] Job "Glossifier Refresh Service (trigger: cron[month='*', day='*', day_of_week='*', hour='23', minute='0'], next run at: 2017-05-31 23:00:00 EDT)" executed successfully 2017-05-30 23:02:00.078 [INFO] Running job "Publishing Queue Check (Stub) (trigger: cron[month='*', day='*', day_of_week='*', hour='*', minute='*/2'], next run at: 2017-05-30 23:04:00 EDT)" (scheduled at 2017-05-30 23:02:00-04:00) 2017-05-30 23:02:00.164 [INFO] Job "Publishing Queue Check (Stub) (trigger: cron[month='*', day='*', day_of_week='*', hour='*', minute='*/2'], next run at: 2017-05-30 23:04:00 EDT)" executed successfully [... usw.] Things are looking very promising as we roll into the beginning of June, and then suddenly in the middle of the afternoon on Thursday June 1: 2017-06-01 13:44:00.239 [ERROR] Exception during reset or similar Traceback (most recent call last): File "D:\Python\lib\site-packages\sqlalchemy\pool.py", line 567, in _finalize_fairy fairy._reset(pool) File "D:\Python\lib\site-packages\sqlalchemy\pool.py", line 701, in _reset pool._dialect.do_rollback(self) File "D:\Python\lib\site-packages\sqlalchemy\engine\default.py", line 412, in do_rollback dbapi_connection.rollback() File "pymssql.pyx", line 330, in pymssql.Connection.rollback (pymssql.c:5166) File "pymssql.pyx", line 314, in pymssql.Connection.rollback (pymssql.c:5089) File "_mssql.pyx", line 980, in _mssql.MSSQLConnection.execute_non_query (_mssql.c:11361) File "_mssql.pyx", line 1001, in _mssql.MSSQLConnection.execute_non_query (_mssql.c:11183) File "_mssql.pyx", line 1156, in _mssql.MSSQLConnection.format_and_run_query (_mssql.c:12584) File "_mssql.pyx", line 712, in _mssql.MSSQLConnection.cancel (_mssql.c:7494) File "_mssql.pyx", line 1635, in _mssql.assert_connected (_mssql.c:17597) MSSQLDriverException: Not connected to any MS SQL server 2017-06-01 13:44:01.007 [ERROR] Exception in callback Traceback (most recent call last): File "D:\Python\lib\site-packages\tornado\ioloop.py", line 600, in _run_callback ret = callback() File "D:\Python\lib\site-packages\tornado\stack_context.py", line 275, in null_wrapper return fn(*args, **kwargs) File "D:\Python\lib\site-packages\apscheduler\schedulers\tornado.py", line 59, in wakeup wait_seconds = self._process_jobs() File "D:\Python\lib\site-packages\ndscheduler\core\scheduler\base.py", line 181, in _process_jobs return super(apscheduler_tornado.TornadoScheduler, self)._process_jobs() File "D:\Python\lib\site-packages\apscheduler\schedulers\base.py", line 801, in _process_jobs for job in jobstore.get_due_jobs(now): File "D:\Python\lib\site-packages\apscheduler\jobstores\sqlalchemy.py", line 64, in get_due_jobs return self._get_jobs(self.jobs_t.c.next_run_time <= timestamp) File "D:\Python\lib\site-packages\apscheduler\jobstores\sqlalchemy.py", line 122, in _get_jobs for row in self.engine.execute(selectable): File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 1978, in execute return connection.execute(statement, *multiparams, **params) File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 914, in execute return meth(self, multiparams, params) File "D:\Python\lib\site-packages\sqlalchemy\sql\elements.py", line 323, in _execute_on_connection return connection._execute_clauseelement(self, multiparams, params) File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 1010, in _execute_clauseelement compiled_sql, distilled_params File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 1146, in _execute_context context) File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 1325, in _handle_dbapi_exception self._autorollback() File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 791, in _autorollback self._root._rollback_impl() File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 670, in _rollback_impl self._handle_dbapi_exception(e, None, None, None, None) File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 1263, in _handle_dbapi_exception exc_info File "D:\Python\lib\site-packages\sqlalchemy\util\compat.py", line 199, in raise_from_cause reraise(type(exception), exception, tb=exc_tb) File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 668, in _rollback_impl self.engine.dialect.do_rollback(self.connection) File "D:\Python\lib\site-packages\sqlalchemy\engine\default.py", line 412, in do_rollback dbapi_connection.rollback() File "pymssql.pyx", line 330, in pymssql.Connection.rollback (pymssql.c:5166) File "pymssql.pyx", line 314, in pymssql.Connection.rollback (pymssql.c:5089) File "_mssql.pyx", line 980, in _mssql.MSSQLConnection.execute_non_query (_mssql.c:11361) File "_mssql.pyx", line 1001, in _mssql.MSSQLConnection.execute_non_query (_mssql.c:11183) File "_mssql.pyx", line 1156, in _mssql.MSSQLConnection.format_and_run_query (_mssql.c:12584) File "_mssql.pyx", line 712, in _mssql.MSSQLConnection.cancel (_mssql.c:7494) File "_mssql.pyx", line 1635, in _mssql.assert_connected (_mssql.c:17597) DBAPIError: (_mssql.MSSQLDriverException) Not connected to any MS SQL server And then the logs go completely silent for the next eight days. On June 9, William files a bug report (OCECDR-4273) saying, "The scheduled email that goes out every Monday morning to translators who have work assigned to them through the Translation Workflow Management has stopped working for the past two weeks on PROD." I filed a ticket with CBIIT (INC3088939), and we were able to get the scheduler restarted, after some initial failures: 2017-06-09 14:52:00.986 [INFO] 200 GET /?Session=593AEE49-937285-002-D1VBO0QQIIQO (127.0.0.1) 60.00ms 2017-06-09 14:52:03.005 [ERROR] Exception during reset or similar Traceback (most recent call last): File "D:\Python\lib\site-packages\sqlalchemy\pool.py", line 567, in _finalize_fairy fairy._reset(pool) File "D:\Python\lib\site-packages\sqlalchemy\pool.py", line 701, in _reset pool._dialect.do_rollback(self) File "D:\Python\lib\site-packages\sqlalchemy\engine\default.py", line 412, in do_rollback dbapi_connection.rollback() File "pymssql.pyx", line 330, in pymssql.Connection.rollback (pymssql.c:5166) File "pymssql.pyx", line 314, in pymssql.Connection.rollback (pymssql.c:5089) File "_mssql.pyx", line 980, in _mssql.MSSQLConnection.execute_non_query (_mssql.c:11361) File "_mssql.pyx", line 1001, in _mssql.MSSQLConnection.execute_non_query (_mssql.c:11183) File "_mssql.pyx", line 1156, in _mssql.MSSQLConnection.format_and_run_query (_mssql.c:12584) File "_mssql.pyx", line 712, in _mssql.MSSQLConnection.cancel (_mssql.c:7494) File "_mssql.pyx", line 1635, in _mssql.assert_connected (_mssql.c:17597) MSSQLDriverException: Not connected to any MS SQL server 2017-06-09 14:52:03.038 [ERROR] Uncaught exception GET /api/v1/jobs?Session=593AEE49-937285-002-D1VBO0QQIIQO (127.0.0.1) HTTPServerRequest(protocol='http', host='localhost:8888', method='GET', uri='/api/v1/jobs?Session=593AEE49-937285-002-D1VBO0QQIIQO', version='HTTP/1.1', remote_ip='127.0.0.1', headers={'Accept-Encoding': 'gzip, deflate', 'Host': 'localhost:8888', 'Connection': 'keep-alive', 'Accept': '*/*', 'User-Agent': 'python-requests/2.13.0'}) Traceback (most recent call last): File "D:\Python\lib\site-packages\tornado\web.py", line 1401, in _stack_context_handle_exception raise_exc_info((type, value, traceback)) File "D:\Python\lib\site-packages\tornado\stack_context.py", line 314, in wrapped ret = fn(*args, **kwargs) File "D:\Python\lib\site-packages\tornado\gen.py", line 196, in final_callback if future.result() is not None: File "D:\Python\lib\site-packages\tornado\concurrent.py", line 232, in result raise_exc_info(self._exc_info) File "D:\Python\lib\site-packages\tornado\gen.py", line 1014, in run yielded = self.gen.throw(*exc_info) File "D:\Python\lib\site-packages\ndscheduler\server\handlers\jobs.py", line 64, in get_jobs_yield return_json = yield self.get_jobs() File "D:\Python\lib\site-packages\tornado\gen.py", line 1008, in run value = future.result() File "D:\Python\lib\site-packages\concurrent\futures\_base.py", line 398, in result return self.__get_result() File "D:\Python\lib\site-packages\concurrent\futures\thread.py", line 55, in run result = self.fn(*self.args, **self.kwargs) File "D:\Python\lib\site-packages\ndscheduler\server\handlers\jobs.py", line 59, in get_jobs return self._get_jobs() File "D:\Python\lib\site-packages\ndscheduler\server\handlers\jobs.py", line 21, in _get_jobs jobs = self.scheduler_manager.get_jobs() File "D:\Python\lib\site-packages\ndscheduler\core\scheduler_manager.py", line 126, in get_jobs return self.sched.get_jobs() File "D:\Python\lib\site-packages\apscheduler\schedulers\base.py", line 458, in get_jobs jobs.extend(store.get_all_jobs()) File "D:\Python\lib\site-packages\apscheduler\jobstores\sqlalchemy.py", line 73, in get_all_jobs return self._get_jobs() File "D:\Python\lib\site-packages\apscheduler\jobstores\sqlalchemy.py", line 122, in _get_jobs for row in self.engine.execute(selectable): File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 1978, in execute return connection.execute(statement, *multiparams, **params) File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 914, in execute return meth(self, multiparams, params) File "D:\Python\lib\site-packages\sqlalchemy\sql\elements.py", line 323, in _execute_on_connection return connection._execute_clauseelement(self, multiparams, params) File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 1010, in _execute_clauseelement compiled_sql, distilled_params File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 1146, in _execute_context context) File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 1325, in _handle_dbapi_exception self._autorollback() File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 791, in _autorollback self._root._rollback_impl() File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 670, in _rollback_impl self._handle_dbapi_exception(e, None, None, None, None) File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 1263, in _handle_dbapi_exception exc_info File "D:\Python\lib\site-packages\sqlalchemy\util\compat.py", line 199, in raise_from_cause reraise(type(exception), exception, tb=exc_tb) File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 668, in _rollback_impl self.engine.dialect.do_rollback(self.connection) File "D:\Python\lib\site-packages\sqlalchemy\engine\default.py", line 412, in do_rollback dbapi_connection.rollback() File "pymssql.pyx", line 330, in pymssql.Connection.rollback (pymssql.c:5166) File "pymssql.pyx", line 314, in pymssql.Connection.rollback (pymssql.c:5089) File "_mssql.pyx", line 980, in _mssql.MSSQLConnection.execute_non_query (_mssql.c:11361) File "_mssql.pyx", line 1001, in _mssql.MSSQLConnection.execute_non_query (_mssql.c:11183) File "_mssql.pyx", line 1156, in _mssql.MSSQLConnection.format_and_run_query (_mssql.c:12584) File "_mssql.pyx", line 712, in _mssql.MSSQLConnection.cancel (_mssql.c:7494) File "_mssql.pyx", line 1635, in _mssql.assert_connected (_mssql.c:17597) DBAPIError: (_mssql.MSSQLDriverException) Not connected to any MS SQL server 2017-06-09 14:52:03.269 [ERROR] 500 GET /api/v1/jobs?Session=593AEE49-937285-002-D1VBO0QQIIQO (127.0.0.1) 295.00ms 2017-06-09 14:52:10.657 [ERROR] Exception during reset or similar Traceback (most recent call last): File "D:\Python\lib\site-packages\sqlalchemy\pool.py", line 567, in _finalize_fairy fairy._reset(pool) File "D:\Python\lib\site-packages\sqlalchemy\pool.py", line 701, in _reset pool._dialect.do_rollback(self) File "D:\Python\lib\site-packages\sqlalchemy\engine\default.py", line 412, in do_rollback dbapi_connection.rollback() File "pymssql.pyx", line 330, in pymssql.Connection.rollback (pymssql.c:5166) File "pymssql.pyx", line 314, in pymssql.Connection.rollback (pymssql.c:5089) File "_mssql.pyx", line 980, in _mssql.MSSQLConnection.execute_non_query (_mssql.c:11361) File "_mssql.pyx", line 1001, in _mssql.MSSQLConnection.execute_non_query (_mssql.c:11183) File "_mssql.pyx", line 1156, in _mssql.MSSQLConnection.format_and_run_query (_mssql.c:12584) File "_mssql.pyx", line 712, in _mssql.MSSQLConnection.cancel (_mssql.c:7494) File "_mssql.pyx", line 1635, in _mssql.assert_connected (_mssql.c:17597) MSSQLDriverException: Not connected to any MS SQL server 2017-06-09 14:52:10.664 [ERROR] Uncaught exception GET /api/v1/jobs?Session=593AEE49-937285-002-D1VBO0QQIIQO (127.0.0.1) HTTPServerRequest(protocol='http', host='localhost:8888', method='GET', uri='/api/v1/jobs?Session=593AEE49-937285-002-D1VBO0QQIIQO', version='HTTP/1.1', remote_ip='127.0.0.1', headers={'Accept-Encoding': 'gzip, deflate', 'Host': 'localhost:8888', 'Connection': 'keep-alive', 'Accept': '*/*', 'User-Agent': 'python-requests/2.13.0'}) Traceback (most recent call last): File "D:\Python\lib\site-packages\tornado\web.py", line 1401, in _stack_context_handle_exception raise_exc_info((type, value, traceback)) File "D:\Python\lib\site-packages\tornado\stack_context.py", line 314, in wrapped ret = fn(*args, **kwargs) File "D:\Python\lib\site-packages\tornado\gen.py", line 196, in final_callback if future.result() is not None: File "D:\Python\lib\site-packages\tornado\concurrent.py", line 232, in result raise_exc_info(self._exc_info) File "D:\Python\lib\site-packages\tornado\gen.py", line 1014, in run yielded = self.gen.throw(*exc_info) File "D:\Python\lib\site-packages\ndscheduler\server\handlers\jobs.py", line 64, in get_jobs_yield return_json = yield self.get_jobs() File "D:\Python\lib\site-packages\tornado\gen.py", line 1008, in run value = future.result() File "D:\Python\lib\site-packages\concurrent\futures\_base.py", line 398, in result return self.__get_result() File "D:\Python\lib\site-packages\concurrent\futures\thread.py", line 55, in run result = self.fn(*self.args, **self.kwargs) File "D:\Python\lib\site-packages\ndscheduler\server\handlers\jobs.py", line 59, in get_jobs return self._get_jobs() File "D:\Python\lib\site-packages\ndscheduler\server\handlers\jobs.py", line 21, in _get_jobs jobs = self.scheduler_manager.get_jobs() File "D:\Python\lib\site-packages\ndscheduler\core\scheduler_manager.py", line 126, in get_jobs return self.sched.get_jobs() File "D:\Python\lib\site-packages\apscheduler\schedulers\base.py", line 458, in get_jobs jobs.extend(store.get_all_jobs()) File "D:\Python\lib\site-packages\apscheduler\jobstores\sqlalchemy.py", line 73, in get_all_jobs return self._get_jobs() File "D:\Python\lib\site-packages\apscheduler\jobstores\sqlalchemy.py", line 122, in _get_jobs for row in self.engine.execute(selectable): File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 1978, in execute return connection.execute(statement, *multiparams, **params) File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 914, in execute return meth(self, multiparams, params) File "D:\Python\lib\site-packages\sqlalchemy\sql\elements.py", line 323, in _execute_on_connection return connection._execute_clauseelement(self, multiparams, params) File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 1010, in _execute_clauseelement compiled_sql, distilled_params File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 1146, in _execute_context context) File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 1325, in _handle_dbapi_exception self._autorollback() File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 791, in _autorollback self._root._rollback_impl() File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 670, in _rollback_impl self._handle_dbapi_exception(e, None, None, None, None) File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 1263, in _handle_dbapi_exception exc_info File "D:\Python\lib\site-packages\sqlalchemy\util\compat.py", line 199, in raise_from_cause reraise(type(exception), exception, tb=exc_tb) File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 668, in _rollback_impl self.engine.dialect.do_rollback(self.connection) File "D:\Python\lib\site-packages\sqlalchemy\engine\default.py", line 412, in do_rollback dbapi_connection.rollback() File "pymssql.pyx", line 330, in pymssql.Connection.rollback (pymssql.c:5166) File "pymssql.pyx", line 314, in pymssql.Connection.rollback (pymssql.c:5089) File "_mssql.pyx", line 980, in _mssql.MSSQLConnection.execute_non_query (_mssql.c:11361) File "_mssql.pyx", line 1001, in _mssql.MSSQLConnection.execute_non_query (_mssql.c:11183) File "_mssql.pyx", line 1156, in _mssql.MSSQLConnection.format_and_run_query (_mssql.c:12584) File "_mssql.pyx", line 712, in _mssql.MSSQLConnection.cancel (_mssql.c:7494) File "_mssql.pyx", line 1635, in _mssql.assert_connected (_mssql.c:17597) DBAPIError: (_mssql.MSSQLDriverException) Not connected to any MS SQL server 2017-06-09 14:52:10.676 [ERROR] 500 GET /api/v1/jobs?Session=593AEE49-937285-002-D1VBO0QQIIQO (127.0.0.1) 24.00ms 2017-06-09 14:53:41.835 [INFO] 200 GET /api/v1/jobs?Session=593AEE49-937285-002-D1VBO0QQIIQO (127.0.0.1) 31.00ms 2017-06-09 14:54:01.448 [INFO] 200 GET /api/v1/executions?Session=593AEE49-937285-002-D1VBO0QQIIQO (127.0.0.1) 114.00ms 2017-06-09 14:54:10.542 [INFO] 200 GET /api/v1/executions?Session=593AEE49-937285-002-D1VBO0QQIIQO&time_range_start=2017-06-08T18%253A54%253A11.171Z&time_range_end=2017-06-09T18%253A54%253A11.171Z (127.0.0.1) 48.00ms 2017-06-09 14:54:51.127 [INFO] 200 GET /api/v1/jobs?Session=593AEE49-937285-002-D1VBO0QQIIQO (127.0.0.1) 21.00ms 2017-06-09 14:55:20.392 [INFO] found 1 zombie executions 2017-06-09 14:55:20.410 [INFO] marking zombie 28cdd6304d4511e78b480050569d4f34 as FAILED 2017-06-09 14:55:20.430 [INFO] Scheduler started 2017-06-09 14:55:20.464 [INFO] Running server at 127.0.0.1:8888 ... 2017-06-09 14:55:20.466 [INFO] *** You can access scheduler web ui at http://localhost:8888 *** 2017-06-09 14:55:20.569 [WARNING] Run time of job "Publishing Queue Check (Stub) (trigger: cron[month='*', day='*', day_of_week='*', hour='*', minute='*/2'], next run at: 2017-06-01 13:44:00 EDT)" was missed by 8 days, 1:11:20.570000 2017-06-09 14:55:20.586 [WARNING] Run time of job "Glossifier Refresh Service (trigger: cron[month='*', day='*', day_of_week='*', hour='23', minute='0'], next run at: 2017-06-01 23:00:00 EDT)" was missed by 7 days, 15:55:20.587000 2017-06-09 14:55:20.602 [WARNING] Run time of job "Translation Job Notification (trigger: cron[month='*', day='*', day_of_week='mon', hour='5', minute='0'], next run at: 2017-06-05 05:00:00 EDT)" was missed by 4 days, 9:55:20.603000 2017-06-09 14:55:29.053 [INFO] 200 GET /api/v1/executions?Session=593AEE49-937285-002-D1VBO0QQIIQO&time_range_start=2017-06-08T18%253A55%253A29.803Z&time_range_end=2017-06-09T18%253A55%253A29.803Z (127.0.0.1) 56.00ms 2017-06-09 14:55:40.022 [INFO] 200 GET /api/v1/executions?Session=593AEE49-937285-002-D1VBO0QQIIQO&time_range_start=2017-06-08T18%253A55%253A40.810Z&time_range_end=2017-06-09T18%253A55%253A40.810Z (127.0.0.1) 61.00ms 2017-06-09 14:55:42.542 [INFO] found 0 zombie executions 2017-06-09 14:55:42.545 [INFO] Scheduler started 2017-06-09 14:55:42.549 [INFO] Running server at 127.0.0.1:8888 ... 2017-06-09 14:55:42.549 [INFO] *** You can access scheduler web ui at http://localhost:8888 *** 2017-06-09 14:55:43.513 [INFO] 200 GET /api/v1/executions?Session=593AEE49-937285-002-D1VBO0QQIIQO&time_range_start=2017-06-08T18%253A55%253A44.105Z&time_range_end=2017-06-09T18%253A55%253A44.105Z (127.0.0.1) 57.00ms 2017-06-09 14:55:43.788 [INFO] found 0 zombie executions 2017-06-09 14:55:43.789 [INFO] Scheduler started 2017-06-09 14:55:45.596 [INFO] 200 GET /api/v1/executions?Session=593AEE49-937285-002-D1VBO0QQIIQO&time_range_start=2017-06-08T18%253A55%253A45.977Z&time_range_end=2017-06-09T18%253A55%253A45.977Z (127.0.0.1) 23.00ms 2017-06-09 14:55:46.430 [INFO] found 0 zombie executions 2017-06-09 14:55:46.433 [INFO] Scheduler started 2017-06-09 14:55:47.670 [INFO] 200 GET /api/v1/jobs?Session=593AEE49-937285-002-D1VBO0QQIIQO (127.0.0.1) 52.00ms 2017-06-09 14:55:48.637 [INFO] found 0 zombie executions 2017-06-09 14:55:48.638 [INFO] Scheduler started 2017-06-09 14:55:50.911 [INFO] found 0 zombie executions 2017-06-09 14:55:50.913 [INFO] Scheduler started 2017-06-09 14:55:53.098 [INFO] 200 POST /api/v1/executions/8945f2cf0b8811e797150050569d4f34 (127.0.0.1) 423.00ms 2017-06-09 14:55:53.165 [INFO] found 0 zombie executions 2017-06-09 14:55:53.167 [INFO] Scheduler started 2017-06-09 14:55:55.427 [INFO] found 0 zombie executions 2017-06-09 14:55:55.430 [INFO] Scheduler started 2017-06-09 14:55:57.691 [INFO] found 0 zombie executions 2017-06-09 14:55:57.694 [INFO] Scheduler started 2017-06-09 14:55:58.183 [INFO] 200 GET /api/v1/executions?Session=593AEE49-937285-002-D1VBO0QQIIQO (127.0.0.1) 102.00ms 2017-06-09 14:56:00.052 [INFO] Running job "Publishing Queue Check (Stub) (trigger: cron[month='*', day='*', day_of_week='*', hour='*', minute='*/2'], next run at: 2017-06-09 14:56:00 EDT)" (scheduled at 2017-06-09 14:56:00-04:00) 2017-06-09 14:56:00.216 [INFO] found 1 zombie executions 2017-06-09 14:56:00.220 [INFO] marking zombie 4731aed14d4511e789960050569d4f34 as FAILED 2017-06-09 14:56:00.224 [INFO] Job "Publishing Queue Check (Stub) (trigger: cron[month='*', day='*', day_of_week='*', hour='*', minute='*/2'], next run at: 2017-06-09 14:58:00 EDT)" executed successfully 2017-06-09 14:56:00.230 [INFO] Scheduler started 2017-06-09 14:56:02.522 [INFO] found 0 zombie executions 2017-06-09 14:56:02.523 [INFO] Scheduler started 2017-06-09 14:56:04.726 [INFO] found 0 zombie executions 2017-06-09 14:56:04.729 [INFO] Scheduler started 2017-06-09 14:56:06.894 [INFO] found 0 zombie executions 2017-06-09 14:56:06.894 [INFO] Scheduler started 2017-06-09 14:56:09.033 [INFO] found 0 zombie executions 2017-06-09 14:56:09.035 [INFO] Scheduler started So we can see that I was able to fire off the translation queue job successfully by hand, and the other jobs are able to run, but now we're back with the second instance of the scheduler trying to bind over and over again, every two or three seconds. Things continue like this for quite a while, filling the log with repeated pairs of "found 0 zombie executions" and "Scheduler started" lines. Every once in a while, we'd find a zombie to mark as failed: 2017-06-09 21:00:00.150 [INFO] found 1 zombie executions 2017-06-09 21:00:00.158 [INFO] marking zombie 20d69ec04d7811e7910d0050569d4f34 as FAILED That was for the file sweeper (I had forgotten, or perhaps never knew, that it was installed). There were a fair number of these file sweeper failures (once every three hours). I'll need to track that down with Volker. We continue on like this over the weekend, the second scheduler trying to get back in the game every two or three seconds, until all of a sudden, at 10 am Monday morning (June 12) we're back to a single scheduler chugging along as it should, running the stub job every two minutes. It's been like that ever since, so as of this writing (the end of Friday June 23) we've had over ten days of problem-free scheduling. We'll check back after the weekend. [Resumed analysis July 3] Some occasional successful web access to review job activity, for example on the morning of Tuesday, June 20: 2017-06-20 10:33:27.676 [INFO] 200 GET /?Session=59493166-58BB46-002-FXNA792GACKF (127.0.0.1) 657.00ms 2017-06-20 10:33:29.315 [INFO] 200 GET /api/v1/jobs?Session=59493166-58BB46-002-FXNA792GACKF (127.0.0.1) 90.00ms 2017-06-20 10:33:53.673 [INFO] 200 GET /api/v1/executions?Session=59493166-58BB46-002-FXNA792GACKF (127.0.0.1) 190.00ms 2017-06-20 10:34:00.088 [INFO] Running job "Publishing Queue Check (Stub) (trigger: cron[month='*', day='*', day_of_week='*', hour='*', minute='*/2'], next run at: 2017-06-20 10:36:00 EDT)" (scheduled at 2017-06-20 10:34:00-04:00) 2017-06-20 10:34:00.173 [INFO] Job "Publishing Queue Check (Stub) (trigger: cron[month='*', day='*', day_of_week='*', hour='*', minute='*/2'], next run at: 2017-06-20 10:36:00 EDT)" executed successfully 2017-06-20 10:34:07.303 [INFO] 200 POST /api/v1/executions/7629ae5e0d7411e78fa70050569d4f34 (127.0.0.1) 19838.00ms 2017-06-20 10:34:10.700 [INFO] 200 GET /api/v1/executions?Session=59493166-58BB46-002-FXNA792GACKF&time_range_start=2017-06-20T14%253A24%253A08.728Z&time_range_end=2017-06-20T14%253A34%253A08.728Z (127.0.0.1) 122.00ms Another database exception just after 7 pm on Saturday the 24th. 2017-06-24 19:02:02.061 [ERROR] Exception during reset or similar Traceback (most recent call last): File "D:\Python\lib\site-packages\sqlalchemy\pool.py", line 567, in _finalize_fairy fairy._reset(pool) File "D:\Python\lib\site-packages\sqlalchemy\pool.py", line 701, in _reset pool._dialect.do_rollback(self) File "D:\Python\lib\site-packages\sqlalchemy\engine\default.py", line 412, in do_rollback dbapi_connection.rollback() File "pymssql.pyx", line 330, in pymssql.Connection.rollback (pymssql.c:5166) File "pymssql.pyx", line 314, in pymssql.Connection.rollback (pymssql.c:5089) File "_mssql.pyx", line 980, in _mssql.MSSQLConnection.execute_non_query (_mssql.c:11361) File "_mssql.pyx", line 1001, in _mssql.MSSQLConnection.execute_non_query (_mssql.c:11183) File "_mssql.pyx", line 1156, in _mssql.MSSQLConnection.format_and_run_query (_mssql.c:12584) File "_mssql.pyx", line 712, in _mssql.MSSQLConnection.cancel (_mssql.c:7494) File "_mssql.pyx", line 1635, in _mssql.assert_connected (_mssql.c:17597) MSSQLDriverException: Not connected to any MS SQL server 2017-06-24 19:02:04.778 [ERROR] Exception in callback Traceback (most recent call last): File "D:\Python\lib\site-packages\tornado\ioloop.py", line 600, in _run_callback ret = callback() File "D:\Python\lib\site-packages\tornado\stack_context.py", line 275, in null_wrapper return fn(*args, **kwargs) File "D:\Python\lib\site-packages\apscheduler\schedulers\tornado.py", line 59, in wakeup wait_seconds = self._process_jobs() File "D:\Python\lib\site-packages\ndscheduler\core\scheduler\base.py", line 181, in _process_jobs return super(apscheduler_tornado.TornadoScheduler, self)._process_jobs() File "D:\Python\lib\site-packages\apscheduler\schedulers\base.py", line 801, in _process_jobs for job in jobstore.get_due_jobs(now): File "D:\Python\lib\site-packages\apscheduler\jobstores\sqlalchemy.py", line 64, in get_due_jobs return self._get_jobs(self.jobs_t.c.next_run_time <= timestamp) File "D:\Python\lib\site-packages\apscheduler\jobstores\sqlalchemy.py", line 122, in _get_jobs for row in self.engine.execute(selectable): File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 1978, in execute return connection.execute(statement, *multiparams, **params) File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 914, in execute return meth(self, multiparams, params) File "D:\Python\lib\site-packages\sqlalchemy\sql\elements.py", line 323, in _execute_on_connection return connection._execute_clauseelement(self, multiparams, params) File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 1010, in _execute_clauseelement compiled_sql, distilled_params File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 1146, in _execute_context context) File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 1325, in _handle_dbapi_exception self._autorollback() File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 791, in _autorollback self._root._rollback_impl() File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 670, in _rollback_impl self._handle_dbapi_exception(e, None, None, None, None) File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 1263, in _handle_dbapi_exception exc_info File "D:\Python\lib\site-packages\sqlalchemy\util\compat.py", line 199, in raise_from_cause reraise(type(exception), exception, tb=exc_tb) File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 668, in _rollback_impl self.engine.dialect.do_rollback(self.connection) File "D:\Python\lib\site-packages\sqlalchemy\engine\default.py", line 412, in do_rollback dbapi_connection.rollback() File "pymssql.pyx", line 330, in pymssql.Connection.rollback (pymssql.c:5166) File "pymssql.pyx", line 314, in pymssql.Connection.rollback (pymssql.c:5089) File "_mssql.pyx", line 980, in _mssql.MSSQLConnection.execute_non_query (_mssql.c:11361) File "_mssql.pyx", line 1001, in _mssql.MSSQLConnection.execute_non_query (_mssql.c:11183) File "_mssql.pyx", line 1156, in _mssql.MSSQLConnection.format_and_run_query (_mssql.c:12584) File "_mssql.pyx", line 712, in _mssql.MSSQLConnection.cancel (_mssql.c:7494) File "_mssql.pyx", line 1635, in _mssql.assert_connected (_mssql.c:17597) DBAPIError: (_mssql.MSSQLDriverException) Not connected to any MS SQL server The scheduler was silent from then until later than night when it was restarted at 11:23. 2017-06-24 23:23:42.503 [INFO] found 0 zombie executions 2017-06-24 23:23:42.533 [INFO] Scheduler started 2017-06-24 23:23:42.599 [INFO] Running server at 127.0.0.1:8888 ... 2017-06-24 23:23:42.607 [INFO] *** You can access scheduler web ui at http://localhost:8888 *** 2017-06-24 23:23:42.709 [WARNING] Run time of job "Publishing Queue Check (Stub) (trigger: cron[month='*', day='*', day_of_week='*', hour='*', minute='*/2'], next run at: 2017-06-24 19:02:00 EDT)" was missed by 4:21:42.709000 2017-06-24 23:23:43.025 [INFO] Running job "Glossifier Refresh Service (trigger: cron[month='*', day='*', day_of_week='*', hour='23', minute='0'], next run at: 2017-06-24 23:00:00 EDT)" (scheduled at 2017-06-24 23:00:00-04:00) 2017-06-24 23:24:00.826 [INFO] Running job "Publishing Queue Check (Stub) (trigger: cron[month='*', day='*', day_of_week='*', hour='*', minute='*/2'], next run at: 2017-06-24 23:24:00 EDT)" (scheduled at 2017-06-24 23:24:00-04:00) 2017-06-24 23:24:00.954 [INFO] Job "Publishing Queue Check (Stub) (trigger: cron[month='*', day='*', day_of_week='*', hour='*', minute='*/2'], next run at: 2017-06-24 23:26:00 EDT)" executed successfully 2017-06-24 23:24:45.127 [INFO] Job "Glossifier Refresh Service (trigger: cron[month='*', day='*', day_of_week='*', hour='23', minute='0'], next run at: 2017-06-25 23:00:00 EDT)" executed successfully Another successful run of the translation queue notification job on Monday the 26th: 2017-06-19 05:00:00.165 [INFO] Running job "Translation Job Notification (trigger: cron[month='*', day='*', day_of_week='mon', hour='5', minute='0'], next run at: 2017-06-19 05:00:00 EDT)" (scheduled at 2017-06-19 05:00:00-04:00) 2017-06-19 05:00:00.302 [INFO] Running job "Publishing Queue Check (Stub) (trigger: cron[month='*', day='*', day_of_week='*', hour='*', minute='*/2'], next run at: 2017-06-19 05:00:00 EDT)" (scheduled at 2017-06-19 05:00:00-04:00) 2017-06-19 05:00:00.720 [INFO] Job "Publishing Queue Check (Stub) (trigger: cron[month='*', day='*', day_of_week='*', hour='*', minute='*/2'], next run at: 2017-06-19 05:02:00 EDT)" executed successfully 2017-06-19 05:00:05.424 [INFO] Job "Translation Job Notification (trigger: cron[month='*', day='*', day_of_week='mon', hour='5', minute='0'], next run at: 2017-06-26 05:00:00 EDT)" executed successfully Ran successfully until 1:37 pm on Wednesday June 28, when the scheduler restarted itself in response to a modification of the cdrutil module, which was done by CBIIT when they forced an emergency replacement of the GP mailer database server. When the restart happened, the Windows Service Manager (WCM) lost the ability to communicate with the scheduler process, so it kept trying to start a second instance, which kept failing (because the instance which the scheduler itself restarted already had claimed the HTTP port (8888). 2017-06-28 13:37:22.046 [INFO] D:\cdr\Lib\Python\cdrutil.py modified; restarting server 2017-06-28 13:37:29.042 [INFO] found 0 zombie executions 2017-06-28 13:37:29.042 [INFO] Scheduler started 2017-06-28 13:37:29.043 [INFO] found 0 zombie executions 2017-06-28 13:37:29.046 [INFO] Scheduler started 2017-06-28 13:37:29.075 [INFO] Running server at 127.0.0.1:8888 ... 2017-06-28 13:37:29.076 [INFO] *** You can access scheduler web ui at http://localhost:8888 *** 2017-06-28 13:37:31.940 [INFO] found 0 zombie executions 2017-06-28 13:37:31.941 [INFO] Scheduler started 2017-06-28 13:37:37.892 [INFO] found 0 zombie executions 2017-06-28 13:37:37.894 [INFO] Scheduler started At 7:42 the next morning, the scheduler lost the connection to the database when trying to run the current jobs, and was unable to process any more jobs from that point on. However, the WCM continued trying to start a second instance of the scheduler every 2-3 seconds, until David Do stopped the service the next day in the WCM console. It's been dead ever since (as of July 3). 2017-06-29 07:42:00.059 [ERROR] Exception during reset or similar Traceback (most recent call last): File "D:\Python\lib\site-packages\sqlalchemy\pool.py", line 567, in _finalize_fairy fairy._reset(pool) File "D:\Python\lib\site-packages\sqlalchemy\pool.py", line 701, in _reset pool._dialect.do_rollback(self) File "D:\Python\lib\site-packages\sqlalchemy\engine\default.py", line 412, in do_rollback dbapi_connection.rollback() File "pymssql.pyx", line 330, in pymssql.Connection.rollback (pymssql.c:5166) File "pymssql.pyx", line 314, in pymssql.Connection.rollback (pymssql.c:5089) File "_mssql.pyx", line 980, in _mssql.MSSQLConnection.execute_non_query (_mssql.c:11361) File "_mssql.pyx", line 1001, in _mssql.MSSQLConnection.execute_non_query (_mssql.c:11183) File "_mssql.pyx", line 1156, in _mssql.MSSQLConnection.format_and_run_query (_mssql.c:12584) File "_mssql.pyx", line 712, in _mssql.MSSQLConnection.cancel (_mssql.c:7494) File "_mssql.pyx", line 1635, in _mssql.assert_connected (_mssql.c:17597) MSSQLDriverException: Not connected to any MS SQL server 2017-06-29 07:42:00.105 [ERROR] Exception in callback Traceback (most recent call last): File "D:\Python\lib\site-packages\tornado\ioloop.py", line 600, in _run_callback ret = callback() File "D:\Python\lib\site-packages\tornado\stack_context.py", line 275, in null_wrapper return fn(*args, **kwargs) File "D:\Python\lib\site-packages\apscheduler\schedulers\tornado.py", line 59, in wakeup wait_seconds = self._process_jobs() File "D:\Python\lib\site-packages\ndscheduler\core\scheduler\base.py", line 181, in _process_jobs return super(apscheduler_tornado.TornadoScheduler, self)._process_jobs() File "D:\Python\lib\site-packages\apscheduler\schedulers\base.py", line 801, in _process_jobs for job in jobstore.get_due_jobs(now): File "D:\Python\lib\site-packages\apscheduler\jobstores\sqlalchemy.py", line 64, in get_due_jobs return self._get_jobs(self.jobs_t.c.next_run_time <= timestamp) File "D:\Python\lib\site-packages\apscheduler\jobstores\sqlalchemy.py", line 122, in _get_jobs for row in self.engine.execute(selectable): File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 1978, in execute return connection.execute(statement, *multiparams, **params) File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 914, in execute return meth(self, multiparams, params) File "D:\Python\lib\site-packages\sqlalchemy\sql\elements.py", line 323, in _execute_on_connection return connection._execute_clauseelement(self, multiparams, params) File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 1010, in _execute_clauseelement compiled_sql, distilled_params File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 1146, in _execute_context context) File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 1325, in _handle_dbapi_exception self._autorollback() File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 791, in _autorollback self._root._rollback_impl() File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 670, in _rollback_impl self._handle_dbapi_exception(e, None, None, None, None) File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 1263, in _handle_dbapi_exception exc_info File "D:\Python\lib\site-packages\sqlalchemy\util\compat.py", line 199, in raise_from_cause reraise(type(exception), exception, tb=exc_tb) File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 668, in _rollback_impl self.engine.dialect.do_rollback(self.connection) File "D:\Python\lib\site-packages\sqlalchemy\engine\default.py", line 412, in do_rollback dbapi_connection.rollback() File "pymssql.pyx", line 330, in pymssql.Connection.rollback (pymssql.c:5166) File "pymssql.pyx", line 314, in pymssql.Connection.rollback (pymssql.c:5089) File "_mssql.pyx", line 980, in _mssql.MSSQLConnection.execute_non_query (_mssql.c:11361) File "_mssql.pyx", line 1001, in _mssql.MSSQLConnection.execute_non_query (_mssql.c:11183) File "_mssql.pyx", line 1156, in _mssql.MSSQLConnection.format_and_run_query (_mssql.c:12584) File "_mssql.pyx", line 712, in _mssql.MSSQLConnection.cancel (_mssql.c:7494) File "_mssql.pyx", line 1635, in _mssql.assert_connected (_mssql.c:17597) DBAPIError: (_mssql.MSSQLDriverException) Not connected to any MS SQL server 2017-06-29 07:42:02.000 [INFO] found 0 zombie executions 2017-06-29 07:42:02.002 [INFO] Scheduler started 2017-06-29 07:42:04.989 [INFO] found 0 zombie executions 2017-06-29 07:42:04.990 [INFO] Scheduler started 2017-06-29 07:42:07.924 [INFO] found 0 zombie executions .... 2017-06-30 14:29:51.546 [INFO] Scheduler started 2017-06-30 14:29:53.635 [INFO] found 0 zombie executions 2017-06-30 14:29:53.638 [INFO] Scheduler started 2017-06-30 14:29:55.742 [INFO] found 0 zombie executions 2017-06-30 14:29:55.743 [INFO] Scheduler started 2017-06-30 14:36:15.058 [INFO] 200 GET /?Session=595699EF-C820C4-002-EEUVPMDZ6QIS (127.0.0.1) 919.00ms 2017-06-30 14:36:18.592 [ERROR] Exception during reset or similar Traceback (most recent call last): File "D:\Python\lib\site-packages\sqlalchemy\pool.py", line 567, in _finalize_fairy fairy._reset(pool) File "D:\Python\lib\site-packages\sqlalchemy\pool.py", line 701, in _reset pool._dialect.do_rollback(self) File "D:\Python\lib\site-packages\sqlalchemy\engine\default.py", line 412, in do_rollback dbapi_connection.rollback() File "pymssql.pyx", line 330, in pymssql.Connection.rollback (pymssql.c:5166) File "pymssql.pyx", line 314, in pymssql.Connection.rollback (pymssql.c:5089) File "_mssql.pyx", line 980, in _mssql.MSSQLConnection.execute_non_query (_mssql.c:11361) File "_mssql.pyx", line 1001, in _mssql.MSSQLConnection.execute_non_query (_mssql.c:11183) File "_mssql.pyx", line 1156, in _mssql.MSSQLConnection.format_and_run_query (_mssql.c:12584) File "_mssql.pyx", line 712, in _mssql.MSSQLConnection.cancel (_mssql.c:7494) File "_mssql.pyx", line 1635, in _mssql.assert_connected (_mssql.c:17597) MSSQLDriverException: Not connected to any MS SQL server 2017-06-30 14:36:18.808 [ERROR] Uncaught exception GET /api/v1/jobs?Session=595699EF-C820C4-002-EEUVPMDZ6QIS (127.0.0.1) HTTPServerRequest(protocol='http', host='localhost:8888', method='GET', uri='/api/v1/jobs?Session=595699EF-C820C4-002-EEUVPMDZ6QIS', version='HTTP/1.1', remote_ip='127.0.0.1', headers={'Accept-Encoding': 'gzip, deflate', 'Host': 'localhost:8888', 'Connection': 'keep-alive', 'Accept': '*/*', 'User-Agent': 'python-requests/2.13.0'}) Traceback (most recent call last): File "D:\Python\lib\site-packages\tornado\web.py", line 1401, in _stack_context_handle_exception raise_exc_info((type, value, traceback)) File "D:\Python\lib\site-packages\tornado\stack_context.py", line 314, in wrapped ret = fn(*args, **kwargs) File "D:\Python\lib\site-packages\tornado\gen.py", line 196, in final_callback if future.result() is not None: File "D:\Python\lib\site-packages\tornado\concurrent.py", line 232, in result raise_exc_info(self._exc_info) File "D:\Python\lib\site-packages\tornado\gen.py", line 1014, in run yielded = self.gen.throw(*exc_info) File "D:\Python\lib\site-packages\ndscheduler\server\handlers\jobs.py", line 64, in get_jobs_yield return_json = yield self.get_jobs() File "D:\Python\lib\site-packages\tornado\gen.py", line 1008, in run value = future.result() File "D:\Python\lib\site-packages\concurrent\futures\_base.py", line 398, in result return self.__get_result() File "D:\Python\lib\site-packages\concurrent\futures\thread.py", line 55, in run result = self.fn(*self.args, **self.kwargs) File "D:\Python\lib\site-packages\ndscheduler\server\handlers\jobs.py", line 59, in get_jobs return self._get_jobs() File "D:\Python\lib\site-packages\ndscheduler\server\handlers\jobs.py", line 21, in _get_jobs jobs = self.scheduler_manager.get_jobs() File "D:\Python\lib\site-packages\ndscheduler\core\scheduler_manager.py", line 126, in get_jobs return self.sched.get_jobs() File "D:\Python\lib\site-packages\apscheduler\schedulers\base.py", line 458, in get_jobs jobs.extend(store.get_all_jobs()) File "D:\Python\lib\site-packages\apscheduler\jobstores\sqlalchemy.py", line 73, in get_all_jobs return self._get_jobs() File "D:\Python\lib\site-packages\apscheduler\jobstores\sqlalchemy.py", line 122, in _get_jobs for row in self.engine.execute(selectable): File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 1978, in execute return connection.execute(statement, *multiparams, **params) File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 914, in execute return meth(self, multiparams, params) File "D:\Python\lib\site-packages\sqlalchemy\sql\elements.py", line 323, in _execute_on_connection return connection._execute_clauseelement(self, multiparams, params) File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 1010, in _execute_clauseelement compiled_sql, distilled_params File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 1146, in _execute_context context) File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 1325, in _handle_dbapi_exception self._autorollback() File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 791, in _autorollback self._root._rollback_impl() File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 670, in _rollback_impl self._handle_dbapi_exception(e, None, None, None, None) File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 1263, in _handle_dbapi_exception exc_info File "D:\Python\lib\site-packages\sqlalchemy\util\compat.py", line 199, in raise_from_cause reraise(type(exception), exception, tb=exc_tb) File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 668, in _rollback_impl self.engine.dialect.do_rollback(self.connection) File "D:\Python\lib\site-packages\sqlalchemy\engine\default.py", line 412, in do_rollback dbapi_connection.rollback() File "pymssql.pyx", line 330, in pymssql.Connection.rollback (pymssql.c:5166) File "pymssql.pyx", line 314, in pymssql.Connection.rollback (pymssql.c:5089) File "_mssql.pyx", line 980, in _mssql.MSSQLConnection.execute_non_query (_mssql.c:11361) File "_mssql.pyx", line 1001, in _mssql.MSSQLConnection.execute_non_query (_mssql.c:11183) File "_mssql.pyx", line 1156, in _mssql.MSSQLConnection.format_and_run_query (_mssql.c:12584) File "_mssql.pyx", line 712, in _mssql.MSSQLConnection.cancel (_mssql.c:7494) File "_mssql.pyx", line 1635, in _mssql.assert_connected (_mssql.c:17597) DBAPIError: (_mssql.MSSQLDriverException) Not connected to any MS SQL server 2017-06-30 14:36:19.151 [ERROR] 500 GET /api/v1/jobs?Session=595699EF-C820C4-002-EEUVPMDZ6QIS (127.0.0.1) 1167.00ms 2017-06-30 14:36:29.657 [INFO] 200 GET /?Session=595699EF-C820C4-002-EEUVPMDZ6QIS (127.0.0.1) 65.00ms 2017-06-30 14:36:31.801 [ERROR] Exception during reset or similar Traceback (most recent call last): File "D:\Python\lib\site-packages\sqlalchemy\pool.py", line 567, in _finalize_fairy fairy._reset(pool) File "D:\Python\lib\site-packages\sqlalchemy\pool.py", line 701, in _reset pool._dialect.do_rollback(self) File "D:\Python\lib\site-packages\sqlalchemy\engine\default.py", line 412, in do_rollback dbapi_connection.rollback() File "pymssql.pyx", line 330, in pymssql.Connection.rollback (pymssql.c:5166) File "pymssql.pyx", line 314, in pymssql.Connection.rollback (pymssql.c:5089) File "_mssql.pyx", line 980, in _mssql.MSSQLConnection.execute_non_query (_mssql.c:11361) File "_mssql.pyx", line 1001, in _mssql.MSSQLConnection.execute_non_query (_mssql.c:11183) File "_mssql.pyx", line 1156, in _mssql.MSSQLConnection.format_and_run_query (_mssql.c:12584) File "_mssql.pyx", line 712, in _mssql.MSSQLConnection.cancel (_mssql.c:7494) File "_mssql.pyx", line 1635, in _mssql.assert_connected (_mssql.c:17597) MSSQLDriverException: Not connected to any MS SQL server 2017-06-30 14:36:31.808 [ERROR] Uncaught exception GET /api/v1/jobs?Session=595699EF-C820C4-002-EEUVPMDZ6QIS (127.0.0.1) HTTPServerRequest(protocol='http', host='localhost:8888', method='GET', uri='/api/v1/jobs?Session=595699EF-C820C4-002-EEUVPMDZ6QIS', version='HTTP/1.1', remote_ip='127.0.0.1', headers={'Accept-Encoding': 'gzip, deflate', 'Host': 'localhost:8888', 'Connection': 'keep-alive', 'Accept': '*/*', 'User-Agent': 'python-requests/2.13.0'}) Traceback (most recent call last): File "D:\Python\lib\site-packages\tornado\web.py", line 1401, in _stack_context_handle_exception raise_exc_info((type, value, traceback)) File "D:\Python\lib\site-packages\tornado\stack_context.py", line 314, in wrapped ret = fn(*args, **kwargs) File "D:\Python\lib\site-packages\tornado\gen.py", line 196, in final_callback if future.result() is not None: File "D:\Python\lib\site-packages\tornado\concurrent.py", line 232, in result raise_exc_info(self._exc_info) File "D:\Python\lib\site-packages\tornado\gen.py", line 1014, in run yielded = self.gen.throw(*exc_info) File "D:\Python\lib\site-packages\ndscheduler\server\handlers\jobs.py", line 64, in get_jobs_yield return_json = yield self.get_jobs() File "D:\Python\lib\site-packages\tornado\gen.py", line 1008, in run value = future.result() File "D:\Python\lib\site-packages\concurrent\futures\_base.py", line 398, in result return self.__get_result() File "D:\Python\lib\site-packages\concurrent\futures\thread.py", line 55, in run result = self.fn(*self.args, **self.kwargs) File "D:\Python\lib\site-packages\ndscheduler\server\handlers\jobs.py", line 59, in get_jobs return self._get_jobs() File "D:\Python\lib\site-packages\ndscheduler\server\handlers\jobs.py", line 21, in _get_jobs jobs = self.scheduler_manager.get_jobs() File "D:\Python\lib\site-packages\ndscheduler\core\scheduler_manager.py", line 126, in get_jobs return self.sched.get_jobs() File "D:\Python\lib\site-packages\apscheduler\schedulers\base.py", line 458, in get_jobs jobs.extend(store.get_all_jobs()) File "D:\Python\lib\site-packages\apscheduler\jobstores\sqlalchemy.py", line 73, in get_all_jobs return self._get_jobs() File "D:\Python\lib\site-packages\apscheduler\jobstores\sqlalchemy.py", line 122, in _get_jobs for row in self.engine.execute(selectable): File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 1978, in execute return connection.execute(statement, *multiparams, **params) File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 914, in execute return meth(self, multiparams, params) File "D:\Python\lib\site-packages\sqlalchemy\sql\elements.py", line 323, in _execute_on_connection return connection._execute_clauseelement(self, multiparams, params) File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 1010, in _execute_clauseelement compiled_sql, distilled_params File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 1146, in _execute_context context) File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 1325, in _handle_dbapi_exception self._autorollback() File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 791, in _autorollback self._root._rollback_impl() File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 670, in _rollback_impl self._handle_dbapi_exception(e, None, None, None, None) File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 1263, in _handle_dbapi_exception exc_info File "D:\Python\lib\site-packages\sqlalchemy\util\compat.py", line 199, in raise_from_cause reraise(type(exception), exception, tb=exc_tb) File "D:\Python\lib\site-packages\sqlalchemy\engine\base.py", line 668, in _rollback_impl self.engine.dialect.do_rollback(self.connection) File "D:\Python\lib\site-packages\sqlalchemy\engine\default.py", line 412, in do_rollback dbapi_connection.rollback() File "pymssql.pyx", line 330, in pymssql.Connection.rollback (pymssql.c:5166) File "pymssql.pyx", line 314, in pymssql.Connection.rollback (pymssql.c:5089) File "_mssql.pyx", line 980, in _mssql.MSSQLConnection.execute_non_query (_mssql.c:11361) File "_mssql.pyx", line 1001, in _mssql.MSSQLConnection.execute_non_query (_mssql.c:11183) File "_mssql.pyx", line 1156, in _mssql.MSSQLConnection.format_and_run_query (_mssql.c:12584) File "_mssql.pyx", line 712, in _mssql.MSSQLConnection.cancel (_mssql.c:7494) File "_mssql.pyx", line 1635, in _mssql.assert_connected (_mssql.c:17597) DBAPIError: (_mssql.MSSQLDriverException) Not connected to any MS SQL server 2017-06-30 14:36:31.839 [ERROR] 500 GET /api/v1/jobs?Session=595699EF-C820C4-002-EEUVPMDZ6QIS (127.0.0.1) 79.00ms 2017-06-30 14:36:39.415 [INFO] 200 GET /api/v1/jobs?Session=595699EF-C820C4-002-EEUVPMDZ6QIS (127.0.0.1) 689.00ms 2017-06-30 14:37:49.367 [INFO] 200 GET /api/v1/executions?Session=595699EF-C820C4-002-EEUVPMDZ6QIS (127.0.0.1) 312.00ms 2017-06-30 14:37:55.523 [INFO] 200 GET /api/v1/executions?Session=595699EF-C820C4-002-EEUVPMDZ6QIS&time_range_start=2017-06-29T18%253A37%253A54.458Z&time_range_end=2017-06-30T18%253A37%253A54.458Z (127.0.0.1) 60.00ms