After installing Deadline 9 (current version 9.0.2.0) we have a problem with starting up and shutting down slaves. I’m not sure if these problems are related. I need help with resolving both of them. Also they are not reproducible it looks more like a random problem, but it’s happening few times a day
The shutdown issue
I’ve noticed that after idle shutdown command is sent the slave hangs out and waits for long time before shutting down complete.
Launcher log:
2017-04-26 09:29:14: BEGIN - MACHINE_NAME/user
2017-04-26 09:29:14: Operating System: Windows 81
2017-04-26 09:29:14: CPU Architecture: x64
2017-04-26 09:29:14: CPUs: 12
2017-04-26 09:29:14: Video Card: NVIDIA GeForce 210
2017-04-26 09:29:14: Deadline Launcher 9.0 [v9.0.2.0 Release (89e276a72)]
2017-04-26 09:29:14: Launcher Thread - Launcher thread initializing...
2017-04-26 09:29:14: creating listening socket on port: 49242
2017-04-26 09:29:14: writing listening socket to launcher file: 49242
2017-04-26 09:29:14: Launcher Thread - Launcher thread listening on port 17090
2017-04-26 09:29:16: Local version file: C:\Program Files\Thinkbox\Deadline9\bin\Version
2017-04-26 09:29:16: Network version file: \\rb1\deadline\DeadlineRepository9\bin\Windows\64bit\Version
2017-04-26 09:29:16: Comparing version files...
2017-04-26 09:29:16: Version files match
2017-04-26 09:29:16: Launching Slave:
2017-04-26 09:29:20: Auto Configuration: Picking configuration based on: MACHINE_NAME / MACHINE_IP
2017-04-26 09:29:20: Auto Configuration: No auto configuration could be detected, using local configuration
2017-04-26 09:29:21: Launcher Thread - Remote Administration is now enabled
2017-04-26 09:29:21: Launcher Thread - Automatic Updates is now enabled
2017-04-26 09:35:18: ::ffff:PULSE_IP has connected
2017-04-26 09:35:18: Launcher Thread - Received command: OnLastTaskComplete ShutdownMachineIdle : : Power Management shut down idle machines : False : ~
2017-04-26 09:35:18: Sending command to slave: OnLastTaskComplete
2017-04-26 09:35:18: Got reply: MACHINE_NAME: Sent "OnLastTaskComplete ShutdownMachineIdle : : Power Management shut down idle machines : False " command. Result: "Connection Accepted.
2017-04-26 09:35:18: "
2017-04-26 09:35:18: Launcher Thread - OnLastTaskComplete: Set to ShutdownMachineIdle
2017-04-26 09:35:18: Launcher Thread - Responded with: Success
2017-04-26 09:35:19: Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:35:19: Launcher Thread - OnLastTaskComplete: Slave is still running
2017-04-26 09:35:24: Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:35:24: Launcher Thread - OnLastTaskComplete: Slave is still running
2017-04-26 09:35:29: Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:35:29: Launcher Thread - OnLastTaskComplete: Slave is still running
2017-04-26 09:35:34: Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:35:34: Launcher Thread - OnLastTaskComplete: Slave is still running
2017-04-26 09:35:40: Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:35:40: Launcher Thread - OnLastTaskComplete: Slave is still running
2017-04-26 09:35:45: Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:35:45: Launcher Thread - OnLastTaskComplete: Slave is still running
2017-04-26 09:35:50: Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:35:50: Launcher Thread - OnLastTaskComplete: Slave is still running
2017-04-26 09:35:55: Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:35:55: Launcher Thread - OnLastTaskComplete: Slave is still running
2017-04-26 09:36:00: Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:36:00: Launcher Thread - OnLastTaskComplete: Slave is still running
2017-04-26 09:36:05: Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:36:05: Launcher Thread - OnLastTaskComplete: Slave is still running
2017-04-26 09:36:10: Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:36:10: Launcher Thread - OnLastTaskComplete: Slave is still running
2017-04-26 09:36:15: Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:36:15: Launcher Thread - OnLastTaskComplete: Slave is still running
2017-04-26 09:36:20: Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:36:20: Launcher Thread - OnLastTaskComplete: Slave is still running
2017-04-26 09:36:25: Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:36:25: Launcher Thread - OnLastTaskComplete: Slave is still running
2017-04-26 09:36:30: Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:36:30: Launcher Thread - OnLastTaskComplete: Slave is still running
2017-04-26 09:36:35: Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:36:35: Launcher Thread - OnLastTaskComplete: Slave is still running
2017-04-26 09:36:40: Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:36:40: Launcher Thread - OnLastTaskComplete: Slave is still running
2017-04-26 09:36:45: Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:36:45: Launcher Thread - OnLastTaskComplete: Slave is still running
2017-04-26 09:36:50: Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:36:50: Launcher Thread - OnLastTaskComplete: Slave is still running
2017-04-26 09:36:55: Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:36:55: Launcher Thread - OnLastTaskComplete: Slave is still running
2017-04-26 09:37:01: Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:37:01: Launcher Thread - OnLastTaskComplete: Slave is still running
2017-04-26 09:37:06: Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:37:06: Launcher Thread - OnLastTaskComplete: Slave is still running
2017-04-26 09:37:11: Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:37:11: Launcher Thread - OnLastTaskComplete: Slave is still running
2017-04-26 09:37:16: Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:37:16: Launcher Thread - OnLastTaskComplete: Slave is still running
2017-04-26 09:37:21: Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:37:21: Launcher Thread - OnLastTaskComplete: Slave is still running
2017-04-26 09:37:26: Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:37:26: Launcher Thread - OnLastTaskComplete: Slave is still running
2017-04-26 09:37:31: Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:37:31: Launcher Thread - OnLastTaskComplete: Slave is still running
2017-04-26 09:37:36: Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:37:36: Launcher Thread - OnLastTaskComplete: Slave is still running
2017-04-26 09:37:41: Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:37:41: Launcher Thread - OnLastTaskComplete: Slave is still running
2017-04-26 09:37:46: Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:37:46: Launcher Thread - OnLastTaskComplete: Slave is still running
2017-04-26 09:37:51: Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:37:51: Launcher Thread - OnLastTaskComplete: Slave is still running
2017-04-26 09:37:56: Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:37:56: Launcher Thread - OnLastTaskComplete: Slave is still running
2017-04-26 09:38:01: Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:38:01: Launcher Thread - OnLastTaskComplete: Slave is still running
2017-04-26 09:38:06: Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:38:06: Launcher Thread - OnLastTaskComplete: Slave is still running
2017-04-26 09:38:11: Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:38:11: Launcher Thread - OnLastTaskComplete: Slave is still running
2017-04-26 09:38:16: Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:38:16: Launcher Thread - OnLastTaskComplete: Slave is still running
2017-04-26 09:38:22: Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:38:22: Launcher Thread - OnLastTaskComplete: Slave is still running
2017-04-26 09:38:27: Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:38:27: Launcher Thread - OnLastTaskComplete: Slave is still running
2017-04-26 09:38:32: Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:38:32: Launcher Thread - OnLastTaskComplete: Slave is still running
2017-04-26 09:38:37: Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:38:37: Launcher Thread - OnLastTaskComplete: Slave is still running
2017-04-26 09:38:42: Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:38:42: Launcher Thread - OnLastTaskComplete: Slave is still running
2017-04-26 09:38:47: Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:38:47: Launcher Thread - OnLastTaskComplete: Slave is still running
2017-04-26 09:38:52: Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:38:52: Launcher Thread - OnLastTaskComplete: Slave is still running
2017-04-26 09:38:57: Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:38:57: Launcher Thread - OnLastTaskComplete: Slave is still running
2017-04-26 09:39:02: Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:39:02: Launcher Thread - OnLastTaskComplete: Slave is still running
2017-04-26 09:39:07: Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:39:07: Launcher Thread - OnLastTaskComplete: Slave is still running
2017-04-26 09:39:12: Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:39:12: Launcher Thread - OnLastTaskComplete: Slave is still running
2017-04-26 09:39:17: Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:39:17: Launcher Thread - OnLastTaskComplete: Slave is still running
2017-04-26 09:39:22: Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:39:22: Launcher Thread - OnLastTaskComplete: Slave is still running
2017-04-26 09:39:27: Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:39:27: Launcher Thread - OnLastTaskComplete: Slave is still running
2017-04-26 09:39:32: Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:39:32: Launcher Thread - OnLastTaskComplete: Slave is still running
2017-04-26 09:39:37: Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:39:37: Launcher Thread - OnLastTaskComplete: Slave is still running
2017-04-26 09:39:43: Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:39:43: Launcher Thread - OnLastTaskComplete: Slave is still running
2017-04-26 09:39:48: Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:39:48: Launcher Thread - OnLastTaskComplete: Slave is still running
2017-04-26 09:39:53: Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:39:53: Launcher Thread - OnLastTaskComplete: Slave is still running
2017-04-26 09:39:58: Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:39:58: Launcher Thread - OnLastTaskComplete: Slave is still running
2017-04-26 09:40:03: Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:40:03: Launcher Thread - OnLastTaskComplete: Slave is still running
2017-04-26 09:40:08: Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:40:08: Launcher Thread - OnLastTaskComplete: Slave is still running
2017-04-26 09:40:13: Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:40:13: Launcher Thread - OnLastTaskComplete: Slave is still running
2017-04-26 09:40:18: Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:40:18: Launcher Thread - OnLastTaskComplete: Slave is still running
2017-04-26 09:40:23: Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:40:23: Launcher Thread - OnLastTaskComplete: Slave is still running
2017-04-26 09:40:28: Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:40:28: Launcher Thread - OnLastTaskComplete: Slave is still running
2017-04-26 09:40:33: Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:40:33: Launcher Thread - OnLastTaskComplete: Slave is still running
2017-04-26 09:40:38: Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:40:38: Launcher Thread - OnLastTaskComplete: Slave is still running
2017-04-26 09:40:43: Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:40:43: Launcher Thread - OnLastTaskComplete: Slave is still running
2017-04-26 09:40:48: Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:40:48: Launcher Thread - OnLastTaskComplete: Slave is still running
2017-04-26 09:40:53: Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:40:53: Launcher Thread - OnLastTaskComplete: Slave is still running
2017-04-26 09:40:58: Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:40:58: Launcher Thread - OnLastTaskComplete: Slave is still running
2017-04-26 09:41:04: Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:41:04: Launcher Thread - OnLastTaskComplete: Slave is still running
2017-04-26 09:41:09: Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:41:09: Launcher Thread - OnLastTaskComplete: Slave is still running
2017-04-26 09:41:14: Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:41:14: Launcher Thread - OnLastTaskComplete: Slave is still running
2017-04-26 09:41:19: Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:41:19: Launcher Thread - OnLastTaskComplete: Slave is still running
2017-04-26 09:41:24: Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:41:24: Launcher Thread - OnLastTaskComplete: Slave is still running
2017-04-26 09:41:29: Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:41:29: Launcher Thread - OnLastTaskComplete: Slave is still running
2017-04-26 09:41:34: Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:41:34: Launcher Thread - OnLastTaskComplete: Slave is still running
2017-04-26 09:41:39: Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:41:39: Launcher Thread - OnLastTaskComplete: Slave is still running
2017-04-26 09:41:44: Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:41:44: Launcher Thread - OnLastTaskComplete: Slave is still running
2017-04-26 09:41:49: Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:41:49: Launcher Thread - OnLastTaskComplete: Slave is still running
2017-04-26 09:41:54: Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:41:54: Launcher Thread - OnLastTaskComplete: Slave is still running
2017-04-26 09:41:59: Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:41:59: Launcher Thread - OnLastTaskComplete: Slave is still running
2017-04-26 09:42:04: Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:42:04: Launcher Thread - OnLastTaskComplete: Slave is still running
2017-04-26 09:42:09: Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:42:09: Launcher Thread - OnLastTaskComplete: Slave is still running
2017-04-26 09:42:14: Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:42:14: Launcher Thread - OnLastTaskComplete: Slave is still running
2017-04-26 09:42:19: Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:42:19: Launcher Thread - OnLastTaskComplete: Slave is still running
2017-04-26 09:42:25: Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:42:25: Launcher Thread - OnLastTaskComplete: Slave is still running
2017-04-26 09:42:30: Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:42:30: Launcher Thread - OnLastTaskComplete: Slave is still running
2017-04-26 09:42:35: Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:42:35: Launcher Thread - OnLastTaskComplete: Slave is still running
2017-04-26 09:42:40: Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:42:40: Launcher Thread - OnLastTaskComplete: Slave is still running
2017-04-26 09:42:45: Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:42:45: Launcher Thread - OnLastTaskComplete: Slave is still running
2017-04-26 09:42:50: Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:42:50: Launcher Thread - OnLastTaskComplete: Slave is still running
2017-04-26 09:42:55: Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:42:55: Launcher Thread - OnLastTaskComplete: Slave is still running
2017-04-26 09:43:00: Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:43:00: Launcher Thread - OnLastTaskComplete: Slave is still running
2017-04-26 09:43:05: Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:43:05: Launcher Thread - OnLastTaskComplete: Slave is still running
2017-04-26 09:43:10: Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:43:10: Launcher Thread - OnLastTaskComplete: Slave is still running
2017-04-26 09:43:15: Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:43:15: Launcher Thread - OnLastTaskComplete: Slave is still running
2017-04-26 09:43:20: Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:43:20: Launcher Thread - OnLastTaskComplete: Slave is still running
2017-04-26 09:43:25: Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:43:25: Launcher Thread - OnLastTaskComplete: Slave is still running
2017-04-26 09:43:30: Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:43:30: Launcher Thread - OnLastTaskComplete: Slave is still running
2017-04-26 09:43:35: Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:43:35: Launcher Thread - OnLastTaskComplete: Slave is still running
2017-04-26 09:43:40: Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:43:41: Launcher Thread - OnLastTaskComplete: Shutting down machine because all slaves have shutdown- Reason: Power Management shut down idle machines
Summary:
machine start at 9:29:14
after aroung 5 minutes at 9:35:18 the ShutdownMachineIdle command is received
from 09:35:19 to 09:43:41 launcher is waiting for slave to shut down
launcher is shutting down the computer
As you can see it took over 10 minute to fully shutdown the machine.
I have to split this post into parts due to maximum character restriction
2017-04-26 09:29:20: BEGIN - MACHINE_NAME\user
2017-04-26 09:29:20: Operating System: Windows 81
2017-04-26 09:29:20: CPU Architecture: x64
2017-04-26 09:29:20: CPUs: 12
2017-04-26 09:29:20: Video Card: NVIDIA GeForce 210
2017-04-26 09:29:20: Deadline Slave 9.0 [v9.0.2.0 Release (89e276a72)]
2017-04-26 09:29:25: Auto Configuration: No auto configuration for Repository Path could be detected, using local configuration
2017-04-26 09:29:26: Plugin sandboxing will not be used because it is disabled in the Repository Options.
2017-04-26 09:29:26: Info Thread - Created.
2017-04-26 09:29:27: WARNING: Current process is already in a Process Tree that does not allow breaking away. A new Process Tree will not be created.
2017-04-26 09:29:28: Auto Configuration: Picking configuration based on: MACHINE_NAME / MACHINE_IP
2017-04-26 09:29:28: Auto Configuration: No auto configuration could be detected, using local configuration
2017-04-26 09:29:28: Scheduler Thread - Slave initialization complete.
2017-04-26 09:29:28: Scheduler - Performing Job scan on Primary Pools with scheduling order Pool, Priority, First-in First-out
2017-04-26 09:29:28: Scheduler - Job chooser found no jobs.
2017-04-26 09:29:28: Scheduler - Performing Job scan on Secondary Pools with scheduling order Pool, Priority, First-in First-out
2017-04-26 09:29:28: Scheduler - Job chooser found no jobs.
2017-04-26 09:29:29: Scheduler Thread - Seconds before next job scan: 2
2017-04-26 09:29:32: Scheduler - Performing Job scan on Primary Pools with scheduling order Pool, Priority, First-in First-out
2017-04-26 09:29:32: Scheduler - Job chooser found no jobs.
2017-04-26 09:29:32: Scheduler - Performing Job scan on Secondary Pools with scheduling order Pool, Priority, First-in First-out
2017-04-26 09:29:32: Scheduler - Job chooser found no jobs.
** same message repeated over and over **
2017-04-26 09:35:12: Scheduler Thread - Seconds before next job scan: 2
2017-04-26 09:35:14: Scheduler - Performing Job scan on Primary Pools with scheduling order Pool, Priority, First-in First-out
2017-04-26 09:35:14: Scheduler - Job chooser found no jobs.
2017-04-26 09:35:14: Scheduler - Performing Job scan on Secondary Pools with scheduling order Pool, Priority, First-in First-out
2017-04-26 09:35:14: Scheduler - Job chooser found no jobs.
2017-04-26 09:35:15: Scheduler Thread - Seconds before next job scan: 2
2017-04-26 09:35:18: Scheduler - Performing Job scan on Primary Pools with scheduling order Pool, Priority, First-in First-out
2017-04-26 09:35:18: Scheduler - Job chooser found no jobs.
2017-04-26 09:35:18: Scheduler - Performing Job scan on Secondary Pools with scheduling order Pool, Priority, First-in First-out
2017-04-26 09:35:18: Scheduler - Job chooser found no jobs.
2017-04-26 09:35:18: Listener Thread - ::1 has connected
2017-04-26 09:35:18: Listener Thread - Received message: OnLastTaskComplete ShutdownMachineIdle : : Power Management shut down idle machines : False
2017-04-26 09:35:19: Scheduler Thread - Seconds before next job scan: 2
2017-04-26 09:35:19: Scheduler Thread - Current task complete, shutting down machine...
2017-04-26 09:35:19: Slave - slave shutdown: normal
2017-04-26 09:35:19: Info Thread - requesting slave info thread quit.
2017-04-26 09:35:19: Info Thread - shutdown complete
2017-04-26 09:35:19: Scheduler Thread - shutdown complete
2017-04-26 09:35:19: Listener Thread - ::1 has connected
2017-04-26 09:35:19: Listener Thread - Received message: SlaveStillRunning
2017-04-26 09:35:20: Slave - Final cleanup
2017-04-26 09:35:20: WARNING: Current process is already in a Process Tree that does not allow breaking away. A new Process Tree will not be created.
2017-04-26 09:35:24: Listener Thread - ::1 has connected
2017-04-26 09:35:24: Listener Thread - Received message: SlaveStillRunning
2017-04-26 09:35:29: Listener Thread - ::1 has connected
2017-04-26 09:35:29: Listener Thread - Received message: SlaveStillRunning
2017-04-26 09:35:34: Listener Thread - ::1 has connected
2017-04-26 09:35:34: Listener Thread - Received message: SlaveStillRunning
2017-04-26 09:35:40: Listener Thread - ::1 has connected
2017-04-26 09:35:40: Listener Thread - Received message: SlaveStillRunning
2017-04-26 09:35:45: Listener Thread - ::1 has connected
2017-04-26 09:35:45: Listener Thread - Received message: SlaveStillRunning
2017-04-26 09:35:50: Listener Thread - ::1 has connected
2017-04-26 09:35:50: Listener Thread - Received message: SlaveStillRunning
2017-04-26 09:35:55: Listener Thread - ::1 has connected
2017-04-26 09:35:55: Listener Thread - Received message: SlaveStillRunning
2017-04-26 09:36:00: Listener Thread - ::1 has connected
2017-04-26 09:36:00: Listener Thread - Received message: SlaveStillRunning
2017-04-26 09:36:05: Listener Thread - ::1 has connected
** same message repeated over and over **
2017-04-26 09:40:43: Listener Thread - Received message: SlaveStillRunning
2017-04-26 09:40:48: Listener Thread - ::1 has connected
2017-04-26 09:40:48: Listener Thread - Received message: SlaveStillRunning
2017-04-26 09:40:53: Listener Thread - ::1 has connected
2017-04-26 09:40:53: Listener Thread - Received message: SlaveStillRunning
2017-04-26 09:40:58: Listener Thread - ::1 has connected
2017-04-26 09:40:58: Listener Thread - Received message: SlaveStillRunning
2017-04-26 09:41:04: Listener Thread - ::1 has connected
2017-04-26 09:41:04: Listener Thread - Received message: SlaveStillRunning
2017-04-26 09:41:09: Listener Thread - ::1 has connected
2017-04-26 09:41:09: Listener Thread - Received message: SlaveStillRunning
2017-04-26 09:41:14: Listener Thread - ::1 has connected
2017-04-26 09:41:14: Listener Thread - Received message: SlaveStillRunning
2017-04-26 09:41:19: Listener Thread - ::1 has connected
2017-04-26 09:41:19: Listener Thread - Received message: SlaveStillRunning
2017-04-26 09:41:24: Listener Thread - ::1 has connected
2017-04-26 09:41:24: Listener Thread - Received message: SlaveStillRunning
2017-04-26 09:41:29: Listener Thread - ::1 has connected
2017-04-26 09:41:29: Listener Thread - Received message: SlaveStillRunning
2017-04-26 09:41:32: WARNING: Encountered the following error while initializing the Event Sandbox: 'The Event Sandbox process is alive but unresponsive.'.
2017-04-26 09:41:32: Falling back to embedded Event Manager -- this may result in stale Python environments being used.
2017-04-26 09:41:34: Listener Thread - ::1 has connected
2017-04-26 09:41:34: Listener Thread - Received message: SlaveStillRunning
2017-04-26 09:41:39: Listener Thread - ::1 has connected
2017-04-26 09:41:39: Listener Thread - Received message: SlaveStillRunning
2017-04-26 09:41:44: Listener Thread - ::1 has connected
2017-04-26 09:41:44: Listener Thread - Received message: SlaveStillRunning
2017-04-26 09:41:49: Listener Thread - ::1 has connected
2017-04-26 09:41:49: Listener Thread - Received message: SlaveStillRunning
2017-04-26 09:41:54: Listener Thread - ::1 has connected
2017-04-26 09:41:54: Listener Thread - Received message: SlaveStillRunning
2017-04-26 09:41:59: Listener Thread - ::1 has connected
2017-04-26 09:41:59: Listener Thread - Received message: SlaveStillRunning
2017-04-26 09:42:04: Listener Thread - ::1 has connected
2017-04-26 09:42:04: Listener Thread - Received message: SlaveStillRunning
2017-04-26 09:42:09: Listener Thread - ::1 has connected
2017-04-26 09:42:09: Listener Thread - Received message: SlaveStillRunning
2017-04-26 09:42:14: Listener Thread - ::1 has connected
2017-04-26 09:42:14: Listener Thread - Received message: SlaveStillRunning
2017-04-26 09:42:19: Listener Thread - ::1 has connected
2017-04-26 09:42:19: Listener Thread - Received message: SlaveStillRunning
2017-04-26 09:42:25: Listener Thread - ::1 has connected
2017-04-26 09:42:25: Listener Thread - Received message: SlaveStillRunning
2017-04-26 09:42:30: Listener Thread - ::1 has connected
2017-04-26 09:42:30: Listener Thread - Received message: SlaveStillRunning
2017-04-26 09:42:35: Listener Thread - ::1 has connected
2017-04-26 09:42:35: Listener Thread - Received message: SlaveStillRunning
2017-04-26 09:42:40: Listener Thread - ::1 has connected
2017-04-26 09:42:40: Listener Thread - Received message: SlaveStillRunning
2017-04-26 09:42:45: Listener Thread - ::1 has connected
2017-04-26 09:42:45: Listener Thread - Received message: SlaveStillRunning
2017-04-26 09:42:50: Listener Thread - ::1 has connected
2017-04-26 09:42:50: Listener Thread - Received message: SlaveStillRunning
2017-04-26 09:42:55: Listener Thread - ::1 has connected
2017-04-26 09:42:55: Listener Thread - Received message: SlaveStillRunning
2017-04-26 09:43:00: Listener Thread - ::1 has connected
2017-04-26 09:43:00: Listener Thread - Received message: SlaveStillRunning
2017-04-26 09:43:05: Listener Thread - ::1 has connected
2017-04-26 09:43:05: Listener Thread - Received message: SlaveStillRunning
2017-04-26 09:43:10: Listener Thread - ::1 has connected
2017-04-26 09:43:10: Listener Thread - Received message: SlaveStillRunning
2017-04-26 09:43:15: Listener Thread - ::1 has connected
2017-04-26 09:43:15: Listener Thread - Received message: SlaveStillRunning
2017-04-26 09:43:20: Listener Thread - ::1 has connected
2017-04-26 09:43:20: Listener Thread - Received message: SlaveStillRunning
2017-04-26 09:43:25: Listener Thread - ::1 has connected
2017-04-26 09:43:25: Listener Thread - Received message: SlaveStillRunning
2017-04-26 09:43:30: Listener Thread - ::1 has connected
2017-04-26 09:43:30: Listener Thread - Received message: SlaveStillRunning
2017-04-26 09:43:35: Listener Thread - ::1 has connected
2017-04-26 09:43:35: Listener Thread - Received message: SlaveStillRunning
2017-04-26 09:43:40: Listener Thread - OnConnect: Listener Socket has been closed.
2017-04-26 09:43:40: Slave - Shutdown complete
Summary:
Slave start at 9:29:20
Scanning for job
At 09:35:18 slave is receiving command OnLastTaskComplete ShutdownMachineIdle
launcher is checking if slave is still running
At 09:43:40 finally the slave is shutting down
This is the most interesting part
2017-04-26 09:41:32: WARNING: Encountered the following error while initializing the Event Sandbox: 'The Event Sandbox process is alive but unresponsive.'.
2017-04-26 09:41:32: Falling back to embedded Event Manager -- this may result in stale Python environments being used.
Here the problem is that the slave is not starting even though it is setup to start with the Launcher. We do not use multi slave configuration.
Launcher log
2017-04-26 10:38:29: BEGIN - MACHINE_NAME\user
2017-04-26 10:38:29: Operating System: Windows 81
2017-04-26 10:38:29: CPU Architecture: x64
2017-04-26 10:38:29: CPUs: 12
2017-04-26 10:38:29: Video Card: NVIDIA GeForce 210
2017-04-26 10:38:30: Deadline Launcher 9.0 [v9.0.2.0 Release (89e276a72)]
2017-04-26 10:38:30: Launcher Thread - Launcher thread initializing...
2017-04-26 10:38:30: creating listening socket on port: 49230
2017-04-26 10:38:30: writing listening socket to launcher file: 49230
2017-04-26 10:38:30: Launcher Thread - Launcher thread listening on port 17090
2017-04-26 10:38:31: Local version file: C:\Program Files\Thinkbox\Deadline9\bin\Version
2017-04-26 10:38:31: Network version file: \\rb1\deadline\DeadlineRepository9\bin\Windows\64bit\Version
2017-04-26 10:38:31: Comparing version files...
2017-04-26 10:38:31: Version files match
2017-04-26 10:38:31: Launching Slave:
2017-04-26 10:38:34: Auto Configuration: Picking configuration based on: MACHINE_NAME / MACHINE_IP
2017-04-26 10:38:34: Auto Configuration: No auto configuration could be detected, using local configuration
2017-04-26 10:38:35: Launcher Thread - Remote Administration is now enabled
2017-04-26 10:38:35: Launcher Thread - Automatic Updates is now enabled
2017-04-26 10:38:53: ::ffff:PULSE_IP has connected
2017-04-26 10:38:53: Launcher Thread - Received command: LaunchSlave ~
2017-04-26 10:38:53: Local version file: C:\Program Files\Thinkbox\Deadline9\bin\Version
2017-04-26 10:38:53: Network version file: \\rb1\deadline\DeadlineRepository9\bin\Windows\64bit\Version
2017-04-26 10:38:54: Comparing version files...
2017-04-26 10:38:54: Version files match
2017-04-26 10:38:54: Launcher Thread - Responded with: Success
Slave log
2017-04-26 10:38:33: BEGIN - MACHINE_NAME\user
2017-04-26 10:38:33: Operating System: Windows 81
2017-04-26 10:38:33: CPU Architecture: x64
2017-04-26 10:38:33: CPUs: 12
2017-04-26 10:38:33: Video Card: NVIDIA GeForce 210
2017-04-26 10:38:33: Deadline Slave 9.0 [v9.0.2.0 Release (89e276a72)]
2017-04-26 10:38:33: Another Slave named 'MACHINE_NAME' is already running. Use the '-name' setting to launch a slave with an alternative name. Slave will now shutdown.
2017-04-26 10:38:33: Slave - Final cleanup
2017-04-26 10:38:33: Slave - Shutdown complete
To resolve this I have to restart the machine manually. Sometimes a few times in a row.
I should add that nothing have change in the way our machines (slaves and servers) are setup. Both problems did not occur in DL8 and I have installed all the DL9 version (9.0.0.18, 9.0.1.1, 9.0.2.0). I appreciate any help with these issues.
I’m not sure on the shutdown problem at the moment, so I’m going to dive into the startup.
The “Another Slave named ‘MACHINE_NAME’ is already running” is usually caused by the interesting way we try to limit the machine to only running a single Slave at a time. To do that, when the Slave starts up we have the it update “c:\programdata\Thinkbox\Deadline[X]\slaves[Y].ini” where “[Y]” is the Slave name suffix. Mainly, they write their currently listening port number there. If the port number in that file is currently in use, we assume one is already running and have the Slave throw that error and close.
Now, part of the problem is that the Slave doesn’t delete or update that file when it shuts down, and part is that if there is an outbound connection on that port that is in use the Slave will assume it shouldn’t start up.
To test if that’s the problem, try removing the .ini file for your slave(s) or change the port number and see if they start up correctly.
Your solution is partially right. After deleting the file “.ini” file slave starts without problem, which is great. But there is no process with PID that was saved inside the “.ini” file, so the problem is elsewhere.
So it looks like the problem is not caused by the PID but by the SlaveListeningPort which is being occupied by the lsass.exe process. I don’t know why. Changing the port number in “.ini” file resolves the issue.
Interesting because I did not change the default port in deadline.ini file. We use default installation and default port is set to 17093 (SlaveStartupPort).
The one that is colliding with lsass.exe is defined in “.ini” file as SlaveListeningPort. What is the relation between SlaveStartupPort (from deadline.ini) and SlaveListeningPort (from .ini)?
Check the image
[attachment=0]slave_ports_settings.png[/attachment]
Notice:
there is not process with PID 3360
SlaveStartupPort in deadline.ini is set to default 17093
SlaveListeningPort in .ini is set to 49170 (why not 17093?)
lsass.exe is occupying port 49170 same as SlaveListeningPort which prevents deadlineslave from starting up
From my side it looks like this.
When slave starts it grabs settings from .ini file.
As you said it checks if PID 3360 exists. It’s not.
Slave tries to start listening on port 49169 even though is should use default 17093 or random (?).
The 49169 is occupied by lsass.exe so it exits with message that slave with given name is running.
Where did I made mistake?
Interesting question is why lsass.exe is occupying this port?
According to this https://msdn.microsoft.com/en-us/library/cc875824.aspx lsass can allocate random high TCP port. But why is it always colliding with deadlinesalve, which happens multiple time per day? What is the chance that two different processes will randomly select same port?
At least I have a solution (delete .ini file), but I’m that kind of person that I want to know why is it happening
Because the slave ini file is not deleted then after reboot the lsass.exe process has a chance to grab the port which was used in previous session. When slave starts it checks that there is something happening at this port number and exists with error. I’ve notice that dynamic port is selected sequentially starting from 49152, so the chance of conflict is a little higher.
uff that was fun
Question
Isn’t PID and process name enough to determine if slave is running or not? If there is no process with PID saved in .ini file why do you assume that the slave may still be listening on port stored in SlaveListeningPort?
I believe this is mostly a holdover from when we had a single Slave, which a statically configured port. You are correct that we should at least be able to use the PID to get a better idea of whether it’s actually a Slave running, or something else using that port (at which point we should pick a new port). I’m sure we could also make use of advisory file locking on the .ini file itself as a synchronization point, as well.
We’ll have a look at improving this behaviour, we definitely appreciate all the effort you put into figuring this out!