AWS Thinkbox Discussion Forums

Problem with slaves starting and shutting down

After installing Deadline 9 (current version 9.0.2.0) we have a problem with starting up and shutting down slaves. I’m not sure if these problems are related. I need help with resolving both of them. Also they are not reproducible it looks more like a random problem, but it’s happening few times a day

The shutdown issue

I’ve noticed that after idle shutdown command is sent the slave hangs out and waits for long time before shutting down complete.

Launcher log:

2017-04-26 09:29:14:  BEGIN - MACHINE_NAME/user
2017-04-26 09:29:14:  Operating System: Windows 81
2017-04-26 09:29:14:  CPU Architecture: x64
2017-04-26 09:29:14:  CPUs: 12
2017-04-26 09:29:14:  Video Card: NVIDIA GeForce 210 
2017-04-26 09:29:14:  Deadline Launcher 9.0 [v9.0.2.0 Release (89e276a72)]
2017-04-26 09:29:14:  Launcher Thread - Launcher thread initializing...
2017-04-26 09:29:14:  creating listening socket on port: 49242
2017-04-26 09:29:14:  writing listening socket to launcher file: 49242
2017-04-26 09:29:14:  Launcher Thread - Launcher thread listening on port 17090
2017-04-26 09:29:16:  Local version file: C:\Program Files\Thinkbox\Deadline9\bin\Version
2017-04-26 09:29:16:  Network version file: \\rb1\deadline\DeadlineRepository9\bin\Windows\64bit\Version
2017-04-26 09:29:16:  Comparing version files...
2017-04-26 09:29:16:  Version files match
2017-04-26 09:29:16:  Launching Slave: 
2017-04-26 09:29:20:  Auto Configuration: Picking configuration based on: MACHINE_NAME / MACHINE_IP
2017-04-26 09:29:20:  Auto Configuration: No auto configuration could be detected, using local configuration
2017-04-26 09:29:21:  Launcher Thread - Remote Administration is now enabled
2017-04-26 09:29:21:  Launcher Thread - Automatic Updates is now enabled
2017-04-26 09:35:18:  ::ffff:PULSE_IP has connected
2017-04-26 09:35:18:  Launcher Thread - Received command: OnLastTaskComplete ShutdownMachineIdle :  : Power Management shut down idle machines : False : ~
2017-04-26 09:35:18:  Sending command to slave: OnLastTaskComplete 
2017-04-26 09:35:18:  Got reply: MACHINE_NAME: Sent "OnLastTaskComplete ShutdownMachineIdle  :    :  Power Management shut down idle machines  :  False " command. Result: "Connection Accepted.
2017-04-26 09:35:18:  "
2017-04-26 09:35:18:  Launcher Thread - OnLastTaskComplete: Set to ShutdownMachineIdle
2017-04-26 09:35:18:  Launcher Thread - Responded with: Success
2017-04-26 09:35:19:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:35:19:  Launcher Thread - OnLastTaskComplete: Slave  is still running
2017-04-26 09:35:24:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:35:24:  Launcher Thread - OnLastTaskComplete: Slave  is still running
2017-04-26 09:35:29:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:35:29:  Launcher Thread - OnLastTaskComplete: Slave  is still running
2017-04-26 09:35:34:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:35:34:  Launcher Thread - OnLastTaskComplete: Slave  is still running
2017-04-26 09:35:40:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:35:40:  Launcher Thread - OnLastTaskComplete: Slave  is still running
2017-04-26 09:35:45:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:35:45:  Launcher Thread - OnLastTaskComplete: Slave  is still running
2017-04-26 09:35:50:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:35:50:  Launcher Thread - OnLastTaskComplete: Slave  is still running
2017-04-26 09:35:55:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:35:55:  Launcher Thread - OnLastTaskComplete: Slave  is still running
2017-04-26 09:36:00:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:36:00:  Launcher Thread - OnLastTaskComplete: Slave  is still running
2017-04-26 09:36:05:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:36:05:  Launcher Thread - OnLastTaskComplete: Slave  is still running
2017-04-26 09:36:10:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:36:10:  Launcher Thread - OnLastTaskComplete: Slave  is still running
2017-04-26 09:36:15:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:36:15:  Launcher Thread - OnLastTaskComplete: Slave  is still running
2017-04-26 09:36:20:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:36:20:  Launcher Thread - OnLastTaskComplete: Slave  is still running
2017-04-26 09:36:25:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:36:25:  Launcher Thread - OnLastTaskComplete: Slave  is still running
2017-04-26 09:36:30:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:36:30:  Launcher Thread - OnLastTaskComplete: Slave  is still running
2017-04-26 09:36:35:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:36:35:  Launcher Thread - OnLastTaskComplete: Slave  is still running
2017-04-26 09:36:40:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:36:40:  Launcher Thread - OnLastTaskComplete: Slave  is still running
2017-04-26 09:36:45:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:36:45:  Launcher Thread - OnLastTaskComplete: Slave  is still running
2017-04-26 09:36:50:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:36:50:  Launcher Thread - OnLastTaskComplete: Slave  is still running
2017-04-26 09:36:55:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:36:55:  Launcher Thread - OnLastTaskComplete: Slave  is still running
2017-04-26 09:37:01:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:37:01:  Launcher Thread - OnLastTaskComplete: Slave  is still running
2017-04-26 09:37:06:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:37:06:  Launcher Thread - OnLastTaskComplete: Slave  is still running
2017-04-26 09:37:11:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:37:11:  Launcher Thread - OnLastTaskComplete: Slave  is still running
2017-04-26 09:37:16:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:37:16:  Launcher Thread - OnLastTaskComplete: Slave  is still running
2017-04-26 09:37:21:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:37:21:  Launcher Thread - OnLastTaskComplete: Slave  is still running
2017-04-26 09:37:26:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:37:26:  Launcher Thread - OnLastTaskComplete: Slave  is still running
2017-04-26 09:37:31:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:37:31:  Launcher Thread - OnLastTaskComplete: Slave  is still running
2017-04-26 09:37:36:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:37:36:  Launcher Thread - OnLastTaskComplete: Slave  is still running
2017-04-26 09:37:41:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:37:41:  Launcher Thread - OnLastTaskComplete: Slave  is still running
2017-04-26 09:37:46:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:37:46:  Launcher Thread - OnLastTaskComplete: Slave  is still running
2017-04-26 09:37:51:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:37:51:  Launcher Thread - OnLastTaskComplete: Slave  is still running
2017-04-26 09:37:56:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:37:56:  Launcher Thread - OnLastTaskComplete: Slave  is still running
2017-04-26 09:38:01:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:38:01:  Launcher Thread - OnLastTaskComplete: Slave  is still running
2017-04-26 09:38:06:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:38:06:  Launcher Thread - OnLastTaskComplete: Slave  is still running
2017-04-26 09:38:11:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:38:11:  Launcher Thread - OnLastTaskComplete: Slave  is still running
2017-04-26 09:38:16:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:38:16:  Launcher Thread - OnLastTaskComplete: Slave  is still running
2017-04-26 09:38:22:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:38:22:  Launcher Thread - OnLastTaskComplete: Slave  is still running
2017-04-26 09:38:27:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:38:27:  Launcher Thread - OnLastTaskComplete: Slave  is still running
2017-04-26 09:38:32:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:38:32:  Launcher Thread - OnLastTaskComplete: Slave  is still running
2017-04-26 09:38:37:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:38:37:  Launcher Thread - OnLastTaskComplete: Slave  is still running
2017-04-26 09:38:42:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:38:42:  Launcher Thread - OnLastTaskComplete: Slave  is still running
2017-04-26 09:38:47:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:38:47:  Launcher Thread - OnLastTaskComplete: Slave  is still running
2017-04-26 09:38:52:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:38:52:  Launcher Thread - OnLastTaskComplete: Slave  is still running
2017-04-26 09:38:57:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:38:57:  Launcher Thread - OnLastTaskComplete: Slave  is still running
2017-04-26 09:39:02:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:39:02:  Launcher Thread - OnLastTaskComplete: Slave  is still running
2017-04-26 09:39:07:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:39:07:  Launcher Thread - OnLastTaskComplete: Slave  is still running
2017-04-26 09:39:12:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:39:12:  Launcher Thread - OnLastTaskComplete: Slave  is still running
2017-04-26 09:39:17:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:39:17:  Launcher Thread - OnLastTaskComplete: Slave  is still running
2017-04-26 09:39:22:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:39:22:  Launcher Thread - OnLastTaskComplete: Slave  is still running
2017-04-26 09:39:27:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:39:27:  Launcher Thread - OnLastTaskComplete: Slave  is still running
2017-04-26 09:39:32:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:39:32:  Launcher Thread - OnLastTaskComplete: Slave  is still running
2017-04-26 09:39:37:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:39:37:  Launcher Thread - OnLastTaskComplete: Slave  is still running
2017-04-26 09:39:43:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:39:43:  Launcher Thread - OnLastTaskComplete: Slave  is still running
2017-04-26 09:39:48:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:39:48:  Launcher Thread - OnLastTaskComplete: Slave  is still running
2017-04-26 09:39:53:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:39:53:  Launcher Thread - OnLastTaskComplete: Slave  is still running
2017-04-26 09:39:58:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:39:58:  Launcher Thread - OnLastTaskComplete: Slave  is still running
2017-04-26 09:40:03:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:40:03:  Launcher Thread - OnLastTaskComplete: Slave  is still running
2017-04-26 09:40:08:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:40:08:  Launcher Thread - OnLastTaskComplete: Slave  is still running
2017-04-26 09:40:13:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:40:13:  Launcher Thread - OnLastTaskComplete: Slave  is still running
2017-04-26 09:40:18:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:40:18:  Launcher Thread - OnLastTaskComplete: Slave  is still running
2017-04-26 09:40:23:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:40:23:  Launcher Thread - OnLastTaskComplete: Slave  is still running
2017-04-26 09:40:28:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:40:28:  Launcher Thread - OnLastTaskComplete: Slave  is still running
2017-04-26 09:40:33:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:40:33:  Launcher Thread - OnLastTaskComplete: Slave  is still running
2017-04-26 09:40:38:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:40:38:  Launcher Thread - OnLastTaskComplete: Slave  is still running
2017-04-26 09:40:43:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:40:43:  Launcher Thread - OnLastTaskComplete: Slave  is still running
2017-04-26 09:40:48:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:40:48:  Launcher Thread - OnLastTaskComplete: Slave  is still running
2017-04-26 09:40:53:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:40:53:  Launcher Thread - OnLastTaskComplete: Slave  is still running
2017-04-26 09:40:58:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:40:58:  Launcher Thread - OnLastTaskComplete: Slave  is still running
2017-04-26 09:41:04:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:41:04:  Launcher Thread - OnLastTaskComplete: Slave  is still running
2017-04-26 09:41:09:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:41:09:  Launcher Thread - OnLastTaskComplete: Slave  is still running
2017-04-26 09:41:14:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:41:14:  Launcher Thread - OnLastTaskComplete: Slave  is still running
2017-04-26 09:41:19:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:41:19:  Launcher Thread - OnLastTaskComplete: Slave  is still running
2017-04-26 09:41:24:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:41:24:  Launcher Thread - OnLastTaskComplete: Slave  is still running
2017-04-26 09:41:29:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:41:29:  Launcher Thread - OnLastTaskComplete: Slave  is still running
2017-04-26 09:41:34:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:41:34:  Launcher Thread - OnLastTaskComplete: Slave  is still running
2017-04-26 09:41:39:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:41:39:  Launcher Thread - OnLastTaskComplete: Slave  is still running
2017-04-26 09:41:44:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:41:44:  Launcher Thread - OnLastTaskComplete: Slave  is still running
2017-04-26 09:41:49:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:41:49:  Launcher Thread - OnLastTaskComplete: Slave  is still running
2017-04-26 09:41:54:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:41:54:  Launcher Thread - OnLastTaskComplete: Slave  is still running
2017-04-26 09:41:59:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:41:59:  Launcher Thread - OnLastTaskComplete: Slave  is still running
2017-04-26 09:42:04:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:42:04:  Launcher Thread - OnLastTaskComplete: Slave  is still running
2017-04-26 09:42:09:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:42:09:  Launcher Thread - OnLastTaskComplete: Slave  is still running
2017-04-26 09:42:14:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:42:14:  Launcher Thread - OnLastTaskComplete: Slave  is still running
2017-04-26 09:42:19:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:42:19:  Launcher Thread - OnLastTaskComplete: Slave  is still running
2017-04-26 09:42:25:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:42:25:  Launcher Thread - OnLastTaskComplete: Slave  is still running
2017-04-26 09:42:30:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:42:30:  Launcher Thread - OnLastTaskComplete: Slave  is still running
2017-04-26 09:42:35:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:42:35:  Launcher Thread - OnLastTaskComplete: Slave  is still running
2017-04-26 09:42:40:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:42:40:  Launcher Thread - OnLastTaskComplete: Slave  is still running
2017-04-26 09:42:45:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:42:45:  Launcher Thread - OnLastTaskComplete: Slave  is still running
2017-04-26 09:42:50:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:42:50:  Launcher Thread - OnLastTaskComplete: Slave  is still running
2017-04-26 09:42:55:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:42:55:  Launcher Thread - OnLastTaskComplete: Slave  is still running
2017-04-26 09:43:00:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:43:00:  Launcher Thread - OnLastTaskComplete: Slave  is still running
2017-04-26 09:43:05:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:43:05:  Launcher Thread - OnLastTaskComplete: Slave  is still running
2017-04-26 09:43:10:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:43:10:  Launcher Thread - OnLastTaskComplete: Slave  is still running
2017-04-26 09:43:15:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:43:15:  Launcher Thread - OnLastTaskComplete: Slave  is still running
2017-04-26 09:43:20:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:43:20:  Launcher Thread - OnLastTaskComplete: Slave  is still running
2017-04-26 09:43:25:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:43:25:  Launcher Thread - OnLastTaskComplete: Slave  is still running
2017-04-26 09:43:30:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:43:30:  Launcher Thread - OnLastTaskComplete: Slave  is still running
2017-04-26 09:43:35:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:43:35:  Launcher Thread - OnLastTaskComplete: Slave  is still running
2017-04-26 09:43:40:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2017-04-26 09:43:41:  Launcher Thread - OnLastTaskComplete: Shutting down machine because all slaves have shutdown- Reason:   Power Management shut down idle machines  

Summary:

  1. machine start at 9:29:14
  2. after aroung 5 minutes at 9:35:18 the ShutdownMachineIdle command is received
  3. from 09:35:19 to 09:43:41 launcher is waiting for slave to shut down
  4. launcher is shutting down the computer

As you can see it took over 10 minute to fully shutdown the machine.

I have to split this post into parts due to maximum character restriction

Slave log:

2017-04-26 09:29:20:  BEGIN - MACHINE_NAME\user
2017-04-26 09:29:20:  Operating System: Windows 81
2017-04-26 09:29:20:  CPU Architecture: x64
2017-04-26 09:29:20:  CPUs: 12
2017-04-26 09:29:20:  Video Card: NVIDIA GeForce 210 
2017-04-26 09:29:20:  Deadline Slave 9.0 [v9.0.2.0 Release (89e276a72)]
2017-04-26 09:29:25:  Auto Configuration: No auto configuration for Repository Path could be detected, using local configuration
2017-04-26 09:29:26:  Plugin sandboxing will not be used because it is disabled in the Repository Options.
2017-04-26 09:29:26:  Info Thread - Created.

2017-04-26 09:29:27:  WARNING: Current process is already in a Process Tree that does not allow breaking away. A new Process Tree will not be created.

2017-04-26 09:29:28:  Auto Configuration: Picking configuration based on: MACHINE_NAME / MACHINE_IP
2017-04-26 09:29:28:  Auto Configuration: No auto configuration could be detected, using local configuration
2017-04-26 09:29:28:  Scheduler Thread - Slave initialization complete.
2017-04-26 09:29:28:  Scheduler - Performing Job scan on Primary Pools with scheduling order Pool, Priority, First-in First-out
2017-04-26 09:29:28:  Scheduler - Job chooser found no jobs.
2017-04-26 09:29:28:  Scheduler - Performing Job scan on Secondary Pools with scheduling order Pool, Priority, First-in First-out
2017-04-26 09:29:28:  Scheduler - Job chooser found no jobs.
2017-04-26 09:29:29:  Scheduler Thread - Seconds before next job scan: 2
2017-04-26 09:29:32:  Scheduler - Performing Job scan on Primary Pools with scheduling order Pool, Priority, First-in First-out
2017-04-26 09:29:32:  Scheduler - Job chooser found no jobs.
2017-04-26 09:29:32:  Scheduler - Performing Job scan on Secondary Pools with scheduling order Pool, Priority, First-in First-out
2017-04-26 09:29:32:  Scheduler - Job chooser found no jobs.

** same message repeated over and over **

2017-04-26 09:35:12:  Scheduler Thread - Seconds before next job scan: 2
2017-04-26 09:35:14:  Scheduler - Performing Job scan on Primary Pools with scheduling order Pool, Priority, First-in First-out
2017-04-26 09:35:14:  Scheduler - Job chooser found no jobs.
2017-04-26 09:35:14:  Scheduler - Performing Job scan on Secondary Pools with scheduling order Pool, Priority, First-in First-out
2017-04-26 09:35:14:  Scheduler - Job chooser found no jobs.
2017-04-26 09:35:15:  Scheduler Thread - Seconds before next job scan: 2
2017-04-26 09:35:18:  Scheduler - Performing Job scan on Primary Pools with scheduling order Pool, Priority, First-in First-out
2017-04-26 09:35:18:  Scheduler - Job chooser found no jobs.
2017-04-26 09:35:18:  Scheduler - Performing Job scan on Secondary Pools with scheduling order Pool, Priority, First-in First-out
2017-04-26 09:35:18:  Scheduler - Job chooser found no jobs.
2017-04-26 09:35:18:  Listener Thread - ::1 has connected
2017-04-26 09:35:18:  Listener Thread - Received message: OnLastTaskComplete ShutdownMachineIdle  :    :  Power Management shut down idle machines  :  False 
2017-04-26 09:35:19:  Scheduler Thread - Seconds before next job scan: 2
2017-04-26 09:35:19:  Scheduler Thread - Current task complete, shutting down machine...
2017-04-26 09:35:19:  Slave - slave shutdown: normal
2017-04-26 09:35:19:  Info Thread - requesting slave info thread quit.
2017-04-26 09:35:19:  Info Thread - shutdown complete
2017-04-26 09:35:19:  Scheduler Thread - shutdown complete
2017-04-26 09:35:19:  Listener Thread - ::1 has connected
2017-04-26 09:35:19:  Listener Thread - Received message: SlaveStillRunning
2017-04-26 09:35:20:  Slave - Final cleanup

2017-04-26 09:35:20:  WARNING: Current process is already in a Process Tree that does not allow breaking away. A new Process Tree will not be created.

2017-04-26 09:35:24:  Listener Thread - ::1 has connected
2017-04-26 09:35:24:  Listener Thread - Received message: SlaveStillRunning
2017-04-26 09:35:29:  Listener Thread - ::1 has connected
2017-04-26 09:35:29:  Listener Thread - Received message: SlaveStillRunning
2017-04-26 09:35:34:  Listener Thread - ::1 has connected
2017-04-26 09:35:34:  Listener Thread - Received message: SlaveStillRunning
2017-04-26 09:35:40:  Listener Thread - ::1 has connected
2017-04-26 09:35:40:  Listener Thread - Received message: SlaveStillRunning
2017-04-26 09:35:45:  Listener Thread - ::1 has connected
2017-04-26 09:35:45:  Listener Thread - Received message: SlaveStillRunning
2017-04-26 09:35:50:  Listener Thread - ::1 has connected
2017-04-26 09:35:50:  Listener Thread - Received message: SlaveStillRunning
2017-04-26 09:35:55:  Listener Thread - ::1 has connected
2017-04-26 09:35:55:  Listener Thread - Received message: SlaveStillRunning
2017-04-26 09:36:00:  Listener Thread - ::1 has connected
2017-04-26 09:36:00:  Listener Thread - Received message: SlaveStillRunning
2017-04-26 09:36:05:  Listener Thread - ::1 has connected

** same message repeated over and over **

2017-04-26 09:40:43:  Listener Thread - Received message: SlaveStillRunning
2017-04-26 09:40:48:  Listener Thread - ::1 has connected
2017-04-26 09:40:48:  Listener Thread - Received message: SlaveStillRunning
2017-04-26 09:40:53:  Listener Thread - ::1 has connected
2017-04-26 09:40:53:  Listener Thread - Received message: SlaveStillRunning
2017-04-26 09:40:58:  Listener Thread - ::1 has connected
2017-04-26 09:40:58:  Listener Thread - Received message: SlaveStillRunning
2017-04-26 09:41:04:  Listener Thread - ::1 has connected
2017-04-26 09:41:04:  Listener Thread - Received message: SlaveStillRunning
2017-04-26 09:41:09:  Listener Thread - ::1 has connected
2017-04-26 09:41:09:  Listener Thread - Received message: SlaveStillRunning
2017-04-26 09:41:14:  Listener Thread - ::1 has connected
2017-04-26 09:41:14:  Listener Thread - Received message: SlaveStillRunning
2017-04-26 09:41:19:  Listener Thread - ::1 has connected
2017-04-26 09:41:19:  Listener Thread - Received message: SlaveStillRunning
2017-04-26 09:41:24:  Listener Thread - ::1 has connected
2017-04-26 09:41:24:  Listener Thread - Received message: SlaveStillRunning
2017-04-26 09:41:29:  Listener Thread - ::1 has connected
2017-04-26 09:41:29:  Listener Thread - Received message: SlaveStillRunning

2017-04-26 09:41:32:  WARNING: Encountered the following error while initializing the Event Sandbox: 'The Event Sandbox process is alive but unresponsive.'.

2017-04-26 09:41:32:  Falling back to embedded Event Manager -- this may result in stale Python environments being used.

2017-04-26 09:41:34:  Listener Thread - ::1 has connected
2017-04-26 09:41:34:  Listener Thread - Received message: SlaveStillRunning
2017-04-26 09:41:39:  Listener Thread - ::1 has connected
2017-04-26 09:41:39:  Listener Thread - Received message: SlaveStillRunning
2017-04-26 09:41:44:  Listener Thread - ::1 has connected
2017-04-26 09:41:44:  Listener Thread - Received message: SlaveStillRunning
2017-04-26 09:41:49:  Listener Thread - ::1 has connected
2017-04-26 09:41:49:  Listener Thread - Received message: SlaveStillRunning
2017-04-26 09:41:54:  Listener Thread - ::1 has connected
2017-04-26 09:41:54:  Listener Thread - Received message: SlaveStillRunning
2017-04-26 09:41:59:  Listener Thread - ::1 has connected
2017-04-26 09:41:59:  Listener Thread - Received message: SlaveStillRunning
2017-04-26 09:42:04:  Listener Thread - ::1 has connected
2017-04-26 09:42:04:  Listener Thread - Received message: SlaveStillRunning
2017-04-26 09:42:09:  Listener Thread - ::1 has connected
2017-04-26 09:42:09:  Listener Thread - Received message: SlaveStillRunning
2017-04-26 09:42:14:  Listener Thread - ::1 has connected
2017-04-26 09:42:14:  Listener Thread - Received message: SlaveStillRunning
2017-04-26 09:42:19:  Listener Thread - ::1 has connected
2017-04-26 09:42:19:  Listener Thread - Received message: SlaveStillRunning
2017-04-26 09:42:25:  Listener Thread - ::1 has connected
2017-04-26 09:42:25:  Listener Thread - Received message: SlaveStillRunning
2017-04-26 09:42:30:  Listener Thread - ::1 has connected
2017-04-26 09:42:30:  Listener Thread - Received message: SlaveStillRunning
2017-04-26 09:42:35:  Listener Thread - ::1 has connected
2017-04-26 09:42:35:  Listener Thread - Received message: SlaveStillRunning
2017-04-26 09:42:40:  Listener Thread - ::1 has connected
2017-04-26 09:42:40:  Listener Thread - Received message: SlaveStillRunning
2017-04-26 09:42:45:  Listener Thread - ::1 has connected
2017-04-26 09:42:45:  Listener Thread - Received message: SlaveStillRunning
2017-04-26 09:42:50:  Listener Thread - ::1 has connected
2017-04-26 09:42:50:  Listener Thread - Received message: SlaveStillRunning
2017-04-26 09:42:55:  Listener Thread - ::1 has connected
2017-04-26 09:42:55:  Listener Thread - Received message: SlaveStillRunning
2017-04-26 09:43:00:  Listener Thread - ::1 has connected
2017-04-26 09:43:00:  Listener Thread - Received message: SlaveStillRunning
2017-04-26 09:43:05:  Listener Thread - ::1 has connected
2017-04-26 09:43:05:  Listener Thread - Received message: SlaveStillRunning
2017-04-26 09:43:10:  Listener Thread - ::1 has connected
2017-04-26 09:43:10:  Listener Thread - Received message: SlaveStillRunning
2017-04-26 09:43:15:  Listener Thread - ::1 has connected
2017-04-26 09:43:15:  Listener Thread - Received message: SlaveStillRunning
2017-04-26 09:43:20:  Listener Thread - ::1 has connected
2017-04-26 09:43:20:  Listener Thread - Received message: SlaveStillRunning
2017-04-26 09:43:25:  Listener Thread - ::1 has connected
2017-04-26 09:43:25:  Listener Thread - Received message: SlaveStillRunning
2017-04-26 09:43:30:  Listener Thread - ::1 has connected
2017-04-26 09:43:30:  Listener Thread - Received message: SlaveStillRunning
2017-04-26 09:43:35:  Listener Thread - ::1 has connected
2017-04-26 09:43:35:  Listener Thread - Received message: SlaveStillRunning
2017-04-26 09:43:40:  Listener Thread - OnConnect: Listener Socket has been closed.
2017-04-26 09:43:40:  Slave - Shutdown complete

Summary:

  1. Slave start at 9:29:20
  2. Scanning for job
  3. At 09:35:18 slave is receiving command OnLastTaskComplete ShutdownMachineIdle
  4. launcher is checking if slave is still running
  5. At 09:43:40 finally the slave is shutting down

This is the most interesting part

2017-04-26 09:41:32:  WARNING: Encountered the following error while initializing the Event Sandbox: 'The Event Sandbox process is alive but unresponsive.'.
2017-04-26 09:41:32:  Falling back to embedded Event Manager -- this may result in stale Python environments being used.

I’ve manage to gather some additional information with ProcessExploerer

This is how the process tree is looking during the hang out

I’ve also recorded the stack trace of the processes (from top to bottom).

This is how it looks in ProcessExplorer and below you can find the text version for all the processes
conhost_thread_stack.png

deadlinelauncher.exe

ntdll.dll!NtWaitForMultipleObjects+0xa
KERNELBASE.dll!WaitForMultipleObjectsEx+0xed
USER32.dll!MsgWaitForMultipleObjectsEx+0x13f
Qt5Core.dll!QEventDispatcherWin32::processEvents+0x5b9
qwindows.dll+0x1fcd1
Qt5Core.dll!QEventLoop::exec+0x1cd
Qt5Core.dll!QCoreApplication::exec+0x189
QtWidgets.pyd!initQtWidgets+0x311f7
python27.DLL!PyCFunction_Call+0x69
python27.DLL!PyEval_GetFuncDesc+0x364
python27.DLL!PyEval_EvalFrameEx+0x2c7f
python27.DLL!PyEval_GetFuncDesc+0x518
python27.DLL!PyEval_GetFuncDesc+0x3da
python27.DLL!PyEval_EvalFrameEx+0x2c7f
python27.DLL!PyEval_GetFuncDesc+0x518
python27.DLL!PyEval_GetFuncDesc+0x3da
python27.DLL!PyEval_EvalFrameEx+0x2c7f
python27.DLL!PyEval_EvalCodeEx+0x8e8
python27.DLL!PyImport_ExecCodeModuleEx+0x1f5
[Native Frame: IL Method without Metadata]
[Managed to Unmanaged Transition]
Python.Runtime.dll!Python.Runtime.PythonEngine.ModuleFromString+0xcb
franticx.dll!FranticX.Scripting.PythonNetScriptEngine.ExecuteFile+0x6d
deadline.dll!Deadline.Scripting.DeadlineScriptManager.CreateModuleFromFile+0x45
deadlinelauncher.exe!a.a+0x7ca
[Unmanaged to Managed Transition]
clr.dll!DllCanUnloadNowInternal+0x543
clr.dll!DllCanUnloadNowInternal+0x44f
clr.dll!GetMetaDataPublicInterfaceFromInternal+0x1f945
clr.dll!GetMetaDataInternalInterfaceFromPublic+0x334d3
clr.dll!GetMetaDataInternalInterfaceFromPublic+0x33606
clr.dll!GetMetaDataInternalInterfaceFromPublic+0x33e2e
clr.dll!GetMetaDataInternalInterfaceFromPublic+0x33ed7
clr.dll!GetMetaDataInternalInterfaceFromPublic+0x337ca
clr.dll!CorExeMain+0x14
mscoreei.dll!CorExeMain+0x5d
MSCOREE.DLL!CorExeMain+0xcb
KERNEL32.dll!BaseThreadInitThunk+0xd
ntdll.dll!RtlUserThreadStart+0x34

deadlineslave.exe - stack

ntdll.dll!NtDelayExecution+0xa
KERNELBASE.dll!SleepEx+0xaa
clr.dll!DllCanUnloadNowInternal+0x403
clr.dll!GetMetaDataPublicInterfaceFromInternal+0x16dd4
clr.dll!GetMetaDataPublicInterfaceFromInternal+0x16edd
[Managed to Unmanaged Transition]
mscorlib.dll!System.Threading.Thread.Sleep+0xa
deadline.dll!Deadline.Net.DeadlineMessageUtils.a+0x41
deadline.dll!Deadline.Net.DeadlineMessageUtils.ReceiveDeadlineMessage+0x18
deadline.dll!Deadline.Events.SandboxedEventManager.a+0x163
deadline.dll!Deadline.Events.SandboxedEventManager.OnSlaveInfoUpdated+0xf0
deadline.dll!Deadline.Slaves.Slave.ReportSlaveInfo+0x1382
deadline.dll!Deadline.Slaves.Slave.UpdateMode+0x4b1
[Unmanaged to Managed Transition]
clr.dll!DllCanUnloadNowInternal+0x543
clr.dll!DllCanUnloadNowInternal+0x44f
clr.dll!DllUnregisterServerInternal+0x27b0a
clr.dll!DllUnregisterServerInternal+0x279a7
[Managed to Unmanaged Transition]
mscorlib.dll!System.Reflection.RuntimeMethodInfo.UnsafeInvokeInternal+0x104
mscorlib.dll!System.Reflection.RuntimeMethodInfo.Invoke+0x92
Python.Runtime.dll!Python.Runtime.MethodBinder.Invoke+0xfd
Python.Runtime.dll!Python.Runtime.MethodObject.Invoke+0x25
Python.Runtime.dll!Python.Runtime.MethodBinding.tp_call+0x33d
[Unmanaged to Managed Transition]
[Native Frame: IL Method without Metadata]
clr.dll!GetMetaDataPublicInterfaceFromInternal+0xbd5e
python27.DLL!PyObject_Call+0x86
python27.DLL!PyEval_GetFuncDesc+0x971
python27.DLL!PyEval_GetFuncDesc+0x3e7
python27.DLL!PyEval_EvalFrameEx+0x2c7f
python27.DLL!PyEval_GetFuncDesc+0x518
python27.DLL!PyEval_GetFuncDesc+0x3da
python27.DLL!PyEval_EvalFrameEx+0x2c7f
python27.DLL!PyEval_GetFuncDesc+0x518
python27.DLL!PyEval_GetFuncDesc+0x3da
python27.DLL!PyEval_EvalFrameEx+0x2c7f
python27.DLL!PyEval_EvalCodeEx+0x8e8
python27.DLL!PyFunction_SetClosure+0xbaa
python27.DLL!PyObject_Call+0x86
python27.DLL!PyMethod_New+0xb14
python27.DLL!PyObject_Call+0x86
python27.DLL!PyEval_CallObjectWithKeywords+0x144
QtCore.pyd!initQtCore+0x124ab6
QtCore.pyd!initQtCore+0x124f61
QtCore.pyd!initQtCore+0x125014
QtCore.pyd!initQtCore+0x125115
QtCore.pyd!initQtCore+0x125206
Qt5Core.dll!QMetaObject::activate+0x622
Qt5Core.dll!QTimer::timeout+0x16
Qt5Core.dll!QTimer::timerEvent+0x37
QtCore.pyd!initQtCore+0xeae5a
Qt5Core.dll!QObject::event+0x59
QtCore.pyd!initQtCore+0xeadda
Qt5Widgets.dll!QApplicationPrivate::notify_helper+0xf6
Qt5Widgets.dll!QApplication::notify+0x1eca
QtWidgets.pyd!initQtWidgets+0x2eef8
Qt5Core.dll!QCoreApplication::notifyInternal+0x6f
Qt5Core.dll!QEventDispatcherWin32Private::sendTimerEvent+0xd8
Qt5Core.dll!QEventDispatcherWin32Private::QEventDispatcherWin32Private+0x342
USER32.dll!DispatchMessageW+0x154
USER32.dll!IsThreadDesktopComposited+0x1f7
Qt5Core.dll!QEventDispatcherWin32::processEvents+0x426
qwindows.dll+0x1fcd1
Qt5Core.dll!QEventLoop::exec+0x1cd
Qt5Core.dll!QCoreApplication::exec+0x189
QtWidgets.pyd!initQtWidgets+0x311f7
python27.DLL!PyCFunction_Call+0x69
python27.DLL!PyEval_GetFuncDesc+0x364
python27.DLL!PyEval_EvalFrameEx+0x2c7f
python27.DLL!PyEval_GetFuncDesc+0x518
python27.DLL!PyEval_GetFuncDesc+0x3da
python27.DLL!PyEval_EvalFrameEx+0x2c7f
python27.DLL!PyEval_GetFuncDesc+0x518
python27.DLL!PyEval_GetFuncDesc+0x3da
python27.DLL!PyEval_EvalFrameEx+0x2c7f
python27.DLL!PyEval_EvalCodeEx+0x8e8
python27.DLL!PyImport_ExecCodeModuleEx+0x1f5
[Native Frame: IL Method without Metadata]
[Managed to Unmanaged Transition]
Python.Runtime.dll!Python.Runtime.PythonEngine.ModuleFromString+0xcb
franticx.dll!FranticX.Scripting.PythonNetScriptEngine.ExecuteFile+0x6d
deadline.dll!Deadline.Scripting.DeadlineScriptManager.CreateModuleFromFile+0x45
deadlineslave.exe!a.a+0xef0
[Unmanaged to Managed Transition]
clr.dll!DllCanUnloadNowInternal+0x543
clr.dll!DllCanUnloadNowInternal+0x44f
clr.dll!GetMetaDataPublicInterfaceFromInternal+0x1f945
clr.dll!GetMetaDataInternalInterfaceFromPublic+0x334d3
clr.dll!GetMetaDataInternalInterfaceFromPublic+0x33606
clr.dll!GetMetaDataInternalInterfaceFromPublic+0x33e2e
clr.dll!GetMetaDataInternalInterfaceFromPublic+0x33ed7
clr.dll!GetMetaDataInternalInterfaceFromPublic+0x337ca
clr.dll!CorExeMain+0x14
mscoreei.dll!CorExeMain+0x5d
MSCOREE.DLL!CorExeMain+0xcb
KERNEL32.dll!BaseThreadInitThunk+0xd
ntdll.dll!RtlUserThreadStart+0x34

deadlinesandbox.exe stack

ntdll.dll!NtDelayExecution+0xa
KERNELBASE.dll!SleepEx+0xaa
clr.dll!DllCanUnloadNowInternal+0x403
clr.dll!GetMetaDataPublicInterfaceFromInternal+0x16dd4
clr.dll!GetMetaDataPublicInterfaceFromInternal+0x16edd
[Managed to Unmanaged Transition]
mscorlib.dll!System.Threading.Thread.Sleep+0xa
deadline.dll!Deadline.Slaves.CommandListener.b+0x12e
deadline.dll!Deadline.Slaves.CommandListener.c+0x235
deadlinesandbox.exe!a.a+0x254
[Unmanaged to Managed Transition]
clr.dll!DllCanUnloadNowInternal+0x543
clr.dll!DllCanUnloadNowInternal+0x44f
clr.dll!GetMetaDataPublicInterfaceFromInternal+0x1f945
clr.dll!GetMetaDataInternalInterfaceFromPublic+0x334d3
clr.dll!GetMetaDataInternalInterfaceFromPublic+0x33606
clr.dll!GetMetaDataInternalInterfaceFromPublic+0x33e2e
clr.dll!GetMetaDataInternalInterfaceFromPublic+0x33ed7
clr.dll!GetMetaDataInternalInterfaceFromPublic+0x337ca
clr.dll!CorExeMain+0x14
mscoreei.dll!CorExeMain+0x5d
MSCOREE.DLL!CorExeMain+0xcb
KERNEL32.dll!BaseThreadInitThunk+0xd
ntdll.dll!RtlUserThreadStart+0x34

conhost.exe

ntoskrnl.exe!KeSynchronizeExecution+0x2246
ntoskrnl.exe!KeWaitForMultipleObjects+0x123e
ntoskrnl.exe!KeWaitForMultipleObjects+0xcb9
ntoskrnl.exe!KeWaitForMutexObject+0x2c0
ntoskrnl.exe!_misaligned_access+0x7c
ntoskrnl.exe!_misaligned_access+0xca9
ntoskrnl.exe!KeWaitForMultipleObjects+0x140f
ntoskrnl.exe!KeWaitForMultipleObjects+0xcb9
ntoskrnl.exe!KeWaitForMutexObject+0x2c0
ntoskrnl.exe!NtWaitForSingleObject+0xb2
ntoskrnl.exe!setjmpex+0x34a3
ntdll.dll!ZwWaitForSingleObject+0xa
conhost.exe+0x1155
KERNEL32.DLL!BaseThreadInitThunk+0xd
ntdll.dll!RtlUserThreadStart+0x34

The startup issue

Here the problem is that the slave is not starting even though it is setup to start with the Launcher. We do not use multi slave configuration.

Launcher log

2017-04-26 10:38:29:  BEGIN - MACHINE_NAME\user
2017-04-26 10:38:29:  Operating System: Windows 81
2017-04-26 10:38:29:  CPU Architecture: x64
2017-04-26 10:38:29:  CPUs: 12
2017-04-26 10:38:29:  Video Card: NVIDIA GeForce 210 
2017-04-26 10:38:30:  Deadline Launcher 9.0 [v9.0.2.0 Release (89e276a72)]
2017-04-26 10:38:30:  Launcher Thread - Launcher thread initializing...
2017-04-26 10:38:30:  creating listening socket on port: 49230
2017-04-26 10:38:30:  writing listening socket to launcher file: 49230
2017-04-26 10:38:30:  Launcher Thread - Launcher thread listening on port 17090
2017-04-26 10:38:31:  Local version file: C:\Program Files\Thinkbox\Deadline9\bin\Version
2017-04-26 10:38:31:  Network version file: \\rb1\deadline\DeadlineRepository9\bin\Windows\64bit\Version
2017-04-26 10:38:31:  Comparing version files...
2017-04-26 10:38:31:  Version files match
2017-04-26 10:38:31:  Launching Slave: 
2017-04-26 10:38:34:  Auto Configuration: Picking configuration based on: MACHINE_NAME / MACHINE_IP
2017-04-26 10:38:34:  Auto Configuration: No auto configuration could be detected, using local configuration
2017-04-26 10:38:35:  Launcher Thread - Remote Administration is now enabled
2017-04-26 10:38:35:  Launcher Thread - Automatic Updates is now enabled
2017-04-26 10:38:53:  ::ffff:PULSE_IP has connected
2017-04-26 10:38:53:  Launcher Thread - Received command: LaunchSlave ~
2017-04-26 10:38:53:  Local version file: C:\Program Files\Thinkbox\Deadline9\bin\Version
2017-04-26 10:38:53:  Network version file: \\rb1\deadline\DeadlineRepository9\bin\Windows\64bit\Version
2017-04-26 10:38:54:  Comparing version files...
2017-04-26 10:38:54:  Version files match
2017-04-26 10:38:54:  Launcher Thread - Responded with: Success

Slave log

2017-04-26 10:38:33:  BEGIN - MACHINE_NAME\user
2017-04-26 10:38:33:  Operating System: Windows 81
2017-04-26 10:38:33:  CPU Architecture: x64
2017-04-26 10:38:33:  CPUs: 12
2017-04-26 10:38:33:  Video Card: NVIDIA GeForce 210 
2017-04-26 10:38:33:  Deadline Slave 9.0 [v9.0.2.0 Release (89e276a72)]
2017-04-26 10:38:33:  Another Slave named 'MACHINE_NAME' is already running. Use the '-name' setting to launch a slave with an alternative name. Slave will now shutdown.
2017-04-26 10:38:33:  Slave - Final cleanup
2017-04-26 10:38:33:  Slave - Shutdown complete

To resolve this I have to restart the machine manually. Sometimes a few times in a row.

I should add that nothing have change in the way our machines (slaves and servers) are setup. Both problems did not occur in DL8 and I have installed all the DL9 version (9.0.0.18, 9.0.1.1, 9.0.2.0). I appreciate any help with these issues.

I’m not sure on the shutdown problem at the moment, so I’m going to dive into the startup.

The “Another Slave named ‘MACHINE_NAME’ is already running” is usually caused by the interesting way we try to limit the machine to only running a single Slave at a time. To do that, when the Slave starts up we have the it update “c:\programdata\Thinkbox\Deadline[X]\slaves[Y].ini” where “[Y]” is the Slave name suffix. Mainly, they write their currently listening port number there. If the port number in that file is currently in use, we assume one is already running and have the Slave throw that error and close.

Now, part of the problem is that the Slave doesn’t delete or update that file when it shuts down, and part is that if there is an outbound connection on that port that is in use the Slave will assume it shouldn’t start up.

To test if that’s the problem, try removing the .ini file for your slave(s) or change the port number and see if they start up correctly.

Thanks for the help.

Your solution is partially right. After deleting the file “.ini” file slave starts without problem, which is great. But there is no process with PID that was saved inside the “.ini” file, so the problem is elsewhere.

– edit

So it looks like the problem is not caused by the PID but by the SlaveListeningPort which is being occupied by the lsass.exe process. I don’t know why. Changing the port number in “.ini” file resolves the issue.

At least for now I will have to delete the “.ini” file at system startup to prevent slave locking.

Well, the Slave should either be picking a random listening port, or whichever was defined for it in the master deadline.ini file:

docs.thinkboxsoftware.com/produc … tartupport

You could just change “SlaveStartupPort” to be a different number.

Fun aside: Just noticed that typo in the docs :smiley:

Interesting because I did not change the default port in deadline.ini file. We use default installation and default port is set to 17093 (SlaveStartupPort).

The one that is colliding with lsass.exe is defined in “.ini” file as SlaveListeningPort. What is the relation between SlaveStartupPort (from deadline.ini) and SlaveListeningPort (from .ini)?

deadline.ini

[Deadline]
LicenseMode=Standard
LicenseServer=** CUT **
Region=
LauncherListeningPort=17090
LauncherServiceStartupDelay=60
AutoConfigurationPort=17091
SlaveStartupPort=17093
SlaveDataRoot=
RestartStalledSlave=True
NoGuiMode=false
LaunchSlaveAtStartup=1
AutoUpdateOverride=
ConnectionType=Direct
NetworkRoot=** CUT **
DbSSLCertificate=
NetworkRoot0= ** CUT **

.ini

[Deadline]
SlaveProcessID=3360
SlaveStartedByIdleDetection=False
SlaveListeningPort=49169
SlaveRemoteLogPort=49170

Check the image
[attachment=0]slave_ports_settings.png[/attachment]

Notice:

  1. there is not process with PID 3360
  2. SlaveStartupPort in deadline.ini is set to default 17093
  3. SlaveListeningPort in .ini is set to 49170 (why not 17093?)
  4. lsass.exe is occupying port 49170 same as SlaveListeningPort which prevents deadlineslave from starting up

From my side it looks like this.

  1. When slave starts it grabs settings from .ini file.
  2. As you said it checks if PID 3360 exists. It’s not.
  3. Slave tries to start listening on port 49169 even though is should use default 17093 or random (?).
  4. The 49169 is occupied by lsass.exe so it exits with message that slave with given name is running.

Where did I made mistake?

Interesting question is why lsass.exe is occupying this port?
According to this https://msdn.microsoft.com/en-us/library/cc875824.aspx lsass can allocate random high TCP port. But why is it always colliding with deadlinesalve, which happens multiple time per day? What is the chance that two different processes will randomly select same port?

At least I have a solution (delete .ini file), but I’m that kind of person that I want to know why is it happening :slight_smile:

ok, so I made a mistake :slight_smile:

The slave starts through port 17093 and then probably forks (or something like that) and uses one of the windows dynamic ports as SlaveListeningPort
https://support.microsoft.com/en-us/help/929851/the-default-dynamic-port-range-for-tcp-ip-has-changed-in-windows-vista
Every time the slave is restarted it selects new dynamic port above 49151 and stores it in .ini file

Also I’ve run some other experiments and mange to reproduce the issue:

  1. turn off the launcher and the slave
  2. use ProcessExplorer (or similar software) to determine the port used by the lsass.exe process
  3. edit .ini slave file and set SlaveListeningPort to the same port that lsass.exe is using
  4. start launcher

Effect:
The slave should not start with error that there is already running a slave with that name.

Cause:
Ports conflict. As you stated previously

lsass.exe uses port above 49152 as stated here https://support.microsoft.com/en-us/help/832017/service-overview-and-network-port-requirements-for-windows

Because the slave ini file is not deleted then after reboot the lsass.exe process has a chance to grab the port which was used in previous session. When slave starts it checks that there is something happening at this port number and exists with error. I’ve notice that dynamic port is selected sequentially starting from 49152, so the chance of conflict is a little higher.

uff that was fun :slight_smile:

Question
Isn’t PID and process name enough to determine if slave is running or not? If there is no process with PID saved in .ini file why do you assume that the slave may still be listening on port stored in SlaveListeningPort?

I believe this is mostly a holdover from when we had a single Slave, which a statically configured port. You are correct that we should at least be able to use the PID to get a better idea of whether it’s actually a Slave running, or something else using that port (at which point we should pick a new port). I’m sure we could also make use of advisory file locking on the .ini file itself as a synchronization point, as well.

We’ll have a look at improving this behaviour, we definitely appreciate all the effort you put into figuring this out!

Cheers,
Jon

Bump …

Tomasz, huge thanks for the solution!!!

I’m having the same issue with the launcher port too, so the fix is to delete both the \launcher.ini and \slaves.ini files at startup.

Privacy | Site terms | Cookie preferences