AWS Thinkbox Discussion Forums

Slave Scheduler not working

Hey,
obviously Deadline doesn’t like me :frowning:

Because the feature “CPU Affinity” did not work with 3ds Max I recently activated the slave scheduler - and today i noticed that this feature doesn’t work either - or i did something wrong… :slight_smile:

Attached you can find my settings for the “Slave Scheduling”. The launcher is running as desktop applikation (not as service) but the slave starts although the workstation is not idled.

If i enable the “Idle Detection Override” on the client via the launcher with the same settings, the slave gets offline but after a while, however, he comes back online and renders a task. When it’s done the slave goes offline again and everything starts from scratch.

Here some lines out of the logs

2018-09-06 16:05:43: Scheduler Thread - Current task complete, shutting down slave…
2018-09-06 16:05:43: Info Thread - requesting slave info thread quit.
2018-09-06 16:05:43: 0: Executing plugin command of type ‘Cancel Task’
2018-09-06 16:05:43: 0: Done executing plugin command of type ‘Cancel Task’
2018-09-06 16:05:43: 0: Executing plugin command of type ‘End Job’
2018-09-06 16:05:43: 0: INFO: End Job called - shutting down 3dsmax plugin
2018-09-06 16:05:44: Info Thread - shutdown complete
2018-09-06 16:05:44: Error occurred when checking sandbox stdout: Cannot check if stdout is available for a process that has not been launched.
2018-09-06 16:05:48: 0: WARNING: Timed out waiting for the renderer to close.
2018-09-06 16:05:48: 0: WARNING: Did not receive a success message in response to EndJob:
2018-09-06 16:05:48: 0: INFO: Disconnecting socket connection to 3dsmax
2018-09-06 16:05:48: 0: INFO: Waiting for 3ds Max to shut down
2018-09-06 16:05:48: 0: INFO: 3ds Max has shut down
2018-09-06 16:05:48: 0: Done executing plugin command of type ‘End Job’
2018-09-06 16:05:49: Scheduler Thread - shutdown complete
2018-09-06 16:05:49: Slave - Final cleanup
2018-09-06 16:05:50: Slave - Shutdown complete

It will come back online after five minutes according to your settings. You also have it configured to stop as soon as someone moves the mouse or touches the keyboard, so this is going to oscillate a lot throughout the day. How do you want this to work specifically?

The Launcher log is going to be the best log for us here as it is the one controlling the Slave for idle detection. There will likely be IP addresses and similar in the log, so be mindful if that’s something you want to filter out.

Hey,
thanks for answering! :slight_smile:

But it comes online while a user is sitting in front of the workstation and using it (while the computer is not idle) and not after 5 minutes but after very different times.

I understood my settings like this (and please correct myself if I’m wrong):
While the workstation is in use: stop slave. If the workstation is idle for 5 minutes and (!) the cpu usage is under 50% and (!) the memory usage is under 50% -> start the slave and render a task as long as no one touches the workstation. If someone touches the keyboard/mouse, finish the current task and keep offline as long the workstation is not idle for 5 minutes…

Here are some logs from the deadlinelauncher-*

2018-09-06 17:02:29: BEGIN - VXWORKST119\user-running-launcher
2018-09-06 17:02:29: Operating System: Windows 10 Pro
2018-09-06 17:02:29: CPU Architecture: x64
2018-09-06 17:02:29: CPUs: 8
2018-09-06 17:02:29: Video Card: NVIDIA GeForce GTX 1070
2018-09-06 17:02:29: Deadline Launcher 10.0 [vip-of-workstation-2 Release (12de851eb)]
2018-09-06 17:02:29: Auto Configuration: Picking configuration based on: vxworkst119 / ip-of-workstation-119
2018-09-06 17:02:29: Auto Configuration: No auto configuration could be detected, using local configuration
2018-09-06 17:02:30: Launcher Thread - Launcher thread initializing…
2018-09-06 17:02:30: creating local listening socket on an available port…
2018-09-06 17:02:30: updating local listening port in launcher file: 50856
2018-09-06 17:02:30: Launcher Thread - Launcher thread listening on port 17000
2018-09-06 17:02:30: Local version file: C:\Program Files\Thinkbox\Deadline10\bin\Version
2018-09-06 17:02:30: Network version file: //path-to-repo//DeadlineRepository\bin\Windows\64bit\Version
2018-09-06 17:02:30: Comparing version files…
2018-09-06 17:02:30: Version files match
2018-09-06 17:02:30: Launching Slave:
2018-09-06 17:02:31: Launcher Thread - Remote Administration is now enabled
2018-09-06 17:02:31: Launcher Thread - Automatic Updates is now enabled
2018-09-06 17:03:31: Launcher Scheduling - This machine is no longer idle, but slave vxworkst119 will not be stopped because it was not started by idle detection (scheduling group “workstations”)
2018-09-06 17:06:32: Launcher Scheduling - This machine is no longer idle, but slave vxworkst119 will not be stopped because it was not started by idle detection (scheduling group “workstations”)
2018-09-06 17:08:36: ::ffff:ip-of-workstation-105 has connected
2018-09-06 17:08:36: Launcher Thread - Received command: LaunchSlave ~
2018-09-06 17:08:36: Local version file: C:\Program Files\Thinkbox\Deadline10\bin\Version
2018-09-06 17:08:36: Network version file: //path-to-repo//DeadlineRepository\bin\Windows\64bit\Version
2018-09-06 17:08:37: Comparing version files…
2018-09-06 17:08:37: Version files match
2018-09-06 17:08:37: Launching Slave:
2018-09-06 17:08:37: Launcher Thread - Responded with: Success
2018-09-06 17:09:32: Launcher Scheduling - This machine is no longer idle, but slave vxworkst119 will not be stopped because it was not started by idle detection (scheduling group “workstations”)
2018-09-06 17:11:32: Launcher Scheduling - This machine is no longer idle, but slave vxworkst119 will not be stopped because it was not started by idle detection (scheduling group “workstations”)

After that i removed the setting “Only Stop Slave If Started By Idle Detection”

From here I have sporadically moved the mouse and the keyboard or have left the computer alone - but no matter what I’ve done, the slave came after approximately 2 minutes online, renders a task and then stops again.

2018-09-06 17:13:45: ::ffff:ip-of-workstation-105 has connected
2018-09-06 17:13:45: Launcher Thread - Received command: LaunchSlave ~
2018-09-06 17:13:45: Local version file: C:\Program Files\Thinkbox\Deadline10\bin\Version
2018-09-06 17:13:45: Network version file: //path-to-repo//DeadlineRepository\bin\Windows\64bit\Version
2018-09-06 17:13:45: Comparing version files…
2018-09-06 17:13:45: Version files match
2018-09-06 17:13:45: Launching Slave:
2018-09-06 17:13:45: Launcher Thread - Responded with: Success
2018-09-06 17:14:33: Launcher Scheduling - Stopping slave vxworkst119 after it finishes its current task because this machine is no longer idle (scheduling group “workstations”)
2018-09-06 17:14:33: Sending command to slave: OnLastTaskComplete
2018-09-06 17:14:33: Got reply: VXWORKST119: Sent “OnLastTaskComplete StopSlave” command. Result: “Connection Accepted.”
2018-09-06 17:15:33: Launcher Scheduling - Stopping slave vxworkst119 after it finishes its current task because this machine is no longer idle (scheduling group “workstations”)
2018-09-06 17:15:33: Sending command to slave: OnLastTaskComplete
2018-09-06 17:15:33: Got reply: VXWORKST119: Sent “OnLastTaskComplete StopSlave” command. Result: “Connection Accepted.”
2018-09-06 17:18:52: ::ffff:ip-of-workstation-105 has connected
2018-09-06 17:18:52: Launcher Thread - Received command: LaunchSlave ~
2018-09-06 17:18:52: Local version file: C:\Program Files\Thinkbox\Deadline10\bin\Version
2018-09-06 17:18:52: Network version file: //path-to-repo//DeadlineRepository\bin\Windows\64bit\Version
2018-09-06 17:18:52: Comparing version files…
2018-09-06 17:18:52: Version files match
2018-09-06 17:18:52: Launching Slave:
2018-09-06 17:18:52: Launcher Thread - Responded with: Success
2018-09-06 17:19:34: Launcher Scheduling - Stopping slave vxworkst119 after it finishes its current task because this machine is no longer idle (scheduling group “workstations”)
2018-09-06 17:19:34: Sending command to slave: OnLastTaskComplete
2018-09-06 17:19:34: Got reply: VXWORKST119: Sent “OnLastTaskComplete StopSlave” command. Result: “Connection Accepted.”
2018-09-06 17:20:34: Launcher Scheduling - Stopping slave vxworkst119 after it finishes its current task because this machine is no longer idle (scheduling group “workstations”)
2018-09-06 17:20:34: Sending command to slave: OnLastTaskComplete
2018-09-06 17:20:34: Got reply: VXWORKST119: Sent “OnLastTaskComplete StopSlave” command. Result: “Connection Accepted.”
2018-09-06 17:21:34: Launcher Scheduling - Stopping slave vxworkst119 after it finishes its current task because this machine is no longer idle (scheduling group “workstations”)
2018-09-06 17:21:34: Sending command to slave: OnLastTaskComplete
2018-09-06 17:21:34: Got reply: VXWORKST119: Sent “OnLastTaskComplete StopSlave” command. Result: “Connection Accepted.”
2018-09-06 17:23:53: ::ffff:ip-of-workstation-105 has connected
2018-09-06 17:23:53: Launcher Thread - Received command: LaunchSlave ~
2018-09-06 17:23:53: Local version file: C:\Program Files\Thinkbox\Deadline10\bin\Version
2018-09-06 17:23:53: Network version file: //path-to-repo//DeadlineRepository\bin\Windows\64bit\Version
2018-09-06 17:23:53: Comparing version files…
2018-09-06 17:23:53: Version files match
2018-09-06 17:23:53: Launching Slave:
2018-09-06 17:23:53: Launcher Thread - Responded with: Success
2018-09-06 17:24:35: Launcher Scheduling - Stopping slave vxworkst119 after it finishes its current task because this machine is no longer idle (scheduling group “workstations”)
2018-09-06 17:24:35: Sending command to slave: OnLastTaskComplete
2018-09-06 17:24:35: Got reply: VXWORKST119: Sent “OnLastTaskComplete StopSlave” command. Result: “Connection Accepted.”
2018-09-06 17:25:35: Launcher Scheduling - Stopping slave vxworkst119 after it finishes its current task because this machine is no longer idle (scheduling group “workstations”)
2018-09-06 17:25:35: Sending command to slave: OnLastTaskComplete
2018-09-06 17:25:35: Got reply: VXWORKST119: Sent “OnLastTaskComplete StopSlave” command. Result: “Connection Accepted.”
2018-09-06 17:28:59: ::ffff:ip-of-workstation-105 has connected
2018-09-06 17:28:59: Launcher Thread - Received command: LaunchSlave ~
2018-09-06 17:28:59: Local version file: C:\Program Files\Thinkbox\Deadline10\bin\Version
2018-09-06 17:28:59: Network version file: //path-to-repo//DeadlineRepository\bin\Windows\64bit\Version
2018-09-06 17:28:59: Comparing version files…
2018-09-06 17:28:59: Version files match
2018-09-06 17:28:59: Launching Slave:
2018-09-06 17:28:59: Launcher Thread - Responded with: Success
2018-09-06 17:30:36: Launcher Scheduling - Stopping slave vxworkst119 after it finishes its current task because this machine is no longer idle (scheduling group “workstations”)
2018-09-06 17:30:36: Sending command to slave: OnLastTaskComplete
2018-09-06 17:30:36: Got reply: VXWORKST119: Sent “OnLastTaskComplete StopSlave” command. Result: “Connection Accepted.”

Cheers!

Can you send the slave log that matches up with the last log you sent “2018-09-06 17:13:45 - 2018-09-06 17:30:36”. Feel free to attach the logs as well. You can find them on the render node in C:\ProgramData\Thinkbox\Deadline10\logs.

Thanks!

Charles

@cmoore

Here they are: deadlineslave_scheduler.zip (28.6 KB)

Cheers!

Installed Deadline 10.0.21.5. Still behaves as described. I’m not sure if it’s a bug or a misconfiguration on my side. Can somebody please clarify?

Cheers!
Simon

Hmm… It may be a combination of both. Now that I think about this more, I think the issue is around the idle time calculation and the fact that it’s at 5 minutes. Can you try setting it to 20 minutes and see if it behaves differently.

I’m vaguely remembering that the calculation internally has some minimum time of 5 minutes and I can’t remember the details. I think it’s around the sample times we use for the idleness. This is just a theory.

I tried that and set the timeout to 20 minutes. Didn’t change anything. The slave ist starting while I’m typing or moving the mouse.

Okay. My current theory is that the API call we’re making to grab idle times may be failing. We see this usually on Linux or Windows when the Launcher is running as a service but you’ve already confirmed in your original post that’s not the case.

I’ve configured my settings to be exactly what you had in the original post and I’ll leave it on during the day.

I’ve also asked one of the developers which call we’re using to see if we can have you run the same command to see the output.

Update: Something I’ve managed to miss a few times was this:

2018-09-06 17:08:36: ::ffff:ip-of-workstation-105 has connected
2018-09-06 17:08:36: Launcher Thread - Received command: LaunchSlave ~

Now, the idle detection shouldn’t be sending commands into the Launcher because it’s built into the launcher. Can you check if Pulse is running on “ip-of-workstation-105”? If you turn it off does idle detection work as expected?

Yes, pulse is running on workstation-105 - it’s a render-only-client. I turned it off and now it seems to work as expected.

It’s a bug or do i have to adjust the pulse settings… or install pulse somewhere else?

Cheers!

Well, I tried running Pulse on my workstation to check if it was a bug.

Take a look at your “Configure Power Management” options in the “Tools” menu and see if the workstations are included in the currently enabled group and see if the “Machine Startup” section is enabled. If they are, removing them should help prevent the Slave from starting up unexpectedly. You likely want to have one PM group for workstations and one for render nodes.

And that was the hint! The “Send Remote Command To Start Slave After Waking Up Machines” Option in the “Configure Power Managment” Settings was enabled. When I set the Power Managment I didn’t notice that Option.

Thanks for your help and patience, Edwin.

Cheers,
Simon

You’re quite welcome! I’m a bit annoyed with myself for missing the incoming connection in the log to be honest.

Privacy | Site terms | Cookie preferences