
Multiple Slaves Starting

We’ve got multiple computers seemingly randomly starting multiple slaves. What INI files etc should I check to disable this behavior? It’s causing a lot of problems with renderings failing.

There currently isn’t a way to disable the ability to launch multiple slaves. In beta 2, there will be an option to hide this feature from normal users, but that doesn’t prevent multiple slaves from starting up. That’s weird that they’re starting up randomly. When you find multiple slaves running on the same machine, do they have the same name, or different names? Also, the next time this happens, can you send us the launcher log from the machine that has multiple slaves running? And finally, you can remove slaves from the Launcher’s right-click menu on the machine.

Cheers,

  • Ryan

I assume they have the same name, since no extra slaves even show up in our Monitor. Our Launcher is running as a service, but I can launch it from the desktop and check whether there are any extra slave names in the list.

2011-08-18 16:48:46: BEGIN - RENDER-I7-05\renderadmin
2011-08-18 16:48:46: Start-up
2011-08-18 16:48:46: 2011-08-18 16:48:46
2011-08-18 16:48:46: Deadline Launcher 5.1 [v5.1.0.45083 R]
2011-08-18 16:48:56: Local python version file: C:\Program Files\Thinkbox\Deadline\python\2.6.7\Version
2011-08-18 16:48:56: Network python version file: \\sfs-file\repository5\python\Windows\2.6.7\Version
2011-08-18 16:48:56: Comparing python version files
2011-08-18 16:48:56: Python upgrade skipped because Version files are the same
2011-08-18 16:48:56: Local version file: C:\Program Files\Thinkbox\Deadline\bin\Version
2011-08-18 16:48:56: Network version file: \\sfs-file\repository5\bin\Windows\Version
2011-08-18 16:48:56: Comparing version files
2011-08-18 16:48:56: Launcher Thread - Launcher thread initializing...
2011-08-18 16:48:56: Perfoming remote admin check
2011-08-18 16:48:57: Remote Administration is now enabled
2011-08-18 16:48:57: Launcher Thread - Remote administration is enabled
2011-08-18 16:48:57: Launcher Thread - Launcher thread listening on port 5042
2011-08-18 16:48:59: ::ffff:192.168.94.100 has connected
2011-08-18 16:49:00: Launcher Thread - Received command: LaunchSlave Render-i7-05
2011-08-18 16:49:00: Local python version file: C:\Program Files\Thinkbox\Deadline\python\2.6.7\Version
2011-08-18 16:49:00: Network python version file: \\sfs-file\repository5\python\Windows\2.6.7\Version
2011-08-18 16:49:00: Comparing python version files
2011-08-18 16:49:00: Python upgrade skipped because Version files are the same
2011-08-18 16:49:00: Local version file: C:\Program Files\Thinkbox\Deadline\bin\Version
2011-08-18 16:49:00: Network version file: \\sfs-file\repository5\bin\Windows\Version
2011-08-18 16:49:00: Comparing version files
2011-08-18 16:49:00: Launcher Thread - Responded with: Success|
2011-08-18 16:49:57: Perfoming remote admin check
2011-08-18 16:51:57: Perfoming remote admin check
2011-08-18 16:54:57: Perfoming remote admin check
2011-08-18 16:58:57: Perfoming remote admin check
2011-08-18 17:03:57: Perfoming remote admin check
2011-08-18 17:09:58: Perfoming remote admin check
2011-08-18 17:16:58: Perfoming remote admin check
2011-08-18 17:24:58: Perfoming remote admin check
2011-08-18 17:33:58: Perfoming remote admin check
2011-08-18 17:43:58: Perfoming remote admin check
2011-08-18 17:53:59: Perfoming remote admin check
2011-08-18 18:03:59: Perfoming remote admin check
2011-08-18 18:13:59: Perfoming remote admin check
2011-08-18 18:23:59: Perfoming remote admin check
2011-08-18 18:33:59: Perfoming remote admin check
2011-08-18 18:43:59: Perfoming remote admin check
2011-08-18 18:53:59: Perfoming remote admin check
2011-08-18 19:03:59: Perfoming remote admin check
2011-08-18 19:13:59: Perfoming remote admin check
2011-08-18 19:23:59: Perfoming remote admin check
2011-08-18 19:33:59: Perfoming remote admin check
2011-08-18 19:43:59: Perfoming remote admin check
2011-08-18 19:53:59: Perfoming remote admin check
2011-08-18 19:57:33: ::ffff:192.168.94.100 has connected
2011-08-18 19:57:33: Launcher Thread - Received command: OnLastTaskComplete ShutdownMachineIdle : : Render-i7-05
2011-08-18 19:57:34: Sending command to slave: OnLastTaskComplete ShutdownMachineIdle :
2011-08-18 19:57:34: Got reply: RENDER-I7-05: Sent "OnLastTaskComplete ShutdownMachineIdle : " command. Result: ""
2011-08-18 19:57:34: Launcher Thread - Responded with: Success|
2011-08-18 19:58:15: ::ffff:192.168.94.100 has connected
2011-08-18 19:58:15: Launcher Thread - Received command: OnLastTaskComplete ShutdownMachineIdle : : Render-i7-05
2011-08-18 19:58:15: Sending command to slave: OnLastTaskComplete ShutdownMachineIdle :
2011-08-18 19:58:16: Got reply: RENDER-I7-05: Sent "OnLastTaskComplete ShutdownMachineIdle : " command. Result: ""
2011-08-18 19:58:16: Launcher Thread - Responded with: Success|
2011-08-18 20:00:14: ::ffff:192.168.94.100 has connected
2011-08-18 20:00:14: Launcher Thread - Received command: OnLastTaskComplete ShutdownMachineIdle : : Render-i7-05
2011-08-18 20:00:14: Sending command to slave: OnLastTaskComplete ShutdownMachineIdle :
2011-08-18 20:00:15: Got reply: RENDER-I7-05: Sent "OnLastTaskComplete ShutdownMachineIdle : " command. Result: ""
2011-08-18 20:00:15: Launcher Thread - Responded with: Success|
2011-08-18 20:00:52: ::ffff:192.168.94.100 has connected
2011-08-18 20:00:52: Launcher Thread - Received command: OnLastTaskComplete ShutdownMachineIdle : : Render-i7-05
2011-08-18 20:00:52: Sending command to slave: OnLastTaskComplete ShutdownMachineIdle :

Only the one slave name in the launcher when viewed through the GUI.

And so far only the one slave as well.

OK, more specifically: if I launch the service or launch the Launcher directly, there are no problems, just one slave. It only seems to happen when Pulse starts a computer up for a job. If I start a computer myself and then add it to a job, no problem. It's only when Pulse has started up a computer, so it must be sending a command to launch a slave. Any reason why Pulse would tell a computer to launch a slave when one is already running? Is there a TTD check that's too short and needs to be longer, so that it doesn't decide there is no slave running on startup?

Hmm, sounds like a power management issue then. That definitely helps narrow it down, so we'll run some tests to try to reproduce it. Normally, it shouldn't matter if Pulse tells a slave to launch when it's already running, because only one instance of a slave with a given name should ever run at a time.

Cheers,

  • Ryan

Just another heads up: out of necessity (since they're running as a service), I changed the deadline.ini to tell the slaves to launch on startup. Could that be a contributing factor?

Hmm, good question. We’ll definitely keep this in mind while trying to reproduce.

Thanks!

  • Ryan

That seems to do it in one case. I got it to happen outside of a service on one machine. I deleted the launch-slave-on-startup entry from the deadline.ini in C:\ProgramData, and now it only launches one slave. I then enabled "launch slave on startup" in the GUI Launcher, and it didn't add the entry back to ProgramData\Thinkbox\deadline.ini, so I'm not sure where you're saving that now.

Thanks for the info! That setting is stored as a per-user setting, so it would be in %LOCALAPPDATA%\Thinkbox\Deadline\deadline.ini.
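
For reference, the entry we're talking about would look something like this in whichever deadline.ini ends up holding it. This is just an illustrative sketch; the exact key name and casing may vary between Deadline versions:

; %PROGRAMDATA%\Thinkbox\Deadline\deadline.ini (system-wide), or
; %LOCALAPPDATA%\Thinkbox\Deadline\deadline.ini (per-user)
LaunchSlaveOnStartup=True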

Just to confirm, after you re-enabled the setting, did the problem come back? Or was it only when launchslaveonstartup was in %PROGRAMDATA%\Thinkbox\Deadline\deadline.ini that the slave was launched twice?

Haven’t had time to further test. I needed all of the systems for rendering last night. :wink:

Hmmm, another machine is now acting funny.
[Attachment: Slaves.jpg]

Good to know. We hope to look into this issue after beta 2 is released (which should be later this week or early next week).

Cheers,

  • Ryan

I removed it from ProgramData. Still launched two.
I removed it from UserData and it properly auto-configured and added it back in.

If I had to place a bet, I'd say the auto-config is doing it somehow.

I turned off auto-configure; there was no improvement.

This ended up being a more general issue than we originally thought. The problem was that there was a large gap between when a slave starts and when a new slave instance can tell that one is already running. Basically, you could start up as many slaves with the same name as you wanted, provided the original was still showing its splash screen. It wasn't until the splash screen went away that new instances would detect the running slave and refuse to start.
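
To make that gap concrete, here is a minimal sketch in Python (not Deadline's actual implementation) of a per-name single-instance lock that is claimed before any slow startup work; a second instance started during the splash-screen window would then back off immediately:

import os
import sys
import tempfile
import time

# Slave name taken from the log above, used only for this illustration.
SLAVE_NAME = "Render-i7-05"
LOCK_PATH = os.path.join(tempfile.gettempdir(), "slave_%s.lock" % SLAVE_NAME)

def acquire_instance_lock(path):
    """Try to create the lock file exclusively; return None if it already exists."""
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return None
    os.write(fd, str(os.getpid()).encode())
    return fd

def main():
    # The fix: claim the per-name lock *before* any slow startup work,
    # so a second instance launched a moment later sees it right away.
    fd = acquire_instance_lock(LOCK_PATH)
    if fd is None:
        print("Slave %s is already running; exiting." % SLAVE_NAME)
        sys.exit(0)
    try:
        time.sleep(5)          # stand-in for the splash screen / slow init
        print("Slave %s started." % SLAVE_NAME)
        # ... the real render loop would go here ...
    finally:
        os.close(fd)
        os.remove(LOCK_PATH)   # simplification: no stale-lock handling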

This gap has been closed in our internal build, so this issue should be fixed in beta 2!

Thanks again for providing all the information you did. It made it much easier to know where to start looking! :slight_smile:

Cheers,

  • Ryan

Great! Thanks for tracking it down, I’ll stop randomly changing farm settings for each job I submit. :wink:

And I guess my first instinct was the right one haha. :wink:

Hm, I'm still getting that issue here, on Fedora 12. When I start deadlinelauncher -nogui, multiple slaves start at the same time, with slave names from the rest of the machines in our studio that had previously launched slaves. I deleted the whole slaves folder, and still no luck. It's possibly because we're using a network distribution of Fedora here in the studio, so every machine has the exact same system. The slaves run under a single username, and I can see now that the local settings for that user contain all of the slaves. Is there a way to fix this somehow?

The slave configurations for each instance on a machine are stored in the local install folder. For example, if the client is installed to /usr/local/Thinkbox/Deadline, the folder that contains the slave instances will be /usr/local/Thinkbox/Deadline/settings/slaves. Note that the slaves folder in the local user settings just holds configuration settings for those slaves; the slaves folder in the install folder is what controls which slaves actually start up.
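
For example, on a node with that default install location you could check which slave instances are registered there. These are illustrative commands only; adjust the path to match your install, and double-check before removing anything:

# List the slave instances this node will try to start:
ls /usr/local/Thinkbox/Deadline/settings/slaves

# If entries for other machines' slaves have ended up here (e.g. via the
# shared network image), removing the stray entries should stop the extra
# slaves from launching on this node:
# rm /usr/local/Thinkbox/Deadline/settings/slaves/<stray-slave-name>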

Just to confirm, is this local install folder shared between all nodes as well?
