I’m attempting to add more machines to my render farm running Deadline 3.1.0.35390, and I’m having some issues. When Deadline slave runs as an app, the slaves come online and render without issue; but when Deadline runs as a service, the service launches successfully, but the slaves still show up as offline. I’ve verified that the service is running with network administrator credentials, and I have other machines running Deadline as a service without issue. How would I go about further troubleshooting this problem?
Thanks for the help.
Cheers,
Eric
If a slave is not showing up as online, my guess is that it doesn’t have the necessary permissions to update its slave information file in the repository. Check the slave log to see if any error messages are getting printed out, and if they are, feel free to post them here for us to take a look.
Cheers,
Hey Ryan,
When I compared what happens on the slaves successfully running Deadline as a service to those that aren’t, I found that on the slaves that aren’t working, the Deadline Launcher service is launching, but it’s not launching the deadlineslave.exe task like it’s supposed to. I checked the bad slave’s history log in the reports folder in the Repository, but no errors were logged. Is there a reason why deadlinelauncherservice.exe wouldn’t launch deadlineslave.exe?
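For anyone comparing good and bad machines the same way, here is a minimal Python sketch that checks whether deadlineslave.exe is actually running. It relies on the standard Windows `tasklist` command; the function names are made up for illustration and are not part of Deadline.

```python
import csv
import io
import subprocess


def parse_tasklist_csv(output, image_name):
    """Return the PIDs found in `tasklist /FO CSV /NH` output for image_name."""
    pids = []
    for row in csv.reader(io.StringIO(output)):
        # Matching rows look like: "name.exe","1234","Services","0","12,345 K".
        # The "INFO: No tasks are running..." message has one column and is skipped.
        if len(row) >= 2 and row[0].lower() == image_name.lower():
            pids.append(int(row[1]))
    return pids


def running_pids(image_name):
    """Ask Windows tasklist for processes with the given image name."""
    result = subprocess.run(
        ["tasklist", "/FI", f"IMAGENAME eq {image_name}", "/FO", "CSV", "/NH"],
        capture_output=True,
        text=True,
        check=False,
    )
    return parse_tasklist_csv(result.stdout, image_name)
```

Calling `running_pids("deadlineslave.exe")` on the problem machine returns an empty list when the launcher service has not started the slave.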
Cheers,
Eric
Some further info: I checked through one of the bad slaves’ local logs. The most recent log doesn’t indicate any errors and just has the following:
2009-04-22 09:40:07: BEGIN - ARCH-PC-69\ek23
2009-04-22 09:40:07: Start-up
2009-04-22 09:40:07: 2009-04-22 09:40:07
2009-04-22 09:40:07: Deadline Launcher 3.1 [v3.1.0.35390 R]
2009-04-22 09:40:07: Launcher Thread - Launcher thread initializing…
2009-04-22 09:40:07: Perfoming remote admin check
2009-04-22 09:40:08: Launcher Thread - Remote administration is disabled
2009-04-22 09:40:08: Launcher Thread - Launcher thread listing on port 5042
2009-04-22 09:41:08: Perfoming remote admin check
2009-04-22 09:43:08: Perfoming remote admin check
…
One of the older logs does have the error, “2009-04-21 16:10:20: Caught unhandled exception: ~LauncherThread() called when mode is not Stopped but rather: Running (System.InvalidProgramException).” I’ve attached the full log to this post.
deadlinelauncherservice(Arch-pc-69)-2009-04-21-0013.log (6.37 KB)
Can you log into the machine with the launcher running as a service, and then try starting up the slave from the Start menu? That should go through the launcher as well, so the launcher would write to the log that it is trying to start the slave. Hopefully if an error is occurring, the launcher would print it out to the log.
Cheers,
The slave launches fine as an application and doesn’t log any errors. The problem seems to be with the deadlinelauncherservice.exe, in that it’s not communicating with the slave for some reason.
Here’s the log from the slave when running it as an application:
2009-04-22 13:36:03: BEGIN - ARCH-PC-69\Administrator
2009-04-22 13:36:03: Start-up
2009-04-22 13:36:03: 2009-04-22 13:36:03
2009-04-22 13:36:03: Deadline Slave 3.1 [v3.1.0.35390 R]
2009-04-22 13:36:03: slave initialization beginning.
2009-04-22 13:36:03: Repository time: 04/22/2009 13:36:03
2009-04-22 13:36:04: Info Thread - Created.
2009-04-22 13:36:19: OnFormClosing
2009-04-22 13:36:19: Info Thread - requesting slave info thread quit.
2009-04-22 13:36:19: Listener Thread - OnConnect: Listener Socket has been closed.
2009-04-22 13:36:19: Info Thread - shutdown complete
2009-04-22 13:36:20: Scheduler Thread - shutdown complete
2009-04-22 13:36:21: OnFormClosing
2009-04-22 13:36:21: MainWindow_FormClosed
2009-04-22 13:36:21: Checked license back in
Here’s the next thing to try. Stay logged off the machine, and from the Monitor on another machine, send a Remote Control command to start the slave. If the slave doesn’t launch on the other machine, log in and grab the launcher log and post it. The log should mention that the launcher received a request to launch the slave, and whether or not launching the slave was successful.
Thanks,
When I attempt to start the slave remotely, the following error occurs:
Arch-pc-69: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond 130.132.106.101:5042
The deadlinelauncherservice log on the slave doesn’t report anything except “Performing remote admin check.” It doesn’t show that it received a request to launch the slave.
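One way to narrow down a timeout like this is to test plain TCP reachability of the launcher port from the machine running the Monitor. The sketch below is Python and only assumes the port number 5042 shown in the launcher log; the function name is hypothetical. It says nothing about whether remote administration is enabled, only whether something accepts connections on that port.

```python
import socket


def launcher_port_reachable(host, port=5042, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError (including timeouts) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

If `launcher_port_reachable("Arch-pc-69")` returns False while the launcher service is running, a firewall or network issue between the two machines is worth ruling out.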
Shouldn’t the launcher service launch the slave automatically when it’s running?
It should, unless the option to start the slave automatically at startup is disabled. I did notice in the launcher log that Remote Administration is disabled, which could explain why you get that error when doing remote control. To enable remote administration, open the Repository Options from the tools menu in the monitor (while in super user mode), and scroll down to the Launcher Settings.
To check if the option is enabled to launch the slave on startup, log into the problematic machine and check the local user profile deadline.ini file (I guess this would be the user that the launcher service is running under). For example, on XP, this path would be something like C:\Documents and Settings\USERNAME\Local Settings\Application Data\Frantic Films\Deadline\deadline.ini. If you don’t see this line, add it:
LaunchSlaveAtStartup=True
If the line is there, but the setting is set to False, just change it to True.
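If you need to make this change on many machines, something like the following Python sketch could do it. It treats deadline.ini as plain text and either rewrites the existing LaunchSlaveAtStartup line or appends one. The function names are hypothetical, and appending at the end of the file is an assumption about the file layout, so check one machine by hand first.

```python
from pathlib import Path

KEY = "LaunchSlaveAtStartup"


def set_launch_slave_at_startup(ini_text, enabled=True):
    """Return ini_text with the LaunchSlaveAtStartup line set or appended."""
    new_line = f"{KEY}={'True' if enabled else 'False'}"
    lines = ini_text.splitlines()
    for i, line in enumerate(lines):
        # Compare only the key part, ignoring case and surrounding spaces.
        if line.split("=", 1)[0].strip().lower() == KEY.lower():
            lines[i] = new_line
            break
    else:
        # Assumption: a missing key can simply be added at the end of the file.
        lines.append(new_line)
    return "\n".join(lines) + "\n"


def update_ini_file(path, enabled=True):
    """Rewrite the ini file at path in place."""
    ini_path = Path(path)
    ini_path.write_text(set_launch_slave_at_startup(ini_path.read_text(), enabled))
```

Pass `update_ini_file` the deadline.ini path for the account the launcher service runs under.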
Cheers,
That explains it: we had “Launch Slave at Startup” turned off so that users could launch Monitor to submit jobs without having that machine automatically become a render slave; I didn’t realize that the launcher service and launcher app both used the same ini file.
The issue we’re running into now is that, with the way Deadline is set up to run as a service, the launcher service launches the deadlineslave.exe process, but when deadlinelauncherservice.exe is stopped, that doesn’t STOP the deadlineslave.exe process. The process has to be killed manually, and even after it’s killed, the slave still shows up as available in the Deadline Monitor.
Our ideal setup is that, when a user logs off a lab machine, a script launches Deadline as a service so that machine can join the render farm. When a user logs back in, another script stops the Deadline process. We had this running great when we were using 3ds Max’s Backburner, and it was an excellent way to use processor power that would otherwise just be sitting there. But with the way Deadline runs as a service, I’m not sure how we’d implement this. It would certainly make life easier if there was a service for the Deadline Slave rather than the Deadline Launcher, but I don’t know how difficult that would be to implement.
Any ideas?
You can leave the launcher service running all the time, and just start/stop the slave when the users log in and out. To start the slave through the launcher (whether it is running as a service or not), use this command line:
deadlinelauncher.exe -slave
Because we’re running the slave through the launcher, it will be launched in service mode, and the launcher will also check if an update is necessary. The Deadline bin folder should be in your PATH, which is why I’m omitting the full path to the launcher executable. You could always use the full path if you want. Then to stop the slave, use this command line:
deadlineslave.exe -s
The reason we have the launcher run as the service is that remote communication between the monitor and the remote machines is done via the launcher. We always recommend leaving the launcher running for this very reason (whether in service mode or not). The launcher doesn’t use up any resources when it’s sitting idle, so it won’t affect the user when they’re logged in.
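As a rough illustration, the two command lines above could be wrapped in a single Python helper that your logon and logoff scripts call. The event names and the function are made up for this sketch, and how the scripts get triggered on your lab machines is up to you; like the commands above, it assumes the Deadline bin folder is in the PATH.

```python
import subprocess

# The two command lines from this post.
START_SLAVE = ["deadlinelauncher.exe", "-slave"]
STOP_SLAVE = ["deadlineslave.exe", "-s"]


def command_for(event):
    """Map a hypothetical event name to the command line to run."""
    if event == "logoff":
        return START_SLAVE  # user left, so the machine can join the farm
    if event == "logon":
        return STOP_SLAVE  # user is back, so stop the slave
    raise ValueError(f"unknown event: {event!r}")


def handle_event(event, run=subprocess.call):
    """Run the command for event and return the runner's result."""
    return run(command_for(event))
```

The logoff script would call `handle_event("logoff")` and the logon script `handle_event("logon")`. The `run` parameter is only there so the helper can be tried out without Deadline installed.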
Cheers,
PS: The reason why the slave appears online after you kill it is that the slave didn’t have a chance to update its state file (which it does when it shuts down cleanly). However, Deadline monitors the last time a state file was written to, so if a slave hasn’t written to its state file in a while (i.e., because it was killed), it would eventually appear as stalled in the monitor.
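To illustrate the idea only (this is not Deadline’s actual code), a stalled check based on a file’s last write time might look like the Python sketch below. The function names and the 600-second threshold are made up for the example.

```python
import os
import time


def seconds_since_last_write(path, now=None):
    """Return how many seconds ago the file at path was last modified."""
    now = time.time() if now is None else now
    return now - os.path.getmtime(path)


def looks_stalled(path, threshold_seconds=600, now=None):
    """Treat a state file as stalled if it has not been written to recently.

    The threshold is a hypothetical value, not a Deadline setting.
    """
    return seconds_since_last_write(path, now) > threshold_seconds
```

A slave that was killed never writes its final offline state, so a check like this is the only way to notice it has gone away.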
Great. Thanks for all the help, Ryan. We’ll see if we can get this up and running.
Cheers,
Eric