Hi,
We’re having a strange problem with slaves not starting up properly on our render farm. The machines power on through IPMI fine, but then the slaves don’t actually start. The remote commands keep getting sent, but the slave doesn’t start. You can even log on and try to start the slave manually, and it won’t do anything. I then have to restart the machines, which seems to fix the problem. We’re running 8.05.01. Any thoughts on what might be causing this?
Thanks
Nick
Are you running as a service? We’re noticing that none of our upgrades to 8.05 are working unless we restart the machine.
No, we’ve never really liked the way it works as a service. The artists just don’t seem to understand that they have to check whether it’s still running…
We’ve been having this problem since we updated to 8. Sometimes the launcher doesn’t start up, and sometimes it does, but no matter what you try, the slave won’t start until we restart or even reinstall.
Hello,
Can you send over the deadlineslave and deadlinelauncher log files from one of the machines where the slave is not starting up? I want to check for any errors that show why the launcher starts but the slave does not when the machine powers on. Thanks!
Hi Dwight,
I’ve been waiting for this to recur, and it has, so I’ve attached the launcher log file. There is no corresponding slave log file, as the slave just doesn’t seem to have started at all. I’ve left the machine in this state in case you need me to get any more information from it.
Thanks
Nick
launcher_log_2016_07_26.txt (214 KB)
This is… quite a log file. It contains both nothing and a lot at the same time. Are there any other log files in that directory? Can you tell me which Deadline version you are running on both the slave and the repository? I can see the launcher is not running as a service, but something is clearly looping here.
Hi Dwight,
Yep, it just seems to be looping. I’ve attached the rest of the logs. I’ve had another look through them, and it looks like this started after we applied the latest update (we should be on 8.05.01). I guess the update is just breaking the slaves? We don’t tend to run as a service because the artists don’t like it on their machines, and we find it easier to debug when everything runs the same way.
Nick
logs.zip (4.26 MB)
Hello,
Could you reinstall the client applications on that machine? It looks like the binaries have become corrupted. Is this happening on all of your render nodes?
We reinstalled, and that resolved the problem. But it does seem to happen to a few of the machines each time we run an auto-update. Is there any way the launcher could detect this behavior so it doesn’t just keep repeating the same error?
Nick
Hello,
The issue here is actually that the machine at 192.168.0.201, likely your Pulse machine, sends the slave start signal. Two minutes later it checks whether any slaves need to start, sees that this slave is not running, and sends the signal again. Next time this happens, I would like you to try running the slave from the command line using the following command:
deadlineslave --console
This should show us what happens when the slave starts and give us a better idea of the problem.
Yep, that’s the Pulse machine. I’ll keep an eye out for the next one, but I’m guessing it won’t happen until the next install.
Nick