Is there a way in Deadline to set an error threshold, or even better, a threshold for a specific error type, so that a machine that reaches it gets restarted?
I have a problematic slave (possibly a problem with RAM? I don’t know yet) that suddenly starts crashing 3ds Max jobs, which causes the jobs to fail in Deadline. But it manages to render a few frames (a random number) without issues before it starts acting up, so I don’t want to disable it completely.
No, there isn’t, but perhaps you could use the Slave Restart feature of Power Management. You could create a new PM Group with this single machine in it, and then configure the Slave Restart feature to reboot it every few hours (or however long it takes on average before it starts acting up). The Slave Restart feature tells the Slave to finish its current task before the restart occurs, so you won’t lose any render time.
Unfortunately, the time the slave takes to start acting up is completely random, and I haven’t been able to identify the exact problem. So this feature won’t prevent jobs from failing on that one render node.
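For what it's worth, the threshold logic being asked for is simple in itself, if the error events can be collected somewhere outside Deadline. Below is a rough sketch in plain Python. Everything in it is hypothetical: the class name, the machine and error-type strings, and the idea that some external script feeds it errors and performs the reboot. None of it is a Deadline API.

```python
from collections import defaultdict


class ErrorThreshold:
    """Counts errors per (machine, error type) and flags a machine at a threshold.

    Hypothetical helper for illustration only; not part of Deadline.
    """

    def __init__(self, threshold):
        self.threshold = threshold
        self.counts = defaultdict(int)

    def record(self, machine, error_type):
        """Record one error; return True when the machine should be restarted."""
        key = (machine, error_type)
        self.counts[key] += 1
        if self.counts[key] >= self.threshold:
            # Reset so counting starts fresh after the restart.
            self.counts[key] = 0
            return True
        return False


tracker = ErrorThreshold(threshold=3)
for _ in range(3):
    needs_restart = tracker.record("node07", "3dsmax_crash")
print(needs_restart)
```

Keying the counter on both machine and error type means a slave that hits an unrelated error now and then would not be rebooted by the crash threshold.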
I would like to reboot the render slaves every day or every 2 days. If I add a few slaves to testpool (mode enabled) with the machine restart option enabled and set to 600 min (every 10 hrs), does this mean that render jobs that are running will continue to run, and only once they have completed will the render slave restart?
E.g., on a pool of 100 slaves with jobs of various durations, if the pool is set to restart every 600 min, then the slaves will restart at various times depending on when each one's job has completed, right?
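To make the timing in that question concrete, here is a small sketch in Python. It is a simplified model, not Deadline code: it assumes a slave restarts as soon as the interval has elapsed if it is idle, and otherwise right after the task it is rendering at that moment finishes. The function name, slave names, and task windows are made up for illustration.

```python
def restart_minute(interval_min, task_windows):
    """Return the minute at which a slave restarts under the simplified model.

    task_windows is a list of (start, end) minutes for the slave's tasks.
    """
    due = interval_min
    for start, end in task_windows:
        if start <= due < end:
            # A task is in progress when the restart comes due: finish it first.
            return end
    # Idle when the restart comes due: restart right away.
    return due


# Three hypothetical slaves in a pool with a 600 minute restart interval.
pool = {
    "slave_a": [(0, 590), (590, 640)],  # mid-task at minute 600
    "slave_b": [(0, 300), (300, 600)],  # last task ends exactly at minute 600
    "slave_c": [(0, 900)],              # one long task
}
restarts = {name: restart_minute(600, tasks) for name, tasks in pool.items()}
print(restarts)
```

Under these assumptions the three slaves restart at minutes 640, 600, and 900, so the restarts end up staggered according to when each task finishes rather than all happening at once.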
I’ve had this feature on for a few days now and it works really well. The slaves always finish their current task before restarting, so you don’t have to worry about that.
Yes, exactly. If the slaves are idle and you have Power Management set up to shut them down after a certain idle period, WOL will wake them up only when they have jobs to do. This is exactly the kind of setup I have here.
It saves energy, and from past experience with bugs and whatnot, I now have automatic restarts set up every 24 hours.