Is there a recommended configuration for getting machines to suspend when idle and to start up when new jobs are in the queue? I stopped looking into this when there was a blocking bug back in v3 or v4, but now I’m all about getting this right. OK, let me break it down:
The basic configuration I want is:
suspend idle farm machines after 30 minutes,
suspend idle artist machines after 7pm,
unsuspend/start up any machine at any time if it is eligible for a job.
I prefer suspend over shutdown because it’s faster and less disruptive.
Questions:
Does idle shutdown require the slave to be running? What if the slave crashed?
Suspend is currently working but waking from suspend isn’t. Remoting in will wake the machine, but then I see that the slave isn’t running. What’s going on here?
Is unsuspend and/or WOL sent from Pulse?
Is there a way to attempt unsuspend and WOL together, or should I make two groups? And would those groups then conflict and create unpredictable behavior?
Suspending idle farm machines after 30 minutes is easy, but there isn’t support for suspending machines after a specific time. You would probably want to place your artist machines in a separate Machine Group, and then figure out the appropriate idle time to use. The machines will always be started up at any time they are required for a job that is currently in the queue.
The slave needs to be running in order for the machine to be suspended, because the slave’s idle time is the sole value used to determine if the machine should be suspended. Slaves that are stalled (aka: crashed) or offline will not have their machines suspended.
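To make the rule above concrete, here is a rough sketch of the check as described. This is not the actual Pulse code; the `SlaveState` class, the status strings, and `should_suspend` are hypothetical names made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class SlaveState:
    """Hypothetical snapshot of what is known about one slave."""
    name: str
    status: str          # assumed values: "idle", "rendering", "stalled", "offline"
    idle_minutes: float  # how long the slave has reported being idle


def should_suspend(slave: SlaveState, idle_limit_minutes: float) -> bool:
    """Return True only for a running, idle slave past the idle limit.

    Stalled (crashed) and offline slaves report no usable idle time,
    so their machines are never picked for suspension.
    """
    if slave.status != "idle":
        return False
    return slave.idle_minutes >= idle_limit_minutes


farm = [
    SlaveState("farm01", "idle", 45),
    SlaveState("farm02", "idle", 10),
    SlaveState("farm03", "stalled", 120),
    SlaveState("farm04", "offline", 300),
]

# Only farm01 is both running idle and past the 30 minute limit.
to_suspend = [s.name for s in farm if should_suspend(s, 30)]
print(to_suspend)
```

The point of the sketch is that a crashed slave never reaches the idle-time comparison, which is why its machine stays on.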
When a machine is suspended, the slave is closed (due to complications that would occur if left running), but when Pulse tells it to wake up, it also issues the “start slave” command to the launcher so that the slave comes back alive again. That’s why waking it up manually doesn’t start the slave.
WOL is used to wake up offline machines, as well as suspended machines (it worked for both in our tests). The Machine Startup feature is completely unaware if the machine it is waking up is suspended or offline.
This is not ideal. I can’t figure out why the machines don’t return from suspend. And when I remote in, you’re telling me that will break it because it won’t know to start the slave anymore? Monitor still can’t tell me the difference between a machine that is off or on, or whether the launcher is running or not? And I have to go to a different menu to see that a shutdown or suspend was sent?
Could it be that the WOL packet doesn’t make it to the machine? Do you only have this problem when the machine is suspended, vs if it was shut down?
No. The slave will still appear offline to Pulse, so it will try to wake it up. A “wake up” command always consists of the WOL packet and a message to the Launcher to start up the slave. So if you remote into it and unsuspend the machine, the slave will still get started as long as the Launcher is running (which is required for Power Management to work anyways).
I was just explaining why you didn’t see a slave when you remoted in.
You can use the ping feature from the View menu to see if a machine is online or not. Nothing for the launcher though, so I’ll add this to the wish list.
Looks like we overlooked this. The slave should be able to show in the Status Message column of the slave list whether it was shut down or suspended due to idle shutdown.
I’ve switched it to shutdown/WOL, but I’ll have to observe the results. I couldn’t say for sure whether the machines would return from suspension via Pulse, but I know the slave would never run unless I ran it manually. I’ll have to try it on a workstation that I can see from my desk, so I can observe which parts are working: wake, resume, and/or start slave.
How does Pulse know to send the start slave command to offline machines then? How would it differentiate a machine where a user just closed its slave from one where an idle suspension closed the slave?
Thanks for putting the other requests in the queue.
The pinging feature wasn’t intuitive to find.
Maybe add it to the slave right-click menu for individual pinging?
Also, “disabled” is a bad term for never run.
In this version where is the pinging coming from? Is it Monitor or pulse or p2p?
I’m also worried that pinging would unsuspend machines.
Pulse just sends both commands blindly, since it’s harmless, and helps to ensure that the slave eventually starts. It doesn’t differentiate how the Slave was closed, it just cares if it is in the Machine Group and if it is required to render a job in the queue.
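As a rough illustration of that behavior, the decision could be sketched like this. It is only a sketch based on the description in this thread, not Pulse's real implementation; the `Machine` class, its fields, and the command names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Machine:
    """Hypothetical view of a machine as seen from Pulse."""
    name: str
    in_machine_group: bool       # member of the Power Management Machine Group
    slave_online: bool           # whether the slave currently appears online
    needed_for_queued_job: bool  # a job in the queue could use this machine


def wake_commands(machine: Machine) -> List[str]:
    """Return the commands to send, without asking why the slave is down.

    Both commands always go out together: the WOL packet covers a machine
    that is off or suspended, and the start-slave message covers a machine
    that is already awake but has no slave running.
    """
    if not machine.in_machine_group:
        return []
    if machine.slave_online:
        return []
    if not machine.needed_for_queued_job:
        return []
    return ["send_wol", "start_slave"]


# A suspended machine and one where a user closed the slave look the same:
suspended = Machine("farm01", True, False, True)
user_closed = Machine("ws07", True, False, True)
print(wake_commands(suspended))
print(wake_commands(user_closed))
```

Because nothing in the check records how the slave was closed, both cases above get the same pair of commands.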
Yeah, we could do that. Maybe enabling the Ping option can be done from the right-click menu too.
Hmm, have any suggestions for a better term? Maybe we leave the field blank? We need something that says “Enable Me!”, but I’m a bit fried right now and can’t think of anything better than “Disabled”.
The Monitor. It’s still a direct ping.
Never thought of that… definitely something to test.
I’m still not getting suspended machines to start slave and wake up from suspended state via the pulse service. I only see messages about wake-on-lan. Can’t the start-up do both?
Do you have the option enabled that allows the device to wake up the machine? If not, that could explain why it isn’t working.
I guess a ping could potentially wake up the machine in this case, but I have seen some network adapters that have an option to only allow a magic packet to wake the computer.
For debugging, you could suspend a machine manually, and then use the Remote Control -> Start Machine option from the slave list in the Monitor to wake it up again. You should probably run the Monitor on the Pulse machine when doing this test though, since that’s where the Power Management messages will be coming from.
My current issue isn’t WOL. It’s about starting slave from a suspended machine. I was under the impression that pulse would start slave on all machines in that group if they are qualified to render a job in the queue.
I can’t tell if the machine is unsuspended or not. The issue is that slave isn’t running and will not run without sending a start slaves command from monitor.
Can you hook up a monitor to the machine to see if it’s unsuspended or not? Is there a workstation you could use for testing for now? Basically, if the machine isn’t waking up from the suspended state, then obviously the slave won’t start up on it. That’s why we need to know if the machine is suspended or not so that we can focus on the proper problem.
Also, when you manually run Start Slave from the Monitor, are you running the Monitor on the same machine that Pulse is running on? If not, try that to see if you can manually start the slave. We’ve seen cases where the pulse machine is behind a different switch, and that could mean the magic packet isn’t making it to the slaves.
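If it helps to rule out the network, a magic packet can also be built and sent by hand from the Pulse machine. The sketch below uses the standard Wake-on-LAN layout (6 bytes of 0xFF followed by the target MAC address repeated 16 times). The MAC address, broadcast address, and UDP port 9 are assumptions for illustration, and this is not necessarily how Pulse itself sends the packet.

```python
import socket


def build_magic_packet(mac: str) -> bytes:
    """Build a standard Wake-on-LAN magic packet for the given MAC address."""
    hex_digits = mac.replace(":", "").replace("-", "")
    if len(hex_digits) != 12:
        raise ValueError("MAC address must contain 12 hex digits")
    mac_bytes = bytes.fromhex(hex_digits)  # raises ValueError on non-hex input
    # 6 bytes of 0xFF, then the 6-byte MAC repeated 16 times (102 bytes total).
    return b"\xff" * 6 + mac_bytes * 16


def send_magic_packet(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Broadcast the magic packet over UDP (port 9 is a common choice, assumed here)."""
    packet = build_magic_packet(mac)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(packet, (broadcast, port))


# Example with a placeholder MAC address; replace it with the slave's real one:
# send_magic_packet("00:11:22:33:44:55")
```

If the machine wakes when this is sent from a workstation on the same switch but not when it is sent from the Pulse machine, that would point at the network path rather than at the suspend settings.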