Idle detection issues

ChrisMullinax · April 13, 2023, 4:08pm

I have Deadline clients on a schedule that is set to Start Worker When Machine Is Idle For 10 minutes. On a Mac client machine, I will be working (actively typing and moving the mouse), and Deadline Worker starts up anyway. It sits open for a bit and then closes. In my mind this is not how idle detection should work. Is this expected behavior, or am I missing some part of the setup?

zainali · April 13, 2023, 4:54pm

Hello @ChrisMullinax

This is not expected. Please turn verbose Pulse logging on from Deadline Monitor> Tools> Configure Repo Options> Application Data> Check the box for verbose Worker and Pulse logs.

Now reproduce the issue and share logs from the machine running Pulse and from the Worker (which comes up even when not idle).

Also share a screenshot of Power Management settings from Deadline Monitor> Tools> Configure Power Management> Idle Shutdown and everything else which is configured.

Do you have Worker scheduling set up too? Deadline Monitor> Tools> Configure Worker Scheduling

ChrisMullinax · April 13, 2023, 5:18pm

Well maybe I don’t have this configured properly. I am not running Pulse and only have idle detection set up in Worker Scheduling, nothing in Power Management. @zainali is this not the way to do it?

ChrisMullinax · April 13, 2023, 5:40pm

These are all direct connections to the repository on a local farm. I have never set up Pulse because I didn’t see it as applicable to our farm.

zainali · April 14, 2023, 7:39pm

Gotcha, so it is Worker scheduling which is not working. For that case I still need to see verbose Worker and Launcher logs.

Also confirm, are you running Launcher as a service? Idle detection does not work if Launcher is running as a service.

ChrisMullinax · May 23, 2023, 2:33pm

I ended up installing Pulse to see if anything would change, but I’m still having the same issues. I’ve attached Slave and Launcher logs from a machine I was actively working on when Worker launched. I’ve also included the Pulse log. We are not running Launcher as a service.
Archive.zip (10.2 KB)

Justin_B · May 23, 2023, 2:58pm

It looks like idle detection is running properly. We can see logs from the Launcher with a bouncing ‘time-till-idle’, which shows it’s actually picking up on your usage.

However I can see the issue you described happening here:

2023-05-23 09:20:04:  Launcher Thread - Restarting Worker Edit-07s-Mac-Pro because it has stalled
2023-05-23 09:20:04:  No Worker to shutdown
2023-05-23 09:20:04:  Launching Worker: 
2023-05-23 09:20:37:  Launcher Scheduling - Worker "Edit-07s-Mac-Pro" in Group "Global" GroupSlaveStop: "Normal stop"
2023-05-23 09:20:37:  Sending command to Worker on port 58914: StopSlave

For some reason the Worker is considered stalled by the Launcher and has to be restarted to resolve the stall. Could we get all the application logs from Edit-07s-Mac-Pro? The Worker log you shared is from the restart and immediate close - I’m curious what the previous log looks like and what might be causing the Launcher to consider the Worker stalled.
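
If it helps when going through the rest of the logs, here is a rough Python sketch for pulling out every stall-triggered restart. It assumes the Launcher log lines follow the same layout as the excerpt above; `find_stall_restarts` is just an illustrative helper, not part of Deadline, and the pattern may need adjusting for other Deadline versions.

```python
import re
from datetime import datetime

# Pattern modelled on the Launcher line quoted above (an assumption about the
# log layout, not an official format specification).
STALL_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}):\s+"
    r"Launcher Thread - Restarting Worker (.+?) because it has stalled\s*$"
)


def find_stall_restarts(lines):
    """Return (timestamp, worker_name) for each stall restart found in lines."""
    hits = []
    for line in lines:
        match = STALL_RE.match(line.strip())
        if match:
            when = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            hits.append((when, match.group(2)))
    return hits


# Lines copied from the excerpt above.
sample = [
    "2023-05-23 09:20:04:  Launcher Thread - Restarting Worker Edit-07s-Mac-Pro because it has stalled",
    "2023-05-23 09:20:04:  No Worker to shutdown",
    "2023-05-23 09:20:37:  Sending command to Worker on port 58914: StopSlave",
]

for when, worker in find_stall_restarts(sample):
    print(f"{when.isoformat(sep=' ')}  stalled restart of {worker}")
```

To run it against a real file, pass the open file object (or `f.readlines()`) instead of `sample`.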

Thanks!

ChrisMullinax · May 23, 2023, 3:12pm

Archive.zip (39.1 KB)
Here are all the logs from today.

Justin_B · May 24, 2023, 7:14pm

So I think I’ve figured out why the Launcher considers the Worker stalled and triggers the restart. When the Worker gets shut down, not all of its internal threads successfully shut down in the 30 seconds we give the Worker, which means the Worker doesn’t report a successful shutdown. This results in the Worker showing up as stalled, which the Launcher attempts to solve by starting the Worker again.

The odd thing is the mismatch in shutdown times. The Worker only starts its 30 second count at 2023-05-23 09:04:16 and has made it to 22 seconds remaining at 2023-05-23 09:04:37. So it only counts down 8 seconds even though 21 seconds have actually passed based on the timestamps.

The Launcher starts at 2023-05-23 09:04:11 and ends its countdown at 2023-05-23 09:04:45, a generous 34 seconds later. At that point it force kills the Worker’s process.
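
For anyone who wants to check the timing, here is a small Python sketch of the arithmetic, using only the timestamps quoted in this post. The 30 second limit and the "22 seconds remaining" figure come from the logs as described here; `elapsed_seconds` is just an illustrative helper.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"


def elapsed_seconds(start, end):
    """Whole seconds between two log timestamps."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return int(delta.total_seconds())


# Worker side: wall-clock time versus what its countdown says has passed.
worker_elapsed = elapsed_seconds("2023-05-23 09:04:16", "2023-05-23 09:04:37")
worker_counted = 30 - 22  # started at 30, logged 22 seconds remaining

# Launcher side: time from starting its countdown to the force kill.
launcher_elapsed = elapsed_seconds("2023-05-23 09:04:11", "2023-05-23 09:04:45")

print(f"Worker wall-clock: {worker_elapsed}s, counted down: {worker_counted}s")
print(f"Launcher window: {launcher_elapsed}s")
```

From those timestamps the Worker's countdown advances 8 seconds over 21 seconds of wall-clock time, and the Launcher's window works out to 34 seconds.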

It’s really odd behaviour that I’m not quite sure how to resolve. I think the issue comes from our force-quit command, where we enforce the shutdown with a timer. As a test, if you enable ‘Allow Worker to Finish Its Current Task When Stopping’ in the Monitor under Tools → Configure Worker Scheduling, does this behaviour persist?