Hi,
I’m assuming a network interruption, or that the Slave might appear Stalled… but when and why do these messages start? And once this state is entered, is the reading of the process’s stdout severed? I never seem to see any other messages from the process in the logs.
Listener Thread - fe80::4df1:b377:8733:8a9d%15 has connected
Listener Thread - Received message: SlaveStillRunning
Listener Thread - Responded with: Success: Yes
---- December 06 2012 – 10:16 AM ----
Listener Thread - fe80::4df1:b377:8733:8a9d%15 has connected
Listener Thread - Received message: SlaveStillRunning
Listener Thread - Responded with: Success: Yes
Listener Thread - fe80::4df1:b377:8733:8a9d%15 has connected
Listener Thread - Received message: SlaveStillRunning
Listener Thread - Responded with: Success: Yes
Listener Thread - fe80::4df1:b377:8733:8a9d%15 has connected
Listener Thread - Received message: OnLastTaskComplete RestartMachineSlaveRestart
Listener Thread - Responded with: Success
Listener Thread - fe80::4df1:b377:8733:8a9d%15 has connected
Listener Thread - Received message: SlaveStillRunning
Listener Thread - Responded with: Success: Yes
Listener Thread - fe80::4df1:b377:8733:8a9d%15 has connected
Listener Thread - Received message: SlaveStillRunning
Listener Thread - Responded with: Success: Yes
Listener Thread - fe80::4df1:b377:8733:8a9d%15 has connected
Listener Thread - Received message: SlaveStillRunning
Listener Thread - Responded with: Success: Yes
Listener Thread - fe80::4df1:b377:8733:8a9d%15 has connected
Listener Thread - Received message: SlaveStillRunning
Listener Thread - Responded with: Success: Yes
---- December 06 2012 – 10:17 AM ----
Listener Thread - fe80::4df1:b377:8733:8a9d%15 has connected
Listener Thread - Received message: SlaveStillRunning
Listener Thread - Responded with: Success: Yes
Listener Thread - fe80::4df1:b377:8733:8a9d%15 has connected
Listener Thread - Received message: SlaveStillRunning
Listener Thread - Responded with: Success: Yes
Listener Thread - fe80::4df1:b377:8733:8a9d%15 has connected
Listener Thread - Received message: SlaveStillRunning
Listener Thread - Responded with: Success: Yes
Listener Thread - fe80::4df1:b377:8733:8a9d%15 has connected
These messages start when the slave has been sent a remote command that could result in the machine shutting down or restarting. Basically, the launcher takes over processing the command, and part of that processing is to ensure all slaves on the machine have shut down before shutting down or restarting the machine. This is primarily meant to handle the case where more than one slave is running on the machine.
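As a rough illustration of that wait-then-restart behaviour (this is not Deadline's actual code; every name below is hypothetical and the 10-second interval is made up), the logic is something like:

```python
import time
from typing import Callable, Iterable


def wait_for_slaves_then_restart(
    slave_names: Iterable[str],
    is_still_running: Callable[[str], bool],   # hypothetical "SlaveStillRunning" query
    restart_machine: Callable[[], None],       # hypothetical restart action
    poll_interval_seconds: float = 10.0,       # assumed value, not Deadline's
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll every slave until none is running, then restart. Returns the poll count."""
    pending = set(slave_names)
    polls = 0
    while pending:
        polls += 1
        # Keep only the slaves that still answer "yes, I am running".
        pending = {name for name in pending if is_still_running(name)}
        if pending:
            sleep(poll_interval_seconds)
    # Only reached once every slave has reported that it has stopped.
    restart_machine()
    return polls
```

Each pass through the loop corresponds to one "Received message: SlaveStillRunning" / "Responded with: Success: Yes" pair in the output, which is why the messages repeat for as long as a slave is still running.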
It sounds like in this case, the slave itself wasn’t shutting down. What did the slave’s log look like here?
The chunk of output I put in the first post just keeps repeating and repeating. I’ve seen some Tasks that run for an hour or more in this state. They do finally complete.
To clarify, I see this output if I remote into the slave. I am not sure if that output ends up in the actual logs in Deadline monitor. I don’t think that it does.
Sorry Paul, I was thinking that was the Launcher log.
Just to confirm, is the slave in the middle of a task when it starts printing out those messages? If so, then this is normal operation because an “OnLastTaskComplete” command was sent, which means the slave won’t shut down until its task is finished.
Scenario
- Slave is in the middle of rendering a task.
- There may be no correlation, but oftentimes I am remoting in to check a Slave that appears to be taking too long to finish a task.
- In Power Management, we did have the Slaves set to Machine Restart every 24 hours… (So is that what is triggering this mode???)… We just disabled this.
- Once the logging on the Slave trips over to this “Listener Thread” mode, it is incessant. It checks multiple times per minute and floods the logs. I am unsure whether the log messages from the actual rendering process are interspersed or whether that connection is severed.
- Tasks do seem to finish eventually.
Essentially we are trying to determine if seeing a Slave in this Listener Thread mode is an indicator of anything BAD. Did something bad happen to trigger this mode (network interruption, lost connection to Pulse, etc.)? Or is it just the Slave hitting the timer set in the Machine Restart tab of Power Management? If it’s Power Management, why can’t it just set the Slave to RestartAfterCurrentTaskCompletion and then leave it alone? It seems excessive to keep nudging the slave… “Hey buddy have you restarted… how about now… how about now…” Haha.
Also, once the Slave is in this mode, is there ANY effect on the Rendering Task? I would guess not, but it really becomes impossible to find any meaningful log messages.
Hey Paul,
Thanks for the additional details. What you’ve described sounds completely normal, since the Machine Restart feature of power management doesn’t restart the machine until the slave has finished its current task. The reason you are seeing all those logged messages is that the Launcher is polling the slave to check if it is still running. There should be no impact on rendering or any other slave operations.
What we will do in the future is suppress this Listener Thread output when slave verbose mode is disabled.
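In the meantime, if you need to read a log captured from the slave, a small filter can drop the noise. This is only a sketch: it assumes the polling lines all begin with the “Listener Thread - ” prefix seen in the output pasted earlier in this thread, and the function name is my own.

```python
from typing import Iterable, Iterator

# Prefix taken from the log lines pasted in this thread (assumed to be stable).
LISTENER_PREFIX = "Listener Thread - "


def drop_listener_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield every line that is not a Listener Thread polling message."""
    for line in lines:
        if not line.lstrip().startswith(LISTENER_PREFIX):
            yield line
```

You could pass it an open file object for a saved copy of the log and print what comes back; timestamp separators and any render output are kept, and only the polling chatter is removed.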
Cheers,