Hi,
It would be great if Pulse and Power Management were aware that they had tried to wake up a machine but the machine did not come online within a given time frame. Let's call this feature “Maximum number of minutes before a woken up slave should become online”, with the tooltip “The interval, in minutes, to wait for a given slave to become online”.
How it’s currently working:
Assume we have one job with a single task and two slaves, A and B (one slave per machine; the startup order is A, then B). Both machines are offline.
- The job is submitted to DL
- Pulse checks that both slaves can render the task and selects machine A to be woken up.
- Machine A does not wake up.
- Wait for the “Next pending job scan” timeout
- Go to step 2
Effect:
Pulse keeps trying to wake up machine A, but it never comes online. Machine B is never tried, so the job waits forever.
How the proposed feature should work:
Same assumptions as before, and the wake up timeout is set
- The job is submitted to DL
- Pulse checks that both slaves can render the task and selects machine A to be woken up.
- Machine A does not wake up.
- Wait for the “Next pending job scan” timeout
- Check whether the wake up timeout has passed; if not, go to step 2
- The wake up timeout has passed, so Pulse decides to wake up machine B. Maybe machine A could even be marked as non-responsive (?)
- Machine B starts up and soon the job will be rendered
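To make the steps above more concrete, here is a rough sketch of the selection logic I have in mind. It is not based on how Pulse is actually implemented; `WakeState`, `pick_slave_to_wake` and `wake_timeout` are names I made up for illustration, and times are plain numbers in minutes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class WakeState:
    # Hypothetical bookkeeping kept between pending job scans.
    first_attempt: Dict[str, float] = field(default_factory=dict)  # slave -> minute of first wake attempt
    timed_out: Set[str] = field(default_factory=set)               # slaves that missed the wake up timeout


def pick_slave_to_wake(
    startup_order: List[str],
    online: Set[str],
    state: WakeState,
    now: float,
    wake_timeout: float,
) -> Optional[str]:
    """Return the slave to send a wake up request to on this scan, or None."""
    # If a capable slave is already online, there is nothing to wake.
    if any(slave in online for slave in startup_order):
        return None
    for slave in startup_order:
        if slave in state.timed_out:
            continue  # already given up on this one (could be shown as non-responsive)
        started = state.first_attempt.setdefault(slave, now)
        if now - started >= wake_timeout:
            state.timed_out.add(slave)  # timeout passed, move on to the next slave
            continue
        return slave  # still within the timeout, keep trying this slave
    return None  # every slave in the startup order has timed out


# Example: scans every 5 minutes, wake up timeout of 10 minutes.
state = WakeState()
order = ["A", "B"]
first = pick_slave_to_wake(order, set(), state, now=0, wake_timeout=10)    # "A"
second = pick_slave_to_wake(order, set(), state, now=5, wake_timeout=10)   # "A" again
third = pick_slave_to_wake(order, set(), state, now=10, wake_timeout=10)   # "B", A timed out
```

In this sketch the timeout is measured from the first wake attempt for each slave, and a timed-out slave is simply skipped on later scans. Whether it should be retried at some point, or cleared once it comes online by itself, is left open.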
Hope this makes sense