I’ve noticed this a few times now, so I don’t think I’m crazy. Basically, if I submit a job with a task timeout of 20 minutes, let a task get picked up, realize it’s going to run longer than that, and change the max task time to 0, the slave still kills it at 20 minutes.
The slave should at least reconcile its idea of the task timeout with the value actually in the DB one more time, immediately before it kills (or, in this case, doesn’t kill) the task, to avoid wasting a lot of farm time. This obviously doesn’t address the case where the max time is lowered below the current running time of an active task, so that would need to be solved separately…
We actually do re-check at the time the timeout occurs, before killing the task. Maybe it’s the specific case of the timeout being set back to 0 (i.e. disabling it) that isn’t being handled…
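To illustrate the kind of check being described, here is a minimal sketch; the function and names are illustrative assumptions, not the actual slave code, and the "0 means no timeout" convention is taken from the discussion above:

```python
# Hypothetical sketch of the slave-side timeout re-check.
# fetch_task_timeout and should_kill_task are made-up names for illustration.

def should_kill_task(elapsed_seconds: float, fetch_task_timeout) -> bool:
    """Decide whether to kill a task, re-reading the timeout just before acting."""
    timeout = fetch_task_timeout()  # latest value from the DB, not the cached one

    # 0 (or negative) means "no timeout". If this guard were missing, a bare
    # `elapsed_seconds > timeout` comparison against 0 would still kill the
    # task -- which would match the behavior reported above.
    if timeout <= 0:
        return False

    return elapsed_seconds > timeout


# Example: timeout was 20 minutes at pickup, but has since been set to 0 in the DB.
print(should_kill_task(20 * 60, lambda: 0))        # False -- task keeps running
print(should_kill_task(20 * 60, lambda: 15 * 60))  # True  -- genuinely over the limit
```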
We’ve logged it and will run some tests to see if we can reproduce this behavior.
I’ve been trying to reproduce this error by setting a job to time out after 1 minute and then changing the timeout value during the render, either to 0 or to something higher than 1 minute. I have not been able to get the task to time out incorrectly, even when I set the timeout value back to 0 a few seconds before the timeout was due to occur. Can you confirm that these are the results you’re seeing on your end, or do you have some means of reproducing this consistently?