AWS Thinkbox Discussion Forums

Timeouts in job properties only requeue tasks, whichever option is selected

Greetings,

We have set timeouts on our MayaBatch and Vray jobs but noticed a very strange behaviour.
Jobs don’t actually time out when they reach the maximum time for the task. They are simply requeued, and the previous attempts are not even marked as failed, so a single machine appears to be handling multiple jobs at the same time.

[image]
These are the results of a test job with a 10-second timeout, which is supposed to produce an error.
[image]
These are the settings I tested the job with.
[image]
Notice how the job reports say “requeue” and how the machine gkl3 is “rendering” multiple jobs at the same time.

I’m running Deadline 10.1.1.3. It apparently doesn’t happen with Deadline 10.0.21.5.

We’d really like to implement timeouts to keep our render nodes from being blocked on a job for too long and restarting (as they are preemptible machines).

I’ve been fiddling around with this, and it’s certainly not behaving as it should.

I’ve gotten the same results that you have on Deadline 10.1.2.2, complete with my single worker claiming it’s on several tasks simultaneously. That might be a side effect of effectively re-queuing a task every ten seconds, but it stays weird at a 30-second cut-off as well.

I’ll get the bug report written up and hopefully have this fixed and working for you soon.

As a total aside (and this might just be due to having zero experience with gcloud), how does this help out with preemptible machines? I’d assume you’d want to keep the long jobs off of them, as opposed to cancelling them after they turn out to be too long? There might be another way to accomplish what you’re trying to do while we work on this.

Greetings,

Thank you for your input.
The jobs shouldn’t take so long, but due to problems on our side, we decided to implement timeouts to detect those jobs that take too long and optimize them afterwards, so we don’t actually know how long they will take before finishing.

So timeouts are quite important. We can make do while the bug is being fixed, but we’d rather have them :grinning:

Oh I get it, re-active rather than pro-active! All better than in-active amirite? :wink:

But we’re working on it!

Hello Justin,

Is there any progress in resolving this issue?
In the meantime, we can’t use the timeout feature of Deadline, and we sometimes get very bad surprises.

Thanks
Christophe

Hi,

@Justin_B is there a solution to this problem?
We (Fortiche Prod) were considering an update but the version we are testing (10.1.3.6) is affected.

I went through the patch notes and did not notice any mention of a bugfix related to this.

As Deadline 10 does not log anything to STDOUT when a timeout occurs (whereas Deadline 9 did log something), we cannot catch it through a handler either.
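
For anyone stuck on an affected version in the meantime, one possible stopgap is to enforce the time limit outside Deadline and print a marker line that a STDOUT handler could match. This is only a rough sketch, assuming the work can be launched as a plain command line (which may not hold for MayaBatch). `run_with_timeout`, the `TASK_TIMEOUT` marker and the exit code 124 (borrowed from GNU `timeout`) are illustrative choices, not anything Deadline provides.

```python
import subprocess
import sys

# Hypothetical marker text; pick anything your STDOUT handler can match on.
TIMEOUT_MARKER = "TASK_TIMEOUT"
# Exit code used when the limit is hit (same convention as GNU `timeout`).
TIMEOUT_EXIT_CODE = 124


def run_with_timeout(cmd, timeout_s):
    """Run cmd (a list of arguments) and return its exit code.

    If the command runs longer than timeout_s seconds, subprocess.run kills
    it, a marker line is printed to STDOUT, and TIMEOUT_EXIT_CODE is returned.
    """
    try:
        completed = subprocess.run(cmd, timeout=timeout_s)
        return completed.returncode
    except subprocess.TimeoutExpired:
        print(f"{TIMEOUT_MARKER}: exceeded {timeout_s}s: {cmd}", flush=True)
        return TIMEOUT_EXIT_CODE


if __name__ == "__main__":
    # Usage: python wrapper.py <timeout_seconds> <command> [args...]
    if len(sys.argv) < 3:
        sys.exit("usage: wrapper.py <timeout_seconds> <command> [args...]")
    sys.exit(run_with_timeout(sys.argv[2:], float(sys.argv[1])))
```

A non-zero exit code plus the marker line gives a handler two independent signals to act on, so the task can be failed rather than silently requeued.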

Thanks,
Benoit

Hello!

That fix is live in 10.1.4, though you should upgrade all the way to the latest release, which is 10.1.12 as of yesterday.
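
If you are auditing several installs, a quick way to tell which ones are at or past the fixed release is to compare version strings as integer tuples. A minimal sketch, assuming dotted numeric version strings like the ones quoted in this thread; the function names are made up for illustration, and it does not model the report that 10.0.21.5 was apparently unaffected.

```python
# First release reported in this thread as containing the fix.
FIRST_FIXED = (10, 1, 4)


def parse_version(text):
    """Turn a dotted version string such as '10.1.3.6' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def is_at_least_fixed_release(version_text):
    """True if the version is 10.1.4 or newer (tuple comparison, left to right)."""
    return parse_version(version_text) >= FIRST_FIXED


if __name__ == "__main__":
    for version in ("10.1.1.3", "10.1.2.2", "10.1.3.6", "10.1.4", "10.1.12"):
        status = "has the fix" if is_at_least_fixed_release(version) else "predates the fix"
        print(f"{version}: {status}")
```

Comparing tuples of integers avoids the string-comparison trap where "10.1.12" would sort before "10.1.4".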

Thanks a lot Justin!
Sorry for not answering earlier, I thought I would get an email for new replies but I didn’t and failed to check.

Ben
