Fail render and avoid automatic re-queueing

nrusch · November 27, 2012, 5:27pm

Is there an API function to abort a task and prevent it from being automatically re-queued?

nrusch · November 27, 2012, 5:28pm

Sorry, this probably shouldn’t be in the beta discussion… feel free to move it.

im_thatoneguy · November 28, 2012, 1:13am

If worst comes to worst, you could always set its status to “Completed” and identify the user who completed it as “FAILED”. But I know there is a dedicated “Failed” status for jobs. That would be my preference, if it doesn’t exist for tasks yet.

nrusch · November 28, 2012, 2:55am

Yeah, that’s not really a solution. Basically, I want a way to say “This is a low-level issue, and re-queueing this particular task is a waste of time.”

rrussell · November 28, 2012, 7:07pm

We’ve added this to the todo list. I can see us adding something like a “FailRenderAndTask” plugin API function, which would function like the existing “FailRender” function, but would also mark the task as failed.

nrusch · November 28, 2012, 10:47pm

Just to clarify, in Deadline’s world, what is “FailRender” supposed to indicate?

The feature I’m after is something we currently rely on in Rush. Basically, an exit code of 2 from the script the Rush daemon is executing seems to be the equivalent of what FailRender does now; that is, it indicates a non-fatal error, allowing the task to be re-queued. However, if a Rush task script exits with a code of 1, it treats that as a fatal error, and the task is marked as “Failed” and left alone. From what I can tell, there is currently no way to do this in Deadline.
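To make the Rush behaviour concrete, here is a rough sketch of the mapping I mean. Only exit codes 1 and 2 are as described above; the function name, the “Done” and “Requeued” labels, and treating 0 as success are my own assumptions for illustration, not Rush’s or Deadline’s actual API.

```python
def outcome_for_exit_code(code):
    """Map a task script's exit code to a task outcome (illustrative only)."""
    if code == 0:
        # Assumption: 0 means the task finished normally.
        return "Done"
    if code == 1:
        # Fatal error: mark the task as failed and leave it alone.
        return "Failed"
    if code == 2:
        # Non-fatal error: put the task back in the queue.
        return "Requeued"
    # Other codes are not covered by this sketch.
    raise ValueError("unhandled exit code: %d" % code)
```

The point is just that the script itself decides whether a retry is worthwhile, rather than the queue always retrying.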

rrussell · November 29, 2012, 3:03pm

The FailRender function is used in the plugin to immediately stop the render and requeue the task. It generates an error report with the message specified in the FailRender call. So to compare to Rush, it’s like an exit code of 2.

Rather than introduce a new function like I had suggested before, maybe we just add an optional second parameter to FailRender that specifies the action to take. For example:

Requeue the task with an error (the default, and the current behaviour)
Fail the task with an error (the task is marked as failed instead of being requeued)
Requeue the task without an error (maybe there was a minor issue that shouldn’t count against the job’s error count)

For the last case, the FailRender function name seems odd, so maybe we just need a new function name like “StopTask” or something…
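As a rough sketch of how the three actions listed above might look, assuming an optional second parameter: none of this is the real Deadline plugin API. The names `FailAction`, `Task`, and `fail_render`, and the status strings, are hypothetical and only model the proposed behaviour.

```python
from dataclasses import dataclass, field
from enum import Enum


class FailAction(Enum):
    """Hypothetical actions for the proposed optional second parameter."""
    REQUEUE_WITH_ERROR = 1     # default, matches the current behaviour
    FAIL_WITH_ERROR = 2        # mark the task as failed, do not requeue
    REQUEUE_WITHOUT_ERROR = 3  # requeue, but do not log an error


@dataclass
class Task:
    """Toy stand-in for a task; not a Deadline object."""
    status: str = "Rendering"
    error_reports: list = field(default_factory=list)


def fail_render(task, message, action=FailAction.REQUEUE_WITH_ERROR):
    """Stop the task and apply the requested action (illustrative only)."""
    # An error report is generated unless the caller opted out of it.
    if action is not FailAction.REQUEUE_WITHOUT_ERROR:
        task.error_reports.append(message)
    # Only the "fail" action leaves the task failed; the others requeue it.
    if action is FailAction.FAIL_WITH_ERROR:
        task.status = "Failed"
    else:
        task.status = "Queued"
    return task
```

With something like this, a plugin could call `fail_render(task, "missing license", FailAction.FAIL_WITH_ERROR)` for a low-level problem where a retry would be a waste of time, while existing calls that pass only a message would keep requeueing as they do now.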

Thoughts?

nrusch · November 29, 2012, 6:14pm

I think that all sounds good, including changing the function name. Maybe AbortTask?

rrussell · November 29, 2012, 6:38pm

I like “AbortTask”.