Job appears to be cancelled and requeued at random

I’ve seen this hit me a few times, but now it’s finally happening repeatedly. I’m on Deadline 10.1.21.4 and all workers are upgraded.

I have a worker running a long job (it’s a Python submission that’s calculating a checksum to save for later). After about three minutes the task restarts and doesn’t report any error.
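For reference, the script boils down to something like the sketch below (the real paths, hash algorithm, and chunk size differ, so treat it as a stand-in): it just reads the file in chunks and prints the progress lines that show up as STDOUT in the worker log.

import hashlib
import os
import sys

SOURCE = "/path/to/large/file"        # placeholder path
CHECKSUM_OUT = SOURCE + ".sha256"     # placeholder output location
CHUNK = 8 * 1024 * 1024               # read 8 MB at a time

def main():
    total = os.path.getsize(SOURCE)
    done = 0
    last_pct = -1
    digest = hashlib.sha256()
    with open(SOURCE, "rb") as f:
        while True:
            block = f.read(CHUNK)
            if not block:
                break
            digest.update(block)
            done += len(block)
            pct = int(done * 100 / total)
            if pct != last_pct:
                # These are the "Progress: N%" lines captured as STDOUT in the log below.
                print("Progress: %d%%" % pct)
                sys.stdout.flush()
                last_pct = pct
    with open(CHECKSUM_OUT, "w") as out:
        out.write(digest.hexdigest() + "\n")

if __name__ == "__main__":
    main()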

The specific error is below. I’m printing progress as it goes, and somewhere around the 8–10% mark I see:

2022-08-02 14:02:28: Scheduler Thread - Task "0_0-0" has been requeued, because ChangedStatus. Cancelling task...
2022-08-02 14:02:28: 0: RenderThread CancelCurrentTask called, will transition from state None to None
2022-08-02 14:02:28: 0: In the process of canceling current task: ignoring exception thrown by PluginLoader

I’m not cancelling this task, and no one else is on the farm. The requeue reports show the same thing, and the script runs fine when run manually.

Any suggestions?

The job is picked up again by the same worker with no error reported, and it restarts / requeues endlessly. Here is the full worker log:

2022-08-02 14:01:02: 0: STDOUT: Progress: 10%
2022-08-02 14:01:29: 0: STDOUT: Progress: 11%
2022-08-02 14:01:56: 0: STDOUT: Progress: 12%
2022-08-02 14:02:24: 0: STDOUT: Progress: 13%
2022-08-02 14:02:28: Scheduler Thread - Task "0_0-0" has been requeued, because ChangedStatus. Cancelling task...
2022-08-02 14:02:28: 0: RenderThread CancelCurrentTask called, will transition from state None to None
2022-08-02 14:02:28: 0: In the process of canceling current task: ignoring exception thrown by PluginLoader
2022-08-02 14:02:28: 0: Unloading plugin: Python
2022-08-02 14:02:28: 0: Executing plugin command of type 'Cancel Task'
2022-08-02 14:02:28: 0: Done executing plugin command of type 'Cancel Task'
2022-08-02 14:02:29: 0: Sending kill command to process Python with id: 11714
2022-08-02 14:02:29: 0: Done executing plugin command of type 'Render Task'
2022-08-02 14:02:29: 0: Executing plugin command of type 'End Job'
2022-08-02 14:02:29: 0: Done executing plugin command of type 'End Job'
2022-08-02 14:02:29: 0: Error in EndJob: Process was canceled externally.
2022-08-02 14:02:29: at Deadline.Plugins.PluginWrapper.RenderTasks(Task task, String& outMessage, AbortLevel& abortLevel)
2022-08-02 14:02:31: Sending kill command to process deadlinesandbox with id: 11703
2022-08-02 14:02:31: 0: Render Thread - Render State transition from = 'Rendering' to = 'WaitingForTask'
2022-08-02 14:02:31: Scheduler Thread - In the process of canceling current tasks: ignoring exception thrown by render thread 0
2022-08-02 14:02:31: Scheduler Thread - Scheduler State transition from = 'PreRendering' to = 'PostRendering'
2022-08-02 14:02:31: Scheduler Thread - Scheduler State transition from = 'PostRendering' to = 'EndJob'
2022-08-02 14:02:31: Scheduler Thread - Scheduler State transition from = 'EndJob' to = 'WaitingForJob'
2022-08-02 14:02:31: Scheduler Thread - Seconds before next job scan: 1
2022-08-02 14:02:32: Scheduler Thread - Performing pending job scan...
2022-08-02 14:02:32: Skipping pending job scan because it is not required at this time
2022-08-02 14:02:32: Scheduler Thread - Performing repository repair...
2022-08-02 14:02:32: Skipping repository repair because it is not required at this time
2022-08-02 14:02:32: Scheduler Thread - Performing house cleaning...
2022-08-02 14:02:32: Skipping house cleaning because it is not required at this time
2022-08-02 14:02:32: Scheduler Thread - Scheduler State transition from = 'WaitingForJob' to = 'LicenseCheck'
2022-08-02 14:02:32: Scheduler Thread - Scheduler State transition from = 'LicenseCheck' to = 'LicenseConfirmed'
2022-08-02 14:02:32: Scheduler - Previously-acquired limits: [Deadline.LimitGroups.LimitGroupStub]
2022-08-02 14:02:32: Scheduler - Limit Stub for '62e96cf27c9381231cfeca53' (held by 'pipeline-01') was removed.
2022-08-02 14:02:32: Scheduler - Scheduler - Acquired limit '62e96cf27c9381231cfeca53'
2022-08-02 14:02:32: Scheduler - Successfully dequeued 1 task(s) for Job '62e96cf27c9381231cfeca53'. Returning.
2022-08-02 14:02:32: Scheduler - Job scan acquired 1 tasks in 0s after evaluating 1 different jobs.
2022-08-02 14:02:32: Scheduler - Limits held after Job scan: [62e96cf27c9381231cfeca53]
2022-08-02 14:02:32: Scheduler - Returning limit stubs not in use.
2022-08-02 14:02:32: Scheduler Thread - Job's Limit Groups:
2022-08-02 14:02:32: Scheduler Thread - Scheduler State transition from = 'LicenseConfirmed' to = 'LicenseCheck'
2022-08-02 14:02:32: Scheduler Thread - Scheduler State transition from = 'LicenseCheck' to = 'LicenseConfirmed'
2022-08-02 14:02:32: Scheduler Thread - Scheduler State transition from = 'LicenseConfirmed' to = 'StartJob'
2022-08-02 14:02:33: 0: Render Thread - Render State transition from = 'WaitingForTask' to = 'ReceivedTask'
2022-08-02 14:02:33: Scheduler Thread - Scheduler State transition from = 'StartJob' to = 'PreRendering'
2022-08-02 14:02:33: 0: Got task!
2022-08-02 14:02:33: 0: Render Thread - Render State transition from = 'ReceivedTask' to = 'Other'
2022-08-02 14:02:33: 0: Loading Job's Plugin timeout is Disabled
2022-08-02 14:02:33: 0: SandboxedPlugin: Render Job As User disabled, running as current user 'admin'
2022-08-02 14:02:34: '/Users/admin/Library/Application Support/Thinkbox/Deadline10/pythonAPIs/2022-04-20T21:27:48.0000000Z' already exists. Skipping extraction of PythonSync.
2022-08-02 14:02:35: 0: Loaded plugin Python
2022-08-02 14:02:35: All job files are already synchronized
2022-08-02 14:02:35: Plugin Python was already synchronized.
2022-08-02 14:02:35: 0: Executing plugin command of type 'Initialize Plugin'

Does it have a timeout applied to it?

It does not, unless there’s another place the timeout could be applied?
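For what it’s worth, the submission itself is about as bare-bones as it gets; it’s roughly the sketch below (paths are placeholders, the key list is trimmed, and the job info keys are from memory of the manual job submission docs, so double-check them against your repository), with the task timeout explicitly zeroed:

import os
import subprocess
import tempfile

# Rough shape of the submission. TaskTimeoutMinutes=0 and EnableAutoTimeout=false
# should mean no task timeout applies to the job at all.
job_info = {
    "Plugin": "Python",
    "Name": "checksum job",
    "Frames": "0",               # single task, which is why the log shows task 0_0-0
    "TaskTimeoutMinutes": "0",   # 0 = no task timeout
    "EnableAutoTimeout": "false",
}

plugin_info = {
    "ScriptFile": "/path/to/checksum_script.py",  # placeholder
    "Arguments": "",
    "Version": "3.7",            # Python version for the plugin (placeholder)
}

def write_info_file(data, suffix):
    # Job/plugin info files are plain key=value text files.
    fd, path = tempfile.mkstemp(suffix=suffix)
    with os.fdopen(fd, "w") as f:
        for key, value in data.items():
            f.write("%s=%s\n" % (key, value))
    return path

job_file = write_info_file(job_info, "_job_info.job")
plugin_file = write_info_file(plugin_info, "_plugin_info.job")

# Manual submission: deadlinecommand <job info file> <plugin info file>.
# Assumes deadlinecommand is on the PATH.
subprocess.check_call(["deadlinecommand", job_file, plugin_file])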

Here are the requeue reports, which show the “external cancellation”.

I am also having this issue with 3ds Max submissions.
I initially put it down to a task timeout, so I increased the timeouts, but I’m seeing this cancellation report on tasks well before their timeout.

I’m fairly certain it’s not a network issue.

Based on reports from the artists sending the jobs, this appears to happen quite randomly, so I cannot replicate it reliably.
I thought it might be something to do with our 3ds Max scenes and was going to investigate further there; however, the Python submission above is strikingly similar, so it does seem like a Deadline issue.

I am running Deadline 10.1.13.1, which I’m planning to update, but according to the release notes I don’t see any fixes in newer versions that appear related.

Anyone have any ideas about what could be causing this, or suggestions on how to identify the cause?
