AWS Thinkbox Discussion Forums

cancelling task hangs slave

Found 5 slaves with this error:

2013-08-19 13:41:08: Scheduler Thread - Cancelling task because task “2_1009-1012” could not be found
2013-08-19 13:41:08: Scheduler Thread - The task has either been changed externally (and requeued), or the Job has been deleted.
2013-08-19 13:41:24: Scheduler Thread - Cancelling task because task “2_1009-1012” could not be found
2013-08-19 13:41:24: Scheduler Thread - The task has either been changed externally (and requeued), or the Job has been deleted.

This repeats endlessly in the log. The slave GUI is all white and unresponsive.
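
For anyone triaging this across many slaves, here is a minimal sketch of how the repeating message could be detected in a slave log. It assumes the line format shown in the excerpt above (timestamp, "Scheduler Thread", then the cancel message); the function name `find_cancel_loops` and the `min_repeats` threshold are hypothetical, not part of Deadline.

```python
import re

# Quote characters around the task id: straight quotes, or the curly quotes
# that appear when the log is pasted through the forum.
QUOTE = '["\u201c\u201d]'

# Pattern derived from the log lines quoted in this thread (an assumption,
# not an official Deadline log specification).
CANCEL_RE = re.compile(
    r'^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}): Scheduler Thread - '
    r'Cancelling task because task ' + QUOTE + r'(?P<task>.+?)' + QUOTE +
    r' could not be found'
)

def find_cancel_loops(lines, min_repeats=3):
    """Return {task_id: (count, first_ts, last_ts)} for every task whose
    cancel message appears at least min_repeats times."""
    seen = {}
    for line in lines:
        m = CANCEL_RE.match(line)
        if not m:
            continue
        task, ts = m.group('task'), m.group('ts')
        count, first, _ = seen.get(task, (0, ts, ts))
        seen[task] = (count + 1, first, ts)
    return {t: v for t, v in seen.items() if v[0] >= min_repeats}
```

Feeding it the lines of a slave log (for example `find_cancel_loops(open(path, encoding="utf-8", errors="replace"))`, with `path` being whatever log file you are checking) gives the affected task id along with the first and last time the message was logged.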

Hey Laszlo,

Can you post the full log? That should show us the last thing the slave did before it got stuck like this.

Thanks!

- Ryan
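
To pull out "the last thing the slave did" without reading the whole log by hand, something like the following could work. It is only a sketch: it assumes the cancel message text quoted earlier in this thread marks the start of the hang, and `last_line_before_loop` is a hypothetical helper name.

```python
def last_line_before_loop(lines, marker="Cancelling task because task"):
    """Return the last non-empty line logged before the first cancel
    message, or None if the marker never appears (or appears first)."""
    previous = None
    for line in lines:
        if marker in line:
            return previous
        if line.strip():
            previous = line.rstrip("\n")
    return None
```

Run against a full slave log, this returns the line immediately preceding the first cancel message, which is usually the plugin or task step worth looking at.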

I suspect this was a python crash as well, in one of the threads.

Looking back a couple days, this is where that error started:

2013-08-16 16:34:31: Scheduler Thread - Synchronizing job auxiliary files from \\inferno2\deadline\repository6\jobs\520eb567af37900ecce58130
2013-08-16 16:34:31: Scheduler Thread - Synchronization time for job files: 0.000 s
2013-08-16 16:34:31: Scheduler Thread - Synchronizing plugin files from \\inferno2\deadline\repository6\plugins\Nuke
2013-08-16 16:34:31: Scheduler Thread - Synchronization time for plugin files: 31.200 ms
2013-08-16 16:34:32: 0: Got task!
2013-08-16 16:34:32: 0: Plugin will be reloaded because a new job has been loaded, or one of the job files or plugin files has been modified
2013-08-16 16:34:32: Constructor: Nuke
2013-08-16 16:34:32: 0: Loaded plugin: Nuke
2013-08-16 16:34:32: 0: Task timeout is 72000 seconds (Regular Task Timeout)
2013-08-16 16:34:32: 0: Loaded job: [BURN] Nuke Render: LEM_006_1010_anim_v0009_mpf.nk (520eb567af37900ecce58130)
2013-08-16 16:34:33: 0: INFO: Executing plugin script C:\Users\scanlinevfx\AppData\Local\Thinkbox\Deadline6\slave\LAPRO0307\plugins\Nuke.py
2013-08-16 16:41:10: Scheduler Thread - Cancelling task because task “1_1005-1008” could not be found
2013-08-16 16:41:10: Scheduler Thread - The task has either been changed externally (and requeued), or the Job has been deleted.
2013-08-16 16:41:27: Scheduler Thread - Cancelling task because task “1_1005-1008” could not be found
2013-08-16 16:41:27: Scheduler Thread - The task has either been changed externally (and requeued), or the Job has been deleted.
2013-08-16 16:41:43: Scheduler Thread - Cancelling task because task “1_1005-1008” could not be found
2013-08-16 16:41:43: Scheduler Thread - The task has either been changed externally (and requeued), or the Job has been deleted.
2013-08-16 16:41:58: Scheduler Thread - Cancelling task because task “1_1005-1008” could not be found
2013-08-16 16:41:58: Scheduler Thread - The task has either been changed externally (and requeued), or the Job has been deleted.
2013-08-16 16:42:16: Scheduler Thread - Cancelling task because task “1_1005-1008” could not be found
2013-08-16 16:42:16: Scheduler Thread - The task has either been changed externally (and requeued), or the Job has been deleted.

The full log is attached

There was no event viewer entry on the machine for this crash btw. Usually, the python crashes have one.

Thanks! It could definitely be related to the python crash if it only crashed the thread and not the entire slave application. Let’s keep an eye out for this after you have a chance to update to beta 2 with the new python crash fix.

Yeah, in the long log I did find one Python exception higher up, so maybe something got corrupted there and took a while to trickle down.

This is still happening with beta2, attached are corresponding logs.
deadlineslave_LAPRO0321-LAPRO0321-2013-08-23-0000.zip (296 KB)

Found a similar one, but this slave actually crashed too.

The event viewer entry:

Log Name: Application
Source: Application Error
Date: 8/23/2013 10:22:53 AM
Event ID: 1000
Task Category: (100)
Level: Error
Keywords: Classic
User: N/A
Computer: lapro0329
Description:
Faulting application name: deadlineslave.exe, version: 6.1.0.52340, time stamp: 0x5213a7d2
Faulting module name: python26.DLL, version: 2.6.7150.1013, time stamp: 0x4e00e564
Exception code: 0x40000015
Fault offset: 0x00000000001233ba
Faulting process id: 0x2e54
Faulting application start time: 0x01ce9f543c9094b2
Faulting application path: C:\Program Files\Thinkbox\Deadline6\bin\deadlineslave.exe
Faulting module path: C:\Program Files\Thinkbox\Deadline6\bin\python26.DLL
Report Id: a40bdaf3-0c18-11e3-b0d7-003048c67815
Event Xml:
EventID: 1000
Level: 2
Task: 100
Keywords: 0x80000000000000
EventRecordID: 6731
Channel: Application
Computer: lapro0329
Data: deadlineslave.exe
Data: 6.1.0.52340
Data: 5213a7d2
Data: python26.DLL
Data: 2.6.7150.1013
Data: 4e00e564
Data: 40000015
Data: 00000000001233ba
Data: 2e54
Data: 01ce9f543c9094b2
Data: C:\Program Files\Thinkbox\Deadline6\bin\deadlineslave.exe
Data: C:\Program Files\Thinkbox\Deadline6\bin\python26.DLL
Data: a40bdaf3-0c18-11e3-b0d7-003048c67815

Last relevant lines of log:

2013-08-22 16:39:41: 0: Got task!
2013-08-22 16:39:41: Constructor: MayaBatch
2013-08-22 16:39:41: 0: Loaded plugin: MayaBatch
2013-08-22 16:39:41: 0: Task timeout is 72000 seconds (Regular Task Timeout)
2013-08-22 16:39:41: 0: Loaded job: [BURN] Software Render: LEM_006_1240_maya_animation_Layout.ma version: v0037 (52169e89cc91191b5c4a4f2c)
2013-08-22 16:39:41: 0: INFO: Executing plugin script C:\Users\scanlinevfx\AppData\Local\Thinkbox\Deadline6\slave\LAPRO0329\plugins\MayaBatch.py
2013-08-22 16:50:05: Scheduler Thread - Cancelling task because task “46_1231-1235” could not be found
2013-08-22 16:50:05: Scheduler Thread - The task has either been changed externally (and requeued), or the Job has been deleted.
2013-08-22 16:50:22: Scheduler Thread - Cancelling task because task “46_1231-1235” could not be found
2013-08-22 16:50:22: Scheduler Thread - The task has either been changed externally (and requeued), or the Job has been deleted.
2013-08-22 16:50:40: Scheduler Thread - Cancelling task because task “46_1231-1235” could not be found
2013-08-22 16:50:40: Scheduler Thread - The task has either been changed externally (and requeued), or the Job has been deleted.
2013-08-22 16:50:56: Scheduler Thread - Cancelling task because task “46_1231-1235” could not be found
2013-08-22 16:50:56: Scheduler Thread - The task has either been changed externally (and requeued), or the Job has been deleted.
2013-08-22 16:51:15: Scheduler Thread - Cancelling task because task “46_1231-1235” could not be found
2013-08-22 16:51:15: Scheduler Thread - The task has either been changed externally (and requeued), or the Job has been deleted.
2013-08-22 16:51:30: Scheduler Thread - Cancelling task because task “46_1231-1235” could not be found
2013-08-22 16:51:30: Scheduler Thread - The task has either been changed externally (and requeued), or the Job has been deleted.
2013-08-22 16:51:45: Scheduler Thread - Cancelling task because task “46_1231-1235” could not be found
2013-08-22 16:51:45: Scheduler Thread - The task has either been changed externally (and requeued), or the Job has been deleted.
2013-08-22 16:52:01: Scheduler Thread - Cancelling task because task “46_1231-1235” could not be found
2013-08-22 16:52:01: Scheduler Thread - The task has either been changed externally (and requeued), or the Job has been deleted.
2013-08-22 16:52:20: Scheduler Thread - Cancelling task because task “46_1231-1235” could not be found

Then that just repeats for 2 days. Attached are the full logs
deadlineslave_LAPRO0329-LAPRO0329-2013-08-23-0000.zip (370 KB)

Another, no crash log though.

Last relevant lines:
2013-08-22 20:45:17: 0: INFO: WINDOWS_TRACING_LOGFILE=C:\BVTBin\Tests\installpackage\csilogfile.log
2013-08-22 20:45:17: 0: INFO: Start Job called - starting up 3dsmax plugin
2013-08-22 20:45:17: 0: INFO: Rendering with 3dsmax version: 2012
2013-08-22 20:45:17: 0: INFO: Build of 3dsmax to force: 64bit
2013-08-22 20:45:17: 0: INFO: Rendering with executable: C:\3ds Max 2012\3dsmax.exe
2013-08-22 20:45:17: 0: INFO: Checking registry for 3dsmax language code
2013-08-22 20:45:17: 0: INFO: Found language code: 409
2013-08-22 20:45:17: 0: INFO: Language code string: enu
2013-08-22 20:45:17: 0: INFO: Fail on existing 3dsmax process: 0
2013-08-22 20:45:17: 0: INFO: Load 3dsmax timeout: 1000 seconds
2013-08-22 20:45:17: 0: INFO: Start job timeout: 1000 seconds
2013-08-22 20:45:17: 0: INFO: Progress update timeout: 8000 seconds
2013-08-22 20:45:17: 0: INFO: Progress update timout disabled: 0
2013-08-22 20:45:17: 0: INFO: Slave mode enabled: 1
2013-08-22 20:45:17: 0: INFO: Silent mode enabled: 0
2013-08-22 20:45:17: 0: INFO: Local rendering enabled: 1
2013-08-22 20:45:17: 0: INFO: Running render sanity check using 3dsmaxcmd.exe
2013-08-22 20:46:54: Scheduler Thread - Cancelling task because task “17_1086-1090” could not be found
2013-08-22 20:46:54: Scheduler Thread - The task has either been changed externally (and requeued), or the Job has been deleted.
2013-08-22 20:47:12: Scheduler Thread - Cancelling task because task “17_1086-1090” could not be found
2013-08-22 20:47:12: Scheduler Thread - The task has either been changed externally (and requeued), or the Job has been deleted.
2013-08-22 20:47:30: Scheduler Thread - Cancelling task because task “17_1086-1090” could not be found
2013-08-22 20:47:30: Scheduler Thread - The task has either been changed externally (and requeued), or the Job has been deleted.
2013-08-22 20:47:46: Scheduler Thread - Cancelling task because task “17_1086-1090” could not be found
2013-08-22 20:47:46: Scheduler Thread - The task has either been changed externally (and requeued), or the Job has been deleted.
2013-08-22 20:48:05: Scheduler Thread - Cancelling task because task “17_1086-1090” could not be found
2013-08-22 20:48:05: Scheduler Thread - The task has either been changed externally (and requeued), or the Job has been deleted.

As I was clearing up the hung nodes, I found another that generated a crash entry with the cancelling issue. Exact same stack as the one before:

Attached is the slave log
deadlineslave_LAPRO1305-LAPRO1305-2013-08-23-0000.log (1.98 MB)
