Hi,
Today I found that one task of a job had been rendering endlessly (16 hours, to be precise). I assumed a crashed application and/or Slave and did a “Requeue Task” so the task would be rendered by another Slave. That worked, but even though the task completed and the job was then marked as complete, the status of the original Slave in the Slave panel was still “rendering”. I then RDP’ed into the affected machine and found that Nuke (8.0v6) had crashed and was showing one of the usual Windows crash dialogs, with this info:
[code]Problem signature:
Problem Event Name: APPCRASH
Application Name: Nuke8.0.exe
Application Version: 0.0.0.0
Application Timestamp: 5411d2b9
Fault Module Name: MSVCR100.dll
Fault Module Version: 10.0.30319.1
Fault Module Timestamp: 4ba220dc
Exception Code: 40000015
Exception Offset: 00000000000760d9
OS Version: 6.1.7601.2.1.0.256.48
Locale ID: 1031
Additional Information 1: f50a
Additional Information 2: f50a94c46d1af12074027fcda9ee8c8f
Additional Information 3: c303
Additional Information 4: c3033e59905da9299a0c305ac348ad24
Read our privacy statement online:
http://go.microsoft.com/fwlink/?linkid=104288&clcid=0x0409
If the online privacy statement is not available, please read our privacy statement offline:
C:\Windows\system32\en-US\erofflps.txt[/code]
After I clicked the “Close Application” button, Windows also showed a crash dialog for the Deadline Slave:
[code]Problem signature:
Problem Event Name: BEX64
Application Name: deadlineslave.exe
Application Version: 7.0.0.50
Application Timestamp: 547f1c07
Fault Module Name: Wacom_Tablet.dll_unloaded
Fault Module Version: 0.0.0.0
Fault Module Timestamp: 51154fcc
Exception Offset: 000007fef5415dfd
Exception Code: c0000005
Exception Data: 0000000000000008
OS Version: 6.1.7601.2.1.0.256.48
Locale ID: 1031
Additional Information 1: 1751
Additional Information 2: 1751db00310023f5bc93b01cbe496fe7
Additional Information 3: 1036
Additional Information 4: 103688058abd7fd2151ef073198ffa01
Read our privacy statement online:
http://go.microsoft.com/fwlink/?linkid=104288&clcid=0x0409
If the online privacy statement is not available, please read our privacy statement offline:
C:\Windows\system32\en-US\erofflps.txt[/code]
After I closed that one, Windows showed another one:
[code]Problem signature:
Problem Event Name: APPCRASH
Application Name: deadlineslave.exe
Application Version: 7.0.0.50
Application Timestamp: 547f1c07
Fault Module Name: Wacom_Tablet.dll
Fault Module Version: 6.3.5.3
Fault Module Timestamp: 51154fcc
Exception Code: c000041d
Exception Offset: 0000000000005dfd
OS Version: 6.1.7601.2.1.0.256.48
Locale ID: 1031
Additional Information 1: e6c3
Additional Information 2: e6c3d8284bf05f0a946f68dcbc0dd3eb
Additional Information 3: fbfe
Additional Information 4: fbfe7ab2cada43151b0fa1e2e8b1787f
Read our privacy statement online:
http://go.microsoft.com/fwlink/?linkid=104288&clcid=0x0409
If the online privacy statement is not available, please read our privacy statement offline:
C:\Windows\system32\en-US\erofflps.txt[/code]
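For reference, the three exception codes in these dialogs correspond to well-known NTSTATUS values from the Windows SDK. Here is a small Python sketch to look them up. The table and the helper name `describe_exception_code` are my own, for illustration only, and the table covers just the codes from these dialogs:

```python
# Well-known NTSTATUS values as defined in Microsoft's ntstatus.h.
# Only the three codes from the crash dialogs are listed here.
NTSTATUS_NAMES = {
    0x40000015: "STATUS_FATAL_APP_EXIT",
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC000041D: "STATUS_FATAL_USER_CALLBACK_EXCEPTION",
}


def describe_exception_code(code_hex):
    """Map a hex exception code string (as shown in the dialog) to its name."""
    code = int(code_hex, 16)
    return NTSTATUS_NAMES.get(code, "unknown (0x%08X)" % code)


if __name__ == "__main__":
    # (application, exception code) pairs copied from the dialogs
    crashes = [
        ("Nuke8.0.exe", "40000015"),
        ("deadlineslave.exe", "c0000005"),
        ("deadlineslave.exe", "c000041d"),
    ]
    for app, code in crashes:
        print(app, code, describe_exception_code(code))
```

If I read this right, the Nuke crash is a fatal application exit (typically the C runtime aborting), while both Slave crashes are faults inside the Wacom DLL: an access violation in the already unloaded module, followed by an unhandled exception in a user callback.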
Even then, the status of the Slave didn’t change in the Monitor. I guess this is because there’s no logic implemented for this kind of situation, and because a crashed Slave apparently can’t update its own status anymore?
On the other hand, shouldn’t there be some communication with the deadlinelauncher process to check whether the Slave on that machine actually still exists? Just wondering…
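To illustrate the kind of check I mean, here is a rough Python sketch that flags a Slave as stalled when its last status update is older than a timeout. Everything in it (the function name, the `last_updates` mapping, the 10-minute timeout, the second machine name) is hypothetical and has nothing to do with Deadline’s actual API or behaviour:

```python
from datetime import datetime, timedelta


def find_stalled_slaves(last_updates, now, timeout=timedelta(minutes=10)):
    """Return the names of Slaves whose last status update is older than `timeout`.

    `last_updates` is a hypothetical mapping of Slave name -> datetime of the
    last status update, e.g. as it might be read from the repository.
    """
    return sorted(
        name for name, last_seen in last_updates.items()
        if now - last_seen > timeout
    )


if __name__ == "__main__":
    now = datetime(2015, 1, 1, 12, 0, 0)  # arbitrary example timestamp
    last_updates = {
        "cell-ws-17": now - timedelta(hours=16),   # silent for 16 hours
        "cell-ws-18": now - timedelta(minutes=1),  # hypothetical healthy machine
    }
    print(find_stalled_slaves(last_updates, now))
```

A check like this could run anywhere that can see the update times, so it would not depend on the crashed Slave reporting its own state.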
I attached a screenshot of the Monitor with the job, task and Slave selected.
Now that I’m posting this, I remember seeing the Slave crash message almost every time I RDP into that machine (cell-ws-17); I just forgot to report it until now. After starting the Slave and Launcher again, everything usually runs fine. I don’t know whether this is something you guys can do anything about or whether I need to contact Wacom, as it seems to be related to the Wacom_Tablet.dll named in the crash message. Now that I think about it, this probably means the Slave crash wasn’t caused by the Nuke crash but rather by me RDP’ing into that machine. But since I did that a long time after Nuke crashed, I’m wondering: shouldn’t the Deadline Slave catch a crashed render like this and kill/restart the process?
Cheers,
Holger