AWS Thinkbox Discussion Forums

.NET slave crash

Two errors in the system event log:
First:

Log Name: Application
Source: .NET Runtime
Date: 8/12/2013 5:13:42 PM
Event ID: 1026
Task Category: None
Level: Error
Keywords: Classic
User: N/A
Computer: lapro0318
Description:
Application: deadlineslave.exe
Framework Version: v4.0.30319
Description: The process was terminated due to an unhandled exception.
Exception Info: System.AccessViolationException
Stack:
at Python.Runtime.Runtime.PyType_GenericAlloc(IntPtr, Int32)
at Python.Runtime.CLRObject..ctor(System.Object, IntPtr)
at Python.Runtime.CLRObject.GetInstance(System.Object)
at Deadline.Plugins.ScriptPlugin.a(System.Object)
at System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
at System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
at System.Threading.ThreadHelper.ThreadStart(System.Object)

Event Xml:
EventID: 1026
Level: 2
Task: 0
Keywords: 0x80000000000000
EventRecordID: 3073
Channel: Application
Computer: lapro0318

Another:

Log Name: Application
Source: Application Error
Date: 8/12/2013 5:13:43 PM
Event ID: 1000
Task Category: (100)
Level: Error
Keywords: Classic
User: N/A
Computer: lapro0318
Description:
Faulting application name: deadlineslave.exe, version: 6.1.0.52215, time stamp: 0x51fa5cf1
Faulting module name: python26.DLL, version: 2.6.7150.1013, time stamp: 0x4e00e564
Exception code: 0xc0000005
Fault offset: 0x00000000000b9509
Faulting process id: 0xb0c
Faulting application start time: 0x01ce92faf578b0bb
Faulting application path: C:\Program Files\Thinkbox\Deadline6\bin\deadlineslave.exe
Faulting module path: C:\Program Files\Thinkbox\Deadline6\bin\python26.DLL
Report Id: 36497ae4-03ad-11e3-aae4-003048c6b711
Event Xml:
EventID: 1000
Level: 2
Task: 100
Keywords: 0x80000000000000
EventRecordID: 3074
Channel: Application
Computer: lapro0318
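
Two of the raw fields in the second event are easier to read once decoded. The sketch below is an illustration, not part of the original post. It assumes the "Faulting application start time" value is a Windows FILETIME (100-nanosecond ticks since 1601-01-01 UTC), and the helper names are made up for this example. Exception code 0xc0000005 is the Windows status code for an access violation, which lines up with the System.AccessViolationException in the first event.

```python
from datetime import datetime, timedelta, timezone

# FILETIME epoch: 1601-01-01 00:00:00 UTC, counted in 100-nanosecond ticks.
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)

# Only the one status code seen in this thread; not a complete table.
KNOWN_STATUS_CODES = {0xC0000005: "STATUS_ACCESS_VIOLATION"}


def filetime_to_datetime(hex_value: str) -> datetime:
    """Convert a hex FILETIME string (assumed format) to a UTC datetime."""
    ticks = int(hex_value, 16)
    # 10 ticks per microsecond; sub-microsecond precision is dropped.
    return FILETIME_EPOCH + timedelta(microseconds=ticks // 10)


def describe_exception_code(hex_value: str) -> str:
    """Look up a status code name, or return 'unknown' if not in the table."""
    return KNOWN_STATUS_CODES.get(int(hex_value, 16), "unknown")


if __name__ == "__main__":
    print(filetime_to_datetime("0x01ce92faf578b0bb").isoformat())
    print(describe_exception_code("0xc0000005"))
```

If the FILETIME assumption holds, the decoded start time tells you how long the slave had been running before the crash at 8/12/2013 5:13:43 PM.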

Interesting. This seems to be related to the other Python issues you've been running into. Thanks for the logs :)

As I mentioned in the other thread, I’ll be taking a much more serious look at a lot of the Python stuff very soon, so hopefully I’ll be able to figure out what’s going on there.

Thanks Jon, much appreciated!

Yeah, it seems like there is some deeper problem with the Python processes there. I just hope it won't be a Mono bug :) Maybe it's something in the exception handling: the crashes are much more common if the scripts themselves have errors or exceptions.

The Windows crashes rarely give a crash dump (most times the GUI just hangs, using 100% of a CPU core), but the Linux one did, so I hope that helps narrow it down.

I don’t think it’s Mono’s fault. I think the problem lies in the Python.NET code, so I’ll have to do some digging. We’re shipping our own build of it alongside Deadline, so we can at least change that code if we need to; changing Mono would be a huge pain :)

I sense garbage collection issues.

I sense that you’re probably right :)

Good news! I was able to reproduce the crash quite reliably with Pulse while running a job dependency script, and I think we’ve got the crashing issue solved. Turns out the problem could only occur when a Python script threw an error. In short, when we were cleaning up the Python exception, it was possible for the exception’s memory block to be deallocated more than once, and if that memory was claimed in between attempts, BOOM!
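
To make the failure mode concrete, here is a minimal toy model of it. This is not the Python.NET code or the actual fix, neither of which is shown in this thread. `ToyHeap`, `Handle`, and both demo functions are hypothetical names invented for the illustration, and block ids stand in for memory addresses.

```python
class ToyHeap:
    """A tiny LIFO free-list allocator; block ids stand in for addresses."""

    def __init__(self, size=4):
        self.free_blocks = list(range(size))
        self.owner = {}  # block id -> name of whoever currently holds it

    def alloc(self, owner):
        block = self.free_blocks.pop()  # most recently freed block comes first
        self.owner[block] = owner
        return block

    def free(self, block):
        # No check that the caller still owns the block: that is the bug.
        self.owner.pop(block, None)
        self.free_blocks.append(block)


class Handle:
    """Frees its block at most once by forgetting the id after release."""

    def __init__(self, heap, owner):
        self.heap = heap
        self.block = heap.alloc(owner)

    def release(self):
        if self.block is None:
            return False  # already released; do nothing
        self.heap.free(self.block)
        self.block = None
        return True


def double_free_demo():
    heap = ToyHeap()
    exc = heap.alloc("exception")
    heap.free(exc)                   # first cleanup
    other = heap.alloc("unrelated")  # block is claimed in between
    heap.free(exc)                   # second cleanup frees the block "unrelated" holds
    third = heap.alloc("third")
    return other, third              # same id: two live owners share one block


def guarded_demo():
    heap = ToyHeap()
    exc = Handle(heap, "exception")
    exc.release()
    other = heap.alloc("unrelated")
    exc.release()                    # no-op the second time
    third = heap.alloc("third")
    return other, third              # distinct ids
```

In `double_free_demo` the two later allocations end up with the same block id, which is the toy equivalent of two objects writing over each other's memory. In `guarded_demo` the second release is ignored, so the ids differ. A real fix would more likely correct the reference counting than add a guard like this one.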

When I was able to reproduce the crash, it would happen after my test script was executed 5-10 times. Now I’ve executed it over 100 times on multiple sessions and it hasn’t crashed once.
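
A repeat-until-it-breaks test like this can be sketched as a small harness. The sketch assumes a plain CPython interpreter running in a child process, so it does not exercise Deadline, Pulse, or Python.NET. `FAILING_SCRIPT` and `run_repeatedly` are hypothetical names. The idea is to tell an ordinary script error (exit code 1 for an uncaught exception) apart from a hard crash of the host process (any other exit code).

```python
import subprocess
import sys

# Stand-in for a script that throws an error, as in the reproduction case.
FAILING_SCRIPT = "raise RuntimeError('simulated script error')"


def run_repeatedly(runs=100):
    """Run the failing script `runs` times; return (index, exit code) of anomalies."""
    anomalies = []
    for i in range(runs):
        result = subprocess.run(
            [sys.executable, "-c", FAILING_SCRIPT],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        # CPython exits with 1 on an uncaught exception. Anything else
        # (a signal on Linux, 0xC0000005 on Windows) suggests a crash.
        if result.returncode != 1:
            anomalies.append((i, result.returncode))
    return anomalies


if __name__ == "__main__":
    print(run_repeatedly(runs=100))
```

An empty list means every run failed in the expected, clean way.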

We’ll be rolling this fix into beta 2 and will try to get it released as soon as we can (either later this week or early next week).

Thanks for your help in tracking this problem down!

- Ryan

Glorious news, Ryan! Fingers crossed that this fixes the other Python crashes, which all seemed to happen at memory allocation time. That seems to line up with this one. :)

Yay!

Beta 2 just went out today - earlier than expected! :)
viewtopic.php?f=84&t=10060

Glory! Grabbing right now.

Thanks a lot!
