Deadline Command 6.0 has stopped working

This crash is encountered when submitting a Nuke 7.0v8 job. The job is submitted and renders properly but also crashes the slave, eventually causing a stalled node.

Here’s the read-out from the workstation that is submitting:

Problem signature:
Problem Event Name: APPCRASH
Application Name: deadlinecommand.exe
Application Version: 6.0.0.51561
Application Timestamp: 51a8a2a7
Fault Module Name: clr.dll
Fault Module Version: 4.0.30319.296
Fault Module Timestamp: 50483916
Exception Code: c00000fd
Exception Offset: 000000000000394b
OS Version: 6.1.7601.2.1.0.256.48
Locale ID: 4105
Additional Information 1: 23d5
Additional Information 2: 23d541fd32a295aa831089287a1fb8bb
Additional Information 3: bb0c
Additional Information 4: bb0c3ffe262371456090f8cd84f41ed6

Here is the readout from the slave that crashes:

Problem signature:
Problem Event Name: APPCRASH
Application Name: deadlineslave.exe
Application Version: 6.0.0.51561
Application Timestamp: 51a8a270
Fault Module Name: clr.dll
Fault Module Version: 4.0.30319.296
Fault Module Timestamp: 50483916
Exception Code: c00000fd
Exception Offset: 00000000001daa2c
OS Version: 6.1.7601.2.1.0.256.48
Locale ID: 4105
Additional Information 1: c697
Additional Information 2: c697220e0557ca8b6cf0073703cfa552
Additional Information 3: f87a
Additional Information 4: f87ab7d70f0b466f36f4116300dd72be
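
For reference, the exception code shared by both reports can be decoded. This is a minimal Python sketch, assuming the report text has been pasted into a string. The lookup table is a hand-picked subset of Windows NTSTATUS values, and `parse_signature` and `describe_exception` are illustrative names, not part of Deadline.

```python
# Hand-picked subset of Windows NTSTATUS codes (not an exhaustive list).
NTSTATUS_NAMES = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC00000FD: "STATUS_STACK_OVERFLOW",
    0xC0000409: "STATUS_STACK_BUFFER_OVERRUN",
}


def parse_signature(text):
    """Turn 'Key: Value' lines from a crash report into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def describe_exception(fields):
    """Map the hexadecimal 'Exception Code' field to a symbolic name."""
    code = int(fields["Exception Code"], 16)
    return NTSTATUS_NAMES.get(code, "unknown (0x%08X)" % code)


# Excerpt of the slave report above.
report = """Application Name: deadlineslave.exe
Fault Module Name: clr.dll
Exception Code: c00000fd"""

print(describe_exception(parse_signature(report)))
```

Both reports carry c00000fd, which is a stack overflow raised inside the CLR. That says how the process died, not what triggered it.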

Is the slave running on the same machine that the deadlinecommand crash occurs on, or a different machine? Does reinstalling the Deadline 6 client on the machine(s) help?

Cheers,

  • Ryan

Deadline Command crashes on the submitting workstation, and Deadline Slave crashes on the renderdrone. Since it’s a .NET 4.0 problem, I installed .NET 4.5 on the workstation and now receive a different crash message. Microsoft apparently has a hotfix available for this, but it’s only available through a paid support ticket, and there’s still no promise it’s a fix anyway.

EDIT: I also want to say that when the slave machine crashes, it becomes unresponsive to remote control (slave restart, machine restart, etc.). I have to RDP in and restart manually. This didn’t occur with Deadline 5. Thanks for any help you can provide, Ryan. We have a new season starting up and would like this to be stable before the full staff comes back.

EDIT 2: I uninstalled Deadline, deleted c:\program files\thinkbox\deadline and reinstalled, but it didn’t help.

http://support.microsoft.com/kb/2640103

Problem signature:
Problem Event Name: CLR20r3
Problem Signature 01: deadlinecommand.exe
Problem Signature 02: 6.0.0.51561
Problem Signature 03: 51a8a2a7
Problem Signature 04: mscorlib
Problem Signature 05: 4.0.30319.18047
Problem Signature 06: 5155314f
Problem Signature 07: 103
Problem Signature 08: 0
Problem Signature 09: System.StackOverflowException
OS Version: 6.1.7601.2.1.0.256.48
Locale ID: 4105
Additional Information 1: 1e94
Additional Information 2: 1e94280b7dec4c2db6666dd8e4ba393d
Additional Information 3: 3c4a
Additional Information 4: 3c4a8d007f864ee70eaeb67a822d0f45

Do you have all of your Windows updates applied? Sometimes that can fix weird issues like this.

Do any other types of jobs cause this crash?

All Windows and application updates are current.

Maya 2014 crashes deadlinecommand on the workstation but not the slave.
Max 2014 crashes deadlinecommandbg and the max exe on the workstation but not the slave.
After Effects CS6 crashes deadlinecommand but not the slave.

This occurs on multiple workstations and all slaves.

Thanks for the additional info. It’s crazy that so many machines are affected by this. I have a couple more questions:

  1. Which version of Windows are you running on the workstations and the slaves?
  2. When you upgraded from Deadline 5 to Deadline 6, did you install a fresh repository, or did you overwrite the existing one?
  3. When you upgraded, did you install the Deadline 6 client to a new location on your machines, or did you install over the existing Deadline 5 clients?

Thanks!

  • Ryan

Windows 7 Professional SP1

I originally tried to overwrite, but something went wrong, so I installed a fresh copy and it worked fine.

A mix of both. I started out installing over top of 5, but decided to uninstall 5 for the last few workstations. All renderdrones were new installs.

Here’s the message from the Submission Results window after a crash: ‘Process is terminated due to StackOverflowException.’

Thanks for the additional info. I’m pretty stumped as to what could be causing this, especially since it’s so widespread. As a last guess, is there any antivirus software running on all of your machines that could somehow be interfering with things?

Assuming that’s not the case, I would highly suggest emailing support@thinkboxsoftware.com. One of our technical guys can probably set up a remote session, if you want, and look at this firsthand to see if they can come up with any ideas.

Cheers,

  • Ryan

I’ve tried with AV on and off, with no difference.

Going through various .NET bug reports online, I’ve discovered that service packs, security patches and updates can wreak havoc on .NET apps. I saw one guy discover his app only worked if a certain KB update was installed; otherwise it crashed (same crash as this one). He never knew it because he always had it installed on his development PC. Turns out, adding a line to account for 32-bit operations was the fix.

If support can resolve this, then that’s great. Thanks again Ryan.

With the help of Thinkbox support, this has been resolved. It turned out to be a MongoDB connection problem. The installation required an IP address rather than a server name. Our DNS server is functional so I don’t know why this would fail, but it’s up and running now with no crashes.
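
For anyone hitting the same thing, here is a rough way to compare how a server name and an IP address behave from an affected machine. It is only a sketch using Python's standard library. It assumes the database listens on MongoDB's default port 27017, and the host names in the usage comment are placeholders, not values from our setup.

```python
import ipaddress
import socket

MONGO_PORT = 27017  # MongoDB's default port; assumed, adjust to your install


def is_ip_literal(host):
    """Return True if host is already an IPv4/IPv6 address, not a name."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def resolve(host, port=MONGO_PORT):
    """Return the sorted, de-duplicated addresses the OS resolver gives for host.

    Raises socket.gaierror if the name cannot be resolved.
    """
    infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


def can_connect(host, port=MONGO_PORT, timeout=3.0):
    """Return True if a plain TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Usage (placeholder names):
#   resolve("my-db-server")        # what the name resolves to on this machine
#   can_connect("my-db-server")    # TCP reachability by name
#   can_connect("192.168.0.10")    # TCP reachability by IP
```

If the name resolves to several addresses, or to a different address on different machines, that would at least narrow down where name and IP diverge. It does not explain the crash itself.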

Hi,

I’ve just noticed we’re having the same issue. We have Mongo installed with a domain name because we have multiple subnets. Is the only solution to switch to the IP address rather than the domain name? If so that will disconnect a portion of our network…

Cheers

Nick

Just to confirm, does job submission fail from all of your machines? On a machine it does fail from, can you open a command prompt and run deadlinecommand.exe to get the pools list to see what happens? For example:

"C:\Program Files\Thinkbox\Deadline6\bin\deadlinecommand.exe" -pools

It should either:
i) print out the list of pools
ii) fail with a connection error
iii) crash

If you can let us know what happens, that would be a good start.
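
If you need to repeat this check on several machines, the three outcomes above can be told apart from a script. The following Python sketch makes some assumptions: that a successful run exits with code 0 and prints something, and that a crash shows up as a Windows NTSTATUS error code (top two bits set, such as 0xC00000FD). Neither is documented deadlinecommand behaviour, and `classify` and `run_pools` are illustrative names.

```python
import subprocess

DEADLINE_COMMAND = r"C:\Program Files\Thinkbox\Deadline6\bin\deadlinecommand.exe"


def classify(returncode, output):
    """Classify one run as 'ok', 'error', or a crash with its exit code.

    Heuristic only: exit codes >= 0xC0000000 are NTSTATUS error codes,
    which is how Windows reports most hard crashes.
    """
    code = returncode & 0xFFFFFFFF  # normalise signed/unsigned exit codes
    if code >= 0xC0000000:
        return "crash (exit code 0x%08X)" % code
    if code == 0 and output.strip():
        return "ok"
    return "error"


def run_pools(exe=DEADLINE_COMMAND, timeout=60):
    """Run 'deadlinecommand -pools' and return (classification, stdout).

    Raises subprocess.TimeoutExpired if the process does not exit in time,
    for example while a crash dialog is waiting to be dismissed.
    """
    proc = subprocess.run(
        [exe, "-pools"], capture_output=True, text=True, timeout=timeout
    )
    return classify(proc.returncode, proc.stdout + proc.stderr), proc.stdout


# Usage on an affected machine:
#   outcome, pools = run_pools()
#   print(outcome)
```

The timeout matters because a crashed process can sit behind the "has stopped working" dialog until someone closes it.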

Thanks!

  • Ryan

Hi Ryan,

I tested the command on one of the machines and it prints the pools fine. Sorry I wasn’t 100% clear in the previous message: we’re having issues with Deadline Slaves crashing and throwing up an error message, something like “Deadline Slave has stopped working”. The machine is then unresponsive to remote Deadline commands until I manually connect to it and close the error window, which allows the Slave to crash properly. Then I can start it up again and it works fine.

Cheers

Nick

Can you post the slave log from the session when the slave crashed? If the launcher is running on the slave machine, you can right-click on it and select Explore Log Folder.

We have recently become aware of some Python-related issues that could cause the slave to crash, and we’re currently working to address these in the 6.1 beta. If we can take a look at your slave log, we can determine if it’s a similar problem.