I’m having some trouble getting Modo 701 working properly with Linux Deadline 7.0.2.3.
After submission the scene loads and rendering begins, but at some stage the Deadline slave process silently dies. This normally happens during the first frame; occasionally a slave will make it to the second, but never to the third or beyond. After the slave process has died, the launcher process and modo_cl continue to run and rendering continues, but obviously the slave becomes unreachable. When I check the slave log, the final entry is a standard progress STDOUT line.
The only errors I can see in the slave logs generally are Pulse related, which I assume shouldn’t cause any problems, though interestingly these errors occur on both the Windows and Linux nodes even when Pulse is running and working fine.
2015-01-17 20:41:24: Connection attempt failed. Making connection attempt 2...
2015-01-17 20:41:24: Connection attempt failed. Making connection attempt 3...
2015-01-17 20:41:24: Connection attempt failed. Making connection attempt 4...
2015-01-17 20:41:24: Info Thread - Could not check if Pulse is running because: Network is unreachable
Other things possibly worth mentioning: the scene is very heavy, with lots of geometry and textures; when the slave process dies there’s 10 GB+ of free RAM; there’s currently no swap file; hostnames don’t resolve as we’re running in the cloud on AWS, so ‘use slave IP for remote control’ is enabled; IPv6 is not enabled.
Just tried Deadline 6 and thankfully it works! None of the ‘Connection attempt failed’ errors either. Could the root of this error, or the interruptions it causes, be the issue with DL7?
I’m more than happy to keep testing with DL7 to find the cause of the problem. My elation at finding it worked on DL6 was just down to realising I could at least render the scene on Linux. I have both 6 and 7 installed on the nodes, so it’s easy enough to test.
I tried the manual slave/screen approach you suggested and saw the error below appear a couple of times: once before the job was picked up, and a second time during rendering. For some reason screen would sometimes just quit with the message [screen is terminating], so it took me a few goes to catch this error.
0: STDOUT: ! (renderProgress) Rendering | sceneProgress:16.81% | frame:1, 84.07% | renderPass:1/1, 84.07% | framePass:1/1 | eye:mono | @:Wed Jan 21 16:43:06 2015
0: STDOUT: ! (renderProgress) Rendering | sceneProgress:16.81% | frame:1, 84.07% | renderPass:1/1, 84.07% | framePass:1/1 | eye:mono | @:Wed Jan 21 16:43:06 2015
0: STDOUT: ! (renderProgress) Rendering | sceneProgress:16.81% | frame:1, 84.07% | renderPass:1/1, 84.07% | framePass:1/1 | eye:mono | @:Wed Jan 21 16:43:06 2015
An unhandled exception occurred: Object reference not set to an instance of an object (System.NullReferenceException)
at System.Collections.Generic.List`1[T].Clear () [0x00000] in <filename unknown>:0
at System.Threading.Timer+Scheduler.SchedulerThread () [0x00000] in <filename unknown>:0
at System.Threading.Thread.StartInternal () [0x00000] in <filename unknown>:0
[ERROR] FATAL UNHANDLED EXCEPTION: System.NullReferenceException: Object reference not set to an instance of an object
at System.Collections.Generic.List`1[T].Clear () [0x00000] in <filename unknown>:0
at System.Threading.Timer+Scheduler.SchedulerThread () [0x00000] in <filename unknown>:0
at System.Threading.Thread.StartInternal () [0x00000] in <filename unknown>:0
[screen is terminating]
Any clues in this error? Does it suggest an issue with Mono? I’ve attached the full log as well.
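For anyone trying to catch the same crash without watching screen, here is a rough Python sketch that scans captured slave output for the fatal Mono exception shown above and collects the stack frames that follow it. It assumes the output has been saved to a text file and that the marker text and the "at ..." frame lines look like the ones in this post; the function name `find_fatal_exceptions` is made up for illustration and is not part of Deadline.

```python
# Hypothetical helper, not part of Deadline: scan captured slave output
# for the fatal Mono exception marker seen in the log above.
FATAL_MARKER = "FATAL UNHANDLED EXCEPTION"


def find_fatal_exceptions(lines):
    """Return a list of (line_number, message, stack_frames) tuples."""
    results = []
    current = None
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if FATAL_MARKER in line:
            # Start a new record at the marker line.
            current = (number, line, [])
            results.append(current)
        elif current is not None and line.startswith("at "):
            # Stack frames directly after the marker belong to it.
            current[2].append(line)
        else:
            # Any other line ends the current stack trace.
            current = None
    return results
```

Passing an open file object works too, since the function only iterates over lines, so something like `find_fatal_exceptions(open(path, errors="replace"))` with your own log path should be enough to see whether and where the slave hit the exception.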
Yeah, this is definitely internal Mono stuff. That’s not good.
Can you give me an idea of the machine this is running on? We bundle our own Mono now, but I need to know the flavour and version of your distribution so we can install something here.
[ 0.000000] Linux version 3.14.27-25.47.amzn1.x86_64 (mockbuild@gobi-build-60001) (gcc version 4.8.2 20140120 (Red Hat 4.8.2-16) (GCC) ) #1 SMP Wed Dec 17 18:36:15 UTC 2014
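If it helps others report the same details, a small Python sketch like the one below gathers the kernel version plus whatever distribution files exist on the node. The list of candidate files is an assumption (not every distribution ships all of them), and `describe_host` is just an illustrative name.

```python
import platform
from pathlib import Path


def describe_host(candidates=("/etc/os-release", "/etc/system-release", "/etc/issue")):
    """Collect kernel details and the contents of any distribution files found."""
    info = {"kernel": platform.release(), "machine": platform.machine()}
    for path in candidates:
        file = Path(path)
        if not file.is_file():
            continue  # This file does not exist on this distribution.
        try:
            info[path] = file.read_text(errors="replace").strip()
        except OSError:
            continue  # Unreadable file: skip it rather than fail.
    return info
```

Printing the returned dictionary on a render node should give the flavour and version information asked for above in one go.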
Well, at least that should be easy for us to replicate.
I’m a bit rusty with EC2, but I remember that there used to be different versions from Canonical, RedHat and the CentOS project. Given the contents of /etc/issue, does Amazon provide their own now?
I’ll see if I can have someone set up the right instance for me here.
I’ve now shared the AMI. Hopefully you’ll be able to get the same error, but if not let me know and I can try and get a huge scene over to you to test with in case that’s making a difference.
So, I’ve just realized something and I’m sorry I didn’t bring this up sooner.
The issue is that we don’t officially support Amazon’s Linux distribution. So, once we reproduce the problem here, the answer is likely going to be “try one of our supported flavours”.
So, an additional request from me. Do you happen to have a CentOS, Fedora or Ubuntu instance on EC2 you could run the same render on?
I don’t have a supported build at the moment. When I get a chance I can build a new image using a supported flavour, hopefully in the next week or two. I’m planning on trying Ubuntu Server 14.04 LTS, though let me know if you have a preference for something else…
In the meantime, given that Deadline 6 works and 7 doesn’t, do you have a hunch what it might be, or is there a way to force DL7 to use the version of Mono that DL6 is using? I don’t know anything about Mono so that could be a silly question.