Erratic render errors reading/writing EXRs (Nuke 11.3 - Deadline 10.0 - W10)

Hiya,

I am unsure if this is Deadline or Nuke related but I can only really see it happening on Deadline.

We have seen very weird erratic behavior when rendering on our farm.

Renders randomly fail with messages like the following:

[Invalid argument]

or

Write1: EXR: Write failed [Failed to write pixel data to image file "G:/Projects/project/shots/pr_105/pr_sti018_0010/compositing/out/pr_sti018_0010_comp.v002/3424x2202/pr_sti018_0010_comp.v002.1198.exr.4216.tmp". Invalid argument.]

Sometimes we get the same error for a Read node instead (reading an EXR -> Invalid argument).

I am pretty sure these error codes come from the EXR libraries, but I’m wondering why.
What causes such an error code?

One interesting Deadline detail I do want to mention (though at this stage I am unsure whether it is real or an illusion): these errors seem to appear only on workers that have a secondary worker instance launched and rendering.
(i.e. Worker -> Launch New Worker Instance)
I read somewhere that some of these errors can occur when the machine is running out of memory. Could there be a bug where a new worker instance does not get access to certain memory?

Hey Ricardo,

It could be a failure to access G:. Did you confirm that it was mounted after receiving the failure? If you re-queue the task, is the EXR output successful?
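A quick way to test the first point on a worker right after a failure is to probe the output folder directly. The following is only a minimal sketch in Python: the function name check_output_dir is made up for illustration, and it only shows whether the folder is reachable and accepts a small write at that moment, not why the EXR write failed.

```python
import os
import tempfile


def check_output_dir(path):
    """Return (ok, message): is `path` reachable and does it accept a small write?"""
    if not os.path.isdir(path):
        # A missing drive mapping or an unreachable share ends up here.
        return False, "directory not reachable: %s" % path
    try:
        # Create a small probe file, force it to disk, and let it be deleted on close.
        with tempfile.NamedTemporaryFile(dir=path, prefix="probe_", suffix=".tmp") as probe:
            probe.write(b"x" * 1024)
            probe.flush()
            os.fsync(probe.fileno())
    except OSError as exc:
        return False, "write failed: %s" % exc
    return True, "ok"
```

Calling check_output_dir() with the folder from the error message (the one containing the .exr.4216.tmp file) on the failing machine would tell you whether the share itself was the problem at that point in time.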

Failure to write output could be related to file server load. If that is the case, you can try using resource limits to reduce the load on your file server.

At the bottom of the error report, under details, you will find the memory usage of the system. Is it close to maxing out?

You can also try running the render outside of Deadline, if you can manage to reproduce the failure consistently.
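Since the failure is intermittent, one way to do this is to run the same render command many times from a plain command line and record which attempts fail. The following is a rough sketch in Python, not a Deadline feature: run_repeatedly is a made-up helper, and it assumes the render command exits with a non-zero code when the write fails.

```python
import subprocess


def run_repeatedly(cmd, attempts):
    """Run `cmd` `attempts` times; return (attempt, returncode, stderr tail) per failure."""
    failures = []
    for attempt in range(1, attempts + 1):
        result = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
        )
        # Assumption: a non-zero exit code means the render failed.
        if result.returncode != 0:
            failures.append((attempt, result.returncode, result.stderr[-500:]))
    return failures
```

For Nuke, the command would be something along the lines of [r"C:\Program Files\Nuke11.3v1\Nuke11.3.exe", "-x", "-F", "1198", r"G:\path\to\comp.nk"], where -x executes the script and -F sets the frame range. Both paths here are placeholders and need to be replaced with your own install location and script.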

Regards,

Charles

Our nodes all have their drives mapped via GPO.
As mentioned, the errors are erratic and cannot be pinned down to just one machine.
The same machine may retry and succeed.

We run a Pixtor system which does not seem to be overloaded during these times.

Memory usage was usually around 50%.

I’ll try turning off batch mode to see if this has any effect or produces more info.
