AWS Thinkbox Discussion Forums

Erratic render errors Reading/Writing EXRs (Nuke 11.3 - Deadline 10.0 - W10)

Hiya,

I am unsure if this is Deadline or Nuke related but I can only really see it happening on Deadline.

We have seen very weird erratic behavior when rendering on our farm.

Renders fail at random with messages like the following:

[Invalid argument]

or

Write1: EXR: Write failed [Failed to write pixel data to image file "G:/Projects/project/shots/pr_105/pr_sti018_0010/compositing/out/pr_sti018_0010_comp.v002/3424x2202/pr_sti018_0010_comp.v002.1198.exr.4216.tmp". Invalid argument.]

Sometimes the same error occurs on a Read node instead (reading EXR -> Invalid argument).

I am pretty sure these error codes come from the EXR libraries but I’m just wondering why…
What causes such an error code?

One interesting Deadline detail I do want to mention (though at this stage I am unsure whether it is real or an illusion): these errors seem to appear only on workers that have a secondary Worker process launched and rendering.
(i.e. Worker -> Launch New Worker Instance)
I read somewhere that some of these errors can happen if the machine is running out of memory. Could there be a bug where a new Worker instance does not get access to certain memory?

Hey Ricardo,

It could be a failure to access G:. Did you confirm that it was mounted after receiving the failure? If you re-queue the task, is the EXR output successful?
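A quick way to confirm the mount from the affected machine is a small check like the one below. This is only a sketch: `check_output_dir` is a hypothetical helper name, not part of Deadline or Nuke, and it only tells you whether the folder is reachable and writable at the moment it runs.

```python
import os
import tempfile


def check_output_dir(path):
    """Return (ok, message) for whether `path` exists and accepts a small test write."""
    if not os.path.isdir(path):
        return False, "not a directory or not mounted: " + path
    try:
        # Write and auto-delete a tiny temp file to confirm the share accepts writes.
        with tempfile.NamedTemporaryFile(dir=path, prefix="mount_check_", suffix=".tmp") as handle:
            handle.write(b"ok")
            handle.flush()
    except OSError as exc:
        return False, "write test failed: %s" % exc
    return True, "ok"
```

Running it against the render output folder right after a failed task (for example the `G:/Projects/...` folder from the error message) would show whether the drive was reachable at that point.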

Failure to write output could be related to file server load. If that is the case you can try using resource limits to reduce load on your file server.

At the bottom of the error report, under details, you will find the memory usage of the system. Is it close to maxing out?

You can also try reproducing the render outside of Deadline, if you can manage to re-create the failure consistently.
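As a rough sketch of what that isolation could look like, the script below renders one frame repeatedly with command-line Nuke (`-x` executes the script, `-F` sets the frame range) and collects any failures. The executable path, script name, and both function names are placeholders made up for illustration, so adjust them to your install.

```python
import subprocess


def build_nuke_command(nuke_exe, script, frame):
    """Build the argument list for rendering a single frame with command-line Nuke."""
    return [nuke_exe, "-x", "-F", str(frame), script]


def render_repeatedly(nuke_exe, script, frame, attempts=20):
    """Render the same frame several times and return (attempt, output tail) for each failure."""
    failures = []
    for attempt in range(1, attempts + 1):
        result = subprocess.run(
            build_nuke_command(nuke_exe, script, frame),
            capture_output=True,
            text=True,
        )
        if result.returncode != 0:
            # Keep only the tail of the output, where the Read/Write error usually shows up.
            failures.append((attempt, (result.stdout + result.stderr)[-2000:]))
    return failures


# Example call (hypothetical paths, assumed for illustration only):
# failures = render_repeatedly(r"C:\Program Files\Nuke11.3v1\Nuke11.3.exe", "comp.nk", 1198)
```

If the same "Invalid argument" failures turn up here, Deadline is probably not the cause.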

Regards,

Charles

Our nodes all have their drives mapped via GPO.
As mentioned, the errors are erratic and cannot be pinned down to a single machine.
The same machine may retry and succeed.
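Since a retry on the same machine can succeed, one way to check whether the storage itself returns the error intermittently is to read a file from the share over and over outside of Nuke and log any OS-level error codes. This is a minimal sketch; `probe_read` is a hypothetical helper, and it assumes the "Invalid argument" text corresponds to the OS error `EINVAL`, which I have not confirmed for the EXR library.

```python
import errno
import time


def probe_read(path, attempts=5, delay=0.5, chunk_size=1024 * 1024):
    """Read `path` fully `attempts` times; return (attempt, errno, name) for each failed read."""
    errors = []
    for attempt in range(1, attempts + 1):
        try:
            with open(path, "rb") as handle:
                # Read in chunks until the end of the file.
                while handle.read(chunk_size):
                    pass
        except OSError as exc:
            name = errno.errorcode.get(exc.errno, "unknown")
            errors.append((attempt, exc.errno, name))
        if attempt < attempts:
            time.sleep(delay)
    return errors
```

An empty list means every read succeeded. If entries with `EINVAL` show up now and then, that would point at the file server or network layer rather than Nuke or Deadline.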

We run a Pixtor system which does not seem to be overloaded during these times.

Memory was usually at 50% or so.

I’ll try turning off batch mode to see if this has any effect or produces more info.

Hello @RicardoMusch

I am currently investigating the same error. Have you reached a conclusion about what might be causing this?

Thanks!

Hey everyone,

So this is something we’ve been having trouble with for absolutely ages.
It’s happened from Nuke 11.3 through to 14.1, all on Deadline 10.1.

A big reduction in these errors happened when we disabled the ‘Cache Acceleration’ feature on our QNAP storage system.

So it could be something related to that, but I’m not convinced, as it’s still happening.

Like yourselves, it’s been such a sporadic error, we’ve never been able to pin down exactly when it happens.

All our drives are mounted via GPO, and we’ve even configured the ‘Mapped Drives’ settings so that Deadline maps the drives itself in case they aren’t mounted for whatever reason.

We’ve only ever seen this error on reading files, never writing.

Error: FailRenderException : [12:13.36] ERROR: Read3: Error reading pixel data from image file "//baitqn/Media/Projects/project/sequences/FW_301_010/FW_VFX_301_010_040/keying/work/nuke/renders/FW_VFX_301_010_040_denoisePL01_v001/FW_VFX_301_010_040_denoisePL01_v001.1113.exr". Invalid argument.

I know this is a pretty old thread, but thought it might help having this additional info.

Cheers!
