Tiles failed to save

We had an issue last night with one of our renders not saving out a file for every tile rendered.

It was a job rendered at 20x20 tiles (400 tiles total), but we only had 85 files in the folder for the RGB and each render element.

We’ve had issues before with filename length, and we have a Sanity Check to make sure that no names will be too long. This job passed the Sanity Check, and in fact a version kicked off the day before, with just one character different in the name, rendered fine, so it’s definitely not a filename length issue.

The tiles that did save were all on the right-hand side of the image:

Row 1: 15, 16, 17, 18, 19, 20
Row 2: 16, 17, 18, 19, 20
Row 3: 16, 17, 18, 19, 20
Row 4: 17, 18, 19, 20
etc…
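
In case it helps anyone checking for the same thing, this is roughly how the missing tiles can be tallied up. It’s just a quick Python sketch; the folder path and the filename pattern below are placeholders, not our actual output names:

```python
import os

# Quick sketch for listing which tiles are missing. The folder and the
# "<element>_tile_<row>_<col>.exr" naming pattern are placeholders, not
# our actual output names -- adjust both to match your job.
folder = r"X:\renders\999_100_000"
pattern = "beauty_tile_%d_%d.exr"
tiles = 20  # the job was rendered at 20x20 tiles

missing = [
    (row, col)
    for row in range(1, tiles + 1)
    for col in range(1, tiles + 1)
    if not os.path.isfile(os.path.join(folder, pattern % (row, col)))
]

print("%d of %d tiles missing" % (len(missing), tiles * tiles))
for row, col in missing:
    print("  row %2d, col %2d" % (row, col))
```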

I’ve attached the logs of the job, but I’m not sure they’re going to reveal much…

I’m thoroughly confused…
999_100_000_563eec34.rar (133 KB)

Hey Dave,

It looks like you sent us the job folder instead of the job’s log folder. Can you send us this folder instead?

\\your\repository\reports\jobs\999_100_000_563eec34

Thanks!

  • Ryan

Sorry, my mistake. Here you go.
999_100_000_563eec34.rar (940 KB)

Thanks for the logs. Yeah, I’m not seeing anything that explains why they would be missing.

Have you tried re-rendering the job to see if the images start showing up properly? Is it at all possible that some of the rendered images were accidentally deleted? Or maybe there were one or two slaves whose X: mapping was screwed up, so their specific tiles got lost?

Just throwing out ideas, since I really have no idea what could have caused this. :)

We kicked it off again, but changed the path to a different folder and it was fine.

All blades are identical, but we had an issue on another simple job last night. Unfortunately, the OCD guy deleted the failed job from Deadline, so I can’t see what the problems were there. Otherwise, we had no issues with anything else last night.

The only thing I can think of is that these machines were running very high on RAM usage. They only have 8 GB of RAM, and with displacement they were peaking at 7.98 GB, which could have been eating into their page file to the point where the machine had no free disk space left to save the render locally. I checked one of the machines and it only had 10 GB of space left, so I’ve asked our IT guy to clear out the temporary files, which normally frees up a bit of room.

Is there any check for whether the files written to the local machine actually made it to disk? Or could running out of RAM stop the transfer back to the network from succeeding while still reporting success?
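
For what it’s worth, this is the kind of free-space check I mean. It’s a rough Python sketch; the drive letters and the 20 GB threshold are made-up examples, not values from our setup:

```python
import shutil

# Minimal sketch of a free-space check for the blades. The drive letters
# and the 20 GB threshold are arbitrary examples, not values from our
# actual setup.
MIN_FREE_BYTES = 20 * 1024 ** 3

for drive in ("C:\\", "X:\\"):
    usage = shutil.disk_usage(drive)
    free_gb = usage.free / float(1024 ** 3)
    status = "OK" if usage.free >= MIN_FREE_BYTES else "LOW"
    print("%s  %6.1f GB free  [%s]" % (drive, free_gb, status))
```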

We use a Max SDK function to save the file to disk, so if that reports success, we can only assume it was successful. I guess we could do a check immediately after to see if the file is there, but we’d rather not add the check unless we know there is something wrong with this save function.
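
Just to illustrate, the check would be nothing more than an existence and size test straight after the save. This is only a Python sketch of the idea; save_func here stands in for the real save call (in our case the Max SDK function, which isn’t callable like this):

```python
import os

def save_and_verify(save_func, path):
    """Run a save function, then confirm the file actually landed on disk.

    Sketch only: save_func is a stand-in for the real save call (in our
    case a Max SDK function that isn't callable from Python like this).
    """
    save_func(path)
    if not os.path.isfile(path):
        raise IOError("Save reported success, but %s does not exist" % path)
    if os.path.getsize(path) == 0:
        raise IOError("Save reported success, but %s is zero bytes" % path)
```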

Enabling local rendering might help in this case, since we write the file locally and then copy it to the network location afterwards; the only way files would still go missing is if the Copy function itself failed silently.
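
Sketched out, the local-rendering path looks something like the below: trust the local write, then verify the network copy by comparing sizes. The paths, retry count, and function name are placeholders, not the actual Deadline code:

```python
import os
import shutil

def copy_to_network(local_path, network_path, retries=3):
    """Copy a locally rendered file to the network and verify it arrived.

    Sketch only: the size-comparison verification and the retry count are
    illustrative choices, not the actual Deadline implementation.
    """
    expected = os.path.getsize(local_path)
    last_error = None
    for _ in range(retries):
        try:
            shutil.copy2(local_path, network_path)
        except (IOError, OSError) as e:
            last_error = e
            continue
        if os.path.isfile(network_path) and os.path.getsize(network_path) == expected:
            return
    raise IOError("Could not verify %s after %d attempts (%s)"
                  % (network_path, retries, last_error))
```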

I’m not really sure if this would have an impact or not.

It would be interesting to determine if this was a one-off issue, or something that will become a growing problem for you. If it starts to happen more and more, we can definitely dig into it further.

Cheers,

  • Ryan

Hopefully it is a one-off. I’ll kick the exact same job off next week when we’re quiet, to the same place, and see if I can re-create the issue. We’ve had random problems before that haven’t repeated; it was just bad luck that this one hit an urgent job. Thankfully we had baked the light cache off, so the render got done during the day.