Tiles not saved, without errors. RAM consumption?

Hi there!

We are having a strange issue after upgrading from DL 8.12 to DL 10.0.7.0 (3ds Max 2015/18 + V-Ray 3.6).
Some tiles are not saved, but they show as Completed in the Monitor. The Draft job then fails because tiles are missing. Some more details:

  • It is more likely to happen on machines with less RAM, but not exclusively.
  • When a machine reaches 99% RAM consumption, it often reports a false “Completed”, though not exclusively.
  • In DL 8, if a machine could not render a heavy tile, the task was marked as an error and eventually another machine with more RAM finished the job. That is OK for us; the problem is the missing error message and the failed Draft.
  • RAM consumption seems higher in general on all machines, and rendering takes longer to start.
  • On the offending machines, several V-Ray frame buffers are sometimes open. The previous job’s VFB seems to be frozen, usually at “Writing output”.
  • With “Enable Local Rendering” ON, I see multiple errors “Unable to locate local render file: C:\Users…\AppData\Local\Thinkbox\Deadline10\slave\david\jobsData\5bdb…”.
    With “Enable Local Rendering” OFF, these errors no longer appear. In both cases, tiles seem equally likely to report a false “Completed”.
  • In the server logs, it looks like the previous task can’t be closed properly (WARNING: Timed out waiting for the renderer to close.), so maybe its RAM is never released?
  • Restart renderer between tasks is ON.
  • All servers have CPU affinity -1, to rule out a lack of CPU power.
  • I use an SMTD MAXScript to submit the jobs.

I’m attaching logs of the offending tasks as well as other config files.

Data for DL Forum.zip (2.0 MB)

I’ve been running tests and searching the forum, but I am running out of ideas.
Please let me know if there is any other info that can help you.

Thank you very much!!

Any reason not to update to Deadline 10.0.21.5?
https://docs.thinkboxsoftware.com/products/deadline/10.0/1_User%20Manual/manual/release-notes.html

10.0.8.1 included the following:

V-Ray Improvements

  • In the Monitor submitter, an informative label is now displayed to explain why tile Rendering is disabled.
  • Fixed a bug in the Monitor submitter that prevented Tile Rendering jobs from being submitted if there were two outputs with the same name in the vrscene file.
  • Fixed other Tile Rendering related bugs in the Monitor submitter.

I’m definitely in favour of users upgrading to SP21, as a lot has improved since SP8 shipped. The fixes outlined by @anthonygelatka are mostly in the Monitor submitter, though.

Looking at the Slave logs here, I do think this was a bug in the Sandbox, specifically because of this in the Slave log:

2018-11-02 16:32:06:  WARNING: Encountered the following error while initializing the Plugin Sandbox: 'Timed out while waiting for Sandbox process to respond.'.
2018-11-02 16:32:06:  Falling back to embedded Plugin -- this may result in stale Python environments being used.
2018-11-02 16:33:02:  Synchronization time for job files: 42.559 s
2018-11-02 16:33:14:  Synchronizing Plugin 3dsmax from X:/DeadlineRepository10\plugins\3dsmax took: 10 seconds
2018-11-02 16:33:16:  0: INFO: Executing plugin script 'C:\Users\some_user\AppData\Local\Thinkbox\Deadline10\slave\some_machine\plugins\5bdc3cd72684361fb0539707\3dsmax.py'

I know we fixed some logging/Sandbox issues since then, but I can’t find the exact release. Try upgrading to 10.0.21 and see if the Sandbox changes help here.
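To check which render nodes hit the Sandbox fallback (before and after upgrading), a small script can scan Slave logs for the warnings quoted in this thread. This is a minimal sketch, assuming plain-text logs with the `YYYY-MM-DD HH:MM:SS:  message` layout shown in the excerpt above; the warning fragments are copied from log lines in this thread, and `scan_log` is a hypothetical helper, not part of Deadline’s API.

```python
import re
import sys
from collections import Counter

# Warning fragments copied from the Slave log excerpts quoted in this thread.
# Assumption: these substrings are stable enough to match; extend as needed.
PATTERNS = {
    "sandbox_timeout": "Timed out while waiting for Sandbox process to respond",
    "embedded_fallback": "Falling back to embedded Plugin",
    "renderer_close_timeout": "Timed out waiting for the renderer to close",
    "missing_local_file": "Unable to locate local render file",
}

# Assumed line layout: "2018-11-02 16:32:06:  <message>"
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}):\s+(.*)$")


def scan_log(lines):
    """Return (Counter of pattern keys, list of (timestamp, key) hits in order)."""
    counts = Counter()
    hits = []
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if not match:
            continue  # skip lines without the expected timestamp prefix
        timestamp, message = match.groups()
        for key, fragment in PATTERNS.items():
            if fragment in message:
                counts[key] += 1
                hits.append((timestamp, key))
    return counts, hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python scan_slave_log.py path/to/slave_log.txt
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        counts, hits = scan_log(handle)
    for timestamp, key in hits:
        print(timestamp, key)
    print(dict(counts))
```

Matching on substrings rather than full lines keeps the sketch tolerant of prefixes such as `0: WARNING:` that appear in task-level log lines.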

Download here, docs here

I’ll do it and let you know the results. Thank you guys!!

Sorry for the delay, we’ve been quite busy, but today I’ll do the update and run proper tests. I’ll keep you updated!

Ok, updated to 10.0.21.5 and tested.

It looks like the problem has changed. In my first tries, the very RAM-hungry tiles rendered properly on low-RAM computers (very slowly, but they finished), but they are not assembled by Draft, even when all tiles are present.
In one of the jobs (see the attached JPG), only one render element was successfully assembled by Draft, after a couple of failed attempts by the same machine.

I’ve attached the job logs and the Slave logs of the machines that fail to draft.
Please let me know if there is any other info I can provide you with.

Deadline missing tiles II.zip (545.9 KB)

Thank you very much.

Hmm. Well, the failures are pretty clear-cut here:

2018-11-10 01:38:46:  0: STDOUT: Unable to read file: \\FILESERVER\Projekte laufend\18_345_HPP The cradle\2_Progress\Images\02 Interior B\Renderings\004\004_Environment_tile_5x6_6x6_.0000.cxr for tile: Tile34

and it does look like the file failed to save during rendering, since there are these errors from Deadline’s local rendering copy process:

2018-11-10 00:50:42:  0: INFO: Searching local output folder for: C:\Users\Mitarbeiter\AppData\Local\Thinkbox\Deadline10\slave\rendermachine02\jobsData\5be5db832ffab856d8e8149a\0_tempbI4t40\004_Environment_tile_5x6_6x6_.0000.cxr
2018-11-10 00:50:42:  0: WARNING: Unable to locate local render file: C:\Users\Mitarbeiter\AppData\Local\Thinkbox\Deadline10\slave\rendermachine02\jobsData\5be5db832ffab856d8e8149a\0_tempbI4t40\004_Environment_tile_5x6_6x6_.0000.cxr

Now, I wonder whether those frames will actually render if you disable the “local rendering” option. If not, I’d try rendering the same scene inside Max on “rendermachine02”.
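Since the Draft assembly failed on a single unreadable tile, it can help to list which expected tile files are absent from the output folder before re-running the assembly. This is a minimal sketch, assuming the naming seen in the log above, `<prefix>_tile_<x>x<y>_<cols>x<rows>_.<frame>.<ext>` with 1-based x/y and a row-major, zero-based tile index (which is consistent with `tile_5x6_6x6` being reported as Tile34). That convention is inferred from one log line, not documented here, and `tile_name`, `tile_index` and `missing_tiles` are hypothetical helpers.

```python
import os


def tile_name(prefix, x, y, cols, rows, frame, ext="cxr"):
    """Build a tile file name in the assumed '<prefix>_tile_XxY_CxR_.FFFF.ext' layout."""
    return f"{prefix}_tile_{x}x{y}_{cols}x{rows}_.{frame:04d}.{ext}"


def tile_index(x, y, cols):
    """Assumed row-major, zero-based index from 1-based x/y (5x6 in a 6x6 grid -> 34)."""
    return (y - 1) * cols + (x - 1)


def missing_tiles(folder, prefix, cols, rows, frame, ext="cxr"):
    """Return (tile_index, file_name) for every expected tile not found in folder."""
    missing = []
    for y in range(1, rows + 1):
        for x in range(1, cols + 1):
            name = tile_name(prefix, x, y, cols, rows, frame, ext)
            if not os.path.isfile(os.path.join(folder, name)):
                missing.append((tile_index(x, y, cols), name))
    return missing
```

For the job above, the call would look like `missing_tiles(r"\\FILESERVER\Projekte laufend\18_345_HPP The cradle\2_Progress\Images\02 Interior B\Renderings\004", "004_Environment", 6, 6, 0)`. An empty list means every tile file exists, so a remaining Draft failure would point at unreadable (for example, partially written) files rather than missing ones.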

I thought that too. In the first post I already stated:

By this I meant that the tiles are saved in most cases, despite these error messages. Sorry if that was not very clear. Switching “Enable local render” off makes the error message disappear, but doesn’t prevent tiles from showing a false “Completed”. As I said, this happens now and then, more often when the render node is close to 100% RAM usage.

In the last error I posted above, the tiles were rendered and saved in the destination folder, but Draft could not assemble them.
All render nodes work normally in 95% of cases; problems are more likely when a render consumes close to 100% of RAM, but not exclusively.

I’ll try to force more errors, so I can provide more diverse data.

Thank you very much

Update: so far, so good. I couldn’t run stress tests yet, but in regular production work I haven’t encountered the error again, which is a great sign.

This weekend I’ll leave more extensive tests running, and on Monday I’ll be able to give more reliable feedback.

Thank you and have a great weekend.

That sounds promising! We’ll stand by!

Good news 🙂 After running tests with high RAM consumption, I can no longer reproduce the error.
It looks like updating to 10.0.21.5 solved the issue.

Thanks a lot guys!
