AWS Thinkbox Discussion Forums

DEADLINE takes a long time to start the job, stuck at a certain phase for about 10 min

do you get the same lag submitting a command-line render without Deadline?

Are you able to switch the debug log on to see if Redshift outputs any more info?

Are you able to monitor the GPU to see if it’s loading into vram etc?

I prefer exporting to standalone in whatever application or renderer I’m using, though I know it’s not always quicker, owing to the export process.

I saw GridMarkets had some nice tips on submitting Houdini/Redshift jobs; not sure if it’s of any use, as I’ve not tested it out.

hmm, interesting tips again!!

will look into those. for the time being, here’s what I know:

GPU VRAM was not being loaded (nor were the CUDA cores being used).
will need to look at the Redshift logs as well,

will post an update here when i find out! thank you for your suggestions Anthony!

Having this exact same issue.

Haven’t tried adjusting concurrent tasks, but given I’m only running two GPUs per machine (only two machines), and don’t have anywhere near the same bottleneck when rendering locally in Houdini, I’m very curious to see what’s causing the hang-up here…

network?

Disk

We recommend using fast SSD drives. Redshift automatically converts textures (JPG, EXR, PNG, TIFF, etc) to its own texture format which is faster to load and use during rendering. Those converted textures are stored in a local drive folder. We recommend using an SSD for that texture cache folder so that, during rendering, the converted texture files can be opened fast. Redshift can optionally not do any of this caching and simply open textures from their original location (even if that is a network folder), but we don’t recommend this. For more information on the texture cache folder, please read the online documentation.

To recap:

* Prefer SSDs to mechanical hard disks

Network and NAS

Redshift can render several times faster than CPU renderers. This means that the burden on your network can be higher too, just like it would be if you were adding lots more render nodes! As mentioned above, Redshift caches textures to the local disk so it won’t try to load textures through the network over and over again (it will only do it if the texture changes). However, other files (like Redshift proxies) are not locally cached so they will be accessed over the network repeatedly. Fast networks and networked-attached-storage (NAS) typically work fine in this scenario.

However, there have been a few cases where users reported extremely low performance with certain NAS solutions. Since there are many NAS products available in the market, we strongly recommend thoroughly testing your chosen NAS with large Redshift proxies over the network. For example, try exporting a large Redshift proxy containing 30 million triangles or so (a tessellated sphere would do), save it in a network folder and then try using it in a scene both through a network path and also through a local file - and measure the rendering performance difference between the two.

To recap:

* Rendering with Redshift is like rendering with lots of machines. It might put a strain on your network.
* Thoroughly test your network storage solution! Some of them have performance issues!
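As a rough way to run the kind of comparison suggested above, here is a minimal Python sketch (a hypothetical helper, not part of Redshift or Deadline) that measures sequential read throughput for any file path. The demo uses a throwaway temp file; in practice you would point it at a local copy and a NAS copy of a large .rs proxy and compare the two numbers.

```python
import os
import tempfile
import time

def read_throughput_mb_s(path, chunk=8 * 1024 * 1024):
    """Read the whole file sequentially and return MB/s."""
    size = os.path.getsize(path)
    start = time.perf_counter()
    with open(path, "rb") as f:
        while f.read(chunk):
            pass
    elapsed = time.perf_counter() - start
    return size / (1024 * 1024) / elapsed

# Demo on a throwaway 64 MB file; swap in the local vs. network
# paths of a large Redshift proxy to compare the two locations.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(os.urandom(64 * 1024 * 1024))
rate = read_throughput_mb_s(tmp.name)
print(f"{rate:.0f} MB/s")
os.unlink(tmp.name)
```

A large gap between the local and network numbers for the same file would point at the NAS, not at Redshift or Deadline.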

SSDs… SSDs everywhere…

Could be a network bottleneck? But in the bit of testing I’ve done, even pulling source scene/caches from my server and rendering locally on a workstation is still heaps faster than submitting a job to the farm.

I’ve run into the same issue here, but I may have found the problem.
I opened up the resource monitor and looked at the network speed.
The scene is loaded from a NAS.
When live Houdini loads the scene, it pulls at full speed from the NAS and loads in seconds; it’s a bit over a 500 MB scene.
But when I monitor Houdini from Deadline and the worker log, when it starts loading the scene it pulls at only half a megabyte per second. It then takes 18 minutes just to load the scene.
So loading every time is not that big of an issue (an inconvenience, but not a big issue), but the speed of loading is.
Also worth mentioning: this is with a single machine loading from the NAS while testing, so there is no bottlenecking or anything like it.
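For what it’s worth, the numbers in the post above are self-consistent: at half a megabyte per second, a ~500 MB scene works out to roughly the 18 minutes seen in the worker log. A quick back-of-the-envelope check:

```python
scene_mb = 500      # approximate scene size reported above
worker_mb_s = 0.5   # throughput observed in the Deadline worker log

load_minutes = scene_mb / worker_mb_s / 60
print(f"expected load time: ~{load_minutes:.0f} min")  # ~17 min
```

So the slow load is fully explained by the throttled transfer rate; the question is why the worker process only pulls at 0.5 MB/s.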

Been running into this problem a lot recently. Files on a NAS which usually take 1-2mins in a Houdini UI session are taking 10-20mins to open during a Deadline task. This applies to Redshift renders and geo caches alike.

They always get stuck at the ‘Input File’ stage:

2021-03-12 13:31:36:  0: STDOUT: Start: 149
2021-03-12 13:31:36:  0: STDOUT: End: 155
2021-03-12 13:31:36:  0: STDOUT: Increment: 1
2021-03-12 13:31:36:  0: STDOUT: Ignore Inputs: True
2021-03-12 13:31:36:  0: STDOUT: No output specified. Output will be handled by the driver
2021-03-12 13:31:36:  0: STDOUT: Driver: /obj/Build/geo1/filecache1/render
2021-03-12 13:31:36:  0: STDOUT: Input File: (REMOVED THE FILENAME...)
2021-03-12 13:41:00:  0: Task timeout is 7520 seconds (Auto Task Timeout)
2021-03-12 13:41:15:  0: STDOUT: Warnings were generated during load.

Approx 10 minutes to open on our fastest PC

Do we know whether this is a Houdini issue or a specific problem with hrender_dl.py / Deadline specific processes? How could I test further? It’s really crippling our farm performance.

grrrr… I am having the same issues here. I had to rebuild the file, and it worked well for a few submissions. Then all of a sudden it takes 5-10 mins to load and start simulating… it is an 8 MB file, and I have a 10MBE network… why is this happening, and how do we fix it, please?

ok, if the frame in your file is NOT at f1, the file will take a while loading, or it sits on the worker cooking something… send files at f1 and the job starts as usual… still not 100% sure this is the issue, but so far it has been working

I’m having the same issue with Redshift and C4D. No concurrent tasks. I checked the log, and it’s taking 5 min to load this script. You can see the jump in the log time.
2021-03-30 09:43:08: 0: STDOUT: Running Script: C:\ProgramData\Thinkbox\Deadline10\workers\DESKTOP-R5SF3N2\jobsData\606351a79d44890bd0cd7958\thread0_tempyEoRd0\c4d_Batch_Script.py
2021-03-30 09:48:34: 0: STDOUT: Redshift Debug: Context: Locked:Render

And it renders fine in C4D without Deadline, so I’m pretty sure this is a Deadline issue.

My stuff renders fine outside Deadline, in the Houdini GUI or on the cmd line with hython / hrender. But Deadline is triggering something that forces the scene to recompute a lot of things; in my case, eating all 96 GB of RAM I have, going into swap, etc.

ubuntu, houdini 18.5, rs 3.0.41, deadline 10.1.14.5

I tested Royal Render for a bit and it seems to handle this a bit better. It looks like Houdini is loading the scene every single time for each task in the job; Royal Render managed to load the scene once, keep it loaded, and just assign frames, something like what Deadline already does for Maya Batch. It is a huge issue and was producing a huge waste of time.

I have to agree on this one too, there is some kind of problem with Deadline!!! And I think it’s about time that someone checked this out; I don’t want to pay for extra license time!!!

if you want guys you can try this from Houdini Command Line Tools:

hython "C:\Program Files\Side Effects Software\Houdini 18.5.408\bin\hrender.py" scenefile.hip -d ropname -v -e -f 1 2

And see the speed of loading your scene. I have tested it on a local machine and also over the network, and there was almost no delay in loading any scene!!! so AWS, wake up
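If you want to compare that command against Deadline more systematically, here is a small hypothetical Python timing wrapper around the same hython/hrender.py invocation (the hrender.py path matches the 18.5.408 install quoted above; adjust it for your setup, and the function no-ops when hython isn’t on PATH):

```python
import shutil
import subprocess
import time

HYTHON = shutil.which("hython")  # None when Houdini is not on PATH

def time_hrender(hip_file, rop_path, frame=1):
    """Time a single-frame hrender run via hython.

    The hrender.py location below is an example from the post above;
    point it at your own Houdini install.
    """
    if HYTHON is None:
        return None  # Houdini not installed on this machine
    hrender = (r"C:\Program Files\Side Effects Software"
               r"\Houdini 18.5.408\bin\hrender.py")
    start = time.perf_counter()
    subprocess.run(
        [HYTHON, hrender, hip_file, "-d", rop_path, "-v",
         "-e", "-f", str(frame), str(frame)],
        check=True,
    )
    return time.perf_counter() - start
```

Running the same scene through this wrapper and through a Deadline task, then diffing the two wall-clock times, isolates how much overhead Deadline itself adds.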

And to confirm again:
I have a scene where, in the Deadline log, there is a point where it sits for 30-40 minutes, seemingly doing nothing.
On the other hand, rendering from the command line does not have that hold-up.
On the command line I use:
source houdini_setup
hbatch
Redshift_setGPU
render -f <start frame>
After hbatch loads the scene, it is ready to render rather quickly. In Deadline, after the same load time, it just gets stuck for 30-40 minutes…
This alone makes Houdini unusable in Deadline.

I wonder if this is related to the bug mentioned in this thread: Path mapping with Houdini plugin - #3 by antoinedurr

The original poster said:

There’s a Houdini bug where rbdfracturematerial SOPs cook themselves to death when the DL code calls hou.fileReferences() (and rbdbulletsolver SOPs don’t seem immune either).

I’ve submitted a bug to SESI r.e. the crazy cooking caused by hou.fileReferences(), ticket #105440 / bug #112903.

Can you try commenting out the call to

    parms = gather_parms_to_map()
    if parms:
        pathmap_parms(tempdir, parms)

inside hrender_dl.py to see if it changes the behavior?
It is possible that the attempt to collect the external references by calling hou.fileReferences() is causing this.
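If you would rather confirm whether the hou.fileReferences() call itself is the slow part before editing anything out, a guarded timing sketch like this can be run in hython or dropped into a test script (hou only exists inside Houdini’s Python, so it safely no-ops elsewhere; the helper name is made up):

```python
import time

try:
    import hou  # only available inside hython / Houdini's Python
except ImportError:
    hou = None

def time_file_references():
    """Return (seconds, reference count), or None outside Houdini."""
    if hou is None:
        return None
    start = time.perf_counter()
    refs = hou.fileReferences()  # the call suspected of triggering cooks
    return time.perf_counter() - start, len(refs)

print(time_file_references())  # prints None outside of Houdini
```

If that one call takes minutes in a scene that opens in seconds, it would support the rbdfracturematerial cooking bug mentioned in the thread above.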

Hi everyone, original poster here.

it seems the thread picked up some steam, and it’s good to know that I am not alone.

@Bobo, thank you for your input. I have tried commenting out the hou.fileReferences() part and just ran the test; still having a very long delay at the start of the render.

#UPDATE

for me the DEADLINE log is stuck at the same position (as in the original post), and it can last up to 20 min, which adds a ridiculous amount of overhead to render time; however, it does render.

the problem does not seem to affect all projects. not sure what is triggering it, but for the time being, only one of my projects is causing a long delay to launch (project file size over 60 MB).

when I start stripping down the file (deleting stuff), the same problematic file will start rendering as usual (no delay at start). not sure what the threshold is, but I kept deleting nodes I don’t need, then submitting and rendering, deleting more, submitting and rendering again, and when I finally got to an almost barebones scene (with literally only RS proxies), it started rendering fine.

obviously, this is not ideal for production, so really would appreciate more help.

Are your sims all cached out? Or are you seeing greater and greater “render” times as your frame range increases?

Hi everyone, sharing some findings to see if it might help with the diagnostics.

SETUP -----------------------------------------

ALL sim files are cached out, and ALL rendering nodes are REDSHIFT PROXIES.
in the HOUDINI GUI, once opened (opening time 5 sec), it starts rendering almost instantaneously.

I have a single HIP file with about 20 nodes that do all the geometric processing (opening multiple 500 MB files, processing materials, etc).

I have many nodes that are used to lay out the scene and render. They are NOT referencing the PROCESSING nodes; they are all composed of REDSHIFT proxies. in fact, I went through all the nodes and removed any referencing of the PROCESSING NODES, even if it was just for positioning.

I have about 50+ ROP outputs, all REDSHIFT ROPS.

this STARTING POINT was 65 MB; when sent to DEADLINE, it takes about 20 min just to launch.
and when rendering locally (open Houdini, open file, press render button), there is no load time other than opening the file (5 sec).

TESTING 1 -------------------------

deleted all the ROPs except the one I need to test render (this helped last time).
file size 59 MB, DEADLINE load time 15 min+ (I gave up after 15 min)

deleted other rendering layout nodes, most of them just redshift PROXIES
file size 58 MB, DEADLINE load time 15 min+ (I gave up after 15 min)

deleted most of the rendering layout nodes
file size 34 MB, DEADLINE load time 15 min+ (I gave up after 15 min)

deleted half of the PROCESSING nodes (nodes that take a long time to process BUT are not referenced or needed for the actual render)
file size 28 MB, DEADLINE load time 10 min.

deleted all PROCESSING nodes
file size 21 MB, DEADLINE load time under 1 min.

TESTING 2 ----------------------------------------

I also ran a test where I deleted only the PROCESSING NODES, so I still had all 50+ ROPs and all the rendering nodes (most of them redshift proxies).
DEADLINE load time 20 min+

once I deleted all the ROPS as well:
FILE SIZE 51 MB, DEADLINE load time under 1 min.

MY SUMMARY ----------------------------------------------

it seems it’s not just a simple file-size issue. I have managed to render the same scene at about 50 MB just fine through DEADLINE.

also, when I deleted all the PROCESSING NODES, it was still taking a while. it was only when I deleted the ROPs as well that it went down to under 1 min.

Hi @antoinedurr, yes, all my sim files are cached out.

sorry, I just posted a long reply about my findings and my current setup.

but my actual render nodes (the ones I need to render) are all composed of REDSHIFT PROXIES and just some simple keyframe animation, no SIM, and I made sure there was no reference to any other nodes.

So if I delete all the nodes that are not required to render, I can get it rendering in under 1 min.

which is confusing, as it seems DEADLINE is literally processing the entire scene (including all its inactive nodes).

this is supported by my testing, where deleting half of my PROCESSING NODES (which are not required to render) managed to shave the DEADLINE load time by half.

I’m just about to get started with debugging RS on our Windows farm. I wonder if the environment, i.e. the env vars, is different and/or not set properly when launching on the farm. For example, if I set the DL ROP to write out .rs files and then render them as a separate job, they all fail indicating no GPU (on the same hosts that normally render RS jobs just fine). So definitely something not square there. Licensing oddities? Just throwing things out there.
