Deadline takes a long time to start the job, stuck at a certain phase for about 10 min

Hi team, I was wondering if anybody else has experienced this, or knows a solution to this issue.

I am using Houdini + Redshift.
The scene renders fine locally.

When I submit the job, it gets submitted without any issue and proceeds to the point shown in the log below, where it gets stuck for about 10 min. When it eventually gets going, everything renders fine, until the next task, which gets stuck at the same point again.

Not sure if it is a related problem, but when I submit the same job with six concurrent tasks (I have 6 GPUs installed), it gets stuck here for even longer (so long that I had to force stop the job), whereas a single task is stuck for about 10 min.

I can't see any error messages or any hint at what could be causing this delay.

Would appreciate any help! Thanks!

--------- from log -----------------------

2020-11-18 00:52:27: 0: INFO: Starting Houdini Job
2020-11-18 00:52:27: 0: INFO: Stdout Redirection Enabled: True
2020-11-18 00:52:27: 0: INFO: Stdout Handling Enabled: True
2020-11-18 00:52:27: 0: INFO: Popup Handling Enabled: True
2020-11-18 00:52:27: 0: INFO: QT Popup Handling Enabled: False
2020-11-18 00:52:27: 0: INFO: WindowsForms10.Window.8.app.* Popup Handling Enabled: False
2020-11-18 00:52:27: 0: INFO: Using Process Tree: True
2020-11-18 00:52:27: 0: INFO: Hiding DOS Window: True
2020-11-18 00:52:27: 0: INFO: Creating New Console: False
2020-11-18 00:52:27: 0: INFO: Running as user: alteredgene
2020-11-18 00:52:27: 0: INFO: Executable: "C:\Program Files\Side Effects Software\Houdini 18.0.499\bin\Hython.exe"
2020-11-18 00:52:27: 0: INFO: Argument: "C:\ProgramData\Thinkbox\Deadline10\workers\IR-214226-7540\plugins\5fb4dee713f8ff37ac6c7770\hrender_dl.py" -f 710 719 1 -o "$HIP/render2/$OS/$OS.$F4.exr" -g -d /out/RS_testMid -tempdir "C:\ProgramData\Thinkbox\Deadline10\workers\IR-214226-7540\jobsData\5fb4dee713f8ff37ac6c7770\0_tempzX2ip0" -arnoldAbortOnLicenseFail 1 "C:/IMB_NationalMuseumExhibition/IMB-NationalMuseumExhibition.hip"
2020-11-18 00:52:27: 0: INFO: Full Command: "C:\Program Files\Side Effects Software\Houdini 18.0.499\bin\Hython.exe" "C:\ProgramData\Thinkbox\Deadline10\workers\IR-214226-7540\plugins\5fb4dee713f8ff37ac6c7770\hrender_dl.py" -f 710 719 1 -o "$HIP/render2/$OS/$OS.$F4.exr" -g -d /out/RS_testMid -tempdir "C:\ProgramData\Thinkbox\Deadline10\workers\IR-214226-7540\jobsData\5fb4dee713f8ff37ac6c7770\0_tempzX2ip0" -arnoldAbortOnLicenseFail 1 "C:/IMB_NationalMuseumExhibition/IMB-NationalMuseumExhibition.hip"
2020-11-18 00:52:27: 0: INFO: Startup Directory: "C:\Program Files\Side Effects Software\Houdini 18.0.499\bin"
2020-11-18 00:52:27: 0: INFO: Process Priority: BelowNormal
2020-11-18 00:52:27: 0: INFO: Process Affinity: default
2020-11-18 00:52:27: 0: INFO: Process is now running
2020-11-18 00:52:32: 0: STDOUT: [Redshift] Redshift for Houdini plugin version 3.0.30 (Sep 26 2020 14:49:18)
2020-11-18 00:52:32: 0: STDOUT: [Redshift] Plugin compile time HDK version: 18.0.499
2020-11-18 00:52:32: 0: STDOUT: [Redshift] Houdini host version: 18.0.499
2020-11-18 00:52:32: 0: STDOUT: [Redshift] Plugin dso/dll and config path: C:/ProgramData/Redshift/Plugins/Houdini/18.0.499/dso
2020-11-18 00:52:32: 0: STDOUT: [Redshift] Core data path: C:\ProgramData\Redshift
2020-11-18 00:52:32: 0: STDOUT: [Redshift] Local data path: C:\ProgramData\Redshift
2020-11-18 00:52:32: 0: STDOUT: [Redshift] Procedurals path: C:\ProgramData\Redshift\Procedurals
2020-11-18 00:52:32: 0: STDOUT: [Redshift] Preferences file path: C:\ProgramData\Redshift\preferences.xml
2020-11-18 00:52:32: 0: STDOUT: [Redshift] License path: C:\ProgramData\Redshift
2020-11-18 00:52:35: 0: STDOUT: Detected Houdini version: (18, 0, 499)
2020-11-18 00:52:35: 0: STDOUT: ['C:\ProgramData\Thinkbox\Deadline10\workers\IR-214226-7540\plugins\5fb4dee713f8ff37ac6c7770\hrender_dl.py', '-f', '710', '719', '1', '-o', '$HIP/render2/$OS/$OS.$F4.exr', '-g', '-d', '/out/RS_testMid', '-tempdir', 'C:\ProgramData\Thinkbox\Deadline10\workers\IR-214226-7540\jobsData\5fb4dee713f8ff37ac6c7770\0_tempzX2ip0', '-arnoldAbortOnLicenseFail', '1', 'C:/IMB_NationalMuseumExhibition/IMB-NationalMuseumExhibition.hip']
2020-11-18 00:52:35: 0: STDOUT: Start: 710
2020-11-18 00:52:35: 0: STDOUT: End: 719
2020-11-18 00:52:35: 0: STDOUT: Increment: 1
2020-11-18 00:52:35: 0: STDOUT: Ignore Inputs: True
2020-11-18 00:52:35: 0: STDOUT: Output: $HIP/render2/$OS/$OS.$F4.exr
2020-11-18 00:52:35: 0: STDOUT: Driver: /out/RS_testMid
2020-11-18 00:52:35: 0: STDOUT: Input File: C:/IMB_NationalMuseumExhibition/IMB-NationalMuseumExhibition.hip
2020-11-18 00:52:56: 0: STDOUT: Unknown command: verification_id
2020-11-18 00:52:56: 0: STDOUT: Unknown command: license_id
2020-11-18 00:52:56: 0: STDOUT: Unknown command: lock
2020-11-18 00:52:56: 0: STDOUT: Unknown command: product_id
2020-11-18 00:52:56: 0: STDOUT: Unknown command: server_platform
2020-11-18 00:52:56: 0: STDOUT: Unknown command: support_expiry
2020-11-18 00:52:56: 0: STDOUT: Unknown command: houdini_version
2020-11-18 00:52:56: 0: STDOUT: Unknown command: available
2020-11-18 00:52:56: 0: STDOUT: Unknown command: count
2020-11-18 00:52:56: 0: STDOUT: Unknown command: ip_mask
2020-11-18 00:52:56: 0: STDOUT: Unknown command: display
2020-11-18 00:52:56: 0: STDOUT: Unknown command: }

I'm always a bit wary of running concurrent tasks with GPUs.

You may submit 6 concurrent tasks, each using a single card, but I'm never sure what determines which card they use, or whether they're all jumping on the first one.

If I were running 6 cards, I'd likely use 2x workers with 3-card affinity, or 3x workers with 2-card affinity, then submit jobs with a GPU limit and let the worker assign the cards.
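
For what it's worth, here's a rough sketch of one way to pin each concurrent task to its own card by limiting CUDA device visibility before the render process starts. It assumes Redshift only enumerates the devices exposed via CUDA_VISIBLE_DEVICES, like most CUDA applications, which is worth verifying against the Redshift docs; the render script path is just a placeholder.

```python
import os
import subprocess

# Hypothetical sketch: give each concurrent task its own GPU by limiting CUDA
# device visibility before the render process starts. ASSUMPTION: Redshift only
# sees the devices exposed via CUDA_VISIBLE_DEVICES, like most CUDA apps -
# verify this against the Redshift docs before relying on it.
HYTHON = r"C:\Program Files\Side Effects Software\Houdini 18.0.499\bin\Hython.exe"
RENDER_SCRIPT = r"C:\path\to\render_one_task.py"  # placeholder per-task render script

def launch_task(gpu_index, frame):
    env = os.environ.copy()
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)  # only this card is visible to the task
    return subprocess.Popen([HYTHON, RENDER_SCRIPT, str(frame)], env=env)

# e.g. six tasks, one frame each, one card each
procs = [launch_task(gpu, 710 + gpu) for gpu in range(6)]
for p in procs:
    p.wait()
```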

I'd recommend using a GPU monitoring tool like MSI Afterburner, so you can see which cards are being picked up.
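
If you'd rather watch it from a terminal instead of Afterburner, here's a quick sketch that polls nvidia-smi and prints per-card VRAM and utilization, so you can see whether anything is actually loading during the stall (assumes the NVIDIA drivers are installed and nvidia-smi is on the PATH).

```python
import subprocess
import time

# Poll nvidia-smi every few seconds and print per-GPU memory and utilization.
# Assumes NVIDIA drivers are installed and nvidia-smi is on the PATH. Ctrl+C to stop.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,name,memory.used,memory.total,utilization.gpu",
    "--format=csv,noheader",
]

while True:
    stamp = time.strftime("%H:%M:%S")
    out = subprocess.run(QUERY, capture_output=True, text=True).stdout
    for line in out.strip().splitlines():
        print(f"{stamp}  {line}")
    time.sleep(5)
```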

Hey Anthony! Thank you for your advice. Yes, I had good success with 2 concurrent tasks (2 GPUs, 1 GPU per task); it really scaled linearly. But I think your advice on pairing 2 to 3 cards per worker makes good sense.

That being said, the above issue actually happens even with a single task (no GPU affinity), so even a simple, straightforward Deadline submission has this lag of about 5 min per task.

Do you have the same lag submitting a command-line render without Deadline?
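
One way to test that is a small hython script that loads the hip file and renders the same ROP directly, timing the scene load and the first frame separately, so you can see where the delay actually sits. This is just a sketch using the paths from your log; run it with Hython.exe outside of Deadline.

```python
# Run with Hython.exe, outside Deadline, to time scene load vs. first frame.
import time
import hou

HIP = "C:/IMB_NationalMuseumExhibition/IMB-NationalMuseumExhibition.hip"
ROP = "/out/RS_testMid"

t0 = time.time()
try:
    hou.hipFile.load(HIP)
except hou.LoadWarning as e:
    print(e)  # non-fatal load warnings (e.g. missing plugins locally)
t1 = time.time()
print("Scene load took %.1f s" % (t1 - t0))

rop = hou.node(ROP)
rop.render(frame_range=(710, 710), verbose=True)  # a single frame is enough
print("First frame took %.1f s" % (time.time() - t1))
```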

Are you able to switch the debug log on to see if Redshift outputs any more info?

Are you able to monitor the GPU to see if it's loading into VRAM, etc.?

I prefer exporting to standalone in whatever application or renderer I'm using, though I know it's not always quicker due to the export process.

I saw GridMarkets had some nice tips on submitting Houdini/Redshift; not sure if it's of any use, I've not tested it out.

Hmm, interesting tips again!

Will look into those. For the time being, things I know:

GPU VRAM was not being loaded (nor were the CUDA cores being used).
I will need to look at the Redshift logs as well.

Will post an update here when I find out! Thank you for your suggestions, Anthony!

Having this exact same issue.

Haven’t tried adjusting concurrent tasks, but given I’m only running two GPUs per machine (only two machines), and don’t have anywhere near the same bottleneck when rendering locally in Houdini, I’m very curious to see what’s causing the hang-up here…

Network?

Disk

We recommend using fast SSD drives. Redshift automatically converts textures (JPG, EXR, PNG, TIFF, etc) to its own texture format which is faster to load and use during rendering. Those converted textures are stored in a local drive folder. We recommend using an SSD for that texture cache folder so that, during rendering, the converted texture files can be opened fast. Redshift can optionally not do any of this caching and simply open textures from their original location (even if that is a network folder), but we don’t recommend this. For more information on the texture cache folder, please read the online documentation.

To recap:

* Prefer SSDs to mechanical hard disks
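
If you want to put a rough number on the disk side of this, here's a small sketch that just reads back everything in a given texture cache folder and reports the throughput. The folder path is a placeholder - point it at whatever cache location is set in your Redshift preferences, and note that a second run will be skewed by the OS file cache.

```python
import os
import time

# Rough read-throughput check for the Redshift texture cache folder.
# CACHE_DIR is a placeholder - use the cache path set in your Redshift preferences.
CACHE_DIR = r"C:\path\to\RedshiftTextureCache"

total_bytes = 0
start = time.time()
for root, _dirs, files in os.walk(CACHE_DIR):
    for name in files:
        path = os.path.join(root, name)
        with open(path, "rb") as f:
            while f.read(8 * 1024 * 1024):  # read in 8 MB chunks
                pass
        total_bytes += os.path.getsize(path)

elapsed = time.time() - start
print("Read %.1f GB in %.1f s (%.0f MB/s)"
      % (total_bytes / 1e9, elapsed, total_bytes / 1e6 / max(elapsed, 1e-6)))
```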

Network and NAS

Redshift can render several times faster than CPU renderers. This means that the burden on your network can be higher too, just like it would be if you were adding lots more render nodes! As mentioned above, Redshift caches textures to the local disk so it won't try to load textures through the network over and over again (it will only do it if the texture changes). However, other files (like Redshift proxies) are not locally cached, so they will be accessed over the network repeatedly. Fast networks and network-attached storage (NAS) typically work fine in this scenario.

However, there have been a few cases where users reported extremely low performance with certain NAS solutions. Since there are many NAS products available in the market, we strongly recommend thoroughly testing your chosen NAS with large Redshift proxies over the network. For example, try exporting a large Redshift proxy containing 30 million triangles or so (a tessellated sphere would do), save it in a network folder and then try using it in a scene both through a network path and also through a local file - and measure the rendering performance difference between the two.

To recap:

* Rendering with Redshift is like rendering with lots of machines. It might put a strain on your network.
* Thoroughly test your network storage solution! Some of them have performance issues!
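
For the NAS test described above, here's a minimal sketch that times reading the same large Redshift proxy from a network path and from a local copy. Both paths are placeholders, and raw read throughput is only a rough stand-in for the actual render comparison the docs suggest.

```python
import time

# Placeholders: the same large .rs proxy stored on the NAS and copied locally.
NETWORK_PROXY = r"\\nas\projects\proxies\big_sphere_30m.rs"
LOCAL_PROXY = r"C:\temp\big_sphere_30m.rs"

def time_read(path):
    # Read the whole file in 8 MB chunks and report effective throughput.
    start = time.time()
    size = 0
    with open(path, "rb") as f:
        chunk = f.read(8 * 1024 * 1024)
        while chunk:
            size += len(chunk)
            chunk = f.read(8 * 1024 * 1024)
    elapsed = time.time() - start
    print("%s: %.1f GB in %.1f s (%.0f MB/s)"
          % (path, size / 1e9, elapsed, size / 1e6 / max(elapsed, 1e-6)))

time_read(NETWORK_PROXY)
time_read(LOCAL_PROXY)
```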

SSDs… SSDs everywhere…

Could be a network bottleneck? But in the bit of testing I’ve done, even pulling source scene/caches from my server and rendering locally on a workstation is still heaps faster than submitting a job to the farm.
