Redshift Houdini multi-GPU (GPU affinity) setting

emmanuel_mouillet · October 9, 2024, 5:37pm

This thread is quite technical, so for those who just need 2 different jobs (job_A, job_B) to render at the same time on a dual-GPU machine, with one GPU per job, this is how I do it, and it works:

I have my machine foo, which is my only worker.
From foo I create 2 worker instances, foo_0 and foo_1.
In the worker instance properties I then set the GPU affinity: GPU_0 only for instance foo_0 and GPU_1 only for instance foo_1.
I create a pool called instances that contains only foo_0 and foo_1.

In the Houdini submitter, for my 2 jobs (job_A and job_B), I use the following setup:

Pool: the instances pool I created previously
Concurrent tasks: 1 (default)
Frames per task: 9999999
GPUs per task: 0 (default)

With this setup it works: I get one job per GPU, and each job runs at full power. All the other tests I've done ended with bad GPU dispatching, with jobs trying to use the same GPU.
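The one-GPU-per-instance rule above can be checked with a small sketch. This is only an illustration: the dictionary layout and the function name `find_shared_gpus` are hypothetical and not part of Deadline; the instance names and GPU indices come from the setup described in this post.

```python
from collections import defaultdict


def find_shared_gpus(affinity):
    """Return {gpu_index: [instances]} for every GPU claimed by more than one instance.

    `affinity` maps a worker instance name to the list of GPU indices
    it is allowed to use (a hypothetical representation of the
    GPU affinity set in the worker instance properties).
    """
    users = defaultdict(list)
    for instance, gpus in affinity.items():
        for gpu in gpus:
            users[gpu].append(instance)
    return {gpu: sorted(names) for gpu, names in users.items() if len(names) > 1}


# The working setup: each instance is pinned to its own GPU.
working = {"foo_0": [0], "foo_1": [1]}
# A conflicting setup: both instances end up on GPU 0.
conflicting = {"foo_0": [0], "foo_1": [0]}

print(find_shared_gpus(working))      # {}
print(find_shared_gpus(conflicting))  # {0: ['foo_0', 'foo_1']}
```

An empty result means no two instances compete for the same GPU, which is the situation the affinity settings above are meant to guarantee.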

I'm just posting my result here because the thread is quite hard to follow for non-technical people.

Cheers

emmanuel_mouillet · October 10, 2024, 9:24am

As extra info, here is my test describing the problem:

I have only one machine, named foo, so 1 worker with 2 x 3090.
I create 2 worker instances, foo_0 and foo_1, and put them in a pool called instances.
If I submit 2 jobs (job_A and job_B) with these options:

test_1

test_2

In both cases it doesn't work: Deadline can't dispatch the jobs properly, and they both try to use the same GPU. For test_1 that is logical, but I don't understand why test_2 is not working. It should theoretically work.

But in both scenarios GPU_0 is at full load at 1850 MHz, while GPU_1 is irregular, oscillating between long periods at 400 MHz and short periods at 1850 MHz.

I think the source of the problem is Redshift's preferences.xml, which overrides everything.

C:\ProgramData\redshift\preferences.xml

This line :

<preference name="SelectedComputeDevices" type="string" value="0:NVIDIA GeForce RTX 3090," />

This overrides Deadline: it can't reason "this GPU is busy, so it would be smarter to use the other one". It is simply forced to use GPU_0 because of this file.
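To see which devices the preferences file currently selects, something like the following sketch could be used. It assumes the file is XML made of `preference` elements with `name` and `value` attributes, as in the line quoted above; the root element name in `SAMPLE` and the function name `selected_devices` are made up for the illustration. On a real machine you would pass in the text of C:\ProgramData\redshift\preferences.xml instead of the sample.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal file; only the preference line is taken from this post.
SAMPLE = """<redshift_preferences>
  <preference name="SelectedComputeDevices" type="string" value="0:NVIDIA GeForce RTX 3090," />
</redshift_preferences>"""


def selected_devices(xml_text):
    """Return a list of (gpu_index, gpu_name) from the SelectedComputeDevices preference."""
    root = ET.fromstring(xml_text)
    for pref in root.iter("preference"):
        if pref.get("name") == "SelectedComputeDevices":
            devices = []
            # The value is a comma-separated list of "index:name" entries
            # with a trailing comma, so empty entries are skipped.
            for entry in pref.get("value", "").split(","):
                if not entry:
                    continue
                index, _, name = entry.partition(":")
                devices.append((int(index), name))
            return devices
    return []


print(selected_devices(SAMPLE))  # [(0, 'NVIDIA GeForce RTX 3090')]
```

If only one index is listed, as here, that would be consistent with every render on the machine landing on GPU_0 unless something else overrides it.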

So the only 2 solutions I have are either:

set the GPU affinity on the worker instances, with foo_0 set to GPU_0 and foo_1 set to GPU_1
use a Python pre-render script in each Redshift ROP before submitting to Deadline to specify which GPU to use

hou.hscript('setenv REDSHIFT_GPUDEVICES=0; varchange REDSHIFT_GPUDEVICES') # use GPU 0
hou.hscript('setenv REDSHIFT_GPUDEVICES=1; varchange REDSHIFT_GPUDEVICES') # use GPU 1
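To avoid keeping two nearly identical lines around, the command string could be built by a small helper, sketched below. The names `gpu_select_command` and `apply_gpu_selection` are hypothetical. Passing several indices as a comma-separated list is an assumption about how REDSHIFT_GPUDEVICES is read; the single-GPU case matches the two lines above exactly.

```python
def gpu_select_command(gpu_indices):
    """Build the hscript command that points REDSHIFT_GPUDEVICES at the given GPUs."""
    if not gpu_indices:
        raise ValueError("at least one GPU index is required")
    # Assumption: multiple devices are given as a comma-separated list, e.g. "0,1".
    devices = ",".join(str(int(i)) for i in gpu_indices)
    return "setenv REDSHIFT_GPUDEVICES={0}; varchange REDSHIFT_GPUDEVICES".format(devices)


def apply_gpu_selection(gpu_indices):
    """Run the command inside Houdini, for example from a ROP pre-render script."""
    import hou  # only importable inside a Houdini session
    hou.hscript(gpu_select_command(gpu_indices))


print(gpu_select_command([1]))
# setenv REDSHIFT_GPUDEVICES=1; varchange REDSHIFT_GPUDEVICES
```

In the pre-render script of the ROP for job_A you would call apply_gpu_selection([0]), and for job_B apply_gpu_selection([1]).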

But in either case we are forced to hard-code the GPU assignment, because it can't be done intelligently.
That said, I am not sure this is a Deadline issue?