I recently discovered that our render farm machines take almost 1.5 times as long to finish a render as our workstations.
The farm machines are equipped with two RTX 4090s, the workstations with a single one.
Jobs are submitted the way they should be:
2 concurrent tasks per machine with 1 GPU per task
After some research and a few tips pointing me to the log files of the render jobs and tasks, I could see that when two tasks run simultaneously on one machine, both tasks use the same GPU. How can I solve this, or is this only a reporting issue in the log? I need to identify the problem soon, as there are a lot of render jobs to be done in the upcoming weeks.
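One way to double-check this outside the Deadline logs is to watch per-GPU utilization while two tasks are rendering on a node. A minimal sketch, assuming `nvidia-smi` is on the PATH (it ships with the NVIDIA driver):

```python
import subprocess
import time

# Poll per-GPU utilization and memory use every few seconds while two
# tasks are rendering. If one GPU sits near 0% the whole time, both
# tasks really are sharing a single card; if both are busy, the log
# entry is just misleading.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,utilization.gpu,memory.used",
    "--format=csv,noheader,nounits",
]

for _ in range(10):  # ten samples, roughly 30 seconds total
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    for line in out.stdout.strip().splitlines():
        idx, util, mem = [field.strip() for field in line.split(",")]
        print(f"GPU {idx}: {util}% busy, {mem} MiB used")
    print("-" * 32)
    time.sleep(3)
```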
Run two ‘Workers’ on the node and assign them 1 GPU each under the GPU affinity options.
The option to launch another instance is disabled by default and buried awkwardly in the user group settings:
Tools > Super/Power User > Manage User Groups
Then enable ‘Launch new Named Worker’ under Menu Items.
It should work this way. Remember that you need more RAM than total VRAM: if each 4090 has 24 GB of VRAM and you have two of them, you need more than 48 GB of free RAM, as Windows (and other apps) will also use some RAM.
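As a rough sanity check of that headroom rule, here is a sketch; it assumes the third-party `psutil` package is installed and uses an assumed figure for OS/app overhead:

```python
import psutil

GPU_COUNT = 2
VRAM_PER_GPU_GB = 24      # RTX 4090
OS_OVERHEAD_GB = 8        # assumption: headroom for Windows and other apps

needed_gb = GPU_COUNT * VRAM_PER_GPU_GB + OS_OVERHEAD_GB
total_gb = psutil.virtual_memory().total / 1024**3

print(f"Rule of thumb: want > {needed_gb} GB installed, machine has {total_gb:.0f} GB")
if total_gb <= needed_gb:
    print("Warning: two concurrent GPU tasks may start swapping on this node.")
```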
I’d benchmark running 1 job across 2 cards versus 2 jobs on 1 card each; depending on the job being processed, it may be easier/quicker to run single tasks across both cards.
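A rough way to time the two setups side by side; this is only a sketch, with a hypothetical `render_frame.py` standing in for the real render command, and it assumes the renderer honours `CUDA_VISIBLE_DEVICES` (most CUDA-based engines do):

```python
import os
import subprocess
import time

def run(visible_gpus: str, frames: str) -> subprocess.Popen:
    """Start one render restricted to the given GPU indices (e.g. "0" or "0,1").

    render_frame.py is a hypothetical stand-in for the real render command.
    """
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=visible_gpus)
    return subprocess.Popen(["python", "render_frame.py", "--frames", frames], env=env)

def wall_time(procs):
    """Wait for all processes and return the elapsed wall-clock time."""
    start = time.perf_counter()
    for p in procs:
        p.wait()
    return time.perf_counter() - start

# Scenario A: one task renders all 20 frames using both cards.
t_both = wall_time([run("0,1", "1-20")])

# Scenario B: two concurrent tasks, one card each, 10 frames apiece.
t_split = wall_time([run("0", "1-10"), run("1", "11-20")])

print(f"1 task on 2 GPUs    : {t_both:.1f} s")
print(f"2 tasks, 1 GPU each : {t_split:.1f} s")
```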
After running an overnight test, I figured out that the ‘GPU 0’ entry in the log doesn’t mean anything.
For the first task of each render job per Worker, the log file lists which GPU is used for the whole job, including the PCI bus, and it even reports device 1/2 or 2/2.
So I can clearly say that all GPUs are being used; only the master machine (where the Deadline Repository is running) is around 12.5% slower.
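For anyone wanting to spot-check the same thing across many logs, here is a minimal sketch for pulling those GPU/PCI/device lines out of the task logs in bulk; the log directory and the exact wording of the matched lines are assumptions and depend on your Deadline setup and render plugin:

```python
import re
from pathlib import Path

# Assumed default Deadline 10 log location on Windows; adjust to your setup.
LOG_DIR = Path(r"C:\ProgramData\Thinkbox\Deadline10\logs")

# Loose match for lines reporting the GPU, PCI bus or "device x/2" info;
# the exact wording depends on the renderer and Deadline plugin.
PATTERN = re.compile(r"(gpu|pci|device\s+\d/\d)", re.IGNORECASE)

for log_file in sorted(LOG_DIR.glob("*.log")):
    hits = [line.strip()
            for line in log_file.read_text(errors="ignore").splitlines()
            if PATTERN.search(line)]
    if hits:
        print(f"== {log_file.name} ==")
        for line in hits:
            print("  ", line)
```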