GPU Affinity - Renderings are slower when using more workers

gutster · April 10, 2024, 8:04am

Hello,
we have a render maschine with 6x 4090, TR Pro 5955WX 16 core and 265GB RAM. We are rendering with C4D and Redshift. I though, because one 4090, on its own, is strong enough, so I can render with 6 workers, with one GPU each, so I can pump out frames. But when I have all my 6 workers running, the rendertimes are getting slower and are generally slow. For example:

My test project renders 1 frame on my workstation with a 4070 super in 1:17min. On the render maschine, with 3 workers enabled, the 4090 also does a 1:17min per frame. When I enable all 6 workers, the rendertimes go up to 2 minutes each.

I used the GPU Affinity option to select 1 GPU for each worker. But when I only run one worker, all GPUs VRAM getting loaded. Is that normal behaviour?

Derek_E_Zavada · April 10, 2024, 6:29pm

The performance doesn’t scale linearly as one might expect. Instead, it is slightly slower when compared to using a single GPU. There are several factors that could contribute to this behavior:

CPU Bottleneck: Even though there are 128 CPU threads available, the allocation of 16 threads per GPU could lead to a bottleneck. This is because not all CPU threads are equally capable of efficiently managing GPU tasks. The main thread, which coordinates the overall process, could become a bottleneck if it cannot dispatch tasks quickly enough to keep all GPUs fully utilized.
Overhead of Coordination: With multiple GPUs, there’s an increased overhead in coordinating tasks between them. This includes synchronizing data across GPUs, managing dependencies between tasks, and handling the increased complexity of distributing and collecting data. This overhead grows as more GPUs are added, potentially leading to diminishing returns on the added computational power.
Resource Contention: In a multi-GPU setup, GPUs may compete for shared resources, such as memory bandwidth or I/O bandwidth to the CPU. This contention can cause delays, as each GPU waits its turn to access the necessary resources, leading to less than optimal utilization of the GPUs.
Software and Drivers: The efficiency of multi-GPU setups also heavily depends on the software and drivers being used. If the rendering software or the drivers are not optimized for multi-GPU configurations, this can lead to poor scaling performance. Optimization for multi-GPU setups is complex and requires careful management of resources and tasks.
Thermal Throttling: In densely packed systems, heat management can become an issue. If the GPUs start to overheat, they may reduce their performance to cool down, known as thermal throttling. This can disproportionately affect multi-GPU setups where the heat generated is significantly higher than in single GPU configurations.

Karsten_Mehnert · February 6, 2025, 10:10am

When using C4D with Redshift and Deadline multi-GPU with affinity settings 1 GPU for each worker, the parallel running commandline instances overwrite each others local license checkout status. The more GPUs you have, the more worker instances you run, the more likely it becomes, the local file runs into a read and write access error, leading to more licenses checkout errors. The error handling of Deadline restarts the process and handles the recovery, but slows down rendering significantly, because commandline needs many tries before it properly aquires a license. You’ll have to to work around this, by seperating the preferences folder for each commandline instance. A JobPreLoad script will do the job. Other than that, there have been again changes between Cinema 2024 and 2025 in this area, and it is currently not working.