
RS Standalone GPU affinity bug?

This might very well be an RS bug, but it seems that if you set GPUs per task to a higher number than the GPUs actually available, the render crashes with an array-out-of-range error. Maybe Deadline could add a check for this before starting the render? (A rough sketch of the kind of check I mean is included after the log below.)
Running Redshift Standalone 3.0.53 on Linux on AWS.


2021-09-09 13:28:49: 0: STDOUT: Creating CUDA contexts
2021-09-09 13:28:49: 0: STDOUT: CUDA init ok
2021-09-09 13:28:49: 0: STDOUT: ======================================================================================================
2021-09-09 13:28:49: 0: STDOUT: ASSERT FAILED
2021-09-09 13:28:49: 0: STDOUT: File …/…/BaseLib/Common/Array.h
2021-09-09 13:28:49: 0: STDOUT: Line 353
2021-09-09 13:28:49: 0: STDOUT: CArray: Index out of range
2021-09-09 13:28:49: 0: STDOUT: ======================================================================================================
2021-09-09 13:28:49: 0: STDOUT: Call stack from …/…/BaseLib/Common/Array.h:353:
2021-09-09 13:28:49: 0: STDOUT: /usr/redshift/bin/libredshift-core.so(+0x219245) [0x7f96ded10245]
2021-09-09 13:28:49: 0: STDOUT: /usr/redshift/bin/libredshift-core.so(+0x5fdfba) [0x7f96df0f4fba]
2021-09-09 13:28:49: 0: STDOUT: /usr/redshift/bin/libredshift-core.so(+0x637e93) [0x7f96df12ee93]
2021-09-09 13:28:49: 0: STDOUT: /usr/redshift/bin/libredshift-core.so(_Z18RS_Renderer_CreatejPi+0x15b) [0x7f96dec6b7eb]
2021-09-09 13:28:49: 0: STDOUT: /usr/redshift/bin/redshiftCmdLine() [0x403d34]
2021-09-09 13:28:49: 0: STDOUT: /lib64/libc.so.6(__libc_start_main+0xea) [0x7f96dda4a0ba]
2021-09-09 13:28:49: 0: STDOUT: /usr/redshift/bin/redshiftCmdLine() [0x406536]
2021-09-09 13:28:49: 0: STDOUT: terminate called after throwing an instance of ‘char*’
2021-09-09 13:28:49: 0: INFO: Process exit code: 134

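To illustrate what I mean, here's a minimal sketch of such a pre-flight check (plain Python, not Deadline's actual plugin code; count_gpus and check_gpu_affinity are made-up names): it just counts the GPUs that nvidia-smi reports and fails the task with a readable error instead of letting redshiftCmdLine hit the assert.

```python
# Minimal sketch of a pre-flight check, not Deadline's actual API.
# count_gpus/check_gpu_affinity are hypothetical names for illustration.
import subprocess

def count_gpus():
    # "nvidia-smi --list-gpus" prints one line per detected GPU
    out = subprocess.run(
        ["nvidia-smi", "--list-gpus"],
        capture_output=True, text=True, check=True
    ).stdout
    return len([line for line in out.splitlines() if line.strip()])

def check_gpu_affinity(gpus_per_task):
    available = count_gpus()
    if gpus_per_task > available:
        # Fail early with a clear message instead of letting
        # redshiftCmdLine hit the CArray "Index out of range" assert.
        raise RuntimeError(
            "GPUs per task (%d) exceeds GPUs on this worker (%d)"
            % (gpus_per_task, available)
        )
    return available
```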

Also, I'm wondering if there's a way to scale more dynamically depending on what type of instance you get on the AWS spot market. We have both G4 12xlarge (4 GPUs) and metal (8 GPUs) in our list, depending on availability. It would be awesome to have Deadline use, for example, up to 4 GPUs per task, and only run additional concurrent tasks if the instance has more GPUs (rough sketch of the idea below).
I think it would be very beneficial functionality for Deadline, since going from 4 to 8 GPUs on a single frame doesn't scale as well as simply running two tasks at 4 GPUs each (2x4).
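Something along these lines is what I'm picturing (just a sketch; plan_tasks is a made-up helper, and I'm assuming there would be some hook for the Worker to apply the result at startup): cap each task at 4 GPUs and derive the concurrent-task count from whatever the spot instance actually has.

```python
# Sketch of the scaling idea; plan_tasks is a hypothetical helper, and the
# hook to set the Worker's concurrent-task count is assumed, not real.
def plan_tasks(available_gpus, max_gpus_per_task=4):
    gpus_per_task = min(max_gpus_per_task, max(available_gpus, 1))
    concurrent_tasks = max(1, available_gpus // gpus_per_task)
    return gpus_per_task, concurrent_tasks

# G4 12xlarge (4 GPUs) -> one task using 4 GPUs
print(plan_tasks(4))   # (4, 1)
# G4 metal (8 GPUs)    -> two concurrent tasks, 4 GPUs each
print(plan_tasks(8))   # (4, 2)
```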
