Which Redshift version and CUDA version are you using? In our case the problem only appeared when the render was executed through the Deadline process. We then discovered it was related to how Redshift prints messages to the std streams, and it only happened when running as the root user.
3.5.19; the driver is the GRID driver 535.154.05 with CUDA 12.2.
The aim is to move to a newer version after the current project.
I'm not 100% sure, but I remember that this version had a Linux bug … we’re now on 3.6.01 and it works well with CUDA 12.2 … and as I remember, 3.5.24 was the one where they fixed the problem.
I switched to a non-root account and now the file goes through, but then it fails at the end.
I updated to the latest version and got the same issue:
2024-09-06 11:06:24: 0: STDOUT: License for redshift-core 2024.12 valid until Dec 07 2024
2024-09-06 11:06:24: 0: STDOUT: Detected change in GPU device selection
2024-09-06 11:06:26: 0: STDOUT: Creating CUDA contexts
2024-09-06 11:06:26: 0: STDOUT: CUDA init ok
2024-09-06 11:06:26: 0: STDOUT: No devices available
2024-09-06 11:06:27: 0: STDOUT: PostFX: Shut down
2024-09-06 11:06:27: 0: STDOUT: Shutdown GPU Devices...
2024-09-06 11:06:27: 0: STDOUT: Devices shut down ok
2024-09-06 11:06:27: 0: STDOUT: Shutdown Rendering Sub-Systems...
2024-09-06 11:06:27: 0: STDOUT: License returned
2024-09-06 11:06:27: 0: STDOUT: Finished Shutting down Rendering Sub-Systems
2024-09-06 11:06:27: 0: INFO: Process exit code: 1
2024-09-06 11:06:27: 0: Done executing plugin command of type 'Render Task'
Rendering outside of Deadline (and using a local license) goes through fine…
License acquired
License for net.maxon.license.app.bundle_maxonone-release~commercial valid until Oct 03 2024
Detected change in GPU device selection
Creating CUDA contexts
CUDA init ok
Ordinals: { 0 }
Initializing GPUComputing module (CUDA). Active device 0
CUDA Driver Version: 12020
CUDA API Version: 11020
Device 1/1 : Tesla T4
Compute capability: 7.5
It feels like it breaks after this line:
2024-09-06 11:06:26: 0: STDOUT: CUDA init ok
2024-09-06 11:06:26: 0: STDOUT: No devices available
I submitted the job again with GPU0 selected and it went through (using the local license). Releasing the mx1 license and resubmitting the job now gives a different error:
2024-09-06 11:50:15: 0: STDOUT: Loading: /mnt/test/c4d/rs.rs
2024-09-06 11:50:15: 0: STDOUT: Maxon licensing error: License not activated (9)
2024-09-06 11:50:15: 0: STDOUT: Detected change in GPU device selection
2024-09-06 11:50:16: 0: STDOUT: Creating CUDA contexts
2024-09-06 11:50:16: 0: STDOUT: CUDA init ok
2024-09-06 11:50:16: 0: STDOUT: No devices available
2024-09-06 11:50:16: 0: STDOUT: PostFX: Shut down
Then you have a different problem: there is no device available. Run the nvidia-smi tool and check what it prints out.
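For anyone who wants to run that check from the same account the Deadline worker uses, here is a minimal sketch in Python. It calls `nvidia-smi` with its `--query-gpu` CSV output and lists what the driver reports. The function names `parse_gpu_list` and `query_gpus` are made up for this example, and the sketch assumes `nvidia-smi` is on the PATH of the user running it.

```python
import subprocess


def parse_gpu_list(csv_text):
    """Parse 'index, name, driver_version' rows from nvidia-smi CSV output."""
    gpus = []
    for line in csv_text.splitlines():
        parts = [p.strip() for p in line.split(",")]
        # Skip blank or unexpected lines instead of failing on them.
        if len(parts) != 3 or not parts[0].isdigit():
            continue
        gpus.append({"index": int(parts[0]), "name": parts[1], "driver": parts[2]})
    return gpus


def query_gpus():
    """Ask nvidia-smi which GPUs the current user can see (empty list on failure)."""
    cmd = [
        "nvidia-smi",
        "--query-gpu=index,name,driver_version",
        "--format=csv,noheader",
    ]
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, check=True, timeout=30
        )
    except (OSError, subprocess.CalledProcessError, subprocess.TimeoutExpired) as exc:
        print(f"nvidia-smi could not be run: {exc}")
        return []
    return parse_gpu_list(result.stdout)


if __name__ == "__main__":
    found = query_gpus()
    if not found:
        print("No GPUs reported for this user")
    for gpu in found:
        print(gpu)
```

Running it once from an interactive shell and once as the worker's service account makes it easy to compare what each one sees.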
Something weird is going on. Rendering outside of Deadline works fine, but from 3.5.20 to 3.6.04 I get an RLM error:
2024-09-06 12:47:26: 0: STDOUT: Loading: /mnt/test/c4d/rs.rs
2024-09-06 12:47:26: 0: STDOUT: License error: Error communicating with license server (-17)
2024-09-06 12:47:26: 0: STDOUT: License error: (RLM) Communications error with license server (-17)
2024-09-06 12:47:26: 0: STDOUT: Read error from network (-105)
2024-09-06 12:47:26: 0: STDOUT: select() system call error (comm: -15)Interrupted system call (errno: 4)
2024-09-06 12:47:26: 0: STDOUT: Detected change in GPU device selection
But on 3.5.19 I don’t get this error; it just fails with exit code 1:
2024-09-06 12:51:43: 0: STDOUT: Loading: /mnt/test/c4d/rs.rs
2024-09-06 12:51:44: 0: STDOUT: License for redshift-core 2024.12 valid until Dec 07 2024
2024-09-06 12:51:44: 0: STDOUT: Detected change in GPU device selection
2024-09-06 12:51:45: 0: STDOUT: Creating CUDA contexts
2024-09-06 12:51:45: 0: STDOUT: CUDA init ok
2024-09-06 12:51:45: 0: STDOUT: No devices available
...
2024-09-06 12:51:47: 0: INFO: Process exit code: 1
But if I run the worker under a user account and not as a service, I get the same error exposed as with the later versions, and also confirmation that the license checks out.
0: STDOUT: Loading: /mnt/test/c4d/rs.rs
Port Forwarder (redshift:5054): Client connected to port forwarder.
Worker - Confirmed Credit Usage for "redshift".
0: STDOUT: License error: Error communicating with license server (-17)
0: STDOUT: License error: (RLM) Communications error with license server (-17)
0: STDOUT: Read error from network (-105)
0: STDOUT: select() system call error (comm: -15)Interrupted system call (errno: 4)
0: STDOUT: Detected change in GPU device selection
0: STDOUT: Creating CUDA contexts
0: STDOUT: CUDA init ok
0: STDOUT: No devices available
0: STDOUT: PostFX: Shut down
0: STDOUT: Shutdown GPU Devices...
0: STDOUT: Devices shut down ok
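Since the license error and the "No devices available" line show up in the same run, it can help to pull out just the failure lines in order. The following is a rough sketch, not a Deadline or Redshift tool: `find_failures` and the `MARKERS` table are hypothetical names, and the patterns are taken only from the messages pasted in this thread.

```python
import re

# Patterns based only on the log lines quoted in this thread (an assumption,
# not an exhaustive list of Redshift or Deadline messages).
MARKERS = [
    ("license", re.compile(r"licens\w* error", re.IGNORECASE)),
    ("gpu", re.compile(r"No devices available")),
    ("exit", re.compile(r"Process exit code: (?!0\b)-?\d+")),
]


def find_failures(log_text):
    """Return (line_number, label, line) for each line matching a known marker."""
    hits = []
    for lineno, line in enumerate(log_text.splitlines(), start=1):
        for label, pattern in MARKERS:
            if pattern.search(line):
                hits.append((lineno, label, line.strip()))
                break
    return hits


if __name__ == "__main__":
    sample = "\n".join([
        "0: STDOUT: License error: Error communicating with license server (-17)",
        "0: STDOUT: CUDA init ok",
        "0: STDOUT: No devices available",
    ])
    for lineno, label, line in find_failures(sample):
        print(f"{lineno:>4} [{label}] {line}")
```

Reading the hits in order shows whether a license failure comes before the GPU message, which is what the worker log above suggests.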
On Windows at least, GPU access is not allowed for services. I wonder how it works on Linux, perhaps some service configuration is possible?
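On Linux, one thing that can differ between a service account and an interactive user is permission on the NVIDIA device nodes under `/dev`. The sketch below is only a quick way to rule that out and is not a confirmed cause of the errors in this thread. `check_device_nodes` is a hypothetical helper name, and the `/dev/nvidia*` pattern assumes the standard NVIDIA driver layout.

```python
import glob
import os


def check_device_nodes(pattern="/dev/nvidia*"):
    """Map each matching device node to whether this user can read and write it."""
    results = {}
    for path in sorted(glob.glob(pattern)):
        results[path] = os.access(path, os.R_OK | os.W_OK)
    return results


if __name__ == "__main__":
    print("effective uid:", os.geteuid())
    nodes = check_device_nodes()
    if not nodes:
        print("no /dev/nvidia* nodes found")
    for path, ok in nodes.items():
        print(f"{path}: {'ok' if ok else 'NO ACCESS'}")
```

Run it as the same user the worker service runs under; any node marked as not accessible would be worth a closer look.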
The actual issue was the port being blocked in one direction to the UBL license forwarder, which was confusing. I eventually changed the Global Command override to 5554 (the default is 5054) and left UBL Redshift at 5054. A bit annoying having the overlap!
I also renamed “libredshift-core-cpu.so” to “libredshift-core-cpu.so.BKP”, as recommended in another thread. So everything is rendering fine now.
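A quick way to see whether a port is reachable at all from the render node is a plain TCP connect test, sketched below. The host `localhost` is an assumption (substitute whatever machine the forwarder or license server listens on), and `can_connect` is a made-up helper name. The two ports are the ones mentioned above.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


if __name__ == "__main__":
    host = "localhost"  # assumption: replace with the forwarder or license host
    for port in (5054, 5554):
        state = "reachable" if can_connect(host, port) else "blocked or closed"
        print(f"{host}:{port} is {state}")
```

Note that this only shows a connection can be opened from this side. It does not prove that replies make it back through a firewall that filters one direction, and it says nothing about whether the license itself is valid.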
Thanks all