Hi, I’ve been having issues with my Cinema 4D / Redshift jobs on Deadline. The problem starts when one of the nodes hits the CUDA error below:
2019-05-22 08:11:40: 0: STDOUT: Redshift Error: ======================================================================================================
2019-05-22 08:11:40: 0: STDOUT: ASSERT FAILED
2019-05-22 08:11:40: 0: STDOUT: File GPUComputing_CUDA.cpp
2019-05-22 08:11:40: 0: STDOUT: Line 4007
2019-05-22 08:11:40: 0: STDOUT: StreamSynchronize() failed (CUDA_ERROR_LAUNCH_TIMEOUT). This is possibly due to a GPU crash (device 0). Please re-render this scene with the ‘Debug Capture’ option enabled (in the Redshift ‘System’ tab) and, once you get the crash again, send the developers the log file html and bin files located in C:\ProgramData\Redshift\Log/Log.Latest.2. Thanks!
2019-05-22 08:11:40: 0: STDOUT: ======================================================================================================
2019-05-22 08:11:41: 0: WARNING: Monitored managed process Cinema4DProcess is no longer running
2019-05-22 08:11:41: 0: Done executing plugin command of type ‘Render Task’
After this error, the rest of the job fails and stops rendering as well, even though the other nodes didn’t hit the same error.
My question is: why doesn’t Deadline simply assign that frame to another node after the crash? And why does it fail the job on all the other nodes and refuse to render the rest of it?
I’ve also noticed that all the frames from that point on are assigned to the node that crashed.
Any advice or tips would be much appreciated. Thank you!