Hello, we are getting client errors on some machines with four NVIDIA RTX 3060 cards that use GPU affinity, so each card renders one frame. For the past few weeks, whenever all four GPUs are rendering at the same time, the tasks crash and fail without any obvious reason, even though the other clients render without problems.
This only happens with Houdini + Redshift; Maya + Redshift has no problem.
Windows 10
Deadline 10.3.0.15
Houdini 20.5

Worker log from one of the failing tasks:
2025-02-06 16:31:57: Scheduler Thread - Job’s Limit Groups:
2025-02-06 16:31:57: Scheduler Thread - Skipping K: because it is already mapped
2025-02-06 16:31:57: 0: Loading Job’s Plugin timeout is Disabled
2025-02-06 16:31:57: 0: SandboxedPlugin: Render Job As User disabled, running as current user ‘admin’
2025-02-06 16:31:59: All job files are already synchronized
2025-02-06 16:31:59: Plugin Redshift was already synchronized.
2025-02-06 16:31:59: 0: Executing plugin command of type ‘Initialize Plugin’
2025-02-06 16:31:59: 0: INFO: Executing plugin script ‘C:\ProgramData\Thinkbox\Deadline10\workers\XXXXXX\plugins\67a4d435223240a596f4b381\Redshift.py’
2025-02-06 16:31:59: 0: INFO: Plugin execution sandbox using Python version 3
2025-02-06 16:31:59: 0: INFO: About: Redshift Plugin for Deadline
2025-02-06 16:31:59: 0: INFO: The job’s environment will be merged with the current environment before rendering
2025-02-06 16:31:59: 0: Done executing plugin command of type ‘Initialize Plugin’
2025-02-06 16:31:59: 0: Start Job timeout is disabled.
2025-02-06 16:31:59: 0: Task timeout is 10800 seconds (Regular Task Timeout)
2025-02-06 16:31:59: 0: Loaded job: 02_shots-sh9020_beauty_v010_exportCamera_render (67a4d435223240a596f4b381)
2025-02-06 16:31:59: 0: Skipping K: because it is already mapped
2025-02-06 16:31:59: 0: Executing plugin command of type ‘Start Job’
2025-02-06 16:31:59: 0: DEBUG: S3BackedCache Client is not installed.
2025-02-06 16:31:59: 0: INFO: Executing global asset transfer preload script ‘C:\ProgramData\Thinkbox\Deadline10\workers\XXXXX\plugins\67a4d435223240a596f4b381\GlobalAssetTransferPreLoad.py’
2025-02-06 16:31:59: 0: INFO: Looking for legacy (pre-10.0.26) AWS Portal File Transfer…
2025-02-06 16:31:59: 0: INFO: Looking for legacy (pre-10.0.26) File Transfer controller in C:/Program Files/Thinkbox/S3BackedCache/bin/task.py…
2025-02-06 16:31:59: 0: INFO: Could not find legacy (pre-10.0.26) AWS Portal File Transfer.
2025-02-06 16:31:59: 0: INFO: Legacy (pre-10.0.26) AWS Portal File Transfer is not installed on the system.
2025-02-06 16:31:59: 0: Done executing plugin command of type ‘Start Job’
2025-02-06 16:31:59: 0: Plugin rendering frame(s): 1023
2025-02-06 16:32:00: 0: Executing plugin command of type ‘Render Task’
2025-02-06 16:32:00: 0: INFO: Stdout Redirection Enabled: True
2025-02-06 16:32:00: 0: INFO: Stdout Handling Enabled: True
2025-02-06 16:32:00: 0: INFO: Popup Handling Enabled: True
2025-02-06 16:32:00: 0: INFO: QT Popup Handling Enabled: False
2025-02-06 16:32:00: 0: INFO: WindowsForms10.Window.8.app.* Popup Handling Enabled: False
2025-02-06 16:32:00: 0: INFO: Using Process Tree: True
2025-02-06 16:32:00: 0: INFO: Hiding DOS Window: True
2025-02-06 16:32:00: 0: INFO: Creating New Console: False
2025-02-06 16:32:00: 0: INFO: Running as user: admin
2025-02-06 16:32:00: 0: INFO: File “redshiftCmdLine.exe” is not rooted, checking current directory
2025-02-06 16:32:00: 0: INFO: File “redshiftCmdLine.exe” is not rooted and is not in the current directory, checking PATH
2025-02-06 16:32:00: 0: INFO: Executable: “XXXXXXXXXXXXXXXXXXXXXXXX\Redshift\bin\redshiftCmdLine.exe”
2025-02-06 16:32:00: 0: INFO: The Worker is overriding the GPUs to render, so the following GPUs will be used: 0
2025-02-06 16:32:00: 0: INFO: Argument: “XXXXXXXXXXXXXXXXXXXXXXXX_shots_sh9020_beauty_beauty_v010.1023.rs”
2025-02-06 16:32:00: 0: INFO: Full Command: “XXXXXXXXXXXXXXXXXXXXXXXX\Redshift\bin\redshiftCmdLine.exe” “XXXXXXXXXXXXXXXXXXXXXXXX\sh9020_beauty_beauty_v010.1023.rs”
2025-02-06 16:32:00: 0: INFO: Startup Directory: “XXXXXXXXXXXXXXXXXXXXXXXX\Redshift\bin”
2025-02-06 16:32:00: 0: INFO: Process Priority: BelowNormal
2025-02-06 16:32:00: 0: INFO: Process Affinity: default
2025-02-06 16:32:00: 0: INFO: Process is now running
2025-02-06 16:32:01: 0: STDOUT: Redshift Command-Line Renderer (version 2025.2.2 - API: 202501)
2025-02-06 16:32:01: 0: STDOUT: Copyright 2024 MAXON Computer GmbH. All rights reserved.
2025-02-06 16:32:04: 0: STDOUT: No GPUs were selected in the command line, using selected compute devices from preferences.
2025-02-06 16:32:04: 0: STDOUT: Querying texture cache budget from REDSHIFT_TEXTURECACHEBUDGET: 120 GB
2025-02-06 16:32:04: 0: STDOUT: Querying cache path from preferences.xml: %LOCALAPPDATA%\Redshift\Cache
2025-02-06 16:32:04: 0: STDOUT: Creating cache path C:\Users\User\AppData\Local\Redshift\Cache
2025-02-06 16:32:04: 0: STDOUT: Enforcing shader cache budget…
2025-02-06 16:32:04: 0: STDOUT: Enforcing texture cache budget…
2025-02-06 16:32:04: 0: STDOUT: Collecting files…
2025-02-06 16:32:04: 0: STDOUT: Total size for 353 files 6007.75MB (budget 122880.00MB)
2025-02-06 16:32:04: 0: STDOUT: Under budget. Done.
2025-02-06 16:32:04: 0: STDOUT: Creating mesh cache…
2025-02-06 16:32:04: 0: STDOUT: Done
2025-02-06 16:32:04: 0: STDOUT: Overriding GPU devices due to REDSHIFT_GPUDEVICES (0)
2025-02-06 16:32:04: 0: STDOUT: Redshift Initialized
2025-02-06 16:32:04: 0: STDOUT: Version: 2025.2.2, Dec 20 2024 00:05:27 [94f6fb09]
2025-02-06 16:32:04: 0: STDOUT: Windows Platform (Windows 10 Pro)
2025-02-06 16:32:04: 0: STDOUT: Release Build
2025-02-06 16:32:04: 0: STDOUT: Number of CPU HW threads: 32
2025-02-06 16:32:04: 0: STDOUT: CPU speed: 2.99 GHz
2025-02-06 16:32:04: 0: STDOUT: Total system memory: 127.72 GB
2025-02-06 16:32:04: 0: STDOUT: TDR delay: 60s
2025-02-06 16:32:04: 0: STDOUT: Hardware-accelerated GPU scheduling not enabled
2025-02-06 16:32:04: 0: STDOUT: Driver version: [NVidia] 572.16
2025-02-06 16:32:04: 0: STDOUT: Current working dir: XXXXXXXXXXXXXXXXXXXXXXXX\Redshift\bin
2025-02-06 16:32:04: 0: STDOUT: redshift_LICENSE=5XXXXXXX
2025-02-06 16:32:04: 0: STDOUT: RLM License Search Path=C:\ProgramData\Redshift;C:\ProgramData\Maxon\RLM
2025-02-06 16:32:04: 0: STDOUT: License return timeout is disabled (license will be returned on shutdown)
2025-02-06 16:32:04: 0: STDOUT: Loading Redshift procedural extensions…
2025-02-06 16:32:04: 0: STDOUT: From path: XXXXXXXXXXXXXXXXXXXXXXXX\Redshift\Procedurals
2025-02-06 16:32:04: 0: STDOUT: Done!
2025-02-06 16:32:04: 0: STDOUT:
2025-02-06 16:32:04: 0: STDOUT: Preparing compute platforms
2025-02-06 16:32:04: 0: STDOUT: Found cuda compute library in XXXXXXXXXXXXXXXXXXXXXXXX\Redshift\bin\redshift-core-cuda-vc140.dll
2025-02-06 16:32:04: 0: STDOUT: Could not load the hip core library from XXXXXXXXXXXXXXXXXXXXXXXX\Redshift\bin\redshift-core-hip-vc140.dll
2025-02-06 16:32:04: 0: STDOUT: error: The specified module could not be found.
2025-02-06 16:32:04: 0: STDOUT: Found cpu compute library in XXXXXXXXXXXXXXXXXXXXXXXX\Redshift\bin\redshift-core-cpu-vc140.dll
2025-02-06 16:32:04: 0: STDOUT: Done
2025-02-06 16:32:04: 0: STDOUT: Creating CUDA contexts
2025-02-06 16:32:04: 0: STDOUT: CUDA init ok
2025-02-06 16:32:04: 0: STDOUT: Ordinals: { 0 }
2025-02-06 16:32:05: 0: STDOUT: Initing NVAPI and querying info…
2025-02-06 16:32:05: 0: STDOUT: Done
2025-02-06 16:32:05: 0: STDOUT: Initializing GPUComputing module (CUDA). Active device 0
2025-02-06 16:32:05: 0: STDOUT: CUDA Driver Version: 12080
2025-02-06 16:32:05: 0: STDOUT: CUDA API Version: 12040
2025-02-06 16:32:05: 0: STDOUT: Device 1/4 : NVIDIA GeForce RTX 3060
2025-02-06 16:32:05: 0: STDOUT: Compute capability: 8.6
2025-02-06 16:32:05: 0: STDOUT: Num multiprocessors: 28
2025-02-06 16:32:05: 0: STDOUT: PCI busID: 1, deviceID: 0, domainID: 0
2025-02-06 16:32:05: 0: STDOUT: Theoretical memory bandwidth: 360.048004 GB/Sec
2025-02-06 16:32:05: 0: STDOUT: Measured PCIe bandwidth (pinned CPU->GPU): 11.953486 GB/s
2025-02-06 16:32:05: 0: STDOUT: Measured PCIe bandwidth (pinned GPU->CPU): 12.152853 GB/s
2025-02-06 16:32:05: 0: STDOUT: Measured PCIe bandwidth (paged CPU->GPU): 7.491826 GB/s
2025-02-06 16:32:05: 0: STDOUT: Measured PCIe bandwidth (paged GPU->CPU): 6.590516 GB/s
2025-02-06 16:32:05: 0: STDOUT: Estimated GPU->CPU latency (0): 0.043551 ms
2025-02-06 16:32:05: 0: STDOUT: Estimated GPU->CPU latency (1): 0.044919 ms
2025-02-06 16:32:05: 0: STDOUT: Estimated GPU->CPU latency (2): 0.045080 ms
2025-02-06 16:32:05: 0: STDOUT: Estimated GPU->CPU latency (3): 0.045022 ms
2025-02-06 16:32:05: 0: STDOUT: New CUDA context created
2025-02-06 16:32:05: 0: STDOUT: Available memory: 11872.1094 MB out of 12288.0000 MB
2025-02-06 16:32:05: 0: STDOUT: Determining peer-to-peer capability (NVLink or PCIe)
2025-02-06 16:32:05: 0: STDOUT: Done
2025-02-06 16:32:05: 0: STDOUT: PostFX: Initialized
2025-02-06 16:32:06: 0: STDOUT: OptiX denoiser init…
2025-02-06 16:32:06: 0: STDOUT: Selecting device
2025-02-06 16:32:06: 0: STDOUT: Selected device NVIDIA GeForce RTX 3060 (ordinal 0)
2025-02-06 16:32:06: 0: STDOUT: OIDN Init…
2025-02-06 16:32:06: 0: STDOUT: OIDN: Using Device NVIDIA GeForce RTX 3060
2025-02-06 16:32:06: 0: STDOUT: OptixRT init…
2025-02-06 16:32:06: 0: STDOUT: Load/set programs
2025-02-06 16:32:06: 0: STDOUT: Ok!
2025-02-06 16:32:06: 0: STDOUT: Loading: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX_sh9020_beauty_beauty_v010.1023.rs
2025-02-06 16:32:07: 0: STDOUT: License acquired
2025-02-06 16:32:07: 0: STDOUT: License for net.maxon.license.app.redshift~commercial valid until Dec 09 2025
2025-02-06 16:32:07: 0: STDOUT: =================================================================================================
2025-02-06 16:32:07: 0: STDOUT: Rendering frame 1023…
2025-02-06 16:32:07: 0: STDOUT: AMM enabled
2025-02-06 16:32:07: 0: STDOUT: =================================================================================================
2025-02-06 16:32:07: 0: STDOUT: 0ms
2025-02-06 16:32:07: 0: STDOUT: Loading OCIO config using C:\PROGRA~1\SIDEEF~1\HOUDIN~1.445\packages\ocio\houdini-config-v2.1.0_aces-v1.3_ocio-v2.3.ocio
2025-02-06 16:32:07: 0: STDOUT: Could not find OCIO config. Using Redshift’s default instead
2025-02-06 16:32:07: 0: STDOUT: Full path: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX.config.ocio
2025-02-06 16:32:07: 0: STDOUT: Ok
2025-02-06 16:32:07: 0: STDOUT: Creating OCIO processors between rendering and view/srgb spaces
2025-02-06 16:32:07: 0: STDOUT: Rendering space: ACEScg
2025-02-06 16:32:07: 0: STDOUT: Display: sRGB - Display
2025-02-06 16:32:07: 0: STDOUT: View: Un-tone-mapped
2025-02-06 16:32:07: 0: STDOUT: Failed to get ACEScg → Un-tone-mapped (sRGB - Display) processor from OpenColorIO config
2025-02-06 16:32:07: 0: STDOUT: ocio exception: DisplayViewTransform error. Display ‘sRGB - Display’ not found.
2025-02-06 16:32:07: 0: STDOUT: Failed to create rendering to display/view processor! Aborting render!
2025-02-06 16:32:07: 0: STDOUT: Failed to get ACEScg → Un-tone-mapped (sRGB - Display) processor from OpenColorIO config
2025-02-06 16:32:07: 0: STDOUT: ocio exception: DisplayViewTransform error. Display ‘sRGB - Display’ not found.
2025-02-06 16:32:07: 0: STDOUT: Failed to create display/view to rendering processor for RT! Aborting render!
2025-02-06 16:32:07: 0: STDOUT: Loading OCIO color space transforms for texture sampling
2025-02-06 16:32:07: 0: STDOUT: Found a suitable sRGB color space: “sRGB”
2025-02-06 16:32:07: 0: STDOUT: Found a suitable sRGB-linear color space: “scene-linear Rec.709-sRGB”
2025-02-06 16:32:07: 0: STDOUT: Failed to get ACEScg → Un-tone-mapped (sRGB - Display) processor from OpenColorIO config
2025-02-06 16:32:07: 0: STDOUT: ocio exception: DisplayViewTransform error. Display ‘sRGB - Display’ not found.
2025-02-06 16:32:08: 0: STDOUT: PostFX: Shut down
2025-02-06 16:32:08: 0: STDOUT: Shutdown GPU Devices…
2025-02-06 16:32:08: 0: STDOUT: Devices shut down ok
2025-02-06 16:32:08: 0: STDOUT: Shutdown Rendering Sub-Systems…
2025-02-06 16:32:08: 0: STDOUT: License returned
2025-02-06 16:32:08: 0: STDOUT: Finished Shutting down Rendering Sub-Systems
2025-02-06 16:32:08: 0: INFO: Process exit code: 1
2025-02-06 16:32:08: 0: Done executing plugin command of type ‘Render Task’
2025-02-06 16:32:08: 0: Executing plugin command of type ‘End Job’
2025-02-06 16:32:08: 0: Done executing plugin command of type ‘End Job’
2025-02-06 16:32:12: Scheduler Thread - Render Thread 0 threw a major error:
2025-02-06 16:32:12: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2025-02-06 16:32:12: Exception Details
2025-02-06 16:32:12: RenderPluginException – Error: Renderer returned non-zero error code, 1. Check the log for more information.
2025-02-06 16:32:12: at Deadline.Plugins.PluginWrapper.RenderTasks(Task task, String& outMessage, AbortLevel& abortLevel)
2025-02-06 16:32:12: RenderPluginException.Cause: JobError (2)
2025-02-06 16:32:12: RenderPluginException.Level: Major (1)
2025-02-06 16:32:12: RenderPluginException.HasSlaveLog: True
2025-02-06 16:32:12: RenderPluginException.SlaveLogFileName: C:\ProgramData\Thinkbox\Deadline10\logs\deadlineslave_renderthread_0-XXX-0000.log
2025-02-06 16:32:12: Exception.TargetSite: Deadline.Slaves.Messaging.PluginResponseMemento d(Deadline.Net.DeadlineMessage, System.Threading.CancellationToken)
2025-02-06 16:32:12: Exception.Data: ( )
2025-02-06 16:32:12: Exception.Source: deadline
2025-02-06 16:32:12: Exception.HResult: -2146233088
2025-02-06 16:32:12: Exception.StackTrace:
2025-02-06 16:32:12: at Deadline.Plugins.SandboxedPlugin.d(DeadlineMessage bgq, CancellationToken bgr)
2025-02-06 16:32:12: at Deadline.Plugins.SandboxedPlugin.RenderTask(Task task, CancellationToken cancellationToken)
2025-02-06 16:32:12: at Deadline.Slaves.SlaveRenderThread.c(TaskLogWriter ajv, CancellationToken ajw)
2025-02-06 16:32:12: <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
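In this log, redshiftCmdLine.exe does not appear to crash on the GPU. It reports "Could not find OCIO config", then "Display ‘sRGB - Display’ not found", then "Aborting render!", and exits with code 1, which Deadline turns into the RenderPluginException. To compare failing and succeeding workers quickly, here is a minimal Python sketch that scans a saved worker log for exactly those lines. It is not part of Deadline or Redshift. The function names `scan_log` and `summarize` are hypothetical, and the patterns are copied from this log, so other Redshift versions may word the messages differently.

```python
import re
from pathlib import Path

# Patterns copied from the Redshift/Deadline lines in the log above (assumption:
# other versions may phrase them differently). Both typographic and straight
# quotes are accepted around the display name.
OCIO_CONFIG_MISSING = re.compile(r"Could not find OCIO config")
OCIO_DISPLAY_MISSING = re.compile(r"Display [‘'](?P<display>[^’']+)[’'] not found")
RENDER_ABORTED = re.compile(r"Aborting render!")
EXIT_CODE = re.compile(r"Process exit code: (?P<code>-?\d+)")


def scan_log(lines):
    """Hypothetical helper: collect OCIO-related failures from log lines."""
    findings = {
        "ocio_config_missing": False,
        "missing_displays": set(),
        "aborts": 0,
        "exit_code": None,
    }
    for line in lines:
        if OCIO_CONFIG_MISSING.search(line):
            findings["ocio_config_missing"] = True
        match = OCIO_DISPLAY_MISSING.search(line)
        if match:
            findings["missing_displays"].add(match.group("display"))
        if RENDER_ABORTED.search(line):
            findings["aborts"] += 1
        match = EXIT_CODE.search(line)
        if match:
            # Keep the last exit code seen in the log.
            findings["exit_code"] = int(match.group("code"))
    return findings


def summarize(path):
    """Hypothetical helper: read a log file and return a one-line summary."""
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    f = scan_log(text.splitlines())
    return (
        f"OCIO config missing: {f['ocio_config_missing']}, "
        f"missing displays: {sorted(f['missing_displays'])}, "
        f"aborts: {f['aborts']}, exit code: {f['exit_code']}"
    )
```

Usage, assuming the render thread log named in the exception has been copied locally: `print(summarize("deadlineslave_renderthread_0-XXX-0000.log"))`. If the failing workers show the missing OCIO config while the succeeding ones do not, the difference is more likely the OCIO config path (here under the Houdini install folder) than GPU load.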