AWS Thinkbox Discussion Forums

Redshift job failure - all the servers at the same time

On our render farm we have a big issue with some frames of a Deadline job. We have tried everything, but the result is always the same: the job fails on all the servers at the same time.
We updated Redshift, the NVIDIA driver, everything, but the result is the same. This is the error we get:

Reason

Task rendering stopped due to external cancellation

=======================================================
Log

2020-09-21 10:40:19: 0: Loading Job’s Plugin timeout is Disabled
2020-09-21 10:40:22: 0: Executing plugin command of type ‘Sync Files for Job’
2020-09-21 10:40:22: 0: All job files are already synchronized
2020-09-21 10:40:22: 0: Plugin Redshift was already synchronized.
2020-09-21 10:40:22: 0: Done executing plugin command of type ‘Sync Files for Job’
2020-09-21 10:40:22: 0: Executing plugin command of type ‘Initialize Plugin’
2020-09-21 10:40:23: 0: INFO: Executing plugin script ‘C:\Users\cyan\AppData\Local\Thinkbox\Deadline10\slave\Server1-3\plugins\5f64f4532ddc462420960860\Redshift.py’
2020-09-21 10:40:23: 0: INFO: About: Redshift Plugin for Deadline
2020-09-21 10:40:23: 0: INFO: Render Job As User disabled, running as current user ‘cyan’
2020-09-21 10:40:23: 0: INFO: The job’s environment will be merged with the current environment before rendering
2020-09-21 10:40:23: 0: Done executing plugin command of type ‘Initialize Plugin’
2020-09-21 10:40:23: 0: Start Job timeout is disabled.
2020-09-21 10:40:23: 0: Task timeout is disabled.
2020-09-21 10:40:23: 0: Loaded job: HAND_SCENA_DOWN_MID_setting (5f64f4532ddc462420960860)
2020-09-21 10:40:23: 0: Executing plugin command of type ‘Start Job’
2020-09-21 10:40:23: 0: INFO: Executing global asset transfer preload script ‘C:\Users\cyan\AppData\Local\Thinkbox\Deadline10\slave\Server1-3\plugins\5f64f4532ddc462420960860\GlobalAssetTransferPreLoad.py’
2020-09-21 10:40:23: 0: INFO: Looking for AWS Portal File Transfer…
2020-09-21 10:40:23: 0: INFO: Looking for File Transfer controller in C:/Program Files/Thinkbox/S3BackedCache/bin/task.py…
2020-09-21 10:40:23: 0: INFO: Could not find AWS Portal File Transfer.
2020-09-21 10:40:23: 0: INFO: AWS Portal File Transfer is not installed on the system.
2020-09-21 10:40:23: 0: INFO: Executing global job preload script ‘C:\Users\cyan\AppData\Local\Thinkbox\Deadline10\slave\Server1-3\plugins\5f64f4532ddc462420960860\GlobalJobPreLoad.py’
2020-09-21 10:40:23: 0: Done executing plugin command of type ‘Start Job’
2020-09-21 10:40:23: 0: Plugin rendering frame(s): 299
2020-09-21 10:40:23: 0: Executing plugin command of type ‘Render Task’
2020-09-21 10:40:23: 0: INFO: Stdout Redirection Enabled: True
2020-09-21 10:40:23: 0: INFO: Stdout Handling Enabled: True
2020-09-21 10:40:23: 0: INFO: Popup Handling Enabled: True
2020-09-21 10:40:23: 0: INFO: QT Popup Handling Enabled: False
2020-09-21 10:40:23: 0: INFO: WindowsForms10.Window.8.app.* Popup Handling Enabled: False
2020-09-21 10:40:23: 0: INFO: Using Process Tree: True
2020-09-21 10:40:23: 0: INFO: Hiding DOS Window: True
2020-09-21 10:40:23: 0: INFO: Creating New Console: False
2020-09-21 10:40:23: 0: INFO: Running as user: cyan
2020-09-21 10:40:23: 0: INFO: Executable: “C:/ProgramData/Redshift/bin/redshiftCmdLine.exe”
2020-09-21 10:40:23: 0: INFO: The Worker is overriding the GPUs to render, so the following GPUs will be used: 5,6
2020-09-21 10:40:23: 0: INFO: Argument: “R:\Redshift_PROXY\Argetum_GIN\Scena_Sotto\scena_sotto_quality_medium.0299.rs” -oip “G:\3-Projects\44-Gassenhof_Argintum\MayaFiles\images\ENV_DOWN_midSetting” -gpu 5 -gpu 6
2020-09-21 10:40:23: 0: INFO: Full Command: “C:/ProgramData/Redshift/bin/redshiftCmdLine.exe” “R:\Redshift_PROXY\Argetum_GIN\Scena_Sotto\scena_sotto_quality_medium.0299.rs” -oip “G:\3-Projects\44-Gassenhof_Argintum\MayaFiles\images\ENV_DOWN_midSetting” -gpu 5 -gpu 6
2020-09-21 10:40:23: 0: INFO: Startup Directory: “C:\ProgramData\Redshift\bin”
2020-09-21 10:40:23: 0: INFO: Process Priority: BelowNormal
2020-09-21 10:40:23: 0: INFO: Process Affinity: default
2020-09-21 10:40:23: 0: INFO: Process is now running
2020-09-21 10:40:28: 0: STDOUT: Redshift Command-Line Renderer (version 3.0.28 - API: 3023)
2020-09-21 10:40:28: 0: STDOUT: Copyright 2020 Redshift Rendering Technologies
2020-09-21 10:40:28: 0: STDOUT: Querying texture cache buget from preferences.xml: 32 GB
2020-09-21 10:40:28: 0: STDOUT: Querying cache path from preferences.xml: %LOCALAPPDATA%\Redshift\Cache
2020-09-21 10:40:28: 0: STDOUT: Invalid GPU ID 6 specified. Ignoring
2020-09-21 10:40:28: 0: STDOUT: Creating cache path C:\Users\cyan\AppData\Local\Redshift\Cache
2020-09-21 10:40:28: 0: STDOUT: Enforcing shader cache budget…
2020-09-21 10:40:28: 0: STDOUT: Enforcing texture cache budget…
2020-09-21 10:40:28: 0: STDOUT: Collecting files…
2020-09-21 10:40:28: 0: STDOUT: Total size for 1103 files 32761.93MB (budget 32768.00MB)
2020-09-21 10:40:28: 0: STDOUT: Under budget. Done.
2020-09-21 10:40:28: 0: STDOUT: Creating mesh cache…
2020-09-21 10:40:28: 0: STDOUT: Done
2020-09-21 10:40:28: 0: STDOUT: Redshift Initialized
2020-09-21 10:40:28: 0: STDOUT: Version: 3.0.28, Sep 9 2020
2020-09-21 10:40:28: 0: STDOUT: Windows Platform (Windows 10 Pro)
2020-09-21 10:40:28: 0: STDOUT: Release Build
2020-09-21 10:40:28: 0: STDOUT: Number of CPU HW threads: 48
2020-09-21 10:40:28: 0: STDOUT: CPU speed: 2.30 GHz
2020-09-21 10:40:28: 0: STDOUT: Total system memory: 127.65 GB
2020-09-21 10:40:28: 0: STDOUT: TDR delay: 60s
2020-09-21 10:40:28: 0: STDOUT: Driver version: 456.38
2020-09-21 10:40:28: 0: STDOUT: Current working dir: C:\ProgramData\Redshift\bin
2020-09-21 10:40:28: 0: STDOUT: Creating CUDA contexts
2020-09-21 10:40:28: 0: STDOUT: CUDA init ok
2020-09-21 10:40:28: 0: STDOUT: Ordinals: { 5 }
2020-09-21 10:40:29: 0: STDOUT: Initializing GPUComputing module (CUDA). Ordinal 5
2020-09-21 10:40:29: 0: STDOUT: CUDA Driver Version: 11010
2020-09-21 10:40:29: 0: STDOUT: CUDA API Version: 11000
2020-09-21 10:40:29: 0: STDOUT: Device 6/6 : GeForce RTX 2080 Ti
2020-09-21 10:40:29: 0: STDOUT: Compute capability: 7.5
2020-09-21 10:40:29: 0: STDOUT: Num multiprocessors: 68
2020-09-21 10:40:29: 0: STDOUT: PCI busID: 178, deviceID: 0, domainID: 0
2020-09-21 10:40:29: 0: STDOUT: Theoretical memory bandwidth: 616.000000 GB/Sec
2020-09-21 10:40:29: 0: STDOUT: Measured PCIe bandwidth (pinned CPU->GPU): 10.818015 GB/s
2020-09-21 10:40:29: 0: STDOUT: Measured PCIe bandwidth (pinned GPU->CPU): 11.842785 GB/s
2020-09-21 10:40:29: 0: STDOUT: Measured PCIe bandwidth (paged CPU->GPU): 2.840157 GB/s
2020-09-21 10:40:29: 0: STDOUT: Measured PCIe bandwidth (paged GPU->CPU): 3.144429 GB/s
2020-09-21 10:40:29: 0: STDOUT: Estimated GPU->CPU latency (0): 0.070238 ms
2020-09-21 10:40:29: 0: STDOUT: Estimated GPU->CPU latency (1): 0.060496 ms
2020-09-21 10:40:29: 0: STDOUT: Estimated GPU->CPU latency (2): 0.055275 ms
2020-09-21 10:40:29: 0: STDOUT: Estimated GPU->CPU latency (3): 0.056098 ms
2020-09-21 10:40:29: 0: STDOUT: New CUDA context created
2020-09-21 10:40:29: 0: STDOUT: Available memory: 9995.6250 MB out of 11264.0000 MB
2020-09-21 10:40:29: 0: STDOUT: Determining peer-to-peer capability (NVLink or PCIe)
2020-09-21 10:40:29: 0: STDOUT: Done
2020-09-21 10:40:29: 0: STDOUT: OptiX denoiser init…
2020-09-21 10:40:29: 0: STDOUT: Selecting device
2020-09-21 10:40:29: 0: STDOUT: Selected device GeForce RTX 2080 Ti (ordinal 0)
2020-09-21 10:40:30: 0: STDOUT: OptixRT init…
2020-09-21 10:40:30: 0: STDOUT: Load/set programs
2020-09-21 10:40:30: 0: STDOUT: Ok!
2020-09-21 10:40:30: 0: STDOUT: Loading Redshift procedural extensions…
2020-09-21 10:40:30: 0: STDOUT: From path: C:\ProgramData\Redshift\Procedurals
2020-09-21 10:40:30: 0: STDOUT: Done!
2020-09-21 10:40:30: 0: STDOUT: Loading: R:\Redshift_PROXY\Argetum_GIN\Scena_Sotto\scena_sotto_quality_medium.0299.rs
2020-09-21 10:40:31: 0: STDOUT: =================================================================================================
2020-09-21 10:40:31: 0: STDOUT: Rendering frame 299…
2020-09-21 10:40:31: 0: STDOUT: AMM enabled
2020-09-21 10:40:31: 0: STDOUT: =================================================================================================
2020-09-21 10:40:31: 0: STDOUT: License acquired
2020-09-21 10:40:31: 0: STDOUT: License for redshift-core 2020.11 (permanent)
2020-09-21 10:40:31: 0: STDOUT: 67ms
2020-09-21 10:40:31: 0: STDOUT: Preparing ray tracing hierarchy for meshes
2020-09-21 10:40:31: 0: STDOUT: Time to process 0 meshes: 0ms
2020-09-21 10:40:32: 0: STDOUT: HID rehost=1f2bdd4626cad2c613ef80e54bf1e75a4fa66fb3.0
2020-09-21 10:40:32: 0: STDOUT: Processing 13 textures
2020-09-21 10:40:32: 0: STDOUT: Time to process textures: 0.187233 seconds
2020-09-21 10:40:32: 0: STDOUT: Preparing materials and shaders
2020-09-21 10:40:33: 0: STDOUT: Time to process all materials and shaders: 1.225720 seconds
2020-09-21 10:40:33: 0: STDOUT: Allocating GPU mem…(device 0)
2020-09-21 10:40:34: 0: STDOUT: Done (Allocator size: 9232 MB. CUDA reported free mem before: 9504 MB, after: 272 MB)
2020-09-21 10:40:34: 0: STDOUT: Allocating GPU mem for ray tracing hierarchy processing
2020-09-21 10:40:34: 0: STDOUT: Allocating VRAM for device 0 (GeForce RTX 2080 Ti)
2020-09-21 10:40:34: 0: STDOUT: Redshift can use up to 9232 MB
2020-09-21 10:40:34: 0: STDOUT: Fixed: 0 MB
2020-09-21 10:40:34: 0: STDOUT: Geo: 8214 MB, Tex: 255 MB, Rays: 761 MB, NRPR: 262144
2020-09-21 10:40:34: 0: STDOUT: Done! ( 23ms). Compute API reported free mem: 272 MB
2020-09-21 10:40:34: 0: STDOUT: Ray Tracing Hierarchy Info:
2020-09-21 10:40:34: 0: STDOUT: Max depth: 128. MaxNumLeafPrimitives: 8
2020-09-21 10:40:34: 0: STDOUT: Extents: (-70.060989 -72.444077 -115.194969) - (81.009636 30.083845 76.954109)
2020-09-21 10:40:45: 0: STDOUT: Time to create tree: 11558 ms (1 6204 5352)
2020-09-21 10:40:45: 0: STDOUT: Rendering blocks… (resolution: 1920x1080, block size: 256, unified minmax: [16,8192])
2020-09-21 10:40:45: 0: STDOUT: Allocating VRAM for device 0 (GeForce RTX 2080 Ti)
2020-09-21 10:40:45: 0: STDOUT: Redshift can use up to 9232 MB
2020-09-21 10:40:45: 0: STDOUT: Fixed: 1 MB
2020-09-21 10:40:45: 0: STDOUT: Geo: 433 MB, Tex: 1352 MB, Rays: 5281 MB, NRPR: 1390528
2020-09-21 10:40:45: 0: STDOUT: Done! ( 31ms). Compute API reported free mem: 272 MB
2020-09-21 10:49:55: 0: STDOUT: Block 1/40 (3,1) rendered by GPU 0 in 549733ms
2020-09-21 10:51:14: 0: STDOUT: Allocated emergency memory for deep recursion!
2020-09-21 10:55:25: 0: Executing plugin command of type ‘Cancel Task’
2020-09-21 10:55:25: 0: Done executing plugin command of type ‘Cancel Task’
2020-09-21 10:55:25: 0: Done executing plugin command of type ‘Render Task’
2020-09-21 10:55:25: 0: In the process of canceling current task: ignoring exception thrown by PluginLoader

=======================================================
Details

Date: 2020/09/21 10:53:51
Frames: 299
Elapsed Time: 00:00:15:09
Slave Name: Server1-3

How many cards are in the machine and how are they split between the slaves?

There are 7 cards, split between the slaves with GPU affinity…
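
For what it's worth, the log above shows the Worker passing `-gpu 5 -gpu 6` and Redshift answering `Invalid GPU ID 6 specified. Ignoring`, so this task only rendered on one card. Here is a rough Python sketch for checking that on a saved log. The function name `gpu_ids_in_log` is my own, and the sample lines are shortened copies of the ones in the log (paths trimmed), so treat it as an illustration rather than anything official from Deadline or Redshift.

```python
import re

def gpu_ids_in_log(log_text):
    """Return (requested, rejected) GPU ordinals found in a Deadline/Redshift log."""
    requested, rejected = [], []
    for line in log_text.splitlines():
        if "Full Command:" in line:
            # -gpu N flags that the Worker passed to redshiftCmdLine
            requested = [int(n) for n in re.findall(r"-gpu\s+(\d+)", line)]
        m = re.search(r"Invalid GPU ID (\d+) specified", line)
        if m:
            rejected.append(int(m.group(1)))
    return requested, rejected

# Shortened sample lines (paths trimmed for readability)
sample = """\
2020-09-21 10:40:23: 0: INFO: Full Command: "redshiftCmdLine.exe" "scene.0299.rs" -oip "out" -gpu 5 -gpu 6
2020-09-21 10:40:28: 0: STDOUT: Invalid GPU ID 6 specified. Ignoring
"""

requested, rejected = gpu_ids_in_log(sample)
usable = [g for g in requested if g not in rejected]
print("requested:", requested, "rejected:", rejected, "usable:", usable)
```

If `usable` comes out shorter than `requested` on a Worker, the affinity setting for that Worker is probably pointing at an ordinal Redshift cannot see.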

Houdini + Redshift
I have the same issue… one half of a job rendered OK but the other half failed…
What the hell is this external cancellation?
Everything worked fine until now.
I'm sure my scene is OK.
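
One thing that can be read from the log in the first post: the `Cancel Task` command runs at 10:55:25, about four minutes after Redshift's last output at 10:51:14, which suggests the renderer did not exit on its own and the cancel came from the Deadline side. The log does not say what requested it. A minimal sketch to measure that gap in a saved log, assuming the timestamp format shown above (the function name `silence_before_cancel` is made up for this example):

```python
import re
from datetime import datetime

# Matches the "YYYY-MM-DD HH:MM:SS: " prefix used in the log above
STAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}): ")

def silence_before_cancel(log_text):
    """Seconds between the last STDOUT line and the first 'Cancel Task' line, or None."""
    last_stdout = None
    for line in log_text.splitlines():
        m = STAMP.match(line)
        if not m:
            continue
        when = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
        if "STDOUT:" in line:
            last_stdout = when
        elif "Cancel Task" in line and last_stdout is not None:
            return (when - last_stdout).total_seconds()
    return None

sample = """\
2020-09-21 10:51:14: 0: STDOUT: Allocated emergency memory for deep recursion!
2020-09-21 10:55:25: 0: Executing plugin command of type 'Cancel Task'
"""

gap = silence_before_cancel(sample)
print(gap)
```

A long silence before the cancel points at a stalled or very slow render, while a cancel right after fresh output points more at something external stopping a healthy task.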

Can you try disabling GPU affinity, just as a test?

I don't like this line :)
2020-09-21 10:51:14: 0: STDOUT: Allocated emergency memory for deep recursion!
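
Since the job fails on all servers at once, it may be worth checking whether every Worker prints that same line before the cancel. A small sketch, assuming you have saved each Worker's log as text; the helper name `workers_with_message` and the second Worker `Server1-4` with its log line are invented for the example, only `Server1-3` comes from the report above.

```python
def workers_with_message(logs, needle):
    """Map worker name -> timestamps of STDOUT lines containing `needle`."""
    found = {}
    for worker, text in logs.items():
        stamps = [
            line[:19]  # "YYYY-MM-DD HH:MM:SS" prefix of each log line
            for line in text.splitlines()
            if "STDOUT:" in line and needle in line
        ]
        if stamps:
            found[worker] = stamps
    return found

logs = {
    # Line copied from the report above
    "Server1-3": "2020-09-21 10:51:14: 0: STDOUT: Allocated emergency memory for deep recursion!\n",
    # Hypothetical second Worker, made up for illustration
    "Server1-4": "2020-09-21 10:51:20: 0: STDOUT: Block 1/40 (3,1) rendered by GPU 0 in 549733ms\n",
}

hits = workers_with_message(logs, "Allocated emergency memory")
print(hits)
```

If the message shows up on every Worker for the same frames, the scene is the more likely suspect; if only some Workers show it, compare their GPU setup first.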
