AWS Thinkbox Discussion Forums

Maya Redshift Crash

Hello,

I'm running into an issue where Redshift for Maya crashes on the farm.

I created a sample scene in Maya that contains only a cube and a Redshift light. I have tried submitting it with the MayaBatch and MayaCmd plugins, and now even the Command Line plugin. I am running the deadlinelauncher service as root on the same machine. The Repository uses mostly default settings, so "Render Job As User" and similar options are not enabled.

Right now I'm simply trying to run a command like this one, which works fine when I enter it in a terminal:

/[...]/bin/Render -r file  -s 1001 -e 1001 -b 1 -rd "/[...]/renders" "/[...]scene_v001.ma"

When I submit it as a Command Line plugin job, I get this error on the farm:

=======================================================
Error
=======================================================
Error: Process returned non-zero exit code '1'
   at Deadline.Plugins.PluginWrapper.RenderTasks(Task task, String& outMessage, AbortLevel& abortLevel)

=======================================================
Type
=======================================================
RenderPluginException

=======================================================
Stack Trace
=======================================================
   at Deadline.Plugins.SandboxedPlugin.d(DeadlineMessage bgx, CancellationToken bgy)
   at Deadline.Plugins.SandboxedPlugin.RenderTask(Task task, CancellationToken cancellationToken)
   at Deadline.Slaves.SlaveRenderThread.c(TaskLogWriter akc, CancellationToken akd)

=======================================================
Log
=======================================================
2024-05-08 22:24:51:  0: Loading Job's Plugin timeout is Disabled
2024-05-08 22:24:51:  0: SandboxedPlugin: Render Job As User disabled, running as current user 'root'
2024-05-08 22:24:53:  0: Executing plugin command of type 'Initialize Plugin'
2024-05-08 22:24:53:  0: INFO: Executing plugin script '/var/lib/Thinkbox/Deadline10/workers/la-cg06/plugins/663c5838232b9a682ee325d8/CommandLine.py'
2024-05-08 22:24:53:  0: INFO: Plugin execution sandbox using Python version 3
2024-05-08 22:24:53:  0: INFO: Single Frames Only: False
2024-05-08 22:24:53:  0: INFO: About: Command Line Plugin for Deadline
2024-05-08 22:24:53:  0: INFO: The job's environment will be merged with the current environment before rendering
2024-05-08 22:24:53:  0: Done executing plugin command of type 'Initialize Plugin'
2024-05-08 22:24:53:  0: Start Job timeout is disabled.
2024-05-08 22:24:53:  0: Task timeout is disabled.
2024-05-08 22:24:53:  0: Loaded job: Untitled (663c5838232b9a682ee325d8)
2024-05-08 22:24:53:  0: Executing plugin command of type 'Start Job'
2024-05-08 22:24:53:  0: DEBUG: S3BackedCache Client is not installed.
2024-05-08 22:24:53:  0: INFO: Executing global asset transfer preload script '/var/lib/Thinkbox/Deadline10/workers/la-cg06/plugins/663c5838232b9a682ee325d8/GlobalAssetTransferPreLoad.py'
2024-05-08 22:24:54:  0: INFO: Looking for legacy (pre-10.0.26) AWS Portal File Transfer...
2024-05-08 22:24:54:  0: INFO: Looking for legacy (pre-10.0.26) File Transfer controller in /opt/Thinkbox/S3BackedCache/bin/task.py...
2024-05-08 22:24:54:  0: INFO: Could not find legacy (pre-10.0.26) AWS Portal File Transfer.
2024-05-08 22:24:54:  0: INFO: Legacy (pre-10.0.26) AWS Portal File Transfer is not installed on the system.
2024-05-08 22:24:54:  0: Done executing plugin command of type 'Start Job'
2024-05-08 22:24:54:  0: Plugin rendering frame(s): 1
2024-05-08 22:24:54:  0: Executing plugin command of type 'Render Task'
2024-05-08 22:24:54:  0: INFO: Executable: /tools/development/ayon/suites/prod/bin/Render
2024-05-08 22:24:54:  0: INFO: Arguments: -r file  -s 1001 -e 1001 -b 1 -rd "/mnt/burnside/ayon/24_0002_project/aaa/aaa0020/work/lighting/renders/maya/24_0002_aaa0020_workfileLighting_v002/Main" "/mnt/burnside/ayon/24_0002_project/aaa/aaa0020/work/lighting/24_0002_aaa0020_lighting_v003.ma"
2024-05-08 22:24:54:  0: INFO: Execute in Shell: False
2024-05-08 22:24:54:  0: INFO: Invoking: Run Process
2024-05-08 22:24:54:  0: STDOUT: Starting "/tools/maya/2024.2/maya/bin/maya"
2024-05-08 22:24:55:  0: STDOUT: QStandardPaths: XDG_RUNTIME_DIR not set, defaulting to '/var/tmp/runtime-root'
2024-05-08 22:24:55:  0: STDOUT: This plugin does not support createPlatformOpenGLContext!
2024-05-08 22:24:56:  0: STDOUT: [Redshift] Redshift for Maya 2024
2024-05-08 22:24:56:  0: STDOUT: [Redshift] Version 3.5.22, Dec  7 2023
2024-05-08 22:24:57:  0: STDOUT: Installing Redshift MEL overrides...
2024-05-08 22:24:57:  0: STDOUT: 	Sourcing /tools/redshift/3.5.22/redshift/redshift4maya/common/scripts/override/2024/MLdeleteUnused.mel
2024-05-08 22:24:57:  0: STDOUT: 	Sourcing /tools/redshift/3.5.22/redshift/redshift4maya/common/scripts/override/2024/connectNodeToAttrOverride.mel
2024-05-08 22:24:57:  0: STDOUT: 	Sourcing /tools/redshift/3.5.22/redshift/redshift4maya/common/scripts/override/2024/createMayaSoftwareCommonGlobalsTab.mel
2024-05-08 22:24:57:  0: STDOUT: 	Sourcing /tools/redshift/3.5.22/redshift/redshift4maya/common/scripts/override/2024/createRenderNode.mel
2024-05-08 22:24:57:  0: STDOUT: 	Sourcing /tools/redshift/3.5.22/redshift/redshift4maya/common/scripts/override/2024/mayaBatchRender.mel
2024-05-08 22:24:57:  0: STDOUT: 	Sourcing /tools/redshift/3.5.22/redshift/redshift4maya/common/scripts/override/2024/relationshipEditor.mel
2024-05-08 22:24:57:  0: STDOUT: 	Sourcing /tools/redshift/3.5.22/redshift/redshift4maya/common/scripts/override/2024/renderWindowPanel.mel
2024-05-08 22:24:57:  0: STDOUT: 	Sourcing /tools/redshift/3.5.22/redshift/redshift4maya/common/scripts/override/2024/renderWithCurrentRenderer.mel
2024-05-08 22:24:57:  0: STDOUT: Warning: file: /tools/maya/2024.2/maya/scripts/others/registerPluginResource.mel line 81: Plug-in "mayaUsdPlugin" resource identifier "kUsdStage" already registered, it will be replaced
2024-05-08 22:24:57:  0: STDOUT: Warning: line 1: displayRGBColor is unavailable in batch mode
2024-05-08 22:24:57:  0: STDOUT: File read in  1.2 seconds.
2024-05-08 22:24:57:  0: STDOUT: Result: /mnt/burnside/ayon/24_0002_project/aaa/aaa0020/work/lighting/24_0002_aaa0020_lighting_v003.ma
2024-05-08 22:24:58:  0: STDOUT: [Redshift] Initializing render thread...
2024-05-08 22:24:58:  0: STDOUT: [Redshift] Initializing Redshift...
2024-05-08 22:24:58:  0: STDOUT: [Redshift] Querying texture cache budget from preferences.xml: 32 GB
2024-05-08 22:24:58:  0: STDOUT: [Redshift] Querying cache path from preferences.xml: $REDSHIFT_LOCALDATAPATH/cache
2024-05-08 22:24:58:  0: STDOUT: [Redshift] Creating cache path /root/redshift/cache
2024-05-08 22:24:58:  0: STDOUT: [Redshift] 	Enforcing shader cache budget...
2024-05-08 22:24:58:  0: STDOUT: [Redshift] 	Enforcing texture cache budget...
2024-05-08 22:24:58:  0: STDOUT: [Redshift] 		Collecting files...
2024-05-08 22:24:58:  0: STDOUT: [Redshift] 		Total size for 0 files 0.00MB (budget 32768.00MB)
2024-05-08 22:24:58:  0: STDOUT: [Redshift] 		Under budget. Done.
2024-05-08 22:24:58:  0: STDOUT: [Redshift] 	Creating mesh cache...
2024-05-08 22:24:58:  0: STDOUT: [Redshift] 	Done
2024-05-08 22:24:58:  0: STDOUT: [Redshift] Cache path: /root/redshift/cache
2024-05-08 22:24:58:  0: STDOUT: [Redshift] 
2024-05-08 22:24:58:  0: STDOUT: [Redshift] Redshift Initialized
2024-05-08 22:24:58:  0: STDOUT: [Redshift] 	Version: 3.5.22, Dec  7 2023 00:43:36 [ab984afb]
2024-05-08 22:24:58:  0: STDOUT: [Redshift] 	Linux Platform
2024-05-08 22:24:58:  0: STDOUT: [Redshift] 	Release Build
2024-05-08 22:24:58:  0: STDOUT: [Redshift] 	Number of CPU HW threads: 32
2024-05-08 22:24:58:  0: STDOUT: [Redshift] 	CPU speed: 4.00 GHz
2024-05-08 22:24:58:  0: STDOUT: [Redshift] 	Total system memory: 251.70 GB
2024-05-08 22:24:58:  0: STDOUT: [Redshift] 	Current working dir: /tools/deadline/10.3.2.1/deadline/bin
2024-05-08 22:24:58:  0: STDOUT: [Redshift] redshift_LICENSE=5053@nyserver
2024-05-08 22:24:58:  0: STDOUT: [Redshift] RLM License Search Path=/root/redshift:/etc/opt/maxon/rlm
2024-05-08 22:24:58:  0: STDOUT: [Redshift] License return timeout is disabled (license will be returned on shutdown)
2024-05-08 22:24:58:  0: STDOUT: [Redshift] 
2024-05-08 22:24:58:  0: STDOUT: [Redshift] Loading Redshift procedural extensions...
2024-05-08 22:24:58:  0: STDOUT: [Redshift] 	From path: /tools/redshift/3.5.22/redshift/procedurals/
2024-05-08 22:24:58:  0: STDOUT: [Redshift] 	Done!
2024-05-08 22:24:58:  0: STDOUT: [Redshift]  
2024-05-08 22:24:58:  0: STDOUT: [Redshift] Preparing compute platforms
2024-05-08 22:24:58:  0: STDOUT: [Redshift] 	Found CUDA compute library in /tools/redshift/3.5.22/redshift/bin/libredshift-core-cuda.so
2024-05-08 22:24:58:  0: STDOUT: [Redshift] 	Found CPU compute library in /tools/redshift/3.5.22/redshift/bin/libredshift-core-cpu.so
2024-05-08 22:24:58:  0: STDOUT: [Redshift] 	Done
2024-05-08 22:24:58:  0: STDOUT: [Redshift] Creating CUDA contexts
2024-05-08 22:24:58:  0: STDOUT: [Redshift] 	CUDA init ok
2024-05-08 22:24:58:  0: STDOUT: [Redshift] 	Ordinals: { 0 }
2024-05-08 22:24:58:  0: STDOUT: Stack trace:
2024-05-08 22:24:58:  0: STDOUT:   /lib64/libc.so.6(+0xce23b) [0x7f92c5fe323b]
2024-05-08 22:24:58:  0: STDOUT:   /tools/redshift/3.5.22/redshift/bin/libredshift-core-cpu.so(+0xa84494) [0x7f92103f8494]
2024-05-08 22:24:58:  0: STDOUT:   /tools/redshift/3.5.22/redshift/bin/libredshift-core-cpu.so(+0xaf40af) [0x7f92104680af]
2024-05-08 22:24:58:  0: STDOUT:   /tools/redshift/3.5.22/redshift/bin/libredshift-core-cpu.so(+0xafe52c) [0x7f921047252c]
2024-05-08 22:24:58:  0: STDOUT:   /tools/redshift/3.5.22/redshift/bin/libredshift-core-cpu.so(+0x8b12b6) [0x7f92102252b6]
2024-05-08 22:24:58:  0: STDOUT:   /tools/redshift/3.5.22/redshift/bin/libredshift-core-cpu.so(+0x22cc1d1) [0x7f9211c401d1]
2024-05-08 22:24:58:  0: STDOUT:   StaticInitDevices
2024-05-08 22:24:58:  0: STDOUT:   /tools/redshift/3.5.22/redshift/redshift4maya/2024/../../bin/libredshift-core.so(+0x91beeb) [0x7f922bb83eeb]
2024-05-08 22:24:58:  0: STDOUT:   /tools/redshift/3.5.22/redshift/redshift4maya/2024/../../bin/libredshift-core.so(+0x9568a0) [0x7f922bbbe8a0]
2024-05-08 22:24:58:  0: STDOUT:   /tools/redshift/3.5.22/redshift/redshift4maya/2024/../../bin/libredshift-core.so(+0x9abc34) [0x7f922bc13c34]
2024-05-08 22:24:58:  0: STDOUT:   RS_Renderer_Create(unsigned int, int*)
2024-05-08 22:24:58:  0: STDOUT:   /tools/redshift/3.5.22/redshift/redshift4maya/2024/redshift4maya.so(+0x3132a4) [0x7f927a3d32a4]
2024-05-08 22:24:58:  0: STDOUT:   /lib64/libpthread.so.0(+0x81ca) [0x7f92c62e21ca]
2024-05-08 22:24:58:  0: STDOUT:   clone
2024-05-08 22:24:58:  0: STDOUT: 1Writing crash report in /usr/tmp/24_0002_aaa0020_lighting_v003[Recovered-root.2024-05-08-22.24].crash
2024-05-08 22:24:58:  0: STDOUT: Result: /mnt/burnside/ayon/24_0002_project/aaa/aaa0020/work/lighting/24_0002_aaa0020_lighting_v003.ma
2024-05-08 22:24:58:  0: STDOUT: Fatal Error. Attempting to save in /usr/tmp/24_0002_aaa0020_lighting_v003[Recovered-root.2024-05-08-22.24].ma
2024-05-08 22:24:58:  0: STDOUT: // Maya exited with status 1
2024-05-08 22:24:58:  0: INFO: Process returned: 1
2024-05-08 22:24:58:  0: Done executing plugin command of type 'Render Task'

Again, the same command works fine when I run it in a terminal on the same machine that runs deadline10launcher as root. In that case, instead of the stack trace, the next messages logged are:

[Redshift] Initializing GPUComputing module (CUDA). Active device 0
[Redshift] 	CUDA Driver Version: 12030
[Redshift] 	CUDA API Version: 11020
[Redshift] 	Device 1/1 : NVIDIA RTX A5000 
[Redshift] 	Compute capability: 8.6
[Redshift] 	Num multiprocessors: 64
[Redshift] 	PCI busID: 97, deviceID: 0, domainID: 0
[Redshift] 	Theoretical memory bandwidth: 768.096008 GB/Sec
[Redshift] 	Measured PCIe bandwidth (pinned CPU->GPU): 23.797508 GB/s
[Redshift] 	Measured PCIe bandwidth (pinned GPU->CPU): 24.307302 GB/s
[Redshift] 	Measured PCIe bandwidth (paged CPU->GPU): 22.050194 GB/s
[Redshift] 	Measured PCIe bandwidth (paged GPU->CPU): 15.476170 GB/s
[Redshift] 	Estimated GPU->CPU latency (0): 0.004548 ms
[Redshift] 	Estimated GPU->CPU latency (1): 0.004536 ms
[Redshift] 	Estimated GPU->CPU latency (2): 0.004512 ms
[Redshift] 	Estimated GPU->CPU latency (3): 0.004547 ms
[Redshift] 	New CUDA context created
[Redshift] 	Available memory: 21894.8125 MB out of 24247.7500 MB

This suggests that Redshift crashes while initializing its CUDA module when it is launched by the Deadline Worker process.

What could cause this? Any thoughts on how to troubleshoot it further?

Thanks,
Beat

When you test the command in a terminal, are you running it as root on the same render node, or is it working on a different machine?

In your task report, the Worker on la-cg06 is running as root and still failing, so I’m not certain root is a factor here. Is there anything notable in the crash report it says it is writing?

I’m able to run the exact same command, as the same user, on the same machine that runs the Deadline Worker.

I’m not seeing anything notable in the crash report. My guess is that the binary fails to load or initialize the CUDA module, so maybe there’s a difference in environment variables or something along those lines.

I am continuing to troubleshoot the environment and trying to replicate the working interactive shell exactly in the Worker process.
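One way to compare the two environments is to dump each of them with `env` (once from the working terminal, once from inside a job on the Worker) and diff the dumps. A minimal Python sketch, assuming plain `KEY=VALUE` lines; the function names are hypothetical:

```python
def parse_env_dump(text):
    """Parse `env`-style output (one KEY=VALUE per line) into a dict.

    Lines without '=' or with an empty key (for example continuation
    lines of multi-line values) are skipped.
    """
    env = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key:
            env[key] = value
    return env


def load_env_dump(path):
    """Read and parse an `env` dump saved to a file."""
    with open(path) as f:
        return parse_env_dump(f.read())


def diff_envs(working, failing):
    """Return (missing, extra, changed) variable names, each sorted.

    missing: set in the working environment but absent in the failing one
    extra:   set only in the failing environment
    changed: set in both, with different values
    """
    missing = sorted(k for k in working if k not in failing)
    extra = sorted(k for k in failing if k not in working)
    changed = sorted(k for k in working if k in failing and working[k] != failing[k])
    return missing, extra, changed
```

For example, `diff_envs(load_env_dump("shell_env.txt"), load_env_dump("worker_env.txt"))`, where both file names are placeholders. Variables in `missing` are the first candidates to try adding back on the Worker.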

I troubleshot the job by running the same command in a terminal and removing environment variables one at a time.
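The one-at-a-time removal can also be scripted. A minimal Python sketch (the function name `find_required_vars` is hypothetical) that re-runs a command once per variable, each time with only that variable removed, and reports which removals make it fail:

```python
import os
import subprocess


def find_required_vars(cmd, base_env=None, timeout=None):
    """Return the names of environment variables whose removal makes `cmd` fail.

    `cmd` is an argument list (run without a shell). `base_env` is an
    environment in which the command is known to succeed; it defaults to
    the current process environment. The command is run once per variable.
    """
    env = dict(os.environ if base_env is None else base_env)
    required = []
    for key in sorted(env):
        # Same environment, minus exactly one variable.
        trimmed = {k: v for k, v in env.items() if k != key}
        try:
            result = subprocess.run(
                cmd,
                env=trimmed,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout,
            )
            failed = result.returncode != 0
        except OSError:
            # e.g. the executable could not be found once PATH was removed.
            failed = True
        if failed:
            required.append(key)
    return required
```

Here `cmd` would be the Render command from the first post as an argument list. Every run starts Maya, so this is slow with a large environment; bisecting over groups of variables would reduce the number of runs.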
Removing HOME resulted in the same error. So I simply added:

deadlinePlugin.SetProcessEnvironmentVariable('HOME', '/root')

to the GlobalJobPreLoad.py file as a test, and that did the trick. I’m not sure exactly why, but our Redshift installation apparently requires the HOME environment variable to be set.

I’ll work out a long-term solution, but this at least explains what’s going on in case anyone else runs into this issue.
