AWS Thinkbox Discussion Forums

all dependent jobs become corrupt after pending

Greetings, Thinkbox.

Deadline Version: 6.0.0.51030
FranticX Version: 2.0.0.50955
Windows XP 64
Nuke 6.3v7 & Python 2.6.2

I submit a Nuke render job with a dependent python executable job. Every time the Nuke job completes, the Python job corrupts.

Here is my info file for the Nuke job:
UserName=sally.slade
OutputDirectory0=//inferno2/projects/tboa/scenes/TST_000_0000/2d/tcomps/TST_000_0000_tcomp_sg1_v0001/proxy_2khd/TST_000_0000_tcomp_sg1_v0001.%04d.jpg
LimitGroups=nuke
Group=nuke
Name=[TBOA] Nuke Render: TST_000_0000_tcomp_sg1_v0001_ssl.nk
Plugin=Nuke
EnvironmentKeyValue5=LENSMODEL_PATH=//s2/exchange/software/nuke/custom_gizmos/plugins/6/Lens_Models/
EnvironmentKeyValue1=QT_PLUGIN_PATH=//s2/exchange/software/managed/Libraries/Qt/4.5.0_x64/plugins
EnvironmentKeyValue7=NUKE_PATH=//s2/exchange/software/Nuke/projects/tboa/TST_000_0000://s2/exchange/software/Nuke/projects/tboa://s2/exchange/software/managed/pythonScripts/site-packages://s2/exchange/software/Nuke/custom_gizmos
EnvironmentKeyValue6=SCL_SHOT_PATH=//inferno2/projects/tboa/scenes/TST_000_0000
EnableAutoTimeout=0
EnvironmentKeyValue0=SCL_SHOW_PATH=//inferno2/projects/tboa
EnvironmentKeyValue3=SCL_USERNAME=sally.slade
EnvironmentKeyValue2=SCL_SHOW_CODE=tboa
Priority=40
LimitConcurrentTasksToNumberOfCpus=0
Department=
EnvironmentKeyValue9=PATH=//s2/exchange/software/managed/Libraries/Qt/4.5.0_x64/bin;//s2/exchange/software/nuke/custom_gizmos/plugins/AtomKraft/bin;//s2/exchange/software/managed/Libraries/Qt/4.5.0_x64/bin;//s2/exchange/software/managed/Libraries/Qt/4.5.0_x64/bin;C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v5.0/bin/;C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v5.0/libnvvp/;C:/Program Files/Thinkbox/Deadline/bin;C:/Program Files (x86)/NVIDIA Corporation/PhysX/Common;C:/Program Files/Autodesk/Maya2009/bin;C:/Program Files (x86)/Windows Resource Kits/Tools/;C:/WINDOWS/system32;C:/WINDOWS;C:/WINDOWS/System32/Wbem;C:/Program Files/Intel/DMIX;C:/WINDOWS/system32/WindowsPowerShell/v1.0;C:/Program Files (x86)/QuickTime/QTSystem/;C:/Program Files/Common Files/Autodesk Shared/;C:/Program Files (x86)/Common Files/DivX Shared/;C:/Program Files/XviD MPEG-4 Video Codec 1.2.2 64-BIT;C:/Program Files/TortoiseSVN/bin;C:/Program Files (x86)/Microsoft SQL Server/90/Tools/binn/;C:/dcraw;C:/Program Files (x86)/Autodesk/Backburner/;c:/blur/common;C:/Program Files/Common Files/Nuke/6.3/plugins/AtomKraft 1.2.1/bin;C:/Program Files/Common Files/Nuke/7.0/plugins/AtomKraft 1.2.1/bin;//s2/exchange/software/managed/Libraries/Qt/4.7.1_x64_Scl/bin;//s2/exchange/software/managed/Libraries/MySQL/x64/lib;//S2/exchange/software/nuke/custom_gizmos/plugins;
ConcurrentTasks=1
ChunkSize=4
Comment=
TaskTimeoutMinutes=1200
JobDependencies=
InitialStatus=Active
EnvironmentKeyValue11=SCL_SHOT_CODE=TST_000_0000
EnvironmentKeyValue10=peregrinel_LICENS … evfxla.com
EnvironmentKeyValue12=NUKE_TEMP_DIR=S:/nuke_temp
MachineLimit=30
EnvironmentKeyValue8=QTDIR=//s2/exchange/software/managed/Libraries/Qt/4.5.0_x64
EnvironmentKeyValue4=RVL_SERVER=s2la.scanlinevfxla.com
Frames=1005-1010x1
Pool=2d

Here is the job file for the Nuke job:
SceneFile=//inferno2/projects/tboa/scenes/TST_000_0000/2d/tcomps/TST_000_0000_tcomp_sg1_v0001/linear_2khd\TST_000_0000_tcomp_sg1_v0001_ssl.nk
Camera=
WriteNodeNames=WriteBot1
NukeX=False
ProjectPath=
Priority=40
Version=6.3v7
StrictErrorChecking=1
Build=64bit
IgnoreError211=0
Arguments=
OutputFilePath=
MaxProcessors=0
LocalRendering=0
OutputFilePrefix=

Here is my info file for the dependent python job:
OutputDirectory0=
LimitGroups=
Group=nuke
Name=[TBOA] Create Shotgun Version: TST_000_0000_tcomp_sg1_v0001
Plugin=Python
EnableAutoTimeout=0
Priority=40
LimitConcurrentTasksToNumberOfCpus=0
Department=
ConcurrentTasks=1
ChunkSize=1
Comment=
TaskTimeoutMinutes=1200
JobDependencies=51800ad22aec2b0924bced81
InitialStatus=Active
MachineLimit=1
Frames=1
DeleteOnComplete=true
Pool=2d

Here is my Job file for the dependent python job:
ScriptFile=//s2/exchange/software/managed/pythonScripts/site-packages/scl/shotgun/createVersion.py
Camera=
ProjectPath=
Priority=17
Version=2.6
StrictErrorChecking=1
Build=64bit
IgnoreError211=0
ScriptJob=True
Arguments=//inferno2/projects/tboa/scenes/TST_000_0000/2d/tcomps/TST_000_0000_tcomp_sg1_v0001/linear_2khd/TST_000_0000_tcomp_sg1_v0001.%04d.exr
OutputFilePath=
MaxProcessors=0
LocalRendering=0
OutputFilePrefix=
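
For anyone rebuilding this setup, the files above are plain KEY=VALUE text, so they can be written and sanity-checked before submission. The following is only a rough sketch, not Deadline's own tooling: the function names are hypothetical, and the check assumes that job IDs are 24 hexadecimal characters (the shape of the ID in the info file above) and that multiple dependencies would be comma-separated.

```python
import os
import re
import tempfile

# Assumption: job IDs look like the one in the post (24 hexadecimal characters).
JOB_ID_PATTERN = re.compile(r"^[0-9a-f]{24}$")


def write_info_file(path, settings):
    """Write (key, value) pairs as KEY=VALUE lines, preserving their order."""
    with open(path, "w") as handle:
        for key, value in settings:
            handle.write("%s=%s\n" % (key, value))


def read_info_file(path):
    """Parse KEY=VALUE lines into a dict, splitting on the first '=' only."""
    settings = {}
    with open(path) as handle:
        for raw_line in handle:
            line = raw_line.strip()
            if "=" not in line:
                continue  # skip blank or malformed lines
            key, value = line.split("=", 1)
            settings[key] = value
    return settings


def check_dependencies(settings):
    """Return the dependency IDs that do not match the assumed job ID shape."""
    raw = settings.get("JobDependencies", "")
    ids = [item.strip() for item in raw.split(",") if item.strip()]
    return [job_id for job_id in ids if not JOB_ID_PATTERN.match(job_id)]


def round_trip_example():
    """Write a small dependent-job info file, read it back, and check it."""
    path = os.path.join(tempfile.mkdtemp(), "python_job_info.job")
    write_info_file(path, [
        ("Plugin", "Python"),
        ("Frames", "1"),
        ("JobDependencies", "51800ad22aec2b0924bced81"),
    ])
    return check_dependencies(read_info_file(path))
```

Splitting on the first '=' only matters for entries such as EnvironmentKeyValue0, where the value itself contains an '='. An empty list from check_dependencies only means the IDs are well-formed; it says nothing about whether the parent job actually exists in the repository.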

Attached are some screencaps illustrating the dependent job initially being populated with data and looking correct, and then suddenly becoming corrupt (presumably at execution time).

Thank you for your insight
Sally Slade
Scanline VFX



Thanks for reporting this, and for the detailed info. We’ll try to reproduce here and get back to you.

Out of curiosity, do you know if the python job finishes before it becomes corrupted? I’m just wondering if the corruption occurs when it’s released from the pending state, or after it completes.

Also, I haven’t tested this, so I don’t know if it will work, but if you right-click on the corrupted job, can you export it? If so, please upload the exported job here and we’ll take a look. In theory, we should be able to figure out why it’s corrupted if you are able to export it.

Thanks!

- Ryan

I'll let Sally answer in detail, but what I do see is that the corrupted jobs' task list shows the task as still pending. So I'm guessing it hasn't finished (the task has no reports).

The jobs have no export or archive option in their right click menu.

cheers,
l

I tried those files here to reproduce the problem. The only things I changed in them were the pool, group, and limitgroup settings in the job info files. On 10 different occasions, I was unable to reproduce the problem. Just to confirm, does this happen for every dependent python job, or is it more of a random thing?

I did notice the version number is a bit older (6.0.0.51030), which is beta 18, and I’ve been testing against RC1. It would be interesting to see if the problem still happens after an upgrade.

Cheers,

- Ryan

It seems to be every dependent python job. I just tried again with two rudimentary python jobs which print some output. Same result.
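
A rudimentary test script of that kind could look roughly like the sketch below. It is an illustration only, not the exact script that was submitted; the `main` function and its `out` parameter are hypothetical names, and it avoids the print statement so it runs the same on Python 2.6 and newer interpreters.

```python
import sys


def main(argv, out=None):
    """Echo the interpreter version and any arguments, one per line."""
    if out is None:
        out = sys.stdout
    out.write("python %s\n" % sys.version.split()[0])
    for index, arg in enumerate(argv[1:]):
        out.write("arg %d: %s\n" % (index, arg))
    out.flush()  # make sure the output reaches the task log promptly
    return 0


if __name__ == "__main__":
    main(sys.argv)
```

Because the script touches no files and no network shares, a dependent job that still corrupts while running it points away from the script contents and toward the dependency handling itself.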

Laszlo is updating us to the new Deadline today. I'll give it a try then and let you know how it goes.

Thank you for the help
Sally

The rollout will take a little longer than usual (probably 1-2 days), as RC1 has to be reinstalled per machine, but we will report back as soon as we can try to repro on machines with the new version.

cheers,
laszlo
