Ran into an interesting result. I realise this is probably just a failure to load the correct Arnold version, since it was happening (as far as I can tell) on just a single node.
The interesting thing is that although the task fails to write the .ass file, it still exits without error and is marked as complete, unless you set a minimum render time for the task. Your dependent kick jobs then fail to render because they can’t find the .ass files they are looking for.
2018-07-17 19:20:04: 0: STDOUT: Error: Caught exception: [' File "C:/INSTALLATION/SIDEEF~1/HOUDIN~1.405/houdini/python2.7libs\\hou.py", line 33040, in load\n return _hou.hipFile_load(*args, **kwargs)\n', 'OperationFailed: The attempted operation failed.\nError loading: //Networkpath/HoudiniFileRendering.hip\nWarning: \n\n/obj/ASSET_NAME_TEMP/shop/ASSET_NAME_plastic/standard_surface1:\n\n Skipping unrecognized parameter "aov_group".\n Skipping unrecognized parameter "aov_id1".\n Skipping unrecognized parameter "id1".\n Skipping unrecognized parameter "sep1".\n Skipping unrecognized parameter "aov_id2".\n Skipping unrecognized parameter "id2".\n Skipping unrecognized parameter "sep2".\n ... just more aov channels from assets ']
2018-07-17 19:20:04: 0: INFO: Process exit code: 0
2018-07-17 19:20:04: 0: INFO: Finished Houdini Job
2018-07-17 19:20:04: 0: Done executing plugin command of type 'Render Task'
2018-07-17 19:20:04: 0: Minimum required render time is 120 seconds
2018-07-17 19:20:04: 0: Actual render time was 22 seconds
I’m not sure we’d want to be strict about treating every caught exception as a render error here, since it’s not guaranteed that some bad code would actually affect the render.
The dead giveaway in this case is that it doesn’t write the .ass file. There’s also no Arnold shutdown message or releasing of resources.
Here’s a successful job for comparison.
2018-07-18 14:56:16: 0: STDOUT: 00:00:09 1715MB | [ass] writing scene to //Networkpath/JOB/_SCENES/TYPE/SHOT/ASS/OS.0724.ass (mask=0x18FF) ...
2018-07-18 14:56:25: 0: STDOUT: 00:00:18 1722MB | [ass] wrote 80831181 bytes, 83 nodes in 0:08.89
2018-07-18 14:56:25: 0: STDOUT: 00:00:18 1722MB |
2018-07-18 14:56:25: 0: STDOUT: 00:00:18 1722MB | releasing resources
2018-07-18 14:56:25: 0: STDOUT: 00:00:18 1681MB | Arnold shutdown
2018-07-18 14:56:25: 0: STDOUT: Finished Rendering
2018-07-18 14:56:25: 0: INFO: Process exit code: 0
2018-07-18 14:56:25: 0: INFO: Finished Houdini Job
2018-07-18 14:56:25: 0: Done executing plugin command of type 'Render Task'
2018-07-18 14:56:25: 0: Minimum required render time is 120 seconds
2018-07-18 14:56:25: 0: Actual render time was 357 seconds
So when catching exceptions, I would probably also watch for the lines that show the job was successful (the "[ass] wrote" line, "releasing resources" and "Arnold shutdown").
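
To make that idea concrete, here is a rough sketch in plain Python of the check I have in mind. It is not Deadline plugin code: the function names are made up for illustration, and the marker strings are simply copied from the two logs above, so they are assumptions that may not hold for other Arnold or HtoA versions. The point is only that a caught exception fails the task when the success markers are missing as well.

```python
import re

# Markers copied from the two logs in this post. Treat them as assumptions:
# the exact wording may differ between Arnold / HtoA versions.
EXCEPTION_MARKER = "Error: Caught exception"
SUCCESS_PATTERNS = (
    re.compile(r"\[ass\] wrote \d+ bytes"),
    re.compile(r"Arnold shutdown"),
)


def export_succeeded(log_text):
    """Return True only if every success marker appears in the log."""
    return all(p.search(log_text) for p in SUCCESS_PATTERNS)


def should_fail_task(log_text):
    """Fail on a caught exception only when the success markers are missing.

    An exception followed by a normal "[ass] wrote" and "Arnold shutdown"
    is left alone, since the bad code evidently did not stop the export.
    """
    caught = EXCEPTION_MARKER in log_text
    return caught and not export_succeeded(log_text)
```

With the failed log above, should_fail_task would return True (exception, no "[ass] wrote", no "Arnold shutdown"). With the successful log it returns False, and it also returns False if an exception was printed but the .ass file was still written.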