When I submit jobs with the Spot plugin, I get errors when the instances try to run them. If I submit the exact same job to a running on-demand instance with the same AMI and the same security group, there is no problem. I really hope Thinkbox can improve generic errors like this one:
ERROR: Encountered the following error while initializing the Plugin Sandbox: 'Value cannot be null. (Parameter 'input')'.
I've attached an example hip file.
Broken spot fleet submission, from node: /obj/topnet1/ropfetch_spot
Working on-demand submission, from node: /obj/topnet1/ropfetch_ondemand
I thought I'd try to test UBL on an on-demand instance, since I realised the on-demand instance that works fine with Houdini was using a floating licence. When it uses UBL instead, we get:
2022-06-05 14:25:49: 0: INFO: Full Command: "/opt/hfs19.0/bin/hython" "/Volumes/cloud_prod/temp/pdgtemp/12888/scripts/rop.py" "json"
2022-06-05 14:25:49: 0: INFO: Startup Directory: "/Volumes/cloud_prod/temp/pdgtemp/12888"
2022-06-05 14:25:49: 0: INFO: Process Priority: BelowNormal
2022-06-05 14:25:49: 0: INFO: Process Affinity: default
2022-06-05 14:25:49: 0: INFO: Process is now running
2022-06-05 14:25:52: 0: STDOUT: PDG Type Registry: Failed to import duplicate module 'houdini' which was previously imported from '/opt/hfs19.0/houdini/pdg/types/houdini/__init__.py'. Module will be skipped.
2022-06-05 14:25:52: 0: STDOUT: PDG Type Registry: Failed to import duplicate module 'partitioners' which was previously imported from '/opt/hfs19.0/houdini/pdg/types/partitioners/__init__.py'. Module will be skipped.
2022-06-05 14:25:52: 0: STDOUT: PDG Type Registry: Failed to import duplicate module 'schedulers' which was previously imported from '/opt/hfs19.0/houdini/pdg/types/schedulers/__init__.py'. Module will be skipped.
2022-06-05 14:25:52: 0: STDOUT: PDG Type Registry: Failed to import duplicate module 'utils' which was previously imported from '/opt/hfs19.0/houdini/pdg/types/utils/__init__.py'. Module will be skipped.
2022-06-05 14:25:53: 0: Sending kill command to process hython-bin with id: 4208
2022-06-05 14:25:53: 0: Done executing plugin command of type 'Render Task'
2022-06-05 14:25:53: 0: Executing plugin command of type 'End Job'
2022-06-05 14:25:53: 0: Done executing plugin command of type 'End Job'
2022-06-05 14:25:55: Sending kill command to process deadlinesandbox.exe with id: 4179
2022-06-05 14:25:56: Scheduler Thread - Render Thread 0 threw a major error:
2022-06-05 14:25:56: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2022-06-05 14:25:56: Exception Details
2022-06-05 14:25:56: RenderPluginException -- Error: FailRenderException : PDGDeadline exception: Traceback (most recent call last):
2022-06-05 14:25:56: File "/var/lib/Thinkbox/Deadline10/workers/ip-10-1-138-253/plugins/629cbc2cffb8aa15ea75c68d/PDGDeadline.py", line 357, in RenderTasks
2022-06-05 14:25:56: self.RunManagedProcess(self.wProcess)
2022-06-05 14:25:56: FailRenderException: No licenses could be found to run this application
2022-06-05 14:25:56: at Deadline.Plugins.DeadlinePlugin.FailRender(String message)
2022-06-05 14:25:56: at Deadline.Plugins.DeadlinePlugin.FailRender(String message) (Python.Runtime.PythonException)
2022-06-05 14:25:56: File "/var/lib/Thinkbox/Deadline10/workers/ip-10-1-138-253/plugins/629cbc2cffb8aa15ea75c68d/PDGDeadline.py", line 369, in RenderTasks
2022-06-05 14:25:56: self.FailRender('PDGDeadline exception: {}'.format(traceback.format_exc(1)))
2022-06-05 14:25:56: at Python.Runtime.Dispatcher.Dispatch(ArrayList args)
2022-06-05 14:25:56: at __FranticX_GenericDelegate0Dispatcher.Invoke()
2022-06-05 14:25:56: at Deadline.Plugins.DeadlinePlugin.RenderTasks()
2022-06-05 14:25:56: at Deadline.Plugins.DeadlinePlugin.DoRenderTasks()
2022-06-05 14:25:56: at Deadline.Plugins.PluginWrapper.RenderTasks(Task task, String& outMessage, AbortLevel& abortLevel)
2022-06-05 14:25:56: at Deadline.Plugins.PluginWrapper.RenderTasks(Task task, String& outMessage, AbortLevel& abortLevel)
2022-06-05 14:25:56: RenderPluginException.Cause: JobError (2)
2022-06-05 14:25:56: RenderPluginException.Level: Major (1)
2022-06-05 14:25:56: RenderPluginException.HasSlaveLog: True
2022-06-05 14:25:56: RenderPluginException.SlaveLogFileName: /var/log/Thinkbox/Deadline10/deadlineslave_renderthread_0-ip-10-1-138-253-0000.log
2022-06-05 14:25:56: Exception.TargetSite: Deadline.Slaves.Messaging.PluginResponseMemento d(Deadline.Net.DeadlineMessage, System.Threading.CancellationToken)
2022-06-05 14:25:56: Exception.Data: ( )
2022-06-05 14:25:56: Exception.Source: deadline
2022-06-05 14:25:56: Exception.HResult: -2146233088
2022-06-05 14:25:56: Exception.StackTrace:
2022-06-05 14:25:56: at Deadline.Plugins.SandboxedPlugin.d(DeadlineMessage bgm, CancellationToken bgn
2022-06-05 14:25:56: at Deadline.Plugins.SandboxedPlugin.RenderTask(Task task, CancellationToken cancellationToken
2022-06-05 14:25:56: at Deadline.Slaves.SlaveRenderThread.c(TaskLogWriter ajt, CancellationToken aju)
2022-06-05 14:25:56: <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
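To pull the actual cause out of a long Worker log like the one above, a small filter can help. This is a minimal Python sketch, not a Deadline tool: it assumes each line starts with the `YYYY-MM-DD HH:MM:SS: ` timestamp seen above, and the marker list is hypothetical, built only from the messages quoted in this thread.

```python
import re
from typing import Iterable, List, Tuple

# Hypothetical markers, taken from the error messages quoted in this thread.
FAILURE_MARKERS = (
    "No licenses could be found",
    "Value cannot be null",
    "FailRenderException",
)

# Worker log lines start with "YYYY-MM-DD HH:MM:SS: ".
LOG_LINE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}): (.*)$")


def find_failures(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (timestamp, message) for every log line that contains a failure marker."""
    hits = []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if not match:
            continue  # skip lines without the expected timestamp prefix
        stamp, message = match.groups()
        if any(marker in message for marker in FAILURE_MARKERS):
            hits.append((stamp, message))
    return hits
```

Usage: open the saved task log (the file name is up to you) and pass the file object straight in, e.g. `find_failures(open("worker.log"))`; the first hit is usually the root cause.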
The License Forwarder is running, the license certificates are installed, and even with all ports open the result is the same.
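Since open ports alone don't prove much, it can still help to rule out basic reachability from the worker side. This is a minimal Python sketch, assuming you know the forwarder's address: it only attempts a plain TCP connect, so `True` means something is listening on that port, not that a license will actually be granted. Port 1715 is the one the forwarder's houdini tunneler reports listening on in the log below; any other ports are yours to add.

```python
import socket
from typing import Dict, Iterable


def check_ports(host: str, ports: Iterable[int], timeout: float = 3.0) -> Dict[int, bool]:
    """Try a plain TCP connect to each port; True means the connection was accepted."""
    results = {}
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            # Refused, timed out, unresolvable or unreachable: all count as "not reachable".
            results[port] = False
    return results
```

Usage: run something like `check_ports("<forwarder-ip>", [1715])` on the failing spot instance and on the working on-demand one, and compare.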
This is the License Forwarder log, which isn't particularly useful if in fact no licenses are available.
Connecting to ip-10-1-128-12...
RemoteLog: connecting to machine '10.1.128.12' which resolved to '10.1.128.12' port 33661
Making a connection to '10.1.128.12' port 33661
2022-06-05 05:31:40: ::ffff:10.1.131.72 has connected
2022-06-05 05:31:41: License Forwarder - Received request to register ip-10-1-131-72/::ffff:10.1.131.72 for feature houdini.
2022-06-05 05:31:41: License Forwarder Tunneler Thread for houdini ( 1715 ) : Initialized - listening on port 1715
2022-06-05 05:33:46: ::ffff:10.1.128.12 has connected
2022-06-05 05:33:46: License Forwarder - Received request to register ip-10-1-128-12/::ffff:10.1.128.12 for feature houdini.
2022-06-05 05:33:46: ::ffff:10.1.131.72 has connected
2022-06-05 05:33:46: License Forwarder - Received request to register ip-10-1-131-72/::ffff:10.1.131.72 for feature houdini.
2022-06-05 05:34:01: ::ffff:10.1.128.12 has connected
2022-06-05 05:34:01: License Forwarder - Received request to register ip-10-1-128-12/::ffff:10.1.128.12 for feature houdini.
2022-06-05 05:34:14: ::ffff:10.1.128.12 has connected
2022-06-05 05:34:14: License Forwarder - Received request to register ip-10-1-128-12/::ffff:10.1.128.12 for feature houdini.
2022-06-05 05:34:27: ::ffff:10.1.128.12 has connected
2022-06-05 05:34:27: License Forwarder - Received request to register ip-10-1-128-12/::ffff:10.1.128.12 for feature houdini.
2022-06-05 05:34:40: ::ffff:10.1.128.12 has connected
2022-06-05 05:34:40: License Forwarder - Received request to register ip-10-1-128-12/::ffff:10.1.128.12 for feature houdini.
2022-06-05 05:34:54: ::ffff:10.1.128.12 has connected
2022-06-05 05:34:54: License Forwarder - Received request to register ip-10-1-128-12/::ffff:10.1.128.12 for feature houdini.
2022-06-05 05:35:09: ::ffff:10.1.128.12 has connected
2022-06-05 05:35:09: License Forwarder - Received request to register ip-10-1-128-12/::ffff:10.1.128.12 for feature houdini.
2022-06-05 05:35:22: ::ffff:10.1.128.12 has connected
2022-06-05 05:35:22: License Forwarder - Received request to register ip-10-1-128-12/::ffff:10.1.128.12 for feature houdini.
2022-06-05 05:35:36: ::ffff:10.1.128.12 has connected
2022-06-05 05:35:36: License Forwarder - Received request to register ip-10-1-128-12/::ffff:10.1.128.12 for feature houdini.
2022-06-05 05:35:49: ::ffff:10.1.128.12 has connected
2022-06-05 05:35:49: License Forwarder - Received request to register ip-10-1-128-12/::ffff:10.1.128.12 for feature houdini.
2022-06-05 05:36:03: ::ffff:10.1.128.12 has connected
2022-06-05 05:36:03: License Forwarder - Received request to register ip-10-1-128-12/::ffff:10.1.128.12 for feature houdini.
2022-06-05 05:36:16: ::ffff:10.1.128.12 has connected
2022-06-05 05:36:16: License Forwarder - Received request to register ip-10-1-128-12/::ffff:10.1.128.12 for feature houdini.
2022-06-05 05:36:29: ::ffff:10.1.128.12 has connected
2022-06-05 05:36:29: License Forwarder - Received request to register ip-10-1-128-12/::ffff:10.1.128.12 for feature houdini.
2022-06-05 05:36:42: ::ffff:10.1.128.12 has connected
2022-06-05 05:36:42: License Forwarder - Received request to register ip-10-1-128-12/::ffff:10.1.128.12 for feature houdini.
2022-06-05 05:36:55: ::ffff:10.1.128.12 has connected
2022-06-05 05:36:55: License Forwarder - Received request to register ip-10-1-128-12/::ffff:10.1.128.12 for feature houdini.
2022-06-05 05:37:09: ::ffff:10.1.128.12 has connected
2022-06-05 05:37:09: License Forwarder - Received request to register ip-10-1-128-12/::ffff:10.1.128.12 for feature houdini.
2022-06-05 05:51:36: ::ffff:10.1.131.72 has connected
2022-06-05 05:51:36: License Forwarder - Received request to register ip-10-1-131-72/::ffff:10.1.131.72 for feature houdini.
2022-06-05 05:51:50: ::ffff:10.1.128.12 has connected
2022-06-05 05:51:50: License Forwarder - Received request to register ip-10-1-128-12/::ffff:10.1.128.12 for feature houdini.
2022-06-05 05:52:03: ::ffff:10.1.128.12 has connected
2022-06-05 05:52:03: License Forwarder - Received request to register ip-10-1-128-12/::ffff:10.1.128.12 for feature houdini.
2022-06-05 05:52:16: ::ffff:10.1.128.12 has connected
2022-06-05 05:52:16: License Forwarder - Received request to register ip-10-1-128-12/::ffff:10.1.128.12 for feature houdini.
2022-06-05 05:52:27: ::ffff:10.1.128.12 has connected
2022-06-05 05:52:28: License Forwarder - Received request to register ip-10-1-128-12/::ffff:10.1.128.12 for feature houdini.
2022-06-05 05:52:40: ::ffff:10.1.128.12 has connected
2022-06-05 05:52:40: License Forwarder - Received request to register ip-10-1-128-12/::ffff:10.1.128.12 for feature houdini.
2022-06-05 05:52:53: ::ffff:10.1.128.12 has connected
2022-06-05 05:52:53: License Forwarder - Received request to register ip-10-1-128-12/::ffff:10.1.128.12 for feature houdini.
2022-06-05 05:53:07: ::ffff:10.1.128.12 has connected
2022-06-05 05:53:07: License Forwarder - Received request to register ip-10-1-128-12/::ffff:10.1.128.12 for feature houdini.
2022-06-05 05:53:20: ::ffff:10.1.128.12 has connected
2022-06-05 05:53:20: License Forwarder - Received request to register ip-10-1-128-12/::ffff:10.1.128.12 for feature houdini.
2022-06-05 05:53:32: ::ffff:10.1.128.12 has connected
2022-06-05 05:53:32: License Forwarder - Received request to register ip-10-1-128-12/::ffff:10.1.128.12 for feature houdini.
2022-06-05 05:53:45: ::ffff:10.1.128.12 has connected
2022-06-05 05:53:45: License Forwarder - Received request to register ip-10-1-128-12/::ffff:10.1.128.12 for feature houdini.
2022-06-05 05:53:59: ::ffff:10.1.128.12 has connected
2022-06-05 05:53:59: License Forwarder - Received request to register ip-10-1-128-12/::ffff:10.1.128.12 for feature houdini.
2022-06-05 05:54:13: ::ffff:10.1.128.12 has connected
2022-06-05 05:54:13: License Forwarder - Received request to register ip-10-1-128-12/::ffff:10.1.128.12 for feature houdini.
2022-06-05 05:54:26: ::ffff:10.1.128.12 has connected
2022-06-05 05:54:26: License Forwarder - Received request to register ip-10-1-128-12/::ffff:10.1.128.12 for feature houdini.
2022-06-05 05:54:38: ::ffff:10.1.128.12 has connected
2022-06-05 05:54:38: License Forwarder - Received request to register ip-10-1-128-12/::ffff:10.1.128.12 for feature houdini.
2022-06-05 05:54:51: ::ffff:10.1.128.12 has connected
2022-06-05 05:54:51: License Forwarder - Received request to register ip-10-1-128-12/::ffff:10.1.128.12 for feature houdini.
2022-06-05 05:55:04: ::ffff:10.1.128.12 has connected
2022-06-05 05:55:04: License Forwarder - Received request to register ip-10-1-128-12/::ffff:10.1.128.12 for feature houdini.
2022-06-05 05:55:16: ::ffff:10.1.128.12 has connected
2022-06-05 05:55:16: License Forwarder - Received request to register ip-10-1-128-12/::ffff:10.1.128.12 for feature houdini.
2022-06-05 05:55:28: ::ffff:10.1.128.12 has connected
2022-06-05 05:55:28: License Forwarder - Received request to register ip-10-1-128-12/::ffff:10.1.128.12 for feature houdini.
2022-06-05 05:55:41: ::ffff:10.1.128.12 has connected
2022-06-05 05:55:41: License Forwarder - Received request to register ip-10-1-128-12/::ffff:10.1.128.12 for feature houdini.
2022-06-05 05:55:54: ::ffff:10.1.128.12 has connected
2022-06-05 05:55:54: License Forwarder - Received request to register ip-10-1-128-12/::ffff:10.1.128.12 for feature houdini.
2022-06-05 05:56:08: ::ffff:10.1.128.12 has connected
2022-06-05 05:56:08: License Forwarder - Received request to register ip-10-1-128-12/::ffff:10.1.128.12 for feature houdini.
2022-06-05 05:56:20: ::ffff:10.1.128.12 has connected
2022-06-05 05:56:20: License Forwarder - Received request to register ip-10-1-128-12/::ffff:10.1.128.12 for feature houdini.
2022-06-05 05:56:33: ::ffff:10.1.128.12 has connected
2022-06-05 05:56:33: License Forwarder - Received request to register ip-10-1-128-12/::ffff:10.1.128.12 for feature houdini.
2022-06-05 05:56:45: ::ffff:10.1.128.12 has connected
2022-06-05 05:56:45: License Forwarder - Received request to register ip-10-1-128-12/::ffff:10.1.128.12 for feature houdini.
2022-06-05 05:56:58: ::ffff:10.1.128.12 has connected
2022-06-05 05:56:58: License Forwarder - Received request to register ip-10-1-128-12/::ffff:10.1.128.12 for feature houdini.
2022-06-05 05:57:12: ::ffff:10.1.128.12 has connected
2022-06-05 05:57:12: License Forwarder - Received request to register ip-10-1-128-12/::ffff:10.1.128.12 for feature houdini.
2022-06-05 05:57:27: ::ffff:10.1.128.12 has connected
2022-06-05 05:57:27: License Forwarder - Received request to register ip-10-1-128-12/::ffff:10.1.128.12 for feature houdini.
2022-06-05 05:57:40: ::ffff:10.1.128.12 has connected
2022-06-05 05:57:40: License Forwarder - Received request to register ip-10-1-128-12/::ffff:10.1.128.12 for feature houdini.
2022-06-05 05:57:57: ::ffff:10.1.128.12 has connected
2022-06-05 05:57:57: License Forwarder - Received request to register ip-10-1-128-12/::ffff:10.1.128.12 for feature houdini.
2022-06-05 05:58:11: ::ffff:10.1.128.12 has connected
### Here I modified the License Forwarder properties and set the IP address manually. The render node logs still reported failures.
2022-06-05 05:58:11: License Forwarder - Received request to register ip-10-1-128-12/::ffff:10.1.128.12 for feature houdini.
Success
2022-06-05 05:58:25: ::ffff:10.1.128.12 has connected
2022-06-05 05:58:25: License Forwarder - Received request to register ip-10-1-128-12/::ffff:10.1.128.12 for feature houdini.
2022-06-05 05:58:38: ::ffff:10.1.128.12 has connected
2022-06-05 05:58:38: License Forwarder - Received request to register ip-10-1-128-12/::ffff:10.1.128.12 for feature houdini.
2022-06-05 05:58:51: ::ffff:10.1.128.12 has connected
2022-06-05 05:58:51: License Forwarder - Received request to register ip-10-1-128-12/::ffff:10.1.128.12 for feature houdini.
2022-06-05 05:59:05: ::ffff:10.1.128.12 has connected
2022-06-05 05:59:05: License Forwarder - Received request to register ip-10-1-128-12/::ffff:10.1.128.12 for feature houdini.
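One thing the forwarder log does show is the same machine (ip-10-1-128-12) re-registering for the houdini feature roughly every 13 seconds, which may indicate the client keeps retrying rather than holding a license. Here is a hypothetical Python helper that summarizes that pattern per host and feature, assuming the exact line format above:

```python
import re
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, List

# Matches lines such as:
# 2022-06-05 05:34:01: License Forwarder - Received request to register ip-10-1-128-12/::ffff:10.1.128.12 for feature houdini.
REGISTER = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}): License Forwarder - "
    r"Received request to register (\S+)/\S+ for feature (\S+)\.$"
)


def registration_gaps(lines: Iterable[str]) -> Dict[str, List[float]]:
    """Group register requests by host:feature and return the seconds between consecutive requests."""
    times = defaultdict(list)
    for line in lines:
        match = REGISTER.match(line.strip())
        if match:
            stamp, host, feature = match.groups()
            times[f"{host}:{feature}"].append(datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S"))
    return {
        key: [(later - earlier).total_seconds() for earlier, later in zip(stamps, stamps[1:])]
        for key, stamps in times.items()
    }
```

A host whose gaps are short and steady is registering over and over; a host that registers once and goes quiet is probably being served.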
Has anyone got UBL to work with Houdini on the latest version of Deadline?
H19.0.589 (py3) and DL 10.1.19.4:
Our setup is slightly different: we use a Houdini and Redshift (v3.5.03) spot fleet.
Our workaround was to use UBL for Redshift but have Houdini point at our on-prem license server (i.e. BYOL) for the Houdini Engine licenses. The standard Deadline ROP worked for us. At first we were getting the "driver deadline not found" error, but that was because I had forgotten to upload our submitter directory to the AMI, completely forgetting that the Deadline.hda was buried in there…
I now use the SESI cloud license server for H19 BYO licenses, which is a huge upgrade over a self-hosted SESI license server, since you don't have to worry about your license server going through a VPN. That's ideal for someone like me who may work from a laptop and wants to stay portable.
I agree, the docs probably do need to be updated. It is confusing, though, that SideFX distributes both a py2 and a py3 version of Houdini, so I guess it depends on which one you're using.
I did create the python3.7libs directory manually since we are using the py3 version of H19.
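For anyone else on the py3 build: Houdini looks for `pythonX.Ylibs` folders on its search path, so the folder name has to match the Python version of your Houdini build (3.7 for the H19 py3 build mentioned here). A minimal Python sketch of creating it; the `houdini19.0` preference folder in the usage line is an assumption, so check where your Deadline submitter files actually need to go.

```python
import sys
from pathlib import Path


def ensure_pythonlibs_dir(houdini_pref_dir: Path, version=None) -> Path:
    """Create (if needed) and return the pythonX.Ylibs folder inside a Houdini preference dir.

    `version` defaults to the running interpreter's (major, minor); pass (3, 7) explicitly
    when this script runs under a different Python than Houdini's own.
    """
    major, minor = version or sys.version_info[:2]
    libs = Path(houdini_pref_dir) / f"python{major}.{minor}libs"
    libs.mkdir(parents=True, exist_ok=True)  # safe to call repeatedly
    return libs
```

Usage (hypothetical location): `ensure_pythonlibs_dir(Path.home() / "houdini19.0", (3, 7))`.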
I decided to revert to H18.5 until I can at least confirm that UBL works, but it still seems flaky. The first render always fails, and the log suggests permissions issues:
I created a post here for that, since it's a separate problem, but the tickets really are stacking up.
OK, the permissions issues are resolved by running hserver twice, with a 10-second delay, at startup via instance user data. Next, I just need to get the Spot plugin to stop throwing an exception. I'll test it again on 18.5.
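The startup workaround above (start hserver, wait 10 seconds, start it again) can be sketched in Python as follows. The hserver path is a placeholder, and the sketch assumes `hserver` returns once its background server is up; `run` and `sleep` are injectable only so the sequence can be checked without Houdini installed.

```python
import subprocess
import time
from typing import Callable, Sequence

# Placeholder path: adjust to your Houdini install.
HSERVER = "/opt/hfs18.5/bin/hserver"


def start_hserver_twice(
    cmd: Sequence[str] = (HSERVER,),
    delay: float = 10.0,
    run: Callable = subprocess.run,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Start hserver, wait `delay` seconds, then start it a second time."""
    run(list(cmd), check=False)  # first start; a failure here is expected to be retried
    sleep(delay)
    run(list(cmd), check=False)  # second start after the delay
```

Called from the instance user data script, this reproduces the two-starts-with-a-delay sequence; whether the second start is strictly needed on other setups is untested here.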
I did, but with UBL only for Engine. Mantra did not work with spot and UBL. Mantra is OK without UBL, but you need to edit the plugin/param file.
The core problem I saw was that hserver, when started from user data, did not keep running after the user data script ended. The workaround for me was to start hserver with a systemd service. I have no idea why I am the only person who had to solve this problem that way. The service is available here:
By the way, if anyone finds that this systemd service is actually required and I'm not the only one, please email SideFX for visibility. Fair enough if I'm an outlier, but if I'm not, they need to know.
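The service linked above is not reproduced in this thread, so the following is only a hypothetical sketch of what such a unit might look like, generated from Python. Assumptions: Houdini is installed under the path you pass in, `hserver` forks into the background when run with no arguments (if it stays in the foreground on your build, use `Type=simple` instead), `hserver -q` stops it, and the `User=` account is whichever one your Deadline Worker runs as. The unit file name is also made up.

```python
from pathlib import Path

# Hypothetical unit; every value here is an assumption to verify against your setup.
UNIT_TEMPLATE = """\
[Unit]
Description=Houdini license client (hserver)
Wants=network-online.target
After=network-online.target

[Service]
Type=forking
User={user}
ExecStart={hfs}/bin/hserver
ExecStop={hfs}/bin/hserver -q

[Install]
WantedBy=multi-user.target
"""


def render_unit(hfs: str, user: str) -> str:
    """Fill in the Houdini install path and the account hserver should run as."""
    return UNIT_TEMPLATE.format(hfs=hfs, user=user)


def write_unit(text: str, path: Path = Path("/etc/systemd/system/hserver.service")) -> None:
    """Write the unit file (needs root)."""
    path.write_text(text)
```

After writing the file as root, `systemctl daemon-reload` followed by `systemctl enable --now hserver.service` should start it now and on every boot, independent of the user data script.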