AWS Thinkbox Discussion Forums

Houdini PDG - Error on Farm

I’ve run into an issue with running PDG on my local Farm. I’ve looked through the documentation for a couple days now and have not found the solution. Any help would be greatly appreciated.

If I right-click and choose dirty and cook node, the Deadline scheduler submits the jobs to the farm and the slaves pick them up. But only the local machine that I submitted from actually cooks and outputs frames. The other 2 hosts throw this error.

Deadline Task Report:
0: STDOUT: Traceback (most recent call last):
0: STDOUT: File "/home/jcoleman/mnt/afx_sd_01/Sandbox/Jcoleman/Mutagen_Test/pdgtemp/102793/scripts/rop.py", line 592, in
0: STDOUT: args.server = socket.gethostbyname(hostname) + ':' + port
0: STDOUT: socket.gaierror: [Errno -2] Name or service not known

So I can tell the slaves are not connecting to the PDG_RESULT_SERVER, even though earlier in the log it says this:

2020-04-22 09:13:54: 0: INFO: Setting Process Environment Variable PDG_RESULT_SERVER to AFX-WS-002:36179
2020-04-22 09:13:54: 0: INFO: PDG_RESULT_SERVER: AFX-WS-002:36179
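The traceback points at a hostname lookup, not at the connection itself, so the value being set correctly does not mean the worker can resolve it. A minimal sketch of that lookup, assuming the value has the `host:port` shape shown above. The function names `split_server` and `describe_resolution` are hypothetical and are not part of rop.py, and this assumes Python 3 run on the worker:

```python
import socket

def split_server(value):
    """Split a 'host:port' string such as PDG_RESULT_SERVER into its parts."""
    host, _, port = value.rpartition(":")
    if not host or not port.isdigit():
        raise ValueError("expected 'host:port', got %r" % value)
    return host, int(port)

def describe_resolution(value, resolver=socket.gethostbyname):
    """Report whether the host part resolves; resolver is injectable for testing."""
    host, port = split_server(value)
    try:
        address = resolver(host)
    except socket.gaierror as exc:
        # Same exception type as in the task report above.
        return "cannot resolve %s: %s" % (host, exc)
    return "%s resolves to %s (port %d)" % (host, address, port)

# Example call on a worker (value copied from the log above):
# print(describe_resolution("AFX-WS-002:36179"))
```

If this reports "cannot resolve" on a worker but succeeds on the submitting machine, the problem is in name resolution on that worker rather than in Deadline or PDG.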

Full Log

2020-04-22 09:13:52: 0: Loading Job’s Plugin timeout is Disabled
2020-04-22 09:13:53: 0: cat: /etc/upstream-release: Is a directory
2020-04-22 09:13:53: 0: Executing plugin command of type ‘Sync Files for Job’
2020-04-22 09:13:53: 0: All job files are already synchronized
2020-04-22 09:13:53: 0: Synchronizing Plugin PDGDeadline from /opt/hfs18.0.416/houdini/pdg/plugins/PDGDeadline took: 0 seconds
2020-04-22 09:13:53: 0: Done executing plugin command of type ‘Sync Files for Job’
2020-04-22 09:13:53: 0: Executing plugin command of type ‘Initialize Plugin’
2020-04-22 09:13:53: 0: INFO: Executing plugin script ‘/home/jcoleman/Thinkbox/Deadline10/slave/AFX-WS-003/plugins/5ea0430e8101669210206d9f/PDGDeadline.py’
2020-04-22 09:13:53: 0: INFO: *********** PDGDeadline InitializeProcess
2020-04-22 09:13:53: 0: INFO: About: PDG Plugin for Deadline
2020-04-22 09:13:53: 0: INFO: Render Job As User disabled, running as current user ‘jcoleman’
2020-04-22 09:13:53: 0: INFO: The job’s environment will be merged with the current environment before rendering
2020-04-22 09:13:53: 0: Done executing plugin command of type ‘Initialize Plugin’
2020-04-22 09:13:54: 0: Start Job timeout is disabled.
2020-04-22 09:13:54: 0: Task timeout is disabled.
2020-04-22 09:13:54: 0: Loaded job: PDG TASKS (5ea0430e8101669210206d9f)
2020-04-22 09:13:54: 0: Executing plugin command of type ‘Start Job’
2020-04-22 09:13:54: 0: DEBUG: S3BackedCache Client is not installed.
2020-04-22 09:13:54: 0: INFO: Executing global asset transfer preload script ‘/home/jcoleman/Thinkbox/Deadline10/slave/AFX-WS-003/plugins/5ea0430e8101669210206d9f/GlobalAssetTransferPreLoad.py’
2020-04-22 09:13:54: 0: INFO: Looking for legacy (pre-10.0.26) AWS Portal File Transfer…
2020-04-22 09:13:54: 0: INFO: Looking for legacy (pre-10.0.26) File Transfer controller in /opt/Thinkbox/S3BackedCache/bin/task.py…
2020-04-22 09:13:54: 0: INFO: Could not find legacy (pre-10.0.26) AWS Portal File Transfer.
2020-04-22 09:13:54: 0: INFO: Legacy (pre-10.0.26) AWS Portal File Transfer is not installed on the system.
2020-04-22 09:13:54: 0: Done executing plugin command of type ‘Start Job’
2020-04-22 09:13:54: 0: Plugin rendering frame(s): 0
2020-04-22 09:13:54: 0: Executing plugin command of type ‘Render Task’
2020-04-22 09:13:54: 0: INFO: StartFrame: 0
2020-04-22 09:13:54: 0: INFO: Startup Directory: /home/jcoleman/mnt/afx_sd_01/Sandbox/Jcoleman/Mutagen_Test/pdgtemp/102793
2020-04-22 09:13:54: 0: INFO: Looking for task file: /home/jcoleman/mnt/afx_sd_01/Sandbox/Jcoleman/Mutagen_Test/pdgtemp/102793/job_6f1422602fd64e69b84bba5e7bbacd7f/task_0.txt
2020-04-22 09:13:54: 0: INFO: Setting PATH: /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/opt/hfs18.0.416/bin
2020-04-22 09:13:54: 0: INFO: Setting Process Environment Variable PATH to /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/opt/hfs18.0.416/bin
2020-04-22 09:13:54: 0: INFO: $HYTHON mapped to: /opt/hfs18.0.416/bin/hython
2020-04-22 09:13:54: 0: INFO: Setting Process Environment Variable PDG_TEMP to /home/jcoleman/mnt/afx_sd_01/Sandbox/Jcoleman/Mutagen_Test/pdgtemp/102793
2020-04-22 09:13:54: 0: INFO: Setting Process Environment Variable PDG_SHARED_TEMP to /home/jcoleman/mnt/afx_sd_01/Sandbox/Jcoleman/Mutagen_Test/pdgtemp/102793
2020-04-22 09:13:54: 0: INFO: Setting Process Environment Variable PDG_SCRIPTDIR to /home/jcoleman/mnt/afx_sd_01/Sandbox/Jcoleman/Mutagen_Test/pdgtemp/102793/scripts
2020-04-22 09:13:54: 0: INFO: Setting Process Environment Variable PDG_DIR to /home/jcoleman/mnt/afx_sd_01/Sandbox/Jcoleman/Mutagen_Test
2020-04-22 09:13:54: 0: INFO: Setting Process Environment Variable PDG_HFS to /opt/hfs18.0.416
2020-04-22 09:13:54: 0: INFO: Setting Process Environment Variable HFS to /opt/hfs18.0.416
2020-04-22 09:13:54: 0: INFO: Setting Process Environment Variable PYTHON to
2020-04-22 09:13:54: 0: INFO: Setting Process Environment Variable PDG_JOBID to 5ea0430e8101669210206d9f
2020-04-22 09:13:54: 0: INFO: Setting Process Environment Variable PDG_JOB_BATCH_NAME to PDG Mutagen_Test_v01_02 2020-04-22 09:13:38.361209
2020-04-22 09:13:54: 0: INFO: Setting Process Environment Variable PDG_ITEM_NAME to ropfetch_geo300
2020-04-22 09:13:54: 0: INFO: Setting Process Environment Variable PDG_INDEX to 300
2020-04-22 09:13:54: 0: INFO: Submit as job: False
2020-04-22 09:13:54: 0: INFO: Setting Process Environment Variable PDG_SUBMIT_AS_JOB to False
2020-04-22 09:13:54: 0: INFO: Setting Process Environment Variable PDG_RESULT_SERVER to AFX-WS-002:36179
2020-04-22 09:13:54: 0: INFO: PDG_RESULT_SERVER: AFX-WS-002:36179
2020-04-22 09:13:54: 0: INFO: Setting Process Environment Variable PDG_HTTP_PORT to None
2020-04-22 09:13:54: 0: INFO: PDG_HTTP_PORT: None
2020-04-22 09:13:54: 0: INFO: Task Executable: /opt/hfs18.0.416/bin/hython
2020-04-22 09:13:54: 0: INFO: Task Arguments: "/home/jcoleman/mnt/afx_sd_01/Sandbox/Jcoleman/Mutagen_Test/pdgtemp/102793/scripts/rop.py" "--batch" "-p" "/home/jcoleman/mnt/afx_sd_01/Sandbox/Jcoleman/Mutagen_Test/Mutagen_Test_v01_02.hiplc" "-n" "/obj/Test_Sim/Explosion_Test" "-i" "ropfetch_geo300" "-fs" "1" "-fe" "100" "-fi" "1"
2020-04-22 09:13:54: 0: INFO: Invoking: Run Process
2020-04-22 09:13:59: 0: STDOUT: cat: /sys/devices/virtual/dmi/id/board_{vendor,name,version}: No such file or directory
2020-04-22 09:13:59: 0: STDOUT: No LSB modules are available.
2020-04-22 09:14:00: 0: STDOUT: [Redshift] Redshift for Houdini plugin version 3.0.17 (Mar 13 2020 15:35:12)
2020-04-22 09:14:00: 0: STDOUT: [Redshift] Plugin compile time HDK version: 18.0.391
2020-04-22 09:14:00: 0: STDOUT: [Redshift] Houdini host version: 18.0.416
2020-04-22 09:14:00: 0: STDOUT: [Redshift] Houdini and the Redshift plugin versions don’t match. Houdini or Redshift may become unestable, with features not available or crashes at render time
2020-04-22 09:14:00: 0: STDOUT: [Redshift] Plugin dso/dll and config path: /usr/redshift/redshift4houdini/18.0.416/dso
2020-04-22 09:14:00: 0: STDOUT: [Redshift] Core data path: /usr/redshift
2020-04-22 09:14:00: 0: STDOUT: [Redshift] Local data path: /home/jcoleman/redshift
2020-04-22 09:14:00: 0: STDOUT: [Redshift] Procedurals path: /usr/redshift/procedurals
2020-04-22 09:14:00: 0: STDOUT: [Redshift] Preferences file path: /home/jcoleman/redshift/preferences.xml
2020-04-22 09:14:00: 0: STDOUT: [Redshift] License path: /home/jcoleman/redshift
2020-04-22 09:14:00: 0: STDOUT: PDG Type Registry: Failed to import duplicate module ‘utils’ which was previously imported from ‘/opt/hfs18.0.416/houdini/pdg/types/utils’
2020-04-22 09:14:00: 0: STDOUT: PDG Type Registry: Failed to import duplicate module ‘houdini’ which was previously imported from ‘/opt/hfs18.0.416/houdini/pdg/types/houdini’
2020-04-22 09:14:00: 0: STDOUT: PDG Type Registry: Failed to import duplicate module ‘partitioners’ which was previously imported from ‘/opt/hfs18.0.416/houdini/pdg/types/partitioners’
2020-04-22 09:14:00: 0: STDOUT: PDG Type Registry: Failed to import duplicate module ‘schedulers’ which was previously imported from ‘/opt/hfs18.0.416/houdini/pdg/types/schedulers’
2020-04-22 09:14:02: 0: STDOUT: Traceback (most recent call last):
2020-04-22 09:14:02: 0: STDOUT: File "/home/jcoleman/mnt/afx_sd_01/Sandbox/Jcoleman/Mutagen_Test/pdgtemp/102793/scripts/rop.py", line 592, in
2020-04-22 09:14:02: 0: STDOUT: args.server = socket.gethostbyname(hostname) + ':' + port
2020-04-22 09:14:02: 0: STDOUT: socket.gaierror: [Errno -2] Name or service not known
2020-04-22 09:14:02: 0: STDOUT: [Redshift] Closing the RS instance. End of the plugin log system.
2020-04-22 09:14:02: 0: WARNING: Process returned non-zero exit code: 1
2020-04-22 09:14:02: 0: Done executing plugin command of type ‘Render Task’

Are the workers able to resolve the AFX-WS-002 hostname (from AFX-WS-002:36179)?

If you try to use this workflow without Deadline, are other machines able to connect to it?

Another test that might bear fruit: while the PDG Result Server is running, can you run telnet AFX-WS-002 36179 from one of the workers and get a result that isn’t a connection error? A blank/nonsense response would mean that the server is listening and the worker machine is able to complete the connection.
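If telnet is not installed on a worker, roughly the same check can be sketched in Python. This is an illustration only: `probe` is a hypothetical helper, it assumes Python 3 (not Houdini 18.0's default hython), and the host and port in the usage comment are the values from the log above.

```python
import socket

def probe(host, port, timeout=3.0):
    """Try a TCP connection and return a short status string."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "connected"
    except socket.gaierror:
        # The hostname could not be resolved; no connection was attempted.
        return "name not resolved"
    except socket.timeout:
        return "timed out"
    except ConnectionRefusedError:
        # The host was reached but nothing is listening on that port.
        return "refused"
    except OSError as exc:
        return "failed: %s" % exc

# Example call from a worker while the result server is running:
# print(probe("AFX-WS-002", 36179))
```

The useful part is the distinction between the outcomes: "name not resolved" points at DNS or the hosts file, while "refused" or "timed out" points at the service or a firewall.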

When I try to run telnet AFX-WS-003 40991 (currently the machine I’m submitting from), I get:

telnet: could not resolve AFX-WS-003/40991: Name or service not known

How might I resolve this?

Well at least that’s consistent. I’d check out if that port is open and if that service is up and running.

You might also have good luck on the SideFX forums. They might have some pointers about where the logs are.

Thanks Justin. I’m already on the forums over at SideFX; so far they are saying it’s a DNS problem. I’m going to mess around with it this weekend and see if I can figure out what’s going on. Thanks for your time!
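For anyone landing here with the same error: on a small farm without a DNS server, one common place to look is each machine's hosts file. The sketch below is only an assumption about where the gap might be, not a confirmed fix for this thread. It parses hosts-file text and lists which farm names have no entry. `hosts_entries` and `missing_hosts` are hypothetical helpers, and the addresses in the sample are placeholders from the documentation range, not real farm addresses.

```python
def hosts_entries(text):
    """Map each hostname in hosts-file text to its address (names lowercased)."""
    table = {}
    for line in text.splitlines():
        # Drop trailing comments, then split into address and names.
        fields = line.split("#", 1)[0].split()
        if len(fields) < 2:
            continue
        address, names = fields[0], fields[1:]
        for name in names:
            table.setdefault(name.lower(), address)
    return table

def missing_hosts(text, wanted):
    """Return the names in wanted that have no entry in the hosts-file text."""
    table = hosts_entries(text)
    return [name for name in wanted if name.lower() not in table]

# Hypothetical sample; 192.0.2.x is a placeholder address.
sample = "127.0.0.1 localhost\n192.0.2.12 AFX-WS-002  # placeholder\n"
print(missing_hosts(sample, ["AFX-WS-002", "AFX-WS-003"]))

# On a real worker:
# with open("/etc/hosts") as f:
#     print(missing_hosts(f.read(), ["AFX-WS-002", "AFX-WS-003"]))
```

A name missing here is not proof of a problem, since the resolver may also consult DNS or other sources depending on the system configuration, but it narrows down where to look.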
