Houdini PDG "submit job" hython consumes 1 slave

Ahmed_Hindy · November 29, 2023, 8:25am

Hi all,
We get a lot of electrical blackouts here, so regular PDG submission to Deadline doesn’t work: if the machine running the graph (the local submission machine) turns off, the PDG tasks on the farm will always error out.

So the solution seems to be submitting the graph itself to run on the farm, but the problem is that it occupies one machine, preventing that machine from picking up a “real” job.

I tried giving both “jobs” the same name so that I could use “concurrent tasks”, but it doesn’t work like that. Both must be the same job.

I need some advice: how do you deal with power cuts or intranet outages while running PDG on the farm? If the answer is to submit graphs, then how do you keep the machine running a graph from sitting idle on an extremely light hython task, so that it picks up a real render/sim task concurrently?

anthonygelatka · November 29, 2023, 10:17am

Could you run 2 workers on one machine, allocate 1 CPU core to the first and give it its own pool/group, then allocate the rest of the CPU cores to the other worker and let it process the rest of the jobs?
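
A rough sketch of the core split, in case it helps. This is plain Python and does not call Deadline at all; `split_cores` and the worker labels are made-up names for illustration, and you would still have to apply the resulting affinity to each worker in Deadline yourself.

```python
import os


def split_cores(total_cores, reserved_for_graph=1):
    """Split core indices between a light 'graph' worker and a 'work' worker.

    Returns (graph_cores, work_cores) as two lists of core indices.
    Hypothetical helper: it only computes the split, it does not set affinity.
    """
    if total_cores < 2:
        raise ValueError("need at least 2 cores to split between two workers")
    if not 0 < reserved_for_graph < total_cores:
        raise ValueError("reserved_for_graph must leave at least 1 core for real work")
    cores = list(range(total_cores))
    # The first core(s) go to the worker that only babysits the PDG graph,
    # everything else goes to the worker that renders/sims.
    return cores[:reserved_for_graph], cores[reserved_for_graph:]


if __name__ == "__main__":
    # Fall back to 2 cores if the count is unknown or the machine has only 1.
    total = max(os.cpu_count() or 2, 2)
    graph_cores, work_cores = split_cores(total)
    print("graph worker cores:", graph_cores)
    print("work worker cores: ", work_cores)
```

The graph worker would then sit in its own pool/group so that only the PDG graph job lands on it, while the second worker stays in the normal render pools.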

Ahmed_Hindy · December 11, 2023, 6:33pm

Good idea. I didn’t know I could assign only 1 CPU core, which is great. Any “better” ideas that don’t involve a second slave?

zainali · December 11, 2023, 11:04pm

Hello @Ahmed_Hindy

I do not think you can run concurrent jobs. Concurrency works on the tasks of a single job.

Why does the graph job run forever? I am not very familiar with PDG, so sorry about the naïve question.

If the job does not actually run indefinitely, i.e. it completes its work but then hangs, we might be able to work around it by using a task/job timeout, or by completing the job based on what is printed in the logs.

Ahmed_Hindy · December 27, 2023, 4:19pm

It hangs because PDG (“Procedural Dependency Graph”) runs on a submitter machine and sends dynamic work items to the slaves. The number of tasks increases as needed, and Houdini’s PDG is what handles all of that.
If the submitter machine fails for some reason, then all the slave jobs will simply fail, since they can’t communicate with the PDG graph running on the submitter.

My point is: I can send the submitter to run as a separate job on the farm, but it’s a waste of a slave, since it uses close to 0% CPU and hardly any other resources. I am thinking of running dual slaves on some machines, then limiting my PDG graph submission pool to those dual-slave machines.

All ideas are appreciated.