Experiences? Large scale farm performance and future updates (Houdini, Maya)

I’ll stick to your numbers here:

1: Assuming 20tsd/40tsd is short for 20,000/40,000 jobs? It’s hard to give a solid number on how quickly the Monitor will update a job list that size, as the update is effectively a query against the database.

Using an RCS (Remote Connection Server) will make it faster, as clients will be able to take advantage of the request caching built into the RCS.

In terms of usability, the performance settings are used to increase/decrease how often Monitors poll the database for updated data. The idea being that if updating all the jobs takes 30 seconds, you shouldn’t poll for new data every 10 seconds. The suggested interval only scales based on the number of Workers, so I can’t just punch in 20,000 jobs and get you an estimate. However, at 1,000 Workers it suggests a Job update every 46 seconds, meaning the data in the Monitor will be at most 46 seconds out of date.
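To illustrate the trade-off being described (this is a hypothetical rule of thumb, not Deadline’s actual formula), the idea is simply that the poll interval should never be shorter than the refresh itself takes, plus some headroom:

```python
def suggested_poll_interval(refresh_seconds: float, margin: float = 0.5) -> float:
    """Hypothetical rule of thumb: poll no faster than the refresh itself
    takes, plus a safety margin. NOT Deadline's real scaling formula."""
    return refresh_seconds * (1.0 + margin)

# If a full job-list refresh takes 30 seconds, polling every 10 seconds
# just queues up overlapping queries; ~45 seconds is a saner interval.
print(suggested_poll_interval(30.0))  # → 45.0
```

The actual interval Deadline suggests scales with Worker count, as noted above; this sketch only captures the general reasoning.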

However, I’ve never been a render wrangler, so if someone who’s actually running a farm that big could weigh in, that’d be great.

2: In general the support team can get an unofficial patch out for a new release before full support is added to the product, or publish the updated plugin code here in the forums early.

For Maya 2024, we’ve created patch files and they’re up in the forums in the Maya 2024 Patch files thread.

Applying these patches manually isn’t usually complicated, and you can use this Help Centre guide to walk through it if the support team hasn’t done it already.

Assuming the product developers (Autodesk for Maya, for example) haven’t wildly changed the rendering API, the patch should work immediately. New features may or may not work, however; it can vary product to product and update to update.

Ideally we can get the patch files out within a couple of days, and full support out in the next release. However, due to Amazon’s policies, we’re not able to comment on when releases will happen or what they’ll contain.

3:
a) The Workers do not communicate with each other to coordinate resources. It’s possible to assign CPU cores and GPUs using their respective affinity settings, but we don’t have a similar feature for memory usage.
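To sketch what that partitioning looks like in practice (this is a hypothetical helper, not a Deadline API — in Deadline you’d enter these index lists manually in each Worker’s CPU Affinity / GPU Affinity settings), splitting a 16-core, 2-GPU machine across two Workers might look like:

```python
def split_affinity(num_cores: int, num_gpus: int, num_workers: int):
    """Evenly partition CPU core and GPU indices across Workers.
    Hypothetical helper for illustration; memory has no equivalent
    affinity setting, so it cannot be partitioned this way."""
    cores = list(range(num_cores))
    gpus = list(range(num_gpus))
    return [
        {"cores": cores[i::num_workers], "gpus": gpus[i::num_workers]}
        for i in range(num_workers)
    ]

for i, w in enumerate(split_affinity(16, 2, 2)):
    print(f"worker{i}: cores={w['cores']} gpus={w['gpus']}")
```

Each Worker gets a disjoint set of cores and one GPU, so two renders on the same box don’t fight over compute; memory contention remains possible, which is the gap noted above.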

b) No, not by default. The reason being that all our AWS integrations (AWS Portal — Deadline 10.2.1.1 documentation / Spot Event Plugin — Deadline 10.2.1.1 documentation) assume a single Worker per VM, and terminate the VM when that Worker is idle. There’s nothing stopping you from starting your own VMs using images that have multiple Workers configured, however!

The idle-detection limitation is a byproduct of our implementation. On AWS, instance size and price should scale linearly, meaning two 8 GB machines should cost the same as a single 16 GB machine of the same family. So our usual recommendation is to size the instances to match the work rather than running multiple Workers per instance.
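A back-of-the-envelope sketch of why that recommendation holds (the hourly rates below are made-up placeholders, not real AWS prices): with linear pricing, one-Worker-per-VM wins whenever jobs finish at different times, because each VM can be terminated independently.

```python
# Hypothetical hourly rates: linear scaling within a family means the
# 16 GB size costs twice the 8 GB size (made-up numbers, not AWS prices).
rate_8gb, rate_16gb = 0.10, 0.20

# Two renders: one takes 1 hour, the other 4 hours.
hours = [1, 4]

# Two 8 GB VMs, one Worker each: idle detection shuts each down when done.
two_small = sum(h * rate_8gb for h in hours)

# One 16 GB VM with two Workers: the VM runs until the *last* job finishes.
one_large = max(hours) * rate_16gb

print(round(two_small, 2), round(one_large, 2))  # → 0.5 0.8
```

The gap grows the more uneven the job lengths are, which is why sizing instances to the work usually beats stacking Workers on one big VM.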