This might be more of a feature request, but we’re rendering on AWS, and due to limited availability we use a wide range of G4 instance types for GPU rendering.
So without knowing how many GPUs we’re getting (1, 4 or 8), is it possible to, for example, set a maximum of 2 GPUs per task and have the number of concurrent tasks auto-adjust to the number of GPUs available on a per-worker basis (e.g. 2 concurrent tasks on a 4-GPU instance, 4 on an 8-GPU instance)? It would definitely be a very handy feature to save both money and time.
Are you using AWS Portal or the Spot Event Plugin for AWS rendering? I’m not aware of any feature like the one you described, but below are some ideas that might help.
If you’re using the Spot Event Plugin, you could configure the EC2 instance UserData script in the Spot Fleet Request to query the instance metadata and perform some additional configuration based on the instance type. Alternatively, you might be able to query the system specs with some terminal commands in the UserData. With that information, you could either increase a worker’s concurrent task limit or configure a number of additional workers to start based on the instance specs (see the sketch below). I haven’t tested this yet, so there might be some caveats or limitations.
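To sketch what I mean (very rough and untested; the G4dn GPU counts are from memory and the final configuration step is just a placeholder for whatever command your Deadline setup actually uses):

#!/bin/bash
# Untested sketch: branch worker configuration on the EC2 instance type.
# Fetch the instance type via IMDSv2 from the metadata endpoint.
TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 60")
INSTANCE_TYPE=$(curl -s -H "X-aws-ec2-metadata-token: $TOKEN" "http://169.254.169.254/latest/meta-data/instance-type")

# With a max of 2 GPUs per task: a 4-GPU 12xlarge could run 2 tasks (1 extra
# worker), an 8-GPU metal could run 4 (3 extra). Single-GPU sizes need none.
case "$INSTANCE_TYPE" in
  g4dn.12xlarge) EXTRA_WORKERS=1 ;;
  g4dn.metal)    EXTRA_WORKERS=3 ;;
  *)             EXTRA_WORKERS=0 ;;
esac

# Placeholder: start the extra workers (or raise the concurrent task limit)
# here with whatever mechanism your Deadline install provides.
echo "Instance type $INSTANCE_TYPE -> would start $EXTRA_WORKERS extra worker(s)"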
Unfortunately, I don’t believe there is a way to modify the AWS Portal Spot Fleet UserData. The only way I’ve been able to get close is by creating AWS Portal Infrastructure and configuring the Spot Event Plugin to spin up workers within that infrastructure. However, I’d only recommend this if you plan to leave AWS Portal running 24/7 because you’ll need to reconfigure the Spot Event Plugin’s Spot Fleet Request every time you spin up/down the portal infrastructure.
Hope this helps (and makes sense)! Curious to hear what you end up doing, and what others might recommend.
Thanks, that’s a good tip! We actually run both setups in different locations: a 24/7 gateway running with a custom Spot Event script, and just the Spot Fleet. We’re transitioning over to the Spot Fleet though, as we’re moving everything into the cloud; it’s easier to maintain, and we don’t need the costly gateway and asset server to transfer from on-prem.
But yeah, that might be a way to do it!
I believe you can grab the number of GPUs with this line:
nvidia-smi --query-gpu=name --format=csv,noheader | wc -l
I just have to dig up where and how the worker config is done. Hopefully Thinkbox can help with some pointers here, but I’ll definitely look into it.
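Something like this is what I have in mind for the UserData, though it’s untested and the last step is a placeholder until I track down the right Deadline command to actually apply the value:

#!/bin/bash
# Untested sketch: derive a concurrent task count from the GPUs actually present.
GPU_COUNT=$(nvidia-smi --query-gpu=name --format=csv,noheader | wc -l)

# Assuming a max of 2 GPUs per task, as in my original question.
GPUS_PER_TASK=2
CONCURRENT_TASKS=$(( GPU_COUNT / GPUS_PER_TASK ))
[ "$CONCURRENT_TASKS" -lt 1 ] && CONCURRENT_TASKS=1

# Placeholder: this is where the worker's concurrent task limit (or the number
# of extra workers to launch) would be applied once I find the right command.
echo "$GPU_COUNT GPU(s) detected -> $CONCURRENT_TASKS concurrent task(s)"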