AWS Thinkbox Discussion Forums

Deadline 8.0.19.1 not detecting correct number of CPUs

Is there any way to have Deadline detect the correct number of CPUs? It’s a Xeon Gold processor. We’re missing about half, maybe more.

I’m assuming you have the 44 thread guy as shown here.

Deadline 8.0 should be using the 64-bit processor mask, which should (unsurprisingly) allow us to show and control 64 virtual cores per socket. Is this a multiple-socket machine? The way Windows iterates over NUMA nodes doesn’t work well with Deadline at the moment, so you will likely only see cores from one of the two sockets.
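For anyone unfamiliar with affinity masks, here is a minimal Python sketch of how a 64-bit mask encodes a set of logical processors, one bit per processor. The helper names are hypothetical and this is not Deadline's code; it only illustrates why a single 64-bit mask cannot address more than 64 logical processors.

```python
MASK_BITS = 64  # width of a single affinity mask on 64-bit Windows


def cores_to_mask(cores):
    """Return an integer bitmask with one bit set per logical processor index."""
    mask = 0
    for core in cores:
        if not 0 <= core < MASK_BITS:
            raise ValueError(f"core {core} does not fit in a {MASK_BITS}-bit mask")
        mask |= 1 << core
    return mask


def mask_to_cores(mask):
    """Return the sorted list of logical processor indices set in the mask."""
    return [bit for bit in range(MASK_BITS) if mask & (1 << bit)]


if __name__ == "__main__":
    # The first 18 logical processors occupy bits 0 through 17.
    print(hex(cores_to_mask(range(18))))  # 0x3ffff
```

Any index of 64 or above is rejected, which is the limit a plain 64-bit mask runs into on machines with more logical processors than that.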

That said, you can disable the affinity (I believe this may have come after 8.0) and then lock down different Slaves to different NUMA nodes with external software. That may get you there.
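As one possible way to pin a process to a NUMA node without third-party tools, the Windows `start` command accepts `/NODE` and `/AFFINITY` switches. This is not something suggested in the thread, only a sketch of the idea. The Python helper below just builds such a command line; the function name and the program name `slave_launcher.exe` are hypothetical placeholders, and whether child render processes end up where you want is something to verify on your own machine. When `/NODE` and `/AFFINITY` are combined, the mask is interpreted relative to the node, so bit 0 is the node's first logical processor.

```python
def numa_start_command(program, node, cores_in_node):
    """Build a Windows `start` command pinning `program` to a NUMA node.

    `cores_in_node` holds processor indices relative to the node (0 is the
    node's first logical processor), matching how `start` reads /AFFINITY
    when /NODE is also given.
    """
    mask = 0
    for core in cores_in_node:
        mask |= 1 << core
    # The empty "" is the window title, so the quoted program is not mistaken for it.
    return f'start "" /NODE {node} /AFFINITY {mask:#x} "{program}"'


if __name__ == "__main__":
    # Hypothetical program name; substitute whatever actually launches your Slave.
    print(numa_start_command("slave_launcher.exe", 1, range(18)))
```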

We’ve got two of these here and it’s a multiple-socket machine. So yes, it looks like we’re just seeing half. What I didn’t quite catch was the part about disabling affinity (it is off by default in Deadline 8) and about locking down different Slaves to different NUMA nodes. Do you have a link I can read about this?
Is this an issue that happens only on version 8, or does it also happen in more recent versions? Thanks!

Specs:

It does use all of the procs with no affinity, but if we want to assign a specific number of procs, we can only do that with half of them.

If it’s already disabled, you’re in good shape and at least you can make use of the cores.

The issue at the moment is that the API for controlling cores changed some years ago and we’ll have to upgrade some core components to support it. Demand seems to have been low in recent years and I’m not sure if folks are just using CPU affinity less or not.

I did bring up Process Lasso over here as it’s able to remember affinity across program restarts, but I’ve never used it.

How are you planning to break up CPU affinity? I’d like to throw your use case in when I bring it up to the dev team.

To us, using affinity makes a lot of sense: it lets us take these machines that have a huge number of cores (and RAM) and run several instances that can take on different, faster render jobs (saving licenses in the process). Unfortunately, right now we can only get access to half of the cores with affinity turned on.

Ideally we would break up a single machine and run 4 instances of 18 cores each. Processes seem to be more efficient and faster with CPU affinity turned on, even though we’re only using half of the cores at the moment.

Investigating the problem a little more, it seems to have to do with “processor groups”. The 72 cores are split into 2 processor groups (36 procs each), hence why they can use all of them; it would seem Deadline doesn’t do processor groups. Unfortunately Process Lasso doesn’t either.
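To make the layout described above concrete (72 logical processors in 2 groups of 36, carved into 4 instances of 18), here is a rough Python sketch of the arithmetic. The function name is hypothetical and nothing here calls a Windows or Deadline API. The point is that each instance needs a (group, mask) pair, with the mask relative to its group, so a tool that only handles a single mask can only ever describe the processors in one group.

```python
def plan_instances(total_cores, group_size, cores_per_instance):
    """Split logical processors into (group, mask) assignments, one per instance.

    Each mask is relative to its group: bit 0 is the group's first processor.
    Instances never straddle two groups.
    """
    if total_cores % group_size != 0:
        raise ValueError("total_cores must be a multiple of group_size")
    if group_size % cores_per_instance != 0:
        raise ValueError("group_size must be a multiple of cores_per_instance")

    plan = []
    for group in range(total_cores // group_size):
        for slot in range(group_size // cores_per_instance):
            first_bit = slot * cores_per_instance
            # A run of cores_per_instance set bits, shifted to this slot.
            mask = ((1 << cores_per_instance) - 1) << first_bit
            plan.append((group, mask))
    return plan


if __name__ == "__main__":
    for group, mask in plan_instances(72, 36, 18):
        print(f"group {group}: mask {mask:#x}")
```

For the machine in question this yields two instances per group, with masks 0x3ffff and 0xffffc0000 in each group.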

Ugh. Okay, then we’re stuck for now.

I’ll raise this with the core team so they’re aware of it again, but that doesn’t mean we can put it on the roadmap at the moment.

Have you tried using multiple Slaves to see if the OS’s scheduling naturally handles moving processes around for you? I’m assuming you’re hitting problems with memory bandwidth in the machines.
