Hi guys,
Here is an interesting question: if I need a simulation with a lot of particles, which is the better way to do it? A flow with as many particles as possible and a moderate number of partitions, or a flow with a few thousand particles and a lot of partitions?
And although speed is important, it is not the only issue. Having a small system means more interactivity, which is what every artist wants. The next thing is parallelism. The simulation itself cannot be calculated on multiple CPUs, but a cached version with fewer particles could be loaded on a lot of machines, and each one could create its own partitions, which I think is just as good as a parallel calculation. (Edit: Which is exactly what happens when Deadline is being used, come to think of it.)
Thanks!
That’s a good question. I’d love to hear Frantic’s take on that!
It sounds like you may have answered your own question. If you can get the
simulation to a point where multiple machines can each create some
partitions (using different random seeds) you can probably get the best
speed and quantity.
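Just to make the idea concrete (this is only a sketch in Python, not Deadline or Krakatoa code, and the machine names, counts, and seeds are made up), the seed-per-partition bookkeeping looks roughly like this:

```python
# Minimal sketch (not Deadline or Krakatoa code): hand partition jobs out
# round-robin across render nodes, giving each partition a unique random seed
# so the cached partitions don't duplicate each other.
import itertools

machines = ["render01", "render02", "render03"]   # hypothetical node names
num_partitions = 12
base_seed = 12345                                  # arbitrary base seed

jobs = []
for index, machine in zip(range(1, num_partitions + 1), itertools.cycle(machines)):
    jobs.append({"partition": index, "machine": machine, "seed": base_seed + index})

for job in jobs:
    print("partition %02d -> %s (seed %d)" % (job["partition"], job["machine"], job["seed"]))
```

The point is simply that every partition gets its own seed, so the caches combine into one bigger cloud instead of stacking identical copies.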
I would suggest pushing each machine to the maximum number of particles it
can comfortably handle. You can start with a small quantity to get your
simulation working as you want, and then crank up the quantities to a higher
value. Remember to keep Viewport Quantity Multiplier set to a low value
while running through the simulation. As long as you are rendering to file
frame-by-frame, the calculations shouldn’t be too unreasonable. It is when
you access frames out of order that PFlow can really slow down. (Also remember that the Viewport Quantity Multiplier is a better choice than the Display operator’s Visible %: the former actually reduces the number of calculated particles, while the latter still calculates them all and just displays fewer.)
I haven’t tested disk throughput, but I would imagine that having a huge
number of very small partitions would be slower than a moderate number of
larger partitions. That said, you need to do what you need to do. If your
simulation is too slow to manage, you’ll need to go with smaller counts.
But you mentioned “thousands”, which sounds pretty light. We usually work
with “millions”, so unless your simulation is very CPU intensive, or your
processor is really slow, you should really be able to push more through in
each pass.
Do you have a .max file you can share with us? We (or another beta tester)
might have some suggestions on how best to attack your specific simulation.
Thanks for the exhaustive reply!
I didn’t have anything specific in mind, it was just a hypothetical argument. The information is valuable, nevertheless.
There are two kinds of users - those who use Deadline and have a small or large farm, and those who run the automatic partitioning on a single machine.
In the first case, the more partitions you can push through Deadline AT THE SAME TIME, the better.
In the latter case, it would be interesting to benchmark PFlow+Krakatoa with different setups, like 10 partitions with 5 million each vs. 5 partitions with 10 million each vs. 50 partitions with 1 million each. I would expect the total saving time to be about the same, but I have been wrong before.
I think that LOADING from fewer partitions would be faster, as there would be less overhead from opening and decompressing multiple files (PRT files are zipped internally and expanded when accessed). This might be true for the saving part, too.
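To put rough numbers behind that (the constants below are pure guesses, not measurements of PRT loading), here is a quick back-of-the-envelope sketch:

```python
# Back-of-the-envelope sketch (made-up constants, not benchmark data):
# load time = per-file overhead (open + unzip setup) + per-particle read cost.
PER_FILE_OVERHEAD_SEC = 0.5      # guessed cost to open/decompress one PRT file
PER_PARTICLE_SEC = 2.0e-7        # guessed cost to read one particle

def estimated_load_time(num_partitions, particles_per_partition):
    file_cost = num_partitions * PER_FILE_OVERHEAD_SEC
    particle_cost = num_partitions * particles_per_partition * PER_PARTICLE_SEC
    return file_cost + particle_cost

# 50 million particles total, split three different ways:
for partitions, per_partition in [(5, 10_000_000), (10, 5_000_000), (50, 1_000_000)]:
    seconds = estimated_load_time(partitions, per_partition)
    print(f"{partitions:>2} partitions x {per_partition:>10,} particles -> ~{seconds:.1f} s")
```

In this toy model the per-particle work is identical in all three cases and only the per-file overhead grows with the partition count, which is why the 50 tiny partitions come out slowest.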
Borislav "Bobo" Petrov
Technical Director 3D VFX
Frantic Films Winnipeg
There are three main factors that come into play when deciding how to balance things and optimize performance for a given system. They are:
- Memory usage
- Particle system performance
- Network and disk I/O, in particular multiple machines reading and writing large amounts of data on one server
Memory usage is often dictated by particle count, but in some cases, especially if you’re creating a lot of particles and then culling them, it can take a lot of memory to produce a relatively small number of particles. Generally, you want to process as many particles as you can at once.
Particle system performance affects the total throughput of your particle generation process, and in most systems this is really fast. Sometimes, with systems that are slow because of collisions, inter-particle interactions, or scripted operators, this becomes the bottleneck, in which case you’ll want to process fewer particles at once and use more partitions.
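A rough way to picture the trade-off between those first two factors (all the numbers here are assumptions for illustration, not Krakatoa internals):

```python
import math

# Rough sketch (assumed numbers, not Krakatoa internals): size each partition
# from the memory budget, then tighten it if the simulation itself (collisions,
# scripted operators) is the bottleneck, and derive the partition count.
MEMORY_BUDGET_BYTES = 4 * 1024**3    # assume ~4 GB usable per machine
BYTES_PER_PARTICLE = 200             # assumed per-particle memory footprint

def plan_partitions(total_particles, sim_sec_per_million_per_frame, max_sec_per_frame):
    per_partition = MEMORY_BUDGET_BYTES // BYTES_PER_PARTICLE          # memory bound
    sim_bound = int(max_sec_per_frame / sim_sec_per_million_per_frame * 1_000_000)
    per_partition = min(per_partition, sim_bound, total_particles)     # tightest constraint wins
    return math.ceil(total_particles / per_partition), per_partition

# e.g. 20 million particles, a sim costing ~2 s per million particles per frame,
# and a target of at most 30 s of simulation time per frame:
print(plan_partitions(20_000_000, sim_sec_per_million_per_frame=2.0, max_sec_per_frame=30.0))
```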
The final factor, the available bandwidth, limits how much processing you can do at once. What we’ve found is that because reading or writing particle data can saturate this bandwidth pretty quickly, you need to monitor the performance and limit the number of machines reading from and writing to a server at once. We do this with the limit group feature in Deadline, generally limiting the total number of machines touching the server to 10 or so, but you’ll want to confirm those numbers based on your own hardware.
As an example, if you wanted to save 20 million particles, you might split that into 5 partitions of 4 million particles each. At that level, there’s a good chance you can have 5 machines writing and 5 machines reading particles from the server to render at the same time without hitting any serious performance bottlenecks.
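Spelled out as a tiny sketch (illustrative only, using the numbers from that example and a hypothetical cap of 10 machines per server):

```python
# Sketch of the example above (illustrative only): split the particle count into
# partitions and check that simultaneous writers + readers stay under the
# per-server machine cap, the way a limit group would enforce it.
TOTAL_PARTICLES = 20_000_000
NUM_PARTITIONS = 5
SERVER_MACHINE_CAP = 10          # hypothetical limit of ~10 machines per server

particles_per_partition = TOTAL_PARTICLES // NUM_PARTITIONS
writers = NUM_PARTITIONS         # machines saving partitions
readers = NUM_PARTITIONS         # machines loading them back for rendering

print(f"{NUM_PARTITIONS} partitions x {particles_per_partition:,} particles each")
print(f"concurrent machines on the server: {writers + readers} (cap {SERVER_MACHINE_CAP})")
assert writers + readers <= SERVER_MACHINE_CAP, "stagger jobs to stay under the server cap"
```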
Cheers,
Mark