Hi guys,
Here is an interesting question: if I need a simulation with a lot of particles, which is the better way to do it? A flow with as many particles as possible and a moderate number of partitions, or a flow with a few thousand particles and a lot of partitions?
And although speed is important, it is not the only issue. Having a small system means more interactivity, which is what every artist wants. The next thing is parallelism. The simulation itself cannot be calculated on multiple CPUs, but a cached version with fewer particles could be loaded on a lot of machines, and each one could create its own partitions, which I think is just as good as a parallel calculation. (Edit: Which is exactly what happens when Deadline is being used, come to think of it.)
Thanks!
That’s a good question. I’d love to hear Frantic’s take on that!
It sounds like you may have answered your own question. If you can get the
simulation to a point where multiple machines can each create some
partitions (using different random seeds) you can probably get the best
speed and quantity.
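Just to make the idea concrete (this is only a sketch in Python, not Deadline or Krakatoa code, and the machine names, counts, and seeds are made up), the seed-per-partition bookkeeping looks roughly like this:

```python
# Minimal sketch (not Deadline or Krakatoa code): hand partition jobs out
# round-robin across render nodes, giving each partition a unique random seed
# so the cached partitions don't duplicate each other.
import itertools

machines = ["render01", "render02", "render03"]   # hypothetical node names
num_partitions = 12
base_seed = 12345                                  # arbitrary base seed

jobs = []
for index, machine in zip(range(1, num_partitions + 1), itertools.cycle(machines)):
    jobs.append({"partition": index, "machine": machine, "seed": base_seed + index})

for job in jobs:
    print("partition %02d -> %s (seed %d)" % (job["partition"], job["machine"], job["seed"]))
```

The point is simply that every partition gets its own seed, so the caches combine into one bigger cloud instead of stacking identical copies.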
I would suggest pushing each machine to the maximum number of particles it
can comfortably handle. You can start with a small quantity to get your
simulation working as you want, and then crank up the quantities to a higher
value. Remember to keep Viewport Quantity Multiplier set to a low value
while running through the simulation. As long as you are rendering to file
frame-by-frame, the calculations shouldn’t be too unreasonable. It is when
you access frames out of order that PFlow can really slow down. (Also remember that the Viewport Quantity Multiplier is a better choice than the Display operator’s Visible %: the former actually reduces the number of calculated particles, while the latter still calculates them all and just displays fewer.)
I haven’t tested disk throughput, but I would imagine that having a huge
number of very small partitions would be slower than a moderate number of
larger partitions. That said, you need to do what you need to do. If your
simulation is too slow to manage, you’ll need to go with smaller counts.
But you mentioned “thousands”, which sounds pretty light. We usually work
with “millions”, so unless your simulation is very CPU intensive, or your
processor is really slow, you should really be able to push more through in
each pass.
Do you have a .max file you can share with us? We (or another beta tester)
might have some suggestions on how best to attack your specific simulation.
Thanks for the exhaustive reply!
I didn’t have anything specific in mind, it was just a hypothetical argument. The information is valuable, nevertheless.
There are two kinds of users - those who use Deadline and have a small or large farm, and those who run the automatic partitioning on a single machine.
In the first case, the more partitions you can push through Deadline AT THE SAME TIME, the better.
In the latter case, it would be interesting to benchmark PFlow+Krakatoa with different setups, like 10 partitions with 5 million each vs. 5 partitions with 10 million each vs. 50 partitions with 1 million each. I would expect the total saving time to be about the same, but I have been wrong before.
I think that LOADING from fewer partitions would be faster, as there would be less overhead from opening and decompressing multiple files (PRT files are zipped internally and expanded when accessed). This might be true for the saving part, too.
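To put rough numbers behind that (the constants below are pure guesses, not measurements of PRT loading), here is a quick back-of-the-envelope sketch:

```python
# Back-of-the-envelope sketch (made-up constants, not benchmark data):
# load time = per-file overhead (open + unzip setup) + per-particle read cost.
PER_FILE_OVERHEAD_SEC = 0.5      # guessed cost to open/decompress one PRT file
PER_PARTICLE_SEC = 2.0e-7        # guessed cost to read one particle

def estimated_load_time(num_partitions, particles_per_partition):
    file_cost = num_partitions * PER_FILE_OVERHEAD_SEC
    particle_cost = num_partitions * particles_per_partition * PER_PARTICLE_SEC
    return file_cost + particle_cost

# 50 million particles total, split three different ways:
for partitions, per_partition in [(5, 10_000_000), (10, 5_000_000), (50, 1_000_000)]:
    seconds = estimated_load_time(partitions, per_partition)
    print(f"{partitions:>2} partitions x {per_partition:>10,} particles -> ~{seconds:.1f} s")
```

In this toy model the per-particle work is identical in all three cases and only the per-file overhead grows with the partition count, which is why the 50 tiny partitions come out slowest.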
Borislav "Bobo" Petrov
Technical Director 3D VFX
Frantic Films Winnipeg
There are three main factors that come into play when deciding how to balance things and optimize performance for a given system. They are:
- Memory usage
- Particle system performance
- Network and disk I/O, in particular multiple machines reading and writing large amounts of data on one server
Memory usage is often dictated by particle count, but in some cases, especially if you’re creating a lot of particles and then culling them, it can take a lot of memory to produce a relatively small number of particles. Generally, you want to process as many particles as you can at once.
Particle system performance affects the total throughput of your particle generation process, and in most systems this is really fast. Sometimes, with systems that are slow because of collisions, inter-particle interactions, or scripted operators, this becomes the bottleneck, in which case you’ll want to process fewer particles at once and use more partitions.
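A rough way to picture the trade-off between those first two factors (all the numbers here are assumptions for illustration, not Krakatoa internals):

```python
import math

# Rough sketch (assumed numbers, not Krakatoa internals): size each partition
# from the memory budget, then tighten it if the simulation itself (collisions,
# scripted operators) is the bottleneck, and derive the partition count.
MEMORY_BUDGET_BYTES = 4 * 1024**3    # assume ~4 GB usable per machine
BYTES_PER_PARTICLE = 200             # assumed per-particle memory footprint

def plan_partitions(total_particles, sim_sec_per_million_per_frame, max_sec_per_frame):
    per_partition = MEMORY_BUDGET_BYTES // BYTES_PER_PARTICLE          # memory bound
    sim_bound = int(max_sec_per_frame / sim_sec_per_million_per_frame * 1_000_000)
    per_partition = min(per_partition, sim_bound, total_particles)     # tightest constraint wins
    return math.ceil(total_particles / per_partition), per_partition

# e.g. 20 million particles, a sim costing ~2 s per million particles per frame,
# and a target of at most 30 s of simulation time per frame:
print(plan_partitions(20_000_000, sim_sec_per_million_per_frame=2.0, max_sec_per_frame=30.0))
```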
The final factor, the available bandwidth, limits how much processing you can do at once. What we’ve found is that because reading or writing particle data can saturate this bandwidth pretty quickly, you need to monitor the performance and limit the number of machines reading from and writing to a server at once. We do this with the limit group feature in Deadline, generally limiting the total number of machines touching the server to 10 or so, but you’ll want to confirm those numbers based on your own hardware.
As an example, if you wanted to save 20 million particles, you might split that into 5 partitions of 4 million particles each. At that level, there’s a good chance you can have 5 machines writing and 5 machines reading particles from the server to render at the same time without hitting any serious performance bottlenecks.
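Spelled out as a tiny sketch (illustrative only, using the numbers from that example and a hypothetical cap of 10 machines per server):

```python
# Sketch of the example above (illustrative only): split the particle count into
# partitions and check that simultaneous writers + readers stay under the
# per-server machine cap, the way a limit group would enforce it.
TOTAL_PARTICLES = 20_000_000
NUM_PARTITIONS = 5
SERVER_MACHINE_CAP = 10          # hypothetical limit of ~10 machines per server

particles_per_partition = TOTAL_PARTICLES // NUM_PARTITIONS
writers = NUM_PARTITIONS         # machines saving partitions
readers = NUM_PARTITIONS         # machines loading them back for rendering

print(f"{NUM_PARTITIONS} partitions x {particles_per_partition:,} particles each")
print(f"concurrent machines on the server: {writers + readers} (cap {SERVER_MACHINE_CAP})")
assert writers + readers <= SERVER_MACHINE_CAP, "stagger jobs to stay under the server cap"
```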
Cheers,
Mark