Here is a recent benchmark using Stoke Release Candidate 2.
I already posted some numbers in another thread, but now I have all the runs with various thread counts and would like to discuss what they show.
The test used a single, fairly default FumeFX simulation of a burning Teapot. Emission was performed from the surface of the Teapot, but I also tested emission from the FumeFX Source, and it produced the same results and took the same time.
The simulation generated 100,000 particles per frame over 101 frames, for a total of about 10 million particles on the last frame.
Saving to PRTs amounted to 15.2GB of data on disk. It made no difference whether the saving was done on an HDD or SSD, since the saving performance was CPU-bound all the time. Most of the time was spent zipping the streams.
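For a rough sense of what 15.2GB means per particle, here is a back-of-the-envelope sketch. It assumes (my assumption, not something measured) that no particles die, so frame N holds N x 100,000 particles, and that "GB" means 10^9 bytes. The function names are made up for this illustration.

```python
def total_particle_records(per_frame, frames):
    """Sum of particles written over all frames, assuming frame i holds i * per_frame particles."""
    return per_frame * frames * (frames + 1) // 2


def bytes_per_record(total_bytes, records):
    """Average compressed size of one saved particle."""
    return total_bytes / records


# Figures from the post; the accumulation model is an assumption.
records = total_particle_records(100_000, 101)
avg_bytes = bytes_per_record(15.2e9, records)
print(f"{records:,} particle records, ~{avg_bytes:.1f} bytes each on disk")
```

If the first frame is actually empty, or if the GB figure is binary, the per-particle number shifts by a few percent, so treat it as an order-of-magnitude estimate only.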
The Particle Flow run used an Integration Step of Half Frame; it would have been 213 seconds faster with a One Frame step.
I used an HP Z420 Quad Xeon machine with 32 GB of RAM. The Memory Limit was set to 24GB. Simulating to Memory Cache only produced 19GB of actual data. With less RAM, Stoke would of course have been slower - I might run some more tests with 8GB and 4GB to show what happens.
In the table / graph below, you can see the comparison of simulating with PFlow+FumeFX Follow+Saving with Krakatoa vs. Stoke simulating on 8 threads and saving on 1 to 8 threads.
You will notice that the Stoke Simulation time is around 10 times shorter than the PFlow simulation time. Switching PFlow to 1 Frame Integration Step would reduce that to 5x. Saving with Stoke using only one thread though is only 1.41 times faster.
As we go to 4 cores on a Quad CPU, we get quite a nice speed up from saving 2, 3, or 4 PRT files in parallel in the background - two cores save 1.9x faster than 1, three cores save 2.7x faster than 1, and four cores are 3.5x faster than 1.
Unfortunately, this does not hold true for Hyper-Threading. Adding 1 to 4 threads via HT does not improve performance nearly as much. Running on 8 threads is only 4.6x faster than running on one thread! Still, it squeezes some more seconds out of the machine - 8 threads are still 95 seconds faster overall than 4 threads, but the addition of 4 Hyper-Threads gives us only about a 30% performance boost.
Compared to PFlow, 4 Threads finish partitioning nearly 5x faster, and 8 Threads are 6.5 times faster.
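To make the scaling easier to see, here is a small sketch that recomputes the derived figures from the speedups quoted above. The variable and function names are mine, and "efficiency" (speedup divided by thread count) is a standard measure I am adding for illustration; the input numbers are the ones from this post.

```python
def parallel_efficiency(speedup, threads):
    """Fraction of ideal linear scaling achieved (1.0 = perfect)."""
    return speedup / threads


# Saving speedup relative to one Stoke saving thread (from the post).
save_speedup = {1: 1.0, 2: 1.9, 3: 2.7, 4: 3.5, 8: 4.6}

efficiency = {n: parallel_efficiency(s, n) for n, s in save_speedup.items()}

# Extra throughput from the 4 Hyper-Threads on top of 4 physical cores.
ht_gain = save_speedup[8] / save_speedup[4] - 1.0

# One Stoke thread was 1.41x faster than PFlow, so chain the two ratios.
stoke_one_thread_vs_pflow = 1.41
vs_pflow = {n: stoke_one_thread_vs_pflow * save_speedup[n] for n in (4, 8)}

for n in sorted(efficiency):
    print(f"{n} threads: {save_speedup[n]:.1f}x, efficiency {efficiency[n]:.0%}")
print(f"Hyper-Threading gain over 4 cores: {ht_gain:.0%}")
print(f"vs PFlow: 4 threads {vs_pflow[4]:.1f}x, 8 threads {vs_pflow[8]:.1f}x")
```

The physical cores stay close to linear scaling, while the 8-thread run gets a little over half of the ideal 8x, which matches the roughly 30% Hyper-Threading gain mentioned above.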
Note that the Simulation and Saving in PFlow are sequential - each frame is calculated, then saved to disk, then the next frame is processed. Thus, the sum of the Simulation and Saving (labeled as Flush) times is the Total time.
In the case of Stoke, the Simulation and Saving are asynchronous - they run in parallel thanks to the memory buffering and background caching threads. As a result, the Total time is shorter than the sum of the Simulation and Saving times. The only exception is saving with one thread, since the saving is significantly slower than the simulation and has to do most of the work after the simulation has finished.
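The difference between the two approaches can be illustrated with a toy timing model. This is only a sketch, not how Stoke is implemented: it assumes a constant cost per frame, an unlimited memory buffer, and saving threads that scale perfectly. The per-frame costs (1 and 4 time units) are made-up numbers, and the function names are hypothetical.

```python
import heapq


def sequential_total(frames, sim_per_frame, save_per_frame):
    """Simulate a frame, save it, then move on: the times simply add up."""
    return frames * (sim_per_frame + save_per_frame)


def pipelined_total(frames, sim_per_frame, save_per_frame, save_threads):
    """Finish time when background threads save frames while simulation continues."""
    # Each heap entry is the time at which one saving thread becomes free.
    workers = [0.0] * save_threads
    heapq.heapify(workers)
    finish = 0.0
    for i in range(1, frames + 1):
        ready = i * sim_per_frame          # frame i leaves the simulator
        free_at = heapq.heappop(workers)   # earliest available saving thread
        done = max(ready, free_at) + save_per_frame
        heapq.heappush(workers, done)
        finish = max(finish, done)
    return finish


# Hypothetical costs: simulating a frame takes 1 unit, saving it takes 4.
print(sequential_total(101, 1.0, 4.0))
for threads in (1, 2, 4):
    print(threads, pipelined_total(101, 1.0, 4.0, threads))
```

In this model the simulation itself ends at 101 time units. With one saving thread the run keeps going long after that because the single thread is still working through the backlog, while with four threads the saving keeps pace and the run ends shortly after the last frame is simulated.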
Last but not least, as reported in the other thread, the final results of the Stoke simulation looked better. The FumeFX grid was set too narrow and PFlow particles were escaping the grid, producing linear streaks. I decided not to delete them because I was interested in the final amount of data being saved.