Sorry for the long delay!
Here is how Krakatoa works:
- First, we load all particles into memory, just once.
- Then we look at the number of threads the machine supports. In older versions, you had only two options: run on all threads, or just on one. In the latest versions, the number can be reduced dynamically; I will explain how a bit later.
- For each thread, we allocate a separate frame buffer. The higher the output resolution, the more memory each buffer needs, and the more threads, the more frame buffers get allocated. Hence the memory spike you mentioned.
- We then sort the particles by depth, split them by distance from the camera into as many regions as there are buffers, and draw each chunk of particles into its own buffer.
- Finally, we combine the buffers into the final output image.
- This gets more complicated when multiple Render Elements are involved, super-sampling is enabled for occlusion anti-aliasing, etc., but the general principle is the same; only the amount of RAM needed varies.
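To make the steps above concrete, here is a rough Python sketch of the per-thread buffer idea. It is not Krakatoa's code: the particle tuple layout, the function name `render_depth_sliced`, the one-pixel-per-particle splatting and the simple premultiplied alpha compositing are all assumptions made for illustration. Because of Python's GIL, it shows the structure rather than a real parallel speed-up.

```python
from concurrent.futures import ThreadPoolExecutor

def render_depth_sliced(particles, width, height, num_threads):
    """particles: list of (x, y, depth, (r, g, b), alpha), already projected to
    integer pixel coordinates (hypothetical layout). Smaller depth = closer."""
    ordered = sorted(particles, key=lambda p: p[2])  # nearest first
    n = len(ordered)
    # Split the depth-sorted list into num_threads contiguous depth slices.
    bounds = [n * k // num_threads for k in range(num_threads + 1)]
    slices = [ordered[bounds[k]:bounds[k + 1]] for k in range(num_threads)]

    def draw(slice_particles):
        # One full-resolution premultiplied RGBA buffer per slice/thread:
        # this allocation is what multiplies memory use by the thread count.
        buf = [[0.0, 0.0, 0.0, 0.0] for _ in range(width * height)]
        for x, y, _depth, color, alpha in slice_particles:
            if 0 <= x < width and 0 <= y < height:
                px = buf[y * width + x]
                # Front to back: a particle only adds what is still visible.
                weight = (1.0 - px[3]) * alpha
                for c in range(3):
                    px[c] += weight * color[c]
                px[3] += weight
        return buf

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        buffers = list(pool.map(draw, slices))

    # Combine the slice buffers front to back with the premultiplied "over" operator.
    image = [[0.0, 0.0, 0.0, 0.0] for _ in range(width * height)]
    for buf in buffers:
        for out_px, src_px in zip(image, buf):
            visible = 1.0 - out_px[3]
            for c in range(4):
                out_px[c] += visible * src_px[c]
    return image
```

Because "over" compositing is associative, rendering with 1 or 4 slices gives the same image; what changes is that 4 slices keep 4 full-resolution buffers in memory at the same time.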
The problem with this approach was that a machine with 32 or 64 threads would allocate 32x or 64x the memory for output image buffers and could run out of memory. This was especially dangerous when rendering at very large output resolutions such as 8K or 16K.
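A quick back-of-the-envelope calculation shows why this hurts. The numbers below assume one RGBA buffer per thread with a 32-bit float per channel (16 bytes per pixel), no Render Elements and no super-sampling, and take 8K and 16K to mean 7680x4320 and 15360x8640; the real per-pixel cost in Krakatoa may differ.

```python
BYTES_PER_CHANNEL = 4  # assumption: 32-bit float per channel
CHANNELS = 4           # assumption: RGBA only, no extra Render Elements

# Assumption: 16:9 "8K" and "16K" frame sizes.
RESOLUTIONS = {"8K": (7680, 4320), "16K": (15360, 8640)}

def frame_buffer_bytes(width, height):
    """Size of a single full-resolution frame buffer."""
    return width * height * CHANNELS * BYTES_PER_CHANNEL

def total_buffer_gib(width, height, threads):
    """One frame buffer per thread, expressed in GiB."""
    return threads * frame_buffer_bytes(width, height) / 2**30

for name, (w, h) in RESOLUTIONS.items():
    for threads in (1, 32, 64):
        print(f"{name} x {threads:2d} threads: {total_buffer_gib(w, h, threads):6.1f} GiB")
```

Under these assumptions, 64 threads need about 31.6 GiB of buffers at 8K and about 126.6 GiB at 16K, before a single particle is loaded.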
In addition, our tests with up to 80 threads back in 2011 showed that the benefit of multi-threading diminished beyond 16 threads and was pretty much non-existent beyond 32 threads, due to the overhead of splitting particles into groups and combining the frame buffers afterwards.
To solve the memory issues, we improved the internal logic over two consecutive releases so that the number of threads is adjusted dynamically based on available memory. Before rendering, Krakatoa looks at the total free memory, subtracts the memory needed to hold the particles plus a safety margin, and assumes that the rest of the addressable free memory will be split between the frame buffers. It then calculates the memory requirements of the frame buffers from the output resolution, the number of Render Elements etc., and derives the maximum number of threads it can run in parallel.
So even on a machine with 32 cores, if memory only allows 16 frame buffers to fit, only 16 threads are used to draw the particles. Performance suffers a little, but as mentioned above, because of the diminishing returns of higher thread counts it is not 2x slower, and the render is guaranteed to finish without running out of memory.
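The thread-count logic described above can be sketched roughly like this. The `RenderSettings` fields, the 16 bytes per pixel and the way Render Elements and super-sampling multiply the buffer size are hypothetical simplifications, not Krakatoa's actual formula.

```python
from dataclasses import dataclass

GiB = 2**30

@dataclass
class RenderSettings:
    width: int
    height: int
    render_elements: int = 0   # hypothetical: one extra buffer per Render Element
    supersampling: int = 1     # hypothetical: buffer grows with samples per pixel
    bytes_per_pixel: int = 16  # assumption: RGBA, 32-bit float per channel

def buffer_bytes_per_thread(s: RenderSettings) -> int:
    """Memory for one thread's set of frame buffers (main image + Render Elements)."""
    return s.width * s.height * s.supersampling * s.bytes_per_pixel * (1 + s.render_elements)

def max_render_threads(free_bytes: int, particle_bytes: int, safety_bytes: int,
                       settings: RenderSettings, hardware_threads: int) -> int:
    """Largest thread count whose frame buffers fit in the memory left after particles."""
    spare = free_bytes - particle_bytes - safety_bytes
    fits = max(spare, 0) // buffer_bytes_per_thread(settings)
    if fits < 1:
        raise MemoryError("not enough free memory for even one set of frame buffers")
    return min(hardware_threads, fits)

# Example: 32 hardware threads, but memory only allows 16 sets of buffers.
eight_k = RenderSettings(7680, 4320, render_elements=4)
print(max_render_threads(64 * GiB, 20 * GiB, 4 * GiB, eight_k, hardware_threads=32))  # 16
```

With 64 GiB free, 20 GiB of particles, a 4 GiB safety margin and an 8K render with four Render Elements, each thread needs about 2.5 GiB of buffers, so only 16 of the 32 hardware threads fit, matching the example above.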
Krakatoa MX added a UI section for controlling memory usage. It reports actual numbers, so you can experiment with the number of particles you plan to render using the current channel layout and see what will fit and what won't. You can also manually limit the number of threads to any value you want: if you only want 4 or 8, just request that, and the count will be clamped at render time even if more threads would be possible.
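The particle side of that estimate is simple arithmetic on the channel layout, and the manual limit is just a clamp on top of the memory-derived thread count. The channel list, data types and helper names below are a hypothetical example, and the estimate ignores any per-particle overhead the renderer may add.

```python
# Bytes per value for a few common channel data types.
DTYPE_BYTES = {"uint8": 1, "float16": 2, "float32": 4, "int32": 4}

# Hypothetical example layout: (channel name, data type, arity).
EXAMPLE_LAYOUT = [
    ("Position", "float32", 3),
    ("Velocity", "float16", 3),
    ("Color", "float16", 3),
    ("Density", "float32", 1),
]

def bytes_per_particle(layout):
    """Sum of all channel sizes for one particle."""
    return sum(DTYPE_BYTES[dtype] * arity for _name, dtype, arity in layout)

def particle_memory_bytes(count, layout):
    """Raw memory for `count` particles with this layout (no overhead modeled)."""
    return count * bytes_per_particle(layout)

def clamp_threads(memory_limited_threads, user_limit=None):
    """Apply an optional manual thread cap on top of the memory-derived maximum."""
    if user_limit is None:
        return memory_limited_threads
    return max(1, min(memory_limited_threads, user_limit))

print(bytes_per_particle(EXAMPLE_LAYOUT))  # 12 + 6 + 6 + 4 = 28
print(f"{particle_memory_bytes(100_000_000, EXAMPLE_LAYOUT) / 2**30:.2f} GiB")
print(clamp_threads(16, user_limit=8))  # 8
```

Dropping or narrowing a channel (for example, storing Density as float16) directly shrinks the particle footprint and leaves more room for frame buffers.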
On top of that, we improved the depth-splitting logic to better account for depth of field (DOF). Slices in which the same particles get redrawn excessively, which would hurt performance, now have their particle counts adjusted: a large circle of confusion can cause a single particle to be drawn thousands of times, so it should count as thousands of particles in the splitting calculation.
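One way to picture DOF-aware splitting: instead of giving every slice the same number of particles, weight each particle by roughly how many pixels it touches and give every slice about the same total weight. This is a sketch of the idea, not Krakatoa's implementation; approximating the cost as the area of the circle of confusion in pixels (with a minimum of 1) and the function names are assumptions.

```python
import math

def coc_pixel_cost(coc_radius_px):
    """Hypothetical cost model: pixels covered by the blurred disc, at least 1."""
    return max(1.0, math.pi * coc_radius_px ** 2)

def split_by_weighted_depth(depths, coc_radii, num_slices):
    """Return particle index lists for depth slices of roughly equal drawing cost."""
    order = sorted(range(len(depths)), key=lambda i: depths[i])  # nearest first
    costs = [coc_pixel_cost(coc_radii[i]) for i in order]
    target = sum(costs) / num_slices  # desired cost per slice
    slices = [[] for _ in range(num_slices)]
    accumulated = 0.0
    for i, cost in zip(order, costs):
        # A particle goes into the slice in which its cumulative cost starts.
        k = min(int(accumulated // target), num_slices - 1)
        slices[k].append(i)
        accumulated += cost
    return slices

print(split_by_weighted_depth([1, 2, 3, 4], [0, 0, 0, 0], 2))    # [[0, 1], [2, 3]]
print(split_by_weighted_depth([1, 2, 3, 4], [100, 0, 0, 0], 2))  # [[0], [1, 2, 3]]
```

With no blur, the split is the same as an equal-count split. With one heavily blurred particle in front and three sharp ones behind it, the blurred particle gets a slice of its own, because drawing it costs as much as thousands of sharp particles.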
From the above info, you should be able to see the general direction you should go:
- Krakatoa is very memory-hungry, so if you have money to spend, it is always better to buy more RAM than more cores. You will be able to render more particles at higher resolutions and produce higher-quality output.
- When choosing RAM, keep in mind that faster RAM can benefit Krakatoa performance more than faster cores. We found that memory throughput when pushing data into RAM is the main bottleneck in Krakatoa, and loading the particles takes half of the time anyway. When we were running early Krakatoa benchmarks, Krakatoa was one of the few applications that showed a measurable benefit of Tri-Channel over Dual-Channel memory (my Tri-Channel i7 rig outperformed my Dual-Channel Xeon workstation). The internet is full of benchmarks telling users that Quad-Channel memory has no benefit for their applications (e.g. this one), but in the case of Krakatoa, you might actually get a real benefit from a Quad-Channel memory layout.
- Krakatoa’s multi-threading becomes less effective beyond 16 threads, and 32 should be considered the absolute maximum. So a machine with 16 threads and 256 GB RAM is a much better Krakatoa box than one with 32 threads and 128 GB RAM.
- Krakatoa adjusts the number of threads dynamically, so when rendering very large images with many Render Elements, you might have 32 threads but end up using only half of them, or even fewer. So, once again, more memory beats more threads.
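Using the same kind of rough estimate, here is how the two machines from the advice above might compare on a heavy job. Every number is an assumption picked for illustration: a 16K render (15360x8640, 16 bytes per pixel) with four Render Elements, 60 GiB of particles and an 8 GiB safety margin, with 256 and 128 treated as GiB of free memory. It only counts how many threads fit, not how fast they run.

```python
def buffer_gib_per_thread(width, height, render_elements, bytes_per_pixel=16):
    """Hypothetical per-thread buffer size: main image plus one buffer per Render Element."""
    return width * height * bytes_per_pixel * (1 + render_elements) / 2**30

def effective_threads(hardware_threads, free_gib, particle_gib, safety_gib, per_thread_gib):
    """Threads that can actually run once particles and the safety margin are accounted for."""
    spare_gib = free_gib - particle_gib - safety_gib
    if spare_gib <= 0:
        return 0
    return min(hardware_threads, int(spare_gib // per_thread_gib))

per_thread = buffer_gib_per_thread(15360, 8640, render_elements=4)
for name, threads, ram in (("16 threads / 256 GiB", 16, 256), ("32 threads / 128 GiB", 32, 128)):
    print(name, "->", effective_threads(threads, ram, 60, 8, per_thread), "threads")
```

Under these assumptions, the 16-thread box keeps all 16 threads busy, while the 32-thread box can only fit buffers for 6 of its 32 threads.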
Hope this helps!