Thanks, Chad. I was able to repro the problem with 5M particles:
No Cache, One Spot Light 100 shadow map - 18 sec.
PCaching - 18 sec.
Rendering from PCache - 30 sec.
Rendering from PCache+LCache - 8 sec.
Something is not right when getting particles for lighting from the PCache. It sits there for 10 seconds doing nothing; in that time it could load the file a couple of times…
We will investigate. If it can be fixed in a patch, we might send you a new DLL to try out.
We investigated and figured out what is happening. It turns out that the standard library sort method takes a huge hit when the values it has to sort are already sorted (!). With 2 million particles, the sort time goes up from 4 seconds to 12+ seconds. If you move the light to a position that produces a new sort order, the calculation is fast again. In addition, the sort is single-threaded. We had our own code that attempted multi-threaded sorting, but it showed the same behavior with already-sorted particles, so the second CPU never played much of a role in that specific case. It was also hard-coded to 2 CPUs instead of using all available cores.
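The post does not say which sort routine or pivot rule was involved, so the sketch below only illustrates one well-known way a sort slows down on already-sorted input: a quicksort that always takes the first element as its pivot. On sorted data every partition peels off a single element, so the work grows as n(n-1)/2 comparisons instead of roughly n log n. `quicksort` and `PivotRule` are hypothetical names, and median-of-three is shown as one common fix, not necessarily the one used here.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical pivot rules for illustration; not Krakatoa's or any library's code.
enum class PivotRule { First, MedianOfThree };

// Sorts v in place with a textbook quicksort (Lomuto partition, explicit stack
// instead of recursion) and returns the number of element-vs-pivot comparisons
// made inside the partition loops.
unsigned long long quicksort(std::vector<int>& v, PivotRule rule) {
    unsigned long long comparisons = 0;
    std::vector<std::pair<std::ptrdiff_t, std::ptrdiff_t>> ranges;  // inclusive [lo, hi]
    if (v.size() > 1) ranges.emplace_back(0, static_cast<std::ptrdiff_t>(v.size()) - 1);

    while (!ranges.empty()) {
        const std::ptrdiff_t lo = ranges.back().first;
        const std::ptrdiff_t hi = ranges.back().second;
        ranges.pop_back();
        if (lo >= hi) continue;  // zero or one element: already sorted

        if (rule == PivotRule::First) {
            std::swap(v[lo], v[hi]);  // first element becomes the pivot
        } else {
            const std::ptrdiff_t mid = lo + (hi - lo) / 2;
            // Order v[lo] <= v[mid] <= v[hi], then move the median into the pivot slot.
            if (v[mid] < v[lo]) std::swap(v[mid], v[lo]);
            if (v[hi] < v[lo]) std::swap(v[hi], v[lo]);
            if (v[hi] < v[mid]) std::swap(v[hi], v[mid]);
            std::swap(v[mid], v[hi]);
        }

        // Lomuto partition around the pivot stored at v[hi].
        const int pivot = v[hi];
        std::ptrdiff_t store = lo;
        for (std::ptrdiff_t i = lo; i < hi; ++i) {
            ++comparisons;
            if (v[i] < pivot) std::swap(v[i], v[store++]);
        }
        std::swap(v[store], v[hi]);  // pivot lands in its final position

        ranges.emplace_back(lo, store - 1);
        ranges.emplace_back(store + 1, hi);
    }
    return comparisons;
}
```

On 2,000 already-sorted integers the first-element rule makes exactly 1,999,000 partition comparisons (2000 × 1999 / 2), while median-of-three splits each range in half and stays on the order of n log2 n. Median-of-three has adversarial inputs of its own; randomized pivots or introsort's fallback to heapsort are the usual stronger guards.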
The obvious solution is to implement a better sorting algorithm that does not have this undesired behavior. We hope such an algorithm could also be faster in the general case, making Krakatoa's lighting phase even faster. If the new sort also makes better use of multiple threads, a Quad system would sort faster still.
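One way to keep every core busy, instead of hard-coding two, is to ask the runtime how many hardware threads exist, sort one contiguous chunk per thread, and then merge the sorted chunks. The sketch below assumes the sort keys sit in a `std::vector`; `parallel_sort` is a hypothetical helper, not Krakatoa code, and it relies on `std::sort` for each chunk rather than the new algorithm the post anticipates.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: sorts v using up to `threads` worker threads
// (0 = use std::thread::hardware_concurrency()). Each worker sorts one
// contiguous chunk with std::sort; sorted chunks are then merged pairwise.
template <typename T>
void parallel_sort(std::vector<T>& v, unsigned threads = 0) {
    if (threads == 0) threads = std::max(1u, std::thread::hardware_concurrency());
    const std::size_t n = v.size();
    if (n < 2 || threads == 1) {
        std::sort(v.begin(), v.end());
        return;
    }
    // Never use more chunks than elements.
    threads = static_cast<unsigned>(std::min<std::size_t>(threads, n));

    // Chunk i covers [bounds[i], bounds[i + 1]).
    std::vector<std::size_t> bounds(threads + 1);
    for (unsigned i = 0; i <= threads; ++i) bounds[i] = n * i / threads;

    // Workers touch disjoint ranges, so no locking is needed.
    std::vector<std::thread> workers;
    for (unsigned i = 0; i < threads; ++i) {
        workers.emplace_back([&v, &bounds, i] {
            std::sort(v.begin() + bounds[i], v.begin() + bounds[i + 1]);
        });
    }
    for (auto& w : workers) w.join();

    // Merge neighbouring sorted runs; the run width (in chunks) doubles each pass.
    // These merges run on the calling thread for simplicity.
    for (std::size_t width = 1; width < threads; width *= 2) {
        for (std::size_t i = 0; i + width < threads; i += 2 * width) {
            auto first = v.begin() + bounds[i];
            auto middle = v.begin() + bounds[i + width];
            auto last = v.begin() + bounds[std::min<std::size_t>(i + 2 * width, threads)];
            std::inplace_merge(first, middle, last);
        }
    }
}
```

Because the merge passes are serial here, the speedup is capped by that step; the independent merges within each pass could also be handed to threads. Where the toolchain supports it, the C++17 `std::sort` overload taking `std::execution::par` is another option.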