I have a suspicion, but I could be wrong.
About 7 years ago, I purchased a gaming rig with 12 GB of RAM and a first-generation i7 CPU (quad-core + HT). I still use it at home.
At the same time, I was also running a quad Xeon with 16 GB at the VFX company I was working for.
When I tested Krakatoa on these two machines, I expected the Xeon with slightly more memory to outperform the i7, but it did not. The reason was the memory configuration: the 12 GB machine used tri-channel memory (3 banks x 4 GB), while the 16 GB machine used dual-channel memory (2 banks x 8 GB).
en.wikipedia.org/wiki/Multi-cha … chitecture
Ever since our very first tests of Krakatoa in 2006, we knew that Krakatoa's performance depended heavily on memory performance. At the same time, the tech press claimed there was barely any practical performance benefit to tri-channel memory. But we knew that Krakatoa was a special beast, and it made great use of the hardware, including 64-bit computing, Hyper-Threading, and memory bandwidth. So the result made sense.
So now the question is: does your E5-2665 run in dual-channel or quad-channel memory mode? If it has 4x4 GB DDR3 modules in its banks, it should be using quad-channel and in theory outperform the E5530. If it has 2x8 GB modules in its banks, then it would be running in dual-channel mode and thus should be around 33% slower than the tri-channel E5530, assuming all other bandwidths and speeds are the same.
Indeed, if you normalize the 53.47% to 100%, the 34.41% becomes about 64.35%, which is pretty close to being 33% slower, not just the 19 points the raw difference suggests!
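To make the normalization explicit, here is the arithmetic as a quick script (the two percentages are the benchmark figures quoted above; the channel-count prediction is the simple bandwidth ratio, ignoring all other factors):

```python
# Benchmark figures quoted above (higher = faster), as percentages
# of whatever common baseline the benchmark used.
e5530_score = 53.47    # tri-channel machine
e5_2665_score = 34.41  # suspected dual-channel machine

# Normalize the E5530 to 100% and express the E5-2665 relative to it.
relative = e5_2665_score / e5530_score * 100  # ~64.35%
slowdown = 100 - relative                     # ~35.6% slower

# Naive prediction from channel count alone: dual-channel has 2/3 the
# bandwidth of tri-channel, so roughly a 33% slowdown is expected.
predicted_slowdown = (1 - 2 / 3) * 100        # ~33.3%

print(f"relative speed: {relative:.2f}%")
print(f"measured slowdown: {slowdown:.1f}% vs predicted {predicted_slowdown:.1f}%")
```

The measured ~36% slowdown lands much closer to the 33% bandwidth prediction than the raw 19-point difference would suggest.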
That being said, I would have expected the Saving / Partitioning processing to be less sensitive to memory bandwidth, since it does not hit the memory as hard as Krakatoa rendering does. When Partitioning, Krakatoa simply tells the source particle system(s) to update and provide their particles, then streams them into a temporary buffer of relatively small size (about 50K particles at a time, I think). These particles are then compressed using ZLib and written to the PRT file stream by one thread, while a second thread goes back to the source particle system to process the next batch of particles. So the process is mostly single-threaded (the part where the particle system is updating itself), and then dual-threaded (the part where the particles of the current frame are compressed and written to the PRT stream).
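The two-thread pattern described above can be sketched roughly like this. To be clear, this is NOT Krakatoa's actual code (which is C++): it is just an illustration of the producer/consumer split, with the batch size, field layout, and "PRT-like" output all being stand-ins:

```python
# Sketch of the partitioning pattern described above: one thread pulls
# batches of particle data from the source system, a second thread
# ZLib-compresses each batch and appends it to the output stream.
import queue
import threading
import zlib

BATCH_SIZE = 50_000  # "about 50K particles at a time", per the post


def compress_and_write(in_queue, stream):
    """Writer thread: compress each batch and append it to the stream."""
    compressor = zlib.compressobj()
    while True:
        batch = in_queue.get()
        if batch is None:            # sentinel: no more batches
            break
        stream.write(compressor.compress(batch))
    stream.write(compressor.flush())  # finish the ZLib stream


def partition_frame(particle_batches, stream):
    """Main thread: fetch batches from the source particle system while
    the writer thread compresses and writes the previous ones."""
    q = queue.Queue(maxsize=2)       # small hand-off buffer between threads
    writer = threading.Thread(target=compress_and_write, args=(q, stream))
    writer.start()
    for batch in particle_batches:   # each batch is raw particle bytes
        q.put(batch)
    q.put(None)                      # signal end of frame
    writer.join()
```

Because the source particle system updates on one thread while compression and disk writing happen on another, the overall throughput is bounded by whichever side is slower, which is why disk and memory speed can still matter here, just far less than during rendering.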
What particle system were you partitioning? (Particle Flow, Thinking Particles, something else?) How many million particles were in a single partition per frame?
I also suggest reading this blog post about performance, hard disk performance, and memory bandwidth limits written 5 years ago, but still very relevant: thinkboxsoftware.com/news/20 … /7bpf.html
Hope this helps!