I have a suspicion, but I could be wrong.
About 7 years ago, I purchased a gaming rig with 12 GB of RAM and a first-generation i7 CPU (quad-core + HT). I still use it at home.
At the same time, I was also running a quad Xeon with 16 GB at the VFX company I was working for.
When I tested Krakatoa on these two machines, I expected the Xeon with slightly more memory to outperform the i7, but it did not. The reason was the memory configuration: the 12 GB machine used tri-channel memory (3 banks x 4 GB), while the 16 GB machine used dual-channel memory (2 banks x 8 GB).
en.wikipedia.org/wiki/Multi-cha … chitecture
Ever since our very first tests of Krakatoa in 2006, we knew that Krakatoa's performance depended heavily on memory performance. At the same time, the tech press claimed there was barely any practical performance benefit to tri-channel memory. But we knew that Krakatoa was a special beast, and it made great use of the hardware, including 64-bit computing, Hyper-Threading, and memory bandwidth. So the result made sense.
So now the question is: does your E5-2665 run in dual-channel or quad-channel memory mode? If it has 4x4 GB DDR3 modules in its banks, it should be using quad-channel and in theory outperform the E5530. If it has 2x8 GB modules in its banks, then it would be running in dual-channel mode and thus should be around 33% slower than the tri-channel E5530, assuming all other bandwidths and speeds are the same.
Indeed, if you normalize the 53.47% to 100%, the 34.41% becomes about 64.35%, which is pretty close to being 33% slower, not just the 19 points the raw difference suggests!
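To make the normalization explicit, here is the arithmetic as a quick script (the two percentages are the benchmark figures quoted above; the channel-count prediction is the simple bandwidth ratio, ignoring all other factors):

```python
# Benchmark figures quoted above (higher = faster), as percentages
# of whatever common baseline the benchmark used.
e5530_score = 53.47    # tri-channel machine
e5_2665_score = 34.41  # suspected dual-channel machine

# Normalize the E5530 to 100% and express the E5-2665 relative to it.
relative = e5_2665_score / e5530_score * 100  # ~64.35%
slowdown = 100 - relative                     # ~35.6% slower

# Naive prediction from channel count alone: dual-channel has 2/3 the
# bandwidth of tri-channel, so roughly a 33% slowdown is expected.
predicted_slowdown = (1 - 2 / 3) * 100        # ~33.3%

print(f"relative speed: {relative:.2f}%")
print(f"measured slowdown: {slowdown:.1f}% vs predicted {predicted_slowdown:.1f}%")
```

The measured ~36% slowdown lands much closer to the 33% bandwidth prediction than the raw 19-point difference would suggest.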
That being said, I would have expected the Saving / Partitioning processing to be less sensitive to memory bandwidth, since it does not hit the memory as hard as Krakatoa rendering does. When Partitioning, Krakatoa simply tells the source particle system(s) to update and provide their particles, then streams them into a temporary buffer of relatively small size (about 50K particles at a time, I think). These particles are then compressed using ZLib and written to the PRT file stream by one thread, while a second thread goes back to the source particle system to process the next batch of particles. So the process is mostly single-threaded (the part where the particle system is updating itself), and then dual-threaded (the part where the particles of the current frame are compressed and written to the PRT stream).
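The two-thread pattern described above can be sketched roughly like this. To be clear, this is NOT Krakatoa's actual code (which is C++): it is just an illustration of the producer/consumer split, with the batch size, field layout, and "PRT-like" output all being stand-ins:

```python
# Sketch of the partitioning pattern described above: one thread pulls
# batches of particle data from the source system, a second thread
# ZLib-compresses each batch and appends it to the output stream.
import queue
import threading
import zlib

BATCH_SIZE = 50_000  # "about 50K particles at a time", per the post


def compress_and_write(in_queue, stream):
    """Writer thread: compress each batch and append it to the stream."""
    compressor = zlib.compressobj()
    while True:
        batch = in_queue.get()
        if batch is None:            # sentinel: no more batches
            break
        stream.write(compressor.compress(batch))
    stream.write(compressor.flush())  # finish the ZLib stream


def partition_frame(particle_batches, stream):
    """Main thread: fetch batches from the source particle system while
    the writer thread compresses and writes the previous ones."""
    q = queue.Queue(maxsize=2)       # small hand-off buffer between threads
    writer = threading.Thread(target=compress_and_write, args=(q, stream))
    writer.start()
    for batch in particle_batches:   # each batch is raw particle bytes
        q.put(batch)
    q.put(None)                      # signal end of frame
    writer.join()
```

Because the source particle system updates on one thread while compression and disk writing happen on another, the overall throughput is bounded by whichever side is slower, which is why disk and memory speed can still matter here, just far less than during rendering.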
What particle system were you partitioning? (Particle Flow, Thinking Particles, something else?) How many million particles were in a single partition per frame?
I also suggest reading this blog post about performance, hard disk performance, and memory bandwidth limits written 5 years ago, but still very relevant: thinkboxsoftware.com/news/20 … /7bpf.html
Hope this helps!