Longer render times with 1.02.30007

katho3d · November 12, 2007, 8:54am

Krakatoa 1.02.30007 renders slower (5-20%) than 1.01.29090 in most cases here.
Strange, shouldn't it be faster?

I have 2 examples:

1. 10 million particles (PRT Loader), no matte object, 4 lights
1.01.29090 = 98 sec
1.02.30007 = 119 sec

2. 5 million particles (RAM cache), 1 matte object, 1 light
1.01.29090 = 172 sec
1.02.30007 = 183 sec

Bobo · November 12, 2007, 1:13pm

>Krakatoa 1.02.30007 renders slower (5-20%) than 1.01.29090
>in most cases here. Strange, shouldn't it be faster?

Not always, not necessarily.

First of all, 30007 is a work in progress. We generalized the internal structure of the particle data so that named channels can be added and removed as needed. This made the code somewhat more complex, and it has not been tuned for speed yet. The changes were so radical that it is a wonder it works as well as it does in the first Beta build. There are still some issues to solve. For example, if you cache without lighting and then enable lighting, you will get an error because the cache contains no lighting channel. In 29090 the channel was always there, just not used; in other words, wasted.
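To illustrate the caching issue, here is a minimal Python sketch of a container with optional named channels. It is not Krakatoa code: the `ParticleCache` class, the `MissingChannelError` exception and the "Position" channel name are made up for the example; only the idea of a Color and a lighting channel that may or may not be present comes from the description above.

```python
class MissingChannelError(Exception):
    """Raised when a requested channel was never allocated (hypothetical)."""


class ParticleCache:
    """Hypothetical container: every named channel is a separate array."""

    def __init__(self, count, channel_names):
        self.count = count
        # Allocate only the channels that were asked for at caching time.
        self.channels = {name: [0.0] * count for name in channel_names}

    def get(self, name):
        if name not in self.channels:
            raise MissingChannelError(f"channel '{name}' is not in the cache")
        return self.channels[name]


# Cached with lighting disabled, so no Lighting channel is allocated.
cache = ParticleCache(1000, ["Position", "Color"])

# Enabling lighting afterwards asks for a channel the cache does not hold.
try:
    cache.get("Lighting")
    lighting_error = None
except MissingChannelError as err:
    lighting_error = str(err)

print(lighting_error)
```

In the old layout the equivalent of `Lighting` would always be allocated, so the lookup could never fail, but the memory would be spent whether or not lighting was used.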

So the new build can use much less memory in some cases, which was the main idea behind the change, and this in turn can lead to better performance.

For example, loading and rendering 10M particles with no lighting takes 26 seconds in 29090 and 22 seconds in 30007, but the same scene with Lighting on and one Omni light takes 43 vs. 48 seconds. Enabling Color Override in 29090 has no effect on the speed because the same 38 bytes of memory are allocated per particle, but in 30007 it removes the Color channel, so the render time goes down to 46 seconds and memory use drops to 20 bytes per particle, allowing you to fit almost twice as many particles in the same amount of memory. So at this point, the changes come at the price of slightly reduced speed, but give advantages in particle counts.
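The "almost twice as many particles" figure follows directly from the two per-particle sizes. A quick Python check, using only the 38 and 20 byte figures quoted above; the 1 GiB budget is an arbitrary assumption for the example, and any other per-particle overhead is ignored:

```python
BYTES_WITH_COLOR = 38     # bytes per particle, Color channel present
BYTES_WITHOUT_COLOR = 20  # bytes per particle in 30007 with Color Override on


def particles_that_fit(memory_bytes, bytes_per_particle):
    """Whole number of particles that fit into the given amount of memory."""
    return memory_bytes // bytes_per_particle


budget = 1024 ** 3  # hypothetical 1 GiB memory budget

with_color = particles_that_fit(budget, BYTES_WITH_COLOR)
without_color = particles_that_fit(budget, BYTES_WITHOUT_COLOR)
ratio = without_color / with_color

# Memory needed for the 10M particle test scene in each layout, in bytes.
mem_with_color = 10_000_000 * BYTES_WITH_COLOR
mem_without_color = 10_000_000 * BYTES_WITHOUT_COLOR

print(with_color, without_color, round(ratio, 2))
print(mem_with_color, mem_without_color)
```

The ratio is 38 / 20 = 1.9, and the 10M particle scene needs 380,000,000 bytes in one layout versus 200,000,000 bytes in the other.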

We will spend some time optimizing the code before 1.1.0 ships, and I hope we can come close to or even beat the 1.0.1 times.

That being said, our plans for 1.2.0 are mostly performance-related. :o)

katho3d · November 12, 2007, 1:53pm

Thanks for the explanation, Bobo. I'm looking forward to the new versions, especially 1.2 ;)

Bobo · November 30, 2007, 7:45pm

>Thanks for the explanation Bobo, I'm looking forward to the
>new versions, especially 1.2

We figured out what was going on speed-wise; we just have not fixed it yet.

Here is the story in short:

The new internal named channels structure is incompatible with the standard C++ sorting routines (both the Standard and the Frantic Films Threaded Sort). Thus, instead of sorting the actual data in the channels, we switched to an indexed sort, where the indices are sorted instead of the data. This sorting is a lot faster than the old one, so sorting times actually went down. Unfortunately, data access times went up because the data is now accessed out of order and the CPU cannot take advantage of its caches at all. Thus, the Attenuation Map generation, which now accesses data by index, is much slower. The accumulated difference turned out to be about 3 to 4 seconds longer lighting times for 10 million particles and one omni light. In fact, the attenuation data generation of an omni light takes exactly twice as long, while the spot light's penalty is only 20%.
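For readers who have not met the idea, here is a small Python sketch of an indexed sort over separate channel arrays. It is only an illustration, not the Krakatoa implementation: the `depth` and `density` channel names and the tiny particle count are assumptions, and Python lists will not show the cache effect itself, only the access pattern that causes it.

```python
import random

random.seed(0)
n = 8

# Hypothetical channels, each stored as its own array.
depth = [random.random() for _ in range(n)]
density = [random.random() for _ in range(n)]

# Indexed sort: only the indices are sorted, the channel data stays in place.
order = sorted(range(n), key=lambda i: depth[i])

# Reading a channel in sorted order now goes through the index array,
# so consecutive reads land at scattered positions in the channel.
gathered = [density[i] for i in order]

# Distance between consecutive reads; all 1s would mean sequential access.
jumps = [abs(b - a) for a, b in zip(order, order[1:])]

# Alternative: reorder every channel once, so that later passes can walk
# the arrays front to back without the index indirection.
depth_sorted = [depth[i] for i in order]
density_sorted = [density[i] for i in order]

print(order)
print(jumps)
```

With millions of particles the scattered reads through `order` are the kind of access the CPU caches handle poorly, which is why sorting the indices can be fast while the later pass over the data gets slower.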

The Radix Sort is even slower when using indices and should thus not be used unless your machine has a single CPU (where it would be a bit faster than the non-threaded standard sort).

Our plan is to reimplement a sorting algorithm that would be able to deal with the new channel structures without using indices, and hopefully perform at least as fast as in 1.0.1, without the added penalty of slower attenuation data generation. We might even remove the sorting methods list from the UI if it turns out to outperform all existing methods anyway.

JFYI.

Bobo · December 11, 2007, 8:46pm

>Our plan is to reimplement a sorting algorithm that would be
>able to deal with the new channel structures without using
>indices, and hopefully perform at least as fast as in 1.0.1,
>without the added penalty of slower attenuation data
>generation. We might even remove the sorting methods list
>from the UI if it turns out to outperform all existing
>methods anyway.
>JFYI.

I posted some benchmarks but decided to remove them until we have made all the changes we plan. So far, it looks like the new method performs better than the old one, and it also supports all your CPUs, so it tends to be even faster on quad machines. More to come soon...

katho3d · December 12, 2007, 5:48am

Great news! Thanks Bobo, can't wait to test this on my dual quad machine.