Quick renders.

I’m struggling to get Krakatoa to render quickly while still giving me accurate feedback about my settings. Most other renderers let you decrease the frame size to achieve a quicker result.
And since there is no disk cache or PRT Volume cache, I’m at a loss as to what I can adjust without crazy render times.
Does anyone have any advice about this?

On a related note, I’m playing around with the “Global Load Percentage.” What does it do?

Does it multiply all render N values?
Does it support Nth, or does it take the N of Nth?

Does it support PRT Volumes, and if so, how does it work with them?

Does it compensate the density when it uses fewer particles?

Thanks,
Ben.

Here is a little more info: http://software.primefocusworld.com/software/support/krakatoa/main_controls.php#Load_Percentage

It keeps every Nth particle, but it occurs very late in the stack of operations, so all particles are still loaded, colored, culled, transformed, KCM’d, etc. The density is multiplied by 1.0/Fraction, so with a 10% global load each particle will have 10x the density.
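In other words (a minimal MAXScript sketch of the arithmetic described above; nothing here touches the actual Krakatoa API):

-- Global Load % as described: keep every Nth particle at the very end of
-- the pipeline and scale each survivor's density by the reciprocal.
fraction = 0.10                    -- 10% Global Load
keepEveryNth = 1.0 / fraction      -- keeps every 10th particle
densityMultiplier = 1.0 / fraction -- survivors get 10x density
format "Keep every Nth with N=%, density x%\n" keepEveryNth densityMultiplier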

Which parts of Krakatoa are the slowest for you? Are you familiar with the profiling results shown in the Krakatoa Log Window during a render?

As it turns out, the density compensation isn’t documented AND isn’t consistent with the PRT Loaders, which don’t change the density when loading. I personally think that if you want less dense particles, it’s pretty easy to set the density appropriately with the myriad other options for tweaking density.

The >Iterative mode button has some options for lowering the render resolution and trying to compensate the density accordingly. http://software.primefocusworld.com/software/support/krakatoa/iterative_mode_scale_output.php

EDIT: Bobo says this feature probably doesn’t do anything to the density in most situations. Apparently it was only for tweaking the density of additive renders.

Can you elaborate on this? The PRT Loader is effectively a disk cache, isn’t it? A PRT Volume cache can be created by saving the particles to disk. Granted, it’s difficult to set up a scene, THEN save the particles from a single object to disk, but we are working on making that easier.

So it doesn’t affect the “load” percentage; it affects what it’s going to keep.

I’ll have to make a script to manage the file load percentages, PRT spacing, and light shadow buffer size.
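Something like this might be a starting point (a MAXScript sketch; the class name and property name below are guesses, so run showProperties on a PRT Loader to find the real ones on your build):

-- Sketch: set a render load percentage on every PRT Loader in the scene.
-- "KrakatoaPRTLoader" and "renderLoadPercentage" are assumed names; verify
-- them with classof and showProperties before relying on this.
fn setLoaderRenderPercentage pct =
(
    for o in objects where ((classof o) as string == "KrakatoaPRTLoader") do
    (
        try ( o.renderLoadPercentage = pct )
        catch ( format "No such property on %\n" o.name )
    )
)
setLoaderRenderPercentage 10.0 -- e.g. drop every loader to 10% for test renders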

On my current scene, FWIW:
51 million particles total,
20 million from loaders,
the rest from PRT Volumes (although these numbers don’t seem to add up right).

Re-rendering with the particle cache and lighting cache on takes about 10%-15% of the rendering time. Most of it is loading from disk and generating the level set/particles from the volume; another 15% is KCMs and light attenuation (no culling objects).

But I still need some general advice, for myself and for training everyone else, on what they can do to see an image faster that won’t look completely different when set back to full quality.

Is this completely impossible?

B.

You should be able to mess around with the resolution of the PRT Volume to improve speed and compensate the density accordingly.

For example, if you double the ‘Spacing’ value in the PRT Volume, you can increase the density by 8 times (a 1x1x1 cube becomes a 2x2x2 cube) and expect a similar image when viewed from a sufficient distance.
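As a quick sanity check of that ratio (plain MAXScript arithmetic, not Krakatoa API):

-- Scaling the spacing by k cuts the particle count by roughly k^3,
-- so the per-particle density should rise by the same factor.
oldSpacing = 1.0
newSpacing = 2.0
densityMultiplier = (newSpacing / oldSpacing)^3 -- 8.0
format "Multiply density by % to compensate\n" densityMultiplier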

If your geometry is complicated, I imagine that the majority of the time is spent pre-processing the geometry, and that is heavily related to the ‘Spacing’ value in the PRT Volume. That’s unfortunate, too, since that part is the least responsive once it has started.

As I demonstrated in the Iterative Scaling topic Darcy posted above, my approach was to:
*Load a fraction of the particles from disk (in my case I was loading 1 out of 10 partitions)
*Increase the Density Exponent by 1 to compensate for the particle density loss
*Set the Output Size to a fraction (1/2 or 1/4) of the original size
*Render

The result, if scaled up 2 or 4 times, looks very similar to the full 10 partitions render, but uses only 1/10 of the particles and thus loads in 1/10 of the time.
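Spelled out (assuming, as the Density Exponent control implies, that the final density is Density * 10^Exponent):

-- Loading 1 of 10 partitions keeps 1/10 of the particles, so the density
-- must rise 10x; with density = D * 10^E, that is +1 on the exponent.
partitionFraction = 1.0 / 10.0
densityMultiplier = 1.0 / partitionFraction -- 10.0
exponentOffset = log densityMultiplier      -- MAXScript log is base 10, so 1.0
format "Add % to the Density Exponent\n" exponentOffset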

My results in that test were:
10 partitions, 10 MP total, 640x480: 35.188 seconds.
1 partition, 1 MP, 640x480: 4.219 seconds.
The same 1 partition at 1/2, 1/4, 1/8 and 1/16 resolution took 3.531, 3.438, 3.406 and 3.422 seconds.

For additive rendering, I had to compensate the density internally because a smaller image causes more particles to overlap in the fewer pixels, so it additionally scales the density to OriginalDensity/(ScaleFactor^2). In the current build, this is done when >Force Additive Mode is checked.
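Reading ScaleFactor as the downscale factor (2 for half size, 4 for quarter size), the correction works out like this (arithmetic sketch only):

-- The pixel count drops by ScaleFactor^2, so each pixel accumulates roughly
-- ScaleFactor^2 more particles; dividing the density by that keeps the
-- additive brightness comparable.
originalDensity = 1.0
scaleFactor = 2.0                                   -- rendering at 1/2 size
additiveDensity = originalDensity / (scaleFactor^2) -- 0.25
format "Additive density scaled to %\n" additiveDensity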

Since loading every Nth particle from a PRT Loader or using the Global % spinner does NOT shorten the actual loading time (the former has to read through the whole file but skips particles; the latter actually loads and processes them all, then discards them), these two methods do NOT speed up the loading.

The best way to speed up loading is to use multiple partitions and disable some of them when needed, or load First N from them to get just the first partition loaded. If that is not an option, loading every Nth from PRT Loaders is still better than the Global %, as it skips the KCMs, materials, etc.

For PRT Volumes, you can start by finding a good voxel size and then using the Regular Grid or Random In Cube controls to increase the count without increasing the density of the Level Set. Once you have found the right final settings, start reducing the Grid or Random In Cube values to produce fewer particles for testing.
If the level set generation time is too long compared to the particle generation time, you could try dumping the particles to PRT and using PRT Loaders exclusively, but I am not sure this would be faster (it probably depends on how crazy the source geometry is).

Regarding the “diskcache”: I didn’t mean “cache to disk”, I meant caching what “loads from disk”, either before or after the cull. And similarly for the PRT Volume, a “created-from-levelset” cache. These would be per-object, obviously.

I’ll have to reread your replies later. I just found out I have a surprise client meeting at 8am. :( So I’m heading home.

Thanks for the help.
If I get a chance to test the impact of globally setting the loader render settings, I’ll put it up.

Cheers,
Ben.

I thought the design intent with the global percentage was to scale the number of particles down and compensate by scaling the density up. Is that not so? Having it happen at the end is rather unfortunate, especially if we have I/O bottlenecks, complex KCMs, crazy culling, or complex materials, but I can see how it would still result in faster lighting and rasterizing.

Setting up a script that loops through all the PRT Loaders and PRT Volumes, scales them by, say, 1/8 or 1/27 or whatever, and increases the global density by 8x or 27x would also make sense, and would probably be worthwhile. I wonder if making the shadow maps smaller would help too. Not that shadow rendering is that slow, but the softer mask would show the lower particle spatial density less.
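The compensation for such a script would look something like this (MAXScript sketch; the commented property access is a guess at Krakatoa’s scripted interface, so check the docs for your build):

-- Scale counts down, scale global density up by the same factor.
k = 2.0                         -- halve the linear resolution (k = 3.0 gives 1/27)
countScale = 1.0 / (k^3)        -- 1/8 of the particles
densityMultiplier = k^3         -- 8x global density to compensate
-- Applying it would be something like (hypothetical property name):
-- FranticParticles.SetProperty "Density" newValue
format "Scale counts by %, density by %\n" countScale densityMultiplier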

Of course, the best way to make super fast renders is to save the whole scene out to a single PRT file at a low percentage, but then you lose the ability to play with KCMs or materials. However, it’s a great way to quickly tweak lighting, DoF, and motion blur, and you get fast viewports, too. And you probably want to do that for the final network renders anyway.

  • Chad

When the Global % was added, we knew it was not optimal, but still better than nothing.
For example, Culling in the PRT Loader happens at the end of the loading/modifying/material-evaluating/transforming process, so we cannot know in advance how many particles will be left, and thus we cannot apply a % before evaluating everything. So the simplest solution was to apply the % to the final particle stream that ends up in memory after all unknowns are resolved. The result is fewer particles to light and draw, which is good for iterative rendering with the Cache on.

But in this particular case, the loading time is the problem, and there is not much that can be done universally for all sources, as each one of them has its own controls: the PRT Loader has probably the best ones, PFlow has some, the PRT Volume can be kind of controlled but not easily, and FumeFX always produces the same count (but we are planning to work on that in the future).

We looked at the code and it says right there that it is supposed to scale the density up, but in my tests I could not measure anything like that in the actual rendering. We will investigate further.

Actually, I was considering changing the Shadow Map size as part of the Iterative Scaling option, as it makes sense for the reasons you explained above. I might just do that.

As you have probably already heard, we have made the drawing of particles and matte objects about 8 times faster (on 8 cores), and hopefully the lighting pass will follow, so the total rendering time will get shorter, just not the loading time at this point (materials and culling are multi-threaded already). We fixed the slowdown in PRT Volume particle loading due to memory allocation issues. Not sure what else could be done, but we would LOVE to see some Log feedback from your scenes to analyze where the majority of the time is spent.

I also wonder whether we could save the level set generated by a PRT Volume and read it from disk instead of calculating it on the fly. If the LS generation takes minutes but loading that data would take seconds, it might be worth the disk space…

You want the standard log or the debug level?

The more we use the PRT Volume, the more we’re noticing the issue. We’re making some objects where the bounding boxes are really large but the actual enclosed volume is low, so there are few points relative to the number of level set voxels. And when the objects being evaluated by the PRT Volume have complex modifier stacks, you get a double whammy. Having the PRT Volume accept either geometry or a level set as input would be really nice for decoupling that, and would also allow for more flexibility in designing the workflow, since we could then edit the level set externally.

  • Chad

The standard level provides enough info about the rendering process and various sections’ timing, but Debug mode wouldn’t hurt either.

I started the render with debug already, and at over an hour per frame (and that’s only one eye), I’m not going to stop it now.

Sorry, I haven’t sent the logs yet. I cached everything out to PRTs to speed things up, but it hasn’t helped, so I’m thinking it’s just the fault of having only 1/8th of the CPUs running during the 11 lighting passes and the 1 rasterizing pass.

PRG: +Particle Cache Disabled.
PRG: +Lighting Cache Disabled.
PRG: Rendering frame 0
STS: Section “Retrieving Particles”:
STS: Total 00h 03m 01.906s Called 1 times Avg 00h 03m 01.906s
PRG: Rendering 316332578 particles.
PRG: Producing volumetric lighting with 11 lights
STS: Section “Lighting:Matte Objects”:
STS: Total 00h 00m 00.296s Called 11 times Avg 00h 00m 00.026s
STS: Section “Lighting:Sorting Particles”:
STS: Total 00h 09m 07.642s Called 11 times Avg 00h 00m 49.785s
STS: Section “Lighting:Generating Attenuation”:
STS: Total 00h 41m 33.329s Called 11 times Avg 00h 03m 46.666s
STS: Section “Render:Matte Objects”:
STS: Total 00h 00m 00.328s Called 1 times Avg 00h 00m 00.328s
STS: Section “Render:Sorting Particles”:
STS: Total 00h 00m 58.031s Called 1 times Avg 00h 00m 58.031s
STS: Section “Render:Drawing Particles”:
STS: Total 00h 05m 27.203s Called 1 times Avg 00h 05m 27.203s
PRG: Finished rendering frame: 0

This is VERY useful info, thank you!

So in the next Beta the 5.5 minutes for the final pass should go down to about 40-45 seconds.
The 57 minutes total render time should be reduced to about 52 minutes, because the majority of the time is spent in the lighting and we haven’t threaded the lighting pass yet.

Once we have done that, the 41.5 minutes for lighting should go down to about 6. In other words, instead of the current 57 minutes of rendering you would be looking at around 16 minutes in the fully threaded version. This would be about 3.3 times faster in total.
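For what it’s worth, here is the arithmetic behind those projections (times pulled from the log above, in minutes):

-- Projected totals: drawing is ~8x faster now, lighting ~8x once threaded.
lighting = 41.55  -- "Lighting:Generating Attenuation" (41m 33s)
drawing  = 5.45   -- "Render:Drawing Particles" (5m 27s)
total    = 57.0   -- quoted total render time
newTotal = total - drawing + drawing/8.0 - lighting + 6.0 -- ~16.7 min
speedup  = total / newTotal                               -- ~3.4x
format "New total ~% min, ~%x faster\n" newTotal speedup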