
CUDA-accelerated partitioning, rendering, etc.

Hi All,

I was wondering if Krakatoa can utilize the GPU to accelerate rendering, partitioning, etc.

Can the GPU be helpful in certain Krakatoa tasks that are only single-threaded? How could it benefit and speed up Krakatoa over the CPU? I think I could ask the same for Frost as well.

My 2 cents:

There are barely any parts of Krakatoa and Frost left that are not multi-threaded. The ones that come to mind are the PRT Volume level set conversion and Frost's Geometry mode. Neither of them would benefit from GPU acceleration, but we have every intention of making them multi-threaded in the future.

GPU acceleration could be used to accelerate preview rendering and voxel rendering. The Krakatoa voxel rendering algorithm is perfectly suited to GPU acceleration, and IF we decide to support the GPU in the future, it might be the first candidate to try out.

The main reasons we are not looking into GPU acceleration yet are:
*Network rendering. Imagine a large VFX studio with dozens (or even hundreds - we have such customers) of Krakatoa-Render licenses on Deadline. Adding a CUDA-capable graphics card to every render node wouldn't be realistic, so keeping everything CPU-based is usually an actual selling point for large customers.
*Memory requirements - currently Krakatoa requires all particles to be in memory all the time. This means the amount of graphics card memory would become a limitation. Should we solve this for CPU rendering (out-of-core rendering), it might open up the potential for GPU rendering, too.
*A lot of what Krakatoa does is very, very fast. The slower parts are typically on the Max side - material and map evaluation, single-threaded deformations on the modifier stack (Magma is multi-threaded), etc. Moving to the GPU would not help there anyway… The same goes for saving particles - Max would not let us process things in parallel, and the GPU would not help.

I cannot comment on the potential of a GPU-accelerated Frost, because I have not discussed it with Paul and simply don't know. Frost is heavily multi-threaded already, but we know it could be faster. Do you have any benchmarks comparing Glu3D's GPU meshing with Frost? Especially in the 10+ million particle range.

Hi Bobo,

As for Frost, yes, it's indeed very fast. The only meshing method I find pretty slow in Frost is "anisotropic". I haven't done a benchmark against Glu3D's GPU meshing yet - it's not fully implemented yet - but I hope to do some benchmarking over the weekend.

As for Krakatoa, it's quite fast too. Maybe it's possible to implement GPU-based rendering for a real-time viewport preview when working with MagmaFlow? (Today I was working on some clouds R&D, vimeo.com/49687141, and I found the Cellular map quite slow when visualizing it in the viewport with MagmaFlow as the PRT viewport color.)

I would love to see a real-time preview to tweak parameters instantly - more of an interactive rendering. Visualizing voxels and points in the viewport would be awesome, whether in the Nitrous or the Direct3D viewport. :slight_smile:

The problem there is that the Cellular map itself isn't going to be GPU-rendered. So either you have to implement all of the old maps on the GPU (which ADSK has largely done, but I don't think they share), or you'd be stuck with access only to maps specifically written for this new GPU renderer.

Cellular itself wouldn’t be hard to convert, of course, just pointing out that you’d be giving up a huge chunk of your workflow to get GPU rendering, enough to make you wonder if it’s worth it.
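To put a rough shape on "wouldn't be hard": here is a toy device-side cellular lookup in CUDA. This is a generic Worley-style noise (distance to the nearest hashed feature point), NOT the actual 3ds Max Cellular map with all of its options, and every name in it (featurePoint, cellular, evalCellularPerParticle) is made up purely for illustration:

```
#include <cuda_runtime.h>
#include <cstdio>

// Hash an integer cell coordinate to one pseudo-random feature point inside it.
__device__ float3 featurePoint(int3 c)
{
    unsigned int h = (unsigned int)c.x * 73856093u ^
                     (unsigned int)c.y * 19349663u ^
                     (unsigned int)c.z * 83492791u;
    h = h * 1664525u + 1013904223u;
    float fx = (h & 0x3FFu) / 1023.0f;  h >>= 10;
    float fy = (h & 0x3FFu) / 1023.0f;  h >>= 10;
    float fz = (h & 0x3FFu) / 1023.0f;
    return make_float3(c.x + fx, c.y + fy, c.z + fz);
}

// Basic Worley/cellular value: distance to the nearest feature point,
// searching the 3x3x3 block of cells around the sample position.
__device__ float cellular(float3 p)
{
    int3 base = make_int3((int)floorf(p.x), (int)floorf(p.y), (int)floorf(p.z));
    float best = 1e30f;
    for (int dz = -1; dz <= 1; ++dz)
        for (int dy = -1; dy <= 1; ++dy)
            for (int dx = -1; dx <= 1; ++dx) {
                float3 f = featurePoint(make_int3(base.x + dx, base.y + dy, base.z + dz));
                float3 d = make_float3(p.x - f.x, p.y - f.y, p.z - f.z);
                best = fminf(best, d.x * d.x + d.y * d.y + d.z * d.z);
            }
    return sqrtf(best);
}

// One thread per particle: evaluate the noise at each particle position.
__global__ void evalCellularPerParticle(const float3* pos, float* out, int count)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < count) out[i] = cellular(pos[i]);
}

int main()
{
    const int count = 4;
    float3 hostPos[count] = { make_float3(0.1f, 0.2f, 0.3f), make_float3(1.5f, 2.5f, 3.5f),
                              make_float3(4.2f, 0.7f, 9.1f), make_float3(7.7f, 7.7f, 7.7f) };
    float3* devPos = nullptr;
    float*  devOut = nullptr;
    cudaMalloc(&devPos, count * sizeof(float3));
    cudaMalloc(&devOut, count * sizeof(float));
    cudaMemcpy(devPos, hostPos, count * sizeof(float3), cudaMemcpyHostToDevice);

    evalCellularPerParticle<<<1, count>>>(devPos, devOut, count);

    float hostOut[count];
    cudaMemcpy(hostOut, devOut, count * sizeof(float), cudaMemcpyDeviceToHost);
    for (int i = 0; i < count; ++i) printf("particle %d: cellular = %f\n", i, hostOut[i]);

    cudaFree(devPos);
    cudaFree(devOut);
    return 0;
}
```

The map itself really is the easy part - the workflow point above still stands, because every other map you rely on would need the same treatment.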

Would you be happy with something like a GPU-based PRT object with its own separate subset of modifiers that gets passed as a particle stream directly to Krakatoa? That way the entire particle dataset could stay in GPU memory and avoid getting passed back and forth, which is a huge bottleneck.
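For what it's worth, here is roughly what I mean as a minimal CUDA sketch - the offsetModifier and scaleModifier kernels are just made-up stand-ins for whatever per-particle operators such an object would expose: upload the positions once, run the whole "stack" on the device-resident buffer, and only read back when the renderer actually needs the data.

```
#include <cuda_runtime.h>
#include <cstdio>
#include <vector>

// Hypothetical "modifier" kernels operating in place on a device-resident
// particle position buffer. Each thread handles one particle.
__global__ void offsetModifier(float3* pos, int count, float3 offset)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < count) {
        pos[i].x += offset.x;
        pos[i].y += offset.y;
        pos[i].z += offset.z;
    }
}

__global__ void scaleModifier(float3* pos, int count, float scale)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < count) {
        pos[i].x *= scale;
        pos[i].y *= scale;
        pos[i].z *= scale;
    }
}

int main()
{
    const int count = 1 << 22;                       // ~4M particles
    std::vector<float3> host(count, make_float3(1.f, 2.f, 3.f));

    float3* devPos = nullptr;
    cudaMalloc(&devPos, count * sizeof(float3));

    // One upload, then the whole "modifier stack" runs on the GPU.
    cudaMemcpy(devPos, host.data(), count * sizeof(float3), cudaMemcpyHostToDevice);

    const int threads = 256;
    const int blocks  = (count + threads - 1) / threads;
    offsetModifier<<<blocks, threads>>>(devPos, count, make_float3(0.f, 10.f, 0.f));
    scaleModifier<<<blocks, threads>>>(devPos, count, 0.5f);

    // Read back only once, when the downstream consumer (e.g. the renderer)
    // actually needs the particles on the CPU side.
    cudaMemcpy(host.data(), devPos, count * sizeof(float3), cudaMemcpyDeviceToHost);
    cudaFree(devPos);

    printf("first particle: %f %f %f\n", host[0].x, host[0].y, host[0].z);
    return 0;
}
```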

What about an ActiveShade implementation of Krakatoa? We're working with Krakatoa SR right now and have an interactive version of it that is entirely CPU-based (for now), but it's really fast if you do proxying while adjusting the "MagmaFlow-esque" operators, and then when you let go of the sliders it does the full-resolution render in the background.

I didn't mean for the Cellular map to be sped up by GPU rendering - I may have misplaced that in this thread. (I was just saying in general that it was a bit slow in the viewport; maybe I should have posted that as a separate thread.)
I was interested to know whether anything is moving in a GPU-based direction, and whether GPU-based voxel rendering could be faster than the CPU.

My main interest would be to visualize particles interactively in the viewport, or with interactive rendering like you said exists in Krakatoa SR. Also, nowadays GeForce GPUs (the cheap ones) are available with 2GB and 3GB of memory, which could hold a decent amount of particles.

I just meant that each time you want to process the particles on the GPU, you have to move the data to the GPU. And each time you want to process the particles on the CPU, you have to move it back. Moving the data is the slowest part of the pipeline. It's not uncommon when we make OpenCL tools for Fusion to have the image upload take 5ms, the kernel itself take 3ms, and the readback take 5ms. If you get a faster GPU, it just gets proportionally worse: 5ms/1ms/5ms. So what we do is chain all the OpenCL tools together, do all the processing on the GPU, and transfer back to system memory as infrequently as possible. If you were to try to do this with Krakatoa, you might be able to get SOME modifiers to work on the GPU, but since most of what you want is on the CPU and isn't Thinkbox's at all (like Cellular), you will hit a transfer bottleneck.
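To make the transfer overhead concrete: the numbers above came from OpenCL in Fusion, but here is the same measurement pattern as a minimal CUDA sketch, timing the upload, a deliberately trivial addOne kernel (nothing from Krakatoa), and the readback separately with CUDA events. On most cards the two copies dwarf the kernel.

```
#include <cuda_runtime.h>
#include <cstdio>
#include <vector>

// Trivial per-element kernel; the work is deliberately tiny so that the
// host<->device copies dominate the timings.
__global__ void addOne(float* data, int count)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < count) data[i] += 1.0f;
}

static float elapsedMs(cudaEvent_t a, cudaEvent_t b)
{
    float ms = 0.f;
    cudaEventElapsedTime(&ms, a, b);
    return ms;
}

int main()
{
    const int count = 1 << 24;                      // 16M floats (~64 MB)
    std::vector<float> host(count, 1.0f);

    float* dev = nullptr;
    cudaMalloc(&dev, count * sizeof(float));

    cudaEvent_t t0, t1, t2, t3;
    cudaEventCreate(&t0); cudaEventCreate(&t1);
    cudaEventCreate(&t2); cudaEventCreate(&t3);

    cudaEventRecord(t0);
    cudaMemcpy(dev, host.data(), count * sizeof(float), cudaMemcpyHostToDevice);
    cudaEventRecord(t1);
    addOne<<<(count + 255) / 256, 256>>>(dev, count);
    cudaEventRecord(t2);
    cudaMemcpy(host.data(), dev, count * sizeof(float), cudaMemcpyDeviceToHost);
    cudaEventRecord(t3);
    cudaEventSynchronize(t3);

    printf("upload   %.2f ms\n", elapsedMs(t0, t1));
    printf("kernel   %.2f ms\n", elapsedMs(t1, t2));
    printf("readback %.2f ms\n", elapsedMs(t2, t3));

    cudaFree(dev);
    return 0;
}
```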

There are some parts of the rendering pipeline that could be done on the GPU, like the DOF splatting, but I don’t know if it would actually make enough of an improvement to justify the cost, especially since you might lose a significant portion of your render farm. So it might net out to be slower.

If you wanted to do voxel rendering in the viewport, maybe that would be something interesting for Ember? Let Krakatoa pass its points off to Ember, then allow Ember to sample them into voxels, upload to the GPU, and raymarch there? It might make more sense to do that as ActiveShade too.
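As a rough illustration of the "upload the voxels, raymarch on the GPU" last step - and this is a toy emission/absorption march with orthographic rays, not Krakatoa's actual voxel algorithm; the splatting of points into the grid is skipped entirely and the grid is just filled with uniform fog for the demo:

```
#include <cuda_runtime.h>
#include <cstdio>
#include <vector>

// One thread per output pixel: march an orthographic ray along +Z through a
// dense voxel grid and accumulate a simple emission/absorption result.
__global__ void raymarch(const float* density, int nx, int ny, int nz,
                         float stepSize, float* image)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= nx || y >= ny) return;

    float transmittance = 1.0f;
    float radiance = 0.0f;
    for (int z = 0; z < nz; ++z) {
        float d = density[(size_t(z) * ny + y) * nx + x];
        float alpha = 1.0f - __expf(-d * stepSize);   // absorption over one step
        radiance += transmittance * alpha;            // constant white emission
        transmittance *= (1.0f - alpha);
        if (transmittance < 1e-4f) break;             // early ray termination
    }
    image[y * nx + x] = radiance;
}

int main()
{
    const int nx = 256, ny = 256, nz = 256;
    std::vector<float> density(size_t(nx) * ny * nz, 0.05f); // uniform fog for the demo

    float *devDensity = nullptr, *devImage = nullptr;
    cudaMalloc(&devDensity, density.size() * sizeof(float));
    cudaMalloc(&devImage, size_t(nx) * ny * sizeof(float));
    cudaMemcpy(devDensity, density.data(), density.size() * sizeof(float),
               cudaMemcpyHostToDevice);

    dim3 threads(16, 16);
    dim3 blocks((nx + 15) / 16, (ny + 15) / 16);
    raymarch<<<blocks, threads>>>(devDensity, nx, ny, nz, 1.0f, devImage);

    std::vector<float> image(size_t(nx) * ny);
    cudaMemcpy(image.data(), devImage, image.size() * sizeof(float),
               cudaMemcpyDeviceToHost);
    printf("center pixel: %f\n", image[(ny / 2) * nx + nx / 2]);

    cudaFree(devDensity);
    cudaFree(devImage);
    return 0;
}
```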

It is very "settings"-dependent; it can be quite fast too. The "right" combination of settings will indeed slow it down to a crawl. I love anisotropic - cheap sheeting is a beautiful thing. I would love it more if the sheeting were a bit more stable; it can flutter quite a bit if you are stretching too far across gaps.

I did some tests, and Frost is very fast compared to GPU meshing. I was wondering what possibilities could open up with GPU support for Krakatoa, seeing that other plugins, applications, and fluid systems use the GPU to maximize performance.

My question now is: what are the possibilities for an ActiveShade or real-time preview of Krakatoa voxels/points?

As mentioned already, voxel rendering in Krakatoa was implemented based on a real-time GPU algorithm, so it is predestined to be the first thing we try if we start looking at GPU acceleration. I suspect we could do the same for point rendering after that… Voxel rendering typically requires fewer particles to produce solid-looking results, so it would require less memory on the graphics card. Also, voxel rendering is currently a lot slower than point rendering on the CPU and uses far fewer cores than point rendering, so speeding it up would have a larger impact…

It is being considered for Krakatoa MX 3, but we can make no promises at this point.

A good rule of thumb for determining a good use of GPGPU: "Can I saturate thousands of threads?" If you aren't able to break the processing apart into thousands of tiny independent pieces, you are just using the GPU like a much slower CPU.
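As a concrete instance of that rule of thumb (the normalizeDensity kernel is hypothetical, not anything shipping): a simple per-particle operation over 10 million particles breaks into roughly 40,000 blocks of 256 threads, which easily saturates a GPU, whereas work that can't be split into independent pieces just serializes.

```
#include <cuda_runtime.h>
#include <cstdio>

// Per-particle work is trivially parallel: every particle is an independent
// piece, so a 10M-particle set maps onto tens of thousands of blocks.
// A grid-stride loop keeps the kernel correct for any launch size.
__global__ void normalizeDensity(float* density, int count, float maxDensity)
{
    for (int i = blockIdx.x * blockDim.x + threadIdx.x;
         i < count;
         i += blockDim.x * gridDim.x)
    {
        density[i] /= maxDensity;
    }
}

int main()
{
    const int count   = 10 * 1000 * 1000;   // 10 million particles
    const int threads = 256;
    const int blocks  = (count + threads - 1) / threads;
    printf("%d particles -> %d blocks of %d threads\n", count, blocks, threads);

    float* dev = nullptr;
    cudaMalloc(&dev, count * sizeof(float));
    cudaMemset(dev, 0, count * sizeof(float));

    normalizeDensity<<<blocks, threads>>>(dev, count, 2.0f);
    cudaDeviceSynchronize();

    cudaFree(dev);
    return 0;
}
```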

A minor point is memory. Since graphics cards have less memory, you need to be clever about paging or streaming in new data. This is super easy for a raytracer, but might be difficult for Krakatoa.
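A minimal sketch of what the streaming could look like, assuming the per-chunk work is independent (which real particle rendering would not always allow - that is exactly the part that "might be difficult"): the dataset sits in pinned host memory and is pushed through the GPU one chunk at a time, double-buffered across two streams so uploads can overlap with compute. The brighten kernel is just a stand-in for per-chunk processing.

```
#include <cuda_runtime.h>
#include <cstdio>

// A kernel standing in for "process one chunk of particle data".
__global__ void brighten(float* chunk, int count)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < count) chunk[i] *= 1.1f;
}

int main()
{
    const int total     = 1 << 24;   // 16M floats: more than we want resident at once
    const int chunkSize = 1 << 22;   // 4M floats per chunk
    const int numChunks = total / chunkSize;

    // Pinned host memory so async copies can overlap with kernels.
    float* host = nullptr;
    cudaMallocHost(&host, total * sizeof(float));
    for (int i = 0; i < total; ++i) host[i] = 1.0f;

    // Two device buffers and two streams: while one chunk is being processed,
    // the next one is being uploaded.
    float* dev[2];
    cudaStream_t stream[2];
    for (int s = 0; s < 2; ++s) {
        cudaMalloc(&dev[s], chunkSize * sizeof(float));
        cudaStreamCreate(&stream[s]);
    }

    for (int c = 0; c < numChunks; ++c) {
        int s = c % 2;
        float* src = host + size_t(c) * chunkSize;
        cudaMemcpyAsync(dev[s], src, chunkSize * sizeof(float),
                        cudaMemcpyHostToDevice, stream[s]);
        brighten<<<(chunkSize + 255) / 256, 256, 0, stream[s]>>>(dev[s], chunkSize);
        cudaMemcpyAsync(src, dev[s], chunkSize * sizeof(float),
                        cudaMemcpyDeviceToHost, stream[s]);
    }
    cudaDeviceSynchronize();
    printf("first value after streaming: %f\n", host[0]);

    for (int s = 0; s < 2; ++s) {
        cudaFree(dev[s]);
        cudaStreamDestroy(stream[s]);
    }
    cudaFreeHost(host);
    return 0;
}
```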
