Does Krakatoa really use all CPUs when partitioning?

I’m saving out partitions right now and I notice that only 25% of my available CPU power is being used. The machine I’m running this on has an Intel i7 930. In my Krakatoa Prefs, I do indeed have “Use All CPUs(8)” selected, but I don’t see that reflected in the Windows Task Manager. So either Krak is lying or Windows is.

Any idea what’s really going on?

That preference is only applicable to certain parts of Krakatoa involved in the rendering process. It has no effect on saving particles, especially when running particles through external systems (e.g. Particle Flow, Thinking Particles, modifiers, etc.).

Some people have had success using Deadline to start multiple instances of 3ds Max on the same machine in order to use more cores. Depending on the I/O requirements of your scene, and given that starting Max often takes longer than partitioning a frame in Krakatoa, your mileage may vary.

Interesting. Thank you.

If you look closer, the Krakatoa setting you mentioned is labeled “Particle Sorting Multi-Threading Settings”:
software.primefocusworld.com/sof … erformance

So no lie there: the only thing that really hits 100% CPU, even on our 8 cores here, is the sorting of particles before calculating attenuation and before drawing the final pass.

Various other parts of the system are multi-threaded, but they rarely come close to 100% for various reasons. For example, applying materials and maps to particles during the loading of a PRT Loader is multi-threaded, but it only hits the CPUs hard with very complex shader trees, which are rarely used in Krakatoa anyway. The same applies to culling particles in the PRT Loader (multiple particles are processed in parallel). The Magma sub-system, accessible through the KCM modifiers and the MagmaFlow editor, also uses all cores, but it takes complex trees with hundreds of nodes to really push the limits of the system. Right now, the memory management overhead is probably the real “bottleneck” of Magma (if you can consider processing several million particles per second a “bottleneck” :wink: ).
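If it helps to picture the memory overhead point, here is a rough analogy in Python/NumPy - this is not how Magma is actually implemented, just a sketch of the effect: when every operation allocates a fresh temporary buffer for millions of particles, the allocations can cost as much as the (trivial) arithmetic, while reusing one preallocated buffer avoids most of that cost.

```python
import numpy as np

n = 5_000_000  # a few million particles, as in a typical Magma evaluation
density = np.random.rand(n).astype(np.float32)

# Allocation-heavy: each operator creates a new temporary array per step,
# so memory management competes with the (cheap) per-particle math.
result = np.sqrt((density * 2.0) + 1.0)

# Allocation-light: one preallocated buffer reused for the whole chain.
buf = np.empty_like(density)
np.multiply(density, 2.0, out=buf)
np.add(buf, 1.0, out=buf)
np.sqrt(buf, out=buf)
```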

In the case of saving particles, though, there are two sides to the story: the source object producing the particles, and the saving code of Krakatoa itself. Obviously, Particle Flow and pretty much every other 3rd party system you can save from is single-threaded, so we cannot do anything there but wait for it to provide particles. But in the current version 1.5.1 we perform the saving one particle at a time, which is not optimal when the particles are processed by Krakatoa beyond the generation step itself. For example, if you are resaving particles loaded via a PRT Loader and applying KCMs, deformation modifiers and materials to them before saving to a new PRT sequence, the process is slower than it should be. In the upcoming version (which we are working hard to deliver soon), this has been improved by pooling a large number of particles together and running all the additional operations in parallel, engaging more than one CPU and, of course, producing a measurable speedup of the saving process.
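For the curious, here is a structural sketch of the difference between the two saving strategies. This is hypothetical Python, not Krakatoa's actual C++ code, and apply_channel_ops just stands in for whatever the KCMs, deformations and materials do per particle:

```python
from concurrent.futures import ThreadPoolExecutor

def apply_channel_ops(particle):
    # Stand-in for the per-particle work: KCMs, deformations, materials.
    x, y, z = particle
    return (x * 1.1, y * 1.1, z * 1.1)

def save_serial(particles, out):
    # 1.5.1 behavior: one particle at a time, so only one core stays busy.
    for p in particles:
        out.append(apply_channel_ops(p))

def save_pooled(particles, out, batch_size=65536):
    # Upcoming behavior: pool a big batch, process it in parallel, then
    # hand the whole batch to the (still sequential) file writer.
    # In C++ the workers really run in parallel; Python's GIL would
    # serialize pure-Python work, so treat this as structure only.
    with ThreadPoolExecutor() as pool:
        for i in range(0, len(particles), batch_size):
            batch = particles[i:i + batch_size]
            out.extend(pool.map(apply_channel_ops, batch))
```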

Thank you very much! I’m still wrapping my head around a lot of Krakatoa’s features, so I admit my noobness. Thanks a million for the clarification!

I must admit that I am also wondering why Deadline - even with 8 instances of 3dsmax started - only uses around 15% of my system resources…

If I look in the Task Manager processes list, each instance of 3dsmax only uses between 1% and 3% CPU?? Surely something must be wrong?

My scene is a basic setup - 1 million particles and some wind - 16 partitions - 2 machines/8 threads each…

On Deadline 3.1 my nodes are at full load (80-100% CPU usage) when partitioning - so, my guess is that this is somehow related to Deadline 4.0

Here is a screenshot - only got 6 instances of 3dsmax started so far - but it clearly shows that only one thread is actually doing something

http://i176.photobucket.com/albums/w179/Martians3d/deadlinethreadprob.jpg

Hope there will be a solution to this problem soon :slight_smile:

Yeah, I have the same issue Martians has :frowning:

I did some testing on our slaves (running Deadline 4) and I am quite sure I sent the results to Martin.
I could not reproduce the issue on our network.

In general, 3-4% is the number I am seeing on an 8-core workstation when Windows is paging and cannot really process the scene correctly. Can you submit 8 tasks to a slave, then watch the Task Manager and take some screenshots of the CPU/Memory tab and the Processes tab with all Max copies sorted on top, so we can see the CPU usage distribution and the memory load?

When processing PFlows (single-threaded), I see around 10-13% load going up and down. When running concurrent tasks on the slave, the real issue for me was the startup time of each task. On our system, loading Max takes a while because we also hit the network to sync the whole Max installation to ensure plugins are identical. So I could not get all tasks running in parallel - before the last one started, the first one was finished. Thus, I couldn’t get more than 70% CPU load at any given time, and even that fluctuated between 40 and 70%.

I am not sure whether you are seeing a Deadline issue, a Krakatoa issue or a hardware issue. I have spoken to the Deadline developers and they have no idea what could be going on. The fact that it behaves as expected on our slaves using Deadline 4 shows that it is most probably not a Deadline problem.

Since we have more than enough slaves, we rarely use concurrent task processing, so there might be things we don’t know about simply due to lack of practical experience. On a historical note, the whole Concurrent Tasks ability came from the early pre-Krakatoa renderer we used on “Stay” in 2004 - it was also single-threaded and quite I/O-bound, as it had to read Spore particles from a slow external DLL, so we were running up to 8 tasks per 2-CPU slave to saturate it.

Let’s try to get to the bottom of this.

According to the screenshot, only ONE of 6 max copies has actually loaded the (untitled) scene.
All others are still on the “startup” MAX file used to create the connection to the 3dsMax plugin.
So on this screenshot, according to Windows 7, only one Max copy is actually working on particles, the others are still loading.
I want to see a screenshot of all 8 (or at least 6) listing Untitled as the scene name (or whatever the scene name is).

The strange thing is that according to the Deadline Slave, 5 out of 8 threads are rendering. This is not supported by the Windows display of Max copies.
I actually trust what Windows is showing in the Task bar more than what the Slave says, so I don’t think you have more than one Max running at that moment…

Yes, that would definitely explain the minimal system load I’m experiencing…

I’ll do another test later and let it run long enough to attempt opening all 8 threads… It only ran about 15 min. for the last screenshot here…

Just a thought, could it somehow be license related? Maybe running in license-free mode is causing it? Or did you test both with and without Deadline license?

It should not be license-related. In fact, the free mode just checks the number of slaves installed and does NOT look for a license at all, while my test was on a fully licensed network with about 100 machines online.
Launching another concurrent task on the same machine seems to always take quite long. If you have 8 CPUs, try launching 10 tasks with 4 tasks per machine and 1000 frames or longer. Once the first task is done, the same machine will continue with task 5 without reloading Max while the other 3 are still working. If you still don’t get around 40-50% CPU load, then we have a real issue. Right now, I think it is just the sluggishness of the Max launch that is interfering.

Alright, just did a 40-minute test… Basic scene, 10 partitions - 4 tasks per node

Here are the screenshots:

http://i176.photobucket.com/albums/w179/Martians3d/deadlinethreadprob2.jpg

http://i176.photobucket.com/albums/w179/Martians3d/deadlinethreadprob_monitor.jpg

http://i176.photobucket.com/albums/w179/Martians3d/deadlinethreadprob_monitor_errorlog.jpg

This is quite strange. According to the first screenshot, all 4 partitions are running (each takes around 6 seconds), but the CPU load is just 7%. Also, I am not sure why Max is showing “deadlineStartupMax2011” as the scene title - that could be a glitch in Windows. According to the Monitor, your scene was Untitled, and that is what should be listed.

I would love to know how long it takes to save one frame and one whole partition on your local workstation without Deadline. Does it take 6 seconds or much less?

Then the second screenshot confirms that the two nodes have 4 threads each.

The third screenshot shows an error that I cannot explain (we have seen it several times, but could never reproduce the cause for it). In short, Max somehow manages to create a user interface in network rendering mode where no user interface could exist. The “popup” dialog is part of the Krakatoa GUI, but the Krakatoa GUI cannot be open during network rendering because Max does not support viewports or user interface elements in that mode.
I would be interested to know whether that error happens all the time or just sometimes. I would love to fix this issue if we can find out what is causing it.

Also, I am a bit concerned about your description of the scene as a “basic scene”. If you are testing CPU load, you have to test with a HEAVY scene, something that makes PFlow think a lot: heavy collisions, a lot of spawning, things that generally hit the CPU.

What I would suggest as part of the test is running one partition of the same scene locally on your workstation and noting how long it took. (You could use fewer frames so you don’t have to wait 40 minutes per partition.) Then multiply that by 10 to see how long it would take to make all 10 partitions locally.
Then compare that to the total time it took to save those same 10 partitions using the two nodes on Deadline. Was the local partitioning of 10 partitions faster or slower than the Deadline partitioning? In theory, 2 nodes x 4 tasks should save 8 times faster, but with the overhead of loading Max it might be closer to 4x. And with the last two partitions handled in a second wave, the whole process might be even slower - but I would still expect it to be at least 3 times faster than one workstation doing all 10.
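To make those expectations concrete, here is a quick sanity check with hypothetical timings - the 25 minutes per partition and 5 minutes of Max startup are placeholders, so plug in your own measurements:

```python
local_one = 25                  # min for 1 partition locally (assumed)
local_all = 10 * local_one      # 250 min for all 10 on one workstation

slots = 2 * 4                   # 2 nodes x 4 concurrent tasks
# 10 partitions on 8 slots run in 2 "waves": 8 first, the last 2 after,
# so the wall clock is about 2 task lengths plus Max startup overhead.
waves = -(-10 // slots)         # ceil(10 / 8) = 2
startup = 5                     # assumed minutes to load Max once
deadline_wall = waves * local_one + startup   # ~55 min
print(local_all / deadline_wall)  # ~4.5x - in the "closer to 4x" range
```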

Then try submitting the same scene to both nodes with ONE task per machine, without concurrent tasks. In theory this should be about twice as fast as one workstation doing all 10, but with the overhead of managing Max it would probably be less than that. On the other hand, it would free up your workstation to do other things, so it is not a bad approach.
Still, we want to get an idea whether partitioning on Deadline with your setup DOES make things faster than running local partitioning on your workstation. Let’s get some actual numbers. Let’s not even look at the CPU load at this point - let’s find out which method gives you the 10 partitions in less time.

If it turns out that Deadline partitioning does not give you the same output in less time, we really have a serious problem. If it is faster, we want to figure out how much faster and whether the Concurrent Tasks or 1 task per machine produces the output faster.

Thank you very much for your time!

Here are my results using 500 frames and 1 million particles affected by wind:

Single partition on workstation:

25 minutes

Deadline (nodes are identical to workstation):

10 partitions and 10 threads (2 x 5):

130 minutes

Which means that Deadline, even when using 10 threads on two fast render nodes, only cuts the time roughly in half compared to a single thread on an identical workstation (130 minutes vs. 10 x 25 = 250 minutes).

It could of course be that using a heavier scene would tilt these numbers more in Deadline’s favour. But it still seems a bit wrong to me…

Now try 10 partitions on Deadline with 1 thread (one task per machine). Does it still take 130 minutes?

Saving the partitions using Deadline and a single thread took 35 minutes per partition, so 350 minutes in total - almost 3 times longer than with multiple threads…

So I think that the Task Manager might actually be the culprit - or at least it shows some strange info about CPU/thread usage…