I’m getting erratic performance when culling particles in the PRT Loader. Sometimes Task Manager shows 66% CPU usage, and other times around 25%. This is on a four-core system.
I am loading the PRT from a RAM drive, so I don’t think it’s an I/O issue, and I’m only culling, not doing any color or density modifications, and no deformation other than the node’s transform.
Is there anything I should be looking out for to make this faster?
Krakatoa loads the particles in batches (of around 50,000 particles) and culls them in parallel, then loads the next batch and culls it in parallel, and so on until done. Since the I/O portion is by definition a single-threaded operation, you generally won’t see 100% processor utilization, and the CPU usage will spike up and down rapidly. That said, it will also be more erratic if the culling geometry is fairly light, since the parallel portion of the pipeline becomes smaller relative to the serial portion (such as I/O and Modifiers).
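For illustration only, here is a minimal sketch of that batch-then-cull shape: a serial read of a ~50,000-particle batch, followed by a cull split across the available cores. All of the names here (Particle, read_batch, inside_culling_volume) are placeholders, not Krakatoa’s actual API.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <random>
#include <thread>
#include <vector>

struct Particle { float pos[3]; };

// Stand-in for the serial part of the pipeline: reading and decompressing the
// next ~50,000 particles from the PRT stream. Here it just generates a few
// batches of random positions so the sketch runs on its own.
bool read_batch(std::vector<Particle>& batch, std::size_t batchSize) {
    static int batchesLeft = 3;
    if (batchesLeft-- <= 0)
        return false;
    static std::mt19937 rng(42);
    std::uniform_real_distribution<float> dist(-2.0f, 2.0f);
    batch.resize(batchSize);
    for (Particle& p : batch)
        for (float& c : p.pos)
            c = dist(rng);
    return true;
}

// Stand-in culling test: keep particles inside a 2x2x2 box around the origin.
bool inside_culling_volume(const Particle& p) {
    return std::fabs(p.pos[0]) < 1.0f && std::fabs(p.pos[1]) < 1.0f &&
           std::fabs(p.pos[2]) < 1.0f;
}

// Cull one batch across all available cores, each worker taking its own slice.
void cull_batch_parallel(const std::vector<Particle>& in, std::vector<Particle>& out) {
    const unsigned numThreads = std::max(1u, std::thread::hardware_concurrency());
    std::vector<std::vector<Particle>> partial(numThreads);
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < numThreads; ++t) {
        workers.emplace_back([&, t] {
            const std::size_t begin = in.size() * t / numThreads;
            const std::size_t end = in.size() * (t + 1) / numThreads;
            for (std::size_t i = begin; i < end; ++i)
                if (inside_culling_volume(in[i]))
                    partial[t].push_back(in[i]);
        });
    }
    for (std::thread& w : workers)
        w.join();
    for (const auto& p : partial)
        out.insert(out.end(), p.begin(), p.end());
}

int main() {
    std::vector<Particle> batch, kept;
    // Serial read, parallel cull, repeat: only the culling step scales with cores.
    while (read_batch(batch, 50000))
        cull_batch_parallel(batch, kept);
    return 0;
}
```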
As far as I can tell, there isn’t such a thing as multi-threaded disk I/O. It’s not really a processor operation, so having more cores doesn’t make your disk or network transfer data into RAM any faster. I do have Krakatoa doing constant disk reads in a background thread, filling one buffer while the rest of Krakatoa is processing the previous one, but you still run into standard bandwidth limitations: if you can cull 50,000 particles faster than you can load and decompress 50,000 particles from a networked drive, you’ll see slowdowns. I can definitely spend time working on variable-sized buffers to hide the limited bandwidth, more or less like buffering in streaming video. It’s kind of a dumb system right now, since it doesn’t adapt the read-ahead buffer sizes based on the amount of computation you are doing to the incoming particles. So, long story short, it is sort of multi-threaded. Do you guys have any sneaky tricks that you use?
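A rough sketch of that read-ahead idea, and not the actual Krakatoa code: a background reader thread fills the next buffer while the main thread culls the current one, with a single-slot mailbox keeping exactly one chunk in flight. The mailbox class and the read_and_decompress_chunk / cull_chunk names are made up for this example.

```cpp
#include <condition_variable>
#include <mutex>
#include <optional>
#include <thread>
#include <vector>

using Buffer = std::vector<char>; // raw, decompressed particle bytes

// Single-slot mailbox: lets the reader stay exactly one chunk ahead, which is
// the "fill one buffer while processing the other" behaviour described above.
class SingleSlotMailbox {
    std::mutex m_;
    std::condition_variable cv_;
    std::optional<Buffer> slot_;
    bool done_ = false;

public:
    void put(Buffer b) {  // producer: wait for a free slot, then hand over a buffer
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [&] { return !slot_.has_value(); });
        slot_ = std::move(b);
        cv_.notify_all();
    }
    void close() {  // producer: signal end of stream
        std::lock_guard<std::mutex> lock(m_);
        done_ = true;
        cv_.notify_all();
    }
    std::optional<Buffer> take() {  // consumer: next chunk, or nothing at end of stream
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [&] { return slot_.has_value() || done_; });
        if (!slot_.has_value())
            return std::nullopt;
        std::optional<Buffer> b = std::move(slot_);
        slot_.reset();
        cv_.notify_all();
        return b;
    }
};

// Stand-in for the real read + decompress step: produces a few dummy chunks.
bool read_and_decompress_chunk(Buffer& out) {
    static int chunksLeft = 4;
    if (chunksLeft-- <= 0)
        return false;
    out.assign(50000 * 12, 0); // roughly 50k particles worth of position bytes
    return true;
}

// Stand-in for the culling work done on each chunk in the main thread.
void cull_chunk(const Buffer& chunk) { (void)chunk; }

int main() {
    SingleSlotMailbox mailbox;
    std::thread reader([&] {
        Buffer b;
        while (read_and_decompress_chunk(b))
            mailbox.put(std::move(b)); // background read fills the next buffer
        mailbox.close();
    });
    // Cull chunk N while the reader is already loading chunk N+1.
    while (std::optional<Buffer> chunk = mailbox.take())
        cull_chunk(*chunk);
    reader.join();
    return 0;
}
```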
Yeah, I’ll have Ben send you some info on doing multithreaded disk I/O. Dunno if it will help or not, but when we were optimizing our disk reads for Fusion, we had to deal with this.
The buffering part was related to the fact that our I/O systems, namely the Fibre Channel SAN and RAM drives, were way faster than our normal “local” hard drives, so the buffering was a toggle that could increase performance. But we also had controls over the actual file access sizes in bytes and could tweak them per machine.
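Just to illustrate the per-machine tuning idea, here is a small sketch where the file access size comes from an environment variable, so it can be matched to the storage system without rebuilding anything. The variable name PRT_READ_CHUNK_BYTES is hypothetical, not an actual Fusion or Krakatoa setting.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <vector>

// Pick the file access size from an environment variable so it can be tuned
// per machine; fall back to 1 MiB and ignore obviously bad values.
std::size_t read_chunk_bytes() {
    const char* env = std::getenv("PRT_READ_CHUNK_BYTES"); // hypothetical name
    const long v = env ? std::atol(env) : 0;
    return (v >= 4096) ? static_cast<std::size_t>(v) : (std::size_t(1) << 20);
}

// Read a whole file using the tuned chunk size; returns the byte count.
std::size_t read_file_in_chunks(const char* path) {
    std::FILE* f = std::fopen(path, "rb");
    if (!f)
        return 0;
    std::vector<char> buffer(read_chunk_bytes());
    std::size_t total = 0, got = 0;
    while ((got = std::fread(buffer.data(), 1, buffer.size(), f)) > 0)
        total += got; // a real loader would decompress/parse the chunk here
    std::fclose(f);
    return total;
}

int main(int argc, char** argv) {
    for (int i = 1; i < argc; ++i)
        std::printf("%s: %zu bytes\n", argv[i], read_file_in_chunks(argv[i]));
    return 0;
}
```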
The asynchronous overlapping I/O might be tricky because of the compression. However, if you have multiple PRTs coming in, à la partitioning, then you could overlap the reading of each PRT.
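A quick sketch of what overlapping the partition reads could look like: one reader thread per PRT file, so the I/O and decompression of each partition overlaps with the others instead of running back to back. The load_prt_file function here is just a stand-in that counts bytes, not a real loader.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <thread>
#include <vector>

// Stand-in for reading and decompressing one partition PRT; here it simply
// streams the file and counts bytes so the sketch is self-contained.
std::size_t load_prt_file(const std::string& path) {
    std::FILE* f = std::fopen(path.c_str(), "rb");
    if (!f)
        return 0;
    std::vector<char> buf(std::size_t(1) << 20);
    std::size_t total = 0, got = 0;
    while ((got = std::fread(buf.data(), 1, buf.size(), f)) > 0)
        total += got;
    std::fclose(f);
    return total;
}

// One reader thread per partition, so each PRT's read overlaps with the others.
std::vector<std::size_t> load_partitions(const std::vector<std::string>& paths) {
    std::vector<std::size_t> counts(paths.size(), 0);
    std::vector<std::thread> readers;
    readers.reserve(paths.size());
    for (std::size_t i = 0; i < paths.size(); ++i)
        readers.emplace_back([&, i] { counts[i] = load_prt_file(paths[i]); });
    for (std::thread& t : readers)
        t.join();
    return counts;
}

int main(int argc, char** argv) {
    std::vector<std::string> paths(argv + 1, argv + argc);
    const std::vector<std::size_t> counts = load_partitions(paths);
    for (std::size_t i = 0; i < paths.size(); ++i)
        std::printf("%s: %zu bytes\n", paths[i].c_str(), counts[i]);
    return 0;
}
```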
Fusion provides control over all these options via a crazy mix of environment variables and preference files, but the performance difference is startling. Matching the buffer size to that of the storage system alone provided a huge improvement, and when we fired off multiple threads on uncompressed image data, we saw a really big jump in speed.