encapsulating channels in single sample file

LaszloSebo · February 5, 2014, 5:41pm

What's your roadmap for encapsulating channels that belong to single samples into single files?

I would be super eager to have this feature… A couple of examples to illustrate why.

I did a 10-frame publish of an object set of 202 objects (separate caches), which resulted in 4907 files being generated. This is a lot, but still reasonable. However, the full frame range we have for this shot is 2100 frames…
An earlier export of this object-set resulted in 300k+ files, which we could not use as transferring that between facilities was prohibitively expensive… :-\

Another major problem is that server access to tiny files is much slower, due to the ratio of file handle creation / handling vs. actual file read activity. Some of these caches, while having only a low number of faces, have very low viewport performance, mainly due to the file access activity.

We would still prefer to have samples in separate files, just not every channel.

paul · February 5, 2014, 7:09pm

This is our top priority for when we update the XMesh file format. However, we don’t have a timeline for this development yet. As always, if this is a high priority for you, we could develop this through our professional services (custom development for a fee).

I wonder if one sample per file is sufficient in a case like this? At one file per sample, we’re still looking at 202 x 2100 = 424 200 files. We may need to combine frames or objects into one file.
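To put the numbers in this thread side by side, here is a rough Python sketch of the arithmetic. The 4907-files-per-10-frames and 202 x 2100 figures come from the posts above; scaling the current layout linearly to the full frame range is only an assumption, since not every file is necessarily written on every frame.

```python
# Figures quoted in this thread.
objects = 202
frames_published = 10
files_published = 4907
full_range_frames = 2100

# Layout A: one file per sample (one per object per frame).
one_file_per_sample = objects * full_range_frames

# Layout B: current per-channel layout.
# Assumption: file count scales linearly with the number of frames.
projected_current = files_published * full_range_frames // frames_published

# Layout C: one file per frame, with all objects combined.
one_file_per_frame = full_range_frames

print("one file per sample:      ", one_file_per_sample)
print("current layout (estimate):", projected_current)
print("one file per frame:       ", one_file_per_frame)
```

Under these assumptions, one file per sample cuts the count by a bit more than half, while combining objects per frame cuts it by two further orders of magnitude.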

May I ask why you prefer samples in separate files?

LaszloSebo · February 6, 2014, 12:15am

The reason we prefer separate files is that we localize cache data on workstations / rendernodes, based on what's required. If you are rendering frame X, you will only be pulling that one frame and what it needs to render it.

LaszloSebo · February 7, 2014, 6:24pm

Actually, for cases like this, we would probably turn off localization… So maybe an option to have a single file per cache would be beneficial. I feel like a hypocrite, because I have been arguing for separate files per sample for 10+ years, and even wrote my own caching format with that philosophy.

paul · February 7, 2014, 7:50pm

Are you thinking that there will be too many xmesh (XML) files if we use separate files?

(What I was thinking of is keeping the separate XML files, but combining the binary data into one giant file. We could provide a simple command-line program to handle copying a frame range out of the combined binary data file.)
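A minimal Python sketch of that idea, for illustration only: per-frame blobs are concatenated into one byte string, and a small index records where each frame lives. The function names and the `(offset, length)` index are hypothetical and are not the XMesh format; the index simply stands in for whatever the separate XML files would record.

```python
import io


def pack_frames(frames):
    """Concatenate per-frame blobs in frame order.

    Returns (data, index), where index maps frame -> (offset, length).
    """
    data = io.BytesIO()
    index = {}
    for frame, blob in sorted(frames.items()):
        index[frame] = (data.tell(), len(blob))
        data.write(blob)
    return data.getvalue(), index


def extract_range(data, index, first, last):
    """Copy frames first..last (inclusive) into a new combined blob and index."""
    subset = {
        frame: data[offset:offset + length]
        for frame, (offset, length) in index.items()
        if first <= frame <= last
    }
    return pack_frames(subset)


# Toy data standing in for three frames of binary channel data.
frames = {1: b"aaaa", 2: b"bb", 3: b"cccccc"}
data, index = pack_frames(frames)
sub_data, sub_index = extract_range(data, index, 2, 3)
print(sub_index)
```

With an index like this, a reader can seek straight to a single frame, and copying a frame range out is a matter of slicing and rebuilding the offsets.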

im_thatoneguy · February 12, 2014, 5:41am

Sign us up for all objects in one file please! I would expect this as an option with the ability to pick an object on read. Obviously with an option to generate 202 objects for instance as needed.

You could still pull down just the samples needed but it would be more like 4 files instead of 808.

LaszloSebo · February 12, 2014, 5:57pm

Combining the data into one file already takes away the ability to easily manipulate the data, say, swapping channel files to turn UVWs into a geometry so you can deform it with modifiers or selectively re-cache a single frame (which we actually do in a lot of cases).

So at that point, I think the separate XML files are redundant, and should also be combined into the large file.

My instinct tells me that we should still have the “one file = one sample” approach, as that still lets us combine multiple objects into a single file (if we just export them as a single set), and also lets us keep them separate (at the expense of multiple files).

The ability to recache a single sample is probably the most important thing, as we sometimes can't afford to recache an entire sequence for one bad frame…

im_thatoneguy · February 12, 2014, 6:00pm

Would it though? If they’re still plain text, wouldn’t it only add one line of code to look for one tag instead of parsing the entire XML? Or at worst two: <Object=“$NAME”> && ?
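As a rough illustration of picking one object out of a combined plain-text file, here is a Python sketch that streams through the XML and stops at the first match. The element and attribute names (`cache`, `object`, `name`, `channel`, `file`) and the file names are hypothetical and are not taken from the XMesh format.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical combined file describing two objects.
COMBINED = b"""<cache>
  <object name="teapot"><channel name="verts" file="teapot_verts.bin"/></object>
  <object name="sphere"><channel name="verts" file="sphere_verts.bin"/></object>
</cache>"""


def find_object(source, name):
    """Return the first <object> element with the given name, or None."""
    # The "end" event fires once an element and its children are complete.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "object" and elem.get("name") == name:
            return elem
    return None


obj = find_object(io.BytesIO(COMBINED), "sphere")
files = [channel.get("file") for channel in obj.findall("channel")]
print(files)
```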

LaszloSebo · February 17, 2014, 5:49pm

Yeah, the swapping could be easily scripted.

LaszloSebo · February 17, 2014, 5:50pm

I recall several occasions where we had to recache entire several-hour Alembic caches simply because they were a single file. If we had used XMesh at the time, it would have been a simple requeue of a particular frame in Deadline, done. So I think having the samples as separate files might actually be the better option. But right now, due to the 10+ files per sample, I feel that it's too expensive.