Should the Cache External File References On Slave or Copy ALL External File References to Repository option account for PRTs?
I’m sure it would mean a lot of I/O and make for huge jobs, but it would let you work with PRTs locally and still have the farm render your scene just fine. And in the case of the Cache option, you could (if I interpret the help file correctly) reuse the PRTs for multiple passes, which would reduce network I/O.
Nope, as far as I understand the process, copying the files to the Repository does not reduce network load.
The reason people copy external files to the Repository is to “freeze” them at the version they had at the moment of submission. This makes sense if the assets are still being worked on and you don’t want a texture to suddenly change in the middle of a render job because an artist modified it on the network.
This does NOT reduce the network load, because the textures are pulled from their network locations and then sent to the Repository, which is just another network location. The job data then has to be copied to the slaves for rendering, so you are in fact doing one extra step here (collecting and duplicating the data).
What we did at Prime Focus at one point, in order to avoid this in a custom internal pipeline, was to cache all textures on the slaves. The textures were not collected and copied to the Repository; instead, their names and original locations were collected in a text file and sent with the job, and the MAX scene was saved with the texture paths stripped down to just the file name and type.

A script would run on the slave, go through this list of files, look in a local cache folder, and compare the files there with the originals on the network. If the network file had been modified or the local copy did not exist, the file would be copied once to the slave’s cache folder; if the local version was up to date, it would not be touched. Once the script was done, the scene would be loaded and would remap itself to the local cache path where all the textures would be waiting.

After a couple of days, all slaves working on a project would have a full complement of textures, and as long as these textures were not constantly modified by the paint artists, almost no syncing would be needed most of the time. Obviously this would require a lot of disk space on each slave, and doing the same with PRTs would be overkill. We did support VRay geometry caches at one point, though.
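Here is a minimal sketch in Python of what that sync step could look like (not the actual Prime Focus code; the cache folder name and manifest format are just placeholders):

    import os
    import shutil

    # Hypothetical local cache folder on the slave (name made up for this sketch).
    CACHE_DIR = r"D:\TextureCache"

    def sync_manifest(manifest_path):
        """Copy each network file listed in the manifest into the local cache
        if it is missing locally or older than the network original."""
        if not os.path.isdir(CACHE_DIR):
            os.makedirs(CACHE_DIR)
        with open(manifest_path) as f:
            network_paths = [line.strip() for line in f if line.strip()]
        for src in network_paths:
            dst = os.path.join(CACHE_DIR, os.path.basename(src))
            # Copy only when the local copy is missing or the network copy is newer.
            if not os.path.exists(dst) or os.path.getmtime(src) > os.path.getmtime(dst):
                shutil.copy2(src, dst)  # copy2 keeps the modify time for later comparisons

    # After the sync, the scene (saved with stripped paths) is loaded and remapped
    # to CACHE_DIR, e.g. via a path remapping script in 3ds Max.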
So if your PRT files barely change and would fit on the disk drives of all your slaves, such a scheme could work. But this feature is not part of the shipping Deadline and I am not sure if it ever will be.
Assuming you kept a per-slave cache (and not per-job), you could compare the modify dates on the network and local files and update the local ones only if the network ones changed. If you had multiple jobs using the same external assets, this would dramatically reduce the network load.
That’s the setup Fusion 6.1 uses with its “Local Cache”, and it helps both reduce network load and (optionally) provide resilience against outages of the network storage.
You would probably need per-slave limits on the size of the cache and rules for clearing out old files, as well as a control in the Monitor to clear the cache for a given job or slave.
Just something to think about. We’re struggling right now with some Krakatoa jobs where 85% of the render time is just loading the PRTs. If we’re just doing passes, there’s really no reason not to reuse the cached PRTs if they already exist.
At some point we were even discussing using a Torrent system to share data, so slaves that needed resources would pull them from other slaves that already had them instead of hitting a centralized location. But then the companies split and we never got to the point of trying this.
I was writing logs in the cache with usage data (how often a texture was actually requested and when), but I never got around to implementing the full cleanup logic based on that. If the hard drive reached a threshold, I had an emergency deletion procedure that would clean up by brute force and resync whatever was needed afterwards. I don’t think that ever kicked in.
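For reference, a cleanup pass driven by a size threshold could be as simple as the sketch below (in Python, with the cache folder and cap values made up; a real version would rank files by the logged usage data rather than by file access times):

    import os

    CACHE_DIR = r"D:\TextureCache"      # hypothetical cache folder
    MAX_CACHE_BYTES = 500 * 1024 ** 3   # hypothetical 500 GB cap

    def evict_until_under_cap():
        """Delete the least recently accessed cached files until the cache
        is back under the size cap."""
        paths = [os.path.join(CACHE_DIR, name) for name in os.listdir(CACHE_DIR)]
        total = sum(os.path.getsize(p) for p in paths)
        # Oldest access time goes first; the usage log would drive this in practice.
        for path in sorted(paths, key=os.path.getatime):
            if total <= MAX_CACHE_BYTES:
                break
            total -= os.path.getsize(path)
            os.remove(path)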
Since we have most of the logic in place for textures, we could look into adding support for PRTs. But PRTs are orders of magnitude larger, and I am not convinced copying terabytes of data to slaves is a great idea…
You could filter what to cache by extension or size, and that would help some.
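Something like a simple predicate applied while building the file list would do it; a sketch in Python, with the extension whitelist and size cutoff being arbitrary examples:

    import os

    CACHEABLE_EXTENSIONS = {".png", ".exr", ".tga", ".prt"}  # example whitelist
    MAX_FILE_BYTES = 2 * 1024 ** 3                           # example 2 GB per-file cutoff

    def should_cache(path):
        """Return True if a referenced file is worth caching locally."""
        ext = os.path.splitext(path)[1].lower()
        return ext in CACHEABLE_EXTENSIONS and os.path.getsize(path) <= MAX_FILE_BYTES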
It should only hit TB sizes if you cache a sequence, right? Individual frames shouldn’t need more than a few hundred gigabytes.
Now if Deadline could track the external files and dispatch tasks/jobs based on which slave already has more of the cached files, so that, say, frame 135 of each of 10 passes would be sent to the same slave… that could certainly reduce traffic.
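A rough sketch of what such affinity-based dispatch could look like in Python, assuming the scheduler could somehow query which files each slave already has cached (purely hypothetical, as far as I know nothing like this exists today):

    def pick_slave(task_files, candidate_slaves, cached_files_by_slave):
        """Prefer the slave that already holds the most of this task's
        external files in its local cache.

        task_files: set of file names the task references
        cached_files_by_slave: dict mapping slave name -> set of cached file names
        """
        def score(slave):
            return len(task_files & cached_files_by_slave.get(slave, set()))
        return max(candidate_slaves, key=score)

    # Example: every pass of frame 135 references the same PRTs, so the slave that
    # cached them for the first pass keeps winning the remaining passes:
    # pick_slave({"particles_0135.prt"}, ["Slave01", "Slave02"], cache_report)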