We are considering running two or more Slaves in parallel on some machines with high RAM / core counts. Since each of our Max jobs has its own specific configuration, this presents a bit of a problem for us: conflicting configurations might need to be running at the same time in two separate Max instances…
I think we could probably sort out some of the critical conflicts at the plugin.ini level. Sadly, some files conflict in the Max root: different versions of FumeFX and the like put things in there (common DLLs that conflict between versions).
Would you suggest isolating separate max folders, like:
c:\3dsmax_0
c:\3dsmax_1
? Or should we somehow try to sort it out at the plugin.ini level? (That might not be possible.)
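If you do go the separate-folder route, one sketch of the idea is to give each Slave its own Max root, each with its own plugin.ini pointing at a version-specific plugin set. The paths and the key names below are illustrative only; the exact plugin.ini section and key names differ between Max versions, so check against your install:

```ini
; c:\3dsmax_0\plugin.ini  (illustrative; key names depend on your Max version)
[Directories]
Standard MAX plug-ins=c:\3dsmax_0\plugins
Additional MAX plug-ins=c:\plugins_shared\fume_v1

; c:\3dsmax_1\plugin.ini
[Directories]
Standard MAX plug-ins=c:\3dsmax_1\plugins
Additional MAX plug-ins=c:\plugins_shared\fume_v2
```

Note that plugin.ini only isolates plugin directories; the DLLs that installers drop directly into the Max root are isolated here only because each install has its own root folder.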
Obviously, the simplest form of parallel processing would be running Concurrent Tasks - since the Tasks belong to the same job, there would be no problem with conflicting configurations. Plus it would require no additional license.
Running multiple Slaves was mainly intended for running completely different types of jobs on the same box - for example a Max job and a Nuke job. Assuming that the Max job would be CPU-intensive and the Nuke comp would be mainly I/O bound, you would use both at their maximum, and once again you would have no configuration problems because of the dissimilar apps running.
What would you gain from running two different Max jobs on the same machine if both are rendering? I suspect giving all cores to V-Ray in a single job would be better than splitting the cores between two rendering jobs. Similarly, if one job is simulating FumeFX while the other is rendering with V-Ray, since both are quite CPU-intensive, I am not sure you would benefit. But I could be wrong.
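A quick toy model of why core-splitting rarely pays off: under perfect linear scaling, two renders run side by side on half the cores each finish no sooner than running them back to back on all cores, and real scheduling and memory overhead only makes the split worse. The core count and frame time below are purely illustrative assumptions:

```python
# Toy model: two identical renders on an N-core node.
# Assumes perfect linear scaling with core count (a generous assumption for V-Ray).

N = 16          # cores on the node (hypothetical)
t_full = 60.0   # minutes per render when using all N cores (hypothetical)

# Back to back: job A on all cores, then job B on all cores.
sequential_wall = 2 * t_full

# Side by side: each job gets N/2 cores, so each takes ~2x as long.
parallel_wall = t_full * (N / (N / 2))

print(sequential_wall, parallel_wall)  # identical under perfect scaling
```

So any benefit from co-locating two jobs has to come from mismatched resource profiles (one job leaving cores idle that the other can use), not from splitting a resource that both jobs already saturate.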
Can you give an example of the nature of the Max jobs that would be sharing the node?
One example would be a TP sim, which typically uses a single core, running alongside a parallel V-Ray render. We find that Max utilizes the machines only around 20-40% on average; V-Ray renders push the average into the 40% range, otherwise the number would be quite disappointing.
The goal would be to get the percentage as high as possible.
We are currently testing parallel Nuke / 3ds Max renders, and will see if that provides a benefit. There is the obvious confusion it causes for artists (“on some machines, sometimes, my renders take 3x longer!”), but it seems to provide some flexibility: it gives us a chunk of Slaves that can always pick up Nuke jobs, even if at slower speeds and while also affecting the 3D job on the same machine. At this point I can't conclusively say whether it is actually giving us more throughput or the opposite…
It is issues like this that are motivating rapid adoption of private clouds in technology companies. Currently this adoption is largely outside VFX, but we know of at least a few VFX companies that are in the process of converting a small portion of their classic farm into a private cloud for evaluation.
In a cloud scenario, each application can be factored into its own VM image so that there is no conflict with other versions of the app or with other software in general. In some cases this can have the side benefit of obviating chunks of pipeline code responsible for adapting the environment to specific app versions. VMs also open the possibility of using apps on different operating systems so that it’s no longer necessary to manually partition the render farm by OS. Adding new machines to a private cloud is quite efficient since only the base OS and hypervisor need be installed and this can be totally automated. And the need to track software versions across hundreds of machines largely disappears since software management becomes more about configuring individual VM images (and it’s easy to roll back to an older image if it’s wrong).
In the case of Windows / Max, a VM-based approach would mean that each VM is loading Windows, so there is some added overhead there. This is less of an issue for Linux because it can often be made very light. But while Windows is heavier, it can be pared down by removing services and components that are not relevant to the computing application. The decision as to whether running two (or more) VMs on a compute node has more total “utility” than wrangling two versions of an app (like Max) into peaceful coexistence on a classic render node surely depends on factors unique to each studio. These decisions are best reached through explicit testing, in my opinion.
Yeah, we started internal discussions after your email, and might start looking into this as well.
It all depends on the lost-performance percentage. As long as a VM render is only 5-10% slower than a regular one, I think it would be justifiable. If it reaches the 20% mark, it is probably going to fall off the table, as that amounts to hundreds of thousands of dollars in lost performance (albeit, who knows how much money the easier management would save…). We have to do some testing.
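One back-of-envelope way to frame that threshold is to price the overhead as the fraction of farm capacity it eats. The farm size and per-node cost below are placeholder assumptions, not measurements from any real farm:

```python
# Rough annual cost of VM render overhead, priced as lost farm capacity.
# Linear approximation: losing X% throughput ~ losing X% of the farm's value.

def overhead_cost(nodes: int, cost_per_node_year: float, overhead: float) -> float:
    """Dollar value of capacity lost to a given fractional slowdown."""
    return nodes * cost_per_node_year * overhead

# Hypothetical 200-node farm at $3,000/node/year amortized hardware + power:
for pct in (0.05, 0.10, 0.20):
    print(f"{pct:.0%} overhead -> ${overhead_cost(200, 3000, pct):,.0f}/year")
```

Whatever the real numbers are, the same two inputs (farm value and measured slowdown) give the figure to weigh against the management savings.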