How to minimize idle time in our render farm

CHCSI · October 15, 2018, 10:59am

Hi everyone!

Quite new here so I hope I’m in the right place. If not, please let me know where I should post this instead.

I’m curious whether anyone has any input on an issue we’re having. Here’s the situation:

We currently have an on-site render farm, which is performing well, all things considered. However, we have received complaints that we do not have enough processing power to handle peak periods, and yet the statistics show that idle time has been around 50% or more over the course of the year.

I suspect this is because our users and projects don’t make good use of the time available. With around 50% idle time, we could potentially run twice as many jobs in the same time period if we used that time effectively.
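To make the "twice as many jobs" estimate explicit, here is a minimal Python sketch of the arithmetic. The function name is hypothetical, and it assumes an idealized case where every idle hour could be filled with work, so the result is an upper bound rather than a forecast.

```python
def potential_throughput_multiplier(idle_fraction: float) -> float:
    """Upper bound on how much more work fits if all idle time were used.

    idle_fraction is the share of farm time spent idle (0.5 means 50%).
    Assumes jobs can be moved freely into the idle hours.
    """
    if not 0.0 <= idle_fraction < 1.0:
        raise ValueError("idle_fraction must be in the range [0, 1)")
    busy_fraction = 1.0 - idle_fraction
    return 1.0 / busy_fraction


# With 50% idle time the farm is busy half the time, so the ceiling is 2x.
print(potential_throughput_multiplier(0.5))
```

In practice the gain would likely be smaller, since not every job can wait for an evening or weekend slot.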

So, my question is: Has anyone had a similar issue, and if so, how did you resolve it? What’s the best course of action to take here? Do we need to talk with project leaders and users, and provide them with specific time-slots they can use for rendering, to better spread out the jobs we have, or is there any other solution that may work better?

Thanks in advance!

/Chris

eamsler · October 15, 2018, 4:33pm

Hey Chris! This is the perfect spot.

I guess I need background here. Is the 50% idleness overnight or during the day?

If it’s during the day while renders are happening, it may be time to use multiple concurrent tasks or Slaves to better saturate the render nodes. That depends on the applications you’re using, so please outline the tools in your pipeline if that’s the case.

If the problem is idleness overnight, you may want to set up workflows where artists submit preview jobs with frame skipping, then submit jobs with the full frame ranges to run overnight.
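As a rough illustration of that split, here is a small Python sketch that divides a frame range into a sparse preview pass and the remaining frames for an overnight pass. The function names and the step of 10 are assumptions for the example, and it is not tied to any particular submitter or frame-list syntax.

```python
def preview_frames(start: int, end: int, step: int) -> list[int]:
    """Every `step`-th frame of the inclusive range, for a quick daytime preview."""
    if step < 1:
        raise ValueError("step must be at least 1")
    return list(range(start, end + 1, step))


def remaining_frames(start: int, end: int, step: int) -> list[int]:
    """Frames not covered by the preview pass, for the overnight job."""
    already_rendered = set(preview_frames(start, end, step))
    return [f for f in range(start, end + 1) if f not in already_rendered]


# Hypothetical shot: frames 1-100, previewing every 10th frame.
day_pass = preview_frames(1, 100, 10)
night_pass = remaining_frames(1, 100, 10)
print(len(day_pass), len(night_pass))
```

Whether the overnight job re-renders the preview frames or skips them is a workflow choice; skipping them, as here, only makes sense if the preview was rendered at final quality.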

That said, what we’re finding is that the cloud is a great way to handle the burstiness of render farms. You could start machines during the day to get frames back as quickly as possible, then turn them off at night. I’ve asked one of our solutions architects whether we have a public cost breakdown showing where the break-even point is.

CHCSI · October 16, 2018, 6:23am

Hey Eamsler, thanks for the reply! I’ll try to provide as much information as possible here.

I’ve checked the farm status reports in Deadline Monitor, and our problem areas seem to be weekends and evenings; that’s where the majority of idle time occurs. During weekdays we pretty much always have jobs queued on the farm and it’s relatively smooth sailing, but once the weekend rolls around or the workday is over, the farm pretty much stays idle.

We have been experimenting with cloud-based solutions to help lessen the load during peak periods, but so far I think the problem is that we haven’t been able to calculate when it’s cost-effective to use them.

A cost breakdown would be really helpful if there is any info available.

Thanks again!

eamsler · October 16, 2018, 3:14pm

Yeah, the cost breakdown is interesting because it varies by workload. In talking with the SA yesterday, they mentioned that it’s really hard to compare farm costs. Factoring in floor space, cooling, server power, and maintaining physical infrastructure locally is difficult.

I’m not aware of any cost breakdown docs, and usually it comes down to a proof-of-concept and throwing some test renders at the system. Different workloads (compositing, GPU and CPU rendering, etc.) have different instance requirements.
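For a very rough first pass before a proof-of-concept, the break-even idea can be sketched in Python as below. Everything here is an assumption for illustration: the function names, the cost figures, and especially the simplification that one cloud instance does the same work per hour as one local node. It also ignores data transfer, storage, and licensing costs.

```python
HOURS_PER_YEAR = 8760  # 365 days * 24 hours


def on_prem_cost_per_busy_hour(annual_cost_per_node: float, utilization: float) -> float:
    """Effective cost of one hour of actual rendering on a local node.

    annual_cost_per_node should bundle hardware, power, cooling, floor space
    and maintenance; utilization is the busy share of the year (0.5 = 50%).
    """
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in the range (0, 1]")
    return annual_cost_per_node / (HOURS_PER_YEAR * utilization)


def break_even_utilization(annual_cost_per_node: float, cloud_rate_per_hour: float) -> float:
    """Utilization at which a local node costs the same per busy hour as the cloud."""
    if cloud_rate_per_hour <= 0:
        raise ValueError("cloud_rate_per_hour must be positive")
    return annual_cost_per_node / (HOURS_PER_YEAR * cloud_rate_per_hour)


# Hypothetical numbers only: 4380 per node per year, 1.00 per cloud instance-hour.
print(break_even_utilization(4380.0, 1.0))
print(on_prem_cost_per_busy_hour(4380.0, 0.5))
```

Under these assumptions, a local node that is busy less often than the break-even utilization costs more per rendered hour than the cloud instance, and one that is busy more often costs less. Test renders are still the only reliable way to pin down the real per-workload numbers.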