Deadline - Darwin/Spencer - Survival of the Fittest

MikeOwen · October 23, 2011, 11:55am

Hi,
(I guess this is the correct place to be posting for the advisory board these days?)
en.wikipedia.org/wiki/Survival_of_the_fittest
Dr.D Studios (Happy Feet 2) are using the concept of assigning a “Darwin” value to all render jobs: first render every, say, 10th frame, see how long those frames take and how much memory they need, etc., and then use the calculated “Darwin” value to decide where in the queue the job will be placed. The best optimised/efficient render jobs get processed first. The major flaw in the system is when you have a certain job that just needs to be pushed through, i.e. there is no way to forcibly override this architecture. Actually, yes, there is.

How about 4 Deadline variables in the future? (darwin, pool, priority, date) or any other combo variable? All jobs in the future default to a Darwin value of “0”, which just means “Darwin” is ignored. However, larger studios that want to implement this architecture can have the Darwin value calculated. The event plugin architecture can help us here by executing the pre-calculating “Nth frame” analysis jobs so that the “Darwin” value can be assigned.
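A rough sketch of how the sampling pass and a “Darwin” sort key might fit together. None of this is Deadline API: the `Job` class, the score formula (time multiplied by memory) and the tie-breaking rule are assumptions made up for illustration.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Job:
    name: str
    priority: int        # existing priority value, higher runs first
    darwin: float = 0.0  # 0 means "Darwin is ignored" for this job


def sample_frames(first: int, last: int, step: int = 10) -> list[int]:
    """Frames to pre-render in the analysis pass (every `step`-th frame)."""
    return list(range(first, last + 1, step))


def darwin_value(seconds_per_frame: list[float], peak_mem_gb: list[float]) -> float:
    """Hypothetical score: cheaper frames (time x memory) score higher."""
    cost = mean(seconds_per_frame) * mean(peak_mem_gb)
    return 1000.0 / cost


def queue_order(jobs: list[Job]) -> list[Job]:
    # Priority is compared first, so a job can still be forced through by hand.
    # Within equal priority, a higher Darwin value goes first; jobs left at 0
    # keep their submission order behind any scored jobs (the sort is stable).
    return sorted(jobs, key=lambda j: (-j.priority, -j.darwin))
```

Keeping priority ahead of Darwin in the sort key is one way to keep the manual override: bump the priority and the Darwin value no longer matters for that job.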

Conceptually, I guess this idea comes back to a db-backed, AI-style Deadline system, i.e. dynamically chunking/de-chunking tasks depending on whether the previous frames were slow or quick to render.

Anyway, I was inspired by the idea, so thought I would just post it
Mike

Bobo · October 24, 2011, 3:11am

This is an interesting idea, but…

Survival of the fittest in the case of Deadline would mean that if an artist optimized a job to run more efficiently on Deadline, it could be preferred over slower jobs.
This might sound cool, but it is totally unfair in real world production. In many cases, artists submitting jobs have no choice but to submit whatever data they have to, knowing perfectly well the job will be slow, but important and impossible to optimize.
What really matters in production IMHO is the ability to predict WHEN a job will be done.

I think it would be much more interesting to look into a way to schedule Deadline by providing the (wait for it…) actual deadline of each job.
So a producer can specify a “date and time by which the job MUST be done” and let Deadline assign the right priorities and all other logic to the jobs to meet these requirements. If the requirements cannot be met, the system should be able to predict that to a certain extent days in advance to avoid surprises and client disappointments.

We have tried to do this in production by analyzing previous versions of the same jobs, including safety factors (around 1.5x the predicted time). Production schedules usually provide enough guidelines about which shot and which elements of it would be needed by which day so compositors could put the shot together and send it to the client etc. For a company working on a single show for 2-3 years like the case with CG movies a la Happy Feet 2, the requirements might be different. Even for your kind of production, the SOTF method might be a good idea. From my experience, having it in VFX production with several shows going on would be equivalent to an artist going into the queue and setting his job to 100 Priority and stealing all machines. Just because a job renders faster does not mean it should be prioritized against a job that takes 10 hours a frame but is needed by noon tomorrow…
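To make the “actual deadline” idea concrete, here is a minimal sketch of the feasibility check. It assumes a per-frame time taken from a previous version of the job, the roughly 1.5x safety factor mentioned above, and frames that cannot be split across machines. The function name and signature are invented for illustration.

```python
import math
from datetime import datetime

SAFETY_FACTOR = 1.5  # pad the predicted time by about 1.5x


def machines_needed(now: datetime, due: datetime,
                    frames_left: int, seconds_per_frame: float):
    """Smallest machine count that should meet `due`, or None if no count can."""
    available = (due - now).total_seconds()
    padded = seconds_per_frame * SAFETY_FACTOR
    # How many whole frames one machine can finish before the due time.
    frames_per_machine = int(available // padded)
    if frames_per_machine <= 0:
        # A single padded frame does not fit: adding machines will not help.
        return None
    return math.ceil(frames_left / frames_per_machine)
```

A `None` result is the early warning: the job cannot make its date at any priority, so production hears about it days ahead instead of on the day.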

Discuss!

cbond · October 24, 2011, 5:35pm

first off - i think this is the actual new space. i was inclined to keep it separate, but it didn’t make sense as you or chad or ben suggested…

second - i had an idea similar to this but vastly different. in doing some tests with a specific kind of render, i realised that a combination of concurrency, frames at once etc has a direct correlation with rendering performance - but not always the way you expect. i spoke to ryan about coming up with a MPG or L/km analogy for rendering so that people could better optimise their renders on machines…i’ll expand on this with the same analogy

in a car - you have a number of metrics - RPM, KM/H and MPG - a combination thereof can maximise your efficiency. for example - you can drive some cars at 60mph at 5000RPM or at 2000 RPM - and that has no effect on how fast you get there, but it does affect your MPG or efficiency. similarly - there are ways to submit jobs that are very fast to the farm, but at the expense of other jobs, when different submission settings may result in a similar speed to completion of the render without being resource hogs.

i did some tests of non-cpu bound renders and discovered that the completion time of the render could change by 10-20% but the actual resources used were 100-1000% more. that means that the ‘quickest’ render would actually ensure that nothing else could get through the queue, BUT by doing that and essentially killing any other job you would only get the job through a few minutes faster. doesn’t seem efficient to me…so i think i would love to come up with an Efficiency rating - kind of like MPG - to give feedback on how hard you are working your cores/network IO with different settings to try and optimise the final workflow.
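One possible shape for such a rating, purely as a sketch: frames per core-hour standing in for MPG. The `RenderRun` fields and the idea of measuring “resources” in core-hours are assumptions here, and network IO is left out to keep it short.

```python
from dataclasses import dataclass


@dataclass
class RenderRun:
    label: str
    frames: int
    wall_hours: float  # first task start to last task finish
    core_hours: float  # cores reserved x hours held, summed over all tasks


def frames_per_core_hour(run: RenderRun) -> float:
    """Hypothetical 'MPG' figure: output per unit of farm resource."""
    return run.frames / run.core_hours


def compare(fast: RenderRun, frugal: RenderRun) -> str:
    """Describe what the faster submission settings cost in resources."""
    sooner = (frugal.wall_hours - fast.wall_hours) / frugal.wall_hours * 100
    extra = (fast.core_hours - frugal.core_hours) / frugal.core_hours * 100
    return (f"{fast.label} finishes {sooner:.0f}% sooner than {frugal.label} "
            f"but uses {extra:.0f}% more core-hours")
```

Feeding it two runs of the same job with different concurrency settings would put the trade-off in one line of feedback at submission time, which is the point of the MPG analogy.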

comments?

cb

Chad · October 24, 2011, 11:15pm

Aye, we have Fusion renders that are basically GPU bound. The GPU does an OpenGL render for around 20 seconds then Fusion compresses the image on CPU and saves it out on the network. So you can see cycles of GPU/CPU/Network over and over again. So if you can slip in another concurrent job that doesn’t use the GPU, but uses the CPU at a high rate, then you might see the Fusion job slow down by 20%, but you can at least get the other job to render “for free”.

But how do you determine if that’s acceptable, and how do you find the compatible pairings of jobs? Both seem fairly complex to me right now.
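As a starting point only, pairing could be framed as “no shared resource is oversubscribed”. The sketch below assumes each job comes with a measured average load per resource (fraction of the machine’s GPU, CPU, network); getting those numbers is the hard part and is not shown. The names are hypothetical.

```python
from itertools import combinations

# Hypothetical profile: resource name -> average fraction of the machine used.
Profile = dict[str, float]


def combined_load(a: Profile, b: Profile) -> Profile:
    """Sum the per-resource load of two jobs sharing one machine."""
    return {res: a.get(res, 0.0) + b.get(res, 0.0) for res in set(a) | set(b)}


def compatible(a: Profile, b: Profile, limit: float = 1.0) -> bool:
    """Two jobs may share a machine if no resource exceeds `limit`."""
    return all(load <= limit for load in combined_load(a, b).values())


def compatible_pairs(jobs: dict[str, Profile], limit: float = 1.0):
    """All job-name pairs that pass the `compatible` check."""
    return [(x, y) for x, y in combinations(sorted(jobs), 2)
            if compatible(jobs[x], jobs[y], limit)]
```

Raising `limit` above 1.0 is a crude way to express “some slowdown is acceptable”; averages also hide the GPU/CPU/network cycling described above, so this would only ever be a first filter.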

Chad

MikeOwen · October 24, 2011, 11:42pm

Sounds to me like the idea behind the “Windows Experience Index” grading system, giving each sub-system of a machine a grade.
en.wikipedia.org/wiki/Windows_Sy … sment_Tool
However, it’s Windows only. I don’t trust it and “Deadline Experience Index - DEI” just sounds pants

Going back to my original post: I totally agree with Bobo, it would get in the way of production, especially as project deadlines get closer, hence the Darwin default value of “0”, which would cause the Darwin value to be ignored for those jobs. Of course, this system could be set up now, by just running pre-calculating 10th frame jobs for certain jobs with a specific job plugin setting and then automatically driving the pre-existing priority value up or down. So, I have actually just come full-circle to why Darwin isn’t the best setup for the majority.

I guess what I am really pushing towards is a more AI based approach, where the farm does more ‘thinking’ for me (auto chunking/de-chunking as the frame sequence render times go heavy). In previous conversations with Ryan, I believe this meant a db backend. We feed the farm so much information, it’s a shame it can’t learn more from it and learn to adapt!

However, I do agree with Chris’s concept of grading a machine. The push towards virtualisation of render-farms is only going to increase in the future and we will need a way to identify the most efficient Deadline job processing workflow on these machines. A part of me says, leave this all alone for the individual studios to handle as they will all want different things.

Edit: I completely forgot to mention Bobo’s major point. Being able to tell Deadline to do whatever it takes to get these jobs completed by a certain time would be amazing! Again, to achieve this, Deadline would need the ability to alter task chunks to improve efficiency as the jobs progress.
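For what it’s worth, the auto chunking/de-chunking part could start as simply as this. It is a sketch, not anything Deadline does today: the 10-minute target task length, the chunk limits and both function names are assumptions.

```python
def next_chunk_size(recent_frame_seconds, target_task_seconds=600.0,
                    min_chunk=1, max_chunk=50):
    """Pick frames-per-task so a task lasts roughly `target_task_seconds`."""
    if not recent_frame_seconds:
        return min_chunk
    avg = sum(recent_frame_seconds) / len(recent_frame_seconds)
    chunk = int(target_task_seconds // avg) if avg > 0 else max_chunk
    # Heavy frames shrink the chunk towards 1, light frames grow it.
    return max(min_chunk, min(max_chunk, chunk))


def rechunk(pending_frames, chunk_size):
    """Split the not-yet-started frames into tasks of `chunk_size` frames."""
    return [pending_frames[i:i + chunk_size]
            for i in range(0, len(pending_frames), chunk_size)]
```

Only frames that have not started get regrouped, so running tasks are left alone while the rest of the sequence adapts as render times go heavy or light.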

cbond · October 25, 2011, 12:31am

to clarify - i wasn’t referring to grading a machine, but a job or even a task. you can compare task time and cpu usage and size of the file vs other jobs and start to get a calculation that could start to define how a job gets parsed out.

it’s fun to think about…but i think it means all of us getting in a room and drinking many tiki drinks to get exactly the right balance.

cb

im_thatoneguy · November 1, 2011, 1:58am

I like the Deadline date as priority idea. In fact it was one of the motivations for me initially pestering Shotgun Software to talk to you.

There are problems with Deadline Boosting as well, though: often you have a deliverable that’s due tomorrow but the shot isn’t due until the end of the week. So unfortunately I think to some degree you would need to have the artists manually set their ‘due date’ before letting the farm apply some sort of automated priority management.

Our current system is to just use

30 for “Sometime soon would be nice.”
50 for “When it’s done”
80 for “GOGOGO!”
and 90 for “NOW!! CLIENT AT DESK!”

If you use 100 I personally show up and hit you with a stick. But we’re a small studio so most priority conflicts can be resolved by physical violence in the parking lot. What would be nice, just to save a little time, is to do a simple logic check ((EOD - NOWTIME) > Estimated Completion Time) and de-prioritize jobs which can’t finish by EOD. Whose job finishes overnight really isn’t terribly important, but if someone can get their frames and start working now then that should go first.
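Spelled out, the check is only a few lines. This is a sketch with made-up names; the job tuples and the “finishable jobs first, then by priority” ordering are assumptions about how the result would be used.

```python
from datetime import datetime, timedelta


def can_finish_by_eod(now: datetime, eod: datetime, estimated: timedelta) -> bool:
    """The check from the post: (EOD - NOWTIME) > Estimated Completion Time."""
    return (eod - now) > estimated


def order_for_today(jobs, now: datetime, eod: datetime):
    """jobs: list of (name, priority, estimated_remaining) tuples.

    Jobs that can still finish today come first; within each group the
    usual priority applies (higher first).
    """
    return sorted(
        jobs,
        key=lambda j: (not can_finish_by_eod(now, eod, j[2]), -j[1]),
    )
```

With this ordering an 80-priority job that needs ten more hours drops behind a 30-priority job that can land before people go home, which is the behaviour described above.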

Even that though gets into questions of context. For example if my normals pass finishes EOD I can’t really do much with just that so knowing whether it’s an element for the same shot would be handy and possibly something that Shotgun integration could add. If Facebook adds social context to the web then Shotgun or something similar can add Show context. “What is this render element attached to? When is that thing due?” It could start getting really interesting when you work in something like Shotgun’s Tank which also has dependency graphing. So you could add context to a job “This render Element is being used in HOL_ORN_0020_A01.01.nk. That nuke file is part of Shot 0010 which has a milestone tomorrow at 1pm. Based on every 20th frame this job is estimated to finish tomorrow at 2pm. [CONFLICT] -> [EMAIL JOB SUBMITTER]”
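A toy version of that lookup chain, to show how little logic is involved once the context exists. The three dictionaries stand in for whatever Shotgun/Tank would expose; they, the element name and the dates are invented for the example and are not real API or data.

```python
from datetime import datetime

# Hypothetical dependency data a project management system might hold.
USED_BY = {"reflection_pass": "HOL_ORN_0020_A01.01.nk"}  # element -> script
PART_OF = {"HOL_ORN_0020_A01.01.nk": "Shot 0010"}        # script -> shot
MILESTONES = {"Shot 0010": datetime(2011, 11, 2, 13, 0)}  # shot -> due time


def milestone_for(element: str):
    """Walk element -> script -> shot -> milestone; None if any link is missing."""
    script = USED_BY.get(element)
    shot = PART_OF.get(script)
    return MILESTONES.get(shot)


def check_conflict(element: str, estimated_finish: datetime):
    """Return a warning string if the render is estimated to miss its milestone."""
    due = milestone_for(element)
    if due is not None and estimated_finish > due:
        return (f"[CONFLICT] {element}: estimated {estimated_finish:%Y-%m-%d %H:%M}, "
                f"due {due:%Y-%m-%d %H:%M}")
    return None
```

The returned string is what would go into the email to the job submitter; the estimate itself would come from the sampled frames.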

I don’t think a lot of that context though belongs in Deadline. To me that logic belongs in the project management system which can direct shotgun to achieve those goals. So for example if Deadline reports the estimated render time then it can adjust a PMS’s render task and detect dependency conflicts. So I think ultimately this kind of balancing will be handled externally through event triggers on the PMS side. Deadline just can’t have enough context to make those intricate decisions.

If any steps were taken in that regard I would, as I mentioned before, just add a little “Due Date” spinner to the Submission window. That’s how we usually manage it. “Hey Mike, when do you need your new Reflection pass by?” If it was right there in the UI I could avoid turning my head.

im_thatoneguy · November 20, 2011, 2:54am

Ok, I got bored sitting and waiting for renders on a Saturday so … I invented a new job monitor:

viewtopic.php?f=97&t=6751&p=27086