We’ve noticed that some of our jobs crash when dealing with larger scenes. From looking into it, the culprit seems to be the initial loading of the scene: in this particular program, a lot of data gets read up front, taking a huge amount of RAM, and after that initial load the RAM usage drops to entirely manageable levels. The problem arises when 4 concurrent tasks are running; all 4 try to load the file at the same time, slowing each other down and often crashing.
Our current workaround is simply to run fewer concurrent tasks, but is it possible to stagger the times at which concurrent tasks start? Even a simple timer that waits 1 minute between the first and second task, and so on, might help. I couldn’t find anything in the docs, so I’m wondering whether I missed an option, or whether anyone has done something like this to get around a similar issue in their pipeline?
I think your best option is docs.thinkboxsoftware.com/produc … throttling but I’m not sure whether it will be triggered, as this is a single slave with many tasks. Can you give it a try?
Thanks for the tip, I’ll try it out. Though it’s unclear whether this will have any effect when we’re not copying the job files?
To explain, in case I wasn’t clear: our slaves don’t copy job files. The issue with large scenes is that they’re slow and unwieldy to open in the program; it’s the size they take up in RAM that’s the problem. The size on disk doesn’t come into play at all.
I would apply a “Task” level “Limit” to your job(s) in question and let it release the limit once the task in question has progressed > 1%: docs.thinkboxsoftware.com/produc … #new-limit
The effect is that no matter how many ConcurrentTasks are running on a slave, with this “Task” level limit the slave is restricted to loading 1 scene file and rendering to at least 1.1% of that task before its task-level limit is released, which in turn allows another concurrent task to start. Yes, it will slow down the initial job startup per concurrent task, but it should deliver the stability and functionality you require to stop this ‘blocking’ or local bottleneck behaviour.
Ah, that’s a great solution. Unfortunately it won’t work in our case, as this is with the Anime Studio Pro plugin, which can’t return useful feedback for the progress bar, so tasks are always either at 0% or 100%. For any other plugin this would work really well, but unfortunately it doesn’t seem to apply for us.
Thanks for the ideas! We’ll just have to continue manually adjusting concurrent task levels for now.
Ok, understood. Here’s a naughty little trick for you: you can always set the progress yourself by customising the AnimeStudio Pro plugin. I’m not too familiar with the stdout this application dumps out, but is there any line printed that would mark the appropriate point ‘in time’ to force the progress value? If so, we just need a StdoutHandler function that looks for this line via a RegEx and, when it’s found, sets the progress, or it could even release the actual limit if you fancy.
# manually set progress to 5% at a certain point of the render process, to allow the "limit" applied to the job to be "released" at 5% or more.
self.SetProgress( 5.0 )
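For context, here’s a minimal sketch of how that handler might be wired up in the plugin script. This assumes the standard DeadlinePlugin stdout handler API; the “Rendering frame” pattern is just a placeholder, as I don’t know what Anime Studio actually prints.

# Sketch only: register a stdout handler that bumps progress past the limit's
# release threshold. The regex is a placeholder for whatever line the renderer
# actually prints at a useful point in time.
def InitializeProcess( self ):
    self.AddStdoutHandlerCallback( ".*Rendering frame.*" ).HandleCallback += self.HandleFrameStart

def HandleFrameStart( self ):
    # Force progress above the release threshold used by the task-level limit.
    self.SetProgress( 5.0 )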
When I tried adjusting the HandleProgress function, it didn’t seem to update the task progress at all, even when I changed the function body to literally just self.SetProgress(1). When I moved that very same line to the end of the argument gathering, it did set the progress, so it was definitely reading the correct script. This makes me think that HandleProgress isn’t being called at a useful point, presumably because all the stdout appears to be dumped at once, after the rendering has already finished.
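For reference, this is roughly what that diagnostic looked like; a sketch only, as the plugin-info key and argument string here are illustrative rather than the real Anime Studio Pro plugin code.

# Rough sketch of the diagnostic: placing SetProgress at the end of the
# argument-gathering function updates the progress, proving the script is read,
# while the same call inside HandleProgress never fires in time.
def RenderArgument( self ):
    sceneFile = self.GetPluginInfoEntryWithDefault( "SceneFile", "" )  # hypothetical key
    arguments = "\"%s\"" % sceneFile  # placeholder flags
    self.SetProgress( 1.0 )
    return arguments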
I’ll keep digging into the plugin script anyway; I might find some way in there to delay some of the tasks, even if it’s a bit hackier than these solutions. Something along these lines might do it, as a rough sketch: it assumes the plugin already wires up PreRenderTasksCallback (as most shipped plugins do) and that GetThreadNumber() returns the concurrent task slot on this slave.
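import time

# Hacky stagger sketch: delay each concurrent task by its thread index so the
# heavy scene loads don't all start at once. GetThreadNumber() and the 60-second
# step are assumptions for illustration.
def PreRenderTasks( self ):
    delaySeconds = self.GetThreadNumber() * 60  # thread 0 waits 0s, thread 1 waits 60s, ...
    if delaySeconds > 0:
        self.LogInfo( "Staggering startup: sleeping %d seconds" % delaySeconds )
        time.sleep( delaySeconds )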