AWS Thinkbox Discussion Forums

Some Pulse Questions

Hi.



I’ve just started as a studio engineer at a house that’s currently using Deadline for its render farm management. They’re seeing some performance issues with the system and are concerned about Deadline’s scalability as they grow from 30 to 50 users and from 240 to 320 render nodes.



They’ve recently upgraded their Deadline repository server, and although that’s helped, they still see some apparently delay-related job submission failures.



I’ve always worked with or developed job distribution systems that use a central manager and socket-based inter-process communication. I understand the reasons for Deadline’s architectural choices, but I’m very concerned about the impact the tradeoffs inherent in its decentralized, file-based design will have as this growing studio continues to use it.



On its surface, Pulse would seem to offer a number of performance improvements by alleviating much of the communication and coordination load placed on the repository server. However, in our tests so far, Pulse is proving to be a CPU and memory hog, straining the Pulse server far beyond what I’d expect for such a communication utility.



Is there a way I can find out in detail what it is that Pulse is doing? I’d like to know why, during currently light rendering, it is using 25–75% of the CPU capacity of a 3 GHz dual-core Pentium and 750 MB to 1.5 GB of RAM for its ‘working set’. What is in the ‘working set’? (This footprint is much larger than the contents of our repository outside of the ‘trash’ and ‘slaves’ directories.) Also, what happens when Pulse reaches the 2 GB memory limit for 32-bit Windows processes?



Also, out of curiosity, why are there almost 200,000 files in the ‘slaves’ directory of our repository? It almost looks like the render slaves are using files in there to communicate with one another and failing to clean up after themselves.
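For reference, here is the kind of quick sketch I used to tally files per repository subdirectory and spot the buildup (the repository path shown is a placeholder for your own share):

```python
import os

def count_files(root):
    """Recursively count files under each immediate subdirectory of root."""
    counts = {}
    for entry in sorted(os.listdir(root)):
        subdir = os.path.join(root, entry)
        if os.path.isdir(subdir):
            # os.walk visits every nested directory, so this counts
            # all files underneath, not just the top level.
            counts[entry] = sum(len(files) for _, _, files in os.walk(subdir))
    return counts

# Example (placeholder path): point this at the repository share to see
# which directories ('slaves', 'jobs', 'trash', ...) are accumulating files.
# print(count_files(r"\\server\DeadlineRepository"))
```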



There are a number of features I do like about Deadline, and the staff here is comfortable using it. Hopefully we can find a way to help its performance scale with our needs, even if that requires dropping the repository onto a disk-backed RAM disk.



Thanks,



Sean Laverty

Studio Technical Engineer

Hi Sean



Thanks for voicing your concern. We too have run into some issues with Pulse, and have thus returned to the core of Deadline to look for ways to improve its performance. Over the course of Deadline’s life, new features and functionality have been added that gradually affected its overall performance, and addressing this was our main focus during the last couple of months of development.



We refactored how some job data was being stored and organized, and on initial tests we have seen…

i) a 50% drop in the amount of File IO being performed when the Slave updates its slaveInfo file

ii) a 70% drop in the amount of File IO being performed when the Slave scans for a job to render



We’ve also been able to reduce some of the lag problems in the Monitor. We’re hoping that these changes will allow us to move away from using Pulse, though if an application like Pulse is required down the road, the reduction in File IO should definitely help its performance.



.NET applications acquire memory as they need it, and only return unused memory when it is needed elsewhere. .NET apps are known for appearing to use more memory than they actually are (especially when using the Task Manager to check). Our experience with Pulse was that it had trouble keeping up with 200+ slaves during busy periods, but we never ran into the memory issues you describe. Of course, not having to rely on Pulse in the future would alleviate this problem.



Those slave files piling up were actually the result of a bug in the Deadline code. We have addressed it in our current working version, so you won’t see this problem going forward.



We plan to go into heavy testing with this new version in the next week or so, with a public release hopefully coming in the next month. Again, our main focus was performance and usability, which should address the scalability problems you are running into.



Cheers,



Ryan Russell

Frantic Films Software

http://software.franticfilms.com/

(204)949-0070

Hi.



Thanks for the quick response.



Cool. It’s good to hear that the file behavior in the ‘slaves’ directory was an aberration rather than expected behavior. The file access tuning also sounds promising.



I’m curious as to what Frantic suggests for a repository server in a high-demand environment. Could you post the detailed specs of the server that you guys are currently using? Also, do you know whether any of Deadline’s users have used a RAMSan or similar device to alleviate the disk I/O latencies on the repository server?



Thanks Again,



Sean Laverty

Studio Technical Engineer

Hi Sean,



Our repository machine is an AMD Athlon MP 2800+ (dual processor) with 2 GB of RAM, and we use a gigabit network connection. The machine is running Windows 2000 Server SP3.



We currently have 202 slaves running with 522 jobs (67 active) in the queue, and the server machine is sitting at roughly 50% CPU usage (all by the System process). In the repository options, the BetweenPollingDelay is 120 seconds, and the ScanJobRepositoryPercentage is 10%. We are not currently running Pulse.
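As a rough back-of-the-envelope sketch (assuming, hypothetically, that ScanJobRepositoryPercentage means each slave has that chance of performing a full repository scan on a given polling cycle), those two settings together throttle the scan load like this:

```python
def full_scans_per_minute(num_slaves, polling_delay_s, scan_fraction):
    """Expected number of full repository scans per minute, assuming each
    slave polls every polling_delay_s seconds and performs a full scan
    with probability scan_fraction on each poll (hypothetical model)."""
    cycles_per_minute = 60.0 / polling_delay_s
    return num_slaves * cycles_per_minute * scan_fraction

# With the settings above: 202 slaves, 120 s delay, 10% scan chance.
rate = full_scans_per_minute(202, 120, 0.10)
print(f"~{rate:.1f} full repository scans per minute")
```

Under those assumptions, raising BetweenPollingDelay or lowering ScanJobRepositoryPercentage both reduce the scan rate linearly.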



I’m not aware of any Deadline users using a RAMsan device.

[If anyone reading this is using or has used such a device, please post your experiences with it.]



I hope this information is helpful.



Cheers,



Ryan Russell

Frantic Films Software

http://software.franticfilms.com/

(204)949-0070

Ryan, this is excellent news about the new version’s performance.



As for the server, we are running dual 1 GHz CPUs with 2 GB of RAM.

With this machine we get anywhere from good to very slow performance depending on the load. We have around 120 nodes connected to the server.



We are considering upgrading to a machine with more CPU power and, most importantly, RAID drives for a speed improvement and easier crash recovery.



Sylvain Berger | Technical Director | Alpha Vision

