AWS Thinkbox Discussion Forums

Solid State Repository Partition Test

Hi.



We’ve been looking into ways to speed up our Deadline repository performance. Our farm has ~240 render nodes, and we often have dozens of active jobs with thousands of tasks. Our old repository server was bogging down under file I/O latency on the repository file system’s RAID 0 array.



One thing we’ve been trying out is placing the repository on a fairly low-end solid state disk, a 4 GB Gigabyte iRAM. Our repository data is usually 1-2 GB, so it fits nicely in the available space. After a month of testing, we’ve found that the iRAM’s high bandwidth and negligible seek time have improved our repository performance and Monitor interactivity dramatically. It remains quite usable under our heavier loads.



The downside of the iRAM is its volatility. If the server is ever powered down completely for very long, the file information will be lost. We’re maintaining snapshots of the repository on a traditional drive but are looking for a more reliable, permanent solution. The iRAM was only intended as a technology evaluation device and was chosen for its low price and availability. There are better solid state disks available at higher price points, and we’ll be swapping in one of those soon.



Even with the repository living on a solid state disk, I’m skeptical about a single Deadline server’s ability to effectively support more than 300 render nodes. The completely decentralized architecture and its dependency on files for communication create significant scalability issues. As we expand our farm further, it appears that we’re going to need to divide the farm into multiple repository groups.
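To illustrate why file-based communication strains a single repository as a farm grows, here is a toy model. It is not a description of how Deadline actually works; it simply assumes, hypothetically, that every render node reads one file per active job on each poll. Under that assumption the read rate is nodes × active jobs ÷ poll interval, so it grows with the node count and again with the job count, and larger farms tend to carry more active jobs at once. The function name, the 36-job figure (standing in for "dozens"), and the 10-second poll interval are all illustrative assumptions.

```python
def repository_file_ops_per_second(nodes: int, active_jobs: int, poll_interval_s: float) -> float:
    """Toy estimate of repository read load.

    Hypothetical assumption: each node reads one file per active job
    every poll interval. This is NOT Deadline's documented behavior.
    """
    if poll_interval_s <= 0:
        raise ValueError("poll_interval_s must be positive")
    return nodes * active_jobs / poll_interval_s


# Illustrative figures only: 36 active jobs, 10-second poll interval.
for node_count in (240, 300):
    rate = repository_file_ops_per_second(node_count, 36, 10.0)
    print(f"{node_count} nodes: {rate:.0f} file reads/s")
```

Under these assumptions, every extra node or job adds reads against the same storage, which is why low seek latency (as with the iRAM) helps and why splitting the farm across repositories divides the load.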



Pulse should be a viable tool for improving Deadline’s scalability by moving at least some of the communication away from simple files. Unfortunately the last version we were able to try exhibited considerable memory issues. How is Pulse’s development coming along?



-Sean

Hi Sean,



We’re still working the kinks out of Pulse, including the memory issues you are running into. In fact, Pulse won’t be making an appearance in the next release. We’ve cut down the amount of file IO Deadline performs by over 50%, so you should see a noticeable improvement in performance when you use this version (which should be out in a couple of weeks). Our farm has 200+ slaves, and we’ve noticed the improvement here.



We agree that the decentralized architecture can cause scalability issues, but the combination of the reduced file IO along with Pulse (once it gets back on its legs) should allow for better scalability.



As always, we appreciate your feedback, and we hope this next version will solve (or at least reduce) your performance issues. If you have any questions regarding the new version, please let us know.



Cheers,



Ryan Russell

Frantic Films Software

http://software.franticfilms.com/

(204)949-0070
