Pulse has some memory leaking issues in 6.2.1.50, and I’m wondering if anything has been fixed in that area in 7.0/7.1.
Basically, every so often, Pulse will eat up all the RAM on its host and subsequently die, which prevents anyone from submitting any jobs (since we use the REST interface exclusively for submissions). Sometimes it may happen once a week, other times it may not happen for a couple of weeks, and it happens on both of our Pulse servers (one in each studio). The process is running with no GUI on a Fedora 19 host.
We are looking into a way to get a memory dump of Pulse from you so we can see what is causing the large amount of memory use. Are you aware of a way to do this on Fedora? We’re still researching this.
So it looks like the gcore command on Fedora is our best bet. When your Pulse instance begins to use a lot of memory, just run the command:
gcore [pid]
replacing [pid] with the PID of the Pulse process, and that should produce a core dump. Then send that file to us via something like WeTransfer and we can take a look. Thanks.
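Since the spike can happen at any time, it may help to trigger the dump automatically. Here is a rough sketch, assuming a Linux host with /proc and gcore on the PATH. The function names, the polling interval, and the threshold are all made up for illustration; nothing here is part of Deadline.

```python
import re
import subprocess
import time


def rss_kib(status_text):
    """Extract the VmRSS value (in KiB) from the text of /proc/<pid>/status."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def dump_when_large(pid, limit_kib, poll_seconds=30):
    """Poll the process and run `gcore <pid>` once its RSS reaches limit_kib.

    Raises FileNotFoundError if the process exits before the limit is hit.
    """
    while True:
        with open(f"/proc/{pid}/status") as handle:
            rss = rss_kib(handle.read())
        if rss is not None and rss >= limit_kib:
            # gcore names the dump core.<pid> unless -o is given.
            subprocess.run(["gcore", str(pid)], check=True)
            return rss
        time.sleep(poll_seconds)
```

For example, dump_when_large(12345, 3 * 1024 * 1024) would wait until process 12345 (a placeholder PID) reaches roughly 3 GiB resident and then dump it. The dump will be about as large as the process, so make sure the working directory has room.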
Update here too. Since we have issues with launching processes in 6.2, if it’s not already, have Pulse run the cleanup tasks without a separate process:
In general, when the memory usage spikes like this, it tends to go down and stabilize some time later, but sometimes Pulse ends up dying. It seems like potentially bad behavior given how little memory a fresh process consumes.
Talked with the guys again and they’re pretty confident that this has been fixed in 7.1. I thought it was caused by process spawning, but it’s actually anything to do with threading. We’re still researching the Mono profiler angle so that should give a peek into anything on the .net side that’s not being de-allocated.
So we’re having a lot of problems with Pulse in 7.0 when the job count starts to get high-ish (currently ~7,500), but we’re near the end of a big show, so doing a full repo upgrade isn’t a good idea. Therefore, I’m wondering if it’s possible to run a 7.1 Pulse instance against a 7.0 repo and clients. Or, alternatively, doing a repo upgrade to 7.1 but leaving the non-Pulse clients on 7.0.
I do believe the answer on both is ‘no’ due to some changes we made with the database structure, but I will need to get verification on whether that change impacts Pulse.
It’ll mess up the power management stuff for sure.
If I remember right, upgrading to 7.1 while still having some machines on 7.0 will break a lot of things that require the SlaveInfo’s data. Normally this doesn’t happen in minor versions… You’re just unlucky this time, Nathan.
This is purely from some basic system observation, but it seems like Pulse will consistently crash as soon as its private memory allocation exceeds 4 GB. Is it possible some configuration of the Mono runtime or the applications themselves is causing additional problems here? This is still with 7.0.2.3.
The shipping build of Mono is supposed to be 64-bit at this point… Want to run the file command against “/opt/Thinkbox/Deadline7/mono/bin/mono-sgen” and check it out?
I’m getting the following:
ELF 64-bit LSB executable, x86-64, version 1 (GNU/Linux), dynamically linked (uses shared libs), for GNU/Linux 2.6.18, not stripped
Update:
I’m just thinking over this more… I know Mono has certain limitations on how large lists can be and other bits. Have you managed to grab a crash report yet? It should just be a matter of running Pulse yourself manually from the command line and piping the output to a file.
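If a plain shell redirect is awkward (say, when launching from a wrapper script), the same capture can be sketched in Python. This is only an illustration: run_and_log is a made-up helper, and the actual Pulse launch command is not given in this thread, so it has to be supplied by whoever runs it.

```python
import subprocess


def run_and_log(command, log_path):
    """Run `command`, writing stdout and stderr to `log_path`.

    Returns the exit code. On POSIX, a negative value -N means the
    process was terminated by signal N.
    """
    with open(log_path, "w") as log:
        result = subprocess.run(command, stdout=log, stderr=subprocess.STDOUT)
    return result.returncode
```

Call it as run_and_log(pulse_command, "pulse_crash.log"), where pulse_command is a placeholder list holding whatever command line normally starts Pulse on that host. Stderr is merged into the same file because runtime crash traces usually end up there.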
Yeah, I did that and got the same thing. I’m assuming the Windows executables being run by Mono are also 64 bit, but I can’t easily confirm that on Linux.
OK, after acquiring a PE reader for Linux, it looks like the executables are 32-bit (x86). That’s very unfortunate if true. Can you confirm or deny that that’s the case?
Erm… It shouldn’t be. Mono and .NET, just like Java, compile programs to interpreted byte code (.NET just throws .exe on the end instead of .class). We have an option during build to optimize for a particular platform, but methinks it should run as whichever architecture the Mono interpreter (mono-sgen) is built for.
I’ll see if our projects are set to emit the 32-bit header. Not sure if Mono respects that, but it would explain some things. As far as I remember, we specify ‘Any CPU’.
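One way to check the header question directly on Linux is to read the fields yourself. As far as I know, ‘Any CPU’ assemblies are normally written as PE32 files with an x86 machine field, so a generic PE reader can report them as 32-bit; the bit that actually pins a managed assembly to 32-bit is the 32BITREQUIRED flag in the CLI header. Below is a minimal sketch that reads both. The inspect_pe helper is illustrative, not a Deadline or Mono tool, and it says nothing about whether Mono honors the flag.

```python
import struct

MACHINES = {0x014C: "x86", 0x8664: "x64"}
COMIMAGE_FLAGS_ILONLY = 0x1
COMIMAGE_FLAGS_32BITREQUIRED = 0x2
CLI_DIRECTORY_INDEX = 14  # data directory slot for the CLI (.NET) header


def inspect_pe(data):
    """Return machine type, PE format, and CLI flags for a PE image in `data`."""
    if data[:2] != b"MZ":
        raise ValueError("not a PE file (missing MZ signature)")
    (pe_off,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_off:pe_off + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # COFF header: machine, section count, ..., optional header size, flags.
    machine, n_sections, _, _, _, opt_size, _ = struct.unpack_from(
        "<HHIIIHH", data, pe_off + 4)
    opt_off = pe_off + 24
    (magic,) = struct.unpack_from("<H", data, opt_off)
    pe32_plus = magic == 0x20B
    info = {
        "machine": MACHINES.get(machine, hex(machine)),
        "format": "PE32+" if pe32_plus else "PE32",
        "managed": False,
        "il_only": None,
        "requires_32bit": None,
    }
    # Data directories start 96 bytes into a PE32 optional header, 112 in PE32+.
    dir_off = opt_off + (112 if pe32_plus else 96)
    (n_dirs,) = struct.unpack_from("<I", data, dir_off - 4)
    if n_dirs <= CLI_DIRECTORY_INDEX:
        return info
    cli_rva, _ = struct.unpack_from("<II", data, dir_off + 8 * CLI_DIRECTORY_INDEX)
    if cli_rva == 0:
        return info  # no CLI header, so not a managed assembly
    # Map the CLI header RVA to a file offset through the section table.
    sec_off = opt_off + opt_size
    for i in range(n_sections):
        vsize, vaddr, raw_size, raw_ptr = struct.unpack_from(
            "<IIII", data, sec_off + 40 * i + 8)
        if vaddr <= cli_rva < vaddr + max(vsize, raw_size):
            # The Flags field sits 16 bytes into the CLI header.
            (flags,) = struct.unpack_from("<I", data, raw_ptr + cli_rva - vaddr + 16)
            info["managed"] = True
            info["il_only"] = bool(flags & COMIMAGE_FLAGS_ILONLY)
            info["requires_32bit"] = bool(flags & COMIMAGE_FLAGS_32BITREQUIRED)
            break
    return info
```

To use it, read one of the executables in binary mode and pass the bytes in, for example inspect_pe(open(path, "rb").read()) with path pointing at the file in question. A result with machine x86, format PE32, and requires_32bit False would be consistent with an ‘Any CPU’ build; requires_32bit True would mean the project really is emitting the 32-bit flag.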