Just thought it might be worth bringing these up again to see if any progress or insight existed on either.
Continuous Pulse memory leak when running as a service.
Restarting Pulse with Power Management + Wake On LAN enabled causes all slave nodes to be started up (with or without jobs in the queue). Disabling WoL, restarting Pulse, then re-enabling WoL circumvents this issue, but is quite an annoying process, especially since Pulse needs to be restarted several times a week (due to the aforementioned memory leak).
It would be very useful to post the version of Deadline you are using, as we now have customers running 2.7, 3.0, 3.1 and 4.0 (which was released today).
Were you ever able to confirm that this was a workaround?
Unfortunately, this issue was never properly logged as a bug, and it didn’t come up during the 4.0 beta, so it was overlooked. We’re targeting this bug for the next release.
I never actually got around to trying that. The reason is we run Pulse on a server that gets logged on to and off of pretty regularly, so having Pulse running as a service with logon rights is kind of essential (since we need it to be logged on all the time). I’ll see if I can keep it logged on long enough to do a somewhat long-term test though.
Sorry for dropping the ball on the thread/bug report.
We have confirmed that this leak occurs with the Idle Shutdown, Wake On Lan, and Thermal Shutdown features enabled. Do you guys use all three of these? I know you mentioned in the other thread that you use “all of them”, but I just want to confirm this. We’ve looked at the code and nothing jumps out that would cause a leak, so the more we can narrow down the problem, the faster we can likely fix it. We’re going to test each of these 3 features individually to try and narrow it down, but if you are not using one or more of the 3 features mentioned, please let us know.
As something of an update, it appears there is actually a leak in the Pulse GUI as well.
I’ve been running it in GUI mode instead of service mode as sort of a long-term comparison test, and, while not nearly as noticeable as the service-mode leak, the RAM usage has gone from ~16 MB to ~72 MB over the course of 8 days. I’ll keep it running to see if this trend continues, but I thought it might be worth mentioning at this point. Also, this is with the launcher running (not sure if a leak in that context was discussed/brought up already) as opposed to starting Pulse directly from the install folder.
72 MB seems pretty normal. Our production Pulse always fluctuates between 50 and 80 MB, but never climbs higher than that. I should mention too that we’ve been running memory profilers on Pulse while in nogui mode, and there is no indication that there is a leak. However, it did appear that many objects were being tagged for garbage collection but were not being collected in a timely fashion. They should eventually be cleaned up, but we’re playing with some ideas to force the cleanup.
Since upgrading to 4.0, have you left Pulse running in service mode long enough for it to eat up all the system’s memory? I’m just wondering if .NET is being lazy with the cleanup, so it looks like a leak, but as soon as the system needs some RAM, it will clean things up as necessary (because that’s what our profiler tests seem to indicate). We’re going to leave Pulse running on 3 machines over the long weekend to see what the memory is like, and we’ll give you an update on Tuesday.
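If it helps with the long-run comparison, here’s a minimal sketch of how one might log a process’s resident memory on an interval to see whether it keeps climbing or levels out. The process name and sampling interval are just assumptions for illustration; adjust them to whatever the Pulse executable is actually called on your system.

```python
# Minimal sketch: periodically log the resident memory of a named process
# so its growth can be compared over a long run (e.g. a weekend).
# The process name and sampling interval below are assumptions, not
# anything taken from Deadline itself.
import time
import psutil

PROCESS_NAME = "deadlinepulse"   # hypothetical name; adjust to the real Pulse executable
INTERVAL_SECONDS = 300           # sample every 5 minutes

def find_process(name):
    """Return the first process whose name contains `name`, or None."""
    for proc in psutil.process_iter(["name"]):
        if name.lower() in (proc.info["name"] or "").lower():
            return proc
    return None

while True:
    proc = find_process(PROCESS_NAME)
    if proc is not None:
        rss_mb = proc.memory_info().rss / (1024 * 1024)
        print(f"{time.strftime('%Y-%m-%d %H:%M:%S')}  RSS: {rss_mb:.1f} MB")
    time.sleep(INTERVAL_SECONDS)
```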
When we came in this morning (Tuesday), we found each instance of Pulse to be using roughly the same amount of memory it was when we left the office on Friday, and the peak memory usage wasn’t all that much higher. This would indicate that “forcing” the cleanup helps keep the memory in check when running in nogui mode. So we will be rolling out this modification in the upcoming 4.0 SP1 release.
Still curious though how Pulse ran for you over the last few days.
Yeah, that’s not surprising, because before we were forcing garbage collection, that’s what we were seeing too. When we checked the memory profiler, we saw all these objects in the finalizer queue. They were marked for collection, but weren’t getting collected fast enough. Forcing garbage collection at certain intervals (every 5 minutes in our tests) ensured these objects were cleaned up in a timely fashion. The fact that this doesn’t happen in GUI mode would indicate that garbage collection works differently when an application is running with an interface on a message loop.
Oh well, at least this issue should be resolved in 4.0 SP1, which we hope to get out before the end of the month.
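For what it’s worth, the “force collection on an interval” idea described above boils down to something like the sketch below. Pulse itself is a .NET application, so this Python version is only an illustration of the pattern, not the actual fix; the interval and the timer setup are assumptions.

```python
# Illustration only: run a full garbage collection on a fixed interval from a
# background timer, mirroring the "force cleanup every 5 minutes" idea above.
# Pulse is .NET, so this just demonstrates the pattern, not the real change.
import gc
import threading

COLLECT_INTERVAL_SECONDS = 300  # every 5 minutes, matching the interval mentioned above

def periodic_collect():
    # gc.collect() forces a full collection pass, reclaiming objects that are
    # already unreachable but have not yet been collected.
    collected = gc.collect()
    print(f"forced collection reclaimed {collected} objects")
    # Re-arm the timer so the collection keeps running on the interval.
    timer = threading.Timer(COLLECT_INTERVAL_SECONDS, periodic_collect)
    timer.daemon = True
    timer.start()

periodic_collect()
```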
Cool, thanks for taking a look at this. I’ll continue to run it in service mode for the time being to see if the RAM usage continues to climb, or if it levels out at some critical point.
Thought I’d drop an update here. RAM use is up to 415 MB. Just out of curiosity, if this is just a case of .NET being lazy with garbage collection, at what point does it decide to purge anything?
It should purge this memory when the system needs it (i.e., if you start up another program that requires the memory), although I’m surprised it’s climbed that high. Oh well, I guess this is all a little moot since 4.0 SP1 should resolve this issue.
Not a 911 call, just an interesting update. I’ve been ignoring Pulse for a while, but I just checked it again and it was up to ~910 MB. Looking forward to SP1…