Slave crashes when virtual memory runs low

Hi, I have had a few Deadline slaves crash when virtual memory runs low on Windows XP 64.

I had a few stalled slaves. I VNC-ed into the stalled machines, and every stalled slave was showing the crash window and the low virtual memory message:
memoryLow.jpg
crashwindow.jpg

I also end up with this in Task Manager:
wmiprvse.exe running at 99% CPU
and svchost.exe taking over 3 GB of RAM

This problem has always existed in Deadline (even before my time), and it really is a difficult problem to handle. Running out of virtual memory can produce unexpected results, and we’ve found that the slave usually doesn’t crash in this situation (we’ve let memory usage max out on some of our machines as a test, and in most cases the slave kept running). If running out of memory corrupts the slave’s virtual memory, there really is no way for it to recover. At least Deadline is detecting that the slave has crashed, so you can handle it appropriately. :slight_smile:

Not sure why those two processes you mentioned would max out like that. It’s hard to say whether that’s related to Deadline crashing or to the machine swapping memory…

Cheers,

  • Ryan

I didn’t have this behavior in 2.7… I just finished setting up 3.0 on all render nodes yesterday, and I had 5 slaves crash like that today.

It did exist in 2.7 (we’ve seen it before), and the bug for it goes all the way back to the 1.x days. A big difference between 2.x and 3.x is that we’ve moved from .NET 1.1 to .NET 2.0. Not sure if that would have anything to do with it happening more often…

Hi, my IT lead told me about a problem he has seen with wmiprvse.exe.
Does your code call the Windows Management Instrumentation? There seems to be a memory leak in svchost (the wmiprvse parent process).

Thanks

We ran some tests, and there don’t appear to be any leaks in our WMI usage. We found some info on the web that explains ways to help avoid memory leaks, and we will be implementing those in Deadline 3.1. However, when we ran both test apps side by side (one with the added cleanup code and one without), neither one was using more memory than the other, and both of their memory usages were stable (we were watching the test app, svchost, and wmiprvse processes).
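For context, the WMI cleanup advice usually found for .NET amounts to disposing the `System.Management` objects deterministically instead of leaving them for the garbage collector, which can let the WMI provider host (wmiprvse) accumulate resources under heavy polling. This is a hedged illustrative sketch of that pattern, not Deadline’s actual code; the query and class names are just a plausible example, and on the .NET 2.0 era mentioned above, explicit `Dispose()` calls were sometimes needed where `using` works today.

```csharp
using System;
using System.Management; // requires a reference to System.Management.dll (Windows only)

class WmiCleanupSketch
{
    static void Main()
    {
        // Wrapping each WMI object in a using block releases its underlying
        // COM resources as soon as the query completes, rather than whenever
        // the garbage collector gets around to it. Under frequent polling
        // (as a render-farm slave might do), deferred cleanup is a common
        // cause of memory growth in wmiprvse/svchost.
        using (ManagementObjectSearcher searcher = new ManagementObjectSearcher(
                   "SELECT FreeVirtualMemory FROM Win32_OperatingSystem"))
        using (ManagementObjectCollection results = searcher.Get())
        {
            foreach (ManagementObject os in results)
            {
                using (os) // dispose each result object as well
                {
                    Console.WriteLine("Free virtual memory (KB): "
                        + os["FreeVirtualMemory"]);
                }
            }
        }
    }
}
```

The key design point is that `ManagementObjectSearcher`, the result collection, and each result object all hold unmanaged WMI handles, so all three are disposed, not just the searcher.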

It would be interesting to know if you only see this problem after the machine has started swapping, or if a memory leak in the WMI code is what leads to the machine swapping in the first place.