Linux errors more often

I upgraded to Deadline 3.1 sp1 about two weeks ago. At first things seemed fine. From what I have seen so far, there is no real difference between 3.0 sp1 and this version except for the new GUI features, which are nice. But on to the problem. I run Deadline like I normally do: mount the NAS, mount deadlinerepository, and run deadlineslave. (I wish I could use NFS, but I guess that is how it goes.) After a few hours, one or two of the slaves randomly drop with no real error code. It says “there was a problem running native code, possible something with mono.” The Deadline logs do not show any errors, just that it stalled. What do you need from me to help you diagnose the problem?

For most of the day the servers run fine with no issues. It just sucks because my boss now wants to move all our servers to Windows because Linux has been crapping out so often. Linux is SO much faster than Windows at rendering. Oh well.

Is that the exact error message, or is there more to it? If there is more, could you post the full error message? If you’re running the slave from a terminal, I imagine you could just copy the error out of the terminal and paste it here. If you could grab some of the normal slave output as well, we can check if the error is occurring at specific points of execution.

Also, maybe try upgrading to Mono 2.4.2, which was just released this week. This is a maintenance release to 2.4, so it wouldn’t hurt to see if that helps.

Cheers,

  • Ryan

Hey Ryan,

I upgraded one of my Linux slaves to Mono 2.4.2 and it still errors just as often as the rest. I attached the error in a file because there was way too much to copy into here.

-Jonathan
error.txt (26.8 KB)

This error seems to be related to libgdiplus, which Mono depends on for Windows Forms. I’ve googled the problem, and it seems like it might be related to mismatched Mono/libgdiplus versions. On our openSUSE box, I found that when I upgraded Mono to 2.4.2 using the built-in package manager, it didn’t automatically upgrade libgdiplus to 2.4.2 (it was still at 2.4). Maybe check that the libgdiplus version matches the Mono version in your package management software.
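If it helps, something like this should show whether the two versions line up. It's only a rough sketch: it assumes libgdiplus installed a pkg-config file that is visible on your PKG_CONFIG_PATH (a source build into /usr/local may need that path added), and `parse_mono_version` and `versions_match` are just helper names made up for the example.

```shell
#!/bin/sh
# Extract the version number from the first line of `mono --version` output,
# e.g. "Mono JIT compiler version 2.4.2 (tarball ...)" -> "2.4.2".
parse_mono_version() {
    head -n 1 | sed -n 's/.*version \([0-9][0-9.]*\).*/\1/p'
}

# Hypothetical helper: true only if both strings are non-empty and identical.
versions_match() {
    [ -n "$1" ] && [ "$1" = "$2" ]
}

# Either tool may be missing; fall back to an empty string in that case.
mono_ver=$(mono --version 2>/dev/null | parse_mono_version || true)
gdi_ver=$(pkg-config --modversion libgdiplus 2>/dev/null || true)

if versions_match "$mono_ver" "$gdi_ver"; then
    echo "mono and libgdiplus match: $mono_ver"
else
    echo "mismatch or not found: mono='$mono_ver' libgdiplus='$gdi_ver'"
fi
```

If the two were installed through the package manager instead, its own listing of the installed mono and libgdiplus packages is probably the more reliable place to look.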

Cheers,

  • Ryan

I upgraded libgdiplus to version 2.4.2 and no luck :(

I attached the errors I am getting now.
error2.txt (30.5 KB)

Hey there,

The good news is that it looks like it’s no longer libgdiplus that’s causing the problem… but the bad news is that we’re obviously not much further along :( The error you’re getting is still being generated by native code (looks like libc is the culprit this time…), and might be because Mono is doing something weird. Maybe try downgrading Mono and libgdiplus back to a lower version (maybe 2.4.0) and see if that fixes things. Unfortunately, we haven’t encountered this kind of error before, so we’re going to have to play this one by ear a little bit.

Cheers,

  • Jon

I have 1 server running Ubuntu Linux 9.04 x86_64 with Mono 2.4.2 and libgdiplus 2.4.2. I have 4 more running the same OS with Mono 2.4 and libgdiplus 2.4. I did not have any problems with this version of Linux before, when running 3.0 sp1, so that rules out the OS. I left 4 servers at 2.4 because I do not like to change production-level servers until everything is fully tested. (Defeats the purpose of upgrading :( ).

I have only 2 weeks to fix this bug or I will be forced to change all my servers to Windows, which I REALLY do not want to do because they are slower. But they also do not crash. Please help.

-Jonathan

Do all the error messages you’re getting seem to happen in the same spots? Or are they kinda random? I’m asking because the place where your latest log crashed is actually in really simple code that’s been in there for quite a while. Basically, when the crash occurred, Deadline was simply trying to start a new Process to run the command “which free”. This is to find the location of the ‘free’ command in order to get memory usage statistics. I can’t imagine this command would throw a segfault (though you might want to try it on the command line just to be sure), and Deadline would have needed to start other Processes before this in order to get as far as it did in the logs… So I have to say I’m rather confused as to what could be causing these crashes you’ve been experiencing.
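To rule out the command itself, a quick loop like this could run it a couple hundred times and count how many runs fail. It's just a sketch: `count_failures` is a made-up helper, and it only exercises `which` directly from the shell, not the way Mono launches the process, so a clean result wouldn't clear Mono.

```shell
#!/bin/sh
# Run a command N times and print how many of those runs exited non-zero.
# Usage: count_failures N command [args...]
count_failures() {
    n=$1; shift
    failures=0
    i=0
    while [ "$i" -lt "$n" ]; do
        "$@" >/dev/null 2>&1 || failures=$((failures + 1))
        i=$((i + 1))
    done
    echo "$failures"
}

# Prints the number of failed runs out of 200.
count_failures 200 which free
```

Anything other than 0 here would point at the machine or the `which` binary and not at Deadline.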

Which Linux distribution are you using? We haven’t been having any problems on our SUSE machine, so maybe we’ll have to do some more extensive testing on whichever distro you’re running in order to pin this one down.

The errors happen randomly. I have no idea either what could be causing the error.

The distro I am using is Ubuntu Server 9.04 x86_64.

I have been using Ubuntu Server since Deadline started supporting Linux. I have not had very many problems until now, though. I have to watch the Deadline Monitor often when rendering just to make sure nothing errors, or if it does, that I can restart it right away.

Is there anything that you would like me to do to help with this?

Maybe post a few more error logs, since they seem to be different. Any pattern we can find in where/when the crashes occur would definitely help us :)
Looks like we’ll have to set up an Ubuntu VM and do a bunch of test renders on there to see if we can replicate the issue, though. Hopefully that’ll get us somewhere.

So I found a quick and easy fix. I ran deadlineslave with no GUI (deadlineslave -nogui) and I have not received any errors or crashes yet.

Hopefully that will aid in solving the problem.
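In case it saves anyone from watching the monitor all day, here is a rough sketch of a wrapper that restarts the slave a limited number of times if it drops. It assumes the slave process exits with a non-zero status when it crashes, which has not been confirmed in this thread; `restart_loop` and `RESTART_DELAY` are names made up for the example.

```shell
#!/bin/sh
# Run a command, restarting it up to MAX times if it exits with an error.
# Usage: restart_loop MAX command [args...]
# Returns 0 as soon as the command exits cleanly, 1 if every attempt failed.
restart_loop() {
    max=$1; shift
    attempt=0
    while [ "$attempt" -lt "$max" ]; do
        "$@" && return 0
        attempt=$((attempt + 1))
        echo "command exited with an error (attempt $attempt of $max)" >&2
        # Pause before restarting; defaults to 10 seconds.
        sleep "${RESTART_DELAY:-10}"
    done
    return 1
}

# Only start the slave if it is actually installed on this machine.
if command -v deadlineslave >/dev/null 2>&1; then
    restart_loop 5 deadlineslave -nogui
fi
```

The cap on restarts is there so a slave that fails immediately every time does not spin forever.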

Interesting. From the error messages you were getting, I wouldn’t have guessed it was a GUI-related problem. Maybe it has something to do with Ubuntu Server not shipping with a GUI, and it might not have installed cleanly…? Are you using GNOME or KDE? On the Ubuntu Server VM I set up, I’ve installed GNOME and it seems to be working fine so far (with Mono 2.4). Maybe upgrading or reinstalling KDE/GNOME/xserver/xorg would fix it? I know it’s a bit of a long shot, since you said it worked fine before the upgrade, but it might be worth a try.

I’ll keep trying to break the Slave on the VM setup, in case I find anything in the meantime :)

Cheers,

  • Jon

You can install a GUI on Ubuntu Server really easily with “sudo apt-get install ubuntu-desktop”. That will install GNOME as if it were the Ubuntu desktop version (with some added server functionality but no big difference; kinda defeats the point, though). I had no problem installing Deadline after I installed Mono 2.4+.

The upgrade I meant was Deadline, not Ubuntu. Everything started failing once we changed from 3.0 sp1 to 3.1 sp1. With 3.0 sp1 I could run my server headless without a GUI, which made everything run faster and with less RAM. Now I have to load up into gdm, which takes much longer and is slower. I would go back to 3.0 sp1 if I could, but downgrading is a lot harder than upgrading.

Good luck trying to break it. I can give you my install script if you want to see exactly how I installed everything. We are using Maya 2k9 64-bit. I don’t know if that helps.

-Jonathan

Yup, that’s more or less what I did to get the GUI on there :) And an install script could be handy, just to make sure I’m setting up everything the same way you are, if it’s not too much trouble.

We’ll definitely let you know if our testing turns up anything, but at least you found a workaround that seems to work for now.

Cheers,

  • Jon

Here is my handy dandy install script.

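# Grab two digits of the eth0 address (columns 20-21 of the inet line); used below to pick the license file
IP=$(ip addr show eth0 | grep 192.168 | cut -c20-21)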

sudo apt-get install -ym rpm csh smbfs gcc bison gettext pkg-config libgtk2.0-dev ia32-libs ubuntu-desktop nfs-common

sudo mkdir -p /var/lib/rpm
sudo mkdir -m 777 /usr/tmp
sudo mkdir -pm 777 /Agaminas/agami1/
sudo mkdir -m 777 /deadlinerepository/
sudo mkdir -pm 777 /Bb/c/
sudo mkdir -m 777 /install

sudo ln -s /usr/aw /aw
sudo ln -s /usr/autodesk /autodesk

sudo mount bb:/share/Personal/Jonathan/ /install
sudo mount.cifs //bb/c /Bb/c -o user=USER%PASSWORD,dom=DOMAIN
sudo mount.cifs //Agaminas/agami1 /Agaminas/agami1/ -o user=USER%PASSWORD,dom=DOMAIN
sudo mount.cifs //deadliner/deadlinerepository /deadlinerepository/ -o user=USER%PASSWORD,dom=DOMAIN

cd /install/Software/Workstation/Maya\ 2k9/Linux-64
sudo rpm -ivh --nodeps AWCommon-*.rpm Maya2009_*.rpm

cd /install/Software/Workstation/Deadline/31sp1
cp mono-2.4.2.tar.bz2 ~/
cp libgdiplus-2.4.2.tar.bz2 ~/
cp deadlineclient.sh ~/
cd ~/
sudo tar -xvjf ~/libgdiplus-2.4.2.tar.bz2
sudo tar -xvjf ~/mono-2.4.2.tar.bz2
rm mono-2.4.2.tar.bz2 libgdiplus-2.4.2.tar.bz2
cd mono-2.4.2
sudo ./configure
sudo make && sudo make install
cd ../libgdiplus-2.4.2

sudo ./configure
sudo make && sudo make install
sudo cp /install/Render\ Farm/Maya\ 2k9\ aw/aw$IP.dat /var/flexlm/aw.dat

cd /install/Scripts
cp rf_Startup.sh /home/farm
cd ~/
sudo umount /install
sudo rm -R /install