MacPro Slave Failing to Render After Loading Scene

Hello

We have a MacPro render server where our Deadline 7 Repository resides (Mid 2012, OS X 10.8.5, 48GB). This MacPro also has a Slave.

We have 2 additional Slaves: MacPro (Mid 2012, OS X 10.8.5, 48GB) and 27" iMac (Mid 2011, OS X 10.8.5, 32GB).

We are rendering some animated scenes in Mental Ray using Maya 2015. If we queue several Jobs to render over the next few days, the render server and the iMac plough on through the work without any trouble. However, the MacPro Slave will generally pick up the first Job and contribute to its Tasks, but once that Job is complete it will pick up the next Job, load the scene and then just do nothing.

To try and pin down the problem we created a small network of just these 3 Macs with a good HP ProCurve Switch. On the errant MacPro I erased an internal drive and did a clean install of OS X 10.8.5, Maya and Deadline 7 Client. Launched the Slave and it started contributing to an ongoing Job, a sequence of approximately 150 frames in Tasks of 10 frames. Once this Job was complete the iMac and the render server MacPro went on to the next Job and started rendering but the MacPro Slave loaded the scene (these scenes are big and take about 10 minutes to load) and then just sat with no obvious error in the log.

Any suggestions or advice very welcome as we have been trying to get to the bottom of this for over a week now!

Chris

Hello,

So I think the first thing we would want to try is to turn off MayaBatch and see if your machines see the same result after the first job. If that fails to improve things, can you try unplugging the monitor from one of the machines experiencing this and see if it still lags on the second job? Also, can you turn on verbose slave logging via the Application Logging section of the repository options in the Monitor, so that if we need the application logs, they will already have the info we want? Thanks.

Thanks for your reply

Last night we tried two things: we switched off Optimise Animation Detection from the Mental Ray Global Render Settings and when we submitted the overnight Jobs we switched off Auto Memory Limit and left it set at 0. We suspect the problem may be memory related.

Again last night the MacPro render server/slave and the iMac slave kept solidly rendering one Job after another whereas the MacPro slave completed the first Job, loaded the scene for the second and just sat there until I arrived this morning.

Out of curiosity, why would unplugging the monitor make a difference? We will try rendering with MayaBatch switched off and see if it helps, and I think I will also try swapping the memory between the MacPros to see if the problem swaps across.

Hi,
Just to double-check as you said all your machines are running 10.8…but is the machine displaying this issue running 10.9 - Mavericks? If so, this could be the “App Nap” issue on OSX which can be worked around via our docs here:
docs.thinkboxsoftware.com/produc … rkstations

Hello Mike

Thanks for your thoughts on this. The MacPro slave runs Mavericks and we too thought that the problem may well be OS related, particularly with the change to SMB as the default network protocol. I formatted an internal drive and put a clean install of 10.8.5 on it as well as Maya and Deadline 7 - it made no difference unfortunately.

Today we have been testing memory as the MacPro server has 48GB of buffered memory and the MacPro slave has 48GB of un-buffered memory. We switched the memory but the problem stayed with the MacPro slave, so it’s not looking like that. We’ll do a better test of this tonight when we can send much bigger Jobs to the Macs.

We’re now looking to see if Sophos anti-virus is causing our problem.

A quick update

Disabling Sophos anti-virus did not help either, unfortunately. The MacPro slave completed the first Job, loaded the scene for the second and then went no further. At least we can reproduce the problem!

Can you enable verbose slave logging under “Application Logging” in your repository configure settings, restart the slave and then send us the logs when this next happens?

Hello Mike

We’ve done that and will send you the log when it next fails. First we are going to try setting the Pulse Throttling maximum number of Slaves that can copy job files to 1. We are thinking that our render server can’t cope with 2 other Macs reading the Scene files (about 2GB in total) concurrently.

Do you have any system monitoring tools via your IT team to visualise your bandwidth here?

Hi Mike

Our IT colleague has given us a more powerful HP ProCurve switch to help. I’ll ask for something to monitor bandwidth.

Is there any way to throttle the CPU usage on Macs? We’re really up against a tight delivery date (good time to have this problem!), so we want to keep a Slave running on the render server, as it is very reliable and does not need any network connection. However, running the 16 cores on the render server flat out rendering may be troubling its ability to serve the scene files etc. Just a thought.

Thanks

Sure, you can control the CPU affinity. See here for more info:
docs.thinkboxsoftware.com/produc … u-affinity

I just need to correct Mike here that as far as we are aware, there is no way in Mac OS to control CPU affinity, so that feature will not work on a Mac system. We’ll have to look into other options on this one.

Ah, yes, good spot Dwight! Windows & Linux play quite nicely here, but as Dwight said, OSX just plain doesn’t expose any functionality here. The first sentence in our docs clearly points this out!

Hello Mike

We had spotted that the CPU Affinity didn’t work with Macs unfortunately. It’s odd as I remember, years ago, being able to specify the number of CPUs when setting off a batch using the command line.

Our woes continued during tests overnight, but I’ve got the verbose logs. We set Pulse Throttling to limit the number of Slaves that can copy a Job at the same time to 1 and set off 3 Macs rendering. Our troublesome MacPro Slave was given a single Job of 350 frames of animation in 10 frame Tasks (each frame takes approximately 5 minutes to render). It completed 20 frames and then decided it had had enough and just stopped. We suspect that, having just copied all of the required files for the next Task of 10 frames, MacPro-03 stopped and didn’t notify Pulse that it had finished, so Pulse held the other 2 Macs back from copying their files and everything came to an abrupt halt.

The Slave log is 30MB. Shall I crop it down to the Task before and up to the point of failure? Which other logs would help you, please?

Chris

Pulse Throttling is only applied when a slave first picks up a job, not when it picks up a new task.
docs.thinkboxsoftware.com/produc … throttling

Yep, cropping down to the essentials would help. Alternatively, zipping up the log files will help and/or you can send them direct to "support@thinkboxsoftware.com" to generate a private support ticket, if you don’t want to post them here. Logs from machines which are ok, would also be useful to compare against.
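For anyone wanting to crop a large slave log along these lines, here is a minimal Python sketch rather than an official Deadline tool. It keeps everything from the second-to-last line containing a marker string through to the end of the log, which roughly corresponds to "the Task before and up to the point of failure". The marker text, the function name `crop_from_marker` and the file names in the usage comment are assumptions for illustration; check your own log for whatever line reliably appears at the start of each Task.

```python
from typing import Iterable, List


def crop_from_marker(lines: Iterable[str], marker: str, keep_last: int = 2) -> List[str]:
    """Return the tail of a log, starting at the keep_last-th most recent
    line that contains `marker`.

    If the marker occurs fewer than keep_last times, start at its first
    occurrence. If it never occurs, return every line unchanged.
    """
    all_lines = list(lines)
    # Indices of every line that contains the marker text.
    hits = [i for i, line in enumerate(all_lines) if marker in line]
    if not hits:
        return all_lines
    start = hits[-keep_last] if len(hits) >= keep_last else hits[0]
    return all_lines[start:]


# Hypothetical usage (file names and marker text are placeholders):
#
#   with open("slave.log", errors="replace") as src:
#       cropped = crop_from_marker(src, "TASK START MARKER")
#   with open("slave_cropped.log", "w") as dst:
#       dst.writelines(cropped)
```

Reading the whole file into memory is fine for a log of around 30MB; for much larger files a streaming approach would be a better fit.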

Hi

I will send the log file soon. In the meantime we are both trying to get as many frames out as possible whilst also testing different conditions to ascertain why our MacPro keeps stopping when it commences the second Job in the queue. Perhaps it’s just grumpy!

We thought our problem may be related to Render Layers. Unfortunately having recreated one of the scenes with all previous Render Layer changes applied to the master layer in a clean scene it still stopped on the second Job.

Our current strand of thinking is that it is memory related and connected to the Mental Ray Plugin. We find that the MacPro will render say 5 tasks, each specifying 10 frames and complete the Job. Next it reads the scene for the second Job and then just stops. If we restart the Slave it still will not get going. We have found that we need to have Maya running, as well as the Slave, for stability on the Macs and if we unload Mental Ray and then load it again then the MacPro will start rendering again. Do you have any knowledge of Mental Ray plugin memory problems or of Maya losing communication with the Mental Ray plugin?

Hello,

When you see this happening, can you remote into the machine and move the mouse, to see if that fixes it? I am wondering if this is an attack of the App Nap feature that debuted in Mavericks. Thanks.

It’s a logical thought and we thought the same a while back. To test this I performed a clean install of OS X 10.8.5 (Mountain Lion) on the errant MacPro along with a fresh install of Maya and Deadline - exactly the same happens. Prior to this, in OS X 10.9.5, we had followed the Deadline installation instructions carefully, disabled AppNap from Terminal and switched it off for the applications (Get Info).
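For reference, the Terminal step for disabling App Nap is usually a `defaults write` on the `NSAppSleepDisabled` key. The sketch below wraps that command in Python. It is illustrative only: the function names are made up for this example, the bundle identifier in the usage comment is a placeholder, and it is an assumption that this matches the step described in the Deadline docs linked earlier. App Nap only exists from OS X 10.9 onwards, so the setting should have no effect on a 10.8.5 install.

```python
import subprocess
import sys
from typing import List, Optional


def app_nap_disable_command(bundle_id: Optional[str] = None) -> List[str]:
    """Build the `defaults` command that sets NSAppSleepDisabled.

    With no bundle_id the key is written to the global domain (all apps
    for the current user); otherwise only to that application's domain.
    """
    domain = bundle_id if bundle_id else "NSGlobalDomain"
    return ["defaults", "write", domain, "NSAppSleepDisabled", "-bool", "YES"]


def disable_app_nap(bundle_id: Optional[str] = None) -> bool:
    """Run the command on macOS; do nothing and return False elsewhere."""
    if sys.platform != "darwin":
        return False
    subprocess.run(app_nap_disable_command(bundle_id), check=True)
    return True


# Hypothetical usage:
#   disable_app_nap()                    # global, current user
#   disable_app_nap("com.example.app")   # placeholder bundle identifier
```

Applications generally need to be restarted before they pick up the change.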

It’s as if the Mental Ray plugin is either running out of memory or having completed the first Job it becomes unavailable for the second Job so the scene loads but Mental Ray is just not there to start actually rendering.

As a test, can you enable in the job properties - “Reload Plugin Between Tasks” on a job which is displaying this error on the Mac in trouble, to see if it helps?
docs.thinkboxsoftware.com/produc … ml#general

Hello,

If you could also share a slave log captured after verbose slave logging was turned on in the Application Logging section of the repository options, that would be good too.