Deadline on Apple M1 is unstable

Chad_Gleason · May 3, 2021, 7:30pm

I’m not sure where to post this, so apologies if it’s in the wrong spot.

Running the latest Deadline on an M1 Mac in Rosetta mode causes a Worker crash after a short while (10-15 minutes, maybe). This happens 100% of the time on an M1 Mac mini and an M1 MacBook Air. Both are running Big Sur 11.3. Is there any word on whether a native Apple Silicon port is in the works? Regardless, Rosetta should be adequate in the short term, if only it were stable.

Deadline is very stable on Intel Macs, as well as on Windows machines.

Thanks!

AWvisuals · April 26, 2023, 10:22pm

I just bought a Mac mini M2 to see if it works as a nice compact render machine (mainly for Nuke jobs). I seem to be experiencing the same problem still, but with maybe even shorter periods between crashes. Any updates about this? It seems like an ideal small setup for the job, but if this doesn’t work at all, I might have to rethink my approach…

Nreddy · April 27, 2023, 3:12pm

Hello @Chad_Gleason ,

Thanks for reaching out. Can you please share the crash report from your Mac? We have seen this reported before, and after reviewing the crash reports shared by customers, our engineers found similar crashes reported online. The solution for those folks was this:

Make sure Rosetta 2 is installed (If you need to install Rosetta on your Mac - Apple Support)
If it was already installed, re-install it
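
For anyone following along, here is a minimal sketch of how the Rosetta side could be checked from a script. It relies on the macOS `sysctl.proc_translated` key (1 when a process is translated by Rosetta 2, 0 when native on Apple Silicon) and on Apple's `softwareupdate --install-rosetta` command. The function names are made up for illustration, and note that this reports on the Python process itself, not on the Deadline Worker.

```python
import platform
import subprocess

# Apple's command-line installer for Rosetta 2; running it again reinstalls it.
ROSETTA_INSTALL_CMD = ["softwareupdate", "--install-rosetta", "--agree-to-license"]


def interpret_proc_translated(raw: str) -> str:
    """Map the output of `sysctl -n sysctl.proc_translated` to a label."""
    value = raw.strip()
    if value == "1":
        return "translated"  # process is running under Rosetta 2
    if value == "0":
        return "native"      # process is running natively on Apple Silicon
    return "unknown"         # key missing or unexpected output (e.g. Intel Mac)


def current_process_mode() -> str:
    """Report how the current Python process is executed (hypothetical helper)."""
    if platform.system() != "Darwin":
        return "unknown"
    result = subprocess.run(
        ["sysctl", "-n", "sysctl.proc_translated"],
        capture_output=True, text=True, check=False,
    )
    if result.returncode != 0:
        return "unknown"
    return interpret_proc_translated(result.stdout)


if __name__ == "__main__":
    print("Process mode:", current_process_mode())
    print("To (re)install Rosetta 2, run:", " ".join(ROSETTA_INSTALL_CMD))
```

To see a "translated" result on an M1/M2 machine, the script would need to be started from an x86_64 Python or a Rosetta-enabled Terminal.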

Can you please try the workaround above and see if it helps at all? Please update us with your findings.

Thanks!

jkimenau · June 14, 2023, 9:34am

Hello @Nreddy,

I did re-install Rosetta 2 thanks to these steps. Unfortunately nothing changed, and both of our identical MacBook Pro M1 Max 64GB machines stall after some time: sometimes after 10 min, sometimes 20 min, 1h, up to 1h30 so far.
They both run Deadline client 10.2.1.1 and are connected to an RCS 10.2.1.0. I couldn’t find the 10.2.1.0 OSX client version online anymore.

Is there any way you could help me?

DaveJacobson84 · June 27, 2023, 11:14pm

Hi @Nreddy,

It looks like I’m having the same issue on my farm. I just re-installed Rosetta 2 per your suggestion; however, the Worker still hangs, sometimes after a few minutes and sometimes after an hour or so. The issue is specific to my M1 Mac Studios; the Intel Macs and PCs work as expected.

Mac Studio (2022) M1 Ultra 128GB
macOS 12.6.6
Deadline Client v10.2.1.1 with RCS v10.2.1.1

Please let me know if you need any additional information, thanks!

Dave Jacobson

Nreddy · June 29, 2023, 8:03pm

Can you send the crash report so we can investigate why the Deadline client application is crashing? Spindump, a built-in macOS tool, can provide insights by capturing thread activity and stack traces at the time of the crash.

Run spindump while the Deadline Worker is running and let it crash, so that spindump collects the required data, then send the report over on this thread for investigation. Please refer to this blog post found on the Internet, which outlines the steps for running spindump to collect the stack trace of the Deadline Worker.
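
As a rough sketch of what that spindump step could look like when scripted: the helper below only assembles the command line. The process name `deadlineworker`, the output path, and the sampling values are assumptions for illustration, and the `-file` option should be confirmed with `man spindump` on your macOS version. spindump needs root, hence `sudo`.

```python
import shlex


def build_spindump_command(target="deadlineworker", duration_s=10,
                           interval_ms=10,
                           out_path="/tmp/deadlineworker_spindump.txt"):
    """Assemble a spindump invocation (hypothetical helper, nothing is executed).

    target      -- PID or partial process name to sample (assumed name)
    duration_s  -- how long to sample, in seconds
    interval_ms -- time between samples, in milliseconds
    out_path    -- where the text report should be written (assumed path)
    """
    return [
        "sudo", "spindump",
        str(target), str(duration_s), str(interval_ms),
        "-file", out_path,
    ]


if __name__ == "__main__":
    # Print the command so it can be reviewed before being run in a terminal.
    print(shlex.join(build_spindump_command()))
```

Passing the Worker's PID instead of a name avoids sampling the wrong process if several Deadline applications are running.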

jkimenau · June 30, 2023, 7:29am

Workers never really crash on my side. They just stall forever after some time.
Would spindump be of any use in that case?

DaveJacobson84 · June 30, 2023, 1:45pm

Same for me, I always have to force quit the Worker app.

@Nreddy, I just sent you a DM with a link to a couple spindumps.

Thanks!

karpreet · July 3, 2023, 9:03pm

@DaveJacobson84 would you be able to share the spindumps with me? I would like to take a look at them.
Also, if the Worker is not crashing and is just stalling for a while, it should write logs to the application log folder. Feel free to share the Worker logs as well.

If they contain sensitive data, you can also open a ticket with Thinkbox Support.

DaveJacobson84 · July 3, 2023, 9:49pm

@karpreet just sent you a DM with a link to the spindumps and logs. Thanks!

DaveJacobson84 · July 18, 2023, 7:50pm

Hi @karpreet, just seeing if you’ve managed to find any useful information in the spindumps. Please let me know if there’s anything else I can do to help move this along. Thanks!

AnadinMoshi · July 26, 2023, 3:28am

I am having a lot of trouble with this: many restarts of workers each day, on a combination of M1 and M2 machines. I could also provide logs if needed.

casey · July 30, 2023, 11:25pm

Here to report that I am also having issues on both an M1 Max and an M2 Studio Ultra.

Is there a reason the applications are called ***.exe? Is that working as intended?

There aren’t any logs on either machine that give any information about why they are stalling. Rosetta 2 is 100% installed, and I even reinstalled it.

AnadinMoshi · August 1, 2023, 11:46pm

I have managed to reduce the frequency of the constant hanging/crashing by increasing the interval between housecleaning (now set to 3600). Also, I am seeing this happen to the RCS about once a day too.

DaveJacobson84 · August 2, 2023, 3:11pm

Thanks AnadinMoshi, I gave that a try; however, my workers are still stalling.

briansmith74 · August 14, 2023, 7:29pm

Seeing exactly the same issue on 2 different versions of Deadline on new M2 Ultra Studios running the latest Ventura.
The machines do not crash, but the Deadline Worker seems to hang and cause a spinning wheel of death on each machine.
The length of time before they seize up seems random, not consistent.
Machines then show as stalled in Deadline Monitor.
Machine power settings are all set to always on, no screensaver, etc.
I have a second set of nodes running Monterey on 2019 Mac Pros and do not get any stalls on those.

DaveJacobson84 · August 23, 2023, 3:37pm

Hey Thinkbox, any luck reproducing this issue on your end? We’re currently unable to utilize any of our M1 Macs for Deadline rendering. Can we expect to see Deadline running natively on Apple Silicon any time soon? Thanks!

Dave

briansmith74 · August 28, 2023, 7:21pm

Consistent failures on both M1 and M2.
Tried multiple options: setting cleanup to 3600,
enabling "prevent app nap" for both Monitor and Slave,
system power settings all set to always on, Rosetta installed several times,
csrutil disabled, security settings set to low, Thinkbox and C4D given full disk access.
Yet all 14 x M2 Studio nodes will randomly become stalled, only in the Deadline apps, both Slave and Monitor.
Monitor crashing consistently on M1.
My second farm, which is 15 Mac Pros (2019 Intels), does not have any issue!
This is really unacceptable for a product many businesses rely on.

Justin_B · August 29, 2023, 4:02pm

We’ve not had any luck reliably re-creating this issue to root-cause it; it is as intermittent for you folks as it is for us. We’ve got an engineering issue rolling, but due to Amazon policy we can’t share any details about when it would be resolved or when a release with a fix would happen.

Qt seems to be the common denominator in the spindumps we’ve gotten in this thread and in the thread "Worker randomly crashing on MacOS M2 Ultra". We only use Qt to create the UI, which means running the Worker with -nogui should resolve the issue, but that’s not what we’ve been seeing.

If possible, could we get a spindump from a hang or crash when running the Worker with the -nogui flag? It could be that there are two causes of a hang and crash, and the second only gets to show up with Qt removed from the equation.
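
A small sketch of launching the Worker without its UI for such a test run. The `-nogui` flag is the one mentioned above; the install path is an assumption (adjust it to wherever the Deadline client lives on your machine), and the helper name is made up for illustration.

```python
import os
import subprocess

# ASSUMPTION: install location of the Worker binary; not confirmed in this thread.
ASSUMED_WORKER_PATH = "/Applications/Thinkbox/Deadline10/Resources/deadlineworker"


def build_worker_command(worker_path=ASSUMED_WORKER_PATH, use_gui=False):
    """Return the argument list for starting the Worker (hypothetical helper)."""
    command = [worker_path]
    if not use_gui:
        command.append("-nogui")  # skip the Qt-based UI
    return command


if __name__ == "__main__":
    command = build_worker_command()
    if os.path.exists(command[0]):
        # Start the Worker in the background; capture a spindump once it hangs.
        process = subprocess.Popen(command)
        print("Started Worker with PID", process.pid)
    else:
        print("Worker binary not found at assumed path:", command[0])
```

The printed PID can be handed to spindump so the sample targets exactly this headless Worker instance.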

Thanks!

DaveJacobson84 · August 29, 2023, 4:27pm

Hey Justin, thanks so much for the response. I’ve just sent you a DM with a -nogui spindump. I had previously attempted running it with the -nogui flag, but it still stalls/hangs. It also might be worth mentioning I’m having no issues whatsoever running the Monitor on my M1 systems, it’s just the Worker that’s problematic. Thanks!