Deadline Monitor 10.4.0.8 Repeated Random Crashes - RHEL 9.4

Hello everyone!

I’m having frequent random crashes with Deadline Monitor 10.4.0.8 on RHEL 9.4.
The Monitor repeatedly disappears, making it very difficult to use.
The Monitor log files don’t seem to have anything referring to an error or crash.

Is anyone else experiencing this?

I noticed that someone else also mentioned this and posted this error:

malloc(): unaligned tcache chunk detected
Aborted (core dumped)

I appreciate any help with this issue!
Thank you!

I also have this issue running Rocky 9.4. I thought it had to do with a central install, but I get the same issue running it locally. There was a known issue with glibc on RHEL 9.4 specifically, so this is possibly related. (This has been fixed in newer versions of most DCCs.)

https://issues.redhat.com/browse/RHEL-39415

I’m also running everything locally. Thanks for sharing the RHEL glibc issue. I’ll keep an eye on that. I use Maya with RenderMan and everything seems to be working fine in 9.4. Hopefully someone from the Thinkbox staff will chime in with some suggestions.

I have this exact issue with the Deadline Monitor 10.4.0.12 on Rocky 9.3

Is there anything one can do to help sort this out?
It’s getting really challenging to work like this.

tia.

So I did a central install on our application server…
These versions of Deadline are built with Python 3.10,
while Rocky 9.3 was deployed with Python 3.9.

It seems the Deadline Monitor needs to be pointed to its own libraries; when it is allowed to use the Rocky system libraries, it leaks memory until it hits a hard segfault. This was causing about 5 crashes an hour for me.

I tried adding a line to the Monitor’s application wrapper so that the Python/Qt libraries the deadlinemonitor depends on at launch are specified explicitly. This seemed to help a bit, but it still crashed within the hour for me.

# Edit ${DEADLINE_PATH}/deadlinemonitor
# prioritize Deadline's python3.10/QtLibs over Rocky's system python3.9/Qt
LD_LIBRARY_PATH="${DEADLINE_INSTALL_PATH}/${DEADLINE_VERSION}/lib/python3/lib"
#echo "${LD_LIBRARY_PATH}"
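
For anyone trying the same approach, here is a minimal sketch of how the wrapper change could prepend to the library path (rather than replace it) and then check what a running Monitor actually loaded. The helper names `prepend_path` and `libs_from_maps` are hypothetical. `DEADLINE_INSTALL_PATH` and `DEADLINE_VERSION` are taken from the snippet above, and it is an assumption that the wrapper defines them before this point.

```shell
#!/usr/bin/env bash

# Hypothetical helper: put new_dir in front of an existing colon-separated
# list without leaving an empty entry (ld.so treats an empty entry as the
# current directory).
prepend_path() {
    local new_dir="$1" current="${2:-}"
    if [ -z "$current" ]; then
        printf '%s\n' "$new_dir"
    else
        printf '%s:%s\n' "$new_dir" "$current"
    fi
}

# Hypothetical helper: list the unique shared objects named in a
# /proc/<pid>/maps style file (the sixth column is the mapped path).
libs_from_maps() {
    awk 'NF >= 6 && $6 ~ /\.so/ { print $6 }' "$1" | sort -u
}

# Only touch the environment when both variables are set (assumption:
# the wrapper has already defined them).
if [ -n "${DEADLINE_INSTALL_PATH:-}" ] && [ -n "${DEADLINE_VERSION:-}" ]; then
    deadline_lib_dir="${DEADLINE_INSTALL_PATH}/${DEADLINE_VERSION}/lib/python3/lib"
    LD_LIBRARY_PATH="$(prepend_path "$deadline_lib_dir" "${LD_LIBRARY_PATH:-}")"
    export LD_LIBRARY_PATH
fi
```

With the Monitor running, something like `libs_from_maps /proc/<pid>/maps | grep -iE 'python|qt'` should show whether the Python and Qt libraries come from the Deadline install or from the system paths.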

We are encountering a similar issue with Rocky 9.5 and Monitor 10.4.1.8: seemingly random crashes.

It usually crashes with the message Segmentation Fault (core dumped), and sometimes with:

malloc(): unaligned tcache chunk detected
Aborted (core dumped)

Then I tried running it through gdb and got this:

...
Thread 15 ".NET TP Worker" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fbeaa7fc640 (LWP 330944)]
0x00007ffff78999f9 in malloc () from /lib64/libc.so.6
(gdb) c
Continuing.

Thread 15 ".NET TP Worker" received signal SIGSEGV, Segmentation fault.
0x00007ffff78999f9 in malloc () from /lib64/libc.so.6
...

Any thoughts would be greatly appreciated.
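
If it helps others reproduce this, below is a rough sketch of capturing a backtrace of every thread non-interactively, so the output can be attached to a post or a support ticket. The function name `capture_backtrace` is hypothetical; it only relies on gdb's standard `-batch`, `-ex`, and `--args` options.

```shell
#!/usr/bin/env bash

# Hypothetical helper: run a command under gdb in batch mode and save a
# backtrace of every thread at the first stop (for example the SIGSEGV).
# Usage: capture_backtrace LOG_FILE COMMAND [ARGS...]
capture_backtrace() {
    local log_file="$1"
    shift
    gdb -batch \
        -ex 'run' \
        -ex 'thread apply all bt' \
        --args "$@" > "$log_file" 2>&1
}
```

The command to pass is the real Monitor executable, which may differ from the wrapper script on your install. Frames from managed .NET code will likely show up without symbol names, and the thread that crashes inside malloc() is not necessarily the one that corrupted the heap, so treat the trace as a hint rather than a diagnosis.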

Also still having this issue on 9.4/5/6.

Saw this post

and wondering if the Monitor issue is also related to housekeeping.

Can confirm: disabling housekeeping and all other maintenance tasks has gotten our Pulse stable enough for production.

We still have the Web Service crashing every 1-2 days, but we have not been able to find the time to dig into the root of that issue.