Hey,
recently we ran into a problem with dependent jobs not rendering even though the parent job is complete. Where do I start with debugging? Unfortunately there are no logs to start with. Any help is highly appreciated!
-Christian
Deadline: 6.1.0.52433
Plugins: Nuke, Pythonscript
Hey Christian,
I think this is related to the behavior you’re seeing here:
viewtopic.php?f=86&t=10166
First question, are you guys currently running Pulse? Resuming dependent jobs and deleting jobs that are complete are part of Deadline’s housecleaning process. This is done at regular intervals by Pulse, or by the Slaves at random times in between tasks.
Cheers,
Hi Ryan,
thank you for your answer. Sorry, I wasn’t aware of Pulse. After restarting Pulse everything looked good, but after a couple of minutes it crashed and showed a fatal error. I will send you the log by email.
-Christian
Thank you, Ryan! You put me on the right track. The Pulse server was outdated and crashing. After updating it, everything seems to be back to normal.
-Christian
Bad news, Pulse crashed again. I will send the log by email.
-Christian
Hey Christian,
We got the logs, and we’re trying to reproduce this crash here on our Linux VMs (one running Fedora 15 x64, and one running Ubuntu 12.04 x64), but we’re not having any luck. It appears that the crash happens while running the job dependency scripts, so we’re running a bunch of tests with scripts that succeed and scripts that error. We’ll keep them running throughout the day to see if they eventually crash.
So a few initial questions:
- Which Linux OS (flavor and version) are you running Pulse on?
- Which version of Mono do you have installed on the Linux machine? You can check from a terminal by running ‘mono -V’.
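If it helps to collect this across several machines, here is a minimal Python sketch that reads the version from the first line of the ‘mono -V’ output. The function names are made up for illustration, and the parsing assumes the output contains the word "version" followed by a dotted number; it is not part of Deadline.

```python
import re
import shutil
import subprocess


def parse_mono_version(output):
    """Extract the dotted version number from `mono -V` style output.

    Returns a tuple of ints such as (2, 10, 9), or None if no version is found.
    """
    match = re.search(r"version\s+(\d+(?:\.\d+)+)", output)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def installed_mono_version():
    """Run `mono -V` if mono is on the PATH and return the parsed version."""
    if shutil.which("mono") is None:
        return None  # mono is not installed or not on the PATH
    result = subprocess.run(["mono", "-V"], capture_output=True, text=True)
    return parse_mono_version(result.stdout)


if __name__ == "__main__":
    print(installed_mono_version())
```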
Thanks!
Hey Ryan,
thank you for your response! These are our system specifications:
- CentOS 6.3 x64
- Mono JIT compiler version 2.10.9
I’ll send you the script that checks the dependencies by email.
-Christian
Today I restarted Pulse and it has been running fine for two hours now. Is it possible that too many jobs were in the queue?
-Christian
This morning Pulse was down. It seems like it crashed shortly after I wrote the comment yesterday. I restarted it, but it crashed again within a couple of minutes.
-Christian
Hey Christian,
Just a heads up that we released beta 4 today, and it includes a change where the Deadline house cleaning operations (which include checking dependencies) are done in a separate process. We’re not quite sure why Pulse keeps crashing for you, as we were unable to reproduce it on two different Linux VMs, but this change should help you guys out since a crash should no longer bring down the main Pulse application. We’re going to try to set up a VM running the same specs you guys are to see if we can reproduce it there.
Based on the logs you’ve sent us, it appears the crash occurs when running the Python scripts for job dependencies, but it seems to be completely random. So with the change I mentioned above, if a Python script causes a random crash, it shouldn’t be that big of a deal because the next house cleaning operation will probably succeed without crashing.
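To illustrate the general idea of isolating a crash-prone step in its own process, here is a rough Python sketch. This is not how Pulse is implemented; `run_isolated` is a hypothetical helper, and the snippets passed to it stand in for a house cleaning pass. The point is only that a child process can die without taking the parent down with it.

```python
import subprocess
import sys


def run_isolated(code, timeout=60):
    """Run a snippet of Python in a separate interpreter process.

    Returns the child's exit code, or None if it did not finish within
    `timeout` seconds. A crash in the child does not affect the caller.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return None  # the child was killed after the timeout
    return result.returncode


if __name__ == "__main__":
    # Hypothetical stand-in for a pass that dies unexpectedly.
    exit_code = run_isolated("import os; os._exit(1)")
    if exit_code != 0:
        print("pass failed, will retry on the next interval:", exit_code)
```

The parent only looks at the exit code, so a failed pass can simply be retried at the next interval.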
So after upgrading to beta 4, let us know if you see any stability improvements with Pulse.
Cheers,
Hey Ryan,
thank you for the update. Can I update only the Pulse machine, or do I have to update the whole farm?
Let me know in case you need more information from our side.
-Christian
I would recommend updating the whole farm (repository, pulse machine, slaves, and workstations).
Cheers,
Hey Ryan,
today I updated our Deadline render farm. So far everything seems to be fine. I’ll let you know how Pulse is doing. Thank you for your support!
-Christian
Hello Ryan,
it seems like the update fixed the problem with Pulse crashing. It has been up since Friday and I cannot see any problems. Thank you for your support!
-Christian
That’s great! Thanks for the update.
Cheers,