I’m having a hard time getting our second (!) farm to run smoothly. We already have one Deadline setup that I’m fairly happy with, but for confidentiality reasons I need to set up another one in an air-gapped network.
Anyway. The farm I’m setting up right now is a mixed farm: the server and the workstations run macOS, and our render blades run Windows. I frequently get an error stating “Error loading project” whenever I submit a new job. The error happens mostly on the Windows-based machines, but from time to time the Macs also report it. It almost feels as if the machines try to load the job at the start but do not succeed on their first try; after two or three attempts they manage to load the project and start rendering.
The log files are not very detailed on this issue, so I’m having a hard time figuring out what the reason could be. I’m wondering if it is related to permission issues or network performance. Has anybody had similar experiences and found a way to fix this?
Any suggestions or tips are very much appreciated. Thanks!
P.S.: Anybody happy to share their experience with mixed-OS render farms and Cinema4D? I already ran into a first scene that rendered slightly differently on Windows and Mac (speculars were strangely brighter).
Thanks for the reply. Yes, I am following the general cross-platform recommendations.
Apparently the “Error loading project” is something I also get on our other farm from time to time, but nowhere near as frequently as on the new one.
See below the job report (I get the warning about the ‘unknown baselist allocator’ even on render logs, so that shouldn’t be anything critical I guess):
A fairly typical sequence after submitting a job looks something like this. Quite a large number of error messages, and then after a while the clients actually start rendering.
Yes, all the plugins are installed on all slaves. We do get results after some time and the frames are fine, it’s just that every time the slaves try to start rendering they produce a lot of errors before they can actually load the project.
Oh the dreaded C4D ‘error loading project’. This happens for all kinds of reasons that I’m not overly familiar with.
One thing that we do to make troubleshooting easier is to isolate the problem away from Deadline. We do that by pulling the info from the ‘executable’ line and the ‘argument’ line to make this:
Try running that from a command prompt on Beast02 and see if it generates the same error. It’ll be easier to troubleshoot from there (and then Maxon might be willing to lend us a hand too!).
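Since the error is intermittent, it can help to run the same command several times in a row and count how often it fails. Below is a minimal Python sketch of that idea. The helper name `run_repeatedly` is made up for illustration, and the command itself is whatever you assembled from the ‘executable’ and ‘argument’ lines of your own job report; nothing here is part of Deadline or Cinema 4D.

```python
import subprocess

# Text that shows up in the render output when the scene fails to load.
ERROR_MARKER = "Error loading project"


def run_repeatedly(command, attempts=5, timeout=None):
    """Run `command` several times and note which attempts hit the error.

    `command` is a list: the executable followed by its arguments, copied
    from the job report. Returns a list of
    (attempt_number, exit_code, saw_error_marker) tuples.
    """
    results = []
    for attempt in range(1, attempts + 1):
        proc = subprocess.run(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr into stdout
            text=True,
            errors="replace",          # tolerate odd console encodings
            timeout=timeout,
        )
        saw_error = ERROR_MARKER in proc.stdout
        results.append((attempt, proc.returncode, saw_error))
    return results
```

Calling `run_repeatedly([...your executable and arguments...])` on one slave and printing the result should show whether it is always the first attempt that fails or whether the failures are spread randomly, which is useful detail to pass on to Maxon.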
We repropagated all the permissions on the server to make sure the users have access to all files. Unfortunately that didn’t fix it, and we are still getting the same issues. Ultimately the jobs are rendering, but every job racks up more than a hundred errors on its way. And in between, not all slaves are actually rendering…
So I did the troubleshooting Edwin suggested, submitting a render job on one of our slaves directly through the Command Prompt, and it turned out I got an “Error loading project” as well. Usually the first time I try to send the job, C4D gives me that error. If I try it again it starts rendering… which means Deadline is probably not causing it, but rather C4D or our server.
I still think this might be related to some kind of plugin problem. I’ve seen this error before, “Unknown baselist allocator - RegID: 7”, but I don’t remember which plugin caused it. What plugins is the scene using? Have you tried rendering without any plugins from the command line? Or activating plugins one at a time?
I’m not sure how responsive Maxon’s tech support is, but they might have an idea too. Short of a re-install (unlikely to help), I’m not sure what else to try.
If we can figure this out I’ll log it in our internal knowledge base app.
I removed all the plugins from our Commandlines and submitted a job with the Standard renderer, but I am still getting the errors. So plugins should not be the issue.
Then we started to wonder if it might be connected with the network performance of the second network, so I switched several of the PC slaves to our primary network (where the render farm is working totally fine) and ran into the same errors. So the network should not be the issue either.
The most common error is the “Error loading project”, and in between an “Access to the path ‘Z:_DEADLINE\REPOSITORY7\plugins\Cinema4D\Cinema4D.ico’ is denied.” error pops up.
Gonna get in touch with MAXON now and see if they have any thoughts on this.
MAXON didn’t have much to say about the issue, only that they are not aware of any issues like that with Commandline.
Thinking that it might have to do with network congestion and similar issues, we replaced our switch with a new, much faster one, rewired all of our clients and went through permissions again. That still did not fix the issue. Network traffic to and from the server is only somewhere between 30-50 MB/s at peak, so nothing that should be tremendously worrying.
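To put the 30-50 MB/s figure in perspective, here is a small back-of-the-envelope calculation in Python. The 1 Gbit/s link speed is purely an assumption for illustration, since the actual uplink speed of the server is not stated here; the function name is also made up.

```python
def link_utilization(megabytes_per_s, link_gbit_per_s):
    """Return the fraction of a link used by a given throughput.

    Converts MB/s to Mbit/s (x8) and divides by the link capacity
    in Mbit/s. Protocol overhead is ignored.
    """
    megabits_per_s = megabytes_per_s * 8
    return megabits_per_s / (link_gbit_per_s * 1000)


# ASSUMPTION: a single 1 Gbit/s uplink; substitute the real link speed.
for rate in (30, 50):
    print(f"{rate} MB/s on 1 Gbit/s: {link_utilization(rate, 1):.0%}")
# prints 24% and 40%
```

Keep in mind that an averaged peak like this can hide very short bursts, so a comfortable-looking number does not by itself rule out congestion.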
The only other option I can think of right now is to reinstall the whole farm, but I’m not very confident that will fix it either. I’m at the point that I’m starting to believe it’s not possible to get a smooth connection between a Windows slave and a Mac server…
Does Thinkbox have any consultants that we could get in for a day to have a look at our setup on location? Or does anybody know an expert (preferably in the London area) who does studio visits?
Man, I feel your pain. What happens if you render manually from a PC to another location that is not your server? A Mac with file sharing or some other temp location. Or try to render locally maybe. Do you get the same error then?
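One way to compare locations without involving C4D at all is to simply read the scene file a few times from each place and see whether the reads fail or stall. A rough Python sketch of that, with a made-up helper name `probe_read`; the paths you pass in are your own (server share, another Mac’s share, a local copy):

```python
import time


def probe_read(path, attempts=5, chunk_size=1024 * 1024):
    """Read `path` in full several times.

    Returns one tuple per attempt:
      ("ok", bytes_read, seconds)  or  ("error", error_text, seconds)
    """
    results = []
    for _ in range(attempts):
        start = time.perf_counter()
        try:
            total = 0
            with open(path, "rb") as handle:
                while True:
                    chunk = handle.read(chunk_size)
                    if not chunk:
                        break
                    total += len(chunk)
            results.append(("ok", total, time.perf_counter() - start))
        except OSError as exc:
            # Covers missing files, permission errors and network failures.
            results.append(("error", repr(exc), time.perf_counter() - start))
    return results
```

If the first read from the server share errors out or takes far longer than the later ones while the local copy is always fine, that points at the share rather than at C4D. Note that the OS may cache the file after the first read, so later timings can look better than they really are.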
Just random shots in the dark.
Any updates or workarounds to lessen the errors or solve this problem? We are running into the same issue. PCs will eventually finish rendering after a few hundred to a few thousand errors. We are getting either “Error loading project” or “asset errors” with texture files. It is very hard to isolate the issue when the PCs seem to error and then randomly load the project and textures with no errors and render.
I think the primary solution here is to resolve network congestion, as I believe a lot of these intermittent errors with accessing network-located assets boil down to this.
I wonder, does your switching hardware show performance graphs, so you can try to isolate whether this is in fact a performance problem there? Bonsak’s suggestion is also a good one; did you have any luck with that?
Funny. After switching to a new server (Windows) that is over 10 times faster than the old one, plus a new 10GB switch, we’re getting a lot of these errors when jobs are starting. Both OS X and Windows slaves report this error, even on small projects with only 10-20 MB of textures in total. We only saw these kinds of errors on the old server when the network was being hammered completely. And that was with much bigger projects, with texture loads in the range of 200-300 MB.
I still think it’s a data delivery performance problem. Maybe the backbone of your switch is overwhelmed and it’s dropping packets? I mentioned before in the thread, but most high-end network equipment will have graphs. Here’s ours for the OpenStack farm:
You’ll be able to see the utilization here. It’s fairly rare that you won’t have this available. This HP 2920-48G we have here also shows system-wide problems like dropped packet counts and the like on a per-port basis.
Hi
Thanks. We don’t have any errors or dropped packets on either switch, but I see now that there are a lot of “inDiscards” on the 6x10GB trunk between the server and the HP FF 5700 switch. Whatever that means. I’ll have to look into that.