Before I detail the specifics of the problem, here’s a basic overview of our setup:
Storage - Xsan volume reshared over SMB via a Mac OS X 10.6 server
Clients - Windows 7 64-bit Pro
Nodes - Windows 7 64-bit Pro
Softimage 2012
Deadline 5.1
Instead of using mapped drives (e.g. Z:), we use UNC paths (\\server\share) for both nodes and workstations.
All machines are authenticated to that server with the same credentials, and for good measure I have chmod'd the job folder in question.
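For reference, the chmod was just a blanket loosening of the whole job folder from the server side, the equivalent of something like this sketch (the path below is a placeholder):

```python
import os

JOB_ROOT = "/Volumes/xsan/jobs/current_job"  # placeholder path on the Xsan volume

# Blanket rwxrwxrwx over the job folder and everything in it,
# i.e. the Python equivalent of chmod -R 777
for dirpath, dirnames, filenames in os.walk(JOB_ROOT):
    os.chmod(dirpath, 0o777)
    for name in filenames:
        os.chmod(os.path.join(dirpath, name), 0o777)
```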
I'll also preface this by saying that pretty much all of our Maya and Nuke jobs go through fine. There are occasions where Maya drops textures, mostly TIFFs; it's not too much of a problem, though we'd like to sort it out at some point.
This morning we tried to submit a scene. All our nodes picked the job up, started a task, and loaded xsibatch and the scene, but failed reading some textures: it complained that the texture wasn't in the specified place (it is). Some nodes rendered their tasks fine, but the majority errored (ERROR: 2000) and re-queued themselves.
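To be clear about the "it is": from a failing node, the textures check out as present and readable over the exact UNC paths the scene uses. A quick Python check along these lines is all it takes to confirm (the paths below are placeholders):

```python
import os

# Placeholder texture paths, exactly as the scene references them (UNC, no mapped drive)
TEXTURES = [
    r"\\server\share\current_job\textures\wall_diffuse.tif",
    r"\\server\share\current_job\textures\wall_bump.tif",
]

for path in TEXTURES:
    if not os.path.exists(path):
        print("MISSING  %s" % path)
        continue
    try:
        with open(path, "rb") as f:
            f.read(1024)  # read the first KB to prove it's readable, not just listed
        print("OK       %s" % path)
    except IOError as e:
        print("NO READ  %s (%s)" % (path, e))
```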
After verifying that the textures were there and had the correct permissions, I set error checking to false in the repository's plugins section. The tasks again submitted fine and didn't error, but dropped loads of textures. A given texture would be fine in some frames and missing in others; nothing consistent.
We fiddled with loads of settings in Softimage and Deadline (image handling type, scripting language prefs, the submit scene option in the Deadline submission dialog), but none of them made any difference. I've also tried submitting the scene via Deadline Monitor; same deal.
We merged the Softimage scene into a new project; same thing. We also checked against the maximum path length for Windows, and we weren't even close to it.
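By max path length I mean the classic 260-character Windows MAX_PATH limit; a sweep of the job folder along these lines is how to rule it out (the root path below is a placeholder):

```python
import os

MAX_PATH = 260  # classic Win32 limit, includes the \\server\share prefix and filename
JOB_ROOT = r"\\server\share\current_job"  # placeholder

# Walk the whole job folder and report the longest full path found
longest = ""
for dirpath, dirnames, filenames in os.walk(JOB_ROOT):
    for name in filenames:
        full = os.path.join(dirpath, name)
        if len(full) > len(longest):
            longest = full

print("longest path is %d chars:" % len(longest))
print(longest)
```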
As a separate test, I submitted the scene to the nodes using the xsibatch command line. This seemed a bit more stable, but it still dropped frames often.
We also created a new scene in a new project, with all the same textures applied to separate cubes; again it dropped textures.
We submitted the scene as suspended, closed Softimage to get rid of the lock file, and resumed the job in Deadline; no difference.
As a final test, I submitted the job but limited the number of machines it could render on to two. Although slow, it didn't drop any frames at all. We thought we had hit on the problem, perhaps some weird network glitch, but remember that we can submit Maya and Nuke jobs to all 35 nodes without issue.
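On the off chance it really is load-related (even though Maya and Nuke cope with all 35 nodes), here's a rough sketch that, run from several machines at once, approximates a farm's worth of concurrent texture reads against the share; the paths and counts below are placeholders:

```python
import random
from multiprocessing import Pool

# Placeholder UNC texture paths, same style as the scene references
TEXTURES = [
    r"\\server\share\current_job\textures\wall_diffuse.tif",
    r"\\server\share\current_job\textures\wall_bump.tif",
]
WORKERS = 16  # rough stand-in for many render nodes pulling textures at once
READS = 200   # total reads per run

def read_one(i):
    # Each read picks a random texture and pulls the whole file,
    # the way a renderer would at scene load
    path = random.choice(TEXTURES)
    try:
        with open(path, "rb") as f:
            data = f.read()
        return (path, len(data), None)
    except IOError as e:
        return (path, 0, str(e))

if __name__ == "__main__":
    pool = Pool(WORKERS)
    for path, size, err in pool.map(read_one, range(READS)):
        if err:
            print("FAIL %s: %s" % (path, err))
```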
I'm sure there are other things we've tried that I can't remember, but at this point I'm kind of out of ideas, so any suggestions would be most welcome!