hashtable / dlinit getting deleted

I have just two slaves for now, but every once in a while the .dlinit, .options, etc. files get deleted from the user’s home folder.

My setup:

  • Slaves running Linux, PXE booting from a common NFS share.
  • Unfortunately I can’t seem to reproduce the problem; it happens at random as far as I can tell.
  • I can fix it by copying the needed files from the repository (see the sketch below), but the files get deleted again after some time.
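In case it helps, the manual fix is roughly this (just a sketch of what I do by hand; the repository mount point is an assumption based on my setup, and the local path is the one from the error below):

[code]# Rough sketch of my manual workaround -- not how Deadline itself syncs plugins.
# REPO_PLUGIN_DIR is an assumption; adjust it to wherever your repository is mounted.
import glob
import os
import shutil

REPO_PLUGIN_DIR = "/mnt/repository/plugins/Maxwell"           # assumed mount point
LOCAL_PLUGIN_DIR = "/home/node/Deadline/slave/Node2/plugins"   # path from the error

if not os.path.isdir(LOCAL_PLUGIN_DIR):
    os.makedirs(LOCAL_PLUGIN_DIR)

# Copy the plugin files the slave complains about (.dlinit, .py, .param, .options).
for pattern in ("*.dlinit", "*.py", "*.param", "*.options"):
    for src in glob.glob(os.path.join(REPO_PLUGIN_DIR, pattern)):
        shutil.copy(src, LOCAL_PLUGIN_DIR)
        print("copied " + src)
[/code]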

Here’s the error I’m getting from Deadline:

[code]=======================================================
Error Message

could not read hashtable file

=======================================================
Slave Log

0: Loaded plugin: Maxwell
0: Task timeout is disabled.
0: Loaded job: ginger render pass (999_040_999_2c2ea594)
0: INFO: StartJob: initializing script plugin Maxwell
0: An exception occurred: Could not open plugin configuration "/home/node/Deadline/slave/Node2/plugins/Maxwell.dlinit" because could not read hashtable file (Deadline.Plugins.RenderPluginException)

=======================================================
Error Type

FileNotFoundException

=======================================================
Error Stack Trace

at FranticX.Text.HashtableReader.FromTextFile (System.String fileName, Boolean acceptEmptyValue, Boolean acceptControlCharacters) [0x00000] in <filename unknown>:0
at FranticX.Text.HashtableReader.FromTextFile (System.String fileName, Boolean acceptEmptyValue) [0x00000] in <filename unknown>:0
at Deadline.Plugins.ScriptPlugin.StartJob (Deadline.Jobs.Job job) [0x00000] in <filename unknown>:0
[/code]

Why would these files be deleted from the slaves? Maybe a conflict with both machines running from the same root (the NFS share)?

thanks,

  • Eric

I think you’re running into a known issue when netbooting with Deadline 5.1 and 5.2:
thinkboxsoftware.com/deadlin … ple_Slaves

The problem is that all the slaves launch with the same name, and then they can clobber each other’s temporary job and plugin files. Disabling the multiple slaves feature should fix that problem.
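If you want to double-check a particular machine, something like this quick script will show what that box actually has for the setting (just a sketch; the deadline.ini locations below are guesses, since the file can sit in the install folder or in the user’s home folder depending on the setup):

[code]# Print the MultipleSlavesEnabled setting from whichever deadline.ini this
# machine has. The candidate paths are guesses -- adjust for your install.
import os

candidates = [
    os.path.expanduser("~/deadline.ini"),
    os.path.expanduser("~/Deadline/deadline.ini"),
    "/usr/local/Thinkbox/Deadline/deadline.ini",
]

for path in candidates:
    if not os.path.isfile(path):
        continue
    f = open(path)
    for line in f:
        if line.strip().lower().startswith("multipleslavesenabled"):
            print(path + ": " + line.strip())
    f.close()
[/code]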

Cheers,

  • Ryan

Thanks, Ryan - it turns out I had only disabled that on my server, which of course never does any rendering. :)

Now I’ve got it set where it matters, so hopefully the problem is solved.

  • Eric

Looks like I declared success too early; I’m still having the same problem. But I think I can now describe what’s happening in more detail:

  • I manually copy Maxwell.dlinit and the other plugin files from the repository’s plugins folder to the node’s local plugins directory (for example /home/node/Deadline/slave/Node2/plugins).
  • Run multiple Maxwell jobs just fine.
  • Run a CommandLine job (which runs ImageMagick once); afterwards the /home/node/Deadline/slave/Node2/plugins directory contains CommandLine.dlinit but no Maxwell.dlinit.
  • Try to run Maxwell again and it fails with the error from my first post.

I do have the "MultipleSlavesEnabled=False" line in my deadline.ini file.

Any ideas? Maybe a file permissions problem?

Why do you manually copy Maxwell.dlinit? Deadline should be doing that automatically when it picks up a Maxwell job. Also, it’s normal operation for Deadline to purge the jobsData and plugins folders in the slave’s local folder between jobs, so you can’t expect anything to live in those folders long term.
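To illustrate (this is only a mock of the behaviour I’m describing, not Deadline’s actual code), between jobs the slave effectively does something like the following, which is why a hand-copied Maxwell.dlinit disappears as soon as a job for a different plugin runs:

[code]# Mock of the purge-and-sync behaviour described above -- illustration only,
# not Deadline's implementation.
import os
import shutil

def prepare_plugin_dir(local_plugins, repo_plugins, plugin_name):
    # 1. Purge whatever the previous job left in the slave's local plugins folder.
    if os.path.isdir(local_plugins):
        shutil.rmtree(local_plugins)
    # 2. Copy in only the plugin needed for the job that is about to run.
    shutil.copytree(os.path.join(repo_plugins, plugin_name), local_plugins)

# After a CommandLine job, only CommandLine's files are present locally, so a
# hand-copied Maxwell.dlinit is gone by the time the next Maxwell job starts.
[/code]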

I was only doing that as a quick fix to get Maxwell rendering: after I read the error message I just put the files in the directory where it was looking, and that worked for an emergency run.

I ended up copying the deadline.ini file and putting it in the user’s home folder instead of the Deadline install folder, and that seems to be working properly now.

Sorry for so many false alarms, but it’s back to not working.

I’m guessing it happens when Deadline tries to copy the Maxwell files to that temp directory. Maybe I messed up something in the repository setup without knowing it? I have made customizations to the maxwell.py and .param files in the past, though the plugin continues to work fine on my Windows workstation.

What’s currently in \your\repository\plugins\Maxwell? When the slave starts a Maxwell job, it should be copying the files from that folder to its local plugins folder. If that’s failing, maybe it’s a permission thing…
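One quick way to rule permissions out is to run something like this on a slave, as the user the slave runs as (just a sketch; point it at wherever your repository is actually mounted):

[code]# Check that the repository's Maxwell plugin folder is readable from this slave.
# The path is an assumption -- use your actual repository mount point.
import os

repo_maxwell = "/mnt/repository/plugins/Maxwell"

print("exists:   " + str(os.path.isdir(repo_maxwell)))
print("readable: " + str(os.access(repo_maxwell, os.R_OK)))
if os.path.isdir(repo_maxwell):
    for name in sorted(os.listdir(repo_maxwell)):
        full = os.path.join(repo_maxwell, name)
        status = "readable" if os.access(full, os.R_OK) else "NOT readable"
        print("  " + name + ": " + status)
[/code]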

I went in and monkeyed with some of the share settings on my server; turning off ‘unix extensions’ for the Samba share seemed to make it work. Thanks for the reminder about permissions. It’s been running for a few days now without issue, so I think we’ve fixed it.
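For anyone who hits the same thing, the change boils down to this in smb.conf on the server (a sketch of what I changed; note that ‘unix extensions’ is a global Samba setting rather than a per-share one):

[code]# smb.conf on the server -- 'unix extensions' lives in the [global] section.
[global]
    unix extensions = no
[/code]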

thanks,

  • Eric