We are having some strange Deadline behavior. We created a new Nuke plugin, which we called custom_nuke, and on some occasions rendering with it throws the error “could not read hashtable”.
The plugin was copied from the original Nuke plugin, and we added some functionality (local rendering, to take some load off our file servers).
It works about 80% of the time, but sometimes it throws this error, and we would like to get rid of it.
I saw an earlier post in the forum about it but unfortunately it didn’t help.
Thank you.
Also, is this error completely random? For example, will it happen on a task, and then the next time the task gets picked up, it renders fine?
Is it possible this error is limited to one or two slaves? You can check the slave name in the error reports to determine what machine the error occurred on.
[code]at FranticX.Text.HashtableReader.FromTextFile(String fileName, Boolean acceptEmptyValue, Boolean acceptControlCharacters)
at FranticX.Text.HashtableReader.FromTextFile(String fileName, Boolean acceptEmptyValue)
at Deadline.Plugins.ScriptPlugin.StartJob(Job job)[/code]
I checked again, and it is not happening on the same slaves. The slaves render fine with the same plugin on other jobs.
Regarding the tasks: it seems to be job-specific. Slave A fails to render Tasks A and B of Job A because of the error, while Slave B renders them just fine. Five minutes later, Slave A is able to render Job B.
Unfortunately, I can’t tell you whether a task fails on the first attempt and renders on the second, because no slave ever picked up the same task twice.
Do you have a file or a folder in the custom_nuke plugin directory in the repository that starts with a period? We have seen an error like this when there was a .svn folder in the plugin’s folder.
Also, which OS is the repository installed on, and how many slaves do you have connecting to it?
We do not have .svn folders in our repository or any file that starts with a period.
The repository is on Windows Server 2008. All slaves are on Windows 7. We connect with around 80-130 slaves, depending on the render load we need to handle.
Sorry for the late answer. We are in the middle of moving offices, and it is a little bit hectic around here at the moment.
The error is slowly disappearing; every other day there is just one job with only a few errors.
Right now it seems to happen only when a job is submitted from one specific machine. Could it have something to do with the user profile someone is logged in with?
Thank you again for your help.
I’m glad to hear the problem is disappearing, but it’s really strange that only jobs submitted from a particular machine would be affected.
If you want to debug further, you could add the following line to your plugin’s dlinit file:
[code]DebugLogging=true[/code]
This will result in a lot of extra log messages during the plugin loading phase. Then when it happens again, you could send us the full error report and we can take a look.
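For reference, the dlinit file is just a plain key=value text file that gets parsed when the plugin loads (and likely one of the files the HashtableReader in your stack trace is reading). A minimal sketch of what custom_nuke.dlinit might look like with the flag added; the other keys here are only placeholders, so keep whatever your file already contains:
[code]About=Nuke plugin with local rendering
ConcurrentTasks=True
DebugLogging=true[/code]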
I integrated the DebugLogging flag. It works fine on jobs that don’t have the error, but on jobs with the error my error log is still as minimal as it can be:
[code]=======================================================
Error Message
=======================================================
could not read hashtable file
=======================================================
Slave Log
=======================================================
0: Task timeout is disabled.
0: Loaded job: x.nk (001_050_003_2afd34fd)
0: Successfully mapped M: to \\mxxxx\xxxxx
0: Successfully mapped N: to \\nxxxx\xxxxx
0: Successfully mapped O: to \\oxxxx\xxxxx
0: Successfully mapped P: to \\pxxxx\xxxxx
0: Successfully mapped Q: to \\qxxxx\xxxxx
0: Successfully mapped R: to \\rxxxx\xxxxx
0: INFO: StartJob: initializing script plugin custom_nuke
=======================================================
Error Type
=======================================================
FileNotFoundException
=======================================================
Error Stack Trace
=======================================================
at FranticX.Text.HashtableReader.FromTextFile(String fileName, Boolean acceptEmptyValue, Boolean acceptControlCharacters)
at FranticX.Text.HashtableReader.FromTextFile(String fileName, Boolean acceptEmptyValue)
at Deadline.Plugins.ScriptPlugin.StartJob(Job job)[/code]
I think I previously forgot to mention some other details. We submit from the default “Nuke” submission script, but we have an OnJobSubmitted event plugin that changes the plugin to “custom_nuke”. This lets us switch to the custom_nuke plugin quickly without having to replace the default Nuke submitter. Perhaps this is causing some of the trouble?
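Roughly, the event plugin does something like the sketch below (simplified, assuming the DeadlineEventListener API; the class name is made up, our real script has a bit more logic, and whether JobPlugin can be reassigned like this may depend on the Deadline version):
[code]from Deadline.Events import DeadlineEventListener
from Deadline.Scripting import RepositoryUtils

def GetDeadlineEventListener():
    return NukeToCustomNuke()

def CleanupDeadlineEventListener(eventListener):
    eventListener.Cleanup()

class NukeToCustomNuke(DeadlineEventListener):
    def __init__(self):
        # Hook the job-submitted event.
        self.OnJobSubmittedCallback += self.OnJobSubmitted

    def Cleanup(self):
        del self.OnJobSubmittedCallback

    def OnJobSubmitted(self, job):
        # Retarget jobs that came in with the stock Nuke plugin.
        if job.JobPlugin == "Nuke":
            job.JobPlugin = "custom_nuke"
            RepositoryUtils.SaveJob(job)[/code]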
Sorry for the long gap between my last reply and this one; we were moving offices and there was a lot of other stuff to get done. I just want you to know that I really appreciate the help. Thank you.
Actually, that could very well be the problem. As a test, could you try just changing the Nuke submitter, even if only temporarily? The integrated Nuke submitter uses a proxy system: the proxy scripts you installed on your workstations just call the main \your\repository\submission\Nuke\SubmitNukeToDeadline.py script. That means you can modify this main script, and after you restart Nuke on your workstations, they will start using the modified version.
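If it helps, the change itself is typically a one-line edit. The main script writes out a job info file that names the plugin, so the relevant write would become something like this (the variable name is illustrative; your script may differ):
[code]# In SubmitNukeToDeadline.py, where the job info file is written:
fileHandle.write("Plugin=custom_nuke\n")  # previously: Plugin=Nuke[/code]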
Changing the Nuke submitter seems to do the trick. At least we haven’t had any errors in the last 8 hours.
The reason we didn’t change the submitter in the first place was that we didn’t want to touch the original one; unfortunately, there were reasons for that choice I can’t go into.
I’ll have to talk to my colleague about whether we change this decision based on our new findings.