By the way, you mentioned that your machines run Windows 7; mine do too. Could this be a lead?
My setup is:
Server: Windows Server 2008 x86, Deadline Repository, Domain Controller
Nodes (5): Windows 7 Professional x64. All machines are in the domain and have full access to the repository. I mainly run 3ds Max 2010 x64 and Nuke 6.1v1 x64.
Deadline: latest build, Pulse running on the server where the repository sits.
LAN: 1Gbps
This error occurs when the slave is unable to update its “state” file in the repository, which we call the “slaveInfo” file. These files exist in \your\repository\slaves\[SLAVE NAME]. When it is unable to write to this file, it prints out the error message you’re seeing, along with the reason it failed. In this case, it’s failing because it thinks another process has locked that file down.
Normally, if this only happens a few times, it’s not a cause for concern, as this should have no impact on the actual rendering. However, if it’s a chronic problem where the slave can’t update its slaveInfo for a while, then this can lead to Deadline detecting the slave as stalled and requeuing its current task. This can result in lost render time. Please let us know which case it is for you.
If it’s the latter, a good start is to figure out when the problem occurs. For example:
Is it always at the same time each day?
Does it only happen when a slave is rendering? What do the system resources look like when this is occurring? If CPU and RAM are completely maxed out, that could impact the slave’s ability to update this file.
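To answer the "same time each day" question, one option is to count the failures per hour of day from a saved slave log. The snippet below is only a rough sketch: it assumes the log uses the "---- April 18 2011 – 11:32 AM ----" header style shown in this thread, and the function name errors_by_hour is made up for illustration, not part of Deadline.

```python
import re
from collections import Counter
from datetime import datetime

# Matches header lines such as "---- April 18 2011 – 11:32 AM ----"
# (accepts either an en dash or a plain hyphen between date and time).
HEADER = re.compile(
    r"^-+\s*(\w+ \d{1,2} \d{4})\s*[–-]\s*(\d{1,2}:\d{2} [AP]M)\s*-+$"
)
ERROR_TEXT = "Failed to update slaveInfo"


def errors_by_hour(log_lines):
    """Count slaveInfo update failures per hour of day (0-23)."""
    counts = Counter()
    current = None  # timestamp from the most recent header line
    for line in log_lines:
        match = HEADER.match(line.strip())
        if match:
            stamp = " ".join(match.groups())
            current = datetime.strptime(stamp, "%B %d %Y %I:%M %p")
        elif ERROR_TEXT in line and current is not None:
            counts[current.hour] += 1
    return counts
```

If nearly all of the failures land in one or two hours, a scheduled job (backup, antivirus scan, file sync) touching the repository at that time is worth ruling out.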
I seem to be having this very issue on one of my render machines. I thought it might be a permissions issue, but after checking, everything is in order. I even uninstalled the client and re-installed it, and I am up against the same issue. Any ideas on why I might be coming across this error?
Slave - Exception: Failed to update slaveInfo: The process cannot access the file because it is being used by another process.
Slave - No slave update error notification address specified in Repository Options - cannot send notification
This issue should occur much less often in the upcoming Deadline 5.0 release. The standard protocol for updating xml files (like the slave info file) is to make multiple attempts to write the file in case it’s locked for a brief second. In the case of the slave info, we weren’t doing this, so a single failed attempt resulted in the error message you’re seeing. In Deadline 5.0, multiple attempts will be made, so the chances of this problem occurring are much lower.
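For anyone curious what "multiple attempts" looks like in practice, here is a minimal sketch of a retry-on-lock write. It is not Deadline's actual code: the function name write_with_retries, the attempt count, and the delay are all assumptions chosen for illustration.

```python
import time


def write_with_retries(path, data, attempts=5, delay=0.2):
    """Write data to path, retrying if the file is briefly locked.

    Returns the number of attempts used. Re-raises the last error
    if every attempt fails.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            with open(path, "w", encoding="utf-8") as handle:
                handle.write(data)
            return attempt
        except OSError as error:
            # A file locked by another process surfaces as an OSError
            # (or a subclass of it) when opening or writing.
            last_error = error
            if attempt < attempts:
                time.sleep(delay)  # give the other process time to let go
    raise last_error
```

With a single attempt, any momentary lock produces an error. With a few short retries, only a lock that outlasts all of them does, which is why a brief lock stops being visible in the log.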
Does this slave ever get marked as stalled? If not, that means that it is able to update its slave info in a timely fashion, and you can pretty much disregard this error message.
Purging trash
---- April 18 2011 – 11:32 AM ----
Slave - Exception: Failed to update slaveInfo: The process cannot access the file because it is being used by another process.
Slave - No slave update error notification address specified in Repository Options - cannot send notification
---- April 18 2011 – 11:34 AM ----
Purging limit groups
Hmm, those errors shouldn’t prevent the slave from dequeuing tasks. Do you have slave verbose logging enabled? If not, enable it in the Repository Options (in the Logging section) and then restart the Slave app on this machine to see if it prints out any additional info when it tries to look for a task. Feel free to post the slave log and we’ll have a look.
Thanks for the log. There are no errors, so that seems to imply that this particular slave hasn’t been configured properly to pick up jobs. A few things to check:
Is the slave assigned to the pool and/or group that your jobs are being submitted to?
Is the slave on your jobs’ blacklist, or not part of their whitelist?
Do the jobs use any limit groups that the slave has been blacklisted from?
Is the slave on your jobs’ bad slave list? You can see a job’s bad slave list by right-clicking on it and selecting View Bad Slave List.
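The checklist above can be thought of as a series of filters that a slave has to pass before it picks up a job. The sketch below is purely illustrative: the Job and Slave classes, their field names, and the reasons_slave_skips_job function are hypothetical and do not reflect Deadline's real data model or API.

```python
from dataclasses import dataclass, field


@dataclass
class Slave:  # hypothetical model, for illustration only
    name: str
    pools: set
    groups: set
    blocked_limit_groups: set = field(default_factory=set)


@dataclass
class Job:  # hypothetical model, for illustration only
    pool: str
    group: str
    listed_slaves: set = field(default_factory=set)
    list_is_whitelist: bool = False  # False means the list is a blacklist
    limit_groups: set = field(default_factory=set)
    bad_slaves: set = field(default_factory=set)


def reasons_slave_skips_job(slave, job):
    """Return the checklist items that would stop this slave taking the job."""
    reasons = []
    if job.pool not in slave.pools:
        reasons.append("slave is not in the job's pool")
    if job.group not in slave.groups:
        reasons.append("slave is not in the job's group")
    listed = slave.name in job.listed_slaves
    if job.list_is_whitelist and not listed:
        reasons.append("slave is not on the job's whitelist")
    if not job.list_is_whitelist and listed:
        reasons.append("slave is on the job's blacklist")
    if job.limit_groups & slave.blocked_limit_groups:
        reasons.append("slave is blacklisted from one of the job's limit groups")
    if slave.name in job.bad_slaves:
        reasons.append("slave is on the job's bad slave list")
    return reasons
```

An empty result means none of these four checks explains the problem, so the cause lies elsewhere.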
1.) The slave in question is assigned to the correct group/pool
2.) It is part of the whitelist
3.) Machine has not been blacklisted
4.) Machine is not on the bad slave list
When I submit a job the machine in question shows up in my task list as being active; it just doesn’t render anything. I still get the error:
---- April 19 2011 – 11:24 AM ----
Slave - Exception: Failed to update slaveInfo: The process cannot access the file because it is being used by another process.
Slave - No slave update error notification address specified in Repository Options - cannot send notification
That’s really weird, since the log doesn’t show the slave even attempting to pick up a job.
It really sounds like something is borked with this particular machine. Do you have a render node image that you can use to “reset” this machine? If so, that might be faster than trying to figure out why this one machine is misbehaving.