Quick question… why does Deadline complete this task?
2017-12-11 21:36:32: 0: STDOUT: [Redshift]Frame file locked. Another process is already rendering it. Skipping frame
2017-12-11 21:36:32: 0: STDOUT: [Redshift]ROP node endRender
2017-12-11 21:36:32: 0: STDOUT: [Redshift]Closing the RS scene
2017-12-11 21:36:32: 0: STDOUT: Finished Rendering
2017-12-11 21:36:33: 0: INFO: Process exit code: 0
2017-12-11 21:36:33: 0: INFO: Finished Houdini Job
The exit code from Houdini is 0, so I assume that’s why, but would there be a way to catch this “[Redshift]Frame file locked. Another process is already rendering it. Skipping frame” message and instead fail the task?
Hey Shaun,
Can I first ask, what is a “Frame file locked”? It sounds kinda bad? Maybe that needs to be fixed at the root cause? Too many instances of Houdini running? Too many Deadline Slave instances? Misconfiguration of Deadline pools/groups, leading to a situation where multiple Slaves on the same machine are working on the same scene file?
We can easily add a StdOut handler for this, but first, we must fully understand how we got into this situation in the first place!
Yeah, I’m so confused as to how this happened. It’s like it stumbled over its own feet. Nothing else was rendering to it, and the machine doing it has only one Slave on it.
“Frame locked” refers to finding a .lock file present (rendered_file.0001.exr.lock). Redshift is supposed to remove these when it’s finished doing its thing, and then at the next stage, when it goes to write the final exr (rendered_file.0001.exr), it checks whether one of these “.lock” files is present for the frame. If it finds one, it won’t write out the file.
I can only assume that our server was too slow at removing the .lock file before Redshift moved on to the next stage. It saw the lock was still there, skipped the frame, Houdini returned exit code 0, and then Deadline marked the task as complete. No rendered frame was ever created.
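For anyone else poking at this: the lock is just a sibling file sitting next to the frame, so it’s easy to check by hand. Below is a minimal sketch of sweeping a render directory for stale locks when you’re sure nothing is currently rendering into it — the directory path is made up, and this is only an illustration of the .lock naming convention described above, not an official Redshift tool:

import glob
import os

# Hypothetical cleanup pass: each lock is a sibling file next to the frame
# (e.g. rendered_file.0001.exr.lock). When nothing is rendering into the
# directory, stale locks can be removed so a re-render is not skipped.
render_dir = "/path/to/render/output"  # placeholder path

for lock_path in glob.glob(os.path.join(render_dir, "*.exr.lock")):
    print("Removing stale lock: %s" % lock_path)
    os.remove(lock_path)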
This problem occurs when the Redshift process crashes hard and the lock file is not cleaned up by the render process.
What version of Redshift are you using?
This used to be a problem with the older 2.0 series of Redshift, though something was done, or an option was added, that removed it as a problem.
I feel like this hasn’t been a problem for us for 6-12 months.
This is a snippet from our internally developed plugin, from when we had this problem:
# in the InitializeProcess method (requires "import os" at the top of the plugin file):
self.AddStdoutHandlerCallback(".*Skipping frame.*").HandleCallback += self.HandleLockFileError
self.AddStdoutHandlerCallback(".*Skipping layer.*").HandleCallback += self.HandleLockFileError

# as a handler method on the same class:
def HandleLockFileError(self):
    # When this message is hit, delete the stale lock file and requeue/fail the task,
    # so that it re-runs correctly.
    error_line = self.GetRegexMatch(0)
    # example:
    # [Redshift] Skipping frame 5 (1/1) - another process is already rendering to '//path/to/file.0005.exr'
    bits = error_line.split("'")
    frame_path = bits[1]
    lock_frame = '%s.lock' % frame_path
    if os.path.isfile(lock_frame):
        try:
            os.remove(lock_frame)
            self.deadlinePlugin.LogInfo("DEBUG:: Lock File Removed.")
        except OSError:
            self.deadlinePlugin.LogInfo("ERROR:: Could not delete the lock file.")
    message = "Lock File detected, file has been deleted. Please requeue this frame."
    self.deadlinePlugin.FailRender(message)
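If I’m reading the snippet right, it deletes the lock before the FailRender call so that when the task is requeued, the re-render isn’t skipped for the same reason again. It also assumes the handler lives on the process class and that self.deadlinePlugin holds a reference back to the plugin for LogInfo/FailRender.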
I’ll see if I can dig around for what was done to resolve this in the end.
We’re using RS 2.5.47 (Houdini 16.5.268). I spoke to Redshift and they said they do have an option to skip past these lock files if the user chooses to, but this is only enabled in Maya. They may look at adding it to Houdini.
They also suspected that the .lock file was just left behind by an orphaned process due to a filesystem blip, which I think is also a possibility.
Ultimately the problem seems to be Redshift.
It would be great to have that snippet you posted working in our pipeline though. Did you just add that to your custom Houdini.py?
Oh, and also, if anyone else comes across this: someone on the Redshift forums let me know that I could add something similar to the Redshift.py plugin, which should apparently requeue the task instead of completing it. I’ll give it a go now.
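The exact change suggested on the Redshift forums isn’t reproduced above, but a rough sketch of the same idea, reusing the StdOut handler hooks from the earlier snippet (the handler name and regex here are assumptions, not the original post), could look like:

# in the InitializeProcess method of the process class in Redshift.py
# (handler name and regex are assumptions based on the earlier snippet):
self.AddStdoutHandlerCallback(".*Frame file locked.*").HandleCallback += self.HandleFrameLocked

def HandleFrameLocked(self):
    # Fail the task so it can be requeued and re-rendered, instead of letting
    # the skipped frame come back as "completed" with exit code 0.
    self.deadlinePlugin.FailRender("Redshift reported a locked frame file; failing the task so it can be requeued.")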
Redshift 2.5.50: * [Houdini] New REDSHIFT_DISABLEOUTPUTLOCKFILES environment variable, that can be set to 1 to disable the output image files .lock feature
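For anyone on 2.5.50 or later, the variable just needs to reach the environment the render process runs under. A minimal sketch, assuming you launch hython from a wrapper script of your own (the wrapper, hython path, and script path below are placeholders, not part of the Deadline plugin):

import os
import subprocess

# Hypothetical wrapper: disable Redshift's output .lock files (Redshift 2.5.50+)
# before launching the render process.
env = os.environ.copy()
env["REDSHIFT_DISABLEOUTPUTLOCKFILES"] = "1"

subprocess.call(["hython", "/path/to/render_script.py"], env=env)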