
Redshift not rendering frame error not caught by Deadline

Hey,

Quick question… Why does deadline complete this task?

2017-12-11 21:36:32: 0: STDOUT: [Redshift]Frame file locked. Another process is already rendering it. Skipping frame
2017-12-11 21:36:32: 0: STDOUT: [Redshift]ROP node endRender
2017-12-11 21:36:32: 0: STDOUT: [Redshift]Closing the RS scene
2017-12-11 21:36:32: 0: STDOUT: Finished Rendering
2017-12-11 21:36:33: 0: INFO: Process exit code: 0
2017-12-11 21:36:33: 0: INFO: Finished Houdini Job

The exit code from Houdini is 0, so I assume that’s why, but would there be a way to catch this “[Redshift]Frame file locked. Another process is already rendering it. Skipping frame” message and instead fail the task?

That would be super helpful.

Thanks,
Shaun.

Redshift have told me that it's not an error, which is why Houdini still exits with code 0.

My question then is, how can I catch this?

How can I catch this line: STDOUT: [Redshift]Frame file locked. Another process is already rendering it. Skipping frame

and force Deadline to fail the task?

Hey Shaun,
Can I first ask, what is a “Frame file locked”? It sounds kinda bad? Maybe that needs to be fixed at the root issue? Too many instances of Houdini running? Too many Deadline Slave instances? Misconfiguration of Deadline pools/groups, leading to a situation where multiple Slaves on the same machine working on the same scene file?
We can easily add a StdOut handler for this, but first, we must fully understand how we got into this situation in the first place! :slight_smile:

Hey,

Yeah, I’m so confused as to how this happened. It’s like it stumbled over its own feet. Nothing else was rendering to that file, and the machine doing it has only one Slave on it.

“Frame locked” is referring to finding a .lock file present (rendered_file.0001.exr.lock). Redshift is supposed to remove these when it’s finished doing its thing; then at the next stage, when it goes to write the final EXR (rendered_file.0001.exr), it checks whether one of these “.lock” files is present for the frame. If it finds one, it won’t write out the file.

I can only assume that our server was too slow at removing the .lock file before Redshift went on to the next stage. It saw the lock was still there, skipped the frame, Houdini returned exit code 0, and Deadline marked the task as complete. No rendered frame was ever created.
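A manual cleanup pass before requeuing would be something like this (just a sketch; the output directory path is a placeholder, and it assumes the leftover locks always sit next to the frames with a plain .lock suffix):

import os

def remove_stale_locks(output_dir):
    # Walk the render output directory and delete any leftover Redshift .lock files
    # so the skipped frames can be re-rendered.
    for root, dirs, files in os.walk(output_dir):
        for name in files:
            if name.endswith('.lock'):
                lock_path = os.path.join(root, name)
                try:
                    os.remove(lock_path)
                    print('Removed stale lock: %s' % lock_path)
                except OSError as err:
                    print('Could not remove %s: %s' % (lock_path, err))

remove_stale_locks('//server/project/renders')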

OK, what exact version of Deadline are you using?

9.0.7.0

Hey Shaun,

This problem occurs when the Redshift process crashes hard and the lock file is not cleaned up by the render process.

What version of Redshift are you using?
This used to be a problem with the older 2.0 series of Redshift, though something was done, or an option was added, that removed it as a problem.
I feel like this hasn’t been a problem for us for 6-12 months.

This is a snippet from our internally developed plugin, from when we had this problem:


# in the plugin's InitializeProcess method (requires "import os" at the top of the plugin file):

        self.AddStdoutHandlerCallback(".*Skipping frame.*").HandleCallback += self.HandleLockFileError
        self.AddStdoutHandlerCallback(".*Skipping layer.*").HandleCallback += self.HandleLockFileError

# as a handler method on the same plugin class:
    def HandleLockFileError(self):
        # When this message is hit, delete the stale lock file and fail the task
        # so that it can be requeued and re-run correctly.

        error_line = self.GetRegexMatch(0)
        # example:
        #  [Redshift] Skipping frame 5 (1/1) - another process is already rendering to '//path/to/file.0005.exr'
        bits = error_line.split("'")
        frame_path = bits[1]
        lock_frame = '%s.lock' % frame_path

        if os.path.isfile(lock_frame):
            try:
                os.remove(lock_frame)
                self.LogInfo("DEBUG:: Lock File Removed.")
            except OSError:
                self.LogInfo("ERROR:: Could not delete the lock file.")

        message = "Lock File detected, file has been deleted. Please requeue this frame."
        self.FailRender(message)

I’ll see if I can dig around for what was done to resolve this in the end.

Cheers
Kym

Also, this is from the release notes for version 2.5.44 of Redshift; maybe you’re hitting this:

  • [Houdini] Allow proxy sequence export from multiple processes (.lock file created while exporting proxy files)

Hey Kwatts,

We’re using RS 2.5.47 (Houdini 16.5.268). I spoke to Redshift and they said they do have an option to skip past these lock files if the user chooses to, but it’s currently only enabled in Maya. They may look at adding it to Houdini.

They also suspected that the .lock file was just left from an orphaned process due to a filesystem blip. Which I think is also a possibility.

Ultimately the problem seems to be Redshift.

It would be great to have that snippet you posted working in our pipeline, though. Did you just add that to your custom Houdini.py?
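I’m guessing it would hook in roughly like this (just a sketch based on the stock Houdini.py structure; class and method names may differ between Deadline versions):

from Deadline.Plugins import *
import os

class HoudiniPlugin(DeadlinePlugin):
    def InitializeProcess(self):
        # ... existing Houdini plugin setup ...
        self.AddStdoutHandlerCallback(".*Skipping frame.*").HandleCallback += self.HandleLockFileError

    def HandleLockFileError(self):
        # Fail the task so it can be requeued rather than silently marked complete.
        self.FailRender("Redshift skipped a locked frame file - failing the task so it can be requeued.")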

Just letting you know that Redshift have fixed this one for me in their next release (they’ll provide an option to skip the check for lock files).

As a side note… is it possible to modify the OutputFilename for a job?

imageFile_020__main__v007_beauty.%AOV%.######.exr.exr

Above is OutputFilename0 for a job. Redshift understands this, but obviously Windows and Deadline don’t.

Is it possible to make an event which I could write to correct that image output variable?
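Just to illustrate the correction I’m after, something like this applied to OutputFilename0 (the wiring into an event plugin is the part I’m unsure about):

def fix_output_filename(name):
    # Collapse the doubled extension that ends up in the output variable,
    # e.g. '...%AOV%.######.exr.exr' -> '...%AOV%.######.exr'
    while name.endswith('.exr.exr'):
        name = name[:-len('.exr')]
    return name

print(fix_output_filename('imageFile_020__main__v007_beauty.%AOV%.######.exr.exr'))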

I can make a new thread for that question if you like?

Oh, and also, if anyone else comes across this: someone on the Redshift forums let me know that I could do something like this for the Redshift.py plugin:
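It went something along these lines (paraphrasing from memory, so the exact regex and message may not match the original post):

    # In the Redshift plugin's InitializeProcess:
    self.AddStdoutHandlerCallback(".*Frame file locked.*").HandleCallback += self.HandleLockedFrame

    # Handler method on the plugin class:
    def HandleLockedFrame(self):
        self.FailRender("Frame file locked - failing the task so it can be requeued.")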

This should requeue the task, apparently… I’ll give it a go now.

Redshift 2.5.50: • [Houdini] New REDSHIFT_DISABLEOUTPUTLOCKFILES environment variable, that can be set to 1 to disable the output image files .lock feature
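One way to pass that through per job is via the job’s environment at submission time, e.g. with an environment key/value entry in the Deadline job info file (setting it machine-wide or in houdini.env should work just as well):

EnvironmentKeyValue0=REDSHIFT_DISABLEOUTPUTLOCKFILES=1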

:smiley:
