Ignoring license errors from Nuke

Hi,

I often get this kind of error in the log:

[code]=======================================================
Error Message

Exception during render: An error occurred in RenderTasks(): Error in CheckExitCode(): Renderer returned non-zero error code, 100. Check the log for more information.
at Deadline.Plugins.ScriptPlugin.RenderTasks(String taskId, Int32 startFrame, Int32 endFrame, String& outMessage)

=======================================================
Slave Log

0: Loaded plugin: Nuke
0: Task timeout is 7200 seconds (Regular Task Timeout)
0: Loaded job: LBC.002.0005.cmp.v08.nk (000_050_999_7bb34981)
0: Successfully mapped R: to \FX-NAS-01\vfx\RENDER
0: Successfully mapped K: to \FX-NAS-01\vfx\Projets
0: INFO: StartJob: initializing script plugin Nuke
0: INFO: About: Nuke Plugin for Deadline
0: INFO: Prepping OFX cache
0: INFO: Checking Nuke temp path: C:\Users\renderfx\AppData\Local\Temp\nuke
0: INFO: Path already exists
0: INFO: OFX cache prepped
0: Plugin rendering frame(s): 249
0: INFO: Any stdout that matches the regular expression “READY FOR INPUT” will be handled as appropriate
0: INFO: Any stdout that matches the regular expression “.*ERROR:.*” will be handled as appropriate
0: INFO: Any stdout that matches the regular expression “.*Error:.*” will be handled as appropriate
0: INFO: Any stdout that matches the regular expression “.*Frame [0-9]+ (([0-9]+) of ([0-9]+))” will be handled as appropriate
0: INFO: Stdout Handling Enabled: True
0: INFO: Popup Handling Enabled: True
0: INFO: Using Process Tree: True
0: INFO: Hiding DOS Window: True
0: INFO: Creating New Console: False
0: INFO: Render Executable: “C:\Program Files\Nuke6.3v8\Nuke6.3.exe”
0: INFO: Rendering with NukeX
0: INFO: Render Argument: -V --nukex -x -F 249-249 “C:\Users\renderfx\AppData\Local\Temp\LBC.002.0005.cmp.v08_thread0.nk”
0: INFO: Startup Directory: “C:\Program Files\Nuke6.3v8”
0: INFO: Process Priority: AboveNormal
0: INFO: Process is now running
0: STDOUT: NukeX 6.3v8, 64 bit, built May 29 2012.
0: STDOUT: Copyright © 2012 The Foundry Visionmongers Ltd. All Rights Reserved.
0: STDOUT: Timestamp: Mon Sep 09 10:03:50 2013
0: STDOUT: License Requested: nuke 2012.0529 render only
0: STDOUT: Host IDs: d4ae527b8ecb, d4ae527b8eca
0: STDOUT: License failure:
0: STDOUT: Licensed number of users already reached.
0: STDOUT: Feature: nuke_r
0: STDOUT: License path: C:\ProgramData\The Foundry\FLEXlm\client.lic;C:\ProgramData -
0: STDOUT: \The Foundry\FLEXlm\example.lic;C:\Program Files\The Foundry\FLEXlm -
0: STDOUT: *.lic
0: STDOUT: FLEXnet Licensing error:-4,132
0: STDOUT: For further information, refer to the FLEXnet Licensing End User Guide,
0: STDOUT: available at “www.macrovision.com”.
0: STDOUT: License failure:
0: STDOUT: Error : Maximum user counted exceeded.
0: INFO: Process exit code: 100
0: An exception occurred: Exception during render: An error occurred in RenderTasks(): Error in CheckExitCode(): Renderer returned non-zero error code, 100. Check the log for more information.
at Deadline.Plugins.ScriptPlugin.RenderTasks(String taskId, Int32 startFrame, Int32 endFrame, String& outMessage) (Deadline.Plugins.RenderPluginException)

=======================================================
Error Type

RenderPluginException

=======================================================
Error Stack Trace

at Deadline.Plugins.Plugin.RenderTask(String taskId, Int32 startFrame, Int32 endFrame)
at Deadline.Slaves.SlaveRenderThread.RenderCurrentTask(TaskLogWriter tlw)
[/code]

I've read quite a bit about it, and it seems the problem is too many requests to my license server: by the time one computer releases its Nuke license, another computer is already trying to take that license at the same moment, which creates this error.

This causes me a lot of problems because it fails my jobs when it happens, and I have to reduce the number of licenses available in Deadline. For example, I have 20 licenses but only make 18 of them available, to limit the number of errors like this.

I wonder if Deadline could ignore this error and simply retry rendering the frame? That way, when the error comes up, it would wait 5 seconds and try again without failing my job. That would be great, because I could also use all of my licenses without any problem.
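Fred's proposal amounts to a retry loop keyed on the license error text. A minimal Python sketch, assuming a hypothetical `run_render()` callable that launches Nuke and returns `(exit_code, stdout)`; this is not Deadline's plugin API. The detection pattern reuses strings from the log above, and the retry count is capped so a slave with a genuinely broken license setup still fails eventually:

```python
import re
import time
from typing import Callable, Tuple

# Hypothetical detector built from the Nuke/FLEXlm lines in the log above.
LICENSE_FAILURE = re.compile(r"License failure:|Licensed number of users already reached")


def render_with_retry(
    run_render: Callable[[], Tuple[int, str]],
    wait_seconds: float = 5.0,
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Run a render and retry it only when it failed for lack of a free license.

    `run_render` is a hypothetical callable that launches the renderer and
    returns (exit_code, stdout). Any other failure is returned immediately so
    the normal error handling still sees it.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 1
    while True:
        exit_code, output = run_render()
        license_error = exit_code != 0 and LICENSE_FAILURE.search(output) is not None
        if not license_error or attempt >= max_attempts:
            # Success, a non-license error, or out of retries: hand back the result.
            return exit_code, output
        attempt += 1
        sleep(wait_seconds)  # give the license server time to free the seat
```

The `sleep` parameter is only there so the waiting can be swapped out when testing; in normal use the default `time.sleep` gives the 5-second pause Fred describes.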

Thanks

Fred

Hey Fred! There’s actually a better solution to this.

Deadline has a feature called Limits, whereby only a set number of Slaves can hold a license stub at any one time. With the limit set to 20, no more than 20 of those Nuke render licenses will be in use at once.

Take a look: thinkboxsoftware.com/deadline-5-limitgroups/
In Deadline 6.0, you'll need to open a new Limits panel from the little window icon in the toolbar.

More info on pools, groups, limits and other neat stuff can be found here:
thinkboxsoftware.com/deadlin … imits.html

You then need to specify that limit when submitting a Nuke job.
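Conceptually, a limit behaves like a counting semaphore: a Slave must grab a stub before it starts a task and hands it back when it finishes. This Python sketch is a hypothetical model (the `LimitGroup` class is illustrative, not Deadline's implementation). It also shows the catch Fred runs into: it only counts Deadline's stubs, so it cannot know whether the license server has actually freed the seat yet.

```python
import threading


class LimitGroup:
    """Hypothetical model of a limit: at most `count` stubs can be out at once."""

    def __init__(self, name: str, count: int) -> None:
        self.name = name
        self._stubs = threading.BoundedSemaphore(count)

    def try_acquire(self) -> bool:
        # Non-blocking: a Slave that gets no stub simply looks for other work.
        return self._stubs.acquire(blocking=False)

    def release(self) -> None:
        # Called when the task finishes. Note that this frees the Deadline stub
        # immediately, whether or not the license server has freed the seat yet.
        self._stubs.release()


nuke_limit = LimitGroup("nuke_render", 20)
granted = [nuke_limit.try_acquire() for _ in range(22)]
print(granted.count(True))  # 20 Slaves render, 2 wait
```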

Hmm… I just noticed based on this forum post that there may be an issue in 6.0. Let us know if you hit the same problem.

viewtopic.php?f=11&t=10200

Hi,
I am aware of the Limits feature, but as I said, part of the problem is that I can only use 19 of my 20 licenses with the limit system, because the license manager doesn't release the license token fast enough, which creates this error.

EDIT: This error occurs on Deadline 5.1 too. I've been aware of the problem for a long time, but now it's getting a bit more problematic.
Thanks!

Fred

Yup, this looks like something we will need to look into more.

It seems like the solution would be to have a pause on limit groups where a limit stub isn’t released for ## seconds. Then we could set a stub to wait 15 seconds.
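The cooldown idea can be sketched as a limit where a returned stub stays unavailable for a fixed number of seconds before another Slave can take it. This is a hypothetical Python model (the `CooldownLimit` name and the injectable clock are assumptions, and it is not thread-safe), using the 15-second wait suggested here:

```python
import time
from typing import Callable, List


class CooldownLimit:
    """Hypothetical limit whose stubs become reusable only `cooldown` seconds after release.

    Single-threaded sketch; a real scheduler would need locking.
    """

    def __init__(self, count: int, cooldown: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.count = count
        self.cooldown = cooldown
        self.clock = clock
        self.in_use = 0
        self._cooling: List[float] = []  # release times of stubs still cooling down

    def _expire(self) -> None:
        # Drop stubs whose cooldown has fully elapsed.
        now = self.clock()
        self._cooling = [t for t in self._cooling if now - t < self.cooldown]

    def try_acquire(self) -> bool:
        self._expire()
        if self.in_use + len(self._cooling) < self.count:
            self.in_use += 1
            return True
        return False

    def release(self) -> None:
        if self.in_use == 0:
            raise RuntimeError("release() without a matching try_acquire()")
        self.in_use -= 1
        self._cooling.append(self.clock())
```

With `count=20` and `cooldown=15`, all 20 licenses stay usable; the cost is up to 15 idle seconds each time a stub changes hands, instead of permanently holding back one or two licenses.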

Another option would be to ignore license errors towards “failed tasks” and “failed jobs”. I don’t mind the errors so much as the fact that it’ll fail a job so that it never finishes rendering.

I agree with that one guy: the error wouldn't bother me if it stopped failing the job.

I’ll admit, having failure classes would be super handy for all kinds of neat reasons. I’ll discuss with Ryan and see what we come up with.

I’m not going to guarantee anything, but it’s always worth considering ideas and where they might fit in.

As it stands right now, an error is an error. Everything goes through the same handler pipeline.
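“Failure classes” could look something like this: classify the renderer output, and let only some classes count toward failing the task or job. A hypothetical Python sketch; `FailureClass`, `classify` and `ErrorCounter` are illustrative names, not part of Deadline, and the license pattern reuses text from Fred's log. Ignored errors are still counted separately, so they remain visible:

```python
import re
from enum import Enum


class FailureClass(Enum):
    LICENSE = "license"
    OTHER = "other"


# Hypothetical rules; the license pattern reuses text from the Nuke log above.
RULES = [
    (re.compile(r"License failure:|Licensed number of users already reached"),
     FailureClass.LICENSE),
]


def classify(output: str) -> FailureClass:
    """Return the first matching failure class, or OTHER if no rule matches."""
    for pattern, failure_class in RULES:
        if pattern.search(output):
            return failure_class
    return FailureClass.OTHER


class ErrorCounter:
    """Counts errors toward task/job failure, except for the ignored classes."""

    def __init__(self, ignored: frozenset = frozenset({FailureClass.LICENSE})) -> None:
        self.ignored = ignored
        self.errors = 0          # these would count toward failing the task/job
        self.ignored_errors = 0  # still tracked, so a looping license problem stays visible

    def record(self, output: str) -> FailureClass:
        failure_class = classify(output)
        if failure_class in self.ignored:
            self.ignored_errors += 1
        else:
            self.errors += 1
        return failure_class
```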

Couple of thoughts here…

  1. Ensure the end client is running the latest version of the FLEXlm lmgrd license software. Users tend to install license server software and forget about it. Even downloading the latest FLT from The Foundry would be a good idea, especially as they have built a nice wrapper utility in the newer versions. Does this resolve the licenses not being released quickly enough? The lmgrd software can be updated independently of the software vendor's binary, so maybe a newer version of FLEXlm will resolve the issue? Worth a try…

  2. You could have a RegEx handler for this exact error and do a sleep command, BUT the error occurs right at the beginning of the job, as the ManagedProcess starts up. In this case, Nuke.exe hasn't really wasted any time on the task at hand, but it is adding 1 error to the overall error count. However, you can get yourself into a bad situation by introducing a sleep() function: the slave hits this error and sleeps, retries after 10 seconds or whatever, hits the error again, goes to sleep again… forever, whilst other jobs in the queue are waiting to be picked up. Or maybe there is a license issue (misconfiguration) on this one slave, in which case it rightly should fail rather than just circle around between 10-second sleeps?

  3. FLEXlm has an “lmstat lmremove” command via its utility, which could be executed at “EndJob” of ALL Nuke jobs on ALL slaves (it needs to be like this to ensure 100% reliability). This would ensure a slave cleans up the particular license it has pulled from the license server. However, again, there are 2 major issues: (a) what happens if the license server is unavailable for a split second (network blip, restart, blue screen), and (b) if you Google it, you will see reports that repeated “lmstat lmremove” executions in quick succession over the network to your license server are prone to locking up the FLEXlm process = BAD!

  4. Enhance the “LimitGroup” functionality with a user-customisable sleep(variable) for each limit group. If a job gets this limit group stub, then it does a “hold” just before the ManagedProcess starts up. But you're slowing your farm down, and how long is long enough?

  5. Another possible option:

Use the new ManagedProcess.AbortRender method to re-queue the task without adding to the error count, but only if this exact error has been identified (by editing the Nuke plugin's STDOUT handler):

FranticX.Processes.ManagedProcess.AbortRender( string "message", AbortLevel Minor )

Now, what Deadline doesn't currently have is a “re-queue” counter as part of “Job Settings” > “Failure Detection”. My concern here is that I want to know, via the normal job notification framework, if a job is receiving too many re-queues (the Nuke job could be going round in circles forever). I see this as yet more variables in the “Failure Detection” repo job settings: “Fail task after this many re-queues”, “Fail job after this many re-queues” and, of course, “Send a Warning to the Job's User after it has generated this many re-queues”.

Option #5 at least introduces a framework which can be re-used to handle license issues across all plugins. I don’t like option #4 as it’s not as flexible as #5 for the future.
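The re-queue counter proposed in option #5 could be modelled roughly like this. A hypothetical Python sketch: the three thresholds mirror the suggested “Failure Detection” settings, but the names, default values and return strings are assumptions, not existing Deadline job settings:

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class RequeuePolicy:
    # Hypothetical counterparts of the proposed "Failure Detection" settings.
    fail_task_after: int = 5    # "Fail task after this many re-queues"
    fail_job_after: int = 50    # "Fail job after this many re-queues"
    warn_user_after: int = 10   # "Send a Warning to the Job's User after ... re-queues"


@dataclass
class RequeueTracker:
    policy: RequeuePolicy
    task_requeues: Dict[str, int] = field(default_factory=dict)
    job_requeues: int = 0
    warned: bool = False

    def requeue(self, task_id: str) -> str:
        """Record one re-queue (e.g. after a license error) and say what to do next."""
        self.task_requeues[task_id] = self.task_requeues.get(task_id, 0) + 1
        self.job_requeues += 1
        if not self.warned and self.job_requeues >= self.policy.warn_user_after:
            self.warned = True  # flag it once so the user gets a single warning
        if self.job_requeues >= self.policy.fail_job_after:
            return "fail_job"
        if self.task_requeues[task_id] >= self.policy.fail_task_after:
            return "fail_task"
        return "requeue"
```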

Hi,
Has anyone had this issue again since upgrading to at least Deadline v6.1 beta 3 or later?
Also, are you running FLEXlm or RLM as your Nuke license server? (I can see from the logs that Fred is on FLEXlm.)
Feedback would be good!
Thanks!

For me, the Limits panel shows 1 in “Stubs In Use” for a job that has a limit set, even though 8 machines were rendering it. It also seems to lag a bit, as it continued to read 1 for several minutes after the job had left the farm. About 60 machines were assigned to this limit group.

I’m on:
Deadline Version: 6.1.0.54665 R
FranticX Version: 2.0.0.54634 R

Hello Brett,

Could you open the console in the Monitor (View > New Panel > Console) and see if there is anything there indicating an issue with what is being reported in the Monitor? Also, can you check what the last update time shows in the lower right corner when this happens? Thanks.