Problem: Multiple tasks being assigned to the same node

3dsmax 2009 and Deadline 3.1

I'm having an issue where Deadline is assigning multiple tasks to the same render node. Concurrent Tasks is set to 1 in both the Advanced section under “Modify Job Properties” and in “Modify Slave Settings.” This only starts to happen mid-job. Once multiple tasks start occurring, nothing renders from those nodes. When I was on Max 2008 with Deadline 2.7, I did not have these problems. Could the .NET Framework update be causing these issues? Any thoughts?

Here is the error report.

Exception during render: An error occurred in RenderTasks(): RenderTask: Unexpected exception (Monitored managed process “3dsmaxProcess” has exited or been terminated.
2009/05/20 07:01:36 INF: Loaded C:/Documents and Settings/render.HIVE/Local Settings/Application Data/Frantic Films/Deadline/slave/jobsData/sh04A_v07.max
2009/05/20 07:01:37 INF: Job: C:/Documents and Settings/render.HIVE/Local Settings/Application Data/Frantic Films/Deadline/slave/jobsData/sh04A_v07.max
2009/05/20 07:06:14 ERR: An unexpected exception has occurred in the network renderer and it is terminating.
) (Deadline.Plugins.ScriptPlugin+FailRenderException) (Deadline.Plugins.RenderPluginException)
at Deadline.Plugins.ScriptPlugin.RenderTasks(Int32 startFrame, Int32 endFrame, String& outMessage)

Slave Log
Prepass 1 of 1… [00:00:25.4] [00:00:37.2 est]
0: INFO: Prepass 1 of 1… [00:00:25.5] [00:00:37.0 est]
0: INFO: Prepass 1 of 1… [00:00:25.7] [00:00:36.0 est]
0: INFO: Prepass 1 of 1… [00:00:25.8] [00:00:34.6 est]
0: INFO: Prepass 1 of 1… [00:00:26.0] [00:00:34.2 est]
0: INFO: Prepass 1 of 1… [00:00:26.1] [00:00:33.8 est]
0: INFO: Prepass 1 of 1… [00:00:26.3] [00:00:33.1 est]
0: INFO: Prepass 1 of 1… [00:00:26.4] [00:00:32.9 est]
0: INFO: Prepass 1 of 1… [00:00:26.6] [00:00:31.9 est]
0: INFO: Prepass 1 of 1… [00:00:26.7] [00:00:31.7 est]
0: INFO: Prepass 1 of 1… [00:00:26.9] [00:00:31.5 est]
0: INFO: Prepass 1 of 1… [00:00:27.0] [00:00:31.3 est]
0: INFO: Prepass 1 of 1… [00:00:27.2] [00:00:30.8 est]
0: INFO: Prepass 1 of 1… [00:00:27.3] [00:00:30.5 est]
0: INFO: Prepass 1 of 1… [00:00:27.5] [00:00:29.7 est]
0: INFO: Prepass 1 of 1… [00:00:27.6] [00:00:28.3 est]
0: INFO: Prepass 1 of 1…: done [00:00:28.7]
0: WARNING: Monitored managed process 3dsmaxProcess is no longer running
0: In the process of canceling current task: ignoring exception thrown by PluginLoader
Scheduler Thread - Render Thread 0 threw an error:
Scheduler Thread - Exception during render: An error occurred in RenderTasks(): RenderTask: Unexpected exception (Monitored managed process “3dsmaxProcess” has exited or been terminated.
2009/05/20 07:01:36 INF: Loaded C:/Documents and Settings/render.HIVE/Local Settings/Application Data/Frantic Films/Deadline/slave/jobsData/sh04A_v07.max
2009/05/20 07:01:37 INF: Job: C:/Documents and Settings/render.HIVE/Local Settings/Application Data/Frantic Films/Deadline/slave/jobsData/sh04A_v07.max
2009/05/20 07:06:14 ERR: An unexpected exception has occurred in the network renderer and it is terminating.
) (Deadline.Plugins.ScriptPlugin+FailRenderException) (Deadline.Plugins.RenderPluginException)
at Deadline.Plugins.ScriptPlugin.RenderTasks(Int32 startFrame, Int32 endFrame, String& outMessage)

Error Type
RenderPluginException

Error Stack Trace
at Deadline.Plugins.Plugin.RenderTask(Int32 startFrame, Int32 endFrame)
at Deadline.Slaves.SlaveRenderThread.RenderCurrentTask()

It sounds like the slave thinks that the task has been requeued while it is rendering it, and then moves on to another task, when in fact the original task hasn’t been requeued. The only known cause for this problem in Deadline 3.1 is network related (i.e., the slave loses access to parts of the repository due to network problems), so we suggest checking whether your slaves are having any problems accessing the repository.
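
If it helps with diagnosing, here is a minimal sketch (not part of Deadline itself) of the kind of check you could leave running on a slave to log the moments when the repository share drops out of view. The UNC path and the job subfolder name below are placeholders; substitute your actual repository root.

import os
import time
import datetime

REPO_ROOT = r"\\server\DeadlineRepository"    # placeholder: your repository root share
JOB_FOLDER = os.path.join(REPO_ROOT, "jobs")  # placeholder: a job folder the slave is rendering from

def visible(path):
    # Returns True if the folder can currently be seen from this machine.
    try:
        return os.path.isdir(path)
    except OSError:
        return False

while True:
    stamp = datetime.datetime.now().isoformat()
    if not visible(REPO_ROOT):
        print("%s repository root unreachable" % stamp)
    elif not visible(JOB_FOLDER):
        print("%s repository root visible, but job folder missing" % stamp)
    time.sleep(5)

If that script logs "job folder missing" entries around the times the slave abandons a task, that would point pretty strongly at the network.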

Also, if you could give us a full slave log that highlights a slave losing track of one or more tasks, that would be really helpful.

Cheers,

  • Ryan

Sure, where can I locate the slave log? On the render node or repository? Thanks

From the Slave application running on the machine, select Help -> Explore Log Folder. The Slave creates a log for each session, so you may have to search through the Slave logs to find the applicable log(s).
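
If you have a lot of session logs to sift through, something like this quick script (just a sketch; point LOG_DIR at whatever folder Explore Log Folder opens on your slave) will list the logs that contain the "is no longer running" warning from your error report:

import glob
import os

LOG_DIR = r"C:\path\to\slave\logs"   # placeholder: the folder opened by Explore Log Folder
PATTERN = "is no longer running"

# Print the path of every slave session log that contains the warning.
for path in glob.glob(os.path.join(LOG_DIR, "deadlineslave*.log")):
    f = open(path)
    try:
        if any(PATTERN in line for line in f):
            print(path)
    finally:
        f.close()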

Cheers,

  • Ryan

Sorry for the delayed response. We had to switch back to Backburner due to the time constraints of the project. I have now seen this issue in two other places as well, where a render node will start another task before completing its current task. I’ve enclosed a JPG of the manager and the slave log of a render node. Please take a look at the JPG; you can see how the render nodes just hang and start another task before completing the current one. Thank you.
deadlineslave(Quadstation01)-2009-07-14-0000.log (905 KB)
deadline error.jpg

It looks like a network problem is causing the slave to think that the job folder no longer exists:

If the slave can no longer see the task file or its job folder, but it can access the repository root, it assumes that the repository is still online but that the job has been deleted. In the next version of Deadline, we will be checking more folders within the repository before assuming it’s still online, which should help eliminate this problem.
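
To illustrate, the decision described above boils down to something like the following (a sketch of the behaviour as described, not the actual Deadline source):

import os

def check_task(repo_root, job_folder, task_file):
    # If the repository root itself is unreachable, the slave treats the
    # repository as offline and waits rather than abandoning the task.
    if not os.path.isdir(repo_root):
        return "repository offline - keep waiting"
    # If the root answers but the job folder or task file does not, the
    # slave concludes the job was deleted rather than suspecting a network
    # dropout - this is the case that causes it to move on to another task.
    if not os.path.isdir(job_folder) or not os.path.isfile(task_file):
        return "job assumed deleted - abandon task"
    return "task still valid - keep rendering"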

Out of curiosity, have you been noticing any network hiccups lately? Also, how many slaves/workstations do you have, and what OS is the repository installed on?

Cheers,

  • Ryan

The network is stable, without any hiccups. Everything was fine until the upgrade to 3.1. The repository is on a node running Windows Server 2003. There are 5 render nodes and 1 workstation. However, I have seen this problem before, when the repository was on XP 64-bit with 17 render nodes and 6 workstations. I will look into the network and keep you posted. Until then, any other thoughts would be helpful.

Unfortunately, there really isn’t anything else that can be done at this time. The code that checks whether the repository is “online” when a task file can’t be found hasn’t changed since 3.0, so for whatever reason the slave is experiencing a situation where it can “see” the repository root but can’t “see” the job folder that its current task belongs to. That’s why I thought it might be a network issue. The next release will feature a more robust “online” check, so hopefully that will resolve this problem going forward.

If you do discover anything in particular with your network that might be responsible, let us know!

Cheers,

  • Ryan