Deadline reports rendering progress incorrectly!

I have no idea what happened, but this is a second job I’ve tried that Deadline incorrectly reports the render progress and thus doesn’t move on, especially if you have dependent jobs on it!

See the picture for clarity.

Basically, the Job still reports two or more (I even had 34!, but I only have physically 5 machines) machines still rendering. The machines list shows all slaves as Idle and the job is basically stuck.

However, all render frames are correctly rendered and all is fine, but if you have a dependant job on this one, it never gets started!

This is the log of the frame on that machine:

Log Message
0: Task timeout is disabled.
0: Plugin rendering frame(s): 7
0: INFO: Render Tasks called
0: INFO: Lightning: Render frame 7
0: INFO: Lightning: Rendering frame \messiah_UNMANAGED_PROJECTS_\Test\RENDERS\test_beauty_0007.jpg
0: INFO: Lightning: Rendering 1 render elements
0: INFO: Lightning: Checking output paths
0: INFO: Lightning: Checking default actions
0: INFO: Lightning: TYPE_MISSING_EXTERNAL_FILES = IGNORE
0: INFO: Lightning: TYPE_MISSING_UVWS = IGNORE
0: INFO: Lightning: TYPE_MISSING_DLL_FILES = FAIL
0: INFO: Lightning: TYPE_MISSING_XREF_FILES = IGNORE
0: INFO: Lightning: Calling renderer
0: INFO: Preparing to Render…
0: INFO: Updating Scene…
0: INFO: Loading Map …
0: INFO: Loading Map: brick04.jpg
0: INFO: Loading Map: concrete07.jpg
0: INFO: Updating Scene…
0: INFO: Updating Materials …
0: INFO: Updating Object
0: INFO: Updating Object: Plane01
0: INFO: Updating Object: Sphere01
0: INFO: Updating Object: Sphere02
0: INFO: Updating Motion Blur Data …
0: INFO: Updating Instance
0: INFO: Updating Instance: Plane01
0: INFO: Updating Instance: Sphere01
0: INFO: Updating Instance: Sphere02
0: INFO: Updating Lights:
0: INFO: Updating Lights: GI-Engine
0: INFO: Updating Lights: Sun-Light
0: INFO: Updating Objects…
0: INFO: Updating Intersection-Engine…
0: INFO: Collecting Toon Lines …
0: INFO: Subdividing…
0: INFO: Updating RenderElements…
0: INFO: Update Plugins …
0: INFO: Loading GI Solution …
0: INFO: Update Plugins …
0: INFO: Loading GI Solution …
0: INFO: Update Plugins …
0: INFO: Loading GI Solution …
0: INFO: Update Plugins …
0: INFO: Loading GI Solution …
0: INFO: Update Plugins …
0: INFO: Loading GI Solution …
0: INFO: Update Plugins …
0: INFO: Remapping Samples …
0: INFO: Update Plugins …
0: INFO: Preparing Maps…
0: INFO: Global Illumination prepass 1 of 1…
0: INFO: Frame Finished
0: INFO: Lightning: CallCurRendererRenderFrame returned code 1
0: INFO: Lightning: Render done
0: INFO: Lightning: writing output to temp file: C:\Users\loocas\AppData\Local\Temp\test_beauty_0007.jpg
0: INFO: Lightning: Saved image to \messiah_UNMANAGED_PROJECTS_\Test\RENDERS\test_beauty_0007.jpg
0: INFO: Lightning: Checking render elements
0: INFO: Lightning: writing output to temp file: C:\Users\loocas\AppData\Local\Temp\test_BallsAlpha_0007.jpg
0: INFO: Lightning: Saved image to \messiah_UNMANAGED_PROJECTS_\Test\RENDERS\test_BallsAlpha_0007.jpg

Log Details
Log Date/Time = Oct 03/10 20:32:14
Frames = 7-7

Slave Machine = Duber-slave-04
Slave Version = v4.1.0.42706 R

Plugin Name = 3dsmax

So, basically, it finished, but it seems as if Deadline wasn’t notified about the change?

I have to say this is the first time I’ve bumped into this issue and the only difference from previous jobs was that I used a Inline PostLoad script: ATSOps.RetargetCommonRoot @"\messiah_UNMANAGED_PROJECTS_\Test"

That’s it.

Oh, another update.

The Job just finished, after waiting about 10 minutes and it reports the render times of the frames correctly, even though it was stuck for about 10 minutes on some of the frames?

WTH?! :open_mouth:

It gets stuck even on a different plugin Job (namely Nuke).

Same deal. :frowning:

Btw: I’m running Pulse, if it makes any difference.

Just tested submitting a SINGLE JOB to a SINGLE NODE!

This is beyond weird!

We’ve seen this type of problem happen when running Pulse and when Pulse runs into permission problems. Can you give us a bit more detail about your setup? Specifically:

  1. What OS do you have the repository installed on?
  2. Do you have Pulse running on the same machine as the repository?
    i) If not, which OS is Pulse running on?
    ii) If so, is Pulse connecting to the repository via a local path (ie: c:\program files\prime focus\deadlinerepository)?

Cheers,

  • Ryan
  • Windows Server 2008 x86
  • Yes, same server.
  • Honestly, I don’t know. I thought it was connecting to the \serverName\deadline path, but it seems (screenshot attached) it’s running locally.

Thanks for the info. The first thing to try is to have Pulse use the UNC path to connect to the repository. Just use the Launcher in the system tray to change the repository path and then restart Pulse. If the problem still occurs after making this change, then please post the slave log for the session where it appears to be processing many tasks at the same time and we can look through it to see if any problems stand out.

Cheers,

  • Ryan

Will do, as soon as the current bunch is done on the server.

Thank you!

It seems, after upgrade to 4.1 SP1, the problem is gone. At least for now. I’ll see if it still does report incorrectly.

By the way, even after changing the repository option for Remote Administration, I still can’t remotely restart slaves etc…

I get this error:

Machine Name Command Timestamp Status Results
Duber-slave-01 LaunchSlave 10/06 22:39 Failed No connection could be made because the target machine actively refused it 192.168.0.201:5042

There is no firewall restriction on the local network, by the way, so the port is open.

Also, I should probably mention that I’m trying to operate the Nodes remotely, I mean, via VPN from home on a completely different network. If it changes anything.

Is there a firewall running on the remote network that would block this communication?

No, the Firewall is configured to let Deadline services through. As I said, I manage my jobs from home without issues, except for firing up the slaves. However, locally, i.e. running the remote administration from within the company network (from another slave or one of the workstations) all seems to be working fine. I just restarted the slaves successfully. It only doesn’t work via VPN from my home office.

But, since it works at least this way, it’s not a big deal actually.

Thank you again for your support, much appretiated!

Nope, the problem is still here. Just seeing 11 active nodes while in fact I only have 5 nodes in the farm and further on only 5 licenses.

Can it have anything to do with this error message I received from Deadline?

Slave Duber-slave-03 encountered an error when attempting to update its *.slaveInfo file.

Slave Info:
Slave Version: v4.1.0.43205 R
Start Date/Time: 10/06/2010
Current Time: 2010-10-08 14:58:57

Error:

Exception: Failed to update slaveInfo: The process cannot access the file because it is being used by another process.

It’s hard to say. I would have to look at the slave log from the session where it picked up multiple tasks. If you could post it, that would be a big help.

Thanks,

  • Ryan

Is it this one?:

=======================================================
Log Message

0: Task timeout is disabled.
0: Plugin rendering frame(s): 256
0: INFO: Render Tasks called
0: INFO: Lightning: Render frame 256
0: INFO: Lightning: Rendering frame \messiah_UNMANAGED_PROJECTS_\Orloj\SHOTS\BELL\RENDERS\V02\WALL_Beauty_4K_v02_0256.tif
0: INFO: Lightning: Rendering 0 render elements
0: INFO: Lightning: Checking output paths
0: INFO: Lightning: Checking default actions
0: INFO: Lightning: TYPE_MISSING_EXTERNAL_FILES = IGNORE
0: INFO: Lightning: TYPE_MISSING_UVWS = IGNORE
0: INFO: Lightning: TYPE_MISSING_DLL_FILES = FAIL
0: INFO: Lightning: TYPE_MISSING_XREF_FILES = IGNORE
0: INFO: Lightning: Calling renderer
0: INFO: Preparing to Render…
0: INFO: Updating Scene…
0: INFO: Loading Map …
0: INFO: Loading Map: backShit.jpg
0: INFO: Loading Map: backShit.jpg
0: INFO: Loading Map: backShit.jpg
0: INFO: Loading Map: backShit.jpg
0: INFO: Loading Map: backShit.jpg
0: INFO: Loading Map: backShit.jpg
0: INFO: Loading Map: backShit.jpg
0: INFO: Loading Map: backShit.jpg
0: INFO: Loading Map: frontShit.jpg
0: INFO: Loading Map: frontShit.jpg
0: INFO: Loading Map: frontShit.jpg
0: INFO: Loading Map: backShit.jpg
0: INFO: Loading Map: backShit.jpg
0: INFO: Loading Map: backShit.jpg
0: INFO: Loading Map: backShit.jpg
0: INFO: Loading Map: backShit.jpg
0: INFO: Loading Map: backShit.jpg
0: INFO: Loading Map: backShit.jpg
0: INFO: Loading Map: backShit.jpg
0: INFO: Loading Map: frontShit.jpg
0: INFO: Loading Map: frontShit.jpg
0: INFO: Loading Map: frontShit.jpg
0: INFO: Updating Scene…
0: INFO: Updating Materials …
0: INFO: Updating Object
0: INFO: Updating Object: TP
0: INFO: Updating Motion Blur Data …
0: INFO: Updating Instance
0: INFO: Updating Instance: TP
0: INFO: Updating Lights:
0: INFO: Updating Lights: Omni01
0: INFO: Updating Lights: Omni03
0: INFO: Updating Lights: Omni04
0: INFO: Updating Lights: Omni05
0: INFO: Updating Lights: Omni06
0: INFO: Updating Lights: Omni07
0: INFO: Updating Lights: Direct01
0: INFO: Updating Objects…
0: INFO: Updating Intersection-Engine…
0: INFO: Collecting Toon Lines …
0: INFO: Subdividing…
0: INFO: Updating ImageSampler…
0: INFO: Update Plugins …
0: INFO: Preparing Maps…
0: INFO: Frame Finished
0: INFO: Lightning: CallCurRendererRenderFrame returned code 1
0: INFO: Lightning: Render done
0: INFO: Lightning: writing output to temp file: C:\Users\loocas\AppData\Local\Temp\WALL_Beauty_4K_v02_0256.tif
0: INFO: Lightning: Saved image to \messiah_UNMANAGED_PROJECTS_\Orloj\SHOTS\BELL\RENDERS\V02\WALL_Beauty_4K_v02_0256.tif
0: INFO: Lightning: Checking render elements

=======================================================
Log Details

Log Date/Time = Oct 08/10 15:15:08
Frames = 256-256

Slave Machine = Vfx02
Slave Version = v4.1.0.43205 R

Plugin Name = 3dsmax

That’s from a particular job. We need the full slave log, which you can find by going to the slave machine, and selecting Help -> Explore Log Folder from the Slave UI. Just make sure to grab a log from the session where the slave picked up multiple tasks. Also, please post it as an attachment, since I’m sure it will be fairly long. :slight_smile:

Yeah, sorry for that :slight_smile:

Here are the logs that seem relevant (date/time wise), hope this helps you out.
deadlineslave(Vfx02)-2010-10-08-0000.log (532 KB)
deadlinelauncher(Vfx02)-2010-10-08-0000.log (5.49 KB)
3dsmaxConnect.log (14.8 KB)


task_535(Vfx02)-0000.log (2.2 KB)

Thanks for the log. So I went through it and I’m seeing a few of these statements:

This would seem to imply that the Slave is having problems connecting to the repository. Normally, if a connection is severed completely, the slave won’t assume the task has been requeued because it determines that the entire repository folder can’t be reached. However, if the slave only loses partial connectivity (ie: it can’t see this task file, but it can see the job folder and the repository folder), then all it can do is assume someone requeued the task. You should check to see if you’re having any network issues, because that could definitely be the source of this problem.

Cheers,

  • Ryan

Thanks for looking at this problem.

However, I can’t see any issues on the network. It’s a small (8 PCs total) local network on a 1Gb line. :frowning:

I have Pulse running though and I’m submitting from an external PC over a narrow 4Mbit line (it’s a pain, btw), but that’s about it. All the render machines and the repository are on the same local network.