AWS Thinkbox Discussion Forums

OnLastTaskComplete RestartSlave restarting slave mid-render?

Hi there,

We are upgrading to a later version of Deadline, and have been sending OnLastTaskComplete RestartSlave commands to slaves in groups.

We have noticed, however, that at least in some cases Deadline doesn't wait for the task to complete: after a while it simply kills the slave whether it's rendering or not…

The slave log (a FumeFX sim that had been running for 20+ hours):

2016-04-13 11:23:36:  Listener Thread - ::ffff:172.18.10.187 has connected
2016-04-13 11:23:36:  Listener Thread - Received message: SlaveStillRunning
2016-04-13 11:23:40:  0: INFO: scLog: info ::oFumeFXReturnFrames.returnFrames()
2016-04-13 11:23:40:  0: INFO: scLog: info No FumeFX files found to return, skipping
2016-04-13 11:23:40:  0: STDOUT: FumeFX: I/O done in 668.41 sec
2016-04-13 11:23:40:  0: STDOUT: FumeFX: Saving cache data OK.
2016-04-13 11:23:40:  0: STDOUT: Wed Apr 13 11:23:40 2016 (+1305938ms) :(3dsmax): ::oFumeFXReturnFrames.returnFrames()
2016-04-13 11:23:40:  0: STDOUT: Wed Apr 13 11:23:40 2016 (+291ms)   :(3dsmax): No FumeFX files found to return, skipping
2016-04-13 11:23:40:  0: STDOUT: Wed Apr 13 11:23:40 2016 (+1ms)     :(3dsmax): 1079f took 1306232 ms
2016-04-13 11:23:40:  0: STDOUT: Wed Apr 13 11:23:40 2016 (+0ms)     :(3dsmax):  slidertime: 1077f anim range: 950f 1220f
2016-04-13 11:23:40:  0: STDOUT: FumeFX: Frame time: 21:46.31
2016-04-13 11:23:40:  0: STDOUT: FumeFX: Estimated Time: 61:26:47
2016-04-13 11:23:40:  0: STDOUT: FumeFX:  
2016-04-13 11:23:40:  0: STDOUT: FumeFX: Frame: 1078
2016-04-13 11:23:40:  0: STDOUT: FumeFX: Vorticity Strength Effector is active (vort)
2016-04-13 11:23:40:  0: INFO: Ignoring popup "FumeFX: Frame 1078 (950 - 1220)"
2016-04-13 11:23:40:  0: INFO: scLog: info 1079f took 1306232 ms
2016-04-13 11:23:40:  0: INFO: scLog: info  slidertime: 1077f anim range: 950f 1220f
2016-04-13 11:23:41:  Listener Thread - ::ffff:172.18.10.187 has connected
2016-04-13 11:23:41:  Listener Thread - Received message: SlaveStillRunning
2016-04-13 11:23:42:  0: STDOUT: FumeFX: Vmax=3.430, dtReq=0.36, steps=1, CFL'=13.72
2016-04-13 11:23:46:  Listener Thread - ::ffff:172.18.10.187 has connected
2016-04-13 11:23:46:  Listener Thread - Received message: SlaveStillRunning
2016-04-13 11:23:51:  Listener Thread - ::ffff:172.18.10.187 has connected
2016-04-13 11:23:51:  Listener Thread - Received message: SlaveStillRunning
2016-04-13 11:23:56:  Listener Thread - ::ffff:172.18.10.187 has connected
2016-04-13 11:23:56:  Listener Thread - Received message: SlaveStillRunning
2016-04-13 11:24:01:  Listener Thread - ::ffff:172.18.10.187 has connected
2016-04-13 11:24:01:  Listener Thread - Received message: SlaveStillRunning
2016-04-13 11:24:06:  Listener Thread - ::ffff:172.18.10.187 has connected

These Listener Thread lines keep repeating every 5 seconds until:

2016-04-13 11:30:20:  Listener Thread - Received message: SlaveStillRunning
2016-04-13 11:30:25:  Listener Thread - ::ffff:172.18.10.187 has connected
2016-04-13 11:30:25:  Listener Thread - Received message: SlaveStillRunning
2016-04-13 11:31:50:  Listener Thread - ::ffff:172.18.10.187 has connected
2016-04-13 11:31:50:  Listener Thread - Received message: SlaveStillRunning
2016-04-13 11:32:11:  0: STDOUT: FumeFX: Resizing: from (994,965,560) to (1001,965,560)
2016-04-13 11:32:11:  0: STDOUT: FumeFX: Memory used: 21666Mb
2016-04-13 11:32:19:  Listener Thread - ::ffff:172.18.10.187 has connected
2016-04-13 11:32:19:  Listener Thread - Received message: StopSlave
2016-04-13 11:32:19:  Slave - slave shutdown: normal
2016-04-13 11:32:19:  Info Thread - requesting slave info thread quit.
2016-04-13 11:32:20:  Scheduler Thread - Seconds before next job scan: 2
2016-04-13 11:32:20:  Scheduler - Returning all limit stubs.
2016-04-13 11:32:20:  Scheduler -   returning mx_fumefxsim
2016-04-13 11:32:20:  0: Shutdown

In the launcher log:

2016-04-13 10:23:39:  BEGIN - LAPRO0648\scanlinevfx
2016-04-13 10:23:39:  ::ffff:172.18.8.91 has connected
2016-04-13 10:23:40:  Launcher Thread - Received command: OnLastTaskComplete RestartSlave ~
2016-04-13 10:23:40:  Local version file: C:\Program Files\Thinkbox\Deadline7\bin\Version
2016-04-13 10:23:40:  Network version file: \\inferno2\deadline\repository7\bin\Windows\64bit\Version
2016-04-13 10:23:40:  Comparing version files...
2016-04-13 10:23:40:  Version files are different
2016-04-13 10:23:40:  Sending command to slave: OnLastTaskComplete RestartSlave
2016-04-13 10:23:40:  Got reply: LAPRO0648: Sent "OnLastTaskComplete RestartSlave" command. Result: "Connection Accepted.
2016-04-13 10:23:40:  "
2016-04-13 10:23:40:  Sending command to slave: OnLastTaskComplete RestartSlave
2016-04-13 10:23:40:  Got reply: LAPRO0648: Sent "OnLastTaskComplete RestartSlave" command. Result: "Connection Accepted.
2016-04-13 10:23:40:  "
2016-04-13 10:23:40:  Launcher Thread - OnLastTaskComplete: Set to RestartSlave
2016-04-13 10:23:40:  Launcher Thread - Responded with: Success|
2016-04-13 10:23:44:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2016-04-13 10:23:44:  Launcher Thread - OnLastTaskComplete: Slave  is still running
2016-04-13 10:23:49:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2016-04-13 10:23:49:  Launcher Thread - OnLastTaskComplete: Slave  is still running
2016-04-13 10:23:54:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2016-04-13 10:23:54:  Launcher Thread - OnLastTaskComplete: Slave  is still running
2016-04-13 10:23:59:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2016-04-13 10:23:59:  Launcher Thread - OnLastTaskComplete: Slave  is still running
2016-04-13 10:24:04:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown

The “Slave is still running” message keeps repeating every 5 seconds, until it seems like Deadline has had enough of waiting:

...
2016-04-13 11:30:25:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2016-04-13 11:30:27:  Launcher Thread - OnLastTaskComplete: Slave  is still running
2016-04-13 11:30:33:  Launcher Thread - OnLastTaskComplete: Checking if all slaves have shutdown
2016-04-13 11:32:07:  Local version file: C:\Program Files\Thinkbox\Deadline7\bin\Version
2016-04-13 11:32:07:  Network version file: \\inferno2\deadline\repository7\bin\Windows\64bit\Version
2016-04-13 11:32:07:  Comparing version files...
2016-04-13 11:32:07:  Version files are different
2016-04-13 11:32:10:  Upgrade detected
2016-04-13 11:32:10:  Sending command to slave: StopSlave 
2016-04-13 11:32:19:  Got reply: LAPRO0648: Sent "StopSlave" command. Result: "Connection Accepted.
2016-04-13 11:32:19:  "
2016-04-13 11:32:46:  Sending command to slave: StopSlave 
2016-04-13 11:32:46:  Got reply: LAPRO0648: Sent "StopSlave" command. Result: "Connection Accepted.
2016-04-13 11:32:46:  "
2016-04-13 11:32:50:  No Slave to shutdown
2016-04-13 11:32:50:  No Monitor to shutdown
2016-04-13 11:32:50:  No Pulse to shutdown
2016-04-13 11:32:50:  No Web Service to shutdown
2016-04-13 11:32:51:  Copying bin files to bootstrap folder S:\TMP\LauncherBootstrap

Interrupting the sim…
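Reading the two logs side by side, it looks like the launcher's periodic version check eventually overrides the OnLastTaskComplete wait: the polite "is the slave still running" poll and the auto-upgrade path appear to be two separate code paths, and once the second one decides an upgrade is pending it just sends a StopSlave. Purely as an illustration of what that would look like (this is not Deadline's actual code; every name, interval and condition below is a guess):

# Illustrative reconstruction of what the two launcher code paths *appear*
# to be doing, based on the logs above. Not Deadline source; all names,
# intervals and conditions are guesses.
import threading
import time


class FakeSlave:
    """Stand-in for the real Slave process (hypothetical)."""

    def __init__(self):
        self.rendering = True                # the 20+ hour FumeFX task

    def is_running(self):
        return self.rendering

    def send(self, command):
        print("Sending command to slave: " + command)
        if command == "StopSlave":
            self.rendering = False           # sim interrupted mid-frame


def on_last_task_complete_wait(slave):
    # The polite path: poll every ~5 s and only carry on once the slave
    # reports it is no longer running a task.
    while slave.is_running():
        print("OnLastTaskComplete: Slave is still running")
        time.sleep(5)


def auto_upgrade_check(slave):
    # The other path: the periodic repository version compare. Once it
    # decides an upgrade is pending it sends StopSlave, apparently without
    # consulting the OnLastTaskComplete wait above.
    time.sleep(12)                           # pretend the repository was just upgraded
    print("Version files are different")
    print("Upgrade detected")
    slave.send("StopSlave")


if __name__ == "__main__":
    slave = FakeSlave()
    threading.Thread(target=auto_upgrade_check, args=(slave,), daemon=True).start()
    on_last_task_complete_wait(slave)        # exits as soon as StopSlave lands,
    print("0: Shutdown")                     # ...with the frame range unfinished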

Hmm, it could be that the task is getting finished, but that the Slave is being closed before it properly updates its status and releases the task. Can you confirm whether or not the output for these tasks was properly generated? Either way, we will look into this, as it appears to be a bug.
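If it helps, a quick check like the one below will tell you which frames actually made it to disk. The cache directory and file name pattern are placeholders (substitute the actual FumeFX output settings for that job); the frame range comes from the slave log above (950 to 1220):

# Quick check for missing cache frames. CACHE_DIR and CACHE_NAME are
# placeholders: point them at the sim's real output path and cache name.
import os

CACHE_DIR = r"\\server\path\to\fumefx\cache"    # placeholder
CACHE_NAME = "sim"                              # placeholder base name
FIRST_FRAME, LAST_FRAME = 950, 1220             # range from the slave log

missing = [
    frame
    for frame in range(FIRST_FRAME, LAST_FRAME + 1)
    if not os.path.isfile(os.path.join(CACHE_DIR, "%s_%04d.fxd" % (CACHE_NAME, frame)))
]

if missing:
    print("%d frame(s) missing, e.g. %s" % (len(missing), missing[:10]))
else:
    print("All frames present: the task output was generated.")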

No, the task doesn't appear to have been finished. That's actually how we found out about the issue: the artist complained that his sim keeps requeueing. Looking at the logs, we found that the machines were prematurely executing the ‘after last task complete’ command, or at least so it seems…
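For anyone who wants to check their own farm for this, the thing to look for in the slave logs is a StopSlave that arrives while the render thread ("0:") is still producing output. A rough sketch, with the log folder and file pattern as placeholders; it simply prints both lines so the timestamps can be compared by eye:

# Flag slave logs where a StopSlave shows up after render-thread output.
# LOG_DIR and the file pattern are placeholders and may differ per install.
import glob
import os

LOG_DIR = r"C:\ProgramData\Thinkbox\Deadline7\logs"   # placeholder

for path in glob.glob(os.path.join(LOG_DIR, "deadlineslave*.log")):
    last_render_line = None
    stop_line = None
    with open(path, errors="ignore") as handle:
        for line in handle:
            if "0: STDOUT:" in line or "0: INFO:" in line:
                last_render_line = line.rstrip()
            elif "Received message: StopSlave" in line:
                stop_line = line.rstrip()
    # If the last render-thread output sits right before the StopSlave,
    # the task was most likely killed mid-frame rather than completed.
    if stop_line and last_render_line:
        print(os.path.basename(path))
        print("  last render output: " + last_render_line)
        print("  stop command:       " + stop_line)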

Hey Laszlo,

Are you still experiencing this behavior? If so, would you be able to provide some steps to reproduce? We are having difficulty reproducing this error on our end.

Hey Laszlo,

Are you still experiencing this issue?

Hi Ryan,

Sorry, I was out for a few weeks (months). We have since switched to Deadline 8; I am not aware of recent reports of this issue.

cheers
l
