AWS Thinkbox Discussion Forums

artists coming in -> housecleaning blocked

In the mornings, as artists come in and shut down the slaves running on their workstations (usually by hard killing them like a boss), we get a lot of stalled machines all at once.
This usually halts housecleaning for ~2 hours or so until it ‘releases’ everything related to the stalled machines:

[Attached image: goodmorning.PNG]

The cleanup cycle for just the stalled machines lasted:

2014-09-26 09:56:18: Performing Stalled Slave Scan…

2014-09-26 11:53:28: Stalled Slave Scan - Cleaned up 389 stalled slaves in 1.953 hrs

And this was with a secondary housecleaning process running on another box, also looping to clean out the stalled boxes, so Pulse had help :)
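As a rough sanity check on those two log lines, the timestamps can be differenced directly. This is only a minimal sketch (the helper name `scan_stats` is made up for illustration); it reproduces the reported duration and works out the average time spent per stalled slave.

```python
from datetime import datetime

# Timestamp layout used at the start of each log line in this thread.
LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S"

def scan_stats(start, end, slave_count):
    """Return (elapsed hours, average seconds per slave) for one scan."""
    elapsed = (datetime.strptime(end, LOG_TIME_FORMAT)
               - datetime.strptime(start, LOG_TIME_FORMAT))
    seconds = elapsed.total_seconds()
    return seconds / 3600.0, seconds / slave_count

# Values copied from the two Stalled Slave Scan lines above.
hours, per_slave = scan_stats("2014-09-26 09:56:18", "2014-09-26 11:53:28", 389)
print("%.3f hrs total, %.1f s per stalled slave" % (hours, per_slave))
# prints "1.953 hrs total, 18.1 s per stalled slave"
```

So each stalled slave is costing roughly 18 seconds of housecleaning time, which is why a few hundred of them arriving at once blocks everything else for hours.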

Is it possible for you guys to shut down the slave gracefully instead of hard killing it? Either the artist clicks the [x] button in the slave to close it, or you change the way it is killed from another process to run “deadlineslave -s” instead of killing the process. Both ways will allow the slave to requeue the task, clean up what it’s doing, and close gracefully. There is a good chance this will reduce the number of cases where your job states get messed up.

Cheers,
Ryan

Our official shutdown tool (which artists should in theory be using most of the time) does the following:

c:\Program Files\Thinkbox\Deadline6\bin\deadlineslave.exe -shutdown
or
c:\Program Files\Thinkbox\Deadline6\bin\deadlineslave.exe -name -shutdown

(if there is a secondary box, we might specify which slave to shut down)

We then wait ~20 seconds for the action to go through. There is no prockill built into our tool. Artists might get impatient, but as long as they use our internal shutdown tool, it's a graceful shutdown.
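For reference, the behaviour described above boils down to something like the following. This is a simplified sketch, not the actual tool: the function names are hypothetical, the executable path and the `-shutdown` flag are the ones quoted above, and the variant that names a specific slave is left out. The `run` and `sleep` parameters exist only so the logic can be exercised without a Deadline install.

```python
import subprocess
import time

# Path as quoted in this post; adjust for your own install.
DEADLINE_SLAVE = r"c:\Program Files\Thinkbox\Deadline6\bin\deadlineslave.exe"

def build_shutdown_command(exe=DEADLINE_SLAVE):
    """Build the graceful shutdown command line (no process kill involved)."""
    return [exe, "-shutdown"]

def request_shutdown(run=subprocess.call, sleep=time.sleep, wait_seconds=20):
    """Ask the slave to shut down, then give it time to finish.

    `run` and `sleep` are injectable (an assumption of this sketch) so the
    function can be tested with stand-ins instead of the real executable.
    """
    exit_code = run(build_shutdown_command())
    sleep(wait_seconds)  # nothing is killed after the wait; we only pause
    return exit_code
```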

I don't think this shutdown process always cleans up after itself.

If I click the little x to close the slave, I get the following in the logs:

2014-09-26 15:04:26: 0: Shutdown
2014-09-26 15:04:26: 0: Shutdown
2014-09-26 15:04:27: 0: Shutdown
2014-09-26 15:04:27: 0: Shutdown
2014-09-26 15:04:27: 0: Exited ThreadMain(), cleaning up…
2014-09-26 15:04:28: Scheduler Thread - shutdown complete

However, when using the deadlineslave.exe -shutdown command, there is no cleaning up step:
(last lines of the logs)

case 1 - midway through returning limit stubs after a failed attempt to dequeue something:

2014-09-26 15:11:50:  Waiting for threads to quit.  1 seconds until forced shutdown.
2014-09-26 15:11:51:  Waiting for threads to quit.  0 seconds until forced shutdown.
2014-09-26 15:11:51:  Info Thread: Stopped
2014-09-26 15:11:51:  Scheduler Thread: ShuttingDown / DequeuingTask
2014-09-26 15:11:51:  Render Threads: 
2014-09-26 15:11:51:  Forcing shutdown.
2014-09-26 15:11:51:  Exception Details
2014-09-26 15:11:51:  Exception -- One or more threads failed to quit successfully.
2014-09-26 15:11:51:  Exception.Data: ( )
2014-09-26 15:11:51:  Exception.HResult: -2146233088
2014-09-26 15:11:51:    Exception.StackTrace: 
2014-09-26 15:11:51:      (null)
2014-09-26 15:11:51:  Slave - slave shutdown: forced

resulted in stalled machine

case 2 - midway through returning limit stubs after a failed attempt to dequeue something:

2014-09-26 15:23:58:  Waiting for threads to quit.  7 seconds until forced shutdown.
2014-09-26 15:23:59:  Waiting for threads to quit.  6 seconds until forced shutdown.
2014-09-26 15:24:00:  Waiting for threads to quit.  5 seconds until forced shutdown.
2014-09-26 15:24:01:  Waiting for threads to quit.  4 seconds until forced shutdown.
2014-09-26 15:24:02:  Waiting for threads to quit.  3 seconds until forced shutdown.
2014-09-26 15:24:02:  Scheduler -   returning 5425db99162dfe13ac7f0375
2014-09-26 15:24:03:  Waiting for threads to quit.  2 seconds until forced shutdown.
2014-09-26 15:24:04:  Scheduler -   returning 5425dbb3162dfe41d4ab34a5
2014-09-26 15:24:04:  Waiting for threads to quit.  1 seconds until forced shutdown.

resulted in stalled machine

An example of a proper cleanup when using the command line:
case 4: (mid Maya render)

2014-09-26 15:18:38:  0: Shutdown
2014-09-26 15:18:38:  0: Shutdown
2014-09-26 15:18:39:  0: Shutdown
2014-09-26 15:18:39:  0: Shutdown
2014-09-26 15:18:39:  Waiting for threads to quit.  15 seconds until forced shutdown.
2014-09-26 15:18:39:  0: Shutdown
2014-09-26 15:18:39:  0: Exited ThreadMain(), cleaning up...
2014-09-26 15:18:40:  Scheduler Thread - shutdown complete

resulted in offline machine
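The outcomes above can be told apart from the tail of the slave log. A minimal sketch, assuming the marker strings seen in these excerpts stay the same across versions (that is an assumption, and `classify_shutdown` is a hypothetical helper, not part of Deadline):

```python
# Marker strings copied from the log excerpts in this post.
FORCED_MARKERS = ("Forcing shutdown", "slave shutdown: forced")
CLEAN_MARKER = "Scheduler Thread - shutdown complete"

def classify_shutdown(log_lines):
    """Classify the tail of a slave log as 'forced', 'clean' or 'incomplete'."""
    text = "\n".join(log_lines)
    if any(marker in text for marker in FORCED_MARKERS):
        return "forced"      # threads did not quit in time (case 1)
    if CLEAN_MARKER in text:
        return "clean"       # cleanup ran to completion (case 4)
    return "incomplete"      # log just stops mid-countdown (case 2)

example_tail = [
    "2014-09-26 15:18:39:  0: Exited ThreadMain(), cleaning up...",
    "2014-09-26 15:18:40:  Scheduler Thread - shutdown complete",
]
print(classify_shutdown(example_tail))  # prints "clean"
```

In the cases above, "forced" and "incomplete" both ended with a stalled machine, while "clean" ended with the machine correctly marked offline.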
