Whole farm crash

Hi! Yesterday I ran into a strange issue where 70 Workers were suddenly shut down and then failed to reconnect to the Deadline Repository.
The issue didn't affect the whole render farm: 4 or 5 Workers were still running.

This happened after an OnLastTaskComplete RestartSlave command was sent to every Worker.

The Workers kept trying to reconnect for an hour until I restarted the machines, after which they worked again.

My first assumption would be a network or Repository connectivity issue. However, I haven't found any obvious evidence of network failures or Repository issues during the incident.

Has anyone encountered a similar situation where a large number of Workers received a RestartSlave command and then failed to reconnect to the Repository until the machines were rebooted?

The Deadline Launcher logs look like this (for more than an hour):

2026-06-10 11:38:28: Launcher Thread - OnLastTaskComplete: Checking if all Workers have shutdown
2026-06-10 11:38:28: Launcher Thread - OnLastTaskComplete: Worker is still running
2026-06-10 11:38:33: Launcher Thread - OnLastTaskComplete: Checking if all Workers have shutdown
2026-06-10 11:38:33: Launcher Thread - OnLastTaskComplete: Worker is still running
2026-06-10 11:38:38: Launcher Thread - OnLastTaskComplete: Checking if all Workers have shutdown
2026-06-10 11:38:38: Launcher Thread - OnLastTaskComplete: Worker is still running
2026-06-10 11:38:43: Launcher Thread - OnLastTaskComplete: Checking if all Workers have shutdown
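To measure how long a given machine sat in this loop, a rough sketch like the one below could scan a Launcher log and report the first and last "Worker is still running" timestamps. It assumes every line follows the exact format in the excerpt above; the function name shutdown_wait_span is hypothetical, and locating the log file on disk is left out.

```python
import re
from datetime import datetime

# Assumption: lines look exactly like the excerpt above, e.g.
# "2026-06-10 11:38:28: Launcher Thread - OnLastTaskComplete: Worker is still running"
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}): Launcher Thread - "
    r"OnLastTaskComplete: Worker is still running"
)


def shutdown_wait_span(log_text):
    """Return (first, last, count) for 'Worker is still running' lines, or None.

    first/last are the datetimes of the first and last matching lines;
    count is how many times the Launcher reported the Worker as still running.
    """
    stamps = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            stamps.append(datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S"))
    if not stamps:
        return None
    return stamps[0], stamps[-1], len(stamps)


if __name__ == "__main__":
    sample = """\
2026-06-10 11:38:28: Launcher Thread - OnLastTaskComplete: Checking if all Workers have shutdown
2026-06-10 11:38:28: Launcher Thread - OnLastTaskComplete: Worker is still running
2026-06-10 11:38:33: Launcher Thread - OnLastTaskComplete: Checking if all Workers have shutdown
2026-06-10 11:38:33: Launcher Thread - OnLastTaskComplete: Worker is still running
2026-06-10 11:38:38: Launcher Thread - OnLastTaskComplete: Checking if all Workers have shutdown
2026-06-10 11:38:38: Launcher Thread - OnLastTaskComplete: Worker is still running
"""
    first, last, count = shutdown_wait_span(sample)
    print(f"{count} checks over {last - first}")  # prints: 3 checks over 0:00:10
```

Running it over the full log of an affected machine would show whether the Launcher was stuck waiting for the whole hour or whether the loop stopped and something else failed afterwards.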

Are there multiple Workers on the machine? If you run multiple Workers (for GPU affinity, etc.), it's possible that one Worker shut down while another was still running, so the shutdown didn't proceed.

Hi, thanks for your answer!

Some machines have multiple Workers, but the problem occurred on 60 different computers.

The log lines above come from a computer with only one Worker.

The problem hasn't recurred so far. If it happens again, I'll try to reproduce it to get more logs.

You should be able to check this on a single machine by running the shutdown-after-task-complete command on it. These were the lines that stood out:

  • Checking if all Workers have shutdown
  • Worker is still running

These logs made me think that was the issue, so it's worth checking whether there is a stale or crashed Worker process still running in the background. Is the Worker running as a service?
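One rough way to look for a stale Worker is to list the Worker processes still alive on a node while its Launcher keeps logging "Worker is still running". The sketch below assumes the Worker executable's name starts with deadlineworker (older Deadline versions used deadlineslave, so check Task Manager or ps on one of your nodes first); the function names are hypothetical. It relies only on tasklist on Windows and ps elsewhere.

```python
import csv
import io
import subprocess
import sys

# Assumption: the Worker process name starts with this prefix
# (e.g. "deadlineworker.exe" on Windows). Adjust it for your Deadline version.
WORKER_PREFIX = "deadlineworker"


def find_worker_processes(listing, windows, prefix=WORKER_PREFIX):
    """Return (pid, name) pairs whose process name starts with prefix.

    listing is the text output of `tasklist /fo csv /nh` on Windows,
    or of `ps -e -o pid= -o comm=` on other platforms.
    """
    found = []
    if windows:
        # tasklist CSV rows: "Image Name","PID","Session Name","Session#","Mem Usage"
        for row in csv.reader(io.StringIO(listing)):
            if len(row) >= 2 and row[0].lower().startswith(prefix):
                found.append((int(row[1]), row[0]))
    else:
        # ps rows: "<pid> <command name>", with the pid right-aligned
        for line in listing.splitlines():
            parts = line.split(None, 1)
            if len(parts) == 2 and parts[1].strip().lower().startswith(prefix):
                found.append((int(parts[0]), parts[1].strip()))
    return found


def list_processes():
    """Capture the local process listing; returns (listing_text, is_windows)."""
    windows = sys.platform.startswith("win")
    if windows:
        cmd = ["tasklist", "/fo", "csv", "/nh"]
    else:
        cmd = ["ps", "-e", "-o", "pid=", "-o", "comm="]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return out, windows
```

On an affected node, if find_worker_processes(*list_processes()) still returns a PID while the Launcher is looping on "Worker is still running", that would support the stale-process theory; an empty result would point elsewhere.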