Hi! Yesterday I ran into a strange issue where 70 Workers suddenly shut down and then tried to connect to the Deadline Repository without success.
The issue didn’t affect the whole render farm: 4/5 of the Workers were still running.
It happened after an OnLastTaskComplete, RestartSlave command was sent to every Worker.
The affected Workers kept trying to reconnect for 1 hour until I restarted the machines, and then it worked.
My first assumption would be a network or Repository connectivity issue. However, I haven’t found any obvious evidence of network failures or Repository issues during the incident.
Has anyone encountered a similar situation where a large number of Workers received a RestartSlave command and then failed to reconnect to the Repository until the machines were rebooted?
The Deadline Launcher logs look like this (for more than 1 hour):
2026-06-10 11:38:28: Launcher Thread - OnLastTaskComplete: Checking if all Workers have shutdown
2026-06-10 11:38:28: Launcher Thread - OnLastTaskComplete: Worker is still running
2026-06-10 11:38:33: Launcher Thread - OnLastTaskComplete: Checking if all Workers have shutdown
2026-06-10 11:38:33: Launcher Thread - OnLastTaskComplete: Worker is still running
2026-06-10 11:38:38: Launcher Thread - OnLastTaskComplete: Checking if all Workers have shutdown
2026-06-10 11:38:38: Launcher Thread - OnLastTaskComplete: Worker is still running
2026-06-10 11:38:43: Launcher Thread - OnLastTaskComplete: Checking if all Workers have shutdown
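
In case it helps to compare machines, here is a rough Python sketch for measuring how long a Launcher stayed in this "Worker is still running" loop. It only assumes the line format shown in the excerpt above; the function name `still_running_span` is made up for illustration and is not part of Deadline.

```python
import re
from datetime import datetime

# Matches Launcher lines in the format shown above, e.g.
# 2026-06-10 11:38:28: Launcher Thread - OnLastTaskComplete: Worker is still running
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}): "
    r"Launcher Thread - OnLastTaskComplete: (.+)$"
)


def still_running_span(lines):
    """Return (first, last, count) for 'Worker is still running' entries.

    `lines` is any iterable of log lines. Returns None if no entry matches.
    Hypothetical helper: it assumes the timestamp format from the excerpt.
    """
    stamps = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and match.group(2) == "Worker is still running":
            stamps.append(datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S"))
    if not stamps:
        return None
    return min(stamps), max(stamps), len(stamps)
```

Passing it the lines of a Launcher log from an affected machine (for example `still_running_span(open(path))`, where `path` is wherever your Launcher log lives) gives the first and last timestamp of the loop, so `last - first` shows how long that machine was stuck.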