Why do nodes and workstations get stalled? What is the cause, and how can it be fixed? Can anyone help with this?
Hello @Ayyappan1986
Thanks for reaching out. From the Worker page of the Deadline 10.2.1.1 documentation:
Workers become stalled when they don’t update their status for a long period of time, which is often an indication that the Worker has crashed. A stalled Worker isn’t necessarily a bad thing, because the Worker may simply not have been shut down properly (for example, it was killed from Task Manager). In either case, it’s a good idea to check the Worker machine and restart the Worker application if necessary.
To troubleshoot further, please check the following:
- Log in to the stalled Worker machine.
- Is the Worker running?
  - Check for the Worker executable among the running processes.
  - Check the Worker logs: Logs (Deadline 10.2.1.1 documentation).
  - Do the Worker logs show connection issues or crashes? If so, please share them.
- If the Worker is not running, why?
  - Check the launcher logs: is the launcher starting the Worker? Logs (Deadline 10.2.1.1 documentation).
  - Is the launcher running as a service? Look in Task Manager for `deadlinelauncherservice`, or run `ps aux | grep deadline` on Linux.
  - Is the Worker set to start with the launcher? Check this in the global Client Configuration file (Client Configuration, Deadline 10.2.1.1 documentation) and share that file with us.
  - Is "restart Worker if stalled" enabled? This is also set in the Client Configuration file.
- Is the machine rebooting or crashing? Check the system logs.
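A couple of the checks above can be scripted. The sketch below is a minimal illustration, not a Thinkbox tool: it lists processes whose command line mentions "deadline" (the same idea as `ps aux | grep deadline`, Linux only, via `/proc`), and reads two launcher settings from the text of a `deadline.ini` file. The key names `LaunchSlaveAtStartup` and `RestartStalledSlave` and the `[Deadline]` section are assumptions; confirm them against the Client Configuration documentation for your version.

```python
import configparser
import os


def list_process_names():
    """Return the command lines of running processes by scanning /proc (Linux only)."""
    if not os.path.isdir("/proc"):
        return []  # Not Linux: use Task Manager or `ps` instead.
    names = []
    for pid in os.listdir("/proc"):
        if not pid.isdigit():
            continue
        try:
            with open(f"/proc/{pid}/cmdline", "rb") as f:
                raw = f.read()
        except OSError:
            continue  # The process exited or is not readable; skip it.
        if raw:
            # Arguments are NUL-separated; join them with spaces for readability.
            names.append(raw.replace(b"\0", b" ").decode(errors="replace").strip())
    return names


def find_deadline_processes(names):
    """Like `grep -i deadline`: keep entries that mention 'deadline'."""
    return [n for n in names if "deadline" in n.lower()]


def launcher_flags(ini_text):
    """Read two launcher settings from deadline.ini text.

    ASSUMPTION: section and key names; verify them for your Deadline version.
    A missing key is reported as None.
    """
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.read_string(ini_text)
    return {
        "launch_worker_at_startup": cfg.getboolean(
            "Deadline", "LaunchSlaveAtStartup", fallback=None),
        "restart_stalled_worker": cfg.getboolean(
            "Deadline", "RestartStalledSlave", fallback=None),
    }


if __name__ == "__main__":
    for line in find_deadline_processes(list_process_names()):
        print(line)
```

If nothing Worker-related shows up while the launcher settings say it should start, the launcher logs are the next place to look.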
Is there a permanent fix for this issue?
That depends on what’s causing the stall.
A Worker is marked stalled when it doesn’t send the database a status update for 5 minutes.
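The stall rule can be pictured as a simple timeout check. This is a hypothetical sketch, not Deadline's own code: it treats a Worker as stalled once its last status update is five minutes old or older, using the figure from the reply above. The `last_updates` mapping and the Worker names are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Threshold taken from the reply above; treat it as an assumption for your farm.
STALL_THRESHOLD = timedelta(minutes=5)


def is_stalled(last_update, now=None, threshold=STALL_THRESHOLD):
    """True if the Worker has gone `threshold` or longer without a status update."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - last_update >= threshold


def stalled_workers(last_updates, now=None, threshold=STALL_THRESHOLD):
    """Sorted names of Workers whose last update is too old.

    `last_updates` is a hypothetical mapping of Worker name to the
    timezone-aware datetime of its most recent status update.
    """
    return sorted(
        name for name, ts in last_updates.items() if is_stalled(ts, now, threshold)
    )


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    farm = {
        "render01": now - timedelta(seconds=30),  # reported recently
        "render02": now - timedelta(minutes=12),  # e.g. killed mid-render
    }
    print(stalled_workers(farm, now))  # ['render02']
```

The point of the sketch is that the database only sees silence: a crash, a network drop, and a forced kill all look the same once the timeout passes.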
To avoid stalled Workers permanently, make sure that:
- The computer running the Worker shuts down gracefully
- The computer running the Worker doesn’t lose network connectivity while the Worker is running
- The Worker application is never force-quit (using `kill` or Task Manager)
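Why a force-quit leads to a stall can be shown with a toy process. The class below is hypothetical and is not how the Deadline Worker is implemented: it sends a heartbeat while running and records "Offline" only when it is allowed to exit its loop. On POSIX systems a process can trap SIGTERM (what plain `kill` sends) but never SIGKILL (`kill -9`), so a forced kill skips the final status write and the last heartbeat is left to age into a stall. Note that `signal.signal` must be called from the main thread.

```python
import signal
import time


class HypotheticalWorker:
    """Toy stand-in for a render Worker; not how Deadline is implemented."""

    def __init__(self):
        self.status = "Idle"
        self.running = True

    def report_status(self, status):
        # Stand-in for writing a status update to the farm database.
        self.status = status

    def request_stop(self, signum, frame):
        # Signal handler: ask the main loop to finish cleanly.
        self.running = False

    def run(self, poll_seconds=1.0):
        # SIGTERM can be trapped like this; SIGKILL cannot be trapped at all.
        signal.signal(signal.SIGTERM, self.request_stop)
        while self.running:
            self.report_status("Rendering")  # periodic heartbeat
            time.sleep(poll_seconds)
        # Reached only on a graceful stop. After a forced kill this never runs,
        # so the stored status stays "Rendering" until the stall timeout expires.
        self.report_status("Offline")
```

Running this and stopping it with `kill <pid>` leaves it "Offline"; stopping it with `kill -9 <pid>` leaves it "Rendering".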
Would there be any issues with running the Worker as a Linux service instead of starting it from the launcher?
Is it an issue if the launcher isn’t running at all on a headless render node?