Hi,
We have configured a new installation of Deadline 10.3.1.3.
We installed MongoDB, the Remote Connection Server, and the Web Service, each on its own virtual machine (Rocky Linux 9.4).
The whole installation is running fine: I tested submitting a wide range of jobs and everything works.
The only thing that looks wrong is that, in the Deadline Monitor, the machine hosting the Web Service goes to ‘stalled’ status after a few hours. The Web Service itself is still functional (I use it in scripts to connect to the repository), but that Worker appears as stalled. Restarting the deadline10launcher service on that machine fixes the issue, but after a few more hours it goes back to ‘stalled’ status.
Looking at the logs of the deadline10launcher service on the Web Service machine, I only see these lines repeated over and over:
Launcher Scheduling - GeneralNotice: "1 group(s) were found, but none were considered (0 disabled, the rest had no Workers on this host)"
Counters: http-process-calls-current=0 http-process-calls-count=7 http-process-slow-calls-count=0 http-process-last-slow=none http-process-histo: ApxMean=586.8 P99=4096 BoundsMs=0,1.4,2,2.8,4,5.7,8,11.3,16,22.6,32,45.3,64,90.5,128,181,256,362,512,724.1,1024,1448.2,2048,2896.3 Counts=0,0,0,0,0,0,0,0,0,2,1,1,0,0,1,0,1,0,0,0,0,0,0,1 assign-tasks-calls-current=0 assign-tasks-calls-count=0
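In case it makes the counters line easier to read, here is a minimal Python sketch that splits it into name/value pairs and lists the non-empty histogram buckets. The helper names (`parse_counters`, `non_empty_buckets`) are my own and not part of Deadline, and I am assuming that `BoundsMs` and `Counts` line up position by position; I don't know whether each bound is the lower or upper edge of its bucket.

```python
# Hypothetical helpers for reading the launcher "Counters:" line above.
# Assumption: BoundsMs and Counts are parallel comma-separated lists.

LINE = (
    "Counters: http-process-calls-current=0 http-process-calls-count=7 "
    "http-process-slow-calls-count=0 http-process-last-slow=none "
    "http-process-histo: ApxMean=586.8 P99=4096 "
    "BoundsMs=0,1.4,2,2.8,4,5.7,8,11.3,16,22.6,32,45.3,64,90.5,128,181,"
    "256,362,512,724.1,1024,1448.2,2048,2896.3 "
    "Counts=0,0,0,0,0,0,0,0,0,2,1,1,0,0,1,0,1,0,0,0,0,0,0,1 "
    "assign-tasks-calls-current=0 assign-tasks-calls-count=0"
)


def parse_counters(line):
    """Return a dict of the name=value tokens found after 'Counters:'."""
    _, _, body = line.partition("Counters:")
    # Tokens without '=' (such as the 'http-process-histo:' label) are skipped.
    tokens = [tok for tok in body.split() if "=" in tok]
    return dict(tok.split("=", 1) for tok in tokens)


def non_empty_buckets(counters):
    """Pair each bucket bound (ms) with its count, keeping non-zero counts."""
    bounds = [float(b) for b in counters["BoundsMs"].split(",")]
    counts = [int(c) for c in counters["Counts"].split(",")]
    return [(b, c) for b, c in zip(bounds, counts) if c]


if __name__ == "__main__":
    counters = parse_counters(LINE)
    for bound_ms, count in non_empty_buckets(counters):
        print(f"{bound_ms:>8} ms: {count}")
```

Read this way, the non-empty buckets add up to 7 calls, which matches `http-process-calls-count=7`, and `assign-tasks-calls-count=0` is consistent with this machine never being given any tasks.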
What could cause this ‘stalled’ status on the Worker hosting the Web Service?
I’m also wondering whether this Web Service machine should be listed as a Worker at all, since no jobs will ever be executed on it. Would it be better to remove it from the Workers list, and if so, how should I remove it?
Regards