Is anyone else experiencing nodes stalling? There is no pattern or consistency to it; nodes just stall at random and don’t come back up. Normally I try to restart the node from the Monitor, but the connection is usually refused. I have to stop the DLService, manually kill any DL processes still running, and then restart the service.
I’ve actually got a theory that something in the middleware connecting the apps to the database is breaking. I suspect the middleware because the problem affects both the Launcher and the Slave, and once they get into this state, the only way out is to stop and restart the apps.
It may be related to having multiple database hosts specified in the “[repo]/settings/connection.ini” (Deadline 8) or “[repo]/settings/dbConnect.xml” (Deadline 7) file. As a test, can you edit that file to specify only one host and see whether that makes any difference?
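If you’d rather script the one-host test than edit by hand, here is a rough sketch for the Deadline 8 `connection.ini` only (not the Deadline 7 XML file). It assumes the hosts live on a `Hostname=` line and are separated by semicolons or commas; both are assumptions, so check your own file first. The helper names `keep_first_host` and `apply_to_file` are hypothetical, and the script writes a `.bak` copy so you can revert after the test.

```python
import re
import shutil
from pathlib import Path

# ASSUMPTION: multiple hosts on the Hostname line are separated by ';' or ','.
_HOST_SPLIT = re.compile(r"[;,]")


def keep_first_host(ini_text: str, key: str = "Hostname") -> str:
    """Return ini_text with only the first host kept on the `key=` line.

    All other lines (sections, other keys, comments) are left untouched.
    """
    out_lines = []
    for line in ini_text.splitlines(keepends=True):
        name, sep, value = line.partition("=")
        if sep and name.strip().lower() == key.lower():
            hosts = [h.strip() for h in _HOST_SPLIT.split(value) if h.strip()]
            if hosts:
                # Preserve whatever line ending the original line had ("\n", "\r\n", or none).
                ending = line[len(line.rstrip("\r\n")):]
                line = f"{name.rstrip()}={hosts[0]}{ending}"
        out_lines.append(line)
    return "".join(out_lines)


def apply_to_file(path: str) -> None:
    """Hypothetical helper: back up the file, then rewrite it with one host."""
    p = Path(path)
    # Keep a backup next to the original so the test can be undone.
    shutil.copy2(p, p.with_name(p.name + ".bak"))
    p.write_text(keep_first_host(p.read_text()))
```

For example, a `Hostname=db01;db02;db03` line becomes `Hostname=db01`. Restart the Launcher and Slave afterwards so they pick up the change, and restore the `.bak` file when the test is done.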