Hi,
Although this isn't related to v6, I wanted to report it, as it caused me some excitement over the weekend!
To reproduce the situation (which I would advise against!):
- Have Deadline Slave and Launcher (both v5.2) running on a machine, with Pulse also running on the network.
- Unplug the machine's network cable for more than 15 minutes.
- Plug the network cable back in (in my case, 30+ minutes later).
- Go home, and discover 48 hours later that the repository's /temp directory contains approximately 46,000 files named “timeCheck_” + 5 random characters + “.ComputerName”, where ComputerName is the machine whose network cable was pulled.
Example file: “/temp/timeCheck_0sDZv.ComputerName”
So it looks like Pulse checks this machine for “stalled” status (it thinks the Slave is still running) and performs a “timeCheck”. Unfortunately, once the number of files gets this large, Pulse can’t cope when it tries to “purge” them, and it crashes after about 30-35 minutes, which I'm guessing is roughly the interval at which Pulse runs its cleanup routine.
Lesson learnt by Mike: in future, always make sure the Slave is shut down before removing a network cable for more than 15 minutes!
That said, perhaps the “timeCheck” code could be improved in future? For example, maybe ping the machine before trying to check the time on a Slave?
Hopefully this is no longer an issue in v6. Either way, I wanted to report it.
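To make the ping suggestion concrete, here is a minimal sketch of a "probe first, then time check" guard. It is not Deadline's actual code: `is_reachable`, `write_time_check`, `guarded_time_check` and the `port` parameter are all hypothetical, the probe is a plain TCP connect standing in for an ICMP ping (which usually needs elevated privileges), and the file name simply follows the “timeCheck_” + 5 characters + “.ComputerName” pattern seen above.

```python
import os
import random
import socket
import string
import tempfile


def is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within `timeout`.

    Hypothetical stand-in for a ping: ICMP usually needs raw-socket privileges,
    so this just tries to open (and immediately close) a TCP connection.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unreachable network, DNS failure, etc.
        return False


def write_time_check(temp_dir: str, computer_name: str) -> str:
    """Create an empty file named timeCheck_<5 random chars>.<computer_name> (hypothetical)."""
    suffix = "".join(random.choices(string.ascii_letters + string.digits, k=5))
    path = os.path.join(temp_dir, f"timeCheck_{suffix}.{computer_name}")
    with open(path, "w"):
        pass
    return path


def guarded_time_check(temp_dir: str, computer_name: str, host: str, port: int):
    """Write a timeCheck file only if the machine answers the probe.

    Returns the new file's path, or None when the machine is unreachable,
    so an unplugged machine never causes timeCheck files to pile up.
    """
    if not is_reachable(host, port):
        return None
    return write_time_check(temp_dir, computer_name)


if __name__ == "__main__":
    # Listen on a free localhost port so the probe has something to connect to.
    with socket.socket() as server, tempfile.TemporaryDirectory() as tmp:
        server.bind(("127.0.0.1", 0))
        server.listen()
        port = server.getsockname()[1]
        print(guarded_time_check(tmp, "ComputerName", "127.0.0.1", port))
```

The design choice is simply to skip the time check for that cycle when the probe fails, rather than leaving a file behind that nobody will answer.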
Thanks,
Mike