AWS Thinkbox Discussion Forums

Pulse (v5.2) crash - timeCheck_0sDZv

Hi,
Although this isn't related to v6, I wanted to report it because it caused me some excitement over the weekend!
To reproduce the situation (which I would advise against!):

  1. Have Deadline Slave/Launcher both running v5.2 on a machine, with Pulse running on the network as well.
  2. Pull out network cable for more than 15 minutes.
  3. Put network cable back into machine 30+ minutes later.
  4. Go home and discover, 48 hours later, that the repository /temp directory contains approx. 46,000 files named “timeCheck_” + 5 random characters + “.ComputerName”, where ComputerName is the machine that had its network cable pulled out above…

Example File: “/temp/timeCheck_0sDZv.ComputerName”

So it looks like Pulse checks for a “stalled” status on this machine, which it thinks is still running the slave, and does a “timeCheck”. Unfortunately, when the number of files gets this big, Pulse can’t handle it when it tries to “purge” them, and it crashes after about 30-35 minutes, which I guess is the random interval at which Pulse runs its clean-up routine.
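For anyone who ends up with the same pile of files, here is a minimal clean-up sketch. It is not Deadline's own purge routine: the function name `purge_timecheck_files` is made up for illustration, and it assumes the leftover files sit directly in the repository temp directory and all start with “timeCheck_” as described above. Stop Pulse and double-check the path before running anything like it.

```python
import os

def purge_timecheck_files(temp_dir, prefix="timeCheck_"):
    """Delete files in temp_dir whose names start with prefix.

    Hypothetical helper, not part of Deadline. Returns the number of
    files that were removed.
    """
    removed = 0
    # os.scandir yields entries lazily, so a directory holding tens of
    # thousands of files is never loaded into one big list.
    with os.scandir(temp_dir) as entries:
        for entry in entries:
            if entry.is_file() and entry.name.startswith(prefix):
                try:
                    os.remove(entry.path)
                    removed += 1
                except OSError:
                    # The file may already be gone or locked; skip it.
                    pass
    return removed
```

Calling `purge_timecheck_files("/path/to/repository/temp")` (path is a placeholder) removes only the matching files and leaves everything else in the directory alone.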

Lesson learnt by Mike: always make sure the Slave is shut down before removing a network cable for more than 15 minutes!

However, I was thinking that an improvement could be made to the “timeCheck” code in the future? Maybe do a ping before trying to check the time on a slave?
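To illustrate what I mean, here is a rough sketch of a "check reachability first" guard. It is only an assumption about how such a check could look, not how Pulse works internally. A real ICMP ping usually needs elevated privileges, so this sketch swaps in a plain TCP connection attempt; `is_reachable`, `time_check_if_reachable`, the port, and the `time_check` callback are all hypothetical names.

```python
import socket

def is_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False

def time_check_if_reachable(host, port, time_check):
    """Run time_check(host) only when the host answers; otherwise return None.

    time_check is a caller-supplied function standing in for whatever
    the real time check does.
    """
    if not is_reachable(host, port):
        return None
    return time_check(host)
```

With a guard like this, a machine whose cable has been pulled would simply be skipped instead of producing a new file on every check.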

Maybe this situation is a thing of the past in v6, which would be good. Either way, I wanted to report this situation.

Thanks,
Mike

Thanks for reporting this! These time check files are no longer created in v6, so this won’t be an issue going forward.

Cheers,
Ryan

Hooray! Go v6 :)
