AWS Thinkbox Discussion Forums

"House cleaning" processes still hanging

EDIT: Sorry, I accidentally clicked submit before finishing this post.

I reported a similar issue during the 6.1 beta (see topic here: http://forums.thinkboxsoftware.com/viewtopic.php?f=156&t=10463).

The same issue seems to be occurring with the deadlinecommand processes spawned by Pulse. Basically, I had Pulse running for a few days, and then shut it down via deadlinepulse -shutdown from another terminal. Pulse was not able to cleanly close, and after timing out, it eventually resorted to killing itself. This may or may not be related, but this kind of thing is frequent when trying to cleanly close Pulse:

2014-03-18 11:16:08: Stopping Pulse
2014-03-18 11:16:38: Pulse failed to exit after 30 seconds, killing it...

I also saw one or two instances of this in the logs:

2014-03-17 17:44:56: Web Service - Web Service shutting down...
2014-03-17 17:45:55: Info Thread: Stopped
2014-03-17 17:45:55: Clean Up Thread: Stopped
2014-03-17 17:45:55: Pending Job Scan Thread: ShuttingDown
2014-03-17 17:45:55: Server Thread: Stopped
2014-03-17 17:45:55: Scheduler Thread: Stopped
2014-03-17 17:45:55: Power Management: Stopped
2014-03-17 17:45:55: Forcing shutdown.
2014-03-17 17:45:55: Exception Details
2014-03-17 17:45:55: Exception -- One or more threads failed to quit successfully.
2014-03-17 17:45:55: Exception.Data: ( )
2014-03-17 17:45:55: Exception.StackTrace:
2014-03-17 17:45:55: (null)
2014-03-17 17:45:55: Could not query child process information for pid 9702 because: Thread was being aborted (System.Threading.ThreadAbortException)
2014-03-17 17:45:55: WARNING: an error occured while trying to kill the process tree: Thread was being aborted (System.Threading.ThreadAbortException)
2014-03-17 17:45:55: Could not query child process information for pid 9702 because: Thread was being aborted (System.Threading.ThreadAbortException)
2014-03-17 17:45:55: WARNING: an error occured while trying to kill the process tree: Thread was being aborted (System.Threading.ThreadAbortException)
2014-03-17 17:45:55: Error running pending job scan process: Thread was being aborted (System.Threading.ThreadAbortException)

Anyway, after Pulse finished offing itself, I checked the Mongo server to make sure everything had cleaned itself up properly, only to find there were still 4 open connections to the DB. In my process tree, I found 3 orphaned deadlinecommand processes that were previously parented under the deadlinepulse process, and were hanging indefinitely. Two of them were executing deadlinecommand -DoHouseCleaning 10 False, and one was executing deadlinecommand -DoPendingJobScan False. All of them had been running for around 2 days, and after terminating them, all of the remaining Mongo connections were properly closed.
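For anyone who wants to check for the same leftovers, here is a minimal sketch of how the orphaned processes could be spotted from a script. It is not part of Deadline. The function names (find_orphans, current_ps_output) are made up for illustration, and it assumes a Linux box where orphaned children get reparented to PID 1, which may not hold if a subreaper (for example a per-user init) adopts them instead.

```python
import subprocess


def find_orphans(ps_output, name="deadlinecommand", orphan_ppids=(1,)):
    """Return (pid, elapsed, args) for matching processes whose parent is in orphan_ppids.

    ps_output is expected to be the text printed by `ps -eo pid,ppid,etime,args`.
    The PID-1 default is an assumption about where orphans end up.
    """
    orphans = []
    for line in ps_output.splitlines():
        # Split into at most 4 fields so the command line keeps its spaces.
        parts = line.split(None, 3)
        if len(parts) < 4:
            continue
        pid, ppid, etime, args = parts
        # Skip the header row and anything else that is not numeric.
        if not (pid.isdigit() and ppid.isdigit()):
            continue
        if int(ppid) in orphan_ppids and name in args:
            orphans.append((int(pid), etime, args.strip()))
    return orphans


def current_ps_output():
    """Capture the live process table (Linux procps `ps`)."""
    result = subprocess.run(
        ["ps", "-eo", "pid,ppid,etime,args"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Calling find_orphans(current_ps_output()) after Pulse has exited should list any deadlinecommand processes that were left behind, along with how long each has been running. It only reports them; terminating them is still a manual step.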

This is in 6.2 beta 3 on Fedora 19.

Also, I just tried starting Pulse again (this time in “nogui” mode), and almost immediately got a hanging instance of deadlinecommand -DoHouseCleaning 10 False. When I stopped Pulse (using deadlinepulse -shutdown), I got this in the terminal (which seems to correspond to the “Pulse failed to exit after 30 seconds, killing it…” message written to the log file):

Web Service - Web Service shutting down...
Killed

Also, the child deadlinecommand process again became orphaned and continued to hang around until I terminated it manually.

At the end of that thread, I mentioned that we would add the option to disable having housecleaning run in a separate process. Have you tried the new option yet?
thinkboxsoftware.com/deadlin … e_Cleaning

If not, try disabling the separate process option and restarting Pulse to see if the problem still occurs.

Thanks!
Ryan

Trying this out, and so far so good.
