AWS Thinkbox Discussion Forums

repo repair hanging

Hi there,

Usually this process takes only a few minutes:

(output of grep for “Triggering Repository Repair Events”)

2016-05-19 15:39:57:  Triggering Repository Repair Events
2016-05-19 15:41:10:  Triggering Repository Repair Events
2016-05-19 15:42:14:  Triggering Repository Repair Events
2016-05-19 15:43:30:  Triggering Repository Repair Events
2016-05-19 15:44:48:  Triggering Repository Repair Events
2016-05-19 15:46:06:  Triggering Repository Repair Events
2016-05-19 15:47:14:  Triggering Repository Repair Events
2016-05-19 15:48:18:  Triggering Repository Repair Events
2016-05-19 15:49:21:  Triggering Repository Repair Events
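The grep above can also be turned into a small script that measures the spacing between repair triggers, which makes a stall like this easier to spot. This is a minimal Python sketch, assuming the log uses the `YYYY-MM-DD HH:MM:SS:  message` layout shown in these excerpts; `parse_line` and `event_gaps` are hypothetical helpers, not part of Deadline.

```python
from datetime import datetime

# Timestamp layout as it appears in the excerpts, e.g. "2016-05-19 15:39:57:  ..."
# (assumed; adjust if your log format differs).
TS_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_line(line):
    """Split a log line into (datetime, message); return None if it has no timestamp."""
    stamp, sep, message = line.partition(":  ")
    if not sep:
        return None
    try:
        return datetime.strptime(stamp.strip(), TS_FORMAT), message.strip()
    except ValueError:
        return None


def event_gaps(lines, needle):
    """Return the seconds elapsed between consecutive log lines containing `needle`."""
    times = [parsed[0] for parsed in map(parse_line, lines)
             if parsed is not None and needle in parsed[1]]
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]
```

Run against the nine lines above with `needle="Triggering Repository Repair Events"`, it reports gaps between 63 and 78 seconds; a gap that keeps growing past that range is the "hanging" case described below.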

But it's been hanging on these for 40 minutes now and counting:

2016-05-19 16:53:41:  Slave 'LAPRO0449' has stalled because it has not updated its state in 39.963 m. Performing house cleaning...
2016-05-19 16:53:41:  Found task class: 17:[1026-1026]
2016-05-19 16:53:41:  Task is still being rendered!
2016-05-19 16:53:41:  It's time to requeue this baby.
2016-05-19 16:53:51:  Cannot send notification because the email address list is empty.
2016-05-19 16:53:51:  Slave 'LAPRO0757' has stalled because it has not updated its state in 38.707 m. Performing house cleaning...
2016-05-19 16:53:51:  Found task class: 124:[1075-1075]
2016-05-19 16:53:51:  Task is still being rendered!
2016-05-19 16:53:51:  It's time to requeue this baby.
2016-05-19 16:54:12:  Cannot send notification because the email address list is empty.
2016-05-19 16:54:12:  Slave 'LAPRO0757-secondary' has stalled because it has not updated its state in 38.236 m. Performing house cleaning...
2016-05-19 16:54:12:  Could not find associated job class though.
2016-05-19 16:54:23:  Cannot send notification because the email address list is empty.
2016-05-19 16:54:23:  Slave 'LAPRO0674' has stalled because it has not updated its state in 39.384 m. Performing house cleaning...
2016-05-19 16:54:23:  Found task class: 51:[1076-1076]
2016-05-19 16:54:23:  Task is still being rendered!
2016-05-19 16:54:23:  It's time to requeue this baby.
2016-05-19 16:54:43:  Cannot send notification because the email address list is empty.
2016-05-19 16:54:43:  Slave 'LAPRO0674-secondary' has stalled because it has not updated its state in 39.548 m. Performing house cleaning...
2016-05-19 16:54:43:  Found task class: 39:[1040-1040]
2016-05-19 16:54:43:  Task is still being rendered!
2016-05-19 16:54:43:  It's time to requeue this baby.
2016-05-19 16:54:57:  Cannot send notification because the email address list is empty.
2016-05-19 16:54:57:  Slave 'LAPRO0742' has stalled because it has not updated its state in 39.468 m. Performing house cleaning...
2016-05-19 16:54:57:  Found task class: 36:[1028-1028]
2016-05-19 16:54:57:  Task is still being rendered!
2016-05-19 16:54:57:  It's time to requeue this baby.
2016-05-19 16:55:14:  Cannot send notification because the email address list is empty.
2016-05-19 16:55:14:  Slave 'LAPRO0798' has stalled because it has not updated its state in 40.376 m. Performing house cleaning...
2016-05-19 16:55:15:  Found task class: 142:[1093-1093]
2016-05-19 16:55:15:  Task is still being rendered!
2016-05-19 16:55:15:  It's time to requeue this baby.
2016-05-19 16:55:27:  Cannot send notification because the email address list is empty.
2016-05-19 16:55:27:  Slave 'LAPRO0798-secondary' has stalled because it has not updated its state in 40.055 m. Performing house cleaning...
2016-05-19 16:55:27:  Found task class: 222:[1223-1223]
2016-05-19 16:55:27:  Task is still being rendered!
2016-05-19 16:55:27:  It's time to requeue this baby.
2016-05-19 16:55:38:  Cannot send notification because the email address list is empty.
2016-05-19 16:55:38:  Slave 'LAPRO0159' has stalled because it has not updated its state in 39.740 m. Performing house cleaning...
2016-05-19 16:55:38:  Could not find associated job for this slave.
2016-05-19 16:56:01:  Cannot send notification because the email address list is empty.
2016-05-19 16:56:01:  Slave 'LAPRO0795' has stalled because it has not updated its state in 41.109 m. Performing house cleaning...
2016-05-19 16:56:01:  Found task class: 27:[1025-1025]
2016-05-19 16:56:01:  Task is still being rendered!
2016-05-19 16:56:01:  It's time to requeue this baby.
2016-05-19 16:56:13:  Cannot send notification because the email address list is empty.
2016-05-19 16:56:13:  Slave 'LAPRO0795-secondary' has stalled because it has not updated its state in 41.477 m. Performing house cleaning...
2016-05-19 16:56:13:  Found task class: 33:[1034-1034]
2016-05-19 16:56:13:  Task is still being rendered!
2016-05-19 16:56:13:  It's time to requeue this baby.

Any ideas? It seems like handling each machine takes 10-20 seconds. This run of “Slave X has stalled” messages started around 2016-05-19 16:16:16.
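The 10-20 second estimate can be checked directly against the log. This is a minimal Python sketch, assuming the cleanup handles stalled slaves one after another (as the log suggests) and that the message text matches the excerpt above; `stalled_handling_times` is a hypothetical helper, and the time until the next stalled slave is only a proxy for each slave's house-cleaning cost.

```python
import re
from datetime import datetime

# Timestamp layout and message text as they appear in the excerpt above (assumed).
TS_FORMAT = "%Y-%m-%d %H:%M:%S"
STALLED = re.compile(r"Slave '([^']+)' has stalled")


def stalled_handling_times(lines):
    """Return (slave, seconds) pairs: time from each stalled-slave line to the next one.

    Assumes sequential cleanup; the last stalled slave has no successor,
    so it is left out of the result.
    """
    events = []
    for line in lines:
        stamp, sep, message = line.partition(":  ")
        match = STALLED.search(message) if sep else None
        if match:
            events.append((match.group(1), datetime.strptime(stamp.strip(), TS_FORMAT)))
    return [(name, (nxt - t).total_seconds())
            for (name, t), (_, nxt) in zip(events, events[1:])]
```

Fed the excerpt above, this yields gaps between 10 and 23 seconds per stalled slave, so multiplying that by the number of stalled slaves gives a rough idea of how long a full pass will take.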

Have you guys seen this since the lock queues quieted down, out of curiosity?

This could very well have been caused by the queues on the Report DB, since house cleaning would have been trying to create a Requeue Report for the job that the stalled slave was working on.

This has gone away; we haven't seen it take that long since we updated to 8.0.1.1.
