AWS Thinkbox Discussion Forums

Database corruption

It sometimes happens, seemingly at random, that jobs are not removed correctly,
as follows. The jobs whose entries start with _id as the first field are not correct:

This results in an endless loop within Pulse, and job updates become sluggish.

Pulse constantly reports:

Once these jobs are removed from the database using
[code]
db.Jobs.remove({"_id" : "50e6caa0b6587218f04a9142"})
db.Jobs.remove({"_id" : "50e6e94151682704a45359ae"})
[/code]

Pulse returns to normal.
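
For anyone repeating the manual removal, here is a rough sketch of a slightly more cautious variant. It looks each job up first and keeps a copy of the document before deleting it, so the entry can still be inspected afterwards. The helper name removeJobsById is hypothetical and not part of Deadline or MongoDB. It only assumes the collection object offers findOne and remove, as db.Jobs does in the legacy mongo shell, and that the job _id values are stored as plain strings, as in the commands above.

```javascript
// Sketch only: remove specific job documents by _id, looking each one up
// first so a copy of the deleted document is kept for later inspection.
// `collection` is any object exposing findOne(query) and remove(query).
// The helper name removeJobsById is hypothetical.
function removeJobsById(collection, ids) {
    var removed = [];
    var missing = [];
    ids.forEach(function (id) {
        var query = { "_id": id };
        var doc = collection.findOne(query);
        if (doc === null || doc === undefined) {
            missing.push(id);      // nothing is stored under this _id
            return;
        }
        collection.remove(query);
        removed.push(doc);         // keep a copy of what was deleted
    });
    return { removed: removed, missing: missing };
}

// In the mongo shell, connected to the Deadline database, this might be called as:
// var result = removeJobsById(db.Jobs, ["50e6caa0b6587218f04a9142", "50e6e94151682704a45359ae"]);
```

The returned object lists the documents that were deleted and the IDs that were not found, which may help when reporting the problem.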

Does this happen when removing a group of jobs or a single job? Please let me know if there are any more details you could provide.

This happens randomly, but it seems to happen mostly with batches of small jobs that have auto-delete on complete enabled. These are large numbers of small jobs that do not stay in the queue for long.

Sometimes one of them fails to delete, which leaves the database with a "corrupted" job.
This job does not show up in the Monitor, nor is its deletion retried.

Pulse then ends up in an infinite loop because it does not know what to do with this job.

Once the job is manually removed from the database, Pulse immediately returns to normal.

Ok, that information helps.

I’m still testing to reproduce the error on my end.
