AWS Thinkbox Discussion Forums

housecleaning taking 10+ hours

We are struggling guys, seems like housecleaning can’t keep up with the job count / load we have…

The current housecleaning cycle has been going since yesterday ~7pm:

root 32042 0.7 1.9 1774416 638252 pts/3 Sl Sep22 6:54 /opt/mono-2.10.9/bin/mono /opt/Thinkbox/Deadline6/bin/deadlinecommand.exe -DoHouseCleaning 10 True

It took 9 hours and 43 minutes to:

“Purge job reports for “XXX” because the job no longer exists” on 2302 jobs.
“Purge deleted job XXX because it was deleted over 2 hours ago” on 7587 jobs.

It hasn’t done anything else since midnight and it’s still going…
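
For scale, the figures above can be turned into a rough per-operation cost. This is only a back-of-the-envelope sketch: it assumes the whole 9 hours 43 minutes went into those two purge types and that they ran one after another, neither of which the log confirms.

```python
# Figures quoted in the post above.
elapsed_seconds = 9 * 3600 + 43 * 60   # 9 hours 43 minutes
report_purges = 2302                    # "Purge job reports ..." entries
job_purges = 7587                       # "Purge deleted job ..." entries

# Assumption: all elapsed time was spent on these two operation types.
total_operations = report_purges + job_purges
seconds_per_operation = elapsed_seconds / total_operations

print(f"{total_operations} operations in {elapsed_seconds} s")
print(f"about {seconds_per_operation:.2f} s per operation")
```

A few seconds per purge points at slow individual database operations rather than a stalled process.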

This is probably another symptom resulting from the load on your database (same reason why the pending job scan takes so long). Also, I know you guys had a lot of things backed up, like job reports. Do you know if that’s even been caught up on yet?

You had brought up fragmentation in another thread, and from googling the issue, it does appear that this can eventually cause performance issues. Doing a repair operation can compact these back down, but it will probably take some time for the operation to finish, which unfortunately involves downtime. Is this at all an option?

We are mid-deliveries on 2 shows, so I’m really afraid to trigger a repair which might take down the whole farm for an extended period. Not an option right now… (people are doing 7 day weeks). Although, the unusability of the farm right now might mean the frustration level reaches the “fukitol, let’s try that” level.

The job report cleanup finished yesterday around noon, and it was down at 0 while I was running the process on my box. Now it’s back up a little, since I stopped the manual process on my machine:

namespace: deadline6db.DeletedJobReportEntries
objects: 80860
average object size: 592.0 Bytes
data size: 45.7 MB
storage size: 10.2 GB
extents: 25
last extent size: 2.0 GB
padding factor: 1
system flags: HasIdIndex
user flags: None
indexes: 1
total index size: 4.8 MB
index sizes:
id: 4.788 MB
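
To put the fragmentation concern in numbers, here is a small sketch that compares the reported data size with the allocated storage size. It assumes the tool reports binary units (1 GB = 1024 MB, 1 MB = 1024² bytes); the variable names are only illustrative.

```python
# Values copied from the collection stats above.
objects = 80860
avg_object_bytes = 592.0
data_mb = 45.7
storage_gb = 10.2

MIB = 1024 ** 2  # assumption: the stats use binary megabytes

# Cross-check: object count times average size should match "data size".
computed_data_mb = objects * avg_object_bytes / MIB

# How much larger the allocated storage is than the live data.
storage_mb = storage_gb * 1024
overhead_ratio = storage_mb / data_mb

print(f"computed data size: {computed_data_mb:.2f} MB")
print(f"storage is about {overhead_ratio:.0f}x the data size")
```

Under those assumptions the collection holds well over two hundred times more allocated space than live data, which is consistent with heavy fragmentation after large purges.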

I wonder if it’s hung up, especially if it’s not logging anything new since midnight. Maybe try killing it so another one can trigger?

Sorry, I should have worded it better: it’s doing only those processes I mentioned. It’s logging stuff, just not any other tasks.

“Purge job reports for “XXX” because the job no longer exists” on 2302 jobs.
“Purge deleted job XXX because it was deleted over 2 hours ago” on 7587 jobs.

I restarted it shortly after writing this message, and this is the complete log since the restart:

2014-09-23 09:52:36:  BEGIN -\root
2014-09-23 09:52:36:  Deadline Command 6.2 [v6.2.1.39 R  (9f4ea2276)]
2014-09-23 09:52:36:  Purging old Housecleaning logs.
2014-09-23 09:52:36:  Performing Job Cleanup Scan...
2014-09-23 09:52:36:      Job Cleanup Scan - Loading completed jobs
2014-09-23 09:52:50:      Job Cleanup Scan - Loaded 11226 completed jobs in 14.471 s
2014-09-23 09:52:50:      Job Cleanup Scan - Scanning completed jobs
2014-09-23 09:54:01:      Job Cleanup Scan - Archived completed job "[EXO] RS_190_1980_v0065_ssh_foamglidefix_cache_flowline_WaveFoamCaps_0 " because Auto Job Cleanup is enabled and this job has been complete for more than 10 days.
2014-09-23 09:54:03:      Job Cleanup Scan - Warning: completed job "[EXO] RS_190_1980_v0065_ssh_foamglidefix_images_render3d_FL-Foam_L_0 " could not be archived because: An unexpected error occurred while interacting with the database (,,
2014-09-23 09:54:03:  Argument cannot be null.
2014-09-23 09:54:03:  Parameter name: document (FranticX.Database.DatabaseConnectionException)
2014-09-23 09:55:27:      Job Cleanup Scan - Archived completed job "[EXO] RS_190_2050_v0699_ssh_v679_surfVox2_cache_flowline_Surface_0 " because Auto Job Cleanup is enabled and this job has been complete for more than 10 days.
2014-09-23 09:55:29:      Job Cleanup Scan - Warning: completed job "[EXO] RS_190_2050_v0699_ssh_v679_surfVox2_images_render3d_FL-Happy_L_0 " could not be archived because: An unexpected error occurred while interacting with the database (,,
2014-09-23 09:55:29:  Argument cannot be null.
2014-09-23 09:55:29:  Parameter name: document (FranticX.Database.DatabaseConnectionException)
2014-09-23 09:56:35:      Job Cleanup Scan - Archived completed job "[EXO] RS_190_1980_v0065_ssh_foamglidefix_cache_flowline_WaveFoam_0 " because Auto Job Cleanup is enabled and this job has been complete for more than 10 days.
2014-09-23 09:56:38:      Job Cleanup Scan - Warning: completed job "[EXO] RS_190_2050_v0698_ssh_v679_surfVox3b_images_render3d_FL-Happy_L_0 " could not be archived because: An unexpected error occurred while interacting with the database (,,
2014-09-23 09:56:38:  Argument cannot be null.
2014-09-23 09:56:38:  Parameter name: document (FranticX.Database.DatabaseConnectionException)
2014-09-23 09:57:56:      Job Cleanup Scan - Archived completed job "[EXO] RS_190_1980_v0068_ssh_cascading_images_render3d_FL-SprayOnly_L_0 " because Auto Job Cleanup is enabled and this job has been complete for more than 10 days.
2014-09-23 09:57:58:      Job Cleanup Scan - Warning: completed job "[EXO] RS_190_1980_v0057_ssh_foamtweaks_images_render3d_FL-FoamCaps_L_0 " could not be archived because: An unexpected error occurred while interacting with the database (,,
2014-09-23 09:57:58:  Argument cannot be null.
2014-09-23 09:57:58:  Parameter name: document (FranticX.Database.DatabaseConnectionException)
2014-09-23 10:00:23:      Job Cleanup Scan - Archived completed job "[EXO] RS_190_7030_v0100_lle_LargerBreakup_S05_cache_flowline_SurfaceS05_2 " because Auto Job Cleanup is enabled and this job has been complete for more than 10 days.
2014-09-23 10:00:24:      Job Cleanup Scan - Warning: completed job "[EXO] RS_190_1980_v0068_ssh_cascading_images_render3d_FL-SprayMistOnly_L_0 " could not be archived because: An unexpected error occurred while interacting with the database (,,
2014-09-23 10:00:24:  Argument cannot be null.
2014-09-23 10:00:24:  Parameter name: document (FranticX.Database.DatabaseConnectionException)
2014-09-23 10:02:14:      Job Cleanup Scan - Archived completed job "[EXO] RS_190_7030_v0099_lle_BetterCol_S05_cache_flowline_HappyS05_0 " because Auto Job Cleanup is enabled and this job has been complete for more than 10 days.
2014-09-23 10:02:15:      Job Cleanup Scan - Warning: completed job "[EXO] RS_190_7030_v0100_lle_LargerBreakup_S05_cache_flowline_SurfaceS05_0 " could not be archived because: An unexpected error occurred while interacting with the database (,,
2014-09-23 10:02:15:  Argument cannot be null.
2014-09-23 10:02:15:  Parameter name: document (FranticX.Database.DatabaseConnectionException)
2014-09-23 10:04:47:      Job Cleanup Scan - Archived completed job "[EXO] RS_190_7030_v0100_lle_LargerBreakup_S05_cache_flowline_HappyS05_1 " because Auto Job Cleanup is enabled and this job has been complete for more than 10 days.
2014-09-23 10:04:48:      Job Cleanup Scan - Warning: completed job "[EXO] RS_190_1980_v0068_ssh_cascading_cache_flowline_MistFine_0 " could not be archived because: An unexpected error occurred while interacting with the database (,,
2014-09-23 10:04:48:  Argument cannot be null.
2014-09-23 10:04:48:  Parameter name: document (FranticX.Database.DatabaseConnectionException)
2014-09-23 10:06:11:      Job Cleanup Scan - Archived completed job "[EXO] RS_190_1980_v0066_ssh_crestFaster_images_render3d_FL-CrestSprayOnly_L_0 " because Auto Job Cleanup is enabled and this job has been complete for more than 10 days.
2014-09-23 10:06:13:      Job Cleanup Scan - Warning: completed job "[EXO] RS_190_2050_v0700_ssh_cascade_HI_images_render3d_FL-SprayOnly_L_0 " could not be archived because: An unexpected error occurred while interacting with the database (,,
2014-09-23 10:06:13:  Argument cannot be null.
2014-09-23 10:06:13:  Parameter name: document (FranticX.Database.DatabaseConnectionException)
2014-09-23 10:07:57:      Job Cleanup Scan - Archived completed job "[EXO] RS_190_1980_v0067_ssh_crestSlower_images_render3d_FL-CrestMistOnly_L_0 " because Auto Job Cleanup is enabled and this job has been complete for more than 10 days.
2014-09-23 10:07:58:      Job Cleanup Scan - Warning: completed job "[EXO] RS_190_1980_v0064_ssh_slowerstill_HI_images_render3d_FL-Wave_L_0 " could not be archived because: An unexpected error occurred while interacting with the database (,,
2014-09-23 10:07:58:  Argument cannot be null.
2014-09-23 10:07:58:  Parameter name: document (FranticX.Database.DatabaseConnectionException)
2014-09-23 10:09:32:      Job Cleanup Scan - Archived completed job "[EXO] RS_190_1980_v0067_ssh_crestSlower_cache_flowline_CrestSpray_0 " because Auto Job Cleanup is enabled and this job has been complete for more than 10 days.
2014-09-23 10:09:34:      Job Cleanup Scan - Warning: completed job "[EXO] RS_190_1980_v0067_ssh_crestSlower_images_render3d_FL-CrestSprayOnly_L_0 " could not be archived because: An unexpected error occurred while interacting with the database (,,
2014-09-23 10:09:34:  Argument cannot be null.
2014-09-23 10:09:34:  Parameter name: document (FranticX.Database.DatabaseConnectionException)
2014-09-23 10:10:52:      Job Cleanup Scan - Archived completed job "[EXO] RS_190_1980_v0067_ssh_crestSlower_images_render3d_FL-CrestSprayMistOnly_L_0 " because Auto Job Cleanup is enabled and this job has been complete for more than 10 days.
2014-09-23 10:12:40:      Job Cleanup Scan - Archived completed job "[EXO] RS_190_1980_v0068_ssh_cascading_images_render3d_FL-MistOnly_L_0 " because Auto Job Cleanup is enabled and this job has been complete for more than 10 days.
2014-09-23 10:12:40:      Job Cleanup Scan - Warning: completed job "[EXO] RS_190_1980_v0066_ssh_crestFaster_images_render3d_FL-CrestSprayMistOnly_L_0 " could not be archived because: An unexpected error occurred while interacting with the database (,,
2014-09-23 10:12:40:  Argument cannot be null.
2014-09-23 10:12:40:  Parameter name: document (FranticX.Database.DatabaseConnectionException)
2014-09-23 10:12:42:      Job Cleanup Scan - Warning: completed job "[GOLD] CS_000_0175_v0043_npo_latestFXandLigh_images_render3d_Light-GoldenGate_0 " could not be archived because: An unexpected error occurred while interacting with the database (,,
2014-09-23 10:12:42:  Argument cannot be null.
2014-09-23 10:12:42:  Parameter name: document (FranticX.Database.DatabaseConnectionException)
2014-09-23 10:15:54:      Job Cleanup Scan - Archived completed job "[EXO] RS_190_7030_v0089_lle_Hp93LS52_Say56_S05_images_render3d_FL-LeadSprayS05_L_0 " because Auto Job Cleanup is enabled and this job has been complete for more than 10 days.
2014-09-23 10:16:00:      Job Cleanup Scan - Warning: completed job "[EXO] RS_190_7030_v0096_lle_NoCollisions_S05_images_render3d_FL-WaveS05_L_0 " could not be archived because: An unexpected error occurred while interacting with the database (,,
2014-09-23 10:16:00:  Argument cannot be null.
2014-09-23 10:16:00:  Parameter name: document (FranticX.Database.DatabaseConnectionException)
2014-09-23 10:19:40:      Job Cleanup Scan - Archived completed job "[EXO] RS_190_7030_v0100_lle_LargerBreakup_S05_images_render3d_FL-HappyS05_L_0 " because Auto Job Cleanup is enabled and this job has been complete for more than 10 days.
2014-09-23 10:19:44:      Job Cleanup Scan - Warning: completed job "[GOLD] SB_000_0115_v0009_jma_Firstpass_images_render3d_Light-Buildings_0 " could not be archived because: An unexpected error occurred while interacting with the database (,,
2014-09-23 10:19:44:  Argument cannot be null.
2014-09-23 10:19:44:  Parameter name: document (FranticX.Database.DatabaseConnectionException)
2014-09-23 10:24:56:      Job Cleanup Scan - Archived completed job "[GOLD] SB_000_0115_v0009_jma_Firstpass_images_render3d_Light-BrigEagle_0 " because Auto Job Cleanup is enabled and this job has been complete for more than 10 days.
2014-09-23 10:24:57:      Job Cleanup Scan - Warning: completed job "[GOLD] SB_000_0115_v0009_jma_Firstpass_images_render3d_Light-MissleCruiser_0 " could not be archived because: An unexpected error occurred while interacting with the database (,,
2014-09-23 10:24:57:  Argument cannot be null.
2014-09-23 10:24:57:  Parameter name: document (FranticX.Database.DatabaseConnectionException)
2014-09-23 10:24:59:      Job Cleanup Scan - Warning: completed job "[EXO] RS_190_7030_v0097_lle_ElemFric0_05_S05_images_render3d_FL-LeadSurfaceS05_L_0 " could not be archived because: An unexpected error occurred while interacting with the database (,,
2014-09-23 10:24:59:  Argument cannot be null.
2014-09-23 10:24:59:  Parameter name: document (FranticX.Database.DatabaseConnectionException)
2014-09-23 10:26:04:      Job Cleanup Scan - Archived completed job "[EXO] RS_190_2060_v0864_str_CleanRend_images_render3d_FL-LeadSprayMist_L_0 " because Auto Job Cleanup is enabled and this job has been complete for more than 10 days.
2014-09-23 10:26:57:      Job Cleanup Scan - Archived completed job "[EXO] RS_190_2050_v0700_ssh_cascade_HI_images_render3d_FL-MistOnly_L_0 " because Auto Job Cleanup is enabled and this job has been complete for more than 10 days.
2014-09-23 10:27:32:      Job Cleanup Scan - Archived completed job "[EXO] RS_190_2050_v0700_ssh_cascade_HI_images_render3d_FL-SprayMistOnly_L_0 " because Auto Job Cleanup is enabled and this job has been complete for more than 10 days.
2014-09-23 10:29:12:      Job Cleanup Scan - Archived completed job "[EXO] RS_190_1980_v0065_ssh_foamglidefix_images_render3d_FL-FoamCaps_L_0 " because Auto Job Cleanup is enabled and this job has been complete for more than 10 days.
2014-09-23 10:30:17:      Job Cleanup Scan - Archived completed job "[GOLD] SB_000_0117_v0003_jma_FirstPass_images_render3d_Light-Buildings_0 " because Auto Job Cleanup is enabled and this job has been complete for more than 10 days.
2014-09-23 10:30:18:      Job Cleanup Scan - Warning: completed job "[GOLD] SB_000_0117_v0003_jma_FirstPass_images_render3d_Light-BrigEagle_0 " could not be archived because: An unexpected error occurred while interacting with the database (,,
2014-09-23 10:30:18:  Argument cannot be null.
2014-09-23 10:30:18:  Parameter name: document (FranticX.Database.DatabaseConnectionException)
2014-09-23 10:34:45:      Job Cleanup Scan - Archived completed job "[UMP] EP_144_0185_v0032_jma_NewCam_images_render3d_Light-All_0 " because Auto Job Cleanup is enabled and this job has been complete for more than 10 days.
2014-09-23 10:34:46:      Job Cleanup Scan - Warning: completed job "[UMP] EP_144_0185_v0032_jma_NewCam_images_render3d_Light-All-Patch_0 " could not be archived because: An unexpected error occurred while interacting with the database (,,
2014-09-23 10:34:46:  Argument cannot be null.
2014-09-23 10:34:46:  Parameter name: document (FranticX.Database.DatabaseConnectionException)
2014-09-23 10:38:51:      Job Cleanup Scan - Archived completed job "[UMP] EP_144_0185_v0032_jma_NewCam_images_render3d_Light-All-BALLZ_0 " because Auto Job Cleanup is enabled and this job has been complete for more than 10 days.
2014-09-23 10:38:52:      Job Cleanup Scan - Warning: completed job "[EXO] RS_190_7030_v0098_lle_BreakupVelocity_S05_images_render3d_FL-LeadSprayS05_L_0 " could not be archived because: An unexpected error occurred while interacting with the database (,,
2014-09-23 10:38:52:  Argument cannot be null.
2014-09-23 10:38:52:  Parameter name: document (FranticX.Database.DatabaseConnectionException)
2014-09-23 10:38:53:      Job Cleanup Scan - Warning: completed job "[EXO] RS_190_1980_v0063_ssh_slowerstill_images_render3d_FL-Wave_L_0 " could not be archived because: An unexpected error occurred while interacting with the database (,,
2014-09-23 10:38:53:  Argument cannot be null.
2014-09-23 10:38:53:  Parameter name: document (FranticX.Database.DatabaseConnectionException)
2014-09-23 10:38:54:      Job Cleanup Scan - Warning: completed job "[GOLD] CS_000_0105_v0009_cmu_exrTest_images_render3d_Light-CargoShip_0 " could not be archived because: An unexpected error occurred while interacting with the database (,,
2014-09-23 10:38:54:  Argument cannot be null.
2014-09-23 10:38:54:  Parameter name: document (FranticX.Database.DatabaseConnectionException)
2014-09-23 10:38:56:      Job Cleanup Scan - Warning: completed job "[GOLD] CS_000_0105_v0010_cmu_txTest_images_render3d_Light-CargoShip_0 " could not be archived because: An unexpected error occurred while interacting with the database (,,
2014-09-23 10:38:56:  Argument cannot be null.
2014-09-23 10:38:56:  Parameter name: document (FranticX.Database.DatabaseConnectionException)
2014-09-23 10:47:03:      Job Cleanup Scan - Archived completed job "[GOLD] SB_000_0005_v0407_ydi_test_boat_inter_images_render3d_FL-Flow_0 " because Auto Job Cleanup is enabled and this job has been complete for more than 10 days.
2014-09-23 10:47:04:      Job Cleanup Scan - Warning: completed job "[EXO] RS_190_7030_v0098_lle_BreakupVelocity_S05_cache_flowline_LeadSprayS05_0 " could not be archived because: An unexpected error occurred while interacting with the database (,,
2014-09-23 10:47:04:  Argument cannot be null.
2014-09-23 10:47:04:  Parameter name: document (FranticX.Database.DatabaseConnectionException)
2014-09-23 10:47:04:      Job Cleanup Scan - Warning: completed job "[EXO] RS_190_7030_v0096_lle_NoCollisions_S05_images_render3d_FL-HappyS05_L_0 " could not be archived because: An unexpected error occurred while interacting with the database (,,
2014-09-23 10:47:04:  Argument cannot be null.
2014-09-23 10:47:04:  Parameter name: document (FranticX.Database.DatabaseConnectionException)
2014-09-23 10:50:09:      Job Cleanup Scan - Archived completed job "[EXO] RS_190_7030_v0096_lle_NoCollisions_S05_cache_flowline_SurfaceS05_2 " because Auto Job Cleanup is enabled and this job has been complete for more than 10 days.
2014-09-23 10:50:11:      Job Cleanup Scan - Warning: completed job "[EXO] RS_190_7030_v0096_lle_NoCollisions_S05_cache_flowline_SurfaceS05_1 " could not be archived because: An unexpected error occurred while interacting with the database (,,
2014-09-23 10:50:11:  Argument cannot be null.
2014-09-23 10:50:11:  Parameter name: document (FranticX.Database.DatabaseConnectionException)
2014-09-23 10:53:01:      Job Cleanup Scan - Archived completed job "[EXO] RS_190_7030_v0096_lle_NoCollisions_S05_cache_flowline_SurfaceS05_3 " because Auto Job Cleanup is enabled and this job has been complete for more than 10 days.
2014-09-23 10:53:03:      Job Cleanup Scan - Warning: completed job "[EXO] RS_190_1980_v0057_ssh_foamtweaks_cache_flowline_CrestSpray_0 " could not be archived because: An unexpected error occurred while interacting with the database (,,
2014-09-23 10:53:03:  Argument cannot be null.
2014-09-23 10:53:03:  Parameter name: document (FranticX.Database.DatabaseConnectionException)
2014-09-23 10:53:04:      Job Cleanup Scan - Warning: completed job "[EXO] RS_190_7030_v0100_lle_LargerBreakup_S05_cache_flowline_SurfaceS05_3 " could not be archived because: An unexpected error occurred while interacting with the database (,,
2014-09-23 10:53:04:  Argument cannot be null.
2014-09-23 10:53:04:  Parameter name: document (FranticX.Database.DatabaseConnectionException)
2014-09-23 10:56:35:      Job Cleanup Scan - Archived completed job "[EXO] RS_190_7030_v0089_lle_Hp93LS52_Say56_S05_cache_flowline_LeadSprayS05_2 " because Auto Job Cleanup is enabled and this job has been complete for more than 10 days.
2014-09-23 10:56:37:      Job Cleanup Scan - Warning: completed job "[EXO] RS_190_1980_v0066_ssh_crestFaster_images_render3d_FL-CrestMistOnly_L_0 " could not be archived because: An unexpected error occurred while interacting with the database (,,
2014-09-23 10:56:37:  Argument cannot be null.
2014-09-23 10:56:37:  Parameter name: document (FranticX.Database.DatabaseConnectionException)
2014-09-23 10:59:07:      Job Cleanup Scan - Archived completed job "[EXO] RS_190_1980_v0057_ssh_foamtweaks_images_render3d_FL-CrestSprayOnly_L_0 " because Auto Job Cleanup is enabled and this job has been complete for more than 10 days.
2014-09-23 10:59:10:      Job Cleanup Scan - Warning: completed job "[EXO] RS_190_7030_v0098_lle_BreakupVelocity_S05_cache_flowline_LeadSprayS05_3 " could not be archived because: An unexpected error occurred while interacting with the database (,,
2014-09-23 10:59:10:  Argument cannot be null.
2014-09-23 10:59:10:  Parameter name: document (FranticX.Database.DatabaseConnectionException)
2014-09-23 11:06:24:      Job Cleanup Scan - Archived completed job "[GOLD] SB_000_0005_v0407_ydi_test_boat_inter_cache_flowline_Flow_0 " because Auto Job Cleanup is enabled and this job has been complete for more than 10 days.
2014-09-23 11:06:26:      Job Cleanup Scan - Warning: completed job "[EXO] RS_190_7030_v0089_lle_Hp93LS52_Say56_S05_cache_flowline_LeadSprayS05_1 " could not be archived because: An unexpected error occurred while interacting with the database (,,
2014-09-23 11:06:26:  Argument cannot be null.
2014-09-23 11:06:26:  Parameter name: document (FranticX.Database.DatabaseConnectionException)
2014-09-23 11:06:26:      Job Cleanup Scan - Warning: completed job "[EXO] RS_190_1980_v0057_ssh_foamtweaks_cache_flowline_WaveFoamCaps_0 " could not be archived because: An unexpected error occurred while interacting with the database (,,
2014-09-23 11:06:26:  Argument cannot be null.
2014-09-23 11:06:26:  Parameter name: document (FranticX.Database.DatabaseConnectionException)
2014-09-23 11:06:27:      Job Cleanup Scan - Warning: completed job "[EXO] RS_190_7030_v0096_lle_NoCollisions_S05_cache_flowline_SurfaceS05_0 " could not be archived because: An unexpected error occurred while interacting with the database (,,
2014-09-23 11:06:27:  Argument cannot be null.
2014-09-23 11:06:27:  Parameter name: document (FranticX.Database.DatabaseConnectionException)
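
A log like this is easier to read once it is tallied. The sketch below counts archived versus failed jobs per show tag (the bracketed prefix such as `[EXO]`). The function name `tally` and the regular expressions are illustrative and only match the line shapes seen in this log.

```python
import re
from collections import Counter

# Patterns based on the two line shapes that appear in the log above.
ARCHIVED = re.compile(r'Job Cleanup Scan - Archived completed job "\[(\w+)\]')
FAILED = re.compile(r'Job Cleanup Scan - Warning: completed job "\[(\w+)\]')

def tally(lines):
    """Count (outcome, show) pairs; other lines are ignored."""
    counts = Counter()
    for line in lines:
        archived = ARCHIVED.search(line)
        failed = FAILED.search(line)
        if archived:
            counts[("archived", archived.group(1))] += 1
        elif failed:
            counts[("failed", failed.group(1))] += 1
    return counts

# A few lines copied from the log above.
sample = [
    '2014-09-23 09:54:01:      Job Cleanup Scan - Archived completed job "[EXO] RS_190_1980_v0065_ssh_foamglidefix_cache_flowline_WaveFoamCaps_0 " because Auto Job Cleanup is enabled and this job has been complete for more than 10 days.',
    '2014-09-23 09:54:03:      Job Cleanup Scan - Warning: completed job "[EXO] RS_190_1980_v0065_ssh_foamglidefix_images_render3d_FL-Foam_L_0 " could not be archived because: An unexpected error occurred while interacting with the database (,,',
    '2014-09-23 09:54:03:  Argument cannot be null.',
    '2014-09-23 10:12:42:      Job Cleanup Scan - Warning: completed job "[GOLD] CS_000_0175_v0043_npo_latestFXandLigh_images_render3d_Light-GoldenGate_0 " could not be archived because: An unexpected error occurred while interacting with the database (,,',
]

counts = tally(sample)
for (outcome, show), n in sorted(counts.items()):
    print(f"{outcome:9s} {show:5s} {n}")
```

To run it on the full log, pass an open file handle instead of `sample`.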

Any suggestions welcome…

(I’m running 2 housecleaning steps on other machines:
deadlinecommand.exe -DoHouseCleaning 0 True PurgeDeletedJobs
deadlinecommand.exe -DoHouseCleaning 0 True purgeoldjobreports
to catch up. I suspect that’s the reason for those exceptions.)

Do you guys submit your scene files with the jobs? I’m noticing that each archive is taking minutes to execute, and I’m wondering if it’s due to including the scene file in the job archive (it would get included in the zip file, and then the zip gets copied to the archive job folder). If you are including the scene file, how big can these files get?
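
The "minutes per archive" observation can be checked from the timestamps. This sketch measures the gap between consecutive "Archived completed job" entries. The helper name `archive_gaps` is illustrative, and the sample lines are trimmed after the message prefix for brevity. Note that each gap also includes any failed attempts logged in between, so it is an upper bound on the time for one archive.

```python
from datetime import datetime

def archive_gaps(lines):
    """Seconds between consecutive 'Archived completed job' log entries."""
    stamps = [
        # The first 19 characters of each line are "YYYY-MM-DD HH:MM:SS".
        datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")
        for line in lines
        if "Archived completed job" in line
    ]
    return [(later - earlier).total_seconds()
            for earlier, later in zip(stamps, stamps[1:])]

# Timestamps taken from the log above; the rest of each line is trimmed.
sample = [
    "2014-09-23 09:54:01:      Job Cleanup Scan - Archived completed job",
    "2014-09-23 09:54:03:      Job Cleanup Scan - Warning: completed job",
    "2014-09-23 09:55:27:      Job Cleanup Scan - Archived completed job",
    "2014-09-23 09:56:35:      Job Cleanup Scan - Archived completed job",
]

gaps = archive_gaps(sample)
print("gaps in seconds:", gaps)
print("average:", sum(gaps) / len(gaps))
```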

For the majority of the jobs, we don’t put the max file in the repo. The largest current zip file in the jobsArchived folder is 9 megs and that’s all logs.
