We have a problem where dependencies take AGES to queue… We have jobs being submitted faster than deadline can unpend them…
For example, our script runs instantly (checking ~100 tasks’ dependencies on disk):
2014-07-23 14:27:43: Executing dependency script:
2014-07-23 14:27:43: Running Scanline handleJobDependencies checker v0.42 for JobID: 53cfe181162dfe0664a774c8
2014-07-23 14:27:43: cachePathStates
2014-07-23 14:27:43: globbing done. Checking against task list…
2014-07-23 14:27:43: Queueing: [‘98’, ‘99’] 2014-07-23 14:27:43: Dependency script returned 2 tasks that can start: /mnt/s2/exchange/software/managed/pythonScripts/site-packages/scl/deadline/scriptDependency.py
Then 82 seconds later, deadline releases the tasks… :
2014-07-23 14:28:24: Old Statistics - Purged old slave statistics in 42.938 s
2014-07-23 14:28:24: Old Statistics - Purging repository statistics that are older than Mar 25/14 14:27:41
2014-07-23 14:28:25: Old Statistics - Purged old repository statistics in 841.731 ms
2014-07-23 14:28:25: Performing Stalled Slave Scan…
2014-07-23 14:28:25: Stalled Slave Scan - Loading slave states
2014-07-23 14:28:25: Stalled Slave Scan - Loaded 1636 slave states in 157.567 ms
2014-07-23 14:28:25: Stalled Slave Scan - Scanning slave states
2014-07-23 14:28:26: Stalled Slave Scan - Cleaned up 0 stalled slaves in 24.863 ms
2014-07-23 14:28:26: Stalled Slave Scan - Done.
2014-07-23 14:28:26: Process exit code: 0 2014-07-23 14:28:55: Pending Job Scan - Released pending tasks [98,99] because the frames they depends on have finished and/or their required assets are available.
Thats just 2 tasks… we have 404 pending JOBS, with thousands of tasks total…
Not sure what to do, we have ~640 pending jobs now, and its just growing, deadline cant keep up… our queue counter is so high right now, it takes 1 min per job to set task states from pending to queued. Lots of angry people
We are trying to shut down chunks of the farm to see if that improves the db performance. But it feels like we are hitting the current limits of deadline with our farm size
After we shut down 2-300 idle slaves, the db server is now back alive again
What the heck are the idle slaves doing to the farm? Their ‘look for tasks’ timeout is set to 180 seconds now, they should hardly ever do anything… it almost feels like maybe pulse is doing something?
Dunno. Really have no clues…
based on the logs the slaves are mostly inactive while idle
Although, one weird thing i noticed, is that the limit checks take 1-2 mins PER dequeue attempt? Is that expected?
2014-07-23 17:21:36: Scheduler Thread - Render Thread 0 completed its task
2014-07-23 17:21:36: Scheduler Thread - Seconds before next job scan: 2
2014-07-23 17:21:38: Scheduler - Performing Job scan on Primary Pools with scheduling order Pool, Weighted, Balanced
2014-07-23 17:21:39: Scheduler - Slave is not whitelisted for upload limit.
2014-07-23 17:21:39: Scheduler - Slave is not whitelisted for upload limit.
2014-07-23 17:21:39: Scheduler - The 53d038c85ff1d8ad04c4790e limit is maxed out.
2014-07-23 17:21:40: Scheduler - Slave is not whitelisted for upload limit.
2014-07-23 17:21:40: Scheduler - The 53d03935e4ac450f1421a4f1 limit is maxed out.
2014-07-23 17:21:40: Scheduler - Slave is not whitelisted for upload limit.
2014-07-23 17:21:40: Scheduler - The 53d0396b91468f0864073d68 limit is maxed out.
2014-07-23 17:21:41: Scheduler - Slave is not whitelisted for upload limit.
2014-07-23 17:21:41: Scheduler - Slave is not whitelisted for upload limit.
2014-07-23 17:21:41: Scheduler - Slave is not whitelisted for upload limit.
2014-07-23 17:21:42: Scheduler - The 53d03a5554265d197cf7bf6e limit is maxed out.
2014-07-23 17:21:42: Scheduler - The 53d03a762f88e63f742ba24e limit is maxed out.
2014-07-23 17:21:42: Scheduler - Slave is not whitelisted for upload limit.
2014-07-23 17:21:42: Scheduler - Slave is not whitelisted for upload limit.
2014-07-23 17:21:43: Scheduler - Slave is not whitelisted for upload limit.
2014-07-23 17:21:43: Scheduler - The 53d03b3b31592cf158c73db9 limit is maxed out.
2014-07-23 17:21:43: Scheduler - Slave is not whitelisted for upload limit.
2014-07-23 17:21:44: Scheduler - The 53d03b5129af49286c3d88e6 limit is maxed out.
2014-07-23 17:21:44: Scheduler - The 53d03b5c0c546f13fcea3572 limit is maxed out.
2014-07-23 17:21:44: Scheduler - Slave is not whitelisted for upload limit.
2014-07-23 17:21:44: Scheduler - The 53d03b9e8774a056d8a13701 limit is maxed out.
2014-07-23 17:21:44: Scheduler - The 53d03bd5a93953377c666b4a limit is maxed out.
2014-07-23 17:21:44: Scheduler - The 53d03be5542e2513681cfb0c limit is maxed out.
2014-07-23 17:21:45: Scheduler - The 53d03bed211de1ccdc6f9eec limit is maxed out.
2014-07-23 17:21:45: Scheduler - The 53d03beffdece075e000545d limit is maxed out.
2014-07-23 17:21:45: Scheduler - Slave is not whitelisted for upload limit.
2014-07-23 17:21:45: Scheduler - Slave is not whitelisted for upload limit.
2014-07-23 17:21:45: Scheduler - The 53d03c70948ff0be1c588166 limit is maxed out.
2014-07-23 17:21:46: Scheduler - Slave is not whitelisted for upload limit.
2014-07-23 17:21:46: Scheduler - The 53d03c7ee389b73a88685fd2 limit is maxed out.
2014-07-23 17:21:46: Scheduler - The 53d03cc2d7bb9714b00e709d limit is maxed out.
2014-07-23 17:21:46: Scheduler - Slave is not whitelisted for upload limit.
2014-07-23 17:21:46: Scheduler - Slave is not whitelisted for upload limit.
2014-07-23 17:21:47: Scheduler - Slave is not whitelisted for upload limit.
2014-07-23 17:21:47: Scheduler - The 53d03d39677a791fe82136de limit is maxed out.
2014-07-23 17:21:48: Scheduler - Slave is not whitelisted for upload limit.
2014-07-23 17:21:48: Scheduler - Slave is not whitelisted for upload limit.
2014-07-23 17:21:48: Scheduler - The 53d03d693afddffa644a6f9e limit is maxed out.
2014-07-23 17:21:49: Scheduler - Slave is not whitelisted for upload limit.
2014-07-23 17:21:49: Scheduler - The 53d03dc3aad77a26f8032e53 limit is maxed out.
2014-07-23 17:21:50: Scheduler - Slave is not whitelisted for upload limit.
2014-07-23 17:21:51: Scheduler - The 53d03e583799d020382ec5e0 limit is maxed out.
2014-07-23 17:21:51: Scheduler - The 53d03e653799d070186e69ae limit is maxed out.
2014-07-23 17:21:51: Scheduler - The 53d03e66eb7af2060885203d limit is maxed out.
2014-07-23 17:21:51: Scheduler - The 53d03e939495c815d0f3c6c6 limit is maxed out.
2014-07-23 17:21:51: Scheduler - Slave is not whitelisted for upload limit.
2014-07-23 17:21:51: Scheduler - The 53d03f41fbe7670854a13149 limit is maxed out.
2014-07-23 17:21:52: Scheduler - The 53d03f571633290d0ce66c37 limit is maxed out.
2014-07-23 17:21:52: Scheduler - The 53d040282645070b14a54929 limit is maxed out.
2014-07-23 17:21:52: Scheduler - The 53d0403c78de961a3c2c2a40 limit is maxed out.
2014-07-23 17:21:52: Scheduler - Slave is not whitelisted for upload limit.
2014-07-23 17:21:52: Scheduler - The 53d040f1927bbb4b1065b13f limit is maxed out.
2014-07-23 17:21:53: Scheduler - Slave is not whitelisted for upload limit.
2014-07-23 17:21:53: Scheduler - Slave is not whitelisted for upload limit.
2014-07-23 17:21:54: Scheduler - Slave is not whitelisted for upload limit.
2014-07-23 17:21:54: Scheduler - Slave is not whitelisted for upload limit.
2014-07-23 17:21:55: Scheduler - Slave is not whitelisted for upload limit.
2014-07-23 17:21:55: Scheduler - The 53d042b4fa674026249ce5d7 limit is maxed out.
2014-07-23 17:21:55: Scheduler - The 53d042f663a48561209e301d limit is maxed out.
2014-07-23 17:21:55: Scheduler - The 53d042f9415e688b24e501d0 limit is maxed out.
2014-07-23 17:21:56: Scheduler - Slave is not whitelisted for upload limit.
2014-07-23 17:21:56: Scheduler - The 53d04370e639060d5c790b5d limit is maxed out.
2014-07-23 17:21:57: Scheduler - Slave is not whitelisted for upload limit.
2014-07-23 17:21:57: Scheduler - The 53d0449b17a44313506131e1 limit is maxed out.
2014-07-23 17:21:57: Scheduler - Slave is not whitelisted for upload limit.
2014-07-23 17:21:58: Scheduler - The 53d044bfc508b123a080710a limit is maxed out.
2014-07-23 17:21:58: Scheduler - The 53d044ef24d79e9128b44c25 limit is maxed out.
2014-07-23 17:21:58: Scheduler - Slave is not whitelisted for upload limit.
2014-07-23 17:21:58: Scheduler - Slave is not whitelisted for upload limit.
2014-07-23 17:21:59: Scheduler - The 53d0458c443133af744591f3 limit is maxed out.
2014-07-23 17:21:59: Scheduler - Slave is not whitelisted for upload limit.
2014-07-23 17:21:59: Scheduler - The 53d045caadca807b04f46c2b limit is maxed out.
2014-07-23 17:21:59: Scheduler - Slave is not whitelisted for upload limit.
2014-07-23 17:22:00: Scheduler - Slave is not whitelisted for upload limit.
2014-07-23 17:22:00: Scheduler - The 53d0461bfe60951e280e334a limit is maxed out.
2014-07-23 17:22:00: Scheduler - The 53d0462d76251ba3ecac19a1 limit is maxed out.
2014-07-23 17:22:00: Scheduler - Slave is not whitelisted for upload limit.
2014-07-23 17:22:01: Scheduler - The 53d046bbb611261108532f80 limit is maxed out.
2014-07-23 17:22:01: Scheduler - The 53d046c11b45c61d083fa243 limit is maxed out.
2014-07-23 17:22:01: Scheduler - Slave is not whitelisted for upload limit.
2014-07-23 17:22:01: Scheduler - The 53d0488976da670d08703d1c limit is maxed out.
2014-07-23 17:22:01: Scheduler - Slave is not whitelisted for upload limit.
2014-07-23 17:22:01: Scheduler - The 53d04a25183af3017c3811d1 limit is maxed out.
2014-07-23 17:22:02: Scheduler - Slave is not whitelisted for upload limit.
2014-07-23 17:22:02: Scheduler - Slave is not whitelisted for upload limit.
2014-07-23 17:22:03: Scheduler - The 53d04bcfb2354f0c38e554b4 limit is maxed out.
2014-07-23 17:22:03: Scheduler - The 53d04cf1350fe30c449f1c54 limit is maxed out.
2014-07-23 17:22:03: Scheduler - Slave is not whitelisted for upload limit.
2014-07-23 17:22:03: Scheduler - The 53d04d2bf7d3ea1e8cbc05a8 limit is maxed out.
2014-07-23 17:22:03: Scheduler - Slave is not whitelisted for upload limit.
2014-07-23 17:22:04: Scheduler - Slave is not whitelisted for upload limit.
2014-07-23 17:22:05: Scheduler - The 53d04e881c5e271c7cfaa181 limit is maxed out.
2014-07-23 17:22:05: Scheduler - The 53d04e9638c14e0d5c771414 limit is maxed out.
2014-07-23 17:22:06: Scheduler - Slave is not whitelisted for upload limit.
2014-07-23 17:22:06: Scheduler - The 53d0329378301d0df8c5cf56 limit is maxed out.
2014-07-23 17:22:06: Scheduler - Slave is not whitelisted for upload limit.
2014-07-23 17:22:07: Scheduler - The 53d04211548f1e19a09fccc6 limit is maxed out.
2014-07-23 17:22:07: Scheduler - The 53d04044548f1e17b4f0d7f1 limit is maxed out.
2014-07-23 17:22:07: Scheduler - Slave is not whitelisted for rsmb limit.
2014-07-23 17:22:09: Scheduler - The ingest2d limit is maxed out.
2014-07-23 17:22:10: Scheduler - The ingest2d limit is maxed out.
2014-07-23 17:22:10: Scheduler - The ingest2d limit is maxed out.
2014-07-23 17:22:11: Scheduler - The ingest2d limit is maxed out.
2014-07-23 17:22:12: Scheduler - The ingest2d limit is maxed out.
2014-07-23 17:22:12: Scheduler - The ingest2d limit is maxed out.
2014-07-23 17:22:13: Scheduler - The ingest2d limit is maxed out.
2014-07-23 17:22:14: Scheduler - The ingest2d limit is maxed out.
2014-07-23 17:22:15: Scheduler - Successfully dequeued 1 task(s). Returning.
2014-07-23 17:22:15: 0: Shutdown
2014-07-23 17:22:16: 0: Exited ThreadMain(), cleaning up...
I monitored what an idle slave does between job tests, and it seems like it still has traffic to the mongo db.
For example, this slave started a ‘wait’ process at 17:52:33, and performed its next job check at 17:56:12 (full log attached):
2014-07-23 17:52:33: Scheduler Thread - Seconds before next job scan: 219
2014-07-23 17:56:12: Scheduler - Performing Job scan on Primary Pools with scheduling order Pool, Weighted, Balanced
However, inbetween there was mongo related traffic every 9-10 seconds back and forth between the different servers and the machine.
This wireshark packet stream is filtered only to the mongo packets, but there were additional tcp packets as well:
Its set at 20 seconds, yet it connects every 10 sec
The 20 seconds, is that setting the same for when its rendering and when its idle? While rendering, i would want more frequent updates (20s is good), since i want it to show progress fairly well. When idle, thats probably not necessary?
Also, why is it that while rendering, the farm is happy, db is healthy. But as soon as 3-500 machines go idle -> bang…
Maybe try increasing it to 40 seconds to see if that helps, and if it does, try seeing if you can lower it to 30 seconds. If you do this, and the slaves still seem to be communicating with the DB every 10 seconds, then it must be something else, and we’ll have to figure out what is responsible for that traffic. We’ll log it as an issue anyway and look into it.
I should reiterate too that we’ve made many performance improvements in Deadline 7. The slave info is in a separate database, so those higher frequency updates won’t lock other areas of the database. In addition, because we use Mongo’s new server side timestamp feature, each slave info update is only one write operation instead of two.
Another area we can probably improve upon is reducing the amount of data that is sent for each slave info update, since some of the data (total ram, machine name, video card, etc) won’t change while the slave is running.
What you are seeing every 10 seconds is the Mongo Drivers Heartbeat checking in (doing a simple ping and making sure that it is talking to the primary db). Unfortunately this is hard coded in Mongo’s drivers at 10 seconds and there is no way to change this on the users side. They already have an open ticket asking to be able to customize this (https://jira.mongodb.org/browse/CSHARP-719) however there has been no traction on as far as I can tell.
2014-09-11 10:14:50: Queueing: ['10', '11', '12']
2014-09-11 10:14:50: Dependency script returned 3 tasks that can start: /mnt/s2/exchange/software/managed/pythonScripts/site-packages/scl/deadline/scriptDependency.py
2014-09-11 10:15:00: Server Thread - Auto Config: Received packet on autoconfig port
2014-09-11 10:15:00: Server Thread - Auto Config: Picking configuration based on:
2014-09-11 10:15:00: Server Thread - LAPRO0437
2014-09-11 10:15:00: Server Thread - ::ffff:172.18.3.87
2014-09-11 10:15:00: Server Thread - Auto Config: Match found for rule Dedicated Render (HostnameRegex)
2014-09-11 10:15:00: Server Thread - Auto Config: Created a configuration worth sending
2014-09-11 10:15:00: Server Thread - Auto Config: Received a request for configuration from ::ffff:172.18.3.87
2014-09-11 10:15:00: Server Thread - Auto Config: Configuration sent
2014-09-11 10:15:14: Server Thread - Auto Config: Received packet on autoconfig port
2014-09-11 10:15:14: Server Thread - Auto Config: Picking configuration based on:
2014-09-11 10:15:14: Server Thread - LAPRO0437
2014-09-11 10:15:14: Server Thread - ::ffff:172.18.3.87
2014-09-11 10:15:14: Server Thread - Auto Config: Match found for rule Dedicated Render (HostnameRegex)
2014-09-11 10:15:14: Server Thread - Auto Config: Created a configuration worth sending
2014-09-11 10:15:14: Server Thread - Auto Config: Received a request for configuration from ::ffff:172.18.3.87
2014-09-11 10:15:14: Server Thread - Auto Config: Configuration sent
2014-09-11 10:15:30: Power Management - Thermal Shutdown: Skipping zone "Test" because it is disabled
2014-09-11 10:15:30: Power Management - Thermal Shutdown: Skipping zone "Slaves" because it is disabled
2014-09-11 10:15:30: Power Management - Thermal Shutdown: Skipping zone "Laszlo" because it is disabled
2014-09-11 10:15:30: Power Management - Thermal Shutdown: Skipping zone "AnimatorWorkstations" because it is disabled
2014-09-11 10:15:30: Power Management - Idle Shutdown: Skipping idle shutdown group "Test" because it is disabled
2014-09-11 10:15:30: Power Management - Idle Shutdown: Skipping idle shutdown group "Slaves" because it is disabled
2014-09-11 10:15:30: Power Management - Idle Shutdown: Skipping idle shutdown group "Laszlo" because it is disabled
2014-09-11 10:15:30: Power Management - Idle Shutdown: Skipping idle shutdown group "AnimatorWorkstations" because it is disabled
2014-09-11 10:15:30: Power Management - Machine Startup: There are no slaves that need to be woken up at this time
2014-09-11 10:15:30: Power Management - Machine Restart: Skipping machine group "Test" because it is disabled
2014-09-11 10:15:30: Power Management - Machine Restart: Skipping machine group "Slaves" because it is disabled
2014-09-11 10:15:30: Power Management - Machine Restart: Skipping machine group "Laszlo" because it is disabled
2014-09-11 10:15:30: Power Management - Machine Restart: Skipping machine group "AnimatorWorkstations" because it is disabled
2014-09-11 10:15:30: Power Management - Slave Scheduling: Skipping scheduling group "Test" because it is disabled
2014-09-11 10:15:30: Power Management - Slave Scheduling: Skipping scheduling group "Slaves" because it is disabled
2014-09-11 10:15:30: Power Management - Slave Scheduling: Skipping scheduling group "Laszlo" because it is disabled
2014-09-11 10:15:30: Power Management - Slave Scheduling: Skipping scheduling group "AnimatorWorkstations" because it is disabled
2014-09-11 10:15:36: Pending Job Scan - Released pending tasks [10,11,12] because the frames they depends on have finished and/or their required assets are available.
The important lines are: 2014-09-11 10:14:50: Dependency script returned 3 tasks that can start: /mnt/s2/exchange/software/managed/pythonScripts/site-packages/scl/deadline/scriptDependency.py
… 2014-09-11 10:15:36: Pending Job Scan - Released pending tasks [10,11,12] because the frames they depends on have finished and/or their required assets are available.
46 seconds to flip 3 task states from pending to queued. This is terrible… ((((((
2014-09-11 10:02:00: Queueing: ['89']
2014-09-11 10:02:00: Dependency script returned 1 tasks that can start: /mnt/s2/exchange/software/managed/pythonScripts/site-packages/scl/deadline/scriptDependency.py
2014-09-11 10:02:01: Job Cleanup Scan - Deleted completed job "[EXO] Create Proxy Quicktime: RS_190_2050_Light-Tech_beauty_v0009_r" because Delete On Complete was enabled.
2014-09-11 10:02:04: Job Cleanup Scan - Deleted completed job "[YETI] Create Proxy Quicktime: MR_246_0200_2d_slapcomps_fxslap_v0006" because Delete On Complete was enabled.
2014-09-11 10:02:08: Job Cleanup Scan - Deleted completed job "[FAST7] Create Proxy Quicktime: SG_153_1010_2d_plates_bgarray_projection_rt_v0004" because Delete On Complete was enabled.
2014-09-11 10:02:10: Job Cleanup Scan - Deleted completed job "[EXO] Create Proxy Quicktime: RS_190_2050_Light-Tech_beauty_v0009_r" because Delete On Complete was enabled.
2014-09-11 10:02:14: Job Cleanup Scan - Deleted completed job "[MJ] Create Proxy Quicktime: RP_093_0230_2d_comps_main_v0020" because Delete On Complete was enabled.
2014-09-11 10:02:17: Job Cleanup Scan - Deleted completed job "[FAST7] ShootBot: Avid QT TLR_000_1750_2d_tcomps_outside_exposure_wedge_v0002" because Delete On Complete was enabled.
2014-09-11 10:02:20: Job Cleanup Scan - Deleted completed job "[FAST7] ShootBot: Avid QT SG_153_1010_2d_plates_bgarray_projection_rt_v0004" because Delete On Complete was enabled.
2014-09-11 10:02:23: Job Cleanup Scan - Deleted completed job "[EXO] ShootBot: Avid QT RS_190_7030_FL-LeadSurfSprayMist_v0029" because Delete On Complete was enabled.
2014-09-11 10:02:23: Job Cleanup Scan - Deleted 10 and archived 0 completed jobs in 28.379 s
2014-09-11 10:02:23: Job Cleanup Scan - Done.
2014-09-11 10:02:23: Purging Unsubmitted Jobs
2014-09-11 10:02:23: Unsubmitted Job Scan - Loading unsubmitted jobs
2014-09-11 10:02:25: Unsubmitted Job Scan - Loaded 2 unsubmitted jobs in 704.699 ms
2014-09-11 10:02:25: Unsubmitted Job Scan - Deleted 0 unsubmitted jobs in 2.000 μs
2014-09-11 10:02:25: Unsubmitted Job Scan - Done.
2014-09-11 10:02:25: Purging Deleted Jobs
2014-09-11 10:02:25: Deleted Job Scan - Loading deleted jobs
2014-09-11 10:02:25: Deleted Job Scan - Loaded 9800 deleted jobs in 507.222 ms
2014-09-11 10:02:25: Deleted Job Scan - Purging deleted job '541082811d5ce10ae4d77913' because it was deleted over 24 hour(s) ago
2014-09-11 10:02:26: Deleted Job Scan - Purging deleted job '5410835761078a0c3c9b79cb' because it was deleted over 24 hour(s) ago
2014-09-11 10:02:26: Deleted Job Scan - Purging deleted job '54108375ad996b1314b256cd' because it was deleted over 24 hour(s) ago
2014-09-11 10:02:27: Deleted Job Scan - Purging deleted job '541082fe81dee029287d9ed6' because it was deleted over 24 hour(s) ago
2014-09-11 10:02:28: Deleted Job Scan - Purging deleted job '5410833d069eb45dcc26406d' because it was deleted over 24 hour(s) ago
2014-09-11 10:02:29: Deleted Job Scan - Purging deleted job '5410827b929e66467426d0d2' because it was deleted over 24 hour(s) ago
2014-09-11 10:02:30: Deleted Job Scan - Purging deleted job '541082481c26ae48087e4414' because it was deleted over 24 hour(s) ago
2014-09-11 10:02:31: Deleted Job Scan - Purging deleted job '541082844c2e5fbe5cdf5e8f' because it was deleted over 24 hour(s) ago
2014-09-11 10:02:31: Deleted Job Scan - Purging deleted job '5410835461078a0a00965a40' because it was deleted over 24 hour(s) ago
2014-09-11 10:02:32: Deleted Job Scan - Purging deleted job '5410836cad996b1ce4d0b233' because it was deleted over 24 hour(s) ago
2014-09-11 10:02:32: Deleted Job Scan - Purging deleted job '54108372ad996b19749a89fb' because it was deleted over 24 hour(s) ago
2014-09-11 10:02:33: Deleted Job Scan - Purged 11 deleted jobs in 7.873 s
2014-09-11 10:02:33: Deleted Job Scan - Done.
2014-09-11 10:02:33: Purging Old Job Auxiliary Files
2014-09-11 10:02:33: Auxiliary File Scan - Scanning for auxiliary directories
2014-09-11 10:02:39: Auxiliary File Scan - Found 25414 auxiliary directories in 6.430 s
2014-09-11 10:02:39: Auxiliary File Scan - Loading job IDs
2014-09-11 10:02:41: Auxiliary File Scan - Loaded 25406 job IDs in 1.684 s
2014-09-11 10:02:45: Pending Job Scan - Released pending tasks [89] because the frames they depends on have finished and/or their required assets are available.
2014-09-11 10:02:46: Queueing: ['89']
2014-09-11 10:02:46: Dependency script returned 1 tasks that can start: /mnt/s2/exchange/software/managed/pythonScripts/site-packages/scl/deadline/scriptDependency.py
2014-09-11 10:03:08: Server Thread - Auto Config: Received packet on autoconfig port
2014-09-11 10:03:08: Server Thread - Auto Config: Picking configuration based on:
2014-09-11 10:03:08: Server Thread - LAPRO0419
2014-09-11 10:03:08: Server Thread - ::ffff:172.18.3.69
2014-09-11 10:03:08: Server Thread - Auto Config: Match found for rule Dedicated Render (HostnameRegex)
2014-09-11 10:03:08: Server Thread - Auto Config: Created a configuration worth sending
2014-09-11 10:03:08: Server Thread - Auto Config: Received a request for configuration from ::ffff:172.18.3.69
2014-09-11 10:03:08: Server Thread - Auto Config: Configuration sent
2014-09-11 10:03:36: Pending Job Scan - Released pending tasks [89] because the frames they depends on have finished and/or their required assets are available.
This includes the fix that Jon made that should help with the performance of releasing these pending tasks. You should be able to just update your Pulse machine initially to see if this helps.
After killing all deadline processes and restarting them again, its somewhat faster:
2014-09-11 10:25:05: Queueing: ['8', '14', '15']
2014-09-11 10:25:05: Dependency script returned 3 tasks that can start: /mnt/s2/exchange/software/managed/pythonScripts/site-packages/scl/deadline/scriptDependency.py
2014-09-11 10:25:15: Pending Job Scan - Released pending tasks [8,14,15] because the frames they depends on have finished and/or their required assets are available.
10 seconds for 3 tasks, faster than before, but about 1/100th the speed that it should be,… this should take a ~50-100ms, if that.
Something is going on in there guys thats making things very sad for us…
We just found a bug in the build that was preventing Slaves from properly marking Jobs as failed through Failure Detection, so if you rely heavily on Failure Detection, I’d recommend waiting for the next patch. I fixed the issue internally already, so we should get another build out relatively quickly.
But yeah, when the time comes, auto-upgrade should work .
Laszlo, I think you should still put the new build on your Pulse machine at least though, if you have time. This shouldn’t impact Failure Detection, and should help lighten the load on the DB when doing dependency checks.