Our pending job scan takes about 20 minutes (!) per cycle. The majority of that time is not spent on the actual dependency testing, but on changing the task statuses…
For example:
2013-12-16 00:29:38: Executing dependency script: /mnt/s2/exchange/software/managed/pythonScripts/site-packages/scl/deadline/scriptDependency.py
2013-12-16 00:29:38: Running Scanline handleJobDependencies checker v0.42 for JobID: 52aeb133c3f6ebe230ad647d TaskIDs: [u'0', u'1', u'2', u'3', u'4', u'5', u'6', u'7', u'8', u'9', u'10', u'11', u'12', u'13', u'14', u'15', u'16', u'17', u'18', u'19', u'20', u'21', u'22', u'23', u'24', u'25', u'26', u'27', u'28', u'29', u'30', u'31', u'32', u'33', u'34', u'35', u'36', u'37', u'38', u'39', u'40', u'41', u'42', u'43', u'44', u'45', u'46', u'47', u'48', u'49', u'50', u'51', u'52', u'53', u'54', u'55', u'56', u'57', u'58', u'59', u'60', u'61', u'62', u'63', u'64', u'65', u'66', u'67', u'68', u'69', u'70', u'71', u'72', u'73', u'74', u'75', u'76', u'77', u'78', u'79', u'80', u'81', u'82', u'83', u'84', u'85', u'86', u'87', u'88', u'89', u'90', u'91', u'92', u'93', u'94', u'95', u'96', u'97', u'98', u'99', u'100', u'101', u'102', u'103', u'104', u'105', u'106', u'107', u'108', u'109', u'110', u'111', u'112', u'113', u'114', u'115', u'116', u'117', u'118', u'119', u'120', u'121', u'122', u'123', u'124', u'125', u'126', u'127', u'128', u'129', u'130', u'131', u'132', u'133', u'134', u'135', u'136', u'137', u'138', u'139', u'140', u'141', u'142', u'143', u'144', u'145', u'146', u'147', u'148', u'149', u'150', u'151', u'152', u'153', u'154', u'155', u'156', u'157', u'158', u'159', u'160', u'161']
2013-12-16 00:29:38: Dependency to check: [Flowline] /mnt/flspace/CL/vulcan/WW_108_0150/cache/flowline/Spray/v0304_str_Solved/Spray
2013-12-16 00:29:38: globbing: set([])
2013-12-16 00:29:38: cachePathStates: set([952, 953, 954, 955, 956, 957, 958, 959, 960, 961, 962, 963, 964, 965, 966, 967, 968, 969, 970, 971, 972, 973, 974, 975, 976, 977, 978, 979, 980, 981])
2013-12-16 00:29:38: globbing done. Checking against task list…
2013-12-16 00:29:38: Queueing: ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '20', '21', '22', '23', '24', '25', '26', '27', '28', '29', '30']
2013-12-16 00:29:38: Dependency script returned 31 tasks that can start: /mnt/s2/exchange/software/managed/pythonScripts/site-packages/scl/deadline/scriptDependency.py
Note that the whole test took less than a second. Then there is a delay of roughly a minute and a half before it actually releases the tasks:
2013-12-16 00:31:07: Pending Job Scan - Released pending tasks (0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30) for job "[VULCAN] WW_108_0150_v0304_str_Solved_cache_flowline_Mist_2 " because the frames they depends on have finished and/or their required assets are available.
This is a single job, and since we have hundreds, it really adds up. People end up trying to release their jobs manually, but that also takes ages.
The Deadline MongoDB machine is now extremely powerful, we are on beta 13, and Pulse is running on a different machine. Something else is up…
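For context, the dependency check itself is nothing exotic. It is a standard Deadline scripted dependency, roughly along the lines of the sketch below (a minimal sketch assuming the documented __main__(jobID, taskIDs) entry point; the cache path, frame offset and filename pattern are illustrative, not our production scriptDependency.py):

# A minimal sketch of a scripted task dependency, assuming Deadline's
# __main__(jobID, taskIDs) entry point. The cache directory, frame offset
# and filename pattern below are illustrative only.
import glob
import os
import re

CACHE_DIR = "/mnt/flspace/CL/vulcan/WW_108_0150/cache/flowline/Spray/v0304_str_Solved"
FIRST_FRAME = 952  # illustrative: cache frame that corresponds to task 0

def __main__(jobID, taskIDs):
    # Glob the cache directory and collect the frame numbers that exist on disk.
    cachedFrames = set()
    for path in glob.glob(os.path.join(CACHE_DIR, "Spray*")):
        match = re.search(r"(\d+)(?:\.\w+)?$", os.path.basename(path))
        if match:
            cachedFrames.add(int(match.group(1)))

    # Release only the tasks whose corresponding cache frame already exists.
    return [t for t in taskIDs if (int(t) + FIRST_FRAME) in cachedFrames]

The point being: all of the real work (the globbing and the comparison against the task list) is over in well under a second. The expensive part is whatever Deadline does afterwards to flip those 31 tasks from pending to queued.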