AWS Thinkbox Discussion Forums

Deadline Machine Limit not using all nodes

Here is our situation:

  • We have 500 render nodes
  • 1 Job on the farm with a machine limit of 40

How can we make it so that, when render nodes are idle, they pick up frames from this job even beyond its machine limit of 40, until other jobs are submitted?

What happens now is that we start with 100 jobs on the farm, all set to a machine limit of 40. As jobs complete and fewer remain on the farm, the remaining few jobs do not use all the available nodes.

Is this something achievable?

The Machine Limit is a hard limit. It might be possible to manipulate it with scripts and events, but it would probably be messy.

I would suggest you look instead into the “Pool, Weighted, First-In First-Out” job scheduling algorithm:
docs.thinkboxsoftware.com/produ … ling-order

It behaves similarly to the default “Pool, Priority, First-In First-Out”, but the Weighted part lets you set up some fancy logic based on the Priority, the number of Tasks the Job is already rendering, the number of Errors it has accumulated, and the submission time.

If you set the number of Tasks to reduce the Weight (which is the default behavior anyway), the Job with the highest priority and earliest submission time will get the first Slaves. The more Tasks it gets running, the lower the Job’s Weight becomes, until at some point the Slaves start ignoring it because another Job with the same Priority has fewer Tasks and thus a higher Weight. As a result, the Slaves will distribute themselves naturally across the earliest, highest priority Jobs. However, as the number of Jobs in the queue goes down, the Slaves will spread across the remaining Jobs in higher numbers, and if only one Job remains, all Slaves will end up working on it because they have nowhere else to go…
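
To make the "fewer Tasks, higher Weight" argument concrete, here is a rough derivation. It assumes the weight is a linear sum of the four factors mentioned above, which is my understanding of how the Weighted scheduling option works. The symbols $PW$, $EW$, $SW$, $RW$ (the four configurable weights), $P_j$, $E_j$, $S_j$, $T_j$ (priority, errors, submission time, rendering tasks of Job $j$) and $t$ (current time) are my own notation, not names from the documentation.

```latex
W_j = P_j \cdot PW + E_j \cdot EW + (t - S_j) \cdot SW + T_j \cdot RW

% Two Jobs A and B with equal Priority and Errors, A submitted first (S_A < S_B),
% and a negative Render Task Weight (RW < 0). A Slave prefers B when W_B > W_A:
(t - S_B) \cdot SW + T_B \cdot RW > (t - S_A) \cdot SW + T_A \cdot RW
\quad\Longleftrightarrow\quad
T_A - T_B > \frac{(S_B - S_A) \cdot SW}{|RW|}
```

Under these assumptions, the earlier Job keeps collecting Slaves only until it is ahead of the later Job by that many Tasks. When a single Job is left there is nothing to compare against, so every free Slave takes it.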


I just made a quick prototype script that simulates the Weight algorithm. I seeded 30 jobs with the same priority of 50, submission time 1 second apart, no errors, and asked 500 slaves to distribute themselves across these jobs using a Render Task Weight of -2000.
Here is the output with the job name, number of active tasks, and the Slave IDs (1 to 500) assigned to their tasks:

Job 0 		 0 : #() 
Job 1 		 0 : #() 
Job 2 		 0 : #() 
Job 3 		 0 : #() 
Job 4 		 0 : #() 
Job 5 		 1 : #(495) 
Job 6 		 2 : #(457, 482) 
Job 7 		 4 : #(421, 445, 471, 497) 
Job 8 		 5 : #(386, 409, 433, 458, 484) 
Job 9 		 7 : #(353, 375, 399, 422, 446, 472, 496) 
Job 10 		 8 : #(321, 342, 365, 387, 410, 434, 459, 485) 
Job 11 		 10 : #(291, 311, 332, 354, 376, 400, 423, 447, 473, 499) 
Job 12 		 11 : #(262, 281, 302, 322, 343, 366, 388, 412, 435, 460, 486) 
Job 13 		 13 : #(235, 253, 272, 292, 312, 333, 355, 378, 401, 425, 448, 475, 500) 
Job 14 		 14 : #(210, 227, 244, 264, 282, 303, 323, 345, 367, 389, 413, 436, 461, 483) 
Job 15 		 15 : #(185, 201, 218, 236, 254, 273, 294, 313, 334, 356, 377, 402, 426, 449, 476) 
Job 16 		 17 : #(162, 177, 193, 211, 226, 245, 265, 283, 304, 324, 346, 369, 390, 414, 437, 462, 487) 
Job 17 		 18 : #(141, 156, 170, 186, 202, 219, 237, 256, 274, 295, 315, 335, 357, 379, 403, 427, 450, 477) 
Job 18 		 20 : #(121, 135, 148, 164, 178, 194, 212, 228, 246, 266, 284, 305, 325, 344, 364, 391, 415, 438, 463, 489) 
Job 19 		 21 : #(103, 115, 128, 142, 155, 171, 187, 203, 220, 239, 255, 275, 296, 316, 336, 358, 380, 404, 428, 451, 478) 
Job 20 		 23 : #(86, 97, 109, 122, 136, 149, 165, 179, 195, 213, 229, 247, 267, 285, 306, 326, 347, 370, 392, 416, 439, 464, 488) 
Job 21 		 24 : #(71, 81, 92, 104, 116, 129, 143, 157, 172, 189, 204, 221, 240, 257, 276, 293, 314, 337, 359, 382, 405, 424, 452, 470) 
Job 22 		 26 : #(57, 66, 76, 87, 98, 110, 123, 137, 150, 166, 180, 196, 214, 230, 248, 268, 286, 307, 327, 349, 371, 393, 417, 441, 465, 490) 
Job 23 		 27 : #(45, 53, 62, 72, 82, 93, 105, 117, 130, 144, 158, 173, 190, 205, 222, 241, 258, 277, 297, 317, 338, 361, 381, 406, 429, 453, 479) 
Job 24 		 29 : #(34, 41, 49, 59, 67, 77, 88, 100, 111, 124, 134, 151, 167, 181, 197, 215, 231, 249, 269, 287, 308, 328, 350, 372, 394, 411, 442, 466, 491) 
Job 25 		 30 : #(25, 31, 39, 46, 54, 63, 73, 83, 94, 106, 118, 131, 145, 159, 174, 191, 206, 223, 242, 259, 278, 298, 319, 339, 362, 383, 398, 430, 454, 480) 
Job 26 		 32 : #(17, 22, 28, 35, 43, 51, 60, 68, 78, 89, 101, 112, 125, 138, 152, 168, 182, 198, 216, 232, 250, 270, 288, 309, 329, 348, 373, 395, 418, 443, 467, 492) 
Job 27 		 34 : #(11, 15, 20, 26, 32, 40, 47, 55, 64, 74, 84, 95, 107, 119, 132, 146, 160, 175, 192, 207, 224, 238, 260, 279, 299, 320, 340, 363, 384, 407, 431, 455, 474, 498) 
Job 28 		 35 : #(6, 9, 13, 18, 23, 29, 36, 44, 52, 58, 69, 79, 90, 99, 113, 126, 139, 153, 169, 183, 199, 209, 233, 251, 271, 289, 310, 330, 351, 368, 396, 419, 440, 468, 493) 
Job 29 		 36 : #(3, 5, 8, 12, 16, 21, 27, 33, 38, 48, 56, 65, 75, 85, 96, 108, 120, 133, 147, 161, 176, 188, 208, 225, 243, 261, 280, 300, 318, 341, 360, 385, 408, 432, 456, 481) 
Job 30 		 38 : #(1, 2, 4, 7, 10, 14, 19, 24, 30, 37, 42, 50, 61, 70, 80, 91, 102, 114, 127, 140, 154, 163, 184, 200, 217, 234, 252, 263, 290, 301, 331, 352, 374, 397, 420, 444, 469, 494)

As the slaves crunch through the tasks of the earlier jobs, the pyramid will slowly move up through the list of jobs, processing all their tasks. You don’t get exactly 40 slaves per job, but it is kinda close. With a weight of -1800, Job 30 gets exactly 40 slaves, and the rest get gradually fewer.
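
For anyone who wants to play with the numbers, a simulation along these lines can be sketched in a few lines of Python. This is not the prototype script that produced the output above, only a minimal sketch under stated assumptions: the weight is taken to be a linear sum of priority, errors, job age and rendering tasks; every job is assumed to have unlimited tasks; and the `Job` class, the function names and all weight constants except the Render Task Weight of -2000 are hypothetical values picked for illustration. Here Job 0 is the earliest job, so the pyramid may be oriented differently from the listing above.

```python
from dataclasses import dataclass, field

# Hypothetical weight settings, chosen only for illustration.
# Only the Render Task Weight of -2000 comes from the post above.
PRIORITY_WEIGHT = 100
ERROR_WEIGHT = -10
SUBMIT_TIME_WEIGHT = 3000
RENDER_TASK_WEIGHT = -2000


@dataclass
class Job:
    name: str
    priority: int
    submit_time: int  # seconds since an arbitrary epoch
    errors: int = 0
    slaves: list = field(default_factory=list)  # IDs of slaves on this job


def job_weight(job, now):
    """Assumed linear weight: older jobs score higher, busy jobs lower."""
    return (job.priority * PRIORITY_WEIGHT
            + job.errors * ERROR_WEIGHT
            + (now - job.submit_time) * SUBMIT_TIME_WEIGHT
            + len(job.slaves) * RENDER_TASK_WEIGHT)


def distribute(jobs, slave_count, now):
    """Each slave in turn takes a task from the job with the highest weight."""
    for slave_id in range(1, slave_count + 1):
        # max() returns the first job on a tie, i.e. the earliest in the list
        best = max(jobs, key=lambda job: job_weight(job, now))
        best.slaves.append(slave_id)
    return jobs


def make_jobs(count):
    """Jobs with priority 50, no errors, submitted 1 second apart."""
    return [Job(f"Job {i}", priority=50, submit_time=i) for i in range(count)]


# Print the number of slaves each of 30 jobs ends up with.
for job in distribute(make_jobs(30), slave_count=500, now=30):
    print(job.name, len(job.slaves))
```

Changing `RENDER_TASK_WEIGHT` relative to `SUBMIT_TIME_WEIGHT` changes how steep the pyramid is, and calling `distribute(make_jobs(1), 500, now=30)` shows the single-job case where every slave lands on the one remaining job.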

This is certainly an interesting approach. I’ll have to test this out. It seems like it could work for what we are after. A little setup would be involved, but not too bad.

Thank you bobo for the breakdown and links
