AWS Thinkbox Discussion Forums

dequeue - returning stubs speed

After a failed attempt to dequeue tasks, this is how a slave’s log looks like typically:

2014-09-26 15:30:59:  Scheduler - Job chooser found no jobs.
2014-09-26 15:30:59:  Scheduler - Returning limit stubs not in use.
2014-09-26 15:30:59:  Scheduler -   returning mx_fastmachines
2014-09-26 15:31:00:  Scheduler -   returning 54235c3c7437680e1c40ef33
2014-09-26 15:31:02:  Scheduler -   returning mx_flowlinerender
2014-09-26 15:31:03:  Scheduler -   returning mx_render
2014-09-26 15:31:03:  Scheduler -   returning vray2
2014-09-26 15:31:04:  Scheduler -   returning 54235c5b743768184cd65053
2014-09-26 15:31:07:  Scheduler -   returning 542366169154b51bd4b9c121
2014-09-26 15:31:08:  Scheduler -   returning 54246108162dfe27740e32b0
2014-09-26 15:31:09:  Scheduler -   returning 5424e82474376811846795cf
2014-09-26 15:31:09:  Scheduler -   returning 5424e83b74376818203ddb9e
2014-09-26 15:31:10:  Scheduler -   returning 5424eb7874376810442eb474
2014-09-26 15:31:10:  Scheduler -   returning 5424eb8a7437681e5c9e9b9a
2014-09-26 15:31:11:  Scheduler -   returning 5424eb9c74376817785127ba
2014-09-26 15:31:12:  Scheduler -   returning 542535410294982a8092915d
2014-09-26 15:31:13:  Scheduler -   returning 542580139154b52d28fc557f
2014-09-26 15:31:14:  Scheduler -   returning 54248a13743768138c967063
2014-09-26 15:31:15:  Scheduler -   returning 5425e6ef4af95217343db04f
2014-09-26 15:31:17:  Scheduler -   returning vrayformaya
2014-09-26 15:31:20:  Scheduler -   returning win7
2014-09-26 15:31:20:  Scheduler -   returning 5425876c9154b51308f0383a
2014-09-26 15:31:21:  Scheduler -   returning 5425879d9154b52978feef17
2014-09-26 15:31:22:  Scheduler -   returning 5424799e9154b51420d36097
2014-09-26 15:31:23:  Scheduler -   returning 541ca6550294981bd4c8578a
2014-09-26 15:31:24:  Scheduler -   returning 542489f47437681418c56b49
2014-09-26 15:31:28:  Scheduler -   returning 542489dc74376819ac7777fe
2014-09-26 15:31:31:  Scheduler -   returning 5424c164b25c5a068c365533
2014-09-26 15:31:35:  Scheduler -   returning 5424c18bb25c5a30a45a2516
2014-09-26 15:31:37:  Scheduler -   returning 5425aecec5f29c1bdc651ddb
2014-09-26 15:31:38:  Scheduler -   returning 5425c212b25c5a2954c93931
2014-09-26 15:31:41:  Scheduler -   returning 5425c25db25c5a2fbc25998c
2014-09-26 15:31:41:  Scheduler -   returning 5425ae8dc5f29c1c50415e75
2014-09-26 15:31:42:  Scheduler -   returning 540db584ff8c261464b642d9
2014-09-26 15:31:43:  Scheduler -   returning vray3
2014-09-26 15:31:44:  Scheduler -   returning 5422f1f014529e14545089f8
2014-09-26 15:31:46:  Scheduler -   returning 5422f1f114529e14545089fb
2014-09-26 15:31:47:  Scheduler -   returning 5422f1f114529e14545089fe
2014-09-26 15:31:48:  Scheduler -   returning 542359722f3caf19d09bcd19
2014-09-26 15:31:49:  Scheduler -   returning 54238fe6bb8cd8886cbec827
2014-09-26 15:31:52:  Scheduler -   returning 54248141bb8cd86530c8cfaf
2014-09-26 15:31:53:  Scheduler -   returning 54248a3d40d1281b18d57562
2014-09-26 15:31:53:  Scheduler -   returning 54248cffbb8cd86ba03fec8d
2014-09-26 15:31:54:  Scheduler -   returning 5424aaf231b809148c253d1c
2014-09-26 15:31:56:  Scheduler -   returning 5425984acb2bf21a50ca4a23
2014-09-26 15:31:57:  Scheduler -   returning 54259bfacb2bf20e24613c22
2014-09-26 15:31:57:  Scheduler -   returning 54259c1ecb2bf22118aab40e
2014-09-26 15:32:02:  Scheduler -   returning 54259d18cb2bf22040898bbf
2014-09-26 15:32:04:  Scheduler -   returning 54259d34cb2bf220c492e453
2014-09-26 15:32:06:  Scheduler -   returning 5425b58a01f73e256cfde5b5
2014-09-26 15:32:07:  Scheduler -   returning 5425bb2a01f73e263090c8be
2014-09-26 15:32:08:  Scheduler -   returning 5425bc4601f73e11e42c00ae
2014-09-26 15:32:09:  Scheduler -   returning 5425c231b25c5a2a388e4839
2014-09-26 15:32:09:  Scheduler -   returning 5425c84b10f8521e783911a0
2014-09-26 15:32:10:  Scheduler -   returning 5425c8d610f8520f3448376b
2014-09-26 15:32:11:  Scheduler -   returning 5425d19e10f85212bc0a2650
2014-09-26 15:32:13:  Scheduler -   returning 5425d58cbb649a190007a586
2014-09-26 15:32:13:  Scheduler -   returning 5425d8bd162dfe444c6c9693
2014-09-26 15:32:14:  Scheduler -   returning 5425d85db25c5a1e103e9b42
2014-09-26 15:32:16:  Scheduler -   returning 5425d874029498333c7fd831
2014-09-26 15:32:18:  Scheduler -   returning 5425d8f2162dfe1d20115829
2014-09-26 15:32:22:  Scheduler -   returning 5425d89e0294981e5cfeefe5
2014-09-26 15:32:22:  Scheduler -   returning 5425da1e2702df2f2030675a
2014-09-26 15:32:28:  Scheduler -   returning 5425d9d9b25c5a1ecc6b59a2
2014-09-26 15:32:28:  Scheduler -   returning 5425da4b2702df2ae41f4c4f
2014-09-26 15:32:32:  Scheduler -   returning 5425da0bb25c5a1e9cc433dd
2014-09-26 15:32:34:  Scheduler -   returning 5425da37b25c5a305c02e26d
2014-09-26 15:32:39:  Scheduler -   returning 5425dbb3162dfe41d4ab34a5
2014-09-26 15:32:41:  Scheduler -   returning 5425dbc8162dfe2b6899fc07
2014-09-26 15:32:46:  Scheduler -   returning 5425dc577437681484213768
2014-09-26 15:32:47:  Scheduler -   returning 5425dc7374376812900408b8
2014-09-26 15:32:48:  Scheduler -   returning 5425dc8d7437681e3847628d
2014-09-26 15:32:49:  Scheduler -   returning 5425ddbb2702df163805b4b5
2014-09-26 15:32:51:  Scheduler -   returning 5425ddea2702df250ca0ba39
2014-09-26 15:32:57:  Scheduler -   returning 5425dde7c5f29c127444ace3
2014-09-26 15:32:57:  Scheduler -   returning 5425de132702df16ccd483b0
2014-09-26 15:32:58:  Scheduler -   returning 5425de18c5f29c1edc9b1b60
2014-09-26 15:33:00:  Scheduler -   returning 5425dff3bb649a27a0535cc1
2014-09-26 15:33:03:  Scheduler -   returning 5425e083bb649a16704d93e6
2014-09-26 15:33:04:  Scheduler -   returning 5424af178bdca138dc03032d
2014-09-26 15:33:07:  Scheduler -   returning 5425bde531b80904f0f9f07c
2014-09-26 15:33:10:  Scheduler -   returning 5425deac6404271fac79fc5d
2014-09-26 15:33:14:  Scheduler -   returning 5425af9dc5f29c233c3a9599
2014-09-26 15:33:14:  Scheduler -   returning 54259859cb2bf222405d0635
2014-09-26 15:33:18:  Scheduler -   returning 5425c1ab9154b516ec297f23
2014-09-26 15:33:20:  Scheduler -   returning 5425aeaac5f29c13b0a29826
2014-09-26 15:33:20:  Scheduler -   returning 54259ca82702df3a74b33a1b
2014-09-26 15:33:21:  Scheduler -   returning 5425b52453fa152bc40281e0
2014-09-26 15:33:24:  Scheduler -   returning 5425b54d53fa15202851563d
2014-09-26 15:33:25:  Scheduler -   returning 5425dc3e162dfe3ec4000449
2014-09-26 15:33:26:  Scheduler -   returning 5425dfd2162dfe3d10e40e39
2014-09-26 15:33:26:  Scheduler -   returning 5425b738029498120cb2ad17
2014-09-26 15:33:28:  Scheduler -   returning 5425d82ab25c5a1b147f6435
2014-09-26 15:33:30:  Scheduler -   returning 5425d8420294981588708be6
2014-09-26 15:33:31:  Scheduler -   returning 5425ddfd162dfe494c8556f9
2014-09-26 15:33:31:  Scheduler -   returning 5425d27fbb649a214c479fbd
2014-09-26 15:33:33:  Scheduler Thread - Seconds before next job scan: 268

2 and a half minutes of returning stubs for jobs it couldn’t even dequeue. I’m confused why it takes ownership of those stubs in the first place if it can’t dequeue the job for other reasons…, but also, why does it take 160 seconds to return 94 limits. Note that this was a freshly started slave, it never rendered anything so should not have any leftover stubs from a previous job.

So it started, grabbed 94 limit stubs, then gave them back in 5(!) mins:

2 mins 23 seconds to go through available jobs:
2014-09-26 15:28:36: Scheduler - Performing Job scan on Primary Pools with scheduling order Pool, Weighted, Balanced

2014-09-26 15:30:59: Scheduler - Job chooser found no jobs.

then another 2 and a half minutes to return the stubs.

Why do the slaves take ownership of stubs for jobs they can’t dequeue? I’m guessing they first grab the limit requirements of a job, check if there are available stubs for each limit, then if that is correct, then they would attempt to grab one of each. But it seems like the “checking if there is anything” step is actually the “grabbing it” step. So you end up with stubs for jobs you never had the chance of rendering - also, you probably throttle other slave’s ability to render those jobs as well, since the limit stubs are not available for them.
log.txt (41.7 KB)

This is a known issue, and occurs when a slave is able to get all the stubs it needs for a job, but then that job has no remaining queued tasks to render (it acquires the stubs before dequeuing a task). There is a final call to return all unused limit stubs at the end of the dequeuing process, and that’s when these all get returned.

We think the proper way to handle this is to return job-specific limits for each job on the fly if we can’t get a task for it. However, we think that the slave should still hold the global limits until the very end. That way, it’s not getting and returning them for each job along the way.

So basically it tried to dequeue a task, but in the meantime, somebody else already has?

Yup, that’s correct. We’re already working on ways to mitigate this as well.

Cheers,
Ryan

Central dispatcher? :slight_smile:

Privacy | Site terms | Cookie preferences