AWS Thinkbox Discussion Forums

few idle nodes while everything else is flat out

Couple of questions:

We have 20 nodes, recently rebuilt and all running CentOS 6.3.
Due to a config change early in the build cycle, 6 of the nodes failed to process a task while the other 14 carried on.
The config change has since been applied, the nodes rebooted, and a new job submitted.
These 6 nodes are sitting idle, and I suspect they will stay that way until the currently running job completes, which at its current render rate will take about 4 hours.
There are no pools/groups set up, so how can I kick the idle nodes into picking up the queued job or the currently rendering job?

I notice that the render jobs are currently only using 45% of the CPU, so I want to double up the tasks each node can render.
Is it as simple as setting concurrent tasks to 2 before the job starts? I have found that setting this to 2 once the job is rendering does nothing.

Chris

  1. Can you enable slave verbose logging in the Application Logging section of the Repository Options, and then restart one of those 6 slaves? If it’s having trouble dequeuing a job for some reason, it should print this out to the log.

  2. We were able to reproduce this. Currently, the slave only checks the job’s concurrent task count when it is dequeuing tasks for a job. So when it got the first task, it saw concurrent tasks was set to 1. While it’s working on that task, it doesn’t check for more because of the cached value of 1. When you bump it up to 2, the slave won’t detect that until it finishes its current task and goes to dequeue another one. At that point, it will dequeue 2, because it’s refreshing the concurrent task count.

We’ve logged (2) as a bug, and we’ll see if we can properly deal with this situation for 6.0.
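To make the behaviour in (2) concrete, here is a minimal Python sketch of the caching logic as described above. This is not Deadline's actual code: the `Job` and `Slave` classes and every method name are hypothetical, made up purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    """Hypothetical job record; concurrent_tasks can be edited while rendering."""
    concurrent_tasks: int = 1
    pending: list = field(default_factory=lambda: list(range(10)))


class Slave:
    """Hypothetical slave that only re-reads the limit when it dequeues."""

    def __init__(self):
        self.cached_limit = 1   # assumed starting value before the first dequeue
        self.running = []

    def dequeue(self, job):
        # The job's concurrent task count is refreshed only at dequeue time.
        self.cached_limit = job.concurrent_tasks
        while len(self.running) < self.cached_limit and job.pending:
            self.running.append(job.pending.pop(0))

    def tick(self, job):
        # While busy, the slave compares against the cached limit,
        # so a change made on the job goes unnoticed.
        if len(self.running) < self.cached_limit:
            self.dequeue(job)

    def finish_all(self):
        self.running.clear()


job, slave = Job(), Slave()
slave.tick(job)
print(len(slave.running))   # 1: first task picked up with the limit at 1

job.concurrent_tasks = 2    # bumped while the task is still rendering
slave.tick(job)
print(len(slave.running))   # 1: the cached limit of 1 hides the change

slave.finish_all()          # current task completes
slave.tick(job)
print(len(slave.running))   # 2: the limit is refreshed on the next dequeue
```

In this sketch the fix would be to re-read `job.concurrent_tasks` inside `tick` rather than relying on `cached_limit`, though how the real slave should handle it is a separate question.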

Cheers,

  • Ryan

For 1)

It’s a license problem… but the slave task does not complain directly. Once you start the launcher you start to see license errors, and the Monitor view shows idle when it should say error or something more meaningful.

Now I have the battle to get the license problem resolved…

What’s listed under the License column for those slaves in the Monitor? Is there anything to indicate there is a licensing problem? If there is, we can use that information to highlight the problem in the list better.

Thanks!

  • Ryan

Hi Ryan,

Sorry for the delay. I wasn’t in the office for the last few days and really wanted a screenshot of this, as the License column isn’t very good at all.

We have 16 V5 licenses (4 were temp & expired) and 20 V6 beta.

Some licenses are showing as permanent (how?), and some have 1 day, 10 days or 15 days left. Nothing is showing as expired, but I know 4 nodes won’t work as there are not enough licenses.
Basically, that field doesn’t seem to get updated when the status update runs…

See attached screen shot.

Chris
