
[Linux] Job ignored, regardless of pool/group settings

I know there is an ongoing discussion of the “job not being picked up” issue in another thread, but I figured it would be cleaner for me to start my own.

I’m seeing the same issue where a job is in the queue, but is ignored by the available slaves.

Initially, I submitted the job to the ‘none’ group and pool. The slave I was testing with had no groups/pools assigned, and it ignored the job. The JSON dump of the job at that point is:

{ "Arch" : false, "Aux" : [ ], "Bad" : [ ], "CompletedChunks" : 0, "Date" : ISODate("2012-10-15T21:27:08.800Z"), "DateComp" : ISODate("0001-01-01T00:00:00Z"), "DateStart" : ISODate("0001-01-01T00:00:00Z"), "Errs" : 0, "FailedChunks" : 0, "IsSub" : true, "LastWriteTime" : ISODate("2012-10-15T21:27:09.172Z"), "Mach" : "ws-vm02", "OutDir" : [ "/home/ruschn/Videos/LocalHeroAlexaTest-Day-LogC-tif" ], "OutFile" : [ "LocalHeroAlexaTest-Day-LogC.%04d.tif" ], "PendingChunks" : 0, "Plug" : "FFmpeg", "PlugInfo" : { "InputFile0" : "/home/ruschn/Videos/LocalHeroAlexaTest-Day-LogC.mov", "OutputFile" : "/home/ruschn/Videos/LocalHeroAlexaTest-Day-LogC-tif/LocalHeroAlexaTest-Day-LogC.%04d.tif", "OutputArgs" : "-vcodec tiff", "UseSameInputArgs" : "False" }, "Props" : { "Name" : "mov to tiff test 1", "User" : "ruschn", "Cmmt" : "", "CmmtTag" : "", "Dept" : "", "Frames" : "0", "Chunk" : 1, "Tasks" : 1, "Grp" : "none", "Pool" : "none", "Pri" : 50, "Conc" : 1, "ConcLimt" : true, "AuxSync" : false, "Int" : false, "Seq" : false, "Reload" : false, "NoEvnt" : false, "OnComp" : 2, "AutoTime" : false, "TimeScrpt" : false, "MinTime" : 0, "MaxTime" : 0, "Timeout" : 1, "Dep" : [ ], "DepFrame" : false, "DepComp" : true, "DepDel" : false, "DepFail" : false, "DepPer" : -1, "NoBad" : false, "JobFailOvr" : false, "JobFailErr" : 0, "TskFailOvr" : false, "TskFailErr" : 0, "SndWarn" : true, "NotOvr" : false, "SndEmail" : false, "NotEmail" : [ ], "NotUser" : [ "ruschn" ], "NotNote" : "", "Limits" : [ ], "ListedSlaves" : [ ], "White" : false, "MachLmt" : 0, "MachLmtProg" : -1, "PrJobScrp" : "", "PoJobScrp" : "", "PrTskScrp" : "", "PoTskScrp" : "", "Schd" : 0, "SchdDays" : 1, "SchdDate" : ISODate("0001-01-01T00:00:00Z"), "SchdDateRan" : ISODate("0001-01-01T00:00:00Z"), "Ex0" : "", "Ex1" : "", "Ex2" : "", "Ex3" : "", "Ex4" : "", "Ex5" : "", "Ex6" : "", "Ex7" : "", "Ex8" : "", "Ex9" : "", "ExDic" : { } }, "QueuedChunks" : 1, "RenderingChunks" : 0, "Stat" : 1, "SuspendedChunks" : 0, "Tile" : false, "TileFrame" : 0, "TileX" : 0, "TileY" : 0, "_id" : "507c7faca2cb531474547f8b" }
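
For reference, this is roughly how I have been pulling that document back out of Mongo to double-check the pool/group fields. Everything here (host, database name, collection name) is just what my install appears to use, so treat them as assumptions and adjust for your setup:

    from pymongo import MongoClient

    # Assumptions about my local setup -- the host, database name, and
    # collection name may well differ on your install.
    client = MongoClient("mongodb://your-db-host:27017/")
    db = client["deadlinedb"]
    job = db["Jobs"].find_one({"_id": "507c7faca2cb531474547f8b"})

    props = job["Props"]
    print("Pool:", props["Pool"], "Group:", props["Grp"])
    print("Whitelist on:", props["White"], "Listed slaves:", props["ListedSlaves"])
    print("Status:", job["Stat"], "Queued chunks:", job["QueuedChunks"])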

Thinking it may have been an anomaly with the ‘none’ group/pool, I added a group and a pool, both called ‘all’, put the slave in them, and assigned the job to both. However, the slave is still ignoring the job. The JSON for that slave is:

{ "Arch" : "x86_64", "BadJobs" : 0, "CPU" : 2, "Disk" : NumberLong("2749100032"), "DiskStr" : "2.56 GB ", "Grps" : "all", "Host" : "ws-vm02", "IP" : "100.100.200.8", "JobGrp" : "", "JobId" : "", "JobName" : "", "JobPlug" : "", "JobPool" : "", "JobPri" : -1, "JobUser" : "", "LastWriteTime" : ISODate("2012-10-15T21:56:16.306Z"), "Lic" : "@ws-vm01", "LicEx" : 108, "LicFree" : false, "LicPerm" : false, "Limits" : [ ], "MAC" : "08:00:27:93:F2:E6", "Msg" : "2012/10/15 12:34:15 Slave started", "Name" : "ws-vm02", "OS" : "Linux", "OnTskComp" : "Continue Running", "Pools" : "all", "Port" : 35193, "ProcSpd" : NumberLong(2792), "Procs" : 4, "Pulse" : true, "RAM" : NumberLong(2100809728), "RAMFree" : NumberLong(1455345664), "RndTime" : 0, "Stat" : 2, "StatDate" : ISODate("2012-10-15T19:34:16.908Z"), "TskComp" : 0, "TskFail" : 0, "TskId" : "", "TskName" : "", "TskProg" : "", "TskStat" : "", "Up" : 8520.1162109375, "User" : "ruschn", "Ver" : "v6.0.0.48694 R", "Vid" : "InnoTek Systemberatung GmbH VirtualBox Graphics Adapter", "_id" : "ws-vm02" }
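
As far as I understand the pool/group rules, the slave should qualify for this job in both configurations. Here is a rough sketch of the check as I understand it; it is my own approximation of the intended logic, not Deadline’s actual scheduler code:

    def slave_can_take_job(slave, job):
        # My approximation of the pool/group gate only -- limits, whitelists,
        # and machine limits are deliberately ignored here.
        props = job["Props"]
        # Assumption: the slave document stores pools/groups as a
        # comma-separated string (the dump above only shows a single value).
        slave_pools = [p for p in slave["Pools"].split(",") if p]
        slave_groups = [g for g in slave["Grps"].split(",") if g]

        # A job in the 'none' pool/group should be fair game for any slave;
        # otherwise the slave must be a member of the job's pool and group.
        pool_ok = props["Pool"] == "none" or props["Pool"] in slave_pools
        group_ok = props["Grp"] == "none" or props["Grp"] in slave_groups
        return pool_ok and group_ok

With the documents above this comes out True both before and after I added the ‘all’ pool and group, so as far as I can tell the pool/group settings alone shouldn’t be disqualifying the slave.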

Man, we are having absolutely no luck reproducing this problem, but at least three users have reported it now! We’ve looked through your job’s JSON, and Alcium’s here:
viewtopic.php?f=156&t=8278&p=34833#p34827

Nothing stands out. We’ve tried combinations of pools and groups, machine limits > 0, white lists and black lists, and they seem to be behaving properly.

I think the next thing to do is enable Slave Verbose logging. This can be done from the Application Logging section of the Repository Options. After enabling it, restart a slave that should pick up the job. After it has made a few attempts to search for a job, grab the slave log for the current session and post it. You can find the log by selecting Help -> Explore Log Folder from the Slave application. I’m really hoping there is something here that explains what’s going on.
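
If it’s quicker than scrolling through the whole file by hand, a small script along these lines will pull out just the job-related lines from the newest slave log. The folder path and file name pattern below are placeholders, so point them at whatever Help -> Explore Log Folder opens and whatever the logs in there are actually called:

    import glob
    import os

    # Placeholder -- use the folder that Help -> Explore Log Folder opens.
    LOG_DIR = "/path/to/slave/logs"

    # The file name pattern is a guess; adjust it to match the files you
    # actually see in that folder.
    logs = sorted(glob.glob(os.path.join(LOG_DIR, "deadlineslave*.log")),
                  key=os.path.getmtime)

    # Print only the lines from the newest log that mention jobs, which should
    # include whatever the slave reports while it scans the queue.
    with open(logs[-1]) as log_file:
        for line in log_file:
            if "job" in line.lower():
                print(line.rstrip())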

Also, can you confirm that the machine you’re submitting from and the slave are running beta 2? The version number is 6.0.0.48694.

Thanks!

  • Ryan

Fresh setup for both Repository and DB
Here is the JSON for another pair of jobs.
For one of them I have added every possible slave to the whitelist; the other is untouched.

I can arrange remote access to one of our computers if you want to interact with the system directly.
In the meantime I’ll try enabling the verbose log.

Here we go with a log (shortened for display purposes, but it just keeps repeating the same bits).

Remote access would be great!

The slave isn’t complaining about Limits (which was the bug in beta 1), so I think remote access would be the way to go. You can send the info to me directly. Just click on my user name and send me an email.

Thanks!

  • Ryan

The Pulse log might also be of interest:
deadlinepulse(SRV-DEADLINE)-2012-10-16-0002.log (132 KB)

Same result for me after enabling verbose logging. The behavior is the same whether the job’s machine limit is set to 0 or something else.

We’ve finally tracked this down to a bug that only affected the RELEASE builds, which is why we could never reproduce this in our debuggers. We’re hoping to get a new build uploaded tomorrow, but if not tomorrow, definitely before the end of the week.

Quality, well done Ryan.

Given the issues I’m having with this and Mongo, I am going to remove it all and start again for the next beta 3 release.

Mark
