AWS Thinkbox Discussion Forums

Submitting manually-constructed job/plugin info files

This is beta 11 on Linux.

I just tried manually throwing together JobInfo and PluginInfo files and submitting them with deadlinecommand, but the resulting jobs seem to be corrupt: they don't appear in the Monitor, and their documents in the database have the issue where the JSON starts with “_id” instead of “Aux”.

Before testing this submission, I had a single job in the queue. After submitting once, the queue still showed only one job, but the Jobs collection in MongoDB contained two documents:

{ "Aux" : [ "commandsfile.txt" ], "Bad" : [ ], "CompletedChunks" : 2, "Date" : ISODate("2013-01-30T00:33:54.284Z"), "DateComp" : ISODate("0001-01-01T00:00:00Z"), "DateStart" : ISODate("2013-01-30T00:38:35.419Z"), "Errs" : 4, "FailedChunks" : 0, "IsSub" : true, "LastWriteTime" : ISODate("2013-01-31T03:12:06.514Z"), "Mach" : "ws-082", "OutDir" : [ ], "OutFile" : [ ], "PendingChunks" : 0, "Plug" : "CommandScript", "Props" : { "Name" : "TestCommand", "User" : "ruschn", "Cmmt" : "", "CmmtTag" : "", "Dept" : "", "Frames" : "0-2", "Chunk" : 1, "Tasks" : 3, "Grp" : "none", "Pool" : "none", "Pri" : 50, "Conc" : 1, "ConcLimt" : true, "AuxSync" : false, "Int" : false, "Seq" : false, "Reload" : false, "NoEvnt" : false, "OnComp" : 2, "AutoTime" : false, "TimeScrpt" : false, "MinTime" : 0, "MaxTime" : 0, "Timeout" : 1, "Dep" : [ ], "DepFrame" : false, "DepComp" : true, "DepDel" : false, "DepFail" : false, "DepPer" : -1, "NoBad" : false, "JobFailOvr" : false, "JobFailErr" : 0, "TskFailOvr" : false, "TskFailErr" : 0, "SndWarn" : true, "NotOvr" : false, "SndEmail" : false, "NotEmail" : [ ], "NotUser" : [ ], "NotNote" : "", "Limits" : [ ], "ListedSlaves" : [ ], "White" : false, "MachLmt" : 0, "MachLmtProg" : -1, "PrJobScrp" : "", "PoJobScrp" : "", "PrTskScrp" : "", "PoTskScrp" : "", "Schd" : 0, "SchdDays" : 1, "SchdDate" : ISODate("0001-01-01T00:00:00Z"), "SchdDateRan" : ISODate("0001-01-01T00:00:00Z"), "PlugInfo" : { "StartupDirectory" : "/lumalocal" }, "Env" : { }, "EnvOnly" : false, "Ex0" : "", "Ex1" : "", "Ex2" : "", "Ex3" : "", "Ex4" : "", "Ex5" : "", "Ex6" : "", "Ex7" : "", "Ex8" : "", "Ex9" : "", "ExDic" : { } }, "QueuedChunks" : 0, "RenderingChunks" : 0, "Stat" : 2, "SuspendedChunks" : 1, "Tile" : false, "TileFrame" : 0, "TileX" : 0, "TileY" : 0, "_id" : "51086a72962ccc301a015970" } { "_id" : "5109e13e962ccc7dcbb5766e", "LastWriteTime" : ISODate("0001-01-01T00:00:00Z"), "Props" : { "Name" : "Test Job Name", "User" : "ruschn", "Cmmt" : "", "CmmtTag" : "", "Dept" : "", 
"Frames" : "1-10", "Chunk" : 1, "Tasks" : 10, "Grp" : "none", "Pool" : "none", "Pri" : 50, "Conc" : 1, "ConcLimt" : true, "AuxSync" : false, "Int" : false, "Seq" : false, "Reload" : false, "NoEvnt" : false, "OnComp" : 2, "AutoTime" : false, "TimeScrpt" : false, "MinTime" : 0, "MaxTime" : 0, "Timeout" : 1, "Dep" : [ ], "DepFrame" : false, "DepComp" : true, "DepDel" : false, "DepFail" : false, "DepPer" : -1, "NoBad" : false, "JobFailOvr" : false, "JobFailErr" : 0, "TskFailOvr" : false, "TskFailErr" : 0, "SndWarn" : true, "NotOvr" : false, "SndEmail" : false, "NotEmail" : [ ], "NotUser" : [ "ruschn" ], "NotNote" : "", "Limits" : [ ], "ListedSlaves" : [ ], "White" : false, "MachLmt" : 0, "MachLmtProg" : -1, "PrJobScrp" : "", "PoJobScrp" : "", "PrTskScrp" : "", "PoTskScrp" : "", "Schd" : 0, "SchdDays" : 1, "SchdDate" : ISODate("0001-01-01T00:00:00Z"), "SchdDateRan" : ISODate("0001-01-01T00:00:00Z"), "PlugInfo" : { "picklePath" : "''" }, "Env" : { }, "EnvOnly" : false, "Ex0" : "", "Ex1" : "", "Ex2" : "", "Ex3" : "", "Ex4" : "", "Ex5" : "", "Ex6" : "", "Ex7" : "", "Ex8" : "", "Ex9" : "", "ExDic" : { } }, "IsSub" : true, "Mach" : "ws-082", "Date" : ISODate("2013-01-31T03:13:02.503Z"), "DateStart" : ISODate("0001-01-01T00:00:00Z"), "DateComp" : ISODate("0001-01-01T00:00:00Z"), "Plug" : "LumaJob", "OutDir" : [ ], "OutFile" : [ ], "Tile" : false, "TileFrame" : 0, "TileX" : 0, "TileY" : 0, "Stat" : 1, "Aux" : [ ], "Bad" : [ ], "CompletedChunks" : 0, "QueuedChunks" : 10, "SuspendedChunks" : 0, "RenderingChunks" : 0, "FailedChunks" : 0, "PendingChunks" : 0, "Errs" : 0, "DataSize" : NumberLong(-1) }

The job info file (luma_jobInfo.job) looks like this:

Plugin=LumaJob
Frames=1-10

The plugin info file (luma_pluginInfo.job) is even simpler:

binPath=/foo/bar/spangle

If I look in the repo, I can see that the job directories are being created, and that any auxiliary paths I pass are copied in.
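For anyone reproducing this, here is a minimal Python sketch of the same manual submission. It writes the two key=value files shown above (one setting per line) and builds the deadlinecommand argument list in the order job info file, plugin info file, then any auxiliary files. The helper names are hypothetical and the aux file names are just examples borrowed from the dumps; the sketch only prints the command rather than running it.

```python
import tempfile
from pathlib import Path


def write_info_file(path, settings):
    """Write a key=value info file with one setting per line (hypothetical helper)."""
    text = "".join(f"{key}={value}\n" for key, value in settings.items())
    Path(path).write_text(text)


def build_submit_command(job_info, plugin_info, aux_files=()):
    """Argument list for deadlinecommand: job info file, plugin info file,
    then any auxiliary files to be copied into the job's repository folder."""
    return ["deadlinecommand", str(job_info), str(plugin_info), *(str(f) for f in aux_files)]


workdir = Path(tempfile.mkdtemp())
job_info = workdir / "luma_jobInfo.job"
plugin_info = workdir / "luma_pluginInfo.job"
write_info_file(job_info, {"Plugin": "LumaJob", "Frames": "1-10"})
write_info_file(plugin_info, {"binPath": "/foo/bar/spangle"})

cmd = build_submit_command(job_info, plugin_info, ["foo.exr", "bar.exr"])
print(" ".join(cmd))
# To submit for real, pass cmd to subprocess.run(cmd, check=True).
```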

Any idea what’s going on here? Is my job info too skeletal? I suspect not, since I would expect deadlinecommand to error in that case, so this seems like a bug.

Hey Nathan,

We think we’ve figured out what causes this problem. There was a previous issue where the LastWriteTime field wasn’t written to the database at all. We fixed that, but now the field is being left at its default value (0001-01-01T00:00:00Z in your dump), which happens to fall outside the query the Monitor uses to get the jobs: we initially query for all jobs that have changed since the default value, not including the default value itself. That’s why the job never shows up in the Monitor.
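To make the query problem concrete, here is a small Python sketch of an exclusive "changed since" filter, using the default value that appears in the dumps (0001-01-01T00:00:00, which equals Python's datetime.min). The function name is hypothetical, and the sketch only assumes the Monitor's initial query behaves like a strict greater-than comparison, as described above; in Mongo terms that would be something along the lines of a `$gt` filter on LastWriteTime.

```python
from datetime import datetime

# Default LastWriteTime seen in the dumps: 0001-01-01T00:00:00.
DEFAULT_WRITE_TIME = datetime.min


def jobs_changed_since(jobs, since=DEFAULT_WRITE_TIME):
    """Keep jobs whose LastWriteTime is strictly greater than `since`,
    mimicking an exclusive 'changed since' query (hypothetical helper)."""
    return [job["_id"] for job in jobs if job["LastWriteTime"] > since]


jobs = [
    {"_id": "51086a72962ccc301a015970", "LastWriteTime": datetime(2013, 1, 31, 3, 12, 6)},
    # Never stamped, so it is excluded by the strict comparison.
    {"_id": "5109e13e962ccc7dcbb5766e", "LastWriteTime": DEFAULT_WRITE_TIME},
]
print(jobs_changed_since(jobs))  # ['51086a72962ccc301a015970']
```

Because the comparison is strict, a job still carrying the default value can never satisfy it, no matter how recently it was submitted.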

While we were trying to figure out how this could happen, we realized we were setting LastWriteTime on a separate thread. If deadlinecommand exits before that thread has updated the database, the LastWriteTime field never gets set.
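The same race can be sketched in Python. This is a hypothetical stand-in, not Deadline's code: the submission stamps LastWriteTime on a background daemon thread, and Python abandons daemon threads when the interpreter exits, much like the situation described above. Joining the thread before returning (or simply doing the write synchronously) closes that window.

```python
import threading
from datetime import datetime, timezone

# Default value a job carries until the background write lands.
UNSET = datetime.min.replace(tzinfo=timezone.utc)


def submit_job(job_doc):
    """Hypothetical submission routine that stamps LastWriteTime
    on a background thread."""

    def stamp_write_time():
        # Stand-in for the database update the background thread performs.
        job_doc["LastWriteTime"] = datetime.now(timezone.utc)

    # Daemon threads are dropped at interpreter exit, so without the join
    # below the process could finish before the stamp is written.
    writer = threading.Thread(target=stamp_write_time, daemon=True)
    writer.start()
    writer.join()  # the fix: wait for the write before returning
    return job_doc


job = submit_job({"_id": "5109e13e962ccc7dcbb5766e", "LastWriteTime": UNSET})
print(job["LastWriteTime"] > UNSET)  # True
```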

This should be fixed in beta 12.

So in short, it’s a bug, and you aren’t doing anything wrong.

Cheers,

Ryan

Thanks Ryan, that’s good to know. In the interim, is there any way for me to work around this to get my jobs to show up?

Interesting update: I started a slave, which then proceeded to try and pick up a task from the previously invisible job. The task failed, but the job now shows up in the Monitor, and there are now 3 items in the Mongo Jobs collection:

{ "Aux" : [ "commandsfile.txt" ], "Bad" : [ ], "CompletedChunks" : 2, "Date" : ISODate("2013-01-30T00:33:54.284Z"), "DateComp" : ISODate("0001-01-01T00:00:00Z"), "DateStart" : ISODate("2013-01-30T00:38:35.419Z"), "Errs" : 4, "FailedChunks" : 0, "IsSub" : true, "LastWriteTime" : ISODate("2013-01-31T03:12:06.514Z"), "Mach" : "ws-082", "OutDir" : [ ], "OutFile" : [ ], "PendingChunks" : 0, "Plug" : "CommandScript", "Props" : { "Name" : "TestCommand", "User" : "ruschn", "Cmmt" : "", "CmmtTag" : "", "Dept" : "", "Frames" : "0-2", "Chunk" : 1, "Tasks" : 3, "Grp" : "none", "Pool" : "none", "Pri" : 50, "Conc" : 1, "ConcLimt" : true, "AuxSync" : false, "Int" : false, "Seq" : false, "Reload" : false, "NoEvnt" : false, "OnComp" : 2, "AutoTime" : false, "TimeScrpt" : false, "MinTime" : 0, "MaxTime" : 0, "Timeout" : 1, "Dep" : [ ], "DepFrame" : false, "DepComp" : true, "DepDel" : false, "DepFail" : false, "DepPer" : -1, "NoBad" : false, "JobFailOvr" : false, "JobFailErr" : 0, "TskFailOvr" : false, "TskFailErr" : 0, "SndWarn" : true, "NotOvr" : false, "SndEmail" : false, "NotEmail" : [ ], "NotUser" : [ ], "NotNote" : "", "Limits" : [ ], "ListedSlaves" : [ ], "White" : false, "MachLmt" : 0, "MachLmtProg" : -1, "PrJobScrp" : "", "PoJobScrp" : "", "PrTskScrp" : "", "PoTskScrp" : "", "Schd" : 0, "SchdDays" : 1, "SchdDate" : ISODate("0001-01-01T00:00:00Z"), "SchdDateRan" : ISODate("0001-01-01T00:00:00Z"), "PlugInfo" : { "StartupDirectory" : "/lumalocal" }, "Env" : { }, "EnvOnly" : false, "Ex0" : "", "Ex1" : "", "Ex2" : "", "Ex3" : "", "Ex4" : "", "Ex5" : "", "Ex6" : "", "Ex7" : "", "Ex8" : "", "Ex9" : "", "ExDic" : { } }, "QueuedChunks" : 0, "RenderingChunks" : 0, "Stat" : 2, "SuspendedChunks" : 1, "Tile" : false, "TileFrame" : 0, "TileX" : 0, "TileY" : 0, "_id" : "51086a72962ccc301a015970" } { "Aux" : [ ], "Bad" : [ ], "CompletedChunks" : 0, "DataSize" : NumberLong(-1), "Date" : ISODate("2013-01-31T03:13:02.503Z"), "DateComp" : ISODate("0001-01-01T00:00:00Z"), "DateStart" : 
ISODate("2013-01-31T22:04:15.027Z"), "Errs" : 1, "FailedChunks" : 0, "IsSub" : true, "LastWriteTime" : ISODate("2013-01-31T22:04:17.570Z"), "Mach" : "ws-082", "OutDir" : [ ], "OutFile" : [ ], "PendingChunks" : 0, "Plug" : "LumaJob", "Props" : { "Name" : "Test Job Name", "User" : "ruschn", "Cmmt" : "", "CmmtTag" : "", "Dept" : "", "Frames" : "1-10", "Chunk" : 1, "Tasks" : 10, "Grp" : "none", "Pool" : "none", "Pri" : 50, "Conc" : 1, "ConcLimt" : true, "AuxSync" : false, "Int" : false, "Seq" : false, "Reload" : false, "NoEvnt" : false, "OnComp" : 2, "AutoTime" : false, "TimeScrpt" : false, "MinTime" : 0, "MaxTime" : 0, "Timeout" : 1, "Dep" : [ ], "DepFrame" : false, "DepComp" : true, "DepDel" : false, "DepFail" : false, "DepPer" : -1, "NoBad" : false, "JobFailOvr" : false, "JobFailErr" : 0, "TskFailOvr" : false, "TskFailErr" : 0, "SndWarn" : true, "NotOvr" : false, "SndEmail" : false, "NotEmail" : [ ], "NotUser" : [ "ruschn" ], "NotNote" : "", "Limits" : [ ], "ListedSlaves" : [ ], "White" : false, "MachLmt" : 0, "MachLmtProg" : -1, "PrJobScrp" : "", "PoJobScrp" : "", "PrTskScrp" : "", "PoTskScrp" : "", "Schd" : 0, "SchdDays" : 1, "SchdDate" : ISODate("0001-01-01T00:00:00Z"), "SchdDateRan" : ISODate("0001-01-01T00:00:00Z"), "PlugInfo" : { "picklePath" : "''" }, "Env" : { }, "EnvOnly" : false, "Ex0" : "", "Ex1" : "", "Ex2" : "", "Ex3" : "", "Ex4" : "", "Ex5" : "", "Ex6" : "", "Ex7" : "", "Ex8" : "", "Ex9" : "", "ExDic" : { } }, "QueuedChunks" : 10, "RenderingChunks" : 0, "Stat" : 1, "SuspendedChunks" : 0, "Tile" : false, "TileFrame" : 0, "TileX" : 0, "TileY" : 0, "_id" : "5109e13e962ccc7dcbb5766e" } { "_id" : "5109e326962ccc0802ae33fa", "LastWriteTime" : ISODate("0001-01-01T00:00:00Z"), "Props" : { "Name" : "Untitled", "User" : "ruschn", "Cmmt" : "", "CmmtTag" : "", "Dept" : "", "Frames" : "1-10", "Chunk" : 1, "Tasks" : 10, "Grp" : "none", "Pool" : "none", "Pri" : 50, "Conc" : 1, "ConcLimt" : true, "AuxSync" : false, "Int" : false, "Seq" : false, "Reload" : false, 
"NoEvnt" : false, "OnComp" : 2, "AutoTime" : false, "TimeScrpt" : false, "MinTime" : 0, "MaxTime" : 0, "Timeout" : 1, "Dep" : [ ], "DepFrame" : false, "DepComp" : true, "DepDel" : false, "DepFail" : false, "DepPer" : -1, "NoBad" : false, "JobFailOvr" : false, "JobFailErr" : 0, "TskFailOvr" : false, "TskFailErr" : 0, "SndWarn" : true, "NotOvr" : false, "SndEmail" : false, "NotEmail" : [ ], "NotUser" : [ "ruschn" ], "NotNote" : "", "Limits" : [ ], "ListedSlaves" : [ ], "White" : false, "MachLmt" : 0, "MachLmtProg" : -1, "PrJobScrp" : "", "PoJobScrp" : "", "PrTskScrp" : "", "PoTskScrp" : "", "Schd" : 0, "SchdDays" : 1, "SchdDate" : ISODate("0001-01-01T00:00:00Z"), "SchdDateRan" : ISODate("0001-01-01T00:00:00Z"), "PlugInfo" : { "picklePath" : "/Volumes/sv-dev01/devRepo/ruschn" }, "Env" : { }, "EnvOnly" : false, "Ex0" : "", "Ex1" : "", "Ex2" : "", "Ex3" : "", "Ex4" : "", "Ex5" : "", "Ex6" : "", "Ex7" : "", "Ex8" : "", "Ex9" : "", "ExDic" : { } }, "IsSub" : true, "Mach" : "ws-082", "Date" : ISODate("2013-01-31T03:21:10.287Z"), "DateStart" : ISODate("0001-01-01T00:00:00Z"), "DateComp" : ISODate("0001-01-01T00:00:00Z"), "Plug" : "LumaJob", "OutDir" : [ ], "OutFile" : [ ], "Tile" : false, "TileFrame" : 0, "TileX" : 0, "TileY" : 0, "Stat" : 1, "Aux" : [ "foo.exr", "bar.exr" ], "Bad" : [ ], "CompletedChunks" : 0, "QueuedChunks" : 10, "SuspendedChunks" : 0, "RenderingChunks" : 0, "FailedChunks" : 0, "PendingChunks" : 0, "Errs" : 0, "DataSize" : NumberLong(408960) }

As a side question, what’s the easiest way to remove these corrupt entries from the DB? I think I could do it manually using db.Jobs.remove(), but I’m wondering if there is some ancillary data elsewhere in the DB that corresponds to this job, and if that might break the logical structure somehow (e.g. an app can find one piece but not the other, so it fails).
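Before removing anything, it may help to list exactly which documents are affected. This is a minimal Python sketch, assuming the "corrupt" entries are the ones whose LastWriteTime is still the default 0001-01-01T00:00:00 (as in the dumps above). The function name is hypothetical, and it only identifies entries, so it says nothing about related data elsewhere in the database. With pymongo, passing a filter such as {"LastWriteTime": datetime(1, 1, 1)} to find() should select the same documents.

```python
from datetime import datetime

# Default LastWriteTime seen in the dumps: 0001-01-01T00:00:00.
DEFAULT_WRITE_TIME = datetime.min


def find_unstamped_jobs(job_docs):
    """Return the _ids of job documents whose LastWriteTime was never
    updated from its default value (hypothetical helper)."""
    return [
        doc["_id"]
        for doc in job_docs
        if doc.get("LastWriteTime", DEFAULT_WRITE_TIME) == DEFAULT_WRITE_TIME
    ]


# Trimmed-down versions of the three documents in the second dump.
jobs = [
    {"_id": "51086a72962ccc301a015970", "LastWriteTime": datetime(2013, 1, 31, 3, 12, 6)},
    {"_id": "5109e13e962ccc7dcbb5766e", "LastWriteTime": datetime(2013, 1, 31, 22, 4, 17)},
    {"_id": "5109e326962ccc0802ae33fa", "LastWriteTime": DEFAULT_WRITE_TIME},
]
print(find_unstamped_jobs(jobs))  # ['5109e326962ccc0802ae33fa']
```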

Ok, this gets weirder:

So after I started the slave, then shut it down, I had 2 jobs in the queue (both suspended), but 3 items in the Jobs collection.

I started the slave again, and it basically cloned the second job in the queue (the one that was previously invisible), and started trying to work on that.

I deleted that job from the queue, and now the Jobs collection only has 2 entries in it, both of which appear to be complete and valid.
