
log delay

Hi there,

We are seeing a considerable delay between when log lines are printed and when they are logged (and timestamped) by Deadline. Our internal logging system happens to have its own timestamps, and they are almost never in sync with Deadline’s… We noticed this recently; it used to match perfectly, and we’re not sure when the change happened.

For example, the deadline printout:

Capture1.PNG

The internal logs for the same lines:

Capture2.PNG

Note the ~6-8 second delay. When our logging system detects Deadline, it simply does a regular stdout print as well (in addition to the file logging), so that Deadline can also register the lines. Both writes are done at the same time from the same thread.

Any ideas?

cheers
l

Note that the delay varies and increases over time; it’s around 14 seconds here (same task):

deadline log:

internal log:

This looks like a job for… the dev team at SIGGRAPH? You should hunt down Jon if you’re headed out.

I’ll send an internal e-mail to see if I can figure this out over here.

Just so I’m following: is this what’s being written into the Slave’s log files, or are you streaming them over a TCP remote logging session?

Sorry, I somehow missed your response… I was not at SIGGRAPH though, so I didn’t get a chance to follow up in person.
This was through the monitor’s task logs feature, not connecting live.

I’m checking my e-mail, but I’m not seeing a response.

I guess to gather some more info here: is the delay in the UI, and not in the logging that is written to disk? I don’t know enough about the LogWriter framework in Deadline, but it handles all of the log writing and also pushes that information to various consumers (that includes log streaming from the Slave). I’ll notify folks on this one too.

The delay is visible in the timestamps already, and I believe in the files as well.

I need a little more information to try and reproduce this on my end. Can you show me a bit of what your logging system looks like? You mentioned it is writing to a log file, and writing to standard out when it “detects Deadline”. Can you give me an idea of what that looks like? Is it a script? Is it part of a custom Deadline plugin?

Also, can you give us an idea of how this is impacting your workflow?

For this case (a 3ds Max job), we have a two-layered logging system:

  • MAXScript logger
  • Python core logger module

At the top, you have the MAXScript logger module that is used in all our MAXScript libraries. It does a few 3ds Max-specific operations, then relays the raw messages to our Python modules for disk logging.

The ‘Deadline’ detection is done in two ways.
For the Python logger, it’s simply an env variable: when we submit jobs to Deadline, we add a ‘farm mode’ env variable to them. The logging module detects this, and when it’s present, it prints the log lines both to the offline files and to stdout. This is done for all job types using the core logger, not just 3ds Max.
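
Roughly, the Python side amounts to something like this (just a minimal sketch of the idea; the env variable name and logger name below are made up, not our real ones):

    import logging
    import os
    import sys

    # Minimal sketch only. "SC_FARM_MODE" stands in for the real
    # 'farm mode' env variable we set at submission time.
    def build_core_logger(log_path):
        logger = logging.getLogger("sc_core")
        logger.setLevel(logging.DEBUG)

        # Always write to the on-disk log file for later review.
        logger.addHandler(logging.FileHandler(log_path))

        # On the farm, also mirror every line to stdout so Deadline
        # can pick it up and timestamp it on its side.
        if os.environ.get("SC_FARM_MODE"):
            logger.addHandler(logging.StreamHandler(sys.stdout))

        return logger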

Our MAXScript logger looks for the “DeadlineUtil” struct to detect the ‘render environment’. First it sends the log lines to our Python core logger to save them to disk, and then, if “DeadlineUtil != undefined”, it also sends them to Deadline.

So the order is:

  1. The MXS log function is called.
  2. The MXS function sends the log message to our Python logger, which writes it into a disk file for later review.
  3. It detects the DeadlineUtil struct, so it logs the message to Deadline using: DeadlineUtil.LogMessage ("scLog: " + (level as string) + " " + message)

Step #2 is what you see in the attachment with the white background in the post (timestamp 15:03:09), and step #3 is what you see with the black background in the log (timestamp 15:03:23).

The impact is felt during troubleshooting, when we are trying to figure out what went wrong for a particular job - which script failed and why. If there is a random offset in the timestamps, the order of operations becomes hard to determine, and the logs become unreliable for troubleshooting.

It feels like there is some buffered throttling in Deadline’s log handling somewhere, but instead of logging the time the lines come in, it is logging the time the lines go out.
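
If it helps to reproduce this outside our pipeline, something as simple as the following, run as a job, should show the same drift (just a sketch, not our actual code): each line carries the time it was really printed, so any buffering on Deadline’s side shows up as a growing gap between that time and Deadline’s own per-line timestamp.

    import datetime
    import sys

    # Burst a few hundred stdout lines, each stamped with the time it
    # was actually printed. If Deadline throttles its stdout handling,
    # its own timestamps will drift further and further behind these.
    for i in range(300):
        now = datetime.datetime.now().strftime("%H:%M:%S.%f")[:-3]
        sys.stdout.write("printed at %s, line %03d\n" % (now, i))
        sys.stdout.flush()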

Any ideas on how to resolve this? Here is another example (we spent a few hours troubleshooting an unrelated script because we trusted the Deadline log timestamps, when they were not showing the right times):

Deadline log:

2018-01-03 03:09:26:  0: INFO: scLog: debug Checking: \\inferno2\projects\fall\scenes\kca_000_0560\cache\common\rigcache\chrcrane01a_1\v0006_ske_bryan_approved_cranes\pc2\mesh_m_high_armfeatheruppertop_018.pc2 index: 0
2018-01-03 03:09:26:  0: INFO: scLog: debug finding item
2018-01-03 03:09:27:  0: INFO: scLog: debug iUDIMIndex: 0
2018-01-03 03:09:27:  0: INFO: scLog: debug Could not find filename in localization list and its not a UDIM neither: \\inferno2\projects\fall\scenes\kca_000_0560\cache\common\rigcache\chrcrane01a_1\v0006_ske_bryan_approved_cranes\pc2\mesh_m_high_armfeatheruppertop_018.pc2
2018-01-03 03:09:28:  0: INFO: scLog: debug t2: 1t3: 1t4: 115733
2018-01-03 03:09:28:  0: INFO: scLog: debug Checking: \\inferno2\projects\fall\scenes\kca_000_0560\cache\common\rigcache\chrcrane01a_10\v0006_ske_bryan_approved_cranes\pc2\mesh_m_high_armfeathermid_015.pc2 index: 0
2018-01-03 03:09:29:  0: INFO: scLog: debug finding item
2018-01-03 03:09:29:  0: INFO: scLog: debug iUDIMIndex: 0
2018-01-03 03:09:30:  0: INFO: scLog: debug Could not find filename in localization list and its not a UDIM neither: \\inferno2\projects\fall\scenes\kca_000_0560\cache\common\rigcache\chrcrane01a_10\v0006_ske_bryan_approved_cranes\pc2\mesh_m_high_armfeathermid_015.pc2
2018-01-03 03:09:30:  0: INFO: scLog: debug t2: 1t3: 1t4: 115734
2018-01-03 03:09:31:  0: INFO: scLog: debug Checking: \\inferno2\projects\fall\scenes\kca_000_0560\cache\common\rigcache\chrcrane01a_14\v0006_ske_bryan_approved_cranes\pc2\mesh_m_high_armfeatherlower_046.pc2 index: 0
2018-01-03 03:09:31:  0: INFO: scLog: debug finding item
2018-01-03 03:09:32:  0: INFO: scLog: debug iUDIMIndex: 0
2018-01-03 03:09:32:  0: INFO: scLog: debug Could not find filename in localization list and its not a UDIM neither: \\inferno2\projects\fall\scenes\kca_000_0560\cache\common\rigcache\chrcrane01a_14\v0006_ske_bryan_approved_cranes\pc2\mesh_m_high_armfeatherlower_046.pc2
2018-01-03 03:09:33:  0: INFO: scLog: debug t2: 1t3: 1t4: 115734
2018-01-03 03:09:33:  0: INFO: scLog: debug Checking: \\inferno2\projects\fall\scenes\kca_000_0560\cache\common\rigcache\chrcrane01a_14\v0006_ske_bryan_approved_cranes\pc2\mesh_m_high_armfeatheruppertop_020.pc2 index: 0
2018-01-03 03:09:34:  0: INFO: scLog: debug finding item
2018-01-03 03:09:34:  0: INFO: scLog: debug iUDIMIndex: 0
2018-01-03 03:09:35:  0: INFO: scLog: debug Could not find filename in localization list and its not a UDIM neither: \\inferno2\projects\fall\scenes\kca_000_0560\cache\common\rigcache\chrcrane01a_14\v0006_ske_bryan_approved_cranes\pc2\mesh_m_high_armfeatheruppertop_020.pc2
2018-01-03 03:09:35:  0: INFO: scLog: debug t2: 1t3: 1t4: 115735

Our internal on-disk log for the same log lines:

Wed Jan 03 03:07:47 2018 (+15055ms) : debug : Checking: \\inferno2\projects\fall\scenes\kca_000_0560\cache\common\rigcache\chrcrane01a_1\v0006_ske_bryan_approved_cranes\pc2\mesh_m_high_armfeatheruppertop_018.pc2 index: 0
Wed Jan 03 03:07:47 2018 (+0ms) : debug : finding item
Wed Jan 03 03:07:47 2018 (+0ms) : debug : iUDIMIndex: 0
Wed Jan 03 03:07:47 2018 (+0ms) : debug : Could not find filename in localization list and its not a UDIM neither: \\inferno2\projects\fall\scenes\kca_000_0560\cache\common\rigcache\chrcrane01a_1\v0006_ske_bryan_approved_cranes\pc2\mesh_m_high_armfeatheruppertop_018.pc2
Wed Jan 03 03:07:47 2018 (+0ms) : debug : t2: 1t3: 1t4: 115733
Wed Jan 03 03:07:47 2018 (+0ms) : debug : Checking: \\inferno2\projects\fall\scenes\kca_000_0560\cache\common\rigcache\chrcrane01a_10\v0006_ske_bryan_approved_cranes\pc2\mesh_m_high_armfeathermid_015.pc2 index: 0
Wed Jan 03 03:07:47 2018 (+0ms) : debug : finding item
Wed Jan 03 03:07:47 2018 (+0ms) : debug : iUDIMIndex: 0
Wed Jan 03 03:07:47 2018 (+0ms) : debug : Could not find filename in localization list and its not a UDIM neither: \\inferno2\projects\fall\scenes\kca_000_0560\cache\common\rigcache\chrcrane01a_10\v0006_ske_bryan_approved_cranes\pc2\mesh_m_high_armfeathermid_015.pc2
Wed Jan 03 03:07:47 2018 (+0ms) : debug : t2: 1t3: 1t4: 115734
Wed Jan 03 03:07:47 2018 (+1ms) : debug : Checking: \\inferno2\projects\fall\scenes\kca_000_0560\cache\common\rigcache\chrcrane01a_14\v0006_ske_bryan_approved_cranes\pc2\mesh_m_high_armfeatherlower_046.pc2 index: 0
Wed Jan 03 03:07:47 2018 (+0ms) : debug : finding item
Wed Jan 03 03:07:47 2018 (+0ms) : debug : iUDIMIndex: 0
Wed Jan 03 03:07:47 2018 (+0ms) : debug : Could not find filename in localization list and its not a UDIM neither: \\inferno2\projects\fall\scenes\kca_000_0560\cache\common\rigcache\chrcrane01a_14\v0006_ske_bryan_approved_cranes\pc2\mesh_m_high_armfeatherlower_046.pc2
Wed Jan 03 03:07:47 2018 (+0ms) : debug : t2: 1t3: 1t4: 115734
Wed Jan 03 03:07:47 2018 (+0ms) : debug : Checking: \\inferno2\projects\fall\scenes\kca_000_0560\cache\common\rigcache\chrcrane01a_14\v0006_ske_bryan_approved_cranes\pc2\mesh_m_high_armfeatheruppertop_020.pc2 index: 0
Wed Jan 03 03:07:47 2018 (+0ms) : debug : finding item
Wed Jan 03 03:07:47 2018 (+0ms) : debug : iUDIMIndex: 0
Wed Jan 03 03:07:47 2018 (+0ms) : debug : Could not find filename in localization list and its not a UDIM neither: \\inferno2\projects\fall\scenes\kca_000_0560\cache\common\rigcache\chrcrane01a_14\v0006_ske_bryan_approved_cranes\pc2\mesh_m_high_armfeatheruppertop_020.pc2
Wed Jan 03 03:07:47 2018 (+0ms) : debug : t2: 1t3: 1t4: 115735

There is a whole 2 minute offset in time…

Could this be caused by us using this feature:

What’s worse is that Deadline seems to be throttling the 3ds Max render process while it’s feeding the data to the Slave window… (so we have renders timing out)

I’ll make a quick app to test this and see what falls out. I didn’t expect the log streaming to have that effect. I’m a little concerned if we start throwing away lines, but I’ve found a tricky way to speed up regex handling in a different app that I’ll discuss with the dev team today.

This is unrelated to log streaming as far as I can tell. Deadline simply logs the incoming stdout to disk and to its internal log at a different time than when it receives the lines.

Any news on this? It’s making troubleshooting pretty hard… :-\

Yeah, we’ve figured this one out! It looks like there was an artificial pause in the standard output handling loop. I’m thinking it was added to try and curb busy waiting, but it ended up limiting things down to 20 lines per second. There’s a merge in for review, so we’re waiting on that. I’m not sure if this actually affected render performance as the sandbox seems to spool up lines to work.
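
To put a rough number on that: at 20 lines per second, a burst of, say, 300 debug lines inside a single second takes about 15 seconds to drain, which is the same order as the drift in your screenshots, and the backlog keeps compounding through the chatty sections, which would account for the two-minute offset in your last example.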

In addition, I’ll need to see how we push this back to 8.0 for you.

The best hack around this would be to somehow filter superfluous lines before they make it to the stdout handlers.
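
For example (sketching with Python’s stock logging module, since I don’t know what your logger actually looks like): keep the full debug stream in the on-disk file, but only mirror the higher levels to stdout, so far fewer lines ever reach Deadline’s stdout handlers.

    import logging
    import sys

    logger = logging.getLogger("sc_core")            # made-up logger name
    logger.setLevel(logging.DEBUG)

    file_handler = logging.FileHandler("job.log")    # stands in for your real on-disk path
    file_handler.setLevel(logging.DEBUG)             # full debug stream still hits the file
    logger.addHandler(file_handler)

    stdout_handler = logging.StreamHandler(sys.stdout)
    stdout_handler.setLevel(logging.INFO)            # only INFO and above reaches Deadline
    logger.addHandler(stdout_handler)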

That’s great news Edwin, thanks!

Fun thing over here. Would it need to be backported all the way to 8.0? Considering this is the first big consequence of this bug we’ve seen over the years, I just want to make sure we’re not backporting something that won’t benefit most users. It will get fixed, but I’d like to make sure we’re being efficient with how much build/test/deploy time we’re spending.

Funny you should ask… We started our Deadline 10 integration again today (finally, Black Panther is out the door), and we have a window to update over the next 1-2 weeks. So fingers crossed the answer is no, haha.

Nice! Well, if push comes to shove we can make it work for you guys, but not having to will save some work for our resident buildmeister.
