I have a plugin that I’m trying to debug while rendering on a network. I use MXS calls to DeadlineUtil.LogMessage to get slaves to log messages while this happens.
It seems that individual calls to that function are very slow. I can only get about 2 messages out per second. Since I’m printing lots of debug info (hundreds of unique prints), the tasks have started to fail simply because the 5 minute startup timeout I’ve given them has passed while printing my messages. Obviously one solution is to increase the startup timeout value, but is there another way to log messages to deadline in a non-blocking manner? I’m assuming LogMessage is slow because each call connects to the repository, sends the message, and then waits for a response (that’s my guess, anyways). Is there a way to just toss messages into the void and hope for the best, rather than waiting for a full handshake each time?
Edit: on further comparisons of log timestamps on various rendered jobs, it seems the speed of the function is pretty variable…sometimes 10+ messages are successfully logged within a second, sometimes it takes a full second for two messages in a row. Assuming my handshake theory is correct, maybe this is due to network traffic or some other factor? Or maybe the call to GetCOREInterface()->Log() is the culprit?
I went digging around and while “LogMessage” is defined in a number of places, the one you are calling is definitely within Lightning.
In digging into the C++ there, we’re piping the text back to the Python half of the plugin in “[repo]/plugins/3dsmax.py” and it seems to be grabbed by “PollUntilComplete()” function in that file. What’s being sent through the pipe is “FUNCTION: STDOUT: [your message here]” and is being handled by this code:
match = self.FunctionRegex.Match( request )
if( match.Success ):
# Call the lightning function handler method to see if we should reply or if the render is finished.
reply = ""
try:
reply = self.LightingFunctionHandler( match.Groups[ 1 ].Value )
if( reply != "" ):
self.MaxSocket.Send( reply )
except Exception as e:
self.Plugin.FailRender( e.Message )
continue
I’ll likely have to hand this off to the dev team… Because the actual output your sending is captured by the “self.StdoutRegex” regular expression, I don’t think you can batch your messages with newlines or anything clever there.
Thanks for the insight…so my hunch was probably correct about the cause of the slowdown then…
And yea, unfortunately even if sending groups of lines in a batch was a possibility, it wouldn’t be practical here, because the purpose of my logging is to narrow down plugin crashes…so I have to log things as they happen, not at the end in a single go (in case a crash happens before a batch of messages is able to be sent, thereby negating their purpose).
Hmm. I have another idea to work around this. Does Deadline throw a proper error with what you’re trying to debug? If that’s the case, we tack on a log file in the “Error” section (it’s the only plugin we do this for).
You can see the path by looking for “Network log file:” in a render log. I believe there are other Max API calls you can make to write out there. That might help figure this out and get you higher bandwidth.
No, no proper error for the crashes. Also I need the logs for general purpose debugging of other things as well, that don’t necessarily cause crashes…like if a render is taking a particularly long time (but still finishes), I log various timing info about processes in my plugin so it’s easier to figure out where the slowdown is occurring.
Unfortunately the only solution that I think will work, is to find a way to send messages through lightning or some other script interface, that don’t wait for a return response.
Hmm. The bad news is we’re absolutely sending the text through lightning’s TCP pipe from that function call and that’s what’s blocking at the moment. I’m not sure why the message handling is being delayed, but we have to keep finding a triage possibility here because the developer investigation into this is going to take awhile.
The slowdown is happening somewhere inside of PollUntilComplete() and it seems like we have one blocking function that blocks for about 500 milliseconds here:
request = self.MaxSocket.Receive( 500 )
That’s practically eons for local loopback traffic… I wonder, could you try dropping that to “100” or less and see if that makes any difference? Please back up “[repo]/plugins/3dsmax/3dsmax.py” before doing that just in case there are negative consequences I’m not considering.
Another angle would be trying to just dump to Max’s standard output and hoping it drops into the Deadline log anyway. I’m assuming you’ve tried that already, but I thought I’d ask.
Interesting…I missed that function call on first glance.
Quick question before trying to modify that file…can I call those functions directly from my own python interpreter or must they be called by lightning itself? Maybe I can bypass the maxscript call to DeadlineUtil altogether, and just call that python function with a modified timeout? I haven’t studied the code too closely yet to see if that’s feasible…you might know more offhand in the meantime…