Hello Ryan,
Deadline 3.1 SP1 is crashing a lot while rendering Maya files. We’re getting the infamous “Send/Don’t Send” Microsoft error message on different slaves and while rendering different Maya projects. “mayabatch.exe” usually stays resident as a process and holds on to a lot of memory unnecessarily. The slave usually keeps running until we click the “Send” or “Don’t Send” button, but since the memory is already taken, the new job never finishes.
When I click to see additional details about the problem, I get this information:
AppName: deadlineslave.exe AppVer: 3.1.0.36430 AppStamp:4a312eb1
ModName: unknown ModVer: 0.0.0.0 ModStamp:00000000
fDebug: 0 Offset: 000000001b987530
I don’t know if this information helps. If not, could you please advise what to send you the next time this happens?
Thanks
Hey there,
Ryan’s actually out at Siggraph for the rest of the week, but the rest of us should hopefully be able to give you a hand.
Could you send us the slave log of one of these crashes? You can find the logs by selecting “Explore Log Folder” under the Help menu of the Slave in question.
Cheers,
Hi,
I’ve attached the log file.
Thank you for your help.
Hey there,
I took a look at the log file you attached to your last post, and the following line stuck out a bit:
2009-08-04 11:39:57: Info Thread - Cancelling task because task filename "\\uranium\deadline3\jobs\005_050_999_28a4dab6\Rendering\005_050_999_28a4dab6_00028_141-145.Render-108" could not be found, it was likely requeued
I’m assuming that this job/task wasn’t actually requeued? If it wasn’t, then this was probably due to a hiccup in the network, and the Slave thought the task was requeued when it actually wasn’t – this is probably why the job isn’t completing.
However, this shouldn’t be crashing the Slave, even if this is the issue. Do you get any other kind of error report when the Slave crashes? I’m a bit baffled at the lack of info in the slave log…
Hi,
I’ve seen this error too and, as you said, I guess it shouldn’t be crashing the slave. It did happen today on a slave that was accidentally disconnected from the network, but I thought that could be just a coincidence.
Anyway, the only other error information I have is the details shown in the Windows error message. I don’t have all the info right now, but I’ll send it to you the next time it happens. I’ve already sent you the first three lines, which contain the memory address where it stopped and the application stamp when the error happened (I don’t know if it helps):
AppName: deadlineslave.exe AppVer: 3.1.0.36430 AppStamp:4a312eb1
ModName: unknown ModVer: 0.0.0.0 ModStamp:00000000
fDebug: 0 Offset: 000000001b987530
Is there any other info I can gather from Deadline?
Is this something that’s consistently happening? Or has this only happened once or twice?
A couple of months ago I noticed it at least 20 times on different machines. I’ve been rebooting the render nodes twice a day since then, and this seemed to eliminate the problem, but now it’s coming back.
Unless I’m mistaken, I believe we have recently changed the slave to handle disconnects from the repository, and general network spottiness, more robustly. I will have to do some testing to see if these crashes still occur in our internal development version. In the meantime, if you do see this kind of crash happen again, it would be good to check the slave logs and confirm that a network disconnect is indeed what’s causing the issue, and that the crash and the disconnect didn’t simply happen to coincide this one time.
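If it saves some time, here is a rough sketch of how that log check could be scripted. It is not part of Deadline: the function names are made up for illustration, the "*.log" file extension is an assumption about how the slave logs are named, and the pattern is based only on the single "Cancelling task" line quoted earlier in this thread, so it may need adjusting for other log formats.

```python
import re
import sys
from pathlib import Path

# Pattern modelled on the one log line quoted in this thread (assumption:
# other occurrences use the same wording and timestamp format).
PATTERN = re.compile(
    r'^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}):\s+.*?'
    r'Cancelling task because task filename "(?P<task_file>[^"]+)" '
    r'could not be found'
)


def find_cancelled_tasks(lines):
    """Return (timestamp, task_file) pairs for every matching log line."""
    hits = []
    for line in lines:
        match = PATTERN.search(line)
        if match:
            hits.append((match.group("timestamp"), match.group("task_file")))
    return hits


def scan_log_folder(folder):
    """Scan every *.log file in folder (hypothetical naming) for matches."""
    results = {}
    for log_path in sorted(Path(folder).glob("*.log")):
        with open(log_path, encoding="utf-8", errors="replace") as handle:
            hits = find_cancelled_tasks(handle)
        if hits:
            results[log_path.name] = hits
    return results


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python scan_slave_logs.py <path to the slave log folder>
    for name, hits in scan_log_folder(sys.argv[1]).items():
        for timestamp, task_file in hits:
            print(f"{name}: {timestamp} {task_file}")
```

Pointing it at the folder opened by “Explore Log Folder” and comparing the printed timestamps with the times of the crashes should show whether the two line up or not.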
Cheers,