Why am I getting these from time to time?
An error occurred in StartJob(): 3dsmax: Exception caught in 3ds max: simple_socket::receive: Invalid packet size (given 1231515247)
This error appears when the socket connection between 3dsmax and Deadline becomes corrupt. There are various possible causes, and unfortunately not much can be done about it when it occurs.
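For what it's worth, the size reported in the error is the same in every log quoted in this thread, and it is worth looking at as raw bytes. The sketch below assumes the size is read as a 4-byte unsigned integer; the real wire format between the Slave and 3dsmax isn't described here, and `size_as_text` is a hypothetical helper, not part of Deadline.

```python
import struct

def size_as_text(size: int) -> dict:
    """Reinterpret a 32-bit packet size as the four raw bytes it was built from.

    Both byte orders are shown because the real byte order is an assumption.
    """
    return {
        "big_endian": struct.pack(">I", size).decode("ascii", errors="replace"),
        "little_endian": struct.pack("<I", size).decode("ascii", errors="replace"),
    }

decoded = size_as_text(1231515247)
print(decoded)  # {'big_endian': 'Igno', 'little_endian': 'ongI'}
```

Read big-endian, the four bytes are the printable characters "Igno". That would be consistent with a piece of plain text arriving where a length field was expected, but it is only a guess about the cause.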
Cheers,
Ok, just asking as I’ve seen this for the first time.
Yeah, it’s pretty rare when it does happen. Just glad to hear that you don’t run into it very often.
Well, the reason I’m asking is that this is the first time I’ve seen the error, but all of a sudden it started showing up rather frequently:
Does it only happen for certain jobs, and are they particularly heavy jobs? Also, are there any other errors besides this one, or are they all the same “packet size” error type?
Most of them are the Packet Size errors. Two or three, tops, were issues with the Lightning.dlx (also a first) and some scripting.
Weird stuff. The jobs are rather computationally intensive, but not memory or network intensive at all. The files are about 50MB each, one frame takes about an hour or so…
I actually get these errors quite a lot.
Lukas - Are you running Pulse on the same server which is acting as the Deadline repository?
Yes, but I certainly have absolutely no intentions of running Pulse on any other machine.
The location where Pulse is running shouldn’t matter. The socket connection is the local socket connection between the Deadline Slave and the instance of 3dsmax running on the machine.
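To illustrate how a local socket like this can end up reporting a bogus packet size, here is a minimal sketch of a length-prefixed protocol. Everything in it is an assumption for illustration: the 4-byte big-endian header, the `MAX_PACKET` limit, and the function names are hypothetical and are not taken from Deadline or its 3dsmax plugin.

```python
import socket
import struct

MAX_PACKET = 16 * 1024 * 1024  # assumed sanity limit, not Deadline's real value

def recv_exact(sock, n):
    """Read exactly n bytes, or raise if the peer closes first."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed mid-packet")
        buf += chunk
    return buf

def send_packet(sock, payload):
    """Send a 4-byte big-endian length header followed by the payload."""
    sock.sendall(struct.pack(">I", len(payload)) + payload)

def receive_packet(sock):
    """Read one framed packet, rejecting implausible sizes."""
    (size,) = struct.unpack(">I", recv_exact(sock, 4))
    if size > MAX_PACKET:
        raise ValueError(f"Invalid packet size (given {size})")
    return recv_exact(sock, size)

def demo():
    a, b = socket.socketpair()
    try:
        send_packet(a, b"hello")
        good = receive_packet(b)            # properly framed packet
        a.sendall(b"WRN: stray log line")   # unframed bytes on the same socket
        try:
            receive_packet(b)
            error = None
        except ValueError as exc:
            error = str(exc)                # header bytes were really text
        return good, error
    finally:
        a.close()
        b.close()

print(demo())
```

Once anything writes unframed bytes into the stream, the receiver reads text as a length and fails, and every later read on that connection is out of step as well.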
I’m curious as to why you (Lukas) started seeing this all of a sudden. I wonder if it’s at all related to the types of jobs you’re currently rendering. You did mention they were CPU intensive, but I guess I would be surprised if this affected the socket connection. Now if the case was that 3dsmax’s memory is somehow becoming corrupt, then I could see this happening…
My thoughts were that our Pulse is presently throttling the slave - “Slave Throttle Limit” = 20. It used to be higher, around 40, then 30 and I have found the number of socket connection errors has reduced as a result of lowering this setting. YMMV? So, I was thinking under heavy load, perhaps 3dsMax is doing something weird? As Ryan said, perhaps this is totally un-related. Either way, I see this error message more frequently when our farm is under a very heavy load for weeks on end.
I should add that the other error we regularly get during this heavy period is the “lightning.dlx” error which Lukas mentioned earlier in this thread.
HTH,
Mike
I don’t know, but I’m seeing more weird stuff with Deadline on the last bunch of shots. For example, I see tasks that have been rendering for hours but still show about 3 minutes in the progress, as if they were constantly being rendered over and over, yet I don’t see the finished frames for those tasks on the HDD. I also didn’t receive any error message.
Weird stuff I tell you!
Yes, that’s my case here. I’ve been heavily rendering 3ds Max jobs for about 10 days non-stop now. 100% utilization on all CPUs.
Hmm… this is just weird.
On one particular job on several particular tasks I see this error:
An error occurred in StartJob(): 3dsmax startup: Did not receive expected token from lightning plugin. (got "") - check the install of lightningMax.dlx
2011/08/30 18:42:21 WRN: MAXScript Auto-load Script Error - C:\duber\3ds Max\2011_x64\scripts\Load_puppetShop.ms Exception: -- Runtime error: Error setting current directory: "C:\Program Files\Autodesk\3ds Max 2011\scripts\Puppetshop"
2011/08/30 18:42:21 WRN: MAXScript Auto-load Script Error - C:\duber\3ds Max\2011_x64\scripts\Tactic_reg.ms Exception: -- Type error: FilterString [String to filter] [Tokens] requires String, got: undefined
2011/08/30 18:42:25 DBG: Starting network
2011/08/30 18:42:26 INF: Loaded C:\Users\loocas\AppData\Local\Thinkbox\Deadline\slave\plugins\deadlineStartupMax2011.max
2011/08/30 18:42:26 INF: SYSTEM: Production renderer is changed to Default Scanline Renderer. Previous messages are cleared.
2011/08/30 18:42:26 INF: Job: C:\Users\loocas\AppData\Local\Thinkbox\Deadline\slave\plugins\deadlineStartupMax2011.max
Why?
Same job, other task:
An error occurred in StartJob(): 3dsmax: Exception caught in 3ds max: simple_socket::receive: Invalid packet size (given 1231515247)
Any idea why you’re getting those MAXScript auto-load errors? I’m not sure if they’re related, though. As Mike mentioned, maybe all these issues are related to the current heavy load on your infrastructure. I just realized that all your errors seem to occur when the 3dsmax job is starting up. This is when 3dsmax is loaded, the scene file is loaded, and likely other assets are pulled from the network. I recall you saying that you never submit your scene files with the job, so the slaves need to load these scenes over the network, and if they’re really large files it could definitely impact things.
If you don’t have Pulse Throttling enabled, maybe it’s something worth looking into:
thinkboxsoftware.com/deadlin … ring_Pulse
You can use throttling to limit the number of machines that are loading a job at a given time. This may help…
Hmm… I’ll try that. But I doubt this could be the bottleneck. Even though I don’t use anything extra, the network is 1 Gbps and the files I’m rendering are about 50MB, plus the textures are about 200MB total (about 50 files).
I’ll see if I still get these errors with Throttling enabled.
Thanks for the tip.
Hi Guys,
Most interesting thread for me, as I think Lukas has stumbled on the same issues I was having 6 months ago. Ryan - I can send you the various ticket IDs if you want to cross-reference some of my previous notes to you. I think we might have a couple of similarly themed but different issues here.
OK, here are my thoughts:
0: INFO: Loading 3dsmax scene file
0: An exception occurred: An error occurred in StartJob(): 3dsmax: Exception caught in 3ds max: simple_socket::receive: Invalid packet size (given 1231515247)
My initial thought (yesterday) was that Pulse was releasing the task process (loading the 3dsMax scene file) too early, before the data file had quite finished copying over to the local slave. I’m not sure, and need Ryan’s input on this. I think Ryan is correct and “Slave Throttling” isn’t the solution here; it does, however, ‘semi’ fix the issue by reducing the number of occurrences of this situation. Still, I don’t think it’s the correct solution to these issues.
I think the MAXScript errors that Lukas is seeing are a result of the previous 3dsMax instance not having shut down cleanly, so that third-party MXS scripts trying to access a particular third-party plugin, such as PuppetShop, fail. Also, the “lightning.dlx” error:
An error occurred in StartJob(): 3dsmax startup: Did not receive expected token from lightning plugin. (got "") - check the install of lightningMax.dlx
I believe this is a result of the “WaitForConnection()” function failing (on the slave’s second attempt to run the 3dsMax plugin) to access “lightning.dlx”, as it’s still locked by the earlier “hung” instance of 3dsMax. So this can also be fixed by ensuring we kill the particular ProcessID if an instance of 3dsMax goes pop on us.
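The idea of killing one specific process, rather than everything with a matching name, can be sketched as follows. This is only an illustration and not how Deadline itself works: `run_with_timeout` is a hypothetical helper, and a sleeping Python child process stands in for a hung 3dsMax instance.

```python
import subprocess
import sys

def run_with_timeout(cmd, timeout_s):
    """Start cmd and wait; if it outlives timeout_s, kill that exact process.

    Returns (pid, killed) so the caller can log which process was removed.
    """
    proc = subprocess.Popen(cmd)
    try:
        proc.wait(timeout=timeout_s)
        return proc.pid, False
    except subprocess.TimeoutExpired:
        proc.kill()   # terminates only the process we started
        proc.wait()   # reap it so nothing it held stays locked
        return proc.pid, True

def demo():
    # Stand-in for a hung renderer: a child that would sleep for a minute.
    hung = [sys.executable, "-c", "import time; time.sleep(60)"]
    pid, killed = run_with_timeout(hung, timeout_s=1)
    return killed

print(demo())  # True
```

Keeping hold of the process handle from the moment of launch is what makes it possible to target the hung instance without touching any other copy of the application on the machine.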
One last thought; Lukas - have you got Krakatoa installed on any of the slaves which are displaying these issues?
HTH,
Mike
Interesting, but it kinda makes sense.
Doesn’t Deadline check for a 3ds Max process if it’s still active or not?
Nope, I don’t have Krakatoa.
Lukas - OK, cool. I was just covering off a very extreme edge case that I identified to Bobo earlier in the year, regarding the way Krakatoa scripts were setting the “current directory”. Bobo got this all fixed in the internal build at the time, and I believe only studios running an old copy of Krakatoa would still be affected, and even then, it’s only if they were doing some custom Deadline coding anyway.