I’m having an odd problem with slave crashes on my Ubuntu Linux nodes, which run the same configuration I’ve been using successfully for a while.
- I recently started using VRay standalone, which is when the problem started coming up.
- I’ve been using Maxwell successfully and I don’t think this problem was happening with that plugin.
Here’s the sequence that consistently causes the problem:
- Submit a VRay standalone job.
- Job starts fine, everybody’s happy.
- At some point, the slave just crashes, without any warnings that I can see, after a period of time that seems random.
- However, it’s always a number of hours after the slave, and usually the job, has started.
- Job keeps rendering away, finishes, etc. just fine, but the slave doesn’t know about any of that because it isn’t running.
- Launcher keeps running, so I can restart the slave fine when I realize it’s happened.
- Slave shows up in monitor as “stalled, but fixed”.
No warnings or errors are generated that I can see. Are there any other logs that dig deeper than what’s available from the slave logs? For example, here are the last few lines from the most recent occurrence:
2013-09-11 08:14:01: 0: STDOUT: [2013/Sep/11|08:14:00] Sending 840004 bytes of irradiance map to node2.hypothetical.cgi
2013-09-11 08:14:01: 0: STDOUT: [2013/Sep/11|08:14:00] Sending 840004 bytes of irradiance map to node2.hypothetical.cgi
2013-09-11 08:14:01: 0: STDOUT: [2013/Sep/11|08:14:00] Sending 9916 bytes of irradiance map to node2.hypothetical.cgi
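To pin down exactly when the slave stopped logging, the timestamp of the last entry can be pulled out of the log. This is only a minimal sketch: it assumes every entry starts with a `YYYY-MM-DD HH:MM:SS:` stamp as in the excerpt above, and the function name `last_log_time` is hypothetical, not part of any existing tool.

```python
import re
from datetime import datetime

# Matches the leading "YYYY-MM-DD HH:MM:SS:" stamp seen in the excerpt above.
# Assumption: all slave log entries begin with this stamp.
STAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}):")


def last_log_time(lines):
    """Return the datetime of the last stamped line, or None if none match."""
    last = None
    for line in lines:
        match = STAMP.match(line.strip())
        if match:
            last = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
    return last


if __name__ == "__main__":
    sample = [
        "2013-09-11 08:14:01: 0: STDOUT: [2013/Sep/11|08:14:00] "
        "Sending 9916 bytes of irradiance map to node2.hypothetical.cgi",
    ]
    print(last_log_time(sample))
```

Feeding it the lines of a slave log file gives the time of the final entry, which can then be compared against whatever else was happening on the node around that time.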
Any ideas or diagnosis tips?
Thanks,
- Eric