Repository not reachable causes non-recoverable crashes.

benyaboy · December 5, 2011, 5:02pm

If we need to reboot the repository server, the slaves and Monitor can crash. Our repository is mounted via DFS (no replication, one path), so Deadline lives at \\ourdomain.com\it\deadline, which is actually \\server1\deadline$.

Because we use domain-based DFS, the namespace server still answers (e.g. \\ourdomain.com\it stays reachable, since it is actually \\domaincontroller1\it), even though the deadline folder beneath it is offline.

So it would be great if you could handle this case. I know it’s annoying, but this is a problem with most cross-platform filesystem libraries, including boost::filesystem.
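To make the request concrete, here is a rough sketch of the kind of handling meant: probe the repository folder itself (not just the DFS namespace root, which keeps answering) and retry for a while before giving up. This is only an illustration in Python. The function names, retry count, and delay are hypothetical and are not part of Deadline's API.

```python
import os
import time


def repository_is_reachable(repo_root):
    """Return True if the repository folder itself can be listed.

    Listing the folder (rather than its DFS parent) matters because the
    namespace root can still resolve while the target share is offline.
    """
    try:
        os.listdir(repo_root)
        return True
    except OSError:
        # Covers "path not found", "network name unavailable", and similar.
        return False


def wait_for_repository(repo_root, attempts=5, delay_seconds=1.0, sleep=time.sleep):
    """Poll the repository a few times instead of failing on the first error.

    `attempts` and `delay_seconds` are illustrative defaults, not tuned values.
    Returns True as soon as the folder is reachable, False if it never is.
    """
    for attempt in range(attempts):
        if repository_is_reachable(repo_root):
            return True
        if attempt < attempts - 1:
            sleep(delay_seconds)  # back off before the next probe
    return False
```

A caller could then pause its work and log a warning while wait_for_repository returns False, rather than treating a missing folder as a changed folder structure.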

Thanks,
Ben.

rrussell · December 5, 2011, 6:44pm

Hey Ben,

Can you post a Monitor and/or Slave log from a session where this crash occurred? I’m guessing the problem is that to Deadline, it looks like the repository is replaced with another folder structure that it doesn’t know what to do with. We could probably try to replicate this with a simple renaming of the repository folder on a local install, but it would help to see your logs to see where the crashes are originating.

To find the logs folder, just select Help -> Explore Log Folder from any of the Deadline application UIs. You’ll probably have to go back through the logs to find one from when the crash occurred, or you can wait until it happens again and find the most recent log.

Thanks!

Ryan

benyaboy · December 5, 2011, 10:49pm

I’ve attached some logs. Not sure which is the one with the crash.
I would guess that, yes, it assumes that the folder structure has changed, but it is really a “folder going offline.”
deadlineslave(Rn029)-2011-12-05-0000.zip (81.6 KB)