AWS Thinkbox Discussion Forums

getCOREInterface() error

[code]Error

Error in StartJob: 3dsmax: Caught an exception in GetCOREInterface()->LoadFromFile(): Error reading stream.
NetWorkerPreLoad. curJobname: C:\Users\mike.SFS\AppData\Local\Thinkbox\Deadline6\slave\BEASTMASTER\plugins\52b8d552b5b7e12950d863ea\deadlineStartupMax2014.max; init: 0
2013/12/31 12:46:29 DBG: [00384] [05844] in NetWorkerPreLoad. calling PostInitMessageSystem()
2013/12/31 12:46:29 DBG: [00384] [05844] in NetWorkerPreLoad. srv_pid: 0
2013/12/31 12:46:29 DBG: [00384] [05844] leaving NetWorkerPreLoad. LoadLib()
2013/12/31 12:46:29 DBG: [00384] [05844] NetRenderPreLoad passed[/code]

Not sure if this is Deadline related, but we have one machine that can’t render any Max jobs. The precursor to this was a database corruption, which was fixed by a --repair command. I’m not sure exactly what the error was, but I suspect it was 32-bit related, as it crashed when the database broke the 2GB file limit. The repair recondensed it to only a couple hundred MB. But now this slave returns this error even after a slave restart. I haven’t gotten in contact with the user yet to see if I can reboot their machine, but hopefully that will be a functional solution. Just thought I would toss out one more failure scenario.

Also when the database went down all of our rendering machines stalled while rendering. I would expect the slaves to fail more gracefully when they lose connection to the server. E.g. stop rendering.

Have you had the opportunity to restart this machine yet to see if that helps?

Could you explain what you mean by “stalled” in this case? Did the slaves crash out, or did they get stuck in a loop of repeatedly trying to connect to the db, or was it something else? Could you send a log from the session where this occurred?

We don’t want the slave to stop rendering if it can’t connect to the db, because if the outage is temporary, we want to avoid losing any render time. So the desired behavior in this case is for the slave to keep rendering while it repeatedly tries to reestablish communication with the db. If the slave did something else in this case, we’ll need to investigate.
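To illustrate the retry behavior described above, here is a minimal sketch in Python. It is not Deadline's actual code: `retry_until_connected` and `fake_requeue` are hypothetical names, and the 20 second wait is only borrowed from the "Waiting 20 seconds before retrying..." message the slave prints.

```python
import time


def retry_until_connected(operation, wait_seconds=20, max_attempts=None, sleep=time.sleep):
    """Call `operation` until it succeeds, waiting between failed attempts.

    `operation` is any zero-argument callable that raises ConnectionError
    when the database is unreachable (a hypothetical stand-in for a real
    database call such as requeueing a task).
    """
    attempt = 0
    while True:
        attempt += 1
        try:
            return operation()
        except ConnectionError as error:
            # Give up only if the caller set an attempt limit and we reached it.
            if max_attempts is not None and attempt >= max_attempts:
                raise
            print(f"Attempt {attempt} failed ({error}); "
                  f"waiting {wait_seconds} seconds before retrying...")
            sleep(wait_seconds)


if __name__ == "__main__":
    # Hypothetical operation: refused once, then succeeds.
    outcomes = iter([ConnectionRefusedError("actively refused"), "task requeued"])

    def fake_requeue():
        outcome = next(outcomes)
        if isinstance(outcome, Exception):
            raise outcome
        return outcome

    # The sleep is stubbed out so the demo finishes immediately.
    print(retry_until_connected(fake_requeue, sleep=lambda seconds: None))
```

With `max_attempts` left at `None` the loop never gives up, which matches the idea that the slave should keep trying until the db is back.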

Thanks!
Ryan

The slaves continued rendering until 99.9% and then 3ds Max just hung. We just had the same behavior when our license expired in mid-render. The slaves reached 99.9%/100.0% but wouldn’t write the file/release.

The slave that was erroring was fixed by a reboot.

I’m not sure if this is the “stalled behavior” but it probably is:

13-12-24 06:06:31: 2013/12/24 02:49:56 WRN: [02600] [03792] Gamma Correction Settings are Being Changed 2013-12-24 06:06:31: 2013/12/24 02:49:57 INF: [02600] [03792] Done loading file: C:/Users/renderadmin/AppData/Local/Thinkbox/Deadline6/slave/RENDER-I7-01/jobsData/52b8d374b5b7e104903c592a/L3_SPYDR_EXT_0040_A02.01.max 2013-12-24 06:06:31: 2013/12/24 02:49:57 INF: [02600] [03792] SYSTEM: Production renderer is changed to V-Ray Adv 3.05.03. Previous messages are cleared. 2013-12-24 06:06:31: 2013/12/24 06:06:25 ERR: [02600] [03096] [V-Ray] [VFBCore::SetRegion] Error writing render region to raw image file. [RIFile:] Error writing channel buffer. [RIFile::WTChan] Error writing tag header. [RIFile::WriteTagHeader] Error writing to stream (64). 2013-12-24 06:06:31: 2013/12/24 06:06:25 ERR: [02600] [03792] [V-Ray] Cannot write output image file "W:\13022 L3 SPYDR Videos\Shots\EXT\0040\Renders\ANIM\Plane\A02.01\L3_SPYDR_EXT_0040_A02.01.0576.vrimg" 2013-12-24 06:06:31: ) 2013-12-24 06:06:31: at Deadline.Plugins.ScriptPlugin.RenderTasks(String taskId, Int32 startFrame, Int32 endFrame, String& outMessage, AbortLevel& abortLevel) 2013-12-24 06:06:31: RenderPluginException.Cause: JobError (2) 2013-12-24 06:06:31: RenderPluginException.Level: Major (1) 2013-12-24 06:06:31: RenderPluginException.HasSlaveLog: True 2013-12-24 06:06:31: Exception.Data: ( ) 2013-12-24 06:06:31: Exception.TargetSite: Void RenderTask(System.String, Int32, Int32) 2013-12-24 06:06:31: Exception.Source: deadline 2013-12-24 06:06:31: Exception.StackTrace: 2013-12-24 06:06:31: at Deadline.Plugins.Plugin.RenderTask(String taskId, Int32 startFrame, Int32 endFrame) 2013-12-24 06:06:31: at Deadline.Slaves.SlaveRenderThread.RenderCurrentTask(TaskLogWriter tlw) 2013-12-24 06:06:31: <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< 2013-12-24 06:06:32: Error occurred while writing report log: 2013-12-24 06:06:32: Exception Details 2013-12-24 06:06:32: IOException -- The specified network name is 
no longer available. 2013-12-24 06:06:32: Exception.Data: ( ) 2013-12-24 06:06:32: Exception.TargetSite: Void WinIOError(Int32, System.String) 2013-12-24 06:06:32: Exception.Source: mscorlib 2013-12-24 06:06:32: Exception.StackTrace: 2013-12-24 06:06:32: at System.IO.__Error.WinIOError(Int32 errorCode, String maybeFullPath) 2013-12-24 06:06:32: at System.IO.File.InternalCopy(String sourceFileName, String destFileName, Boolean overwrite) 2013-12-24 06:06:32: at Deadline.StorageDB.JobStorage.WriteJobReportFile(Report report, String reportLog) 2013-12-24 06:06:33: An error occurred while saving job report: An error occurred while trying to connect to the Database (SFS-File:27017). It is possible that the Mongo Database server is incorrectly configured, currently offline, blocked by a firewall, or experiencing network issues. 2013-12-24 06:06:33: Full error: Unable to connect to server SFS-File:27017: No connection could be made because the target machine actively refused it 192.168.94.101:27017. (FranticX.Database.DatabaseConnectionException) 2013-12-24 06:06:33: Events plugin names could not be collected from the repository because: The specified network name is no longer available. 
2013-12-24 06:06:33: (System.IO.IOException) 2013-12-24 06:06:33: WARNING: Exception in CleanupDeadlineEventListener: Python Error: AttributeError : 'NotifyEventListener' object has no attribute 'Cleanup' (Python.Runtime.PythonException) 2013-12-24 06:06:33: Stack Trace: 2013-12-24 06:06:33: [' File "none", line 31, in CleanupDeadlineEventListener\n'] 2013-12-24 06:06:33: (System.Exception) 2013-12-24 06:06:33: at FranticX.Scripting.PythonNetScriptEngine.HandlePythonError(Exception e) 2013-12-24 06:06:33: at FranticX.Scripting.PythonNetScriptEngine.CallFunction(String functionName, PyObject[] args) 2013-12-24 06:06:33: at Deadline.Scripting.DeadlineScriptManager.CallFunction(String scopeName, String functionName, PyObject[] args) 2013-12-24 06:06:33: at Deadline.Events.DeadlineEventPlugin.Dispose() 2013-12-24 06:06:33: [stack trace (maximumDepth=4)] FranticX.Diagnostics.Trace2.WriteStack line 0 2013-12-24 06:06:33: Deadline.Events.DeadlineEventPlugin.Dispose line 0 2013-12-24 06:06:33: Deadline.Events.DeadlineEventManager.LoadEventListeners line 0 2013-12-24 06:06:33: Deadline.Events.DeadlineEventManager.OnJobError line 0 2013-12-24 06:06:33: Error occurred while writing report log: 2013-12-24 06:06:33: Exception Details 2013-12-24 06:06:33: IOException -- The specified network name is no longer available. 2013-12-24 06:06:33: Exception.Data: ( ) 2013-12-24 06:06:33: Exception.TargetSite: Void WinIOError(Int32, System.String) 2013-12-24 06:06:33: Exception.Source: mscorlib 2013-12-24 06:06:33: Exception.StackTrace: 2013-12-24 06:06:33: at System.IO.__Error.WinIOError(Int32 errorCode, String maybeFullPath) 2013-12-24 06:06:33: at System.IO.File.InternalCopy(String sourceFileName, String destFileName, Boolean overwrite) 2013-12-24 06:06:33: at Deadline.StorageDB.SlaveStorage.WriteSlaveReportFile(Report report, String reportLog) 2013-12-24 06:06:34: An error occurred while saving slave report: An error occurred while trying to connect to the Database (SFS-File:27017). 
It is possible that the Mongo Database server is incorrectly configured, currently offline, blocked by a firewall, or experiencing network issues. 2013-12-24 06:06:34: Full error: Unable to connect to server SFS-File:27017: No connection could be made because the target machine actively refused it 192.168.94.101:27017. (FranticX.Database.DatabaseConnectionException) 2013-12-24 06:06:35: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 2013-12-24 06:06:35: Exception Details 2013-12-24 06:06:35: SocketException -- No connection could be made because the target machine actively refused it 192.168.94.101:27017 2013-12-24 06:06:35: SocketException.ErrorCode: 10061 (No connection could be made because the target machine actively refused it) 2013-12-24 06:06:35: SocketException.SocketErrorCode: ConnectionRefused (10061) 2013-12-24 06:06:35: Win32Exception.NativeErrorCode: 10061 2013-12-24 06:06:35: Exception.Data: ( ) 2013-12-24 06:06:35: Exception.TargetSite: Void DoConnect(System.Net.EndPoint, System.Net.SocketAddress) 2013-12-24 06:06:35: Exception.Source: System 2013-12-24 06:06:35: Exception.StackTrace: 2013-12-24 06:06:35: at System.Net.Sockets.Socket.DoConnect(EndPoint endPointSnapshot, SocketAddress socketAddress) 2013-12-24 06:06:35: at System.Net.Sockets.Socket.Connect(EndPoint remoteEP) 2013-12-24 06:06:35: at System.Net.Sockets.TcpClient.Connect(IPEndPoint remoteEP) 2013-12-24 06:06:35: at MongoDB.Driver.Internal.MongoConnection.Open() 2013-12-24 06:06:35: at MongoDB.Driver.Internal.MongoConnection.SendMessage(MongoRequestMessage message, WriteConcern writeConcern, String databaseName) 2013-12-24 06:06:35: at MongoDB.Driver.Internal.MongoConnection.RunCommand(String databaseName, QueryFlags queryFlags, CommandDocument command, Boolean throwOnError) 2013-12-24 06:06:35: at MongoDB.Driver.MongoServerInstance.Ping(MongoConnection connection) 2013-12-24 06:06:35: at MongoDB.Driver.MongoServerInstance.Connect() 2013-12-24 06:06:35: at 
MongoDB.Driver.Internal.DirectMongoServerProxy.Connect(TimeSpan timeout, ReadPreference readPreference) 2013-12-24 06:06:35: MongoConnectionException -- Unable to connect to server SFS-File:27017: No connection could be made because the target machine actively refused it 192.168.94.101:27017. 2013-12-24 06:06:35: Exception.Data: ( System.Collections.DictionaryEntry ) 2013-12-24 06:06:35: Exception.TargetSite: Void Connect(System.TimeSpan, MongoDB.Driver.ReadPreference) 2013-12-24 06:06:35: Exception.Source: MongoDB.Driver 2013-12-24 06:06:35: Exception.StackTrace: 2013-12-24 06:06:35: at MongoDB.Driver.Internal.DirectMongoServerProxy.Connect(TimeSpan timeout, ReadPreference readPreference) 2013-12-24 06:06:35: at MongoDB.Driver.Internal.DirectMongoServerProxy.ChooseServerInstance(ReadPreference readPreference) 2013-12-24 06:06:35: at MongoDB.Driver.MongoServer.AcquireConnection(MongoDatabase database, ReadPreference readPreference) 2013-12-24 06:06:35: at MongoDB.Driver.MongoCursorEnumerator`1.AcquireConnection() 2013-12-24 06:06:35: at MongoDB.Driver.MongoCursorEnumerator`1.GetFirst() 2013-12-24 06:06:35: at MongoDB.Driver.MongoCursorEnumerator`1.MoveNext() 2013-12-24 06:06:35: at System.Linq.Enumerable.FirstOrDefault[TSource](IEnumerable`1 source) 2013-12-24 06:06:35: at Deadline.StorageDB.MongoDB.MongoJobStorage.GetJobReports(String jobID) 2013-12-24 06:06:35: DatabaseConnectionException -- An error occurred while trying to connect to the Database (SFS-File:27017). It is possible that the Mongo Database server is incorrectly configured, currently offline, blocked by a firewall, or experiencing network issues. 2013-12-24 06:06:35: Full error: Unable to connect to server SFS-File:27017: No connection could be made because the target machine actively refused it 192.168.94.101:27017. 
2013-12-24 06:06:35: Exception.Data: ( ) 2013-12-24 06:06:35: Exception.TargetSite: Void HandleException(MongoDB.Driver.MongoServer, System.Exception) 2013-12-24 06:06:35: Exception.Source: deadline 2013-12-24 06:06:35: Exception.StackTrace: 2013-12-24 06:06:35: at Deadline.StorageDB.MongoDB.MongoDBUtils.HandleException(MongoServer server, Exception ex) 2013-12-24 06:06:35: at Deadline.StorageDB.MongoDB.MongoJobStorage.GetJobReports(String jobID) 2013-12-24 06:06:35: at Deadline.Slaves.SlaveSchedulerThread.ShouldReportTaskFailed(Job job, Task task) 2013-12-24 06:06:35: <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< 2013-12-24 06:06:38: Scheduler Thread - Exception occurred while trying to requeue task. 2013-12-24 06:06:38: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 2013-12-24 06:06:38: Exception Details 2013-12-24 06:06:38: SocketException -- No connection could be made because the target machine actively refused it 192.168.94.101:27017 2013-12-24 06:06:38: SocketException.ErrorCode: 10061 (No connection could be made because the target machine actively refused it) 2013-12-24 06:06:38: SocketException.SocketErrorCode: ConnectionRefused (10061) 2013-12-24 06:06:38: Win32Exception.NativeErrorCode: 10061 2013-12-24 06:06:38: Exception.Data: ( ) 2013-12-24 06:06:38: Exception.TargetSite: Void DoConnect(System.Net.EndPoint, System.Net.SocketAddress) 2013-12-24 06:06:38: Exception.Source: System 2013-12-24 06:06:38: Exception.StackTrace: 2013-12-24 06:06:38: at System.Net.Sockets.Socket.DoConnect(EndPoint endPointSnapshot, SocketAddress socketAddress) 2013-12-24 06:06:38: at System.Net.Sockets.Socket.Connect(EndPoint remoteEP) 2013-12-24 06:06:38: at System.Net.Sockets.TcpClient.Connect(IPEndPoint remoteEP) 2013-12-24 06:06:38: at MongoDB.Driver.Internal.MongoConnection.Open() 2013-12-24 06:06:38: at MongoDB.Driver.Internal.MongoConnection.SendMessage(MongoRequestMessage message, WriteConcern writeConcern, String databaseName) 
2013-12-24 06:06:38: at MongoDB.Driver.Internal.MongoConnection.RunCommand(String databaseName, QueryFlags queryFlags, CommandDocument command, Boolean throwOnError) 2013-12-24 06:06:38: at MongoDB.Driver.MongoServerInstance.Ping(MongoConnection connection) 2013-12-24 06:06:38: at MongoDB.Driver.MongoServerInstance.Connect() 2013-12-24 06:06:38: at MongoDB.Driver.Internal.DirectMongoServerProxy.Connect(TimeSpan timeout, ReadPreference readPreference) 2013-12-24 06:06:38: MongoConnectionException -- Unable to connect to server SFS-File:27017: No connection could be made because the target machine actively refused it 192.168.94.101:27017. 2013-12-24 06:06:38: Exception.Data: ( System.Collections.DictionaryEntry ) 2013-12-24 06:06:38: Exception.TargetSite: Void Connect(System.TimeSpan, MongoDB.Driver.ReadPreference) 2013-12-24 06:06:38: Exception.Source: MongoDB.Driver 2013-12-24 06:06:38: Exception.StackTrace: 2013-12-24 06:06:38: at MongoDB.Driver.Internal.DirectMongoServerProxy.Connect(TimeSpan timeout, ReadPreference readPreference) 2013-12-24 06:06:38: at MongoDB.Driver.Internal.DirectMongoServerProxy.ChooseServerInstance(ReadPreference readPreference) 2013-12-24 06:06:38: at MongoDB.Driver.MongoServer.AcquireConnection(MongoDatabase database, ReadPreference readPreference) 2013-12-24 06:06:38: at MongoDB.Driver.MongoCursorEnumerator`1.AcquireConnection() 2013-12-24 06:06:38: at MongoDB.Driver.MongoCursorEnumerator`1.GetFirst() 2013-12-24 06:06:38: at MongoDB.Driver.MongoCursorEnumerator`1.MoveNext() 2013-12-24 06:06:38: at System.Linq.Enumerable.FirstOrDefault[TSource](IEnumerable`1 source) 2013-12-24 06:06:38: at MongoDB.Driver.MongoCollection.RunCommandAs(Type commandResultType, IMongoCommand command) 2013-12-24 06:06:38: at MongoDB.Driver.MongoCollection.RunCommandAs[TCommandResult](IMongoCommand command) 2013-12-24 06:06:38: at MongoDB.Driver.MongoCollection.FindAndModify(IMongoQuery query, IMongoSortBy sortBy, IMongoUpdate update, IMongoFields fields, 
Boolean returnNew, Boolean upsert) 2013-12-24 06:06:38: at MongoDB.Driver.MongoCollection.FindAndModify(IMongoQuery query, IMongoSortBy sortBy, IMongoUpdate update, Boolean returnNew, Boolean upsert) 2013-12-24 06:06:38: at MongoDB.Driver.MongoCollection.FindAndModify(IMongoQuery query, IMongoSortBy sortBy, IMongoUpdate update, Boolean returnNew) 2013-12-24 06:06:38: at Deadline.StorageDB.MongoDB.MongoJobStorage.ChangeTaskStatus(String jobID, Task task, TaskStatus newStatus, String slaveName) 2013-12-24 06:06:38: DatabaseConnectionException -- An error occurred while trying to connect to the Database (SFS-File:27017). It is possible that the Mongo Database server is incorrectly configured, currently offline, blocked by a firewall, or experiencing network issues. 2013-12-24 06:06:38: Full error: Unable to connect to server SFS-File:27017: No connection could be made because the target machine actively refused it 192.168.94.101:27017. 2013-12-24 06:06:38: Exception.Data: ( ) 2013-12-24 06:06:38: Exception.TargetSite: Void HandleException(MongoDB.Driver.MongoServer, System.Exception) 2013-12-24 06:06:38: Exception.Source: deadline 2013-12-24 06:06:38: Exception.StackTrace: 2013-12-24 06:06:38: at Deadline.StorageDB.MongoDB.MongoDBUtils.HandleException(MongoServer server, Exception ex) 2013-12-24 06:06:38: at Deadline.StorageDB.MongoDB.MongoJobStorage.ChangeTaskStatus(String jobID, Task task, TaskStatus newStatus, String slaveName) 2013-12-24 06:06:38: at Deadline.Controllers.DataController.RequeueTask(Job job, Task task) 2013-12-24 06:06:38: at Deadline.Slaves.SlaveSchedulerThread.RequeueTask(Job currentJob, Task task) 2013-12-24 06:06:38: <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< 2013-12-24 06:06:38: Scheduler Thread - It is likely that the Slave cannot connect to the Repository, which means that the network may be down or the Repository machine is offline. 
2013-12-24 06:06:38: Scheduler Thread - The Slave cannot continue until this operation has completed successfully. 2013-12-24 06:06:38: Scheduler Thread - Waiting 20 seconds before retrying... 2013-12-24 06:06:43: Slave - An error occurred while updating the slave's info: An error occurred while trying to connect to the Database (SFS-File:27017). It is possible that the Mongo Database server is incorrectly configured, currently offline, blocked by a firewall, or experiencing network issues. 2013-12-24 06:06:43: Full error: Unable to connect to server SFS-File:27017: No connection could be made because the target machine actively refused it 192.168.94.101:27017. (FranticX.Database.DatabaseConnectionException) 2013-12-24 06:08:22: Slave - An error occurred while updating the slave's info: An error occurred while trying to connect to the Database (SFS-File:27017). It is possible that the Mongo Database server is incorrectly configured, currently offline, blocked by a firewall, or experiencing network issues. 2013-12-24 06:08:22: Full error: Unable to connect to server SFS-File:27017: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond 192.168.94.101:27017. (FranticX.Database.DatabaseConnectionException) 2013-12-24 06:08:43: Scheduler Thread - Exception occurred while trying to requeue task. 
2013-12-24 06:08:43: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 2013-12-24 06:08:43: Exception Details 2013-12-24 06:08:43: SocketException -- A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond 192.168.94.101:27017 2013-12-24 06:08:43: SocketException.ErrorCode: 10060 (A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond) 2013-12-24 06:08:43: SocketException.SocketErrorCode: TimedOut (10060) 2013-12-24 06:08:43: Win32Exception.NativeErrorCode: 10060 2013-12-24 06:08:43: Exception.Data: ( ) 2013-12-24 06:08:43: Exception.TargetSite: Void DoConnect(System.Net.EndPoint, System.Net.SocketAddress) 2013-12-24 06:08:43: Exception.Source: System 2013-12-24 06:08:43: Exception.StackTrace: 2013-12-24 06:08:43: at System.Net.Sockets.Socket.DoConnect(EndPoint endPointSnapshot, SocketAddress socketAddress) 2013-12-24 06:08:43: at System.Net.Sockets.Socket.Connect(EndPoint remoteEP) 2013-12-24 06:08:43: at System.Net.Sockets.TcpClient.Connect(IPEndPoint remoteEP) 2013-12-24 06:08:43: at MongoDB.Driver.Internal.MongoConnection.Open() 2013-12-24 06:08:43: at MongoDB.Driver.Internal.MongoConnection.SendMessage(MongoRequestMessage message, WriteConcern writeConcern, String databaseName) 2013-12-24 06:08:43: at MongoDB.Driver.Internal.MongoConnection.RunCommand(String databaseName, QueryFlags queryFlags, CommandDocument command, Boolean throwOnError) 2013-12-24 06:08:43: at MongoDB.Driver.MongoServerInstance.Ping(MongoConnection connection) 2013-12-24 06:08:43: at MongoDB.Driver.MongoServerInstance.Connect() 2013-12-24 06:08:43: at MongoDB.Driver.Internal.DirectMongoServerProxy.Connect(TimeSpan timeout, ReadPreference readPreference) 2013-12-24 06:08:43: MongoConnectionException -- Unable to connect to 
server SFS-File:27017: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond 192.168.94.101:27017. 2013-12-24 06:08:43: Exception.Data: ( System.Collections.DictionaryEntry ) 2013-12-24 06:08:43: Exception.TargetSite: Void Connect(System.TimeSpan, MongoDB.Driver.ReadPreference) 2013-12-24 06:08:43: Exception.Source: MongoDB.Driver 2013-12-24 06:08:43: Exception.StackTrace: 2013-12-24 06:08:43: at MongoDB.Driver.Internal.DirectMongoServerProxy.Connect(TimeSpan timeout, ReadPreference readPreference) 2013-12-24 06:08:43: at MongoDB.Driver.Internal.DirectMongoServerProxy.ChooseServerInstance(ReadPreference readPreference) 2013-12-24 06:08:43: at MongoDB.Driver.MongoServer.AcquireConnection(MongoDatabase database, ReadPreference readPreference) 2013-12-24 06:08:43: at MongoDB.Driver.MongoCursorEnumerator`1.AcquireConnection() 2013-12-24 06:08:43: at MongoDB.Driver.MongoCursorEnumerator`1.GetFirst() 2013-12-24 06:08:43: at MongoDB.Driver.MongoCursorEnumerator`1.MoveNext() 2013-12-24 06:08:43: at System.Linq.Enumerable.FirstOrDefault[TSource](IEnumerable`1 source) 2013-12-24 06:08:43: at MongoDB.Driver.MongoCollection.RunCommandAs(Type commandResultType, IMongoCommand command) 2013-12-24 06:08:43: at MongoDB.Driver.MongoCollection.RunCommandAs[TCommandResult](IMongoCommand command) 2013-12-24 06:08:43: at MongoDB.Driver.MongoCollection.FindAndModify(IMongoQuery query, IMongoSortBy sortBy, IMongoUpdate update, IMongoFields fields, Boolean returnNew, Boolean upsert) 2013-12-24 06:08:43: at MongoDB.Driver.MongoCollection.FindAndModify(IMongoQuery query, IMongoSortBy sortBy, IMongoUpdate update, Boolean returnNew, Boolean upsert) 2013-12-24 06:08:43: at MongoDB.Driver.MongoCollection.FindAndModify(IMongoQuery query, IMongoSortBy sortBy, IMongoUpdate update, Boolean returnNew) 2013-12-24 06:08:43: at 
Deadline.StorageDB.MongoDB.MongoJobStorage.ChangeTaskStatus(String jobID, Task task, TaskStatus newStatus, String slaveName) 2013-12-24 06:08:43: DatabaseConnectionException -- An error occurred while trying to connect to the Database (SFS-File:27017). It is possible that the Mongo Database server is incorrectly configured, currently offline, blocked by a firewall, or experiencing network issues. 2013-12-24 06:08:43: Full error: Unable to connect to server SFS-File:27017: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond 192.168.94.101:27017. 2013-12-24 06:08:43: Exception.Data: ( ) 2013-12-24 06:08:43: Exception.TargetSite: Void HandleException(MongoDB.Driver.MongoServer, System.Exception) 2013-12-24 06:08:43: Exception.Source: deadline 2013-12-24 06:08:43: Exception.StackTrace: 2013-12-24 06:08:43: at Deadline.StorageDB.MongoDB.MongoDBUtils.HandleException(MongoServer server, Exception ex) 2013-12-24 06:08:43: at Deadline.StorageDB.MongoDB.MongoJobStorage.ChangeTaskStatus(String jobID, Task task, TaskStatus newStatus, String slaveName) 2013-12-24 06:08:43: at Deadline.Controllers.DataController.RequeueTask(Job job, Task task) 2013-12-24 06:08:43: at Deadline.Slaves.SlaveSchedulerThread.RequeueTask(Job currentJob, Task task) 2013-12-24 06:08:43: <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< 2013-12-24 06:08:43: Scheduler Thread - It is likely that the Slave cannot connect to the Repository, which means that the network may be down or the Repository machine is offline. 2013-12-24 06:08:43: Scheduler Thread - The Slave cannot continue until this operation has completed successfully. 2013-12-24 06:08:43: Scheduler Thread - Waiting 20 seconds before retrying... 2013-12-24 06:09:24: Slave - An error occurred while updating the slave's info: An error occurred while trying to connect to the Database (SFS-File:27017). 
It is possible that the Mongo Database server is incorrectly configured, currently offline, blocked by a firewall, or experiencing network issues. 2013-12-24 06:09:24: Full error: Unable to connect to server SFS-File:27017: No connection could be made because the target machine actively refused it 192.168.94.101:27017. (FranticX.Database.DatabaseConnectionException) 2013-12-24 06:09:26: Scheduler Thread - Exception occurred while trying to requeue task. 2013-12-24 06:09:26: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 2013-12-24 06:09:26: Exception Details 2013-12-24 06:09:26: SocketException -- No connection could be made because the target machine actively refused it 192.168.94.101:27017 2013-12-24 06:09:26: SocketException.ErrorCode: 10061 (No connection could be made because the target machine actively refused it) 2013-12-24 06:09:26: SocketException.SocketErrorCode: ConnectionRefused (10061) 2013-12-24 06:09:26: Win32Exception.NativeErrorCode: 10061 2013-12-24 06:09:26: Exception.Data: ( ) 2013-12-24 06:09:26: Exception.TargetSite: Void DoConnect(System.Net.EndPoint, System.Net.SocketAddress) 2013-12-24 06:09:26: Exception.Source: System 2013-12-24 06:09:26: Exception.StackTrace: 2013-12-24 06:09:26: at System.Net.Sockets.Socket.DoConnect(EndPoint endPointSnapshot, SocketAddress socketAddress) 2013-12-24 06:09:26: at System.Net.Sockets.Socket.Connect(EndPoint remoteEP) 2013-12-24 06:09:26: at System.Net.Sockets.TcpClient.Connect(IPEndPoint remoteEP) 2013-12-24 06:09:26: at MongoDB.Driver.Internal.MongoConnection.Open() 2013-12-24 06:09:26: at MongoDB.Driver.Internal.MongoConnection.SendMessage(MongoRequestMessage message, WriteConcern writeConcern, String databaseName) 2013-12-24 06:09:26: at MongoDB.Driver.Internal.MongoConnection.RunCommand(String databaseName, QueryFlags queryFlags, CommandDocument command, Boolean throwOnError) 2013-12-24 06:09:26: at MongoDB.Driver.MongoServerInstance.Ping(MongoConnection connection) 2013-12-24 
06:09:26: at MongoDB.Driver.MongoServerInstance.Connect() 2013-12-24 06:09:26: at MongoDB.Driver.Internal.DirectMongoServerProxy.Connect(TimeSpan timeout, ReadPreference readPreference) 2013-12-24 06:09:26: MongoConnectionException -- Unable to connect to server SFS-File:27017: No connection could be made because the target machine actively refused it 192.168.94.101:27017. 2013-12-24 06:09:26: Exception.Data: ( System.Collections.DictionaryEntry ) 2013-12-24 06:09:26: Exception.TargetSite: Void Connect(System.TimeSpan, MongoDB.Driver.ReadPreference) 2013-12-24 06:09:26: Exception.Source: MongoDB.Driver 2013-12-24 06:09:26: Exception.StackTrace: 2013-12-24 06:09:26: at MongoDB.Driver.Internal.DirectMongoServerProxy.Connect(TimeSpan timeout, ReadPreference readPreference) 2013-12-24 06:09:26: at MongoDB.Driver.Internal.DirectMongoServerProxy.ChooseServerInstance(ReadPreference readPreference) 2013-12-24 06:09:26: at MongoDB.Driver.MongoServer.AcquireConnection(MongoDatabase database, ReadPreference readPreference) 2013-12-24 06:09:26: at MongoDB.Driver.MongoCursorEnumerator`1.AcquireConnection() 2013-12-24 06:09:26: at MongoDB.Driver.MongoCursorEnumerator`1.GetFirst() 2013-12-24 06:09:26: at MongoDB.Driver.MongoCursorEnumerator`1.MoveNext() 2013-12-24 06:09:26: at System.Linq.Enumerable.FirstOrDefault[TSource](IEnumerable`1 source) 2013-12-24 06:09:26: at MongoDB.Driver.MongoCollection.RunCommandAs(Type commandResultType, IMongoCommand command) 2013-12-24 06:09:26: at MongoDB.Driver.MongoCollection.RunCommandAs[TCommandResult](IMongoCommand command) 2013-12-24 06:09:26: at MongoDB.Driver.MongoCollection.FindAndModify(IMongoQuery query, IMongoSortBy sortBy, IMongoUpdate update, IMongoFields fields, Boolean returnNew, Boolean upsert) 2013-12-24 06:09:26: at MongoDB.Driver.MongoCollection.FindAndModify(IMongoQuery query, IMongoSortBy sortBy, IMongoUpdate update, Boolean returnNew, Boolean upsert) 2013-12-24 06:09:26: at 
MongoDB.Driver.MongoCollection.FindAndModify(IMongoQuery query, IMongoSortBy sortBy, IMongoUpdate update, Boolean returnNew) 2013-12-24 06:09:26: at Deadline.StorageDB.MongoDB.MongoJobStorage.ChangeTaskStatus(String jobID, Task task, TaskStatus newStatus, String slaveName) 2013-12-24 06:09:26: DatabaseConnectionException -- An error occurred while trying to connect to the Database (SFS-File:27017). It is possible that the Mongo Database server is incorrectly configured, currently offline, blocked by a firewall, or experiencing network issues. 2013-12-24 06:09:26: Full error: Unable to connect to server SFS-File:27017: No connection could be made because the target machine actively refused it 192.168.94.101:27017. 2013-12-24 06:09:26: Exception.Data: ( ) 2013-12-24 06:09:26: Exception.TargetSite: Void HandleException(MongoDB.Driver.MongoServer, System.Exception) 2013-12-24 06:09:26: Exception.Source: deadline 2013-12-24 06:09:26: Exception.StackTrace: 2013-12-24 06:09:26: at Deadline.StorageDB.MongoDB.MongoDBUtils.HandleException(MongoServer server, Exception ex) 2013-12-24 06:09:26: at Deadline.StorageDB.MongoDB.MongoJobStorage.ChangeTaskStatus(String jobID, Task task, TaskStatus newStatus, String slaveName) 2013-12-24 06:09:26: at Deadline.Controllers.DataController.RequeueTask(Job job, Task task) 2013-12-24 06:09:26: at Deadline.Slaves.SlaveSchedulerThread.RequeueTask(Job currentJob, Task task) 2013-12-24 06:09:26: <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< 2013-12-24 06:09:26: Scheduler Thread - It is likely that the Slave cannot connect to the Repository, which means that the network may be down or the Repository machine is offline. 2013-12-24 06:09:26: Scheduler Thread - The Slave cannot continue until this operation has completed successfully. 2013-12-24 06:09:26: Scheduler Thread - Waiting 20 seconds before retrying... 
2013-12-24 06:09:39: Slave - An error occurred while updating the slave's info: An error occurred while trying to connect to the Database (SFS-File:27017). It is possible that the Mongo Database server is incorrectly configured, currently offline, blocked by a firewall, or experiencing network issues. 2013-12-24 06:09:39: Full error: Unable to connect to server SFS-File:27017: No connection could be made because the target machine actively refused it 192.168.94.101:27017. (FranticX.Database.DatabaseConnectionException) 2013-12-24 06:09:49: Scheduler Thread - Exception occurred while trying to requeue task.

Thanks for the log. It appears that Max threw an error and Deadline caught it, but of course it’s unable to requeue the task because it can’t connect to the database. So eventually when the db came back online, it should have requeued the task and then started functioning normally again. Was the slave left running during the repair to confirm this?

Hi,
FYI.
Although I’m sure there are other issues here that you guys are already discussing regarding the connection/disconnection of the DB, I’d just add two things. First, Deadline v6.1 doesn’t yet support VRay v3.0. Second, I have seen the second line of the VRay log excerpt below before, when a user accidentally left the “region” checkbox enabled in the VRay VFB.

2013-12-24 06:06:31: 2013/12/24 02:49:57 INF: [02600] [03792] SYSTEM: Production renderer is changed to V-Ray Adv 3.05.03. Previous messages are cleared.
2013-12-24 06:06:31: 2013/12/24 06:06:25 ERR: [02600] [03096] [V-Ray] [VFBCore::SetRegion] Error writing render region to raw image file. [RIFile:] Error writing channel buffer. [RIFile::WTChan] Error writing tag header. [RIFile::WriteTagHeader] Error writing to stream (64).

Just thought I would mention these 2 issues in case they are the ‘origin’ of the issue.
Mike

I think the Max error was triggered by the DB disconnection, though, since it affected all of our running nodes. Repairing the DB didn’t revive the nodes, and I had to restart the slaves to resume rendering.

Yep, just mentioning the VRay log messages as they caught my eye.

On a separate but related note, do you guys have any plans to upgrade your MongoDB machine to 64-bit this year to resolve your file limit restriction?

If this really isn’t an option, you might want to consider the MongoDB monitoring cloud service (MMS), which could help you keep track of when the DB is approaching a ‘breakable’ size and proactively alert you. BTW: MMS is free: mms.mongodb.com/
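If a hosted service isn't practical, a rough local check is also possible. The sketch below is only an illustration in Python: it sums the on-disk size of the files in the database's data directory and flags when that passes a chosen fraction of the limit. The 2 GB figure comes from the thread, the 80% warning threshold and the function names are assumptions, and on-disk file size is only a rough proxy for what the 32-bit limit actually applies to.

```python
import os

# Assumption: the 32-bit build tops out at roughly 2 GB, as mentioned in the thread.
LIMIT_BYTES = 2 * 1024 ** 3


def data_dir_size(path):
    """Return the total size in bytes of all files under `path`."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                # A file may disappear or be locked while we scan; skip it.
                pass
    return total


def check_size(path, limit=LIMIT_BYTES, warn_fraction=0.8):
    """Return (size, should_warn) for the data directory at `path`."""
    size = data_dir_size(path)
    return size, size >= limit * warn_fraction
```

Something like `check_size(r"C:\path\to\mongo\data")` (a placeholder path) could run from a scheduled task and send an email when the second value is `True`.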

The problem is that it’s running on our file-server so taking it down and upgrading to x64 Windows Server would be a complete shop-stopping operation for a day at least. Our plan was to just do it as a seamless ‘failover’ to a brand new machine but the money just isn’t there right now for the upgrade. It’s definitely given me ammunition though to poke and prod for a new system.

Hmmm, I’m actually looking at it right now and it looks like Windows Server lets me execute a DOS command in the event that a service fails. I could just enter the --repair command and have it auto-restart the service.

The MongoDB --repair command is a blocking operation, and as it is a DB admin command, 10gen recommends that it is always run during a ‘maintenance’ window, since it terminates all access to the DB while it runs. Personally, in a production environment, I wouldn’t automate this admin-only command on a DB. To improve the redundancy of a MongoDB, the ideal approach is Replica Sets:
docs.mongodb.org/manual/replication/
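For a rough idea of what a replica set involves, here is a small Python sketch that builds the configuration document (a set name in `_id` plus a `members` list of `_id`/`host` pairs) and a connection string naming the set. The host names and the set name `rs0` are made up for illustration, and whether a given Deadline version accepts a replica set connection is not something this thread confirms.

```python
import json


def replica_set_config(name, hosts):
    """Build a replica set configuration document for the given hosts."""
    return {
        "_id": name,
        "members": [{"_id": index, "host": host} for index, host in enumerate(hosts)],
    }


def connection_string(name, hosts):
    """Build a MongoDB connection string that names the replica set."""
    return "mongodb://" + ",".join(hosts) + "/?replicaSet=" + name


# Hypothetical host names; 27017 is the port seen in the slave log earlier in the thread.
hosts = ["db-a:27017", "db-b:27017", "db-c:27017"]
print(json.dumps(replica_set_config("rs0", hosts), indent=2))
print(connection_string("rs0", hosts))
```

With three members, one machine can go down (or be taken offline for a repair) while the others keep serving the database.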
