We are getting some very odd behavior with beta10 in the Monitor. Machines show up as if they have been rendering for 2 hours, yet their CPU usage is at 0. When I remote into the box, it is in fact rendering something completely different, and has been for only a couple of minutes. The Last Status Update column shows something from an hour and a half ago…
There are also a lot of errors in the Deadline log:
2013-11-07 15:13:12: Traceback (most recent call last):
2013-11-07 15:13:12: File "DeadlineMonitor\UI\DisplayWidgets\JobDependencyGraph.py", line 1369, in createScriptNode
2013-11-07 15:13:12: AttributeError: 'NoneType' object has no attribute 'setElidedTitle'
2013-11-07 15:13:12: Traceback (most recent call last):
2013-11-07 15:13:12: File "DeadlineMonitor\UI\DisplayWidgets\JobDependencyGraph.py", line 1156, in recursiveNodeCreation
2013-11-07 15:13:12: File "DeadlineMonitor\UI\DisplayWidgets\JobDependencyGraph.py", line 1034, in recursiveNodeCreation
2013-11-07 15:13:12: File "DeadlineMonitor\UI\DisplayWidgets\JobDependencyGraph.py", line 1417, in createScriptNode
2013-11-07 15:13:12: AttributeError: 'NoneType' object has no attribute 'boundingRect'
2013-11-07 15:13:12: Traceback (most recent call last):
2013-11-07 15:13:12: File "DeadlineMonitor\UI\DisplayWidgets\JobDependencyGraph.py", line 1369, in createScriptNode
2013-11-07 15:13:12: AttributeError: 'NoneType' object has no attribute 'setElidedTitle'
2013-11-07 15:13:12: Traceback (most recent call last):
2013-11-07 15:13:12: File "DeadlineMonitor\UI\DisplayWidgets\JobDependencyGraph.py", line 1156, in recursiveNodeCreation
2013-11-07 15:13:12: File "DeadlineMonitor\UI\DisplayWidgets\JobDependencyGraph.py", line 1034, in recursiveNodeCreation
2013-11-07 15:13:12: File "DeadlineMonitor\UI\DisplayWidgets\JobDependencyGraph.py", line 1417, in createScriptNode
2013-11-07 15:13:12: AttributeError: 'NoneType' object has no attribute 'boundingRect'
2013-11-07 15:54:32: Error occurred while updating task cache: Index was outside the bounds of the array. (System.IndexOutOfRangeException)
2013-11-07 16:03:31: Error occurred while updating slave reports: An unexpected error occurred while interacting with the database (deadline.scanlinevfxla.com:27017):
2013-11-07 16:03:31: Timeout waiting for a MongoConnection. (FranticX.Database.DocumentException)
2013-11-07 16:03:31: Error occurred while updating job cache: Timeout waiting for a MongoConnection. (System.TimeoutException)
2013-11-07 16:03:32: Error occurred while updating slave cache: Timeout waiting for a MongoConnection. (System.TimeoutException)
2013-11-07 16:03:37: Error occurred while updating limit group cache: Timeout waiting for a MongoConnection. (System.TimeoutException)
2013-11-07 16:03:42: Error occurred while updating Cloud Instances: An unexpected error occurred while interacting with the database (deadline.scanlinevfxla.com:27017):
2013-11-07 16:03:42: Timeout waiting for a MongoConnection. (FranticX.Database.DocumentException)
2013-11-07 16:03:42: at Deadline.StorageDB.MongoDB.MongoDBUtils.HandleException(MongoServer server, Exception ex)
2013-11-07 16:03:42: at Deadline.StorageDB.MongoDB.MongoCloudStorage.GetCloudRegions(Boolean invalidateCache)
2013-11-07 16:03:42: at Deadline.StorageDB.CloudStorage.UpdateData()
2013-11-07 16:28:48: Error occurred while updating slave reports: An unexpected error occurred while interacting with the database (deadline.scanlinevfxla.com:27017):
2013-11-07 16:28:48: Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host. (FranticX.Database.DocumentException)
2013-11-07 16:28:49: Error occurred while updating slave cache: Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host. (System.IO.IOException)
2013-11-07 16:28:50: Error occurred while updating Cloud Instances: An error occurred while trying to connect to the Database (deadline.scanlinevfxla.com:27017). It is possible that the Mongo Database server is incorrectly configured, currently offline, blocked by a firewall, or experiencing network issues.
2013-11-07 16:28:50: Full error: Unable to connect to server deadline.scanlinevfxla.com:27017: No connection could be made because the target machine actively refused it 172.18.2.209:27017. (FranticX.Database.DatabaseConnectionException)
2013-11-07 16:28:50: at Deadline.StorageDB.MongoDB.MongoDBUtils.HandleException(MongoServer server, Exception ex)
2013-11-07 16:28:50: at Deadline.StorageDB.MongoDB.MongoCloudStorage.GetCloudRegions(Boolean invalidateCache)
2013-11-07 16:28:50: at Deadline.StorageDB.CloudStorage.UpdateData()
2013-11-07 16:28:52: Error occurred while updating task cache: An error occurred while trying to connect to the Database (deadline.scanlinevfxla.com:27017). It is possible that the Mongo Database server is incorrectly configured, currently offline, blocked by a firewall, or experiencing network issues.
2013-11-07 16:28:52: Full error: Unable to connect to server deadline.scanlinevfxla.com:27017: No connection could be made because the target machine actively refused it 172.18.2.209:27017. (FranticX.Database.DatabaseConnectionException)
2013-11-07 16:52:57: Connecting to Pulse log
2013-11-07 17:10:24: Error occurred while updating limit group cache: An unexpected error occurred while interacting with the database (deadline.scanlinevfxla.com:27017):
2013-11-07 17:10:24: Server instance deadline.scanlinevfxla.com:27017 is no longer connected. (FranticX.Database.DocumentException)
2013-11-07 17:39:08: Error occurred while updating limit group cache: An unexpected error occurred while interacting with the database (deadline.scanlinevfxla.com:27017):
2013-11-07 17:39:08: Server instance deadline.scanlinevfxla.com:27017 is no longer connected. (FranticX.Database.DocumentException)
2013-11-07 17:57:04: Listener Thread - OnConnect: Listener Socket has been closed.
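For anyone trying to triage a log like the one above, here is a rough Python sketch that tallies the exception types appearing at the end of each line. It is not part of Deadline; the function name `count_exceptions` and the `sample` list are made up for illustration, and the only assumption about the log format is that the exception type is the last thing on the line, in parentheses, as in the lines pasted above.

```python
import re
from collections import Counter

# Matches a trailing "(Some.Namespace.SomeException)" at the end of a log line.
EXCEPTION_AT_END = re.compile(r"\(([\w.]+Exception)\)\s*$")


def count_exceptions(log_lines):
    """Count how often each exception type closes a log line (hypothetical helper)."""
    counts = Counter()
    for line in log_lines:
        match = EXCEPTION_AT_END.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# A few lines copied from the log above, used only as sample input.
sample = [
    "2013-11-07 15:54:32: Error occurred while updating task cache: Index was outside the bounds of the array. (System.IndexOutOfRangeException)",
    "2013-11-07 16:03:31: Timeout waiting for a MongoConnection. (FranticX.Database.DocumentException)",
    "2013-11-07 16:03:31: Error occurred while updating job cache: Timeout waiting for a MongoConnection. (System.TimeoutException)",
    "2013-11-07 16:03:32: Error occurred while updating slave cache: Timeout waiting for a MongoConnection. (System.TimeoutException)",
    "2013-11-07 16:52:57: Connecting to Pulse log",
]

# Print the most frequent exception types first.
for name, n in count_exceptions(sample).most_common():
    print(f"{n:3d}  {name}")
```

To run it against a real log, pass the open file object (or `f.read().splitlines()`) instead of `sample`. Separating the database connection errors from the JobDependencyGraph tracebacks this way may make it easier to see which problem dominates.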
We’ve got the same problem, except that our slaves aren’t rendering something different; they have just stopped rendering.
For example, a slave is rendering a frame that should take around 30 minutes, but it has been rendering for 2 hours, CPU usage is at 0%, progress is stuck at 43%, and when I remote into the slave, Max is just sitting there dead.
Of the 30 slaves currently active on the farm, only 10 are actually rendering; all the others are just sitting there with the CPU at 0%.
Is it possible to roll back to beta 8?
We weren’t having too many problems with that, and at least the renders got done.
Dave
We rolled back to beta9 overnight, but I believe the only files you need to roll back are the Lightning dlx plugins in the plugins/3dsmax folder.
Just uploaded some updated lightning files here that should fix the problem:
viewtopic.php?f=86&t=10646&p=46225#p46225