AWS Thinkbox Discussion Forums

monitor showing jobs that don't exist

I noticed that some jobs have been timing out when querying tasks in a monitor that's been open for several days now:

If I open another monitor on another machine, those jobs don't actually show up any more, and they probably got cleaned up a couple of days ago. Using 7.0.0.36.

Thanks for reporting this! Any errors in the monitor log? I’m guessing that the jobs weren’t properly removed from the batched job data…

Some:

2014-10-08 11:09:42: Deadline Monitor 7.0 [v7.0.0.36 R  (a009fe449)]
2014-10-08 11:09:45: Time to initialize: 2.480 s
2014-10-08 11:10:10: Auto Configuration: No auto configuration for Repository Path could be detected, using local configuration
2014-10-08 11:10:12: Auto Configuration: Picking configuration based on: lapro3067 / 172.18.8.91
2014-10-08 11:10:12: Auto Configuration: No auto configuration could be detected, using local configuration
2014-10-08 11:10:12: Time to connect to Repository: 1.732 s
2014-10-08 11:10:12: Time to check user account: 16.000 ms
2014-10-08 11:10:12: Time to purge old logs and temp files: 47.000 ms
2014-10-08 11:10:13: Time to synchronize plugin icons: 904.000 ms
2014-10-08 11:10:14: Time to initialize main window: 1.404 s
2014-10-08 11:10:15: Main Window shown
2014-10-08 11:10:15: Time to show main window: 109.000 ms
2014-10-13 09:54:28: Error occurred while updating slave cache: Attempted to read past the end of the stream. (System.IO.EndOfStreamException)
2014-10-13 09:58:16: Error occurred while updating job cache: Unable to connect to a member of the replica set matching the read preference Primary (MongoDB.Driver.MongoConnectionException)
2014-10-13 09:58:18: Error occurred while updating slave reports: An error occurred while trying to connect to the Database (deadline01.scanlinevfxla.com:27017,deadline.scanlinevfxla.com:27017,deadline03.scanlinevfxla.com:27017). It is possible that the Mongo Database server is incorrectly configured, currently offline, blocked by a firewall, or experiencing network issues.
2014-10-13 09:58:18: Full error: Unable to connect to a member of the replica set matching the read preference Primary (FranticX.Database.DatabaseConnectionException)
2014-10-13 09:58:18: Error occurred while updating limit group cache: Unable to connect to a member of the replica set matching the read preference Primary (MongoDB.Driver.MongoConnectionException)
2014-10-13 09:58:19: Error occurred while updating balancer cache: Unable to connect to a member of the replica set matching the read preference Primary (MongoDB.Driver.MongoConnectionException)
2014-10-13 18:57:15: Error occurred while updating slave cache: Timeout waiting for a MongoConnection. (System.TimeoutException)
2014-10-13 18:57:34: Error occurred while updating balancer cache: Timeout waiting for a MongoConnection. (System.TimeoutException)
2014-10-13 18:57:36: Error occurred while updating limit group cache: Timeout waiting for a MongoConnection. (System.TimeoutException)
2014-10-13 18:57:37: Error occurred while updating job cache: Timeout waiting for a MongoConnection. (System.TimeoutException)
2014-10-13 18:57:58: Error occurred while updating task cache: An unexpected error occurred while interacting with the database (deadline01.scanlinevfxla.com:27017,deadline.scanlinevfxla.com:27017,deadline03.scanlinevfxla.com:27017):
2014-10-13 18:57:58: Timeout waiting for a MongoConnection. (System.TimeoutException)
2014-10-13 18:57:58:    at MongoDB.Driver.Internal.MongoConnectionPool.AcquireConnection(AcquireConnectionOptions options)
2014-10-13 18:57:58:    at MongoDB.Driver.MongoServerInstance.AcquireConnection()
2014-10-13 18:57:58:    at MongoDB.Driver.MongoServer.AcquireConnection(ReadPreference readPreference)
2014-10-13 18:57:58:    at MongoDB.Driver.MongoCursor`1.MongoCursorConnectionProvider.AcquireConnection()
2014-10-13 18:57:58:    at MongoDB.Driver.Operations.QueryOperation`1.GetFirstBatch(IConnectionProvider connectionProvider)
2014-10-13 18:57:58:    at MongoDB.Driver.Operations.QueryOperation`1.Execute(IConnectionProvider connectionProvider)
2014-10-13 18:57:58:    at Deadline.StorageDB.MongoDB.MongoJobStorage.Internal_GetJobTasks(String jobID, Boolean invalidateCache, String[] fields) (FranticX.Database.DatabaseConnectionException)
2014-10-13 18:58:06: Error occurred while updating pulse cache: Timeout waiting for a MongoConnection. (System.TimeoutException)
2014-10-13 18:58:31: Traceback (most recent call last):
2014-10-13 18:58:31:   File "DeadlineMonitor\UI\Controls\JobListControl.py", line 510, in calculateUsers
2014-10-13 18:58:31: DatabaseConnectionException: An unexpected error occurred while interacting with the database (deadline01.scanlinevfxla.com:27017,deadline.scanlinevfxla.com:27017,deadline03.scanlinevfxla.com:27017):
2014-10-13 18:58:31: Timeout waiting for a MongoConnection. (System.TimeoutException)
2014-10-13 18:58:31:    at MongoDB.Driver.Internal.MongoConnectionPool.AcquireConnection(AcquireConnectionOptions options)
2014-10-13 18:58:31:    at MongoDB.Driver.MongoServerInstance.AcquireConnection()
2014-10-13 18:58:31:    at MongoDB.Driver.MongoServer.AcquireConnection(ReadPreference readPreference)
2014-10-13 18:58:31:    at MongoDB.Driver.MongoCursor`1.MongoCursorConnectionProvider.AcquireConnection()
2014-10-13 18:58:31:    at MongoDB.Driver.Operations.QueryOperation`1.GetFirstBatch(IConnectionProvider connectionProvider)
2014-10-13 18:58:31:    at MongoDB.Driver.Operations.QueryOperation`1.Execute(IConnectionProvider connectionProvider)
2014-10-13 18:58:31:    at Deadline.StorageDB.MongoDB.MongoUserStorage.GetUserNames(Boolean invalidateCache)
2014-10-13 18:58:31:    at Deadline.StorageDB.MongoDB.MongoDBUtils.HandleException(MongoServer server, Exception ex)
2014-10-13 18:58:31:    at Deadline.StorageDB.MongoDB.MongoUserStorage.GetUserNames(Boolean invalidateCache)
2014-10-13 18:58:37: Error occurred while reloading network settings: An unexpected error occurred while interacting with the database (deadline01.scanlinevfxla.com:27017,deadline.scanlinevfxla.com:27017,deadline03.scanlinevfxla.com:27017):
2014-10-13 18:58:37: Timeout waiting for a MongoConnection. (System.TimeoutException)
2014-10-13 18:58:37:    at MongoDB.Driver.Internal.MongoConnectionPool.AcquireConnection(AcquireConnectionOptions options)
2014-10-13 18:58:37:    at MongoDB.Driver.MongoServerInstance.AcquireConnection()
2014-10-13 18:58:37:    at MongoDB.Driver.MongoServer.AcquireConnection(ReadPreference readPreference)
2014-10-13 18:58:37:    at MongoDB.Driver.MongoCursor`1.MongoCursorConnectionProvider.AcquireConnection()
2014-10-13 18:58:37:    at MongoDB.Driver.Operations.QueryOperation`1.GetFirstBatch(IConnectionProvider connectionProvider)
2014-10-13 18:58:37:    at MongoDB.Driver.Operations.QueryOperation`1.Execute(IConnectionProvider connectionProvider)
2014-10-13 18:58:37:    at System.Linq.Enumerable.FirstOrDefault[TSource](IEnumerable`1 source)
2014-10-13 18:58:37:    at MongoDB.Driver.MongoCollection.FindOneAs[TDocument](FindOneArgs args)
2014-10-13 18:58:37:    at Deadline.StorageDB.MongoDB.MongoSettingsStorage.GetNetworkSettings(Boolean invalidateCache) (FranticX.Database.DatabaseConnectionException)

The actual cleanup of the jobs happened at:

2014/10/12 19:13:21 root deadline03.scanlinevfxla.com (deadline03.scanlinevfxla.com\root): Archived completed job '[TEST] TST_000_0000_v0014_cde_deadline_7_cache_fumefx_dbgDeadlineTest_0 [MAXSCRIPT] ' because Auto Job Cleanup is enabled and this job has been complete for more than 10 days.
2014/10/12 19:17:31 root deadline03.scanlinevfxla.com (deadline03.scanlinevfxla.com\root): Archived completed job '[TEST] TST_000_0000_v0014_cde_deadline_7_images_render3d_dbgDeadlineTest_0 ' because Auto Job Cleanup is enabled and this job has been complete for more than 10 days.
2014/10/12 19:17:31 root deadline03.scanlinevfxla.com (deadline03.scanlinevfxla.com\root): Archived completed job '[TEST] TST_000_0000_v0013_cde_deadline_7_images_render3d_dbgDeadlineTest_0 ' because Auto Job Cleanup is enabled and this job has been complete for more than 10 days.
2014/10/13 09:59:54 root deadline03.scanlinevfxla.com (deadline03.scanlinevfxla.com\root): Archived completed job '[GOLD] TST_000_0000_v0003_lse_test_images_render3d_prefTest_0 ' because Auto Job Cleanup is enabled and this job has been complete for more than 10 days.
2014/10/13 10:02:03 root deadline03.scanlinevfxla.com (deadline03.scanlinevfxla.com\root): Archived completed job '[GOLD] TST_000_0000_v0002_lse_test_images_render3d_prefTest_0 ' because Auto Job Cleanup is enabled and this job has been complete for more than 10 days.

Note that the jobs cleaned up on the 12th are also still shown in the monitor. So it seems the errors reported on the 13th are unrelated to this updating issue.

Only one of these 5 jobs disappeared from the list: [TEST] TST_000_0000_v0013_cde_deadline_7_images_render3d_dbgDeadlineTest_0
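
For anyone who wants to double-check the timing argument, here is a rough Python sketch. It compares the archive timestamps from the history excerpt with the timestamp of the first error line in the monitor log. The timestamps are copied by hand from the logs above; the function name is just something made up for illustration, not part of Deadline.

```python
from datetime import datetime

# Archive times copied by hand from the repository history excerpt above.
ARCHIVED = [
    "2014/10/12 19:13:21",
    "2014/10/12 19:17:31",
    "2014/10/12 19:17:31",
    "2014/10/13 09:59:54",
    "2014/10/13 10:02:03",
]

# Timestamp of the first error line in the monitor log above.
FIRST_MONITOR_ERROR = "2014-10-13 09:54:28"


def archived_before_first_error(archived, first_error):
    """Return the archive timestamps that precede the first logged monitor error.

    Hypothetical helper for illustration only.
    """
    cutoff = datetime.strptime(first_error, "%Y-%m-%d %H:%M:%S")
    return [
        stamp
        for stamp in archived
        if datetime.strptime(stamp, "%Y/%m/%d %H:%M:%S") < cutoff
    ]


early = archived_before_first_error(ARCHIVED, FIRST_MONITOR_ERROR)
print(len(early), "of", len(ARCHIVED), "jobs were archived before the first logged error")
```

The three jobs archived on the 12th were removed before the monitor logged any database error, which is why the connection errors look unrelated.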

cheers
laszlo

Here is the full job listing of what's in our deadline7 queue right now (shown in the ‘bad’ monitor):

And what's actually in the queue (fresh monitor on another machine):

Thanks! Good to know it’s not specific to just batched jobs.

This happens with at least 1 out of 5 jobs. We only have fewer than 20 jobs in the queue at a time, but a visible percentage of them always falls into this category.
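
If it helps with tracking this down, one way to put a number on it would be to diff the job list from the long-running monitor against the list from a fresh one. Below is a minimal Python sketch of that comparison. It assumes both listings have been exported by hand as plain lists of job names; the job names and both helper functions are placeholders made up for illustration, not real jobs or Deadline API calls.

```python
def stale_entries(displayed, actual):
    """Return job names shown in the old monitor but missing from the fresh one."""
    return sorted(set(displayed) - set(actual))


def stale_ratio(displayed, actual):
    """Return the fraction of displayed jobs that no longer exist in the queue."""
    displayed = set(displayed)
    if not displayed:
        return 0.0
    return len(displayed - set(actual)) / len(displayed)


# Hypothetical listings, standing in for the two job lists being compared.
old_monitor = ["job_a", "job_b", "job_c", "job_d", "job_e"]
fresh_monitor = ["job_a", "job_b", "job_c", "job_d"]

print("Stale jobs:", stale_entries(old_monitor, fresh_monitor))
print("Stale ratio:", stale_ratio(old_monitor, fresh_monitor))
```

Running something like this every so often against a monitor that has been open for days would show whether the share of stale entries keeps growing or stays around the same level.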
