I have a Mac Mini that serves the repository and also acts as one of the nodes. Occasionally it wedges: the GUI fails due to driver misfires (thanks, Intel!), and I have to power-cycle the repository machine. It has been flaky across memory modules, OS revisions and so on, so the graphics drivers look like the culprit, and I’ve not found a way to restart the entire GUI without killing the machine outright. The other nodes, meanwhile, keep number crunching, but spit out lots of error output such as that below.
When the repository machine comes back up, I can remount the repository under the same path: a quick ‘rm’ removes the stale ‘Phil’ entry left lurking in /Volumes, and then I use ‘connect to file server’ to mount the share again. That seems to work fine, and the nodes write out their frames without issue.
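For anyone wanting to script the ‘rm’ step, here is a rough sketch of what I mean. It assumes the leftover entry is just an empty plain directory; the function name `remove_stale_mountpoint` is made up for illustration, and it deliberately does not try to remount the share, since that part depends on how the share is served.

```python
import os

def remove_stale_mountpoint(path):
    """Remove a leftover mount directory; return True if it was removed.

    Assumption: the stale entry is an empty plain directory. Anything that
    is still a live mount, or that has contents, is left untouched.
    """
    if not os.path.isdir(path):
        return False          # nothing left behind
    if os.path.ismount(path):
        return False          # the share is actually mounted, leave it alone
    if os.listdir(path):
        return False          # has contents, so not a bare stale stub
    os.rmdir(path)            # rmdir only ever removes an empty directory
    return True
```

On a node this would be called as `remove_stale_mountpoint("/Volumes/Phil")` before reconnecting to the file server. Using `os.rmdir` rather than a recursive delete means a mistake cannot wipe a share that is in fact still mounted.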
I am wondering, though, about the requeue behaviour, because it seems that the frames are not marked as completed but are re-queued in Deadline. Is this configurable or avoidable, or am I imagining the requeue action?
Here’s the console output from a node when the repository falls over:
Exception Details
JobDeletedException – job was deleted, jobDirectory: /Volumes/Phil/Applications/DeadlineRepository/jobs/999_050_999_1ae37186/999_050_999_1ae37186.job
JobDeletedException.JobDirectory: /Volumes/Phil/Applications/DeadlineRepository/jobs/999_050_999_1ae37186/999_050_999_1ae37186.job
Exception.Source: deadline
Exception.TargetSite: Boolean RefreshJob(Deadline.Jobs.Job ByRef, Boolean)
Exception.Data: ( )
Exception.StackTrace:
at Deadline.Storage.JobStorage.RefreshJob (Deadline.Jobs.Job& job, Boolean forceRefresh) [0x00000] in :0
at Deadline.Storage.JobStorage.RefreshJob (Deadline.Jobs.Job& job) [0x00000] in :0
at Deadline.Controllers.JobController.RequeueTask (Deadline.Jobs.Job job, Deadline.Jobs.Task task, Boolean refreshJob) [0x00000] in :0
at Deadline.Controllers.DeadlineController.RequeueTask (Deadline.Jobs.Job job, Deadline.Jobs.Task task, Boolean refreshJob) [0x00000] in :0
at Deadline.Slaves.SlaveSchedulerThread.RequeueTask (Deadline.Jobs.Task task) [0x00000] in :0
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
Scheduler Thread - can not continue until this operation is completed successfully
Scheduler Thread - waiting 20 seconds before retrying…
Slave - UnauthorizedAccessException: Failed to update slaveInfo: Access to the path “/Volumes/Phil/Applications” is denied. For more information, see software.primefocusworld.com/sof … or_message.
Scheduler Thread - exception occurred while trying to requeue task.
Exception Details
JobDeletedException – job was deleted, jobDirectory: /Volumes/Phil/Applications/DeadlineRepository/jobs/999_050_999_1ae37186/999_050_999_1ae37186.job
JobDeletedException.JobDirectory: /Volumes/Phil/Applications/DeadlineRepository/jobs/999_050_999_1ae37186/999_050_999_1ae37186.job
Exception.Source: deadline
Exception.TargetSite: Boolean RefreshJob(Deadline.Jobs.Job ByRef, Boolean)
Exception.Data: ( )
Exception.StackTrace:
at Deadline.Storage.JobStorage.RefreshJob (Deadline.Jobs.Job& job, Boolean forceRefresh) [0x00000] in :0
at Deadline.Storage.JobStorage.RefreshJob (Deadline.Jobs.Job& job) [0x00000] in :0
at Deadline.Controllers.JobController.RequeueTask (Deadline.Jobs.Job job, Deadline.Jobs.Task task, Boolean refreshJob) [0x00000] in :0
at Deadline.Controllers.DeadlineController.RequeueTask (Deadline.Jobs.Job job, Deadline.Jobs.Task task, Boolean refreshJob) [0x00000] in :0
at Deadline.Slaves.SlaveSchedulerThread.RequeueTask (Deadline.Jobs.Task task) [0x00000] in :0
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
Scheduler Thread - can not continue until this operation is completed successfully
Scheduler Thread - waiting 20 seconds before retrying…
Slave - UnauthorizedAccessException: Failed to update slaveInfo: Access to the path “/Volumes/Phil/Applications” is denied. For more information, see software.primefocusworld.com/sof … or_message.
Scheduler Thread - exception occurred while trying to requeue task.
Exception Details
JobDeletedException – job was deleted, jobDirectory: /Volumes/Phil/Applications/DeadlineRepository/jobs/999_050_999_1ae37186/999_050_999_1ae37186.job
JobDeletedException.JobDirectory: /Volumes/Phil/Applications/DeadlineRepository/jobs/999_050_999_1ae37186/999_050_999_1ae37186.job
Exception.Source: deadline
Exception.TargetSite: Boolean RefreshJob(Deadline.Jobs.Job ByRef, Boolean)
Exception.Data: ( )
Exception.StackTrace:
at Deadline.Storage.JobStorage.RefreshJob (Deadline.Jobs.Job& job, Boolean forceRefresh) [0x00000] in :0
at Deadline.Storage.JobStorage.RefreshJob (Deadline.Jobs.Job& job) [0x00000] in :0
at Deadline.Controllers.JobController.RequeueTask (Deadline.Jobs.Job job, Deadline.Jobs.Task task, Boolean refreshJob) [0x00000] in :0
at Deadline.Controllers.DeadlineController.RequeueTask (Deadline.Jobs.Job job, Deadline.Jobs.Task task, Boolean refreshJob) [0x00000] in :0
at Deadline.Slaves.SlaveSchedulerThread.RequeueTask (Deadline.Jobs.Task task) [0x00000] in :0
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
Scheduler Thread - can not continue until this operation is completed successfully
Scheduler Thread - waiting 20 seconds before retrying…
Scheduler Thread - creating requeue report due to lost network connection
Exception Details
UnauthorizedAccessException – Access to the path “/Volumes/Phil/Applications” is denied.
Exception.Source: mscorlib
Exception.TargetSite: System.IO.DirectoryInfo CreateDirectoriesInternal(System.String)
Exception.Data: ( )
Exception.StackTrace:
at System.IO.Directory.CreateDirectoriesInternal (System.String path) [0x00000] in :0
at System.IO.Directory.CreateDirectory (System.String path) [0x00000] in :0
at System.IO.DirectoryInfo.Create () [0x00000] in :0
at (wrapper remoting-invoke-with-check) System.IO.DirectoryInfo:Create ()
at System.IO.Directory.CreateDirectoriesInternal (System.String path) [0x00000] in :0
at System.IO.Directory.CreateDirectory (System.String path) [0x00000] in :0
at System.IO.DirectoryInfo.Create () [0x00000] in :0
at (wrapper remoting-invoke-with-check) System.IO.DirectoryInfo:Create ()
at System.IO.Directory.CreateDirectoriesInternal (System.String path) [0x00000] in :0
at System.IO.Directory.CreateDirectory (System.String path) [0x00000] in :0
at Deadline.Storage.DeadlineStorage.GetRepositoryDateTime (System.String hostNameOrIpAddress, Int32 timeoutMilliseconds) [0x00000] in :0
at Deadline.Storage.Caches.DeadlineStorageCache.GetRepositoryDateTime (System.String hostNameOrIpAddress, Int32 timeoutMilliseconds) [0x00000] in :0
at Deadline.Controllers.RepositoryController.GetRepositoryDateTime () [0x00000] in :0
at Deadline.Controllers.DeadlineController.GetRepositoryDateTime () [0x00000] in :0
at Deadline.Slaves.SlaveSchedulerThread.CheckTaskStatuses () [0x00000] in :0
at Deadline.Slaves.SlaveSchedulerThread.RenderTasks () [0x00000] in :0
at Deadline.Slaves.SlaveSchedulerThread.ThreadMain () [0x00000] in :0
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
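To check that I’m not imagining the requeues, output like the above can be tallied with a few lines of Python. This is only a sketch: the patterns are copied from the messages in this particular log, and the names `PATTERNS` and `summarize` are my own, not anything Deadline provides.

```python
import re
from collections import Counter

# Message fragments taken from the console output above.
PATTERNS = {
    "requeue_failed": re.compile(r"exception occurred while trying to requeue task"),
    "requeue_report": re.compile(r"creating requeue report"),
    "retry_wait": re.compile(r"waiting \d+ seconds before retrying"),
    "access_denied": re.compile(r"UnauthorizedAccessException: Failed to update slaveInfo"),
}

def summarize(log_text):
    """Count how many lines of log_text match each known message."""
    counts = Counter()
    for line in log_text.splitlines():
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts
```

Running `summarize(open("slave.log").read())` on a saved copy of the console output (the filename is just an example) gives a count per message. A non-zero `requeue_report` count would suggest the node did decide to requeue after losing the connection.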