AWS Thinkbox Discussion Forums

[Linux] Slave exceptions during idle

Deadline 6.0.0.48466
Fedora 15 x64

I started a Slave and left it running for about an hour as a basic stability test, with no jobs in the queue. It’s still running, but it dumps a fairly verbose exception at intervals; it has appeared twice so far.

[code]
Purging old job auxiliary files
Purging trash
Purging repository temp files
Purging old job auxiliary files
Purging limits
Purging obsolete slaves
Scheduler Thread - exception occurred:
Scheduler Thread - Unexpected Error Occured

0 Other Running Threads

Process Threads

Memory and CPU Stats
GC.TotalMemory: 12.984 MB
Environment.WorkingSet: 0 Bytes
ComputerSystem.TotalPhysicalMemory: 1.957 GB
ComputerSystem.FreePhysicalMemory: 1.174 GB

Application Information
Application.ExecutablePath: /usr/Deadline6/client/bin/deadlineslave.exe
Application.CurrentDirectory: /home/ruschn
Application.StartupPath: /usr/Deadline6/client/bin
Application.ProductName: Deadline Slave 6.0
Application.ProductVersion: 6.0.0.48466
File.GetLastWriteTime( Application.ExecutablePath ): 09/26/2012 10:35:49

Assembly Information (Executing)
ExecutingAssembly.CodeBase: file:///usr/Deadline6/client/bin/franticx.dll
ExecutingAssembly.Location: /usr/Deadline6/client/bin/franticx.dll
ExecutingAssembly.GlobalAssemblyCache: False
File.GetLastWriteTime( ExecutingAssembly.Location ): 09/26/2012 10:35:49

Assembly Information (Current)
CurrentAssembly.CodeBase: file:///usr/Deadline6/client/bin/franticx.dll
CurrentAssembly.Location: /usr/Deadline6/client/bin/franticx.dll
CurrentAssembly.GlobalAssemblyCache: False
File.GetLastWriteTime( CurrentAssembly.Location ): 09/26/2012 10:35:49

Thread Information
CurrentThread.Name: SlaveSchedulerThread
CurrentThread.Priority: Lowest

Operating System Information
Environment.OSVersion.Platform: Linux
Environment.OSVersion.Version: 2.6.43.8

.NET Platform Information
Environment.Version.Major: 4
Environment.Version.Minor: 0
Environment.Version.Build: 30319
Environment.Version.Revision: 1

Misc Environment Information
Environment.MachineName: ws-vm01
Environment.UserName: ruschn
Environment.SystemDirectory:
Environment.TickCount: 8465651

Command Line
Environment.CommandLine: /usr/Deadline6/client/bin/deadlineslave.exe
Environment.CommandLineArgs[0]: /usr/Deadline6/client/bin/deadlineslave.exe

Current Call Stack
Environment.StackTrace: at System.Environment.get_StackTrace()
at FranticX.Diagnostics.Reporting.ErrorReporting.GetExceptionReport(System.Exception e, ErrorReportDetail detail)
at FranticX.Diagnostics.Reporting.ErrorReporting.WriteExceptionReport(System.Exception ex)
at FranticX.Diagnostics.Reporting.ErrorReporting.RecordException(System.Exception e)
at Deadline.Slaves.SlaveSchedulerThread.a(System.Exception A_0)
at Deadline.Slaves.SlaveSchedulerThread.i()
at System.Threading.Thread.StartUnsafe()

email file reporting failed: Sender account in the SMTP settings is blank (System.Exception)
exception occurred while recording file: Sender account in the SMTP settings is blank (System.Exception)

Exception Details
NotImplementedException – The requested feature is not implemented.
Exception.Source: deadline
Exception.TargetSite: Void PurgeOldSlaveReports(System.String, DateTime)
Exception.Data: ( )
Exception.StackTrace:
at Deadline.StorageDB.MongoDB.MongoSlaveStorage.PurgeOldSlaveReports (System.String slaveId, DateTime expiryDate) [0x00000] in :0
at Deadline.Slaves.SlaveSchedulerThread.i () [0x00000] in :0

<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
Purging trash
Purging repository temp files
Purging old statistics
Purging slave statistics that are older than Jun 04/12 11:27:28
Purging repository statistics that are older than Jun 04/12 11:27:28
Purging repository temp files
Purging old statistics
Purging slave statistics that are older than Jun 04/12 11:27:58[/code]

I also get some more generic ‘missing path’ errors echoed due to what I believe is simply the ‘jobs’ and ‘slaves’ directories not existing in the repository’s ‘reports’ directory. It would be nice to either have those created automatically when the repo is set up, or to not have the Slave echo an exception.

[code]
Purging old job reports
error purging job report files: Directory '/usr/Deadline6/repo/reports/jobs' not found. (System.IO.DirectoryNotFoundException)
error purging slave report files: Directory '/usr/Deadline6/repo/reports/slaves' not found. (System.IO.DirectoryNotFoundException)
Purging obsolete slaves
Purging trash
Cleaning up orphaned tasks
Purging obsolete slaves
Purging trash
Purging old job reports
error purging job report files: Directory '/usr/Deadline6/repo/reports/jobs' not found. (System.IO.DirectoryNotFoundException)
error purging slave report files: Directory '/usr/Deadline6/repo/reports/slaves' not found. (System.IO.DirectoryNotFoundException)
[/code]
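As a stopgap, the two missing directories could presumably be created by hand. Below is a minimal sketch of that in Python. It assumes that simply having 'reports/jobs' and 'reports/slaves' present is enough to quiet the DirectoryNotFoundException messages, which I have not confirmed, and the helper name `ensure_report_dirs` is made up for illustration. It is not part of Deadline.

```python
import os


def ensure_report_dirs(repo_root):
    """Create reports/jobs and reports/slaves under repo_root if they are missing.

    Hypothetical helper, not a Deadline API. Returns the two directory paths.
    """
    ensured = []
    for name in ("jobs", "slaves"):
        path = os.path.join(repo_root, "reports", name)
        # exist_ok=True makes this safe to run again if the directories already exist.
        os.makedirs(path, exist_ok=True)
        ensured.append(path)
    return ensured
```

On this machine the call would be `ensure_report_dirs("/usr/Deadline6/repo")`, using the repository path from the error messages, run as a user with write access to the repository.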

The first error (the NotImplementedException from PurgeOldSlaveReports) is still marked as a to-do on our end. It is harmless.

The missing paths error will be fixed in the next beta. The repository installer will now create those paths.

Cheers,

  • Ryan

Ok. Just as a heads-up, I left the slave running unattended for several hours, and when I went to check on it again, the GUI was completely frozen, with (as you might expect) many instances of this exception logged to the terminal. The process then sat for a while chewing my CPU, and eventually came back after about 5 minutes. It may or may not be a symptom of the exception described above, but it seems worth mentioning.
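To put a number on "many instances", the terminal output could be saved to a file and tallied. This is only a rough sketch: it assumes each report has an 'Exception Details' line followed by a line that starts with the exception type, as in the dump above, and `count_exceptions` is a made-up helper name.

```python
from collections import Counter


def count_exceptions(log_text):
    """Tally exception type names that follow an 'Exception Details' header."""
    counts = Counter()
    expecting_name = False
    for line in log_text.splitlines():
        stripped = line.strip()
        if stripped == "Exception Details":
            # The next non-blank line should begin with the exception type.
            expecting_name = True
        elif expecting_name and stripped:
            counts[stripped.split()[0]] += 1
            expecting_name = False
    return counts
```

Passing the contents of the saved log as a string returns a Counter keyed by exception type, so a single entry for NotImplementedException would confirm that the same error is repeating.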

Thanks for the heads up. We are aware of situations where the Slave UI locks up for a bit. We’re assuming at this point it has to do with logging to the UI, and we plan to investigate this soon once some of the more critical issues are worked out. As far as we can tell, it doesn’t affect the actual running of the slave, it just affects the ability to interact with it.

Cheers,

  • Ryan

Just wanted to note that this issue is still present in Beta 4.

Do you mean the issue where the slave locks up, or where it dumps out a bunch of error messages, or both? If it’s still dumping error messages, can you post them?

Thanks!

  • Ryan

I haven’t noticed the lockup again, but I haven’t left it running for as long since (been pretty busy lately). But the error messages are still there, and still the same.

Okay, cool. We’re actually changing how the reports are stored in beta 5, so you’ll have to let us know if you see similar problems after upgrading.

Thanks!

  • Ryan