Jobs failing with "Could not read task log because: Empty path name is not legal."


#1

I'm not sure what’s happening here. I haven’t seen this before, but suddenly a lot of my jobs are failing with this error:

=======================================================
Error
=======================================================


=======================================================
Type
=======================================================
ThreadAbortException

=======================================================
Stack Trace
=======================================================
  at System.Collections.SortedList.IndexOfKey (System.Object key) <0x7faf3c6ebb80 + 0x00000> in <a8460a77e67a430a8486a9751162e5f4>:0 
  at System.Threading.Timer+Scheduler.InternalRemove (System.Threading.Timer timer) [0x00000] in <a8460a77e67a430a8486a9751162e5f4>:0 
  at System.Threading.Timer+Scheduler.Change (System.Threading.Timer timer, System.Int64 new_next_run) [0x0000e] in <a8460a77e67a430a8486a9751162e5f4>:0 
  at System.Threading.Timer.Change (System.Int64 dueTime, System.Int64 period, System.Boolean first) [0x000b8] in <a8460a77e67a430a8486a9751162e5f4>:0 
  at System.Threading.Timer.Init (System.Threading.TimerCallback callback, System.Object state, System.Int64 dueTime, System.Int64 period) [0x0001f] in <a8460a77e67a430a8486a9751162e5f4>:0 
  at System.Threading.Timer..ctor (System.Threading.TimerCallback callback, System.Object state, System.Int32 dueTime, System.Int32 period) [0x00006] in <a8460a77e67a430a8486a9751162e5f4>:0 
  at (wrapper remoting-invoke-with-check) System.Threading.Timer:.ctor (System.Threading.TimerCallback,object,int,int)
  at System.Timers.Timer.set_Enabled (System.Boolean value) [0x000bd] in <5071a6e4a4564e19a2eda0f53e42f9bd>:0 
  at System.Timers.Timer.Start () [0x00000] in <5071a6e4a4564e19a2eda0f53e42f9bd>:0 
  at (wrapper remoting-invoke-with-check) System.Timers.Timer:Start ()
  at Deadline.Slaves.SlaveRenderThread.e (System.String adm, Deadline.Jobs.Job adn) [0x00314] in <af76bc9c0ee149a7ad075ff1508ea786>:0 
  at Deadline.Slaves.SlaveRenderThread.b (Deadline.IO.TaskLogWriter adk) [0x00015] in <af76bc9c0ee149a7ad075ff1508ea786>:0 
  at Deadline.Slaves.SlaveRenderThread.a () [0x0008f] in <af76bc9c0ee149a7ad075ff1508ea786>:0 

=======================================================
Log
=======================================================
Could not read task log because: Empty path name is not legal. (System.ArgumentException)
  at System.IO.StreamReader..ctor (System.String path, System.Text.Encoding encoding, System.Boolean detectEncodingFromByteOrderMarks, System.Int32 bufferSize, System.Boolean checkHost) [0x00042] in <a8460a77e67a430a8486a9751162e5f4>:0 
  at System.IO.StreamReader..ctor (System.String path, System.Text.Encoding encoding, System.Boolean detectEncodingFromByteOrderMarks, System.Int32 bufferSize) [0x00000] in <a8460a77e67a430a8486a9751162e5f4>:0 
  at System.IO.StreamReader..ctor (System.String path, System.Text.Encoding encoding) [0x00009] in <a8460a77e67a430a8486a9751162e5f4>:0 
  at (wrapper remoting-invoke-with-check) System.IO.StreamReader:.ctor (string,System.Text.Encoding)
  at System.IO.File.ReadLines (System.String path, System.Text.Encoding encoding) [0x00000] in <a8460a77e67a430a8486a9751162e5f4>:0 
  at Deadline.Controllers.DataController.ReportError (System.Exception e, Deadline.Slaves.Slave slave, Deadline.Plugins.IPlugin plugin, Deadline.Jobs.Job job, Deadline.Jobs.Task task, System.TimeSpan taskTimeElapsed) [0x00332] in <af76bc9c0ee149a7ad075ff1508ea786>:0 

=======================================================
Details
=======================================================
Date: 08/09/2019 17:51:34
Frames: 35
Elapsed Time: 00:00:00:14
Job Submit Date: 08/09/2019 17:46:09
Job User: jbartolozzi
Average RAM Usage: 16255895642 (4%)
Peak RAM Usage: 19194388480 (4%)
Average CPU Usage: 37%
Peak CPU Usage: 94%
Used CPU Clocks (x10^6 cycles): 3014306
Total CPU Clocks (x10^6 cycles): 8146771

=======================================================
Slave Information
=======================================================
Slave Name: c1-n047-41-b
Version: v10.0.27.3 Release (f9638e9ab)
Operating System: Ubuntu 18.04.2 LTS
Machine User: svcthirdeyeadmin
IP Address: 10.99.11.152
MAC Address: 0C:C4:7A:C6:4C:04
CPU Architecture: x86_64
CPUs: 56
CPU Usage: 34%
Memory Usage: 15.6 GB / 488.0 GB (3%)
Free Disk Space: 1.600 TB 
Video Card: ASPEED Technology, Inc. ASPEED Graphics Family (rev 30)

I am trying to run Houdini jobs on instanced slaves.


#2

When I look at the slave log I see this:

Could not kill child process with id 44220 because: Success (System.ComponentModel.Win32Exception)
2019-08-09 18:16:44:  Attempting a hard kill of child process with id 44220 because it failed to exit.
2019-08-09 18:16:44:  Could not kill child process with id 44220 because 'kill' returned non-zero exit code: -1
2019-08-09 18:16:46:  Could not kill child process with id 44220 because: Success (System.ComponentModel.Win32Exception)
2019-08-09 18:16:46:  Attempting a hard kill of child process with id 44220 because it failed to exit.
2019-08-09 18:16:46:  Could not kill child process with id 44220 because 'kill' returned non-zero exit code: -1
2019-08-09 18:16:48:  Could not kill child process with id 44220 because: Success (System.ComponentModel.Win32Exception)
2019-08-09 18:16:48:  Attempting a hard kill of child process with id 44220 because it failed to exit.
2019-08-09 18:16:48:  Could not kill child process with id 44220 because 'kill' returned non-zero exit code: -1
2019-08-09 18:16:50:  Could not kill parent process with id 44151 because: Success (System.ComponentModel.Win32Exception)
2019-08-09 18:16:50:  Attempting a hard kill of parent process with id 44151 because it failed to exit.
2019-08-09 18:16:50:  Could not kill parent process with id 44151 because 'kill' returned non-zero exit code: -1
2019-08-09 18:16:52:  Could not kill parent process with id 44151 because: Success (System.ComponentModel.Win32Exception)
2019-08-09 18:16:52:  Attempting a hard kill of parent process with id 44151 because it failed to exit.
2019-08-09 18:16:52:  Could not kill parent process with id 44151 because 'kill' returned non-zero exit code: -1
2019-08-09 18:16:55:  Scheduler Thread - Job's Limit Groups:

#3

Setting the following seems to fix it:

"ForceReloadPlugin": "False",

But I'm not really sure how to go about debugging this error.
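One possible starting point for debugging is to check which render processes the Slave repeatedly failed to kill, since the log in #2 shows the same child and parent process IDs failing over and over. The script below is only a minimal sketch. It assumes the Slave log has been saved as a plain-text file whose path is passed on the command line. The function name `count_kill_failures` is hypothetical. The regex matches the "Could not kill ... process with id N because" wording from the log excerpt above, and it may need adjusting for other Deadline versions.

```python
import re
import sys
from collections import Counter

# Matches the kill-failure lines seen in the Slave log excerpt, e.g.
# "Could not kill child process with id 44220 because 'kill' returned non-zero exit code: -1"
KILL_FAILURE = re.compile(r"Could not kill (child|parent) process with id (\d+) because")


def count_kill_failures(lines):
    """Return a Counter keyed by (role, pid) with one count per failure line."""
    counts = Counter()
    for line in lines:
        match = KILL_FAILURE.search(line)
        if match:
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts


if __name__ == "__main__":
    # Assumption: argv[1] is the path to a saved copy of the Slave log.
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log_file:
        for (role, pid), n in count_kill_failures(log_file).most_common():
            print(f"{role} process {pid}: {n} kill-failure lines")
```

Each hard-kill attempt in the excerpt produces two "Could not kill" lines, so the counts roughly double the number of attempts. A PID that keeps appearing is a candidate for a process that was left running between tasks.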