AWS Thinkbox Discussion Forums

slave log stops updating

Hi there,

It seems that the slave logs can easily stop updating while the job is still going. Most of the machines rendering a particular job have their local log window frozen (it's still interactive, I can scroll it, but no new content appears), even though the job is still running. We also have our internal logs written to a file, so I can see that it's doing something.

The last couple of lines of one particular task are:

2013-01-30 19:58:42: 0: STDOUT: Wed Jan 30 19:58:42 2013 (+0ms) : debug : Initiating pc2 file: //inferno2/projects/tboa/scenes/FB_360_0001/cache/common/rigcache/vehThemistoklesWarshipA_7/v0003_mpo_FR1_Layout_360_Test/pc2/mesh_m_high_flag.pc2
2013-01-30 19:58:42: 0: STDOUT: Wed Jan 30 19:58:42 2013 (+47ms) : debug : Initiating pc2 file: //inferno2/projects/tboa/scenes/FB_360_0001/cache/common/rigcache/vehThemistoklesWarshipA_7/v0003_mpo_FR1_Layout_360_Test/pc2/mesh_m_low_flag.pc2
2013-01-30 19:58:42: 0: STDOUT: Wed Jan 30 19:58:42 2013 (+15ms) : debug : Initiating pc2 file: //inferno2/projects/tboa/scenes/FB_360_0001/cache/common/rigcache/vehThemistoklesWarshipA_7/v0003_mpo_FR1_Layout_360_Test/pc2/mesh_m_sim_flag.pc2

Then nothing…

While in our own logs:

03h:41m:49s (step 910/7626) Overall Progress: 17 percent
Wed Jan 30 20:28:48 2013 (+1875ms) : PUBLISH STEP 4a/7 : Frame: 1861, Step time: 1.88 secs. Avg step time: 1.98 secs. Total time spent: 00h:30m:05s Estimated time left: 03h:41m:46s (step 911/7626) Overall Progress: 17 percent
Wed Jan 30 20:28:50 2013 (+1796ms) : PUBLISH STEP 4a/7 : Frame: 1862, Step time: 1.80 secs. Avg step time: 1.98 secs. Total time spent: 00h:30m:07s Estimated time left: 03h:41m:43s (step 912/7626) Overall Progress: 17 percent
Wed Jan 30 20:28:52 2013 (+1796ms) : PUBLISH STEP 4a/7 : Frame: 1863, Step time: 1.80 secs. Avg step time: 1.98 secs. Total time spent: 00h:30m:08s Estimated time left: 03h:41m:39s (step 913/7626) Overall Progress: 17 percent

Hey Laszlo,

Weird. We haven’t seen this in any of the testing we’ve done here.

The next time this happens, can you grab the slave log for the current session and post it? You can get to the log folder from the Slave UI by selecting Help -> Explore Log Folder. Also, if you can let us know which line the slave last printed out, that would be great.

We can use this info to see if there is anything in the log file that could explain the problem.

If the log contains sensitive information, you can email it directly to support@thinkboxsoftware.com.

Thanks!

  • Ryan

Ok, I think I might have figured it out. There is a line in our log output that's extremely long, right after that last bit.

This is what's in that log file (the last two lines are from me trying to restart the log by forcing a garbage collect on the slave):

2013-01-30 19:58:42: 0: STDOUT: Wed Jan 30 19:58:42 2013 (+16ms) : debug : Initiating pc2 file: //inferno2/projects/tboa/scenes/FB_360_0001/cache/common/rigcache/vehThemistoklesWarshipA_6/v0003_mpo_FR1_Layout_360_Test/pc2/mesh_m_sim_flag.pc2
2013-01-30 19:58:42: 0: STDOUT: Wed Jan 30 19:58:42 2013 (+0ms) : PUBLISH STEP 4/7 : calling pre timestep function: <bound method RigCacheExporter.preTimeStepFunction of <scNodes.rigCache.RigCacheExporter object at 0x0000000197675208>>
2013-01-30 19:58:42: 0: STDOUT: Wed Jan 30 19:58:42 2013 (+0ms) : debug : Pre timestep function called for rigCache node RigCacheExporter@ScanlineData_RigCacheExporter8
2013-01-30 19:58:42: 0: STDOUT: Wed Jan 30 19:58:42 2013 (+15ms) : debug : getNameSpace() for RiggingImporter@ScanlineData_RiggingImporter8
2013-01-30 19:58:42: 0: STDOUT: Wed Jan 30 19:58:42 2013 (+62ms) : debug : getNameSpace() found vehThemistoklesWarshipA_3313_7
2013-01-30 19:58:42: 0: STDOUT: Wed Jan 30 19:58:42 2013 (+14ms) : debug : exportTransformsToSTF_Setup
2013-01-30 19:58:42: 0: STDOUT: Wed Jan 30 19:58:42 2013 (+0ms) : debug : Initiating pc2 file: //inferno2/projects/tboa/scenes/FB_360_0001/cache/common/rigcache/vehThemistoklesWarshipA_7/v0003_mpo_FR1_Layout_360_Test/pc2/mesh_m_high_flag.pc2
2013-01-30 19:58:42: 0: STDOUT: Wed Jan 30 19:58:42 2013 (+47ms) : debug : Initiating pc2 file: //inferno2/projects/tboa/scenes/FB_360_0001/cache/common/rigcache/vehThemistoklesWarshipA_7/v0003_mpo_FR1_Layout_360_Test/pc2/mesh_m_low_flag.pc2
2013-01-30 19:58:42: 0: STDOUT: Wed Jan 30 19:58:42 2013 (+15ms) : debug : Initiating pc2 file: //inferno2/projects/tboa/scenes/FB_360_0001/cache/common/rigcache/vehThemistoklesWarshipA_7/v0003_mpo_FR1_Layout_360_Test/pc2/mesh_m_sim_flag.pc2
2013-01-30 20:33:57: [memory usage] 24.785 MB / 151.047 MB
2013-01-30 20:33:57: [memory usage] 21.195 MB / 137.789 MB

This is from the ERROR log for that frame later on. It DID have the rest of the text, all of it. The next line was 266961 characters long though, and I guess that was upsetting :) I culled it here.

Will remove that from our logging system.
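
For what it's worth, the numbers add up: that timesteps dict spans steps 951 through 8576, so 7626 entries of roughly 35 characters each, which lands right around the reported 266961 once you count the log prefix. Something like this is what I have in mind for our logging side: a rough sketch, with a made-up log_line helper and an arbitrary 4096-character cap (nothing Deadline-specific):

# Rough sketch: cap any single log line before it hits stdout, so a huge
# dict repr can't produce a 266k-character line. The helper name and the
# cap value are made up for illustration.
MAX_LINE_LEN = 4096

def log_line(message, max_len=MAX_LINE_LEN):
    # Truncate anything over max_len and note how much was dropped.
    if len(message) > max_len:
        message = message[:max_len] + " ...[truncated %d chars]" % (len(message) - max_len)
    print(message)

# Instead of dumping the whole timesteps dict:
timesteps = dict((step, list(range(9))) for step in range(951, 8577))
log_line("PUBLISH : timesteps to use for timestep functions: %r" % (timesteps,))

# ...a summary is probably all we ever needed:
log_line("PUBLISH : %d timesteps (%d..%d), 9 substeps each"
         % (len(timesteps), min(timesteps), max(timesteps)))

That keeps the per-line length bounded no matter what gets interpolated into the message.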

Starting line:

Error

Error in RenderTasks: Monitored managed process “MayaBatch” has exited or been terminated.
at Deadline.Plugins.ScriptPlugin.RenderTasks(String taskId, Int32 startFrame, Int32 endFrame, String& outMessage, AbortLevel& abortLevel)

=======================================================
Log

0: STDOUT: Wed Jan 30 19:58:42 2013 (+31ms) : PUBLISH : timesteps to use for timestep functions: {951: [0, 1, 2, 3, 4, 5, 6, 7, 8], 952: [0, 1, 2, 3, 4, 5, 6, 7, 8], 953: [0, 1, 2, 3, 4, 5, 6, 7, 8], 954: [0, 1, 2, 3, 4, 5, 6, 7, 8], 955: [0, 1, 2, 3, 4, 5, 6, 7, 8], 956: [0, 1, 2, 3, 4, 5, 6, 7, 8], 957: [0, 1, 2, 3, 4, 5, 6, 7, 8], 958: [0, 1, 2, 3, 4, 5, 6, 7, 8
[... culled ...]
8575: [0, 1, 2, 3, 4, 5, 6, 7, 8], 8576: [0, 1, 2, 3, 4, 5, 6, 7, 8]}
0: STDOUT: Wed Jan 30 19:58:47 2013 (+5780ms) : PUBLISH STEP 4a/7 : Frame: 951, Step time: 4.81 secs. Avg step time: 4.81 secs. Total time spent: 00h:00m:04s Estimated time left: 10h:11m:31s (step 1/7626)
0: STDOUT: Wed Jan 30 19:58:50 2013 (+2110ms) : PUBLISH STEP 4a/7 : Frame: 952, Step time: 2.11 secs. Avg step time: 3.46 secs. Total time spent: 00h:00m:06s Estimated time left: 07h:19m:46s (step 2/7626)
