Maya 2016 RenderMan/RIS: no error reported, but frames failing

Hey, I'm new to Deadline. Our sysadmin set it up and left us with it.

The first simple job I submitted rendered fine. It reported some errors, but the failed tasks were retried and all of them finished.

I submitted a more complex job with textures, Alembic caches, constraints, and nCloth. Everything is cached and available to the slaves on a file server.

I submit a sequence of frames to a pool that has one slave in it, and all is good: every frame finishes with no errors.

When I make a pool with more than one slave, plenty of frames fail without any error. It's random frames on random machines.
I have tested running the same scene on the two machines I'm testing Deadline on. I made batch scripts that launch the render at the same time outside of Deadline, and both slaves render the scene perfectly. But in Deadline, on the same slaves, it fails.

Deadline does not pick up that the frames have failed. No errors are reported, so the failed frames are never retried.
Most of the time when I look at the log, RenderMan gets 3-20% of the way through and just stops, then finishes its log and says it's complete.
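One way to flag renders that stop partway is to read the last progress percentage out of the task log. The sketch below assumes that, with `-Progress`, prman writes lines ending in a percentage (for example `R90000  45%`); that format is an assumption, so check it against your own logs. It only looks at `STDOUT:` lines because Deadline's own stats lines (such as "Peak RAM Usage ... (11%)") also contain percent signs.

```python
import re
from typing import Optional

# Assumption: with -Progress, prman writes progress lines that end in a percentage
# (for example "R90000  45%"); check your own logs and adjust the pattern if needed.
_PERCENT = re.compile(r"(?<!\d)(\d{1,3})%")

def last_progress(log_text: str) -> Optional[int]:
    """Return the last 0-100 percentage seen on a STDOUT line, or None if there is none."""
    last = None
    for line in log_text.splitlines():
        # Only look at renderer output; Deadline's own stats lines also contain "%".
        _, sep, stdout = line.partition("STDOUT:")
        if not sep:
            continue
        for match in _PERCENT.finditer(stdout):
            value = int(match.group(1))
            if value <= 100:
                last = value
    return last

def stopped_early(log_text: str, complete: int = 100) -> bool:
    """True if progress was reported but never reached `complete` percent."""
    progress = last_progress(log_text)
    return progress is not None and progress < complete
```

A log that never reports progress at all returns `None` rather than being flagged, so this is best paired with an output-file check.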

The Monitor can tell that the frames are empty, because I use the option to graph output frames and it sees they are not there. But the render doesn't notice they are missing, so it treats the frames as complete when they are not.
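The check the Monitor is effectively doing (is each expected frame file present?) can be reproduced outside Deadline. This is a minimal sketch, assuming zero-padded frame numbers; the `beauty.` prefix, `.exr` extension, and padding of 4 are hypothetical, so substitute whatever your render settings actually produce.

```python
import os
from typing import Iterable, List

def expected_name(prefix: str, frame: int, ext: str, padding: int = 4) -> str:
    """Build a zero-padded frame file name, e.g. beauty.0001.exr (naming is hypothetical)."""
    return f"{prefix}{frame:0{padding}d}{ext}"

def missing_frames(filenames: Iterable[str], prefix: str, ext: str,
                   first: int, last: int, padding: int = 4) -> List[int]:
    """Frames in [first, last] whose expected file name is not in `filenames`."""
    present = set(filenames)
    return [frame for frame in range(first, last + 1)
            if expected_name(prefix, frame, ext, padding) not in present]

def missing_frames_in_dir(directory: str, prefix: str, ext: str,
                          first: int, last: int, padding: int = 4) -> List[int]:
    """Same check against the files actually present in an output folder."""
    return missing_frames(os.listdir(directory), prefix, ext, first, last, padding)
```

For example, `missing_frames_in_dir(output_folder, "beauty.", ".exr", 1, 100)` lists the frames that need to be requeued.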

Nothing I change in the repository settings makes any difference.

I noticed in one forum post that someone had a similar problem with After Effects, and a script was set up to catch the bad frames for resubmission. Is anything like that available for Maya submissions?

Running Windows 7 SP1
Deadline 7.2.3.0 R
Maya 2016
RenderMan/RIS 20.6

Well, this sounds like a fun problem.

Could you send along the log for one of these tests? You can strip out anything identifying; I’m mostly interested in the errors.

One big thing I noticed is your mention of nCloth. If Maya treats that as a simulation, you’ll only be able to process it on one machine. Since you’re using RenderMan, I think exporting RIBs is the best bet (see the Maya Job Type dropdown). That way the rendering can at least be distributed while one box generates the RIBs. Make the task size large enough that a single machine handles the whole export; that should guarantee one process starts and finishes the simulation in one go.
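To illustrate the task-size point: a frame range is cut into tasks of N frames each, so choosing N at least as large as the frame count leaves exactly one task, and therefore one machine. This is a generic sketch of frame chunking under that assumption, not Deadline's own scheduling code.

```python
from typing import List

def split_into_tasks(first: int, last: int, frames_per_task: int) -> List[List[int]]:
    """Cut the inclusive frame range [first, last] into consecutive tasks."""
    if frames_per_task < 1:
        raise ValueError("frames_per_task must be >= 1")
    frames = list(range(first, last + 1))
    return [frames[i:i + frames_per_task] for i in range(0, len(frames), frames_per_task)]
```

With 100 frames, a task size of 10 gives ten tasks that can land on different slaves, while a task size of 1000 gives a single task covering all 100 frames.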


Hopefully that helps. I’m no Maya expert, so if anyone else wants to chime in on this one, go for it.

Hey, thanks for the info.

We are actually using Maya Render Job as the Deadline job type.

So that should make a task for every frame that generates the RIB and then renders it. (We are not licensed to use RenderMan Pro Server, so we can't render standalone.)

Again, another scene renders fine on one machine. All simulations are cached or baked, so nothing needs to simulate.

Again, with more than one machine in a pool, more frames fail than succeed, but they don't report to Deadline that they have failed. The failures show up either as lots of errors in the log or as an output file of only 8 KB.

I have attached 3 logs. Frame 8 succeeded; I've included it for comparison.

I have tried setting the options to catch errors, but nothing seems to detect these errors.
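Since the failing logs contain RenderMan error messages even though the task is marked complete, a saved task log can be scanned for them directly. This sketch matches the `R50009 {SEVERE} ...` message shape that appears later in this thread; treating only `{ERROR}` and `{SEVERE}` as failures is an assumption, so adjust the level list to taste.

```python
import re
from typing import List, Tuple

# Matches RenderMan-style messages such as
# "R50009 {SEVERE} License location is not set in rendermn.ini - aborting."
# Assumption: {ERROR} and {SEVERE} are the levels worth failing a task on.
_MSG = re.compile(r"\b([A-Z]\d{5})\s+\{(ERROR|SEVERE)\}\s*(.*)")

def renderman_errors(log_text: str) -> List[Tuple[str, str, str]]:
    """Return (code, level, message) for every ERROR/SEVERE line in a task log."""
    found = []
    for line in log_text.splitlines():
        match = _MSG.search(line)
        if match:
            found.append((match.group(1), match.group(2), match.group(3).strip()))
    return found
```

Running this over frame1.txt and frame3.txt versus frame8_succeed.txt is a quick way to see which messages separate the bad frames from the good one.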

Thanks
Daryl

Not sure if you got those attachments, so here they are again:
frame8_succeed.txt (11.1 KB)
frame1.txt (53.4 KB)
frame3.txt (9.96 KB)

I’m glad you know the RenderMan process better than me! Sorry for needing to clarify this stuff again.

After poring over the diffs here, it’s looking like you’re right that the only reliable way to make this work is to check the output files. It might be doable. The way it’s done in After Effects should extend here in some cases, but it could cause us a HUGE amount of trouble with all the different renderers and output channels. As it stands, Grant recently had to refactor everything in there to make it manageable.

Time to go for a walk around the office :)

Update: Chatting with Grant, we do have the ability to know where all of the output should go (which is pretty obvious, given that we need that information for tile rendering and assembly as well as local rendering). He’s going to look into the RenderMan issues and see if he can dig up more info on the problem. For reference, could you archive the job and send it over to support@thinkboxsoftware.com? Assuming no scene was submitted with the job, the archive should just include most of the logs plus some of the submitter options (which shouldn’t matter here).

In the meantime, does turning off batch mode make any difference? Deadline runs a completely different plugin to render in that case, and it should also guarantee that Maya is reloaded for every task.

Thanks for the input.
Yeah, you do know where all the output files are, since your interface asks us to set that folder. It should be easy enough for you guys to check the file size.
Also, your Task window can be switched to a graph mode that plots the output file size. It would be great to have a final check that looks at the file size and retries frames that fall under a certain tolerance.
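The size-tolerance check suggested above could look roughly like this. It is a sketch, not a Deadline feature: `frames_to_requeue` is a hypothetical helper, and the 100 KB threshold in the example is an assumption chosen only because the broken frames in this thread came out at about 8 KB.

```python
import os
from typing import Dict, List

def output_sizes(paths_by_frame: Dict[int, str]) -> Dict[int, int]:
    """Size in bytes of each frame's output file; a missing file counts as 0."""
    return {frame: (os.path.getsize(path) if os.path.isfile(path) else 0)
            for frame, path in paths_by_frame.items()}

def frames_to_requeue(sizes: Dict[int, int], min_bytes: int) -> List[int]:
    """Frames whose output is smaller than `min_bytes`, in frame order."""
    return sorted(frame for frame, size in sizes.items() if size < min_bytes)
```

For example, `frames_to_requeue(output_sizes(paths), 100 * 1024)` returns both the 8 KB frames and any frames whose file never appeared. Deleting an output file while a job renders is an easy way to exercise the missing-file branch.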

I tried switching off batch mode and it made no difference.

When you say archive the job, what do you want? The whole Maya project with scene file, textures, cache files, render logs and so on?
I'm not sure where the Deadline log files are kept on disk, so I can't archive them all.
Last time I copied the text from the Deadline task report.

So let me know which bits you want to make it easier for you to diagnose. I'm happy to send you anything you want.

Thanks
Daryl

Archiving the job itself should be good enough. Just complete or fail the job and there should be an option to archive it. The Monitor will ask for a location for the zip file, and you can send that along to us. I think trying to replicate the problem is going to be a bit too difficult, but confirming that the file check works should be easy to test (just delete the output files while rendering).

I feel like we’ve seen this RenderMan issue before as well. Have you reached out to Pixar or tried upgrading? It might be worthwhile while we work through this. They’re currently at 20.9, but the changelogs don’t mention this problem specifically.

OK, I’ll get to archiving a job for you.

But for now, I have managed to get our IT guy to deploy RenderMan Pro Server (the standalone renderer) to the machines, and it is licensed.
I am trying the “RenderMan Export Job” option, as I think it may really help with this issue.

I can run the standalone render in a Windows cmd shell using the commands Deadline is trying to use. When launched through the cmd shell, the frame renders fine.
But if I let Deadline try the same thing, it keeps getting a license error. I have confirmed that the machine is licensed and can render the frame.
Do you know why Deadline might fail to render because of licenses when RenderMan itself is licensed and works fine on that same machine?
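To make the cmd-shell test match Deadline as closely as possible, it helps to reuse the exact executable, arguments, and startup directory from the task log in this post. This is a sketch built only from those logged values; `run_like_deadline` is a hypothetical helper. One difference it cannot remove on its own: the log shows the Slave running as a service under the `SRV_Deadline` account, so a test from your own login still runs as you. Running it under that account (for example with Windows `runas`) is the closer comparison.

```python
import subprocess
from typing import List

# Values copied from the Deadline task log in this post.
PRMAN_EXE = "C:/Program Files/Pixar/RenderManProServer-20.9/bin/prman.exe"
RIB_FILE = "R:/daryl/renderTest/renderman/mayaFurBall2/rib/0001/0001.rib"
STARTUP_DIR = r"R:\daryl\renderTest"

def prman_command(executable: str, rib: str) -> List[str]:
    """Argument list matching the 'Executable' and 'Argument' lines in the log."""
    return [executable, "-Progress", "-t:0", rib]

def run_like_deadline(executable: str, rib: str, cwd: str) -> int:
    """Run prman with the same arguments and startup directory; return its exit code."""
    completed = subprocess.run(prman_command(executable, rib), cwd=cwd)
    return completed.returncode
```

Calling `run_like_deadline(PRMAN_EXE, RIB_FILE, STARTUP_DIR)` should return 0 on a good render; the failing Deadline task logged exit code 13.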

I know the rendermn.ini file points to the correct license server location.
Here is the line in the following log that seems to be killing it:
STDOUT: R50009 {SEVERE} License location is not set in rendermn.ini - aborting.

Here is the log:


=======================================================
Error

Error: Renderer returned non-zero error code, 13. Check the log for more information.
at Deadline.Plugins.ScriptPlugin.RenderTasks(String taskId, Int32 startFrame, Int32 endFrame, String& outMessage, AbortLevel& abortLevel)

=======================================================
Type

RenderPluginException

=======================================================
Stack Trace

at Deadline.Plugins.Plugin.RenderTask(String taskId, Int32 startFrame, Int32 endFrame)
at Deadline.Slaves.SlaveRenderThread.a(TaskLogWriter A_0)

=======================================================
Log

2016-05-06 15:41:44: BEGIN - MELB-81C229\SRV_Deadline
2016-05-06 15:41:44: 0: Loaded plugin PRMan (\MELFS1\Render\plugins\PRMan)
2016-05-06 15:41:44: 0: Start Job timeout is disabled.
2016-05-06 15:41:44: 0: Task timeout is disabled.
2016-05-06 15:41:44: 0: Loaded job: mayaFurBall5 (572c26464560a8211c4336d8)
2016-05-06 15:41:44: 0: Skipping drive mapping because they have already been mapped for this job
2016-05-06 15:41:44: 0: INFO: Executing plugin script C:\Users\srv_deadline\AppData\Local\Thinkbox\Deadline7\slave\melb-81c229\plugins\572c26464560a8211c4336d8\PRMan.py
2016-05-06 15:41:44: 0: INFO: About: RIB Plugin for Deadline
2016-05-06 15:41:44: 0: INFO: The job’s environment will be merged with the current environment before rendering
2016-05-06 15:41:44: 0: Plugin rendering frame(s): 1
2016-05-06 15:41:44: 0: INFO: Stdout Redirection Enabled: True
2016-05-06 15:41:44: 0: INFO: Stdout Handling Enabled: True
2016-05-06 15:41:44: 0: INFO: Popup Handling Enabled: False
2016-05-06 15:41:44: 0: INFO: Using Process Tree: True
2016-05-06 15:41:44: 0: INFO: Hiding DOS Window: True
2016-05-06 15:41:44: 0: INFO: Creating New Console: False
2016-05-06 15:41:44: 0: INFO: Running as user: SRV_Deadline
2016-05-06 15:41:44: 0: INFO: Executable: “C:/Program Files/Pixar/RenderManProServer-20.9/bin/prman.exe”
2016-05-06 15:41:44: 0: INFO: Rendering file: R:\daryl\renderTest\renderman\mayaFurBall2\rib\0001\0001.rib
2016-05-06 15:41:44: 0: INFO: Argument: -Progress -t:0 “R:/daryl/renderTest/renderman/mayaFurBall2/rib/0001/0001.rib”
2016-05-06 15:41:44: 0: INFO: Startup Directory: “R:\daryl\renderTest”
2016-05-06 15:41:44: 0: INFO: Process Priority: BelowNormal
2016-05-06 15:41:44: 0: INFO: Process Affinity: default
2016-05-06 15:41:44: 0: INFO: Process is now running
2016-05-06 15:41:44: 0: STDOUT: R50009 {SEVERE} License location is not set in rendermn.ini - aborting.
2016-05-06 15:41:45: 0: INFO: Process exit code: 13
2016-05-06 15:41:46: 0: An exception occurred: Error: Renderer returned non-zero error code, 13. Check the log for more information.
2016-05-06 15:41:46: at Deadline.Plugins.ScriptPlugin.RenderTasks(String taskId, Int32 startFrame, Int32 endFrame, String& outMessage, AbortLevel& abortLevel) (Deadline.Plugins.RenderPluginException)

=======================================================
Details

Date: 05/06/2016 15:41:47
Frames: 1
Elapsed Time: 00:00:00:03
Job Submit Date: 05/06/2016 15:06:12
Job User: darylm
Average RAM Usage: 3605996032 (11%)
Peak RAM Usage: 3606278144 (11%)
Average CPU Usage: 13%
Peak CPU Usage: 37%
Used CPU Clocks: 16016
Total CPU Clocks: 123200

=======================================================
Slave Information

Slave Name: MELB-81C229
Version: v7.2.3.0 R (d21b3e911)
Operating System: Windows 7 Enterprise (SP1)
Running As Service: Yes
Machine User: SRV_Deadline
IP Address: 10.17.24.113
MAC Address: FC:AA:14:81:C2:29
CPU Architecture: x64
CPUs: 4
CPU Usage: 0%
Memory Usage: 3.4 GB / 31.9 GB (10%)
Free Disk Space: 37.857 GB
Video Card: NVIDIA GeForce GTX 960

If RenderMan uses environment variables to point to the correct license server, you should reboot the nodes so they pick up the changes. There’s an option in the remote control to restart after the current task completes, so you can safely restart the farm.
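The Slave Information above shows the Slave running as a service under `SRV_Deadline`, so it may not see the same environment as your interactive cmd shell; variables set only for your own user account would not apply to it even after a reboot. One way to check, sketched here under that assumption: save the output of `set` from your shell and from a job running under the Slave (for example a Deadline command-line job running `cmd /c set`), then diff the two files. The file names and the `RMANTREE` variable in the example are illustrative only.

```python
from typing import Dict, List, Tuple

def parse_env_dump(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines as printed by the Windows `set` command."""
    env = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key:
            env[key.upper()] = value  # Windows variable names are case-insensitive
    return env

def diff_env(shell: Dict[str, str], service: Dict[str, str]) -> Tuple[List[str], List[str], List[str]]:
    """Return (only in shell, only in service, present in both but different)."""
    only_shell = sorted(set(shell) - set(service))
    only_service = sorted(set(service) - set(shell))
    changed = sorted(k for k in set(shell) & set(service) if shell[k] != service[k])
    return only_shell, only_service, changed

def load_env_file(path: str) -> Dict[str, str]:
    """Read a saved `set` dump; undecodable bytes are replaced rather than raising."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return parse_env_dump(handle.read())
```

Anything RenderMan-related in the "only in shell" or "changed" lists is a good candidate for the license difference.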