Using the Arnold render plugin.
In the task window, progress goes from 0% to 100% with nothing in between.
Peak RAM usage and average RAM usage are incorrect. I am seeing 1%, while sar reports between 43% and 49% usage, as does cat /proc/meminfo:
[root@render10 ~]# sar -r 5 5
Linux 2.6.32-279.22.1.el6.x86_64 (render10) 03/14/2013 x86_64 (24 CPU)
12:24:37 PM kbmemfree kbmemused %memused kbbuffers kbcached kbcommit %commit
12:24:42 PM 12445824 12151192 49.40 207264 5363696 5451368 11.09
12:24:47 PM 12445740 12151276 49.40 207264 5363724 5451368 11.09
12:24:52 PM 12445760 12151256 49.40 207264 5363768 5721568 11.64
12:24:57 PM 12447256 12149760 49.40 207264 5363768 5451368 11.09
[root@render10 ~]# cat /proc/meminfo
MemTotal: 24597016 kB
MemFree: 12445988 kB
Buffers: 207264 kB
Cached: 5363464 kB
Chris
Found another…
Under Jobs I have status active (38), but there are 20 nodes each running 2 tasks, which equals 40 tasks.
Chris
What’s your Arnold verbosity level set to? I think you need a minimum of 4 to get progress.
RAM values are only for the process that the Slave is running (and its child processes), not the RAM usage of the entire system.
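The distinction matters when comparing against system tools: sar and /proc/meminfo report machine-wide usage, while the Slave only reports its render process. As a rough sketch (plain Python, not Deadline's actual code), the system-wide figure sar shows can be recomputed from the meminfo values above:

```python
def parse_kb(text, key):
    """Return the kB value for a given key in /proc/meminfo-style text."""
    for line in text.splitlines():
        if line.startswith(key + ":"):
            return int(line.split()[1])
    return None

# Values taken from the meminfo output pasted above.
meminfo = """MemTotal:       24597016 kB
MemFree:        12445988 kB
Buffers:          207264 kB
Cached:          5363464 kB"""

total = parse_kb(meminfo, "MemTotal")
free = parse_kb(meminfo, "MemFree")
print(round(100.0 * (total - free) / total, 1))  # 49.4, matching sar's %memused
```

A per-process number (what the Slave tracks) would instead come from the render process's own /proc/PID entries, which is why the two figures can legitimately disagree.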
Probably just a minor display issue. Do you see this happen a lot? When you do see it, does it ever “fix” itself for that job?
Cheers,
Arnold verbosity is set to 4.
Just looked, and all of the RAM usage fields are currently empty.
If you look under the Slaves, the memory is reported correctly, but the CPU speed is incorrect: 2 systems (nodes 20 & 3) say 2.4GHz and the other 18 say 1.6GHz, and I have checked the 1.6GHz systems — cpuinfo reports 2.4GHz.
Again, minor… but I am doing performance tuning on the render nodes to get optimal performance, and when each frame takes 12 to 19 minutes to render, you get lots of time to uncover these minor problems…
I would like to get Pulse up and running, but I cannot see any install notes for v6?
Chris
Thanks for checking the Arnold verbosity. Can you post a render log from a job? We can check the output to make sure it’s printing out progress info, and if it is, check our stdout handlers to see if they are parsing it incorrectly.
Just to confirm, are you saying those 2 systems are reporting 2.4ghz in their cpuinfo? If so, then that means Deadline is getting this information correctly.
Setup is the same as it was for v5, so you can refer to the v5 docs:
thinkboxsoftware.com/deadlin … /#Overview
Cheers,
All nodes are 2.4GHz as reported by cpuinfo.
Only 2 were reported correctly yesterday.
Today I have 6 @ 2.4GHz and 14 @ 1.6GHz.
I saw in your other post that you’re still on beta 10. Can you upgrade to beta 15 and see if you still have these problems?
Thanks!
Interesting… I thought I was on a much higher beta than that, as I thought I started with beta 11 and then upgraded to beta 13…
OK - downloading latest beta…
Is it possible you might have just updated the repository and not the clients?
Cheers,
Had problems with Shotgun (see other post), so I upgraded everything and the upgrade resolved it. Now, while setting up the RV & Shotgun integration, I have found the Deadline and Shotgun integration is broken again, back to where I was a few weeks ago…
Both the repository and clients are now running beta 15? I’m a bit confused because in the shotgun post, you said you’re on beta 10 still…
viewtopic.php?f=86&t=9103#p39614
Note that in order to get the new ssl libraries, you would need to run the client installer on all of your machines.
Nothing has really changed, and it’s a minor display problem.
The progress bar goes from 0% straight to 100% in the task window (renders take 20-30 mins each); in the Jobs window, the task progress is correct.
CPU speed is still reading either 1.6GHz or 2.4GHz in the Slave window.
Memory usage in the task window is incorrect (peak usage 1.2MB), while the Slaves show a consistent 4.6GB usage.
Are you able to post a log from an Arnold job? We’ll need that to check our progress handling.
Can you post the contents of the cpuinfo file for a machine that this is being reported wrong for?
I’ve logged this as a bug. Sounds like something is off with our memory usage gathering (at least under Linux).
Thanks!
See attached tar file.
This has screen shot, cpuinfo and arnold log as produced from the launcher task on one of the nodes.
Enjoy
deadline.tar.gz (495 KB)
Thanks. I’ve confirmed that Deadline is pulling the correct “cpu MHz” value from the cpuinfo file. I did some reading, and I’ve learned that some processors can scale up or down as necessary, which I’m pretty sure explains what you are seeing here. Apparently, you can change the CPU governor settings to avoid this:
experts-exchange.com/OS/Linu … -Tips.html
Also, thanks for the Arnold log. I’m a bit embarrassed to say this, but it turns out our Arnold standalone plugin doesn’t have progress handling built into it like I had thought. I was thinking of our Arnold for Maya support. However, from looking at this particular log, I don’t see an obvious way to pull out overall progress. Yes, there are these lines:
55% done - 578 rays/pixel
However, there are 5 different sections of this log that go from 0 to 100%, and nothing obvious to indicate how many sections there would be in advance. We could probably guesstimate the overall progress like this:
- We go through the first section, reporting progress 0-100%.
- We go through the second section, adjusting overall progress so it appears as 51-100% (the first 0-50% is covered by the first section).
- We go through the third section, adjusting overall progress so it appears as 67-100% (the first 0-66% is covered by the first two sections).
etc…
This will result in the progress jumping around a bit, but maybe that’s better than nothing?
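The remapping described above could be sketched like this (a hypothetical helper, not the actual plugin's stdout handler), keyed off the "NN% done" lines in the log:

```python
import re

# Matches section-local progress lines like "55% done - 578 rays/pixel".
PROGRESS_RE = re.compile(r"(\d+)% done")

def overall_progress(section, line):
    """Map a section-local percentage onto an overall estimate, assuming
    `section` is the 1-based index of the current 0-100% pass and the
    total number of passes is unknown in advance."""
    m = PROGRESS_RE.search(line)
    if m is None:
        return None
    local = int(m.group(1))
    # Sections 1..section-1 are complete, so they account for
    # (section-1)/section of the bar; the current section fills the rest.
    return 100.0 * (section - 1) / section + local / float(section)

print(overall_progress(2, "55% done - 578 rays/pixel"))  # 77.5
```

With this scheme, section 2 spans 50-100% overall and section 3 spans roughly 67-100%, matching the estimates above; the bar jumps backward at each section boundary, which is the trade-off being described.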
Just in case we get more info with more verbosity though, can you run another job with verbosity set to 5 and send us the log?
Thanks!
Thanks on the cpuinfo stuff.
Yes, they were ticking over at 1600MHz and not 2400MHz… Oh joy, but at least there is a fix/workaround.
Just a follow up. Any chance we can get a log from an Arnold job with verbosity set to 5?
Also, I took a look at how we collect memory usage for the rendering process. We are currently grabbing the Resident Set value, instead of the Virtual Memory value. That being said, the difference between 1.2MB and 4.6GB is pretty substantial.
Can you get an Arnold job rendering, and then while it’s rendering and using lots of memory, go to the node it’s rendering on and get the contents of /proc/PID/stat (where PID is the process ID of the Arnold process)? If you could send us the contents of that stat file, that would be great!
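In case it helps when looking at that file, here is a rough sketch (plain Python, field numbers per the proc(5) man page, not Deadline's actual code) of pulling both values out of a stat line; the sample line is entirely made up, with values chosen to mirror a 1.2MB-vs-4.6GB gap like the one reported above:

```python
def mem_from_stat(stat_text, page_size=4096):
    """Pull vsize (bytes) and rss (bytes) from /proc/PID/stat contents.
    Per proc(5), field 23 is vsize and field 24 is rss (in pages). The
    comm field (field 2) may contain spaces, so split after the last ')'
    rather than naively on whitespace."""
    rest = stat_text.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state), so field N sits at index N - 3.
    vsize = int(rest[23 - 3])
    rss_pages = int(rest[24 - 3])
    return vsize, rss_pages * page_size

# Hypothetical stat line: fields 4-22 zeroed out, vsize ~4.6GB,
# rss 300 pages (~1.2MB).
sample = "1234 (kick) S " + "0 " * 19 + "4939212800 300"
print(mem_from_stat(sample))  # (4939212800, 1228800)
```

If the Resident Set figure in the real stat file turns out to be that small while vsize is in the gigabytes, that would point at grabbing the wrong field (or unit) rather than a parsing bug.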
Cheers,
Sorry, been stupidly busy… Will try an Arnold job with verbosity set to 5… Expect a week’s delay, as the farm currently has 1300 Arnold frames to render, and due to a Yeti bug we cannot multi-thread the tasks, so we are looking at 1 hour a frame.