beta 7 feedback

MikeOwen · October 11, 2013, 12:59pm

Loading a saved layout or clicking on a pinned layout will load the correct layout, but where you have a panel with multiple tabs in it within the monitor, the tabs are in a different order, reading left to right, than when you saved the layout OR pinned the layout. Essentially, layouts don’t remember the order of tabs within a panel.
If a user creates a monitor layout file by saving it to a file server location, and that user or another user loads this monitor layout file, then you are unable to ever delete the file as it is file locked. You will have to ensure all instances of monitor which are referencing this layout file are shut down. Of course, this isn’t practical for a studio.
Lightning.dlx needs a “Do NOT render” function effectively. If you run a FumeFX Sim via Deadline, AT the end of the SIM, Lightning renders the single frame number which more than likely the user will not want to have happen. Worse still, if you have MR as the renderer, then it crashes out. Luckily Scanline and QuickSilver renderers do not crash out. VRay just renders as well and you can “kind of” hack VRay a little, by enabling “Do NOT render” button in indirect illumination which will help BUT again, in all these example renderers above, more than likely, the user will NOT want any rendering to take place. So…how about a setting called “Do NOT Render”, exposed to SMTD as well, which allows users to tell Lightning to SKIP the SDK render command and return successfully via the py plugin? This would be similar to telling the 3dsmax plugin that it is a “MaxScript Job” and therefore don’t render. Although, no custom Maxscript file/job would need to be executed.

See attached log report for an example of this exact MR crash situation, together with identical SIM jobs but with either Scanline or QuickSilver enabled as the renderer.

FumeFX_Lightning_Crash_Renderer_Reports.zip (19.9 KB)

Slave messages when Pulse isn’t running. Not errors, but maybe needs a tidy up? Cleaner messages?

2013-10-09 09:32:19: Info Thread - Could not check if Pulse is running because: The requested address is not valid in its context 255.255.255.255:17062
2013-10-09 09:32:19: at Deadline.Net.DeadlineNetUtils.ConnectSocket(IPAddress ipAddress, Int32 port, Int32 maxAttempts, Boolean verbose, MonitorManager monitorManager)
2013-10-09 09:32:19: at Deadline.Slaves.SlaveInfoThread.a()
2013-10-09 09:32:21: Scheduler - Could not check if throttling is necessary from Deadline Pulse because: The requested address is not valid in its context 255.255.255.255:17062
2013-10-09 09:32:21: at Deadline.Net.DeadlineNetUtils.ConnectSocket(IPAddress ipAddress, Int32 port, Int32 maxAttempts, Boolean verbose, MonitorManager monitorManager)
2013-10-09 09:32:21: at Deadline.Net.DeadlineNetUtils.ConnectSocket(IPAddress ipAddress, Int32 port, Int32 maxAttempts, Boolean verbose)
2013-10-09 09:32:21: at Deadline.Scheduling.SchedulerUtils.AddToPulseThrottlingQueue(DeadlineNetworkSettings networkSettings, SlaveState& slaveState)

Not a major issue, but it does dump a stack whenever the message is printed.
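
As a rough illustration of the tidy-up being asked for, here is a minimal Python sketch that logs only the failure message unless a stack is explicitly requested. The helper name `describe_failure` and its parameters are hypothetical, and this is not how the Slave actually builds its log lines; only the error text is taken from the log above.

```python
import traceback


def describe_failure(action, exc, include_stack=False):
    """Build a log message for a failed check (hypothetical helper)."""
    message = f"Could not {action} because: {exc}"
    if include_stack:
        # Only append the stack when someone explicitly asks for it.
        stack = "".join(traceback.format_exception(type(exc), exc, exc.__traceback__))
        message += "\n" + stack.rstrip()
    return message


try:
    raise OSError("The requested address is not valid in its context 255.255.255.255:17062")
except OSError as exc:
    # One-line version for normal logging, full version for debugging.
    short = describe_failure("check if Pulse is running", exc)
    full = describe_failure("check if Pulse is running", exc, include_stack=True)
```

With this shape, the one-line message could be the default and the stack could be reserved for a verbose or debug mode.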

Job Properties, required assets. Click on Add, then click on “cancel” in the “Add Asset” folder OS browser and you get this error in console:

2013-10-10 16:20:30: Traceback (most recent call last):
2013-10-10 16:20:30: File "DeadlineUI\UI\Forms\JobPropertyForms\RequiredAssetsForm.py", line 82, in addButtonClicked
2013-10-10 16:20:30: UnboundLocalError: local variable 'filesContainPadding' referenced before assignment
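
For illustration only, a minimal sketch of how this kind of `UnboundLocalError` typically arises when a dialog is cancelled, and one way to guard against it. This is not the actual RequiredAssetsForm.py code: the function names, the `selected_files` parameter, and the `#` padding check are all assumptions made for the example.

```python
def add_button_clicked_buggy(selected_files):
    # Hypothetical stand-in for the handler: the flag is only assigned
    # inside the loop, so an empty selection (dialog cancelled) never sets it.
    for path in selected_files:
        files_contain_padding = "#" in path
    return files_contain_padding  # UnboundLocalError when nothing was selected


def add_button_clicked_fixed(selected_files):
    # Initialise the flag first and return early when the dialog was cancelled.
    files_contain_padding = False
    if not selected_files:
        return files_contain_padding
    for path in selected_files:
        if "#" in path:
            files_contain_padding = True
    return files_contain_padding
```

Calling the first version with an empty list reproduces the error class seen in the traceback, while the second returns `False` for a cancelled dialog.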

Job Properties of a 3dsMax job - “3dsMax Settings” panel. Could the top left hand corner of the panel be anchored / pinned down, so that the “group boxes” / “rollouts” could be resized within the panel? I notice that the overall size of the Job Properties dialog has a fixed min. width, yet the “3dsmax Settings” panel isn’t resized automatically to fit the width of the min. width of the overall dialog, which would help give more space for long text strings to be readable. This affects other jobs so, it’s not just a 3dsMax job thing.
Add Feature? - “RAdmin” - “ConnectWithRAdmin.py” script? Back in the day, didn’t FF use RAdmin? I vaguely remember there being RAdmin functionality in Deadline v2.7.
I assume previously supplied feedback on the “Configure Cloud Providers…” dialog hasn’t been fixed yet?
MultiRegion SingleFrame Tile rendering jobs now fail to submit a Draft job via the SMTD as of beta 7.

MXS Listener prints out:
“C:\MultiRegionRenderTests\PNG\test0000_config_2013_10_10__17_18_48.txt”
“C:\MultiRegionRenderTests\PNG\MultiMatteElement1_MultiMatteElement\test_MultiMatteElement0000_config_2013_10_10__17_18_48.txt”
“C:\MultiRegionRenderTests\PNG\MultiMatteElement2_MultiMatteElement\test_MultiMatteElement0000_config_2013_10_10__17_18_48.txt”
“C:\MultiRegionRenderTests\PNG\MultiMatteElement3_MultiMatteElement\test_MultiMatteElement0000_config_2013_10_10__17_18_48.txt”
“C:\MultiRegionRenderTests\PNG\VRaySpecular_VRaySpecular\test_VRaySpecular0000_config_2013_10_10__17_18_48.txt”
“C:\MultiRegionRenderTests\PNG\VRayLighting_VRayLighting\test_VRayLighting0000_config_2013_10_10__17_18_48.txt”
“C:\MultiRegionRenderTests\PNG\VRayVelocity_VRayVelocity\test_VRayVelocity0000_config_2013_10_10__17_18_48.txt”

SMTD prints out:
–DEADLINE DISTRIBUTED TILES JOB SUBMISSION FAILED.

See attached SMTD log report.

SubmitMaxToDeadline - [WIN7X64] - 10-10-2013-0010.log (21.8 KB)

Every so often I get this in the console. Issue?:

2013-10-10 18:57:32:  Error occurred while updating slave cache: Read failure (System.IO.IOException)

Pulse Web Service - crashing. [CAVEAT: tested via Deadline v5.2], if via iOS or Android on WiFi or 3G connection, I press the “refresh”, then “cancel”, then “refresh”, then “cancel” button enough times before the first data refresh has completed, it will crash out the web-service running on that Pulse and indeed lock the entire Pulse application up. Closing/Force killing the application and re-opening it normally fixes the issue. Yeah, I know v6 removes a lot of the work from Pulse…but I reckon this is worth giving a test as I reckon none of the web-service code has changed much since v5.2? (Make sure you have a good number of jobs all doing something when you carry out the tests. ie: 50+ slaves and 400 jobs+, of which 30+ are active) as this seems to make a difference. (READ: my previous comment about this being on v5.2)
“0: Got task!” - has this Slave STDout been removed from the Slave now, when it successfully picks up a task? Not sure, if it is internal debug info that should be cleaned up?
I sometimes get these error messages on my Win VM. I think it’s whenever the OSX host machine, where MongoDB lives, goes to sleep and hence it loses its connection:

2013-10-11 13:00:53: Error occurred while updating pulse cache: No such host is known (System.Net.Sockets.SocketException)
2013-10-11 13:04:36: Error occurred while updating Cloud Instances: An unexpected error occurred while interacting with the database (mbp.local:27017):
2013-10-11 13:04:36: No such host is known (FranticX.Database.DocumentException)
2013-10-11 13:04:36: at b.a(MongoServer A_0, Exception A_1)
2013-10-11 13:04:36: at Deadline.StorageDB.MongoDB.MongoCloudStorage.GetCloudRegions(Boolean invalidateCache)
2013-10-11 13:04:36: at Deadline.StorageDB.CloudStorage.a()
2013-10-11 13:05:52: Error occurred while reloading network settings: An unexpected error occurred while interacting with the database (mbp.local:27017):
2013-10-11 13:05:52: No such host is known (FranticX.Database.DocumentException)
2013-10-11 13:11:38: Error occurred while updating Cloud Instances: An unexpected error occurred while interacting with the database (mbp.local:27017):
2013-10-11 13:11:38: No such host is known (FranticX.Database.DocumentException)
2013-10-11 13:11:38: at b.a(MongoServer A_0, Exception A_1)
2013-10-11 13:11:38: at Deadline.StorageDB.MongoDB.MongoCloudStorage.GetCloudRegions(Boolean invalidateCache)
2013-10-11 13:11:38: at Deadline.StorageDB.CloudStorage.a()
2013-10-11 13:12:45: Error occurred while updating limit group cache: No such host is known (System.Net.Sockets.SocketException)

“Changed the wording of the Sync All Auxiliary Files job property to Re-sync Auxiliary Files Between Tasks” - Nope, it’s still named the old way! (in the job properties dialog - checkbox at the bottom)
House Cleaning in a separate thread now generates a lot of STDout in console. Looks ok to me. Just wondering if it’s a bit too much?

2013-10-11 13:44:56: Update timeout has been set to 300 seconds
2013-10-11 13:44:56: Stdout Handling Enabled: False
2013-10-11 13:44:56: Popup Handling Enabled: False
2013-10-11 13:44:56: Using Process Tree: True
2013-10-11 13:44:56: Hiding DOS Window: True
2013-10-11 13:44:56: Creating New Console: False
2013-10-11 13:44:56: Executable: "/Applications/Thinkbox/Deadline6/Resources/deadlinecommand"
2013-10-11 13:44:56: Argument: -DoHouseCleaning 0 True
2013-10-11 13:44:56: Startup Directory: "/Applications/Thinkbox/Deadline6/Resources"
2013-10-11 13:44:56: Process Priority: BelowNormal
2013-10-11 13:44:56: Process Affinity: default
2013-10-11 13:44:56: Process is now running
2013-10-11 13:44:56: Purging repository temp files
2013-10-11 13:44:56: Purging deleted jobs
2013-10-11 13:44:56: purging deleted job '5252a6c871bd0c0fa0fd3e6a' because it was deleted over 1 hour(s) ago
2013-10-11 13:44:56: Purging limits
2013-10-11 13:44:56: Purging old job and slave reports
2013-10-11 13:44:56: purging job reports for '5252a6c871bd0c0fa0fd3e6a' because the job no longer exists
2013-10-11 13:44:56: purging job reports for '5254149d71bd0c012079f8cf' because the job no longer exists
2013-10-11 13:44:56: purging job reports for '525522d171bd0c0834d192a1' because the job no longer exists
2013-10-11 13:44:56: purging job reports for '5255237771bd0c082042b3c4' because the job no longer exists
2013-10-11 13:44:56: purging job reports for '52554c8771bd0c0e70ba7c4d' because the job no longer exists
2013-10-11 13:44:56: purging job reports for '52554cba71bd0c0d40d41bf1' because the job no longer exists
2013-10-11 13:44:56: purging job reports for '52554cd771bd0c0b5c42f044' because the job no longer exists
2013-10-11 13:44:56: purged 9 job report files
2013-10-11 13:44:56: Purging old job auxiliary files
2013-10-11 13:44:56: Purging job auxiliary files '5252a6c871bd0c0fa0fd3e6a' because the job no longer exists
2013-10-11 13:44:56: Purging job auxiliary files '5254149d71bd0c012079f8cf' because the job no longer exists
2013-10-11 13:44:56: Purging job auxiliary files '525522d171bd0c0834d192a1' because the job no longer exists
2013-10-11 13:44:56: Purging job auxiliary files '5255230171bd0c0878616591' because the job no longer exists
2013-10-11 13:44:56: Purging job auxiliary files '5255237771bd0c082042b3c4' because the job no longer exists
2013-10-11 13:44:56: Purging job auxiliary files '52554c8771bd0c0e70ba7c4d' because the job no longer exists
2013-10-11 13:44:56: Purging job auxiliary files '52554cba71bd0c0d40d41bf1' because the job no longer exists
2013-10-11 13:44:56: Purging job auxiliary files '52554cd771bd0c0b5c42f044' because the job no longer exists
2013-10-11 13:44:56: Purging old statistics
2013-10-11 13:44:56: Purging slave statistics that are older than Jun 13/13 13:44:56
2013-10-11 13:44:56: Purging repository statistics that are older than Jun 13/13 13:44:56
2013-10-11 13:44:56: Checking available Database connections
2013-10-11 13:44:56: Purging obsolete slaves
2013-10-11 13:44:56: Performing Job Repository Scan...
2013-10-11 13:44:56: Loading jobs
2013-10-11 13:44:56: Scanning jobs
2013-10-11 13:44:57: Archived completed job "FumeFX_Sim_Progress_SCANLINE" because Archive On Complete was enabled.
2013-10-11 13:44:57: Released pending job "FumeFX_Sim_Progress_MR" because its dependencies are finished and/or its required assets are available.
2013-10-11 13:44:57: Cleaning up orphaned tasks
2013-10-11 13:44:57: Done.
2013-10-11 13:44:57: Purging unsubmitted jobs
2013-10-11 13:44:57: Process exit code: 0

“The slave will now return its license if it is disabled.” - license loop-hole? What if I own 5 licenses but go around starting up slaves and then instantly disabling them and so forth…wouldn’t I be able to run up lots of slaves > 5?

rrussell · October 11, 2013, 6:06pm

Yeah, we’re seeing some weird behavior when panels are tabbed. We’ll look into it.
I can’t reproduce this. I saved the layout from a Mac to the server, reloaded it from a PC and a Mac, and I was then able to delete the file while both monitors were still open. I also tested saving the layout to a linux server and a windows server.
Hmm, would that actually work? I guess it depends on what triggers the simulation. We would have to test if the simulation gets triggered at all if the render function isn’t called. Do you have a FumeFX sample scene we could use to test this?
We’ll clean those up so that they don’t include the stack trace.
Thanks! We’ll get that fixed.
Currently, none of the dialogs that use that auto-generated panel allow it to resize. We’ll put that on the todo list, but I don’t think we’ll worry about it for 6.1.
We tested with it briefly way back in the day. I’m not sure if anyone else ever actually used it. If we get requests for it, we’ll add it.
You’ll have to refresh my memory here. Also, it might be best to ask the VMX guys directly about this one.
I can’t seem to reproduce this. Can you post a simple scene that reproduces? Also, maybe a screen shot of your Tile settings?
Probably not. Probably just an issue reading data from Mongo. The Monitor would get the new data the next time it requests it from Mongo anyways.
Can you test with 6.1 to see if this is still a problem? We did make some improvements in the web service code when running Python scripts, and from the tests we did back then, things seem to be stable.
It should still be printed out. Do you have verbose logging enabled? It only gets printed out when verbose logging is enabled.
Yup, that’s due to MongoDB not being accessible.
Now how the heck did we miss that one? It will be updated in the next beta. I double checked, and it currently is correct in the Job Details and the Manage User Groups dialog.
I don’t think it’s too much. Also, we didn’t make it any more verbose than it was in previous betas.
A disabled slave or unlicensed slave can’t pick up tasks, so it’s not a loop-hole. Starting back in 6.0, it is actually possible to run more slaves than you have licenses, but you will only be able to render with as many slaves as you have licenses. The reason for this change is in the 6.0 release notes: Slaves now only need access to a license when rendering jobs. If the license server goes offline, the Slave will continue to run instead of shutting itself down. However, the Slave won’t be able to render another job until the license server comes back online.
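
The licensing rule described above can be pictured with a small toy model. This is only a sketch of the behaviour as stated in this post (slaves may run without a license, but only licensed slaves render, and a disabled slave gives its license back). The `LicensePool` class and its methods are hypothetical and have nothing to do with Deadline's real license handling.

```python
class LicensePool:
    """Toy model: slaves may run without a license but need one to render."""

    def __init__(self, total):
        self.total = total
        self.holders = set()

    def try_start_render(self, slave, enabled=True):
        # A disabled slave returns any license it holds and never renders.
        if not enabled:
            self.release(slave)
            return False
        if slave in self.holders:
            return True
        # Hand out a license only while some are still free.
        if len(self.holders) < self.total:
            self.holders.add(slave)
            return True
        return False

    def release(self, slave):
        self.holders.discard(slave)


pool = LicensePool(5)
slaves = [f"slave{i}" for i in range(8)]
rendering = [s for s in slaves if pool.try_start_render(s)]
```

In this model eight slaves are running but only five of them ever render at once. Disabling one of the five frees its license for another slave, so the number of rendering slaves never exceeds the license count.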

LaszloSebo · October 11, 2013, 6:16pm

We were seeing similar crashy behavior with 6.0 btw, and ended up turning the webservice off. Since Pulse is running some critical processes (the dependency handling for example), it’s too risky…

rrussell · October 11, 2013, 6:58pm

Could you launch another instance of Pulse on a separate machine, and see if you can still reproduce this in 6.1? Having a second instance of Pulse shouldn’t cause any issues, since locking is performed during any repository scans or power management checks, and if that second instance crashes, no big deal.

All the python instability issues we fixed during the 6.1 beta have already been applied to the web service, so assuming this was the cause of the problems you were seeing in 6.0, this should be fixed.

Thanks!

Ryan

LaszloSebo · October 11, 2013, 9:11pm

I could crash pulse basically by trying to get a deadline app running on my phone. Do you think that would be related to the python issues?

rrussell · October 11, 2013, 10:23pm

It could be. The mobile app calls a couple of web service python scripts (they sit in \your\repository\scripts\webservice).

MikeOwen · October 14, 2013, 1:59pm

Hi Ryan,

I can’t reproduce this either. Possible VM weirdness / restart fixed the issue kind of deal. Forget it and I’ll watch for this issue in the future.
Absolutely! No SDK render command is required by FFX. See the email attachments I sent you last week for a Deadline-3dsMax-FFX job. As long as the FFX “Backburner-Mode” button is enabled before a Max job is submitted to Deadline, then it will actually ignore whatever frame range you send with the job and just start sim’ing its FFX sim range once the Max file has been opened. You will notice that the sim’ing takes place before Lightning has had a chance to execute the ‘render’ command. After the sim has completed, Lightning then continues with a ‘render’ command, which in this case is irrelevant. So, instead of trying to identify when a FFX sim job is specifically sent to Deadline and hence ‘skip’ the SDK render command…I was thinking to keep it more generic and useful for other non-render jobs in the future, by exposing a “Do NOT render” command in the 3dsMax specific plugin info job. It doesn’t actually have to be a new function inside of the Lightning.dlx; it could just be handled in the plugin py code by effectively skipping the command, i.e. like when it’s a MaxScript job. Shout if you need me to re-send Max-Deadline job files.
See attached. At the moment, it’s just empty space and would be really useful to support long “text strings” for all plugin settings…

Yep, looks like alpha 1 feedback is still WIP.
See attached Max2013 file. Open file, Open SMTD, Click on Tiles tab, Click on “GET from Camera…”, select only option (16 tiles - 4x4), enable Multi-Region Rendering, ensure Draft settings at bottom of rollout are enabled, ie: “Use Draft for Assembly” & “Submit Dependent Assembly Job” and the Max file will get into the queue, but the Draft job fails to be submitted.

DraftTileRenderingTest_MultiRegions_SingleFrame.zip (21.3 KB)

Sorry, I just don’t have access to everything I would need to test/re-create this situation. Perhaps, another studio will be able to test? However, couple of questions. Is the Pulse web-service and Python API service both run as non-blocking, separate threads? So even given worst case scenario of both services crashing, it still wouldn’t bring down Pulse or in my situation, just lock it up and stop it from working? If a Python script such as a PM command is executed which is a locally executed Python script, does the PM thread also run as a separate thread? ie: If the PM thread executed a machine start-up command which was a Python IPMI wake-up script AND someone also executed a Pulse Service script either via a mobile app or say, web-browser based, would they clash? As I said, very hard for me to re-create my previous situations. Sorry.
OK. Yep, verbose is enabled.
Wow! Never thought of that! Being able to run the slaves, but not actually be allowed to do any “processing” without a license is so much more flexible! Nice

rrussell · October 16, 2013, 3:50pm

Cool! Thanks for the additional info. In beta 8, there will be an option in SMTD under the Render tab called Disable Frame Rendering.
Yup, it’s on the todo list, just not sure if we’ll have time to fix it before 6.1 goes out.
Thanks! I think Bobo is doing some work on the region rendering stuff right now. I’ll point him to this.

Cheers,

Ryan

grant.bartel · October 18, 2013, 9:12pm

Hey Mike,

Quick update for 9. This is fixed; we were using the render element class instead of the name when naming the config files, so we had a duplicate. It will be in the new build on Monday.
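
To make the class-versus-name collision concrete, here is a minimal sketch. The (name, class) pairs are loosely based on the element folders in the MXS Listener output earlier in the thread. The simplified file name pattern and the helper functions are assumptions for illustration, not SMTD's actual naming code.

```python
from collections import Counter

# Hypothetical render elements as (name, class) pairs.
elements = [
    ("MultiMatteElement1", "MultiMatteElement"),
    ("MultiMatteElement2", "MultiMatteElement"),
    ("MultiMatteElement3", "MultiMatteElement"),
    ("VRaySpecular", "VRaySpecular"),
]


def config_names(elements, key_index):
    # key_index 0 builds the file name from the element name, 1 from its class.
    return [f"test_{element[key_index]}0000_config.txt" for element in elements]


def duplicates(names):
    # Return every file name that would be generated more than once.
    return sorted(name for name, count in Counter(names).items() if count > 1)


by_class = duplicates(config_names(elements, 1))
by_name = duplicates(config_names(elements, 0))
```

Keying on the class yields one repeated config name for the three MultiMatte elements, while keying on the element name yields none.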

Grant

MikeOwen · October 18, 2013, 9:28pm

Ah, cool. Cheers Grant!