AWS Thinkbox Discussion Forums

Many errors in RC2

Hello,

We’ve been testing this latest version for a week, and here’s a breakdown of what I noticed:
-Slaves fail to connect to the repository and show as Offline. Numerous restarts temporarily fix this.
-While rendering, slaves generate errors (see attached); on a 100-task job we get over 100 errors.
-Overall startup speed has gone down. Tasks take 30+ minutes to start rendering.

Please let me know what types of additional logs you would like to see.

All slaves are on Win7 x64, and the launcher is run as a service.

Thanks

Job_2016-04-25_15-48-16_571e748095d1f6061c4c0f5c.txt (88.9 KB)
Job_2016-04-25_16-00-27_571e775bbb6cb108e04f2e24.txt (90.4 KB)

Hey Simon,

Could you upload a full Slave log for one of your runs with the latest version? You should be able to find them in C:\ProgramData\Thinkbox\Deadline8\logs on Windows.

In terms of connecting to the Repository, what is your setup like? Where are the client and server (Proxy, file share, and DB) relative to each other? Are you connecting to the file share & DB directly or via proxy? We do some local caching of Repository scripts and APIs now, so startup time will likely have gone up compared to 7.2, but it definitely shouldn’t be 30 minutes! What kind of startup times were you seeing in the previous betas?

And finally, have you tried running Deadline outside of the service context, to see if that makes any difference on the errors you’re seeing, or the initial startup time?

logs.7z (556 KB)

Client, file share and DB are all on the same subnet.
The connection to DB and file share is direct.
By startup I meant the time from when the task is given to a slave to when it actually shows any activity (progress bar). The slave also shows as rendering, but the Job Plugin field is blank.
In the 7.2 version it took 1-2 minutes to start rendering.
There were no changes to the network; everything is still on 1 Gb, and I have throttling enabled with default values.
On Thursday, I am installing a new file server with 10 Gb connections to match the blade server NIC speed.

I will try to reinstall the deadline clients without services and see if that changes anything.

Does it matter if the DB and license server are on Win7? I know that Win7 has a connection limit, but the repo is on a fileshare. Should I move DB and license to Win2008 server for example?

Overall, when looking at the Monitor, information updates more slowly, as if the slaves are not sending out updates. Sometimes the progress is stuck for 1-2 minutes and then it updates. In 7.2 you were able to see updates every couple of seconds.

Thanks

Gotcha, that time seems even more out of whack given that everything is local and direct-connect.

But yes, the connection limit on Windows 7 would definitely impact things like access to the license server, database, and file shares on that machine. While the connection limit is higher on Win 7 than on previous non-server Windows versions, it’s still relatively low (I think it’s 15 or 20 now, as opposed to 5). Given that the Monitor updates its data continuously and the Slaves would be opening connections for licenses and to talk to the Deadline DB, I imagine you’d reach that cap fairly quickly. You can get an idea of how many connections are being made by running ‘netstat’ from the command line, or even by looking at the MongoDB log, which should be regularly printing out how many current connections are being maintained.
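To make the netstat suggestion concrete, here is a rough sketch that tallies established TCP connections per local port. It assumes the Windows column layout of `netstat -an` (Proto, Local Address, Foreign Address, State); the sample rows, addresses, and ports are made up for illustration and are not taken from this setup.

```python
import subprocess
from collections import Counter


def count_established(netstat_output: str) -> Counter:
    """Count ESTABLISHED TCP connections per local port in `netstat -an` text."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed Windows row layout: TCP  local:port  remote:port  STATE
        if len(fields) >= 4 and fields[0].upper() == "TCP" and fields[3] == "ESTABLISHED":
            local_port = fields[1].rsplit(":", 1)[-1]
            counts[local_port] += 1
    return counts


def run_netstat() -> str:
    """Capture live `netstat -an` output; meant to be called on the server itself."""
    result = subprocess.run(["netstat", "-an"], capture_output=True, text=True, check=True)
    return result.stdout


# Hypothetical sample rows; real output would come from run_netstat().
sample = """
  Proto  Local Address          Foreign Address        State
  TCP    192.0.2.5:27017        192.0.2.21:50112       ESTABLISHED
  TCP    192.0.2.5:27017        192.0.2.22:50113       ESTABLISHED
  TCP    192.0.2.5:445          192.0.2.21:50200       TIME_WAIT
  TCP    0.0.0.0:135            0.0.0.0:0              LISTENING
"""

for port, n in count_established(sample).most_common():
    print(f"port {port}: {n} established")
```

On the machine hosting the DB and license server, passing `run_netstat()` to `count_established` instead of the sample should show which ports are holding the most connections, which can then be compared against the connection cap.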

If you’ve got a server machine around, I would highly suggest moving that stuff over and seeing if there’s an improvement. If that doesn’t change things, we’ll obviously still try to sort this out :slight_smile:

Alright, I moved everything to a Windows Server box. Now I’m having issues with the license server.

8:48:37 (lmgrd) SLOG: FNPLS-INTERNAL-VL1-1024
8:48:37 (lmgrd) Starting vendor daemons ...
8:48:37 (lmgrd) Starting vendor daemon at port 2708
8:48:37 (lmgrd) Using vendor daemon port 2708 specified in license file
8:48:37 (lmgrd) License server manager (lmgrd) startup failed:
8:48:37 (lmgrd) CreateProcess error code: 0x36b1 File= thinkbox.exe

I am using this installer:
ThinkboxLicenseServer-11.13.1.2.1-windows-installer

Solved the license issue; I was missing VC++ 2008.
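For anyone hitting the same lmgrd message: the CreateProcess code in the output above is a standard Windows system error code in hex. 0x36b1 is 14001 in decimal, which Windows documents as ERROR_SXS_CANT_GEN_ACTCTX (the side-by-side configuration is incorrect), and that fits a missing Visual C++ runtime. A small sketch for pulling the code out of a log line follows; the function name and lookup table are just illustrative.

```python
import re

# Only the one code seen in this thread; extend from the Windows system error code list.
KNOWN_CODES = {
    14001: "ERROR_SXS_CANT_GEN_ACTCTX (side-by-side configuration is incorrect)",
}


def decode_createprocess_error(log_text: str):
    """Return (decimal code, description) for the first CreateProcess error, or None."""
    match = re.search(r"CreateProcess error code:\s*(0x[0-9a-fA-F]+)", log_text)
    if match is None:
        return None
    code = int(match.group(1), 16)
    return code, KNOWN_CODES.get(code, "not in table; check the Windows system error codes")


line = "8:48:37 (lmgrd) CreateProcess error code: 0x36b1 File= thinkbox.exe"
print(decode_createprocess_error(line))
```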

Actually, both the 32-bit and 64-bit versions of the MSVC redists should be bundled and installed for you.

Can you send along the installer logs for the license server? They should be in “%temp%” on Windows, so just paste that into Explorer’s address bar, then look for text files with the word “bitrock” in the name.
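If browsing the temp folder by hand is awkward, the same search can be scripted. This is only a sketch: it assumes Python is available on the machine, and the helper name `find_bitrock_logs` is made up here. `tempfile.gettempdir()` normally resolves to the same folder as %temp% on Windows.

```python
import tempfile
from pathlib import Path


def find_bitrock_logs(temp_dir=None):
    """Return files with 'bitrock' in their name, newest first."""
    root = Path(temp_dir or tempfile.gettempdir())
    hits = [p for p in root.iterdir() if p.is_file() and "bitrock" in p.name.lower()]
    # Newest first, since the most recent install attempt is usually the one of interest.
    return sorted(hits, key=lambda p: p.stat().st_mtime, reverse=True)
```

Calling `find_bitrock_logs()` with no argument searches the current user's temp folder; if the installer ran under a different account, pass that account's temp folder explicitly.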

[code]Log started 04/28/2016 at 08:48:14
Preferred installation mode : win32
Trying to init installer in mode win32
Mode win32 successfully initialized
Preparing to Install
Removing Firewall Exceptions
Executing netsh advfirewall firewall delete rule name="Thinkbox"

Script exit code: 1

Script output:

No rules match the specified criteria.

Script stderr:
Program ended with an error exit code

Error running netsh advfirewall firewall delete rule name="Thinkbox": Program ended with an error exit code
Removing Firewall Exceptions
Executing netsh advfirewall firewall delete rule name="lmgrd"
Script exit code: 1

Script output:

No rules match the specified criteria.

Script stderr:
Program ended with an error exit code

Error running netsh advfirewall firewall delete rule name="lmgrd": Program ended with an error exit code
Remove Firewall Exceptions
Executing netsh advfirewall firewall delete rule "Thinkbox License Server"

Script exit code: 0

Script output:

Deleted 1 rule(s).
Ok.

Script stderr:

Remove Firewall Exceptions
Executing netsh advfirewall firewall delete rule "Thinkbox Vendor Daemon"

Script exit code: 0

Script output:

Deleted 1 rule(s).
Ok.

Script stderr:

Removing Old Installation
Executing C:\Program Files (x86)\Thinkbox\License Server\uninstall.exe --mode unattended
Script exit code: 1

Script output:

Script stderr:
The system cannot find the path specified.

Error running C:\Program Files (x86)\Thinkbox\License Server\uninstall.exe --mode unattended: The system cannot find the path specified.
Removing Old Installations
Executing C:\Program Files\Thinkbox\License Server/uninstall.exe --mode unattended
Script exit code: 0

Script output:

Script stderr:

Preparing to Install
Directory already exists: C:\Program Files\Thinkbox\License Server
Unpacking files
Directory already exists: C:\Program Files\Thinkbox\License Server
Unpacking files
Unpacking C:\Program Files\Thinkbox\License Server\vcredist_x86.exe
Unpacking C:\Program Files\Thinkbox\License Server\thinkbox.opt
Unpacking C:\Program Files\Thinkbox\License Server\lmgrd.exe
Unpacking C:\Program Files\Thinkbox\License Server\lmtools.exe
Unpacking C:\Program Files\Thinkbox\License Server\lmutil.exe
Unpacking C:\Program Files\Thinkbox\License Server\thinkbox.exe
Creating Shortcut for Uninstall Thinkbox License Server
Executing C:\Program Files\Thinkbox\License Server/vcredist_x64.exe /q
Script exit code:

Script output:

Script stderr:

Unknown error while running C:\Program Files\Thinkbox\License Server/vcredist_x64.exe /q
Removing Excess Files
Removing Excess files
Unable to start Thinkbox License : Service not responding
Executing netsh advfirewall firewall add rule name="Thinkbox Vendor Daemon" dir=in action=allow program="C:\Program Files\Thinkbox\License Server\thinkbox.exe" enable=yes
Script exit code: 0

Script output:
Ok.

Script stderr:

Executing netsh advfirewall firewall add rule name="Thinkbox License Server" dir=in action=allow program="C:\Program Files\Thinkbox\License Server\lmgrd.exe" enable=yes
Script exit code: 0

Script output:
Ok.

Script stderr:

Creating Uninstaller
Creating uninstaller 25%
Creating uninstaller 50%
Creating uninstaller 75%
Creating uninstaller 100%
Installation completed
Log finished 04/28/2016 at 08:48:55
[/code]
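A log like this is easier to triage by listing only the steps that did not exit with 0. In the log above, the netsh delete failures just mean there was no rule to remove, and the x86 uninstall path did not exist, but the vcredist_x64.exe step reported no exit code at all while only vcredist_x86.exe appears in the unpack list, which may be related to the missing runtime. The sketch below assumes the "Executing ..." / "Script exit code: N" layout seen here; `failed_steps` is an illustrative name, not part of any installer tooling.

```python
def failed_steps(log_text: str):
    """Yield (command, exit_code) for each 'Executing' step whose exit code is not 0."""
    command = None
    for raw in log_text.splitlines():
        line = raw.strip()
        if line.startswith("Executing "):
            command = line[len("Executing "):]
        elif line.startswith("Script exit code:") and command is not None:
            code = line.split(":", 1)[1].strip()
            if code != "0":
                # An empty code means the installer never got a result back.
                yield command, code or "(none reported)"
            command = None


# Excerpt adapted from the log above.
excerpt = r"""
Executing C:\Program Files\Thinkbox\License Server/uninstall.exe --mode unattended
Script exit code: 0
Executing C:\Program Files\Thinkbox\License Server/vcredist_x64.exe /q
Script exit code:
Executing netsh advfirewall firewall delete rule name="lmgrd"
Script exit code: 1
"""

for command, code in failed_steps(excerpt):
    print(f"{code}: {command}")
```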

I think I resolved this issue. While I was looking over the network settings on our blades, I noticed that one of the BMC IPs had changed to the same address as our file share. Stupid mistake on my part :blush:. It seems everything is back to normal.
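In case it helps someone else catch this sooner, a duplicate address is easy to spot if you keep a simple list of which interface has which IP. A minimal sketch, assuming a hand-maintained inventory; the names and addresses below are hypothetical placeholders, not our real ones.

```python
from collections import defaultdict


def find_ip_conflicts(inventory):
    """Map each IP assigned to more than one interface to the names sharing it."""
    by_ip = defaultdict(list)
    for name, ip in inventory.items():
        by_ip[ip].append(name)
    return {ip: names for ip, names in by_ip.items() if len(names) > 1}


# Hypothetical inventory: interface name -> configured IP address.
inventory = {
    "fileshare": "192.0.2.10",
    "blade01-bmc": "192.0.2.10",
    "blade01": "192.0.2.21",
    "blade02-bmc": "192.0.2.32",
}

print(find_ip_conflicts(inventory))
```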

Gotcha, glad to hear you managed to figure it out!
