AWS Thinkbox Discussion Forums

Initial Deadline Setup Issues & Help Required

Hi,

We are running into the errors below when trying to run a command-line test job. We have only just started setting up Deadline.

2023-03-06 21:50:44: 0: Failed to properly create Deadline Worker data folder 'Thinkbox\Deadline10\workers' because: The SlaveDataRoot path in the deadline.ini file isn't a rooted path. (Deadline.Configuration.DeadlineConfigException)

and

2023-03-06 21:50:44: 0: ERROR: DataController threw an unexpected exception during initialization: FranticX.Database.DatabaseConnectionException: Could not connect to any of the specified Mongo DB servers defined in the "Hostname" parameter of the "settings\connection.ini" file in the root of the Repository.

Mongo DB “settings\connection.ini” info:
Hostname=fpdeadline01;192.168.0.107

We are able to telnet to fpdeadline01 on port 27100 from the same worker machine that reports the errors above. Please find the MongoDB config file below for your reference:

#MongoDB config file

systemLog:
  destination: file
  # Mongo DB's output will be logged here.
  path: C:\DeadlineDatabase10\mongo\data\logs\log.txt
  # Default to quiet mode to limit log output size. Set to 'false' when debugging.
  quiet: true
  # Increase verbosity level for more debug messages (max: 5)
  verbosity: 0

net:
  # Port MongoDB will listen on for incoming connections
  port: 27100
  ipv6: true
  ssl:
    # SSL/TLS options
    mode: disabled
    # If enabling TLS, the below options need to be set:
    #PEMKeyFile:
    #CAFile:
  # By default mongo will only use localhost, this will allow us to use the IP Address
  bindIpAll: true

storage:
  # Database files will be stored here
  dbPath: C:\DeadlineDatabase10\mongo\data
  engine: wiredTiger

security:
  authorization: disabled
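
The telnet check described above can also be scripted so that every entry in the Hostname list is tried, not just the first. This is a minimal sketch in Python, not part of Deadline: the function names are hypothetical, and the host list and port are the ones quoted in this post. A successful TCP connection only shows that the port is reachable; it does not prove that the Worker can complete a MongoDB handshake.

```python
import socket


def parse_hostnames(value):
    """Split a semicolon-separated Hostname value into individual hosts."""
    return [h.strip() for h in value.split(";") if h.strip()]


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def check_hosts(hostname_value, port, timeout=3.0):
    """Map each host in the Hostname value to its reachability on the port."""
    return {h: can_connect(h, port, timeout) for h in parse_hostnames(hostname_value)}
```

On the worker, `check_hosts("fpdeadline01;192.168.0.107", 27100)` would report each entry separately, which helps when one name resolves and the other address is stale.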

Any thoughts or pointers to fix these will be really helpful.

Thanks,
Fractal Picture

Hey, I’m very new to Deadline too, but I’ll take a guess at your SlaveDataRoot issue. It sounds like the SlaveDataRoot inside the deadline.ini on the given worker isn’t a rooted path. For example, if the worker is a Windows computer, I believe the path has to start with a drive letter, or with a double backslash if it’s a network drive, like C:\Thinkbox\Deadline10 or \\networkDrive\Thinkbox\Deadline10.
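
To make the "rooted path" idea concrete, here is a minimal sketch that uses Python's `ntpath.isabs` as a stand-in for the check. This is an assumption for illustration only: Deadline performs its own check internally, and its exact rules may differ from Python's. The helper name is hypothetical; the example paths are the ones from this thread.

```python
import ntpath


def is_rooted_windows_path(path):
    """Approximate a rooted-path check for Windows-style paths.

    ntpath applies Windows path rules on any operating system.
    """
    return ntpath.isabs(path)


examples = [
    r"Thinkbox\Deadline10\workers",         # the relative path from the error log
    r"C:\Thinkbox\Deadline10",              # drive letter
    r"\\networkDrive\Thinkbox\Deadline10",  # UNC share
    r"%AppData%\Thinkbox\Deadline10",       # unexpanded environment variable
]

for p in examples:
    print(f"{p}: rooted={is_rooted_windows_path(p)}")
```

Only the drive-letter and UNC forms are reported as rooted here. The first entry matches the folder named in the error message, which suggests the Worker ended up with an empty or unresolved root in front of Thinkbox\Deadline10\workers.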

Thanks, @Mads_Hangaard. While your point is correct, the deadline.ini (in C:\ProgramData\Thinkbox\Deadline10) on our Deadline client machines looks like this:

[Deadline]
LicenseMode=LicenseFree
LicenseServer=
Region=
LauncherListeningPort=17000
LauncherServiceStartupDelay=60
AutoConfigurationPort=17001
SlaveStartupPort=17003
SlaveDataRoot=
RestartStalledSlave=false
NoGuiMode=false
LaunchSlaveAtStartup=1
AutoUpdateOverride=
ConnectionType=Repository
ProxyRoot=fpdeadline01:8080
ProxyUseSSL=False
ProxySSLCertificate=
ProxySSLCA=
ClientSSLAuthentication=Required
NetworkRoot=//fpdeadline01/DeadlineRepository10
DbSSLCertificate=
NetworkRoot0=//fpdeadline01/DeadlineRepository10
ProxyRoot0=fpdeadline01:8080
LogReportSyntaxHighlighting=true

SlaveDataRoot= is empty by default, so we would expect either the defaults to work correctly or an error message that is more specific about which default location the write attempt is being made to and what fix would get this working.

Does that make sense to you? Looking forward to your thoughts.

Thanks,
Bhavik

I see. Then I don’t know why the error is happening. But I can tell you that when we set up Deadline, the default was C:\Users\USER\AppData\Local\Thinkbox\Deadline10. I’m sorry I couldn’t be of better assistance.

Not a problem, @Mads_Hangaard. We also tried running this job with the change below in deadline.ini:

SlaveDataRoot=%AppData%\Thinkbox\Deadline10

We then restarted the Deadline 10 Launcher Service and re-ran the Command Line test task. However, that again resulted in the same errors:

2023-03-08 18:46:55:  0: Failed to properly create Deadline Worker data folder 'Thinkbox\Deadline10\workers' because: The SlaveDataRoot path in the deadline.ini file isn't a rooted path. (Deadline.Configuration.DeadlineConfigException)
2023-03-08 18:46:55:  0: ERROR: DataController threw an unexpected exception during initialization: FranticX.Database.DatabaseConnectionException: Could not connect to any of the specified Mongo DB servers defined in the "Hostname" parameter of the "settings\connection.ini" file in the root of the Repository.

Would anyone else on these forums recommend any other tests or checks to narrow this issue down further?

Thanks,
Bhavik

Additionally, we also tried again with the C:\ProgramData\Thinkbox\Deadline10\deadline.ini modified with:

SlaveDataRoot=C:\LocalSlaveData

C:\LocalSlaveData grants Everyone the Modify, Read & Execute, List folder contents, Read, and Write permissions. After the failed job execution, its contents are:

C:\LocalSlaveData
\---fprdsk113
    +---jobsData
    |   \---6405f65b2f17b32c0725f147
    \---plugins
        \---6405f65b2f17b32c0725f147

So from the log errors, it is still unclear what exactly the issue is and how we fix it.

@karpreet would you be able to help us with this?

Thanks,
Bhavik

Hello,

  1. Does your deadline repo/settings/connection.ini file (e.g. EnableSSL=False and Authenticate=False) match the mongo config.conf ?

  2. Not sure if this affects anything, but assuming you’re on Windows, can you rewrite NetworkRoot and NetworkRoot0 with backslashes (not forward slashes), e.g.: \\fpdeadline01\DeadlineRepository10
    I believe the settings\connection.ini file is getting its root path from the NetworkRoot

  3. From the docs: the Windows path location for the worker is: %PROGRAMDATA%\Thinkbox\Deadline[VERSION]\workers\[WORKERNAME]
    Does %PROGRAMDATA% expand/resolve correctly on your render node e.g.:

PS C:\Users\deadline> dir env:programdata

Name                           Value
----                           -----
ProgramData                    C:\ProgramData

  4. For your test of SlaveDataRoot=C:\LocalSlaveData, I think the permissions need to be Full Control (this folder, subfolders, and files), as the worker creates and deletes the jobsData and plugins folders and files under workers\[WORKERNAME]

Do you get the same error ^ when SlaveDataRoot is set to C:\LocalSlaveData?

Usually the permissions on folders directly under C:\ are inherited from the parent.

If you are logged in to the render node as the same user the Worker is running under, recreate the issue and follow the Troubleshooting page of the Deadline 10.2.0.10 documentation.
Do you get the same error?

Do a similar test with AppData location too.

Hey, I wanted to add to my response. %APPDATA%, or any path that starts with an environment variable like this, cannot be expanded by the Worker, because only Windows can do that; the Worker cannot.
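
A small sketch of why the unexpanded value fails: if nothing substitutes %AppData% before the path is checked, the string is just a relative path that happens to begin with a percent sign. The helper below is hypothetical and only imitates shell-style expansion. The APPDATA value is an illustrative assumption, and `ntpath.isabs` is again used as a stand-in for Deadline's own rooted-path check.

```python
import ntpath
import re


def expand_percent_vars(path, env):
    """Replace %NAME% tokens with values from env (names are case-insensitive).

    Unknown variables are left in place, as the Windows shell does.
    """
    lookup = {name.upper(): value for name, value in env.items()}

    def substitute(match):
        return lookup.get(match.group(1).upper(), match.group(0))

    return re.sub(r"%([^%\\/]+)%", substitute, path)


raw = r"%AppData%\Thinkbox\Deadline10"

# Hypothetical environment for illustration; real values differ per user.
env = {"APPDATA": r"C:\Users\deadline\AppData\Roaming"}

expanded = expand_percent_vars(raw, env)
print(raw, "-> rooted:", ntpath.isabs(raw))
print(expanded, "-> rooted:", ntpath.isabs(expanded))
```

In practice this means writing the already expanded path into deadline.ini, as was later done with SlaveDataRoot=C:\LocalSlaveData.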

Hi @jarak

Sorry I couldn’t catch up on this thread earlier. Please find the responses below.

image
and

at the Deadline worker, PowerShell gives this output:

dir env:programdata

Name                           Value
----                           -----
ProgramData                    C:\ProgramData

As for the SlaveDataRoot, it has full permissions:
image

Noted, @zainali. Thanks for that info; we will keep this in mind.

Additionally, I am trying to run deadlinecommand to get the DatabaseSettings, which seems to work fine for the local (LAN) Deadline Repository. However, the same isn’t working for the remote Deadline Repository, which we are able to connect to over the RCS through the Deadline Monitor GUI.

Please refer to the below example outputs for more details and reference,

Working for Local Deadline Repository:

"%DEADLINE_PATH%"\deadlinecommand -GetDatabaseSettings //fpdeadline01/DeadlineRepository10
 Database Type: MongoDB
   Hostname(s): fpdeadline01;192.168.0.117
 Database Name: deadline10db
          Port: 27100
      Port Alt: 0
   SSL Enabled: False
  Authenticate: False
     User Name:
   Replica Set:
      Split DB: False
       Version: 10

Failing for remote Deadline Repository:

option1:

"%DEADLINE_PATH%"\deadlinecommand -GetDatabaseSettings sgtdlrepo:4433;"D:\RCS Certs\RCS Certs\Deadlin10RemoteClient.pfx"
 Database Type: Unknown
   Hostname(s):
 Database Name:
          Port: -1
      Port Alt: -1
   SSL Enabled: False
  Authenticate: False
     User Name:
   Replica Set:
      Split DB: False
       Version: 6

option2:

"%DEADLINE_PATH%"\deadlinecommand -GetDatabaseSettings sgtdlrepo:4433
 Database Type: Unknown
   Hostname(s):
 Database Name:
          Port: -1
      Port Alt: -1
   SSL Enabled: False
  Authenticate: False
     User Name:
   Replica Set:
      Split DB: False
       Version: 6

option3:

"%DEADLINE_PATH%"\deadlinecommand RunCommandForRepository Remote sgtdlrepo:4433;"D:/RCS Certs/RCS Certs/Deadline10RemoteClient.pfx" -GetDatabaseSettings
Warning: This command does not support "RunCommandForRepository" and that option will be ignored.
An error occurred while updating the database settings:

Index was outside the bounds of the array. (System.IndexOutOfRangeException)

option4:

"%DEADLINE_PATH%"\deadlinecommand -GetDatabaseSettings remote sgtdlrepo:4433;"D:\RCS Certs\RCS Certs\Deadlin10RemoteClient.pfx"
 Database Type: Unknown
   Hostname(s):
 Database Name:
          Port: -1
      Port Alt: -1
   SSL Enabled: False
  Authenticate: False
     User Name:
   Replica Set:
      Split DB: False
       Version: 6

GetDatabaseSettings doesn’t work when pointed at the RCS. It’s reading out what’s set in DeadlineRepository10\settings\connection.ini. The command should be smart enough to tell you this in a human friendly way. I’ll get a dev ticket in to improve that.

If you want to confirm database settings, you’ll have to run that command on the RCS machine itself as it’ll be using a direct connection to the database.

To test the connection to the database, you could instead use something like deadlinecommand -getpools, which will prove a connection by pulling data from the database.
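
For anyone who wants to script that check, here is a minimal sketch that wraps the deadlinecommand -getpools call in Python. The function names are hypothetical. It assumes the DEADLINE_PATH environment variable points at the client's bin folder, as in the commands earlier in this thread. How deadlinecommand signals a failed connection (exit code versus error text) is not confirmed here, so the sketch returns both and leaves the interpretation to the caller.

```python
import os
import subprocess


def build_getpools_command(deadline_path):
    """Build the argument list for 'deadlinecommand -getpools'."""
    return [os.path.join(deadline_path, "deadlinecommand"), "-getpools"]


def run_getpools(deadline_path, timeout=60):
    """Run the command and return (exit_code, combined_output).

    A listing of pool names in the output means data was pulled from the
    database; anything else should be read as an error message.
    """
    result = subprocess.run(
        build_getpools_command(deadline_path),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.returncode, result.stdout + result.stderr
```

Usage on a worker would look like `run_getpools(os.environ["DEADLINE_PATH"])`, run as the same user the Worker runs under so that it reads the same deadline.ini.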

Sure @Justin_B, got that, and that’s very useful info.

My apologies if I should be posting this to some other topics and threads, but since I am new both to Deadline itself as well as these forums, a few quick questions:

  1. For already submitted jobs - how do we edit its “Job Info Parameters” as well as its “Plugin Info Parameters”?

  2. How can we edit an already submitted Gaffer job to add the “-threads 16” parameter to it, and how can we get that job’s execution to pick up and respect the flag in the actual command line?

  3. Can we have the Job’s “Concurrent Tasks” be a dynamic or programmatically calculated value? For example, for the Deadline-Gaffer job, we want to drive this through the job’s submission parameter “-threads int” value. And hoping to implement it in a way like the one below:

The goal is to run concurrent tasks on Deadline workers with heterogeneous core counts (say a mix of 16, 32, 72, and 128 cores), where the job’s requested rendering thread count determines how many concurrent tasks each worker is assigned. For example, with

-threads 16

on a 16 core worker - it should assign/run one job/task
on a 32 core worker - it should assign/run two concurrent jobs/tasks
on a 72 core worker - it should assign/run four concurrent jobs/tasks
on a 128 core worker - it should assign/run eight concurrent jobs/tasks
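
The mapping in the list above is integer division of the worker's core count by the requested thread count. Here is a minimal sketch of that arithmetic only; it does not call any Deadline API, the function name is hypothetical, and the choice to still run one task on a worker with fewer cores than requested threads is an assumption.

```python
def concurrent_tasks(worker_cores, threads_per_task):
    """Number of tasks that fit on a worker at the requested thread count."""
    if threads_per_task <= 0:
        raise ValueError("threads_per_task must be positive")
    # Assumption: an undersized worker still runs a single task.
    return max(1, worker_cores // threads_per_task)


for cores in (16, 32, 72, 128):
    print(f"{cores} cores, -threads 16 -> {concurrent_tasks(cores, 16)} task(s)")
```

Note that the 72-core case leaves 8 cores idle (72 // 16 = 4, using 64 cores), which matches the four tasks listed above.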

This is what I am exploring right now.

Thanks,
Bhavik

The connection.ini and MongoDB config seem good.

Can you please post your deadline.ini file? I think in the other thread/post you have an entry for ConnectionType=Repository, but you also have ProxyRoot=fpdeadline01:8080

Maybe someone from Thinkbox can answer, but I think this may be where your worker is getting confused. If the ConnectionType is Repository or Direct, it will use NetworkRoot. But if the ConnectionType is Remote, it will use ProxyRoot (and the other Proxy* keys). Not sure if having both types of entries is the issue, since one would expect it to ignore the Proxy keys if the ConnectionType is set to Repository.
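
To spot this kind of mixed configuration quickly, here is a minimal sketch that parses a deadline.ini and lists the non-empty remote-style keys present while ConnectionType is a direct type. The function name is hypothetical, and the key semantics are the guess described above (Repository or Direct uses NetworkRoot, Remote uses the Proxy* keys), not confirmed behaviour. The sample text is a shortened copy of the deadline.ini posted earlier in this thread.

```python
import configparser


def find_mixed_connection_keys(ini_text):
    """Return (connection_type, remote_keys_set_under_a_direct_type)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep the original key casing
    parser.read_string(ini_text)
    section = parser["Deadline"]
    connection_type = section.get("ConnectionType", "")
    remote_keys = sorted(
        key
        for key, value in section.items()
        if (key.startswith("Proxy") or key == "ClientSSLAuthentication")
        and value.strip()
    )
    if connection_type in ("Repository", "Direct"):
        return connection_type, remote_keys
    return connection_type, []


SAMPLE = """\
[Deadline]
ConnectionType=Repository
ProxyRoot=fpdeadline01:8080
ProxyUseSSL=False
ProxySSLCertificate=
ClientSSLAuthentication=Required
NetworkRoot=//fpdeadline01/DeadlineRepository10
ProxyRoot0=fpdeadline01:8080
"""

print(find_mixed_connection_keys(SAMPLE))
```

An empty list would mean the file contains only one style of entry; it says nothing about whether those entries are themselves correct.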

For the next test, I suggest that you only have one or the other (i.e. Repository or Remote type entries).
Can you try removing the Proxy* and ClientSSLAuthentication entries?
e.g.:

[Deadline]
LicenseMode=LicenseFree
LicenseServer=
Region=
LauncherListeningPort=17000
LauncherServiceStartupDelay=60
AutoConfigurationPort=17001
SlaveStartupPort=17003
SlaveDataRoot=
RestartStalledSlave=false
NoGuiMode=false
LaunchSlaveAtStartup=1
AutoUpdateOverride=
ConnectionType=Repository
NetworkRoot=\\fpdeadline01\DeadlineRepository10
DbSSLCertificate=
NetworkRoot0=\\fpdeadline01\DeadlineRepository10
LogReportSyntaxHighlighting=true

And just to double check, can your worker reach the repo using the path \\fpdeadline01\DeadlineRepository10?

For the SlaveDataRoot, you did not show the Full Control permission checkbox, which is just above Modify. Can you click on Advanced and double check that the permission entries are:

Type: Allow
Principal: Everyone
Access: Full Control
Inherited from: None
Applies to: This folder, subfolders, and files

Sure @jarak. Please note that we are now working with two Deadline Repositories/Servers: one is local/LAN and the other is remote over RCS. Attaching the latest deadline.ini from C:\ProgramData\Thinkbox\Deadline10

Yes, you are right, but from what I can tell, from the Deadline Monitor GUI we have set the default repository as:
image

Permissions for C:\LocalSlaveData

Also, we will be working with both the Direct Connection (local) and RCS (remote) repositories, that is, on-premises and cloud Deadline repositories. So it is necessary for us to get things working with both. Any thoughts on fixing or setting things up for our use case will be very helpful.

Thanks,
Bhavik

Additionally, I am noticing connections from the Deadline Worker to the Deadline Repository on the ports below:

netstat -a | find "fprdsk113"
  TCP    192.168.0.117:445      fprdsk113:50214        ESTABLISHED
  TCP    192.168.0.117:3389     fprdsk113:59640        ESTABLISHED
  TCP    192.168.0.117:27100    fprdsk113:49739        ESTABLISHED
  TCP    192.168.0.117:27100    fprdsk113:49740        ESTABLISHED
  TCP    192.168.0.117:27100    fprdsk113:49741        ESTABLISHED
  TCP    192.168.0.117:27100    fprdsk113:49742        ESTABLISHED
  TCP    192.168.0.117:27100    fprdsk113:49796        ESTABLISHED
  TCP    192.168.0.117:27100    fprdsk113:49797        ESTABLISHED
  TCP    192.168.0.117:27100    fprdsk113:49798        ESTABLISHED
  TCP    192.168.0.117:27100    fprdsk113:49799        ESTABLISHED
  TCP    192.168.0.117:27100    fprdsk113:50441        ESTABLISHED
  TCP    192.168.0.117:27100    fprdsk113:50442        ESTABLISHED
  TCP    192.168.0.117:27100    fprdsk113:50443        ESTABLISHED
  TCP    192.168.0.117:27100    fprdsk113:50444        ESTABLISHED
  TCP    192.168.0.117:27100    fprdsk113:50448        ESTABLISHED
  TCP    192.168.0.117:27100    fprdsk113:50449        ESTABLISHED
  TCP    192.168.0.117:27100    fprdsk113:50450        ESTABLISHED
  TCP    192.168.0.117:27100    fprdsk113:50451        ESTABLISHED
  TCP    192.168.0.117:27100    fprdsk113:57177        ESTABLISHED
  TCP    192.168.0.117:27100    fprdsk113:57178        ESTABLISHED
  TCP    192.168.0.117:27100    fprdsk113:57179        ESTABLISHED
  TCP    192.168.0.117:27100    fprdsk113:57180        ESTABLISHED
  TCP    192.168.0.117:27100    fprdsk113:58965        ESTABLISHED
  TCP    192.168.0.117:27100    fprdsk113:58983        ESTABLISHED
  TCP    192.168.0.117:27100    fprdsk113:59261        ESTABLISHED
  TCP    192.168.0.117:27100    fprdsk113:59380        ESTABLISHED
  TCP    192.168.0.117:27100    fprdsk113:59417        ESTABLISHED
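
A long netstat listing like this is easier to read as a tally per local port. Here is a minimal sketch, with a hypothetical function name and a four-line excerpt of the output above as sample data. In the full listing, 25 of the 27 connections are on port 27100 (the MongoDB port from the config earlier in this thread), with one each on 445 (SMB) and 3389 (RDP).

```python
from collections import Counter


def count_established_by_local_port(netstat_text):
    """Count ESTABLISHED TCP connections per local port in netstat output."""
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Expected layout: protocol, local address, foreign address, state.
        if len(fields) == 4 and fields[0] == "TCP" and fields[3] == "ESTABLISHED":
            local_port = int(fields[1].rsplit(":", 1)[1])
            counts[local_port] += 1
    return counts


SAMPLE = """\
  TCP    192.168.0.117:445      fprdsk113:50214        ESTABLISHED
  TCP    192.168.0.117:3389     fprdsk113:59640        ESTABLISHED
  TCP    192.168.0.117:27100    fprdsk113:49739        ESTABLISHED
  TCP    192.168.0.117:27100    fprdsk113:49740        ESTABLISHED
"""

for port, count in sorted(count_established_by_local_port(SAMPLE).items()):
    print(f"port {port}: {count} connection(s)")
```

Established connections on 27100 suggest that something on fprdsk113 does reach the database at the TCP level, so the connection error in the job log may come from a different process, user, or hostname entry than the one holding these connections.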

C:\LocalSlaveData needs to have “Full Control” permissions granted to it.
DL_perms

from MS:
Full control: Allows users to read, write, change, and delete files and subfolders.

The deadline worker needs full control perms in order to create and delete the temp. jobsData and plugins folders.

Sure, thanks for pointing that out, @jarak. I have fixed the permissions, rebooted the Deadline worker node, and tried running the job again, but no luck! I am sure something is going wrong on our side, but I am not sure how to narrow it down and fix it.

Attaching Job_2023-04-04_12-08-29_642bc60748224261782ad700.7z, the error log from the latest failed re-run, for your perusal.

Thanks,
Bhavik
