AWS Thinkbox Discussion Forums

Initial Deadline Setup Issues & Help Required

Not a prob @Mads_Hangaard, we also tried running this job with the following change in deadline.ini:

SlaveDataRoot=%AppData%\Thinkbox\Deadline10

and restarted the Deadline 10 Launcher Service and re-ran the Command Line test task; however, that again resulted in the same error as below:

2023-03-08 18:46:55:  0: Failed to properly create Deadline Worker data folder 'Thinkbox\Deadline10\workers' because: The SlaveDataRoot path in the deadline.ini file isn't a rooted path. (Deadline.Configuration.DeadlineConfigException)
2023-03-08 18:46:55:  0: ERROR: DataController threw an unexpected exception during initialization: FranticX.Database.DatabaseConnectionException: Could not connect to any of the specified Mongo DB servers defined in the "Hostname" parameter of the "settings\connection.ini" file in the root of the Repository.

Anyone else on these forums, would you recommend any other tests or checks to narrow this issue down further?

Thanks,
Bhavik

Additionally, we tried again with C:\ProgramData\Thinkbox\Deadline10\deadline.ini modified to:

SlaveDataRoot=C:\LocalSlaveData

Where C:\LocalSlaveData grants Everyone the Modify, Read & Execute, List folder contents, Read, and Write permissions, and the contents of C:\LocalSlaveData after the failed job execution are:

C:\LocalSlaveData
\---fprdsk113
    +---jobsData
    |   \---6405f65b2f17b32c0725f147
    \---plugins
        \---6405f65b2f17b32c0725f147

So from the log errors, it is still somewhat unclear what exactly the issue is and how to fix it.

@karpreet would you be able to help us with this?

Thanks,
Bhavik

Hello,

  1. Does your deadline repo/settings/connection.ini file (e.g. EnableSSL=False and Authenticate=False) match the mongo config.conf?

  2. Not sure if this affects anything, but assuming you’re on Windows, can you re-write the NetworkRoot and NetworkRoot0 with backslashes (not forward slashes), e.g.: \\fpdeadline01\DeadlineRepository10
    I believe the settings\connection.ini file is getting its root path from the NetworkRoot

  3. From the docs: The windows path location for the worker is: %PROGRAMDATA%\Thinkbox\Deadline[VERSION]\workers\[WORKERNAME]
    Does %PROGRAMDATA% expand/resolve correctly on your render node e.g.:

PS C:\Users\deadline> dir env:programdata

Name                           Value
----                           -----
ProgramData                    C:\ProgramData
  4. Regarding your test of SlaveDataRoot=C:\LocalSlaveData: I think the permissions need to be Full Control (this folder, subfolders, and files), as the worker creates and deletes the jobsData and plugins folders and files under workers\[WORKERNAME]
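One quick way to verify that point is to check, while running as the same account the Worker runs under, that files and folders can be both created and deleted beneath the SlaveDataRoot. A minimal Python sketch (the function name is mine, and C:\LocalSlaveData is just the path from this thread):

```python
import os
import tempfile

def can_create_and_delete(root):
    """Return True if this account can create and delete files and
    folders under root, mirroring what the Worker does with the
    jobsData and plugins folders."""
    try:
        probe = tempfile.mkdtemp(dir=root)      # create a throwaway folder
        path = os.path.join(probe, "probe.txt")
        with open(path, "w") as f:              # create a file inside it
            f.write("ok")
        os.remove(path)                         # then delete both again
        os.rmdir(probe)
        return True
    except OSError:
        return False

# Run as the account the Worker runs under, e.g.:
# print(can_create_and_delete(r"C:\LocalSlaveData"))
```

If this returns False for the configured SlaveDataRoot, the Full Control permission (or inheritance to subfolders and files) is the likely gap.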

Do you get the same error as above when SlaveDataRoot is set to C:\LocalSlaveData?

Usually the permissions right under C:\ are inherited from the parent.

If you are logged into the render node as the same user the Worker is running under, recreate the issue and follow here: Troubleshooting — Deadline 10.2.0.10 documentation
Do you get the same error?

Do a similar test with AppData location too.

Hey, I wanted to add to my response: %APPDATA%, or any path that starts with an environment variable like that, cannot be expanded by the Worker, because only Windows can perform that expansion.
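That squares with the "isn't a rooted path" error earlier in the thread: the Worker takes the deadline.ini value literally, and a literal %AppData%\... string is not an absolute (rooted) Windows path. A small illustration using Python's ntpath module, which applies Windows path rules on any OS:

```python
import ntpath  # Windows path semantics, runnable anywhere

# The value as written in deadline.ini is taken literally by the Worker:
print(ntpath.isabs(r"%AppData%\Thinkbox\Deadline10"))  # False -> not rooted
# An already-expanded, rooted path is what the Worker needs:
print(ntpath.isabs(r"C:\LocalSlaveData"))              # True
```

So any SlaveDataRoot value has to be written out as a fully expanded drive-rooted path.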

Hi @jarak

Sorry, I couldn’t catch up on this thread earlier. Please find the responses below.

[screenshot]

Running the following in PowerShell on the Deadline Worker gives this output:

dir env:programdata

Name                           Value
----                           -----
ProgramData                    C:\ProgramData

As for the SlaveDataRoot, it has full permissions:

[screenshot]

Noted @zainali, thanks for that info, and sure, we will keep this in mind.

Additionally, I am trying to run deadlinecommand to get the DatabaseSettings, which seems to work fine for the local (LAN) Deadline Repository. However, the same isn’t working for the remote Deadline Repository, to which we are able to connect over RCS through the Deadline Monitor GUI.

Please refer to the below example outputs for more details and reference,

Working for Local Deadline Repository:

"%DEADLINE_PATH%"\deadlinecommand -GetDatabaseSettings //fpdeadline01/DeadlineRepository10
 Database Type: MongoDB
   Hostname(s): fpdeadline01;192.168.0.117
 Database Name: deadline10db
          Port: 27100
      Port Alt: 0
   SSL Enabled: False
  Authenticate: False
     User Name:
   Replica Set:
      Split DB: False
       Version: 10

Failing for remote Deadline Repository:

option1:

"%DEADLINE_PATH%"\deadlinecommand -GetDatabaseSettings sgtdlrepo:4433;"D:\RCS Certs\RCS Certs\Deadlin10RemoteClient.pfx"
 Database Type: Unknown
   Hostname(s):
 Database Name:
          Port: -1
      Port Alt: -1
   SSL Enabled: False
  Authenticate: False
     User Name:
   Replica Set:
      Split DB: False
       Version: 6

option2:

"%DEADLINE_PATH%"\deadlinecommand -GetDatabaseSettings sgtdlrepo:4433
 Database Type: Unknown
   Hostname(s):
 Database Name:
          Port: -1
      Port Alt: -1
   SSL Enabled: False
  Authenticate: False
     User Name:
   Replica Set:
      Split DB: False
       Version: 6

option3:

"%DEADLINE_PATH%"\deadlinecommand RunCommandForRepository Remote sgtdlrepo:4433;"D:/RCS Certs/RCS Certs/Deadline10RemoteClient.pfx" -GetDatabaseSettings
Warning: This command does not support "RunCommandForRepository" and that option will be ignored.
An error occurred while updating the database settings:

Index was outside the bounds of the array. (System.IndexOutOfRangeException)

option4:

"%DEADLINE_PATH%"\deadlinecommand -GetDatabaseSettings remote sgtdlrepo:4433;"D:\RCS Certs\RCS Certs\Deadlin10RemoteClient.pfx"
 Database Type: Unknown
   Hostname(s):
 Database Name:
          Port: -1
      Port Alt: -1
   SSL Enabled: False
  Authenticate: False
     User Name:
   Replica Set:
      Split DB: False
       Version: 6

GetDatabaseSettings doesn’t work when pointed at the RCS. It’s reading out what’s set in DeadlineRepository10\settings\connection.ini. The command should be smart enough to tell you this in a human friendly way. I’ll get a dev ticket in to improve that.

If you want to confirm database settings, you’ll have to run that command on the RCS machine itself as it’ll be using a direct connection to the database.

To test the connection to the database you could instead use something like deadlinecommand -GetPools, which will prove connectivity and pull data from the database.


Sure @Justin_B, got that, and that’s very useful info.

My apologies if I should be posting this to some other topics and threads, but since I am new both to Deadline itself as well as these forums, a few quick questions:

  1. For already submitted jobs - how do we edit its “Job Info Parameters” as well as its “Plugin Info Parameters”?

  2. How can we edit the already submitted Gaffer job to add the “-threads 16” parameter, and how can we get the job’s execution to pick up and respect that flag in the actual command-line execution?

  3. Can we have the Job’s “Concurrent Tasks” be a dynamic or programmatically calculated value? For example, for the Deadline-Gaffer job, we want to drive this from the job’s submission parameter “-threads int” value, hoping to implement it along the lines below:

where the goal is to run concurrent tasks on Deadline workers with heterogeneous core counts (say a mix of 16, 32, 72, and 128 cores), such that the job’s requested rendering thread count determines how many concurrent tasks each worker is assigned. For example, with

-threads 16

on a 16 core worker - it should assign/run one job/task
on a 32 core worker - it should assign/run two concurrent jobs/tasks
on a 72 core worker - it should assign/run four concurrent jobs/tasks
on a 128 core worker - it should assign/run eight concurrent jobs/tasks
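The mapping above amounts to integer division of the worker's core count by the job's thread count. A minimal sketch of the calculation (the function name is mine, not a Deadline API):

```python
def concurrent_tasks(worker_cores, job_threads=16):
    """Concurrent tasks a worker could run at job_threads cores each,
    never fewer than one."""
    return max(1, worker_cores // job_threads)

# Reproduces the mapping described above:
for cores in (16, 32, 72, 128):
    print(cores, "cores ->", concurrent_tasks(cores), "task(s)")
# 16 -> 1, 32 -> 2, 72 -> 4 (floor of 4.5), 128 -> 8
```

On the worker itself, the core count could come from something like os.cpu_count(); how to feed the result back into Deadline's scheduling is the open question discussed below in the thread.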

this is what I am exploring right now,

Thanks,
Bhavik

The connection.ini and mongodb config seem good.

Can you please post your deadline.ini file? I think in the other thread/post you have an entry for ConnectionType=Repository, but you also have ProxyRoot=fpdeadline01:8080

Maybe someone from Thinkbox can answer, but I think this may be where your worker is getting confused. If the ConnectionType is Repository or Direct, it will use NetworkRoot. But if the ConnectionType is Remote, it will use ProxyRoot (Proxy* etc). Not sure if having both types of entries is the issue, since one would expect it to ignore the Proxy keys if the ConnectionType is set to Repository.

For the next test, I suggest having only one or the other (i.e. Repository-type or Remote-type entries).
Can you try removing the Proxy* and ClientSSLAuthentication entries?
e.g.:

[Deadline]
LicenseMode=LicenseFree
LicenseServer=
Region=
LauncherListeningPort=17000
LauncherServiceStartupDelay=60
AutoConfigurationPort=17001
SlaveStartupPort=17003
SlaveDataRoot=
RestartStalledSlave=false
NoGuiMode=false
LaunchSlaveAtStartup=1
AutoUpdateOverride=
ConnectionType=Repository
NetworkRoot=\\fpdeadline01\DeadlineRepository10
DbSSLCertificate=
NetworkRoot0=\\fpdeadline01\DeadlineRepository10
LogReportSyntaxHighlighting=true

And just to double check: your worker can reach the repo using the path \\fpdeadline01\DeadlineRepository10?

For the SlaveDataRoot, you did not show the Full Control permission checkbox, which is just above Modify. Can you click on Advanced and double-check that the permission entries are:

Type Allow
Principal Everyone
Access Full Control
Inherited from None
Applies to This folder, subfolders, and files

Sure @jarak. Please note that we are now working with two Deadline Repositories/Servers: one is local/LAN and the other is remote over RCS. Attaching the latest deadline.ini from C:\ProgramData\Thinkbox\Deadline10.

Yes, you are right, but from what I can tell, in the Deadline Monitor GUI we have set the default repository as:

[screenshot]

[screenshot: Permissions for C:\LocalSlaveData]

Also, we will be working with both of these repositories: DirectConnection (local, on-premise) and RCS (remote, cloud). So it is necessary for us to get things working with both. Any thoughts on fixing or setting things up for our use case would be very helpful.

Thanks,
Bhavik

Additionally, I am noticing connections from the Deadline Worker to the Deadline Repository on the ports below:

netstat -a | find "fprdsk113"
  TCP    192.168.0.117:445      fprdsk113:50214        ESTABLISHED
  TCP    192.168.0.117:3389     fprdsk113:59640        ESTABLISHED
  TCP    192.168.0.117:27100    fprdsk113:49739        ESTABLISHED
  TCP    192.168.0.117:27100    fprdsk113:49740        ESTABLISHED
  TCP    192.168.0.117:27100    fprdsk113:49741        ESTABLISHED
  TCP    192.168.0.117:27100    fprdsk113:49742        ESTABLISHED
  TCP    192.168.0.117:27100    fprdsk113:49796        ESTABLISHED
  TCP    192.168.0.117:27100    fprdsk113:49797        ESTABLISHED
  TCP    192.168.0.117:27100    fprdsk113:49798        ESTABLISHED
  TCP    192.168.0.117:27100    fprdsk113:49799        ESTABLISHED
  TCP    192.168.0.117:27100    fprdsk113:50441        ESTABLISHED
  TCP    192.168.0.117:27100    fprdsk113:50442        ESTABLISHED
  TCP    192.168.0.117:27100    fprdsk113:50443        ESTABLISHED
  TCP    192.168.0.117:27100    fprdsk113:50444        ESTABLISHED
  TCP    192.168.0.117:27100    fprdsk113:50448        ESTABLISHED
  TCP    192.168.0.117:27100    fprdsk113:50449        ESTABLISHED
  TCP    192.168.0.117:27100    fprdsk113:50450        ESTABLISHED
  TCP    192.168.0.117:27100    fprdsk113:50451        ESTABLISHED
  TCP    192.168.0.117:27100    fprdsk113:57177        ESTABLISHED
  TCP    192.168.0.117:27100    fprdsk113:57178        ESTABLISHED
  TCP    192.168.0.117:27100    fprdsk113:57179        ESTABLISHED
  TCP    192.168.0.117:27100    fprdsk113:57180        ESTABLISHED
  TCP    192.168.0.117:27100    fprdsk113:58965        ESTABLISHED
  TCP    192.168.0.117:27100    fprdsk113:58983        ESTABLISHED
  TCP    192.168.0.117:27100    fprdsk113:59261        ESTABLISHED
  TCP    192.168.0.117:27100    fprdsk113:59380        ESTABLISHED
  TCP    192.168.0.117:27100    fprdsk113:59417        ESTABLISHED

C:\LocalSlaveData needs to have “Full Control” permissions granted to it.

from MS:
Full control: Allows users to read, write, change, and delete files and subfolders.

The Deadline Worker needs Full Control permissions in order to create and delete the temporary jobsData and plugins folders.

Sure, thanks for pointing that out @jarak. I have applied the permission fixes, rebooted the Deadline Worker node, and tried running the job again, but no luck! I am sure something is going wrong on our side, but I am not sure how to narrow it down and fix it.

Attaching Job_2023-04-04_12-08-29_642bc60748224261782ad700.7z the latest re-run failed/error log for your perusal.

Thanks,
Bhavik

  1. For already submitted jobs - how do we edit its “Job Info Parameters” as well as its “Plugin Info Parameters”?
    If the property you want to change isn’t exposed in the Job’s Modify Job Properties window you’ll have to resubmit the job with the desired settings.
  2. How can we edit the already submitted Gaffer job to add the “-threads 16” parameter to it? and how can we get the respective job’s execution also to pick it up and respect that flag for the actual command line execution?
    You’re referring to this Gaffer plugin? It looks like you’d have to either edit the plugin or reach out to the author.
  3. Can we have the Job’s “Concurrent Tasks” be a dynamic or programmatically calculated value? For example, for the Deadline-Gaffer job, we want to drive this through the job’s submission parameter “-threads int” value. And hoping to implement it in a way like the one below:

You can modify the concurrent tasks programmatically; the issue is that it will apply to the whole job and not just to tasks picked up by individual Workers. Depending on how high the --threads option can go, you might be able to interrogate the Worker for its CPU thread count and build that argument then. I’m not certain of the nitty-gritty of how to do that, but it should be possible.

You could also always submit with --threads 16 and set the maximum concurrent tasks for each size of worker. Then on the job set the concurrent tasks to some high number, and the Workers will bring it down to their local maximum set in ‘Concurrent Task Limit Override’.

The deadline.ini didn’t get attached above so I’m going to assume it’s got SlaveDataRoot= in it, which will mean Deadline uses the default path for the operating system which should be %PROGRAMDATA%\Thinkbox\Deadline[VERSION]\workers\[WORKERNAME].

But for some reason that output isn’t acceptable, so I’d like to know what’s being used. To that end, run the attached script getslavedataroot.py (464 Bytes) on the fprdsk113 machine. I’d expect to see something odd there that’ll give us some direction.

Sorry about that @Justin_B. I may have missed attaching it, or the upload was rejected with:

Sorry, the file you are trying to upload is not authorized (authorized extensions: jpg, jpeg, png, gif, 7z, tar, targz, zip, exr, py).

and I missed addressing that.

Attaching the deadline.ini in a zip format here:
deadline.zip (572 Bytes)

Please find below the output of getslavedataroot.py script:

"%DEADLINE_PATH%"\deadlinecommand -ExecuteScript C:\Users\bsukhadia\Documents\Deadline\forums\from_support\getslavedataroot.py
'C:\Users\bsukhadia\AppData\Local\Thinkbox\Deadline10\pythonAPIs\2022-11-22T212046.0000000Z' already exists. Skipping extraction of PythonSync.
DeadlineClientLocalDataHome = C:\ProgramData\Thinkbox\Deadline10
The path being used as SlaveDataRoot is C:\ProgramData\Thinkbox\Deadline10\workers

Hi @Justin_B

Can you please help with a deadlinecommand example snippet where we can reuse all the settings of an already submitted Gaffer job, with -threads n added to it, and resubmit it as a new job?

Yes that’s the Gaffer plugin from @egmehl we are using and trying to get -threads n working.

Thanks for this suggestion @Justin_B, it sounds like a fair workaround and we will give it a try soon. However, the edge case I see is that we will not be able to prevent a worker from picking up a Gaffer job when its CPU count is less than the job’s requested rendering thread count, isn’t it?

For this example case, if the job is submitted with -threads 16 and the Deadline worker has 12 CPUs/cores, it will still pick up or be assigned this job, right?

Thanks,
Bhavik
Fractal Picture

Unfortunately not. From the Gaffer thread, there isn’t an option to set -threads. Unless there’s a setting that will take arbitrary flags, the plugin will have to be reworked to add that option. As-is, there is no way in Deadline to append to the arguments used for a task without modifying the application plugin.

If that does exist, you can export the job’s submission files with this deadlinecommand flag:

GenerateSubmissionInfoFiles <Job ID> <Job Info File> <Plugin Info File>
  Generates a Job Info file and a Plugin Info file that can be used to submit a
  new Job, based on an existing one.
    Job ID                   The ID of the Job on which to base the
                             Submission Parameters.
    Job Info File            The file to which the Job Submission Info will
                             be output.
    Plugin Info File         The file to which the Plugin Submission Info
                             will be output.
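Assuming such a setting existed, the round trip could be scripted. A sketch in Python, where the Arguments= key name is purely an assumption about what a plugin might read (check the Gaffer plugin before relying on it):

```python
def add_threads_flag(plugin_info_path, threads):
    """Append a hypothetical Arguments entry carrying a -threads flag to
    an exported plugin info file. Whether the target plugin reads an
    "Arguments" key is an assumption, not confirmed Deadline behavior."""
    with open(plugin_info_path, "a") as f:
        f.write("Arguments=-threads %d\n" % threads)

# Sketch of the full round trip (run where deadlinecommand is available):
#   deadlinecommand GenerateSubmissionInfoFiles <JobID> job.info plugin.info
#   add_threads_flag("plugin.info", 16)
#   deadlinecommand job.info plugin.info   # submits the copy as a new job
```

Submitting a job by passing a job info file and a plugin info file to deadlinecommand is the standard path; only the appended key is speculative here.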

Yep, the Worker doesn’t query a job’s render settings before dequeuing tasks. If you want a breakdown of what is considered, job scheduling is broken down here.
