Hey, I’m very new to Deadline too, but I’ll take a guess at your issue regarding the SlaveDataRoot. It sounds like the SlaveDataRoot inside the deadline.ini on the given worker doesn’t have a root. For example, if the worker is a Windows computer, I believe the path has to start with a drive letter, or a double backslash if it’s a network drive, like C:\Thinkbox\Deadline10 or \\networkDrive\Thinkbox\Deadline10.
Thanks, @Mads_Hangaard. While your point is correct, the deadline.ini (in C:\ProgramData\Thinkbox\Deadline10) on our Deadline client machines looks like this:
[Deadline]
LicenseMode=LicenseFree
LicenseServer=
Region=
LauncherListeningPort=17000
LauncherServiceStartupDelay=60
AutoConfigurationPort=17001
SlaveStartupPort=17003
SlaveDataRoot=
RestartStalledSlave=false
NoGuiMode=false
LaunchSlaveAtStartup=1
AutoUpdateOverride=
ConnectionType=Repository
ProxyRoot=fpdeadline01:8080
ProxyUseSSL=False
ProxySSLCertificate=
ProxySSLCA=
ClientSSLAuthentication=Required
NetworkRoot=//fpdeadline01/DeadlineRepository10
DbSSLCertificate=
NetworkRoot0=//fpdeadline01/DeadlineRepository10
ProxyRoot0=fpdeadline01:8080
LogReportSyntaxHighlighting=true
where SlaveDataRoot=
is empty by default. We therefore expect either the defaults to work correctly, or an error message that is more specific about which default location the write attempt is being made to and what fixes would get this working.
Does that make sense to you? Looking forward to your thoughts.
Thanks,
Bhavik
I see; then I don’t know why the error is happening. But I can tell you that when we set up Deadline, the default was C:\Users\USER\AppData\Local\Thinkbox\Deadline10. I’m sorry I couldn’t be of better assistance.
Not a prob @Mads_Hangaard. We also tried running this job with the below change in deadline.ini:
SlaveDataRoot=%AppData%\Thinkbox\Deadline10
then restarted the Deadline 10 Launcher Service and re-ran the Command Line test task; however, that again resulted in the same error as below:
2023-03-08 18:46:55: 0: Failed to properly create Deadline Worker data folder 'Thinkbox\Deadline10\workers' because: The SlaveDataRoot path in the deadline.ini file isn't a rooted path. (Deadline.Configuration.DeadlineConfigException)
2023-03-08 18:46:55: 0: ERROR: DataController threw an unexpected exception during initialization: FranticX.Database.DatabaseConnectionException: Could not connect to any of the specified Mongo DB servers defined in the "Hostname" parameter of the "settings\connection.ini" file in the root of the Repository.
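For context on the first error: a “rooted” path is simply an absolute one. A quick way to see which SlaveDataRoot values pass that test is a small sketch using Python’s stdlib ntpath module, which applies Windows path rules on any OS (this is only an illustration, not Deadline’s actual validation code):

```python
import ntpath  # Windows path semantics, usable from any OS

# Candidate SlaveDataRoot values discussed in this thread:
candidates = [
    r"C:\LocalSlaveData",                  # rooted: drive letter
    r"\\fpdeadline01\SlaveData\workers",   # rooted: UNC network path
    r"%AppData%\Thinkbox\Deadline10",      # not rooted: literal, unexpanded text
    r"Thinkbox\Deadline10",                # not rooted: relative path
]

for p in candidates:
    print(f"{p}  rooted={ntpath.isabs(p)}")
```

The drive-letter and UNC forms pass the check; the env-var and relative forms do not, which matches the error above.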
Anyone else on these forums, would you recommend any other tests/checks to further narrow down the troubleshooting of this issue?
Thanks,
Bhavik
Additionally, we also tried again with C:\ProgramData\Thinkbox\Deadline10\deadline.ini modified with:
SlaveDataRoot=C:\LocalSlaveData
where C:\LocalSlaveData grants Everyone the Modify, Read & Execute, List folder contents, Read, and Write permissions, and the contents of C:\LocalSlaveData after the failed job execution are:
C:\LocalSlaveData
\---fprdsk113
+---jobsData
| \---6405f65b2f17b32c0725f147
\---plugins
\---6405f65b2f17b32c0725f147
so from the log errors, it is still somewhat unclear what exactly the issue is and how to fix it.
@karpreet would you be able to help us with this?
Thanks,
Bhavik
Hello,
-
Does your deadline repo/settings/connection.ini file (e.g. EnableSSL=False and Authenticate=False) match the mongo config.conf ?
-
Not sure if this affects anything, but assuming you’re on Windows, can you rewrite the NetworkRoot and NetworkRoot0 with backslashes (not forward slashes), e.g.:
\\fpdeadline01\DeadlineRepository10
I believe the settings\connection.ini file is getting its root path from the NetworkRoot.
-
From the docs, the Windows path location for the worker is:
%PROGRAMDATA%\Thinkbox\Deadline[VERSION]\workers\[WORKERNAME]
Does %PROGRAMDATA% expand/resolve correctly on your render node, e.g.:
PS C:\Users\deadline> dir env:programdata
Name Value
---- -----
ProgramData C:\ProgramData
- For your test of SlaveDataRoot=C:\LocalSlaveData, I think the permissions need to be Full Control (this folder, subfolders, and files), as the worker is creating and deleting the jobsData and plugins folders and files under workers\[WORKERNAME]
Do you get the same error ^ when SlaveDataRoot is set to C:\LocalSlaveData?
Usually the permissions right under C are inherited from the parent.
If you are logged in to the render node as the same user the Worker is running under, recreate the issue and follow here: Troubleshooting — Deadline 10.2.0.10 documentation
Do you get the same error?
Do a similar test with AppData location too.
Hey, I wanted to add to my response: %APPDATA%, or any path that starts with an environment variable like this, cannot be expanded by the Worker; only Windows can do that expansion, the Worker cannot.
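To illustrate that point, here is a small Python sketch (the AppData value below is an assumed example; this mimics what a shell would do on Windows, not anything the Worker itself does):

```python
import ntpath
import os

# The literal value from deadline.ini; the Worker sees exactly this text:
raw = r"%AppData%\Thinkbox\Deadline10"

# Expansion is a separate, explicit step. Windows (e.g. the shell) performs
# it; per the note above, the Worker does not. Assumed example value:
os.environ["AppData"] = r"C:\Users\deadline\AppData\Roaming"
expanded = ntpath.expandvars(raw)

print(raw)       # still contains %AppData% verbatim
print(expanded)  # a real, rooted path only after expansion
```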
Hi @jarak
Sorry, I couldn’t catch up on this thread earlier. Please find the responses below.
Running dir env:programdata in PowerShell on the Deadline Worker gives this output:
Name Value
---- -----
ProgramData C:\ProgramData
As for the SlaveDataRoot, it has full permissions.
Additionally, I am trying to run deadlinecommand to get the DatabaseSettings, which seems to be working fine for the local (LAN) Deadline Repository. However, the same isn’t working for the remote Deadline Repository, which we are able to connect to over the RCS through the Deadline Monitor GUI.
Please refer to the example outputs below for more details and reference:
Working for Local Deadline Repository:
"%DEADLINE_PATH%"\deadlinecommand -GetDatabaseSettings //fpdeadline01/DeadlineRepository10
Database Type: MongoDB
Hostname(s): fpdeadline01;192.168.0.117
Database Name: deadline10db
Port: 27100
Port Alt: 0
SSL Enabled: False
Authenticate: False
User Name:
Replica Set:
Split DB: False
Version: 10
Failing for remote Deadline Repository:
option1:
"%DEADLINE_PATH%"\deadlinecommand -GetDatabaseSettings sgtdlrepo:4433;"D:\RCS Certs\RCS Certs\Deadlin10RemoteClient.pfx"
Database Type: Unknown
Hostname(s):
Database Name:
Port: -1
Port Alt: -1
SSL Enabled: False
Authenticate: False
User Name:
Replica Set:
Split DB: False
Version: 6
option2:
"%DEADLINE_PATH%"\deadlinecommand -GetDatabaseSettings sgtdlrepo:4433
Database Type: Unknown
Hostname(s):
Database Name:
Port: -1
Port Alt: -1
SSL Enabled: False
Authenticate: False
User Name:
Replica Set:
Split DB: False
Version: 6
option3:
"%DEADLINE_PATH%"\deadlinecommand RunCommandForRepository Remote sgtdlrepo:4433;"D:/RCS Certs/RCS Certs/Deadline10RemoteClient.pfx" -GetDatabaseSettings
Warning: This command does not support "RunCommandForRepository" and that option will be ignored.
An error occurred while updating the database settings:
Index was outside the bounds of the array. (System.IndexOutOfRangeException)
option4:
"%DEADLINE_PATH%"\deadlinecommand -GetDatabaseSettings remote sgtdlrepo:4433;"D:\RCS Certs\RCS Certs\Deadlin10RemoteClient.pfx"
Database Type: Unknown
Hostname(s):
Database Name:
Port: -1
Port Alt: -1
SSL Enabled: False
Authenticate: False
User Name:
Replica Set:
Split DB: False
Version: 6
GetDatabaseSettings doesn’t work when pointed at the RCS. It’s reading out what’s set in DeadlineRepository10\settings\connection.ini. The command should be smart enough to tell you this in a human friendly way. I’ll get a dev ticket in to improve that.
If you want to confirm database settings, you’ll have to run that command on the RCS machine itself as it’ll be using a direct connection to the database.
To test the connection to the database you could instead use something like deadlinecommand -getpools, which will prove a connection and that data can be pulled from the database.
Sure @Justin_B, got that, and that’s very useful info.
My apologies if I should be posting this to some other topics and threads, but since I am new both to Deadline itself as well as these forums, a few quick questions:
-
For already submitted jobs - how do we edit its “Job Info Parameters” as well as its “Plugin Info Parameters”?
-
How can we edit the already submitted Gaffer job to add the “-threads 16” parameter to it? and how can we get the respective job’s execution also to pick it up and respect that flag for the actual command line execution?
-
Can we have the Job’s “Concurrent Tasks” be a dynamic or programmatically calculated value? For example, for the Deadline-Gaffer job, we want to drive this through the job’s submission parameter “-threads int” value. And hoping to implement it in a way like the one below:
where the goal is to run concurrent tasks on Deadline worker(s) with heterogeneous core counts (say a mix of 16, 32, 72, and 128 cores), and the job’s requested rendering thread count should determine how many concurrent tasks are assigned to each worker. With -threads 16:
on a 16 core worker - it should assign/run one job/task
on a 32 core worker - it should assign/run two concurrent jobs/tasks
on a 72 core worker - it should assign/run four concurrent jobs/tasks
on a 128 core worker - it should assign/run eight concurrent jobs/tasks
this is what I am exploring right now,
Thanks,
Bhavik
The connection.ini and mongodb cfg seem good.
Can you please post your deadline.ini file? I think in the other thread/post you have an entry for ConnectionType=Repository, but you also have ProxyRoot=fpdeadline01:8080.
Maybe someone from Thinkbox can answer, but I think this may be where your worker is getting confused. If the ConnectionType is Repository or Direct, it will use NetworkRoot. But if the ConnectionType is Remote, it will use ProxyRoot (Proxy*, etc.). Not sure if having both types of entries is the issue, since one would expect it to ignore the Proxy keys if the ConnectionType is set to Repository.
For the next test, I suggest that you only have one or the other (i.e. Repository or Remote type entries).
Can you try removing the Proxy* and ClientSSLAuthentication entries?
e.g.:
[Deadline]
LicenseMode=LicenseFree
LicenseServer=
Region=
LauncherListeningPort=17000
LauncherServiceStartupDelay=60
AutoConfigurationPort=17001
SlaveStartupPort=17003
SlaveDataRoot=
RestartStalledSlave=false
NoGuiMode=false
LaunchSlaveAtStartup=1
AutoUpdateOverride=
ConnectionType=Repository
NetworkRoot=\\fpdeadline01\DeadlineRepository10
DbSSLCertificate=
NetworkRoot0=\\fpdeadline01\DeadlineRepository10
LogReportSyntaxHighlighting=true
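As a quick sanity check for mixed entries, something like the following could flag a deadline.ini that combines Repository- and Remote-style keys. This is a hypothetical helper script, not Thinkbox tooling; the key names are taken from the file posted in this thread:

```python
import configparser

def mixed_connection_warnings(ini_text):
    """Flag a deadline.ini that mixes Repository- and Remote-style keys."""
    cfg = configparser.ConfigParser(interpolation=None)  # values may contain %
    cfg.read_string(ini_text)
    d = cfg["Deadline"]
    warnings = []
    conn = d.get("ConnectionType", "")
    proxy_keys = [k for k in ("ProxyRoot", "ProxyRoot0", "ClientSSLAuthentication")
                  if d.get(k, "")]
    if conn == "Repository" and proxy_keys:
        warnings.append(f"ConnectionType=Repository but {proxy_keys} also set")
    if conn == "Remote" and d.get("NetworkRoot", ""):
        warnings.append("ConnectionType=Remote but NetworkRoot also set")
    return warnings

sample = r"""[Deadline]
ConnectionType=Repository
NetworkRoot=\\fpdeadline01\DeadlineRepository10
ProxyRoot=fpdeadline01:8080
ClientSSLAuthentication=Required
"""
print(mixed_connection_warnings(sample))
```

Whether the Worker actually ignores the Proxy keys in this case is the open question above; the script only surfaces the ambiguity.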
And just to double check: your worker can reach the repo using the path \\fpdeadline01\DeadlineRepository10?
For the SlaveDataRoot: you did not show the Full Control permission checkbox, which is just above Modify. Can you click on Advanced and double-check that the permission entries are:
Type Allow
Principal Everyone
Access Full Control
Inherited from None
Applies to This folder, subfolders, and files
Sure @jarak, please note that we are now working with two Deadline Repositories/Servers: one is local/LAN and the other is remote over RCS. Attaching the latest deadline.ini from C:\ProgramData\Thinkbox\Deadline10
Yes, you are right, but from what I can tell, from the Deadline Monitor GUI we have set the default repository as:
Permissions for C:\LocalSlaveData
Also, we will be working with both the DirectConnection (local) and RCS (remote) repositories, i.e. on-premise and cloud Deadline repositories, so it is necessary for us to get things working with both. Any thoughts on fixing or setting things up for our use case will be very helpful.
Thanks,
Bhavik
Additionally, I am noticing connections from the Deadline Worker to the Deadline Repository on the ports below:
netstat -a | find "fprdsk113"
TCP 192.168.0.117:445 fprdsk113:50214 ESTABLISHED
TCP 192.168.0.117:3389 fprdsk113:59640 ESTABLISHED
TCP 192.168.0.117:27100 fprdsk113:49739 ESTABLISHED
TCP 192.168.0.117:27100 fprdsk113:49740 ESTABLISHED
TCP 192.168.0.117:27100 fprdsk113:49741 ESTABLISHED
TCP 192.168.0.117:27100 fprdsk113:49742 ESTABLISHED
TCP 192.168.0.117:27100 fprdsk113:49796 ESTABLISHED
TCP 192.168.0.117:27100 fprdsk113:49797 ESTABLISHED
TCP 192.168.0.117:27100 fprdsk113:49798 ESTABLISHED
TCP 192.168.0.117:27100 fprdsk113:49799 ESTABLISHED
TCP 192.168.0.117:27100 fprdsk113:50441 ESTABLISHED
TCP 192.168.0.117:27100 fprdsk113:50442 ESTABLISHED
TCP 192.168.0.117:27100 fprdsk113:50443 ESTABLISHED
TCP 192.168.0.117:27100 fprdsk113:50444 ESTABLISHED
TCP 192.168.0.117:27100 fprdsk113:50448 ESTABLISHED
TCP 192.168.0.117:27100 fprdsk113:50449 ESTABLISHED
TCP 192.168.0.117:27100 fprdsk113:50450 ESTABLISHED
TCP 192.168.0.117:27100 fprdsk113:50451 ESTABLISHED
TCP 192.168.0.117:27100 fprdsk113:57177 ESTABLISHED
TCP 192.168.0.117:27100 fprdsk113:57178 ESTABLISHED
TCP 192.168.0.117:27100 fprdsk113:57179 ESTABLISHED
TCP 192.168.0.117:27100 fprdsk113:57180 ESTABLISHED
TCP 192.168.0.117:27100 fprdsk113:58965 ESTABLISHED
TCP 192.168.0.117:27100 fprdsk113:58983 ESTABLISHED
TCP 192.168.0.117:27100 fprdsk113:59261 ESTABLISHED
TCP 192.168.0.117:27100 fprdsk113:59380 ESTABLISHED
TCP 192.168.0.117:27100 fprdsk113:59417 ESTABLISHED
C:\LocalSlaveData needs to have “Full Control” permissions granted to it.
From MS:
Full control: Allows users to read, write, change, and delete files and subfolders.
The Deadline Worker needs full control perms in order to create and delete the temporary jobsData and plugins folders.
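One way to verify that the effective permissions actually allow that create/delete cycle is a small probe run as the same account the Worker service uses. This is a generic sketch, not a Deadline tool; on the render node you would point it at C:\LocalSlaveData:

```python
import os
import shutil
import tempfile

def can_create_and_delete(root):
    """Probe whether the current account can create, write, and delete
    folders under root, mimicking the Worker's jobsData/plugins lifecycle."""
    try:
        probe_dir = tempfile.mkdtemp(prefix="permProbe_", dir=root)
        probe_file = os.path.join(probe_dir, "probe.txt")
        with open(probe_file, "w") as fh:
            fh.write("ok")
        shutil.rmtree(probe_dir)  # delete rights are needed too
        return True
    except OSError:
        return False

# On the render node this would be: can_create_and_delete(r"C:\LocalSlaveData")
print(can_create_and_delete(tempfile.gettempdir()))
```

Running it under the wrong account is the classic false positive here, since Everyone entries can be overridden by deny rules or broken inheritance.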
Sure, thanks for pointing that out @jarak. I have applied the permissions fixes, rebooted the Deadline worker node, and tried running the job again, but no luck! I am sure something is going wrong on our side, but I am not sure how to narrow it down and fix it.
Attaching Job_2023-04-04_12-08-29_642bc60748224261782ad700.7z, the latest re-run failed/error log, for your perusal.
Thanks,
Bhavik
- For already submitted jobs - how do we edit its “Job Info Parameters” as well as its “Plugin Info Parameters”?
If the property you want to change isn’t exposed in the Job’s Modify Job Properties window, you’ll have to resubmit the job with the desired settings.
- How can we edit the already submitted Gaffer job to add the “-threads 16” parameter to it? and how can we get the respective job’s execution also to pick it up and respect that flag for the actual command line execution?
You’re referring to this Gaffer plugin? It looks like you’d have to either edit the plugin or reach out to the author.
- Can we have the Job’s “Concurrent Tasks” be a dynamic or programmatically calculated value? For example, for the Deadline-Gaffer job, we want to drive this through the job’s submission parameter “-threads int” value. And hoping to implement it in a way like the one below:
You can modify the concurrent tasks programmatically; the issue is that it’ll apply to the whole job and not just tasks picked up by individual Workers. Depending on how high the --threads option can go, you might be able to interrogate the Worker for its CPU thread count and build that argument then. The nitty-gritty of how to do that I’m not certain of, but it should be possible.
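For what it’s worth, the mapping described earlier in the thread is essentially floor(worker cores / requested threads). A sketch of that calculation (a hypothetical helper, not part of the Deadline API):

```python
import os

def concurrent_tasks(worker_cores, job_threads, cap=16):
    """Concurrent tasks a worker should take for a job requesting
    job_threads threads per task: floor(cores / threads), at least 1."""
    return max(1, min(cap, worker_cores // job_threads))

# Reproduces the mapping from the post for -threads 16:
# 16 -> 1, 32 -> 2, 72 -> 4, 128 -> 8
for cores in (16, 32, 72, 128):
    print(cores, "cores ->", concurrent_tasks(cores, 16), "task(s)")

# On the worker itself the core count could come from os.cpu_count().
```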
You could also always submit with --threads 16 and set the maximum concurrent tasks for each size of worker. Then on the job, set the concurrent tasks to some high number, and the Workers will bring it down to their local maximum set in ‘Concurrent Task Limit Override’.