Hi there,
I am running into an issue when a client is rendering a longer job (a single frame taking more than an hour or so): I suddenly start getting a lot of Connection Server errors, and neither the Monitor nor the Worker can connect to RCS. On top of that, RCS itself seems to crash or freeze, and it needs to be restarted to get things working again.
This only happens on the longer jobs, though. If a task takes less than, say, 10 minutes, everything is fine and renders complete without a hitch. When the Connection Server error comes up I can still ping the machine from the client, so the network connection is there; it just can’t reach RCS. There are no other errors or warnings in any of the logs while everything is working fine.
The other somewhat strange thing is that when this happens, the Monitor on the server changes some of its colors but otherwise runs fine (the odd background, which is normally dark grey in the default color profile, changes to white). Restarting the Monitor clears this, but there is no error about it in either the Monitor log or the RCS log.
RCS and the database are running on a Mac Mini M4 with macOS Sequoia 15.3.1. The render worker is a Windows 11 Pro machine. The clients connect to RCS via Tailscale VPN.
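Since a successful ping only shows that the host is up, not that RCS is accepting connections, a TCP-level check against the RCS port separates the two cases. This is a minimal sketch, assuming Python 3 is available on the client. The address and port are the ones that appear in the log; `check_tcp_port` is a hypothetical helper written for this post, not part of Deadline.

```python
import socket


def check_tcp_port(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for illustration; it only tests the TCP handshake,
    not TLS or the RCS HTTP API itself.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and unreachable hosts.
        return False


if __name__ == "__main__":
    # Address and port taken from the error messages in the log.
    host, port = "100.109.55.111", 4433
    state = "reachable" if check_tcp_port(host, port) else "NOT reachable"
    print(f"{host}:{port} is {state}")
```

If this reports the port as not reachable while ping still works, that would suggest the RCS process (or something in front of its port) has stopped accepting connections, rather than the VPN link being down.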
Thanks so much
2025-10-08 08:08:21: ---------- Inner Stack Trace (System.Net.Sockets.SocketException) ----------
2025-10-08 08:08:21: at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.ThrowException(SocketError error, CancellationToken cancellationToken)
2025-10-08 08:08:21: at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.System.Threading.Tasks.Sources.IValueTaskSource.GetResult(Int16 token)
2025-10-08 08:08:21: at System.Net.Sockets.Socket.g__WaitForConnectWithCancellation|285_0(AwaitableSocketAsyncEventArgs saea, ValueTask connectTask, CancellationToken cancellationToken)
2025-10-08 08:08:21: at System.Net.Http.HttpConnectionPool.ConnectToTcpHostAsync(String host, Int32 port, HttpRequestMessage initialRequest, Boolean async, CancellationToken cancellationToken)
2025-10-08 08:08:53: ERROR: UpdateClient.MaybeSendRequestNow caught an exception: POST https://100.109.55.111:4433/rcs/v1/update returned “One or more errors occurred. (A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond. (100.109.55.111:4433))” (Deadline.Net.Clients.Http.DeadlineHttpRequestException)
2025-10-08 08:08:59: ERROR: UpdateClient.MaybeSendRequestNow caught an exception: POST https://100.109.55.111:4433/rcs/v1/update returned “One or more errors occurred. (A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond. (100.109.55.111:4433))” (Deadline.Net.Clients.Http.DeadlineHttpRequestException)
2025-10-08 08:09:29: Error occurred while updating Deadline AWS Resource Tracker Status label: Connection Server error: GET https://100.109.55.111:4433/db/dash/dashFleet/health?region=af-south-1 returned “One or more errors occurred. (A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond. (100.109.55.111:4433))” (System.Net.WebException)
2025-10-08 08:09:29: at Deadline.StorageDB.Proxy.Utils.ProxyUtils.HandleException(Exception e, NetworkManager manager, String server, Int32 port, String certificatePath)
2025-10-08 08:09:29: at Deadline.StorageDB.Proxy.ProxyDashStorage.GetFleetHealthSummary(String awsRegion, String& fleetsHealthReport)
2025-10-08 08:09:29: at Deadline.StorageDB.DashStorage.UpdateResourceTrackerStatusLabel(Object o)