web service overload?

This happens with deadline 6.

We switched over some common lookups to use the webservice based api instead of the command line tool, and it seems to have overloaded the webservice completely. Not sure if that’s the case exactly, but here are some logs:

2014-12-24 13:22:33:  Web Service - Web Service listening for connections on port 8080...
2014-12-24 13:22:33:  Web Service - Received request from: 172.18.11.148
2014-12-24 13:22:33:  Web Service - Request: GET http://deadline02.scanlinevfxla.com:8080/api/slaves?Name=lapro1883&Data=settings
2014-12-24 13:22:33:  Web Service - Found API command: slaves
2014-12-24 13:22:33:  Web Service - Received request from: 172.18.1.170
2014-12-24 13:22:33:  Web Service - Request: GET http://deadline02.scanlinevfxla.com:8080/api/slaves?Name=lapro0070&Data=settings
2014-12-24 13:22:33:  Web Service - Found API command: slaves
2014-12-24 13:22:33:  Web Service - Web Service listening for requests...
2014-12-24 13:22:33:  Web Service - Web Service listening for requests...
2014-12-24 13:22:33:  Web Service - Received request from: 172.18.4.236
2014-12-24 13:22:33:  Web Service - Request: GET http://deadline02.scanlinevfxla.com:8080/api/slaves?Name=lapro1233&Data=settings
2014-12-24 13:22:33:  Web Service - Found API command: slaves
2014-12-24 13:22:33:  Web Service - Web Service listening for requests...
2014-12-24 13:22:34:  Web Service - Received request from: 172.18.13.132
2014-12-24 13:22:34:  Web Service - Request: GET http://deadline02.scanlinevfxla.com:8080/api/slaves?Name=lapro3234&Data=settings
2014-12-24 13:22:34:  An unhandled exception occurred: The object was used after being disposed. (System.ObjectDisposedException)
2014-12-24 13:22:34:    at System.Net.ResponseStream.EndWrite (IAsyncResult ares) [0x00000] in <filename unknown>:0
2014-12-24 13:22:34:    at Deadline.Pulses.PulseWebService.OnWriteComplete (IAsyncResult result) [0x00000] in <filename unknown>:0
2014-12-24 13:22:34:  Web Service - Received request from: 172.18.12.243
2014-12-24 13:22:34:  Web Service - Request: GET http://deadline02.scanlinevfxla.com:8080/api/slaves?Name=lapro0872&Data=settings
2014-12-24 13:22:34:  Web Service - Found API command: slaves
2014-12-24 13:22:34:  Web Service - Found API command: slaves
2014-12-24 13:22:34:  Web Service - Web Service listening for requests...
2014-12-24 13:22:34:  Web Service - Web Service listening for requests...
2014-12-24 13:22:34:  Web Service - Received request from: 172.18.12.147
2014-12-24 13:22:34:  Web Service - Request: GET http://deadline02.scanlinevfxla.com:8080/api/slaves?Name=lapro0776&Data=settings
2014-12-24 13:22:34:  Web Service - Found API command: slaves
2014-12-24 13:22:34:  Web Service - Web Service listening for requests...
2014-12-24 13:22:34:  Web Service - Received request from: 172.18.3.169
2014-12-24 13:22:34:  Web Service - Request: GET http://deadline02.scanlinevfxla.com:8080/api/slaves?Name=lapro0519&Data=settings
2014-12-24 13:22:34:  Web Service - Found API command: slaves
2014-12-24 13:22:36:  Web Service - Received request from: 172.18.11.133
2014-12-24 13:22:36:  Web Service - Request: GET http://deadline02.scanlinevfxla.com:8080/api/slaves?Name=lapro1868&Data=settings
2014-12-24 13:22:36:  Web Service - Received request from: 172.18.13.203
2014-12-24 13:22:36:  Web Service - Request: GET http://deadline02.scanlinevfxla.com:8080/api/slaves?Name=lapro3305&Data=settings
2014-12-24 13:22:36:  Web Service - Found API command: slaves
2014-12-24 13:22:36:  Web Service - Found API command: slaves
2014-12-24 13:22:36:  Web Service - Web Service listening for requests...
2014-12-24 13:22:36:  Web Service - Web Service listening for requests...
2014-12-24 13:22:36:  Web Service - Web Service listening for requests...
2014-12-24 13:22:36:  Web Service - Received request from: 172.18.13.23
2014-12-24 13:22:36:  Web Service - Request: GET http://deadline02.scanlinevfxla.com:8080/api/slaves?Name=lapro0897&Data=settings

It seems to get all the GET requests, but then there is no response. The clients eventually time out:

>>> d.isSlaveEnabled('lapro3067')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "\\s2.scanlinevfxla.com\exchange\software\managed\pythonScripts\site-packages\scl\deadline\deadlineAPI.py", line 1047, in isSlaveEnabled
    vSlaveSettings = dAPI.Slaves.GetSlaveSettings(hostname)
  File "\\s2.scanlinevfxla.com\exchange\software\managed\pythonScripts\site-packages\scl\deadline\api\Slaves.py", line 95, in GetSlaveSettings
    return self.__get__("/api/slaves?Name="+name.replace(' ','+')+"&Data=settings")
  File "\\s2.scanlinevfxla.com\exchange\software\managed\pythonScripts\site-packages\scl\deadline\api\Slaves.py", line 15, in __get__
    return DeadlineSend.send(self.address,commandString, "GET")
  File "\\s2.scanlinevfxla.com\exchange\software\managed\pythonScripts\site-packages\scl\deadline\api\DeadlineSend.py", line 13, in send
    conn.request(requestType, message)
  File "C:\python26_64\lib\httplib.py", line 898, in request
    self._send_request(method, url, body, headers)
  File "C:\python26_64\lib\httplib.py", line 935, in _send_request
    self.endheaders()
  File "C:\python26_64\lib\httplib.py", line 892, in endheaders
    self._send_output()
  File "C:\python26_64\lib\httplib.py", line 764, in _send_output
    self.send(msg)
  File "C:\python26_64\lib\httplib.py", line 723, in send
    self.connect()
  File "C:\python26_64\lib\httplib.py", line 704, in connect
    self.timeout)
  File "C:\python26_64\lib\socket.py", line 514, in create_connection
    raise error, msg
socket.error: [Errno 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond
>>>

I think you might be on the right track.

The error is a standard socket timeout. I’ll make a test harness and hit the Pulse running on my machine until it breaks.

How many concurrent connections are hitting the web service? Is it just one script, or is this thing running on the render nodes?

Its running on all the rendernodes, so about 1400+. Its not running constantly, it does an occasional check every 10 mins or so to see if the slave is enabled/disabled and does some action based on that.

So, I’ve got nothing in our codebase for a function called “isSlaveEnabled()”

Did you folks build a custom wrapper?

I’m having the thing run ‘co.Slaves.GetSlaveSettings()’ in seven Python instances.

It’s very slow, but that could be because of the VPN. I’ll let this run for a bit and give it another crack when I’m in the office tomorrow.

I’m honestly not sure how performant .net’s web framework is with regard to heavy concurrency. You guys are fantastic for pushing this stuff hard. I’ll keep you posted.

Yeah, we had to make our own deadlineapi to wrap the underlying functionality, and to make communication transparent to our other libs. Some functions use the command line under the hood, some use the pulse api…

Still wishing for a native python library here… :-\

Yep, that’s the function isSlaveEnabled() used! Only run it on 1200+ instances :slight_smile:
It was a bit slower than i expected (some of the rest api calls are almost instant) but still faster than the command line.

Well In the mean time, I think I’ve got something here.

One of the Python instances just fell over, and now none of the other instances are getting a response from the web service, they’re just hanging.

This is my exception:

Traceback (most recent call last): File "Documents/Code/Thinkbox/DeadlineTesting/test-webservice.py", line 5, in <module> b=co.Slaves.GetSlaveSettings('Mobile-010') File "/Library/Python/2.7/site-packages/Deadline/Slaves.py", line 78, in GetSlaveSettings return self.connectionProperties.__get__("/api/slaves?Name="+name.replace(' ','+')+"&Data=settings") File "/Library/Python/2.7/site-packages/Deadline/ConnectionProperty.py", line 35, in __get__ return DeadlineSend.send(self.address,commandString, "GET", self.useAuth, self.user, self.password) File "/Library/Python/2.7/site-packages/Deadline/DeadlineSend.py", line 35, in send response = opener.open(request) File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 404, in open response = self._open(req, data) File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 422, in _open '_open', req) File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 382, in _call_chain result = func(*args) File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 1214, in http_open return self.do_open(httplib.HTTPConnection, req) File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 1187, in do_open r = h.getresponse(buffering=True) File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 1045, in getresponse response.begin() File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 409, in begin version, status, reason = self._read_status() File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 373, in _read_status raise BadStatusLine(line) httplib.BadStatusLine: ''

So the response pipeline may be single threaded. I’ve got other things I need to do today, but I’ll have the right people take a look at this after I’ve done more testing tomorrow.

Thanks Edwin!