AWS Thinkbox Discussion Forums

beta2: force relaunch slave failure

Sent a remote ‘force relaunch slave’ command to some slaves, and all of them failed to restart. They shut down properly but did not come back up:

Launcher log:

2014-08-21 11:16:41:  ::ffff:172.19.6.114 has connected
2014-08-21 11:16:41:  Launcher Thread - Received command: ForceRelaunchSlave LAPRO0601
2014-08-21 11:16:42:  Sending command to slave: StopSlave 
2014-08-21 11:16:42:  Got reply: LAPRO0601: Sent "StopSlave" command. Result: ""
2014-08-21 11:16:44:  Launcher Thread - Responded with: Success|
2014-08-21 11:16:54:  Local version file: C:\Program Files\Thinkbox\Deadline7\bin\Version
2014-08-21 11:16:54:  Network version file: \\inferno2.scanlinevfxla.com\deadline\repository7\bin\Windows\Version
2014-08-21 11:16:54:  Comparing version files...
2014-08-21 11:16:54:  Version files match
2014-08-21 11:16:54:  Launching Slave: LAPRO0601

Slave log:

2014-08-21 11:16:42:  Info Thread - requesting slave info thread quit.
2014-08-21 11:16:42:  Info Thread - shutdown complete
2014-08-21 11:16:44:  Scheduler Thread - shutdown complete

Repeated attempts at remote startup also don’t result in a running slave:

2014-08-21 11:19:46:  ::ffff:172.19.6.114 has connected
2014-08-21 11:19:46:  Launcher Thread - Received command: ForceRelaunchSlave LAPRO0601
2014-08-21 11:19:46:  No Slave to shutdown
2014-08-21 11:19:46:  Launcher Thread - Responded with: Success|
2014-08-21 11:19:56:  Local version file: C:\Program Files\Thinkbox\Deadline7\bin\Version
2014-08-21 11:19:56:  Network version file: \\inferno2.scanlinevfxla.com\deadline\repository7\bin\Windows\Version
2014-08-21 11:19:56:  Comparing version files...
2014-08-21 11:19:56:  Version files match
2014-08-21 11:19:56:  Launching Slave: LAPRO0601
2014-08-21 11:20:13:  Auto Configuration: Picking configuration based on: LAPRO0601 / 172.18.10.211
2014-08-21 11:20:13:  Auto Configuration: No auto configuration could be detected, using local configuration
2014-08-21 11:20:13:  Updating Repository options
2014-08-21 11:20:13:    - Remote Administration: enabled
2014-08-21 11:20:13:    - Automatic Updates: enabled
2014-08-21 11:20:19:  ::ffff:172.19.6.114 has connected
2014-08-21 11:20:19:  Launcher Thread - Received command: LaunchSlave LAPRO0601
2014-08-21 11:20:19:  Local version file: C:\Program Files\Thinkbox\Deadline7\bin\Version
2014-08-21 11:20:19:  Network version file: \\inferno2.scanlinevfxla.com\deadline\repository7\bin\Windows\Version
2014-08-21 11:20:19:  Comparing version files...
2014-08-21 11:20:20:  Version files match
2014-08-21 11:20:20:  Launching Slave: LAPRO0601
2014-08-21 11:20:20:  Launcher Thread - Responded with: Success|

It seems that if I shut down the Launcher, I can’t restart it… The process never appears. No errors, no logs.

Even a machine reboot doesn’t fix it… very odd.

Might be related to a PyQt conflict… we are adding our own site-packages folders.

Please ignore this for now; I’ll post here if I can’t resolve it.

Seems like this is a critical issue.

If you have a print statement in C:\Program Files\Thinkbox\Deadline7\bin\Lib\site-packages\sitecustomize.py, none of the Deadline apps will start, with the exception of dpython.exe…

Simply create a sitecustomize.py containing
print("Hello world")

and try to launch any of the Deadline applications.
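Since this thread also mentions adding in-house site-packages folders, here is a minimal sketch of a sitecustomize.py that does that without writing anything to stdout. The folder list and the helper name add_studio_site_dirs are hypothetical placeholders, not anything Deadline ships; keeping the file silent only avoids the print trigger reported above and does not address the underlying python27.dll problem.

```python
# sitecustomize.py: sketch only; the paths below are hypothetical placeholders.
import os
import site

# Hypothetical in-house package folders; replace with the facility's real paths.
STUDIO_SITE_DIRS = [
    r"\\fileserver\pipeline\python\site-packages",
]


def add_studio_site_dirs(dirs=None):
    """Add each existing folder with site.addsitedir and return the ones added.

    site.addsitedir appends the folder to sys.path and also processes any .pth
    files in it. Missing folders are skipped without printing anything, since
    stdout output from sitecustomize was enough to stop the Deadline apps
    from starting in beta2.
    """
    added = []
    for path in STUDIO_SITE_DIRS if dirs is None else dirs:
        if os.path.isdir(path):
            site.addsitedir(path)
            added.append(path)
    return added


add_studio_site_dirs()
```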

I can reproduce this. Thanks for reporting it! We’ll definitely look at getting this fixed for beta 3.

Cheers,
Ryan

Looks like there was something wonky with the python27.dll that we built and shipped with the first couple betas.

We just rebuilt Python today for a different issue (and also moved to 2.7.8). I tested against the new build, and whatever was wrong with the old one seems to be fixed in it.

The updated .dll should be part of the next beta release =).

Cheers,
Jon

Any chance of getting that DLL now? If I have to wait 3 weeks for Python to work, that basically puts a hold on our testing / integration… :\

I get other Python errors as well, such as this one:
2014-08-21 14:54:47: 0: INFO: 'import site' failed; use -v for traceback
2014-08-21 14:54:47: 0: INFO: Traceback (most recent call last):
2014-08-21 14:54:47: 0: INFO: File "", line 1, in
2014-08-21 14:54:47: 0: INFO: File "C:\Program Files\Thinkbox\Deadline7\bin\lib\os.py", line 398, in
2014-08-21 14:54:47: 0: INFO: import UserDict
2014-08-21 14:54:47: 0: INFO: File "C:\Program Files\Thinkbox\Deadline7\bin\lib\UserDict.py", line 84, in
2014-08-21 14:54:47: 0: INFO: _abcoll.MutableMapping.register(IterableUserDict)
2014-08-21 14:54:47: 0: INFO: File "C:\Program Files\Thinkbox\Deadline7\bin\lib\abc.py", line 109, in register
2014-08-21 14:54:47: 0: INFO: if issubclass(subclass, cls):
2014-08-21 14:54:47: 0: INFO: File "C:\Program Files\Thinkbox\Deadline7\bin\lib\abc.py", line 184, in __subclasscheck__
2014-08-21 14:54:47: 0: INFO: cls._abc_negative_cache.add(subclass)
2014-08-21 14:54:47: 0: INFO: File "C:\Program Files\Thinkbox\Deadline7\bin\lib\_weakrefset.py", line 84, in add
2014-08-21 14:54:47: 0: INFO: self.data.add(ref(item, self._remove))
2014-08-21 14:54:47: 0: INFO: TypeError: cannot create weak reference to 'classobj' object

Again, using dpython it seems to work, but not within deadlineslave.exe

Was the PYTHONPATH behavior changed? It seems like it’s trying to load 2.7 libs from Deadline’s folder when it’s running a Python 2.6 command line in a subprocess.
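If Python variables in the slave’s environment (for example PYTHONPATH or PYTHONHOME pointing at Deadline’s bundled 2.7 lib folder) are inherited by the child process, a 2.6 interpreter started from it can end up importing 2.7 standard-library modules, which would fit the os.py / UserDict.py traceback above. That is an unconfirmed guess; assuming it holds, one workaround is to launch the in-house interpreter with a scrubbed copy of the environment. A minimal sketch (the helper names are hypothetical):

```python
import os
import subprocess
import sys

# Variables that can point a child interpreter at another Python's stdlib.
# Assumption: these are the ones leaking from the Deadline process.
PYTHON_ENV_VARS = frozenset(["PYTHONPATH", "PYTHONHOME"])


def clean_python_env(base_env=None):
    """Return a copy of base_env (default: os.environ) without the Python path variables.

    Names are compared case-insensitively, since Windows environment
    variable names are not case-sensitive.
    """
    source = os.environ if base_env is None else base_env
    return dict((k, v) for k, v in source.items() if k.upper() not in PYTHON_ENV_VARS)


def run_script_clean(script_args, interpreter=None):
    """Run a Python command line with the Python path variables removed.

    interpreter: path to the target Python, e.g. the in-house 2.6 build.
    It defaults to the current interpreter, which is only useful for testing.
    Returns the child's exit code.
    """
    exe = interpreter or sys.executable
    return subprocess.call([exe] + list(script_args), env=clean_python_env())
```

For example, run_script_clean(["my_tool.py"], interpreter=r"C:\Python26\python.exe"), where both paths are placeholders. Only the two Python variables are dropped; everything else (PATH, SYSTEMROOT, etc.) is passed through so the child still starts normally on Windows.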

It seems like any regular ‘print’ will take down Deadline completely, not just ones run in sitecustomize. We have scripts that simply print to stdout, and they also crash the slave to the desktop.

If we attempt to connect to Shotgun, it crashes deadlineslave too. Probably the same Python issue, or something related to it.

I’ve attached a zipped-up copy of python27.dll. In my limited tests, I just dropped it in place of the existing python27.dll; I’m not sure whether any of the shipped libs/modules changed with 2.7.8, but I didn’t encounter any issues.

The Shotgun thing is actually the first reason we needed to rebuild it; there was something wrong with the SSL cert validation that would hard crash whatever was running. That affected cloud plugins/shotgun/etc. I believe this new DLL should fix that issue as well, but if not, there should be a flag to turn off the SSL validation when connecting (if you’re using the built-in stuff, there’s a toggle for it in the Shotgun Event Plugin config).

As for PYTHONPATH behaviour, I don’t think that’s changed between 6.2 & 7.0… We also shouldn’t be running any 2.6 shells at all, everything should be 2.7. Or are you referring to running a command line Python job, using 2.6?
python27.zip (1.21 MB)

We are using our own in-house Shotgun libraries, so turning off SSL would affect the whole facility. Due to MPAA regulations, that’s not something we would want to do. I’ll try the new DLL asap!

Yes, our in-house standard Python version is 2.6, so whenever we execute a Python script using the shell from Deadline, it now crashes. I guess it didn’t crash previously because no cross-pollination occurred, as Deadline was using 2.6 as well.

Minor note on the DLL: I dropped it into windows/bin.zip and changed the version file. Deadline ran its ‘upgrading deadline’ step, but python27.dll was not copied over. Weird!
Rolling it out manually now.
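Because the upgrade appeared to run but left the old python27.dll in place, a quick way to confirm whether a manually rolled copy actually matches the new build is to compare file hashes. A minimal sketch using only hashlib; the helper names are hypothetical:

```python
import hashlib


def file_sha256(path, chunk_size=1 << 20):
    """Return the hex SHA-256 of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps reading until read() returns empty bytes.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_file_contents(path_a, path_b):
    """True if both files have identical SHA-256 digests."""
    return file_sha256(path_a) == file_sha256(path_b)
```

For example, same_file_contents(r"C:\Program Files\Thinkbox\Deadline7\bin\python27.dll", r"C:\temp\python27.dll"), where the second path is a placeholder for wherever the attached DLL was extracted.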

Shotgun still crashes deadline to desktop…

I have a feeling you might also need to update these DLLs if it’s SSL-related:

C:\Program Files\Thinkbox\Deadline7\bin\DLLs\_ssl.pyd
C:\Program Files\Thinkbox\Deadline7\bin\DLLs\_socket.pyd
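To check which _ssl and _socket extension modules a given Deadline Python process actually loads, and which OpenSSL version it reports, something like the following sketch could be run inside that process. It only reads standard attributes (module __file__ and ssl.OPENSSL_VERSION) and returns a dict rather than printing, given the print crashes earlier in this thread; the function name is hypothetical.

```python
import _socket
import _ssl
import ssl


def ssl_diagnostics():
    """Report the binary modules behind ssl/socket and the linked OpenSSL version.

    Returns a plain dict so the caller can log it however is safe in the host
    process (e.g. to a file instead of stdout).
    """
    return {
        "openssl_version": ssl.OPENSSL_VERSION,
        # Modules compiled into the interpreter have no __file__ attribute.
        "_ssl": getattr(_ssl, "__file__", "<built-in>"),
        "_socket": getattr(_socket, "__file__", "<built-in>"),
    }
```

If the reported paths are the _ssl.pyd and _socket.pyd listed above, comparing those files against the rebuilt versions would show whether they were updated along with python27.dll.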

The new dll did fix the crashes from printing though!
