Remote Control acting weird on Linux (8.0.7.3)

Hello guys

Ever since I put 8.x into production, I have had issues remote-controlling nodes on Linux.

Here are the facts:
Remote control "stop slave" stops the slave, but the process remains, and the slave then cannot be relaunched because the process already exists.
Remote control "restart machine after current task completion" restarts the machine immediately.
(That's all we use in our setup, so I didn't test any other options.)
Remote control for Windows nodes works perfectly fine.
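Regarding the "stop slave" symptom above: a stop wrapper can at least detect a slave that ignored the request instead of killing it blindly. This is a minimal sketch that assumes you already have the slave's PID (for example from `pgrep`); the function name `wait_for_pid_exit` and the default 5-second timeout are hypothetical choices, not part of Deadline.

```shell
#!/bin/bash
# Hypothetical helper: wait up to TIMEOUT seconds for process PID to exit.
# Returns 0 once the process is gone, 1 if it is still alive at the deadline.
# "kill -0" sends no signal; it only checks that the PID exists and may be signalled,
# so run this as root or as the process owner (otherwise it reports "gone" too early).
wait_for_pid_exit() {
	local pid=$1 timeout=${2:-5} waited=0
	while kill -0 "$pid" 2>/dev/null; do
		if [ "$waited" -ge "$timeout" ]; then
			return 1
		fi
		sleep 1
		waited=$((waited + 1))
	done
	return 0
}
```

Called right after `deadlinecommand RemoteControl <host> StopSlave`, a non-zero return (`wait_for_pid_exit "$slave_pid" 5 || kill -9 "$slave_pid"`) means the slave stayed alive, which is the behaviour described above, and gives the wrapper a point to log it before escalating.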

Linux nodes run CentOS Linux release 7.2.1511 (Core) from a netboot image.

The scripts below handle:
the Deadline Launcher, started at boot as user ‘render’ by a service in /etc/init.d through the regular rc.d startup sequence;
the Deadline Slave, started by a regular cron job when nobody is logged on interactively.

Since Deadline 7 we have also had an issue where MAC addresses are incorrectly reported as FF:FF:FF:FF:FF:FF for all Linux hosts.
It is possible that the slave/launcher lacks some permissions it needs to access certain information and behave as expected.
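To tell whether the FF:FF:FF:FF:FF:FF values come from the OS or from Deadline, you can compare against what the Linux kernel exposes to an unprivileged user in `/sys/class/net/<interface>/address`, which is world-readable. This sketch (the function name `list_macs` is hypothetical) can be run as `render` on a node and its output compared with the MAC shown in the Monitor.

```shell
#!/bin/bash
# Hypothetical helper: print "interface MAC" for each network interface in sysfs.
# Reading /sys/class/net/*/address does not require root.
list_macs() {
	local dev name mac
	local re='^([0-9a-f]{2}:){5}[0-9a-f]{2}$'
	for dev in /sys/class/net/*; do
		[ -r "$dev/address" ] || continue
		name=${dev##*/}
		# Skip the loopback device.
		[ "$name" = "lo" ] && continue
		mac=$(cat "$dev/address" 2>/dev/null)
		# Skip interfaces without a 6-byte Ethernet-style hardware address.
		[[ $mac =~ $re ]] || continue
		printf '%s %s\n' "$name" "$mac"
	done
}

list_macs
```

If the kernel shows real addresses here for the `render` user, lack of permissions for reading the MAC is less likely to be the cause.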

Note that Deadline 7, using the same setup, was working all right (except for the MAC address issue).

Any clue?
Is there any more info I should provide?

Thanks

init.d

[code]#!/bin/bash
#
# deadlinelauncherservice    Handles remote communication for Deadline.
#
# chkconfig: 345 20 80
# description: Handles remote communication between Deadline Client applications.
#
### BEGIN INIT INFO
# Default-Start: 3 4 5
# Default-Stop: 0 6
# Required-Start: network remote_fs
# Required-Stop: network
# Short-Description: Deadline remote communication.
# Description: Handles remote communication between Deadline Client applications.
### END INIT INFO

# Source function library. (Supported only on Red Hat distributions.)
( . /etc/rc.d/init.d/functions > /dev/null 2>&1 )

DEADLINEBIN=/opt/thinkbox/deadline-8.0.7.3/bin
DEADLINELOG=/var/log/Thinkbox/Deadline8
DEADLINEVERSION=8

# We use the user render
#LAUNCHERSERVICEUSERNAME=deadline
LAUNCHERSERVICEUSERNAME=render
LAUNCHERSERVICEFILENAME="launcherservice"
LAUNCHERSERVICELOCK="/var/lock/subsys/deadline$DEADLINEVERSION$LAUNCHERSERVICEFILENAME"
LOGFILE="/var/log/deadline$DEADLINEVERSION$LAUNCHERSERVICEFILENAME.log"

# The correct folder is /run: https://en.wikipedia.org/wiki/Filesystem_Hierarchy_Standard
#PIDFILE="/tmp/deadline$DEADLINEVERSION$LAUNCHERSERVICEFILENAME.pid"
PIDFILE="/run/deadline$DEADLINEVERSION$LAUNCHERSERVICEFILENAME.pid"
LAUNCHERSTARTED="Deadline $DEADLINEVERSION Launcher Service Started"
LAUNCHERSTOPPED="Deadline $DEADLINEVERSION Launcher Service Stopped"
RUNNING=0 # Is the process in our PID file running?
PID=-1 # What is the process ID?

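check_running() {
	# Assume not running until the PID file proves otherwise.
	RUNNING=0
	if [ -e "$PIDFILE" ]; then
		PID=$(cat "$PIDFILE")
		# ps -p succeeds only if a process with the stored PID still exists.
		if ps -p "$PID" &> /dev/null; then
			RUNNING=1
		fi
	fi
}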

start () {
	check_running
	if [ $RUNNING -eq 1 ]; then
		echo "Deadline $DEADLINEVERSION Launcher Service already running"
		exit 1
	else
		# Start the Deadline Launcher in headless mode.
		if [ -x "$DEADLINEBIN/deadlinelauncher" ]; then
			# Deadline is used by multiple users
			# Every one of them must be able to write in the logs
			mkdir -p "$DEADLINELOG"
			chmod 777 "$DEADLINELOG"
			# End

			# The user's shell backgrounds the launcher and echoes its PID into $PIDFILE.
			command="$DEADLINEBIN/deadlinelauncher -daemon -nogui > /dev/null & echo \$!"
			/bin/su - "$LAUNCHERSERVICEUSERNAME" -c "$command" > "$PIDFILE"
			echo "$LAUNCHERSTARTED"
			date "+%F %T: $LAUNCHERSTARTED" >> "$LOGFILE"
			date "+%F %T: Full Launcher log can be found in $DEADLINELOG" >> "$LOGFILE"

			touch "$LAUNCHERSERVICELOCK" 2> /dev/null

			exit 0
		else
			echo "Either deadlinelauncher could not be found or could not be executed."
			exit 1
		fi
	fi
}

stop () {
	check_running
	if [ $RUNNING -eq 1 ]; then
		echo "Shutting down Deadline Launcher..."

		# Try shutting down the Deadline Launcher gracefully first, then kill if necessary.
		# Double quotes: $DEADLINEBIN must be expanded here, the login shell started by su does not know it.
		if [ -x "$DEADLINEBIN/deadlinelauncher" ]; then
			/bin/su - "$LAUNCHERSERVICEUSERNAME" -c "$DEADLINEBIN/deadlinelauncher -shutdownall" &> /dev/null
		fi

		sleep 3

		check_running
		if [ $RUNNING -eq 1 ]; then
			echo "Forcing exit"
			kill -QUIT "$PID"
		fi

		echo "$LAUNCHERSTOPPED"
		date "+%F %T: $LAUNCHERSTOPPED" >> "$LOGFILE"
		rm -f "$PIDFILE"

		# Alright, it should actually be dead now, let init know.
		rm -f "$LAUNCHERSERVICELOCK"

		exit 0
	else
		rm -f "$PIDFILE"
		echo "Deadline $DEADLINEVERSION Launcher Service is not running"
		exit 7
	fi
}

restart() {
	check_running
	if [ $RUNNING -eq 1 ]; then
		echo "Shutting down Deadline Launcher..."

		# Try shutting down the Deadline Launcher gracefully first, then kill if necessary.
		# Double quotes: $DEADLINEBIN must be expanded here, the login shell started by su does not know it.
		if [ -x "$DEADLINEBIN/deadlinelauncher" ]; then
			/bin/su - "$LAUNCHERSERVICEUSERNAME" -c "$DEADLINEBIN/deadlinelauncher -shutdownall" &> /dev/null
		fi

		sleep 3

		check_running
		if [ $RUNNING -eq 1 ]; then
			echo "Forcing exit"
			kill -QUIT "$PID"
		fi

		echo "$LAUNCHERSTOPPED"
		date "+%F %T: $LAUNCHERSTOPPED" >> "$LOGFILE"
		rm -f "$PIDFILE"
	else
		rm -f "$PIDFILE"
		echo "Deadline $DEADLINEVERSION Launcher Service is not running"
	fi

	start
}

reload() {
	restart
}

force_reload() {
	restart
}

rh_status() {
	check_running
	if [ $RUNNING -eq 1 ]; then
		echo "Deadlinelauncherservice is running as PID $PID."
		exit 0
	else
		echo "Deadlinelauncherservice is stopped."
		exit 3
	fi
}

rh_status_q() {
	# Subshell: rh_status calls exit, which would otherwise end the whole script here.
	( rh_status ) > /dev/null 2>&1
}

# See how we were called.
case "$1" in
	start)
		start
		;;
	stop)
		stop
		;;
	restart)
		restart
		;;
	reload)
		rh_status_q || exit 7
		$1
		;;
	force-reload)
		force_reload
		;;
	status)
		rh_status
		;;
	condrestart|try-restart)
		rh_status_q || exit 0
		restart
		;;
	*)
		echo "Usage: $0 {start|stop|status|restart|condrestart|try-restart|reload|force-reload}"
		exit 3
		;;
esac[/code]
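A quick way to check that a command handed to `su - user -c` expands the variables you expect: single quotes pass the text verbatim to a new shell, and that shell never sees non-exported variables of the calling script (a login shell from `su -` also resets the environment). In this self-contained demo, `bash -c` stands in for `su - render -c` so it runs without root; the path is the one from the init.d script.

```shell
#!/bin/bash
# Not exported, exactly like DEADLINEBIN in the init.d script.
DEADLINEBIN=/opt/thinkbox/deadline-8.0.7.3/bin

# Single quotes: the child shell expands $DEADLINEBIN itself, and it is empty there.
single=$(bash -c 'echo "$DEADLINEBIN/deadlinelauncher -shutdownall"')
# Double quotes: the calling script expands $DEADLINEBIN before starting the child.
double=$(bash -c "echo \"$DEADLINEBIN/deadlinelauncher -shutdownall\"")

echo "single quotes: $single"
echo "double quotes: $double"
```

With single quotes the child is asked to run `/deadlinelauncher -shutdownall`, which does not exist, so a graceful shutdown written that way never happens and the script falls through to `kill`.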

cron

[code]#!/bin/bash
#
# 2015/09/15
#
# This script is run from cron by the render user.
# It checks whether somebody is logged in:
# if not, a slave is started; otherwise,
# a running slave / task is stopped.

DLS_COMMAND="/opt/thinkbox/deadline-8.0.7.3/bin/deadlinecommand"
DLS_SLAVE="/opt/thinkbox/deadline-8.0.7.3/bin/deadlineslave.exe"
DLS_LOGFILE="/tmp/deadline-slave-8.log"
DEBUG=1

FQDN=$(hostname -f)
HOSTNAME=$(hostname -s)
if [ -f "$DLS_COMMAND" ]; then
	###if [[ "$(w | grep tty | wc -l)" -eq "0" ]]; then
	if [[ "$(w | grep -E 'tty| :[0-9]* ' | wc -l)" -eq "0" ]]; then
		# Nobody logged in
		if [[ "$(pgrep -u render -f "$DLS_SLAVE" | wc -l)" -eq "0" ]]; then
			[[ $DEBUG -eq 1 ]] && echo "Start a slave"
			# ">> file 2>&1", in this order, sends both stdout and stderr to the log.
			"$DLS_COMMAND" "RemoteControl" "$HOSTNAME" "LaunchSlave" >> "$DLS_LOGFILE" 2>&1
		else
			[[ $DEBUG -eq 1 ]] && echo "Do nothing"
		fi
	else
		# Someone is working
		if [[ "$(pgrep -u render -f "$DLS_SLAVE" | wc -l)" -eq "0" ]]; then
			[[ $DEBUG -eq 1 ]] && echo "Do nothing"
		else
			[[ $DEBUG -eq 1 ]] && echo "Slave running, I will stop the slave"
			"$DLS_COMMAND" "RemoteControl" "$HOSTNAME" "StopSlave" >> "$DLS_LOGFILE" 2>&1
			# Log the PID(s) of any slave process still present after StopSlave.
			ps aux | grep "$DLS_SLAVE" | grep -v grep | awk '{ print $2 }' >> "$DLS_LOGFILE"
			sleep 5
			# Workaround: force-kill whatever slave process is still there.
			PID=$(ps aux | grep "$DLS_SLAVE" | grep -v grep | awk '{ print $2 }')
			if [ -n "$PID" ]; then
				# $PID is left unquoted on purpose: it may hold several PIDs.
				kill -9 $PID >> "$DLS_LOGFILE" 2>&1
			fi
		fi
	fi
fi[/code]
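In a cron job, the order of redirections decides what reaches the log: `2>&1 >>file` first points stderr at wherever stdout pointed at that moment (for cron, the job's mail output) and only then moves stdout to the file, whereas `>>file 2>&1` sends both streams to the file. This demo uses a throwaway function `noisy` as a hypothetical stand-in for `deadlinecommand`.

```shell
#!/bin/bash
# Stand-in for deadlinecommand: one line on stdout, one on stderr.
noisy() {
	echo "to stdout"
	echo "to stderr" >&2
}

log_a=$(mktemp)
log_b=$(mktemp)

# stderr is duplicated to the current stdout first, then only stdout goes to log_a.
noisy 2>&1 >> "$log_a"

# stdout goes to log_b first, then stderr is pointed at the same file.
noisy >> "$log_b" 2>&1

content_a=$(cat "$log_a")
content_b=$(cat "$log_b")
rm -f "$log_a" "$log_b"

printf 'log_a:\n%s\n\nlog_b:\n%s\n' "$content_a" "$content_b"
```

Only the second form keeps error messages from `RemoteControl` in the log file, which is where you would look when a slave fails to start or stop.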

Hello,

I think it would help to get the Launcher and Slave logs from a machine where this was tested, from the same day, so we can see what is actually happening as opposed to what should be happening.
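Something like the following could gather the Launcher and Slave logs written today on a node. It assumes the log folder from the init.d script (`/var/log/Thinkbox/Deadline8`) and that the relevant file names contain `launcher` or `slave`; the function name and these patterns are assumptions, so adjust them to what is actually on disk. `-newermt` is a GNU find option, which CentOS 7 ships.

```shell
#!/bin/bash
# Hypothetical helper: print Launcher/Slave log files under DIR modified since midnight today.
todays_deadline_logs() {
	local dir=${1:-/var/log/Thinkbox/Deadline8}
	# -newermt compares each file's modification time with a date string (GNU find);
	# "$(date +%F)" is today's date, i.e. midnight local time.
	find "$dir" -type f \( -iname '*launcher*' -o -iname '*slave*' \) \
		-newermt "$(date +%F)" -print | sort
}
```

For example, `todays_deadline_logs | tar -czf "$(hostname -s)-deadline-logs.tar.gz" -T -` packs them into one archive (GNU tar reads the file list from stdin with `-T -`).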

Update:

The Launcher is now started by systemd instead of the old-fashioned init.d script.
On top of that, the Deadline 7 to 8 transition is now over.

Starting the Launcher with systemd helped a bit.
Removing the parallel use of Deadline 7 and 8 fixed it for good.
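For reference, a minimal unit along the lines of what replaced the init.d script might look like the one written below. The user, binary path and `-shutdownall` stop command come from the init.d script; the unit contents, `Type=simple` (assuming `deadlinelauncher -daemon -nogui` stays in the foreground when systemd starts it directly, with no `&`) and the targets are assumptions to verify against your Deadline version. The hypothetical function only writes the file, so it can be reviewed before copying it to `/etc/systemd/system/` and running `systemctl daemon-reload`.

```shell
#!/bin/bash
# Hypothetical helper: write a systemd unit for the Deadline 8 Launcher to OUTFILE.
# USER and BINDIR default to the values used in the init.d script.
write_launcher_unit() {
	local outfile=$1
	local user=${2:-render}
	local bindir=${3:-/opt/thinkbox/deadline-8.0.7.3/bin}
	cat > "$outfile" <<EOF
[Unit]
Description=Deadline 8 Launcher Service
After=network.target remote-fs.target

[Service]
Type=simple
User=$user
ExecStart=$bindir/deadlinelauncher -daemon -nogui
ExecStop=$bindir/deadlinelauncher -shutdownall

[Install]
WantedBy=multi-user.target
EOF
}
```

Note that with systemd's default `KillMode=control-group`, stopping the unit also terminates processes the Launcher spawned in the same cgroup, such as a Slave; `KillMode=process` is the setting to look at if Slaves should survive a Launcher restart.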