AWS Thinkbox Discussion Forums

pulse crash

Weird pulse crash… no error message, nothing.

The monitor reports pulse to be running, even though it's been down for 12 hours.

Last bits of the log:

2013-10-17 20:07:00: CheckPathMapping: Swapped “//s2/exchange/software/managed/deadline/scriptDependency/scriptDependency.py” with “/mnt/s2/exchange/software/managed/deadline/scriptDependency/scriptDependency.py”
2013-10-17 20:07:00: Executing dependency script: /mnt/s2/exchange/software/managed/deadline/scriptDependency/scriptDependency.py
2013-10-17 20:07:00: Running Scanline scriptDependency checker v0.21 for JobID: 5260a3387c5b8a2a3c2a0c28 TaskIDs: [0, 1, 2, 3, 4, 5, 6]
2013-10-17 20:07:00: scriptDependency Task ID: 0 (5260a3387c5b8a2a3c2a0c28_0) Frames: 1000-1009
2013-10-17 20:07:00: DO NOT Queue
2013-10-17 20:07:00: scriptDependency Task ID: 1 (5260a3387c5b8a2a3c2a0c28_1) Frames: 1010-1019
2013-10-17 20:07:00: DO NOT Queue
2013-10-17 20:07:00: scriptDependency Task ID: 2 (5260a3387c5b8a2a3c2a0c28_2) Frames: 1020-1029
2013-10-17 20:07:00: DO NOT Queue
2013-10-17 20:07:00: scriptDependency Task ID: 3 (5260a3387c5b8a2a3c2a0c28_3) Frames: 1030-1039
2013-10-17 20:07:00: DO NOT Queue
2013-10-17 20:07:00: scriptDependency Task ID: 4 (5260a3387c5b8a2a3c2a0c28_4) Frames: 1040-1049
2013-10-17 20:07:00: DO NOT Queue
2013-10-17 20:07:00: scriptDependency Task ID: 5 (5260a3387c5b8a2a3c2a0c28_5) Frames: 1050-1059
2013-10-17 20:07:00: DO NOT Queue
2013-10-17 20:07:00: scriptDependency Task ID: 6 (5260a3387c5b8a2a3c2a0c28_6) Frames: 1060-1066
2013-10-17 20:07:00: DO NOT Queue
2013-10-17 20:07:00: Queueing: []
2013-10-17 20:07:00: Dependency script returned 0 tasks that can start: /mnt/s2/exchange/software/managed/deadline/scriptDependency/scriptDependency.py
2013-10-17 20:07:00: Cleaning up orphaned tasks
2013-10-17 20:07:01: Done.
2013-10-17 20:07:01: Process exit code: 0
2013-10-17 20:07:04: Power Management - Thermal Shutdown: Skipping zone “Laszlo” because it is disabled
2013-10-17 20:07:04: Power Management - Thermal Shutdown: Skipping zone “AnimatorWorkstations” because it is disabled
2013-10-17 20:07:04: Power Management - Thermal Shutdown: Skipping zone “Slaves” because it is disabled
2013-10-17 20:07:04: Power Management - Idle Shutdown: Skipping idle shutdown group “Laszlo” because it is disabled
2013-10-17 20:07:04: Power Management - Idle Shutdown: Skipping idle shutdown group “AnimatorWorkstations” because it is disabled
2013-10-17 20:07:04: Power Management - Idle Shutdown: Skipping idle shutdown group “Slaves” because it is disabled
2013-10-17 20:07:04: Power Management - Machine Startup: There are no slaves that need to be woken up at this time
2013-10-17 20:07:04: Power Management - Machine Restart: Skipping machine group “Laszlo” because it is disabled
2013-10-17 20:07:04: Power Management - Machine Restart: Skipping machine group “AnimatorWorkstations” because it is disabled
2013-10-17 20:07:04: Power Management - Machine Restart: Skipping machine group “Slaves” because it is disabled
2013-10-17 20:07:04: Power Management - Slave Scheduling: Skipping scheduling group “Laszlo” because it is disabled
2013-10-17 20:07:04: Power Management - Slave Scheduling: Skipping scheduling group “AnimatorWorkstations” because it is disabled
2013-10-17 20:07:04: Power Management - Slave Scheduling: Skipping scheduling group “Slaves” because it is disabled

Can the pulse machine somehow be configured to auto start pulse if it dies?

Kinda like an old-timey backburner-style batch file with a loop?

We do want to add some sort of auto-pulse failover or startup feature at some point. For now though, you could just have a scheduled task that runs deadlinepulse.exe every X minutes. If pulse is already running, running deadlinepulse.exe again won’t do anything, so you wouldn’t have to worry about multiple pulses running on the same machine.
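For reference, the loop idea could be sketched roughly like this in a shell script. The function name `restart_loop` and its arguments are made up for illustration, and it assumes the command stays in the foreground for as long as pulse is running (as a direct `-nogui` start presumably does), so treat it as a starting point rather than a tested setup:

```shell
#!/bin/sh
# restart_loop MAX DELAY COMMAND [ARGS...]   (hypothetical helper, not part of Deadline)
# Runs COMMAND in the foreground and starts it again each time it exits.
# MAX=0 means loop forever; DELAY is the pause in seconds between restarts.
restart_loop() {
    max=$1
    delay=$2
    shift 2
    count=0
    while [ "$max" -eq 0 ] || [ "$count" -lt "$max" ]; do
        "$@"
        status=$?
        count=$((count + 1))
        # Report each exit on stderr so it can be told apart from the command's own output.
        echo "command exited with status $status (run $count)" >&2
        sleep "$delay"
    done
}
```

A call along the lines of `restart_loop 0 10 mono --runtime=v4.0 /opt/Thinkbox/Deadline6/bin/deadlinepulse.exe -nogui` would then keep bringing pulse back up, pausing 10 seconds between attempts.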

Can I do that with the launcher somehow?

Ideally, I would like the self-update to be triggered.

Or should I just do a:

mono --runtime=v4.0 /opt/Thinkbox/Deadline6/bin/deadlinepulse.exe -nogui

Yeah, you could do that with the launcher. This should be enough:

/opt/Thinkbox/Deadline6/bin/deadlinelauncher -pulse

It will trigger an auto-update if necessary, otherwise if pulse is running, it will do nothing.

It seems that for some reason our cronjob is not restarting pulse… It goes down a couple of times a day, and the restart attempts from the cronjob are not even being registered in the launcher log. Very odd… Right now, I have to restart it manually after someone complains that dependencies aren't dequeuing.

Man, it shouldn’t be crashing like that. Anything in the logs?

The last item is the script dependency check Python script running, but there's no crash dump.

Oh, I forgot that in beta 9, we made the ability to run the housecleaning in a separate thread optional. In the repository options, under Housecleaning, try turning the option back on to run it in a separate thread. Then restart Pulse to see if it stays up, and post the log if it doesn’t.

One way to get more info out of pulse is redirect its stdout to a log file. If there is a crash dump that isn’t making it into the regular pulse log file, it should hopefully make it into the one you’re redirecting stdout into.
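A minimal sketch of that redirect, in case it helps. The helper name `run_logged` is hypothetical, and it captures stderr as well as stdout on the assumption that a crash trace might be written to either one:

```shell
#!/bin/sh
# run_logged LOGFILE COMMAND [ARGS...]   (hypothetical helper, not part of Deadline)
# Appends a timestamped start marker to LOGFILE, then appends everything
# COMMAND writes to stdout and stderr. Returns COMMAND's exit status.
run_logged() {
    logfile=$1
    shift
    echo "=== started $(date '+%Y-%m-%d %H:%M:%S') ===" >> "$logfile"
    "$@" >> "$logfile" 2>&1
}
```

For example, `run_logged /tmp/pulse_stdout.log mono --runtime=v4.0 /opt/Thinkbox/Deadline6/bin/deadlinepulse.exe -nogui` would append pulse's console output to a file of your choosing (the log path here is just a placeholder).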

Attached is the crash log from when it was still running in the same thread!

I’ve now reset that value and am restarting pulse.
deadline_pulse_crash.txt (23.9 KB)

Thanks Laszlo! That log was very helpful, since it led us to a manual garbage collection trigger that shouldn’t have been there in the first place. We’ll be removing it in beta 10.

Cheers,

- Ryan

Very odd, pulse still crashes (much less often though), but our cronjob is not restarting it…

It's been down for 2 days now (over the weekend, sadly, when we had a bunch of people working).

Any ideas why the cronjob would not work?

0-59/2 * * * * /opt/Thinkbox/Deadline6/bin/deadlinelauncher -pulse

It doesn't seem to work at all… When I run the same command from the command line, I see in the launcher log that it's doing some version checking etc., but there is not a single entry from the executions supposedly triggered by crontab.

Hopefully the removal of the garbage collection code in beta 10 stabilizes things.

Not sure why that command wouldn’t be working though, unless perhaps the Pulse listening socket is still open. Does it make a difference if /opt/Thinkbox/Deadline6/bin/deadlinepulse is run directly instead of going through the launcher?

Seems like cron jobs don't have access to mono due to environment variable issues.

If I have the command as:

/opt/mono-2.10.9/bin/mono /opt/Thinkbox/Deadline6/bin/deadlinelauncher.exe -pulse

It works.
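For anyone hitting the same thing: cron typically runs jobs with a much smaller PATH than a login shell, so a wrapper that relies on finding `mono` by name can fail silently. Besides spelling out the full mono path as above, another option is to put the mono directory on PATH before launching. A rough sketch, where `with_path` is a made-up helper name and the mono directory is the one from this thread:

```shell
#!/bin/sh
# with_path DIR COMMAND [ARGS...]   (hypothetical helper, not part of Deadline)
# Runs COMMAND with DIR prepended to PATH, so programs installed in DIR
# can be found even under cron's minimal environment.
with_path() {
    dir=$1
    shift
    # Subshell keeps the PATH change from leaking into the caller.
    (
        PATH="$dir:$PATH"
        export PATH
        "$@"
    )
}
```

Used from a small script that cron calls, that would look something like `with_path /opt/mono-2.10.9/bin /opt/Thinkbox/Deadline6/bin/deadlinelauncher -pulse`. Redirecting the cron job's output to a file is also a quick way to see the actual error if it still doesn't start.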
