AWS Thinkbox Discussion Forums

FR: PM - Thermal Management - IPMI sensors

Hi,
I’d like to suggest an enhancement to the thermal management sensors in Deadline. SNMP is great, but we are seeing more and more high-end kit these days that only supports IPMI. This is because hardware such as blades, and the “chassis” they live in, requires considerably more power. Manufacturers such as HP and Dell have removed SNMP in a bid to save power and only support IPMI going forward, which means newer kit is not going to work with Deadline’s SNMP thermal sensors. (I’m not asking to remove SNMP. The exact opposite! Keep SNMP, but enhance it with the addition of IPMI!) So, here’s my proposal:

  1. With the aid of the open-source project “ipmiutil”, compile this application into Deadline to provide IPMI support.
  2. See attached screen-grab of how the current “Add Thermal Shutdown Sensor” dialog would need to change.
    ipmi_sensors.png
  3. The following command would need to be executed to query an IPMI sensor: "c:\ipmiutil_install_path\ipmiutil.exe" sensor -N [IP_ADDRESS] -U [USERNAME] -P [PASSWORD]
  4. Replace the SNMP OID field with the IPMI sensor “ID” number.
  5. Enhance the “Test” button in the current dialog to run a manual query and show the STDOUT, as in the example below.
  6. Finally, (potentially) provide a conversion table between the returned hex value and a real-world, usable string value for Deadline. (I think ipmiutil can already take care of this conversion, but that needs to be checked! It’s also better to leave this task to ipmiutil, as it’s a critical path for their project to maintain!)
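
To illustrate steps 3 and 5, here is a minimal Python sketch of how the query could be wrapped. It is only an illustration under stated assumptions, not how Deadline works: the function names `build_sensor_command` and `query_sensors` are hypothetical, the install path is the placeholder from step 3, and the only ipmiutil arguments used are the ones shown in this post (`sensor -N -U -P`).

```python
import subprocess

# Placeholder path from the post; the real install location is an assumption.
IPMIUTIL = r"c:\ipmiutil_install_path\ipmiutil.exe"


def build_sensor_command(host, username, password, exe=IPMIUTIL):
    """Return the argument list for `ipmiutil sensor -N host -U user -P password`."""
    # Passing a list (not one shell string) avoids quoting problems with
    # paths that contain spaces, such as "Program Files (x86)".
    return [exe, "sensor", "-N", host, "-U", username, "-P", password]


def query_sensors(host, username, password, exe=IPMIUTIL, timeout=30):
    """Run the query and return its STDOUT as text.

    This is roughly what a "Test" button could display. Raises
    subprocess.CalledProcessError if ipmiutil exits with a non-zero code.
    """
    result = subprocess.run(
        build_sensor_command(host, username, password, exe),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,
    )
    return result.stdout
```

Building the argument list separately from running it keeps the command easy to inspect or log (with the password masked) before anything is executed.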

See below for an example dump of the output you get when you run the command from step 3:

[code]
c:\Program Files (x86)\sourceforge\ipmiutil>ipmiutil sensor -N [IP_ADDRESS] -U [USERNAME] -P [PASSWORD]

ipmiutil ver 2.86
isensor: version 2.86
Opening lan connection to node 192.168.0.1 …
Connecting to node 192.168.0.1
– BMC version 2.50, IPMI version 2.0
ID SDR_Type_xx ET Own Typ S_Num Sens_Description Hex & Interp Reading
0001 SDR Full 01 01 20 a 01 snum 01 Temp = 5b OK -37.00 degrees C
0002 SDR Full 01 01 20 a 01 snum 02 Temp = 57 OK -41.00 degrees C
0003 SDR Full 01 01 20 a 01 snum 03 Mem Temp 1 = a4 OK 36.00 degrees C
0004 SDR Full 01 01 20 a 01 snum 04 Mem Temp 2 = a4 OK 36.00 degrees C
0005 SDR Comp 02 6f 20 a 29 snum 10 CMOS Battery = 0000 OK
0006 SDR Comp 02 03 20 a 02 snum 11 VCORE = 0001 OK
0007 SDR Comp 02 03 20 a 02 snum 12 VCORE = 0001 OK
0008 SDR Comp 02 03 20 a 02 snum 13 CPU VTT = 0001 OK
0009 SDR Comp 02 03 20 a 02 snum 14 1.5V PG = 0001 OK
000a SDR Comp 02 03 20 a 02 snum 15 1.8V PG = 0001 OK
000b SDR Comp 02 03 20 a 02 snum 16 3.3V PG = 0001 OK
000c SDR Comp 02 03 20 a 02 snum 17 5V PG = 0001 OK
000d SDR Comp 02 03 20 a 02 snum 18 1.5V ESB2 PG = 0001 OK
000e SDR Comp 02 03 20 a 02 snum 19 Linear PG = 0001 OK
000f SDR Comp 02 03 20 a 02 snum 1a 0.9V PG = 0001 OK
0010 SDR Comp 02 03 20 a 02 snum 1b 0.9V Over Volt = 0001 OK
0011 SDR Comp 02 03 20 a 02 snum 1c CPU Power Fault = 0001 OK
0012 SDR Comp 02 6f 20 a 25 snum 50 Presence = 0001 OK*
0013 SDR Comp 02 6f 20 a 25 snum 51 Presence = 0001 OK*
0014 SDR Comp 02 6f 20 a 07 snum 60 Status = 0080 ProcPresent
0015 SDR Comp 02 6f 20 a 07 snum 61 Status = 0080 ProcPresent
0016 SDR Comp 02 6f 20 a 23 snum 71 OS Watchdog = 0000 OK
0017 SDR Comp 02 6f 20 a 10 snum 72 SEL = 0000 Unknown
0018 SDR Comp 02 03 20 m 01 snum 76 CPU Temp Interf = 0040 NotAvailable
0019 SDR Comp 02 03 20 a 02 snum 5f PFault Fail Safe = 0000 Unknown
001a SDR Comp 02 0a 20 m 17 snum 55 Daughter Card = 0040 NotAvailable
001b SDR Comp 02 6f 20 a 0d snum 80 Drive = 0001 Unused Faulty
001c SDR EntA 08 10 07 01 c0: 03 01 03 02 0a 01 0a 02
001d SDR IPMB 12 0e dev: 20 00 9f 07 01 BMC
001e SDR FRU 11 17 dev: 20 00 80 00 07 01 System Board
001f SDR FRU 11 0f dev: 20 b0 02 00 03 01 CPU1
0020 SDR FRU 11 0f dev: 20 b0 02 00 03 02 CPU2
0021 SDR FRU 11 12 dev: 20 05 80 00 1a 01 Storage
0022 SDR FRU 11 18 dev: 20 07 80 00 0b 03 Daughter Card
0023 SDR Comp 02 6f b1 a 0c snum 01 ECC Corr Err = 0000 Unknown
0024 SDR Comp 02 6f b1 a 0c snum 02 ECC Uncorr Err = 0000 Unknown
0025 SDR Comp 02 6f b1 a 13 snum 03 I/O Channel Chk = 0000 Unknown
0026 SDR Comp 02 6f b1 a 13 snum 04 PCI Parity Err = 0000 Unknown
0027 SDR Comp 02 6f b1 a 13 snum 05 PCI System Err = 0000 Unknown
0028 SDR Comp 02 6f b1 a 10 snum 06 SBE Log Disabled = 0000 Unknown
0029 SDR Comp 02 6f b1 a 10 snum 07 Logging Disabled = 0000 Unknown
002a SDR Comp 02 6f b1 a 12 snum 08 Unknown = 0000 Unknown
002b SDR Comp 02 07 b1 a 07 snum 0a CPU Protocol Err = 0000 Unknown
002c SDR Comp 02 07 b1 a 07 snum 0b CPU Bus PERR = 0000 Unknown
002d SDR Comp 02 07 b1 a 07 snum 0c CPU Init Err = 0000 Unknown
002e SDR Comp 02 07 b1 a 07 snum 0d CPU Machine Chk = 0000 Unknown
002f SDR Comp 02 0b b1 a 0c snum 11 Memory Spared = 0000 Unknown
0030 SDR Comp 02 0b b1 a 0c snum 12 Memory Mirrored = 0000 Unknown
0031 SDR Comp 02 0b b1 a 0c snum 13 Memory RAID = 0000 Unknown
0032 SDR Comp 02 6f b1 a 0c snum 14 Memory Added = 0000 Unknown
0033 SDR Comp 02 6f b1 a 0c snum 15 Memory Removed = 0000 Unknown
0034 SDR Comp 02 6f b1 a 0c snum 16 Memory Cfg Err = 0000 Unknown
0035 SDR Comp 02 0b b1 a 0c snum 17 Mem Redun Gain = 0000 Unknown
0036 SDR Comp 02 6f b1 a 13 snum 18 PCIE Fatal Err = 0000 Unknown
0037 SDR Comp 02 6f b1 a 13 snum 19 Chipset Err = 0000 Unknown
0038 SDR Comp 02 7e b1 a c1 snum 1a Err Reg Pointer = 0000 Unknown
0039 SDR Comp 02 07 b1 a 0c snum 1b Mem ECC Warning = 0000 Unknown
003a SDR Comp 02 07 b1 a 0c snum 1c Mem Intrface Err = 0000 Unknown
003b SDR Comp 02 07 b1 a 0c snum 1d USB Over-current = 0000 Unknown
003c SDR Comp 02 6f b1 a 0f snum 1e POST Err = 0000 Unknown
003d SDR Comp 02 6f b1 a 2b snum 1f Hdwr version err = 0000 Unknown
003e SDR Comp 02 6f b1 a 0c snum 20 Mem Overtemp = 0000 Unknown
003f SDR Comp 02 6f b1 a 0c snum 21 Mem Fatal SB CRC = 0000 Unknown
0040 SDR Comp 02 6f b1 a 0c snum 22 Mem Fatal NB CRC = 0000 Unknown
ipmiutil sensor, completed successfully
[/code]
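
As a rough illustration of steps 4 and 6, the dump above could be parsed so that a sensor is looked up by its “ID” and its interpreted reading compared against a limit. This is a sketch only: the regular expression is written against the output exactly as pasted in this post (other ipmiutil versions may lay the columns out differently), and `parse_sensor_output` and `is_over_threshold` are hypothetical names, not part of Deadline or ipmiutil.

```python
import re

# Matches "Full" and "Comp" sensor records as they appear in the dump above.
# Other record types (EntA, IPMB, FRU) carry no reading and are skipped.
SENSOR_LINE = re.compile(
    r"^(?P<id>[0-9a-f]{4})\s+SDR\s+(?:Full|Comp)\s+.*?\bsnum\s+(?P<snum>[0-9a-f]{2})\s+"
    r"(?P<name>.+?)\s+=\s+(?P<hex>[0-9a-f]+)\s+(?P<status>.+?)"
    r"(?:\s+(?P<value>-?\d+(?:\.\d+)?)\s+(?P<units>.+))?$",
    re.IGNORECASE,
)


def parse_sensor_output(text):
    """Map each sensor ID (e.g. "0003") to its name, hex value, status and reading."""
    readings = {}
    for line in text.splitlines():
        match = SENSOR_LINE.match(line.strip())
        if not match:
            continue  # banner lines, header row, non-sensor records
        value = match.group("value")
        readings[match.group("id")] = {
            "name": match.group("name"),
            "hex": match.group("hex"),
            "status": match.group("status"),
            "value": float(value) if value is not None else None,
            "units": match.group("units"),
        }
    return readings


def is_over_threshold(readings, sensor_id, limit_c):
    """Return True if the given sensor reports a temperature above limit_c."""
    reading = readings.get(sensor_id)
    if reading is None or reading["value"] is None or reading["units"] != "degrees C":
        raise ValueError("sensor %r has no temperature reading" % sensor_id)
    return reading["value"] > limit_c


# A few lines copied from the dump above.
SAMPLE = """\
ipmiutil ver 2.86
ID SDR_Type_xx ET Own Typ S_Num Sens_Description Hex & Interp Reading
0003 SDR Full 01 01 20 a 01 snum 03 Mem Temp 1 = a4 OK 36.00 degrees C
0005 SDR Comp 02 6f 20 a 29 snum 10 CMOS Battery = 0000 OK
001b SDR Comp 02 6f 20 a 0d snum 80 Drive = 0001 Unused Faulty
001c SDR EntA 08 10 07 01 c0: 03 01 03 02 0a 01 0a 02
"""

readings = parse_sensor_output(SAMPLE)
print(readings["0003"]["name"], readings["0003"]["value"])
print(is_over_threshold(readings, "0003", 35.0))
```

Because ipmiutil already prints the interpreted value next to the raw hex, the sketch reads that value directly instead of attempting its own hex conversion.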

I’m not expecting a miracle in v6.0, but v6.1 would be lovely :)

Thanks,
Mike

Hey Mike,

We’ve added this to the wish list. It’s still too early to tell when we will have time to look into this, but once 6.0 starts to wrap up, we’re going to be sorting through our ever-growing wish list to figure out the roadmap for 6.1.

Cheers,

- Ryan