Plugin: ipmi_sensor: fails to detect PSU failure #5755
Comments
It looks like we don't currently handle hexadecimal values. Could you also run this command and add the output for these sensors so I can be sure to fix it for
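To make "we don't handle hexadecimal values" concrete, here is a minimal Go sketch of splitting one line of ipmitool sdr output and recognizing a bare hex reading such as 0x03. It assumes the three pipe-separated columns (name | reading | status); the type and function names (sdrLine, parseSDRLine, hexReading) and the sample line are hypothetical and are not taken from the Telegraf plugin.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// sdrLine holds the three pipe-separated columns assumed for
// `ipmitool sdr` output: name | reading | status.
type sdrLine struct {
	Name    string
	Reading string
	Status  string
}

// parseSDRLine splits one output line into trimmed columns.
func parseSDRLine(line string) (sdrLine, error) {
	parts := strings.Split(line, "|")
	if len(parts) != 3 {
		return sdrLine{}, fmt.Errorf("unexpected column count %d in %q", len(parts), line)
	}
	return sdrLine{
		Name:    strings.TrimSpace(parts[0]),
		Reading: strings.TrimSpace(parts[1]),
		Status:  strings.TrimSpace(parts[2]),
	}, nil
}

// hexReading reports whether the reading is a bare hex value such as
// "0x03" and, if so, returns it as an integer. Base 0 makes
// strconv.ParseUint honor the "0x" prefix.
func hexReading(reading string) (uint64, bool) {
	if !strings.HasPrefix(reading, "0x") {
		return 0, false
	}
	v, err := strconv.ParseUint(reading, 0, 16)
	if err != nil {
		return 0, false
	}
	return v, true
}

func main() {
	// Hypothetical sample line, shaped like the PSU rows discussed here.
	l, err := parseSDRLine("PS1 Status       | 0x03              | ok")
	if err != nil {
		panic(err)
	}
	if v, ok := hexReading(l.Reading); ok {
		fmt.Printf("%s: raw state 0x%02x, status column %q\n", l.Name, v, l.Status)
	}
}
```

The point of keeping the raw value separate from the status column is that the two can disagree, which is exactly the situation reported in this issue.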
Wow, that's the worst timing ever :) I just replaced them with new ones.
OK, the ipmitool sdr and ipmitool sdr elist outputs look like this:
After the BMC update, with one power line turned off (both PSUs show the same status, which is horrible):
Switching to the failed PSU:
P.S. Funny thing:
PS1 is missing, but its status is OK :)
@vvershkov We are finalizing a fix for the You will need to set up your alerts to handle this specific issue either with the
Relevant telegraf.conf:
System info:
Any OS or Telegraf version; this bug is caused by IPMI itself.
Steps to reproduce:
We have a server with a failed PSU. We know it failed because we saw it: it shows an amber light instead of a green one, and it sounds an alarm too. But IPMI and Telegraf report its status as OK:
However, the 0x03 flag means "failure":
https://support.hpe.com/hpsc/doc/public/display?docId=emr_na-sg8039en_us&docLocale=en_US
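One way to read 0x03 as "failure" is as a state bitmask for the IPMI Power Supply sensor type (08h), where bit 0 is "presence detected" and bit 1 is "power supply failure detected". The Go sketch below decodes such a mask under that assumption; whether ipmitool prints exactly this bitmask for a given BMC is an assumption, and the names psuStates, decodePSU and psuFailed are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// psuStates lists the sensor-specific offsets for the IPMI
// "Power Supply" sensor type (08h). Bit n set in the state mask
// means offset n is asserted.
var psuStates = []string{
	"presence detected",
	"power supply failure detected",
	"predictive failure",
	"power supply input lost (AC/DC)",
	"power supply input lost or out-of-range",
	"power supply input out-of-range, but present",
	"configuration error",
	"power supply inactive (standby)",
}

// decodePSU returns the names of all asserted states in mask.
func decodePSU(mask uint64) []string {
	var asserted []string
	for bit, name := range psuStates {
		if mask&(1<<uint(bit)) != 0 {
			asserted = append(asserted, name)
		}
	}
	return asserted
}

// psuFailed reports whether any fault-type state is asserted.
// Assumption: offsets 1 through 6 count as faults; presence (0)
// and standby (7) do not.
func psuFailed(mask uint64) bool {
	const faultBits = 0x7E // bits 1..6
	return mask&faultBits != 0
}

func main() {
	const reading = 0x03 // the raw value reported in this issue
	// prints: presence detected, power supply failure detected
	fmt.Println(strings.Join(decodePSU(reading), ", "))
	// prints: failed: true
	fmt.Println("failed:", psuFailed(reading))
}
```

Under this reading, 0x03 means "the PSU is present and has failed", which matches the amber light and alarm described above even though the status column says OK.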
Expected behavior:
PS1 Status is bad (0)
Actual behavior:
PS1 Status is OK (1)
Additional info:
AFAIK, there are no readings with 0x03 and an "OK" status. On old motherboards, for example, 0x03 for a CPU means overheating. But perhaps the flag check is needed only for PSUs.
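The suggestion to apply the flag check only to PSUs could look roughly like the Go sketch below. It assumes the status field is 1 when the status column reads "ok" and 0 otherwise (matching the expected and actual behavior above), then overrides that to 0 for PSU sensors whose failure bit is set. The name-based PSU heuristic and all function names are hypothetical; a real fix would more likely use the sensor type the BMC reports.

```go
package main

import (
	"fmt"
	"strings"
)

// statusFromText mirrors the behavior described in this issue:
// 1 when the status column reads "ok", otherwise 0.
func statusFromText(statusCol string) int {
	if strings.EqualFold(strings.TrimSpace(statusCol), "ok") {
		return 1
	}
	return 0
}

// isPSUStatus is a hypothetical name-based heuristic for PSU
// status sensors such as "PS1 Status".
func isPSUStatus(name string) bool {
	n := strings.ToLower(name)
	return strings.HasPrefix(n, "ps") && strings.Contains(n, "status")
}

// sensorStatus applies the PSU-only flag check on top of the text
// check: a PSU whose state mask has the failure bit (offset 1) set
// is reported as bad even when the status column says "ok".
// Non-PSU sensors keep the text-based result, since 0x03 can mean
// something else for them (e.g. CPU overheating on old boards).
func sensorStatus(name, statusCol string, mask uint64, hasMask bool) int {
	const psuFailureBit = 1 << 1
	if hasMask && isPSUStatus(name) && mask&psuFailureBit != 0 {
		return 0
	}
	return statusFromText(statusCol)
}

func main() {
	fmt.Println(sensorStatus("PS1 Status", "ok", 0x03, true))  // 0: failed PSU
	fmt.Println(sensorStatus("PS2 Status", "ok", 0x01, true))  // 1: present, healthy
	fmt.Println(sensorStatus("CPU1 Status", "ok", 0x03, true)) // 1: not a PSU
}
```

Limiting the override to PSU sensors keeps the change narrow and avoids reinterpreting discrete values whose meaning differs by sensor type.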