Update filtering of spikes in temp reading #961

Merged

MSECode
Contributor

@MSECode MSECode commented May 22, 2024

Completing the work done in #959, spikes in the temperature readings are now filtered in software.
If the delta between the current and previous temperature is higher than a threshold (currently defined in software, 20 degrees Celsius by default), the new reading is considered a spike. It is then discarded, even if its value is higher than the warning or error temperature threshold, and the previous temperature is not updated.
Moreover, we have added to the Watchdog class a method that sets the threshold (the maximum number of erroneous values read before an error or warning is sent) according to the transmission rate of the Ethernet packets.
Therefore, depending on the transmission rate defined in the robot configuration, we set the watchdogs for exceeding the warning threshold on the temperature reading and for exceeding the limit on negative temperature values (which mean that the reading failed), so that the resulting timeout is constant and independent of the transmission rate of the Ethernet packets.
By default, it is set so that we raise the warning or error when we get erroneous temperature values (which are not spikes) for at least 60 seconds.
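
As a rough illustration of the spike rule described above, here is a minimal C++ sketch. The names `FilterResult` and `filterReading` are hypothetical and are not taken from icub-main. The sketch also assumes that only upward jumps count as spikes, which is a guess based on the "positive temperature spikes" comment in the TemperatureFilter code quoted later in this thread.

```cpp
#include <cassert>

// Hypothetical helper type (not from icub-main): outcome of filtering one reading.
struct FilterResult
{
    bool isSpike;     // true when the reading was discarded as a spike
    double previous;  // value to keep as the "previous temperature"
};

// Assumption: a reading is a spike when it rises by more than maxDelta
// (degrees Celsius) with respect to the previous accepted reading.
FilterResult filterReading(double previous, double current, double maxDelta = 20.0)
{
    assert(maxDelta >= 0.0);
    if (current - previous > maxDelta)
    {
        // Spike: discard the reading and leave the previous temperature untouched.
        return {true, previous};
    }
    // Valid reading: it becomes the new previous temperature.
    return {false, current};
}
```

With the default threshold, a jump from 40 to 75 degrees is discarded and the previous value stays at 40, while a step from 40 to 55 degrees is accepted and can then be compared against the warning and error limits.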

Spikes in temperature readings are now filtered:

  • Spikes are not considered when checking whether the warning/hw temperature limits are exceeded
  • A median filter is added for the initialization of the temperature so that we do not risk saving invalid or null temperatures
  • Exceeding the warning and hw limits is still correctly checked when the reading is not a spike
  • The counter on the watchdog thresholds is now divided by the eth transmission rate
  • We set a limit for the watchdog when the temperature is higher than the warn/hw limit for 60 seconds continuously
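
The division of the watchdog counter by the transmission rate can be sketched as below. This is only an illustration under two assumptions that the PR text does not state explicitly: the transmission rate is expressed in milliseconds per packet, and the watchdog counter is incremented once per received packet. `ticksForTimeout` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: number of received packets that span timeoutMs
// when one packet arrives every txRateMs milliseconds (assumed units).
std::uint32_t ticksForTimeout(std::uint32_t timeoutMs, std::uint8_t txRateMs)
{
    assert(txRateMs > 0);  // guard against a division by zero
    return timeoutMs / txRateMs;
}
```

Under these assumptions a 60000 ms timeout corresponds to 60000 packets at a 1 ms rate, 30000 packets at 2 ms and 12000 packets at 5 ms, so the elapsed time before the watchdog expires stays at about 60 seconds in every case.
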
@MSECode MSECode self-assigned this May 22, 2024
@MSECode MSECode requested a review from valegagge May 22, 2024 14:42
@MSECode MSECode marked this pull request as draft May 22, 2024 14:43
@MSECode MSECode requested a review from pattacini May 22, 2024 14:43
Member

@valegagge valegagge left a comment

Good work!

@valegagge valegagge marked this pull request as ready for review May 23, 2024 08:59
@pattacini pattacini linked an issue May 24, 2024 that may be closed by this pull request
@pattacini pattacini merged commit 5410ab0 into robotology:devel May 24, 2024
8 checks passed
@S-Dafarra
Contributor

By default, it is set so that we raise the warning or error when we get erroneous temperature values (which are not spikes) for at least 60 seconds.

Hi there, just out of curiosity: is there a way to get these warnings from code?

@MSECode
Contributor Author

MSECode commented May 27, 2024

I'm not sure this answers what you need. Anyway, warnings are only related to reading a temperature higher than the warning threshold when that increase is not a spike (here a spike means that the delta between the current and previous temperature is higher than 20 degrees Celsius). The first time the temperature goes over the warning threshold, a warning message is sent. Then, to avoid flooding the YRI, we send the next warning message only if the temperature stays continuously above the threshold for at least 60 seconds. Timing definitions have been set here for now:

// Headers this excerpt relies on
#include <cmath>
#include <cstdint>
#include <vector>
#include <yarp/os/Time.h>

class Watchdog
{
private:
    bool _isStarted;
    uint32_t _count;
    double _time;
    uint32_t _threshold; // use 10000 as limit on the watchdog for the error on the temperature sensor receiving of the values -
                         // since the ETH callback timing is 2ms by default so using 10000 we can set a checking threshold of 5 seconds
                         // in which we can allow the tdb to not respond. If cannot receive response over 1s we trigger the error
public:
    // initializers are listed in declaration order
    Watchdog(): _isStarted(false), _count(0), _time(0), _threshold(60000) {}
    Watchdog(uint32_t threshold): _isStarted(false), _count(0), _time(0), _threshold(threshold) {}
    ~Watchdog() = default;
    Watchdog(const Watchdog& other) = default;
    Watchdog(Watchdog&& other) noexcept = default;
    Watchdog& operator=(const Watchdog& other) = default;
    Watchdog& operator=(Watchdog&& other) noexcept = default;
    bool isStarted(){return _isStarted;}
    void start() {_count = 0; _time = yarp::os::Time::now(); _isStarted = true;}
    bool isExpired() {return (_count > _threshold);}
    void increment() {++_count;}
    void clear(){_isStarted=false;}
    double getStartTime() {return _time;}
    uint32_t getCount() {return _count; }
    // divides the current threshold by the tx rate: txrateOfRegularROPs must be non-zero,
    // and every further call divides the threshold again
    void setThreshold(uint8_t txrateOfRegularROPs){_threshold = _threshold / txrateOfRegularROPs;}
    uint32_t getThreshold(){return _threshold;}
};

class TemperatureFilter
{
private:
    uint32_t _threshold;   // threshold for the delta between current and previous temperature --> set to 20 Celsius deg by default --> over 20 deg delta spike
    double _motorTempPrev; // motor temperature at previous instant for checking positive temperature spikes
    bool _isStarted;
    int32_t _initCounter;
    std::vector<double> _initTempBuffer;
public:
    // initializers are listed in declaration order
    TemperatureFilter(): _threshold(20), _motorTempPrev(0), _isStarted(false), _initCounter(50), _initTempBuffer(0) {}
    TemperatureFilter(uint32_t threshold, int32_t initCounter): _threshold(threshold), _motorTempPrev(0), _isStarted(false), _initCounter(initCounter), _initTempBuffer(0) {}
    ~TemperatureFilter() = default;
    TemperatureFilter(const TemperatureFilter& other) = default;
    TemperatureFilter(TemperatureFilter&& other) noexcept = default;
    TemperatureFilter& operator=(const TemperatureFilter& other) = default;
    TemperatureFilter& operator=(TemperatureFilter&& other) noexcept = default;
    bool isStarted(){return _isStarted;}
    uint32_t getTemperatureThreshold() {return _threshold; }
    double getPrevTemperature(){return _motorTempPrev;}
    void updatePrevTemperature(double temperature){_motorTempPrev = temperature;}
    void start(double temperature)
    {
        if(_initCounter < 0)
        {
            // size() / 2 is an integer division, so std::ceil has no effect here;
            // the buffer is not sorted, so this picks the middle sample in arrival order
            int median_pos = std::ceil(_initTempBuffer.size() / 2) -1;
            _motorTempPrev = _initTempBuffer.at(median_pos);
            _isStarted = true;
        }
        else
        {
            // still initializing: store the sample and count down
            _initTempBuffer.push_back(temperature);
            --_initCounter;
        }
    }
};
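
For comparison, a median over the initial samples could be computed as in the sketch below. This is not the code merged in this PR: `medianOfSamples` is a hypothetical helper, and it orders the samples with `std::nth_element` before picking the middle one, so that an isolated invalid sample in the buffer cannot be selected as the starting temperature.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from icub-main): lower median of the samples.
// The vector is taken by value so the caller's buffer is left untouched.
double medianOfSamples(std::vector<double> samples)
{
    assert(!samples.empty());
    const std::size_t mid = (samples.size() - 1) / 2;
    // Partially orders the copy so that samples[mid] holds the value it
    // would have if the whole vector were sorted.
    std::nth_element(samples.begin(), samples.begin() + mid, samples.end());
    return samples[mid];
}
```

For the samples 30, -1, 31, 250 and 29, the sorted order is -1, 29, 30, 31, 250, so the function returns 30 and both the negative error value and the 250 outlier are ignored.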

Instead, when you get negative readings, you can look up the meaning of the Celsius and raw values reported in the error message in this table:
https://icub-tech-iit.github.io/documentation/temperature_sensors/software/dataflow/#error-handling
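
The warning throttling described at the start of this comment (one message as soon as the threshold is crossed, then another only after the temperature has stayed above it long enough) can be sketched as follows. `WarningThrottle` is a hypothetical class written for illustration, not the logic in icub-main, and it counts readings rather than seconds. The number of readings per repeat period is an assumption supplied by the caller.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical throttle (not from icub-main) for repeated warnings.
class WarningThrottle
{
public:
    explicit WarningThrottle(std::uint32_t repeatTicks) : _repeatTicks(repeatTicks)
    {
        assert(repeatTicks > 0);
    }

    // Call once per non-spike reading; returns true when a warning should be sent.
    bool update(bool overThreshold)
    {
        if (!overThreshold)
        {
            // Back below the threshold: forget the current episode.
            _active = false;
            _count = 0;
            return false;
        }
        if (!_active)
        {
            // First reading above the threshold: warn immediately.
            _active = true;
            _count = 0;
            return true;
        }
        if (++_count >= _repeatTicks)
        {
            // Stayed above the threshold for repeatTicks readings: warn again.
            _count = 0;
            return true;
        }
        return false;
    }

private:
    std::uint32_t _repeatTicks;
    std::uint32_t _count = 0;
    bool _active = false;
};
```

A caller would construct the throttle with the number of readings that corresponds to 60 seconds at its own reading rate and send a message to the log each time `update` returns true.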

@S-Dafarra
Contributor

I meant that if the joint goes into HF after 60 s, I would love to have some warning beforehand that can be caught not just by reading the YRI output (which is virtually impossible during experiments), but maybe via some API calls, so that we can inform the user/controller accordingly.

@MSECode
Contributor Author

MSECode commented May 27, 2024

The HF is entirely managed by the 2FOC. This part in icub-main was written to filter the spikes that can occur due to noisy readings. Those parameters can be adjusted if needed.
Currently there are APIs for retrieving the instantaneous value of the temperature. We can think of something else if necessary. Or it may be possible to update the motorTemperaturePublisher module so that it sends advice to the user.
The idea of informing the low-level controller can be problematic. We can think of informing the user, but in that case where should we send this "warning"? What do you actually need in order to work properly? What should these APIs do to solve the problem?
@valegagge, do you have some other ideas?

@S-Dafarra
Contributor

The idea of informing the low level controller can be problematic

Just to clarify, I was referring more to a "high-level" controller, like the one controlling the walking.

Or it may be possible to update the motorTemperaturePublisher module so that it sends advice to the user.

I don't know about this since the motorTemperaturePublisher is not something that runs with the robot.

We can think of informing the user, but in that case where should we send this "warning"? What do you actually need in order to work properly? What should these APIs do to solve the problem?

Maybe a dedicated interface with a defined list of warning codes? I am pretty open to discussion here. Here are some possible things I am imagining:

  • the yarpmotorgui starts blinking on the joint that is about to fault
  • a module starts making some noise, maybe also sending notifications to the OS (it is possible with Qt, see https://doc.qt.io/qt-5/qtwidgets-desktop-systray-example.html)
  • the walking controller goes into a fault-safe configuration (maybe stopping the walking and moving to a joint configuration that is safe in case of a sudden HF)

@MSECode
Contributor Author

MSECode commented May 27, 2024

OK, that's clear. Those points seem good. I'll discuss such an implementation with the team.

@valegagge
Member

Hi @S-Dafarra,
as we already discussed some time ago, the high-level controller, such as the walking controller, should read the current motor temperature and compare it to the warning threshold. (Both pieces of information are available on the IMotor interface.) If the temperature exceeds the threshold, the controller should put the robot in a safe position or reduce the load on the motor with the high temperature.
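
A minimal sketch of that check on the controller side is shown below. The comment above says that both values are available on the IMotor interface; to stay self-contained, the sketch takes them as plain vectors instead of calling any YARP API. `motorsOverWarning` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: indices of the motors whose current temperature
// exceeds their warning threshold. Both vectors are indexed by motor.
std::vector<std::size_t> motorsOverWarning(const std::vector<double>& temperatures,
                                           const std::vector<double>& warningLimits)
{
    assert(temperatures.size() == warningLimits.size());
    std::vector<std::size_t> hot;
    for (std::size_t i = 0; i < temperatures.size(); ++i)
    {
        if (temperatures[i] > warningLimits[i])
        {
            hot.push_back(i);
        }
    }
    return hot;
}
```

A high-level controller could run this check periodically and, when the returned list is not empty, move to a safe configuration or reduce the load on the listed motors.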

If I understand correctly, you are asking to signal in some way the warning state also on the yarpmotorgui.

The purpose of the work done in this PR was to avoid false-positive warnings and get a cleaner log.

@S-Dafarra
Contributor

If I understand correctly, you are asking to signal in some way the warning state also on the yarpmotorgui.

I was referring to the case in which there are measurement errors for more than 60 s. I was just wondering how, in that case, the user code can be informed about the potentially imminent fault.

@valegagge
Member

If I understand correctly, you are asking to signal in some way the warning state also on the yarpmotorgui.

I was referring to the case in which there are measurement errors for more than 60 s. I was just wondering how, in that case, the user code can be informed about the potentially imminent fault.

The fault on error readings happens after 10 seconds. We have planned to fix the error-reading issue in hardware. I'll update you ASAP. Stay tuned.

@MSECode MSECode deleted the feature/temperatureWarningFiltering branch June 13, 2024 14:50
Development

Successfully merging this pull request may close these issues.

Fix the motor temperature spike filtering in embObjMotionControl
4 participants