Update filtering of spikes in temp reading #961

MSECode · 2024-05-22T14:42:44Z

Completing the work done in #959, spikes in the temperature readings are now filtered in software.
If the delta between the current and the previous temperature is higher than a threshold (currently defined in software, 20 degrees Celsius by default), the new reading is considered a spike. It is then discarded, even if its value is above the warning or the error temperature threshold, and the previous temperature is not updated.
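A minimal sketch of the spike rule described above, not the code merged in this PR. `SpikeRejector` and its method names are hypothetical, and the sketch assumes that only positive jumps count as spikes, so that negative readings (used for read errors) still reach the error handling:

```cpp
#include <cassert>

// Hypothetical helper, for illustration only: keeps the last accepted
// temperature and rejects readings that jump upwards by more than maxDelta.
class SpikeRejector
{
public:
    SpikeRejector(double initialTemperature, double maxDelta)
        : _prev(initialTemperature), _maxDelta(maxDelta)
    {
        assert(maxDelta > 0.0);
    }

    // Returns true if the reading is accepted. A spike is discarded and the
    // previous temperature is left untouched.
    bool accept(double temperature)
    {
        // Assumption: only positive jumps are treated as spikes.
        if (temperature - _prev > _maxDelta)
        {
            return false; // spike: ignored even if above the warning/error limits
        }
        _prev = temperature;
        return true;
    }

    double previous() const { return _prev; }

private:
    double _prev;
    double _maxDelta;
};
```

With a 20 degree threshold and a previous value of 45, a reading of 120 is rejected and the previous value stays at 45, while a reading of 60 is accepted.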
Moreover, we have added to the Watchdog class a method that sets the threshold, i.e. the maximum number of erroneous values read before an error or warning is sent, according to the transmission rate of the Ethernet packets.
Depending on the transmission rate defined in the robot configuration, we therefore scale the watchdogs for temperatures above the warning threshold and for negative temperature values (which mean that the reading failed), so that the resulting timeout stays constant and independent of the transmission rate of the Ethernet packets.
By default, the warning or error is raised when we keep getting erroneous temperature values (which are not spikes) for at least 60 seconds.
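The arithmetic behind this scaling can be sketched as follows. The function names are hypothetical, and the sketch assumes the transmission rate is the packet period in milliseconds, which is what a default of 60000 counts for 60 seconds suggests:

```cpp
#include <cassert>
#include <cstdint>

// Assumption: one watchdog increment per received packet, and one packet
// every txRateMs milliseconds.
std::uint32_t watchdogCounts(std::uint32_t timeoutMs, std::uint8_t txRateMs)
{
    assert(txRateMs > 0);
    return timeoutMs / txRateMs;
}

// Wall-clock time covered by a given number of increments.
double elapsedSeconds(std::uint32_t counts, std::uint8_t txRateMs)
{
    return counts * txRateMs / 1000.0;
}
```

For a 60000 ms timeout, a 1 ms period gives 60000 counts and a 2 ms period gives 30000 counts, both covering 60 seconds. Deriving the count from a fixed timeout, as in this sketch, also gives the same result however many times it is recomputed.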

Spikes in temperature readings are now filtered
When checking for overcoming warning/hw temperature limits, spikes are not considered
A median filter is added for the initialization of the temperature so that we do not risk saving invalid or null temperatures
Overcoming of warning and hw limits is still correctly checked if the reading is not a spike
The counter on the watchdog thresholds is now divided by the eth transmission rate
We set a limit for the watchdog when the temperature is higher than the warn/hw limit for 60 seconds continuously
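As an illustration of the median initialization mentioned above, here is one possible way to take the median of a buffer of initial readings. `medianOf` is a hypothetical helper, not the function used in icub-main, and it picks the lower median for even-sized buffers:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Takes the buffer by value so the caller's samples keep their order.
double medianOf(std::vector<double> samples)
{
    assert(!samples.empty());
    // Index of the lower median: the middle element for odd sizes.
    auto mid = samples.begin() + (samples.size() - 1) / 2;
    // Partial sort: after this call *mid is the value a full sort would put there.
    std::nth_element(samples.begin(), mid, samples.end());
    return *mid;
}
```

With the readings 25, 26, -90, 27 and 150, the result is 26, so an isolated error value or spike in the initial buffer does not become the starting temperature.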

valegagge

Good work!

S-Dafarra · 2024-05-27T09:05:48Z

By default, it is set so that we raise the warning or error when we are getting erroneous temperature values (which are not spikes) for at least 60 seconds.

Hi there, just as a curiosity. Is there a way to get these warnings from code?

MSECode · 2024-05-27T09:23:50Z

I'm not sure this answers what you need. Anyway, warnings are only raised when a temperature higher than the warning threshold is read and that increment is not a spike (a spike here being a delta between the current and the previous temperature higher than 20 degrees Celsius). The first time the temperature goes over the warning threshold, a warning message is sent. Then, in order not to flood the YRI, the next warning message is sent only if the temperature stays continuously above the threshold for at least 60 seconds. The timing definitions have been set here for now:

icub-main/src/libraries/icubmod/embObjMotionControl/embObjMotionControl.h

Lines 111 to 182 in 5410ab0

    
    class Watchdog
    {
    private:
        bool _isStarted;
        uint32_t _count;
        double _time;
        uint32_t _threshold; // use 10000 as limit on the watchdog for the error on the temperature sensor receiving of the values -
                             // since the ETH callback timing is 2ms by default so using 10000 we can set a checking threshould of 5 second
                             // in which we can allow the tdb to not respond. If cannot receive response over 1s we trigger the error
    public:
        Watchdog(): _count(0), _isStarted(false), _threshold(60000), _time(0){;}
        Watchdog(uint32_t threshold):_count(0), _isStarted(false), _threshold(threshold), _time(0){;}
        ~Watchdog() = default;
        Watchdog(const Watchdog& other) = default;
        Watchdog(Watchdog&& other) noexcept = default;
        Watchdog& operator=(const Watchdog& other) = default;
        Watchdog& operator=(Watchdog&& other) noexcept = default;
        bool isStarted(){return _isStarted;}
        void start() {_count = 0; _time = yarp::os::Time::now(); _isStarted = true;}
        bool isExpired() {return (_count > _threshold);}
        void increment() {++_count;}
        void clear(){_isStarted=false;}
        double getStartTime() {return _time;}
        uint32_t getCount() {return _count; }
        void setThreshold(uint8_t txrateOfRegularROPs){_threshold = _threshold / txrateOfRegularROPs;}
        uint32_t getThreshold(){return _threshold;}
    };

    class TemperatureFilter
    {
    private:
        uint32_t _threshold;   // threshold for the delta between current and previous temperature --> set to 20 Celsius deg by default --> over 20 deg delta spike
        double _motorTempPrev; // motor temperature at previous instant for checking positive temperature spikes
        bool _isStarted;
        int32_t _initCounter;
        std::vector<double> _initTempBuffer;
    public:
        TemperatureFilter(): _threshold(20), _isStarted(false), _initCounter(50), _initTempBuffer(0), _motorTempPrev(0){;}
        TemperatureFilter(uint32_t threshold, int32_t initCounter): _threshold(threshold), _isStarted(false), _initCounter(initCounter), _initTempBuffer(0), _motorTempPrev(0){;}
        ~TemperatureFilter() = default;
        TemperatureFilter(const TemperatureFilter& other) = default;
        TemperatureFilter(TemperatureFilter&& other) noexcept = default;
        TemperatureFilter& operator=(const TemperatureFilter& other) = default;
        TemperatureFilter& operator=(TemperatureFilter&& other) noexcept = default;
        bool isStarted(){return _isStarted;}
        uint32_t getTemperatureThreshold() {return _threshold; }
        double getPrevTemperature(){return _motorTempPrev;}
        void updatePrevTemperature(double temperature){_motorTempPrev = temperature;}
        void start(double temperature)
        {
            if(_initCounter < 0)
            {
                int median_pos = std::ceil(_initTempBuffer.size() / 2) -1;
                _motorTempPrev = _initTempBuffer.at(median_pos);
                _isStarted = true;
            }
            else
            {
                _initTempBuffer.push_back(temperature);
                --_initCounter;
            }
        }
    };
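The warning throttling described before the snippet (one message on the first reading above the threshold, then one more each time the temperature has stayed above it for a full window) could be sketched as below. `WarningThrottle` is hypothetical: it borrows the start/increment/expire pattern of the Watchdog class above, but it is not the logic used in embObjMotionControl:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical throttle, for illustration only.
class WarningThrottle
{
public:
    explicit WarningThrottle(std::uint32_t repeatAfterCounts)
        : _threshold(repeatAfterCounts)
    {
        assert(repeatAfterCounts > 0);
    }

    // Call once per accepted (non-spike) reading.
    // Returns true when a warning message should be emitted.
    bool update(bool overWarningLimit)
    {
        if (!overWarningLimit)
        {
            _started = false; // back in range: the next crossing warns at once
            return false;
        }
        if (!_started)
        {
            _started = true;
            _count = 0;
            return true; // first reading above the limit
        }
        if (++_count >= _threshold)
        {
            _count = 0; // restart the window
            return true;
        }
        return false;
    }

private:
    std::uint32_t _threshold;
    std::uint32_t _count = 0;
    bool _started = false;
};
```

The window is expressed in readings rather than seconds, so it would have to be scaled by the packet period to correspond to 60 seconds.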

For negative readings, the meaning of the Celsius and Raw values reported in the error message can be looked up in this table:
https://icub-tech-iit.github.io/documentation/temperature_sensors/software/dataflow/#error-handling

S-Dafarra · 2024-05-27T09:28:38Z

I meant that if the joint goes in HF after 60s, I would love to have some warning beforehand that can be caught not just by reading the YRI output (which is virtually impossible during experiments), but maybe via some API calls, so that we can inform the user/controller accordingly.

MSECode · 2024-05-27T09:54:25Z

The HF is entirely managed by the 2FOC. This part in icub-main has been done to filter out all the spikes that can happen due to noisy readings. Those parameters can be adjusted if needed.
Currently there are APIs for retrieving the instantaneous value of the temperature. We can think of something else if necessary. Or it may be possible to update the motorTemperaturePublisher module to let it send advice to the user.
The idea of informing the low-level controller can be problematic. We can think of informing the user, but in that case where should we send this "warning"? What do you actually need for it to work fine? What should these APIs do to solve the problem?
@valegagge, do you have some other ideas?

S-Dafarra · 2024-05-27T10:20:50Z

The idea of informing the low-level controller can be problematic

Just to clarify, I was referring more to a "high-level" controller, like the one controlling the walking.

Or it may be possible to update the motorTemperaturePublisher module to let it send advice to the user.

I don't know about this since the motorTemperaturePublisher is not something that runs with the robot.

We can think of informing the user, but in that case where should we send this "warning"? What do you actually need for it to work fine? What should these APIs do to solve the problem?

Maybe a dedicated interface with a defined list of warning codes? I am pretty open to discussion here. Here are some possible things I am imagining:

the yarpmotorgui starts blinking on the joint that is about to fault
a module starts making some noise, maybe also sending notifications to the OS (it is possible with Qt, see https://doc.qt.io/qt-5/qtwidgets-desktop-systray-example.html)
the walking controller goes into a fail-safe configuration (maybe stopping the walking and going to a joint configuration that is safe in case of a sudden HF)

MSECode · 2024-05-27T10:48:40Z

OK, that's clear. Those points seem good. I'll discuss such an implementation with the team.

valegagge · 2024-05-27T11:48:26Z

Hi @S-Dafarra,
as we already discussed some time ago, the high-level controller, such as the walking controller, should read the current motor temperature and compare it to the warning threshold. (Both pieces of information are available through the IMotor interface.) If the temperature exceeds the threshold, it should put the robot in a safe position or reduce the load on the motor with the high temperature.

If I understand correctly, you are asking to signal in some way the warning state also on the yarpmotorgui.

The purpose of the work done in this PR was to avoid false-positive warnings and get a cleaner log.
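The controller-side check suggested in this comment could look roughly like the sketch below. `MotorTemperatureSource` and `motorsNearLimit` are hypothetical: the two getters are named after `getTemperature` and `getTemperatureLimit` in YARP's IMotor interface, but the struct is only a stand-in so that the example is self-contained, and it assumes the limit getter returns the value the controller should compare against:

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the two getters a controller would need.
struct MotorTemperatureSource
{
    virtual ~MotorTemperatureSource() = default;
    virtual bool getTemperature(int motor, double* value) = 0;
    virtual bool getTemperatureLimit(int motor, double* limit) = 0;
};

// Returns the indices of the motors whose temperature is within `margin`
// degrees of their limit, or above it. Motors whose reading fails are skipped.
std::vector<int> motorsNearLimit(MotorTemperatureSource& source, int motorCount, double margin)
{
    assert(motorCount >= 0 && margin >= 0.0);
    std::vector<int> hot;
    for (int m = 0; m < motorCount; ++m)
    {
        double temperature = 0.0;
        double limit = 0.0;
        if (!source.getTemperature(m, &temperature) || !source.getTemperatureLimit(m, &limit))
        {
            continue; // no valid data for this motor
        }
        if (temperature >= limit - margin)
        {
            hot.push_back(m);
        }
    }
    return hot;
}
```

A walking controller could call such a check periodically and move to a safe configuration, or reduce the load, when the returned list is not empty.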

S-Dafarra · 2024-05-27T12:05:21Z

If I understand correctly, you are asking to signal in some way the warning state also on the yarpmotorgui.

I was referring to the case in which there are measurement errors for more than 60s. I was just wondering in which cases the user code can be informed about the potentially imminent fault.

valegagge · 2024-05-28T07:51:10Z

If I understand correctly, you are asking to signal in some way the warning state also on the yarpmotorgui.

I was referring to the case in which there are measurement errors for more than 60s. I was just wondering in which cases the user code can be informed about the potentially imminent fault.

The fault on error readings happens after 10 seconds. We had planned to fix the erroneous-reading issue in hardware. I'll update you ASAP. Stay tuned.

MSECode self-assigned this May 22, 2024

MSECode requested a review from valegagge May 22, 2024 14:42

MSECode marked this pull request as draft May 22, 2024 14:43

MSECode requested a review from pattacini May 22, 2024 14:43

MSECode mentioned this pull request May 22, 2024

Fix the motor temperature spike filtering in embObjMotionControl #959

Closed

valegagge approved these changes May 23, 2024

valegagge marked this pull request as ready for review May 23, 2024 08:59

pattacini linked an issue May 24, 2024 that may be closed by this pull request

Fix the motor temperature spike filtering in embObjMotionControl #959

Closed

pattacini approved these changes May 24, 2024

pattacini merged commit 5410ab0 into robotology:devel May 24, 2024
8 checks passed

MSECode deleted the feature/temperatureWarningFiltering branch June 13, 2024 14:50
