Code to reproduce:
public static void main(String[] args) {
    AtomicLong managedTime = new AtomicLong();
    Clock clock = new Clock() {
        @Override
        public long wallTime() {
            return managedTime.get();
        }

        @Override
        public long monotonicTime() {
            return managedTime.get();
        }
    };
    TimeWindowMax timeWindowMax = new TimeWindowMax(clock, 60_000, 3);
    timeWindowMax.record(32);
    System.out.println(timeWindowMax.poll()); // prints 32

    // emulate 12 hours of inactivity
    managedTime.set(TimeUnit.HOURS.toMillis(12));
    System.out.println(timeWindowMax.poll()); // prints 0 as expected

    timeWindowMax.record(666);
    System.out.println(timeWindowMax.poll()); // prints 0. 666 is missed because of bug

    timeWindowMax.record(13);
    System.out.println(timeWindowMax.poll()); // prints 0. both 666 and 13 are missed because of bug

    timeWindowMax.record(100500);
    System.out.println(timeWindowMax.poll()); // prints 0. All 666, 13 and 100500 are erased because of bug
}
Impact:
The impact is not critical, because a monitoring agent (such as Telegraf) usually touches your metrics on a regular basis, so rotation happens often.
Scenario to reproduce:
Imagine the network on the host goes down. Nobody touches the Timer, because neither client requests nor requests from the monitoring agent can reach the application.
The network is fixed after several hours. Client requests and requests from the monitoring agent reach the application again.
We now see zero values instead of maximums, and it is hard to tell that this is not a bug in the monitoring agent or the monitoring database.
Cause:
After a long period of inactivity, rotate leaves lastRotateTimestampMillis far in the past.
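
The cause can be illustrated with a small stand-alone model. This is a sketch, not Micrometer's source: the class `RotationSketch`, the nested `WindowMax`, and the `resyncAfterLongGap` switch are all hypothetical, and the rotation loop assumes that a single `rotate` call advances `lastRotateTimestampMillis` by at most one full pass over the ring buffer. Under that assumption the timestamp stays far behind the clock after a long gap, so every later `record` or `poll` clears the buckets again. The switch shows one possible remedy (jumping the timestamp to "now" once the whole window has expired), which may or may not match the fix attached to this issue.

```java
import java.util.Arrays;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Simplified, hypothetical model of a rotating time-window max (not Micrometer code).
final class RotationSketch {

    static final class WindowMax {
        private final AtomicLong clockMillis;          // manually driven clock, as in the reproducer
        private final long durationBetweenRotatesMillis;
        private final long[] ringBuffer;
        private final boolean resyncAfterLongGap;      // false = assumed buggy behaviour
        private int currentBucket;
        private long lastRotateTimestampMillis;

        WindowMax(AtomicLong clockMillis, long rotateMillis, int bufferLength, boolean resyncAfterLongGap) {
            this.clockMillis = clockMillis;
            this.durationBetweenRotatesMillis = rotateMillis;
            this.ringBuffer = new long[bufferLength];
            this.resyncAfterLongGap = resyncAfterLongGap;
            this.lastRotateTimestampMillis = clockMillis.get();
        }

        void record(long sample) {
            rotate();
            for (int i = 0; i < ringBuffer.length; i++) {
                ringBuffer[i] = Math.max(ringBuffer[i], sample);
            }
        }

        long poll() {
            rotate();
            return ringBuffer[currentBucket];
        }

        private void rotate() {
            long sinceLastRotate = clockMillis.get() - lastRotateTimestampMillis;
            if (sinceLastRotate < durationBetweenRotatesMillis) {
                return; // nothing has expired yet
            }
            if (resyncAfterLongGap && sinceLastRotate >= durationBetweenRotatesMillis * ringBuffer.length) {
                // The whole window has expired: clear it and move the timestamp to "now".
                Arrays.fill(ringBuffer, 0);
                currentBucket = 0;
                lastRotateTimestampMillis = clockMillis.get();
                return;
            }
            int iterations = 0;
            do {
                ringBuffer[currentBucket] = 0;
                currentBucket = (currentBucket + 1) % ringBuffer.length;
                sinceLastRotate -= durationBetweenRotatesMillis;
                // Advances by one interval per step, so at most bufferLength intervals per call.
                lastRotateTimestampMillis += durationBetweenRotatesMillis;
            } while (sinceLastRotate >= durationBetweenRotatesMillis && ++iterations < ringBuffer.length);
        }
    }

    // Replays the issue's scenario and returns the three polled values.
    static long[] replay(boolean resyncAfterLongGap) {
        AtomicLong managedTime = new AtomicLong();
        WindowMax max = new WindowMax(managedTime, 60_000, 3, resyncAfterLongGap);
        long[] polled = new long[3];
        max.record(32);
        polled[0] = max.poll();
        managedTime.set(TimeUnit.HOURS.toMillis(12)); // emulate 12 hours of inactivity
        polled[1] = max.poll();
        max.record(666);
        polled[2] = max.poll();
        return polled;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(replay(false))); // [32, 0, 0]
        System.out.println(Arrays.toString(replay(true)));  // [32, 0, 666]
    }
}
```

In this model, with a 60-second rotation interval and 3 buckets, each `rotate` call moves the timestamp forward by at most 3 minutes, so a 12-hour gap takes 240 calls to catch up, and every one of those calls wipes the recorded maximum.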
Thank you for reporting the issue and providing all the details as well as a fix. I suppose this wasn't found sooner because in most cases the max will be polled regularly. Even if the metrics backend were down, in a push-based system the application would still poll the max and try to publish it to the backend. With a pull-based metrics backend like Prometheus, however, the backend being down or a network issue between the application and the backend can cause such long periods without polling the max. Additionally, if a registry had stop called on it for an extended period of time, I suspect this issue would also arise.