TimeWindowMax produces wrong values after long period of inactivity #2647

vladimir-bukhtoyarov · 2021-06-14T15:50:28Z

Code to reproduce:

import io.micrometer.core.instrument.Clock;
import io.micrometer.core.instrument.distribution.TimeWindowMax;

import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// The class name is arbitrary; it only wraps the main method below.
public class TimeWindowMaxRepro {

    public static void main(String[] args) {
        // Manually controlled time source, so that inactivity can be emulated
        AtomicLong managedTime = new AtomicLong();
        Clock clock = new Clock() {
            @Override
            public long wallTime() {
                return managedTime.get();
            }

            @Override
            public long monotonicTime() {
                return managedTime.get();
            }
        };

        // Rotate every 60 seconds, ring buffer of 3 buckets
        TimeWindowMax timeWindowMax = new TimeWindowMax(clock, 60_000, 3);
        timeWindowMax.record(32);
        System.out.println(timeWindowMax.poll()); // prints 32

        // emulate 12 hours of inactivity
        managedTime.set(TimeUnit.HOURS.toMillis(12));
        System.out.println(timeWindowMax.poll()); // prints 0 as expected

        timeWindowMax.record(666);
        System.out.println(timeWindowMax.poll()); // prints 0. 666 is missed because of the bug

        timeWindowMax.record(13);
        System.out.println(timeWindowMax.poll()); // prints 0. Both 666 and 13 are missed because of the bug

        timeWindowMax.record(100500);
        System.out.println(timeWindowMax.poll()); // prints 0. All of 666, 13 and 100500 are erased because of the bug
    }
}

Impact:
The impact is usually not critical, because a monitoring agent (such as Telegraf) normally reads your metrics on a regular basis, so rotation happens often.

Scenario to reproduce:

Imagine the network on the host goes down. Nobody touches the Timer, because neither incoming client requests nor requests from the monitoring agent can reach the application.
The network is fixed after several hours, and both user requests and monitoring-agent requests reach the application again.
The maximums are now reported as zero, and it is hard to tell that this is not a bug in the monitoring agent or the monitoring database.

Cause:

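After a long period of inactivity, rotate leaves lastRotateTimestampMillis far in the past. Subsequent record and poll calls therefore keep triggering rotation, which would explain why freshly recorded values are cleared before they can be read in the reproduction above.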
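
To illustrate the mechanism, here is a minimal, self-contained sketch of a rotating max ring buffer. It is a simplified model, not Micrometer's actual code: the class RotatingMaxSketch, its fields, and the resetAfterLongIdle flag are hypothetical names introduced for this example. It assumes that a single rotate call advances the timestamp by at most one full buffer length, which is consistent with the behavior reported above. The reset branch shows one possible way to handle a long idle period and is not necessarily what #2648 does.

```java
import java.util.Arrays;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical, simplified model of a time-windowed max; single-threaded on purpose.
final class RotatingMaxSketch {
    private final AtomicLong clockMillis;      // manually controlled time source
    private final long stepMillis;             // duration between rotations
    private final long[] buckets;              // ring buffer of maximums
    private final boolean resetAfterLongIdle;  // false: bounded catch-up only; true: reset after long idle
    private int current;
    private long lastRotateMillis;

    RotatingMaxSketch(AtomicLong clockMillis, long stepMillis, int bufferLength, boolean resetAfterLongIdle) {
        this.clockMillis = clockMillis;
        this.stepMillis = stepMillis;
        this.buckets = new long[bufferLength];
        this.resetAfterLongIdle = resetAfterLongIdle;
        this.lastRotateMillis = clockMillis.get();
    }

    void record(long sample) {
        rotate();
        // Every bucket tracks the max, so older buckets cover longer spans.
        for (int i = 0; i < buckets.length; i++) {
            buckets[i] = Math.max(buckets[i], sample);
        }
    }

    long poll() {
        rotate();
        return buckets[current];
    }

    private void rotate() {
        long now = clockMillis.get();
        long elapsed = now - lastRotateMillis;
        if (elapsed < stepMillis) {
            return; // still inside the current step
        }
        if (resetAfterLongIdle && elapsed >= stepMillis * buckets.length) {
            // Idle for longer than the whole window: clear everything and re-anchor the timestamp.
            Arrays.fill(buckets, 0L);
            current = 0;
            lastRotateMillis = now - elapsed % stepMillis;
            return;
        }
        // Bounded catch-up: at most buckets.length steps per call, so after a long
        // idle period lastRotateMillis stays far behind the clock.
        int iterations = 0;
        do {
            buckets[current] = 0;
            current = (current + 1) % buckets.length;
            elapsed -= stepMillis;
            lastRotateMillis += stepMillis;
        } while (elapsed >= stepMillis && ++iterations < buckets.length);
    }

    public static void main(String[] args) {
        AtomicLong time = new AtomicLong();
        RotatingMaxSketch bounded = new RotatingMaxSketch(time, 60_000, 3, false);
        RotatingMaxSketch resetting = new RotatingMaxSketch(time, 60_000, 3, true);

        time.set(TimeUnit.HOURS.toMillis(12)); // emulate 12 hours of inactivity
        bounded.record(666);
        resetting.record(666);
        System.out.println(bounded.poll());   // prints 0: the next rotation wipes the fresh value
        System.out.println(resetting.poll()); // prints 666
    }
}
```

In the bounded variant, every call after the idle period still sees an elapsed time larger than one step, so it rotates again and clears the buckets. With a 60-second step, 3 buckets and 12 hours of inactivity, it takes 240 such calls before the timestamp catches up with the clock.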


shakuzen · 2021-08-25T07:08:17Z

Thank you for reporting the issue and providing all the details as well as a fix. I suppose this wasn't found sooner because in most cases the max will be polled regularly. Even if the metrics backend were down, in a push-based system the application would still poll the max and try to publish it to the backend. It is with a pull-based metrics backend like Prometheus that the backend being down, or a network issue between the application and the metrics backend, can cause such long periods without polling the max. Additionally, if for some reason a registry had stop called on it for an extended period of time, I suspect this issue would also arise.

vladimir-bukhtoyarov mentioned this issue Jun 14, 2021

Fix TimeWindowMax for case of long period of inactivity #2648

Merged

shakuzen added the "module: micrometer-core" label (An issue that is related to our core module) Jun 23, 2021

shakuzen added this to the 1.5.x milestone Jun 23, 2021

shakuzen added the "bug" label (A general bug) Jun 23, 2021

shakuzen modified the milestones: 1.5.x, 1.6.x Aug 12, 2021

shakuzen modified the milestones: 1.6.x, 1.6.11 Aug 24, 2021

shakuzen linked a pull request Aug 25, 2021 that will close this issue

Fix TimeWindowMax for case of long period of inactivity #2648

Merged

shakuzen closed this as completed in 9cf7c93 Aug 25, 2021

MalyginaEkaterina mentioned this issue Sep 7, 2022

AbstractTimeWindowHistogram produces wrong values after long period of inactivity #3395

Closed
