Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TimeWindowMax produces wrong values after long period of inactivity #2647

Closed
vladimir-bukhtoyarov opened this issue Jun 14, 2021 · 1 comment · Fixed by #2648
Closed

TimeWindowMax produces wrong values after long period of inactivity #2647

vladimir-bukhtoyarov opened this issue Jun 14, 2021 · 1 comment · Fixed by #2648
Labels
bug A general bug module: micrometer-core An issue that is related to our core module
Milestone

Comments

@vladimir-bukhtoyarov
Copy link
Contributor

vladimir-bukhtoyarov commented Jun 14, 2021

Code to reproduce:

public static void main(String[] args) {
        AtomicLong managedTime = new AtomicLong();
        Clock clock = new Clock() {
            @Override
            public long wallTime() {
                return managedTime.get();
            }

            @Override
            public long monotonicTime() {
                return managedTime.get();
            }
        };

        TimeWindowMax timeWindowMax = new TimeWindowMax(clock, 60_000, 3);
        timeWindowMax.record(32);
        System.out.println(timeWindowMax.poll()); // prints 32

        // emulate 12 hours of inactivity
        managedTime.set(TimeUnit.HOURS.toMillis(12));
        System.out.println(timeWindowMax.poll()); // prints 0 as expected

        timeWindowMax.record(666);
        System.out.println(timeWindowMax.poll()); // prints 0. 666 is missed because of bug

        timeWindowMax.record(13);
        System.out.println(timeWindowMax.poll()); // prints 0. both 666 and 13 are missed because of bug

        timeWindowMax.record(100500);
        System.out.println(timeWindowMax.poll()); // prints 0. All 666, 13 and 100500 are erased because of bug
    }

Impact:
The impact is not very critical because usually, a monitoring agent (like Telegraph or something else) touches your metrics on a regular basis, as a result, rotation happens often.

scenario to reproduce

  1. Imagine the network had been broken on the host, as a result, nobody was touching Timer because both incoming requests from clients and incoming requests from the monitoring agent were not able to reach the application.
  2. Network was fixed after several hours. User requests as well as requests from the monitoring agent now successfully reach the application.
  3. We have zero values instead of maximums, and it is hard to understand that is not a bug of the monitoring agent or monitoring database.

Cause:
image
After long period of inactivity rotate leaves lastRotateTimestampMillis in the far in the past.

@shakuzen shakuzen added the module: micrometer-core An issue that is related to our core module label Jun 23, 2021
@shakuzen shakuzen added this to the 1.5.x milestone Jun 23, 2021
@shakuzen shakuzen added the bug A general bug label Jun 23, 2021
@shakuzen shakuzen modified the milestones: 1.5.x, 1.6.x Aug 12, 2021
@shakuzen shakuzen modified the milestones: 1.6.x, 1.6.11 Aug 24, 2021
@shakuzen shakuzen linked a pull request Aug 25, 2021 that will close this issue
@shakuzen
Copy link
Member

Thank you for reporting the issue and providing all the details as well as a fix. I suppose this wasn't found sooner because in most cases the max will be polled regularly. Even if the metrics backend were down, in a push based system, the application would still poll the max and try to publish it to the backend. It is in the case of a pull-based metrics backend like Prometheus that the backend being down or some network issue between the application and metrics backend can cause such long periods without polling the max. Additionally, if for some reason a registry had stop called on it for an extended period of time, I suspect this issue would also arise.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug A general bug module: micrometer-core An issue that is related to our core module
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants