Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DeltaHistogram in SignalFx registry doesn't align with count and total #3774

Closed
lenin-jaganathan opened this issue Apr 17, 2023 · 2 comments · Fixed by #3799
Closed

DeltaHistogram in SignalFx registry doesn't align with count and total #3774

lenin-jaganathan opened this issue Apr 17, 2023 · 2 comments · Fixed by #3799
Labels
bug A general bug registry: signalfx A SignalFX Registry related issue
Milestone

Comments

@lenin-jaganathan
Copy link
Contributor

lenin-jaganathan commented Apr 17, 2023

Describe the bug
#3750 fixed the issue for distributed publishing of metrics. For StepRegistries, this implementation took an approach of scheduling a thread to poll the meters at the start of the minute and exporting at a different point in time. An unexpected outcome of this fix is that DeltaHistogram implementation of SignalFx registry is broken. Since the polling thread will call the count() on Timer at the start of the minute, the count and total are rolled over for the StepWindow. But the actual reading (takeSnapShot()) of the timer can happen at any point in time between the start of the step and 80% of the step (maybe even later if batching is involved). This breaks the Delta Histogram Implementation of the SignalFx registry.

Environment

All
  • Micrometer version 1.11.0
  • Micrometer registry - SignalFx

To Reproduce
How to reproduce the bug:

Set-Up:

  • Create a SIgnalFx Meter Registry with a Step of 60 seconds (any step would do for that matter. The 60s is chosen for explaining the behavior).
  • Create a Timer and configure SLAs / Histograms.
  • Increment the timer once per second.

Scenario:

Assume that the https://github.com/micrometer-metrics/micrometer/blob/main/micrometer-core/src/main/java/io/micrometer/core/instrument/push/PushMeterRegistry.java#L129-L138 returns 20th second. So, publishing will happen on the 20th second.

  • The poll service rollover count at the 0th second.
  • The scheduled publish happens on the 20th second when the metric is published the count will be a snapshot at the 0th second whereas the Histogram count will be a snapshot at the 20th second. So, for the first publication, the histogram count will be (count+20).

In real-case scenarios, the data recorded will not be uniform, so this Histogram will report a different snapshot that is misaligned with the count and total

Expected behavior
Irrespective of the publishing time, the count metric and the count of histogram buckets should match.

@lenin-jaganathan
Copy link
Contributor Author

Though I agree with the good intentions of #3750, to just call count() on Timers/Summaries, probably the pollMetersToRollover() can be protected so that registries can define it. (Similar to publish, where to publish is scheduled by the core but implemented by registries. Maybe in this case, pollMetersToRollover() can have a default implementation but with an option for registries to override them).

But that also won't solve the problem that the SignalFx registry has because of its implementation. #3749 has a completely StepAligned Histogram but it is still in its early phase. But what is needed for this might be a StepAligned Histogram Implementation.

@lenin-jaganathan lenin-jaganathan changed the title DeltaHistogram in SignalFx registry reports incorrect information DeltaHistogram in SignalFx registry doesn't align with count and total Apr 17, 2023
@lenin-jaganathan
Copy link
Contributor Author

#3749 - Now that this is merged. I think we can use StepBucketHistogram instead of Delta Histogram to address this problem.

@shakuzen shakuzen added bug A general bug registry: signalfx A SignalFX Registry related issue and removed waiting-for-triage labels Jun 9, 2023
@shakuzen shakuzen added this to the 1.11.1 milestone Jun 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug A general bug registry: signalfx A SignalFX Registry related issue
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants