Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restrict metric names to what is supported by Prometheus #3065

Closed
mstoykov opened this issue May 11, 2023 · 6 comments · Fixed by #3448
Closed

Restrict metric names to what is supported by Prometheus #3065

mstoykov opened this issue May 11, 2023 · 6 comments · Fixed by #3448
Labels
breaking change for PRs that need to be mentioned in the breaking changes section of the release notes
Milestone

Comments

@mstoykov
Copy link
Contributor

What?

k6 currently supports very liberal metric names
https://github.com/grafana/k6/blob/master/metrics/registry.go#L29
prometheus does not https://prometheus.io/docs/concepts/data_model/#metric-names-and-labels

Open telemetry also has way more restricted list as well https://opentelemetry.io/docs/specification/otel/metrics/api/#instrument-name-syntax

Proposed solution:

Limit to [a-zA-Z_][a-zA-Z0-9_]* which is Prometheus without : which have special meaning. We can always expand them again later.

In order to have as little disturbance as possible we will have some release (3) with warning, and then we will make this into an error.

Caveats:

Depending on community feedback we might pivot more in one or another direction.

@mstoykov mstoykov added the breaking change for PRs that need to be mentioned in the breaking changes section of the release notes label May 11, 2023
@mstoykov mstoykov added this to the v0.45.0 milestone May 11, 2023
mstoykov added a commit that referenced this issue May 11, 2023
This adds a warning for custom metrics that are not Prometheus
compatible.

The warning is in the k6/metrics as the alternative was threading a
logger in the registry specifically for this and then removing it.

The major downside of this is that extension can still register
incompatible metric names without any warnings.

Updates #3065
mstoykov added a commit that referenced this issue May 11, 2023
This adds a warning for custom metrics that are not Prometheus
compatible.

The warning is in the js/bundle mostly for convenience.

It is a place we already have a registry and a logger and we go through
only once.

As this is temporary warning until we enforce stricter metric names
I don't think it is good idea to refactor the registry to support this
for a little while.

Updates #3065
@mstoykov mstoykov modified the milestones: v0.45.0, v0.48.0 May 15, 2023
mstoykov added a commit that referenced this issue May 15, 2023
This adds a warning for custom metrics that are not Prometheus
compatible.

The warning is in the js/bundle mostly for convenience.

It is a place we already have a registry and a logger and we go through
only once.

As this is temporary warning until we enforce stricter metric names
I don't think it is good idea to refactor the registry to support this
for a little while.

Updates #3065
mstoykov added a commit that referenced this issue May 15, 2023
This adds a warning for custom metrics that are not Prometheus
compatible.

The warning is in the js/bundle mostly for convenience.

It is a place we already have a registry and a logger and we go through
only once.

As this is temporary warning until we enforce stricter metric names
I don't think it is good idea to refactor the registry to support this
for a little while.

Updates #3065
@pkouzmineESS
Copy link

Suggestion: instead of completely eliminating ability to create metric names with a wider character set, why not put it behind a flag? By default, K6 could check that metrics are created only using [a-zA-Z_][a-zA-Z0-9_]*. But if launched with flag like --expanded-metric-name-charset, current behavior could be retained. This has a benefit of making it less of a breaking change.

@mstoykov
Copy link
Contributor Author

Hi @pkouzmineESS, we could add option to make it or not, but I would prefer not to have functionality that people have not requested.

This option basically will make it impossible for us to optimize anything based on these restrictions. And will make any functionality (outputs to otel/prometheus compatible engines) to either not work or have strange name transformations.

So the question is whether anyone actually uses this more extended names and if yes - was that on purpose or did they just write a name and as it didn't tell them anything they went with it?

Do you have a case where you want to more expanded metric name charset?

@pkouzmineESS
Copy link

pkouzmineESS commented Jul 13, 2023

In our case we test a little over 150 endpoints in over 50 scenarios. Some of our endpoints trigger differently optimized stored procedures on the database side depending on what query parameters are passed. In other cases, we have endpoints that do different things depending on the payload passed in and have different performance profiles based on the same (not the best design, I know, but what we have to live with for now). Further, different scenarios might call the same endpoint and sometimes we want to keep track of the originating scenario. For example, we might need to call search endpoint as if it was used by individual user and also call it in different scenario as if it was called by user from a support center.

We generate custom Trend metrics to keep track of these things, as we care about aggregate performance over the duration of the test more so than any individual call. We start off by using our routes file to name metrics by the HTTP request type, and route (so GET /gateway/api/search for example, which we'll have to modify when this change goes into effect). Then, depending on if we need to keep track of scenario that called the metric and parameters, we append that information to the metric name after white space and some separator, for example - character. This gives us easy to read and process metric results in k6-reporter where engineer can see what endpoint was called, as well as what scenario was used and what underlying functionality was targeted. It makes tracking performance problems much easier in our workflow, we end up using a little over 160 custom metrics and may want to expand that further.

We also use _ character to identify id placeholders in the URI, so with that becoming only allowed non number/letter character we'd have to account for that as well.

All that being said, I can certainly understand the desire to support a single standard. I think we can adapt, if need be, but that is our use case.

@pkouzmineESS
Copy link

pkouzmineESS commented Sep 13, 2023

@mstoykov I'm working on updating the names of our metrics to future proof against this change and its seems that both the warning K6 prints regarding this issue and the documentation in the issue itself is slightly misleading.

From what I'm seeing now, the warning is triggered if metric name does not match regex ^[a-zA-Z_][a-zA-Z0-9_]{1,63}$, not [a-zA-Z_][a-zA-Z0-9_]* as stated in this issue and the warning text. https://github.com/grafana/k6/blob/981b770946f5df5d772554e5906f888ef1ca8a94/js/bundle.go#L49C22-L49C51

Several longer metric names we have triggered the warning. Is the intent to limit metric name length going forward as well?

mstoykov added a commit that referenced this issue Sep 18, 2023
This is inline with the current restriction.

The previous 63 character one came from Otel, but with
open-telemetry/opentelemetry-specification#3648
This has been bumped to 255.

Updates #3065
@mstoykov
Copy link
Contributor Author

Hi sorry for dropping the ball on this conversation :(.

Several longer metric names we have triggered the warning. Is the intent to limit metric name length going forward as well?

This was mostly done as originally we kept the 128 characters limit that we had up until then, before it was pointed out that Otel limits them to 63.

That has since change and I just bumped it to 128 again in #3335. And we might bump it to 255. I also decided to add the 128 character restriction in the message, but still keep it short 🤞

On the previous comment:

What do you use for this metrics? Again this restrictions are to align us with prometheus and Otel and looking at some other names in the industry:

  1. Datadog has periods as well and then converts a bunch of stuff to _ which IMO is worse
  2. Influxdb doesn't have any restrictions (which breaks a bunch of integrations from the thing I looked at) but they have a list of recommendations so that you don't shoot yourself in the foot.
  3. Given that most everything else practically works with prometheus and Otel I expect they have similar limitations either by choice or due to the tooling for the protocols.

I am still not ruling out adding an escape hatch - but as I explained before, that will require that:

  1. we work around it in places - so disable some functionality for this.
  2. let it fail ... badly in cases where it is incompatible.

Both of those seem bad to me as UX 🤷

mstoykov added a commit that referenced this issue Sep 18, 2023
This is inline with the current restriction.

The previous 63 character one came from Otel, but with
open-telemetry/opentelemetry-specification#3648
This has been bumped to 255.

Updates #3065
@pkouzmineESS
Copy link

All good, life happens, thank you for the response.

That has since change and I just bumped it to 128 again in #3335. And we might bump it to 255. I also decided to add the 128 character restriction in the message, but still keep it short 🤞

Sounds good, this means I don't have to make any more changes on our end.

What do you use for this metrics?

At the time of my original comment, we didn't really send them anywhere except HTML report that was analyzed by engineers as part of our release process. My concern at that time was readability.

But now the point is moot: we'll be trying to send them to Datadog. I already made the changes to be compatible with this upcoming change in 0.48, try to make them look good in Datadog, while trying to keep them readable. Also discussed it with my team and they are fine with the changes. So, in retrospect this works for us, please consider my original suggestion withdrawn.

mstoykov added a commit that referenced this issue Sep 20, 2023
This is inline with the current restriction.

The previous 63 character one came from Otel, but with
open-telemetry/opentelemetry-specification#3648
This has been bumped to 255.

Updates #3065
mstoykov added a commit that referenced this issue Sep 20, 2023
This is inline with the current restriction.

The previous 63 character one came from Otel, but with
open-telemetry/opentelemetry-specification#3648
This has been bumped to 255.

Updates #3065
mstoykov added a commit that referenced this issue Nov 16, 2023
Closes #3065


Co-authored-by: Oleg Bespalov <oleg.bespalov@grafana.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking change for PRs that need to be mentioned in the breaking changes section of the release notes
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants