
[telemetry] Simplify disabling of prometheus endpoint for internal telemetry metrics #10919

Open
pirgeo opened this issue Aug 20, 2024 · 9 comments
Labels
area:service, collector-telemetry

Comments

@pirgeo
Member

pirgeo commented Aug 20, 2024

As it is now possible to export internal telemetry metrics via OTLP, scraping the Prometheus metrics from the collector is no longer the only way to get that data. Since users who configure OTLP export would otherwise be exporting via both paths, they might want to disable the Prometheus endpoint.

Looking at the instantiation code today, there are two ways of disabling the Prometheus endpoint (both sketched below):

  1. Set `service: telemetry: metrics: level: none`. However, this immediately returns a noop MeterProvider and thus disables all exporting (including OTLP-based exporting).
  2. Set `service: telemetry: metrics: address: ""` (an empty string). The code only checks whether the address string is longer than 0 characters, so the empty string is the only value that disables the additional Prometheus exporter. Having to know that feels unintuitive to me and makes the collector config confusing to read if you don't know exactly why the empty string needs to be there.
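For reference, here is what the two workarounds look like in a config file today (a minimal sketch of the options described above):

```yaml
# Workaround 1: returns a noop MeterProvider and disables ALL
# internal metric export, including OTLP-based export.
service:
  telemetry:
    metrics:
      level: none
---
# Workaround 2: disables only the Prometheus endpoint, but the
# empty string is easy to misread if you don't know why it's there.
service:
  telemetry:
    metrics:
      address: ""
```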

Describe the solution you'd like
If another exporter is explicitly defined, the Prometheus exporter is automatically disabled. It can still be explicitly added alongside the OTLP exporter if required.
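For illustration, under this proposal a config that defines only an OTLP reader would no longer expose the Prometheus endpoint. A sketch, assuming the `readers` schema shown later in this thread (the endpoint is a placeholder):

```yaml
# Sketch: with an explicit OTLP reader configured, the default
# Prometheus endpoint would be disabled automatically.
service:
  telemetry:
    metrics:
      readers:
        - periodic:
            exporter:
              otlp:
                protocol: http/protobuf
                endpoint: https://collector.example.com:4318
```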

Describe alternatives you've considered
I am opening this issue to discuss possible solutions. Here are a few I came up with:

  • Setting the address to `null` explicitly should disable the Prom exporter.
  • Similarly, setting the address to `"none"` or `"disabled"` should turn off the exporter. I don't like this option, though, as it uses magic strings and is not intuitive.
  • Turning the Prometheus exporter off by default if another exporter is explicitly configured.
  • Deprecating the `service: telemetry: metrics: address:` configuration option and requiring Prometheus exporters to be explicitly set up so they follow the same structure as the OTLP exporters; see the sketch after this list. This could break backward compatibility, though, so we might need to introduce a feature gate.
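As a sketch of that last alternative, an explicitly configured Prometheus reader replacing `address` might look something like this (the `pull`/`prometheus` field names assume the declarative readers schema and are not confirmed in this thread):

```yaml
# Hypothetical explicit setup that would replace address ":8888":
service:
  telemetry:
    metrics:
      readers:
        - pull:
            exporter:
              prometheus:
                host: "0.0.0.0"
                port: 8888
```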
@pirgeo
Member Author

pirgeo commented Aug 20, 2024

This is probably similar to the discussion in #10890; it might also help with disabling metrics, for example in #10780.

@pirgeo
Member Author

pirgeo commented Aug 20, 2024

The plot thickens: it seems `service: telemetry: metrics: address` is already deprecated (https://github.com/open-telemetry/opentelemetry-collector/blob/main/service/service.go#L175-L177). However, I don't see that warning message in my collector startup logs.

EDIT: This comment initially contained some wrong assumptions; I have cleaned it up:

Additionally, when I disable the Prometheus exporter by setting `address: ""`, I only get a subset of metrics: with only OTLP export turned on and no Prometheus exporter, `otelcol_exporter_queue_size` and `otelcol_exporter_queue_capacity` are exported, but none of the other metrics are (e.g. `otelcol_process_uptime` is missing).

@TylerHelmuth added the area:service and collector-telemetry labels Aug 20, 2024
@TylerHelmuth TylerHelmuth added this to the Collector v1 distro milestone Aug 20, 2024
@codeboten
Contributor

> However, I don't see that warning message in my collector startup logs.

The warning will now show up by default, with the featuregate moving to stable in #11091.

> Additionally, when I disable the Prometheus exporter by setting `address: ""`, I only get a subset of metrics: with only OTLP export turned on and no Prometheus exporter, `otelcol_exporter_queue_size` and `otelcol_exporter_queue_capacity` are exported, but none of the other metrics are (e.g. `otelcol_process_uptime` is missing).

This is a bug; I'll spend some time investigating it.

@codeboten
Contributor

I tested this with the current version of the collector and the following config:

```yaml
service:
  telemetry:
    metrics:
      address: ""
      readers:
        - periodic:
            exporter:
              otlp:
                protocol: http/protobuf
                endpoint: https://api.honeycomb.io:443
                headers:
                  "x-honeycomb-team": "${env:HONEYCOMB_API_KEY}"
```

The Prometheus exporter is disabled as expected, and the OTLP exporter emitted these metrics:

```
otelcol_process_cpu_seconds
otelcol_process_memory_rss
otelcol_process_runtime_heap_alloc_bytes
otelcol_process_runtime_total_alloc_bytes
otelcol_process_runtime_total_sys_memory_bytes
otelcol_process_uptime
otelcol_processor_batch_metadata_cardinality
```

@codeboten
Contributor

Note that the export of the metrics via OTLP happened over two different batches; not sure if that's important to your tests, @pirgeo.

@codeboten
Contributor

@pirgeo I was able to reproduce behaviour similar to what you described by programmatically disabling the Prometheus exporter configured via `metrics::address`; will do more testing.

@codeboten
Contributor

Submitted a fix for the bug in #11093.

@codeboten
Contributor

> I am opening this issue to discuss possible solutions. Here are a few I came up with:

> Setting the address to `null` explicitly should disable the Prom exporter.

I think setting the address to `null` is not a great user experience, in the sense that I wouldn't expect to have to set a `readers` configuration for my Prometheus exporter and also set a second value to `null`. This feels confusing.

> Similarly, setting the address to `"none"` or `"disabled"` should turn off the exporter. I don't like this option, though, as it uses magic strings and is not intuitive.

As with the `null` option, it's not super clear that more things are required.

> Turning the Prometheus exporter off by default if another exporter is explicitly configured.

Maybe this one would be the easiest from the standpoint of a user: if I want to use `readers`, then we ignore the default of `address` and don't bother adding it. One issue we'll have, though, is that `address` is set as a default in `createDefaultConfig`, and there's no way to tell the difference between the string being unset and being set by the user to `":8888"`. I suppose we could return an error if both `address` and `readers` are configured (see the sketch below), WDYT?
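For illustration, the conflicting configuration that such a validation error would reject might look like this (hypothetical behavior, not implemented; the endpoint is a placeholder):

```yaml
# Hypothetical: this would be rejected because both address
# and readers are configured explicitly.
service:
  telemetry:
    metrics:
      address: ":8888"
      readers:
        - periodic:
            exporter:
              otlp:
                protocol: grpc
                endpoint: https://collector.example.com:4317
```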

> Deprecating the `service: telemetry: metrics: address:` configuration option and requiring Prometheus exporters to be explicitly set up so they follow the same structure as the OTLP exporters. This could break backward compatibility, though, so we might need to introduce a feature gate.

I would vote to do this and to introduce a feature gate that removes `address`.

@pirgeo
Member Author

pirgeo commented Sep 10, 2024

> Note that the export of the metrics via OTLP happened over two different batches; not sure if that's important to your tests.

Ah, I think I just ran a second collector with the debug exporter and maybe didn't wait long enough to see whether data would show up. I'll try it again. Thanks for taking a look!
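For reference, a second collector that receives the internal telemetry over OTLP and prints it with the debug exporter could look roughly like this (a minimal sketch using the stock `otlp` receiver and `debug` exporter):

```yaml
receivers:
  otlp:
    protocols:
      http:

exporters:
  debug:
    verbosity: detailed

service:
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [debug]
```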

codeboten added a commit that referenced this issue Sep 11, 2024
This bug caused proctelemetry metrics to not be registered if a user
configured the Collector's internal telemetry via `readers` only and
disabled `address`. The check in the `if` statement is no longer needed
since a no-op meter provider will be configured unless the telemetry
level is set.

Mentioned in #10919

---------

Signed-off-by: Alex Boten <223565+codeboten@users.noreply.github.com>