tempo-mixin: add panel for Envoy Proxy sidecar, update tanka example … #1137

Merged: kvrhdn merged 3 commits into grafana:main from kvrhdn/tempo-writes-grpc on Nov 26, 2021

@kvrhdn kvrhdn commented Nov 25, 2021

What this PR does:

Changes:

Tempo / Writes dashboard

Add a panel with Envoy metrics. This panel is meant for setups where the gateway runs with an Envoy Proxy sidecar (for HTTP/2 load balancing).
The Envoy metrics expose the gRPC status codes, which are essential when debugging ingress issues. The gateway only reports HTTP statuses, which is misleading because a 200 OK HTTP response can carry a gRPC message with an error.
Example: if a request is denied by the ingester because it hits the limit, the distributor returns an HTTP 200 OK response containing a gRPC message with status 9 FAILED_PRECONDITION.
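
To illustrate why the HTTP status alone is not enough, here is a minimal Python sketch (not part of this PR). It assumes the standard gRPC-over-HTTP/2 convention that the real outcome is carried in the `grpc-status` and `grpc-message` trailers. The function name `effective_outcome` and the plain-dict representation of trailers are hypothetical and only serve the example.

```python
from typing import Mapping


def effective_outcome(http_status: int, trailers: Mapping[str, str]) -> str:
    """Classify a response by looking past the HTTP status (hypothetical helper).

    A gRPC call over HTTP/2 normally answers with HTTP 200 and reports the
    actual result in the `grpc-status` trailer, so both must be inspected.
    """
    if http_status != 200:
        return f"http error {http_status}"

    grpc_status = trailers.get("grpc-status")
    if grpc_status is None:
        # Plain HTTP response, nothing more to inspect.
        return "ok (no grpc-status present)"

    code = int(grpc_status)
    if code == 0:
        return "ok"

    message = trailers.get("grpc-message", "")
    if message:
        return f"grpc error {code}: {message}"
    return f"grpc error {code}"


# What the gateway sees is a 200; what the client receives is an error.
print(effective_outcome(200, {"grpc-status": "9", "grpc-message": "limit hit"}))
```

An HTTP-level metric would count the response above as a success, which is the gap the Envoy panel is meant to close.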

I've also added a text panel listing the gRPC status codes since these aren't common knowledge.
It links to this doc: https://github.com/grpc/grpc/blob/master/doc/statuscodes.md
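
For quick reference while reading the panel, the numeric codes from that doc can be written down as a small lookup. This is a standalone Python sketch, not the content of the text panel itself; the enum name `GrpcStatus` and the helper `describe` are made up for this example.

```python
from enum import IntEnum


class GrpcStatus(IntEnum):
    """gRPC status codes as listed in grpc/doc/statuscodes.md."""
    OK = 0
    CANCELLED = 1
    UNKNOWN = 2
    INVALID_ARGUMENT = 3
    DEADLINE_EXCEEDED = 4
    NOT_FOUND = 5
    ALREADY_EXISTS = 6
    PERMISSION_DENIED = 7
    RESOURCE_EXHAUSTED = 8
    FAILED_PRECONDITION = 9
    ABORTED = 10
    OUT_OF_RANGE = 11
    UNIMPLEMENTED = 12
    INTERNAL = 13
    UNAVAILABLE = 14
    DATA_LOSS = 15
    UNAUTHENTICATED = 16


def describe(code: int) -> str:
    """Turn a numeric code into the 'number NAME' form (hypothetical helper)."""
    try:
        return f"{code} {GrpcStatus(code).name}"
    except ValueError:
        return f"{code} (not a known gRPC status code)"


print(describe(9))
```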

tanka example

  • update the Prometheus config to always add the cluster label, since the tempo-mixin dashboards depend on it
  • enable search in Grafana and Tempo
  • use a relative link to tempo-mixin, which makes it easier to quickly test changes

I've also run jb update in tempo-mixin and the tk examples, hence the large number of vendor files being changed.

Which issue(s) this PR fixes:
Fixes #

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]


kvrhdn commented Nov 26, 2021

Btw, the end result looks a little bit like this:

[Screenshot 2021-11-26 at 00:57:52]

Interesting things to note here:

  • the distributor shows a small bump in refused spans starting around 23:00
  • the gateway does not report this; it's all 200s there
  • the gRPC status codes from Envoy expose this as an increase in 9 FAILED_PRECONDITION


@annanay25 annanay25 left a comment


LGTM.


Code | Number | Description
---|---|---
OK | 0 | Not an error; returned on success.

Nice, very useful addition.

@kvrhdn kvrhdn merged commit a1a95b3 into grafana:main Nov 26, 2021
@kvrhdn kvrhdn deleted the kvrhdn/tempo-writes-grpc branch November 26, 2021 12:00