Add telemetry to bundler #17

garryod · 2023-12-18T18:45:47Z

Add telemetry (logs + metrics + traces) to bundler using tracing with export in OpenTelemetry format via opentelemetry-otlp.

Collecting & visualising results with:

Jaeger for traces
Prometheus for metrics

tpoliaw

Again, I'll leave the yaml config to someone who knows what it's doing but the code change looks ok.

OpenTelemetry looks widely used enough to be a good choice for the exporting side, but I'm not familiar enough with the downstream platforms to really comment on them.

What does Jaeger offer over something like Grafana that's used elsewhere at Diamond? Are we going to end up with an individual monitoring system for each service or is the aim that they're all going to be combined into a central system at some point?

tpoliaw · 2024-01-09T09:33:01Z

.devcontainer/docker-compose.yml

@@ -12,6 +12,7 @@ services:
      DATABASE_URL: mysql://root:rootpassword@ispyb/ispyb_build
      BUNDLER_DATABASE_URL: mysql://root:rootpassword@ispyb/ispyb_build
      BUNDLER_LOG_LEVEL: DEBUG
+      BUNDLER_OTEL_COLLECTOR_URL: http://collector:4317


Is there any way of parameterising these values rather than repeating them in several places/files?

otelcol and jaeger both support environment variable substitution in their configs, so I could put the values in the docker-compose. Unfortunately, prometheus doesn't.
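One possible workaround for a tool without built-in substitution is to render its config from a template before the container starts. The following is only a sketch of that idea using the Rust standard library; the `render_template` helper and the `${NAME}` placeholder syntax are assumptions made for illustration and are not part of this PR or of Prometheus itself.

```rust
use std::collections::HashMap;

/// Replace every `${NAME}` placeholder in `template` with the matching value
/// from `vars`. Hypothetical helper, not taken from the bundler code base.
///
/// Returns an error if a placeholder is unterminated or has no value, so a
/// half-rendered config is never written out.
fn render_template(template: &str, vars: &HashMap<String, String>) -> Result<String, String> {
    let mut out = String::with_capacity(template.len());
    let mut rest = template;
    while let Some(start) = rest.find("${") {
        // Copy everything before the placeholder unchanged.
        out.push_str(&rest[..start]);
        let after = &rest[start + 2..];
        let end = after
            .find('}')
            .ok_or_else(|| String::from("unterminated placeholder"))?;
        let name = &after[..end];
        let value = vars
            .get(name)
            .ok_or_else(|| format!("no value for placeholder `{}`", name))?;
        out.push_str(value);
        // Continue scanning after the closing brace.
        rest = &after[end + 1..];
    }
    out.push_str(rest);
    Ok(out)
}
```

The variable map could be built from the process environment with `std::env::vars().collect()`, so the same values set once in the docker-compose file would feed every rendered config. Note that `std::env::vars` panics on non-Unicode values.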

garryod · 2024-01-09T12:58:49Z

> What does Jaeger offer over something like Grafana that's used elsewhere at Diamond? Are we going to end up with an individual monitoring system for each service or is the aim that they're all going to be combined into a central system at some point?

Jaeger facilitates distributed tracing, as opposed to metrics visualisation. That is to say, it provides a view of 'events', each of which can be drilled down into to give a waterfall of the spans involved in that event and the logs captured in those spans, even across service boundaries.

To really make distributed tracing powerful, there needs to be a single instance which can correlate spans from disparate services, so I would envisage having one instance (a cluster for HA) which all services would push their tracing data to.
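To illustrate why a single collection point matters, here is a minimal sketch of span correlation using only the Rust standard library. The `Span` struct is a simplified, hypothetical stand-in for an OpenTelemetry span (real spans also carry timestamps, attributes and status), and `group_by_trace` and `render_trace` are illustrative names rather than Jaeger or OpenTelemetry APIs. The point is that spans from different services can only be stitched into one tree if they all arrive at the same place with a shared trace ID.

```rust
use std::collections::BTreeMap;

/// Simplified span record: just the fields needed for correlation.
#[derive(Debug, Clone, PartialEq)]
struct Span {
    trace_id: u128,         // shared by every span belonging to one event
    span_id: u64,           // unique within the trace
    parent_id: Option<u64>, // None for the root span
    service: &'static str,  // which service emitted the span
    name: &'static str,
}

/// Group spans received from any number of services by their trace ID.
fn group_by_trace(spans: &[Span]) -> BTreeMap<u128, Vec<&Span>> {
    let mut traces: BTreeMap<u128, Vec<&Span>> = BTreeMap::new();
    for span in spans {
        traces.entry(span.trace_id).or_default().push(span);
    }
    traces
}

/// Render one trace as an indented tree (a waterfall without timings).
/// Assumes well-formed parent links: spans whose parent is missing from
/// `spans` are omitted, and cyclic links are not handled.
fn render_trace(spans: &[&Span]) -> Vec<String> {
    fn walk(spans: &[&Span], parent: Option<u64>, depth: usize, out: &mut Vec<String>) {
        for span in spans.iter().filter(|s| s.parent_id == parent) {
            out.push(format!("{}{} [{}]", "  ".repeat(depth), span.name, span.service));
            walk(spans, Some(span.span_id), depth + 1, out);
        }
    }
    let mut out = Vec::new();
    walk(spans, None, 0, &mut out);
    out
}
```

If the spans were split across per-service backends, a child span emitted by one service would never be grouped with the root span emitted by another, and the tree would be cut at every service boundary.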

tpoliaw · 2024-01-09T13:11:31Z

I thought grafana offered tracing visualisation as well but looking at it, it requires an external data source (such as jaeger) to work. I have no preferences one way or the other, I'm just not sure of what each part is doing. Where does prometheus fit in?

garryod · 2024-01-09T14:31:24Z

Jaeger and Prometheus both serve effectively the same role: they act as a storage backend and query engine for traces and metrics respectively. Grafana can then provide visualisations of this data.

DiamondJoseph

If Peter is happy with the Rust, the devcontainer changes look sensible; presumably we have an equivalent standard logging deployment which handles the jaeger?

garryod · 2024-01-10T10:26:00Z

> If Peter is happy with the Rust, the devcontainer changes look sensible; presumably we have an equivalent standard logging deployment which handles the jaeger?

Nothing in the deployment yet, and there's no recommended good practice for how to do this. We're kind of using this project (and XChem) as the test bed for OTEL & Jaeger, with a view to spinning up some Diamond-wide infra to handle traces & metrics.

DiamondJoseph · 2024-01-10T10:39:19Z

> spin up some Diamond wide infra to handle traces & metrics

Either way, the monitoring deployment lives outside of the deployment it's monitoring, so that's a future problem.

garryod added the enhancement and rust labels Dec 18, 2023

garryod self-assigned this Dec 18, 2023

garryod force-pushed the metrics branch 3 times, most recently from a708cf9 to 972863f December 19, 2023 11:53

garryod force-pushed the metrics branch 2 times, most recently from cc6e36d to 4255b4e January 3, 2024 14:40

garryod marked this pull request as draft January 3, 2024 18:07

garryod added 4 commits January 8, 2024 17:59

Add additional tracing spans

0830bc2

Add jaeger to dev container compose

dcc75f0

Export bundler traces to jaeger

345e2df

Create update_bundle span during actual update

8a55531

garryod force-pushed the metrics branch from 4255b4e to 66f1bd2 January 8, 2024 18:01

garryod added 7 commits January 8, 2024 18:53

Export tracing via OTLP gRPC

258db5e

Remove erroneous sync span in future

de53bf9

Enable tracing in OPA

67b40ba

Rename devcontainer opa config to opa.yml

fca2800

Add prometheus to devcontainer docker-compose

68b6f32

Use otel collector to collate traces in dev container

871f94a

Expose bundler metrics via otlp exporter

c42e65b

garryod force-pushed the metrics branch from 66f1bd2 to c42e65b January 8, 2024 18:53

garryod requested review from DiamondJoseph and tpoliaw January 8, 2024 18:53

garryod marked this pull request as ready for review January 9, 2024 11:16

tpoliaw reviewed Jan 9, 2024

DiamondJoseph approved these changes Jan 10, 2024

garryod merged commit 0c488e8 into main Jan 10, 2024
20 checks passed

garryod deleted the metrics branch April 11, 2024 15:23
