Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add telemetry to bundler #17

Merged
merged 11 commits into from
Jan 10, 2024
Merged

Add telemetry to bundler #17

merged 11 commits into from
Jan 10, 2024

Conversation

garryod
Copy link
Member

@garryod garryod commented Dec 18, 2023

Add telemetry (logs + metrics + traces) to bundler using tracing with export in OpenTelemetry format via opentelemetry-otlp.

Collecting & visualising results with:

  • Jaeger for traces
  • Prometheus for metrics

@garryod garryod added enhancement New feature or request rust Pull request that updates Rust code labels Dec 18, 2023
@garryod garryod self-assigned this Dec 18, 2023
@garryod garryod force-pushed the metrics branch 3 times, most recently from a708cf9 to 972863f Compare December 19, 2023 11:53
@garryod garryod force-pushed the metrics branch 2 times, most recently from cc6e36d to 4255b4e Compare January 3, 2024 14:40
@garryod garryod marked this pull request as draft January 3, 2024 18:07
Copy link

@tpoliaw tpoliaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, I'll leave the yaml config to someone who knows what it's doing but the code change looks ok.

OpenTelemetry looks widely used enough to be a good choice for the exporting side, but I'm not familiar enough with the downstream platforms to really comment on them.

What does Jaeger offer over something like Grafana that's used elsewhere at Diamond? Are we going to end up with an individual monitoring system for each service or is the aim that they're all going to be combined into a central system at some point?

@@ -12,6 +12,7 @@ services:
DATABASE_URL: mysql://root:rootpassword@ispyb/ispyb_build
BUNDLER_DATABASE_URL: mysql://root:rootpassword@ispyb/ispyb_build
BUNDLER_LOG_LEVEL: DEBUG
BUNDLER_OTEL_COLLECTOR_URL: http://collector:4317
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any way of parameterising these values rather than repeating them in several places/files?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

otelcol & jaeger both support env var substitution in their configs, so I could put them in the docker-compose. Unfortunately prometheus doesn't

@garryod
Copy link
Member Author

garryod commented Jan 9, 2024

What does Jaeger offer over something like Grafana that's used elsewhere at Diamond? Are we going to end up with an individual monitoring system for each service or is the aim that they're all going to be combined into a central system at some point?

Jaeger faciliates distributed tracing, as opposed to metrics visualisation. That is to say it provides a view of 'events' each of which can be drilled down into giving a waterfall of spans which were involved in this event and logs which were captured in those spans, even across service boundaries.

To really make distributed tracing powerful there needs to be a single instance which can correlate spans from desperate services, so I would envisage having one instance (cluster for HA) which all services would push their tracing data to.

@tpoliaw
Copy link

tpoliaw commented Jan 9, 2024

I thought grafana offered tracing visualisation as well but looking at it, it requires an external data source (such as jaeger) to work. I have no preferences one way or the other, I'm just not sure of what each part is doing. Where does prometheus fit in?

@garryod
Copy link
Member Author

garryod commented Jan 9, 2024

Jaeger and Prometheus both serve effectively the same role, they act as a storage backend and query engine for traces and metrics respectively. Grafana can then provide visualisations of this data

Copy link
Collaborator

@DiamondJoseph DiamondJoseph left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If Peter is happy with the Rust, the devcontainer changes look sensible; presumably we have an equivalent standard logging deployment which handles the jaeger?

@garryod
Copy link
Member Author

garryod commented Jan 10, 2024

If Peter is happy with the Rust, the devcontainer changes look sensible; presumably we have an equivalent standard logging deployment which handles the jaeger?

Nothing in the deployment yet, and no recommended good practice for how to do this, kind of using this project (and XChem) as the test bed for OTEL & Jaeger with the view to spin up some Diamond wide infra to handle traces & metrics

@DiamondJoseph
Copy link
Collaborator

spin up some Diamond wide infra to handle traces & metrics

Either way, the monitoring deployment lives outside of the deployment it's monitoring, so that's a future problem

@garryod garryod merged commit 0c488e8 into main Jan 10, 2024
20 checks passed
@garryod garryod deleted the metrics branch April 11, 2024 15:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request rust Pull request that updates Rust code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants