---
tags:
- metrics
- prometheus
title: Metrics in go-libp2p
description:
date: 2023-06-15
permalink: "/2023-06-15-metrics-in-go-libp2p/"
author: Sukun Tarachandani
---

# Metrics in go-libp2p

## Introduction

libp2p is the core networking component for many projects, so it is important to be able to observe the state of its components. To that end, we've been adding metrics to its various components over the last few months. They've already helped us debug some issues and guided the development of smart dialing.

## Why Prometheus?

After deliberating between Prometheus, OpenCensus, and OpenTelemetry, we decided to use Prometheus. The details of the discussion can be found [here](https://github.com/libp2p/go-libp2p/issues/1356). To summarise, Prometheus was performant and ubiquitous, which allowed us to add metrics without worrying too much about their performance cost. We also ensured that tracking metrics didn't put too much pressure on the garbage collector, something we had found to be an [issue with OpenCensus](https://github.com/libp2p/go-libp2p/issues/1955).

## Enabling Metrics

Metrics have been enabled by default since go-libp2p v0.26.0. All you need to do is set up a Prometheus exporter for the collected metrics, for example by serving them over HTTP with `promhttp`:

```go
package main

import (
	"log"
	"net/http"

	"github.com/libp2p/go-libp2p"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
	// Expose the default Prometheus registry, where go-libp2p registers its metrics.
	http.Handle("/metrics", promhttp.Handler())
	go func() { log.Fatal(http.ListenAndServe(":2112", nil)) }()

	host, err := libp2p.New()
	if err != nil {
		log.Fatal(err)
	}
	defer host.Close()
	// ... use the host
}
```

Now all you have to do is point your Prometheus instance at `:2112/metrics` to scrape them.

By default, metrics are registered with the default Prometheus Registerer (`prometheus.DefaultRegisterer`). To use a different Registerer, pass the `libp2p.PrometheusRegisterer` option:

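```go
package main

import (
	"log"
	"net/http"

	"github.com/libp2p/go-libp2p"
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
	// Use a dedicated registry instead of prometheus.DefaultRegisterer.
	reg := prometheus.NewRegistry()
	http.Handle("/metrics", promhttp.HandlerFor(reg, promhttp.HandlerOpts{}))
	go func() { log.Fatal(http.ListenAndServe(":2112", nil)) }()

	host, err := libp2p.New(
		libp2p.PrometheusRegisterer(reg), // register go-libp2p metrics with reg
	)
	if err != nil {
		log.Fatal(err)
	}
	defer host.Close()
	// ... use the host
}
```
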
<!-- TODO: incorporate this PR: https://github.com/libp2p/go-libp2p/pull/2232 -->

## Discovering what metrics are available

go-libp2p provides metrics and Grafana dashboards for all its major subsystems out of the box. You can find the available Grafana dashboards at https://github.com/libp2p/go-libp2p/tree/master/dashboards. Another great way to discover the available metrics is to open the Prometheus UI, type `libp2p_(libp2p-package-name)_`, and browse the autocomplete suggestions. For example, `libp2p_autonat_` lists all the metrics exported by AutoNAT.

## How are metrics useful?

I'll share two cases where metrics were extremely helpful for us in go-libp2p: one where they helped us debug a memory leak, and one where adding two new metrics helped us develop a new feature.

### Debugging with metrics

We were excited about adding metrics because they gave us the opportunity to observe exactly what was happening within the system. One of the first systems we added metrics to was the event bus. As soon as we added event bus metrics, we saw a discrepancy between two of them: the counts of `EvtLocalReachabilityChanged` and `EvtLocalAddressesUpdated` events.

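To illustrate why per-event counts surface this kind of bug, here is a toy emitter that counts emissions per event type, much like a counter labelled by event type would. This is a hypothetical sketch: `countingBus` and the stand-in event structs are our own and are not go-libp2p's event bus or its metrics code.

```go
package main

import (
	"fmt"
	"sync"
)

// Hypothetical stand-ins for go-libp2p's event structs.
type EvtLocalReachabilityChanged struct{ Reachability string }
type EvtLocalAddressesUpdated struct{ Addrs []string }

// countingBus is a toy emitter that counts emitted events per dynamic Go type,
// mimicking a counter metric labelled by event type.
type countingBus struct {
	mu     sync.Mutex
	counts map[string]int
}

func newCountingBus() *countingBus { return &countingBus{counts: map[string]int{}} }

// Emit records one emission of evt, keyed by its type name (e.g. "main.EvtLocalAddressesUpdated").
func (b *countingBus) Emit(evt any) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.counts[fmt.Sprintf("%T", evt)]++
}

// Count returns how many events of the same type as evt have been emitted.
func (b *countingBus) Count(evt any) int {
	b.mu.Lock()
	defer b.mu.Unlock()
	return b.counts[fmt.Sprintf("%T", evt)]
}

func main() {
	bus := newCountingBus()
	// Simulate reachability events that are not followed by address updates.
	for i := 0; i < 3; i++ {
		bus.Emit(EvtLocalReachabilityChanged{Reachability: "public"})
	}
	bus.Emit(EvtLocalAddressesUpdated{Addrs: []string{"/ip4/203.0.113.1/tcp/4001"}})

	r := bus.Count(EvtLocalReachabilityChanged{})
	a := bus.Count(EvtLocalAddressesUpdated{})
	if r > a {
		fmt.Printf("suspicious: %d reachability changes but only %d address updates\n", r, a)
	}
}
```
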
<div class="container" style="display:flex; column-gap:10px; justify-content: center; align-items: center;"> | ||||||
<figure> | ||||||
<img src="../assets/metrics-in-go-libp2p-evtlocalreachabilitychanged.png" width="750"> | ||||||
<figcaption style="font-size:x-small;"> | ||||||
EvtLocalReachabilityChanged | ||||||
</figcaption> | ||||||
</figure> | ||||||
</div>

<div class="container" style="display:flex; column-gap:10px; justify-content: center; align-items: center;"> | ||||||
<figure> | ||||||
<img src="../assets/metrics-in-go-libp2p-evtlocaladdressesupdated.png" width="750"> | ||||||
<figcaption style="font-size:x-small;"> | ||||||
EvtLocalAddressesUpdated | ||||||
</figcaption> | ||||||
</figure> | ||||||
</div>

Ideally, when a node's reachability changes, the node's addresses should change too, as it tries to obtain a relay reservation. The mismatch between these two metrics pointed us to an issue with AutoNAT. Upon debugging, we realised that we were emitting reachability-changed events even when the reachability had not changed; only the address to which the AutoNAT dial succeeded had changed. Another event, `EvtLocalProtocolsUpdated`, pointed us to a second problem.

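The first fix boils down to emitting an event only when the reachability value itself changes, not when the confirmed address changes. The sketch below is a hypothetical, simplified illustration of that rule (`reachabilityTracker`, `Observe`, and the `Reachability` constants are our own names), not the actual AutoNAT code in go-libp2p.

```go
package main

import "fmt"

// Reachability is a simplified stand-in for a node's reachability state.
type Reachability int

const (
	ReachabilityUnknown Reachability = iota
	ReachabilityPublic
	ReachabilityPrivate
)

// reachabilityTracker decides whether a new AutoNAT result warrants an event.
type reachabilityTracker struct {
	current Reachability
}

// Observe records an AutoNAT result and reports whether a reachability-changed
// event should be emitted. dialedAddr is deliberately ignored: a different
// address confirming the same reachability is not a reachability change.
func (t *reachabilityTracker) Observe(r Reachability, dialedAddr string) bool {
	if r == t.current {
		return false
	}
	t.current = r
	return true
}

func main() {
	var tracker reachabilityTracker
	results := []struct {
		r    Reachability
		addr string
	}{
		{ReachabilityPublic, "/ip4/203.0.113.1/tcp/4001"},
		{ReachabilityPublic, "/ip4/203.0.113.1/udp/4001/quic-v1"}, // same reachability, new address
		{ReachabilityPrivate, ""},
	}
	emitted := 0
	for _, res := range results {
		if tracker.Observe(res.r, res.addr) {
			emitted++
		}
	}
	fmt.Println("events emitted:", emitted) // events emitted: 2
}
```
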
<div class="container" style="display:flex; column-gap:10px; justify-content: center; align-items: center;"> | ||||||
<figure> | ||||||
<img src="../assets/metrics-in-go-libp2p-evtprotocolsupdated.png" width="750"> | ||||||
<figcaption style="font-size:x-small;"> | ||||||
EvtLocalProtocolsUpdated | ||||||
</figcaption> | ||||||
</figure> | ||||||
</div>

A node's supported protocols shouldn't change if its reachability has not changed. Once we were aware of the issue, finding the root cause was simple enough: there was a problem with cleaning up the relay service used by the relay manager.

You can find the details in the [GitHub issue](https://github.com/libp2p/go-libp2p/issues/2046).

### Development using metrics

In go-libp2p v0.28.0 we introduced smart dialing. When connecting to a peer, instead of dialing all of the peer's addresses in parallel, we now prioritise QUIC dials. This significantly reduces dial cancellations and unnecessary load on the network.

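The idea of prioritising QUIC can be sketched as giving each address a start delay: QUIC addresses are dialed immediately and everything else waits. This is a simplified, hypothetical illustration, not go-libp2p's actual ranking logic; the `rankAddrs` helper and the 250 ms delay are our own assumptions, and addresses are plain strings rather than multiaddrs.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"time"
)

// scheduledDial pairs an address with how long to wait before dialing it.
type scheduledDial struct {
	Addr  string
	Delay time.Duration
}

// nonQUICDelay is an assumed value, for illustration only.
const nonQUICDelay = 250 * time.Millisecond

// isQUIC reports whether a multiaddr string runs over QUIC (matches /quic and /quic-v1).
func isQUIC(addr string) bool {
	return strings.Contains(addr, "/quic")
}

// rankAddrs schedules QUIC addresses immediately and delays all others.
func rankAddrs(addrs []string) []scheduledDial {
	out := make([]scheduledDial, 0, len(addrs))
	for _, a := range addrs {
		d := nonQUICDelay
		if isQUIC(a) {
			d = 0
		}
		out = append(out, scheduledDial{Addr: a, Delay: d})
	}
	// A stable sort keeps the caller's order among addresses with equal delay.
	sort.SliceStable(out, func(i, j int) bool { return out[i].Delay < out[j].Delay })
	return out
}

func main() {
	addrs := []string{
		"/ip4/198.51.100.7/tcp/4001",
		"/ip4/198.51.100.7/udp/4001/quic-v1",
	}
	for _, d := range rankAddrs(addrs) {
		fmt.Printf("%v\t%s\n", d.Delay, d.Addr)
	}
}
```

In this sketch, a delayed dial would only be started if no connection had been established by the time its delay elapsed; that scheduling and cancellation logic is omitted.
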
Not dialing all addresses in parallel does increase the latency of establishing a connection when the first dial doesn't succeed. We wanted to ensure that most connections were established with no additional latency. To better gauge the impact, we added two metrics:

1. Dial ranking delay: the latency in connection establishment introduced by the dial prioritisation logic.
2. Dials per connection: the number of addresses dialed before a connection with the peer was established.

Dials per connection measures the benefit of the smart dialing mechanism, while dial ranking delay gave us assurance that the vast majority of dials were unaffected.

<div class="container" style="display:flex; column-gap:10px; justify-content: center; align-items: center;"> | ||||||
<figure> | ||||||
<img src="../assets/metrics-in-go-libp2p-smart-dialing.png" width="750"> | ||||||
<figcaption style="font-size:x-small;"> | ||||||
Smart dialing metrics | ||||||
</figcaption> | ||||||
</figure> | ||||||
</div>

The details can be found in the smart dialing [PR](https://github.com/libp2p/go-libp2p/pull/2260).

## Resources and how you can contribute

Check out our Grafana dashboards: https://github.com/libp2p/go-libp2p/tree/master/dashboards

To create custom dashboards, the [Prometheus docs](https://prometheus.io/docs/prometheus/latest/querying/basics/) and [Grafana docs](https://grafana.com/docs/grafana/latest/panels-visualizations/) are great resources.

If you would like to contribute, please [connect with the libp2p maintainers](https://libp2p.io/#community).