Telemetry export, reporting and monitoring with prometheus and grafana #124

Merged: 9 commits, May 9, 2023
4 changes: 2 additions & 2 deletions .env_example
@@ -7,10 +7,10 @@ BLOCK_RESULT_OPERATOR_PRIVATE_KEY=<PRIVATE-KEY-OF-BLOCK-RESULT-OPERATOR>

# if running rudder and all supporting services using docker compose
NODE_ETHEREUM_MAINNET=http://hardhat-node:8545/
IPFS_PINNER_URL=http://ipfs-pinner:3000
IPFS_PINNER_URL=http://ipfs-pinner:3001
EVM_SERVER_URL=http://evm-server:3002

# if running rudder locally and all other services using docker compose
NODE_ETHEREUM_MAINNET=http://127.0.0.1:8545/
IPFS_PINNER_URL=http://127.0.0.1:3000
IPFS_PINNER_URL=http://127.0.0.1:3001
EVM_SERVER_URL=http://127.0.0.1:3002
2 changes: 1 addition & 1 deletion .envrc
@@ -1,5 +1,5 @@
export ERIGON_NODE="https://ethereum.web3.covalenthq.com/mainnet/rpc"
export IPFS_PINNER_URL="http://127.0.0.1:3000"
export IPFS_PINNER_URL="http://127.0.0.1:3001"
export EVM_SERVER_URL="http://127.0.0.1:3002"

[[ -f .envrc.local ]] && source_env .envrc.local
4 changes: 3 additions & 1 deletion Dockerfile
@@ -50,4 +50,6 @@ COPY --from=builder-elixir /mix/test-data/ /app/test-data
# Used only for testing in compose
# CMD [ "mix", "test", "./test/block_specimen_decoder_test.exs", "./test/block_result_uploader_test.exs"]

CMD ["/app/prod/bin/rudder", "start"]
CMD ["/app/prod/bin/rudder", "start"]

EXPOSE 9568
13 changes: 10 additions & 3 deletions README.md
@@ -41,10 +41,12 @@
- [Environment](#environment)
- [Pull](#pull)
- [Run](#docker-run)
- [Monitor](#monitor)
- [Build & Run From Source](#build-from-source)
- [Linux x86_64](#linux-x86_64-ubuntu-2204-lts-install-dependencies)
- [Environment](#env-vars)
- [Run](#source-run)
- [Monitor](#monitor)
- [Troubleshooting](#troubleshooting)
- [Bugs Reporting & Contributions](#bugs-reporting-contributions)
- [Scripts](#scripts)
@@ -170,7 +172,7 @@ Create `envrc.local` file and add the following env vars.
```bash
export BLOCK_RESULT_OPERATOR_PRIVATE_KEY=block-result-operator-private-key-without-0x-prefix
export NODE_ETHEREUM_MAINNET="https://moonbeam-alphanet.web3.covalenthq.com/alphanet/direct-rpc"
export IPFS_PINNER_URL="http://ipfs-pinner:3000"
export IPFS_PINNER_URL="http://ipfs-pinner:3001"
export EVM_SERVER_URL="http://evm-server:3002"
export WEB3_JWT="****"
```
@@ -193,7 +195,7 @@ This shows that the shell is loaded correctly. You can check if they're what you

```bash
echo $IPFS_PINNER_URL
http://ipfs-pinner:3000
http://ipfs-pinner:3001
```

### <span id="rudder_docker_pull">Pull</span>
@@ -271,7 +273,7 @@ Hence there is a single binary per "Environment". To understand more about this
rudder | moonbase-node: https://moonbeam-alphanet.web3.covalenthq.com/alphanet/direct-rpc
rudder | brp-operator: ecf0b636233c6580f60f50ee1d809336c3a76640dbd77f7cdd054a82c6fc0a31
rudder | evm-server: http://evm-server:3002
rudder | ipfs-node: http://ipfs-pinner:3000
rudder | ipfs-node: http://ipfs-pinner:3001
ipfs-pinner | 2023/04/19 16:53:31 Listening...
rudder | ==> nimble_options
rudder | Compiling 3 files (.ex)
@@ -358,6 +360,11 @@ Once the binary is compiled. Rudder can start to process block specimens into bl
rudder | [info] Summary for rudder_metrics - {0.0035489999999999996, 0.0035489999999999996}
rudder | [info] curr_block: 4180658 and latest_block_num:4180657
```
### <span id="rudder_monitor">Monitor</span>

`rudder` already captures the most relevant performance metrics and execution times for the processes in its pipeline, and exports them via Prometheus.

See the full document on how to set up Prometheus and Grafana for [rudder metrics collection, monitoring, reporting and alerting](./docs/metrics.md).

## <span id="rudder_source">Build From Source</span>

2 changes: 1 addition & 1 deletion config/config.exs
@@ -8,7 +8,7 @@
import Config

config :rudder,
ipfs_pinner_url: System.get_env("IPFS_PINNER_URL", "http://127.0.0.1:3000"),
ipfs_pinner_url: System.get_env("IPFS_PINNER_URL", "http://127.0.0.1:3001"),
operator_private_key: System.get_env("BLOCK_RESULT_OPERATOR_PRIVATE_KEY"),
proofchain_address: "0x4f2E285227D43D9eB52799D0A28299540452446E",
proofchain_chain_id: 1284,
2 changes: 1 addition & 1 deletion config/docker.exs
@@ -5,5 +5,5 @@ config :rudder,
proofchain_address: "0xCF3d5540525D191D6492F1E0928d4e816c29778c",
proofchain_chain_id: 31337,
proofchain_node: "http://hardhat-node:8545/",
ipfs_pinner_url: "http://ipfs-pinner:3000",
ipfs_pinner_url: "http://ipfs-pinner:3001",
evm_server_url: "http://evm-server:3002"
2 changes: 1 addition & 1 deletion config/test.exs
@@ -5,5 +5,5 @@ config :rudder,
proofchain_address: "0xCF3d5540525D191D6492F1E0928d4e816c29778c",
proofchain_chain_id: 31337,
proofchain_node: "http://127.0.0.1:8545/",
ipfs_pinner_url: "http://127.0.0.1:3000",
ipfs_pinner_url: "http://127.0.0.1:3001",
evm_server_url: "http://127.0.0.1:3002"
8 changes: 4 additions & 4 deletions docker-compose-ci.yml
@@ -47,14 +47,14 @@ services:
restart: on-failure
expose:
- "4001:4001"
- "3000:3000"
- "3001:3001"
environment:
- WEB3_JWT=${WEB3_JWT}
networks:
- cqt-net
ports:
- "4001:4001"
- "3000:3000"
- "3001:3001"

evm-server:
image: "us-docker.pkg.dev/covalent-project/network/evm-server:latest"
@@ -94,8 +94,8 @@ services:
done;
echo Proof-chain contracts deployed!;
echo Uploading test files to local .ipfs...;
curl -F "filedata=@/app/test-data/codec-0.35/encoded/1-17090940-replica-0x7b8e1d463a0fbc6fce05b31c5c30e605aa13efaca14a1f3ba991d33ea979b12b" http://ipfs-pinner:3000/upload;
curl -F "filedata=@/app/test-data/codec-0.35/encoded/1-17090960-replica-0xc95d44182ee006e79f1352ef32664210f383baa016988d5ab2fd950b52bf22ff" http://ipfs-pinner:3000/upload;
curl -F "filedata=@/app/test-data/codec-0.35/encoded/1-17090940-replica-0x7b8e1d463a0fbc6fce05b31c5c30e605aa13efaca14a1f3ba991d33ea979b12b" http://ipfs-pinner:3001/upload;
curl -F "filedata=@/app/test-data/codec-0.35/encoded/1-17090960-replica-0xc95d44182ee006e79f1352ef32664210f383baa016988d5ab2fd950b52bf22ff" http://ipfs-pinner:3001/upload;
echo Test bsp files uploaded!;
cd /app;
MIX_ENV=docker mix test --trace --slowest 10;
10 changes: 8 additions & 2 deletions docker-compose-mbase.yml
@@ -1,3 +1,7 @@
version: '3'
# runs the entire rudder pipeline with all supporting services (including rudder) in docker
# set .env such that all services in docker are talking to each other only

services:
ipfs-pinner:
image: "us-docker.pkg.dev/covalent-project/network/ipfs-pinner:stable"
@@ -9,14 +13,14 @@ services:
"autoheal": "true"
expose:
- "4001:4001"
- "3000:3000"
- "3001:3001"
environment:
- WEB3_JWT=${WEB3_JWT}
networks:
- cqt-net
ports:
- "4001:4001"
- "3000:3000"
- "3001:3001"

evm-server:
image: "us-docker.pkg.dev/covalent-project/network/evm-server:stable"
@@ -62,6 +66,8 @@ services:
- IPFS_PINNER_URL=${IPFS_PINNER_URL}
networks:
- cqt-net
ports:
- "9568:9568"

autoheal:
image: willfarrell/autoheal
6 changes: 4 additions & 2 deletions docker-compose-mbeam.yml
@@ -12,14 +12,14 @@ services:
"autoheal": "true"
expose:
- "4001:4001"
- "3000:3000"
- "3001:3001"
environment:
- WEB3_JWT=${WEB3_JWT}
networks:
- cqt-net
ports:
- "4001:4001"
- "3000:3000"
- "3001:3001"

evm-server:
image: "us-docker.pkg.dev/covalent-project/network/evm-server:stable"
@@ -65,6 +65,8 @@ services:
- IPFS_PINNER_URL=${IPFS_PINNER_URL}
networks:
- cqt-net
ports:
- "9568:9568"

autoheal:
image: willfarrell/autoheal
2 changes: 1 addition & 1 deletion docs/ARCH.md
@@ -147,7 +147,7 @@ That will lead to the corresponding logs:

Once the block results have been produced they need to be proved and uploaded. This ideally happens atomically for rudder.

Below is an example of how to interact with block result uploader that speaks to `ipfs-pinner` available with `export IPFS_PINNER_URL="http://127.0.0.1:3000"`. The file is directly uploaded to IPFS using the wrapped local IPFS node.
Below is an example of how to interact with the block result uploader, which talks to the `ipfs-pinner` configured via `export IPFS_PINNER_URL="http://127.0.0.1:3001"`. The file is uploaded directly to IPFS through the wrapped local IPFS node.

```elixir
file_path = Path.expand(Path.absname(Path.relative_to_cwd("test-data/temp2.txt")))
2 changes: 1 addition & 1 deletion docs/development.md
@@ -39,7 +39,7 @@ Add the env vars to a .env file as below. Ask your node operator about these if
```bash
export BLOCK_RESULT_OPERATOR_PRIVATE_KEY=block-result-operator-private-key-without-0x-prefix
export NODE_ETHEREUM_MAINNET="https://moonbeam-alphanet.web3.covalenthq.com/alphanet/direct-rpc"
export IPFS_PINNER_URL="http://ipfs-pinner:3000"
export IPFS_PINNER_URL="http://ipfs-pinner:3001"
export EVM_SERVER_URL="http://evm-server:3002"
export WEB3_JWT="****"
```
Binary file added docs/grafana.png
113 changes: 108 additions & 5 deletions docs/metrics.md
@@ -1,8 +1,111 @@
# Metrics collection and reporting
# Metrics Collection and Reporting

`rudder` enabled metrics collection via `--metrics` flag. A metrics server can be enabled (by `--metrics.addr` and `--metrics.port`). The metrics are served in two formats:
`rudder` has metrics collection enabled by default and exports its metrics via Prometheus.

- `/debug/metrics`: json representation of expvars and go-metrics
- `/debug/metrics/prometheus`: same metrics as above but in prometheus format
## Config

Monitoring can be setup (for example) by plugging the endpoint serving in prometheus-format into influxdb, which is plugged into grafana.
Install Prometheus from <https://prometheus.io/download/>.

Then edit the Prometheus config file for your platform:

* mac/m1: `/opt/homebrew/etc/prometheus.yml`
* linux/x86: `/usr/local/etc/prometheus.yml`

Add the [rudder telemetry metrics](../lib/rudder/metrics/prometheus.yml) scrape config so that Prometheus picks up the metrics exported by rudder.

Restart your Prometheus server:

```bash
brew services restart prometheus
```
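The config steps above can be scripted. The sketch below is a hedged example, not part of rudder: it assumes a Homebrew install (so `arm64` maps to `/opt/homebrew` and anything else to `/usr/local`, mirroring the two paths listed above) and that it is run from the repository root. `prometheus_config_path` and `install_rudder_scrape_config` are hypothetical helper names. It replaces the existing config (keeping a `.bak` copy), which is only appropriate if you have no other scrape jobs configured.

```bash
# Hypothetical helper: print the Homebrew prometheus.yml path for a CPU
# architecture (defaults to the current machine's `uname -m`).
prometheus_config_path() {
  local arch="${1:-$(uname -m)}"
  case "$arch" in
    arm64) echo "/opt/homebrew/etc/prometheus.yml" ;;  # mac/m1
    *)     echo "/usr/local/etc/prometheus.yml" ;;     # x86
  esac
}

# Hypothetical helper: back up the current config (if any), then copy
# rudder's scrape config over it.
install_rudder_scrape_config() {
  local src="${1:-lib/rudder/metrics/prometheus.yml}"
  local dest="${2:-$(prometheus_config_path)}"
  if [ -f "$dest" ]; then
    cp "$dest" "$dest.bak"
  fi
  cp "$src" "$dest"
}

# Example (from the repository root):
#   install_rudder_scrape_config && brew services restart prometheus
```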

Monitoring can then be set up, for example, by adding the Prometheus-format endpoint as a Grafana data source; each metric can then be viewed in Grafana and sliced and diced further as needed.

## Metrics

The following metrics captured from rudder are exported on the `/metrics` endpoint in Prometheus format.

```text
# TYPE rudder_events_rudder_pipeline_success_duration gauge
rudder_events_rudder_pipeline_success_duration{operation="pipeline_success",table="rudder_metrics"} 0.004265
# TYPE rudder_events_rudder_pipeline_success_count counter
rudder_events_rudder_pipeline_success_count{operation="pipeline_success",table="rudder_metrics"} 4
# TYPE rudder_events_journal_fetch_items_duration gauge
rudder_events_journal_fetch_items_duration{operation="fetch_items",table="journal_metrics"} 1.2e-5
# TYPE rudder_events_journal_fetch_items_count counter
rudder_events_journal_fetch_items_count{operation="fetch_items",table="journal_metrics"} 1
# TYPE rudder_events_journal_fetch_last_duration gauge
rudder_events_journal_fetch_last_duration{operation="fetch_last",table="journal_metrics"} 3.6e-5
# TYPE rudder_events_journal_fetch_last_count counter
rudder_events_journal_fetch_last_count{operation="fetch_last",table="journal_metrics"} 1
# TYPE rudder_events_brp_proof_duration gauge
rudder_events_brp_proof_duration{operation="proof",table="brp_metrics"} 6.259999999999999e-4
# TYPE rudder_events_brp_proof_count counter
rudder_events_brp_proof_count{operation="proof",table="brp_metrics"} 4
# TYPE rudder_events_brp_upload_success_duration gauge
rudder_events_brp_upload_success_duration{operation="upload_success",table="brp_metrics"} 0.0023769999999999998
# TYPE rudder_events_brp_upload_success_count counter
rudder_events_brp_upload_success_count{operation="upload_success",table="brp_metrics"} 4
# TYPE rudder_events_bsp_execute_duration gauge
rudder_events_bsp_execute_duration{operation="execute",table="bsp_metrics"} 2.1799999999999999e-4
# TYPE rudder_events_bsp_execute_count counter
rudder_events_bsp_execute_count{operation="execute",table="bsp_metrics"} 4
# TYPE rudder_events_bsp_decode_duration gauge
rudder_events_bsp_decode_duration{operation="decode",table="bsp_metrics"} 0.0
# TYPE rudder_events_bsp_decode_count counter
rudder_events_bsp_decode_count{operation="decode",table="bsp_metrics"} 4
# TYPE rudder_events_ipfs_fetch_duration gauge
rudder_events_ipfs_fetch_duration{operation="fetch",table="ipfs_metrics"} 0.001588
# TYPE rudder_events_ipfs_fetch_count counter
rudder_events_ipfs_fetch_count{operation="fetch",table="ipfs_metrics"} 4
# TYPE rudder_events_ipfs_pin_duration gauge
rudder_events_ipfs_pin_duration{operation="pin",table="ipfs_metrics"} 0.00174
# TYPE rudder_events_ipfs_pin_count counter
rudder_events_ipfs_pin_count{operation="pin",table="ipfs_metrics"} 4
```
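Because the exposition text above is plain text (a `# TYPE` comment followed by `name{labels} value` lines), individual values can be pulled out with standard tools. A minimal sketch, assuming the exporter listens on `localhost:9568`, that label values contain no spaces and that samples carry no timestamp; `metric_value` is a hypothetical helper that ignores labels and returns the first matching sample.

```bash
# Hypothetical helper: read Prometheus exposition text on stdin and print the
# value of the first sample whose metric name equals $1.
# Assumes label values contain no spaces and samples have no timestamp.
metric_value() {
  awk -v name="$1" '
    /^#/ { next }                    # skip "# TYPE" / "# HELP" comment lines
    {
      series = $1
      sub(/[{].*/, "", series)       # drop the {label="..."} part
      if (series == name) { print $NF; exit }
    }'
}

# Example against a running rudder:
#   curl -s http://localhost:9568/metrics | metric_value rudder_events_brp_proof_count
```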

## API

View the exported gauges and counters served by rudder at the endpoint -> <http://localhost:9568/metrics>.

Create graphs using prometheus at the endpoint -> <http://localhost:9090/graph>.

View timeseries and add alerting with grafana at the endpoint -> <http://localhost:3000/explore>.

When rudder runs in Docker, its metrics endpoint is reachable at the same address, since port 9568 is exposed in the `Dockerfile` and forwarded in the compose files.

## Graph

Observe the gauge time series graphs live, for example by plotting the `pipeline_success` and `ipfs_fetch` metrics -> <http://localhost:9090/graph?g0.expr=rudder_events_rudder_pipeline_success_duration&g0.tab=0&g0.stacked=1&g0.show_exemplars=0&g0.range_input=15m&g0.step_input=1&g1.expr=rudder_events_ipfs_fetch_duration&g1.tab=0&g1.stacked=1&g1.show_exemplars=1&g1.range_input=15m&g1.step_input=1>

![Observe](./prometheus.png)
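The graph link above is an ordinary query string with one `gN.*` parameter group per panel. As a sketch, a similar link for any set of rudder gauges can be assembled as shown below. `prom_graph_url` and the `PROM_URL` override are hypothetical, the helper keeps only the `expr`, `tab`, `range_input` and `step_input` parameters from the link above, and metric names are used verbatim (plain names need no URL-encoding).

```bash
# Hypothetical helper: build a Prometheus /graph link with one panel
# (g0, g1, ...) per metric-name argument, each covering the last 15 minutes.
prom_graph_url() {
  local base="${PROM_URL:-http://localhost:9090}/graph"
  local query="" i=0 expr
  for expr in "$@"; do
    # ${query:+&} inserts a separator only after the first panel.
    query="${query}${query:+&}g${i}.expr=${expr}&g${i}.tab=0&g${i}.range_input=15m&g${i}.step_input=1"
    i=$((i + 1))
  done
  echo "${base}?${query}"
}
```

For example, `prom_graph_url rudder_events_rudder_pipeline_success_duration rudder_events_ipfs_fetch_duration` yields a two-panel link like the one above, without the `stacked` and `show_exemplars` options.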

## Monitor & Alert

For monitoring and alerting we advise using [Grafana (in conjunction with the aggregated prometheus metrics)](https://grafana.com/docs/grafana/latest/getting-started/get-started-grafana-prometheus/).

Install and start Grafana:

```bash
brew install grafana
brew services start grafana
```

Ensure Grafana (default port 3000) and Prometheus (default port 9090) have started.

```bash
$ brew services list
Name Status User File
grafana started user ~/Library/LaunchAgents/homebrew.mxcl.grafana.plist
prometheus started user ~/Library/LaunchAgents/homebrew.mxcl.prometheus.plist
```
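In scripts, it can help to wait until both services actually answer HTTP requests before opening the UI. A minimal sketch, assuming the default ports above and `curl` on the path; it polls Prometheus's `/-/ready` and Grafana's `/api/health` endpoints, and `wait_for_url` is a hypothetical helper name.

```bash
# Hypothetical helper: poll a URL until it returns a 2xx response.
# Usage: wait_for_url URL [attempts] [delay_seconds]
wait_for_url() {
  local url="$1" attempts="${2:-30}" delay="${3:-1}" n=1
  while [ "$n" -le "$attempts" ]; do
    # -f: treat HTTP errors (4xx/5xx) as failure; -s: no progress output.
    if curl -fs --max-time 2 -o /dev/null "$url"; then
      return 0
    fi
    # Sleep between attempts, but not after the last one.
    if [ "$n" -lt "$attempts" ]; then sleep "$delay"; fi
    n=$((n + 1))
  done
  echo "gave up waiting for $url" >&2
  return 1
}

# Example:
#   wait_for_url http://localhost:9090/-/ready && wait_for_url http://localhost:3000/api/health
```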

Log in to your Grafana dashboard -> <http://localhost:3000/>.
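The Prometheus data source can also be created from the command line through Grafana's HTTP API (`POST /api/datasources`). The sketch below only builds the JSON body; it assumes Prometheus at its default `http://localhost:9090`, a data source named `Prometheus`, and Grafana's default `admin:admin` credentials on a fresh install. `grafana_prometheus_datasource` is a hypothetical helper name.

```bash
# Hypothetical helper: print a Grafana data source definition for Prometheus.
grafana_prometheus_datasource() {
  local prom_url="${1:-http://localhost:9090}"
  printf '{"name":"Prometheus","type":"prometheus","access":"proxy","url":"%s","isDefault":true}\n' "$prom_url"
}

# Example (default credentials on a fresh install; change them in production):
#   grafana_prometheus_datasource | curl -s -u admin:admin \
#     -H 'Content-Type: application/json' \
#     -X POST --data @- http://localhost:3000/api/datasources
```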

Make sure Prometheus is added as a data source -> <http://localhost:3000/datasources> with its default values, then click [Explore](http://localhost:3000/explore?left=%7B%22datasource%22:%22lVZwdz8Vz%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22datasource%22:%7B%22type%22:%22prometheus%22,%22uid%22:%22lVZwdz8Vz%22%7D%7D%5D,%22range%22:%7B%22from%22:%22now-1h%22,%22to%22:%22now%22%7D%7D&orgId=1).

Select the metrics and time-series data to view from the dropdown with "Select Metric".
Below is an example of three selections `rudder_events_brp_upload_success_duration`, `rudder_events_rudder_pipeline_success_duration`, `rudder_events_ipfs_fetch_duration`.

This can be viewed directly [here](http://localhost:3000/explore?left=%7B%22datasource%22:%22lVZwdz8Vz%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22datasource%22:%7B%22type%22:%22prometheus%22,%22uid%22:%22lVZwdz8Vz%22%7D,%22editorMode%22:%22builder%22,%22expr%22:%22rudder_events_brp_upload_success_duration%22,%22legendFormat%22:%22__auto%22,%22range%22:true,%22instant%22:true%7D,%7B%22refId%22:%22B%22,%22datasource%22:%7B%22type%22:%22prometheus%22,%22uid%22:%22lVZwdz8Vz%22%7D,%22editorMode%22:%22builder%22,%22expr%22:%22rudder_events_rudder_pipeline_success_duration%22,%22legendFormat%22:%22__auto%22,%22range%22:true,%22instant%22:true%7D,%7B%22refId%22:%22C%22,%22datasource%22:%7B%22type%22:%22prometheus%22,%22uid%22:%22lVZwdz8Vz%22%7D,%22editorMode%22:%22builder%22,%22expr%22:%22rudder_events_ipfs_fetch_duration%22,%22legendFormat%22:%22__auto%22,%22range%22:true,%22instant%22:true%7D%5D,%22range%22:%7B%22from%22:%22now-15m%22,%22to%22:%22now%22%7D%7D&orgId=1). You can also apply operations to the exported data, such as aggregations like `sum` and range functions like `delta`, as shown below.

![grafana](./grafana.png)
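The `sum` and `delta` operations shown above can also be sent as PromQL to Prometheus's `/api/v1/query` endpoint. Note that `delta` is intended for gauges such as the `*_duration` series; for the `*_count` counters, `increase` (which accounts for counter resets) is the usual choice. A minimal sketch with a hypothetical `rudder_window_expr` helper that picks the function from the metric-name suffix, assuming the naming convention in the metrics list above:

```bash
# Hypothetical helper: wrap a rudder metric in a windowed aggregation,
# using increase() for counters (*_count) and delta() for gauges.
rudder_window_expr() {
  local metric="$1" window="${2:-5m}" fn
  case "$metric" in
    *_count) fn=increase ;;  # counters: reset-aware increase over the window
    *)       fn=delta ;;     # gauges (e.g. *_duration): change over the window
  esac
  echo "sum(${fn}(${metric}[${window}]))"
}

# Example query against a local Prometheus:
#   curl -s -G http://localhost:9090/api/v1/query \
#     --data-urlencode "query=$(rudder_window_expr rudder_events_rudder_pipeline_success_count 15m)"
```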
Binary file added docs/prometheus.png
2 changes: 1 addition & 1 deletion docs/rudder-compose.service
@@ -9,7 +9,7 @@ Group=blockchain
Environment=HOME=/home/blockchain/tmp
Environment="BLOCK_RESULT_OPERATOR_PRIVATE_KEY=ba3193ff8df497b5369f0d5c92fe443efd7936ab910084b8d4e1d510f05da1b2"
Environment="NODE_ETHEREUM_MAINNET=https://moonbeam.web3.com/alphanet/rpc"
Environment="IPFS_PINNER_URL=http://ipfs-pinner:3000"
Environment="IPFS_PINNER_URL=http://ipfs-pinner:3001"
Environment="EVM_SERVER_URL=http://evm-server:3002"
Environment="WEB3_JWT=iI91eXzhJUCyIJIINC6bsJciVkOIIpniGRc5.UiNjXLQJJYdYE2NfXdN0ziG5zWFeIW3EpjygiLOol1caizyZIz5WkMTiMMzT1MAZNM3ITmCQGcLNbeJOMQb43JOiwQ3TQMJz2zTMWECOnsIQhWdNoi6ZipYtc2XOMixAW3JQR4cwlTNkYgJDmhjiJQFYkQ3ciUbWOR.2Vx73BD3BWoD5FG6alOp7foK8krI7Akysr5lVbhP4Bu"
Type=simple
10 changes: 10 additions & 0 deletions lib/rudder/metrics/prometheus.yml
@@ -0,0 +1,10 @@
global:
scrape_interval: 15s

scrape_configs:
- job_name: "prometheus"
static_configs:
- targets: ["localhost:9090"]
- job_name: "telemetry_metrics_prometheus"
static_configs:
- targets: ["localhost:9568"]