-
Notifications
You must be signed in to change notification settings - Fork 772
Add test to re-execute specified range of mainnet C-Chain blocks #4019
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Note: running locally correctly exports the metrics to Grafana, but when using Linus/Snoopy (or in general), must be careful to ensure that prometheus is running correctly. I ran into some issues where
Since |
- name: Download Previous Benchmark Result | ||
uses: actions/cache@v4 | ||
with: | ||
path: ./cache | ||
key: ${{ runner.os }}-reexecute-cchain-range-benchmark.json | ||
- name: Compare Benchmark Results | ||
uses: benchmark-action/github-action-benchmark@v1 | ||
with: | ||
tool: 'go' | ||
output-file-path: $GITHUB_WORKSPACE/reexecution-data/reexecute-cchain-range.txt | ||
external-data-json-path: ./cache/${{ runner.os }}-reexecute-cchain-range-benchmark.json |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Writing to the cache and comparing against the last entry enables a comparison.
This should be changed to write to the cache only on the master branch for comparison and other triggers should compare against the latest baseline set by master
Trying out running on ARC and Blacksmith here: |
start-block: | ||
description: 'The start block for the benchmark.' | ||
required: false | ||
default: '100' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than duplicating the defaults here and in the job, maybe use env vars e.g.
env:
START_BLOCK: '100'
...
on:
pull_request:
workflow_dispatch:
inputs:
start-block:
description: 'The start block for the benchmark.'
required: false
default: ${{ env.START_BLOCK }}
...
- name: Configure AWS Credentials | ||
uses: aws-actions/configure-aws-credentials@v4 | ||
with: | ||
role-to-assume: ${{ secrets.AWS_S3_READ_ONLY_ROLE }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this secret won't be available from fork branches, you'll want to make job execution conditional (as per the example of run-monitored-tmpnet-cmd so that the job doesn't fail on fork branch PRs.
- name: Set task env via GITHUB_ENV | ||
id: set-params | ||
run: | | ||
if [[ "${{ github.event_name }}" == "workflow_dispatch" ]]; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since github.event.inputs.*
will be undefined when the job is not triggered by workflow_dispatch
, a simpler way of setting each variable would be using ||
to supply a default:
echo "START_BLOCK=${{ github.event.inputs.start-block || env.START_BLOCK }}" >> $GITHUB_ENV`
Note that the echoed text needs to be appended to $GITHUB_ENV
to affect the environment - the current form would only result in log output.
artifact_prefix: c-chain-reexecute-range | ||
prometheus_username: ${{ secrets.PROMETHEUS_ID || '' }} | ||
prometheus_password: ${{ secrets.PROMETHEUS_PASSWORD || '' }} | ||
grafana_dashboard_id: 'c-chain-processing-custom-v8/c-chain-processing' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this the correct dashboard ID now that you've incorporated the target metrics into the main C-Chain dashboard?
uses: ./.github/actions/run-monitored-tmpnet-cmd | ||
with: | ||
run: ./scripts/run_task.sh reexecute-cchain-range-with-copied-data EXECUTION_DATA_DIR=${{ github.workspace }}/reexecution-data BENCHMARK_OUTPUT_FILE=${{ github.workspace }}/reexecute-cchain-range-benchmark-res.txt | ||
artifact_prefix: c-chain-reexecute-range |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be omitted since it is only relevant if inputs.runtime
is set to one of process
or kube
. If there is an artifact to collect, the job will need to call actions/upload-artifact
itself.
// This default delay is set above the default scrape interval used by StartPrometheus. | ||
time.Sleep(tmpnet.NetworkShutdownDelay) | ||
|
||
r.NoError(server.Stop()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this fails, it will prevent the removal of sdConfigFilePath. Ensure that both cleanup steps will always execute by registering them separately with tb.Cleanup
.
func getCounterMetricValue(tb testing.TB, registry prometheus.Gatherer, query string) (float64, error) { | ||
metricFamilies, err := registry.Gather() | ||
r := require.New(tb) | ||
r.NoError(err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In a given function, please avoid mixing testify assertions with a returned error. In this case, maybe just return the error given that the caller is already using testify to check it?
sdConfigFilePath, err = tmpnet.WritePrometheusSDConfig(name, tmpnet.SDConfig{ | ||
Targets: []string{server.Address()}, | ||
Labels: labels, | ||
}, true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please try to avoid the use of magic values like this one in favor of either a descriptively-named variable (e.g. withGithubLabels
or by adding a comment e.g. true /* withGithubLabels */
.
// "avalanche_snowman" and the chain label (ex. chain="C") that would be handled | ||
// by the[chain manager](../../../chains/manager.go). | ||
func newConsensusMetrics(registry prometheus.Registerer) (*consensusMetrics, error) { | ||
errs := wrappers.Errs{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this suggested for a single potential error?
func (e *vmExecutor) executeSequence(ctx context.Context, blkChan <-chan blockResult, executionTimeout time.Duration) error { | ||
blkID, err := e.vm.LastAccepted(ctx) | ||
if err != nil { | ||
return fmt.Errorf("failed to get last accepted block: %w", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(No action required) Maybe differentiate the error from the subsequent one e.g. block
-> block id
?
Why this should be merged
This PR adds a configurable C-Chain benchmark that can be run locally with Taskfile or via GitHub Action (this PR currently includes triggers for pull requests and manual GH Action, but this should be changed to a cronjob on master and manual GH Action before merge and once done testing).
The test clones a block database and current state database, so that it can execute a range of blocks starting from the state at an arbitrary height N.
The CI workflow copies the data from an S3 Bucket in the Ava Labs Experimental Account in us-east-2 and requires AWS credentials to be provided. It can also be used with local file system directory paths for use on Snoopy/Linus (dedicated machines with large SSDs used for Firewood testing).
How this works
Constructs an EVM instance mimicking the params passed in from AvalancheGo including passing in a metrics registry with the expected prefix and chain labels attached to it, so that it can be used directly with existing Grafana Dashboards used with tmpnet rather than needing to maintain an alternative without the prefixes/labels.
How this was tested
This has been tested by running in CI and running through the steps documented in the README. To review this PR, please walk through the steps in the benchmark README.
Need to be documented in RELEASES.md?
No.
Follow Ups to this PR
gh-pages
branch or a separate repo (demo of what this will look like on my fork of Canoto herePotential Future Work