Skip to content

[tmpnet] Enable externally accessible URIs for kube-hosted nodes #4016

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 8 commits into
base: tmpnet-configmap
Choose a base branch
from

Conversation

maru-ava
Copy link
Contributor

@maru-ava maru-ava commented Jun 17, 2025

Why this should be merged

Previously, access from outside of a kube cluster was enabled by port-forwarding through the kube API. This approach turned out incompatible with load testing because it greatly limited the rate at which transactions could be sent.

The port-forwarding is replaced with nginx+ingress to ensure minimum overhead when running outside of a kube cluster.

How this works

  • deploy kind cluster with a static NodePort
  • deploy the nginx ingress operator with helm and configure to use the NodePort
  • update kube runtime to create a service and ingress for each node to configure external accessibility via nginx
  • replace GetLocalURI with simplified GetAccessibleURI
  • enabled configuration of the base accessible URI to enable use of a hosted kube cluster

How this was tested

CI

Need to be documented in RELEASES.md?

N/A

TODO

@maru-ava maru-ava self-assigned this Jun 17, 2025
@Copilot Copilot AI review requested due to automatic review settings June 17, 2025 07:34
@maru-ava maru-ava added testing This primarily focuses on testing tooling Build, test and development tooling labels Jun 17, 2025
@maru-ava maru-ava marked this pull request as draft June 17, 2025 07:35
@maru-ava maru-ava moved this to In Progress 🏗️ in avalanchego Jun 17, 2025
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR replaces port-forwarding with an externally accessible URI approach for kube-hosted nodes in order to improve load testing performance. Key changes include refactoring functions from GetLocalURI to GetAccessibleURI (and staking address equivalents), adding new logic to create Kubernetes Service and Ingress resources, and updating tests to use the new accessible URI methods.

Reviewed Changes

Copilot reviewed 21 out of 21 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
tests/fixture/tmpnet/utils.go Updated error handling in CheckNodeHealth and simplified Node URI retrieval
tests/fixture/tmpnet/start_kind_cluster.go Added functions to deploy and verify the nginx ingress controller via Helm
tests/fixture/tmpnet/process_runtime.go Replaced GetLocalURI with GetAccessibleURI and updated staking address method
tests/fixture/tmpnet/node.go Updated Node interface methods to use the accessible URI versions
tests/fixture/tmpnet/network.go Updated network URI retrieval to use GetAccessibleURI
tests/fixture/tmpnet/kube_runtime.go Refactored URI retrieval for nodes to support external access and added Service/Ingress
tests/fixture/tmpnet/flags/kube_runtime.go Introduced new flag for base accessible URI configuration
tests/e2e/* Updated tests to use GetAccessibleURI (and staking address) throughout
scripts/kind-with-registry.sh Added a new script to set up kind with a local registry
flake.nix Removed kind-with-registry derivation and updated PATH configuration

@maru-ava maru-ava moved this from In Progress 🏗️ to Ready 🚦 in avalanchego Jun 25, 2025
@maru-ava maru-ava marked this pull request as ready for review June 25, 2025 03:45
@maru-ava maru-ava force-pushed the tmpnet-nginx-ingress branch from 3814de2 to 30501bc Compare June 25, 2025 03:46
Copy link
Contributor

@RodrigoVillar RodrigoVillar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ran this locally with the load test passing.

I tried using this against the remote cluster with no success (this is expected). However, I believe you mentioned that this PR should refuse deployment to a context other than kind-kind. Do you think we should add the refuse logic to this PR?

@maru-ava
Copy link
Contributor Author

maru-ava commented Jun 26, 2025

Ran this locally with the load test passing.

I tried using this against the remote cluster with no success (this is expected). However, I believe you mentioned that this PR should refuse deployment to a context other than kind-kind. Do you think we should add the refuse logic to this PR?

I meant to say that the test should only target kind-kind, not that tmpnet should refuse to deploy to a context other than kind-kind.

Was the failure something like failed to wait for Ingress...? It may make sense to log a warning if the ingress status suggests that the reason is the lack of an ingress controller.

@RodrigoVillar
Copy link
Contributor

@maru-ava I'm getting the following error:

[06-26|09:29:52.291] ERROR avalanchego-load-test tests/simple_test_context.go:33
        Error Trace:    /Users/rodrigo.villar/go/src/github.com/ava-labs/avalanchego/tests/fixture/e2e/helpers.go:304
                                                /Users/rodrigo.villar/go/src/github.com/ava-labs/avalanchego/tests/fixture/e2e/env.go:185
                                                /Users/rodrigo.villar/go/src/github.com/ava-labs/avalanchego/tests/load/c/main/main.go:63
                                                /Users/rodrigo.villar/.goenv/versions/1.23.9/src/runtime/proc.go:272
                                                /Users/rodrigo.villar/.goenv/versions/1.23.9/src/runtime/asm_arm64.s:1223
        Error:          Received unexpected error:
                        failed to query node health: failed to issue request: Post "http://localhost:30791/networks/cb4ff442-bea0-401b-b5b5-5d7c38fc1e9c/NodeID-PWaVaDZqkNCfsYBtdCsXBSBkdas6Uj3Fu/ext/health": dial tcp 127.0.0.1:30791: connect: connection refused
        Messages:       failed to bootstrap network

Going back to the original question, logging a warning would be great here.

@maru-ava maru-ava force-pushed the tmpnet-nginx-ingress branch from 30501bc to c2453bb Compare June 26, 2025 14:28
@maru-ava
Copy link
Contributor Author

Rebased

@maru-ava maru-ava force-pushed the tmpnet-nginx-ingress branch 4 times, most recently from 0d49b43 to 407219d Compare June 27, 2025 01:47
@maru-ava
Copy link
Contributor Author

maru-ava commented Jun 27, 2025

Going back to the original question, logging a warning would be great here.

I added more checks, but when I was testing this, I realized that our test cluster actually has an ingress controller so the checks didn't actually help.

Instead, I've changed how the base accessible URI is configured so that it is no longer defaulted to the kind-kind value. Now if you try to deploy outside a kube cluster and the target cluster is not the kind cluster we deploy, it will complain that --base-accessible-uri must be provided and exit without deploying anything.

Edit: Updated to source the ingress configuration from a configmap in the cluster. That way it can be zero-config for the end-user.

@maru-ava maru-ava force-pushed the tmpnet-nginx-ingress branch 2 times, most recently from 30e1313 to 6a0c887 Compare July 1, 2025 03:30
@maru-ava maru-ava force-pushed the tmpnet-nginx-ingress branch from 8869e8a to fec55a8 Compare July 3, 2025 00:51
@maru-ava
Copy link
Contributor Author

maru-ava commented Jul 3, 2025

Dropped basicauth protection - the guid and node id should be sufficient.

@maru-ava maru-ava force-pushed the tmpnet-nginx-ingress branch 2 times, most recently from 2a0ca73 to 51ac8dc Compare July 8, 2025 04:43
@maru-ava maru-ava changed the base branch from master to tmpnet-configmap July 8, 2025 04:43
@maru-ava maru-ava marked this pull request as draft July 8, 2025 04:43
@maru-ava maru-ava force-pushed the tmpnet-nginx-ingress branch 2 times, most recently from a26aa01 to 0db8da1 Compare July 8, 2025 05:20
maru-ava added 8 commits July 8, 2025 05:35
Previously, access from outside of a kube cluster was enabled by
port-forwarding through the kube API. This approach turned out
incompatible with load testing because it greatly limited the rate at
which transactions could be sent.

The port-forwarding is replaced with nginx+ingress to ensure minimum
overhead when running outside of a kube cluster.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
testing This primarily focuses on testing tooling Build, test and development tooling
Projects
Status: Ready 🚦
Development

Successfully merging this pull request may close these issues.

2 participants