Enable Cubist Signer integration #3965

Open: wants to merge 65 commits into base: master

@geoff-vball geoff-vball commented May 15, 2025

Why this should be merged

This enables the rpc-signer configuration.

This is based on this old PR.

How this works

I moved the instantiation of the signer from the config package to node.New() where we instantiate everything else that needs filesystem/networking. This way we don't need to propagate cleanup all over the place.

I also opted for a Shutdown() method on bls.Signer, which is how the other resources attached to node are managed. Currently, resources are not shut down if we fail in another part of node.New(), but the rest of the resources are managed the same way.

Out of scope for this PR, but we may want to consider some sort of registry of cleanup/shutdown functions that can be unwound whenever we fail.
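
For illustration only, a minimal sketch of what such a registry could look like. Everything here (`cleanupStack`, `Push`, `Unwind`, and the `newNode` stand-in) is a hypothetical name invented for this example, not avalanchego code. It assumes each resource exposes a shutdown function that returns an error.

```go
package main

import (
	"errors"
	"fmt"
)

// cleanupStack is a hypothetical helper (not part of avalanchego). It records
// shutdown functions in the order their resources are created.
type cleanupStack struct {
	fns []func() error
}

// Push registers the shutdown function of a resource that was just created.
func (c *cleanupStack) Push(fn func() error) {
	c.fns = append(c.fns, fn)
}

// Unwind runs the registered functions in reverse creation order and joins
// any errors they return.
func (c *cleanupStack) Unwind() error {
	var errs []error
	for i := len(c.fns) - 1; i >= 0; i-- {
		errs = append(errs, c.fns[i]())
	}
	c.fns = nil
	return errors.Join(errs...)
}

// newNode is a stand-in for a constructor that can fail partway through.
// failAt names the resource whose initialization fails ("" means none), and
// order records which resources were shut down, in which order.
func newNode(failAt string, order *[]string) (err error) {
	var cleanup cleanupStack
	defer func() {
		// Only unwind when construction failed.
		if err != nil {
			err = errors.Join(err, cleanup.Unwind())
		}
	}()

	for _, name := range []string{"signer", "database", "api-server"} {
		if name == failAt {
			return fmt.Errorf("failed to initialize %s", name)
		}
		name := name // capture the per-iteration value
		cleanup.Push(func() error {
			*order = append(*order, name)
			return nil
		})
	}
	return nil
}

func main() {
	var order []string
	err := newNode("api-server", &order)
	fmt.Println(err, order)
}
```

In this sketch, a failure while initializing the third resource shuts down the database first and the signer second, mirroring the reverse order of creation.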

How this was tested

Unit tests

Need to be documented in RELEASES.md?

Yes?

richardpringle and others added 30 commits March 12, 2025 14:05
The code was changed without changing the behaviour. Instead of
`os.Stat`ing the file, we just try to read the file (which returned the
same error type).
Prior to this, we were checking if the file existed before attempting to
read it. The check was unnecessary and the logic has been removed.
@geoff-vball geoff-vball requested a review from joshua-kim June 17, 2025 20:25
Comment on lines 25 to 27
signer1, err := NewStakingSigner(config1.StakingSignerConfig)
require.NoError(err)
signer2, err := NewStakingSigner(config1.StakingSignerConfig)
Contributor

@joshua-kim joshua-kim Jul 2, 2025

This feels like we're testing implementation details, since this code is not a part of the public node api. I think we should just create the node through node.New, pass in the configuration, and do a check against the Signer exposed on it. If we do that I think these tests belong in node_test instead of in their own file, since we generally shouldn't test unexported code by calling it explicitly

Contributor Author

I've simplified this test to hopefully make things clearer. What we're testing is that when a signer is created with a new key, the key is saved and reused the next time a signer is instantiated.

Contributor

My concern isn't that the test is clear - it's that we're testing implementation details. ref #3965 (comment)

func TestGetStakingSigner(t *testing.T) {
testKey := "HLimS3vRibTMk9lZD4b+Z+GLuSBShvgbsu0WTLt2Kd4="
testKeyPath1 := filepath.Join(t.TempDir(), ".avalanchego/staking/signer.key")
testKeyPath2 := strings.Replace(testKeyPath1, "001", "002", 1) // Anticipate the new temp dir that will be created
Contributor

@joshua-kim joshua-kim Jul 2, 2025

Is this safe to do? I think t.TempDir does clean up the dirs (so you can expect to start at 001), but if you hit power loss, a SIGKILL, or any other unexpected failure, this string replacement is brittle and can lead to flaky unit tests. It's also not documented as a guarantee on the interface for TempDir. Is it possible for us to get rid of this string matching?

Contributor Author

I think t.TempDir creates an entirely new parent directory per test run, so it should be unaffected by power loss, panics, etc., but I did find a way to remove the string replacement.

Contributor

I think it is still affected, since if the dir fails to clean up, the hard-coded string "001" would not match because testing will create a new dir with a unique name (ref).

Contributor Author

In this case I'm fairly certain that testing will create an entirely new parent directory for the subsequent test run.

@geoff-vball geoff-vball requested a review from joshua-kim July 7, 2025 14:03
node/signer.go Outdated
Contributor

Could we move this into node.go? Typically in Go it's idiomatic to put things close to where they're used, so creating flat file structures where you have a dedicated file/test-file per abstraction isn't common. I think it can be okay in some circumstances, but for a non-exported struct I would lean away from it.

Comment on lines 25 to 28
signer1, err := newStakingSigner(config)
require.NoError(err)
signer2, err := newStakingSigner(config)
require.NoError(err)
Contributor

Since newStakingSigner is a non-exported helper, ideally we wouldn’t test it directly since it's an implementation detail of node, not a part of its api. Would it be possible to cover this logic by going through node.New instead? That would make the test more realistic and aligned with how the code is actually used by the caller.

Contributor Author

Testing the idempotence of creating a "default signer" by creating an entire node seems completely antithetical to the idea of unit testing. I can push down the localsigner creation logic into an exported helper in localsigner if that makes the separation cleaner.

Contributor

@joshua-kim joshua-kim Jul 10, 2025

Testing the idempotence of creating a "default signer" by creating an entire node seems completely antithetical to the idea of unit testing.

This is not exported as part of the package, so it's an implementation detail of the package api. I can be okay with testing un-exported code when:

  1. The codebase is legacy and it is not reasonable for us to refactor code to test a change
  2. There is an incident/sensitive timeframe
  3. The abstraction being tested is clearly encapsulated and is non-trivial logic and it is also not possible to test through a public api

In this case, although this is well-encapsulated code, it is still possible to test through the public api if we parse default viper flags. My understanding is that it is possible and simple enough to test this through node.New and in exchange our test is less brittle and gives us better coverage. The way this bug would occur would be if someone spun up a node twice - why not reflect that in the test?

I can push down the localsigner creation logic into an exported helper in localsigner if that makes the separation cleaner

This is an option... but this initialization logic is only needed for the node since it needs to write/parse the key it generates on startup so it feels leaky to expose that api to all consumers of localsigner who only need it for unit tests.

Contributor

Similar comment w.r.t Go file structure, but this should probably live in node_test as well.

Signed-off-by: Joshua Kim <20001595+joshua-kim@users.noreply.github.com>
@joshua-kim joshua-kim force-pushed the signers-config-wip branch from b449826 to 945b7a1 Compare July 8, 2025 14:40
@@ -300,12 +301,13 @@ func (s *server) AddAliasesWithReadLock(endpoint string, aliases ...string) erro

func (s *server) Shutdown() error {
ctx, cancel := context.WithTimeout(context.Background(), s.shutdownTimeout)
err := s.srv.Shutdown(ctx)
listenerErr := s.listener.Close()
Contributor Author

This wasn't getting closed during node.Shutdown(), which was causing an error when starting up a new node in the same test.
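
A small, hedged illustration of the failure mode described here, using only the standard library rather than the node's API server: while a listener is open its address cannot be bound again, and once it is closed the address becomes available. `rebindAfter` is a hypothetical helper, and port 0 asks the OS for a free port so the sketch does not depend on 9650 being unused.

```go
package main

import (
	"fmt"
	"net"
)

// rebindAfter opens a listener on an OS-assigned loopback port and then tries
// to listen on the same address again. When closeFirst is true, the first
// listener is closed before the second attempt.
func rebindAfter(closeFirst bool) error {
	first, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return err
	}
	// Safety net for the path where the listener is left open.
	defer first.Close()

	addr := first.Addr().String()
	if closeFirst {
		if err := first.Close(); err != nil {
			return err
		}
	}

	second, err := net.Listen("tcp", addr)
	if err != nil {
		return err
	}
	return second.Close()
}

func main() {
	fmt.Println("listener left open:", rebindAfter(false))
	fmt.Println("listener closed:   ", rebindAfter(true))
}
```

The first call is expected to report an address-in-use style error and the second to succeed, which matches the symptom of a second node failing to start in the same test.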

@geoff-vball geoff-vball requested a review from joshua-kim July 15, 2025 20:59
Signed-off-by: Joshua Kim <20001595+joshua-kim@users.noreply.github.com>
Signed-off-by: Joshua Kim <20001595+joshua-kim@users.noreply.github.com>
Signed-off-by: Joshua Kim <20001595+joshua-kim@users.noreply.github.com>
Comment on lines +21 to +23
v := setupViperFlags(t)
conf, err := config.GetNodeConfig(v)
require.NoError(err)
Contributor

I don't think we should be creating a default node in unit tests.

Specifically, there are a number of default behaviors that aren't suitable for a unit test, such as:

  • Creating the ~/.avalanchego dir
    • populating it with a database
    • staking file
    • etc.
  • opening servers at 9650 and 9651

When I run this test locally, it fails with `couldn't initialize API server: listen tcp 127.0.0.1:9650: bind: address already in use` because I have a node running.

Contributor Author

I really didn't want to 😅

Josh and I had conversations on this here and here. What would you suggest doing?

Comment on lines +1630 to +1631
func newStakingSigner(config any) (bls.Signer, error) {
switch cfg := config.(type) {
Contributor

Why do we have multiple config types? We don't do this anywhere else afaik...

Contributor Author

It's a downstream consequence of moving the instantiation of the signer from config to node.

config validates the config options and therefore knows what type of signer to instantiate and can pass the relevant information on. We could use config as just a translation layer between file and struct and move the validation logic to node, but that seemed like an even bigger change.

@@ -549,6 +551,95 @@ func TestGetSubnetConfigsFromFlags(t *testing.T) {
}
}

func TestGetStakingSigner(t *testing.T) {
testKey := "HLimS3vRibTMk9lZD4b+Z+GLuSBShvgbsu0WTLt2Kd4="
defaultSignerKeyTestDir := t.TempDir()
Contributor

Can we call this the dataDir?

geoff-vball and others added 4 commits July 16, 2025 14:47
Co-authored-by: Stephen Buttolph <stephen@avalabs.org>
Signed-off-by: Geoff Stuart <geoff.vball@gmail.com>
Co-authored-by: Stephen Buttolph <stephen@avalabs.org>
Signed-off-by: Geoff Stuart <geoff.vball@gmail.com>
Co-authored-by: Stephen Buttolph <stephen@avalabs.org>
Signed-off-by: Geoff Stuart <geoff.vball@gmail.com>
Projects
Status: In Progress 🏗️

6 participants