Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: relay v2 discovery (go-libp2p v0.19.0) #8868

Merged
merged 14 commits into from
Apr 28, 2022
Merged

feat: relay v2 discovery (go-libp2p v0.19.0) #8868

merged 14 commits into from
Apr 28, 2022

Conversation

marten-seemann
Copy link
Member

@marten-seemann marten-seemann commented Apr 10, 2022

Release Notes: https://github.com/libp2p/go-libp2p/releases/tag/v0.19.0

go-libp2p v0.19.0 implements circuit v2 relay discovery: AutoRelays now reads peers from a channel (provided by autorelay.WithPeerSource), tests if those peers speak the relay v2 protocol and tries to obtain a reservation with these nodes.

This PR gets the closest peers from the DHT, and provides them to AutoRelay on a regular basis.

By default, AutoRelay does not use Circuit Relay v1 nodes any more. Circuit v1 support is only enabled when static relays are in use.


TODO

BREAKING CHANGES

See CHANGELOG.md

@marten-seemann marten-seemann force-pushed the update-libp2p-v019 branch 3 times, most recently from 622ea1d to e395f42 Compare April 19, 2022 12:34
@marten-seemann marten-seemann changed the title WIP update go-libp2p to v0.19.0 update go-libp2p to v0.19.0 Apr 19, 2022
@lidel lidel added this to the go-ipfs 0.14 milestone Apr 19, 2022
@lidel lidel mentioned this pull request Apr 19, 2022
4 tasks
@BigLep BigLep modified the milestones: go-ipfs 0.14, go-ipfs 0.13 Apr 19, 2022
@lidel lidel changed the title update go-libp2p to v0.19.0 feat: relay discovery and hole punching (go-libp2p v0.19.0) Apr 19, 2022
@lidel lidel changed the title feat: relay discovery and hole punching (go-libp2p v0.19.0) feat: AutoRelay V2 discovery (go-libp2p v0.19.0) Apr 19, 2022
@lidel lidel changed the title feat: AutoRelay V2 discovery (go-libp2p v0.19.0) feat: relay v2 discovery (go-libp2p v0.19.0) Apr 19, 2022
@lidel lidel force-pushed the update-libp2p-v019 branch 2 times, most recently from 187121e to 747c114 Compare April 23, 2022 00:36
@lidel
Copy link
Member

lidel commented Apr 23, 2022

Note to self:

test/sharness/t0061-daemon-opts.sh has a socat test with --disable-transport-encryption
where

/multistream/1.0.0
ls

is expected to produce

/plaintext/2.0.0

but with go-libp2p 0.19 it gets

na

because ls was removed in multiformats/go-multistream#76.
We need to refactor tests to use something else.

Copy link
Member

@lidel lidel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rebased, CI is green, if we want we could merge this for 0.13-rc1 as-is.

Caveat: test/sharness/t0182-circuit-relay.sh became flaky – could be a bug? but we could tackle it after RC1 if needed – see comment below.

@lidel lidel force-pushed the update-libp2p-v019 branch 3 times, most recently from e0ddc66 to f27dc6d Compare April 27, 2022 20:17
Copy link
Member

@lidel lidel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is ready for final review/merging.

Added Peering.Peers and basic exponential back off.
Should be good enough for go-ipfs 0.13-rc1 – see key comments below.

Comment on lines +202 to +208
// Feed peers more often right after the bootstrap, then backoff
bo := backoff.NewExponentialBackOff()
bo.InitialInterval = 15 * time.Second
bo.Multiplier = 3
bo.MaxInterval = 1 * time.Hour
bo.MaxElapsedTime = 0 // never stop
t := backoff.NewTicker(bo)
Copy link
Member

@lidel lidel Apr 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ℹ️ This will feed peers to AutoRelay every 15s, and then exponentially back off until it happens once an hour.
Should be okay solution until we have better mechanism in future go-libp2p 0.20 (where libp2p asks for peers, instead of being fed)

I used github.com/cenkalti/backoff/v4 because it was already an indirect dependency, but lmk if I should just write backoff by hand.

Comment on lines +217 to +223
// Always feed trusted IDs (Peering.Peers in the config)
for _, trustedPeer := range cfgPeering.Peers {
if len(trustedPeer.Addrs) == 0 {
continue
}
select {
case peerChan <- trustedPeer:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ℹ️ this allows people to reuse already trusted peers from Peering.Peers in addition to peers discovered via dht.WAN.GetClosestPeers

Comment on lines 58 to +66
test_expect_success 'wait until relay is ready to do work' '
sleep 1
while ! ipfsi 2 swarm connect /p2p/$PEERID_1/p2p-circuit/p2p/$PEERID_0; do
iptb stop &&
iptb_wait_stop &&
iptb start -wait -- --routing=none &&
iptb connect 0 1 &&
iptb connect 2 1 &&
sleep 5
done
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ℹ️ below is not a blocker, imo this PR can be merged and if this is a real problem we will figure it out with RC1 being used in the real world.

⚠️ Tests in go-ipfs/test/sharness/t0182-circuit-relay.sh became flaky.

ipfsi 2 swarm connect /p2p/$PEERID_1/p2p-circuit/p2p/$PEERID_0 sometimes fails with:

NO_RESERVATION (204)
Error: connect 12D3KooWLuY1Q13o6Q91NoJn4Qv7RSq5rYS6uC4YCzKCJ3S8GnwQ failure: failed to dial 12D3KooWLuY1Q13o6Q91NoJn4Qv7RSq5rYS6uC4YCzKCJ3S8GnwQ:
  * [/p2p/12D3KooWNfs8uFpQf6NnseZKLyS3c8EFppGtUGejHpViBeg8f7Vt/p2p-circuit] error opening relay circuit: NO_RESERVATION (204)`
  • it happens randomly, bumping it to 10 seconds does not help much, it still fails sometimes.
  • shutting down nodes and starting them again fixes the problem
    • ...but it feels like covering up some underlying racy bug: once NO_RESERVATION error is returned, the relay will never work, no matter how long we wait and retry again (only way to fix it is to reboot of the relay node / testbed).

Repro (runs tests until they error):

$ killall ipfs ; i=0; while ./t0182-circuit-relay.sh -v; do echo " -----> run no. $i <----"; sleep 1; ((i=i+1)) ; done

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @marten-seemann for visibility – lmk if I should fill a bug in go-libp2p, or how I can provide more details

Copy link
Member

@lidel lidel May 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Investigation continues in #8967

go.mod Outdated Show resolved Hide resolved
@lidel lidel marked this pull request as ready for review April 27, 2022 20:47
@lidel lidel requested a review from guseggert April 27, 2022 20:47
Comment on lines +193 to +198
defer func() {
if r := recover(); r != nil {
fmt.Println("Recovering from unexpected error in AutoRelayFeeder:", r)
debug.PrintStack()
}
}()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this for?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got paranoid because of panics around go-libp2p 0.19 (e.g. libp2p/go-libp2p#1467), and since this is bolt-on temporary feeder (that may get refactored multiple times) felt prudent to decrease blast radius.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems like overkill but okay :)

}
closestPeers, err := dht.WAN.GetClosestPeers(ctx, h.ID().String())
if err != nil {
// no-op: usually 'failed to find any peer in table' during startup
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we log this at debug level? Or check if it's kbucket.ErrLookupFailure ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Imo not worth it, because this entire func will be redone after go-libp2p 0.20 ships.

core/node/groups.go Outdated Show resolved Hide resolved
lidel and others added 8 commits April 28, 2022 15:50
Old tests were no longer working because go-libp2p 0.19 removed
the undocumented 'ls' pseudoprotocol.

This replaces these tests with handshake attempt (name is echoed back on
OK or 'na' is returned when protocol is not available) for tls and noise
variants + adds explicit test that safeguards us against enabling
plaintext by default by a mistake.
test is flaky, for now we just restart the testbed when we get
NO_RESERVATION error
switched to commit which landed in go-namesys master
It starts at feeding peers ever 15s, then backs off each time
until it is done once an hour

Should be acceptable until we have smarter mechanism in go-lib2p 0.20
This ensures we feed trusted Peering.Peers in addition to any peers
discovered over DHT.
@lidel
Copy link
Member

lidel commented Apr 28, 2022

Rebased to include updated resource manager 0.3.0 from #8901, waiting for CI.

@lidel lidel merged commit 232ccb4 into master Apr 28, 2022
@lidel lidel deleted the update-libp2p-v019 branch April 28, 2022 15:13
lidel added a commit that referenced this pull request May 12, 2022
Removes fixup introduced in #8868 (comment)
so we can dig into the underlying cause
schomatis pushed a commit that referenced this pull request Jun 1, 2022
Removes fixup introduced in #8868 (comment)
so we can dig into the underlying cause
lidel added a commit that referenced this pull request Sep 1, 2022
Removes fixup introduced in #8868 (comment)
so we can dig into the underlying cause
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Archived in project
Development

Successfully merging this pull request may close these issues.

4 participants