Skip to content
This repository has been archived by the owner on May 26, 2022. It is now read-only.

hole punching support #194

Merged
merged 8 commits into from
Jul 5, 2021
Merged

hole punching support #194

merged 8 commits into from
Jul 5, 2021

Conversation

vyzo
Copy link
Contributor

@vyzo vyzo commented Feb 19, 2021

Needs testing, but feel free to review.

See libp2p/go-libp2p#1048

transport.go Outdated Show resolved Hide resolved
@aarshkshah1992
Copy link
Collaborator

aarshkshah1992 commented Feb 19, 2021

I don't understand the QUIC codebase enough to be able to give this a meaningful review.

Requesting @marten-seemann to please take a look.

transport.go Outdated Show resolved Hide resolved
transport.go Outdated Show resolved Hide resolved
transport.go Outdated Show resolved Hide resolved
transport.go Outdated Show resolved Hide resolved
transport.go Outdated Show resolved Hide resolved
transport.go Outdated
if bytes.Compare([]byte(t.localPeer), []byte(p)) < 0 {
err = t.holePunch(network, addr)
if err == nil {
err = errors.New("hole punching attempted; no active dial")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this an error?

Copy link
Contributor Author

@vyzo vyzo Feb 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this is the tricky part.

Ideally we would return the connection accepted by listen, but this is problematic:

  1. we need to hook into the listen and intercept that
  2. the connection would be returned to the swarm twice, which is very undesirable for obvious reasons.
  3. We also need to signal to the swarm that we are done trying to connect, and propagate this to the caller.

I don't think it is safe to return nil, nil here to the swarm, as it will probably crash if it doesn't get a connection and no error.

So I couldn't find anything better to do, than just return an error and let the caller (who actually supplied the context option after all) to check for the incoming connection.

Any ideas for improving this situation are welcome!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Granted, this should probably be a named error (if we decide to keep this logic) so that the caller knows to wait for the incoming connection and not have to guess.

Copy link
Contributor Author

@vyzo vyzo Feb 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, now that I think about it, if the incoming connection succeeds the Connect call will not return an error to the caller.
I still think we need to return an error to the swarm (so that it doesn't expect a connection) though.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you point me to where this ctx is configured and Dial is called? I tried grepping through the libp2p workspace, but the only occurrence for WithSimultaneousConnect is in go-libp2p-core, so I assume it's only in a PR at the moment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's brand new -- it's gonna come from this pr: libp2p/go-libp2p#1057
I don't think aarsh has added yet, but he will do very soon.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm ok with returning a named error here. Hooking into Listen and catching the connection there might be another (more complicated) solution, but I'm not sure if we're going to run into problems if both peers think that they're a client.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I am not sure either but I think that the swarm is not going to like it.
I'll make a named error for this, maybe it should live in core however?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added named error.

Copy link
Collaborator

@aarshkshah1992 aarshkshah1992 Feb 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vyzo

Actually, now that I think about it, if the incoming connection succeeds the Connect 
call will not return an error to the caller.

Is there some magic in Swarm that does NOT return an error for the Connect call even if the dial fails but an incoming succeeds ? Please can you point me to it ?

transport.go Outdated Show resolved Hide resolved
listener.go Outdated Show resolved Hide resolved
transport.go Outdated Show resolved Hide resolved
transport.go Show resolved Hide resolved
listener.go Outdated Show resolved Hide resolved
Copy link
Collaborator

@marten-seemann marten-seemann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code LGTM. Thank you @vyzo!

Once we have a test case this should be good to merge.

@vyzo
Copy link
Contributor Author

vyzo commented Feb 22, 2021

what's this stuff aarsh? I am going to drop these commits when we are to merge.

@aarshkshah1992
Copy link
Collaborator

@vyzo It's all debugging. We'll drop it all. DW

@vyzo vyzo force-pushed the feat/holepunch branch 3 times, most recently from 5bb6967 to 71724d9 Compare February 22, 2021 10:55
@aschmahmann aschmahmann marked this pull request as draft March 15, 2021 15:49
@marten-seemann marten-seemann self-assigned this Jun 30, 2021
@marten-seemann marten-seemann changed the title [WIP] hole punching support hole punching support Jun 30, 2021
listener.go Outdated
holePunch, ok := l.transport.holePunching[hpkey]
l.transport.holePunchingMx.Unlock()
if ok {
select {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a race condition here: If we receive 2 connections before the other go routine has deleted this entry from the holePunching map, we'll block here until the context is done.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. It's fixed now, right?
  2. Why not delete from the hole-punching map when we first access it? That should fix all the problems here, right?

Copy link
Member

@Stebalien Stebalien Jul 1, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, if we one-buffer the channel and atomically delete from the map, we should be able to do everything under the same lock:

        l.transport.holePunchingMx.Lock()
        holePunch, ok := l.transport.holePunching[key]
        if ok {
            holePunch.connCh <- conn
            delete(l.transport.holePunching, key)
        }
        l.transport.holePunchingMx.Unlock()

Probably not necessary, but may make things easier to reason about.

@marten-seemann marten-seemann marked this pull request as ready for review June 30, 2021 18:18
@marten-seemann
Copy link
Collaborator

@Stebalien Can I ask you for a review here? I think I fixed a race condition that could lead to a deadlock when we'd two connection from the same address, but I'd like to get a second set of eyes before merging this.

Copy link
Member

@Stebalien Stebalien left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mainly: make the handoff more atomic.

listener.go Outdated
holePunch, ok := l.transport.holePunching[hpkey]
l.transport.holePunchingMx.Unlock()
if ok {
select {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. It's fixed now, right?
  2. Why not delete from the hole-punching map when we first access it? That should fix all the problems here, right?

listener.go Outdated Show resolved Hide resolved
transport.go Outdated Show resolved Hide resolved
transport.go Outdated Show resolved Hide resolved
transport.go Outdated Show resolved Hide resolved
listener.go Outdated
holePunch, ok := l.transport.holePunching[hpkey]
l.transport.holePunchingMx.Unlock()
if ok {
select {
Copy link
Member

@Stebalien Stebalien Jul 1, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, if we one-buffer the channel and atomically delete from the map, we should be able to do everything under the same lock:

        l.transport.holePunchingMx.Lock()
        holePunch, ok := l.transport.holePunching[key]
        if ok {
            holePunch.connCh <- conn
            delete(l.transport.holePunching, key)
        }
        l.transport.holePunchingMx.Unlock()

Probably not necessary, but may make things easier to reason about.

transport.go Show resolved Hide resolved
transport.go Outdated Show resolved Hide resolved
@Stebalien
Copy link
Member

(also happy to walk over this in-person as well)

@marten-seemann
Copy link
Collaborator

@Stebalien

  • fixed the timer
  • fixed the race condition when accepting connections (this time for real, hopefully)

listener.go Outdated Show resolved Hide resolved
listener.go Outdated Show resolved Hide resolved
transport.go Outdated
@@ -245,3 +344,7 @@ func (t *transport) Protocols() []int {
func (t *transport) String() string {
return "QUIC"
}

func holePunchKey(addr net.Addr, id peer.ID) string {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we can also just use a custom type holePunchKey{addr string, id peer.ID}. Not critical, but that would be slightly easier to reason about.

@marten-seemann marten-seemann merged commit 38ad660 into master Jul 5, 2021
@aschmahmann aschmahmann mentioned this pull request Aug 23, 2021
62 tasks
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants