AutoNAT requests time out #3986
Comments
I do wonder how long it takes for a dial to time out. Is this configurable? Does this depend on the OS or transport? It seems increasing the AutoNAT timeout to 5 minutes works. A TCP dial fails after about 125s in my setup. I assume that means the AutoNAT timeout currently has to be above 125s. In that case, AutoNAT will report back a
Looks like this started to arise for me when multiple AutoNAT servers are added. With only a single server that is constantly probed, the server will eventually get a |
Two things:
|
Two separate issues:
I am not fully sure of the details of the test, but there could be a few things at play:
|
Interesting, thank you for sharing! I think we can draw two conclusions from this:
|
Yesterday I briefly looked at the Linux settings for this, but I was unable to find exactly what to change (there are lots of settings that relate to SYN/ACK retries and timings). However, I think it's more appropriate to come up with a userland solution. E.g., if the socket |
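For reference, the roughly 125s dial failure reported above is close to what the Linux defaults would give for a connect attempt whose SYNs are never answered. The sketch below only does the arithmetic. It assumes the values documented in tcp(7), an initial retransmission timeout of 1 second and `tcp_syn_retries = 6`, and it ignores the kernel's upper bound on the retransmission timeout. The function name is hypothetical and not part of rust-libp2p.

```rust
use std::time::Duration;

/// Approximate time until an unanswered connect attempt gives up.
/// One initial SYN plus `syn_retries` retransmissions are sent, and the
/// wait after each SYN doubles (exponential backoff).
/// Assumption: no cap on the backoff, so only use small retry counts.
fn syn_connect_timeout(initial_rto: Duration, syn_retries: u32) -> Duration {
    // Waits are rto, 2 * rto, 4 * rto, ... for (syn_retries + 1) SYNs in total.
    (0..=syn_retries)
        .map(|i| initial_rto * 2u32.pow(i))
        .sum()
}
```

With a 1 second initial timeout and 6 retries this comes to 1 + 2 + 4 + 8 + 16 + 32 + 64 = 127 seconds, which would explain a dial failing after about two minutes if those defaults are in effect.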
Using reasonable timeouts within rust-libp2p, instead of requiring operators to change their TCP stack, sounds reasonable to me. You could even do so within the |
You mean within the autonat server implementation? |
Yes, the |
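To illustrate the userland approach discussed above, here is a minimal sketch of a dial that is bounded by its own timeout instead of the OS connect timeout. It uses the blocking `std::net::TcpStream::connect_timeout` from the standard library purely for illustration; rust-libp2p dials asynchronously, and `DialOutcome` and `dial_with_timeout` are hypothetical names, not the library's API.

```rust
use std::io;
use std::net::{SocketAddr, TcpStream};
use std::time::Duration;

/// Outcome of a bounded dial attempt (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
enum DialOutcome {
    Connected,
    TimedOut,
    Failed(io::ErrorKind),
}

/// Dial `addr`, giving up after `timeout` rather than waiting for the
/// TCP stack to exhaust its SYN retries.
fn dial_with_timeout(addr: SocketAddr, timeout: Duration) -> DialOutcome {
    match TcpStream::connect_timeout(&addr, timeout) {
        Ok(_stream) => DialOutcome::Connected,
        // The deadline passed before the handshake completed.
        Err(e) if e.kind() == io::ErrorKind::TimedOut => DialOutcome::TimedOut,
        // Any other failure, e.g. connection refused or a zero timeout.
        Err(e) => DialOutcome::Failed(e.kind()),
    }
}
```

If a server bounded its dial-back this way with a timeout shorter than the client's request timeout (15s in this report), it could answer with a dial error before the client gives up on the request.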
Summary
About 20 publicly reachable peers are added as AutoNAT servers and are probed from behind a NAT. After the timeout (set to 15s), AutoNAT gives an OutboundRequest(Timeout) error and probes another peer. This continues, but sometimes a Response(DialError) is reported, and that results in an AutoNAT status change.

It's unclear to me why the OutboundRequest(Timeout) occurs. It stems from the request_response protocol. I assume an AutoNAT server can't dial a NAT peer, and takes too long to answer the dial-back probe request.

Expected behaviour
Either the timeouts do not happen, or the timeouts cause the AutoNAT status to change to Private eventually.

Actual behaviour
Sometimes it takes more than a few minutes (up to 20) for a node to determine that it is Private. This only happens after a DialError, not after a Timeout.

Debug Output
Version
Would you like to work on fixing this bug?
Maybe.