*: configure server keepalive, optimize client balancer with health check #8477
Conversation
Force-pushed from bd621e1 to c55d608
Is this with posting ep1 to …
That happens when we only post ep2 (ep1 is gray-listed), which is step 17.
Force-pushed from 03b521f to 3fff472
Force-pushed from 9a51d46 to 7a27a45
Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>
The balancer will have to be a separate patch before keepalive can be merged in; I'm trying to refactor this into something that can cleanly work with partition failover as well. simpleBalancer is getting too complicated to reason about.
if len(addrs) == 0 { // no better alternative found
    addrs = b.addrs
} else { // sort so that the latest failed addresses are at the end
    addrConns := make([]addrConn, 0, len(b.failed))
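For reference, a self-contained sketch of the reordering idea under discussion, using hypothetical names (reorderEndpoints, the failed set) that are not taken from this PR: keep the full endpoint list, but move recently failed endpoints to the end before handing the list to gRPC.

package main

import "fmt"

// reorderEndpoints keeps healthy endpoints in their original order and
// appends recently failed endpoints at the end.
func reorderEndpoints(all []string, failed map[string]bool) []string {
    healthy := make([]string, 0, len(all))
    bad := make([]string, 0, len(all))
    for _, ep := range all {
        if failed[ep] {
            bad = append(bad, ep)
            continue
        }
        healthy = append(healthy, ep)
    }
    return append(healthy, bad...)
}

func main() {
    eps := []string{"ep1:2379", "ep2:2379", "ep3:2379"}
    fmt.Println(reorderEndpoints(eps, map[string]bool{"ep2:2379": true}))
    // prints: [ep1:2379 ep3:2379 ep2:2379]
}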
Does gRPC make any guarantees about the ordering? My understanding is that it can try all the connections at once.
Right, there's no ordering and gRPC tries them all at once. Still, the goroutines start in order.
Anyway, this wasn't necessary.
@heyitsanthony Agreed. This is getting too complicated. I will separate out the server options first.
There is something called pickFirst in grpc-go:

// pickFirst is used to test multi-addresses in one addrConn in which all addresses share the same addrConn.
// It is a wrapper around roundRobin balancer. The logic of all methods works fine because balancer.Get()
// returns the only address Up by resetTransport().
type pickFirst struct {
    *roundRobin
}

func pickFirstBalancer(r naming.Resolver) Balancer {
    return &pickFirst{&roundRobin{r: r}}
}

https://github.com/grpc/grpc-go/blob/master/balancer.go#L399-L408
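The wrapping works purely through Go struct embedding: pickFirst declares no methods of its own, so every Balancer method resolves to the embedded roundRobin. A minimal standalone sketch of that pattern, using hypothetical types (innerBalancer, wrapper) rather than the grpc-go ones:

package main

import "fmt"

// innerBalancer stands in for roundRobin; wrapper stands in for pickFirst.
type innerBalancer struct{ addrs []string }

// Get returns the single established address in a pick-first setup.
func (b *innerBalancer) Get() string { return b.addrs[0] }

// wrapper embeds *innerBalancer, inheriting its whole method set and
// overriding nothing.
type wrapper struct {
    *innerBalancer
}

func main() {
    w := &wrapper{&innerBalancer{addrs: []string{"ep1:2379"}}}
    fmt.Println(w.Get()) // delegates to the embedded innerBalancer
}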
@fanminshi
Update: closed in favor of #8545.
A keepalive timeout puts the connection in 'connectivity.TransientFailure' in gRPC; gRPC keeps retrying (calling 'Balancer.Up') until it succeeds. This is problematic in a multi-endpoint balancer when one endpoint is blackholed: the balancer can get stuck retrying the blackholed endpoint and take several seconds to find a healthy one.
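For context, a minimal sketch of the client-side gRPC keepalive settings involved, using the standard google.golang.org/grpc/keepalive package; the endpoint address and durations below are placeholders, not values from this PR. When a ping is not acknowledged within Timeout, the connection is torn down and the address enters connectivity.TransientFailure, which is what the balancer then keeps retrying.

package main

import (
    "log"
    "time"

    "google.golang.org/grpc"
    "google.golang.org/grpc/keepalive"
)

func main() {
    conn, err := grpc.Dial(
        "127.0.0.1:2379", // placeholder endpoint
        grpc.WithInsecure(),
        grpc.WithKeepaliveParams(keepalive.ClientParameters{
            Time:                10 * time.Second, // ping when the connection has been idle this long
            Timeout:             3 * time.Second,  // close the connection if the ping is not acked in time
            PermitWithoutStream: true,             // keep pinging even with no active RPCs
        }),
    )
    if err != nil {
        log.Fatal(err)
    }
    defer conn.Close()
}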
Also, gray-listing endpoints (#8463) doesn't work in the following case:
At step 17 above, we should instead notify both ep1 and ep2.
If we exclude the gray-listed ep1, the balancer gets stuck with the stopped endpoint ep2.
The problem is:
"transport: Error while dialing dial unix ep2: connect: no such file or directory"
This PR adds an additional health-check API call to discover endpoint status on endpoint notify.
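The exact health-check call is not shown in this excerpt, so the following is only a sketch of the general approach, assuming the standard gRPC health-checking service (grpc_health_v1) and a hypothetical endpointHealthy helper: probe an endpoint before notifying it to the balancer, and skip or demote endpoints that do not report SERVING.

package main

import (
    "context"
    "fmt"
    "time"

    "google.golang.org/grpc"
    healthpb "google.golang.org/grpc/health/grpc_health_v1"
)

// endpointHealthy dials the endpoint and asks the standard health service
// whether the server is SERVING. Address and timeout are illustrative.
func endpointHealthy(ep string) bool {
    ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
    defer cancel()

    conn, err := grpc.DialContext(ctx, ep, grpc.WithInsecure(), grpc.WithBlock())
    if err != nil {
        return false
    }
    defer conn.Close()

    resp, err := healthpb.NewHealthClient(conn).Check(ctx, &healthpb.HealthCheckRequest{})
    return err == nil && resp.Status == healthpb.HealthCheckResponse_SERVING
}

func main() {
    fmt.Println(endpointHealthy("127.0.0.1:2379")) // placeholder endpoint
}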