pickfirst: New pick first policy for dualstack #7498
base: master
Conversation
Codecov Report
Attention: Patch coverage is
Additional details and impacted files:
@@ Coverage Diff @@
##           master    #7498   +/-   ##
========================================
  Coverage   81.93%   81.93%
========================================
  Files         361      362     +1
  Lines       27816    28102   +286
========================================
+ Hits        22790    23026   +236
- Misses       3837     3869    +32
- Partials     1189     1207    +18
The branch was force-pushed several times during review:
- f6a52fc → 84194db
- d77dd20 → 586b091
- e44b7a2 → 31e8a10
- e0290ad → f50eecd
- f50eecd → 34da793
- f0796a7 → 1a0131b
- 1a0131b → 6a7720e
LGTM.
switch state.ConnectivityState {
case connectivity.TransientFailure:
	sd.lastErr = state.ConnectionError
	b.cc.UpdateState(balancer.State{
I thought Eric and Doug wanted more frequent picker updates since that surfaces the recent errors in RPC failures. But I'm OK with leaving this as is since you have gone through the pain of adding a test as well. We can change it later if we decide to go that route.
I feel like at this point, I'm happy with how the code looks now and I'm very unlikely to notice anything that might improve the code. Will let @dfawley review this at this point. @arjan-bal : Thanks for patiently handling all my comments.
*/

// Package pickfirstleaf contains the pick_first load balancing policy which
// will be the universal leaf policy after dualstack changes are implemented.
Should this be internal/balancer/pickfirst instead?
If not, it should probably be marked as experimental.
It will eventually replace the existing pickfirst when dualstack work is complete. Since the existing pickfirst is in /balancer, I did the same for this. Marked the package as experimental; please let me know if it should be moved instead.
func init() {
	if envconfig.NewPickFirstEnabled {
		internal.ShuffleAddressListForTesting = func(n int, swap func(i, j int)) { rand.Shuffle(n, swap) }
It seems this is only needed because you skip it in the default pick first's init? Maybe don't do that and delete this.
Changed. I was trying to keep the replacement of this variable in the same file so it's easier to see how it's being overridden.
		// Register as the default pick_first balancer.
		Name = "pick_first"
	}
	balancer.Register(pickfirstBuilder{})
This will register it even if it's not enabled.
Maybe this package should actually be behind a build flag instead of an environment variable?
As we change petiole policies one by one to delegate to pickfirst, we would need both the existing pickfirst and pickfirstleaf policies to be available at the same time. To know which name pickfirstleaf is registered under, petiole policies can read pickfirstleaf.Name. To run all the tests for the existing pickfirst against the new policy, we need to register it as the default pickfirst.
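The naming scheme described above can be sketched as follows. This is an illustration only: the function, the non-default name "pick_first_leaf", and the boolean flag are stand-ins for the discussion's envconfig.NewPickFirstEnabled and pickfirstleaf.Name, not the actual grpc-go source.

```go
package main

import "fmt"

// registeredName sketches which name the new leaf policy registers under.
// When the flag is set, it takes over the default "pick_first" name so the
// existing pick_first test suite exercises the new policy; otherwise it
// registers under a separate name that petiole policies can read.
func registeredName(newPickFirstEnabled bool) string {
	// Hypothetical non-default name, exposed as pickfirstleaf.Name.
	name := "pick_first_leaf"
	if newPickFirstEnabled {
		// Register as the default pick_first balancer.
		name = "pick_first"
	}
	return name
}

func main() {
	fmt.Println(registeredName(false)) // pick_first_leaf
	fmt.Println(registeredName(true))  // pick_first
}
```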
This doesn't sound right to me. We should be able to delegate to either the old PF or the new PF, and these two migrations (delegating to PF & adding the new PF policy) should be independent. No? The new PF policy requires the new health checking, but I think we could control both with the same flag?
The existing PF will not work correctly with OD since it creates subchannels with multiple addresses. OD ignores subchannels with multiple addresses.
	// The serializer is used to ensure synchronization of updates triggered
	// from the idle picker and the already serialized resolver,
	// subconn state updates.
Can you explain why a simple mutex is not sufficient? Is there a reentrancy problem you're trying to work around? Can you outline the steps how it would be triggered? IMO a mutex would remove some complexity, and I am not seeing where it would not work.
I believe without happy eyeballs, we shouldn't even need a mutex, since all calls into the LB policy (directly or into a state watcher) are expected to be done serially already.
Presently, the requestConnection method can be called from 3 places:
- UpdateClientConnState: when the balancer gets an updated endpoint list and we want to start a new connection attempt.
- updateSubConnState: when a subconn transitions to TF and the balancer needs to try the next subconn.
- idlePicker.Pick: when the channel is kicked out of idle, we need to start a new connection attempt if one isn't already running.

1 & 2 are serialized by the balancer wrapper, but 3 can happen in parallel with the other two. Using a mutex required reads and writes to all the variables accessed inside requestConnection to be guarded by it, which turned out to be most of the fields in the pickfirstBalancer. Once dualstack changes are implemented, there will be 2 other sources that call pickfirst concurrently:
- The happy eyeballs timer: when the timer expires, it will call requestConnection to attempt a connection using the next subconn.
- The health check listener: client-side health checking updates can come in parallel with calls from the channel.

Java and C-core make use of their implementations of a callback serializer to manage these, so I decided to do the same.
I think we could have a single mutex and lock it in every method that can be called from outside the LB policy to achieve synchronization; do you think that's a better way?
Java and c-core make use of their implementations of a callback serializer to manage these, so I decided to do the same.
Java and C require the callback serializer to do anything. That's not the way we manage synchronization in gRPC-Go, unless there is the potential for re-ordering of operations that would cause problems, or some kind of re-entrancy problem or something.
I think the mutex approach would be better, since it's much simpler to do the synchronous operations (no need to make a channel, schedule, block -- and reason about whether that's valid), and doesn't require spawning & managing a separate goroutine, or passing around closures.
Replaced the callback serializer with a mutex.
	}
	// The picker will not change since the balancer does not currently
	// report an error.
	if b.state != connectivity.TransientFailure {
Is this right? If the state is Connecting and we get a resolver error, don't we want to transition to an erroring picker?
This is the behaviour of the existing pickfirst:
grpc-go/balancer/pickfirst/pickfirst.go, lines 87 to 104 in 11c44fb:
func (b *pickfirstBalancer) ResolverError(err error) {
	if b.logger.V(2) {
		b.logger.Infof("Received error from the name resolver: %v", err)
	}
	if b.subConn == nil {
		b.state = connectivity.TransientFailure
	}
	if b.state != connectivity.TransientFailure {
		// The picker will not change since the balancer does not currently
		// report an error.
		return
	}
	b.cc.UpdateState(balancer.State{
		ConnectivityState: connectivity.TransientFailure,
		Picker:            &picker{err: fmt.Errorf("name resolver error: %v", err)},
	})
}
Only if the resolver produces an empty address list do we discard the working resolver state and transition to TF from UpdateClientConnState.
Is the old PF starting in Connecting, too? If so then that seems wrong. If we haven't gotten any name resolver update yet and it errors, then we should start failing RPCs.
This edge case is mentioned in the doc comment of resolverError():

// If the resolver returns an error before sending the first update,
// it is handled by the gracefulswitch balancer (which is always the top-level
// LB policy on any channel), so we don't need to handle that here.

There is also a unit test in balancer/pickfirstleaf/test/pickfirstleaf_test.go named TestPickFirstLeaf_InitialResolverError which verifies that the channel reports TF when the resolver produces an error before a valid configuration.
It's not proper to rely upon GSB for this behavior. PF will be a direct child of other LB policies, and it needs to behave appropriately.
Made the change to report TF if no valid resolver update has been received yet. Added a test case for the same. Removed the test case that depended on GSB.
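The resolver-error decision that was settled on above can be sketched as a small predicate. The function and state names here are illustrative stand-ins, not the actual grpc-go implementation: report TransientFailure immediately when no valid resolver update has ever arrived; otherwise only re-report when already in TF, since the picker does not change while the balancer is not reporting an error.

```go
package main

import "fmt"

type connState int

const (
	connecting connState = iota
	transientFailure
)

// shouldReportTF decides whether a resolver error should push the channel
// into TransientFailure with an erroring picker.
func shouldReportTF(hadValidUpdate bool, cur connState) bool {
	if !hadValidUpdate {
		// No addresses have ever been received: start failing RPCs.
		return true
	}
	// The picker will not change since the balancer does not currently
	// report an error, unless we are already in TF.
	return cur == transientFailure
}

func main() {
	fmt.Println(shouldReportTF(false, connecting))      // true
	fmt.Println(shouldReportTF(true, connecting))       // false
	fmt.Println(shouldReportTF(true, transientFailure)) // true
}
```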
		b.endFirstPass(scd.lastErr)
		return
	}
	b.requestConnection()
This is recursive?
requestConnection tries to find the next subconn to connect. It can recurse up to # of subconns levels deep if all the subconns are in TF, at which point it will end the first pass and return.
The same logic can be written as a for loop that breaks when b.addressList.increment() == false, but it adds one layer of indentation. Do you want me to remove the recursive call?
Maybe see how it looks the other way? Otherwise at least add a comment // Try the next address until we find a subchannel that isn't in transient failure. (or whatever).
Replaced the recursion with a for loop. Also updated the doc comment.
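The loop-based rewrite discussed above can be sketched like this. The helper, its arguments, and the boolean slice are simplified stand-ins for the real address list and subconn states, not the actual grpc-go code:

```go
package main

import "fmt"

// connectNext walks the address list starting at start and returns the
// index of the first subconn not in TransientFailure, or -1 if every
// remaining subconn is in TF (the first pass ends). inTF[i] == true means
// the subconn for address i is in TransientFailure.
func connectNext(inTF []bool, start int) int {
	// Try the next address until we find a subchannel that isn't in
	// transient failure.
	for i := start; i < len(inTF); i++ {
		if !inTF[i] {
			return i
		}
	}
	return -1 // end the first pass
}

func main() {
	fmt.Println(connectNext([]bool{true, true, false, true}, 0)) // 2
	fmt.Println(connectNext([]bool{true, true}, 0))              // -1
}
```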
The dualstack support project will add Happy Eyeballs and change pick first to be a universal leaf LB policy. The design also details some architectural differences between C-core and Java/Go in the subchannel and pick first policy: in Java and Go, pick first logic is implemented within the subchannel itself rather than inside the load balancer, unlike C-core.
We will take this opportunity to bring gRPC to a more uniform architecture across implementations and write a new pick first policy. This is important so that different implementations do not continue to diverge as more features are implemented.
This change will include creating a PickFirstLeafLoadBalancer which will contain the pick first logic, as well as redesigning some components such as backoffs and address updates. This will set us up nicely to implement Happy Eyeballs and use pick first as a universal leaf policy.
The new pick first policy is not used by default and can be enabled with an environment variable. A GitHub Actions workflow is added to run the tests with the env var set. Code coverage is also calculated by running with and without the env var set.
This implementation is based on the Java implementation.
RELEASE NOTES: