
pickfirst: New pick first policy for dualstack #7498

Open · wants to merge 53 commits into master from pickfirst_one_address_per_subconn
Conversation

@arjan-bal arjan-bal commented Aug 9, 2024

The dualstack support project will add Happy Eyeballs and change pick first to be a universal leaf LB policy. The design also notes some architectural differences between C-core and Java/Go in the subchannel and pick first policy.

In Java and Go, pick first logic is implemented within the subchannel itself rather than inside the load balancer, unlike C-core.

We will take this opportunity to bring gRPC to a more uniform architecture across implementations and write a new pick first policy. This is important so that different implementations do not continue to diverge as more features are implemented.

This change will include creating a PickFirstLeafLoadBalancer which will contain the pick first logic, as well as redesigning some components such as backoffs and address updates. This will set us up nicely to implement Happy Eyeballs and use pick first as a universal leaf policy.

The new pick first policy is not used by default and can be set using an environment variable. A GitHub actions workflow is created to run the tests with the env var set. Code coverage is also calculated by running with and without the env var set.

This implementation is based on the Java implementation.

RELEASE NOTES:

  • An experimental pick first LB policy is added which creates one subconn per address. This is disabled by default until the dualstack changes are completed.
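To illustrate the release note, here is a toy sketch of the structural difference (the names `oldPickFirst` and `newPickFirstLeaf` are invented for illustration): the existing policy hands one subconn the entire address list, while the new leaf policy creates one subconn per address so the LB policy itself can walk the list.

```go
package main

import "fmt"

// subConn is a stand-in for a gRPC subchannel and the addresses it owns.
type subConn struct{ addrs []string }

// oldPickFirst models the existing policy: a single subconn carrying every
// address; the transport iterates through them internally.
func oldPickFirst(addrs []string) []subConn {
	return []subConn{{addrs: addrs}}
}

// newPickFirstLeaf models the new policy: one subconn per address, which is
// what features like Happy Eyeballs and outlier detection need.
func newPickFirstLeaf(addrs []string) []subConn {
	scs := make([]subConn, 0, len(addrs))
	for _, a := range addrs {
		scs = append(scs, subConn{addrs: []string{a}})
	}
	return scs
}

func main() {
	addrs := []string{"10.0.0.1:443", "[2001:db8::1]:443"}
	fmt.Println(len(oldPickFirst(addrs)), len(newPickFirstLeaf(addrs)))
}
```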

@arjan-bal arjan-bal added this to the 1.66 Release milestone Aug 9, 2024
@arjan-bal arjan-bal added the Type: Feature New features or improvements in behavior label Aug 9, 2024
@arjan-bal arjan-bal requested a review from easwars August 9, 2024 13:28

codecov bot commented Aug 9, 2024

Codecov Report

Attention: Patch coverage is 87.10801% with 37 lines in your changes missing coverage. Please review.

Project coverage is 81.93%. Comparing base (11c44fb) to head (6daf9cc).
Report is 1 commit behind head on master.

Files with missing lines Patch % Lines
balancer/pickfirstleaf/pickfirstleaf.go 86.97% 25 Missing and 12 partials ⚠️
Additional details and impacted files
@@           Coverage Diff            @@
##           master    #7498    +/-   ##
========================================
  Coverage   81.93%   81.93%            
========================================
  Files         361      362     +1     
  Lines       27816    28102   +286     
========================================
+ Hits        22790    23026   +236     
- Misses       3837     3869    +32     
- Partials     1189     1207    +18     
Files with missing lines Coverage Δ
balancer/pickfirst/pickfirst.go 84.80% <100.00%> (+1.06%) ⬆️
clientconn.go 93.09% <ø> (ø)
internal/envconfig/envconfig.go 100.00% <ø> (ø)
balancer/pickfirstleaf/pickfirstleaf.go 86.97% <86.97%> (ø)

... and 24 files with indirect coverage changes

@arjan-bal arjan-bal force-pushed the pickfirst_one_address_per_subconn branch from f6a52fc to 84194db Compare August 9, 2024 13:38
@arjan-bal arjan-bal force-pushed the pickfirst_one_address_per_subconn branch from d77dd20 to 586b091 Compare August 9, 2024 16:43
@arjan-bal arjan-bal force-pushed the pickfirst_one_address_per_subconn branch from e44b7a2 to 31e8a10 Compare August 9, 2024 17:31
@arjan-bal arjan-bal force-pushed the pickfirst_one_address_per_subconn branch from e0290ad to f50eecd Compare September 6, 2024 10:42
@arjan-bal arjan-bal force-pushed the pickfirst_one_address_per_subconn branch from f50eecd to 34da793 Compare September 6, 2024 10:46
@arjan-bal arjan-bal force-pushed the pickfirst_one_address_per_subconn branch from f0796a7 to 1a0131b Compare September 9, 2024 18:50
@arjan-bal arjan-bal force-pushed the pickfirst_one_address_per_subconn branch from 1a0131b to 6a7720e Compare September 9, 2024 20:30

@easwars easwars left a comment


LGTM.

balancer/pickfirstleaf/pickfirstleaf.go (review thread resolved, outdated)
balancer/pickfirstleaf/pickfirstleaf_test.go (review thread resolved, outdated)
switch state.ConnectivityState {
case connectivity.TransientFailure:
	sd.lastErr = state.ConnectionError
	b.cc.UpdateState(balancer.State{
Contributor:

I thought Eric and Doug wanted more frequent picker updates since that surfaces the recent errors in RPC failures. But I'm OK with leaving this as is since you have gone through the pain of adding a test as well. We can change it later if we decide to go that route.

@easwars easwars removed their assignment Sep 9, 2024

easwars commented Sep 9, 2024

At this point, I'm happy with how the code looks and I'm very unlikely to notice anything more that might improve it. Will let @dfawley review from here.

@arjan-bal : Thanks for patiently handling all my comments.

@purnesh42H purnesh42H modified the milestones: 1.67 Release, 1.68 Release Sep 10, 2024
*/

// Package pickfirstleaf contains the pick_first load balancing policy which
// will be the universal leaf policy after dualstack changes are implemented.
Member:

Should this be internal/balancer/pickfirst instead?

If not, it should probably be marked as experimental.

Contributor Author:

It will eventually replace the existing pickfirst when the dualstack work is complete. Since the existing pickfirst is in /balancer, I did the same for this one. I've marked the package as experimental; please let me know if it should be moved instead.


func init() {
	if envconfig.NewPickFirstEnabled {
		internal.ShuffleAddressListForTesting = func(n int, swap func(i, j int)) { rand.Shuffle(n, swap) }
Member:

It seems this is only needed because you skip it in the default pick first's init? Maybe don't do that and delete this.

Contributor Author:

Changed. I was trying to keep the replacement of this variable in the same file so it's easier to see how it's being overridden.

		// Register as the default pick_first balancer.
		Name = "pick_first"
	}
	balancer.Register(pickfirstBuilder{})
Member:

This will register it even if it's not enabled.

Maybe this package should actually be behind a build flag instead of an environment variable?

Contributor Author:

As we change petiole policies one by one to delegate to pickfirst, we would need both the existing pickfirst and the pickfirstleaf policy to be available at the same time. To know which name pickfirstleaf is registered under, petiole policies can read pickfirstleaf.Name. To run all the tests for the existing pickfirst against the new policy, we need to register it as the default pickfirst.

Member:

This doesn't sound right to me. We should be able to delegate to either the old PF or the new PF, and these two migrations (delegating to PF & adding the new PF policy) should be independent. No? The new PF policy requires the new health checking, but I think we could control both with the same flag?

Contributor Author:

The existing PF will not work correctly with outlier detection (OD) since it creates subchannels with multiple addresses, and OD ignores subchannels with multiple addresses.

Comment on lines 143 to 145
// The serializer is used to ensure synchronization of updates triggered
// from the idle picker and the already serialized resolver,
// subconn state updates.
Member:

Can you explain why a simple mutex is not sufficient? Is there a reentrancy problem you're trying to work around? Can you outline the steps how it would be triggered? IMO a mutex would remove some complexity, and I am not seeing where it would not work.

I believe without happy eyeballs, we shouldn't even need a mutex, since all calls into the LB policy (directly or into a state watcher) are expected to be done serially already.

Contributor Author:

Presently, the requestConnection method can be called from 3 places:

  1. UpdateClientConnState: When the balancer gets an updated endpoint list and we want to start a new connection attempt.
  2. updateSubConnState: When a subconn transitions to TF and the balancer needs to try the next subconn.
  3. idlePicker.Pick: When the channel is kicked out of idle, we need to start a new connection attempt if one isn't already running.

1 & 2 are serialized by the balancer wrapper, but 3 can happen in parallel with the other two. Using a mutex would require guarding reads and writes of every variable accessed inside requestConnection, which turned out to be most of the fields in the pickfirstBalancer. Once the dualstack changes are implemented, there will be two other sources that call pickfirst concurrently:

  1. The happy eyeballs timer: when the timer expires, it will call requestConnection to attempt connection using the next subconn.
  2. The health check listener: Client side health checking updates can come in parallel with calls from the channel.

Java and C-core make use of their implementations of a callback serializer to manage these, so I decided to do the same.

I think we could have a single mutex and lock it in every method that can be called from outside the LB policy to achieve synchronization, do you think that's a better way?

Member:

Java and c-core make use of their implementations of a callback serializer to manage these, so I decided to do the same.

Java and C-core require the callback serializer for anything to work. That's not the way we manage synchronization in gRPC-Go, unless there is the potential for re-ordering of operations that would cause problems, or some kind of re-entrancy issue.

I think the mutex approach would be better, since it's much simpler to do the synchronous operations (no need to make a channel, schedule, block -- and reason about whether that's valid), and doesn't require spawning & managing a separate goroutine, or passing around closures.

Contributor Author:

Replaced the callback serializer with a mutex.
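A minimal, self-contained sketch of the adopted approach (all names here are stand-ins, not the real gRPC-Go types): every entry point that can be called from outside the LB policy, including the idle picker's path which is not serialized by the channel, takes a single mutex before touching balancer state.

```go
package main

import (
	"fmt"
	"sync"
)

// pickfirstBalancer is a toy model; mu guards all mutable state below it.
type pickfirstBalancer struct {
	mu    sync.Mutex
	state string
}

// UpdateClientConnState is serialized by the channel, but still takes the
// mutex because ExitIdle can run concurrently with it.
func (b *pickfirstBalancer) UpdateClientConnState() {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.requestConnection()
}

// ExitIdle models the idle picker kicking the channel out of idle; it can
// run in parallel with the serialized callbacks, hence the mutex.
func (b *pickfirstBalancer) ExitIdle() {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.requestConnection()
}

// requestConnection assumes b.mu is held by the caller.
func (b *pickfirstBalancer) requestConnection() {
	b.state = "CONNECTING"
}

func main() {
	b := &pickfirstBalancer{}
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() { defer wg.Done(); b.ExitIdle() }()
	}
	wg.Wait()
	fmt.Println(b.state)
}
```

Compared with a callback serializer, this keeps synchronous operations simple: no channel to create, no closure scheduling, and no extra goroutine to manage, which is the trade-off argued for above.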

}
// The picker will not change since the balancer does not currently
// report an error.
if b.state != connectivity.TransientFailure {
Member:

Is this right? If the state is Connecting and we get a resolver error, don't we want to transition to an erroring picker?

Contributor Author:

This is the behaviour of the existing pickfirst:

func (b *pickfirstBalancer) ResolverError(err error) {
	if b.logger.V(2) {
		b.logger.Infof("Received error from the name resolver: %v", err)
	}
	if b.subConn == nil {
		b.state = connectivity.TransientFailure
	}
	if b.state != connectivity.TransientFailure {
		// The picker will not change since the balancer does not currently
		// report an error.
		return
	}
	b.cc.UpdateState(balancer.State{
		ConnectivityState: connectivity.TransientFailure,
		Picker:            &picker{err: fmt.Errorf("name resolver error: %v", err)},
	})
}

Only if the resolver produces an empty address list do we discard the working resolver state and transition to TF from UpdateClientConnState.

Member:

Is the old PF starting in Connecting, too? If so then that seems wrong. If we haven't gotten any name resolver update yet and it errors, then we should start failing RPCs.

Contributor Author:

This edge case is mentioned in the doc comment of resolverError():

// If the resolver returns an error before sending the first update,
// it is handled by the gracefulswitch balancer (which is always the top-level
// LB policy on any channel), so we don't need to handle that here.

There is also a unit test in balancer/pickfirstleaf/test/pickfirstleaf_test.go named TestPickFirstLeaf_InitialResolverError which verifies that the channel reports TF when the resolver produces an error before a valid configuration.

Member:

It's not proper to rely upon GSB for this behavior. PF will be a direct child of other LB policies, and it needs to behave appropriately.

Contributor Author:

Made the change to report TF if no valid resolver update has been received yet. Added a test case for the same. Removed the test case that depended on GSB.

balancer/pickfirstleaf/pickfirstleaf.go (review thread resolved)
b.endFirstPass(scd.lastErr)
return
}
b.requestConnection()
Member:

This is recursive?

Contributor Author:

requestConnection tries to find the next subconn to connect to. It can recurse up to one level per subconn if all the subconns are in TF, at which point it ends the first pass and returns.

The same logic can be written as a for loop that breaks when b.addressList.increment() returns false, but that adds one level of indentation. Do you want me to remove the recursive call?

Member:

Maybe see how it looks the other way? Otherwise at least add a comment // Try the next address until we find a subchannel that isn't in transient failure. (or whatever).

Contributor Author:

Replaced the recursion with a for loop. Also updated the doc comment.
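The loop form can be sketched as follows, with assumed addressList semantics (increment advances the cursor and reports false past the end); the real implementation checks subconn connectivity state rather than the toy string check used here.

```go
package main

import "fmt"

// addressList is a toy cursor over resolved addresses.
type addressList struct {
	addrs []string
	idx   int
}

// increment advances the cursor and reports whether it is still in range.
func (al *addressList) increment() bool {
	al.idx++
	return al.idx < len(al.addrs)
}

func (al *addressList) current() string { return al.addrs[al.idx] }

// isTransientFailure is a stand-in for checking a subconn's state.
func isTransientFailure(addr string) bool { return addr != "good" }

// requestConnection walks the list until it finds an address whose subconn
// is not in TRANSIENT_FAILURE, instead of recursing once per address.
// It reports ok=false when the first pass ends with every address failed.
func requestConnection(al *addressList) (addr string, ok bool) {
	for {
		a := al.current()
		if !isTransientFailure(a) {
			return a, true
		}
		if !al.increment() {
			// End of the first pass: all subconns are in TF.
			return "", false
		}
	}
}

func main() {
	al := &addressList{addrs: []string{"bad1", "bad2", "good"}}
	addr, ok := requestConnection(al)
	fmt.Println(addr, ok)
}
```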

balancer/pickfirstleaf/pickfirstleaf.go (review thread resolved)
balancer/pickfirstleaf/pickfirstleaf.go (review thread resolved)
balancer/pickfirstleaf/pickfirstleaf.go (review thread resolved)
@zasweq zasweq assigned arjan-bal and unassigned dfawley Sep 17, 2024
@arjan-bal arjan-bal assigned dfawley and unassigned arjan-bal Sep 17, 2024
@arjan-bal arjan-bal assigned arjan-bal and unassigned dfawley Sep 18, 2024
@arjan-bal arjan-bal assigned dfawley and unassigned arjan-bal Sep 18, 2024
4 participants