pickfirst: New pick first policy for dualstack #7498
base: master
Conversation
Codecov Report
Attention: Patch coverage is
Additional details and impacted files:
@@ Coverage Diff @@
##           master    #7498   +/-   ##
========================================
  Coverage   81.93%   81.93%
========================================
  Files         361      362     +1
  Lines       27816    28102   +286
========================================
+ Hits        22790    23026   +236
- Misses       3837     3869    +32
- Partials     1189     1207    +18
The branch was force-pushed several times during review:
- f6a52fc → 84194db
- d77dd20 → 586b091
- e44b7a2 → 31e8a10
- e0290ad → f50eecd
- f50eecd → 34da793
- f0796a7 → 1a0131b
- 1a0131b → 6a7720e
LGTM.
switch state.ConnectivityState {
case connectivity.TransientFailure:
	sd.lastErr = state.ConnectionError
	b.cc.UpdateState(balancer.State{
I thought Eric and Doug wanted more frequent picker updates since that surfaces the recent errors in RPC failures. But I'm OK with leaving this as is since you have gone through the pain of adding a test as well. We can change it later if we decide to go that route.
I feel like at this point, I'm happy with how the code looks now and I'm very unlikely to notice anything that might improve the code. Will let @dfawley review this at this point. @arjan-bal : Thanks for patiently handling all my comments.
*/

// Package pickfirstleaf contains the pick_first load balancing policy which
// will be the universal leaf policy after dualstack changes are implemented.
Should this be internal/balancer/pickfirst instead?
If not, it should probably be marked as experimental.
It will eventually replace the existing pickfirst when dualstack work is complete. Since the existing pickfirst is in /balancer, I did the same for this. Marked the package as experimental; please let me know if it should be moved instead.
func init() {
	if envconfig.NewPickFirstEnabled {
		internal.ShuffleAddressListForTesting = func(n int, swap func(i, j int)) { rand.Shuffle(n, swap) }
It seems this is only needed because you skip it in the default pick first's init? Maybe don't do that and delete this.
Changed. I was trying to keep the replacement of this variable in the same file so it's easier to see how it's being overridden.
		// Register as the default pick_first balancer.
		Name = "pick_first"
	}
	balancer.Register(pickfirstBuilder{})
This will register it even if it's not enabled.
Maybe this package should actually be behind a build flag instead of an environment variable?
As we change petiole policies one by one to delegate to pickfirst, we would need both the existing pickfirst and pickfirstleaf policies to be available at the same time. To know which name pickfirstleaf is registered under, petiole policies can read pickfirstleaf.Name. To run all the tests for the existing pickfirst against the new policy, we need to register it as the default pickfirst.
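The naming scheme described above can be sketched as follows. This is an illustration only: the function, the non-default name "pick_first_leaf", and the boolean flag are stand-ins for the discussion's envconfig.NewPickFirstEnabled and pickfirstleaf.Name, not the actual grpc-go source.

```go
package main

import "fmt"

// registeredName sketches which name the new leaf policy registers under.
// When the flag is set, it takes over the default "pick_first" name so the
// existing pick_first test suite exercises the new policy; otherwise it
// registers under a separate name that petiole policies can read.
func registeredName(newPickFirstEnabled bool) string {
	// Hypothetical non-default name, exposed as pickfirstleaf.Name.
	name := "pick_first_leaf"
	if newPickFirstEnabled {
		// Register as the default pick_first balancer.
		name = "pick_first"
	}
	return name
}

func main() {
	fmt.Println(registeredName(false)) // pick_first_leaf
	fmt.Println(registeredName(true))  // pick_first
}
```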
This doesn't sound right to me. We should be able to delegate to either the old PF or the new PF, and these two migrations (delegating to PF & adding the new PF policy) should be independent. No? The new PF policy requires the new health checking, but I think we could control both with the same flag?
The existing PF will not work correctly with OD since it creates subchannels with multiple addresses. OD ignores subchannels with multiple addresses.
	// The serializer is used to ensure synchronization of updates triggered
	// from the idle picker and the already serialized resolver,
	// subconn state updates.
Can you explain why a simple mutex is not sufficient? Is there a reentrancy problem you're trying to work around? Can you outline the steps how it would be triggered? IMO a mutex would remove some complexity, and I am not seeing where it would not work.
I believe without happy eyeballs, we shouldn't even need a mutex, since all calls into the LB policy (directly or into a state watcher) are expected to be done serially already.
Presently, the requestConnection method can be called from 3 places:
- UpdateClientConnState: when the balancer gets an updated endpoint list and we want to start a new connection attempt.
- updateSubConnState: when a subconn transitions to TF and the balancer needs to try the next subconn.
- idlePicker.Pick: when the channel is kicked out of idle, we need to start a new connection attempt if one isn't already running.

1 & 2 are serialized by the balancer wrapper, but 3 can happen in parallel with the other two. Using a mutex required reads and writes to all the variables accessed inside requestConnection to be guarded by it, which turned out to be most of the fields in the pickfirstBalancer. Once dualstack changes are implemented, there will be 2 other sources that call pickfirst concurrently:
- The happy eyeballs timer: when the timer expires, it will call requestConnection to attempt a connection using the next subconn.
- The health check listener: client-side health checking updates can come in parallel with calls from the channel.

Java and C-core make use of their implementations of a callback serializer to manage these, so I decided to do the same.
I think we could have a single mutex and lock it in every method that can be called from outside the LB policy to achieve synchronization; do you think that's a better way?
Java and c-core make use of their implementations of a callback serializer to manage these, so I decided to do the same.
Java and C require the callback serializer to do anything. That's not the way we manage synchronization in gRPC-Go, unless there is the potential for re-ordering of operations that would cause problems, or some kind of re-entrancy problem or something.
I think the mutex approach would be better, since it's much simpler to do the synchronous operations (no need to make a channel, schedule, block -- and reason about whether that's valid), and doesn't require spawning & managing a separate goroutine, or passing around closures.
Replaced the callback serializer with a mutex.
	}
	// The picker will not change since the balancer does not currently
	// report an error.
	if b.state != connectivity.TransientFailure {
Is this right? If the state is Connecting and we get a resolver error, don't we want to transition to an erroring picker?
This is the behaviour of the existing pickfirst:
grpc-go/balancer/pickfirst/pickfirst.go, lines 87 to 104 in 11c44fb:
func (b *pickfirstBalancer) ResolverError(err error) {
	if b.logger.V(2) {
		b.logger.Infof("Received error from the name resolver: %v", err)
	}
	if b.subConn == nil {
		b.state = connectivity.TransientFailure
	}
	if b.state != connectivity.TransientFailure {
		// The picker will not change since the balancer does not currently
		// report an error.
		return
	}
	b.cc.UpdateState(balancer.State{
		ConnectivityState: connectivity.TransientFailure,
		Picker:            &picker{err: fmt.Errorf("name resolver error: %v", err)},
	})
}
Only if the resolver produces an empty address list do we discard the working resolver state and transition to TF from UpdateClientConnState.
Is the old PF starting in Connecting, too? If so then that seems wrong. If we haven't gotten any name resolver update yet and it errors, then we should start failing RPCs.
This edge case is mentioned in the doc comment of resolverError():

// If the resolver returns an error before sending the first update,
// it is handled by the gracefulswitch balancer (which is always the top-level
// LB policy on any channel), so we don't need to handle that here.

There is also a unit test in balancer/pickfirstleaf/test/pickfirstleaf_test.go named TestPickFirstLeaf_InitialResolverError which verifies that the channel reports TF when the resolver produces an error before a valid configuration.
It's not proper to rely upon GSB for this behavior. PF will be a direct child of other LB policies, and it needs to behave appropriately.
Made the change to report TF if no valid resolver update has been received yet. Added a test case for the same. Removed the test case that depended on GSB.
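The resolver-error decision that was settled on above can be sketched as a small predicate. The function and state names here are illustrative stand-ins, not the actual grpc-go implementation: report TransientFailure immediately when no valid resolver update has ever arrived; otherwise only re-report when already in TF, since the picker does not change while the balancer is not reporting an error.

```go
package main

import "fmt"

type connState int

const (
	connecting connState = iota
	transientFailure
)

// shouldReportTF decides whether a resolver error should push the channel
// into TransientFailure with an erroring picker.
func shouldReportTF(hadValidUpdate bool, cur connState) bool {
	if !hadValidUpdate {
		// No addresses have ever been received: start failing RPCs.
		return true
	}
	// The picker will not change since the balancer does not currently
	// report an error, unless we are already in TF.
	return cur == transientFailure
}

func main() {
	fmt.Println(shouldReportTF(false, connecting))      // true
	fmt.Println(shouldReportTF(true, connecting))       // false
	fmt.Println(shouldReportTF(true, transientFailure)) // true
}
```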
		b.endFirstPass(scd.lastErr)
		return
	}
	b.requestConnection()
This is recursive?
requestConnection tries to find the next subconn to connect. It can recurse up to # of subconns levels deep if all the subconns are in TF, at which point it will end the first pass and return.
The same logic can be written as a for loop that breaks when b.addressList.increment() == false, but it adds one layer of indentation. Do you want me to remove the recursive call?
Maybe see how it looks the other way? Otherwise at least add a comment // Try the next address until we find a subchannel that isn't in transient failure. (or whatever).
Replaced the recursion with a for loop. Also updated the doc comment.
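The loop-based rewrite discussed above can be sketched like this. The helper, its arguments, and the boolean slice are simplified stand-ins for the real address list and subconn states, not the actual grpc-go code:

```go
package main

import "fmt"

// connectNext walks the address list starting at start and returns the
// index of the first subconn not in TransientFailure, or -1 if every
// remaining subconn is in TF (the first pass ends). inTF[i] == true means
// the subconn for address i is in TransientFailure.
func connectNext(inTF []bool, start int) int {
	// Try the next address until we find a subchannel that isn't in
	// transient failure.
	for i := start; i < len(inTF); i++ {
		if !inTF[i] {
			return i
		}
	}
	return -1 // end the first pass
}

func main() {
	fmt.Println(connectNext([]bool{true, true, false, true}, 0)) // 2
	fmt.Println(connectNext([]bool{true, true}, 0))              // -1
}
```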
The dualstack support project will add Happy Eyeballs and change pick first to be a universal leaf LB policy. The design also details some architectural differences between C-core and Java/Go in the subchannel and pick first policy: in Java and Go, pick first logic is implemented within the subchannel itself rather than inside the load balancer, unlike C-core.
We will take this opportunity to bring gRPC to a more uniform architecture across implementations and write a new pick first policy. This is important so that different implementations do not continue to diverge as more features are implemented.
This change will include creating a PickFirstLeafLoadBalancer which will contain the pick first logic, as well as redesigning some components such as backoffs and address updates. This will set us up nicely to implement Happy Eyeballs and use pick first as a universal leaf policy.
The new pick first policy is not used by default and can be enabled with an environment variable. A GitHub Actions workflow is added to run the tests with the env var set. Code coverage is also calculated by running with and without the env var set.
This implementation is based on the Java implementation.
RELEASE NOTES: