P5: Require a gRFC for large scale changes to gRPC core #233

ctiller · 2021-04-29T21:00:10Z

No description provided.

chwarr

Noticed a few typos.

P5-core-large-scale-changes.md

Co-authored-by: Christopher Warrington <chwarr@microsoft.com>

ctiller · 2021-04-30T15:21:47Z

Noticed a few typos.

Thanks for the edits!

ctiller · 2021-05-06T15:53:41Z

@markdroth per our conversation, I've modified this to include a lighter-weight update process.

markdroth

Sorry for the delay in reviewing! Please let me know what you think.

P5-core-large-scale-changes.md

markdroth · 2021-05-13T18:32:08Z

P5-core-large-scale-changes.md

+
+Such changes include:
+* Changes to extensibility points, or the types used by extensibility points.
+* Changes that require downstream consumers of gRPC Core to modify their code.


Isn't this one already covered by P3? If not, can you give an example of what would be included in this?

I think I was thinking about internal extensibility... I'll clean this one up.

markdroth · 2021-05-13T18:32:25Z

P5-core-large-scale-changes.md

+* Changes that require downstream consumers of gRPC Core to modify their code.
+* Changes with a significant performance impact.
+* Changes to vocabulary types - types that are used broadly within the library.
+* Changes that constrain future development.


Can you give an example of this?

One that's happened before: L6 introduced the restriction that a C++11 compiler was required to compile Core, and L59 required that a standard library be dynamically linkable. Neither formally required a gRFC (no API changes), but we wrote one anyway because it felt right; this tries to formalize that.

New changes that might fall into that category: taking on a large dependency, refactoring in a way that prevents a known use case, or anything else that would require a large scale change to undo.

Changed text.

markdroth · 2021-05-13T20:19:17Z

P5-core-large-scale-changes.md

+A large scale change is one that has a wide-ranging impact on the implementation of gRPC Core.
+
+Such changes include:
+* Changes to extensibility points, or the types used by extensibility points.


I think this bullet in particular needs to be more nuanced. I do absolutely agree that a gRFC is appropriate for large structural changes to these kinds of APIs, but I think we also have a lot of changes to these APIs that are smaller in scope, where requiring a gRFC doesn't seem like a positive cost/benefit trade-off.

In particular, the resolver and LB policy APIs have been seeing a fair amount of small changes over time, some as part of the xDS work and others just as we evolve the client channel architecture, which we've been free to do so far because they are not yet public APIs. It's been very convenient to evolve these APIs gradually, and I think it would be a much heavier-weight process if every small change required a gRFC.

To make this more concrete, here are some examples of actual changes we've made to these APIs over the years, grouped based on whether or not I think the change should require a gRFC moving forward:

First, the changes that I think should require a gRFC:

Add ResultHandler to Resolver API grpc#17987 (restructured resolver API to add ResultHandler interface) and Restructure how addresses and service config are passed from resolver to LB policy grpc#18357 (restructure how addresses flow from resolver to LB policy)

LB policy picker API grpc#17770 (introduced the LB policy picker API)

Here are a few that I think are borderline; the scope or impact of the changes probably justifies a gRFC, but the fact that there don't happen to be any external LB policy implementations today means that the changes didn't affect anyone:

Pass address to CreateSubchannel() and expose attributes in SubchannelInterface. grpc#24172 (pass address to LB policy helper CreateSubchannel() separately instead of encoding in channel args)

Add MetadataInterface abstraction to LB policy API. grpc#19405 (add metadata abstraction to LB policy API) and Use a more standard iteration interface for LB policy metadata API. grpc#20443 (improve metadata iteration interface used by LB policies)

Use SubchannelInterface to hide implementation from LB policy API grpc#18917 (use an interface to hide real subchannel impl from LB policies) and Second attempt: Hide ConnectedSubchannel from LB policy API. grpc#19390 (hide ConnectedSubchannel from LB policy API)

Remove channelz from LB policy API. grpc#19066 (remove channelz from LB policy API)

More LB policy API improvements. grpc#19049 (various LB policy API improvements)

Add API for accessing per-call backend metric data in LB policies. grpc#19036 (add API for LB policies to access backend metric data)

Make service config ref-counted and hold refs to it in LB config. grpc#18081 (restructure how LB policies get their config)

And here are the ones that I think are small enough that they should not require a gRFC:

Implement google-c2p resolver. grpc#25215 (changed resolver base class to no longer take ownership of WorkSerializer or ResultHandler)

Make request path more easily visible to LB policies. grpc#23417 (make request path more efficient to access in LB policies)

Add experimental call attribute accessor method to LoadBalancingPolicy::CallState. grpc#23024 (add experimental call attributes API for LB policies)

Use std::function<> for recv_trailing_metadata callback in LB policy API. grpc#20441 (use std::function<> for LB policy callback to intercept trailing metadata)

LB policy API changes suggested by Sanjay. grpc#20090 (some cosmetic changes to the LB policy API suggested by Sanjay)

Use LRS to do client-side load reporting grpc#19394 (pass error to LB policy trailing metadata callback)

Second attempt: Simplify LB policy and resolver shutdown grpc#19389 (simplified shutdown of both resolvers and LB policies)

Remove CreateChannel() method from LB helper API. grpc#19038 (remove CreateChannel() from LB policy helper API)

Remove error from connectivity state tracking. grpc#18628 (remove error from connectivity state reporting) and then later Plumb absl::Status through connectivity state notifiers grpc#23480 (add error back in using absl::Status)

Store LB policy name in Config object. grpc#18251 (LB policy config includes policy name)

Don't use a separate call context for subchannel calls grpc#18094 (remove call context from LB policy API)

Change pick_first to not unref unselected subchannels. grpc#16342 (remove requirement that resolver RequestReresolutionLocked() must immediately return a new result)

Reset connection backoff grpc#16225 (add experimental API for resetting connection backoff)

Fall calls with wait_for_ready=false on transient resolver failure. grpc#14733 (fix resolver NextLocked() semantics to differentiate between fatal and transient failures)

(I think there are more in that last category, but I didn't take the time to go back further into the history.)

Given the above, I propose something like the following for these extension points:

If the change is a major structural change or fundamental API redesign, it requires a gRFC.

If the change affects almost all implementations of an API that is used externally, it requires a gRFC.

If the change adds, removes, or modifies a significant part of the API surface (i.e., more than just an individual parameter), it requires a gRFC.

Any smaller change not in the above categories does not require a gRFC.

WDYT?

Not opposed, but wanted to throw out an alternative:

This gRFC includes a lightweight update procedure for prior gRFCs that fall into its domain.
Suppose we retroactively document the interfaces we're worried about here and publish them as gRFCs; going forward, we would use the update process (with its timing tuned to something that makes sense).

markdroth · 2021-05-13T20:21:02Z

P5-core-large-scale-changes.md

+* Changes with a significant performance impact.
+* Changes to vocabulary types - types that are used broadly within the library.
+* Changes that constrain future development.
+* Changes that modify system architecture.


Same question here, an example would help.

Added some text.

markdroth · 2021-05-13T20:22:34Z

P5-core-large-scale-changes.md

+* Changes that constrain future development.
+* Changes that modify system architecture.
+
+Since implementation experience can affect how a large scale change may proceed, it's additionally proposed that LSC gRFC's may be updated by later PR's against the approved change.


It might be a good idea to split this part into a separate P-series gRFC. In practice, we've actually been just sending PRs to update existing gRFCs when we need to, without any real review period at all. If we want to formalize this, then this process should apply to more than just C-core changes.

Yeah agree.

I was trying to not re-trigger the review period :)

ctiller · 2021-06-15T18:14:51Z

What are next steps here?

ctiller · 2021-07-14T16:16:05Z

@a11r this has now been idle for 2 months - what are the next steps here?

ctiller and others added 2 commits April 29, 2021 09:52

P5: Require a gRFC for large scale changes to gRPC core

cd714d9

Add discussion list

a3a11f9

chwarr reviewed Apr 29, 2021

Two review threads on P5-core-large-scale-changes.md (outdated, resolved)

ctiller and others added 2 commits April 30, 2021 08:20

Review feedback - fix grammar

5b64c30

Co-authored-by: Christopher Warrington <chwarr@microsoft.com>

Review feedback - fix grammar

6f8333a

Co-authored-by: Christopher Warrington <chwarr@microsoft.com>

ctiller assigned a11r Apr 30, 2021

Include a lightweight update period for LSC's

89b6e59

ctiller requested a review from a11r May 6, 2021 18:16

markdroth reviewed May 13, 2021


ctiller added 2 commits May 13, 2021 14:49

Review feedback

3be0d70

Further review feedback

e2bc712

ctiller closed this Aug 17, 2021
