P5: Require a gRFC for large scale changes to gRPC core #233
Noticed a few typos.
Co-authored-by: Christopher Warrington <chwarr@microsoft.com>
Thanks for the edits!
@markdroth Per our conversation, I've modified this to include a lighter-weight update process.
Sorry for the delay in reviewing! Please let me know what you think.
P5-core-large-scale-changes.md
Such changes include:
* Changes to extensibility points, or the types used by extensibility points.
* Changes that require downstream consumers of gRPC Core to modify their code.
Isn't this one already covered by P3? If not, can you give an example of what would be included in this?
I think I was thinking about internal extensibility... I'll clean this one up.
P5-core-large-scale-changes.md
* Changes that require downstream consumers of gRPC Core to modify their code.
* Changes with a significant performance impact.
* Changes to vocabulary types - types that are used broadly within the library.
* Changes that constrain future development.
Can you give an example of this?
One that's happened before: L6 introduced the requirement that a C++11 compiler be used to compile Core, and L59 required that a standard library be dynamically linkable. Neither formally required a gRFC (there were no API changes), but we wrote one anyway because it felt right; this proposal tries to formalize that.
Future changes that might fall into this category: taking on a large dependency, refactoring in a way that prevents a known use case, or anything else that would require a large scale change to undo.
Changed text.
A large scale change is one that has a wide-ranging impact on the implementation of gRPC Core.

Such changes include:
* Changes to extensibility points, or the types used by extensibility points.
I think this bullet in particular needs to be more nuanced. I do absolutely agree that a gRFC is appropriate for large structural changes to these kinds of APIs, but I think we also have a lot of changes to these APIs that are smaller in scope, where requiring a gRFC doesn't seem like a positive cost/benefit trade-off.
In particular, the resolver and LB policy APIs have been seeing a fair amount of small changes over time, some as part of the xDS work and others just as we evolve the client channel architecture, which we've currently been free to do since they are not yet public APIs. It's been very convenient to make these kinds of changes happen slowly over time, and I think that would be a much heavier-weight approach if every small change required a gRFC.
To make this more concrete, here are some examples of actual changes we've made to these APIs over the years, grouped by whether or not I think the change should require a gRFC moving forward:
First, the changes that I think should require a gRFC:
- Add ResultHandler to Resolver API grpc#17987 (restructured resolver API to add `ResultHandler` interface) and Restructure how addresses and service config are passed from resolver to LB policy grpc#18357 (restructure how addresses flow from resolver to LB policy)
- LB policy picker API grpc#17770 (introduced the LB policy picker API)
Here are a few that I think are borderline; the scope or impact of the changes probably justifies a gRFC, but the fact that there don't happen to be any external LB policy implementations today means that the changes didn't affect anyone:
- Pass address to CreateSubchannel() and expose attributes in SubchannelInterface. grpc#24172 (pass address to LB policy helper `CreateSubchannel()` separately instead of encoding it in channel args)
- Add MetadataInterface abstraction to LB policy API. grpc#19405 (add metadata abstraction to LB policy API) and Use a more standard iteration interface for LB policy metadata API. grpc#20443 (improve metadata iteration interface used by LB policies)
- Use SubchannelInterface to hide implementation from LB policy API grpc#18917 (use an interface to hide real subchannel impl from LB policies) and Second attempt: Hide ConnectedSubchannel from LB policy API. grpc#19390 (hide `ConnectedSubchannel` from LB policy API)
- Remove channelz from LB policy API. grpc#19066 (remove channelz from LB policy API)
- More LB policy API improvements. grpc#19049 (various LB policy API improvements)
- Add API for accessing per-call backend metric data in LB policies. grpc#19036 (add API for LB policies to access backend metric data)
- Make service config ref-counted and hold refs to it in LB config. grpc#18081 (restructure how LB policies get their config)
And here are the ones that I think are small enough that they should not require a gRFC:
- Implement google-c2p resolver. grpc#25215 (changed resolver base class to no longer take ownership of `WorkSerializer` or `ResultHandler`)
- Make request path more easily visible to LB policies. grpc#23417 (make request path more efficient to access in LB policies)
- Add experimental call attribute accessor method to LoadBalancingPolicy::CallState. grpc#23024 (add experimental call attributes API for LB policies)
- Use std::function<> for recv_trailing_metadata callback in LB policy API. grpc#20441 (use `std::function<>` for LB policy callback to intercept trailing metadata)
- LB policy API changes suggested by Sanjay. grpc#20090 (some cosmetic changes to the LB policy API suggested by Sanjay)
- Use LRS to do client-side load reporting grpc#19394 (pass error to LB policy trailing metadata callback)
- Second attempt: Simplify LB policy and resolver shutdown grpc#19389 (simplified shutdown of both resolvers and LB policies)
- Remove CreateChannel() method from LB helper API. grpc#19038 (remove `CreateChannel()` from LB policy helper API)
- Remove error from connectivity state tracking. grpc#18628 (remove error from connectivity state reporting) and then later Plumb absl::Status through connectivity state notifiers grpc#23480 (add error back in using `absl::Status`)
- Store LB policy name in Config object. grpc#18251 (LB policy config includes policy name)
- Don't use a separate call context for subchannel calls grpc#18094 (remove call context from LB policy API)
- Change pick_first to not unref unselected subchannels. grpc#16342 (remove requirement that resolver `RequestReresolutionLocked()` must immediately return a new result)
- Reset connection backoff grpc#16225 (add experimental API for resetting connection backoff)
- Fall calls with wait_for_ready=false on transient resolver failure. grpc#14733 (fix resolver `NextLocked()` semantics to differentiate between fatal and transient failures)
(I think there are more in that last category, but I didn't take the time to go back further into the history.)
Given the above, I propose something like the following for these extension points:
- If the change is a major structural change or fundamental API redesign, it requires a gRFC.
- If the change affects almost all implementations of an API that is used externally, it requires a gRFC.
- If the change adds, removes, or modifies a significant part of the API surface (i.e., more than just an individual parameter), it requires a gRFC.
- Any smaller change not in the above categories does not require a gRFC.
WDYT?
Not opposed, but wanted to throw out an alternative:
This gRFC includes a lightweight update procedure for prior gRFCs that fall within its domain.
Suppose we retroactively document the interfaces we're worried about here and publish them as gRFCs, and then going forward use the update process (tuning the timing to something that makes sense).
P5-core-large-scale-changes.md
* Changes with a significant performance impact.
* Changes to vocabulary types - types that are used broadly within the library.
* Changes that constrain future development.
* Changes that modify system architecture.
Same question here, an example would help.
Added some text.
P5-core-large-scale-changes.md
* Changes that constrain future development.
* Changes that modify system architecture.

Since implementation experience can affect how a large scale change proceeds, it's additionally proposed that LSC gRFCs may be updated by later PRs against the approved change.
It might be a good idea to split this part into a separate P-series gRFC. In practice, we've actually been just sending PRs to update existing gRFCs when we need to, without any real review period at all. If we want to formalize this, then this process should apply to more than just C-core changes.
Yeah agree.
I was trying to not re-trigger the review period :)
What are the next steps here?
@a11r This has now been idle for 2 months; what are the next steps here?