feat: enforce protocols to always be valid utf8 strings #3745

Closed
wants to merge 23 commits into from

thomaseizinger
Contributor

@thomaseizinger thomaseizinger commented Apr 6, 2023

Description

Previously, a protocol could be any sequence of bytes as long as it started with /. Now, we directly parse a protocol as a String, which enforces that it is valid UTF-8. This has a significant API impact. Instead of any type being able to represent a protocol, we now use a Protocol newtype that can only be constructed from a String or a &'static str.
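For illustration, here is a minimal sketch of what such a newtype could look like. It is not the code from this PR: the `Cow<'static, str>` field and the `from_static` name appear later in this thread, while `InvalidProtocol`, `try_from_owned`, `try_from_bytes` and `as_str` are hypothetical names chosen for the example.

```rust
use std::borrow::Cow;

/// Sketch of a protocol newtype that is always valid UTF-8 and starts with `/`.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct Protocol(Cow<'static, str>);

/// Hypothetical error type: the input was not UTF-8 or lacked the leading `/`.
#[derive(Debug, PartialEq, Eq)]
pub struct InvalidProtocol;

impl Protocol {
    /// Wraps a static string without allocating.
    /// Panicking on a missing `/` is an assumption of this sketch.
    pub fn from_static(s: &'static str) -> Self {
        assert!(s.starts_with('/'), "protocol must start with '/'");
        Protocol(Cow::Borrowed(s))
    }

    /// Takes ownership of a dynamically built string.
    pub fn try_from_owned(s: String) -> Result<Self, InvalidProtocol> {
        if s.starts_with('/') {
            Ok(Protocol(Cow::Owned(s)))
        } else {
            Err(InvalidProtocol)
        }
    }

    /// Parses bytes received from the wire; invalid UTF-8 is rejected here.
    pub fn try_from_bytes(bytes: Vec<u8>) -> Result<Self, InvalidProtocol> {
        let s = String::from_utf8(bytes).map_err(|_| InvalidProtocol)?;
        Self::try_from_owned(s)
    }

    pub fn as_str(&self) -> &str {
        &self.0
    }
}
```

With a type like this, a byte sequence such as `[b'/', 0xff]` can no longer become a protocol, whereas the previous byte-based representation would have accepted it.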

Technically, this would not be necessary for protocols to always be strings. We could also change only the ProtocolName trait to return &str instead of &[u8]. However, both of these are breaking changes, and I am of the opinion that there are currently too many abstraction layers with regard to stream upgrades. This change removes some of these layers by introducing the UpgradeProtocols trait, which replaces the UpgradeInfo trait.

We will eventually lose a bit of type safety with this approach. In particular, once we also ship #2863, types like DeniedUpgrade won't work anymore because each protocol is represented as the same type.

Notes & open questions

Is the change in abstraction layers good? Should we go this far or retain the type-safety of upgrades we get from protocols being able to be any type and just having to satisfy a trait? I am going to prototype that too and see what it is like.

This is a draft because currently, only the multistream-select tests compile and work. I've only touched the code necessary for that to work.

Dependencies

In order to make the implementation of this PR simpler, I've extracted several others. They are all deprecations, meaning we can release them first, then delete the code and finally merge this PR (once it is ready in itself).

Change checklist

  • I have performed a self-review of my own code
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • A changelog entry has been made in the appropriate crates

@thomaseizinger
Contributor Author

@mxinden please review despite broken build, multistream-select by itself passes.

Member

@mxinden mxinden left a comment

Preliminary review. Still need more time.

@@ -158,7 +196,7 @@ pub trait InboundUpgrade<C>: UpgradeInfo {
/// method is called to start the handshake.
///
/// The `info` is the identifier of the protocol, as produced by `protocol_info`.
Member

Needs updating.

core/src/upgrade.rs Outdated Show resolved Hide resolved
core/src/upgrade.rs Outdated Show resolved Hide resolved
@thomaseizinger
Contributor Author

This is turning out to be pretty invasive, but I think it is worth the breaking change. We reduce complexity by removing one layer of abstraction. This loses us some type safety, as mentioned in the description, but in reality this just means we can no longer use void::unreachable in the event handler in some edge cases. I don't think the type safety in these edge cases is worth the complexity of these abstractions for our users.

@thomaseizinger
Contributor Author

I'd be hoping that we can ship deprecations like #3807 first such that I can delete all this code and don't have to touch it in these breaking changes :)

Member

@mxinden mxinden left a comment

Overall direction looks good to me.

EitherName::A(info) => EitherFuture::First(self.0.upgrade_inbound(sock, info)),
EitherName::B(info) => EitherFuture::Second(self.1.upgrade_inbound(sock, info)),
fn upgrade_inbound(self, sock: C, selected_protocol: Protocol) -> Self::Future {
if self.0.protocols().any(|p| p == selected_protocol) {
Member

As described in your pull request, having to do this now is a bummer. This is in the hot-path. With our move to one-stream-per-request this will be executed often.

All that said, I am fine moving forward here. Based on intuition, this warrants a benchmark before introducing any optimization.

Contributor Author

One issue is that SelectUpgrade is constructed for each upgrade, meaning it is pointless to try and cache something.

Once we get rid of upgrades for protocols, we can try and cache some information in the handler, i.e. a HashMap of Protocol to sub-handler.

I'd suggest we defer it until then. Despite this being the hot path, I think iterating over a handful of protocols (which should be allocation-free) shouldn't be very slow?
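To make the trade-off concrete, the two lookup strategies discussed here can be sketched as follows. This is only an illustration under assumptions: `Protocol` is simplified to a wrapper around `&'static str`, and `find_linear` and `build_index` are hypothetical helpers, not part of libp2p.

```rust
use std::collections::HashMap;

/// Simplified stand-in for the protocol newtype (assumption of this sketch).
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct Protocol(pub &'static str);

/// Linear scan over the offered protocols, comparable to the `any` call
/// under review. It does not allocate and costs O(n) comparisons.
pub fn find_linear(protocols: &[Protocol], selected: &Protocol) -> Option<usize> {
    protocols.iter().position(|p| p == selected)
}

/// Index built once and reused, comparable to caching a map from protocol
/// to sub-handler. Building it allocates, so it only pays off if the map
/// outlives a single upgrade.
pub fn build_index(protocols: &[Protocol]) -> HashMap<Protocol, usize> {
    let mut index = HashMap::new();
    for (i, p) in protocols.iter().enumerate() {
        // Keep the first position if a protocol is listed more than once.
        index.entry(p.clone()).or_insert(i);
    }
    index
}
```

Because the map has to be built before it can be queried, constructing it per upgrade would cost more than the scan it replaces, which is the point made above about SelectUpgrade being constructed for each upgrade.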

Member

An alternative that came to my mind:

Introduce a ToProtocol trait which Protocol implements by default. Our ToProtocolsIter would require an Iterator where Item implements ToProtocol.

With this abstraction we still have the simplicity for the user, namely that they can just use Protocol and that they can't mistake a Protocol for a str given that the latter is wrapped in a newtype. At the same time we maintain the functionality for e.g. the SelectUpgrade.

Just a thought. I don't feel strongly about this.
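A rough sketch of how this suggestion could look, purely as an illustration: `ToProtocol` and `EitherName` are names from this thread, but their shape here is assumed, `Protocol` is simplified to a wrapper around `&'static str`, and `select` is a hypothetical helper.

```rust
/// Simplified stand-in for the protocol newtype (assumption of this sketch).
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct Protocol(pub &'static str);

/// Anything that can name a protocol. Users who do not need extra type
/// information can use `Protocol` directly.
pub trait ToProtocol {
    fn to_protocol(&self) -> Protocol;
}

impl ToProtocol for Protocol {
    fn to_protocol(&self) -> Protocol {
        self.clone()
    }
}

/// Records which side of a combined upgrade a protocol came from.
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum EitherName<A, B> {
    A(A),
    B(B),
}

impl<A: ToProtocol, B: ToProtocol> ToProtocol for EitherName<A, B> {
    fn to_protocol(&self) -> Protocol {
        match self {
            EitherName::A(a) => a.to_protocol(),
            EitherName::B(b) => b.to_protocol(),
        }
    }
}

/// Returns the offered item whose protocol matches the negotiated one,
/// keeping its original type so the caller can dispatch without a second scan.
pub fn select<I>(offered: I, negotiated: &Protocol) -> Option<I::Item>
where
    I: IntoIterator,
    I::Item: ToProtocol,
{
    offered.into_iter().find(|item| &item.to_protocol() == negotiated)
}
```

In this shape, a combined upgrade gets back `EitherName::A` or `EitherName::B` and can match on it directly, at the cost of the extra `Item` type parameter.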

Contributor Author

That is effectively the same solution as we had before, just with a different name. It doesn't remove a "layer" for the user because they still have to specify the Item type.

But perhaps we should go with this path until we fully tackle: #2863

At that point, we will also be able to implement performance optimizations in the Select handler.

Contributor Author

With the introduction of separate newtypes in multistream-select and libp2p-core, this is proving to be difficult.

On the one hand, I'd like to maintain this separation, I think multistream-select should be an implementation detail of libp2p-core and libp2p-swarm.

But, with the two new-types in place, I cannot re-create the type-safety we had before because multistream-select doesn't know about the type used in libp2p-core.

Perhaps it is too early to try and remove this abstraction. Here is a different idea:

misc/multistream-select/src/dialer_select.rs Outdated Show resolved Hide resolved
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Protocol(Bytes);
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct Protocol(Cow<'static, str>); // TODO: Instead of `Cow`, we should probably be storing `Arc<str>`
Member

How about we start with as is and introduce an Arc<str> if necessary? None of it is exposed, thus easy to change in a non-breaking way.

Contributor Author

I am not fully sure. It improves performance for cases where we dynamically construct a Protocol. Ideally, users use from_static as much as possible, so it is kind of a niche use case, but I still want it to be performant.

It is not much work so I think I am just gonna do it. I don't see why a protocol string should ever be mutable which is the only thing that becomes impossible with Arc<str>.

Member

An Arc<str> would not be enough, right? Arc would require a heap allocation, no? So you would have Either<&'static str, Arc<String>> instead?

Contributor Author

@thomaseizinger thomaseizinger Apr 25, 2023

Yes, we would have an Either of &'static str and Arc<String>.
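As a sketch of that representation (not the code that was written for this PR), the two cases can be modelled with a private enum. `Repr`, `from_owned` and `as_str` are hypothetical names; `from_static` is the constructor mentioned earlier in this thread.

```rust
use std::hash::{Hash, Hasher};
use std::sync::Arc;

/// Private representation: static strings are stored as-is, dynamically
/// built ones are shared behind an `Arc` so that clones do not copy the string.
#[derive(Clone, Debug)]
enum Repr {
    Static(&'static str),
    Shared(Arc<String>),
}

#[derive(Clone, Debug)]
pub struct Protocol(Repr);

impl Protocol {
    /// No heap allocation at all.
    pub fn from_static(s: &'static str) -> Self {
        Protocol(Repr::Static(s))
    }

    /// One allocation for the `Arc`; later clones only bump the reference count.
    pub fn from_owned(s: String) -> Self {
        Protocol(Repr::Shared(Arc::new(s)))
    }

    pub fn as_str(&self) -> &str {
        match &self.0 {
            Repr::Static(s) => *s,
            Repr::Shared(s) => s.as_str(),
        }
    }
}

// Equality and hashing go through the string contents, so a static and a
// dynamically constructed protocol with the same text compare equal.
impl PartialEq for Protocol {
    fn eq(&self, other: &Self) -> bool {
        self.as_str() == other.as_str()
    }
}

impl Eq for Protocol {}

impl Hash for Protocol {
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.as_str().hash(state);
    }
}
```

Since the representation is private, it could be swapped later without a breaking change, which matches the earlier remark that none of it is exposed.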

misc/multistream-select/src/protocol.rs Show resolved Hide resolved
@thomaseizinger thomaseizinger marked this pull request as ready for review April 23, 2023 10:56
@mergify

This comment was marked as resolved.

misc/metrics/src/identify.rs Outdated Show resolved Hide resolved
misc/multistream-select/src/dialer_select.rs Outdated Show resolved Hide resolved
core/src/upgrade.rs Outdated Show resolved Hide resolved
protocols/kad/src/protocol_priv.rs Show resolved Hide resolved
transports/webrtc/src/tokio/upgrade/noise.rs Outdated Show resolved Hide resolved
mergify bot pushed a commit that referenced this pull request Apr 25, 2023
These functions were only used for some code in the interop-tests which is easily mitigated and perhaps even easier to understand now. We can thus deprecate these functions and their related types and thereby reduce the API surface of `libp2p-core` and the maintenance burden.

This change is motivated by the work around making protocols always strings which requires/required updates to all these upgrades.

Related #3806.
Related #3271.
Related #3745.

Pull-Request: #3807.
@mergify

This comment was marked as resolved.

@mxinden
Member

mxinden commented Apr 26, 2023

@thomaseizinger please ping me when I should take another look.

@thomaseizinger
Contributor Author

@thomaseizinger please ping me when I should take another look.

Will do!

@thomaseizinger

This comment was marked as outdated.

@mergify
Contributor

mergify bot commented Apr 26, 2023

This pull request has merge conflicts. Could you please resolve them @thomaseizinger? 🙏

@thomaseizinger
Contributor Author

I am going with a different approach now: #3746


2 participants