Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

misc/multistream-select: Treat protocol as String and not [u8] #2831

Closed
mxinden opened this issue Aug 20, 2022 · 21 comments · Fixed by #3746
Closed

misc/multistream-select: Treat protocol as String and not [u8] #2831

mxinden opened this issue Aug 20, 2022 · 21 comments · Fixed by #3746
Labels
difficulty:easy getting-started Issues that can be tackled if you don't know the internals of libp2p very well help wanted

Comments

@mxinden
Copy link
Member

mxinden commented Aug 20, 2022

Description

Today we treat protocols negotiated via multistream-select as a [u8]. Instead we should treat it as a String.

Motivation

Current Implementation

/// A protocol (name) exchanged during protocol negotiation.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Protocol(Bytes);
impl AsRef<[u8]> for Protocol {
fn as_ref(&self) -> &[u8] {
self.0.as_ref()
}
}

Are you planning to do it yourself in a pull request?

No

@mxinden mxinden added difficulty:easy help wanted getting-started Issues that can be tackled if you don't know the internals of libp2p very well labels Aug 20, 2022
@thomaseizinger
Copy link
Contributor

Should we remove the ProtocolName abstraction as part of this or just change it to &str for now?

@mxinden
Copy link
Member Author

mxinden commented Aug 20, 2022

For the sake of type safety, what do you think of a new type wrapping a String or &str?

@efarg
Copy link
Contributor

efarg commented Aug 28, 2022

I'd like to work on this.

impl AsRef<str> for Protocol {
    fn as_ref(&self) -> &str {
	std::str::from_utf8(&self.0.as_ref()).unwrap()
    }
}

Are we thinking something along these line? I am using &str here, because, String requires new memory. With AsRef<str> in place, we can replace AsRef<[u8]> with AsRef<str> in dialer_select.rs and listener_select.rs, right?

But doing so, rippled its effect into places where dialers and listeners are used, as in core/src/upgrade where NameWrap, UpgradeInfo, ProtocolName want AsRef. When I made changes to accommodate this, more places depending on the mentioned three started emitting errors. This keeps going on...

@thomaseizinger
Copy link
Contributor

It is a fairly invasive change yes. Happy to mentor you along if you want to pick it up!

See some of the communication in #2798.

I think we should store the protocol as String or &str everywhere to begin with and not convert it. Some places also use Cow so the Protocol newtype may need Cow inside!

@thomaseizinger
Copy link
Contributor

Many abstractions that require AsRef<[u8]> currently may need to be changed to just use the new newtype.

@efarg
Copy link
Contributor

efarg commented Aug 28, 2022

@thomaseizinger yes, I would like to work on this.

I think we should store the protocol as String or &str everywhere to begin with and not convert it. Some places also use Cow so the Protocol newtype may need Cow inside!

So currently it is struct Protocol(Bytes) , we want to change it to something like struct Protocol(String) or struct Protocol(&str). How to decide between the two? If we want Cow, is &str better?

Some places also use Cow
may need Cow inside!

I will try to look for places where Cow is used. But off the top of your head, lmk if you know places where this is the case.

Just out of curiosity: was there any particular reason for choosing Bytes over String in the first place?

@thomaseizinger
Copy link
Contributor

thomaseizinger commented Aug 28, 2022

In order to not have to decide between the two, we can have Cow<'static, String> :)

We will either need to use Protocol throughout the entirety of rust-libp2p or make a separate newtype that lives in libp2p-core.

I think using Protocol everywhere is worth exploring. If you don't mind, you can open a possibly not-compiling draft PR where you make comments on specific lines that are causing trouble and we continue discussing there!

multistream-select predates my involvement in the project so I don't know why bytes were chosen originally.

@efarg
Copy link
Contributor

efarg commented Aug 28, 2022

Cow<'static, String> wow! nice.

@thomaseizinger yeah, not-compiling draft PRs exploring the usage of Protocol everywhere sounds good. I'll begin.

@efarg
Copy link
Contributor

efarg commented Aug 28, 2022

Just out of curiosity: was there any particular reason for choosing Bytes over String in the first place?

[thomaseizinger]: multistream-select predates my involvement in the project so I don't know why bytes were chosen originally.

@mxinden do you know?

@mxinden
Copy link
Member Author

mxinden commented Aug 30, 2022

In order to not have to decide between the two, we can have Cow<'static, String> :)

👍 another option which I have seen used more would be Cow<'static, str>. Not sure what is better for us or whether there is even a difference.

Just out of curiosity: was there any particular reason for choosing Bytes over String in the first place?

[thomaseizinger]: multistream-select predates my involvement in the project so I don't know why bytes were chosen originally.

@mxinden do you know?

I don't unfortunately.

One day I would like to see rust-libp2p support both multistream-select and Protocol Select. With that in mind, I would suggest libp2p-core to expose its own Protocol type that would convert into multistream-select's Protocol type. Thus e.g. libp2p-kad would not depend on multistream-select but only on libp2p-core.

Thanks @efarg for the work and thanks @thomaseizinger for the guidance!

@thomaseizinger
Copy link
Contributor

Cow<'static, str> is the correct version! Using String is pointless because it is already an owned type, I've mixed it up 😅

@thomaseizinger
Copy link
Contributor

@efarg Are you still working on this? I'd otherwise pick it up soon :)

@efarg
Copy link
Contributor

efarg commented Sep 16, 2022

Yes, I am very much working on it. But I am a bit lost.

So here is the diff of changes made for this issue: https://paste.debian.net/hidden/524606d1/

Upon compilation, E0759 will be emitted. And I don't know how to solve it. I am thinking, the way we use Protocol struct we can't use 'static lifetime in Cow. Is that it?

Some queries:

(1) Why do we even need String? Are we ever modifying the data in Protocol struct? Can't we just use '&str' instead of Cow?

Thus e.g. libp2p-kad would not depend on multistream-select but only on libp2p-core.

(2) This implies that right now, libp2p-kad depends on multistream-select. But there is no mention of multistream-select in libp2p-kad's toml. @mxinden

We will either need to use Protocol throughout the entirety of rust-libp2p or make a separate newtype that lives in libp2p-core.
I think using Protocol everywhere is worth exploring.

(3) With @mxinden 's last comment, should the idea of using Protocol throughout rust-libp2p be dropped? Will there be a newtype in libp2p-core? Right now, I think, some sort of magic abstraction happens via trait ProtocolName and struct NameWrap. @thomaseizinger

@thomaseizinger
Copy link
Contributor

Thanks for sharing!

I'd tackle this from a different angle. Try and remove the ProtocolName trait first:

pub trait ProtocolName {
/// The protocol name as bytes. Transmitted on the network.
///
/// **Note:** Valid protocol names must start with `/` and
/// not exceed 140 bytes in length.
fn protocol_name(&self) -> &[u8];
}

And replace ProtocolName here with a newtype:

/// Common trait for upgrades that can be applied on inbound substreams, outbound substreams,
/// or both.
pub trait UpgradeInfo {
/// Opaque type representing a negotiable protocol.
type Info: ProtocolName + Clone;
/// Iterator returned by `protocol_info`.
type InfoIter: IntoIterator<Item = Self::Info>;
/// Returns the list of protocols that are supported. Used during the negotiation process.
fn protocol_info(&self) -> Self::InfoIter;
}

The reason we need a newtype is because some protocols construct their identifiers dynamically and thus need to allocate whereas others provide just a &'static str. Using Cow allows us to handle both of these cases to their optimum. To pull all of that completely through, you will need to convert the string in multistream-select to bytes but that is easy as an interim step.

Once that phase is done, you can adopt multistream-select to accept a string instead of bytes as the final action.

Hope this helps! :)

@efarg
Copy link
Contributor

efarg commented Sep 22, 2022

@thomaseizinger what is meant by replacing ProtocolName with newtype? newtype is a tuple struct, right? How will a struct replace ProtocolName, which is a trait?

I have been reading about newtype from ch. 19 in The Book, and can't find anything which would allow such replacement.

@thomaseizinger
Copy link
Contributor

You are right, it can't be replaced 1 to 1!

Essentially, removing ProtocolName from UpgradeInfo means that you can remove the Info associated type and just express InfoIter's bound as: IntoIterator<Item = ProtocolName> (where ProtocolName is now a struct).

Basically what we are doing is removing one layer of "genericness". Instead of allowing many different types that all have to implement ProtocolName, we are making one type that is used everywhere.

Does that make sense? :)

@efarg
Copy link
Contributor

efarg commented Oct 2, 2022

@thomaseizinger sharing some progress. This patch gives 29 errors. I am thinking about how to solve them.

https://paste.debian.net/hidden/3c3659e9/

Major problem is with either.rs. Also, fn. signature changes like this one worry me:

-    fn upgrade_inbound(self, socket: C, info: Self::Info) -> Self::Future;
+    fn upgrade_inbound(self, socket: C, info: ProtocolName) -> Self::Future;

I am thinking instead of replacing Self::Info with ProtocolName, it should be replaced by something that would give us the type of value stored in the iterator InfoIter. But I don't know how to do that. I am thinking something like: Self::InfoIter::Item.

@thomaseizinger
Copy link
Contributor

@thomaseizinger sharing some progress. This patch gives 29 errors. I am thinking about how to solve them.

https://paste.debian.net/hidden/3c3659e9/

Can you open it as a pull-request please? I'd be able to make inline comments then :)

Major problem is with either.rs. Also, fn. signature changes like this one worry me:

-    fn upgrade_inbound(self, socket: C, info: Self::Info) -> Self::Future;
+    fn upgrade_inbound(self, socket: C, info: ProtocolName) -> Self::Future;

Yes, that is exactly the signature change we want to see!

I am thinking instead of replacing Self::Info with ProtocolName, it should be replaced by something that would give us the type of value stored in the iterator InfoIter. But I don't know how to do that. I am thinking something like: Self::InfoIter::Item.

I am not sure this makes sense? The point of this change is to remove a layer of abstraction. By virtue, this means we will get more concrete in some places, e.g. the type in the iterator is always fixed!

@efarg
Copy link
Contributor

efarg commented Oct 3, 2022

I am not sure this makes sense? The point of this change is to remove a layer of abstraction. By virtue, this means we will get more concrete in some places, e.g. the type in the iterator is always fixed!

Great! I will open a pull request.

efarg added a commit to efarg/rust-libp2p that referenced this issue Oct 3, 2022
efarg added a commit to efarg/rust-libp2p that referenced this issue Nov 23, 2022
efarg added a commit to efarg/rust-libp2p that referenced this issue Nov 26, 2022
efarg added a commit to efarg/rust-libp2p that referenced this issue Dec 1, 2022
@mxinden
Copy link
Member Author

mxinden commented Mar 1, 2023

I am still in favor of this change. @efarg let me know in case you are still interested in contributing this change. (Sorry in case I am missing a related discussion.)

@thomaseizinger
Copy link
Contributor

I am still in favor of this change. @efarg let me know in case you are still interested in contributing this change. (Sorry in case I am missing a related discussion.)

They are working on it! We do have a conversation going on Element about this :)

@mergify mergify bot closed this as completed in #3746 May 4, 2023
mergify bot pushed a commit that referenced this issue May 4, 2023
Previously, a protocol could be any sequence of bytes as long as it started with `/`. Now, we directly parse a protocol as `String` which enforces it to be valid UTF8.

To notify users of this change, we delete the `ProtocolName` trait. The new requirement is that users need to provide a type that implements `AsRef<str>`.

We also add a `StreamProtocol` newtype in `libp2p-swarm` which provides an easy way for users to ensure their protocol strings are compliant. The newtype enforces that protocol strings start with `/`. `StreamProtocol` also implements `AsRef<str>`, meaning users can directly use it in their upgrades.

`multistream-select` by itself only changes marginally with this patch. The only thing we enforce in the type-system is that protocols must implement `AsRef<str>`.

Resolves: #2831.

Pull-Request: #3746.
umgefahren pushed a commit to umgefahren/rust-libp2p that referenced this issue Mar 8, 2024
Previously, a protocol could be any sequence of bytes as long as it started with `/`. Now, we directly parse a protocol as `String` which enforces it to be valid UTF8.

To notify users of this change, we delete the `ProtocolName` trait. The new requirement is that users need to provide a type that implements `AsRef<str>`.

We also add a `StreamProtocol` newtype in `libp2p-swarm` which provides an easy way for users to ensure their protocol strings are compliant. The newtype enforces that protocol strings start with `/`. `StreamProtocol` also implements `AsRef<str>`, meaning users can directly use it in their upgrades.

`multistream-select` by itself only changes marginally with this patch. The only thing we enforce in the type-system is that protocols must implement `AsRef<str>`.

Resolves: libp2p#2831.

Pull-Request: libp2p#3746.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
difficulty:easy getting-started Issues that can be tackled if you don't know the internals of libp2p very well help wanted
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants