Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a CStr::count_bytes() method #256

Closed
tgross35 opened this issue Aug 3, 2023 · 10 comments
Closed

Add a CStr::count_bytes() method #256

tgross35 opened this issue Aug 3, 2023 · 10 comments
Labels
ACP-accepted API Change Proposal is accepted (seconded with no objections) api-change-proposal A proposal to add or alter unstable APIs in the standard libraries T-libs-api

Comments

@tgross35
Copy link

tgross35 commented Aug 3, 2023

Proposal

Problem statement

Currently, the best way to evaluate the length of a CStr is cs.to_bytes().len(). This is not ergonomic, and may not be the best option when CStr becomes a thin pointer.

Motivating examples or use cases

For both a fat and thin CStr, this will provide a more intuitive method to get the length. When CStr becomes thin, CStr::count_bytes will be able to call strlen directly without going through the less direct to_bytes > to_bytes_with_nul > slice::from_raw_parts > len.

Solution sketch

Add a method CStr::count_bytes that returns its length. Currently, we can use a self.inner.len() - 1 to get the length of the string, which is a constant time operation. Once CStr becomes thin, this will become an O(1) call to strlen - we just need to make it clear in documentation that this will have a performance regression at some point (as we do for the to_bytes() methods).

The API is simple:

impl CStr {
    pub fn count_bytes(&self) -> usize {
        self.inner.len()
    }
}

Once thin, this will become:

impl CStr {
    pub fn count_bytes(&self) -> usize {
        strlen(self.inner_ptr)
    }
}

Alternatives

The status quo: accessing length via .to_bytes().len(). This ACP intends to improve upon this option.

constness

Currenly CStr::from_ptr uses a version of strlen that calls libc's implementation normally, or a naive Rust implementation when const. This works via const_eval_select which to my knowledge hasn't been okayed for internal use. So, we will not be able to make CStr::count_bytes const stable until either that happens, or we decide our naive strlen is always OK (making CStr::new const is also blocked by this, or by the switch to a thin pointer).

My current implementation gates the constness under const_cstr_from_ptr which has the same const-stable blocker.

cc @oli-obk because of your comment onconst_eval_select here rust-lang/rust#107624 (comment) (I believe this is the RFC: rust-lang/rfcs#3352)

Naming

The name count_bytes is up for some bikeshedding.

Links and related work

What happens now?

This issue is part of the libs-api team API change proposal process. Once this issue is filed the libs-api team will review open proposals as capability becomes available. Current response times do not have a clear estimate, but may be up to several months.

@tgross35 tgross35 added api-change-proposal A proposal to add or alter unstable APIs in the standard libraries T-libs-api labels Aug 3, 2023
@tgross35
Copy link
Author

tgross35 commented Aug 3, 2023

Not yet approved of course, but this is small enough that I got started on work.

Tracking issue: rust-lang/rust#114441

Implementation PR: rust-lang/rust#114443

@kadiwa4
Copy link

kadiwa4 commented Aug 9, 2023

That same performance regression will also inevitably happen to CStr::to_bytes (if the implementation switch is actually done), and it already has a note in the docs. Why not do the exact same thing to CStr::len (which is equivalent to c_str.to_bytes().len() as you pointed out) and use a performant impl for now?

@cuviper
Copy link
Member

cuviper commented Aug 9, 2023

I agree, we should not hamstring this on purpose. A similar doc note would be appropriate, so we don't make promises we might not want to keep, but we would add this to the performance considerations of making that switch.

@tgross35
Copy link
Author

tgross35 commented Aug 10, 2023

That is all fair. I have updated the proposal to not call strlen at this time, will do my implementation PR soon (changed)

@m-ou-se
Copy link
Member

m-ou-se commented Aug 29, 2023

We discussed this in last week's libs-api meeting. Since the plan is still for &CStr to become a thin pointer, such that determining the length will be a O(N) operation at some point in the future, we don't think .len() is be an appropriate name for this method, as it hides the fact that some actual 'work' is done by this function.

The .to_bytes() method is called to_bytes and not as_bytes because it's not a 'free' conversion. Similarly, a length method should make clear it is not a O(1) operation, perhaps by adding a verb into the method name. (For example, count_bytes might be a good method name.)

@programmerjake
Copy link
Member

For example, count_bytes might be a good method name.

in that case there should be a doc alias for len and strlen

@tgross35 tgross35 changed the title Add a CStr::len() method Add a CStr::count_bytes() method Aug 29, 2023
@tgross35
Copy link
Author

tgross35 commented Aug 29, 2023

Thanks @m-ou-se. I updated this proposal and the implementation to use count_bytes instead, with the doc alias that @programmerjake suggested.

I am wondering if there may be a better name yet since count_bytes() will not return the same value as to_bytes().len() - but I don't know what would be better and don't feel super strongly about it. Maybe calc_len()? Nevermind, I forgot we have two different to_bytes functions

@programmerjake
Copy link
Member

I am wondering if there may be a better name yet since count_bytes() will not return the same value as to_bytes().len()

isn't count_bytes supposed to return the same value as strlen?!
v.to_bytes().len() == strlen(v.as_ptr())

perhaps you were thinking of to_bytes_with_nul()?

@tgross35
Copy link
Author

perhaps you were thinking of to_bytes_with_nul()?

Yes I am, please ignore me 🙂

@dtolnay dtolnay added the ACP-accepted API Change Proposal is accepted (seconded with no objections) label Sep 19, 2023
@dtolnay
Copy link
Member

dtolnay commented Sep 19, 2023

I would be happy to accept a PR for unstable count_bytes. Thank you!

@dtolnay dtolnay closed this as completed Sep 19, 2023
bors added a commit to rust-lang-ci/rust that referenced this issue Sep 20, 2023
Implement `cstr_count_bytes`

This has not yet been approved via ACP, but it's simple enough to get started on.

- ACP: rust-lang/libs-team#256
- Tracking issue: rust-lang#114441

`@rustbot` label +T-libs-api
github-actions bot pushed a commit to rust-lang/miri that referenced this issue Sep 21, 2023
Implement `cstr_count_bytes`

This has not yet been approved via ACP, but it's simple enough to get started on.

- ACP: rust-lang/libs-team#256
- Tracking issue: rust-lang/rust#114441

`@rustbot` label +T-libs-api
lnicola pushed a commit to lnicola/rust-analyzer that referenced this issue Apr 7, 2024
Implement `cstr_count_bytes`

This has not yet been approved via ACP, but it's simple enough to get started on.

- ACP: rust-lang/libs-team#256
- Tracking issue: rust-lang/rust#114441

`@rustbot` label +T-libs-api
RalfJung pushed a commit to RalfJung/rust-analyzer that referenced this issue Apr 27, 2024
Implement `cstr_count_bytes`

This has not yet been approved via ACP, but it's simple enough to get started on.

- ACP: rust-lang/libs-team#256
- Tracking issue: rust-lang/rust#114441

`@rustbot` label +T-libs-api
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ACP-accepted API Change Proposal is accepted (seconded with no objections) api-change-proposal A proposal to add or alter unstable APIs in the standard libraries T-libs-api
Projects
None yet
Development

No branches or pull requests

6 participants