Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ptr::add/sub: do not claim equivalence with offset(c as isize) #130229

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

RalfJung
Copy link
Member

@RalfJung RalfJung commented Sep 11, 2024

In #110837, the offset intrinsic got changed to also allow a usize offset parameter. The intention is that this will do an unsigned multiplication with the size, and we have UB if that overflows -- and we also have UB if the result is larger than usize::MAX, i.e., if a subsequent cast to isize would wrap. The LLVM backend sets some attributes accordingly.

This updates the docs for add/sub to match that intent, in preparation for adjusting codegen to exploit this UB. We use this opportunity to clarify what the exact requirements are: we compute the offset using mathematical multiplication (so it's no problem to have an isize * usize multiplication, we just multiply integers), and the result must fit in an isize.
Cc @rust-lang/opsem @nikic

#130239 updates Miri to detect this UB.

sub still has some cases of UB not reflected in the underlying intrinsic semantics (and Miri does not catch): when we subtract usize::MAX, then after casting to isize that's just -1 so we end up adding one unit without noticing any UB, but actually the offset we gave does not fit in an isize. Miri will currently still not complain for such cases:

fn main() {
    let x = &[0i32; 2];
    let x = x.as_ptr();
    // This should be UB, we are subtracting way too much.
    unsafe { x.sub(usize::MAX).read() };
}

However, the LLVM IR we generate here also is UB-free. This is "just" library UB but not language UB.
Cc @saethlin; might be worth adding precondition checks against overflow on offset/add/sub?

Fixes #130211

@rustbot
Copy link
Collaborator

rustbot commented Sep 11, 2024

r? @scottmcm

rustbot has assigned @scottmcm.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Sep 11, 2024
@rustbot
Copy link
Collaborator

rustbot commented Sep 11, 2024

Some changes occurred to the CTFE / Miri engine

cc @rust-lang/miri

The Miri subtree was changed

cc @rust-lang/miri

@RalfJung RalfJung changed the title Ptr offset unsigned ptr::add/sub: fix docs (do not claim equivalence with offset), and fix gap in Miri UB checks Sep 11, 2024
@@ -355,7 +355,8 @@ impl<T: ?Sized> *const T {
///
/// If any of the following conditions are violated, the result is Undefined Behavior:
///
/// * The computed offset, `count * size_of::<T>()` bytes, must not overflow `isize`.
/// * The computed offset, `count * size_of::<T>()` bytes (using unbounded arithmetic),
/// must fit in an `isize`.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the most clear and concise way to say "convert count and size_of::<T>() from whatever their types are into unbounded mathematical integers, multiply those, and check if the result fits in the value range of isize"?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe flip it around? Could say the same thing as

If sizeof(T) > 0, then count <= isize::MAX / sizeof(T).

Alternatively, could say it as something like

usize::saturating_mul(count, size_of::<T>) fits in an isize

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, the docs on offset_from say

the absolute distance between the pointers, in bytes, computed on mathematical integers (without "wrapping around"), cannot overflow an isize

so maybe something like this could work

The offset in bytes (count * size_of::<T>()), computed on mathematical integers (without "wrapping around"), must fit in an isize.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think "mathematical multiplication" is the most intuitive version here -- good catch with the offset_from docs, I will follow that.

@lukas-code
Copy link
Member

To me this feels like we're changing the docs here to introduce new UB where there was none before. FWIW I'm in favor of this change and see how the new definition is much more useful, but I still feel that we should at least do a crater run (with an assert for the overflow) or something to see how widespread this misuse of add is in practice.

Copy link
Member

@lukas-code lukas-code left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The byte_add/byte_sub methods also need to be updated, especially that byte_add also cannot go "backwards".

@RalfJung
Copy link
Member Author

To me this feels like we're changing the docs here to introduce new UB where there was none before. FWIW I'm in favor of this change and see how the new definition is much more useful, but I still feel that we should at least do a crater run (with an assert for the overflow) or something to see how widespread this misuse of add is in practice.

This is skirting the edge of clarifying docs vs introducing new UB. "must not overflow isize" arguably already made this UB before, but "convenience for .offset(count as isize)" contradicts that. Paging in @rust-lang/lang for awareness.

I'm not sure such a crater run would be meaningful though, it would also catch all the cases that already were UB before...

@RalfJung
Copy link
Member Author

RalfJung commented Sep 11, 2024

I have moved the Miri changes into a separate PR (#130239) so it is not held up by discussions around how to deal with the "kind of a breaking change" aspect of this.

I also incorporated the feedback. The wrapping_ methods also still have the wording around "convenience for ... as isize ...". There it is technically correct due to the wrapping semantics, but it might make sense to keep the docs consistent with the non-wrapping versions?

Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, the new wording is clearer.

library/core/src/ptr/const_ptr.rs Outdated Show resolved Hide resolved
workingjubilee added a commit to workingjubilee/rustc that referenced this pull request Sep 11, 2024
… r=compiler-errors

miri: fix overflow detection for unsigned pointer offset

This is the Miri part of rust-lang#130229. This is already UB in codegen so we better make Miri detect it; updating the docs may take time if we have to follow some approval process, but let's make Miri match reality ASAP.

r? `@scottmcm`
@saethlin
Copy link
Member

might be worth adding precondition checks against overflow on offset/add/sub?

It is definitely worth adding. I'm working on it.

workingjubilee added a commit to workingjubilee/rustc that referenced this pull request Sep 11, 2024
… r=compiler-errors

miri: fix overflow detection for unsigned pointer offset

This is the Miri part of rust-lang#130229. This is already UB in codegen so we better make Miri detect it; updating the docs may take time if we have to follow some approval process, but let's make Miri match reality ASAP.

r? ``@scottmcm``
rust-timer added a commit to rust-lang-ci/rust that referenced this pull request Sep 12, 2024
Rollup merge of rust-lang#130239 - RalfJung:miri-ptr-offset-unsigned, r=compiler-errors

miri: fix overflow detection for unsigned pointer offset

This is the Miri part of rust-lang#130229. This is already UB in codegen so we better make Miri detect it; updating the docs may take time if we have to follow some approval process, but let's make Miri match reality ASAP.

r? ``@scottmcm``
github-actions bot pushed a commit to rust-lang/miri that referenced this pull request Sep 12, 2024
…er-errors

miri: fix overflow detection for unsigned pointer offset

This is the Miri part of rust-lang/rust#130229. This is already UB in codegen so we better make Miri detect it; updating the docs may take time if we have to follow some approval process, but let's make Miri match reality ASAP.

r? ``@scottmcm``
@WaffleLapkin
Copy link
Member

I also incorporated the feedback. The wrapping_ methods also still have the wording around "convenience for ... as isize ...". There it is technically correct due to the wrapping semantics, but it might make sense to keep the docs consistent with the non-wrapping versions?

I think this is a good idea. Looking at the docs now I think this "convinience for ... as isize" only adds confusion.

@RalfJung
Copy link
Member Author

All right so what's the process here? This is a library API so by default I would assume t-libs-api is responsible. This is fairly directly exposing a language primitive though, which is why I pinged t-lang above.

But still, let's nominate t-libs-api -- @rust-lang/libs-api are you okay with this change to the docs for our inbounds pointer arithmetic methods? This can be seen as a breaking change since we previously documented e.g. ptr.add(count) as convenience for ptr.offset(count as isize). This hasn't been true for a while, there are cases where the latter is fine but the former has UB: codegen assumes you're not able to move "backwards" even if count as isize would be negative. That's a useful assumption for codegen to have.

OTOH, the docs do say:

The computed offset, count * size_of::() bytes, must not overflow isize.

I would say if count * size_of::<T>() is usize::MAX then this overflows isize. So the docs kind of already say that ptr.add(usize::MAX) is UB, but they also say it is equivalent to ptr.offset(-1) which is not UB...

It seems unlikely that someone would rely on add with really big usize values to move the pointer backwards (why would they not use offset) -- unless of course they took our docs too literally. Therefore I hope this is acceptable breakage. Miri has already been adjusted to detect this UB and @saethlin is looking into the possibility of adding checks against this UB for debug builds.

@RalfJung RalfJung added the I-libs-api-nominated Nominated for discussion during a libs-api team meeting. label Sep 12, 2024
@scottmcm
Copy link
Member

FWIW, I've always interpreted the "convenience" statement as a path-dependent "here's why you'd use this".

add was added well after offset existed, so I've always read that parenthetical as "if you've been using .offset(foo as isize) -- as we saw most uses of offset were -- then you probably want to switch to using .add(foo) now as shorter and easier", rather than it having ever been a promise that it's exactly the same as that. If it's just that as part of adding add we noticed that almost every case of offset always knew which way it's going, so we added add/sub as better ways to do that.

@scottmcm scottmcm added the S-waiting-on-team Status: Awaiting decision from the relevant subteam (see the T-<team> label). label Sep 12, 2024
@RalfJung
Copy link
Member Author

add was added well after offset existed, so I've always read that parenthetical as "if you've been using .offset(foo as isize) -- as we saw most uses of offset were -- then you probably want to switch to using .add(foo) now as shorter and easier",

However, if foo is usize::MAX, then this switch is wrong under our current codegen.

Of course that's a very unlikely thing to happen...

@workingjubilee
Copy link
Member

Both were added before we realized that we could not actually offset something by more than isize::MAX and remain within the same allocation according to the backend semantics we are constrained by.

@m-ou-se m-ou-se added T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. and removed T-libs Relevant to the library team, which will review and decide on the PR/issue. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Sep 17, 2024
@m-ou-se
Copy link
Member

m-ou-se commented Sep 17, 2024

@rfcbot merge

@rfcbot
Copy link

rfcbot commented Sep 17, 2024

Team member @m-ou-se has proposed to merge this. The next step is review by the rest of the tagged team members:

No concerns currently listed.

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@rfcbot rfcbot added proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. labels Sep 17, 2024
@the8472 the8472 added the relnotes Marks issues that should be documented in the release notes of the next release. label Sep 17, 2024
@RalfJung RalfJung changed the title ptr::add/sub: fix docs (do not claim equivalence with offset), and fix gap in Miri UB checks ptr::add/sub: fix docs (do not claim equivalence with offset) Sep 17, 2024
@RalfJung RalfJung changed the title ptr::add/sub: fix docs (do not claim equivalence with offset) ptr::add/sub: do not claim equivalence with offset(c as isize) Sep 17, 2024
@RalfJung
Copy link
Member Author

Looking at #130211 again it seems to me that actually we don't yet have codegen that makes things like p.add(usize::MAX) UB even when p.sub(1) is fine -- however, we'd like to adjust codegen to do this. So this is not catching up with what codegen already did, but it prepares for what we want to do in codegen, and makes the docs match the semantics that #110837 intended for the underlying intrinsic (and that #130239 implements for Miri).

@nikic
Copy link
Contributor

nikic commented Sep 18, 2024

@RalfJung Yes, that's correct. The backend doesn't make use of this right now, but intends to do so in the future. We'll likely start with LLVM 20, as getelementptr nuw support in LLVM 19 is still a bit weak.

@the8472
Copy link
Member

the8472 commented Sep 18, 2024

Can we add an ub_check for this?

@RalfJung
Copy link
Member Author

RalfJung commented Sep 18, 2024 via email

@saethlin
Copy link
Member

The draft PR is #130251

@dtolnay dtolnay removed the I-libs-api-nominated Nominated for discussion during a libs-api team meeting. label Sep 19, 2024
@rfcbot rfcbot added final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. and removed proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. labels Sep 21, 2024
@rfcbot
Copy link

rfcbot commented Sep 21, 2024

🔔 This is now entering its final comment period, as per the review above. 🔔

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. relnotes Marks issues that should be documented in the release notes of the next release. S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. S-waiting-on-team Status: Awaiting decision from the relevant subteam (see the T-<team> label). T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

pointer addition semantics are unclear