Decide about generics in arbitrary self types #129147

traviscross · 2024-08-16T05:31:44Z

Over here, @adetaylor gave an update about arbitrary self types that included this bit:

During the preparation of the RFC it was broadly felt that we should ban "generic" self types but we didn't really define what "generic" meant, and I didn't pin it down enough.

Some of the commentary:

Arbitrary self types v2 rfcs#3519 (comment)

Arbitrary self types v2 rfcs#3519 (comment)

Arbitrary self types v2 rfcs#3519 (comment)

A zulip comment

I've a feeling there's more which I've been unable to find, including the initial impetus to ban generics from arbitrary self types.

It seems to be widely felt that:
impl SomeType {
 fn m<R: Deref<Target=Self>>(self: R) {}
}
would be confusing, but (per those comments) it's actually pretty hard to distinguish that from various legitimate cases for arbitrary self types with generic receivers:
impl SomeType {
  fn method1(self: MyBox<Self, impl Allocator>) {}
  fn method2<const ID: u64>(self: ListArc<Self, ID>) {}
}
I played around with different tests here on the self type in wfcheck.rs but none of them yield the right sort of filter of good/bad cases (which is unsurprising since we haven't quite defined what that means, but I thought I might hit inspiration).

From those comment threads, the most concrete proposal (from @joshtriplett) is:

just disallow the case where the top-level receiver type itself is not concrete.

I plan to have a crack at that, unless folks have other thoughts. cc @Nadrieril who had opinions here.

If we do this, I might need to revisit the diagnostics mentioned in the bits of RFC removed by this commit.

We never made a decision about this. Perhaps if we could offer more clarity, it would save @adetaylor some time here, so it may be worth us discussing.

@rustbot labels +T-lang +I-lang-nominated -needs-triage +C-discussion

cc @rust-lang/lang @adetaylor

The text was updated successfully, but these errors were encountered:

adetaylor · 2024-08-27T13:40:19Z

Thanks for raising this @traviscross .

adetaylor · 2024-09-02T14:35:11Z

Current status: investigating whether I can improve the diagnostics here that can occur when generics are used and the self type doesn't unify cleanly. The canonical failure case is here and the suggestion to improve diagnostics here is from this comment by @compiler-errors.

Outcomes might be:

(the best case) we can find checks we put in place within wfcheck.rs to eliminate some of these classes of type mismatch - I'm not sure if this will turn out to be possible.
~~maybe it turns out that these mismatched types errors can only be achieved by use of a turbofish, and perhaps there is somewhere that turbofishes are checked for correctness and this can be enhanced~~ - I think I'm ruling this out because it would need to understand that the generic type parameter is used for the self type, and that information is presumably hard to find at the call site (thoughts welcome though)
(the worst case) we can't do anything to stop these errors being emitted where they are, but we can probably at least replace the generic "mismatched type" error with a new, slightly more helpful "mismatched self type" error.

Once I've got to the bottom of this, I'll report back, with some thoughts on all the reasons we might have wanted to exclude generics (not just diagnostics).

Adding tests for cases where diagnostics aren't as good as they should be in the case of arbitrary self types. These are slightly duplicative of the existing arbitrary-self-from-method-substs.rs test, but test more permutations so it seems worth adding to the coverage as we explore improving the diagnostics here. Improved diagnostics were suggested in commit 05c5caa This is a part of the arbitrary self types v2 project, rust-lang/rfcs#3519 and specifically the sub-issue exploring questions around generics, rust-lang#129147

Adding tests for cases where diagnostics aren't as good as they should be in the case of arbitrary self types. These are slightly duplicative of the existing arbitrary-self-from-method-substs.rs test, but test more permutations so it seems worth adding to the coverage as we explore improving the diagnostics here. Restritions here were suggested in commit 05c5caa This is a part of the arbitrary self types v2 project, rust-lang/rfcs#3519 rust-lang#44874 and specifically the sub-issue exploring questions around generics, rust-lang#129147

adetaylor · 2024-09-05T12:24:40Z

Here's a summary of the concerns and considerations about why we might want to block generic arbitrary self
types.

Also, I finally found the original suggestion to ban generics from @RalfJung , which led to the commit editing the RFC to do so.

Diagnostics. With this sort of code we end up with a mismatched type error. This comment from @compiler-errors says we should try to restrict the feature such that users can't end up in that circumstance quite as often.
- Having looked into this a bit, it's pretty hard to trigger such a case by accident. This code example uses a turbofish to explicitly insist on a different type.
- Without a turbofish, I can't think of any ways to trigger such a mismatched type error except by using the type of some other function parameter to bound a receiver like in this pretty contrived example (the actual method call is down here). Other cases like this don't trigger this error because the nature of a Self type can't easily be represented; trait bounds get a bit recursive.
- I therefore think these mismatched type errors will be sufficiently rare that we don't need to worry about constraining this feature just to avoid that confusing diagnostic.
- However, we can add better diagnostics in this case. As there are all sorts of type errors, I think the best approach is just to add an extra note like this to such type errors. (Note that more complex cases won't flow through this path so won't get this additional diagnostic, but we may as well show it in simple cases.)
Complex interaction with anti-shadowing algorithms
- An earlier draft of the RFC proposed a fairly complex algorithm to detect shadowed methods, which would allow us to use types like raw pointers, Weak<T> or NonNull<T> as self types. During the process of agreeing the RFC, we agreed that the complexity of this deshadowing approach was not worth its value - for now. Instead, such types need to be wrapped in newtype wrappers if they are to be used as self types.
- This deshadowing algorithm sometimes favored "inner" methods - e.g. in a case like A<B> it would pick B::foo rather than A::foo. This was surprising (which is the reason we dropped the idea) and the surprise levels might have been greater for generic types, where A::Target didn't simply and always point to B.
- This complexity has fallen away from the RFC now, so isn't a concern. If we put it back in future, we can consider excluding certain kinds of generic types from that algorithm.
Avoiding "unknown unknowns".
- The original reason to ban generics was simply because of unknown unknowns.

IMO the only remaining reason to avoid generic receiver types is the desire to avoid "unknown unknowns" per this third point. This is a good reason, but there are counterarguments:

We don't really know what we mean by a "generic" receiver, since by definition all the types of receiver we really care about are partially generic: Box<Self>, Rc<Self>, SomeCppPtr<Self>, etc. In fact some are even more generic (Box<Self, A>).
Even if we did have a clear definition, implementing it might cause more surprise in users. For example if a user had been happily using Foo<A> as their self type, then (for example) an upstream crate changes it to Foo<A,B> and things break. It's hard to think of any definition of "generic" where we don't run into this sort of risk.

As such: I am now firmly of the opinion that we should just allow generic receivers, as the current nightly arbitrary self types v1 does.

joshtriplett · 2024-09-06T00:19:55Z

Thank you for the detailed and thorough analysis!

If @RalfJung is on board with this analysis, then 👍 for lifting the restriction on generics.

RalfJung · 2024-09-06T07:00:42Z

I just suggested that we should be conservative here. My gut feeling is that we should "know the receiver type constructor" (i.e., identify the relevant Receiver impl) without instantiating the generic, but having generics "inside" the receiver type is fine (if that Receiver impl is itself generic). However, I am not sure if that is even a well-defined notion.

I'm too out of the loop to follow the details of the suggestion here, this sounds like a question for @rust-lang/types. :)

compiler-errors · 2024-09-06T09:04:32Z

I don't see any reason why we should need to support generics that come from the method itself. That should be easy enough to detect.

adetaylor · 2024-09-06T09:29:31Z

I don't see any reason why we should need to support generics that come from the method itself. That should be easy enough to detect.

OK, to check I understand your proposal, we want to distinguish these cases:

struct MyBox<T>(T);
impl<T> Receiver for MyBox<T> {
  type Target = T;
}

struct Foo;
impl Foo {
  fn ok(self: MyBox<Foo>) {}
  fn not_ok1<X: Receiver<Target=Self>>(self: X) {}
  fn not_ok2<A: Allocator>(self: Box<Self, A>)) {}
}

To detect this I'm assuming we want to do something along these lines in wfcheck.rs:

Retain the old non-arbitrary-self-types algorithm. Always first check whether a method call is valid according to that old algorithm. If so, approve it. i.e. we never reject any method calls which are currently valid on stable.
If the method call fails that validation, it requires arbitrary self types.
Assemble a list of type parameters of the function signature (in the above examples, X for not_ok1 and A for not_ok2)
Recurse into the self type and if its type contains any of those type parameters, reject it with a new diagnostic

The new diagnostic might be something like:

generic methods can't be used with arbitrary self types. Use one of the standard self types such as Self, &Self, &mut Self

(We can probably optimize to avoid two separate validation passes, but semantically, that's what you think we should aim for?)

I'm a bit concerned about the Box<T, A> case, since Box is already hard-coded as a valid receiver type. If we want to retain compatibility with code like this we would need to retain the special-casing for Box, which seems a bit sad. But perhaps we don't care since the allocator API isn't stable? Opinions?

RalfJung · 2024-09-06T13:09:46Z

I'm a bit concerned about the Box<T, A> case, since Box is already hard-coded as a valid receiver type.

I would imagine that has an impl like impl<T, A> Receiver for Box<T, A>. So fn not_ok2<A: Allocator>(self: Box<Self, A>)) should IMO be fine -- the concrete Receiver impl is determined statically.

It is only when using a Receiver bound (and maybe other traits like Deref? not sure what the system ends up looking like) that we get into situations where different instances could use Self in completely different ways. Maybe that's a problem, maybe not -- but if we allow that it should be carefully thought through.

adetaylor · 2024-09-08T09:18:55Z

After looking at the options, I agree with @compiler-errors 's plan here. Specifically, we'll reject methods where the outermost receiver type is:

A type parameter belonging to the method; or
A direct or indirect reference or pointer to such (as determined by following the &/Receiver chain, or in current "arbitrary self types", the &/Deref chain)

All other self types pass the test, including types which refer to such type params in their generic arguments.

This filter seems to work as desired - I'm working on a PR now.

Rationale

Documenting the thought process here for posterity.

First,

It is only when using a Receiver bound (and maybe other traits like Deref)

I'm worried about explicitly filtering out Receiver because of course other traits (including as you note Deref) might implement Receiver. We could attempt to do a query for whether a given trait indirectly implements Receiver but I'd be worried we'll cause semver compatibility problems if we do that, or more generally cause surprises.

I had a various chats with @Darksonn about this this weekend (thanks!) She made me realize that the only thing which matters is the outermost type kind, not any generic parameters elsewhere in the type. I think that's what the rest of you have been telling me for days too but I hadn't figured that out :)

We discussed only allowing concrete paths as the outermost type. I think the fundamentals of that idea are sound, but it turns out to be a bit more complex:

We obviously need to allow references to paths too. The rest of this code doesn't really distinguish between referents reached by following built-in references (&, &mut) vs following chains of Deref/Receiver so ideally this test wouldn't distinguish either.
We need raw pointers if #![feature(arbitrary_self_types_pointers) is enabled
We need to allow methods on anything if it's simply by value, not just paths:
- The stdlib implements methods on arrays, tuples, primitives, and never
- The Rust tests implement methods on str and dyn T
We need to allow type params if those params are Self, or a param defined on the impl block instead of in the method signature. (e.g. Rust tests include some methods in impl Trait for T where T: blah {} impls, and the self type works out to be the type param T)

The key case turns out to be our old friend, Pin<&mut T>:

// Both Self and A below are ty::Param
pub trait Future {
  fn poll(self: Pin<&mut Self>) {} // must compile
}

pub trait Foo {
  fn poll<A: Deref<Target=Self>>(self: Pin<&mut A>) {} // probably should NOT compile
}

So, IMO this proves that the "good vs bad" filter must involve distinguishing some ty::Params from others.

The ty::Params which are OK are those belonging to the impl block, e.g. Self. The ty::Params which are not OK are those belonging to the method. As we're evaluating the method signature, we have ready access to the latter list, so we'll blocklist them.

Hence - I'm now pretty sure that this is the right plan.

If that doesn't make any sense, don't worry - PR coming along soon which will make it clearer.

@wesleywiser

The RFC for arbitrary self types v2 declares that we should reject "generic" self types. This commit does so. The definition of "generic" was unclear in the RFC, but has been explored in rust-lang#129147 and the conclusion is that "generic" means any `self` type which is a type parameter defined on the method itself, or references to such a type. This approach was chosen because other definitions of "generic" don't work. Specifically, * we can't filter out generic type _arguments_, because that would filter out Rc<Self> and all the other types of smart pointer we want to support; * we can't filter out all type params, because Self itself is a type param, and because existing Rust code depends on other type params declared on the type (as opposed to the method). This PR decides to make a new error code for this case, instead of reusing the existing E0307 error. This makes the code a bit more complex, but it seems we have an opportunity to provide specific diagnostics for this case so we should do so. This PR filters out generic self types whether or not the 'arbitrary self types' feature is enabled. However, it's believed that it can't have any effect on code which uses stable Rust, since there are no stable traits which can be used to indicate a valid generic receiver type, and thus it would have been impossible to write code which could trigger this new error case. It is however possible that this could break existing code which uses either of the unstable `arbitrary_self_types` or `receiver_trait` features. This breakage is intentional; as we move arbitrary self types towards stabilization we don't want to continue to support generic such types. This PR adds lots of extra tests to arbitrary-self-from-method-substs. Most of these are ways to trigger a "type mismatch" error which https://github.com/rust-lang/rust/blob/9b82580c7347f800c2550e6719e4218a60a80b28/compiler/rustc_hir_typeck/src/method/confirm.rs#L519 hopes can be minimized by filtering out generics in this way. We remove a FIXME from confirm.rs suggesting that we make this change. It's still possible to cause type mismatch errors, and a subsequent PR may be able to improve diagnostics in this area, but it's harder to cause these errors without contrived uses of the turbofish. This is a part of the arbitrary self types v2 project, rust-lang/rfcs#3519 rust-lang#44874 r? @wesleywiser

RalfJung · 2024-09-08T11:33:38Z

I'm not sure this quite aligns with my intuotion... If the receiver type is "&MyType<T>” and we also have a "where MyType<T>: Receiver", this is still just as generic as the cases we want to rule out, isn't it? The key distinction is really whether we can prove that this is a receiver type without using the where clauses.

adetaylor · 2024-09-08T12:09:44Z

Is this the sort of case you're worried about?

@wesleywiser

The RFC for arbitrary self types v2 declares that we should reject "generic" self types. This commit does so. The definition of "generic" was unclear in the RFC, but has been explored in rust-lang#129147 and the conclusion is that "generic" means any `self` type which is a type parameter defined on the method itself, or references to such a type. This approach was chosen because other definitions of "generic" don't work. Specifically, * we can't filter out generic type _arguments_, because that would filter out Rc<Self> and all the other types of smart pointer we want to support; * we can't filter out all type params, because Self itself is a type param, and because existing Rust code depends on other type params declared on the type (as opposed to the method). This PR is a second attempt at achieving this, based on producing a new well-formedness checking context which is only aware of the type params of the impl block, not of the method itself. It doesn't currently work because it turns out we do need some method params under some circumstances, but raising it for completeness. This PR adds lots of extra tests to arbitrary-self-from-method-substs. Most of these are ways to trigger a "type mismatch" error which https://github.com/rust-lang/rust/blob/9b82580c7347f800c2550e6719e4218a60a80b28/compiler/rustc_hir_typeck/src/method/confirm.rs#L519 hopes can be minimized by filtering out generics in this way. We remove a FIXME from confirm.rs suggesting that we make this change. It's still possible to cause type mismatch errors, and a subsequent PR may be able to improve diagnostics in this area, but it's harder to cause these errors without contrived uses of the turbofish. This is a part of the arbitrary self types v2 project, rust-lang/rfcs#3519 rust-lang#44874 r? @wesleywiser

adetaylor · 2024-09-09T09:43:20Z

I've attempted to come up with an alternative PR which rejects all self types that depend in any way on method parameters, as opposed to the previous PR which only rejected self types which were a type param.

First approach, rejecting any self types which were a method type param: Reject generic self types. #130098
Second approach, rejecting any self types which depend upon a method type param in any way: Reject generic self types using impl defid: do not merge. #130120 (incomplete, needs cleanup)

The second approach should reject methods whose self type depends upon where clauses per the concerns raised here and explored here.

However, this second approach doesn't currently work due to the Box problem I was concerned about. In alloc::collections::LinkedList::Node:

impl<T> Node<T> {
    fn into_element<A: Allocator>(self: Box<Self, A>) -> T {
        self.element
    }
}

and this in slice:

impl<T> [T] {
    pub fn into_vec<A: Allocator>(self: Box<Self, A>) -> Vec<T, A> {
        // ...
    }
}

Presumably we'd find similar patterns in third party crates too. So... we can't just entirely ignore the type params belonging to the method.

It's very hard to think of a compromise here. Even if I could figure out a way to retain the type params but not their predicates, that won't help, because we need the : Allocator predicate to successfully match against the Deref impl. I can't think of a way to filter "good" predicates vs "bad" predicates.

Thanks to @nbdd0121 and (again) @Darksonn for help here.

I think the options are:

Allow generics after all.
Filter out most generics in practice using an approach like Reject generic self types. #130098. This isn't very satisfying from a language rules perspective, but if our goal is to maintain flexibility in the case of unknown unknowns, it probably does the trick by making generic self types very rare.
Spot some compromise I can't spot
Rewrite LinkedList::Node (seems possible by giving Node a PhantomData field), constrain this slice method to work with only the global allocator (presumably a significant compatibility break?), and accept we'll probably break some third party crates, and land something like Reject generic self types using impl defid: do not merge. #130120. It feels like this is probably not OK.

nbdd0121 · 2024-09-09T09:49:44Z

I feel the allocator bound is a legit use of arbitrary self types and therefore ignoring predicates on impl item would not be a good approach.

We could filter out only predicates that mention Deref/DerefMut, but it feels pretty ad-hoc and people can workaround by defining a new trait with Deref being it's super trait, so that's probably also not a good approach.

I would suggest that we drop the non-generic requirement all-together.

RalfJung · 2024-09-09T16:07:57Z

Is this the sort of case you're worried about?

No, as there we have a single impl<T> Deref for MyPtr<T>. But now imagine we have impl<T> Deref for MyPtr<Box<T>> and impl<T> Deref for MyPtr<Arc<T>> -- these impls could be doing entirely different things, and yet either of them could be relevant for dispatching fn m<T>(self: &MyPtr<T>) where MyPtr<T>: Deref<Target=Self>.

RalfJung · 2024-09-09T16:09:20Z

It's very hard to think of a compromise here

I've suggested a compromise multiple times: we have to be able to identify the unique impl that allows this to be a receiver, without knowing which values the generic type parameters take. That allows the allocator case but rejects the example from my previous comment.

I don't know how to implement that, as I know nothing about the trait solver, but as far as I can see this is entirely well-defined.

nbdd0121 · 2024-09-09T16:15:18Z

I don't think that'll work, since for the Deref case the Receiver impl can be resolved to the blanket impl that implements it for all Deref.

RalfJung · 2024-09-09T16:33:10Z

Ah, that's how these two traits interact? I was wondering about that. Yeah okay if we have such generic Receiver impls that makes it tricky then.

Anyway, if t-types says there is no problem here then that's fine for me as well.

nbdd0121 · 2024-09-09T16:56:45Z

I suppose we can first resolve Receiver impl, and if it resolves to the blanket impl then we try to resolve Deref and see if it's too generic. But that also feels a bit ad-hoc..

lcnr · 2024-09-12T10:01:16Z

Is this the sort of case you're worried about?

No, as there we have a single impl<T> Deref for MyPtr<T>. But now imagine we have impl<T> Deref for MyPtr<Box<T>> and impl<T> Deref for MyPtr<Arc<T>> -- these impls could be doing entirely different things, and yet either of them could be relevant for dispatching fn m<T>(self: &MyPtr<T>) where MyPtr<T>: Deref<Target=Self>.

so something like https://play.rust-lang.org/?version=nightly&mode=debug&edition=2021&gist=feca59e56ec2092732906a7c10493f67?

RalfJung · 2024-09-12T10:07:50Z

Yeah, something like that. Those impls can now have completely different Target types and thus the method may be looked up quite differently... it's a bit hard for me to imagine what even happens in the general case.^^ Here's an example.

I am surprised that this even compiles... are we considering every inherent method of every type to be a potential candidate here to find out which one gets called? Won't that be a mess?

traviscross removed the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Aug 16, 2024

traviscross mentioned this issue Aug 16, 2024

Tracking issue for RFC 3519: arbitrary_self_types #44874

Open

9 tasks

traviscross removed the I-lang-nominated Nominated for discussion during a lang team meeting. label Sep 2, 2024

adetaylor mentioned this issue Sep 4, 2024

Arbitrary self types: diagnostics for mismatched self types #129968

Draft

adetaylor mentioned this issue Sep 8, 2024

Reject generic self types. #130098

Open

adetaylor mentioned this issue Sep 8, 2024

Reject generic self types using impl defid: do not merge. #130120

Draft

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Decide about generics in arbitrary self types #129147

Decide about generics in arbitrary self types #129147

traviscross commented Aug 16, 2024

adetaylor commented Aug 27, 2024

adetaylor commented Sep 2, 2024 •

edited

Loading

adetaylor commented Sep 5, 2024

joshtriplett commented Sep 6, 2024

RalfJung commented Sep 6, 2024 •

edited

Loading

compiler-errors commented Sep 6, 2024

adetaylor commented Sep 6, 2024

RalfJung commented Sep 6, 2024

adetaylor commented Sep 8, 2024

RalfJung commented Sep 8, 2024 via email

adetaylor commented Sep 8, 2024

adetaylor commented Sep 9, 2024

nbdd0121 commented Sep 9, 2024

RalfJung commented Sep 9, 2024

RalfJung commented Sep 9, 2024 •

edited

Loading

nbdd0121 commented Sep 9, 2024

RalfJung commented Sep 9, 2024

nbdd0121 commented Sep 9, 2024

lcnr commented Sep 12, 2024

RalfJung commented Sep 12, 2024 •

edited

Loading

Decide about generics in arbitrary self types #129147

Decide about generics in arbitrary self types #129147

Comments

traviscross commented Aug 16, 2024

adetaylor commented Aug 27, 2024

adetaylor commented Sep 2, 2024 • edited Loading

adetaylor commented Sep 5, 2024

joshtriplett commented Sep 6, 2024

RalfJung commented Sep 6, 2024 • edited Loading

compiler-errors commented Sep 6, 2024

adetaylor commented Sep 6, 2024

RalfJung commented Sep 6, 2024

adetaylor commented Sep 8, 2024

Rationale

RalfJung commented Sep 8, 2024 via email

adetaylor commented Sep 8, 2024

adetaylor commented Sep 9, 2024

nbdd0121 commented Sep 9, 2024

RalfJung commented Sep 9, 2024

RalfJung commented Sep 9, 2024 • edited Loading

nbdd0121 commented Sep 9, 2024

RalfJung commented Sep 9, 2024

nbdd0121 commented Sep 9, 2024

lcnr commented Sep 12, 2024

RalfJung commented Sep 12, 2024 • edited Loading

adetaylor commented Sep 2, 2024 •

edited

Loading

RalfJung commented Sep 6, 2024 •

edited

Loading

RalfJung commented Sep 9, 2024 •

edited

Loading

RalfJung commented Sep 12, 2024 •

edited

Loading