Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Decide about generics in arbitrary self types #129147

Open
traviscross opened this issue Aug 16, 2024 · 20 comments
Open

Decide about generics in arbitrary self types #129147

traviscross opened this issue Aug 16, 2024 · 20 comments
Labels
C-discussion Category: Discussion or questions that doesn't represent real issues. T-lang Relevant to the language team, which will review and decide on the PR/issue.

Comments

@traviscross
Copy link
Contributor

Over here, @adetaylor gave an update about arbitrary self types that included this bit:

During the preparation of the RFC it was broadly felt that we should ban "generic" self types but we didn't really define what "generic" meant, and I didn't pin it down enough.

Some of the commentary:

It seems to be widely felt that:

impl SomeType {
 fn m<R: Deref<Target=Self>>(self: R) {}
}

would be confusing, but (per those comments) it's actually pretty hard to distinguish that from various legitimate cases for arbitrary self types with generic receivers:

impl SomeType {
  fn method1(self: MyBox<Self, impl Allocator>) {}
  fn method2<const ID: u64>(self: ListArc<Self, ID>) {}
}

I played around with different tests here on the self type in wfcheck.rs but none of them yield the right sort of filter of good/bad cases (which is unsurprising since we haven't quite defined what that means, but I thought I might hit inspiration).

From those comment threads, the most concrete proposal (from @joshtriplett) is:

just disallow the case where the top-level receiver type itself is not concrete.

I plan to have a crack at that, unless folks have other thoughts. cc @Nadrieril who had opinions here.

If we do this, I might need to revisit the diagnostics mentioned in the bits of RFC removed by this commit.

We never made a decision about this. Perhaps if we could offer more clarity, it would save @adetaylor some time here, so it may be worth us discussing.

@rustbot labels +T-lang +I-lang-nominated -needs-triage +C-discussion

cc @rust-lang/lang @adetaylor

@rustbot rustbot added needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. C-discussion Category: Discussion or questions that doesn't represent real issues. I-lang-nominated Nominated for discussion during a lang team meeting. T-lang Relevant to the language team, which will review and decide on the PR/issue. labels Aug 16, 2024
@traviscross traviscross removed the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Aug 16, 2024
@adetaylor
Copy link
Contributor

Thanks for raising this @traviscross .

@adetaylor
Copy link
Contributor

adetaylor commented Sep 2, 2024

Current status: investigating whether I can improve the diagnostics here that can occur when generics are used and the self type doesn't unify cleanly. The canonical failure case is here and the suggestion to improve diagnostics here is from this comment by @compiler-errors.

Outcomes might be:

  • (the best case) we can find checks we put in place within wfcheck.rs to eliminate some of these classes of type mismatch - I'm not sure if this will turn out to be possible.
  • maybe it turns out that these mismatched types errors can only be achieved by use of a turbofish, and perhaps there is somewhere that turbofishes are checked for correctness and this can be enhanced - I think I'm ruling this out because it would need to understand that the generic type parameter is used for the self type, and that information is presumably hard to find at the call site (thoughts welcome though)
  • (the worst case) we can't do anything to stop these errors being emitted where they are, but we can probably at least replace the generic "mismatched type" error with a new, slightly more helpful "mismatched self type" error.

Once I've got to the bottom of this, I'll report back, with some thoughts on all the reasons we might have wanted to exclude generics (not just diagnostics).

@traviscross traviscross removed the I-lang-nominated Nominated for discussion during a lang team meeting. label Sep 2, 2024
adetaylor added a commit to adetaylor/rust that referenced this issue Sep 4, 2024
Adding tests for cases where diagnostics aren't as good as they should
be in the case of arbitrary self types.

These are slightly duplicative of the existing
arbitrary-self-from-method-substs.rs
test, but test more permutations so it seems worth adding to the
coverage as we explore improving the diagnostics here.

Improved diagnostics were suggested in commit 05c5caa

This is a part of the arbitrary self types v2 project,
rust-lang/rfcs#3519
and specifically the sub-issue exploring questions around
generics,
rust-lang#129147
adetaylor added a commit to adetaylor/rust that referenced this issue Sep 4, 2024
Adding tests for cases where diagnostics aren't as good as they should
be in the case of arbitrary self types.

These are slightly duplicative of the existing
arbitrary-self-from-method-substs.rs
test, but test more permutations so it seems worth adding to the
coverage as we explore improving the diagnostics here.

Restritions here were suggested in commit 05c5caa

This is a part of the arbitrary self types v2 project,
rust-lang/rfcs#3519
rust-lang#44874
and specifically the sub-issue exploring questions around
generics,
rust-lang#129147
adetaylor added a commit to adetaylor/rust that referenced this issue Sep 4, 2024
Adding tests for cases where diagnostics aren't as good as they should
be in the case of arbitrary self types.

These are slightly duplicative of the existing
arbitrary-self-from-method-substs.rs
test, but test more permutations so it seems worth adding to the
coverage as we explore improving the diagnostics here.

Restritions here were suggested in commit 05c5caa

This is a part of the arbitrary self types v2 project,
rust-lang/rfcs#3519
rust-lang#44874
and specifically the sub-issue exploring questions around
generics,
rust-lang#129147
adetaylor added a commit to adetaylor/rust that referenced this issue Sep 5, 2024
Adding tests for cases where diagnostics aren't as good as they should
be in the case of arbitrary self types.

These are slightly duplicative of the existing
arbitrary-self-from-method-substs.rs
test, but test more permutations so it seems worth adding to the
coverage as we explore improving the diagnostics here.

Restritions here were suggested in commit 05c5caa

This is a part of the arbitrary self types v2 project,
rust-lang/rfcs#3519
rust-lang#44874
and specifically the sub-issue exploring questions around
generics,
rust-lang#129147
@adetaylor
Copy link
Contributor

Here's a summary of the concerns and considerations about why we might want to block generic arbitrary self
types.

Also, I finally found the original suggestion to ban generics from @RalfJung , which led to the commit editing the RFC to do so.

  • Diagnostics. With this sort of code we end up with a mismatched type error. This comment from @compiler-errors says we should try to restrict the feature such that users can't end up in that circumstance quite as often.

    • Having looked into this a bit, it's pretty hard to trigger such a case by accident. This code example uses a turbofish to explicitly insist on a different type.
    • Without a turbofish, I can't think of any ways to trigger such a mismatched type error except by using the type of some other function parameter to bound a receiver like in this pretty contrived example (the actual method call is down here). Other cases like this don't trigger this error because the nature of a Self type can't easily be represented; trait bounds get a bit recursive.
    • I therefore think these mismatched type errors will be sufficiently rare that we don't need to worry about constraining this feature just to avoid that confusing diagnostic.
    • However, we can add better diagnostics in this case. As there are all sorts of type errors, I think the best approach is just to add an extra note like this to such type errors. (Note that more complex cases won't flow through this path so won't get this additional diagnostic, but we may as well show it in simple cases.)
  • Complex interaction with anti-shadowing algorithms

    • An earlier draft of the RFC proposed a fairly complex algorithm to detect shadowed methods, which would allow us to use types like raw pointers, Weak<T> or NonNull<T> as self types. During the process of agreeing the RFC, we agreed that the complexity of this deshadowing approach was not worth its value - for now. Instead, such types need to be wrapped in newtype wrappers if they are to be used as self types.
    • This deshadowing algorithm sometimes favored "inner" methods - e.g. in a case like A<B> it would pick B::foo rather than A::foo. This was surprising (which is the reason we dropped the idea) and the surprise levels might have been greater for generic types, where A::Target didn't simply and always point to B.
    • This complexity has fallen away from the RFC now, so isn't a concern. If we put it back in future, we can consider excluding certain kinds of generic types from that algorithm.
  • Avoiding "unknown unknowns".

IMO the only remaining reason to avoid generic receiver types is the desire to avoid "unknown unknowns" per this third point. This is a good reason, but there are counterarguments:

  • We don't really know what we mean by a "generic" receiver, since by definition all the types of receiver we really care about are partially generic: Box<Self>, Rc<Self>, SomeCppPtr<Self>, etc. In fact some are even more generic (Box<Self, A>).
  • Even if we did have a clear definition, implementing it might cause more surprise in users. For example if a user had been happily using Foo<A> as their self type, then (for example) an upstream crate changes it to Foo<A,B> and things break. It's hard to think of any definition of "generic" where we don't run into this sort of risk.

As such: I am now firmly of the opinion that we should just allow generic receivers, as the current nightly arbitrary self types v1 does.

@joshtriplett
Copy link
Member

Thank you for the detailed and thorough analysis!

If @RalfJung is on board with this analysis, then 👍 for lifting the restriction on generics.

@RalfJung
Copy link
Member

RalfJung commented Sep 6, 2024

I just suggested that we should be conservative here. My gut feeling is that we should "know the receiver type constructor" (i.e., identify the relevant Receiver impl) without instantiating the generic, but having generics "inside" the receiver type is fine (if that Receiver impl is itself generic). However, I am not sure if that is even a well-defined notion.

I'm too out of the loop to follow the details of the suggestion here, this sounds like a question for @rust-lang/types. :)

@compiler-errors
Copy link
Member

I don't see any reason why we should need to support generics that come from the method itself. That should be easy enough to detect.

@adetaylor
Copy link
Contributor

I don't see any reason why we should need to support generics that come from the method itself. That should be easy enough to detect.

OK, to check I understand your proposal, we want to distinguish these cases:

struct MyBox<T>(T);
impl<T> Receiver for MyBox<T> {
  type Target = T;
}

struct Foo;
impl Foo {
  fn ok(self: MyBox<Foo>) {}
  fn not_ok1<X: Receiver<Target=Self>>(self: X) {}
  fn not_ok2<A: Allocator>(self: Box<Self, A>)) {}
}

To detect this I'm assuming we want to do something along these lines in wfcheck.rs:

  • Retain the old non-arbitrary-self-types algorithm. Always first check whether a method call is valid according to that old algorithm. If so, approve it. i.e. we never reject any method calls which are currently valid on stable.
  • If the method call fails that validation, it requires arbitrary self types.
  • Assemble a list of type parameters of the function signature (in the above examples, X for not_ok1 and A for not_ok2)
  • Recurse into the self type and if its type contains any of those type parameters, reject it with a new diagnostic

The new diagnostic might be something like:

generic methods can't be used with arbitrary self types. Use one of the standard self types such as Self, &Self, &mut Self

(We can probably optimize to avoid two separate validation passes, but semantically, that's what you think we should aim for?)

I'm a bit concerned about the Box<T, A> case, since Box is already hard-coded as a valid receiver type. If we want to retain compatibility with code like this we would need to retain the special-casing for Box, which seems a bit sad. But perhaps we don't care since the allocator API isn't stable? Opinions?

@RalfJung
Copy link
Member

RalfJung commented Sep 6, 2024

I'm a bit concerned about the Box<T, A> case, since Box is already hard-coded as a valid receiver type.

I would imagine that has an impl like impl<T, A> Receiver for Box<T, A>. So fn not_ok2<A: Allocator>(self: Box<Self, A>)) should IMO be fine -- the concrete Receiver impl is determined statically.

It is only when using a Receiver bound (and maybe other traits like Deref? not sure what the system ends up looking like) that we get into situations where different instances could use Self in completely different ways. Maybe that's a problem, maybe not -- but if we allow that it should be carefully thought through.

@adetaylor
Copy link
Contributor

After looking at the options, I agree with @compiler-errors 's plan here. Specifically, we'll reject methods where the outermost receiver type is:

  • A type parameter belonging to the method; or
  • A direct or indirect reference or pointer to such (as determined by following the &/Receiver chain, or in current "arbitrary self types", the &/Deref chain)

All other self types pass the test, including types which refer to such type params in their generic arguments.

This filter seems to work as desired - I'm working on a PR now.

Rationale

Documenting the thought process here for posterity.

First,

It is only when using a Receiver bound (and maybe other traits like Deref)

I'm worried about explicitly filtering out Receiver because of course other traits (including as you note Deref) might implement Receiver. We could attempt to do a query for whether a given trait indirectly implements Receiver but I'd be worried we'll cause semver compatibility problems if we do that, or more generally cause surprises.

I had a various chats with @Darksonn about this this weekend (thanks!) She made me realize that the only thing which matters is the outermost type kind, not any generic parameters elsewhere in the type. I think that's what the rest of you have been telling me for days too but I hadn't figured that out :)

We discussed only allowing concrete paths as the outermost type. I think the fundamentals of that idea are sound, but it turns out to be a bit more complex:

  • We obviously need to allow references to paths too. The rest of this code doesn't really distinguish between referents reached by following built-in references (&, &mut) vs following chains of Deref/Receiver so ideally this test wouldn't distinguish either.
  • We need raw pointers if #![feature(arbitrary_self_types_pointers) is enabled
  • We need to allow methods on anything if it's simply by value, not just paths:
    • The stdlib implements methods on arrays, tuples, primitives, and never
    • The Rust tests implement methods on str and dyn T
  • We need to allow type params if those params are Self, or a param defined on the impl block instead of in the method signature. (e.g. Rust tests include some methods in impl Trait for T where T: blah {} impls, and the self type works out to be the type param T)

The key case turns out to be our old friend, Pin<&mut T>:

// Both Self and A below are ty::Param
pub trait Future {
  fn poll(self: Pin<&mut Self>) {} // must compile
}

pub trait Foo {
  fn poll<A: Deref<Target=Self>>(self: Pin<&mut A>) {} // probably should NOT compile
}

So, IMO this proves that the "good vs bad" filter must involve distinguishing some ty::Params from others.

The ty::Params which are OK are those belonging to the impl block, e.g. Self. The ty::Params which are not OK are those belonging to the method. As we're evaluating the method signature, we have ready access to the latter list, so we'll blocklist them.

Hence - I'm now pretty sure that this is the right plan.

If that doesn't make any sense, don't worry - PR coming along soon which will make it clearer.

adetaylor added a commit to adetaylor/rust that referenced this issue Sep 8, 2024
The RFC for arbitrary self types v2 declares that we should reject
"generic" self types. This commit does so.

The definition of "generic" was unclear in the RFC, but has been
explored in
rust-lang#129147
and the conclusion is that "generic" means any `self` type which
is a type parameter defined on the method itself, or references
to such a type.

This approach was chosen because other definitions of "generic"
don't work. Specifically,
* we can't filter out generic type _arguments_, because that would
  filter out Rc<Self> and all the other types of smart pointer
  we want to support;
* we can't filter out all type params, because Self itself is a
  type param, and because existing Rust code depends on other
  type params declared on the type (as opposed to the method).

This PR decides to make a new error code for this case, instead of
reusing the existing E0307 error. This makes the code a
bit more complex, but it seems we have an opportunity to provide
specific diagnostics for this case so we should do so.

This PR filters out generic self types whether or not the
'arbitrary self types' feature is enabled. However, it's believed
that it can't have any effect on code which uses stable Rust, since
there are no stable traits which can be used to indicate a valid
generic receiver type, and thus it would have been impossible to
write code which could trigger this new error case.
It is however possible that this could break existing code which
uses either of the unstable `arbitrary_self_types` or
`receiver_trait` features. This breakage is intentional; as
we move arbitrary self types towards stabilization we don't want
to continue to support generic such types.

This PR adds lots of extra tests to arbitrary-self-from-method-substs.
Most of these are ways to trigger a "type mismatch" error which
https://github.com/rust-lang/rust/blob/9b82580c7347f800c2550e6719e4218a60a80b28/compiler/rustc_hir_typeck/src/method/confirm.rs#L519
hopes can be minimized by filtering out generics in this way.
We remove a FIXME from confirm.rs suggesting that we make this change.
It's still possible to cause type mismatch errors, and a subsequent
PR may be able to improve diagnostics in this area, but it's harder
to cause these errors without contrived uses of the turbofish.

This is a part of the arbitrary self types v2 project,
rust-lang/rfcs#3519
rust-lang#44874

r? @wesleywiser
@RalfJung
Copy link
Member

RalfJung commented Sep 8, 2024 via email

@adetaylor
Copy link
Contributor

adetaylor added a commit to adetaylor/rust that referenced this issue Sep 8, 2024
The RFC for arbitrary self types v2 declares that we should reject
"generic" self types. This commit does so.

The definition of "generic" was unclear in the RFC, but has been
explored in
rust-lang#129147
and the conclusion is that "generic" means any `self` type which
is a type parameter defined on the method itself, or references
to such a type.

This approach was chosen because other definitions of "generic"
don't work. Specifically,
* we can't filter out generic type _arguments_, because that would
  filter out Rc<Self> and all the other types of smart pointer
  we want to support;
* we can't filter out all type params, because Self itself is a
  type param, and because existing Rust code depends on other
  type params declared on the type (as opposed to the method).

This PR is a second attempt at achieving this, based on producing a new
well-formedness checking context which is only aware of the type params
of the impl block, not of the method itself.

It doesn't currently work because it turns out we do need some method
params under some circumstances, but raising it for completeness.

This PR adds lots of extra tests to arbitrary-self-from-method-substs.
Most of these are ways to trigger a "type mismatch" error which
https://github.com/rust-lang/rust/blob/9b82580c7347f800c2550e6719e4218a60a80b28/compiler/rustc_hir_typeck/src/method/confirm.rs#L519
hopes can be minimized by filtering out generics in this way.
We remove a FIXME from confirm.rs suggesting that we make this change.
It's still possible to cause type mismatch errors, and a subsequent
PR may be able to improve diagnostics in this area, but it's harder
to cause these errors without contrived uses of the turbofish.

This is a part of the arbitrary self types v2 project,
rust-lang/rfcs#3519
rust-lang#44874

r? @wesleywiser
@adetaylor
Copy link
Contributor

I've attempted to come up with an alternative PR which rejects all self types that depend in any way on method parameters, as opposed to the previous PR which only rejected self types which were a type param.

The second approach should reject methods whose self type depends upon where clauses per the concerns raised here and explored here.

However, this second approach doesn't currently work due to the Box problem I was concerned about. In alloc::collections::LinkedList::Node:

impl<T> Node<T> {
    fn into_element<A: Allocator>(self: Box<Self, A>) -> T {
        self.element
    }
}

and this in slice:

impl<T> [T] {
    pub fn into_vec<A: Allocator>(self: Box<Self, A>) -> Vec<T, A> {
        // ...
    }
}

Presumably we'd find similar patterns in third party crates too. So... we can't just entirely ignore the type params belonging to the method.

It's very hard to think of a compromise here. Even if I could figure out a way to retain the type params but not their predicates, that won't help, because we need the : Allocator predicate to successfully match against the Deref impl. I can't think of a way to filter "good" predicates vs "bad" predicates.

Thanks to @nbdd0121 and (again) @Darksonn for help here.

I think the options are:

  1. Allow generics after all.
  2. Filter out most generics in practice using an approach like Reject generic self types. #130098. This isn't very satisfying from a language rules perspective, but if our goal is to maintain flexibility in the case of unknown unknowns, it probably does the trick by making generic self types very rare.
  3. Spot some compromise I can't spot
  4. Rewrite LinkedList::Node (seems possible by giving Node a PhantomData field), constrain this slice method to work with only the global allocator (presumably a significant compatibility break?), and accept we'll probably break some third party crates, and land something like Reject generic self types using impl defid: do not merge. #130120. It feels like this is probably not OK.

@nbdd0121
Copy link
Contributor

nbdd0121 commented Sep 9, 2024

I feel the allocator bound is a legit use of arbitrary self types and therefore ignoring predicates on impl item would not be a good approach.

We could filter out only predicates that mention Deref/DerefMut, but it feels pretty ad-hoc and people can workaround by defining a new trait with Deref being it's super trait, so that's probably also not a good approach.

I would suggest that we drop the non-generic requirement all-together.

@RalfJung
Copy link
Member

RalfJung commented Sep 9, 2024

Is this the sort of case you're worried about?

No, as there we have a single impl<T> Deref for MyPtr<T>. But now imagine we have impl<T> Deref for MyPtr<Box<T>> and impl<T> Deref for MyPtr<Arc<T>> -- these impls could be doing entirely different things, and yet either of them could be relevant for dispatching fn m<T>(self: &MyPtr<T>) where MyPtr<T>: Deref<Target=Self>.

@RalfJung
Copy link
Member

RalfJung commented Sep 9, 2024

It's very hard to think of a compromise here

I've suggested a compromise multiple times: we have to be able to identify the unique impl that allows this to be a receiver, without knowing which values the generic type parameters take. That allows the allocator case but rejects the example from my previous comment.

I don't know how to implement that, as I know nothing about the trait solver, but as far as I can see this is entirely well-defined.

@nbdd0121
Copy link
Contributor

nbdd0121 commented Sep 9, 2024

I don't think that'll work, since for the Deref case the Receiver impl can be resolved to the blanket impl that implements it for all Deref.

@RalfJung
Copy link
Member

RalfJung commented Sep 9, 2024

Ah, that's how these two traits interact? I was wondering about that. Yeah okay if we have such generic Receiver impls that makes it tricky then.

Anyway, if t-types says there is no problem here then that's fine for me as well.

@nbdd0121
Copy link
Contributor

nbdd0121 commented Sep 9, 2024

I suppose we can first resolve Receiver impl, and if it resolves to the blanket impl then we try to resolve Deref and see if it's too generic. But that also feels a bit ad-hoc..

@lcnr
Copy link
Contributor

lcnr commented Sep 12, 2024

Is this the sort of case you're worried about?

No, as there we have a single impl<T> Deref for MyPtr<T>. But now imagine we have impl<T> Deref for MyPtr<Box<T>> and impl<T> Deref for MyPtr<Arc<T>> -- these impls could be doing entirely different things, and yet either of them could be relevant for dispatching fn m<T>(self: &MyPtr<T>) where MyPtr<T>: Deref<Target=Self>.

so something like https://play.rust-lang.org/?version=nightly&mode=debug&edition=2021&gist=feca59e56ec2092732906a7c10493f67?

@RalfJung
Copy link
Member

RalfJung commented Sep 12, 2024

Yeah, something like that. Those impls can now have completely different Target types and thus the method may be looked up quite differently... it's a bit hard for me to imagine what even happens in the general case.^^ Here's an example.

I am surprised that this even compiles... are we considering every inherent method of every type to be a potential candidate here to find out which one gets called? Won't that be a mess?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-discussion Category: Discussion or questions that doesn't represent real issues. T-lang Relevant to the language team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

8 participants