Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Elide array size #2545

Closed
wants to merge 5 commits into from
Closed

Conversation

llogiq
Copy link
Contributor

@llogiq llogiq commented Sep 17, 2018

This is a small-ish RFC for array size elision wherever it can be taken directly from the initializer. It is deliberately restricted like this to uphold locality and avoid errors at a distance.

Rendered

@Centril Centril added the T-lang Relevant to the language team, which will review and decide on the RFC. label Sep 17, 2018
@Centril
Copy link
Contributor

Centril commented Sep 18, 2018

I'm not opposed to value inference of this kind - in fact it excites me a lot.
However, I am concerned about the ad-hoc / piecemeal nature of doing it in one place and not generally for all type-constructors with const parameters (e.g. struct Foo<const Param: Type> { .. }).
In other words, I think this RFC may not go far enough :)

@llogiq
Copy link
Contributor Author

llogiq commented Sep 18, 2018

I can see that there is interest in having more daring inference here. However, as I've pointed out, this can lead to errors from a distance, if one static contains a const down the line whose type changes. Those errors could be hard to track down. At the very least, they'd be surprising. However, we may be able to tweak the error reporting in such a way that such errors are quickly resolved, and that'd be awesome.

For now, similar to the const lifetimes RFC I wrote, let's take a smaller step and elide the length only where it is obvious from the same line of code. This will already give us very real ergonomic wins with very little cost both in readability and teachability (It just occurred to me that I should add a paragraph about error messages), and it doesn't preclude us from aiming for stronger inference later.

For example, you might write:

```rust
const CONST_CHARS: [u8; _] = b"This is really a byte array";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: the expression here needs to be *b"…" (with a star added) because the type of byte string literals is &'static [u8; N]. (Maybe it should have been plain [u8; N] but oh well.)

@Centril
Copy link
Contributor

Centril commented Sep 18, 2018

@llogiq

I can see that there is interest in having more daring inference here. However, as I've pointed out, this can lead to errors from a distance, if one static contains a const down the line whose type changes. [...]
[...]
elide the length only where it is obvious from the same line of code.

So with this RFC as-is, the following would be an error, right?

fn foo() -> [u8; 9] { /* details elided */ }

fn main() {
    let bar: [u8; _] = foo();
}

If so, this feels even more ad-hoc, non-uniform, and makes me wary of the teachability aspect.

For now, similar to the const lifetimes RFC I wrote, let's take a smaller step [...]. This will already give us very real ergonomic wins [...]

I am specifically concerned with taking small steps; I wouldn't want to stabilize this, leave the language in an inconsistent state in the interim and do the difficult extensions possibly later. I want to be sure that the extensions both can and will happen.

In particular, I expect that the following should work:

struct Matrix<T, const N: usize, const M: usize> {
    data: [[T; N]; M]
}

fn main() {
    let foo: Matrix<u8, _, _> = Matrix { data: [[1, 2, 3], [4, 5, 6]] };
}

@llogiq
Copy link
Contributor Author

llogiq commented Sep 18, 2018

That's an interesting case. As for your first example, foo could be in another module or even another crate. If the function changes, you may get an error, if any, at the use site of bar, which may be at yet another different part of the code.

As such I worry, that your proposal could lead to errors that are extremely hard to track down. So unless you can show me an algorithm to produce error messages for that case that makes those errors easy to fix, I remain unconvinced.

I don't agree with your teachability argument either. If the size is trivially inferrable from the initializer, wildcards are allowed. Otherwise insert the size.

@Centril
Copy link
Contributor

Centril commented Sep 18, 2018

@llogiq

As for your first example, foo could be in another module or even another crate. If the function changes, you may get an error, if any, at the use site of bar, which may be at yet another different part of the code.

This is no different from using _ for types.
Do note that it is currently valid to write:

// crate A:
pub fn producer() -> [u8; 9] { /* details elided */ }

// crate B:
pub fn consumer(arr: [u8; 9]) { /* details elided */ }

// crate C:
fn main() {
    let bar = producer(); // No type annotations.
    consumer(bar);
}

Thus, by writing let bar: [u8; _] we have actually provided more information than what the compiler already requires of us.

As such I worry, that your proposal could lead to errors that are extremely hard to track down. So unless you can show me an algorithm to produce error messages for that case that makes those errors easy to fix, I remain unconvinced.

cc @varkor

As far as I know, most dependently typed languages can do this sort of in-function inference (or the languages would be intolerable to write in) and so it should not be novel in any way.

I don't agree with your teachability argument either. If the size is trivially inferrable from the initializer, wildcards are allowed. Otherwise insert the size.

It is the "otherwise insert the size" that is surprising, compared to how inference currently works for types coupled with the fact that you are allowed to write let bar = producer();, and leads me to believe than users will inevitably hit such errors. This forces users to learn new rules, whereas if we had a more general value inference mechanism, there would be fewer corner cases and caveats in the language. I've lain out a more general philosophy about the value of uniformity in language design here.

@durka
Copy link
Contributor

durka commented Sep 19, 2018

To continue with my tradition of macro implementations of RFCs (except in this case I wrote the macro two years ago)... https://crates.io/crates/counted-array

Clearly, I think this is a good idea. :)

@joshtriplett
Copy link
Member

joshtriplett commented Sep 19, 2018

👍 I like the idea of handling the simple case people constantly run into first. I've wanted this many times.

glandium added a commit to glandium/sccache that referenced this pull request Sep 19, 2018
One of the most annoying thing from using static arrays in rust is that
there is no size inference, and you end up having to give the proper
size. Which makes updates to the arrays cumbersome.

I was reading "This Week in Rust 252", which linked to (RFC: Elide array
size)[rust-lang/rfcs#2545], set to address this
very problem, and @durka linked to his counted-arrays crate, which
already existed and essentially implements the idea behind the RFC.
@djc
Copy link
Contributor

djc commented Sep 19, 2018

I like this RFC a lot. Having to maintain the array length manually is boilerplate causing unnecessary work -- counting things is something computers are good at -- I know this would have samed me some edit-compile cycles.

@llogiq
Copy link
Contributor Author

llogiq commented Sep 19, 2018

@Centril I am sympathetic to your interest in uniformity and wouldn't be against inference for partial type ascription in let bindings (I note that your examples are all such bindings), but I don't know how complex the implementation would get.

I'm still defending the principle of locality for consts & statics. We want to make the code easier to write, but not harder to read and especially not harder to change later!

@earthengine
Copy link

earthengine commented Sep 20, 2018

I would like to have this. For the concerns @Centril have, I propose the following rule:

If a type annotation is required (for example top level const or function) and the item is not public, and if the type can be inferenced, _ can be used in the array size position

so

fn foo() -> [u8; 9] { /* details elided */ }

fn main() {
    let bar: [u8; _] = foo();
}

is not legal as bar do not require a type annotation. Instead, foo can return [u8; _] if it can be derived from the return statement. However, this is not allowed if foo is pub.

We need the rule for pub because, we want the code reader know exactly what the type is if it can be accessed from other places. If it can only be accessed locally, the reader should be able to pick up quickly.

@eddyb
Copy link
Member

eddyb commented Sep 26, 2018

I don't believe this RFC is necessary - it is subsumed by const generics, perhaps other than the surface syntax. The ability to infer constants in types is essential to const generics, particularly when used in impls (but also to keep the design and implementation uniform across liftimes, types and consts).

rust-lang/rust#53645 has already been open for just over a month now, and it already contains the needed implementation work (minus the surface syntax of _).

Nominating for @rust-lang/lang meeting, with a preference to close.

@eddyb
Copy link
Member

eddyb commented Sep 26, 2018

To be clear, my comment above is about inference wherever we accept inference today.

For the types of const / static to be determined from the initializer, this decision would be affected: #2010 (comment) - I see no mention of that in this RFC.

Note that said decision interacts with this RFC two-fold:

Array literals where you have to say the length const Foo: [u8; 3] = [...]; often you can do const Foo: &[u8] = &[...] instead here though.

Perhaps something syntactic like @petrochenkov suggested would be better, but that feels kind of scary to me. It's not something we have precedent for. I don't think we have a good idea what it's corner cases are, or how it would feel to have two distinct kinds of inference co-existing in the compiler.

That former shows why this is not as pressing of a problem, while the latter talks about pretty much the same strategy taken by this RFC.

OTOH, the original RFC stands some chance of being revived, with perhaps some restrictions around integer fallback, and ignoring "the closure problem" (#2010 (comment)).

@joshtriplett
Copy link
Member

Based on discussion in @rust-lang/lang, we agreed that (based on @eddyb's comments above) const generics addresses the implementation of this, so the RFC as written is not quite what we're looking for.

However, we do think that there's an RFC needed to explicitly allow inferring the const values in the types of const values, in specified circumstances, including array sizes.

@llogiq, would you be open to restructuring this RFC into that, and referencing the const generics work as how this will be implemented?

@Centril
Copy link
Contributor

Centril commented Sep 27, 2018

I'm still defending the principle of locality for consts & statics. We want to make the code easier to write, but not harder to read and especially not harder to change later!

allow inferring the const values in the types of const values, in specified circumstances, including array sizes.

I think perhaps the problem you are getting at @llogiq with local reasoning is in particular with inference for items defined elsewhere. A more precise mechanism to get at that problem could be to require explicit type ascription for const values (and perhaps types) that arise from const fn function calls referring to functions defined in X and other const values defined in X. Here X could perhaps be "everywhere" or perhaps "other modules". This would also allow you to write things such as: const FOO: Bar<_> = Bar(make_foo() : Foo<3>);.

@llogiq
Copy link
Contributor Author

llogiq commented Sep 28, 2018

@joshtriplett will do. I'll just have to understand #2000 and #2010 first 😄

[drawbacks]: #drawbacks

There is a modicum of complexity to extend the parser and AST which needs to be
done for every Rust parser in existenct. However, the feature is minor, so the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

existenct -> existence

[summary]: #summary

In arrays, allow to elide the size in the type and put an underscore there
instead if it can be deduced from the initializer. For example: `static
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the summary should already mention that we only want this inference if the length can be trivially obtained by either counting the elements of the array initializer or taking it verbatim from a repeat expression.

needs to count the elements of an array manually to determine its size. Letting
the compiler find the size reduces the potential for error. It also allows us
to ascribe the array component type without requiring the size (which is –
perhaps surprisingly – not currently allowed).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have seen many situations where users just gave up and used const X: &[u8] = &[4, 5, 6]; in order to not need to specify the length.

luser pushed a commit to glandium/sccache that referenced this pull request Oct 9, 2018
One of the most annoying thing from using static arrays in rust is that
there is no size inference, and you end up having to give the proper
size. Which makes updates to the arrays cumbersome.

I was reading "This Week in Rust 252", which linked to (RFC: Elide array
size)[rust-lang/rfcs#2545], set to address this
very problem, and @durka linked to his counted-arrays crate, which
already existed and essentially implements the idea behind the RFC.
@Centril
Copy link
Contributor

Centril commented Sep 28, 2019

It's worth noting that after rust-lang/rust#64698, the compiler now has parser recovery for the common case of const A = 42;:

error: missing type for `const` item
 --> src/lib.rs:1:7
  |
1 | const A = 42;
  |       ^ help: provide a type for the item: `A: i32`

@Centril
Copy link
Contributor

Centril commented Oct 15, 2019

I read the diff when it was made (I'm watching the RFC repo) and I don't think the concerns were resolved. The RFC talks about let bindings and arrays specifically in an ad-hoc manner.

What I would like to see instead is that the grammar of expressions is amended with expr ::= expr_before_this_rfc | "_". That is, syntactically, _ becomes a valid expression everywhere and stands for a value inference variable (rustc::mir::interpret::ConstValue::Infer specifically). We can then apply semantic restrictions (in wherever the current equivalent restriction for _ as type inference variables is made) around not having it in function parameter and return types.

Moreover, I think the RFC should explain why, when we other wise do not allow global type inference, it is OK to have global type inference in this case. (That is, @cramertj's comment in #2545 (comment) should be addressed.)

Finally, given that #2545 (comment) is now implemented it would be good to explain why that isn't enough and why a similar diagnostics-only thing for expressions wouldn't be enough.

@gnzlbg
Copy link
Contributor

gnzlbg commented Oct 15, 2019

What I would like to see instead is that the grammar of expressions is amended with expr ::= expr_before_this_rfc | "_". That is, syntactically, _ becomes a valid expression everywhere and stands for a value inference variable (rustc::mir::interpret::ConstValue::Infer specifically).

This might deserve its own RFC.

@Centril
Copy link
Contributor

Centril commented Oct 15, 2019

@gnzlbg Possibly. I don't care strongly whether this RFC morphs into that or if a new one is made, but I am against ad-hoc additions in any case.

@pnkfelix
Copy link
Member

pnkfelix commented Mar 5, 2020

@rfcbot resolved state-contextual-constraints-up-front

@nikomatsakis
Copy link
Contributor

@rfcbot fcp cancel

So this FCP is ancient. I'm canceling it, along with some other ancient FCPs.

That said, I suppose that the "core conflict" remains: do we want to try and do a targeted fix or fix this via a more general mechanism building on const generics?

In today's meeting we discussed the idea of creating a proposal based on the general motivation of permitting things like const FOO: [u32; _] = [1, 2, 3] and, if accepted, forming a project group to figure out the best way forward.

In any case, as I said, I'm going to cancel the FCP and we can decide whether to move forward based on whether (a) we have people motivated in helping to push this proposal through to completion (including impl) and (b) we feel we have a willing liaison and enough bandwidth on lang-team to see that through.

@rfcbot
Copy link
Collaborator

rfcbot commented Jun 15, 2020

@nikomatsakis proposal cancelled.

@rfcbot rfcbot removed proposed-final-comment-period Currently awaiting signoff of all team members in order to enter the final comment period. disposition-merge This RFC is in PFCP or FCP with a disposition to merge it. labels Jun 15, 2020
@scottmcm
Copy link
Member

An observation here: this feels like the value-space version of what impl Trait does in the type-space.

To flesh that out:

  • It's long been the Rust rule that items have their types fully specified. I think just changing this one specific bit isn't the right holistic change. As a trivial example, why would it be allowed in const FOO but not const fn foo()?
  • Looking at the goal here, it seems to be that the crate doesn't want to promise a specific size from a semver perspective -- if it did it could just put that one by copying it from the error -- but at the same time allow the compiler to know what the concrete value there is when compiling things. That sounds very much to me like the same thing we had with types, where we wanted to add a way for people to promise that they're returning some iterator of u32s, but not its concrete type. To push that a bit further, the proposed example of static BLORP_NUMBERS: [u32; _] = [0, 8, 15]; can be roughly thought of as being static BLORP_NUMBERS: impl Copy + AsRef<[u32]> = [0, 8, 15];: you can copy it and a contiguous (thus indexable) thing, but you can't actually see its length at compile time, only at runtime.
  • So to me, rather than this proposal having the size be inferred -- which would make changing the size a breaking change because one could assign it to something of non-inferred size -- this should have additional semantic rules. So then the example would mean something like static BLORP_NUMBERS: [u32; impl usize] = [0, 8, 15];: we promise there's some usize there, but you can't take a dependency on which one is there. (That syntax being a placeholder, of course, just there to ape impl Trait.)

(Note that I'm only talking about static and const here -- having something like this for let seems clearly good, and if I remember the discussions about things correctly I think it's going to happen as part of const generics.)

@shepmaster
Copy link
Member

@scottmcm now, just mash that up a bit with const generics somehow...

fn example() -> impl [i32; const N: usize] {
    [42]
}

I wonder how it would work/look for non-array types.

@joshtriplett
Copy link
Member

Since this was previously discussed, min_const_generics is now on the path to stabilization.

Given that, and the amount of time that has passed since this proposal, we think this likely needs to be revisited as a project proposal. That project proposal would need to look at the impact of min_const_generics, and what this should look like in light of that.

@rfcbot postpone

@rfcbot
Copy link
Collaborator

rfcbot commented Mar 23, 2021

Team member @joshtriplett has proposed to postpone this. The next step is review by the rest of the tagged team members:

No concerns currently listed.

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@rfcbot rfcbot added proposed-final-comment-period Currently awaiting signoff of all team members in order to enter the final comment period. disposition-postpone This RFC is in PFCP or FCP with a disposition to postpone it. labels Mar 23, 2021
@joshtriplett
Copy link
Member

For my part, I'm still interested in seeing this happen.

@rfcbot
Copy link
Collaborator

rfcbot commented Mar 24, 2021

🔔 This is now entering its final comment period, as per the review above. 🔔

@rfcbot rfcbot added final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. and removed proposed-final-comment-period Currently awaiting signoff of all team members in order to enter the final comment period. labels Mar 24, 2021
@rfcbot rfcbot added the finished-final-comment-period The final comment period is finished for this RFC. label Apr 3, 2021
@rfcbot
Copy link
Collaborator

rfcbot commented Apr 3, 2021

The final comment period, with a disposition to postpone, as per the review above, is now complete.

As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed.

The RFC is now postponed.

@rfcbot rfcbot added to-announce postponed RFCs that have been postponed and may be revisited at a later time. and removed final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. disposition-postpone This RFC is in PFCP or FCP with a disposition to postpone it. labels Apr 3, 2021
@rfcbot rfcbot closed this Apr 3, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-array Array related proposals & ideas A-const-generics Const generics related proposals. A-inference Type inference related proposals & ideas A-typesystem Type system related proposals & ideas finished-final-comment-period The final comment period is finished for this RFC. postponed RFCs that have been postponed and may be revisited at a later time. T-lang Relevant to the language team, which will review and decide on the RFC. to-announce
Projects
None yet
Development

Successfully merging this pull request may close these issues.