Unsized Rvalues #1909

Merged (4 commits), Feb 8, 2018
Changes from 3 commits

203 changes: 203 additions & 0 deletions text/0000-unsized-rvalues.md
- Feature Name: unsized_locals
- Start Date: 2017-02-11
- RFC PR: (leave this empty)
- Rust Issue: (leave this empty)

# Summary
[summary]: #summary

Allow for local variables, function arguments, and some expressions to have an unsized type, and implement it by storing the temporaries in variably-sized allocas.

Have repeat expressions whose length captures local variables be such an expression, returning a `[T]` slice.

Provide some optimization guarantees that unnecessary temporaries will not create unnecessary allocas.

# Motivation
[motivation]: #motivation

There are 2 motivations for this RFC:

1. Passing unsized values, such as trait objects, to functions by value is often desired. Currently, this must be done through a `Box<T>` with an unnecessary allocation.

One particularly common example is passing closures that consume their environment without using monomorphization. One would like for this code to work:

```Rust
fn takes_closure(f: FnOnce()) { f(); }
```

Member:
It's going to be hard to explain to beginners why they should be writing fn takes_closure<F: FnOnce()>(f: F) vs. fn takes_closure(f: FnOnce()). The fn takes_closure(f: FnOnce()) syntax is the easiest to read/write, IMO. However, I rarely want to pass unsized types by value or dynamically dispatch on owned types. It seems odd to grant the easiest syntax to the lesser-used feature.

I'm not sure of the best way to solve this. It might be a good idea to introduce dyn/virtual or a similar qualifier (cc @ticki) to prevent beginners from accidentally using dynamic dispatch everywhere. However, the fact that this doesn't mirror existing trait object syntax (e.g. &FnOnce() and Box<FnOnce()>) is regrettable.

P.S. Note that this problem would be made even worse with the addition of fn takes_closure(f: impl FnOnce()) or fn takes_closure(f: any FnOnce()).

Comment:

I'd think <FnOnce()>, Owned<FnOnce()>, Virtual<FnOnce()>, Trait<FnOnce()>, etc. all work if you need to discourage folks from using it. All these are slightly more consistent with existing trait object syntax than introducing a new keyword.

Contributor:

There's really nothing stopping a compiler from monomorphizing fn takes_closure(f: FnOnce()) the same way as fn takes_closure<F: FnOnce()>(f: F) in the cases where the type is statically known (i.e. in all of the cases where the second form would've compiled).

@cramertj (Member), Feb 22, 2017:

@glaebhoerl I suppose it's possible, but by the same reasoning it'd be possible for the compiler to monomorphize fn takes_closure(f: Box<FnOnce()>) and fn takes_closure(f: &FnOnce()), neither of which would be expected to produce monomorphized code. Correct me if I'm wrong, but Rust tries to be explicit about monomorphization vs. trait objects, and adding an optimization such as this seems like it would be taking control away from the user.

@burdges, Feb 22, 2017:

Also, I suppose fn takes_closure(f: FnOnce()) could mean "let the compiler do what it thinks is best", possibly including using a trait object for code size reasons due to optimization settings, stored profiling results, etc., while another syntax like fn takes_closure(f: <FnOnce()>) actually means "use an owned trait object".

@arielb1 (Contributor, Author), Feb 24, 2017:

If you don't screw up by forgetting to mark your wrapper functions #[inline] (which at the moment is also a problem with generic functions and codegen-units), I think LLVM should be smart enough to do the inlining by itself when it makes sense - the "vtable" is passed as a plain immutable reference to a pointer, so LLVM does not need any smarts to devirtualize the call.

The reason we introduced unboxed closures is not because passing boxed closures to functions is hard to optimize, it's because storing boxed closures in structs can inhibit optimizations. And storing an FnOnce within a struct will make your struct unsized and hard to use.

Member:

> storing an FnOnce within a struct will make your struct unsized and hard to use

Are you referring to my comment about whether or not to allow passing Option<FnOnce()>? Because it seems to me that Option<FnOnce()> makes a lot of sense, and doesn't seem any harder to use than FnOnce() on its own.

@arielb1 (Contributor, Author), Feb 24, 2017:

Option<FnOnce()> is not a valid type. The reason is that Option<T> requires T: Sized, which is because Option<&fn()> (size 8, pointer at offset 0) has a different layout from Option<[closure capturing usize]> (size 16, pointer at offset 8), and the vtable is not specific for Option (it's specific to the ref-to-fn-ptr or closure) so it can't contain the offset.

Member:

Sure, that's true of Option<_> as currently implemented. I'm just trying to understand what sorts of things would be allowed under this proposal-- for instance, @stjepang mentioned wanting to be able to write a struct with an unsized field. If that's allowed, why not an enum with an unsized variant?

Member:

I think you referred to this as the "unsized DSTs option."


But today you have to use a hack, such as taking a `Box<FnBox<()>>` (see the sketch just after this list).

2. Allocating a runtime-sized variable on the stack is important for good performance in some use-cases - see RFC #1808, which this is intended to supersede.
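
For illustration, a minimal sketch of the kind of workaround motivation 1 refers to; the `CallOnce` trait and `call_once_boxed` method here are made-up stand-ins for the unstable `FnBox` machinery, and the boxing is exactly the allocation this RFC wants to make unnecessary:

```Rust
// Today, consuming a closure through a trait object means boxing it and
// routing the call through a `self: Box<Self>` method (the `FnBox` trick).
trait CallOnce {
    fn call_once_boxed(self: Box<Self>);
}

impl<F: FnOnce()> CallOnce for F {
    fn call_once_boxed(self: Box<Self>) {
        (*self)() // moving the closure out of the box is allowed for `Box`
    }
}

fn takes_closure(f: Box<dyn CallOnce>) {
    f.call_once_boxed();
}

fn main() {
    let s = String::from("hello");
    takes_closure(Box::new(move || println!("{}", s)));
}
```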

# Detailed design
[design]: #detailed-design

## Unsized Rvalues - language

Remove the rule that requires all locals and rvalues to have a sized type. Instead, require the following (a short sketch of what this allows and forbids appears right after the list):

1. The following expressions must always return a Sized type:
1. Function calls, method calls, operator expressions
- implementing unsized return values for function calls would require the *called function* to do the alloca in our stack frame.
2. ADT expressions
- see alternatives
3. cast expressions
- this seems like an implementation simplicity thing. These can only be trivial casts.
2. The RHS of assignment expressions must always have a Sized type.
- Assigning an unsized type is impossible because we don't know how much memory is available at the destination. This applies to ExprAssign assignments and not to StmtLet let-statements.
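
A short sketch of what these rules permit and reject (assuming the RFC as proposed; none of this compiles on current Rust, and `returns_slice` is a hypothetical function used only for illustration):

```Rust
fn consume(x: [u8]) {                  // unsized by-value parameter: allowed
    let _ = x.len();
}

fn demo(b: Box<[u8]>, mut dest: Box<[u8]>) {
    let local: [u8] = *b;              // allowed: a `let` initializer may be unsized
    consume(local);                    // allowed: unsized values may be passed by value

    // *dest = local;                  // ERROR by rule 2: the RHS of an assignment
                                       // expression must have a Sized type
    // let y: [u8] = returns_slice();  // ERROR by rule 1.1: calls must return Sized types
}
```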

Contributor:

This got all munched together by Markdown, might want to put it in a list

Contributor:

I know what this means, but I think that the RFC would be improved by mentioning things in terms of the language as well as the compiler. Something like:

"Note that assignments covers cases where an existing lvalue is being assiged, e.g. the b in a = b; initializers in let expressions (e.g., the b in let a = b) are not subject to this restriction."

Contributor (Author):

In my mental model, the name "assignment expression" always refers to the ExprAssign AST node (for one, StmtLet is not an expression).

Contributor:

I agree, but it can't hurt to clarify.


This also allows passing unsized values to functions, with the ABI being as if a `&move` pointer was passed (a `(by-move-data, extra)` pair). This also means that methods taking `self` by value are object-safe, though vtable shims are sometimes needed to translate the ABI (as the callee-side intentionally does not pass `extra` to the fn in the vtable, no vtable shim is needed if the vtable function already takes its argument indirectly).

Contributor:

Although I suspect I agree with it, I don't really understand what this parenthetical means. 😄 Specifically: "(as the callee-side intentionally does not pass extra to the fn in the vtable, no vtable shim is needed if the vtable function already takes its argument indirectly)"

@arielb1 (Contributor, Author), Feb 22, 2017:

Suppose you have something like:

```Rust
trait Foo {
    fn bar(self, x: u32);
}

struct Bar(/* ... */);

impl Foo for Bar { /* .. */ }
```

The function (shim) `<Foo as Foo>::bar` takes three "LLVM" arguments - a fat by-move pointer to Self, and x by-value:

```
function <Foo as Foo>::bar(%data: *u8, %vtbl: *<vtable for Foo>, %x: u32) {
    %fn = vtbl.bar;
    call %fn(%data, %x); // vtbl is not passed.
}
```

Then the vtable entry for `<Bar as Foo>::bar` has the type llfn(*u8, u32). The Rust function `<Bar as Foo>::bar` has the Rust type fn(Bar, u32). If Bar is a large struct that is passed by reference, this is lowered to the LLVM type llfn(*Bar, u32), which can be bitcast. OTOH, if Bar is, say, a u32, this is lowered to llfn(u32, u32) and we need a shim that passes a reference.


For example:

```Rust
struct StringData {
    len: usize,
    data: [u8],
}

fn foo(s1: Box<StringData>, s2: Box<StringData>, cond: bool) {
    // this creates a VLA copy of either `s1.1` or `s2.1` on
    // the stack.
    let mut s = if cond {
        s1.data
    } else {
        s2.data
    };
    drop(s1);
    drop(s2);
    foo(s);
}

fn example(f: for<'a> FnOnce(&'a X<'a>)) {
    let x = X::new();
    f(x); // aka FnOnce::call_once(f, (x,));
}
```

Member:

Do you mean s1.data here? StringData does not have a .1 member.
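
As a further illustration of the ABI point above, under this proposal a trait method that takes `self` by value no longer gets in the way of object safety; a minimal sketch, with an invented `Speak` trait:

```Rust
trait Speak {
    fn greet(self, name: &str); // `self` by value, yet the trait stays object-safe
}

struct Loud;

impl Speak for Loud {
    fn greet(self, name: &str) {
        println!("HELLO, {}!", name);
    }
}

fn call(s: Box<dyn Speak>) {
    // The unsized `dyn Speak` value is moved out of the box and handed to the
    // callee as if through a `&move` pointer (a (by-move-data, extra) pair).
    s.greet("world");
}
```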

## VLA expressions

Allow repeat expressions to capture variables from their surrounding environment. If a repeat expression captures such a variable, it has type `[T]` with the length being evaluated at run-time. If the repeat expression does not capture any variable, the length is evaluated at compile-time. For example:
```Rust
extern "C" {
fn random() -> usize;
}

fn foo(n: usize) {
let x = [0u8; n]; // x: [u8]
let x = [0u8; n + (random() % 100)]; // x: [u8]
let x = [0u8; 42]; // x: [u8; 42], like today
let x = [0u8; random() % 100]; //~ ERROR constant evaluation error

Contributor:

This error definitely wants a note explaining what's going on, it looks very confusing otherwise

Contributor:

Looks like this limitation can be easily worked around.

```Rust
let n = 0;
let x = [0u8; n + (random() % 100)]; // OK
```

Contributor:

Agreed, this rule sounds good to me as long as the error message includes something like this:

note: if you want this array's size to be computed at runtime, move the expression into a temporary variable:
    let n = random() % 100;
    let x = [0u8; n];

Comment:

I think it's better to just get rid of the restriction and allow random() % 100 to work. Programmers have the intuition that one can replace let n = random() % 100; f(n) with f(random() % 100) and IMO it isn't worth fighting that intuition.

In the future Rust should create some equivalent to constexpr for expressions that guarantees that such-marked expressions are evaluated at compile time. That would be a more general mitigation for the concern here.

@petrochenkov (Contributor), Feb 20, 2017:

@briansmith
As I understand, this is mostly an implementation concern.
If a captured local value is the indicator, then the type of the array is immediately known, but detecting "constexprness" for an arbitrary expression may be unreasonably hard to do in time.

@nikomatsakis (Contributor), Feb 21, 2017:

Given how hard it was for us to find the [T; n] syntax I despair of finding another syntax though. =) I guess one question is how often people will want to create VLA on the stack, and particularly VLA where the length is not compile-time constant, but also doesn't use any local variables or other inputs!

It seems like access to a static (as opposed to const) could also trigger the "runtime-dependent" rule. For example, the following does not compile today (and rightly so, I think):

```Rust
static X: usize = 1;

fn main() {
    let v = [0; X];
}
```

Member:

That's tricky, although I'd argue that in a type, e.g. [i32; X], reading the static is perfectly fine and observes the value just-after-initialization (which also makes accessing a static before it's fully initialized a regular on-demand cycle error).

@engstad, Feb 23, 2017:

For backward compatibility, I would assume that syntax has to be different. I feel that the current proposal is way too similar to the old syntax. As a programmer, I would very much like to be able to search for places where the stack could grow unboundedly; and I think the syntax should help in that effort.

The type syntax is fine. [T] is sufficiently different from [T; N]. The constructor for VLA should reflect that initialization happens at runtime. Perhaps, instead, something along the lines of:

```Rust
fn f(n: usize) {
    let v = stack_alloc::init::<[u32]>(n, 0);
    let u: [i32] = stack_alloc::init_with(n, |i: usize| 2 * i);
}
```

Member:

> As a programmer, I would very much like to be able to search for places where the stack could grow unboundedly; and I think the syntax should help in that effort.

Uh, this also happens everywhere you have recursion. How about having a lint / editor command for this instead?

Comment:

Sure, unbounded stack-use due to recursion would also be nice to detect, but that's a different issue.

As far as using lint goes, that would do its job, but not everyone lints their code. I'm talking about "understanding what the code does by looking at it". With the current proposal, it is hard to figure out if it is an unbounded allocation or a statically bound allocation on the stack.

"captures a variable" - as in RFC #1558 - is used as the condition for making the return be `[T]` because it is simple, easy to understand, and introduces no type-checking complications.

Member:

I disagree that it is simple to understand; at least, I need to read the RFC + comments 3 times to see why [0u8; random() % 100] is an error while [0u8; n + random() % 100] is fine.

I don't see why #1558's rule should apply to VLA, there is no difference between random() % 100 and n + random() % 100 that makes the former unsuitable for VLA.

I'd rather have a core::intrinsic::make_vla::<u8>(0u8, random() % 100) than such a strange rule.

@glaebhoerl (Contributor), Feb 20, 2017:

FWIW the "previous" rule suggested by @eddyb was that only array-repeat-literals with their expected type explicitly specified as being [T] would be allowed to be VLAs, that is:

```Rust
fn foo(n: usize) {
    let x = [0u8; n]; // error
    let x: [u8] = [0u8; n]; // OK
    let x = [0u8; random() % 100]; // error
    let x: [u8] = [0u8; random() % 100]; // OK

    fn expects_slice(arg: &[u8]) { ... }

    expects_slice(&[0u8; n + random()]); // also OK

    fn expects_ref<T: ?Sized>(arg: &T) { ... }

    expects_ref(&[0u8; random()]); // error!

    expects_ref::<[T]>(&[0u8; random()]); // presumably OK!
}
```

This might be less ergonomic (presumably why the new rule was chosen instead), but I do like the explicitness.

Contributor:

I like the explicitness, but I worry this doesn't sit well with the principle that explicit type annotations are "just another" way to constrain type inference, rather than required à la C or Go.

Contributor (Author):

The "expected type" rule is also fine. Not sure which of these is easier to explain.

Member:

I don't understand why we need to make VLA easy to construct by overloading the [x; n] syntax; it is an advanced feature which normally no one should use.

Comment:

> it is an advanced feature which normally no one should use.

I would be fine with a totally different syntax (in general I concede syntax to others).

I disagree that we should conclude that VLAs are an advanced feature, though. I think that it's better to start off trying to make VLAs as natural and ergonomic as possible, and see if they actually get abused in such a way that people write bad or dangerous (stack-overflowing) code with them. If, based on initial experience, the feature turns out to actually be dangerous enough to justify making it harder to use, then the syntax can be changed.

@nikomatsakis (Contributor), Feb 21, 2017:

👍 to this simple condition

EDIT: I ... think I still feel the same, but I hadn't read the above conversation when I wrote this. See my comments there.

@glaebhoerl (Contributor), Feb 21, 2017:

@nikomatsakis another now-hidden conversation https://github.com/rust-lang/rfcs/pull/1909/files/9e2260e1cd31fdb4fd59382d465806877517ea8a#discussion_r101970548

EDIT: eh, I can't seem to make that link work no matter how I try. anyway, it's under one of the Show Outdated expanders.


The last error message could have a user-helpful note, for example "extract the length to a local variable if you want a variable-length array".

## Unsized Rvalues - MIR

The way this is implemented in MIR is that operands, rvalues, and temporaries are allowed to be unsized. An unsized operand is always "by-ref". Unsized rvalues are either a `Use` or a `Repeat` and both can be translated easily.

Unsized locals can never be reassigned within a scope. When first assigning to an unsized local, a stack allocation is made with the correct size.
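
In surface-Rust terms the rule looks roughly like this (a sketch assuming the RFC; `n` and `m` are runtime values):

```Rust
fn demo(n: usize, m: usize) {
    let mut a: [u8] = [0u8; n]; // first assignment: an alloca of `n` bytes is made here
    a.reverse();                // fine: mutating the existing allocation in place
    // a = [0u8; m];            // ERROR: an unsized local cannot be reassigned, because
                                // the size of its stack allocation is already fixed
}
```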

Comment:

Why not allow reassignment with the requirement that the size must be the same, with a dynamic check, similar to how copy_from_slice() works?

In my case, I'd have local values of struct Elem { value: [usize] }, and all the local values in a function of type Elem would use the same length for the value, so (hopefully) the compiler could easily optimize the equal-length checks away.

Contributor (Author):

Because a = b; is not supposed to be doing magic dynamic checks behind your back. You can use copy_from_slice if you want.

Comment:

> Because a = b; is not supposed to be doing magic dynamic checks behind your back. You can use copy_from_slice if you want.

a = b[0] does a “magic dynamic check behind your back” and it's already considered acceptable. This isn't any different, except it probably will be easier to optimize away than the array index check.

@eternaleye, Feb 20, 2017:

Yes, it is different, and no, a = b[0] is neither "magic" nor "behind your back".

It assigns the result of an indexing operation, which has a syntactic representation. All such cases, OpAssign included, have some extra syntax on one side of the assignment or another.

Assignment of VLAs would not, which is what makes it both "magic" and "behind your back".

(And furthermore, it's worth noting that all of the dynamic checks in such cases come from the Op, not the Assign.)

Contributor (Author):

That's the b[0], you mean? The problem with the check on assignment is that assignment expressions normally can't panic, so a very innocuous-looking assignment statement could cause runtime panics, especially because variants of vec[0..0] = *w; would compile and panic, confusing Python programmers.

Comment:

> The problem with the check on assignment is that assignment expressions normally can't panic, so a very innocuous-looking assignment statement could cause runtime panics, especially because variants of vec[0..0] = *w; would compile and panic, confusing Python programmers.

I think this is also good to note in the RFC.

Would at least the following work?

```Rust
#[derive(Clone)]
struct Elem([usize]);
```

I believe Clone for such a type could be implemented manually:

```Rust
impl Clone for Elem {
    fn clone(&self) -> Self {
        let value_len = self.0.len();
        let mut r = Elem([0; value_len]);
        r.0.copy_from_slice(&self.0);
        r
    }
}
```

Contributor:

This is a tricky thing. In general, unsized return values can't quite work without some additional machinery because the caller has no idea how much space to allocate. In other words, here, how is the caller to know that you are returning an Elem with the same length as the input had?

To handle that properly would I think require making some kind of (erased) parameter that lets you say that the length of the return value is the same as the input. (Similar to how Java wildcard capture can allow you to do some things kind of like that, or full on existentials.) I would be more precise but I'm kind of at a loss for how that would look in Rust syntax at the moment.

A related thought I had once is that we could potentially make a trait with -> Self methods be object safe: this works precisely because the underlying type must be sized, and because we control the impls enough to make sure (statically) that the returned value will always be of the same type that the receiver has (so we can allocate the right amount of space by examining how much space the receiver wants).

I could imagine trying to do something similar but with array lengths.


MIR construction remains unchanged.

## Guaranteed Temporary Elision

MIR likes to create lots of temporaries for order-of-evaluation (OOE) reasons. We should optimize them out in a guaranteed way in these cases (FIXME: extend these guarantees to locals aka NRVO?).

TODO: add description of problem & solution.

# How We Teach This
[teach]: #how-we-teach-this

Passing arguments to functions by value should not be too complicated to teach. I would like VLAs to be mentioned in the book.

The "guaranteed temporary elimination" rules require more work to teach. It might be better to come up with new rules entirely.

# Drawbacks
[drawbacks]: #drawbacks

In unsafe code, it is very easy to create unintended temporaries, such as in:
```Rust
unsafe fn poke(ptr: *mut [u8]) { /* .. */ }
unsafe fn foo(mut a: [u8]) {
    let ptr: *mut [u8] = &mut a;
    // here, `a` must be copied to a temporary, because
    // `poke(ptr)` might access the original.
    bar(a, poke(ptr));
}
```

If we make `[u8]` be `Copy`, creating unintended temporaries would be even easier, because even uses of `poke(ptr);` after the function call could potentially access the supposedly-valid data behind `a`.

And even though it is not as easy, it is also possible to accidentally create temporaries in safe code.

Unsized temporaries are dangerous - they can easily cause aborts through stack overflow.

# Alternatives
[alternatives]: #alternatives

## The bikeshed

There are several alternative options for the VLA syntax.

1. The RFC choice, `[t; φ]` has type `[T; φ]` if `φ` captures no variables and type `[T]` if φ captures a variable.
- pro: can be understood using "HIR"/resolution only.
- pro: requires no additional syntax.
- con: might be confusing at first glance.
- con: `[t; foo()]` requires the length to be extracted to a local.
2. The "permissive" choice: `[t; φ]` has type `[T; φ]` if `φ` is a constexpr, otherwise `[T]`
- pro: allows the most code
- pro: requires no additional syntax.
- con: depends on what is exactly a const expression. This is a big issue because that is both non-local and might change between rustc versions.
3. Use the expected type - `[t; φ]` has type `[T]` if it is evaluated in a context that expects that type (for example `[t; foo()]: [T]`) and `[T; _]` otherwise.
- pro: in most cases, very human-visible.
- pro: requires no additional syntax.
- con: relies on the notion of "expected type". While I think we *do* have to rely on that in the unsafe code semantics of `&foo` borrow expressions (as in, whether a borrow is treated as a "safe" or "unsafe" borrow - I'll write more details sometime), it might be better to not rely on expected types too much.
4. Use an explicit syntax, for example `[t; virtual φ]`.
- bikeshed: exact syntax.
- pro: very explicit and visible.
- con: more syntax.
5. Use an intrinsic, `std::intrinsics::repeat(t, n)` or something.
- pro: theoretically minimizes changes to the language.
- con: requires returning unsized values from intrinsics.
- con: unergonomic to use.

@kennytm (Member), Feb 21, 2017:

We could always define a macro

```Rust
// unsafe fn ::core::intrinsics::alloca<T>(n: count) -> *mut [T]
//
// or something.

macro_rules! alloca {
    ($t:expr, $n:expr) => {
        unsafe {
            let ptr = ::core::intrinsics::alloca::<T>($n);
            clone_into_uninitialized_slice(ptr, $t);
            *ptr
        }
    }
}
```

`alloca![t; n]`

to hide the intrinsics, just like vec! hiding collections::vec::from_elem.

OK this sounds like patching alloca on top of this RFC 😝 (I think unsized rvalue and VLA syntax should probably be split into two RFCs)


## Unsized ADT Expressions

Allowing unsized ADT expressions would make unsized structs constructible without using unsafe code, as in:
```Rust
let len_ = s.len();
let p = Box::new(PascalString {
    length: len_,
    data: *s
});
```
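
`PascalString` is not defined at this point in the RFC; for the example to make sense it would be an unsized struct along these lines (an assumption, mirroring `StringData` above), with `s` being something like a boxed `[u8]`:

```Rust
struct PascalString {
    length: usize,
    data: [u8],
}
```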

Comment:

I think this really needs to be supported:

```Rust
#[derive(Copy, Clone)]
struct Elem {
    // ...
    value: [T],
}
```

I understand the concern expressed above about uses of Box adding surprising allocas. However, I think this feature is actually more likely to be used by code, like mine, that is trying to avoid Box completely. In particular, if we're willing/able to use the heap instead of the stack then we don't miss stack-allocated VLAs nearly as much.

@arielb1 (Contributor, Author), Feb 20, 2017:

Clone::clone has Self: Sized, so this should work depending on the intelligence of derive(Clone) (and more to the point, implementing Clone/Copy yourself would work).

Contributor:

@arielb1

> Clone::clone has Self: Sized, so this should work depending on the intelligence of derive(Clone) (and more to the point, implementing Clone/Copy yourself would work).

I'm confused by this. How could this work unless we modify the Clone trait?


However, without some way to guarantee that this can be done without allocas, that might be a large footgun.

@briansmith, Feb 20, 2017:

The problem with large allocas is already a problem today without this feature. In fact this feature helps resolve the existing problem.

Consider:

```Rust
#[allow(non_snake_case)] // Use the standard names.
pub struct RSAKeyPair {
    n: bigint::Modulus<N>,
    e: bigint::PublicExponent,
    p: bigint::Modulus<P>,
    q: bigint::Modulus<Q>,
    dP: bigint::OddPositive,
    dQ: bigint::OddPositive,
    qInv: bigint::Elem<P, R>,
    qq: bigint::Modulus<QQ>,
    q_mod_n: bigint::Elem<N, R>,
    r_mod_p: bigint::Elem<P, R>, // 1 (mod p), Montgomery encoded.
    r_mod_q: bigint::Elem<Q, R>, // 1 (mod q), Montgomery encoded.
    rrr_mod_p: bigint::Elem<P, R>, // RR (mod p), Montgomery encoded.
    rrr_mod_q: bigint::Elem<Q, R>, // RR (mod q), Montgomery encoded.
}
```

Oversimplifying a bit, each one of those values is ideally a [u8] with a length between 128 and 512 bytes. But instead, I am currently in the process of making them [u8; 1024] because Rust doesn't have VLAs (other parts of my code need 1024-byte values for these types, but the RSAKeyPair operates on shorter values). So, currently I'd need about 13KB or 26KB of stack space when constructing one of these already today. VLAs would actually reduce the expected stack space usage, even without implementing any sort of solution to this problem.

@eternaleye, Feb 20, 2017:

@briansmith: So, I don't think the RSAKeyPair use case is actually viable even with the most generous expansion of this, because it not only needs VLAs, it requires a struct with more than one unsized member.

This is something completely unsupported in Rust at present, poses major layout difficulties, would require multiple different runtime sizes (which is a big thing to extend fat-pointers to do), and I'm hugely unsure if LLVM will even permit this without a great deal of effort, as Clang doesn't even support VLAIS, a notable hole in its GCC compatibility.

I suspect that use case is considerably better handled by const-dependent types, and possibly a later extension of DST to support unsizing a type with a single const-dependent parameter, to a type with none (and a fat-pointer, fattened by that value)

Comment:

> @briansmith: So, I don't think the RSAKeyPair use case is actually viable even with the most generous expansion of this, because it not only needs VLAs, it requires a struct with more than one unsized member.

Good point that VLAs won't directly solve this problem. However, my main point here is that the problem with Box potentially allocating large numbers of giant values on the stack before they are put on the heap already exists, for any large structure with large elements.

I think the compiler just needs to put in a little effort to ensure that it properly optimizes (minimizes) stack usage for code using this pattern:

```Rust
let large_vla_1 = ....;
let boxed_vla_1 = Box::new(large_vla_1);
let large_vla_2 = ....;
let boxed_vla_2 = Box::new(large_vla_2);
...
let large_vla_n = ....;
let boxed_vla_n = Box::new(large_vla_n);
```

In particular, it should be optimized into this:

```Rust
let boxed_vla_1;
let boxed_vla_2;
...
let boxed_vla_n;
{
    let large_vla_1 = ....; // alloca
    boxed_vla_1 = Box::new(large_vla_1);
} // pop `large_vla_1` off the stack;
{
    let large_vla_2 = ....; // alloca
    boxed_vla_2 = Box::new(large_vla_2);
} // deallocate `large_vla_2`.
...
{
    let large_vla_n = ....; // alloca
    boxed_vla_n = Box::new(large_vla_n);
} // deallocate `large_vla_n`.
```

Member:

@briansmith That's the point of the box expression (#809 / rust-lang/rust#22181).

Contributor (Author):

@briansmith

Unfortunately, placement of allocas is entirely controlled by LLVM in both of your code cases. I believe that it should generate the "optimal" code in both cases, as long as you don't take references to large_vla_N before boxing them.

Comment:

@briansmith Would it be possible to solve the problem with raw byte arrays and autogenerated unsafe code?


## Copy Slices

One somewhat-orthogonal proposal that came up was to make `Clone` (and therefore `Copy`) not depend on `Sized`, and to make `[u8]` be `Copy`, by moving the `Self: Sized` bound from the trait to the methods, i.e. using the following declaration:

Contributor:

Just wanted to note that removing a supertrait is a breaking change

Contributor:

True, but Sized is...special, in this case.

Also getting rid of the arbitrary limit of 32 array elements would make up for a lot...let's see a crater run before arguing further.

Comment:

The reference to the concrete type [u8] here is confusing. Did you mean [T] or something different?

Contributor (Author):

I mean T where T: Copy

```Rust
pub trait Clone {
    fn clone(&self) -> Self where Self: Sized;
    fn clone_from(&mut self, source: &Self) where Self: Sized {
        // ...
    }
}
```

That would be a backwards-compatibility-breaking change, because today `T: Clone + ?Sized` (or of course `Self: Clone` in a trait context, with no implied `Self: Sized`) implies that `T: Sized`, but it might be that its impact is small enough to allow (and even if not, it might be worth it for Rust 2.0).
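
For illustration, the kind of code this would enable (a sketch assuming both this RFC and the modified `Clone` above, so that `[u8]: Copy` holds):

```Rust
fn dup(x: &[u8]) {
    // With `[u8]: Copy`, dereferencing a shared borrow copies the bytes into a
    // variably-sized stack allocation instead of being a move-out-of-borrow error.
    let on_stack: [u8] = *x;
    let _ = on_stack.len();
}
```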

# Unresolved questions
[unresolved]: #unresolved-questions

How can we mitigate the risk of unintended unsized or large allocas? Note that the problem already exists today with large structs/arrays. A MIR lint against large/variable stack sizes would probably help users avoid these stack overflows. Do we want it in Clippy? rustc?

How do we handle truly-unsized DSTs when we get them? They can theoretically be passed to functions, but they can never be put in temporaries.

Accumulative allocas (aka `'fn` borrows) are beyond the scope of this RFC.

See alternatives.