Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exhaustive integer matching #50912

Merged
merged 47 commits into from
Aug 22, 2018
Merged

Conversation

varkor
Copy link
Member

@varkor varkor commented May 20, 2018

This adds a new feature flag exhaustive_integer_patterns that enables exhaustive matching of integer types by their values. For example, the following is now accepted:

#![feature(exhaustive_integer_patterns)]
#![feature(exclusive_range_pattern)]

fn matcher(x: u8) {
  match x { // ok
    0 .. 32 => { /* foo */ }
    32 => { /* bar */ }
    33 ..= 255 => { /* baz */ }
  }
}

This matching is permitted on all integer (signed/unsigned and char) types. Sensible error messages are also provided. For example:

fn matcher(x: u8) {
  match x { //~ ERROR
    0 .. 32 => { /* foo */ }
  }
}

results in:

error[E0004]: non-exhaustive patterns: `32u8...255u8` not covered
 --> matches.rs:3:9
  |
6 |   match x {
  |         ^ pattern `32u8...255u8` not covered

This implements rust-lang/rfcs#1550 for #50907. While there hasn't been a full RFC for this feature, it was suggested that this might be a feature that obviously complements the existing exhaustiveness checks (e.g. for bool) and so a feature gate would be sufficient for now.

@rust-highfive
Copy link
Collaborator

r? @michaelwoerister

(rust_highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label May 20, 2018
@kennytm kennytm added the T-lang Relevant to the language team, which will review and decide on the PR/issue. label May 20, 2018
Copy link
Member

@scottmcm scottmcm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm definitely fine with merging this without an RFC (feature-gated of course, as you did).

^ pattern 32u8...255u8 not covered

This is amazing even if all it did was give that error instead of just "pattern _ not covered".


// Let's test other types too!
match '\u{0}' {
'\u{0}' ..= char::MAX => {} // ok
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since char is a USV, it doesn't actually need to cover that entire range. If it's feasible, could this handle the example in rust-lang/rfcs#1550 (comment) too? (Feel free to punt if hard.)

--> $DIR/exhaustive_integer_patterns.rs:50:11
|
LL | match x { //~ ERROR non-exhaustive patterns
| ^ patterns `-128i8...-5i8`, `120i8...121i8` and `121i8...127i8` not covered
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could adjacent ranges be merged, or at least not overlap? 121 is in two of the uncovered ranges, and it would be nice if the message was "patterns -128i8...-5i8 and 120i8...127i8 not covered" instead.

LL | match x { //~ ERROR non-exhaustive patterns
| ^ patterns `-128i8...-5i8`, `120i8...121i8` and `121i8...127i8` not covered

error: aborting due to 5 previous errors
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the holes in the examples are non-singleton. Consider a UI test for something like -127i8..=127 or match 0 { i16::MIN..=-1 => {} 1..=i16::MAX => {} }.

value_constructors = true;
vec![ConstantRange(ty::Const::from_bits(cx.tcx, min, cx.tcx.types.char),
ty::Const::from_bits(cx.tcx, max, cx.tcx.types.char),
RangeEnd::Included)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hopefully the char request is just to initialize this as a two-element vec instead...

@michaelwoerister
Copy link
Member

r? @eddyb maybe?

I16 => min_max_ty!(i16, u16, cx.tcx.types.i16),
I32 => min_max_ty!(i32, u32, cx.tcx.types.i32),
I64 => min_max_ty!(i64, u64, cx.tcx.types.i64),
I128 => min_max_ty!(i128, u128, cx.tcx.types.i128),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @oli-obk This sort of thing can be done from the bit width (which you can get from the layout of these types).

struct Interval<'tcx> {
pub lo: u128,
pub hi: u128,
pub ty: Ty<'tcx>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should use ty::layout::Integer here or similar.

/// An inclusive interval, used for precise integer exhaustiveness checking.
struct Interval<'tcx> {
pub lo: u128,
pub hi: u128,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe you can use RangeInclusive<u128> instead?

remaining_ranges.into_iter().map(|(lo, hi)| {
let (lo, hi) = Interval::offset_sign(ty, (lo, hi), false);
ConstantRange(ty::Const::from_bits(cx.tcx, lo, ty),
ty::Const::from_bits(cx.tcx, hi, ty),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should ConstantRange use ty::Const? Seems expensive, and useless unless the ty::Const is Bits (cc @oli-obk).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should definitely be using ConstValue instead of ty::Const. I put it on my TODO list

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ConstantRange can only hold integers, so ConstValue shouldn't be needed, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably. My TODO already says "ConstValue or smaller" ;). That should not block this PR though, since it's a preexisting issue.

// The pattern intersects the middle of the subrange,
// so we create two ranges either side of the intersection.)
remaining_ranges.push((subrange_lo, pat_interval.lo));
remaining_ranges.push((pat_interval.hi, subrange_hi));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you missed -1 and +1 on the fields of pat_interval, here.

for (subrange_lo, subrange_hi) in ranges {
if pat_interval.lo > subrange_hi || pat_interval.hi < subrange_lo {
// The pattern doesn't intersect with the subrange at all,
// so the subrange remains untouched.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could construct the intersection by i = a.start.max(b.start)..=a.end.min(b.end), and the two halves of the subtraction a - b by sl = a.start..=i.start-1 and sh = i.end+1..=a.end (be careful with overflow).

AFAIK, the empty condition is range.start > range.end (for inclusive ranges).

Since I don't think you need the intersection by itself, the two halves can be:
sl = a.start..=a.start.max(b.start)-1 and
sh = a.end.min(b.end)+1..=a.end
(when a.start.max(b.start) <= a.end.min(b.end), otherwise sl = sh = a)

Therefore:

if a.start > b.end || b.start > a.end {
    // No intersection.
    remaining_ranges.push(a);
} else {
    // Overflow is now not a problem, because of the conditions.
    if b.start > a.start {
        remaining_ranges.push(a.start..=b.start-1);
    }
    if b.end < a.end {
        remaining_ranges.push(b.end+1..=a.end);
    }
}

I've just checked and your code corresponds to this, but you have the 4 possibilities of my two small ifs expanded out - my main suggestion from this comment would be to orthogonalize those.

// `missing_ctors` are those that should have appeared
// as patterns in the `match` expression, but did not.
let mut missing_ctors = vec![];
'req: for req_ctor in all_ctors.clone() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this iterate over &all_ctors instead?

@eddyb
Copy link
Member

eddyb commented May 21, 2018

I left some nit-picky comments about implementation details, but r? @nikomatsakis

@rust-highfive rust-highfive assigned nikomatsakis and unassigned eddyb May 21, 2018
}
}
let (min, max, ty) = match int_ty {
Isize => min_max_ty!(isize, usize, cx.tcx.types.isize),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, this is broken, because it's using the host's usize / isize bitwidth instead of the target's.

@varkor
Copy link
Member Author

varkor commented May 22, 2018

I've addressed most of the comments, but there's still the issue with usize/isize for offset_sign_for_ty, which I'll try to get to tomorrow.

--> $DIR/exhaustive_integer_patterns.rs:50:11
|
LL | match x { //~ ERROR non-exhaustive patterns
| ^ patterns `-128i8...-6i8` and `122i8...127i8` not covered
Copy link
Member Author

@varkor varkor May 22, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These offsets do not look right now. I'll look into that.

U128 => ( u128::MAX as u128, cx.tcx.types.u128),
});
let min_max_ty = |sty| {
let size = cx.tcx.layout_of(ty::ParamEnv::reveal_all().and(sty))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should use cx.param_env or something. Also, you don't need to match on uint_ty at all! Pass pcx.ty instead.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, that's so much nicer, of course!

if pat_interval_hi < subrange_hi {
// The pattern intersects a lower section of the
// subrange, so an upper section will remain.
remaining_ranges.push((pat_interval_hi + 1, subrange_hi));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remaining_ranges could be a Vec<RangeInclusive<u128>>, i.e. use ..= syntax.

@@ -34,6 +34,9 @@ use arena::TypedArena;
use std::cmp::{self, Ordering};
use std::fmt;
use std::iter::{FromIterator, IntoIterator, repeat};
use std::{char, usize, u8, u16, u32, u64, u128, isize, i8, i16, i32, i64, i128};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These, especially usize, are a bad sign - you shouldn't need them.

let size = cx.tcx.layout_of(ty::ParamEnv::reveal_all().and(pcx.ty))
.unwrap().size.bits() as u32;
let shift = 1u128.overflowing_shl(size);
let max = shift.0.wrapping_sub(1 + (shift.1 as u128));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be implemented with just !0 >> (128 - bits).

ty::TyInt(_) if exhaustive_integer_patterns => {
let size = cx.tcx.layout_of(ty::ParamEnv::reveal_all().and(pcx.ty))
.unwrap().size.bits() as u128;
let min = (1u128 << (size - 1)).wrapping_neg();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need wrapping_neg here?

let size = cx.tcx.layout_of(ty::ParamEnv::reveal_all().and(pcx.ty))
.unwrap().size.bits() as u128;
let min = (1u128 << (size - 1)).wrapping_neg();
let max = (1u128 << (size - 1)).wrapping_sub(1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't need wrapping_sub, can just - 1.

@varkor
Copy link
Member Author

varkor commented May 23, 2018

Everything seems to be working correctly now, as far as I can tell.

@eddyb had some suggestions for refactoring ConstantRange and PatternKind::Range to use u128 instead of ty::Const, but after looking into it, it looks like it's going to require touching a lot of unrelated areas, which I'd rather avoid doing for this PR. At the very least, any negative effects of creating extra Consts are going to be limited to #![feature(exhaustive_integer_patterns)], so following this PR up later shouldn't be an issue. (Though let me know if I've misunderstood anything, @eddyb.)

@varkor varkor force-pushed the exhaustive-integer-matching branch from b3f6184 to 61b6363 Compare August 20, 2018 23:17
@varkor
Copy link
Member Author

varkor commented Aug 20, 2018

I've tried to improve the comments in split_grouped_constructors, but it's hard to know just how detailed it should be (as it's fresher in my head than it is to a reader), so let me know how it can be improved further!

}
let c = match a.1 {
Endpoint::Start => a.0,
Endpoint::End | Endpoint::Both => a.0 + 1,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this handle integer overflow (i.e. a.0 = uint128::MAX)? worth a test case, at least.

Copy link
Member Author

@varkor varkor Aug 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Edited) a.0 < b.0 strictly, so overflow shouldn't be possible. There are already test cases for covering the entire range of u128 and i128, so I think that covers the bases already. I'll add a comment to this effect.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than that and the duplication, this looks good.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A: integer overflow can't occur, because only the last point can be uint128::MAX, and only the first point can be 0.

// endpoint a point corresponds to. Whenever a point corresponds to both a start
// and an end, then we create a unit range for it.
#[derive(PartialEq, Clone, Copy, Debug)]
enum Endpoint {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A potential simplification:
I just noticed that range boundaries are always "in between" numbers - an Endpoint::Start is a boundary between p-1 and p, and an Endpoint::End is between p and p+1 (that means that an Endpoint::End is equivalent to an Endpoint::Start that immediately follows it). You could do something based on that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

e.g., I think this is a correct implementation:

/// Represents a border between 2 integers. Because of the "fencepost error",
/// there are be 2^128+1 such borders.
#[derive(Copy, Clone, PartialOrd, Ord, PartialEq, Eq)]
enum Border {
    JustBefore(u128),
    AfterU128Max
}

impl Border {
    fn before(n: u128) -> Self {
        Border::JustBefore(n)
    }

    fn after(n: u128) -> Self {
        match n.checked_add(1) {
            m => Border::JustBefore(m),
            None => Border::AfterU128Max
        }
    }

    fn range_borders(r: IntRange<'_>) -> [Self; 2] {
        let (r_lo, r_hi) = r.range.into_inner();
        vec![Border::before(r_lo), Border::after(r_hi)]
    }

    // return the integers between `self` and `to` if `self < to`, `None` otherwise.
    fn range_to(self, to: Self, ty: Ty<'tcx>) -> Option<IntRange<'tcx>> {
        match (self, to) {
            (Border::JustBefore(n), Border::JustBefore(m)) => {
                IntRange::from_interval(ty, n, m, RangeEnd::Excluded)
            }
            (Border::JustBefore(n), Border::AfterU128Max) => {
                IntRange::from_interval(ty, n, u128::MAX, RangeEnd::Included)
            }
            (Border::AfterU128Max, _) => None
        }
    }
}

// `borders` is the set of borders between equivalence classes - each equivalence
// class is between 2 borders.
let row_borders = m.iter()
    .flat_map(|row| IntRange::from_pat(tcx, row[0]))
    .flat_map(|range| ctor_range.intersection(&r))
    .flat_map(|range| Border::range_borders(range));
let range_borders = Border::range_borders(ctor_range);
let mut borders: Vec<Border> = row_borders.chain(range_borders).collect();
borders.sort();

let ranges = borders.windows(2).map(|window| {
    window[0].range_to(window[1]);
})

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do think this is cleaner :) The logic looks good to me too — I'm going to give it a go.

@arielb1
Copy link
Contributor

arielb1 commented Aug 21, 2018

LGTM. If you think borders are cleaner enough than endpoints, you can use that, otherwise r=me. (Also, add a test that 0..u128::MAX-1 is not exhaustive and write my "why integer overflow can't occur" note in a comment).

@arielb1
Copy link
Contributor

arielb1 commented Aug 21, 2018

@bors r+

@bors
Copy link
Contributor

bors commented Aug 21, 2018

📌 Commit 6971c5d has been approved by arielb1

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Aug 21, 2018
@varkor
Copy link
Member Author

varkor commented Aug 21, 2018

@arielb1: thank you for all your detailed comments and pointing me in the right direction!

@bors
Copy link
Contributor

bors commented Aug 22, 2018

⌛ Testing commit 6971c5d with merge a79cffb...

bors added a commit that referenced this pull request Aug 22, 2018
Exhaustive integer matching

This adds a new feature flag `exhaustive_integer_patterns` that enables exhaustive matching of integer types by their values. For example, the following is now accepted:
```rust
#![feature(exhaustive_integer_patterns)]
#![feature(exclusive_range_pattern)]

fn matcher(x: u8) {
  match x { // ok
    0 .. 32 => { /* foo */ }
    32 => { /* bar */ }
    33 ..= 255 => { /* baz */ }
  }
}
```
This matching is permitted on all integer (signed/unsigned and char) types. Sensible error messages are also provided. For example:
```rust
fn matcher(x: u8) {
  match x { //~ ERROR
    0 .. 32 => { /* foo */ }
  }
}
```
results in:
```
error[E0004]: non-exhaustive patterns: `32u8...255u8` not covered
 --> matches.rs:3:9
  |
6 |   match x {
  |         ^ pattern `32u8...255u8` not covered
```

This implements rust-lang/rfcs#1550 for #50907. While there hasn't been a full RFC for this feature, it was suggested that this might be a feature that obviously complements the existing exhaustiveness checks (e.g. for `bool`) and so a feature gate would be sufficient for now.
@bors
Copy link
Contributor

bors commented Aug 22, 2018

☀️ Test successful - status-appveyor, status-travis
Approved by: arielb1
Pushing a79cffb to master...

@bors bors merged commit 6971c5d into rust-lang:master Aug 22, 2018
This was referenced Aug 22, 2018
@varkor varkor deleted the exhaustive-integer-matching branch August 22, 2018 10:26
@Centril Centril added the F-exclusive_range_pattern `#![feature(exclusive_range_pattern)]` label Dec 12, 2019
bors added a commit to rust-lang-ci/rust that referenced this pull request Oct 13, 2023
Match usize/isize exhaustively with half-open ranges

The long-awaited finale to the saga of [exhaustiveness checking for integers](rust-lang#50912)!

```rust
match 0usize {
    0.. => {} // exhaustive!
}
match 0usize {
    0..usize::MAX => {} // helpful error message!
}
```

Features:
- Half-open ranges behave as expected for `usize`/`isize`;
- Trying to use `0..usize::MAX` will tell you that `usize::MAX..` is missing and explain why. No more unhelpful "`_` is missing";
- Everything else stays the same.

This should unblock rust-lang#37854.

Review-wise:
- I recommend looking commit-by-commit;
- This regresses perf because of the added complexity in `IntRange`; hopefully not too much;
- I measured each `#[inline]`, they all help a bit with the perf regression (tho I don't get why);
- I did not touch MIR building; I expect there's an easy PR there that would skip unnecessary comparisons when the range is half-open.
bors added a commit to rust-lang-ci/rust that referenced this pull request Oct 13, 2023
Match usize/isize exhaustively with half-open ranges

The long-awaited finale to the saga of [exhaustiveness checking for integers](rust-lang#50912)!

```rust
match 0usize {
    0.. => {} // exhaustive!
}
match 0usize {
    0..usize::MAX => {} // helpful error message!
}
```

Features:
- Half-open ranges behave as expected for `usize`/`isize`;
- Trying to use `0..usize::MAX` will tell you that `usize::MAX..` is missing and explain why. No more unhelpful "`_` is missing";
- Everything else stays the same.

This should unblock rust-lang#37854.

Review-wise:
- I recommend looking commit-by-commit;
- This regresses perf because of the added complexity in `IntRange`; hopefully not too much;
- I measured each `#[inline]`, they all help a bit with the perf regression (tho I don't get why);
- I did not touch MIR building; I expect there's an easy PR there that would skip unnecessary comparisons when the range is half-open.
bors added a commit to rust-lang-ci/rust that referenced this pull request Oct 21, 2023
Match usize/isize exhaustively with half-open ranges

The long-awaited finale to the saga of [exhaustiveness checking for integers](rust-lang#50912)!

```rust
match 0usize {
    0.. => {} // exhaustive!
}
match 0usize {
    0..usize::MAX => {} // helpful error message!
}
```

Features:
- Half-open ranges behave as expected for `usize`/`isize`;
- Trying to use `0..usize::MAX` will tell you that `usize::MAX..` is missing and explain why. No more unhelpful "`_` is missing";
- Everything else stays the same.

This should unblock rust-lang#37854.

Review-wise:
- I recommend looking commit-by-commit;
- This regresses perf because of the added complexity in `IntRange`; hopefully not too much;
- I measured each `#[inline]`, they all help a bit with the perf regression (tho I don't get why);
- I did not touch MIR building; I expect there's an easy PR there that would skip unnecessary comparisons when the range is half-open.
bors added a commit to rust-lang-ci/rust that referenced this pull request Nov 1, 2023
Match usize/isize exhaustively with half-open ranges

The long-awaited finale to the saga of [exhaustiveness checking for integers](rust-lang#50912)!

```rust
match 0usize {
    0.. => {} // exhaustive!
}
match 0usize {
    0..usize::MAX => {} // helpful error message!
}
```

Features:
- Half-open ranges behave as expected for `usize`/`isize`;
- Trying to use `0..usize::MAX` will tell you that `usize::MAX..` is missing and explain why. No more unhelpful "`_` is missing";
- Everything else stays the same.

This should unblock rust-lang#37854.

Review-wise:
- I recommend looking commit-by-commit;
- This regresses perf because of the added complexity in `IntRange`; hopefully not too much;
- I measured each `#[inline]`, they all help a bit with the perf regression (tho I don't get why);
- I did not touch MIR building; I expect there's an easy PR there that would skip unnecessary comparisons when the range is half-open.
github-actions bot pushed a commit to rust-lang/miri that referenced this pull request Nov 2, 2023
Match usize/isize exhaustively with half-open ranges

The long-awaited finale to the saga of [exhaustiveness checking for integers](rust-lang/rust#50912)!

```rust
match 0usize {
    0.. => {} // exhaustive!
}
match 0usize {
    0..usize::MAX => {} // helpful error message!
}
```

Features:
- Half-open ranges behave as expected for `usize`/`isize`;
- Trying to use `0..usize::MAX` will tell you that `usize::MAX..` is missing and explain why. No more unhelpful "`_` is missing";
- Everything else stays the same.

This should unblock rust-lang/rust#37854.

Review-wise:
- I recommend looking commit-by-commit;
- This regresses perf because of the added complexity in `IntRange`; hopefully not too much;
- I measured each `#[inline]`, they all help a bit with the perf regression (tho I don't get why);
- I did not touch MIR building; I expect there's an easy PR there that would skip unnecessary comparisons when the range is half-open.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. F-exclusive_range_pattern `#![feature(exclusive_range_pattern)]` finished-final-comment-period The final comment period is finished for this PR / Issue. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-lang Relevant to the language team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.