Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

std: Add retain method for HashMap and HashSet #39560

Merged
merged 1 commit into from
Feb 15, 2017
Merged

std: Add retain method for HashMap and HashSet #39560

merged 1 commit into from
Feb 15, 2017

Conversation

F001
Copy link
Contributor

@F001 F001 commented Feb 5, 2017

Fix #36648

r? @bluss

@rust-highfive
Copy link
Collaborator

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @bluss (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

/// map.retain(|&k, _| k % 2 == 0);
/// assert_eq!(map.len(), 3);
/// ```
#[stable(feature = "retain_hash_map", since = "1.17.0")]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should enter the source tree as an unstable feature. The feature name is good. I'll just give you an initial review libs team will decide on ultimately accepting this or not.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What we learned from Vec::retain's restrictions is that mutable access while retaining is useful. In this case it should then be that it passes &K, &mut V.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Accept.

/// set.retain(|&k| k % 2 == 0);
/// assert_eq!(set.len(), 3);
/// ```
#[stable(feature = "retain_hash_set", since = "1.17.0")]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same (unstable) The two methods can use the same feature name, that makes it simpler; they will almost surely be stabilized later as a unit.

for (h, k, v) in old_table.into_iter() {
if f(&k, &v) {
self.insert_hashed_nocheck(h, k, v);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation looks good. It could also use the ordered insertion loop that resize is using, since we're iterating in hash order and preserving the order.

Either of those are not in place, and ideally this operation should be in place, I wonder if that can be done efficiently? I'd prefer to have this API even if it is not implemented in place. Since it's only ever removing keys, I don't see any traps (cases where it runs very slowly).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Iterating the map buckets in reverse order and deleting when needed should be efficient (lower number of backward shift) but I'm not sure if the current XXXBucket interface supports that.

Copy link
Contributor

@pczarn pczarn Feb 5, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current bucket table interface doesn't support reverse iteration over buckets. Adding that support is as simple as writing a prev method, though. For deletion, you should be able to use code similar to the existing backward shift.

Anyway, @F001's implementation is a good starting point.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated this function by "in place" algorithm. And used reverse iteration.

Though, the benchmark data has not collected. I don't know whether it is more efficient.

@bluss
Copy link
Member

bluss commented Feb 5, 2017

cc @pczarn @arthurprs

@alexcrichton alexcrichton added the T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. label Feb 6, 2017
Copy link
Contributor

@arthurprs arthurprs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great addition.

Even if the non-inplace algorithm performs better in some cases, I think the inplace is the way to go in order to allow caller to intentionally avoid the memory allocator.

// during insertion. We must skip forward to a bucket that won't
// get reinserted too early and won't unfairly steal others spot.
// This eliminates the need for robin hood.
pub fn buffer_head(table: M) -> Bucket<K, V, M> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about head_bucket here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

}
Empty(b) => {
// Encountered a hole between clusters.
b.into_bucket()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that we use it for retain we should early return on an empty bucket as well, avoiding walking the buckets more than needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this. Empty bucket and occupied bucket are not in separate groups. I have to walk all the buckets without missing any one, just like the clone method does.

Could you please explain more about this?

Copy link
Contributor

@arthurprs arthurprs Feb 8, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Due to the way RobinHood hashing works as soon as you encounter an empty bucket the next non-empty bucket is guaranteed to have displacement = 0. Right now the code goes all the way to the first full bucket with displacement = 0, which is fine for both use cases but may mean extra work for retain. Changing it to return on empty bucket as well changes nothing for resize and possibly saves this bit of work on retain. Micro, but still...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nevermind this, it's probably not worth it.

Copy link
Contributor Author

@F001 F001 Feb 8, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still struggling to fully understand the algorithm...

Thank you all the same.

Is there any other issue there?

@@ -1061,6 +1129,45 @@ impl<K: Clone, V: Clone> Clone for RawTable<K, V> {
}
}

impl<K: Debug, V: Debug> Debug for RawTable<K, V> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we keep these out to avoid unnecessary code? The end user can't use these anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

@@ -416,22 +416,26 @@ fn search_hashed<K, V, M, F>(table: M, hash: SafeHash, mut is_match: F) -> Inter
}
}

fn pop_internal<K, V>(starting_bucket: FullBucketMut<K, V>) -> (K, V) {
fn pop_internal<K, V>(starting_bucket: FullBucketMut<K, V>)
-> (K, V, &mut RawTable<K, V>)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason not to return Bucket<K, V, ...> instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Accept.

Copy link
Contributor

@arthurprs arthurprs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, just one small issue.

@@ -416,22 +416,26 @@ fn search_hashed<K, V, M, F>(table: M, hash: SafeHash, mut is_match: F) -> Inter
}
}

fn pop_internal<K, V>(starting_bucket: FullBucketMut<K, V>) -> (K, V) {
fn pop_internal<K, V>(starting_bucket: FullBucketMut<K, V>)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you revert this to return the &mut Table (reverting my previous suggestion)? Sorry about that.
Right now it isn't consistent regarding what bucket it returns. Fixing that would increase the diff even more for no good reason.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. Should I squash all the commits together?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please do.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@F001
Copy link
Contributor Author

F001 commented Feb 9, 2017

@bluss @pczarn , do you have other concerns?

Copy link
Contributor

@arthurprs arthurprs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@alexcrichton
Copy link
Member

@bors: r+

@bors
Copy link
Contributor

bors commented Feb 13, 2017

📌 Commit d90a7b3 has been approved by alexcrichton

@bors
Copy link
Contributor

bors commented Feb 13, 2017

⌛ Testing commit d90a7b3 with merge cccf756...

@steveklabnik
Copy link
Member

@bors: retry

frewsxcv added a commit to frewsxcv/rust that referenced this pull request Feb 14, 2017
std: Add retain method for HashMap and HashSet

Fix rust-lang#36648

r? @bluss
frewsxcv added a commit to frewsxcv/rust that referenced this pull request Feb 14, 2017
std: Add retain method for HashMap and HashSet

Fix rust-lang#36648

r? @bluss
frewsxcv added a commit to frewsxcv/rust that referenced this pull request Feb 14, 2017
std: Add retain method for HashMap and HashSet

Fix rust-lang#36648

r? @bluss
@bors
Copy link
Contributor

bors commented Feb 14, 2017

⌛ Testing commit d90a7b3 with merge 9344cd3...

@arthurprs
Copy link
Contributor

I overlooked it before but we may want a have a &V instead of &mut V for consistency with Vec and other friends (#25477) even if it's strictly less useful.

@alexcrichton
Copy link
Member

Ah no that's intentional, we'd like to change Vec but we're unfortunately unable to :(

@bors
Copy link
Contributor

bors commented Feb 14, 2017

💔 Test failed - status-travis

@alexcrichton
Copy link
Member

alexcrichton commented Feb 14, 2017 via email

frewsxcv added a commit to frewsxcv/rust that referenced this pull request Feb 15, 2017
std: Add retain method for HashMap and HashSet

Fix rust-lang#36648

r? @bluss
@bors
Copy link
Contributor

bors commented Feb 15, 2017

⌛ Testing commit d90a7b3 with merge ea8c629...

bors added a commit that referenced this pull request Feb 15, 2017
std: Add retain method for HashMap and HashSet

Fix #36648

r? @bluss
@bors
Copy link
Contributor

bors commented Feb 15, 2017

☀️ Test successful - status-appveyor, status-travis
Approved by: alexcrichton
Pushing ea8c629 to master...

@bors bors merged commit d90a7b3 into rust-lang:master Feb 15, 2017
@mbrubeck mbrubeck mentioned this pull request Apr 17, 2017
2 tasks
@F001 F001 deleted the retainHashMap branch November 16, 2017 08:44
Dylan-DPC-zz pushed a commit to Dylan-DPC-zz/rust that referenced this pull request Nov 13, 2020
Implement BTreeMap::retain and BTreeSet::retain

Adds new methods `BTreeMap::retain` and `BTreeSet::retain`.  These are implemented on top of `drain_filter` (rust-lang#70530).

The API of these methods is identical to `HashMap::retain` and `HashSet::retain`, which were implemented in rust-lang#39560 and stabilized in rust-lang#36648.  The docs and tests are also copied from HashMap/HashSet.

The new methods are unstable, behind the `btree_retain` feature gate, with tracking issue rust-lang#79025.  See also rust-lang/rfcs#1338.
Dylan-DPC-zz pushed a commit to Dylan-DPC-zz/rust that referenced this pull request Nov 13, 2020
Implement BTreeMap::retain and BTreeSet::retain

Adds new methods `BTreeMap::retain` and `BTreeSet::retain`.  These are implemented on top of `drain_filter` (rust-lang#70530).

The API of these methods is identical to `HashMap::retain` and `HashSet::retain`, which were implemented in rust-lang#39560 and stabilized in rust-lang#36648.  The docs and tests are also copied from HashMap/HashSet.

The new methods are unstable, behind the `btree_retain` feature gate, with tracking issue rust-lang#79025.  See also rust-lang/rfcs#1338.
Dylan-DPC-zz pushed a commit to Dylan-DPC-zz/rust that referenced this pull request Nov 14, 2020
Implement BTreeMap::retain and BTreeSet::retain

Adds new methods `BTreeMap::retain` and `BTreeSet::retain`.  These are implemented on top of `drain_filter` (rust-lang#70530).

The API of these methods is identical to `HashMap::retain` and `HashSet::retain`, which were implemented in rust-lang#39560 and stabilized in rust-lang#36648.  The docs and tests are also copied from HashMap/HashSet.

The new methods are unstable, behind the `btree_retain` feature gate, with tracking issue rust-lang#79025.  See also rust-lang/rfcs#1338.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants