Cache conscious hashmap table #36692

arthurprs · 2016-09-24T09:43:02Z

Right now the internal HashMap representation is three unzipped arrays, hhhkkkvvv. I propose changing it to hhhkvkvkv (in further iterations, kvkvkvhhh may allow in-place growth). A previous attempt is at #21973.

benefits

This layout is generally more cache conscious as it makes the value immediately accessible after a key matches. Keeping the hash array separate is a no-brainer because of how the Robin Hood (RH) algorithm works, and that part is unchanged.

Lookups: Upon a successful match in the hash array, the code can check the key and immediately have access to the value in the same or the next cache line (effectively saving an L[1,2,3] miss compared to the current layout).
Inserts/Deletes/Resize: Moving values in the table (robin hooding it) is faster because it touches consecutive cache lines and uses fewer instructions.
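To make the difference concrete, here is a minimal sketch of a lookup in both layouts. It is not the libstd implementation: the type names `Unzipped` and `Interleaved` are made up for illustration, it uses plain linear probing without the Robin Hood displacement logic, and it assumes a hash of 0 marks an empty bucket.

```rust
const EMPTY: u64 = 0; // hash value reserved for "empty bucket" in this sketch

/// Old layout: three parallel arrays (hhh kkk vvv).
struct Unzipped<K, V> {
    hashes: Vec<u64>,
    keys: Vec<K>,
    vals: Vec<V>,
}

/// Proposed layout: separate hashes, interleaved pairs (hhh kvkvkv).
struct Interleaved<K, V> {
    hashes: Vec<u64>,
    pairs: Vec<(K, V)>,
}

impl<K: PartialEq, V> Unzipped<K, V> {
    fn get(&self, hash: u64, key: &K) -> Option<&V> {
        let cap = self.hashes.len();
        if cap == 0 {
            return None;
        }
        let mut i = (hash % cap as u64) as usize;
        for _ in 0..cap {
            if self.hashes[i] == EMPTY {
                return None; // probe sequence ended without a match
            }
            if self.hashes[i] == hash && self.keys[i] == *key {
                // The value lives in a third array, so this load is likely
                // to touch a cache line unrelated to the key's.
                return Some(&self.vals[i]);
            }
            i = (i + 1) % cap;
        }
        None
    }
}

impl<K: PartialEq, V> Interleaved<K, V> {
    fn get(&self, hash: u64, key: &K) -> Option<&V> {
        let cap = self.hashes.len();
        if cap == 0 {
            return None;
        }
        let mut i = (hash % cap as u64) as usize;
        for _ in 0..cap {
            if self.hashes[i] == EMPTY {
                return None;
            }
            if self.hashes[i] == hash {
                // Key and value sit next to each other, so after the key
                // check the value is in the same or the next cache line.
                let (k, v) = &self.pairs[i];
                if k == key {
                    return Some(v);
                }
            }
            i = (i + 1) % cap;
        }
        None
    }
}
```

Both `get` methods walk the hash array identically; the only difference is where the key and value are read from once a hash matches.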

Some supporting benchmarks (besides the ones below) for the benefits of this layout can also be seen here: http://www.reedbeta.com/blog/2015/01/12/data-oriented-hash-table/

drawbacks

The obvious drawback is that padding can be wasted between the key and the value. Because of that, keys(), values() and contains() can consume more cache and be slower.

Total wasted padding between items (C being the capacity of the table).

Old layout: C * (K-K padding) + C * (V-V padding)
Proposed: C * (K-V padding) + C * (V-K padding)

In practice, padding between K-K and V-V can be smaller than between K-V and V-K. The overhead is capped(ish) at sizeof u64 - 1, so we can actually measure the worst case (a u8 at the end of the key type and a value with an alignment of 1, hardly the average case in practice).

Starting from the worst case the memory overhead is:

HashMap<u64, u8> 41% memory overhead (24 vs. 17 bytes per bucket, aka worst case).
HashMap<u64, u16> 33% memory overhead.
HashMap<u64, u32> 20% memory overhead.
HashMap<T, T> 0% memory overhead.

Worst case based on x = sizeof K + sizeof V:

x	16	24	32	64	128
(8+x+7)/(8+x)	1.29	1.22	1.18	1.10	1.05
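The per-bucket estimate above can be checked with `std::mem::size_of`. This is a rough sketch under a few assumptions: a 64-bit target where `u64` is 8-byte aligned, one `u64` hash per bucket, and no padding between the hash array and the rest of the table. The function names are hypothetical and only exist for this illustration.

```rust
use std::mem::size_of;

/// Bytes per bucket in the old layout: hash, key and value each live in
/// their own array, so no padding is needed between a key and a value.
fn old_bucket_bytes<K, V>() -> usize {
    size_of::<u64>() + size_of::<K>() + size_of::<V>()
}

/// Bytes per bucket in the proposed layout: one hash plus one (K, V) pair.
/// size_of::<(K, V)>() already includes any padding inside the pair.
fn new_bucket_bytes<K, V>() -> usize {
    size_of::<u64>() + size_of::<(K, V)>()
}

/// Extra memory used by the proposed layout, in percent of the old layout.
fn overhead_percent<K, V>() -> f64 {
    let old = old_bucket_bytes::<K, V>() as f64;
    let new = new_bucket_bytes::<K, V>() as f64;
    (new / old - 1.0) * 100.0
}
```

Calling `overhead_percent::<u64, u8>()` gives the worst-case figure for an 8-byte key, while `overhead_percent::<u64, u64>()` shows that same-sized keys and values cost nothing extra.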

benchmarks

I have a test repo for running the benchmarks: https://github.com/arthurprs/hashmap2/tree/layout

 ➜  hashmap2 git:(layout) ✗ cargo benchcmp hhkkvv:: hhkvkv:: bench.txt
 name                            hhkkvv:: ns/iter  hhkvkv:: ns/iter  diff ns/iter   diff % 
 grow_10_000                     922,064           783,933               -138,131  -14.98% 
 grow_big_value_10_000           1,901,909         1,171,862             -730,047  -38.38% 
 grow_fnv_10_000                 443,544           418,674                -24,870   -5.61% 
 insert_100                      2,469             2,342                     -127   -5.14% 
 insert_1000                     23,331            21,536                  -1,795   -7.69% 
 insert_100_000                  4,748,048         3,764,305             -983,743  -20.72% 
 insert_10_000                   321,744           290,126                -31,618   -9.83% 
 insert_int_bigvalue_10_000      749,764           407,547               -342,217  -45.64% 
 insert_str_10_000               337,425           334,009                 -3,416   -1.01% 
 insert_string_10_000            788,667           788,262                   -405   -0.05% 
 iter_keys_100_000               394,484           374,161                -20,323   -5.15% 
 iter_keys_big_value_100_000     402,071           620,810                218,739   54.40% 
 iter_values_100_000             424,794           373,004                -51,790  -12.19% 
 iterate_100_000                 424,297           389,950                -34,347   -8.10% 
 lookup_100_000                  189,997           186,554                 -3,443   -1.81% 
 lookup_100_000_bigvalue         192,509           189,695                 -2,814   -1.46% 
 lookup_10_000                   154,251           145,731                 -8,520   -5.52% 
 lookup_10_000_bigvalue          162,315           146,527                -15,788   -9.73% 
 lookup_10_000_exist             132,769           128,922                 -3,847   -2.90% 
 lookup_10_000_noexist           146,880           144,504                 -2,376   -1.62% 
 lookup_1_000_000                137,167           132,260                 -4,907   -3.58% 
 lookup_1_000_000_bigvalue       141,130           134,371                 -6,759   -4.79% 
 lookup_1_000_000_bigvalue_unif  567,235           481,272                -85,963  -15.15% 
 lookup_1_000_000_unif           589,391           453,576               -135,815  -23.04% 
 merge_shuffle                   1,253,357         1,207,387              -45,970   -3.67% 
 merge_simple                    40,264,690        37,996,903          -2,267,787   -5.63% 
 new                             6                 5                           -1  -16.67% 
 with_capacity_10e5              3,214             3,256                       42    1.31%

➜  hashmap2 git:(layout) ✗ cargo benchcmp hhkkvv:: hhkvkv:: bench.txt                                           
 name                           hhkkvv:: ns/iter  hhkvkv:: ns/iter  diff ns/iter   diff % 
 iter_keys_100_000              391,677           382,839                 -8,838   -2.26% 
 iter_keys_1_000_000            10,797,360        10,209,898            -587,462   -5.44% 
 iter_keys_big_value_100_000    414,736           662,255                247,519   59.68% 
 iter_keys_big_value_1_000_000  10,147,837        12,067,938           1,920,101   18.92% 
 iter_values_100_000            440,445           377,080                -63,365  -14.39% 
 iter_values_1_000_000          10,931,844        9,979,173             -952,671   -8.71% 
 iterate_100_000                428,644           388,509                -40,135   -9.36% 
 iterate_1_000_000              11,065,419        10,042,427          -1,022,992   -9.24%
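For reading the tables: the `diff %` column is the relative change from the first column to the second, so a negative value means the hhkvkv layout is faster. A small sketch of that arithmetic (the function name is made up here, and this is not code from cargo benchcmp itself):

```rust
/// Relative change in percent between two timings.
/// Negative means `new_ns` is faster than `old_ns`.
fn diff_percent(old_ns: u64, new_ns: u64) -> f64 {
    (new_ns as f64 - old_ns as f64) / old_ns as f64 * 100.0
}
```

For example, `diff_percent(922_064, 783_933)` corresponds to the grow_10_000 row, and `diff_percent(402_071, 620_810)` to the iter_keys_big_value_100_000 regression.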

rust-highfive · 2016-09-24T09:43:08Z

r? @aturon

(rust_highfive has picked a reviewer for you, use r? to override)

bluss · 2016-09-24T09:57:38Z

src/libstd/collections/hash/table.rs

@@ -371,8 +370,7 @@ impl<K, V, M> EmptyBucket<K, V, M>
    pub fn put(mut self, hash: SafeHash, key: K, value: V) -> FullBucket<K, V, M> {
        unsafe {
            *self.raw.hash = hash.inspect();
-            ptr::write(self.raw.key as *mut K, key);
-            ptr::write(self.raw.val as *mut V, value);
+            ptr::write(self.raw.pair as *mut (K, V), (key, value));


It would feel more natural to have two writes here and skip making a tuple. Does it matter either direction for performance?

I looked at the disassembly and the end result seems to be the same.

For (usize, usize), both versions compile to MOVDQU.

For (usize, [u64; 10]) it's:

    pub fn put(mut self, hash: SafeHash, key: K, value: V) -> FullBucket<K, V, M> {
        unsafe {
            *self.raw.hash = hash.inspect();
        ec1d:  49 89 39                 mov    %rdi,(%r9)
        ec20:  48 8b 8d 10 fe ff ff     mov    -0x1f0(%rbp),%rcx
        ec27:  49 89 0c 24              mov    %rcx,(%r12)
        ec2b:  0f 28 85 50 ff ff ff     movaps -0xb0(%rbp),%xmm0
        ec32:  41 0f 11 44 24 48        movups %xmm0,0x48(%r12)
        ec38:  0f 28 85 10 ff ff ff     movaps -0xf0(%rbp),%xmm0
        ec3f:  0f 28 8d 20 ff ff ff     movaps -0xe0(%rbp),%xmm1
        ec46:  0f 28 95 30 ff ff ff     movaps -0xd0(%rbp),%xmm2
        ec4d:  0f 28 9d 40 ff ff ff     movaps -0xc0(%rbp),%xmm3
        ec54:  41 0f 11 5c 24 38        movups %xmm3,0x38(%r12)
        ec5a:  41 0f 11 54 24 28        movups %xmm2,0x28(%r12)
        ec60:  41 0f 11 4c 24 18        movups %xmm1,0x18(%r12)
        ec66:  41 0f 11 44 24 08        movups %xmm0,0x8(%r12)
        ec6c:  4c 8b 75 b8              mov    -0x48(%rbp),%r14
            let pair_mut = self.raw.pair as *mut (K, V);
            ptr::write(&mut (*pair_mut).0, key);
            ptr::write(&mut (*pair_mut).1, value);

    pub fn put(mut self, hash: SafeHash, key: K, value: V) -> FullBucket<K, V, M> {
        unsafe {
            *self.raw.hash = hash.inspect();
        ec1d:  49 89 39                 mov    %rdi,(%r9)
        ec20:  48 8b 8d 10 fe ff ff     mov    -0x1f0(%rbp),%rcx
        ec27:  49 89 0c 24              mov    %rcx,(%r12)
        ec2b:  0f 28 85 50 ff ff ff     movaps -0xb0(%rbp),%xmm0
        ec32:  41 0f 11 44 24 48        movups %xmm0,0x48(%r12)
        ec38:  0f 28 85 10 ff ff ff     movaps -0xf0(%rbp),%xmm0
        ec3f:  0f 28 8d 20 ff ff ff     movaps -0xe0(%rbp),%xmm1
        ec46:  0f 28 95 30 ff ff ff     movaps -0xd0(%rbp),%xmm2
        ec4d:  0f 28 9d 40 ff ff ff     movaps -0xc0(%rbp),%xmm3
        ec54:  41 0f 11 5c 24 38        movups %xmm3,0x38(%r12)
        ec5a:  41 0f 11 54 24 28        movups %xmm2,0x28(%r12)
        ec60:  41 0f 11 4c 24 18        movups %xmm1,0x18(%r12)
        ec66:  41 0f 11 44 24 08        movups %xmm0,0x8(%r12)
        ec6c:  4c 8b 75 b8              mov    -0x48(%rbp),%r14
            let pair_mut = self.raw.pair as *mut (K, V);
            ptr::write(pair_mut, (key, value));

for (String, usize) it's

    pub fn put(mut self, hash: SafeHash, key: K, value: V) -> FullBucket<K, V, M> {
        unsafe {
            *self.raw.hash = hash.inspect();
        f670:  4d 89 20                 mov    %r12,(%r8)
        f673:  48 8b 45 90              mov    -0x70(%rbp),%rax
        f677:  49 89 06                 mov    %rax,(%r14)
        f67a:  f3 41 0f 7f 46 08        movdqu %xmm0,0x8(%r14)
        f680:  49 89 5e 18              mov    %rbx,0x18(%r14)
        f684:  48 8b 5d c8              mov    -0x38(%rbp),%rbx
            let pair_mut = self.raw.pair as *mut (K, V);
            ptr::write(pair_mut, (key, value));

    pub fn put(mut self, hash: SafeHash, key: K, value: V) -> FullBucket<K, V, M> {
        unsafe {
            *self.raw.hash = hash.inspect();
        f670:  4d 89 20                 mov    %r12,(%r8)
        f673:  48 8b 45 90              mov    -0x70(%rbp),%rax
        f677:  49 89 06                 mov    %rax,(%r14)
        f67a:  f3 41 0f 7f 46 08        movdqu %xmm0,0x8(%r14)
        f680:  49 89 5e 18              mov    %rbx,0x18(%r14)
        f684:  48 8b 5d c8              mov    -0x38(%rbp),%rbx
            let pair_mut = self.raw.pair as *mut (K, V);
            // ptr::write(pair_mut, (key, value));
            ptr::write(&mut (*pair_mut).0, key);
            ptr::write(&mut (*pair_mut).1, value);

Nice. As usual, the compiler is very smart.
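For readers who want to try the two variants outside of libstd, here is a standalone sketch. It is not the code from table.rs: the helper names `put_tuple` and `put_fields` are invented for this example, and it uses `MaybeUninit` plus `ptr::addr_of_mut!` in place of the raw bucket pointer so that no reference to uninitialized memory is created.

```rust
use std::mem::MaybeUninit;
use std::ptr;

/// Variant 1: build the tuple and write it in one go.
fn put_tuple<K, V>(slot: &mut MaybeUninit<(K, V)>, key: K, value: V) {
    unsafe {
        ptr::write(slot.as_mut_ptr(), (key, value));
    }
}

/// Variant 2: write key and value separately, field by field.
fn put_fields<K, V>(slot: &mut MaybeUninit<(K, V)>, key: K, value: V) {
    let pair = slot.as_mut_ptr();
    unsafe {
        // addr_of_mut! yields raw field pointers without going through
        // a `&mut` to memory that is not initialized yet.
        ptr::write(ptr::addr_of_mut!((*pair).0), key);
        ptr::write(ptr::addr_of_mut!((*pair).1), value);
    }
}
```

Both functions leave the slot fully initialized, so calling `assume_init()` afterwards is sound in either case. Whether they compile to identical machine code depends on the compiler version and optimization level.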

bluss · 2016-09-24T12:43:22Z

This subset is the .contains_key() benchmarks (retrieving the key but not the value).
The slowdown of a few percent reproduces on my machine too (mine is closer to 2%).

 lookup_10_000_exist         127,965           131,684                  3,719    2.91% 
 lookup_10_000_noexist       143,792           143,362                   -430   -0.30%

So this would be the drawback, where the old layout had better cache usage. It seems ok to give this up in return for the rest?

arthurprs · 2016-09-24T13:51:18Z

.keys() and .values() should be slower in this layout, but I can't reproduce it.

arthurprs · 2016-09-24T15:32:55Z

Results for x86

➜  hashmap2 git:(layout) ✗ cargo benchcmp hhkkvv:: hhkvkv:: x86.txt
 name                        hhkkvv:: ns/iter  hhkvkv:: ns/iter  diff ns/iter   diff % 
 grow_10_000                 1,298,744         1,197,093             -101,651   -7.83% 
 grow_big_value_10_000       4,285,679         3,887,095             -398,584   -9.30% 
 grow_fnv_10_000             434,184           419,664                -14,520   -3.34% 
 insert_100                  5,256             4,897                     -359   -6.83% 
 insert_1000                 47,448            47,906                     458    0.97% 
 insert_100_000              6,955,971         6,586,020             -369,951   -5.32% 
 insert_10_000               544,478           530,413                -14,065   -2.58% 
 insert_int_bigvalue_10_000  1,441,801         1,178,893             -262,908  -18.23% 
 insert_str_10_000           631,572           596,395                -35,177   -5.57% 
 insert_string_10_000        1,413,129         1,384,202              -28,927   -2.05% 
 iter_keys_10_000            56,995            55,921                  -1,074   -1.88% 
BUSTED iter_keys_big_value_10_000  67,816            60,087                  -7,729  -11.40% 
 iter_values_10_000          62,525            55,809                  -6,716  -10.74% 
 iterate_10_000              62,070            53,937                  -8,133  -13.10% 
 lookup_100_000              334,076           313,012                -21,064   -6.31% 
 lookup_100_000_bigvalue     325,324           319,972                 -5,352   -1.65% 
 lookup_10_000               270,232           263,861                 -6,371   -2.36% 
 lookup_10_000_bigvalue      288,415           270,581                -17,834   -6.18% 
 lookup_10_000_exist         252,338           248,224                 -4,114   -1.63% 
 lookup_10_000_noexist       273,254           272,914                   -340   -0.12% 
 lookup_1_000_000            262,000           259,096                 -2,904   -1.11% 
 lookup_1_000_000_bigvalue   275,820           265,966                 -9,854   -3.57% 
 merge_shuffle               1,664,975         1,542,400             -122,575   -7.36% 
 merge_simple                47,805,889        36,244,422         -11,561,467  -24.18% 
 new                         10                9                           -1  -10.00% 
 with_capacity_10e5          2,496             2,555                       59    2.36%

x86 again (with usize hashes from #36595, thus 31 hash bits)
This one is interesting because I was expecting a regression, but that's not the case.

➜  hashmap2 git:(layout) ✗ cargo benchcmp hhkkvv:: hhkvkv:: x86.txt                 
 name                        hhkkvv:: ns/iter  hhkvkv:: ns/iter  diff ns/iter   diff % 
 grow_10_000                 1,274,315         1,123,181             -151,134  -11.86% 
 grow_big_value_10_000       4,303,715         4,018,353             -285,362   -6.63% 
 grow_fnv_10_000             382,259           352,470                -29,789   -7.79% 
 insert_100                  4,923             4,792                     -131   -2.66% 
 insert_1000                 46,183            44,468                  -1,715   -3.71% 
 insert_100_000              7,096,014         6,078,014           -1,018,000  -14.35% 
 insert_10_000               517,265           507,215                -10,050   -1.94% 
 insert_int_bigvalue_10_000  1,401,856         1,175,129             -226,727  -16.17% 
 insert_str_10_000           598,338           586,628                -11,710   -1.96% 
 insert_string_10_000        1,365,544         1,358,503               -7,041   -0.52% 
 iter_keys_10_000            59,629            53,578                  -6,051  -10.15% 
BUSTED iter_keys_big_value_10_000  73,169            65,105                  -8,064  -11.02% 
 iter_values_10_000          81,068            52,079                 -28,989  -35.76% 
 iterate_10_000              85,855            53,962                 -31,893  -37.15% 
 lookup_100_000              313,490           299,432                -14,058   -4.48% 
 lookup_100_000_bigvalue     309,488           302,861                 -6,627   -2.14% 
 lookup_10_000               256,165           250,370                 -5,795   -2.26% 
 lookup_10_000_bigvalue      270,559           256,912                -13,647   -5.04% 
 lookup_10_000_exist         249,432           241,687                 -7,745   -3.11% 
 lookup_10_000_noexist       272,390           272,683                    293    0.11% 
 lookup_1_000_000            261,079           252,781                 -8,298   -3.18% 
 lookup_1_000_000_bigvalue   265,090           253,114                -11,976   -4.52% 
 merge_shuffle               1,580,698         1,423,032             -157,666   -9.97% 
 merge_simple                44,202,722        29,126,884         -15,075,838  -34.11% 
 new                         9                 9                            0    0.00% 
 with_capacity_10e5          1,264             1,349                       85    6.72%

bluss · 2016-09-24T16:59:49Z

Maybe with bigger hashmaps? To make sure it's well out of the cpu cache size.

arthurprs · 2016-09-24T17:32:29Z

After the 3000th look I finally saw that iter_keys_big_value was busted. Here are several others for good measure:

➜  hashmap2 git:(layout) ✗ cargo benchcmp hhkkvv:: hhkvkv:: bench.txt                                           
 name                           hhkkvv:: ns/iter  hhkvkv:: ns/iter  diff ns/iter   diff % 
 iter_keys_100_000              391,677           382,839                 -8,838   -2.26% 
 iter_keys_1_000_000            10,797,360        10,209,898            -587,462   -5.44% 
 iter_keys_big_value_100_000    414,736           662,255                247,519   59.68% 
 iter_keys_big_value_1_000_000  10,147,837        12,067,938           1,920,101   18.92% 
 iter_values_100_000            440,445           377,080                -63,365  -14.39% 
 iter_values_1_000_000          10,931,844        9,979,173             -952,671   -8.71% 
 iterate_100_000                428,644           388,509                -40,135   -9.36% 
 iterate_1_000_000              11,065,419        10,042,427          -1,022,992   -9.24%

durka

If the potential downside is wasted space, shouldn't there be some memory benchmarks as well?

durka · 2016-09-25T14:07:20Z

src/libstd/collections/hash/table.rs

 ///
-/// This design uses less memory and is a lot faster than the naive
+/// This design uses is a lot faster than the naive


durka · 2016-09-25T14:07:42Z

src/libstd/collections/hash/table.rs

 ///
-/// This design uses less memory and is a lot faster than the naive
+/// This design uses is a lot faster than the naive
 /// `Vec<Option<u64, K, V>>`, because we don't pay for the overhead of an


is this supposed to say Vec<Option<(u64, K, V)>>?

durka · 2016-09-25T14:08:01Z

src/libstd/collections/hash/table.rs

@@ -48,12 +48,14 @@ const EMPTY_BUCKET: u64 = 0;
 ///     which will likely map to the same bucket, while not being confused
 ///     with "empty".
 ///
-///   - All three "arrays represented by pointers" are the same length:
+///   - All two "arrays represented by pointers" are the same length:


Thanks, I fixed all three.

alexcrichton · 2016-09-26T16:25:12Z

cc @pczarn

arthurprs · 2016-09-29T16:28:14Z

@Veedrac PTAL

Veedrac · 2016-09-29T21:33:31Z

@arthurprs Seems like a solid improvement.

alexcrichton · 2016-10-03T23:55:33Z

@rfcbot fcp merge

Looks like we've got solid wins all around to consider merging?

rfcbot · 2016-10-03T23:59:13Z

FCP proposed with disposition to merge. Review requested from:

No concerns currently listed.
See this document for info about what commands tagged team members can give me.

aturon · 2016-10-04T21:05:22Z

I'm happy to go along with the experts here.

bluss · 2016-10-05T11:04:07Z

Has anyone evaluated this on a real workload? The first one that comes to mind is of course rustc.

arthurprs · 2016-10-05T11:09:20Z

I'm not familiar enough with the bootstrap process, but if somebody provides some guidance I could do it.

bluss · 2016-10-05T12:45:10Z

Tip from simulacrum: we can use https://github.com/rust-lang-nursery/rustc-benchmarks to test the impact on rustc. Rustc building itself is a heavier (and more important?) benchmark, but I don't know exactly what to time there.

alexcrichton · 2016-10-05T16:02:43Z

@arthurprs short of timing a run of make, then applying your patch and timing it again, there's not a great way to benchmark the bootstrap. I'd be fine assuming that this will be a win, and we can always revert if it causes a regression. We'll just want to keep a close eye on the online numbers.

pczarn · 2016-10-05T17:46:31Z

@arthurprs You can run make TIME_PASSES=1 before and after the patch, then compare the results side by side. Keep in mind that compilations of libstd may not be comparable because the patch changes libstd's code.

I agree with changing the memory layout. However, the tradeoffs are subtle. The benefits and drawbacks of this change depend on circumstances such as the sizes of keys and values.

There is one more drawback that you didn't describe in detail. Let's say the user wants to iterate through HashMap's keys. The user will access every key, which will waste some memory and cache bandwidth on loading the map's values. So neither layout is truly cache conscious. Both are cache conscious in different ways. Of course you have to decide if the efficiency of the keys() and values() iterators is important enough to give the change to the layout a second thought.

I think the benefits outweigh the drawbacks, because accessing single map entries is very common.
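The keys() cost described above can be estimated by counting the cache lines a scan touches. This is a back-of-the-envelope sketch with a hypothetical helper, `lines_touched`; it assumes 64-byte cache lines in the examples, a table that starts on a line boundary, and it ignores the hash array, hardware prefetching and any field reordering inside the pair.

```rust
/// Number of distinct cache lines touched when reading `n` fields of
/// `field` bytes each (field >= 1), placed every `stride` bytes, with
/// the first field at a line-aligned address.
fn lines_touched(n: usize, field: usize, stride: usize, line: usize) -> usize {
    let mut count = 0;
    let mut next_uncounted = 0; // lowest line index not counted yet
    for i in 0..n {
        let first = i * stride / line;
        let last = (i * stride + field - 1) / line;
        // Skip lines that an earlier field already pulled in.
        let start = first.max(next_uncounted);
        if last >= start {
            count += last - start + 1;
            next_uncounted = last + 1;
        }
    }
    count
}
```

With 8-byte keys, `lines_touched(1000, 8, 8, 64)` models the old contiguous key array, while `lines_touched(1000, 8, 88, 64)` models keys interleaved with 80-byte values, where every key ends up on its own cache line.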

arthurprs · 2016-10-05T18:45:11Z

I don't think those tests will be feasible on my laptop, especially considering the trial and error involved.

I think the benefits far outweigh the drawbacks. There's potential to waste some padding, but in the real world that's frequently not the case (try using GitHub search in the rust repo and skim some pages). We shouldn't optimize for keys() and values(), and those will definitely take a hit (as per the benchmarks).

bors · 2016-10-07T07:57:09Z

☔ The latest upstream changes (presumably #36753) made this pull request unmergeable. Please resolve the merge conflicts.

brson · 2016-10-10T21:32:04Z

Nice work @arthurprs !

Rollup of 10 pull requests - Successful merges: #36692, #36743, #36762, #36991, #37023, #37050, #37056, #37064, #37066, #37067 - Failed merges:

bors · 2016-10-12T10:32:06Z

⌛ Testing commit c5068a4 with merge e33334f...

bors · 2016-10-12T12:29:09Z

💔 Test failed - auto-linux-64-nopt-t

arthurprs · 2016-10-12T14:17:09Z

I'll fix it.

arthurprs · 2016-10-12T21:19:57Z

Travis is happy again.

bluss · 2016-10-12T22:53:54Z

@arthurprs Have you looked into why the buildbot tests failed? (log link) It's in the big testsuite, and I don't see the PR changing anything there. Unfortunately it was green on Travis before, and the buildbot build still failed.

bluss · 2016-10-12T22:54:22Z

It's the nopt builder, so presumably related to debug assertions?

arthurprs · 2016-10-13T07:03:20Z

Yes, I should have said "CI should be happy".
I had to use a couple of wrapping ops to make sure the overflow happened at a specific spot on debug builds.
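As a rough illustration of that idea (not the actual change in table.rs), the sketch below computes a table size with wrapping arithmetic, which never panics even when debug assertions turn `*` and `+` into overflow-checked operations, and then detects overflow in exactly one place with checked arithmetic. The function name and parameters are hypothetical.

```rust
/// Byte size of a table with `capacity` buckets, or None if it does not
/// fit in a usize.
fn table_bytes(capacity: usize, hash_size: usize, pair_size: usize) -> Option<usize> {
    // Wrapping arithmetic may yield a wrapped (wrong) value on overflow,
    // but it never panics, not even in debug builds.
    let wrapped = capacity
        .wrapping_mul(hash_size)
        .wrapping_add(capacity.wrapping_mul(pair_size));

    // The single spot where overflow is detected and reported.
    let bucket = hash_size.checked_add(pair_size)?;
    let exact = capacity.checked_mul(bucket)?;

    // When nothing overflowed, both computations agree.
    debug_assert_eq!(wrapped, exact);
    Some(exact)
}
```

A caller can then turn `None` into one well-defined failure, for example `table_bytes(cap, 8, 16).expect("capacity overflow")`, instead of hitting an arithmetic overflow panic somewhere earlier in a debug build.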

alexcrichton · 2016-10-13T18:02:57Z

@bors: r+

bors · 2016-10-13T18:02:59Z

📌 Commit c435821 has been approved by alexcrichton

bors · 2016-10-13T18:51:17Z

⌛ Testing commit c435821 with merge 2e0a3dc...

bors · 2016-10-13T19:03:25Z

💔 Test failed - auto-linux-cross-opt

arthurprs · 2016-10-13T19:29:58Z

I'm not sure it's related to the PR.

alexcrichton · 2016-10-13T20:23:03Z

@bors: retry

bors · 2016-10-14T00:25:14Z

⌛ Testing commit c435821 with merge 2353987...

bors · 2016-10-14T02:56:52Z

💔 Test failed - auto-win-gnu-32-opt-rustbuild

arielb1 · 2016-10-14T09:07:58Z

error: pretty-printing failed in round 0 revision None
status: exit code: 3221225477 (STATUS_ACCESS_VIOLATION)

arielb1 · 2016-10-14T09:08:45Z

@bors retry

bors · 2016-10-14T09:23:20Z

⌛ Testing commit c435821 with merge 40cd1fd...

bors · 2016-10-14T18:58:41Z

rfcbot · 2016-10-17T01:04:29Z

All relevant subteam members have reviewed. No concerns remain.

rfcbot · 2016-10-24T01:06:06Z

It has been one week since all blocks to the FCP were resolved.

rust-highfive assigned aturon Sep 24, 2016

arthurprs mentioned this pull request Sep 24, 2016

Revisit HashMap memory layout #36660

Closed

arthurprs force-pushed the hashmap-layout branch from 976004c to 006c6ba Compare September 24, 2016 09:51

bluss reviewed Sep 24, 2016

durka reviewed Sep 25, 2016

arthurprs force-pushed the hashmap-layout branch from 006c6ba to 9098c5c Compare September 25, 2016 20:51

alexcrichton added the T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. label Sep 26, 2016

arthurprs mentioned this pull request Oct 5, 2016

Exposure of HashMap iteration order allows for O(n²) blowup. #36481

Open

arthurprs force-pushed the hashmap-layout branch from 9098c5c to 70f9b98 Compare October 10, 2016 09:28

brson added the relnotes Marks issues that should be documented in the release notes of the next release. label Oct 10, 2016

bors added a commit that referenced this pull request Oct 12, 2016

Auto merge of #37093 - jonathandturner:rollup, r=jonathandturner

ccb8b3e

Cache conscious hashmap table

c435821

arthurprs force-pushed the hashmap-layout branch from c5068a4 to c435821 Compare October 12, 2016 15:07

bors merged commit c435821 into rust-lang:master Oct 14, 2016
