btree: Reduce opportunities for branch mispredictions in binary search
We currently use a textbook binary search algorithm, which is known to
suffer from branch misprediction penalties. Loads issued speculatively
down a mispredicted path can also evict useful cache lines, which is
detrimental to performance.

I recently read about a branchless binary search algorithm published by
Knuth called Shar's Algorithm (not to be confused with Shor's
algorithm). It is well known to outperform the textbook binary search
algorithm. It does an extra comparison. It is typically presented for
power of 2 array sizes, and adapting it to support non-power of 2 array
sizes is difficult to do in a way that is convincingly correct. Adapting
it to fill out zfs_btree_index_t is even more complex.
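
For reference, the power-of-2 form mentioned above can be sketched as
follows. This is a minimal illustration on a sorted int array, written in
the spirit of that presentation; it is neither Knuth's exact formulation
nor code from this commit, and the name pow2_lower_bound is hypothetical.
It returns the index of the first element that is not less than the key,
and the comparison after the loop is the extra comparison. Whether a given
compiler emits branch-free code for the multiply-by-comparison idiom is an
assumption, not a guarantee.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: lower bound over a sorted array whose length n
 * is a power of 2 (n >= 1). Returns the number of elements < key.
 */
size_t
pow2_lower_bound(const int *a, size_t n, int key)
{
	size_t pos = 0;

	assert(n != 0 && (n & (n - 1)) == 0);

	/* The trip count depends only on n, never on the data. */
	for (size_t step = n / 2; step > 0; step /= 2)
		pos += step * (a[pos + step - 1] < key);

	/* The extra comparison resolves the one remaining element. */
	return (pos + (a[pos] < key));
}
```

The loop never reads past a[n - 1] because pos can grow to at most
n/2 + n/4 + ... + 1 = n - 1, which is the property that becomes hard to
preserve once n is no longer a power of 2.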

Therefore, I invented my own algorithm by refactoring the textbook
algorithm using a few tricks:

	1. x = (y < z) ? a : b is equivalent to
	   x = a * (y < z) + b * (y >= z)

	2. x = (y > z) ? a : b is equivalent to
	   x = a * (y > z) + b * (y <= z)

	3. The maximum number of iterations will be highbit(size), so we
	   can iterate on that.

	4. Ensuring that we get the same results means that we need to
	   handle early matches. This means we must avoid changes to the
	   values of comp and idx once comp is 0, which we can do by
	   computing idx = !!comp * (min + max) / 2 + !comp * idx.
	   This will make us repeat the previous comparison.

	5. If we delete the equal to case from the equivalencies used in
	   calculating min and max, we can cause them to be 0 when we
	   have an early match. This allows us to drop !!comp, since
	   (0 + 0) / 2 is 0.

	6. We still iterate an extra time on non-power of 2 array sizes
	   whenever we would be checking a value that is not present.
	   However, in that case, the comparison will always yield less
	   than (comp < 0), such that the value of max does not change
	   and we exit the loop with the same value of max as if we had
	   exited early.

	7. We can access invalid memory if min is ever allowed to equal
	   nelems, which is why the original algorithm imposes the loop
	   condition `min < max`. Being branchless, we cannot do that,
	   but we can replace the 1 being used to increment min with
	   (min != nelems - 1) to avoid the incrementation. This will
	   cause us to harmlessly repeat the last operation.
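
The first three tricks, and the zero-on-equality property used by trick 5,
can be checked in isolation. The sketch below is illustrative only and is
not code from this commit: select_lt, select_gt, highbit,
textbook_iterations and check_tricks are hypothetical names, highbit is a
plain-C stand-in that assumes highbit64() returns the 1-based index of the
highest set bit (0 for an input of 0), and the search runs over a sorted
int array rather than through bt_compar.

```c
#include <assert.h>
#include <stdint.h>

/* Tricks 1 and 2: a ternary select rewritten as arithmetic. */
uint32_t
select_lt(int y, int z, uint32_t a, uint32_t b)
{
	return (a * (y < z) + b * (y >= z));
}

uint32_t
select_gt(int y, int z, uint32_t a, uint32_t b)
{
	return (a * (y > z) + b * (y <= z));
}

/* Assumed behaviour of highbit64(): 1-based highest set bit, 0 for 0. */
uint32_t
highbit(uint64_t v)
{
	uint32_t h = 0;

	while (v != 0) {
		h++;
		v >>= 1;
	}
	return (h);
}

/* Trick 3: count the iterations performed by the textbook loop. */
uint32_t
textbook_iterations(const int *buf, uint32_t nelems, int value)
{
	uint32_t min = 0, max = nelems, iters = 0;

	while (max > min) {
		uint32_t idx = (min + max) / 2;
		iters++;
		if (buf[idx] < value)
			min = idx + 1;
		else if (buf[idx] > value)
			max = idx;
		else
			break;
	}
	return (iters);
}

void
check_tricks(void)
{
	int buf[64];

	for (int i = 0; i < 64; i++)
		buf[i] = 2 * i + 1;	/* sorted odd values 1, 3, ..., 127 */

	for (int y = -2; y <= 2; y++) {
		for (int z = -2; z <= 2; z++) {
			assert(select_lt(y, z, 7, 9) == ((y < z) ? 7u : 9u));
			assert(select_gt(y, z, 7, 9) == ((y > z) ? 7u : 9u));
			/* Trick 5: both strict terms vanish on equality. */
			assert(7 * (y < z) + 9 * (y > z) ==
			    ((y == z) ? 0 : ((y < z) ? 7 : 9)));
		}
	}

	for (uint32_t n = 1; n <= 64; n++) {
		for (int v = 0; v <= 128; v++)
			assert(textbook_iterations(buf, n, v) <= highbit(n));
		/* A value below every element always reaches the bound. */
		assert(textbook_iterations(buf, n, 0) == highbit(n));
	}
}
```

Calling check_tricks() from a test harness exercises every assertion. The
bound holds because each iteration of the textbook loop leaves at most
half of the remaining interval, rounded down.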

The result is that we avoid both branch misprediction penalties in the
loop and cache pollution by only accessing the memory locations that we
need to access to perform a binary search. This comes at the expense of
some additional computations, but we are likely to stall waiting on
memory accesses otherwise, so the additional computations should be
effectively free.

Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
ryao committed May 13, 2023
1 parent 7381ddf commit c4ed029
Showing 1 changed file with 20 additions and 14 deletions.
34 changes: 20 additions & 14 deletions module/zfs/btree.c
@@ -216,27 +216,33 @@ zfs_btree_create_custom(zfs_btree_t *tree,
 }

 /*
- * Find value in the array of elements provided. Uses a simple binary search.
+ * Find value in the array of elements provided. Uses a "branchless" binary
+ * search derived by refactoring a simple binary search to avoid branch
+ * misprediction penalties by not branching within the loop.
  */
 static void *
 zfs_btree_find_in_buf(zfs_btree_t *tree, uint8_t *buf, uint32_t nelems,
     const void *value, zfs_btree_index_t *where)
 {
 	uint32_t max = nelems;
 	uint32_t min = 0;
-	while (max > min) {
-		uint32_t idx = (min + max) / 2;
-		uint8_t *cur = buf + idx * tree->bt_elem_size;
-		int comp = tree->bt_compar(cur, value);
-		if (comp < 0) {
-			min = idx + 1;
-		} else if (comp > 0) {
-			max = idx;
-		} else {
-			where->bti_offset = idx;
-			where->bti_before = B_FALSE;
-			return (cur);
-		}
+	uint32_t idx = 0;
+	uint8_t *cur;
+	uint32_t i = highbit64(nelems);
+	int comp = 1;
+
+	while (i--) {
+		idx = (min + max) / 2 + !comp * idx;
+		cur = buf + idx * tree->bt_elem_size;
+		comp = tree->bt_compar(cur, value);
+		min = (idx + (min != nelems - 1)) * (comp < 0) + min * (comp > 0);
+		max = idx * (comp > 0) + max * (comp < 0);
+	}
+
+	if (comp == 0) {
+		where->bti_offset = idx;
+		where->bti_before = B_FALSE;
+		return (cur);
 	}

 	where->bti_offset = max;