Express Maj with 2 XORs instead of 5 #1299
Merged
constraints for 3 hashes: 18.8k -> 15.8k
speeds up `Maj` by using a different formula: the version we originally used needs 5 XORs, while the new version only needs 2 XORs and also fewer generic gates.
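
For illustration, here is a plain-number sketch of one way `Maj` can be written with a single `x ^ y ^ z` (2 XORs). It is an assumption about the shape of the rewrite, not a copy of the code in this PR: it relies on the full-adder identity `x + y + z = (x ^ y ^ z) + 2 * Maj(x, y, z)`, which holds bit by bit. The function names `majReference` and `majArithmetic` are hypothetical, and the code works on ordinary unsigned 32-bit numbers rather than circuit field elements.

```typescript
// Hypothetical helper names; plain uint32 arithmetic, not circuit code.

// Textbook form of Maj: three ANDs combined with two XORs.
function majReference(x: number, y: number, z: number): number {
  return ((x & y) ^ (x & z) ^ (y & z)) >>> 0;
}

// Arithmetic form: for each bit, a + b + c = (a ^ b ^ c) + 2 * maj(a, b, c),
// so summing over all bit positions gives
//   x + y + z = (x ^ y ^ z) + 2 * Maj(x, y, z).
// Inputs are assumed to be unsigned 32-bit values; the sum stays far below
// 2^53, so it is exact in a JavaScript number.
function majArithmetic(x: number, y: number, z: number): number {
  const sum = x + y + z;
  const xor = (x ^ y ^ z) >>> 0; // the only XORs: x ^ y, then ^ z
  return (sum - xor) / 2;
}
```

Comparing `majArithmetic` against `majReference` on a handful of 32-bit inputs is a quick sanity check that the two forms agree.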
also, speeds up `Ch`: the insight here was that we can combine its terms with `+` instead of XOR, because the terms have no overlapping 1 bits.

third speed-up comes from using faster range checks for the quotient in `divMod32` when used for 32-bit addition (mod 2^32). the existing 32-bit check (which uses 2.5 rows) is a good default because it supports all inputs up to 64 bits, for example resulting from 32x32 bit multiplication. But addition can only overflow by 1 bit, so we can use the boolean check which only uses 0.5 rows. For longer sums we can use the 16-bit check which uses 1 row.

in general, it seems reasonable to allow specifying the quotient bits as an extra argument in `divMod32`.
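
The `Ch` observation and the quotient-size argument can both be checked with ordinary numbers. The sketch below is only a model under stated assumptions: it uses the standard SHA-256 definition `Ch(x, y, z) = (x & y) ^ (~x & z)`, and the names `chXor`, `chAdd`, `divMod32Model` and `maxQuotient` are hypothetical helpers, not functions from this PR or from the library. It covers sums of 32-bit words only, not products.

```typescript
// Hypothetical helper names; plain uint32 arithmetic, not circuit code.
const TWO_32 = 0x100000000; // 2^32

// Standard definition of Ch, combining the two terms with XOR.
function chXor(x: number, y: number, z: number): number {
  return ((x & y) ^ (~x & z)) >>> 0;
}

// Same function with + instead of XOR. A bit of (x & y) can only be set
// where x has a 1, and a bit of (~x & z) only where x has a 0, so the two
// terms never overlap and the addition produces no carries.
function chAdd(x: number, y: number, z: number): number {
  return ((x & y) >>> 0) + ((~x & z) >>> 0);
}

// Plain-number model of splitting n into quotient and remainder mod 2^32.
// Only valid for n below 2^53, which is enough for sums of 32-bit words.
function divMod32Model(n: number): { quotient: number; remainder: number } {
  const quotient = Math.floor(n / TWO_32);
  const remainder = n - quotient * TWO_32;
  return { quotient, remainder };
}

// Largest quotient that can occur when adding numTerms unsigned 32-bit words.
// For 2 terms this is 1 (a boolean); it stays below 2^16 for up to 2^16 terms.
function maxQuotient(numTerms: number): number {
  const maxSum = numTerms * (TWO_32 - 1);
  return divMod32Model(maxSum).quotient;
}
```

Under this model, `maxQuotient(2)` is 1, which matches the claim that a two-term addition only needs a boolean check on the quotient, while sums of a few more terms still fit comfortably in 16 bits.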