Indexed Merkle tree #1666

mitschabaude · 2024-05-27T18:59:37Z

This PR introduces IndexedMerkleMap, an all-around better version of MerkleMap. See #1655 for a detailed description, including the API which this PR implements.

In short, there are two motivations to introduce a new Merkle storage primitive:

1. The old primitives don't map cleanly to provable types, and are appallingly hard to use in circuits.
2. The old Merkle map is a sparse tree of height 256, which means a large number of in-circuit hashes.

IndexedMerkleMap uses about 4-9x fewer constraints than MerkleMap when used with height 31 (which supports 1 billion entries). Here are some constraint counts for different operations:

indexed merkle map (get) 461
indexed merkle map (get option) 975
sparse merkle map (get) 4208

indexed merkle map (insert) 1696
indexed merkle map (update) 905
indexed merkle map (set) 1878
sparse merkle map (set) 8160

indexed merkle map (assert included) 460
indexed merkle map (assert not included) 507
indexed merkle map (is included) 508
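The headline ratio and the "1 billion entries" figure can be re-derived from the counts above. This is a small arithmetic check, not code from the PR; capacity() assumes, consistently with the height-31 remark, that a tree of height h has 2^(h-1) leaves (the height counts the root level too).

```typescript
// Constraint counts for height 31, copied from the measurements above.
const counts = {
  getIndexed: 461,
  getSparse: 4208,
  setIndexed: 1878,
  setSparse: 8160,
};

// How many times fewer constraints the indexed map needs, to one decimal.
function ratio(sparse: number, indexed: number): number {
  return Math.round((sparse / indexed) * 10) / 10;
}

// Assumption: a tree of height h has 2^(h - 1) leaves, which matches
// "height 31 supports 1 billion entries".
function capacity(height: number): number {
  return Math.pow(2, height - 1);
}

const getRatio = ratio(counts.getSparse, counts.getIndexed); // 9.1
const setRatio = ratio(counts.setSparse, counts.setIndexed); // 4.3
const maxEntries = capacity(31); // 1073741824, i.e. about 1 billion
```

Across all measured operations the savings range from about 4.3x (set, get option) to about 9.1x (get).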

EDIT: Based on feedback from @dfstio, the API was expanded to be useful for cases where only inclusion of a key (but not the value) is important; see the discussion below.

mitschabaude · 2024-05-29T15:25:26Z

Thanks for your feedback @dfstio. Based on it, I added a new version of get() which only proves inclusion and made it the default version. The old version, which also handled non-inclusion, is called getOption() now.

Here are the numbers for height 11; they match what you wrote:

indexed merkle map (get) 176
indexed merkle map (get option) 405
sparse merkle map (get) 4208

indexed merkle map (insert) 542
indexed merkle map (update) 350
indexed merkle map (set) 758
sparse merkle map (set) 8160
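The contract difference between get() and getOption() can be sketched out of circuit. ToyMap and LookupResult are hypothetical names, and this in-memory map proves nothing; it only shows the semantics: get() treats a missing key as a failure (in a circuit, an unsatisfiable assertion), while getOption() returns a result flagged with isSome, which is the extra case handling the cheaper get() avoids.

```typescript
// Hypothetical option-like result type for lookups.
type LookupResult<T> = { isSome: boolean; value: T | undefined };

// Plain in-memory stand-in; the real map proves lookups against a Merkle root.
class ToyMap {
  private keys: number[] = [];
  private values: number[] = [];

  set(key: number, value: number): void {
    const i = this.keys.indexOf(key);
    if (i === -1) {
      this.keys.push(key);
      this.values.push(value);
    } else {
      this.values[i] = value;
    }
  }

  // Like get(): the key must be included. In a circuit, this failure would
  // be an unsatisfied assertion, so no proof exists for a missing key.
  get(key: number): number {
    const i = this.keys.indexOf(key);
    if (i === -1) throw new Error("key " + key + " is not included");
    return this.values[i];
  }

  // Like getOption(): absence is a legitimate result, reported as isSome = false.
  getOption(key: number): LookupResult<number> {
    const i = this.keys.indexOf(key);
    return i === -1
      ? { isSome: false, value: undefined }
      : { isSome: true, value: this.values[i] };
  }
}
```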

dfstio · 2024-05-29T15:27:43Z

> Nonetheless, your comments gave me a good idea: Add a cheaper version of get() which asserts that the key-value pair is included (the current version gracefully handles the case when it's not included, and returns an option)

It would be a great addition. Proving inclusion and exclusion (for nullifiers) are very important operations that should probably be reflected in the IndexedMerkleMap methods.

And given that the data is public, I can implement toJSON() and fromJSON() to/from base64 myself.

dfstio · 2024-05-29T15:33:17Z

> The old version, which also handled non-inclusion, is called getOption() now.

Can the version that will only prove exclusion (non-inclusion) take less than 405 constraints, similar to the get() that takes half of it?

mitschabaude · 2024-05-29T15:41:49Z

>> The old version, which also handled non-inclusion, is called getOption() now.

> Can the version that will only prove exclusion (non-inclusion) take less than 405 constraints, similar to the get() that takes half of it?

Great point, done!

indexed merkle map (assert included) 175
indexed merkle map (assert not included) 222
indexed merkle map (is included) 223
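Non-inclusion in an indexed Merkle tree can be proven with a single "low leaf" rather than a sparse path of height 256, which is why assert not included lands close to assert included. Below is an out-of-circuit sketch of that check with plain numbers as keys. The leaf layout (key, value, nextKey), the key-0 start leaf and the Infinity end sentinel are assumptions for illustration, not necessarily how IndexedMerkleMap stores its leaves; findLowLeaf and checkNonInclusion are hypothetical names.

```typescript
// Leaves form a linked list sorted by key; each leaf stores the next key.
type IndexedLeaf = { key: number; value: number; nextKey: number };

// Prover side: search for the low leaf of a missing key.
// Assumption: the list starts at a reserved key-0 leaf and the largest leaf
// points to Infinity, so every absent key > 0 has exactly one low leaf.
function findLowLeaf(leaves: IndexedLeaf[], key: number): IndexedLeaf {
  for (let i = 0; i < leaves.length; i++) {
    const leaf = leaves[i];
    if (leaf.key < key && key < leaf.nextKey) return leaf;
  }
  throw new Error("no low leaf for key " + key + ": included or out of range");
}

// Verifier side: two comparisons on the supplied low leaf. In a circuit, the
// low leaf's own inclusion would also be proven with a Merkle witness;
// that part is omitted here.
function checkNonInclusion(low: IndexedLeaf, key: number): boolean {
  return low.key < key && key < low.nextKey;
}
```

For leaves 0 -> 5 -> 9 -> Infinity, the low leaf of 7 is the leaf with key 5, and 5 < 7 < 9 shows that 7 cannot be in the list.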

dfstio · 2024-05-29T15:44:29Z

Can assertIncluded() return a value for the key?

mitschabaude · 2024-05-29T15:46:41Z

> Can assertIncluded() return a value for the key?

The version of assertIncluded() which returns a value already exists: it's get()!

dfstio · 2024-05-29T20:08:04Z

How do we serialize the witness to calculate recursive proofs using many workers?

If I want to create a proof confirming that I've correctly added ten key-value pairs to the IndexedMerkleMap, and want to split the calculation between 10 separate workers running in parallel, each computing the proof for one key-value pair (to be merged later), I need to be able to generate a serializable witness to pass to each worker. Otherwise, I would have to serialize the whole map, which would take much longer.

Effectively, for this use case, _computeRoot() should be split into map.getWitness(key) and computeRoot(witness), with the witness being easily serializable.

map.getWitness() should be called in the master worker for all ten key-value pairs, and computeRoot() should be called in the provable code for each of the ten workers. Each worker should not have the map, just a witness.

Example where this is needed: Serializing Proving
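A minimal sketch of the proposed split, assuming a plain binary Merkle tree of numbers rather than the actual IndexedMerkleMap internals. getWitness, computeRoot, buildLevels and Witness are hypothetical names, and toyHash is a non-cryptographic stand-in for Poseidon. Because the witness is a plain object of numbers, JSON.stringify / JSON.parse is enough to ship it to a worker.

```typescript
const MODULUS = 2147483647; // 2^31 - 1

// Toy hash for illustration only; NOT secure and NOT Poseidon.
function toyHash(left: number, right: number): number {
  return (left * 31 + right * 17 + 7) % MODULUS;
}

type Witness = { index: number; leaf: number; siblings: number[] };

// Build all levels of a tree with 2^depth leaves (missing leaves are 0).
function buildLevels(leaves: number[], depth: number): number[][] {
  const size = Math.pow(2, depth);
  let level: number[] = [];
  for (let i = 0; i < size; i++) level.push(i < leaves.length ? leaves[i] : 0);
  const levels = [level];
  while (level.length > 1) {
    const next: number[] = [];
    for (let i = 0; i < level.length; i += 2) next.push(toyHash(level[i], level[i + 1]));
    levels.push(next);
    level = next;
  }
  return levels;
}

// Runs where the full tree lives (the "master worker").
function getWitness(levels: number[][], index: number): Witness {
  const siblings: number[] = [];
  let i = index;
  for (let d = 0; d < levels.length - 1; d++) {
    siblings.push(levels[d][i ^ 1]); // sibling at this level
    i = i >> 1; // move to the parent index
  }
  return { index, leaf: levels[0][index], siblings };
}

// Runs in a worker that only has the deserialized witness.
function computeRoot(w: Witness, leaf: number): number {
  let node = leaf;
  let i = w.index;
  for (let d = 0; d < w.siblings.length; d++) {
    node = i % 2 === 0 ? toyHash(node, w.siblings[d]) : toyHash(w.siblings[d], node);
    i = i >> 1;
  }
  return node;
}
```

The main process would call getWitness for each of the ten keys and send JSON.stringify(witness); each worker parses it and only ever calls computeRoot.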

mitschabaude · 2024-05-29T21:09:04Z

> How do we serialize the witness to calculate recursive proofs using many workers?

Interesting challenge!

mitschabaude · 2024-05-30T09:27:19Z

> map.getWitness() should be called in the master worker for all ten key-value pairs, and computeRoot() should be called in the provable code for each of the ten workers. Each worker should not have the map, just a witness.

I thought about this a bit and think it can be done in a way that is compatible with the current design of the IndexedMerkleTree data structure.

The idea is that the current implementation should work even if you don't have the full tree, but only the subset of it that is touched by your updates. This is quite similar to what you propose, since in the end a collection of Merkle witnesses is also just a subset of the tree.

There are two internal data structures: nodes and sortedLeaves. Both should currently allow pruning to the values you actually need. For nodes, you'd need to store arrays of the same length, but mostly filled with empty slots; I'm not sure how much memory that saves. In the case of sortedLeaves, only having a subset should just work.

So for parallel proving, we could:

1. Run all tree updates serially, without proving
2. Take a pruned snapshot every k updates
3. Send each pruned snapshot to a worker, which will perform a proof of k updates
4. Merge those proofs in a tree to get a single proof
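The snapshot idea can be sketched under heavy simplifications: a plain binary Merkle tree of numbers, node storage as a dictionary keyed by "level:index" (not the PR's nodes arrays), and a toy hash in place of Poseidon. fullTree, prune and setLeaf are hypothetical names. The point is that the same update code yields the same roots on the full tree and on a snapshot pruned to the paths of the leaves a batch will touch.

```typescript
const MODULUS = 2147483647;
function toyHash(left: number, right: number): number {
  return (left * 31 + right * 17 + 7) % MODULUS; // toy stand-in for Poseidon
}

type Nodes = { [levelAndIndex: string]: number };

// Build every node of a tree with 2^depth leaves (missing leaves are 0).
function fullTree(leaves: number[], depth: number): Nodes {
  const nodes: Nodes = {};
  for (let i = 0; i < Math.pow(2, depth); i++) nodes["0:" + i] = i < leaves.length ? leaves[i] : 0;
  for (let level = 1; level <= depth; level++) {
    for (let i = 0; i < Math.pow(2, depth - level); i++) {
      nodes[level + ":" + i] = toyHash(nodes[(level - 1) + ":" + 2 * i], nodes[(level - 1) + ":" + (2 * i + 1)]);
    }
  }
  return nodes;
}

// Keep only each touched leaf's path node and sibling at every level, plus the root.
function prune(nodes: Nodes, depth: number, touched: number[]): Nodes {
  const pruned: Nodes = {};
  touched.forEach((index) => {
    let i = index;
    for (let level = 0; level < depth; level++) {
      pruned[level + ":" + i] = nodes[level + ":" + i];
      pruned[level + ":" + (i ^ 1)] = nodes[level + ":" + (i ^ 1)];
      i = i >> 1;
    }
  });
  pruned[depth + ":0"] = nodes[depth + ":0"];
  return pruned;
}

// Set a leaf and recompute its path; works on full or pruned storage alike,
// as long as the siblings along the path are present.
function setLeaf(nodes: Nodes, depth: number, index: number, value: number): number {
  nodes["0:" + index] = value;
  let i = index;
  for (let level = 1; level <= depth; level++) {
    i = i >> 1;
    nodes[level + ":" + i] = toyHash(nodes[(level - 1) + ":" + 2 * i], nodes[(level - 1) + ":" + (2 * i + 1)]);
  }
  return nodes[depth + ":0"];
}
```

Taking the snapshot before running a batch, and applying the same updates in the same order, gives the worker everything it needs to reproduce the serial run.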

The nice thing is that circuits can be written exactly as in a normal, serial implementation.

Actually this is extremely close to what Mina does with transaction proofs, where snapshots of the ledger are updated :D

mitschabaude · 2024-05-30T10:26:19Z

Note to self / reviewers: the implementation currently doesn't prove updates correctly. The problem is that it doesn't connect the Merkle path used for the update with the path previously validated against the old commitment.
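The principle behind a correct update proof can be sketched with a plain binary Merkle tree: validate one witness against the old root, then compute the new root from that very same witness. This is an illustration, not the PR's fix; Witness, computeRoot and proveUpdate are hypothetical names and toyHash is a non-cryptographic stand-in for Poseidon.

```typescript
const MODULUS = 2147483647; // 2^31 - 1
function toyHash(left: number, right: number): number {
  // Non-cryptographic stand-in for Poseidon, for illustration only.
  return (left * 31 + right * 17 + 7) % MODULUS;
}

type Witness = { index: number; siblings: number[] };

// Hash a leaf up to the root along the path described by the witness.
function computeRoot(w: Witness, leaf: number): number {
  let node = leaf;
  let i = w.index;
  for (let d = 0; d < w.siblings.length; d++) {
    node = i % 2 === 0 ? toyHash(node, w.siblings[d]) : toyHash(w.siblings[d], node);
    i = i >> 1;
  }
  return node;
}

// Validate the witness against the old commitment, then reuse the same
// index and siblings for the new commitment. Computing the new root from a
// separate, unchecked path is the kind of gap the note above describes.
function proveUpdate(oldRoot: number, w: Witness, oldLeaf: number, newLeaf: number): number {
  if (computeRoot(w, oldLeaf) !== oldRoot) {
    throw new Error("witness does not match the old root");
  }
  return computeRoot(w, newLeaf);
}
```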

dfstio · 2024-05-31T17:07:19Z

> For nodes, you'd need to store arrays of the same length, but mostly filled with empty slots, not sure how much memory that saves

I've done some testing with MerkleTree to evaluate serialized map size and serialized MerkleTree witness size, and the results are as follows:

random indexes in the tree:
height: 11, elements: 1000, tree size: 75,508 chars, witness size: 495 chars (0.66%)
height: 20, elements: 10000, tree size: 3,243,225 chars, witness size: 860 chars (0.03%)
height: 30, elements: 10000, tree size: 8,159,377 chars, witness size: 1,312 chars (0.02%)
height: 50, elements: 10000, tree size: 18,475,240 chars, witness size: 2,211 chars (0.012%)
height: 100, elements: 100000, tree size: 456,081,607 chars, witness size: 4,507 chars (0.0010%)
height: 255, elements: 25000, tree size: 405,075,106 chars, witness size: 11,600 chars (0.0029%)

ordered indexes in the tree:
height: 11, elements: 1000, tree size: 93,319 chars, witness size: 499 chars (0.53%)
height: 20, elements: 10000, tree size: 941,828 chars, witness size: 910 chars (0.096%)
height: 30, elements: 10000, tree size: 942,558 chars, witness size: 1,367 chars (0.15%)
height: 50, elements: 10000, tree size: 943,572 chars, witness size: 2,283 chars (0.24%)
height: 100, elements: 100000, tree size: 9,526,927 chars, witness size: 4,567 chars (0.048%)
height: 255, elements: 100000, tree size: 9,535,772 chars, witness size: 11,662 chars (0.12%)

The IndexedMerkleMap should behave more like the ordered-index case, so by sending a witness or pruned snapshot instead of the whole map we can decrease the serialized size by circa 1000x. By the way, these numbers also show the serialized map size savings IndexedMerkleMap will bring: about 170x (16k per element in MerkleMap vs. 95 bytes per element with IndexedMerkleMap).
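The 170x and "circa 1000x" figures can be re-derived from the tables above. This quick check assumes, as the comment does, that the ordered-index MerkleTree rows are a reasonable proxy for IndexedMerkleMap.

```typescript
// Serialized MerkleTree sizes (in chars) from the height-255 rows above.
const random255 = { treeChars: 405075106, elements: 25000 };
const ordered255 = { treeChars: 9535772, elements: 100000 };

// Per-element cost of a fully serialized tree.
const perElementRandom = random255.treeChars / random255.elements; // about 16,203
const perElementOrdered = ordered255.treeChars / ordered255.elements; // about 95.4

// Random (MerkleMap-like) vs. ordered (IndexedMerkleMap-like) per-element size.
const mapSavings = perElementRandom / perElementOrdered; // about 169.9

// Full ordered tree vs. a single witness, for two of the ordered rows.
const witnessShrinkH30 = 942558 / 1367; // about 690x
const witnessShrinkH100 = 9526927 / 4567; // about 2086x
```

So the per-witness reduction for ordered indexes ranges from roughly 700x to 2000x in these rows, which is where "circa 1000x" comes from.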

> Take a pruned snapshot every k updates

We need to take each pruned snapshot BEFORE running the k updates without proofs, and make sure that the low leaves are also included.
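Which existing leaves a snapshot must contain can be computed up front: for each key in the batch, the largest existing key that is less than or equal to it. That is the key's own leaf for an update, or its low leaf for an insert. A key inserted later in the same batch may instead get a low leaf created earlier in that batch, which the worker builds itself. snapshotKeys is a hypothetical helper with plain numbers as keys, and it only selects leaves; the Merkle paths for them (and for the positions where new leaves will be written) would also be needed.

```typescript
// Assumption: the smallest existing key is a reserved key below every batch key,
// so each batch key has some existing key <= it.
function snapshotKeys(existingKeys: number[], batch: number[]): number[] {
  const sorted = existingKeys.slice().sort((a, b) => a - b);
  const needed: number[] = [];
  batch.forEach((key) => {
    // Largest existing key <= key: the key itself (update) or its low leaf (insert).
    let chosen = sorted[0];
    for (let i = 0; i < sorted.length; i++) {
      if (sorted[i] <= key) chosen = sorted[i];
    }
    if (needed.indexOf(chosen) === -1) needed.push(chosen);
  });
  return needed.sort((a, b) => a - b);
}
```

For existing keys 0, 5, 9, 20 and a batch setting 7, 6, 9, 25, the snapshot needs the leaves with keys 5, 9 and 20.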

> Actually this is extremely close to what Mina does with transaction proofs, where snapshots of the ledger are updated :D

I believe that IndexedMerkleMap is extremely important for rollups on Mina protocol and will save a lot of money in proving costs

mitschabaude added 30 commits May 25, 2024 15:11

scaffold some types (c1ce7df)
scaffold implementation (061ef22)
get / set internal nodes (c71a7ba)
restructure (1859512)
introduce low nodes to the data structure (d2e8d91)
bisection algorithm (ecc495c)
fix and simplify algorithm (df3c66a)
find node (5ccd155)
inclusion proof (977d0d5)
insert leaf (5395145)
fix update leaf, some cleanup (cc30939)
update (af33aec)
represent index in constraints (171d168)
start on set (472ba24)
finish set (6bc79d8)
give option a constructor (33c216e)
implement get, remove remove (910e574)
simplify (97f03ac)
correct initial root (23eede4)
better name (7201a2c)
simplify (8bb2290)
api friendliness (258a64e)
fix constructor (5028c39)
minor (a2efac8)
fix several bugs (4243827)
fix provable from class (70da097)
move around code (3aac569)
first iteration of a provable intf (but won't work) (5c0723b)
remove type intf (6ab8801)
make it a class factory (d123aa9)

mitschabaude added 2 commits May 29, 2024 17:38

add methods that are only about inclusion of a key (cd973d7)
measure constraints (5fc02e8)

mitschabaude added 2 commits May 29, 2024 18:12

prefix low-level / internal APIs with an underscore, some more docs (1c31007)
tweak types, remove abstract from base class (a7eb350)

mitschabaude added 9 commits June 3, 2024 11:19

minor (2c3f1a1)
prove updates correctly (6461e1c)
avoid computing index bits in circuit when not needed (92d35f7)
remove nextIndex (b0ea55d)
safer/more logical representation of leaf index (c189fab)
misc improvements (3d8c784)
unconstrained tweak (c6445ab)
make indexed map clonable (25ddcce)
return the previous value (d59e84b)

This was referenced Jun 3, 2024

IndexedMerkleMap: Support 0 and -1 keys #1671

Merged

Use IndexedMerkleMap for OffchainState #1672

Merged

Trivo25 approved these changes Jun 4, 2024


mitschabaude merged commit 8758daa into main Jun 4, 2024
14 checks passed

mitschabaude deleted the feature/indexed-merkle-map branch June 4, 2024 16:45

mitschabaude mentioned this pull request Jul 3, 2024

Indexed Merkle tree to improve offchain state efficiency #1655

Closed
