
Get blobs from the EL's blob pool #5829

Closed

Conversation

@michaelsproul (Member) commented May 23, 2024

Proposed Changes

This PR implements an experimental optimisation that fetches blobs from the EL via JSON-RPC. If a blob has already been seen in the public mempool, it is often unnecessary to wait for it to arrive on P2P gossip. The PR uses a new JSON-RPC method (engine_getBlobsV1) which allows the CL to load blobs quickly from the EL's blob pool.
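For orientation, here is a minimal sketch of what a request to this method might look like on the wire. This is a hedged illustration (it assumes the serde_json crate, and the versioned hashes are placeholders); the exact encoding is defined by the Engine API spec.

```rust
// Illustrative only: build a JSON-RPC request asking the EL for pooled blobs
// by KZG versioned hash. The hashes below are placeholders.
use serde_json::json;

fn get_blobs_request(versioned_hashes: &[&str]) -> serde_json::Value {
    json!({
        "jsonrpc": "2.0",
        "id": 1,
        "method": "engine_getBlobsV1",
        // One entry per blob commitment in the block; the EL responds with a
        // blob-and-proof object per hash, or null for hashes it hasn't pooled.
        "params": [versioned_hashes],
    })
}

fn main() {
    let request = get_blobs_request(&["0x01…", "0x01…"]);
    println!("{request}");
}
```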

Spec here:

This PR works in tandem with my changes to Reth:

TODO before merging:

  • Check engine capabilities before the call (see the capability-check sketch after this list)
  • Wait for spec to be agreed on by EL clients
  • Broadcast blobs from EL over P2P
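For the capability check in the first item, a minimal sketch, assuming the capability list comes from the standard engine_exchangeCapabilities method (the helper name here is hypothetical):

```rust
// Hypothetical helper: only call engine_getBlobsV1 if the EL advertises it.
// `capabilities` would be the list returned by engine_exchangeCapabilities.
fn el_supports_get_blobs(capabilities: &[String]) -> bool {
    capabilities.iter().any(|c| c == "engine_getBlobsV1")
}

fn main() {
    let caps = vec![
        "engine_newPayloadV3".to_string(),
        "engine_getBlobsV1".to_string(),
    ];
    assert!(el_supports_get_blobs(&caps));
}
```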

@michaelsproul added the spec_change, work-in-progress, deneb, and optimization labels on May 23, 2024
@michaelsproul (Member, Author) commented:
Should also consider re-publishing blobs from the EL on gossip in order to speed up propagation network-wide!

@dapplion (Collaborator) commented May 23, 2024

Really nice! This makes a lot of sense and can definitely improve the status quo before PeerDAS.

> Should also consider re-publishing blobs from the EL on gossip in order to speed up propagation network-wide!

Some options:

  • Just insert the message into the gossip duplicate cache, so the app layer is not notified and we simply forward it (see the sketch after this list)
  • Add a condition in blob gossip verification to consider a blob valid if it's already in the da_checker
  • Actually publish the blob: depending on our publishing rules, this can lead to greater amplification of total network bandwidth. We could publish only to our current set of mesh peers, and couple this with the IDONTWANT control message
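A minimal sketch of the first option, assuming a gossipsub-style duplicate cache keyed by message id (the types here are illustrative, not Lighthouse's actual networking internals):

```rust
use std::collections::HashSet;

/// Illustrative duplicate cache keyed by gossip message id.
struct DuplicateCache {
    seen: HashSet<[u8; 32]>,
}

impl DuplicateCache {
    /// Pre-seed the cache with the id of a blob reconstructed from the EL,
    /// so the later gossip copy is still forwarded to peers but never
    /// re-delivered to the application layer. Returns false if already seen.
    fn mark_seen(&mut self, message_id: [u8; 32]) -> bool {
        self.seen.insert(message_id)
    }
}

fn main() {
    let mut cache = DuplicateCache { seen: HashSet::new() };
    let id = [0u8; 32]; // placeholder message id
    assert!(cache.mark_seen(id)); // first sighting: inserted
    assert!(!cache.mark_seen(id)); // duplicate: app layer not notified
}
```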

@michaelsproul (Member, Author) commented:
My latest commit with IPC doesn't work (yet), but here's a summary of performance using HTTP:

  • Decoding the blob txn hashes from the transaction bytes seems to be nearly instantaneous, with the log line occurring in the same millisecond as the Blobs from EL - start request log.
  • Reth often responds with all blobs in around 8-20ms, very quick!
  • Constructing the KZG inclusion proofs takes around 5ms per blob (30ms for 6). Some room for optimisation here.
  • Processing in the data availability checker also takes around 5ms per blob (30ms for 6).

In total, this puts the latency for all blobs to be fetched and processed at 20 + 30 + 30 = 80ms for 6-blob blocks. Compared to gossip, the additional cost is <= 50ms, because the DA processing is required for gossip blobs too, and gossip blobs require KZG inclusion proof verification rather than construction.
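As a sanity check on the arithmetic, here is the latency budget assembled from the per-stage measurements quoted above:

```rust
// Back-of-the-envelope latency budget for a 6-blob block, using the
// measurements quoted in this comment.
fn main() {
    let el_response_ms = 20; // upper end of Reth's 8-20ms responses
    let proof_construction_ms = 5 * 6; // ~5ms per KZG inclusion proof
    let da_processing_ms = 5 * 6; // ~5ms per blob in the DA checker
    let total_ms = el_response_ms + proof_construction_ms + da_processing_ms;
    assert_eq!(total_ms, 80);
    println!("fetch + process latency: ~{total_ms} ms");
}
```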

@dapplion (Collaborator) commented:

> Processing in the data availability checker also takes around 5ms per blob

What's taking so long? Could it be verifying the KZG proof? If so, that can be skipped, since you have just created the proof in the same process.
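If that is the cause, one way to skip the redundant check is to tag blobs by provenance, along these lines (a sketch with hypothetical names, not Lighthouse's actual types):

```rust
/// Where a blob (and its KZG proof) came from.
enum BlobSource {
    /// Received over gossip: the proof is untrusted and must be verified.
    Gossip,
    /// Fetched from the EL and proved locally: verification is redundant.
    EngineApi,
}

fn needs_kzg_verification(source: &BlobSource) -> bool {
    matches!(source, BlobSource::Gossip)
}

fn main() {
    assert!(needs_kzg_verification(&BlobSource::Gossip));
    assert!(!needs_kzg_verification(&BlobSource::EngineApi));
}
```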

@dapplion (Collaborator) commented:

Notes regarding future compatibility of this feature with PeerDAS:

The PeerDAS importability condition is to have seen and validated a block's custody columns. If a node can get all of a block's blobs from the EL pool (and they are valid), it is guaranteed that it can recompute its custody columns. Therefore, if a node has seen all of a block's blobs, it can "optimistically" consider that block an available head before importing the columns. At that moment it can choose to either:

  • compute its custody columns from the blobs (sketched below)
  • wait for the columns to arrive from gossip

There's no consensus safety degradation from this optimistic behavior. The node is merely announcing to its peers "I have imported block X, with my custody columns being Y", and its peers can then request those columns via ReqResp before the node actually has them. The node should make sure to compute or receive the columns before TTFB_TIMEOUT to avoid losing peers.
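For the first option, a rough sketch of how custody columns fall out of a full set of blobs under the PeerDAS cell model, where each blob is erasure-extended into a fixed number of cells and column j stacks cell j from every blob. Here extend_blob is a hypothetical stand-in for the real KZG cell computation, and CELLS_PER_BLOB mirrors the spec constant:

```rust
const CELLS_PER_BLOB: usize = 128; // mirrors the PeerDAS spec constant

type Cell = Vec<u8>;

/// Hypothetical stand-in for the real erasure extension / KZG cell
/// computation performed on a single blob.
fn extend_blob(_blob: &[u8]) -> Vec<Cell> {
    vec![Cell::new(); CELLS_PER_BLOB]
}

/// Given every blob in the block, reconstruct just the columns this node
/// custodies: column j is cell j taken from each extended blob.
fn custody_columns(blobs: &[Vec<u8>], custody_indices: &[usize]) -> Vec<Vec<Cell>> {
    let extended: Vec<Vec<Cell>> = blobs.iter().map(|b| extend_blob(b)).collect();
    custody_indices
        .iter()
        .map(|&j| extended.iter().map(|cells| cells[j].clone()).collect())
        .collect()
}

fn main() {
    let blobs = vec![vec![0u8; 131_072]; 6]; // 6 placeholder 128KiB blobs
    let columns = custody_columns(&blobs, &[3, 42]);
    assert_eq!(columns.len(), 2);
    assert_eq!(columns[0].len(), 6); // one cell per blob in each column
}
```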

@michaelsproul (Member, Author) commented:

Finally got a chance to update the Reth prototype and this PR to match the new spec:

Seems to be working.

