Avoid storing redundant attestations in slasher DB #2112

Closed
michaelsproul opened this issue Dec 22, 2020 · 0 comments

@michaelsproul
Member

Description

The slasher stores every attestation provided to it individually, even when several of them carry duplicate information. The most common case appears to be unaggregated attestations stored alongside the aggregate that includes them, which wastes space. I suspected this was an issue but hadn't measured how bad it was in practice. Running some of the SigP nodes with `--subscribe-all-subnets` revealed the extent of the problem: a 70GB database with `--subscribe-all-subnets` vs a 14GB database without (5x larger).

Version

Lighthouse v1.0.4

Steps to resolve

I think one change that's straightforward to implement would be the following:

Deduplicate the attestations in memory, when they are hashed and stored in the attestation queue prior to being processed as part of a batch. A mapping from `(validator_index, attestation_data_root) => indexed_attestation` could be used, where on insert we keep only the indexed attestation with the most attesters. Some `Arc` (reference-counting) magic could gracefully handle the sharing and garbage collection.
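A minimal sketch of that in-memory deduplication, in Rust (the language Lighthouse is written in). `IndexedAttestation`, `AttestationQueue`, `insert` and `unique_attestations` are hypothetical simplifications for illustration, not Lighthouse's actual types. Each attestation is wrapped in an `Arc` and shared by every `(validator_index, data_root)` key it wins; once a larger attestation replaces it for all of its attesters, its reference count reaches zero and it is freed.

```rust
use std::collections::{HashMap, HashSet};
use std::sync::Arc;

/// Hypothetical, simplified indexed attestation: the attesting validator
/// indices plus the root of the attestation data.
struct IndexedAttestation {
    attesting_indices: Vec<u64>,
    data_root: [u8; 32],
}

/// Hypothetical queue keyed by (validator_index, attestation_data_root).
#[derive(Default)]
struct AttestationQueue {
    by_validator: HashMap<(u64, [u8; 32]), Arc<IndexedAttestation>>,
}

impl AttestationQueue {
    /// Record `attestation` for each attester, keeping whichever attestation
    /// has the most attesters for a given key (ties keep the existing one).
    fn insert(&mut self, attestation: IndexedAttestation) {
        let attestation = Arc::new(attestation);
        for &validator_index in &attestation.attesting_indices {
            let key = (validator_index, attestation.data_root);
            let replace = self.by_validator.get(&key).map_or(true, |existing| {
                existing.attesting_indices.len() < attestation.attesting_indices.len()
            });
            if replace {
                // Overwriting drops this key's reference to the smaller attestation.
                self.by_validator.insert(key, Arc::clone(&attestation));
            }
        }
        // If no key kept `attestation`, it is freed when it goes out of scope here.
    }

    /// The distinct attestations still referenced, i.e. what would be written to disk.
    fn unique_attestations(&self) -> Vec<Arc<IndexedAttestation>> {
        let mut seen = HashSet::new();
        let mut unique = Vec::new();
        for attestation in self.by_validator.values() {
            // Deduplicate by allocation: shared `Arc`s point to the same attestation.
            if seen.insert(Arc::as_ptr(attestation)) {
                unique.push(Arc::clone(attestation));
            }
        }
        unique
    }
}

fn main() {
    let root = [0u8; 32];
    let mut queue = AttestationQueue::default();
    // Two unaggregated attestations, then the aggregate that includes both.
    queue.insert(IndexedAttestation { attesting_indices: vec![1], data_root: root });
    queue.insert(IndexedAttestation { attesting_indices: vec![2], data_root: root });
    queue.insert(IndexedAttestation { attesting_indices: vec![1, 2, 3], data_root: root });
    println!("{} unique attestation(s) to store", queue.unique_attestations().len()); // prints 1
}
```

Keying on the attestation data root means attestations to different data never compete, so the "most attesters" rule only ever compares attestations that could be redundant with each other.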

This will be close to optimal so long as attestations and their aggregate arrive in the same batch. If that assumption turns out to be too strong, some more sophisticated (and likely costly) method to deduplicate them upon writing to disk could be used (perhaps in addition to the in-memory deduplication).
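To illustrate why the batch assumption matters, here is a small sketch, assuming each batch is deduplicated independently and nothing is deduplicated across batches. `Attestation` and `stored_after_per_batch_dedup` are hypothetical names, and the data root is shortened to a `u64` for brevity. The same unaggregated attestation and aggregate cost one stored attestation when they share a batch, but two when they are split.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical simplified attestation (data root shortened to a `u64`).
struct Attestation {
    attesting_indices: Vec<u64>,
    data_root: u64,
}

/// Number of attestations written if each batch is deduplicated on its own.
fn stored_after_per_batch_dedup(batches: &[Vec<Attestation>]) -> usize {
    batches
        .iter()
        .map(|batch| {
            // (validator_index, data_root) -> position in `batch` of the largest attestation.
            let mut best: HashMap<(u64, u64), usize> = HashMap::new();
            for (i, att) in batch.iter().enumerate() {
                for &validator_index in &att.attesting_indices {
                    let key = (validator_index, att.data_root);
                    let replace = best.get(&key).map_or(true, |&j| {
                        batch[j].attesting_indices.len() < att.attesting_indices.len()
                    });
                    if replace {
                        best.insert(key, i);
                    }
                }
            }
            // Attestations still referenced by some key survive and get stored.
            best.values().collect::<HashSet<_>>().len()
        })
        .sum()
}

fn main() {
    let unaggregated = || Attestation { attesting_indices: vec![7], data_root: 1 };
    let aggregate = || Attestation { attesting_indices: vec![7, 8, 9], data_root: 1 };

    let same_batch = vec![vec![unaggregated(), aggregate()]];
    let split_batches = vec![vec![unaggregated()], vec![aggregate()]];

    println!("same batch: {}", stored_after_per_batch_dedup(&same_batch)); // prints 1
    println!("split batches: {}", stored_after_per_batch_dedup(&split_batches)); // prints 2
}
```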

@michaelsproul michaelsproul added A0 and removed A1 labels Mar 26, 2021
@michaelsproul michaelsproul self-assigned this Nov 2, 2021
bors bot pushed a commit that referenced this issue Nov 8, 2021
## Issue Addressed

Closes #2112
Closes #1861

## Proposed Changes

Collect attestations by validator index in the slasher, and use the magic of reference counting to automatically discard redundant attestations. This results in us storing only 1-2% of the attestations observed when subscribed to all subnets, which carries over to a 50-100x reduction in data stored 🎉 
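The "magic of reference counting" can be seen in isolation with `std::sync::Arc` and `Weak`. This sketch is illustrative only and assumes all attestations share the same attestation data; `Attestation` and `insert` are hypothetical names, not Lighthouse's code. A `Weak` handle to the unaggregated attestation stops upgrading once the aggregate has replaced it for every attester, showing that the redundant copy was discarded without any explicit cleanup.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Weak};

/// Hypothetical simplified attestation (all for the same attestation data).
struct Attestation {
    attesting_indices: Vec<u64>,
}

/// Record `attestation` under each attester, replacing any smaller one.
fn insert(by_validator: &mut HashMap<u64, Arc<Attestation>>, attestation: Arc<Attestation>) {
    for &validator_index in &attestation.attesting_indices {
        let replace = by_validator.get(&validator_index).map_or(true, |existing| {
            existing.attesting_indices.len() < attestation.attesting_indices.len()
        });
        if replace {
            // Overwriting drops the map's reference to the previous attestation.
            by_validator.insert(validator_index, Arc::clone(&attestation));
        }
    }
    // `attestation` is dropped here; if no validator kept it, it is freed.
}

fn main() {
    let mut by_validator = HashMap::new();

    // An unaggregated attestation from validator 5 arrives first.
    let unaggregated = Arc::new(Attestation { attesting_indices: vec![5] });
    let weak: Weak<Attestation> = Arc::downgrade(&unaggregated);
    insert(&mut by_validator, unaggregated);
    assert!(weak.upgrade().is_some());

    // Later, the aggregate covering validators 4, 5 and 6 arrives.
    let aggregate = Arc::new(Attestation { attesting_indices: vec![4, 5, 6] });
    insert(&mut by_validator, Arc::clone(&aggregate));

    // The redundant unaggregated attestation has been freed automatically.
    assert!(weak.upgrade().is_none());
    // Three map entries plus the local `aggregate` binding.
    println!("strong refs to aggregate: {}", Arc::strong_count(&aggregate)); // prints 4
}
```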

## Additional Info

There's some nuance to the configuration of the `slot-offset`. It has a profound effect on the effectiveness of deduplication; see the docs added to the book for an explanation: https://github.com/michaelsproul/lighthouse/blob/5442e695e5256046b91d4b4f45b7d244b0d8ad12/book/src/slasher.md#slot-offset