all: bloom-filter based pruning mechanism #21724

Merged
merged 9 commits into ethereum:master on Feb 8, 2021

Conversation

rjl493456442
Member

@rjl493456442 rjl493456442 commented Oct 19, 2020

It's an alternative to #21042. Instead of preserving the entire active state, this PR uses a bloom filter to identify the trie nodes that belong to the specific target state.

Three previous PRs need to be merged first.

Pruning mechanism overview

The overall flow of this mechanism is:

  • Pick a snapshot layer which has the associated state available on disk
    • Usually, the HEAD-127 diff layer is a good option, because
      • (a) we maintain 128 diff layers, so it's the bottom-most diff layer and very unlikely to be reorged
      • (b) in most normal cases it's paired with the HEAD-127 state, which is available on disk
    • The user can also specify a customized snapshot layer as the target
      • (a) but the associated state must be present before running the command
    • If the snapshot is not fully constructed (e.g. the disk layer is incomplete), the pruning is rejected
  • Iterate the chosen snapshot layer and reconstruct the whole state trie (+ storage tries) with the stacktrie
    • The storage tries are iterated concurrently with the main trie
    • All newly generated trie nodes are committed
    • All referenced contract codes are committed
  • Commit all the state entries (trie nodes + contract code) into the bloom filter
    • The key property of the bloom filter: if it says YES, the entry may be included; if it says NO, the entry is definitely excluded
    • If a state entry is excluded by the bloom filter, it doesn't belong to our target state and can be pruned
    • If a state entry is included by the bloom filter, it may or may not belong to the target state
      • We keep the state entries which really belong to the target state
      • We also keep state entries which don't belong to the target state (dangling trie nodes!)
      • In this case, we hold the assumption that these dangling nodes won't be revisited
      • The false-positive rate of the bloom filter is set to 0.05%
      • The probability that a "false-positive" trie node is a state root is extremely small
      • OPEN QUESTION: this mechanism may break fast sync, since the presence of a trie node implicitly indicates the presence of the entire sub-trie
  • Extract all state entries of the genesis state (we never delete the genesis state)
  • Iterate the database and delete a DB entry if both (a minimal sketch of this pass follows the list)
    • it's excluded by the bloom filter, and
    • it's excluded from the genesis marker
  • Run a range compaction afterwards, in order to remove the "deleted" entries from the disk
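
For illustration, here is a minimal Go sketch of that deletion pass. The `contain` callback stands in for the bloom filter's membership test and `genesis` for the extracted genesis marker; these names (and the whole function) are illustrative assumptions, not the PR's actual API, which among other things also handles the prefixed contract-code key layout and flushes deletions in stages.

package pruner

import (
	"github.com/ethereum/go-ethereum/common"
	"github.com/ethereum/go-ethereum/ethdb"
)

// pruneSketch deletes every hash-keyed entry that is excluded by both the
// bloom filter (contain) and the genesis marker. Simplified for illustration.
func pruneSketch(db ethdb.Database, contain func([]byte) bool, genesis map[common.Hash]struct{}) error {
	it := db.NewIterator(nil, nil)
	defer it.Release()

	batch := db.NewBatch()
	for it.Next() {
		key := it.Key()
		// Only hash-keyed entries (trie nodes, contract code) are candidates.
		if len(key) != common.HashLength {
			continue
		}
		if contain(key) {
			continue // bloom says "maybe": the entry may belong to the target state
		}
		if _, ok := genesis[common.BytesToHash(key)]; ok {
			continue // genesis state is never deleted
		}
		if err := batch.Delete(key); err != nil {
			return err
		}
	}
	if err := it.Error(); err != nil {
		return err
	}
	return batch.Write()
}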

Exception handling

What if a crash happens (or the process is manually killed) during the long pruning procedure? We have to ensure the database is not corrupted, so we need an additional exception handling mechanism.

  • The bloom filter will be persisted before deleting any entry in the database (see the sketch after this list)
    • The bloom filter is first persisted under a temporary name
    • Once fully written, it is renamed to its formal name (with the root hash encoded in the file name)
    • The directory is synced, to ensure the pending rename operation is transferred to the disk
  • If the crash happens before the bloom filter is persisted, nothing has been deleted
  • If the crash happens after the bloom filter is persisted, part of the entries may already be deleted. On the next restart we have to resume the interrupted pruning, otherwise a lot of dangling trie nodes will be left on disk.
  • If the pruning finishes successfully, the persisted bloom filter is deleted
  • If not, geth will enter the recovery on the next restart
  • In the pruning recovery
    • the persisted bloom filter is loaded
    • the clean trie cache is deleted (this is necessary: users may run the pruning command multiple times, and the trie cache can be reconstructed)
    • the database is iterated and the stale entries are deleted
    • the bloom filter is deleted and the whole pruning is marked as "finished"
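
A minimal sketch of the crash-safe persistence order described above (temporary file, rename, directory sync). The function and file names here are illustrative assumptions, not the PR's actual code.

package pruner

import (
	"os"
	"path/filepath"
)

// persistBloomSketch writes the serialized filter under a temporary name,
// renames it to its formal name, and finally syncs the directory so that the
// rename itself is durable before any deletion starts.
func persistBloomSketch(dir, finalName string, data []byte) error {
	tmpPath := filepath.Join(dir, finalName+".tmp")
	finalPath := filepath.Join(dir, finalName)

	// 1. Write the filter under a temporary name and flush it to disk.
	f, err := os.Create(tmpPath)
	if err != nil {
		return err
	}
	if _, err := f.Write(data); err != nil {
		f.Close()
		return err
	}
	if err := f.Sync(); err != nil {
		f.Close()
		return err
	}
	if err := f.Close(); err != nil {
		return err
	}
	// 2. Give it the formal name (root hash encoded in finalName).
	if err := os.Rename(tmpPath, finalPath); err != nil {
		return err
	}
	// 3. Sync the directory so the pending rename reaches the disk.
	d, err := os.Open(dir)
	if err != nil {
		return err
	}
	defer d.Close()
	return d.Sync()
}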

Corner case handling

Pruning is really sensitive because it touches the database offline, so we need to ensure everything still works after the pruning.

  • Delete the trie clean cache. Geth persists the trie clean cache periodically and during shutdown. We have to delete it
    during the pruning, otherwise a "deleted" node (usually the state root) would be hit in the cache and Geth would keep running on the incomplete state (bad blocks, etc.)
  • For clique networks (Rinkeby, Goerli and some other private networks), it's possible that two sequential blocks have the exact same root hash. In this case, no diff layer is created for the second block, which breaks the assumption that the HEAD-127 snapshot layer is paired with the HEAD-127 state. Luckily it can be fixed: we just need to traverse upwards to find the layer that corresponds to the HEAD-127 state (see the sketch after this list). Note that in the traversal we ignore HEAD and HEAD-1 by default, since those states will be reorged with very high probability.
  • We have to modify the snapshot after the pruning
    • Firstly, the target layer should be committed as the disk layer, because:
      • in the normal snapshot recovery, the head is rewound below the disk layer
      • the pruning target is higher than the disk layer (e.g. HEAD-127)
      • all other state roots are deleted
      • so if we don't commit the target as the disk layer, the entire blockchain would be rewound
    • Secondly, write the snapshot journal with the target as the head
      • This silently drops all the diff layers above it
  • Use the recovery mode to initialize the snapshot during the recovery procedure
    • The user may run prune-state multiple times
    • None of them may really finish
    • The head header may already be rewound
    • In the recovery, the snapshot is no longer continuous with the chain, but the recovery is still feasible
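
To illustrate the clique workaround, a rough sketch of scanning upwards from HEAD-127 for a block whose state root is actually present on disk, skipping HEAD and HEAD-1. This is a simplification under assumed names; the actual PR works on snapshot layers rather than raw headers.

package pruner

import (
	"errors"

	"github.com/ethereum/go-ethereum/common"
	"github.com/ethereum/go-ethereum/core/rawdb"
	"github.com/ethereum/go-ethereum/ethdb"
)

// findTargetRoot walks from HEAD-127 upwards (towards HEAD-2) and returns the
// first canonical block root whose root node is present in the database.
// Assumes head > 128; HEAD and HEAD-1 are skipped as too likely to be reorged.
func findTargetRoot(db ethdb.Database, head uint64) (common.Hash, error) {
	for number := head - 127; number <= head-2; number++ {
		hash := rawdb.ReadCanonicalHash(db, number)
		header := rawdb.ReadHeader(db, hash, number)
		if header == nil {
			continue
		}
		// In the hash-keyed trie scheme, the root node is stored under its hash.
		if ok, _ := db.Has(header.Root.Bytes()); ok {
			return header.Root, nil
		}
	}
	return common.Hash{}, errors.New("no persisted state found near HEAD-127")
}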

@rjl493456442
Member Author

@holiman Care to take a look at this PR :P

@holiman
Contributor

holiman commented Oct 20, 2020

I had a mainnet fullsync up to somewhere in the 1.5M region, and decided to try the pruning on it on my laptop.
First I generated the snapshot:

[user@work go-ethereum]$ ./build/bin/geth --snapshot --maxpeers 0 --nodiscover
INFO [10-20|09:24:21.479] Starting Geth on Ethereum mainnet... 
INFO [10-20|09:24:21.479] Bumping default cache on mainnet         provided=1024 updated=4096
WARN [10-20|09:24:21.479] Sanitizing cache to Go's GC limits       provided=4096 updated=2589
INFO [10-20|09:24:21.483] Maximum peer count                       ETH=0 LES=0 total=0
...
INFO [10-20|09:24:21.486] Allocated trie memory caches             clean=388.00MiB dirty=647.00MiB
INFO [10-20|09:24:21.486] Allocated cache and file handles         database=/home/user/.ethereum/geth/chaindata cache=1.26GiB handles=262144
INFO [10-20|09:24:21.538] Opened ancient database                  database=/home/user/.ethereum/geth/chaindata/ancient
INFO [10-20|09:24:21.618] Initialised chain configuration          config="{ChainID: 1 Homestead: 1150000 DAO: 1920000 DAOSupport: true EIP150: 2463000 EIP155: 2675000 EIP158: 2675000 Byzantium: 4370000 Constantinople: 7280000 Petersburg: 7280000 Istanbul: 9069000, Muir Glacier: 9200000, YOLO v1: <nil>, Engine: ethash}"
INFO [10-20|09:24:21.618] Disk storage enabled for ethash caches   dir=/home/user/.ethereum/geth/ethash count=3
INFO [10-20|09:24:21.618] Disk storage enabled for ethash DAGs     dir=/home/user/.ethash               count=2
INFO [10-20|09:24:21.628] Initialising Ethereum protocol           versions="[65 64 63]" network=1 dbversion=8
INFO [10-20|09:24:21.629] Loaded most recent local header          number=1481015 hash="ce870d…eb2c39" td=18215007117177305406 age=4y6mo5d
INFO [10-20|09:24:21.629] Loaded most recent local full block      number=1481015 hash="ce870d…eb2c39" td=18215007117177305406 age=4y6mo5d
INFO [10-20|09:24:21.629] Loaded most recent local fast block      number=1481015 hash="ce870d…eb2c39" td=18215007117177305406 age=4y6mo5d
...
INFO [10-20|09:25:20.551] Generating state snapshot                root="9cb984…971085" at="d83656…783f37" accounts=163260 slots=692476 storage=56.66MiB elapsed=49.333s    eta=9.078s
INFO [10-20|09:25:22.353] Generated state snapshot                 accounts=193305 slots=742039 storage=61.86MiB elapsed=51.135s
^CINFO [10-20|09:28:23.037] Got interrupt, shutting down... 
INFO [10-20|09:28:23.038] IPC endpoint closed                      url=/home/user/.ethereum/geth.ipc
INFO [10-20|09:28:23.038] Ethereum protocol stopped 
INFO [10-20|09:28:23.039] Transaction pool stopped 
INFO [10-20|09:28:23.039] Writing cached state to disk             block=1481015 hash="ce870d…eb2c39" root="9cb984…971085"
INFO [10-20|09:28:23.039] Persisted trie from memory database      nodes=0 size=0.00B time="6.819µs" gcnodes=0 gcsize=0.00B gctime=0s livenodes=1 livesize=0.00B
INFO [10-20|09:28:23.039] Writing cached state to disk             block=1481014 hash="c78e11…9fadac" root="e3809b…278f8d"
INFO [10-20|09:28:23.039] Persisted trie from memory database      nodes=0 size=0.00B time="1.889µs" gcnodes=0 gcsize=0.00B gctime=0s livenodes=1 livesize=0.00B
INFO [10-20|09:28:23.040] Writing cached state to disk             block=1480888 hash="89e8f4…bb98f5" root="a5bb96…bc523a"
INFO [10-20|09:28:23.040] Persisted trie from memory database      nodes=0 size=0.00B time="3.743µs" gcnodes=0 gcsize=0.00B gctime=0s livenodes=1 livesize=0.00B
INFO [10-20|09:28:23.040] Writing snapshot state to disk           root="9cb984…971085"
INFO [10-20|09:28:23.040] Persisted trie from memory database      nodes=0 size=0.00B time="1.852µs" gcnodes=0 gcsize=0.00B gctime=0s livenodes=1 livesize=0.00B
INFO [10-20|09:28:23.040] Writing clean trie cache to disk         path=/home/user/.ethereum/geth/triecache threads=6
INFO [10-20|09:28:23.331] Persisted the clean trie cache           path=/home/user/.ethereum/geth/triecache elapsed=291.243ms
INFO [10-20|09:28:23.331] Blockchain stopped 

And then I tried the pruning. As I did so, I saw in htop that memory usage went up to 6G:

[user@work go-ethereum]$ ./build/bin/geth snapshot prune-state
INFO [10-20|09:28:40.752] Maximum peer count                       ETH=50 LES=0 total=50
INFO [10-20|09:28:40.756] Smartcard socket not found, disabling    err="stat /run/pcscd/pcscd.comm: no such file or directory"
ERROR[10-20|09:28:40.786] Failed to enumerate USB devices          hub=ledger vendor=11415 failcount=1 err="failed to initialize libusb: libusb: unknown error [code -99]"
ERROR[10-20|09:28:40.786] Failed to enumerate USB devices          hub=trezor vendor=21324 failcount=1 err="failed to initialize libusb: libusb: unknown error [code -99]"
ERROR[10-20|09:28:40.786] Failed to enumerate USB devices          hub=trezor vendor=4617  failcount=1 err="failed to initialize libusb: libusb: unknown error [code -99]"
ERROR[10-20|09:28:40.786] Failed to enumerate USB devices          hub=ledger vendor=11415 failcount=2 err="failed to initialize libusb: libusb: unknown error [code -99]"
ERROR[10-20|09:28:40.786] Failed to enumerate USB devices          hub=trezor vendor=21324 failcount=2 err="failed to initialize libusb: libusb: unknown error [code -99]"
ERROR[10-20|09:28:40.786] Failed to enumerate USB devices          hub=trezor vendor=4617  failcount=2 err="failed to initialize libusb: libusb: unknown error [code -99]"
INFO [10-20|09:28:40.789] Set global gas cap                       cap=25000000
INFO [10-20|09:28:40.802] Allocated cache and file handles         database=/home/user/.ethereum/geth/chaindata cache=512.00MiB handles=262144
ERROR[10-20|09:28:41.789] Failed to enumerate USB devices          hub=trezor vendor=4617  failcount=3 err="failed to initialize libusb: libusb: unknown error [code -99]"
ERROR[10-20|09:28:41.789] Failed to enumerate USB devices          hub=ledger vendor=11415 failcount=3 err="failed to initialize libusb: libusb: unknown error [code -99]"
ERROR[10-20|09:28:41.789] Failed to enumerate USB devices          hub=trezor vendor=21324 failcount=3 err="failed to initialize libusb: libusb: unknown error [code -99]"
INFO [10-20|09:28:42.363] Opened ancient database                  database=/home/user/.ethereum/geth/chaindata/ancient
INFO [10-20|09:28:42.368] Disk storage enabled for ethash caches   dir=/home/user/.ethereum/geth/ethash count=3
INFO [10-20|09:28:42.368] Disk storage enabled for ethash DAGs     dir=/home/user/.ethash               count=2
INFO [10-20|09:28:42.372] Loaded most recent local header          number=1481015 hash="ce870d…eb2c39" td=18215007117177305406 age=4y6mo5d
INFO [10-20|09:28:42.372] Loaded most recent local full block      number=1481015 hash="ce870d…eb2c39" td=18215007117177305406 age=4y6mo5d
INFO [10-20|09:28:42.372] Loaded most recent local fast block      number=1481015 hash="ce870d…eb2c39" td=18215007117177305406 age=4y6mo5d
INFO [10-20|09:28:42.411] Initialized state bloom                  size=1.16GiB
INFO [10-20|09:28:42.415] Iterating snapshot                       at="000000…000000" accounts=0 elapsed=1.692ms
INFO [10-20|09:28:50.421] Iterating snapshot                       in="59f36f…b12662" at="290dec…f3e563" accounts=46091 slots=252135 elapsed=8.006s
INFO [10-20|09:28:58.423] Iterating snapshot                       in="ed1c41…9f6b8b" at="290dec…f3e563" accounts=166732 slots=495597 elapsed=16.008s
INFO [10-20|09:29:00.204] Iterated snapshot                        accounts=193305 slots=742039 elapsed=17.790s
fatal error: runtime: out of memory

runtime stack:
runtime.throw(0x12d8d66, 0x16)
	runtime/panic.go:1116 +0x72
runtime.sysMap(0xc100000000, 0x4c000000, 0x2068758)
	runtime/mem_linux.go:169 +0xc5
runtime.(*mheap).sysAlloc(0x2052e60, 0x4a400000, 0x2052e68, 0x25143)
	runtime/malloc.go:715 +0x1cd
runtime.(*mheap).grow(0x2052e60, 0x25143, 0x0)
	runtime/mheap.go:1286 +0x11c
runtime.(*mheap).allocSpan(0x2052e60, 0x25143, 0x450100, 0x2068768, 0x200000001)
	runtime/mheap.go:1124 +0x6a0
runtime.(*mheap).alloc.func1()
	runtime/mheap.go:871 +0x64
runtime.(*mheap).alloc(0x2052e60, 0x25143, 0xc000000101, 0xc000103c80)
	runtime/mheap.go:865 +0x81
runtime.largeAlloc(0x4a285888, 0xc000130101, 0x484201)
	runtime/malloc.go:1152 +0x92
runtime.mallocgc.func1()
	runtime/malloc.go:1047 +0x46
runtime.systemstack(0x7359e4000020)
	runtime/asm_amd64.s:370 +0x66
runtime.mstart()
	runtime/proc.go:1041

goroutine 1 [running]:
runtime.systemstack_switch()
	runtime/asm_amd64.s:330 fp=0xc000457d58 sp=0xc000457d50 pc=0x4803b0
runtime.mallocgc(0x4a285888, 0x10cf240, 0x1, 0x3fd3333333333333)
	runtime/malloc.go:1046 +0x895 fp=0xc000457df8 sp=0xc000457d58 pc=0x4271f5
runtime.makeslice(0x10cf240, 0x4a285888, 0x4a285888, 0xa33641)
	runtime/slice.go:49 +0x6c fp=0xc000457e28 sp=0xc000457df8 pc=0x46750c
bytes.makeSlice(0x4a285888, 0x0, 0x0, 0x0)
	bytes/buffer.go:229 +0x73 fp=0xc000457e68 sp=0xc000457e28 pc=0x513f43
bytes.(*Buffer).grow(0xc06e606570, 0x4a2856d8, 0x4824c1)
	bytes/buffer.go:142 +0x15b fp=0xc000457eb8 sp=0xc000457e68 pc=0x51388b
bytes.(*Buffer).Write(0xc06e606570, 0xc0b1c60000, 0x4a2856d8, 0x4a2856d8, 0x0, 0x0, 0x0)
	bytes/buffer.go:172 +0xdc fp=0xc000457ee8 sp=0xc000457eb8 pc=0x513b8c
encoding/binary.Write(0x1535360, 0xc06e606570, 0x1562d60, 0x2065b48, 0x10ba460, 0xc06e608ca0, 0x0, 0x0)
	encoding/binary/binary.go:375 +0x157 fp=0xc000458010 sp=0xc000457ee8 pc=0x527e17
github.com/steakknife/bloomfilter.(*Filter).marshal(0xc00069e000, 0xc06e606570, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
	github.com/steakknife/bloomfilter@v0.0.0-20180922174646-6819c0d2a570/binarymarshaler.go:64 +0x48d fp=0xc0004580f8 sp=0xc000458010 pc=0x7a01dd
github.com/steakknife/bloomfilter.(*Filter).MarshalBinary(0xc00069e000, 0xc09f465e60, 0xffffffffffffffff, 0xc00062ea50, 0x0, 0x0)
	github.com/steakknife/bloomfilter@v0.0.0-20180922174646-6819c0d2a570/binarymarshaler.go:76 +0x40 fp=0xc0004581e0 sp=0xc0004580f8 pc=0x7a04d0
github.com/steakknife/bloomfilter.(*Filter).WriteTo(0xc00069e000, 0x15371e0, 0xc09f465e60, 0x0, 0x0, 0x0)
	github.com/steakknife/bloomfilter@v0.0.0-20180922174646-6819c0d2a570/fileio.go:83 +0xe9 fp=0xc000458260 sp=0xc0004581e0 pc=0x7a24c9
github.com/steakknife/bloomfilter.(*Filter).WriteFile(0xc00069e000, 0xc099e37cb0, 0x2e, 0x0, 0x0, 0x0)
	github.com/steakknife/bloomfilter@v0.0.0-20180922174646-6819c0d2a570/fileio.go:104 +0xef fp=0xc0004582c8 sp=0xc000458260 pc=0x7a26ef
github.com/ethereum/go-ethereum/core/state/pruner.(*StateBloom).Commit(0xc00007a040, 0xc000618030, 0x2a, 0xf837d9c35f04c263, 0x851097d424fe3bd9)
	github.com/ethereum/go-ethereum/core/state/pruner/bloom.go:101 +0x9d fp=0xc000458318 sp=0xc0004582c8 pc=0xcea1ed
github.com/ethereum/go-ethereum/core/state/pruner.(*Pruner).Prune(0xc000078030, 0x2dc19c17fb84b99c, 0xdd17b801690cce5d, 0xf837d9c35f04c263, 0x851097d424fe3bd9, 0x851097d424fe3bd9, 0xc000039c80)
	github.com/ethereum/go-ethereum/core/state/pruner/pruner.go:206 +0x13f fp=0xc0004583d8 sp=0xc000458318 pc=0xcebebf
main.pruneState(0xc000330dc0, 0x0, 0x0)
	github.com/ethereum/go-ethereum/cmd/geth/snapshot.go:145 +0x2b1 fp=0xc000458a50 sp=0xc0004583d8 pc=0xf94aa1
github.com/ethereum/go-ethereum/cmd/utils.MigrateFlags.func1(0xc000330dc0, 0x0, 0xc000330dc0)
	github.com/ethereum/go-ethereum/cmd/utils/flags.go:1898 +0xbe fp=0xc000458ac8 sp=0xc000458a50 pc=0xe43e7e
gopkg.in/urfave/cli%2ev1.HandleAction(0x10e9d80, 0xc000648af0, 0xc000330dc0, 0xc00007e300, 0x0)
	gopkg.in/urfave/cli.v1@v1.20.0/app.go:490 +0xc8 fp=0xc000458af0 sp=0xc000458ac8 pc=0xa353b8
gopkg.in/urfave/cli%2ev1.Command.Run(0x12cbccb, 0xb, 0x0, 0x0, 0x0, 0x0, 0x0, 0x12fd2f7, 0x31, 0x0, ...)
	gopkg.in/urfave/cli.v1@v1.20.0/command.go:210 +0x9e8 fp=0xc000458d70 sp=0xc000458af0 pc=0xa36818
gopkg.in/urfave/cli%2ev1.(*App).RunAsSubcommand(0xc0006704e0, 0xc0003306e0, 0x0, 0x0)
	gopkg.in/urfave/cli.v1@v1.20.0/app.go:379 +0x88b fp=0xc000459328 sp=0xc000458d70 pc=0xa3427b
gopkg.in/urfave/cli%2ev1.Command.startApp(0x12c8982, 0x8, 0x0, 0x0, 0x0, 0x0, 0x0, 0x12f1a61, 0x27, 0x0, ...)
	gopkg.in/urfave/cli.v1@v1.20.0/command.go:298 +0x81a fp=0xc000459998 sp=0xc000459328 pc=0xa37e2a
gopkg.in/urfave/cli%2ev1.Command.Run(0x12c8982, 0x8, 0x0, 0x0, 0x0, 0x0, 0x0, 0x12f1a61, 0x27, 0x0, ...)
	gopkg.in/urfave/cli.v1@v1.20.0/command.go:98 +0x1219 fp=0xc000459c18 sp=0xc000459998 pc=0xa37049
gopkg.in/urfave/cli%2ev1.(*App).Run(0xc0006701a0, 0xc00011a150, 0x3, 0x3, 0x0, 0x0)
	gopkg.in/urfave/cli.v1@v1.20.0/app.go:255 +0x741 fp=0xc000459f28 sp=0xc000459c18 pc=0xa33641
main.main()
	github.com/ethereum/go-ethereum/cmd/geth/main.go:275 +0x55 fp=0xc000459f88 sp=0xc000459f28 pc=0xf87a85
runtime.main()
	runtime/proc.go:203 +0x1fa fp=0xc000459fe0 sp=0xc000459f88 pc=0x45199a
runtime.goexit()
	runtime/asm_amd64.s:1373 +0x1 fp=0xc000459fe8 sp=0xc000459fe0 pc=0x4824c1

@holiman
Contributor

holiman commented Oct 20, 2020

It seems that the binary marshaller writes the entire 1.6G bloom into a buffer, so that's a 2x amplification. Then it calls .Bytes() on it, to shove into a hash function (so that's a 3x amplification). A sketch of a streaming alternative follows the snippet:

func (f *Filter) marshal() (buf *bytes.Buffer,
	hash [sha512.Size384]byte,
	err error,
) {
	f.lock.RLock()
	defer f.lock.RUnlock()

	debug("write bf k=%d n=%d m=%d\n", f.K(), f.n, f.m)

	buf = new(bytes.Buffer)

	err = binary.Write(buf, binary.LittleEndian, f.K())
	if err != nil {
		return nil, hash, err
	}

	err = binary.Write(buf, binary.LittleEndian, f.n)
	if err != nil {
		return nil, hash, err
	}

	err = binary.Write(buf, binary.LittleEndian, f.m)
	if err != nil {
		return nil, hash, err
	}

	err = binary.Write(buf, binary.LittleEndian, f.keys)
	if err != nil {
		return nil, hash, err
	}

	err = binary.Write(buf, binary.LittleEndian, f.bits)
	if err != nil {
		return nil, hash, err
	}

	hash = sha512.Sum384(buf.Bytes())
	err = binary.Write(buf, binary.LittleEndian, hash)
	return buf, hash, err
}
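
For comparison, a streaming variant could hash the fields on the fly with an io.MultiWriter instead of buffering everything first. This is only a sketch of the idea (assuming crypto/sha512, encoding/binary and io are imported, and the same Filter fields as above), not the fix that actually landed in holiman/bloomfilter.

// writeTo streams the filter straight to w while hashing the same bytes on
// the fly, avoiding the intermediate bytes.Buffer entirely. Field names
// mirror the snippet above.
func (f *Filter) writeTo(w io.Writer) error {
	h := sha512.New384()
	mw := io.MultiWriter(w, h) // every byte goes to both the destination and the hash

	for _, v := range []interface{}{f.K(), f.n, f.m, f.keys, f.bits} {
		if err := binary.Write(mw, binary.LittleEndian, v); err != nil {
			return err
		}
	}
	// Append the checksum of everything written so far (destination only).
	_, err := w.Write(h.Sum(nil))
	return err
}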

@holiman
Contributor

holiman commented Oct 20, 2020

I can make a less memory-explosive save-function in https://github.com/holiman/bloomfilter, and we can switch to that one instead.

@holiman
Contributor

holiman commented Oct 20, 2020

Yup, 3x:

func TestWrite(t *testing.T) {
	// 1Mb
	f, _ := New(1* 8*1024*1024, 1)
	fmt.Printf("Allocated 1mb filter\n")
	PrintMemUsage()
	_, _ = f.WriteTo(devnull{})
	fmt.Printf("Wrote filter to devnull\n")
	PrintMemUsage()
}
Allocated 1mb filter
Alloc = 1 MiB	TotalAlloc = 1 MiB	Sys = 67 MiB	NumGC = 0
Wrote filter to devnull
Alloc = 3 MiB	TotalAlloc = 3 MiB	Sys = 68 MiB	NumGC = 1
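
The helpers used by that test aren't shown; presumably they look something like the following (an assumption on my part, matching the printed output format; needs fmt and runtime imported):

// devnull discards everything written to it, so only allocation is measured.
type devnull struct{}

func (devnull) Write(p []byte) (int, error) { return len(p), nil }

// PrintMemUsage reports the current Go heap statistics in MiB.
func PrintMemUsage() {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	fmt.Printf("Alloc = %v MiB\tTotalAlloc = %v MiB\tSys = %v MiB\tNumGC = %v\n",
		m.Alloc/1024/1024, m.TotalAlloc/1024/1024, m.Sys/1024/1024, m.NumGC)
}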

@holiman
Contributor

holiman commented Oct 20, 2020

Swapped it out for my fixed bloomfilter, now it succeeded:

INFO [10-20|16:07:10.610] Iterated snapshot                        accounts=193305 slots=742039 elapsed=14.499s
INFO [10-20|16:07:39.083] Pruning state data                       count=622000 size=220.77MiB elapsed=8.013s
INFO [10-20|16:07:47.087] Pruning state data                       count=1473000 size=522.33MiB elapsed=16.017s
INFO [10-20|16:07:47.771] Pruned state data                        count=1598659 size=566.83MiB elapsed=16.701s
INFO [10-20|16:07:47.771] Start compacting the database 
INFO [10-20|16:08:23.280] Compacted the whole database             elapsed=35.509s
INFO [10-20|16:08:23.280] Successfully prune the state             pruned=566.83MiB elasped=1m27.174s

@rjl493456442
Member Author

@holiman I realized that this approach may not work. We hold the assumption that the state associated with the snapshot exists on disk. However, there is no such guarantee: the snapshot can be generated at runtime and the corresponding state pruned afterwards.

We could use the "latest state", since the snapshot and the state both exist in most cases. But using the "latest state" as the pruning target is too aggressive: it can be invalidated by a reorg.

@holiman
Contributor

holiman commented Oct 20, 2020

In any normal case, we have n, n-1 and n-127, IIRC. So using n-127 seems like a good fit?

@rjl493456442
Member Author

@holiman Yes, we can try n-127. At least before the pruning, we can check for the presence of the root node.

@rjl493456442
Member Author

rjl493456442 commented Oct 21, 2020

Now I picked the root layer as the "pruning target". According to https://github.com/ethereum/go-ethereum/blob/master/core/blockchain.go#L965, in theory the disk layer state should also be persisted.

@rjl493456442
Member Author

rjl493456442 commented Nov 3, 2020

  1. Update the pivot point after the pruning

Probably the complexity is not worthwhile. It's not very easy to get the relevant block number in the pruning recovery function, and even if we don't update the pivot point, it's still acceptable to iterate a few more blocks.

@rjl493456442
Member Author

rjl493456442 commented Nov 3, 2020

2. Print warning if the cache is not in the default location

Done

@rjl493456442
Member Author

rjl493456442 commented Nov 3, 2020

3. Initialize the optimal bloom filter with the "allowance". The default size can be 2GB for example
Done
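
For intuition on sizing by allowance, here is a rough sketch (my own illustration, not the PR's code) of deriving bloom parameters from a memory budget, using the standard optimal-k relationship k = -log2(p):

package pruner

import "math"

// bloomParams derives a bit count and hash-function count from a memory
// allowance (in MB) and a target false-positive rate, assuming the filter is
// operated at its optimal number of hash functions.
func bloomParams(allowanceMB uint64, falsePositive float64) (bits, hashes uint64) {
	bits = allowanceMB * 1024 * 1024 * 8
	hashes = uint64(math.Ceil(-math.Log2(falsePositive)))
	return bits, hashes
}

For example, a 2048MB allowance with a 0.05% false-positive target gives roughly 1.7e10 bits and 11 hash functions.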

Contributor

@holiman holiman left a comment


LGTM

func pruneState(ctx *cli.Context) error {
glogger := log.NewGlogHandler(log.StreamHandler(os.Stderr, log.TerminalFormat(true)))
glogger.Verbosity(log.LvlInfo)
log.Root().SetHandler(glogger)
Member


Why are you explicitly hard coding the log configs? Aren't these automatically set up for all commands (also taking into consideration all the log flags)?

Member Author


I removed this code, but it's still problematic: we have to set the log configuration via global settings because the logging initialization happens before the flag migration.

I will leave it for now, since it looks more like a general issue. We can fix it later (enabling the logging config for all sub-commands).


// emptyCode is the known hash of the empty EVM bytecode.
emptyCode = crypto.Keccak256(nil)
)
Member


Please move this to the top of the file. Also, could we perhaps separate these out into some package? It's getting annoying that every package redeclares this. cc @holiman @fjl

Member Author


Yes, perhaps we can put them in common (e.g. common/misc.go). But I can do that in a separate PR.

rjl493456442 and others added 5 commits February 5, 2021 14:52
core: fix db inspector

cmd/geth: add verify-state

cmd/geth: add verification tool

core/rawdb: implement flatdb

cmd, core: fix rebase

core/state: use new contract code layout

core/state/pruner: avoid deleting genesis state

cmd/geth: add helper function

core, cmd: fix extract genesis

core: minor fixes

contracts: remove useless

core/state/snapshot: plugin stacktrie

core: polish

core/state/snapshot: iterate storage concurrently

core/state/snapshot: fix iteration

core: add comments

core/state/snapshot: polish code

core/state: polish

core/state/snapshot: rebase

core/rawdb: add comments

core/rawdb: fix tests

core/rawdb: improve tests

core/state/snapshot: fix concurrent iteration

core/state: run pruning during the recovery

core, trie: implement martin's idea

core, eth: delete flatdb and polish pruner

trie: fix import

core/state/pruner: add log

core/state/pruner: fix issues

core/state/pruner: don't read back

core/state/pruner: fix contract code write

core/state/pruner: check root node presence

cmd, core: polish log

core/state: use HEAD-127 as the target

core/state/snapshot: improve tests

cmd/geth: fix verification tool

cmd/geth: use HEAD as the verification default target

all: replace the bloomfilter with martin's fork

cmd, core: polish code

core, cmd: forcibly delete state root

core/state/pruner: add hash64

core/state/pruner: fix blacklist

core/state: remove blacklist

cmd, core: delete trie clean cache before pruning

cmd, core: fix lint

cmd, core: fix rebase

core/state: fix the special case for clique networks

core/state/snapshot: remove useless code

core/state/pruner: capping the snapshot after pruning

cmd, core, eth: fixes

core/rawdb: update db inspector

cmd/geth: polish code

core/state/pruner: fsync bloom filter

cmd, core: print warning log

core/state/pruner: adjust the parameters for bloom filter

cmd, core: create the bloom filter by size

core: polish

core/state/pruner: sanitize invalid bloomfilter size

cmd: address comments

cmd/geth: address comments

cmd/geth: address comment

core/state/pruner: address comments

core/state/pruner: rename homedir to datadir

cmd, core: address comments

core/state/pruner: address comment

core/state: address comments

core, cmd, tests: address comments

core: address comments

core/state/pruner: release the iterator after each commit

core/state/pruner: improve pruner

cmd, core: adjust bloom paramters

core/state/pruner: fix lint

core/state/pruner: fix tests

core: fix rebase

core/state/pruner: remove atomic rename

core/state/pruner: address comments

all: run go mod tidy

core/state/pruner: avoid false-positive for the middle state roots

core/state/pruner: add checks for middle roots

cmd/geth: replace crit with error
@karalabe karalabe merged commit f566dd3 into ethereum:master Feb 8, 2021
renaynay pushed a commit to renaynay/go-ethereum that referenced this pull request Feb 9, 2021
* cmd, core, tests: initial state pruner

core: fix db inspector

cmd/geth: add verify-state

cmd/geth: add verification tool

core/rawdb: implement flatdb

cmd, core: fix rebase

core/state: use new contract code layout

core/state/pruner: avoid deleting genesis state

cmd/geth: add helper function

core, cmd: fix extract genesis

core: minor fixes

contracts: remove useless

core/state/snapshot: plugin stacktrie

core: polish

core/state/snapshot: iterate storage concurrently

core/state/snapshot: fix iteration

core: add comments

core/state/snapshot: polish code

core/state: polish

core/state/snapshot: rebase

core/rawdb: add comments

core/rawdb: fix tests

core/rawdb: improve tests

core/state/snapshot: fix concurrent iteration

core/state: run pruning during the recovery

core, trie: implement martin's idea

core, eth: delete flatdb and polish pruner

trie: fix import

core/state/pruner: add log

core/state/pruner: fix issues

core/state/pruner: don't read back

core/state/pruner: fix contract code write

core/state/pruner: check root node presence

cmd, core: polish log

core/state: use HEAD-127 as the target

core/state/snapshot: improve tests

cmd/geth: fix verification tool

cmd/geth: use HEAD as the verification default target

all: replace the bloomfilter with martin's fork

cmd, core: polish code

core, cmd: forcibly delete state root

core/state/pruner: add hash64

core/state/pruner: fix blacklist

core/state: remove blacklist

cmd, core: delete trie clean cache before pruning

cmd, core: fix lint

cmd, core: fix rebase

core/state: fix the special case for clique networks

core/state/snapshot: remove useless code

core/state/pruner: capping the snapshot after pruning

cmd, core, eth: fixes

core/rawdb: update db inspector

cmd/geth: polish code

core/state/pruner: fsync bloom filter

cmd, core: print warning log

core/state/pruner: adjust the parameters for bloom filter

cmd, core: create the bloom filter by size

core: polish

core/state/pruner: sanitize invalid bloomfilter size

cmd: address comments

cmd/geth: address comments

cmd/geth: address comment

core/state/pruner: address comments

core/state/pruner: rename homedir to datadir

cmd, core: address comments

core/state/pruner: address comment

core/state: address comments

core, cmd, tests: address comments

core: address comments

core/state/pruner: release the iterator after each commit

core/state/pruner: improve pruner

cmd, core: adjust bloom paramters

core/state/pruner: fix lint

core/state/pruner: fix tests

core: fix rebase

core/state/pruner: remove atomic rename

core/state/pruner: address comments

all: run go mod tidy

core/state/pruner: avoid false-positive for the middle state roots

core/state/pruner: add checks for middle roots

cmd/geth: replace crit with error

* core/state/pruner: fix lint

* core: drop legacy bloom filter

* core/state/snapshot: improve pruner

* core/state/snapshot: polish concurrent logs to report ETA vs. hashes

* core/state/pruner: add progress report for pruning and compaction too

* core: fix snapshot test API

* core/state: fix some pruning logs

* core/state/pruner: support recovering from bloom flush fail

Co-authored-by: Péter Szilágyi <peterke@gmail.com>
renaynay pushed a commit to renaynay/go-ethereum that referenced this pull request Feb 16, 2021
filipescuc pushed a commit to EthereumGenesys/go-ethereum that referenced this pull request Mar 2, 2021