Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Downloader sync mode changed mid-run #29817

Open
rjl493456442 opened this issue May 22, 2024 · 2 comments
Open

Downloader sync mode changed mid-run #29817

rjl493456442 opened this issue May 22, 2024 · 2 comments
Labels

Comments

@rjl493456442
Copy link
Member

Recently i have observed some error logs in our benchmark infra, saying Downloader sync mode changed mid-run.
And even, in the latest snap sync, bad block error occurs

WARN [05-21|16:40:03.161] Unexpected trienode heal packet          peer=c847ea3a reqid=1,847,216,001,477,791,397
ERROR[05-21|16:40:03.171] Beacon backfilling failed                err="syncing canceled (requested)"
WARN [05-21|16:40:03.177] Retrieved pivot header from local        number=19,919,467 hash=694934..f919d8 latest=19,919,531 oldest=19,919,529
INFO [05-21|16:40:03.177] Forkchoice requested sync to new head    number=19,919,531 hash=235392..2ef6b4 finalized=unknown
WARN [05-21|16:40:03.190] Failed to load old bad blocks            error="pebble: not found"
ERROR[05-21|16:40:03.196] 
########## BAD BLOCK #########
Error: could not apply tx 0 [0x9521c089bbcb8808db31da1d39338cc4177ef92e325f0dee7408cb7419c24c67]: nonce too high: address 0x22B0351D445840dB59B97df3808dd642DCB17e96, tx: 1105 state: 0
Platform: geth (devel) go1.22.3 amd64 linux
VCS: 639900af-20240518
Chain config: &params.ChainConfig{ChainID:1, HomesteadBlock:1150000, DAOForkBlock:1920000, DAOForkSupport:true, EIP150Block:2463000, EIP155Block:2675000, EIP158Block:2675000, ByzantiumBlock:4370000, ConstantinopleBlock:7280000, PetersburgBlock:7280000, IstanbulBlock:9069000, MuirGlacierBlock:9200000, BerlinBlock:12244000, LondonBlock:12965000, ArrowGlacierBlock:13773000, GrayGlacierBlock:15050000, MergeNetsplitBlock:<nil>, ShanghaiTime:(*uint64)(0xc0003ed5c8), CancunTime:(*uint64)(0xc0003ed5d0), PragueTime:(*uint64)(nil), VerkleTime:(*uint64)(nil), TerminalTotalDifficulty:58750000000000000000000, TerminalTotalDifficultyPassed:true, Ethash:(*params.EthashConfig)(0x2eb3da0), Clique:(*params.CliqueConfig)(nil)}
Receipts: 
##############################

After some preliminary investigations, here are some important events ordered by timeline:

  • Initial state sync completes at block 19,919,465

INFO [05-21|16:39:48.215] Committed new head block number=19,919,465 hash=481b69..7f2734

  • Remaining blocks after 19,919,465 are inserted one by one

  • A new block announcement received and it's gapped with local beacon tip

A new sync cycle is scheduled with snap mode

INFO [05-21|16:39:52.214] New head block                           number=19,919,530 hash=e4e51e..b852e8 beaconRoot=52355f..a5e8b4
WARN [05-21|16:39:52.224] Ignoring payload while snap syncing      number=19,919,530 hash=e4e51e..b852e8 reason="chain gapped, head: 19919528, newHead: 19919530"
INFO [05-21|16:39:52.225] Forkchoice requested sync to new head    number=19,919,530 hash=e4e51e..b852e8 finalized=unknown
INFO [05-21|16:39:52.225] Restarting sync cycle                    reason="chain gapped, head: 19919528, newHead: 19919530"
  • Remaining blocks of last sync cycle are fully inserted, with 19,919,529 as chain tip
May 22 00:40:00 bench06.ethdevops.io geth INFO [05-21|16:40:00.918] Imported new chain segment               number=19,919,529 hash=fa4f2c..c4a659 blocks=23         txs=4505 mgas=338.444 elapsed=4.648s      mgasps=72.805 snapdiffs=359.45KiB triediffs=100.91MiB triedirty=0.00B
May 22 00:40:00 bench06.ethdevops.io geth INFO [05-21|16:40:00.918] Syncing: chain download in progress      synced=100.00% chain=753.90GiB headers=19,919,529@9.29GiB bodies=19,919,529@553.06GiB receipts=19,919,529@191.55GiB eta=-11.071s
May 22 00:40:00 bench06.ethdevops.io geth INFO [05-21|16:40:00.918] Snap sync complete, auto disabling
  • A new sync cycle is restarted with "snap mode" (the mode was assigned before the completion of last sync cycle)

State healer removes the inconsistent root trie node and seeks for the state corresponding to block 19,919,530

May 22 00:40:00 bench06.ethdevops.io geth INFO [05-21|16:40:00.919] Disabled trie database due to state sync
May 22 00:40:01 bench06.ethdevops.io geth INFO [05-21|16:40:00.941] Aborting state snapshot generation       root=3dc301..5a1fab in=0042a5..807326 at=dab444..649ebe accounts=249,416              slots=1,914,524               storage=149.99MiB dangling=0 elapsed=12.725s     eta=3h28m21.072s
May 22 00:40:01 bench06.ethdevops.io geth WARN [05-21|16:40:00.948] Retrieved pivot header from local        number=19,919,466 hash=7f30be..0df48d latest=19,919,530 oldest=19,919,529
May 22 00:40:01 bench06.ethdevops.io geth INFO [05-21|16:40:00.949] Disabling direct-ancient mode            origin=19,919,465 ancient=19,918,621
May 22 00:40:01 bench06.ethdevops.io geth WARN [05-21|16:40:00.950] Retrieved beacon headers from local      from=19,919,466 count=63
May 22 00:40:01 bench06.ethdevops.io geth INFO [05-21|16:40:00.951] Syncing: state healing in progress       accounts=97978@6.50MiB        slots=119,196@8.99MiB         codes=37@305.80KiB       nodes=929,750@284.54MiB pending=1
May 22 00:40:01 bench06.ethdevops.io geth WARN [05-21|16:40:01.027] Ignoring already known beacon payload    number=19,919,529 hash=fa4f2c..c4a659 age=26s
May 22 00:40:01 bench06.ethdevops.io geth INFO [05-21|16:40:01.140] Syncing: state healing in progress       accounts=97978@6.50MiB        slots=119,196@8.99MiB         codes=37@305.80KiB       nodes=929,937@284.63MiB pending=475
May 22 00:40:01 bench06.ethdevops.io geth WARN [05-21|16:40:01.152] Unexpected trienode heal packet          peer=5b1055ce reqid=6,030,026,841,057,844,391
May 22 00:40:01 bench06.ethdevops.io geth INFO [05-21|16:40:01.922] Indexing transactions                    blocks=2742       txs=477,046 tail=19,916,788 total=2,350,000  elapsed=1.003s
May 22 00:40:02 bench06.ethdevops.io factor INFO [05-21|16:40:02.424] New head block                           number=19,919,531 hash=235392..2ef6b4 beaconRoot=e794fb..6e2916
May 22 00:40:02 bench06.ethdevops.io geth ERROR[05-21|16:40:02.431] Downloader sync mode changed mid-run     old=full new=full
May 22 00:40:03 bench06.ethdevops.io geth INFO [05-21|16:40:03.152] Syncing: state healing in progress       accounts=98140@6.51MiB        slots=119,196@8.99MiB         codes=37@305.80KiB       nodes=931,593@285.29MiB pending=1152
  • A new payload event for block 19,919,530 is executed in the mean time, result in the bad block due to state loss
@rjl493456442
Copy link
Member Author

rjl493456442 commented May 22, 2024

Conclusion I got until now:

Sync mode is changed in the middle of sync process because of data race

The mode of the last sync cycle was assigned before the completion of the prior cycle, and the desired sync mode was set to full when the prior cycle completes. Namely snap sync is running while the desired sync mode has been changed to full. Any consensus event received during this time window could trigger this error log.

Block insertion may run with snap sync simultaneously

It could lead to bad block or potentially state corruption.

@smin-k
Copy link
Contributor

smin-k commented Aug 5, 2024

I am interested in this issue.

Was the change from "snap" to "full" sync automatic, or was it initiated by a client command?

Could restricting mode changes during synchronization, as managed by the setMode function, be a potential solution to this problem?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants