Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Logical Commit(s) are not Atomic #6370

Closed
4 tasks
joe-bowman opened this issue Jun 8, 2020 · 15 comments
Closed
4 tasks

Logical Commit(s) are not Atomic #6370

joe-bowman opened this issue Jun 8, 2020 · 15 comments
Labels
C:Store T:tech debt Tech debt that should be cleaned up

Comments

@joe-bowman
Copy link
Contributor

joe-bowman commented Jun 8, 2020

Summary of Bug

When syncing a node using --pruning=everything, it is possible to gracefully stop a node and it fail to restart.

Version

cosmos-sdk 0.38.4 (via both Kava v0.8.1 and Gaia)
tendermint 0.33.3 and 0.33.4

Steps to Reproduce

  1. A running validator exists. Pruning mode does not appear to be relevant here. Tried with nothing, everything and syncable.
  2. Start a fullnode, with the validator as it's persistent peer; set pruning=everything (both commandline and app.toml behave the same - as expected)
  3. Allow the node to sync briefly (best results > 2.5k block)
  4. ctrl+c to gracefully halt the syncing fullnode
  5. start the node again.
  6. node fails to start.

Expected behaviour: Node starts and begin syncing again

Actual behaviour: (not 100% of the time, but c 75% of tests)

I[2020-06-08|21:22:56.843] starting ABCI with Tendermint                module=main 
failed to load Store: wanted to load target 3171 but only found up to 0

Anecdotally, if you stop shortly after restarting, this does not happen - however, a few thousand blocks in, it is reasonably consistent behaviour that the node fails in the way described above.


For Admin Use

  • Not duplicate issue
  • Appropriate labels applied
  • Appropriate contributors tagged
  • Contributor assigned/self-assigned
@joe-bowman
Copy link
Contributor Author

Seems related to: #5746

@alexanderbez
Copy link
Contributor

alexanderbez commented Jun 8, 2020

Please confirm that you're starting the node with a pruning strategy, say nothing, and then later changing it, say to everything. If so, this behavior is not supported by IAVL.

You must keep in mind IAVL now keeps versions in-memory only and then periodically flushes to disk.

@alexanderbez
Copy link
Contributor

Note, I cannot reproduce this on Gaia.

@karzak
Copy link
Contributor

karzak commented Jun 8, 2020

Have confirmed this behavior as well on kava v0.8.1. Had to shutdown and restart the process ~20 times to get a failure.

@joe-bowman
Copy link
Contributor Author

Please confirm that you're starting the node with a pruning strategy, say nothing, and then later changing it, say to everything. If so, this behavior is not supported by IAVL.

You must keep in mind IAVL now keeps versions in-memory only and then periodically flushes to disk.

I can confirm this is not the case. This is using pruning=everything from height 0.

@olwee
Copy link

olwee commented Jun 9, 2020

Able to replicate for the scenario of pruning=everything for heights above 2500.

Test Heights: 2500 ++ to 3300 ++
Failure rate: 1 in 5
Status: Unrecoverable after failure.
Sample Error Message:
I[2020-06-09|08:09:43.216] starting ABCI with Tendermint module=main
failed to load Store: wanted to load target 2842 but only found up to 0

Remedy: unsafe-reset-all and sync from scratch

@alexanderbez alexanderbez added this to the v0.39 milestone Jun 9, 2020
@alexanderbez
Copy link
Contributor

alexanderbez commented Jun 9, 2020

@AdityaSripal @zmanian more pruning related issues. Seems like when pruning=everything from the onset causes an issue after you stop a node after a snapshot height.

@erikgrinaker
Copy link
Contributor

I've been able to reproduce this with Gaia master, running 4 txs per 1-second block and restarting a node every 10 seconds:

gaiadnode3    | failed to load latest version: failed to load store: wanted to load target 797 but only found up to 0
gaiadnode3 exited with code 0

This happened when restarting just as block 798 was in the process of being committed. I have some ideas about what might be causing this, will dig further.

@erikgrinaker
Copy link
Contributor

erikgrinaker commented Jun 12, 2020

I've identified the problem: Cosmos SDK commits are not atomic (as @alexanderbez also suspected).

The SDK uses multiple logically-independent IAVL stores with a single backing LevelDB database via rootmulti.Store. When the root store is committed, it commits each of the backing IAVL stores sequentially, which both commits the new version and deletes the old. Once all stores have committed, the root store records the latest version. Of course, if the commit is aborted midway, some IAVL stores may have committed the new version (and deleted the previous) while others haven't - and the root still points to the previous version. On restart, it tries to load the previous version, but some IAVL stores have already deleted it.

The problem is further exacerbated by Tendermint not waiting for in-flight blocks to be applied before exiting.

The following are LevelDB entries from the broken node, showing the IAVL root nodes of the acc logical store, where /r signifies root followed by the big-endian int64 version:

application.db $ leveldbutil dump *.ldb | grep /r | sort
's/k:acc/r\x00\x00\x00\x00\x00\x00\x03\x1d' @ 298335 : val => '\x0c\xe8\xa8...
's/k:acc/r\x00\x00\x00\x00\x00\x00\x03\x1d' @ 298646 : del => ''
's/k:acc/r\x00\x00\x00\x00\x00\x00\x03\x1e' @ 298619 : val => '\xee\x8cj...

0x031e is 798 (exists), 0x031d is 797 (deleted). The following is the root store version key:

application.db $ leveldbutil dump *.ldb | grep s/latest | sort | tail -n 3
's/latest' @ 297796 : val => '\x9b\x06'
's/latest' @ 298184 : val => '\x9c\x06'
's/latest' @ 298572 : val => '\x9d\x06'

0x9d06 is the Amino-encoded (i.e. varint-encoded) integer version, which decodes to 797:

i, _ := binary.Uvarint([]byte{0x9d, 0x06})
fmt.Printf("%v", i) // Output: 797

Thus, on startup the root store attempts to load version 797, which has already been deleted by the acc IAVL store.

I believe this is a different problem from cosmos/iavl#261, so I will continue debugging that separately.

How to fix

I will leave it for the SDK team to fix the commit issue. Ideally, a single ACID database transaction should be used to store all data for a given version, but this probably requires a major architecture change. Barring that, I'd recommend first committing all sub-stores, then updating the root commit data, and only then deleting the old versions - and also to consider the effects of having some IAVL stores with "future" versions already committed, which must be ignored/deleted/replaced.

Tendermint should also wait for commits to complete before shutting down, see tendermint/tendermint#5002.

@alexanderbez
Copy link
Contributor

alexanderbez commented Jun 12, 2020

Excellent @erikgrinaker, this confirms my suspicions -- updates to the logical stores and the root logical store (along with pruning) is NOT atomic.

Being that they all use the same underlying physical store, we need to devise a way for them all to use the same batch object. So as you've pointed, some relatively major changes to the SDK and IAVL.

This is certainly blocking 0.39. I'll take this on.

@alexanderbez alexanderbez self-assigned this Jun 12, 2020
@alexanderbez alexanderbez changed the title Pruning: Node cannot be restarted after graceful shutdown with --pruning=everything Logical Commit is not Atomic Jun 12, 2020
@alexanderbez alexanderbez changed the title Logical Commit is not Atomic Logical Commit(s) are not Atomic Jun 12, 2020
@alexanderbez alexanderbez modified the milestones: v0.39, v0.40 Jun 26, 2020
@aaronc aaronc added ice-box and removed backlog labels Oct 23, 2020
@aaronc aaronc modified the milestones: v0.41, v0.42 Jan 6, 2021
@clevinson
Copy link
Contributor

We'll be tackling this when evaluating the future of IAVL / commitment proofs, and its currently blocked on that larger conversation. Handover of IAVL is starting from Interchain GmbH this week.

@clevinson
Copy link
Contributor

Solution to this is being discussed as part of the larger ARD-040 work in #9331

@robert-zaremba
Copy link
Collaborator

robert-zaremba commented May 21, 2021

More precisely:

  • we are removing a multitree
  • There will be only one global commit for the storage which yield one atomic transaction set for the KV.

Details: #9355

@alexanderbez
Copy link
Contributor

ACK

@tac0turtle
Copy link
Member

closing this as its being discussed in store v2 working group

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C:Store T:tech debt Tech debt that should be cleaned up
Projects
None yet
Development

No branches or pull requests