Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

blockchain: fix inconsistent utxocache and database on reorg #2123

Conversation

kcalvinalvin
Copy link
Collaborator

@kcalvinalvin kcalvinalvin commented Feb 16, 2024

During reorgs, the assumption being made was that the utxocache is ignored and we're directly accessing the database for utxo set modifications.

However, after a block disconnect, the BlockChain.chainLock, the lock that prevents modifications to the utxo set, was being freed for the callback functions for the blockchain notifications.

btcd/blockchain/chain.go

Lines 853 to 858 in 13152b3

// Notify the caller that the block was disconnected from the main
// chain. The caller would typically want to react with actions such as
// updating wallets.
b.chainLock.Unlock()
b.sendNotification(NTBlockDisconnected, block)
b.chainLock.Lock()

If a caller calls either of the two methods below, the utxocache could be loaded with utxos that are fetched from the database:

// FetchUtxoView loads unspent transaction outputs for the inputs referenced by
// the passed transaction from the point of view of the end of the main chain.
// It also attempts to fetch the utxos for the outputs of the transaction itself
// so the returned view can be examined for duplicate transactions.
//
// This function is safe for concurrent access however the returned view is NOT.
func (b *BlockChain) FetchUtxoView(tx *btcutil.Tx) (*UtxoViewpoint, error) {

// FetchUtxoEntry loads and returns the requested unspent transaction output
// from the point of view of the end of the main chain.
//
// NOTE: Requesting an output for which there is no data will NOT return an
// error. Instead both the entry and the error will be nil. This is done to
// allow pruning of spent transaction outputs. In practice this means the
// caller must check if the returned entry is nil before invoking methods on it.
//
// This function is safe for concurrent access however the returned entry (if
// any) is NOT.
func (b *BlockChain) FetchUtxoEntry(outpoint wire.OutPoint) (*UtxoEntry, error) {

Since the UTXOs fetched here are considered unmodified and therefore wouldn't be saved in the database when the utxoCache is flushed at the end of a reorganization, if a caller makes a call to the above two functions during a reorg, they could be mistakenly think that a utxo exists when it was removed from the utxo set.

Example:

There's a chain like so where a utxo created in b1001 is spent in b1002. Blocks b1002 and b1001 are being disconnected out by a longer chain.

b1000() -> b1001(b1000.tx[1].out[0]) -> b1002(b1001.tx[1].out[0])

Mempool tries to fetch b1001.tx[1].out[0] three times in total. 2, 8 and 11.
At 6 the utxo b1001.tx[1].out[0] get loaded up to the cache. The cache never gets reset and that utxo gets fetched at 12. When the reorg finishes, the mempool can no longer fetch b1001.tx[1].out[0]

         database                 UTXO cache                 BlockChain                                     Mempool

1.     {b1002.tx[0].out[0]}       {}                         chainLock.Lock() (for reorg)
2.     {b1002.tx[0].out[0]}       {}                                                                        Try to acquire lock (fetch output b1001.tx[1].out[0])
4.     {b1001.tx[1].out[0]}       {}                         disconnect b1002                               Try to acquire lock (fetch output b1001.tx[1].out[0])
5.     {}                         {}                         chainLock.Unlock() (subscriber callback)       Acquired lock (fetch output b1001.tx[1].out[0])
6.     {}                         {b1001.tx[1].out[0]}                                                      Fetched b1001.tx[1].out[0] (should happen)
7.     {}                         {b1001.tx[1].out[0]}       chainLock.Lock()   (finished callback)
8.     {}                         {b1001.tx[1].out[0]}                                                      Try to acquire lock (fetch output b1001.tx[1].out[0])
9.     {}                         {b1001.tx[1].out[0]}                                                      Try to acquire lock (fetch output b1001.tx[1].out[0])
10.    {b1000.tx[1].out[0]}       {b1001.tx[1].out[0]}       disconnect b1001                               Try to acquire lock (fetch output b1001.tx[1].out[0])
11.    {b1000.tx[1].out[0]}       {b1001.tx[1].out[0]}       chainLock.Unlock() (subscriber callback)       Acquired lock (fetch output b1001.tx[1].out[0])
12.    {b1000.tx[1].out[0]}       {b1001.tx[1].out[0]}                                                      Fetched b1001.tx[1].out[0] (shouldn't happen)
13.    {b1000.tx[1].out[0]}       {b1001.tx[1].out[0]}       chainLock.Lock()   (finished callback)
14.    {b1000.tx[1].out[0]}       {}                         Flush UTXO set
15.    {b1000.tx[1].out[0]}       {}                                                                        Try to acquire lock (fetch output b1001.tx[1].out[0])
16.    {b1000.tx[1].out[0]}       {}                                                                        Acquired lock (fetch output b1001.tx[1].out[0])
17.    {b1000.tx[1].out[0]}       {}                                                                        Did not find b1001.tx[1].out[0] (should happen)

@coveralls
Copy link

coveralls commented Feb 16, 2024

Pull Request Test Coverage Report for Build 8184050728

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 72 of 78 (92.31%) changed or added relevant lines in 4 files are covered.
  • 22 unchanged lines in 4 files lost coverage.
  • Overall coverage increased (+0.003%) to 56.779%

Changes Missing Coverage Covered Lines Changed/Added Lines %
blockchain/fullblocktests/generate.go 55 57 96.49%
blockchain/chain.go 14 18 77.78%
Files with Coverage Reduction New Missed Lines %
blockchain/chainio.go 2 60.99%
connmgr/connmanager.go 2 86.27%
blockchain/utxoviewpoint.go 9 72.54%
peer/peer.go 9 73.94%
Totals Coverage Status
Change from base Build 8117173129: 0.003%
Covered Lines: 29309
Relevant Lines: 51619

💛 - Coveralls

@kcalvinalvin kcalvinalvin force-pushed the 2024-02-15-no-utxocache-loading-on-reorgs branch from de1be7d to 65ecf4a Compare February 21, 2024 09:17
@kcalvinalvin kcalvinalvin marked this pull request as ready for review February 21, 2024 09:19
Copy link
Collaborator

@guggero guggero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for the fix! LGTM 🎉

blockchain/fullblocks_test.go Outdated Show resolved Hide resolved
existance/non-existance

New test instance BlockDisconnectExpectUTXO tests that a utxo
exists/doesn't exist after a specific block has been disconnected.
@kcalvinalvin kcalvinalvin force-pushed the 2024-02-15-no-utxocache-loading-on-reorgs branch from 65ecf4a to f76a887 Compare February 21, 2024 10:09
Copy link
Member

@Roasbeef Roasbeef left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work tracking down this fix! Have some questions around some lingering logic that looks redundant after this change, and a panic I get in the unit tests w/ the first commit (new tests), but w/o the second (the fix).

blockchain/fullblocktests/generate.go Show resolved Hide resolved
blockchain/chain.go Show resolved Hide resolved
blockchain/chain.go Show resolved Hide resolved
blockchain/chain.go Outdated Show resolved Hide resolved
blockchain/chain.go Outdated Show resolved Hide resolved
The assumption in the previous code was incorrect in that we were
assuming that the chainLock is held throughout the entire chain reorg.
This is not the case since the chainLock is let go of during the
callback to the subscribers.

Because of this, we need to ensure that the utxo set is consistent on
each block disconnect. To achieve this, additional flushes are added
during block disconnects.

Also the utxocache is no longer avoided during block connects and when
we're checking for the validity of the block connects and disconnects as
we can just use the cache instead of trying to avoid it.
@kcalvinalvin kcalvinalvin force-pushed the 2024-02-15-no-utxocache-loading-on-reorgs branch from f76a887 to a254998 Compare March 5, 2024 17:41
Allowing the caller to fetch from either the database or the cache
resulted in inconsistencies if the cache were ever to be dirty.
Removing this option eliminates this problem.
Since no code is now depending on accepting new blocks without the
cache, we get rid of the option to do so.
Copy link
Member

@Roasbeef Roasbeef left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🍖

@Roasbeef Roasbeef merged commit e63bf03 into btcsuite:master Mar 9, 2024
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants