
New higher capacity Txpool #2660

Open · wants to merge 42 commits into base: develop
Conversation

@emailtovamos (Contributor) commented Aug 20, 2024

Description

This PR expands the legacy transaction pool's functionality by adding two new concepts:

  • pool2 -> broadcasts transactions only to static peers
  • pool3 -> a local buffer pool that holds transactions until pool2 has space, then forwards them to pool2

The current pool is simply referred to as pool1.

This helps accommodate a larger number of transactions, especially during high traffic, when somewhat underpriced but otherwise valid transactions would otherwise be rejected.
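For orientation, here is a minimal sketch of how the two extra tiers could be expressed alongside the existing pool. This is illustrative only: the txBuffer interface and the Pool2Slots/Pool3Slots names mirror the discussion below but are not the PR's actual code.

// Illustrative only: the real PR wires pool2/pool3 into the existing LegacyPool;
// these names mirror the discussion but may not match the final code.
package legacypool

import "github.com/ethereum/go-ethereum/core/types"

// txBuffer is the minimal behaviour pool3 needs: bounded storage plus a way to
// hand transactions back so they can be retried against pool1/pool2 later.
type txBuffer interface {
    Add(tx *types.Transaction, static bool)
    Flush(max int) []*types.Transaction
    Size() int
}

// Capacity knobs for the two new tiers.
type pool23Config struct {
    Pool2Slots uint64 // pool2: broadcast to static peers only
    Pool3Slots uint64 // pool3: local buffer in front of pool2
}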

Rationale

tell us why we need these changes...

Example

add an example CLI or API response...

Changes

Notable changes:

  • add each change in a bullet point here
  • ...

@WTDuck2255 left a comment:

Thx so much

@emailtovamos marked this pull request as ready for review September 2, 2024 06:18
// Static indicates whether to broadcast only to static peers or not.
// Even at high traffic we still want to broadcast transactions to at least some peers,
// so that we minimize transaction loss.
Static bool
Contributor:

I would call it differently, maybe BroadcastToStatic or something more meaningful

Collaborator:

Suggestion-1:
For pool-2: it should have its own set of peers to broadcast the txs to. The peers of pool-2 could consist of:
1. StaticNodes
2. A subset of the other peers, sqrt()?
This is because many nodes may not have any StaticNodes configured.

Contributor (author):

The sqrt() of peers is already what happens in pool-1. So we might do cube root or something?

@zzzckck (Collaborator) commented Sep 12, 2024:

> The sqrt() of peers is already what happens in pool-1

Do you mean the sqrt() logic in BroadcastTransactions? https://github.com/bnb-chain/bsc/blob/master/eth/handler.go#L865

What I mean here is: suppose we have 400 connected peers and 10 StaticNodes, then:
1. For pool-1: it has 400 peers; send the full tx content directly to 20 (sqrt(400)) peers, and send announcements to the remaining 380 peers.
2. For pool-2: it has sqrt(400) + 10 (the static nodes); after discarding the overlapping ones it could have 20-30 peers. Suppose it is 25 peers; based on that, send the full tx content to 5 (sqrt(25)) peers, and send announcements to the remaining 20 peers.
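For the arithmetic in this example, here is a small standalone sketch of the sqrt-based split. The splitBroadcast helper and the peer counts are purely illustrative, not code from handler.go.

package main

import (
    "fmt"
    "math"
)

// splitBroadcast returns how many peers receive the full transaction body and
// how many only receive an announcement, using the sqrt rule discussed above.
func splitBroadcast(peerCount int) (direct, announce int) {
    direct = int(math.Sqrt(float64(peerCount)))
    return direct, peerCount - direct
}

func main() {
    // pool-1: all 400 connected peers.
    d1, a1 := splitBroadcast(400)
    fmt.Println("pool-1:", d1, "full-tx peers,", a1, "announce-only peers") // 20 and 380

    // pool-2: sqrt(400) + 10 static nodes, overlaps removed; assume 25 remain.
    d2, a2 := splitBroadcast(25)
    fmt.Println("pool-2:", d2, "full-tx peers,", a2, "announce-only peers") // 5 and 20
}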

Contributor (author):

In the above example, there is a chance that all 5 peers are non-static. Is that fine? Or should we always send the full tx to at least 1 static peer (if one exists)?

txSlots := numSlots(tx)

// Remove elements until there is enough capacity
for lru.size+txSlots > lru.capacity && lru.buffer.Len() > 0 {
Contributor:

why lru.buffer.Len() > 0 ?

Comment on lines 968 to 981
pool.all.Add(tx, pool2)
pool.priced.Put(tx, pool2)
pool.journalTx(from, tx)
pool.queueTxEvent(tx, false)
_, err := pool.enqueueTx(tx.Hash(), tx, isLocal, true, false) // At this point pool1 can incorporate this. So no need for pool2 or pool3
if err != nil {
    return false, err
}
log.Trace("Pooled new executable transaction", "hash", tx.Hash(), "from", from, "to", tx.To())

// Successful promotion, bump the heartbeat
pool.beats[from] = time.Now()

return true, nil
Contributor:

This code repeats; maybe it could be extracted into a separate function.
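One possible shape for that extraction, as a sketch only: promoteAndJournal is a hypothetical name, and the body is just the repeated block quoted above (imports such as common, log, time and types are assumed to be those of the existing legacypool file).

// promoteAndJournal is a hypothetical helper collecting the repeated bookkeeping
// shown above: index the tx, journal it, queue the event and bump the heartbeat.
func (pool *LegacyPool) promoteAndJournal(tx *types.Transaction, from common.Address, isLocal, pool2 bool) (bool, error) {
    pool.all.Add(tx, pool2)
    pool.priced.Put(tx, pool2)
    pool.journalTx(from, tx)
    pool.queueTxEvent(tx, false)
    if _, err := pool.enqueueTx(tx.Hash(), tx, isLocal, true, false); err != nil {
        return false, err
    }
    log.Trace("Pooled new executable transaction", "hash", tx.Hash(), "from", from, "to", tx.To())
    pool.beats[from] = time.Now() // successful promotion, bump the heartbeat
    return true, nil
}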

@MatusKysel (Contributor) left a comment:

looks good, some minor comments

@Aces42020 left a comment:

JumpTasker#4900

@@ -30,6 +30,10 @@ var (
// configured for the transaction pool.
ErrUnderpriced = errors.New("transaction underpriced")

// ErrUnderpricedTransferredtoAnotherPool is returned if a transaction's gas price is below the
// minimum configured for pool1, so the transaction has been moved to pool2 or pool3 instead.
ErrUnderpricedTransferredtoAnotherPool = errors.New("transaction underpriced, so it is either in pool2 or pool3")
Collaborator:

Why do we need this error? I think underpriced transactions can simply be discarded.

"github.com/ethereum/go-ethereum/core/types"
)

type LRUBuffer struct {
@zzzckck (Collaborator) commented Sep 5, 2024:

Suggestion-2:
Maybe we can just use fastcache instead? It is a ring buffer, not an LRU, and a ring buffer has some advantages; you may refer to: https://github.com/bnb-chain/bsc/blob/master/core/state/snapshot/disklayer.go#L36

Fastcache may not work directly, as it does not support iteration. We could add a list of TxHashes to support iteration.
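A rough sketch of that suggestion, assuming only the public fastcache and go-ethereum APIs: fastcache keyed by tx hash for the payloads, plus a plain slice of hashes to make iteration possible. The fastCacheBuffer type and its methods are hypothetical names, and cleanup of hashes evicted by fastcache is left out.

package legacypool

import (
    "github.com/VictoriaMetrics/fastcache"
    "github.com/ethereum/go-ethereum/common"
    "github.com/ethereum/go-ethereum/core/types"
)

// fastCacheBuffer pairs a fastcache (bounded, ring-buffer style storage) with an
// ordered hash list so the buffered transactions can still be iterated.
type fastCacheBuffer struct {
    cache *fastcache.Cache
    order []common.Hash // insertion order, used only for iteration
}

func newFastCacheBuffer(maxBytes int) *fastCacheBuffer {
    return &fastCacheBuffer{cache: fastcache.New(maxBytes)}
}

func (b *fastCacheBuffer) Add(tx *types.Transaction) error {
    blob, err := tx.MarshalBinary()
    if err != nil {
        return err
    }
    hash := tx.Hash()
    b.cache.Set(hash.Bytes(), blob)
    b.order = append(b.order, hash)
    return nil
}

// Iterate decodes every buffered transaction that fastcache still holds; entries
// already evicted by fastcache are simply skipped.
func (b *fastCacheBuffer) Iterate(fn func(*types.Transaction)) {
    for _, hash := range b.order {
        if blob := b.cache.Get(nil, hash.Bytes()); len(blob) > 0 {
            tx := new(types.Transaction)
            if err := tx.UnmarshalBinary(blob); err == nil {
                fn(tx)
            }
        }
    }
}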

Contributor:

I also suggest using a well-tested library here, such as github.com/ethereum/go-ethereum/common/lru or github.com/VictoriaMetrics/fastcache. They have fewer bugs and better performance.

ReannounceTime time.Duration // Duration for announcing local pending transactions again
Lifetime time.Duration // Maximum amount of time non-executable transaction are queued
ReannounceTime time.Duration // Duration for announcing local pending transactions again
InterPoolTransferTime time.Duration // Attempt to transfer from pool3 to pool2 every this much time
Collaborator:

Why do we need InterPoolTransferTime?

txPoolSizeAfterCurrentTx := uint64(pool.all.Slots() + numSlots(tx))
var includePool1, includePool2, includePool3 bool
if txPoolSizeAfterCurrentTx <= maxPool1Size {
    includePool1 = true
Collaborator:

Suggestion-3:
A transaction can be discarded by pool-1 even when pool-1 is not full, for example in these cases:

  • txpool.pending.length > pool.config.GlobalSlots
  • the sender's pending transactions > pool.config.AccountSlots

I think we can simply move transactions into pool-2 whenever pool-1 rejects them, and likewise into pool-3, treating the whole thing as a prioritised pool (see the sketch after this list):

  • 1. try to put it into pool-1, ok => done
  • 2. try to put it into pool-2, ok => done
  • 3. try to put it into pool-3, ok => done

It could be much clearer.
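A minimal sketch of that prioritised cascade; addPrioritised and the addToPool1/addToPool2/addToPool3 helpers are hypothetical stand-ins for whatever the real entry points end up being.

// addPrioritised tries the pools in order of preference and stops at the first
// one that accepts the transaction. All helper names here are hypothetical;
// the method is assumed to live in legacypool (import "errors").
func (pool *LegacyPool) addPrioritised(tx *types.Transaction, isLocal bool) error {
    if err := pool.addToPool1(tx, isLocal); err == nil {
        return nil // accepted by the normal pool
    }
    if err := pool.addToPool2(tx, isLocal); err == nil {
        return nil // accepted by the static-peer broadcast pool
    }
    if err := pool.addToPool3(tx, isLocal); err == nil {
        return nil // buffered locally until capacity frees up
    }
    return errors.New("transaction rejected: pool1, pool2 and pool3 are all full")
}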

Contributor (author):

The checks for the above two scenarios happen anyway during runReorg(). So even if such a transaction gets added, it will be removed in a later call to runReorg(), just like what happens currently, unless I am missing something?

@@ -2038,3 +2164,91 @@ func (t *lookup) RemotesBelowTip(threshold *big.Int) types.Transactions {
func numSlots(tx *types.Transaction) int {
    return int((tx.Size() + txSlotSize - 1) / txSlotSize)
}

func (pool *LegacyPool) startPeriodicTransfer(t time.Duration) {
Collaborator:

Suggestion-4:
The transfer should happen when a new block is inserted, rather than on a timer.

Collaborator:

> Suggestion-4: the transfer should happen when a new block is inserted, rather than on a timer.

Wouldn't a timer give better performance? If the transfer happens on new block insertion, then we would have to wait for the transfer before mining new blocks.

}()
}

// transferTransactions mainly moves from pool 3 to pool 2
Collaborator:

Suggestion-5:
I think pool-3 does not need to transfer to pool-2 specifically. The transactions in pool-3 can just be treated as newly arrived transactions, with this workflow (a sketch follows the list):

  • 1. check whether the transaction is still valid; remove it if it is no longer valid
  • 2. try to put it into pool-1, ok => remove it from pool-3
  • 3. try to put it into pool-2, ok => remove it from pool-3
  • 4. if the transfer fails because pool-1 and pool-2 are full, stop the transfer procedure
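Sketching that workflow as code: drainPool3, Flush and the addToPoolN helpers are hypothetical names, and validity checking is reduced to a single validateTx call for brevity.

// drainPool3 re-injects buffered transactions as if they had just arrived:
// invalid ones are dropped, valid ones go to pool-1 or pool-2, and the drain
// stops as soon as both are full. Helper names are hypothetical.
func (pool *LegacyPool) drainPool3() {
    for _, tx := range pool.localBufferPool.Flush(pool.availableSlotsPool3()) {
        if err := pool.validateTx(tx, false); err != nil {
            continue // no longer valid, drop it
        }
        if err := pool.addToPool1(tx, false); err == nil {
            continue // promoted into pool-1
        }
        if err := pool.addToPool2(tx, false); err == nil {
            continue // promoted into pool-2
        }
        // Both pools are full: put it back and stop the transfer for now.
        pool.localBufferPool.Add(tx, false)
        return
    }
}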

func TestNewLRUBuffer(t *testing.T) {
    capacity := 10
    lru := NewLRUBuffer(capacity)
    if lru.capacity != capacity {
Contributor:

It's better to use github.com/stretchr/testify/require to assert:

require.Equal(t, capacity, lru.capacity, "capacity wrong")

txs := make([]*types.Transaction, 0)
statics := make([]bool, 0)
Contributor:

This statics variable can be removed, because its value is dropped when it is returned from the function.

Contributor (author):

It is one of the function's return values. How can I remove it? Or did I misunderstand?

"github.com/ethereum/go-ethereum/core/types"
)

type LRUBuffer struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also suggest here to use well tested lib, like github.com/ethereum/go-ethereum/common/lru, github.com/VictoriaMetrics/fastcache, etc. They have less bug and better performance.


// Calculate the total number of slots in drop and add them to pool3 accordingly (if there is space).
// All members of drop will be dropped from pool1/2 regardless of whether they get added to pool3 or not.
availableSlotsPool3 := pool.availableSlotsPool3()
Contributor:

Can the call here be simplified? Let the complex logic be encapsulated in the pool3 object: for example, since pool3 is an LRU, just add to it directly and let the LRU handle eviction:

pool.addToPool3(drop, isLocal)

@WillAR37 mentioned this pull request Sep 20, 2024
// Move a transaction to the front of the LRU order
func (lru *LRUBufferFastCache) moveToFront(hash common.Hash) {
    for i, h := range lru.order {
        if h == hash {
Collaborator:

This iterates over all entries? That could cost a lot.

for i, h := range lru.order {
    if h == hash {
        // Remove the hash from its current position
        lru.order = append(lru.order[:i], lru.order[i+1:]...)
Collaborator:

huge copy?
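Both concerns (the full scan and the slice copy on every touch) disappear with the usual list-plus-map layout. A minimal standard-library sketch, independent of fastcache, with hypothetical names:

package legacypool

import (
    "container/list"

    "github.com/ethereum/go-ethereum/common"
)

// lruIndex keeps the recency order in a doubly linked list plus a map from hash
// to list element, so moveToFront is O(1) instead of scanning and re-slicing.
type lruIndex struct {
    order *list.List                    // front = most recently used
    items map[common.Hash]*list.Element // hash -> node in order
}

func newLRUIndex() *lruIndex {
    return &lruIndex{order: list.New(), items: make(map[common.Hash]*list.Element)}
}

func (l *lruIndex) moveToFront(hash common.Hash) {
    if elem, ok := l.items[hash]; ok {
        l.order.MoveToFront(elem)
        return
    }
    l.items[hash] = l.order.PushFront(hash)
}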

}
}
// Add it to the front
lru.order = append(lru.order, hash)
@buddh0 (Collaborator) commented Sep 20, 2024:

huge copy?

reorgDoneCh: make(chan chan struct{}),
reorgShutdownCh: make(chan struct{}),
initDoneCh: make(chan struct{}),
localBufferPool: NewLRUBufferFastCache(int(config.Pool3Slots)),
@buddh0 (Collaborator) commented Sep 20, 2024:

[screenshot]
100k means 100k*1024*1024, i.e. 100G?

}

// addToPool12OrPool3 adds a transaction to pool1 or pool2 or pool3 depending on which one is asked for
func (pool *LegacyPool) addToPool12OrPool3(tx *types.Transaction, from common.Address, isLocal bool, pool1, pool2, pool3 bool) (bool, error) {
Collaborator:

There is no call site that calls addToPool12OrPool3 with pool1 or pool2?

}
}
// Add it to the front
lru.order = append(lru.order, hash)
Collaborator:

This moves the item to the back, yet it is called moveToFront?

@buddh0 (Collaborator) commented Sep 20, 2024

During the benchmark in the QA environment, I used the following config:

GlobalSlots = 10000
GlobalQueue = 5000

With that, I found the txpool is not the performance bottleneck; the validator always has enough txs to handle.

Based on this, I think we don't need pool2.

@buddh0 (Collaborator) commented Sep 20, 2024

There is no replacement logic in pool3, so if txs are put into pool3, replacement may fail. Does that matter?
