Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extract sequence tracking from the Broadcaster #12353

Merged
merged 11 commits into from
Mar 19, 2024

Conversation

amit-momin
Copy link
Contributor

@amit-momin amit-momin commented Mar 8, 2024

  • Created the new SequenceTracker interface which will be used to manage sequences used in the Broadcaster
  • Created a new nonceTracker component in the EVM code as the EVM specific implementation of SequenceTracker
  • Extracted the sequence management logic from the Broadcaster into the nonceTracker
  • Removed dependency on the FindLatestSequence TxStore method for loading the sequences on startup. Switch to using the more generic GetAllTransactions method.

Copy link
Contributor

github-actions bot commented Mar 8, 2024

I see you updated files related to core. Please run pnpm changeset to add a changeset.

Copy link
Contributor

github-actions bot commented Mar 8, 2024

I see that you haven't updated any README files. Would it make sense to do so?

@amit-momin amit-momin marked this pull request as ready for review March 11, 2024 19:41
@amit-momin amit-momin requested review from a team as code owners March 11, 2024 19:41
@amit-momin amit-momin changed the title Extract sequence tracking from the Broadcaster into a separate component Extract sequence tracking from the Broadcaster Mar 11, 2024
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Appreciation comment for cleaning this up from TXM 🙌

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually does it make sense to move NonceTracker inside Broadcaster completely? It seems like Broadcaster already has all the necessary parameters and non else calls NonceTracker for now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point! Moved the nonce tracker initialization into NewEvmBroadcaster in the latest commit. Had to adjust how we initialize the Broadcaster in tests though to still have an exposed nonce tracker for tests.

return seq, err
}

func (s *nonceTracker) getSequenceFromStore(ctx context.Context, address common.Address) (seq evmtypes.Nonce, err error) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why unpack FindLatestSequence's logic here and not keep it as it is? GetAllTransactions seems like a heavier query with no added benefits.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@prashantkumar1982 mentioned we're trying to move to a generic tx store so moving forward we'd only want to rely on generic methods. So this was just to break our reliance on an EVM specific method while I was already making these changes.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's a good practice to load every tx from the DB and handle the logic in memory. This is significantly heavier than what we do now. If we decide moving to a generic tx store makes sense then we can make the change, but for now I would suggest to keep it as it is.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with Dimitris here. Let's keep it the same. When we refactor the txstore, we can deal with this problem then. For now, let's just focus on the nonce and not change any behavior related to the txstore.

Copy link
Collaborator

@patrick-dowell patrick-dowell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good. I appreciate that you removed more lines than you added :)

Just a few notes left in the comments - we will need to address these before merging.

return seq, err
}

func (s *nonceTracker) getSequenceFromStore(ctx context.Context, address common.Address) (seq evmtypes.Nonce, err error) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with Dimitris here. Let's keep it the same. When we refactor the txstore, we can deal with this problem then. For now, let's just focus on the nonce and not change any behavior related to the txstore.

@@ -1013,16 +1029,6 @@ func (o *evmTxStore) UpdateTxCallbackCompleted(ctx context.Context, pipelineTask
return nil
}

func (o *evmTxStore) FindLatestSequence(ctx context.Context, fromAddress common.Address, chainId *big.Int) (nonce evmtypes.Nonce, err error) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's keep using this method and UpdateKeyNextSequence until we refactor the TxStore.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wanted to note UpdateKeyNextSequence is unused. I removed it just as a clean up. It's actually updating a table that we don't own.

s.lggr.Infow("Fast-forward sequence", "address", addr, "newNextSequence", nonce, "oldNextSequence", localSequence)
}

s.sequenceLock.Lock()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit dangerous to lock after doing a check that can affect whether we want to proceed with the write. In this case, what happens if the value of nonce here changes after you do the check on 152 but before you lock here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The check on line 152 is actually only used to determine if we should log some statements, otherwise we proceed with the write regardless. Also the nonce here is just the one we retrieved from on chain. I guess it could change but the sequence map lock wouldn't prevent any desync with on-chain data.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok cool this makes sense. So it's fine as is then!

client NonceTrackerClient
enabledAddresses []common.Address

sequenceLock sync.RWMutex
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

out of curiosity, does the nonceTracker need a lock? If it's always been called from the broadcaster which has its own lock, then it might be protected already and adding the extra lock just slows down the code. But I'd need to look more carefully to confirm this. It might be safer just to have it unless it's crystal clear that the the calling code is protected by a broadcaster lock.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This lock was actually moved over from the Broadcaster so we aren't double locking here.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Theoretically, the Broadcaster can process multiple addresses at the same time, which are handled by different go routines (monitorTxs) so there is a scenario where SequenceTracker requires locking. Good observation though, I like that we're pushing to clean up unnecessary things 🙌 .

logger logger.Logger,
checkerFactory TransmitCheckerFactory,
autoSyncNonce bool,
) *Broadcaster {
return txmgr.NewBroadcaster(txStore, client, chainConfig, feeConfig, txConfig, listenerConfig, keystore, txAttemptBuilder, nonceSyncer, logger, checkerFactory, autoSyncNonce, evmtypes.GenerateNextNonce)
nonceTracker := NewNonceTracker(logger, txStore, client)
Copy link
Collaborator

@dimriou dimriou Mar 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking of creating the SequenceTracker inside the Broadcaster altogether but I guess the plan is to use it in other components as well so this is totally fine.

dimriou
dimriou previously approved these changes Mar 15, 2024
patrick-dowell
patrick-dowell previously approved these changes Mar 15, 2024
Copy link
Collaborator

@patrick-dowell patrick-dowell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

// Try to retrieve next sequence from tx table or on-chain to load the map
// A scenario could exist where loading the map during startup failed (e.g. All configured RPC's are unreachable at start)
// The expectation is that the node does not fail startup so sequences need to be loaded during runtime
foundSeq, err := s.getSequenceForAddr(ctx, address)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes an RPC call onchain, while holding the lock. So we are locking this for potentially many seconds.
Could we acquire the lock only when we are reading/writing from the nextSequenceMap?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't we run into an issue where multiple calls are made to GetNextSequence where none find the value locally so they all search on-chain? Then if we only lock the actual write to the map, the callers could get different values from on-chain depending which RPC their request was sent to. They would then write conflicting values to the map and return different nonces? I think this wouldn't be a problem in the current setup where the Broadcaster only uses the nonce tracker but if other components start using this we could maybe run into this problem. My worries could also be completely unfounded though so not against making this change if this isn't a worry.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I am assuming the broadcaster is the only caller, and for a given FromAddress, it only calls NonceTracker from a single thread.
But lets keep the logic as it is for now.

"chainlink": patch
---

Fixed nonce gap bug if in-progress tx sent but TXM shutdown before marked as broadcasted
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Fixed a race condition bug around EVM nonce management, which could cause the Node to skip a nonce and get stuck.

@prashantkumar1982 prashantkumar1982 added this pull request to the merge queue Mar 19, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Mar 19, 2024
@prashantkumar1982 prashantkumar1982 added this pull request to the merge queue Mar 19, 2024
Merged via the queue into develop with commit 07c9f6c Mar 19, 2024
105 checks passed
@prashantkumar1982 prashantkumar1982 deleted the txm-sequence-refactor branch March 19, 2024 01:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants