-
Notifications
You must be signed in to change notification settings - Fork 32
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add additional idempotence check to cover Kafka server restart, while EthConnect stays running #227
Conversation
Signed-off-by: Peter Broadhurst <peter.broadhurst@kaleido.io>
…essor impl Signed-off-by: Peter Broadhurst <peter.broadhurst@kaleido.io>
Signed-off-by: Peter Broadhurst <peter.broadhurst@kaleido.io>
…een persisted Signed-off-by: Peter Broadhurst <peter.broadhurst@kaleido.io>
// Then check LevelDB - we should find the entry | ||
r, err := p.receiptStore.GetReceipt(inflight.msgID) | ||
if err != nil { | ||
return false, err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to add a richer return status to account for transient receipt store access issues? I'm wondering what the implication of indicating a false negative is for the application. Here, we appear to be saying "we don't know if there is a receipt with this ID or not, assume this is not idempotent".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did track through the result of this error return, and it will actually end up coming back as an error reply in Kafka, which would overwrite any "good" reply if there was one that was earlier.
The error would be very generic, to just whatever came from the LevelDB/MongoDB persistence layer - rather than being specific to the idempotency check.
I couldn't think of a better option here:
- Infinite retry felt wrong under the lock
- Silently ignoring felt wrong, because we can't sure sure an event went back at all in that case
But, you're absolutely right I should wrap this is a more detailed explanation!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this @vdamle - I've added a more descriptive error, but am open to other suggestions too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perfect, looks great now.
…tore Signed-off-by: Peter Broadhurst <peter.broadhurst@kaleido.io>
Codecov Report
@@ Coverage Diff @@
## main #227 +/- ##
==========================================
- Coverage 97.23% 96.95% -0.28%
==========================================
Files 70 70
Lines 7660 7717 +57
==========================================
+ Hits 7448 7482 +34
- Misses 163 184 +21
- Partials 49 51 +2
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. |
Signed-off-by: Peter Broadhurst <peter.broadhurst@kaleido.io>
Signed-off-by: Peter Broadhurst <peter.broadhurst@kaleido.io>
Signed-off-by: Peter Broadhurst <peter.broadhurst@kaleido.io>
EthConnect works on an "at target" nonce allocation (see comparision with "at source" in
readme of FFTM), meaning it gives an "at least once" delivery assurance to the blockchain, backed by Apache Kafka.
In the case that the REST API Gateway is tuned for high performance, with an in-flight count in the many hundreds, and the Kafka servers are restarted, the consumer groups might redeliver many messages. This is undesirable.
EthConnect already has the concept of an idempotency key, on the front-side of the REST API Gateway, using
ackmode
as added in #175 to get an immediate receipt, combined with supplying your own custom ID. However, that is only checked in the REST API Gateway boundary layer today.See the following diagram as a reference, showing how a Kafka at-least-once redelivery results in this duplication:
This PR proposes adding an additional idempotency check, at the point we receive the message from Kafka. Note this does not change the fundamental nature of the "at target" architecture from being at-least-once, and in some failure scenarios (for full idempotent delivery e2e with Ethereum nonces you would need the "at source" ordering architecture of https://github.com/hyperledger/firefly-evmconnect based on the new FFTM architecture).
But this PR does protect against something like a planned HA rolling restart of a Kafka cluster, from causing redelivery.
The new check is only enabled when:
fly-ackmode=receipt
has been specified in the transaction submission via the APIs described in Add circuit breaker to stop runaway producers losing messages, and immediate receipt option #175The check covers two scenarios: