-
Notifications
You must be signed in to change notification settings - Fork 161
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
raft: use a type-safe slice of log entries #46
Conversation
The sanity checking introduced in this PR helped to uncover an invalid
Figuring out whether it's a result of the specific test setup. But either way, the leader should not send incorrect |
6527252
to
f5542f8
Compare
@tbg @ahrtr PTAL, what do you think of the concept? This is inspired by etcd-io/etcd#15595 and etcd-io/etcd#15704 where entry slices are [suspect to be] invalid. In the meantime I'm looking at the panic in |
raft.go
Outdated
@@ -1644,11 +1646,15 @@ func stepFollower(r *raft, m pb.Message) error { | |||
} | |||
|
|||
func (r *raft) handleAppendEntries(m pb.Message) { | |||
entries, err := VerifyLogRange(m.Index, m.LogTerm, m.Entries) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may cause reduce of performance. So suggest to only enable the verification in test by default; and disable it in production.
We can introduce an environment variable to enable / disable it. FYI. https://github.com/etcd-io/etcd/blob/main/client/pkg/verify/verify.go
It's a good idea to me, but we should only enable the verification in test. See #46 (comment) |
This commit adds a couple invariant checks which were silently ignored and resulted in incorrectly setup tests passing, and incorrect messages exchanged. Property 1: for a MsgApp message m, m.Entries[i].Term >= m.LogTerm. Or, more generally, m.Entries is a slice of contiguous entries with a non-decreasing Term, and m.Index+1 == m.Entries[0].Index && m.LogTerm <= m.Entries[0].Term. This property was broken in a few tests. The root cause was that leader appends out-of-order entries to its own log when it becomeLeader(). This was a result of incorrectly set up tests: they restored to a snapshot at (index=11,term=11), but became leader at an earlier term 1. Then it was sending the following, obviously incorrect, MsgApp: {Index=11,LogTerm=11,Entries={{Index=12,Term=1}}. The fix in tests is either going through a follower state at the correct term, by calling becomeFollower(term, ...), or initializing from a correct HardState in storage. Property 2: For a MsgSnap message m, m.Term >= m.Snapshot.Metadata.Term, because the leader doesn't know of any higher term than its own, and hence can't send a message with such an inversion. This was broken in TestRestoreFromSnapMsg, and is now fixed. Signed-off-by: Pavel Kalinnikov <pavel@cockroachlabs.com>
Fixing the test failures in #47 - it's a pre-existing bug in tests, and a lack of invariant checks that lead to an incorrect |
065e347
to
505efe2
Compare
This commit introduces LogRange - a type-safe slice of contiguous log entries. Signed-off-by: Pavel Kalinnikov <pavel@cockroachlabs.com>
Signed-off-by: Pavel Kalinnikov <pavel@cockroachlabs.com>
505efe2
to
ad64f0b
Compare
All appends are conditional puts, with condition being that the preceding entry is (index, term). The entries slice + this (index, term) can be used by all the log append stack starting from handling the MsgApp message, all the way down to storage. Signed-off-by: Pavel Kalinnikov <pavel@cockroachlabs.com>
ad64f0b
to
f245683
Compare
All
[]pb.Entry
slices thatraft
works with are contiguous valid log ranges, for which the following invariants hold:entries[i+1].Index == entries[i].Index + 1
entries[i+1].Term >= entries[i].Term
Term > 0
andIndex > 0
; the only zero case is the "sentinel"(0, 0)
entry.Much of the code implicitly assumes this, and will behave arbitrarily if this isn't true. This PR adds a
LogRange
type which is a type-safe slice promising these invariants explicitly.Additionally, entry slices are often used in conjunction with an
(index, term)
tuple which describes the entry that immediately precedes the slice, i.e.:entries[0].Index == index + 1
entries[0].Term >= term
For example, the log append handling stack (starting from a
MsgApp
message handler all the way down tounstable
andStorage
) operates with such a construct, effectively a "conditional put".We have seen incorrect (not adhering to this invariant)
MsgApp
being sent (#47), or received (etcd-io/etcd#14735). A type-safe / verified entries "append" slice should reduce the chance of such bugs, or at least make them better discoverable.This PR introduces such a type,
logAppend
, and refactors theraftLog
type to take advantage of it. This also makes the code cleaner.This type could evolve into "the only" way of storing and passing entry slices around. Effectively, the log (or its sub-range) is always an
(index, term)
indicating a compaction or other "watermark", plus a[]pb.Entry
suffix of currently available entries. This seems true forunstable
andraftLog
types. Another example is theMemoryStorage
.