etcdserver: creates a non-empty raft log snapshot on server startup #18494

Open
wants to merge 13 commits into
base: main

Contributor

@clement2026 clement2026 commented Aug 25, 2024

#18459 requires that a non-empty raft log snapshot is always available. This PR creates a non-empty raft log snapshot on server startup.

Part of #17098

Key changes

  • Altered the shouldSnapshot function
  • Added integration tests in tests/integration/raft_log_snapshot_test.go
  • Fixed failing tests
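
As a rough illustration of the first bullet, the sketch below shows one way a snapshot predicate could be extended so that a first snapshot is taken as soon as anything has been applied. It is not the code from this PR: the `progress` type, the `snapshotCount` parameter, and the `snapi == 0` startup rule are all assumptions made for the example.

```go
package main

import "fmt"

// progress is a hypothetical stand-in for the server's apply progress.
type progress struct {
	appliedi uint64 // index of the last applied entry
	snapi    uint64 // index of the last snapshot (0 means none exists yet)
}

// shouldSnapshot reports whether a new snapshot should be taken.
// It assumes appliedi >= snapi, so the unsigned subtraction cannot wrap.
func shouldSnapshot(p progress, snapshotCount uint64) bool {
	// Assumed startup rule: no snapshot yet, but at least one entry applied.
	if p.snapi == 0 && p.appliedi > 0 {
		return true
	}
	// Usual rule: enough entries have been applied since the last snapshot.
	return p.appliedi-p.snapi > snapshotCount
}

func main() {
	fmt.Println(shouldSnapshot(progress{appliedi: 1, snapi: 0}, 10000))
	fmt.Println(shouldSnapshot(progress{appliedi: 5, snapi: 3}, 10000))
}
```

With these assumed rules, the first call reports that a snapshot is due even though far fewer than `snapshotCount` entries have been applied, while the second does not.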

Blocked by

@k8s-ci-robot

Hi @clement2026. Thanks for your PR.

I'm waiting for an etcd-io member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@codecov-commenter

codecov-commenter commented Aug 25, 2024

⚠️ Please install the Codecov app to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

Attention: Patch coverage is 22.50000% with 31 lines in your changes missing coverage. Please review.

Project coverage is 68.79%. Comparing base (2c53be7) to head (8ab968e).
Report is 14 commits behind head on main.

Current head 8ab968e differs from pull request most recent head ccbec07

Please upload reports for the commit ccbec07 to get more accurate results.

Files with missing lines          Patch %   Lines
client/v3/kubernetes/client.go    0.00%     27 Missing ⚠️
server/etcdserver/server.go       69.23%    3 Missing and 1 partial ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
Files with missing lines          Coverage Δ
server/etcdserver/server.go       81.13% <69.23%> (-0.14%) ⬇️
client/v3/kubernetes/client.go    0.00% <0.00%> (ø)

... and 22 files with indirect coverage changes

@@           Coverage Diff           @@
##             main   #18494   +/-   ##
=======================================
  Coverage   68.79%   68.79%           
=======================================
  Files         420      420           
  Lines       35489    35471   -18     
=======================================
- Hits        24413    24404    -9     
+ Misses       9646     9633   -13     
- Partials     1430     1434    +4     

Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2c53be7...ccbec07. Read the comment docs.

@clement2026 clement2026 force-pushed the non-empty-raft-log-snapshot-always-available branch from 86dc7a7 to a0c0639 Compare August 25, 2024 19:54
@k8s-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: clement2026
Once this PR has been reviewed and has the lgtm label, please assign ahrtr for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@clement2026 clement2026 changed the title etcdserver: a non-empty raft log snapshot should always be available WIP: etcdserver: a non-empty raft log snapshot should always be available Aug 28, 2024
Signed-off-by: Clement <gh.2lgqz@aleeas.com>
Signed-off-by: Clement <gh.2lgqz@aleeas.com>
Signed-off-by: Clement <gh.2lgqz@aleeas.com>
@clement2026 clement2026 force-pushed the non-empty-raft-log-snapshot-always-available branch from 0a3fb00 to 103398a Compare August 28, 2024 19:51
Signed-off-by: Clement <gh.2lgqz@aleeas.com>
…itial snap index

Signed-off-by: Clement <gh.2lgqz@aleeas.com>
Signed-off-by: Clement <gh.2lgqz@aleeas.com>
Signed-off-by: Clement <gh.2lgqz@aleeas.com>
Signed-off-by: Clement <gh.2lgqz@aleeas.com>
@clement2026 clement2026 changed the title WIP: etcdserver: a non-empty raft log snapshot should always be available etcdserver: a non-empty raft log snapshot should always be available Aug 30, 2024
@clement2026 clement2026 marked this pull request as ready for review August 30, 2024 10:29
@clement2026 clement2026 changed the title etcdserver: a non-empty raft log snapshot should always be available [needs-ok-to-test] etcdserver: a non-empty raft log snapshot should always be available Aug 30, 2024
Signed-off-by: Clement <gh.2lgqz@aleeas.com>
Signed-off-by: Clement <gh.2lgqz@aleeas.com>
@clement2026 clement2026 changed the title [needs-ok-to-test] etcdserver: a non-empty raft log snapshot should always be available [WIP] etcdserver: a non-empty raft log snapshot should always be available Aug 31, 2024
Signed-off-by: Clement <gh.2lgqz@aleeas.com>
@@ -2181,24 +2184,24 @@ func (s *EtcdServer) snapshot(snapi uint64, confState raftpb.ConfState) {
}

// keep some in memory log entries for slow followers.
compacti := uint64(1)
Contributor Author

This could cause a panic if the applied index is also 1.
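
For context, the hunk above starts the compaction index at 1 and, per its comment, keeps some entries in memory for slow followers. A minimal sketch of how such an index might be derived is shown below; the `catchUpEntries` parameter name and the standalone function are assumptions for illustration, not a copy of the server code.

```go
package main

import "fmt"

// compactIndex returns the index up to which the in-memory raft log would be
// compacted, keeping catchUpEntries entries behind the snapshot index for
// slow followers. The floor of 1 mirrors `compacti := uint64(1)` in the hunk.
func compactIndex(snapi, catchUpEntries uint64) uint64 {
	compacti := uint64(1)
	if snapi > catchUpEntries {
		compacti = snapi - catchUpEntries
	}
	return compacti
}

func main() {
	fmt.Println(compactIndex(10000, 5000)) // snapshot far ahead of the window
	fmt.Println(compactIndex(1, 5000))     // snapshot taken right after startup
}
```

The second call is the situation the comment is concerned with: a snapshot taken right after startup leaves the compaction index at its floor of 1, the same value as the applied index.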

@@ -99,8 +99,12 @@ func TestApplyRepeat(t *testing.T) {
SyncTicker: &time.Ticker{},
consistIndex: cindex.NewFakeConsistentIndex(0),
uberApply: uberApplierMock{},
kv: mvcc.New(zaptest.NewLogger(t), be, &lease.FakeLessor{}, mvcc.StoreConfig{}),
Contributor Author

Set kv to avoid a nil pointer panic:

func (s *EtcdServer) snapshot(snapi uint64, confState raftpb.ConfState) {
d := GetMembershipInfoInV2Format(s.Logger(), s.cluster)
// commit kv to write metadata (for example: consistent index) to disk.
//
// This guarantees that Backend's consistent_index is >= index of last snapshot.
//
// KV().commit() updates the consistent index in backend.
// All operations that update consistent index must be called sequentially
// from applyAll function.
// So KV().Commit() cannot run in parallel with toApply. It has to be called outside
// the go routine created below.
s.KV().Commit()
lg := s.Logger()
// For backward compatibility, generate v2 snapshot from v3 state.
snap, err := s.r.raftStorage.CreateSnapshot(snapi, &confState, d)
if err != nil {
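
The failure mode described in this comment can be reproduced in isolation. The sketch below is a generic Go illustration with hypothetical types (`committer`, `fakeKV`, `server`); it only mimics the `s.KV().Commit()` call from the quoted function and assumes the test server was built without a kv store.

```go
package main

import "fmt"

// committer is a hypothetical minimal interface standing in for the KV store.
type committer interface{ Commit() }

// fakeKV is a hypothetical store that only counts commits.
type fakeKV struct{ commits int }

func (k *fakeKV) Commit() { k.commits++ }

// server is a hypothetical stand-in for a server constructed in a test.
type server struct{ kv committer }

func (s *server) KV() committer { return s.kv }

// snapshot mimics only the first step of the quoted function: commit the KV.
func (s *server) snapshot() { s.KV().Commit() }

// panics reports whether calling f results in a panic.
func panics(f func()) (p bool) {
	defer func() { p = recover() != nil }()
	f()
	return false
}

func main() {
	withoutKV := &server{}
	withKV := &server{kv: &fakeKV{}}
	fmt.Println(panics(withoutKV.snapshot)) // kv unset: method call on nil interface
	fmt.Println(panics(withKV.snapshot))    // kv set: commit succeeds
}
```

Once snapshot creation can run during a test, any server fixture that reaches this path needs a non-nil store, which is what setting `kv` in the test achieves.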

}
s.start()

n.readyc <- newDummyPutReqReady()

Contributor Author

The test case wasn’t appending raft log entries correctly when the applied index increased, leading to a “slice bounds out of range” panic during snapshot creation.
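
The kind of panic mentioned here can be shown with a generic Go example. This is not etcd's raft storage: the `entry` type and `entriesUpTo` helper are hypothetical, and the log is assumed to start at index 1 with no gaps.

```go
package main

import "fmt"

// entry is a hypothetical log entry carrying only its index.
type entry struct{ Index uint64 }

// entriesUpTo returns the entries with Index <= applied, assuming the log
// starts at index 1 and is contiguous. It panics with "slice bounds out of
// range" when applied exceeds the capacity of the log slice.
func entriesUpTo(log []entry, applied uint64) []entry {
	return log[:applied]
}

// panics reports whether calling f results in a panic.
func panics(f func()) (p bool) {
	defer func() { p = recover() != nil }()
	f()
	return false
}

func main() {
	log := []entry{{Index: 1}, {Index: 2}} // only two entries were appended
	fmt.Println(len(entriesUpTo(log, 2)))
	// The applied index moved to 3 without a matching append.
	fmt.Println(panics(func() { entriesUpTo(log, 3) }))
}
```

Keeping the appended entries in step with the applied index, as the fixed test does, avoids the mismatch.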

Member

Can you move it to a separate PR as proposed above?

}
listeners = append(listeners, l)
}
m.PeerListeners = listeners
Contributor Author

The test case added a learner member with peer URL http://127.0.0.1:1234 to the cluster, but nothing was listening on that address, so the learner couldn’t receive a snapshot from the leader.

Member

Please move to a separate PR so we can merge this faster.

Signed-off-by: Clement <gh.2lgqz@aleeas.com>
@clement2026 clement2026 changed the title [WIP] etcdserver: a non-empty raft log snapshot should always be available etcdserver: a non-empty raft log snapshot should always be available Aug 31, 2024
@k8s-ci-robot

@clement2026: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Signed-off-by: Clement <gh.2lgqz@aleeas.com>
@ivanvc
Member

ivanvc commented Sep 4, 2024

/ok-to-test

@clement2026
Contributor Author

Hey @serathius, it would be great if you could check this out when you have a moment. Thanks!

@@ -99,8 +99,12 @@ func TestApplyRepeat(t *testing.T) {
SyncTicker: &time.Ticker{},
consistIndex: cindex.NewFakeConsistentIndex(0),
uberApply: uberApplierMock{},
kv: mvcc.New(zaptest.NewLogger(t), be, &lease.FakeLessor{}, mvcc.StoreConfig{}),
Member

@serathius serathius Sep 9, 2024

I see that we might need to fix a couple of things in the tests before we can merge the logic change. How about creating a separate PR to fix the tests first? That would give us confidence that there is no interdependence.

If those test fixes are really beneficial, we can merge them immediately, so that a detailed review is needed only for the raft logic change.

Contributor Author

It’s good practice to avoid interdependence. Thanks for the insight! I’ll create new PRs to fix the tests whenever they can be separated.

@@ -138,7 +139,32 @@ func TestV2DeprecationSnapshotMatches(t *testing.T) {
members2 := addAndRemoveKeysAndMembers(ctx, t, cc2, snapshotCount)
assert.NoError(t, epc.Close())

assertSnapshotsMatch(t, oldMemberDataDir, newMemberDataDir, func(data []byte) []byte {
lastVer, err := e2e.GetVersionFromBinary(e2e.BinPath.EtcdLastRelease)
Member

Not sure why this is needed. Please move all test changes to a separate PR.

@clement2026 clement2026 changed the title etcdserver: a non-empty raft log snapshot should always be available etcdserver: creates a non-empty raft log snapshot on server startup Sep 13, 2024
@ahrtr
Member

ahrtr commented Sep 13, 2024

#18459 requires that a non-empty raft log snapshot is always available.

Can you explain this? Why is a non-empty snapshot always required? We didn't have such a restriction before.

@clement2026
Contributor Author

Can you explain this? Why is a non-empty snapshot always required? We didn't have such a restriction before.

@ahrtr sure, check out this comment for more details #18459 (comment)

6 participants