Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[release-1.28] Fix issue that allowed multiple simultaneous snapshots to be allowed #10377

Merged

Conversation

brandond
Copy link
Member

Proposed Changes

Replace 1-weight semaphore on snapshots with simple mutex

Fixes an issue where the semaphore wasn't permanently initialized until a scheduled snapshot was taken, allowing multiple on-demand snapshots to be taken until the first scheduled snapshot was triggered.

Types of Changes

bugfix

Verification

When fixed, you should see:

root@k3s-server-1:~# k3s etcd-snapshot save & k3s etcd-snapshot save; sleep 5
FATA[0000] see server log for details: Internal error occurred: etcd-snapshot error ID 40905
INFO[0000] Snapshot on-demand-k3s-server-1-1718755362 saved.
[1]+  Done(1)                    k3s etcd-snapshot save

server log:

INFO[0055] Saving etcd snapshot to /var/lib/rancher/k3s/server/db/snapshots/on-demand-k3s-server-1-1718755362
{"level":"info","ts":"2024-06-19T00:02:41.544644Z","logger":"etcd-client","caller":"snapshot/v3_snapshot.go:65","msg":"created temporary db file","path":"/var/lib/rancher/k3s/server/db/snapshots/on-demand-k3s-server-1-1718755362.part"}
{"level":"info","ts":"2024-06-19T00:02:41.546515Z","logger":"etcd-client.client","caller":"v3@v3.5.13-k3s1/maintenance.go:212","msg":"opened snapshot stream; downloading"}
{"level":"info","ts":"2024-06-19T00:02:41.546559Z","logger":"etcd-client","caller":"snapshot/v3_snapshot.go:73","msg":"fetching snapshot","endpoint":"https://127.0.0.1:2379"}
{"level":"info","ts":"2024-06-19T00:02:41.546709Z","caller":"v3rpc/maintenance.go:126","msg":"sending database snapshot to client","total-bytes":2859008,"size":"2.9 MB"}
ERRO[0055] etcd-snapshot error ID 40905: snapshot save already in progress
ERRO[0055] Sending HTTP 500 response to 127.0.0.1:35044: etcd-snapshot error ID 40905
{"level":"info","ts":"2024-06-19T00:02:41.5617Z","caller":"v3rpc/maintenance.go:166","msg":"sending database sha256 checksum to client","total-bytes":2859008,"checksum-size":32}
{"level":"info","ts":"2024-06-19T00:02:41.561741Z","caller":"v3rpc/maintenance.go:175","msg":"successfully sent database snapshot to client","total-bytes":2859008,"size":"2.9 MB","took":"now"}
{"level":"info","ts":"2024-06-19T00:02:41.561852Z","logger":"etcd-client.client","caller":"v3@v3.5.13-k3s1/maintenance.go:220","msg":"completed snapshot read; closing"}
{"level":"info","ts":"2024-06-19T00:02:41.571993Z","logger":"etcd-client","caller":"snapshot/v3_snapshot.go:88","msg":"fetched snapshot","endpoint":"https://127.0.0.1:2379","size":"2.9 MB","took":"now"}
{"level":"info","ts":"2024-06-19T00:02:41.572068Z","logger":"etcd-client","caller":"snapshot/v3_snapshot.go:97","msg":"saved","path":"/var/lib/rancher/k3s/server/db/snapshots/on-demand-k3s-server-1-1718755362"}
INFO[0055] Reconciling ETCDSnapshotFile resources
I0619 00:02:41.576558      54 event.go:389] "Event occurred" object="local-on-demand-k3s-server-1-1718755362-e95d32" fieldPath="" kind="ETCDSnapshotFile" apiVersion="k3s.cattle.io/v1" type="Normal" reason="ETCDSnapshotCreated" message="Snapshot on-demand-k3s-server-1-1718755362 saved on k3s-server-1"
INFO[0055] Reconciliation of ETCDSnapshotFile resources complete

Testing

Linked Issues

User-Facing Change

Further Comments

Fixes an issue where the semaphore wasn't permanently initialized
until a scheduled snapshot was taken, allowing multiple on-demand
snapshots to be taken until the first scheduled snapshot was triggered.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
@brandond brandond requested a review from a team as a code owner June 19, 2024 01:16
Copy link

codecov bot commented Jun 19, 2024

Codecov Report

Attention: Patch coverage is 20.00000% with 8 lines in your changes missing coverage. Please review.

Project coverage is 36.35%. Comparing base (dd86a05) to head (837aaee).

Files Patch % Lines
pkg/etcd/snapshot_handler.go 0.00% 5 Missing ⚠️
pkg/etcd/snapshot.go 0.00% 3 Missing ⚠️
Additional details and impacted files
@@               Coverage Diff                @@
##           release-1.28   #10377      +/-   ##
================================================
- Coverage         42.69%   36.35%   -6.34%     
================================================
  Files               163      160       -3     
  Lines             16645    14052    -2593     
================================================
- Hits               7107     5109    -1998     
+ Misses             8276     7948     -328     
+ Partials           1262      995     -267     
Flag Coverage Δ
e2etests 36.35% <20.00%> (-9.44%) ⬇️
unittests ?

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@brandond brandond merged commit 617b0e8 into k3s-io:release-1.28 Jun 19, 2024
28 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants