Multiple issues with temporary bootstrap etcd config #7368

Closed
brandond opened this issue Apr 27, 2023 · 1 comment
brandond commented Apr 27, 2023

A recent customer call turned up a few issues with the temporary etcd that runs to extract bootstrap data:

  • The temporary etcd config pulls extra args from the apiserver arg list instead of the etcd arg list.
    This made it difficult to raise the quota, as the arg was not being passed through to the temporary etcd.
  • The defrag/alarm clear runs with a short deadline that may not be met when the datastore is on slower storage.
    This made it difficult to bring the cluster up with an increased quota, as the defrag operation took 22 seconds to complete, which is longer than the deadline allowed.
    This was fixed a while ago; the customer was on v1.23.6, which had a 10 second timeout. The timeout is 30 seconds on current releases.
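
To make the deadline problem concrete, here is a minimal sketch that runs a command under a deadline using the coreutils `timeout` utility. It is not how k3s implements the defrag deadline; `run_with_deadline` is a hypothetical helper, and `sleep` stands in for a slow defrag, with the 10 s / 22 s / 30 s figures scaled down to 1 s / 2 s / 3 s.

```shell
# Hypothetical helper: run a command under a deadline (in seconds) and
# report whether it finished in time. `timeout` exits with status 124
# when it had to kill the command.
run_with_deadline() {
    deadline="$1"
    shift
    if timeout "$deadline" "$@"; then
        echo "finished within ${deadline}s"
    else
        status=$?
        if [ "$status" -eq 124 ]; then
            echo "timed out after ${deadline}s"
        else
            echo "failed with status ${status}"
        fi
        return "$status"
    fi
}

# Stand-in for the old behavior: the operation needs longer than the deadline.
run_with_deadline 1 sleep 2 || true

# Stand-in for the current behavior: the same operation fits the longer deadline.
run_with_deadline 3 sleep 2
```

The operation itself is unchanged between the two calls; only the deadline decides whether it is reported as a failure, which matches the behavior described for slower storage.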

As an additional note, defragging on startup isn't always sufficient to recover storage space; in this case we had to compact as well as defrag in order to bring the size down after they filled the datastore by running an etcdctl perf test on it. I'm not sure we need to handle that though.
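
For reference, the manual compact-then-defrag recovery mentioned above can be sketched roughly as follows, based on the general etcd maintenance procedure. This is an assumption about what such a recovery looks like, not the exact commands used on the call. `extract_revision` and `compact_and_defrag` are hypothetical helper names, and the endpoint and TLS flags are left to the caller.

```shell
# Print the first "revision" value found in the JSON produced by
# `etcdctl endpoint status --write-out=json` (read from stdin).
extract_revision() {
    grep -o '"revision":[0-9]*' | sed -n '1s/[^0-9]*//p'
}

# Compact history up to the current revision, defragment, then clear alarms.
# All arguments (endpoints, TLS flags) are passed straight through to etcdctl.
compact_and_defrag() {
    rev=$(etcdctl "$@" endpoint status --write-out=json | extract_revision)
    if [ -z "$rev" ]; then
        echo "could not determine current revision" >&2
        return 1
    fi
    etcdctl "$@" compact "$rev" &&
        # Defrag can be slow on slower storage, so allow more than the default command timeout.
        etcdctl "$@" --command-timeout=60s defrag &&
        etcdctl "$@" alarm disarm
}
```

It would be called with whatever connection flags the cluster needs, for example `compact_and_defrag --endpoints=https://127.0.0.1:2379 --cacert ... --cert ... --key ...`. On a k3s server the etcd client certificates are usually found under `/var/lib/rancher/k3s/server/tls/etcd/`, but verify that on the node before relying on it.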

fmoral2 commented May 10, 2023

Validated on Version:

- k3s version v1.27.1+k3s-ad41fb8c (ad41fb8c) - Previous version Commit ID 9980504196ce0cb53c8e04756598d6f8982a5756
- k3s version v1.27.1+k3s-607cbf0a (607cbf0a) (Target commit id version) Commit ID 324ecfc30da9465b13b4bf93ff90abe7adde1527

Environment Details

Infrastructure
Cloud EC2 instance

Node(s) CPU architecture, OS, and Version:
PRETTY_NAME="SUSE Linux Enterprise Server 15 SP4"

Cluster Configuration:
1 node

Config.yaml:

cat /etc/rancher/k3s/config.yaml
write-kubeconfig-mode: 644
token: test

Steps to reproduce the issue

  1. Install k3s in previous version
  2. Stop server
  3. Start server with additional etcd args
  4. Look in the logs for the etcd args
  5. Check if they are set

Steps to validate the fix

  1. Install k3s in commit version
  2. Stop server
  3. Start server with additional etcd args
  4. Look in the logs for the etcd args
  5. Check if they are set
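
The last two steps (look in the logs, check whether the args are set) can be scripted. The sketch below assumes the server output has been captured to a file and that the quota shows up in etcd's "enabled backend quota" JSON log line, as it does in the results that follow. `etcd_quota_bytes` and `quota_is_set` are hypothetical helper names.

```shell
# Print the quota-size-bytes value from every etcd
# "enabled backend quota" JSON log line read on stdin.
etcd_quota_bytes() {
    sed -n '/enabled backend quota/s/.*"quota-size-bytes":\([0-9][0-9]*\).*/\1/p'
}

# Succeed only if at least one quota log line on stdin reports the expected value.
quota_is_set() {
    expected="$1"
    etcd_quota_bytes | grep -x "$expected" >/dev/null
}
```

With the server output saved to a hypothetical `k3s.log`, the check becomes `quota_is_set 6000000000 < k3s.log && echo "Found in logs"`.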

Validation Results:

###########    Issue     ###########


v1.24.13+k3s-280e058f (280e058f)


~$ k3s --version

k3s version v1.27.1+k3s-ad41fb8c (ad41fb8c)


$ sudo systemctl stop k3s

$ sudo k3s server --etcd-arg='--quota-backend-bytes=6000000000'


Not found!


{"level":"info","ts":"2023-05-10T19:40:55.998Z","caller":"etcdserver/quota.go:94","msg":"enabled backend quota with default value","quota-name":"v3-applier","quota-size-bytes":2147483648,"quota-size":"2.1 GB"}

{
    "level": "info",
    "ts": "2023-05-10T19:40:55.828Z",
    "caller": "embed/etcd.go:306",
    "msg": "starting an etcd server",
    "etcd-version": "3.5.7",
    "git-sha": "Not provided (use ./build instead of go build)",
    "go-version": "go1.20.3",
    "go-os": "linux",
    "go-arch": "amd64",
    "max-cpu-set": 2,
    "max-cpu-available": 2,
    "member-initialized": true,
    "name": "ip-172-31-10-102-40a8aa1b",
    "data-dir": "/var/lib/rancher/k3s/server/db/etcd-tmp",
    "wal-dir": "",
    "wal-dir-dedicated": "",
    "member-dir": "/var/lib/rancher/k3s/server/db/etcd-tmp/member",
    "force-new-cluster": true,
    "heartbeat-interval": "500ms",
    "election-timeout": "5s",
    "initial-election-tick-advance": true,
    "snapshot-count": 10000,
    "max-wals": 0,
    "max-snapshots": 0,
    "snapshot-catchup-entries": 5000,
    "initial-advertise-peer-urls": [
        "http://127.0.0.1:2400"
    ],
    "listen-peer-urls": [
        "http://127.0.0.1:2400"
    ],
    "advertise-client-urls": [
        "http://127.0.0.1:2399"
    ],
    "listen-client-urls": [
        "http://127.0.0.1:2399"
    ],
    "listen-metrics-urls": [],
    "cors": [
        "*"
    ],
    "host-whitelist": [
        "*"
    ],
    "initial-cluster": "",
    "initial-cluster-state": "new",
    "initial-cluster-token": "",
    "quota-backend-bytes": 2147483648,
    "max-request-bytes": 1572864,
    "max-concurrent-streams": 4294967295,
    "pre-vote": true,
    "initial-corrupt-check": true,
    "corrupt-check-time-interval": "0s",
    "compact-check-time-enabled": false,
    "compact-check-time-interval": "1m0s",
    "auto-compaction-mode": "",
    "auto-compaction-retention": "0s",
    "auto-compaction-interval": "0s",
    "discovery-url": "",
    "discovery-proxy": "",
    "downgrade-check-interval": "5s"
}

 

========================================================================================================================   



###########    FIX     ###########

 k3s version v1.27.1+k3s-607cbf0a (607cbf0a) (Target commit id version) Commit ID 324ecfc30da9465b13b4bf93ff90abe7adde1527
 
~$ k3s --version
v1.27.1+k3s-607cbf0a (607cbf0a)



$ sudo systemctl stop k3s

$ sudo k3s server --etcd-arg='--quota-backend-bytes=6000000000'

Found in logs! Value set.



{"level":"info","ts":"2023-05-10T18:39:02.683Z","caller":"etcdserver/quota.go:117","msg":"enabled backend quota","quota-name":"v3-applier","quota-size-bytes":6000000000,"quota-size":"6.0 GB"}


{
    "level": "info",
    "ts": "2023-05-10T18:39:02.487Z",
    "caller": "embed/etcd.go:306",
    "msg": "starting an etcd server",
    "etcd-version": "3.5.7",
    "git-sha": "Not provided (use ./build instead of go build)",
    "go-version": "go1.20.3",
    "go-os": "linux",
    "go-arch": "amd64",
    "max-cpu-set": 2,
    "max-cpu-available": 2,
    "member-initialized": true,
    "name": "ip-172-31-45-216-d09f3a59",
    "data-dir": "/var/lib/rancher/k3s/server/db/etcd-tmp",
    "wal-dir": "",
    "wal-dir-dedicated": "",
    "member-dir": "/var/lib/rancher/k3s/server/db/etcd-tmp/member",
    "force-new-cluster": true,
    "heartbeat-interval": "500ms",
    "election-timeout": "5s",
    "initial-election-tick-advance": true,
    "snapshot-count": 10000,
    "max-wals": 0,
    "max-snapshots": 0,
    "snapshot-catchup-entries": 5000,
    "initial-advertise-peer-urls": [
        "http://127.0.0.1:2400"
    ],
    "listen-peer-urls": [
        "http://127.0.0.1:2400"
    ],
    "advertise-client-urls": [
        "http://127.0.0.1:2399"
    ],
    "listen-client-urls": [
        "http://127.0.0.1:2399"
    ],
    "listen-metrics-urls": [],
    "cors": [
        "*"
    ],
    "host-whitelist": [
        "*"
    ],
    "initial-cluster": "",
    "initial-cluster-state": "new",
    "initial-cluster-token": "",
    "quota-backend-bytes": 6000000000,
    "max-request-bytes": 1572864,
    "max-concurrent-streams": 4294967295,
    "pre-vote": true,
    "initial-corrupt-check": true,
    "corrupt-check-time-interval": "0s",
    "compact-check-time-enabled": false,
    "compact-check-time-interval": "1m0s",
    "auto-compaction-mode": "",
    "auto-compaction-retention": "0s",
    "auto-compaction-interval": "0s",
    "discovery-url": "",
    "discovery-proxy": "",
    "downgrade-check-interval": "5s"
}




INFO[0003] Defragmenting etcd database                  

{"level":"info","ts":"2023-05-10T19:15:02.396Z","caller":"v3rpc/maintenance.go:90","msg":"starting defragment"}
{"level":"info","ts":"2023-05-10T19:15:02.400Z","caller":"backend/backend.go:497","msg":"defragmenting","path":"/var/lib/rancher/k3s/server/db/etcd-tmp/member/snap/db","current-db-size-bytes":10813440,"current-db-size":"11 MB","current-db-size-in-use-bytes":6897664,"current-db-size-in-use":"6.9 MB"}
{"level":"info","ts":"2023-05-10T19:15:02.588Z","caller":"backend/backend.go:549","msg":"finished defragmenting directory","path":"/var/lib/rancher/k3s/server/db/etcd-tmp/member/snap/db","current-db-size-bytes-diff":-4067328,"current-db-size-bytes":6746112,"current-db-size":"6.7 MB","current-db-size-in-use-bytes-diff":-167936,"current-db-size-in-use-bytes":6729728,"current-db-size-in-use":"6.7 MB","took":"192.053989ms"}
{"level":"info","ts":"2023-05-10T19:15:02.588Z","caller":"v3rpc/maintenance.go:96","msg":"finished defragment"}






  

