
[k3s-upgrade] k3s service failed to start after upgrade #5345

Closed · ac5tin opened this issue Mar 28, 2022 · 15 comments
@ac5tin commented Mar 28, 2022

Environmental Info:
K3s Version:

k3s version v1.23.4+k3s1 (43b1cb48)
go version go1.17.5

Node(s) CPU architecture, OS, and Version:

5.4.0-1056-raspi #63-Ubuntu
aarch64 aarch64 aarch64 GNU/Linux

Distributor ID: Ubuntu
Description:    Ubuntu 20.04.4 LTS
Release:        20.04
Codename:       focal

Describe the bug:
I tried to upgrade the k3s version of my cluster (master node and worker nodes) by following the k3s-upgrade documentation.

Steps To Reproduce:

kubectl apply -f https://github.com/rancher/system-upgrade-controller/master/manifests/system-upgrade-controller.yaml

# master nodes
kubectl label node <node-name> k3s-master-upgrade=true
# worker nodes
kubectl label node <node-name> k3s-worker-upgrade=true

# apply upgrade plan
kubectl apply -f agent.yml
kubectl apply -f server.yml

my plans:
server.yml

# Server plan
apiVersion: upgrade.cattle.io/v1
kind: Plan
metadata:
  name: server-plan
  namespace: system-upgrade
spec:
  concurrency: 1
  cordon: true
  nodeSelector:
    matchExpressions:
    - key: k3s-master-upgrade
      operator: In
      values:
      - "true"
  serviceAccountName: system-upgrade
  upgrade:
    image: rancher/k3s-upgrade
  version: v1.23.4+k3s1

agent.yml

# Agent plan
apiVersion: upgrade.cattle.io/v1
kind: Plan
metadata:
  name: agent-plan
  namespace: system-upgrade
spec:
  concurrency: 1
  cordon: true
  nodeSelector:
    matchExpressions:
    - key: k3s-worker-upgrade
      operator: In
      values:
      - "true"
  prepare:
    args:
    - prepare
    - server-plan
    image: rancher/k3s-upgrade
  serviceAccountName: system-upgrade
  upgrade:
    image: rancher/k3s-upgrade
  version: v1.23.4+k3s1

Expected behavior:
All nodes to upgrade successfully to k3s version 1.23.4+k3s1

Actual behavior:
The master node's k3s binary was updated, but the k3s service failed to start.

Additional context / logs:

Mar 28 09:25:54 huey sh[3502]: + /usr/bin/systemctl is-enabled --quiet nm-cloud-setup.service
Mar 28 09:25:54 huey sh[3508]: Failed to get unit file state for nm-cloud-setup.service: No such file or directory
Mar 28 09:25:55 huey k3s[799]: time="2022-03-28T09:25:55Z" level=error msg="failed to get CA certs: Get \"https://127.0.0.1:6444/cacerts\": EOF"
Mar 28 09:25:56 huey k3s[3517]: time="2022-03-28T09:25:56Z" level=info msg="Starting k3s v1.23.4+k3s1 (43b1cb48)"
Mar 28 09:25:56 huey k3s[3517]: time="2022-03-28T09:25:56Z" level=info msg="Configuring sqlite3 database connection pooling: maxIdleConns=2, maxOpenConns=0, connMaxLifetime=0s"
Mar 28 09:25:56 huey k3s[3517]: time="2022-03-28T09:25:56Z" level=info msg="Configuring database table schema and indexes, this may take a moment..."
Mar 28 09:25:56 huey k3s[3517]: time="2022-03-28T09:25:56Z" level=info msg="Database tables and indexes are up to date"
Mar 28 09:25:56 huey k3s[3517]: time="2022-03-28T09:25:56Z" level=info msg="Kine available at unix://kine.sock"
Mar 28 09:25:56 huey k3s[3517]: time="2022-03-28T09:25:56Z" level=fatal msg="starting kubernetes: preparing server: failed to normalize token; must be in format K10<CA-HASH>::<USERNAME>:<PASSWORD> or <PASS>
Mar 28 09:25:56 huey systemd[1]: k3s.service: Main process exited, code=exited, status=1/FAILURE
@brandond (Member)

Mar 28 09:25:56 huey k3s[3517]: time="2022-03-28T09:25:56Z" level=fatal msg="starting kubernetes: preparing server: failed to normalize token; must be in format K10<CA-HASH>::<USERNAME>:<PASSWORD> or

It looks like the --token value in your config file or systemd unit is in an invalid format. How have you specified it?
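For reference, the secure format named in the error can be sanity-checked with a quick shell snippet. The regex below is an approximation based on the error message, not the exact validation k3s performs, and the token value is a made-up placeholder:

```shell
# Rough format check for a k3s "secure" token: K10<CA-HASH>::<USERNAME>:<PASSWORD>.
# The regex approximates the format named in the error message; it is not the
# exact validation k3s performs.
token='K10abcdef0123456789::server:examplepassword'  # hypothetical placeholder value
if printf '%s' "$token" | grep -Eq '^K10[0-9a-f]+::[^:]+:[^:]+$'; then
  echo "looks like a secure-format token"
else
  echo "short-format or malformed token"
fi
```

On a real node you would substitute the value from your config file, systemd unit, or token file in place of the placeholder.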

@ac5tin (Author) commented Apr 7, 2022

I haven't changed the config file; I'm not sure if it got modified by the upgrade process.
I had to completely uninstall k3s and reinstall from scratch.

@vvanouytsel commented Aug 17, 2022

@brandond
Is there any way to figure out what the token should be in case it got removed in the k3s/server/token file?

@brandond (Member)

No, if you were not manually configuring the token, and all nodes with a copy of the token file have been lost, there is no way to recover the value with only a copy of the datastore.

@vvanouytsel

No, if you were not manually configuring the token, and all nodes with a copy of the token file have been lost, there is no way to recover the value with only a copy of the datastore.

Is it also stored in etcd (or sqlite by default on k3s)?

@brandond (Member)

The bootstrap data (cluster CA certificates and such) are stored in the datastore, encrypted with the token as the key generation passphrase. The token value cannot be extracted from the datastore; that would render the encryption meaningless.

@vvanouytsel

I deleted the k3s/server/token file from the filesystem and restarted the k3s systemd service. In my case k3s was able to restore the contents of that file.

@brandond (Member)

If you delete that file but the token is not specified elsewhere (in the config or on the CLI), then a new one will be generated on startup. This is most likely fine on single-server clusters, but it will cause problems when using etcd or an external SQL datastore.
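One way to guard against losing the token across upgrades is to back the file up beforehand. A minimal sketch, assuming the default data dir `/var/lib/rancher/k3s`; a temp directory and a placeholder token stand in here so the snippet runs anywhere:

```shell
# Simulate backing up the server token before an upgrade. On a real node the
# file lives at /var/lib/rancher/k3s/server/token; a temp dir stands in here.
datadir="$(mktemp -d)"
mkdir -p "$datadir/server"
printf 'K10deadbeef::server:secret\n' > "$datadir/server/token"   # placeholder token

# Back up, then verify the copy matches the original.
cp "$datadir/server/token" "$datadir/server/token.bak"
cmp -s "$datadir/server/token" "$datadir/server/token.bak" && echo "backup verified"
```

Keeping the backup somewhere off the node (or in your secrets manager) also covers the case where the whole server is lost.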

@vvanouytsel

I am indeed running a single-server cluster. Thanks for your explanation!

@bramnet commented Aug 23, 2022

What about multi-node clusters? I ran into this issue while trying to upgrade an agent node from 1.22.6+k3s1 to the latest. Can I just grab the token from another node and force inject it during the upgrade? The weirdest part is that it's communicating with the cluster just fine.

@brandond (Member) commented Aug 23, 2022

@bramnet this issue has wandered a bit; I may need to lock it so that folks can open their own issues describing their individual problems. What is the exact message you're getting?

@bramnet commented Aug 23, 2022

I was just trying to reproduce it again, and suddenly it's saying the node is up to date… not sure what happened here.
All I remember is that it was very similar to the second-to-last line in ac5tin's logs: level=fatal msg="starting kubernetes: preparing server: failed to normalize token; must be in format K10<CA-HASH>::<USERNAME>:<PASSWORD> or
What's also weird is that Rancher isn't reflecting that they're up to date… I'll have to look into that.

@RaphaelKimmig

I'm having the same issue on a single node cluster. I noticed that /var/lib/rancher/k3s/server/token has recently been written and is now empty.

@ryan4yin commented Nov 10, 2022

Same here, using single-master mode, version v1.25.3+k3s1. I resolved this by deleting the empty file /var/lib/rancher/k3s/server/token.

@brandond (Member)

I'm not aware of any paths in the k3s code that would cause it to write an empty token file. If anyone else runs into this, and can confirm that they are not using any automation or scripting to manage the content of that file, please open a new issue with steps that can help us reproduce this.
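If you suspect you've hit this, a quick pre-restart check for a missing or empty token file is easy to script. A minimal sketch; the real path is /var/lib/rancher/k3s/server/token, but a temp file stands in below so the snippet runs anywhere:

```shell
# Warn if the server token file is missing or zero-length before restarting k3s.
# /var/lib/rancher/k3s/server/token is the default path; a temp file is used
# here to simulate the empty-file condition reported above.
tokenfile="$(mktemp)"
: > "$tokenfile"   # simulate an empty token file

if [ ! -s "$tokenfile" ]; then
  echo "WARNING: token file is missing or empty; restore it before restarting k3s"
fi
```

Running a check like this before `systemctl restart k3s` avoids silently regenerating a new token on a cluster whose other members still hold the old one.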

@k3s-io k3s-io locked and limited conversation to collaborators Jun 13, 2023
6 participants