[v1.26.6+k3s1 on nix23.05] Traefik pods crashing due to Helm 3.9.0 needed #8831

Closed
senpro-ingwersenk opened this issue Nov 14, 2023 · 5 comments

Comments

@senpro-ingwersenk

Environmental Info:
K3s Version:

# k3s -v
k3s version v1.26.6+k3s1 (3b1919b0)
go version go1.20.8

Nix OS:

# cat /etc/os-release
BUG_REPORT_URL="https://github.com/NixOS/nixpkgs/issues"
BUILD_ID="23.05.4808.da4024d0ead5"
DOCUMENTATION_URL="https://nixos.org/learn.html"
HOME_URL="https://nixos.org/"
ID=nixos
LOGO="nix-snowflake"
NAME=NixOS
PRETTY_NAME="NixOS 23.05 (Stoat)"
SUPPORT_END="2023-12-31"
SUPPORT_URL="https://nixos.org/community.html"
VERSION="23.05 (Stoat)"
VERSION_CODENAME=stoat
VERSION_ID="23.05"

Node(s) CPU architecture, OS, and Version:

// 3 Nodes:
Linux senst-sv-k3s01 6.1.62 #1-NixOS SMP PREEMPT_DYNAMIC Wed Nov  8 13:11:05 UTC 2023 x86_64 GNU/Linux
Linux senst-sv-k3s02 6.1.62 #1-NixOS SMP PREEMPT_DYNAMIC Wed Nov  8 13:11:05 UTC 2023 x86_64 GNU/Linux
Linux senst-sv-k3s03 5.15.119 #1-NixOS SMP Wed Jun 28 08:29:53 UTC 2023 x86_64 GNU/Linux

Cluster Configuration:
2 Servers, 1 Agent

Describe the bug:
When k3s attempts to auto-update the integrated Traefik, this happens:

# kubectl logs -n kube-system job/helm-install-traefik
if [[ ${KUBERNETES_SERVICE_HOST} =~ .*:.* ]]; then
        echo "KUBERNETES_SERVICE_HOST is using IPv6"
        CHART="${CHART//%\{KUBERNETES_API\}%/[${KUBERNETES_SERVICE_HOST}]:${KUBERNETES_SERVICE_PORT}}"
else
        CHART="${CHART//%\{KUBERNETES_API\}%/${KUBERNETES_SERVICE_HOST}:${KUBERNETES_SERVICE_PORT}}"
fi

set +v -x
+ [[ '' != \t\r\u\e ]]
+ export HELM_HOST=127.0.0.1:44134
+ HELM_HOST=127.0.0.1:44134
+ helm_v2 init --skip-refresh --client-only --stable-repo-url https://charts.helm.sh/stable/
+ tiller --listen=127.0.0.1:44134 --storage=secret
Creating /home/klipper-helm/.helm
Creating /home/klipper-helm/.helm/repository
Creating /home/klipper-helm/.helm/repository/cache
Creating /home/klipper-helm/.helm/repository/local
Creating /home/klipper-helm/.helm/plugins
Creating /home/klipper-helm/.helm/starters
Creating /home/klipper-helm/.helm/cache/archive
Creating /home/klipper-helm/.helm/repository/repositories.yaml
Adding stable repo with URL: https://charts.helm.sh/stable/
Adding local repo with URL: http://127.0.0.1:8879/charts
$HELM_HOME has been configured at /home/klipper-helm/.helm.
Not installing Tiller due to 'client-only' flag having been set
++ timeout -s KILL 30 helm_v2 ls --all '^traefik$' --output json
++ jq -r '.Releases | length'
[main] 2023/11/14 14:35:47 Starting Tiller v2.17.0 (tls=false)
[main] 2023/11/14 14:35:47 GRPC listening on 127.0.0.1:44134
[main] 2023/11/14 14:35:47 Probes listening on :44135
[main] 2023/11/14 14:35:47 Storage driver is Secret
[main] 2023/11/14 14:35:47 Max history per release is 0
[storage] 2023/11/14 14:35:47 listing all releases with filter
+ V2_CHART_EXISTS=
+ [[ '' == \1 ]]
+ [[ '' == \v\2 ]]
+ [[ -f /config/ca-file.pem ]]
+ [[ -n '' ]]
+ shopt -s nullglob
+ helm_content_decode
+ set -e
+ ENC_CHART_PATH=/chart/traefik.tgz.base64
+ CHART_PATH=/tmp/traefik.tgz
+ [[ ! -f /chart/traefik.tgz.base64 ]]
+ return
+ [[ install != \d\e\l\e\t\e ]]
+ helm_repo_init
+ grep -q -e 'https\?://'
+ echo 'chart path is a url, skipping repo update'
+ helm_v3 repo remove stable
chart path is a url, skipping repo update
Error: no repositories configured
+ true
+ return
+ helm_update install --set-string global.systemDefaultRegistry=
+ [[ helm_v3 == \h\e\l\m\_\v\3 ]]
++ helm_v3 ls --all -f '^traefik$' --namespace kube-system --output json
++ jq -r '"\(.[0].app_version),\(.[0].status)"'
++ tr '[:upper:]' '[:lower:]'
+ LINE=2.9.1,deployed
+ IFS=,
+ read -r INSTALLED_VERSION STATUS _
+ VALUES=
+ for VALUES_FILE in /config/*.yaml
+ VALUES=' --values /config/values-01_HelmChart.yaml'
+ for VALUES_FILE in /config/*.yaml
+ VALUES=' --values /config/values-01_HelmChart.yaml --values /config/values-10_HelmChartConfig.yaml'
+ [[ install = \d\e\l\e\t\e ]]
+ [[ 2.9.1 =~ ^(|null)$ ]]
+ [[ deployed =~ ^(pending-install|pending-upgrade|pending-rollback)$ ]]
+ [[ deployed == \d\e\p\l\o\y\e\d ]]
Already installed traefik
+ echo 'Already installed traefik'
+ [[ helm_v3 == \h\e\l\m\_\v\3 ]]
+ helm_v3 mapkubeapis traefik --namespace kube-system
2023/11/14 14:35:47 Release 'traefik' will be checked for deprecated or removed Kubernetes APIs and will be updated if necessary to supported API versions.
2023/11/14 14:35:47 Get release 'traefik' latest version.
2023/11/14 14:35:47 Check release 'traefik' for deprecated or removed APIs...
2023/11/14 14:35:47 Finished checking release 'traefik' for deprecated or removed APIs.
2023/11/14 14:35:47 Release 'traefik' has no deprecated or removed APIs.
2023/11/14 14:35:47 Map of release 'traefik' deprecated or removed APIs to supported versions, completed successfully.
+ echo 'Upgrading helm_v3 chart'
Upgrading traefik
+ echo 'Upgrading traefik'
+ shift 1
+ helm_v3 upgrade --set-string global.systemDefaultRegistry= traefik https://10.43.0.1:443/static/charts/traefik-21.2.1+up21.2.0.tgz --values /config/values-01_HelmChart.yaml --values /config/values-10_HelmChartConfig.yaml
Error: UPGRADE FAILED: execution error at (traefik/templates/deployment.yaml:3:8): ERROR: Helm >= 3.9.0 is required

Steps To Reproduce:

  • Installed K3s: via NixOS configuration with company-internal derivations that add the token and IP ranges. No k3s version is pinned; the package is used verbatim from the Nix channels, hence 1.26.6+k3s1.

I am not the original cluster admin. Our company recently lost its cluster admin, and as I was the most capable with Linux, I have since spent hours reading the Kubernetes documentation whilst experimenting on a private cluster at home, in an attempt to migrate my Docker Compose configuration as well as some system services to their appropriate Kubernetes variants. So, my experience with both Kubernetes and k3s is very limited and mostly derived from reading the "Concepts" chapter and my past experience with Docker, Docker Compose, systemd and friends. Hence the information I provide here is as far as I can go... sorry!

What I do know, however, is that a HelmChartConfig was created. I have snipped out details that look like secrets to me (I should convert them into actual Secrets asap).

apiVersion: v1
kind: PersistentVolumeClaim

metadata:
  name: traefik
  namespace: kube-system
spec:
  storageClassName: nfs-csi
  accessModes:
  - ReadWriteMany
  - ReadWriteOnce
  resources:
    requests:
      storage: 256Mi
---
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: traefik
  namespace: kube-system
spec:
  valuesContent: |-
    logs:
      level: INFO
      access:
        enabled: true

    deployment:
      enabled: true
      replicas: 2
      initContainers:
        - name: volume-permissions
          image: busybox:latest
          command: ["sh", "-c", "touch /data/acme.json; chmod -v 600 /data/acme.json"]
          securityContext:
            runAsNonRoot: true
            runAsGroup: 65532
            runAsUser: 65532
          volumeMounts:
            - name: data
              mountPath: /data

    ports:
      web:
        redirectTo: websecure
      websecure:
        tls:
          certResolver: "letsEncrypt"

    persistence:
      enabled: true
      existingClaim: traefik
      accessMode: ReadWriteOnce
      size: 256Mi
      path: /data

    providers:
      kubernetesCRD:
        enabled: true
        namespaces: []
    kubernetesIngress:
      enabled: true
      namespaces: []
      publishedService:
        enabled: true

    rbac:
      enabled: true

    service:
      enabled: true
      type: LoadBalancer
      spec:
        loadBalancerIP: "192.168.40.51"

    certResolvers:
      letsEncrypt:
        dnsChallenge:
          provider: <snip>
          delayBeforeCheck: 30
          resolvers:
            - 1.1.1.1
            - 8.8.8.8
            - 9.9.9.9
        storage: /data/acme.json

    updateStrategy:
      type: RollingUpdate
      rollingUpdate:
        maxUnavailable: 1

    env:
      # Values for DNS challenge provider

    additionalArguments:
      - --providers.kubernetescrd.allowCrossNamespace=true

Expected behavior:
I expected Traefik to just stay as it was, as it is a built-in resource. So it should not change, or try to change.

Actual behavior:
When I got to work this morning, it was in a constant backoff state since the "update" had failed.

Additional context / logs:
I already added the logs above, not sure what else I could add here. What I can add, though, is that I have looked at the NixOS version supplied in 23.05, read the documentation on customizing built-in components, looked at the built-in HelmChart (which uses the tag 2.9.10, by the way) and consulted all locally available resources. Long story short, I have no idea what is actually wrong here...

Thank you and kind regards!

@brandond
Member

brandond commented Nov 15, 2023

Can you confirm that all of your nodes are running the same release of K3s? I'm not sure how you would get in a state where it's trying to install the traefik chart using an old version of the klipper-helm image, unless for some reason one of your servers was still on an older version of K3s. Make sure they're both on the latest available 1.26 release, and the issue should resolve itself.

@senpro-ingwersenk
Author

Good point, I only checked the version on one node. Here:

> ssh senst-sv-k3s01 k3s --version
k3s version v1.26.6+k3s1 (3b1919b0)
go version go1.20.8
> ssh senst-sv-k3s02 k3s --version
k3s version v1.26.6+k3s1 (3b1919b0)
go version go1.20.8
> ssh senst-sv-k3s03 k3s --version
k3s version v1.25.3+k3s1 (f2585c16)
go version go1.19.9

Jackpot. I will update the third node and see if that changes the behaviour. Hadn't even thought of that...
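
A mismatch like this can be spotted in one pass instead of node by node. What follows is a minimal sketch, not something from the original report: the helper name `find_version_mismatches` is hypothetical, and it assumes its input is one `<node> <version>` pair per line.

```shell
# Hypothetical helper: reads "<node> <version>" lines on stdin and prints
# every node whose version differs from the version on the first line.
find_version_mismatches() {
  awk 'NR == 1 { want = $2; next }
       $2 != want { print $1 " runs " $2 ", expected " want }'
}

# Example using the versions reported in this thread.
find_version_mismatches <<'EOF'
senst-sv-k3s01 v1.26.6+k3s1
senst-sv-k3s02 v1.26.6+k3s1
senst-sv-k3s03 v1.25.3+k3s1
EOF
```

On a live cluster the input could presumably come from `kubectl get nodes --no-headers -o custom-columns=NAME:.metadata.name,VERSION:.status.nodeInfo.kubeletVersion`, piped into the function. The sketch treats the first node as the reference, so it only flags differences and does not decide which version is the right one.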

@senpro-ingwersenk
Author

Yep, that was at least part of it. Now the message is this:

+ helm_v3 upgrade --set-string global.systemDefaultRegistry= traefik https://10.43.0.1:443/static/charts/traefik-21.2.1+up21.2.0.tgz --values /config/values-01_HelmChart.yaml --values /config/values-10_HelmChartConfig.yaml
Error: UPGRADE FAILED: rendered manifests contain a resource that already exists. Unable to continue with update: IngressRoute "traefik-dashboard" in namespace "kube-system" exists and cannot be imported into the current release: invalid ownership metadata; annotation validation error: missing key "meta.helm.sh/release-name": must be set to "traefik"; annotation validation error: missing key "meta.helm.sh/release-namespace": must be set to "kube-system"

Do I have to specify the requested namespace as an annotation in the HelmChartConfig? Alternatively, I wouldn't be surprised if the update left the cluster a little confused, so maybe it would be better to just reapply the config entirely and make sure all three nodes are in sync. Whilst k3s kubectl replace seems like an option, I am not sure if that is what I am looking for...

Got an idea?

@brandond
Member

brandond commented Nov 16, 2023

Just add the missing annotations to that resource. You don't need to do anything to the chart.
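
As a rough sketch of what adding those annotations might look like: the helper name `helm_adopt_command` is hypothetical, the resource kind `ingressroute` is an assumption about how the Traefik CRD resolves in this cluster, and the sketch assumes the resource lives in the same namespace as the release. It only prints the `kubectl annotate` command so that it can be reviewed before being run.

```shell
# Hypothetical helper: prints (does not run) a kubectl command that adds the
# two Helm ownership annotations named in the error message.
# $1 = resource kind, $2 = resource name, $3 = release name, $4 = namespace
helm_adopt_command() {
  printf 'kubectl annotate %s %s --namespace %s meta.helm.sh/release-name=%s meta.helm.sh/release-namespace=%s\n' \
    "$1" "$2" "$4" "$3" "$4"
}

# Values taken from the error message in this thread.
helm_adopt_command ingressroute traefik-dashboard traefik kube-system
```

Helm also checks for the label `app.kubernetes.io/managed-by=Helm` when adopting an existing resource. The error above lists only the two annotations as missing, so the label may already be present; if it is not, `kubectl label` would be the analogous command.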

@senpro-ingwersenk
Author

I deleted the erroneous object after dumping it to disk (kubectl get ... -o yaml > dashboard.yaml) and waited a little. It came back, installed itself, and is now ready to be used.
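
The dump-then-delete approach described above could be sketched roughly as follows. The exact commands used were elided in the comment, so everything here is an assumption for illustration: the helper name `backup_and_delete`, the `KUBECTL` override variable, and the resource kind `ingressroute`.

```shell
# Hypothetical helper: dump a resource to a file, then delete it.
# KUBECTL is an assumed override; set it to "echo kubectl" for a dry run.
# $1 = resource kind, $2 = resource name, $3 = namespace, $4 = backup file
backup_and_delete() {
  ${KUBECTL:-kubectl} get "$1" "$2" --namespace "$3" -o yaml > "$4" || return 1
  # Refuse to delete when the backup file is empty.
  [ -s "$4" ] || return 1
  ${KUBECTL:-kubectl} delete "$1" "$2" --namespace "$3"
}

# Dry run: "echo kubectl" prints the commands instead of contacting a cluster,
# so the backup file ends up holding the printed "get" command.
backup_file="$(mktemp)"
KUBECTL="echo kubectl" backup_and_delete ingressroute traefik-dashboard kube-system "$backup_file"
```

Stopping when the dump fails or comes back empty keeps the delete from running without a backup to restore from.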

Thanks for all your help! Much appreciated. :)
