
DNS resolution of hostNetwork pods (e.g. Restic Backup Addon) #1178

Closed
toschneck opened this issue Nov 30, 2020 · 3 comments · Fixed by #1179
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@toschneck
Member

What happened:
While using KubeOne as the seed cluster provisioner (on vSphere), we applied the restic backup addon with the goal of using the in-cluster MinIO service minio.minio.svc.cluster.local. Unfortunately this didn't work, because the in-cluster DNS name could not be resolved.

Backup job YAML:

apiVersion: v1
kind: Secret
metadata:
  name: s3-credentials
  namespace: kube-system
type: Opaque
data:
  AWS_ACCESS_KEY_ID: xxxxxxxxxxxxxx
  AWS_SECRET_ACCESS_KEY: xxxxxxxxxxxxxxxxxxxxxx
---
apiVersion: v1
kind: Secret
metadata:
  name: restic-config
  namespace: kube-system
type: Opaque
data:
  password: xxxxxxxxxxxxxxxxxxxxxxxx
---
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: etcd-s3-backup
  namespace: kube-system
spec:
  concurrencyPolicy: Forbid
  failedJobsHistoryLimit: 1
  schedule: '@every 30m'
  successfulJobsHistoryLimit: 0
  suspend: false
  jobTemplate:
    spec:
      template:
        spec:
          hostNetwork: true
          nodeSelector:
            node-role.kubernetes.io/master: ""
          tolerations:
          - key: node-role.kubernetes.io/master
            effect: NoSchedule
            operator: Exists
          restartPolicy: OnFailure
          volumes:
          - name: etcd-backup
            emptyDir: {}
          - name: host-pki
            hostPath:
              path: /etc/kubernetes/pki
          initContainers:
          - name: snapshoter
            image: {{ Registry "gcr.io" }}/etcd-development/etcd:v3.4.3
            imagePullPolicy: IfNotPresent
            command:
            - /bin/sh
            - -c
            - |-
              set -euf
              mkdir -p /backup/pki/kubernetes
              mkdir -p /backup/pki/etcd
              cp -a /etc/kubernetes/pki/etcd/ca.crt /backup/pki/etcd/
              cp -a /etc/kubernetes/pki/etcd/ca.key /backup/pki/etcd/
              cp -a /etc/kubernetes/pki/ca.crt /backup/pki/kubernetes
              cp -a /etc/kubernetes/pki/ca.key /backup/pki/kubernetes
              cp -a /etc/kubernetes/pki/front-proxy-ca.crt /backup/pki/kubernetes
              cp -a /etc/kubernetes/pki/front-proxy-ca.key /backup/pki/kubernetes
              cp -a /etc/kubernetes/pki/sa.key /backup/pki/kubernetes
              cp -a /etc/kubernetes/pki/sa.pub /backup/pki/kubernetes
              etcdctl snapshot save /backup/etcd-snapshot.db
            env:
            - name: ETCDCTL_API
              value: "3"
            - name: ETCDCTL_DIAL_TIMEOUT
              value: 3s
            - name: ETCDCTL_CACERT
              value: /etc/kubernetes/pki/etcd/ca.crt
            - name: ETCDCTL_CERT
              value: /etc/kubernetes/pki/etcd/healthcheck-client.crt
            - name: ETCDCTL_KEY
              value: /etc/kubernetes/pki/etcd/healthcheck-client.key
            - name: ETCD_HOSTNAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            volumeMounts:
            - mountPath: /backup
              name: etcd-backup
            - mountPath: /etc/kubernetes/pki
              name: host-pki
              readOnly: true
          containers:
          - name: uploader
            image: {{ Registry "docker.io" }}/restic/restic:0.9.6
            imagePullPolicy: IfNotPresent
            command:
            - /bin/sh
            - -c
            - |-
              set -euf
              restic snapshots -q || restic init -q
              restic backup --tag=etcd --host=${ETCD_HOSTNAME} /backup
              restic forget --prune --keep-last 48
            env:
            - name: ETCD_HOSTNAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            - name: RESTIC_REPOSITORY
              value: "s3:http://minio.minio.svc.cluster.local:9000/kubermatic-etcd-backups"
            - name: RESTIC_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: restic-config
                  key: password
            - name: AWS_DEFAULT_REGION
              value: "<<AWS_DEFAULT_REGION>>"
            - name: AWS_ACCESS_KEY_ID
              valueFrom:
                secretKeyRef:
                  key: AWS_ACCESS_KEY_ID
                  name: s3-credentials
            - name: AWS_SECRET_ACCESS_KEY
              valueFrom:
                secretKeyRef:
                  key: AWS_SECRET_ACCESS_KEY
                  name: s3-credentials
            volumeMounts:
            - mountPath: /backup
              name: etcd-backup

During debugging I found out that /etc/resolv.conf in the job pod didn't contain the cluster search domains. After some research, it seems that using hostNetwork: true can cause DNS resolution to break. The combination with flannel might also play a role. There are also some related upstream issues about this.

Backup pod, without search domains:

# cat /etc/resolv.conf 
nameserver 10.2.0.1
search localdomain

Normal pod:

cat /etc/resolv.conf 
nameserver 169.254.20.10
search default.svc.cluster.local svc.cluster.local cluster.local localdomain
options ndots:5
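
A quick way to confirm the difference is to check which dnsPolicy the pods actually got (the pod names below are placeholders; with hostNetwork: true, the default ClusterFirst policy falls back to the behavior of the Default policy, i.e. the node's resolv.conf):

# backup job pod (replace with the real pod name)
kubectl -n kube-system get pod etcd-s3-backup-xxxxx -o jsonpath='{.spec.dnsPolicy}{"\n"}'
# any normal pod for comparison
kubectl get pod <some-normal-pod> -o jsonpath='{.spec.dnsPolicy}{"\n"}'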

As I don't think this is normal behavior, we should investigate the DNS resolution issue.

What is the expected behavior:

  • The provided backup addon should work with in-cluster services
  • In-cluster DNS names should also be resolved for pods with hostNetwork: true

How to reproduce the issue:
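
A minimal reproduction sketch (not the addon manifest itself; the pod name and image are placeholders): run any pod with hostNetwork: true on a control-plane node and inspect its /etc/resolv.conf. The cluster search domains are missing and in-cluster service names don't resolve.

apiVersion: v1
kind: Pod
metadata:
  name: hostnet-dns-test            # placeholder name
  namespace: kube-system
spec:
  hostNetwork: true
  restartPolicy: Never
  nodeSelector:
    node-role.kubernetes.io/master: ""
  tolerations:
  - key: node-role.kubernetes.io/master
    effect: NoSchedule
    operator: Exists
  containers:
  - name: test
    image: busybox:1.32
    command:
    - /bin/sh
    - -c
    - cat /etc/resolv.conf && nslookup minio.minio.svc.cluster.local

kubectl logs -n kube-system hostnet-dns-test then shows the node's resolv.conf without the cluster search domains, and the lookup fails.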

Anything else we need to know?
The issue happened in two different environments on vSphere: my lab setup (https://github.com/kubermatic-labs/kubermatic-demo/tree/master/vsphere) and a customer environment.

Information about the environment:
KubeOne version (kubeone version):

{
  "kubeone": {
    "major": "1",
    "minor": "1",
    "gitVersion": "v1.1.0",
    "gitCommit": "3e84d523a75cd178a7801a0fccf2b3195db3a376",
    "gitTreeState": "",
    "buildDate": "2020-11-17T23:13:15+01:00",
    "goVersion": "go1.15.2",
    "compiler": "gc",
    "platform": "linux/amd64"
  },
  "machine_controller": {
    "major": "1",
    "minor": "19",
    "gitVersion": "v1.19.0",
    "gitCommit": "",
    "gitTreeState": "",
    "buildDate": "",
    "goVersion": "",
    "compiler": "",
    "platform": "linux/amd64"
  }
}

Operating system: Ubuntu
Provider you're deploying cluster on: vSphere
Operating system you're deploying on: Ubuntu 18.04

Workaround
For the backup location itself, the service IP of the minio svc could be used instead (kubectl get svc -n minio). Unfortunately this is only stable as long as the service doesn't get redeployed.
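
A sketch of the workaround (assuming the service is named minio in the minio namespace, as the DNS name suggests; the IP below is just an example value):

# get the cluster IP of the minio service
kubectl get svc -n minio minio -o jsonpath='{.spec.clusterIP}{"\n"}'
# e.g. 10.101.42.7 -- use it instead of the DNS name in the backup manifest:
#   RESTIC_REPOSITORY: "s3:http://10.101.42.7:9000/kubermatic-etcd-backups"

As noted, the IP changes if the service is ever deleted and recreated, so this is not a long-term fix.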

@toschneck toschneck added the kind/bug Categorizes issue or PR as related to a bug. label Nov 30, 2020
@xmudrii
Member

xmudrii commented Nov 30, 2020

@toschneck Can you try setting dnsPolicy on the pod to ClusterFirstWithHostNet? The Kubernetes docs state that dnsPolicy should be set to ClusterFirstWithHostNet if hostNetwork is set to true.

If that solves the problem, we can create a PR to add this to the manifest.
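
For reference, the change in the CronJob's pod template spec would look roughly like this (a sketch of the suggested fix, one added line next to hostNetwork):

          hostNetwork: true
          # added: point the pod at the cluster DNS even though it uses the host network
          dnsPolicy: ClusterFirstWithHostNet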

@toschneck
Member Author

will try it and let you know

@toschneck
Member Author

toschneck commented Dec 1, 2020

@xmudrii it seems to work:

k logs --all-containers test-backup-job-gdbp8 -f
{"level":"info","ts":1606845603.9692698,"caller":"snapshot/v3_snapshot.go:110","msg":"created temporary db file","path":"/backup/etcd-snapshot.db.part"}
{"level":"warn","ts":"2020-12-01T18:00:03.988Z","caller":"clientv3/retry_interceptor.go:116","msg":"retry stream intercept"}
{"level":"info","ts":1606845603.9892063,"caller":"snapshot/v3_snapshot.go:121","msg":"fetching snapshot","endpoint":"127.0.0.1:2379"}
{"level":"info","ts":1606845605.7280579,"caller":"snapshot/v3_snapshot.go:134","msg":"fetched snapshot","endpoint":"127.0.0.1:2379","took":1.758665093}
{"level":"info","ts":1606845605.7292426,"caller":"snapshot/v3_snapshot.go:143","msg":"saved","path":"/backup/etcd-snapshot.db"}
Snapshot saved at /backup/etcd-snapshot.db
Fatal: unable to open config file: Stat: The specified key does not exist.
Is there a repository at the following location?
s3:http://minio.minio.svc.cluster.local:9000/kubermatic-etcd-backups
created new cache in /root/.cache/restic

Files:           9 new,     0 changed,     0 unmodified
Dirs:            0 new,     0 changed,     0 unmodified
Added to the repo: 34.631 MiB

processed 9 files, 34.631 MiB in 0:03
snapshot af93cda6 saved
Applying Policy: keep the last 48 snapshots snapshots
keep 1 snapshots:
ID        Time                 Host                         Tags        Reasons        Paths
----------------------------------------------------------------------------------------------
af93cda6  2020-12-01 18:00:09  tobi-kubeone-vsphere-1-cp-1  etcd        last snapshot  /backup
----------------------------------------------------------------------------------------------
1 snapshots
