Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EOF error in cdi-apiserver #2695

Closed
ikyrycho opened this issue Apr 17, 2023 · 2 comments
Closed

EOF error in cdi-apiserver #2695

ikyrycho opened this issue Apr 17, 2023 · 2 comments
Labels

Comments

@ikyrycho
Copy link

ikyrycho commented Apr 17, 2023

What happened:
After installation on both K8s/K3s cdi-apiserver throws EOF error:

$ kubectl logs cdi-apiserver-bcfd79644-hhscs
I0417 09:32:58.785492       1 apiserver.go:83] Note: increase the -v level in the api deployment for more detailed logging, eg. -v=2 or -v=3
W0417 09:32:58.785599       1 client_config.go:617] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
I0417 09:32:58.990430       1 certwatcher.go:125] Updated current TLS certificate
I0417 09:32:58.996613       1 certwatcher.go:81] Starting certificate watcher
2023/04/17 09:33:45 http: TLS handshake error from x.y.z.v:57936: EOF
2023/04/17 09:34:45 http: TLS handshake error from x.y.z.v:57676: EOF
2023/04/17 09:35:45 http: TLS handshake error from x.y.z.v:53806: EOF
2023/04/17 09:40:45 http: TLS handshake error from x.y.z.v:40276: EOF
2023/04/17 09:41:45 http: TLS handshake error from x.y.z.v:34194: EOF
2023/04/17 11:32:48 http: TLS handshake error from x.y.z.v:49062: EOF

After 24-hours error changes to

kubectl logs cdi-apiserver-bcfd79644-12345
...
, Header: map[Content-Type:[text/plain; charset=utf-8] X-Content-Type-Options:[nosniff]]
I0417 07:51:12.517981       1 controller.go:129] OpenAPI AggregationController: action for item v1beta1.upload.cdi.kubevirt.io: Rate Limited Requeue.
E0417 07:51:12.531770       1 controller.go:113] loading OpenAPI spec for "v1alpha1.upload.cdi.kubevirt.io" failed with: Error, could not get list of group versions for APIService
I0417 07:51:12.531798       1 controller.go:126] OpenAPI AggregationController: action for item v1alpha1.upload.cdi.kubevirt.io: Rate Limited Requeue.
E0417 07:51:12.536534       1 controller.go:116] loading OpenAPI spec for "v1alpha1.upload.cdi.kubevirt.io" failed with: failed to retrieve openAPI spec, http error: ResponseCode: 503, Body: error trying to reach service: x509: certificate has expired or is not yet valid: current time 2023-04-17T07:51:12Z is after 2023-04-15T13:31:20Z
, Header: map[Content-Type:[text/plain; charset=utf-8] X-Content-Type-Options:[nosniff]]
I0417 07:51:12.536565       1 controller.go:129] OpenAPI AggregationController: action for item v1alpha1.upload.cdi.kubevirt.io: Rate Limited Requeue.
E0417 07:53:12.483848       1 controller.go:113] loading OpenAPI spec for "v1beta1.upload.cdi.kubevirt.io" failed with: Error, could not get list of group versions for APIService
I0417 07:53:12.483878       1 controller.go:126] OpenAPI AggregationController: action for item v1beta1.upload.cdi.kubevirt.io: Rate Limited Requeue.
E0417 07:53:12.484976       1 controller.go:116] loading OpenAPI spec for "v1beta1.upload.cdi.kubevirt.io" failed with: failed to retrieve openAPI spec, http error: ResponseCode: 503, Body: error trying to reach service: x509: certificate has expired or is not yet valid: current time 2023-04-17T07:53:12Z is after 2023-04-15T13:31:20Z

Second error prevents us from using CDI, because CDI resources are not visible on API (which returns same error) and slows down API calls, because kubectl retries failing calls to API.

$ kubectl get all -v6
...
I0417 15:15:57.721093 3380376 shortcut.go:89] Error loading discovery information: unable to retrieve the complete list of server APIs: upload.cdi.kubevirt.io/v1alpha1: the server is currently unable to handle the request, upload.cdi.kubevirt.io/v1beta1: the server is currently unable to handle the request
I0417 15:15:57.748883 3380376 round_trippers.go:454] GET https://x.y.z.v.sslip.io/k8s/clusters/c-m-7n4cmkld/apis/upload.cdi.kubevirt.io/v1beta1?timeout=32s 503 Service Unavailable in 24 milliseconds
I0417 15:15:57.749007 3380376 request.go:1347] body was not decodable (unable to check for Status): yaml: mapping values are not allowed in this context
I0417 15:15:57.749021 3380376 cached_discovery.go:78] skipped caching discovery info due to the server is currently unable to handle the request
I0417 15:15:57.756061 3380376 round_trippers.go:454] GET https://x.y.z.v.sslip.io/k8s/clusters/c-m-7n4cmkld/apis/upload.cdi.kubevirt.io/v1alpha1?timeout=32s 503 Service Unavailable in 31 milliseconds
I0417 15:15:57.756151 3380376 request.go:1347] body was not decodable (unable to check for Status): yaml: mapping values are not allowed in this context
I0417 15:15:57.756165 3380376 cached_discovery.go:78] skipped caching discovery info due to the server is currently unable to handle the request
I0417 15:15:57.756201 3380376 shortcut.go:89] Error loading discovery information: unable to retrieve the complete list of server APIs: upload.cdi.kubevirt.io/v1alpha1: the server is currently unable to handle the request, upload.cdi.kubevirt.io/v1beta1: the server is currently unable to handle the request
I0417 15:15:57.832953 3380376 request.go:600] Waited for 75.840551ms due to client-side throttling, not priority and fairness, request: GET:https://x.y.z.v.sslip.io/k8s/clusters/c-m-7n4cmkld/apis/upload.cdi.kubevirt.io/v1beta1?timeout=32s
I0417 15:15:57.861781 3380376 round_trippers.go:454] GET https://x.y.z.v.sslip.io/k8s/clusters/c-m-7n4cmkld/apis/upload.cdi.kubevirt.io/v1beta1?timeout=32s 503 Service Unavailable in 28 milliseconds

CDI works correctly for first 24 hours after installation.

What you expected to happen:
CDI client should be able to connect to k8s-api-server (it's my assumption that it is tries to connect to it) without EOF error and should be able to renew certificate after 24-hours

How to reproduce it (as minimally and precisely as possible):

  1. Install K8s or K3s
  2. Install CDI
    3a) Check logs from cdi-apiserver after ~15 minutes
    3b) Wait for 24 hours and try to run kubectl get all -v 6 command

Additional context:
There are several similar reported EOF issues in K8s/Golang (kubernetes/kubernetes#109022 and golang/go#50984) that could be the root cause for this one.

Environment:

  • CDI version (use kubectl get deployments cdi-deployment -o yaml):
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "2"
    operator.cdi.kubevirt.io/lastAppliedConfiguration: '{"kind":"Deployment","apiVersion":"apps/v1","metadata":{"name":"cdi-deployment","namespace":"cdi","creationTimestamp":null,"labels":{"app":"containerized-data-importer","app.kubernetes.io/component":"storage","app.kubernetes.io/managed-by":"cdi-operator","cdi.kubevirt.io":"","prometheus.cdi.kubevirt.io":"true"}},"spec":{"replicas":1,"selector":{"matchLabels":{"app":"containerized-data-importer"}},"template":{"metadata":{"creationTimestamp":null,"labels":{"app":"containerized-data-importer","app.kubernetes.io/component":"storage","app.kubernetes.io/managed-by":"cdi-operator","cdi.kubevirt.io":"","prometheus.cdi.kubevirt.io":"true"}},"spec":{"volumes":[{"name":"cdi-api-signing-key","secret":{"secretName":"cdi-api-signing-key","items":[{"key":"id_rsa.pub","path":"id_rsa.pub"},{"key":"id_rsa","path":"id_rsa"}],"defaultMode":420}},{"name":"uploadserver-ca-cert","secret":{"secretName":"cdi-uploadserver-signer","items":[{"key":"tls.crt","path":"tls.crt"},{"key":"tls.key","path":"tls.key"}],"defaultMode":420}},{"name":"uploadserver-client-ca-cert","secret":{"secretName":"cdi-uploadserver-client-signer","items":[{"key":"tls.crt","path":"tls.crt"},{"key":"tls.key","path":"tls.key"}],"defaultMode":420}},{"name":"uploadserver-ca-bundle","configMap":{"name":"cdi-uploadserver-signer-bundle","items":[{"key":"ca-bundle.crt","path":"ca-bundle.crt"}],"defaultMode":420}},{"name":"uploadserver-client-ca-bundle","configMap":{"name":"cdi-uploadserver-client-signer-bundle","items":[{"key":"ca-bundle.crt","path":"ca-bundle.crt"}],"defaultMode":420}}],"containers":[{"name":"cdi-controller","image":"quay.io/kubevirt/cdi-controller:v1.55.2","args":["-v=1"],"ports":[{"name":"metrics","containerPort":8080,"protocol":"TCP"}],"env":[{"name":"IMPORTER_IMAGE","value":"quay.io/kubevirt/cdi-importer:v1.55.2"},{"name":"CLONER_IMAGE","value":"quay.io/kubevirt/cdi-cloner:v1.55.2"},{"name":"UPLOADSERVER_IMAGE","value":"quay.io/kubevirt/cdi-uploadserver:v1.55.2"},{"name":"UPLOADPROXY_SERVICE","value":"cdi-uploadproxy"},{"name":"PULL_POLICY","value":"IfNotPresent"},{"name":"INSTALLER_PART_OF_LABEL","valueFrom":{"fieldRef":{"apiVersion":"v1","fieldPath":"metadata.labels[''app.kubernetes.io/part-of'']"}}},{"name":"INSTALLER_VERSION_LABEL","valueFrom":{"fieldRef":{"apiVersion":"v1","fieldPath":"metadata.labels[''app.kubernetes.io/version'']"}}}],"resources":{"requests":{"cpu":"10m","memory":"150Mi"}},"volumeMounts":[{"name":"cdi-api-signing-key","mountPath":"/var/run/cdi/token/keys"},{"name":"uploadserver-ca-cert","mountPath":"/var/run/certs/cdi-uploadserver-signer"},{"name":"uploadserver-client-ca-cert","mountPath":"/var/run/certs/cdi-uploadserver-client-signer"},{"name":"uploadserver-ca-bundle","mountPath":"/var/run/ca-bundle/cdi-uploadserver-signer-bundle"},{"name":"uploadserver-client-ca-bundle","mountPath":"/var/run/ca-bundle/cdi-uploadserver-client-signer-bundle"}],"readinessProbe":{"exec":{"command":["cat","/tmp/ready"]},"initialDelaySeconds":2,"timeoutSeconds":1,"periodSeconds":5,"successThreshold":1,"failureThreshold":3},"terminationMessagePath":"/dev/termination-log","terminationMessagePolicy":"File","imagePullPolicy":"IfNotPresent","securityContext":{"capabilities":{"drop":["ALL"]},"runAsNonRoot":true,"allowPrivilegeEscalation":false,"seccompProfile":{"type":"RuntimeDefault"}}}],"nodeSelector":{"kubernetes.io/os":"linux"},"serviceAccountName":"cdi-sa","securityContext":{"runAsNonRoot":true},"tolerations":[{"key":"CriticalAddonsOnly","operator":"Exists"}],"priorityClassName":"kubevirt-cluster-critical"}},"strategy":{}},"status":{}}'
  creationTimestamp: "2023-04-14T11:46:12Z"
  generation: 2
  labels:
    app: containerized-data-importer
    app.kubernetes.io/component: storage
    app.kubernetes.io/managed-by: cdi-operator
    cdi.kubevirt.io: ""
    operator.cdi.kubevirt.io/createVersion: v1.55.2
    operator.cdi.kubevirt.io/updateVersion: v1.55.2
    prometheus.cdi.kubevirt.io: "true"
  name: cdi-deployment
  namespace: cdi
  ownerReferences:
  - apiVersion: cdi.kubevirt.io/v1beta1
    blockOwnerDeletion: true
    controller: true
    kind: CDI
    name: cdi
    uid: fe570c47-9d70-4def-9e75-91941e2d989b
  resourceVersion: "5123"
  uid: 0826e6e3-6bfc-447d-bb6c-657c1eef97db
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: containerized-data-importer
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: containerized-data-importer
        app.kubernetes.io/component: storage
        app.kubernetes.io/managed-by: cdi-operator
        cdi.kubevirt.io: ""
        operator.cdi.kubevirt.io/createVersion: v1.55.2
        prometheus.cdi.kubevirt.io: "true"
    spec:
      containers:
      - args:
        - -v=1
        env:
        - name: IMPORTER_IMAGE
          value: quay.io/kubevirt/cdi-importer:v1.55.2
        - name: CLONER_IMAGE
          value: quay.io/kubevirt/cdi-cloner:v1.55.2
        - name: UPLOADSERVER_IMAGE
          value: quay.io/kubevirt/cdi-uploadserver:v1.55.2
        - name: UPLOADPROXY_SERVICE
          value: cdi-uploadproxy
        - name: PULL_POLICY
          value: IfNotPresent
        - name: INSTALLER_PART_OF_LABEL
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.labels['app.kubernetes.io/part-of']
        - name: INSTALLER_VERSION_LABEL
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.labels['app.kubernetes.io/version']
        image: quay.io/kubevirt/cdi-controller:v1.55.2
        imagePullPolicy: IfNotPresent
        name: cdi-controller
        ports:
        - containerPort: 8080
          name: metrics
          protocol: TCP
        readinessProbe:
          exec:
            command:
            - cat
            - /tmp/ready
          failureThreshold: 3
          initialDelaySeconds: 2
          periodSeconds: 5
          successThreshold: 1
          timeoutSeconds: 1
        resources:
          requests:
            cpu: 10m
            memory: 150Mi
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop:
            - ALL
          runAsNonRoot: true
          seccompProfile:
            type: RuntimeDefault
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /var/run/cdi/token/keys
          name: cdi-api-signing-key
        - mountPath: /var/run/certs/cdi-uploadserver-signer
          name: uploadserver-ca-cert
        - mountPath: /var/run/certs/cdi-uploadserver-client-signer
          name: uploadserver-client-ca-cert
        - mountPath: /var/run/ca-bundle/cdi-uploadserver-signer-bundle
          name: uploadserver-ca-bundle
        - mountPath: /var/run/ca-bundle/cdi-uploadserver-client-signer-bundle
          name: uploadserver-client-ca-bundle
      dnsPolicy: ClusterFirst
      nodeSelector:
        kubernetes.io/os: linux
      priorityClassName: kubevirt-cluster-critical
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext:
        runAsNonRoot: true
      serviceAccount: cdi-sa
      serviceAccountName: cdi-sa
      terminationGracePeriodSeconds: 30
      tolerations:
      - key: CriticalAddonsOnly
        operator: Exists
      volumes:
      - name: cdi-api-signing-key
        secret:
          defaultMode: 420
          items:
          - key: id_rsa.pub
            path: id_rsa.pub
          - key: id_rsa
            path: id_rsa
          secretName: cdi-api-signing-key
      - name: uploadserver-ca-cert
        secret:
          defaultMode: 420
          items:
          - key: tls.crt
            path: tls.crt
          - key: tls.key
            path: tls.key
          secretName: cdi-uploadserver-signer
      - name: uploadserver-client-ca-cert
        secret:
          defaultMode: 420
          items:
          - key: tls.crt
            path: tls.crt
          - key: tls.key
            path: tls.key
          secretName: cdi-uploadserver-client-signer
      - configMap:
          defaultMode: 420
          items:
          - key: ca-bundle.crt
            path: ca-bundle.crt
          name: cdi-uploadserver-signer-bundle
        name: uploadserver-ca-bundle
      - configMap:
          defaultMode: 420
          items:
          - key: ca-bundle.crt
            path: ca-bundle.crt
          name: cdi-uploadserver-client-signer-bundle
        name: uploadserver-client-ca-bundle
status:
  availableReplicas: 1
  conditions:
  - lastTransitionTime: "2023-04-14T11:46:46Z"
    lastUpdateTime: "2023-04-14T11:46:46Z"
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  - lastTransitionTime: "2023-04-14T11:46:12Z"
    lastUpdateTime: "2023-04-14T11:46:46Z"
    message: ReplicaSet "cdi-deployment-7cd7bcc6f4" has successfully progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  observedGeneration: 2
  readyReplicas: 1
  replicas: 1
  updatedReplicas: 1
  • Kubernetes version (use kubectl version):
    K3s:
Client Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.0", GitCommit:"1b4df30b3cdfeaba6024e81e559a6cd09a089d65", GitTreeState:"clean", BuildDate:"2023-04-11T17:10:18Z", GoVersion:"go1.20.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"26", GitVersion:"v1.26.2+rke2r1", GitCommit:"fc04e732bb3e7198d2fa44efa5457c7c6f8c0f5b", GitTreeState:"clean", BuildDate:"2023-03-01T00:08:27Z", GoVersion:"go1.19.6 X:boringcrypto", Compiler:"gc", Platform:"linux/amd64"}

and K8s:

Client Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.0", GitCommit:"1b4df30b3cdfeaba6024e81e559a6cd09a089d65", GitTreeState:"clean", BuildDate:"2023-04-11T17:10:18Z", GoVersion:"go1.20.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"26", GitVersion:"v1.26.0", GitCommit:"b46a3f887ca979b1a5d14fd39cb1af43e7e5d12d", GitTreeState:"clean", BuildDate:"2022-12-08T19:51:45Z", GoVersion:"go1.19.4", Compiler:"gc", Platform:"linux/amd64"}
@mhenriks
Copy link
Member

hi @ikyrycho thanks for the detailed bug report! Check the cdi-operator log. That is the component that is responsible for cert rotation. The default cdi-apiserver server cert is good for 24 hours [1]. But should be rotated out after 12 [2]. Since you are having problems after 24 hours, it would appear that cert is not being rotated.

Ideally, we can verify/resolve the issue with cert rotation. But if you need a quick fix, the cert durations may be set on the CDI CR. Can do something like this:

apiVersion: cdi.kubevirt.io/v1beta1
kind: CDI
metadata:
  name: cdi
spec:
...
  certConfig:
    ca:
      duration: 100000h
      renewBefore: 10000h
    server:
      duration: 80000h
      renewBefore: 10000h
...

[1] https://github.com/kubevirt/containerized-data-importer/blob/main/pkg/operator/resources/cert/factory.go#L132
[2] https://github.com/kubevirt/containerized-data-importer/blob/main/pkg/operator/resources/cert/factory.go#L133

@ikyrycho
Copy link
Author

@mhenriks thanks for tips (especially for advice of changing rotation interval in CDI).
In short, this issue in my case was caused by incorrect node-wide proxy configuration, that was preventing pod from communicating with API due to fact that proxy wasn't aware of resources on cluster. After modifying NO_PROXY certificate is rotated correctly, however there are still some EOF errors in logs

I0419 12:52:37.495240       1 certwatcher.go:125] Updated current TLS certificate
I0419 12:52:37.495430       1 certwatcher.go:136] certificate eventevent"/var/run/certs/cdi-apiserver-server-cert/tls.crt": REMOVE
I0419 12:52:37.495866       1 certwatcher.go:125] Updated current TLS certificate
2023/04/19 12:52:57 http: TLS handshake error from x.y.z.v:39074: EOF
2023/04/19 12:52:57 http: TLS handshake error from x.y.z.v:39078: EOF
I0419 12:53:12.383833       1 certwatcher.go:136] certificate eventevent"/var/run/certs/cdi-apiserver-server-cert/tls.crt": REMOVE
I0419 12:53:12.384335       1 certwatcher.go:125] Updated current TLS certificate
I0419 12:53:12.384613       1 certwatcher.go:136] certificate eventevent"/var/run/certs/cdi-apiserver-server-cert/tls.key": REMOVE

Since CDI is working, I will ignore them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants