Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(controller-runtime): enable recover true option #493

Merged

Conversation

haarchri
Copy link
Contributor

@haarchri haarchri commented Aug 1, 2023

Description of your changes

Presently, in the reconciler, an unhandled panic remains unrecovered, leading to potential crashes in the provider. This poses a significant issue since a panic could be triggered by a single managed resource in an unforeseen state. Consequently, one problematic managed resource has the potential to obstruct the processing of all other managed resources from provider.

Fixes #

I have:

  • Read and followed Crossplane's contribution process.
  • Run make reviewable test to ensure this PR is ready for review.

How has this code been tested

tested with crossplane-contrib/provider-upjet-aws#780

prior to this PR, the provider experiences a crash, resulting in the following stack trace:

2023-08-01T10:54:41+02:00       INFO    Observed a panic in reconciler: assignment to entry in nil map  {"controller": "managed/efs.aws.upbound.io/v1beta1, kind=filesystem", "namespace": "", "name": "example", "reconcileID": "cd513dc8-b3cb-4e5f-8475-b86a8bcef6b4"}
panic: assignment to entry in nil map [recovered]
        panic: assignment to entry in nil map

goroutine 28848 [running]:
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile.func1()
        sigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:115 +0x1a8
panic({0x106d08420, 0x1075f79e0})
        runtime/panic.go:884 +0x204
github.com/upbound/provider-aws/internal/clients.DefaultTerraformSetupBuilder({0x10762ad90, 0x14001e3b920}, {0x10763d320?, 0x140001daea0?}, {0x1076ad320?, 0x14006dc2300?}, 0x140070f1520, 0x140092f1dc8)
        github.com/upbound/provider-aws/internal/clients/aws.go:187 +0x504
github.com/upbound/provider-aws/internal/clients.SelectTerraformSetup.func1({0x10762ad90, 0x14001e3b920}, {0x10763d320, 0x140001daea0}, {0x1076ad320?, 0x14006dc2300?})
        github.com/upbound/provider-aws/internal/clients/aws.go:81 +0x230
github.com/upbound/upjet/pkg/controller.(*Connector).Connect(0x14006a30190, {0x10762ad90, 0x14001e3b920}, {0x1076ad320, 0x14006dc2300})
        github.com/upbound/upjet@v0.9.0-rc.0.0.20230728172702-46549fea2147/pkg/controller/external.go:94 +0x7c
github.com/crossplane/crossplane-runtime/pkg/reconciler/managed.(*NopDisconnecter).Connect(0x14006a1e640?, {0x10762ad90?, 0x14001e3b920?}, {0x1076ad320?, 0x14006dc2300?})
        github.com/crossplane/crossplane-runtime@v0.20.0/pkg/reconciler/managed/reconciler.go:244 +0x38
github.com/crossplane/crossplane-runtime/pkg/reconciler/managed.(*Reconciler).Reconcile(0x140069ebef0, {0x10762adc8, 0x1400aabc2d0}, {{{0x0?, 0x0?}, {0x1400de24730?, 0x7?}}})
        github.com/crossplane/crossplane-runtime@v0.20.0/pkg/reconciler/managed/reconciler.go:839 +0x1cac
github.com/crossplane/crossplane-runtime/pkg/ratelimiter.(*Reconciler).Reconcile(0x14006a30230, {0x10762adc8, 0x1400aabc2d0}, {{{0x0?, 0x30?}, {0x1400de24730?, 0x10b8f2b01?}}})
        github.com/crossplane/crossplane-runtime@v0.20.0/pkg/ratelimiter/reconciler.go:54 +0x124
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0x10762adc8?, {0x10762adc8?, 0x1400aabc2d0?}, {{{0x0?, 0x106bd5580?}, {0x1400de24730?, 0xf0304140b6f1631d?}}})
        sigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:118 +0x8c
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0x140069dd860, {0x10762ad20, 0x14000890880}, {0x1070ef480?, 0x1400d866000?})
        sigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:314 +0x2f8
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0x140069dd860, {0x10762ad20, 0x14000890880})
        sigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:265 +0x1b0
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2()
        sigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:226 +0x74
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2
        sigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:222 +0x28c
make: *** [run] Error 2

with this pull request, it becomes evident that the controller is now capable of recovering from panic, and as a result, the provider remains operational and functional for all other managed resources:

2023-08-01T10:52:41+02:00       DEBUG   provider-aws    Reconciling     {"controller": "managed/efs.aws.upbound.io/v1beta1, kind=filesystem", "request": {"name":"example"}}
E0801 10:52:41.773136   66620 runtime.go:79] Observed a panic: "assignment to entry in nil map" (assignment to entry in nil map)
goroutine 29483 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic({0x104974420?, 0x1052639e0})
        k8s.io/apimachinery@v0.27.4/pkg/util/runtime/runtime.go:75 +0x7c
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile.func1()
        sigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:107 +0xa4
panic({0x104974420, 0x1052639e0})
        runtime/panic.go:884 +0x204
github.com/upbound/provider-aws/internal/clients.DefaultTerraformSetupBuilder({0x105296d90, 0x1400080e0c0}, {0x1052a9320?, 0x140000f8630?}, {0x105319320?, 0x1400091a000?}, 0x14000659d40, 0x140017abdc8)
        github.com/upbound/provider-aws/internal/clients/aws.go:187 +0x504
github.com/upbound/provider-aws/internal/clients.SelectTerraformSetup.func1({0x105296d90, 0x1400080e0c0}, {0x1052a9320, 0x140000f8630}, {0x105319320?, 0x1400091a000?})
        github.com/upbound/provider-aws/internal/clients/aws.go:81 +0x230
github.com/upbound/upjet/pkg/controller.(*Connector).Connect(0x14004b1ceb0, {0x105296d90, 0x1400080e0c0}, {0x105319320, 0x1400091a000})
        github.com/upbound/upjet@v0.9.0-rc.0.0.20230728172702-46549fea2147/pkg/controller/external.go:94 +0x7c
github.com/crossplane/crossplane-runtime/pkg/reconciler/managed.(*NopDisconnecter).Connect(0x14004ad3ae0?, {0x105296d90?, 0x1400080e0c0?}, {0x105319320?, 0x1400091a000?})
        github.com/crossplane/crossplane-runtime@v0.20.0/pkg/reconciler/managed/reconciler.go:244 +0x38
github.com/crossplane/crossplane-runtime/pkg/reconciler/managed.(*Reconciler).Reconcile(0x14004ad1e00, {0x105296dc8, 0x1400f4bf770}, {{{0x0?, 0x0?}, {0x1400bee4310?, 0x7?}}})
        github.com/crossplane/crossplane-runtime@v0.20.0/pkg/reconciler/managed/reconciler.go:839 +0x1cac
github.com/crossplane/crossplane-runtime/pkg/ratelimiter.(*Reconciler).Reconcile(0x14004b1cf50, {0x105296dc8, 0x1400f4bf770}, {{{0x0?, 0x30?}, {0x1400bee4310?, 0x10955ca01?}}})
        github.com/crossplane/crossplane-runtime@v0.20.0/pkg/ratelimiter/reconciler.go:54 +0x124
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0x105296dc8?, {0x105296dc8?, 0x1400f4bf770?}, {{{0x0?, 0x104841580?}, {0x1400bee4310?, 0xf0304140b6f1631d?}}})
        sigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:118 +0x8c
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0x14004b06d20, {0x105296d20, 0x14000036a00}, {0x104d5b480?, 0x14007723680?})
        sigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:314 +0x2f8
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0x14004b06d20, {0x105296d20, 0x14000036a00})
        sigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:265 +0x1b0
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2()
        sigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:226 +0x74
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2
        sigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:222 +0x28c

2023-08-01T10:52:41+02:00       ERROR   Reconciler error        {"controller": "managed/efs.aws.upbound.io/v1beta1, kind=filesystem", "namespace": "", "name": "example", "reconcileID": "84119f62-06be-4d24-a01c-bf7d63ed1338", "error": "panic: assignment to entry in nil map [recovered]"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
        sigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:324
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
        sigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:265
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
        sigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:226

tested with the following configs and manifests:

add to go.mod and run go mod tidy
replace github.com/crossplane/crossplane-runtime => github.com/haarchri/crossplane-runtime v0.21.0-rc.0.0.20230801-e147d407d7eddcd57891fc85d68315c1917d072f

apply the following resources:

kubectl -n upbound-system create secret generic aws-creds --from-literal=credentials="${UPTEST_CLOUD_CREDENTIALS}" \
    --dry-run=client -o yaml | kubectl apply -f -
apiVersion: aws.upbound.io/v1beta1
kind: ProviderConfig
metadata:
  name: 780-issue-nil
spec:
  credentials:
    source: Secret
    secretRef:
      name: aws-creds
      namespace: upbound-system
      key: credentials
  endpoint:
    hostnameImmutable:  true
    services:
      - apigateway
      - apigatewayv2
      - cloudformation
      - cloudwatch
      - dynamodb
      - ec2
      - es
      - elasticache
      - firehose
      - iam
      - kinesis
      - lambda
      - rds
      - redshift
      - route53
      - s3
      - secretsmanager
      - ses
      - sns
      - sqs
      - ssm
      - stepfunctions
      - sts
    url:
      static:                   http://127.0.0.1:4566
      type:                     Static
  skip_credentials_validation:  true
  skip_metadata_api_check:      true
  skip_requesting_account_id:   true
apiVersion: efs.aws.upbound.io/v1beta1
kind: FileSystem
metadata:
  name: example
  labels:
    testing.upbound.io/example-name: efs
spec:
  forProvider:
    region: us-west-1
    creationToken: my-product
    tags:
      Name: MyProduct
  providerConfigRef:
    name: 780-issue-nil

Signed-off-by: Christopher Haar <christopher.haar@upbound.io>
@haarchri haarchri requested review from a team as code owners August 1, 2023 09:01
@sttts
Copy link
Contributor

sttts commented Aug 1, 2023

Sgtm. One question: is there a backoff of controller restarts?

Copy link
Contributor

@pedjak pedjak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: perhaps to rephrase PR title to reflect that we did not add any new option, we just activate recover from panic controller option.

@haarchri haarchri changed the title feat(controller-runtime): add recover true option feat(controller-runtime): enable recover true option Aug 1, 2023
@phisco phisco merged commit 89472a2 into crossplane:master Aug 4, 2023
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants