Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Failure in k8s-operator endpoint-template (PostStart hook) #7396

Closed
jcsp opened this issue Nov 21, 2022 · 21 comments
Closed

Failure in k8s-operator endpoint-template (PostStart hook) #7396

jcsp opened this issue Nov 21, 2022 · 21 comments
Labels
area/k8s ci-failure kind/bug Something isn't working

Comments

@jcsp
Copy link
Contributor

jcsp commented Nov 21, 2022

https://buildkite.com/redpanda/redpanda/builds/18840#01849373-8425-495a-95e6-7c462a832601

--- FAIL: kuttl (2273.54s)
    --- FAIL: kuttl/harness (0.00s)
        --- PASS: kuttl/harness/resources-redpanda (9.51s)
        --- PASS: kuttl/harness/additional-configuration (218.28s)
        --- PASS: kuttl/harness/redpanda-schema-registry-sasl (401.49s)
        --- PASS: kuttl/harness/produce-tls (12.22s)
        --- PASS: kuttl/harness/pandaproxy-produce-consume-tls-client (10.17s)
        --- PASS: kuttl/harness/pandaproxy-produce-consume-sasl (46.31s)
        --- PASS: kuttl/harness/pandaproxy-produce-consume (79.35s)
        --- PASS: kuttl/harness/node-select-tolerations (6.38s)
        --- PASS: kuttl/harness/external-connectivity (9.23s)
        --- PASS: kuttl/harness/explicit-ports-tls (10.15s)
        --- PASS: kuttl/harness/redpanda-schema-registry (649.27s)
        --- PASS: kuttl/harness/create-topic-with-client-auth (11.27s)
        --- FAIL: kuttl/harness/endpoint-template (300.70s)
        --- PASS: kuttl/harness/create-topic-given-issuer (47.35s)
        --- PASS: kuttl/harness/create-topic-given-issuer-with-client-auth (55.63s)
        --- PASS: kuttl/harness/create-topic-given-cert-secret (52.29s)
        --- PASS: kuttl/harness/create-topic-given-cm-secret-client-auth (56.34s)
        --- PASS: kuttl/harness/console-admin-api (164.97s)
        --- PASS: kuttl/harness/console-kafka-mtls (172.62s)
        --- PASS: kuttl/harness/admin-api-tls-client-auth (11.22s)
        --- PASS: kuttl/harness/admin-api-tls (12.21s)
        --- PASS: kuttl/harness/admin-api (10.09s)
        --- PASS: kuttl/harness/centralized-configuration-bootstrap (43.42s)
        --- PASS: kuttl/harness/confluent-schema-registry (60.37s)
        --- PASS: kuttl/harness/console (164.46s)
        --- PASS: kuttl/harness/centralized-configuration-upgrade (117.20s)
        --- PASS: kuttl/harness/centralized-configuration-tls (218.49s)
        --- PASS: kuttl/harness/update-image-and-node-port (294.65s)
        --- PASS: kuttl/harness/update-image-tls-client-auth (291.80s)
        --- PASS: kuttl/harness/update-image-tls (238.90s)
        --- PASS: kuttl/harness/update (85.01s)
        --- PASS: kuttl/harness/centralized-configuration-drift (96.50s)
        --- PASS: kuttl/harness/superusers-prefix (194.81s)
FAIL
@jcsp jcsp added kind/bug Something isn't working area/k8s ci-failure labels Nov 21, 2022
@RafalKorepta
Copy link
Contributor

This is miss leading. The amount of logs generated by kuttl from unstable k8s test hides the true problem in the main test suite.

Taken from raw output.

--- FAIL: kuttl (2273.54s)
    --- FAIL: kuttl/harness (0.00s)
        --- PASS: kuttl/harness/resources-redpanda (9.51s)
        --- PASS: kuttl/harness/additional-configuration (218.28s)
        --- PASS: kuttl/harness/redpanda-schema-registry-sasl (401.49s)
        --- PASS: kuttl/harness/produce-tls (12.22s)
        --- PASS: kuttl/harness/pandaproxy-produce-consume-tls-client (10.17s)
        --- PASS: kuttl/harness/pandaproxy-produce-consume-sasl (46.31s)
        --- PASS: kuttl/harness/pandaproxy-produce-consume (79.35s)
        --- PASS: kuttl/harness/node-select-tolerations (6.38s)
        --- PASS: kuttl/harness/external-connectivity (9.23s)
        --- PASS: kuttl/harness/explicit-ports-tls (10.15s)
        --- PASS: kuttl/harness/redpanda-schema-registry (649.27s)
        --- PASS: kuttl/harness/create-topic-with-client-auth (11.27s)
        --- FAIL: kuttl/harness/endpoint-template (300.70s)
        --- PASS: kuttl/harness/create-topic-given-issuer (47.35s)
        --- PASS: kuttl/harness/create-topic-given-issuer-with-client-auth (55.63s)
        --- PASS: kuttl/harness/create-topic-given-cert-secret (52.29s)
        --- PASS: kuttl/harness/create-topic-given-cm-secret-client-auth (56.34s)
        --- PASS: kuttl/harness/console-admin-api (164.97s)
        --- PASS: kuttl/harness/console-kafka-mtls (172.62s)
        --- PASS: kuttl/harness/admin-api-tls-client-auth (11.22s)
        --- PASS: kuttl/harness/admin-api-tls (12.21s)
        --- PASS: kuttl/harness/admin-api (10.09s)
        --- PASS: kuttl/harness/centralized-configuration-bootstrap (43.42s)
        --- PASS: kuttl/harness/confluent-schema-registry (60.37s)
        --- PASS: kuttl/harness/console (164.46s)
        --- PASS: kuttl/harness/centralized-configuration-upgrade (117.20s)
        --- PASS: kuttl/harness/centralized-configuration-tls (218.49s)
        --- PASS: kuttl/harness/update-image-and-node-port (294.65s)
        --- PASS: kuttl/harness/update-image-tls-client-auth (291.80s)
        --- PASS: kuttl/harness/update-image-tls (238.90s)
        --- PASS: kuttl/harness/update (85.01s)
        --- PASS: kuttl/harness/centralized-configuration-drift (96.50s)
        --- PASS: kuttl/harness/superusers-prefix (194.81s)
FAIL

Kuttl returned exit code 1. Creating artifacts tarball

@RafalKorepta
Copy link
Contributor

Events from kuttle output doesn't show the root cause of pod-0 not being scheduled

    logger.go:42: 06:03:12 | endpoint-template | endpoint-template events from ns kuttl-test-select-louse:
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:11 +0000 UTC	Normal	Certificate.cert-manager.io endpoint-template-admin-client		Issuing	Issuing certificate as Secret does not exist		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:11 +0000 UTC	Normal	CertificateRequest.cert-manager.io endpoint-template-kafka-root-certificate-7wd6c		cert-manager.io	Certificate request has been approved by cert-manager.io		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:11 +0000 UTC	Normal	CertificateRequest.cert-manager.io endpoint-template-kafka-root-certificate-7wd6c		CertificateIssued	Certificate fetched from issuer successfully		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:11 +0000 UTC	Normal	Certificate.cert-manager.io endpoint-template-kafka-root-certificate		Issuing	Issuing certificate as Secret does not exist		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:11 +0000 UTC	Normal	Certificate.cert-manager.io endpoint-template-kafka-root-certificate		Generated	Stored new private key in temporary Secret resource "endpoint-template-kafka-root-certificate-ldcxk"		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:11 +0000 UTC	Normal	Certificate.cert-manager.io endpoint-template-kafka-root-certificate		Requested	Created new CertificateRequest resource "endpoint-template-kafka-root-certificate-7wd6c"		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:11 +0000 UTC	Normal	Certificate.cert-manager.io endpoint-template-kafka-root-certificate		Issuing	The certificate has been successfully issued		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:11 +0000 UTC	Normal	Issuer.cert-manager.io endpoint-template-kafka-root-issuer		KeyPairVerified	Signing CA verified		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:11 +0000 UTC	Normal	Certificate.cert-manager.io endpoint-template-operator-client		Issuing	Issuing certificate as Secret does not exist		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:11 +0000 UTC	Normal	Certificate.cert-manager.io endpoint-template-proxy-api-node		Issuing	Issuing certificate as Secret does not exist		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:11 +0000 UTC	Normal	Certificate.cert-manager.io endpoint-template-proxy-root-certificate		Issuing	Issuing certificate as Secret does not exist		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:11 +0000 UTC	Warning	Issuer.cert-manager.io endpoint-template-proxy-root-issuer		ErrGetKeyPair	Error getting keypair for CA issuer: secret "endpoint-template-proxy-root-certificate" not found		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:11 +0000 UTC	Warning	Issuer.cert-manager.io endpoint-template-proxy-root-issuer		ErrInitIssuer	Error initializing issuer: secret "endpoint-template-proxy-root-certificate" not found		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:11 +0000 UTC	Normal	Certificate.cert-manager.io endpoint-template-redpanda		Issuing	Issuing certificate as Secret does not exist		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:11 +0000 UTC	Normal	Certificate.cert-manager.io endpoint-template-user-client		Issuing	Issuing certificate as Secret does not exist		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:12 +0000 UTC	Normal	PersistentVolumeClaim datadir-endpoint-template-0		WaitForFirstConsumer	waiting for first consumer to be created before binding		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:12 +0000 UTC	Normal	PersistentVolumeClaim datadir-endpoint-template-0		ExternalProvisioning	waiting for a volume to be created, either by external provisioner "rancher.io/local-path" or manually created by system administrator		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:12 +0000 UTC	Normal	PersistentVolumeClaim datadir-endpoint-template-0		Provisioning	External provisioner is provisioning volume for claim "kuttl-test-select-louse/datadir-endpoint-template-0"		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:12 +0000 UTC	Normal	PersistentVolumeClaim datadir-endpoint-template-1		WaitForFirstConsumer	waiting for first consumer to be created before binding		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:12 +0000 UTC	Normal	PersistentVolumeClaim datadir-endpoint-template-1		ExternalProvisioning	waiting for a volume to be created, either by external provisioner "rancher.io/local-path" or manually created by system administrator		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:12 +0000 UTC	Normal	PersistentVolumeClaim datadir-endpoint-template-1		Provisioning	External provisioner is provisioning volume for claim "kuttl-test-select-louse/datadir-endpoint-template-1"		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:12 +0000 UTC	Normal	PersistentVolumeClaim datadir-endpoint-template-2		WaitForFirstConsumer	waiting for first consumer to be created before binding		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:12 +0000 UTC	Normal	PersistentVolumeClaim datadir-endpoint-template-2		ExternalProvisioning	waiting for a volume to be created, either by external provisioner "rancher.io/local-path" or manually created by system administrator		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:12 +0000 UTC	Normal	PersistentVolumeClaim datadir-endpoint-template-2		Provisioning	External provisioner is provisioning volume for claim "kuttl-test-select-louse/datadir-endpoint-template-2"		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:12 +0000 UTC	Normal	CertificateRequest.cert-manager.io endpoint-template-admin-client-vrzps		cert-manager.io	Certificate request has been approved by cert-manager.io		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:12 +0000 UTC	Normal	CertificateRequest.cert-manager.io endpoint-template-admin-client-vrzps		CertificateIssued	Certificate fetched from issuer successfully		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:12 +0000 UTC	Normal	Certificate.cert-manager.io endpoint-template-admin-client		Generated	Stored new private key in temporary Secret resource "endpoint-template-admin-client-pzjjd"		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:12 +0000 UTC	Normal	Certificate.cert-manager.io endpoint-template-admin-client		Requested	Created new CertificateRequest resource "endpoint-template-admin-client-vrzps"		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:12 +0000 UTC	Normal	Certificate.cert-manager.io endpoint-template-admin-client		Issuing	The certificate has been successfully issued		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:12 +0000 UTC	Normal	CertificateRequest.cert-manager.io endpoint-template-operator-client-gtb6j		cert-manager.io	Certificate request has been approved by cert-manager.io		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:12 +0000 UTC	Normal	CertificateRequest.cert-manager.io endpoint-template-operator-client-gtb6j		CertificateIssued	Certificate fetched from issuer successfully		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:12 +0000 UTC	Normal	Certificate.cert-manager.io endpoint-template-operator-client		Generated	Stored new private key in temporary Secret resource "endpoint-template-operator-client-jt6hs"		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:12 +0000 UTC	Normal	Certificate.cert-manager.io endpoint-template-operator-client		Requested	Created new CertificateRequest resource "endpoint-template-operator-client-gtb6j"		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:12 +0000 UTC	Normal	Certificate.cert-manager.io endpoint-template-operator-client		Issuing	The certificate has been successfully issued		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:12 +0000 UTC	Normal	Certificate.cert-manager.io endpoint-template-proxy-api-client		Issuing	Issuing certificate as Secret does not exist		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:12 +0000 UTC	Normal	Certificate.cert-manager.io endpoint-template-proxy-api-client		Generated	Stored new private key in temporary Secret resource "endpoint-template-proxy-api-client-xlc78"		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:12 +0000 UTC	Normal	Certificate.cert-manager.io endpoint-template-proxy-api-node		Generated	Stored new private key in temporary Secret resource "endpoint-template-proxy-api-node-l5whb"		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:12 +0000 UTC	Normal	CertificateRequest.cert-manager.io endpoint-template-proxy-root-certificate-6r8wx		cert-manager.io	Certificate request has been approved by cert-manager.io		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:12 +0000 UTC	Normal	CertificateRequest.cert-manager.io endpoint-template-proxy-root-certificate-6r8wx		CertificateIssued	Certificate fetched from issuer successfully		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:12 +0000 UTC	Normal	Certificate.cert-manager.io endpoint-template-proxy-root-certificate		Generated	Stored new private key in temporary Secret resource "endpoint-template-proxy-root-certificate-dbbm2"		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:12 +0000 UTC	Normal	Certificate.cert-manager.io endpoint-template-proxy-root-certificate		Requested	Created new CertificateRequest resource "endpoint-template-proxy-root-certificate-6r8wx"		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:12 +0000 UTC	Normal	Certificate.cert-manager.io endpoint-template-proxy-root-certificate		Issuing	The certificate has been successfully issued		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:12 +0000 UTC	Normal	CertificateRequest.cert-manager.io endpoint-template-redpanda-fwzf7		cert-manager.io	Certificate request has been approved by cert-manager.io		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:12 +0000 UTC	Normal	CertificateRequest.cert-manager.io endpoint-template-redpanda-fwzf7		CertificateIssued	Certificate fetched from issuer successfully		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:12 +0000 UTC	Normal	Certificate.cert-manager.io endpoint-template-redpanda		Generated	Stored new private key in temporary Secret resource "endpoint-template-redpanda-5pntk"		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:12 +0000 UTC	Normal	Certificate.cert-manager.io endpoint-template-redpanda		Requested	Created new CertificateRequest resource "endpoint-template-redpanda-fwzf7"		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:12 +0000 UTC	Normal	Certificate.cert-manager.io endpoint-template-redpanda		Issuing	The certificate has been successfully issued		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:12 +0000 UTC	Normal	Certificate.cert-manager.io endpoint-template-user-client		Generated	Stored new private key in temporary Secret resource "endpoint-template-user-client-hxcdh"		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:12 +0000 UTC	Normal	PodDisruptionBudget.policy endpoint-template		NoPods	No matching pods found		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:12 +0000 UTC	Normal	StatefulSet.apps endpoint-template		SuccessfulCreate	create Claim datadir-endpoint-template-0 Pod endpoint-template-0 in StatefulSet endpoint-template success		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:12 +0000 UTC	Normal	StatefulSet.apps endpoint-template		SuccessfulCreate	create Pod endpoint-template-0 in StatefulSet endpoint-template successful		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:12 +0000 UTC	Normal	StatefulSet.apps endpoint-template		SuccessfulCreate	create Claim datadir-endpoint-template-1 Pod endpoint-template-1 in StatefulSet endpoint-template success		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:12 +0000 UTC	Normal	StatefulSet.apps endpoint-template		SuccessfulCreate	create Pod endpoint-template-1 in StatefulSet endpoint-template successful		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:12 +0000 UTC	Normal	StatefulSet.apps endpoint-template		SuccessfulCreate	create Claim datadir-endpoint-template-2 Pod endpoint-template-2 in StatefulSet endpoint-template success		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:12 +0000 UTC	Normal	StatefulSet.apps endpoint-template		SuccessfulCreate	create Pod endpoint-template-2 in StatefulSet endpoint-template successful		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:13 +0000 UTC	Normal	CertificateRequest.cert-manager.io endpoint-template-proxy-api-client-c6tvl		cert-manager.io	Certificate request has been approved by cert-manager.io		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:13 +0000 UTC	Normal	CertificateRequest.cert-manager.io endpoint-template-proxy-api-client-c6tvl		IssuerNotReady	Referenced issuer does not have a Ready status condition		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:13 +0000 UTC	Normal	Certificate.cert-manager.io endpoint-template-proxy-api-client		Requested	Created new CertificateRequest resource "endpoint-template-proxy-api-client-c6tvl"		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:13 +0000 UTC	Normal	CertificateRequest.cert-manager.io endpoint-template-proxy-api-node-nkxz5		cert-manager.io	Certificate request has been approved by cert-manager.io		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:13 +0000 UTC	Normal	CertificateRequest.cert-manager.io endpoint-template-proxy-api-node-nkxz5		IssuerNotReady	Referenced issuer does not have a Ready status condition		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:13 +0000 UTC	Normal	Certificate.cert-manager.io endpoint-template-proxy-api-node		Requested	Created new CertificateRequest resource "endpoint-template-proxy-api-node-nkxz5"		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:13 +0000 UTC	Normal	CertificateRequest.cert-manager.io endpoint-template-user-client-nkxj8		cert-manager.io	Certificate request has been approved by cert-manager.io		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:13 +0000 UTC	Normal	CertificateRequest.cert-manager.io endpoint-template-user-client-nkxj8		CertificateIssued	Certificate fetched from issuer successfully		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:13 +0000 UTC	Normal	Certificate.cert-manager.io endpoint-template-user-client		Requested	Created new CertificateRequest resource "endpoint-template-user-client-nkxj8"		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:14 +0000 UTC	Normal	Certificate.cert-manager.io endpoint-template-user-client		Issuing	The certificate has been successfully issued		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:15 +0000 UTC	Normal	PersistentVolumeClaim datadir-endpoint-template-0		ProvisioningSucceeded	Successfully provisioned volume pvc-dd6f0158-409c-419b-bfe1-fa1686cfcd87		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:15 +0000 UTC	Normal	PersistentVolumeClaim datadir-endpoint-template-1		ProvisioningSucceeded	Successfully provisioned volume pvc-480df756-64fd-4d71-9968-d7ec2088faa6		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:15 +0000 UTC	Normal	PersistentVolumeClaim datadir-endpoint-template-2		ProvisioningSucceeded	Successfully provisioned volume pvc-d9ab6a89-740e-4369-8b7d-6962fcc6022e		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:15 +0000 UTC	Normal	Pod endpoint-template-0		Scheduled	Successfully assigned kuttl-test-select-louse/endpoint-template-0 to kind-worker		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:15 +0000 UTC	Warning	Pod endpoint-template-0		FailedMount	MountVolume.SetUp failed for volume "tlspandaproxyca" : secret "endpoint-template-proxy-api-client" not found		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:15 +0000 UTC	Warning	Pod endpoint-template-0		FailedMount	MountVolume.SetUp failed for volume "tlspandaproxycert" : secret "endpoint-template-proxy-api-node" not found		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:15 +0000 UTC	Normal	Pod endpoint-template-1		Scheduled	Successfully assigned kuttl-test-select-louse/endpoint-template-1 to kind-worker6		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:15 +0000 UTC	Warning	Pod endpoint-template-1		FailedMount	MountVolume.SetUp failed for volume "tlspandaproxycert" : secret "endpoint-template-proxy-api-node" not found		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:15 +0000 UTC	Warning	Pod endpoint-template-1		FailedMount	MountVolume.SetUp failed for volume "tlspandaproxyca" : secret "endpoint-template-proxy-api-client" not found		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:16 +0000 UTC	Normal	Pod endpoint-template-2		Scheduled	Successfully assigned kuttl-test-select-louse/endpoint-template-2 to kind-worker8		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:16 +0000 UTC	Warning	Pod endpoint-template-2		FailedMount	MountVolume.SetUp failed for volume "tlspandaproxyca" : secret "endpoint-template-proxy-api-client" not found		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:16 +0000 UTC	Warning	Pod endpoint-template-2		FailedMount	MountVolume.SetUp failed for volume "tlspandaproxycert" : secret "endpoint-template-proxy-api-node" not found		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:16 +0000 UTC	Normal	CertificateRequest.cert-manager.io endpoint-template-proxy-api-client-c6tvl		CertificateIssued	Certificate fetched from issuer successfully		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:16 +0000 UTC	Normal	Certificate.cert-manager.io endpoint-template-proxy-api-client		Issuing	The certificate has been successfully issued		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:16 +0000 UTC	Normal	CertificateRequest.cert-manager.io endpoint-template-proxy-api-node-nkxz5		CertificateIssued	Certificate fetched from issuer successfully		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:16 +0000 UTC	Normal	Certificate.cert-manager.io endpoint-template-proxy-api-node		Issuing	The certificate has been successfully issued		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:16 +0000 UTC	Normal	Issuer.cert-manager.io endpoint-template-proxy-root-issuer		KeyPairVerified	Signing CA verified		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:18 +0000 UTC	Normal	Pod endpoint-template-2.spec.initContainers{redpanda-configurator}		Pulled	Container image "localhost/configurator:dev" already present on machine		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:18 +0000 UTC	Normal	Pod endpoint-template-2.spec.initContainers{redpanda-configurator}		Created	Created container redpanda-configurator		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:18 +0000 UTC	Normal	Pod endpoint-template-2.spec.initContainers{redpanda-configurator}		Started	Started container redpanda-configurator		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:19 +0000 UTC	Normal	Pod endpoint-template-0.spec.initContainers{redpanda-configurator}		Pulled	Container image "localhost/configurator:dev" already present on machine		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:19 +0000 UTC	Normal	Pod endpoint-template-0.spec.initContainers{redpanda-configurator}		Created	Created container redpanda-configurator		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:19 +0000 UTC	Normal	Pod endpoint-template-0.spec.initContainers{redpanda-configurator}		Started	Started container redpanda-configurator		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:19 +0000 UTC	Normal	Pod endpoint-template-1.spec.initContainers{redpanda-configurator}		Pulled	Container image "localhost/configurator:dev" already present on machine		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:19 +0000 UTC	Normal	Pod endpoint-template-1.spec.initContainers{redpanda-configurator}		Created	Created container redpanda-configurator		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:19 +0000 UTC	Normal	Pod endpoint-template-1.spec.initContainers{redpanda-configurator}		Started	Started container redpanda-configurator		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:19 +0000 UTC	Normal	Pod endpoint-template-2.spec.containers{redpanda}		Pulled	Container image "localhost/redpanda:dev" already present on machine		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:19 +0000 UTC	Normal	Pod endpoint-template-2.spec.containers{redpanda}		Created	Created container redpanda		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:19 +0000 UTC	Normal	Pod endpoint-template-2.spec.containers{redpanda}		Started	Started container redpanda		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:20 +0000 UTC	Normal	Pod endpoint-template-0.spec.containers{redpanda}		Pulled	Container image "localhost/redpanda:dev" already present on machine		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:20 +0000 UTC	Normal	Pod endpoint-template-0.spec.containers{redpanda}		Created	Created container redpanda		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:20 +0000 UTC	Normal	Pod endpoint-template-0.spec.containers{redpanda}		Started	Started container redpanda		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:20 +0000 UTC	Normal	Pod endpoint-template-1.spec.containers{redpanda}		Pulled	Container image "localhost/redpanda:dev" already present on machine		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:20 +0000 UTC	Normal	Pod endpoint-template-1.spec.containers{redpanda}		Created	Created container redpanda		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:20 +0000 UTC	Normal	Pod endpoint-template-1.spec.containers{redpanda}		Started	Started container redpanda

@RafalKorepta
Copy link
Contributor

The kubectl get pod output shows the following problem

...
            {
              "lastProbeTime": null,
              "lastTransitionTime": "2022-11-20T05:58:15Z",
              "message": "containers with unready status: [redpanda]",
              "reason": "ContainersNotReady",
              "status": "False",
              "type": "Ready"
            },
            {
              "lastProbeTime": null,
              "lastTransitionTime": "2022-11-20T05:58:15Z",
              "message": "containers with unready status: [redpanda]",
              "reason": "ContainersNotReady",
              "status": "False",
              "type": "ContainersReady"
            }
          ],
          "containerStatuses": [
            {
              "image": "localhost/redpanda:dev",
              "imageID": "",
              "lastState": {},
              "name": "redpanda",
              "ready": false,
              "restartCount": 0,
              "started": false,
              "state": {
                "waiting": {
                  "reason": "PodInitializing"
                }
              }
            }
          ],
...

RafalKorepta pushed a commit to RafalKorepta/redpanda that referenced this issue Nov 21, 2022
@RafalKorepta
Copy link
Contributor

Nov 20 06:05:15 kind-worker kubelet[258]: E1120 06:05:15.425494     258 pod_workers.go:190] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"redpanda\" with PostStartHookError: command '/scripts/postStart.sh' exited with 137: : \"Exec lifecycle hook ([/scripts/postStart.sh]) for Container \\\"redpanda\\\" in Pod \\\"endpoint-template-0_kuttl-test-select-louse(35d2ae0a-bd23-41f7-81a4-89ac99f99f4f)\\\" failed - error: command '/scripts/postStart.sh' exited with 137: , message: \\\"\\\"\"" pod="kuttl-test-select-louse/endpoint-template-0" podUID=35d2ae0a-bd23-41f7-81a4-89ac99f99f4f

For what ever reason the postStart script failed.

@RafalKorepta RafalKorepta changed the title Failure in k8s-operator centralized-configuration, decommission Failure in k8s-operator endpoint-template (PostStart hook) Nov 22, 2022
@RafalKorepta
Copy link
Contributor

RafalKorepta commented Nov 22, 2022

Second time https://buildkite.com/redpanda/redpanda/builds/18895#01849d6c-9736-4275-b872-dd1116132d33 the same problem

--- FAIL: kuttl/harness/endpoint-template (601.47s)

@RafalKorepta
Copy link
Contributor

RafalKorepta commented Nov 22, 2022

Third time https://buildkite.com/redpanda/redpanda/builds/18875#01849b55-2a33-426c-9267-ae7b4410d220 the same problem

From this test run one pod named ednpoint-template-1 couldn't finish PostStart hook. The following logs was gathered from the Redpanda container:

2022-11-21T19:11:14.036075263Z stderr F INFO  2022-11-21 19:11:14,035 [shard 0] cluster - cluster_discovery.cc:147 - Requesting node ID for node UUID {6d0bca35-3ff2-4a4e-b3ce-720843bf0261} from {host: endpoint-template-0.endpoint-template.kuttl-test-helped-gator.svc.cluster.local., port: 33145}
2022-11-21T19:11:14.036108999Z stderr F DEBUG 2022-11-21 19:11:14,036 [shard 0] dns_resolver - Query name endpoint-template-0.endpoint-template.kuttl-test-helped-gator.svc.cluster.local. (INET)
2022-11-21T19:11:14.036312853Z stderr F DEBUG 2022-11-21 19:11:14,036 [shard 0] dns_resolver - Query failed: 4
2022-11-21T19:11:14.036370634Z stderr F DEBUG 2022-11-21 19:11:14,036 [shard 0] rpc - transport.cc:308 - RPC Client probes: { requests_sent: 0, requests_pending: 0, requests_completed: 0, request_errors: 0, request_timeouts: 0, in_bytes: 0, out_bytes: 0, connects: 0, connections: 0, connection_errors: 1, read_dispatch_errors: 0, corrupted_headers: 0, server_correlation_errors: 0, client_correlation_errors: 0, requests_blocked_memory: 0 }
2022-11-21T19:11:14.036411964Z stderr F DEBUG 2022-11-21 19:11:14,036 [shard 0] cluster - cluster_discovery.cc:169 - Error registering node UUID {6d0bca35-3ff2-4a4e-b3ce-720843bf0261}, retrying: std::__1::system_error (error C-Ares:4, endpoint-template-0.endpoint-template.kuttl-test-helped-gator.svc.cluster.local.: Not found)
2022-11-21T19:11:14.036415432Z stderr F INFO  2022-11-21 19:11:14,036 [shard 0] cluster - cluster_discovery.cc:147 - Requesting node ID for node UUID {6d0bca35-3ff2-4a4e-b3ce-720843bf0261} from {host: endpoint-template-1.endpoint-template.kuttl-test-helped-gator.svc.cluster.local., port: 33145}
2022-11-21T19:11:14.036420704Z stderr F DEBUG 2022-11-21 19:11:14,036 [shard 0] dns_resolver - Query name endpoint-template-1.endpoint-template.kuttl-test-helped-gator.svc.cluster.local. (INET)
2022-11-21T19:11:14.036604935Z stderr F DEBUG 2022-11-21 19:11:14,036 [shard 0] dns_resolver - Query failed: 4
2022-11-21T19:11:14.036633878Z stderr F DEBUG 2022-11-21 19:11:14,036 [shard 0] rpc - transport.cc:308 - RPC Client probes: { requests_sent: 0, requests_pending: 0, requests_completed: 0, request_errors: 0, request_timeouts: 0, in_bytes: 0, out_bytes: 0, connects: 0, connections: 0, connection_errors: 1, read_dispatch_errors: 0, corrupted_headers: 0, server_correlation_errors: 0, client_correlation_errors: 0, requests_blocked_memory: 0 }
2022-11-21T19:11:14.036663665Z stderr F DEBUG 2022-11-21 19:11:14,036 [shard 0] cluster - cluster_discovery.cc:169 - Error registering node UUID {6d0bca35-3ff2-4a4e-b3ce-720843bf0261}, retrying: std::__1::system_error (error C-Ares:4, endpoint-template-1.endpoint-template.kuttl-test-helped-gator.svc.cluster.local.: Not found)
2022-11-21T19:11:14.036678215Z stderr F INFO  2022-11-21 19:11:14,036 [shard 0] cluster - cluster_discovery.cc:147 - Requesting node ID for node UUID {6d0bca35-3ff2-4a4e-b3ce-720843bf0261} from {host: endpoint-template-2.endpoint-template.kuttl-test-helped-gator.svc.cluster.local., port: 33145}
2022-11-21T19:11:14.036689774Z stderr F DEBUG 2022-11-21 19:11:14,036 [shard 0] dns_resolver - Query name endpoint-template-2.endpoint-template.kuttl-test-helped-gator.svc.cluster.local. (INET)
2022-11-21T19:11:14.036817588Z stderr F DEBUG 2022-11-21 19:11:14,036 [shard 0] dns_resolver - Query failed: 4
2022-11-21T19:11:14.036851303Z stderr F DEBUG 2022-11-21 19:11:14,036 [shard 0] rpc - transport.cc:308 - RPC Client probes: { requests_sent: 0, requests_pending: 0, requests_completed: 0, request_errors: 0, request_timeouts: 0, in_bytes: 0, out_bytes: 0, connects: 0, connections: 0, connection_errors: 1, read_dispatch_errors: 0, corrupted_headers: 0, server_correlation_errors: 0, client_correlation_errors: 0, requests_blocked_memory: 0 }
2022-11-21T19:11:14.03687066Z stderr F DEBUG 2022-11-21 19:11:14,036 [shard 0] cluster - cluster_discovery.cc:169 - Error registering node UUID {6d0bca35-3ff2-4a4e-b3ce-720843bf0261}, retrying: std::__1::system_error (error C-Ares:4, endpoint-template-2.endpoint-template.kuttl-test-helped-gator.svc.cluster.local.: Not found)

RafalKorepta pushed a commit to RafalKorepta/redpanda that referenced this issue Nov 22, 2022
Due to failing PostStart hook it seems that one of the Redpanda broker is in
degraded state and AdminAPI is not responsive. The Pod log collector does not
work if pod is in `PodInitializing` state.

REF
redpanda-data#7396
RafalKorepta pushed a commit to RafalKorepta/redpanda that referenced this issue Nov 22, 2022
Due to failing PostStart hook it seems that one of the Redpanda broker is in
degraded state and AdminAPI is not responsive. The Pod log collector does not
work if pod is in `PodInitializing` state.

To find more evidence what is happening kind export logs command was added.
It should be collected by the buildkite agent and served for later
investigations.

REF
redpanda-data#7396
RafalKorepta added a commit that referenced this issue Nov 22, 2022
@RafalKorepta
Copy link
Contributor

The script that tries to remove maintenance mode

#!/usr/bin/env bash
set -e

until NODE_ID=$(curl --silent --fail http://${POD_NAME}.endpoint-template.kuttl-test-composed-jackass.svc.cluster.local:9644/v1/node_config | grep -o '\"node_id\":[^,}]*' | grep -o '[^: ]*$'); do
	sleep 0.5
done
echo "Clearing maintenance mode on node ${NODE_ID}"
until [ "${status:-}" = "200" ] || [ "${status:-}" = "400" ]; do
	status=$(curl -X DELETE --silent -o /dev/null -w "%{http_code}" http://${POD_NAME}.endpoint-template.kuttl-test-composed-jackass.svc.cluster.local:9644/v1/brokers/${NODE_ID}/maintenance)
	sleep 0.5
done

@RafalKorepta
Copy link
Contributor

Redpanda config

redpanda:
    data_directory: /var/lib/redpanda/data
    empty_seed_starts_cluster: false
    seed_servers:
        - host:
            address: endpoint-template-0.endpoint-template.kuttl-test-composed-jackass.svc.cluster.local.
            port: 33145
        - host:
            address: endpoint-template-1.endpoint-template.kuttl-test-composed-jackass.svc.cluster.local.
            port: 33145
        - host:
            address: endpoint-template-2.endpoint-template.kuttl-test-composed-jackass.svc.cluster.local.
            port: 33145
    rpc_server:
        address: 0.0.0.0
        port: 33145
    kafka_api:
        - address: 0.0.0.0
          port: 9092
          name: kafka
          authentication_method: none
        - address: 0.0.0.0
          port: 9093
          name: kafka-external
          authentication_method: none
    kafka_api_tls:
        - name: kafka-external
          key_file: /etc/tls/certs/tls.key
          cert_file: /etc/tls/certs/tls.crt
          truststore_file: /etc/tls/certs/ca/ca.crt
          enabled: true
          require_client_auth: true
    admin:
        - address: 0.0.0.0
          port: 9644
          name: admin
    advertised_rpc_api:
        address: endpoint-template-0.endpoint-template.kuttl-test-composed-jackass.svc.cluster.local.
        port: 33145
    advertised_kafka_api:
        - address: endpoint-template-0.endpoint-template.kuttl-test-composed-jackass.svc.cluster.local.
          port: 9092
          name: kafka
        - address: 0-5c2a3f8f4c-kafka.example.com
          port: 30804
          name: kafka-external
    developer_mode: true
    auto_create_topics_enabled: true
    cloud_storage_segment_max_upload_interval_sec: 1800
    default_topic_replications: 3
    enable_rack_awareness: true
    fetch_reads_debounce_timeout: 10
    group_topic_partitions: 3
    id_allocator_replication: 3
    log_segment_size: 536870912
    storage_min_free_bytes: 10485760
    topic_partitions_per_shard: 1000
    transaction_coordinator_replication: 3
rpk:
    tune_network: true
    tune_disk_scheduler: true
    tune_disk_nomerges: true
    tune_disk_write_cache: true
    tune_disk_irq: true
    tune_cpu: true
    tune_aio_events: true
    tune_clocksource: true
    tune_swappiness: true
    coredump_dir: /var/lib/redpanda/coredump
    tune_ballast_file: true
    overprovisioned: true
pandaproxy:
    pandaproxy_api:
        - address: 0.0.0.0
          port: 8082
          name: proxy
        - address: 0.0.0.0
          port: 8083
          name: proxy-external
    pandaproxy_api_tls:
        - name: proxy-external
          key_file: /etc/tls/certs/pandaproxy/tls.key
          cert_file: /etc/tls/certs/pandaproxy/tls.crt
          truststore_file: /etc/tls/certs/pandaproxy/ca/ca.crt
          enabled: true
          require_client_auth: true
    advertised_pandaproxy_api:
        - address: endpoint-template-0.endpoint-template.kuttl-test-composed-jackass.svc.cluster.local.
          port: 8082
          name: proxy
        - address: 0-5c2a3f8f4c-pandaproxy.example.com
          port: 32576
          name: proxy-external
pandaproxy_client:
    brokers:
        - address: endpoint-template-0.endpoint-template.kuttl-test-composed-jackass.svc.cluster.local.
          port: 9092
        - address: endpoint-template-1.endpoint-template.kuttl-test-composed-jackass.svc.cluster.local.
          port: 9092
        - address: endpoint-template-2.endpoint-template.kuttl-test-composed-jackass.svc.cluster.local.
          port: 9092
schema_registry: {}

@RafalKorepta RafalKorepta self-assigned this Nov 22, 2022
@andrwng
Copy link
Contributor

andrwng commented Nov 22, 2022

From Redpanda's perspective, a couple observations:

  • this node believes it isn't a cluster founder, meaning it either found that another node has a cluster UUID (only possible if we have previously been able to send RPCs), or it's not configured as a part of the seed_servers list
  • it looks like this is a brand new node that doesn't yet have a node ID (the code path hit means that we're trying to get assigned a node ID because we don't have one stored locally yet)

Just curious if that aligns with what we think the test is doing; is it adding a brand new node to a cluster? For a brand new node, is it possible to skip the maintenance mode disabling? More broadly speaking, if the issue is that we can't send any RPCs while initializing, that seems problematic, and maybe we need to expose a different readiness endpoint for these hooks to use.

2022-11-21T19:11:14.036108999Z stderr F DEBUG 2022-11-21 19:11:14,036 [shard 0] dns_resolver - Query name endpoint-template-0.endpoint-template.kuttl-test-helped-gator.svc.cluster.local. (INET)
2022-11-21T19:11:14.036312853Z stderr F DEBUG 2022-11-21 19:11:14,036 [shard 0] dns_resolver - Query failed: 4
...
2022-11-21T19:11:14.036420704Z stderr F DEBUG 2022-11-21 19:11:14,036 [shard 0] dns_resolver - Query name endpoint-template-1.endpoint-template.kuttl-test-helped-gator.svc.cluster.local. (INET)
2022-11-21T19:11:14.036604935Z stderr F DEBUG 2022-11-21 19:11:14,036 [shard 0] dns_resolver - Query failed: 4
...
2022-11-21T19:11:14.036689774Z stderr F DEBUG 2022-11-21 19:11:14,036 [shard 0] dns_resolver - Query name endpoint-template-2.endpoint-template.kuttl-test-helped-gator.svc.cluster.local. (INET)
2022-11-21T19:11:14.036817588Z stderr F DEBUG 2022-11-21 19:11:14,036 [shard 0] dns_resolver - Query failed: 4

It looks like the node is having trouble sending RPCs to any of the seed nodes. Is that something that we should be expecting when initializing a pod?

@dlex
Copy link
Contributor

dlex commented Nov 22, 2022

@andrwng
Copy link
Contributor

andrwng commented Nov 22, 2022

Another thing I'm confused about is why this is only happening only sometimes, if the issue truly is that DNS setup cannot proceed until the node exposes its admin endpoint.

@joejulian
Copy link
Contributor

Just for posterity, exited with 137 means that it was still trying with the pod was deleted.

@RafalKorepta
Copy link
Contributor

It looks like the node is having trouble sending RPCs to any of the seed nodes. Is that something that we should be expecting when initializing a pod?

Thanks @andrwng I will drill down with the DNS (core dns) pod if it was causing the problem.

Just curious if that aligns with what we think the test is doing; is it adding a brand new node to a cluster?

This tests is related to setting up the external advertised kafka api DNS using some template mechanism. It happens that this particular test will create 3 Redpanda cluster. Others tries to create only 1 Redpanda cluster.

For a brand new node, is it possible to skip the maintenance mode disabling?

It's not that easy. We would need to implement some hacks around when node is considered formed and when not. If we can not use Admin API reliably then I we might end up with some kind of workaround.

More broadly speaking, if the issue is that we can't send any RPCs while initializing, that seems problematic, and maybe we need to expose a different readiness endpoint for these hooks to use.

We are not using readiness probe to put Redpanda broker out of maintenance mode. It's PostStart hook https://kubernetes.io/docs/concepts/containers/container-lifecycle-hooks/#container-hooks

@RafalKorepta
Copy link
Contributor

DNS containers doesn't have any useful logs. I don't want to blame kind cluster or buildkite environment, but I run out of options.

Maybe we have too many kind nodes, but it's a stretch hypothesis.

@RafalKorepta
Copy link
Contributor

@RafalKorepta
Copy link
Contributor

andijcr pushed a commit to andijcr/redpanda that referenced this issue Dec 3, 2022
Due to failing PostStart hook it seems that one of the Redpanda broker is in
degraded state and AdminAPI is not responsive. The Pod log collector does not
work if pod is in `PodInitializing` state.

To find more evidence what is happening kind export logs command was added.
It should be collected by the buildkite agent and served for later
investigations.

REF
redpanda-data#7396
@dlex
Copy link
Contributor

dlex commented Jan 1, 2023

@joejulian
Copy link
Contributor

We need to collect previous logs from pods. All we can see is that pod/endpoint-template-1's initContainer was running at the time we collected logs.

@dotnwat
Copy link
Member

dotnwat commented Jan 13, 2023

We need to collect previous logs from pods. All we can see is that pod/endpoint-template-1's initContainer was running at the time we collected logs.

can that be done automatically and added into the buildkite artifacts so that we can access them?

@joejulian
Copy link
Contributor

That will be the next step when this next has someone working on it, yes.

@RafalKorepta
Copy link
Contributor

Based on 1 week of failures there was no failures in endpoint-template. I will close this issue. If it reappears I will re-open.

joejulian pushed a commit to joejulian/redpanda that referenced this issue Mar 10, 2023
Due to failing PostStart hook it seems that one of the Redpanda broker is in
degraded state and AdminAPI is not responsive. The Pod log collector does not
work if pod is in `PodInitializing` state.

To find more evidence what is happening kind export logs command was added.
It should be collected by the buildkite agent and served for later
investigations.

REF
redpanda-data#7396
joejulian pushed a commit to joejulian/redpanda that referenced this issue Mar 10, 2023
…dpanda-datagh-7396/gather-more-events

k8s: Gather k8s events from failed test
joejulian pushed a commit to joejulian/redpanda that referenced this issue Mar 24, 2023
Due to failing PostStart hook it seems that one of the Redpanda broker is in
degraded state and AdminAPI is not responsive. The Pod log collector does not
work if pod is in `PodInitializing` state.

To find more evidence what is happening kind export logs command was added.
It should be collected by the buildkite agent and served for later
investigations.

REF
redpanda-data#7396
joejulian pushed a commit to joejulian/redpanda that referenced this issue Mar 24, 2023
…edpanda-datagh-7396/gather-more-events

k8s: Gather k8s events from failed test
joejulian pushed a commit to joejulian/redpanda that referenced this issue Apr 12, 2023
Due to failing PostStart hook it seems that one of the Redpanda broker is in
degraded state and AdminAPI is not responsive. The Pod log collector does not
work if pod is in `PodInitializing` state.

To find more evidence what is happening kind export logs command was added.
It should be collected by the buildkite agent and served for later
investigations.

REF
redpanda-data#7396

(cherry picked from commit 441a967)
joejulian pushed a commit to joejulian/redpanda that referenced this issue Apr 12, 2023
Due to failing PostStart hook it seems that one of the Redpanda broker is in
degraded state and AdminAPI is not responsive. The Pod log collector does not
work if pod is in `PodInitializing` state.

To find more evidence what is happening kind export logs command was added.
It should be collected by the buildkite agent and served for later
investigations.

REF
redpanda-data#7396

(cherry picked from commit 441a967)
joejulian pushed a commit to joejulian/redpanda that referenced this issue Apr 13, 2023
Due to failing PostStart hook it seems that one of the Redpanda broker is in
degraded state and AdminAPI is not responsive. The Pod log collector does not
work if pod is in `PodInitializing` state.

To find more evidence what is happening kind export logs command was added.
It should be collected by the buildkite agent and served for later
investigations.

REF
redpanda-data#7396

(cherry picked from commit 441a967)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/k8s ci-failure kind/bug Something isn't working
Projects
None yet
Development

No branches or pull requests

6 participants