AL2023 - PrivateDNSName regression #1711

Closed
bribroder opened this issue Mar 6, 2024 · 6 comments

Comments

@bribroder

bribroder commented Mar 6, 2024

What happened:

With the new AL2023 NodeConfig system, private DNS names appear to cause problems for new nodes (previously reported in #1263 and fixed in #1264). Our VPC uses a DHCP options set to specify a custom domain name, and this prevents nodes from joining the cluster.

What you expected to happen:

Nodes can join the cluster successfully after launch

How to reproduce it (as minimally and precisely as possible):

#1263 gives great reproduction steps

For me, simply launching new AL2023 nodes in a VPC whose DHCP options set specifies a domain name produces these kubelet logs:

"Attempting to register node" node="ip-172-16-0-100.domain.com"
"Unable to register node with API server" err="nodes \"ip-172-16-0-100.domain.com\" is forbidden: node \"ip-172-16-0-100.ec2.internal\" is not allowed to modify node \"ip-172-16-0-100.domain.com\"" node="ip-172-16-0-100.domain.com"
"Eviction manager: failed to get summary stats" err="failed to get node info: node \"ip-172-16-0-100.domain.com\" not found"
Failed to contact API server when waiting for CSINode publishing: csinodes.storage.k8s.io "ip-172-16-0-100.domain.com" is forbidden: User "system:node:ip-172-16-0-100.ec2.internal" cannot get resource "csinodes" in API group "storage.k8s.io" at the cluster scope: can only access CSINode with the same name as the requesting node

Anything else we need to know?:

Erroneously reported this here: aws/karpenter-provider-aws#5793

Other similar issues:
#1376
#1457

Environment:

  • EKS Platform version (use aws eks describe-cluster --name <name> --query cluster.platformVersion): eks.1
  • Kubernetes version (use aws eks describe-cluster --name <name> --query cluster.version): 1.29
  • AMI Version: ami-0552b3e5085247f36 amazon-eks-node-al2023-x86_64-standard-1.29-v20240227
  • Kernel (e.g. uname -a): 6.1.77-99.164.amzn2023.x86_64
  • Release information (run cat /etc/eks/release on a node):
BASE_AMI_ID="ami-0a56ce835d6f72c8e"
BUILD_TIME="Tue Feb 27 23:51:40 UTC 2024"
BUILD_KERNEL="6.1.77-99.164.amzn2023.x86_64"
ARCH="x86_64"
@bribroder
Author

I use Karpenter to launch nodes. Is there a way to patch the user data with a blend of bash and the new NodeConfig?

# doesn't actually work
userData: |
    apiVersion: node.eks.aws/v1alpha1
    kind: NodeConfig
    spec:
      kubelet:
        flags:
          - --hostname-override=$(aws ec2 describe-instances --instance-ids $(imds /latest/meta-data/instance-id) --query 'Reservations[].Instances[].PrivateDnsName' --output text)
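NodeConfig is static YAML, which would explain why a `$(...)` substitution inside `flags` is not evaluated. One possible way to combine the two, sketched under assumptions: nodeadm on AL2023 is documented to accept multi-part MIME user data, with the NodeConfig in a part of type `application/node.eks.aws` and a shell script in a `text/x-shellscript` part. The generator below is illustrative only. The function name `render_user_data`, the boundary string, and the placeholder label are made up here, and whether the script part runs early enough to affect the node name kubelet registers with should be verified on a test node.

```shell
# Hypothetical helper: print multi-part MIME user data carrying a
# NodeConfig part and a shell-script part. All values are placeholders.
render_user_data() {
  # Quoted heredoc delimiter: nothing inside is expanded by this shell.
  cat <<'EOF'
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="BOUNDARY"

--BOUNDARY
Content-Type: application/node.eks.aws

apiVersion: node.eks.aws/v1alpha1
kind: NodeConfig
spec:
  kubelet:
    flags:
      - --node-labels=example.com/placeholder=true

--BOUNDARY
Content-Type: text/x-shellscript; charset="us-ascii"

#!/bin/bash
# Placeholder: runtime logic (for example a hostnamectl call) would go
# here, because shell substitutions are not evaluated in NodeConfig YAML.
echo "custom bootstrap step"

--BOUNDARY--
EOF
}
```

Calling `render_user_data` prints the document to stdout, so its output could be pasted under `userData: |` or redirected to a file for a launch template.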

@cartermckinnon
Member

Sorry about this. From the beginning, we intended to change the naming convention for nodes in AL2023 to use instance IDs instead of the PrivateDnsName. This had some downstream effects and ultimately didn't make the cut (though we intend to make it opt-in soon). I'll get a PR up to address this.

@iodeslykos

Now that the fix for this issue has been merged, how long before we can expect to see it released? We're itching to get AL2023 nodes running in our EKS cluster.

@DevelopBuildRun

BTW, I ran into this same issue and found that if you set the hostname like this:

# Fetch a short-lived IMDSv2 session token
TOKEN=$(curl --silent --request PUT "http://169.254.169.254/latest/api/token" --header "X-aws-ec2-metadata-token-ttl-seconds: 10")
# placement/region returns the region (not the availability zone)
REGION=$(curl --silent "http://169.254.169.254/latest/meta-data/placement/region" --header "X-aws-ec2-metadata-token: $TOKEN")
# Keep only the IP-based host part and drop the custom DHCP domain
IP_BASED_NAME=$(curl --silent "http://169.254.169.254/latest/meta-data/hostname" --header "X-aws-ec2-metadata-token: $TOKEN" | cut -f1 -d".")
hostnamectl set-hostname --static "$IP_BASED_NAME.$REGION.compute.internal"

in your user data, you should be able to get your instance working in the meantime so you can test before the patch is out.
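One caveat with the snippet above: in us-east-1 the default private DNS suffix is `ec2.internal` (as in the kubelet logs at the top of this issue), while other regions use `<region>.compute.internal`. A minimal sketch that accounts for both cases, assuming IP-based naming; the function name `private_dns_name` is made up for illustration:

```shell
# Hypothetical helper: build the default IP-based private DNS name from
# the short hostname (e.g. ip-172-16-0-100) and the region.
private_dns_name() {
  local short_name="$1" region="$2"
  if [ "$region" = "us-east-1" ]; then
    # us-east-1 uses the ec2.internal suffix.
    printf '%s.ec2.internal\n' "$short_name"
  else
    # Every other region uses <region>.compute.internal.
    printf '%s.%s.compute.internal\n' "$short_name" "$region"
  fi
}
```

For example, `private_dns_name ip-172-16-0-100 us-west-2` prints `ip-172-16-0-100.us-west-2.compute.internal`, and that value could be passed to `hostnamectl set-hostname --static`.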

@Issacwww
Member

The fix will be released in the next AMI release: https://github.com/awslabs/amazon-eks-ami/releases/tag/v20240315

@iodeslykos

Confirmed this fix is working with the following image:

ami_id     = ami-07acdbd513e154aa8
image_name = amazon-eks-node-al2023-x86_64-standard-1.27-v20240315
