Custom VPC domain-name affecting node lease #1457
Comments
I'm not clear on the exact behavior you're seeing. You mean that you're changing the
We're not assuming this. The
Are you referring to the
@cartermckinnon thanks for the reply.
The node group has always been there as part of the cluster, with everything working fine. We changed the VPC DNS option (false to true), and after some time (1-2 weeks) we saw the issue happen. We didn't change PrivateDnsName for any node directly; that VPC DNS change was the only change to the infra.
other than the main failure itself, the other symptom we observed for the problematic node groups are hostname being
so the error message reported by the api server is as follows:
question is who is giving
hmm we see
Changing your VPC's DNS settings can cause the
If both of these are My guess as to why you're seeing something break a long time after you change these options is because the Your Eventually though, the Even if the
Got it, this seems to be the most probable root cause. Would this essentially mean that if I flip my VPC DNS settings in a way that changes PrivateDnsName, my existing node group is doomed to go wrong if I keep it running long enough for the aws-iam-authenticator instances to be recycled? It seems a bit wrong to me that EKS node groups are sensitive to a VPC DNS change. Shouldn't there be some kind of canonical, static format for a node's identity that's used for auth to the control plane, instead of a (relatively speaking) volatile one like PrivateDnsName? On the other hand, if there are valid reasons for this, the docs should at least recommend that you not make certain VPC DNS changes for a running node group or, if you have to, that you re-create your node group as soon as possible afterwards, if that makes sense.
Yep, exactly! I'm working on this for a future Kubernetes version on EKS. But for now, the DNS name is in the critical path by default.
Going to close this as it's a known issue.
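For anyone wanting to check whether existing nodes are exposed to the failure mode discussed above, here is a minimal sketch that compares each node's registered name with the PrivateDnsName EC2 currently reports. It assumes nodes registered under their PrivateDnsName (the default discussed in this thread) and that `.spec.providerID` has the usual `aws:///<az>/<instance-id>` shape. The function names `instance_id_from_provider_id` and `check_node_name` are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: find nodes whose registered name no longer matches EC2's PrivateDnsName.
# Assumption: nodes were registered using PrivateDnsName as the node name.

# Extract the instance ID from a providerID such as
# "aws:///us-east-1a/i-0123456789abcdef0" (everything after the last slash).
instance_id_from_provider_id() {
  printf '%s\n' "${1##*/}"
}

# Compare a node name with the PrivateDnsName reported by EC2.
# Prints the result and returns non-zero on a mismatch.
check_node_name() {
  local node_name="$1" private_dns_name="$2"
  if [ "$node_name" != "$private_dns_name" ]; then
    echo "MISMATCH: node=$node_name ec2=$private_dns_name"
    return 1
  fi
  echo "OK: $node_name"
}

# Hypothetical usage (requires kubectl and AWS CLI access to the cluster's account):
#   kubectl get nodes \
#     -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.spec.providerID}{"\n"}{end}' |
#   while read -r node provider_id; do
#     id=$(instance_id_from_provider_id "$provider_id")
#     dns=$(aws ec2 describe-instances --instance-ids "$id" \
#       --query 'Reservations[].Instances[].PrivateDnsName' --output text)
#     check_node_name "$node" "$dns" || true
#   done
```

A mismatch here does not by itself prove the node will be denied; it only flags nodes worth recycling after a VPC DNS change.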
What happened:
We observed the same symptom reported by #1263; however, this is not a duplicate, because we're using the updated AMI and the issue still happened. We took a closer look at the fix for that issue, https://github.com/awslabs/amazon-eks-ami/pull/1264/files (which went out with 20230501), and found that the command it uses, `aws ec2 describe-instances --instance-ids $INSTANCE_ID --query 'Reservations[].Instances[].PrivateDnsName'`, actually returns the hostname with our custom domain as well.
Order of events:
What you expected to happen:
No NODE DENY error should appear
How to reproduce it (as minimally and precisely as possible):
I'm not sure if this is deterministic, but the triggering condition seems to be:
Anything else we need to know?:
- Hostname type is `IP name: <ip>.custom_domain` and Private IP DNS name (IPv4 only) is `<ip>.ec2.internal`. This mismatch is expected because we have that DHCP option; however, the CLI command `aws ec2 describe-instances --instance-ids $INSTANCE_ID --query 'Reservations[].Instances[].PrivateDnsName'` (which is used in the fix) actually returns `<ip>.custom_domain`.
- `aws ec2 describe-instances ... PrivateDnsName` should return .ec2.internal, but apparently that's not the case.
Environment:
- `aws eks describe-cluster --name <name> --query cluster.platformVersion`: eks.12
- `aws eks describe-cluster --name <name> --query cluster.version`: 1.23
- `uname -a`: Linux ip-10-176-42-253.custom_domain 5.4.249-163.359.amzn2.x86_64 #1 SMP Wed Jul 12 18:58:58 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
- `cat /etc/eks/release` (on a node): BASE_AMI_ID="ami-018ae0f2e02aab38b" BUILD_TIME="Fri Jul 28 04:19:03 UTC 2023" BUILD_KERNEL="5.4.249-163.359.amzn2.x86_64" ARCH="x86_64"
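To reproduce the check described in this report on a given instance, a rough sketch along these lines may help. It classifies whatever `describe-instances` returns as either an AWS default private DNS name or something else (such as a custom domain). It assumes default names end in `ec2.internal` (us-east-1) or `<region>.compute.internal` (other regions). The function name `is_default_private_dns_name` is hypothetical, not part of the AMI scripts.

```shell
#!/usr/bin/env bash
# Sketch: tell an AWS default private DNS name apart from a custom-domain name.
# Assumption: default names end in ".ec2.internal" or ".<region>.compute.internal".

# Returns 0 if the name uses an AWS default suffix, 1 otherwise.
is_default_private_dns_name() {
  local name="$1"
  case "$name" in
    *.ec2.internal|*.compute.internal) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical usage on a node (requires the AWS CLI, credentials, and $INSTANCE_ID):
#   name=$(aws ec2 describe-instances --instance-ids "$INSTANCE_ID" \
#     --query 'Reservations[].Instances[].PrivateDnsName' --output text)
#   is_default_private_dns_name "$name" \
#     || echo "PrivateDnsName carries a non-default domain: $name"
```

The suffix match is deliberately loose; it is only meant to surface the `<ip>.custom_domain` case reported here, not to validate hostnames.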