
[EKS] [request]: Managed Node Groups support for node taints #864

Closed
mikestef9 opened this issue Apr 28, 2020 · 57 comments
Labels
EKS Managed Nodes, EKS Amazon Elastic Kubernetes Service

Comments

@mikestef9 (Contributor)

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Tell us about your request
Add support for tainting nodes through the managed node groups API

Which service(s) is this request for?
EKS

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?
Managed node groups support adding Kubernetes labels as part of node group creation. This makes it easy for all nodes in a node group to have consistent labels. However, taints are not supported through the API.

Are you currently working around this issue?
Running manual kubectl taint commands after new nodes in the node group come up.
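
For illustration, a minimal sketch of that manual workaround (the node name and taint below are placeholders, not from the original report):

#!/bin/bash
# Hypothetical example: taint a freshly joined node by hand
kubectl taint nodes ip-10-0-1-23.ec2.internal dedicated=private:NoSchedule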

@mikestef9 mikestef9 added the EKS Amazon Elastic Kubernetes Service label Apr 28, 2020
@mikestef9 mikestef9 self-assigned this Apr 29, 2020
@TBBle commented May 4, 2020

When this was raised in #585, #507 was tagged as an existing request for this feature, but I think that was confusion... #507 seems to be about Container Insights correctly monitoring tainted nodes, while what we want here (and in #585) is to support setting the taints on Managed Nodegroups as part of a rollout, e.g. with eksctl.

The comment in #585 had nine thumbs-up, on top of the three currently here.

@mikestef9 (Contributor, Author)

@TBBle correct, I wanted to open a separate issue to explicitly track tainting node groups through the EKS API

@karstenmueller

@mikestef9 we would like to see "tainting node groups through the EKS API" progressing, and have bumped it from 12 👍 to 37 as of now.

@aviau commented May 27, 2020

It looks like the bootstrap script used by EKS nodes already supports taints. My understanding is that this would be a small feature to implement, because it would only require modifying the userdata in the launch template to add extra args, just like it's done for labels currently.
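
For reference, a hedged sketch of that userdata with the EKS-optimized AMI's bootstrap script (the cluster name and taint are placeholder values):

#!/bin/bash
# Hypothetical userdata: the bootstrap script accepts extra kubelet flags,
# which is where --register-with-taints would be passed
/etc/eks/bootstrap.sh my-cluster \
  --kubelet-extra-args '--node-labels=role=private --register-with-taints=dedicated=private:NoSchedule'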

@AlbertoPeon

We would love to have this!

@jhcook-ag

"When nodes are created dynamically by the Kubernetes autoscaler, they need to be created with the proper taint and label.
With EKS, the taint and label can be specified in the Kubernetes kubelet service defined in the UserData section of the AWS autoscaling group LaunchConfiguration."

https://docs.cloudbees.com/docs/cloudbees-ci/latest/cloud-admin-guide/eks-auto-scaling-nodes

@TBBle commented Jul 30, 2020

@jhcook-ag You can't specify the UserData for Managed Node Groups when you create them.

You can modify the UserData in the Launch Configuration in the AWS console after creation, but then the Managed Node Groups feature will refuse to touch your Launch Configuration again, and you're effectively now using unmanaged Node Groups, although eksctl will still try to use the Managed Node Groups API and fail.

@jhcook-ag

@mhausenblas we really need this 👍

@borisputerka-zz

Absolutely would love the idea.

@Lincon-Freitas

It is a must-have feature!

@vcucereanu

👍

@martinoravsky

This is a must-have feature for us as well. We can't use managed node groups because of this. When would you expect this to be released? (just roughly) 👍

@Dudssource

Hi @martinoravsky, I believe this feature is available now.

https://aws.amazon.com/blogs/containers/introducing-launch-template-and-custom-ami-support-in-amazon-eks-managed-node-groups/

We did it by customizing the userdata on the custom launch template and specifying the taints for the kubelet (using the register-with-taints argument).

@martinoravsky

Hi @Dudssource ,

are you using custom AMIs? I'm using launch templates with EKS-optimized AMIs, which include UserData that bootstraps the node to the cluster automatically (with --kubelet-extra-args empty). This userdata is not editable for us; we can only add our own UserData as a MIME multipart file, which has no effect on bootstrapping the cluster. I'm curious if you were able to get this to work without custom AMIs.

@Dudssource

@martinoravsky, yes, unfortunately we had to use a custom AMI for this to work.
But we used the same optimized AMI that EKS uses; since we use Terraform, we used a data source to get the latest AMI for our cluster version. I know that this is possible with CloudFormation and Parameter Store too.
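
As an illustration of the Parameter Store approach, the latest EKS-optimized AMI ID can be fetched from the public SSM parameter (the Kubernetes version below is a placeholder):

#!/bin/bash
# Hypothetical lookup of the latest EKS-optimized Amazon Linux 2 AMI for a given cluster version
aws ssm get-parameter \
  --name /aws/service/eks/optimized-ami/1.18/amazon-linux-2/recommended/image_id \
  --query 'Parameter.Value' --output text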

@mikestef9 (Contributor, Author)

The approach that @Dudssource used here is certainly an option, but we do plan to add taints directly to the EKS API (similar to labels), so that a custom AMI is not required.

@lwimmer commented Sep 15, 2020

I've found a solution (admittedly quite hackish) to allow setting taints with the official AMIs:

Set the userdata for the Launch Template similar to this:

MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="==7561478f-5b81-4e9d-9db6-aec8f463d2ab=="

--==7561478f-5b81-4e9d-9db6-aec8f463d2ab==
Content-Type: text/x-shellscript; charset="us-ascii"

#!/bin/bash
sed -i '/^KUBELET_EXTRA_ARGS=/a KUBELET_EXTRA_ARGS+=" --register-with-taints=foo=bar:NoSchedule"' /etc/eks/bootstrap.sh

--==7561478f-5b81-4e9d-9db6-aec8f463d2ab==--

This script runs before the bootstrap script (which is managed by EKS), patching /etc/eks/bootstrap.sh to inject the necessary --register-with-taints into the KUBELET_EXTRA_ARGS variable.
This solution is not perfect and might break if AWS changes the bootstrap script, but it works for now and can be used until there is proper support for taints.

@dwilliams782

@lwimmer That is superbly hacky! Good work.

I'm really surprised this feature is missing, and overall I'm shocked at how feature-incomplete node groups are.

@markthebault

+1

1 similar comment
@DovAmir commented Dec 7, 2020

+1

@pierluigilenoci

@EvertonSA 🥳

Yeeeee

@teochenglim

Guys,

I found the same implementation in the AWS Terraform workshop:
https://github.com/aws-samples/terraform-eks-code/blob/master/extra/nodeg2/user_data.tf

That said, I'm hoping to simply pass a flag as a Terraform parameter, not go through this twist-and-turn approach and then worry that the feature breaks again in the next release, just like the eksctl prebootstrap.

I don't know how difficult it would be to make this a configurable value that can be plugged in, just like a "helm chart" value, in Terraform or eksctl. We, the community, can twist and turn to provide a solution. But overall I think all these reasonable production-ready features should be built in, allowing all of us to extend the functionality properly. This has to be answered and committed to by AWS; if the next release wipes it out again, why should anyone waste time doing it?

Cheers.

@ArchiFleKs

Hi, this has been merged and it seems to still work with the official AMI.

I have just tested with the following configuration:

  node_groups = {
    "default-${local.aws_region}a" = {
      ami_type         = "AL2_ARM_64"
      desired_capacity = 1
      max_capacity     = 3
      min_capacity     = 1
      instance_types   = ["t4g.large"]
      subnets          = [dependency.vpc.outputs.private_subnets[0]]
      disk_size        = 20
    }

    "default-${local.aws_region}b" = {
      ami_type         = "AL2_ARM_64"
      desired_capacity = 1
      max_capacity     = 3
      min_capacity     = 1
      instance_types   = ["t4g.large"]
      subnets          = [dependency.vpc.outputs.private_subnets[1]]
      disk_size        = 20
    }

    "default-${local.aws_region}c" = {
      ami_type               = "AL2_ARM_64"
      create_launch_template = true
      desired_capacity       = 1
      max_capacity           = 3
      min_capacity           = 1
      instance_types         = ["t4g.large"]
      subnets                = [dependency.vpc.outputs.private_subnets[2]]
      kubelet_extra_args     = "--node-labels=role=private --register-with-taints=dedicated=private:NoSchedule"
      disk_size              = 20
    }

    "taint-${local.aws_region}c" = {
      create_launch_template = true
      desired_capacity       = 1
      max_capacity           = 3
      min_capacity           = 1
      instance_types         = ["t3a.large"]
      subnets                = [dependency.vpc.outputs.private_subnets[2]]
      kubelet_extra_args     = "--node-labels=role=private --register-with-taints=dedicated=private:NoSchedule"
      disk_size              = 20
    }
  }

And it is working as expected. This PR is based on the fix found here.

@EKami commented May 2, 2021

> Hi, this has been merged and it seems to still work with the official AMI. […] And it is working as expected. This PR is based on the fix found here.

Still not working for me, even with create_launch_template set to true :(

@EvertonSA commented May 5, 2021

> Hi, this has been merged and it seems to still work with the official AMI. […] And it is working as expected. This PR is based on the fix found here.

Thanks for the PR, I will try it out. Your solution is very elegant. Unfortunately, some of our already-running environments are provisioned using raw Terraform resources. I have no clue how much effort it would take to migrate to the terraform aws eks module. I might give it a shot on our development environments in the next few weeks.

Although I strongly support your development, I still think taints should be accepted here: https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/eks_node_group. What happens if I open a ticket with Mr. Bezos and he replies that they won't go further with the incident because I'm using a community module (that modifies the kubelet default behavior) instead of the official product API?

I really don't know what the implications of using community modules are, but according to the documentation, only Enterprise Business Support can have "Third-party software support". So, #AWS, if I encourage my team to fully migrate to this module, will I have issues with your definition of "Third-party software support"? Does "Third-party software support" include kubelet default behavior modifications?

We eagerly await a response.

@EvertonSA commented May 5, 2021

I could not find a definition for "Third-party software support" other than:

Third-party software support – Help with Amazon Elastic Compute Cloud (Amazon EC2) instance operating systems and configuration. Also, help with the performance of the most popular third-party software components on AWS. Third-party software support isn't available for customers on Basic or Developer Support plans.

@teochenglim commented May 5, 2021

Hello guys,

I have been working on this for a couple of months (with and without Terraform). It will not work no matter how hard you try. The problem is that EKS managed node groups append their own user data behind your user data. AWS creates a secondary launch template on your behalf, and the user data on the running instance comes from that new launch template.

You can verify this on your EC2 node, and then you will know what I am talking about. You can compare your launch template in the AWS console with the launch template on the running instance.

# ssh [your_eks_node]
$ curl http://169.254.169.254/latest/user-data

You can also manually view the launch templates (sorted by most recent date).

You can still do it, but the status of the node group creation is "NodeCreationFailure" after waiting 20 minutes for each try.

Cheers,
Cheng Lim

@ArchiFleKs

> Hi, this has been merged and it seems to still work with the official AMI. […]
>
> Thanks for the PR, I will try it out. […] Does "Third-party software support" include kubelet default behavior modifications? We eagerly await a response.

I honestly do not know about the support, but using a custom launch template is supposed to be supported on AWS. If you have support and are using the official AMI, I do not see why you would lose it. I guess the same question applies to people using a custom AMI that AWS has no way to verify; do they also lose support?

@jfoechsler commented May 5, 2021

@teochenglim Not sure what you are referring to, but providing user data in a managed node group launch template works fine and is merged into the EKS-created launch template.

edit
What I meant to say is: yes, there is a known working workaround, but this issue is about support in the AWS API, not a support forum for Terraform etc. I also missed the Coming Soon status of this issue 👍

The fact that this can be used as a workaround to add taints by modifying the EKS bootstrap should, in my view, obviously not be considered a solution. I don't know how this is even missing in EKS when taints have been a Kubernetes feature for a long time.
In Azure managed node pools it has pretty much always been supported

@teochenglim

Hi ArchiFleKs,

My 2 cents: if everything needs to be custom, why EKS? We might as well run on-prem Kubernetes.

Yes, custom launch templates are an option that is supported on AWS now. But it has bugs.

And to be fair, people are just mixing everything together now. Some are talking about the Terraform module, some are talking about eksctl, some are talking about custom or managed node groups. And you are talking about the official AMI?

But based on my simple troubleshooting, an extra launch template is created and your managed node group points to that. This behaviour is the same whether you use Terraform or do it manually in the AWS console. I have yet to try eksctl, but why should I try it, since I am no longer using it?

@teochenglim commented May 5, 2021

> @teochenglim Not sure what you are referring to, but providing user data in a managed node group launch template works fine and is merged into the EKS-created launch template. […]
> In Azure managed node pools it has pretty much always been supported:

az aks nodepool add \
    --resource-group myResourceGroup \
    --cluster-name myAKSCluster \
    --name taintnp \
    --node-count 1 \
    --node-taints sku=gpu:NoSchedule

I tried it today and it doesn't work for me. Can you show me your working version?
This is EKS, why are you showing AKS?
BTW, we are creating EKS using Terraform; we can't add a node group using eksctl.

@TBBle commented May 5, 2021

Given this has gone from "We're Working On It" to "Coming Soon", presumably it's mostly done, and is being tested/validated/integrated, so "AWS sucks, everyone else has had this forever" isn't really a useful contribution.

Workarounds in the meantime are a useful contribution, I think, but support questions about them do generate a bit of noise in this ticket. Is there a terraform-specific place to debug the terraform-based workaround instead, so this ticket can remain focussed on the Managed Node Groups API for this, and maybe just catalog the workarounds (all using custom launch templates now?).

If custom launch templates aren't working correctly, that's not really a "here" thing either. #585 would be closer, but this isn't really a support forum anyway, so you may not have much luck there.

@ArchiFleKs commented May 5, 2021

> Hi ArchiFleKs, my 2 cents: if everything needs to be custom, why EKS? We might as well run on-prem Kubernetes. […] I have yet to try eksctl, but why should I try it, since I am no longer using it?

You still need tools to orchestrate your infrastructure, whether it is managed or not, even if you do it by hand with the AWS console, the awscli, CloudFormation, Terraform, or eksctl.

I agree that the AWS EKS managed node group API should expose a native taints option like it does for labels. Exposing the kubelet args allows people to customize the kubelet as they wish. This allows power users to do custom configuration even with managed node groups.

Even if using a managed service, you still need to use an AMI, either the official one (by official I mean this one: https://docs.aws.amazon.com/eks/latest/userguide/eks-optimized-ami.html) or one you build yourself.

The behavior when building your own is different from the official one when using user data, as explained here: https://docs.aws.amazon.com/eks/latest/userguide/launch-templates.html#launch-template-user-data. There is a merge involved with the official AMI that you do not have when using a custom AMI (which prevents the pre-userdata from being used).

If you can explain your bug in more detail, maybe someone here can help. We are trying to build tools (eksctl or terraform-aws-eks) to abstract this part for the user (just like a managed service does).

Personally I'm using the terraform-aws-eks module; this feature has just been released and is working, at least with the official AMI. I have not tested with a custom AMI.

Let me know if I can help you with this.

@ArchiFleKs

> Hi, this has been merged and it seems to still work with the official AMI. […]
>
> Still not working for me, even with create_launch_template set to true :(

Are you using the master version of the module? The latest release with this PR only came out today: https://github.com/terraform-aws-modules/terraform-aws-eks/releases/tag/v15.2.0

@EKami commented May 5, 2021

> Are you using the master version of the module? The latest release with this PR only came out today: https://github.com/terraform-aws-modules/terraform-aws-eks/releases/tag/v15.2.0

Oh, I thought it was included in version 15.1.0, I'll try with version 15.2.0 then, thanks! :)

@EvertonSA

> Hi, this has been merged and it seems to still work with the official AMI. […]
>
> Thanks for the PR, I will try it out. […]
>
> I honestly do not know about the support, but using a custom launch template is supposed to be supported on AWS. […] do they also lose support?

Yes, I had my ticket dropped a few years ago.

@EvertonSA

Some features took 15 days to change from Coming Soon to Shipped. Other features took months. How long should I wait? Does it make sense to use community Terraform workarounds now that we are on Coming Soon?

@TBBle "so "AWS sucks, everyone else has had this forever" isn't really a useful contribution." I totally disagree. As a product owner, I think this is a REALLY useful contribution to my product.

@TBBle commented May 5, 2021

> How long should I wait? Does it make sense to use community Terraform workarounds now that we are on Coming Soon?

That depends on your needs and priorities. If you need a terraform deployment today, then you can't wait, so don't wait. If you are just tracking this as a blocker for migrating to Managed Node Groups, and are happy with self/un-managed Node Groups in the meantime, then waiting is fine. (I'm in the latter boat, but it's not the only "migration-blocking" feature I'm tracking, and really only applies to the "next cluster" I build, since existing clusters work now)

As for the other part, since you stripped the context of my quote, including the important part, I'll requote it

> presumably it's mostly done, and is being tested/validated/integrated, so "AWS sucks, everyone else has had this forever" isn't really a useful contribution.

Leaving aside the toxic phrasing of this feedback, "AWS sucks, everyone else has had this forever" tells a Product Owner nothing about a feature which is already in the delivery pipeline. That sort of information is more useful when deciding if and where to prioritise a feature, or if the PO has (for whatever reason) never looked at their competition's offerings.

Once it's at the stage of the pipeline I presumed it to be at, it's very unlikely that someone is going to slap their forehead and say "Oh! We should just ship that, instead of sitting on the ready-to-go feature in order to feast on the tears of our users" (or whatever reaction one expects from such comments).

This is by far the most 👍'd feature request in the Coming Soon bucket (by a multiple of 5 over its next-closest), and I certainly assume that the person/people managing this backlog can count.

@teochenglim

> Hi, this has been merged and it seems to still work with the official AMI. […]
>
> Still not working for me, even with create_launch_template set to true :(
>
> Are you using the master version of the module? The latest release with this PR only came out today […]

@ArchiFleKs
With or without Terraform, it is the same error. I don't think changing the version will help, because Terraform has no problem; the problem is on the AWS console side.
It works on the Terraform side, but the AWS console displays the wrong launch template.

@teochenglim

> Some features took 15 days to change from Coming Soon to Shipped. […]
>
> @TBBle […] I totally disagree. As a product owner, I think this is a REALLY useful contribution to my product.

@EvertonSA If the Product Owner is serious about his product and takes into consideration that different users have different needs based on what they already have, he/she should make things flexible. We (the community) spend time and effort to use this product. If he/she decides that out of the long list of kubelet flags (https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/) only one flag will be exposed, is that "good enough" to solve all the problems in the world, and should the community be happy about it?

Besides, "KUBELET_EXTRA_ARGS" has existed for a long time; did he/she decide to remove it and create this problem? eksctl's overrideBootstrapCommand also got behavioural changes.

My point is, a few months back we had the freedom to choose what to do; now everything is buggy and we worry that the next version will change it all again. So for each release (roughly every 3 months), we have to revisit this again? And pray hard it works this time?

Most other users have already dropped the case (GitHub issues get closed without anyone knowing why, and people give up); they will just claim EKS doesn't work for them. But I am still here.

@EvertonSA

> That depends on your needs and priorities. […] This is by far the most 👍'd feature request in the Coming Soon bucket […] and I certainly assume that the person/people managing this backlog can count.

Thanks for the input. I did not mean to be toxic or rude. I totally understand your reactions. Our opinions might not get along, but that's fine.

Regarding Terraform development, I will wait until I get feedback from my team.

@EvertonSA

> @EvertonSA If the Product Owner is serious about his product and takes into consideration that different users have different needs […] Most other users have already dropped the case […] But I am still here.

Thanks for the input. Let's say she doesn't really care what I do with the kubelet, as long as AWS Business Support doesn't turn its back on us if we need them. I totally understand your reactions.

@ArchiFleKs commented May 8, 2021 via email

@mikestef9 (Contributor, Author) commented May 11, 2021

Hey folks,

Native support for Kubernetes taints is now available in managed node groups!
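
For example, a taint can now be passed when creating a node group (the names, subnet, and role ARN below are placeholders; see the EKS docs for the full argument list):

#!/bin/bash
# Sketch: create a managed node group with a taint directly via the EKS API
aws eks create-nodegroup \
  --cluster-name my-cluster \
  --nodegroup-name tainted-ng \
  --subnets subnet-0abc1234567890def \
  --node-role arn:aws:iam::111122223333:role/eksNodeRole \
  --taints key=dedicated,value=private,effect=NO_SCHEDULE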

@teochenglim

@mikestef9 thank you for that; saw it on my console today.
