
[EKS] [request]: Managed Node Groups support for node taints #864

Closed
mikestef9 opened this issue Apr 28, 2020 · 57 comments
Labels
EKS Managed Nodes, EKS Amazon Elastic Kubernetes Service

Comments

@mikestef9 (Contributor)

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Tell us about your request
Add support for tainting nodes through the managed node groups API

Which service(s) is this request for?
EKS

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?
Managed node groups support adding Kubernetes labels as part of node group creation. This makes it easy for all nodes in a node group to have consistent labels. However, taints are not supported through the API.

Are you currently working around this issue?
Running manual kubectl taint commands after new nodes in the node group come up.
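
For illustration, a minimal sketch of that manual workaround (the node name and taint below are placeholders, not from the original report):

#!/bin/bash
# Hypothetical example: taint a freshly joined node by hand
kubectl taint nodes ip-10-0-1-23.ec2.internal dedicated=private:NoSchedule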

@mikestef9 mikestef9 added the EKS Amazon Elastic Kubernetes Service label Apr 28, 2020
@mikestef9 mikestef9 self-assigned this Apr 29, 2020
@TBBle commented May 4, 2020

When this was raised in #585, #507 was tagged as an existing request for this feature, but I think that was confusion... #507 seems to be about Container Insights correctly monitoring tainted nodes, while what we want here (and in #585) is to support setting the taints on Managed Nodegroups as part of a rollout, e.g. with eksctl.

The comment in #585 had nine thumbs-up, on top of the three currently here.

@mikestef9 (Contributor, Author)

@TBBle correct, I wanted to open a separate issue to explicitly track tainting node groups through the EKS API

@karstenmueller

@mikestef9 we would like to see "tainting node groups through the EKS API" progressing, and have bumped it from 12 👍 to 37 as of now.

@aviau commented May 27, 2020

It looks like the bootstrap script used by EKS nodes already supports taints. My understanding is that this would be a small feature to implement, because it would only require modifying the userdata in the launch template to add extra args, just like it's done for labels currently.
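
For reference, a hedged sketch of that userdata with the EKS-optimized AMI's bootstrap script (the cluster name and taint are placeholder values):

#!/bin/bash
# Hypothetical userdata: the bootstrap script accepts extra kubelet flags,
# which is where --register-with-taints would be passed
/etc/eks/bootstrap.sh my-cluster \
  --kubelet-extra-args '--node-labels=role=private --register-with-taints=dedicated=private:NoSchedule'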

@AlbertoPeon

We would love to have this!

@jhcook-ag

"When nodes are created dynamically by the Kubernetes autoscaler, they need to be created with the proper taint and label.
With EKS, the taint and label can be specified in the Kubernetes kubelet service defined in the UserData section of the AWS autoscaling group LaunchConfiguration."

https://docs.cloudbees.com/docs/cloudbees-ci/latest/cloud-admin-guide/eks-auto-scaling-nodes

@TBBle commented Jul 30, 2020

@jhcook-ag You can't specify the UserData for Managed Node Groups when you create them.

You can modify the UserData in the Launch Configuration in the AWS console after creation, but then the Managed Node Groups feature will refuse to touch your Launch Configuration again, and you're effectively now using unmanaged Node Groups, although eksctl will still try to use the Managed Node Groups API and fail.

@jhcook-ag

@mhausenblas we really need this 👍

@borisputerka-zz

Absolutely would love the idea.

@Lincon-Freitas

It is a must-have feature!

@vcucereanu

👍

@martinoravsky

This is a must-have feature for us as well. We can't use managed node groups because of this. When would you expect this to be released? (just roughly) 👍

@Dudssource

Hi @martinoravsky, I believe this feature is available now.

https://aws.amazon.com/blogs/containers/introducing-launch-template-and-custom-ami-support-in-amazon-eks-managed-node-groups/

We did it by customizing the userdata on the custom launch template and specifying the taints for the kubelet (using the register-with-taints argument).

@martinoravsky

Hi @Dudssource ,

are you using custom AMIs? I'm using launch templates with EKS-optimized AMIs, which include UserData that bootstraps the node to the cluster automatically (with --kubelet-extra-args empty). This userdata is not editable for us; we can only add our own UserData as a MIME multipart file, which has no effect on bootstrapping the cluster. I'm curious if you were able to get this to work without custom AMIs.

@Dudssource

@martinoravsky, yes, unfortunately we had to use a custom AMI for this to work.
But we used the same optimized AMI that EKS uses; since we use Terraform, we used a data source to get the latest AMI for our cluster version. I know that this is possible with CloudFormation and Parameter Store too.
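
As an illustration of the Parameter Store approach, the latest EKS-optimized AMI ID can be fetched from the public SSM parameter (the Kubernetes version below is a placeholder):

#!/bin/bash
# Hypothetical lookup of the latest EKS-optimized Amazon Linux 2 AMI for a given cluster version
aws ssm get-parameter \
  --name /aws/service/eks/optimized-ami/1.18/amazon-linux-2/recommended/image_id \
  --query 'Parameter.Value' --output text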

@mikestef9 (Contributor, Author)

The approach that @Dudssource used here is certainly an option, but we do plan to add taints directly to the EKS API (similar to labels), so that a custom AMI is not required.

@lwimmer commented Sep 15, 2020

I've found a solution (admittedly quite hackish) to allow setting taints with the official AMIs:

Set the userdata for the Launch Template similar to this:

MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="==7561478f-5b81-4e9d-9db6-aec8f463d2ab=="

--==7561478f-5b81-4e9d-9db6-aec8f463d2ab==
Content-Type: text/x-shellscript; charset="us-ascii"

#!/bin/bash
sed -i '/^KUBELET_EXTRA_ARGS=/a KUBELET_EXTRA_ARGS+=" --register-with-taints=foo=bar:NoSchedule"' /etc/eks/bootstrap.sh

--==7561478f-5b81-4e9d-9db6-aec8f463d2ab==--

This script runs before the bootstrap script (which is managed by EKS), patching /etc/eks/bootstrap.sh to inject the necessary --register-with-taints into the KUBELET_EXTRA_ARGS variable.
This solution is not perfect and might break if AWS changes the bootstrap script, but it works for now and can be used until there is proper support for taints.

@dwilliams782

@lwimmer That is superbly hacky! Good work.

I'm really surprised this feature is missing, and overall I'm shocked at how feature-incomplete node groups are.

@markthebault

+1

1 similar comment
@DovAmir commented Dec 7, 2020

+1

@pierluigilenoci

@EvertonSA 🥳

Yeeeee

@teochenglim

Guys,

I found the same implementation in the AWS Terraform workshop:
https://github.com/aws-samples/terraform-eks-code/blob/master/extra/nodeg2/user_data.tf

That said, I'm hoping to simply pass a flag as a Terraform parameter, not go through this twist-and-turn approach and then worry that the feature breaks again in the next release, just like the eksctl prebootstrap.

I don't know how difficult it would be to make this a configurable value that can be plugged in, just like a "helm chart" value, in Terraform or eksctl. We, the community, can twist and turn to provide a solution. But overall I think all these reasonable production-ready features should be built in, allowing all of us to extend the functionality properly. This has to be answered and committed to by AWS; if the next release wipes it out again, why should anyone waste time doing it?

Cheers.

@ArchiFleKs

Hi, this has been merged and it seems to still work with the official AMI.

I have just tested with the following configuration:

  node_groups = {
    "default-${local.aws_region}a" = {
      ami_type         = "AL2_ARM_64"
      desired_capacity = 1
      max_capacity     = 3
      min_capacity     = 1
      instance_types   = ["t4g.large"]
      subnets          = [dependency.vpc.outputs.private_subnets[0]]
      disk_size        = 20
    }

    "default-${local.aws_region}b" = {
      ami_type         = "AL2_ARM_64"
      desired_capacity = 1
      max_capacity     = 3
      min_capacity     = 1
      instance_types   = ["t4g.large"]
      subnets          = [dependency.vpc.outputs.private_subnets[1]]
      disk_size        = 20
    }

    "default-${local.aws_region}c" = {
      ami_type               = "AL2_ARM_64"
      create_launch_template = true
      desired_capacity       = 1
      max_capacity           = 3
      min_capacity           = 1
      instance_types         = ["t4g.large"]
      subnets                = [dependency.vpc.outputs.private_subnets[2]]
      kubelet_extra_args     = "--node-labels=role=private --register-with-taints=dedicated=private:NoSchedule"
      disk_size              = 20
    }

    "taint-${local.aws_region}c" = {
      create_launch_template = true
      desired_capacity       = 1
      max_capacity           = 3
      min_capacity           = 1
      instance_types         = ["t3a.large"]
      subnets                = [dependency.vpc.outputs.private_subnets[2]]
      kubelet_extra_args     = "--node-labels=role=private --register-with-taints=dedicated=private:NoSchedule"
      disk_size              = 20
    }
  }

And it is working as expected. This PR is based on the fix found here.

@EKami commented May 2, 2021

> Hi, this has been merged and it seems to still work with the official AMI. […] And it is working as expected. This PR is based on the fix found here.

Still not working for me, even with create_launch_template set to true :(

@EvertonSA commented May 5, 2021

> Hi, this has been merged and it seems to still work with the official AMI. […] And it is working as expected. This PR is based on the fix found here.

Thanks for the PR, I will try it out. Your solution is very elegant. Unfortunately, some of our already-running environments are provisioned using raw Terraform resources. I have no clue how much effort it would take to migrate to the terraform aws eks module. I might give it a shot on our development environments in the next few weeks.

Although I strongly support your development, I still think taints should be accepted here: https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/eks_node_group. What happens if I open a ticket with Mr. Bezos and he replies that they won't go further with the incident because I'm using a community module (that modifies the kubelet default behavior) instead of the official product API?

I really don't know what the implications of using community modules are, but according to the documentation, only Enterprise Business Support can have "Third-party software support". So, #AWS, if I encourage my team to fully migrate to this module, will I have issues with your definition of "Third-party software support"? Does "Third-party software support" include kubelet default behavior modifications?

We eagerly await a response.

@EvertonSA commented May 5, 2021

I could not find a definition for "Third-party software support" other than:

Third-party software support – Help with Amazon Elastic Compute Cloud (Amazon EC2) instance operating systems and configuration. Also, help with the performance of the most popular third-party software components on AWS. Third-party software support isn't available for customers on Basic or Developer Support plans.

@teochenglim commented May 5, 2021

Hello guys,

I have been working on this for a couple of months (with and without Terraform). It will not work no matter how hard you try. The problem is that EKS managed node groups append their own user data behind your user data. AWS creates a secondary launch template on your behalf, and the user data on the running instance comes from that new launch template.

You can verify this on your EC2 node, and then you will know what I am talking about. You can compare your launch template in the AWS console with the launch template on the running instance.

# ssh [your_eks_node]
$ curl http://169.254.169.254/latest/user-data

You can also manually view the launch templates (sorted by most recent date).

You can still do it, but the status of the node group creation is "NodeCreationFailure" after waiting 20 minutes for each try.

Cheers,
Cheng Lim

@ArchiFleKs

> Hi, this has been merged and it seems to still work with the official AMI. […]
>
> Thanks for the PR, I will try it out. […] Does "Third-party software support" include kubelet default behavior modifications? We eagerly await a response.

I honestly do not know about the support, but using a custom launch template is supposed to be supported on AWS. If you have support and are using the official AMI, I do not see why you would lose it. I guess the same question applies to people using a custom AMI that AWS has no way to verify; do they also lose support?

@jfoechsler commented May 5, 2021

@teochenglim Not sure what you are referring to, but providing user data in a managed node group launch template works fine and is merged into the EKS-created launch template.

edit
What I meant to say is: yes, there is a known working workaround, but this issue is about support in the AWS API, not a support forum for Terraform etc. I also missed the Coming Soon status of this issue 👍

The fact that this can be used as a workaround to add taints by modifying the EKS bootstrap should, in my view, obviously not be considered a solution. I don't know how this is even missing in EKS when taints have been a Kubernetes feature for a long time.
In Azure managed node pools it has pretty much always been supported

@teochenglim

Hi ArchiFleKs,

My 2 cents: if everything needs to be custom, why EKS? We might as well run on-prem Kubernetes.

Yes, custom launch templates are an option that is supported on AWS now. But it has bugs.

And to be fair, people are just mixing everything together now. Some are talking about the Terraform module, some are talking about eksctl, some are talking about custom or managed node groups. And you are talking about the official AMI?

But based on my simple troubleshooting, an extra launch template is created and your managed node group points to that. This behaviour is the same whether you use Terraform or do it manually in the AWS console. I have yet to try eksctl, but why should I try it, since I am no longer using it?

@teochenglim commented May 5, 2021

> @teochenglim Not sure what you are referring to, but providing user data in a managed node group launch template works fine and is merged into the EKS-created launch template. […]
> In Azure managed node pools it has pretty much always been supported:

az aks nodepool add \
    --resource-group myResourceGroup \
    --cluster-name myAKSCluster \
    --name taintnp \
    --node-count 1 \
    --node-taints sku=gpu:NoSchedule

I tried it today and it doesn't work for me. Can you show me your working version?
This is EKS, why are you showing AKS?
BTW, we are creating EKS using Terraform; we can't add a node group using eksctl.

@TBBle commented May 5, 2021

Given this has gone from "We're Working On It" to "Coming Soon", presumably it's mostly done, and is being tested/validated/integrated, so "AWS sucks, everyone else has had this forever" isn't really a useful contribution.

Workarounds in the meantime are a useful contribution, I think, but support questions about them do generate a bit of noise in this ticket. Is there a terraform-specific place to debug the terraform-based workaround instead, so this ticket can remain focussed on the Managed Node Groups API for this, and maybe just catalog the workarounds (all using custom launch templates now?).

If custom launch templates aren't working correctly, that's not really a "here" thing either. #585 would be closer, but this isn't really a support forum anyway, so you may not have much luck there.

@ArchiFleKs commented May 5, 2021

> Hi ArchiFleKs, my 2 cents: if everything needs to be custom, why EKS? We might as well run on-prem Kubernetes. […] I have yet to try eksctl, but why should I try it, since I am no longer using it?

You still need tools to orchestrate your infrastructure, whether it is managed or not, even if you do it by hand with the AWS console, the awscli, CloudFormation, Terraform, or eksctl.

I agree that the AWS EKS managed node group API should expose a native taints option like it does for labels. Exposing the kubelet args allows people to customize the kubelet as they wish. This allows power users to do custom configuration even with managed node groups.

Even if using a managed service, you still need to use an AMI, either the official one (by official I mean this one: https://docs.aws.amazon.com/eks/latest/userguide/eks-optimized-ami.html) or one you build yourself.

The behavior when building your own is different from the official one when using user data, as explained here: https://docs.aws.amazon.com/eks/latest/userguide/launch-templates.html#launch-template-user-data. There is a merge involved with the official AMI that you do not have when using a custom AMI (which prevents the pre-userdata from being used).

If you can explain your bug in more detail, maybe someone here can help. We are trying to build tools (eksctl or terraform-aws-eks) to abstract this part for the user (just like a managed service does).

Personally I'm using the terraform-aws-eks module; this feature has just been released and is working, at least with the official AMI. I have not tested with a custom AMI.

Let me know if I can help you with this.

@ArchiFleKs

> Hi, this has been merged and it seems to still work with the official AMI. […]
>
> Still not working for me, even with create_launch_template set to true :(

Are you using the master version of the module? The latest release with this PR only came out today: https://github.com/terraform-aws-modules/terraform-aws-eks/releases/tag/v15.2.0

@EKami commented May 5, 2021

> Are you using the master version of the module? The latest release with this PR only came out today: https://github.com/terraform-aws-modules/terraform-aws-eks/releases/tag/v15.2.0

Oh, I thought it was included in version 15.1.0, I'll try with version 15.2.0 then, thanks! :)

@EvertonSA

> Hi, this has been merged and it seems to still work with the official AMI. […]
>
> Thanks for the PR, I will try it out. […]
>
> I honestly do not know about the support, but using a custom launch template is supposed to be supported on AWS. […] do they also lose support?

Yes, I had my ticket dropped a few years ago.

@EvertonSA

Some features took 15 days to change from Coming Soon to Shipped. Other features took months. How long should I wait? Does it make sense to use community Terraform workarounds now that we are on Coming Soon?

@TBBle "so "AWS sucks, everyone else has had this forever" isn't really a useful contribution." I totally disagree. As a product owner, I think this is a REALLY useful contribution to my product.

@TBBle commented May 5, 2021

> How long should I wait? Does it make sense to use community Terraform workarounds now that we are on Coming Soon?

That depends on your needs and priorities. If you need a terraform deployment today, then you can't wait, so don't wait. If you are just tracking this as a blocker for migrating to Managed Node Groups, and are happy with self/un-managed Node Groups in the meantime, then waiting is fine. (I'm in the latter boat, but it's not the only "migration-blocking" feature I'm tracking, and really only applies to the "next cluster" I build, since existing clusters work now)

As for the other part, since you stripped the context of my quote, including the important part, I'll requote it

> presumably it's mostly done, and is being tested/validated/integrated, so "AWS sucks, everyone else has had this forever" isn't really a useful contribution.

Leaving aside the toxic phrasing of this feedback, "AWS sucks, everyone else has had this forever" tells a Product Owner nothing about a feature which is already in the delivery pipeline. That sort of information is more useful when deciding if and where to prioritise a feature, or if the PO has (for whatever reason) never looked at their competition's offerings.

Once it's at the stage of the pipeline I presumed it to be at, it's very unlikely that someone is going to slap their forehead and say "Oh! We should just ship that, instead of sitting on the ready-to-go feature in order to feast on the tears of our users" (or whatever reaction one expects from such comments).

This is by far the most 👍'd feature request in the Coming Soon bucket (by a multiple of 5 over its next-closest), and I certainly assume that the person/people managing this backlog can count.

@teochenglim

> Hi, this has been merged and it seems to still work with the official AMI. […]
>
> Still not working for me, even with create_launch_template set to true :(
>
> Are you using the master version of the module? The latest release with this PR only came out today […]

@ArchiFleKs
With or without Terraform, it is the same error. I don't think changing the version will help, because Terraform has no problem; the problem is on the AWS console side.
It works on the Terraform side, but the AWS console displays the wrong launch template.

@teochenglim

> Some features took 15 days to change from Coming Soon to Shipped. […]
>
> @TBBle […] I totally disagree. As a product owner, I think this is a REALLY useful contribution to my product.

@EvertonSA If the Product Owner is serious about his product and takes into consideration that different users have different needs based on what they already have, he/she should make things flexible. We (the community) spend time and effort to use this product. If he/she decides that out of the long list of kubelet flags (https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/) only one flag will be exposed, is that "good enough" to solve all the problems in the world, and should the community be happy about it?

Besides, "KUBELET_EXTRA_ARGS" has existed for a long time; did he/she decide to remove it and create this problem? eksctl's overrideBootstrapCommand also got behavioural changes.

My point is, a few months back we had the freedom to choose what to do; now everything is buggy and we worry that the next version will change it all again. So for each release (roughly every 3 months), we have to revisit this again? And pray hard it works this time?

Most other users have already dropped the case (GitHub issues get closed without anyone knowing why, and people give up); they will just claim EKS doesn't work for them. But I am still here.

@EvertonSA

> That depends on your needs and priorities. […] This is by far the most 👍'd feature request in the Coming Soon bucket […] and I certainly assume that the person/people managing this backlog can count.

Thanks for the input. I did not mean to be toxic or rude. I totally understand your reactions. Our opinions might not get along, but that's fine.

Regarding Terraform development, I will wait until I get feedback from my team.

@EvertonSA

> @EvertonSA If the Product Owner is serious about his product and takes into consideration that different users have different needs […] Most other users have already dropped the case […] But I am still here.

Thanks for the input. Let's say she doesn't really care what I do with the kubelet, as long as AWS Business Support doesn't turn its back on us if we need them. I totally understand your reactions.

@ArchiFleKs commented May 8, 2021 via email

@mikestef9 (Contributor, Author) commented May 11, 2021

Hey folks,

Native support for Kubernetes taints is now available in managed node groups!
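
For example, a taint can now be passed when creating a node group (the names, subnet, and role ARN below are placeholders; see the EKS docs for the full argument list):

#!/bin/bash
# Sketch: create a managed node group with a taint directly via the EKS API
aws eks create-nodegroup \
  --cluster-name my-cluster \
  --nodegroup-name tainted-ng \
  --subnets subnet-0abc1234567890def \
  --node-role arn:aws:iam::111122223333:role/eksNodeRole \
  --taints key=dedicated,value=private,effect=NO_SCHEDULE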

@teochenglim

@mikestef9 thank you for that; saw it on my console today.
