
Support of instance metadata service #30

Closed
feiskyer opened this issue May 5, 2019 · 17 comments
Assignees
Labels
lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. P2 Priority 2
Milestone

Comments

@feiskyer (Member) commented May 5, 2019

Before CCM, kubelet supported getting Node information from the cloud provider's instance metadata service. This includes:

• NodeName
• ProviderID
• NodeAddresses
• InstanceType
• AvailabilityZone
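For illustration, these fields could be populated from a local metadata endpoint's JSON response. A minimal Go sketch follows; the JSON schema and field names here are hypothetical, not any provider's actual format:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// InstanceMetadata holds the node fields kubelet previously obtained from
// the cloud provider's metadata service. The JSON tags below are
// illustrative, not a real provider schema.
type InstanceMetadata struct {
	NodeName         string   `json:"nodeName"`
	ProviderID       string   `json:"providerID"`
	NodeAddresses    []string `json:"nodeAddresses"`
	InstanceType     string   `json:"instanceType"`
	AvailabilityZone string   `json:"availabilityZone"`
}

// parseMetadata decodes a metadata payload into the fields kubelet needs,
// with no cloud API call involved.
func parseMetadata(payload []byte) (InstanceMetadata, error) {
	var md InstanceMetadata
	err := json.Unmarshal(payload, &md)
	return md, err
}

func main() {
	// A hypothetical payload as a local metadata endpoint might return it.
	payload := []byte(`{
		"nodeName": "node-1",
		"providerID": "azure:///subscriptions/sub-id/resourceGroups/rg/providers/Microsoft.Compute/virtualMachines/node-1",
		"nodeAddresses": ["10.0.0.4"],
		"instanceType": "Standard_D2s_v3",
		"availabilityZone": "eastus-1"
	}`)
	md, err := parseMetadata(payload)
	if err != nil {
		panic(err)
	}
	fmt.Println(md.NodeName, md.InstanceType, md.AvailabilityZone)
}
```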

The instance metadata service could help reduce API throttling issues and shorten node initialization time. This is especially helpful for large clusters.

But with CCM, this is not possible anymore, because the above functionality has been moved to the cloud controller manager. We should add this back into Kubelet.

Since cloud providers are moving out-of-tree, we may need to add new plugins to Kubelet, e.g. via gRPC.

See #14 (#14 is focused on the node controller, while this one is focused on cloud API throttling).

@andrewsykim @justaugustus @craiglpeters Any ideas on this?

@andrewsykim (Member) commented May 5, 2019

ref: #15 #14 #18

@andrewsykim andrewsykim added this to the Next milestone May 6, 2019
@andrewsykim andrewsykim added the P1 Priority 1 label May 6, 2019
@andrewsykim (Member)

I'm not sure if this will be feasible, but at the very least we should focus efforts on making controllers API-quota sensitive before we invest in something like this. We can make API-quota-sensitive controllers a high priority for v1.16 as a first step and go from there. What do you think?

@andrewsykim andrewsykim added P2 Priority 2 and removed P1 Priority 1 labels May 6, 2019
@feiskyer (Member, Author) commented May 7, 2019

> We can make API-quota-sensitive controllers a high priority for v1.16 as a first step and go from there. What do you think?

Yep, that may be the first step. Meanwhile, I'm thinking about solutions for such issues, so that CCM still has the same performance as KCM.

@andrewsykim (Member)

> so that CCM still has the same performance as KCM.

The only controller that differs from KCM is the cloud node controller; I think kubernetes/kubernetes#75405 should help with this a lot.

@feiskyer (Member, Author) commented May 7, 2019

> I think kubernetes/kubernetes#75405 should help with this a lot

Yep, but still not enough. For example, during cluster provisioning, hundreds of nodes are initialized at the same time. kubernetes/kubernetes#75405 could reduce the errors in the node controller, but node initialization may still be slow because of API throttling.

With instance metadata, however, all nodes could initialize themselves without any API calls, so no API throttling would happen and nodes could be registered much faster.
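To make the throttling argument concrete, here is a back-of-the-envelope estimate in Go. All numbers are assumed for illustration, not measured Azure limits:

```go
package main

import "fmt"

// nodeInitSeconds estimates the API time needed to initialize a cluster
// when every node's metadata must come from the cloud API under a shared
// rate limit. Inputs are illustrative assumptions.
func nodeInitSeconds(nodes, callsPerNode int, apiCallsPerSecond float64) float64 {
	return float64(nodes*callsPerNode) / apiCallsPerSecond
}

func main() {
	// e.g. 500 nodes, 3 API calls each, under a shared 10 calls/sec quota:
	fmt.Printf("%.0f seconds of API time\n", nodeInitSeconds(500, 3, 10))
	// With local instance metadata, the same lookups consume no shared quota.
}
```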

@andrewsykim (Member)

> Yep, but still not enough. For example, during cluster provisioning, hundreds of nodes are initialized at the same time. kubernetes/kubernetes#75405 could reduce the errors in the node controller, but node initialization may still be slow because of API throttling.

Agreed that we should improve this, but we should be mindful that this is a one-time cost at cluster creation; we should optimize for that and nothing more.

@andrewsykim (Member)

@feiskyer are you able to validate if kubernetes/kubernetes#75405 is helping with this problem for Azure?

@feiskyer (Member, Author)

@andrewsykim Yep, of course. Will do.

@andrewsykim (Member)

@feiskyer any updates on this one?

@feiskyer (Member, Author)

@andrewsykim Based on Azure API throttling limits, this is still a hard requirement for the Azure cloud provider. I'm planning to draft a proposal for this during v1.16.

/milestone v1.16
/assign

@feiskyer (Member, Author)

@andrewsykim I'm preparing the proposal here. Would you like to have a look before I send it out as a KEP?

@feiskyer (Member, Author)

Opened the KEP here: kubernetes/enhancements#1158.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 20, 2019
@feiskyer (Member, Author)

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 20, 2019
@cheftako (Member)

/lifecycle frozen

@k8s-ci-robot k8s-ci-robot added the lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. label Oct 21, 2019
@andrewsykim (Member)

/close

@k8s-ci-robot (Contributor)

@andrewsykim: Closing this issue.

In response to this:

> /close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Projects
None yet
Development

No branches or pull requests

5 participants