Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MachineSet and ControlPlanes with longer names no longer reconcile #7710

Open
killianmuldoon opened this issue Dec 8, 2022 · 12 comments
Open
Labels
help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. kind/bug Categorizes issue or PR as related to a bug. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@killianmuldoon
Copy link
Contributor

killianmuldoon commented Dec 8, 2022

On main, v1.3.0 and v1.2.7 MachineSets and ControlPlanes with longer names can no longer be reconciled. This is due to using the MachineSet name as the value in a label which has different validation rules than a Kubernetes object name. The most important difference being a maximum length of 63 characters for label values.

The issue was introduced as part of the adoption fix for Machines which was backported to the v1.2 release branch - #7591

This can be resolved by hashing overly-long names when applying the label. This will result in a worse UX for users that have MachineSets with long names as the label will no longer be human-readable.

As part of fixing this we can regularlize and improve the way objects in CAPI are named in order to prevent this issue from recurring and limit the situations where the hashing needs to be done.

The following would improve the situation:

  1. Update labelling strategy to reduce the length of MS and CP labels when overly-long. (described above) 🐛 Add name hashing for long MS and KCP names #7711
  2. Explicitly enforce max length of 63 characters for Cluster name. This is currently enforced implicitly as the Cluster name is used as a label value in multiple objects. 🌱 Add explicit length check for cluster and md names #7712
  3. Explicitly enforce max length of 63 characters for MachineDeployment name length in MD webhook. This is currently enforced implicitly as the MD name is used as a label value in multiple objects. 🌱 Add explicit length check for cluster and md names #7712
  4. Use SimpleNameGenerator for generating names for MachineSets. This will implicitly enforce a maximum MS name length of 63 characters for MSes created by the MD controller.

/kind bug

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Dec 8, 2022
@chaunceyjiang
Copy link
Contributor

chaunceyjiang commented Dec 8, 2022

This can be resolved by hashing overly-long names when applying the label.

  1. Update labelling strategy to reduce the length of MS and CP labels when overly-long. (described above)

Which hash algorithm should be used? fnv

@killianmuldoon
Copy link
Contributor Author

@chaunceyjiang Yeah - I've used fnv in my first attempt to fix this - it's used it elsewhere for hashing the machineTemplates in CAPI.

/assign

@fabriziopandini
Copy link
Member

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Dec 12, 2022
@sbueringer
Copy link
Member

sbueringer commented Dec 22, 2022

@killianmuldoon Looks like 1-3 is fixed.

I'm probably missing something but to me it looks like 4. is also very critical.

MS name seems to be calculated via:

Name:      d.Name + "-" + apirand.SafeEncodeString(machineTemplateSpecHash),

This looks to me like as soon as a MD is created which has the max length it won't work as it's unable to create MachineSets.

Or is that not necessary as we don't have the restriction on MS name length anymore as we're using hashing for the labels?

Am I missing something?

P.S. Bonus question: Looks like in the Cluster topology controller we create MD names with the SimpleNameGenerator. Doesn't this mean that if we end up with a MD with max length there, the corresponding MachineSets cannot be created?

P.S.2 Just asking to understand the priority

@killianmuldoon
Copy link
Contributor Author

Or is that not necessary as we don't have the restriction on MS name length anymore as we're using hashing for the labels?

Exactly - the MS name generation isn't critical as we don't directly limit MS names - just names for MDs and Clusters. Longer MS / KCP names still work, but they're hashed in the Cluster label. It would be a good idea to fix name generation in a backward compatible way though - and maybe backport the fix if desired. It's a harder fix though because the MS name touches the rollout logic which is why it's generated in this way today.

There's also a broader UX issue where long names lose their suffix with the SimpleNameGenerator. This is a UX issue for e.g. MDs created by the Cluster topology controller as they lose their meaningful suffix e.g. md-0

@sbueringer
Copy link
Member

sbueringer commented Dec 22, 2022

Given that we don't have an actual length limitation on MachineSets, I wonder if we really should use the SimpleNameGenerator. For MachineSets it seems like we could lose the meaningful name which means it would be hard to easily correlate MachineSets to MachineDeployments?

(And thinking one step further, the same might apply to Machine => MachineSet?)

P.S. not sure if it's to ClusterClass specific but we could have a column for MDs which use the topology name as a value (?)

@killianmuldoon
Copy link
Contributor Author

Yeah - I think there's some thinking to be done on how we link objects across the hierarchy. I don't think this is a critical issue right now so there's time to consider it and bring it to the community.

The length limit for MS right now is soft, and just results in a degraded UX and maybe hard-to-use API as the label may be hashed some time. We currently lose meaningful names today for MDs and control planes in any case where the name is too long, so this isn't just an issue for MachineSets.

@fabriziopandini
Copy link
Member

From my point of view 1,2,3 were a regression introduced by #7591, while 4 is an issue with long names existing for a long time.

We are now in a condition where the regression has been fixed, and we can now look at the problem more in a holistic way.

With this regard, I think that limiting names in Cluster and Machine deployments is a step in the right direction; we can now think about

  • add documentation reccommanding short names, explaining long names = degraded ux
  • surface hierachy in columnd (MD name, not the MS name which is a technical detail the user should not really care about)

@killianmuldoon
Copy link
Contributor Author

/help

@k8s-ci-robot
Copy link
Contributor

@killianmuldoon:
This request has been marked as needing help from a contributor.

Guidelines

Please ensure that the issue body includes answers to the following questions:

  • Why are we solving this issue?
  • To address this issue, are there any code changes? If there are code changes, what needs to be done in the code and what places can the assignee treat as reference points?
  • Does this issue have zero to low barrier of entry?
  • How can the assignee reach out to you for help?

For more details on the requirements of such an issue, please see here and ensure that they are met.

If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.

In response to this:

/help

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. label Nov 7, 2023
@killianmuldoon
Copy link
Contributor Author

/unassign

@fabriziopandini
Copy link
Member

/priority important-longterm

@k8s-ci-robot k8s-ci-robot added the priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. label Apr 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. kind/bug Categorizes issue or PR as related to a bug. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
None yet
Development

No branches or pull requests

5 participants