Support configurable and overridable limits and boundaries for nodes and workloads #60

Open
autodidaddict opened this issue Jan 17, 2024 · 1 comment
Labels
0.3.x On 0.3.x Roadmap

Comments

@autodidaddict
Contributor

In the current implementation, the only place you can specify things like rate limiters is the machineconfig template defined in the node configuration (which is really just a Firecracker rate limiter JSON definition). We should be able to put more constraints on the node as a whole, and also be able to override some or all of the machineconfig template in an individual workload.

Note: once we allow overriding the Firecracker constraints on a per-workload basis, we will have to document a caveat. If you override the machine config template for a given workload, that workload will be started from scratch and will not use a warm VM from the pool.

Incomplete list of Node limitations we should probably support:

  • Max machines deployed
  • Max total bytes of all workloads deployed
  • Max bytes of individual workloads
  • Max aggregate memory allocated to machines
  • Max total CPU cores allocated to machines

The above limits can be specified either per node (within a single nex process) or per namespace. In the per-namespace case, a limit such as max machines deployed is enforced for each namespace rather than for the node as a whole.
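
To make the node limits above concrete, here is a minimal sketch of an admission check in Go. Every name in it (`NodeLimits`, `Usage`, `Request`, `Admit`, `ErrLimitExceeded`) is hypothetical and not part of nex today, and it assumes a zero value means "no limit". For per-namespace enforcement, the same check could run against one `Usage` value per namespace instead of one per node.

```go
package main

import (
	"errors"
	"fmt"
)

// NodeLimits is a hypothetical shape for the node limits listed in this issue.
// A zero value in any field means "no limit" (an assumption of this sketch).
type NodeLimits struct {
	MaxMachines           int
	MaxTotalWorkloadBytes int64
	MaxWorkloadBytes      int64
	MaxTotalMemoryMiB     int64
	MaxTotalVCPUs         int
}

// Usage tracks what is currently deployed in one scope (a node or a namespace).
type Usage struct {
	Machines      int
	WorkloadBytes int64
	MemoryMiB     int64
	VCPUs         int
}

// Request describes the resources one new workload would consume.
type Request struct {
	WorkloadBytes int64
	MemoryMiB     int64
	VCPUs         int
}

// ErrLimitExceeded is wrapped by every rejection returned from Admit.
var ErrLimitExceeded = errors.New("limit exceeded")

// Admit returns nil if deploying req on top of u stays within l.
func (l NodeLimits) Admit(u Usage, req Request) error {
	switch {
	case l.MaxMachines > 0 && u.Machines+1 > l.MaxMachines:
		return fmt.Errorf("%w: max machines deployed is %d", ErrLimitExceeded, l.MaxMachines)
	case l.MaxWorkloadBytes > 0 && req.WorkloadBytes > l.MaxWorkloadBytes:
		return fmt.Errorf("%w: max bytes per workload is %d", ErrLimitExceeded, l.MaxWorkloadBytes)
	case l.MaxTotalWorkloadBytes > 0 && u.WorkloadBytes+req.WorkloadBytes > l.MaxTotalWorkloadBytes:
		return fmt.Errorf("%w: max total workload bytes is %d", ErrLimitExceeded, l.MaxTotalWorkloadBytes)
	case l.MaxTotalMemoryMiB > 0 && u.MemoryMiB+req.MemoryMiB > l.MaxTotalMemoryMiB:
		return fmt.Errorf("%w: max aggregate memory is %d MiB", ErrLimitExceeded, l.MaxTotalMemoryMiB)
	case l.MaxTotalVCPUs > 0 && u.VCPUs+req.VCPUs > l.MaxTotalVCPUs:
		return fmt.Errorf("%w: max total CPU cores is %d", ErrLimitExceeded, l.MaxTotalVCPUs)
	}
	return nil
}

func main() {
	limits := NodeLimits{MaxMachines: 2, MaxTotalMemoryMiB: 1024}
	usage := Usage{Machines: 1, MemoryMiB: 512}

	// Fits: one more machine and 768 MiB total.
	fmt.Println(limits.Admit(usage, Request{MemoryMiB: 256}))
	// Rejected: 512 + 1024 MiB would exceed the aggregate memory limit.
	fmt.Println(limits.Admit(usage, Request{MemoryMiB: 1024}))
}
```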

Overridable Limits that can be supplied with an individual workload:

  • CPU cores
  • Memory
  • Rate limiters as supported by firecracker
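
One possible way to combine the node's template with per-workload overrides is sketched below. The types `MachineLimits` and `Overrides` and the function `Effective` are hypothetical, and the rate limiter is kept as an opaque JSON string rather than modeling the Firecracker schema. The sketch also assumes that an override which happens to equal the template value still counts as "no change", so the workload stays eligible for a warm VM; only a real difference forces a cold start.

```go
package main

import "fmt"

// MachineLimits is a hypothetical, simplified view of the machineconfig template.
type MachineLimits struct {
	VCPUs     int
	MemoryMiB int
	// RateLimiterJSON stands in for a Firecracker rate limiter definition.
	// It is treated as opaque text in this sketch.
	RateLimiterJSON string
}

// Overrides carries the per-workload values. A nil field keeps the template value.
type Overrides struct {
	VCPUs           *int
	MemoryMiB       *int
	RateLimiterJSON *string
}

// Effective applies o on top of tmpl. The second result reports whether a
// warm VM from the pool could still be used, which in this sketch is true
// only when the effective limits are identical to the template.
func Effective(tmpl MachineLimits, o Overrides) (MachineLimits, bool) {
	eff := tmpl
	if o.VCPUs != nil {
		eff.VCPUs = *o.VCPUs
	}
	if o.MemoryMiB != nil {
		eff.MemoryMiB = *o.MemoryMiB
	}
	if o.RateLimiterJSON != nil {
		eff.RateLimiterJSON = *o.RateLimiterJSON
	}
	return eff, eff == tmpl
}

func main() {
	tmpl := MachineLimits{VCPUs: 1, MemoryMiB: 256}
	mem := 512

	eff, warm := Effective(tmpl, Overrides{MemoryMiB: &mem})
	fmt.Printf("effective=%+v warmVM=%v\n", eff, warm)

	eff, warm = Effective(tmpl, Overrides{})
	fmt.Printf("effective=%+v warmVM=%v\n", eff, warm)
}
```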

When we emit the workload_deployed event, we should include the effective limits, meaning the actual limits used in the Firecracker definition for that VM.
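
As an illustration only, the effective limits could travel in the event data roughly like this. The struct names, JSON field names, and the helper functions `encodeEvent` and `decodeEvent` are all assumptions made for this sketch and do not describe the current workload_deployed payload. The rate limiter value in `main` is an example shaped like a Firecracker token bucket, not taken from a real node configuration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// EffectiveLimits is a hypothetical fragment describing what the VM was
// actually started with.
type EffectiveLimits struct {
	VCPUs     int `json:"vcpus"`
	MemoryMiB int `json:"memory_mib"`
	// RateLimiters passes the rate limiter JSON through untouched.
	RateLimiters json.RawMessage `json:"rate_limiters,omitempty"`
}

// WorkloadDeployed is a hypothetical event data shape, not the real nex schema.
type WorkloadDeployed struct {
	WorkloadID      string          `json:"workload_id"`
	Namespace       string          `json:"namespace"`
	EffectiveLimits EffectiveLimits `json:"effective_limits"`
}

// encodeEvent serializes the event data to JSON.
func encodeEvent(ev WorkloadDeployed) ([]byte, error) {
	return json.Marshal(ev)
}

// decodeEvent parses event data produced by encodeEvent.
func decodeEvent(b []byte) (WorkloadDeployed, error) {
	var ev WorkloadDeployed
	err := json.Unmarshal(b, &ev)
	return ev, err
}

func main() {
	ev := WorkloadDeployed{
		WorkloadID: "example-workload",
		Namespace:  "default",
		EffectiveLimits: EffectiveLimits{
			VCPUs:        2,
			MemoryMiB:    512,
			RateLimiters: json.RawMessage(`{"bandwidth":{"size":1048576,"refill_time":1000}}`),
		},
	}
	b, err := encodeEvent(ev)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b))
}
```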

@autodidaddict autodidaddict added the 0.3.x On 0.3.x Roadmap label May 15, 2024
@jordan-rash jordan-rash added this to the 0.3.x milestone May 21, 2024
@jordan-rash jordan-rash removed this from the 0.3.x milestone May 31, 2024
@autodidaddict
Contributor Author

autodidaddict commented Jun 18, 2024

After having had a think about this, I wonder if the per-namespace requirements make sense. On the surface it sounds fine, but the edge cases are numerous and dangerous. If we set a per-namespace limit on the number of workloads and there's a network partition, each side of that partition can no longer see the other's count, so each side could admit workloads up to the limit. When the two sides re-join, we could have double the allowed number of workloads.
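
The arithmetic of that edge case can be shown with a toy simulation. This is not nex code: `side`, `tryDeploy`, and `fillDuringPartition` are made-up names, and each side is modeled as enforcing the limit purely against the count it can see locally.

```go
package main

import "fmt"

// side models one half of a partitioned cluster. It only knows about the
// workloads it deployed itself.
type side struct{ deployed int }

// tryDeploy admits a workload if the locally visible count is below limit.
func (s *side) tryDeploy(limit int) bool {
	if s.deployed >= limit {
		return false
	}
	s.deployed++
	return true
}

// fillDuringPartition lets both sides deploy until each one refuses, and
// returns how many workloads each side ended up with.
func fillDuringPartition(limit int) (int, int) {
	left, right := &side{}, &side{}
	for left.tryDeploy(limit) {
	}
	for right.tryDeploy(limit) {
	}
	return left.deployed, right.deployed
}

func main() {
	const limit = 10
	a, b := fillDuringPartition(limit)
	// After the partition heals, the namespace holds the sum of both sides.
	fmt.Printf("limit=%d, total after re-join=%d\n", limit, a+b)
	// prints "limit=10, total after re-join=20"
}
```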

In short I think it'll take more effort to maintain an accurate view of the per-namespace resource consumption than any benefits we might get. I think all of the node limits are still good ideas, though.
