cpu oversubscription #23920
@noffke if Nomad doesn't set the CPU shares, the kernel still applies a default weight to the container's cgroup. Have you considered using `resources.cores` to give the task exclusive access to CPUs?
@tgross thank you for your explanation! From looking at the running docker containers, and also from my understanding, that doesn't seem to be the case on our (plain Ubuntu) systems. If I don't set any resource limits on the docker command line, the containers don't appear to be constrained.
The service in question doesn't have a constantly high CPU load but rather a medium load with bursts, so if I assign CPUs exclusively, they'd be unavailable to the other containers even while they're idle, in my understanding.
Right, but "any other processes on the system" are also constrained by cgroups, because that's how the so-called Completely Fair Scheduler (CFS) works. Docker doesn't apply any extra constraints, but the process is still constrained with a default `cpu.weight` of 100.

Assuming you're on cgroups v2 (which is likely unless you're using a very old Ubuntu), to give this process as much CPU weight as possible you'd have to set a very large `cpu` value. The catch with that is the Nomad scheduler then assumes those CPU resources aren't available. Which probably means you're really looking for "CPU oversubscription" here, which would have to be implemented in the scheduler and not the task driver.

This is all an unfortunate consequence of having a generic "CPU resource" that different task drivers handle in different ways. Or even different ways for the same task driver, depending on kernel options, as we see with Docker!

Additional context about CPU resources...

For example, let's compare running a process by itself, running a process with Docker directly but without any CPU options, and running it via a Nomad jobspec. First I'll run busybox's `httpd` directly.
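A sketch of that first step, reading the default weight of an unconstrained host process (using `sleep` as a portable stand-in for the `busybox httpd` command; a cgroups v2 host is assumed):

```shell
# Start a long-running process directly on the host
# (a stand-in for `busybox httpd -vv -f -p 8001 -h /local`)
sleep 60 &
PID=$!

# The process lands in whatever cgroup the shell runs in,
# e.g. 0::/user.slice/user-1000.slice/session-1.scope
cat "/proc/$PID/cgroup"

# Read that cgroup's CPU weight: on cgroups v2 the default is 100
CG=$(cut -d: -f3- "/proc/$PID/cgroup")
cat "/sys/fs/cgroup${CG}/cpu.weight"

kill "$PID"
```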
Next I'll run the same process in a container:
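That might look like the following; `cpu-demo` is a name I've chosen for illustration, and no CPU-related flags are passed at all:

```shell
# Run the same httpd under Docker with no CPU options.
# (-h /tmp instead of /local, since /local is a directory Nomad creates)
docker run --rm -d --name cpu-demo busybox:1 \
  httpd -vv -f -p 8001 -h /tmp
```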
In another terminal, find the PID for the process and look up its cgroup; we can see that this also has the default `cpu.weight` of 100.
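One way to do that lookup from the host (assuming the container is named `cpu-demo`, my placeholder name, and a cgroups v2 host):

```shell
# Ask Docker for the container's init PID as seen from the host...
PID=$(docker inspect -f '{{.State.Pid}}' cpu-demo)

# ...then resolve its cgroup and read the CPU weight.
# With no --cpu-shares given, this is still the cgroups v2 default of 100.
CG=$(cut -d: -f3- "/proc/$PID/cgroup")
cat "/sys/fs/cgroup${CG}/cpu.weight"
```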
Lastly, we'll run a minimal Nomad jobspec with the `docker` driver:

```hcl
job "example" {
  group "group" {
    task "task" {
      driver = "docker"

      config {
        image   = "busybox:1"
        command = "httpd"
        args    = ["-vv", "-f", "-p", "8001", "-h", "/local"]
      }

      resources {
        cpu    = 200
        memory = 50
      }
    }
  }
}
```

When we look up the cgroup, we can see a `cpu.weight` of 8.
Why 8? That's Docker mapping the legacy cpu-shares value of 200 onto the cgroups v2 `cpu.weight` range: the container runtime converts a shares value in [2, 262144] to a weight in [1, 10000], and 200 shares comes out as a weight of 8.
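The conversion can be reproduced with shell arithmetic. To the best of my knowledge this is the formula the OCI runtime (runc) uses; treat the exact constants as an assumption rather than documented Nomad behavior:

```shell
# Convert a cgroups v1 cpu-shares value ([2, 262144]) to a
# cgroups v2 cpu.weight value ([1, 10000])
shares=200
weight=$(( 1 + ((shares - 2) * 9999) / 262142 ))
echo "$weight"   # → 8
```

The same formula maps Docker's default of 1024 shares to the cgroups v2 default-equivalent weight of 39, which is why an unset value and a small explicit value can behave so differently.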
I'm also realizing here that the documentation in https://developer.hashicorp.com/nomad/docs/drivers/docker#cpu is stale and doesn't account for cgroups v2.
@tgross Thanks again for your very detailed explanation!
Proposal
Please allow opting out of setting CPU shares in the docker driver.
Nomad uses the "cpu" resource of a job config for planning/performing allocations and, in the case of the docker driver, the value is then also set as CPU shares (https://docs.docker.com/engine/containers/resource_constraints/#cpu) when starting a container. We're running Nomad on our own hardware, so outside of general system performance considerations we don't have to worry about consuming too many CPU cycles, unlike in a paid cloud environment. As a form of low-effort, at-your-own-risk CPU overprovisioning, it would be great if there were an option to tell Nomad to simply not set any CPU shares when starting a Docker container.
Use-cases
We're running a CPU-intensive service via Nomad that, even though it is given most of a machine's available CPU, still gets throttled to the point of being unusable.
Attempted Solutions
Running the service via Docker on a Nomad client machine without Nomad (i.e. starting it manually on the command line) and not setting CPU shares makes the service run fine. Ideally we'd like to keep running the service via Nomad, so we can avoid managing one service outside of it.
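For comparison, the manual invocation boils down to something like this (image and container names are placeholders, not from the report):

```shell
# Started by hand on the client machine: no --cpu-shares flag, so the
# container keeps the default cpu.weight of 100 and is never down-weighted
# relative to its neighbours
docker run -d --name my-service my-service-image:latest
```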