This is the server that controls the Ainekko stack deployment on an Oxide sled. It serves these purposes:
- It ensures a Kubernetes cluster is running on the sled, providing a loop that checks the state of the cluster and keeps it alive.
- It provides a REST API for the Ainekko stack, enabling users to interact with the sled, for example by changing the worker VM count.
This can run standalone on your own device, in a VM, or inside the Kubernetes cluster itself.
All options are configured via CLI flags. Run `oxide-controller --help` to see the available options.
The recommended production flow, which is also the default, is:
- Start locally, pointing at the Oxide sled, which causes a 3-node Kubernetes cluster to be deployed.
- Deploy a helm chart to the cluster, which will deploy the oxide-controller into the cluster.
- Exit locally
- The oxide-controller running in the cluster ensures that the cluster remains up and running and that the worker nodes are in the desired state, and it listens for API calls to change the worker node count.
Note that the above assumes an OCI image containing the controller is available. The default image is set in the chart and can be overridden.
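A rough sketch of kicking off that flow from a workstation; the flags that identify the target Oxide sled and project are omitted here (run `oxide-controller --help` to see them), and `--kubeconfig` is shown only for clarity since it has a default:

```bash
# bootstrap a 3-node cluster on the sled, deploy the in-cluster
# controller via the helm chart, then exit locally
oxide-controller \
  --kubeconfig ~/.kube/oxide-controller-config \
  [flags identifying the Oxide sled and project]
```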
For rapid development purposes, you may want to make changes to the controller and run it with `go run` or `dlv debug`, or even compile it and run the binary locally. In all of those cases, you are not yet publishing an OCI image to a registry. If you set `--controller-oci-image dev:<path-to-src>`, the controller will compile a binary from the provided path.
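For example, assuming the controller's main package lives at `./cmd/oxide-controller` (the path is illustrative):

```bash
# run the controller directly from source
go run ./cmd/oxide-controller --help

# or run it under the delve debugger
dlv debug ./cmd/oxide-controller -- --help
```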
By default, the binary is built for the same platform (OS and architecture) as the host system. If you want to build for a different platform, you can set the `--controller-oci-image` flag to `dev:<path-to-src>:<platform>`, where `<platform>` is the target platform in the format `<os>/<arch>`. For example, `dev:./cmd/oxide-controller:linux/amd64` would build for Linux on the AMD64 architecture.
This is useful for testing changes to the controller itself, but should not be used in production.
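For instance, to run a locally modified controller and have it built for the cluster nodes rather than the host (any other flags your setup requires are omitted):

```bash
# build the controller from local source for linux/amd64 and use that
# binary instead of a published OCI image
oxide-controller --controller-oci-image dev:./cmd/oxide-controller:linux/amd64
```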
The REST API is fairly simple, with the following endpoints (see the example calls below):

- `GET /nodes/workers` - get the targeted count of worker nodes
- `POST /nodes/workers/modify` - modify the targeted count of worker nodes. The `Content-Type` should be `application/text`, and the body should just be the new count.
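A sketch with `curl`, assuming the controller is reachable at `localhost:8080` (the actual listen address and port depend on your configuration):

```bash
# read the current target worker node count
curl http://localhost:8080/nodes/workers

# set the target worker node count to 5
curl -X POST \
  -H "Content-Type: application/text" \
  --data "5" \
  http://localhost:8080/nodes/workers/modify
```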
The oxide-controller uses two SSH keys:
- A dedicated short-lived public/private key pair, created by the oxide-controller for the initial cluster node only
- An optional existing public key you provide
The optional user-provided SSH public key is injected into every node that is created, giving you the option to access the nodes directly, should you choose to do so.
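For example, if you supplied your own public key, you could SSH into a node yourself; the username and address below are placeholders, since the login user depends on the node image:

```bash
# connect to a cluster node with the key pair you provided
ssh -i ~/.ssh/id_ed25519 <user>@<node-ip>
```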
The key pair generated by the oxide-controller is used to access just the initial cluster control plane node via SSH. It uses that to:
- Retrieve the join token
- Retrieve the kubeconfig file
The controller uses the kubeconfig to:
- Create a Kubernetes Secret `kube-system/oxide-controller` with the join token, the user-provided SSH key, and the controller-created key pair (see the `kubectl` example below)
- Create the necessary ServiceAccount, ClusterRole and ClusterRoleBinding for the oxide-controller
- Deploy the oxide-controller into the cluster
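Once the cluster is up, that Secret can be inspected with `kubectl`, assuming your current kubeconfig context points at the cluster:

```bash
# show the contents of the controller's secret
kubectl -n kube-system get secret oxide-controller -o yaml
```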
The oxide-controller will save the kubeconfig file to the location provided by the `--kubeconfig` flag. The default, if no flag is provided, is `~/.kube/oxide-controller-config`. If that file already exists but cannot be used to access an existing cluster successfully, it is considered an error.
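For example, you can point `kubectl` at the saved kubeconfig to confirm it still reaches the cluster:

```bash
# verify the saved kubeconfig against the cluster
kubectl --kubeconfig ~/.kube/oxide-controller-config get nodes
```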
When running a controller against an existing cluster, one of the following must be true:
- It is running inside the cluster and can access it using its ServiceAccount and the SSH key secret
- It is running outside the cluster and can access it using the kubeconfig file
- It is running outside the cluster and can access it using the SSH key secret
If none of the above is true, the controller will not be able to access the cluster and will fail. For a new cluster with no control plane nodes, the controller will create the new cluster and inject the new information.
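For the second case, a minimal sketch of reconnecting to an existing cluster from outside, assuming the kubeconfig was saved to the default location (any sled-related flags your setup requires are omitted):

```bash
# run the controller against an existing cluster using the saved kubeconfig
oxide-controller --kubeconfig ~/.kube/oxide-controller-config
```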
The oxide-controller can optionally install Tailscale on the nodes in the cluster. This is useful for accessing the nodes from outside the cluster, in addition to, or instead of, using the public IP addresses. See tailscale.md for more information on how to set up Tailscale.