oxide-controller

This is the server that controls the Ainekko stack deployment on an Oxide sled. It serves these purposes:

  1. It ensures a Kubernetes cluster is running on the sled, providing a loop that checks the state of the cluster and keeps it alive.
  2. It provides a REST API for the Ainekko stack, enabling users to interact with the sled, for example by changing the worker VM count.

This can run standalone on your own device, in a VM, or inside the Kubernetes cluster itself.

All options are configured via CLI flags. Run oxide-controller --help to see the available options.

The recommended production flow, which is also the default, is:

  1. Start locally, pointing to the Oxide sled, which causes a 3-node Kubernetes cluster to be deployed.
  2. Deploy a helm chart to the cluster, which deploys the oxide-controller into the cluster.
  3. Exit locally.
  4. The oxide-controller running in the cluster ensures that the cluster remains up and running, keeps the worker nodes in the desired state, and listens for API calls to change the worker node count.

Note that the above assumes an available OCI image containing the controller. The default image is available in the chart. You can override it.
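
As a minimal sketch of step 1, assuming the Oxide API endpoint and token are picked up from the OXIDE_HOST and OXIDE_TOKEN environment variables used by the Oxide SDK (an assumption; check oxide-controller --help for the exact flags your build expects):

  # assumed: Oxide API credentials come from the environment; verify with --help
  export OXIDE_HOST=https://oxide.example.com
  export OXIDE_TOKEN=<your-api-token>

  # start the controller locally; it deploys the cluster, installs the helm chart, then exits
  oxide-controller --kubeconfig ~/.kube/oxide-controller-config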

Rapid Development Mode

For rapid development purposes, you may want to make changes to the controller and run it with go run or dlv debug, or even compile it and run it locally. In all of those cases, you are not yet publishing an OCI image to a registry. If you set --controller-oci-image dev:<path-to-src>, the controller will compile a binary from the provided source path and use it in place of a published image.

By default, the binary is built for the same platform (OS and architecture) as the host system. If you want to build for a different platform, you can set the --controller-oci-image flag to dev:<path-to-src>:<platform>, where <platform> is the target platform in the format <os>/<arch>. For example, dev:./cmd/oxide-controller:linux/amd64 would build for Linux on AMD64 architecture.
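
For example, a from-source debug run might look like the following; the ./cmd/oxide-controller package path is taken from the example above and may differ in your checkout:

  # run the controller from source, building the in-cluster controller binary for linux/amd64
  go run ./cmd/oxide-controller --controller-oci-image dev:./cmd/oxide-controller:linux/amd64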

This is useful for testing changes to the controller itself, but should not be used in production.

REST API

The REST API is fairly simple, with the following endpoints:

  • GET /nodes/workers - get the targeted count of worker nodes
  • POST /nodes/workers/modify - modify the targeted count of worker nodes. Content-Type should be application/text; the body should be just the new count.
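
For example, assuming the controller's API is reachable at localhost:8080 (the actual listen address depends on how you run it):

  # read the targeted worker node count
  curl http://localhost:8080/nodes/workers

  # change the targeted worker node count to 5
  curl -X POST -H "Content-Type: application/text" -d "5" http://localhost:8080/nodes/workers/modify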

ssh Keys

The oxide-controller uses two ssh keys:

  • A dedicated short-lived public/private key pair, created by the oxide-controller for the initial cluster node only
  • An optional existing public key you provide

The optional user-provided SSH public key is injected into every node that is created, so you can access the nodes directly should you choose to do so.

The key pair generated by the oxide-controller is used to access just the initial cluster control plane node via SSH. The controller uses that access to:

  1. Retrieve the join token
  2. Retrieve the kubeconfig file

The controller uses the kubeconfig to:

  • Create a Kubernetes Secret kube-system/oxide-controller with the join token, the user-provided ssh key and the controller-created keypair
  • Create the necessary ServiceAccount, ClusterRole and ClusterRoleBinding for the oxide-controller
  • Deploy the oxide-controller into the cluster
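
Once bootstrapping completes, you can verify these pieces with kubectl. The secret name comes from above; the names of the RBAC objects and the controller workload are set by the helm chart and are assumptions here:

  # the secret holding the join token and ssh keys
  kubectl -n kube-system get secret oxide-controller

  # the controller's RBAC objects and workload (names assumed; check the chart)
  kubectl -n kube-system get serviceaccount,deployment | grep oxide-controller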

The oxide-controller saves the kubeconfig file to the location provided by the --kubeconfig flag; the default, if no flag is provided, is ~/.kube/oxide-controller-config. If that file already exists but cannot be used to access an existing cluster successfully, that is treated as an error.
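
For example, to point the controller at a kubeconfig saved in a non-default location:

  oxide-controller --kubeconfig /path/to/existing/kubeconfig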

When running a controller against an existing cluster, one of the following must be true:

  • It is running inside the cluster and can access it using its ServiceAccount and the ssh key secret
  • It is running outside the cluster and can access it using the kubeconfig file
  • It is running outside the cluster and can access it using the ssh key secret

If none of the above is true, the controller will not be able to access the cluster and will fail. For a new cluster with no control plane nodes, the controller will create the new cluster and inject the new information.

tailscale

The oxide-controller can optionally install tailscale on the nodes in the cluster. This is useful for accessing the nodes from outside the cluster, in addition to, or instead of, using the public IP addresses.

See tailscale.md for more information on how to set up tailscale.
