Provision a fleet of K8s clusters in multiple clouds with GitOps
In one of my previous posts I showed how we can use CAPI (Cluster API) and Flux to declare and provision AKS clusters. The beauty of CAPI is that it was designed as a cloud-agnostic API. So there is a generic abstraction layer (CAPI) and there are providers for different platforms (CAPZ for Azure, CAPA for AWS, CAPV for vSphere, etc.). Just like with Terraform. Only better. See the list of all providers that support CAPI.
In this post I am considering a use case where a fleet of K8s clusters is spread across two different clouds: Azure and AWS.
The platform admin team declares all their clusters in a single source of truth, a fleet repo. To declare a cluster, you need to describe a number of custom resources in a yaml file. There are generic resource kinds defined at the CAPI level (e.g. “Cluster”, “MachineDeployment”) and cloud-specific kinds defined by each cloud provider (e.g. “AzureManagedControlPlane”, “AWSManagedControlPlane”, “AWSMachineTemplate”). Obviously, the yaml files with cluster definitions for different clouds look different and contain different resource kinds. It’s convenient to encapsulate those resources in Helm charts. So there is a Helm chart per cloud provider, and in order to declare a K8s cluster in a cloud you need to create a Helm release that refers to the corresponding chart and provides the chart values such as cluster name, number of nodes, K8s version, etc.
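To make the "one Helm release per cluster" idea concrete, here is a minimal sketch that builds a Flux HelmRelease manifest for a cluster in either cloud. The chart paths, the "fleet" GitRepository name, and the value keys (clusterName, nodeCount, kubernetesVersion) are hypothetical and depend on how your own charts are written. The apiVersion shown is the one used by recent Flux releases; older installations use a beta version instead. The script prints JSON, which is also valid yaml, so the output could be committed to the fleet repo as is.

```python
import json

# Hypothetical chart locations inside the fleet repo, one per cloud provider.
CHARTS = {
    "azure": "./charts/capz-cluster",
    "aws": "./charts/capa-cluster",
}


def helm_release(name, cloud, node_count, k8s_version, namespace="default"):
    """Build a Flux HelmRelease manifest (as a dict) that declares one cluster."""
    return {
        # Assumption: a Flux version that serves the v2 HelmRelease API.
        "apiVersion": "helm.toolkit.fluxcd.io/v2",
        "kind": "HelmRelease",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "interval": "10m",
            "chart": {
                "spec": {
                    # The chart is picked by cloud; everything else is the same.
                    "chart": CHARTS[cloud],
                    # Hypothetical GitRepository source pointing at the fleet repo.
                    "sourceRef": {"kind": "GitRepository", "name": "fleet"},
                }
            },
            # Hypothetical value keys; they must match what the chart expects.
            "values": {
                "clusterName": name,
                "nodeCount": node_count,
                "kubernetesVersion": k8s_version,
            },
        },
    }


# One cluster per cloud, declared the same way.
fleet = [
    helm_release("dev-azure", "azure", 3, "1.29.4"),
    helm_release("dev-aws", "aws", 2, "1.29.4"),
]
for release in fleet:
    print(json.dumps(release, indent=2))
```

The point of the sketch is that the declaration of a cluster looks the same regardless of the cloud: only the chart reference changes, and the cloud-specific resources stay hidden inside the chart.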
Besides the “fleet” repo, there is a “management” cluster that observes the “fleet” repo and provisions the declared resources in multiple clouds. This is the only job of the management cluster. It doesn't carry any workloads. The management cluster itself can live in any cloud, it can be on-prem, it can be local (kind, k3s/k3d), and it can even be ephemeral with a very short lifespan. Follow the instructions to install CAPZ for Azure and CAPA for AWS on the management cluster.
Once the Helm charts and Helm releases with cluster definitions are pushed to the fleet repo, Flux will deliver them to the management cluster. The Flux Helm controller will reconcile the Helm releases and create the CAPI/CAPZ/CAPA resources in the management cluster. With that in place, the CAPI and CAPZ/CAPA controllers will start to bring the resources to the desired state by provisioning them in the corresponding cloud. So if a cluster definition contains CAPZ resources, they will be provisioned in Azure, and if it contains CAPA resources, they will be provisioned in AWS.
Once a new cluster is provisioned, Flux on the management cluster will use the remote cluster technique to install Flux on the new cluster. It will also remotely create Flux resources such as GitRepository and Kustomization, which reconcile the infrastructure workloads defined in the fleet repo.
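As a rough illustration of the remote cluster technique, the sketch below builds a Flux Kustomization that lives on the management cluster but applies its manifests to the new cluster. It relies on two things I am confident about: CAPI stores the kubeconfig of each provisioned cluster in a Secret named "<cluster-name>-kubeconfig", and a Flux Kustomization can point at such a Secret through spec.kubeConfig.secretRef. The resource names, the "fleet" source, and the repo path are hypothetical.

```python
import json


def remote_kustomization(cluster_name, path, namespace="default", source="fleet"):
    """Build a Flux Kustomization that reconciles `path` onto a remote cluster."""
    return {
        "apiVersion": "kustomize.toolkit.fluxcd.io/v1",
        "kind": "Kustomization",
        # Created on the management cluster, next to the CAPI Cluster resource.
        "metadata": {"name": f"{cluster_name}-infra", "namespace": namespace},
        "spec": {
            "interval": "10m",
            "prune": True,
            # Hypothetical path in the fleet repo holding the manifests to apply.
            "path": path,
            # Hypothetical GitRepository source pointing at the fleet repo.
            "sourceRef": {"kind": "GitRepository", "name": source},
            # Secret generated by CAPI once the cluster is provisioned; it must
            # be in the same namespace as this Kustomization.
            "kubeConfig": {"secretRef": {"name": f"{cluster_name}-kubeconfig"}},
        },
    }


manifest = remote_kustomization("dev-azure", "./infrastructure/flux")
print(json.dumps(manifest, indent=2))
```

If the path contains the Flux installation manifests together with a GitRepository and a Kustomization, applying it remotely bootstraps Flux on the new cluster, which then continues to reconcile itself from the fleet repo.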
The sample in this post implements a simple CAPI setup for a multi-cloud environment. There is also an advanced configuration pattern with multiple management clusters. In that pattern each cloud may have a dedicated management cluster which, in turn, is also created with CAPI by a “master management cluster”. The advanced pattern addresses the bottleneck and the security challenges of a single management cluster. Those are well described in the Weaveworks article GitOps and Cluster API: Multi-cluster Manager.
That’s it!