Provision a fleet of AKS clusters in GitOps fashion

Eugene Fedorenko
Mar 7, 2021

A new team in the organization comes to a platform administrator and asks to provision a new AKS cluster for their new project. The platform admin answers “Sure! Why not?”, takes their beloved Terraform scripts, or even an IaC pipeline that runs them, and provisions a new cluster. Having done that, the admin bootstraps the cluster with Flux and adds it to the fleet repo so that all required infrastructure configurations are installed on the new cluster the GitOps way, from a central source of truth. Done.

But would it be possible to do the whole process in GitOps fashion? So that I just add a new cluster to the fleet repo, open a PR, and … that’s it … Once the PR is merged, after a while a new cluster magically pops up with Flux installed on it and all the infrastructure required in my organization up and running (e.g. ingress controllers, CSI drivers, etc.). Furthermore, the cluster itself is provisioned in compliance with all the security regulations required in my organization. Yes, it’s possible. Cluster API (CAPI) is our friend here.

CAPI is an implementation of the “K8s operator” pattern (resource + controller) to provision and manage K8s clusters. So we can define a “worker” cluster as a custom resource, and a controller running on the “managing” cluster brings the resource to the desired state by actually provisioning or altering the “worker” K8s cluster. This opens a wide door for GitOps: the resource definitions (“worker” cluster descriptors) are delivered to the “managing” K8s cluster from a Git repository by a GitOps operator, e.g. Flux.

There are a number of CAPI providers to provision clusters on different clouds (e.g. Azure, AWS, GCP, VMware, etc.). In my case, Cluster API Provider Azure (CAPZ) makes it happen on Azure.
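To make the “cluster as a resource” idea concrete, here is a minimal sketch of a top-level CAPI Cluster resource for an AKS cluster managed by CAPZ. The name example-aks is hypothetical, the API versions depend on the CAPI/CAPZ releases installed, and the referenced AzureManagedControlPlane and AzureManagedCluster resources (plus the machine pools) are not shown:

```yaml
# Sketch only: the name "example-aks" is hypothetical and the
# API versions may differ for your CAPI/CAPZ release.
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: example-aks
  namespace: default
spec:
  # Control plane of the managed (AKS) cluster, described by a CAPZ resource
  controlPlaneRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: AzureManagedControlPlane
    name: example-aks
  # Azure-specific infrastructure for the cluster, also a CAPZ resource
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: AzureManagedCluster
    name: example-aks
```

The controllers on the managing cluster watch resources like this one and reconcile the real Azure resources to match them.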

The diagram below demonstrates how it actually works:

The managing cluster is any K8s cluster (AKS, Kind, k3s, etc.). It’s not supposed to run any workloads. Its purpose is to observe the Fleet repo and provision or update the “worker” clusters that run the workloads. Therefore, the managing cluster should have a GitOps operator (e.g. Flux) and CAPI/CAPZ installed.

Flux can be installed with the Flux bootstrap command. Having done that, we should define a Flux “clusters” Kustomization to observe the capi/clusters folder in the Fleet repo:
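A minimal sketch of what such a “clusters” Kustomization might look like. It assumes that Flux was bootstrapped against the Fleet repo itself, so the GitRepository source is the flux-system one that bootstrap creates; the interval is an arbitrary choice, and the API version depends on the Flux release in use:

```yaml
# Sketch only: assumes the Fleet repo is the repo Flux was bootstrapped with.
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: clusters
  namespace: flux-system
spec:
  interval: 10m            # how often to reconcile (arbitrary choice)
  path: ./capi/clusters    # folder with one subfolder per cluster
  prune: true              # delete resources whose definitions were removed from Git
  sourceRef:
    kind: GitRepository
    name: flux-system      # source created by "flux bootstrap"
```

With prune enabled, removing a cluster subfolder from the repo also removes the corresponding resources from the managing cluster.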

CAPI with the Azure provider implementation (CAPZ) can be installed on the managing cluster with the following script:

In order to add a cluster to the fleet, we need to create a corresponding subfolder (e.g. pacific-aks) in the capi/clusters folder in the Fleet repo. The pacific-aks subfolder contains a Flux HelmRelease definition pointing to a Helm chart with all the necessary resources (e.g. Cluster, Control Plane, Agent Pool, etc.) that CAPI/CAPZ will use to provision a cluster in Azure. The HelmRelease also specifies values for the Helm chart, such as the cluster name, resource group name, and agent pool names:
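A sketch of how a HelmRelease for the pacific-aks cluster could look. The chart path and every key under values are hypothetical, since they depend entirely on how the cluster chart is written; the chart is assumed to live in the Fleet repo, and the API version depends on the Flux release in use:

```yaml
# Sketch only: the chart location and all value names are hypothetical.
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: pacific-aks
  namespace: flux-system
spec:
  interval: 10m
  chart:
    spec:
      chart: ./capi/charts/aks-cluster   # hypothetical path to the cluster chart
      sourceRef:
        kind: GitRepository
        name: flux-system                # assumes the chart is stored in the Fleet repo
  values:
    clusterName: pacific-aks             # hypothetical value names below
    resourceGroupName: pacific-aks-rg
    agentPools:
      - name: pool1
        nodeCount: 3
```

Adding a cluster to the fleet then amounts to copying such a file into a new subfolder and changing the values.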

Once a new cluster is defined in the Fleet repo, the Flux “clusters” Kustomization reconciles the pacific-aks HelmRelease and creates the CAPI resources in the managing cluster. CAPI/CAPZ sees the new resources and brings them to the desired state by provisioning the corresponding resources (AKS, node pool, subnet, etc.) in Azure.

The cluster Helm chart also defines “flux-system” and “infra” Flux Kustomizations. The first one remotely installs Flux on the newly provisioned cluster so it can manage its workloads independently. The “infra” Kustomization is responsible for installing all required infrastructure configurations on the new cluster. In this example, there is an Nginx ingress controller to be set up on all clusters in the fleet. The “infra” Kustomization remotely creates a Flux “nginx” HelmRelease on the new cluster, which in turn fetches the “nginx” Helm chart from the internet and installs it on the cluster.
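The article does not show how the remote installation is wired up, so the following is only a sketch of one mechanism Flux offers for it: a Kustomization with a kubeConfig reference, which makes Flux apply the manifests to another cluster. CAPI stores the new cluster’s kubeconfig in a secret named after the cluster with a -kubeconfig suffix. The resource name, the namespace, and the ./infra path are assumptions:

```yaml
# Sketch only: names, namespace and path are hypothetical.
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: pacific-aks-infra
  namespace: default           # must be the namespace that holds the kubeconfig secret
spec:
  interval: 10m
  path: ./infra                # hypothetical folder with the nginx HelmRelease etc.
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
    namespace: flux-system
  kubeConfig:
    secretRef:
      name: pacific-aks-kubeconfig   # secret created by CAPI for the new cluster
```

The Kustomization lives on the managing cluster, but everything under its path is applied to the worker cluster identified by the kubeconfig.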

To delete or update a cluster, for example to add a new node pool or increase the number of nodes, we just need to make the changes in the Fleet repo. Flux and CAPI/CAPZ then apply them to Azure automatically.

Note that with this approach there are no scripts or pipelines to run. The Git repository is the source of truth. Create a PR, review, merge, and Flux with CAPI will make it happen.

That’s it!
