Provision Secure AKS cluster with GitOps and CAPI
In the previous post I showed how we can use GitOps and CAPI/CAPZ to declaratively provision AKS clusters. To make these clusters production ready, they should meet a set of requirements. Microsoft recommends following the AKS Secure Baseline architecture, which contains guidance for the whole environment topology, including networking, security, identity management, and monitoring of the cluster.
In this post we focus specifically on the requirements for the AKS cluster itself. These requirements can be summarized in the following table:
So there are two categories of Secure AKS requirements: AKS Attributes, which determine how the cluster itself is provisioned and configured, and Infrastructure Workloads, which are supposed to be installed on the provisioned cluster. Let’s see how we can meet the requirements of both categories.
It is convenient to define a cluster with a Flux HelmRelease definition that points to a Helm chart containing all the necessary CAPI/CAPZ custom resources, such as the cluster, control plane, agent pools, etc. CAPI/CAPZ will use them to provision a cluster in Azure. The HelmRelease also specifies values for the Helm chart, such as the cluster name, resource group name, agent pool names, network plugin and so on. This is a good place to configure AKS attributes in compliance with the Secure AKS requirements.
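A minimal sketch of what such a HelmRelease might look like is given here. The attribute names “networkPlugin”, “networkPolicy” and “virtualNetwork” and the pool names come from this post; everything else (the chart path, namespaces, resource names, location and the exact layout of the values) is a hypothetical illustration and depends on how your own chart is written. The apiVersion also depends on the Flux release you run.

```yaml
# Illustrative sketch only: chart path, names and value layout are assumptions.
apiVersion: helm.toolkit.fluxcd.io/v2beta1   # adjust to your Flux release
kind: HelmRelease
metadata:
  name: secure-aks               # hypothetical release name
  namespace: clusters            # hypothetical namespace on the management cluster
spec:
  interval: 10m
  chart:
    spec:
      chart: ./charts/aks-cluster    # hypothetical chart holding the CAPI/CAPZ resources
      sourceRef:
        kind: GitRepository
        name: flux-system
        namespace: flux-system
  values:
    cluster:
      name: secure-aks               # hypothetical
      resourceGroupName: secure-aks-rg
      location: westeurope
    networkPlugin: azure             # Azure CNI
    networkPolicy: azure             # Azure network policy
    virtualNetwork:
      name: secure-aks-vnet          # hypothetical existing vnet with private links
    agentPools:
      - name: clsecpool              # pool names used in this post
      - name: wrsecpool
```

Keeping all cluster attributes in the values section means that a change to the cluster is just a Git commit that Flux reconciles on the management cluster.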
Azure CNI plugin
CAPZ supports two network plugins: “azure” and “kubenet”. The Secure AKS baseline recommends going with “azure”, so we specified it in the “networkPlugin” attribute.
Azure network policy
With CAPZ we can provision an AKS cluster with either the “calico” or the “azure” network policy. Either one meets the Secure AKS requirements; however, the “azure” policy is the recommended one. It is specified in the “networkPolicy” attribute in the HelmRelease.
Azure private link to Key Vaults and ACR
It is recommended to block outbound traffic from a Secure AKS cluster to the outside world, with some exceptions. The cluster needs access to ACR to pull images and to Azure Key Vault to handle CSI secrets. It is suggested to provide this connectivity through private links. So you can prepare your own vnet with the required private links and specify it in the cluster definition. We did that in the “virtualNetwork” attribute in the HelmRelease. CAPZ will use this vnet instead of creating a new one. Alternatively, if the vnet doesn't exist, CAPZ will create it and you can configure it with the private links afterwards.
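On the CAPZ side, the three network-related attributes discussed so far map onto fields of the AzureManagedControlPlane resource. The fragment below is a partial sketch, not a complete manifest: the resource names, CIDR ranges, location and Kubernetes version are assumptions, other required fields are omitted, and the apiVersion depends on the CAPZ release in use.

```yaml
# Partial sketch of the CAPZ resource a chart could render; values are assumptions.
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1   # depends on the CAPZ release
kind: AzureManagedControlPlane
metadata:
  name: secure-aks                 # hypothetical
spec:
  resourceGroupName: secure-aks-rg # hypothetical
  location: westeurope
  version: v1.27.3                 # hypothetical Kubernetes version
  networkPlugin: azure             # Azure CNI
  networkPolicy: azure             # or "calico"
  virtualNetwork:
    name: secure-aks-vnet          # reused if it already exists, created otherwise
    cidrBlock: 10.0.0.0/8
    subnet:
      name: secure-aks-subnet
      cidrBlock: 10.240.0.0/16
```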
Different Subnets for system and user nodes
This is not a strict requirement, but according to the Secure AKS baseline network topology, in some cases you may want to consider having different subnets for different node pools. With CAPZ we can define separate node pools for the system nodes (“clsecpool”) and for the user worker nodes (“wrsecpool”), but both of them will live in the same subnet. At the time of publishing this post, the CAPZ team was working on removing this limitation, so this feature should be available soon.
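As a rough illustration, the two pools could be declared with AzureManagedMachinePool resources along these lines. The pool names are the ones used in this post; the VM size is an assumption, and the companion MachinePool resources that CAPI also requires are left out for brevity.

```yaml
# Partial sketch: both pools land in the single subnet defined on the control plane.
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1   # depends on the CAPZ release
kind: AzureManagedMachinePool
metadata:
  name: clsecpool
spec:
  mode: System            # AKS system node pool
  sku: Standard_D2s_v3    # hypothetical VM size
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AzureManagedMachinePool
metadata:
  name: wrsecpool
spec:
  mode: User              # AKS user node pool for application workloads
  sku: Standard_D2s_v3    # hypothetical VM size
```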
K8s Role-Based Access Control
It is recommended to use Azure RBAC for Kubernetes in a production-ready AKS cluster. However, CAPZ doesn't support it yet, so it has to be configured manually, following the Azure RBAC for Kubernetes Authorization guidance.
Attachment to ACR
To make a provisioned AKS cluster able to pull images from a private ACR registry, you need to attach the registry to the cluster. With the Azure CLI, this is done with the az aks update command and its --attach-acr option.
However, the CAPZ team is working on the ability to specify a user-managed identity, preconfigured with permissions for the private ACR, in the cluster definition YAML. With that in place, the AKS kubelet will be able to pull images without any additional configuration.
As described in the previous post, a workload cluster is provisioned by the CAPI/CAPZ controllers running on a management cluster. Once a new cluster is up and running, the Flux controller, working on the same management cluster, will remotely install the “Infrastructure Workloads” recommended by the Secure AKS baseline:
- AAD Pod Managed Identity
- Azure Key Vault CSI secret provider
- Azure Monitor Prometheus Scraping
- Ingress and Egress network policies
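One way to sketch such a remote installation is a HelmRelease on the management cluster that points at the kubeconfig secret of the workload cluster. CAPI stores that kubeconfig in a secret named after the cluster with a “-kubeconfig” suffix, and Flux can target it through the kubeConfig field. The release name, namespaces, cluster name and HelmRepository source below are assumptions for illustration.

```yaml
# Illustrative sketch: installs one infrastructure workload on the remote cluster.
apiVersion: helm.toolkit.fluxcd.io/v2beta1   # adjust to your Flux release
kind: HelmRelease
metadata:
  name: keyvault-csi-provider        # hypothetical release name
  namespace: clusters                # must be the namespace that holds the kubeconfig secret
spec:
  interval: 10m
  kubeConfig:
    secretRef:
      name: secure-aks-kubeconfig    # <cluster-name>-kubeconfig, created by CAPI
  targetNamespace: kube-system       # namespace on the workload cluster
  chart:
    spec:
      chart: csi-secrets-store-provider-azure
      sourceRef:
        kind: HelmRepository
        name: csi-secrets-store-provider-azure   # hypothetical HelmRepository object
```

The other workloads in the list can follow the same pattern, each with its own HelmRelease (or a Kustomization for plain manifests such as network policies).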
To summarize, we have seen that we can meet pretty much all of the Secure AKS baseline requirements with CAPI/CAPZ. Even though there are still some limitations and inconveniences, all of them are on the roadmap of the CAPZ team. That said, I consider CAPI/CAPZ a very powerful, modern approach to provisioning the full stack of your AKS environments, from development to production.