# ☸ Kubernetes: architecture, platforms, Cluster API

## Overview

Kubernetes (K8s) is an open-source container orchestrator and the de facto standard for deploying, scaling, and managing containerized applications. It is built on declarative configuration and control loops (reconciliation): you declare the desired state, and controllers continuously drive the actual state toward it.

## Kubernetes deployment methods

| Method | Description | Control plane | Best for |
|--------|-------------|--------------|----------|
| **kubeadm** | Official K8s cluster bootstrap tool | Self-managed (stacked/external etcd) | On-prem, lab, learning |
| **K3s** | Lightweight K8s (Rancher), single binary, embedded etcd/SQLite | Self-managed | Edge, IoT, low-resource, HA with embedded etcd |
| **RKE2** | Rancher Kubernetes Engine 2, CIS-hardened, FIPS-ready | Self-managed | Enterprise on-prem, air-gapped, regulatory |
| **OpenShift** | Red Hat enterprise K8s + operator lifecycle + SDN + routing | Self-managed (RHCOS) | Enterprise, multicluster, platform engineering |
| **Vanilla K8s (CAPI)** | Cluster API: declarative provisioning and lifecycle management | Self-managed (CAPI managed) | Fleet management, GitOps, multi-provider |
| **EKS** (AWS) | Managed K8s | AWS managed | AWS cloud-native, minimal operations overhead |
| **AKS** (Azure) | Managed K8s | Azure managed | Azure cloud-native |
| **GKE** (GCP) | Managed K8s, Standard and Autopilot modes | GCP managed | GCP cloud-native |
| **SKE** (Sangfor) | Managed K8s on Sangfor HCI | Vendor managed | Sangfor HCI ecosystem |

---

## Cluster API (CAPI)

### What is Cluster API

Cluster API is a Kubernetes sub-project (SIG Cluster Lifecycle) that provides declarative APIs for provisioning, upgrading, and operating Kubernetes clusters. Instead of writing Terraform scripts or running `kubeadm` by hand, you define clusters as Kubernetes custom resources: `Cluster`, `Machine`, `MachineDeployment`, and so on.

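
As a rough illustration of what "a cluster as a custom resource" looks like, the sketch below shows a minimal `Cluster` object for the `v1beta1` API. The names (`dev`, `dev-control-plane`), the namespace, the pod CIDR, and the choice of the AWS provider (`AWSCluster`) are assumptions made for illustration; the referenced control plane and infrastructure objects would have to be created alongside it.

```yaml
# Minimal sketch of a Cluster resource (all names and values are illustrative).
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: dev                 # hypothetical cluster name
  namespace: clusters       # hypothetical namespace in the management cluster
spec:
  clusterNetwork:
    pods:
      cidrBlocks: ["192.168.0.0/16"]   # assumed pod CIDR
  # Points at the object that describes the control plane nodes.
  controlPlaneRef:
    apiVersion: controlplane.cluster.x-k8s.io/v1beta1
    kind: KubeadmControlPlane
    name: dev-control-plane
  # Points at the provider-specific object that describes the cluster infrastructure.
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
    kind: AWSCluster
    name: dev
```

The `Cluster` object itself carries little detail; most of the configuration lives in the objects it references, which is what lets the same core API work across different infrastructure and control plane providers.
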

Core principle: **A Kubernetes cluster that manages Kubernetes clusters.**

### Architecture

```
┌─────────────────────────────────────────┐
│           Management Cluster            │
│                                         │
│  ┌──────────────────────────────────┐   │
│  │         CAPI Controllers         │   │
│  │ ┌──────┐ ┌─────────┐ ┌─────────┐ │   │
│  │ │ Infra│ │Bootstrap│ │Control  │ │   │
│  │ │ Prov │ │  Prov   │ │Plane Pr │ │   │
│  │ └──────┘ └─────────┘ └─────────┘ │   │
│  └──────────────────────────────────┘   │
│                                         │
│ CR: Cluster, Machine, MachineDeployment │
│ ...                                     │
└────────────────┬────────────────────────┘
                 │ CAPI controller
                 │ creates / manages
        ┌────────┴────────┐
        ▼                 ▼
┌───────────────┐ ┌───────────────┐
│ Workload      │ │ Workload      │
│ Cluster (dev) │ │ Cluster (prod)│
│ ┌───┐ ┌───┐   │ │ ┌───┐ ┌───┐   │
│ │ CP│ │ W │   │ │ │ CP│ │ W │   │
│ └───┘ └───┘   │ │ └───┘ └───┘   │
└───────────────┘ └───────────────┘
```

- **Management cluster**: a Kubernetes cluster running the CAPI controllers. It can be a small, dedicated admin cluster.
- **Workload (managed) cluster**: a Kubernetes cluster managed by CAPI; each one is represented by a `Cluster` custom resource in the management cluster.
- **Machine**: abstraction of a compute unit (VM, bare metal) that becomes a K8s node.

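
Individual `Machine` objects are rarely written by hand. As a minimal sketch, assuming the kubeadm bootstrap provider and the AWS infrastructure provider, a `MachineDeployment` in the management cluster could declare a pool of worker Machines like this. All names, the replica count, and the Kubernetes version are illustrative, and the referenced templates are assumed to exist.

```yaml
# Sketch of a worker pool for a workload cluster named "dev" (names are illustrative).
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: dev-md-0
  namespace: clusters
spec:
  clusterName: dev          # the Cluster object these Machines belong to
  replicas: 2               # desired number of worker Machines
  selector:
    matchLabels: null       # left empty here; CAPI defaults the selector labels
  template:
    spec:
      clusterName: dev
      version: v1.30.2      # Kubernetes version for the nodes
      bootstrap:
        configRef:          # how each node joins the cluster
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
          kind: KubeadmConfigTemplate
          name: dev-md-0
      infrastructureRef:    # what kind of VM backs each Machine
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
        kind: AWSMachineTemplate
        name: dev-md-0
```

Each replica becomes one `Machine`, which the infrastructure provider turns into a VM and the bootstrap provider turns into a node of the workload cluster.
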

### Key CRDs (Custom Resource Definitions)

| CRD | API group | Purpose |
|-----|-----------|---------|
| **Cluster** | `cluster.x-k8s.io` | Cluster representation (infra ref, control plane ref, networking) |
| **Machine** | `cluster.x-k8s.io` | Individual node (VM/BM instance) |
| **MachineDeployment** | `cluster.x-k8s.io` | Declarative scaling and rolling update of workers |
| **MachineSet** | `cluster.x-k8s.io` | Replica set for Machines (lower-level) |
| **MachineHealthCheck** | `cluster.x-k8s.io` | Auto-remediation (replace unhealthy nodes) |
| **ClusterClass** | `cluster.x-k8s.io` | Cluster template for reuse |
| **KubeadmControlPlane** | `controlplane.cluster.x-k8s.io` | Kubeadm-managed control plane (stacked/external etcd) |
| **KubeadmConfig / KubeadmConfigTemplate** | `bootstrap.cluster.x-k8s.io` | Bootstrap configuration (kubeadm init/join) |

### Provider model

CAPI uses a three-layer provider model:

#### 1. Infrastructure Provider

Creates and manages infrastructure (VM, networks, LB, storage).

| Provider | Platform | Status |
|----------|----------|--------|
| **AWS (CAPA)** | AWS EC2, VPC, ELB, EKS | Stable, SIG-sponsored |
| **Azure (CAPZ)** | Azure VM, VNet, LB, AKS | Stable, SIG-sponsored |
| **GCP (CAPG)** | GCP Compute, VPC, GKE | Beta |
| **vSphere (CAPV)** | VMware vSphere | Stable |
| **OpenStack (CAPO)** | OpenStack compute/network | Stable |
| **Metal3** | Bare metal (Ironic) | Stable |
| **Docker (CAPD)** | Docker containers (development) | Tilt/Dev only |
| **Akamai (Linode)** | Linode | Community |
| **Azure Stack HCI** | Azure Stack HCI | Community |
| **cloudscale** | cloudscale.ch | Community |
| **Exoscale** | Exoscale | Community |
| **IBM Cloud** | IBM Cloud | Community |
| **Equinix Metal** | Equinix (ex Packet) | Community |
| **Hetzner** | Hetzner Cloud | Community |
| **OpenNebula** | OpenNebula | Community |

#### 2. Bootstrap Provider

Handles K8s initialization on a node (kubeadm init/join, TLS certs, tokens).


| Provider | Description |
|----------|-------------|
| **Kubeadm** (built-in) | Standard kubeadm init/join, supports stacked/external etcd |
| **EKS** | Bootstrap for EKS managed control plane (AWS) |
| **K3s** | Lightweight K8s bootstrap (edge, IoT) |
| **RKE2** | Rancher K8s bootstrap, CIS-hardened |
| **Talos** | API-driven bootstrap (Sidero Labs), immutable OS |
| **k0smotron** | K0s-based bootstrap + hosted control plane |
| **MicroK8s** | Canonical MicroK8s bootstrap |
| **Canonical Kubernetes** | Canonical K8s (snap-based) |

#### 3. Control Plane Provider

Manages control plane nodes.

| Provider | Description |
|----------|-------------|
| **KubeadmControlPlane** (built-in) | Kubeadm-managed CP, stacked/external etcd |
| **EKS** | AWS EKS managed control plane |
| **Kamaji** | Hosted control plane (CP runs as a deployment in the management cluster) |
| **K3s** | K3s control plane (edge-optimized) |
| **RKE2** | RKE2 control plane |
| **Talos** | Talos control plane, API-based management |
| **k0smotron** | Hosted control plane (k0s-based) |
| **Nested** | Nested control plane (CP components run as pods inside the management cluster) |

### ClusterClass and Managed Topologies

ClusterClass (part of the `v1beta1` API, available since CAPI v1.0) lets you define a reusable **cluster template**. It was introduced as an experimental feature behind the `ClusterTopology` feature gate, so check whether your CAPI release still requires enabling it:

```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: ClusterClass
metadata:
  name: standard-aws-cluster
spec:
  # Assumed addition: a ClusterClass also references an infrastructure
  # cluster template; the name below is illustrative.
  infrastructure:
    ref:
      apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
      kind: AWSClusterTemplate
      name: aws-cluster-tmpl
  controlPlane:
    ref:
      apiVersion: controlplane.cluster.x-k8s.io/v1beta1
      kind: KubeadmControlPlaneTemplate
      name: aws-cp-tmpl
    machineInfrastructure:
      ref:
        kind: AWSMachineTemplate
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
        name: aws-cp-machine-tmpl
  workers:
    machineDeployments:
      - class: default-worker
        template:
          bootstrap:
            ref:
              apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
              kind: KubeadmConfigTemplate
              name: aws-worker-bootstrap-tmpl
          infrastructure:
            ref:
              apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
              kind: AWSMachineTemplate
              name: aws-worker-machine-tmpl
  # A variable only takes effect once a `patches` entry references it
  # (patches are omitted from this example).
  variables:
    - name: instanceType
      required: true
      schema:
        openAPIV3Schema:
          type: string
          enum: ["t3.large", "m5.large", "m5.xlarge"]
```

Then create a cluster with variable overrides:

```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: dev-team-alpha
  namespace: clusters
spec:
  topology:
    class: standard-aws-cluster   # the ClusterClass defined above
    version: v1.30.2
    controlPlane:
      replicas: 1
    workers:
      machineDeployments:
        - class: default-worker
          name: md-0
          replicas: 2
    variables:
      - name: instanceType
        value: "m5.xlarge"
```

### Cluster lifecycle with CAPI

| Phase | Action | CAPI mechanism |
|-------|--------|----------------|
| **Create** | `kubectl apply -f cluster.yaml` | Controller creates infra (VM, network), runs kubeadm init/join bootstrap |
| **Scale** | Update `replicas` in MachineDeployment | Controller creates/removes Machine → VM → node join/drain |
| **Upgrade** | Change `version` in KubeadmControlPlane / MachineDeployment (or `spec.topology.version` for ClusterClass-based clusters) | Rolling update: new CP node → upgrade → old drain & delete. Workers: MachineDeployment rolling update |
| **Health check** | MachineHealthCheck | If a node stays unhealthy longer than the timeout, the Machine is deleted and its owner (MachineSet or control plane) creates a replacement |
| **Delete** | `kubectl delete cluster <name>` | Controller drains, deletes VMs, cleans up infrastructure |
| **Template update** | Create a new AWSMachineTemplate / KubeadmConfigTemplate and point the reference at it (infrastructure templates are treated as immutable) | New Machines use the new template; existing Machines are replaced only through a rolling update |

### Auto-remediation (MachineHealthCheck)

```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineHealthCheck
metadata:
  name: prod-mhc
  namespace: clusters
spec:
  clusterName: prod-us-east
  # Machines covered by this check (workers of one MachineDeployment)
  selector:
    matchLabels:
      cluster.x-k8s.io/deployment-name: prod-us-east-workers
  # A node is unhealthy if Ready is False or Unknown for 5 minutes
  unhealthyConditions:
    - type: Ready
      status: "False"
      timeout: 5m
    - type: Ready
      status: Unknown
      timeout: 5m
  # Remediate only while at most 40% of the selected Machines are unhealthy
  maxUnhealthy: "40%"
  # A Machine whose node has not joined within 10 minutes counts as unhealthy
  nodeStartupTimeout: 10m
```

### CAPI + GitOps

CAPI integrates naturally with GitOps:

- **ArgoCD**: Cluster and MachineDeployment manifests live in a Git repo; ArgoCD applies them to the management cluster
- **Flux**: `Kustomization` + `OCIRepository` for CAPI objects
- **Crossplane**: can be combined with CAPI; Crossplane provisions cloud resources (VPC, subnets), CAPI manages K8s clusters on top

Pattern: a dedicated "fleet management" cluster running CAPI + ArgoCD. All workload clusters are defined as YAML in Git.

### CAPI for on-prem

| Provider | Use case | Note |
|----------|----------|------|
| **Metal3** (Ironic) | Bare metal provisioning (PXE, IPMI, Redfish) | Automatically provisions BM servers as K8s nodes |
| **CAPV (vSphere)** | VMware VMs as K8s nodes | Most common enterprise on-prem |
| **CAPO (OpenStack)** | OpenStack VMs as K8s nodes | OpenStack-native |
| **Nutanix (CAPNX)** | Nutanix AHV/Prism | Community provider |

### CAPI for edge

| Provider | Use case | Note |
|----------|----------|------|
| **K3s bootstrap + control plane** | Lightweight K8s on edge devices | Single binary, SQLite/embedded etcd |
| **RKE2 bootstrap + control plane** | Enterprise edge, air-gapped | CIS-hardened, FIPS |
| **Talos** | Immutable OS, API-driven | Minimal footprint, no SSH |
| **k0smotron** | Hosted control plane for edge clusters | CP runs in the management cluster, workers on the edge |

### CAPI vs alternatives

| Tool | Approach | CAPI advantage | CAPI disadvantage |
|------|----------|----------------|-------------------|
| **Terraform/Pulumi** | Imperative/declarative IaC | CAPI is K8s-native: same tooling for apps and clusters; GitOps-ready | Terraform has broader non-K8s resource support |
| **kubeadm** | Manual or scripted | CAPI automates the full lifecycle, including upgrades and remediation | Higher complexity, requires a management cluster |
| **Rancher** | Web UI + API for K8s cluster management | CAPI is open-source, vendor-neutral | Rancher has GUI, monitoring, app catalog |
| **OpenShift Hive/ACM** | Red Hat Advanced Cluster Management | CAPI is a community standard (SIG Cluster Lifecycle) with a wider provider ecosystem | ACM has governance, policy, compliance |

### Limitations and maturity

- **Management cluster is a SPOF**: workload clusters keep running if it is down, but no lifecycle operations (scaling, upgrades, remediation) happen until it is back, so it needs its own HA and backup (etcd snapshots, certificates)
- **CAPI is not an autoscaler**: it changes the number of Machines only when `replicas` changes. For node autoscaling, run Cluster Autoscaler separately (it can act on CAPI MachineDeployments); pod autoscaling inside a cluster is outside CAPI's scope
- **Provider maturity varies**: AWS, Azure, vSphere, and OpenStack are listed as stable above, GCP as beta, and some community providers are alpha
- **etcd backup is not built-in**: it must be handled externally (etcd snapshots, or Velero for backups at the API-object level)
- **CAPI does not handle applications**: it covers only the K8s cluster lifecycle (monitoring, logging, and ingress are user-managed)
- **Learning curve**: requires understanding the management cluster, the provider model, and the CRDs
- **CAPI v1.13+ (2026)**: stable release, v1beta1 API is GA, ClusterClass stable, EKS/AKS/GKE managed control plane support

### Recommended production CAPI stack

| Component | Recommendation |
|-----------|---------------|
| **Management cluster** | K3s (small footprint) or kubeadm (3 nodes HA) |
| **Infra provider** | CAPA (AWS) / CAPV (vSphere) / CAPO (OpenStack), depending on platform |
| **Bootstrap/CP provider** | Kubeadm or RKE2 |
| **GitOps** | ArgoCD or Flux |
| **Backup** | Velero + restic/Ceph |
| **Cluster autoscaler** | Cluster Autoscaler (via CAPI integration) |
| **Network** | Cilium (installed as the CNI in each workload cluster; CAPI does not install a CNI itself) |
| **Secrets** | External Secrets Operator / Sealed Secrets |
| **Monitoring** | Prometheus + Grafana (kube-prometheus-stack) |
| **Ingress** | ingress-nginx / Kong / Traefik |

## Sources

Links, books and standards: [sources/infrastructure/sources.en.md](sources/infrastructure/sources.en.md)

*Last revision: 2026-06-18*