
# ☸ Kubernetes — architecture, platforms, Cluster API

## Overview

Kubernetes (K8s) is an open-source container orchestrator and the de facto standard for deploying, scaling, and managing containerized applications. It is built on declarative configuration and control loops (reconciliation): you declare the desired state, and controllers continuously work to make the actual state match it.

## Kubernetes deployment methods

| Method | Description | Control plane | Best for |
|--------|-------------|--------------|----------|
| **kubeadm** | Official K8s cluster bootstrap tool | Self-managed (stacked/external etcd) | On-prem, lab, learning |
| **K3s** | Lightweight K8s (Rancher), single binary, embedded etcd/SQLite | Self-managed | Edge, IoT, low-resource, HA with embedded etcd |
| **RKE2** | Rancher Kubernetes Engine 2, CIS-hardened, FIPS-ready | Self-managed | Enterprise on-prem, air-gapped, regulatory |
| **OpenShift** | Red Hat enterprise K8s + operator lifecycle + SDN + routing | Self-managed (RHCOS) | Enterprise, multicluster, platform engineering |
| **Vanilla K8s (CAPI)** | Cluster API — declarative provisioning and lifecycle management | Self-managed (CAPI managed) | Fleet management, GitOps, multi-provider |
| **EKS** (AWS) | Managed K8s | AWS managed | AWS cloud-native, least ops |
| **AKS** (Azure) | Managed K8s | Azure managed | Azure cloud-native |
| **GKE** (GCP) | Managed K8s, Standard and Autopilot modes | GCP managed | GCP cloud-native |
| **SKE** (Sangfor) | Managed K8s on Sangfor HCI | Vendor managed | Sangfor HCI ecosystem |

---

## Cluster API (CAPI)

### What is Cluster API

Cluster API is a Kubernetes sub-project (SIG Cluster-Lifecycle) that brings declarative APIs for provisioning, upgrading, and operating Kubernetes clusters. Instead of Terraform scripts or manual `kubeadm`, you define clusters as Kubernetes Custom Resources — `Cluster`, `Machine`, `MachineDeployment`, etc.

Core principle: **A Kubernetes cluster that manages Kubernetes clusters.**

### Architecture

```
┌─────────────────────────────────────────┐
│           Management Cluster            │
│                                         │
│  ┌──────────────────────────────────┐   │
│  │        CAPI Controllers          │   │
│  │ ┌──────┐ ┌─────────┐ ┌─────────┐ │   │
│  │ │Infra │ │Bootstrap│ │Control  │ │   │
│  │ │Prov. │ │Provider │ │Plane Pr.│ │   │
│  │ └──────┘ └─────────┘ └─────────┘ │   │
│  └──────────────────────────────────┘   │
│                                         │
│  CR: Cluster, Machine, MachineDeployment│
│  ...                                    │
└────────────────┬────────────────────────┘
                 │ CAPI controller
                 │ creates / manages
        ┌────────┴────────┐
        ▼                 ▼
┌───────────────┐  ┌───────────────┐
│ Workload      │  │ Workload      │
│ Cluster (dev) │  │ Cluster (prod)│
│ ┌───┐ ┌───┐   │  │ ┌───┐ ┌───┐   │
│ │ CP│ │ W │   │  │ │ CP│ │ W │   │
│ └───┘ └───┘   │  │ └───┘ └───┘   │
└───────────────┘  └───────────────┘
```

- **Management cluster**: a Kubernetes cluster that runs the CAPI controllers. It can be a small, dedicated admin cluster.
- **Workload (managed) cluster**: a Kubernetes cluster managed by CAPI; each one is represented by a `Cluster` custom resource (an instance of the `Cluster` CRD) in the management cluster.
- **Machine**: an abstraction of a compute unit (VM or bare-metal server) that becomes a K8s node.

### Key CRDs (Custom Resource Definitions)

| CRD | API group | Purpose |
|-----|-----------|---------|
| **Cluster** | `cluster.x-k8s.io` | Cluster representation (infra ref, control plane ref, networking) |
| **Machine** | `cluster.x-k8s.io` | Individual node (VM/BM instance) |
| **MachineDeployment** | `cluster.x-k8s.io` | Declarative scaling and rolling update of workers |
| **MachineSet** | `cluster.x-k8s.io` | Replica set for Machines (lower-level) |
| **MachineHealthCheck** | `cluster.x-k8s.io` | Auto-remediation (replace unhealthy nodes) |
| **ClusterClass** | `cluster.x-k8s.io` | Cluster template for reuse |
| **KubeadmControlPlane** | `controlplane.cluster.x-k8s.io` | Kubeadm-managed control plane (stacked/external etcd) |
| **KubeadmConfig / KubeadmConfigTemplate** | `bootstrap.cluster.x-k8s.io` | Bootstrap configuration (kubeadm init/join) |
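
To make the relationships in the table concrete, here is a minimal sketch of a worker `MachineDeployment` and how it points at a bootstrap template and an infrastructure template. The object names (`prod-us-east`, `prod-us-east-workers`, and both template names) are illustrative assumptions, and the infrastructure kind assumes the AWS provider; other providers ship their own template kinds and API versions.

```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: prod-us-east-workers          # hypothetical name
  namespace: clusters
spec:
  clusterName: prod-us-east           # name of the owning Cluster object (assumed)
  replicas: 3                         # desired number of worker Machines
  selector:
    matchLabels:
      cluster.x-k8s.io/deployment-name: prod-us-east-workers
  template:
    metadata:
      labels:
        cluster.x-k8s.io/deployment-name: prod-us-east-workers
    spec:
      clusterName: prod-us-east
      version: v1.30.2                # Kubernetes version for these nodes
      bootstrap:
        configRef:                    # how the node joins (bootstrap provider)
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
          kind: KubeadmConfigTemplate
          name: aws-worker-bootstrap-tmpl
      infrastructureRef:              # what the node runs on (infrastructure provider)
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
        kind: AWSMachineTemplate
        name: aws-worker-machine-tmpl
```

The MachineDeployment creates a MachineSet, which in turn creates one Machine per replica, mirroring the Deployment, ReplicaSet, and Pod chain for workloads.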

### Provider model

CAPI uses a three-layer provider model:

#### 1. Infrastructure Provider
Creates and manages infrastructure (VM, networks, LB, storage).

| Provider | Platform | Status |
|----------|----------|--------|
| **AWS (CAPA)** | AWS EC2, VPC, ELB, EKS | Stable, SIG-sponsored |
| **Azure (CAPZ)** | Azure VM, VNet, LB, AKS | Stable, SIG-sponsored |
| **GCP (CAPG)** | GCP Compute, VPC, GKE | Beta |
| **vSphere (CAPV)** | VMware vSphere | Stable |
| **OpenStack (CAPO)** | OpenStack compute/network | Stable |
| **Metal3** | Bare metal (Ironic) | Stable |
| **Docker (CAPD)** | Docker containers (development) | Tilt/Dev only |
| **Akamai (Linode)** | Linode | Community |
| **Azure Stack HCI** | Azure Stack HCI | Community |
| **cloudscale** | cloudscale.ch | Community |
| **Exoscale** | Exoscale | Community |
| **IBM Cloud** | IBM Cloud | Community |
| **Equinix Metal** | Equinix Metal (formerly Packet) | Community |
| **Hetzner** | Hetzner Cloud | Community |
| **OpenNebula** | OpenNebula | Community |

#### 2. Bootstrap Provider
Handles K8s initialization on a node (kubeadm init/join, TLS certs, tokens).

| Provider | Description |
|----------|-------------|
| **Kubeadm** (built-in) | Standard kubeadm init/join, supports stacked/external etcd |
| **EKS** | Bootstrap for EKS managed control plane (AWS) |
| **K3s** | Lightweight K8s bootstrap (edge, IoT) |
| **RKE2** | Rancher K8s bootstrap, CIS-hardened |
| **Talos** | API-driven bootstrap (Sidero Labs), immutable OS |
| **k0smotron** | K0s-based bootstrap + hosted control plane |
| **MicroK8s** | Canonical MicroK8s bootstrap |
| **Canonical Kubernetes** | Canonical K8s (snap-based) |

#### 3. Control Plane Provider
Manages control plane nodes.

| Provider | Description |
|----------|-------------|
| **KubeadmControlPlane** (built-in) | Kubeadm-managed CP, stacked/external etcd |
| **EKS** | AWS EKS managed control plane |
| **Kamaji** | Hosted control plane (CP runs as deployment in management cluster) |
| **K3s** | K3s control plane (edge-optimized) |
| **RKE2** | RKE2 control plane |
| **Talos** | Talos control plane, API-based management |
| **k0smotron** | Hosted control plane (k0s-based) |
| **Nested** | Nested control plane (CAPN): control plane components run as pods inside a host cluster |
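
The three provider types meet in a single control plane object. The sketch below assumes the built-in kubeadm providers and the AWS infrastructure provider; the object name, the replica count, and the `cloud-provider: external` kubelet argument are illustrative assumptions, not a recommended configuration.

```yaml
apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: KubeadmControlPlane               # reconciled by the control plane provider
metadata:
  name: prod-us-east-cp                 # hypothetical name
  namespace: clusters
spec:
  replicas: 3                           # odd number, for etcd quorum with stacked etcd
  version: v1.30.2
  machineTemplate:
    infrastructureRef:                  # fulfilled by the infrastructure provider
      apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
      kind: AWSMachineTemplate
      name: aws-cp-machine-tmpl
  kubeadmConfigSpec:                    # consumed by the kubeadm bootstrap provider
    initConfiguration:
      nodeRegistration:
        kubeletExtraArgs:
          cloud-provider: external      # assumption: an external cloud controller manager is used
    joinConfiguration:
      nodeRegistration:
        kubeletExtraArgs:
          cloud-provider: external
  rolloutStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1                       # add one new node before removing an old one
```

Swapping a provider means swapping the referenced kinds: for example, an RKE2 or K3s control plane provider brings its own control plane kind in place of `KubeadmControlPlane`.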

### ClusterClass and Managed Topologies

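ClusterClass (introduced in CAPI v1.0 as an experimental feature behind the `ClusterTopology` feature gate; check your version's release notes for its current maturity) lets you define a reusable **cluster template**: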

```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: ClusterClass
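metadata:
  name: standard-aws-cluster
  namespace: clusters   # same namespace as the Cluster objects that use this class
spec:
  infrastructure:
    ref:
      apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
      kind: AWSClusterTemplate
      name: aws-cluster-tmpl   # hypothetical name; the template must exist in this namespace
  controlPlane: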
    ref:
      apiVersion: controlplane.cluster.x-k8s.io/v1beta1
      kind: KubeadmControlPlaneTemplate
      name: aws-cp-tmpl
    machineInfrastructure:
      ref:
        kind: AWSMachineTemplate
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
        name: aws-cp-machine-tmpl
  workers:
    machineDeployments:
    - class: default-worker
      template:
        bootstrap:
          ref:
            apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
            kind: KubeadmConfigTemplate
            name: aws-worker-bootstrap-tmpl
        infrastructure:
          ref:
            apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
            kind: AWSMachineTemplate
            name: aws-worker-machine-tmpl
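  variables:
    - name: instanceType
      required: true
      schema:
        openAPIV3Schema:
          type: string
          enum: ["t3.large", "m5.large", "m5.xlarge"]
  patches:
    # Without a patch the variable is validated but never applied.
    - name: instanceType            # hypothetical patch name
      definitions:
        - selector:
            apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
            kind: AWSMachineTemplate
            matchResources:
              controlPlane: true
              machineDeploymentClass:
                names: ["default-worker"]
          jsonPatches:
            - op: replace
              path: /spec/template/spec/instanceType
              valueFrom:
                variable: instanceType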
```

Then create a cluster from the class and set its variables:

```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: dev-team-alpha
  namespace: clusters
spec:
  topology:
    class: standard-aws-cluster
    version: v1.30.2
    controlPlane:
      replicas: 1
    workers:
      machineDeployments:
      - class: default-worker
        name: md-0
        replicas: 2
    variables:
      - name: instanceType
        value: "m5.xlarge"
```

### Cluster lifecycle with CAPI

| Phase | Action | CAPI mechanism |
|-------|--------|----------------|
| **Create** | `kubectl apply -f cluster.yaml` | Controller creates infra (VM, network), runs kubeadm init/join bootstrap |
| **Scale** | Update `replicas` in MachineDeployment | Controller creates/removes Machine → VM → node join/drain |
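| **Upgrade** | Change `version` in KubeadmControlPlane / MachineDeployment | Rolling replacement, not in-place upgrade: a new Machine is created at the target version, then an old one is drained and deleted. Control plane first, then workers via the MachineDeployment rolling update |
| **Health check** | MachineHealthCheck | If a node matches an unhealthy condition for longer than its timeout, the Machine is marked for remediation and its owner (MachineSet or KubeadmControlPlane) replaces it |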
| **Delete** | `kubectl delete cluster` | Controller drains, deletes VMs, cleans up infrastructure |
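| **Template update** | Create a new AWSMachineTemplate / KubeadmConfigTemplate and update the reference in the MachineDeployment or KubeadmControlPlane | Infrastructure templates are treated as immutable; changing the reference triggers a rolling update, so existing Machines are replaced rather than modified |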

### Auto-remediation (MachineHealthCheck)

```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineHealthCheck
metadata:
  name: prod-mhc
  namespace: clusters
spec:
  clusterName: prod-us-east
  selector:
    matchLabels:
      cluster.x-k8s.io/deployment-name: prod-us-east-workers
  unhealthyConditions:
  - type: Ready
    status: "False"
    timeout: 5m
  - type: Ready
    status: "Unknown"
    timeout: 5m
  maxUnhealthy: "40%"
  nodeStartupTimeout: 10m
```
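
In the manifest above, a worker whose `Ready` condition stays `False` or `Unknown` for 5 minutes is remediated, a Machine whose node never appears within `nodeStartupTimeout` is treated as failed, and `maxUnhealthy` stops all remediation once more than 40% of the selected Machines are unhealthy, which guards against replacing a whole pool during a wider outage. Control plane nodes can be covered by a second check. The sketch below assumes a kubeadm-based control plane and reuses the cluster name from the example above; the object name `prod-cp-mhc` is hypothetical.

```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineHealthCheck
metadata:
  name: prod-cp-mhc                       # hypothetical name
  namespace: clusters
spec:
  clusterName: prod-us-east
  selector:
    matchLabels:
      cluster.x-k8s.io/control-plane: ""  # label carried by control plane Machines
  unhealthyConditions:
  - type: Ready
    status: "False"
    timeout: 5m
  - type: Ready
    status: "Unknown"
    timeout: 5m
  maxUnhealthy: "100%"                    # do not block remediation at the MHC level
```

Setting `maxUnhealthy` to 100% here leaves the safety decision to KubeadmControlPlane, which is documented to apply its own checks (for example around etcd quorum) before replacing a member. Verify that behavior against the CAPI version you run.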

### CAPI + GitOps

CAPI integrates naturally with GitOps:

- **ArgoCD** — Cluster and MachineDeployment manifests in Git repo, ArgoCD applies them to the management cluster
- **Flux** — `Kustomization` + `OCIRepository` for CAPI objects
- **Crossplane** — can be combined: Crossplane provisions cloud resources (VPC, subnets), CAPI manages K8s clusters on top

Pattern: a dedicated "fleet management" cluster running CAPI + ArgoCD. All workload clusters are defined as YAML in Git.
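
As a rough sketch of this pattern with ArgoCD, an `Application` on the management cluster can sync a directory of CAPI manifests from Git. The repository URL, the `clusters` directory, and the application name are hypothetical; the `argocd` namespace assumes a default ArgoCD installation.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: workload-clusters               # hypothetical name
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://git.example.com/platform/fleet.git   # hypothetical repository
    targetRevision: main
    path: clusters                      # hypothetical directory with Cluster manifests
  destination:
    server: https://kubernetes.default.svc   # the cluster ArgoCD runs in (the management cluster)
    namespace: clusters
  syncPolicy:
    automated:
      prune: false                      # removing a file from Git does not delete a cluster
      selfHeal: true                    # revert manual drift on CAPI objects
```

Keeping `prune` disabled is a deliberately cautious choice for this sketch: deleting a `Cluster` object tears down the workload cluster, so many teams prefer to make that an explicit step.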

### CAPI for on-prem

| Provider | Use case | Note |
|----------|----------|------|
| **Metal3** (Ironic) | Bare metal provisioning (PXE, IPMI, Redfish) | Automatically provisions BM servers as K8s nodes |
| **CAPV (vSphere)** | VMware VMs as K8s nodes | Most common enterprise on-prem |
| **CAPO (OpenStack)** | OpenStack VMs as K8s nodes | OpenStack-native |
| **Nutanix (CAPNX)** | Nutanix AHV/Prism | Community provider |

### CAPI for edge

| Provider | Use case | Note |
|----------|----------|------|
| **K3s bootstrap + control plane** | Lightweight K8s on edge devices | Single binary, SQLite/embedded etcd |
| **RKE2 bootstrap + control plane** | Enterprise edge, air-gapped | CIS-hardened, FIPS |
| **Talos** | Immutable OS, API-driven | Minimal footprint, no SSH |
| **k0smotron** | Hosted control plane for edge clusters | CP runs in management cluster, worker on edge |

### CAPI vs alternatives

| Tool | Approach | CAPI advantage | CAPI disadvantage |
|------|----------|----------------|-------------------|
| **Terraform/Pulumi** | Imperative/declarative IaC | CAPI is K8s-native — same tool for apps and clusters; GitOps ready | Terraform has broader non-K8s resource support |
| **kubeadm** | Manual or scripted | CAPI automates full lifecycle including upgrades and remediation | Higher complexity, requires management cluster |
| **Rancher** | Web UI + API for K8s cluster management | CAPI is open-source, vendor-neutral | Rancher has GUI, monitoring, app catalog |
| **OpenShift Hive/ACM** | Red Hat Advanced Cluster Management | CAPI is standard (SIG) — wider provider ecosystem | ACM has governance, policy, compliance |

### Limitations and maturity

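- **Management cluster is a single point of failure (SPOF)**: it needs its own HA and backups (etcd snapshots, certificates). If it is down, workload clusters keep running but cannot be scaled, upgraded, or remediated
- **CAPI is not a cluster autoscaler**: it reconciles the replica counts you declare and does not add or remove nodes based on load. Run Cluster Autoscaler separately for node scaling (pod scaling is handled by HPA/VPA inside each cluster)
- **Provider maturity varies**: per the provider table above, AWS, Azure, vSphere, and OpenStack are stable, GCP is beta, and some community providers are alpha
- **etcd backup is not built in**: handle it externally with scheduled etcd snapshots; Velero covers Kubernetes API objects and volumes rather than raw etcd data
- **CAPI does not manage applications**: it covers only the K8s cluster lifecycle; monitoring, logging, and ingress are user-managed
- **Learning curve**: requires understanding the management cluster, the provider model, and the CRDs
- **CAPI v1.13+ (2026)**: stable release line. The core API is served at beta versions (`v1beta1`, with `v1beta2` added in v1.11) rather than a GA `v1`. ClusterClass and EKS/AKS/GKE managed control plane support are available, but verify their maturity in the release notes of the CAPI and provider versions you run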
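
To illustrate the autoscaler point, Cluster Autoscaler's Cluster API provider reads its scaling bounds from annotations on the scalable resource. The fragment below shows only the metadata of a worker `MachineDeployment`; the object name and the bounds are illustrative assumptions.

```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: prod-us-east-workers            # hypothetical name
  namespace: clusters
  annotations:
    # Bounds read by Cluster Autoscaler (clusterapi cloud provider)
    cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size: "2"
    cluster.x-k8s.io/cluster-api-autoscaler-node-group-max-size: "10"
# spec omitted: merge these annotations into an existing MachineDeployment
```

With bounds in place the autoscaler adjusts `spec.replicas` itself, so a GitOps tool managing the same object typically needs to be told to ignore that field.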

### Recommended production CAPI stack

| Component | Recommendation |
|-----------|---------------|
| **Management cluster** | K3s (small footprint) or kubeadm (3 nodes HA) |
| **Infra provider** | CAPA (AWS) / CAPV (vSphere) / CAPO (OpenStack) — based on platform |
| **Bootstrap/CP provider** | Kubeadm or RKE2 |
| **GitOps** | ArgoCD or Flux |
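| **Backup** | Velero (file-system backup via Kopia or restic) with S3-compatible object storage such as Ceph RGW |
| **Cluster autoscaler** | Cluster Autoscaler with its Cluster API (`clusterapi`) cloud provider |
| **Network** | Cilium (CAPI does not install a CNI; deploy it per workload cluster, e.g. via ClusterResourceSet or GitOps) |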
| **Secrets** | External Secrets Operator / Sealed Secrets |
| **Monitoring** | Prometheus + Grafana (kube-prometheus-stack) |
| **Ingress** | ingress-nginx / Kong / Traefik |

## Sources

Links, books and standards: [sources/infrastructure/sources.en.md](sources/infrastructure/sources.en.md)

*Last revision: 2026-06-18*