18.6.2026

2026-06-18 16:25:33 +02:00
parent b53714113c
commit ef3c2f75b1
43 changed files with 3637 additions and 129 deletions
--- a/AI-INFRASTRUCTURE.en.md
+++ b/AI-INFRASTRUCTURE.en.md
@@ -0,0 +1,600 @@
+# 🧠 AI/ML Infrastructure
+
+## Component overview
+
+```mermaid
+flowchart TD
+    subgraph Compute
+        GPU["GPU (H100/B200/Instinct)"]
+        CPU["CPU (AMD EPYC / Intel Xeon)"]
+        ASIC["ASIC (TPU, Trainium, Inferentia)"]
+    end
+    subgraph Network
+        IB["InfiniBand NDR/XDR"]
+        ROCE["RoCEv2"]
+        NVL["NVLink / NVSwitch"]
+    end
+    subgraph Storage
+        FS["Parallel FS (Lustre, GPFS, Weka)"]
+        OBJ["Object Store (S3, MinIO)"]
+        NVME["Local NVMe cache"]
+    end
+    subgraph Orchestration
+        S["Slurm"]
+        K["Kubernetes + Volcano/Kueue"]
+    end
+    subgraph Cooling
+        DLC["Direct-to-chip liquid"]
+        IMM["Immersion"]
+        AIR["Air (high-density)"]
+    end
+
+    Compute --> Network --> Storage
+    Orchestration --> Compute
+    Cooling --> Compute
+```
+
+---
+
+## GPU compute
+
+### NVIDIA
+
+| GPU | Architecture | FP8 | FP16/BF16 | FP64 | HBM | NVLink | TDP | Rack config |
+|-----|-------------|-----|-----------|------|-----|--------|-----|------|
+| **H100 SXM** | Hopper | 3,958 TFLOPS | 1,979 TFLOPS | 67 TFLOPS | 80 GB HBM3 | 900 GB/s | 700 W | 6–8× in DGX H100 |
+| **H200 SXM** | Hopper (HBM3e) | 3,958 TFLOPS | 1,979 TFLOPS | 67 TFLOPS | 141 GB HBM3e | 900 GB/s | 700 W | 6–8× in DGX H200 |
+| **B200** | Blackwell | ~9,000 TFLOPS | ~4,500 TFLOPS | ~40 TFLOPS | 192 GB HBM3e | 1,800 GB/s | 1,000 W | 6–8× in DGX B200 |
+| **GB200 Grace Blackwell** | Blackwell | ~18,000 TFLOPS | ~9,000 TFLOPS | — | 192 GB + 480 GB (Grace) | NVLink-C2C | 1,000 W (GPU) + 500 W (CPU) | GB200 NVL72 (36× superchip, 72× GPU) |
+| **L40S** | Ada Lovelace | 733 TFLOPS | 367 TFLOPS | — | 48 GB GDDR6 | N/A | 350 W | Inference, enterprise |
+| **A100 SXM** | Ampere | — (no FP8; INT8: 1,248 TOPS) | 624 TFLOPS | 19.5 TFLOPS | 80 GB HBM2e | 600 GB/s | 400 W | DGX A100 |
+
+### AMD
+
+| GPU | Architecture | FP8 | FP16/BF16 | FP64 | HBM | Infinity Fabric | TDP |
+|-----|-------------|-----|-----------|------|-----|----------------|-----|
+| **MI300X** | CDNA 3 | 2,615 TFLOPS | 1,307 TFLOPS | 81 TFLOPS | 192 GB HBM3 | 896 GB/s | 750 W |
+| **MI250X** | CDNA 2 | — | 383 TFLOPS | 95.7 TFLOPS | 128 GB HBM2e | 400 GB/s | 500 W |
+
+### Intel
+
+| GPU | Architecture | FP16/BF16 | FP32 | HBM | TDP |
+|-----|-------------|-----------|------|-----|-----|
+| **Gaudi 3** | Custom | 1,835 TFLOPS | — | 128 GB HBM2e | 600 W |
+| **Max 1550** | Xe HPC | 600+ TFLOPS | 200 TFLOPS | 128 GB HBM2e | 600 W |
+
+### Cloud ASIC
+
+| ASIC | Provider | Use case | Performance |
+|------|----------|----------|-------|
+| **TPU v5p** | Google | Training | ~459 TFLOPS (BF16) per chip |
+| **Trainium 2** | AWS | Training | ~1,000 TFLOPS (BF16) per chip |
+| **Inferentia 2** | AWS | Inference | ~400 TOPS (INT8) per chip |
+| **Maia 100** | Microsoft | Training + inference | Custom, 800 W TDP |
+
+---
+
+## AI networking
+
+### Technology comparison
+
+| Technology | Bandwidth per link | Latency | Topology | Use case |
+|-------------|-------------------|---------|-----------|----------|
+| **InfiniBand NDR200** | 200 Gb/s | < 1 µs | Fat-tree, Dragonfly+ | Training (NVIDIA) |
+| **InfiniBand NDR400** | 400 Gb/s | < 1 µs | Fat-tree, Dragonfly+ | Training (NVIDIA) |
+| **InfiniBand XDR** | 800 Gb/s (planned) | < 1 µs | Dragonfly+ | Next-gen training |
+| **RoCEv2** (CX-7/8) | 200–400 Gb/s | 1–2 µs | Fat-tree, Spine-leaf | Training (AMD, Intel, open) |
+| **NVLink 4.0** | 900 GB/s per GPU | < 0.5 µs | NVSwitch full-mesh | Intra-node GPU comm |
+| **NVLink 5.0** | 1,800 GB/s per GPU | < 0.5 µs | NVSwitch full-mesh | Intra-node (Blackwell) |
+| **Ethernet (400 GbE)** | 400 Gb/s | 2–5 µs | Spine-leaf | Inference, data pipeline |
+
+### AI fabric principles
+
+- **Rail-optimized topology** — each GPU communicates on dedicated "rails" (same GPU indices across nodes connect to the same switch)
+- **Fat-tree (Clos)** — standard for InfiniBand and RoCE, non-blocking bisection bandwidth
+- **Dragonfly+** — reduces hop count while maintaining bandwidth (used in largest clusters)
+- **GPUDirect RDMA** — the NIC reads and writes GPU memory directly, bypassing host memory and the CPU; supported on InfiniBand and RoCE
+- **SHARP (Scalable Hierarchical Aggregation and Reduction Protocol)** — in-network reduction for AllReduce (InfiniBand only)
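
The rail-optimized principle above can be sketched as a simple mapping: the local GPU index selects the rail, so GPU *i* of every node attaches to the same leaf switch. This is a minimal illustration; the function names and the `leaf-<n>` naming scheme are assumptions made for the example, not a vendor convention.

```python
def rail_switch(global_rank: int, gpus_per_node: int = 8) -> str:
    """Return the (hypothetical) leaf switch name serving a GPU's rail."""
    local_index = global_rank % gpus_per_node  # GPU index inside its node
    return f"leaf-{local_index}"


def same_rail(rank_a: int, rank_b: int, gpus_per_node: int = 8) -> bool:
    """True when two GPUs share a rail, i.e. reach each other through one leaf switch."""
    return rank_a % gpus_per_node == rank_b % gpus_per_node
```

With 8 GPUs per node, ranks 3 and 11 (GPU 3 on two different nodes) share a rail, while ranks 3 and 12 do not and need an extra hop through the spine layer or NVLink.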
+
+### Bandwidth sizing
+
+```text
+Rule of thumb: inter-node (InfiniBand) bandwidth per GPU ≥ 50 % of GPU HBM bandwidth for scalable training
+
+Example: H100 has 3.35 TB/s HBM
+  → Target: ~1.7 TB/s network injection bandwidth per GPU
+  → DGX H100 (8× H100): 8× NDR400 per node = 1× 400 Gb/s (50 GB/s) per GPU
+  → Reality: 50 GB/s per GPU ≈ 1.5 % of HBM bandwidth → the network is the bottleneck
+```
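
The sizing arithmetic can be checked with a few lines of Python. This is a sketch under stated assumptions: the link count per GPU is a parameter, and the two cases shown (one NDR400 link per GPU, matching 8× NDR400 per DGX H100 node, and a hypothetical four links per GPU) are illustrative inputs rather than measured configurations.

```python
def gbit_to_gbyte(gbit_per_s: float) -> float:
    """Convert a link rate from Gb/s to GB/s."""
    return gbit_per_s / 8.0


def nic_to_hbm_ratio(nic_gbit_per_s: float, nics_per_gpu: int, hbm_tbyte_per_s: float) -> float:
    """Fraction of HBM bandwidth covered by the network links of one GPU."""
    nic_gbyte_per_s = gbit_to_gbyte(nic_gbit_per_s) * nics_per_gpu
    return nic_gbyte_per_s / (hbm_tbyte_per_s * 1000.0)


# H100: 3.35 TB/s HBM, NDR400 links at 400 Gb/s (50 GB/s) each
print(f"1 link per GPU:  {nic_to_hbm_ratio(400, 1, 3.35):.1%}")  # 1.5%
print(f"4 links per GPU: {nic_to_hbm_ratio(400, 4, 3.35):.1%}")  # 6.0%
```

Either way the result is far below the 50 % target, which is why collective algorithms, in-network reduction, and topology matter more than raw link count.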
+
+---
+
+## AI storage
+
+### Requirements
+
+| Dataset size | IO pattern | Recommended storage | Bandwidth |
+|-------------|-----------|-------------------|-----------|
+| < 10 TB | Sequential read (data loading) | Local NVMe | > 10 GB/s per node |
+| 10–100 TB | Random read (shuffled data loading), sequential write (checkpointing) | Parallel FS (Lustre, Weka) | > 100 GB/s cluster-wide |
+| 100 TB–10 PB | Mixed (training + checkpoint) | Parallel FS + object store | > 500 GB/s |
+| 10 PB+ | Multi-modal, video, LLM | Tiered (NVMe cache + parallel FS + object) | > 1 TB/s |
+
+### Storage solution comparison
+
+| Solution | Type | Bandwidth (scope as noted) | Max capacity | Scaling | Use case |
+|--------|-----|-------------------|-------------|-----------|----------|
+| **Lustre** | Parallel FS (POSIX) | > 100 GB/s (cluster) | 100s PB | OST + MDS | HPC, LLM training (standard) |
+| **GPFS / StorageScale** | Parallel FS (POSIX) | > 100 GB/s | 100s PB | NSD servers | HPC, AI (IBM) |
+| **WekaFS** | Parallel FS (POSIX + NFS/SMB) | ~80 GB/s per 10 nodes | 10s PB | Container-native | AI/ML, NVIDIA DGX preferred |
+| **VAST Data** | Universal storage (NVMe + QLC) | ~100 GB/s per cluster | 10s PB | Scale-out | AI, checkpoint, data lake |
+| **Pure Storage//E** | All-flash (NVMe) | ~50 GB/s | ~30 PB | Scale-out | Enterprise AI, database |
+| **MinIO / S3** | Object store | ~20 GB/s per gateway | EB | Erasure coding | Dataset repository, checkpoint |
+| **NetApp AFF** | NAS + S3 | ~10 GB/s per controller | ~50 PB | HA pair | Enterprise, NFS baseline |
+
+### Checkpointing strategies
+
+| Strategy | RPO | Storage impact | Description |
+|-----------|-----|---------------|-------|
+| **Full checkpoint** | every N steps | High (stops training) | Full model + optimizer state |
+| **Async checkpoint** | every N steps | Medium (non-blocking) | Copy to staging buffer, async write |
+| **Distributed checkpoint** (NVIDIA NeMo) | every N steps | Low | Each rank writes its own shard |
+| **In-memory checkpoint** (IBM) | on failover | Minimal (DRAM) | Replication to another node's DRAM |
+| **Continuous checkpoint** (Microsoft) | every 1–5 min | Low (delta) | Changed shards only |
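
The async strategy (copy to a staging buffer, then write in the background) can be sketched with the standard library alone. This is a simplified illustration, not how any particular framework implements it: the `async_checkpoint` name is hypothetical, a plain `dict` stands in for model and optimizer state, and `pickle` stands in for the real serialization format.

```python
import copy
import os
import pickle
import threading


def async_checkpoint(state: dict, path: str) -> threading.Thread:
    """Snapshot `state` synchronously, then persist it on a background thread."""
    # Staging copy: training may keep mutating `state` after this call returns.
    snapshot = copy.deepcopy(state)

    def _write() -> None:
        tmp_path = path + ".tmp"
        with open(tmp_path, "wb") as f:
            pickle.dump(snapshot, f)
            f.flush()
            os.fsync(f.fileno())  # make sure the bytes reach storage
        os.replace(tmp_path, path)  # atomic rename: readers never see a partial file

    writer = threading.Thread(target=_write)
    writer.start()
    return writer
```

The training loop only blocks for the in-memory copy; the caller should `join()` the returned thread before starting the next checkpoint or exiting, otherwise two writers may race on the same path.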
+
+---
+
+## AI cluster architecture
+
+### Physical topology — DGX H100 example
+
+```
+┌──────── DGX H100 (8× GPU) ────────┐
+│  ┌─────┐ ┌─────┐ ┌─────┐ ┌─────┐  │
+│  │GPU 0│ │GPU 1│ │GPU 2│ │GPU 3│  │
+│  └──┬──┘ └──┬──┘ └──┬──┘ └──┬──┘  │
+│  ┌──┴──┐ ┌──┴──┐ ┌──┴──┐ ┌──┴──┐  │
+│  │GPU 4│ │GPU 5│ │GPU 6│ │GPU 7│  │
+│  └─────┘ └─────┘ └─────┘ └─────┘  │
+│  NVSwitch (NVLink 4.0, 900 GB/s)  │
+│  InfiniBand CX-7: 8× NDR400       │
+└───────────────────────────────────┘
+         │ 8× IB rails
+    ┌────┴─────────────────┐
+    │  IB NDR400 switches  │  (rail-optimized)
+    └──────────────────────┘
+```
+
+### Kubernetes for AI
+
+| Component | Role |
+|-----------|------|
+| **Volcano** | Batch scheduling, gang scheduling, queue management |
+| **Kueue** | Multi-tenant admission, resource quotas, fair sharing |
+| **NVIDIA GPU Operator** | Driver, container toolkit, MIG, DCGM, monitoring |
+| **HAMi** (formerly k8s-vGPU-scheduler) | GPU sharing, MIG partitioning, fractional GPU |
+| **Node Feature Discovery** | GPU type detection, NUMA topology |
+| **Topology Manager** | NUMA-aware pod placement |
+| **DPDK / SR-IOV** | High-performance networking for GPU Direct RDMA |
+
+### Slurm for AI
+
+| Component | Role |
+|-----------|------|
+| **slurm.conf** | Partition for GPU nodes, GRES (Generic Resource) |
+| **gres.conf** | GPU type, GPU count per node |
+| **srun --gres=gpu:8** | Allocate 8 GPUs per job |
+| **sbatch --nodes=64 --ntasks=512** | 64 nodes, 512 ranks (8 GPU/node) |
+| **Pyxis** | NVIDIA Slurm plugin for running containers (used with Enroot) |
+
+---
+
+## AI cluster cooling
+
+### Power density comparison
+
+| Configuration | TDP per node | Racks | kW/rack | Note |
+|-------------|-------------|-------|---------|----------|
+| Standard server (2U) | 1 kW | 20 | 5–10 | Typical DC |
+| GPU server (DGX H100, 6×) | 42 kW | 6 | 45–50 | Air cooling limit |
+| GPU server (DGX B200, 6×) | 72 kW | 6 | 90–100 | Liquid cooling required |
+| GB200 NVL72 rack | 120 kW | 1 | ~120 | Fully liquid cooled (liquid cooling mandatory) |
+
+### Cooling technologies
+
+| Method | Max kW/rack | CAPEX | OPEX | Complexity |
+|--------|-------------|-------|------|-----------|
+| **Air cooling (CRAC/CRAH)** | < 15 | Low | Medium | Low |
+| **Air cooling (in-row)** | 15–30 | Medium | Medium | Low |
+| **Rear-door heat exchanger** | 30–50 | Medium | Low | Medium |
+| **Direct-to-chip liquid (cold plate)** | 50–150 | High | Low | High |
+| **Immersion (single-phase)** | 100–200 | High | Low | High |
+| **Immersion (two-phase)** | 200+ | Very high | Low | Very high |
+
+---
+
+## Inference infrastructure
+
+### Inference server comparison
+
+| Tool | Frameworks | Optimization | Use case |
+|---------|-----------|-------------|----------|
+| **vLLM** | Megatron, HF, AWQ, GPTQ | PagedAttention, KV cache, continuous batching | LLM inference (open source) |
+| **TensorRT-LLM** | TensorRT | INT4/INT8/FP8, inflight batching, attention optimizations | Production (NVIDIA) |
+| **Triton Inference Server** | All (TensorRT, vLLM, PyTorch) | Model ensemble, model caching, concurrent execution | Enterprise, multi-model |
+| **SageMaker** | Managed | Auto-scaling, model parallelism | AWS managed |
+| **TGI** (OpenAI-compatible API) | HF Transformers | Continuous batching, flash attention | Hosting |
+
+### Inference optimization
+
+| Technique | Latency improvement | Throughput improvement | Memory reduction |
+|----------|-----------------|---------------------|------------------|
+| **FP8/INT8 quantization** | — | 2× | 2× |
+| **INT4 quantization** | — | 4× | 4× |
+| **Flash Attention 2/3** | 2–4× | — | Attention memory linear instead of quadratic in sequence length (KV cache unchanged) |
+| **PagedAttention** | — | 2–5× | 95 % (KV cache fragmentation) |
+| **Continuous batching** | — | 10–20× | — |
+| **Speculative decoding** | 2–3× | — | — |
+| **Multi-LoRA / S-LoRA** | — | 8–16× | — |
+
+---
+
+## Distributed training techniques
+
+| Technique | Description | Frameworks |
+|----------|-------|------------|
+| **Data Parallelism (DDP/FSDP)** | Each GPU processes a different batch; DDP keeps a full model copy per GPU, FSDP shards it | PyTorch DDP, FSDP |
+| **Tensor Parallelism (TP)** | Weight matrices inside each layer split across GPUs (usually intra-node, over NVLink) | Megatron-LM, DeepSpeed |
+| **Pipeline Parallelism (PP)** | Consecutive groups of layers (stages) placed on different GPUs or nodes | Megatron-LM, DeepSpeed |
+| **Sequence Parallelism (SP)** | Sequence split across GPUs | Megatron-LM |
+| **Expert Parallelism (EP)** | Different experts of a Mixture-of-Experts (MoE) model placed on different GPUs | Megatron-LM, DeepSpeed |
+| **3D Parallelism** | TP + PP + DP combination | Megatron-LM, NeMo |
+| **ZeRO (1/2/3)** | Optimizer/gradient/parameter sharding | DeepSpeed |
+| **NCCL / RCCL** | GPU collective communication library | NVIDIA/AMD |
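
How TP, PP, and DP combine in 3D parallelism can be illustrated by mapping a global rank to its position in each dimension. The sketch below assumes one possible ordering (TP innermost so that a TP group stays inside a node, then PP, then DP); real frameworks let you configure this order, and the function names are hypothetical.

```python
def data_parallel_size(world_size: int, tp: int, pp: int) -> int:
    """Number of model replicas once TP and PP degrees are fixed."""
    if world_size % (tp * pp) != 0:
        raise ValueError("world_size must be divisible by tp * pp")
    return world_size // (tp * pp)


def rank_coords(rank: int, tp: int, pp: int) -> tuple:
    """Map a global rank to (tp_rank, pp_rank, dp_rank), with TP varying fastest."""
    tp_rank = rank % tp            # position inside the tensor-parallel group
    pp_rank = (rank // tp) % pp    # pipeline stage
    dp_rank = rank // (tp * pp)    # which model replica
    return tp_rank, pp_rank, dp_rank
```

For example, 512 GPUs with TP=8 and PP=4 leave a data-parallel size of 16, and ranks 0 to 7 (one node) form a single tensor-parallel group.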
+
+---
+
+## Operating systems for AI
+
+### Distribution comparison
+
+| OS | GPU driver | CUDA | Container toolkit | IB/RoCE | Lustre client | Production support |
+|----|-----------|------|-------------------|---------|--------------|-------------------|
+| **Ubuntu 22.04 LTS** | NVIDIA 525+ | 12.x | nvidia-container-toolkit | MLNX_OFED, rdma-core | Yes (lustre-client) | NVIDIA DGX standard |
+| **Ubuntu 24.04 LTS** | NVIDIA 550+ | 12.5+ | nvidia-container-toolkit | MLNX_OFED, rdma-core | Yes | Latest GPU support |
+| **RHEL 9 / Rocky 9** | NVIDIA 525+ | 12.x | nvidia-container-toolkit | MLNX_OFED | Yes (EL repo) | Red Hat, enterprise |
+| **DGX OS** (Ubuntu-based) | NVIDIA custom | 12.x | Pre-installed | Pre-configured | Yes | NVIDIA DGX only supported |
+| **SLES 15 SP5** | NVIDIA 525+ | 12.x | nvidia-container-toolkit | MLNX_OFED | Yes | HPC, some Lustre clusters |
+| **Debian 12** | NVIDIA 525+ | 12.x | nvidia-container-toolkit | rdma-core | Yes (backports) | Community, research |
+| **Flatcar / Bottlerocket** | Container-host | — | nvidia-container-toolkit | Limited | No | K8s-only, minimal footprint |
+
+### Limitations and constraints
+
+#### GPU drivers and CUDA
+
+| Constraint | Detail |
+|----------|--------|
+| **Driver-CUDA compatibility** | The installed NVIDIA driver must be at least the minimum version required by the CUDA toolkit (a newer driver still runs older CUDA versions). E.g., CUDA 12.4 requires driver ≥ 550 |
+| **Kernel version** | NVIDIA driver not compatible with all kernels. New kernel (6.8+) may require DKMS build or delayed support |
+| **Secure Boot** | NVIDIA driver requires signed module (MOK, shim) or disabled Secure Boot — common enterprise issue |
+| **Open vs proprietary driver** | NVIDIA `nvidia-open` (since R515) is the open-source kernel module. Supported on Turing and newer GPUs, including data center parts (H100+); older GPUs require the proprietary module |
+| **nvidia-persistenced** | Required to maintain GPU initialization; without it GPUs may sleep after idle timeout (`nvidia-smi -pm 1`) |
+| **GPU reset** | After crashed training job, GPU may hang. `nvidia-smi --gpu-reset` or reboot node, sometimes power cycle |
+| **Multi-instance GPU (MIG)** | Requires a MIG-capable driver and MIG mode enabled on the GPU. Toggling MIG mode needs a GPU reset, so it cannot be changed while workloads are running. Supported on A100, A30, H100, H200, and B200 |
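
The "driver ≥ CUDA requirement" rule is a plain version comparison. The sketch below is illustrative: the helper names are hypothetical, and the minimum version is passed in by the caller (take it from the CUDA release notes rather than from this example). The installed version string could come, for instance, from `nvidia-smi --query-gpu=driver_version --format=csv,noheader`.

```python
def parse_version(version: str) -> tuple:
    """Turn '550.54.15' into (550, 54, 15)."""
    return tuple(int(part) for part in version.strip().split("."))


def driver_satisfies(installed: str, minimum: str) -> bool:
    """True when the installed driver is at least the required minimum."""
    have, need = parse_version(installed), parse_version(minimum)
    width = max(len(have), len(need))
    # Pad with zeros so that '550' compares equal to '550.0.0'.
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need
```

A pre-flight check in a job prolog could call `driver_satisfies(installed, minimum)` and refuse to start the job when it returns `False`, instead of failing later inside the framework.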
+
+#### Network (InfiniBand / RoCE)
+
+| Constraint | Detail |
+|----------|--------|
+| **MLNX_OFED vs rdma-core** | MLNX_OFED (NVIDIA) — full support, but own kernel modules, kernel version compatibility needed. `rdma-core` (open) — limited support, no custom modules |
+| **Kernel compatibility** | MLNX_OFED supports only specific kernel versions (major.minor). Kernel upgrade → MLNX_OFED rebuild required |
+| **NCCL** | NCCL version must be compatible with CUDA and IB firmware. `nccl-tests` for validation |
+| **SHARP** | In-network reduction requires specific MLNX_OFED + IB switch firmware combination |
+| **GPU Direct RDMA** | Requires `nvidia-peermem` module + MLNX_OFED. Does not work with all GPU and IB card combinations |
+| **RoCE PFC/ECN** | RoCE requires lossless fabric (PFC, ECN, DCQCN). Switch and host configuration — complex tuning |
+
+#### Storage
+
+| Constraint | Detail |
+|----------|--------|
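+| **Lustre client** | Client version must be compatible with the server version; a server upgrade may force an upgrade of all clients. Client modules are built for specific distribution kernels, so kernel updates must be coordinated |
+| **POSIX locking** | NFS and Lustre differ in POSIX locking behavior (Lustre needs the `flock` mount option for `flock()` support). Tools that rely on file locks can misbehave on mixed filesystems |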
+| **Filesystem cache** | Page cache can mask IO bottlenecks. Training jobs often require `O_DIRECT` or sync IO |
+| **Local NVMe vs parallel FS** | Dataset staging on local NVMe eliminates network dependency but requires space and pre-fetch pipeline |
+
+#### Container runtime
+
+| Constraint | Detail |
+|----------|--------|
+| **Docker + GPU** | `nvidia-container-toolkit` (formerly nvidia-docker2). Requires runtime installation and config in `/etc/docker/daemon.json` |
+| **Podman + GPU** | Requires `nvidia-container-toolkit` + podman hook. Less tested than Docker |
+| **containerd + GPU** | Standard for K8s. Requires `cdi` (Container Device Interface) or `nvidia-container-runtime` |
+| **Enroot + Pyxis** | NVIDIA container stack for Slurm (Enroot = daemonless container runtime, Pyxis = Slurm plugin) |
+| **User namespace mapping** | Container GPU access requires device cgroup; rootless may fail (exception for /dev/dri and /dev/nvidia*) |
+
+#### Kernel parameters
+
+```text
+# AI workload recommended sysctl
+net.core.rmem_max = 134217728       # 128 MiB, sufficient for NCCL
+net.core.wmem_max = 134217728
+net.ipv4.tcp_rmem = 4096 87380 134217728
+net.ipv4.tcp_wmem = 4096 65536 134217728
+net.core.netdev_budget = 600        # for high packet rate
+vm.max_map_count = 1048576          # PyTorch DataLoader workers
+kernel.numa_balancing = 0           # disable automatic NUMA balancing (breaks locality)
+kernel.sched_min_granularity_ns = 10000000   # older kernels only; moved to debugfs in 5.13+
+
+# Kernel boot parameters (kernel command line, not sysctl)
+# Disable security mitigations for perf (dedicated AI clusters only)
+mitigations=off
+transparent_hugepage=never          # or madvise; THP may cause latency spikes
+intel_idle.max_cstate=1             # reduce C-state transition latency (Intel CPUs)
+```
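
A node health check can compare the live values under `/proc/sys` with the recommended ones. The following is a minimal sketch: the function names are hypothetical, only a subset of the keys above is listed, and the simple dot-to-slash conversion is an assumption that holds for these keys but not for sysctl names that contain dotted interface names.

```python
from pathlib import Path

# Subset of the recommendations above, as they appear when read from /proc/sys.
RECOMMENDED = {
    "net.core.rmem_max": "134217728",
    "net.core.wmem_max": "134217728",
    "vm.max_map_count": "1048576",
    "kernel.numa_balancing": "0",
}


def read_sysctl(key: str, root: str = "/proc/sys"):
    """Return the current value of a sysctl key, or None if it does not exist."""
    path = Path(root) / key.replace(".", "/")
    try:
        # Normalize tabs and newlines so multi-value keys compare cleanly.
        return " ".join(path.read_text().split())
    except OSError:
        return None


def sysctl_drift(recommended: dict = RECOMMENDED, root: str = "/proc/sys") -> dict:
    """Return {key: (current, wanted)} for every key that deviates."""
    drift = {}
    for key, wanted in recommended.items():
        current = read_sysctl(key, root)
        if current != wanted:
            drift[key] = (current, wanted)
    return drift
```

An empty result means the node matches the recommendations; keys missing on the running kernel show up with `None` as the current value. The `root` parameter exists so the check can be exercised against a test directory.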
+
+#### Firmware and HW
+
+| Constraint | Detail |
+|----------|--------|
+| **GPU firmware (VBIOS)** | NVIDIA datacenter GPUs (H100, B200) have VBIOS updates via NVFlash. Without update → missing partitioning support or newer CUDA features |
+| **InfiniBand firmware** | IB switch and HCA firmware must be compatible. Mix old switch + new HCA → degraded perf |
+| **NVSwitch firmware** | DGX systems have NVSwitch firmware updatable only via NVIDIA DGX tools |
+| **Power capping (nvidia-smi)** | `nvidia-smi -pl <power>` — limit TDP for power budget management. Test impact on training throughput |
+| **GPU clock locking** | `nvidia-smi -lgc <min,max>` locks the GPU clock range for stable benchmarks; `nvidia-smi -ac <mem,graphics>` sets application clocks (memory clock first). Apply after `nvidia-persistenced` is running |
+| **PCIe Gen** | GPU in PCIe Gen4 slot (instead of Gen5) → bottleneck for CPU↔GPU data transfer. Important for FSDP sharding |
+
+### Recommended OS per use case
+
+| Use case | OS | Rationale |
+|----------|-----|-------|
+| **DGX cluster (production)** | DGX OS / Ubuntu 22.04 LTS | NVIDIA standard, best driver support |
+| **Enterprise K8s (OpenShift)** | RHEL 9 / RHCOS | Red Hat support, GPU Operator compatible |
+| **Vanilla K8s (on-prem)** | Ubuntu 22.04 LTS + Flatcar (workers) | Widest community support, Flatcar for minimal footprint |
+| **Slurm cluster (HPC/AI)** | Rocky Linux 9 / Ubuntu 22.04 LTS | EL ecosystem (Lustre, OFED) or Ubuntu (community) |
+| **Research / rapid prototyping** | Ubuntu 24.04 LTS | Latest CUDA, PyTorch, driver support |
+| **Edge inference** | NVIDIA JetPack / Ubuntu (ARM) | Embedded GPU (Jetson Orin, AGX) |
+
+---
+
+## AI-ready data center — check-list
+
+| Area | Requirement |
+|--------|-----------|
+| **Power** | 30–120 kW/rack, HVDC (400 V DC), UPS supporting GPU spikes |
+| **Cooling** | Liquid cooling ready (direct-to-chip), rear-door for 30+ kW |
+| **Network** | InfiniBand (NDR/XDR) or RoCEv2, rail-optimized fat-tree |
+| **Storage** | Parallel FS (Lustre/Weka), checkpoint bandwidth > 100 GB/s |
+| **GPU density** | Max GPU/rack, minimize NVSwitch hops |
+| **Physical** | Floor load 1,500+ kg/m², rack 52U–60U |
+| **Security** | Tenant isolation, network segmentation, data encryption |
+| **Monitoring** | DCGM, NCCL health checks, thermals, power capping |
+
+---
+
+## Model and throughput limitations
+
+### Model size per GPU
+
+Maximum model size fitting on a single GPU depends on HBM capacity and precision:
+
+| GPU | HBM | FP32 | FP16/BF16 | INT8 | INT4 |
+|-----|-----|------|-----------|------|------|
+| **H100 80GB** | 80 GB | ~20B | ~40B | ~80B | ~160B |
+| **H200 141GB** | 141 GB | ~35B | ~70B | ~140B | ~280B |
+| **B200 192GB** | 192 GB | ~48B | ~96B | ~192B | ~384B |
+| **MI300X 192GB** | 192 GB | ~48B | ~96B | ~192B | ~384B |
+| **A100 80GB** | 80 GB | ~20B | ~40B | ~80B | ~160B |
+| **GB200 (192+480)** | 192 GB GPU + 480 GB Grace | — | ~96B + CPU offload | — | — |
+
+*Approximate upper bounds for weights only: 1B params ≈ 4 GB FP32 ≈ 2 GB FP16 ≈ 1 GB INT8 ≈ 0.5 GB INT4. In practice, reserve ~10–15 % of HBM for activations and KV cache; training additionally needs gradients and optimizer states.*
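
The footnote's conversion factors translate directly into a small helper. This sketch follows only the byte-per-parameter rule stated above; the function name and the `reserve_fraction` parameter are illustrative.

```python
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "bf16": 2.0, "int8": 1.0, "int4": 0.5}


def max_params_billion(hbm_gb: float, precision: str, reserve_fraction: float = 0.0) -> float:
    """Largest model (in billions of parameters) whose weights fit in HBM.

    reserve_fraction is the share of HBM kept free for activations and KV cache.
    """
    usable_gb = hbm_gb * (1.0 - reserve_fraction)
    # 1 GB = 1e9 bytes, so GB divided by bytes-per-parameter gives billions of parameters.
    return usable_gb / BYTES_PER_PARAM[precision]
```

For an 80 GB GPU in FP16 this gives 40B with no reserve and about 34B with a 15 % reserve, which is the more realistic planning figure.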
+
+### Memory breakdown inference
+
+| Component | Llama 3 70B (FP16) | Llama 3 8B (FP16) |
+|------------|-------------------|-------------------|
+| Model weights | 140 GB | 16 GB |
+| KV cache (4K context, batch 1) | ~1.3 GB | ~0.5 GB |
+| KV cache (128K context, batch 1) | ~43 GB | ~17 GB |
+| Activations (peak) | ~5 GB | ~1 GB |
+| **Total 4K ctx** | ~146 GB | ~17.5 GB |
+| **Total 128K ctx** | ~188 GB | ~34 GB |
+
+*KV cache figures assume grouped-query attention with 8 KV heads (head dim 128) and FP16 values.*
+
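+**Conclusion:** Llama 3 70B in FP16 (~140 GB of weights) does not fit on a single H100 (80 GB). Options: INT8 (~70 GB of weights → 1× H200, or 2× H100 once KV cache is included), INT4 (~35 GB of weights → 1× H100), or FP16 with tensor parallelism across multiple GPUs.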
+
+### Context length vs memory
+
+| Context | KV cache 70B (FP16) | KV cache 8B (FP16) | Note |
+|---------|-------------------|-------------------|------|
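+| 4K | ~1.3 GB | ~0.5 GB | Typical chat |
+| 32K | ~11 GB | ~4.3 GB | Documents |
+| 128K | ~43 GB | ~17 GB | Long-context (Claude, Gemini) |
+| 1M | ~344 GB | ~137 GB | Experimental (Gemini 1.5 Pro) |
+
+KV cache grows **linearly with context length** and batch size, and in proportion to layer count and the number of KV heads (bytes per token = 2 × layers × KV heads × head dim × bytes per value). Values above assume Llama 3 with grouped-query attention (8 KV heads, head dim 128; 80 layers for 70B, 32 for 8B). Critical for long-context inference.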
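
The linear dependence on context length is easiest to see in the standard KV cache size formula. The sketch below is an estimate under assumptions: the model shape values used in the example (80 layers, 8 KV heads, head dimension 128 for a Llama 3 70B class model) are taken from the public model configuration, and figures computed with a different head count or precision will differ.

```python
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   context_tokens: int, batch: int = 1, bytes_per_value: int = 2) -> int:
    """Bytes needed to cache keys and values for `batch` sequences of `context_tokens`."""
    per_token = 2 * layers * kv_heads * head_dim * bytes_per_value  # 2 = key + value
    return per_token * context_tokens * batch


# Assumed shape for a Llama 3 70B class model with grouped-query attention, FP16 cache.
for tokens in (4_096, 131_072):
    size_gb = kv_cache_bytes(80, 8, 128, tokens) / 1e9
    print(f"{tokens:>7} tokens: {size_gb:.1f} GB")
```

Doubling the context or the batch size doubles the cache, while switching the cache to FP8 (`bytes_per_value=1`) halves it.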
+
+### Throughput inference
+
+| Model | GPU | Precision | Batch size | Tokens/s | QPS (1K output) |
+|-------|-----|-----------|-----------|----------|-----------------|
+| Llama 3 8B | H100 | FP16 | 1 | ~800 | ~0.8 |
+| Llama 3 8B | H100 | FP16 | 128 | ~4,500 | ~4.5 |
+| Llama 3 8B | H100 | INT4 | 128 | ~8,000 | ~8 |
+| Llama 3 70B | 4× H100 | FP16 | 1 | ~180 | ~0.18 |
+| Llama 3 70B | 4× H100 | INT4 | 64 | ~1,200 | ~1.2 |
+| Llama 3 70B | 8× H100 | FP16 (TP=8) | 128 | ~2,500 | ~2.5 |
+| DeepSeek-R1 671B | 8× H200 | FP8 (MoE) | 64 | ~500 | ~0.5 |
+| GPT-4 class (est.) | — | — | — | ~100–300 | ~0.1–0.3 |
+
+**Notes:**
+- Tokens/s is aggregate across the batch; QPS (queries per second) = tokens/s ÷ output tokens per query (1K here)
+- A larger batch increases throughput but also increases TTFT (time to first token)
+- Tensor Parallelism (TP) scales throughput, but communication overhead grows with the TP degree
+
+### Training limits
+
+#### Scaling efficiency
+
+| GPU count | Model | Efficiency | Reason |
+|-----------|-------|-----------|-------|
+| 8 (1 node) | Llama 3 8B | ~95 % | NVLink intra-node |
+| 64 (8 nodes) | Llama 3 8B | ~85 % | IB inter-node |
+| 512 (64 nodes) | Llama 3 70B | ~75 % | Communication overhead |
+| 4,096 (512 nodes) | Llama 3 70B | ~60 % | Pipeline bubble, network |
+| 16,384 (2,048 nodes) | Llama 3 405B | ~45 % | Synchronous SGD overhead |
+
+**Note:** Efficiency = actual throughput ÷ (single-GPU throughput × GPU count). It drops as GPU count grows because communication and pipeline bubbles take a larger share of each step.
+
+#### Memory breakdown training
+
+| Component | Llama 3 70B (BF16) | Llama 3 8B (BF16) |
+|------------|-------------------|-------------------|
+| Model weights | 140 GB | 16 GB |
+| Optimizer states (Adam) | 280 GB | 32 GB |
+| Gradients | 140 GB | 16 GB |
+| Activations (peak) | ~30 GB | ~4 GB |
+| **Total (DDP)** | ~590 GB | ~68 GB |
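+| **Total per GPU (FSDP shard=8)** | ~100 GB (560 GB ÷ 8 + activations) | ~12 GB (64 GB ÷ 8 + activations) |
+
+**Conclusion:** Sharding (FSDP or ZeRO) is required to train models above ~10B parameters. With Adam, training state is about 4× the inference weights (weights + gradients + two optimizer moments), before activations. FSDP shards weights, gradients, and optimizer states, but not activations.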
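
The rows of the training memory table follow from a per-parameter byte count. The sketch below reproduces the table's accounting (2 bytes each for BF16 weights and gradients, 4 bytes for the two Adam moments); the function name is hypothetical, and the assumption that activations stay unsharded reflects how FSDP treats them. Setups that keep FP32 master weights or FP32 optimizer states need more memory per parameter.

```python
def training_memory_gb(params_billion: float, activations_gb: float, shards: int = 1,
                       bytes_weights: int = 2, bytes_grads: int = 2,
                       bytes_optimizer: int = 4) -> float:
    """Per-GPU memory estimate: sharded model state plus local activations."""
    model_state_gb = params_billion * (bytes_weights + bytes_grads + bytes_optimizer)
    # Weights, gradients, and optimizer states are divided across shards;
    # activations depend on the local batch and are not sharded.
    return model_state_gb / shards + activations_gb
```

With `shards=1` this matches the DDP totals in the table (70B with 30 GB of activations gives 590 GB); increasing `shards` shows how quickly per-GPU memory falls until activations dominate.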
+
+#### Time to train
+
+| Model | GPU count | GPU type | Precision | Time | Cost (on-prem estimate) |
+|-------|-----------|---------|-----------|------|---------------------|
+| Llama 3 8B | 64 | H100 | BF16 | ~3 days | ~$5,000 |
+| Llama 3 70B | 512 | H100 | BF16 | ~14 days | ~$100,000 |
+| Llama 3 405B | 16,384 | H100 | BF16 | ~60 days | ~$14 M |
+| DeepSeek-R1 671B (MoE) | 2,048 | H800 | BF16 | ~30 days | ~$6 M |
+| GPT-4 (est.) | 25,000 | A100/H100 | Mixed | ~90–100 days | ~$100 M |
+
+### Power and thermal limits
+
+| Configuration | TDP limit | Throughput loss | Reason |
+|-------------|-----------|------------------|--------|
+| H100 SXM | 700 W (default) | 0 % | Nominal |
+| H100 SXM | 600 W (-15 %) | ~5–8 % | Power capping |
+| H100 SXM | 500 W (-30 %) | ~15–25 % | Significant throttling |
+| H100 SXM | 400 W (-43 %) | ~30–50 % | Emergency only |
+| DGX H100 (8×) | 5.6 kW (GPUs only, 8× 700 W) | 0 % | Nominal; whole system draws up to ~10.2 kW |
+| DGX H100 (8×) | 4.5 kW (air) | ~10–15 % | Rear-door heat exchanger |
+
+A GPU throttles when it exceeds its power limit or temperature threshold (85 °C+). Throughput falls more slowly than power at first (a 15 % cap costs ~5–8 %), then drops steeply at deeper caps.
+
+### API and operational limits
+
+| Limit | Description | Typical value |
+|-------|-------|-----------------|
+| **Rate limit** | Max requests per minute/hour | 100–10,000 RPM (per tier) |
+| **Tokens per minute (TPM)** | Max tokens per minute | 1M–300M (per model) |
+| **Context window** | Max input tokens | 4K–2M (per model) |
+| **Max output tokens** | Max generated tokens | 4K–32K (per model) |
+| **Concurrent requests** | Parallel request count | 10–10,000 (per backend) |
+| **Batch window** | Time to accumulate batch | 0–20 s (vLLM, TGI) |
+| **TTFT timeout** | Max latency to first token | 30–120 s |
+| **Idle timeout** | GPU idle → scale to 0 | 5–15 min (cloud) |
+
+### Limits per deployment model
+
+| Dimension | On-prem HW | Managed cloud (SageMaker, Vertex) | API (OpenAI, Anthropic) |
+|-----------|--------------|----------------------------------|------------------------|
+| **Model size** | Limited by HBM (max 192 GB/GPU) | Unlimited (cluster scaling) | Unlimited |
+| **Queries** | Limited by GPU count | Auto-scaling | Rate limit (per tier) |
+| **Latency** | < 10 ms (same node) | 10–100 ms (network hop) | 100 ms – 10 s |
+| **Customization** | Full (fine-tuning, quantization) | Managed (SageMaker, Bedrock) | Prompt engineering only |
+| **Data privacy** | Yes (on-prem) | Contractual (region, encryption) | Limited |
+| **Cost per 1M tokens** | ~$0.10–0.50 (FP16 inference) | ~$0.20–1.00 | ~$0.15–15.00 |
+| **Max context** | 128K+ (depending on GPU count) | 128K+ | 32K–2M |
+| **Cold start** | 0 (always-on) | 30 s – 5 min | 0 (shared infra) |
+
+---
+
+## GPU pricing and price/performance (2026)
+
+> Prices are approximate — NVIDIA does not publish official datacenter GPU price lists. Cloud prices from public providers (Q2 2026). HW purchase prices vary by volume, reseller, and region.
+
+### Purchase price (buy)
+
+| GPU | Price/GPU | Price 8× GPU baseboard | $/PFLOPS (FP16) | Note |
+|-----|---------|----------------------|----------------|------|
+| **H100 SXM** | $27,000–40,000 | ~$200,000 | $25,000 | Scarcity 2023–2024, now stabilized |
+| **H200 SXM** | $35,000–50,000 | ~$280,000 | ~$35,000 | H100 upgrade, HBM3e |
+| **B200** | ~$60,000–70,000 | ~$500,000+ | ~$31,000 | Blackwell, FP4 support |
+| **B100** | ~$30,000 | ~$240,000 | ~$20,000 | Lower price than B200, similar FP8 perf |
+| **GB200** (Grace+Blackwell) | ~$70,000–100,000 | ~$2,000,000 (rack) | — | CPU+GPU unified, high-density |
+| **A100 80GB** | ~$10,000–15,000 | ~$120,000 | ~$19,200 | Previous gen, still relevant |
+| **MI300X** | ~$12,000–18,000 | ~$100,000 | ~$9,600 | AMD, 192 GB HBM3 |
+| **Gaudi 3** | ~$15,625 | ~$125,000 | **$8,515** | Intel, best $/PFLOPS |
+| **L40S** | ~$8,000–10,000 | — | — | Inference, enterprise |
+
+### Cloud pricing (on-demand $/GPU/hr)
+
+| GPU | Cheapest | Mid-range (CoreWeave, Lambda) | Hyperscaler (AWS, GCP, Azure) |
+|-----|----------|-----------------------------|-------------------------------|
+| **H100 SXM** | $1.38 (Thunder) | $2.89–3.29 | $4.15–6.88 |
+| **H100 PCIe** | $2.01 (Spheron) | $2.50 | — |
+| **H200 SXM** | $3.89 (Spheron) | $4.54 | $5.00+ |
+| **B200** | **$3.39** (Spheron) | $6.02 | $14.24 (AWS) |
+| **B200 spot** | **$2.12** (Spheron) | — | — |
+| **GB200** | $3.50 (Runcrate) | $5.85 (Oracle) | $6.95 (GCP) |
+| **MI300X** | **$1.50** (TensorWave) | $1.85 (Vultr) | $7.86 (Azure) |
+| **A100 80GB** | $1.07 (Spheron) | $1.50–2.00 | $3.00+ |
+| **Gaudi 3** | ~$1.50–2.50 | — | — |
+| **L40S** | $0.91 (Spheron) | $1.50–2.00 | — |
+
+### Inference cost ($/M tokens)
+
+| GPU | Provider | $/hr | Est. tok/s | $/M tok |
+|-----|----------|------|-----------|--------|
+| **B200** | Mid-range cloud | $6.02 | ~4,000 | **$0.42** |
+| **B200 spot** | Spheron | $2.12 | ~4,000 | **$0.15** |
+| **H100 PCIe** | Spheron | $2.01 | ~1,200 | $0.47 |
+| **A100 80GB** | Spheron | $1.07 | ~520 | $0.57 |
+| **H100 SXM** | AWS | $6.88 | ~1,200 | $1.59 |
+| **H200 SXM** | Mid-range cloud | $4.54 | ~1,800 | $0.70 |
+| **L40S** | Spheron | $0.91 | ~450 | $0.56 |
+
+*Values for Llama 3 70B (INT8, batch=1, output 1K tok). Actual values vary by batch size, context, and quantization.*
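
The $/M tok column follows from the hourly price and the sustained token rate. A minimal sketch of that conversion, with a hypothetical function name:

```python
def usd_per_million_tokens(usd_per_hour: float, tokens_per_second: float) -> float:
    """Cost of generating one million tokens at a given hourly price and throughput."""
    tokens_per_hour = tokens_per_second * 3600
    return usd_per_hour / tokens_per_hour * 1_000_000


# Inputs taken from the table rows above.
print(f"B200 spot: ${usd_per_million_tokens(2.12, 4000):.2f}")  # $0.15
print(f"H100 SXM:  ${usd_per_million_tokens(6.88, 1200):.2f}")  # $1.59
```

Because the result scales inversely with tokens per second, batching and quantization often move the cost per token more than the choice of provider does.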
+
+### Cost per GB HBM
+
+| GPU | HBM | Price/hr cloud | $/GB/hr | Best for memory-bound workloads |
+|-----|-----|-------------|--------|--------------------------------|
+| **MI300X** | 192 GB | $1.50 | **$0.0078** | ✅ Best |
+| **B200** | 192 GB | $3.39 | $0.0177 | ✅ Good |
+| **H200** | 141 GB | $3.89 | $0.0276 | ⚠️ |
+| **H100 SXM** | 80 GB | $1.38 | $0.0173 | ⚠️ Only up to 70B models |
+| **GB200** | 384 GB | $3.50 | $0.0091 | ✅✅ (2× MI300X capacity) |
+
+### Price/performance by scenario
+
+| Scenario | Winner | Rationale |
+|----------|--------|-----------|
+| **Absolute performance** (cost no object) | **GB200 NVL72** | 72× GPU; ~18 PFLOPS FP8 and 384 GB HBM per GB200 superchip (2 GPUs) |
+| **Cloud inference** — best $/token | **B200 spot** | $0.15/M tok; ~3.3× H100 throughput at lower cost |
+| **Cloud inference** — on-demand | **B200** | $0.42/M tok |
+| **Cloud inference** — budget | **A100 / L40S** | $0.56–0.57/M tok |
+| **Training** — price/perf on purchase | **Gaudi 3** | $8,515/PFLOPS, 2.5–3× better than H100 |
+| **Training** — cloud | **H100 SXM** | $1.38/hr, CUDA ecosystem, NCCL |
+| **Memory-bound** — long context, 70B+ | **MI300X / GB200** | 192–384 GB, $0.0078–0.0091/GB |
+| **Ecosystem + safe choice** | **H100/H200** | CUDA, widest SW, NVIDIA tools |
+| **Spot / preemptible** — lowest cost | **A100 / H100** | $1.07–1.38/hr, 50–90% off on-demand |
+
+### 2026 Trends
+
+- **H100** — price dropped 64% from peak $8/hr to $1.38–2.89/hr, then 40% rebound from inference demand
+- **B200** — new high-end, $3.39/hr cloud → ~$0.15/M tok on spot — new inference benchmark
+- **MI300X** — supply growing (TensorWave, Vultr, CoreWeave, Oracle, Azure), from $1.50/hr
+- **Gaudi 3** — best $/PFLOPS on purchase, but narrow ecosystem and limited cloud availability
+- **Market bifurcation** — prior gen (H100, A100) commoditizing, new gen (B200, GB200) commanding premium
+
+## Related documents
+
+- [GPU.en.md](GPU.en.md) — GPU architecture, NVIDIA/AMD, vGPU, MIG
+- [NETWORKING.en.md](NETWORKING.en.md) — InfiniBand, RoCE, network topology
+- [STORAGE.en.md](STORAGE.en.md) — parallel filesystem, object store
+- [DATACENTERS.en.md](DATACENTERS.en.md) — DC layout, power, cooling
+- [CLOUD.en.md](CLOUD.en.md) — cloud AI services (SageMaker, Vertex AI)
+
+## Sources
+
+Links, books, and standards: [sources/infrastructure/sources.en.md](sources/infrastructure/sources.en.md)
+
+*Last revision: 2026-06-18*
--- a/AI-INFRASTRUCTURE.md
+++ b/AI-INFRASTRUCTURE.md
@@ -0,0 +1,602 @@
+# 🧠 Infrastruktura pro AI/ML
+
+## Přehled komponent
+
+```mermaid
+flowchart TD
+    subgraph Compute
+        GPU["GPU (H100/B200/Instinct)"]
+        CPU["CPU (AMD EPYC / Intel Xeon)"]
+        ASIC["ASIC (TPU, Trainium, Inferentia)"]
+    end
+    subgraph Network
+        IB["InfiniBand NDR/XDR"]
+        ROCE["RoCEv2"]
+        NVL["NVLink / NVSwitch"]
+    end
+    subgraph Storage
+        FS["Parallel FS (Lustre, GPFS, Weka)"]
+        OBJ["Object Store (S3, MinIO)"]
+        NVME["Local NVMe cache"]
+    end
+    subgraph Orchestration
+        S["Slurm"]
+        K["Kubernetes + Volcano/Kueue"]
+    end
+    subgraph Cooling
+        DLC["Direct-to-chip liquid"]
+        IMM["Immersion"]
+        AIR["Air (high-density)"]
+    end
+
+    Compute --> Network --> Storage
+    Orchestration --> Compute
+    Cooling --> Compute
+```
+
+---
+
+## GPU compute
+
+### NVIDIA
+
+| GPU | Architecture | FP8 | FP16/BF16 | FP64 | HBM | NVLink | TDP | Rack config |
+|-----|-------------|-----|-----------|------|-----|--------|-----|------|
+| **H100 SXM** | Hopper | 3,958 TFLOPS | 1,979 TFLOPS | 67 TFLOPS | 80 GB HBM3 | 900 GB/s | 700 W | 8× in DGX H100 |
+| **H200 SXM** | Hopper (HBM3e) | 3,958 TFLOPS | 1,979 TFLOPS | 67 TFLOPS | 141 GB HBM3e | 900 GB/s | 700 W | 8× in DGX H200 |
+| **B200** | Blackwell | ~9,000 TFLOPS | ~4,500 TFLOPS | ~40 TFLOPS | 192 GB HBM3e | 1,800 GB/s | 1,000 W | 8× in DGX B200 |
+| **GB200 Grace Blackwell** | Blackwell | ~18,000 TFLOPS | ~9,000 TFLOPS | — | 2× 192 GB HBM3e + 480 GB (Grace) | NVLink-C2C | 1,000 W (per GPU) + 500 W (CPU) | GB200 NVL72 (36 superchips, 72 GPUs) |
+| **L40S** | Ada Lovelace | 733 TFLOPS | 367 TFLOPS | — | 48 GB GDDR6 | N/A | 350 W | Inference, enterprise |
+| **A100 SXM** | Ampere | Not supported (INT8: 1,248 TOPS) | 624 TFLOPS | 19.5 TFLOPS | 80 GB HBM2e | 600 GB/s | 400 W | DGX A100 |
+
+### AMD
+
+| GPU | Architecture | FP8 | FP16/BF16 | FP64 | HBM | Infinity Fabric | TDP |
+|-----|-------------|-----|-----------|------|-----|----------------|-----|
+| **MI300X** | CDNA 3 | 2,615 TFLOPS | 1,307 TFLOPS | 81 TFLOPS | 192 GB HBM3 | 896 GB/s | 750 W |
+| **MI250X** | CDNA 2 | — | 383 TFLOPS | 95.7 TFLOPS | 128 GB HBM2e | 400 GB/s | 500 W |
+
+### Intel
+
+| Accelerator | Architecture | FP16/BF16 | FP32 | HBM | TDP |
+|-------------|-------------|-----------|------|-----|-----|
+| **Gaudi 3** | Custom | 1,835 TFLOPS | — | 128 GB HBM2e | 600 W |
+| **Max 1550** | Xe HPC | 600+ TFLOPS | 200 TFLOPS | 128 GB HBM2e | 600 W |
+
+### Cloud ASIC
+
+| ASIC | Provider | Use case | Performance |
+|------|----------|----------|-------|
+| **TPU v5p** | Google | Training | ~459 TFLOPS (BF16) per chip |
+| **Trainium 2** | AWS | Training | ~1,000 TFLOPS (BF16) per chip |
+| **Inferentia 2** | AWS | Inference | ~400 TOPS (INT8) per chip |
+| **Maia 100** | Microsoft | Training + inference | Custom, 800 W TDP |
+
+---
+
+## AI networking
+
+### Technology comparison
+
+| Technology | Bandwidth per link | Latency | Topology | Use case |
+|-------------|-------------------|---------|-----------|----------|
+| **InfiniBand NDR200** | 200 Gb/s | < 1 µs | Fat-tree, Dragonfly+ | Training (NVIDIA) |
+| **InfiniBand NDR400** | 400 Gb/s | < 1 µs | Fat-tree, Dragonfly+ | Training (NVIDIA) |
+| **InfiniBand XDR** | 800 Gb/s (planned) | < 1 µs | Dragonfly+ | Next-gen training |
+| **RoCEv2** (CX-7/8) | 200–400 Gb/s | 1–2 µs | Fat-tree, Spine-leaf | Training (AMD, Intel, open) |
+| **NVLink 4.0** | 900 GB/s per GPU | < 0.5 µs | NVSwitch full-mesh | Intra-node GPU comm |
+| **NVLink 5.0** | 1,800 GB/s per GPU | < 0.5 µs | NVSwitch full-mesh | Intra-node (Blackwell) |
+| **Ethernet (400 GbE)** | 400 Gb/s | 2–5 µs | Spine-leaf | Inference, data pipeline |
+
+### AI fabric principles
+
+- **Rail-optimized topology**: each GPU communicates on a dedicated "rail" (GPUs with the same index across nodes attach to the same switch)
+- **Fat-tree (Clos)**: the standard for InfiniBand and RoCE, with non-blocking bisection bandwidth
+- **Dragonfly+**: fewer hops at the same bandwidth (used in the largest clusters)
+- **GPU Direct RDMA**: direct GPU ↔ GPU transfers across nodes without CPU involvement, supported on InfiniBand and RoCE
+- **SHARP (Scalable Hierarchical Aggregation and Reduction Protocol)**: in-network reduction for AllReduce (InfiniBand only)
+
+### Bandwidth sizing
+
+```text
+Rule of thumb: InfiniBand bandwidth ≥ 50 % of GPU HBM bandwidth for scalable training
+
+Example: H100 has 3.35 TB/s of HBM bandwidth
+  → Target: at least ~1.6 TB/s of network bandwidth per GPU
+  → DGX H100: 8× NDR400 InfiniBand, one NIC per GPU = 400 Gb/s = 50 GB/s per GPU
+  → Per node: 8 × 50 GB/s = 400 GB/s
+  → In practice: 50 GB/s ÷ 3,350 GB/s ≈ 1.5 % of HBM bandwidth per GPU → bottleneck
+```
+
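The sizing example above is plain arithmetic, so it can be checked in a few lines. The sketch below converts a link rate from Gb/s to GB/s and divides it by the HBM bandwidth. The function names are illustrative, the 3,350 GB/s and 400 Gb/s figures come from the example, and protocol overhead is ignored.

```python
def gbit_to_gbyte(gbit_per_s: float) -> float:
    """Convert a link rate from Gb/s to GB/s (8 bits per byte, no protocol overhead)."""
    return gbit_per_s / 8.0


def network_to_hbm_ratio(hbm_gb_per_s: float, link_gbit_per_s: float,
                         links_per_gpu: int = 1) -> float:
    """Fraction of HBM bandwidth that the per-GPU network links can deliver."""
    network_gb_per_s = gbit_to_gbyte(link_gbit_per_s) * links_per_gpu
    return network_gb_per_s / hbm_gb_per_s


H100_HBM_GB_PER_S = 3350.0   # 3.35 TB/s, from the example above
NDR400_GBIT_PER_S = 400.0    # one NDR400 link per GPU

ratio = network_to_hbm_ratio(H100_HBM_GB_PER_S, NDR400_GBIT_PER_S)
print(f"Network delivers {ratio:.1%} of HBM bandwidth per GPU")
```

Raising `links_per_gpu` shows how many links would be needed to approach the 50 % target, which is why the rule is an aspiration rather than something current hardware meets.
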
+---
+
+## AI storage
+
+### Requirements
+
+| Dataset size | IO pattern | Recommended storage | Bandwidth |
+|-------------|-----------|-------------------|-----------|
+| < 10 TB | Sequential read (data loading) | Local NVMe | > 10 GB/s per node |
+| 10–100 TB | Random read (checkpointing) | Parallel FS (Lustre, Weka) | > 100 GB/s cluster-wide |
+| 100 TB–10 PB | Mixed (training + checkpoint) | Parallel FS + object store | > 500 GB/s |
+| 10 PB+ | Multi-modal, video, LLM | Tiered (NVMe cache + parallel FS + object) | > 1 TB/s |
+
+### Storage solution comparison
+
+| Solution | Type | Bandwidth | Max capacity | Scaling | Use case |
+|--------|-----|-------------------|-------------|-----------|----------|
+| **Lustre** | Parallel FS (POSIX) | > 100 GB/s (cluster) | 100s PB | OST + MDS | HPC, LLM training (standard) |
+| **GPFS / StorageScale** | Parallel FS (POSIX) | > 100 GB/s | 100s PB | NSD servers | HPC, AI (IBM) |
+| **WekaFS** | Parallel FS (POSIX + NFS/SMB) | ~80 GB/s per 10 nodes | 10s PB | Container-native | AI/ML, NVIDIA DGX preferred |
+| **VAST Data** | Universal storage (NVMe + QLC) | ~100 GB/s per cluster | 10s PB | Scale-out | AI, checkpoint, data lake |
+| **Pure Storage FlashBlade//E** | All-flash (NVMe) | ~50 GB/s | ~30 PB | Scale-out | Enterprise AI, database |
+| **MinIO / S3** | Object store | ~20 GB/s per gateway | EB | Erasure coding | Dataset repository, checkpoint |
+| **NetApp AFF** | NAS + S3 | ~10 GB/s per controller | ~50 PB | HA pair | Enterprise, NFS baseline |
+
+### Checkpointing strategies
+
+| Strategy | RPO | Storage impact | Description |
+|-----------|-----|---------------|-------|
+| **Full checkpoint** | every N steps | High (pauses training) | Whole model + optimizer state |
+| **Async checkpoint** | every N steps | Medium (non-blocking) | Copy to a staging buffer, write in the background |
+| **Distributed checkpoint** (NVIDIA NeMo) | every N steps | Low | Each rank writes its own shard |
+| **In-memory checkpoint** (IBM) | on failover | Minimal (DRAM) | Replication into the DRAM of another node |
+| **Continuous checkpoint** (Microsoft) | every 1–5 min | Low (delta) | Only changed shards |
+
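The async strategy in the table can be sketched as follows, assuming the training state fits in host memory and can be pickled. The function name `async_checkpoint` and the file name are hypothetical, and `copy.deepcopy` stands in for the staging-buffer copy that a real framework would perform on GPU tensors.

```python
import copy
import os
import pickle
import tempfile
import threading


def async_checkpoint(state: dict, path: str) -> threading.Thread:
    """Snapshot `state` synchronously, then write it to `path` in the background."""
    snapshot = copy.deepcopy(state)  # the only blocking step: copy into a staging buffer

    def _write() -> None:
        tmp_path = path + ".tmp"
        with open(tmp_path, "wb") as f:
            pickle.dump(snapshot, f)
        os.replace(tmp_path, path)   # atomic rename, so readers never see a partial file

    writer = threading.Thread(target=_write)
    writer.start()
    return writer


state = {"step": 100, "weights": [0.1, 0.2, 0.3]}
with tempfile.TemporaryDirectory() as directory:
    checkpoint_path = os.path.join(directory, "ckpt.pkl")
    writer = async_checkpoint(state, checkpoint_path)
    state["step"] = 101              # training continues while the write is in flight
    writer.join()
    with open(checkpoint_path, "rb") as f:
        restored = pickle.load(f)

print(restored["step"], state["step"])
```

The checkpoint holds the state as of the snapshot, not the later mutation, which is the property that lets training proceed without waiting for storage.
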
+---
+
+## AI cluster architecture
+
+### Physical topology: DGX H100 example
+
+```text
+┌──────── DGX H100 (8× GPU) ────────┐
+│  ┌─────┐ ┌─────┐ ┌─────┐ ┌─────┐  │
+│  │GPU 0│ │GPU 1│ │GPU 2│ │GPU 3│  │
+│  └──┬──┘ └──┬──┘ └──┬──┘ └──┬──┘  │
+│  ┌──┴──┐ ┌──┴──┐ ┌──┴──┐ ┌──┴──┐  │
+│  │GPU 4│ │GPU 5│ │GPU 6│ │GPU 7│  │
+│  └─────┘ └─────┘ └─────┘ └─────┘  │
+│  NVSwitch (NVLink 4.0, 900 GB/s)  │
+│  InfiniBand CX-7: 8× NDR400       │
+└────────┬──────────────────────────┘
+         │ 8× IB rails
+    ┌────┴───────────────┐
+    │ IB NDR400 switches │  (rail-optimized)
+    └────────────────────┘
+```
+
+### Kubernetes for AI
+
+| Component | Role |
+|-----------|------|
+| **Volcano** | Batch scheduling, gang scheduling, queue management |
+| **Kueue** | Multi-tenant admission, resource quotas, fair sharing |
+| **NVIDIA GPU Operator** | Driver, container toolkit, MIG, DCGM, monitoring |
+| **HAMi** (ex k8s-vGPU-scheduler) | GPU sharing, MIG partitioning, fractional GPU |
+| **Node Feature Discovery** | Detects GPU type and NUMA topology |
+| **Topology Manager** | NUMA-aware pod placement |
+| **DPDK / SR-IOV** | High-performance networking for GPU Direct RDMA |
+
+### Slurm for AI
+
+| Component | Role |
+|-----------|------|
+| **slurm.conf** | Partitions for GPU nodes, GRES (Generic Resource) |
+| **gres.conf** | GPU type, number of GPUs per node |
+| **srun --gres=gpu:8** | Allocates 8 GPUs for a job |
+| **sbatch --nodes=64 --ntasks=512** | 64 nodes, 512 ranks (8 GPUs/node) |
+| **Pyxis** | NVIDIA container plugin for Slurm (used together with Enroot) |
+
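The relationship between `--nodes`, `--ntasks`, and GPUs per node in the table can be sketched as a small helper. This is an illustration rather than a Slurm tool: the function name and the `train.sh` script are hypothetical, while the `sbatch` flags are the standard ones.

```python
import shlex


def sbatch_command(nodes: int, gpus_per_node: int, script: str) -> str:
    """Build an sbatch command line with one task (rank) per GPU."""
    ntasks = nodes * gpus_per_node  # e.g. 64 nodes x 8 GPUs = 512 ranks
    args = [
        "sbatch",
        f"--nodes={nodes}",
        f"--ntasks={ntasks}",
        f"--ntasks-per-node={gpus_per_node}",
        f"--gres=gpu:{gpus_per_node}",  # GRES is requested per node
        script,
    ]
    return shlex.join(args)


command = sbatch_command(nodes=64, gpus_per_node=8, script="train.sh")
print(command)
```

Deriving `--ntasks` from the node and GPU counts avoids the common mistake of requesting more ranks than there are GPUs.
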
+---
+
+## Cooling AI clusters
+
+### Power density comparison
+
+| Configuration | Power (TDP) | Nodes per rack | kW/rack | Note |
+|-------------|-------------|-------|---------|----------|
+| Standard server (2U) | 1 kW (per node) | 20 | 5–10 | Typical DC |
+| GPU server (DGX H100, 6×) | 42 kW (6 nodes) | 6 | 45–50 | Air cooling limit |
+| GPU server (DGX B200, 6×) | 72 kW (6 nodes) | 6 | 90–100 | Liquid cooling required |
+| GB200 NVL72 (rack-scale system) | 120 kW (whole rack) | 1 rack | ~120 | Liquid cooling mandatory, fully liquid cooled |
+
+### Cooling technologies
+
+| Method | Max kW/rack | CAPEX | OPEX | Complexity |
+|--------|-------------|-------|------|-----------|
+| **Air cooling (CRAC/CRAH)** | < 15 | Low | Medium | Low |
+| **Air cooling (in-row)** | 15–30 | Medium | Medium | Low |
+| **Rear-door heat exchanger** | 30–50 | Medium | Low | Medium |
+| **Direct-to-chip liquid (cold plate)** | 50–150 | High | Low | High |
+| **Immersion (single-phase)** | 100–200 | High | Low | High |
+| **Immersion (two-phase)** | 200+ | Very high | Low | Very high |
+
+---
+
+## Inference infrastructure
+
+### Inference server comparison
+
+| Tool | Frameworks / formats | Optimizations | Use case |
+|---------|-----------|-------------|----------|
+| **vLLM** | HF Transformers, AWQ, GPTQ | PagedAttention, KV cache, continuous batching | LLM inference (open source) |
+| **TensorRT-LLM** | TensorRT | INT4/INT8/FP8, inflight batching, attention optimizations | Production (NVIDIA) |
+| **Triton Inference Server** | All (TensorRT, vLLM, PyTorch) | Model ensemble, model caching, concurrent execution | Enterprise, multi-model |
+| **SageMaker** | Managed | Auto-scaling, model parallelism | AWS managed |
+| **TGI** (OpenAI-compatible API) | HF Transformers | Continuous batching, flash attention | Hosting |
+
+### Inference optimizations
+
+| Technique | Latency improvement | Throughput improvement | Memory reduction |
+|----------|-----------------|---------------------|------------------|
+| **FP8/INT8 quantization** | — | 2× (vs FP16) | 2× |
+| **INT4 quantization** | — | 4× (vs FP16) | 4× |
+| **Flash Attention 2/3** | 2–4× | — | Attention activations from O(n²) to O(n); KV cache size unchanged |
+| **PagedAttention** | — | 2–5× | Removes most KV cache fragmentation |
+| **Continuous batching** | — | 10–20× | — |
+| **Speculative decoding** | 2–3× | — | — |
+| **Multi-LoRA / S-LoRA** | — | 8–16× | — |
+
+---
+
+## Distributed training techniques
+
+| Technique | Description | Frameworks |
+|----------|-------|------------|
+| **Data Parallelism (DDP/FSDP)** | Each GPU processes a different batch; DDP keeps a full model copy per GPU, FSDP shards it | PyTorch DDP, FSDP |
+| **Tensor Parallelism (TP)** | The weight matrices inside each layer are split across GPUs (typically intra-node) | Megatron-LM, DeepSpeed |
+| **Pipeline Parallelism (PP)** | Consecutive groups of layers (stages) are placed on different GPUs or nodes | Megatron-LM, DeepSpeed |
+| **Sequence Parallelism (SP)** | The sequence is split across GPUs | Megatron-LM |
+| **Expert Parallelism (EP)** | Different expert sub-networks of a Mixture-of-Experts (MoE) model live on different GPUs | Megatron-LM, DeepSpeed |
+| **3D Parallelism** | Combination of TP + PP + DP | Megatron-LM, NeMo |
+| **ZeRO (1/2/3)** | Sharding of optimizer states / gradients / parameters | DeepSpeed |
+| **NCCL / RCCL** | GPU collective communication library | NVIDIA/AMD |
+
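To make the tensor parallelism row concrete, here is a minimal pure-Python sketch of a column-parallel linear layer. It simulates the GPUs with list slices; all function names are illustrative, and a real implementation would use a framework's collective operations (for example an all-gather over NCCL) instead of list concatenation.

```python
def matmul(x, w):
    """Multiply x (m x k) by w (k x n); matrices are lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)] for row in x]


def split_columns(w, parts):
    """Split the columns of w into `parts` equal shards, one per simulated GPU."""
    n_cols = len(w[0])
    assert n_cols % parts == 0, "column count must be divisible by the shard count"
    step = n_cols // parts
    return [[row[i * step:(i + 1) * step] for row in w] for i in range(parts)]


def column_parallel_linear(x, w, parts):
    """Each simulated GPU multiplies x by its column shard; the outputs are concatenated."""
    partial_outputs = [matmul(x, shard) for shard in split_columns(w, parts)]
    # The concatenation below plays the role of an all-gather along the column axis.
    return [sum((partial[r] for partial in partial_outputs), []) for r in range(len(x))]


x = [[1, 2], [3, 4]]
w = [[1, 0, 2, 1], [0, 1, 1, 3]]
print(column_parallel_linear(x, w, parts=2))
print(matmul(x, w))
```

Both calls print the same matrix, which shows that splitting a layer's weights by column changes where the work happens but not the result.
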
+---
+
+## Operating systems for AI
+
+### Distribution comparison
+
+| OS | GPU driver | CUDA | Container toolkit | IB/RoCE | Lustre client | Production support |
+|----|-----------|------|-------------------|---------|--------------|-------------------|
+| **Ubuntu 22.04 LTS** | NVIDIA 525+ | 12.x | nvidia-container-toolkit | MLNX_OFED, rdma-core | Yes (lustre-client) | NVIDIA DGX standard |
+| **Ubuntu 24.04 LTS** | NVIDIA 550+ | 12.5+ | nvidia-container-toolkit | MLNX_OFED, rdma-core | Yes | Newest GPU support |
+| **RHEL 9 / Rocky 9** | NVIDIA 525+ | 12.x | nvidia-container-toolkit | MLNX_OFED | Yes (EL repo) | Red Hat, enterprise |
+| **DGX OS** (Ubuntu-based) | NVIDIA custom | 12.x | Pre-installed | Pre-configured | Yes | The only OS supported on NVIDIA DGX |
+| **SLES 15 SP5** | NVIDIA 525+ | 12.x | nvidia-container-toolkit | MLNX_OFED | Yes | HPC, some Lustre clusters |
+| **Debian 12** | NVIDIA 525+ | 12.x | nvidia-container-toolkit | rdma-core | Yes (backports) | Community, research |
+| **Flatcar / Bottlerocket** | Container-host | — | nvidia-container-toolkit | Limited | No | K8s-only, minimal footprint |
+
+### Constraints and limits
+
+#### GPU drivers and CUDA
+
+| Constraint | Detail |
+|----------|--------|
+| **Driver-CUDA compatibility** | The NVIDIA driver must be at least the version the CUDA toolkit requires. E.g. CUDA 12.4 ships with driver 550, and CUDA 12.x minor-version compatibility needs driver ≥ 525 |
+| **Kernel version** | The NVIDIA driver is not compatible with every kernel. A new kernel (6.8+) may need a DKMS build or may have to wait for driver support |
+| **Secure Boot** | The NVIDIA driver needs a signed module (MOK, shim) or Secure Boot disabled, a common problem in enterprise environments |
+| **Open vs proprietary driver** | NVIDIA `nvidia-open` (since R515) is the open source kernel module. It supports Turing and newer GPUs and is required for Blackwell; older GPUs (Volta, Pascal, Maxwell) need the proprietary module |
+| **nvidia-persistenced** | Keeps the driver state loaded while no process uses the GPU; without it the GPU is re-initialized for every new job, which adds start-up latency (legacy alternative: `nvidia-smi -pm 1`) |
+| **GPU reset** | After a training job crashes, the GPU can hang. Use `nvidia-smi --gpu-reset` or reboot the node; sometimes a power cycle is needed |
+| **Multi-instance GPU (MIG)** | Needs a MIG-capable driver and MIG mode enabled on the GPU. Enabling it requires a GPU reset and cannot be done while workloads run. Supported only on selected data center GPUs (e.g. A100, A30, H100, H200, B200) |
+
+#### Network (InfiniBand / RoCE)
+
+| Constraint | Detail |
+|----------|--------|
+| **MLNX_OFED vs rdma-core** | MLNX_OFED (NVIDIA): full feature support, but it ships its own kernel modules that must match the kernel version. `rdma-core` (open): fewer features, but it relies on the in-kernel drivers, so no extra modules are needed |
+| **Kernel compatibility** | MLNX_OFED supports only specific kernel versions (major.minor). A kernel upgrade requires rebuilding MLNX_OFED |
+| **NCCL** | The NCCL version must be compatible with CUDA and the IB firmware. Use `nccl-tests` for validation |
+| **SHARP** | In-network reduction needs a specific combination of MLNX_OFED and IB switch firmware |
+| **GPU Direct RDMA** | Needs the `nvidia-peermem` module plus MLNX_OFED. Does not work with every GPU and IB adapter |
+| **RoCE: PFC/ECN** | RoCE needs a lossless fabric (PFC, ECN, DCQCN). Both switches and hosts must be configured, which makes tuning complex |
+
+#### Storage
+
+| Constraint | Detail |
+|----------|--------|
+| **Lustre client** | Client and server versions must be compatible, so a server upgrade can force an upgrade of all clients. Client packages target mainly RHEL, SLES, and Ubuntu |
+| **POSIX locking** | NFS and Lustre differ in their POSIX locking behavior. Distributed training relies on flock, which causes problems on mixed filesystems |
+| **Filesystem cache** | The page cache can mask an IO bottleneck. Training jobs often need `O_DIRECT` or `sync` IO |
+| **Local NVMe vs parallel FS** | Staging the dataset on local NVMe removes the network dependency, but needs local capacity and a pre-fetch pipeline |
+
+#### Container runtime
+
+| Constraint | Detail |
+|----------|--------|
+| **Docker + GPU** | `nvidia-container-toolkit` (formerly nvidia-docker2). The runtime must be installed and configured in `/etc/docker/daemon.json` |
+| **Podman + GPU** | Needs `nvidia-container-toolkit` plus a podman hook. Less tested than Docker |
+| **containerd + GPU** | The standard for K8s. Needs `cdi` (Container Device Interface) or `nvidia-container-runtime` |
+| **Enroot + Pyxis** | NVIDIA container stack for Slurm (Enroot = daemonless container runtime, Pyxis = Slurm plugin) |
+| **User namespace mapping** | GPU access from a container needs device cgroup permissions, and rootless mode can fail (an exception is needed for /dev/dri and /dev/nvidia*) |
+
+#### Kernel parameters
+
+```text
+# Recommended sysctl settings for AI workloads
+net.core.rmem_max = 134217728       # large enough for NCCL
+net.core.wmem_max = 134217728
+net.ipv4.tcp_rmem = 4096 87380 134217728
+net.ipv4.tcp_wmem = 4096 65536 134217728
+net.core.netdev_budget = 600        # for high packet rates
+vm.max_map_count = 1048576          # PyTorch DataLoader workers
+kernel.numa_balancing = 0           # disable NUMA balancing (it disturbs locality)
+kernel.sched_min_granularity_ns = 10000000   # sysctl on kernels < 5.13; newer kernels expose it via debugfs
+
+# Kernel boot parameters (kernel command line, not sysctl)
+mitigations=off                     # disable security mitigations for performance (dedicated AI clusters only)
+transparent_hugepage=never          # or madvise; THP can cause latency spikes
+intel_idle.max_cstate=1             # reduce C-state transition latency
+```
+
+#### Firmware and hardware
+
+| Constraint | Detail |
+|----------|--------|
+| **GPU firmware (VBIOS)** | NVIDIA data center GPUs (H100, B200) receive VBIOS updates through NVFlash. Without the update, partitioning support or newer CUDA features may be missing |
+| **InfiniBand firmware** | IB switch and HCA firmware must be compatible. Mixing an old switch with a new HCA leads to degraded performance |
+| **NVSwitch firmware** | On DGX systems, NVSwitch firmware can be updated only through the NVIDIA DGX tools |
+| **Power capping (nvidia-smi)** | `nvidia-smi -pl <power>` limits power draw for power budget management. Test the effect on training throughput |
+| **GPU clock locking** | `nvidia-smi -lgc <min,max>` locks the GPU clock range (or `nvidia-smi -ac <mem,graphics>` sets application clocks) for stable benchmarks. Apply it only after `nvidia-persistenced` is running |
+| **PCIe Gen** | A GPU in a PCIe Gen4 slot (instead of Gen5) bottlenecks CPU↔GPU data transfer. Matters for FSDP sharding |
+
+### Recommended OS per use case
+
+| Use case | OS | Rationale |
+|----------|-----|-------|
+| **DGX cluster (production)** | DGX OS / Ubuntu 22.04 LTS | NVIDIA standard, best driver support |
+| **Enterprise K8s (OpenShift)** | RHEL 9 / RHCOS | Red Hat support, GPU Operator compatible |
+| **Vanilla K8s (on-prem)** | Ubuntu 22.04 LTS + Flatcar (workers) | Broadest community support, Flatcar for minimal footprint |
+| **Slurm cluster (HPC/AI)** | Rocky Linux 9 / Ubuntu 22.04 LTS | EL ecosystem (Lustre, OFED) or Ubuntu (community) |
+| **Research / rapid prototyping** | Ubuntu 24.04 LTS | Newest CUDA, PyTorch, driver support |
+| **Edge inference** | NVIDIA JetPack / Ubuntu (ARM) | Embedded GPU (Jetson Orin, AGX) |
+
+---
+
+## AI-ready data center checklist
+
+| Area | Requirement |
+|--------|-----------|
+| **Power** | 30–120 kW/rack, HVDC (400 V DC), UPS sized for GPU power spikes |
+| **Cooling** | Liquid cooling ready (direct-to-chip), rear-door for 30+ kW |
+| **Network** | InfiniBand (NDR/XDR) or RoCEv2, rail-optimized fat-tree |
+| **Storage** | Parallel FS (Lustre/Weka), checkpoint bandwidth > 100 GB/s |
+| **GPU density** | Maximize GPUs per rack, minimize NVSwitch hops |
+| **Physical** | Floor load capacity 1,500+ kg/m², 52U–60U racks |
+| **Security** | Tenant isolation, network segmentation, data encryption |
+| **Monitoring** | DCGM, NCCL health checks, thermals, power capping |
+
+---
+
+## Model and throughput limits
+
+### Model size per GPU
+
+The largest model that fits on a single GPU depends on HBM capacity and precision:
+
+| GPU | HBM | FP32 | FP16/BF16 | INT8 | INT4 |
+|-----|-----|------|-----------|------|------|
+| **H100 80GB** | 80 GB | ~20B | ~40B | ~80B | ~160B |
+| **H200 141GB** | 141 GB | ~35B | ~70B | ~140B | ~280B |
+| **B200 192GB** | 192 GB | ~48B | ~96B | ~192B | ~384B |
+| **MI300X 192GB** | 192 GB | ~48B | ~96B | ~192B | ~384B |
+| **A100 80GB** | 80 GB | ~20B | ~40B | ~80B | ~160B |
+| **GB200 (192+480)** | 192 GB GPU + 480 GB Grace | — | ~96B + CPU offload | — | — |
+
+*Approximate values for weights only: 1B parameters ≈ 2 GB in FP16 ≈ 4 GB in FP32 ≈ 1 GB in INT8 ≈ 0.5 GB in INT4. In practice, subtract ~10–15 % of HBM for activations, KV cache, and optimizer states.*
+
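The footnote's bytes-per-parameter figures are enough to reproduce this kind of estimate. The sketch below is a weights-only upper bound; the function name and the `reserve_fraction` parameter are illustrative assumptions, not part of any library.

```python
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def max_params_billion(hbm_gb: float, precision: str, reserve_fraction: float = 0.0) -> float:
    """Largest model (in billions of parameters) whose weights fit in HBM.

    reserve_fraction is the share of HBM kept free for activations and KV cache.
    """
    usable_gb = hbm_gb * (1.0 - reserve_fraction)
    # 1B parameters at N bytes each occupy N GB, so GB / bytes-per-param = billions.
    return usable_gb / BYTES_PER_PARAM[precision]


for gpu, hbm_gb in [("H100 80GB", 80), ("H200 141GB", 141), ("B200 192GB", 192)]:
    sizes = {p: max_params_billion(hbm_gb, p) for p in BYTES_PER_PARAM}
    print(gpu, sizes)
```

Passing `reserve_fraction=0.15` applies the 10–15 % headroom mentioned in the footnote, which lowers the H100 FP16 limit from 40B to about 34B.
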
+### Memory breakdown for inference
+
+| Component | Llama 3 70B (FP16) | Llama 3 8B (FP16) |
+|------------|-------------------|-------------------|
+| Model weights | 140 GB | 16 GB |
+| KV cache (4K context, batch 1) | ~1.3 GB | ~0.5 GB |
+| KV cache (128K context, batch 1) | ~43 GB | ~17 GB |
+| Activations (peak) | ~5 GB | ~1 GB |
+| **Total, 4K ctx** | ~146 GB | ~17.5 GB |
+| **Total, 128K ctx** | ~188 GB | ~34 GB |
+
+*KV cache figures assume grouped-query attention with 8 KV heads and head dimension 128 (80 layers for 70B, 32 layers for 8B), stored in FP16.*
+
+**Conclusion:** Llama 3 70B in FP16 does not fit on a single H100 (80 GB); its 140 GB of weights need at least 2× H100, and 4× in practice. Alternatives: INT8 (~70 GB of weights → 1× H200, or 2× H100 with KV cache headroom), INT4 (~35 GB → 1× H100), or tensor parallelism.
+
+### Context length vs memory
+
+| Context | KV cache 70B (FP16) | KV cache 8B (FP16) | Note |
+|---------|-------------------|-------------------|----------|
+| 4K | ~1.3 GB | ~0.5 GB | Typical chat |
+| 32K | ~11 GB | ~4.3 GB | Documents |
+| 128K | ~43 GB | ~17 GB | Long-context (Claude, Gemini) |
+| 1M | ~344 GB | ~137 GB | Experimental (Gemini 1.5 Pro) |
+
+The KV cache grows **linearly with context length**, and also linearly with batch size, layer count, and the number of KV heads (the values above assume Llama 3's 8 KV heads). It is the attention computation, not the cache, that grows quadratically with sequence length. For long-context serving the KV cache is the critical memory consumer.
+
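The linear scaling described above follows from the standard KV cache size formula, sketched here. The layer, head, and dimension values are assumptions taken from the published Llama 3 70B configuration (80 layers, 8 KV heads, head dimension 128), so results can differ from rounded table values.

```python
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int, context_tokens: int,
                   batch: int = 1, bytes_per_value: int = 2) -> int:
    """Size of the KV cache: one key and one value vector per layer, KV head, and token."""
    keys_and_values = 2
    return (keys_and_values * layers * kv_heads * head_dim
            * context_tokens * batch * bytes_per_value)


# Assumed Llama 3 70B shape: 80 layers, 8 KV heads (GQA), head dimension 128, FP16 values.
for context in (4096, 32768, 131072):
    size_gb = kv_cache_bytes(80, 8, 128, context) / 1e9
    print(f"{context:>7} tokens: {size_gb:.1f} GB")
```

Doubling `context_tokens`, `batch`, or `kv_heads` doubles the result, which is why grouped-query attention (fewer KV heads) matters so much for long contexts.
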
+### Inference throughput
+
+| Model | GPU | Precision | Batch size | Tokens/s (aggregate) | QPS (1K output) |
+|-------|-----|-----------|-----------|----------|-----------------|
+| Llama 3 8B | H100 | FP16 | 1 | ~800 | ~0.8 |
+| Llama 3 8B | H100 | FP16 | 128 | ~4,500 | ~4.5 |
+| Llama 3 8B | H100 | INT4 | 128 | ~8,000 | ~8 |
+| Llama 3 70B | 4× H100 | FP16 | 1 | ~180 | ~0.18 |
+| Llama 3 70B | 4× H100 | INT4 | 64 | ~1,200 | ~1.2 |
+| Llama 3 70B | 8× H100 | FP16 (TP=8) | 128 | ~2,500 | ~2.5 |
+| DeepSeek-R1 671B | 8× H200 | FP8 (MoE) | 64 | ~500 | ~0.5 |
+| GPT-4 class (est.) | — | — | — | ~100–300 | ~0.1–0.3 |
+
+**Notes:**
+- QPS (queries per second) = aggregate tokens/s ÷ output length; the table assumes 1K output tokens per query
+- A larger batch size raises throughput but also raises TTFT (time to first token)
+- Tensor Parallelism (TP) scales throughput, but communication overhead grows with the TP degree
+
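The difference between aggregate throughput, per-request speed, and QPS is easy to mix up, so here is a small sketch of the conversions. The function names are illustrative, and the sample figures are the Llama 3 8B rows from the throughput table.

```python
def queries_per_second(aggregate_tokens_per_s: float, output_tokens_per_query: int = 1000) -> float:
    """Completed queries per second when every query generates the same number of tokens."""
    return aggregate_tokens_per_s / output_tokens_per_query


def per_request_tokens_per_s(aggregate_tokens_per_s: float, batch_size: int) -> float:
    """Generation speed that a single request in the batch experiences."""
    return aggregate_tokens_per_s / batch_size


for batch_size, tokens_per_s in [(1, 800.0), (128, 4500.0)]:
    qps = queries_per_second(tokens_per_s)
    speed = per_request_tokens_per_s(tokens_per_s, batch_size)
    print(f"batch={batch_size}: {qps} QPS, {speed:.1f} tokens/s per request")
```

Batching raises QPS from 0.8 to 4.5 in this example while each individual request slows from 800 to about 35 tokens/s, which is the throughput-versus-latency trade-off named in the notes.
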
+### Training limits
+
+#### Scaling efficiency
+
+| GPU count | Model | Efficiency | Reason |
+|-----------|-------|-----------|-------|
+| 8 (1 node) | Llama 3 8B | ~95 % | NVLink intra-node |
+| 64 (8 nodes) | Llama 3 8B | ~85 % | IB inter-node |
+| 512 (64 nodes) | Llama 3 70B | ~75 % | Communication overhead |
+| 4,096 (512 nodes) | Llama 3 70B | ~60 % | Pipeline bubble, network |
+| 16,384 (2,048 nodes) | Llama 3 405B | ~45 % | Synchronous SGD overhead |
+
+**Note:** Efficiency = actual throughput ÷ throughput under ideal linear scaling. In this table it falls roughly linearly with the logarithm of the GPU count.
+
+#### Memory breakdown for training
+
+| Component | Llama 3 70B (BF16) | Llama 3 8B (BF16) |
+|------------|-------------------|-------------------|
+| Model weights | 140 GB | 16 GB |
+| Optimizer states (Adam) | 280 GB | 32 GB |
+| Gradients | 140 GB | 16 GB |
+| Activations (peak) | ~30 GB | ~4 GB |
+| **Total (DDP)** | ~590 GB | ~68 GB |
+| **Total (FSDP shard=8)** | ~74 GB | ~8.5 GB |
+
+**Conclusion:** FSDP (Fully Sharded Data Parallel) is required to train models > 10B parameters. With Adam, training needs about 4× the weight memory before activations (weights + gradients + two optimizer states), compared with weights alone for inference.
+
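The training memory table is a sum of per-parameter costs, which the sketch below reproduces. It assumes pure BF16 training with two BF16 Adam moments (4 bytes per parameter for the optimizer), as the table does, and like the table it divides the whole total by the shard count. That is a simplification, since activations are held per GPU and are not sharded by FSDP. The function name is illustrative.

```python
def training_memory_gb(params_billion: float, activations_gb: float, shards: int = 1,
                       bytes_weights: float = 2.0, bytes_gradients: float = 2.0,
                       bytes_optimizer: float = 4.0) -> float:
    """Approximate per-GPU training memory in GB (the table's simplified model)."""
    weights = params_billion * bytes_weights
    gradients = params_billion * bytes_gradients
    optimizer = params_billion * bytes_optimizer   # two Adam moments at 2 bytes each
    total = weights + gradients + optimizer + activations_gb
    return total / shards                          # simplification: activations divided too


print(training_memory_gb(70, activations_gb=30))            # DDP, Llama 3 70B
print(training_memory_gb(70, activations_gb=30, shards=8))  # FSDP across 8 GPUs
print(training_memory_gb(8, activations_gb=4, shards=8))    # FSDP, Llama 3 8B
```

With FP32 master weights and FP32 Adam moments, a common mixed-precision setup, `bytes_optimizer` would be 12 instead of 4, so the totals here are a lower bound.
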
+#### Time to train
+
+| Model | GPU count | GPU type | Precision | Time | Cost (on-prem estimate) |
+|-------|-----------|---------|-----------|------|---------------------|
+| Llama 3 8B | 64 | H100 | BF16 | ~3 days | ~$5,000 |
+| Llama 3 70B | 512 | H100 | BF16 | ~14 days | ~$100,000 |
+| Llama 3 405B | 16,384 | H100 | BF16 | ~60 days | ~$14 M |
+| DeepSeek-V3 671B (MoE, base model of R1) | 2,048 | H800 | FP8 mixed | ~57 days | ~$5.6 M |
+| GPT-4 (est.) | 25,000 | A100/H100 | Mixed | ~90–100 days | ~$100 M |
+
+### Power and thermal limits
+
+| Configuration | Power limit | Throughput loss | Reason |
+|-------------|-----------|------------------|-------|
+| H100 SXM | 700 W (default) | 0 % | Nominal |
+| H100 SXM | 600 W (-14 %) | ~5–8 % | Power capping |
+| H100 SXM | 500 W (-29 %) | ~15–25 % | Significant throttling |
+| H100 SXM | 400 W (-43 %) | ~30–50 % | Emergency only |
+| DGX H100 (8×) | 5.6 kW (GPUs at full TDP) | 0 % | 8 × 700 W; the air-cooled system draws up to ~10.2 kW in total |
+| DGX H100 (8×) | 4.5 kW (GPUs capped) | ~10–15 % | Cap for racks with limited cooling (e.g. rear-door heat exchanger) |
+
+A GPU throttles when it exceeds its power limit or its temperature threshold (85 °C+). Lowering the power cap reduces clock frequency roughly linearly, but throughput falls non-linearly.
+
+### API and operational limits
+
+| Limit | Description | Typical value |
+|-------|-------|-----------------|
+| **Rate limit** | Max requests per minute/hour | 100–10,000 RPM (by tier) |
+| **Tokens per minute (TPM)** | Max tokens per minute | 1M–300M (by model) |
+| **Context window** | Max input tokens | 4K–2M (by model) |
+| **Max output tokens** | Max generated tokens | 4K–32K (by model) |
+| **Concurrent requests** | Number of parallel requests | 10–10,000 (by backend) |
+| **Batch window** | Time spent collecting a batch | 0–20 s (vLLM, TGI) |
+| **TTFT timeout** | Max latency to the first token | 30–120 s |
+| **Idle timeout** | GPU idle → scale to 0 | 5–15 min (cloud) |
+
+### Limits per deployment model
+
+| Limit | Own hardware | Managed cloud (SageMaker, Vertex) | API (OpenAI, Anthropic) |
+|-------|--------------|----------------------------------|------------------------|
+| **Model size** | Limited by HBM (max 192 GB/GPU) | Unlimited (cluster scaling) | Unlimited |
+| **Queries** | Limited by GPU count | Auto-scaling | Rate limit (by tier) |
+| **Latency** | < 10 ms (same node) | 10–100 ms (network hop) | 100 ms – 10 s |
+| **Customization** | Full (fine-tuning, quantization) | Managed (SageMaker, Bedrock) | Prompt engineering only |
+| **Data privacy** | Yes (on-prem) | Contractual (region, encryption) | Limited |
+| **Cost per 1M tokens** | ~$0.10–0.50 (FP16 inference) | ~$0.20–1.00 | ~$0.15–15.00 |
+| **Max context** | 128K+ (by GPU count) | 128K+ | 32K–2M |
+| **Cold start** | 0 (always-on) | 30 s – 5 min | 0 (shared infra) |
+
+---
+
+## GPU prices and price/performance (2026)
+
+> Prices are indicative: NVIDIA does not publish an official price list for data center GPUs. Cloud prices are taken from public providers (Q2 2026). Hardware purchase prices vary by volume, reseller, and region.
+
+### Purchase price (buy)
+
+| GPU | Price/GPU | Price of 8× GPU baseboard | $/PFLOPS (FP16) | Note |
+|-----|---------|----------------------|----------------|----------|
+| **H100 SXM** | $27,000–40,000 | ~$200,000 | $25,000 | Scarce in 2023–2024, now stabilizing |
+| **H200 SXM** | $35,000–50,000 | ~$280,000 | ~$35,000 | H100 upgrade, HBM3e |
+| **B200** | ~$60,000–70,000 | ~$500,000+ | ~$31,000 | Blackwell, FP4 support |
+| **B100** | ~$30,000 | ~$240,000 | ~$20,000 | Cheaper than B200, similar FP8 performance |
+| **GB200** (Grace+Blackwell) | ~$70,000–100,000 | ~$2,000,000 (rack) | — | CPU+GPU unified, high-density |
+| **A100 80GB** | ~$10,000–15,000 | ~$120,000 | ~$19,200 | Previous generation, still relevant |
+| **MI300X** | ~$12,000–18,000 | ~$100,000 | ~$9,600 | AMD, 192 GB HBM3 |
+| **Gaudi 3** | ~$15,625 | ~$125,000 | **$8,515** | Intel, best $/PFLOPS |
+| **L40S** | ~$8,000–10,000 | — | — | Inference, enterprise |
+
+### Cloud prices (on-demand $/GPU/hr)
+
+| GPU | Cheapest | Mid-range (CoreWeave, Lambda) | Hyperscaler (AWS, GCP, Azure) |
+|-----|--------------|-------------------------------|-------------------------------|
+| **H100 SXM** | $1.38 (Thunder) | $2.89–3.29 | $4.15–6.88 |
+| **H100 PCIe** | $2.01 (Spheron) | $2.50 | — |
+| **H200 SXM** | $3.89 (Spheron) | $4.54 | $5.00+ |
+| **B200** | **$3.39** (Spheron) | $6.02 | $14.24 (AWS) |
+| **B200** (spot) | **$2.12** | — | — |
+| **GB200** | $3.50 (Runcrate) | $5.85 (Oracle) | $6.95 (GCP) |
+| **MI300X** | **$1.50** (TensorWave) | $1.85 (Vultr) | $7.86 (Azure) |
+| **A100 80GB** | $1.07 (Spheron) | $1.50–2.00 | $3.00+ |
+| **Gaudi 3** | ~$1.50–2.50 | — | — |
+| **L40S** | $0.91 (Spheron) | $1.50–2.00 | — |
+
+### Inference cost ($/M tokens)
+
+| GPU | Provider | $/hr | Est. tok/s | $/M tok |
+|-----|----------|------|-----------|--------|
+| **B200** | Spheron | $3.39 | ~4,000 | **$0.24** |
+| **B200** (spot) | Spheron | $2.12 | ~4,000 | **$0.15** |
+| **H100 PCIe** | Spheron | $2.01 | ~1,200 | $0.47 |
+| **A100 80GB** | Spheron | $1.07 | ~520 | $0.57 |
+| **H100 SXM** | AWS | $6.88 | ~1,200 | $1.59 |
+| **H200 SXM** | Mid-range | $4.54 | ~1,800 | $0.70 |
+| **L40S** | Spheron | $0.91 | ~450 | $0.56 |
+
+*Values for Llama 3 70B (INT8, batch=1, 1K output tokens), computed as $/hr ÷ (tok/s × 3,600) × 1,000,000. Real values vary with batch size, context, and quantization.*
+
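The $/M tokens column follows directly from the hourly price and the estimated throughput, as the sketch below shows. The function name is illustrative and the sample inputs are rows from the table above.

```python
def usd_per_million_tokens(usd_per_hour: float, tokens_per_s: float) -> float:
    """Cost of generating one million tokens at a steady throughput."""
    tokens_per_hour = tokens_per_s * 3600
    return usd_per_hour / tokens_per_hour * 1_000_000


rows = [
    ("H100 PCIe", 2.01, 1200),
    ("A100 80GB", 1.07, 520),
    ("H100 SXM (AWS)", 6.88, 1200),
    ("B200 (spot)", 2.12, 4000),
]
for gpu, usd_per_hour, tokens_per_s in rows:
    print(f"{gpu}: ${usd_per_million_tokens(usd_per_hour, tokens_per_s):.2f}/M tok")
```

Because the result scales inversely with throughput, a realistic batch size usually changes the cost per token far more than the choice of provider does.
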
+### Price per GB of HBM
+
+| GPU | HBM | Cloud price/hr | $/GB/hr | Suitability for memory-bound workloads |
+|-----|-----|-------------|--------|-----------------------------------|
+| **MI300X** | 192 GB | $1.50 | **$0.0078** | ✅ Best |
+| **B200** | 192 GB | $3.39 | $0.0177 | ✅ Good |
+| **H200** | 141 GB | $3.89 | $0.0276 | ⚠️ |
+| **H100 SXM** | 80 GB | $1.38 | $0.0173 | ⚠️ Only up to 70B models |
+| **GB200** | 384 GB | $3.50 | $0.0091 | ✅✅ (2× the MI300X capacity) |
+
+### Price/performance by scenario
+
+| Scenario | Winner | Rationale |
+|--------|-------|-------|
+| **Absolute performance** (cost no object) | **GB200 NVL72** | 72 GPUs (36 superchips); ~18 PFLOPS FP8 and 384 GB HBM per superchip |
+| **Cloud inference**, best $/token | **B200 spot** | $0.15/M tok; roughly 3× the estimated throughput of an H100 |
+| **Cloud inference**, on-demand | **B200** | $0.24/M tok |
+| **Cloud inference**, budget | **A100 / L40S** | $0.56–0.57/M tok |
+| **Training**, price/performance when buying | **Gaudi 3** | $8,515/PFLOPS, 2.5–3× better than H100 |
+| **Training**, cloud | **H100 SXM** | $1.38/hr, CUDA ecosystem, NCCL |
+| **Memory-bound**, long context, 70B+ | **MI300X / GB200** | 192–384 GB, $0.0078–0.0091/GB/hr |
+| **Ecosystem + safe choice** | **H100/H200** | CUDA, broadest software support, NVIDIA tools |
+| **Spot / preemptible**, lowest price | **A100 / H100** | $1.07–1.38/hr, 50–90 % discount versus on-demand |
+
+### Trends 2026
+
+- **H100**: price fell 64 % from the $8/hr peak to $1.38–2.89/hr, then rebounded by 40 % on the inference boom
+- **B200**: the new high end, $3.39/hr in the cloud and ~$0.15/M tok on spot; the benchmark for inference
+- **MI300X**: supply is growing (TensorWave, Vultr, CoreWeave, Oracle, Azure), priced from $1.50/hr
+- **Gaudi 3**: best $/PFLOPS when buying, but a narrow ecosystem and limited cloud availability
+- **The market has bifurcated**: older generations (H100, A100) are commoditizing, new ones (B200, GB200) hold a premium
+
+## Related
+
+- [GPU.md](GPU.md): GPU architecture, NVIDIA/AMD, vGPU, MIG
+- [NETWORKING.md](NETWORKING.md): InfiniBand, RoCE, network topology
+- [STORAGE.md](STORAGE.md): parallel filesystem, object store
+- [DATACENTERS.md](DATACENTERS.md): DC layout, power, cooling
+- [CLOUD.md](CLOUD.md): cloud AI services (SageMaker, Vertex AI)
+
+## Sources
+
+Links, books, and standards: [sources/infrastructure/sources.md](sources/infrastructure/sources.md)
+
+*Last revision: 2026-06-18*
--- a/BIG-DATA.en.md
+++ b/BIG-DATA.en.md
@@ -0,0 +1,232 @@
+# 🗄️ Big Data — ecosystem, architecture, tools
+
+## Overview
+
+The Big Data ecosystem in 2026: "Hadoop is dead, and yet it's everywhere." HDFS has shrunk, MapReduce is effectively gone, the Cloudera/Hortonworks era is over. But YARN lives on, the Hive Metastore has changed clothes into Iceberg/Delta, and the lakehouse pattern (cheap object storage + table format + distributed engine) is the inheritance Hadoop left behind.
+
+The modern Big Data stack has 8 layers:
+
+1. **Storage** — HDFS, S3, GCS, ABFS, MinIO
+2. **Table format** — Apache Iceberg, Delta Lake, Apache Hudi, Apache Paimon
+3. **Catalog** — Hive Metastore, Unity Catalog, Polaris, Nessie, AWS Glue
+4. **Batch processing** — Apache Spark, Trino-on-Spark, Dremio
+5. **Stream processing** — Apache Flink, Spark Structured Streaming, Kafka Streams
+6. **Distributed SQL** — Trino, Presto, StarRocks, ClickHouse
+7. **Transformation** — dbt, SQLMesh
+8. **Orchestration** — Apache Airflow 3.0, Dagster, Prefect, Kestra
+
+---
+
+## Storage
+
+### HDFS (Hadoop Distributed File System)
+
+| Feature | Detail |
+|---------|--------|
+| **Architecture** | Master/worker: NameNode (metadata) + DataNode (data) |
+| **Replication** | Default 3×, configurable (rack-aware) |
+| **Block size** | Default 128 MB (range 64 MB – 256 MB) |
+| **Limits** | NameNode memory ~ 1 GB / 1 million blocks; ~1000 DataNodes per cluster |
+| **Use case** | On-prem clusters, sequential read/write, large files |
+| **Status 2026** | Declining — most projects migrate to object storage (S3, GCS, MinIO) |
+
+HDFS remains relevant for on-prem environments where object storage is unavailable, or for specific use cases (YARN clusters, Spark shuffle). For new projects, object storage is recommended.
+
+### Object storage as Data Lake
+
+| Platform | Service | Use case |
+|----------|--------|----------|
+| **AWS** | S3 | Primary data lake, Iceberg/Delta on S3 |
+| **Azure** | ADLS Gen2 / Blob | Data lake for Azure ecosystem |
+| **GCP** | GCS | Data lake for GCP (Dataproc, BigQuery) |
+| **On-prem** | MinIO | S3-compatible object storage on own HW |
+
+### HDFS capacity planning
+
+| Data size | Configuration |
+|-----------|-------------|
+| **< 100 TB** | 3–5 DataNodes, 10 GbE, replication 3× |
+| **100 TB – 1 PB** | 5–20 DataNodes, 25/100 GbE, rack-aware, NameNode HA |
+| **1 PB+** | 20+ DataNodes, 100 GbE, Federation (multiple NameNodes) |
+
+---
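The HDFS figures quoted earlier (3× replication, 128 MB blocks, about 1 GB of NameNode memory per million blocks) can be combined into a rough capacity estimate. The sketch below is a best case that assumes every file fills whole blocks; many small files inflate the block count well beyond it. The function name and return shape are illustrative.

```python
import math


def hdfs_plan(logical_tb: float, replication: int = 3, block_mb: int = 128,
              heap_gb_per_million_blocks: float = 1.0) -> dict:
    """Estimate raw capacity, block count, and NameNode heap for a logical data size."""
    raw_tb = logical_tb * replication
    blocks = math.ceil(logical_tb * 1_000_000 / block_mb)   # decimal units: 1 TB = 1,000,000 MB
    namenode_heap_gb = blocks / 1_000_000 * heap_gb_per_million_blocks
    return {"raw_tb": raw_tb, "blocks": blocks, "namenode_heap_gb": namenode_heap_gb}


plan = hdfs_plan(100)   # 100 TB of logical data
print(plan)
```

For 100 TB the NameNode heap is well under 1 GB, so at this scale disk capacity and network sizing dominate the plan rather than NameNode memory.
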
+
+## Open Table Formats
+
+Table formats bring ACID transactions, schema evolution, and time travel to data lake object storage.
+
+| Format | Organization | Engine compatibility | Streaming | Catalog |
+|--------|-------------|---------------------|-----------|---------|
+| **Apache Iceberg** | Apache Software Foundation | Spark, Flink, Trino, Dremio, Athena, Snowflake | Flink sink, snapshot-based | REST catalog, Polaris, Glue, Hive |
+| **Delta Lake** | Linux Foundation (Databricks) | Spark (native), Trino, Flink (limited), Athena | Spark Streaming, DLT | Unity Catalog (open-sourced in 2024), Hive |
+| **Apache Hudi** | Apache Software Foundation | Spark, Flink, Trino (connector) | Built-in CDC, incremental | Hive, Glue (limited) |
+| **Apache Paimon** | Apache Software Foundation | Flink (native), Spark | LSM-tree, changelog mode | Hive, REST |
+
+**Recommendation 2026:**
+- **Iceberg** — broadest multi-engine support, vendor-neutral, open catalog (Polaris)
+- **Delta Lake** — best for Spark/Databricks ecosystem, UniForm for cross-format reads
+- **Hudi** — losing momentum, only if already in production
+- **Paimon** — emerging, Flink-native, LSM architecture
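The time travel mentioned at the start of this section rests on one idea: every commit produces a new immutable snapshot that lists the data files making up the table. The toy model below illustrates that idea only. `ToyTable` and `Snapshot` are hypothetical names, and the code does not follow the actual metadata layout of Iceberg, Delta Lake, Hudi, or Paimon.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    files: frozenset  # immutable set of data file names


@dataclass
class ToyTable:
    snapshots: list = field(default_factory=list)

    def commit(self, added=(), removed=()) -> int:
        """Create a new snapshot; earlier snapshots are never modified."""
        current = self.snapshots[-1].files if self.snapshots else frozenset()
        new_files = (current - frozenset(removed)) | frozenset(added)
        snap = Snapshot(len(self.snapshots) + 1, new_files)
        self.snapshots.append(snap)  # the single append is the commit point
        return snap.snapshot_id

    def scan(self, snapshot_id=None) -> frozenset:
        """Return the files visible at a snapshot (latest by default)."""
        if not self.snapshots:
            return frozenset()
        if snapshot_id is None:
            return self.snapshots[-1].files
        return self.snapshots[snapshot_id - 1].files
```

Reading an older `snapshot_id` returns the table as it was at that commit, which is the essence of time travel. Real formats add manifests, schema and partition metadata, and an atomic catalog pointer swap on top of this.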
+
+---
+
+## Processing Engines
+
+### Apache Spark
+
+The dominant batch processing engine, and a unified engine for batch, streaming, SQL, and ML workloads.
+
+| Feature | Detail |
+|---------|--------|
+| **Version 2026** | Spark 4.x (4.1.0), native Kubernetes support, Structured Streaming, Delta Lake integration |
+| **API** | Scala, Java, Python (PySpark), SQL, R (SparkR) |
+| **Batch** | DataFrame/Dataset, RDD, SQL queries; commonly cited as 10–100× faster than MapReduce, mainly for in-memory workloads |
+| **Streaming** | Structured Streaming (micro-batch), latency ~100 ms – 5 s |
+| **SQL** | Spark SQL, ANSI SQL, Hive compatible |
+| **ML** | MLlib, SparkML, MLflow integration |
+| **Scheduler** | YARN, Kubernetes (production-ready since Spark 3.x), standalone |
+| **Fault tolerance** | RDD lineage, checkpointing |
+
+**When to use Spark:**
+- Batch ETL/ELT pipelines
+- Unified engine for batch + streaming (team preference)
+- Machine learning pipelines (MLlib, SparkML)
+- SQL analytics on large datasets
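The micro-batch latency range quoted for Structured Streaming follows from the trigger interval: an event waits until the next batch starts. The sketch below is a simplified model, not Spark code. The function name is hypothetical, processing time is ignored, and an event arriving exactly at a trigger boundary is assumed to miss that batch.

```python
def micro_batch_wait_ms(arrivals_ms, trigger_interval_ms):
    """Time each event waits for the next trigger, in milliseconds.

    Integer arithmetic is used to avoid floating-point rounding.
    """
    if trigger_interval_ms <= 0:
        raise ValueError("trigger interval must be positive")
    waits = []
    for arrival in arrivals_ms:
        next_trigger = (arrival // trigger_interval_ms + 1) * trigger_interval_ms
        waits.append(next_trigger - arrival)
    return waits
```

With a 1 s trigger, events arriving at 0, 250, 999 and 1000 ms wait 1000, 750, 1 and 1000 ms respectively, so the added delay ranges from almost zero up to one full interval, before any processing time is counted.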
+
+### Apache Flink
+
+Low-latency engine for true streaming (per-event processing rather than micro-batches).
+
+| Feature | Detail |
+|---------|--------|
+| **Version 2026** | Flink 2.x (streaming-first, batch as bounded stream) |
+| **API** | DataStream API, Table/SQL API, ProcessFunction (low-level) |
+| **Latency** | < 100 ms (true streaming, Chandy-Lamport checkpointing) |
+| **State management** | Managed state (ValueState, ListState, MapState), RocksDB backend |
+| **Event time** | Native, watermarks, out-of-order handling |
+| **Batch** | Batch as bounded stream (same runtime) |
+| **Deployment** | YARN, Kubernetes, standalone |
+| **Economics** | Higher memory requirements (managed state), requires careful tuning |
+
+**When to use Flink:**
+- Fraud detection, real-time bidding, IoT (< 100 ms latency)
+- Complex stateful stream processing
+- CDC pipelines
+- Event-driven architectures
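The event-time row above mentions watermarks and out-of-order handling. As a minimal sketch of the idea, assuming a bounded-out-of-orderness strategy: the watermark trails the highest event timestamp seen so far by a fixed bound, and an event older than the watermark counts as late. This is plain Python, not the Flink API. The function name is hypothetical, and the watermark is updated on every event, whereas a real engine emits watermarks periodically.

```python
def split_by_watermark(event_times_ms, max_out_of_orderness_ms):
    """Classify events as on-time or late, in arrival order."""
    max_seen = None
    on_time, late = [], []
    for ts in event_times_ms:
        # Watermark before this event: highest timestamp seen minus the bound.
        watermark = None if max_seen is None else max_seen - max_out_of_orderness_ms
        if watermark is not None and ts < watermark:
            late.append(ts)       # arrived after the watermark passed it
        else:
            on_time.append(ts)
            max_seen = ts if max_seen is None else max(max_seen, ts)
    return on_time, late
```

For arrivals `[1000, 2000, 1500, 5000, 2500, 4200]` with a 1000 ms bound, only `2500` is late: by the time it arrives the watermark has advanced to 4000. A larger bound tolerates more disorder at the cost of results being emitted later.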
+
+### Trino (formerly PrestoSQL)
+
+Distributed SQL query engine for federated queries across heterogeneous data sources.
+
+| Feature | Detail |
+|---------|--------|
+| **Architecture** | Coordinator + Worker (no storage, no scheduler) |
+| **Connectors** | Iceberg, Delta, Hive, HDFS, S3, GCS, ADLS, PostgreSQL, MySQL, Kafka, Elasticsearch |
+| **Use case** | Interactive SQL, federated queries, lakehouse queries |
+| **Version 2026** | Trino 470+, Iceberg native, Delta Lake connector |
+
+---
+
+## Spark vs Flink vs Trino comparison
+
+| Criteria | Spark | Flink | Trino |
+|----------|-------|-------|-------|
+| **Primary use case** | Batch + unified engine | True streaming | Interactive SQL |
+| **Streaming latency** | 100 ms – 5 s (micro-batch) | < 100 ms (true streaming) | N/A |
+| **Throughput** | High (batch-optimized) | High (pipeline-optimized) | Medium (ad-hoc) |
+| **State management** | State store (external) | Managed state (embedded) | N/A |
+| **SQL support** | Spark SQL | Flink SQL | ANSI SQL (broadest) |
+| **ML/AI** | MLlib, SparkML | — | — |
+| **Kubernetes** | Native (production) | Native (production) | Native (production) |
+| **Learning curve** | Medium | High | Low |
+| **Operational complexity** | Medium | High | Medium |
+
+---
+
+## Orchestration
+
+| Tool | Version 2026 | Use case |
+|------|-------------|----------|
+| **Apache Airflow** | 3.0+ (TaskFlow API, dynamic tasks, deferrable operators) | Universal orchestration, largest ecosystem |
+| **Dagster** | 1.x (asset-oriented, software-defined assets) | Data pipelines, observability, asset lineage |
+| **Prefect** | 3.x (native async, workers, blocks) | Python-native, serverless workers |
+| **Kestra** | 1.x (YAML-native, declarative) | Event-driven orchestration |
+| **Apache NiFi** | 2.x (flow-based, visual) | Data ingestion, CDC, streaming |
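All of the tools above rest on the same core operation: resolving a dependency graph of tasks into a valid execution order. The sketch below shows that step with Python's standard-library `graphlib` (Python 3.9+). The task names and helper functions are hypothetical, and it makes no attempt to model scheduling, retries, or the API of any listed tool.

```python
from graphlib import CycleError, TopologicalSorter


def execution_order(dependencies):
    """Return tasks in an order that respects upstream dependencies.

    `dependencies` maps each task to the set of tasks it depends on.
    """
    return list(TopologicalSorter(dependencies).static_order())


def has_cycle(dependencies):
    """True if the graph cannot be ordered because of a circular dependency."""
    try:
        execution_order(dependencies)
    except CycleError:
        return True
    return False
```

For a simple pipeline such as `{"load": {"transform"}, "transform": {"extract"}, "extract": set()}` the order is extract, transform, load. A circular dependency is rejected, which is why orchestrators require pipelines to be acyclic.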
+
+---
+
+## Lakehouse architecture
+
+A lakehouse combines the flexibility of a data lake (object storage) with the performance and governance of a data warehouse.
+
+```
+┌──────────────────────────────────────────────────────┐
+│                    Query Engines                      │
+│   Trino    Spark SQL    Flink SQL    Dremio    Athena  │
+└─────────────────────────┬────────────────────────────┘
+                          │
+┌─────────────────────────▼────────────────────────────┐
+│                  Table Format Layer                    │
+│         Apache Iceberg / Delta Lake / Hudi            │
+│       (ACID, time travel, schema evolution)           │
+└─────────────────────────┬────────────────────────────┘
+                          │
+┌─────────────────────────▼────────────────────────────┐
+│                    Storage Layer                       │
+│           S3 / GCS / ADLS / MinIO / HDFS              │
+│               (Parquet / ORC / Avro)                  │
+└──────────────────────────────────────────────────────┘
+```
+
+For Iceberg details, see [DATABASES.en.md — Apache Iceberg Lakehouse](DATABASES.en.md#apache-iceberg-lakehouse).
+
+---
+
+## Big Data Infrastructure
+
+### Cluster sizing
+
+| Component | Spark (batch) | Flink (streaming) | Trino (SQL) |
+|-----------|--------------|-------------------|-------------|
+| **CPU** | 16–64 cores/node | 16–32 cores/node | 8–32 cores/node |
+| **RAM** | 64–256 GB/node | 64–256 GB/node (incl. managed state) | 64–256 GB/node |
+| **Storage** | HDFS / object storage | Object storage (checkpoints) | None (stateless) |
+| **Network** | 25–100 GbE (shuffle-heavy) | 25–100 GbE (checkpointing) | 25–100 GbE |
+| **Disk** | NVMe (scratch, shuffle) | NVMe (RocksDB state backend) | — |
+| **Cluster size** | 5–200+ nodes | 3–100+ nodes | 5–50 nodes |
+
+### Network considerations
+
+- **Spark shuffle** — heavy network traffic between nodes; recommend 25–100 GbE, ideally no oversubscription
+- **Flink checkpointing** — periodic state writes to object storage; requires stable latency
+- **HDFS rack awareness** — optimizes replication across racks
+- **Data locality** — HDFS: local disk reads; object storage: network-bound
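To see why link speed and oversubscription matter for shuffle-heavy jobs, a back-of-the-envelope transfer time helps. The sketch below is simple arithmetic under stated assumptions: data is in decimal gigabytes, the link rate is in Gbit/s, and a hypothetical 80% protocol efficiency is applied. The function name and defaults are illustrative, not measured values.

```python
def transfer_time_s(data_gb: float, link_gbps: float,
                    efficiency: float = 0.8,
                    oversubscription: float = 1.0) -> float:
    """Seconds to move `data_gb` gigabytes over one link.

    `oversubscription` is the fabric ratio (3.0 means 3:1), which divides
    the bandwidth available to the node under full contention.
    """
    if link_gbps <= 0 or efficiency <= 0 or oversubscription < 1:
        raise ValueError("invalid link parameters")
    effective_gbps = link_gbps * efficiency / oversubscription
    return data_gb * 8 / effective_gbps  # gigabytes -> gigabits
```

Under these assumptions, moving 100 GB of shuffle data per node takes about 40 s on 25 GbE and about 10 s on 100 GbE. A 3:1 oversubscribed fabric triples both figures when all nodes shuffle at once.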
+
+### Kubernetes vs YARN
+
+| Criteria | YARN | Kubernetes |
+|----------|------|-----------|
+| **Resource isolation** | Cgroups (YARN containers) | Cgroups + namespaces (pods) |
+| **Ecosystem fit** | Hadoop-native (HDFS, Hive, Spark) | Cloud-native, Spark, Flink, Trino |
+| **Operational complexity** | Lower (single cluster manager) | Higher (requires K8s cluster) |
+| **Multi-tenant isolation** | YARN queues (Capacity/Fair Scheduler) | Namespaces, ResourceQuotas, LimitRanges |
+| **Stateful workloads** | Limited | StatefulSets, PVC, Operators |
+| **2026 trend** | Legacy (declining) | Standard for new projects |
+
+---
+
+## Cloud deployment
+
+| Cloud | Batch processing | Streaming | SQL | Managed K8s |
+|-------|-----------------|-----------|-----|-------------|
+| **AWS** | EMR (Spark, Hive, Flink) | Kinesis, MSK (Kafka), EMR Flink | Athena (Trino), Redshift | EKS |
+| **Azure** | HDInsight (Spark, Hive), Synapse | Event Hubs, HDInsight Flink | Synapse SQL, Azure Data Explorer | AKS |
+| **GCP** | Dataproc (Spark, Flink, Hive, Trino) | Pub/Sub, Dataflow (Beam), Dataproc Flink | BigQuery | GKE |
+
+---
+
+## Sources
+
+Links, books and standards: [sources/infrastructure/sources.en.md](sources/infrastructure/sources.en.md)
+
+*Last revision: 2026-06-18*
--- a/BIG-DATA.md
+++ b/BIG-DATA.md
@@ -0,0 +1,232 @@
+# 🗄️ Big Data — ekosystém, architektura, nástroje
+
+## Přehled
+
+Big Data ekosystém v roce 2026: "Hadoop je mrtvý, a přitom je všude." HDFS se zmenšil, MapReduce je fakticky mrtvý, Cloudera/Hortonworks éra skončila. Ale YARN žije, Hive Metastore se převlékl do Iceberg/Delta a lakehouse pattern (levné object storage + tabulkový formát + distribuovaný engine) je dědictví, které Hadoop zanechal.
+
+Moderní Big Data stack má 8 vrstev:
+
+1. **Storage** — HDFS, S3, GCS, ABFS, MinIO
+2. **Tabulkový formát** — Apache Iceberg, Delta Lake, Apache Hudi, Apache Paimon
+3. **Catalog** — Hive Metastore, Unity Catalog, Polaris, Nessie, AWS Glue
+4. **Dávkové zpracování** — Apache Spark, Trino-on-Spark, Dremio
+5. **Streamové zpracování** — Apache Flink, Spark Structured Streaming, Kafka Streams
+6. **Distribuované SQL** — Trino, Presto, StarRocks, ClickHouse
+7. **Transformace** — dbt, SQLMesh
+8. **Orchestrace** — Apache Airflow 3.0, Dagster, Prefect, Kestra
+
+---
+
+## Úložiště (Storage)
+
+### HDFS (Hadoop Distributed File System)
+
+| Vlastnost | Detail |
+|-----------|--------|
+| **Architektura** | Master/worker: NameNode (metadata) + DataNode (data) |
+| **Replikace** | Výchozí 3×, konfigurovatelná (rack-aware) |
+| **Block size** | Výchozí 128 MB (lze 64 MB – 256 MB) |
+| **Limity** | NameNode memory ~ 1 GB / 1 milion bloků; ~1000 DataNode v clusteru |
+| **Use case** | On-prem clustery, sekvenční čtení/zápis, velké soubory |
+| **Stav 2026** | Klesající podíl — většina migruje na object storage (S3, GCS, MinIO) |
+
+HDFS je stále relevantní pro on-prem prostředí, kde object storage není dostupná, nebo pro specifické use case (YARN cluster, Spark shuffle). Pro nové projekty se doporučuje object storage.
+
+### Object storage jako Data Lake
+
+| Platforma | Služba | Use case |
+|-----------|--------|----------|
+| **AWS** | S3 | Hlavní data lake, Iceberg/Delta na S3 |
+| **Azure** | ADLS Gen2 / Blob | Data lake pro Azure ekosystém |
+| **GCP** | GCS | Data lake pro GCP (Dataproc, BigQuery) |
+| **On-prem** | MinIO | S3-kompatibilní object storage na vlastním HW |
+
+### Kapacitní plánování HDFS
+
+| Velikost dat | Konfigurace |
+|-------------|------------|
+| **< 100 TB** | 3–5 DataNode, 10 GbE, replication 3× |
+| **100 TB – 1 PB** | 5–20 DataNode, 25/100 GbE, rack-aware, NameNode HA |
+| **1 PB+** | 20+ DataNode, 100 GbE, Federation (více NameNode) |
+
+---
+
+## Tabulkové formáty (Open Table Formats)
+
+Tabulkové formáty přináší ACID transakce, schema evolution a time travel do data lake objektového úložiště.
+
+| Formát | Organizace | Engine kompatibilita | Streaming | Katalog |
+|--------|-----------|---------------------|-----------|---------|
+| **Apache Iceberg** | Apache Foundation | Spark, Flink, Trino, Dremio, Athena, Snowflake | Flink sink, snapshot-based | REST catalog, Polaris, Glue, Hive |
+| **Delta Lake** | Linux Foundation (Databricks) | Spark (native), Trino, Flink (limited), Athena | Spark Streaming, DLT | Unity Catalog (proprietary), Hive |
+| **Apache Hudi** | Apache Foundation | Spark, Flink, Trino (connector) | Built-in CDC, incremental | Hive, Glue (limited) |
+| **Apache Paimon** | Apache Foundation | Flink (native), Spark | LSM-tree, changelog mode | Hive, REST |
+
+**Doporučení 2026:**
+- **Iceberg** — nejširší multi-engine podpora, vendor-neutral, otevřený katalog (Polaris)
+- **Delta Lake** — nejlepší pro Spark/Databricks ekosystém, UniForm pro cross-format čtení
+- **Hudi** — ztrácí momentum, jen pokud již v produkci
+- **Paimon** — emerging, Flink-native, LSM architektura
+
+---
+
+## Zpracování (Processing Engines)
+
+### Apache Spark
+
+Dominantní engine pro dávkové zpracování a unifying engine (batch + streaming + SQL + ML).
+
+| Vlastnost | Detail |
+|-----------|--------|
+| **Verze 2026** | Spark 4.x (4.1.0), native Kubernetes support, Structured Streaming, Delta Lake integrace |
+| **API** | Scala, Java, Python (PySpark), SQL, R (SparkR) |
+| **Batch** | DataFrame/Dataset, RDD, SQL queries — 10–100× rychlejší než MapReduce |
+| **Streaming** | Structured Streaming (micro-batch), latence ~100 ms – 5 s |
+| **SQL** | Spark SQL, ANSI SQL, Hive compatible |
+| **ML** | MLlib, SparkML, integrace s MLflow |
+| **Scheduler** | YARN, Kubernetes (production-ready od Spark 3.x), standalone |
+| **Fault tolerance** | RDD lineage, checkpointing |
+
+**Kdy použít Spark:**
+- Dávkové ETL/ELT pipelines
+- Jednotný engine pro batch + streaming (team preference)
+- Machine learning pipelines (MLlib, SparkML)
+- SQL analytika na velkých datech
+
+### Apache Flink
+
+Nejvýkonnější engine pro true streaming (per-event zpracování).
+
+| Vlastnost | Detail |
+|-----------|--------|
+| **Verze 2026** | Flink 2.x (streaming-first, batch jako speciální případ streamu) |
+| **API** | DataStream API, Table/SQL API, ProcessFunction (low-level) |
+| **Latence** | < 100 ms (true streaming, Chandy-Lamport checkpointing) |
+| **State management** | Managed state (ValueState, ListState, MapState), RocksDB backend |
+| **Event time** | Nativní, watermarky, out-of-order handling |
+| **Batch** | Batch jako bounded stream (stejný runtime) |
+| **Deployment** | YARN, Kubernetes, standalone |
+| **Ekonomika** | Vyšší paměťové nároky (managed state), nutnost pečlivého tuningu |
+
+**Kdy použít Flink:**
+- Fraud detection, real-time bidding, IoT (< 100 ms latence)
+- Komplexní stateful stream processing
+- CDC pipelines
+- Event-driven architektury
+
+### Trino (ex PrestoSQL)
+
+Distribuovaný SQL query engine — federované dotazy napříč různými zdroji.
+
+| Vlastnost | Detail |
+|-----------|--------|
+| **Architektura** | Coordinator + Worker (bez storage, bez scheduleru) |
+| **Konektory** | Iceberg, Delta, Hive, HDFS, S3, GCS, ADLS, PostgreSQL, MySQL, Kafka, Elasticsearch |
+| **Use case** | Interactive SQL, federované dotazy, lakehouse queries |
+| **Verze 2026** | Trino 470+, Iceberg native, Delta Lake connector |
+
+---
+
+## Srovnání Spark vs Flink vs Trino
+
+| Kritérium | Spark | Flink | Trino |
+|-----------|-------|-------|-------|
+| **Primární use case** | Batch + unifying | True streaming | Interactive SQL |
+| **Latence streaming** | 100 ms – 5 s (micro-batch) | < 100 ms (true streaming) | N/A |
+| **Throughput** | Vysoký (batch optimalizace) | Vysoký (pipeline optimalizace) | Střední (ad-hoc) |
+| **State management** | State store (external) | Managed state (embedded) | N/A |
+| **SQL support** | Spark SQL | Flink SQL | ANSI SQL (nejširší) |
+| **ML/AI** | MLlib, SparkML | — | — |
+| **Kubernetes** | Native (production) | Native (production) | Native (production) |
+| **Křivka učení** | Střední | Vysoká | Nízká |
+| **Provozní náročnost** | Střední | Vysoká | Střední |
+
+---
+
+## Orchestrace
+
+| Nástroj | Verze 2026 | Use case |
+|---------|-----------|----------|
+| **Apache Airflow** | 3.0+ (taskflow API, dynamic tasks, deferrable operators) | Univerzální orchestrace, největší ekosystém |
+| **Dagster** | 1.x (asset-oriented, software-defined assets) | Data pipelines, observabilita, asset lineage |
+| **Prefect** | 3.x (native async, workers, blocks) | Python-native, serverless workers |
+| **Kestra** | 1.x (YAML-native, declarative) | Event-driven orchestration |
+| **Apache NiFi** | 2.x (flow-based, visual) | Data ingestion, CDC, streaming |
+
+---
+
+## Lakehouse architektura
+
+Lakehouse kombinuje flexibilitu data lake (object storage) s výkonem a governance data warehouse.
+
+```
+┌──────────────────────────────────────────────────────┐
+│                    Query Engines                      │
+│   Trino    Spark SQL    Flink SQL    Dremio    Athena  │
+└─────────────────────────┬────────────────────────────┘
+                          │
+┌─────────────────────────▼────────────────────────────┐
+│                  Table Format Layer                    │
+│         Apache Iceberg / Delta Lake / Hudi            │
+│       (ACID, time travel, schema evolution)           │
+└─────────────────────────┬────────────────────────────┘
+                          │
+┌─────────────────────────▼────────────────────────────┐
+│                    Storage Layer                       │
+│           S3 / GCS / ADLS / MinIO / HDFS              │
+│               (Parquet / ORC / Avro)                  │
+└──────────────────────────────────────────────────────┘
+```
+
+Detailněji Iceberg viz [DATABASES.md — Apache Iceberg Lakehouse](DATABASES.md#apache-iceberg-lakehouse).
+
+---
+
+## Infrastruktura pro Big Data
+
+### Cluster sizing
+
+| Komponenta | Spark (batch) | Flink (streaming) | Trino (SQL) |
+|------------|--------------|-------------------|-------------|
+| **CPU** | 16–64 cores/node | 16–32 cores/node | 8–32 cores/node |
+| **RAM** | 64–256 GB/node | 64–256 GB/node (včetně managed state) | 64–256 GB/node |
+| **Storage** | HDFS / object storage | Object storage (checkpointy) | Žádná (stateless) |
+| **Network** | 25–100 GbE (shuffle-heavy) | 25–100 GbE (checkpointing) | 25–100 GbE |
+| **Disk** | NVMe (scratch, shuffle) | NVMe (RocksDB state backend) | — |
+| **Cluster velikost** | 5–200+ nodes | 3–100+ nodes | 5–50 nodes |
+
+### Network considerations
+
+- **Spark shuffle** — heavy network traffic mezi uzly; doporučeno 25–100 GbE, ideálně bez oversubscription
+- **Flink checkpointing** — periodický zápis stavu na object storage; vyžaduje stabilní latenci
+- **HDFS rack awareness** — optimalizuje replikaci napříč racky
+- **Data locality** — HDFS: čtení z lokálního disku; object storage: network-bound
+
+### Kubernetes vs YARN
+
+| Kritérium | YARN | Kubernetes |
+|-----------|------|-----------|
+| **Resource isolation** | Cgroups (YARN containers) | Cgroups + namespaces (pods) |
+| **Ecosystem fit** | Hadoop-native (HDFS, Hive, Spark) | Cloud-native, Spark, Flink, Trino |
+| **Operational complexity** | Nižší (jeden cluster manager) | Vyšší (vyžaduje K8s cluster) |
+| **Multi-tenant isolation** | YARN queues (Capacity/Fair Scheduler) | Namespaces, ResourceQuotas, LimitRanges |
+| **Stateful workloads** | Omezená | StatefulSets, PVC, Operators |
+| **2026 trend** | Legacy (klesající) | Standard pro nové projekty |
+
+---
+
+## Nasazení v cloudu
+
+| Cloud | Dávkové zpracování | Streaming | SQL | Managed K8s |
+|-------|-------------------|-----------|-----|-------------|
+| **AWS** | EMR (Spark, Hive, Flink) | Kinesis, MSK (Kafka), EMR Flink | Athena (Trino), Redshift | EKS |
+| **Azure** | HDInsight (Spark, Hive), Synapse | Event Hubs, HDInsight Flink | Synapse SQL, Azure Data Explorer | AKS |
+| **GCP** | Dataproc (Spark, Flink, Hive, Trino) | Pub/Sub, Dataflow (Beam), Dataproc Flink | BigQuery | GKE |
+
+---
+
+## Zdroje
+
+Odkazy, knihy a standardy: [sources/infrastructure/sources.md](sources/infrastructure/sources.md)
+
+*Poslední revize: 2026-06-18*
--- a/CASSANDRA.en.md
+++ b/CASSANDRA.en.md
@@ -123,7 +123,7 @@ ScyllaDB is advantageous when:

 ## Sources

-References, books, and standards: [sources/databases/sources.md](sources/databases/sources.md)
+References, books, and standards: [sources/databases/sources.en.md](sources/databases/sources.en.md)

 ### Recommended reading

--- a/CICD.en.md
+++ b/CICD.en.md
@@ -637,7 +637,7 @@ New tools: Harness (AI-native CD), GitLab 19.0 (agentic MR workflows, secrets ma

 ## Resources

-Links, books and standards: [sources/cicd/sources.md](sources/cicd/sources.md)
+Links, books and standards: [sources/cicd/sources.en.md](sources/cicd/sources.en.md)

 ### Recommended Reading

--- a/CLOUD.en.md
+++ b/CLOUD.en.md
@@ -144,7 +144,7 @@ Analogues: Azure Well-Architected Framework, GCP Architecture Framework
 | **Storage optimized** | I4i, im4gn | 1:4 + NVMe | Transactional DB, data warehousing, Kafka | i4i.large ~$0.138/h |
 | **GPU / ML** | P5, g5, trn1 | GPU attach | AI training (P5), inference (g5), ML (trn1) | g5.xlarge ~$1.006/h |

-See [GPU.md](GPU.md) for GPU model and configuration details.
+See [GPU.en.md](GPU.en.md) for GPU model and configuration details.

 ### Storage

@@ -287,7 +287,7 @@ Automated checks of architectural characteristics — analogous to tests for arc

 ## Hybrid Cloud Connectivity

-See also: [NETWORKING.md](NETWORKING.md) — network architecture (VPN, BGP, VPC design).
+See also: [NETWORKING.en.md](NETWORKING.en.md) — network architecture (VPN, BGP, VPC design).

 - **Site-to-Site VPN** — IPSec tunnel over the internet
 - **Direct Connect / ExpressRoute / Dedicated Interconnect** — private physical connection
@@ -480,7 +480,7 @@ OpenStack is the dominant open-source platform for building private clouds (IaaS

 ## Resources

-Links, books and standards: [sources/cloud/sources.md](sources/cloud/sources.md)
+Links, books and standards: [sources/cloud/sources.en.md](sources/cloud/sources.en.md)
 - **Cost tagging** — assign tags for chargeback/showback (Environment, Team, Cost Center, Application)
 - **Automated compliance** — AWS Config, Azure Policy, GCP Org Policies for guardrails
 - **Multi-account strategy** — AWS Control Tower, Azure Landing Zones, GCP Resource Hierarchy
--- a/CONNECTIVITY.en.md
+++ b/CONNECTIVITY.en.md
@@ -259,7 +259,7 @@ HPE ProLiant Gen11 (DL360/DL380) supports:

 ## Sources

-Links, books, and standards: [sources/infrastructure/sources.md](sources/infrastructure/sources.md)
+Links, books, and standards: [sources/infrastructure/sources.en.md](sources/infrastructure/sources.en.md)

 ### Recommended literature

--- a/DATABASE-ENGINES.en.md
+++ b/DATABASE-ENGINES.en.md
@@ -90,7 +90,7 @@ Each transaction sees a snapshot of data as of the start time. Old row versions

 ## Resources

-Links, books and standards: [sources/databases/sources.md](sources/databases/sources.md)
+Links, books and standards: [sources/databases/sources.en.md](sources/databases/sources.en.md)

 ### Recommended Reading

--- a/DATABASES.en.md
+++ b/DATABASES.en.md
@@ -6,20 +6,20 @@

 | DB | License | Use Case | Details |
 |----|---------|----------|--------|
-| **PostgreSQL** | Open source | Universal, geospatial, analytics, AI | [POSTGRESQL.md](POSTGRESQL.md) |
-| **MySQL / MariaDB** | Open source | Web, LAMP stack, e-commerce | [MYSQL.md](MYSQL.md) |
+| **PostgreSQL** | Open source | Universal, geospatial, analytics, AI | [POSTGRESQL.en.md](POSTGRESQL.en.md) |
+| **MySQL / MariaDB** | Open source | Web, LAMP stack, e-commerce | [MYSQL.en.md](MYSQL.en.md) |
 | **Microsoft SQL Server** | Proprietary | Enterprise .NET, Windows ecosystem | — |
-| **Oracle DB** | Proprietary | Enterprise, finance, mainframe, RAC cluster | [ORACLE.md](ORACLE.md) |
+| **Oracle DB** | Proprietary | Enterprise, finance, mainframe, RAC cluster | [ORACLE.en.md](ORACLE.en.md) |
 | **Amazon Aurora** | Managed | MySQL/PostgreSQL compatible, cloud-native | — |

 ### NoSQL

 | Type | DB | Use Case | Details |
 |-----|----|----------|--------|
-| **Document** | MongoDB, Couchbase | JSON data, flexible schema | [MONGODB.md](MONGODB.md) |
-| **Key-Value / Cache** | Redis, Memcached, DynamoDB | Cache, session store, real-time | [REDIS.md](REDIS.md) |
-| **Wide-column** | Cassandra, ScyllaDB | Time-series, IoT, big data | [CASSANDRA.md](CASSANDRA.md) |
-| **Vector** | Pinecone, Qdrant, Milvus, pgvector | Embeddings, RAG, semantic search | [VEKTOROVE-DB.md](VEKTOROVE-DB.md) |
+| **Document** | MongoDB, Couchbase | JSON data, flexible schema | [MONGODB.en.md](MONGODB.en.md) |
+| **Key-Value / Cache** | Redis, Memcached, DynamoDB | Cache, session store, real-time | [REDIS.en.md](REDIS.en.md) |
+| **Wide-column** | Cassandra, ScyllaDB | Time-series, IoT, big data | [CASSANDRA.en.md](CASSANDRA.en.md) |
+| **Vector** | Pinecone, Qdrant, Milvus, pgvector | Embeddings, RAG, semantic search | [VECTOR-DBS.en.md](VECTOR-DBS.en.md) |
 | **Graph** | Neo4j, Dgraph | Relationships, recommendations, social graphs | — |

 ### Storage Engines
@@ -258,6 +258,8 @@ Table metadata (.metadata.json)
 | **Hidden partitioning** | Automatic partition filters (user does not need to specify) |
 | **Multi-engine** | Spark, Flink, Trino, Dremio, Snowflake over the same data |

+For a broader overview of the Big Data ecosystem (HDFS, Spark, Flink, Trino, Delta Lake, Hudi) see [BIG-DATA.en.md](BIG-DATA.en.md).
+
 ### When to Use Iceberg

 - Multi-tool access to the same governed data
@@ -305,7 +307,7 @@ Table metadata (.metadata.json)

 ## Resources

-Links, books and standards: [sources/databases/sources.md](sources/databases/sources.md)
+Links, books and standards: [sources/databases/sources.en.md](sources/databases/sources.en.md)

 ### Recommended Reading

--- a/DATABASES.md
+++ b/DATABASES.md
@@ -258,6 +258,8 @@ Table metadata (.metadata.json)
 | **Hidden partitioning** | Automatické partition filtry (uživatel nemusí uvádět) |
 | **Multi-engine** | Spark, Flink, Trino, Dremio, Snowflake nad stejnými daty |

+Detailnější přehled Big Data ekosystému (HDFS, Spark, Flink, Trino, Delta Lake, Hudi) viz [BIG-DATA.md](BIG-DATA.md).
+
 ### Kdy použít Iceberg

 - Multi-tool přístup ke stejným governed datům
--- a/DATACENTERS.en.md
+++ b/DATACENTERS.en.md
@@ -950,7 +950,7 @@ Tools: `smartmontools` (smartctl, smartd), Prometheus exporter (`node_exporter`)

 ## Sources

-Links, books and standards: [sources/infrastructure/sources.md](sources/infrastructure/sources.md)
+Links, books and standards: [sources/infrastructure/sources.en.md](sources/infrastructure/sources.en.md)

 ### Recommended literature

@@ -1010,7 +1010,7 @@ Best practices: separate auth and recursive resolvers, DNSSEC, split-horizon (in

 ### Monitoring and observability

-See [MONITORING.md](MONITORING.md). Before running first workloads, DC must have:
+See [MONITORING.en.md](MONITORING.en.md). Before running first workloads, DC must have:
 - Metric collection (Prometheus, Zabbix)
 - Centralized logs (Loki, ELK)
 - Alerting (Alertmanager, PagerDuty)
--- a/DC-MIGRATION.en.md
+++ b/DC-MIGRATION.en.md
@@ -241,6 +241,6 @@ See [CLOUD.en.md](CLOUD.en.md) — migration strategies (6 Rs):

 ## Sources

-Links, books, and standards: [sources/infrastructure/sources.md](sources/infrastructure/sources.md)
+Links, books, and standards: [sources/infrastructure/sources.en.md](sources/infrastructure/sources.en.md)

 *Last revision: 2026-06-12*
--- a/DR.en.md
+++ b/DR.en.md
@@ -323,14 +323,14 @@ contacts:

 ## Related

- [CLOUD.md](CLOUD.md) — cloud DR strategy, AWS/Azure/GCP specific
- [DATACENTERS.md](DATACENTERS.md) — DC redundancy, Tier classification
- [MONITORING.md](MONITORING.md) — alerting, SLI/SLO/SLA
- [CICD.md](CICD.md) — deployment strategy, rollback
- [STORAGE.md](STORAGE.md) — backup storage, replication
+- [CLOUD.en.md](CLOUD.en.md) — cloud DR strategy, AWS/Azure/GCP specific
+- [DATACENTERS.en.md](DATACENTERS.en.md) — DC redundancy, Tier classification
+- [MONITORING.en.md](MONITORING.en.md) — alerting, SLI/SLO/SLA
+- [CICD.en.md](CICD.en.md) — deployment strategy, rollback
+- [STORAGE.en.md](STORAGE.en.md) — backup storage, replication

 ## Sources

-Odkazy, knihy a standardy: [sources/infrastructure/sources.md](sources/infrastructure/sources.md)
+Links, books and standards: [sources/infrastructure/sources.en.md](sources/infrastructure/sources.en.md)

 *Last revised: 2026-06-11*
--- a/GPU.en.md
+++ b/GPU.en.md
@@ -112,6 +112,12 @@ NVLink topologie (GPU direct)   PCIe topologie (CPU mediated)
 - **Denoising**: AI-accelerated denoising on GPU
 - **Farm rendering**: Deadline, Qube! (job scheduler)

+## GPU pricing
+
+For detailed pricing comparisons (purchase price, cloud on-demand rates, inference cost in $/M tokens, $/GB of HBM, price trends 2024→2026), see:
+
+- [AI-INFRASTRUCTURE.en.md — GPU pricing and price/performance](AI-INFRASTRUCTURE.en.md#gpu-pricing-and-priceperformance)
+
 ## GPU server form factors

 | Form factor | GPU count | Power | Cooling | Example |
@@ -144,6 +150,6 @@ Cyborg is an OpenStack service for managing accelerators (GPU, FPGA, DPU, NPU).

 ## Sources

-Links, books and standards: [sources/infrastructure/sources.md](sources/infrastructure/sources.md)
+Links, books and standards: [sources/infrastructure/sources.en.md](sources/infrastructure/sources.en.md)

 *Last revision: 2026-06-03*
--- a/GPU.md
+++ b/GPU.md
@@ -112,6 +112,12 @@ NVLink topologie (GPU direct)   PCIe topologie (CPU mediated)
 - **Denoising**: AI-accelerated denoising na GPU
 - **Farm rendering**: Deadline, Qube! (job scheduler)

+## Ceny GPU
+
+Detailní cenová srovnání (nákupní cena, cloud on-demand, $/M token inferenčních nákladů, $/GB HBM, cenový vývoj 2024→2026) viz:
+
+- [AI-INFRASTRUCTURE.md — Ceny GPU a poměr cena/výkon](AI-INFRASTRUCTURE.md#ceny-gpu-a-poměr-cenavýkon)
+
 ## GPU server form factors

 | Form factor | GPU count | Power | Cooling | Příklad |
--- a/HARDWARE.en.md
+++ b/HARDWARE.en.md
@@ -4,9 +4,9 @@ This file has been split into separate areas:

 | Area | File |
 |--------|--------|
-| 🔧 Server hardware — components and architecture | [SERVER-HW.md](SERVER-HW.md) |
-| 🎮 GPU — architecture, models, virtualization | [GPU.md](GPU.md) |
-| ⚙️ Server configuration — best practices by workload | [SERVER-CONFIG.md](SERVER-CONFIG.md) |
-| 📦 Provisioning — boot, installation, server management | [PROVISIONING.md](PROVISIONING.md) |
+| 🔧 Server hardware — components and architecture | [SERVER-HW.en.md](SERVER-HW.en.md) |
+| 🎮 GPU — architecture, models, virtualization | [GPU.en.md](GPU.en.md) |
+| ⚙️ Server configuration — best practices by workload | [SERVER-CONFIG.en.md](SERVER-CONFIG.en.md) |
+| 📦 Provisioning — boot, installation, server management | [PROVISIONING.en.md](PROVISIONING.en.md) |

 *Last revision: 2026-06-03*
--- a/HYPERVISORS.en.md
+++ b/HYPERVISORS.en.md
@@ -24,7 +24,7 @@
 - **VM — Virtual Machine** — full virtualization, own kernel
 - **Container** — shared host kernel, lighter (Docker, LXC)
 - **Paravirtualization** — guest OS knows it runs in a VM (better I/O performance)
- **NUMA** — Non-Uniform Memory Access, CPU/memory allocation optimization (see [SERVER-HW.md](SERVER-HW.md#numa))
+- **NUMA** — Non-Uniform Memory Access, CPU/memory allocation optimization (see [SERVER-HW.en.md](SERVER-HW.en.md#numa))
 - **Overcommit** — allocating more vCPU/RAM than physically available (ratio management)
 - **Live Migration** — moving a running VM between hosts (vSphere vMotion, Hyper-V Live Migration)
 - **HA (High Availability)** — VM restart on another host upon failure
@@ -86,20 +86,22 @@ According to Foundry/CIO.com survey (2025): **56%** of organizations plan to red

 #### Target Platforms — Comparison

-| Criterion | Proxmox VE | Nutanix AHV | Microsoft Hyper-V | Red Hat OpenShift Virtualization |
-|-----------|-----------|-------------|-------------------|----------------------------------|
-| **Hypervisor** | KVM + LXC | KVM (fork) | Hyper-V | KVM (KubeVirt) |
-| **License** | Open source (free), support ~€500/host/year | Per node subscription (30–60% savings vs VCF) | Windows Server license (Standard/Datacenter) | OpenShift subscription (core-based) |
-| **Live Migration** | Live Migration (Proxmox 8+) | AHV Live Migration | Live Migration (SMB/RDMA) | KubeVirt (VMI live migration) |
-| **HA** | Proxmox HA (watchdog, fencing) | Built-in HA (Prism) | Hyper-V HA (WS Failover Cluster) | OpenShift HA (self-healing) |
-| **Storage** | ZFS, Ceph, LVM | AOS (hybrid/SSD, erasure coding) | S2D, CSV, ReFS | OCS, Ceph, LSO |
-| **Backup** | Proxmox Backup Server (free) | Native snapshot + DR | Windows Server Backup / Veeam | OpenShift APIs + OADP |
-| **Price (3 years, 3 hosts)** | $0 + support $1,500 | ~$45,000–60,000 | $0 (Hyper-V Server free) or Windows Server license | ~$90,000+ (OpenShift) |
-| **Price (3 years, 10 hosts)** | $0 + support $5,000 | ~$150,000–200,000 | Windows Server Datacenter for unlimited VMs | ~$300,000+ (OpenShift) |
-| **Migration difficulty** | Medium (VMDK → QCOW2, VirtIO drivers) | Low (Nutanix Move tool) | Medium (V2V converter, SCVMM) | High (Kubernetes learning curve) |
-| **Linux support** | Excellent (native KVM) | Excellent (KVM-based) | Good (LIS drivers) | Excellent (KVM + OpenShift) |
-| **Windows support** | Good (VirtIO drivers) | Excellent (ALAS drivers, svpd) | Excellent (native) | Good (KubeVirt + VirtIO) |
-| **GPU passthrough** | VFIO (excellent) | GPU passthrough | DDA (Direct Device Assignment) | VFIO + GPU Operator |
+| Criterion | Proxmox VE | Nutanix AHV | Microsoft Hyper-V | Red Hat OpenShift Virtualization | **Sangfor aSV (HCI)** |
+|-----------|-----------|-------------|-------------------|----------------------------------|----------------------|
+| **Hypervisor** | KVM + LXC | KVM (fork) | Hyper-V | KVM (KubeVirt) | **KVM (aSV)** |
+| **License** | Open source (free), support ~€500/host/year | Per node subscription (30–60% savings vs VCF) | Windows Server license (Standard/Datacenter) | OpenShift subscription (core-based) | **Per node (Enterprise Pro), all-inclusive** |
+| **Live Migration** | Live Migration (Proxmox 8+) | AHV Live Migration | Live Migration (SMB/RDMA) | KubeVirt (VMI live migration) | **Yes** |
+| **HA** | Proxmox HA (watchdog, fencing) | Built-in HA (Prism) | Hyper-V HA (WS Failover Cluster) | OpenShift HA (self-healing) | **Built-in HA** |
+| **Storage** | ZFS, Ceph, LVM | AOS (hybrid/SSD, erasure coding) | S2D, CSV, ReFS | OCS, Ceph, LSO | **aSAN (distributed SDS, locality-aware)** |
+| **Backup** | Proxmox Backup Server (free) | Native snapshot + DR | Windows Server Backup / Veeam | OpenShift APIs + OADP | **Built-in backup + CDP** |
+| **Price (3 years, 3 hosts)** | $0 + support $1,500 | ~$45,000–60,000 | $0 (Hyper-V Server free) or Windows Server license | ~$90,000+ (OpenShift) | **~$15,000–25,000** |
+| **Price (3 years, 10 hosts)** | $0 + support $5,000 | ~$150,000–200,000 | Windows Server Datacenter for unlimited VMs | ~$300,000+ (OpenShift) | **~$50,000–80,000** |
+| **Migration difficulty** | Medium (VMDK → QCOW2, VirtIO drivers) | Low (Nutanix Move tool) | Medium (V2V converter, SCVMM) | High (Kubernetes learning curve) | **Low (VMware import tool)** |
+| **Linux support** | Excellent (native KVM) | Excellent (KVM-based) | Good (LIS drivers) | Excellent (KVM + OpenShift) | **Excellent (KVM-based)** |
+| **Windows support** | Good (VirtIO drivers) | Excellent (ALAS drivers, svpd) | Excellent (native) | Good (KubeVirt + VirtIO) | **Good (VirtIO drivers)** |
+| **GPU passthrough** | VFIO (excellent) | GPU passthrough | DDA (Direct Device Assignment) | VFIO + GPU Operator | **vGPU support (standard)** |
+| **Integrated security** | — | — | — | — | **Yes (NGFW, IPS, WAF, EDR — aSEC)** |
+| **Min. cluster (3 copies)** | 3 (Ceph) | 3 | 2–3 | 3 | **3** |

 #### Migration Tools

@@ -112,8 +114,47 @@ According to Foundry/CIO.com survey (2025): **56%** of organizations plan to red
 | **virt-v2v** | VMware ESXi, Xen, Hyper-V | KVM (libvirt) | Open source CLI tool, disk + driver conversion (virtio), suitable for bulk migration |
 | **Windows Admin Center VM Conversion Extension** | VMware ESXi | Hyper-V | Microsoft WAC extension, free, GUI-based, bulk migration |
 | **Platform9 vJailbreak** | VMware ESXi | OpenStack / KVM | In-place migration (no swing gear), open source |
+| **Sangfor VMware Import Tool** | VMware ESXi | Sangfor aSV (HCI) | Imports VMs from vCenter, disk + driver conversion, can retain network config |

-#### TCO Comparison — Example: 3 hosts (2× 20C CPU), 50 VMs
+#### Cross-Hypervisor Migration Matrix
+
+Comprehensive overview of all source→target pairs with methods, tools, limitations, and complexity.
+
+| Source → Target | Method | Tools | Complexity | Limitations |
+|-------------|--------|----------|-----------|---------|
+| **VMware → Proxmox** | Disk conversion VMDK→QCOW2, driver reinstall | Proxmox Import Wizard, Veeam, StarWind, virt-v2v | Medium | VirtIO drivers required, UEFI not supported in Import Wizard (< 8.1), snapshots must be removed |
+| **VMware → Hyper-V** | Disk conversion VMDK→VHDX, driver reinstall | StarWind, WAC Converter, SCVMM, Microsoft MTC | Medium | Integration Services required, network config differences (VMXNET3 → Hyper-V Synthetic) |
+| **VMware → KVM/XCP-ng** | Disk conversion VMDK→raw/QCOW2, driver swap | virt-v2v, StarWind | Medium | VirtIO drivers, UEFI support (OVMF), host passthrough compatibility |
+| **VMware → Nutanix AHV** | Automated migration via Move appliance | Nutanix Move, Veeam | Low | AHV is also KVM — minimal issues, retain IP/MAC, UEFI support |
+| **VMware → Sangfor aSV** | Import via VMware Import Tool, disk + driver conversion | Sangfor VMware Import Tool | Low | Built-in tool, retain network config, UEFI support |
+| **VMware → OpenStack** | In-place or swing | Platform9 vJailbreak, virt-v2v + Glance | High | Network redesign (Neutron), storage (Cinder), image format (Glance) required |
+| **Hyper-V → VMware** | Disk conversion VHDX→VMDK, driver reinstall | StarWind, virt-v2v, VMware vCenter Converter (standalone) | Medium | VMware Tools required, network driver change (VMXNET3), UEFI/secure boot issues |
+| **Hyper-V → Proxmox** | Disk conversion VHDX→QCOW2, driver swap | StarWind, virt-v2v, qemu-img | Medium–High | VirtIO drivers, integration services → guest agent, secure boot issues |
+| **Hyper-V → KVM/XCP-ng** | Disk conversion VHDX→raw/QCOW2 | virt-v2v, qemu-img | Medium | VirtIO drivers, Linux generic drivers usually work |
+| **Hyper-V → Nutanix AHV** | Automated migration | Nutanix Move | Low–Medium | Similar to VMware→Nutanix, UEFI support, retain IP |
+| **Proxmox → VMware** | Export OVF/OVA, qemu-img convert | qemu-img (QCOW2→VMDK), ovftool, manual OVF export | High | VMware Tools required, storage format differences, no live migration, downtime required |
+| **Proxmox → Hyper-V** | qemu-img convert, driver reinstall | qemu-img, manual VHDX conversion | High | Hyper-V Integration Services required, no automated tool, edge case |
+| **Proxmox → KVM/XCP-ng** | Direct QCOW2 (same format), XML edit | libvirt, virsh dumpxml/define | Medium | libvirt XML/QEMU args differences (storage pool, network), validation required |
+| **Proxmox → Nutanix AHV** | qemu-img + manual import | qemu-img, Nutanix Image Service CLI | High | No dedicated migration tool, conversion + manual VM reconfiguration required |
+| **XCP-ng → VMware** | Disk conversion VHD→VMDK | qemu-img, StarWind, virt-v2v | High | VMware Tools required, paravirtualization differences (Xen PV vs VMware) |
+| **XCP-ng → Proxmox** | Disk conversion or direct VHD | qemu-img, manual import | Medium | Disk conversion, VHD format not native in Proxmox |
+| **XCP-ng → Hyper-V** | Disk conversion VHD→VHDX (direct) | StarWind, qemu-img | Medium | VHD/VHDX compatible, Integration Services required |
+| **Nutanix AHV → VMware** | Export + conversion | qemu-img, Nutanix Export, VMware vCenter Converter | High | VMware Tools, AHV is KVM → usually easier than Hyper-V→VMware |
+| **Nutanix AHV → Proxmox** | qemu-img + manual import | qemu-img, Nutanix self-service restore | Medium | Disks exported from AOS storage → QCOW2, metadata must be reconstructed |
+| **Nutanix AHV → Hyper-V** | qemu-img + manual | qemu-img, StarWind | High | Edge case, no dedicated migration tool |
+| **OpenStack → (any)** | Glance export + qemu-img | glance image-download, qemu-img, ovftool | Medium–High | Image format (raw/QCOW2), metadata (flavor, security groups) must be recreated |
+| **Sangfor aSV → (any)** | qemu-img conversion + manual | qemu-img, manual OVF/OVA export | Medium–High | KVM-based → conversion to QCOW2/VMDK/VHDX via qemu-img, metadata must be recreated |
+| **(any) → Sangfor aSV** | aSV API import + VMware Import Tool | Sangfor VMware Import Tool (for VMware), manual qemu-img import for others | Medium | KVM-based → standard formats supported, import tool for VMware only |
+
+**Keys to a successful migration:**
+
+- **Drivers** — each platform requires its own paravirtual drivers (VMware Tools, VirtIO, Hyper-V Integration Services, Xen Tools). Always swap after migration.
+- **UEFI / Secure Boot** — not all combinations support UEFI (Proxmox Import Wizard < 8.1 does not). Test UEFI VMs before migration.
+- **Snapshots** — snapshots must be removed (merged) before migration. Most tools only migrate flat disks.
+- **Network** — MAC addresses, IP addresses, VLAN tagging — verify after migration. Some tools (Nutanix Move, VMware Converter) can retain MAC.
+- **Storage format** — VMDK ↔ VHDX ↔ QCOW2 ↔ raw are inter-convertible via `qemu-img`, but metadata differs (snapshots, backing files).
+- **Live migration** — no live migration exists between different hypervisors. Downtime is always required (minutes to hours depending on VM size).
+- **Migration temperature** — the "colder" the VM (fewer changes), the easier the migration. Real-time database applications require a separate DB migration plan.
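
The storage-format point above can be sketched as a small helper that assembles a `qemu-img convert` command line. This is a minimal sketch, not part of the original document: the function name `convert_command` and the file names are hypothetical, and nothing is executed. The format names (`vmdk`, `vhdx`, `vpc` for legacy VHD, `qcow2`, `raw`) are the ones `qemu-img` itself uses.

```python
import shlex

# Map the disk formats named in the text to qemu-img format names.
# "vpc" is qemu-img's name for the legacy VHD format.
QEMU_FORMATS = {
    "VMDK": "vmdk",
    "VHDX": "vhdx",
    "VHD": "vpc",
    "QCOW2": "qcow2",
    "raw": "raw",
}


def convert_command(src_path, src_format, dst_path, dst_format):
    """Build a `qemu-img convert` argument list (hypothetical helper, runs nothing)."""
    return [
        "qemu-img", "convert",
        "-p",                            # show progress
        "-f", QEMU_FORMATS[src_format],  # source disk format
        "-O", QEMU_FORMATS[dst_format],  # output disk format
        src_path, dst_path,
    ]


# Hypothetical file names, for illustration only.
cmd = convert_command("web01.vmdk", "VMDK", "web01.qcow2", "QCOW2")
print(shlex.join(cmd))
```

Only the flat disk contents are converted this way; snapshots and VM metadata still have to be handled separately, as the list above notes.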

+#### TCO Comparison — Example: 3 hosts (2× 20C CPU), 50 VMs
+
 | Platform | Year 1 | 3 Years Total | Note |
 |-----------|--------|---------------|----------|
@@ -123,6 +164,7 @@ According to Foundry/CIO.com survey (2025): **56%** of organizations plan to red
 | **Nutanix AHV** (average) | ~$18,000 | ~$54,000 | Per node subscription, estimate |
 | **Hyper-V** (Windows Server Datacenter) | $12,400 | $37,200 | One-time license per core, without SA |
 | **Hyper-V** (Azure Stack HCI) | ~$14,400 | ~$43,200 | ~$10/core/month, 120 cores |
+| **Sangfor HCI** (Enterprise Pro) | ~$5,000–8,000 | ~$15,000–25,000 | Per node, all-inclusive, 3 nodes |
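
For the per-core subscription rows, the arithmetic can be checked with a short sketch. It assumes the example sizing of 3 hosts with 2× 20-core CPUs and the rate quoted in the Note column; the function name is hypothetical, and the result is list-price arithmetic only, ignoring hardware, support, and discounts.

```python
def per_core_subscription(hosts, sockets_per_host, cores_per_socket,
                          usd_per_core_month, years):
    """Return (total cores, annual cost, multi-year cost) for a per-core subscription."""
    cores = hosts * sockets_per_host * cores_per_socket
    annual = cores * usd_per_core_month * 12
    return cores, annual, annual * years


# 3 hosts, 2 sockets each, 20 cores per socket, $10 per core per month, 3 years.
cores, annual, total = per_core_subscription(3, 2, 20, 10, 3)
print(cores, annual, total)  # 120 14400 43200
```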

 **Real-world example from Spiceworks (2026)**: A user reports VMware Essentials+ increasing from $1,900/year to $14,000/year (VVF) — a 7.4× increase.

@@ -142,8 +184,9 @@ According to Foundry/CIO.com survey (2025): **56%** of organizations plan to red
 3. Select target platform (1-2 candidates)
   ├─ Proxmox: lowest TCO, Linux-heavy shops
   ├─ Nutanix: enterprise HCI, low migration difficulty
-   ├─ Hyper-V: Windows-centric, Azure hybrid
-   └─ OpenShift: Kubernetes-first, platform engineering
+   ├─ Hyper-V: Windows-centric, Azure hybrid
+   ├─ Sangfor: HCI all-in-one, security-first, VMware exit (SMB/mid-market)
+   └─ OpenShift: Kubernetes-first, platform engineering

 4. Plan migration phases
   ├─ Wave 1: non-critical (dev/test, 1-2 months)
@@ -269,9 +312,71 @@ Hardware ──> QEMU (I/O emulation) + KVM (kernel module, virtualization)
 - Load KVM modules: `kvm`, `kvm_intel`/`kvm_amd`, `vfio-pci`
 - Optimize storage: raw/LVM (avoid qcow2 for performance workloads)
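
A quick way to verify the module step is to inspect `/proc/modules`. The sketch below is an illustration with hypothetical function names. It assumes that a module loaded as `vfio-pci` is listed with an underscore (`vfio_pci`), and it cannot see KVM support compiled directly into the kernel.

```python
def loaded_modules(proc_modules_text):
    """Return the set of module names from /proc/modules-style text."""
    return {line.split()[0] for line in proc_modules_text.splitlines() if line.strip()}


def missing_kvm_modules(proc_modules_text, cpu_vendor="intel"):
    """List required KVM/VFIO modules that are absent; cpu_vendor is 'intel' or 'amd'."""
    required = {"kvm", f"kvm_{cpu_vendor}", "vfio_pci"}
    return sorted(required - loaded_modules(proc_modules_text))


# Sample text in /proc/modules format; on a real host use open("/proc/modules").read().
sample = (
    "kvm_intel 458752 0 - Live 0x0000000000000000\n"
    "kvm 1327104 1 kvm_intel, Live 0x0000000000000000\n"
)
print(missing_kvm_modules(sample))  # ['vfio_pci']
```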

+## Sangfor aSV (HCI)
+
+Sangfor aSV is a KVM-based hypervisor from a [Chinese vendor](https://www.sangfor.com) and part of the Sangfor HCI stack (aSV + aSAN + aNet + aSEC). It is distributed through partners in EMEA.
+
+### Stack architecture
+
+| Component | Role |
+|-----------|------|
+| **aSV** | Hypervisor (KVM-based) |
+| **aSAN** | Distributed SDS (locality-aware, data tiering, dedup, compression) |
+| **aNet** | Network virtualization (distributed switches and routers, WYDIWYG "what you draw is what you get" visual editor) |
+| **aSEC** | Security (NGFW, IPS, WAF, EDR, east-west segmentation) |
+| **Sangfor Cloud Platform** | Management orchestrator, unified dashboard |
+
+### Key features
+
+| Feature | Detail |
+|-----------|--------|
+| **Hypervisor** | KVM (aSV) — custom fork with HCI extensions |
+| **License** | Enterprise Pro — per node, all-inclusive (compute + storage + network + security) |
+| **Min. cluster** | 3 nodes (3 data copies) |
+| **Live Migration** | Yes |
+| **HA** | Built-in HA |
+| **Storage** | aSAN — locality-aware, data tiering (SSD + HDD), dedup, compression, erasure coding |
+| **Backup** | Built-in backup + CDP — no 3rd party needed |
+| **Security** | Integrated NGFW, IPS, WAF, EDR — no external appliances |
+| **VDI** | aDesk — integrated VDI solution |
+| **Kubernetes** | SKE (Sangfor Kubernetes Engine) |
+| **Migration** | Sangfor VMware Import Tool (from vCenter), qemu-img for others |
+| **vGPU** | Standard support (no extra license) |
+
+### Comparison with VMware
+
+| Feature | Sangfor | VMware |
+|---------|---------|--------|
+| **License** | Per node, all-inclusive | Multi-tier (vSphere + vSAN + NSX + Aria) |
+| **vGPU** | Included (standard) | Enterprise Plus only |
+| **Backup + CDP** | Built-in | 3rd party or extra license |
+| **Security (NGFW, IPS, WAF)** | Built-in (aSEC) | NSX + 3rd party |
+| **Network management** | WYDIWYG visual editor | NSX Manager (more complex) |
+| **Min. cluster (3 copies)** | 3 nodes | 5 nodes (vSAN) |
+| **Data locality** | Yes | No |
+| **SSD life prediction** | Yes | No |
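
The 5-node vSAN figure in the table is consistent with the vSAN RAID-1 mirroring rule of 2 × FTT + 1 hosts, where FTT (failures to tolerate) is the number of copies minus one. The sketch below only illustrates that rule; the function name is hypothetical. The 3-node Sangfor figure is taken from the table and is not derived here.

```python
def min_hosts_mirroring(copies):
    """Hosts needed for RAID-1 mirroring with witnesses: 2 * FTT + 1, FTT = copies - 1."""
    failures_to_tolerate = copies - 1
    return 2 * failures_to_tolerate + 1


print(min_hosts_mirroring(3))  # 5
print(min_hosts_mirroring(2))  # 3
```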
+
+### Use cases
+
+- **VMware exit** — VMware replacement for SMB and mid-market
+- **Greenfield HCI** — new DCs, branch offices, remote sites
+- **VDI** — aDesk integrated with HCI
+- **Security-first** — organizations requiring integrated security
+- **Asia-Pacific / EMEA** — strongest in Asia, expanding to Europe
+
+### Risks and limitations
+
+| Risk | Detail |
+|--------|--------|
+| **Geopolitical** | Chinese vendor — possible regulatory restrictions (GDPR, EU, NATO, government) |
+| **Ecosystem** | Smaller community than VMware/Proxmox, less documentation and ISV certifications |
+| **Support** | Primary support from Asia, local partner critical |
+| **Vendor lock-in** | Closed ecosystem (aSV + aSAN + aNet + aSEC), harder to mix with 3rd party |
+| **References in CZ/EU** | Very limited — pilot required before production |
+
 ## Storage in Hypervisors

-See also: [STORAGE.md](STORAGE.md) — detailed overview of storage protocols and configurations.
+See also: [STORAGE.en.md](STORAGE.en.md) — detailed overview of storage protocols and configurations.

 | Type | Description | Protocols |
 |-----|-------|-----------|
@@ -443,7 +548,7 @@ For telco, large private clouds, MANO/NFVI environments.

 ## Resources

-Links, books and standards: [sources/infrastructure/sources.md](sources/infrastructure/sources.md)
+Links, books and standards: [sources/infrastructure/sources.en.md](sources/infrastructure/sources.en.md)

 ### Recommended Reading

--- a/HYPERVISORS.md
+++ b/HYPERVISORS.md
@@ -86,20 +86,22 @@ Dle Foundry/CIO.com průzkumu (2025): **56 %** organizací plánuje snížit vyu

 #### Cílové platformy — srovnání

-| Kritérium | Proxmox VE | Nutanix AHV | Microsoft Hyper-V | Red Hat OpenShift Virtualization |
-|-----------|-----------|-------------|-------------------|----------------------------------|
-| **Hypervisor** | KVM + LXC | KVM (fork) | Hyper-V | KVM (KubeVirt) |
-| **Licence** | Open source (free), support ~€500/host/rok | Per node subscription (30–60 % savings oproti VCF) | Windows Server license (Standard/Datacenter) | OpenShift subscription (core-based) |
-| **Live Migration** | Live Migration (Proxmox 8+) | AHV Live Migration | Live Migration (SMB/RDMA) | KubeVirt (VMI live migration) |
-| **HA** | Proxmox HA (watchdog, fencing) | Built-in HA (Prism) | Hyper-V HA (WS Failover Cluster) | OpenShift HA (self-healing) |
-| **Storage** | ZFS, Ceph, LVM | AOS (hybrid/SSD, erasure coding) | S2D, CSV, ReFS | OCS, Ceph, LSO |
-| **Backup** | Proxmox Backup Server (free) | Native snapshot + DR | Windows Server Backup / Veeam | OpenShift APIs + OADP |
-| **Cena (3 roky, 3 hosty)** | $0 + support $1 500 | ~$45 000–60 000 | $0 (Hyper-V Server zdarma) nebo Windows Server lic. | ~$90 000+ (OpenShift) |
-| **Cena (3 roky, 10 hostů)** | $0 + support $5 000 | ~$150 000–200 000 | Windows Server Datacenter pro neomezené VM | ~$300 000+ (OpenShift) |
-| **Náročnost migrace** | Střední (VMDK → QCOW2, VirtIO drivery) | Nízká (Nutanix Move tool) | Střední (V2V converter, SCVMM) | Vysoká (Kubernetes learning curve) |
-| **Linux podpora** | Výborná (nativní KVM) | Výborná (KVM-based) | Dobrá (LIS drivers) | Výborná (KVM + OpenShift) |
-| **Windows podpora** | Dobrá (VirtIO drivers) | Výborná (ALAS drivers, svpd) | Výborná (nativní) | Dobrá (KubeVirt + VirtIO) |
-| **GPU passthrough** | VFIO (výborná) | GPU passthrough | DDA (Direct Device Assignment) | VFIO + GPU Operator |
+| Kritérium | Proxmox VE | Nutanix AHV | Microsoft Hyper-V | Red Hat OpenShift Virtualization | **Sangfor aSV (HCI)** |
+|-----------|-----------|-------------|-------------------|----------------------------------|----------------------|
+| **Hypervisor** | KVM + LXC | KVM (fork) | Hyper-V | KVM (KubeVirt) | **KVM (aSV)** |
+| **Licence** | Open source (free), support ~€500/host/rok | Per node subscription (30–60 % savings oproti VCF) | Windows Server license (Standard/Datacenter) | OpenShift subscription (core-based) | **Per node (Enterprise Pro), vše v ceně** |
+| **Live Migration** | Live Migration (Proxmox 8+) | AHV Live Migration | Live Migration (SMB/RDMA) | KubeVirt (VMI live migration) | **Ano** |
+| **HA** | Proxmox HA (watchdog, fencing) | Built-in HA (Prism) | Hyper-V HA (WS Failover Cluster) | OpenShift HA (self-healing) | **Built-in HA** |
+| **Storage** | ZFS, Ceph, LVM | AOS (hybrid/SSD, erasure coding) | S2D, CSV, ReFS | OCS, Ceph, LSO | **aSAN (distribuovaný SDS, locality-aware)** |
+| **Backup** | Proxmox Backup Server (free) | Native snapshot + DR | Windows Server Backup / Veeam | OpenShift APIs + OADP | **Built-in backup + CDP (Continuous Data Protection)** |
+| **Cena (3 roky, 3 hosty)** | $0 + support $1 500 | ~$45 000–60 000 | $0 (Hyper-V Server zdarma) nebo Windows Server lic. | ~$90 000+ (OpenShift) | **~$15 000–25 000** |
+| **Cena (3 roky, 10 hostů)** | $0 + support $5 000 | ~$150 000–200 000 | Windows Server Datacenter pro neomezené VM | ~$300 000+ (OpenShift) | **~$50 000–80 000** |
+| **Náročnost migrace** | Střední (VMDK → QCOW2, VirtIO drivery) | Nízká (Nutanix Move tool) | Střední (V2V converter, SCVMM) | Vysoká (Kubernetes learning curve) | **Nízká (nástroje pro VMware import)** |
+| **Linux podpora** | Výborná (nativní KVM) | Výborná (KVM-based) | Dobrá (LIS drivers) | Výborná (KVM + OpenShift) | **Výborná (KVM-based)** |
+| **Windows podpora** | Dobrá (VirtIO drivers) | Výborná (ALAS drivers, svpd) | Výborná (nativní) | Dobrá (KubeVirt + VirtIO) | **Dobrá (VirtIO drivers)** |
+| **GPU passthrough** | VFIO (výborná) | GPU passthrough | DDA (Direct Device Assignment) | VFIO + GPU Operator | **vGPU support (standard)** |
+| **Integrovaná bezpečnost** | — | — | — | — | **Ano (NGFW, IPS, WAF, EDR — aSEC)** |
+| **Min. cluster (3 kopie)** | 3 (Ceph) | 3 | 2–3 | 3 | **3** |

 #### Migrační nástroje

@@ -112,6 +114,47 @@ Dle Foundry/CIO.com průzkumu (2025): **56 %** organizací plánuje snížit vyu
 | **virt-v2v** | VMware ESXi, Xen, Hyper-V | KVM (libvirt) | Open source CLI nástroj, konverze disků + driverů (virtio), vhodný pro hromadnou migraci |
 | **Windows Admin Center VM Conversion Extension** | VMware ESXi | Hyper-V | Microsoft WAC extension, free, GUI-based, hromadná migrace |
 | **Platform9 vJailbreak** | VMware ESXi | OpenStack / KVM | In-place migration (bez swing gear), open source |
+| **Sangfor VMware Import Tool** | VMware ESXi | Sangfor aSV (HCI) | Nástroj pro import VM z vCenter, konverze disků + driverů, možnost retain network config |
+
+#### Matice migrací napříč hypervisory
+
+Komplexní přehled všech dvojic zdroj → cíl s metodami, nástroji, omezeními a obtížností.
+
+| Zdroj → Cíl | Metoda | Nástroje | Obtížnost | Omezení |
+|-------------|--------|----------|-----------|---------|
+| **VMware → Proxmox** | Disk konverze VMDK→QCOW2, reinstalace driverů | Proxmox Import Wizard, Veeam, StarWind, virt-v2v | Střední | Nutné VirtIO drivery, UEFI nepodporováno v Import Wizard (< 8.1), nutno odstranit snapshoty |
+| **VMware → Hyper-V** | Disk konverze VMDK→VHDX, reinstalace driverů | StarWind, WAC Converter, SCVMM, Microsoft MTC | Střední | Integration Services nutné, rozdíly v síťové konfiguraci (VMXNET3 → Hyper-V Synthetic) |
+| **VMware → KVM/XCP-ng** | Disk konverze VMDK→raw/QCOW2, driver swap | virt-v2v, StarWind | Střední | VirtIO drivers, UEFI support (OVMF), host passthrough musí být kompatibilní |
+| **VMware → Nutanix AHV** | Automatizovaná migrace přes Move appliance | Nutanix Move, Veeam | Nízká | AHV je také KVM – minimální problémy, retain IP/MAC, podpora UEFI |
+| **VMware → Sangfor aSV** | Import přes VMware Import Tool, konverze disků + driverů | Sangfor VMware Import Tool | Nízká | Built-in nástroj, retain network config, support UEFI |
+| **VMware → OpenStack** | In-place nebo swing | Platform9 vJailbreak, virt-v2v + Glance | Vysoká | Nutný redesign networking (Neutron), storage (Cinder), image format (Glance) |
+| **Hyper-V → VMware** | Disk konverze VHDX→VMDK, reinstalace driverů | StarWind, virt-v2v, VMware vCenter Converter (standalone) | Střední | VMware Tools nutné, síťový driver change (VMXNET3), UEFI/secure boot issues |
+| **Hyper-V → Proxmox** | Disk konverze VHDX→QCOW2, driver swap | StarWind, virt-v2v, qemu-img | Střední–Vysoká | VirtIO drivers, integration services → guest agent, secure boot issues |
+| **Hyper-V → KVM/XCP-ng** | Disk konverze VHDX→raw/QCOW2 | virt-v2v, qemu-img | Střední | VirtIO drivers, Linux generické drivery obvykle fungují |
+| **Hyper-V → Nutanix AHV** | Automatizovaná migrace | Nutanix Move | Nízká–Střední | Obdobné jako VMware→Nutanix, support UEFI, retain IP |
+| **Proxmox → VMware** | Export OVF/OVA, qemu-img convert | qemu-img (QCOW2→VMDK), ovftool, manuální OVF export | Vysoká | VMware Tools nutné, rozdíly v storage formátech, bez live migration, nutný downtime |
+| **Proxmox → Hyper-V** | qemu-img convert, reinstalace driverů | qemu-img, manuální VHDX konverze | Vysoká | Hyper-V Integration Services nutné, žádný automatizovaný nástroj, edge case |
+| **Proxmox → KVM/XCP-ng** | Direct QCOW2 (stejný formát), úprava XML | libvirt, virsh dumpxml/define | Střední | Rozdíly v libvirt XML/QEMU args (storage pool, síť), nutná validace |
+| **Proxmox → Nutanix AHV** | qemu-img + manuální import | qemu-img, Nutanix Image Service CLI | Vysoká | Žádný hot nástroj, nutná konverze + manuální rekonfigurace VM |
+| **XCP-ng → VMware** | Disk konverze VHD→VMDK | qemu-img, StarWind, virt-v2v | Vysoká | VMware Tools nutné, rozdíly v paravirtualizaci (Xen PV vs VMware) |
+| **XCP-ng → Proxmox** | Disk konverze nebo direct VHD | qemu-img, manuální import | Střední | Konverze disků, formát VHD není nativní v Proxmox |
+| **XCP-ng → Hyper-V** | Disk konverze VHD→VHDX (přímá) | StarWind, qemu-img | Střední | VHD/VHDX kompatibilní, nutné Integration Services |
+| **Nutanix AHV → VMware** | Export + konverze | qemu-img, Nutanix Export, VMware vCenter Converter | Vysoká | VMware Tools, AHV je KVM → obvykle jednodušší než Hyper-V→VMware |
+| **Nutanix AHV → Proxmox** | qemu-img + manuální import | qemu-img, Nutanix self-service restore | Střední | Disky z AFS → QCOW2, metadata nutno rekonstruovat |
+| **Nutanix AHV → Hyper-V** | qemu-img + manuální | qemu-img, StarWind | Vysoká | Edge case, žádný hot nástroj |
+| **OpenStack → (any)** | Glance export + qemu-img | glance image-download, qemu-img, ovftool | Střední–Vysoká | Image formát (raw/QCOW2), metadata (flavor, security groups) nutno znovu vytvořit |
+| **Sangfor aSV → (any)** | qemu-img konverze + manuální | qemu-img, manuální OVF/OVA export | Střední–Vysoká | KVM-based → konverze do QCOW2/VMDK/VHDX přes qemu-img, metadata nutno znovu vytvořit |
+| **(any) → Sangfor aSV** | aSV API import + VMware Import Tool | Sangfor VMware Import Tool (pro VMware), manuální qemu-img import pro ostatní | Střední | KVM-based → podpora standardních formátů, import tool jen pro VMware |
+
+**Klíče k úspěšné migraci:**
+
+- **Drivery** — každá platforma vyžaduje vlastní paravirtual drivers (VMware Tools, VirtIO, Hyper-V Integration Services, Xen Tools). Po migraci vždy vyměnit.
+- **UEFI / Secure Boot** — ne všechny kombinace podporují UEFI (Proxmox Import Wizard < 8.1 nepodporuje). Při migraci UEFI VM raději testovat.
+- **Snapshoty** — snapshots musí být před migrací odstraněny (sloučeny). Většina nástrojů migruje jen flat disky.
+- **Síť** — MAC adresy, IP adresy, VLAN tagging — po migraci zkontrolovat. Některé nástroje (Nutanix Move, VMware Converter) umí retain MAC.
+- **Storage format** — VMDK ↔ VHDX ↔ QCOW2 ↔ raw jsou vzájemně konvertovatelné přes `qemu-img`, ale liší se v metadatech (snapshots, backing files).
+- **Live migration** — mezi různými hypervisory neexistuje live migration. Vždy je potřeba downtime (minuty až hodiny podle velikosti VM).
+- **Teplota migrace** — čím "chladnější" VM (méně změn), tím snazší migrace. Aplikace s databází v reálném čase vyžadují samostatný DB migrační plán.

 #### TCO srovnání — příklad: 3 hosty (2× 20C CPU), 50 VM

@@ -123,6 +166,7 @@ Dle Foundry/CIO.com průzkumu (2025): **56 %** organizací plánuje snížit vyu
 | **Nutanix AHV** (průměr) | ~$18 000 | ~$54 000 | Per node subscription, odhad |
 | **Hyper-V** (Windows Server Datacenter) | $12 400 | $37 200 | Jednorázová licence per core, bez SA |
 | **Hyper-V** (Azure Stack HCI) | ~$14 400 | ~$43 200 | ~$10/core/měsíc, 120 cores |
+| **Sangfor HCI** (Enterprise Pro) | ~$5 000–8 000 | ~$15 000–25 000 | Per node, vše v ceně, 3 uzly |

 **Reálný příklad ze Spiceworks (2026)**: Uživatel hlásí navýšení VMware Essentials+ z $1 900/rok na $14 000/rok (VVF) — nárůst 7.4×.

@@ -142,8 +186,9 @@ Dle Foundry/CIO.com průzkumu (2025): **56 %** organizací plánuje snížit vyu
 3. Vyber cílovou platformu (1-2 kandidáty)
   ├─ Proxmox: nejnižší TCO, Linux-heavy shops
   ├─ Nutanix: enterprise HCI, nízká náročnost migrace
-   ├─ Hyper-V: Windows-centric, Azure hybrid
-   └─ OpenShift: Kubernetes-first, platform engineering
+   ├─ Hyper-V: Windows-centric, Azure hybrid
+   ├─ Sangfor: HCI all-in-one, security-first, VMware exit (SMB/mid-market)
+   └─ OpenShift: Kubernetes-first, platform engineering

 4. Naplánuj migrační fáze
   ├─ Wave 1: non-critical (dev/test, 1-2 měsíce)
@@ -269,6 +314,72 @@ Hardware ──> QEMU (emulace I/O) + KVM (kernel module, virtualization)
 - Naložit KVM moduly: `kvm`, `kvm_intel`/`kvm_amd`, `vfio-pci`
 - Optimalizovat storage: raw/LVM (vyhnout se qcow2 u výkonových workloadů)

+## Sangfor aSV (HCI)
+
+[Čínský vendor](https://www.sangfor.com) — KVM-based hypervisor, součást Sangfor HCI stacku (aSV + aSAN + aNet + aSEC). V ČR distribuován přes partnery.
+
+### Architektura stacku
+
+| Komponenta | Role |
+|-----------|------|
+| **aSV** | Hypervisor (KVM-based) |
+| **aSAN** | Distributed SDS (locality-aware, data tiering, dedup, compression) |
+| **aNet** | Network virtualization (distribuované switche a routery, WYDIWYG vizuální editor) |
+| **aSEC** | Bezpečnost (NGFW, IPS, WAF, EDR, east-west segmentation) |
+| **Sangfor Cloud Platform** | Management orchestrator, unified dashboard |
+
+### Klíčové vlastnosti
+
+| Vlastnost | Detail |
+|-----------|--------|
+| **Hypervisor** | KVM (aSV) — vlastní fork s rozšířeními pro HCI |
+| **Licence** | Enterprise Pro — per node, vše v ceně (compute + storage + network + security) |
+| **Min. cluster** | 3 uzly (3 kopie dat) |
+| **Live Migration** | Ano |
+| **HA** | Built-in HA |
+| **Storage** | aSAN — locality-aware (data locality), data tiering (SSD + HDD), dedup, compression, erasure coding |
+| **Backup** | Built-in backup + CDP (Continuous Data Protection) — bez nutnosti 3rd party |
+| **Security** | Integrated NGFW, IPS, WAF, EDR — bez externích appliance |
+| **VDI** | aDesk — integrované VDI řešení |
+| **Kubernetes** | SKE (Sangfor Kubernetes Engine) |
+| **Migrace** | Sangfor VMware Import Tool (z vCenter), qemu-img pro ostatní |
+| **vGPU** | Standardní podpora (bez extra licence) |
+
+### Srovnání s VMware
+
+| Feature | Sangfor | VMware |
+|---------|---------|--------|
+| **Licence** | Per node, vše v ceně | Vícestupňová (vSphere + vSAN + NSX + Aria) |
+| **vGPU** | V ceně (standard) | Jen v Enterprise Plus |
+| **Backup + CDP** | Built-in | 3rd party nebo extra licence |
+| **Security (NGFW, IPS, WAF)** | Built-in (aSEC) | NSX + 3rd party (Palo Alto, Check Point) |
+| **Network management** | WYDIWYG vizuální editor | NSX Manager (složitější) |
+| **Min. cluster (3 kopie)** | 3 uzly | 5 uzlů (vSAN) |
+| **Data locality** | Ano | Ne |
+| **SSD life prediction** | Ano | Ne |
+
+### Use case
+
+- **VMware exit** — náhrada za VMware v SMB a mid-market
+- **Greenfield HCI** — nové DC, branch offices, remote sites
+- **VDI** — aDesk integrovaný s HCI
+- **Security-first** — organizace vyžadující integrovanou bezpečnost (NGFW, IPS, WAF)
+- **Asie-Pacific / EMEA** — nejsilnější v Asii, expanding do Evropy
+
+### Rizika a omezení
+
+| Riziko | Detail |
+|--------|--------|
+| **Geopolitické** | Čínský vendor — možné regulatory restrictions (GDPR, EU, NATO, government) |
+| **Ekosystém** | Menší komunita než VMware/Proxmox, méně dokumentace a ISV certifikací |
+| **Support** | Support primárně z Asie, lokální partner kritický |
+| **Vendor lock-in** | Uzavřený ekosystém (aSV + aSAN + aNet + aSEC), těžší mix s 3rd party |
+| **Reference v ČR** | Velmi omezené — nutný pilot před produkcí |
+
+### Migrace na/z Sangfor
+
+Viz matice migrací výše v této sekci. Pro VMware → Sangfor existuje dedikovaný import nástroj. Pro ostatní hypervisory standardní qemu-img.
+
 ## Storage v hypervizorech

 Viz také: [STORAGE.md](STORAGE.md) — detailní přehled storage protokolů a konfigurací.
--- a/INFRASTRUCTURE.en.md
+++ b/INFRASTRUCTURE.en.md
@@ -4,9 +4,9 @@ This file has been split into separate areas:

 | Area | File |
 |--------|--------|
-| 🖥️ Hypervisors and virtualization | [HYPERVISORS.md](HYPERVISORS.md) |
-| 🏭 Data centers | [DATACENTERS.md](DATACENTERS.md) |
-| 💾 Storage | [STORAGE.md](STORAGE.md) |
-| 🔧 Hardware and servers | [HARDWARE.md](HARDWARE.md) |
+| 🖥️ Hypervisors and virtualization | [HYPERVISORS.en.md](HYPERVISORS.en.md) |
+| 🏭 Data centers | [DATACENTERS.en.md](DATACENTERS.en.md) |
+| 💾 Storage | [STORAGE.en.md](STORAGE.en.md) |
+| 🔧 Hardware and servers | [HARDWARE.en.md](HARDWARE.en.md) |

 *Last revision: 2026-06-03*
--- a/KUBERNETES.en.md
+++ b/KUBERNETES.en.md
@@ -0,0 +1,299 @@
+# ☸ Kubernetes — architecture, platforms, Cluster API
+
+## Overview
+
+Kubernetes (K8s) is an open-source container orchestrator and the de facto standard for deploying, scaling, and managing containerized applications. It is built on declarative configuration and control loops (reconciliation).
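
The idea of a control loop can be shown with a toy model: compare the declared (desired) state with the observed state and emit the actions that close the gap. This is a simplified sketch with hypothetical names, not Kubernetes controller code.

```python
def reconcile(desired_replicas, running):
    """One pass of a control loop: return the actions needed to reach the desired state."""
    actions = []
    if len(running) < desired_replicas:
        # Too few replicas: start the missing ones.
        for i in range(desired_replicas - len(running)):
            actions.append(("start", f"pod-{len(running) + i}"))
    elif len(running) > desired_replicas:
        # Too many replicas: stop the surplus ones.
        for name in running[desired_replicas:]:
            actions.append(("stop", name))
    return actions


print(reconcile(3, ["pod-0"]))           # [('start', 'pod-1'), ('start', 'pod-2')]
print(reconcile(1, ["pod-0", "pod-1"]))  # [('stop', 'pod-1')]
print(reconcile(2, ["pod-0", "pod-1"]))  # []
```

A real controller runs such a pass repeatedly, so the system converges on the declared state even after failures.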
+
+## Kubernetes deployment methods
+
+| Method | Description | Control plane | Best for |
+|--------|-------------|--------------|----------|
+| **kubeadm** | Official K8s cluster bootstrap tool | Self-managed (stacked/external etcd) | On-prem, lab, learning |
+| **K3s** | Lightweight K8s (Rancher), single binary, embedded etcd/SQLite | Self-managed | Edge, IoT, low-resource, HA with embedded etcd |
+| **RKE2** | Rancher Kubernetes Engine 2, CIS-hardened, FIPS-ready | Self-managed | Enterprise on-prem, air-gapped, regulatory |
+| **OpenShift** | Red Hat enterprise K8s + operator lifecycle + SDN + routing | Self-managed (RHCOS) | Enterprise, multicluster, platform engineering |
+| **Vanilla K8s (CAPI)** | Cluster API — declarative provisioning and lifecycle management | Self-managed (CAPI managed) | Fleet management, GitOps, multi-provider |
+| **EKS** (AWS) | Managed K8s | AWS managed | AWS cloud-native, least ops |
+| **AKS** (Azure) | Managed K8s | Azure managed | Azure cloud-native |
+| **GKE** (GCP) | Managed K8s, Standard and Autopilot modes | GCP managed | GCP cloud-native |
+| **SKE** (Sangfor) | Managed K8s on Sangfor HCI | Vendor managed | Sangfor HCI ecosystem |
+
+---
+
+## Cluster API (CAPI)
+
+### What is Cluster API
+
+Cluster API is a Kubernetes sub-project (SIG Cluster-Lifecycle) that brings declarative APIs for provisioning, upgrading, and operating Kubernetes clusters. Instead of Terraform scripts or manual `kubeadm`, you define clusters as Kubernetes Custom Resources — `Cluster`, `Machine`, `MachineDeployment`, etc.
+
+Core principle: **A Kubernetes cluster that manages Kubernetes clusters.**
+
+### Architecture
+
+```
+┌─────────────────────────────────────────┐
+│           Management Cluster            │
+│                                         │
+│  ┌──────────────────────────────────┐   │
+│  │        CAPI Controllers          │   │
+│  │ ┌──────┐ ┌─────────┐ ┌─────────┐ │   │
+│  │ │Infra │ │Bootstrap│ │Control  │ │   │
+│  │ │Prov  │ │Prov     │ │Plane Pr │ │   │
+│  │ └──────┘ └─────────┘ └─────────┘ │   │
+│  └──────────────────────────────────┘   │
+│                                         │
+│  CR: Cluster, Machine, MachineDeployment│
+│  ...                                    │
+└────────────────┬────────────────────────┘
+                 │ CAPI controller
+                 │ creates / manages
+        ┌────────┴────────┐
+        ▼                 ▼
+┌───────────────┐  ┌───────────────┐
+│ Workload      │  │ Workload      │
+│ Cluster (dev) │  │ Cluster (prod)│
+│ ┌───┐ ┌───┐   │  │ ┌───┐ ┌───┐   │
+│ │ CP│ │ W │   │  │ │ CP│ │ W │   │
+│ └───┘ └───┘   │  │ └───┘ └───┘   │
+└───────────────┘  └───────────────┘
+```
+
+- **Management cluster** — a Kubernetes cluster running CAPI controllers. Can be a dedicated small admin cluster.
+- **Workload (managed) cluster** — a Kubernetes cluster managed by CAPI; each one is represented by a `Cluster` custom resource (an instance of the `Cluster` CRD) in the management cluster.
+- **Machine** — abstraction of a compute unit (VM, bare metal) that becomes a K8s node.
+
+### Key CRDs (Custom Resource Definitions)
+
+| CRD | API group | Purpose |
+|-----|-----------|---------|
+| **Cluster** | `cluster.x-k8s.io` | Cluster representation (infra ref, control plane ref, networking) |
+| **Machine** | `cluster.x-k8s.io` | Individual node (VM/BM instance) |
+| **MachineDeployment** | `cluster.x-k8s.io` | Declarative scaling and rolling update of workers |
+| **MachineSet** | `cluster.x-k8s.io` | Replica set for Machines (lower-level) |
+| **MachineHealthCheck** | `cluster.x-k8s.io` | Auto-remediation (replace unhealthy nodes) |
+| **ClusterClass** | `cluster.x-k8s.io` | Cluster template for reuse |
+| **KubeadmControlPlane** | `controlplane.cluster.x-k8s.io` | Kubeadm-managed control plane (stacked/external etcd) |
+| **KubeadmConfig / KubeadmConfigTemplate** | `bootstrap.cluster.x-k8s.io` | Bootstrap configuration (kubeadm init/join) |
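+
+To show how these resources point at each other, here is a minimal sketch of a `Cluster` that does not use a managed topology. The names `demo` and `demo-control-plane`, the CIDR ranges, and the choice of AWS as infrastructure are illustrative assumptions; the referenced `KubeadmControlPlane` and `AWSCluster` objects would have to be defined separately.
+
+```yaml
+apiVersion: cluster.x-k8s.io/v1beta1
+kind: Cluster
+metadata:
+  name: demo                 # hypothetical cluster name
+  namespace: clusters
+spec:
+  clusterNetwork:
+    pods:
+      cidrBlocks: ["192.168.0.0/16"]   # assumed pod CIDR
+    services:
+      cidrBlocks: ["10.96.0.0/12"]     # assumed service CIDR
+  # Control plane provider object (see KubeadmControlPlane in the table above)
+  controlPlaneRef:
+    apiVersion: controlplane.cluster.x-k8s.io/v1beta1
+    kind: KubeadmControlPlane
+    name: demo-control-plane
+  # Infrastructure provider object (VPC, load balancer, etc.)
+  infrastructureRef:
+    apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
+    kind: AWSCluster
+    name: demo
+```
+
+The `Cluster` object itself carries little configuration; most detail lives in the provider-specific objects it references.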
+
+### Provider model
+
+CAPI uses a three-layer provider model:
+
+#### 1. Infrastructure Provider
+Creates and manages infrastructure (VM, networks, LB, storage).
+
+| Provider | Platform | Status |
+|----------|----------|--------|
+| **AWS (CAPA)** | AWS EC2, VPC, ELB, EKS | Stable, SIG-sponsored |
+| **Azure (CAPZ)** | Azure VM, VNet, LB, AKS | Stable, SIG-sponsored |
+| **GCP (CAPG)** | GCP Compute, VPC, GKE | Beta |
+| **vSphere (CAPV)** | VMware vSphere | Stable |
+| **OpenStack (CAPO)** | OpenStack compute/network | Stable |
+| **Metal3** | Bare metal (Ironic) | Stable |
+| **Docker (CAPD)** | Docker containers (development) | Tilt/Dev only |
+| **Akamai (Linode)** | Linode | Community |
+| **Azure Stack HCI** | Azure Stack HCI | Community |
+| **cloudscale** | cloudscale.ch | Community |
+| **Exoscale** | Exoscale | Community |
+| **IBM Cloud** | IBM Cloud | Community |
+| **Equinix Metal** | Equinix (ex Packet) | Community |
+| **Hetzner** | Hetzner Cloud | Community |
+| **OpenNebula** | OpenNebula | Community |
+
+#### 2. Bootstrap Provider
+Handles K8s initialization on a node (kubeadm init/join, TLS certs, tokens).
+
+| Provider | Description |
+|----------|-------------|
+| **Kubeadm** (built-in) | Standard kubeadm init/join, supports stacked/external etcd |
+| **EKS** | Bootstraps worker nodes that join an EKS managed control plane (AWS) |
+| **K3s** | Lightweight K8s bootstrap (edge, IoT) |
+| **RKE2** | Rancher K8s bootstrap, CIS-hardened |
+| **Talos** | API-driven bootstrap (Sidero Labs), immutable OS |
+| **k0smotron** | K0s-based bootstrap + hosted control plane |
+| **MicroK8s** | Canonical MicroK8s bootstrap |
+| **Canonical Kubernetes** | Canonical K8s (snap-based) |
+
+#### 3. Control Plane Provider
+Manages control plane nodes.
+
+| Provider | Description |
+|----------|-------------|
+| **KubeadmControlPlane** (built-in) | Kubeadm-managed CP, stacked/external etcd |
+| **EKS** | AWS EKS managed control plane |
+| **Kamaji** | Hosted control plane (CP runs as deployment in management cluster) |
+| **K3s** | K3s control plane (edge-optimized) |
+| **RKE2** | RKE2 control plane |
+| **Talos** | Talos control plane, API-based management |
+| **k0smotron** | Hosted control plane (k0s-based) |
+| **Nested** | Nested control plane: control plane components run as pods inside a host cluster |
+
+### ClusterClass and Managed Topologies
+
+ClusterClass (introduced with the v1beta1 API in CAPI v1.0, initially as an experimental feature behind the `ClusterTopology` feature gate) allows defining a **cluster template**:
+
+```yaml
+apiVersion: cluster.x-k8s.io/v1beta1
+kind: ClusterClass
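+metadata:
+  name: standard-aws-cluster
+  namespace: clusters  # same namespace as the Cluster objects that reference this class
+spec:
+  infrastructure:
+    ref:  # assumption: an AWSClusterTemplate with this (hypothetical) name exists
+      apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
+      kind: AWSClusterTemplate
+      name: aws-cluster-tmpl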
+  controlPlane:
+    ref:
+      apiVersion: controlplane.cluster.x-k8s.io/v1beta1
+      kind: KubeadmControlPlaneTemplate
+      name: aws-cp-tmpl
+    machineInfrastructure:
+      ref:
+        kind: AWSMachineTemplate
+        apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
+        name: aws-cp-machine-tmpl
+  workers:
+    machineDeployments:
+    - class: default-worker
+      template:
+        bootstrap:
+          ref:
+            apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
+            kind: KubeadmConfigTemplate
+            name: aws-worker-bootstrap-tmpl
+        infrastructure:
+          ref:
+            apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
+            kind: AWSMachineTemplate
+            name: aws-worker-machine-tmpl
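+  variables:
+    - name: instanceType
+      required: true
+      schema:
+        openAPIV3Schema:
+          type: string
+          enum: ["t3.large", "m5.large", "m5.xlarge"]
+  patches:
+    # Without a patch, the variable is declared but never applied to a template.
+    # The patch name is illustrative.
+    - name: workerInstanceType
+      definitions:
+        - selector:
+            apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
+            kind: AWSMachineTemplate
+            matchResources:
+              machineDeploymentClass:
+                names: ["default-worker"]
+          jsonPatches:
+            - op: add
+              path: /spec/template/spec/instanceType
+              valueFrom:
+                variable: instanceType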
+```
+
+Then create a cluster with variable overrides:
+
+```yaml
+apiVersion: cluster.x-k8s.io/v1beta1
+kind: Cluster
+metadata:
+  name: dev-team-alpha
+  namespace: clusters
+spec:
+  topology:
+    class: standard-aws-cluster
+    version: v1.30.2
+    controlPlane:
+      replicas: 1
+    workers:
+      machineDeployments:
+      - class: default-worker
+        name: md-0
+        replicas: 2
+    variables:
+      - name: instanceType
+        value: "m5.xlarge"
+```
+
+### Cluster lifecycle with CAPI
+
+| Phase | Action | CAPI mechanism |
+|-------|--------|----------------|
+| **Create** | `kubectl apply -f cluster.yaml` | Controller creates infra (VM, network), runs kubeadm init/join bootstrap |
+| **Scale** | Update `replicas` in MachineDeployment | Controller creates/removes Machine → VM → node join/drain |
+| **Upgrade** | Change `version` in KubeadmControlPlane / MachineDeployment | Rolling update: new CP node → upgrade → old drain & delete. Workers: MachineDeployment rolling update |
+| **Health check** | MachineHealthCheck | If a node stays unhealthy past the timeout, the unhealthy Machine is deleted and its owner (MachineSet or KubeadmControlPlane) creates a replacement |
+| **Delete** | `kubectl delete cluster` | Controller drains, deletes VMs, cleans up infrastructure |
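+| **Template update** | Create a new AWSMachineTemplate / KubeadmConfigTemplate and reference it | Templates are treated as immutable; pointing a MachineDeployment or KubeadmControlPlane at the new template triggers a rolling update, and existing Machines change only through that rollout |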
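+
+The Scale and Upgrade rows both come down to editing fields on a `MachineDeployment`. The sketch below marks those fields; the object names (`demo`, `demo-md-0`) and the rollout settings are assumptions for illustration, not values taken from a real cluster.
+
+```yaml
+apiVersion: cluster.x-k8s.io/v1beta1
+kind: MachineDeployment
+metadata:
+  name: demo-md-0
+  namespace: clusters
+spec:
+  clusterName: demo
+  replicas: 3                # Scale: change this value
+  selector:
+    matchLabels: null        # left empty; CAPI defaults the selector labels
+  strategy:
+    type: RollingUpdate
+    rollingUpdate:
+      maxSurge: 1            # add one new Machine before removing an old one
+      maxUnavailable: 0
+  template:
+    spec:
+      clusterName: demo
+      version: v1.30.2       # Upgrade: change this value to roll the workers
+      bootstrap:
+        configRef:
+          apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
+          kind: KubeadmConfigTemplate
+          name: demo-md-0
+      infrastructureRef:
+        apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
+        kind: AWSMachineTemplate
+        name: demo-md-0
+```
+
+For clusters created from a ClusterClass, the equivalent fields are `spec.topology.version` and the `replicas` entries under `spec.topology` on the `Cluster` object, and the MachineDeployment is generated for you.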
+
+### Auto-remediation (MachineHealthCheck)
+
+```yaml
+apiVersion: cluster.x-k8s.io/v1beta1
+kind: MachineHealthCheck
+metadata:
+  name: prod-mhc
+  namespace: clusters
+spec:
+  clusterName: prod-us-east
+  selector:
+    matchLabels:
+      cluster.x-k8s.io/deployment-name: prod-us-east-workers
+  unhealthyConditions:
+  - type: Ready
+    status: "False"
+    timeout: 5m
+  - type: Ready
+    status: Unknown
+    timeout: 5m
+  maxUnhealthy: "40%"
+  nodeStartupTimeout: 10m
+```
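+
+In the example above, `maxUnhealthy: "40%"` acts as a safety brake: with 10 matching Machines, remediation proceeds only while at most 4 of them are unhealthy, so a widespread outage (for example a network partition) does not trigger mass replacement. A separate check for control plane Machines might look like the following sketch; the name `prod-cp-mhc` and the absolute limit of 1 are assumptions suited to a three-node control plane.
+
+```yaml
+apiVersion: cluster.x-k8s.io/v1beta1
+kind: MachineHealthCheck
+metadata:
+  name: prod-cp-mhc          # hypothetical name
+  namespace: clusters
+spec:
+  clusterName: prod-us-east
+  selector:
+    matchLabels:
+      cluster.x-k8s.io/control-plane: ""   # label set on control plane Machines
+  unhealthyConditions:
+  - type: Ready
+    status: "False"
+    timeout: 5m
+  - type: Ready
+    status: Unknown
+    timeout: 5m
+  maxUnhealthy: 1            # never remediate if more than one CP node is unhealthy
+```
+
+An absolute number is easier to reason about than a percentage when the pool is small, since losing two of three control plane nodes already means etcd quorum is gone.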
+
+### CAPI + GitOps
+
+CAPI integrates naturally with GitOps:
+
+- **ArgoCD** — Cluster and MachineDeployment manifests in Git repo, ArgoCD applies them to the management cluster
+- **Flux** — `Kustomization` + `OCIRepository` for CAPI objects
+- **Crossplane** — can be combined: Crossplane provisions cloud resources (VPC, subnets), CAPI manages K8s clusters on top
+
+Pattern: a dedicated "fleet management" cluster running CAPI + ArgoCD. All workload clusters are defined as YAML in Git.
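+
+As a sketch of this pattern, an Argo CD `Application` on the fleet management cluster could sync a directory of CAPI manifests. The repository URL, the `clusters` path, and the application name are hypothetical placeholders.
+
+```yaml
+apiVersion: argoproj.io/v1alpha1
+kind: Application
+metadata:
+  name: workload-clusters    # hypothetical name
+  namespace: argocd
+spec:
+  project: default
+  source:
+    repoURL: https://git.example.com/platform/fleet.git   # placeholder repository
+    targetRevision: main
+    path: clusters           # directory holding Cluster / MachineDeployment YAML
+  destination:
+    server: https://kubernetes.default.svc   # the management cluster itself
+    namespace: clusters
+  syncPolicy:
+    automated:
+      prune: false           # removing a file from Git does not delete a cluster
+      selfHeal: true
+    syncOptions:
+      - CreateNamespace=true
+```
+
+Leaving `prune` disabled is a deliberate choice in this sketch: deleting a `Cluster` object tears down the whole workload cluster, so that step is better kept manual.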
+
+### CAPI for on-prem
+
+| Provider | Use case | Note |
+|----------|----------|------|
+| **Metal3** (Ironic) | Bare metal provisioning (PXE, IPMI, Redfish) | Automatically provisions BM servers as K8s nodes |
+| **CAPV (vSphere)** | VMware VMs as K8s nodes | Most common enterprise on-prem |
+| **CAPO (OpenStack)** | OpenStack VMs as K8s nodes | OpenStack-native |
+| **Nutanix (CAPX)** | Nutanix AHV/Prism | Community provider |
+
+### CAPI for edge
+
+| Provider | Use case | Note |
+|----------|----------|------|
+| **K3s bootstrap + control plane** | Lightweight K8s on edge devices | Single binary, SQLite/embedded etcd |
+| **RKE2 bootstrap + control plane** | Enterprise edge, air-gapped | CIS-hardened, FIPS |
+| **Talos** | Immutable OS, API-driven | Minimal footprint, no SSH |
+| **k0smotron** | Hosted control plane for edge clusters | CP runs in management cluster, worker on edge |
+
+### CAPI vs alternatives
+
+| Tool | Approach | CAPI advantage | CAPI disadvantage |
+|------|----------|----------------|-------------------|
+| **Terraform/Pulumi** | Declarative IaC (HCL / general-purpose languages) | CAPI is K8s-native: same tool for apps and clusters; GitOps ready | Terraform has broader non-K8s resource support |
+| **kubeadm** | Manual or scripted | CAPI automates full lifecycle including upgrades and remediation | Higher complexity, requires management cluster |
+| **Rancher** | Web UI + API for K8s cluster management | CAPI is open-source, vendor-neutral | Rancher has GUI, monitoring, app catalog |
+| **OpenShift Hive/ACM** | Red Hat Advanced Cluster Management | CAPI is standard (SIG) — wider provider ecosystem | ACM has governance, policy, compliance |
+
+### Limitations and maturity
+
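+- **Management cluster is a SPOF for lifecycle operations** — workload clusters keep running if it is down, but scaling, upgrades, and remediation stop; it needs its own HA and backup (etcd snapshots, certificates)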
+- **CAPI is not a cluster autoscaler** — it only reconciles the replica counts you declare; for load-driven node scaling, run Cluster Autoscaler separately (pod scaling is handled by HPA/VPA)
+- **Provider maturity varies** — AWS/Azure/vSphere/OpenStack are listed as stable above, GCP as beta, and some community providers are alpha
+- **etcd backup is not built-in** — must be handled externally (etcd snapshots; Velero for API objects and volumes)
+- **CAPI does not handle applications** — only K8s cluster lifecycle (monitoring, logging, and ingress are user-managed)
+- **Learning curve** — requires understanding management cluster, provider model, CRDs
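+- **CAPI v1.13+ (2026)** — production-ready releases, although the core API is still served at beta level (`v1beta1`, succeeded by `v1beta2` from v1.11); ClusterClass available; EKS/AKS/GKE managed control planes supported through the respective infrastructure providers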
+
+### Recommended production CAPI stack
+
+| Component | Recommendation |
+|-----------|---------------|
+| **Management cluster** | K3s (small footprint) or kubeadm (3 nodes HA) |
+| **Infra provider** | CAPA (AWS) / CAPV (vSphere) / CAPO (OpenStack) — based on platform |
+| **Bootstrap/CP provider** | Kubeadm or RKE2 |
+| **GitOps** | ArgoCD or Flux |
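+| **Backup** | Velero (restic/Kopia file-system backup) to S3-compatible storage such as Ceph RGW |
+| **Cluster autoscaler** | Cluster Autoscaler with the `clusterapi` cloud provider |
+| **Network** | Cilium (installed via ClusterResourceSet or the Helm add-on provider; CAPI does not deploy a CNI itself) |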
+| **Secrets** | External Secrets Operator / Sealed Secrets |
+| **Monitoring** | Prometheus + Grafana (kube-prometheus-stack) |
+| **Ingress** | ingress-nginx / Kong / Traefik |
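+
+For the Cluster Autoscaler row, the Cluster API provider discovers scalable node groups through annotations on a `MachineDeployment` (or `MachineSet`). A minimal fragment, assuming a MachineDeployment named `demo-md-0` and illustrative bounds of 2 and 10 nodes:
+
+```yaml
+apiVersion: cluster.x-k8s.io/v1beta1
+kind: MachineDeployment
+metadata:
+  name: demo-md-0            # hypothetical name
+  namespace: clusters
+  annotations:
+    # Bounds within which Cluster Autoscaler may change spec.replicas
+    cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size: "2"
+    cluster.x-k8s.io/cluster-api-autoscaler-node-group-max-size: "10"
+# spec omitted; only the annotations matter for autoscaler discovery
+```
+
+Once the autoscaler owns the replica count, GitOps tooling should be configured to ignore `spec.replicas` on that object, otherwise the two controllers will keep overwriting each other.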
+
+## Sources
+
+Links, books and standards: [sources/infrastructure/sources.en.md](sources/infrastructure/sources.en.md)
+
+*Last revision: 2026-06-18*
--- a/KUBERNETES.md
+++ b/KUBERNETES.md
@@ -0,0 +1,299 @@
+# ☸ Kubernetes — architektura, platformy, Cluster API
+
+## Přehled
+
+Kubernetes (K8s) je open-source orchestrátor kontejnerů — de facto standard pro nasazování, škálování a správu containerizovaných aplikací. Postaven na modelu deklarativní konfigurace a control loopů (reconciliation).
+
+## Způsoby nasazení Kubernetes
+
+| Metoda | Popis | Správa control plane | Vhodné pro |
+|--------|-------|---------------------|------------|
+| **kubeadm** | Oficiální nástroj pro bootstrap K8s clusteru | Self-managed (stacked/external etcd) | On-prem, lab, learning |
+| **K3s** | Lightweight K8s (Rancher), single binary, embedded etcd/SQLite | Self-managed | Edge, IoT, low-resource, HA s embedded etcd |
+| **RKE2** | Rancher Kubernetes Engine 2, CIS-hardened, FIPS-ready | Self-managed | Enterprise on-prem, air-gapped, regulatory |
+| **OpenShift** | Red Hat enterprise K8s + operator lifecycle + SDN + routing | Self-managed (RHCOS) | Enterprise, multicluster, platform engineering |
+| **Vanilla K8s (CAPI)** | Cluster API — deklarativní provisioning a lifecycle management | Self-managed (CAPI managed) | Fleet management, GitOps, multi-provider |
+| **EKS** (AWS) | Managed K8s | AWS managed | AWS cloud-native, nejméně ops |
+| **AKS** (Azure) | Managed K8s | Azure managed | Azure cloud-native |
+| **GKE** (GCP) | Managed K8s, režimy Standard a Autopilot | GCP managed | GCP cloud-native |
+| **SKE** (Sangfor) | Managed K8s on Sangfor HCI | Vendor managed | Sangfor HCI ekosystém |
+
+---
+
+## Cluster API (CAPI)
+
+### Co je Cluster API
+
+Cluster API je Kubernetes sub-projekt (SIG Cluster-Lifecycle), který přináší deklarativní API pro provisioning, upgrade a operace Kubernetes clusterů. Místo Terraform skriptů nebo manuálního `kubeadm` definujete cluster jako Kubernetes Custom Resources — `Cluster`, `Machine`, `MachineDeployment` atd.
+
+Princip: **Kubernetes cluster, který spravuje Kubernetes clustery.**
+
+### Architektura
+
+```
+┌─────────────────────────────────────────┐
+│           Management Cluster            │
+│                                         │
+│  ┌──────────────────────────────────┐   │
+│  │        CAPI Controllers          │   │
+│  │ ┌──────┐ ┌─────────┐ ┌─────────┐ │   │
+│  │ │Infra │ │Bootstrap│ │Control  │ │   │
+│  │ │Prov  │ │Prov     │ │Plane Pr │ │   │
+│  │ └──────┘ └─────────┘ └─────────┘ │   │
+│  └──────────────────────────────────┘   │
+│                                         │
+│  CR: Cluster, Machine, MachineDeployment│
+│  ...                                    │
+└────────────────┬────────────────────────┘
+                 │ CAPI controller
+                 │ vytváří / spravuje
+        ┌────────┴────────┐
+        ▼                 ▼
+┌───────────────┐  ┌───────────────┐
+│ Workload      │  │ Workload      │
+│ Cluster (dev) │  │ Cluster (prod)│
+│ ┌───┐ ┌───┐   │  │ ┌───┐ ┌───┐   │
+│ │ CP│ │ W │   │  │ │ CP│ │ W │   │
+│ └───┘ └───┘   │  │ └───┘ └───┘   │
+└───────────────┘  └───────────────┘
+```
+
+- **Management cluster** — Kubernetes cluster, kde běží CAPI controllery. Může to být vyhrazený "admin" cluster (často velmi malý).
+- **Workload (managed) cluster** — Kubernetes cluster, který CAPI spravuje. Každý je v management clusteru reprezentován custom resource typu `Cluster` (instancí CRD `Cluster`).
+- **Machine** — abstrakce compute jednotky (VM, bare metal), která se stane K8s uzlem.
+
+### Klíčové CRD (Custom Resource Definitions)
+
+| CRD | API skupina | Účel |
+|-----|------------|------|
+| **Cluster** | `cluster.x-k8s.io` | Reprezentace clusteru (infra reference, control plane ref, networking) |
+| **Machine** | `cluster.x-k8s.io` | Jednotlivý uzel (VM/BM instance) |
+| **MachineDeployment** | `cluster.x-k8s.io` | Deklarativní škálování a rolling update workerů |
+| **MachineSet** | `cluster.x-k8s.io` | Replica set pro Machiny (lower-level) |
+| **MachineHealthCheck** | `cluster.x-k8s.io` | Auto-remediace (automatické nahrazení unhealthy uzlu) |
+| **ClusterClass** | `cluster.x-k8s.io` | Šablona pro vytváření clusterů |
+| **KubeadmControlPlane** | `controlplane.cluster.x-k8s.io` | Control plane managed kubeadm (stacked/external etcd) |
+| **KubeadmConfig / KubeadmConfigTemplate** | `bootstrap.cluster.x-k8s.io` | Bootstrap konfigurace (kubeadm init/join) |
+
+### Provider model
+
+CAPI používá třívrstvý provider model:
+
+#### 1. Infrastructure Provider
+Vytváří a spravuje infrastrukturu (VM, sítě, LB, storage).
+
+| Provider | Platforma | Status |
+|----------|-----------|--------|
+| **AWS (CAPA)** | AWS EC2, VPC, ELB, EKS | Stable, SIG-sponsored |
+| **Azure (CAPZ)** | Azure VM, VNet, LB, AKS | Stable, SIG-sponsored |
+| **GCP (CAPG)** | GCP Compute, VPC, GKE | Beta |
+| **vSphere (CAPV)** | VMware vSphere | Stable |
+| **OpenStack (CAPO)** | OpenStack compute/network | Stable |
+| **Metal3** | Bare metal (Ironic) | Stable |
+| **Docker (CAPD)** | Docker containers (development) | Tilt/Dev only |
+| **Akamai (Linode)** | Linode | Community |
+| **Azure Stack HCI** | Azure Stack HCI | Community |
+| **cloudscale** | cloudscale.ch | Community |
+| **Exoscale** | Exoscale | Community |
+| **IBM Cloud** | IBM Cloud | Community |
+| **Equinix Metal** | Equinix (ex Packet) | Community |
+| **Hetzner** | Hetzner Cloud | Community |
+| **OpenNebula** | OpenNebula | Community |
+
+#### 2. Bootstrap Provider
+Zajišťuje inicializaci K8s na node (kubeadm init/join, TLS certs, tokeny).
+
+| Provider | Popis |
+|----------|-------|
+| **Kubeadm** (vestavěný) | Standardní kubeadm init/join, podpora stacked/external etcd |
+| **EKS** | Bootstrap pro EKS managed control plane (AWS) |
+| **K3s** | Lightweight K8s bootstrap (edge, IoT) |
+| **RKE2** | Rancher K8s bootstrap, CIS-hardened |
+| **Talos** | API-driven bootstrap (Sidero Labs), immutable OS |
+| **k0smotron** | K0s-based bootstrap + hosted control plane |
+| **MicroK8s** | Canonical MicroK8s bootstrap |
+| **Canonical Kubernetes** | Canonical K8s (snap-based) |
+
+#### 3. Control Plane Provider
+Spravuje control plane uzly.
+
+| Provider | Popis |
+|----------|-------|
+| **KubeadmControlPlane** (vestavěný) | Kubeadm-managed CP, stacked/external etcd |
+| **EKS** | AWS EKS managed control plane |
+| **Kamaji** | Hosted control plane (CP běží jako deployment v management clusteru) |
+| **K3s** | K3s control plane (edge-optimized) |
+| **RKE2** | RKE2 control plane |
+| **Talos** | Talos control plane, API-based management |
+| **k0smotron** | Hosted control plane (k0s-based) |
+| **Nested** | Nested virtualization control plane |
+
+### ClusterClass a Managed Topologies
+
+ClusterClass (stabilní od CAPI v1beta1, CAPI v1.0+) umožňuje definovat **šablonu clusteru**:
+
+```yaml
+apiVersion: cluster.x-k8s.io/v1beta1
+kind: ClusterClass
+metadata:
+  name: standard-aws-cluster
+spec:
+  controlPlane:
+    ref:
+      apiVersion: controlplane.cluster.x-k8s.io/v1beta1
+      kind: KubeadmControlPlaneTemplate
+      name: aws-cp-tmpl
+    machineInfrastructure:
+      ref:
+        kind: AWSMachineTemplate
+        apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
+        name: aws-cp-machine-tmpl
+  workers:
+    machineDeployments:
+    - class: default-worker
+      template:
+        bootstrap:
+          ref:
+            apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
+            kind: KubeadmConfigTemplate
+            name: aws-worker-bootstrap-tmpl
+        infrastructure:
+          ref:
+            apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
+            kind: AWSMachineTemplate
+            name: aws-worker-machine-tmpl
+  variables:
+    - name: instanceType
+      required: true
+      schema:
+        openAPIV3Schema:
+          type: string
+          enum: ["t3.large", "m5.large", "m5.xlarge"]
+```
+
+Pak lze vytvořit cluster s přetížením proměnných:
+
+```yaml
+apiVersion: cluster.x-k8s.io/v1beta1
+kind: Cluster
+metadata:
+  name: dev-team-alpha
+  namespace: clusters
+spec:
+  topology:
+    class: standard-aws-cluster
+    version: v1.30.2
+    controlPlane:
+      replicas: 1
+    workers:
+      machineDeployments:
+      - class: default-worker
+        name: md-0
+        replicas: 2
+    variables:
+      - name: instanceType
+        value: "m5.xlarge"
+```
+
+### Životní cyklus clusteru s CAPI
+
+| Fáze | Akce | CAPI mechanismus |
+|------|------|------------------|
+| **Create** | `kubectl apply -f cluster.yaml` | Controller vytvoří infra (VM, network), provede bootstrap kubeadm init/join |
+| **Scale** | Upravit `replicas` v MachineDeployment | Controller vytvoří/odstraní Machine → VM → node join/drain |
+| **Upgrade** | Změnit `version` v KubeadmControlPlane / MachineDeployment | Rolling update: nový CP node → upgrade → starý drain a delete. Workers: MachineDeployment rolling update |
+| **Health check** | MachineHealthCheck | Pokud node unhealthy > timeout, controller vytvoří náhradní Machine |
+| **Delete** | `kubectl delete cluster` | Controller provede drain, delete VMs, cleanup infrastruktury |
+| **Template update** | Změna AWSMachineTemplate / KubeadmConfigTemplate | Nové Machiny se vytvoří podle nové šablony; stávajících Machin se změna dotkne jen přes rolling update |
+
+### Auto-remediace (MachineHealthCheck)
+
+```yaml
+apiVersion: cluster.x-k8s.io/v1beta1
+kind: MachineHealthCheck
+metadata:
+  name: prod-mhc
+  namespace: clusters
+spec:
+  clusterName: prod-us-east
+  selector:
+    matchLabels:
+      cluster.x-k8s.io/deployment-name: prod-us-east-workers
+  unhealthyConditions:
+  - type: Ready
+    status: "False"
+    timeout: 5m
+  - type: Ready
+    status: Unknown
+    timeout: 5m
+  maxUnhealthy: "40%"
+  nodeStartupTimeout: 10m
+```
+
+### CAPI + GitOps
+
+CAPI se přirozeně integruje s GitOps:
+
+- **ArgoCD** — Cluster a MachineDeployment manifesty v Git repozitáři, ArgoCD je aplikuje na management cluster
+- **Flux** — `Kustomization` + `OCIRepository` pro CAPI objekty
+- **Crossplane** — lze kombinovat: Crossplane pro provisioning cloud resources (VPC, subnets), CAPI pro K8s cluster na nich
+
+Vzor: vyhrazený "fleet management" cluster, na kterém běží CAPI + ArgoCD. Všechny workload clustery jsou definované jako YAML v Gitu.
+
+### CAPI pro on-prem
+
+| Provider | Use case | Poznámka |
+|----------|----------|----------|
+| **Metal3** (Ironic) | Bare metal provisioning (PXE, IPMI, Redfish) | Automatické provisionování BM serverů jako K8s nodes |
+| **CAPV (vSphere)** | VMware VM jako K8s nodes | Většina enterprise on-prem |
+| **CAPO (OpenStack)** | OpenStack VM jako K8s nodes | OpenStack-native |
+| **Nutanix (CAPX)** | Nutanix AHV/Prism | Community provider |
+
+### CAPI pro edge
+
+| Provider | Use case | Poznámka |
+|----------|----------|----------|
+| **K3s bootstrap + control plane** | Lightweight K8s na edge zařízeních | Single binary, SQLite/embedded etcd |
+| **RKE2 bootstrap + control plane** | Enterprise edge, air-gapped | CIS-hardened, FIPS |
+| **Talos** | Immutable OS, API-driven | Minimal footprint, no SSH |
+| **k0smotron** | Hosted control plane pro edge clustery | CP běží v management clusteru, worker na edge |
+
+### CAPI vs alternativy
+
+| Nástroj | Přístup | CAPI výhoda | CAPI nevýhoda |
+|---------|---------|-------------|---------------|
+| **Terraform/Pulumi** | Deklarativní IaC (HCL / obecné programovací jazyky) | CAPI je K8s-native: stejný nástroj pro appky i clustery; GitOps ready | Terraform má širší podporu non-K8s resources |
+| **kubeadm** | Manuální nebo skriptovaný | CAPI automatizuje celý lifecycle včetně upgradů a remediací | Vyšší komplexita, nutný management cluster |
+| **Rancher** | Web UI + API pro správu K8s clusterů | CAPI je open-source, vendor-neutral | Rancher má GUI, monitoring, katalog appek |
+| **OpenShift Hive/ACM** | Red Hat Advanced Cluster Management | CAPI je standardní (SIG) — širší provider ecosystem | ACM má governance, policy, compliance |
+
+### Limitations a maturity
+
+- **Management cluster je SPOF** — musí mít vlastní HA a backup (etcd zálohy, certifikáty)
+- **CAPI není cluster autoscaler** — pouze udržuje deklarovaný počet replik; pro škálování uzlů podle zátěže se samostatně nasazuje Cluster Autoscaler (škálování podů řeší HPA/VPA)
+- **Provider maturity se liší** — AWS/Azure/vSphere/OpenStack jsou v tabulce výše uvedeny jako stabilní, GCP jako beta, některé community providers alpha
+- **etcd backup není built-in** — nutné řešit externě (Velero, etcd snapshot)
+- **CAPI neřeší aplikace** — pouze lifecycle K8s clusterů (monitoring, logging, ingress si řídí uživatel)
+- **Learning curve** — nutnost management clusteru, pochopení provider modelu, CRDs
+- **CAPI v1.13+ (2026)** — stable release, v1beta1 API je GA, ClusterClass stable, EKS/AKS/GKE managed control plane podpora
+
+### Doporučený stack pro CAPI v produkci
+
+| Komponenta | Doporučení |
+|------------|------------|
+| **Management cluster** | K3s (malý footprint) nebo kubeadm (3 nodes HA) |
+| **Infra provider** | CAPA (AWS) / CAPV (vSphere) / CAPO (OpenStack) — dle platformy |
+| **Bootstrap/CP provider** | Kubeadm nebo RKE2 |
+| **GitOps** | ArgoCD nebo Flux |
+| **Backup** | Velero + restic/Ceph |
+| **Cluster autoscaler** | Cluster Autoscaler (přes CAPI integration) |
+| **Network** | Cilium (instalace přes ClusterResourceSet nebo Helm add-on provider; CAPI samo CNI nenasazuje) |
+| **Secrets** | External Secrets Operator / Sealed Secrets |
+| **Monitoring** | Prometheus + Grafana (kube-prometheus-stack) |
+| **Ingress** | ingress-nginx / Kong / Traefik |
+
+## Zdroje
+
+Odkazy, knihy a standardy: [sources/infrastructure/sources.md](sources/infrastructure/sources.md)
+
+*Poslední revize: 2026-06-18*
--- a/MESSAGING.en.md
+++ b/MESSAGING.en.md
@@ -270,6 +270,6 @@ See [DATACENTERS.en.md](DATACENTERS.en.md) — section "Impact of individual tec

 ## Sources

-Links, books, and standards: [sources/infrastructure/sources.md](sources/infrastructure/sources.md)
+Links, books, and standards: [sources/infrastructure/sources.en.md](sources/infrastructure/sources.en.md)

 *Last revision: 2026-06-12*
--- a/MONGODB.en.md
+++ b/MONGODB.en.md
@@ -111,6 +111,6 @@ MongoDB changed its license in 2018 from GNU AGPL v3 to **SSPL** (Server Side Pu

 ## Sources

-References, books, and standards: [sources/databases/sources.md](sources/databases/sources.md)
+References, books, and standards: [sources/databases/sources.en.md](sources/databases/sources.en.md)

 *Last revision: 2026-06-03*
--- a/MONITORING.en.md
+++ b/MONITORING.en.md
@@ -497,6 +497,6 @@ OpenStack provides several services for telemetry and monitoring:

 ## Sources

-Links, books and standards: [sources/monitoring/sources.md](sources/monitoring/sources.md)
+Links, books and standards: [sources/monitoring/sources.en.md](sources/monitoring/sources.en.md)

 *Last revision: 2026-06-03*
--- a/MYSQL.en.md
+++ b/MYSQL.en.md
@@ -131,7 +131,7 @@ ProxySQL is an advanced proxy for MySQL with sophisticated routing:

 ## Sources

-References, books, and standards: [sources/databases/sources.md](sources/databases/sources.md)
+References, books, and standards: [sources/databases/sources.en.md](sources/databases/sources.en.md)

 ### Recommended reading

--- a/NETWORKING.en.md
+++ b/NETWORKING.en.md
@@ -302,7 +302,7 @@ Anycast detail:

 ## Cloud Networking Resilience (2026)

-See also: [CLOUD.md](CLOUD.md) — cloud architecture, multi-AZ, hybrid cloud connectivity.
+See also: [CLOUD.en.md](CLOUD.en.md) — cloud architecture, multi-AZ, hybrid cloud connectivity.

 ### Cell-based Architectures

@@ -577,7 +577,7 @@ In a private DC, Zero Trust is deployed via:

 ## Resources

-Links, books and standards: [sources/networking/sources.md](sources/networking/sources.md)
+Links, books and standards: [sources/networking/sources.en.md](sources/networking/sources.en.md)
 - **MTU alignment** — consistent MTU across the entire path, check ICMP blocking for PMTUD
 - **IP planning** — RFC 1918 (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16), avoid overlaps for peering

--- a/ORACLE.en.md
+++ b/ORACLE.en.md
@@ -195,7 +195,7 @@ Tip: For RAC, consider smaller CPUs (e.g., 64C instead of 96C) — license cost

 ## Sources

-References, books, and standards: [sources/databases/sources.md](sources/databases/sources.md)
+References, books, and standards: [sources/databases/sources.en.md](sources/databases/sources.en.md)

 ### Recommended reading

--- a/OS.en.md
+++ b/OS.en.md
@@ -0,0 +1,337 @@
+# Operating Systems
+
+> Overview of Linux distributions and Microsoft Windows for server, container, and AI/GPU workloads, including support lifecycle, EOL dates, and comparison.
+
+---
+
+## Distribution overview
+
+| Distribution | Family | Package manager | Init | Security | Reference platform |
+|-------------|--------|----------------|------|----------|-------------------|
+| **Ubuntu LTS** | Debian | apt (deb) | systemd | AppArmor | NVIDIA DGX, widest AI/GPU support |
+| **Debian** | Debian | apt (deb) | systemd | AppArmor | General-purpose server, stability |
+| **RHEL** | Red Hat | dnf (rpm) | systemd | SELinux | Enterprise standard, SAP, Oracle DB |
+| **Rocky Linux** | Red Hat | dnf (rpm) | systemd | SELinux | RHEL binary compatible (free) |
+| **AlmaLinux** | Red Hat | dnf (rpm) | systemd | SELinux | RHEL binary compatible (free) |
+| **SLES** | SUSE | zypper (rpm) | systemd | AppArmor | HPC, SAP, mainframe |
+| **openSUSE Leap** | SUSE | zypper (rpm) | systemd | AppArmor | Desktop, development |
+| **openSUSE Tumbleweed** | SUSE | zypper (rpm) | systemd | AppArmor | Rolling release, bleeding edge |
+| **Fedora** | Red Hat | dnf (rpm) | systemd | SELinux | Desktop, technology preview |
+| **Arch Linux** | Independent | pacman | systemd | — | Rolling, power users |
+| **Alpine Linux** | Independent | apk | OpenRC | — | Container image, embedded |
+| **Flatcar Container Linux** | Independent | — (image-based) | systemd | — | K8s worker node, minimal footprint |
+| **Bottlerocket** | Independent | — (image-based) | systemd | SELinux | AWS K8s, minimal footprint |
+
+---
+
+## Support lifecycle and EOL dates
+
+> **Standard:** base support (bug fixes, security). **LTS/ELS:** extended support (security only).
+> ESM = Ubuntu Expanded Security Maintenance, EUS = RHEL Extended Update Support, LTSS = SUSE Long Term Service Pack Support.
+
+### Ubuntu LTS
+
+| Version | Release | Standard support | ESM / Ubuntu Pro | Note |
+|---------|---------|-----------------|------------------|------|
+| **20.04 LTS** (Focal) | 2020-04 | End 2025-04 | End 2030-04 | Last release with Python 2 |
+| **22.04 LTS** (Jammy) | 2022-04 | End 2027-04 | End 2032-04 | NVIDIA DGX standard |
+| **24.04 LTS** (Noble) | 2024-04 | End 2029-04 | End 2034-04 | Latest GPU/CUDA support |
+| **26.04 LTS** (planned) | 2026-04 | End 2031-04 | End 2036-04 | — |
+
+### RHEL
+
+| Version | Release | Full support | Maintenance support | Extended life cycle |
+|---------|---------|-------------|-------------------|-------------------|
+| **7** | 2014-06 | End 2019-08 | End 2024-06 | End 2028-06 (ELS) |
+| **8** | 2019-05 | End 2024-05 | End 2029-05 | End 2034-06 (ELS) |
+| **9** | 2022-05 | End 2027-05 | End 2032-05 | End 2037-06 (ELS) |
+| **10** | 2025-05 | End 2030-05 | End 2035-05 | — |
+
+### Rocky Linux / AlmaLinux
+
+| Version | Release | Support until | RHEL compatible | Note |
+|---------|---------|-------------|-----------------|------|
+| **8** | 2021-06 | 2029-05 | Yes (since RHEL 8.4) | Alma/Rocky |
+| **9** | 2022-07 | 2032-05 | Yes (since RHEL 9.0) | Alma/Rocky |
+
+### Debian
+
+| Version | Release | Full support | LTS support | ELTS (paid) |
+|---------|---------|-------------|-------------|-------------|
+| **11** (Bullseye) | 2021-08 | 2024-08 | End 2026-08 | End 2028-08 |
+| **12** (Bookworm) | 2023-06 | 2026-06 | End 2028-06 | End 2030-06 |
+| **13** (Trixie) | 2025-08 | ~3 years post-release | ~5 years post-release | — |
+
+### SLES
+
+| Version | Release | General support | LTSS | Note |
+|---------|---------|---------------|------|------|
+| **15 SP3** | 2021-06 | End 2022-12 | End 2025-12 | — |
+| **15 SP4** | 2022-06 | End 2023-12 | End 2026-12 | — |
+| **15 SP5** | 2023-06 | End 2024-12 | End 2027-12 | — |
+| **15 SP6** | 2024-06 | End 2025-12 | End 2028-12 | — |
+
+### Fedora
+
+| Version | Release | EOL | Note |
+|---------|---------|-----|------|
+| **38** | 2023-04 | 2024-05 | — |
+| **39** | 2023-11 | 2024-12 | — |
+| **40** | 2024-04 | 2025-05 | — |
+| **41** | 2024-11 | 2025-12 | — |
+
+Fedora releases a new version every ~6 months, EOL ~13 months after release. Serves as upstream for RHEL.
+
+### Alpine Linux
+
+| Version | Release | EOL |
+|---------|---------|-----|
+| **3.18** | 2023-05 | 2025-05 |
+| **3.19** | 2023-12 | 2025-12 |
+| **3.20** | 2024-05 | 2026-05 |
+| **3.21** | 2024-12 | 2026-12 |
+
+---
+
+## Kernel version per distribution
+
+| Distribution | Kernel (default) | Kernel (HWE/enhanced) | Note |
+|------------|-----------------|----------------------|------|
+| Ubuntu 22.04 LTS | 5.15 (GA) | 6.5+ (HWE) | HWE from 22.04.2 |
+| Ubuntu 24.04 LTS | 6.8 | — | — |
+| RHEL 8 | 4.18 | — | Backported features |
+| RHEL 9 | 5.14 | — | Backported features |
+| RHEL 10 | 6.12 | — | Backported features |
+| Rocky/Alma 8 | 4.18 | — | Same as RHEL 8 |
+| Rocky/Alma 9 | 5.14 | — | Same as RHEL 9 |
+| Debian 11 | 5.10 | 6.1 (backports) | — |
+| Debian 12 | 6.1 | — | — |
+| SLES 15 SP5 | 5.14 | — | — |
+| SLES 15 SP6 | 6.4 | — | — |
+| Fedora 40 | 6.8+ | — | Rolling upstream |
+| Alpine 3.20 | 6.6 | — | — |
+
+---
+
+## Use case comparison
+
+| Use case | Recommended distribution | Rationale |
+|----------|------------------------|-----------|
+| **AI/GPU cluster (DGX)** | Ubuntu 22.04 LTS / DGX OS | NVIDIA standard, CUDA, MLNX_OFED |
+| **Enterprise K8s (OpenShift)** | RHEL 9 / RHCOS | Red Hat support, GPU Operator |
+| **Vanilla K8s (on-prem)** | Ubuntu 22.04 LTS + Flatcar (workers) | Community support, minimal worker image |
+| **HPC cluster (Slurm)** | Rocky Linux 9 / Ubuntu 22.04 | EL ecosystem + Lustre, or Ubuntu |
+| **Traditional enterprise DB (Oracle, SAP)** | RHEL 9 / SLES 15 | Vendor certification |
+| **Container host** | Ubuntu 22.04 / Alpine | Broad image compatibility / min size |
+| **Development / desktop** | Fedora / Ubuntu 24.04 / OpenSUSE Tumbleweed | Latest packages, HW support |
+| **Embedded / IoT** | Debian / Alpine / Yocto | Minimal footprint, stability |
+| **Edge inference** | Ubuntu (ARM) / NVIDIA JetPack | Jetson, GPU support |
+| **Mainframe (IBM z/Arch)** | SLES 15 / RHEL 9 | IBM certification |
+
+---
+
+## Package management comparison
+
+| Feature | apt (Debian/Ubuntu) | dnf (RHEL/Rocky/Alma/Fedora) | zypper (SUSE) | pacman (Arch) | apk (Alpine) |
+|---------|--------------------|------------------------------|---------------|---------------|-------------|
+| **Package format** | .deb | .rpm | .rpm | .pkg.tar.zst | .apk |
+| **Repo management** | /etc/apt/sources.list | /etc/yum.repos.d/ | /etc/zypp/repos.d/ | /etc/pacman.conf | /etc/apk/repositories |
+| **Lock file** | — (apt-mark hold) | — (exclude) | — (lock) | — (IgnorePkg) | — |
+| **Transactional update** | No | Yes (dnf history) | Yes (zypper history) | No | No |
+| **Rollback** | No (manual) | Yes (dnf history rollback) | Yes (snapper + zypper) | No | No |
+| **Delta updates** | No (PDiffs for index files only) | Yes (deltarpm, DNF 4) | Yes (delta RPMs) | No | No |
+| **Version (as of 2025)** | apt 2.7+ | dnf 4.18+ | zypper 1.14+ | pacman 6.1+ | apk 2.14+ |
+
+---
+
+## Security model comparison
+
+| Feature | SELinux (RHEL derivatives) | AppArmor (Ubuntu/Debian/SUSE) |
+|---------|--------------------------|-------------------------------|
+| **Type** | Mandatory Access Control (MAC) | Mandatory Access Control (MAC) |
+| **Labeling** | Context-based (user:role:type:level) | Path-based (profile per executable) |
+| **Configuration** | Policy (modules, booleans) | Profiles (text, in /etc/apparmor.d/) |
+| **Modes** | Enforcing / Permissive / Disabled | Enforce / Complain / Disabled |
+| **Learning curve** | Steep (complex policies) | Moderate (simpler profiles) |
+| **Default in** | RHEL, Rocky, Alma, Fedora | Ubuntu, Debian, SLES, OpenSUSE |
+| **Use case** | Enterprise multi-tenant, regulated | General-purpose server, app containment |
+| **Container integration** | SELinux labels on container | AppArmor profile on container |
+
+Additional layers:
+- **seccomp** — syscall filtering (default in containerd, Docker)
+- **Capabilities** — Linux capabilities (drop all except required)
+- **cgroups v2** — resource isolation (CPU, memory, IO, PID)
+- **User namespaces** — rootless containers (Podman, Docker rootless)
+
+---
+
+## Recommended migration path for EOL distributions
+
+| From | To | Recommended approach |
+|------|-----|---------------------|
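+| Ubuntu 20.04 (EOL 2025) | Ubuntu 22.04 or 24.04 | `do-release-upgrade` one LTS at a time (20.04 → 22.04 → 24.04), or fresh install |
+| RHEL 7 (EOL 2024) | RHEL 8 or 9 | `leapp` upgrade (7 → 8, then 8 → 9), or fresh install |
+| Rocky/Alma 8 | Rocky/Alma 9 | Fresh install + data migration (Rocky has no supported in-place path); AlmaLinux ELevate (`leapp`-based) for in-place |
+| Debian 11 (EOL LTS 2026) | Debian 12 | Point `sources.list` at `bookworm`, then `apt update && apt full-upgrade` |
+| SLES 15 SP4 (general support ended 2023-12) | SLES 15 SP6 | `zypper migration` |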
+| Fedora 40 (EOL 2025) | Fedora 42+ | `dnf system-upgrade` |
+
+---
+
+## Microsoft Windows
+
+### Windows Server — editions
+
+| Edition | Price (approx) | Core limits | VM rights | Use case |
+|---------|---------------|-------------|-----------|----------|
+| **Datacenter** | ~$6,155 (2025) | Unlimited | Unlimited Windows VMs per host | Virtualization, SDDC, S2D, HCI |
+| **Standard** | ~$1,069 (2025) | 2 CPU, unlimited cores | 2 Windows VMs + Hyper-V host | General server, AD, file server |
+| **Essentials** | ~$501 (2025) | 1 CPU, max 10 cores | — | Small business (≤25 users) |
+| **Azure Edition** | Pay-as-you-go | Per Azure VM | Per Azure | Azure-only, hotpatching |
+
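+Licensing: Windows Server Standard and Datacenter are licensed **per core**: a minimum of 16 core licenses per server and 8 per physical processor. The per-VM licensing option requires at least 8 core licenses per VM.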
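+
+A minimal worked example of the per-core minimums, assuming the commonly cited rule of at least 8 core licenses per physical processor and 16 per server. The symbols are illustrative, not Microsoft terminology: $P$ is the number of physical processors, $c_p$ the core count of processor $p$, and $L$ the core licenses required.
+
+```math
+L = \max\left(16,\ \sum_{p=1}^{P} \max(8,\ c_p)\right)
+```
+
+Under that assumption, a 2 × 24-core server needs $L = \max(16, 48) = 48$ core licenses, while a 1 × 4-core server still needs $L = \max(16, 8) = 16$. Treat this as a sizing sketch only and check the current licensing terms before ordering.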
+
+### Windows Server — support lifecycle
+
+> **Mainstream:** regular updates (bug fixes, security, features). **Extended:** security updates only (free).
+> **ESU:** Extended Security Updates (paid tier, ~$45–300/core/year).
+
+| Version | Release | Mainstream support | Extended support | ESU | Note |
+|---------|---------|------------------|-----------------|-----|------|
+| **2012 R2** | 2013-11 | 2018-10 | 2023-10 | End 2026-10 (year 3) | ESU paid, final year |
+| **2016** | 2016-10 | 2022-01 | 2027-01 | — | Last with Nano Server as a host install option |
+| **2019** | 2018-11 | 2024-01 | 2029-01 | — | Nano Server as container base image only |
+| **2022** | 2021-08 | 2026-10 | 2031-10 | — | Secured-core server, TPM 2.0, Credential Guard |
+| **2025** | 2024-11 | 2029-10 | 2034-10 | — | Current; hotpatching (via Azure Arc), SMB over QUIC |
+
+### Windows Server — version vs edition feature grid
+
+| Version | Hyper-V | Storage Spaces Direct | Software-defined networking | Containers | GPU DDA / vGPU | WSL2 |
+|---------|---------|---------------------|---------------------------|------------|---------------|------|
+| 2016 Standard | Yes | No (DC only) | No (DC only) | Windows only | Yes | No |
+| 2016 Datacenter | Yes | Yes | Yes | Windows | Yes | No |
+| 2019 Standard | Yes | No | No | Windows | Yes | No |
+| 2019 Datacenter | Yes | Yes | Yes | Windows | Yes | No |
+| 2022 Standard | Yes | No | No | Windows + Linux | Yes | No |
+| 2022 Datacenter | Yes | Yes | Yes | Windows + Linux (2022.2+) | Yes | No |
+| 2025 Datacenter | Yes | Yes | Yes | Windows + Linux | Yes | Yes |
+
+### Windows Desktop — support lifecycle
+
+> **E = Enterprise, Pro = Professional, Home = Consumer**
+> LTSC = Long Term Servicing Channel (stable, no feature updates)
+
+| Version | Release | EOL (Home/Pro) | EOL (Enterprise) | LTSC EOL | Note |
+|---------|---------|---------------|-----------------|----------|------|
+| **10 21H2** | 2021-11 | 2023-06 | 2024-06 | — | — |
+| **10 22H2** | 2022-10 | 2025-10 | 2025-10 | — | Final Windows 10 |
+| **10 LTSC 2021** | 2021-11 | — | — | 2032-01 | IoT Enterprise LTSC |
+| **11 22H2** | 2022-09 | 2024-10 | 2025-10 | — | — |
+| **11 23H2** | 2023-10 | 2025-11 | 2026-11 | — | — |
+| **11 24H2** | 2024-10 | 2026-10 | 2027-10 | — | First with Recall, Copilot+ |
+| **11 LTSC 2024** | 2024-10 | — | — | 2029-10 | Enterprise LTSC |
+
+Windows 10 support **ended 2025-10-14**; 22H2 was its final feature update.
+
+### Windows vs Linux — comparison
+
+| Feature | Windows Server | RHEL / Ubuntu |
+|---------|---------------|---------------|
+| **License (server)** | $500–6,000 (per core) + CAL | $0–800 (per node subscription) |
+| **License (desktop)** | $100–200 (OEM/retail) | Free |
+| **Support cost** | Included in license (SA/ESU) | $200–1,300/node/year (RHEL) |
+| **Package management** | MSI, AppX, winget, NuGet | APT, DNF, Zypper |
+| **Package count** | ~10,000 (chocolatey) | ~60,000+ (Ubuntu repo) |
+| **Desktop GUI** | Windows Shell (mandatory) | Optional (GNOME, KDE, XFCE…) |
+| **Server GUI** | Optional (Server Core or Desktop Experience) | CLI-only (standard) |
+| **Kernel** | NT hybrid kernel (kernel-mode Win32) | Monolithic Linux kernel |
+| **Device support** | OEM driver model (WHQL) | Open source + vendor drivers |
+| **Container types** | Windows + Linux (WSL2) | Linux (Docker, Podman, containerd) |
+| **Container registry** | Docker Hub, ACR, Nexus | Docker Hub, Quay, GHCR, Nexus… |
+| **Container image size** | ~4–8 GB (Windows Server Core) | ~100 MB – 1 GB (Alpine/Ubuntu) |
+| **GPU passthrough** | DDA (Discrete Device Assignment) | GPU Direct, VFIO, SR-IOV |
+| **AI/ML support** | WSL2 (CUDA), Azure ML | Native CUDA, ROCm, oneAPI |
+| **CUDA support** | Yes (via WSL2 or Docker) | Native (nvidia-container-toolkit) |
+| **Orchestration** | AD / GPO / SCCM / WAC | Ansible, Puppet, Salt, Foreman |
+| **RBAC/AAA** | Active Directory (+ Kerberos) | LDAP, FreeIPA, SSSD, AD |
+| **Remote management** | RDP, WinRM, PowerShell Remoting | SSH, Cockpit, Webmin |
+| **Filesystem** | NTFS, ReFS, CSVFS | ext4, XFS, Btrfs, ZFS |
+| **Max file system size** | 256 TB (NTFS), 1.2 YB (ReFS) | 1 EB (XFS), 16 EB (ZFS) |
+| **Hypervisor** | Hyper-V (Type 1) | KVM (in-kernel; usually classed as Type 1), Xen |
+| **Dynamic memory** | Hyper-V Dynamic Memory | KSM, virtio-balloon (KVM) |
+| **Live migration** | Hyper-V Live Migration | KVM live migration (QEMU/libvirt) |
+
+### Windows specific features
+
+| Feature | Description | Linux alternative |
+|---------|------------|-------------------|
+| **Active Directory** | Identity, auth, GPO, DNS, DHCP | FreeIPA, Samba AD DC, 389-ds, SSSD |
+| **Group Policy** | Central desktop/server configuration | Ansible (agentless), Puppet, Salt (agent-based) |
+| **Hyper-V + S2D** | Hyper-converged storage and virtualization (HCI) | Proxmox Ceph / oVirt + Gluster |
+| **Failover Clustering** | Cluster-aware apps (SQL, File Server) | Pacemaker + Corosync + DRBD |
+| **IIS** | Web server, ASP.NET host | Nginx, Apache (.NET host possible) |
+| **PowerShell** | Scripting, Desired State Configuration | Bash, Python, Ansible |
+| **Windows Admin Center** | GUI management | Cockpit, Webmin |
+| **BitLocker** | Full disk encryption | LUKS + cryptsetup |
+| **Windows Defender** | Antivirus + EDR | ClamAV, Wazuh, Osquery |
+| **SQL Server** | Relational database | PostgreSQL, MySQL, MariaDB |
+
+### Recommended OS per use case (including Windows)
+
+| Use case | OS | Rationale |
+|----------|-----|-------|
+| **Active Directory / GPO / hybrid ID** | Windows Server 2022/2025 | AD is Windows-only |
+| **SQL Server (failover cluster)** | Windows Server Datacenter + SQL EE | Always On FCI, ReFS |
+| **Exchange / SharePoint** | Windows Server 2022 | Windows-only |
+| **Enterprise desktop management** | Windows 11 Enterprise + Intune/SCCM | GPO, AD, enterprise MDM |
+| **.NET / ASP.NET apps** | Windows Server / Linux (.NET Core) | .NET 6+ runs on Linux |
+| **HCI (Microsoft stack)** | Windows Server Datacenter + S2D + Hyper-V | Azure Stack HCI |
+| **Virtualization (mixed workload)** | Windows Server Datacenter (Hyper-V) | Linux + Windows VMs under one hypervisor |
+| **AI/GPU inference** | Linux (Ubuntu) + CUDA | NVIDIA optimal; WSL2 alternative |
+| **Container orchestration (Windows nodes)** | Windows Server 2022/2025 + containerd | Windows Pods in AKS on-prem |
+| **Tier 2 apps / web / API** | Ubuntu or RHEL (Linux) | Lower TCO, smaller footprint |
+
+### Windows Server migration paths
+
+| From | To | Recommended approach |
+|------|-----|---------------------|
+| Windows Server 2012 R2 (EOL 2023) | Windows Server 2022/2025 | In-place upgrade or fresh + migration |
+| Windows Server 2016 (EOL 2027) | Windows Server 2022/2025 | In-place upgrade or fresh |
+| Windows Server 2019 | Windows Server 2022/2025 | In-place upgrade (`Setup.exe /auto upgrade`) |
+| Windows Server 2022 | Windows Server 2025 | In-place upgrade or fresh |
+| Windows Server → Cloud | Azure VM / Azure Stack HCI | Azure Migrate, Storage Migration Service |
+| Windows Server → Linux | Ubuntu / RHEL (re-platform) | Migrate app to .NET Core or alternative |
+
+### Windows — API and operational limits
+
+| Limit | Windows Server | Windows Desktop |
+|-------|---------------|----------------|
+| **Max RAM** | 24 TB (2025 Datacenter) | 2 TB (Pro/Enterprise), 128 GB (Home) |
+| **Max CPU sockets** | 64 (Datacenter), 2 (Standard) | 2 |
+| **Max CPU cores** | Unlimited | 128 (Pro), 64 (Home) |
+| **Max file size (NTFS)** | 256 TB | 256 TB |
+| **Max file size (ReFS)** | 18.4 EB (2025) | — |
+| **Max volume size (NTFS)** | 256 TB | 256 TB |
+| **Max volume size (ReFS)** | 1.2 YB (theoretical) | — |
+| **Max dedup volume** | 64 TB (Data Deduplication) | — |
+| **Max cluster nodes** | 64 (Failover Cluster) | — |
+| **Max VM per host** | Unlimited (Datacenter) | — |
+| **VM memory per VM** | 12 TB (2022+) | — |
+| **VM vCPU per VM** | 240 (2022+) | — |
+| **Concurrent RDP** | 2 (admin), 200+ (RDS CAL) | 1 (Home), more (RDP host) |
+| **PowerShell Remoting** | Unlimited (WinRM) | Yes (WinRM) |
+
+---
+
+## Related
+
+- [AI-INFRASTRUCTURE.en.md](AI-INFRASTRUCTURE.en.md) — OS for AI workloads, GPU drivers, kernel parameters
+- [KUBERNETES.en.md](KUBERNETES.en.md) — container runtime, orchestration
+- [HYPERVISORS.en.md](HYPERVISORS.en.md) — hypervisors, VM host OS
+- [DATACENTERS.en.md](DATACENTERS.en.md) — DC layout, HW platforms
+
+## Sources
+
+Links, books, and standards: [sources/infrastructure/sources.en.md](sources/infrastructure/sources.en.md)
+
+*Last revision: 2026-06-18*
--- a/OS.md
+++ b/OS.md
@@ -0,0 +1,333 @@
+# Operační systémy
+
+> Přehled Linux distribucí a Microsoft Windows pro serverové, containerové a AI/GPU workloady, včetně support lifecycle, EOL dat a srovnání.
+
+---
+
+## Přehled distribucí
+
+| Distribuce | Rodina | Package manager | Init | Security | Reference platforma |
+|-----------|--------|----------------|------|----------|-------------------|
+| **Ubuntu LTS** | Debian | apt (deb) | systemd | AppArmor | NVIDIA DGX, nejširší AI/GPU support |
+| **Debian** | Debian | apt (deb) | systemd | AppArmor | Univerzální server, stabilita |
+| **RHEL** | Red Hat | dnf (rpm) | systemd | SELinux | Enterprise standard, SAP, Oracle DB |
+| **Rocky Linux** | Red Hat | dnf (rpm) | systemd | SELinux | RHEL binary compatible (free) |
+| **AlmaLinux** | Red Hat | dnf (rpm) | systemd | SELinux | RHEL binary compatible (free) |
+| **SLES** | SUSE | zypper (rpm) | systemd | AppArmor | HPC, SAP, mainframe |
+| **OpenSUSE Leap** | SUSE | zypper (rpm) | systemd | AppArmor | Desktop, vývoj |
+| **OpenSUSE Tumbleweed** | SUSE | zypper (rpm) | systemd | AppArmor | Rolling release, bleeding edge |
+| **Fedora** | Red Hat | dnf (rpm) | systemd | SELinux | Desktop, technologický preview |
+| **Arch Linux** | Independent | pacman | systemd | — | Rolling, power users |
+| **Alpine Linux** | Independent | apk | OpenRC | — | Container image, embedded |
+| **Flatcar Container Linux** | Independent | — (image-based) | systemd | — | K8s worker node, minimal footprint |
+| **Bottlerocket** | Independent | — (image-based) | systemd | — | AWS K8s, minimal footprint |
+
+---
+
+## Support lifecycle a EOL data
+
+> **Standard:** základní podpora (bug fixy, security). **LTS/ELS:** prodloužená podpora (jen security).
+> ESM = Ubuntu Extended Security Maintenance, EUS = RHEL Extended Update Support, LTSS = SUSE Long Term Service Pack Support.
+
+### Ubuntu LTS
+
+| Verze | Release | Standard support | ESM / Ubuntu Pro | Poznámka |
+|-------|---------|-----------------|------------------|----------|
+| **20.04 LTS** (Focal) | 2020-04 | Konec 2025-04 | Konec 2030-04 | Poslední verze s Python 2 |
+| **22.04 LTS** (Jammy) | 2022-04 | Konec 2027-04 | Konec 2032-04 | NVIDIA DGX standard |
+| **24.04 LTS** (Noble) | 2024-04 | Konec 2029-04 | Konec 2034-04 | Nejnovější GPU/CUDA support |
+| **26.04 LTS** (plán) | 2026-04 | Konec 2031-04 | Konec 2036-04 | — |
+
+### RHEL
+
+| Verze | Release | Full support | Maintenance support | Extended life cycle |
+|-------|---------|-------------|-------------------|-------------------|
+| **7** | 2014-06 | Konec 2019-08 | Konec 2024-06 | Konec 2028-06 (ELS) |
+| **8** | 2019-05 | Konec 2024-05 | Konec 2029-05 | Konec 2034-06 (ELS) |
+| **9** | 2022-05 | Konec 2027-05 | Konec 2032-05 | Konec 2037-06 (ELS) |
+| **10** (plán) | 2025 | Konec 2029 | Konec 2034 | — |
+
+### Rocky Linux / AlmaLinux
+
+| Verze | Release | Support do | Kompatibilní s RHEL | Poznámka |
+|-------|---------|-----------|-------------------|----------|
+| **8** | 2021-06 | 2029-05 | Ano (od RHEL 8.4) | Alma/rocky |
+| **9** | 2022-07 | 2032-05 | Ano (od RHEL 9.0) | Alma/rocky |
+
+### Debian
+
+| Verze | Release | Full support | LTS support | ELTS (paid) |
+|-------|---------|-------------|-------------|-------------|
+| **11** (Bullseye) | 2021-08 | 2024-08 | Konec 2026-08 | Konec 2028-08 |
+| **12** (Bookworm) | 2023-06 | 2026-06 | Konec 2028-06 | Konec 2030-06 |
+| **13** (Trixie) | 2025-08 | ~3 roky po release | ~5 let po release | — |
+
+### SLES
+
+| Verze | Release | General support | LTSS | Poznámka |
+|-------|---------|---------------|------|----------|
+| **15 SP3** | 2021-06 | Konec 2022-12 | Konec 2025-12 | — |
+| **15 SP4** | 2022-06 | Konec 2023-12 | Konec 2026-12 | — |
+| **15 SP5** | 2023-06 | Konec 2024-12 | Konec 2027-12 | — |
+| **15 SP6** | 2024-06 | Konec 2025-12 | Konec 2028-12 | — |
+
+### Fedora
+
+| Verze | Release | EOL | Poznámka |
+|-------|---------|-----|----------|
+| **38** | 2023-04 | 2024-05 | — |
+| **39** | 2023-11 | 2024-11 | — |
+| **40** | 2024-04 | 2025-05 | — |
+| **41** | 2024-10 | 2025-12 | — |
+
+Fedora vydává novou verzi přibližně každých 6 měsíců a každá verze dosáhne EOL asi 13 měsíců po release. Fedora slouží jako upstream pro RHEL.
+
+### Alpine Linux
+
+| Verze | Release | EOL |
+|-------|---------|-----|
+| **3.18** | 2023-05 | 2025-05 |
+| **3.19** | 2023-12 | 2025-12 |
+| **3.20** | 2024-05 | 2026-05 |
+| **3.21** | 2024-12 | 2026-12 |
+
+---
+
+## Kernel verze per distribuce
+
+| Distribuce | Kernel (default) | Kernel (HWE/enhanced) | Poznámka |
+|-----------|-----------------|----------------------|----------|
+| Ubuntu 22.04 LTS | 5.15 (GA) | 6.5+ (HWE) | HWE od 22.04.2 |
+| Ubuntu 24.04 LTS | 6.8 | — | — |
+| RHEL 8 | 4.18 | — | Backportované featury |
+| RHEL 9 | 5.14 | — | Backportované featury |
+| RHEL 10 | 6.12 | — | — |
+| Rocky/Alma 8 | 4.18 | — | Stejný jako RHEL 8 |
+| Rocky/Alma 9 | 5.14 | — | Stejný jako RHEL 9 |
+| Debian 11 | 5.10 | 6.1 (backports) | — |
+| Debian 12 | 6.1 | — | — |
+| SLES 15 SP5 | 5.14 | — | — |
+| SLES 15 SP6 | 6.4 | — | — |
+| Fedora 40 | 6.8+ | — | Rolling upstream |
+| Alpine 3.20 | 6.6 | — | — |
+
+---
+
+## Srovnání dle use case
+
+| Use case | Doporučená distribuce | Zdůvodnění |
+|----------|---------------------|-------|
+| **AI/GPU cluster (DGX)** | Ubuntu 22.04 LTS / DGX OS | NVIDIA standard, CUDA, MLNX_OFED |
+| **Enterprise K8s (OpenShift)** | RHEL 9 / RHCOS | Red Hat support, GPU Operator |
+| **Vanilla K8s (on-prem)** | Ubuntu 22.04 LTS + Flatcar (workers) | Community support, minimal worker image |
+| **HPC cluster (Slurm)** | Rocky Linux 9 / Ubuntu 22.04 | EL ekosystém + Lustre, nebo Ubuntu |
+| **Traditional enterprise DB (Oracle, SAP)** | RHEL 9 / SLES 15 | Vendor certifikace |
+| **Container host** | Ubuntu 22.04 / Alpine | Široká image kompatibilita / min size |
+| **Vývoj / desktop** | Fedora / Ubuntu 24.04 / OpenSUSE Tumbleweed | Aktuální balíčky, HW support |
+| **Embedded / IoT** | Debian / Alpine / Yocto | Minimal footprint, stabilita |
+| **Edge inference** | Ubuntu (ARM) / NVIDIA JetPack | Jetson, GPU support |
+| **Mainframe (IBM z/Arch)** | SLES 15 / RHEL 9 | IBM certifikace |
+
+---
+
+## Package management srovnání
+
+| Vlastnost | apt (Debian/Ubuntu) | dnf (RHEL/Rocky/Alma/Fedora) | zypper (SUSE) | pacman (Arch) | apk (Alpine) |
+|-----------|--------------------|------------------------------|---------------|---------------|-------------|
+| **Formát balíčků** | .deb | .rpm | .rpm | .pkg.tar.zst | .apk |
+| **Repo management** | /etc/apt/sources.list | /etc/yum.repos.d/ | /etc/zypp/repos.d/ | /etc/pacman.conf | /etc/apk/repositories |
+| **Lock file** | — (apt-mark hold) | — (exclude) | — (lock) | — (IgnorePkg) | — |
+| **Transactional update** | Ne | Ano (dnf history) | Ano (zypper history) | Ne | Ne |
+| **Rollback** | Ne (manual) | Ano (dnf history rollback) | Ano (snapper + zypper) | Ne | Ne |
+| **Delta updates** | Ne (jen PDiffs pro indexy) | Ano (deltarpm, DNF 4) | Ano (delta RPM) | Ne | Ne |
+| **Verze (k 2025)** | apt 2.7+ | dnf 4.18+ | zypper 1.14+ | pacman 6.1+ | apk 2.14+ |
+
+---
+
+## Security model porovnání
+
+| Vlastnost | SELinux (RHEL deriváty) | AppArmor (Ubuntu/Debian/SUSE) |
+|-----------|----------------------|------------------------------|
+| **Typ** | Mandatory Access Control (MAC) | Mandatory Access Control (MAC) |
+| **Labelování** | Kontextové (user:role:type:level) | Path-based (profil k executable) |
+| **Konfigurace** | Policy (moduly, booleany) | Profily (textové, v /etc/apparmor.d/) |
+| **Režimy** | Enforcing / Permissive / Disabled | Enforce / Complain / Disabled |
+| **Křivka učení** | Strmá (politiky komplexní) | Mírná (profily jednodušší) |
+| **Default v** | RHEL, Rocky, Alma, Fedora | Ubuntu, Debian, SLES, OpenSUSE |
+| **Use case** | Enterprise multi-tenant, regulované prostředí | Univerzální server, containment aplikací |
+| **Container integrace** | SELinux labels na kontejner | AppArmor profile na kontejner |
+
+Další vrstvy:
+- **seccomp** — syscall filtering (default v containerd, Docker)
+- **Capabilities** — Linux capabilities (drop vše kromě nutných)
+- **cgroups v2** — resource isolation (CPU, memory, IO, PID)
+- **User namespaces** — rootless kontejnery (Podman, Docker rootless)
+
+---
+
+## Doporučená migrační cesta pro EOL distribuce
+
+| Ze staré verze | Na | Doporučený postup |
+|----------------|-----|-------------------|
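+| Ubuntu 20.04 (EOL 2025) | Ubuntu 22.04 nebo 24.04 | `do-release-upgrade` vždy o jedno LTS (20.04 → 22.04 → 24.04), nebo fresh install |
+| RHEL 7 (EOL 2024) | RHEL 8 nebo 9 | `leapp` upgrade (7 → 8, poté 8 → 9), nebo fresh install |
+| Rocky/Alma 8 | Rocky/Alma 9 | Fresh install + migrace dat (Rocky in-place upgrade nepodporuje); pro in-place AlmaLinux ELevate (postavený na `leapp`) |
+| Debian 11 (EOL LTS 2026) | Debian 12 | Přepnout `sources.list` na `bookworm`, poté `apt update && apt full-upgrade` |
+| SLES 15 SP4 (general support do 2023-12) | SLES 15 SP6 | `zypper migration` |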
+| Fedora 40 (EOL 2025) | Fedora 42+ | `dnf system-upgrade` |
+
+---
+
+## Microsoft Windows
+
+### Windows Server — edice
+
+| Edice | Cena (approx) | Core limity | VM rights | Use case |
+|-------|--------------|-------------|-----------|----------|
+| **Datacenter** | ~$6 155 (2025) | Neomezeno | Neomezené Windows VM na hostiteli | Virtualizace, SDDC, S2D, HCI |
+| **Standard** | ~$1 069 (2025) | 2 CPU, neomezený počet jader | 2 Windows VM + Hyper-V host | Běžný server, AD, file server |
+| **Essentials** | ~$501 (2025) | 1 CPU, max 10 jader | — | Malé firmy (do 25 uživatelů) |
+| **Azure Edition** | Pay-as-you-go | Dle Azure VM | Dle Azure | Azure-only, hotpatching |
+
+Licencování: Windows Server Standard a Datacenter se licencují **per core**: minimálně 16 core licencí na server a 8 na fyzický procesor. Licencování per VM vyžaduje alespoň 8 core licencí na VM.
+
+### Windows Server — support lifecycle
+
+> **Mainstream:** běžné aktualizace (bug fixy, security, feature). **Extended:** jen security aktualizace (zdarma).
+> **ESU:** Extended Security Updates (placená vrstva navíc, cca $45–300/core/rok).
+
+| Verze | Release | Mainstream support | Extended support | ESU | Poznámka |
+|-------|---------|------------------|-----------------|-----|----------|
+| **2012 R2** | 2013-11 | 2018-10 | 2023-10 | Konec 2026-10 (3. rok) | ESU placená, poslední rok |
+| **2016** | 2016-10 | 2022-01 | 2027-01 | — | Poslední s Nano Server jako instalací na hostiteli |
+| **2019** | 2018-11 | 2024-01 | 2029-01 | — | Nano Server jen jako base image kontejneru |
+| **2022** | 2021-08 | 2026-10 | 2031-10 | — | Secured-core server, TPM 2.0, Credential Guard |
+| **2025** | 2024-11 | 2029-10 | 2034-10 | — | Aktuální; hotpatching (přes Azure Arc), SMB over QUIC |
+
+### Windows Server — verze vs edice grid
+
+| Verze | Hyper-V | Storage Spaces Direct | Software-defined networking | Containers | GPU DDA / vGPU | WSL2 |
+|-------|---------|---------------------|---------------------------|------------|---------------|------|
+| 2016 Standard | Ano | Ne (jen Datacenter) | Ne (jen Datacenter) | Jen Windows | Ano | Ne |
+| 2016 Datacenter | Ano | Ano | Ano | Windows | Ano | Ne |
+| 2019 Standard | Ano | Ne | Ne | Windows | Ano | Ne |
+| 2019 Datacenter | Ano | Ano | Ano | Windows | Ano | Ne |
+| 2022 Standard | Ano | Ne | Ne | Windows + Linux | Ano | Ne |
+| 2022 Datacenter | Ano | Ano | Ano | Windows + Linux (2022.2+) | Ano | Ne |
+| 2025 Datacenter | Ano | Ano | Ano | Windows + Linux | Ano | Ano |
+
+### Windows Desktop — support lifecycle
+
+> **E = Enterprise, Pro = Professional, Home = Consumer**
+> LTSC = Long Term Servicing Channel (stabilní, bez feature updatů)
+
+| Verze | Release | EOL (Home/Pro) | EOL (Enterprise) | LTSC EOL | Poznámka |
+|-------|---------|---------------|-----------------|----------|----------|
+| **10 21H2** | 2021-11 | 2023-06 | 2024-06 | — | — |
+| **10 22H2** | 2022-10 | 2025-10 | 2025-10 | — | Poslední Windows 10 |
+| **10 LTSC 2021** | 2021-11 | — | — | 2032-01 | IoT Enterprise LTSC |
+| **11 22H2** | 2022-09 | 2024-10 | 2025-10 | — | — |
+| **11 23H2** | 2023-10 | 2025-11 | 2026-11 | — | — |
+| **11 24H2** | 2024-10 | 2026-10 | 2027-10 | — | První s Recall, Copilot+ |
+| **11 LTSC 2024** | 2024-10 | — | — | 2029-10 | Enterprise LTSC |
+
+Podpora Windows 10 **skončila 2025-10-14**; 22H2 byl poslední feature update.
+
+### Windows vs Linux — srovnání
+
+| Vlastnost | Windows Server | RHEL / Ubuntu |
+|-----------|---------------|---------------|
+| **Licence (server)** | $500–6 000 (per core) + CAL | $0–800 (per node subscription) |
+| **Licence (desktop)** | $100–200 (OEM/retail) | Zdarma |
+| **Cena za support** | Zahrnuto v licenci (SA/ESU) | $200–1 300/node/rok (RHEL) |
+| **Package management** | MSI, AppX, winget, NuGet | APT, DNF, Zypper |
+| **Package count** | ~10 000 (chocolatey) | ~60 000+ (Ubuntu repo) |
+| **Desktop GUI** | Windows Shell (mandatory) | Volitelný (GNOME, KDE, XFCE…) |
+| **Server GUI** | Volitelné (Server Core nebo Desktop Experience) | CLI-only (standard) |
+| **Kernel** | NT hybrid kernel (kernel-mode Win32) | Monolithic Linux kernel |
+| **Device support** | OEM driver model (WHQL) | Open source + vendor drivers |
+| **Container types** | Windows + Linux (WSL2) | Linux (Docker, Podman, containerd) |
+| **Container registry** | Docker Hub, ACR, Nexus | Docker Hub, Quay, GHCR, Nexus… |
+| **Container image size** | ~4–8 GB (Windows Server Core) | ~100 MB – 1 GB (Alpine/Ubuntu) |
+| **GPU passthrough** | DDA (Discrete Device Assignment) | GPU Direct, VFIO, SR-IOV |
+| **AI/ML support** | WSL2 (CUDA), Azure ML | Native CUDA, ROCm, oneAPI |
+| **CUDA support** | Ano (přes WSL2 nebo Docker) | Native (nvidia-container-toolkit) |
+| **Orchestration** | AD / GPO / SCCM / WAC | Ansible, Puppet, Salt, Foreman |
+| **RBAC/AAA** | Active Directory (+ Kerberos) | LDAP, FreeIPA, SSSD, AD |
+| **Remote management** | RDP, WinRM, PowerShell Remoting | SSH, Cockpit, Webmin |
+| **Filesystem** | NTFS, ReFS, CSVFS | ext4, XFS, Btrfs, ZFS |
+| **Max file system size** | 256 TB (NTFS), 1.2 YB (ReFS) | 1 EB (XFS), 16 EB (ZFS) |
+| **Hypervisor** | Hyper-V (Type 1) | KVM (v kernelu; obvykle řazen jako Type 1), Xen |
+| **Dynamic memory** | Hyper-V Dynamic Memory | KSM, virtio-balloon (KVM) |
+| **Live migration** | Hyper-V Live Migration | KVM live migration (QEMU/libvirt) |
+
+### Windows specific features
+
+| Feature | Popis | Lze nahradit na Linuxu? |
+|---------|-------|------------------------|
+| **Active Directory** | Identity, auth, GPO, DNS, DHCP | FreeIPA, Samba AD DC, 389-ds, SSSD |
+| **Group Policy** | Centrální konfigurace desktopů/serverů | Ansible (agentless), Puppet, Salt (agent-based) |
+| **Hyper-V + S2D** | Hyper-converged storage a virtualizace (HCI) | Proxmox Ceph / oVirt + Gluster |
+| **Failover Clustering** | Cluster-aware aplikace (SQL, File Server) | Pacemaker + Corosync + DRBD |
+| **IIS** | Web server, ASP.NET host | Nginx, Apache (bez ASP.NET, nebo .NET host) |
+| **PowerShell** | Scripting, Desired State Configuration | Bash, Python, Ansible |
+| **Windows Admin Center** | GUI management | Cockpit, Webmin |
+| **BitLocker** | Full disk encryption | LUKS + cryptsetup |
+| **Windows Defender** | Antivirus + EDR | ClamAV, Wazuh, Osquery |
+| **SQL Server** | Relační DB | PostgreSQL, MySQL, MariaDB |
+
+### Doporučený OS dle use case (včetně Windows)
+
+| Use case | OS | Zdůvodnění |
+|----------|-----|-------|
+| **Active Directory / GPO / hybrid ID** | Windows Server 2022/2025 | AD jen na Windows |
+| **SQL Server (failover cluster)** | Windows Server Datacenter + SQL EE | Always On FCI, ReFS |
+| **Exchange / SharePoint** | Windows Server 2022 | Jen na Windows |
+| **Enterprise desktop management** | Windows 11 Enterprise + Intune/SCCM | GPO, AD, enterprise MDM |
+| **.NET / ASP.NET aplikace** | Windows Server / Linux (.NET Core) | .NET 6+ běží na Linuxu |
+| **HCI (Microsoft stack)** | Windows Server Datacenter + S2D + Hyper-V | Azure Stack HCI |
+| **Virtualizace (mixed workload)** | Windows Server Datacenter (Hyper-V) | Linux i Windows VM pod jedním hypervisorem |
+| **AI/GPU inference** | Linux (Ubuntu) + CUDA | NVIDIA optimální; WSL2 alternativa |
+| **Container orchestration (Windows nodes)** | Windows Server 2022/2025 + containerd | Windows Pods v AKS on-prem |
+| **Tier 2 aplikace / web / API** | Ubuntu nebo RHEL (Linux) | Nižší TCO, menší footprint |
+
+### Windows Server migrační cesty
+
+| Ze staré verze | Na | Doporučený postup |
+|---------------|-----|-------------------|
+| Windows Server 2012 R2 (EOL 2023) | Windows Server 2022/2025 | In-place upgrade nebo fresh + migration |
+| Windows Server 2016 (EOL 2027) | Windows Server 2022/2025 | In-place upgrade nebo fresh |
+| Windows Server 2019 | Windows Server 2022/2025 | In-place upgrade (`Setup.exe /auto upgrade`) |
+| Windows Server 2022 | Windows Server 2025 | In-place upgrade nebo fresh |
+| Windows Server → Cloud | Azure VM / Azure Stack HCI | Azure Migrate, Storage Migration Service |
+| Windows Server → Linux | Ubuntu / RHEL (re-platform) | Migrace aplikace na .NET Core nebo alternativu |
+
+### Windows — API a provozní limity
+
+| Limit | Windows Server | Windows Desktop |
+|-------|---------------|----------------|
+| **Max RAM** | 24 TB (2025 Datacenter) | 2 TB (Pro/Enterprise), 128 GB (Home) |
+| **Max CPU sockets** | 64 (Datacenter), 2 (Standard) | 2 |
+| **Max CPU cores** | Neomezeno | 128 (Pro), 64 (Home) |
+| **Max file size (NTFS)** | 256 TB | 256 TB |
+| **Max file size (ReFS)** | 18.4 EB (2025) | — |
+| **Max volume size (NTFS)** | 256 TB | 256 TB |
+| **Max volume size (ReFS)** | 1.2 YB (teoreticky) | — |
+| **Max dedup volume** | 64 TB (Data Deduplication) | — |
+| **Max cluster nodes** | 64 (Failover Cluster) | — |
+| **Max VM per host** | Neomezeno (Datacenter) | — |
+| **VM memory per VM** | 12 TB (2022+) | — |
+| **VM vCPU per VM** | 240 (2022+) | — |
+| **Concurrent RDP** | 2 (admin), 200+ (RDS CAL) | 1 (Home), více (RDP host) |
+| **PowerShell Remoting** | Neomezeno (WinRM) | Ano (WinRM) |
+
+---
+
+## Související
+
+- [AI-INFRASTRUCTURE.md](AI-INFRASTRUCTURE.md) — OS pro AI workloady, GPU drivery, kernel parametry
+- [KUBERNETES.md](KUBERNETES.md) — container runtime, orchestrace
+- [HYPERVISORS.md](HYPERVISORS.md) — hypervisory, VM host OS
+- [DATACENTERS.md](DATACENTERS.md) — DC layout, HW platformy
+
+## Zdroje
+
+Odkazy, knihy a standardy: [sources/infrastructure/sources.md](sources/infrastructure/sources.md)
+
+*Poslední revize: 2026-06-18*
--- a/POSTGRESQL.en.md
+++ b/POSTGRESQL.en.md
@@ -166,7 +166,7 @@ LIMIT 10;

 ## Sources

-References, books, and standards: [sources/databases/sources.md](sources/databases/sources.md)
+References, books, and standards: [sources/databases/sources.en.md](sources/databases/sources.en.md)

 ### Recommended reading

--- a/PROVISIONING.en.md
+++ b/PROVISIONING.en.md
@@ -167,7 +167,7 @@ resource "vsphere_virtual_machine" "web" {
 }
 ```

-More in [CICD.md](CICD.md#infrastructure-as-code-iac).
+More in [CICD.en.md](CICD.en.md#infrastructure-as-code-iac).

 ## Firmware management

@@ -188,7 +188,7 @@ More in [CICD.md](CICD.md#infrastructure-as-code-iac).
 | **Chef** | Ruby DSL | Pull (agent) | Compliance, infrastructure automation |
 | **SaltStack** | YAML/Python | Both (salt-minion) | High-speed config, event-driven |

-More in [CICD.md](CICD.md).
+More in [CICD.en.md](CICD.en.md).

 ## OpenStack Provisioning

@@ -223,6 +223,6 @@ OpenStack offers several methods for provisioning infrastructure:

 ## Sources

-Links, books and standards: [sources/infrastructure/sources.md](sources/infrastructure/sources.md)
+Links, books and standards: [sources/infrastructure/sources.en.md](sources/infrastructure/sources.en.md)

 *Last revision: 2026-06-03*
--- a/README.en.md
+++ b/README.en.md
@@ -35,11 +35,11 @@ Bilingual: Czech (`.md`) and English (`.en.md`).
  │ PCIe,BM) │ │(BIOS,    │ │  AMD)  │ │  Terraform)  │
  └──────────┘ │ NUMA)    │ └────────┘ └──────────────┘
               └──────────┘
-  ┌──────────┐ ┌──────────┐ ┌────────┐
-  │HYPERVISOR│ │ MONITOR  │ │  CICD  │
-  │(VMware,  │ │(Prom,    │ │(GitOps, │
-  │ KVM, ...)│ │ Grafana) │ │  IaC)   │
-  └──────────┘ └──────────┘ └────────┘
+  ┌──────────┐ ┌──────────┐ ┌────────┐ ┌────────────┐
+  │HYPERVISOR│ │ MONITOR  │ │  CICD  │ │   ☸ K8s    │
+  │(VMware,  │ │(Prom,    │ │(GitOps, │ │(CAPI, K3s, │
+  │ KVM, ...)│ │ Grafana) │ │  IaC)   │ │  RKE2...)  │
+  └──────────┘ └──────────┘ └────────┘ └────────────┘
 ```

 ---
@@ -52,16 +52,22 @@ Bilingual: Czech (`.md`) and English (`.en.md`).
 | 🌐 Network architecture | [NETWORKING.md](NETWORKING.md) | DNS, BGP, VPC, Zero Trust, EVPN VXLAN, TLS | CLOUD |
 | 📊 Monitoring & observability | [MONITORING.md](MONITORING.md) | Prometheus, Grafana, OTel, logging, alerting | — |
 | 🔄 CI/CD & DevOps | [CICD.md](CICD.md) | Pipelines, GitOps, IaC (Terraform), deployment | — |
+| 💻 Operating systems | [OS.md](OS.md) | Linux distributions, Windows Server, lifecycle, EOL, kernel | KUBERNETES, HYPERVISORS, AI-INFRASTRUCTURE |
 | 🔄 Disaster Recovery | [DR.md](DR.md) | RTO, RPO, scenarios, prevention, uptime calculation | CLOUD, DATACENTERS, MONITORING |
 | 🗄️ Database architecture | [DATABASES.md](DATABASES.md) | Classification, sharding, replication, caching | POSTGRESQL, MYSQL, ORACLE, MONGODB, REDIS, CASSANDRA, VEKTOROVE-DB, DATABAZOVE-ENGINY |
+| 🗄️ Big Data | [BIG-DATA.md](BIG-DATA.md) | HDFS, Spark, Flink, Trino, Iceberg, Delta Lake, Lakehouse | DATABASES, CLOUD, MESSAGING, KUBERNETES |
 | 🖥️ Hypervisors | [HYPERVISORS.md](HYPERVISORS.md) | VMware, Hyper-V, KVM, Proxmox, migration | STORAGE, SERVER-HW |
-| 🏭 Data centers | [DATACENTERS.md](DATACENTERS.md) | Tier, power, cooling, layout, DC services, secondary DC topologies | MONITORING |
+| 🏭 Data centers | [DATACENTERS.md](DATACENTERS.md) | Tier, power, cooling, layout, DC services, secondary DC topologies | MONITORING, MESSAGING |
 | 💾 Storage | [STORAGE.md](STORAGE.md) | SAN/NAS/object, RAID, SDS, Ceph, OpenStack Cinder/Swift/Manila | — |
 | 🔌 Server connectivity | [CONNECTIVITY.md](CONNECTIVITY.md) | Ethernet, FC SAN, iSCSI, NVMe-oF, SAS | — |
 | 🔧 Server hardware | [SERVER-HW.md](SERVER-HW.md) | CPU, RAM, PCIe, NUMA, BMC | CONNECTIVITY |
 | 🎮 GPU | [GPU.md](GPU.md) | NVIDIA/AMD, NVLink, MIG/vGPU, AI, Cyborg | — |
 | ⚙️ Server config | [SERVER-CONFIG.md](SERVER-CONFIG.md) | BIOS tuning, DB/hypervisor/K8s/storage best practices | — |
 | 📦 Provisioning | [PROVISIONING.md](PROVISIONING.md) | PXE, Redfish, Terraform, Ironic, OpenStack deploy | CICD |
+| ☸ Kubernetes | [KUBERNETES.md](KUBERNETES.md) | K8s architecture, deployment, Cluster API (CAPI) | CICD, CLOUD, NETWORKING |
+| 📨 Messaging & streaming | [MESSAGING.md](MESSAGING.md) | Kafka, RabbitMQ, Pulsar, NATS, managed queue/pubsub | DATACENTERS, CLOUD |
+| 🏗️ DC Migration | [DC-MIGRATION.md](DC-MIGRATION.md) | Strategies, phases, network, DB, rollback | DATACENTERS, CLOUD, DR, NETWORKING, STORAGE |
+| 🧠 AI Infrastructure | [AI-INFRASTRUCTURE.md](AI-INFRASTRUCTURE.md) | GPU, AI networking, storage, cluster, cooling, training/inference | GPU, NETWORKING, STORAGE, DATACENTERS, CLOUD |
 | 📋 Legacy index | [HARDWARE.md](HARDWARE.md) | → SERVER-HW, GPU, SERVER-CONFIG, PROVISIONING | SERVER-HW, GPU, SERVER-CONFIG, PROVISIONING |
 | 📋 Legacy infra | [INFRASTRUCTURE.md](INFRASTRUCTURE.md) | → HYPERVISORS, DATACENTERS, STORAGE, HARDWARE | HYPERVISORS, DATACENTERS, STORAGE, HARDWARE |
 | 📋 Review workflow | [REVIEW.md](REVIEW.md) | Review and content control process | — |
@@ -71,12 +77,12 @@ Bilingual: Czech (`.md`) and English (`.en.md`).

 | File | Description |
 |------|-------------|
-| [POSTGRESQL.md](POSTGRESQL.md) | PostgreSQL — architecture, replication, tuning |
-| [MYSQL.md](MYSQL.md) | MySQL & MariaDB |
-| [ORACLE.md](ORACLE.md) | Oracle Database — RAC, Data Guard, tuning |
-| [MONGODB.md](MONGODB.md) | MongoDB — document DB, sharding, replica sets |
-| [REDIS.md](REDIS.md) | Redis — cache, session store, streams |
-| [CASSANDRA.md](CASSANDRA.md) | Cassandra & ScyllaDB — wide-column, nosql |
+| [POSTGRESQL.en.md](POSTGRESQL.en.md) | PostgreSQL — architecture, replication, tuning |
+| [MYSQL.en.md](MYSQL.en.md) | MySQL & MariaDB |
+| [ORACLE.en.md](ORACLE.en.md) | Oracle Database — RAC, Data Guard, tuning |
+| [MONGODB.en.md](MONGODB.en.md) | MongoDB — document DB, sharding, replica sets |
+| [REDIS.en.md](REDIS.en.md) | Redis — cache, session store, streams |
+| [CASSANDRA.en.md](CASSANDRA.en.md) | Cassandra & ScyllaDB — wide-column, NoSQL |
 | [VEKTOROVE-DB.md](VEKTOROVE-DB.md) | Vector databases — Pinecone, Qdrant, Milvus, pgvector |
 | [DATABAZOVE-ENGINY.md](DATABAZOVE-ENGINY.md) | Common DB concepts — transactions, indexes, locking |

@@ -90,16 +96,22 @@ Bilingual: Czech (`.md`) and English (`.en.md`).
 | 🌐 Network architecture | [NETWORKING.en.md](NETWORKING.en.md) | DNS, BGP, VPC, Zero Trust, EVPN VXLAN, TLS | CLOUD |
 | 📊 Monitoring & observability | [MONITORING.en.md](MONITORING.en.md) | Prometheus, Grafana, OTel, logging, alerting | — |
 | 🔄 CI/CD & DevOps | [CICD.en.md](CICD.en.md) | Pipelines, GitOps, IaC (Terraform), deployment | — |
+| 💻 Operating systems | [OS.en.md](OS.en.md) | Linux distributions, Windows Server, lifecycle, EOL, kernel | KUBERNETES, HYPERVISORS, AI-INFRASTRUCTURE |
 | 🔄 Disaster Recovery | [DR.en.md](DR.en.md) | RTO, RPO, scenarios, prevention, uptime calculation | CLOUD, DATACENTERS, MONITORING |
 | 🗄️ Database architecture | [DATABASES.en.md](DATABASES.en.md) | Classification, sharding, replication, caching | POSTGRESQL, MYSQL, ORACLE, MONGODB, REDIS, CASSANDRA, VECTOR-DBS, DATABASE-ENGINES |
+| 🗄️ Big Data | [BIG-DATA.en.md](BIG-DATA.en.md) | HDFS, Spark, Flink, Trino, Iceberg, Delta Lake, Lakehouse | DATABASES, CLOUD, MESSAGING, KUBERNETES |
 | 🖥️ Hypervisors | [HYPERVISORS.en.md](HYPERVISORS.en.md) | VMware, Hyper-V, KVM, Proxmox, migration | STORAGE, SERVER-HW |
-| 🏭 Data centers | [DATACENTERS.en.md](DATACENTERS.en.md) | Tier, power, cooling, layout, DC services, secondary DC topologies | MONITORING |
+| 🏭 Data centers | [DATACENTERS.en.md](DATACENTERS.en.md) | Tier, power, cooling, layout, DC services, secondary DC topologies | MONITORING, MESSAGING |
 | 💾 Storage | [STORAGE.en.md](STORAGE.en.md) | SAN/NAS/object, RAID, SDS, Ceph | — |
 | 🔌 Server connectivity | [CONNECTIVITY.en.md](CONNECTIVITY.en.md) | Ethernet, FC SAN, iSCSI, NVMe-oF, SAS | — |
 | 🔧 Server hardware | [SERVER-HW.en.md](SERVER-HW.en.md) | CPU, RAM, PCIe, NUMA, BMC | CONNECTIVITY |
 | 🎮 GPU | [GPU.en.md](GPU.en.md) | NVIDIA/AMD, NVLink, MIG/vGPU, AI, Cyborg | — |
 | ⚙️ Server config | [SERVER-CONFIG.en.md](SERVER-CONFIG.en.md) | BIOS tuning, DB/hypervisor/K8s/storage best practices | — |
 | 📦 Provisioning | [PROVISIONING.en.md](PROVISIONING.en.md) | PXE, Redfish, Terraform, Ironic, OpenStack deploy | CICD |
+| ☸ Kubernetes | [KUBERNETES.en.md](KUBERNETES.en.md) | K8s architecture, deployment, Cluster API (CAPI) | CICD, CLOUD, NETWORKING |
+| 📨 Messaging & streaming | [MESSAGING.en.md](MESSAGING.en.md) | Kafka, RabbitMQ, Pulsar, NATS, managed queue/pubsub | DATACENTERS, CLOUD |
+| 🏗️ DC Migration | [DC-MIGRATION.en.md](DC-MIGRATION.en.md) | Strategies, phases, network, DB, rollback | DATACENTERS, CLOUD, DR, NETWORKING, STORAGE |
+| 🧠 AI Infrastructure | [AI-INFRASTRUCTURE.en.md](AI-INFRASTRUCTURE.en.md) | GPU, AI networking, storage, cluster, cooling, training/inference | GPU, NETWORKING, STORAGE, DATACENTERS, CLOUD |
 | 📋 Legacy index | [HARDWARE.en.md](HARDWARE.en.md) | → SERVER-HW, GPU, SERVER-CONFIG, PROVISIONING | SERVER-HW, GPU, SERVER-CONFIG, PROVISIONING |
 | 📋 Legacy infra | [INFRASTRUCTURE.en.md](INFRASTRUCTURE.en.md) | → HYPERVISORS, DATACENTERS, STORAGE, HARDWARE | HYPERVISORS, DATACENTERS, STORAGE, HARDWARE |
 | 📋 Review workflow | [REVIEW.en.md](REVIEW.en.md) | Review and content control process | — |
@@ -124,7 +136,7 @@ Bilingual: Czech (`.md`) and English (`.en.md`).

 | File | Description |
 |------|-------------|
-| [case-studies/proxmox-demo/README.md](case-studies/proxmox-demo/README.md) | Proxmox VE demo cluster — design (CZ) |
+| [case-studies/proxmox-demo/README.md](case-studies/proxmox-demo/README.md) | Proxmox VE demo cluster — návrh (CZ) |
 | [case-studies/proxmox-demo/README.en.md](case-studies/proxmox-demo/README.en.md) | Proxmox VE demo cluster — design (EN) |

 ---
@@ -133,22 +145,28 @@ Bilingual: Czech (`.md`) and English (`.en.md`).

 | File | References |
 |------|------------|
-| `CLOUD.md` / `CLOUD.en.md` | [`GPU.md`](GPU.md), [`NETWORKING.md`](NETWORKING.md), [`sources/cloud/sources.md`](sources/cloud/sources.md) |
-| `NETWORKING.md` / `NETWORKING.en.md` | [`CLOUD.md`](CLOUD.md), [`sources/networking/sources.md`](sources/networking/sources.md) |
-| `DATACENTERS.md` / `DATACENTERS.en.md` | [`MONITORING.md`](MONITORING.md), [`sources/infrastructure/sources.md`](sources/infrastructure/sources.md) |
-| `MONITORING.md` / `MONITORING.en.md` | [`sources/monitoring/sources.md`](sources/monitoring/sources.md) |
-| `CICD.md` / `CICD.en.md` | [`sources/cicd/sources.md`](sources/cicd/sources.md) |
-| `DR.md` / `DR.en.md` | [`CLOUD.md`](CLOUD.md), [`DATACENTERS.md`](DATACENTERS.md), [`MONITORING.md`](MONITORING.md), [`CICD.md`](CICD.md), [`STORAGE.md`](STORAGE.md), [`sources/infrastructure/sources.md`](sources/infrastructure/sources.md) |
-| `PROVISIONING.md` / `PROVISIONING.en.md` | [`CICD.md`](CICD.md), [`sources/infrastructure/sources.md`](sources/infrastructure/sources.md) |
-| `STORAGE.md` / `STORAGE.en.md` | [`sources/infrastructure/sources.md`](sources/infrastructure/sources.md) |
-| `GPU.md` / `GPU.en.md` | [`sources/infrastructure/sources.md`](sources/infrastructure/sources.md) |
-| `SERVER-HW.md` / `SERVER-HW.en.md` | [`CONNECTIVITY.md`](CONNECTIVITY.md), [`sources/infrastructure/sources.md`](sources/infrastructure/sources.md) |
-| `SERVER-CONFIG.md` / `SERVER-CONFIG.en.md` | [`sources/infrastructure/sources.md`](sources/infrastructure/sources.md) |
-| `CONNECTIVITY.md` / `CONNECTIVITY.en.md` | [`sources/infrastructure/sources.md`](sources/infrastructure/sources.md) |
-| `HYPERVISORS.md` / `HYPERVISORS.en.md` | [`STORAGE.md`](STORAGE.md), [`sources/infrastructure/sources.md`](sources/infrastructure/sources.md) |
-| `DATABASES.md` / `DATABASES.en.md` | [`POSTGRESQL.md`](POSTGRESQL.md), [`MYSQL.md`](MYSQL.md), [`ORACLE.md`](ORACLE.md), [`MONGODB.md`](MONGODB.md), [`REDIS.md`](REDIS.md), [`CASSANDRA.md`](CASSANDRA.md), [`VEKTOROVE-DB.md`](VEKTOROVE-DB.md), [`DATABAZOVE-ENGINY.md`](DATABAZOVE-ENGINY.md), [`sources/databases/sources.md`](sources/databases/sources.md) |
-| `HARDWARE.md` / `HARDWARE.en.md` | [`SERVER-HW.md`](SERVER-HW.md), [`GPU.md`](GPU.md), [`SERVER-CONFIG.md`](SERVER-CONFIG.md), [`PROVISIONING.md`](PROVISIONING.md) |
-| `INFRASTRUCTURE.md` / `INFRASTRUCTURE.en.md` | [`HYPERVISORS.md`](HYPERVISORS.md), [`DATACENTERS.md`](DATACENTERS.md), [`STORAGE.md`](STORAGE.md), [`HARDWARE.md`](HARDWARE.md) |
+| `CLOUD.md` / `CLOUD.en.md` | [`GPU.en.md`](GPU.en.md), [`NETWORKING.en.md`](NETWORKING.en.md), [`sources/cloud/sources.en.md`](sources/cloud/sources.en.md) |
+| `NETWORKING.md` / `NETWORKING.en.md` | [`CLOUD.en.md`](CLOUD.en.md), [`sources/networking/sources.en.md`](sources/networking/sources.en.md) |
+| `DATACENTERS.md` / `DATACENTERS.en.md` | [`MONITORING.en.md`](MONITORING.en.md), [`sources/infrastructure/sources.en.md`](sources/infrastructure/sources.en.md) |
+| `MONITORING.md` / `MONITORING.en.md` | [`sources/monitoring/sources.en.md`](sources/monitoring/sources.en.md) |
+| `CICD.md` / `CICD.en.md` | [`sources/cicd/sources.en.md`](sources/cicd/sources.en.md) |
+| `DR.md` / `DR.en.md` | [`CLOUD.en.md`](CLOUD.en.md), [`DATACENTERS.en.md`](DATACENTERS.en.md), [`MONITORING.en.md`](MONITORING.en.md), [`CICD.en.md`](CICD.en.md), [`STORAGE.en.md`](STORAGE.en.md), [`sources/infrastructure/sources.en.md`](sources/infrastructure/sources.en.md) |
+| `MESSAGING.md` / `MESSAGING.en.md` | [`DATACENTERS.en.md`](DATACENTERS.en.md), [`CLOUD.en.md`](CLOUD.en.md), [`sources/infrastructure/sources.en.md`](sources/infrastructure/sources.en.md) |
+| `DC-MIGRATION.md` / `DC-MIGRATION.en.md` | [`DATACENTERS.en.md`](DATACENTERS.en.md), [`CLOUD.en.md`](CLOUD.en.md), [`DR.en.md`](DR.en.md), [`NETWORKING.en.md`](NETWORKING.en.md), [`STORAGE.en.md`](STORAGE.en.md), [`sources/infrastructure/sources.en.md`](sources/infrastructure/sources.en.md) |
+| `AI-INFRASTRUCTURE.md` / `AI-INFRASTRUCTURE.en.md` | [`GPU.en.md`](GPU.en.md), [`NETWORKING.en.md`](NETWORKING.en.md), [`STORAGE.en.md`](STORAGE.en.md), [`DATACENTERS.en.md`](DATACENTERS.en.md), [`CLOUD.en.md`](CLOUD.en.md), [`sources/infrastructure/sources.en.md`](sources/infrastructure/sources.en.md) |
+| `PROVISIONING.md` / `PROVISIONING.en.md` | [`CICD.en.md`](CICD.en.md), [`sources/infrastructure/sources.en.md`](sources/infrastructure/sources.en.md) |
+| `STORAGE.md` / `STORAGE.en.md` | [`sources/infrastructure/sources.en.md`](sources/infrastructure/sources.en.md) |
+| `GPU.md` / `GPU.en.md` | [`sources/infrastructure/sources.en.md`](sources/infrastructure/sources.en.md) |
+| `SERVER-HW.md` / `SERVER-HW.en.md` | [`CONNECTIVITY.en.md`](CONNECTIVITY.en.md), [`sources/infrastructure/sources.en.md`](sources/infrastructure/sources.en.md) |
+| `SERVER-CONFIG.md` / `SERVER-CONFIG.en.md` | [`sources/infrastructure/sources.en.md`](sources/infrastructure/sources.en.md) |
+| `CONNECTIVITY.md` / `CONNECTIVITY.en.md` | [`sources/infrastructure/sources.en.md`](sources/infrastructure/sources.en.md) |
+| `HYPERVISORS.md` / `HYPERVISORS.en.md` | [`STORAGE.en.md`](STORAGE.en.md), [`sources/infrastructure/sources.en.md`](sources/infrastructure/sources.en.md) |
+| `DATABASES.md` / `DATABASES.en.md` | [`POSTGRESQL.en.md`](POSTGRESQL.en.md), [`MYSQL.en.md`](MYSQL.en.md), [`ORACLE.en.md`](ORACLE.en.md), [`MONGODB.en.md`](MONGODB.en.md), [`REDIS.en.md`](REDIS.en.md), [`CASSANDRA.en.md`](CASSANDRA.en.md), [`VEKTOROVE-DB.md`](VEKTOROVE-DB.md), [`DATABAZOVE-ENGINY.md`](DATABAZOVE-ENGINY.md), [`sources/databases/sources.en.md`](sources/databases/sources.en.md) |
+| `HARDWARE.md` / `HARDWARE.en.md` | [`SERVER-HW.en.md`](SERVER-HW.en.md), [`GPU.en.md`](GPU.en.md), [`SERVER-CONFIG.en.md`](SERVER-CONFIG.en.md), [`PROVISIONING.en.md`](PROVISIONING.en.md) |
+| `OS.md` / `OS.en.md` | [`AI-INFRASTRUCTURE.en.md`](AI-INFRASTRUCTURE.en.md), [`KUBERNETES.en.md`](KUBERNETES.en.md), [`HYPERVISORS.en.md`](HYPERVISORS.en.md), [`DATACENTERS.en.md`](DATACENTERS.en.md), [`sources/infrastructure/sources.en.md`](sources/infrastructure/sources.en.md) |
+| `KUBERNETES.md` / `KUBERNETES.en.md` | [`CICD.en.md`](CICD.en.md), [`CLOUD.en.md`](CLOUD.en.md), [`NETWORKING.en.md`](NETWORKING.en.md), [`sources/infrastructure/sources.en.md`](sources/infrastructure/sources.en.md) |
+| `BIG-DATA.md` / `BIG-DATA.en.md` | [`DATABASES.en.md`](DATABASES.en.md), [`CLOUD.en.md`](CLOUD.en.md), [`MESSAGING.en.md`](MESSAGING.en.md), [`KUBERNETES.en.md`](KUBERNETES.en.md), [`sources/infrastructure/sources.en.md`](sources/infrastructure/sources.en.md) |
+| `INFRASTRUCTURE.md` / `INFRASTRUCTURE.en.md` | [`HYPERVISORS.en.md`](HYPERVISORS.en.md), [`DATACENTERS.en.md`](DATACENTERS.en.md), [`STORAGE.en.md`](STORAGE.en.md), [`HARDWARE.en.md`](HARDWARE.en.md) |

 ---

@@ -190,4 +208,4 @@ Raw reference data (documentation, books, standards) by area:

 ---

-*This index is automatically maintained by the `kb-index` agent. Last updated: 2026-06-11.*
+*This index is automatically maintained by the `kb-index` agent. Last updated: 2026-06-18.*
--- a/README.md
+++ b/README.md
@@ -35,11 +35,11 @@ Bilingual: Czech (`.md`) and English (`.en.md`).
  │ PCIe,BM) │ │(BIOS,    │ │  AMD)  │ │  Terraform)  │
  └──────────┘ │ NUMA)    │ └────────┘ └──────────────┘
               └──────────┘
-  ┌──────────┐ ┌──────────┐ ┌────────┐
-  │HYPERVISOR│ │ MONITOR  │ │  CICD  │
-  │(VMware,  │ │(Prom,    │ │(GitOps, │
-  │ KVM, ...)│ │ Grafana) │ │  IaC)   │
-  └──────────┘ └──────────┘ └────────┘
+  ┌──────────┐ ┌──────────┐ ┌────────┐ ┌────────────┐
+  │HYPERVISOR│ │ MONITOR  │ │  CICD  │ │   ☸ K8s    │
+  │(VMware,  │ │(Prom,    │ │(GitOps, │ │(CAPI, K3s, │
+  │ KVM, ...)│ │ Grafana) │ │  IaC)   │ │  RKE2...)  │
+  └──────────┘ └──────────┘ └────────┘ └────────────┘
 ```

 ---
@@ -52,8 +52,10 @@ Bilingual: Czech (`.md`) and English (`.en.md`).
 | 🌐 Network architecture | [NETWORKING.md](NETWORKING.md) | DNS, BGP, VPC, Zero Trust, EVPN VXLAN, TLS | CLOUD |
 | 📊 Monitoring & observability | [MONITORING.md](MONITORING.md) | Prometheus, Grafana, OTel, logging, alerting, SLO | — |
 | 🔄 CI/CD & DevOps | [CICD.md](CICD.md) | Pipelines, GitOps, IaC (Terraform), deployment strategies | — |
+| 💻 Operating systems | [OS.md](OS.md) | Linux distributions, Windows Server, lifecycle, EOL, kernel | KUBERNETES, HYPERVISORS, AI-INFRASTRUCTURE |
 | 🔄 Disaster Recovery | [DR.md](DR.md) | RTO, RPO, scenarios, prevention, uptime calculation | CLOUD, DATACENTERS, MONITORING |
 | 🗄️ Database architecture | [DATABASES.md](DATABASES.md) | Classification, sharding, replication, caching | POSTGRESQL, MYSQL, ORACLE, MONGODB, REDIS, CASSANDRA, VEKTOROVE-DB, DATABAZOVE-ENGINY |
+| 🗄️ Big Data | [BIG-DATA.md](BIG-DATA.md) | HDFS, Spark, Flink, Trino, Iceberg, Delta Lake, Lakehouse | DATABASES, CLOUD, MESSAGING, KUBERNETES |
 | 🖥️ Hypervisors | [HYPERVISORS.md](HYPERVISORS.md) | VMware, Hyper-V, KVM, Proxmox, migration | STORAGE, SERVER-HW |
 | 🏭 Data centers | [DATACENTERS.md](DATACENTERS.md) | Tier, power, cooling, layout, DC services, secondary DC topologies | MONITORING, MESSAGING |
 | 💾 Storage | [STORAGE.md](STORAGE.md) | SAN/NAS/object, RAID, SDS, Ceph, OpenStack Cinder/Swift/Manila | — |
@@ -62,8 +64,10 @@ Bilingual: Czech (`.md`) and English (`.en.md`).
 | 🎮 GPU | [GPU.md](GPU.md) | NVIDIA/AMD, NVLink, MIG/vGPU, AI, Cyborg | — |
 | ⚙️ Server config | [SERVER-CONFIG.md](SERVER-CONFIG.md) | BIOS tuning, DB/hypervisor/K8s/storage best practices | — |
 | 📦 Provisioning | [PROVISIONING.md](PROVISIONING.md) | PXE, Redfish, Terraform, Ironic, OpenStack deploy | CICD |
+| ☸ Kubernetes | [KUBERNETES.md](KUBERNETES.md) | K8s architecture, deployment, Cluster API (CAPI) | CICD, CLOUD, NETWORKING |
 | 📨 Messaging & streaming | [MESSAGING.md](MESSAGING.md) | Kafka, RabbitMQ, Pulsar, NATS, managed queue/pubsub | DATACENTERS, CLOUD |
 | 🏗️ DC Migration | [DC-MIGRATION.md](DC-MIGRATION.md) | Strategies, phases, network, DB, rollback | DATACENTERS, CLOUD, DR, NETWORKING, STORAGE |
+| 🧠 AI Infrastructure | [AI-INFRASTRUCTURE.md](AI-INFRASTRUCTURE.md) | GPU, AI networking, storage, cluster, cooling, training/inference | GPU, NETWORKING, STORAGE, DATACENTERS, CLOUD |
 | 📋 Original index | [HARDWARE.md](HARDWARE.md) | Legacy index → SERVER-HW, GPU, SERVER-CONFIG, PROVISIONING | SERVER-HW, GPU, SERVER-CONFIG, PROVISIONING |
 | 📋 Original infrastructure | [INFRASTRUCTURE.md](INFRASTRUCTURE.md) | Legacy index → HYPERVISORS, DATACENTERS, STORAGE, HARDWARE | HYPERVISORS, DATACENTERS, STORAGE, HARDWARE |
 | 📋 Review workflow | [REVIEW.md](REVIEW.md) | Review and content control process | — |
@@ -92,8 +96,10 @@ Bilingual: Czech (`.md`) and English (`.en.md`).
 | 🌐 Network architecture | [NETWORKING.en.md](NETWORKING.en.md) | DNS, BGP, VPC, Zero Trust, EVPN VXLAN, TLS | CLOUD |
 | 📊 Monitoring & observability | [MONITORING.en.md](MONITORING.en.md) | Prometheus, Grafana, OTel, logging, alerting | — |
 | 🔄 CI/CD & DevOps | [CICD.en.md](CICD.en.md) | Pipelines, GitOps, IaC (Terraform), deployment | — |
+| 💻 Operating systems | [OS.en.md](OS.en.md) | Linux distributions, Windows Server, lifecycle, EOL, kernel | KUBERNETES, HYPERVISORS, AI-INFRASTRUCTURE |
 | 🔄 Disaster Recovery | [DR.en.md](DR.en.md) | RTO, RPO, scenarios, prevention, uptime calculation | CLOUD, DATACENTERS, MONITORING |
 | 🗄️ Database architecture | [DATABASES.en.md](DATABASES.en.md) | Classification, sharding, replication, caching | POSTGRESQL, MYSQL, ORACLE, MONGODB, REDIS, CASSANDRA, VECTOR-DBS, DATABASE-ENGINES |
+| 🗄️ Big Data | [BIG-DATA.en.md](BIG-DATA.en.md) | HDFS, Spark, Flink, Trino, Iceberg, Delta Lake, Lakehouse | DATABASES, CLOUD, MESSAGING, KUBERNETES |
 | 🖥️ Hypervisors | [HYPERVISORS.en.md](HYPERVISORS.en.md) | VMware, Hyper-V, KVM, Proxmox, migration | STORAGE, SERVER-HW |
 | 🏭 Data centers | [DATACENTERS.en.md](DATACENTERS.en.md) | Tier, power, cooling, layout, DC services, secondary DC topologies | MONITORING, MESSAGING |
 | 💾 Storage | [STORAGE.en.md](STORAGE.en.md) | SAN/NAS/object, RAID, SDS, Ceph | — |
@@ -102,8 +108,10 @@ Bilingual: Czech (`.md`) and English (`.en.md`).
 | 🎮 GPU | [GPU.en.md](GPU.en.md) | NVIDIA/AMD, NVLink, MIG/vGPU, AI, Cyborg | — |
 | ⚙️ Server config | [SERVER-CONFIG.en.md](SERVER-CONFIG.en.md) | BIOS tuning, DB/hypervisor/K8s/storage best practices | — |
 | 📦 Provisioning | [PROVISIONING.en.md](PROVISIONING.en.md) | PXE, Redfish, Terraform, Ironic, OpenStack deploy | CICD |
+| ☸ Kubernetes | [KUBERNETES.en.md](KUBERNETES.en.md) | K8s architecture, deployment, Cluster API (CAPI) | CICD, CLOUD, NETWORKING |
 | 📨 Messaging & streaming | [MESSAGING.en.md](MESSAGING.en.md) | Kafka, RabbitMQ, Pulsar, NATS, managed queue/pubsub | DATACENTERS, CLOUD |
 | 🏗️ DC Migration | [DC-MIGRATION.en.md](DC-MIGRATION.en.md) | Strategies, phases, network, DB, rollback | DATACENTERS, CLOUD, DR, NETWORKING, STORAGE |
+| 🧠 AI Infrastructure | [AI-INFRASTRUCTURE.en.md](AI-INFRASTRUCTURE.en.md) | GPU, AI networking, storage, cluster, cooling, training/inference | GPU, NETWORKING, STORAGE, DATACENTERS, CLOUD |
 | 📋 Legacy index | [HARDWARE.en.md](HARDWARE.en.md) | → SERVER-HW, GPU, SERVER-CONFIG, PROVISIONING | SERVER-HW, GPU, SERVER-CONFIG, PROVISIONING |
 | 📋 Legacy infra | [INFRASTRUCTURE.en.md](INFRASTRUCTURE.en.md) | → HYPERVISORS, DATACENTERS, STORAGE, HARDWARE | HYPERVISORS, DATACENTERS, STORAGE, HARDWARE |
 | 📋 Review workflow | [REVIEW.en.md](REVIEW.en.md) | Review and content control process | — |
@@ -145,6 +153,8 @@ Bilingual: Czech (`.md`) and English (`.en.md`).
 | `DR.md` / `DR.en.md` | [`CLOUD.md`](CLOUD.md), [`DATACENTERS.md`](DATACENTERS.md), [`MONITORING.md`](MONITORING.md), [`CICD.md`](CICD.md), [`STORAGE.md`](STORAGE.md), [`sources/infrastructure/sources.md`](sources/infrastructure/sources.md) |
 | `MESSAGING.md` / `MESSAGING.en.md` | [`DATACENTERS.md`](DATACENTERS.md), [`CLOUD.md`](CLOUD.md), [`sources/infrastructure/sources.md`](sources/infrastructure/sources.md) |
 | `DC-MIGRATION.md` / `DC-MIGRATION.en.md` | [`DATACENTERS.md`](DATACENTERS.md), [`CLOUD.md`](CLOUD.md), [`DR.md`](DR.md), [`NETWORKING.md`](NETWORKING.md), [`STORAGE.md`](STORAGE.md), [`sources/infrastructure/sources.md`](sources/infrastructure/sources.md) |
+| `OS.md` / `OS.en.md` | [`AI-INFRASTRUCTURE.md`](AI-INFRASTRUCTURE.md), [`KUBERNETES.md`](KUBERNETES.md), [`HYPERVISORS.md`](HYPERVISORS.md), [`DATACENTERS.md`](DATACENTERS.md), [`sources/infrastructure/sources.md`](sources/infrastructure/sources.md) |
+| `AI-INFRASTRUCTURE.md` / `AI-INFRASTRUCTURE.en.md` | [`GPU.md`](GPU.md), [`NETWORKING.md`](NETWORKING.md), [`STORAGE.md`](STORAGE.md), [`DATACENTERS.md`](DATACENTERS.md), [`CLOUD.md`](CLOUD.md), [`sources/infrastructure/sources.md`](sources/infrastructure/sources.md) |
 | `PROVISIONING.md` / `PROVISIONING.en.md` | [`CICD.md`](CICD.md), [`sources/infrastructure/sources.md`](sources/infrastructure/sources.md) |
 | `STORAGE.md` / `STORAGE.en.md` | [`sources/infrastructure/sources.md`](sources/infrastructure/sources.md) |
 | `GPU.md` / `GPU.en.md` | [`sources/infrastructure/sources.md`](sources/infrastructure/sources.md) |
@@ -155,6 +165,8 @@ Bilingual: Czech (`.md`) and English (`.en.md`).
 | `DATABASES.md` / `DATABASES.en.md` | [`POSTGRESQL.md`](POSTGRESQL.md), [`MYSQL.md`](MYSQL.md), [`ORACLE.md`](ORACLE.md), [`MONGODB.md`](MONGODB.md), [`REDIS.md`](REDIS.md), [`CASSANDRA.md`](CASSANDRA.md), [`VEKTOROVE-DB.md`](VEKTOROVE-DB.md), [`DATABAZOVE-ENGINY.md`](DATABAZOVE-ENGINY.md), [`sources/databases/sources.md`](sources/databases/sources.md) |
 | `HARDWARE.md` / `HARDWARE.en.md` | [`SERVER-HW.md`](SERVER-HW.md), [`GPU.md`](GPU.md), [`SERVER-CONFIG.md`](SERVER-CONFIG.md), [`PROVISIONING.md`](PROVISIONING.md) |
 | `INFRASTRUCTURE.md` / `INFRASTRUCTURE.en.md` | [`HYPERVISORS.md`](HYPERVISORS.md), [`DATACENTERS.md`](DATACENTERS.md), [`STORAGE.md`](STORAGE.md), [`HARDWARE.md`](HARDWARE.md) |
+| `KUBERNETES.md` / `KUBERNETES.en.md` | [`CICD.md`](CICD.md), [`CLOUD.md`](CLOUD.md), [`NETWORKING.md`](NETWORKING.md), [`sources/infrastructure/sources.md`](sources/infrastructure/sources.md) |
+| `BIG-DATA.md` / `BIG-DATA.en.md` | [`DATABASES.md`](DATABASES.md), [`CLOUD.md`](CLOUD.md), [`MESSAGING.md`](MESSAGING.md), [`KUBERNETES.md`](KUBERNETES.md), [`sources/infrastructure/sources.md`](sources/infrastructure/sources.md) |

 ---

@@ -196,4 +208,4 @@ Raw reference data (documentation, books, standards) by area:

 ---

-*This index is automatically maintained by the `kb-index` agent. Last updated: 2026-06-12.*
+*This index is automatically maintained by the `kb-index` agent. Last updated: 2026-06-18.*
--- a/REDIS.en.md
+++ b/REDIS.en.md
@@ -114,6 +114,6 @@ Redis underwent a major license change in 2024:

 ## Sources

-References, books, and standards: [sources/databases/sources.md](sources/databases/sources.md)
+References, books, and standards: [sources/databases/sources.en.md](sources/databases/sources.en.md)

 *Last revision: 2026-06-03*
--- a/SERVER-CONFIG.en.md
+++ b/SERVER-CONFIG.en.md
@@ -752,6 +752,6 @@ flowchart TD

 ## Sources

-Links, books and standards: [sources/infrastructure/sources.md](sources/infrastructure/sources.md)
+Links, books and standards: [sources/infrastructure/sources.en.md](sources/infrastructure/sources.en.md)

 *Last revision: 2026-06-03*
--- a/SERVER-HW.en.md
+++ b/SERVER-HW.en.md
@@ -230,6 +230,10 @@ Conclusion: 8 DIMMs per CPU (1DPC) = highest performance
 | AI training (CPU preprocessing) | 2-4 GB/core | 128-512 GB | 8× 32-64 GB RDIMM, 1DPC |
 | HPC | 1-2 GB/core | 64-128 GB | 8× 16 GB RDIMM, 1DPC, high-speed |
 | In-memory DB (SAP HANA) | 8-32 GB/core | 1-6 TB+ | 16× 128-256 GB LRDIMM/3DS |
+| Big Data — Spark worker | 4-8 GB/core | 128-512 GB | 8-16× 32-64 GB RDIMM, 1DPC, NVMe scratch |
+| Big Data — Flink worker | 8-16 GB/core (incl. managed state) | 128-512 GB | 8-16× 32-64 GB RDIMM, 1DPC, RocksDB on NVMe |
+| Big Data — Trino worker | 4-8 GB/core | 64-256 GB | 8× 16-32 GB RDIMM, 1DPC |
+| Big Data — HDFS DataNode | 1-2 GB/core (metadata cache) | 64-256 GB | 8× 16-32 GB RDIMM, 1DPC, max storage density |

 ## PCIe

@@ -324,7 +328,7 @@ Socket 0 (NUMA node 0)              Socket 1 (NUMA node 1)

 ## Server connectivity

-Detailed chapter on network and storage connectivity: [CONNECTIVITY.md](CONNECTIVITY.md)
+Detailed chapter on network and storage connectivity: [CONNECTIVITY.en.md](CONNECTIVITY.en.md)

 ## Storage controllers

@@ -346,8 +350,51 @@ Detailed chapter on network and storage connectivity: [CONNECTIVITY.md](CONNECTI
 | **Use case** | SDS (Ceph, MinIO), ZFS | VMware VMFS, Windows, legacy |
 | **Battery/Backup** | Not needed | Write-back cache requires BBU |

+## Pricing (2026)
+
+### CPU pricing (2026)
+| CPU | Cores | TDP | 1ku price | $/core |
+|-----|-------|-----|----------|--------|
+| AMD EPYC 9965 (Turin) | 192 | 500 W | ~$11,988 | $62 |
+| AMD EPYC 9655 (Turin) | 96 | 400 W | ~$6,500 | $68 |
+| AMD EPYC 9475F (Turin) | 48 | 360 W | ~$5,000 | $104 |
+| Intel Xeon 6980P (Granite Rapids) | 128 | 500 W | ~$12,460 | $97 |
+| Intel Xeon 6980P (Granite Rapids-AP) | 128 | 500 W | $13,955 | $109 |
+| Intel Xeon 6767P (Granite Rapids) | 64 | 350 W | ~$7,000 | $109 |
+
+Sources: AMD 1ku pricing, Intel RCP, Newegg verified.
+
+### DDR5 RDIMM pricing (2026 — AI-driven price surge)
+| Capacity | Speed | Price 2025 | Price Q2 2026 | Change |
+|----------|---------|-----------|-------------|-------|
+| 32 GB (2R×8) | DDR5-5600 | ~$95 | ~$400–550 | +400–500 % |
+| 64 GB (2R×4) | DDR5-4800 | ~$180 | ~$700–900 | +400 % |
+| 96 GB (2R×4) | DDR5-6400 | ~$300 | ~$1,200–1,600 | +400 % |
+| 128 GB (2R×4) | DDR5-6400 | ~$450 | ~$1,800–2,500 | +450 % |
+| 256 GB (LRDIMM) | DDR5-6400 | ~$900 | ~$4,000–5,000 | +450 % |
+
+Trend: DDR5 prices have risen ~400–500 % since mid-2025 due to AI-driven demand. Further increases are expected in H2 2026. Source: Counterpoint, TrendForce.
+
+### NVMe SSD pricing (enterprise, 2026)
+| Capacity | Type | Price 2024 | Price Q2 2026 | Change |
+|----------|-----|-----------|-------------|-------|
+| 1.92 TB | NVMe U.3 (read-intensive) | ~$200 | ~$500–600 | +150 % |
+| 3.84 TB | NVMe U.3 (mixed-use) | ~$400 | ~$1,000–1,200 | +150 % |
+| 7.68 TB | NVMe U.3 (mixed-use) | ~$800 | ~$2,000–2,500 | +150 % |
+| 15.36 TB | NVMe U.3 (mixed-use) | ~$1,500 | ~$4,000–5,000 | +170 % |
+
+Trend: NAND flash prices have risen ~100–200 % since 2025; the average enterprise SSD now costs 2–3× more. Source: TrendForce, Xinnor.
+
+### Total server cost (example configurations)
+| Configuration | CPU | RAM | Storage | Estimated Price |
+|-------------|-----|-----|------|-----------|
+| DB server (OLTP) | 2× EPYC 9655 (96C) | 1 TB DDR5 | 6× 1.92 TB NVMe | ~$45,000–60,000 |
+| GPU server (AI) | 2× Xeon 6980P | 2 TB DDR5 | 4× 3.84 TB NVMe | ~$80,000–120,000 (w/o GPU) |
+| Hypervisor host | 2× EPYC 9475F (48C) | 512 GB DDR5 | 2× 1.92 TB NVMe + 4× 16 TB HDD | ~$25,000–35,000 |
+| Storage server (Ceph) | 1× EPYC 9655 (96C) | 256 GB DDR5 | 24× 15.36 TB NVMe | ~$60,000–80,000 |
+
 ## Sources

-Links, books and standards: [sources/infrastructure/sources.md](sources/infrastructure/sources.md)
+Links, books and standards: [sources/infrastructure/sources.en.md](sources/infrastructure/sources.en.md)

 *Last revision: 2026-06-03*
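
Review note on the pricing tables added above: the `$/core` and `Change` columns are derived values, so they can be re-checked from the other columns. The snippet below is a minimal sketch of that check, assuming `$/core` is the list price divided by the core count (rounded to the nearest dollar) and `Change` is the relative increase over the earlier price. The helper names `dollars_per_core` and `pct_change` are hypothetical and not part of the repository; the prices are copied from the tables in this diff.

```python
# Hypothetical helpers for sanity-checking the derived columns of the
# pricing tables; the input numbers are copied from the diff above.

def dollars_per_core(price_usd: float, cores: int) -> int:
    """List price divided by core count, rounded to the nearest dollar."""
    return round(price_usd / cores)


def pct_change(old: float, new: float) -> float:
    """Relative change from old to new in percent (+300 means 4x the old price)."""
    return (new - old) / old * 100


# (price in USD, core count) per CPU row
cpus = {
    "AMD EPYC 9965": (11_988, 192),
    "AMD EPYC 9655": (6_500, 96),
    "AMD EPYC 9475F": (5_000, 48),
    "Intel Xeon 6767P": (7_000, 64),
}

for name, (price, cores) in cpus.items():
    print(f"{name}: ${dollars_per_core(price, cores)}/core")

# 32 GB DDR5-5600 RDIMM row: ~$95 in 2025, ~$400-550 in Q2 2026
low, high = pct_change(95, 400), pct_change(95, 550)
print(f"32 GB RDIMM: +{low:.0f} % to +{high:.0f} %")
```

Under these assumptions the `$/core` values reproduce the table. Note that a price that quadruples corresponds to +300 %, not +400 %, so the `Change` column is probably best read as a rounded estimate rather than an exact figure.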
--- a/SERVER-HW.md
+++ b/SERVER-HW.md
@@ -230,6 +230,10 @@ Conclusion: 8 DIMMs per CPU (1DPC) = highest performance
 | AI training (CPU preprocessing) | 2-4 GB/core | 128-512 GB | 8× 32-64 GB RDIMM, 1DPC |
 | HPC | 1-2 GB/core | 64-128 GB | 8× 16 GB RDIMM, 1DPC, high-speed |
 | In-memory DB (SAP HANA) | 8-32 GB/core | 1-6 TB+ | 16× 128-256 GB LRDIMM/3DS |
+| Big Data — Spark worker | 4-8 GB/core | 128-512 GB | 8-16× 32-64 GB RDIMM, 1DPC, NVMe scratch |
+| Big Data — Flink worker | 8-16 GB/core (incl. managed state) | 128-512 GB | 8-16× 32-64 GB RDIMM, 1DPC, RocksDB on NVMe |
+| Big Data — Trino worker | 4-8 GB/core | 64-256 GB | 8× 16-32 GB RDIMM, 1DPC |
+| Big Data — HDFS DataNode | 1-2 GB/core (metadata cache) | 64-256 GB | 8× 16-32 GB RDIMM, 1DPC, max storage density |

 ## PCIe

@@ -346,6 +350,49 @@ Detailed chapter on network and storage connectivity: [CONNECTIVITY.md](CONNECT
 | **Use case** | SDS (Ceph, MinIO), ZFS | VMware VMFS, Windows, legacy |
 | **Battery/Backup** | Not needed | Write-back cache requires BBU |

+## Pricing (2026)
+
+### CPU pricing (2026)
+| CPU | Cores | TDP | 1ku price | $/core |
+|-----|-------|-----|----------|--------|
+| AMD EPYC 9965 (Turin) | 192 | 500 W | ~$11 988 | $62 |
+| AMD EPYC 9655 (Turin) | 96 | 400 W | ~$6 500 | $68 |
+| AMD EPYC 9475F (Turin) | 48 | 360 W | ~$5 000 | $104 |
+| Intel Xeon 6980P (Granite Rapids) | 128 | 500 W | ~$12 460 | $97 |
+| Intel Xeon 6980P (Granite Rapids-AP) | 128 | 500 W | $13 955 | $109 |
+| Intel Xeon 6767P (Granite Rapids) | 64 | 350 W | ~$7 000 | $109 |
+
+Sources: AMD 1ku pricing, Intel RCP, Newegg verified.
+
+### DDR5 RDIMM pricing (2026 — AI-driven price surge)
+| Capacity | Speed | Price 2025 | Price Q2 2026 | Change |
+|----------|---------|-----------|-------------|-------|
+| 32 GB (2R×8) | DDR5-5600 | ~$95 | ~$400–550 | +400–500 % |
+| 64 GB (2R×4) | DDR5-4800 | ~$180 | ~$700–900 | +400 % |
+| 96 GB (2R×4) | DDR5-6400 | ~$300 | ~$1 200–1 600 | +400 % |
+| 128 GB (2R×4) | DDR5-6400 | ~$450 | ~$1 800–2 500 | +450 % |
+| 256 GB (LRDIMM) | DDR5-6400 | ~$900 | ~$4 000–5 000 | +450 % |
+
+Trend: DDR5 ceny vzrostly ~400–500 % od mid-2025 kvůli AI-driven poptávce. Očekává se další růst v H2 2026. Zdroj: Counterpoint, TrendForce.
+
+### NVMe SSD prices (enterprise, 2026)
+| Capacity | Type | Price 2024 | Price Q2 2026 | Change |
+|----------|------|------------|---------------|--------|
+| 1.92 TB | NVMe U.3 (read-intensive) | ~$200 | ~$500–600 | +150% |
+| 3.84 TB | NVMe U.3 (mixed-use) | ~$400 | ~$1,000–1,200 | +150% |
+| 7.68 TB | NVMe U.3 (mixed-use) | ~$800 | ~$2,000–2,500 | +150% |
+| 15.36 TB | NVMe U.3 (mixed-use) | ~$1,500 | ~$4,000–5,000 | +170% |
+
+Trend: NAND flash prices have risen ~100–200% since 2025; the average enterprise SSD costs 2–3× more. Source: TrendForce, Xinnor.
+
+### Total server price (example configurations)
+| Configuration | CPU | RAM | Storage | Estimated price |
+|---------------|-----|-----|---------|-----------------|
+| DB server (OLTP) | 2× EPYC 9655 (96C) | 1 TB DDR5 | 6× 1.92 TB NVMe | ~$45,000–60,000 |
+| GPU server (AI) | 2× Xeon 6980P | 2 TB DDR5 | 4× 3.84 TB NVMe | ~$80,000–120,000 (excluding GPUs) |
+| Hypervisor host | 2× EPYC 9475F (48C) | 512 GB DDR5 | 2× 1.92 TB NVMe + 4× 16 TB HDD | ~$25,000–35,000 |
+| Storage server (Ceph) | 1× EPYC 9655 (96C) | 256 GB DDR5 | 24× 15.36 TB NVMe | ~$60,000–80,000 |
+
 ## Sources

 Links, books and standards: [sources/infrastructure/sources.md](sources/infrastructure/sources.md)
--- a/STORAGE.en.md
+++ b/STORAGE.en.md
@@ -270,9 +270,60 @@ OpenStack offers three main storage services:

 Ceph is the most common storage backend for OpenStack: Cinder (RBD), Swift (RGW), Manila (CephFS), Glance (RBD images).

+## Big Data storage
+
+### HDFS cluster
+
+HDFS is the primary storage for the Hadoop ecosystem (on-prem). Typical configuration:
+
+| Parameter | Value | Note |
+|-----------|-------|------|
+| **Disks per DataNode** | 8–24 × HDD (14–22 TB) + 2× NVMe (metadata, cache) | Balances capacity and performance |
+| **Replication factor** | 3× | Rack-aware |
+| **Network** | 2× 25/100 GbE (data) + 1× 1 GbE (management) | Data + replication traffic |
+| **RAM** | 64–256 GB (OS cache + metadata) | HDFS cache + OS buffer cache |
+| **CPU** | 16–32 cores | HDFS overhead is low |
+| **NameNode HA** | Active + Standby + JN (JournalNode) | Quorum-based HA |
+| **Use case** | Sequential read/write, large files, Spark on YARN | |
+
+**Model cluster (about 0.7 PB usable):**
+
+- 10× DataNode (12× 18 TB HDD, 2× 1.9 TB NVMe)
+- 2× NameNode (HA, 256 GB RAM)
+- 3× JournalNode (small VMs)
+- Replication 3× → raw ≈ 2.2 PB (10 × 12 × 18 TB), usable ≈ 0.72 PB; a full 1 PB usable needs about 14 such DataNodes
+- Network: 25 GbE for data, 100 GbE for shuffle-heavy Spark
+
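The capacity arithmetic for a replicated HDFS cluster can be sketched as follows. This is a rough estimate only: it assumes decimal terabytes and ignores filesystem overhead, reserved non-DFS space, and free-space headroom. The function names are illustrative.

```python
import math


def raw_capacity_tb(nodes: int, disks_per_node: int, disk_tb: float) -> float:
    """Total raw HDD capacity of the cluster in TB."""
    return nodes * disks_per_node * disk_tb


def usable_capacity_tb(raw_tb: float, replication: int = 3) -> float:
    """Capacity left for unique data once every block is stored `replication` times."""
    return raw_tb / replication


def nodes_for_usable(target_usable_tb: float, disks_per_node: int,
                     disk_tb: float, replication: int = 3) -> int:
    """Smallest DataNode count that reaches the target usable capacity."""
    usable_per_node = disks_per_node * disk_tb / replication
    return math.ceil(target_usable_tb / usable_per_node)


if __name__ == "__main__":
    raw = raw_capacity_tb(nodes=10, disks_per_node=12, disk_tb=18)
    print(f"raw: {raw} TB, usable at 3x: {usable_capacity_tb(raw)} TB")
    print("DataNodes for 1 PB usable:", nodes_for_usable(1000, 12, 18))
```

In practice some free space is usually kept in reserve for rebalancing and re-replication after a disk or node failure, so real sizing would add headroom on top of these numbers.
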
+### Object storage as Data Lake (S3/GCS/MinIO)
+
+For new projects (Spark on K8s, Iceberg/Delta, lakehouse), object storage is preferred over HDFS:
+
+| Platform | Advantages | Limits |
+|----------|-----------|--------|
+| **MinIO** (on-prem) | S3 API, erasure coding, NVMe direct, high throughput | Single tenant (per cluster) |
+| **Pure //C** (on-prem) | QLC NVMe, dedupe, S3 + NFS | Higher $/TB |
+| **AWS S3** (cloud) | Unlimited capacity, Iceberg/Delta support | Egress fees |
+| **Azure ADLS** (cloud) | Hierarchical namespace (HNS), POSIX-like ACLs | Vendor lock-in |
+| **GCP GCS** (cloud) | Uniform + fine-grained ACLs, object versioning | Region restrictions |
+
+### Comparison: HDFS vs Object Storage for Big Data
+
+| Criteria | HDFS | Object Storage (S3/MinIO) |
+|----------|------|-------------------------|
+| **Architecture** | Master/worker (NameNode is a SPOF unless HA is configured) | Distributed, no SPOF (erasure coding) |
+| **Consistency** | Strong (single writer per file) | Strong read-after-write (AWS S3 since December 2020, MinIO) |
+| **Throughput** | High (rack-aware, locality) | High (network-bound) |
+| **Scaling** | Horizontal (DataNode) | Horizontal (stateless) |
+| **Cost** | Low (HDD) | Medium (S3 API) |
+| **Metadata** | NameNode (1M blocks ~ 1 GB RAM) | Object-level (flat namespace) |
+| **Spark integration** | Native (locality-optimized) | S3A connector (Hadoop-compatible file system) |
+| **2026 trend** | Legacy, declining | Standard for new projects |
+
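The metadata rule of thumb in the table (about 1 GB of NameNode RAM per 1 million blocks) can be turned into a quick estimate. The sketch below assumes the HDFS default block size of 128 MiB and files that fill whole blocks; many small files raise the block count well above this estimate. The function names are illustrative.

```python
import math


def estimate_block_count(data_tb: float, block_size_mib: int = 128) -> int:
    """Estimate the number of HDFS blocks for `data_tb` decimal TB of unique data.

    Assumes every block is full; replicas do not add blocks, only locations.
    """
    data_bytes = data_tb * 10**12
    block_bytes = block_size_mib * 2**20
    return math.ceil(data_bytes / block_bytes)


def estimate_namenode_heap_gb(block_count: int,
                              gb_per_million_blocks: float = 1.0) -> float:
    """Apply the rule of thumb: roughly 1 GB of RAM per 1 million blocks."""
    return block_count / 1_000_000 * gb_per_million_blocks


if __name__ == "__main__":
    blocks = estimate_block_count(720)  # e.g. 720 TB of unique data
    print(f"{blocks} blocks -> ~{estimate_namenode_heap_gb(blocks):.1f} GB heap")
```
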
+For more information about Big Data see [BIG-DATA.en.md](BIG-DATA.en.md).
+
 ## Sources

-Links, books and standards: [sources/infrastructure/sources.md](sources/infrastructure/sources.md)
+Links, books and standards: [sources/infrastructure/sources.en.md](sources/infrastructure/sources.en.md)

 ### Recommended reading

--- a/STORAGE.md
+++ b/STORAGE.md
@@ -270,6 +270,57 @@ OpenStack offers three main storage services:

 Ceph is the most common storage backend for OpenStack: Cinder (RBD), Swift (RGW), Manila (CephFS), Glance (RBD images).

+## Big Data storage
+
+### HDFS cluster
+
+HDFS is the primary storage for the Hadoop ecosystem (on-prem). Typical configuration:
+
+| Parameter | Value | Note |
+|-----------|-------|------|
+| **Disks per DataNode** | 8–24 × HDD (14–22 TB) + 2× NVMe (metadata, cache) | Balances capacity and performance |
+| **Replication factor** | 3× | Rack-aware |
+| **Network** | 2× 25/100 GbE (data) + 1× 1 GbE (management) | Data + replication traffic |
+| **RAM** | 64–256 GB (OS cache + metadata) | HDFS cache + OS buffer cache |
+| **CPU** | 16–32 cores | HDFS overhead is low |
+| **NameNode HA** | Active + Standby + JN (JournalNode) | Quorum-based HA |
+| **Use case** | Sequential read/write, large files, Spark on YARN | |
+
+**Model cluster (about 0.7 PB usable):**
+
+- 10× DataNode (12× 18 TB HDD, 2× 1.9 TB NVMe)
+- 2× NameNode (HA, 256 GB RAM)
+- 3× JournalNode (small VMs)
+- Replication 3× → raw ≈ 2.2 PB (10 × 12 × 18 TB), usable ≈ 0.72 PB; a full 1 PB usable needs about 14 such DataNodes
+- Network: 25 GbE for data, 100 GbE for shuffle-heavy Spark
+
+### Object storage as Data Lake (S3/GCS/MinIO)
+
+For new projects (Spark on K8s, Iceberg/Delta, lakehouse), object storage is preferred over HDFS:
+
+| Platform | Advantages | Limits |
+|----------|-----------|--------|
+| **MinIO** (on-prem) | S3 API, erasure coding, NVMe direct, high throughput | Single tenant (per cluster) |
+| **Pure //C** (on-prem) | QLC NVMe, dedupe, S3 + NFS | Higher $/TB |
+| **AWS S3** (cloud) | Unlimited capacity, Iceberg/Delta support | Egress fees |
+| **Azure ADLS** (cloud) | Hierarchical namespace (HNS), POSIX-like ACLs | Vendor lock-in |
+| **GCP GCS** (cloud) | Uniform + fine-grained ACLs, object versioning | Region restrictions |
+
+### Comparison: HDFS vs Object Storage for Big Data
+
+| Criteria | HDFS | Object Storage (S3/MinIO) |
+|----------|------|-------------------------|
+| **Architecture** | Master/worker (NameNode is a SPOF unless HA is configured) | Distributed, no SPOF (erasure coding) |
+| **Consistency** | Strong (single writer per file) | Strong read-after-write (AWS S3 since December 2020, MinIO) |
+| **Throughput** | High (rack-aware, locality) | High (network-bound) |
+| **Scaling** | Horizontal (DataNode) | Horizontal (stateless) |
+| **Cost** | Low (HDD) | Medium (S3 API) |
+| **Metadata** | NameNode (1M blocks ~ 1 GB RAM) | Object-level (flat namespace) |
+| **Spark integration** | Native (locality-optimized) | S3A connector (Hadoop-compatible file system) |
+| **2026 trend** | Legacy, declining | Standard for new projects |
+
+For more information about Big Data see [BIG-DATA.md](BIG-DATA.md).
+
 ## Sources

 Links, books and standards: [sources/infrastructure/sources.md](sources/infrastructure/sources.md)
--- a/VECTOR-DBS.en.md
+++ b/VECTOR-DBS.en.md
@@ -94,7 +94,7 @@ Variants:

 ## Sources

-References, books, and standards: [sources/databases/sources.md](sources/databases/sources.md)
+References, books, and standards: [sources/databases/sources.en.md](sources/databases/sources.en.md)

 ### Recommended reading

--- a/sources/infrastructure/sources.en.md
+++ b/sources/infrastructure/sources.en.md
@@ -1,10 +1,10 @@
 # Infrastructure — Sources

 Split into separate files:
- [HYPERVISORS.md](../../HYPERVISORS.md) — hypervisors and virtualization
- [DATACENTERS.md](../../DATACENTERS.md) — data centers
- [STORAGE.md](../../STORAGE.md) — storage
- [HARDWARE.md](../../HARDWARE.md) — hardware and servers
+- [HYPERVISORS.en.md](../../HYPERVISORS.en.md) — hypervisors and virtualization
+- [DATACENTERS.en.md](../../DATACENTERS.en.md) — data centers
+- [STORAGE.en.md](../../STORAGE.en.md) — storage
+- [HARDWARE.en.md](../../HARDWARE.en.md) — hardware and servers

 ## Official documentation

@@ -112,7 +112,65 @@ Split into separate files:
 | Complete guide to modern vSphere alternatives — Spectro Cloud | https://www.spectrocloud.com/blog/vsphere-alternatives | `[done]` |
 | Broadcom VMware Acquisition: What's Next — Sayers | https://www.sayers.com/blog/after-the-deal-whats-next-for-vmware-customers | `[done]` |
 | Stanford University migration from VMware to Proxmox | https://itcommunity.stanford.edu/news/enterprise-technology-completes-successful-virtual-infrastructure-migration-vmware-proxmox | `[done]` |
-
+| | **Sangfor** | |
+| Sangfor HCI — product page | https://www.sangfor.com/cloud-and-infrastructure/products/hci-hyper-converged-infrastructure | `[done]` |
+| Sangfor aSV — hypervisor | https://www.sangfor.com/cloud-and-infrastructure/products/asv-hypervisor-server-virtualization | `[done]` |
+| Sangfor vs VMware — feature comparison | https://www.sangfor.com/blog/cloud-and-infrastructure/sangfor-hci-vs-vmware-feature-comparison | `[done]` |
+| | **AI infrastructure** | |
+| NVIDIA DGX — documentation | https://www.nvidia.com/en-us/data-center/dgx-platform/ | `[done]` |
+| InfiniBand — Mellanox/NVIDIA | https://www.nvidia.com/en-us/networking/products/infiniband/ | `[done]` |
+| Lustre parallel filesystem | https://www.lustre.org/ | `[done]` |
+| WekaFS — AI storage | https://www.weka.io/ | `[done]` |
+| vLLM — inference server | https://github.com/vllm-project/vllm | `[done]` |
+| Megatron-LM — distributed training | https://github.com/NVIDIA/Megatron-LM | `[done]` |
+| | **Kubernetes / Cluster API** | |
+| Cluster API (CAPI) — official documentation (The CAPI Book) | https://cluster-api.sigs.k8s.io/ | `[done]` |
+| Cluster API — GitHub (kubernetes-sigs/cluster-api) | https://github.com/kubernetes-sigs/cluster-api | `[done]` |
+| Cluster API — provider list | https://cluster-api.sigs.k8s.io/reference/providers.html | `[done]` |
+| Kubernetes — official documentation | https://kubernetes.io/docs/ | `[done]` |
+| K3s — lightweight Kubernetes | https://k3s.io/ | `[done]` |
+| RKE2 — Rancher Kubernetes Engine 2 | https://docs.rke2.io/ | `[done]` |
+| Talos — API-driven Kubernetes OS | https://www.talos.dev/ | `[done]` |
+| Kamaji — hosted control plane provider | https://kamaji.clastix.io/ | `[done]` |
+| Metal3 — bare metal provider for CAPI | https://metal3.io/ | `[done]` |
+| Cluster API — ClusterClass and topologies | https://kubernetes.io/blog/2021/10/08/capi-clusterclass-and-managed-topologies/ | `[done]` |
+| | **Big Data** | |
+| Apache Spark — official documentation | https://spark.apache.org/docs/latest/ | `[done]` |
+| Apache Flink — official documentation | https://flink.apache.org/ | `[done]` |
+| Trino — distributed SQL engine | https://trino.io/docs/current/ | `[done]` |
+| Apache Iceberg — table format | https://iceberg.apache.org/ | `[done]` |
+| Delta Lake — documentation | https://docs.delta.io/ | `[done]` |
+| Apache Hudi | https://hudi.apache.org/ | `[done]` |
+| Apache Paimon | https://paimon.apache.org/ | `[done]` |
+| Apache Hadoop — documentation | https://hadoop.apache.org/docs/stable/ | `[done]` |
+| Apache Airflow — documentation | https://airflow.apache.org/docs/ | `[done]` |
+| Dagster — documentation | https://docs.dagster.io/ | `[done]` |
+| Prefect — documentation | https://docs.prefect.io/ | `[done]` |
+| HDFS architecture (Apache) | https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html | `[done]` |
+| | **Operating Systems** | |
+| Ubuntu lifecycle — Ubuntu Pro + ESM | https://ubuntu.com/about/release-cycle | `[done]` |
+| RHEL lifecycle — Red Hat Enterprise Linux | https://access.redhat.com/support/policy/updates/errata | `[done]` |
+| Rocky Linux lifecycle | https://rockylinux.org/download/ | `[done]` |
+| AlmaLinux lifecycle | https://almalinux.org/ | `[done]` |
+| Debian releases / LTS | https://wiki.debian.org/LTS | `[done]` |
+| SLES lifecycle — SUSE | https://www.suse.com/lifecycle/ | `[done]` |
+| Alpine Linux releases | https://alpinelinux.org/releases/ | `[done]` |
+| Fedora lifecycle | https://docs.fedoraproject.org/en-US/releases/lifecycle/ | `[done]` |
+| SELinux — Red Hat docs | https://www.redhat.com/en/topics/linux/what-is-selinux | `[done]` |
+| AppArmor — Ubuntu wiki | https://wiki.ubuntu.com/AppArmor | `[done]` |
+| | **Windows** | |
+| Windows Server 2022 lifecycle | https://learn.microsoft.com/en-us/lifecycle/products/windows-server-2022/ | `[done]` |
+| Windows Server 2025 lifecycle | https://learn.microsoft.com/en-us/lifecycle/products/windows-server-2025/ | `[done]` |
+| Windows 11 lifecycle | https://learn.microsoft.com/en-us/lifecycle/products/windows-11-enterprise/ | `[done]` |
+| Windows 10 EOL | https://learn.microsoft.com/en-us/lifecycle/products/windows-10-enterprise/ | `[done]` |
+| Windows Server licensing (per core) | https://learn.microsoft.com/en-us/windows-server/get-started/editions-and-support | `[done]` |
+| | **GPU pricing** | |
+| NVIDIA AI GPU pricing guide (2026) | https://intuitionlabs.ai/articles/nvidia-ai-gpu-pricing-guide | `[done]` |
+| GPU cloud pricing comparison (2026) | https://www.spheron.network/blog/gpu-cloud-pricing-comparison-2026/ | `[done]` |
+| GPU pricing trends 2026 — CompuX | https://compux.net/docs/guides/gpu-pricing-trends-2026 | `[done]` |
+| AMD MI300X pricing (2026) | https://www.thundercompute.com/blog/amd-mi300x-pricing | `[done]` |
+| GPU price/performance frontier — Silicon Analysts | https://siliconanalysts.com/tools/frontier | `[done]` |
+ 
 ## Hardware manufacturers

 | Manufacturer | Server series | Management |
--- a/sources/infrastructure/sources.md
+++ b/sources/infrastructure/sources.md
@@ -127,7 +127,65 @@ Split into separate files:
 | VMware Site Recovery Manager — documentation | https://docs.vmware.com/en/Site-Recovery-Manager/ | `[done]` |
 | Zerto — Disaster Recovery & Migration | https://www.zerto.com/resources/ | `[done]` |
 | The Phoenix Project — IT Ops & Migration patterns | https://itrevolution.com/product/the-phoenix-project/ | `[done]` |
-
+| | **Sangfor** | |
+| Sangfor HCI — product page | https://www.sangfor.com/cloud-and-infrastructure/products/hci-hyper-converged-infrastructure | `[done]` |
+| Sangfor aSV — hypervisor | https://www.sangfor.com/cloud-and-infrastructure/products/asv-hypervisor-server-virtualization | `[done]` |
+| Sangfor vs VMware — feature comparison | https://www.sangfor.com/blog/cloud-and-infrastructure/sangfor-hci-vs-vmware-feature-comparison | `[done]` |
+| | **AI infrastructure** | |
+| NVIDIA DGX — documentation | https://www.nvidia.com/en-us/data-center/dgx-platform/ | `[done]` |
+| InfiniBand — Mellanox/NVIDIA | https://www.nvidia.com/en-us/networking/products/infiniband/ | `[done]` |
+| Lustre parallel filesystem | https://www.lustre.org/ | `[done]` |
+| WekaFS — AI storage | https://www.weka.io/ | `[done]` |
+| vLLM — inference server | https://github.com/vllm-project/vllm | `[done]` |
+| Megatron-LM — distributed training | https://github.com/NVIDIA/Megatron-LM | `[done]` |
+| | **Kubernetes / Cluster API** | |
+| Cluster API (CAPI) — official documentation (The CAPI Book) | https://cluster-api.sigs.k8s.io/ | `[done]` |
+| Cluster API — GitHub (kubernetes-sigs/cluster-api) | https://github.com/kubernetes-sigs/cluster-api | `[done]` |
+| Cluster API — provider list | https://cluster-api.sigs.k8s.io/reference/providers.html | `[done]` |
+| Kubernetes — official documentation | https://kubernetes.io/docs/ | `[done]` |
+| K3s — lightweight Kubernetes | https://k3s.io/ | `[done]` |
+| RKE2 — Rancher Kubernetes Engine 2 | https://docs.rke2.io/ | `[done]` |
+| Talos — API-driven Kubernetes OS | https://www.talos.dev/ | `[done]` |
+| Kamaji — hosted control plane provider | https://kamaji.clastix.io/ | `[done]` |
+| Metal3 — bare metal provider for CAPI | https://metal3.io/ | `[done]` |
+| Cluster API — ClusterClass and topologies | https://kubernetes.io/blog/2021/10/08/capi-clusterclass-and-managed-topologies/ | `[done]` |
+| | **Big Data** | |
+| Apache Spark — official documentation | https://spark.apache.org/docs/latest/ | `[done]` |
+| Apache Flink — official documentation | https://flink.apache.org/ | `[done]` |
+| Trino — distributed SQL engine | https://trino.io/docs/current/ | `[done]` |
+| Apache Iceberg — table format | https://iceberg.apache.org/ | `[done]` |
+| Delta Lake — documentation | https://docs.delta.io/ | `[done]` |
+| Apache Hudi | https://hudi.apache.org/ | `[done]` |
+| Apache Paimon | https://paimon.apache.org/ | `[done]` |
+| Apache Hadoop — documentation | https://hadoop.apache.org/docs/stable/ | `[done]` |
+| Apache Airflow — documentation | https://airflow.apache.org/docs/ | `[done]` |
+| Dagster — documentation | https://docs.dagster.io/ | `[done]` |
+| Prefect — documentation | https://docs.prefect.io/ | `[done]` |
+| HDFS architecture (Apache) | https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html | `[done]` |
+| | **Operating Systems** | |
+| Ubuntu lifecycle — Ubuntu Pro + ESM | https://ubuntu.com/about/release-cycle | `[done]` |
+| RHEL lifecycle — Red Hat Enterprise Linux | https://access.redhat.com/support/policy/updates/errata | `[done]` |
+| Rocky Linux lifecycle | https://rockylinux.org/download/ | `[done]` |
+| AlmaLinux lifecycle | https://almalinux.org/ | `[done]` |
+| Debian releases / LTS | https://wiki.debian.org/LTS | `[done]` |
+| SLES lifecycle — SUSE | https://www.suse.com/lifecycle/ | `[done]` |
+| Alpine Linux releases | https://alpinelinux.org/releases/ | `[done]` |
+| Fedora lifecycle | https://docs.fedoraproject.org/en-US/releases/lifecycle/ | `[done]` |
+| SELinux — Red Hat docs | https://www.redhat.com/en/topics/linux/what-is-selinux | `[done]` |
+| AppArmor — Ubuntu wiki | https://wiki.ubuntu.com/AppArmor | `[done]` |
+| | **Windows** | |
+| Windows Server 2022 lifecycle | https://learn.microsoft.com/en-us/lifecycle/products/windows-server-2022/ | `[done]` |
+| Windows Server 2025 lifecycle | https://learn.microsoft.com/en-us/lifecycle/products/windows-server-2025/ | `[done]` |
+| Windows 11 lifecycle | https://learn.microsoft.com/en-us/lifecycle/products/windows-11-enterprise/ | `[done]` |
+| Windows 10 EOL | https://learn.microsoft.com/en-us/lifecycle/products/windows-10-enterprise/ | `[done]` |
+| Windows Server licensing (per core) | https://learn.microsoft.com/en-us/windows-server/get-started/editions-and-support | `[done]` |
+| | **GPU pricing** | |
+| NVIDIA AI GPU pricing guide (2026) | https://intuitionlabs.ai/articles/nvidia-ai-gpu-pricing-guide | `[done]` |
+| GPU cloud pricing comparison (2026) | https://www.spheron.network/blog/gpu-cloud-pricing-comparison-2026/ | `[done]` |
+| GPU pricing trends 2026 — CompuX | https://compux.net/docs/guides/gpu-pricing-trends-2026 | `[done]` |
+| AMD MI300X pricing (2026) | https://www.thundercompute.com/blog/amd-mi300x-pricing | `[done]` |
+| GPU price/performance frontier — Silicon Analysts | https://siliconanalysts.com/tools/frontier | `[done]` |
+ 
 ## Hardware manufacturers

 | Manufacturer | Server series | Management |