Compare commits

...

5 Commits

Author SHA1 Message Date
Stanislav Hubacek
ef3c2f75b1 18.6.2026 2026-06-18 16:25:33 +02:00
Stanislav Hubacek
b53714113c new files 2026-06-16 15:47:45 +02:00
Stanislav Hubacek
3fa11ef0f6 comiiit 2026-06-11 15:27:28 +02:00
Stanislav Hubacek
95d1839f05 First batch 2026-06-11 15:27:28 +02:00
Stanislav Hubacek
c6fa0bff6a commit 2026-06-11 15:27:28 +02:00
90 changed files with 22356 additions and 0 deletions

BIN
.DS_Store vendored Normal file

Binary file not shown.


@@ -0,0 +1,95 @@
---
description: >
  Designs a small data center / demo cluster for virtualization based on the knowledge in the KB.
  Walks through the relevant KB files (DATACENTERS, HYPERVISORS, STORAGE, SERVER-CONFIG, CONNECTIVITY,
  NETWORKING, CLOUD) and produces a complete design covering the hardware build, network topology,
  disk subsystem, connectivity, and budget. Writes the output to case-studies/<nazev>/README.md
  (<nazev> is the case-study name).
mode: subagent
permission:
  edit: allow
  read: allow
  glob: allow
  grep: allow
  bash: allow
  webfetch: allow
  websearch: allow
---
You are the **DC Designer Agent**: you design small demo and production data centers for virtualization.
## Input
The user supplies these parameters:
- Number of hosts (e.g. 2-3 small, 3-6 medium)
- Purpose (demo, development, production)
- Preferred hypervisor (VMware, Proxmox, Hyper-V, Nutanix AHV)
- Budget constraints (low-cost, mid-range, enterprise)
- Additional requirements (HA, FT, GPU, NVMe, FC SAN, …)
## Workflow
1. **Requirements analysis**: identify the key parameters and the variant by budget / size
2. **KB research**: load the relevant KB files:
   - `DATACENTERS.md`: rack, power, cooling, layout, cabling
   - `HYPERVISORS.md`: hypervisor selection, variants A/B/C/D, licensing
   - `SERVER-CONFIG.md`: specific hardware configurations per variant
   - `STORAGE.md`: storage (local vs SAN vs HCI), vendor comparison
   - `CONNECTIVITY.md`: NIC, switching, cabling (Ethernet, FC)
   - `NETWORKING.md`: network layout, VLAN, segmentation
   - `CLOUD.md`: hybrid cloud options, offload
   - `HARDWARE.md` / `SERVER-HW.md`: CPU, RAM, GPU, cooling
3. **Design synthesis**: assemble a consistent design covering:
   - Server build (CPU/RAM/disk/NIC/model)
   - Storage variant (Local RAID, vSAN, Ceph, FC SAN)
   - Network (switches, topology, cabling)
   - Rack layout (dimensions, positions, cooling, UPS)
   - Hypervisor + licensing
   - Budget estimate (indicative prices)
   - Topology diagram (text/ASCII/Mermaid)
4. **Write-up**: create `case-studies/<nazev>/README.md` (`<nazev>` is the case-study name)
5. **Summary**: finish by listing the key decisions and trade-offs
## Rules
- Always draw on the KB: do not state information that is not backed by sources
- If the KB lacks enough data for a specific decision, say so explicitly
- Justify each decision: why this component, and what the alternatives are
- Give prices as indicative order-of-magnitude estimates (e.g. "~$15,000–$25,000")
- Write in Czech, factually, and in a structured way
- End with a section "Použité zdroje z KB" (KB sources used) that links to the specific files
- Give the output file the footer `*Poslední revize: YYYY-MM-DD*`
- If the case-studies directory does not exist yet, create it
## Variants by size
### "Mini" variant (2-3 hosts, demo/learning)
- 2-3× single-socket server (AMD EPYC 4124 / Intel Xeon E-2400)
- 128-256 GB RAM
- Local NVMe + HDD
- 1× 10GbE L2 switch
- Hypervisor: Proxmox VE (free) or VMware vSphere Foundation
- UPS 1500 VA
### "Medium" variant (3-6 hosts, development/test)
- 3-4× dual-socket (AMD EPYC 9254 / Intel Xeon 6526Y)
- 512 GB - 1 TB RAM
- HCI: vSAN or Ceph (3+ nodes mandatory)
- 2× 25GbE ToR switch
- Hypervisor: VMware VCF or Nutanix AHV
- UPS 3000 VA + ATS
### "Enterprise Light" variant (4-8 hosts, production)
- 4-6× dual-socket (AMD EPYC 9454 / Intel Xeon 6548Y)
- 1-2 TB RAM
- FC SAN: 2× controller + JBOD (all-flash)
- 2× 25GbE ToR + 2× 32Gb FC switch
- Hypervisor: VMware VCF with FC SAN
- 2× UPS 3000 VA + service bypass
## Usage example
User: "design a small demo DC for 3 hosts, Proxmox, low-cost"
→ You go through the KB, produce a design in the Mini variant, and write it to case-studies/proxmox-demo/README.md
User: "case study for a VMware cluster with 4 hosts and SAN"
→ You work out the Enterprise Light variant and write it to case-studies/vmware-san-cluster/README.md


@@ -0,0 +1,68 @@
# kb-index — Knowledge Base Index Agent
Maintains the central index (`README.md` / `README.en.md`).
## Responsibilities
1. **Scan all KB files**: walks all `.md` and `.en.md` files (excluding README and .opencode/)
2. **Extract cross-references**: finds markdown links `[text](file.md)` between KB files
3. **Update cross-reference matrix**: updates the table in README.md and README.en.md
4. **Validate links**: checks that every internal link points to an existing file
5. **Detect orphans**: finds files that are not linked from anywhere
6. **Add new files**: adds new files to the navigation table
## Trigger
Run after:
- A new file is added to the KB
- A new section with cross-references is added
- A bulk change (translation, restructuring)
- A manual request: "update the index"
## Workflow
### 1. Scan files
Use glob to find all `*.md` and `*.en.md` files in the KB root (not in .opencode/, not README).
### 2. Extract metadata
For each file:
- Read the first 5 lines (for the title and description)
- Find all links `[text](path/to/file.md)` to other KB files
### 3. Classify files
| Category | Indicator |
|-----------|---------|
| Main topic | Root `.md` / `.en.md` that is not a detailed DB file |
| Detailed DB | POSTGRESQL, MYSQL, ORACLE, MONGODB, REDIS, CASSANDRA, VEKTOROVE-DB / VECTOR-DBS |
| DB concepts | DATABAZOVE-ENGINY / DATABASE-ENGINES |
| Legacy index | HARDWARE, INFRASTRUCTURE |
| Case study | case-studies/*/README.md |
| Template | templates/ADR |
| Sources | sources/*/sources.md |
### 4. Update README.md
Update these sections:
- **Navigace — Czech**: table of all `.md` files
- **Navigation — English**: table of all `.en.md` files
- **Cross-Reference Matrix**: table of references between files
- **Case Studies**: list of case-studies/README.md entries
- **Doporučená literatura** (recommended reading): books from the README
- **Zdroje / Sources**: table of sources files
- **Last-updated date**
### 5. Validate
Check that:
- Every internal link in every file → the target exists
- Every file (except legacy indexes) is listed in the README navigation
- Reporting: "X valid links, Y broken, Z orphan files"
### 6. Report
When done, return a summary:
- Number of files scanned
- Number of cross-references found
- Broken links (if any)
- Orphan files (if any)
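
The scan, extract, validate, and orphan-detection steps above could be sketched roughly as follows. This is only an illustration under assumptions: the function name `scan_kb` and the regular expression are hypothetical, only inline `[text](file.md)` links are recognised, and only files directly in the KB root are scanned.

```python
import re
from pathlib import Path

# Matches inline markdown links such as [text](file.md) or [text](dir/file.md#anchor).
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+\.md)(?:#[^)]*)?\)")

def scan_kb(root):
    """Return (valid, broken, orphans) for the *.md files directly under root."""
    root = Path(root)
    # glob("*.md") also matches *.en.md; README files are the index, not KB content.
    files = sorted(p for p in root.glob("*.md") if not p.name.startswith("README"))
    valid, broken, referenced = [], [], set()
    for src in files:
        for target in LINK_RE.findall(src.read_text(encoding="utf-8")):
            if target.startswith(("http://", "https://")):
                continue  # external links are out of scope for this sketch
            dest = (src.parent / target).resolve()
            if dest.is_file():
                valid.append((src.name, target))
                if dest != src.resolve():
                    referenced.add(dest)  # a self-link does not rescue an orphan
            else:
                broken.append((src.name, target))
    orphans = [p.name for p in files if p.resolve() not in referenced]
    return valid, broken, orphans
```

Calling `scan_kb(".")` from the KB root returns three lists whose lengths correspond to the "X valid links, Y broken, Z orphan files" summary.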


@@ -0,0 +1,43 @@
---
description: >
  Processes [todo] items in the knowledge base. Looks in sources/<area>/sources.md files for items with
  the [todo] status, researches the topic, works the new findings into the matching KB file, and changes
  the status to [done]. Run it with a specific request, e.g. "process all todo items in sources/cicd/".
  Use it to extend the knowledge base with new topics from unprocessed sources.
mode: subagent
permission:
  edit: allow
  read: allow
  bash: allow
  webfetch: allow
  websearch: allow
---
You are the **KB Research Agent**: your job is to systematically process `[todo]` items in the knowledge base.
## Workflow
1. **Analysis**: go through `sources/<area>/sources.md` in the given area and identify every line marked `[todo]`
2. **Research**: for each todo item:
   - If it has a URL, fetch the content (webfetch)
   - If it is a book or a standard, look up current information (websearch)
   - Extract the key concepts, definitions, and best practices
3. **Integration**: extend the matching `.md` file in the KB root with the new findings
4. **Source update**: change `[todo]` to `[done]`
## Rules
- Do not remove existing content: only add and extend
- Keep the format consistent (tables, lists, headings)
- Write in Czech, factually, without subjective opinions
- Give every new concept a short description
- If you come across a `[done]` item, leave it alone
- Finish with a summary: what was processed and what was added
## Usage examples
User: "process all todo items in sources/cicd/"
→ You go through sources/cicd/sources.md, process each [todo] entry, and extend CICD.md
User: "process the [todo] item about the CAP theorem"
→ You find the specific todo about the CAP theorem (in sources/databases/), do the research, and extend DATABASES.md
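
The analysis and source-update steps amount to finding the tagged lines and flipping the tag once an item has been worked in. A minimal sketch, assuming the tag appears literally as `[todo]` somewhere on the line; the helper names `find_todos` and `mark_done` are hypothetical, and the real layout of `sources.md` may differ:

```python
def find_todos(text):
    """Return (line_number, line) pairs for every line carrying a [todo] tag."""
    return [(n, line) for n, line in enumerate(text.splitlines(), start=1)
            if "[todo]" in line]

def mark_done(text, line_number):
    """Flip [todo] to [done] on one 1-indexed line, leaving all other lines untouched."""
    lines = text.splitlines()
    lines[line_number - 1] = lines[line_number - 1].replace("[todo]", "[done]", 1)
    return "\n".join(lines)
```

Marking one line at a time keeps existing `[done]` entries untouched, in line with the rule above.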


@@ -0,0 +1,89 @@
---
description: >
  Checks the consistency, quality, and currency of the whole knowledge base. Goes through all .md files and
  verifies formatting (tables, headings, lists), cross-references between files, duplicate content,
  outdated information, and consistency with the sources in sources/. Run it with e.g. "review the whole KB"
  or "check the consistency of CICD.md".
mode: subagent
permission:
  edit: allow
  read: allow
  webfetch: allow
  websearch: allow
---
You are the **KB Reviewer**: your job is to audit the quality of the knowledge base.
## Review areas
### 1. Formatting and consistency
- [ ] All files share the same heading structure (start with `#`, sections use `##`)
- [ ] Tables use a consistent format (alignment, `|---|` separators)
- [ ] Code blocks use ``` with a language tag
- [ ] Lists are indented uniformly
- [ ] Diagrams (ASCII / Mermaid) are readable
### 2. Cross-references
- [ ] Topics that overlap between files link to each other
   - E.g. "monitoring in CICD" → link to MONITORING.md
   - E.g. "cloud networking" → link between CLOUD.md and NETWORKING.md
- [ ] README.md lists all current files
- [ ] Every `.md` file in the root is mentioned in README.md
### 3. Duplicates
- [ ] The same concept is not explained in several places with differing information
- [ ] Where a concept is repeated, it is consistent (same numbers, definitions, recommendations)
### 4. Currency
- [ ] Tool versions match the current stable releases (verify on the web)
- [ ] EOL technologies are flagged or removed
- [ ] No "coming soon": if it has not happened, mark it as outdated
- [ ] Licenses and prices (where given) are current
### 5. Consistency with sources
- [ ] Every fact in the KB should have a traceable source in `sources/`
- [ ] If a source in `sources/` is marked `[done]`, the corresponding content should be in the KB
- [ ] If `sources/` contains a source for a topic missing from the KB, flag it
### 6. Spelling and style
- [ ] No typos
- [ ] Consistent terminology (do not use both "VM" and "virtuální stroj" in one file)
- [ ] Anglicisms are used where they make sense (explained on first use)
## Report
At the end, generate a clear report:
```markdown
## Review report — YYYY-MM-DD
### Problems (must fix)
- [file.md:line] description of the problem
### Recommendations
- [file.md] description
### Status
- ✅ Formatting check: OK / N problems
- ✅ Cross-references: OK / N missing
- ✅ Duplicates: OK / N found
- ✅ Currency: OK / N outdated
- ✅ Consistency with sources: OK / N discrepancies
```
## Usage examples
User: "review the whole KB"
→ You go through all files and print a complete report
User: "check the consistency of NETWORKING.md"
→ You focus on a single file and check it from every angle
User: "find duplicates between CLOUD.md and INFRASTRUCTURE.md"
→ You compare the specific files


@@ -0,0 +1,53 @@
---
description: >
  Finds new sources (books, articles, documentation, tools, standards, videos, certifications)
  for extending the knowledge base. Searches the web, identifies relevant material for the given area,
  and adds it as [todo] to the matching sources/<area>/sources.md. Use it to keep enriching
  the knowledge base with current sources. Run it with e.g. "find new sources for cloud architecture".
mode: subagent
permission:
  edit: allow
  read: allow
  webfetch: allow
  websearch: allow
---
You are the **KB Source Scout**: your job is to actively look for new high-quality sources for the knowledge base.
## Workflow
1. **State analysis**: read `sources/<area>/sources.md` for the given area and find out what is already documented
2. **Research of new material**: use websearch to find new sources:
   - Official documentation and whitepapers
   - Books (ISBN, author, edition)
   - Quality articles and blog posts
   - Tools and frameworks
   - Standards and RFCs
   - Video courses and talks (conferences)
   - Certifications
3. **Deduplication**: check that the source is not already in sources.md
4. **Addition**: add the new sources to the matching `sources/<area>/sources.md` with the `[todo]` tag
## Quality criteria
- **Official documentation**: prefer primary sources (vendor docs, RFCs, standards)
- **Books**: prefer editions from the last 3 years; for classics (such as TCP/IP Illustrated) an older edition is fine
- **Articles**: prefer authorities in the field (Brendan Gregg, Martin Kleppmann, Kelsey Hightower, ...)
- **Tools**: active community, current version, open source is a bonus
- Avoid: clearly outdated material (more than 5 years out of date for the field), clickbait, untrustworthy sources
## Entry format
For each new source, add a row to the table in the matching sources.md.
Keep the format consistent with the existing entries in the file.
## Usage examples
User: "find new sources for cloud architecture"
→ You search the web, find books, articles, and whitepapers on cloud architecture from 2025/2026, and add them to sources/cloud/sources.md as [todo]
User: "scout infra"
→ You search for new infrastructure sources (hypervisors, DC, storage, hardware) and add them to sources/infrastructure/sources.md
User: "find what is new in observability over the last year"
→ You focus on monitoring/observability, look for new tools, articles, and versions, and add them to sources/monitoring/sources.md

25
.opencode/opencode.json Normal file

@@ -0,0 +1,25 @@
{
  "$schema": "https://opencode.ai/config.json",
  "agent": {
    "kb-research": {
      "description": "Processes [todo] items in the knowledge base: research and integration of new topics",
      "mode": "subagent"
    },
    "kb-source-scout": {
      "description": "Finds new sources (books, articles, documentation) and adds them to sources/ as [todo]",
      "mode": "subagent"
    },
    "kb-reviewer": {
      "description": "Audits consistency, currency, cross-references, duplicates, and formatting of the whole KB",
      "mode": "subagent"
    },
    "dc-designer": {
      "description": "Designs a small DC / demo cluster for virtualization based on the KB and writes a case study to case-studies/",
      "mode": "subagent"
    },
    "kb-index": {
      "description": "Maintains the central index README.md: scans files, extracts cross-references, validates links, adds new files",
      "mode": "subagent"
    }
  }
}

600
AI-INFRASTRUCTURE.en.md Normal file

@@ -0,0 +1,600 @@
# 🧠 AI/ML Infrastructure
## Component overview
```mermaid
flowchart TD
subgraph Compute
GPU["GPU (H100/B200/Instinct)"]
CPU["CPU (AMD EPYC / Intel Xeon)"]
ASIC["ASIC (TPU, Trainium, Inferentia)"]
end
subgraph Network
IB["InfiniBand NDR/XDR"]
ROCE["RoCEv2"]
NVL["NVLink / NVSwitch"]
end
subgraph Storage
FS["Parallel FS (Lustre, GPFS, Weka)"]
OBJ["Object Store (S3, MinIO)"]
NVME["Local NVMe cache"]
end
subgraph Orchestration
S["Slurm"]
K["Kubernetes + Volcano/Kueue"]
end
subgraph Cooling
DLC["Direct-to-chip liquid"]
IMM["Immersion"]
AIR["Air (high-density)"]
end
Compute --> Network --> Storage
Orchestration --> Compute
Cooling --> Compute
```
---
## GPU compute
### NVIDIA
| GPU | Architecture | FP8 | FP16/BF16 | FP64 | HBM | NVLink | TDP | Rack config |
|-----|-------------|-----|-----------|------|-----|--------|-----|------|
| **H100 SXM** | Hopper | 3,958 TFLOPS | 1,979 TFLOPS | 67 TFLOPS | 80 GB HBM3 | 900 GB/s | 700 W | 6–8× in DGX H100 |
| **H200 SXM** | Hopper (HBM3e) | 3,958 TFLOPS | 1,979 TFLOPS | 67 TFLOPS | 141 GB HBM3e | 900 GB/s | 700 W | 6–8× in DGX H200 |
| **B200** | Blackwell | ~9,000 TFLOPS | ~4,500 TFLOPS | ~40 TFLOPS | 192 GB HBM3e | 1,800 GB/s | 1,000 W | 6–8× in DGX B200 |
| **GB200 Grace Blackwell** | Blackwell | ~18,000 TFLOPS | ~9,000 TFLOPS | — | 192 GB + 480 GB (Grace) | NVLink-C2C | 1,000 W (GPU) + 500 W (CPU) | DGX GB200 (36× GPU) |
| **L40S** | Ada Lovelace | 733 TFLOPS | 367 TFLOPS | — | 48 GB GDDR6 | N/A | 350 W | Inference, enterprise |
| **A100 SXM** | Ampere | N/A (INT8: 1,248 TOPS) | 624 TFLOPS | 19.5 TFLOPS | 80 GB HBM2e | 600 GB/s | 400 W | DGX A100 |
### AMD
| GPU | Architecture | FP8 | FP16/BF16 | FP64 | HBM | Infinity Fabric | TDP |
|-----|-------------|-----|-----------|------|-----|----------------|-----|
| **MI300X** | CDNA 3 | 2,615 TFLOPS | 1,307 TFLOPS | 81 TFLOPS | 192 GB HBM3 | 896 GB/s | 750 W |
| **MI250X** | CDNA 2 | — | 383 TFLOPS | 95.7 TFLOPS | 128 GB HBM2e | 400 GB/s | 500 W |
### Intel
| GPU | Architecture | FP16/BF16 | FP32 | HBM | TDP |
|-----|-------------|-----------|------|-----|-----|
| **Gaudi 3** | Custom | 1,835 TFLOPS | — | 128 GB HBM2e | 600 W |
| **Max 1550** | Xe HPC | 600+ TFLOPS | 200 TFLOPS | 128 GB HBM2e | 600 W |
### Cloud ASIC
| ASIC | Provider | Use case | Performance |
|------|----------|----------|-------|
| **TPU v5p** | Google | Training | ~459 TFLOPS (BF16) per chip |
| **Trainium 2** | AWS | Training | ~1,000 TFLOPS (BF16) per chip |
| **Inferentia 2** | AWS | Inference | ~400 TOPS (INT8) per chip |
| **Maia 100** | Microsoft | Training + inference | Custom, 800 W TDP |
---
## AI networking
### Technology comparison
| Technology | Bandwidth per link | Latency | Topology | Use case |
|-------------|-------------------|---------|-----------|----------|
| **InfiniBand NDR200** | 200 Gb/s | < 1 µs | Fat-tree, Dragonfly+ | Training (NVIDIA) |
| **InfiniBand NDR400** | 400 Gb/s | < 1 µs | Fat-tree, Dragonfly+ | Training (NVIDIA) |
| **InfiniBand XDR** | 800 Gb/s (planned) | < 1 µs | Dragonfly+ | Next-gen training |
| **RoCEv2** (CX-7/8) | 200–400 Gb/s | 1–2 µs | Fat-tree, Spine-leaf | Training (AMD, Intel, open) |
| **NVLink 4.0** | 900 GB/s per GPU | < 0.5 µs | NVSwitch full-mesh | Intra-node GPU comm |
| **NVLink 5.0** | 1,800 GB/s per GPU | < 0.5 µs | NVSwitch full-mesh | Intra-node (Blackwell) |
| **Ethernet (400 GbE)** | 400 Gb/s | 2–5 µs | Spine-leaf | Inference, data pipeline |
### AI fabric principles
- **Rail-optimized topology** — each GPU communicates on dedicated "rails" (same GPU indices across nodes connect to the same switch)
- **Fat-tree (Clos)** — standard for InfiniBand and RoCE, non-blocking bisection bandwidth
- **Dragonfly+** — reduces hop count while maintaining bandwidth (used in largest clusters)
- **GPU Direct RDMA** — direct GPU ↔ GPU communication without CPU involvement, supports InfiniBand and RoCE
- **SHARP (Scalable Hierarchical Aggregation and Reduction Protocol)** — in-network reduction for AllReduce (InfiniBand only)
### Bandwidth sizing
```text
Rule of thumb: InfiniBand bandwidth ≥ 50 % of GPU HBM bandwidth for scalable training
Example: H100 has 3.35 TB/s HBM
→ The rule would call for ~1.7 TB/s of network bandwidth per GPU
→ Even 4× NDR400 IB per GPU = 4 × 50 GB/s = 200 GB/s = ~6 % of HBM
→ Reality: 8× NDR400 per DGX H100 node = 1× 400 Gb/s (50 GB/s) per GPU = ~1.5 % of HBM → bottleneck
```
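
The arithmetic behind this sizing rule can be checked with a few lines of Python. This is a sketch only: the function name `network_to_hbm_ratio` is hypothetical, and the inputs are the H100 figures quoted in the block above.

```python
def network_to_hbm_ratio(link_gbit_per_s, links_per_gpu, hbm_tb_per_s):
    """Fraction of a GPU's HBM bandwidth that its network links can carry."""
    network_gb_per_s = link_gbit_per_s / 8 * links_per_gpu  # Gb/s -> GB/s
    hbm_gb_per_s = hbm_tb_per_s * 1000                      # TB/s -> GB/s
    return network_gb_per_s / hbm_gb_per_s

# H100 (3.35 TB/s HBM) with four 400 Gb/s links per GPU: 200 GB/s of network bandwidth.
ratio = network_to_hbm_ratio(400, 4, 3.35)
print(f"{ratio:.1%}")  # roughly 6 % of HBM bandwidth
```

Changing `links_per_gpu` shows how far a given NIC count is from the 50 % target.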
---
## AI storage
### Requirements
| Dataset size | IO pattern | Recommended storage | Bandwidth |
|-------------|-----------|-------------------|-----------|
| < 10 TB | Sequential read (data loading) | Local NVMe | > 10 GB/s per node |
| 10–100 TB | Random read (checkpointing) | Parallel FS (Lustre, Weka) | > 100 GB/s cluster-wide |
| 100 TB–10 PB | Mixed (training + checkpoint) | Parallel FS + object store | > 500 GB/s |
| 10 PB+ | Multi-modal, video, LLM | Tiered (NVMe cache + parallel FS + object) | > 1 TB/s |
### Storage solution comparison
| Solution | Type | Bandwidth per node | Max capacity | Scaling | Use case |
|--------|-----|-------------------|-------------|-----------|----------|
| **Lustre** | Parallel FS (POSIX) | > 100 GB/s (cluster) | 100s PB | OST + MDS | HPC, LLM training (standard) |
| **GPFS / StorageScale** | Parallel FS (POSIX) | > 100 GB/s | 100s PB | NSD servers | HPC, AI (IBM) |
| **WekaFS** | Parallel FS (POSIX + NFS/SMB) | ~80 GB/s per 10 nodes | 10s PB | Container-native | AI/ML, NVIDIA DGX preferred |
| **VAST Data** | Universal storage (NVMe + QLC) | ~100 GB/s per cluster | 10s PB | Scale-out | AI, checkpoint, data lake |
| **Pure Storage//E** | All-flash (NVMe) | ~50 GB/s | ~30 PB | Scale-out | Enterprise AI, database |
| **MinIO / S3** | Object store | ~20 GB/s per gateway | EB | Erasure coding | Dataset repository, checkpoint |
| **NetApp AFF** | NAS + S3 | ~10 GB/s per controller | ~50 PB | HA pair | Enterprise, NFS baseline |
### Checkpointing strategies
| Strategy | RPO | Storage impact | Description |
|-----------|-----|---------------|-------|
| **Full checkpoint** | every N steps | High (stops training) | Full model + optimizer state |
| **Async checkpoint** | every N steps | Medium (non-blocking) | Copy to staging buffer, async write |
| **Distributed checkpoint** (NVIDIA NeMo) | every N steps | Low | Each rank writes its own shard |
| **In-memory checkpoint** (IBM) | on failover | Minimal (DRAM) | Replication to another node's DRAM |
| **Continuous checkpoint** (Microsoft) | every 15 min | Low (delta) | Changed shards only |
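
The async strategy in the table (copy to a staging buffer, then write in the background) can be illustrated with the standard library alone. This is a simplified sketch, not how any particular framework implements it: `async_checkpoint` is a hypothetical helper, the state is a plain Python object rather than GPU tensors, and `pickle` stands in for a real checkpoint format.

```python
import copy
import pickle
import threading

def async_checkpoint(state, path):
    """Snapshot `state` in memory, then write it to `path` on a background thread."""
    snapshot = copy.deepcopy(state)  # staging buffer: the only step that blocks training

    def _write():
        with open(path, "wb") as f:
            pickle.dump(snapshot, f)

    writer = threading.Thread(target=_write)
    writer.start()
    return writer  # the caller joins this before starting the next checkpoint
```

Training can keep mutating `state` right after the call returns, because the writer thread only sees the snapshot; calling `join()` on the returned thread before the next checkpoint avoids overlapping writes.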
---
## AI cluster architecture
### Physical topology — DGX H100 example
```
┌──────── DGX H100 (8× GPU) ────────┐
│ ┌─────┐ ┌─────┐ ┌─────┐ ┌─────┐ │
│ │GPU 0│ │GPU 1│ │GPU 2│ │GPU 3│ │
│ └──┬──┘ └──┬──┘ └──┬──┘ └──┬──┘ │
│ ┌──┴──┐ ┌──┴──┐ ┌──┴──┐ ┌──┴──┐ │
│ │GPU 4│ │GPU 5│ │GPU 6│ │GPU 7│ │
│ └─────┘ └─────┘ └─────┘ └─────┘ │
│ NVSwitch (NVLink 4.0, 900 GB/s) │
│ InfiniBand CX-7: 8× NDR400 │
└────────────────────────────────────┘
│ 8× IB rails
┌────┴──────────────┐
│ IB NDR400 Switches │ (rail-optimized)
└────────────────────┘
```
### Kubernetes for AI
| Component | Role |
|-----------|------|
| **Volcano** | Batch scheduling, gang scheduling, queue management |
| **Kueue** | Multi-tenant admission, resource quotas, fair sharing |
| **NVIDIA GPU Operator** | Driver, container toolkit, MIG, DCGM, monitoring |
| **HAMi** (ex k8s-vGPU-scheduler) | GPU sharing, MIG partitioning, fractional GPU |
| **Node Feature Discovery** | GPU type detection, NUMA topology |
| **Topology Manager** | NUMA-aware pod placement |
| **DPDK / SR-IOV** | High-performance networking for GPU Direct RDMA |
### Slurm for AI
| Component | Role |
|-----------|------|
| **slurm.conf** | Partition for GPU nodes, GRES (Generic Resource) |
| **gres.conf** | GPU type, GPU count per node |
| **srun --gres=gpu:8** | Allocate 8 GPUs per job |
| **sbatch --nodes=64 --ntasks=512** | 64 nodes, 512 ranks (8 GPU/node) |
| **Pyxis** | NVIDIA container plugin for Slurm (used together with Enroot) |
---
## AI cluster cooling
### Power density comparison
| Configuration | TDP per node | Racks | kW/rack | Note |
|-------------|-------------|-------|---------|----------|
| Standard server (2U) | 1 kW | 20 | 5–10 | Typical DC |
| GPU server (DGX H100, 6×) | 42 kW | 6 | 45–50 | Air cooling limit |
| GPU server (DGX B200, 6×) | 72 kW | 6 | 90–100 | Liquid cooling required |
| GPU server (GB200 NVL72) | 120 kW | — | ~120 | Liquid cooling mandatory |
| NVIDIA NVL72 rack | 120 kW | 1 | 120 | Fully liquid cooled |
### Cooling technologies
| Method | Max kW/rack | CAPEX | OPEX | Complexity |
|--------|-------------|-------|------|-----------|
| **Air cooling (CRAC/CRAH)** | < 15 | Low | Medium | Low |
| **Air cooling (in-row)** | 15–30 | Medium | Medium | Low |
| **Rear-door heat exchanger** | 30–50 | Medium | Low | Medium |
| **Direct-to-chip liquid (cold plate)** | 50–150 | High | Low | High |
| **Immersion (single-phase)** | 100–200 | High | Low | High |
| **Immersion (two-phase)** | 200+ | Very high | Low | Very high |
---
## Inference infrastructure
### Inference server comparison
| Tool | Frameworks | Optimization | Use case |
|---------|-----------|-------------|----------|
| **vLLM** | Megatron, HF, AWQ, GPTQ | PagedAttention, KV cache, continuous batching | LLM inference (open source) |
| **TensorRT-LLM** | TensorRT | INT4/INT8/FP8, inflight batching, attention optimizations | Production (NVIDIA) |
| **Triton Inference Server** | All (TensorRT, vLLM, PyTorch) | Model ensemble, model caching, concurrent execution | Enterprise, multi-model |
| **SageMaker** | Managed | Auto-scaling, model parallelism | AWS managed |
| **OpenAI API / TGI** | HF Transformers | Continuous batching, flash attention | Hosting |
### Inference optimization
| Technique | Latency improvement | Throughput improvement | Memory reduction |
|----------|-----------------|---------------------|------------------|
| **FP8/INT8 quantization** | — | 2× | 2× |
| **INT4 quantization** | — | 4× | 4× |
| **Flash Attention 2/3** | 2–4× | — | 50 % (KV cache) |
| **PagedAttention** | — | 2–5× | 95 % (KV cache fragmentation) |
| **Continuous batching** | — | 10–20× | — |
| **Speculative decoding** | 2–3× | — | — |
| **Multi-LoRA / S-LoRA** | — | 8–16× | — |
---
## Distributed training techniques
| Technique | Description | Frameworks |
|----------|-------|------------|
| **Data Parallelism (DDP/FSDP)** | Each GPU has model copy, different batch | PyTorch DDP, FSDP |
| **Tensor Parallelism (TP)** | Each layer's weight matrices split across GPUs (intra-node) | Megatron-LM, DeepSpeed |
| **Pipeline Parallelism (PP)** | Layers split across nodes | Megatron-LM, DeepSpeed |
| **Sequence Parallelism (SP)** | Sequence split across GPUs | Megatron-LM |
| **Expert Parallelism (EP)** | Different expert subnets on different GPUs | Mixture-of-Experts (MoE) |
| **3D Parallelism** | TP + PP + DP combination | Megatron-LM, NeMo |
| **ZeRO (1/2/3)** | Optimizer/gradient/parameter sharding | DeepSpeed |
| **NCCL / RCCL** | GPU collective communication library | NVIDIA/AMD |
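
To make the ZeRO row more concrete, the per-GPU memory for model states can be estimated with the commonly cited accounting for mixed-precision Adam: 2 bytes per parameter for FP16 weights, 2 for FP16 gradients, and 12 for FP32 optimizer state. The sketch below assumes exactly that accounting and ignores activations and buffers; the function name `zero_bytes_per_gpu` is hypothetical.

```python
def zero_bytes_per_gpu(params, gpus, stage):
    """Model-state bytes per GPU for mixed-precision Adam under ZeRO stage 0 to 3."""
    weights, grads, optim = 2 * params, 2 * params, 12 * params
    if stage >= 1:
        optim /= gpus    # stage 1 shards the optimizer state
    if stage >= 2:
        grads /= gpus    # stage 2 also shards the gradients
    if stage >= 3:
        weights /= gpus  # stage 3 also shards the parameters
    return weights + grads + optim

# Example: a 7.5B-parameter model on 64 GPUs, in GB per GPU for each stage.
for stage in range(4):
    print(stage, zero_bytes_per_gpu(7.5e9, 64, stage) / 1e9)
```

Stage 0 corresponds to plain data parallelism, where every GPU holds the full 16 bytes per parameter.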
---
## Operating systems for AI
### Distribution comparison
| OS | GPU driver | CUDA | Container toolkit | IB/RoCE | Lustre client | Production support |
|----|-----------|------|-------------------|---------|--------------|-------------------|
| **Ubuntu 22.04 LTS** | NVIDIA 525+ | 12.x | nvidia-container-toolkit | MLNX_OFED, rdma-core | Yes (lustre-client) | NVIDIA DGX standard |
| **Ubuntu 24.04 LTS** | NVIDIA 550+ | 12.5+ | nvidia-container-toolkit | MLNX_OFED, rdma-core | Yes | Latest GPU support |
| **RHEL 9 / Rocky 9** | NVIDIA 525+ | 12.x | nvidia-container-toolkit | MLNX_OFED | Yes (EL repo) | Red Hat, enterprise |
| **DGX OS** (Ubuntu-based) | NVIDIA custom | 12.x | Pre-installed | Pre-configured | Yes | NVIDIA DGX only supported |
| **SLES 15 SP5** | NVIDIA 525+ | 12.x | nvidia-container-toolkit | MLNX_OFED | Yes | HPC, some Lustre clusters |
| **Debian 12** | NVIDIA 525+ | 12.x | nvidia-container-toolkit | rdma-core | Yes (backports) | Community, research |
| **Flatcar / Bottlerocket** | Container-host | — | nvidia-container-toolkit | Limited | No | K8s-only, minimal footprint |
### Limitations and constraints
#### GPU drivers and CUDA
| Constraint | Detail |
|----------|--------|
| **Driver-CUDA compatibility** | The installed NVIDIA driver must be at least the minimum version required by the CUDA toolkit (driver ≥ CUDA req). E.g., CUDA 12.5 requires driver ≥ 550 |
| **Kernel version** | NVIDIA driver not compatible with all kernels. New kernel (6.8+) may require DKMS build or delayed support |
| **Secure Boot** | NVIDIA driver requires signed module (MOK, shim) or disabled Secure Boot — common enterprise issue |
| **Open vs Proprietary driver** | NVIDIA `nvidia-open` (since R515) — open source kernel module. GPU support: DC (H100+) → OK, older GPUs → proprietary required |
| **nvidia-persistenced** | Required to maintain GPU initialization; without it GPUs may sleep after idle timeout (`nvidia-smi -pm 1`) |
| **GPU reset** | After crashed training job, GPU may hang. `nvidia-smi --gpu-reset` or reboot node, sometimes power cycle |
| **Multi-instance GPU (MIG)** | Requires specific driver, MIG mode on GPU, GPU restart. Cannot be changed at runtime. A100, H100, B200 only |
#### Network (InfiniBand / RoCE)
| Constraint | Detail |
|----------|--------|
| **MLNX_OFED vs rdma-core** | MLNX_OFED (NVIDIA) — full support, but own kernel modules, kernel version compatibility needed. `rdma-core` (open) — limited support, no custom modules |
| **Kernel compatibility** | MLNX_OFED supports only specific kernel versions (major.minor). Kernel upgrade → MLNX_OFED rebuild required |
| **NCCL** | NCCL version must be compatible with CUDA and IB firmware. `nccl-tests` for validation |
| **SHARP** | In-network reduction requires specific MLNX_OFED + IB switch firmware combination |
| **GPU Direct RDMA** | Requires `nvidia-peermem` module + MLNX_OFED. Does not work with all GPU and IB card combinations |
| **RoCE PFC/ECN** | RoCE requires lossless fabric (PFC, ECN, DCQCN). Switch and host configuration — complex tuning |
#### Storage
| Constraint | Detail |
|----------|--------|
| **Lustre client** | Client version must match server. Server upgrade → upgrade all clients. Compatible with RHEL/Debian derivatives only |
| **POSIX locking** | NFS and Lustre have different POSIX locking behavior. Distributed training relies on flock → problematic with mixed FS |
| **Filesystem cache** | Page cache can mask IO bottlenecks. Training jobs often require `O_DIRECT` or sync IO |
| **Local NVMe vs parallel FS** | Dataset staging on local NVMe eliminates network dependency but requires space and pre-fetch pipeline |
#### Container runtime
| Constraint | Detail |
|----------|--------|
| **Docker + GPU** | `nvidia-container-toolkit` (formerly nvidia-docker2). Requires runtime installation and config in `/etc/docker/daemon.json` |
| **Podman + GPU** | Requires `nvidia-container-toolkit` + podman hook. Less tested than Docker |
| **containerd + GPU** | Standard for K8s. Requires `cdi` (Container Device Interface) or `nvidia-container-runtime` |
| **Enroot + Pyxis** | NVIDIA container stack for Slurm (Enroot = daemonless container runtime, Pyxis = Slurm plugin) |
| **User namespace mapping** | Container GPU access requires device cgroup; rootless may fail (exception for /dev/dri and /dev/nvidia*) |
#### Kernel parameters
```text
# AI workload recommended sysctl (e.g. a file in /etc/sysctl.d/)
net.core.rmem_max = 134217728 # sufficient for NCCL
net.core.wmem_max = 134217728
net.ipv4.tcp_rmem = 4096 87380 134217728
net.ipv4.tcp_wmem = 4096 65536 134217728
net.core.netdev_budget = 600 # for high packet rate
vm.max_map_count = 1048576 # PyTorch DataLoader workers
kernel.numa_balancing = 0 # disable NUMA balancing (breaks locality)
kernel.sched_min_granularity_ns = 10000000
# Kernel boot parameters (kernel command line, not sysctl)
# Disable security mitigations for perf (dedicated AI clusters only)
mitigations=off
transparent_hugepage=never # or madvise; THP may cause latency spikes
intel_idle.max_cstate=1 # reduce C-state transition latency
```
#### Firmware and HW
| Constraint | Detail |
|----------|--------|
| **GPU firmware (VBIOS)** | NVIDIA datacenter GPUs (H100, B200) have VBIOS updates via NVFlash. Without update → missing partitioning support or newer CUDA features |
| **InfiniBand firmware** | IB switch and HCA firmware must be compatible. Mix old switch + new HCA → degraded perf |
| **NVSwitch firmware** | DGX systems have NVSwitch firmware updatable only via NVIDIA DGX tools |
| **Power capping (nvidia-smi)** | `nvidia-smi -pl <power>` — limit TDP for power budget management. Test impact on training throughput |
| **GPU clock locking** | `nvidia-smi -ac <mem,graphics>`: fixed application clocks (memory clock first, then graphics clock) for stable benchmarks. Apply after `nvidia-persistenced` |
| **PCIe Gen** | GPU in PCIe Gen4 slot (instead of Gen5) → bottleneck for CPU↔GPU data transfer. Important for FSDP sharding |
### Recommended OS per use case
| Use case | OS | Rationale |
|----------|-----|-------|
| **DGX cluster (production)** | DGX OS / Ubuntu 22.04 LTS | NVIDIA standard, best driver support |
| **Enterprise K8s (OpenShift)** | RHEL 9 / RHCOS | Red Hat support, GPU Operator compatible |
| **Vanilla K8s (on-prem)** | Ubuntu 22.04 LTS + Flatcar (workers) | Widest community support, Flatcar for minimal footprint |
| **Slurm cluster (HPC/AI)** | Rocky Linux 9 / Ubuntu 22.04 LTS | EL ecosystem (Lustre, OFED) or Ubuntu (community) |
| **Research / rapid prototyping** | Ubuntu 24.04 LTS | Latest CUDA, PyTorch, driver support |
| **Edge inference** | NVIDIA JetPack / Ubuntu (ARM) | Embedded GPU (Jetson Orin, AGX) |
---
## AI-ready data center — check-list
| Area | Requirement |
|--------|-----------|
| **Power** | 30–120 kW/rack, HVDC (400 V DC), UPS supporting GPU spikes |
| **Cooling** | Liquid cooling ready (direct-to-chip), rear-door for 30+ kW |
| **Network** | InfiniBand (NDR/XDR) or RoCEv2, rail-optimized fat-tree |
| **Storage** | Parallel FS (Lustre/Weka), checkpoint bandwidth > 100 GB/s |
| **GPU density** | Max GPU/rack, minimize NVSwitch hops |
| **Physical** | Floor load 1,500+ kg/m², rack 52U–60U |
| **Security** | Tenant isolation, network segmentation, data encryption |
| **Monitoring** | DCGM, NCCL health checks, thermals, power capping |
---
## Model and throughput limitations
### Model size per GPU
Maximum model size fitting on a single GPU depends on HBM capacity and precision:
| GPU | HBM | FP32 | FP16/BF16 | INT8 | INT4 |
|-----|-----|------|-----------|------|------|
| **H100 80GB** | 80 GB | ~20B | ~40B | ~80B | ~160B |
| **H200 141GB** | 141 GB | ~35B | ~70B | ~140B | ~280B |
| **B200 192GB** | 192 GB | ~48B | ~96B | ~192B | ~384B |
| **MI300X 192GB** | 192 GB | ~48B | ~96B | ~192B | ~384B |
| **A100 80GB** | 80 GB | ~20B | ~40B | ~80B | ~160B |
| **GB200 (192+480)** | 192 GB GPU + 480 GB Grace | — | ~96B + CPU offload | — | — |
*Approximate: 1B params ≈ 2 GB FP16 ≈ 4 GB FP32 ≈ 1 GB INT8 ≈ 0.5 GB INT4. Subtract ~10–15 % HBM for activations, KV cache, optimizer states.*
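
The rule of thumb in the note above can be turned into a quick estimate. This is a minimal sketch, assuming the bytes-per-parameter values from the note; `max_params_billion` and its `headroom` argument are hypothetical names introduced here for illustration, not part of any library.

```python
# Bytes per parameter for each precision, taken from the note above.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "bf16": 2.0, "int8": 1.0, "int4": 0.5}


def max_params_billion(hbm_gb: float, precision: str, headroom: float = 0.0) -> float:
    """Largest model (in billions of parameters) whose weights fit in HBM.

    headroom is the fraction of HBM reserved for activations and KV cache
    (an assumed knob; the note above suggests roughly 0.10 to 0.15).
    """
    usable_gb = hbm_gb * (1.0 - headroom)
    # 1B parameters at N bytes each occupy N GB (using 1 GB = 1e9 bytes).
    return usable_gb / BYTES_PER_PARAM[precision]


if __name__ == "__main__":
    for gpu, hbm in [("H100", 80), ("H200", 141), ("B200", 192)]:
        row = {p: round(max_params_billion(hbm, p, headroom=0.15))
               for p in ("fp16", "int8", "int4")}
        print(gpu, row)
```

With `headroom=0.0` the function reproduces the idealized weights-only numbers; a non-zero headroom gives a more conservative sizing.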
### Memory breakdown inference
| Component | Llama 3 70B (FP16) | Llama 3 8B (FP16) |
|------------|-------------------|-------------------|
| Model weights | 140 GB | 16 GB |
| KV cache (4K context, batch 1) | ~2 GB | ~0.2 GB |
| KV cache (128K context, batch 1) | ~60 GB | ~6.5 GB |
| Activations (peak) | ~5 GB | ~1 GB |
| **Total 4K ctx** | ~147 GB | ~17 GB |
| **Total 128K ctx** | ~205 GB | ~23 GB |
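**Conclusion:** Llama 3 70B in FP16 (140 GB of weights) does not fit on a single H100 (80 GB). Options: INT8 (~70 GB of weights, too tight for one 80 GB GPU once KV cache is added → 2× H100 or 1× H200), INT4 (~35 GB → 1× H100), or tensor parallelism across several GPUs.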
### Context length vs memory
| Context | KV cache 70B (FP16) | KV cache 8B (FP16) | Note |
|---------|-------------------|-------------------|------|
| 4K | ~2.2 GB | ~0.25 GB | Typical chat |
| 32K | ~18 GB | ~2 GB | Documents |
| 128K | ~72 GB | ~8 GB | Long-context (Claude, Gemini) |
| 1M | ~560 GB | ~64 GB | Experimental (Gemini 1.5 Pro) |
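KV cache size grows **linearly with context length**, and also linearly with batch size, layer count, and the number of KV heads. It is the attention compute, not the cache, that grows quadratically with context length. The cache is the critical memory consumer for long-context inference.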
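
The linear scaling can be checked with a small calculation. This is a minimal sketch, assuming the publicly documented Llama 3 shapes (80 layers, 8 KV heads, head dimension 128 for 70B; 32 layers, 8 KV heads, head dimension 128 for 8B) and FP16 storage. Those shape values are assumptions that do not come from this document, so the output will not match the rounded figures in the table exactly; the point is how the size scales.

```python
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   context_len: int, batch: int = 1, bytes_per_elem: int = 2) -> int:
    """KV cache size in bytes: one K and one V tensor per layer."""
    per_token = 2 * layers * kv_heads * head_dim * bytes_per_elem  # K + V
    return per_token * context_len * batch


# Assumed Llama 3 shapes (not taken from this document).
LLAMA3_70B = dict(layers=80, kv_heads=8, head_dim=128)
LLAMA3_8B = dict(layers=32, kv_heads=8, head_dim=128)

if __name__ == "__main__":
    for ctx in (4_096, 32_768, 131_072):
        gb_70b = kv_cache_bytes(context_len=ctx, **LLAMA3_70B) / 1e9
        gb_8b = kv_cache_bytes(context_len=ctx, **LLAMA3_8B) / 1e9
        print(f"{ctx:>7} tokens: 70B ≈ {gb_70b:.2f} GB, 8B ≈ {gb_8b:.2f} GB")
```

Doubling either `context_len` or `batch` doubles the result, which is why long contexts and large batches compete for the same HBM.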
### Throughput inference
| Model | GPU | Precision | Batch size | Tokens/s | QPS (1K output) |
|-------|-----|-----------|-----------|----------|-----------------|
| Llama 3 8B | H100 | FP16 | 1 | ~800 | ~0.8 |
| Llama 3 8B | H100 | FP16 | 128 | ~4,500 | ~35 |
| Llama 3 8B | H100 | INT4 | 128 | ~8,000 | ~62 |
| Llama 3 70B | 4× H100 | FP16 | 1 | ~180 | ~0.18 |
| Llama 3 70B | 4× H100 | INT4 | 64 | ~1,200 | ~19 |
| Llama 3 70B | 8× H100 | FP16 (TP=8) | 128 | ~2,500 | ~20 |
| DeepSeek-R1 671B | 8× H200 | FP8 (MoE) | 64 | ~500 | ~8 |
| GPT-4 class (est.) | — | — | — | ~100–300 | ~1–3 |
**Notes:**
- QPS (queries per second) depends on output length; the table assumes 1K output tokens per query
- Larger batches increase throughput but also increase TTFT (time to first token)
- Tensor Parallelism (TP) scales throughput, but communication overhead grows with the TP degree
### Training limits
#### Scaling efficiency
| GPU count | Model | Efficiency | Reason |
|-----------|-------|-----------|-------|
| 8 (1 node) | Llama 3 8B | ~95 % | NVLink intra-node |
| 64 (8 nodes) | Llama 3 8B | ~85 % | IB inter-node |
| 512 (64 nodes) | Llama 3 70B | ~75 % | Communication overhead |
| 4,096 (512 nodes) | Llama 3 70B | ~60 % | Pipeline bubble, network |
| 16,384 (2,048 nodes) | Llama 3 405B | ~45 % | Synchronous SGD overhead |
**Note:** Efficiency = actual throughput / (GPU count × single-GPU throughput), i.e. the fraction of ideal linear speedup that is achieved. It decreases roughly logarithmically with GPU count.
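
The definition above translates directly into code. This is a minimal sketch; the function names are hypothetical and the sample efficiencies are the approximate values from the table, not measurements.

```python
def scaling_efficiency(cluster_throughput: float,
                       single_gpu_throughput: float,
                       gpu_count: int) -> float:
    """Fraction of ideal linear speedup actually achieved (1.0 = perfect)."""
    ideal = single_gpu_throughput * gpu_count
    return cluster_throughput / ideal


def effective_gpus(gpu_count: int, efficiency: float) -> float:
    """How many fully utilized GPUs the cluster is worth at this efficiency."""
    return gpu_count * efficiency


if __name__ == "__main__":
    # (GPU count, approximate efficiency) pairs from the table above.
    for count, eff in [(8, 0.95), (64, 0.85), (512, 0.75), (4096, 0.60), (16384, 0.45)]:
        print(f"{count:>6} GPUs at {eff:.0%} ≈ {effective_gpus(count, eff):,.0f} effective GPUs")
```

Read this way, the last doubling steps of a large cluster buy far fewer effective GPUs than the first ones, which is the practical cost of the declining efficiency.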
#### Memory breakdown training
| Component | Llama 3 70B (BF16) | Llama 3 8B (BF16) |
|------------|-------------------|-------------------|
| Model weights | 140 GB | 16 GB |
| Optimizer states (Adam) | 280 GB | 32 GB |
| Gradients | 140 GB | 16 GB |
| Activations (peak) | ~30 GB | ~4 GB |
| **Total (DDP)** | ~590 GB | ~68 GB |
| **Total (FSDP shard=8)** | ~74 GB | ~8.5 GB |
**Conclusion:** FSDP (Fully Sharded Data Parallel) is required for training models > 10B. With Adam, training needs roughly 4× the weight memory of inference (weights + gradients + optimizer states at 2× the weights), before activations are counted.
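
The totals in the table can be reproduced with a few lines. This is a minimal sketch that follows the table's own byte accounting (2 bytes per parameter for BF16 weights, 2 for gradients, 4 for Adam states) and, like the table, divides everything including activations by the shard count. Many mixed-precision recipes additionally keep FP32 master weights, which this sketch leaves out to stay consistent with the table; `training_memory_gb` is a hypothetical helper name.

```python
def training_memory_gb(params_billion: float, activations_gb: float,
                       shards: int = 1) -> dict:
    """Training memory in GB, using the table's bytes-per-parameter accounting."""
    weights = 2.0 * params_billion    # BF16 weights
    gradients = 2.0 * params_billion  # BF16 gradients
    optimizer = 4.0 * params_billion  # Adam first and second moments
    total = weights + gradients + optimizer + activations_gb
    return {
        "weights": weights,
        "gradients": gradients,
        "optimizer": optimizer,
        "total": total,
        # Simplification: assumes perfectly even sharding of every component.
        "per_gpu": total / shards,
    }


if __name__ == "__main__":
    print(training_memory_gb(70, activations_gb=30))            # DDP-style replica
    print(training_memory_gb(70, activations_gb=30, shards=8))  # FSDP across 8 GPUs
```

In practice activations are not sharded by FSDP, so the per-GPU figure is a lower bound rather than a guarantee.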
#### Time to train
| Model | GPU count | GPU type | Precision | Time | Cost (on-prem estimate) |
|-------|-----------|---------|-----------|------|---------------------|
| Llama 3 8B | 64 | H100 | BF16 | ~3 days | ~$5,000 |
| Llama 3 70B | 512 | H100 | BF16 | ~14 days | ~$100,000 |
| Llama 3 405B | 16,384 | H100 | BF16 | ~60 days | ~$14 M |
| DeepSeek-R1 671B (MoE) | 2,048 | H800 | BF16 | ~30 days | ~$6 M |
| GPT-4 (est.) | 25,000 | A100/H100 | Mixed | ~90–100 days | ~$100 M |
### Power and thermal limits
| Configuration | TDP limit | Throughput loss | Reason |
|-------------|-----------|------------------|--------|
| H100 SXM | 700 W (default) | 0 % | Nominal |
| H100 SXM | 600 W (-15 %) | ~5–8 % | Power capping |
| H100 SXM | 500 W (-30 %) | ~15–25 % | Significant throttling |
| H100 SXM | 400 W (-43 %) | ~30–50 % | Emergency only |
| DGX H100 (8×) | 5.6 kW (max) | 0 % | Liquid cooling required |
| DGX H100 (8×) | 4.5 kW (air) | ~10–15 % | Rear-door heat exchanger |
The GPU throttles when it exceeds its power limit or its thermal threshold (85 °C+). Throughput loss is non-linear in the power cap: a moderate cap costs little, while deeper caps cost disproportionately more.
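
One way to read the table is in terms of throughput per watt. This is a minimal sketch, assuming loss values taken as midpoints of the ranges in the table above and counting GPU power only; the names `H100_SXM_CAPS` and `relative_perf_per_watt` are hypothetical.

```python
# (power cap in watts, assumed throughput loss as a fraction of nominal).
# Losses are midpoints of the ranges in the table above; treat them as rough.
H100_SXM_CAPS = [(700, 0.0), (600, 0.065), (500, 0.20), (400, 0.40)]
NOMINAL_W = 700.0


def relative_perf_per_watt(cap_w: float, loss: float,
                           nominal_w: float = NOMINAL_W) -> float:
    """Throughput per watt relative to the uncapped GPU (1.0 = unchanged)."""
    relative_throughput = 1.0 - loss
    relative_power = cap_w / nominal_w
    return relative_throughput / relative_power


if __name__ == "__main__":
    for cap_w, loss in H100_SXM_CAPS:
        print(f"{cap_w} W: {relative_perf_per_watt(cap_w, loss):.2f}x perf/W")
```

Under these assumed midpoints a moderate cap improves efficiency per watt even though absolute throughput drops, which is why capping should be validated against the actual training job.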
### API and operational limits
| Limit | Description | Typical value |
|-------|-------|-----------------|
| **Rate limit** | Max requests per minute/hour | 100–10,000 RPM (per tier) |
| **Tokens per minute (TPM)** | Max tokens per minute | 1M–300M (per model) |
| **Context window** | Max input tokens | 4K–2M (per model) |
| **Max output tokens** | Max generated tokens | 4K–32K (per model) |
| **Concurrent requests** | Parallel request count | 10–10,000 (per backend) |
| **Batch window** | Time to accumulate batch | 0–20 s (vLLM, TGI) |
| **TTFT timeout** | Max latency to first token | 30–120 s |
| **Idle timeout** | GPU idle → scale to 0 | 5–15 min (cloud) |
### Limits per deployment model
| Dimension | On-prem HW | Managed cloud (SageMaker, Vertex) | API (OpenAI, Anthropic) |
|-----------|--------------|----------------------------------|------------------------|
| **Model size** | Limited by HBM (max 192 GB/GPU) | Unlimited (cluster scaling) | Unlimited |
| **Queries** | Limited by GPU count | Auto-scaling | Rate limit (per tier) |
| **Latency** | < 10 ms (same node) | 10–100 ms (network hop) | 100 ms – 10 s |
| **Customization** | Full (fine-tuning, quantization) | Managed (SageMaker, Bedrock) | Prompt engineering only |
| **Data privacy** | Yes (on-prem) | Contractual (region, encryption) | Limited |
| **Cost per 1M tokens** | ~$0.10–0.50 (FP16 inference) | ~$0.20–1.00 | ~$0.15–15.00 |
| **Max context** | 128K+ (depending on GPU count) | 128K+ | 32K–2M |
| **Cold start** | 0 (always-on) | 30 s – 5 min | 0 (shared infra) |
---
## GPU pricing and price/performance (2026)
> Prices are approximate — NVIDIA does not publish official datacenter GPU price lists. Cloud prices from public providers (Q2 2026). HW purchase prices vary by volume, reseller, and region.
### Purchase price (buy)
| GPU | Price/GPU | Price 8× GPU baseboard | $/PFLOPS (FP16) | Note |
|-----|---------|----------------------|----------------|------|
| **H100 SXM** | $27,000–40,000 | ~$200,000 | $25,000 | Scarcity 2023–2024, now stabilized |
| **H200 SXM** | $35,000–50,000 | ~$280,000 | ~$35,000 | H100 upgrade, HBM3e |
| **B200** | ~$60,000–70,000 | ~$500,000+ | ~$31,000 | Blackwell, FP4 support |
| **B100** | ~$30,000 | ~$240,000 | ~$20,000 | Lower price than B200, similar FP8 perf |
| **GB200** (Grace+Blackwell) | ~$70,000–100,000 | ~$2,000,000 (rack) | — | CPU+GPU unified, high-density |
| **A100 80GB** | ~$10,000–15,000 | ~$120,000 | ~$19,200 | Previous gen, still relevant |
| **MI300X** | ~$12,000–18,000 | ~$100,000 | ~$9,600 | AMD, 192 GB HBM3 |
| **Gaudi 3** | ~$15,625 | ~$125,000 | **$8,515** | Intel, best $/PFLOPS |
| **L40S** | ~$8,000–10,000 | — | — | Inference, enterprise |
### Cloud pricing (on-demand $/GPU/hr)
| GPU | Cheapest | Mid-range (CoreWeave, Lambda) | Hyperscaler (AWS, GCP, Azure) |
|-----|----------|-----------------------------|-------------------------------|
| **H100 SXM** | $1.38 (Thunder) | $2.89–3.29 | $4.15–6.88 |
| **H100 PCIe** | $2.01 (Spheron) | $2.50 | — |
| **H200 SXM** | $3.89 (Spheron) | $4.54 | $5.00+ |
| **B200** | **$3.39** (Spheron) | $6.02 | $14.24 (AWS) |
| **B200 spot** | **$2.12** (Spheron) | — | — |
| **GB200** | $3.50 (Runcrate) | $5.85 (Oracle) | $6.95 (GCP) |
| **MI300X** | **$1.50** (TensorWave) | $1.85 (Vultr) | $7.86 (Azure) |
| **A100 80GB** | $1.07 (Spheron) | $1.50–2.00 | $3.00+ |
| **Gaudi 3** | ~$1.50–2.50 | — | — |
| **L40S** | $0.91 (Spheron) | $1.50–2.00 | — |
### Inference cost ($/M tokens)
| GPU | Provider | $/hr | Est. tok/s | $/M tok |
|-----|----------|------|-----------|--------|
| **B200** | Spheron | $3.39 | ~4,000 | **$0.24** |
| **B200 spot** | Spheron | $2.12 | ~4,000 | **$0.15** |
| **H100 PCIe** | Spheron | $2.01 | ~1,200 | $0.47 |
| **A100 80GB** | Spheron | $1.07 | ~520 | $0.57 |
| **H100 SXM** | AWS | $6.88 | ~1,200 | $1.59 |
| **H200 SXM** | Spheron | $4.54 | ~1,800 | $0.70 |
| **L40S** | Spheron | $0.91 | ~450 | $0.56 |
*Values for Llama 3 70B (INT8, batch=1, output 1K tok). Actual values vary by batch size, context, and quantization.*
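
The last column follows from the hourly price and the estimated token rate. This is a minimal sketch of that arithmetic; `cost_per_million_tokens` is a hypothetical helper name and the inputs are the approximate figures from the table.

```python
def cost_per_million_tokens(price_per_hour: float, tokens_per_second: float) -> float:
    """Dollar cost of generating one million tokens on a rented GPU."""
    tokens_per_hour = tokens_per_second * 3600
    return price_per_hour / tokens_per_hour * 1_000_000


if __name__ == "__main__":
    # (label, $/hr, estimated tok/s) taken from the table above.
    rows = [
        ("H100 PCIe", 2.01, 1200),
        ("A100 80GB", 1.07, 520),
        ("H100 SXM (AWS)", 6.88, 1200),
        ("L40S", 0.91, 450),
    ]
    for label, price, rate in rows:
        print(f"{label}: ${cost_per_million_tokens(price, rate):.2f}/M tok")
```

Because the token rate sits in the denominator, batching (which raises tokens per second on the same GPU) lowers the cost per token far more than small differences in hourly price.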
### Cost per GB HBM
| GPU | HBM | Price/hr cloud | $/GB/hr | Best for memory-bound workloads |
|-----|-----|-------------|--------|--------------------------------|
| **MI300X** | 192 GB | $1.50 | **$0.0078** | ✅ Best |
| **B200** | 192 GB | $3.39 | $0.0177 | ✅ Good |
| **H200** | 141 GB | $3.89 | $0.0276 | ⚠️ |
| **H100 SXM** | 80 GB | $1.38 | $0.0173 | ⚠️ Only up to 70B models |
| **GB200** | 384 GB | $3.50 | $0.0091 | ✅✅ (2× MI300X capacity) |
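
The $/GB/hr column is the hourly price divided by HBM capacity. This is a minimal sketch that recomputes it and ranks the GPUs; the dictionary simply restates the table's approximate values, and `cost_per_gb_hour` is a hypothetical helper name.

```python
def cost_per_gb_hour(price_per_hour: float, hbm_gb: float) -> float:
    """Hourly cloud price per GB of HBM."""
    return price_per_hour / hbm_gb


# GPU -> (HBM in GB, cloud price in $/hr), restated from the table above.
GPUS = {
    "MI300X": (192, 1.50),
    "B200": (192, 3.39),
    "H200": (141, 3.89),
    "H100 SXM": (80, 1.38),
    "GB200": (384, 3.50),
}

if __name__ == "__main__":
    ranked = sorted(GPUS.items(), key=lambda kv: cost_per_gb_hour(kv[1][1], kv[1][0]))
    for name, (hbm, price) in ranked:
        print(f"{name}: ${cost_per_gb_hour(price, hbm):.4f}/GB/hr")
```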
### Price/performance by scenario
| Scenario | Winner | Rationale |
|----------|--------|-----------|
| **Absolute performance** (cost no object) | **GB200 DGX NVL72** | 72× GPU, 18 PFLOPS FP8, 384 GB HBM/GPU |
| **Cloud inference** — best $/token | **B200 spot** | $0.15/M tok; 4× H100 throughput at lower cost |
| **Cloud inference** (on-demand) | **B200** | $0.24/M tok |
| **Cloud inference** (budget) | **A100 / L40S** | $0.56–0.57/M tok |
| **Training** (price/perf on purchase) | **Gaudi 3** | $8,515/PFLOPS, 2.5–3× better than H100 |
| **Training** (cloud) | **H100 SXM** | $1.38/hr, CUDA ecosystem, NCCL |
| **Memory-bound** (long context, 70B+) | **MI300X / GB200** | 192–384 GB, $0.0078–0.0091/GB/hr |
| **Ecosystem + safe choice** | **H100/H200** | CUDA, widest SW, NVIDIA tools |
| **Spot / preemptible** (lowest cost) | **A100 / H100** | $1.07–1.38/hr, 50–90% off on-demand |
### 2026 Trends
- **H100**: price dropped 64% from the $8/hr peak to $1.38–2.89/hr, then rebounded 40% on inference demand
- **B200**: the new high end at $3.39/hr in the cloud and ~$0.15/M tok on spot, the new inference benchmark
- **MI300X**: supply is growing (TensorWave, Vultr, CoreWeave, Oracle, Azure), from $1.50/hr
- **Gaudi 3**: best $/PFLOPS on purchase, but a narrow ecosystem and limited cloud availability
- **Market bifurcation**: the prior generation (H100, A100) is commoditizing while the new generation (B200, GB200) commands a premium
- [GPU.en.md](GPU.en.md) — GPU architecture, NVIDIA/AMD, vGPU, MIG
- [NETWORKING.en.md](NETWORKING.en.md) — InfiniBand, RoCE, network topology
- [STORAGE.en.md](STORAGE.en.md) — parallel filesystem, object store
- [DATACENTERS.en.md](DATACENTERS.en.md) — DC layout, power, cooling
- [CLOUD.en.md](CLOUD.en.md) — cloud AI services (SageMaker, Vertex AI)
## Sources
Links, books, and standards: [sources/infrastructure/sources.en.md](sources/infrastructure/sources.en.md)
*Last revision: 2026-06-18*

602
AI-INFRASTRUCTURE.md Normal file

@@ -0,0 +1,602 @@
# 🧠 Infrastructure for AI/ML
## Component overview
```mermaid
flowchart TD
subgraph Compute
GPU["GPU (H100/B200/Instinct)"]
CPU["CPU (AMD EPYC / Intel Xeon)"]
ASIC["ASIC (TPU, Trainium, Inferentia)"]
end
subgraph Network
IB["InfiniBand NDR/XDR"]
ROCE["RoCEv2"]
NVL["NVLink / NVSwitch"]
end
subgraph Storage
FS["Parallel FS (Lustre, GPFS, Weka)"]
OBJ["Object Store (S3, MinIO)"]
NVME["Local NVMe cache"]
end
subgraph Orchestration
S["Slurm"]
K["Kubernetes + Volcano/Kueue"]
end
subgraph Cooling
DLC["Direct-to-chip liquid"]
IMM["Immersion"]
AIR["Air (high-density)"]
end
Compute --> Network --> Storage
Orchestration --> Compute
Cooling --> Compute
```
---
## GPU compute
### NVIDIA
| GPU | Architecture | FP8 | FP16/BF16 | FP64 | HBM | NVLink | TDP | Rack |
|-----|-------------|-----|-----------|------|-----|--------|-----|------|
| **H100 SXM** | Hopper | 3,958 TFLOPS | 1,979 TFLOPS | 67 TFLOPS | 80 GB HBM3 | 900 GB/s | 700 W | 6–8× in DGX H100 |
| **H200 SXM** | Hopper (HBM3e) | 3,958 TFLOPS | 1,979 TFLOPS | 67 TFLOPS | 141 GB HBM3e | 900 GB/s | 700 W | 6–8× in DGX H200 |
| **B200** | Blackwell | ~9,000 TFLOPS | ~4,500 TFLOPS | ~40 TFLOPS | 192 GB HBM3e | 1,800 GB/s | 1,000 W | 6–8× in DGX B200 |
| **GB200 Grace Blackwell** | Blackwell | ~18,000 TFLOPS | ~9,000 TFLOPS | — | 192 GB + 480 GB (Grace) | NVLink-C2C | 1,000 W (GPU) + 500 W (CPU) | DGX GB200 (36× GPU) |
| **L40S** | Ada Lovelace | 733 TFLOPS | 367 TFLOPS | — | 48 GB GDDR6 | N/A | 350 W | Inference, enterprise |
| **A100 SXM** | Ampere | 1,248 TOPS (INT8; no FP8 support) | 624 TFLOPS | 19.5 TFLOPS | 80 GB HBM2e | 600 GB/s | 400 W | DGX A100 |
### AMD
| GPU | Architecture | FP8 | FP16/BF16 | FP64 | HBM | Infinity Fabric | TDP |
|-----|-------------|-----|-----------|------|-----|----------------|-----|
| **MI300X** | CDNA 3 | 2,615 TFLOPS | 1,307 TFLOPS | 81 TFLOPS | 192 GB HBM3 | 896 GB/s | 750 W |
| **MI250** | CDNA 2 | — | 383 TFLOPS | 95.7 TFLOPS | 128 GB HBM2e | 400 GB/s | 500 W |
### Intel
| GPU | Architecture | FP16/BF16 | FP32 | HBM | TDP |
|-----|-------------|-----------|------|-----|-----|
| **Gaudi 3** | Custom | 1,835 TFLOPS | — | 128 GB HBM2e | 600 W |
| **Max 1550** | Xe HPC | 600+ TFLOPS | 200 TFLOPS | 128 GB HBM2e | 600 W |
### Cloud ASIC
| ASIC | Provider | Use case | Performance |
|------|----------|----------|-------|
| **TPU v5p** | Google | Training | ~4,600 TFLOPS (BF16) per pod |
| **Trainium 2** | AWS | Training | ~1,000 TFLOPS (BF16) per chip |
| **Inferentia 2** | AWS | Inference | ~400 TOPS (INT8) per chip |
| **Maia 100** | Microsoft | Training + inference | Custom, 800 W TDP |
---
## AI networking
### Technology comparison
| Technology | Bandwidth per link | Latency | Topology | Use case |
|-------------|-------------------|---------|-----------|----------|
| **InfiniBand NDR200** | 200 Gb/s | < 1 µs | Fat-tree, Dragonfly+ | Training (NVIDIA) |
| **InfiniBand NDR400** | 400 Gb/s | < 1 µs | Fat-tree, Dragonfly+ | Training (NVIDIA) |
| **InfiniBand XDR** | 800 Gb/s (planned) | < 1 µs | Dragonfly+ | Next-gen training |
| **RoCEv2** (CX-7/8) | 200–400 Gb/s | 1–2 µs | Fat-tree, Spine-leaf | Training (AMD, Intel, open) |
| **NVLink 4.0** | 900 GB/s per GPU | < 0.5 µs | NVSwitch full-mesh | Intra-node GPU comm |
| **NVLink 5.0** | 1,800 GB/s per GPU | < 0.5 µs | NVSwitch full-mesh | Intra-node (Blackwell) |
| **Ethernet (400 GbE)** | 400 Gb/s | 2–5 µs | Spine-leaf | Inference, data pipeline |
### AI fabric principles
- **Rail-optimized topology**: each GPU communicates on a dedicated "rail" (GPUs with the same index across nodes connect to the same switch)
- **Fat-tree (Clos)**: the standard for InfiniBand and RoCE, with non-blocking bisection bandwidth
- **Dragonfly+**: reduces the hop count while preserving bandwidth (used in the largest clusters)
- **GPU Direct RDMA**: direct GPU ↔ GPU communication without CPU involvement, supported on InfiniBand and RoCE
- **SHARP (Scalable Hierarchical Aggregation and Reduction Protocol)**: in-network reduction for AllReduce (InfiniBand only)
### Bandwidth sizing
```text
Rule of thumb: InfiniBand bandwidth ≥ 50 % of GPU HBM bandwidth for scalable training
Example: H100 has 3.35 TB/s of HBM bandwidth
→ Needs at least ~1.6 TB/s of bisection bandwidth per GPU
→ 8× H100 in a DGX: 4× NDR400 IB per GPU = 4 × 50 GB/s = 200 GB/s = ~6 % of HBM
→ In practice: 8× 200 Gb/s per node (25 GB/s per GPU) in a typical configuration = ~0.75 % of HBM → bottleneck
```
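
The ratio used in the example above can be computed for any link speed. This is a minimal sketch; `network_to_hbm_ratio` is a hypothetical helper, and the 50 % target is the rule of thumb stated in this document rather than a vendor requirement.

```python
def network_to_hbm_ratio(link_gbit_per_s: float, links_per_gpu: int,
                         hbm_tb_per_s: float) -> float:
    """Per-GPU network bandwidth as a fraction of that GPU's HBM bandwidth."""
    network_gb_per_s = link_gbit_per_s / 8 * links_per_gpu  # Gb/s -> GB/s
    hbm_gb_per_s = hbm_tb_per_s * 1000                      # TB/s -> GB/s
    return network_gb_per_s / hbm_gb_per_s


if __name__ == "__main__":
    h100_hbm = 3.35  # TB/s, from the example above
    print(f"4x NDR400 per GPU: {network_to_hbm_ratio(400, 4, h100_hbm):.1%}")
    print(f"1x 200 Gb/s per GPU: {network_to_hbm_ratio(200, 1, h100_hbm):.2%}")
```

Both results are far below the 50 % target, which is why collective-communication efficiency (NCCL, SHARP, rail-optimized topology) matters more than raw link count.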
---
## AI storage
### Requirements
| Dataset size | IO pattern | Recommended storage | Bandwidth |
|-------------|-----------|-------------------|-----------|
| < 10 TB | Sequential read (data loading) | Local NVMe | > 10 GB/s per node |
| 10–100 TB | Random read (checkpointing) | Parallel FS (Lustre, Weka) | > 100 GB/s cluster-wide |
| 100 TB – 10 PB | Mixed (training + checkpoint) | Parallel FS + object store | > 500 GB/s |
| 10 PB+ | Multi-modal, video, LLM | Tiered (NVMe cache + parallel FS + object) | > 1 TB/s |
### Storage solution comparison
| Solution | Type | Bandwidth per node | Max capacity | Scaling | Use case |
|--------|-----|-------------------|-------------|-----------|----------|
| **Lustre** | Parallel FS (POSIX) | > 100 GB/s (cluster) | 100s PB | OST + MDS | HPC, LLM training (standard) |
| **GPFS / StorageScale** | Parallel FS (POSIX) | > 100 GB/s | 100s PB | NSD servers | HPC, AI (IBM) |
| **WekaFS** | Parallel FS (POSIX + NFS/SMB) | ~80 GB/s per 10 nodes | 10s PB | Container-native | AI/ML, NVIDIA DGX preferred |
| **VAST Data** | Universal storage (NVMe + QLC) | ~100 GB/s per cluster | 10s PB | Scale-out | AI, checkpoint, data lake |
| **Pure Storage//E** | All-flash (NVMe) | ~50 GB/s | ~30 PB | Scale-out | Enterprise AI, database |
| **MinIO / S3** | Object store | ~20 GB/s per gateway | EB | Erasure coding | Dataset repository, checkpoint |
| **NetApp AFF** | NAS + S3 | ~10 GB/s per controller | ~50 PB | HA pair | Enterprise, NFS baseline |
### Checkpointing strategies
| Strategy | RPO | Storage impact | Description |
|-----------|-----|---------------|-------|
| **Full checkpoint** | every N steps | High (pauses training) | Entire model + optimizer state |
| **Async checkpoint** | every N steps | Medium (non-blocking) | Copy to a staging buffer, written in the background |
| **Distributed checkpoint** (NVIDIA NeMo) | every N steps | Low | Each rank writes its own shard |
| **In-memory checkpoint** (IBM) | on failover | Minimal (DRAM) | Replication into the DRAM of another node |
| **Continuous checkpoint** (Microsoft) | every 15 min | Low (delta) | Only changed shards |
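
How much a full, blocking checkpoint costs depends on its size and the storage bandwidth. This is a minimal sketch, assuming a 70B checkpoint of 140 GB of BF16 weights plus 280 GB of Adam states (figures used elsewhere in this document) and hypothetical storage tiers; both function names are illustrative.

```python
def checkpoint_write_seconds(checkpoint_gb: float, storage_gb_per_s: float) -> float:
    """Time to write one full checkpoint at a sustained storage bandwidth."""
    return checkpoint_gb / storage_gb_per_s


def blocking_overhead(write_seconds: float, interval_seconds: float) -> float:
    """Fraction of wall-clock time lost if training pauses for every checkpoint."""
    return write_seconds / (interval_seconds + write_seconds)


if __name__ == "__main__":
    size_gb = 140 + 280  # assumed: weights + optimizer states for a 70B model
    for bandwidth in (10, 100, 500):  # GB/s, hypothetical storage tiers
        t = checkpoint_write_seconds(size_gb, bandwidth)
        share = blocking_overhead(t, 15 * 60)
        print(f"{bandwidth} GB/s: {t:.1f} s per checkpoint, "
              f"{share:.2%} overhead at a 15 min interval")
```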
---
## AI cluster architecture
### Physical topology: DGX H100 example
```
┌──────── DGX H100 (8× GPU) ────────┐
│ ┌─────┐ ┌─────┐ ┌─────┐ ┌─────┐ │
│ │GPU 0│ │GPU 1│ │GPU 2│ │GPU 3│ │
│ └──┬──┘ └──┬──┘ └──┬──┘ └──┬──┘ │
│ ┌──┴──┐ ┌──┴──┐ ┌──┴──┐ ┌──┴──┐ │
│ │GPU 4│ │GPU 5│ │GPU 6│ │GPU 7│ │
│ └─────┘ └─────┘ └─────┘ └─────┘ │
│ NVSwitch (NVLink 4.0, 900 GB/s) │
│ InfiniBand CX-7: 8× NDR400 │
└────────────────────────────────────┘
│ 8× IB rails
┌────┴──────────────┐
│ IB NDR400 Switches │ (rail-optimized)
└────────────────────┘
```
### Kubernetes for AI
| Component | Role |
|-----------|------|
| **Volcano** | Batch scheduling, gang scheduling, queue management |
| **Kueue** | Multi-tenant admission, resource quotas, fair sharing |
| **NVIDIA GPU Operator** | Driver, container toolkit, MIG, DCGM, monitoring |
| **HAMi** (ex k8s-vGPU-scheduler) | GPU sharing, MIG partitioning, fractional GPU |
| **Node Feature Discovery** | Detects GPU type and NUMA topology |
| **Topology Manager** | NUMA-aware pod placement |
| **DPDK / SR-IOV** | High-performance networking for GPU Direct RDMA |
### Slurm for AI
| Component | Role |
|-----------|------|
| **slurm.conf** | Partitions for GPU nodes, GRES (Generic Resource) definitions |
| **gres.conf** | GPU type, number of GPUs per node |
| **srun --gres=gpu:8** | Allocates 8 GPUs for a job |
| **sbatch --nodes=64 --ntasks=512** | 64 nodes, 512 ranks (8 GPUs/node) |
| **Pyxis** | NVIDIA container plugin for Slurm (used together with Enroot) |
---
## Cooling AI clusters
### Power density comparison
| Configuration | TDP per node | Racks | kW/rack | Note |
|-------------|-------------|-------|---------|----------|
| Standard server (2U) | 1 kW | 20 | 5–10 | Typical DC |
| GPU server (DGX H100, 6×) | 42 kW | 6 | 45–50 | Air cooling limit |
| GPU server (DGX B200, 6×) | 72 kW | 6 | 90–100 | Liquid cooling required |
| GPU server (GB200 NVL72) | 120 kW | — | ~120 | Liquid cooling mandatory |
| NVIDIA NVL72 rack | 120 kW | 1 | 120 | Fully liquid cooled |
### Cooling technologies
| Method | Max kW/rack | CAPEX | OPEX | Complexity |
|--------|-------------|-------|------|-----------|
| **Air cooling (CRAC/CRAH)** | < 15 | Low | Medium | Low |
| **Air cooling (in-row)** | 15–30 | Medium | Medium | Low |
| **Rear-door heat exchanger** | 30–50 | Medium | Low | Medium |
| **Direct-to-chip liquid (cold plate)** | 50–150 | High | Low | High |
| **Immersion (single-phase)** | 100–200 | High | Low | High |
| **Immersion (two-phase)** | 200+ | Very high | Low | Very high |
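
The table can be used as a simple lookup from rack power density to the least complex cooling method that still covers it. This is a minimal sketch, assuming the upper bound of each range in the table as the cut-off; `COOLING_LIMITS` and `pick_cooling` are hypothetical names, and a real design would also weigh CAPEX, OPEX and facility constraints.

```python
# (method, assumed maximum kW per rack), ordered from least to most complex.
COOLING_LIMITS = [
    ("Air cooling (CRAC/CRAH)", 15),
    ("Air cooling (in-row)", 30),
    ("Rear-door heat exchanger", 50),
    ("Direct-to-chip liquid (cold plate)", 150),
    ("Immersion (single-phase)", 200),
]
FALLBACK = "Immersion (two-phase)"


def pick_cooling(kw_per_rack: float) -> str:
    """Return the first (least complex) method whose limit covers the rack."""
    for method, max_kw in COOLING_LIMITS:
        if kw_per_rack <= max_kw:
            return method
    return FALLBACK


if __name__ == "__main__":
    for density in (8, 25, 45, 120, 250):
        print(f"{density} kW/rack -> {pick_cooling(density)}")
```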
---
## Inference infrastructure
### Inference server comparison
| Tool | Frameworks | Optimizations | Use case |
|---------|-----------|-------------|----------|
| **vLLM** | Megatron, HF, AWQ, GPTQ | PagedAttention, KV cache, continuous batching | LLM inference (open source) |
| **TensorRT-LLM** | TensorRT | INT4/INT8/FP8, inflight batching, attention optimizations | Production (NVIDIA) |
| **Triton Inference Server** | All (TensorRT, vLLM, PyTorch) | Model ensemble, model caching, concurrent execution | Enterprise, multi-model |
| **SageMaker** | Managed | Auto-scaling, model parallelism | AWS managed |
| **OpenAI API / TGI** | HF Transformers | Continuous batching, flash attention | Hosting |
### Inference optimizations
| Technique | Latency improvement | Throughput improvement | Memory reduction |
|----------|-----------------|---------------------|------------------|
| **FP8/INT8 quantization** | — | 2× | 2× |
| **INT4 quantization** | — | 4× | 4× |
| **Flash Attention 2/3** | 2–4× | — | 50 % (KV cache) |
| **PagedAttention** | — | 2–5× | 95 % (KV cache fragmentation) |
| **Continuous batching** | — | 10–20× | — |
| **Speculative decoding** | 2–3× | — | — |
| **Multi-LoRA / S-LoRA** | — | 8–16× | — |
---
## Distributed training techniques
| Technique | Description | Frameworks |
|----------|-------|------------|
| **Data Parallelism (DDP/FSDP)** | Each GPU holds a copy of the model and processes a different batch (FSDP additionally shards that copy) | PyTorch DDP, FSDP |
| **Tensor Parallelism (TP)** | The weight matrices of each layer are split across GPUs (intra-node) | Megatron-LM, DeepSpeed |
| **Pipeline Parallelism (PP)** | Layers are split into stages across nodes | Megatron-LM, DeepSpeed |
| **Sequence Parallelism (SP)** | The sequence is split across GPUs | Megatron-LM |
| **Expert Parallelism (EP)** | Different expert sub-networks sit on different GPUs | Mixture-of-Experts (MoE) |
| **3D Parallelism** | Combination of TP + PP + DP | Megatron-LM, NeMo |
| **ZeRO (1/2/3)** | Sharding of optimizer states / gradients / parameters | DeepSpeed |
| **NCCL / RCCL** | GPU collective communication library | NVIDIA/AMD |
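
The effect of the three ZeRO stages on per-GPU memory can be estimated with simple arithmetic. This is a minimal sketch, assuming the byte accounting used elsewhere in this document (2 bytes per parameter for BF16 weights, 2 for gradients, 4 for Adam states) and ignoring activations; `zero_memory_per_gpu_gb` is a hypothetical helper, not a DeepSpeed API.

```python
def zero_memory_per_gpu_gb(params_billion: float, gpus: int, stage: int) -> float:
    """Model-state memory per GPU in GB for ZeRO stage 0 (plain DDP) to 3."""
    if stage not in (0, 1, 2, 3):
        raise ValueError("stage must be 0, 1, 2 or 3")
    weights = 2.0 * params_billion    # BF16 weights (assumed accounting)
    gradients = 2.0 * params_billion  # BF16 gradients
    optimizer = 4.0 * params_billion  # Adam moments
    if stage >= 1:
        optimizer /= gpus  # stage 1 shards optimizer states
    if stage >= 2:
        gradients /= gpus  # stage 2 additionally shards gradients
    if stage >= 3:
        weights /= gpus    # stage 3 additionally shards parameters
    return weights + gradients + optimizer


if __name__ == "__main__":
    for stage in range(4):
        gb = zero_memory_per_gpu_gb(70, gpus=8, stage=stage)
        print(f"ZeRO-{stage}: {gb:.1f} GB per GPU for a 70B model on 8 GPUs")
```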
---
## Operating systems for AI
### Distribution comparison
| OS | GPU driver | CUDA | Container toolkit | IB/RoCE | Lustre client | Production support |
|----|-----------|------|-------------------|---------|--------------|-------------------|
| **Ubuntu 22.04 LTS** | NVIDIA 525+ | 12.x | nvidia-container-toolkit | MLNX_OFED, rdma-core | Yes (lustre-client) | NVIDIA DGX standard |
| **Ubuntu 24.04 LTS** | NVIDIA 550+ | 12.5+ | nvidia-container-toolkit | MLNX_OFED, rdma-core | Yes | Newest GPU support |
| **RHEL 9 / Rocky 9** | NVIDIA 525+ | 12.x | nvidia-container-toolkit | MLNX_OFED | Yes (EL repo) | Red Hat, enterprise |
| **DGX OS** (Ubuntu-based) | NVIDIA custom | 12.x | Pre-installed | Pre-configured | Yes | The only OS NVIDIA supports on DGX |
| **SLES 15 SP5** | NVIDIA 525+ | 12.x | nvidia-container-toolkit | MLNX_OFED | Yes | HPC, some Lustre clusters |
| **Debian 12** | NVIDIA 525+ | 12.x | nvidia-container-toolkit | rdma-core | Yes (backports) | Community, research |
| **Flatcar / Bottlerocket** | Container-host | — | nvidia-container-toolkit | Limited | No | K8s-only, minimal footprint |
### Constraints and limits
#### GPU drivers and CUDA
| Constraint | Detail |
|----------|--------|
| **Driver-CUDA compatibility** | The NVIDIA driver version must meet the minimum required by the CUDA toolkit (driver ≥ CUDA requirement). E.g. CUDA 12.5 requires driver ≥ 550 |
| **Kernel version** | The NVIDIA driver is not compatible with every kernel. A new kernel (6.8+) may require a DKMS build or have to wait for delayed support |
| **Secure Boot** | The NVIDIA driver requires a signed module (MOK, shim) or Secure Boot disabled; a frequent problem in enterprise environments |
| **Open vs proprietary driver** | NVIDIA `nvidia-open` (since R515) is the open-source kernel module. GPU support: data center (H100+) → OK, older GPUs → proprietary driver required |
| **nvidia-persistenced** | Needed to keep the GPU initialized; without it the GPUs are deinitialized after an idle timeout (`nvidia-smi -pm 1`) |
| **GPU reset** | After a training job crashes the GPU may hang. Use `nvidia-smi --gpu-reset` or reboot the node; sometimes a power cycle is needed |
| **Multi-Instance GPU (MIG)** | Requires a specific driver, MIG mode enabled on the GPU, and a GPU reset. Cannot be changed at runtime. Supported only on A100, H100, B200 |
#### Network (InfiniBand / RoCE)
| Constraint | Detail |
|----------|--------|
| **MLNX_OFED vs rdma-core** | MLNX_OFED (NVIDIA): full support, but it ships its own kernel modules, which must be compatible with the kernel version. `rdma-core` (open): limited support, but no extra modules |
| **Kernel compatibility** | MLNX_OFED supports only specific kernel versions (major.minor). Kernel upgrade → MLNX_OFED must be rebuilt |
| **NCCL** | The NCCL version must be compatible with CUDA and the IB firmware. Use `nccl-tests` for validation |
| **SHARP** | In-network reduction requires a specific combination of MLNX_OFED and IB switch firmware |
| **GPU Direct RDMA** | Requires the `nvidia-peermem` module + MLNX_OFED. Does not work with every GPU and IB adapter |
| **RoCE with PFC/ECN** | RoCE requires a lossless fabric (PFC, ECN, DCQCN). Both switch and host must be configured, which makes tuning complex |
#### Storage
| Constraint | Detail |
|----------|--------|
| **Lustre client** | The client version must match the server. Server upgrade → upgrade of all clients. Compatible only with RHEL/Debian derivatives |
| **POSIX locking** | NFS and Lustre differ in POSIX locking behavior. Distributed training relies on flock → problems on mixed filesystems |
| **Filesystem cache** | The page cache can mask an IO bottleneck. Training jobs often need `O_DIRECT` or `sync` IO |
| **Local NVMe vs parallel FS** | Staging datasets on local NVMe removes the network dependency, but needs space and a pre-fetch pipeline |
#### Container runtime
| Constraint | Detail |
|----------|--------|
| **Docker + GPU** | `nvidia-container-toolkit` (formerly nvidia-docker2). The runtime must be installed and configured in `/etc/docker/daemon.json` |
| **Podman + GPU** | Requires `nvidia-container-toolkit` + a Podman hook. Less tested than Docker |
| **containerd + GPU** | The standard for K8s. Requires `cdi` (Container Device Interface) or `nvidia-container-runtime` |
| **Enroot + Pyxis** | NVIDIA container stack for Slurm (Enroot = daemonless container runtime, Pyxis = Slurm plugin) |
| **User namespace mapping** | GPU access from containers requires the device cgroup, and rootless mode may fail (exception for /dev/dri and /dev/nvidia*) |
#### Kernel parameters
```text
# AI workload recommended sysctl
net.core.rmem_max = 134217728 # large enough for NCCL
net.core.wmem_max = 134217728
net.ipv4.tcp_rmem = 4096 87380 134217728
net.ipv4.tcp_wmem = 4096 65536 134217728
net.core.netdev_budget = 600 # for high packet rates
vm.max_map_count = 1048576 # PyTorch DataLoader workers
kernel.numa_balancing = 0 # disable NUMA balancing (it disrupts locality)
kernel.sched_min_granularity_ns = 10000000
# Kernel command-line (boot) parameters, not sysctl
# mitigations=off disables security mitigations for performance (dedicated AI clusters only)
mitigations=off
transparent_hugepage=never # or madvise; THP can cause latency spikes
intel_idle.max_cstate=1 # reduces C-state transition latency
```
#### Firmware and HW
| Constraint | Detail |
|----------|--------|
| **GPU firmware (VBIOS)** | NVIDIA datacenter GPUs (H100, B200) receive VBIOS updates via NVFlash. Without the update → missing partitioning support or newer CUDA features |
| **InfiniBand firmware** | IB switch and HCA firmware must be compatible. Old switch + new HCA → degraded performance |
| **NVSwitch firmware** | On DGX systems the NVSwitch firmware can be updated only through NVIDIA DGX tools |
| **Power capping (nvidia-smi)** | `nvidia-smi -pl <power>` limits the power draw for power budget management. Test the impact on training throughput |
| **GPU clock locking** | `nvidia-smi -lgc <min,max>` locks the GPU clock range; `nvidia-smi -ac <mem,graphics>` sets application clocks. Use for stable benchmarks. Enable persistence mode first (`nvidia-persistenced` or `nvidia-smi -pm 1`), otherwise the setting is lost when the driver unloads |
| **PCIe Gen** | GPU in a PCIe Gen4 slot (instead of Gen5) → bottleneck for CPU↔GPU data transfer. Important for FSDP sharding |
### Recommended OS per use case
| Use case | OS | Rationale |
|----------|-----|-------|
| **DGX cluster (production)** | DGX OS / Ubuntu 22.04 LTS | NVIDIA standard, best driver support |
| **Enterprise K8s (OpenShift)** | RHEL 9 / RHCOS | Red Hat support, GPU Operator compatible |
| **Vanilla K8s (on-prem)** | Ubuntu 22.04 LTS + Flatcar (workers) | Widest community support, Flatcar for minimal footprint |
| **Slurm cluster (HPC/AI)** | Rocky Linux 9 / Ubuntu 22.04 LTS | EL ecosystem (Lustre, OFED) or Ubuntu (community) |
| **Research / rapid prototyping** | Ubuntu 24.04 LTS | Latest CUDA, PyTorch, driver support |
| **Edge inference** | NVIDIA JetPack / Ubuntu (ARM) | Embedded GPU (Jetson Orin, AGX) |
---
## AI-ready data center: check-list
| Area | Requirement |
|--------|-----------|
| **Power** | 30–120 kW/rack, HVDC (400 V DC), UPS supporting GPU spikes |
| **Cooling** | Liquid cooling ready (direct-to-chip), rear-door for 30+ kW |
| **Network** | InfiniBand (NDR/XDR) or RoCEv2, rail-optimized fat-tree |
| **Storage** | Parallel FS (Lustre/Weka), checkpoint bandwidth > 100 GB/s |
| **GPU density** | Max GPU/rack, minimize NVSwitch hops |
| **Physical** | Floor load capacity 1,500+ kg/m², rack 52U–60U |
| **Security** | Tenant isolation, network segmentation, data encryption |
| **Monitoring** | DCGM, NCCL health checks, thermals, power capping |
---
## Model and throughput limitations
### Model size per GPU
The maximum model size that fits on a single GPU depends on HBM capacity and precision:
| GPU | HBM | FP32 | FP16/BF16 | INT8 | INT4 |
|-----|-----|------|-----------|------|------|
| **H100 80GB** | 80 GB | ~20B | ~40B | ~80B | ~160B |
| **H200 141GB** | 141 GB | ~35B | ~70B | ~140B | ~280B |
| **B200 192GB** | 192 GB | ~48B | ~96B | ~192B | ~384B |
| **MI300X 192GB** | 192 GB | ~48B | ~96B | ~192B | ~384B |
| **A100 80GB** | 80 GB | ~20B | ~40B | ~80B | ~160B |
| **GB200 (192+480)** | 192 GB GPU + 480 GB Grace | — | ~96B + CPU offload | — | — |
*Values are approximate: 1B parameters ≈ 2 GB FP16 ≈ 4 GB FP32 ≈ 1 GB INT8 ≈ 0.5 GB INT4. In practice subtract ~10–15 % of HBM for activations, KV cache, optimizer states.*
### Memory breakdown inference
| Component | Llama 3 70B (FP16) | Llama 3 8B (FP16) |
|------------|-------------------|-------------------|
| Model weights | 140 GB | 16 GB |
| KV cache (4K context, batch 1) | ~2 GB | ~0.2 GB |
| KV cache (128K context, batch 1) | ~60 GB | ~6.5 GB |
| Activations (peak) | ~5 GB | ~1 GB |
| **Total 4K ctx** | ~147 GB | ~17 GB |
| **Total 128K ctx** | ~205 GB | ~23 GB |
**Conclusion:** Llama 3 70B in FP16 (140 GB of weights) does not fit on a single H100 (80 GB). Options: INT8 (~70 GB of weights, too tight for one 80 GB GPU once KV cache is added → 2× H100 or 1× H200), INT4 (~35 GB → 1× H100), or tensor parallelism across several GPUs.
### Context length vs memory
| Context | KV cache 70B (FP16) | KV cache 8B (FP16) | Note |
|---------|-------------------|-------------------|----------|
| 4K | ~2.2 GB | ~0.25 GB | Typical chat |
| 32K | ~18 GB | ~2 GB | Documents |
| 128K | ~72 GB | ~8 GB | Long-context (Claude, Gemini) |
| 1M | ~560 GB | ~64 GB | Experimental (Gemini 1.5 Pro) |
KV cache size grows **linearly with context length**, and also linearly with batch size, layer count, and the number of KV heads. It is the attention compute, not the cache, that grows quadratically with context length. The cache is the critical memory consumer for long-context inference.
### Inference throughput
| Model | GPU | Precision | Batch size | Tokens/s (aggregate) | Tokens/s per request | QPS (1K output) |
|-------|-----|-----------|-----------|----------|----------|-----------------|
| Llama 3 8B | H100 | FP16 | 1 | ~800 | ~800 | ~0.8 |
| Llama 3 8B | H100 | FP16 | 128 | ~4,500 | ~35 | ~4.5 |
| Llama 3 8B | H100 | INT4 | 128 | ~8,000 | ~62 | ~8 |
| Llama 3 70B | 4× H100 | FP16 | 1 | ~180 | ~180 | ~0.18 |
| Llama 3 70B | 4× H100 | INT4 | 64 | ~1,200 | ~19 | ~1.2 |
| Llama 3 70B | 8× H100 | FP16 (TP=8) | 128 | ~2,500 | ~20 | ~2.5 |
| DeepSeek-R1 671B | 8× H200 | FP8 (MoE) | 64 | ~500 | ~8 | ~0.5 |
| GPT-4 class (est.) | n/a | n/a | n/a | ~100–300 | n/a | ~0.1–0.3 |
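**Notes:**
- QPS (queries per second) depends on output length; here one query is taken as 1K output tokens, so QPS = aggregate tokens/s ÷ 1,000
- A larger batch size raises aggregate throughput, but it also raises TTFT (time to first token) and lowers tokens/s per request
- Tensor parallelism (TP) scales a model across GPUs, but the communication overhead grows with the TP degree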
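The two derived throughput figures, queries per second and per-request speed, follow from the aggregate token rate by simple division. A minimal sketch with illustrative function names:

```python
def qps(aggregate_tokens_per_s: float, output_tokens: int = 1000) -> float:
    """Queries per second when each query produces output_tokens tokens."""
    return aggregate_tokens_per_s / output_tokens


def per_request_tokens_per_s(aggregate_tokens_per_s: float, batch_size: int) -> float:
    """Generation speed seen by one request inside a batch."""
    return aggregate_tokens_per_s / batch_size


# Example: ~4,500 tokens/s aggregate at batch size 128.
print(qps(4500), per_request_tokens_per_s(4500, 128))
```

Batching therefore trades per-request speed for total capacity: the aggregate rate rises while each individual request streams more slowly.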
### Training limits
#### Scaling efficiency
| GPU count | Model | Efficiency | Reason |
|-----------|-------|-----------|-------|
| 8 (1 node) | Llama 3 8B | ~95 % | NVLink intra-node |
| 64 (8 nodes) | Llama 3 8B | ~85 % | InfiniBand inter-node |
| 512 (64 nodes) | Llama 3 70B | ~75 % | Communication overhead |
| 4,096 (512 nodes) | Llama 3 70B | ~60 % | Pipeline bubble, network |
| 16,384 (2,048 nodes) | Llama 3 405B | ~45 % | Synchronous SGD overhead |
**Note:** Efficiency = actual throughput / ideal throughput under linear scaling. In this table it falls roughly logarithmically with GPU count, by about 10–15 percentage points for each 4–8× increase.
#### Memory breakdown training
| Component | Llama 3 70B (BF16) | Llama 3 8B (BF16) |
|------------|-------------------|-------------------|
| Model weights | 140 GB | 16 GB |
| Optimizer states (Adam) | 280 GB | 32 GB |
| Gradients | 140 GB | 16 GB |
| Activations (peak) | ~30 GB | ~4 GB |
| **Total (DDP)** | ~590 GB | ~68 GB |
| **Total (FSDP, 8 shards)** | ~74 GB | ~8.5 GB |
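**Conclusion:** FSDP (Fully Sharded Data Parallel) is needed to train models above ~10B parameters. With Adam, training needs roughly 4× the memory of the inference weights alone (weights + gradients + two optimizer states, all in BF16 in this table), before activations are counted.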
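The model-state part of the table (weights, gradients, optimizer states) can be estimated from bytes per parameter. The sketch below is an assumption-laden illustration: `model_state_gb` is a name introduced here, and activations are left out on purpose because they depend on batch size and sequence length and are not divided across shards in the same way.

```python
def model_state_gb(params_billion: float, weight_bytes: int = 2, grad_bytes: int = 2,
                   optimizer_bytes: int = 4, shards: int = 1) -> float:
    """Per-GPU memory (GB) for weights, gradients and optimizer states.

    Defaults mirror the table: BF16 weights and gradients (2 bytes each) and
    two BF16 Adam moments (2 x 2 bytes). shards=1 models DDP (full replica
    per GPU); shards=N models full sharding across N GPUs.
    """
    bytes_per_param = weight_bytes + grad_bytes + optimizer_bytes
    return params_billion * bytes_per_param / shards


print(model_state_gb(70))                       # full replica per GPU
print(model_state_gb(70, shards=8))             # sharded across 8 GPUs
print(model_state_gb(70, optimizer_bytes=12))   # FP32 master weights + FP32 Adam moments
```

The last call shows why mixed-precision setups that keep FP32 master weights and FP32 Adam moments are often quoted at about 16 bytes per parameter.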
#### Time to train
| Model | GPU count | GPU type | Precision | Time | Cost (on-prem estimate) |
|-------|-----------|---------|-----------|------|---------------------|
| Llama 3 8B | 64 | H100 | BF16 | ~3 days | ~$5,000 |
| Llama 3 70B | 512 | H100 | BF16 | ~14 days | ~$100,000 |
| Llama 3 405B | 16,384 | H100 | BF16 | ~60 days | ~$14 M |
| DeepSeek-R1 671B (MoE) | 2,048 | H800 | BF16 | ~30 days | ~$6 M |
| GPT-4 (est.) | 25,000 | A100/H100 | Mixed | ~90–100 days | ~$100 M |
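Estimates like these are easier to compare once they are reduced to GPU-hours and an implied cost per GPU-hour. A minimal sketch of that arithmetic, with illustrative function names and a few rows of the table as input:

```python
def gpu_hours(gpu_count: int, days: float) -> float:
    """Total GPU-hours for a run that keeps gpu_count GPUs busy for days."""
    return gpu_count * days * 24


def implied_usd_per_gpu_hour(cost_usd: float, gpu_count: int, days: float) -> float:
    """Cost per GPU-hour implied by a total cost estimate."""
    return cost_usd / gpu_hours(gpu_count, days)


rows = [
    ("Llama 3 8B", 64, 3, 5_000),
    ("Llama 3 70B", 512, 14, 100_000),
    ("Llama 3 405B", 16_384, 60, 14_000_000),
]
for name, gpus, days, cost in rows:
    rate = implied_usd_per_gpu_hour(cost, gpus, days)
    print(f"{name}: {gpu_hours(gpus, days):,.0f} GPU-h, ${rate:.2f}/GPU-h")
```

Comparing the implied rate across rows is a quick sanity check on whether the cost column uses a consistent pricing assumption.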
### Power and thermal limits
| Configuration | TDP limit | Throughput loss | Reason |
|-------------|-----------|------------------|-------|
| H100 SXM | 700 W (default) | 0 % | Nominal |
| H100 SXM | 600 W (-15 %) | ~5–8 % | Power capping |
| H100 SXM | 500 W (-30 %) | ~15–25 % | Significant throttling |
| H100 SXM | 400 W (-43 %) | ~30–50 % | Emergency use only |
| DGX H100 (8×) | 5.6 kW (max) | 0 % | Liquid cooling required |
| DGX H100 (8×) | 4.5 kW (air) | ~10–15 % | Rear-door heat exchanger |
A GPU throttles when it exceeds its power limit or its temperature threshold (85 °C and above). Power capping lowers clock frequency roughly linearly, but throughput falls non-linearly: in the table above, the first 15 % of power costs only ~5–8 % of throughput, while a 43 % cut costs ~30–50 %.
### API and operational limits
| Limit | Description | Typical value |
|-------|-------|-----------------|
| **Rate limit** | Max requests per minute/hour | 100–10,000 RPM (by tier) |
| **Tokens per minute (TPM)** | Max tokens per minute | 1M–300M (by model) |
| **Context window** | Max input tokens | 4K–2M (by model) |
| **Max output tokens** | Max generated tokens | 4K–32K (by model) |
| **Concurrent requests** | Number of parallel requests | 10–10,000 (by backend) |
| **Batch window** | Time spent collecting a batch | 0–20 s (vLLM, TGI) |
| **TTFT timeout** | Max latency to the first token | 30–120 s |
| **Idle timeout** | GPU idle → scale to zero | 5–15 min (cloud) |
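Rate limits such as RPM and TPM are commonly enforced with a token bucket. The sketch below is one possible implementation, not the algorithm of any particular provider; the `TokenBucket` class and its parameters are illustrative, and time is passed in explicitly so the behaviour is deterministic.

```python
class TokenBucket:
    """Token bucket limiter: refills at a steady rate up to a burst capacity."""

    def __init__(self, rate_per_min: float, capacity: float, now: float = 0.0):
        self.rate = rate_per_min / 60.0   # refill rate in units per second
        self.capacity = capacity          # maximum burst size
        self.tokens = float(capacity)     # start with a full bucket
        self.last = now                   # time of the last refill (seconds)

    def allow(self, now: float, cost: float = 1.0) -> bool:
        """Return True and consume `cost` units if the request fits the budget."""
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


rpm = TokenBucket(rate_per_min=60, capacity=2)
print([rpm.allow(now=0.0), rpm.allow(now=0.0), rpm.allow(now=0.0), rpm.allow(now=1.0)])
```

For an RPM limit each request costs 1 unit; for a TPM limit the same class can be used with `cost` set to the number of tokens in the request.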
### Limity per deployment model
| Aspect | Own hardware | Managed cloud (SageMaker, Vertex) | API (OpenAI, Anthropic) |
|-------|--------------|----------------------------------|------------------------|
| **Model size** | Limited by HBM (max 192 GB/GPU) | Unlimited (cluster scaling) | Unlimited |
| **Queries** | Limited by GPU count | Auto-scaling | Rate limit (by tier) |
| **Latency** | < 10 ms (same node) | 10–100 ms (network hop) | 100 ms – 10 s |
| **Customization** | Full (fine-tuning, quantization) | Managed (SageMaker, Bedrock) | Prompt engineering only |
| **Data privacy** | Yes (on-prem) | Contractual (region, encryption) | Limited |
| **Cost per 1M tokens** | ~$0.10–0.50 (FP16 inference) | ~$0.20–1.00 | ~$0.15–15.00 |
| **Max context** | 128K+ (by GPU count) | 128K+ | 32K–2M |
| **Cold start** | 0 (always-on) | 30 s – 5 min | 0 (shared infra) |
---
## GPU prices and price/performance (2026)
> Prices are indicative: NVIDIA does not publish an official price list for datacenter GPUs. Cloud prices are taken from public providers (Q2 2026). Hardware purchase prices vary with volume, reseller, and region.
### Purchase price (buy)
| GPU | Price/GPU | Price 8× GPU baseboard | $/PFLOPS (FP16) | Note |
|-----|---------|----------------------|----------------|----------|
| **H100 SXM** | $27,000–40,000 | ~$200,000 | ~$25,000 | Scarce in 2023–2024, now stabilizing |
| **H200 SXM** | $35,000–50,000 | ~$280,000 | ~$35,000 | H100 upgrade, HBM3e |
| **B200** | ~$60,000–70,000 | ~$500,000+ | ~$31,000 | Blackwell, FP4 support |
| **B100** | ~$30,000 | ~$240,000 | ~$20,000 | Cheaper than B200, similar FP8 performance |
| **GB200** (Grace+Blackwell) | ~$70,000–100,000 | ~$2,000,000 (rack) | n/a | CPU+GPU unified, high-density |
| **A100 80GB** | ~$10,000–15,000 | ~$120,000 | ~$19,200 | Previous generation, still relevant |
| **MI300X** | ~$12,000–18,000 | ~$100,000 | ~$9,600 | AMD, 192 GB HBM3 |
| **Gaudi 3** | ~$15,625 | ~$125,000 | **$8,515** | Intel, best $/PFLOPS |
| **L40S** | ~$8,000–10,000 | n/a | n/a | Inference, enterprise |
### Cloud prices (on-demand $/GPU/hr unless marked spot)
| GPU | Cheapest | Mid-range (CoreWeave, Lambda) | Hyperscaler (AWS, GCP, Azure) |
|-----|--------------|-------------------------------|-------------------------------|
| **H100 SXM** | $1.38 (Thunder) | $2.89–3.29 | $4.15–6.88 |
| **H100 PCIe** | $2.01 (Spheron) | $2.50 | n/a |
| **H200 SXM** | $3.89 (Spheron) | $4.54 | $5.00+ |
| **B200** | **$3.39** (Spheron) | $6.02 | $14.24 (AWS) |
| **B200 (spot)** | **$2.12** (spot) | n/a | n/a |
| **GB200** | $3.50 (Runcrate) | $5.85 (Oracle) | $6.95 (GCP) |
| **MI300X** | **$1.50** (TensorWave) | $1.85 (Vultr) | $7.86 (Azure) |
| **A100 80GB** | $1.07 (Spheron) | $1.50–2.00 | $3.00+ |
| **Gaudi 3** | ~$1.50–2.50 | n/a | n/a |
| **L40S** | $0.91 (Spheron) | $1.50–2.00 | n/a |
### Inference cost ($/M tokens)
| GPU | Provider | $/hr | Est. tok/s | $/M tok |
|-----|----------|------|-----------|--------|
| **B200** | Mid-range | $6.02 | ~4,000 | **$0.42** |
| **B200** (spot) | Spheron | $2.12 | ~4,000 | **$0.15** |
| **H100 PCIe** | Spheron | $2.01 | ~1,200 | $0.47 |
| **A100 80GB** | Spheron | $1.07 | ~520 | $0.57 |
| **H100 SXM** | AWS | $6.88 | ~1,200 | $1.59 |
| **H200 SXM** | Mid-range | $4.54 | ~1,800 | $0.70 |
| **L40S** | Spheron | $0.91 | ~450 | $0.56 |
*Values for Llama 3 70B (INT8, batch=1, 1K output tokens), computed as $/hr ÷ (tok/s × 3,600) × 10⁶. Real values vary with batch size, context length, and quantization.*
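The $/M tok column follows from the hourly price and the sustained token rate. A minimal sketch of that conversion, with an illustrative function name:

```python
def usd_per_million_tokens(usd_per_hour: float, tokens_per_s: float) -> float:
    """Cost of generating one million tokens at a sustained token rate."""
    tokens_per_hour = tokens_per_s * 3600
    return usd_per_hour / tokens_per_hour * 1_000_000


for name, price, rate in [("H100 PCIe", 2.01, 1200), ("A100 80GB", 1.07, 520), ("L40S", 0.91, 450)]:
    print(f"{name}: ${usd_per_million_tokens(price, rate):.2f}/M tok")
```

Because the token rate sits in the denominator, any gain from batching or quantization lowers the cost per token in direct proportion.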
### Price per GB of HBM
| GPU | HBM | Cloud price/hr | $/GB/hr | Suitability for memory-bound workloads |
|-----|-----|-------------|--------|-----------------------------------|
| **MI300X** | 192 GB | $1.50 | **$0.0078** | ✅ Best |
| **B200** | 192 GB | $3.39 | $0.0177 | ✅ Good |
| **H200** | 141 GB | $3.89 | $0.0276 | ⚠️ |
| **H100 SXM** | 80 GB | $1.38 | $0.0173 | ⚠️ Only up to 70B models |
| **GB200** | 384 GB | $3.50 | $0.0091 | ✅✅ (2× the capacity of MI300X) |
### Price/performance by scenario
| Scenario | Winner | Rationale |
|--------|-------|-------|
| **Absolute performance** (price is no object) | **GB200 DGX NVL72** | 72× GPU, 18 PFLOPS FP8, 384 GB HBM per GB200 |
| **Cloud inference**, best $/token | **B200 spot** | $0.15/M tok; ~3.3× the H100 throughput at a comparable or lower hourly price |
| **Cloud inference**, on-demand | **B200** | $0.42/M tok |
| **Cloud inference**, budget | **A100 / L40S** | $0.56–0.57/M tok |
| **Training**, price/performance when buying | **Gaudi 3** | $8,515/PFLOPS, 2.5–3× better than H100 |
| **Training**, cloud | **H100 SXM** | $1.38/hr, CUDA ecosystem, NCCL |
| **Memory-bound**, long context, 70B+ | **MI300X / GB200** | 192–384 GB, $0.0078–0.0091/GB/hr |
| **Ecosystem + safe choice** | **H100/H200** | CUDA, broadest software support, NVIDIA tools |
| **Spot / preemptible**, lowest price | **A100 / H100** | $1.07–1.38/hr, 50–90 % discount versus on-demand |
### Trends 2026
- **H100**: the price fell 64 % from the $8/hr peak to $1.38–2.89/hr, then rebounded by 40 % on the inference boom
- **B200**: the new high end at $3.39/hr in the cloud, down to ~$0.15/M tok on spot; the benchmark for inference
- **MI300X**: supply is growing (TensorWave, Vultr, CoreWeave, Oracle, Azure), with prices from $1.50/hr
- **Gaudi 3**: best $/PFLOPS when buying, but a narrow ecosystem and limited cloud availability
- **The market has split**: older generations (H100, A100) are becoming commodities, while the new ones (B200, GB200) hold a premium
## Related
- [GPU.md](GPU.md): GPU architecture, NVIDIA/AMD, vGPU, MIG
- [NETWORKING.md](NETWORKING.md): InfiniBand, RoCE, network topology
- [STORAGE.md](STORAGE.md): parallel filesystem, object store
- [DATACENTERS.md](DATACENTERS.md): DC layout, power, cooling
- [CLOUD.md](CLOUD.md): cloud AI services (SageMaker, Vertex AI)
## Sources
Links, books and standards: [sources/infrastructure/sources.md](sources/infrastructure/sources.md)
*Last revision: 2026-06-18*

BIG-DATA.en.md Normal file

@@ -0,0 +1,232 @@
# 🗄️ Big Data — ecosystem, architecture, tools
## Overview
The Big Data ecosystem in 2026: "Hadoop is dead, and yet it's everywhere." HDFS has shrunk, MapReduce is effectively gone, the Cloudera/Hortonworks era is over. But YARN lives on, the Hive Metastore has changed clothes into Iceberg/Delta, and the lakehouse pattern (cheap object storage + table format + distributed engine) is the inheritance Hadoop left behind.
The modern Big Data stack has 8 layers:
1. **Storage** — HDFS, S3, GCS, ABFS, MinIO
2. **Table format** — Apache Iceberg, Delta Lake, Apache Hudi, Apache Paimon
3. **Catalog** — Hive Metastore, Unity Catalog, Polaris, Nessie, AWS Glue
4. **Batch processing** — Apache Spark, Trino-on-Spark, Dremio
5. **Stream processing** — Apache Flink, Spark Structured Streaming, Kafka Streams
6. **Distributed SQL** — Trino, Presto, StarRocks, ClickHouse
7. **Transformation** — dbt, SQLMesh
8. **Orchestration** — Apache Airflow 3.0, Dagster, Prefect, Kestra
---
## Storage
### HDFS (Hadoop Distributed File System)
| Feature | Detail |
|---------|--------|
| **Architecture** | Master/worker: NameNode (metadata) + DataNode (data) |
| **Replication** | Default 3×, configurable (rack-aware) |
| **Block size** | Default 128 MB (configurable, typically 64–256 MB) |
| **Limits** | NameNode memory ~ 1 GB / 1 million blocks; ~1000 DataNodes per cluster |
| **Use case** | On-prem clusters, sequential read/write, large files |
| **Status 2026** | Declining — most projects migrate to object storage (S3, GCS, MinIO) |
HDFS remains relevant for on-prem environments where object storage is unavailable, or for specific use cases (YARN clusters, Spark shuffle). For new projects, object storage is recommended.
### Object storage as Data Lake
| Platform | Service | Use case |
|----------|--------|----------|
| **AWS** | S3 | Primary data lake, Iceberg/Delta on S3 |
| **Azure** | ADLS Gen2 / Blob | Data lake for Azure ecosystem |
| **GCP** | GCS | Data lake for GCP (Dataproc, BigQuery) |
| **On-prem** | MinIO | S3-compatible object storage on own HW |
### HDFS capacity planning
| Data size | Configuration |
|-----------|-------------|
| **< 100 TB** | 3–5 DataNodes, 10 GbE, replication 3× |
| **100 TB – 1 PB** | 5–20 DataNodes, 25/100 GbE, rack-aware, NameNode HA |
| **1 PB+** | 20+ DataNodes, 100 GbE, Federation (multiple NameNodes) |
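The sizing rules above (3× replication, roughly 1 GB of NameNode memory per million blocks) can be combined into a rough estimator. This is a sketch under stated assumptions: the function names and the `headroom` parameter are introduced here, and the block count assumes every file fills whole blocks, so it is a lower bound when there are many small files.

```python
def raw_capacity_tb(logical_tb: float, replication: int = 3, headroom: float = 0.25) -> float:
    """Raw disk needed for logical_tb of data.

    headroom is the fraction of raw capacity kept free for growth and
    temporary data (an assumption, not an HDFS setting).
    """
    return logical_tb * replication / (1.0 - headroom)


def namenode_heap_gb(logical_tb: float, block_mb: int = 128) -> float:
    """NameNode heap estimate using ~1 GB per million blocks."""
    blocks = logical_tb * 1024 * 1024 / block_mb   # TB -> MB, then blocks
    return blocks / 1_000_000


for tb in (100, 1024):
    print(f"{tb} TB: raw {raw_capacity_tb(tb):.0f} TB, NameNode heap ~{namenode_heap_gb(tb):.1f} GB")
```

The heap estimate is the part most sensitive to file layout: many files smaller than one block inflate the block count well beyond this figure.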
---
## Open Table Formats
Table formats bring ACID transactions, schema evolution, and time travel to data lake object storage.
| Format | Organization | Engine compatibility | Streaming | Catalog |
|--------|-------------|---------------------|-----------|---------|
| **Apache Iceberg** | Apache Foundation | Spark, Flink, Trino, Dremio, Athena, Snowflake | Flink sink, snapshot-based | REST catalog, Polaris, Glue, Hive |
| **Delta Lake** | Linux Foundation (Databricks) | Spark (native), Trino, Flink (limited), Athena | Spark Streaming, DLT | Unity Catalog (proprietary), Hive |
| **Apache Hudi** | Apache Foundation | Spark, Flink, Trino (connector) | Built-in CDC, incremental | Hive, Glue (limited) |
| **Apache Paimon** | Apache Foundation | Flink (native), Spark | LSM-tree, changelog mode | Hive, REST |
**Recommendation 2026:**
- **Iceberg** — broadest multi-engine support, vendor-neutral, open catalog (Polaris)
- **Delta Lake** — best for Spark/Databricks ecosystem, UniForm for cross-format reads
- **Hudi** — losing momentum, only if already in production
- **Paimon** — emerging, Flink-native, LSM architecture
---
## Processing Engines
### Apache Spark
The dominant batch processing engine, also widely used as a single unified engine for batch, streaming, SQL, and ML.
| Feature | Detail |
|---------|--------|
| **Version 2026** | Spark 4.x (4.1.0), native Kubernetes support, Structured Streaming, Delta Lake integration |
| **API** | Scala, Java, Python (PySpark), SQL, R (SparkR) |
| **Batch** | DataFrame/Dataset, RDD, SQL queries; 10–100× faster than MapReduce |
| **Streaming** | Structured Streaming (micro-batch), latency ~100 ms – 5 s |
| **SQL** | Spark SQL, ANSI SQL, Hive compatible |
| **ML** | MLlib, SparkML, MLflow integration |
| **Scheduler** | YARN, Kubernetes (production-ready since Spark 3.x), standalone |
| **Fault tolerance** | RDD lineage, checkpointing |
**When to use Spark:**
- Batch ETL/ELT pipelines
- Unified engine for batch + streaming (team preference)
- Machine learning pipelines (MLlib, SparkML)
- SQL analytics on large datasets
### Apache Flink
Highest-performance engine for true streaming (per-event processing).
| Feature | Detail |
|---------|--------|
| **Version 2026** | Flink 2.x (streaming-first, batch as bounded stream) |
| **API** | DataStream API, Table/SQL API, ProcessFunction (low-level) |
| **Latency** | < 100 ms (true streaming, Chandy-Lamport checkpointing) |
| **State management** | Managed state (ValueState, ListState, MapState), RocksDB backend |
| **Event time** | Native, watermarks, out-of-order handling |
| **Batch** | Batch as bounded stream (same runtime) |
| **Deployment** | YARN, Kubernetes, standalone |
| **Economics** | Higher memory requirements (managed state), requires careful tuning |
**When to use Flink:**
- Fraud detection, real-time bidding, IoT (< 100 ms latency)
- Complex stateful stream processing
- CDC pipelines
- Event-driven architectures
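Event time, watermarks, and out-of-order handling are easier to follow with a toy model. The sketch below is plain Python, not the Flink API: `tumbling_counts` is a hypothetical helper that counts events per key in tumbling event-time windows and uses a bounded out-of-orderness watermark to decide when a window is closed.

```python
from collections import defaultdict


def tumbling_counts(events, window_ms=1000, max_out_of_order_ms=200):
    """Count (event_time_ms, key) events per tumbling window.

    Returns (emitted, dropped): emitted is a list of (window_start, key, count)
    in emission order; dropped counts events that arrived after their window
    had already been closed by the watermark.
    """
    open_windows = defaultdict(int)      # (window_start, key) -> count
    watermark = float("-inf")
    emitted, dropped = [], 0
    for ts, key in events:
        start = ts - ts % window_ms
        if start + window_ms <= watermark:
            dropped += 1                 # window already closed: late event
        else:
            open_windows[(start, key)] += 1
        # Watermark: no event older than this is expected any more.
        watermark = max(watermark, ts - max_out_of_order_ms)
        for w_start, k in sorted(open_windows):
            if w_start + window_ms <= watermark:
                emitted.append((w_start, k, open_windows.pop((w_start, k))))
    # End of a bounded stream: flush whatever is still open.
    for w_start, k in sorted(open_windows):
        emitted.append((w_start, k, open_windows[(w_start, k)]))
    return emitted, dropped


stream = [(100, "a"), (900, "a"), (1100, "a"), (950, "a"), (1300, "a"), (400, "a"), (2500, "a")]
print(tumbling_counts(stream))
```

The event at 950 ms arrives after the one at 1100 ms but is still counted, because the watermark has not yet passed the end of its window. The event at 400 ms arrives after the window has closed and is dropped.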
### Trino (formerly PrestoSQL)
Distributed SQL query engine — federated queries across various sources.
| Feature | Detail |
|---------|--------|
| **Architecture** | Coordinator + Worker (no storage, no scheduler) |
| **Connectors** | Iceberg, Delta, Hive, HDFS, S3, GCS, ADLS, PostgreSQL, MySQL, Kafka, Elasticsearch |
| **Use case** | Interactive SQL, federated queries, lakehouse queries |
| **Version 2026** | Trino 470+, Iceberg native, Delta Lake connector |
---
## Spark vs Flink vs Trino comparison
| Criteria | Spark | Flink | Trino |
|----------|-------|-------|-------|
| **Primary use case** | Batch + unifying | True streaming | Interactive SQL |
| **Streaming latency** | 100 ms – 5 s (micro-batch) | < 100 ms (true streaming) | N/A |
| **Throughput** | High (batch-optimized) | High (pipeline-optimized) | Medium (ad-hoc) |
| **State management** | State store (external) | Managed state (embedded) | N/A |
| **SQL support** | Spark SQL | Flink SQL | ANSI SQL (broadest) |
| **ML/AI** | MLlib, SparkML | — | — |
| **Kubernetes** | Native (production) | Native (production) | Native (production) |
| **Learning curve** | Medium | High | Low |
| **Operational complexity** | Medium | High | Medium |
---
## Orchestration
| Tool | Version 2026 | Use case |
|------|-------------|----------|
| **Apache Airflow** | 3.0+ (taskflow API, dynamic tasks, deferrable operators) | Universal orchestration, largest ecosystem |
| **Dagster** | 1.x (asset-oriented, software-defined assets) | Data pipelines, observability, asset lineage |
| **Prefect** | 3.x (native async, workers, blocks) | Python-native, serverless workers |
| **Kestra** | 1.x (YAML-native, declarative) | Event-driven orchestration |
| **Apache NiFi** | 2.x (flow-based, visual) | Data ingestion, CDC, streaming |
---
## Lakehouse architecture
Lakehouse combines data lake flexibility (object storage) with data warehouse performance and governance.
```
┌──────────────────────────────────────────────────────┐
│ Query Engines │
│ Trino Spark SQL Flink SQL Dremio Athena │
└─────────────────────────┬────────────────────────────┘
┌─────────────────────────▼────────────────────────────┐
│ Table Format Layer │
│ Apache Iceberg / Delta Lake / Hudi │
│ (ACID, time travel, schema evolution) │
└─────────────────────────┬────────────────────────────┘
┌─────────────────────────▼────────────────────────────┐
│ Storage Layer │
│ S3 / GCS / ADLS / MinIO / HDFS │
│ (Parquet / ORC / Avro) │
└──────────────────────────────────────────────────────┘
```
For Iceberg details see [DATABASES.en.md — Apache Iceberg Lakehouse](DATABASES.en.md#apache-iceberg-lakehouse).
---
## Big Data Infrastructure
### Cluster sizing
| Component | Spark (batch) | Flink (streaming) | Trino (SQL) |
|-----------|--------------|-------------------|-------------|
| **CPU** | 16–64 cores/node | 16–32 cores/node | 8–32 cores/node |
| **RAM** | 64–256 GB/node | 64–256 GB/node (incl. managed state) | 64–256 GB/node |
| **Storage** | HDFS / object storage | Object storage (checkpoints) | None (stateless) |
| **Network** | 25–100 GbE (shuffle-heavy) | 25–100 GbE (checkpointing) | 25–100 GbE |
| **Disk** | NVMe (scratch, shuffle) | NVMe (RocksDB state backend) | n/a |
| **Cluster size** | 5–200+ nodes | 3–100+ nodes | 5–50 nodes |
### Network considerations
- **Spark shuffle**: heavy network traffic between nodes; 25–100 GbE is recommended, ideally without oversubscription
- **Flink checkpointing** — periodic state writes to object storage; requires stable latency
- **HDFS rack awareness** — optimizes replication across racks
- **Data locality** — HDFS: local disk reads; object storage: network-bound
### Kubernetes vs YARN
| Criteria | YARN | Kubernetes |
|----------|------|-----------|
| **Resource isolation** | Cgroups (YARN containers) | Cgroups + namespaces (pods) |
| **Ecosystem fit** | Hadoop-native (HDFS, Hive, Spark) | Cloud-native, Spark, Flink, Trino |
| **Operational complexity** | Lower (single cluster manager) | Higher (requires K8s cluster) |
| **Multi-tenant isolation** | YARN queues (Capacity/Fair Scheduler) | Namespaces, ResourceQuotas, LimitRanges |
| **Stateful workloads** | Limited | StatefulSets, PVC, Operators |
| **2026 trend** | Legacy (declining) | Standard for new projects |
---
## Cloud deployment
| Cloud | Batch processing | Streaming | SQL | Managed K8s |
|-------|-----------------|-----------|-----|-------------|
| **AWS** | EMR (Spark, Hive, Flink) | Kinesis, MSK (Kafka), EMR Flink | Athena (Trino), Redshift | EKS |
| **Azure** | HDInsight (Spark, Hive), Synapse | Event Hubs, HDInsight Flink | Synapse SQL, Azure Data Explorer | AKS |
| **GCP** | Dataproc (Spark, Flink, Hive, Trino) | Pub/Sub, Dataflow (Beam), Dataproc Flink | BigQuery | GKE |
---
## Sources
Links, books and standards: [sources/infrastructure/sources.en.md](sources/infrastructure/sources.en.md)
*Last revision: 2026-06-18*

BIG-DATA.md Normal file

@@ -0,0 +1,232 @@
# 🗄️ Big Data: ecosystem, architecture, tools
## Overview
The Big Data ecosystem in 2026: "Hadoop is dead, and yet it is everywhere." HDFS has shrunk, MapReduce is effectively dead, and the Cloudera/Hortonworks era is over. But YARN lives on, the Hive Metastore has changed clothes into Iceberg/Delta, and the lakehouse pattern (cheap object storage + table format + distributed engine) is the legacy Hadoop left behind.
The modern Big Data stack has 8 layers:
1. **Storage**: HDFS, S3, GCS, ABFS, MinIO
2. **Table format**: Apache Iceberg, Delta Lake, Apache Hudi, Apache Paimon
3. **Catalog**: Hive Metastore, Unity Catalog, Polaris, Nessie, AWS Glue
4. **Batch processing**: Apache Spark, Trino-on-Spark, Dremio
5. **Stream processing**: Apache Flink, Spark Structured Streaming, Kafka Streams
6. **Distributed SQL**: Trino, Presto, StarRocks, ClickHouse
7. **Transformation**: dbt, SQLMesh
8. **Orchestration**: Apache Airflow 3.0, Dagster, Prefect, Kestra
---
## Storage
### HDFS (Hadoop Distributed File System)
| Feature | Detail |
|-----------|--------|
| **Architecture** | Master/worker: NameNode (metadata) + DataNode (data) |
| **Replication** | Default 3×, configurable (rack-aware) |
| **Block size** | Default 128 MB (configurable, typically 64–256 MB) |
| **Limits** | NameNode memory ~1 GB per 1 million blocks; ~1,000 DataNodes per cluster |
| **Use case** | On-prem clusters, sequential read/write, large files |
| **Status 2026** | Declining share: most deployments are migrating to object storage (S3, GCS, MinIO) |
HDFS remains relevant for on-prem environments where object storage is not available, or for specific use cases (YARN clusters, Spark shuffle). Object storage is recommended for new projects.
### Object storage as Data Lake
| Platform | Service | Use case |
|-----------|--------|----------|
| **AWS** | S3 | Primary data lake, Iceberg/Delta on S3 |
| **Azure** | ADLS Gen2 / Blob | Data lake for the Azure ecosystem |
| **GCP** | GCS | Data lake for GCP (Dataproc, BigQuery) |
| **On-prem** | MinIO | S3-compatible object storage on own hardware |
### HDFS capacity planning
| Data size | Configuration |
|-------------|------------|
| **< 100 TB** | 3–5 DataNodes, 10 GbE, replication 3× |
| **100 TB – 1 PB** | 5–20 DataNodes, 25/100 GbE, rack-aware, NameNode HA |
| **1 PB+** | 20+ DataNodes, 100 GbE, Federation (multiple NameNodes) |
---
## Open Table Formats
Table formats bring ACID transactions, schema evolution, and time travel to data lake object storage.
| Format | Organization | Engine compatibility | Streaming | Catalog |
|--------|-----------|---------------------|-----------|---------|
| **Apache Iceberg** | Apache Foundation | Spark, Flink, Trino, Dremio, Athena, Snowflake | Flink sink, snapshot-based | REST catalog, Polaris, Glue, Hive |
| **Delta Lake** | Linux Foundation (Databricks) | Spark (native), Trino, Flink (limited), Athena | Spark Streaming, DLT | Unity Catalog (proprietary), Hive |
| **Apache Hudi** | Apache Foundation | Spark, Flink, Trino (connector) | Built-in CDC, incremental | Hive, Glue (limited) |
| **Apache Paimon** | Apache Foundation | Flink (native), Spark | LSM-tree, changelog mode | Hive, REST |
**Recommendation 2026:**
- **Iceberg**: broadest multi-engine support, vendor-neutral, open catalog (Polaris)
- **Delta Lake**: best for the Spark/Databricks ecosystem, UniForm for cross-format reads
- **Hudi**: losing momentum, only worth it if already in production
- **Paimon**: emerging, Flink-native, LSM architecture
---
## Processing Engines
### Apache Spark
The dominant batch processing engine, also widely used as a single unified engine for batch, streaming, SQL, and ML.
| Feature | Detail |
|-----------|--------|
| **Version 2026** | Spark 4.x (4.1.0), native Kubernetes support, Structured Streaming, Delta Lake integration |
| **API** | Scala, Java, Python (PySpark), SQL, R (SparkR) |
| **Batch** | DataFrame/Dataset, RDD, SQL queries; 10–100× faster than MapReduce |
| **Streaming** | Structured Streaming (micro-batch), latency ~100 ms – 5 s |
| **SQL** | Spark SQL, ANSI SQL, Hive-compatible |
| **ML** | MLlib, SparkML, MLflow integration |
| **Scheduler** | YARN, Kubernetes (production-ready since Spark 3.x), standalone |
| **Fault tolerance** | RDD lineage, checkpointing |
**When to use Spark:**
- Batch ETL/ELT pipelines
- Unified engine for batch + streaming (team preference)
- Machine learning pipelines (MLlib, SparkML)
- SQL analytics on large datasets
### Apache Flink
The highest-performance engine for true streaming (per-event processing).
| Feature | Detail |
|-----------|--------|
| **Version 2026** | Flink 2.x (streaming-first, batch as a special case of a stream) |
| **API** | DataStream API, Table/SQL API, ProcessFunction (low-level) |
| **Latency** | < 100 ms (true streaming, Chandy-Lamport checkpointing) |
| **State management** | Managed state (ValueState, ListState, MapState), RocksDB backend |
| **Event time** | Native, watermarks, out-of-order handling |
| **Batch** | Batch as bounded stream (same runtime) |
| **Deployment** | YARN, Kubernetes, standalone |
| **Economics** | Higher memory requirements (managed state), needs careful tuning |
**When to use Flink:**
- Fraud detection, real-time bidding, IoT (< 100 ms latency)
- Complex stateful stream processing
- CDC pipelines
- Event-driven architectures
### Trino (formerly PrestoSQL)
Distributed SQL query engine for federated queries across different sources.
| Feature | Detail |
|-----------|--------|
| **Architecture** | Coordinator + Worker (no storage, no scheduler) |
| **Connectors** | Iceberg, Delta, Hive, HDFS, S3, GCS, ADLS, PostgreSQL, MySQL, Kafka, Elasticsearch |
| **Use case** | Interactive SQL, federated queries, lakehouse queries |
| **Version 2026** | Trino 470+, Iceberg native, Delta Lake connector |
---
## Spark vs Flink vs Trino comparison
| Criteria | Spark | Flink | Trino |
|-----------|-------|-------|-------|
| **Primary use case** | Batch + unifying | True streaming | Interactive SQL |
| **Streaming latency** | 100 ms – 5 s (micro-batch) | < 100 ms (true streaming) | N/A |
| **Throughput** | High (batch-optimized) | High (pipeline-optimized) | Medium (ad-hoc) |
| **State management** | State store (external) | Managed state (embedded) | N/A |
| **SQL support** | Spark SQL | Flink SQL | ANSI SQL (broadest) |
| **ML/AI** | MLlib, SparkML | n/a | n/a |
| **Kubernetes** | Native (production) | Native (production) | Native (production) |
| **Learning curve** | Medium | High | Low |
| **Operational complexity** | Medium | High | Medium |
---
## Orchestration
| Tool | Version 2026 | Use case |
|---------|-----------|----------|
| **Apache Airflow** | 3.0+ (taskflow API, dynamic tasks, deferrable operators) | Universal orchestration, largest ecosystem |
| **Dagster** | 1.x (asset-oriented, software-defined assets) | Data pipelines, observability, asset lineage |
| **Prefect** | 3.x (native async, workers, blocks) | Python-native, serverless workers |
| **Kestra** | 1.x (YAML-native, declarative) | Event-driven orchestration |
| **Apache NiFi** | 2.x (flow-based, visual) | Data ingestion, CDC, streaming |
---
## Lakehouse architecture
A lakehouse combines the flexibility of a data lake (object storage) with the performance and governance of a data warehouse.
```
┌──────────────────────────────────────────────────────┐
│ Query Engines │
│ Trino Spark SQL Flink SQL Dremio Athena │
└─────────────────────────┬────────────────────────────┘
┌─────────────────────────▼────────────────────────────┐
│ Table Format Layer │
│ Apache Iceberg / Delta Lake / Hudi │
│ (ACID, time travel, schema evolution) │
└─────────────────────────┬────────────────────────────┘
┌─────────────────────────▼────────────────────────────┐
│ Storage Layer │
│ S3 / GCS / ADLS / MinIO / HDFS │
│ (Parquet / ORC / Avro) │
└──────────────────────────────────────────────────────┘
```
For Iceberg details see [DATABASES.md, Apache Iceberg Lakehouse](DATABASES.md#apache-iceberg-lakehouse).
---
## Big Data Infrastructure
### Cluster sizing
| Component | Spark (batch) | Flink (streaming) | Trino (SQL) |
|------------|--------------|-------------------|-------------|
| **CPU** | 16–64 cores/node | 16–32 cores/node | 8–32 cores/node |
| **RAM** | 64–256 GB/node | 64–256 GB/node (incl. managed state) | 64–256 GB/node |
| **Storage** | HDFS / object storage | Object storage (checkpoints) | None (stateless) |
| **Network** | 25–100 GbE (shuffle-heavy) | 25–100 GbE (checkpointing) | 25–100 GbE |
| **Disk** | NVMe (scratch, shuffle) | NVMe (RocksDB state backend) | n/a |
| **Cluster size** | 5–200+ nodes | 3–100+ nodes | 5–50 nodes |
### Network considerations
- **Spark shuffle**: heavy network traffic between nodes; 25–100 GbE is recommended, ideally without oversubscription
- **Flink checkpointing**: periodic state writes to object storage; requires stable latency
- **HDFS rack awareness**: optimizes replication across racks
- **Data locality**: HDFS reads from local disk; object storage is network-bound
### Kubernetes vs YARN
| Criteria | YARN | Kubernetes |
|-----------|------|-----------|
| **Resource isolation** | Cgroups (YARN containers) | Cgroups + namespaces (pods) |
| **Ecosystem fit** | Hadoop-native (HDFS, Hive, Spark) | Cloud-native, Spark, Flink, Trino |
| **Operational complexity** | Lower (single cluster manager) | Higher (requires a K8s cluster) |
| **Multi-tenant isolation** | YARN queues (Capacity/Fair Scheduler) | Namespaces, ResourceQuotas, LimitRanges |
| **Stateful workloads** | Limited | StatefulSets, PVC, Operators |
| **2026 trend** | Legacy (declining) | Standard for new projects |
---
## Cloud deployment
| Cloud | Batch processing | Streaming | SQL | Managed K8s |
|-------|-------------------|-----------|-----|-------------|
| **AWS** | EMR (Spark, Hive, Flink) | Kinesis, MSK (Kafka), EMR Flink | Athena (Trino), Redshift | EKS |
| **Azure** | HDInsight (Spark, Hive), Synapse | Event Hubs, HDInsight Flink | Synapse SQL, Azure Data Explorer | AKS |
| **GCP** | Dataproc (Spark, Flink, Hive, Trino) | Pub/Sub, Dataflow (Beam), Dataproc Flink | BigQuery | GKE |
---
## Sources
Links, books and standards: [sources/infrastructure/sources.md](sources/infrastructure/sources.md)
*Last revision: 2026-06-18*

CASSANDRA.en.md Normal file

@@ -0,0 +1,135 @@
# 🐘 Cassandra & ScyllaDB
## Overview
Apache Cassandra is a distributed wide-column NoSQL database designed for high availability and linear scalability with no single point of failure. Its design combines ideas from the Amazon Dynamo paper (2007) and Google Bigtable. ScyllaDB is a C++ reimplementation that is compatible with the Cassandra protocol and targets lower latency and higher throughput on the same hardware.
## Architecture (Dynamo-inspired)
### Consistent hashing
Data is distributed around a hash ring, and each node is responsible for a range of tokens:
```text
0 ─── node A ─── hash(key1)
90 ─── node B ─── hash(key2)
180 ─── node C ─── hash(key3)
270 ─── node D ─── hash(key4)
```
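- Adding or removing a node moves only about K/N keys (K keys, N nodes), which is a property of consistent hashing itself
- **Virtual nodes** (vnodes): each physical node owns many tokens on the ring (~100-200), which spreads data more evenly and shares the load of a joining or leaving node across the cluster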
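The ring and its virtual nodes can be sketched in a few lines. This is an illustration only: the `HashRing` class is hypothetical, and it hashes with MD5 from the standard library for convenience rather than the partitioner a real Cassandra cluster uses.

```python
import bisect
import hashlib


def _token(value: str) -> int:
    """Map a string to a 64-bit position on the ring."""
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes, vnodes: int = 128):
        self.vnodes = vnodes
        self._ring = []                  # sorted list of (token, node)
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        # Each physical node is placed on the ring at many positions (vnodes).
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_token(f"{node}#{i}"), node))

    def remove(self, node: str) -> None:
        self._ring = [(t, n) for t, n in self._ring if n != node]

    def owner(self, key: str) -> str:
        # The first token clockwise from the key's position owns the key.
        idx = bisect.bisect_left(self._ring, (_token(key), ""))
        return self._ring[idx % len(self._ring)][1]


ring = HashRing(["A", "B", "C", "D"])
keys = [f"key{i}" for i in range(5000)]
before = {k: ring.owner(k) for k in keys}
ring.add("E")
moved = sum(1 for k in keys if ring.owner(k) != before[k])
print(f"{moved / len(keys):.1%} of keys moved after adding one node")
```

Only keys that now fall into the new node's token ranges change owner; every other key stays where it was, which is what keeps rebalancing cheap.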
### Quorum (N, R, W)
- N = replication factor (typically 3)
- R = read quorum (typically 2)
- W = write quorum (typically 2)
- Condition: R + W > N (for strong-ish consistency)
- **Sloppy quorum** — when a node is unavailable, data is temporarily stored on another
- **Hinted handoff** — temporary write with hint, data transferred upon recovery
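The condition R + W > N guarantees that every read quorum shares at least one replica with every write quorum. The sketch below checks this by brute force for small N; the function names are illustrative.

```python
from itertools import combinations


def overlap_guaranteed(n: int, r: int, w: int) -> bool:
    """The quorum condition from the list above."""
    return r + w > n


def quorums_always_overlap(n: int, r: int, w: int) -> bool:
    """Brute force: does every possible write set intersect every read set?"""
    replicas = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )


def tolerated_failures(n: int, r: int, w: int) -> int:
    """Replicas that can be down while both reads and writes still reach quorum."""
    return n - max(r, w)


print(quorums_always_overlap(3, 2, 2), quorums_always_overlap(3, 1, 1), tolerated_failures(3, 2, 2))
```

With N=3, R=2, W=2 any read sees at least one replica that holds the latest acknowledged write, and one replica can be down without blocking either operation.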
### Gossip protocol
Decentralized dissemination of membership information — each node periodically communicates with 1-3 random nodes. No central point of failure.
### Vector clocks
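Vector clocks capture the causality between versions of an object. This is the Dynamo approach: on conflict (for example after a partition heals), both versions are returned and the application merges them. Cassandra itself does not use vector clocks; it resolves conflicts per cell with write timestamps (last write wins).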
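A vector clock comparison can be sketched with plain dictionaries that map a node name to its counter. The helper names `compare` and `merge` are illustrative, not taken from any library.

```python
def compare(a: dict, b: dict) -> str:
    """Return 'equal', 'before', 'after' or 'concurrent' for clocks a and b."""
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"      # a happened before b, so b supersedes a
    if b_le_a:
        return "after"
    return "concurrent"      # neither dominates: a real conflict


def merge(a: dict, b: dict) -> dict:
    """Clock of a version that has seen both a and b (element-wise maximum)."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in set(a) | set(b)}


v1 = {"A": 2, "B": 1}
v2 = {"A": 1, "B": 2}
print(compare(v1, v2), merge(v1, v2))
```

Only the `concurrent` case needs application-level merging; in the other cases one version strictly supersedes the other and the older one can be discarded.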
### Merkle trees
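Anti-entropy repair uses a hash tree over each data range to detect divergence between replicas. Replicas compare root hashes first and descend only into subtrees whose hashes differ, which quickly narrows down the data ranges that need to be synchronized.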
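The descent into mismatching subtrees can be sketched as follows. This is a simplified illustration: `build` and `diff` are hypothetical helpers, each leaf stands for one data range, and the number of leaves is assumed to be a power of two.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build(leaves):
    """Build a Merkle tree; levels[0] holds leaf hashes, levels[-1] the root."""
    level = [_h(x) for x in leaves]
    levels = [level]
    while len(level) > 1:
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def diff(a_levels, b_levels):
    """Indices of leaves that differ, visiting only mismatching subtrees."""
    out = []

    def walk(depth, idx):
        if a_levels[depth][idx] == b_levels[depth][idx]:
            return                       # identical subtree: skip it entirely
        if depth == 0:
            out.append(idx)
            return
        walk(depth - 1, 2 * idx)
        walk(depth - 1, 2 * idx + 1)

    walk(len(a_levels) - 1, 0)
    return out


replica_a = [b"range0", b"range1", b"range2", b"range3"]
replica_b = [b"range0", b"range1", b"range2-stale", b"range3"]
print(diff(build(replica_a), build(replica_b)))
```

When the replicas agree, a single root comparison is enough; when they differ, the work is proportional to the number of diverging ranges rather than to the total amount of data.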
### Write path
```text
Client → Coordinator → [1. Write to commit log (disk)]
[2. Write to memtable (RAM)]
[3. Acknowledge client]
→ [4. Flush memtable → SSTable (periodically)]
→ [5. Compaction (merge SSTables)]
```
### Read path
```text
Client → Coordinator → [1. Check bloom filter]
[2. Check row cache / key cache]
[3. Read from SSTable (disk)]
[4. Merge with memtable]
[5. Repair if stale (read repair)]
```
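The write and read paths above can be mimicked with a toy log-structured store. This is a deliberately simplified sketch: `TinyLSM` is hypothetical, it keeps everything in memory, omits bloom filters and caches, and resolves conflicts by SSTable order rather than by write timestamps.

```python
class TinyLSM:
    def __init__(self, memtable_limit: int = 2):
        self.commit_log = []     # stand-in for the append-only log on disk
        self.memtable = {}       # in-memory writes
        self.sstables = []       # immutable sorted lists of (key, value), newest last
        self.memtable_limit = memtable_limit

    def write(self, key, value) -> None:
        self.commit_log.append((key, value))    # 1. durability first
        self.memtable[key] = value              # 2. write to the memtable
        if len(self.memtable) >= self.memtable_limit:
            self.flush()                        # 4. flush when the memtable is full

    def flush(self) -> None:
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}
        self.commit_log.clear()  # flushed data no longer needs replay

    def read(self, key):
        if key in self.memtable:                # newest data lives in memory
            return self.memtable[key]
        for table in reversed(self.sstables):   # then newest SSTable first
            for k, v in table:
                if k == key:
                    return v
        return None

    def compact(self) -> None:
        merged = {}
        for table in self.sstables:             # oldest first, newer values overwrite
            merged.update(table)
        self.sstables = [sorted(merged.items())]


db = TinyLSM(memtable_limit=2)
for k, v in [("a", 1), ("b", 2), ("a", 3), ("c", 4)]:
    db.write(k, v)
db.compact()
print(db.read("a"), db.sstables)
```

Writes never modify an existing SSTable, which is why the write path stays fast and compaction is needed later to discard superseded values.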
## Cassandra vs ScyllaDB
| Feature | Cassandra | ScyllaDB |
|---------|-----------|----------|
| **Language** | Java (JVM) | C++ (Seastar framework) |
| **Architecture** | Staged thread pools on a shared JVM heap | Shared-nothing, one shard per CPU core |
| **Latency** | 5-20 ms (typical) | 1-3 ms (typical) |
| **Throughput** | Good | 5-10× higher on same HW |
| **GC pauses** | Yes (JVM) | No (no GC) |
| **NUMA** | OS-dependent | Native NUMA aware |
| **Workload** | Standard | High-throughput, real-time |
| **Licensing** | Open source (Apache 2.0) | Open source + Enterprise |
## Data model
- **Keyspace** = namespace (analogy to DB)
- **Table** = column definition (not schema-less)
- **Partition key** = hash key for ring distribution
- **Clustering columns** = ordering within a partition
- **Primary key** = Partition key + Clustering columns
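The roles of the two key parts can be sketched in Python. The table is hypothetical: sensor readings with `sensor_id` as partition key and `ts` as clustering column, roughly `PRIMARY KEY ((sensor_id), ts)` in CQL terms. A dictionary stands in for the ring and a sorted list for the on-disk order within a partition.

```python
import bisect
from collections import defaultdict

# partition key -> rows kept sorted by clustering column
partitions = defaultdict(list)


def insert(sensor_id: str, ts: int, value: float) -> None:
    # Rows stay ordered by ts inside their partition.
    bisect.insort(partitions[sensor_id], (ts, value))


def range_query(sensor_id: str, start: int, end: int):
    """Rows of one partition with start <= ts < end; a cheap slice of sorted data."""
    rows = partitions[sensor_id]
    lo = bisect.bisect_left(rows, (start,))
    hi = bisect.bisect_left(rows, (end,))
    return rows[lo:hi]


insert("s1", 30, 21.5)
insert("s1", 10, 20.0)
insert("s1", 20, 20.7)
insert("s2", 15, 18.2)
print(range_query("s1", 10, 30))
```

Queries that name the partition key and slice on the clustering column are efficient; queries that need to scan across partitions are the ones the model does not serve well.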
## Recommendations — where Cassandra is better
| Area | Cassandra | Competition | Why Cassandra |
|------|-----------|-------------|---------------|
| **Write throughput** | Linear scaling, no master bottleneck | PostgreSQL (master writes) | Every node writes, no single point of failure |
| **Availability** | AP from CAP — always writable | MongoDB (CP, primary down = read-only) | "Always-writeable" philosophy |
| **Multi-DC** | Native, per-DC replication | CockroachDB (complex) | Simple configuration, latency-tolerant |
| **Time-series** | Wide-row model, TTL, compaction | InfluxDB (specialized) | Can combine with other workloads |
| **IoT / sensor data** | Linear scaling, no master | MongoDB (sharding complex) | Predictable performance under growth |
| **Geographic distribution** | Native multi-DC, hinted handoff | Spanner (vendor lock-in) | Open source, no dependencies |
### When to use Cassandra / ScyllaDB
- **IoT / sensor data ingest** — millions of writes/s, no data loss
- **Time-series at massive scale** — metrics, logs, event data
- **User activity history** — write-heavy workloads
- **Multi-DC applications** — data available in every location
- **Recommendation systems** — wide-row model for "what user has seen"
- **Message / event store** — high-throughput append with TTL
### When to use something else
- **Relations, JOINs, transactions** → PostgreSQL (Cassandra has no JOINs, limited transactions)
- **Full-text search** → Elasticsearch
- **Aggregation / OLAP** → ClickHouse (Cassandra is not an analytical DB)
- **Small data (< 100 GB)** → PostgreSQL (Cassandra overhead not worth it)
- **Frequent reads by secondary keys** → DynamoDB (global and local secondary indexes) — Cassandra has limited secondary indexes
### ScyllaDB specific
ScyllaDB is advantageous when:
- You need 5-10× higher throughput on the same HW
- You have latency-sensitive workload (real-time scoring, ad-tech)
- You want to eliminate JVM/GC issues
- You need predictable performance (P99 < 5 ms)
## Sources
References, books, and standards: [sources/databases/sources.en.md](sources/databases/sources.en.md)
### Recommended reading
| Paper / Book | Authors | Description |
|--------------|---------|-------------|
| Dynamo: Amazon's Highly Available Key-value Store (SOSP 2007) | DeCandia et al. | Foundational paper for Cassandra architecture |
| Cassandra: The Definitive Guide (3rd ed.) | E. Hewitt | Comprehensive guide to deployment and operations |
*Last revision: 2026-06-03*

135
CASSANDRA.md Normal file

@@ -0,0 +1,135 @@
# 🐘 Cassandra & ScyllaDB
## Overview
Apache Cassandra is a distributed wide-column NoSQL database designed for high availability and linear scalability with no single point of failure. It was inspired by the Amazon Dynamo paper (2007) and Google Bigtable. ScyllaDB is a C++ reimplementation compatible with the Cassandra protocol, with substantially lower latency and higher throughput.
## Architecture (Dynamo-inspired)
### Consistent hashing
Data is distributed on a hash ring; each node is responsible for a range of tokens:
```text
0 ─── node A ─── hash(key1)
90 ─── node B ─── hash(key2)
180 ─── node C ─── hash(key3)
270 ─── node D ─── hash(key4)
```
- Adding or removing a node remaps only about K/N keys (K keys, N nodes), a property of consistent hashing
- **Virtual nodes** (vnodes) — each physical node has ~100-200 tokens on the ring (more even distribution)
### Quorum (N, R, W)
- N = replication factor (typically 3)
- R = read quorum (typically 2)
- W = write quorum (typically 2)
- Condition: R + W > N (for strong-ish consistency)
- **Sloppy quorum** — when a node is unavailable, data is temporarily stored on another node
- **Hinted handoff** — temporary write with a hint; the data is transferred once the node recovers
### Gossip protocol
Decentralized dissemination of membership information — each node periodically communicates with 1-3 random nodes. No central point of failure.
### Vector clocks
Capture the causality of object versions. On conflict (partition merge), both versions are returned and the application merges them. This is the Dynamo approach; Cassandra itself resolves conflicts per cell by write timestamp (last write wins).
### Merkle trees
Anti-entropy — a hash tree for detecting divergence between replicas. Fast detection of which data ranges differ.
### Write path
```text
Client → Coordinator → [1. Write to commit log (disk)]
[2. Write to memtable (RAM)]
[3. Acknowledge client]
→ [4. Flush memtable → SSTable (periodically)]
→ [5. Compaction (merge SSTables)]
```
### Read path
```text
Client → Coordinator → [1. Check bloom filter]
[2. Check row cache / key cache]
[3. Read from SSTable (disk)]
[4. Merge with memtable]
[5. Repair if stale (read repair)]
```
## Cassandra vs ScyllaDB
| Feature | Cassandra | ScyllaDB |
|---------|-----------|----------|
| **Language** | Java (JVM) | C++ (Seastar framework) |
| **Architecture** | Staged thread pools (SEDA) on the JVM | Shared-nothing, one shard per CPU core |
| **Latency** | 5-20 ms (typical) | 1-3 ms (typical) |
| **Throughput** | Good | 5-10× higher on the same HW |
| **GC pauses** | Yes (JVM) | No (no GC) |
| **NUMA** | OS-dependent | Native NUMA aware |
| **Workload** | Standard | High-throughput, real-time |
| **Price** | Open source | Open source + Enterprise |
## Data model
- **Keyspace** = namespace (analogous to a database)
- **Table** = column definition (not schema-less)
- **Partition key** = hash key for distribution on the ring
- **Clustering columns** = ordering within a partition
- **Primary key** = Partition key + Clustering columns
## Recommendations — where Cassandra is better
| Area | Cassandra | Competition | Why Cassandra |
|------|-----------|-------------|---------------|
| **Write throughput** | Linear scaling, no master bottleneck | PostgreSQL (master writes) | Every node accepts writes, no single point of failure |
| **Availability** | AP from CAP — always writable | MongoDB (CP, primary down = read-only) | "Always-writeable" philosophy |
| **Multi-DC** | Native, replication configured per DC | CockroachDB (complex) | Simple configuration, latency-tolerant |
| **Time-series** | Wide-row model, TTL, compaction | InfluxDB (specialized) | Can be combined with other workloads |
| **IoT / sensor data** | Linear scaling, no master | MongoDB (sharding is complex) | Predictable performance under growth |
| **Geographic distribution** | Native multi-DC, hinted handoff | Spanner (vendor lock-in) | Open source, no dependencies |
### When to use Cassandra / ScyllaDB
- **IoT / sensor data ingest** — millions of writes/s, no data loss
- **Time-series at massive scale** — metrics, logs, event data
- **User activity history** — write-heavy workloads
- **Multi-DC applications** — data available in every location
- **Recommendation systems** — wide-row model for "what the user has seen"
- **Message / event store** — high-throughput append with TTL
### When to use something else
- **Relations, JOINs, transactions** → PostgreSQL (Cassandra has no JOINs and only limited transactions)
- **Full-text search** → Elasticsearch
- **Aggregation / OLAP** → ClickHouse (Cassandra is not an analytical DB)
- **Small data (< 100 GB)** → PostgreSQL (Cassandra's overhead is not worth it)
- **Frequent reads by secondary keys** → DynamoDB (global and local secondary indexes) — Cassandra has limited secondary indexes
### ScyllaDB specific
ScyllaDB is advantageous when:
- You need 5-10× higher throughput on the same HW
- You have a latency-sensitive workload (real-time scoring, ad-tech)
- You want to eliminate JVM/GC issues
- You need predictable performance (P99 < 5 ms)
## Sources
References, books, and standards: [sources/databases/sources.md](sources/databases/sources.md)
### Recommended reading
| Paper / Book | Authors | Description |
|--------------|---------|-------------|
| Dynamo: Amazon's Highly Available Key-value Store (SOSP 2007) | DeCandia et al. | Foundational paper for the Cassandra architecture |
| Cassandra: The Definitive Guide (3rd ed.) | E. Hewitt | Comprehensive guide to deployment and operations |
*Last revision: 2026-06-03*

679
CICD.en.md Normal file

@@ -0,0 +1,679 @@
# 🔄 CI/CD and DevOps
## CI/CD Pipeline
```
Code Commit → Build → Test → Package → Deploy to Staging → Integration Tests → Deploy to Production
```
### Detailed Pipeline Stages
```
1. Checkout ──→ 2. Lint ──→ 3. Test ──→ 4. Build ──→ 5. Scan ──→ 6. Publish ──→ 7. Deploy
                   │           │                        │
                ESLint/     Unit/Integ/              SAST/SCA/
                Prettier    e2e tests                Container scan
```
| Stage | Tools | What Happens |
|-------|-------|--------------|
| **Checkout** | git clone, fetch | Retrieve code from repository, including submodules |
| **Lint** | ESLint, Prettier, RuboCop, golangci-lint | Static code analysis, formatting |
| **Test (unit)** | Jest, pytest, JUnit | Fast tests (ms to s), no dependencies |
| **Test (integration)** | Testcontainers, Docker Compose | Tests with DB, message queue, external services |
| **Test (e2e)** | Playwright, Cypress, Selenium | Full-stack tests in the browser |
| **Build** | docker build, go build, npm run build, Maven | Compilation, artifact assembly |
| **Scan (SAST)** | Semgrep, SonarQube, CodeQL | Static security analysis |
| **Scan (DAST)** | OWASP ZAP, Burp Suite | Dynamic analysis (running application) |
| **Scan (SCA)** | Dependabot, Snyk, Trivy | Dependency and CVE analysis |
| **Publish** | Docker push, npm publish, Maven deploy | Upload artifact to registry |
| **Deploy** | ArgoCD, Terraform, Helm, kubectl | Deploy to target environment |
### Continuous Integration (CI)
- Automatic build and tests on every commit
- Fast feedback loop (< 10 min)
- Linting, type checking, unit tests, security scan (SAST)
### Continuous Delivery (CD)
- Automatic deployment to staging / test environments
- Manual approval for production (optional)
- Smoke tests after deployment
### Continuous Deployment
- Fully automatic deployment to production
- Requires high confidence in tests and monitoring
- Feature flags for risk management
## GitHub Actions Detail
### Workflow Syntax
```yaml
name: CI Pipeline
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
env:
  NODE_VERSION: "22"
jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: ${{ env.NODE_VERSION }}
      - run: npm ci
      - run: npm run lint
  test:
    runs-on: ubuntu-latest
    needs: lint
    strategy:
      matrix:
        node-version: [22, 24]
    steps:
      - uses: actions/checkout@v4
      # Install the Node.js version selected by the matrix, then the dependencies
      - uses: actions/setup-node@v4
        with:
          node-version: ${{ matrix.node-version }}
      - run: npm ci
      - name: Run tests
        run: npm test
```
### Matrix Builds
- Run the same jobs with different parameters (OS, language version, architecture)
- `strategy.matrix` — parameter combinations (Cartesian product)
- `strategy.fail-fast` — stop all if one fails
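The Cartesian product behind `strategy.matrix` is easy to reproduce. The helper below is an illustrative sketch (the name `expand_matrix` is made up) that does not model `include` or `exclude` entries.

```python
from itertools import product


def expand_matrix(matrix: dict) -> list:
    """Expand {"param": [values, ...]} into one dict per job combination."""
    keys = list(matrix)
    return [dict(zip(keys, combo)) for combo in product(*(matrix[k] for k in keys))]


jobs = expand_matrix({
    "os": ["ubuntu-latest", "windows-latest"],
    "node-version": [22, 24],
})
for job in jobs:
    print(job)
```

Two operating systems and two Node.js versions yield four jobs, so each added matrix dimension multiplies the runner time consumed.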
### Reusable Workflows
```yaml
# .github/workflows/deploy.yml (called workflow; its jobs are omitted here)
on:
  workflow_call:
    inputs:
      environment:
        required: true
        type: string
    secrets:
      cloud_role:
        required: true

# Caller workflow (a separate file)
jobs:
  deploy:
    uses: ./.github/workflows/deploy.yml
    with:
      environment: staging
    secrets:
      cloud_role: ${{ secrets.STAGING_ROLE }}
```
### Composite Actions
- Custom actions without needing a separate repository
- Combination of `run`, `uses`, `shell` steps
- Use case: standardize lint/test/build across repositories
### Self-hosted Runners
- Own infrastructure for running GitHub Actions
- Use case: private network, GPU, specific HW, compliance
- Scaling: actions-runner-controller (Kubernetes), auto-scaling groups
- Security: job isolation, ephemeral runners
## GitLab CI Detail
```yaml
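stages:
  - lint
  - test
  - build
  - deploy
variables:
  DOCKER_IMAGE: $CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA
lint:
  stage: lint
  image: node:22
  script:
    - npm ci
    - npm run lint
test:
  stage: test
  image: node:22
  needs: ["lint"]
  script:
    - npm ci   # every job starts from a clean checkout
    - npm test
  artifacts:
    paths:
      - coverage/
    reports:
      coverage_report:
        coverage_format: cobertura
        path: coverage/cobertura-coverage.xml
# Illustrative build job (an assumption): deploy-staging needs "build",
# which the original snippet referenced without defining
build:
  stage: build
  image: docker:27
  services:
    - docker:27-dind
  needs: ["test"]
  script:
    - echo "$CI_REGISTRY_PASSWORD" | docker login -u "$CI_REGISTRY_USER" --password-stdin "$CI_REGISTRY"
    - docker build -t "$DOCKER_IMAGE" .
    - docker push "$DOCKER_IMAGE"
deploy-staging:
  stage: deploy
  needs: ["build"]
  rules:
    - if: $CI_COMMIT_BRANCH == "main"
  environment:
    name: staging
    url: https://staging.example.com
  script:
    - kubectl set image deployment/app app=$DOCKER_IMAGE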
```
**Concepts**:
- **Stages** — sequential phases (each stage can have multiple parallel jobs)
- **Rules** — execution conditions (branch, tag, changes, variables) — replaces `only/except`
- **Needs** — DAG dependencies (job doesn't have to wait for entire stage)
- **Artifacts** — file sharing between jobs (binaries, reports, cache)
- **Environments** — deployment tracking (rollback, history, approvals)
### DAG Pipelines (Needs)
```
lint ──→ test ──→ build ──→ deploy-staging ──→ deploy-prod
build-arm ──→ test-arm
```
- Defines dependencies between jobs (not necessarily stages)
- Enables parallelization of independent jobs
- Reduces overall pipeline time
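How a `needs` graph turns into parallel execution can be sketched with the standard library's `graphlib` (Python 3.9+). The job names follow the diagram above; treating `build-arm` as having no prerequisites is an assumption, since the diagram does not show what it depends on.

```python
from graphlib import TopologicalSorter

# job -> jobs it needs
needs = {
    "lint": [],
    "test": ["lint"],
    "build": ["test"],
    "deploy-staging": ["build"],
    "deploy-prod": ["deploy-staging"],
    "build-arm": [],              # assumed independent of the main chain
    "test-arm": ["build-arm"],
}


def waves(graph: dict) -> list:
    """Group jobs into waves; every job in a wave can run in parallel."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    result = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        result.append(ready)
        sorter.done(*ready)
    return result


for i, wave in enumerate(waves(needs), start=1):
    print(i, wave)
```

With stages alone, `test-arm` would wait for every job of the preceding stage; with `needs` it starts as soon as `build-arm` finishes.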
## Infrastructure as Code (IaC)
| Tool | Type | Language |
|------|------|----------|
| Terraform | Declarative | HCL |
| OpenTofu | Declarative | HCL (Terraform fork) |
| Pulumi | Declarative | TypeScript, Python, Go, C# |
| AWS CDK | Declarative | TypeScript, Python, Java, C# |
| CloudFormation | Declarative | YAML/JSON (AWS) |
| Azure ARM/Bicep | Declarative | Bicep, JSON |
| Ansible | Imperative/Config | YAML |
| Chef/Puppet | Config mgmt | Ruby DSL |
### Infrastructure as Code (2nd Edition) — Kief Morris
Key reference for designing and operating dynamic cloud infrastructure with IaC. The book is tool-agnostic — it focuses on patterns and practices, not specific tools.
#### Three Fundamental Practices
| Practice | Description |
|----------|-------------|
| **Define everything as code** | All infrastructure defined in code, version control, repeatability |
| **Continuously test and deliver** | Every change goes through a pipeline with automated tests |
| **Small, independent pieces** | Small, loosely coupled components — easier change and testing |
#### Principles of Cloud Infrastructure
- **Systems reproducible** — infrastructure can be recreated from code at any time
- **Systems disposable** — instances can be destroyed and recreated
- **Systems consistent** — all environments identical (no snowflake servers)
- **Processes repeatable** — automation instead of manual procedures
- **Design always changing** — infrastructure is constantly evolving (not build-and-forget)
#### Anti-patterns (Pitfalls)
| Anti-pattern | Description |
|--------------|-------------|
| **Snowflake server** | Each server different, cannot reproduce |
| **Configuration drift** | Manual changes → deviations from defined state |
| **Server sprawl** | Too many servers without management |
| **Fragile infrastructure** | Changes often break the system |
| **Automation fear** | Fear of automation → manual interventions |
#### Book Structure (4 Parts)
1. **Foundations** — framework of tools and technologies for cloud platforms
2. **Working with infrastructure stacks** — defining, provisioning, testing and CD of infrastructure changes
3. **Working with servers and application runtime platforms** — provisioning and configuring servers and clusters
4. **Working with large systems and teams** — workflow, governance, architectural patterns for multiple teams
#### IaC Code Organization
| Pattern | Description |
|---------|-------------|
| **Monorepo** | One repository for everything — build-time integration, suitable for small teams |
| **Microrepo** | Separate repository for each project — isolation, suitable for large teams |
| **Domain organization** | Organizing code by domain concepts (not by technology) |
**Recommendations:**
- Infrastructure and applications can be in the same or separate repository depending on organizational structure (Team Topologies)
- Per-environment configuration files (test, staging, production) stored within the project
- Tests belong to the project, integration tests can be in a separate project
- Infrastructure code should not directly deploy applications — use OS packaging (RPM, deb)
#### Expand-Contract Pattern for Infrastructure Changes
Same principle as database migrations:
1. **Expand** — add new resource (old version still running)
2. **Migrate** — move traffic / dependencies to the new resource
3. **Contract** — remove old resource
Prevents outages when refactoring infrastructure.
## Terraform Detail
#### State Locking Mechanism
| Backend | Locking Mechanism | Note |
|---------|-------------------|------|
| **S3 + DynamoDB** | DynamoDB (ConditionalPut) | Most common, cheap, simple |
| **Terraform Cloud** | Built-in (API) | SaaS, audit logs, VCS integration |
| **Azure Storage** | Azure Blob Lease | Similar to S3 model |
| **GCS** | Lock object written next to the state (`.tflock`) | Built in, no extra service |
| **Consul** | Consul KV session_lock | High-availability |
| **PostgreSQL** | `pg_advisory_lock` | Built-in `pg` backend |
#### State Backends Comparison
| Property | S3 + DynamoDB | Terraform Cloud | Consul |
|----------|---------------|----------------|--------|
| Cost | $ (S3 + DynamoDB) | $$ (free tier limited) | $$ (infra) |
| Team workflow | GitHub Actions + OIDC | Native RBAC, runs | Custom |
| Locking | DynamoDB | Built-in | Consul session |
| History | S3 versioning | Full history, diff | None |
| Remote ops | No (state only) | Yes (remote runs) | No |
| Encryption | SSE-S3/KMS | At rest + in transit | TLS |
#### Workspaces vs Terragrunt
| Aspect | Terraform Workspaces | Terragrunt |
|--------|---------------------|------------|
| **State separation** | One backend, key: `env:/workspace` | Separate backend per env |
| **Code reuse** | Same code, different variables | DRY configuration, modules |
| **Risk** | Accidentally `apply` to wrong workspace | Isolated backends |
| **When to use** | Simple projects, <5 envs | Microservices, multi-env, multi-team |
| **Extra features** | — | Dependency, include, before_hook |
#### Provider Versioning
```hcl
terraform {
  required_version = ">= 1.5, < 2.0"
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = ">= 2.23"
    }
  }
}
```
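- `~> 5.0` — any 5.x release (`>= 5.0, < 6.0`); only the rightmost component may grow, so `~> 5.0.0` would allow patch releases only
- `>= 2.23` — any version from 2.23 up, including future major versions; add `< 3.0` to stay on 2.x
- `~>` constraints block breaking major upgrades while still picking up newer minor and patch releases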
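The semantics of the pessimistic operator can be written down in a few lines. This is a sketch of the rule only, not Terraform's implementation: the function names are illustrative, pre-release suffixes are not handled, and the constraint must have at least two components.

```python
def parse(version: str) -> tuple:
    return tuple(int(part) for part in version.split("."))


def pessimistic_ok(base_version: str, candidate: str) -> bool:
    """Does `candidate` satisfy '~> base_version'? Only the rightmost component may grow."""
    base = parse(base_version)
    if len(base) < 2:
        raise ValueError("this sketch expects at least MAJOR.MINOR")
    cand = parse(candidate)
    cand += (0,) * (len(base) - len(cand))   # pad "5" to (5, 0) before comparing
    upper = base[:-2] + (base[-2] + 1,)      # bump the second-to-last component
    return base <= cand and cand[: len(upper)] < upper


for constraint, version in [("5.0", "5.31.0"), ("5.0", "6.0.0"),
                            ("5.0.0", "5.0.9"), ("5.0.0", "5.1.0")]:
    print(f"~> {constraint} accepts {version}: {pessimistic_ok(constraint, version)}")
```

The number of components in the constraint decides how much is pinned: `~> 5.0` fixes the major version, `~> 5.0.0` fixes major and minor.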
### Terraform Workflow
```
terraform init → Initialize backend, download providers and modules
terraform plan → Show changes
terraform apply → Apply changes
terraform destroy → Destroy infrastructure
terraform validate → Syntax validation
terraform fmt → Format HCL
```
### State Management
- Remote state (S3, Terraform Cloud, Azure Storage)
- State locking (DynamoDB, Consul)
- Workspaces for environment separation
### Terraform: Up and Running (3rd ed.) — Yevgeniy Brikman
Practical guide to Terraform from the founder of Gruntwork. The 3rd edition (2022) adds over 100 pages of new content, updates from Terraform 0.12 to 1.2, and two new chapters.
#### What's New in the 3rd Edition
| New Feature | Description |
|-------------|-------------|
| **Chapter: Secrets management** | Managing secrets with Terraform — Vault, AWS Secrets Manager, KMS, OIDC, `sensitive` variables |
| **Chapter: Multiple providers** | Working with multiple regions, accounts, clouds including Kubernetes (AWS EKS) |
| **Terraform 1.0+** | Backward compatibility promise, stability, HashiCorp IPO |
| **Provider versioning** | `required_providers` block + `.terraform.lock.hcl` (lock file) |
| **Module iteration** | `count` and `for_each` on modules (since Terraform 0.13) |
| **Variable validation** | `validation {}` blocks, `precondition` / `postcondition` |
| **Refactoring** | `moved` blocks — safe refactoring without manual state manipulation |
| **CI/CD security** | OIDC authentication, isolated workers for `terraform apply` |
#### Secrets Management with Terraform
```hcl
# Variable marked as sensitive: redacted in plan/apply output,
# but still stored in plaintext in the state file
variable "db_password" {
  type      = string
  sensitive = true
}

# Reading secrets from AWS Secrets Manager
data "aws_secretsmanager_secret" "db" {
  name = "production/db/master"
}

data "aws_secretsmanager_secret_version" "db" {
  secret_id = data.aws_secretsmanager_secret.db.id
}
```
**Recommended Security Hierarchy:**
1. **OIDC** — most secure, no creds on CI server (GitHub Actions → IAM role)
2. **IAM role** — instance profile (EC2, ECS, EKS)
3. **Environment variables** — limited, risk of log leakage
4. **Isolated workers** — separate worker with admin permissions, API only `plan`/`apply`
#### Testing Terraform Code
| Layer | Tools | Description |
|-------|-------|-------------|
| **Static analysis** | `terraform validate`, `tflint`, `tfsec`, `checkov` | Code analysis without execution |
| **Plan testing** | `conftest` + OPA (Rego), `terraform plan` parse | Plan validation against policy |
| **Unit tests** | Terratest (Go), `terraform fmt`, `terraform validate` | Testing modules in isolation |
| **Integration tests** | Terratest (Go) | Actual provisioning + assert |
| **End-to-end tests** | Terratest | Full stack, smoke tests |
#### Policy Enforcement
```rego
# OPA / conftest — deny public S3 bucket
# Evaluated against the JSON form of a plan (terraform show -json)
package main

import rego.v1

deny contains msg if {
    resource := input.resource_changes[_]
    resource.type == "aws_s3_bucket"
    resource.change.after.acl == "public-read"
    msg := sprintf("%s must not be public", [resource.address])
}
```
#### Production-grade Checklist by Brikman
1. **Small modules** — one module = one thing (single responsibility)
2. **Composable modules** — modules can be composed into larger units
3. **Testable modules** — each module has tests (Terratest)
4. **Releasable modules** — versioning (Git tags, Terraform Registry)
5. **Version control** — everything in git, including `.terraform.lock.hcl`
6. **Remote state** — S3 + DynamoDB or Terraform Cloud
7. **CI/CD pipeline** — `plan` on MR, `apply` after merge to main
8. **Secrets management** — no secrets in plaintext in code
9. **Policy as code** — OPA / Sentinel for compliance
10. **Sandbox environment** — each developer has their own isolated environment
#### Golden Rule of Terraform
> **Master branch state must always be in sync with the production environment.**
> Never run `terraform apply` manually locally on production — always via CI/CD.
## Dockerfile Best Practices
```dockerfile
# Multi-stage build
FROM node:22-alpine AS builder
WORKDIR /app
COPY package*.json ./
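# Install all dependencies: the build step needs devDependencies
RUN npm ci
COPY . .
# Build, then drop devDependencies so only runtime packages reach the final image
RUN npm run build && npm prune --omit=dev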
# Runtime stage — distroless
FROM gcr.io/distroless/nodejs22-debian12
WORKDIR /app
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
USER nonroot:nonroot
EXPOSE 3000
CMD ["dist/server.js"]
```
**Rules**:
- **Multi-stage build** — separate build tools from runtime
- **Distroless images** — minimal attack surface (no shell, package manager)
- **Non-root user** — USER nonroot (security best practice)
- **Layer caching** — copy less-frequently changing files first (package.json → npm ci → code)
- **Small base image** — Alpine (5 MB), distroless (minimal), scratch (Go static binary)
- **Healthcheck** — HEALTHCHECK instruction for orchestrator
- **Labels** — LABEL maintainer, version, git commit
- **.dockerignore** — minimize build context
## Artifact Management
### Docker Registries
| Registry | Public/Private | Cost | Integration |
|----------|---------------|------|-------------|
| **Docker Hub** | Both | Public free, private $5/month | GitHub Actions, GitLab |
| **ECR (AWS)** | Private | $0.10/GB/month + data transfer | IAM, ECS, EKS |
| **GHCR (GitHub)** | Both | Public free, private 500 MB free | GitHub Actions, npm |
| **GCR / Artifact Registry** | Private | $0.10/GB/month | GKE, Cloud Build |
| **ACR (Azure)** | Private | $0.11/GB/month | AKS, Azure DevOps |
| **Harbor** | Private (self-hosted) | Free (open source) | Custom, CNCF |
### Helm Charts
- **Repository** — index.yaml + chart .tgz on HTTP server (S3, GitHub Pages, ChartMuseum)
- **OCI registry** — Helm 3.8+ supports storing charts in OCI registries (ECR, GHCR, Harbor)
- **Versioning** — chart version (package) + app version (application)
### SBOM (Software Bill of Materials)
- **SPDX** / **CycloneDX** — standard SBOM formats
- Generation: Trivy, Syft (Grype then scans an SBOM for vulnerabilities)
- Use case: supply chain security, compliance (EO 14028, EU CRA)
## Configuration and Secrets
| Tool | Description |
|------|-------------|
| Vault (HashiCorp) | Dynamic secrets, encryption-as-a-service |
| AWS Secrets Manager | Managed, auto-rotation |
| Azure Key Vault | Managed, HSM support |
| GCP Secret Manager | Managed |
| SOPS | Encryption in git repos |
| Sealed Secrets | Encrypted secrets for Kubernetes |
### Secret Management Workflows
**Vault Agent Injector** (Kubernetes)
- Sidecar container (vault-agent) injects secrets into the pod
- Secrets mounted as tmpfs volume (not into environment variables)
- Auto-rotation: vault-agent periodically refreshes secrets
**External Secrets Operator** (Kubernetes)
- CRD: `ExternalSecret` → creates `Secret` in K8s
- Backend: AWS Secrets Manager, Azure Key Vault, GCP Secret Manager, Vault
- Poll-based refresh: the operator re-reads the external store every `refreshInterval` and updates the K8s `Secret`
**Sealed Secrets**
- `kubeseal` encrypts the Secret client-side with the controller's public key (the private key stays in the cluster)
- Encrypted manifest (SealedSecret) can be safely in git
- Controller decrypts on deploy
## GitOps
- **Principle**: Git is the single source of truth
- **Tools**: ArgoCD, Flux, Rancher Fleet
- Pull-based deploy — agent in the cluster watches repo and applies changes
- Auto-sync + drift detection
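The core of a pull-based agent is a reconcile step that compares the desired state from Git with what is running. The sketch below reduces both to plain dictionaries of `name -> spec`; the function and resource names are illustrative and not tied to the ArgoCD or Flux APIs.

```python
def reconcile(desired: dict, actual: dict) -> dict:
    """Return the actions needed to make `actual` match `desired`."""
    return {
        "create": sorted(desired.keys() - actual.keys()),
        "update": sorted(
            name for name in desired.keys() & actual.keys()
            if desired[name] != actual[name]        # drift or a new commit
        ),
        "delete": sorted(actual.keys() - desired.keys()),
    }


git_state = {"web": {"image": "web:1.4", "replicas": 3}, "worker": {"image": "worker:2.0"}}
cluster_state = {"web": {"image": "web:1.4", "replicas": 5}, "debug-pod": {"image": "busybox"}}
print(reconcile(git_state, cluster_state))
```

A manual `replicas` change in the cluster shows up as an `update`, which is how drift gets detected and, with auto-sync enabled, reverted.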
## Environment Promotion (dev → staging → prod)
```
Code → Dev (auto-deploy) → Staging (auto + smoke tests) → Prod (manual approval + gating)
```
**Quality Gates**:
1. **Unit tests** — pass rate 100 %, code coverage ≥ 80 %
2. **Integration tests** — all critical paths pass
3. **SAST scan** — no critical/high vulnerabilities
4. **SCA scan** — no known critical CVEs
5. **Container scan** — all fixable vulns addressed
6. **Smoke tests** — after staging deploy (health endpoint, basic flow)
7. **Manual approval** — for production (optional with CD)
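The gates above can be expressed as a single promotion check. This is a sketch under assumptions: the `BuildReport` fields and their names are hypothetical, and a real pipeline would fill them from test and scanner reports.

```python
from dataclasses import dataclass


@dataclass
class BuildReport:
    unit_pass_rate: float          # fraction of unit tests passing, 0.0 to 1.0
    coverage: float                # code coverage, 0.0 to 1.0
    integration_passed: bool
    high_or_critical_sast: int
    critical_cves: int
    fixable_container_vulns: int
    smoke_passed: bool


def failed_gates(r: BuildReport, min_coverage: float = 0.80) -> list:
    checks = {
        "unit tests": r.unit_pass_rate == 1.0 and r.coverage >= min_coverage,
        "integration tests": r.integration_passed,
        "SAST": r.high_or_critical_sast == 0,
        "SCA": r.critical_cves == 0,
        "container scan": r.fixable_container_vulns == 0,
        "smoke tests": r.smoke_passed,
    }
    return [name for name, ok in checks.items() if not ok]


def may_promote_to_prod(r: BuildReport, approved: bool) -> bool:
    return not failed_gates(r) and approved


report = BuildReport(1.0, 0.76, True, 0, 1, 0, True)
print(failed_gates(report))
```

Returning the names of the failed gates, rather than a bare boolean, gives the pipeline something actionable to print.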
## Deployment Strategies
| Strategy | Description | Risk |
|----------|-------------|------|
| **Rolling update** | Gradual instance replacement | Low |
| **Blue/Green** | Two identical environments, traffic switch | Medium |
| **Canary** | % traffic to new version, gradual increase | Low |
| **Feature flag** | Toggle feature on/off without deploy | Very low |
| **A/B testing** | Different versions for different users | Low |
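For canary releases, one common way to split traffic is a stable hash of the user ID, so a given user keeps seeing the same version while the percentage grows. The sketch below assumes that approach; the function name is illustrative, and in practice the split is often done by a load balancer or service mesh instead.

```python
import hashlib


def routes_to_canary(user_id: str, canary_percent: int) -> bool:
    """Deterministically place a user in one of 100 buckets and compare to the rollout percentage."""
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return bucket < canary_percent


users = [f"user-{i}" for i in range(10_000)]
for percent in (1, 10, 50, 100):
    share = sum(routes_to_canary(u, percent) for u in users) / len(users)
    print(f"{percent:>3}% rollout -> {share:.1%} of users on the new version")
```

Because the bucket depends only on the user ID, raising the percentage only adds users to the canary group and never moves anyone back to the old version.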
## Git Branching Strategies
| Strategy | Description | Suitable For |
|----------|-------------|--------------|
| **Trunk-based** | Single main branch, short feature branches (< 1 day) | CD, microservices, mature teams |
| **GitHub Flow** | Main + feature branches, PRs, simple | Startups, web apps |
| **GitLab Flow** | Main + environment branches (staging, prod) + feature branches | Enterprise, regulated |
| **GitFlow** | Develop + main + feature/release/hotfix branches | Release-based, enterprise legacy |
| **One Flow** | Simplified GitFlow (no develop branch) | Medium teams |
## Rollback Strategies
| Strategy | Description | Speed | Risk |
|----------|-------------|-------|------|
| **Forward fix** | New deploy with hotfix | Slow (build + deploy) | Low |
| **Rollback (revert commit)** | Git revert, new deploy | Medium | Low |
| **Blue/Green switchback** | Switch back to old version | Instant | DB incompatibility |
| **Database rollback** | Revert DB migration (migrate down) | Slow | Data loss risk |
### Database Rollback Challenges
- **Breaking changes** — removing a column/table means rollback problem (data lost)
- **Best practice**: Expand → Migrate → Contract (never remove in a single deploy)
- **Tooling**: Flyway undo (limited), Liquibase rollback, pgroll (Postgres)
- **Feature flags** as prevention — new code is behind a flag, rollback = disable flag
## CI/CD Design Patterns
Modern CI/CD pipelines solve recurring problems using design patterns:
| Pattern | Description |
|---------|-------------|
| **Pipeline as Code** | Pipeline defined in YAML/Kotlin DSL (`.gitlab-ci.yml`, `.github/workflows/`) |
| **Immutable Pipeline** | Each build is an artifact, never changed |
| **Quality Gate** | Branch protection, required checks, code coverage threshold |
| **Deployment Strategy** | Blue/Green, Canary, Rolling (see Deployment Strategies above) |
| **GitOps** | Pull-based deploy with auto-sync and drift detection |
| **Shift-Left Security** | SAST/DAST/SCA part of the pipeline |
| **Dependency Caching** | Cache layer between pipeline runs |
## Shift Left Security
### SCA (Software Composition Analysis)
| Tool | Type | Integration |
|------|------|-------------|
| **Dependabot** | GitHub native | GitHub, auto-PR for fix |
| **Renovate** | Multi-platform | GitHub, GitLab, Bitbucket |
| **Snyk** | SaaS + CLI | All platforms, Docker, IaC |
| **Trivy** | CLI, OSS | CI/CD pipeline (GitHub Actions, GitLab) |
### SAST (Static Application Security Testing)
| Tool | Languages | Characteristics |
|------|-----------|----------------|
| **Semgrep** | 30+ (Python, Java, Go, JS/TS) | Fast, custom rules, CI-native |
| **SonarQube** | 30+ | Comprehensive, quality gates, tech debt |
| **CodeQL** | 12 (C++, C#, Go, Java, JS/TS, Python) | GitHub native, query-based |
| **Checkmarx** | 30+ | Enterprise, CxSAST, CxFlow |
| **Fortify** | 30+ | Enterprise, SAST + DAST |
### Container Scanning
| Tool | Description |
|------|-------------|
| **Trivy** | OSS, scans OS packages + language-specific + IaC |
| **Grype** | OSS, from Anchore, fast, Syft for SBOM |
| **Clair** | Red Hat, OSS, OCI-compatible |
| **Docker Scout** | Docker Desktop / CLI, integration with Docker Hub |
## AI-Native Software Delivery (2025–2026)
AI is reshaping software delivery, a shift sometimes labeled DevOps 2.0:
- **AI-assisted CI/CD** — automatic pipeline failure diagnosis, resource allocation optimization
- **Agent Control Protocol (ACP)** / **Model Context Protocol (MCP)** — standards for AI agent interaction with tooling
- **AI-driven cost management** — FinOps cloud optimization
- **Intelligent test selection** — ML determines which tests to run based on code changes
- **Self-healing pipelines** — AI auto-detects and fixes common issues
New tools: Harness (AI-native CD), GitLab 19.0 (agentic MR workflows, secrets manager), Octopus Deploy.
## Pipeline Tools
- **GitHub Actions** — integrated with GitHub, large marketplace
- **GitLab CI** — native in GitLab, auto DevOps
- **Jenkins** — oldest, extensible, self-hosted
- **CircleCI** — SaaS, fast
- **Argo Workflows** — Kubernetes native
- **Buildkite** — hybrid (own agents, SaaS orchestrator)
## Best Practices
- **Idempotent pipeline** — repeated runs give the same result
- **Immutable infrastructure** — never modify a running server, always redeploy
- **Shift left** — tests and security as early as possible in the pipeline
- **Artifact management** — all builds versioned in registry (Docker Hub, ECR, GHCR)
- **Dependency caching** — speed up pipeline (npm ci, pip cache, Docker layer caching)
- **Fail fast** — pipeline fails as early as possible on error
- **Quality gates** — automated checks before every promotion to the next environment
- **Pipeline visibility** — dashboard with current status of all pipelines (GitHub, GitLab, ArgoCD)
## Resources
Links, books and standards: [sources/cicd/sources.en.md](sources/cicd/sources.en.md)
### Recommended Reading
| Book | Authors | ISBN | Key Contribution |
|------|---------|------|-----------------|
| The DevOps Handbook | Kim, Humble, Debois, Willis | 978-1942788003 | CALMS principles (Culture, Automation, Lean, Measurement, Sharing), flow map, deployment pipeline |
| Continuous Delivery | Humble, Farley | 978-0321601912 | Deployment pipeline, commit stage, acceptance tests, capacity testing, zero-downtime release |
| CI/CD Design Patterns | Bajpai, Schildmeijer, Piwosz, Mishra | 978-1-83588-965-7 | 30+ design patterns for CI/CD — pipeline patterns, GitOps, security, testing, deployment strategies |
| DevOps Frameworks, Techniques, and Tools | Vijayakumaran, Kofler, Öggl, Springer | 978-1-4932-2670-2 | Framework for DevOps adoption, tool comparison (Jenkins vs GitLab vs GitHub Actions), techniques for monitoring and observability |
## OpenStack CI/CD
OpenStack ecosystem uses its own CI/CD tools:
### Zuul
- CI/CD system developed by the OpenStack community (now standalone, used outside OpenStack)
- **Gating** — changes are tested before merge (not after merge) — prevents breaking main branch
- **Ansible-based** — jobs are Ansible playbooks
- **Nodepool** — dynamic test VM allocation in the cloud (OpenStack, AWS)
- **Pipeline** — check, gate, post, periodic, tag, release
### OpenStack Infra (OpenDev)
- Public CI infrastructure for OpenStack projects
- Tools: Gerrit (code review), Zuul (CI), Nodepool (test nodes), Storyboard (issue tracking)
- Base jobs: tempest (integration tests), grenade (upgrade tests), devstack-gate (gate tests)
### Integration with External Tools
- **Terraform** — OpenStack provider for provisioning (terraform-provider-openstack)
- **Ansible** — openstack.cloud collection for managing OpenStack resources
- **Packer** — build OpenStack images (openstack builder)
- **Jenkins** — older CI, still used in some distributions
*Last revised: 2026-06-03*

679
CICD.md Normal file

@@ -0,0 +1,679 @@
# 🔄 CI/CD and DevOps
## CI/CD Pipeline
```
Code Commit → Build → Test → Package → Deploy to Staging → Integration Tests → Deploy to Production
```
### Detailed Pipeline Stages
```
1. Checkout ──→ 2. Lint ──→ 3. Test ──→ 4. Build ──→ 5. Scan ──→ 6. Publish ──→ 7. Deploy
                   │           │                        │
                ESLint/     Unit/Integ/              SAST/SCA/
                Prettier    e2e tests                Container scan
```
| Stage | Tools | What Happens |
|-------|-------|--------------|
| **Checkout** | git clone, fetch | Fetches the code from the repository, including submodules |
| **Lint** | ESLint, Prettier, RuboCop, golangci-lint | Static code analysis, formatting |
| **Test (unit)** | Jest, pytest, JUnit | Fast tests (milliseconds to seconds), no external dependencies |
| **Test (integration)** | Testcontainers, Docker Compose | Tests against a DB, message queue, external services |
| **Test (e2e)** | Playwright, Cypress, Selenium | Full-stack tests in a browser |
| **Build** | Docker build, go build, npm run build, Maven | Compilation, building the artifact |
| **Scan (SAST)** | Semgrep, SonarQube, CodeQL | Static security analysis |
| **Scan (DAST)** | OWASP ZAP, Burp Suite | Dynamic analysis (running application) |
| **Scan (SCA)** | Dependabot, Snyk, Trivy | Dependency and CVE analysis |
| **Publish** | Docker push, npm publish, Maven deploy | Uploading the artifact to a registry |
| **Deploy** | ArgoCD, Terraform, Helm, kubectl | Deployment to the target environment |
### Continuous Integration (CI)
- Automatic build and tests on every commit
- Fast feedback loop (< 10 min)
- Linting, type checking, unit tests, security scan (SAST)
### Continuous Delivery (CD)
- Automatic deploys to staging / test environments
- Manual approval before production (optional)
- Smoke tests after each deploy
### Continuous Deployment
- Fully automatic deploy to production
- Requires high confidence in tests and monitoring
- Feature flags to control risk
## GitHub Actions in Detail
### Workflow Syntax
```yaml
name: CI Pipeline
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
env:
  NODE_VERSION: "22"
jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: ${{ env.NODE_VERSION }}
      - run: npm ci
      - run: npm run lint
  test:
    runs-on: ubuntu-latest
    needs: lint
    strategy:
      matrix:
        node-version: [22, 24]
    steps:
      - uses: actions/checkout@v4
      # Install the Node.js version selected by the matrix
      - uses: actions/setup-node@v4
        with:
          node-version: ${{ matrix.node-version }}
      - run: npm ci
      - name: Run tests
        run: npm test
```
### Matrix builds
- Runs the same jobs with different parameters (OS, language version, architecture)
- `strategy.matrix` — combinations of parameters (Cartesian product)
- `strategy.fail-fast` — cancels the remaining matrix jobs when one of them fails (enabled by default)
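
The Cartesian-product expansion can be illustrated outside of GitHub Actions. The Python sketch below only models the idea and is not how the runner is implemented; the function name `expand_matrix` and the `os` axis are assumptions made for this example.

```python
from itertools import product


def expand_matrix(matrix: dict[str, list]) -> list[dict]:
    """Return one job configuration per combination of matrix values."""
    keys = list(matrix)
    value_lists = (matrix[key] for key in keys)
    return [dict(zip(keys, combo)) for combo in product(*value_lists)]


# Hypothetical matrix: two operating systems times two Node.js versions
jobs = expand_matrix({
    "os": ["ubuntu-latest", "windows-latest"],
    "node-version": [22, 24],
})

for job in jobs:
    print(job)
```

Two axes with two values each yield four jobs; every additional axis multiplies the job count, which is worth keeping in mind for runner minutes.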
### Reusable workflows
```yaml
# .github/workflows/deploy.yml (the called, reusable workflow)
on:
  workflow_call:
    inputs:
      environment:
        required: true
        type: string
    secrets:
      cloud_role:
        required: true

# Caller workflow (a separate file) invoking the reusable workflow
jobs:
  deploy:
    uses: ./.github/workflows/deploy.yml
    with:
      environment: staging
    secrets:
      cloud_role: ${{ secrets.STAGING_ROLE }}
```
### Composite actions
- Custom actions that do not require a separate repository
- Combine `run`, `uses`, and `shell` steps
- Use case: standardizing lint/test/build across repositories
### Self-hosted runners
- Your own infrastructure for running GitHub Actions jobs
- Use case: private network, GPU, specific hardware, compliance
- Scaling: actions-runner-controller (Kubernetes), auto-scaling groups
- Security: job isolation, ephemeral runners
## GitLab CI in Detail
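```yaml
stages:
  - lint
  - test
  - build
  - deploy
variables:
  DOCKER_IMAGE: $CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA
lint:
  stage: lint
  image: node:22
  script:
    - npm ci
    - npm run lint
test:
  stage: test
  image: node:22
  needs: ["lint"]
  script:
    - npm ci
    - npm test
  artifacts:
    paths:
      - coverage/
    reports:
      coverage_report:
        coverage_format: cobertura
        path: coverage/cobertura-coverage.xml
# Assumed build job (missing from the original listing); deploy-staging depends on it
build:
  stage: build
  image: docker:27
  services:
    - docker:27-dind
  needs: ["test"]
  script:
    - echo "$CI_REGISTRY_PASSWORD" | docker login -u "$CI_REGISTRY_USER" --password-stdin "$CI_REGISTRY"
    - docker build -t $DOCKER_IMAGE .
    - docker push $DOCKER_IMAGE
deploy-staging:
  stage: deploy
  needs: ["build"]
  rules:
    - if: $CI_COMMIT_BRANCH == "main"
  environment:
    name: staging
    url: https://staging.example.com
  script:
    - kubectl set image deployment/app app=$DOCKER_IMAGE
```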
**Concepts**:
- **Stages** — sequential phases (each stage can run multiple jobs in parallel)
- **Rules** — conditions for running a job (branch, tag, changes, variables); replaces `only/except`
- **Needs** — DAG dependencies (a job does not have to wait for the whole previous stage)
- **Artifacts** — passing files between jobs (binaries, reports); dependency caching uses the separate `cache` keyword
- **Environments** — deployment tracking (rollback, history, approvals)
### DAG pipelines (needs)
```
lint ──→ test ──→ build ──→ deploy-staging ──→ deploy-prod
build-arm ──→ test-arm
```
- Defines dependencies between jobs (not necessarily between stages)
- Allows independent jobs to run in parallel
- Reduces total pipeline time
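
How a `needs` graph turns into parallel execution can be sketched with Python's standard `graphlib`. This is an illustrative model, not GitLab's scheduler; the `needs` mapping mirrors the diagram above, and treating `build-arm` as an independent chain is an assumption.

```python
from graphlib import TopologicalSorter

# job -> jobs it needs (assumed from the diagram; build-arm has no prerequisites)
needs = {
    "lint": [],
    "test": ["lint"],
    "build": ["test"],
    "deploy-staging": ["build"],
    "deploy-prod": ["deploy-staging"],
    "build-arm": [],
    "test-arm": ["build-arm"],
}


def execution_waves(graph: dict[str, list[str]]) -> list[list[str]]:
    """Group jobs into waves; all jobs within a wave can run in parallel."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


waves = execution_waves(needs)
for number, wave in enumerate(waves, start=1):
    print(number, wave)
```

With stages alone, `test-arm` would wait for every job of the preceding stage; with `needs` it starts as soon as `build-arm` finishes.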
## Infrastructure as Code (IaC)
| Tool | Type | Language |
|------|------|----------|
| Terraform | Declarative | HCL |
| OpenTofu | Declarative | HCL (Terraform fork) |
| Pulumi | Declarative | TypeScript, Python, Go, C# |
| AWS CDK | Declarative | TypeScript, Python, Java, C# |
| CloudFormation | Declarative | YAML/JSON (AWS) |
| Azure ARM/Bicep | Declarative | Bicep, JSON |
| Ansible | Imperative/Config | YAML |
| Chef/Puppet | Config mgmt | Ruby DSL |
### Infrastructure as Code (2nd Edition) — Kief Morris
A key reference for designing and operating dynamic cloud infrastructure with IaC. The book is tool-agnostic: it focuses on patterns and practices rather than specific tools.
#### Three Core Practices
| Practice | Description |
|----------|-------------|
| **Define everything as code** | All infrastructure defined in code, under version control, repeatable |
| **Continuously test and deliver** | Every change goes through a pipeline with automated tests |
| **Small, independent pieces** | Small, loosely coupled components that are easier to change and test |
#### Principles of Cloud Infrastructure
- **Systems reproducible** — infrastructure can be recreated from code at any time
- **Systems disposable** — instances can be destroyed and recreated
- **Systems consistent** — all environments are identical (no snowflake servers)
- **Processes repeatable** — automation instead of manual procedures
- **Design always changing** — infrastructure evolves continuously (it is not build-and-forget)
#### Anti-patterns (Pitfalls)
| Anti-pattern | Description |
|--------------|-------------|
| **Snowflake server** | Every server is different and cannot be reproduced |
| **Configuration drift** | Manual changes cause deviations from the defined state |
| **Server sprawl** | Too many servers, left unmanaged |
| **Fragile infrastructure** | Changes frequently break the system |
| **Automation fear** | Fear of automation leads to manual interventions |
#### Book Structure (4 Parts)
1. **Foundations** — a framework of tools and technologies for cloud platforms
2. **Working with infrastructure stacks** — defining, provisioning, testing, and continuously delivering infrastructure changes
3. **Working with servers and application runtime platforms** — provisioning and configuring servers and clusters
4. **Working with large systems and teams** — workflow, governance, architectural patterns for multiple teams
#### Organizing IaC Code
| Pattern | Description |
|---------|-------------|
| **Monorepo** | One repository for everything; build-time integration, suits small teams |
| **Microrepo** | A separate repository per project; isolation, suits large teams |
| **Domain organization** | Code organized by domain concepts (not by technology) |
**Recommendations:**
- Infrastructure and application code can share a repository or live in separate ones, depending on the organizational structure (Team Topologies)
- Keep per-environment configuration files (test, staging, production) within the project
- Tests belong with the project; integration tests can live in a separate project
- Infrastructure code should not deploy applications directly; use OS packaging (RPM, deb)
#### Expand-Contract Pattern for Infrastructure Changes
The same principle as in database migrations:
1. **Expand** — add the new resource (the old version keeps running)
2. **Migrate** — move traffic / dependencies to the new resource
3. **Contract** — remove the old resource
This prevents outages when refactoring infrastructure.
## Terraform detail
#### State locking mechanism
| Backend | Locking mechanism | Note |
|---------|------------------|------|
| **S3 + DynamoDB** | DynamoDB (ConditionalPut) | Most common, cheap, simple |
| **Terraform Cloud** | Built-in (API) | SaaS, audit logs, VCS integration |
| **Azure Storage** | Azure Blob Lease | Similar to the S3 model |
| **GCS** | Cloud Storage Object Hold | Limited |
| **Consul** | Consul KV session_lock | High availability |
| **PostgreSQL** | pg_advisory_lock / row lock | Self-hosted backend |
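
Most of these mechanisms come down to a conditional write: the lock record is created only if it does not exist yet. The Python sketch below models that idea with an in-memory table. `LockTable`, `acquire`, and `release` are hypothetical names for illustration, not part of Terraform or any cloud SDK.

```python
import uuid


class LockTable:
    """In-memory stand-in for a store that supports conditional writes."""

    def __init__(self):
        self._items = {}

    def put_if_absent(self, key: str, value: str) -> bool:
        """Write the item only if the key does not exist (conditional put)."""
        if key in self._items:
            return False
        self._items[key] = value
        return True

    def delete_if_equals(self, key: str, value: str) -> bool:
        """Delete the item only if it still holds the expected value."""
        if self._items.get(key) == value:
            del self._items[key]
            return True
        return False


def acquire(table: LockTable, state_path: str):
    """Return a lock ID on success, or None if someone else holds the lock."""
    lock_id = str(uuid.uuid4())
    return lock_id if table.put_if_absent(state_path, lock_id) else None


def release(table: LockTable, state_path: str, lock_id: str) -> bool:
    """Release the lock; only the holder of the matching ID can do so."""
    return table.delete_if_equals(state_path, lock_id)


table = LockTable()
first = acquire(table, "prod/terraform.tfstate")
second = acquire(table, "prod/terraform.tfstate")
print(first is not None, second is None)
```

A real store performs the existence check and the write atomically on the server side; in this single-process sketch, that atomicity is simply assumed.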
#### State backends comparison
| Property | S3 + DynamoDB | Terraform Cloud | Consul |
|----------|--------------|----------------|--------|
| Cost | $ (S3 + DynamoDB) | $$ (limited free tier) | $$ (infra) |
| Team workflow | GitHub Actions + OIDC | Native RBAC, runs | Custom |
| Locking | DynamoDB | Built-in | Consul session |
| History | S3 versioning | Full history, diff | None |
| Remote ops | No (state only) | Yes (remote runs) | No |
| Encryption | SSE-S3/KMS | At rest + in transit | TLS |
#### Workspaces vs Terragrunt
| Aspect | Terraform Workspaces | Terragrunt |
|--------|---------------------|------------|
| **State separation** | One backend, key: `env:/workspace` | Separate backend per env |
| **Code reuse** | Same code, different variables | DRY configuration, modules |
| **Risk** | Accidental `apply` to the wrong workspace | Isolated backends |
| **When to use** | Simple projects, <5 env | Microservices, multi-env, multi-team |
| **Extra features** | — | Dependency, include, before_hook |
#### Provider versioning
```hcl
terraform {
  required_version = ">= 1.5, < 2.0"
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = ">= 2.23"
    }
  }
}
```
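- `~> 5.0` — any 5.x release (>= 5.0, < 6.0); to allow only patch releases, pin three components, e.g. `~> 5.0.0` (>= 5.0.0, < 5.1.0)
- `>= 2.23` — as written in the block above, has no upper bound; use `>= 2.23, < 3.0` to stay on 2.x from 2.23 onward
- `~>` constraints prevent unintended major upgrades (and minor upgrades when three components are given)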
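
The pessimistic operator only lets the rightmost given component grow. The Python sketch below reimplements that rule for plain numeric versions to make the bounds explicit. It is an illustration and not Terraform's own parser; the helper names are assumptions, and pre-release suffixes are not handled.

```python
def parse(version: str) -> tuple[int, ...]:
    """Turn '5.31.0' into (5, 31, 0)."""
    return tuple(int(part) for part in version.split("."))


def pessimistic_bounds(constraint: str) -> tuple[tuple[int, ...], tuple[int, ...]]:
    """Return (lower, upper) for '~> X.Y[.Z]'; the upper bound is exclusive."""
    lower = parse(constraint.removeprefix("~>").strip())
    if len(lower) < 2:
        raise ValueError("this sketch expects at least two version components")
    # Drop the last component and increment the one before it
    upper = lower[:-2] + (lower[-2] + 1,)
    return lower, upper


def satisfies(version: str, constraint: str) -> bool:
    lower, upper = pessimistic_bounds(constraint)
    v = parse(version)
    v = v + (0,) * (len(lower) - len(v))  # pad short versions with zeros
    return lower <= v < upper


print(satisfies("5.31.0", "~> 5.0"))    # any 5.x is allowed
print(satisfies("6.0.0", "~> 5.0"))     # next major is rejected
print(satisfies("5.1.0", "~> 5.0.0"))   # three components: patch releases only
```

Under these rules `~> 5.0` corresponds to the range `>= 5.0, < 6.0`, while `~> 5.0.0` corresponds to `>= 5.0.0, < 5.1.0`.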
### Terraform workflow
```
terraform init      → Initializes the backend, downloads providers and modules
terraform plan      → Shows the planned changes
terraform apply     → Applies the changes
terraform destroy   → Destroys the infrastructure
terraform validate  → Validates the configuration
terraform fmt       → Formats HCL
```
### State management
- Remote state (S3, Terraform Cloud, Azure Storage)
- State locking (DynamoDB, Consul)
- Workspaces for separating environments
### Terraform: Up and Running (3rd ed.) — Yevgeniy Brikman
A practical guide to Terraform by the co-founder of Gruntwork. The 3rd edition (2022) adds over 100 pages of new content, updates the material from Terraform 0.12 to 1.2, and includes two new chapters.
#### What's New in the 3rd Edition
| New | Description |
|-----|-------------|
| **Chapter: Secrets management** | Managing secrets with Terraform: Vault, AWS Secrets Manager, KMS, OIDC, `sensitive` variables |
| **Chapter: Multiple providers** | Working with multiple regions, accounts, and clouds, including Kubernetes (AWS EKS) |
| **Terraform 1.0+** | Backward compatibility promise, stability, HashiCorp IPO |
| **Provider versioning** | `required_providers` block + `.terraform.lock.hcl` (lock file) |
| **Module iteration** | `count` and `for_each` on modules (since Terraform 0.13) |
| **Variable validation** | `validation {}` blocks, `precondition` / `postcondition` |
| **Refactoring** | `moved` blocks for safe refactoring without manual state manipulation |
| **CI/CD security** | OIDC authentication, isolated workers for `terraform apply` |
#### Secrets Management with Terraform
```hcl
# Variable marked as sensitive: redacted in plan/apply output
# (the value is still stored in plaintext in the state file)
variable "db_password" {
  type      = string
  sensitive = true
}
# Reading a secret from AWS Secrets Manager
data "aws_secretsmanager_secret" "db" {
  name = "production/db/master"
}
data "aws_secretsmanager_secret_version" "db" {
  secret_id = data.aws_secretsmanager_secret.db.id
}
```
**Recommended security hierarchy:**
1. **OIDC** — most secure, no credentials stored on the CI server (GitHub Actions → IAM role)
2. **IAM role** — instance profile (EC2, ECS, EKS)
3. **Environment variables** — limited, risk of leaking into logs
4. **Isolated workers** — a separate worker with admin permissions whose API exposes only `plan`/`apply`
#### Testing Terraform Code
| Layer | Tools | Description |
|-------|-------|-------------|
| **Static analysis** | `terraform validate`, `tflint`, `tfsec`, `checkov` | Analyzing code without running it |
| **Plan testing** | `conftest` + OPA (Rego), parsing `terraform plan` output | Validating the plan against policy |
| **Unit tests** | Terratest (Go), `terraform fmt`, `terraform validate` | Testing modules in isolation |
| **Integration tests** | Terratest (Go) | Real provisioning + assertions |
| **End-to-end tests** | Terratest | Full stack, smoke tests |
#### Policy enforcement
```rego
# OPA / conftest: deny public S3 buckets
package main

deny[msg] {
  resource := input.resource_changes[_]
  resource.type == "aws_s3_bucket"
  resource.change.after.acl == "public-read"
  msg = sprintf("%s must not be public", [resource.address])
}
```
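
For readers less familiar with Rego, the same check can be sketched in Python over the JSON form of a plan. The field names (`resource_changes`, `change.after.acl`, `address`) are the ones the policy above reads; the sample plan and the function name `public_bucket_violations` are made up for illustration.

```python
def public_bucket_violations(plan: dict) -> list[str]:
    """Return one message per S3 bucket that the plan would make public."""
    messages = []
    for rc in plan.get("resource_changes", []):
        # 'after' is null in the plan JSON when a resource is being deleted
        after = (rc.get("change") or {}).get("after") or {}
        if rc.get("type") == "aws_s3_bucket" and after.get("acl") == "public-read":
            messages.append(f"{rc['address']} must not be public")
    return messages


# Hypothetical, heavily trimmed plan document
plan = {
    "resource_changes": [
        {"address": "aws_s3_bucket.assets", "type": "aws_s3_bucket",
         "change": {"after": {"acl": "public-read"}}},
        {"address": "aws_s3_bucket.logs", "type": "aws_s3_bucket",
         "change": {"after": {"acl": "private"}}},
        {"address": "aws_instance.web", "type": "aws_instance",
         "change": {"after": None}},
    ]
}

for message in public_bucket_violations(plan):
    print(message)
```

In a pipeline, a non-empty result would fail the job before `terraform apply` runs.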
#### Production-Grade Checklist According to Brikman
1. **Small modules** — one module = one thing (single responsibility)
2. **Composable modules** — modules can be composed into larger units
3. **Testable modules** — every module has tests (Terratest)
4. **Releasable modules** — versioning (Git tags, Terraform Registry)
5. **Version control** — everything in Git, including `.terraform.lock.hcl`
6. **Remote state** — S3 + DynamoDB or Terraform Cloud
7. **CI/CD pipeline** — `plan` on the MR, `apply` after merge to main
8. **Secrets management** — no plaintext secrets in code
9. **Policy as code** — OPA / Sentinel for compliance
10. **Sandbox environments** — every developer has their own isolated environment
#### The Golden Rule of Terraform
> **The state of the main branch must always match the production environment.**
> Never run `terraform apply` manually from a local machine against production; always go through CI/CD.
## Dockerfile best practices
```dockerfile
# Multi-stage build
FROM node:22-alpine AS builder
WORKDIR /app
COPY package*.json ./
# Install all dependencies; the build step usually needs devDependencies
RUN npm ci
COPY . .
RUN npm run build
# Drop devDependencies before node_modules is copied to the runtime image
RUN npm prune --omit=dev
# Runtime stage: distroless
FROM gcr.io/distroless/nodejs22-debian12
WORKDIR /app
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
USER nonroot:nonroot
EXPOSE 3000
CMD ["dist/server.js"]
```
**Rules**:
- **Multi-stage build** — separates build tools from the runtime
- **Distroless images** — minimal attack surface (no shell, no package manager)
- **Non-root user** — USER nonroot (security best practice)
- **Layer caching** — copy rarely changing files first (package.json → npm ci → code)
- **Small base image** — Alpine (5 MB), distroless (minimal), scratch (Go static binary)
- **Healthcheck** — HEALTHCHECK instruction for Docker/Swarm (Kubernetes ignores it and uses its own probes)
- **Labels** — LABEL maintainer, version, git commit
- **.dockerignore** — minimizes the build context
## Artifact management
### Docker registries
| Registry | Public/Private | Price | Integration |
|----------|---------------|-------|-------------|
| **Docker Hub** | Both | Public free, private $5/month | GitHub Actions, GitLab |
| **ECR (AWS)** | Private | $0.10/GB/month + data transfer | IAM, ECS, EKS |
| **GHCR (GitHub)** | Both | Public free, private 500 MB free | GitHub Actions, npm |
| **GCR / Artifact Registry** | Private | $0.10/GB/month | GKE, Cloud Build |
| **ACR (Azure)** | Private | $0.11/GB/month | AKS, Azure DevOps |
| **Harbor** | Private (self-hosted) | Free (open source) | Self-managed, CNCF |
### Helm charts
- **Repository** — index.yaml + chart .tgz on an HTTP server (S3, GitHub Pages, ChartMuseum)
- **OCI registry** — Helm 3.8+ supports storing charts in OCI registries (ECR, GHCR, Harbor)
- **Versioning** — chart version (the package) + app version (the application)
### SBOM (Software Bill of Materials)
- **SPDX** / **CycloneDX** — standard SBOM formats
- Generation: Trivy, Syft (Grype then scans the resulting SBOM for vulnerabilities)
- Use case: supply chain security, compliance (EO 14028, EU CRA)
## Configuration and Secrets
| Tool | Description |
|------|-------------|
| Vault (HashiCorp) | Dynamic secrets, encryption-as-a-service |
| AWS Secrets Manager | Managed, auto-rotation |
| Azure Key Vault | Managed, HSM support |
| GCP Secret Manager | Managed |
| SOPS | Encryption in Git repos |
| Sealed Secrets | Encrypted secrets for Kubernetes |
### Secret management workflows
**Vault agent injector** (Kubernetes)
- A sidecar container (vault-agent) injects secrets into the pod
- Secrets are mounted as a tmpfs volume (not exposed as environment variables)
- Auto-rotation: vault-agent periodically refreshes secrets
**External Secrets Operator** (Kubernetes)
- CRD: `ExternalSecret` → creates a `Secret` in K8s
- Backend: AWS Secrets Manager, Azure Key Vault, GCP Secret Manager, Vault
- Poll-based refresh: the operator re-reads the external store every `refreshInterval` and propagates changes to K8s
**Sealed Secrets**
- `kubeseal` encrypts the Secret with the controller's public key (only the controller in the cluster holds the private key)
- The encrypted manifest (SealedSecret) can be stored safely in Git
- The controller decrypts it into a regular Secret once it is applied to the cluster
## GitOps
- **Principle**: Git is the single source of truth
- **Tools**: ArgoCD, Flux, Rancher Fleet
- Pull-based deploy — an agent in the cluster watches the repo and applies changes
- Auto-sync + drift detection
## Environment promotion (dev → staging → prod)
```
Code → Dev (auto-deploy) → Staging (auto + smoke tests) → Prod (manual approval + gating)
```
**Quality gates**:
1. **Unit tests** — pass rate 100 %, code coverage ≥ 80 %
2. **Integration tests** — all critical paths pass
3. **SAST scan** — no critical/high vulnerabilities
4. **SCA scan** — no known critical CVEs
5. **Container scan** — all fixable vulns addressed
6. **Smoke tests** — after deploying to staging (health endpoint, basic flow)
7. **Manual approval** — for production (optional under Continuous Delivery)
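
The automated gates in this list boil down to threshold checks on pipeline results. A minimal Python sketch follows, assuming a hypothetical `PipelineReport` structure; the field names are invented here, and the thresholds are the ones listed above.

```python
from dataclasses import dataclass


@dataclass
class PipelineReport:
    """Hypothetical summary of one pipeline run (field names are assumptions)."""
    unit_pass_rate: float        # fraction of unit tests passed, 0.0 to 1.0
    coverage: float              # code coverage, 0.0 to 1.0
    integration_passed: bool     # all critical paths pass
    sast_critical_or_high: int   # open critical/high SAST findings
    sca_critical_cves: int       # known critical CVEs in dependencies


def failed_gates(report: PipelineReport) -> list[str]:
    """Return the names of all gates that block promotion."""
    checks = {
        "unit tests": report.unit_pass_rate == 1.0,
        "coverage": report.coverage >= 0.80,
        "integration tests": report.integration_passed,
        "SAST": report.sast_critical_or_high == 0,
        "SCA": report.sca_critical_cves == 0,
    }
    return [name for name, passed in checks.items() if not passed]


report = PipelineReport(1.0, 0.76, True, 0, 2)
blocked = failed_gates(report)
print("promote" if not blocked else f"blocked by: {', '.join(blocked)}")
```

Returning every failed gate, instead of stopping at the first one, gives the author of a change the full picture in a single run.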
## Deployment Strategies
| Strategy | Description | Risk |
|----------|-------------|------|
| **Rolling update** | Gradual replacement of instances | Low |
| **Blue/Green** | Two identical environments, traffic switched between them | Medium |
| **Canary** | A percentage of traffic goes to the new version and is increased gradually | Low |
| **Feature flag** | Turning a feature on/off without a deploy | Very low |
| **A/B testing** | Different versions for different users | Low |
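
One common way to implement the percentage split of a canary release is deterministic hashing of a user identifier, so that a given user keeps seeing the same version as the percentage grows. The Python sketch below shows that idea only; real setups usually do this in a load balancer or service mesh, and the function names are assumptions.

```python
import hashlib


def bucket(user_id: str) -> int:
    """Map a user ID to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % 100


def route(user_id: str, canary_percent: int) -> str:
    """Send the user to the canary if their bucket falls below the threshold."""
    return "canary" if bucket(user_id) < canary_percent else "stable"


users = [f"user-{n}" for n in range(1000)]
for percent in (0, 5, 25, 100):
    share = sum(route(u, percent) == "canary" for u in users) / len(users)
    print(f"{percent:3d} % configured, {share:.1%} of sample users routed to canary")
```

Because the bucket is derived from the user ID rather than drawn at random per request, raising the percentage only adds users to the canary group and never moves anyone back.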
## Git branching strategies
| Strategy | Description | Suitable For |
|----------|-------------|--------------|
| **Trunk-based** | One main branch (main), short-lived feature branches (< 1 day) | CD, microservices, mature teams |
| **GitHub Flow** | Main + feature branches, PRs, simple | Startups, web apps |
| **GitLab Flow** | Main + environment branches (staging, prod) + feature branches | Enterprise, regulated |
| **GitFlow** | Develop + main + feature/release/hotfix branches | Release-based, enterprise legacy |
| **One Flow** | Simplified GitFlow (no develop branch) | Mid-sized teams |
## Rollback strategies
| Strategy | Description | Speed | Risk |
|----------|-------------|-------|------|
| **Forward fix** | New deploy with a hotfix | Slow (build + deploy) | Low |
| **Rollback (revert commit)** | Git revert, new deploy | Medium | Low |
| **Blue/Green switchback** | Switching traffic back to the old version | Immediate | DB incompatibility |
| **Database rollback** | Reverting a DB migration (migrate down) | Slow | Data loss risk |
### Database rollback challenges
- **Breaking changes** — dropping a column/table makes rollback a problem (the data is lost)
- **Best practice**: Expand → Migrate → Contract (never drop anything in the same deploy)
- **Tooling**: Flyway undo (limited), Liquibase rollback, pgroll (Postgres)
- **Feature flags** as prevention — new code sits behind a flag, rollback = switching the flag off
## CI/CD Design Patterns
Modern CI/CD pipelines solve recurring problems with design patterns:
| Pattern | Description |
|---------|-------------|
| **Pipeline as Code** | Pipeline defined in YAML/Kotlin DSL (`.gitlab-ci.yml`, `.github/workflows/`) |
| **Immutable Pipeline** | Every build is an artifact that never changes |
| **Quality Gate** | Branch protection, required checks, code coverage threshold |
| **Deployment Strategy** | Blue/Green, Canary, Rolling (see the deployment strategies table above) |
| **GitOps** | Pull-based deploy with auto-sync and drift detection |
| **Shift-Left Security** | SAST/DAST/SCA as part of the pipeline |
| **Dependency Caching** | Cache layer between pipeline runs |
## Shift left security
### SCA (Software Composition Analysis)
| Tool | Type | Integration |
|------|------|-------------|
| **Dependabot** | GitHub native | GitHub, automatic fix PRs |
| **Renovate** | Multi-platform | GitHub, GitLab, Bitbucket |
| **Snyk** | SaaS + CLI | All platforms, Docker, IaC |
| **Trivy** | CLI, OSS | CI/CD pipeline (GitHub Actions, GitLab) |
### SAST (Static Application Security Testing)
| Tool | Languages | Characteristics |
|------|-----------|-----------------|
| **Semgrep** | 30+ (Python, Java, Go, JS/TS) | Fast, custom rules, CI-native |
| **SonarQube** | 30+ | Comprehensive, quality gates, tech debt |
| **CodeQL** | 12 (C++, C#, Go, Java, JS/TS, Python) | GitHub native, query-based |
| **Checkmarx** | 30+ | Enterprise, CxSAST, CxFlow |
| **Fortify** | 30+ | Enterprise, SAST + DAST |
### Container scanning
| Tool | Description |
|------|-------------|
| **Trivy** | OSS, scans OS packages + language-specific dependencies + IaC |
| **Grype** | OSS, by Anchore, fast, pairs with Syft for SBOMs |
| **Clair** | Red Hat, OSS, OCI-compatible |
| **Docker Scout** | Docker Desktop / CLI, integration with Docker Hub |
## AI-Native Software Delivery (2025–2026)
AI is reshaping DevOps ("DevOps 2.0"):
- **AI-assisted CI/CD** — automatic diagnosis of pipeline failures, optimization of resource allocation
- **Agent Control Protocol (ACP)** / **Model Context Protocol (MCP)** — standards for AI agents interacting with tooling
- **AI-driven cost management** — FinOps optimization of cloud spend
- **Intelligent test selection** — ML decides which tests to run based on code changes
- **Self-healing pipelines** — AI automatically detects and fixes common problems
New tools: Harness (AI-native CD), GitLab 19.0 (agentic MR workflows, secrets manager), Octopus Deploy.
## Pipeline Tools
- **GitHub Actions** — integrated with GitHub, large marketplace
- **GitLab CI** — native to GitLab, Auto DevOps
- **Jenkins** — the oldest, extensible, self-hosted
- **CircleCI** — SaaS, fast
- **Argo Workflows** — Kubernetes native
- **Buildkite** — hybrid (own agents, SaaS orchestrator)
## Best Practices
- **Idempotent pipeline** — repeated runs give the same result
- **Immutable infrastructure** — never modify a running server, always redeploy
- **Shift left** — tests and security as early as possible in the pipeline
- **Artifact management** — all builds versioned in a registry (Docker Hub, ECR, GHCR)
- **Dependency caching** — speeds up the pipeline (npm ci, pip cache, Docker layer caching)
- **Fail fast** — pipeline fails as early as possible on error
- **Quality gates** — automated checks before every promotion to the next environment
- **Pipeline visibility** — dashboard with current status of all pipelines (GitHub, GitLab, ArgoCD)
## Resources
Links, books and standards: [sources/cicd/sources.md](sources/cicd/sources.md)
### Recommended Reading
| Book | Authors | ISBN | Key Contribution |
|------|---------|------|------------------|
| The DevOps Handbook | Kim, Humble, Debois, Willis | 978-1942788003 | CALMS principles (Culture, Automation, Lean, Measurement, Sharing), flow map, deployment pipeline |
| Continuous Delivery | Humble, Farley | 978-0321601912 | Deployment pipeline, commit stage, acceptance tests, capacity testing, zero-downtime release |
| CI/CD Design Patterns | Bajpai, Schildmeijer, Piwosz, Mishra | 978-1-83588-965-7 | 30+ design patterns for CI/CD — pipeline patterns, GitOps, security, testing, deployment strategies |
| DevOps Frameworks, Techniques, and Tools | Vijayakumaran, Kofler, Öggl, Springer | 978-1-4932-2670-2 | Framework for DevOps adoption, tool comparison (Jenkins vs GitLab vs GitHub Actions), techniques for monitoring and observability |
## OpenStack CI/CD
The OpenStack ecosystem uses its own CI/CD tools:
### Zuul
- CI/CD system developed by the OpenStack community (now standalone, used outside OpenStack as well)
- **Gating** — changes are tested before merge (not after merge) — prevents breaking the main branch
- **Ansible-based** — jobs are Ansible playbooks
- **Nodepool** — dynamic allocation of test VMs in the cloud (OpenStack, AWS)
- **Pipeline** — check, gate, post, periodic, tag, release
### OpenStack Infra (OpenDev)
- Public CI infrastructure for OpenStack projects
- Tools: Gerrit (code review), Zuul (CI), Nodepool (test nodes), Storyboard (issue tracking)
- Base jobs: tempest (integration tests), grenade (upgrade tests), devstack-gate (gate tests)
### Integration with External Tools
- **Terraform** — OpenStack provider for provisioning (terraform-provider-openstack)
- **Ansible** — openstack.cloud collection for managing OpenStack resources
- **Packer** — build OpenStack images (openstack builder)
- **Jenkins** — older CI, still used in some distributions
*Last revised: 2026-06-03*

495
CLOUD.en.md Normal file

@@ -0,0 +1,495 @@
# ☁️ Cloud Architecture
## Providers
- **AWS** — largest market share, broadest portfolio
- **Azure** — strong integration with Microsoft ecosystem
- **GCP** — Kubernetes (GKE), data & ML, network connectivity
## Deployment Models
| Model | Description |
|-------|-------------|
| Public cloud | Shared provider infrastructure |
| Private cloud | Dedicated infrastructure (on-prem or hosted) |
| Hybrid cloud | Public + private interconnection |
| Multi-cloud | Multiple public providers |
## Multi-cloud Strategy
### Reasons for Multi-cloud
- **Vendor lock-in prevention** — risk diversification
- **Regulatory requirements** — data residency in specific regions
- **Best-of-breed** — each provider has strengths (AWS networking, Azure enterprise, GCP data/ML)
- **Acquisition scenarios** — consolidating environments inherited through mergers and acquisitions
### Multi-cloud Connectivity
| Method | Latency | Throughput | Cost |
|--------|---------|------------|------|
| Site-to-Site VPN | Medium | Limited | Low |
| Private interconnect (Direct Connect / ExpressRoute / Dedicated Interconnect) | Low | High | High |
| Cloud-to-cloud VPN | Medium | Medium | Medium |
| SD-WAN | Low | High | Medium |
### Challenges
- **Network complexity** — different VPC/VNet concepts, security models
- **IAM federation** — unified identities across clouds (SSO, SAML, OIDC)
- **Data gravity** — moving data between clouds is expensive and slow
- **Monitoring** — single pane of glass across clouds (Grafana, Datadog)
### Cloud Adoption Frameworks (CAF)
Each major provider has its own Cloud Adoption Framework for a structured approach to cloud adoption:
| Provider | Framework | Focus |
|----------|-----------|-------|
| AWS | AWS CAF | 6 perspectives: Business, People, Governance, Platform, Security, Operations |
| Azure | Microsoft CAF | 8 methodologies: Strategy, Plan, Ready, Migrate, Innovate, Govern, Manage, Secure |
| GCP | Google Cloud Adoption Framework | 4 themes: Learn, Lead, Scale, Secure |
The *Multi-Cloud Administration Guide* (Mulder, 2024) recommends combining the providers' CAFs into a unified governance model, especially in:
- **Interoperability** — standardization of APIs and IaC across clouds (Terraform, Pulumi)
- **Data governance** — unified policy for data residency and lifecycle
- **Compliance automation** — automated audits across clouds (AWS Config, Azure Policy, GCP Org Policies)
- **Access management** — identity federation and centralized RBAC
## Migration Strategies — 6 Rs
| Strategy | Description | Difficulty | Typical Scenario |
|----------|-------------|------------|------------------|
| **Rehost** (Lift & Shift) | Move VMs as-is, without changes | Low | Quick migration, datacenter exit, minimal risk |
| **Replatform** (Lift & Reshape) | Migration with minor adjustments (e.g., RDS instead of self-managed DB) | Medium | Optimization without rewriting the application |
| **Refactor** (Re-architect) | Rewrite application as cloud-native (microservices, serverless) | High | Maximize cloud benefit, long-term strategy |
| **Repurchase** | Move to SaaS (e.g., Salesforce, Workday) | Low | Application is outdated, SaaS alternative exists |
| **Retire** | Decommission unused applications | Low | Application no longer in use |
| **Retain** | Keep on-prem | None | Regulatory reasons, too high migration risk |
### Decision Framework for 6 Rs
```
Start: Is the application needed?
├── No → Retire
└── Yes → Does a SaaS alternative exist?
├── Yes → Repurchase
└── No → Is refactoring worthwhile?
├── Yes → Refactor
└── No → Is platform change sufficient?
├── Yes → Replatform
└── No → Rehost
```
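
The decision tree above maps directly onto a chain of early returns. The Python sketch below is one way to encode it; the function and parameter names are chosen for this example. Note that the tree as drawn has no branch leading to Retain, so the function never returns it.

```python
def choose_strategy(needed: bool, saas_alternative: bool,
                    refactor_worthwhile: bool, replatform_sufficient: bool) -> str:
    """Walk the decision tree top to bottom and return the first match."""
    if not needed:
        return "Retire"
    if saas_alternative:
        return "Repurchase"
    if refactor_worthwhile:
        return "Refactor"
    if replatform_sufficient:
        return "Replatform"
    return "Rehost"


# Hypothetical application: still needed, no SaaS option, refactoring is not
# worthwhile, but a managed database would be enough
print(choose_strategy(needed=True, saas_alternative=False,
                      refactor_worthwhile=False, replatform_sufficient=True))
```

The order of the questions matters: cheaper exits (Retire, Repurchase) are checked before the more expensive engineering options.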
## Well-Architected Framework (AWS)
1. **Operational Excellence** — automation, monitoring, documentation
2. **Security** — IAM, encryption, compliance
3. **Reliability** — recovery, scaling, backup plans
4. **Performance Efficiency** — right-sizing, choosing the right services
5. **Cost Optimization** — FinOps, reserved instances, spot instances
6. **Sustainability** (added in late 2021) — carbon footprint, energy efficiency
Analogues: Azure Well-Architected Framework, GCP Architecture Framework
### Key Questions from Well-Architected Review (~60 questions)
**Operational Excellence (12 questions)**
- How are changes managed and automated?
- How are operations documented and shared within the team?
- How are expected and unexpected events reflected in operations?
- What runbooks exist for common operational scenarios?
- How is incident management and postmortem process conducted?
**Security (12 questions)**
- How is identity & access management implemented?
- How is data protected at rest and in transit?
- How is security incident detection ensured?
- What are the procedures for patch management and vulnerability remediation?
- How are infrastructure credentials and secrets managed?
**Reliability (12 questions)**
- How is service availability ensured during a component failure?
- How is backup and disaster recovery implemented?
- How do service limits (quotas, throttling) affect reliability?
- How does automatic scaling work under changing load?
- What are the SLI/SLO metrics and how are they monitored?
**Performance Efficiency (12 questions)**
- How is the correct type and size of compute/storage selected?
- How is the database layer optimized (indexes, queries, caching)?
- How is monitoring used to identify bottlenecks?
- How is scaling implemented (vertical vs horizontal)?
**Cost Optimization (12 questions)**
- How are costs allocated to teams/projects (chargeback/showback)?
- What tools are used for cost analysis?
- How are unused resources identified and eliminated?
- How is licensing optimized (BYOL, hybrid benefit)?
## Key Components
### Compute Layer
- **VM / instances** — EC2, Azure VMs, GCE
- **Container orchestration** — EKS, AKS, GKE
- **Serverless** — Lambda, Azure Functions, Cloud Functions
- **PaaS** — App Engine, Elastic Beanstalk, Azure App Service
### Compute Comparison Matrix (AWS EC2)
| Category | Instance Families | vCPU:Memory (GiB) | Use Case | Example Pricing (on-demand, us-east-1) |
|----------|-------------------|-------------------|----------|----------------------------------------|
| **General purpose** | m7g, m7i | 1:4 | Web servers, microservices, dev/test | m7i.large ~$0.088/h |
| **Compute optimized** | c7g, c7i | 1:2 | HPC, batch processing, CI/CD, gaming | c7i.large ~$0.078/h |
| **Memory optimized** | r7g, r7i, x2idn | 1:8 to 1:32 | In-memory DB (Redis), SAP HANA, real-time analytics | r7i.large ~$0.118/h |
| **Storage optimized** | i4i, im4gn | 1:4 + NVMe | Transactional DB, data warehousing, Kafka | i4i.large ~$0.138/h |
| **GPU / ML** | p5, g5, trn1 | GPU / accelerator attached | AI training (p5), inference (g5), ML (trn1) | g5.xlarge ~$1.006/h |
See [GPU.en.md](GPU.en.md) for GPU model and configuration details.
### Storage
- **Object storage** — S3, Blob Storage, Cloud Storage
- **Block storage** — EBS, managed disks, persistent disks
- **File storage** — EFS, Azure Files, Filestore
- **CDN** — CloudFront, Azure CDN, Cloud CDN
### S3 Storage Classes
| Class | Availability | Retrieval Time | Price / GB / Month | Use Case |
|-------|-------------|----------------|--------------------|----------|
| **S3 Standard** | 99.99 % | milliseconds | ~$0.023 | Active data, frequent access |
| **S3 Intelligent-Tiering** | 99.9 % | milliseconds | ~$0.023 + monitoring fee | Unknown/variable access patterns |
| **S3 Standard-IA** | 99.9 % | milliseconds | ~$0.0125 | Less frequent but fast access |
| **S3 One Zone-IA** | 99.5 % | milliseconds | ~$0.01 | Reproducible data |
| **S3 Glacier Instant** | 99.9 % | milliseconds | ~$0.004 | Archive with occasional access |
| **S3 Glacier Flexible** | 99.99 % | 1-5 min (expedited) / 3-5 h (standard) | ~$0.0036 | Long-term archive |
| **S3 Glacier Deep Archive** | 99.99 % | 12 h (standard) / 48 h (bulk) | ~$0.00099 | Cheapest, compliance archives |
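To make the price column concrete, a rough monthly comparison can be sketched as below. The prices are the approximate per-GB figures from the table; the dictionary and function names are illustrative assumptions, and retrieval, request, monitoring, and minimum-storage-duration fees are deliberately ignored.

```python
# Approximate storage prices in USD per GB per month, copied from the table above.
# Retrieval, request, and minimum-storage-duration fees are NOT modeled here.
PRICE_PER_GB_MONTH = {
    "S3 Standard": 0.023,
    "S3 Standard-IA": 0.0125,
    "S3 One Zone-IA": 0.01,
    "S3 Glacier Instant": 0.004,
    "S3 Glacier Flexible": 0.0036,
    "S3 Glacier Deep Archive": 0.00099,
}


def monthly_storage_cost(storage_class: str, size_gb: float) -> float:
    """Return the approximate monthly storage cost in USD for one class."""
    return PRICE_PER_GB_MONTH[storage_class] * size_gb


if __name__ == "__main__":
    size_gb = 10_000  # roughly 10 TB, using 1 TB = 1000 GB for simplicity
    for name in PRICE_PER_GB_MONTH:
        print(f"{name:<26} ${monthly_storage_cost(name, size_gb):>8.2f}")
```

For archive classes the ignored retrieval fees can dominate, so a comparison like this is only a first filter.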
## Multi-AZ and Multi-Region Architecture
```
Region ┌──────────────────────────────┐
       │  AZ-1       AZ-2       AZ-3  │
       │  ┌───┐      ┌───┐      ┌───┐ │
       │  │APP│──────│APP│──────│APP│ │
       │  └─┬─┘      └─┬─┘      └─┬─┘ │
       │    │          │          │   │
       │  ┌─▼──────────▼──────────▼─┐ │
       │  │      Load Balancer      │ │
       │  └────────────┬────────────┘ │
       │               │              │
       │  ┌────────────▼────────────┐ │
       │  │   Database (Primary)    │ │
       │  │     + Read Replica      │ │
       │  └─────────────────────────┘ │
       └──────────────────────────────┘
```
## Disaster Recovery Strategies
### DR Strategies on AWS (from least to most prepared)
| Strategy | RTO | RPO | Cost | Description |
|----------|-----|-----|------|-------------|
| **Backup & Restore** | hours | 24 h | Low | Regular data backups to S3/Glacier, restore in DR region |
| **Pilot Light** | tens of minutes | minutes | Medium | Minimal running copy (DB, core services), scale on failover |
| **Warm Standby** | minutes | seconds | High | Reduced production copy running, scale on failover |
| **Active-Active (Multi-Region)** | seconds | < 1 s | Very high | Fully active in multiple regions, traffic routing (Route53, Global Accelerator) |
Key books on the topic:
- **Engineering Resilient Systems on AWS** (Schwarz, Moran, Bachmeier, 2024) — practical labs for resilience patterns: back off and retry, multi-Region failover, circuit breaker, chaos engineering using AWS Fault Injection Simulator
- **Building Resilient Architectures on AWS** (2025) — data security, backup strategies, recovery plan automation
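The "back off and retry" pattern named in the first title can be sketched as follows. This is a minimal illustration, not code from either book: the function name, its parameters, and the choice of the "full jitter" variant are assumptions.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.1,
    max_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation until it succeeds or max_attempts is reached."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Exponential growth: base * 2^attempt, bounded by max_delay.
            cap = min(max_delay, base_delay * (2 ** attempt))
            # Full jitter: sleep a random time in [0, cap] to avoid retry storms.
            sleep(random.uniform(0, cap))
    raise RuntimeError("max_attempts must be at least 1")
```

The `sleep` parameter is injectable only so the function can be exercised without real waiting. A production version would normally retry only errors known to be transient.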
### Chaos Engineering
Deliberate fault injection to verify system resilience:
- **AWS Fault Injection Simulator (FIS)** — managed fault injection for EC2, ECS, EKS, RDS
- **Tools**: Chaos Mesh (Kubernetes), Gremlin, Litmus
- **Process**: define hypothesis → run experiment → measure impact → improve system
- **Safety**: experiments in isolated environment, safety controls, automatic rollback
## Cloud Design Patterns
### Strangler Fig
Gradually replacing parts of a monolithic application with microservices.
- Legacy functionality is progressively redirected to new services
- Strangler Fig proxy (route headers, feature flags) controls traffic migration
- Advantage: incremental value delivery without big-bang rewrite
### Circuit Breaker
Prevents cascading failures when a dependent service fails.
- Three states: **Closed** (normal operation), **Open** (requests immediately fail), **Half-Open** (test request after timeout)
- Parameters: failure threshold, timeout (reset timeout), half-open max requests
- Implementations: resilience4j, Hystrix (legacy), Istio (envoy), AWS App Mesh
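The three states can be illustrated with a small in-process sketch. It is not how resilience4j or Envoy implement the pattern: the class and parameter names are assumptions, it is not thread-safe, and it does not cap the number of concurrent half-open requests.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(
        self,
        failure_threshold: int = 3,
        reset_timeout: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None  # None means the circuit is not open

    @property
    def state(self) -> str:
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.reset_timeout:
            return "half-open"  # reset timeout elapsed: allow a trial request
        return "open"

    def call(self, operation: Callable[[], T]) -> T:
        state = self.state
        if state == "open":
            raise CircuitOpenError("circuit is open; failing fast")
        try:
            result = operation()
        except Exception:
            self._failures += 1
            # A failed trial request, or too many failures, (re)opens the circuit.
            if state == "half-open" or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        # Any success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

Callers wrap each remote call in `breaker.call(...)` and treat `CircuitOpenError` as a signal to use a fallback instead of waiting on a dependency that is known to be down.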
### Saga
Distributed transaction across microservices — a series of local transactions with compensating actions.
- **Choreography** — each service publishes an event, the next service reacts (Kafka, EventBridge)
- **Orchestration** — central orchestrator manages steps (Step Functions, Temporal, Camunda)
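The orchestration variant can be sketched in a few lines. This is a simplified in-memory illustration under assumed names (`Step`, `run_saga`); real orchestrators such as Step Functions or Temporal also persist progress and retry compensations.

```python
from typing import Callable, List, Tuple

# Each step pairs an action with the compensation that undoes it.
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: List[Step]) -> List[str]:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    log: List[str] = []
    completed: List[Step] = []
    for step in steps:
        name, action, _ = step
        try:
            action()
        except Exception:
            log.append(f"failed:{name}")
            # Undo already-committed local transactions, newest first.
            for done_name, _, compensate in reversed(completed):
                compensate()
                log.append(f"compensated:{done_name}")
            return log
        completed.append(step)
        log.append(f"done:{name}")
    return log
```

Compensations are business-level undo operations (for example, releasing reserved stock), not database rollbacks, so they should be idempotent.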
### CQRS (Command Query Responsibility Segregation)
Separation of write (Command) and read (Query) models.
- Command model: optimized for writes (normalized, transactional)
- Query model: optimized for reads (denormalized, read-optimized views)
- Eventual consistency between models (event bus propagates changes)
- Use case: reporting, audit logs, high-throughput systems
### Event Sourcing
Storing state as a sequence of events, not the current state.
- Each change is an append-only event in an event store
- Current state = fold of all events
- Advantages: audit trail, time travel, CQRS compatibility
- Implementations: EventStoreDB, Kafka (log), DynamoDB + CDC
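The statement "current state = fold of all events" can be shown directly. The account domain and the event names below are hypothetical, chosen only to keep the sketch short.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Iterable


@dataclass(frozen=True)
class Event:
    kind: str    # "deposited" or "withdrawn" (hypothetical event types)
    amount: int  # amount in minor units, e.g. cents


def apply(balance: int, event: Event) -> int:
    """Pure transition function: old state + event -> new state."""
    if event.kind == "deposited":
        return balance + event.amount
    if event.kind == "withdrawn":
        return balance - event.amount
    raise ValueError(f"unknown event kind: {event.kind}")


def replay(events: Iterable[Event], initial: int = 0) -> int:
    """Current state is a left fold of the transition function over the log."""
    return reduce(apply, events, initial)


# The append-only log; events are never updated or deleted.
event_log = [Event("deposited", 100), Event("withdrawn", 30), Event("deposited", 5)]
```

Replaying a prefix of the log, such as `replay(event_log[:2])`, gives the state at an earlier point in time, which is the "time travel" property listed above.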
### Additional Cloud Patterns (Wilder — Cloud Architecture Patterns)
| Pattern | Category | Description |
|---------|----------|-------------|
| **Horizontally Scaling Compute** | Scalability | Adding/removing instances based on load, elasticity |
| **Queue-Centric Workflow** | Scalability | Decoupling components via queues (SQS, RabbitMQ), async processing |
| **Auto-Scaling** | Scalability | Automatic scaling based on metrics (CPU, memory, request count) |
| **MapReduce** | Big Data | Distributed data processing (Hadoop, EMR, BigQuery) |
| **Database Sharding** | Big Data | Horizontal data partitioning across databases |
| **Busy Signal** | Failure Handling | Graceful degradation under overload (HTTP 503, throttling, backpressure) |
| **Node Failure** | Failure Handling | Detection and automatic recovery from compute node failure |
| **Colocation** | Distributed Users | Placing compute close to data to reduce latency |
| **Valet Key** | Distributed Users | Delegated storage access (SAS tokens, S3 presigned URLs) |
| **Multi-Site Deployment** | Distributed Users | Active deployment in multiple geographic locations |
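As an illustration of the Database Sharding row, a minimal hash-based router might look like this. The function name and the modulo scheme are assumptions for the sketch, not taken from Wilder's book.

```python
import hashlib


def shard_for(key: str, shard_count: int) -> int:
    """Map a key to a shard index using a stable hash.

    Python's built-in hash() is salted per process, so a cryptographic
    digest is used to get the same shard for the same key everywhere.
    """
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count
```

Plain modulo routing remaps most keys when `shard_count` changes; consistent hashing or a lookup directory is the usual alternative when shards are added often.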
## Evolutionary Architecture
Definition (Ford, Parsons, Kua, 2022): *An evolutionary architecture supports guided, incremental change across multiple dimensions.*
### Fitness Functions
Automated checks of architectural characteristics — analogous to tests for architecture:
| Type | Description | Example |
|------|-------------|---------|
| **Atomic** | Checks a single metric | Cyclomatic complexity < 10 |
| **Holistic** | Checks the overall system | End-to-end latency < 200 ms |
| **Triggered** | Triggered by event (CI/CD commit, deployment) | API contract verification |
| **Continuous** | Runs continuously in production | Monitoring dependency freshness |
| **Static** | Code analysis without execution | SonarQube, ESLint |
| **Dynamic** | Runtime analysis | Load tests, chaos experiments |
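The Holistic example from the table (end-to-end latency < 200 ms) can be written as an executable check. The sketch below assumes the check is applied to the 95th percentile of collected samples; that choice, the nearest-rank method, and the function names are illustrative assumptions.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile; pct is expected in the range (0, 100]."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def latency_fitness(samples_ms: Sequence[float], limit_ms: float = 200.0) -> bool:
    """Fitness function: p95 end-to-end latency must stay under the limit."""
    return percentile(samples_ms, 95) < limit_ms
```

Run as a pipeline step after a load test, a `False` result would fail the build, which makes this one check both triggered and dynamic in the table's terms.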
### Principles of Evolutionary Architecture
1. **Incremental change** — small, safe changes thanks to CI/CD, deployment pipelines, mature DevOps
2. **Fitness functions** — automated protection of architectural characteristics (scalability, performance, security)
3. **Coupling management** — deliberate management of the connections between components (affinity, volatility, cycles)
4. **Evolutionary data** — database migrations as first-class citizens (evolutionary schemas, expand-contract pattern)
### Antipatterns
- **Big Design Up Front (BDUF)** — trying to design everything upfront, ignoring change
- **No Design at All** — absence of architectural thinking, purely emergent design
- **Premature Standardization** — introducing standards before the domain is understood
## Hybrid Cloud Connectivity
See also: [NETWORKING.en.md](NETWORKING.en.md) — network architecture (VPN, BGP, VPC design).
- **Site-to-Site VPN** — IPSec tunnel over the internet
- **Direct Connect / ExpressRoute / Dedicated Interconnect** — private physical connection
- **Cloud VPN / Transit Gateway** — hub-and-spoke topology
## Cost Optimization Detail
### Savings Plans vs Reserved Instances
| Property | Compute Savings Plan | EC2 Instance Savings Plan | Reserved Instances |
|----------|----------------------|---------------------------|-------------------|
| Flexibility | Instance family, region, OS | Instance family + region | Specific instance |
| Term | 1 or 3 years | 1 or 3 years | 1 or 3 years |
| Discount (typical) | ~30-50 % | ~40-60 % | ~40-60 % |
| Change instance | Yes (any) | Yes (within family) | No |
| Change region | Yes | No | No |
| Payment options | All Upfront / Partial / No Upfront | All Upfront / Partial / No Upfront | All Upfront / Partial / No Upfront |
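The discount ranges above imply a break-even point: a commitment is billed for every hour of the term, while on-demand is billed only for hours actually used. A small sketch of that arithmetic follows, using a hypothetical $0.10/h rate and ignoring upfront-payment effects.

```python
HOURS_PER_YEAR = 8760  # 365 days * 24 h


def on_demand_cost(hourly_rate: float, utilization: float) -> float:
    """Yearly on-demand cost when the instance runs `utilization` (0..1) of the time."""
    return hourly_rate * HOURS_PER_YEAR * utilization


def committed_cost(hourly_rate: float, discount: float) -> float:
    """Yearly cost of a commitment: discounted rate, billed for every hour."""
    return hourly_rate * (1 - discount) * HOURS_PER_YEAR


def break_even_utilization(discount: float) -> float:
    """Utilization above which the commitment is cheaper than on-demand."""
    return 1 - discount
```

With a 40 % discount the break-even utilization is 60 %: a workload running half the time is cheaper on-demand, while one running 70 % of the time is cheaper under the commitment.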
### Spot Instance Best Practices
- **Diversification** — use a mix of instance types (spot fleet) for higher availability
- **Graceful handling** — application must handle the termination notice (2-minute warning)
- **Checkpointing** — regular state saving for restart after spot interruption
- **Spot block** (AWS) — defined-duration Spot Instances protected for 1-6 h; legacy option, no longer available to new customers since July 2021
- **Use cases**: batch processing, CI/CD runners, stateless microservices, ML training
- **Avoid**: stateful workloads, databases (without special design)
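The checkpointing item above can be sketched as a resumable batch loop. The file layout (`next_index` in a JSON file) and the function name are assumptions. The item in flight during an interruption is processed again on restart, so `handle` should be idempotent.

```python
import json
import os
from typing import Callable, List


def process_with_checkpoint(
    items: List[str],
    handle: Callable[[str], None],
    checkpoint_path: str,
) -> int:
    """Process items in order, resuming from the last saved position.

    Returns the number of items processed in this run.
    """
    start = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path, encoding="utf-8") as fh:
            start = json.load(fh)["next_index"]
    for index in range(start, len(items)):
        handle(items[index])
        # Write to a temp file and rename, so an interruption never leaves
        # a half-written checkpoint behind.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as fh:
            json.dump({"next_index": index + 1}, fh)
        os.replace(tmp_path, checkpoint_path)
    return len(items) - start
```

On a Spot Instance the checkpoint would need to live on storage that survives the instance, such as S3 or EFS, rather than on the local disk.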
## Organization and Governance
### AWS Organizations
```
Root OU
├── Security OU
│   ├── Audit Account (CloudTrail, Config)
│   └── Security Tooling Account (GuardDuty, Security Hub)
├── Infrastructure OU
│   ├── Network Account (Transit Gateway, VPN)
│   ├── Shared Services Account (AD, SSO)
│   └── Log Archive Account
├── Workloads OU
│   ├── Dev OU → individual dev accounts
│   ├── Staging OU → staging accounts
│   └── Prod OU → production accounts
└── Sandbox OU → isolated experimental accounts
```
- **SCP** (Service Control Policies) — whitelist/blacklist services at OU level
- **Tag policies** — enforce tagging across accounts
- **AI services opt-out** — control data usage in AWS AI services
### Azure Management Groups
```
Tenant Root Group
├── Platform MG
│   ├── Connectivity (hub VNet, ExpressRoute)
│   ├── Management (Log Analytics, Automation)
│   └── Identity (AD DS, PIM)
├── Application MG
│   ├── DEV (dev subscriptions)
│   ├── TEST (test subscriptions)
│   └── PROD (production subscriptions)
└── Sandbox MG
```
- **Azure Policy** — built-in and custom policies (similar to SCP)
- **Management Group hierarchy** — up to 6 levels deep
- **Scale limits** — up to 10,000 management groups per tenant; the number of subscriptions per tenant is not capped
### GCP Projects
```
Organization Node
├── Folder: Platform
│   ├── Project: Shared Networking (VPC, Cloud NAT, VPN)
│   ├── Project: Security (Cloud KMS, Secret Manager, Chronicle)
│   └── Project: Monitoring (Cloud Monitoring, Logging)
├── Folder: Workloads
│   ├── Folder: Dev
│   │   └── Project: [app]-dev
│   ├── Folder: Staging
│   │   └── Project: [app]-staging
│   └── Folder: Prod
│       └── Project: [app]-prod
└── Folder: Sandbox
    └── Project: [user]-sandbox
```
- **Organization policies** — constraints at organization/folder level
- **Resource Manager** — hierarchy: Organization → Folder → Project → Resources
- **Project limits** — max 30 projects (can be increased), 10k resources per project
## 12-Factor App Methodology
Methodology for building cloud-native applications (Heroku, 2011), expanded by the book **Multi-Cloud Handbook for Developers** (Natarajan, Jacob, 2024).
| # | Factor | Description | Cloud Implementation |
|---|--------|-------------|----------------------|
| 1 | **Codebase** | One repo, many deployments | Git + CI/CD pipeline |
| 2 | **Dependencies** | Explicit dependency declaration | package.json, requirements.txt, Docker image |
| 3 | **Config** | Configuration in environment variables | Secrets Manager, Parameter Store, env vars |
| 4 | **Backing services** | Dependent services as attached resources | RDS, S3, Redis — connection via connection string |
| 5 | **Build, release, run** | Strict separation of the build, release, and run stages | CI/CD pipeline (GitHub Actions, GitLab CI) |
| 6 | **Processes** | Application as stateless processes | Horizontal scaling, session in Redis |
| 7 | **Port binding** | Service is self-contained and exposes itself by binding to a port, rather than being deployed into an external web server | Express, FastAPI, Spring Boot on own port |
| 8 | **Concurrency** | Scaling via process model | Horizontal Pod Autoscaler (K8s), EC2 Auto Scaling |
| 9 | **Disposability** | Fast startup and graceful shutdown | Health checks, SIGTERM handling, preStop hooks |
| 10 | **Dev/Prod parity** | Minimal difference between environments | Docker, IaC (Terraform), same backing services |
| 11 | **Logs** | Logs as event streams | stdout/stderr → CloudWatch, ELK, Datadog |
| 12 | **Admin processes** | Admin tasks as one-off processes | DB migrations, data backfill — run in isolation |
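Factor 3 (Config) is the easiest one to show in code. The sketch below reads settings from environment variables; the variable names (`DATABASE_URL`, `LOG_LEVEL`, `WORKER_COUNT`) and the `Settings` class are hypothetical.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    database_url: str
    log_level: str
    worker_count: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build settings from environment variables (factor 3: Config)."""
    return Settings(
        database_url=env["DATABASE_URL"],  # required: fail fast if missing
        log_level=env.get("LOG_LEVEL", "INFO"),
        worker_count=int(env.get("WORKER_COUNT", "2")),
    )
```

The same build artifact can then run in dev, staging, and production with only the environment differing, which also supports factor 10 (Dev/Prod parity).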
### Multi-cloud Extensions (Multi-Cloud Handbook for Developers)
- **API-first design** — consistent API interfaces across clouds (REST, gRPC)
- **Domain-Driven Design (DDD)** — bounded contexts mapped to cloud services
- **Service Mesh** — Istio, Linkerd for observability, traffic management and security across clouds
- **GitOps** — declarative deployment with ArgoCD/Flux across Kubernetes clusters in different clouds
## Azure Cloud Native Architecture (Map Book)
Based on **The Azure Cloud Native Architecture Mapbook (2nd ed.)** (Eyskens, 2025) — 40+ architectural maps across domains:
### Domains of Architectural Maps
| Domain | Key Azure Services | Architectural Patterns |
|--------|-------------------|----------------------|
| **Infrastructure** | VNet, Azure Firewall, ExpressRoute, VPN Gateway | Hub-and-spoke, Virtual WAN, Private Link |
| **Applications** | App Service, API Management, Service Bus, Functions | Event-driven, Strangler Fig, Backend for Frontend |
| **Data** | Cosmos DB, SQL Database, Synapse, Data Lake | CQRS, Event Sourcing, Polyglot Persistence |
| **Container Orchestrators** | AKS, Azure Container Apps (ACA) | Sidecar, Ambassador, Adapter (service mesh) |
| **AI** | Azure OpenAI, Cognitive Services, ML Studio | RAG, model fine-tuning, MLOps |
| **Security** | Entra ID, Defender for Cloud, Key Vault, Sentinel | Zero Trust, Defense in depth, JIT Access |
### Cloud Adoption Framework on Azure
- **Strategy** — business case, application catalog, portfolio rationalization
- **Plan** — landing zone design, governance baseline, subscription taxonomy
- **Ready** — landing zone implementation (ALZ), Azure Policy, Networking, Identity
- **Migrate** — assessment (Azure Migrate), rehost/replatform, test and cutover
- **Govern** — cost management, policy enforcement, compliance monitoring
## Cloud Provider Comparison
Based on **Cloud Computing: AWS, Azure, Google Cloud** (Sario, 2025):
| Area | AWS | Azure | GCP |
|------|-----|-------|-----|
| **Compute** | EC2, Lambda, ECS/EKS | VMs, Functions, AKS | GCE, Cloud Functions, GKE |
| **Storage** | S3, EBS, EFS | Blob, Disk, Files | Cloud Storage, Persistent Disk, Filestore |
| **Relational DB** | RDS (MySQL, PG, SQL Server, Oracle, MariaDB) | SQL Database, MySQL/PostgreSQL | Cloud SQL (MySQL, PG, SQL Server) |
| **NoSQL DB** | DynamoDB, ElastiCache | Cosmos DB, Redis Cache | Firestore, Bigtable, Memorystore |
| **Message queue** | SQS, SNS | Service Bus, Queue Storage | Pub/Sub, Cloud Tasks |
| **Observability** | CloudWatch, X-Ray | Monitor, Application Insights | Cloud Monitoring, Cloud Trace |
| **AI/ML** | SageMaker, Bedrock | Azure ML, OpenAI | Vertex AI, AutoML |
| **Pricing (compute)** | On-demand, Reserved, Spot, Savings Plan | Pay-as-you-go, Reserved, Spot | On-demand, Committed Use, Spot |
## OpenStack as Private Cloud
OpenStack is the dominant open-source platform for building private clouds (IaaS). It provides compute (Nova), networking (Neutron), and storage services (Cinder/Swift/Manila) with a unified API.
### Advantages over Commercial Solutions
- **Vendor-neutral API** — avoids lock-in (VMware, Hyper-V)
- **Multi-tenancy** — Keystone identity, RBAC, projects, quotas
- **Hybrid cloud ready** — federation with AWS/Azure/GCP, Terraform provisioning
- **Ecosystem** — hundreds of services (Heat orchestration, Magnum containers, Designate DNS)
### Suitable Scenarios
| Scenario | Key Services |
|----------|--------------|
| Data center with multi-tenancy and self-service | Nova, Neutron, Cinder, Horizon |
| Telco / NFVI / MEC | Neutron (DPDK, SR-IOV), Nova (NUMA pinning) |
| Science and HPC | Cyborg (GPU), Manila (NAS), Ironic (bare metal) |
| Academic clouds | Keystone federation, Trove (DBaaS) |
### Challenges
- Significant deployment and operations complexity
- Frequent breaking changes between releases (a new release every six months)
- Limited enterprise support outside commercial distributions (Red Hat, Canonical, Mirantis)
## Best Practices
- Use **infrastructure as code** (Terraform, Pulumi, CDK)
- Design for **failure** — every component can fail
- Implement **defense in depth** — security at every layer
- Monitor **costs** — tagging, budget alerts, anomaly detection
- Use **managed services** where it makes sense (less operations)
- **Least privilege** for all IAM roles and policies
- **Cost tagging** — assign tags for chargeback/showback (Environment, Team, Cost Center, Application)
- **Automated compliance** — AWS Config, Azure Policy, GCP Org Policies for guardrails
- **Multi-account strategy** — AWS Control Tower, Azure Landing Zones, GCP Resource Hierarchy
## Resources
Links, books and standards: [sources/cloud/sources.en.md](sources/cloud/sources.en.md)
### Recommended Reading
| Book | Authors | ISBN | Description |
|------|---------|------|-------------|
| The AI Cloud Infrastructure Blueprint | Thummarakoti, Vududala, Madupati, Kaushik | 978-1-041-16642-9 | End-to-end guide to designing, deploying, and managing AI systems on cloud platforms. Covers public/private/hybrid/multi-cloud models for AI, infrastructure for ML training and inference, MLOps. Target audience: architects, data scientists, DevOps. |
| AWS for Solutions Architects (3rd ed.) | Shrivastava, Srivastav, Thakur | 978-1-83664-193-3 | Practical guide to AWS architecture — compute (EC2, Lambda), storage (S3, EBS), databases (RDS, DynamoDB), networking, security, Well-Architected Framework, migration, cost optimization. Suitable for AWS Solutions Architect certification preparation. |
*Last revised: 2026-06-03*

495
CLOUD.md Normal file

@@ -0,0 +1,495 @@
# ☁️ Cloud Architecture
## Providers
- **AWS** — largest market share, broadest portfolio
- **Azure** — strong integration with the Microsoft ecosystem
- **GCP** — Kubernetes (GKE), data & ML, network connectivity
## Deployment Models
| Model | Description |
|-------|-------------|
| Public cloud | Shared provider infrastructure |
| Private cloud | Dedicated infrastructure (on-prem or hosted) |
| Hybrid cloud | Combination of public + private |
| Multi-cloud | Multiple public providers |
## Multi-Cloud Strategy
### Reasons for Multi-Cloud
- **Vendor lock-in prevention** — risk diversification
- **Regulatory requirements** — data residency in specific regions
- **Best-of-breed** — each provider has its strengths (AWS networking, Azure enterprise, GCP data/ML)
- **Acquisition scenarios** — consolidation after mergers and acquisitions
### Multi-Cloud Connectivity
| Method | Latency | Throughput | Cost |
|--------|---------|------------|------|
| Site-to-Site VPN | Medium | Limited | Low |
| Private interconnect (Direct Connect / ExpressRoute / Dedicated Interconnect) | Low | High | High |
| Cloud-to-cloud VPN | Medium | Medium | Medium |
| SD-WAN | Low | High | Medium |
### Challenges
- **Network complexity** — differing VPC/VNet concepts and security models
- **IAM federation** — unified identities across clouds (SSO, SAML, OIDC)
- **Data gravity** — moving data between clouds is expensive and slow
- **Monitoring** — a single pane of glass across clouds (Grafana, Datadog)
### Cloud Adoption Frameworks (CAF)
Each major provider has its own Cloud Adoption Framework for a structured approach to cloud adoption:
| Provider | Framework | Focus |
|----------|-----------|-------|
| AWS | AWS CAF | 6 perspectives: Business, People, Governance, Platform, Security, Operations |
| Azure | Microsoft CAF | 8 methodologies: Strategy, Plan, Ready, Migrate, Innovate, Govern, Manage, Secure |
| GCP | Google CAF | 4 themes: Learn, Lead, Scale, Secure |
Multi-Cloud Administration Guide (Mulder, 2024) recommends combining the CAF frameworks across providers into unified governance models, particularly in these areas:
- **Interoperability** — standardizing APIs and IaC across clouds (Terraform, Pulumi)
- **Data governance** — a unified policy for data residency and the data lifecycle
- **Compliance automation** — automated audits across clouds (AWS Config, Azure Policy, GCP Org Policies)
- **Access management** — identity federation and centralized RBAC
## Migration Strategies — 6 Rs
| Strategy | Description | Effort | Typical Scenario |
|----------|-------------|--------|------------------|
| **Rehost** (Lift & Shift) | Moving VMs as-is, without changes | Low | Fast migration, data center exit, minimal risk |
| **Replatform** (Lift & Reshape) | Migration with minor adjustments (e.g. RDS instead of a self-managed DB) | Medium | Optimization without rewriting the application |
| **Refactor** (Re-architect) | Rewriting the application as cloud-native (microservices, serverless) | High | Maximizing cloud benefits, long-term strategy |
| **Repurchase** | Switching to SaaS (e.g. Salesforce, Workday) | Low | Application is outdated and a SaaS alternative exists |
| **Retire** | Shutting down unneeded applications | Low | Application is no longer used, decommission |
| **Retain** | Keeping the application on-prem | None | Regulatory reasons, migration risk too high |
### Decision Framework for the 6 Rs
```
Start: Is the application needed?
├── No → Retire
└── Yes → Is there a SaaS alternative?
    ├── Yes → Repurchase
    └── No → Is refactoring worthwhile?
        ├── Yes → Refactor
        └── No → Is a platform change sufficient?
            ├── Yes → Replatform
            └── No → Rehost
```
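The decision tree above can be written as a function. The parameter names are illustrative assumptions. Like the tree, the function never returns Retain, which is decided separately for regulatory or risk reasons.

```python
def choose_migration_strategy(
    needed: bool,
    saas_alternative: bool = False,
    refactor_worthwhile: bool = False,
    replatform_sufficient: bool = False,
) -> str:
    """Walk the 6 Rs decision tree from top to bottom."""
    if not needed:
        return "Retire"
    if saas_alternative:
        return "Repurchase"
    if refactor_worthwhile:
        return "Refactor"
    if replatform_sufficient:
        return "Replatform"
    return "Rehost"
```

The order of the checks matters: an application with a SaaS alternative is repurchased even if refactoring would also pay off, exactly as in the tree.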
## Well-Architected Framework (AWS)
1. **Operational Excellence** — automation, monitoring, documentation
2. **Security** — IAM, encryption, compliance
3. **Reliability** — recovery, scaling, contingency plans
4. **Performance Efficiency** — right-sizing, choosing the right services
5. **Cost Optimization** — FinOps, reserved instances, spot instances
6. **Sustainability** (added in late 2021) — carbon footprint, energy efficiency
Counterparts: Azure Well-Architected Framework, GCP Architecture Framework
### Key Questions from the Well-Architected Review (~60 questions)
**Operational Excellence (12 questions)**
- How are changes managed and automated?
- How are operations documented and shared within the team?
- How are expected and unexpected events reflected in operations?
- What runbooks exist for common operational scenarios?
- How do incident management and the postmortem process work?
**Security (12 questions)**
- How is identity & access management implemented?
- How is data protected at rest and in transit?
- How is security incident detection ensured?
- What are the procedures for patch management and vulnerability remediation?
- How are infrastructure credentials and secrets managed?
**Reliability (12 questions)**
- How is service availability ensured during a component failure?
- How is backup and disaster recovery implemented?
- How do service limits (quotas, throttling) affect reliability?
- How does automatic scaling work under changing load?
- What are the SLI/SLO metrics and how are they monitored?
**Performance Efficiency (12 questions)**
- How is the correct type and size of compute/storage selected?
- How is the database layer optimized (indexes, queries, caching)?
- How is monitoring used to identify bottlenecks?
- How is scaling implemented (vertical vs horizontal)?
**Cost Optimization (12 questions)**
- How are costs allocated to teams/projects (chargeback/showback)?
- What tools are used for cost analysis?
- How are unused resources identified and eliminated?
- How is licensing optimized (BYOL, hybrid benefit)?
## Key Components
### Compute Layer
- **VM / instances** — EC2, Azure VMs, GCE
- **Container orchestration** — EKS, AKS, GKE
- **Serverless** — Lambda, Azure Functions, Cloud Functions
- **PaaS** — App Engine, Elastic Beanstalk, Azure App Service
### Compute Comparison Matrix (AWS EC2)
| Family | Type | vCPU:Memory | Use Case | Example Pricing (on-demand, us-east-1) |
|--------|------|-------------|----------|----------------------------------------|
| **General purpose** | M7g, M7i | 1:4 | Web servers, microservices, dev/test | m7i.large ~$0.088/h |
| **Compute optimized** | C7g, C7i | 1:2 | HPC, batch processing, CI/CD, gaming | c7i.large ~$0.078/h |
| **Memory optimized** | R7g, R7i, X2idn | 1:8 to 1:32 | In-memory DB (Redis), SAP HANA, real-time analytics | r7i.large ~$0.118/h |
| **Storage optimized** | I4i, Im4gn | 1:4 + NVMe | Transactional DB, data warehousing, Kafka | i4i.large ~$0.138/h |
| **GPU / ML** | P5, G5, Trn1 | GPU attach | AI training (P5), inference (G5), ML (Trn1) | g5.xlarge ~$1.006/h |
See [GPU.md](GPU.md) for GPU model and configuration details.
### Storage
- **Object storage** — S3, Blob Storage, Cloud Storage
- **Block storage** — EBS, managed disks, persistent disks
- **File storage** — EFS, Azure Files, Filestore
- **CDN** — CloudFront, Azure CDN, Cloud CDN
### S3 Storage Classes
| Class | Availability | Retrieval Time | Price / GB / Month | Use Case |
|-------|-------------|----------------|--------------------|----------|
| **S3 Standard** | 99.99 % | milliseconds | ~$0.023 | Active data, frequent access |
| **S3 Intelligent-Tiering** | 99.9 % | milliseconds | ~$0.023 + monitoring fee | Unknown/variable access patterns |
| **S3 Standard-IA** | 99.9 % | milliseconds | ~$0.0125 | Less frequent but fast access |
| **S3 One Zone-IA** | 99.5 % | milliseconds | ~$0.01 | Reproducible data |
| **S3 Glacier Instant** | 99.9 % | milliseconds | ~$0.004 | Archive with occasional access |
| **S3 Glacier Flexible** | 99.99 % | 1-5 min (expedited) / 3-5 h (standard) | ~$0.0036 | Long-term archive |
| **S3 Glacier Deep Archive** | 99.99 % | 12 h (standard) / 48 h (bulk) | ~$0.00099 | Cheapest, compliance archives |
## Multi-AZ and Multi-Region Architecture
```
Region ┌──────────────────────────────┐
       │  AZ-1       AZ-2       AZ-3  │
       │  ┌───┐      ┌───┐      ┌───┐ │
       │  │APP│──────│APP│──────│APP│ │
       │  └─┬─┘      └─┬─┘      └─┬─┘ │
       │    │          │          │   │
       │  ┌─▼──────────▼──────────▼─┐ │
       │  │      Load Balancer      │ │
       │  └────────────┬────────────┘ │
       │               │              │
       │  ┌────────────▼────────────┐ │
       │  │   Database (Primary)    │ │
       │  │     + Read Replica      │ │
       │  └─────────────────────────┘ │
       └──────────────────────────────┘
```
## Disaster Recovery Strategies
### DR Strategies on AWS (from least to most prepared)
| Strategy | RTO | RPO | Cost | Description |
|----------|-----|-----|------|-------------|
| **Backup & Restore** | hours | 24 h | Low | Regular data backups to S3/Glacier, restore in DR region |
| **Pilot Light** | tens of minutes | minutes | Medium | Minimal running copy (DB, core services), scale on failover |
| **Warm Standby** | minutes | seconds | High | Reduced production copy running, scale on failover |
| **Active-Active (Multi-Region)** | seconds | < 1 s | Very high | Fully active in multiple regions, traffic routing (Route53, Global Accelerator) |
Key books on the topic:
- **Engineering Resilient Systems on AWS** (Schwarz, Moran, Bachmeier, 2024) — practical labs for resilience patterns: back off and retry, multi-Region failover, circuit breaker, chaos engineering using AWS Fault Injection Simulator
- **Building Resilient Architectures on AWS** (2025) — data security, backup strategies, recovery plan automation
### Chaos Engineering
Deliberate fault injection to verify system resilience:
- **AWS Fault Injection Simulator (FIS)** — managed fault injection for EC2, ECS, EKS, RDS
- **Tools**: Chaos Mesh (Kubernetes), Gremlin, Litmus
- **Process**: define hypothesis → run experiment → measure impact → improve system
- **Safety**: experiments in isolated environment, safety controls, automatic rollback
## Cloud Design Patterns
### Strangler Fig
Gradually replacing parts of a monolithic application with microservices.
- Legacy functionality is progressively redirected to new services
- Strangler Fig proxy (route headers, feature flags) controls traffic migration
- Advantage: incremental value delivery without big-bang rewrite
### Circuit Breaker
Prevents cascading failures when a dependent service fails.
- Three states: **Closed** (normal operation), **Open** (requests immediately fail), **Half-Open** (test request after timeout)
- Parameters: failure threshold, timeout (reset timeout), half-open max requests
- Implementations: resilience4j, Hystrix (legacy), Istio (envoy), AWS App Mesh
### Saga
Distributed transaction across microservices — a series of local transactions with compensating actions.
- **Choreography** — each service publishes an event, the next service reacts (Kafka, EventBridge)
- **Orchestration** — central orchestrator manages steps (Step Functions, Temporal, Camunda)
### CQRS (Command Query Responsibility Segregation)
Separation of write (Command) and read (Query) models.
- Command model: optimized for writes (normalized, transactional)
- Query model: optimized for reads (denormalized, read-optimized views)
- Eventual consistency between models (event bus propagates changes)
- Use case: reporting, audit logs, high-throughput systems
### Event Sourcing
Storing state as a sequence of events, not the current state.
- Each change is an append-only event in an event store
- Current state = fold of all events
- Advantages: audit trail, time travel, CQRS compatibility
- Implementations: EventStoreDB, Kafka (log), DynamoDB + CDC
### Additional Cloud Patterns (Wilder — Cloud Architecture Patterns)
| Pattern | Category | Description |
|---------|----------|-------------|
| **Horizontally Scaling Compute** | Scalability | Adding/removing instances based on load, elasticity |
| **Queue-Centric Workflow** | Scalability | Decoupling components via queues (SQS, RabbitMQ), async processing |
| **Auto-Scaling** | Scalability | Automatic scaling based on metrics (CPU, memory, request count) |
| **MapReduce** | Big Data | Distributed data processing (Hadoop, EMR, BigQuery) |
| **Database Sharding** | Big Data | Horizontal data partitioning across databases |
| **Busy Signal** | Failure Handling | Graceful degradation under overload (HTTP 503, throttling, backpressure) |
| **Node Failure** | Failure Handling | Detection and automatic recovery from compute node failure |
| **Colocation** | Distributed Users | Placing compute close to data to reduce latency |
| **Valet Key** | Distributed Users | Delegated storage access (SAS tokens, S3 presigned URLs) |
| **Multi-Site Deployment** | Distributed Users | Active deployment in multiple geographic locations |
## Evolutionary Architecture
Definition (Ford, Parsons, Kua, 2022): *An evolutionary architecture supports guided, incremental change across multiple dimensions.*
### Fitness Functions
Automated checks of architectural characteristics — analogous to tests for architecture:
| Type | Description | Example |
|------|-------------|---------|
| **Atomic** | Checks a single metric | Cyclomatic complexity < 10 |
| **Holistic** | Checks the overall system | End-to-end latency < 200 ms |
| **Triggered** | Triggered by event (CI/CD commit, deployment) | API contract verification |
| **Continuous** | Runs continuously in production | Monitoring dependency freshness |
| **Static** | Code analysis without execution | SonarQube, ESLint |
| **Dynamic** | Runtime analysis | Load tests, chaos experiments |
### Principles of Evolutionary Architecture
1. **Incremental change** — small, safe changes thanks to CI/CD, deployment pipelines, mature DevOps
2. **Fitness functions** — automated protection of architectural characteristics (scalability, performance, security)
3. **Coupling management** — deliberate management of the connections between components (affinity, volatility, cycles)
4. **Evolutionary data** — database migrations as first-class citizens (evolutionary schemas, expand-contract pattern)
### Antipatterns
- **Big Design Up Front (BDUF)** — trying to design everything upfront, ignoring change
- **No Design at All** — absence of architectural thinking, purely emergent design
- **Premature Standardization** — introducing standards before the domain is understood
## Hybrid cloud connectivity
See also: [NETWORKING.md](NETWORKING.md) for the network architecture (VPN, BGP, VPC design).
- **Site-to-Site VPN**: IPsec tunnel over the internet
- **Direct Connect / ExpressRoute / Dedicated Interconnect**: private physical connection (AWS / Azure / GCP)
- **Cloud VPN / Transit Gateway**: hub-and-spoke topology
## Cost optimization in detail
### Savings Plans vs Reserved Instances
| Property | Compute Savings Plan | EC2 Instance Savings Plan | Reserved Instances |
|-----------|---------------------|---------------------------|-------------------|
| Flexibility | Any instance family, size, region, OS, and tenancy (also covers Fargate and Lambda) | Fixed instance family and region; size, OS, and tenancy may change | Specific instance type (Standard RI) |
| Term | 1 or 3 years | 1 or 3 years | 1 or 3 years |
| Discount (typical) | ~30-50 % | ~40-60 % | ~40-60 % |
| Instance change | Yes (any) | Yes (within the family) | No for Standard RIs; Convertible RIs can be exchanged |
| Region change | Yes | No | No |
| Payment options | All Upfront / Partial / No Upfront | All Upfront / Partial / No Upfront | All Upfront / Partial / No Upfront |
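Whether a commitment pays off depends on how much of the committed capacity is actually used. The sketch below shows the arithmetic under simplifying assumptions (a No Upfront style commitment billed for every hour of the term, no time value of money); the rate and discount are made-up example values, not AWS prices.

```python
def committed_cost(on_demand_rate: float, discount: float, term_hours: float) -> float:
    """Cost of a commitment: the discounted rate is paid for every hour of the term."""
    return on_demand_rate * (1.0 - discount) * term_hours


def on_demand_cost(on_demand_rate: float, used_hours: float) -> float:
    """Cost without a commitment: only the hours actually used are paid."""
    return on_demand_rate * used_hours


def break_even_utilization(discount: float) -> float:
    """Fraction of the term the capacity must be in use for the commitment to pay off."""
    return 1.0 - discount


if __name__ == "__main__":
    rate, discount, term = 0.10, 0.40, 8760  # hypothetical: $0.10/h, 40 % discount, 1 year
    print(f"break-even utilization: {break_even_utilization(discount):.0%}")
    for utilization in (0.5, 0.6, 0.9):
        od = on_demand_cost(rate, utilization * term)
        sp = committed_cost(rate, discount, term)
        print(f"{utilization:.0%} used: on-demand ${od:.2f} vs committed ${sp:.2f}")
```

With a 40 % discount the break-even sits at 60 % utilization: a workload that runs less than that is cheaper on demand.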
### Spot instance best practices
- **Diversification**: use a mix of instance types (spot fleet) for higher availability
- **Graceful handling**: the application must handle the termination notice (2-minute warning)
- **Checkpointing**: save state regularly so work can restart after a spot interruption
- **Spot blocks** (AWS): protected runs of 1-6 h; AWS has retired this option (closed to new customers in July 2021, support ended on 31 December 2022)
- **Good fit**: batch processing, CI/CD runners, stateless microservices, ML training
- **Avoid**: stateful workloads and databases (unless specifically designed for interruption)
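Graceful handling and checkpointing can be combined in a small handler. The sketch below only parses the interruption notice that EC2 publishes under the instance metadata path `spot/instance-action`; the polling loop (including the IMDSv2 session token) is left out, and `handle_notice` and `save_checkpoint` are hypothetical names for illustration.

```python
import json
from datetime import datetime, timezone

# Metadata path for the notice; it answers 404 until an interruption is scheduled.
SPOT_ACTION_URL = "http://169.254.169.254/latest/meta-data/spot/instance-action"


def seconds_until_interruption(body: str, now: datetime) -> float:
    """Parse an instance-action document and return the seconds left before the action."""
    notice = json.loads(body)
    when = datetime.strptime(notice["time"], "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    return (when - now).total_seconds()


def handle_notice(body: str, now: datetime, save_checkpoint) -> float:
    """Persist state as soon as a notice is seen; return the remaining seconds."""
    remaining = seconds_until_interruption(body, now)
    save_checkpoint()
    return remaining
```

A worker would poll the URL every few seconds and call `handle_notice` on the first non-404 response, then stop accepting new work.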
## Organization and governance
### AWS Organizations
```
Root
├── Security OU
│ ├── Audit Account (CloudTrail, Config)
│ └── Security Tooling Account (GuardDuty, Security Hub)
├── Infrastructure OU
│ ├── Network Account (Transit Gateway, VPN)
│ ├── Shared Services Account (AD, SSO)
│ └── Log Archive Account
├── Workloads OU
│ ├── Dev OU → individual dev accounts
│ ├── Staging OU → staging accounts
│ └── Prod OU → production accounts
└── Sandbox OU → isolated experimental accounts
```
- **SCP** (Service Control Policies): allow-list or deny-list services at the OU level
- **Tag policies**: enforce tagging standards across accounts
- **AI services opt-out**: control whether AWS AI services may use your data
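As an illustration of an SCP guardrail, the sketch below builds a policy document that denies requests outside an approved set of regions. The function name, the `Sid`, and the list of exempted global services are assumptions for this example; the document structure (`Version`, `Statement`, `Effect`, `NotAction`, `Condition` on `aws:RequestedRegion`) follows the standard IAM policy grammar used by SCPs.

```python
import json


def region_guardrail_scp(allowed_regions: list[str]) -> dict:
    """Build an SCP document that denies requests outside the allowed regions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyOutsideAllowedRegions",
                "Effect": "Deny",
                # Global services are exempted via NotAction; extend this list as needed.
                "NotAction": ["iam:*", "organizations:*", "sts:*", "support:*"],
                "Resource": "*",
                "Condition": {
                    "StringNotEquals": {"aws:RequestedRegion": allowed_regions}
                },
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(region_guardrail_scp(["eu-central-1", "eu-west-1"]), indent=2))
```

Attached to the Workloads OU, such a policy would apply to every account below it, which is the reason for grouping accounts into OUs in the first place.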
### Azure Management Groups
```
Tenant Root Group
├── Platform MG
│ ├── Connectivity (hub VNet, ExpressRoute)
│ ├── Management (Log Analytics, Automation)
│ └── Identity (AD DS, PIM)
├── Application MG
│ ├── DEV (dev subscriptions)
│ ├── TEST (test subscriptions)
│ └── PROD (production subscriptions)
└── Sandbox MG
```
- **Azure Policy**: built-in and custom policies (similar in purpose to SCPs)
- **Management group hierarchy**: up to 6 levels deep (not counting the root level and the subscription level)
- **Management group limit**: max 10,000 management groups per tenant
### GCP Projects
```
Organization Node
├── Folder: Platform
│ ├── Project: Shared Networking (VPC, Cloud NAT, VPN)
│ ├── Project: Security (Cloud KMS, Secret Manager, Chronicle)
│ └── Project: Monitoring (Cloud Monitoring, Logging)
├── Folder: Workloads
│ ├── Folder: Dev
│ │ └── Project: [app]-dev
│ ├── Folder: Staging
│ │ └── Project: [app]-staging
│ └── Folder: Prod
│ └── Project: [app]-prod
└── Folder: Sandbox
└── Project: [user]-sandbox
```
- **Organization policies**: constraints at the organization or folder level
- **Resource Manager**: hierarchy Organization → Folder → Project → Resources
- **Project limits**: max 30 projects (can be raised on request), 10k resources per project
## 12-Factor App methodology
A methodology for building cloud-native applications (Heroku, 2011), extended by the book **Multi-Cloud Handbook for Developers** (Natarajan, Jacob, 2024).
| # | Factor | Description | Cloud implementation |
|---|--------|-------|----------------------|
| 1 | **Codebase** | One codebase in version control, many deploys | Git + CI/CD pipeline |
| 2 | **Dependencies** | Explicitly declared dependencies | package.json, requirements.txt, Docker image |
| 3 | **Config** | Configuration in environment variables | Secrets Manager, Parameter Store, env vars |
| 4 | **Backing services** | Backing services treated as attached resources | RDS, S3, Redis, attached via connection string |
| 5 | **Build, release, run** | Strict separation of the build, release, and run stages | CI/CD pipeline (GitHub Actions, GitLab CI) |
| 6 | **Processes** | The app runs as stateless processes | Horizontal scaling, sessions in Redis |
| 7 | **Port binding** | The service exports itself by binding to a port instead of being deployed into an external web server | Express, FastAPI, Spring Boot on their own port |
| 8 | **Concurrency** | Scale out via the process model | Horizontal Pod Autoscaler (K8s), EC2 Auto Scaling |
| 9 | **Disposability** | Fast startup and graceful shutdown | Health checks, SIGTERM handling, preStop hooks |
| 10 | **Dev/Prod parity** | Keep the environments as similar as possible | Docker, IaC (Terraform), same backing services |
| 11 | **Logs** | Logs as event streams | stdout/stderr → CloudWatch, ELK, Datadog |
| 12 | **Admin processes** | Administrative tasks as one-off processes | DB migrations, data backfill, run in isolation |
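Factors 3 (Config) and 9 (Disposability) are easy to show in code. The following is a minimal sketch, not a prescribed implementation: the variable names `DATABASE_URL`, `PORT`, and `LOG_LEVEL` and the helper names are assumptions chosen for the example.

```python
import os
import signal
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    database_url: str
    port: int
    log_level: str


def load_config(env=os.environ) -> Config:
    """Factor 3: read all configuration from environment variables."""
    return Config(
        database_url=env["DATABASE_URL"],       # required: fail fast when missing
        port=int(env.get("PORT", "8080")),      # optional, with a default
        log_level=env.get("LOG_LEVEL", "INFO"),
    )


shutdown_requested = threading.Event()


def install_sigterm_handler() -> None:
    """Factor 9: turn SIGTERM into a flag that the main loop checks between requests."""
    signal.signal(signal.SIGTERM, lambda signum, frame: shutdown_requested.set())
```

Passing the environment as a parameter keeps `load_config` testable without touching the real process environment; the main loop would exit cleanly once `shutdown_requested.is_set()` returns true.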
### Extensions for multi-cloud (Multi-Cloud Handbook for Developers)
- **API-first design**: consistent API interfaces across clouds (REST, gRPC)
- **Domain-Driven Design (DDD)**: bounded contexts mapped to cloud services
- **Service Mesh**: Istio or Linkerd for observability, traffic management, and security across clouds
- **GitOps**: declarative deployment with ArgoCD/Flux across Kubernetes clusters in different clouds
## Azure Cloud Native Architecture (Mapbook)
Based on **The Azure Cloud Native Architecture Mapbook (2nd ed.)** (Eyskens, 2025), which contains 40+ architecture maps across domains:
### Architecture map domains
| Domain | Key Azure services | Architectural patterns |
|--------|---------------------|----------------------|
| **Infrastructure** | VNet, Azure Firewall, ExpressRoute, VPN Gateway | Hub-and-spoke, Virtual WAN, Private Link |
| **Applications** | App Service, API Management, Service Bus, Functions | Event-driven, Strangler Fig, Backend for Frontend |
| **Data** | Cosmos DB, SQL Database, Synapse, Data Lake | CQRS, Event Sourcing, Polyglot Persistence |
| **Container Orchestrators** | AKS, Azure Container Apps (ACA) | Sidecar, Ambassador, Adapter (service mesh) |
| **AI** | Azure OpenAI, Cognitive Services, ML Studio | RAG, model fine-tuning, MLOps |
| **Security** | Entra ID, Defender for Cloud, Key Vault, Sentinel | Zero Trust, Defense in depth, JIT Access |
### Applying the Cloud Adoption Framework on Azure
- **Strategy**: business case, application catalog, portfolio rationalization
- **Plan**: landing zone design, governance baseline, subscription taxonomy
- **Ready**: landing zone implementation (ALZ), Azure Policy, networking, identity
- **Migrate**: assessment (Azure Migrate), rehost/replatform, test and cutover
- **Govern**: cost management, policy enforcement, compliance monitoring
## Cloud provider comparison
Based on **Cloud Computing: AWS, Azure, Google Cloud** (Sario, 2025):
| Area | AWS | Azure | GCP |
|--------|-----|-------|-----|
| **Compute** | EC2, Lambda, ECS/EKS | VMs, Functions, AKS | GCE, Cloud Functions, GKE |
| **Storage** | S3, EBS, EFS | Blob, Disk, Files | Cloud Storage, Persistent Disk, Filestore |
| **Relational databases** | RDS (MySQL, PG, SQL Server, Oracle, MariaDB) | SQL Database, MySQL/PostgreSQL | Cloud SQL (MySQL, PG, SQL Server) |
| **NoSQL databases** | DynamoDB, ElastiCache | Cosmos DB, Redis Cache | Firestore, Bigtable, Memorystore |
| **Message queue** | SQS, SNS | Service Bus, Queue Storage | Pub/Sub, Cloud Tasks |
| **Observability** | CloudWatch, X-Ray | Monitor, Application Insights | Cloud Monitoring, Cloud Trace |
| **AI/ML** | SageMaker, Bedrock | Azure ML, Azure OpenAI | Vertex AI, AutoML |
| **Pricing (compute)** | On-demand, Reserved, Spot, Savings Plan | Pay-as-you-go, Reserved, Spot | On-demand, Committed Use, Spot |
## OpenStack as a Private Cloud
OpenStack is the dominant open-source platform for building a private cloud (IaaS). It provides compute (Nova), networking (Neutron), and storage services (Cinder/Swift/Manila) behind a unified API.
### Advantages over commercial solutions
- **Vendor-neutral API**: avoids lock-in (VMware, Hyper-V)
- **Multi-tenancy**: Keystone identity, RBAC, projects, quotas
- **Hybrid cloud ready**: federation with AWS/Azure/GCP, Terraform provisioning
- **Ecosystem**: dozens of services (Heat orchestration, Magnum containers, Designate DNS)
### Suitable scenarios
| Scenario | Key services |
|--------|---------------|
| Data center with multi-tenancy and self-service | Nova, Neutron, Cinder, Horizon |
| Telco / NFVI / MEC | Neutron (DPDK, SR-IOV), Nova (NUMA pinning) |
| Science and HPC | Cyborg (GPU), Manila (NAS), Ironic (bare metal) |
| Academic clouds | Keystone federation, Trove (DBaaS) |
### Challenges
- Significant deployment and operational complexity
- Frequent breaking API changes between releases (a new release every six months)
- Limited enterprise support outside commercial distributions (Red Hat, Canonical, Mirantis)
## Best practices
- Use **infrastructure as code** (Terraform, Pulumi, CDK)
- Design for **failure**: any component can go down
- Implement **defense in depth**: security at every layer
- Monitor **costs**: tagging, budget alerts, anomaly detection
- Use **managed services** where it makes sense (less operational work)
- **Least privilege** for all IAM roles and policies
- **Cost tagging**: assign tags for chargeback/showback (Environment, Team, Cost Center, Application)
- **Automated compliance**: AWS Config, Azure Policy, GCP Org Policies as guardrails
- **Multi-account strategy**: AWS Control Tower, Azure Landing Zones, GCP Resource Hierarchy
## Sources
Links, books, and standards: [sources/cloud/sources.md](sources/cloud/sources.md)
### Recommended literature
| Book | Authors | ISBN | Description |
|-------|--------|------|-------|
| The AI Cloud Infrastructure Blueprint | Thummarakoti, Vududala, Madupati, Kaushik | 978-1-041-16642-9 | End-to-end guide to designing, deploying, and managing AI systems on cloud platforms. Covers public/private/hybrid/multi-cloud models for AI, infrastructure for ML training and inference, and MLOps. Audience: architects, data scientists, DevOps. |
| AWS for Solutions Architects (3rd ed.) | Shrivastava, Srivastav, Thakur | 978-1-83664-193-3 | Practical guide to AWS architecture: compute (EC2, Lambda), storage (S3, EBS), databases (RDS, DynamoDB), networking, security, the Well-Architected Framework, migration, cost optimization. Suitable preparation for the AWS Solutions Architect certification. |
*Last revision: 2026-06-03*

CONNECTIVITY.en.md Normal file

@@ -0,0 +1,270 @@
# 🔌 Server connectivity — network and storage connectivity
## Ethernet — network connectivity
### Speeds and formats
| Speed | Designation | Form factor | Cabling | Standard year | Use case |
|----------|----------|-------------|---------|---------------|----------|
| **1 GbE** | 1000BASE-T | RJ45 (copper) | Cat5e/Cat6 | 1999 | Management, legacy |
| **10 GbE** | 10GBASE-T / SFP+ | RJ45 / SFP+ | Cat6 (up to 55 m) / Cat6A or Cat7 (100 m) / DAC / SR/LR | 2006 | Common server, storage |
| **25 GbE** | 25GBASE-R | SFP28 | Cat8 (30m) / DAC (5m) / SR/LR (100m/10km) | 2016 | Standard for servers (2020+) |
| **40 GbE** | 40GBASE-R | QSFP+ | DAC (7m) / SR (150m) / LR (10km) | 2010 | Legacy, spine |
| **50 GbE** | 50GBASE-R | SFP56 | DAC / SR / LR | 2018 | Emerging server |
| **100 GbE** | 100GBASE-R | QSFP28 | DAC (3m) / SR4 (100m) / LR4 (10km) / PSM4 (500m) | 2010 | Spine, storage, AI |
| **200 GbE** | 200GBASE-R | QSFP56 | DAC / SR4 / DR4 | 2017 | AI/ML, HPC |
| **400 GbE** | 400GBASE-R | QSFP-DD / OSFP | DAC (2.5m) / SR8 (100m) / DR4 (500m) / FR4 (2km) | 2017 | AI training, hyperscale |
| **800 GbE** | 800GBASE-R | QSFP-DD800 / OSFP | DAC (2m) / SR8 (100m) / DR8 (500m) | 2024 | Next-gen AI/ML |
**Recommendations for servers (2026)**:
- **Standard**: 2× 25 GbE (management + data) or 2× 100 GbE for demanding workloads
- **AI/ML training**: 8× 400 GbE (InfiniBand preferred for GPU communication)
- **Storage**: 2× 25/100 GbE (iSCSI/NFS) or dedicated FC (16/32 Gbps)
### NIC form factor
| Form factor | PCIe lanes | Speed | Use case |
|------------|-----------|----------|----------|
| **OCP 3.0** | x8/x16 | 25/100/200 GbE | Modern servers (Dell, HPE), small form factor |
| **PCIe HHHL** | x8 | 25/50 GbE | Standard 1U/2U servers |
| **PCIe FHHL** | x16 | 100/200/400 GbE | GPU servers, high-density |
| **Mezzanine** | x8 | 10/25 GbE | Blade servers (HPE Synergy, Dell MX) |
| **LOM (LAN on Motherboard)** | — | 1/10/25 GbE | Integrated, basic connectivity |
### NIC features
| Feature | Description | Benefit |
|---------|-------|---------|
| **TSO/GRO** | TCP Segmentation Offload / Generic Receive Offload | Reduced CPU load for TCP |
| **LRO/LSO** | Large Receive Offload / Large Send Offload | LRO merges received TCP segments in the NIC (GRO is the kernel's software counterpart); LSO is another name for TSO |
| **RSS** | Receive Side Scaling | Distribution of incoming packets across multiple CPU cores |
| **RPS/RFS** | Receive Packet Steering / Receive Flow Steering | Software counterpart of RSS, with cache affinity |
| **XDP** | eXpress Data Path | BPF-based packet processing (DDoS, load balancer) |
| **RDMA (RoCE v2)** | RDMA over Converged Ethernet | GPU direct communication, storage (NVMe-oF) |
| **iWARP** | RDMA over TCP | RDMA without special switch (higher latency) |
| **DPDK** | Data Plane Development Kit | Kernel-bypass packet processing in user space (VNF, vSwitch) |
| **VXLAN/NVGRE offload** | HW offload for tunneling | Overlay networking (VMware NSX, OpenStack) |
| **SR-IOV** | Single Root I/O Virtualization | Direct NIC access for VMs (VF), low latency |
| **Flow Bifurcation** | Split NIC traffic between kernel and DPDK | Concurrent management and high-speed data path |
| **PTP (IEEE 1588)** | Precision Time Protocol | Financial services, 5G, telco |
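On Linux, the offloads listed above can be inspected with `ethtool -k <interface>`. The sketch below parses that text into a dictionary; `parse_offloads` is a hypothetical helper, and `SAMPLE` is an abbreviated, hand-written example of the output format rather than a capture from a real NIC.

```python
SAMPLE = """Features for eth0:
rx-checksumming: on
tcp-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
"""


def parse_offloads(ethtool_output: str) -> dict[str, bool]:
    """Map feature name -> enabled, from the text printed by `ethtool -k <iface>`."""
    features = {}
    for line in ethtool_output.splitlines():
        name, sep, value = line.strip().partition(":")
        if not sep or not value.strip():
            continue  # skip the "Features for ...:" header and blank lines
        # The first word is on/off; a trailing "[fixed]" marks features that cannot be changed.
        features[name] = value.split()[0] == "on"
    return features


if __name__ == "__main__":
    for feature, enabled in parse_offloads(SAMPLE).items():
        print(f"{feature:32} {'on' if enabled else 'off'}")
```

In practice the input would come from running `ethtool -k` on the host, for example through `subprocess.run(..., capture_output=True, text=True)`.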
### NIC selection per workload
| Workload | Recommended NIC | Rationale |
|----------|---------------|------------|
| **Web / API servers** | 2× 25 GbE SFP28, OCP | Low cost, sufficient bandwidth |
| **Virtualization (VMware)** | 2× 25 GbE (SR-IOV, VXLAN offload) | SR-IOV for VMs, VXLAN for NSX |
| **Database (OLTP)** | 2× 25/100 GbE (RSS, low latency) | Low latency, RSS for CPU scaling |
| **Storage (NFS/iSCSI)** | 2× 25/100 GbE (RoCE v2) | RDMA for NVMe-oF, low latency |
| **Storage (FC SAN)** | 2× 32 Gb FC HBA | SAN for VMware VMFS, block storage |
| **AI/ML training** | 8× 400 GbE + InfiniBand NDR | GPU communication, data ingestion |
| **AI/ML inference** | 4× 100 GbE (RoCE v2) | Model serving, GPU direct |
| **HPC** | InfiniBand NDR 400 Gbps | MPI communication, low latency |
| **Telco / Edge** | 2× 25 GbE (DPDK, PTP) | VNF, 5G UPF, low latency |
---
## Storage connectivity
### Fibre Channel (FC) SAN
| Generation | Speed | Designation | Form factor | Reach (SMF) | Use case |
|----------|----------|----------|-------------|-------------|----------|
| **Gen 5** | 16 Gbps | 16GFC | SFP+ | 10 km | Legacy SAN |
| **Gen 6** | 32 Gbps | 32GFC | SFP28 | 10 km | Current standard |
| **Gen 7** | 64 Gbps | 64GFC | SFP56 | 10 km | Emerging, high-performance |
| **Gen 8** | 128 Gbps | 128GFC | QSFP28 | 10 km | Emerging (first production deployments) |
**HBA (Host Bus Adapter)**:
| Manufacturer | Model | Speed | PCIe | Ports | Features |
|---------|-------|----------|------|-------|----------|
| **Broadcom / Emulex** | LPe35000 | 32 GFC | PCIe 3.0 x8 | 1-2 | FC-NVMe, T10-PI, SR-IOV |
| **Broadcom / Emulex** | LPe36000 | 64 GFC | PCIe 4.0 x16 | 1-2 | FC-NVMe |
| **Marvell / QLogic** | QLE2770 | 32 GFC | PCIe 3.0 x8 | 1-2 | FC-NVMe, T10-PI |
| **Marvell / QLogic** | QLE2870 | 64 GFC | PCIe 4.0 x8 | 1-2 | FC-NVMe |
**FC SAN topology**:
```
Server ── HBA1 ── FC switch (fabric A) ── Storage Array (FC port 1)
Server ── HBA2 ── FC switch (fabric B) ── Storage Array (FC port 2)

ISL (Inter-Switch Link): joins switches inside one fabric;
fabrics A and B are kept separate for redundancy.
```
**Zoning** (FC):
```
Zone A: Server1_HBA1 + Storage_Port1 (production)
Zone B: Server1_HBA2 + Storage_Port2 (backup fabric)
Zone C: Backup_Server + Storage_Target (backup)
```
### iSCSI
| Property | iSCSI | Note |
|-----------|-------|----------|
| **Transport** | TCP/IP (port 3260) | Over standard Ethernet |
| **Speed** | 1/10/25/100 GbE | Same as Ethernet |
| **Initiator** | SW (OS) or HW (TOE) | SW initiator free, ~5-10 % CPU load |
| **Multipathing** | MPIO (Multipath I/O); MC/S (Multiple Connections per Session) as an alternative | Up to 8 paths, active/active or active/passive |
| **CHAP** | Authentication | Mutual CHAP recommended |
| **Jumbo frames** | Recommended MTU 9000 | Reduced CPU overhead, higher throughput |
| **Use case** | Small and medium SAN, backup, DR | Cheaper than FC, lower performance |
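The jumbo-frame recommendation can be checked with simple arithmetic. The sketch below assumes untagged Ethernet frames, IPv4, and TCP without options, and it ignores the iSCSI PDU header; the function names are illustrative.

```python
ETH_OVERHEAD = 14 + 4 + 8 + 12   # header + FCS + preamble/SFD + inter-frame gap, in bytes
IP_TCP_HEADERS = 20 + 20         # IPv4 + TCP without options, in bytes


def tcp_payload_efficiency(mtu: int) -> float:
    """Share of on-wire bytes that carry TCP payload when every frame is full-size."""
    return (mtu - IP_TCP_HEADERS) / (mtu + ETH_OVERHEAD)


def frames_per_second(link_gbps: float, mtu: int) -> float:
    """Full-size frames per second needed to saturate the link."""
    return link_gbps * 1e9 / 8 / (mtu + ETH_OVERHEAD)


if __name__ == "__main__":
    for mtu in (1500, 9000):
        print(
            f"MTU {mtu}: efficiency {tcp_payload_efficiency(mtu):.1%}, "
            f"{frames_per_second(25, mtu):,.0f} frames/s at 25 GbE"
        )
```

Under these assumptions the payload share rises from about 94.9 % to about 99.1 %, but the larger effect is the frame rate: roughly 5.9 times fewer frames per second at MTU 9000, which is where the CPU saving comes from.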
**iSCSI configuration**:
```
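# Software initiator (Linux, open-iscsi): discover targets on the portal
iscsiadm -m discovery -t sendtargets -p 10.0.0.100:3260
# Log in to the discovered target through that portal
iscsiadm -m node -T iqn.2024-05.storage:array01 -p 10.0.0.100:3260 --login
# Reconnect automatically after a reboot
iscsiadm -m node -T iqn.2024-05.storage:array01 -p 10.0.0.100:3260 --op update -n node.startup -v automatic
# Multipath (dm-multipath); mpathconf is the RHEL-family helper
mpathconf --enable --with_multipathd y
# /etc/multipath.conf: aliases, failback, rr_min_io
# Verify that all paths are visible
multipath -ll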
```
### NVMe-oF (NVMe over Fabrics)
| Transport | Protocol | Latency | CPU overhead | Use case |
|-----------|----------|---------|-------------|----------|
| **NVMe over FC** | FC-NVMe (FC Gen 6/7) | <10 µs | Low | Enterprise SAN, VMware |
| **NVMe over RDMA (RoCE v2)** | RDMA (RoCE) | <5 µs | Very low | AI/ML, HPC, K8s (CSI) |
| **NVMe over TCP** | TCP | ~50 µs | Moderate (10-20 % CPU) | Standard Ethernet, no RDMA |
| **NVMe over InfiniBand** | IB RC/UC | <3 µs | Lowest | HPC, AI training |
**NVMe-oF comparison**:
| Property | FC-NVMe | NVMe/RoCE | NVMe/TCP | NVMe/IB |
|-----------|---------|-----------|----------|---------|
| **Latency (target)** | ~8 µs | ~4 µs | ~50 µs | ~3 µs |
| **Bandwidth** | 64 Gbps | 100/200 GbE | 25/100 GbE | NDR 400 Gbps |
| **Requires special HW** | FC HBA + switch | RoCE NIC + DCB switch | Standard NIC | IB HCA + switch |
| **Ecosystem** | Broadcom, Marvell | NVIDIA, Broadcom | OS built-in | NVIDIA Mellanox |
| **Use case** | VMware, enterprise SAN | AI/ML, K8s, HPC | SMB, K8s, cost-effective | HPC, large AI |
### SAS (Serial Attached SCSI)
| Generation | Speed | Cabling | Reach | Use case |
|----------|----------|---------|-------|----------|
| **SAS 3** | 12 Gbps | SAS cable (SFF-8644) | 6-10 m | Legacy storage, DAS |
| **SAS 4** | 22.5 Gbps (marketed as 24G SAS) | SAS cable (SFF-8644) | 6-10 m | Current standard |
| **SAS 5** | 45 Gbps | SAS cable (SFF-8644) | 6-10 m | Emerging |
**SAS topology**: Server → SAS HBA → SAS expander → SAS disk (point-to-point, not shared like FC)
---
## Server connectivity — decision matrix
| Workload | Primary | Secondary | Management |
|----------|----------|-----------|------------|
| **Web / API** | 2× 25 GbE (LACP) | — | 1× 1 GbE BMC |
| **Database** | 2× 25/100 GbE (RSS) | 2× 32 Gb FC (SAN) | 1× 1 GbE BMC |
| **Virtualization** | 4× 25 GbE (SR-IOV) | 2× 32 Gb FC (VMFS) | 1× 1 GbE BMC |
| **Kubernetes** | 2× 25/100 GbE | — | 1× 1 GbE BMC |
| **Storage node** | 2× 100 GbE (RoCE) | 2× 25 GbE (management) | 1× 1 GbE BMC |
| **AI training** | 8× 400 GbE + IB NDR | 4× 100 GbE (storage) | 1× 1 GbE BMC |
| **AI inference** | 4× 100 GbE (RoCE) | 2× 25 GbE (management) | 1× 1 GbE BMC |
| **HPC** | InfiniBand NDR | 2× 100 GbE (storage) | 1× 1 GbE BMC |
---
## Server NIC placement (PCIe slot optimization)
```
2U Server (GPU/AI):
┌─────────────────────────────────────────────────┐
│ PCIe 0: GPU (x16) — NVLink / InfiniBand (x16) │
│ PCIe 1: GPU (x16) — NIC 100 GbE (x16) │
│ PCIe 2: GPU (x16) │
│ PCIe 3: GPU (x16) │
│ PCIe 4: GPU (x16) │
│ PCIe 5: GPU (x16) — NIC 100 GbE (x16) │
│ PCIe 6: Storage HBA / NIC (x8) │
│ PCIe 7: Management / OCP (x8) │
└─────────────────────────────────────────────────┘
1U Standard:
┌─────────────────────────────────┐
│ OCP: 2× 25 GbE (management) │
│ PCIe 0: NIC 25 GbE (x8) │
│ PCIe 1: Storage HBA / FC (x8) │
│ PCIe 2: GPU (x16, optional) │
│ PCIe 3: NVMe (x4, M.2) │
└─────────────────────────────────┘
```
### NVIDIA Mellanox ConnectX NICs
NVIDIA Mellanox is a leading manufacturer of NIC adapters for AI/HPC and cloud data centers.
| Model | PCIe | Max speed | Form factor | Key features |
|-------|------|-------------|-------------|------------------|
| **ConnectX-5** | PCIe 3.0 x16 | 100 GbE (dual) | HHHL | RoCE, NVMe-oF target offload, MPI offload |
| **ConnectX-6 Dx** | PCIe 4.0 x16 | 200 GbE (1-port) / 100 GbE (2-port) | HHHL, OCP 3.0 | ASAP² vSwitch offload, IPsec/TLS inline crypto, AES-XTS, 215 Mpps DPDK |
| **ConnectX-6 Lx** | PCIe 4.0 x8 | 25 GbE (dual) | HHHL, OCP 3.0 | RoCE, Secure Boot, low-power |
| **ConnectX-7** | PCIe 5.0 x16 | 400 GbE (1-port) / 200 GbE (2-port) | HHHL | NDR InfiniBand + 400GbE, GPUDirect, SHARP |
| **ConnectX-8** | PCIe 6.0 x16 | 800 GbE (1-port) / 400 GbE (2-port) | HHHL | XDR InfiniBand, sub-500ns latency, in-network computing, multi-host |
**Platforms**: Spectrum-X Ethernet (end-to-end AI networking), Quantum InfiniBand, BlueField DPU.
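The PCIe column matters because the slot must be able to feed the ports. A rough check, assuming only the 128b/130b line encoding used by PCIe 3.0 to 5.0 and ignoring TLP protocol overhead (PCIe 6.0 uses a different encoding and is left out); the function names are illustrative:

```python
# Per-lane raw rate in GT/s; generations 3.0 to 5.0 all use 128b/130b encoding.
GT_PER_LANE = {3: 8.0, 4: 16.0, 5: 32.0}
ENCODING = 128 / 130


def pcie_gbps(generation: int, lanes: int) -> float:
    """One-direction slot bandwidth in Gb/s, before protocol (TLP) overhead."""
    return GT_PER_LANE[generation] * ENCODING * lanes


def slot_fits(generation: int, lanes: int, ports: int, port_gbps: float) -> bool:
    """True if the slot can feed all ports at line rate at the same time."""
    return pcie_gbps(generation, lanes) >= ports * port_gbps


if __name__ == "__main__":
    print(f"PCIe 3.0 x16: {pcie_gbps(3, 16):.0f} Gb/s")
    print("dual 100 GbE on PCIe 3.0 x16:", slot_fits(3, 16, 2, 100))
    print("single 400 GbE on PCIe 5.0 x16:", slot_fits(5, 16, 1, 400))
```

By this estimate a PCIe 3.0 x16 slot offers about 126 Gb/s, so a dual-port 100 GbE card in such a slot cannot run both ports at line rate simultaneously, while PCIe 5.0 x16 (about 504 Gb/s) covers a single 400 GbE port.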
### Broadcom Emulex FC HBA
| Model | Speed | PCIe | Ports | Features |
|-------|----------|------|-------|----------|
| **LPe35000** (Gen 7) | 32 GFC | PCIe 3.0 x8 | 1-2 | NVMe-FC, T10-PI (DIF), SR-IOV, Silicon Root of Trust |
| **LPe35002** (Gen 7) | 32 GFC | PCIe 3.0 x8 | 2 | NVMe-FC, Secure Boot, digitally signed firmware |
| **LPe36000** (Gen 7) | 64 GFC | PCIe 4.0 x16 | 1-2 | First 64GFC HBA on the market, 10M IOPS, 3× better latency than Gen 6 |
**Key features**: NVMe over FC support, T10 DIF (Data Integrity Field), a vendor-stated MTBF of 10 million hours, NIST SP 800-193 compliance. Gen 7 delivers up to 10M IOPS and 3× lower latency compared to Gen 6.
### NVMe-oF specification
NVMe over Fabrics (NVMe-oF) extends the NVMe protocol from local PCIe to network transports. First specification 1.0 released in June 2016, currently part of NVMe 2.3 (August 2025). Supported transports:
| Transport | Specification | Use case |
|-----------|------------|----------|
| **NVMe over PCIe** | NVMe Base | Local NVMe SSD |
| **NVMe over RDMA** (RoCE, InfiniBand, iWARP) | NVMe Transport | AI/ML, HPC, lowest latency <5 µs |
| **NVMe over TCP** | NVMe Transport | Standard Ethernet, no RDMA, latency ~50 µs |
| **NVMe over FC** (FC-NVMe) | INCITS T11 | Enterprise SAN, FC fabric |
The NVMe 2.x specification family also covers the Zoned Namespaces (ZNS) command set (introduced with NVMe 2.0) and the Computational Programs and Subsystem Local Memory (SLM) command sets. NVMe-MI defines the management interface.
### Dell PowerEdge R760 — NIC placement
Dell R760 server supports:
- **OCP 3.0** adapters (up to 2×) — 1/10/25/100 GbE
- **PCIe Gen5** slots — 8× slots (6× FHHL + 2× LP)
- **LOM** — 2× 1 GbE Broadcom 5720 on motherboard
- Maximum NIC speed: 100 GbE (QSFP56)
- Supported types: RJ45, SFP+, SFP28, QSFP28, QSFP56
Recommended configurations:
- Standard: OCP 3.0 2× 25 GbE + PCIe storage HBA
- AI/ML: PCIe 100 GbE (riser config 1, slot 1-2) + GPU in other slots
### HPE Gen11 NIC options
HPE ProLiant Gen11 (DL360/DL380) supports:
- **OCP 3.0** slots (up to 2) — 10/25/100/200 GbE (Broadcom, Intel, NVIDIA Mellanox)
- **PCIe Gen5** adapters — 8× slots (DL380) / 3× slots (DL360)
- **iLO 6** dedicated management port (1 GbE)
- Supported NICs: Broadcom BCM57412 (10GbE), BCM57504 (25GbE), NVIDIA ConnectX-6 Dx (100GbE)
## Sources
Links, books, and standards: [sources/infrastructure/sources.en.md](sources/infrastructure/sources.en.md)
### Recommended literature
| Book | Authors | ISBN | Description |
|-------|--------|------|-------|
| AI Data Center Network Design and Technologies (1st ed., 2026) | Subramaniam, Styszynski, Tambakuwala | 978-0-13-543628-8 | First vendor-agnostic guide to network design for AI training and inference. Covers high-radix fabric, lossless Ethernet/IP, UEC technologies, cooling and power for AI clusters. Authors from HPE Juniper Networking. |
*Last revision: 2026-06-03*

CONNECTIVITY.md Normal file

@@ -0,0 +1,270 @@
# 🔌 Server connectivity: network and storage connectivity
## Ethernet: network connectivity
### Speeds and formats
| Speed | Designation | Form factor | Cabling | Standard year | Use case |
|----------|----------|-------------|---------|---------------|----------|
| **1 GbE** | 1000BASE-T | RJ45 (copper) | Cat5e/Cat6 | 1999 | Management, legacy |
| **10 GbE** | 10GBASE-T / SFP+ | RJ45 / SFP+ | Cat6 (up to 55 m) / Cat6A or Cat7 (100 m) / DAC / SR/LR | 2006 | Common server, storage |
| **25 GbE** | 25GBASE-R | SFP28 | Cat8 (30m) / DAC (5m) / SR/LR (100m/10km) | 2016 | Standard for servers (2020+) |
| **40 GbE** | 40GBASE-R | QSFP+ | DAC (7m) / SR (150m) / LR (10km) | 2010 | Legacy, spine |
| **50 GbE** | 50GBASE-R | SFP56 | DAC / SR / LR | 2018 | Emerging server |
| **100 GbE** | 100GBASE-R | QSFP28 | DAC (3m) / SR4 (100m) / LR4 (10km) / PSM4 (500m) | 2010 | Spine, storage, AI |
| **200 GbE** | 200GBASE-R | QSFP56 | DAC / SR4 / DR4 | 2017 | AI/ML, HPC |
| **400 GbE** | 400GBASE-R | QSFP-DD / OSFP | DAC (2.5m) / SR8 (100m) / DR4 (500m) / FR4 (2km) | 2017 | AI training, hyperscale |
| **800 GbE** | 800GBASE-R | QSFP-DD800 / OSFP | DAC (2m) / SR8 (100m) / DR8 (500m) | 2024 | Next-gen AI/ML |
**Recommendations for servers (2026)**:
- **Standard**: 2× 25 GbE (management + data) or 2× 100 GbE for demanding workloads
- **AI/ML training**: 8× 400 GbE (InfiniBand preferred for GPU communication)
- **Storage**: 2× 25/100 GbE (iSCSI/NFS) or dedicated FC (16/32 Gbps)
### NIC form factor
| Form factor | PCIe lanes | Speed | Use case |
|------------|-----------|----------|----------|
| **OCP 3.0** | x8/x16 | 25/100/200 GbE | Modern servers (Dell, HPE), small form factor |
| **PCIe HHHL** | x8 | 25/50 GbE | Standard 1U/2U servers |
| **PCIe FHHL** | x16 | 100/200/400 GbE | GPU servers, high-density |
| **Mezzanine** | x8 | 10/25 GbE | Blade servers (HPE Synergy, Dell MX) |
| **LOM (LAN on Motherboard)** | — | 1/10/25 GbE | Integrated, basic connectivity |
### NIC features
| Feature | Description | Benefit |
|---------|-------|---------|
| **TSO/GRO** | TCP Segmentation Offload / Generic Receive Offload | Reduced CPU load for TCP |
| **LRO/LSO** | Large Receive Offload / Large Send Offload | LRO merges received TCP segments in the NIC (GRO is the kernel's software counterpart); LSO is another name for TSO |
| **RSS** | Receive Side Scaling | Distribution of incoming packets across multiple CPU cores |
| **RPS/RFS** | Receive Packet Steering / Receive Flow Steering | Software counterpart of RSS, with cache affinity |
| **XDP** | eXpress Data Path | BPF-based packet processing (DDoS, load balancer) |
| **RDMA (RoCE v2)** | RDMA over Converged Ethernet | GPU direct communication, storage (NVMe-oF) |
| **iWARP** | RDMA over TCP | RDMA without special switch (higher latency) |
| **DPDK** | Data Plane Development Kit | Kernel-bypass packet processing in user space (VNF, vSwitch) |
| **VXLAN/NVGRE offload** | HW offload for tunneling | Overlay networking (VMware NSX, OpenStack) |
| **SR-IOV** | Single Root I/O Virtualization | Direct NIC access for VMs (VF), low latency |
| **Flow Bifurcation** | Split NIC traffic between kernel and DPDK | Concurrent management and high-speed data path |
| **PTP (IEEE 1588)** | Precision Time Protocol | Financial services, 5G, telco |
### NIC selection per workload
| Workload | Recommended NIC | Rationale |
|----------|---------------|------------|
| **Web / API servers** | 2× 25 GbE SFP28, OCP | Low cost, sufficient bandwidth |
| **Virtualization (VMware)** | 2× 25 GbE (SR-IOV, VXLAN offload) | SR-IOV for VMs, VXLAN for NSX |
| **Database (OLTP)** | 2× 25/100 GbE (RSS, low latency) | Low latency, RSS for CPU scaling |
| **Storage (NFS/iSCSI)** | 2× 25/100 GbE (RoCE v2) | RDMA for NVMe-oF, low latency |
| **Storage (FC SAN)** | 2× 32 Gb FC HBA | SAN for VMware VMFS, block storage |
| **AI/ML training** | 8× 400 GbE + InfiniBand NDR | GPU communication, data ingestion |
| **AI/ML inference** | 4× 100 GbE (RoCE v2) | Model serving, GPU direct |
| **HPC** | InfiniBand NDR 400 Gbps | MPI communication, low latency |
| **Telco / Edge** | 2× 25 GbE (DPDK, PTP) | VNF, 5G UPF, low latency |
---
## Storage connectivity
### Fibre Channel (FC) SAN
| Generation | Speed | Designation | Form factor | Reach (SMF) | Use case |
|----------|----------|----------|-------------|-------------|----------|
| **Gen 5** | 16 Gbps | 16GFC | SFP+ | 10 km | Legacy SAN |
| **Gen 6** | 32 Gbps | 32GFC | SFP28 | 10 km | Current standard |
| **Gen 7** | 64 Gbps | 64GFC | SFP56 | 10 km | Emerging, high-performance |
| **Gen 8** | 128 Gbps | 128GFC | QSFP28 | 10 km | Emerging (first production deployments) |
**HBA (Host Bus Adapter)**:
| Manufacturer | Model | Speed | PCIe | Ports | Features |
|---------|-------|----------|------|-------|----------|
| **Broadcom / Emulex** | LPe35000 | 32 GFC | PCIe 3.0 x8 | 1-2 | FC-NVMe, T10-PI, SR-IOV |
| **Broadcom / Emulex** | LPe36000 | 64 GFC | PCIe 4.0 x16 | 1-2 | FC-NVMe |
| **Marvell / QLogic** | QLE2770 | 32 GFC | PCIe 3.0 x8 | 1-2 | FC-NVMe, T10-PI |
| **Marvell / QLogic** | QLE2870 | 64 GFC | PCIe 4.0 x8 | 1-2 | FC-NVMe |
**FC SAN topology**:
```
Server ── HBA1 ── FC switch (fabric A) ── Storage Array (FC port 1)
Server ── HBA2 ── FC switch (fabric B) ── Storage Array (FC port 2)

ISL (Inter-Switch Link): joins switches inside one fabric;
fabrics A and B are kept separate for redundancy.
```
**Zoning** (FC):
```
Zone A: Server1_HBA1 + Storage_Port1 (production)
Zone B: Server1_HBA2 + Storage_Port2 (backup fabric)
Zone C: Backup_Server + Storage_Target (backup)
```
### iSCSI
| Property | iSCSI | Note |
|-----------|-------|----------|
| **Transport** | TCP/IP (port 3260) | Over standard Ethernet |
| **Speed** | 1/10/25/100 GbE | Same as Ethernet |
| **Initiator** | SW (OS) or HW (TOE) | SW initiator free, ~5-10 % CPU load |
| **Multipathing** | MPIO (Multipath I/O); MC/S (Multiple Connections per Session) as an alternative | Up to 8 paths, active/active or active/passive |
| **CHAP** | Authentication | Mutual CHAP recommended |
| **Jumbo frames** | Recommended MTU 9000 | Reduced CPU overhead, higher throughput |
| **Use case** | Small and medium SAN, backup, DR | Cheaper than FC, lower performance |
**iSCSI configuration**:
```
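# Software initiator (Linux, open-iscsi): discover targets on the portal
iscsiadm -m discovery -t sendtargets -p 10.0.0.100:3260
# Log in to the discovered target through that portal
iscsiadm -m node -T iqn.2024-05.storage:array01 -p 10.0.0.100:3260 --login
# Reconnect automatically after a reboot
iscsiadm -m node -T iqn.2024-05.storage:array01 -p 10.0.0.100:3260 --op update -n node.startup -v automatic
# Multipath (dm-multipath); mpathconf is the RHEL-family helper
mpathconf --enable --with_multipathd y
# /etc/multipath.conf: aliases, failback, rr_min_io
# Verify that all paths are visible
multipath -ll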
```
### NVMe-oF (NVMe over Fabrics)
| Transport | Protocol | Latency | CPU overhead | Use case |
|-----------|----------|---------|-------------|----------|
| **NVMe over FC** | FC-NVMe (FC Gen 6/7) | <10 µs | Low | Enterprise SAN, VMware |
| **NVMe over RDMA (RoCE v2)** | RDMA (RoCE) | <5 µs | Very low | AI/ML, HPC, K8s (CSI) |
| **NVMe over TCP** | TCP | ~50 µs | Moderate (10-20 % CPU) | Standard Ethernet, no RDMA |
| **NVMe over InfiniBand** | IB RC/UC | <3 µs | Lowest | HPC, AI training |
**NVMe-oF comparison**:
| Property | FC-NVMe | NVMe/RoCE | NVMe/TCP | NVMe/IB |
|-----------|---------|-----------|----------|---------|
| **Latency (target)** | ~8 µs | ~4 µs | ~50 µs | ~3 µs |
| **Bandwidth** | 64 Gbps | 100/200 GbE | 25/100 GbE | NDR 400 Gbps |
| **Requires special HW** | FC HBA + switch | RoCE NIC + DCB switch | Standard NIC | IB HCA + switch |
| **Ecosystem** | Broadcom, Marvell | NVIDIA, Broadcom | OS built-in | NVIDIA Mellanox |
| **Use case** | VMware, enterprise SAN | AI/ML, K8s, HPC | SMB, K8s, cost-effective | HPC, large AI |
### SAS (Serial Attached SCSI)
| Generation | Speed | Cabling | Reach | Use case |
|----------|----------|---------|-------|----------|
| **SAS 3** | 12 Gbps | SAS cable (SFF-8644) | 6-10 m | Legacy storage, DAS |
| **SAS 4** | 22.5 Gbps (marketed as 24G SAS) | SAS cable (SFF-8644) | 6-10 m | Current standard |
| **SAS 5** | 45 Gbps | SAS cable (SFF-8644) | 6-10 m | Emerging |
**SAS topology**: Server → SAS HBA → SAS expander → SAS disk (point-to-point, not shared like FC)
---
## Server connectivity — decision matrix
| Workload | Primary | Secondary | Management |
|----------|----------|-----------|------------|
| **Web / API** | 2× 25 GbE (LACP) | — | 1× 1 GbE BMC |
| **Database** | 2× 25/100 GbE (RSS) | 2× 32 Gb FC (SAN) | 1× 1 GbE BMC |
| **Virtualization** | 4× 25 GbE (SR-IOV) | 2× 32 Gb FC (VMFS) | 1× 1 GbE BMC |
| **Kubernetes** | 2× 25/100 GbE | — | 1× 1 GbE BMC |
| **Storage node** | 2× 100 GbE (RoCE) | 2× 25 GbE (management) | 1× 1 GbE BMC |
| **AI training** | 8× 400 GbE + IB NDR | 4× 100 GbE (storage) | 1× 1 GbE BMC |
| **AI inference** | 4× 100 GbE (RoCE) | 2× 25 GbE (management) | 1× 1 GbE BMC |
| **HPC** | InfiniBand NDR | 2× 100 GbE (storage) | 1× 1 GbE BMC |
---
## Server NIC placement (PCIe slot optimization)
```
2U Server (GPU/AI):
┌─────────────────────────────────────────────────┐
│ PCIe 0: GPU (x16) — NVLink / InfiniBand (x16) │
│ PCIe 1: GPU (x16) — NIC 100 GbE (x16) │
│ PCIe 2: GPU (x16) │
│ PCIe 3: GPU (x16) │
│ PCIe 4: GPU (x16) │
│ PCIe 5: GPU (x16) — NIC 100 GbE (x16) │
│ PCIe 6: Storage HBA / NIC (x8) │
│ PCIe 7: Management / OCP (x8) │
└─────────────────────────────────────────────────┘
1U Standard:
┌─────────────────────────────────┐
│ OCP: 2× 25 GbE (management) │
│ PCIe 0: NIC 25 GbE (x8) │
│ PCIe 1: Storage HBA / FC (x8) │
│ PCIe 2: GPU (x16, optional) │
│ PCIe 3: NVMe (x4, M.2) │
└─────────────────────────────────┘
```
### NVIDIA Mellanox ConnectX NICs
NVIDIA Mellanox is a leading manufacturer of NIC adapters for AI/HPC and cloud data centers.
| Model | PCIe | Max speed | Form factor | Key features |
|-------|------|-------------|-------------|------------------|
| **ConnectX-5** | PCIe 3.0 x16 | 100 GbE (dual) | HHHL | RoCE, NVMe-oF target offload, MPI offload |
| **ConnectX-6 Dx** | PCIe 4.0 x16 | 200 GbE (1-port) / 100 GbE (2-port) | HHHL, OCP 3.0 | ASAP² vSwitch offload, IPsec/TLS inline crypto, AES-XTS, 215 Mpps DPDK |
| **ConnectX-6 Lx** | PCIe 4.0 x8 | 25 GbE (dual) | HHHL, OCP 3.0 | RoCE, Secure Boot, low-power |
| **ConnectX-7** | PCIe 5.0 x16 | 400 GbE (1-port) / 200 GbE (2-port) | HHHL | NDR InfiniBand + 400GbE, GPUDirect, SHARP |
| **ConnectX-8** | PCIe 6.0 x16 | 800 GbE (1-port) / 400 GbE (2-port) | HHHL | XDR InfiniBand, sub-500 ns latency, in-network computing, multi-host |
**Platforms**: Spectrum-X Ethernet (end-to-end AI networking), Quantum InfiniBand, BlueField DPU.
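
A quick way to sanity-check the table above is to compare each adapter's aggregate port speed with what its PCIe link can carry. The sketch below is an approximation under stated assumptions: it uses the per-lane signalling rates of PCIe 3.0 to 5.0 (8, 16 and 32 GT/s) with 128b/130b encoding and ignores TLP and other protocol overhead, so real throughput is somewhat lower. The names `pcie_gbps` and `slot_can_feed` are illustrative and do not come from any vendor tool.

```python
# Per-lane signalling rate in GT/s for PCIe 3.0, 4.0 and 5.0.
PCIE_GT_PER_LANE = {3: 8.0, 4: 16.0, 5: 32.0}
# These three generations all use 128b/130b line encoding.
ENCODING_EFFICIENCY = 128 / 130


def pcie_gbps(gen: int, lanes: int) -> float:
    """Approximate one-direction link bandwidth in Gbit/s (no TLP overhead)."""
    return PCIE_GT_PER_LANE[gen] * lanes * ENCODING_EFFICIENCY


def slot_can_feed(gen: int, lanes: int, ports: int, port_gbps: float) -> bool:
    """True if the link can carry all ports at line rate simultaneously."""
    return pcie_gbps(gen, lanes) >= ports * port_gbps


if __name__ == "__main__":
    # (label, PCIe generation, lanes, port count, Gbit/s per port)
    cards = [
        ("ConnectX-5, 2x 100 GbE", 3, 16, 2, 100),
        ("ConnectX-6 Dx, 1x 200 GbE", 4, 16, 1, 200),
        ("ConnectX-6 Lx, 2x 25 GbE", 4, 8, 2, 25),
        ("ConnectX-7, 1x 400 GbE", 5, 16, 1, 400),
    ]
    for label, gen, lanes, ports, speed in cards:
        link = pcie_gbps(gen, lanes)
        verdict = "ok" if slot_can_feed(gen, lanes, ports, speed) else "slot-limited"
        print(f"{label}: link ~{link:.0f} Gbit/s, ports {ports * speed} Gbit/s, {verdict}")
```

Under these assumptions a dual-port 100 GbE card in a PCIe 3.0 x16 slot is limited by the slot (roughly 126 Gbit/s) rather than by its ports, which is worth keeping in mind when assigning NICs to slots.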
### Broadcom Emulex FC HBA
| Model | Speed | PCIe | Ports | Features |
|-------|----------|------|-------|----------|
| **LPe35000** (Gen 7) | 32 GFC | PCIe 3.0 x8 | 1-2 | NVMe-FC, T10-PI (DIF), SR-IOV, Silicon Root of Trust |
| **LPe35002** (Gen 7) | 32 GFC | PCIe 3.0 x8 | 2 | NVMe-FC, Secure Boot, digitally signed firmware |
| **LPe36000** (Gen 7) | 64 GFC | PCIe 4.0 x16 | 1-2 | First 64GFC HBA on the market, 10M IOPS, 3× lower latency than Gen 6 |
**Key features**: NVMe over FC support, T10 DIF (Data Integrity Field), 10M MTBF, NIST SP 800-193 compliant. Gen 7 delivers up to 10M IOPS and 3× lower latency than Gen 6.
### NVMe-oF specification
NVMe over Fabrics (NVMe-oF) extends the NVMe protocol from local PCIe to network transports. The first specification, 1.0, was released in June 2016; it is now part of NVMe 2.3 (August 2025). Supported transports:
| Transport | Specification | Use case |
|-----------|------------|----------|
| **NVMe over PCIe** | NVMe Base | Local NVMe SSD |
| **NVMe over RDMA** (RoCE, InfiniBand, iWARP) | NVMe Transport | AI/ML, HPC, lowest latency <5 µs |
| **NVMe over TCP** | NVMe Transport | Standard Ethernet, no RDMA, latency ~50 µs |
| **NVMe over FC** (FC-NVMe) | INCITS T11 | Enterprise SAN, FC fabric |
NVMe 2.3 adds the Computational Programs Command Set, Storage Level Management (SLM), and Zoned Namespaces (ZNS). NVMe-MI defines the management interface.
### Dell PowerEdge R760 — NIC placement
The Dell R760 server supports:
- **OCP 3.0** adapters (up to 2) — 1/10/25/100 GbE
- **PCIe Gen5** slots — 8 slots (6× FHHL + 2× LP)
- **LOM** — 2× 1 GbE Broadcom 5720 on the motherboard
- Maximum NIC speed: 100 GbE (QSFP56)
- Supported connector types: RJ45, SFP+, SFP28, QSFP28, QSFP56
Recommended configurations:
- Standard: OCP 3.0 2× 25 GbE + PCIe storage HBA
- AI/ML: PCIe 100 GbE (riser config 1, slots 1-2) + GPUs in the remaining slots
### HPE Gen11 NIC options
HPE ProLiant Gen11 (DL360/DL380) supports:
- **OCP 3.0** slots (up to 2) — 10/25/100/200 GbE (Broadcom, Intel, NVIDIA Mellanox)
- **PCIe Gen5** adapters — 8 slots (DL380) / 3 slots (DL360)
- **iLO 6** dedicated management port (1 GbE)
- Supported NICs: Broadcom BCM57412 (10GbE), BCM57504 (25GbE), NVIDIA ConnectX-6 Dx (100GbE)
## Resources
Links, books and standards: [sources/infrastructure/sources.md](sources/infrastructure/sources.md)
### Recommended Reading
| Book | Authors | ISBN | Description |
|-------|--------|------|-------|
| AI Data Center Network Design and Technologies (1st ed., 2026) | Subramaniam, Styszynski, Tambakuwala | 978-0-13-543628-8 | First vendor-agnostic guide to network design for AI training and inference. Covers high-radix fabrics, lossless Ethernet/IP, UEC technologies, cooling and power for AI clusters. Authors from HPE Juniper Networking. |
*Last revision: 2026-06-03*

101
DATABASE-ENGINES.en.md Normal file

@@ -0,0 +1,101 @@
# ⚙️ Storage Engines and Transaction Models
## B-Tree vs LSM-Tree
Two dominant storage engine approaches in modern databases.
| Property | B-Tree | LSM-Tree |
|-----------|--------|----------|
| **Write** | In-place update (random I/O on page) | Append-only (sequential I/O) |
| **Read** | Fast (directly in page, O(log N)) | Slower (merge from multiple SSTables, bloom filters) |
| **Write amplification** | Lower (page rewrite) | Higher (compaction, SSTable merge) |
| **Read amplification** | Lower (1 page read) | Higher (multiple SSTables to search) |
| **Compression** | Worse (page fragmentation) | Better (compact SSTable, block compression) |
| **Range scan** | Fast (linked list at leaf level) | Fast (SSTables are sorted) |
| **Space amplification** | Low | Higher (awaits compaction) |
| **Typical DBs** | PostgreSQL, MySQL (InnoDB), SQLite, Oracle, MongoDB (WiredTiger) | Cassandra, RocksDB, LevelDB, ScyllaDB |
### When to Choose Which Engine
**B-Tree** — when:
- You need fast point lookups (PK lookup, unique ID)
- Workload is read-heavy (most queries = SELECT by key)
- You need range queries on primary key
- Transactional workload (OLTP) with short queries
**LSM-Tree** — when:
- You need high write throughput (write-heavy)
- Append-only workload (logs, time-series, IoT)
- Data compression is important (saves space)
- Write amplification is not a concern (sufficient I/O capacity)
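
To make the trade-offs above concrete, here is a deliberately tiny LSM-style store: writes go to an in-memory memtable, full memtables are flushed as immutable sorted runs (standing in for SSTables), reads check the newest data first, and compaction merges runs. It is a teaching sketch, not how any particular engine is implemented; `TinyLSM` and `memtable_limit` are hypothetical names, and real engines add a WAL, bloom filters, tombstones and leveled or tiered compaction.

```python
import bisect


class TinyLSM:
    """Toy LSM tree: in-memory memtable plus a list of sorted, immutable runs."""

    def __init__(self, memtable_limit: int = 4):
        self.memtable = {}          # newest writes, unsorted
        self.sstables = []          # sorted lists of (key, value); newest run last
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        # Writes never touch existing runs, so they are cheap and sequential.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        # Freeze the memtable as a new sorted run.
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}

    def get(self, key):
        # Read path: memtable first, then runs from newest to oldest.
        # Searching several runs is the read amplification from the table above.
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            i = bisect.bisect_left(table, (key,))
            if i < len(table) and table[i][0] == key:
                return table[i][1]
        return None

    def compact(self):
        # Merge all runs into one; newer values overwrite older ones.
        # Rewriting data here is the write amplification cost of an LSM tree.
        merged = {}
        for table in self.sstables:     # oldest first
            merged.update(table)
        self.sstables = [sorted(merged.items())] if merged else []
```

Overwritten values linger in older runs until `compact()` is called, which is the space amplification mentioned in the comparison table.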
## Write-Ahead Log (WAL)
An append-only log that guarantees no committed operation is lost on a crash:
```text
1. Transaction BEGIN → WAL entry
2. Data modification → WAL entry (before page modification)
3. Transaction COMMIT → flush WAL to disk (COMMIT confirmed only after flush)
4. Checkpoint → flush dirty pages → WAL up to checkpoint point can be deleted
```
- **Write-ahead** — WAL is written before the data page
- **Checkpoint** — point up to which dirty pages are already on disk; only WAL written after it is needed for recovery
- **Redo log** (InnoDB) — similar concept, used to replay missing changes
- **Group commit** — multiple transactions flush WAL at once (higher throughput)
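
The write-ahead rule can be illustrated with a small key-value store that appends every change to a log file and calls `fsync` before acknowledging it, then rebuilds its state by replaying the log on startup. This is a minimal sketch under simplifying assumptions: the class `TinyWAL` and its JSON-lines record format are invented for illustration, there are no transactions, checkpoints or log truncation, and each write is flushed individually rather than group-committed.

```python
import json
import os
import tempfile


class TinyWAL:
    """Toy key-value store whose state is recovered by replaying a log file."""

    def __init__(self, path: str):
        self.path = path
        self.data = {}
        self._replay()                      # crash recovery happens on open
        self.log = open(path, "a", encoding="utf-8")

    def _replay(self):
        if not os.path.exists(self.path):
            return
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                try:
                    record = json.loads(line)
                except json.JSONDecodeError:
                    break                   # torn record at the tail: stop replay
                self.data[record["key"]] = record["value"]

    def set(self, key, value):
        # 1. Append the change to the log and force it to stable storage.
        self.log.write(json.dumps({"key": key, "value": value}) + "\n")
        self.log.flush()
        os.fsync(self.log.fileno())
        # 2. Only then apply it to the in-memory state ("the data page").
        self.data[key] = value

    def close(self):
        self.log.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        wal_path = os.path.join(tmp, "wal.log")
        store = TinyWAL(wal_path)
        store.set("balance", 100)
        store.close()                       # simulate a restart
        print(TinyWAL(wal_path).data)
```

Calling `fsync` per write is what makes the acknowledgement durable and also what limits throughput; group commit amortizes that cost by flushing several transactions with one `fsync`.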
## MVCC (Multi-Version Concurrency Control)
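Each transaction reads from a consistent snapshot instead of blocking on writers: at Repeatable Read and stricter levels the snapshot is taken once per transaction, while Read Committed typically takes a new one per statement. Old row versions are kept, in the table itself or in an undo area depending on the engine, until no active snapshot needs them.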
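
The snapshot idea can be sketched with a visibility check loosely modeled on PostgreSQL's `xmin`/`xmax` columns. This is a simplified illustration, not PostgreSQL's actual algorithm: it assumes every transaction that is not in progress has committed (no aborts), ignores a transaction's own writes, and the names `RowVersion`, `Snapshot` and `visible` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RowVersion:
    value: str
    xmin: int                       # id of the transaction that created this version
    xmax: Optional[int] = None      # id of the transaction that deleted/replaced it


@dataclass(frozen=True)
class Snapshot:
    txid_limit: int                 # transactions with id >= limit started later
    in_progress: frozenset          # transactions still running at snapshot time

    def committed_before(self, txid: int) -> bool:
        # Assumption: anything older than the limit and not running has committed.
        return txid < self.txid_limit and txid not in self.in_progress


def visible(version: RowVersion, snap: Snapshot) -> bool:
    """A version is visible if its creator committed before the snapshot
    and its deleter (if any) did not."""
    created = snap.committed_before(version.xmin)
    deleted = version.xmax is not None and snap.committed_before(version.xmax)
    return created and not deleted


if __name__ == "__main__":
    # Transaction 20 updated the row: it closed v1 and created v2.
    versions = [RowVersion("v1", xmin=10, xmax=20), RowVersion("v2", xmin=20)]
    during_update = Snapshot(txid_limit=21, in_progress=frozenset({20}))
    after_update = Snapshot(txid_limit=25, in_progress=frozenset())
    print([v.value for v in versions if visible(v, during_update)])
    print([v.value for v in versions if visible(v, after_update)])
```

A reader whose snapshot predates the update keeps seeing the old version without blocking the writer, which is why old versions must be retained until vacuum or purge can prove no snapshot needs them.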
### Implementations
| DB | Mechanism | Vacuum/GC | Isolation Levels |
|----|------------|-----------|-----------------|
| **PostgreSQL** | Heap tuple (xmin/xmax) — old versions in main table | VACUUM (autovacuum) | RU, RC, RR, Serializable (SSI) |
| **MySQL InnoDB** | Undo log — old versions in undo segments | Purge (automatic) | RU, RC, RR, Serializable |
| **MSSQL** | Tempdb version store | Automatic (row versioning) | RC (snapshot), Serializable |
| **Oracle** | Undo tablespace | Automatic (undo retention) | RC, Serializable, Read-only |
| **MongoDB WiredTiger** | MVCC at document level | Automatic (eviction) | Snapshot isolation |
| **Cassandra** | No MVCC (value overwrite) | Compaction (merge SSTable) | — |
### Anomalies
| Level | Dirty Read | Non-repeatable Read | Phantom Read | Serialization Anomaly |
|--------|-----------|---------------------|-------------|----------------------|
| **Read Uncommitted** | Yes | Yes | Yes | Yes |
| **Read Committed** | No | Yes | Yes | Yes |
| **Repeatable Read** | No | No | Yes per SQL standard (PostgreSQL: no; MySQL InnoDB: no, via next-key locking) | Yes |
| **Serializable** | No | No | No | No |
- **Dirty Read** — reading data from an uncommitted transaction
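- **Non-repeatable Read** — the same query, repeated within one transaction, returns different data for a row because another transaction updated it in the meantime
- **Phantom Read** — the same query, repeated within one transaction, returns additional rows because another transaction inserted rows matching the condition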
- **Serialization Anomaly** — result of transactions is not equivalent to any serial order
## Index Types
| Type | Algorithm | Use Case | DB Support |
|-----|-----------|----------|------------|
| **B-tree** | Balanced tree | `=`, `<`, `>`, `BETWEEN`, `IN`, `LIKE (prefix)` | All (default) |
| **Hash** | Hash table | Only `=` (equality) | PostgreSQL (hash index), MySQL (MEMORY) |
| **GiST** | Generalized Search Tree | Geometry, full-text, intervals, IP ranges | PostgreSQL |
| **GIN** | Generalized Inverted Index | JSONB, arrays, full-text (contains, overlaps) | PostgreSQL |
| **BRIN** | Block Range Index | Time-series, logs (data in order) — extremely small | PostgreSQL |
| **SP-GiST** | Space-partitioned | Quadrants, KD-tree, radix tree | PostgreSQL |
| **R-tree** | Spatial tree | Geospatial data | MySQL (MyISAM/InnoDB), SQLite |
| **Clustered index** | B-tree + data in leaves | PK lookup (InnoDB) — data stored with index | MySQL InnoDB, MSSQL |
| **Full-text** | Inverted index | Text search (stemming, relevance) | MySQL, PostgreSQL, MSSQL |
## Resources
Links, books and standards: [sources/databases/sources.en.md](sources/databases/sources.en.md)
### Recommended Reading
| Book | Authors | ISBN | Description |
|-------|--------|------|-------|
| Database Internals | Alex Petrov | 978-1492040346 | In-depth explanation of storage engines (B-Tree, LSM-Tree, WAL, MVCC), distributed systems (partitioning, replication, consensus) |
*Last revision: 2026-06-03*

325
DATABASES.en.md Normal file

@@ -0,0 +1,325 @@
# 🗄️ Database Architecture
## Database Classification
### Relational (SQL)
| DB | License | Use Case | Details |
|----|---------|----------|--------|
| **PostgreSQL** | Open source | Universal, geospatial, analytics, AI | [POSTGRESQL.en.md](POSTGRESQL.en.md) |
| **MySQL / MariaDB** | Open source | Web, LAMP stack, e-commerce | [MYSQL.en.md](MYSQL.en.md) |
| **Microsoft SQL Server** | Proprietary | Enterprise .NET, Windows ecosystem | — |
| **Oracle DB** | Proprietary | Enterprise, finance, mainframe, RAC cluster | [ORACLE.en.md](ORACLE.en.md) |
| **Amazon Aurora** | Managed | MySQL/PostgreSQL compatible, cloud-native | — |
### NoSQL
| Type | DB | Use Case | Details |
|-----|----|----------|--------|
| **Document** | MongoDB, Couchbase | JSON data, flexible schema | [MONGODB.en.md](MONGODB.en.md) |
| **Key-Value / Cache** | Redis, Memcached, DynamoDB | Cache, session store, real-time | [REDIS.en.md](REDIS.en.md) |
| **Wide-column** | Cassandra, ScyllaDB | Time-series, IoT, big data | [CASSANDRA.en.md](CASSANDRA.en.md) |
| **Vector** | Pinecone, Qdrant, Milvus, pgvector | Embeddings, RAG, semantic search | [VECTOR-DBS.en.md](VECTOR-DBS.en.md) |
| **Graph** | Neo4j, Dgraph | Relationships, recommendations, social graphs | — |
### Storage Engines
Common concepts across databases: [DATABASE-ENGINES.en.md](DATABASE-ENGINES.en.md)
---
## Transaction Isolation Levels
| Level | Dirty Read | Non-repeatable Read | Phantom Read | Serialization Anomaly |
|--------|-----------|---------------------|-------------|----------------------|
| **Read Uncommitted** | Yes (possible) | Yes | Yes | Yes |
| **Read Committed** | No (prevented) | Yes | Yes | Yes |
| **Repeatable Read** | No | No | Yes per SQL standard (PostgreSQL: No) | Yes |
| **Serializable** | No | No | No | No |
**Anomalies**:
- **Dirty Read** — reading data from an uncommitted transaction (data may be rolled back)
- **Non-repeatable Read** — same query returns different data (another transaction updated the row in the meantime)
- **Phantom Read** — same query returns new rows (another transaction inserted data matching the condition)
- **Serialization Anomaly** — the result of transactions is not equivalent to any serial order
### PostgreSQL vs MySQL Differences
- **PostgreSQL**: Read Uncommitted behaves like Read Committed. Repeatable Read = Snapshot Isolation (also prevents phantom reads). Serializable = SSI.
- **MySQL InnoDB**: Repeatable Read uses next-key locking (prevents phantom reads).
---
## CAP Theorem
In a distributed system, only 2 out of 3 are possible: **C**onsistency, **A**vailability, **P**artition tolerance.
In practice: P is always required, we choose between CP (consistency) and AP (availability).
### PACELC Extension
PACELC extends CAP with behavior under normal conditions (no partition):
- **P**artition → **A**vailability vs **C**onsistency
- **E**lse (no partition) → **L**atency vs **C**onsistency
| DB | Partition Choice | Else Choice |
|----|----------------|------------|
| Cassandra | AP (availability) | LC (low latency, eventual consistency) |
| DynamoDB (default) | AP | LC |
| MongoDB | CP (primary) | LC |
| PostgreSQL (single) | CP | CC |
| CockroachDB | CP | CC |
### Quorum Details
- **R** (read quorum) + **W** (write quorum) > **N** (replication factor)
- Typical: N=3, R=2, W=2 (tolerates 1 node down)
- **Sloppy quorum** — when a node is unavailable, data is temporarily stored on another node
- **Hinted handoff** — temporary write to another node with a hint, data is transferred upon recovery
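
The condition R + W > N can be checked by brute force: it holds exactly when every possible set of R read replicas shares at least one node with every possible set of W write replicas, so a read always reaches a replica holding the latest acknowledged write. The helper names below are illustrative, and the sketch assumes a strict quorum (no sloppy quorum or hinted handoff).

```python
from itertools import combinations


def quorums_overlap(n: int, r: int, w: int) -> bool:
    """The textbook rule: read and write quorums must intersect."""
    return r + w > n


def tolerated_failures(n: int, r: int, w: int) -> int:
    """Nodes that may be down while both reads and writes still reach quorum."""
    return n - max(r, w)


def every_read_sees_latest_write(n: int, r: int, w: int) -> bool:
    """Brute force: does every read set intersect every write set?"""
    nodes = range(n)
    return all(
        set(read_set) & set(write_set)
        for write_set in combinations(nodes, w)
        for read_set in combinations(nodes, r)
    )


if __name__ == "__main__":
    for n, r, w in [(3, 2, 2), (3, 1, 1), (5, 3, 3), (5, 1, 5)]:
        print(
            f"N={n} R={r} W={w}: overlap={quorums_overlap(n, r, w)}, "
            f"tolerates {tolerated_failures(n, r, w)} node(s) down"
        )
```

For N=3, R=2, W=2 the quorums overlap and one node may be down, matching the typical configuration above; lowering R or W to 1 trades that guarantee for latency.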
---
## Replication
| Type | Description | Latency |
|-----|-------|---------|
| Synchronous | Write confirmed only after replication to all nodes | High, but consistent |
| Asynchronous | Write confirmed immediately, replication in the background | Low, possible data loss |
| Semi-synchronous | Confirmation from at least one replica (or a configured quorum), the rest replicate asynchronously | Compromise |
### Topologies
- **Leader-Follower** (Master-Slave) — reads from replicas
- **Leader-Leader** (Multi-master) — writes to multiple nodes
- **Quorum-based** — R + W > N (Cassandra, DynamoDB)
---
## Sharding
Data distribution across nodes based on a shard key.
```
┌─────────┐
│ Proxy │
│ Router │
└────┬────┘
┌──────────┼──────────┐
┌────▼───┐ ┌───▼────┐ ┌───▼────┐
│Shard A │ │Shard B │ │Shard C │
│ 0-100 │ │101-200 │ │201-300 │
└────────┘ └────────┘ └────────┘
```
### Methods
| Method | Description | Advantage | Disadvantage |
|--------|-------|--------|----------|
| **Hash-based** | `shard_id = hash(key) % N` | Even distribution | Loss of range queries |
| **Range-based** | Data by range (A-M, N-Z) | Preserves ordering | Hot spots |
| **Consistent hashing** | Hash ring, vnodes | Min. rebalancing when number of shards changes | More complex |
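
The difference between `hash(key) % N` and consistent hashing shows up when a shard is added. The sketch below measures what fraction of keys changes shard in each scheme. It is an illustration under assumptions: SHA-256 is used only as a stable hash (Python's built-in `hash` is randomized per process for strings), 100 virtual nodes per shard is an arbitrary choice, and `HashRing` is a hypothetical minimal class rather than any library's API.

```python
import bisect
import hashlib


def stable_hash(key: str) -> int:
    """Deterministic 64-bit hash, stable across processes."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


def modulo_shard(key: str, shards: list) -> str:
    """Hash-based sharding: shard_id = hash(key) % N."""
    return shards[stable_hash(key) % len(shards)]


class HashRing:
    """Consistent hashing: each shard owns many points (vnodes) on a ring."""

    def __init__(self, shards: list, vnodes: int = 100):
        self._ring = sorted(
            (stable_hash(f"{shard}#{i}"), shard)
            for shard in shards
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def shard_for(self, key: str) -> str:
        # The first ring point clockwise from the key's hash owns the key.
        i = bisect.bisect_right(self._points, stable_hash(key)) % len(self._ring)
        return self._ring[i][1]


def moved_fraction(keys, before, after) -> float:
    """Share of keys whose shard differs between two placement functions."""
    return sum(before(k) != after(k) for k in keys) / len(keys)


if __name__ == "__main__":
    keys = [f"user:{i}" for i in range(10_000)]
    old, new = ["A", "B", "C"], ["A", "B", "C", "D"]
    by_modulo = moved_fraction(
        keys, lambda k: modulo_shard(k, old), lambda k: modulo_shard(k, new)
    )
    by_ring = moved_fraction(keys, HashRing(old).shard_for, HashRing(new).shard_for)
    print(f"modulo: {by_modulo:.0%} of keys moved, ring: {by_ring:.0%} of keys moved")
```

Going from 3 to 4 shards, modulo placement relocates most keys, while the ring only hands keys to the new shard, which is the "minimal rebalancing" advantage listed in the table.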
### Routing
- **Proxy-based** — application goes to proxy, which routes (Vitess, ProxySQL, mongos)
- **Client-side** — application knows which shard to target
- **DNS-based** — each shard has its own endpoint
---
## Data Consistency Patterns
| Pattern | Description | Example |
|---------|-------|---------|
| **Strong consistency** | After a write, every read sees the latest data | Single DB, Raft, Spanner |
| **Eventual consistency** | After a write, data propagates over time | DNS, DynamoDB (default), Cassandra |
| **Read-after-write** | The author always sees their own write (others are eventual) | Social networks, comments |
| **Causal consistency** | Causally dependent operations are seen in the correct order | COPS, Orbe, MongoDB (causally consistent sessions) |
| **Monotonic reads** | You do not see older data after seeing newer data | Session pinned to a single replica (sticky reads) |
| **Monotonic writes** | Writes from a single client are in order | Queue-based, single leader |
---
## Data Migration
### Schema Migration
```
V1__initial_schema.sql
V2__add_users_table.sql
V3__add_email_index.sql
V4__add_orders_table.sql
```
### Zero-Downtime Migration
1. **Expand** — add new column/table (application tolerates both states)
2. **Migrate** — backfill data, update application to new schema
3. **Contract** — remove old column/table
### Tools
| Tool | Language | Strategy | Zero-Downtime | Rollback |
|---------|-------|-----------|--------------|----------|
| **Flyway** | Java (multi-lang CLI) | Versioned SQL | Limited (additive only) | `undo` (limited, enterprise) |
| **Liquibase** | Java (multi-lang CLI) | Changesets (XML/YAML/JSON/SQL) | Yes (changeset design) | `rollback <count>` |
| **Alembic** | Python | Auto-generation, versioned | Yes (branching) | `downgrade` |
| **Prisma Migrate** | TypeScript | Declarative schema → diff | Yes (shadow DB) | `migrate diff` |
| **gh-ost** | Go | Triggerless online DDL (MySQL) | Yes (binlog stream) | No (progressive) |
| **pgroll** | Go | Online schema migration (PG) | Yes (views, multiple versions) | Yes (immediate) |
---
## SQL Antipatterns
Based on *More SQL Antipatterns* (Karwin, 2026) — 14 new antipatterns:
### Language Antipatterns
| Antipattern | Problem | Solution |
|-------------|---------|--------|
| **Fear of JOINs** | Manual pairing in application instead of JOIN | Use JOIN correctly |
| **Relational Division** | Finding sets in WHERE | Relational division (subquery with GROUP BY/HAVING) |
| **Pagination via OFFSET** | OFFSET is O(n) — the larger the offset, the slower | Keyset pagination (WHERE id > last_seen) |
| **Non-Sargable queries** | Functions on columns in WHERE (`WHERE YEAR(date) = 2026`) | Rewrite as range condition |
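
Keyset pagination from the table above can be tried out with the standard-library `sqlite3` module. This is a minimal sketch: the `items` table and the function names are made up for the example, and it assumes pages are ordered by a unique, indexed column (here the primary key).

```python
import sqlite3


def make_db(rows: int = 1000) -> sqlite3.Connection:
    """In-memory demo table with ids 1..rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO items (id, name) VALUES (?, ?)",
        [(i, f"item-{i}") for i in range(1, rows + 1)],
    )
    return conn


def page_by_offset(conn, page: int, size: int):
    # The database still has to walk past page * size rows before returning any.
    return conn.execute(
        "SELECT id, name FROM items ORDER BY id LIMIT ? OFFSET ?",
        (size, page * size),
    ).fetchall()


def page_by_keyset(conn, last_seen_id: int, size: int):
    # Seeks directly into the index at last_seen_id, regardless of page depth.
    return conn.execute(
        "SELECT id, name FROM items WHERE id > ? ORDER BY id LIMIT ?",
        (last_seen_id, size),
    ).fetchall()


def iterate_all(conn, size: int):
    """Walk the whole table page by page using the last id of each page."""
    last_seen_id = 0
    while True:
        rows = page_by_keyset(conn, last_seen_id, size)
        if not rows:
            return
        yield rows
        last_seen_id = rows[-1][0]


if __name__ == "__main__":
    conn = make_db()
    print(page_by_offset(conn, page=2, size=3))
    print(page_by_keyset(conn, last_seen_id=6, size=3))
```

Both calls in the demo return the same rows; the difference is that the keyset variant carries the last seen `id` forward instead of a page number, so the client cannot jump to an arbitrary page.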
### Optimization Antipatterns
| Antipattern | Problem | Solution |
|-------------|---------|--------|
| **Premature denormalization** | Denormalization without reason | Measure, then optimize |
| **JSON overuse** | JSON as a universal solution | Use JSON only for genuinely flexible data |
| **Cacheless transactions** | Relying on query cache (removed in MySQL 8) | Application-level caching |
### Application Antipatterns
| Antipattern | Problem | Solution |
|-------------|---------|--------|
| **Polling** | Regularly querying for changes | LISTEN/NOTIFY, Kafka, Change Data Capture |
| **Transaction encapsulation** | Each model manages its own transaction | Unit of Work pattern |
| **Fear of deadlocks** | Trying to prevent all deadlocks | Mitigation, not prevention |
| **Data hoarding** | Storing everything forever | Data retention policies, archiving |
### Mini-Antipatterns
- `LIMIT` without `ORDER BY` — nondeterministic results
- `NATURAL JOIN` — fragile, implicit join condition
- `N+1 queries` — query in a loop instead of JOIN/batch
- Redundant indexes — duplicate/overlapping indexes unnecessarily slow writes
---
## Designing Data-Intensive Applications (2nd Edition)
*Kleppmann, Riccomini (2026)* — substantially revised edition.
### What's New Compared to 1st Edition
| Area | What's New |
|--------|-----------|
| **Cloud-native** | Storage = object store (S3, Blob), not local disk. Separation of control/data/compute plane |
| **AI workloads** | Vector indexes, DataFrames as a data model, batch processing for training data |
| **Local-first software** | DuckDB, PGlite, SQLite — databases running on laptop/edge, sync when connected |
| **Formal methods** | Randomized testing, formal verification (important for AI-generated code) |
| **Legal & ethics** | GDPR, ethics of predictive analytics, bias, algorithmic accountability |
| **Streaming → SQL views** | Materialize, incremental view maintenance — streaming as SQL |
### Key Principles (unchanged)
**Reliability**, **Scalability**, **Maintainability** — the three pillars of good data systems.
---
## Apache Iceberg Lakehouse
Based on *Architecting an Apache Iceberg Lakehouse* (Merced, 2026):
### What is a Data Lakehouse
An architecture combining the flexibility and low cost of a **data lake** (object storage) with the performance and governance of a **data warehouse**. Apache Iceberg is an open source table format.
### Iceberg Metadata Architecture
```
Table metadata (.metadata.json)
└── Snapshot manifest list
└── Manifests (file-level stats)
└── Data files (Parquet/ORC/Avro)
```
### Key Features
| Feature | Description |
|-----------|-------|
| **ACID transactions** | Safe concurrent read/write |
| **Schema evolution** | Add/drop/rename columns without rewrite |
| **Time travel** | Query historical snapshots |
| **Partition evolution** | Change partition strategy without data rewrite |
| **Hidden partitioning** | Automatic partition filters (user does not need to specify) |
| **Multi-engine** | Spark, Flink, Trino, Dremio, Snowflake over the same data |
For a broader overview of the Big Data ecosystem (HDFS, Spark, Flink, Trino, Delta Lake, Hudi) see [BIG-DATA.en.md](BIG-DATA.en.md).
### When to Use Iceberg
- Multi-tool access to the same governed data
- ACID on lake data
- Streaming + batch in a single table
- Reducing duplication (one canonical copy instead of ETL to warehouse)
---
## Best Practices
- **Connection pooling** — PgBouncer, RDS Proxy, ProxySQL
- **Indexing based on query patterns** — avoid unnecessary indexes, since each one slows down writes
- **Read replicas** for reporting and analytics
- **Backup & recovery** — point-in-time recovery (PITR), regular tests
- **Query monitoring** — slow query log, pg_stat_statements, performance_schema
- **Encryption at rest & in transit**
- **Migrations in CI/CD** — part of the pipeline, not manual
- **Choose DB based on workload** — no single universal DB (polyglot persistence)
---
## Database License Model Comparison
| DB | License | Price (self-hosted) | Price (managed cloud) | Vendor lock-in | Note |
|----|---------|-------------------|---------------------|----------------|----------|
| **PostgreSQL** | PostgreSQL license (MIT-like) | $0 | ~$0.10-1.00/hr (RDS, CloudSQL, Aurora) | Low | Fully open source, no restrictions |
| **MySQL** | GPL v2 / Commercial (Oracle) | $0 (GPL) / ~$2,000/server/year (commercial) | ~$0.10-1.00/hr (RDS, PlanetScale) | Medium (Oracle owned) | GPL copyleft applies only when MySQL is distributed together with the application |
| **MariaDB** | GPL v2 / Business Source | $0 (GPL) | ~$0.10-1.00/hr (SkySQL) | Low | Fully compatible MySQL fork, no Oracle influence |
| **Oracle SE2** | Proprietary (per socket) | ~$17,500/socket + 22% support/year | ~$1-5/hr (RDS, OCI) | High | Licensed per occupied socket (no core factor), max 2 sockets and 16 threads per instance |
| **Oracle EE** | Proprietary (per processor license + options) | ~$47,500/processor license (core factor 0.5 on x86, i.e. one license per two cores) + options + 22% support | ~$2-30/hr (OCI, RDS) | High | Options can double the price (RAC, partitioning, compression) |
| **SQL Server Standard** | Proprietary (per core + CAL) | ~$1,000/core + $200/CAL | ~$0.20-1.00/hr (Azure SQL) | Medium | Windows Server license required additionally |
| **SQL Server Enterprise** | Proprietary (per core + CAL) | ~$7,000/core + $200/CAL | ~$1-5/hr (Azure SQL) | Medium | AlwaysOn, partitioning, in-memory OLTP |
| **MongoDB** | SSPL (Community) / Commercial (Enterprise) | $0 (Community) / ~$10k/server/year (Enterprise) | ~$0.10-5.00/hr (Atlas) | Medium | SSPL restricts managed cloud services |
| **Redis** | RSALv2 + SSPL (7.4+) / BSD (Valkey) | $0 (Valkey) | ~$0.10-1.00/hr (ElastiCache, Memorystore → Valkey) | Low (Valkey) | Redis 7.4+ license change → Valkey fork |
| **Cassandra** | Apache 2.0 | $0 | ~$0.10-1.00/hr (Amazon Keyspaces) | Low | Fully open source, no restrictions |
| **ScyllaDB** | AGPL v3 (OSS) / Enterprise | $0 (OSS) / Enterprise subscription | ~$0.50-3.00/hr (ScyllaDB Cloud) | Low (OSS) | Enterprise: monitoring, security, support |
| **CockroachDB** | BSL (Business Source License) / Enterprise | $0 (core) / Enterprise subscription | ~$0.50-3.00/hr (CockroachDB Cloud) | Medium | BSL: converts to Apache 2.0 after 3 years. Enterprise: multi-region, backup |
**Key Recommendations**:
- **Lowest TCO**: PostgreSQL (no license, broadest cloud support)
- **Highest vendor lock-in**: Oracle (PL/SQL, proprietary options, expensive migration)
- **License risk**: Redis (license change) → use Valkey for new projects
- **Cloud-native licensing**: MongoDB Atlas, CockroachDB Cloud, ScyllaDB Cloud — pay-per-use, no license management
## Resources
Links, books and standards: [sources/databases/sources.en.md](sources/databases/sources.en.md)
### Recommended Reading
| Book | Authors | ISBN | Key Takeaway |
|-------|--------|------|----------------|
| Database Internals | Alex Petrov | 978-1492040346 | In-depth explanation of storage engines (B-Tree, LSM-Tree, WAL, MVCC), distributed systems |
| Designing Data-Intensive Applications (2nd ed.) | Kleppmann, Riccomini | — | Cloud-native, AI, local-first, formal methods |
| High Performance MySQL (4th ed.) | Schwartz, Zaitsev, Tkachenko | 978-1492075292 | MySQL architecture, schema/index optimization |
| Expert Oracle Architecture (3rd ed.) | Kyte, Kuhn | 978-1484249602 | Oracle architecture, RAC, Data Guard, tuning |
| AI-Ready PostgreSQL 18 | Kumar, Linster | — | PostgreSQL as a unified platform for AI |
| More SQL Antipatterns | Bill Karwin (2026) | — | 14 antipatterns, keyset pagination |
| Vector Databases | Borwankar (2026) | — | Embeddings, vector indexes, RAG |
| Architecting an Apache Iceberg Lakehouse | Merced (2026) | — | Lakehouse architecture, Iceberg metadata |
*Last revision: 2026-06-03*

325
DATABASES.md Normal file

@@ -0,0 +1,325 @@
# 🗄️ Database Architecture
## Database Classification
### Relational (SQL)
| DB | License | Use case | Details |
|----|---------|----------|--------|
| **PostgreSQL** | Open source | Universal, geospatial, analytics, AI | [POSTGRESQL.md](POSTGRESQL.md) |
| **MySQL / MariaDB** | Open source | Web, LAMP stack, e-commerce | [MYSQL.md](MYSQL.md) |
| **Microsoft SQL Server** | Proprietary | Enterprise .NET, Windows ecosystem | — |
| **Oracle DB** | Proprietary | Enterprise, finance, mainframe, RAC cluster | [ORACLE.md](ORACLE.md) |
| **Amazon Aurora** | Managed | MySQL/PostgreSQL compatible, cloud-native | — |
### NoSQL
| Type | DB | Use case | Details |
|-----|----|----------|--------|
| **Document** | MongoDB, Couchbase | JSON data, flexible schema | [MONGODB.md](MONGODB.md) |
| **Key-Value / Cache** | Redis, Memcached, DynamoDB | Cache, session store, real-time | [REDIS.md](REDIS.md) |
| **Wide-column** | Cassandra, ScyllaDB | Time-series, IoT, big data | [CASSANDRA.md](CASSANDRA.md) |
| **Vector** | Pinecone, Qdrant, Milvus, pgvector | Embeddings, RAG, semantic search | [VEKTOROVE-DB.md](VEKTOROVE-DB.md) |
| **Graph** | Neo4j, Dgraph | Relationships, recommendations, social graphs | — |
### Storage Engines
Common concepts across databases: [DATABAZOVE-ENGINY.md](DATABAZOVE-ENGINY.md)
---
## Transaction isolation levels
| Level | Dirty Read | Non-repeatable Read | Phantom Read | Serialization Anomaly |
|--------|-----------|---------------------|-------------|----------------------|
| **Read Uncommitted** | Yes (possible) | Yes | Yes | Yes |
| **Read Committed** | No (prevented) | Yes | Yes | Yes |
| **Repeatable Read** | No | No | Yes per SQL standard (PostgreSQL: No) | Yes |
| **Serializable** | No | No | No | No |
**Anomalies**:
- **Dirty Read** — reading data from an uncommitted transaction (data may be rolled back)
- **Non-repeatable Read** — same query returns different data (another transaction updated the row in the meantime)
- **Phantom Read** — same query returns new rows (another transaction inserted data matching the condition)
- **Serialization Anomaly** — the result of transactions is not equivalent to any serial order
### PostgreSQL vs MySQL Differences
- **PostgreSQL**: Read Uncommitted behaves like Read Committed. Repeatable Read = Snapshot Isolation (also prevents phantom reads). Serializable = SSI.
- **MySQL InnoDB**: Repeatable Read uses next-key locking (prevents phantom reads).
---
## CAP Theorem
In a distributed system, only 2 out of 3 are possible: **C**onsistency, **A**vailability, **P**artition tolerance.
In practice: P is always required, we choose between CP (consistency) and AP (availability).
### PACELC Extension
PACELC extends CAP with behavior under normal conditions (no partition):
- **P**artition → **A**vailability vs **C**onsistency
- **E**lse (no partition) → **L**atency vs **C**onsistency
| DB | Partition Choice | Else Choice |
|----|----------------|------------|
| Cassandra | AP (availability) | LC (low latency, eventual consistency) |
| DynamoDB (default) | AP | LC |
| MongoDB | CP (primary) | LC |
| PostgreSQL (single) | CP | CC |
| CockroachDB | CP | CC |
### Quorum Details
- **R** (read quorum) + **W** (write quorum) > **N** (replication factor)
- Typical: N=3, R=2, W=2 (tolerates 1 node down)
- **Sloppy quorum** — when a node is unavailable, data is temporarily stored on another node
- **Hinted handoff** — temporary write to another node with a hint, data is transferred upon recovery
---
## Replication
| Type | Description | Latency |
|-----|-------|---------|
| Synchronous | Write confirmed only after replication to all nodes | High, but consistent |
| Asynchronous | Write confirmed immediately, replication in the background | Low, possible data loss |
| Semi-synchronous | Confirmation from at least one replica (or a configured quorum), the rest replicate asynchronously | Compromise |
### Topologies
- **Leader-Follower** (Master-Slave) — reads from replicas
- **Leader-Leader** (Multi-master) — writes to multiple nodes
- **Quorum-based** — R + W > N (Cassandra, DynamoDB)
---
## Sharding
Data distribution across nodes based on a shard key.
```
┌─────────┐
│ Proxy │
│ Router │
└────┬────┘
┌──────────┼──────────┐
┌────▼───┐ ┌───▼────┐ ┌───▼────┐
│Shard A │ │Shard B │ │Shard C │
│ 0-100 │ │101-200 │ │201-300 │
└────────┘ └────────┘ └────────┘
```
### Methods
| Method | Description | Advantage | Disadvantage |
|--------|-------|--------|----------|
| **Hash-based** | `shard_id = hash(key) % N` | Even distribution | Loss of range queries |
| **Range-based** | Data by range (A-M, N-Z) | Preserves ordering | Hot spots |
| **Consistent hashing** | Hash ring, vnodes | Min. rebalancing when number of shards changes | More complex |
### Routing
- **Proxy-based** — application goes to proxy, which routes (Vitess, ProxySQL, mongos)
- **Client-side** — application knows which shard to target
- **DNS-based** — each shard has its own endpoint
---
## Data consistency patterns
| Pattern | Description | Example |
|---------|-------|---------|
| **Strong consistency** | After a write, every read sees the latest data | Single DB, Raft, Spanner |
| **Eventual consistency** | After a write, data propagates over time | DNS, DynamoDB (default), Cassandra |
| **Read-after-write** | The author always sees their own write (others are eventual) | Social networks, comments |
| **Causal consistency** | Causally dependent operations are seen in the correct order | COPS, Orbe, MongoDB (causally consistent sessions) |
| **Monotonic reads** | You do not see older data after seeing newer data | Session pinned to a single replica (sticky reads) |
| **Monotonic writes** | Writes from a single client are in order | Queue-based, single leader |
---
## Data Migration
### Schema Migration
```
V1__initial_schema.sql
V2__add_users_table.sql
V3__add_email_index.sql
V4__add_orders_table.sql
```
### Zero-Downtime Migration
1. **Expand** — add new column/table (application tolerates both states)
2. **Migrate** — backfill data, update application to new schema
3. **Contract** — remove old column/table
### Tools
| Tool | Language | Strategy | Zero-Downtime | Rollback |
|---------|-------|-----------|--------------|----------|
| **Flyway** | Java (multi-lang CLI) | Versioned SQL | Limited (additive only) | `undo` (limited, enterprise) |
| **Liquibase** | Java (multi-lang CLI) | Changesets (XML/YAML/JSON/SQL) | Yes (changeset design) | `rollback <count>` |
| **Alembic** | Python | Auto-generation, versioned | Yes (branching) | `downgrade` |
| **Prisma Migrate** | TypeScript | Declarative schema → diff | Yes (shadow DB) | `migrate diff` |
| **gh-ost** | Go | Triggerless online DDL (MySQL) | Yes (binlog stream) | No (progressive) |
| **pgroll** | Go | Online schema migration (PG) | Yes (views, multiple versions) | Yes (immediate) |
---
## SQL Antipatterns
Na základě *More SQL Antipatterns* (Karwin, 2026) — 14 nových antipatternů:
### Language antipatterns
| Antipattern | Problém | Řešení |
|-------------|---------|--------|
| **Fear of JOINs** | Manuální párování v aplikaci místo JOIN | Používat JOIN správně |
| **Relational Division** | Hledání množin v WHERE | Relační dělení (subquery s GROUP BY/HAVING) |
| **Pagination via OFFSET** | OFFSET je O(n) — čím větší offset, tím pomalejší | Keyset pagination (WHERE id > last_seen) |
| **Non-Sargable queries** | Funkce na sloupci v WHERE (`WHERE YEAR(date) = 2026`) | Přepsat na range podmínku |
### Optimization antipatterns
| Antipattern | Problem | Solution |
|-------------|---------|----------|
| **Premature denormalization** | Denormalizing without a measured reason | Measure first, then optimize |
| **JSON overuse** | JSON used as a universal solution | Use JSON only for genuinely flexible data |
| **Cacheless transactions** | Relying on the query cache (removed in MySQL 8.0) | Application-level caching |
### Application antipatterns
| Antipattern | Problem | Solution |
|-------------|---------|----------|
| **Polling** | Periodically querying for changes | LISTEN/NOTIFY, Kafka, Change Data Capture |
| **Transaction encapsulation** | Every model manages its own transaction | Unit of Work pattern |
| **Fear of deadlocks** | Trying to prevent every deadlock | Mitigate (detect and retry) rather than prevent |
| **Data hoarding** | Storing everything forever | Data retention policies, archiving |
### Mini-antipatterns
- `LIMIT` without `ORDER BY`: non-deterministic results
- `NATURAL JOIN`: fragile, implicit join condition
- N+1 queries: a query inside a loop instead of a JOIN or batch
- Redundant indexes: duplicate or overlapping indexes slow down writes for no benefit
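As an illustration of the N+1 item above, the following `sqlite3` sketch counts the round trips of both approaches. The `authors`/`books` schema and the function names are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE books (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'Karwin'), (2, 'Kleppmann');
    INSERT INTO books VALUES
        (1, 1, 'SQL Antipatterns'), (2, 2, 'DDIA'), (3, 2, 'DDIA 2nd ed.');
    """
)


def titles_n_plus_one():
    # 1 query for the authors + 1 query per author = N+1 round trips.
    result, queries = {}, 1
    authors = conn.execute("SELECT id, name FROM authors ORDER BY id").fetchall()
    for author_id, name in authors:
        rows = conn.execute(
            "SELECT title FROM books WHERE author_id = ? ORDER BY id",
            (author_id,),
        ).fetchall()
        queries += 1
        result[name] = [title for (title,) in rows]
    return result, queries


def titles_single_join():
    # One round trip; grouping happens in the application.
    # Note: an inner JOIN omits authors that have no books.
    result = {}
    rows = conn.execute(
        "SELECT a.name, b.title FROM authors a "
        "JOIN books b ON b.author_id = a.id ORDER BY a.id, b.id"
    )
    for name, title in rows:
        result.setdefault(name, []).append(title)
    return result, 1


loop_result, loop_queries = titles_n_plus_one()
join_result, join_queries = titles_single_join()
print(loop_queries, join_queries)
```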
---
## Designing Data-Intensive Applications (2nd edition)
*Kleppmann, Riccomini (2026)*: a substantially revised edition.
### What's new compared with the 1st edition
| Area | What's new |
|------|------------|
| **Cloud-native** | Storage = object store (S3, Blob), not local disk. Separation of control/data/compute planes |
| **AI workloads** | Vector indexes, DataFrames as a data model, batch processing for training data |
| **Local-first software** | DuckDB, PGlite, SQLite: databases running on a laptop/edge, syncing when connected |
| **Formal methods** | Randomized testing, formal verification (important for AI-generated code) |
| **Legal & ethics** | GDPR, ethics of predictive analytics, bias, algorithmic accountability |
| **Streaming → SQL views** | Materialize, incremental view maintenance: streaming expressed as SQL |
### Key principles (unchanged)
**Reliability**, **scalability**, and **maintainability**: the three pillars of good data systems.
---
## Apache Iceberg Lakehouse
Based on *Architecting an Apache Iceberg Lakehouse* (Merced, 2026):
### What a data lakehouse is
An architecture that combines the flexibility and low cost of a **data lake** (object storage) with the performance and governance of a **data warehouse**. Apache Iceberg is an open source table format.
### Iceberg metadata architecture
```
Table metadata (.metadata.json)
└── Snapshot manifest list
    └── Manifests (file-level stats)
        └── Data files (Parquet/ORC/Avro)
```
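To make the metadata tree concrete, here is a toy in-memory model in Python. It is neither the Iceberg specification nor the PyIceberg API: the class names (`TableMetadata`, `Snapshot`, `Manifest`, `DataFile`), the sequential snapshot IDs, and the `min_day`/`max_day` statistics are illustrative assumptions. It only shows why immutable snapshots make time travel cheap and how file-level statistics let a reader skip data files.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataFile:
    path: str
    min_day: str  # file-level column statistics kept in the manifest
    max_day: str


@dataclass(frozen=True)
class Manifest:
    files: tuple  # tuple of DataFile


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    manifest_list: tuple  # tuple of Manifest


@dataclass
class TableMetadata:
    snapshots: dict = field(default_factory=dict)
    current_snapshot_id: int = 0

    def commit(self, new_files):
        """Append files via a new manifest and a new snapshot; old ones stay untouched."""
        parent = self.snapshots.get(self.current_snapshot_id)
        inherited = parent.manifest_list if parent else ()
        manifests = inherited + (Manifest(tuple(new_files)),)
        self.current_snapshot_id += 1
        self.snapshots[self.current_snapshot_id] = Snapshot(
            self.current_snapshot_id, manifests
        )

    def plan_scan(self, day, snapshot_id=None):
        """Return files that may contain `day`, pruned by file-level stats."""
        snap = self.snapshots[snapshot_id or self.current_snapshot_id]
        return [
            f.path
            for manifest in snap.manifest_list
            for f in manifest.files
            if f.min_day <= day <= f.max_day
        ]


table = TableMetadata()
table.commit([DataFile("data/a.parquet", "2026-06-01", "2026-06-10")])
table.commit([DataFile("data/b.parquet", "2026-06-11", "2026-06-20")])

print(table.plan_scan("2026-06-15"))                 # current snapshot
print(table.plan_scan("2026-06-15", snapshot_id=1))  # time travel
```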
### Key features
| Feature | Description |
|---------|-------------|
| **ACID transactions** | Safe concurrent reads and writes |
| **Schema evolution** | Add, drop, or rename columns without rewriting data |
| **Time travel** | Query historical snapshots |
| **Partition evolution** | Change the partitioning strategy without rewriting data |
| **Hidden partitioning** | Partition filters are derived automatically (users do not have to specify them) |
| **Multi-engine** | Spark, Flink, Trino, Dremio, Snowflake over the same data |
For a more detailed overview of the Big Data ecosystem (HDFS, Spark, Flink, Trino, Delta Lake, Hudi), see [BIG-DATA.md](BIG-DATA.md).
### When to use Iceberg
- Multiple tools need access to the same governed data
- ACID guarantees on data-lake data
- Streaming and batch in a single table
- Less duplication (one canonical copy instead of ETL into a warehouse)
---
## Best practices
- **Connection pooling**: PgBouncer, RDS Proxy, ProxySQL
- **Index according to query patterns**: avoid unnecessary indexes
- **Read replicas** for reporting and analytics
- **Backup & recovery**: point-in-time recovery (PITR), regular restore tests
- **Query monitoring**: slow query log, pg_stat_statements, performance_schema
- **Encryption at rest & in transit**
- **Migrations in CI/CD**: part of the pipeline, not run manually
- **Choose the database by workload**: no single database fits everything (polyglot persistence)
---
## Database licensing models compared
| DB | License | Cost (self-hosted) | Cost (managed cloud) | Vendor lock-in | Note |
|----|---------|--------------------|----------------------|----------------|------|
| **PostgreSQL** | PostgreSQL License (permissive, BSD/MIT-like) | $0 | ~$0.10-1.00/h (RDS, Cloud SQL, Aurora) | Low | Fully open source, no restrictions |
| **MySQL** | GPL v2 / Commercial (Oracle) | $0 (GPL) / ~$2,000/server/year (commercial) | ~$0.10-1.00/h (RDS, PlanetScale) | Medium (owned by Oracle) | GPL: whether you must release your application's source depends on how you distribute it |
| **MariaDB** | GPL v2 (server) / Business Source License (some products) | $0 (GPL) | ~$0.10-1.00/h (SkySQL) | Low | MySQL fork, largely but no longer fully compatible; independent of Oracle |
| **Oracle SE2** | Proprietary (per socket) | ~$17,500/socket + 22% support/year | ~$1-5/h (RDS, OCI) | High | Max 2 sockets per server and 16 CPU threads per instance; the core factor does not apply to SE2 |
| **Oracle EE** | Proprietary (per processor license + options) | ~$47,500/processor license + options + 22% support/year | ~$2-30/h (OCI, RDS) | High | Core factor 0.5 on x86 (EPYC/Xeon), so one license covers two cores. Options (RAC, partitioning, compression) can double the price |
| **SQL Server Standard** | Proprietary (per core, or server + CAL) | ~$2,000/core, or ~$1,000/server + ~$200/CAL | ~$0.20-1.00/h (Azure SQL) | Medium | On Windows, a Windows Server license is needed in addition |
| **SQL Server Enterprise** | Proprietary (per core only) | ~$7,000/core | ~$1-5/h (Azure SQL) | Medium | Always On availability groups, partitioning, in-memory OLTP |
| **MongoDB** | SSPL (Community) / Commercial (Enterprise) | $0 (Community) / ~$10k/server/year (Enterprise) | ~$0.10-5.00/h (Atlas) | Medium | SSPL restricts offering MongoDB as a managed cloud service |
| **Redis** | RSALv2 + SSPL (7.4+; Redis 8 adds AGPLv3) / BSD (Valkey) | $0 (Valkey) | ~$0.10-1.00/h (ElastiCache, Memorystore → Valkey) | Low (Valkey) | The license change in Redis 7.4 led to the Valkey fork |
| **Cassandra** | Apache 2.0 | $0 | ~$0.10-1.00/h (Amazon Keyspaces) | Low | Fully open source, no restrictions |
| **ScyllaDB** | AGPL v3 (older OSS releases) / source-available (since the late-2024 license change) / Enterprise | $0 (OSS) / Enterprise subscription | ~$0.50-3.00/h (ScyllaDB Cloud) | Low (OSS) | Enterprise: monitoring, security, support |
| **CockroachDB** | BSL (Business Source License) / Enterprise | $0 (core) / Enterprise subscription | ~$0.50-3.00/h (CockroachDB Cloud) | Medium | BSL: converts to Apache 2.0 after 3 years. Enterprise: multi-region, backup |
**Key recommendations**:
- **Lowest TCO**: PostgreSQL (no license fees, broadest cloud support)
- **Highest vendor lock-in**: Oracle (PL/SQL, proprietary options, expensive migration)
- **License risk**: Redis (license change) → consider Valkey for new projects
- **Cloud-native licensing**: MongoDB Atlas, CockroachDB Cloud, ScyllaDB Cloud are pay-per-use with no license management
## Sources
Links, books, and standards: [sources/databases/sources.md](sources/databases/sources.md)
### Recommended reading
| Book | Authors | ISBN | Key contribution |
|------|---------|------|------------------|
| Database Internals | Alex Petrov | 978-1492040346 | In-depth coverage of storage engines (B-Tree, LSM-Tree, WAL, MVCC) and distributed systems |
| Designing Data-Intensive Applications (2nd ed.) | Kleppmann, Riccomini | — | Cloud-native, AI, local-first, formal methods |
| High Performance MySQL (4th ed.) | Schwartz, Zaitsev, Tkachenko | 978-1492075292 | MySQL architecture, schema/index optimization |
| Expert Oracle Architecture (3rd ed.) | Kyte, Kuhn | 978-1484249602 | Oracle architecture, RAC, Data Guard, tuning |
| AI-Ready PostgreSQL 18 | Kumar, Linster | — | PostgreSQL as a unified platform for AI |
| More SQL Antipatterns | Bill Karwin (2026) | — | 14 antipatterns, keyset pagination |
| Vector Databases | Borwankar (2026) | — | Embeddings, vector indexes, RAG |
| Architecting an Apache Iceberg Lakehouse | Merced (2026) | — | Lakehouse architecture, Iceberg metadata |
*Last revision: 2026-06-03*

101
DATABAZOVE-ENGINY.md Normal file
View File

@@ -0,0 +1,101 @@
# ⚙️ Storage engines and transaction models
## B-Tree vs LSM-Tree
The two dominant storage engine designs in modern databases.
| Property | B-Tree | LSM-Tree |
|----------|--------|----------|
| **Writes** | In-place update (random I/O on a page) | Append-only (sequential I/O) |
| **Reads** | Fast (directly in the page, O(log N)) | Slower (merge across several SSTables; Bloom filters help) |
| **Write amplification** | Typically lower (page rewrite) | Typically higher (compaction, SSTable merges) |
| **Read amplification** | Lower (one root-to-leaf page path) | Higher (several SSTables to search) |
| **Compression** | Worse (page fragmentation) | Better (compact SSTables, block compression) |
| **Range scan** | Fast (linked list at the leaf level) | Fast (SSTables are sorted) |
| **Space amplification** | Low | Higher (until compaction runs) |
| **Typical DBs** | PostgreSQL, MySQL (InnoDB), SQLite, Oracle, MongoDB (WiredTiger uses B-trees by default) | Cassandra, RocksDB, LevelDB, ScyllaDB |
### When to choose which engine
**B-Tree** when:
- You need fast point lookups (primary key, unique ID)
- The workload is read-heavy (most queries are SELECTs by key)
- You need range queries on the primary key
- The workload is transactional (OLTP) with short queries
**LSM-Tree** when:
- You need high write throughput (write-heavy workload)
- The workload is append-only (logs, time series, IoT)
- Data compression matters (saves space)
- Write amplification is acceptable (enough I/O capacity)
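The LSM trade-offs listed above (cheap sequential writes, reads that may touch several sorted runs, compaction that rewrites data) can be seen in a deliberately tiny Python model. `TinyLSM` is a teaching sketch under simplifying assumptions: everything lives in memory, there are no Bloom filters, tombstones, or levels, and compaction merges all runs at once.

```python
import bisect


class TinyLSM:
    def __init__(self, memtable_limit=2):
        self.memtable = {}
        self.sstables = []  # oldest first; each run is a sorted list of (key, value)
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value  # the write only touches memory
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # One sequential, append-only write of an immutable sorted run.
        if self.memtable:
            self.sstables.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        # Read amplification: search runs from newest to oldest.
        for run in reversed(self.sstables):
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None

    def compact(self):
        # Write amplification: all data is rewritten; newer values win.
        merged = {}
        for run in self.sstables:
            merged.update(run)
        self.sstables = [sorted(merged.items())] if merged else []


db = TinyLSM(memtable_limit=2)
db.put("a", 1)
db.put("b", 2)  # memtable full -> first run flushed
db.put("a", 3)
db.put("c", 4)  # second run flushed; "a" now exists in two runs
runs_before = len(db.sstables)
db.compact()
print(runs_before, len(db.sstables), db.get("a"))
```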
## Write-Ahead Log (WAL)
An append-only log which guarantees that no committed operation is lost in a crash:
```text
1. Transaction BEGIN → record written to WAL
2. Data modification → record written to WAL (before the page is modified)
3. Transaction COMMIT → WAL flushed to disk (COMMIT is acknowledged only after the flush)
4. Checkpoint → dirty pages flushed → WAL up to the checkpoint can be deleted
```
- **Write-ahead**: the WAL record is written before the data page
- **Checkpoint**: the point from which the WAL is needed during recovery
- **Redo log** (InnoDB): a similar concept, used to replay missing changes
- **Group commit**: several transactions flush the WAL together (higher throughput)
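The write-ahead rule and crash recovery can be sketched in a few lines of Python. `TinyWAL` is a hypothetical toy, not how any real engine formats its log: each record is a JSON line, every write is its own committed transaction, and there is no checkpointing or group commit.

```python
import json
import os
import tempfile


class TinyWAL:
    def __init__(self, path):
        self.path = path
        self.pages = {}  # in-memory "data pages"; lost on a crash

    def set(self, key, value):
        record = json.dumps({"key": key, "value": value})
        with open(self.path, "a", encoding="utf-8") as log:
            log.write(record + "\n")
            log.flush()
            os.fsync(log.fileno())  # the commit is acknowledged only after this
        self.pages[key] = value     # the page is modified after the log write

    @classmethod
    def recover(cls, path):
        """Rebuild the pages by replaying the log from the beginning."""
        wal = cls(path)
        if os.path.exists(path):
            with open(path, encoding="utf-8") as log:
                for line in log:
                    record = json.loads(line)
                    wal.pages[record["key"]] = record["value"]
        return wal


wal_path = os.path.join(tempfile.mkdtemp(), "demo.wal")
store = TinyWAL(wal_path)
store.set("balance", 100)
store.set("balance", 80)
del store  # simulate a crash: the in-memory pages are gone

recovered = TinyWAL.recover(wal_path)
print(recovered.pages)
```

A checkpoint would persist `pages` and truncate the log, so that recovery only replays records written after it.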
## MVCC (Multi-Version Concurrency Control)
Each transaction (or, at Read Committed, each statement) sees a consistent snapshot of the data. Old row versions are kept until no snapshot needs them; where they are stored depends on the engine.
### Implementations
| DB | Mechanism | Vacuum/GC | Isolation levels |
|----|-----------|-----------|------------------|
| **PostgreSQL** | Heap tuples (xmin/xmax): old versions stay in the main table | VACUUM (autovacuum) | RU (treated as RC), RC, RR, Serializable (SSI) |
| **MySQL InnoDB** | Undo log: old versions in undo segments | Purge (automatic) | RU, RC, RR, Serializable |
| **MSSQL** | Version store in tempdb | Automatic (row versioning) | RU, RC (optionally RCSI), RR, Snapshot, Serializable |
| **Oracle** | Undo tablespace | Automatic (undo retention) | RC, Serializable, Read-only |
| **MongoDB WiredTiger** | Document-level MVCC | Automatic (eviction) | Snapshot isolation |
| **Cassandra** | No MVCC (values are overwritten, last write wins) | Compaction (SSTable merge) | — |
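The PostgreSQL-style xmin/xmax scheme from the table can be approximated with a short Python model. `TinyMVCC` is a simplified sketch: every write commits immediately, a snapshot is just the last committed transaction ID, and there is no locking or conflict detection. Real engines track in-progress and aborted transactions as well.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RowVersion:
    key: str
    value: object
    xmin: int                   # txid that created this version
    xmax: Optional[int] = None  # txid that superseded it, if any


class TinyMVCC:
    def __init__(self):
        self.versions = []
        self.last_committed = 0

    def write(self, key, value):
        """Auto-committed write: expire the live version, append a new one."""
        txid = self.last_committed + 1
        for version in self.versions:
            if version.key == key and version.xmax is None:
                version.xmax = txid
        self.versions.append(RowVersion(key, value, xmin=txid))
        self.last_committed = txid
        return txid

    def snapshot(self):
        return self.last_committed

    def read(self, key, snapshot):
        # Visible: created at or before the snapshot and not yet superseded in it.
        for v in self.versions:
            if v.key == key and v.xmin <= snapshot and (v.xmax is None or v.xmax > snapshot):
                return v.value
        return None

    def vacuum(self, oldest_active_snapshot):
        """Drop versions that no active snapshot can still see."""
        self.versions = [
            v for v in self.versions
            if v.xmax is None or v.xmax > oldest_active_snapshot
        ]


store = TinyMVCC()
store.write("x", "v1")
old_snapshot = store.snapshot()  # a long-running reader starts here
store.write("x", "v2")

seen_by_old_reader = store.read("x", old_snapshot)
seen_by_new_reader = store.read("x", store.snapshot())
store.vacuum(oldest_active_snapshot=store.snapshot())  # old reader has finished
print(seen_by_old_reader, seen_by_new_reader, len(store.versions))
```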
### Anomalies
| Level | Dirty Read | Non-repeatable Read | Phantom Read | Serialization Anomaly |
|-------|------------|---------------------|--------------|-----------------------|
| **Read Uncommitted** | Yes | Yes | Yes | Yes |
| **Read Committed** | No | Yes | Yes | Yes |
| **Repeatable Read** | No | No | Allowed by the SQL standard (PostgreSQL: no; MySQL InnoDB: prevented by next-key locking) | Yes |
| **Serializable** | No | No | No | No |
- **Dirty Read**: reading data written by an uncommitted transaction
- **Non-repeatable Read**: within one transaction, the same query returns different data for rows it has already read
- **Phantom Read**: within one transaction, the same query returns additional rows
- **Serialization Anomaly**: the outcome of the transactions is not equivalent to any serial order
## Index types
| Type | Algorithm | Use case | DB support |
|------|-----------|----------|------------|
| **B-tree** | Balanced tree | `=`, `<`, `>`, `BETWEEN`, `IN`, `LIKE` (prefix) | All (default) |
| **Hash** | Hash table | Equality (`=`) only | PostgreSQL (hash index), MySQL (MEMORY) |
| **GiST** | Generalized Search Tree | Geometry, full-text, intervals, IP ranges | PostgreSQL |
| **GIN** | Generalized Inverted Index | JSONB, arrays, full-text (contains, overlaps) | PostgreSQL |
| **BRIN** | Block Range Index | Time series, logs (physically ordered data); extremely small | PostgreSQL |
| **SP-GiST** | Space-partitioned GiST | Quadtrees, k-d trees, radix trees | PostgreSQL |
| **R-tree** | Spatial tree | Geospatial data | MySQL (MyISAM/InnoDB), SQLite |
| **Clustered index** | B-tree with data in the leaves | PK lookup: rows are stored together with the index | MySQL InnoDB, MSSQL |
| **Full-text** | Inverted index | Text search (stemming, relevance) | MySQL, PostgreSQL, MSSQL |
## Sources
Links, books, and standards: [sources/databases/sources.md](sources/databases/sources.md)
### Recommended reading
| Book | Authors | ISBN | Description |
|------|---------|------|-------------|
| Database Internals | Alex Petrov | 978-1492040346 | In-depth coverage of storage engines (B-Tree, LSM-Tree, WAL, MVCC) and distributed systems (partitioning, replication, consensus) |
*Last revision: 2026-06-03*

1063
DATACENTERS.en.md Normal file

File diff suppressed because it is too large Load Diff

1063
DATACENTERS.md Normal file

File diff suppressed because it is too large Load Diff

246
DC-MIGRATION.en.md Normal file
View File

@@ -0,0 +1,246 @@
# 🏗️ Data Center Migration
## Migration strategies
| Strategy | RTO | RPO | Risk | Cost | Duration | Description |
|-----------|-----|-----|--------|---------|-------------|-------|
| **Cold / Big Bang** | hours–days | days | High | Low | days | Shut everything down, move, power up |
| **Phased / Wave** | minutes (per wave) | minutes | Medium | Medium | weeks–months | Workloads moved in waves |
| **Rolling** | 0 (live) | 0 | Low | High | months | Live migration per VM/service |
| **Parallel Run** | 0 | 0 | Low | Very high | months | Both DCs operational, gradual cutover |
| **Pilot Light** | hours | minutes | Medium | Low | weeks | Critical services in new DC, rest migrates |
| **Lift & Shift** | hours | minutes | Medium | Low | weeks | VMs/servers moved without configuration changes |
| **Re-platform** | hours | minutes | Low | Medium | months | Optimization during migration (OS upgrade, resize) |
| **Re-architect** | 0 | 0 | Low | High | months–years | Application redesigned for new platform |
---
## Decision tree
```mermaid
flowchart TD
Start(["DC Migration"]) --> APP{"Application\nstateful?"}
APP -->|"Yes"| DOWNTIME{"Tolerates\ndowntime?"}
APP -->|"No"| ROLLING["Rolling / Parallel Run"]
DOWNTIME -->|"Yes, hours+"| COLD["Cold / Big Bang\nSimplest, cheapest\nRisk: all at once"]
DOWNTIME -->|"Yes, minutes"| PHASED["Phased / Wave\nBy application / business unit"]
DOWNTIME -->|"No (zero downtime)"| SYNC{"Sync replication\npossible?"}
SYNC -->|"Yes, < 100 km"| ROLLING
SYNC -->|"No"| PARALLEL["Parallel Run\nBoth DCs active, gradual cutover"]
ROLLING --> ROLL_HA{"VMware,\nHyper-V?"}
ROLL_HA -->|"Yes"| VMOTION["vMotion / Storage vMotion\nLive migration, 0 downtime"]
ROLL_HA -->|"No"| ROLL_REPL["Storage + DB replication\nGradual workload migration"]
```
---
## Migration phases
### 1. Discovery and assessment
| Task | Tools | Output |
|------|----------|--------|
| HW and SW inventory | RVTools, NetBox, CMDB | Server, VM, and service list |
| Dependency mapping | ServiceNow, AppDynamics, manual | Application dependency graph |
| Traffic analysis | NetFlow, sFlow, vRNI | Bandwidth, latency, peak usage |
| Performance baseline | Prometheus, Zabbix, vRealize | CPU/RAM/disk/network per workload |
| License audit | Flexera, SAM | Licenses, support, compliance |
**Output:** workload list with RTO/RPO, dependencies, and criticality.
### 2. Planning
- **Wave plan** — workload division into migration waves (10–50 VMs per wave)
- **Dependency ordering** — DNS, NTP, LDAP, PKI first
- **Cutover window** — time window for switching (typically weekend)
- **Rollback plan** — conditions and procedure for reversal
- **Test plan** — what and how to test post-migration
- **Communication plan** — who is informed, when, and how
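Dependency ordering and wave planning amount to a topological sort of the dependency graph. The sketch below uses `graphlib.TopologicalSorter` from the Python standard library (3.9+); the `plan_waves` helper and the example workloads are hypothetical, and a real plan would also cap each wave at the chosen VM count.

```python
from graphlib import TopologicalSorter


def plan_waves(dependencies):
    """Group workloads into waves; each workload lands after all its dependencies."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        # Everything whose dependencies were migrated in earlier waves.
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


# workload -> set of workloads it depends on
dependencies = {
    "dns": set(),
    "ntp": set(),
    "ldap": {"dns", "ntp"},
    "pki": {"ldap"},
    "monitoring": {"dns"},
    "erp-db": {"ldap", "pki"},
    "erp-app": {"erp-db", "monitoring"},
}

waves = plan_waves(dependencies)
for number, wave in enumerate(waves, start=1):
    print(f"Wave {number}: {', '.join(wave)}")
```

A `CycleError` at this stage is useful output in itself: it points at circular dependencies that have to be migrated together in one wave.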
### 3. New DC preparation
- **Infrastructure** — DNS, NTP, DHCP, LDAP/AD, PKI, monitoring (see [DATACENTERS.en.md](DATACENTERS.en.md) — deployment order)
- **Network** — BGP peering, VXLAN/VLAN, firewall rules, load balancers
- **Storage** — SAN zoning, NAS exports, Ceph cluster
- **Virtualization** — vCenter, Hyper-V cluster, Proxmox
### 4. Replication and synchronization
| Layer | Method | Tools |
|--------|--------|----------|
| **Storage (block)** | SAN sync/async mirror, LUN replication | NetApp SnapMirror, Dell EMC RecoverPoint, Pure ActiveCluster |
| **Storage (file)** | DFS-R, rsync, robocopy | Windows DFS, Rsync |
| **Storage (object)** | Cross-region replication | MinIO replication, S3 CRR |
| **Databases** | Log shipping, CDC, streaming replication | PostgreSQL Patroni, Oracle Data Guard, MSSQL AlwaysOn, MySQL Group Replication |
| **VM** | Storage vMotion, replication | VMware vSphere Replication, Hyper-V Replica, Zerto |
| **Kubernetes** | Velero + Restic, Rook Ceph mirror | Velero, Rook |
### 5. Workload migration
#### Wave migration (recommended for medium/large DCs)
```mermaid
gantt
title Wave migration
dateFormat YYYY-MM-DD
section Wave 1 - Core
DNS, NTP, LDAP :done, w1a, 2026-07-01, 3d
Monitoring + logging :done, w1b, after w1a, 2d
section Wave 2 - Network
Load balancers :active, w2a, 2026-07-06, 2d
Firewalls :active, w2b, 2026-07-08, 2d
section Wave 3 - Storage
NAS migration :w3a, 2026-07-10, 5d
SAN replication :w3b, 2026-07-10, 3d
section Wave 4 - Dev/Test
Dev VMs :w4a, 2026-07-15, 5d
section Wave 5 - Prod tier 3
Internal apps :w5a, 2026-07-22, 5d
section Wave 6 - Prod tier 2
Business apps :w6a, 2026-07-29, 5d
section Wave 7 - Prod tier 1
Critical apps :w7a, 2026-08-05, 5d
```
#### Typical single wave procedure:
1. **Day -7**: Start data replication (initial seed)
2. **Day -1**: Incremental sync, final test
3. **Day 0 (cutover)**:
- Stop application in source DC
- Final sync (last delta)
- Start application in target DC
- DNS/Traffic switch
- Smoke test
4. **Day +1**: Monitoring (performance, errors, lag)
5. **Day +7**: End of rollback window (migration confirmed successful)
### 6. Network strategies
#### IP re-addressing
| Approach | Description | Pros | Cons |
|---------|-------|--------|----------|
| **Keep IP** | Same IPs, BGP anycast or stretch VLAN | No application config changes | Stretched VLAN/L2 limitations |
| **Change IP** | New IP range, DNS/BGP routing change | Clean architecture | Config changes, DNS TTL |
| **NAT translation** | NAT between old and new IP space | No application changes | Latency, troubleshooting complexity |
**Keep IP** is only possible with:
- L2 stretch between DCs (VXLAN, OTV) — distance limited
- BGP anycast for VIPs (load balancers)
- Applications tolerant to ARP cache changes
#### DNS cutover
```
1. Lower TTL to 60–300 s (one week ahead)
2. At cutover, change A/AAAA records to new IPs
3. Wait for propagation (per TTL)
4. Monitor traffic
```
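The timing behind these steps is simple arithmetic. The hypothetical helper below computes the latest moment at which the TTL must be lowered and the time by which caches should have converged after the cutover, assuming resolvers honor TTLs (some do not, which is one reason to lower the TTL well in advance).

```python
from datetime import datetime, timedelta


def dns_cutover_schedule(cutover, old_ttl_s, low_ttl_s=300):
    """Return (latest time to lower the TTL, time by which caches converge)."""
    # Resolvers may keep the old record for up to old_ttl_s after the TTL is
    # lowered, so the low TTL must be published at least that long beforehand.
    lower_ttl_by = cutover - timedelta(seconds=old_ttl_s)
    # After the A/AAAA change, a cached answer lives at most low_ttl_s longer.
    converged_by = cutover + timedelta(seconds=low_ttl_s)
    return lower_ttl_by, converged_by


cutover = datetime(2026, 7, 4, 22, 0)
lower_by, converged_by = dns_cutover_schedule(cutover, old_ttl_s=86_400, low_ttl_s=300)
print("Lower TTL no later than:", lower_by)
print("Old record gone from caches by:", converged_by)
```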
#### Traffic steering
| Technique | Use case |
|----------|----------|
| **BGP** | Change AS path / local pref for traffic steering |
| **DNS** | Lower TTL, change A records |
| **Load balancer** | Change pool members, health check |
| **GSLB** | Global Server Load Balancing (F5 GTM, NSX ALB) |
| **Cloud DNS** | AWS Route53, Azure Traffic Manager, Google Cloud DNS |
### 7. Database migration
See individual DB files for details. Summary table:
| DB | Method | RPO | RTO | Note |
|----|--------|-----|-----|----------|
| **PostgreSQL** | Streaming replication + Patroni switchover | 0 (sync) / ~MB (async) | min | Patroni auto-failover |
| **MySQL** | Group Replication / async replication | 0 (sync) / seconds | min | InnoDB Cluster |
| **Oracle** | Data Guard switchover | 0 (sync) | min | Far sync for remote DCs |
| **MSSQL** | AlwaysOn AG failover | 0 (sync) | min | Cloud witness |
| **MongoDB** | Replica set election | seconds | < 1 min | Priority-based failover |
| **Cassandra** | Multi-DC replication | eventual | 0 | Native multi-master |
### 8. Testing
| Phase | What to test | Method |
|------|-------------|--------|
| **Pre-migration** | Application in new DC (isolated) | Dry run on replicated data |
| **Cutover** | Functionality, availability, latency | Smoke test, synthetic transactions |
| **Post-migration** | Performance, integration, monitoring | A/B comparison with baseline, canary traffic |
| **Rollback** | Return to old DC | Tested rollback plan |
### 9. Rollback plan
Each wave must have a defined rollback:
| Condition | Action |
|----------|------|
| Application fails to start in new DC | DNS switch back, stop replication |
| Performance worse than baseline (> 20 %) | Rollback, root cause analysis |
| Integration failure (API timeout, DB connection) | Rollback, dependency check |
| Security incident | Rollback, forensic analysis |
Rollback must be tested **before** the real cutover.
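The rollback conditions in the table can be turned into an explicit go/no-go check. This is a sketch with assumed inputs: the `WaveHealth` fields, the p95 latency metric, and the `rollback_reasons` function are hypothetical, and only the 20 % threshold comes from the table above.

```python
from dataclasses import dataclass


@dataclass
class WaveHealth:
    app_started: bool
    baseline_p95_ms: float
    current_p95_ms: float
    integration_errors: int
    security_incident: bool = False


def rollback_reasons(health, max_degradation=0.20):
    """Return the list of triggered rollback conditions (empty list = stay)."""
    reasons = []
    if not health.app_started:
        reasons.append("application failed to start")
    if health.current_p95_ms > health.baseline_p95_ms * (1 + max_degradation):
        reasons.append("performance worse than baseline")
    if health.integration_errors > 0:
        reasons.append("integration failure")
    if health.security_incident:
        reasons.append("security incident")
    return reasons


healthy = WaveHealth(True, baseline_p95_ms=100, current_p95_ms=115, integration_errors=0)
degraded = WaveHealth(True, baseline_p95_ms=100, current_p95_ms=130, integration_errors=2)
print(rollback_reasons(healthy))
print(rollback_reasons(degraded))
```

Agreeing on such thresholds before the cutover removes the debate during the incident.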
---
## Special cases
### Mainframe migration
- **IBM z/OS** — GDPS (Geographically Dispersed Parallel Sysplex)
- HyperSwap for storage mirroring
- Cross-system coupling facility (XCF)
- Often the last migrated component
### COTS applications (Oracle EBS, SAP)
- Require vendor-specific migration procedures
- Oracle EBS: Autoconfig, cloning (ADXLC)
- SAP: System Copy (Homogeneous / Heterogeneous), SWPM, SUM
- Re-licensing may be required when the hardware changes
### Cloud migration (On-prem → Cloud)
See [CLOUD.en.md](CLOUD.en.md) — migration strategies (6 Rs):
| Strategy | Description |
|-----------|-------|
| **Re-host (Lift & Shift)** | VM → Cloud VM (AWS MGN, Azure Migrate) |
| **Re-platform** | OS upgrade, managed DB (RDS, Cloud SQL) |
| **Re-architect** | Application rewritten as cloud-native |
| **Retire** | Decommission unnecessary applications |
| **Retain** | Application stays on-prem (review later) |
| **Repurchase** | SaaS replacement |
---
## Recommended approach per DC size
| DC Size | VM Count | Recommended strategy | Duration | Team |
|-------------|----------|---------------------|-------------|-----|
| **Small** | < 50 | Big Bang (weekend) | 2–4 days | 3–5 people |
| **Medium** | 50–500 | Phased (5–10 waves) | 2–8 weeks | 5–10 people |
| **Large** | 500–5000 | Phased + Rolling | 3–12 months | 10–30 people |
| **Enterprise** | 5000+ | Parallel Run / Rolling | 12–36 months | 30+ people |
---
## Related
- [DATACENTERS.en.md](DATACENTERS.en.md) — DC topologies, secondary DC, deployment order
- [CLOUD.en.md](CLOUD.en.md) — cloud migration strategies (6 Rs)
- [DR.en.md](DR.en.md) — disaster recovery, RTO/RPO
- [NETWORKING.en.md](NETWORKING.en.md) — BGP, DNS, VXLAN, traffic steering
- [STORAGE.en.md](STORAGE.en.md) — storage replication
## Sources
Links, books, and standards: [sources/infrastructure/sources.en.md](sources/infrastructure/sources.en.md)
*Last revision: 2026-06-12*

246
DC-MIGRATION.md Normal file
View File

@@ -0,0 +1,246 @@
# 🏗️ Data center migration
## Migration strategies
| Strategy | RTO | RPO | Risk | Cost | Duration | Description |
|----------|-----|-----|------|------|----------|-------------|
| **Cold / Big Bang** | hours–days | days | High | Low | days | Shut everything down at once, move, power up |
| **Phased / Wave** | minutes (per wave) | minutes | Medium | Medium | weeks–months | Workloads moved in waves |
| **Rolling** | 0 (live) | 0 | Low | High | months | Live migration per VM/service |
| **Parallel Run** | 0 | 0 | Low | Very high | months | Both DCs in operation, gradual transition |
| **Pilot Light** | hours | minutes | Medium | Low | weeks | Critical services in the new DC, the rest follows |
| **Lift & Shift** | hours | minutes | Medium | Low | weeks | VMs/servers moved without configuration changes |
| **Re-platform** | hours | minutes | Low | Medium | months | Optimization during migration (OS upgrade, resize) |
| **Re-architect** | 0 | 0 | Low | High | months–years | Application redesigned for the new platform |
---
## Decision tree
```mermaid
flowchart TD
Start(["DC migration"]) --> APP{"Application\nstateful?"}
APP -->|"Yes"| DOWNTIME{"Tolerates\ndowntime?"}
APP -->|"No"| ROLLING["Rolling / Parallel Run"]
DOWNTIME -->|"Yes, hours+"| COLD["Cold / Big Bang\nSimplest, cheapest\nRisk: all at once"]
DOWNTIME -->|"Yes, minutes"| PHASED["Phased / Wave\nBy application / business unit"]
DOWNTIME -->|"No (zero downtime)"| SYNC{"Sync replication\npossible?"}
SYNC -->|"Yes, < 100 km"| ROLLING
SYNC -->|"No"| PARALLEL["Parallel Run\nBoth DCs active, gradual cutover"]
ROLLING --> ROLL_HA{"VMware,\nHyper-V?"}
ROLL_HA -->|"Yes"| VMOTION["vMotion / Storage vMotion\nLive migration, 0 downtime"]
ROLL_HA -->|"No"| ROLL_REPL["Storage + DB replication\nGradual workload migration"]
```
---
## Migration phases
### 1. Discovery and assessment
| Task | Tools | Output |
|------|-------|--------|
| HW and SW inventory | RVTools, NetBox, CMDB | List of all servers, VMs, and services |
| Dependency mapping | ServiceNow, AppDynamics, manual | Application dependency graph |
| Traffic analysis | NetFlow, sFlow, vRNI | Bandwidth, latency, peak usage |
| Performance baseline | Prometheus, Zabbix, vRealize | CPU/RAM/disk/network per workload |
| License audit | Flexera, SAM | Licenses, support, compliance |
**Output:** a workload list with RTO/RPO, dependencies, and criticality. Without it, the migration cannot be planned.
### 2. Planning
- **Wave plan**: workloads divided into migration waves (10–50 VMs per wave)
- **Dependency ordering**: DNS, NTP, LDAP, and PKI must come first
- **Cutover window**: time window for the switch (typically a weekend)
- **Rollback plan**: conditions and procedure for reverting
- **Test plan**: what to test after migration, and how
- **Communication plan**: who is informed, when, and how
### 3. New DC preparation
- **Infrastructure**: DNS, NTP, DHCP, LDAP/AD, PKI, monitoring (see [DATACENTERS.md](DATACENTERS.md), deployment order)
- **Network**: BGP peering, VXLAN/VLAN, firewall rules, load balancers
- **Storage**: SAN zoning, NAS exports, Ceph cluster
- **Virtualization**: vCenter, Hyper-V cluster, Proxmox
### 4. Replication and synchronization
| Layer | Method | Tools |
|-------|--------|-------|
| **Storage (block)** | SAN sync/async mirror, LUN replication | NetApp SnapMirror, Dell EMC RecoverPoint, Pure ActiveCluster |
| **Storage (file)** | DFS-R, rsync, robocopy | Windows DFS, Rsync |
| **Storage (object)** | Cross-region replication | MinIO replication, S3 CRR |
| **Databases** | Log shipping, CDC, streaming replication | PostgreSQL Patroni, Oracle Data Guard, MSSQL AlwaysOn, MySQL Group Replication |
| **VM** | Storage vMotion, replication | VMware vSphere Replication, Hyper-V Replica, Zerto |
| **Kubernetes** | Velero + Restic, Rook Ceph mirror | Velero, Rook |
### 5. Workload migration
#### Wave migration (recommended for medium and larger DCs)
```mermaid
gantt
title Wave migration
dateFormat YYYY-MM-DD
section Wave 1 - Core
DNS, NTP, LDAP :done, w1a, 2026-07-01, 3d
Monitoring + logging :done, w1b, after w1a, 2d
section Wave 2 - Network
Load balancers :active, w2a, 2026-07-06, 2d
Firewalls :active, w2b, 2026-07-08, 2d
section Wave 3 - Storage
NAS migration :w3a, 2026-07-10, 5d
SAN replication :w3b, 2026-07-10, 3d
section Wave 4 - Dev/Test
Dev VMs :w4a, 2026-07-15, 5d
section Wave 5 - Prod tier 3
Internal apps :w5a, 2026-07-22, 5d
section Wave 6 - Prod tier 2
Business apps :w6a, 2026-07-29, 5d
section Wave 7 - Prod tier 1
Critical apps :w7a, 2026-08-05, 5d
```
#### Typical single wave procedure:
1. **Day -7**: Start data replication (initial seed)
2. **Day -1**: Incremental sync, final test
3. **Day 0 (cutover)**:
- Stop the application in the source DC
- Final sync (last delta)
- Start the application in the target DC
- DNS/traffic switch
- Smoke test
4. **Day +1**: Monitoring (performance, errors, lag)
5. **Day +7**: End of rollback window (migration confirmed successful)
### 6. Network strategies
#### IP re-addressing
| Approach | Description | Pros | Cons |
|----------|-------------|------|------|
| **Keep IP** | Same IPs, BGP anycast or stretched VLAN | No application configuration changes | Stretched VLAN/L2 limitations |
| **Change IP** | New IP range, DNS/BGP routing change | Clean architecture | Configuration changes, DNS TTL |
| **NAT translation** | NAT between the old and new IP space | No application changes | Latency, troubleshooting complexity |
**Keep IP** is only possible with:
- L2 stretch between DCs (VXLAN, OTV), which is limited by distance
- BGP anycast for VIPs (load balancers)
- Applications that tolerate ARP cache changes
#### DNS cutover
```
1. Lower TTL to 60–300 s (one week ahead)
2. At cutover, change A/AAAA records to the new IPs
3. Wait for propagation (per TTL)
4. Monitor traffic
```
#### Traffic steering
| Technique | Use case |
|-----------|----------|
| **BGP** | Change AS path / local pref to redirect traffic |
| **DNS** | Lower TTL, change A records |
| **Load balancer** | Change pool members, health check |
| **GSLB** | Global Server Load Balancing (F5 GTM, NSX ALB) |
| **Cloud DNS** | AWS Route53, Azure Traffic Manager, Google Cloud DNS |
### 7. Database migration
See the individual DB files for details. Summary table:
| DB | Method | RPO | RTO | Note |
|----|--------|-----|-----|------|
| **PostgreSQL** | Streaming replication + Patroni switchover | 0 (sync) / ~MB (async) | min | Patroni auto-failover |
| **MySQL** | Group Replication / async replication | 0 (sync) / seconds | min | InnoDB Cluster |
| **Oracle** | Data Guard switchover | 0 (sync) | min | Far sync for remote DCs |
| **MSSQL** | AlwaysOn AG failover | 0 (sync) | min | Cloud witness |
| **MongoDB** | Replica set election | seconds | < 1 min | Priority-based failover |
| **Cassandra** | Multi-DC replication | eventual | 0 | Native multi-master |
### 8. Testing
| Phase | What to test | Method |
|-------|--------------|--------|
| **Pre-migration** | Application in the new DC (isolated) | Dry run on replicated data |
| **Cutover** | Functionality, availability, latency | Smoke test, synthetic transactions |
| **Post-migration** | Performance, integration, monitoring | A/B comparison with baseline, canary traffic |
| **Rollback** | Return to the old DC | Tested rollback plan |
### 9. Rollback plan
Each wave must have a defined rollback:
| Condition | Action |
|-----------|--------|
| Application fails to start in the new DC | Switch DNS back, stop replication |
| Performance worse than baseline (by > 20 %) | Rollback, root cause analysis |
| Integration failure (API timeout, DB connection) | Rollback, dependency check |
| Security incident | Rollback, forensic analysis |
Rollback should be tested **before** the real cutover.
---
## Special cases
### Mainframe migration
- **IBM z/OS**: GDPS (Geographically Dispersed Parallel Sysplex)
- HyperSwap for storage mirroring
- Cross-system coupling facility (XCF)
- Often the last component to be migrated
### COTS applications (Oracle EBS, SAP)
- Require vendor-specific migration procedures
- Oracle EBS: Autoconfig, cloning (ADXLC)
- SAP: System Copy (Homogeneous / Heterogeneous), SWPM, SUM
- Re-licensing may be required when the hardware changes
### Cloud migration (On-prem → Cloud)
See [CLOUD.md](CLOUD.md) for migration strategies (6 Rs):
| Strategy | Description |
|----------|-------------|
| **Re-host (Lift & Shift)** | VM → Cloud VM (AWS MGN, Azure Migrate) |
| **Re-platform** | OS upgrade, managed DB (RDS, Cloud SQL) |
| **Re-architect** | Application rewritten as cloud-native |
| **Retire** | Decommission applications that are no longer needed |
| **Retain** | Application stays on-prem (review later) |
| **Repurchase** | SaaS replacement |
---
## Recommended approach per DC size
| DC size | VM count | Recommended strategy | Duration | Team |
|---------|----------|----------------------|----------|------|
| **Small** | < 50 | Big Bang (weekend) | 2–4 days | 3–5 people |
| **Medium** | 50–500 | Phased (5–10 waves) | 2–8 weeks | 5–10 people |
| **Large** | 500–5000 | Phased + Rolling | 3–12 months | 10–30 people |
| **Enterprise** | 5000+ | Parallel Run / Rolling | 12–36 months | 30+ people |
---
## Related
- [DATACENTERS.md](DATACENTERS.md): DC topologies, secondary DC, deployment order
- [CLOUD.md](CLOUD.md): cloud migration strategies (6 Rs)
- [DR.md](DR.md): disaster recovery, RTO/RPO
- [NETWORKING.md](NETWORKING.md): BGP, DNS, VXLAN, traffic steering
- [STORAGE.md](STORAGE.md): storage replication
## Sources
Links, books, and standards: [sources/infrastructure/sources.md](sources/infrastructure/sources.md)
*Last revision: 2026-06-12*

336
DR.en.md Normal file
View File

@@ -0,0 +1,336 @@
# 🔄 Disaster Recovery and Business Continuity
## Terminology
| Abbreviation | Meaning | Description |
|---------|--------|-------|
| **RTO** | Recovery Time Objective | Maximum time from outage to service recovery |
| **RPO** | Recovery Point Objective | Maximum acceptable data loss (time since last backup) |
| **MTD** | Maximum Tolerable Downtime | Total outage duration an organization can survive |
| **WRT** | Work Recovery Time | Time needed for full operations recovery after IT restoration |
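| **MTBF** | Mean Time Between Failures | Average operating time between two consecutive failures of a repairable component |
| **MTTR** | Mean Time To Repair | Average time needed to repair a failed component and return it to service |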
| **SLA** | Service Level Agreement | Contractual availability commitment |
| **SLO** | Service Level Objective | Internal availability target |
| **SLI** | Service Level Indicator | Measured availability value |
### Relationship between RTO, RPO, MTD, WRT
```
Last backup ──── RPO ────► Outage ──── RTO ────► Service running ──── WRT ────► Full operations
                  │                     │                              │
                  ▼                     ▼                              ▼
              Lost data       Time without service           Time to full capacity
MTD = RTO + WRT (max. time the business tolerates)
```
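
The relationship MTD = RTO + WRT can be checked mechanically when recovery targets are agreed with the business. The following is a minimal sketch; the function names and the example figures (4 h RTO, 2 h WRT, 8 h MTD) are illustrative assumptions, not values from this document.

```python
from datetime import timedelta


def fits_mtd(rto: timedelta, wrt: timedelta, mtd: timedelta) -> bool:
    """Return True when RTO + WRT stays within the maximum tolerable downtime."""
    return rto + wrt <= mtd


def wrt_budget(rto: timedelta, mtd: timedelta) -> timedelta:
    """Time left for work recovery once the RTO has been spent (never negative)."""
    return max(mtd - rto, timedelta(0))


# Hypothetical targets: 4 h RTO, 2 h WRT, 8 h MTD.
print(fits_mtd(timedelta(hours=4), timedelta(hours=2), timedelta(hours=8)))
print(wrt_budget(timedelta(hours=4), timedelta(hours=8)))
```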
---
## Uptime calculation
### Nines table
| Level | Uptime | Downtime / year | Downtime / month | Downtime / week |
|--------|--------|---------------|------------------|------------------|
| 90 % (one nine) | 0.9 | 36.5 days | 72 h | 16.8 h |
| 99 % (two nines) | 0.99 | 3.65 days | 7.2 h | 1.68 h |
| 99.5 % | 0.995 | 1.83 days | 3.6 h | 50.4 min |
| 99.9 % (three nines) | 0.999 | 8.76 h | 43.2 min | 10.1 min |
| 99.95 % | 0.9995 | 4.38 h | 21.6 min | 5.04 min |
| 99.99 % (four nines) | 0.9999 | 52.6 min | 4.32 min | 1.01 min |
| 99.995 % | 0.99995 | 26.3 min | 2.16 min | 30.2 s |
| 99.999 % (five nines) | 0.99999 | 5.26 min | 25.9 s | 6.05 s |
| 99.9999 % (six nines) | 0.999999 | 31.5 s | 2.59 s | 0.605 s |
### Calculation
```
Availability = (Total time - Downtime) / Total time × 100 %
Example:
Year = 365 × 24 × 60 = 525,600 minutes
Target: 99.9 % → allowed downtime = 525,600 × (1 - 0.999) = 525.6 minutes = 8.76 h
Combined availability (chain of dependencies):
A_web = 99.9 % (3 nines)
A_api = 99.99 % (4 nines)
A_db = 99.999 % (5 nines)
A_total = 0.999 × 0.9999 × 0.99999 = 0.99889 ≈ 99.89 % (less than 3 nines!)
Parallel availability (redundancy):
A_total = 1 - (1 - A_1) × (1 - A_2) × ... × (1 - A_n)
Example: 2 servers with 99% availability
A_total = 1 - (1-0.99) × (1-0.99) = 1 - 0.01 × 0.01 = 0.9999 (99.99 %)
```
### Calculator
```python
def uptime_percent_to_downtime(pct, period_days=365):
    """Convert uptime percentage (e.g. 99.9) to allowed downtime in minutes."""
    total_minutes = period_days * 24 * 60
    allowed_downtime = total_minutes * (1 - pct / 100)
    return allowed_downtime  # minutes


def downtime_to_uptime_percent(downtime_minutes, period_days=365):
    """Convert downtime in minutes to uptime percentage."""
    total_minutes = period_days * 24 * 60
    return (1 - downtime_minutes / total_minutes) * 100


def combined_availability(availabilities):
    """Combined availability of series-connected components (fractions, 0-1)."""
    result = 1.0
    for a in availabilities:
        result *= a
    return result


def redundant_availability(availabilities):
    """Availability of redundant (parallel) components (fractions, 0-1)."""
    result = 1.0
    for a in availabilities:
        result *= (1 - a)
    return 1 - result
```
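
To cross-check rows of the nines table, the sketch below recomputes the allowed downtime per year, month and week and formats it with a readable unit. It assumes a 365-day year, a 30-day month and a 7-day week; the helper names `downtime_seconds` and `humanize` are made up for this example.

```python
def downtime_seconds(uptime_pct: float, period_days: float) -> float:
    """Allowed downtime in seconds for an uptime percentage over a period."""
    return period_days * 24 * 3600 * (1 - uptime_pct / 100)


def humanize(seconds: float) -> str:
    """Pick a readable unit (days, h, min, s) and round to 3 significant digits."""
    for unit, size in (("days", 86400), ("h", 3600), ("min", 60)):
        if seconds >= size:
            return f"{seconds / size:.3g} {unit}"
    return f"{seconds:.3g} s"


# Assumed periods: 365-day year, 30-day month, 7-day week.
for pct in (99.0, 99.9, 99.99, 99.999):
    row = [humanize(downtime_seconds(pct, days)) for days in (365, 30, 7)]
    print(f"{pct} %: " + " | ".join(row))
```

Working in seconds avoids switching units by hand, which is where rounding slips in such tables usually come from.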
### Calculation fallacies
- **Combined availability is not a sum** — adding another dependency always reduces total availability
- **Redundancy is not free** — adding a standby component requires failure detection + failover (MTTR does not improve automatically)
- **SLA is not a guarantee** — providers often calculate SLA as a monthly average, not per-incident
- **Measurement is key** — without SLI, SLO cannot be verified; "unmeasured availability does not exist"
- **Planned maintenance** — sometimes counted as uptime, sometimes not (depends on SLA definition)
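
The first two fallacies can be illustrated numerically. This is a sketch under a simplifying assumption: an active/standby pair is modelled with a `failover_success` probability (the chance that a failure is detected and the switch to the standby works), which is a common textbook coverage model and not something defined in this document.

```python
def series(*parts: float) -> float:
    """Availability of a dependency chain: every part must be up."""
    total = 1.0
    for a in parts:
        total *= a
    return total


def active_standby(primary: float, standby: float, failover_success: float = 1.0) -> float:
    """Availability of an active/standby pair.

    failover_success is the assumed probability that a failure is detected
    and the switch to the standby actually works (1.0 = ideal failover).
    """
    return primary + (1 - primary) * failover_success * standby


chain = series(0.999, 0.9999, 0.99999)         # web -> api -> db
ideal_pair = active_standby(0.99, 0.99)        # perfect failover
flaky_pair = active_standby(0.99, 0.99, 0.9)   # 1 in 10 failovers fails
print(f"{chain:.5f} {ideal_pair:.4f} {flaky_pair:.5f}")
```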
---
## DR scenarios
### Classification
| Category | Scenario | Typical RTO | Typical RPO | Frequency |
|-----------|--------|-------------|-------------|-----------|
| **Site** | Entire DC / region outage | hours | minutes | Low |
| **Infrastructure** | HW failure (storage, switch, server) | minutes–hours | seconds | Medium |
| **Software** | OS, application, DB failure | minutes | seconds | High |
| **Data** | Data corruption, deletion, cryptolocker | hours | backup point | Low–medium |
| **Human** | Wrong deployment, config change | minutes–hours | seconds | Medium |
| **Security** | Attack, breach, ransomware | days | before attack | Low |
| **Network** | Connectivity outage, DDoS | minutes–hours | N/A | Medium |
| **Cloud provider** | Regional outage (AWS, Azure, GCP) | hours | minutes | Very low |
### Scenario details
#### Site / Region failure
| Aspect | Description |
|--------|-------|
| **Cause** | Blackout, fire, flood, earthquake, cloud provider outage |
| **Prevention** | Multi-AZ architecture, multi-region deployment, active-active |
| **Mitigation** | Automatic DNS failover (Route53, Azure Traffic Manager), replica in DR region |
| **Testing** | Game day: shut down primary region, verify automatic failover |
#### Data corruption / human error
| Aspect | Description |
|--------|-------|
| **Cause** | Wrong SQL command (DELETE without WHERE), accidentally deleted bucket, bad migration |
| **Prevention** | RBAC, MFA for destructive operations, change management, SQL peer review |
| **Mitigation** | Point-in-time recovery (PITR), transaction log replay, immutable backups |
| **Testing** | Restore backup to isolated environment, verify data integrity |
#### Ransomware / cyber attack
| Aspect | Description |
|--------|-------|
| **Cause** | Attack on production systems, data encryption, exfiltration |
| **Prevention** | Immutable backups (object lock), air-gapped backups, network segmentation |
| **Mitigation** | Restore from clean backup, rebuild infrastructure from IaC |
| **Testing** | Regular restore in isolated network, verify backup is not infected |
---
## Prevention — strategies
### Backup strategies
| Approach | Description | Use case |
|---------|-------|----------|
| **3-2-1 rule** | 3 copies, 2 different media, 1 off-site | Universal |
| **3-2-1-0** | + 0 errors after restore (testing) | Enterprise, compliance |
| **GFS (Grandfather-Father-Son)** | Daily, weekly, monthly rotation | Long-term archive |
| **Incremental forever** | Full backup 1×, then only changes | Large data volumes |
| **Reverse incremental** | Each run merges changes into the full backup and keeps the replaced blocks as reverse deltas, so the newest restore point is always a full | Fast recovery |
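
The 3-2-1 rule is easy to turn into an automated check over a backup inventory. The sketch below assumes the production copy counts as one of the three copies; the `BackupCopy` type and the example inventory are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    name: str
    media: str      # e.g. "disk", "tape", "object-storage"
    offsite: bool   # stored outside the primary site


def satisfies_3_2_1(copies: list[BackupCopy]) -> bool:
    """True if there are >= 3 copies on >= 2 media types with >= 1 off-site."""
    media_types = {c.media for c in copies}
    return (
        len(copies) >= 3
        and len(media_types) >= 2
        and any(c.offsite for c in copies)
    )


# Hypothetical inventory: production data, a local NAS copy, a cloud bucket.
copies = [
    BackupCopy("production", "disk", offsite=False),
    BackupCopy("local-nas", "disk", offsite=False),
    BackupCopy("cloud-bucket", "object-storage", offsite=True),
]
print(satisfies_3_2_1(copies))
```

The "0 errors" part of 3-2-1-0 cannot be derived from an inventory; it needs the result of an actual restore test.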
### Backup methods
| Method | RPO | RTO | Storage | Suitable for |
|--------|-----|-----|----------|------------|
| **Full backup** | Last full | Full restore time | Large | Small data, weekly |
| **Incremental** | Last incremental | Full + all incrementals | Small | Large data, daily |
| **Differential** | Last diff | Full + last diff | Medium | Compromise |
| **Snapshot** | Snapshot point-in-time | seconds | Copy-on-write | VM, storage array |
| **Continuous (CDC)** | < 1 s | Seconds | Log stream | DB (binlog, WAL) |
| **PITR** | Any point in time | Depends on volume | Full + WAL | RDS, PostgreSQL, SQL Server |
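
The RTO column for incremental and differential backups follows from which files a restore has to read. A minimal sketch, assuming a schedule that uses either incrementals or differentials after each full (not both); the `Backup` type and day indices are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backup:
    day: int    # day index on which the backup was taken
    kind: str   # "full", "incremental" or "differential"


def restore_chain(backups: list[Backup], target_day: int) -> list[Backup]:
    """Return the backups needed, in order, to restore the state of target_day."""
    usable = sorted((b for b in backups if b.day <= target_day), key=lambda b: b.day)
    fulls = [b for b in usable if b.kind == "full"]
    if not fulls:
        raise ValueError("no full backup at or before the target day")
    base = fulls[-1]
    later = [b for b in usable if b.day > base.day]
    diffs = [b for b in later if b.kind == "differential"]
    if diffs:
        # A differential holds every change since the last full backup,
        # so only the newest one is needed.
        return [base, diffs[-1]]
    # Incremental chain: every increment since the full must be replayed.
    return [base, *[b for b in later if b.kind == "incremental"]]


incremental = [Backup(0, "full")] + [Backup(d, "incremental") for d in (1, 2, 3)]
differential = [Backup(0, "full")] + [Backup(d, "differential") for d in (1, 2, 3)]
print(len(restore_chain(incremental, 3)), len(restore_chain(differential, 3)))
```

The incremental chain grows with every day since the last full, which is why its restore time is listed as "Full + all incrementals".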
### Backup immutability
Key protection against ransomware:
| Technique | Description |
|----------|-------|
| **Object Lock (WORM)** | Backup cannot be deleted or overwritten for a defined retention period (S3 Object Lock, Azure Blob Immutable) |
| **Air gap** | Backup is physically separated from the production network (offline disk, tape, cloud without VPN) |
| **Isolated backup network** | Backup traffic goes through a dedicated network without access from production VLAN |
| **Out-of-band access** | Backup management console is not accessible from the production network |
---
## DR architectures
### Multi-AZ (Single region)
```
Region ┌────────────────────────────────────┐
       │       AZ-1             AZ-2        │
       │   ┌──────────┐     ┌──────────┐    │
       │   │   App    │     │   App    │    │
       │   └─────┬────┘     └─────┬────┘    │
       │         │                │         │
       │   ┌─────▼────────────────▼─────┐   │
       │   │  Load Balancer (cross-AZ)  │   │
       │   └─────────────┬──────────────┘   │
       │                 │                  │
       │   ┌─────────────▼──────────────┐   │
       │   │  DB Primary (AZ-1)         │   │
       │   │  DB Standby (AZ-2)         │   │
       │   │  Synchronous replication   │   │
       │   └────────────────────────────┘   │
       └────────────────────────────────────┘
```
- RTO: minutes (automatic failover)
- RPO: 0 (sync replication)
- Protection: against AZ failure, not region failure
### Multi-Region
```
  Region A (Primary)                      Region B (DR)
┌─────────────────────┐              ┌─────────────────────┐
│  ┌───────────────┐  │              │  ┌───────────────┐  │
│  │   App + DB    │  │              │  │   App + DB    │  │
│  │    Active     │──┼──Async───────┼─►│    Standby    │  │
│  └───────────────┘  │ replication  │  └───────────────┘  │
│         │           │              │         │           │
│  ┌──────▼───────┐   │              │  ┌──────▼───────┐   │
│  │  DNS / GSLB  │   │              │  │  DNS / GSLB  │   │
│  └──────┬───────┘   │              │  └──────┬───────┘   │
└─────────┼───────────┘              └─────────┼───────────┘
          │                                    │
          └──────────── Traffic Manager ───────┘
```
| Variant | RTO | RPO | Cost | Failover |
|----------|-----|-----|---------|----------|
| **Active-Passive** | minutes–hours | seconds | Medium | Manual / auto |
| **Active-Active** | seconds | < 1 s | High | Automatic (DNS) |
| **Pilot Light** | tens of minutes | minutes | Low | Manual scaling |
| **Warm Standby** | minutes | seconds | High | Auto (reduced copy) |
| **Backup & Restore** | hours | 24 h | Low | Manual |
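
Choosing a variant usually means taking the cheapest one that still meets the required RTO and RPO. The sketch below shows that selection. The numeric bounds are illustrative assumptions chosen to represent the qualitative entries in the table (for example "tens of minutes" is taken as 1 h); they are not measured values. Ties on cost are resolved by list order.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Variant:
    name: str
    rto_s: float   # assumed worst-case RTO in seconds
    rpo_s: float   # assumed worst-case RPO in seconds
    cost: int      # 1 = low, 2 = medium, 3 = high (from the Cost column)


# Numeric bounds are illustrative assumptions, not values from the table.
VARIANTS = [
    Variant("Backup & Restore", 24 * 3600, 24 * 3600, 1),
    Variant("Pilot Light", 3600, 15 * 60, 1),
    Variant("Active-Passive", 4 * 3600, 60, 2),
    Variant("Warm Standby", 15 * 60, 60, 3),
    Variant("Active-Active", 60, 1, 3),
]


def cheapest_variant(required_rto_s: float, required_rpo_s: float) -> Optional[Variant]:
    """Cheapest variant whose assumed RTO and RPO meet the requirement."""
    ok = [v for v in VARIANTS if v.rto_s <= required_rto_s and v.rpo_s <= required_rpo_s]
    # min() returns the first minimum, so list order breaks cost ties.
    return min(ok, key=lambda v: v.cost) if ok else None


choice = cheapest_variant(required_rto_s=2 * 3600, required_rpo_s=30 * 60)
print(choice.name if choice else "no variant meets the requirement")
```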
### On-prem → Cloud DR (Hybrid)
```
      On-prem DC                           Cloud (DR)
┌─────────────────────┐              ┌─────────────────────┐
│  ┌───────────────┐  │              │  ┌───────────────┐  │
│  │  Application  │  │              │  │   VM / App    │  │
│  │  + DB         │  │              │  │  + DB replica │  │
│  └───────┬───────┘  │              │  └───────┬───────┘  │
│          │          │              │          │          │
│  ┌───────▼───────┐  │  site-to-site│  ┌───────▼───────┐  │
│  │ Backup proxy  │──┼────VPN───────┼─►│ Backup store  │  │
│  └───────────────┘  │              │  └───────────────┘  │
│                     │              │                     │
│  ┌───────────────┐  │              │  ┌───────────────┐  │
│  │  Tape / NAS   │  │              │  │ Veeam / Zerto │  │
│  └───────────────┘  │              │  └───────────────┘  │
└─────────────────────┘              └─────────────────────┘
```
- **RTO**: tens of minutes (depends on VM startup)
- **RPO**: minutes–hours (depends on replication tool)
- **Tools**: Veeam, Zerto, Azure Site Recovery, AWS MGN, Commvault
- **Use case**: enterprise with on-prem DC that needs DR without a second DC
---
## DR testing
### Test types
| Type | Description | Frequency | Risk |
|-----|-------|-----------|--------|
| **Tabletop exercise** | Manual scenario walkthrough, no impact on production | Monthly | None |
| **Walkthrough** | Runbook verification, ensure everyone knows what to do | Quarterly | None |
| **Component test** | Test of a single component (e.g., restore one DB) | Monthly | Low |
| **Integrated test** | Test of the entire stack in isolated environment | Quarterly | Low |
| **Full failover test** | Production failover to DR site | Annually | High |
| **Chaos experiment** | Targeted fault injection into production | Continuous | Medium |
### Runbook structure
Each DR scenario should have a runbook:
```yaml
scenario: "Region A failure"
triggers:
  - "CloudWatch alarm: Region A health check 5× timeout"
  - "PagerDuty incident P0"
decision_tree: |
  1. Verify: is Region A really unavailable? (check from 3 different locations)
  2. Decide: is RTO at risk? If < 30 % RTO remaining → failover
  3. Failover: run playbook `dr-failover-region-b`
  4. Verification: smoke tests in Region B
  5. Communication: status page + stakeholders
rollback: |
  1. After Region A recovery → replicate changes from B back to A
  2. Repoint DNS to A
  3. Verify data consistency
  4. Shut down Region B (or keep as hot standby)
contacts:
  primary: "on-call@example.com"
  escalation: "infra-lead@example.com"
  management: "vp-engineering@example.com"
```
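
Step 2 of the decision tree ("less than 30 % of the RTO remaining") can be evaluated automatically from the outage start time. A minimal sketch; the function name, the default threshold parameter and the timestamps are assumptions for illustration, and the time the failover itself takes is not modelled.

```python
from datetime import datetime, timedelta


def should_fail_over(outage_start: datetime, now: datetime, rto: timedelta,
                     threshold: float = 0.30) -> bool:
    """True once less than `threshold` of the RTO budget is left."""
    elapsed = now - outage_start
    remaining_fraction = 1 - elapsed / rto  # timedelta / timedelta gives a float
    return remaining_fraction < threshold


# Hypothetical outage with a 60-minute RTO.
start = datetime(2026, 1, 1, 12, 0)
print(should_fail_over(start, start + timedelta(minutes=30), timedelta(minutes=60)))
print(should_fail_over(start, start + timedelta(minutes=45), timedelta(minutes=60)))
```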
---
## Best practices
- **Test recovery, not backup** — a backup without tested recovery is not a backup
- **Automate DR** — Terraform / Ansible for DR environment spin-up, DNS failover
- **Document runbooks** — every scenario, contact, decision tree
- **Expect failure** — design for failure, don't expect everything to work
- **Don't underestimate WRT** — service recovery does not mean full operations (data warming, cache, connections)
- **Align RTO/RPO with business** — technical capabilities must match business requirements
- **Monitor SLI** — without data, SLO cannot be verified
- **DR is not just IT** — communication, PR, legal, compliance
---
## Related
- [CLOUD.en.md](CLOUD.en.md) — cloud DR strategy, AWS/Azure/GCP specific
- [DATACENTERS.en.md](DATACENTERS.en.md) — DC redundancy, Tier classification
- [MONITORING.en.md](MONITORING.en.md) — alerting, SLI/SLO/SLA
- [CICD.en.md](CICD.en.md) — deployment strategy, rollback
- [STORAGE.en.md](STORAGE.en.md) — backup storage, replication
## Sources
Links, books and standards: [sources/infrastructure/sources.en.md](sources/infrastructure/sources.en.md)
*Last revised: 2026-06-11*

336
DR.md Normal file

@@ -0,0 +1,336 @@
# 🔄 Disaster Recovery a Business Continuity
## Terminologie
| Zkratka | Význam | Popis |
|---------|--------|-------|
| **RTO** | Recovery Time Objective | Maximální doba od výpadku do obnovení služby |
| **RPO** | Recovery Point Objective | Maximální přípustná ztráta dat (čas od poslední zálohy) |
| **MTD** | Maximum Tolerable Downtime | Celková doba výpadku, kterou organizace přežije |
| **WRT** | Work Recovery Time | Čas potřebný k plnému obnovení provozu po obnovení IT |
| **MTBF** | Mean Time Between Failures | Střední doba mezi poruchami |
| **MTTR** | Mean Time To Repair | Střední doba opravy |
| **SLA** | Service Level Agreement | Smluvní závazek dostupnosti |
| **SLO** | Service Level Objective | Interní cíl dostupnosti |
| **SLI** | Service Level Indicator | Naměřená hodnota dostupnosti |
### Vztah RTO, RPO, MTD, WRT
```
Poslední záloha ──── RPO ────► Výpadek ──── RTO ────► Služba běží ──── WRT ────► Plný provoz
                      │                      │                          │
                      ▼                      ▼                          ▼
                Ztracená data         Čas bez služby          Čas do plného výkonu
MTD = RTO + WRT (max. doba, kterou firma toleruje)
```
---
## Výpočet uptimu
### Tabulka devítek
| Úroveň | Uptime | Downtime / rok | Downtime / měsíc | Downtime / týden |
|--------|--------|---------------|------------------|------------------|
| 90 % (jedna devítka) | 0.9 | 36,5 dne | 72 h | 16,8 h |
| 99 % (dvě devítky) | 0.99 | 3,65 dne | 7,2 h | 1,68 h |
| 99,5 % | 0.995 | 1,83 dne | 3,6 h | 50,4 min |
| 99,9 % (tři devítky) | 0.999 | 8,76 h | 43,2 min | 10,1 min |
| 99,95 % | 0.9995 | 4,38 h | 21,6 min | 5,04 min |
| 99,99 % (čtyři devítky) | 0.9999 | 52,6 min | 4,32 min | 1,01 min |
| 99,995 % | 0.99995 | 26,3 min | 2,16 min | 30,2 s |
| 99,999 % (pět devítek) | 0.99999 | 5,26 min | 25,9 s | 6,05 s |
| 99,9999 % (šest devítek) | 0.999999 | 31,5 s | 2,59 s | 0,605 s |
### Výpočet
```
Dostupnost = (Celkový čas - Downtime) / Celkový čas × 100 %
Příklad:
Rok = 365 × 24 × 60 = 525 600 minut
Cíl: 99,9 % → povolený downtime = 525 600 × (1 - 0,999) = 525,6 minut = 8,76 h
Složená dostupnost (řetězec závislostí):
A_web = 99,9 % (3 devítky)
A_api = 99,99 % (4 devítky)
A_db = 99,999 % (5 devítek)
A_celkem = 0,999 × 0,9999 × 0,99999 = 0,99889 ≈ 99,89 % (méně než 3 devítky!)
Paralelní dostupnost (redundance):
A_celkem = 1 - (1 - A_1) × (1 - A_2) × ... × (1 - A_n)
Příklad: 2 servery s 99% dostupností
A_celkem = 1 - (1-0,99) × (1-0,99) = 1 - 0,01 × 0,01 = 0,9999 (99,99 %)
```
### Kalkulačka
```python
def uptime_percent_to_downtime(pct, period_days=365):
    """Převede procento uptimu na downtime v daném období."""
    total_minutes = period_days * 24 * 60
    allowed_downtime = total_minutes * (1 - pct / 100)
    return allowed_downtime  # minutes


def downtime_to_uptime_percent(downtime_minutes, period_days=365):
    """Převede downtime v minutách na procento uptimu."""
    total_minutes = period_days * 24 * 60
    return (1 - downtime_minutes / total_minutes) * 100


def combined_availability(availabilities):
    """Složená dostupnost (sériově zapojené komponenty)."""
    result = 1.0
    for a in availabilities:
        result *= a
    return result


def redundant_availability(availabilities):
    """Paralelní dostupnost (redundantní komponenty)."""
    result = 1.0
    for a in availabilities:
        result *= (1 - a)
    return 1 - result
```
### Fallacies výpočtu
- **Složená dostupnost není součet** — přidání další závislosti vždy snižuje celkovou dostupnost
- **Redundance není zadarmo** — přidání standby komponenty vyžaduje detekci selhání + failover (MTTR se nezlepší automaticky)
- **SLA není garance** — poskytovatelé často počítají SLA jako měsíční průměr, ne per-incident
- **Měření je klíčové** — bez SLI nelze ověřit SLO; "nedoměřená dostupnost neexistuje"
- **Plánovaná odstávka** — někdy se počítá do uptimu, někdy ne (záleží na definici SLA)
---
## DR scénáře
### Klasifikace
| Kategorie | Scénář | Typický RTO | Typické RPO | Frekvence |
|-----------|--------|-------------|-------------|-----------|
| **Site** | Výpadek celého DC / regionu | hodiny | minuty | Nízká |
| **Infrastructure** | Selhání HW (storage, switch, server) | minuty–hodiny | sekundy | Střední |
| **Software** | Selhání OS, aplikace, DB | minuty | sekundy | Vysoká |
| **Data** | Poškození dat, delete, cryptolocker | hodiny | okamžik zálohy | Nízká–střední |
| **Human** | Chybný deployment, config change | minuty–hodiny | sekundy | Střední |
| **Security** | Útok, breach, ransomware | dny | před útokem | Nízká |
| **Network** | Výpadek konektivity, DDoS | minuty–hodiny | N/A | Střední |
| **Cloud provider** | Regionální výpadek (AWS, Azure, GCP) | hodiny | minuty | Velmi nízká |
### Detail scénářů
#### Site / Region failure
| Aspekt | Popis |
|--------|-------|
| **Příčina** | Blackout, požár, povodeň, zemětřesení, výpadek cloud providera |
| **Prevence** | Multi-AZ architektura, multi-region deployment, active-active |
| **Mitigace** | Automatický DNS failover (Route53, Azure Traffic Manager), replica v DR regionu |
| **Testování** | Game day: vypnout primární region, ověřit automatický failover |
#### Data corruption / human error
| Aspekt | Popis |
|--------|-------|
| **Příčina** | Chybný SQL příkaz (DELETE bez WHERE), omylem smazaný bucket, chybná migrace |
| **Prevence** | RBAC, MFA pro destructive operace, change management, peer review SQL |
| **Mitigace** | Point-in-time recovery (PITR), transaction log replay, immutable backups |
| **Testování** | Obnova zálohy do izolovaného prostředí, ověření integrity dat |
#### Ransomware / cyber attack
| Aspekt | Popis |
|--------|-------|
| **Příčina** | Útok na produkční systémy, zašifrování dat, exfiltrace |
| **Prevence** | Immutable backups (object lock), air-gapped backups, network segmentation |
| **Mitigace** | Obnova z čisté zálohy, rebuild infrastruktury z IaC |
| **Testování** | Pravidelná obnova v izolované síti, ověření že backup není infikován |
---
## Prevence — strategie
### Backup strategie
| Přístup | Popis | Use case |
|---------|-------|----------|
| **3-2-1 pravidlo** | 3 kopie, 2 různá média, 1 off-site | Univerzální |
| **3-2-1-0** | + 0 chyb po obnově (testování) | Enterprise, compliance |
| **GFS (Grandfather-Father-Son)** | Denní, týdenní, měsíční rotace | Dlouhodobý archiv |
| **Incremental forever** | Plná záloha 1×, pak jen změny | Velké objemy dat |
| **Reverse incremental** | Plná + inkrementální, plná je vždy aktuální | Rychlá obnova |
### Zálohovací metody
| Metoda | RPO | RTO | Úložiště | Vhodné pro |
|--------|-----|-----|----------|------------|
| **Full backup** | Poslední full | Doba obnovy full | Velké | Malá data, weekly |
| **Incremental** | Poslední inkrement | Full + všechny inkrementy | Malé | Velká data, daily |
| **Differential** | Poslední diff | Full + poslední diff | Střední | Kompromis |
| **Snapshot** | Okamžik snapshotu | vteřiny | Copy-on-write | VM, storage array |
| **Continuous (CDC)** | < 1 s | Sekundy | Log stream | DB (binlog, WAL) |
| **PITR** | Libovolný bod v čase | Dle objemu | Full + WAL | RDS, PostgreSQL, SQL Server |
### Immutabilita backupů
Klíčová ochrana proti ransomwaru:
| Technika | Popis |
|----------|-------|
| **Object Lock (WORM)** | Backup nelze smazat ani přepsat po definovanou dobu retence (S3 Object Lock, Azure Blob Immutable) |
| **Air gap** | Backup je fyzicky oddělený od produkční sítě (offline disk, tape, cloud bez VPN) |
| **Isolated backup network** | Backup traffic jde přes dedikovanou síť bez přístupu z produkční VLAN |
| **Out-of-band access** | Backup management console není dostupná z produkční sítě |
---
## DR architektury
### Multi-AZ (Single region)
```
Region ┌────────────────────────────────────┐
       │       AZ-1             AZ-2        │
       │   ┌──────────┐     ┌──────────┐    │
       │   │   App    │     │   App    │    │
       │   └─────┬────┘     └─────┬────┘    │
       │         │                │         │
       │   ┌─────▼────────────────▼─────┐   │
       │   │  Load Balancer (cross-AZ)  │   │
       │   └─────────────┬──────────────┘   │
       │                 │                  │
       │   ┌─────────────▼──────────────┐   │
       │   │  DB Primary (AZ-1)         │   │
       │   │  DB Standby (AZ-2)         │   │
       │   │  Synchronous replication   │   │
       │   └────────────────────────────┘   │
       └────────────────────────────────────┘
```
- RTO: minuty (automatický failover)
- RPO: 0 (sync replication)
- Ochrana: proti selhání AZ, nikoliv regionu
### Multi-Region
```
  Region A (Primary)                      Region B (DR)
┌─────────────────────┐              ┌─────────────────────┐
│  ┌───────────────┐  │              │  ┌───────────────┐  │
│  │   App + DB    │  │              │  │   App + DB    │  │
│  │    Active     │──┼──Async───────┼─►│    Standby    │  │
│  └───────────────┘  │ replikace    │  └───────────────┘  │
│         │           │              │         │           │
│  ┌──────▼───────┐   │              │  ┌──────▼───────┐   │
│  │  DNS / GSLB  │   │              │  │  DNS / GSLB  │   │
│  └──────┬───────┘   │              │  └──────┬───────┘   │
└─────────┼───────────┘              └─────────┼───────────┘
          │                                    │
          └──────────── Traffic Manager ───────┘
```
| Varianta | RTO | RPO | Náklady | Failover |
|----------|-----|-----|---------|----------|
| **Active-Passive** | minuty–hodiny | sekundy | Střední | Manuální / auto |
| **Active-Active** | sekundy | < 1 s | Vysoké | Automatický (DNS) |
| **Pilot Light** | desítky minut | minuty | Nízké | Manuální škálování |
| **Warm Standby** | minuty | sekundy | Vysoké | Auto (zmenšená kopie) |
| **Backup & Restore** | hodiny | 24 h | Nízké | Manuální |
### On-prem → Cloud DR (Hybrid)
```
      On-prem DC                           Cloud (DR)
┌─────────────────────┐              ┌─────────────────────┐
│  ┌───────────────┐  │              │  ┌───────────────┐  │
│  │   Aplikace    │  │              │  │ VM / Aplikace │  │
│  │  + DB         │  │              │  │  + DB replica │  │
│  └───────┬───────┘  │              │  └───────┬───────┘  │
│          │          │              │          │          │
│  ┌───────▼───────┐  │  site-to-site│  ┌───────▼───────┐  │
│  │ Backup proxy  │──┼────VPN───────┼─►│ Backup store  │  │
│  └───────────────┘  │              │  └───────────────┘  │
│                     │              │                     │
│  ┌───────────────┐  │              │  ┌───────────────┐  │
│  │  Tape / NAS   │  │              │  │ Veeam / Zerto │  │
│  └───────────────┘  │              │  └───────────────┘  │
└─────────────────────┘              └─────────────────────┘
```
- **RTO**: desítky minut (závisí na startup VM)
- **RPO**: minuty–hodiny (závisí na replikačním nástroji)
- **Nástroje**: Veeam, Zerto, Azure Site Recovery, AWS MGN, Commvault
- **Use case**: enterprise s on-prem DC, které potřebuje DR bez druhého DC
---
## DR testování
### Typy testů
| Typ | Popis | Frekvence | Riziko |
|-----|-------|-----------|--------|
| **Tabletop exercise** | Manuální procházení scénáře, žádný dopad na produkci | Měsíčně | Žádné |
| **Walkthrough** | Verifikace runbooku, kontrola že všichni ví co dělat | Kvartálně | Žádné |
| **Component test** | Test jedné komponenty (např. obnova jedné DB) | Měsíčně | Nízké |
| **Integrated test** | Test celého stacku v izolovaném prostředí | Kvartálně | Nízké |
| **Full failover test** | Produkční failover do DR site | Ročně | Vysoké |
| **Chaos experiment** | Cílené vnášení poruch do produkce | Průběžně | Střední |
### Runbook struktura
Každý DR scénář by měl mít runbook:
```yaml
scenario: "Region A failure"
triggers:
  - "CloudWatch alarm: Region A health check 5× timeout"
  - "PagerDuty incident P0"
decision_tree: |
  1. Ověřit: je Region A opravdu nedostupný? (check z 3 různých lokací)
  2. Rozhodnout: je RTO v ohrožení? Pokud zbývá < 30 % RTO → failover
  3. Failover: spustit playbook `dr-failover-region-b`
  4. Verifikace: smoke testy v Region B
  5. Komunikace: status page + stakeholders
rollback: |
  1. Po obnovení Region A → replikace změn z B zpět do A
  2. Repoint DNS na A
  3. Ověřit konzistenci dat
  4. Vypnout Region B (nebo ponechat jako hot standby)
contacts:
  primary: "on-call@example.com"
  escalation: "infra-lead@example.com"
  management: "vp-engineering@example.com"
```
---
## Best practices
- **Testuj obnovu, ne zálohu** — backup bez testované obnovy není backup
- **Automatizuj DR** — Terraform / Ansible pro spin-up DR prostředí, DNS failover
- **Dokumentuj runbooky** — každý scénář, kontakt, rozhodovací strom
- **Počítej se selháním** — design for failure, nečekej že všechno poběží
- **Nepodceňuj WRT** — obnova služby neznamená plný provoz (data warming, cache, connections)
- **Slaď RTO/RPO s businessem** — technické možnosti musí odpovídat obchodním požadavkům
- **Monitoruj SLI** — bez dat nelze ověřit SLO
- **DR není jen IT** — komunikace, PR, právní, regulace
---
## Související
- [CLOUD.md](CLOUD.md) — cloud DR strategie, AWS/Azure/GCP specific
- [DATACENTERS.md](DATACENTERS.md) — DC redundance, Tier klasifikace
- [MONITORING.md](MONITORING.md) — alerting, SLI/SLO/SLA
- [CICD.md](CICD.md) — deployment strategie, rollback
- [STORAGE.md](STORAGE.md) — backup storage, replication
## Zdroje
Odkazy, knihy a standardy: [sources/infrastructure/sources.md](sources/infrastructure/sources.md)
*Poslední revize: 2026-06-11*

155
GPU.en.md Normal file
View File

@@ -0,0 +1,155 @@
# 🎮 GPU — architecture, models, virtualization
## GPU models
### NVIDIA
| GPU | Architecture | VRAM | HBM | FP16 (TFLOPS) | FP8 (TFLOPS) | Interconnect | TDP |
|-----|-------------|------|-----|--------------|-------------|-------------|-----|
| **A100** | Ampere (2020) | 40/80 GB | HBM2e | 312 | — | NVLink 3 (600 GB/s) | 400 W |
| **H100** | Hopper (2022) | 80 GB | HBM3 | ~1000 | ~2000 (dense) | NVLink 4 (900 GB/s) | 700 W |
| **H200** | Hopper (2023) | 141 GB | HBM3e | ~1000 | ~2000 (dense) | NVLink 4 (900 GB/s) | 700 W |
| **B200** | Blackwell (2024) | 192 GB | HBM3e | 2250 | ~4500 | NVLink 5 (1800 GB/s) | 1000 W |
| **B100** | Blackwell (2024) | 192 GB | HBM3e | ~1800 | ~3600 | NVLink 5 | 700 W |
| **GB200** | Blackwell (2024) | — | HBM3e | 4500 (dual) | 9000 (dual) | NVLink 5 | 2700 W |
### AMD
| GPU | Architecture | VRAM | HBM | FP16 (TFLOPS) | Interconnect | TDP |
|-----|-------------|------|-----|--------------|-------------|-----|
| **MI250X** | CDNA 2 (2021) | 128 GB | HBM2e | 383 | Infinity Fabric | 500 W |
| **MI300X** | CDNA 3 (2023) | 192 GB | HBM3 | ~1300 (dense; ~2600 with sparsity) | Infinity Fabric (896 GB/s) | 750 W |
| **MI350** | CDNA 4 (2025) | 288 GB | HBM3e | ~3500 | Infinity Fabric | 750 W |
## GPU interconnects
| Technology | Provider | Bandwidth | Topology | Use case |
|------------|-------------|-----------|-----------|----------|
| **NVLink 4** | NVIDIA | 900 GB/s (18× 50 GB/s) | GPU-GPU direct | AI training (H100, H200) |
| **NVLink 5** | NVIDIA | 1800 GB/s (18× 100 GB/s) | GPU-GPU direct | AI training (B200, GB200) |
| **Infinity Fabric** | AMD | 896 GB/s | GPU-GPU + CPU-GPU | AI training (MI300X, MI350) |
| **NVSwitch** | NVIDIA | 900 GB/s per GPU (NVLink) | Full-mesh (256 GPU) | DGX SuperPOD, HGX |
| **InfiniBand (NDR)** | NVIDIA/Mellanox | 400 Gbps per port | GPU-NIC direct, RDMA | Distributed training, HPC |
| **PCIe 5.0** | Standard | 63 GB/s per x16 (per direction) | CPU-GPU | Inference, rendering |
| **Ethernet (RoCE v2)** | Standard | 100/200/400 GbE | GPU-NIC, RDMA over Converged Ethernet | AI inference, storage |
### GPU direct communication
```
GPU 0 ──NVLink── GPU 1        GPU 0 ───PCIe─── CPU ───PCIe─── GPU 1
          │                                     │
          │                                     │
       NVSwitch                            InfiniBand
          │                                     │
          │                                     │
GPU 2 ──NVLink── GPU 3        GPU 2 ───PCIe─── CPU ───PCIe─── GPU 3
NVLink topology (GPU direct)  PCIe topology (CPU mediated)
```
- **GPU Direct RDMA** — GPU ↔ NIC without CPU (InfiniBand, RoCE)
- **GPU Direct Storage** — GPU ↔ NVMe without CPU (NVIDIA Magnum IO)
- **NVSwitch** — full bisection bandwidth between all GPUs in a node
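
To get a feel for the gap between these links, the sketch below estimates the ideal time to move a payload at the peak rates listed in the interconnect table. It is back-of-the-envelope only: the NVLink figures are aggregate bidirectional bandwidth, so a one-way copy sees roughly half; protocol overhead is ignored unless passed through the hypothetical `efficiency` parameter; and the 80 GB payload is an arbitrary example.

```python
# Peak bandwidths in GB/s, taken from the interconnect table.
# InfiniBand NDR is 400 Gbit/s per port, i.e. 50 GB/s.
BANDWIDTH_GB_S = {
    "NVLink 5": 1800,
    "NVLink 4": 900,
    "PCIe 5.0 x16": 63,
    "InfiniBand NDR": 400 / 8,
}


def transfer_seconds(size_gb: float, link: str, efficiency: float = 1.0) -> float:
    """Ideal time to move size_gb over a link; efficiency scales the peak rate."""
    return size_gb / (BANDWIDTH_GB_S[link] * efficiency)


for link in BANDWIDTH_GB_S:
    print(f"{link:>15}: {transfer_seconds(80, link) * 1000:.1f} ms for 80 GB")
```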
## GPU virtualization
| Technology | Description | GPU support | Use case |
|------------|-------|-------------|----------|
| **NVIDIA vGPU (GRID)** | Time slicing + dedicated profiles | Profile series: A (virtual apps), B (virtual PC / VDI), Q (workstation, pro viz), C (compute / AI) | VDI, virtualized AI |
| **NVIDIA MIG** | Hardware GPU partitioning | A100 (7 inst.), H100/H200/B200 | AI inference, multi-tenant GPU |
| **AMD MxGPU** | SR-IOV, hardware partitioning | AMD Instinct MI, Radeon Pro | VDI, cloud gaming |
| **Intel Server GPU (SG1)** | SR-IOV, hardware partitioning | Intel SG1, Flex, Arc | VDI, media transcoding |
| **GPU passthrough** | Dedicated GPU to whole VM (VFIO-pci) | All GPUs | AI training, HPC, highest performance |
### MIG partition table (A100 / H100)
| GPU | Partition profile | GPU Memory | Compute slices |
|-----|------------------|-----------|--------------|
| **A100 40 GB** | 1g.5gb | 5 GB | 1 |
| A100 40 GB | 2g.10gb | 10 GB | 2 |
| A100 40 GB | 3g.20gb | 20 GB | 3 |
| A100 40 GB | 7g.40gb | 40 GB | 7 |
| A100 40 GB | Full (7× 1g.5gb) | 7 × 5 GB | 7 instances |
| **H100 80 GB** | 1g.10gb | 10 GB | 1 |
| H100 80 GB | 2g.20gb | 20 GB | 2 |
| H100 80 GB | 3g.40gb | 40 GB | 3 |
| H100 80 GB | 7g.80gb | 80 GB | 7 |
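
MIG profile names encode the number of compute slices and the memory size, so a coarse capacity check can be done by parsing the name. This is a simplified sketch: it only sums slices and memory, while real MIG placement follows stricter rules about which combinations are valid. The helper names are made up for this example.

```python
import re

PROFILE_RE = re.compile(r"^(\d+)g\.(\d+)gb")


def parse_profile(profile: str) -> tuple[int, int]:
    """Split a MIG profile name such as '3g.20gb' into (compute slices, GB)."""
    match = PROFILE_RE.match(profile)
    if match is None:
        raise ValueError(f"not a MIG profile name: {profile!r}")
    return int(match.group(1)), int(match.group(2))


def fits_on_gpu(profiles: list[str], total_memory_gb: int, total_slices: int = 7) -> bool:
    """Coarse capacity check: summed slices and memory must not exceed the GPU."""
    parsed = [parse_profile(p) for p in profiles]
    return (
        sum(s for s, _ in parsed) <= total_slices
        and sum(m for _, m in parsed) <= total_memory_gb
    )


print(fits_on_gpu(["3g.20gb", "2g.10gb", "1g.5gb", "1g.5gb"], total_memory_gb=40))
```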
## GPU use cases
### AI Training
- **Models**: LLM (70B-405B+), vision, multimodal
- **GPU**: H100, B200, GB200, MI300X
- **Interconnect**: NVLink 5 / Infinity Fabric (within node), InfiniBand NDR (between nodes)
- **Parallelism**: Data Parallel (DDP), Tensor Parallel (TP), Pipeline Parallel (PP), Fully Sharded (FSDP)
- **Framework**: PyTorch (NCCL), JAX (XLA), DeepSpeed, Megatron-LM
- **Tips**:
- GB200: one Grace CPU plus 2× B200 linked by NVLink-C2C, so 8 GPUs correspond to 4 GB200 superchips
- DGX B200 / HGX B200: standard building block
- InfiniBand: fat tree topology for all-reduce optimization
### AI Inference
- **Models**: LLM serving, embedding, image gen
- **GPU**: A100, H200, B200 (larger VRAM for larger models)
- **Techniques**: MIG partition, TensorRT-LLM, vLLM, Triton Inference Server
- **Quantization**: FP8, INT8, INT4 → lower VRAM, higher throughput
- **Latency**: batch size optimization, dynamic batching, continuous batching
- **Scale**: on-prem (2-32 GPU) / cloud (elastic)
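
The link between quantization and VRAM is simple arithmetic: weight memory is the parameter count times the bytes per parameter. The sketch below estimates how many GPUs are needed to hold a model. The 20 % overhead factor for KV cache and activations is an assumption for illustration; real overhead depends on batch size and context length.

```python
import math

BYTES_PER_PARAM = {"FP16": 2.0, "FP8": 1.0, "INT8": 1.0, "INT4": 0.5}


def weight_memory_gb(params_billion: float, precision: str) -> float:
    """Memory for model weights only, in GB (1 GB = 1e9 bytes)."""
    return params_billion * BYTES_PER_PARAM[precision]


def min_gpus(params_billion: float, precision: str, vram_gb: float,
             overhead: float = 1.2) -> int:
    """GPUs needed when weights plus an assumed overhead factor must fit in VRAM."""
    needed = weight_memory_gb(params_billion, precision) * overhead
    return math.ceil(needed / vram_gb)


# 70B-parameter model on 80 GB and 141 GB GPUs.
for precision in ("FP16", "FP8", "INT4"):
    print(precision, min_gpus(70, precision, 80), min_gpus(70, precision, 141))
```

A 70B model needs 140 GB for FP16 weights alone, which is why it does not fit on a single 80 GB GPU without quantization or tensor parallelism.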
### VDI (Virtual Desktop Infrastructure)
- **GPU**: NVIDIA A16 (1 GPU = 16 users), A10 (1 GPU = 4 users)
- **Technology**: vGPU (Grid), AMD MxGPU
- **Protocols**: VMware Blast, Citrix HDX, Microsoft RDP, PC-over-IP (HP Teradici)
- **Use case**: CAD (CATIA, SolidWorks), Office, engineering, healthcare (PACS)
### Rendering and VFX
- **GPU**: NVIDIA RTX 6000 Ada, RTX A6000, AMD Radeon Pro W7900
- **Rendering**: Blender (Cycles/OptiX), V-Ray, Octane Render, Redshift
- **Denoising**: AI-accelerated denoising on GPU
- **Farm rendering**: Deadline, Qube! (job scheduler)
## GPU pricing
For detailed pricing comparisons (purchase price, cloud on-demand, $/M token inference cost, $/GB HBM, price trends 2024→2026), see:
- [AI-INFRASTRUCTURE.en.md — GPU pricing and price/performance](AI-INFRASTRUCTURE.en.md#gpu-pricing-and-priceperformance)
## GPU server form factors
| Form factor | GPU count | Power | Cooling | Example |
|------------|-----------|-------|---------|---------|
| **1U** | 1-2 | 700-1400 W | Air (high-RPM) | Dell XR4510c |
| **2U** | 4-8 | 3-6 kW | Air / Liquid | Dell R760xa, HPE DL380a |
| **4U** | 8-10 | 5-8 kW | Liquid | Supermicro SYS-421GE |
| **8U / Chassis** | 8-16 | 10-20 kW | Liquid (CDU) | NVIDIA DGX H100 (HGX), Supermicro SYS-821GE |
## OpenStack Cyborg (GPU lifecycle management)
Cyborg is an OpenStack service for managing accelerators (GPU, FPGA, DPU, NPU).
### Key capabilities
- **Discovery** — automatic GPU detection on compute nodes (NVIDIA, AMD, Intel)
- **Inventory** — tracking available accelerators in the cluster
- **Lifecycle** — attach/detach GPU to VM, firmware update, reset
- **Scheduling** — Placement API for GPU-aware scheduling (Nova)
- **Cyborg API** — REST API for accelerator management
### Integration
| Component | Role |
|------------|------|
| **Nova** | VM scheduling with GPU requirements (extra_specs: `accel:device_profile`) |
| **Placement** | Resource provider for GPU (inventory, traits) |
| **Neutron** | SR-IOV VF passthrough for GPU networking |
| **Ironic** | Bare metal + GPU provisioning |
## Sources
Links, books and standards: [sources/infrastructure/sources.en.md](sources/infrastructure/sources.en.md)
*Last revision: 2026-06-03*

155
GPU.md Normal file

@@ -0,0 +1,155 @@
# 🎮 GPU — architektura, modely, virtualizace
## GPU modely
### NVIDIA
| GPU | Architektura | VRAM | HBM | FP16 (TFLOPS) | FP8 (TFLOPS) | Interconnect | TDP |
|-----|-------------|------|-----|--------------|-------------|-------------|-----|
| **A100** | Ampere (2020) | 40/80 GB | HBM2e | 312 | — | NVLink 3 (600 GB/s) | 400 W |
| **H100** | Hopper (2022) | 80 GB | HBM3 | ~1000 | ~2000 (dense) | NVLink 4 (900 GB/s) | 700 W |
| **H200** | Hopper (2023) | 141 GB | HBM3e | ~1000 | ~2000 (dense) | NVLink 4 (900 GB/s) | 700 W |
| **B200** | Blackwell (2024) | 192 GB | HBM3e | 2250 | ~4500 | NVLink 5 (1800 GB/s) | 1000 W |
| **B100** | Blackwell (2024) | 192 GB | HBM3e | ~1800 | ~3600 | NVLink 5 | 700 W |
| **GB200** | Blackwell (2024) | — | HBM3e | 4500 (dual) | 9000 (dual) | NVLink 5 | 2700 W |
### AMD
| GPU | Architecture | VRAM | HBM | FP16 (TFLOPS) | Interconnect | TDP |
|-----|-------------|------|-----|--------------|-------------|-----|
| **MI250X** | CDNA 2 (2021) | 128 GB | HBM2e | 383 | Infinity Fabric | 500 W |
| **MI300X** | CDNA 3 (2023) | 192 GB | HBM3 | ~1300 (~2600 sparse) | Infinity Fabric (896 GB/s) | 750 W |
| **MI350** | CDNA 4 (2025) | 288 GB | HBM3e | ~3500 | Infinity Fabric | 750 W |
## GPU interconnects
| Technology | Vendor | Bandwidth | Topology | Use case |
|------------|-------------|-----------|-----------|----------|
| **NVLink 4** | NVIDIA | 900 GB/s (18× 50 GB/s) | GPU-GPU direct | AI training (H100, H200) |
| **NVLink 5** | NVIDIA | 1800 GB/s (18× 100 GB/s) | GPU-GPU direct | AI training (B200, GB200) |
| **Infinity Fabric** | AMD | 896 GB/s | GPU-GPU + CPU-GPU | AI training (MI300X, MI350) |
| **NVSwitch** | NVIDIA | 900 GB/s per GPU (NVLink) | Full-mesh (256 GPU) | DGX SuperPOD, HGX |
| **InfiniBand (NDR)** | NVIDIA/Mellanox | 400 Gbps per port | GPU-NIC direct, RDMA | Distributed training, HPC |
| **PCIe 5.0** | Standard | 63 GB/s per x16 | CPU-GPU | Inference, rendering |
| **Ethernet (RoCE v2)** | Standard | 100/200/400 GbE | GPU-NIC, RDMA over converged ethernet | AI inference, storage |
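
To put the bandwidth column in perspective, the sketch below estimates how long it takes to move a given amount of data over each link at its peak rate from the table. These are idealized numbers: protocol overhead is ignored, the NVLink values are aggregate per-GPU figures, and the dictionary and function names are illustrative.

```python
# Peak rates in GB/s, taken from the table above
PEAK_GB_PER_S = {
    "NVLink 4": 900.0,
    "NVLink 5": 1800.0,
    "PCIe 5.0 x16": 63.0,
    "InfiniBand NDR": 400.0 / 8,  # 400 Gbps per port -> 50 GB/s
}


def transfer_seconds(size_gb: float, link: str) -> float:
    """Idealized transfer time for size_gb gigabytes over the named link."""
    return size_gb / PEAK_GB_PER_S[link]


if __name__ == "__main__":
    for link in PEAK_GB_PER_S:
        print(f"{link:15s} {transfer_seconds(80, link):7.3f} s for 80 GB")
```

The gap of more than an order of magnitude between NVLink and PCIe or InfiniBand is the reason tensor-parallel traffic is usually kept inside a node.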
### GPU direct communication
```
GPU 0 ──NVLink── GPU 1        GPU 0 ───PCIe─── CPU ───PCIe─── GPU 1
          │                                     │
          │                                     │
       NVSwitch                            InfiniBand
          │                                     │
          │                                     │
GPU 2 ──NVLink── GPU 3        GPU 2 ───PCIe─── CPU ───PCIe─── GPU 3
NVLink topology (GPU direct)  PCIe topology (CPU mediated)
```
- **GPU Direct RDMA** — GPU ↔ NIC without the CPU (InfiniBand, RoCE)
- **GPU Direct Storage** — GPU ↔ NVMe without the CPU (NVIDIA Magnum IO)
- **NVSwitch** — full bisection bandwidth between all GPUs in a node
## GPU virtualization
| Technology | Description | GPU support | Use case |
|------------|-------|-------------|----------|
| **NVIDIA vGPU (Grid)** | Time slicing + dedicated profiles | A-series (virtual apps), B-series (VDI), Q-series (pro viz), C-series (AI/compute) | VDI, virtualized AI |
| **NVIDIA MIG** | Hardware partitioning of the GPU | A100 (7 inst.), H100/H200/B200 | AI inference, multi-tenant GPU |
| **AMD MxGPU** | SR-IOV, hardware partitioning | AMD MI (pro), Radeon Pro | VDI, cloud gaming |
| **Intel SG (SG1)** | SR-IOV, hardware partitioning | Intel SG1, Flex, Arc | VDI, media transcoding |
| **GPU passthrough** | Whole GPU dedicated to a VM (VFIO-pci) | All GPUs | AI training, HPC, highest performance |
### MIG partition table (A100 / H100)
| GPU | Partition profile | GPU Memory | Compute units |
|-----|------------------|-----------|--------------|
| **A100 40 GB** | 1g.5gb | 5 GB | 1 |
| A100 40 GB | 2g.10gb | 10 GB | 2 |
| A100 40 GB | 3g.20gb | 20 GB | 3 |
| A100 40 GB | 7g.40gb | 40 GB | 7 |
| A100 40 GB | Full (7× 1g) | 7 × 5 GB | 7 instances |
| **H100 80 GB** | 1g.10gb | 10 GB | 1 |
| H100 80 GB | 2g.20gb | 20 GB | 2 |
| H100 80 GB | 3g.40gb | 40 GB | 3 |
| H100 80 GB | 7g.80gb | 80 GB | 7 |
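
A partition table like this is typically used to pick the smallest profile that still fits a workload. The sketch below shows that selection under stated assumptions: `EXAMPLE_PROFILES` is a hypothetical mapping of profile name to memory in GB, filled with the 5/10/20/40 GB profile family from the table, and the required memory is supplied by the caller.

```python
from typing import Dict, Optional

# Hypothetical example data: profile name -> GPU memory in GB
EXAMPLE_PROFILES: Dict[str, int] = {
    "1g.5gb": 5,
    "2g.10gb": 10,
    "3g.20gb": 20,
    "7g.40gb": 40,
}


def smallest_fitting_profile(required_gb: float,
                             profiles: Dict[str, int]) -> Optional[str]:
    """Return the profile with the least memory that still fits, or None."""
    fitting = [(mem, name) for name, mem in profiles.items() if mem >= required_gb]
    return min(fitting)[1] if fitting else None


if __name__ == "__main__":
    print(smallest_fitting_profile(8, EXAMPLE_PROFILES))
```

Returning `None` when nothing fits makes the "needs a full GPU or more" case explicit for the caller.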
## GPU use cases
### AI Training
- **Models**: LLM (70B-405B+), vision, multimodal
- **GPU**: H100, B200, GB200, MI300X
- **Interconnect**: NVLink 5 / Infinity Fabric (within a node), InfiniBand NDR (between nodes)
- **Parallelism**: Data Parallel (DDP), Tensor Parallel (TP), Pipeline Parallel (PP), Fully Sharded (FSDP)
- **Framework**: PyTorch (NCCL), JAX (XLA), DeepSpeed, Megatron-LM
- **Tips**:
  - GB200: 2× B200 linked via NVLink, so 8 GPUs → 4 GB200
  - DGX B200 / HGX B200: standard building block
  - InfiniBand: fat-tree topology to optimize all-reduce
### AI Inference
- **Models**: LLM serving, embedding, image gen
- **GPU**: A100, H200, B200 (larger VRAM for larger models)
- **Techniques**: MIG partition, TensorRT-LLM, vLLM, Triton Inference Server
- **Quantization**: FP8, INT8, INT4 → lower VRAM use, higher throughput
- **Latency**: batch size optimization, dynamic batching, continuous batching
- **Scale**: on-prem (2-32 GPU) / cloud (elastic)
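
The effect of quantization on VRAM follows directly from bits per parameter. The sketch below computes the memory needed for model weights only. KV cache, activations, and runtime overhead are deliberately left out, so real deployments need more than these figures. The names are illustrative.

```python
# Bits per parameter for the precisions named in the list above
BITS_PER_PARAM = {"FP16": 16, "FP8": 8, "INT8": 8, "INT4": 4}


def weight_memory_gb(params_billion: float, precision: str) -> float:
    """Memory for model weights only, in GB (10^9 bytes)."""
    return params_billion * BITS_PER_PARAM[precision] / 8


if __name__ == "__main__":
    for precision in BITS_PER_PARAM:
        print(f"70B @ {precision}: {weight_memory_gb(70, precision):.0f} GB")
```

For a 70B-parameter model the weights drop from 140 GB at FP16 to 70 GB at FP8 and 35 GB at INT4, which changes how many GPUs (or which MIG profile) a deployment needs.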
### VDI (Virtual Desktop Infrastructure)
- **GPU**: NVIDIA A16 (1 GPU = 16 users), A10 (1 GPU = 4 users)
- **Technology**: vGPU (Grid), AMD MxGPU
- **Protocols**: VMware Blast, Citrix HDX, Microsoft RDP, PC-over-IP (HP Teradici)
- **Use case**: CAD (CATIA, SolidWorks), Office, engineering, healthcare (PACS)
### Rendering and VFX
- **GPU**: NVIDIA RTX 6000 Ada, RTX A6000, AMD Radeon Pro W7900
- **Rendering**: Blender (Cycles/OptiX), V-Ray, Octane Render, Redshift
- **Denoising**: AI-accelerated denoising on the GPU
- **Farm rendering**: Deadline, Qube! (job scheduler)
## GPU pricing
For detailed price comparisons (purchase price, cloud on-demand, inference cost in $/M tokens, $/GB HBM, price trend 2024→2026), see:
- [AI-INFRASTRUCTURE.md — GPU pricing and price/performance](AI-INFRASTRUCTURE.md#ceny-gpu-a-poměr-cenavýkon)
## GPU server form factors
| Form factor | GPU count | Power | Cooling | Example |
|------------|-----------|-------|---------|---------|
| **1U** | 1-2 | 700-1400 W | Air (high-RPM) | Dell XR4510c |
| **2U** | 4-8 | 3-6 kW | Air / Liquid | Dell R760xa, HPE DL380a |
| **4U** | 8-10 | 5-8 kW | Liquid | Supermicro SYS-421GE |
| **8U / Chassis** | 8-16 | 10-20 kW | Liquid (CDU) | NVIDIA DGX H100 (HGX), Supermicro SYS-821GE |
## OpenStack Cyborg (GPU lifecycle management)
Cyborg is an OpenStack service for managing accelerators (GPU, FPGA, DPU, NPU).
### Key capabilities
- **Discovery** — automatic GPU detection on compute nodes (NVIDIA, AMD, Intel)
- **Inventory** — tracking available accelerators in the cluster
- **Lifecycle** — attach/detach GPU to VM, firmware update, reset
- **Scheduling** — Placement API for GPU-aware scheduling (Nova)
- **Cyborg API** — REST API for accelerator management
### Integration
| Component | Role |
|------------|------|
| **Nova** | VM scheduling with GPU requirements (extra_specs: `accel:device_profile`) |
| **Placement** | Resource provider for GPU (inventory, traits) |
| **Neutron** | SR-IOV VF passthrough for GPU networking |
| **Ironic** | Bare metal + GPU provisioning |
## Sources
Links, books and standards: [sources/infrastructure/sources.md](sources/infrastructure/sources.md)
*Last revision: 2026-06-03*

12
HARDWARE.en.md Normal file

@@ -0,0 +1,12 @@
# 🔧 Hardware and servers
This file has been split into separate areas:
| Area | File |
|--------|--------|
| 🔧 Server hardware — components and architecture | [SERVER-HW.en.md](SERVER-HW.en.md) |
| 🎮 GPU — architecture, models, virtualization | [GPU.en.md](GPU.en.md) |
| ⚙️ Server configuration — best practices by workload | [SERVER-CONFIG.en.md](SERVER-CONFIG.en.md) |
| 📦 Provisioning — boot, installation, server management | [PROVISIONING.en.md](PROVISIONING.en.md) |
*Last revision: 2026-06-03*

12
HARDWARE.md Normal file

@@ -0,0 +1,12 @@
# 🔧 Hardware and servers
This file has been split into separate areas:
| Area | File |
|--------|--------|
| 🔧 Server hardware — components and architecture | [SERVER-HW.md](SERVER-HW.md) |
| 🎮 GPU — architecture, models, virtualization | [GPU.md](GPU.md) |
| ⚙️ Server configuration — best practices by workload | [SERVER-CONFIG.md](SERVER-CONFIG.md) |
| 📦 Provisioning — boot, installation, server management | [PROVISIONING.md](PROVISIONING.md) |
*Last revision: 2026-06-03*

560
HYPERVISORS.en.md Normal file

@@ -0,0 +1,560 @@
# 🖥️ Hypervisors and Virtualization Platforms
## Hypervisor Types
| Type | Description | Examples |
|-----|-------|----------|
| **Type 1** (bare-metal) | Runs directly on hardware | VMware ESXi, Microsoft Hyper-V, KVM, Xen |
| **Type 2** (hosted) | Runs on top of host OS | VirtualBox, VMware Workstation, Parallels |
## Platform Overview
| Platform | Hypervisor | License | Note |
|-----------|-----------|---------|----------|
| **VMware vSphere** | ESXi | Proprietary (Subscription from 2024) | Market leader, wide adoption. After Broadcom acquisition (2023), switched to per-core subscription, perpetual license discontinued |
| **Microsoft Hyper-V** | Hyper-V | Windows Server / standalone | Integration with Azure, SCVMM |
| **Proxmox VE** | KVM + LXC | Open source | Debian-based, web UI, low cost |
| **Red Hat OpenStack / oVirt** | KVM | Open source | Open alternative, complex |
| **Nutanix AHV** | KVM (fork) | Part of Nutanix | Integrated HCI solution |
| **XCP-ng / Xen Server** | Xen | Open source | Open-source fork of Citrix XenServer (Citrix Hypervisor) |
| **Oracle VM** | Xen | Proprietary | Oracle ecosystem |
## Key Concepts
- **VM — Virtual Machine** — full virtualization, own kernel
- **Container** — shared host kernel, lighter (Docker, LXC)
- **Paravirtualization** — guest OS knows it runs in a VM (better I/O performance)
- **NUMA** — Non-Uniform Memory Access, CPU/memory allocation optimization (see [SERVER-HW.en.md](SERVER-HW.en.md#numa))
- **Overcommit** — allocating more vCPU/RAM than physically available (ratio management)
- **Live Migration** — moving a running VM between hosts (vSphere vMotion, Hyper-V Live Migration)
- **HA (High Availability)** — VM restart on another host upon failure
- **DRS / Load Balancing** — automatic VM distribution based on load
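
The overcommit concept reduces to a simple ratio of allocated vCPUs to physical cores. The sketch below shows the calculation. The 4:1 ceiling used as the default is only an assumed planning value, not a recommendation from this file, and the right limit depends on the workload.

```python
def vcpu_overcommit_ratio(allocated_vcpus: int, physical_cores: int) -> float:
    """Ratio of vCPUs handed out to VMs versus physical cores on the host."""
    if physical_cores <= 0:
        raise ValueError("physical_cores must be positive")
    return allocated_vcpus / physical_cores


def within_limit(allocated_vcpus: int, physical_cores: int,
                 max_ratio: float = 4.0) -> bool:
    """True if the host stays at or below the assumed overcommit ceiling."""
    return vcpu_overcommit_ratio(allocated_vcpus, physical_cores) <= max_ratio


if __name__ == "__main__":
    print(vcpu_overcommit_ratio(384, 96), within_limit(384, 96))
```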
## VMware vSphere
### VMware licensing (post-Broadcom 2024+)
Since 2024, VMware only sells subscription licenses; perpetual + SnS (Support & Subscription) have been discontinued.
| Product | Metric | Price (indicative) | What it includes |
|---------|---------|-------------------|-------------|
| **vSphere Standard** | Per core (min 16 cores/CPU) | ~$140/core/year | ESXi, vCenter, vMotion, HA, DRS basic |
| **vSphere Enterprise Plus** | Per core | ~$220/core/year | All above + DRS advanced, SIOC, NIOC, Big Data Extensions |
| **vSphere Foundation** | Per core (bundle) | ~$350/core/year | vSphere Enterprise Plus + Aria Operations, Aria Operations for Logs, Aria Automation |
| **VMware Cloud Foundation (VCF)** | Per core (bundle) | ~$700/core/year | vSphere + vSAN + NSX + Aria full suite. Required for vSAN and NSX from 2025 |
| **vSAN** | Per core (only as part of VCF from 2025) | No longer standalone | Storage virtualization, dedup, compression, encryption |
| **NSX** | Per core (only as part of VCF from 2025) | No longer standalone | SDN, micro-segmentation, firewall, load balancing |
**Key changes after Broadcom acquisition**:
- Discontinued perpetual license sales (February 2024)
- Discontinued standalone products: vSAN and NSX can no longer be purchased standalone (only within VCF)
- Desktop and ROBO variants cancelled (migrated to VCF)
- Average cost increase: 2-5× compared to the previous model (depends on size and product mix)
- **Impact**: Many customers are migrating to Proxmox VE, Nutanix AHV, or Hyper-V
**Per-core calculation**:
```text
Server: 2× EPYC 9654 (96C each) = 192 cores
vSphere Standard: 192 × $140 = $26,880/year
VCF: 192 × $700 = $134,400/year (incl. vSAN and NSX)
For comparison: previously perpetual + SnS ≈ $15,000 one-time + $3,000/year
```
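
The same per-core arithmetic can be scripted for other server configurations. This is a minimal sketch that assumes the 16-core-per-CPU minimum from the table above applies to every socket. The prices passed in are the indicative figures from this section, not quotes.

```python
MIN_CORES_PER_CPU = 16  # licensing minimum per CPU, as listed in the table above


def licensed_cores(sockets: int, cores_per_cpu: int) -> int:
    """Cores that must be licensed, applying the per-CPU minimum."""
    return sockets * max(cores_per_cpu, MIN_CORES_PER_CPU)


def annual_cost(sockets: int, cores_per_cpu: int, price_per_core_year: float) -> float:
    """Yearly subscription cost for one server."""
    return licensed_cores(sockets, cores_per_cpu) * price_per_core_year


if __name__ == "__main__":
    # 2x 96-core CPUs at the indicative vSphere Standard and VCF prices
    print(annual_cost(2, 96, 140), annual_cost(2, 96, 700))
```

Note that a single 8-core CPU is still billed as 16 cores, which is why small hosts are hit relatively harder by the per-core model.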
### VMware Exit Strategy (post-Broadcom 2024+)
#### Context
After Broadcom's acquisition of VMware (completed November 2023), the virtualization market experienced the biggest upheaval in its history. Changes include:
- **Discontinuation of perpetual licenses** (February 2024) — mandatory subscription model
- **Forced bundling** — 8,000+ SKUs reduced to 4 bundles (VCF, VVF, vSphere Standard/Foundation)
- **Minimum 72-core commitment** (from April 2025) — small servers can no longer be licensed economically
- **20% late renewal penalty** — no tolerance
- **Price increase of 150-1,500%** depending on size and product mix
- **Standalone products discontinued** — vSAN and NSX only within VCF
- **Collapse of the partner ecosystem** — from 4,500+ partners to ~300 Premier
According to a Foundry/CIO.com survey (2025), **56%** of organizations plan to reduce VMware usage and **71%** are actively looking for on-premise alternatives. Gartner predicts a loss of ~35% of workloads within 3 years.
#### Three Strategies
| Strategy | Description | Suitable for |
|-----------|-------|------------|
| **Stay** | Accept new pricing, renew VCF/VVF subscription | Large organizations with deep integration where migration costs more than new licenses |
| **Reduce** | Reduce VMware footprint, migrate part of workloads to alternatives, optimize the rest | Medium and large enterprises with heterogeneous environments |
| **Exit** | Complete migration to an alternative platform | SMEs, organizations facing 3-6× cost increases, greenfield projects |
#### Target Platforms — Comparison
| Criterion | Proxmox VE | Nutanix AHV | Microsoft Hyper-V | Red Hat OpenShift Virtualization | **Sangfor aSV (HCI)** |
|-----------|-----------|-------------|-------------------|----------------------------------|----------------------|
| **Hypervisor** | KVM + LXC | KVM (fork) | Hyper-V | KVM (KubeVirt) | **KVM (aSV)** |
| **License** | Open source (free), support ~€500/host/year | Per node subscription (30-60% savings vs VCF) | Windows Server license (Standard/Datacenter) | OpenShift subscription (core-based) | **Per node (Enterprise Pro), all-inclusive** |
| **Live Migration** | Live Migration (Proxmox 8+) | AHV Live Migration | Live Migration (SMB/RDMA) | KubeVirt (VMI live migration) | **Yes** |
| **HA** | Proxmox HA (watchdog, fencing) | Built-in HA (Prism) | Hyper-V HA (WS Failover Cluster) | OpenShift HA (self-healing) | **Built-in HA** |
| **Storage** | ZFS, Ceph, LVM | AOS (hybrid/SSD, erasure coding) | S2D, CSV, ReFS | OCS, Ceph, LSO | **aSAN (distributed SDS, locality-aware)** |
| **Backup** | Proxmox Backup Server (free) | Native snapshot + DR | Windows Server Backup / Veeam | OpenShift APIs + OADP | **Built-in backup + CDP** |
| **Price (3 years, 3 hosts)** | $0 + support ~$4,500 | ~$45,000-60,000 | $0 (Hyper-V Server free) or Windows Server license | ~$90,000+ (OpenShift) | **~$15,000-25,000** |
| **Price (3 years, 10 hosts)** | $0 + support ~$15,000 | ~$150,000-200,000 | Windows Server Datacenter for unlimited VMs | ~$300,000+ (OpenShift) | **~$50,000-80,000** |
| **Migration difficulty** | Medium (VMDK → QCOW2, VirtIO drivers) | Low (Nutanix Move tool) | Medium (V2V converter, SCVMM) | High (Kubernetes learning curve) | **Low (VMware import tool)** |
| **Linux support** | Excellent (native KVM) | Excellent (KVM-based) | Good (LIS drivers) | Excellent (KVM + OpenShift) | **Excellent (KVM-based)** |
| **Windows support** | Good (VirtIO drivers) | Excellent (ALAS drivers, svpd) | Excellent (native) | Good (KubeVirt + VirtIO) | **Good (VirtIO drivers)** |
| **GPU passthrough** | VFIO (excellent) | GPU passthrough | DDA (Direct Device Assignment) | VFIO + GPU Operator | **vGPU support (standard)** |
| **Integrated security** | — | — | — | — | **Yes (NGFW, IPS, WAF, EDR — aSEC)** |
| **Min. cluster (3 copies)** | 3 (Ceph) | 3 | 2-3 | 3 | **3** |
#### Migration Tools
| Tool | Source Platform | Target Platform | Method |
|---------|-------------------|-------------------|--------|
| **Proxmox VMware Import Wizard** | VMware ESXi | Proxmox VE | Web GUI import via NFS/ESXi API. Limitation: snapshots must be removed, UEFI not supported before Proxmox 8.1 |
| **Nutanix Move** | VMware ESXi, Hyper-V | Nutanix AHV | Virtual appliance, automated migration with minimal downtime, UEFI support, can retain IP/MAC |
| **Veeam Backup & Replication v12.2+** | VMware ESXi | Proxmox VE | Backup/restore via Veeam, hot migration, Proxmox support from v12.2 |
| **StarWind V2V Converter** | VMware ESXi | Proxmox, Hyper-V, XCP-ng | Free GUI tool, VMDK → QCOW2/raw/VHDX, CLI support, hot migrations |
| **virt-v2v** | VMware ESXi, Xen, Hyper-V | KVM (libvirt) | Open source CLI tool, disk + driver conversion (virtio), suitable for bulk migration |
| **Windows Admin Center VM Conversion Extension** | VMware ESXi | Hyper-V | Microsoft WAC extension, free, GUI-based, bulk migration |
| **Platform9 vJailbreak** | VMware ESXi | OpenStack / KVM | In-place migration (no swing gear), open source |
| **Sangfor VMware Import Tool** | VMware ESXi | Sangfor aSV (HCI) | VMware import tool, disk + driver conversion, can retain network config |
#### Cross-Hypervisor Migration Matrix
Comprehensive overview of all source→target pairs with methods, tools, limitations, and complexity.
| Source → Target | Method | Tools | Complexity | Limitations |
|-------------|--------|----------|-----------|---------|
| **VMware → Proxmox** | Disk conversion VMDK→QCOW2, driver reinstall | Proxmox Import Wizard, Veeam, StarWind, virt-v2v | Medium | VirtIO drivers required, UEFI not supported in Import Wizard (< 8.1), snapshots must be removed |
| **VMware → Hyper-V** | Disk conversion VMDK→VHDX, driver reinstall | StarWind, WAC Converter, SCVMM, Microsoft MTC | Medium | Integration Services required, network config differences (VMXNET3 → Hyper-V Synthetic) |
| **VMware → KVM/XCP-ng** | Disk conversion VMDK→raw/QCOW2, driver swap | virt-v2v, StarWind | Medium | VirtIO drivers, UEFI support (OVMF), host passthrough compatibility |
| **VMware → Nutanix AHV** | Automated migration via Move appliance | Nutanix Move, Veeam | Low | AHV is also KVM — minimal issues, retain IP/MAC, UEFI support |
| **VMware → Sangfor aSV** | Import via VMware Import Tool, disk + driver conversion | Sangfor VMware Import Tool | Low | Built-in tool, retain network config, UEFI support |
| **VMware → OpenStack** | In-place or swing | Platform9 vJailbreak, virt-v2v + Glance | High | Network redesign (Neutron), storage (Cinder), image format (Glance) required |
| **Hyper-V → VMware** | Disk conversion VHDX→VMDK, driver reinstall | StarWind, virt-v2v, VMware vCenter Converter (standalone) | Medium | VMware Tools required, network driver change (VMXNET3), UEFI/secure boot issues |
| **Hyper-V → Proxmox** | Disk conversion VHDX→QCOW2, driver swap | StarWind, virt-v2v, qemu-img | Medium-High | VirtIO drivers, integration services → guest agent, secure boot issues |
| **Hyper-V → KVM/XCP-ng** | Disk conversion VHDX→raw/QCOW2 | virt-v2v, qemu-img | Medium | VirtIO drivers, Linux generic drivers usually work |
| **Hyper-V → Nutanix AHV** | Automated migration | Nutanix Move | Low-Medium | Similar to VMware→Nutanix, UEFI support, retain IP |
| **Proxmox → VMware** | Export OVF/OVA, qemu-img convert | qemu-img (QCOW2→VMDK), ovftool, manual OVF export | High | VMware Tools required, storage format differences, no live migration, downtime required |
| **Proxmox → Hyper-V** | qemu-img convert, driver reinstall | qemu-img, manual VHDX conversion | High | Hyper-V Integration Services required, no automated tool, edge case |
| **Proxmox → KVM/XCP-ng** | Direct QCOW2 (same format), XML edit | libvirt, virsh dumpxml/define | Medium | libvirt XML/QEMU args differences (storage pool, network), validation required |
| **Proxmox → Nutanix AHV** | qemu-img + manual import | qemu-img, Nutanix Image Service CLI | High | No hot tool, conversion + manual VM reconfiguration required |
| **XCP-ng → VMware** | Disk conversion VHD→VMDK | qemu-img, StarWind, virt-v2v | High | VMware Tools required, paravirtualization differences (Xen PV vs VMware) |
| **XCP-ng → Proxmox** | Disk conversion or direct VHD | qemu-img, manual import | Medium | Disk conversion, VHD format not native in Proxmox |
| **XCP-ng → Hyper-V** | Disk conversion VHD→VHDX (direct) | StarWind, qemu-img | Medium | VHD/VHDX compatible, Integration Services required |
| **Nutanix AHV → VMware** | Export + conversion | qemu-img, Nutanix Export, VMware vCenter Converter | High | VMware Tools, AHV is KVM → usually easier than Hyper-V→VMware |
| **Nutanix AHV → Proxmox** | qemu-img + manual import | qemu-img, Nutanix self-service restore | Medium | AFS disks → QCOW2, metadata must be reconstructed |
| **Nutanix AHV → Hyper-V** | qemu-img + manual | qemu-img, StarWind | High | Edge case, no hot tool |
| **OpenStack → (any)** | Glance export + qemu-img | glance image-download, qemu-img, ovftool | Medium-High | Image format (raw/QCOW2), metadata (flavor, security groups) must be recreated |
| **Sangfor aSV → (any)** | qemu-img conversion + manual | qemu-img, manual OVF/OVA export | Medium-High | KVM-based → conversion to QCOW2/VMDK/VHDX via qemu-img, metadata must be recreated |
| **(any) → Sangfor aSV** | aSV API import + VMware Import Tool | Sangfor VMware Import Tool (for VMware), manual qemu-img import for others | Medium | KVM-based → standard formats supported, import tool for VMware only |
**Keys to a successful migration:**
- **Drivers** — each platform requires its own paravirtual drivers (VMware Tools, VirtIO, Hyper-V Integration Services, Xen Tools). Always swap after migration.
- **UEFI / Secure Boot** — not all combinations support UEFI (Proxmox Import Wizard < 8.1 does not). Test UEFI VMs before migration.
- **Snapshots** — snapshots must be removed (merged) before migration. Most tools only migrate flat disks.
- **Network** — MAC addresses, IP addresses, VLAN tagging — verify after migration. Some tools (Nutanix Move, VMware Converter) can retain MAC.
- **Storage format** — VMDK ↔ VHDX ↔ QCOW2 ↔ raw are inter-convertible via `qemu-img`, but metadata differs (snapshots, backing files).
- **Live migration** — no live migration exists between different hypervisors. Downtime is always required (minutes to hours depending on VM size).
- **Migration temperature** — the "colder" the VM (fewer changes), the easier the migration. Real-time database applications require a separate DB migration plan.
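
The disk format conversions mentioned above all go through `qemu-img convert`. The sketch below only assembles the command line as an argument list and does not execute it. The file paths are hypothetical, and the format set is limited to the ones this section discusses (`vpc` is qemu-img's name for VHD).

```python
import shlex
from typing import List

# qemu-img format names for the disk types discussed in this section
FORMATS = {"vmdk", "vhdx", "vpc", "qcow2", "raw"}


def qemu_img_convert_cmd(src: str, src_fmt: str, dst: str, dst_fmt: str) -> List[str]:
    """Build a `qemu-img convert` command (-f source format, -O output format, -p progress)."""
    for fmt in (src_fmt, dst_fmt):
        if fmt not in FORMATS:
            raise ValueError(f"unsupported format in this sketch: {fmt}")
    return ["qemu-img", "convert", "-p", "-f", src_fmt, "-O", dst_fmt, src, dst]


if __name__ == "__main__":
    # Hypothetical paths, for illustration only
    cmd = qemu_img_convert_cmd("vm1-flat.vmdk", "vmdk", "vm1.qcow2", "qcow2")
    print(shlex.join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` once snapshots have been merged and the VM is powered off. The conversion covers the disk only, so drivers and VM metadata still need the separate steps listed above.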
**License cost comparison (3 hosts, 120 cores in total)**:
| Platform | Year 1 | 3 Years Total | Note |
|-----------|--------|---------------|----------|
| **VMware VVF** (1-year rate) | $22,800 | $68,400 | 120 cores × $190/core/year |
| **VMware VCF** | $42,000 | $126,000 | 120 cores × $350/core/year |
| **Proxmox VE** (support) | $1,500 | $4,500 | 3× €500/host/year |
| **Nutanix AHV** (average) | ~$18,000 | ~$54,000 | Per node subscription, estimate |
| **Hyper-V** (Windows Server Datacenter) | $12,400 | $37,200 | One-time license per core, without SA |
| **Hyper-V** (Azure Stack HCI) | ~$14,400 | ~$43,200 | ~$10/core/month, 120 cores |
| **Sangfor HCI** (Enterprise Pro) | ~$5,000-8,000 | ~$15,000-25,000 | Per node, all-inclusive, 3 nodes |
**Real-world example from Spiceworks (2026)**: A user reports VMware Essentials+ increasing from $1,900/year to $14,000/year (VVF) — a 7.4× increase.
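
The subscription rows and the increase factor quoted above can be reproduced with two small helpers. This is a sketch for checking such figures, assuming a flat per-core price with no discounts, minimums, or penalties.

```python
def multi_year_cost(cores: int, price_per_core_year: float, years: int = 3) -> float:
    """Total subscription cost over the given number of years."""
    return cores * price_per_core_year * years


def increase_factor(old_annual: float, new_annual: float) -> float:
    """How many times more expensive the new annual price is."""
    if old_annual <= 0:
        raise ValueError("old_annual must be positive")
    return new_annual / old_annual


if __name__ == "__main__":
    print(multi_year_cost(120, 190))               # 120 cores at $190/core/year
    print(round(increase_factor(1900, 14000), 1))  # Essentials+ to VVF example
```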
#### Decision Framework
```
1. Audit VMware environment
├─ Number of hosts, core count, utilization
├─ Feature dependency (vSAN, NSX, SRM)
├─ Workload profile (Windows vs Linux, DB, GPU)
└─ Hardware refresh cycle
2. Calculate TCO for VMware renewal (3 years)
├─ VVF vs VCF vs current model
└─ Include audit risk, late renewal penalty
3. Select target platform (1-2 candidates)
├─ Proxmox: lowest TCO, Linux-heavy shops
├─ Nutanix: enterprise HCI, low migration difficulty
├─ Hyper-V: Windows-centric, Azure hybrid
├─ Sangfor: HCI all-in-one, security-first, VMware exit (SMB/mid-market)
└─ OpenShift: Kubernetes-first, platform engineering
4. Plan migration phases
├─ Wave 1: non-critical (dev/test, 1-2 months)
├─ Wave 2: standard production (3-6 months)
├─ Wave 3: mission-critical (6-12 months)
└─ Coexistence: VMware + target running in parallel
5. Allow 18-48 months for complete exit (Gartner)
```
#### Real-World Case Studies
| Organization | Starting Point | Target | Scale | Result |
|-----------|---------|-----|--------|----------|
| **Stanford University** | VMware (60+ nodes) | Proxmox VE (6 clusters) | 1,500 VMs | Completed 2025, increased automation, lower costs |
| **Michelin** | VMware | Platform9 + OpenStack | Dozens of nodes | Platform engineering team, production workload migration |
| **Czech enterprise (50-100 servers)** | VMware | Proxmox VE | ~100 VMs | Annual savings of ~340,000-500,000 CZK on licenses |
#### Timing — Key Deadlines
| Event | Date | Impact |
|---------|-------|-------|
| **Discontinuation of perpetual licenses** | February 2024 | Already done |
| **72-core minimum** | April 2025 | Small server licensing became more expensive |
| **vSphere 7 EOS** | April 2025 | Upgrade to 8.x required |
| **ESXi 8.0 EOS** | October 2027 | Last supported version, migration deadline |
| **Windows Server 2025 Hyper-V** | December 2025 | 64-host cluster, 2,048 vCPU per VM |
| **Proxmox VE 9 + Datacenter Manager** | 2026 | Enterprise features, vCenter alternative |
#### Recommendations
| Scenario | Action |
|--------|------|
| **Small company (< 10 hosts), Linux workloads** | Migrate to Proxmox VE — immediate 100% license savings |
| **Medium company (10-50 hosts), mixed workloads** | Evaluate Nutanix AHV (easy migration) or Proxmox (lower TCO) |
| **Enterprise (50+ hosts), deep VMware integration** | Reduce strategy: optimize existing VMware + migrate selected workloads to OpenShift / Hyper-V |
| **Microsoft shop** | Hyper-V / Azure Stack HCI — native Azure hybrid, no additional hypervisor licenses |
| **Kubernetes-native team** | OpenShift Virtualization / KubeVirt — unify VM and container management |
| **MSP / hosting provider** | Nutanix or OpenStack — multi-tenancy, vCloud Director alternative |
#### Cluster Design
- **Max cluster size**: 64 hosts (vSphere 8/9), 96 hosts (vSphere 8 + enhanced)
- **Datastore limits**: max 256 datastores per host, max 64 TB per VMFS-6 datastore
- **vSAN ready capacity**: recommended max 60-64 hosts per vSAN cluster
- **Fault domains** — cluster division into host groups (rack awareness), min 3 fault domains for stretched cluster
- **Admission control** — resource reservation for HA failover:
  - **Host failures cluster tolerates** — most common (1-4 hosts)
  - **Percentage of cluster resources** — reserve % of CPU/memory
  - **Dedicated failover hosts** — dedicated host(s) for HA
- **Cluster limits (vSphere 8/9)**:
  - 960 VMs per host (vSphere 9 max)
  - 15,000 VMs per cluster (vCenter max)
  - 300 hosts per cluster (vSphere 8/9, hardware vMotion)
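
For the percentage-based admission control policy, the share to reserve follows from how many host failures the cluster should survive. The sketch below assumes equally sized hosts. With mixed host sizes the reservation would have to be based on the largest hosts instead.

```python
def failover_reserve_percent(hosts: int, host_failures: int) -> float:
    """Percent of cluster CPU/memory to reserve so that `host_failures`
    hosts can fail, assuming all hosts have the same capacity."""
    if hosts <= 0 or not 0 <= host_failures < hosts:
        raise ValueError("need hosts > 0 and 0 <= host_failures < hosts")
    return 100.0 * host_failures / hosts


if __name__ == "__main__":
    for n in (3, 4, 8, 16):
        print(f"{n} hosts, tolerate 1 failure: reserve {failover_reserve_percent(n, 1):.1f}%")
```

The reserve shrinks as the cluster grows, which is one reason very small clusters carry a proportionally higher HA overhead.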
### Microsoft Hyper-V Licensing
| Variant | Metric | Price | What it includes |
|----------|---------|------|-------------|
| **Windows Server Standard** | Per core (min 16 licenses/server) + CAL | ~$1,000 per 16-core pack (one-time) + $200/CAL | 2 VM licenses (each with full Windows Server license) |
| **Windows Server Datacenter** | Per core (min 16 licenses/server) + CAL | ~$6,200 per 16-core pack (one-time) + $200/CAL | Unlimited VMs, Storage Spaces Direct, Shielded VMs |
| **Azure Stack HCI** | Per core (monthly) | ~$10-20/core/month (Azure hybrid benefit) | Hyper-V + S2D + Azure management, part of Azure subscription |
| **Hyper-V Server** | Free | $0 | Standalone hypervisor (no management, no GUI, limited support) — no longer distributed as of 2025 |
**Important**:
- Windows Server Standard = 2 VMs per license. If you need 3 VMs on a 2-socket server, you need 2× Standard license (4 VMs) or Datacenter
- **Azure Hybrid Benefit** — if you have Windows Server with SA (Software Assurance), you can use licenses in Azure at no additional cost
- **CAL (Client Access License)** — every user or device accessing Windows Server must have a CAL (except Azure Hybrid Benefit)
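
The Standard versus Datacenter decision comes down to how many times the host has to be licensed with Standard to cover its VMs. In the sketch below, the two cost arguments stand for the price of licensing all cores of one host once with each edition. The values in the example are placeholders, not list prices, and CALs are ignored.

```python
import math


def standard_license_sets(vm_count: int) -> int:
    """Each full Standard licensing of the host covers 2 VMs."""
    return max(1, math.ceil(vm_count / 2))


def cheaper_edition(vm_count: int, standard_host_cost: float,
                    datacenter_host_cost: float) -> str:
    """Compare stacked Standard licenses with a single Datacenter license."""
    stacked = standard_license_sets(vm_count) * standard_host_cost
    return "Standard" if stacked <= datacenter_host_cost else "Datacenter"


if __name__ == "__main__":
    # Placeholder host-level prices for illustration only
    for vms in (2, 3, 12, 14):
        print(vms, standard_license_sets(vms), cheaper_edition(vms, 1000, 6200))
```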
## Microsoft Hyper-V
| Feature | Hyper-V | Note |
|-----------|---------|----------|
| **Max hosts per cluster** | 64 (Windows Server 2025) | Shared Nothing Live Migration |
| **Max VMs per host** | 1,024 (WS 2022+) | Generation 2 VMs |
| **Max vCPU per VM** | 240 (WS 2022), 2,048 (WS 2025) | Generation 2 VMs |
| **Max RAM per VM** | 12 TB (WS 2022+) | Dynamic memory |
| **Live Migration** | SMB, CSV, RDMA | Compressed or RDMA |
| **Storage** | CSV (Cluster Shared Volumes), ReFS | S2D for HCI |
| **Nested Virtualization** | Yes | Intel VT-x / AMD-V |
| **SCVMM** | System Center VMM | Enterprise management, fabric, P2V |
### Hyper-V vs VMware Comparison
| Feature | VMware vSphere | Microsoft Hyper-V |
|-----------|---------------|-------------------|
| **OS** | VMware ESXi (VMkernel) | Windows Server / Hyper-V Server |
| **License** | Per core (subscription) | Windows Server license / Datacenter |
| **Storage** | VMFS, NFS, vSAN, HCI | NTFS, ReFS, SMB, S2D |
| **Live Migration** | vMotion (cross-vSwitch, long distance) | Live Migration (SMB/RDMA) |
| **Storage Migration** | Storage vMotion (online) | Shared Nothing (data disk) |
| **Replication** | vSphere Replication | Hyper-V Replica (ASR) |
| **Management** | vCenter, vSphere Client | SCVMM, Hyper-V Manager, Admin Center |
| **Linux support** | Excellent (open-vm-tools) | Good (Linux Integration Services) |
| **TCO** | Higher | Lower (with Windows license) |
## KVM
### Architecture
```
Hardware ──> QEMU (I/O emulation) + KVM (kernel module, virtualization)
                libvirt (API + management)
                ┌────────────┼────────────┐
          virt-manager     virsh  openstack/proxmox
```
### Tuning
- **CPU pinning** — `virsh vcpupin vm1 0 2` (vCPU 0 → physical core 2), keeps the vCPU on one core, which reduces migrations between cores and the resulting cache misses
- **Huge pages** — 2 MB / 1 GB pages instead of 4 KB, reduces TLB misses (VMs with large RAM): `echo 2048 > /proc/sys/vm/nr_hugepages`
- **NUMA affinity** — VM pinned to one NUMA node (minimizes cross-NUMA memory access)
  - `numactl --cpunodebind=0 --membind=0 <command>`
  - `virsh numatune vm1 --nodeset 0`
- **VirtIO** — paravirtualized I/O (virtio-net, virtio-blk, virtio-scsi) for better performance
- **IO threads** — dedicated threads for QEMU I/O emulation
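
The value written to `nr_hugepages` is simply the VM memory divided by the page size. The sketch below performs that calculation. It covers the guest RAM only, so any extra headroom for QEMU itself is left to the operator.

```python
def hugepages_needed(vm_ram_gib: int, page_size_mib: int = 2) -> int:
    """Number of huge pages required to back vm_ram_gib of guest memory."""
    if vm_ram_gib < 0 or page_size_mib <= 0:
        raise ValueError("invalid sizes")
    total_mib = vm_ram_gib * 1024
    return -(-total_mib // page_size_mib)  # ceiling division


if __name__ == "__main__":
    print(hugepages_needed(4))         # 2 MiB pages for a 4 GiB guest
    print(hugepages_needed(64, 1024))  # 1 GiB pages for a 64 GiB guest
```

The 2048 pages in the example above therefore back 4 GiB of guest memory with 2 MB pages.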
### KVM Tuning Checklist
- Verify HW virtualization: `lscpu | grep Virtualization`
- Load KVM modules: `kvm`, `kvm_intel`/`kvm_amd`, `vfio-pci`
- Optimize storage: raw/LVM (avoid qcow2 for performance workloads)
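
The first checklist item can also be scripted. On x86 Linux, the `flags` lines of `/proc/cpuinfo` contain `vmx` for Intel VT-x or `svm` for AMD-V. The sketch below parses text in that format. The function name is illustrative, and other architectures expose this information differently.

```python
from typing import Set


def hw_virt_flags(cpuinfo_text: str) -> Set[str]:
    """Return the hardware virtualization flags (vmx/svm) found in cpuinfo text."""
    found: Set[str] = set()
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags") and ":" in line:
            flags = set(line.split(":", 1)[1].split())
            found |= flags & {"vmx", "svm"}
    return found


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as fh:
            print(hw_virt_flags(fh.read()) or "no vmx/svm flag found")
    except FileNotFoundError:
        print("/proc/cpuinfo not available on this system")
```

An empty result usually means virtualization is disabled in firmware or the code is running inside a VM without nested virtualization.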
## Sangfor aSV (HCI)
[Chinese vendor](https://www.sangfor.com) — KVM-based hypervisor, part of Sangfor HCI stack (aSV + aSAN + aNet + aSEC). Distributed through partners in EMEA.
### Stack architecture
| Component | Role |
|-----------|------|
| **aSV** | Hypervisor (KVM-based) |
| **aSAN** | Distributed SDS (locality-aware, data tiering, dedup, compression) |
| **aNet** | Network virtualization (distributed switches and routers, WYDIWYG visual editor) |
| **aSEC** | Security (NGFW, IPS, WAF, EDR, east-west segmentation) |
| **Sangfor Cloud Platform** | Management orchestrator, unified dashboard |
### Key features
| Feature | Detail |
|-----------|--------|
| **Hypervisor** | KVM (aSV) — custom fork with HCI extensions |
| **License** | Enterprise Pro — per node, all-inclusive (compute + storage + network + security) |
| **Min. cluster** | 3 nodes (3 data copies) |
| **Live Migration** | Yes |
| **HA** | Built-in HA |
| **Storage** | aSAN — locality-aware, data tiering (SSD + HDD), dedup, compression, erasure coding |
| **Backup** | Built-in backup + CDP — no 3rd party needed |
| **Security** | Integrated NGFW, IPS, WAF, EDR — no external appliances |
| **VDI** | aDesk — integrated VDI solution |
| **Kubernetes** | SKE (Sangfor Kubernetes Engine) |
| **Migration** | Sangfor VMware Import Tool (from vCenter), qemu-img for others |
| **vGPU** | Standard support (no extra license) |
### Comparison with VMware
| Feature | Sangfor | VMware |
|---------|---------|--------|
| **License** | Per node, all-inclusive | Multi-tier (vSphere + vSAN + NSX + Aria) |
| **vGPU** | Included (standard) | Enterprise Plus only |
| **Backup + CDP** | Built-in | 3rd party or extra license |
| **Security (NGFW, IPS, WAF)** | Built-in (aSEC) | NSX + 3rd party |
| **Network management** | WYDIWYG visual editor | NSX Manager (more complex) |
| **Min. cluster (3 copies)** | 3 nodes | 5 nodes (vSAN) |
| **Data locality** | Yes | No |
| **SSD life prediction** | Yes | No |
### Use case
- **VMware exit** — VMware replacement for SMB and mid-market
- **Greenfield HCI** — new DCs, branch offices, remote sites
- **VDI** — aDesk integrated with HCI
- **Security-first** — organizations requiring integrated security
- **Asia-Pacific / EMEA** — strongest in Asia, expanding to Europe
### Risks and limitations
| Risk | Detail |
|--------|--------|
| **Geopolitical** | Chinese vendor — possible regulatory restrictions (GDPR, EU, NATO, government) |
| **Ecosystem** | Smaller community than VMware/Proxmox, less documentation and ISV certifications |
| **Support** | Primary support from Asia, local partner critical |
| **Vendor lock-in** | Closed ecosystem (aSV + aSAN + aNet + aSEC), harder to mix with 3rd party |
| **References in CZ/EU** | Very limited — pilot required before production |
## Storage in Hypervisors
See also: [STORAGE.en.md](STORAGE.en.md) — detailed overview of storage protocols and configurations.
| Type | Description | Protocols / examples |
|-----|-------|-----------|
| **Local storage** | Disks directly in the server | SATA, SAS, NVMe |
| **Shared storage** | SAN / NAS accessible to all hosts | Fibre Channel, iSCSI, NFS, SMB |
| **vSAN / HCI** | Hyperconverged storage (server disks = single pool) | VMware vSAN, Nutanix, StarWind |
| **Software-Defined** | SDS separates storage software from hardware | Ceph, GlusterFS, MinIO |
## HCI Details
| Feature | Nutanix (AOS + AHV) | VMware vSAN | Azure Stack HCI |
|-----------|--------------------|-------------|----------------|
| **Hypervisor** | AHV (KVM fork), ESXi optional | ESXi (required) | Hyper-V |
| **Min. nodes** | 3 | 2 (witness) | 2 (witness) |
| **Max nodes** | 80+ | 64 | 16 (typical) |
| **Replication** | 2 or 3 copies + erasure coding | Mirroring (RAID 1), erasure coding | Mirroring + parity |
| **Deduplication** | Cluster-level (post-process) | Disk-level (capacity tier) | ReFS (real-time) |
| **Compression** | Inline (AOS 6+) | Dedup + compression combined | ReFS |
| **Management** | Prism (web UI) | vCenter + vSAN UI | Windows Admin Center |
| **Licensing** | Per node subscription | Per core subscription | Per core subscription |
| **Ecosystem** | Built-in DR, backup, security | Broad ISV ecosystem | Azure integration |
| **Use case** | Enterprise VDI, general VM | VMware-centric shops | Azure hybrid, branch offices |
## Virtualization Platforms — Comparison
| Capability | VMware vSphere | Microsoft Hyper-V | Proxmox VE | Nutanix AHV |
|-----------|---------------|-------------------|------------|-------------|
| Live Migration | vMotion | Live Migration | Live Migration | Live Migration |
| HA | vSphere HA | Hyper-V HA | Proxmox HA | Built-in |
| DRS/balancing | DRS | SCVMM Dynamic Optimization | HA groups | Built-in |
| Storage vMotion | yes | yes (live storage migration) | Move disk (online), ZFS send/recv | Built-in |
| Snapshots | yes | yes | yes | yes |
| Backup API | CBT (Changed Block Tracking) | Hyper-V WMI / RCT | Proxmox Backup Server | Native |
| GPU passthrough | DirectPath I/O, NVIDIA vGPU (GRID) | DDA | VFIO passthrough | GPU passthrough |
| Licensing | Per core subscription | Windows Server license | Open source (free) | Per node subscription |
## OpenStack
- **Distributions**: Red Hat OpenStack, Canonical Charmed OpenStack
- **Services**: Nova (compute), Cinder (block), Neutron (networking), Glance (images), Swift (object)
- **Use case**: Telco, large private clouds, MNO (MANO, NFVI)
- **Complexity**: High — complex deployment and maintenance
---
## Hypervisor Configuration Variants by Size and Storage Type
### Platform Selection by Use Case
| Use Case | Primary Choice | Alternative | Rationale |
|----------|---------------|-------------|------------|
| **VMware shop, enterprise** | vSphere 8/9 | Hyper-V | Most comprehensive ecosystem, vSAN, SRM, broadest ISV support |
| **Microsoft shop, Azure hybrid** | Hyper-V / Azure Stack HCI | vSphere | Windows Server CAL already in place, S2D, Azure Arc, native Hyper-V Replica |
| **SME / low budget** | Proxmox VE | XCP-ng / Hyper-V (free) | Open source, built-in Ceph, ZFS, PBS, no license costs |
| **HCI greenfield** | Nutanix AHV | VMware vSAN | All-in-one, simple management, built-in DR and backup |
| **Hyperscale / telco** | OpenStack (RHOSP) | — | Multi-tenancy, NFVI, MANO, Neutron SDN, Ceph integration |
### Variant A: Small Deployment (2-3 hosts, local storage)
For small companies, branch offices, edge, dev/test. No shared storage — HA provided at the application level or via VM replication.
| Parameter | Proxmox VE | VMware vSphere | Hyper-V |
|----------|-----------|---------------|---------|
| **CPU** | 1× EPYC 9124-9224 / Xeon 4410Y (12-24C) | 1× EPYC 9124-9224 / Xeon 4410Y | 1× Xeon 4410Y / EPYC 9124 |
| **RAM** | 64-128 GB (DDR5-4800, 1DPC) | 64-128 GB | 64-128 GB |
| **OS disk** | 2× SATA SSD RAID1 (240-480 GB) | 2× SATA SSD RAID1 | 2× SATA SSD RAID1 |
| **VM storage** | ZFS RAID10 (4-6× NVMe/SATA SSD) | VMFS local (4-6× SSD RAID5/10) | ReFS CSV (4-6× SSD RAID10) |
| **Network** | 2× 10/25 GbE LACP | 2× 10/25 GbE LACP + management | 2× 10/25 GbE LACP |
| **Management** | Proxmox web UI (1× node) | vCSA / vCenter (1× appliance) | Windows Admin Center / SCVMM |
| **HA** | Proxmox HA (watchdog, fencing) | vSphere HA (1 host failure) | Hyper-V HA (WS Failover Cluster) |
| **Backup** | Proxmox Backup Server | Veeam B&R (Community) | Windows Server Backup / Veeam |
| **License** | Free (support ~€500/host/year) | vSphere Essentials (~$600/3 hosts) | Windows Server Standard (2 VMs) |
**Use case**: Startup, branch office, dev/test, < 200 VMs, no SAN, minimal budget.
**Advantages**: Low cost, simple management. **Disadvantages**: Limited scalability, host failure = VM unavailability.
### Variant B: Medium HCI (3-6 hosts, vSAN / Ceph)
Hyperconverged infrastructure — storage runs on the same hosts as VMs.
| Parameter | VMware vSAN | Proxmox + Ceph | Nutanix AHV |
|----------|------------|----------------|-------------|
| **CPU** | 1-2× EPYC 9334-9654 (32-96C per CPU) | 1-2× EPYC 9224-9334 (24-32C per CPU) | 1-2× EPYC 9334-9654 |
| **RAM** | 256-512 GB | 128-256 GB | 256-512 GB |
| **Cache tier** | 1-2× NVMe cache (write buffer) | — (Ceph uses RAM/OSD) | 1-2× NVMe (oplog) |
| **Capacity tier** | 4-8× SSD (SAS/SATA) | 4-8× HBA NVMe/SSD (OSD) | 4-6× SSD (extent store) |
| **Network** | 4× 25 GbE (vSAN + VM + mgmt) | 4× 25 GbE (Ceph public + cluster) | 4× 25 GbE (storage + VM) |
| **Fault domain** | Rack awareness (3 racks min) | CRUSH rack level | Rack awareness |
| **Replication** | RAID-1 mirroring (FTT=1) | 3× replication / EC (e.g. 4+2 with 6 hosts) | 2× copies + EC |
| **Dedupe/Compress** | Dedup + compression (capacity) | ZFS / Ceph compression (inline) | Inline compression |
| **HA limit** | 1-3 host failures | 1-2 host failures (replication) | 1-2 host failures |
| **Min. hosts** | 2 + witness | 3 (MON + OSD) | 3 |
**Use case**: Medium company, VDI, general virtualization, 50-500 VMs.
**Recommendation**: For vSAN, use at least 4 hosts for FTT=1 with erasure coding. For Ceph, 3 hosts is the minimum and 5+ is preferable; run one OSD per NVMe drive for maximum IOPS.
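The capacity impact of the replication choices in the table can be estimated with a short calculation. The sketch below is illustrative only: the function names and the 96 TB raw figure are assumptions, not values from this document, and it ignores filesystem overhead, free-space reserves and cache devices.

```python
def usable_replicated(raw_tb: float, copies: int) -> float:
    """Usable capacity when every object is stored `copies` times."""
    if copies < 1:
        raise ValueError("copies must be >= 1")
    return raw_tb / copies


def usable_erasure_coded(raw_tb: float, k: int, m: int) -> float:
    """Usable capacity for a k+m erasure-coded pool (k data, m parity chunks)."""
    if k < 1 or m < 1:
        raise ValueError("k and m must be >= 1")
    return raw_tb * k / (k + m)


def min_hosts_for_ec(k: int, m: int) -> int:
    """With a host-level failure domain, each chunk needs its own host."""
    return k + m


if __name__ == "__main__":
    raw = 96.0  # hypothetical raw capacity in TB across the cluster
    print("3x replication:", usable_replicated(raw, 3), "TB usable")
    print("EC 4+2:", usable_erasure_coded(raw, 4, 2), "TB usable,",
          min_hosts_for_ec(4, 2), "hosts minimum")
```

Under these assumptions a wide erasure-coding profile saves capacity but raises the minimum host count, which is why small clusters usually stay with replication or a narrow profile.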
### Variant C: Enterprise FC SAN (6+ hosts)
Classic 3-tier architecture — compute (hosts) + storage (SAN) + network separated.
| Parameter | VMware vSphere | Hyper-V |
|----------|---------------|---------|
| **CPU** | 2× EPYC 9654-9965 (96-192C per CPU) | 2× EPYC 9654-9965 / Xeon 8592+ |
| **RAM** | 512-2048 GB (DDR5) | 512-2048 GB |
| **OS disk** | 2× SATA SSD RAID1 (480 GB) | 2× SATA SSD RAID1 |
| **Storage** | FC SAN LUN (2× FC HBA 32/64G) | FC SAN LUN or CSV over SMB |
| **App network** | 2-4× 25/100 GbE LACP | 2-4× 25/100 GbE LACP |
| **Storage network** | 2× FC 32/64G (multipath) | 2× FC 32/64G or SMB Multichannel |
| **vMotion / Live Migration** | 2× 25 GbE dedicated (vMotion) | 2× 25 GbE dedicated (SMB/RDMA) |
| **Management** | vCenter (VCSA), NSX, Aria | SCVMM, Azure Arc |
| **Cluster max** | 64-96 hosts (vSphere 8/9) | 64 hosts (WS 2025) |
| **Admission control** | 1-4 host failures | Nodes reserve |
| **DRS / Balancing** | DRS (fully automated) | SCVMM Dynamic Optimization |
**Use case**: Enterprise, databases, critical applications, 500-5000 VMs.
**Storage variants**: FC SAN (lowest latency), iSCSI (lower CAPEX), NFS (simpler management).
**FC SAN topology**:
```
   ┌─────────────────────────────────────┐
   │              FC Fabric              │
   │    ┌─────────┐       ┌─────────┐    │
   │    │ Switch 1│       │ Switch 2│    │
   │    └────┬────┘       └────┬────┘    │
   └─────────┼─────────────────┼─────────┘
       ┌─────┴─────┐     ┌─────┴─────┐
   ┌───┤ FC HBA 1  ├─┐ ┌─┤ FC HBA 2  ├───┐
   │   └───────────┘ │ │ └───────────┘   │
┌──┴──┐           ┌──┴─┴──┐           ┌──┴──┐
│Host1│           │ Host2 │           │Host3│ ...
└─────┘           └───────┘           └─────┘
```
### Variant D: Hyperscale OpenStack (20+ hosts)
For telco, large private clouds, MANO/NFVI environments.
| Parameter | Red Hat OpenStack | Canonical Charmed OpenStack |
|----------|-------------------|-----------------------------|
| **Compute** | Nova + KVM | Nova + KVM |
| **Storage** | Ceph (Cinder/RBD) + Swift | Ceph + Swift |
| **Network** | Neutron + OVN/OVS + DPDK | Neutron + OVN/OVS |
| **CPU per host** | 2× EPYC 9654-9965 (96-192C per CPU) | 2× EPYC 9654-9965 |
| **RAM per host** | 512-1024 GB | 512-1024 GB |
| **Storage per host** | Ceph OSD (4-12× NVMe/SSD) | Ceph OSD |
| **Network per host** | 4-8× 100 GbE (DPDK/VPP) | 4× 100 GbE |
| **Control plane** | 3-9× control node (HA) | 3-7× control node |
| **Orchestration** | TripleO / OpenStack Kolla | Juju + charms |
| **SDN** | OVN, OpenDaylight | OVN |
| **NFVI ready** | Yes (SR-IOV, NUMA, huge pages) | Yes |
| **Min. size** | 9 nodes (3 ctl + 3 compute + 3 ceph) | 7 nodes |
**Use case**: Telco (5G UPF, MNO), hyperscale private cloud, > 5000 VMs.
### Connectivity Summary by Platform
| Platform | App / VM Network | Storage Network | Replication / HA | Management |
|-----------|-------------|-------------|----------------|------------|
| **Proxmox small** | 2× 10/25 GbE LACP | — (local ZFS) | — | 1× 1 GbE |
| **vSAN (3-6)** | 2× 25 GbE LACP | 2× 25 GbE (vSAN) | vSAN traffic | 1× 1 GbE |
| **Proxmox Ceph (3-6)** | 2× 25 GbE | 2× 25 GbE (Ceph public) | 2× 25 GbE (Ceph cluster) | 1× 1 GbE |
| **Nutanix (3-6)** | 2× 25 GbE | Dedicated storage VLAN | Replication traffic | 1× 1 GbE |
| **vSphere FC SAN (6+)** | 2-4× 25/100 GbE LACP | 2× FC 32/64G multipath | 2× 25 GbE (vMotion) | 1× 1 GbE + SAN mgmt |
| **Hyper-V FC SAN (6+)** | 2-4× 25/100 GbE LACP | 2× FC 32/64G or SMB | 2× 25 GbE (Live Migration) | 1× 1 GbE |
| **OpenStack (20+)** | 2-4× 100 GbE | 2× 100 GbE (Ceph) | 2× 100 GbE (OVN) | 1× 1 GbE |
## Resources
Links, books and standards: [sources/infrastructure/sources.en.md](sources/infrastructure/sources.en.md)
### Recommended Reading
| Book | Authors | ISBN | Description |
|-------|--------|------|-------|
| Virtualization Essentials (3rd ed., 2023) | Matthew Portnoy | 978-1119481513 | Practical guide to virtualization: from hypervisor basics (Type 1/Type 2), VM configuration (CPU, memory, storage, networking) to cloud computing and DevOps. "Learning-by-doing" approach with tutorials. Author is a Senior System Engineer at VMware/Splunk. |
| VMware vSphere Design (2nd ed.) | Guthrie, Lowe, Coleman | 978-1119130312 | Comprehensive guide to vSphere infrastructure design: hardware selection, network layout, security, storage and hypervisors. Describes a framework for design, decision analysis and best practices from experienced VMware architects. |
*Last revision: 2026-06-04*

HYPERVISORS.md Normal file

@@ -0,0 +1,566 @@
# 🖥️ Hypervisors and Virtualization Platforms
## Hypervisor Types
| Type | Description | Examples |
|-----|-------|----------|
| **Type 1** (bare-metal) | Runs directly on the hardware | VMware ESXi, Microsoft Hyper-V, KVM, Xen |
| **Type 2** (hosted) | Runs on top of a host OS | VirtualBox, VMware Workstation, Parallels |
## Platform Overview
| Platform | Hypervisor | License | Note |
|-----------|-----------|---------|----------|
| **VMware vSphere** | ESXi | Proprietary (subscription since 2024) | Market leader with broad adoption. After the Broadcom acquisition (2023) it moved to per-core subscription and perpetual licenses were discontinued |
| **Microsoft Hyper-V** | Hyper-V | Windows Server / standalone | Integration with Azure, SCVMM |
| **Proxmox VE** | KVM + LXC | Open source | Debian-based, web UI, low cost |
| **Red Hat OpenStack / oVirt** | KVM | Open source | Open alternative, complex |
| **Nutanix AHV** | KVM (fork) | Part of Nutanix | Integrated HCI solution |
| **XCP-ng / Xen Server** | Xen | Open source | Successor to Citrix Hypervisor |
| **Oracle VM** | Xen | Proprietary | Oracle ecosystem |
## Key Concepts
- **VM (Virtual Machine)**: full virtualization, own kernel
- **Container**: shares the host kernel, lighter weight (Docker, LXC)
- **Paravirtualization**: the guest OS knows it runs in a VM (better I/O performance)
- **NUMA**: Non-Uniform Memory Access, optimization of CPU/memory placement (see [SERVER-HW.md](SERVER-HW.md#numa))
- **Overcommit**: allocating more vCPU/RAM than is physically available (managed via a ratio)
- **Live Migration**: moving a running VM between hosts (vSphere vMotion, Hyper-V Live Migration)
- **HA (High Availability)**: restart of a VM on another host after a failure
- **DRS / Load Balancing**: automatic distribution of VMs according to load
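The overcommit ratio mentioned in the list is simply allocated virtual resources divided by physical resources. A minimal sketch, assuming hypothetical function names and an example policy limit that does not come from this document:

```python
def overcommit_ratio(allocated: float, physical: float) -> float:
    """Ratio of allocated virtual resources (vCPU or RAM) to physical ones."""
    if physical <= 0:
        raise ValueError("physical capacity must be positive")
    return allocated / physical


def within_limit(allocated: float, physical: float, max_ratio: float) -> bool:
    """True when the ratio stays at or below the chosen policy limit."""
    return overcommit_ratio(allocated, physical) <= max_ratio


if __name__ == "__main__":
    # Hypothetical host: 32 physical cores, 128 vCPUs allocated to VMs.
    print("vCPU ratio:", overcommit_ratio(128, 32))
    # Hypothetical policy: allow at most 4 vCPUs per physical core.
    print("within 4:1 policy:", within_limit(128, 32, 4.0))
```

The acceptable limit depends on the workload; the 4:1 value here is only a placeholder for whatever ratio the operator decides to enforce.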
## VMware vSphere
### VMware licensing (post-Broadcom 2024+)
Since 2024 VMware sells subscription licenses only; perpetual licenses with SnS (Support & Subscription) have been discontinued.
| Product | Metric | Price (indicative) | What it includes |
|---------|---------|-------------------|-------------|
| **vSphere Standard** | Per core (min 16 cores/CPU) | ~$140/core/year | ESXi, vCenter, vMotion, HA, DRS basic |
| **vSphere Enterprise Plus** | Per core | ~$220/core/year | Everything above + DRS advanced, SIOC, NIOC, Big Data Extensions |
| **vSphere Foundation** | Per core (bundle) | ~$350/core/year | vSphere Enterprise Plus + Aria Operations, Aria Operations for Logs, Aria Automation |
| **VMware Cloud Foundation (VCF)** | Per core (bundle) | ~$700/core/year | vSphere + vSAN + NSX + the full Aria suite. Required for vSAN and NSX since 2025 |
| **vSAN** | Per core (only as part of VCF since 2025) | No longer standalone | Storage virtualization, dedup, compression, encryption |
| **NSX** | Per core (only as part of VCF since 2025) | No longer standalone | SDN, micro-segmentation, firewall, load balancing |
**Key changes after the Broadcom acquisition**:
- Sale of perpetual licenses ended (May 2024)
- Standalone products discontinued: vSAN and NSX can no longer be bought separately (only within VCF)
- Desktop and ROBO variants cancelled (migrated to VCF)
- Average cost increase: 2-5× compared with the previous model (depends on size and product mix)
- **Impact**: Many customers are migrating to Proxmox VE, Nutanix AHV or Hyper-V
**Per-core calculation**:
```text
Server: 2× EPYC 9654 (96C each) = 192 cores
vSphere Standard: 192 × $140 = $26 880/year
VCF: 192 × $700 = $134 400/year (incl. vSAN and NSX)
For comparison: previously perpetual + SnS ≈ $15 000 one-time + $3 000/year
```
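The same arithmetic can be wrapped in a small helper that also applies the 16-core-per-CPU minimum from the licensing table. This is a sketch under stated assumptions: the function names are made up for illustration and the prices are the indicative figures quoted above, not official list prices.

```python
MIN_CORES_PER_CPU = 16  # licensing minimum per CPU taken from the table above


def licensed_cores(sockets: int, cores_per_cpu: int) -> int:
    """Cores that must be licensed: at least MIN_CORES_PER_CPU per socket."""
    return sockets * max(cores_per_cpu, MIN_CORES_PER_CPU)


def annual_cost(sockets: int, cores_per_cpu: int, price_per_core: int) -> int:
    """Yearly subscription cost for one server."""
    return licensed_cores(sockets, cores_per_cpu) * price_per_core


if __name__ == "__main__":
    # 2x 96-core CPUs at the indicative per-core prices from the table.
    print("vSphere Standard:", annual_cost(2, 96, 140))
    print("VCF:", annual_cost(2, 96, 700))
    # A small 8-core single-socket server is still billed for 16 cores.
    print("Small server cores billed:", licensed_cores(1, 8))
```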
### VMware exit strategy (post-Broadcom 2024+)
#### Context
Broadcom's acquisition of VMware (completed November 2023) caused the largest upheaval in the history of the virtualization market. The changes include:
- **End of perpetual licenses** (February 2024): mandatory subscription model
- **Forced bundling**: 8000+ SKUs reduced to 4 bundles (VCF, VVF, vSphere Standard/Foundation)
- **Minimum commitment of 72 cores** (since April 2025): small servers cannot be licensed below this minimum
- **20% penalty for late renewal**: no grace period
- **Price increase of 150-1500%** depending on size and product mix
- **End of standalone products**: vSAN and NSX only within VCF
- **Collapse of the partner ecosystem**: from 4500+ partners to ~300 Premier
According to a Foundry/CIO.com survey (2025), **56%** of organizations plan to reduce their use of VMware and **71%** are actively looking for on-premises alternatives. Gartner predicts a loss of ~35% of workloads within 3 years.
#### Three strategies
| Strategy | Description | Suitable for |
|-----------|-------|------------|
| **Stay** | Accept the new prices, renew the VCF/VVF subscription | Large organizations with deep integration, where migration costs more than new licenses |
| **Reduce** | Shrink the VMware footprint, migrate part of the workloads to alternatives, optimize the rest | Medium and large companies with heterogeneous environments |
| **Exit** | Complete migration to an alternative platform | SMEs, organizations facing 3-6× cost growth, greenfield projects |
#### Target platforms: comparison
| Criterion | Proxmox VE | Nutanix AHV | Microsoft Hyper-V | Red Hat OpenShift Virtualization | **Sangfor aSV (HCI)** |
|-----------|-----------|-------------|-------------------|----------------------------------|----------------------|
| **Hypervisor** | KVM + LXC | KVM (fork) | Hyper-V | KVM (KubeVirt) | **KVM (aSV)** |
| **License** | Open source (free), support ~€500/host/year | Per node subscription (30-60% savings compared with VCF) | Windows Server license (Standard/Datacenter) | OpenShift subscription (core-based) | **Per node (Enterprise Pro), all-inclusive** |
| **Live Migration** | Live Migration (Proxmox 8+) | AHV Live Migration | Live Migration (SMB/RDMA) | KubeVirt (VMI live migration) | **Yes** |
| **HA** | Proxmox HA (watchdog, fencing) | Built-in HA (Prism) | Hyper-V HA (WS Failover Cluster) | OpenShift HA (self-healing) | **Built-in HA** |
| **Storage** | ZFS, Ceph, LVM | AOS (hybrid/SSD, erasure coding) | S2D, CSV, ReFS | OCS, Ceph, LSO | **aSAN (distributed SDS, locality-aware)** |
| **Backup** | Proxmox Backup Server (free) | Native snapshot + DR | Windows Server Backup / Veeam | OpenShift APIs + OADP | **Built-in backup + CDP (Continuous Data Protection)** |
| **Price (3 years, 3 hosts)** | $0 license + ~$4 500 support | ~$45 000-60 000 | $0 (free Hyper-V Server) or Windows Server license | ~$90 000+ (OpenShift) | **~$15 000-25 000** |
| **Price (3 years, 10 hosts)** | $0 license + ~$15 000 support | ~$150 000-200 000 | Windows Server Datacenter for unlimited VMs | ~$300 000+ (OpenShift) | **~$50 000-80 000** |
| **Migration effort** | Medium (VMDK → QCOW2, VirtIO drivers) | Low (Nutanix Move tool) | Medium (V2V converter, SCVMM) | High (Kubernetes learning curve) | **Low (VMware import tools)** |
| **Linux support** | Excellent (native KVM) | Excellent (KVM-based) | Good (LIS drivers) | Excellent (KVM + OpenShift) | **Excellent (KVM-based)** |
| **Windows support** | Good (VirtIO drivers) | Excellent (ALAS drivers, svpd) | Excellent (native) | Good (KubeVirt + VirtIO) | **Good (VirtIO drivers)** |
| **GPU passthrough** | VFIO (excellent) | GPU passthrough | DDA (Direct Device Assignment) | VFIO + GPU Operator | **vGPU support (standard)** |
| **Integrated security** | - | - | - | - | **Yes (NGFW, IPS, WAF, EDR: aSEC)** |
| **Min. cluster (3 copies)** | 3 (Ceph) | 3 | 2-3 | 3 | **3** |
#### Migration tools
| Tool | Source platform | Target platform | Method |
|---------|-------------------|-------------------|--------|
| **Proxmox VMware Import Wizard** | VMware ESXi | Proxmox VE | Web GUI import via NFS/ESXi API. Limitations: snapshots must be removed first, UEFI not supported before Proxmox 8.1 |
| **Nutanix Move** | VMware ESXi, Hyper-V | Nutanix AHV | Virtual appliance, automated migration with minimal downtime, UEFI support, option to retain IP/MAC |
| **Veeam Backup & Replication v12.2+** | VMware ESXi | Proxmox VE | Backup/restore via Veeam, hot migration, Proxmox supported since v12.2 |
| **StarWind V2V Converter** | VMware ESXi | Proxmox, Hyper-V, XCP-ng | Free GUI tool, VMDK → QCOW2/raw/VHDX, CLI support, hot migrations |
| **virt-v2v** | VMware ESXi, Xen, Hyper-V | KVM (libvirt) | Open source CLI tool, converts disks and drivers (virtio), suitable for bulk migration |
| **Windows Admin Center VM Conversion Extension** | VMware ESXi | Hyper-V | Microsoft WAC extension, free, GUI-based, bulk migration |
| **Platform9 vJailbreak** | VMware ESXi | OpenStack / KVM | In-place migration (no swing gear), open source |
| **Sangfor VMware Import Tool** | VMware ESXi | Sangfor aSV (HCI) | Tool for importing VMs from vCenter, disk and driver conversion, option to retain network config |
#### Cross-hypervisor migration matrix
Comprehensive overview of all source → target pairs with methods, tools, limitations and difficulty.
| Source → Target | Method | Tools | Difficulty | Limitations |
|-------------|--------|----------|-----------|---------|
| **VMware → Proxmox** | Disk conversion VMDK→QCOW2, driver reinstallation | Proxmox Import Wizard, Veeam, StarWind, virt-v2v | Medium | VirtIO drivers required, UEFI not supported in Import Wizard (< 8.1), snapshots must be removed |
| **VMware → Hyper-V** | Disk conversion VMDK→VHDX, driver reinstallation | StarWind, WAC Converter, SCVMM, Microsoft MTC | Medium | Integration Services required, differences in network configuration (VMXNET3 → Hyper-V Synthetic) |
| **VMware → KVM/XCP-ng** | Disk conversion VMDK→raw/QCOW2, driver swap | virt-v2v, StarWind | Medium | VirtIO drivers, UEFI support (OVMF), host passthrough must be compatible |
| **VMware → Nutanix AHV** | Automated migration via the Move appliance | Nutanix Move, Veeam | Low | AHV is also KVM-based, so few issues; retain IP/MAC, UEFI support |
| **VMware → Sangfor aSV** | Import via VMware Import Tool, disk and driver conversion | Sangfor VMware Import Tool | Low | Built-in tool, retain network config, UEFI support |
| **VMware → OpenStack** | In-place or swing | Platform9 vJailbreak, virt-v2v + Glance | High | Requires redesign of networking (Neutron), storage (Cinder), image format (Glance) |
| **Hyper-V → VMware** | Disk conversion VHDX→VMDK, driver reinstallation | StarWind, virt-v2v, VMware vCenter Converter (standalone) | Medium | VMware Tools required, network driver change (VMXNET3), UEFI/secure boot issues |
| **Hyper-V → Proxmox** | Disk conversion VHDX→QCOW2, driver swap | StarWind, virt-v2v, qemu-img | Medium-High | VirtIO drivers, integration services → guest agent, secure boot issues |
| **Hyper-V → KVM/XCP-ng** | Disk conversion VHDX→raw/QCOW2 | virt-v2v, qemu-img | Medium | VirtIO drivers, generic Linux drivers usually work |
| **Hyper-V → Nutanix AHV** | Automated migration | Nutanix Move | Low-Medium | Similar to VMware→Nutanix, UEFI support, retain IP |
| **Proxmox → VMware** | OVF/OVA export, qemu-img convert | qemu-img (QCOW2→VMDK), ovftool, manual OVF export | High | VMware Tools required, differences in storage formats, no live migration, downtime required |
| **Proxmox → Hyper-V** | qemu-img convert, driver reinstallation | qemu-img, manual VHDX conversion | High | Hyper-V Integration Services required, no automated tool, edge case |
| **Proxmox → KVM/XCP-ng** | Direct QCOW2 (same format), XML adjustment | libvirt, virsh dumpxml/define | Medium | Differences in libvirt XML/QEMU args (storage pool, network), validation required |
| **Proxmox → Nutanix AHV** | qemu-img + manual import | qemu-img, Nutanix Image Service CLI | High | No hot migration tool, conversion and manual VM reconfiguration required |
| **XCP-ng → VMware** | Disk conversion VHD→VMDK | qemu-img, StarWind, virt-v2v | High | VMware Tools required, differences in paravirtualization (Xen PV vs VMware) |
| **XCP-ng → Proxmox** | Disk conversion or direct VHD | qemu-img, manual import | Medium | Disk conversion, VHD is not a native format in Proxmox |
| **XCP-ng → Hyper-V** | Disk conversion VHD→VHDX (direct) | StarWind, qemu-img | Medium | VHD/VHDX compatible, Integration Services required |
| **Nutanix AHV → VMware** | Export + conversion | qemu-img, Nutanix Export, VMware vCenter Converter | High | VMware Tools, AHV is KVM → usually easier than Hyper-V→VMware |
| **Nutanix AHV → Proxmox** | qemu-img + manual import | qemu-img, Nutanix self-service restore | Medium | Disks from AFS → QCOW2, metadata must be reconstructed |
| **Nutanix AHV → Hyper-V** | qemu-img + manual | qemu-img, StarWind | High | Edge case, no hot migration tool |
| **OpenStack → (any)** | Glance export + qemu-img | glance image-download, qemu-img, ovftool | Medium-High | Image format (raw/QCOW2), metadata (flavor, security groups) must be recreated |
| **Sangfor aSV → (any)** | qemu-img conversion + manual | qemu-img, manual OVF/OVA export | Medium-High | KVM-based → conversion to QCOW2/VMDK/VHDX via qemu-img, metadata must be recreated |
| **(any) → Sangfor aSV** | aSV API import + VMware Import Tool | Sangfor VMware Import Tool (for VMware), manual qemu-img import for others | Medium | KVM-based → supports standard formats, import tool only for VMware |
**Keys to a successful migration:**
- **Drivers**: every platform requires its own paravirtual drivers (VMware Tools, VirtIO, Hyper-V Integration Services, Xen Tools). Always replace them after migration.
- **UEFI / Secure Boot**: not all combinations support UEFI (Proxmox Import Wizard < 8.1 does not). Test UEFI VM migrations first.
- **Snapshots**: snapshots must be removed (consolidated) before migration. Most tools migrate flat disks only.
- **Network**: check MAC addresses, IP addresses and VLAN tagging after migration. Some tools (Nutanix Move, VMware Converter) can retain the MAC.
- **Storage format**: VMDK ↔ VHDX ↔ QCOW2 ↔ raw are mutually convertible via `qemu-img`, but differ in metadata (snapshots, backing files).
- **Live migration**: there is no live migration between different hypervisors. Downtime is always required (minutes to hours depending on VM size).
- **Migration temperature**: the "colder" the VM (fewer changes), the easier the migration. Applications with a real-time database need a separate DB migration plan.
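The disk-format conversion step can be scripted around `qemu-img convert`. The sketch below only builds the command line and does not execute it; the helper name and the file names are hypothetical, while `-p`, `-f` and `-O` are the standard `qemu-img convert` options for progress, source format and output format.

```python
import shlex

# Format names as used by qemu-img; "vpc" is qemu-img's name for legacy VHD.
KNOWN_FORMATS = {"vmdk", "vhdx", "vpc", "qcow2", "raw"}


def build_convert_command(src: str, src_fmt: str, dst: str, dst_fmt: str) -> list[str]:
    """Return the argv list for converting a disk image with qemu-img."""
    for fmt in (src_fmt, dst_fmt):
        if fmt not in KNOWN_FORMATS:
            raise ValueError(f"unsupported format: {fmt}")
    return ["qemu-img", "convert", "-p", "-f", src_fmt, "-O", dst_fmt, src, dst]


if __name__ == "__main__":
    # Hypothetical file names; run the printed command on a powered-off VM disk.
    cmd = build_convert_command("vm1.vmdk", "vmdk", "vm1.qcow2", "qcow2")
    print(shlex.join(cmd))
```

Returning a list rather than a shell string keeps paths with spaces safe when the command is later passed to `subprocess.run`.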
#### TCO comparison, example: 3 hosts (2× 20C CPU), 50 VMs
| Platform | Year 1 | 3 years total | Note |
|-----------|--------|---------------|----------|
| **VMware VVF** (1-year rate) | $22 800 | $68 400 | 120 cores × $190/core/year |
| **VMware VCF** | $42 000 | $126 000 | 120 cores × $350/core/year |
| **Proxmox VE** (support) | $1 500 | $4 500 | 3× €500/host/year |
| **Nutanix AHV** (average) | ~$18 000 | ~$54 000 | Per node subscription, estimate |
| **Hyper-V** (Windows Server Datacenter) | $12 400 | $37 200 | One-time per-core license, without SA |
| **Hyper-V** (Azure Stack HCI) | ~$14 400 | ~$43 200 | ~$10/core/month, 120 cores |
| **Sangfor HCI** (Enterprise Pro) | ~$5 000-8 000 | ~$15 000-25 000 | Per node, all-inclusive, 3 nodes |
**Real-world example from Spiceworks (2026)**: A user reports an increase from $1 900/year for VMware Essentials+ to $14 000/year (VVF), a 7.4× increase.
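The per-core and per-host rows of the table follow the same formula: unit price × number of units × years. A minimal sketch, assuming a hypothetical `Offer` type and using only the indicative rates quoted in the table:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    name: str
    unit_price: int  # price per unit per year
    per: str         # billing unit: "core" or "host"


def total_cost(offer: Offer, hosts: int, cores_per_host: int, years: int) -> int:
    """Subscription or support cost over the given number of years."""
    if offer.per == "core":
        units = hosts * cores_per_host
    elif offer.per == "host":
        units = hosts
    else:
        raise ValueError(f"unknown billing unit: {offer.per}")
    return offer.unit_price * units * years


if __name__ == "__main__":
    # 3 hosts with 2x 20 cores each = 40 cores per host, 120 cores in total.
    offers = [
        Offer("VMware VVF", 190, "core"),
        Offer("VMware VCF", 350, "core"),
        Offer("Proxmox VE support", 500, "host"),
    ]
    for offer in offers:
        print(offer.name, total_cost(offer, 3, 40, 1), total_cost(offer, 3, 40, 3))
```

The sketch covers recurring costs only; one-time licenses, hardware and migration effort would need separate terms.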
#### Decision framework
```
1. Audit the VMware environment
   ├─ Host count, core count, utilization
   ├─ Feature dependency (vSAN, NSX, SRM)
   ├─ Workload profile (Windows vs Linux, DB, GPU)
   └─ Hardware refresh cycle
2. Calculate the TCO of a VMware renewal (3 years)
   ├─ VVF vs VCF vs current model
   └─ Include audit risk, late renewal penalty
3. Select the target platform (1-2 candidates)
   ├─ Proxmox: lowest TCO, Linux-heavy shops
   ├─ Nutanix: enterprise HCI, low migration effort
   ├─ Hyper-V: Windows-centric, Azure hybrid
   ├─ Sangfor: HCI all-in-one, security-first, VMware exit (SMB/mid-market)
   └─ OpenShift: Kubernetes-first, platform engineering
4. Plan the migration phases
   ├─ Wave 1: non-critical (dev/test, 1-2 months)
   ├─ Wave 2: standard production (3-6 months)
   ├─ Wave 3: mission-critical (6-12 months)
   └─ Coexistence: VMware and the target run in parallel
5. Allow 18-48 months for a complete exit (Gartner)
```
#### Real-world case studies
| Organization | Source | Target | Scope | Result |
|-----------|---------|-----|--------|----------|
| **Stanford University** | VMware (60+ nodes) | Proxmox VE (6 clusters) | 1 500 VMs | Completed 2025, increased automation, lower costs |
| **Michelin** | VMware | Platform9 + OpenStack | Dozens of nodes | Platform engineering team, migration of manufacturing workloads |
| **Czech company (50-100 servers)** | VMware | Proxmox VE | ~100 VMs | Annual license savings of ~340 000-500 000 CZK |
#### Timing: key deadlines
| Event | Date | Impact |
|---------|-------|-------|
| **End of perpetual licenses** | February 2024 | Already in effect |
| **72-core minimum** | April 2025 | Small-server licensing became more expensive |
| **vSphere 7 EOS** | April 2025 | Upgrade to 8.x required |
| **ESXi 8.0 EOS** | October 2027 | Last supported version, migration deadline |
| **Windows Server 2025 Hyper-V** | December 2025 | 64-host clusters, 2 048 vCPU per VM |
| **Proxmox VE 9 + Datacenter Manager** | 2026 | Enterprise features, vCenter alternative |
#### Recommendations
| Scenario | Action |
|--------|------|
| **Small company (< 10 hosts), Linux workloads** | Migrate to Proxmox VE: immediate 100% saving on licenses |
| **Medium company (10-50 hosts), mixed workloads** | Evaluate Nutanix AHV (easy migration) or Proxmox (lower TCO) |
| **Enterprise (50+ hosts), deep VMware integration** | Reduce strategy: optimize the existing VMware estate and migrate selected workloads to OpenShift / Hyper-V |
| **Microsoft shop** | Hyper-V / Azure Stack HCI: native Azure hybrid, no additional hypervisor licenses |
| **Kubernetes-native team** | OpenShift Virtualization / KubeVirt: unify VM and container management |
| **MSP / hosting provider** | Nutanix or OpenStack: multi-tenancy, vCloud Director alternative |
#### Cluster design
- **Max cluster size**: 64 hosts (vSphere 8/9), 96 hosts (vSphere 8 + enhanced)
- **Datastore limits**: max 256 datastores per host, max 64 TB per VMFS-6 datastore
- **vSAN ready capacity**: recommended max 60-64 hosts per vSAN cluster
- **Fault domains**: splitting the cluster into groups of hosts (rack awareness), min 3 fault domains for a stretched cluster
- **Admission control**: resource reservation for HA failover:
  - **Host failures cluster tolerates**: most common (1-4 hosts)
  - **Percentage of cluster resources**: reserves a percentage of CPU/memory
  - **Dedicated failover hosts**: host(s) set aside for HA
- **Cluster limits (vSphere 8/9)**:
  - 960 VMs per host (vSphere 9 max)
  - 15 000 VMs per cluster (vCenter max)
  - 300 hosts per cluster (vSphere 8/9, hardware vMotion)
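For the percentage-based admission control policy, the share of resources to reserve follows directly from the number of host failures to tolerate. A minimal sketch, assuming equally sized hosts and a hypothetical function name (real clusters with mixed host sizes need a capacity-weighted calculation):

```python
import math


def reserve_percent(hosts: int, failures_to_tolerate: int) -> int:
    """Percent of cluster CPU/memory to reserve, rounded up, for equal hosts."""
    if hosts < 2:
        raise ValueError("a cluster needs at least 2 hosts")
    if not 0 < failures_to_tolerate < hosts:
        raise ValueError("failures_to_tolerate must be between 1 and hosts - 1")
    return math.ceil(100 * failures_to_tolerate / hosts)


if __name__ == "__main__":
    for hosts, failures in [(4, 1), (6, 1), (8, 2)]:
        print(f"{hosts} hosts, tolerate {failures}:",
              reserve_percent(hosts, failures), "% reserved")
```

Rounding up keeps the reservation on the safe side when the division is not exact.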
### Microsoft Hyper-V licensing
| Variant | Metric | Price | What it includes |
|----------|---------|------|-------------|
| **Windows Server Standard** | Per core (min 16 licenses/server) + CAL | ~$1 000/core (one-time) + $200/CAL | 2 VM licenses (each with a full Windows Server license) |
| **Windows Server Datacenter** | Per core (min 16 licenses/server) + CAL | ~$6 200/core (one-time) + $200/CAL | Unlimited VMs, Storage Spaces Direct, Shielded VMs |
| **Azure Stack HCI** | Per core (monthly) | ~$10-20/core/month (Azure Hybrid Benefit) | Hyper-V + S2D + Azure management, part of the Azure subscription |
| **Hyper-V Server** | Free | $0 | Standalone hypervisor (no management, no GUI, limited support); no longer distributed since 2025 |
**Important**:
- Windows Server Standard = 2 VMs per license. If you need 3 VMs on a 2-socket server, you need 2× Standard licenses (4 VMs) or Datacenter
- **Azure Hybrid Benefit**: if you have Windows Server with SA (Software Assurance), you can use the license in Azure at no additional cost
- **CAL (Client Access License)**: every user or device accessing Windows Server must have a CAL (except with Azure Hybrid Benefit)
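The "2 VMs per Standard license" rule means Standard licenses have to be stacked as the VM count grows, until Datacenter becomes cheaper. The sketch below illustrates that break-even under stated assumptions: the function names are hypothetical, the 16-core minimum is the one from the table, and prices are passed in by the caller because real prices vary.

```python
import math

MIN_CORES_PER_SERVER = 16  # minimum core licenses per server, as in the table above
VMS_PER_STANDARD_SET = 2   # each full set of Standard licenses covers 2 VMs


def licensed_cores(physical_cores: int) -> int:
    """Cores that must be licensed on one server."""
    return max(physical_cores, MIN_CORES_PER_SERVER)


def standard_sets(vm_count: int) -> int:
    """How many times all cores must be licensed with Standard."""
    return max(1, math.ceil(vm_count / VMS_PER_STANDARD_SET))


def standard_core_licenses(physical_cores: int, vm_count: int) -> int:
    """Total Standard core licenses needed for the given number of VMs."""
    return licensed_cores(physical_cores) * standard_sets(vm_count)


def cheaper_edition(physical_cores: int, vm_count: int,
                    std_price: float, dc_price: float) -> str:
    """Compare stacked Standard against Datacenter for one server."""
    std_total = standard_core_licenses(physical_cores, vm_count) * std_price
    dc_total = licensed_cores(physical_cores) * dc_price
    return "Standard" if std_total <= dc_total else "Datacenter"


if __name__ == "__main__":
    # Example from the text: 3 VMs need two stacked Standard sets.
    print("Sets for 3 VMs:", standard_sets(3))
    # Indicative per-core prices from the table, 32-core server.
    for vms in (4, 12, 14):
        print(vms, "VMs ->", cheaper_edition(32, vms, 1000, 6200))
```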
## Microsoft Hyper-V
| Property | Hyper-V | Note |
|-----------|---------|----------|
| **Max hosts per cluster** | 64 (Windows Server 2025) | Shared Nothing Live Migration |
| **Max VMs per host** | 1024 (WS 2022+) | Generation 2 VMs |
| **Max vCPU per VM** | 240 (WS 2022+) | 64-host cluster |
| **Max RAM per VM** | 12 TB (WS 2022+) | Dynamic Memory |
| **Live Migration** | SMB, CSV, RDMA | Compressed or RDMA |
| **Storage** | CSV (Cluster Shared Volumes), ReFS | S2D for HCI |
| **Nested Virtualization** | Yes | Intel VT-x / AMD-V |
| **SCVMM** | System Center VMM | Enterprise management, fabric, P2V |
### Hyper-V vs VMware comparison
| Property | VMware vSphere | Microsoft Hyper-V |
|-----------|---------------|-------------------|
| **OS** | VMware ESXi (VMkernel) | Windows Server / Hyper-V Server |
| **License** | Per core (subscription) | Windows Server license / Datacenter |
| **Storage** | VMFS, NFS, vSAN, HCI | NTFS, ReFS, SMB, S2D |
| **Live Migration** | vMotion (cross-vSwitch, long distance) | Live Migration (SMB/RDMA) |
| **Storage Migration** | Storage vMotion (online) | Shared Nothing (data disk) |
| **Replication** | vSphere Replication | Hyper-V Replica (ASR) |
| **Management** | vCenter, vSphere Client | SCVMM, Hyper-V Manager, Admin Center |
| **Linux support** | Excellent (open-vm-tools) | Good (Linux Integration Services) |
| **TCO** | Higher | Lower (with a Windows license) |
## KVM
### Architecture
```
Hardware ──> QEMU (I/O emulation) + KVM (kernel module, virtualization)
         libvirt (API + management)
      ┌───────────────┼─────────────────┐
virt-manager        virsh       openstack/proxmox
```
### Tuning
- **CPU pinning**: `virsh vcpupin vm1 0 2` (vCPU 0 → physical CPU 2) keeps the vCPU on one core, which avoids scheduler migrations and preserves cache locality
- **Huge pages**: 2 MB / 1 GB pages instead of 4 KB reduce TLB misses (VMs with large RAM): `echo 2048 > /proc/sys/vm/nr_hugepages`
- **NUMA affinity**: pin the VM to a single NUMA node (minimizes cross-NUMA memory access)
  - `numactl --cpunodebind=0 --membind=0`
  - `virsh numatune vm1 --nodeset 0`
- **VirtIO**: paravirtualized I/O (virtio-net, virtio-blk, virtio-scsi) for better performance
- **IO threads**: dedicated threads for QEMU I/O processing
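The `nr_hugepages` value in the list above is simply guest memory divided by page size: 2048 pages of 2 MiB reserve 4 GiB. A minimal sketch of that arithmetic, assuming the whole guest is backed by huge pages of a single size; the function name `hugepages_needed` is illustrative and not part of any KVM tool:

```python
import math


def hugepages_needed(vm_ram_gib: float, page_size_mib: int = 2) -> int:
    """Return how many huge pages of page_size_mib back vm_ram_gib of guest RAM."""
    if vm_ram_gib <= 0 or page_size_mib <= 0:
        raise ValueError("memory size and page size must be positive")
    # Round up so the guest memory always fits into the reserved pages.
    return math.ceil(vm_ram_gib * 1024 / page_size_mib)


if __name__ == "__main__":
    print(hugepages_needed(4))                       # 2 MiB pages for a 4 GiB guest
    print(hugepages_needed(64, page_size_mib=1024))  # 1 GiB pages for a 64 GiB guest
```

In practice the host needs some extra headroom on top of this figure, because QEMU and the host OS still use regular 4 KB pages for their own memory.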
### KVM tuning checklist
- Verify hardware virtualization: `lscpu | grep Virtualization`
- Load the KVM modules: `kvm`, `kvm_intel`/`kvm_amd` (and `vfio-pci` if PCI passthrough is used)
- Optimize storage: raw/LVM (avoid qcow2 for performance-sensitive workloads)
## Sangfor aSV (HCI)
[Chinese vendor](https://www.sangfor.com) with a KVM-based hypervisor that is part of the Sangfor HCI stack (aSV + aSAN + aNet + aSEC). Distributed in the Czech Republic through partners.
### Stack architecture
| Component | Role |
|-----------|------|
| **aSV** | Hypervisor (KVM-based) |
| **aSAN** | Distributed SDS (locality-aware, data tiering, dedup, compression) |
| **aNet** | Network virtualization (distributed switches and routers, WYDIWYG visual editor) |
| **aSEC** | Security (NGFW, IPS, WAF, EDR, east-west segmentation) |
| **Sangfor Cloud Platform** | Management orchestrator, unified dashboard |
### Key features
| Feature | Detail |
|-----------|--------|
| **Hypervisor** | KVM (aSV), a custom fork with HCI extensions |
| **Licensing** | Enterprise Pro: per node, all-inclusive (compute + storage + network + security) |
| **Min. cluster** | 3 nodes (3 data copies) |
| **Live Migration** | Yes |
| **HA** | Built-in HA |
| **Storage** | aSAN: locality-aware (data locality), data tiering (SSD + HDD), dedup, compression, erasure coding |
| **Backup** | Built-in backup + CDP (Continuous Data Protection), no 3rd party needed |
| **Security** | Integrated NGFW, IPS, WAF, EDR, no external appliances |
| **VDI** | aDesk: integrated VDI solution |
| **Kubernetes** | SKE (Sangfor Kubernetes Engine) |
| **Migration** | Sangfor VMware Import Tool (from vCenter), qemu-img for others |
| **vGPU** | Standard support (no extra license) |
### Comparison with VMware
| Feature | Sangfor | VMware |
|---------|---------|--------|
| **Licensing** | Per node, all-inclusive | Multi-tier (vSphere + vSAN + NSX + Aria) |
| **vGPU** | Included (standard) | Enterprise Plus only |
| **Backup + CDP** | Built-in | 3rd party or extra license |
| **Security (NGFW, IPS, WAF)** | Built-in (aSEC) | NSX + 3rd party (Palo Alto, Check Point) |
| **Network management** | WYDIWYG visual editor | NSX Manager (more complex) |
| **Min. cluster (3 copies)** | 3 nodes | 5 nodes (vSAN) |
| **Data locality** | Yes | No |
| **SSD life prediction** | Yes | No |
### Use case
- **VMware exit**: replacement for VMware in SMB and mid-market
- **Greenfield HCI**: new DCs, branch offices, remote sites
- **VDI**: aDesk integrated with HCI
- **Security-first**: organizations that require integrated security (NGFW, IPS, WAF)
- **Asia-Pacific / EMEA**: strongest in Asia, expanding into Europe
### Risks and limitations
| Risk | Detail |
|--------|--------|
| **Geopolitical** | Chinese vendor: possible regulatory restrictions (GDPR, EU, NATO, government) |
| **Ecosystem** | Smaller community than VMware/Proxmox, less documentation and fewer ISV certifications |
| **Support** | Support primarily from Asia; a local partner is critical |
| **Vendor lock-in** | Closed ecosystem (aSV + aSAN + aNet + aSEC), harder to mix with 3rd party |
| **References in the Czech Republic** | Very limited; a pilot is needed before production |
### Migration to/from Sangfor
See the migration matrix earlier in this section. For VMware → Sangfor there is a dedicated import tool. For other hypervisors, use standard qemu-img.
## Storage in hypervisors
See also: [STORAGE.md](STORAGE.md) for a detailed overview of storage protocols and configurations.
| Type | Description | Protocols / examples |
|-----|-------|-----------|
| **Local storage** | Disks directly in the server | SATA, SAS, NVMe |
| **Shared storage** | SAN / NAS accessible to all hosts | Fibre Channel, iSCSI, NFS, SMB |
| **vSAN / HCI** | Hyperconverged storage (server disks form a single pool) | VMware vSAN, Nutanix, StarWind |
| **Software-Defined** | SDS decouples storage software from hardware | Ceph, GlusterFS, MinIO |
## HCI detail
| Feature | Nutanix (AOS + AHV) | VMware vSAN | Azure Stack HCI |
|-----------|--------------------|-------------|----------------|
| **Hypervisor** | AHV (KVM fork), ESXi optional | ESXi (required) | Hyper-V |
| **Min. nodes** | 3 | 2 (witness) | 2 (witness) |
| **Max nodes** | 80+ | 64 | 16 (typical) |
| **Replication** | 2 or 3 copies + erasure coding | Mirroring (RAID 1), erasure coding | Mirroring + parity |
| **Deduplication** | Na úrovni clusteru (post-process) | Na úrovni disku (capacity tier) | ReFS (real-time) |
| **Compression** | Inline (AOS 6+) | Dedup + compression combined | ReFS |
| **Management** | Prism (web UI) | vCenter + vSAN UI | Windows Admin Center |
| **Licensing** | Per node subscription | Per CPU subscription | Per core subscription |
| **Ekosystém** | Built-in DR, backup, security | Broad ISV ecosystem | Azure integration |
| **Use case** | Enterprise VDI, general VM | VMware-centric shops | Azure hybrid, branch offices |
## Virtualization platforms: comparison
| Capability | VMware vSphere | Microsoft Hyper-V | Proxmox VE | Nutanix AHV |
|-----------|---------------|-------------------|------------|-------------|
| Live Migration | vMotion | Live Migration | Live Migration | Live Migration |
| HA | vSphere HA | Hyper-V HA | Proxmox HA | Built-in |
| DRS / balancing | DRS | SCVMM Dynamic Optimization / VM Load Balancing | HA groups | Built-in |
| Storage migration (live) | Yes (Storage vMotion) | Yes (Storage Live Migration) | ZFS send/recv | Built-in |
| Snapshots | yes | yes | yes | yes |
| Backup API | CBT (Changed Block Tracking) | Hyper-V WMI / RCT | Proxmox Backup Server | Native |
| GPU passthrough | vGPU (NVIDIA Grid) | DDA | VFIO passthrough | GPU passthrough |
| Licensing | Per CPU / subscription | Windows Server license | Open source (free) | Per node subscription |
## OpenStack
- **Distributions**: Red Hat OpenStack, Canonical Charmed OpenStack
- **Services**: Nova (compute), Cinder (block), Neutron (networking), Glance (images), Swift (object)
- **Use case**: Telco, large private clouds, MNO (MANO, NFVI)
- **Complexity**: High; deployment and maintenance are complex
---
## Hypervisor configuration variants by size and storage type
### Choosing a platform by use case
| Use case | Primary choice | Alternative | Rationale |
|----------|---------------|-------------|------------|
| **VMware shop, enterprise** | vSphere 8/9 | Hyper-V | Most extensive ecosystem, vSAN, SRM, broadest ISV support |
| **Microsoft shop, Azure hybrid** | Hyper-V / Azure Stack HCI | vSphere | Windows Server licenses and CALs already in place, S2D, Azure Arc, native Hyper-V Replica |
| **SME / low budget** | Proxmox VE | XCP-ng / Hyper-V (free) | Open source, built-in Ceph, ZFS, PBS, no license costs |
| **HCI greenfield** | Nutanix AHV | VMware vSAN | All-in-one, simple management, built-in DR and backup |
| **Hyperscale / telco** | OpenStack (RHOSP) | n/a | Multi-tenancy, NFVI, MANO, Neutron SDN, Ceph integration |
### Variant A: Small deployment (2-3 hosts, local storage)
For small companies, branch offices, edge, dev/test. No shared storage; HA is provided at the application level or by VM replication.
| Parameter | Proxmox VE | VMware vSphere | Hyper-V |
|----------|-----------|---------------|---------|
| **CPU** | 1× EPYC 9124-9224 / Xeon 4410Y (12-24C) | 1× EPYC 9124-9224 / Xeon 4410Y | 1× Xeon 4410Y / EPYC 9124 |
| **RAM** | 64-128 GB (DDR5-4800, 1DPC) | 64-128 GB | 64-128 GB |
| **OS disk** | 2× SATA SSD RAID1 (240-480 GB) | 2× SATA SSD RAID1 | 2× SATA SSD RAID1 |
| **VM storage** | ZFS RAID10 (4-6× NVMe/SATA SSD) | VMFS local (4-6× SSD RAID5/10) | ReFS CSV (4-6× SSD RAID10) |
| **Network** | 2× 10/25 GbE LACP | 2× 10/25 GbE LACP + management | 2× 10/25 GbE LACP |
| **Management** | Proxmox web UI (1× node) | vCSA / vCenter (1× appliance) | Windows Admin Center / SCVMM |
| **HA** | Proxmox HA (watchdog, fencing; with ZFS replication) | vSphere HA (1 host failure; requires a shared datastore) | Hyper-V HA (WS Failover Cluster; CSV requires S2D or shared storage) |
| **Backup** | Proxmox Backup Server | Veeam B&R (Community) | Windows Server Backup / Veeam |
| **Licensing** | Free (support ~€500/host/year) | vSphere Essentials (~$600/3 hosts) | Windows Server Standard (2 VMs) |
**Use case**: Startup, branch office, dev/test, < 200 VMs, no SAN, minimal budget.
**Advantages**: Low cost, simple management. **Disadvantages**: Limited scalability; without replication, a host failure makes its VMs unavailable.
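Whether a 2-3 host setup can restart or fail over its VMs after losing a host comes down to memory headroom on the surviving hosts. The sketch below is a rough check under stated assumptions: RAM is the limiting resource, VMs can actually be restarted elsewhere (HA or replica failover), and a fixed fraction of each host is reserved for the hypervisor. The function name and the 10 % default reserve are illustrative choices, not vendor guidance.

```python
def survives_host_failures(hosts: int, ram_per_host_gb: float,
                           vm_ram_total_gb: float, failures: int = 1,
                           host_reserve: float = 0.10) -> bool:
    """Check whether all VM memory still fits after `failures` hosts are lost."""
    if not 0 <= host_reserve < 1:
        raise ValueError("host_reserve must be in [0, 1)")
    remaining = hosts - failures
    if remaining <= 0:
        return False
    # Memory left for VMs on the surviving hosts after the hypervisor reserve.
    usable_gb = remaining * ram_per_host_gb * (1 - host_reserve)
    return vm_ram_total_gb <= usable_gb


if __name__ == "__main__":
    # 3 hosts with 128 GB each, 200 GB of VM memory in total.
    print(survives_host_failures(3, 128, 200))
    # The same load on only 2 hosts.
    print(survives_host_failures(2, 128, 200))
```

CPU and storage capacity need the same kind of check; memory is usually the first limit hit on small hosts.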
### Variant B: Mid-size HCI (3-6 hosts, vSAN / Ceph)
Hyperconverged infrastructure: storage runs on the same hosts as the VMs.
| Parameter | VMware vSAN | Proxmox + Ceph | Nutanix AHV |
|----------|------------|----------------|-------------|
| **CPU** | 1-2× EPYC 9334-9654 (32-96C) | 1-2× EPYC 9224-9334 (24-32C) | 1-2× EPYC 9334-9654 |
| **RAM** | 256-512 GB | 128-256 GB | 256-512 GB |
| **Cache tier** | 1-2× NVMe cache (write buffer) | none (Ceph caches in RAM per OSD) | 1-2× NVMe (oplog) |
| **Capacity tier** | 4-8× SSD (SAS/SATA) | 4-8× NVMe/SSD on HBA (OSD) | 4-6× SSD (extent store) |
| **Network** | 4× 25 GbE (vSAN + VM + mgmt) | 4× 25 GbE (Ceph public + cluster) | 4× 25 GbE (storage + VM) |
| **Fault domain** | Rack awareness (3 racks min) | CRUSH rack level | Rack awareness |
| **Replication** | RAID-1 mirroring (FTT=1) | 3× replication / EC 4+2 (EC with host-level failure domain needs k+m hosts) | 2 copies + EC |
| **Dedupe/Compress** | Dedup + compression (capacity) | Ceph BlueStore compression (inline) | Inline compression |
| **HA limit** | 1-3 host failures | 1-2 host failures (replication) | 1-2 host failures |
| **Min. hosts** | 2 + witness | 3 (MON + OSD) | 3 |
**Use case**: Mid-size company, VDI, general virtualization, 50-500 VMs.
**Recommendation**: For vSAN, at least 4 hosts for FTT=1 with erasure coding. For Ceph, at least 3 hosts, ideally 5+, with one OSD per NVMe device for maximum IOPS.
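The choice between replication and erasure coding in the table above is mostly a capacity-versus-host-count trade-off. A minimal sketch of the arithmetic, assuming each erasure-coded chunk lands on a different host (host-level failure domain) and ignoring metadata overhead and free-space reserves; the function names are illustrative:

```python
def usable_replicated_tb(raw_tb: float, copies: int) -> float:
    """Usable capacity when every object is stored `copies` times."""
    return raw_tb / copies


def usable_erasure_coded_tb(raw_tb: float, k: int, m: int) -> float:
    """Usable capacity for erasure coding with k data and m coding chunks."""
    return raw_tb * k / (k + m)


def min_hosts_for_ec(k: int, m: int) -> int:
    """Hosts needed so that each of the k+m chunks sits on a different host."""
    return k + m


if __name__ == "__main__":
    raw = 120.0  # TB of raw capacity across the cluster
    print(usable_replicated_tb(raw, 3))        # 3x replication
    print(usable_erasure_coded_tb(raw, 4, 2))  # EC 4+2
    print(min_hosts_for_ec(4, 2), min_hosts_for_ec(8, 3))
```

Under these assumptions a wide profile such as 8+3 needs 11 hosts, which is more than the 3-6 hosts this variant targets.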
### Variant C: Enterprise FC SAN (6+ hosts)
Classic 3-tier architecture: compute (hosts), storage (SAN) and network are separate.
| Parameter | VMware vSphere | Hyper-V |
|----------|---------------|---------|
| **CPU** | 2× EPYC 9654-9965 (96-192C) | 2× EPYC 9654-9965 / Xeon 8592+ |
| **RAM** | 512-2048 GB (DDR5) | 512-2048 GB |
| **OS disk** | 2× SATA SSD RAID1 (480 GB) | 2× SATA SSD RAID1 |
| **Storage** | FC SAN LUN (2× FC HBA 32/64G) | FC SAN LUN or CSV over SMB |
| **App network** | 2-4× 25/100 GbE LACP | 2-4× 25/100 GbE LACP |
| **Storage network** | 2× FC 32/64G (multipath) | 2× FC 32/64G or SMB Multichannel |
| **vMotion / Live Migration** | 2× 25 GbE dedicated (vMotion) | 2× 25 GbE dedicated (SMB/RDMA) |
| **Management** | vCenter (VCSA), NSX, Aria | SCVMM, Azure Arc |
| **Cluster max** | 64-96 hosts (vSphere 8/9) | 64 hosts (WS 2025) |
| **Admission control** | 1-4 host failures | Reserved node capacity |
| **DRS / balancing** | DRS (fully automated) | SCVMM Dynamic Optimization / VM Load Balancing |
**Use case**: Enterprise, databases, critical applications, 500-5000 VMs.
**Storage options**: FC SAN (lowest latency), iSCSI (lower CAPEX), NFS (simpler management).
**FC SAN topology**:
```
    ┌─────────────────────────────────────┐
    │              FC Fabric              │
    │   ┌─────────┐       ┌─────────┐     │
    │   │ Switch 1│       │ Switch 2│     │
    │   └────┬────┘       └────┬────┘     │
    └────────┼─────────────────┼──────────┘
       ┌─────┴─────┐     ┌─────┴─────┐
   ┌───┤ FC HBA 1  ├─┐ ┌─┤ FC HBA 2  ├───┐
   │   └───────────┘ │ │ └───────────┘   │
┌──┴──┐           ┌──┴─┴───┐          ┌──┴──┐
│Host1│           │ Host2  │          │Host3│ ...
└─────┘           └────────┘          └─────┘
```
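With multipathing, the number of paths a host sees to a LUN follows from how many host ports and array target ports are zoned together in each fabric. A minimal sketch, assuming every host port in a fabric is zoned to every array port in the same fabric and that the two fabrics are isolated from each other; the function name and the tuple layout are illustrative:

```python
def paths_per_lun(fabrics):
    """Count paths to one LUN.

    `fabrics` is an iterable of (host_ports, array_ports) tuples, one per fabric.
    """
    return sum(host_ports * array_ports for host_ports, array_ports in fabrics)


if __name__ == "__main__":
    # One HBA port per fabric on the host, two array target ports per fabric.
    dual_fabric = [(1, 2), (1, 2)]
    print(paths_per_lun(dual_fabric))      # paths with both fabrics healthy
    print(paths_per_lun(dual_fabric[:1]))  # paths left after losing one fabric
```

The second call shows why two independent fabrics are used: losing a whole switch halves the path count but does not cut the host off from storage.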
### Variant D: Hyperscale OpenStack (20+ hosts)
For telco, large private clouds, MANO/NFVI environments.
| Parameter | Red Hat OpenStack | Canonical Charmed OpenStack |
|----------|-------------------|-----------------------------|
| **Compute** | Nova + KVM | Nova + KVM |
| **Storage** | Ceph (Cinder/RBD) + Swift | Ceph + Swift |
| **Network** | Neutron + OVN/OVS + DPDK | Neutron + OVN/OVS |
| **CPU per host** | 2× EPYC 9654-9965 (96-192C) | 2× EPYC 9654-9965 |
| **RAM per host** | 512-1024 GB | 512-1024 GB |
| **Storage per host** | Ceph OSD (4-12× NVMe/SSD) | Ceph OSD |
| **Network per host** | 4-8× 100 GbE (DPDK/VPP) | 4× 100 GbE |
| **Control plane** | 3-9× control nodes (HA) | 3-7× control nodes |
| **Orchestration** | TripleO / OpenStack Kolla | Juju + charms |
| **SDN** | OVN, OpenDaylight | OVN |
| **NFVI ready** | Yes (SR-IOV, NUMA, huge pages) | Yes |
| **Min. size** | 9 nodes (3 ctl + 3 compute + 3 ceph) | 7 nodes |
**Use case**: Telco (5G UPF, MNO), hyperscale private cloud, > 5000 VMs.
### Connectivity summary by platform
| Platform | App / VM network | Storage network | Replication / HA | Management |
|-----------|-------------|-------------|----------------|------------|
| **Proxmox small** | 2× 10/25 GbE LACP | none (local ZFS) | none | 1× 1 GbE |
| **vSAN (3-6)** | 2× 25 GbE LACP | 2× 25 GbE (vSAN) | vSAN traffic | 1× 1 GbE |
| **Proxmox Ceph (3-6)** | 2× 25 GbE | 2× 25 GbE (Ceph public) | 2× 25 GbE (Ceph cluster) | 1× 1 GbE |
| **Nutanix (3-6)** | 2× 25 GbE | Dedicated storage VLAN | Replication traffic | 1× 1 GbE |
| **vSphere FC SAN (6+)** | 2-4× 25/100 GbE LACP | 2× FC 32/64G multipath | 2× 25 GbE (vMotion) | 1× 1 GbE + SAN mgmt |
| **Hyper-V FC SAN (6+)** | 2-4× 25/100 GbE LACP | 2× FC 32/64G or SMB | 2× 25 GbE (Live Migration) | 1× 1 GbE |
| **OpenStack (20+)** | 2-4× 100 GbE | 2× 100 GbE (Ceph) | 2× 100 GbE (OVN) | 1× 1 GbE |
## Sources
Links, books and standards: [sources/infrastructure/sources.md](sources/infrastructure/sources.md)
### Recommended reading
| Book | Authors | ISBN | Description |
|-------|--------|------|-------|
| Virtualization Essentials (3rd ed., 2023) | Matthew Portnoy | 978-1119481513 | Practical guide to virtualization: from hypervisor basics (Type 1/Type 2) and VM configuration (CPU, memory, storage, networking) to cloud computing and DevOps. "Learning-by-doing" approach with tutorials. The author is a Senior System Engineer at VMware/Splunk. |
| VMware vSphere Design (2nd ed.) | Guthrie, Lowe, Coleman | 978-1119130312 | Comprehensive guide to designing vSphere infrastructure: hardware selection, network layout, security, storage and hypervisors. Describes a design framework, decision analysis and best practices from experienced VMware architects. |
*Last revision: 2026-06-04*

12
INFRASTRUCTURE.en.md Normal file

@@ -0,0 +1,12 @@
# 🏗️ Infrastructure
This file has been split into separate areas:
| Area | File |
|--------|--------|
| 🖥️ Hypervisors and virtualization | [HYPERVISORS.en.md](HYPERVISORS.en.md) |
| 🏭 Data centers | [DATACENTERS.en.md](DATACENTERS.en.md) |
| 💾 Storage | [STORAGE.en.md](STORAGE.en.md) |
| 🔧 Hardware and servers | [HARDWARE.en.md](HARDWARE.en.md) |
*Last revision: 2026-06-03*

12
INFRASTRUCTURE.md Normal file

@@ -0,0 +1,12 @@
# 🏗️ Infrastructure
This file has been split into separate areas:
| Area | File |
|--------|--------|
| 🖥️ Hypervisors and virtualization | [HYPERVISORS.md](HYPERVISORS.md) |
| 🏭 Data centers | [DATACENTERS.md](DATACENTERS.md) |
| 💾 Storage | [STORAGE.md](STORAGE.md) |
| 🔧 Hardware and servers | [HARDWARE.md](HARDWARE.md) |
*Last revision: 2026-06-03*

299
KUBERNETES.en.md Normal file

@@ -0,0 +1,299 @@
# ☸ Kubernetes — architecture, platforms, Cluster API
## Overview
Kubernetes (K8s) is an open-source container orchestrator and the de facto standard for deploying, scaling, and managing containerized applications. It is built on declarative configuration and control loops (reconciliation).
## Kubernetes deployment methods
| Method | Description | Control plane | Best for |
|--------|-------------|--------------|----------|
| **kubeadm** | Official K8s cluster bootstrap tool | Self-managed (stacked/external etcd) | On-prem, lab, learning |
| **K3s** | Lightweight K8s (Rancher), single binary, embedded etcd/SQLite | Self-managed | Edge, IoT, low-resource, HA with embedded etcd |
| **RKE2** | Rancher Kubernetes Engine 2, CIS-hardened, FIPS-ready | Self-managed | Enterprise on-prem, air-gapped, regulatory |
| **OpenShift** | Red Hat enterprise K8s + operator lifecycle + SDN + routing | Self-managed (RHCOS) | Enterprise, multicluster, platform engineering |
| **Vanilla K8s (CAPI)** | Cluster API — declarative provisioning and lifecycle management | Self-managed (CAPI managed) | Fleet management, GitOps, multi-provider |
| **EKS** (AWS) | Managed K8s | AWS managed | AWS cloud-native, least ops |
| **AKS** (Azure) | Managed K8s | Azure managed | Azure cloud-native |
| **GKE** (GCP) | Managed K8s, Standard and Autopilot modes | GCP managed | GCP cloud-native |
| **SKE** (Sangfor) | Managed K8s on Sangfor HCI | Vendor managed | Sangfor HCI ecosystem |
---
## Cluster API (CAPI)
### What is Cluster API
Cluster API is a Kubernetes sub-project (SIG Cluster-Lifecycle) that brings declarative APIs for provisioning, upgrading, and operating Kubernetes clusters. Instead of Terraform scripts or manual `kubeadm`, you define clusters as Kubernetes Custom Resources — `Cluster`, `Machine`, `MachineDeployment`, etc.
Core principle: **A Kubernetes cluster that manages Kubernetes clusters.**
### Architecture
```
┌─────────────────────────────────────────┐
│           Management Cluster            │
│                                         │
│  ┌───────────────────────────────────┐  │
│  │         CAPI Controllers          │  │
│  │ ┌───────┐ ┌─────────┐ ┌─────────┐ │  │
│  │ │ Infra │ │Bootstrap│ │ Control │ │  │
│  │ │ Prov  │ │  Prov   │ │Plane Pr │ │  │
│  │ └───────┘ └─────────┘ └─────────┘ │  │
│  └───────────────────────────────────┘  │
│                                         │
│ CR: Cluster, Machine, MachineDeployment │
│ ...                                     │
└────────────────┬────────────────────────┘
                 │ CAPI controller
                 │ creates / manages
        ┌────────┴────────┐
        ▼                 ▼
┌───────────────┐ ┌───────────────┐
│   Workload    │ │   Workload    │
│ Cluster (dev) │ │ Cluster (prod)│
│ ┌───┐ ┌───┐   │ │ ┌───┐ ┌───┐   │
│ │ CP│ │ W │   │ │ │ CP│ │ W │   │
│ └───┘ └───┘   │ │ └───┘ └───┘   │
└───────────────┘ └───────────────┘
```
- **Management cluster** — a Kubernetes cluster running CAPI controllers. Can be a dedicated small admin cluster.
- **Workload (managed) cluster**: a Kubernetes cluster managed by CAPI; each one is represented by a `Cluster` custom resource (an instance of the CRD) in the management cluster.
- **Machine** — abstraction of a compute unit (VM, bare metal) that becomes a K8s node.
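The relationship between the management cluster and its Machines is easiest to see as a reconciliation loop: compare the desired replica count with the observed Machines and emit the actions that close the gap. The sketch below is a toy model of that idea, not CAPI controller code; the `Machine` dataclass, the `reconcile` function and the generated machine names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Machine:
    name: str
    healthy: bool = True


def reconcile(desired_replicas, machines):
    """Return the (action, machine_name) steps that move observed state to desired state."""
    actions = []
    healthy = [m for m in machines if m.healthy]
    # Unhealthy machines are removed first; replacements are created below.
    for m in machines:
        if not m.healthy:
            actions.append(("delete", m.name))
    diff = desired_replicas - len(healthy)
    if diff > 0:
        actions += [("create", f"machine-new-{i}") for i in range(diff)]
    elif diff < 0:
        # Scale down by removing the most recently listed healthy machines.
        actions += [("delete", m.name) for m in healthy[diff:]]
    return actions


if __name__ == "__main__":
    observed = [Machine("a"), Machine("b", healthy=False)]
    for step in reconcile(3, observed):
        print(step)
```

A real controller runs this comparison continuously and is idempotent: once observed state matches the declared state, the loop produces no actions.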
### Key CRDs (Custom Resource Definitions)
| CRD | API group | Purpose |
|-----|-----------|---------|
| **Cluster** | `cluster.x-k8s.io` | Cluster representation (infra ref, control plane ref, networking) |
| **Machine** | `cluster.x-k8s.io` | Individual node (VM/BM instance) |
| **MachineDeployment** | `cluster.x-k8s.io` | Declarative scaling and rolling update of workers |
| **MachineSet** | `cluster.x-k8s.io` | Replica set for Machines (lower-level) |
| **MachineHealthCheck** | `cluster.x-k8s.io` | Auto-remediation (replace unhealthy nodes) |
| **ClusterClass** | `cluster.x-k8s.io` | Cluster template for reuse |
| **KubeadmControlPlane** | `controlplane.cluster.x-k8s.io` | Kubeadm-managed control plane (stacked/external etcd) |
| **KubeadmConfig / KubeadmConfigTemplate** | `bootstrap.cluster.x-k8s.io` | Bootstrap configuration (kubeadm init/join) |
### Provider model
CAPI uses a three-layer provider model:
#### 1. Infrastructure Provider
Creates and manages infrastructure (VM, networks, LB, storage).
| Provider | Platform | Status |
|----------|----------|--------|
| **AWS (CAPA)** | AWS EC2, VPC, ELB, EKS | Stable, SIG-sponsored |
| **Azure (CAPZ)** | Azure VM, VNet, LB, AKS | Stable, SIG-sponsored |
| **GCP (CAPG)** | GCP Compute, VPC, GKE | Beta |
| **vSphere (CAPV)** | VMware vSphere | Stable |
| **OpenStack (CAPO)** | OpenStack compute/network | Stable |
| **Metal3** | Bare metal (Ironic) | Stable |
| **Docker (CAPD)** | Docker containers (development) | Tilt/Dev only |
| **Akamai (Linode)** | Linode | Community |
| **Azure Stack HCI** | Azure Stack HCI | Community |
| **cloudscale** | cloudscale.ch | Community |
| **Exoscale** | Exoscale | Community |
| **IBM Cloud** | IBM Cloud | Community |
| **Equinix Metal** | Equinix (ex Packet) | Community |
| **Hetzner** | Hetzner Cloud | Community |
| **OpenNebula** | OpenNebula | Community |
#### 2. Bootstrap Provider
Handles K8s initialization on a node (kubeadm init/join, TLS certs, tokens).
| Provider | Description |
|----------|-------------|
| **Kubeadm** (built-in) | Standard kubeadm init/join, supports stacked/external etcd |
| **EKS** | Bootstrap for EKS managed control plane (AWS) |
| **K3s** | Lightweight K8s bootstrap (edge, IoT) |
| **RKE2** | Rancher K8s bootstrap, CIS-hardened |
| **Talos** | API-driven bootstrap (Sidero Labs), immutable OS |
| **k0smotron** | K0s-based bootstrap + hosted control plane |
| **MicroK8s** | Canonical MicroK8s bootstrap |
| **Canonical Kubernetes** | Canonical K8s (snap-based) |
#### 3. Control Plane Provider
Manages control plane nodes.
| Provider | Description |
|----------|-------------|
| **KubeadmControlPlane** (built-in) | Kubeadm-managed CP, stacked/external etcd |
| **EKS** | AWS EKS managed control plane |
| **Kamaji** | Hosted control plane (CP runs as deployment in management cluster) |
| **K3s** | K3s control plane (edge-optimized) |
| **RKE2** | RKE2 control plane |
| **Talos** | Talos control plane, API-based management |
| **k0smotron** | Hosted control plane (k0s-based) |
| **Nested** | Nested control plane (control plane components run as pods in a host cluster) |
### ClusterClass and Managed Topologies
ClusterClass (introduced in CAPI v1.0 with the v1beta1 API, enabled through the `ClusterTopology` feature gate) allows defining a **cluster template**:
```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: ClusterClass
metadata:
  name: standard-aws-cluster
spec:
  controlPlane:
    ref:
      apiVersion: controlplane.cluster.x-k8s.io/v1beta1
      kind: KubeadmControlPlaneTemplate
      name: aws-cp-tmpl
    machineInfrastructure:
      ref:
        kind: AWSMachineTemplate
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
        name: aws-cp-machine-tmpl
  workers:
    machineDeployments:
      - class: default-worker
        template:
          bootstrap:
            ref:
              apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
              kind: KubeadmConfigTemplate
              name: aws-worker-bootstrap-tmpl
          infrastructure:
            ref:
              apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
              kind: AWSMachineTemplate
              name: aws-worker-machine-tmpl
  variables:
    - name: instanceType
      required: true
      schema:
        openAPIV3Schema:
          type: string
          enum: ["t3.large", "m5.large", "m5.xlarge"]
```
Then create a cluster with variable overrides:
```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: dev-team-alpha
  namespace: clusters
spec:
  topology:
    class: standard-aws-cluster
    version: v1.30.2
    controlPlane:
      replicas: 1
    workers:
      machineDeployments:
        - class: default-worker
          name: md-0
          replicas: 2
    variables:
      - name: instanceType
        value: "m5.xlarge"
```
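The `enum` in the ClusterClass variable schema is what rejects an unsupported `instanceType` when a Cluster sets a variable value. A minimal sketch of that kind of check, covering only the `type: string` and `enum` keywords used above; it is not the validation code CAPI itself runs, and the function name `validate_variable` is illustrative:

```python
def validate_variable(value, schema):
    """Return a list of validation errors for `value` against a tiny schema subset."""
    errors = []
    if schema.get("type") == "string" and not isinstance(value, str):
        errors.append(f"expected a string, got {type(value).__name__}")
    allowed = schema.get("enum")
    if allowed is not None and value not in allowed:
        errors.append(f"{value!r} is not one of {allowed}")
    return errors


# Mirrors the openAPIV3Schema of the instanceType variable.
INSTANCE_TYPE_SCHEMA = {
    "type": "string",
    "enum": ["t3.large", "m5.large", "m5.xlarge"],
}

if __name__ == "__main__":
    print(validate_variable("m5.xlarge", INSTANCE_TYPE_SCHEMA))  # accepted
    print(validate_variable("t2.micro", INSTANCE_TYPE_SCHEMA))   # rejected by enum
```

Keeping the allowed values in the ClusterClass means individual teams can pick an instance type but cannot step outside the set the platform team has approved.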
### Cluster lifecycle with CAPI
| Phase | Action | CAPI mechanism |
|-------|--------|----------------|
| **Create** | `kubectl apply -f cluster.yaml` | Controller creates infra (VM, network), runs kubeadm init/join bootstrap |
| **Scale** | Update `replicas` in MachineDeployment | Controller creates/removes Machine → VM → node join/drain |
| **Upgrade** | Change `version` in KubeadmControlPlane / MachineDeployment | Rolling update: new CP node → upgrade → old drain & delete. Workers: MachineDeployment rolling update |
| **Health check** | MachineHealthCheck | If node unhealthy > timeout, controller creates replacement Machine |
| **Delete** | `kubectl delete cluster` | Controller drains, deletes VMs, cleans up infrastructure |
| **Template update** | Change AWSMachineTemplate / KubeadmConfigTemplate | New Machines use the new template; existing Machines only affected via rolling update |
### Auto-remediation (MachineHealthCheck)
```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineHealthCheck
metadata:
  name: prod-mhc
  namespace: clusters
spec:
  clusterName: prod-us-east
  selector:
    matchLabels:
      cluster.x-k8s.io/deployment-name: prod-us-east-workers
  unhealthyConditions:
    - type: Ready
      status: "False"
      timeout: 5m
    - type: Ready
      status: Unknown
      timeout: 5m
  maxUnhealthy: "40%"
  nodeStartupTimeout: 10m
```
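`maxUnhealthy` acts as a circuit breaker: remediation runs only while the number of unhealthy Machines stays at or below the limit, so a cluster-wide outage does not trigger mass replacement. The sketch below models that rule under one assumption that should be checked against the CAPI documentation for the version in use: a percentage is converted to a machine count by rounding down. The function names are illustrative.

```python
def max_unhealthy_count(max_unhealthy, total):
    """Convert an absolute number or a percentage string into a machine count."""
    if isinstance(max_unhealthy, int):
        return max_unhealthy
    text = str(max_unhealthy).strip()
    if text.endswith("%"):
        # Assumption: percentages are rounded down to whole machines.
        return total * int(text[:-1]) // 100
    return int(text)


def remediation_allowed(total, unhealthy, max_unhealthy):
    """Remediate only while the unhealthy count is within the configured limit."""
    return unhealthy <= max_unhealthy_count(max_unhealthy, total)


if __name__ == "__main__":
    # 10 selected machines with the 40% limit from the manifest above.
    print(remediation_allowed(10, 4, "40%"))
    print(remediation_allowed(10, 5, "40%"))
```

On small worker pools the rounding matters: with 3 machines and a 40 % limit, only a single unhealthy machine is remediated under this model.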
### CAPI + GitOps
CAPI integrates naturally with GitOps:
- **ArgoCD** — Cluster and MachineDeployment manifests in Git repo, ArgoCD applies them to the management cluster
- **Flux** — `Kustomization` + `OCIRepository` for CAPI objects
- **Crossplane** — can be combined: Crossplane provisions cloud resources (VPC, subnets), CAPI manages K8s clusters on top
Pattern: a dedicated "fleet management" cluster running CAPI + ArgoCD. All workload clusters are defined as YAML in Git.
### CAPI for on-prem
| Provider | Use case | Note |
|----------|----------|------|
| **Metal3** (Ironic) | Bare metal provisioning (PXE, IPMI, Redfish) | Automatically provisions BM servers as K8s nodes |
| **CAPV (vSphere)** | VMware VMs as K8s nodes | Most common enterprise on-prem |
| **CAPO (OpenStack)** | OpenStack VMs as K8s nodes | OpenStack-native |
| **Nutanix (CAPX)** | Nutanix AHV/Prism | Community provider |
### CAPI for edge
| Provider | Use case | Note |
|----------|----------|------|
| **K3s bootstrap + control plane** | Lightweight K8s on edge devices | Single binary, SQLite/embedded etcd |
| **RKE2 bootstrap + control plane** | Enterprise edge, air-gapped | CIS-hardened, FIPS |
| **Talos** | Immutable OS, API-driven | Minimal footprint, no SSH |
| **k0smotron** | Hosted control plane for edge clusters | CP runs in management cluster, worker on edge |
### CAPI vs alternatives
| Tool | Approach | CAPI advantage | CAPI disadvantage |
|------|----------|----------------|-------------------|
| **Terraform/Pulumi** | Imperative/declarative IaC | CAPI is K8s-native — same tool for apps and clusters; GitOps ready | Terraform has broader non-K8s resource support |
| **kubeadm** | Manual or scripted | CAPI automates full lifecycle including upgrades and remediation | Higher complexity, requires management cluster |
| **Rancher** | Web UI + API for K8s cluster management | CAPI is open-source, vendor-neutral | Rancher has GUI, monitoring, app catalog |
| **OpenShift Hive/ACM** | Red Hat Advanced Cluster Management | CAPI is standard (SIG) — wider provider ecosystem | ACM has governance, policy, compliance |
### Limitations and maturity
- **Management cluster is SPOF** — needs its own HA and backup (etcd snapshots, certificates)
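- **CAPI is not a cluster autoscaler**: it manages cluster lifecycle and does not add or remove nodes in response to load on its own (pair it with Cluster Autoscaler for node scaling; pod scaling is handled by HPA/VPA inside the cluster)
- **Provider maturity varies**: AWS/Azure/vSphere/OpenStack are listed as stable in the provider table, GCP as beta, and some community providers are alpha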
- **etcd backup is not built-in** — must be handled externally (Velero, etcd snapshot)
- **CAPI does not handle applications** — only K8s cluster lifecycle (monitoring, logging, ingress is user-managed)
- **Learning curve** — requires understanding management cluster, provider model, CRDs
- **CAPI v1.13+ (2026)**: stable release line; `v1beta1` is a beta-level API version (not GA) that is widely used in production; ClusterClass and EKS/AKS/GKE managed control planes are supported
### Recommended production CAPI stack
| Component | Recommendation |
|-----------|---------------|
| **Management cluster** | K3s (small footprint) or kubeadm (3 nodes HA) |
| **Infra provider** | CAPA (AWS) / CAPV (vSphere) / CAPO (OpenStack) — based on platform |
| **Bootstrap/CP provider** | Kubeadm or RKE2 |
| **GitOps** | ArgoCD or Flux |
| **Backup** | Velero + restic/Ceph |
| **Cluster autoscaler** | Cluster Autoscaler (via CAPI integration) |
| **Network** | Cilium (CNI, installed into each workload cluster after bootstrap) |
| **Secrets** | External Secrets Operator / Sealed Secrets |
| **Monitoring** | Prometheus + Grafana (kube-prometheus-stack) |
| **Ingress** | ingress-nginx / Kong / Traefik |
## Sources
Links, books and standards: [sources/infrastructure/sources.en.md](sources/infrastructure/sources.en.md)
*Last revision: 2026-06-18*

299
KUBERNETES.md Normal file

@@ -0,0 +1,299 @@
# ☸ Kubernetes: architecture, platforms, Cluster API
## Overview
Kubernetes (K8s) is an open-source container orchestrator and the de facto standard for deploying, scaling, and managing containerized applications. It is built on declarative configuration and control loops (reconciliation).
## Kubernetes deployment methods
| Method | Description | Control plane | Best for |
|--------|-------|---------------------|------------|
| **kubeadm** | Official K8s cluster bootstrap tool | Self-managed (stacked/external etcd) | On-prem, lab, learning |
| **K3s** | Lightweight K8s (Rancher), single binary, embedded etcd/SQLite | Self-managed | Edge, IoT, low-resource, HA with embedded etcd |
| **RKE2** | Rancher Kubernetes Engine 2, CIS-hardened, FIPS-ready | Self-managed | Enterprise on-prem, air-gapped, regulatory |
| **OpenShift** | Red Hat enterprise K8s + operator lifecycle + SDN + routing | Self-managed (RHCOS) | Enterprise, multicluster, platform engineering |
| **Vanilla K8s (CAPI)** | Cluster API: declarative provisioning and lifecycle management | Self-managed (CAPI managed) | Fleet management, GitOps, multi-provider |
| **EKS** (AWS) | Managed K8s | AWS managed | AWS cloud-native, least ops |
| **AKS** (Azure) | Managed K8s | Azure managed | Azure cloud-native |
| **GKE** (GCP) | Managed K8s, Standard and Autopilot modes | GCP managed | GCP cloud-native |
| **SKE** (Sangfor) | Managed K8s on Sangfor HCI | Vendor managed | Sangfor HCI ecosystem |
---
## Cluster API (CAPI)
### What is Cluster API
Cluster API is a Kubernetes sub-project (SIG Cluster-Lifecycle) that brings declarative APIs for provisioning, upgrading, and operating Kubernetes clusters. Instead of Terraform scripts or manual `kubeadm`, you define clusters as Kubernetes Custom Resources such as `Cluster`, `Machine`, and `MachineDeployment`.
Core principle: **A Kubernetes cluster that manages Kubernetes clusters.**
### Architecture
```
┌─────────────────────────────────────────┐
│           Management Cluster            │
│                                         │
│  ┌───────────────────────────────────┐  │
│  │         CAPI Controllers          │  │
│  │ ┌───────┐ ┌─────────┐ ┌─────────┐ │  │
│  │ │ Infra │ │Bootstrap│ │ Control │ │  │
│  │ │ Prov  │ │  Prov   │ │Plane Pr │ │  │
│  │ └───────┘ └─────────┘ └─────────┘ │  │
│  └───────────────────────────────────┘  │
│                                         │
│ CR: Cluster, Machine, MachineDeployment │
│ ...                                     │
└────────────────┬────────────────────────┘
                 │ CAPI controller
                 │ creates / manages
        ┌────────┴────────┐
        ▼                 ▼
┌───────────────┐ ┌───────────────┐
│   Workload    │ │   Workload    │
│ Cluster (dev) │ │ Cluster (prod)│
│ ┌───┐ ┌───┐   │ │ ┌───┐ ┌───┐   │
│ │ CP│ │ W │   │ │ │ CP│ │ W │   │
│ └───┘ └───┘   │ │ └───┘ └───┘   │
└───────────────┘ └───────────────┘
```
- **Management cluster**: a Kubernetes cluster running the CAPI controllers. It can be a dedicated "admin" cluster (often a very small one).
- **Workload (managed) cluster**: a Kubernetes cluster managed by CAPI; each one is represented by a `Cluster` custom resource (an instance of the CRD) in the management cluster.
- **Machine**: abstraction of a compute unit (VM, bare metal) that becomes a K8s node.
### Key CRDs (Custom Resource Definitions)
| CRD | API group | Purpose |
|-----|------------|------|
| **Cluster** | `cluster.x-k8s.io` | Cluster representation (infra ref, control plane ref, networking) |
| **Machine** | `cluster.x-k8s.io` | Individual node (VM/BM instance) |
| **MachineDeployment** | `cluster.x-k8s.io` | Declarative scaling and rolling update of workers |
| **MachineSet** | `cluster.x-k8s.io` | Replica set for Machines (lower-level) |
| **MachineHealthCheck** | `cluster.x-k8s.io` | Auto-remediation (automatic replacement of unhealthy nodes) |
| **ClusterClass** | `cluster.x-k8s.io` | Template for creating clusters |
| **KubeadmControlPlane** | `controlplane.cluster.x-k8s.io` | Kubeadm-managed control plane (stacked/external etcd) |
| **KubeadmConfig / KubeadmConfigTemplate** | `bootstrap.cluster.x-k8s.io` | Bootstrap configuration (kubeadm init/join) |
### Provider model
CAPI uses a three-layer provider model:
#### 1. Infrastructure Provider
Creates and manages infrastructure (VM, networks, LB, storage).
| Provider | Platform | Status |
|----------|-----------|--------|
| **AWS (CAPA)** | AWS EC2, VPC, ELB, EKS | Stable, SIG-sponsored |
| **Azure (CAPZ)** | Azure VM, VNet, LB, AKS | Stable, SIG-sponsored |
| **GCP (CAPG)** | GCP Compute, VPC, GKE | Beta |
| **vSphere (CAPV)** | VMware vSphere | Stable |
| **OpenStack (CAPO)** | OpenStack compute/network | Stable |
| **Metal3** | Bare metal (Ironic) | Stable |
| **Docker (CAPD)** | Docker containers (development) | Tilt/Dev only |
| **Akamai (Linode)** | Linode | Community |
| **Azure Stack HCI** | Azure Stack HCI | Community |
| **cloudscale** | cloudscale.ch | Community |
| **Exoscale** | Exoscale | Community |
| **IBM Cloud** | IBM Cloud | Community |
| **Equinix Metal** | Equinix (ex Packet) | Community |
| **Hetzner** | Hetzner Cloud | Community |
| **OpenNebula** | OpenNebula | Community |
#### 2. Bootstrap Provider
Initializes Kubernetes on the node (kubeadm init/join, TLS certificates, tokens).
| Provider | Description |
|----------|-------------|
| **Kubeadm** (built-in) | Standard kubeadm init/join, supports stacked and external etcd |
| **EKS** | Bootstrap pro EKS managed control plane (AWS) |
| **K3s** | Lightweight K8s bootstrap (edge, IoT) |
| **RKE2** | Rancher K8s bootstrap, CIS-hardened |
| **Talos** | API-driven bootstrap (Sidero Labs), immutable OS |
| **k0smotron** | K0s-based bootstrap + hosted control plane |
| **MicroK8s** | Canonical MicroK8s bootstrap |
| **Canonical Kubernetes** | Canonical K8s (snap-based) |
#### 3. Control Plane Provider
Manages the control plane nodes.
| Provider | Description |
|----------|-------------|
| **KubeadmControlPlane** (built-in) | kubeadm-managed control plane, stacked or external etcd |
| **EKS** | AWS EKS managed control plane |
| **Kamaji** | Hosted control plane (the control plane runs as a deployment in the management cluster) |
| **K3s** | K3s control plane (edge-optimized) |
| **RKE2** | RKE2 control plane |
| **Talos** | Talos control plane, API-based management |
| **k0smotron** | Hosted control plane (k0s-based) |
| **Nested** | Nested virtualization control plane |
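### ClusterClass and Managed Topologies
ClusterClass (introduced as an experimental feature in CAPI v1.1 behind the `ClusterTopology` feature gate; check the release notes of your CAPI version for its current maturity) lets you define a **cluster template**: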
```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: ClusterClass
metadata:
  name: standard-aws-cluster
spec:
  # Simplified: a complete ClusterClass also needs spec.infrastructure
  # (for AWS, an AWSClusterTemplate ref) and spec.patches that apply the variables.
  controlPlane:
    ref:
      apiVersion: controlplane.cluster.x-k8s.io/v1beta1
      kind: KubeadmControlPlaneTemplate
      name: aws-cp-tmpl
    machineInfrastructure:
      ref:
        kind: AWSMachineTemplate
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
        name: aws-cp-machine-tmpl
  workers:
    machineDeployments:
      - class: default-worker
        template:
          bootstrap:
            ref:
              apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
              kind: KubeadmConfigTemplate
              name: aws-worker-bootstrap-tmpl
          infrastructure:
            ref:
              apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
              kind: AWSMachineTemplate
              name: aws-worker-machine-tmpl
  variables:
    - name: instanceType
      required: true
      schema:
        openAPIV3Schema:
          type: string
          enum: ["t3.large", "m5.large", "m5.xlarge"]
```
A cluster can then be created from the class, setting its variables:
```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: dev-team-alpha
  namespace: clusters
spec:
  topology:
    class: standard-aws-cluster
    version: v1.30.2
    controlPlane:
      replicas: 1
    workers:
      machineDeployments:
        - class: default-worker
          name: md-0
          replicas: 2
    variables:
      - name: instanceType
        value: "m5.xlarge"
```
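As a rough illustration of what the `variables` schema provides, the sketch below checks topology variable values against the `instanceType` definition from the ClusterClass above. It is a simplified stand-in written for this document: `CLASS_VARIABLES` and `validate_variables` are hypothetical names, and real CAPI validates against the full `openAPIV3Schema` in its admission webhooks rather than with code like this.

```python
# Hypothetical, simplified model of ClusterClass variable validation.
# Only "required", "type: string" and "enum" are modelled here.
CLASS_VARIABLES = {
    "instanceType": {
        "required": True,
        "type": "string",
        "enum": ["t3.large", "m5.large", "m5.xlarge"],
    },
}


def validate_variables(class_variables, cluster_variables):
    """Return a list of error strings; an empty list means the values are valid."""
    errors = []
    for name, spec in class_variables.items():
        if name not in cluster_variables:
            if spec.get("required"):
                errors.append(f"{name}: required variable is missing")
            continue
        value = cluster_variables[name]
        if spec.get("type") == "string" and not isinstance(value, str):
            errors.append(f"{name}: expected a string")
        elif "enum" in spec and value not in spec["enum"]:
            errors.append(f"{name}: {value!r} is not one of {spec['enum']}")
    # Values that the class never declared are rejected as well.
    for name in cluster_variables:
        if name not in class_variables:
            errors.append(f"{name}: not defined in the ClusterClass")
    return errors
```

With the `dev-team-alpha` cluster above, `validate_variables(CLASS_VARIABLES, {"instanceType": "m5.xlarge"})` returns an empty list, while an instance type outside the enum produces one error.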
### Cluster lifecycle with CAPI
| Phase | Action | CAPI mechanism |
|-------|--------|----------------|
| **Create** | `kubectl apply -f cluster.yaml` | The controllers create the infrastructure (VMs, network) and bootstrap the nodes with kubeadm init/join |
| **Scale** | Change `replicas` in the MachineDeployment | The controller creates or removes a Machine → VM → node join/drain |
| **Upgrade** | Change `version` in KubeadmControlPlane / MachineDeployment | Rolling update: a new control plane node joins on the new version, then an old one is drained and deleted. Workers: MachineDeployment rolling update |
| **Health check** | MachineHealthCheck | If a node stays unhealthy longer than the timeout, the controller creates a replacement Machine |
| **Delete** | `kubectl delete cluster` | The controllers drain the nodes, delete the VMs, and clean up the infrastructure |
| **Template update** | Change the AWSMachineTemplate / KubeadmConfigTemplate | New Machines are created from the new template; existing Machines change only through a rolling update |
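The Scale row can be pictured as a small reconcile step that compares the desired replica count with the Machines that exist. The function below is a hypothetical sketch, not CAPI code: `plan_scale` and the generated machine names are assumptions, and it simply removes the Machines at the end of the list, whereas the real controller follows a configurable delete policy.

```python
def plan_scale(desired_replicas, machines, name_prefix="md-0"):
    """Return the (action, machine_name) steps that reach the desired replica count."""
    current = list(machines)
    if desired_replicas > len(current):
        missing = desired_replicas - len(current)
        # Scale up: each new Machine leads to a VM, which then joins as a node.
        return [("create", f"{name_prefix}-new-{i}") for i in range(missing)]
    surplus = current[desired_replicas:]
    # Scale down: drain first so workloads are rescheduled before the VM is deleted.
    return [step for name in surplus for step in (("drain", name), ("delete", name))]
```

Calling `plan_scale(1, ["a", "b"])` yields a drain followed by a delete for `b`; when the counts already match, the plan is empty, which is what makes the loop safe to run repeatedly.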
### Auto-remediation (MachineHealthCheck)
```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineHealthCheck
metadata:
  name: prod-mhc
  namespace: clusters
spec:
  clusterName: prod-us-east
  selector:
    matchLabels:
      cluster.x-k8s.io/deployment-name: prod-us-east-workers
  unhealthyConditions:
    - type: Ready
      status: "False"
      timeout: 5m
    - type: Ready
      status: Unknown
      timeout: 5m
  maxUnhealthy: "40%"
  nodeStartupTimeout: 10m
```
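To make `maxUnhealthy` concrete: remediation only proceeds while the number of unhealthy Machines stays at or below that limit, which protects against replacing a whole pool during a wider outage. The sketch below models that check. The helper names are hypothetical and it assumes a percentage is rounded down to a whole Machine, so verify the rounding against your CAPI version.

```python
def max_unhealthy_allowed(max_unhealthy, total_machines):
    """Translate an int or a percentage string such as "40%" into a Machine count."""
    if isinstance(max_unhealthy, str) and max_unhealthy.endswith("%"):
        percent = int(max_unhealthy[:-1])
        # Assumption: the percentage is rounded down.
        return (percent * total_machines) // 100
    return int(max_unhealthy)


def remediation_allowed(max_unhealthy, total_machines, unhealthy_machines):
    """True while the unhealthy count is within the configured limit."""
    return unhealthy_machines <= max_unhealthy_allowed(max_unhealthy, total_machines)
```

Under these assumptions a pool of 10 workers with `maxUnhealthy: "40%"` is remediated with up to 4 unhealthy Machines; with 5 unhealthy, the controller would stop and leave the decision to an operator.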
### CAPI + GitOps
CAPI integrates naturally with GitOps:
- **ArgoCD** — Cluster and MachineDeployment manifests live in a Git repository and ArgoCD applies them to the management cluster
- **Flux** — `Kustomization` + `OCIRepository` for CAPI objects
- **Crossplane** — can be combined: Crossplane provisions the cloud resources (VPC, subnets), CAPI provisions the Kubernetes cluster on top of them
Pattern: a dedicated "fleet management" cluster running CAPI + ArgoCD. All workload clusters are defined as YAML in Git.
### CAPI for on-prem
| Provider | Use case | Note |
|----------|----------|------|
| **Metal3** (Ironic) | Bare metal provisioning (PXE, IPMI, Redfish) | Automatically provisions bare-metal servers as K8s nodes |
| **CAPV (vSphere)** | VMware VMs as K8s nodes | Most enterprise on-prem environments |
| **CAPO (OpenStack)** | OpenStack VMs as K8s nodes | OpenStack-native |
| **Nutanix (CAPX)** | Nutanix AHV/Prism | Community provider |
### CAPI for edge
| Provider | Use case | Note |
|----------|----------|------|
| **K3s bootstrap + control plane** | Lightweight K8s on edge devices | Single binary, SQLite/embedded etcd |
| **RKE2 bootstrap + control plane** | Enterprise edge, air-gapped | CIS-hardened, FIPS |
| **Talos** | Immutable OS, API-driven | Minimal footprint, no SSH |
| **k0smotron** | Hosted control plane for edge clusters | Control plane runs in the management cluster, workers at the edge |
### CAPI vs alternatives
| Tool | Approach | CAPI advantage | CAPI disadvantage |
|------|----------|----------------|-------------------|
| **Terraform/Pulumi** | Imperative/declarative IaC | CAPI is K8s-native — the same tooling for apps and clusters; GitOps-ready | Terraform has broader support for non-K8s resources |
| **kubeadm** | Manual or scripted | CAPI automates the whole lifecycle, including upgrades and remediation | Higher complexity, a management cluster is required |
| **Rancher** | Web UI + API for managing K8s clusters | CAPI is open source and vendor-neutral | Rancher has a GUI, monitoring, and an app catalog |
| **OpenShift Hive/ACM** | Red Hat Advanced Cluster Management | CAPI is a community standard (SIG) — broader provider ecosystem | ACM has governance, policy, and compliance |
### Limitations and maturity
- **The management cluster is a SPOF** — it needs its own HA and backups (etcd backups, certificates)
- **CAPI is not a cluster autoscaler** — it manages the cluster lifecycle, not automatic node scaling driven by pod demand (Cluster Autoscaler is deployed separately for that)
- **Provider maturity varies** — AWS, Azure, vSphere, and OpenStack are listed as stable above and GCP as beta; some community providers are still alpha
- **etcd backup is not built in** — it has to be handled externally (Velero, etcd snapshots)
- **CAPI does not manage applications** — only the lifecycle of Kubernetes clusters (monitoring, logging, and ingress are up to the user)
- **Learning curve** — you need a management cluster and an understanding of the provider model and CRDs
- **CAPI v1.13+ (2026)** — stable release, v1beta1 API is GA, ClusterClass stable, managed control plane support for EKS/AKS/GKE
### Recommended stack for CAPI in production
| Component | Recommendation |
|-----------|----------------|
| **Management cluster** | K3s (small footprint) or kubeadm (3 nodes, HA) |
| **Infra provider** | CAPA (AWS) / CAPV (vSphere) / CAPO (OpenStack), depending on the platform |
| **Bootstrap/CP provider** | Kubeadm or RKE2 |
| **GitOps** | ArgoCD or Flux |
| **Backup** | Velero + restic/Ceph |
| **Cluster autoscaler** | Cluster Autoscaler (via its Cluster API provider) |
| **Network** | Cilium (CAPI-native support) |
| **Secrets** | External Secrets Operator / Sealed Secrets |
| **Monitoring** | Prometheus + Grafana (kube-prometheus-stack) |
| **Ingress** | ingress-nginx / Kong / Traefik |
## Sources
Links, books, and standards: [sources/infrastructure/sources.md](sources/infrastructure/sources.md)
*Last revision: 2026-06-18*

275
MESSAGING.en.md Normal file

@@ -0,0 +1,275 @@
# 📨 Messaging and streaming platforms
## Platform overview
| Platform | Type | Language | Protocol | Persistence | Use case |
|-----------|-----|-------|----------|-------------|----------|
| **Apache Kafka** | Distributed event store | Java/Scala | Binary (TCP) | Disk (log) | Event streaming, data pipeline, log aggregation |
| **RabbitMQ** | Message broker | Erlang | AMQP 0-9-1, MQTT, STOMP | Disk / RAM | Application messaging, task queue, RPC |
| **Apache Pulsar** | Distributed messaging + streaming | Java | Binary (TCP) + REST | Disk (segmented log) | Streaming + queue in one, multi-tenant |
| **NATS** | Lightweight messaging | Go | NATS protocol (TCP) | Memory / JetStream (disk) | Microservices, IoT, edge, low-latency |
| **AWS SQS** | Managed queue | — | HTTPS | Managed | Decoupling services, serverless |
| **AWS SNS** | Managed pub/sub | — | HTTPS, SQS, Lambda, email | Managed | Push notifications, fanout |
| **Azure Service Bus** | Managed messaging | — | AMQP, HTTPS | Managed | Enterprise messaging, sessions, transactions |
| **Google Pub/Sub** | Managed streaming | — | gRPC, REST | Managed | Event-driven, data pipeline |
| **Red Hat AMQ 7** (Artemis) | Message broker | Java | AMQP, MQTT, STOMP, OpenWire | Disk | Enterprise, JMS, high-availability |
| **Oracle Service Bus (OSB)** | Enterprise ESB | Java | HTTP/S, JMS, SOAP, REST, MQ, FTP, AQ | Managed (WebLogic) | Enterprise integration, SOA, protocol mediation, routing |
---
## Platform details
### Apache Kafka
**Architecture:**
```
Producer ──► Topic ──► Partition ──► Consumer Group
             ├── Partition 0 (Leader) ──► Broker 1
             ├── Partition 1 (Follower) ──► Broker 2
             └── Partition 2 (Follower) ──► Broker 3
```
| Concept | Description |
|---------|-------|
| **Topic** | Logical message category |
| **Partition** | Append-only log, ordered sequence of messages |
| **Broker** | Server in Kafka cluster |
| **Producer** | Publishes messages to topic |
| **Consumer** | Reads messages from partition (within consumer group) |
| **Consumer Group** | Group of consumers sharing topic reading |
| **Offset** | Position in partition (tracked by consumer) |
| **KRaft** | Raft-based controller quorum that replaces ZooKeeper (production-ready since Kafka 3.3; ZooKeeper mode removed in 4.0) |
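The relationship between partitions, consumer groups, and offsets can be sketched in a few lines. This is an illustrative model only: `assign_partitions` uses a plain round-robin scheme (real clients offer several assignors), and both function names are hypothetical.

```python
def assign_partitions(partitions, consumers):
    """Round-robin: each partition is read by exactly one consumer in the group."""
    ordered = sorted(consumers)
    assignment = {consumer: [] for consumer in ordered}
    for index, partition in enumerate(sorted(partitions)):
        assignment[ordered[index % len(ordered)]].append(partition)
    return assignment


def consumer_lag(log_end_offsets, committed_offsets):
    """Lag per partition = newest offset in the log minus the group's committed offset."""
    return {
        partition: end - committed_offsets.get(partition, 0)
        for partition, end in log_end_offsets.items()
    }
```

A group with more consumers than partitions leaves the extra consumers idle, which is why the partition count caps the parallelism of a consumer group.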
**Replication and HA:**
| Parameter | Value |
|----------|---------|
| Replication factor | 2-3 (typically 3 for production) |
| ISR (In-Sync Replicas) | Number of replicas keeping up with leader |
| Min ISR | Minimum ISR for acknowledging writes (acks=all) |
| acks=0 | Fire-and-forget (fastest, possible data loss) |
| acks=1 | Write acknowledged by leader (compromise) |
| acks=all | Write acknowledged by all ISR (safest) |
| Leader failover | Automatic election of new leader from ISR |
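How `acks` and the minimum ISR interact is easier to see as a decision function. The sketch below is a simplified model under stated assumptions: `write_outcome` and `tolerated_broker_failures` are hypothetical helpers, and the returned strings are descriptive labels rather than Kafka error codes.

```python
def write_outcome(acks, isr_size, min_insync_replicas):
    """Describe what happens to a produce request for the given settings."""
    if acks == "0":
        return "sent without waiting for any acknowledgement"
    if acks == "1":
        return "acknowledged once the leader has written it"
    if acks == "all":
        if isr_size < min_insync_replicas:
            return "rejected: not enough in-sync replicas"
        return "acknowledged once all in-sync replicas have written it"
    raise ValueError(f"unsupported acks value: {acks!r}")


def tolerated_broker_failures(replication_factor, min_insync_replicas):
    """Replicas that can be lost while acks=all writes are still accepted."""
    return replication_factor - min_insync_replicas
```

With replication factor 3 and a minimum ISR of 2, one broker can fail and `acks=all` producers keep writing; after a second failure writes are rejected instead of being accepted with a single copy.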
**Important configuration:**
```properties
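# Production (broker defaults; the replication factor can also be set per topic at creation)
default.replication.factor=3
min.insync.replicas=2
# Retention
# 7 days
log.retention.hours=168
# -1 = unlimited (or set a byte limit)
log.retention.bytes=-1
# 1 GB per segment
log.segment.bytes=1073741824
# Performance
# default partition count for new topics; adjust per need (scale-out)
num.partitions=3
# one of: uncompressed, gzip, snappy, lz4, zstd, producer (broker default: producer)
compression.type=snappy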
```
**Partitioning strategies:**
| Strategy | Key | Advantage | Disadvantage |
|----------|------|--------|----------|
| Round-robin | null | Even distribution | Per-key ordering lost |
| Key-based | user_id, order_id | Same key → same partition | Uneven distribution (hot keys) |
| Custom partitioner | Custom logic | Per use-case optimization | More complex maintenance |
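The first two strategies in the table can be sketched as follows. This is an assumption-laden illustration: `SketchPartitioner` is a hypothetical class, and it uses CRC32 from the standard library as a stand-in hash, whereas Kafka's default partitioner uses murmur2 and newer clients batch keyless records with a sticky strategy.

```python
import itertools
import zlib


class SketchPartitioner:
    """Key-based partitioning with round-robin for messages without a key."""

    def __init__(self, num_partitions):
        self.num_partitions = num_partitions
        self._round_robin = itertools.cycle(range(num_partitions))

    def partition(self, key):
        if key is None:
            # No key: spread messages evenly, per-key ordering does not apply.
            return next(self._round_robin)
        # Same key -> same hash -> same partition, so ordering per key is preserved.
        return zlib.crc32(key.encode("utf-8")) % self.num_partitions
```

Because the partition is derived from `hash % num_partitions`, changing the partition count later moves keys to different partitions, which is one reason to size `num.partitions` with some headroom.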
### RabbitMQ
**Architecture:**
```
Producer ──► Exchange ──► Binding ──► Queue ──► Consumer
    ┌───────────┼───────────┐
    ▼           ▼           ▼
 Direct       Topic      Fanout
Exchange    Exchange    Exchange
```
| Concept | Description |
|---------|-------|
| **Exchange** | Receives messages from producer, routes to queue |
| **Binding** | Exchange → queue link with routing key |
| **Queue** | FIFO message queue (consumed by consumer) |
| **Virtual Host (vhost)** | Tenant isolation within a single cluster |
| **Publisher Confirm** | Broker acknowledges message receipt |
| **Consumer Ack** | Consumer acknowledges message processing |
**Exchange types:**
| Type | Routing | Use case |
|-----|---------|----------|
| **Direct** | routing_key = binding_key | Task queue, point-to-point |
| **Topic** | routing_key match binding pattern (wildcard `*`, `#`) | Pub/sub with filtering |
| **Fanout** | All bound queues | Broadcast, event notification |
| **Headers** | AMQP headers match | Complex routing (not routing key dependent) |
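The topic exchange wildcards follow simple rules: routing keys are dot-separated words, `*` matches exactly one word, and `#` matches zero or more words. The function below is a minimal sketch of that matching logic for illustration; `topic_matches` is a hypothetical name and the broker's own implementation is more optimized.

```python
def topic_matches(pattern, routing_key):
    """Return True if an AMQP topic binding pattern matches the routing key."""
    pattern_words = pattern.split(".")
    key_words = routing_key.split(".") if routing_key else []

    def match(i, j):
        if i == len(pattern_words):
            return j == len(key_words)
        word = pattern_words[i]
        if word == "#":
            # '#' may absorb zero or more of the remaining words.
            return any(match(i + 1, n) for n in range(j, len(key_words) + 1))
        if j == len(key_words):
            return False
        if word == "*" or word == key_words[j]:
            return match(i + 1, j + 1)
        return False

    return match(0, 0)
```

For example, the binding `orders.*.created` matches `orders.eu.created` but not `orders.created`, while `orders.#` matches both of them and plain `orders`.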
**Queue types:**
```properties
# Classic Queue (not replicated; classic mirrored queues were removed in RabbitMQ 4.0)
x-queue-type: classic
# Quorum Queue (recommended for production)
x-queue-type: quorum
x-quorum-initial-group-size: 3
x-dead-letter-exchange: dlx
# Stream Queue (for large backlogs)
x-queue-type: stream
x-max-length-bytes: 1073741824
```
**HA and clustering:**
| Mode | Description | Use case |
|-------|-------|----------|
| **Quorum Queues** | Raft-based replication (3-5 nodes), auto failover | Production, HA messaging |
| **Federation** | Async message forwarding between independent RabbitMQ clusters | Multi-region, DR |
| **Shovel** | Point-to-point message forwarding (Federation at queue level) | Migration, specific routing |
| **Warm Standby (DR)** | Secondary cluster, started on failover | Cold DR |
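The reason quorum queues are usually sized at 3 or 5 members follows from Raft's majority rule. The arithmetic below is a generic sketch of that rule, not RabbitMQ code; both helper names are hypothetical.

```python
def majority(members):
    """Smallest number of members that forms a Raft majority."""
    return members // 2 + 1


def tolerated_failures(members):
    """Members that can fail while a majority is still available."""
    return members - majority(members)
```

A 4-member queue tolerates one failure, the same as a 3-member queue, so even member counts add replication cost without adding fault tolerance.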
### Apache Pulsar
**Unique architecture (compute/storage separation):**
```
┌──────────────┐    ┌──────────────┐    ┌──────────────┐
│   Producer   │    │   Consumer   │    │   Consumer   │
└──────┬───────┘    └──────┬───────┘    └──────┬───────┘
       │                   │                   │
┌──────▼───────────────────▼───────────────────▼──────┐
│                 Broker (stateless)                  │
│     Subscription: Exclusive / Shared / Failover     │
└──────────────────────┬──────────────────────────────┘
┌──────────────────────▼──────────────────────────────┐
│            BookKeeper (stateful storage)            │
│  ├── Bookie 1  ├── Bookie 2  ├── Bookie 3  ├── ...  │
│  └── Ledger (append-only, segmented log)            │
└─────────────────────────────────────────────────────┘
```
| Concept | Description |
|---------|-------|
| **Topic** | Logical category (partitioned or non-partitioned) |
| **Subscription** | Delivery mode (Exclusive, Shared, Failover, Key_Shared) |
| **Ledger** | Storage unit in BookKeeper (append-only) |
| **Bookie** | Storage node (BookKeeper) |
| **Managed Ledger** | Segmented log with cache and retention |
**Advantages over Kafka:**
- Compute/storage separation — independent scaling
- Geo-replication built-in (native)
- Multi-tenant (namespaces, isolation)
- TTL, retry, dead letter topic (built-in)
- At-least-once / effectively-once delivery
### NATS
| Feature | Description |
|---------|-------|
| **Core NATS** | Pub/sub, request-reply, < 1 ms latency |
| **JetStream** | Persistence, exactly-once, key-value store, object store |
| **Leaf nodes** | Hierarchical cluster connection |
| **Super-cluster** | Multi-region clustering (global) |
**Use case:** IoT, edge computing, microservices communication, low-latency messaging.
### Oracle Service Bus (OSB)
Part of Oracle SOA Suite, runs on WebLogic Server. Enterprise service bus for integration in Oracle-heavy environments.
| Concept | Description |
|---------|-------|
| **Proxy Service** | Inbound endpoint (HTTP, JMS, MQ, SOAP, REST) |
| **Business Service** | Target backend service |
| **Pipeline** | Message processing — routing, transformation, validation |
| **Split-Join** | Parallel/sequential orchestration of multiple services |
| **Reporting** | Message tracking, SLA monitoring |
**Key features:**
- **Protocol mediation** — translation between SOAP/REST/JMS/MQ/FTP
- **Message transformation** — XSLT, XQuery, MFL (non-XML)
- **Throttling, SLA, alerting** — built-in
- **Oracle AQ (Advanced Queuing)** — integration with Oracle DB queues
- **XPath, XQuery, XSLT 2.0/3.0** — native support
- **Error handling** — fault policies, error queues, retry
**Use case:** Enterprise SOA, Oracle DB → Kafka bridging, legacy mainframe wrapping, B2B integration.
**Alternatives:** IBM Integration Bus (IIB), MuleSoft Anypoint, WSO2 EI, Apache Camel / ServiceMix.
---
## Platform comparison
### Performance and scaling
| Platform | Max throughput | Latency (P99) | Messages/s (1 broker) | Scaling |
|-----------|--------------|---------------|-------------------------|-----------|
| **Kafka** | > 1 GB/s | 2-10 ms | ~1,000,000 | Partitions (horizontal) |
| **Pulsar** | > 1 GB/s | 5-15 ms | ~1,000,000 | Brokers + Bookies |
| **RabbitMQ** | ~100 MB/s | < 1 ms (RAM) | ~100,000 | Clustering (node) |
| **NATS** | > 10 GB/s | < 0.5 ms | ~10,000,000 | Clustering + Leaf nodes |
| **OSB** | < 1 GB/s | 10-100 ms | ~10,000 | Vertical (WebLogic cluster) |
### Delivery guarantees
| Platform | At most once | At least once | Exactly once | Ordering |
|-----------|-------------|---------------|-------------|----------|
| **Kafka** | Yes | Yes (acks=all + min.insync) | Yes (idempotent + transactional) | Per partition |
| **Pulsar** | Yes | Yes | Yes (dedup + transactional) | Per partition |
| **RabbitMQ** | Yes | Yes (Publisher Confirm + Consumer Ack) | Limited | Per queue |
| **NATS** | Yes | Yes (JetStream) | Limited | Per subject |
| **OSB** | Yes | Yes (XA transactions, exactly-once delivery) | Yes (XA + WS-AT) | Per pipeline |
### When to use what
| Use case | Recommended platform | Reasoning |
|----------|---------------------|------------|
| **Event sourcing / audit log** | Kafka, Pulsar | Append-only log, high throughput, replay |
| **CDC (Change Data Capture)** | Kafka (Kafka Connect + Debezium) | Connector ecosystem |
| **Task queue (job processing)** | RabbitMQ, SQS | Dead letter, retry, priority, scheduling |
| **API messaging / microservices** | NATS, RabbitMQ | Low latency, simplicity |
| **Data pipeline (ETL)** | Kafka (ksqlDB, Kafka Streams) | Stream processing in platform |
| **IoT / Edge** | NATS, MQTT (RabbitMQ) | Lightweight, leaf nodes |
| **Enterprise SOA / EAI** | OSB, IBM IIB, MuleSoft | Protocol mediation, XA, B2B, legacy wrapping |
| **Multi-tenant cloud** | Pulsar | Native multi-tenant, geo-replication |
| **Serverless / event-driven** | SQS/SNS, Pub/Sub | Managed, auto-scaling |
---
## DR and high availability
See [DATACENTERS.en.md](DATACENTERS.en.md) — section "Impact of individual technologies on DC topology selection" for detailed DR mapping per platform.
### Best practices
- **Don't lose messages in queue** — prefer acknowledgement-based consumption (not auto-ack)
- **Dead letter queue** — every main queue has a DLQ for undeliverable messages
- **Monitor lag** — consumer lag is a key metric (Kafka: `records-lag-max` under `kafka.consumer:type=consumer-fetch-manager-metrics`)
- **Idempotent consumer** — same message may be delivered twice
- **Retry with backoff** — exponential backoff on processing failure
- **Schema registry** — avoid deserialization errors (Avro, Protobuf, JSON Schema)
- **Encryption** — TLS in transit, encryption at rest (Apache Kafka has no built-in at-rest encryption; use disk/volume encryption or client-side payload encryption)
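Several of these practices (acknowledge only after processing, idempotent consumer, retry with backoff, dead letter queue) fit together in one small consumer loop. The sketch below is broker-agnostic and hypothetical throughout: `IdempotentConsumer`, `backoff_delays`, and the in-memory `processed_ids` set are assumptions, and a real deployment would persist the processed IDs and publish to an actual DLQ.

```python
import time


def backoff_delays(base_seconds, max_retries, cap_seconds=60.0):
    """Exponential backoff: base, 2*base, 4*base, ... capped at cap_seconds."""
    return [min(cap_seconds, base_seconds * (2 ** attempt)) for attempt in range(max_retries)]


class IdempotentConsumer:
    def __init__(self, handler, max_retries=3, base_delay=0.5, sleep=time.sleep):
        self.handler = handler
        self.processed_ids = set()   # in-memory for the sketch only
        self.dead_letters = []       # stands in for a dead letter queue
        self.delays = backoff_delays(base_delay, max_retries)
        self.sleep = sleep

    def handle(self, message_id, payload):
        """Return "duplicate", "acked" or "dead-lettered"."""
        if message_id in self.processed_ids:
            return "duplicate"       # redelivery: acknowledge without reprocessing
        for attempt in range(len(self.delays) + 1):
            try:
                self.handler(payload)
            except Exception:
                if attempt < len(self.delays):
                    self.sleep(self.delays[attempt])
                continue
            self.processed_ids.add(message_id)
            return "acked"           # acknowledge only after successful processing
        self.dead_letters.append((message_id, payload))
        return "dead-lettered"
```

Passing `sleep` as a parameter keeps the class testable: a test can record the delays instead of actually waiting.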
---
## Related
- [DATACENTERS.en.md](DATACENTERS.en.md) — DR topology, per-platform mapping
- [CLOUD.en.md](CLOUD.en.md) — managed messaging (SQS, SNS, Service Bus, Pub/Sub)
## Sources
Links, books, and standards: [sources/infrastructure/sources.en.md](sources/infrastructure/sources.en.md)
*Last revision: 2026-06-12*

275
MESSAGING.md Normal file

@@ -0,0 +1,275 @@
# 📨 Messaging a streaming platformy
## Přehled platforem
| Platforma | Typ | Jazyk | Protokol | Persistence | Use case |
|-----------|-----|-------|----------|-------------|----------|
| **Apache Kafka** | Distributed event store | Java/Scala | Binary (TCP) | Disk (log) | Event streaming, data pipeline, log aggregation |
| **RabbitMQ** | Message broker | Erlang | AMQP 0-9-1, MQTT, STOMP | Disk / RAM | Aplikační messaging, task queue, RPC |
| **Apache Pulsar** | Distributed messaging + streaming | Java | Binary (TCP) + REST | Disk (segmented log) | Streaming + queue v jednom, multi-tenant |
| **NATS** | Lightweight messaging | Go | NATS protocol (TCP) | Memory / JetStream (disk) | Microservices, IoT, edge, low-latency |
| **AWS SQS** | Managed queue | — | HTTPS | Managed | Decoupling services, serverless |
| **AWS SNS** | Managed pub/sub | — | HTTPS, SQS, Lambda, email | Managed | Push notifications, fanout |
| **Azure Service Bus** | Managed messaging | — | AMQP, HTTPS | Managed | Enterprise messaging, sessions, transactions |
| **Google Pub/Sub** | Managed streaming | — | gRPC, REST | Managed | Event-driven, data pipeline |
| **Red Hat AMQ 7** (Artemis) | Message broker | Java | AMQP, MQTT, STOMP, OpenWire | Disk | Enterprise, JMS, high-availability |
| **Oracle Service Bus (OSB)** | Enterprise ESB | Java | HTTP/S, JMS, SOAP, REST, MQ, FTP, AQ | Managed (WebLogic) | Enterprise integration, SOA, protocol mediation, routing |
---
## Detail platforem
### Apache Kafka
**Architektura:**
```
Producer ──► Topic ──► Partition ──► Consumer Group
             ├── Partition 0 (Leader) ──► Broker 1
             ├── Partition 1 (Follower) ──► Broker 2
             └── Partition 2 (Follower) ──► Broker 3
```
| Koncept | Popis |
|---------|-------|
| **Topic** | Logická kategorie zpráv |
| **Partition** | Append-only log, ordered sequence of messages |
| **Broker** | Server v Kafka clusteru |
| **Producer** | Publikuje zprávy do topicu |
| **Consumer** | Čte zprávy z partition (v rámci consumer group) |
| **Consumer Group** | Skupina consumerů sdílejících čtení topicu |
| **Offset** | Pozice v partition (sledovaná consumerem) |
| **KRaft** | Controller quorum na bázi Raftu, nahrazuje ZooKeeper (production-ready od Kafka 3.3; režim se ZooKeeperem odstraněn ve 4.0) |
**Replikace a HA:**
| Parametr | Hodnota |
|----------|---------|
| Replication factor | 2-3 (typicky 3 pro produkci) |
| ISR (In-Sync Replicas) | Počet replik, které drží krok s leaderem |
| Min ISR | Minimální počet ISR pro potvrzení zápisu (acks=all) |
| acks=0 | Fire-and-forget (nejrychlejší, možná ztráta dat) |
| acks=1 | Zápis potvrzen leaderem (kompromis) |
| acks=all | Zápis potvrzen všemi ISR (nejbezpečnější) |
| Leader failover | Automatický výběr nového leadera z ISR |
**Důležité konfigurace:**
```properties
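# Produkce (výchozí hodnoty brokeru; replication factor lze nastavit i per topic při vytvoření)
default.replication.factor=3
min.insync.replicas=2
# Retention
# 7 dní
log.retention.hours=168
# -1 = neomezeno (nebo nastavit limit v bajtech)
log.retention.bytes=-1
# 1 GB per segment
log.segment.bytes=1073741824
# Performance
# výchozí počet partitions pro nové topicy; upravit podle potřeb (scale-out)
num.partitions=3
# jedna z hodnot: uncompressed, gzip, snappy, lz4, zstd, producer (výchozí na brokeru: producer)
compression.type=snappy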
```
**Partitioning strategies:**
| Strategy | Klíč | Výhoda | Nevýhoda |
|----------|------|--------|----------|
| Round-robin | null | Rovnoměrné rozložení | Ztráta pořadí per klíč |
| Key-based | user_id, order_id | Zprávy se stejným klíčem → stejná partition | Nerovnoměrné rozložení (hot keys) |
| Custom partitioner | Vlastní logika | Optimalizace per use case | Složitější na údržbu |
### RabbitMQ
**Architektura:**
```
Producer ──► Exchange ──► Binding ──► Queue ──► Consumer
    ┌───────────┼───────────┐
    ▼           ▼           ▼
 Direct       Topic      Fanout
Exchange    Exchange    Exchange
```
| Koncept | Popis |
|---------|-------|
| **Exchange** | Přijímá zprávy od producera, routuje do queue |
| **Binding** | Vazba exchange → queue s routing key |
| **Queue** | FIFO fronta zpráv (consumer čte) |
| **Virtual Host (vhost)** | Izolace tenantů v rámci jednoho clusteru |
| **Publisher Confirm** | Potvrzení že broker zprávu přijal |
| **Consumer Ack** | Potvrzení že consumer zprávu zpracoval |
**Exchange typy:**
| Typ | Routing | Use case |
|-----|---------|----------|
| **Direct** | routing_key = binding_key | Task queue, point-to-point |
| **Topic** | routing_key match binding pattern (wildcard `*`, `#`) | Pub/sub s filtrováním |
| **Fanout** | Všem bindovaným queue | Broadcast, event notification |
| **Headers** | AMQP headers match | Komplexní routing (není závislý na routing key) |
**Queue typy:**
```properties
# Classic Queue (nereplikovaná; classic mirrored queues byly v RabbitMQ 4.0 odstraněny)
x-queue-type: classic
# Quorum Queue (doporučeno pro produkci)
x-queue-type: quorum
x-quorum-initial-group-size: 3
x-dead-letter-exchange: dlx
# Stream Queue (pro large backlogs)
x-queue-type: stream
x-max-length-bytes: 1073741824
```
**HA a clustering:**
| Režim | Popis | Use case |
|-------|-------|----------|
| **Quorum Queues** | Raft-based replikace (3-5 nodů), auto failover | Produkce, HA messaging |
| **Federation** | Async forwarding zpráv mezi nezávislými RabbitMQ clustery | Multi-region, DR |
| **Shovel** | Point-to-point forwarding zpráv (Federation na úrovni queue) | Migrace, specifický routing |
| **Warm Standby (DR)** | Druhý cluster, start až při failoveru | Cold DR |
### Apache Pulsar
**Unikátní architektura (compute/storage separation):**
```
┌──────────────┐    ┌──────────────┐    ┌──────────────┐
│   Producer   │    │   Consumer   │    │   Consumer   │
└──────┬───────┘    └──────┬───────┘    └──────┬───────┘
       │                   │                   │
┌──────▼───────────────────▼───────────────────▼──────┐
│                 Broker (stateless)                  │
│     Subscription: Exclusive / Shared / Failover     │
└──────────────────────┬──────────────────────────────┘
┌──────────────────────▼──────────────────────────────┐
│            BookKeeper (stateful storage)            │
│  ├── Bookie 1  ├── Bookie 2  ├── Bookie 3  ├── ...  │
│  └── Ledger (append-only, segmented log)            │
└─────────────────────────────────────────────────────┘
```
| Koncept | Popis |
|---------|-------|
| **Topic** | Logická kategorie (partitioned nebo non-partitioned) |
| **Subscription** | Způsob doručení (Exclusive, Shared, Failover, Key_Shared) |
| **Ledger** | Storage unit v BookKeeper (append-only) |
| **Bookie** | Storage node (BookKeeper) |
| **Managed Ledger** | Segmentovaný log s cache a retention |
**Výhody oproti Kafce:**
- Compute/storage separation — nezávislé škálování
- Geo-replication built-in (nativní)
- Multi-tenant (namespaces, isolation)
- TTL, retry, dead letter topic (built-in)
- At-least-once / effectively-once delivery
### NATS
| Feature | Popis |
|---------|-------|
| **Core NATS** | Pub/sub, request-reply, < 1 ms latence |
| **JetStream** | Persistence, exactly-once, key-value store, object store |
| **Leaf nodes** | Hierarchické propojení clusterů |
| **Super-cluster** | Multi-region clustering (global) |
**Use case:** IoT, edge computing, microservices communication, low-latency messaging.
### Oracle Service Bus (OSB)
Součást Oracle SOA Suite, běží na WebLogic Serveru. Enterprise service bus pro integraci v Oracle-heavy prostředích.
| Koncept | Popis |
|---------|-------|
| **Proxy Service** | Vstupní endpoint (HTTP, JMS, MQ, SOAP, REST) |
| **Business Service** | Cílový backend service |
| **Pipeline** | Message processing — routing, transformation, validation |
| **Split-Join** | Parallel/sequential orchestration více služeb |
| **Reporting** | Message tracking, SLA monitoring |
**Klíčové vlastnosti:**
- **Protocol mediation** — překlad mezi SOAP/REST/JMS/MQ/FTP
- **Message transformation** — XSLT, XQuery, MFL (neXML)
- **Throttling, SLA, alerting** — built-in
- **Oracle AQ (Advanced Queuing)** — integrace s Oracle DB frontami
- **XPath, XQuery, XSLT 2.0/3.0** — nativní podpora
- **Error handling** — fault policies, error queues, retry
**Use case:** Enterprise SOA, Oracle DB → Kafka bridging, legacy mainframe wrapping, B2B integration.
**Alternativy:** IBM Integration Bus (IIB), MuleSoft Anypoint, WSO2 EI, Apache Camel / ServiceMix.
---
## Srovnání platforem
### Výkon a škálování
| Platforma | Max throughput | Latence (P99) | Počet zpráv/s (1 broker) | Škálování |
|-----------|--------------|---------------|-------------------------|-----------|
| **Kafka** | > 1 GB/s | 2-10 ms | ~1 000 000 | Partitions (horizontální) |
| **Pulsar** | > 1 GB/s | 5-15 ms | ~1 000 000 | Brokers + Bookies |
| **RabbitMQ** | ~100 MB/s | < 1 ms (RAM) | ~100 000 | Clustering (node) |
| **NATS** | > 10 GB/s | < 0,5 ms | ~10 000 000 | Clustering + Leaf nodes |
| **OSB** | < 1 GB/s | 10-100 ms | ~10 000 | Vertikální (WebLogic cluster) |
### Delivery guarantees
| Platforma | At most once | At least once | Exactly once | Ordering |
|-----------|-------------|---------------|-------------|----------|
| **Kafka** | Ano | Ano (acks=all + min.insync) | Ano (idempotent + transactional) | Per partition |
| **Pulsar** | Ano | Ano | Ano (dedup + transactional) | Per partition |
| **RabbitMQ** | Ano | Ano (Publisher Confirm + Consumer Ack) | Omezeně | Per queue |
| **NATS** | Ano | Ano (JetStream) | Omezeně | Per subject |
| **OSB** | Ano | Ano (XA transactions, exactly-once delivery) | Ano (XA + WS-AT) | Per pipeline |
### Kdy co použít
| Use case | Doporučená platforma | Zdůvodnění |
|----------|---------------------|------------|
| **Event sourcing / audit log** | Kafka, Pulsar | Append-only log, high throughput, replay |
| **CDC (Change Data Capture)** | Kafka (Kafka Connect + Debezium) | Ekosystém konektorů |
| **Task queue (job processing)** | RabbitMQ, SQS | Dead letter, retry, priority, scheduling |
| **API messaging / microservices** | NATS, RabbitMQ | Nízká latence, jednoduchost |
| **Data pipeline (ETL)** | Kafka (ksqlDB, Kafka Streams) | Stream processing v platformě |
| **IoT / Edge** | NATS, MQTT (RabbitMQ) | Lightweight, leaf nodes |
| **Enterprise SOA / EAI** | OSB, IBM IIB, MuleSoft | Protocol mediation, XA, B2B, legacy wrapping |
| **Multi-tenant cloud** | Pulsar | Nativní multi-tenant, geo-replication |
| **Serverless / event-driven** | SQS/SNS, Pub/Sub | Managed, auto-scaling |
---
## DR a vysoká dostupnost
Viz [DATACENTERS.md](DATACENTERS.md) — sekce "Vliv jednotlivých technologií na výběr DC topologie" pro detail DR mapping per platforma.
### Best practices
- **Neztrať zprávu v queue** — preferovat acknowledgement-based consumption (ne auto-ack)
- **Dead letter queue** — každá hlavní queue má DLQ pro nedoručitelné zprávy
- **Monitoring lag** — consumer lag je klíčová metrika (Kafka: `records-lag-max` pod `kafka.consumer:type=consumer-fetch-manager-metrics`)
- **Idempotentní consumer** — stejná zpráva může být doručena dvakrát
- **Retry s backoff** — exponenciální backoff při selhání zpracování
- **Schema registry** — vyhnout se deserialization errors (Avro, Protobuf, JSON Schema)
- **Šifrování** — TLS in transit, encryption at rest (Apache Kafka nemá vestavěné šifrování at rest; použít šifrování disků/volumes nebo client-side šifrování zpráv)
---
## Související
- [DATACENTERS.md](DATACENTERS.md) — DR topologie, per-platforma mapping
- [CLOUD.md](CLOUD.md) — managed messaging (SQS, SNS, Service Bus, Pub/Sub)
## Zdroje
Odkazy, knihy a standardy: [sources/infrastructure/sources.md](sources/infrastructure/sources.md)
*Poslední revize: 2026-06-12*

116
MONGODB.en.md Normal file

@@ -0,0 +1,116 @@
# 🥬 MongoDB
## Overview
MongoDB is the most widespread document-oriented NoSQL database. It stores data as BSON (binary JSON) documents with a flexible schema. It suits rapidly developed applications whose schema changes often or whose records vary in structure.
## Data model
- **Database** → Collection → Document (JSON/BSON)
- **Document** — fields with key-value, nested objects, arrays
- **Flexible schema** — each document can have different fields (but not recommended)
- **ObjectId** — default primary key (12 bytes: 4-byte timestamp + 5-byte random value + 3-byte counter; older versions used a machine ID and PID in place of the random value)
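Because the first four bytes of an ObjectId are a big-endian Unix timestamp, the creation time can be read from the ID without a separate field. The sketch below decodes a 24-character hex ID using only the standard library; `objectid_parts` is a hypothetical helper, and it assumes the current layout of 4-byte timestamp, 5-byte random value, and 3-byte counter.

```python
from datetime import datetime, timezone


def objectid_parts(hex_id):
    """Split a 24-character hex ObjectId into its three components."""
    raw = bytes.fromhex(hex_id)
    if len(raw) != 12:
        raise ValueError("an ObjectId is exactly 12 bytes (24 hex characters)")
    timestamp = int.from_bytes(raw[0:4], "big")
    return {
        "created_at": datetime.fromtimestamp(timestamp, tz=timezone.utc),
        "random": raw[4:9].hex(),
        "counter": int.from_bytes(raw[9:12], "big"),
    }
```

One practical consequence is that sorting by `_id` roughly sorts by creation time, which also means a default `_id` used as a range shard key sends all new inserts to the same shard.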
## Architecture
```
mongod (individual node)
├── WiredTiger storage engine (default since 3.2)
│   ├── B-Tree indexes (B-Tree, not LSM)
│   ├── MVCC (snapshot isolation)
│   ├── Compression (zlib, snappy, zstd)
│   └── Cache (WiredTiger internal cache)
├── Replication (replica set)
│   ├── Primary (all writes)
│   └── Secondary (replication, optional reads)
└── Sharding (cluster)
    ├── mongos (router)
    ├── Config servers (metadata)
    └── Shards (replica sets)
```
### Replica set
- Primary node = all writes, secondary = replication (oplog)
- Automatic failover (election among secondaries)
- Up to 50 nodes in a replica set, max 7 voting nodes
- Read preference: primary (default), primaryPreferred, secondary, secondaryPreferred, nearest
### Sharding
- The shard key determines how documents are distributed across shards
- **Range sharding** — close data on the same shard (good for range queries, risk of hot spots)
- **Hashed sharding** — even distribution (good for write throughput, bad for range queries)
- **Zoned sharding** — data placed according to zones (geo-distribution, compliance)
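The trade-off between range and hashed sharding shows up clearly with monotonically increasing keys. The sketch below is a toy model: `range_shard` and `hashed_shard` are hypothetical helpers, chunks are reduced to fixed boundaries, and MD5 from the standard library stands in for MongoDB's internal hash function.

```python
import hashlib


def range_shard(key, boundaries):
    """boundaries: sorted exclusive upper bounds; the last shard takes everything above."""
    for shard, upper in enumerate(boundaries):
        if key < upper:
            return shard
    return len(boundaries)


def hashed_shard(key, num_shards):
    """Hash the key first so that neighbouring keys land on unrelated shards."""
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards
```

With boundaries `[100, 500]`, every new key from 1000 upward lands on the last shard (a write hot spot), while the hashed variant spreads the same keys across all shards at the cost of turning a range query into a scatter-gather over every shard.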
## Index types
| Type | Description |
|------|-------------|
| **Single field** | Standard B-Tree index |
| **Compound** | Multiple fields in index (order matters) |
| **Multikey** | Index on array field — each value separately |
| **Text** | Full-text search |
| **Geospatial (2d, 2dsphere)** | Geo queries (near, within, intersect) |
| **Hashed** | For hashed sharding |
| **TTL** | Automatic document deletion after expiration |
| **Wildcard** | Index on unknown/irregular fields |
## Aggregation pipeline
A framework for transforming data through a sequence of stages, where each stage consumes the output of the previous one:
```javascript
db.orders.aggregate([
  { $match: { status: "shipped" } },
  { $group: { _id: "$customer_id", total: { $sum: "$amount" } } },
  { $sort: { total: -1 } },
  { $limit: 10 }
])
```
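To make the stage semantics concrete, here is the same pipeline written as plain Python over an in-memory list. The sample documents are made up, and this only illustrates what each stage does, not how MongoDB executes it.

```python
from collections import defaultdict

orders = [  # made-up sample documents
    {"customer_id": "a", "status": "shipped", "amount": 30},
    {"customer_id": "b", "status": "shipped", "amount": 50},
    {"customer_id": "a", "status": "shipped", "amount": 45},
    {"customer_id": "c", "status": "pending", "amount": 99},
]


def top_customers(docs, limit=10):
    # $match: keep only shipped orders
    shipped = (d for d in docs if d["status"] == "shipped")
    # $group: sum the amount per customer_id
    totals = defaultdict(int)
    for d in shipped:
        totals[d["customer_id"]] += d["amount"]
    # $sort (descending by total), then $limit
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return [{"_id": cid, "total": total} for cid, total in ranked[:limit]]


result = top_customers(orders)
print(result)
```

Putting `$match` first matters in the real pipeline as well: it shrinks the input early and lets the stage use an index.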
## Recommendations — where MongoDB is better
| Area | MongoDB | Competition | Why MongoDB |
|------|---------|-------------|-------------|
| **Flexible schema** | Schema-less, changes without migration | PostgreSQL (ALTER TABLE + migration) | Rapid development, MVP, frequent model changes |
| **JSON / documents** | Native BSON, nested objects | PostgreSQL (jsonb with its own operators and jsonpath, not MongoDB-style `$` query operators) | Simpler object mapping from code |
| **Horizontal scaling** | Native sharding (mongos + config) | MySQL (Vitess external) | Built-in, simple to set up |
| **Geo-distribution** | Zoned sharding, replica set per region | Cassandra (AP model, different philosophy) | CP in CAP terms: consistency plus distribution |
| **Aggregation** | Aggregation pipeline, `$lookup` (left outer join) | PostgreSQL (SQL JOINs, more powerful) | Useful for denormalized data |
| **Development speed** | ODM libraries (e.g. Mongoose), natural JSON | SQL (schema first, migrations) | Fast time-to-market |
### When to use MongoDB
- **Rapid development / MVP** — schema evolves frequently, no migrations
- **Catalog data** — products with varying attributes (e-commerce, marketplace)
- **Content management** — diverse content (blog, CMS, headless CMS)
- **Real-time analytics** — aggregations, dashboards, event data
- **IoT / sensor data** — diverse message structures
- **Mobile applications** — JSON documents naturally map to API responses
### When to use something else
- **Financial transactions** → PostgreSQL (ACID, referential integrity)
- **Complex reports / JOINs** → PostgreSQL or ClickHouse
- **Relationship data (friends, follows)** → Neo4j (graph DB)
- **High-throughput writes** → Cassandra (AP model, no master bottleneck)
- **Small data, single server** → SQLite (simpler, no daemon)
## MongoDB licensing
MongoDB changed its license in 2018 from GNU AGPL v3 to **SSPL** (Server Side Public License):
| Variant | License | Price | Conditions |
|---------|---------|-------|------------|
| **MongoDB Community** | SSPL | Free | SSPL: if you offer MongoDB as a managed service, you must release the entire stack (incl. orchestration, monitoring) as open source. Internal use without restrictions |
| **MongoDB Enterprise Advanced** | Commercial | ~$10,000/server/year (Atlas: pay-per-use) | Enterprise features (LDAP, Kerberos, auditing, encryption), 24/7 support |
| **MongoDB Atlas** | Managed | Pay-per-use (~$0.10-5.00/hour depending on instance) | Fully managed, multi-cloud, auto-scaling, backup, monitoring |
**Impact**: SSPL is similar to the Redis model: self-hosted internal use is unrestricted, while cloud providers (AWS, Azure) cannot offer MongoDB as a managed service unless they sign a commercial agreement or release their service stack under SSPL. Alternative: **FerretDB** (open-source proxy that speaks the MongoDB wire protocol on top of PostgreSQL).
## Sources
References, books, and standards: [sources/databases/sources.en.md](sources/databases/sources.en.md)
*Last revision: 2026-06-03*

116
MONGODB.md Normal file

@@ -0,0 +1,116 @@
# 🥬 MongoDB
## Overview
MongoDB is the most widely used document-oriented NoSQL database. It stores data as BSON (binary JSON) documents with a flexible schema, which suits rapidly developed applications whose schema changes often or varies between documents.
## Data model
- **Database** → Collection → Document (JSON/BSON)
- **Document** — key-value fields, nested objects, arrays
- **Flexible schema** — each document can have different fields (possible, but not recommended)
- **ObjectId** — default `_id` primary key (12 bytes: 4-byte timestamp + 5-byte per-process random value + 3-byte counter; older versions used machine ID + PID instead of the random value)
## Architecture
```
mongod (individual node)
├── WiredTiger storage engine (default since 3.2)
│   ├── B-Tree indexes (B-Tree, not LSM)
│   ├── MVCC (snapshot isolation)
│   ├── Compression (zlib, snappy, zstd)
│   └── Cache (WiredTiger internal cache)
├── Replication (replica set)
│   ├── Primary (all writes)
│   └── Secondary (replication, optional reads)
└── Sharding (cluster)
    ├── mongos (router)
    ├── Config servers (metadata)
    └── Shards (replica sets)
```
### Replica set
- The primary node takes all writes; secondaries replicate them from the oplog
- Automatic failover (election among secondaries)
- Up to 50 nodes in a replica set, max 7 voting nodes
- Read preference: primary (default), primaryPreferred, secondary, secondaryPreferred, nearest
### Sharding
- The shard key determines how documents are distributed across shards
- **Range sharding** — documents with nearby key values land on the same shard (good for range queries, risk of hot spots)
- **Hashed sharding** — even distribution (good for write throughput, bad for range queries)
- **Zoned sharding** — data placed according to zones (geo-distribution, compliance)
## Index types
| Type | Description |
|------|-------------|
| **Single field** | Standard B-Tree index |
| **Compound** | Multiple fields in index (order matters) |
| **Multikey** | Index on array field — each value separately |
| **Text** | Full-text search |
| **Geospatial (2d, 2dsphere)** | Geo queries (near, within, intersect) |
| **Hashed** | For hashed sharding |
| **TTL** | Automatic document deletion after expiration |
| **Wildcard** | Index on unknown/irregular fields |
## Aggregation pipeline
A framework for transforming data through a sequence of stages, where each stage consumes the output of the previous one:
```javascript
db.orders.aggregate([
  { $match: { status: "shipped" } },
  { $group: { _id: "$customer_id", total: { $sum: "$amount" } } },
  { $sort: { total: -1 } },
  { $limit: 10 }
])
```
## Recommendations — where MongoDB is better
| Area | MongoDB | Competition | Why MongoDB |
|------|---------|-------------|-------------|
| **Flexible schema** | Schema-less, changes without migration | PostgreSQL (ALTER TABLE + migration) | Rapid development, MVP, frequent model changes |
| **JSON / documents** | Native BSON, nested objects | PostgreSQL (jsonb with its own operators and jsonpath, not MongoDB-style `$` query operators) | Simpler object mapping from code |
| **Horizontal scaling** | Native sharding (mongos + config) | MySQL (Vitess external) | Built-in, simple to set up |
| **Geo-distribution** | Zoned sharding, replica set per region | Cassandra (AP model, different philosophy) | CP in CAP terms: consistency plus distribution |
| **Aggregation** | Aggregation pipeline, `$lookup` (left outer join) | PostgreSQL (SQL JOINs, more powerful) | Useful for denormalized data |
| **Development speed** | ODM libraries (e.g. Mongoose), natural JSON | SQL (schema first, migrations) | Fast time-to-market |
### When to use MongoDB
- **Rapid development / MVP** — schema evolves frequently, no migrations
- **Catalog data** — products with varying attributes (e-commerce, marketplace)
- **Content management** — diverse content (blog, CMS, headless CMS)
- **Real-time analytics** — aggregations, dashboards, event data
- **IoT / sensor data** — diverse message structures
- **Mobile applications** — JSON documents naturally map to API responses
### When to use something else
- **Financial transactions** → PostgreSQL (ACID, referential integrity)
- **Complex reports / JOINs** → PostgreSQL or ClickHouse
- **Relationship data (friends, follows)** → Neo4j (graph DB)
- **High-throughput writes** → Cassandra (AP model, no master bottleneck)
- **Small data, single server** → SQLite (simpler, no daemon)
## MongoDB licensing
MongoDB changed its license in 2018 from GNU AGPL v3 to **SSPL** (Server Side Public License):
| Variant | License | Price | Conditions |
|---------|---------|-------|------------|
| **MongoDB Community** | SSPL | Free | SSPL: if you offer MongoDB as a managed service, you must release the entire stack (incl. orchestration, monitoring) as open source. Internal use without restrictions |
| **MongoDB Enterprise Advanced** | Commercial | ~$10,000/server/year (Atlas: pay-per-use) | Enterprise features (LDAP, Kerberos, auditing, encryption), 24/7 support |
| **MongoDB Atlas** | Managed | Pay-per-use (~$0.10-5.00/hour depending on instance) | Fully managed, multi-cloud, auto-scaling, backup, monitoring |
**Impact**: SSPL is similar to the Redis model: self-hosted internal use is unrestricted, while cloud providers (AWS, Azure) cannot offer MongoDB as a managed service unless they sign a commercial agreement or release their service stack under SSPL. Alternative: **FerretDB** (open-source proxy that speaks the MongoDB wire protocol on top of PostgreSQL).
## Sources
References, books, and standards: [sources/databases/sources.md](sources/databases/sources.md)
*Last revision: 2026-06-03*

502
MONITORING.en.md Normal file

@@ -0,0 +1,502 @@
# 📊 Monitoring and observability
## OpenMetrics standard
OpenMetrics (CNCF sandbox) is the de-facto standard for metric exposition in cloud-native environments:
- Supports text representation and Protocol Buffers
- Based on the Prometheus text exposition format, which it extends and standardizes
- Specifies metric types: counter, gauge, histogram, summary, gaugehistogram, stateset, info, unknown
- `_total` suffix for cumulative values, `_bucket` for histograms
- Metadata: HELP, TYPE, UNIT; timestamps are optional
The standard is developed within [OpenObservability](https://github.com/OpenObservability/OpenMetrics).
## New tools and trends (2024–2026)
| Tool | Description |
|------|-------------|
| **Grafana Sigil** | AI observability for LLM agents (OTel-native) |
| **InfraLens** | eBPF-based, zero-instrumentation network observability |
| **Ingero** | GPU causal observability (eBPF, CUDA tracing) |
| **GreptimeDB** | Unified observability DB — replaces Prometheus + Loki + ES |
| **Netdata** | AI-powered full-stack monitoring, 800+ integrations, edge ML |
## Three pillars of observability
1. **Logs** — timestamped event records, unstructured or structured (ERROR, WARN, INFO)
2. **Metrics** — numerical data over time (latency, error rate, CPU utilization)
3. **Traces** — request tracking across services (distributed tracing)
## SLI / SLO / SLA
| Term | Meaning | Example |
|------|---------|---------|
| **SLI** (Service Level Indicator) | Measured metric | Latency p99 = 250ms |
| **SLO** (Service Level Objective) | Target value | 99.9 % of requests < 300ms |
| **SLA** (Service Level Agreement) | Legal commitment | 99.95 % uptime |
### Error budget
`Error Budget = 100 % - SLO`
- If the SLO is 99.9 %, the error budget is 0.1 % of the time (about 43 minutes in a 30-day month)
- While error budget remains, the team can deploy new features
- When exhausted — freeze on deploys, stability is priority
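The budget arithmetic is easy to automate. A minimal sketch for a time-based SLO follows; the function names and the 30-day window are assumptions made for illustration.

```python
def error_budget_minutes(slo_percent: float, window_days: int = 30) -> float:
    """Allowed 'bad' minutes in the window for a time-based SLO."""
    window_minutes = window_days * 24 * 60
    return window_minutes * (100.0 - slo_percent) / 100.0


def budget_remaining(slo_percent: float, bad_minutes: float, window_days: int = 30) -> float:
    """Fraction of the budget still unspent (negative once it is exhausted)."""
    budget = error_budget_minutes(slo_percent, window_days)
    return (budget - bad_minutes) / budget


print(round(error_budget_minutes(99.9), 1))                 # minutes of budget per 30 days
print(round(budget_remaining(99.9, bad_minutes=10.8), 2))   # share of budget left
```

A negative result from `budget_remaining` is the signal for the deploy freeze described above.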
## Pyramid of metrics — RED vs USE vs 4 Golden Signals
### 4 Golden Signals (Google SRE)
1. **Latency** — request processing time (distinguish success vs error latency)
2. **Traffic** — number of requests / throughput (RPS, QPS, throughput)
3. **Errors** — explicit errors (5xx, 4xx) and implicit (success with wrong result)
4. **Saturation** — how "full" the service is (CPU, memory, queue depth, connection pool)
### USE (for infrastructure)
- **U**tilization — how busy the resource is (% time active)
- **S**aturation — how much is waiting in queue (run queue, I/O wait)
- **E**rrors — errors (dropped packets, disk errors, OOM)
### RED (for services)
- **R**ate — requests per second
- **E**rrors — number of erroneous requests
- **D**uration — latency (distribution, percentiles)
| Methodology | Focus | Typical metrics |
|-------------|-------|-----------------|
| **4 Golden Signals** | Services + infrastructure | Latency, RPS, errors, saturation |
| **USE** | Infrastructure | CPU util, I/O saturation, disk errors |
| **RED** | Microservices | RPS, error rate, p50/p95/p99 latency |
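As an illustration of RED, the three signals can be derived from a raw request log. The sample data below is made up, and the nearest-rank percentile is one simple choice among several percentile definitions.

```python
import math

# (timestamp_seconds, status_code, duration_seconds); made-up sample data
requests = [
    (0.0, 200, 0.12), (1.5, 200, 0.08), (2.0, 500, 0.90), (4.0, 200, 0.10),
    (5.5, 404, 0.05), (7.0, 200, 0.20), (8.0, 503, 1.40), (9.9, 200, 0.11),
]


def percentile(values, q):
    """Nearest-rank percentile, q in (0, 1]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]


def red(samples, window_seconds):
    durations = [d for _, _, d in samples]
    server_errors = sum(1 for _, status, _ in samples if status >= 500)
    return {
        "rate": len(samples) / window_seconds,          # Rate: requests per second
        "error_ratio": server_errors / len(samples),    # Errors: share of 5xx responses
        "p50": percentile(durations, 0.50),             # Duration: median
        "p95": percentile(durations, 0.95),             # Duration: tail latency
    }


summary = red(requests, window_seconds=10)
print(summary)
```

Counting only 5xx as errors is a deliberate choice in this sketch; whether 4xx responses count depends on the service.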
## PromQL examples
| Expression | Description |
|------------|-------------|
| `rate(http_requests_total[5m])` | Requests per second (average over 5 min) |
| `increase(http_requests_total[1h])` | Total increase over 1 hour |
| `sum by (status) (rate(http_requests_total[5m]))` | Requests aggregated by status code |
| `histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))` | p99 latency |
| `avg_over_time(cpu_usage[1h])` | Average CPU utilization over an hour |
| `topk(5, sum(rate(http_requests_total[5m])) by (service))` | Top 5 services by RPS |
| `max_over_time(memory_usage[24h])` | Max memory usage over 24h |
| `rate(node_network_receive_drop_total[5m]) > 0` | Interfaces dropping received packets |
| `(1 - avg(rate(node_cpu_seconds_total{mode="idle"}[5m])))` | CPU utilization (1 - idle) |
| `rate(http_request_duration_seconds_sum[5m]) / rate(http_request_duration_seconds_count[5m])` | Average latency |
| `absent(metric)` | Returns 1 when no series for `metric` exists (alert on a missing metric) |
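What `rate()` does to a counter can be sketched in a few lines. This is a simplified model, not Prometheus's implementation: it handles counter resets but skips the extrapolation to the window boundaries that the real function performs. `simple_rate` is a hypothetical name.

```python
def simple_rate(samples):
    """Per-second increase of a counter over the samples' time span.

    samples: list of (timestamp_seconds, counter_value), oldest first.
    A drop in value is treated as a counter reset (restart from zero).
    """
    if len(samples) < 2:
        return None
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        increase += (cur - prev) if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed


window = [(0, 100), (15, 130), (30, 160), (45, 10), (60, 40)]  # reset before t=45
print(simple_rate(window))
```

The reset handling is the reason counters should be queried with `rate()` or `increase()` rather than `delta()`, which is meant for gauges.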
## Recording rules
Pre-aggregation of frequently used PromQL queries to reduce query load.
### When to use
- Complex queries used across multiple dashboards
- Queries over raw data with high cardinality
- Frequently queried aggregations (e.g., p99 latency over last month)
### Example
```yaml
groups:
  - name: service_rules
    interval: 1m
    rules:
      - record: job:http_requests:rate5m
        expr: sum(rate(http_requests_total[5m])) by (job)
      - record: instance:cpu:utilization
        expr: (1 - avg(rate(node_cpu_seconds_total{mode="idle"}[5m])) by (instance))
      - record: service:http_latency:p99
        expr: histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket[5m])) by (le, service))
```
- **record** — new metric name (convention: `level:metric:operations`)
- **interval** — how often the rule evaluates (typically 1-5 min)
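The p99 rule relies on `histogram_quantile`, which estimates a quantile by linear interpolation inside the bucket where the target rank falls. The sketch below is a simplified Python model that assumes cumulative buckets sorted by upper bound, at least one finite bucket, and `0 < q <= 1`; it is not the Prometheus source.

```python
import math


def histogram_quantile(q, buckets):
    """buckets: list of (upper_bound, cumulative_count), sorted, ending with math.inf."""
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0  # the lowest bucket is assumed to start at 0
    for i, (bound, count) in enumerate(buckets):
        if count >= rank:
            if math.isinf(bound):
                # Quantile falls in the +Inf bucket: report the highest finite bound.
                return buckets[i - 1][0]
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / (count - prev_count)
        prev_bound, prev_count = bound, count
    return math.nan


latency_buckets = [(0.1, 50), (0.5, 90), (1.0, 100), (math.inf, 100)]
print(histogram_quantile(0.95, latency_buckets))
```

The estimate can never be more precise than the bucket layout, so bucket boundaries should sit close to the SLO threshold being measured.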
## Metrics — tools
### Metrics
| Tool | Description |
|------|-------------|
| Prometheus | Pull-based, time-series DB, powerful query language (PromQL) |
| Grafana | Visualization, dashboards, alerting |
| Zabbix | Enterprise monitoring, agent + agentless (SNMP/IPMI/JMX), auto-discovery, trigger-based alerting |
| Datadog | SaaS, APM, logs, metrics in one |
| New Relic | APM, browser monitoring |
| CloudWatch | AWS native |
| Azure Monitor | Azure native |
| Google Cloud Ops | GCP native |
### Logging
| Tool | Description |
|------|-------------|
| ELK Stack | Elasticsearch, Logstash, Kibana |
| Loki | Grafana Loki — lightweight, Prometheus-like |
| Splunk | Enterprise log management |
| Fluentd / Fluent Bit | Log collector and forwarder |
| Vector | High-performance log/metric collector |
### Tracing
| Tool | Description |
|------|-------------|
| Jaeger | Open-source distributed tracing |
| Zipkin | Open-source distributed tracing |
| OpenTelemetry | Standard for instrumentation (logs, metrics, traces) |
| Datadog APM | SaaS tracing |
| AWS X-Ray | AWS tracing |
## OpenTelemetry detail
### Span attributes
```yaml
resource:
  attributes:
    - service.name: "payment-service"
    - service.version: "1.2.3"
    - deployment.environment: "production"
scope:
  name: "io.opentelemetry.payment"
spans:
  - name: "processPayment"
    kind: SPAN_KIND_INTERNAL
    attributes:
      - payment.method: "credit_card"
      - payment.amount: 2499
      - payment.currency: "CZK"
    events:
      - name: "authorization.complete"
        timestamp: 1717428000000000000
```
### Context propagation (W3C TraceContext)
- **`traceparent`** — header carrying trace-id, span-id, trace flags
- Format: `00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01`
- Version (00) | Trace-ID (32 hex) | Span-ID (16 hex) | TraceFlags (01 = sampled)
- **`tracestate`** — vendor-specific key-value data carried alongside `traceparent`, so multiple tracing vendors can coexist
- Propagation happens via HTTP headers, gRPC metadata, message queue properties
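The header format above is simple enough to parse by hand. The following sketch handles the version `00` layout only; `parse_traceparent` is a hypothetical helper, and in production an OpenTelemetry SDK propagator would normally do this.

```python
import re

TRACEPARENT_RE = re.compile(
    r"^(?P<version>[0-9a-f]{2})-(?P<trace_id>[0-9a-f]{32})-"
    r"(?P<parent_id>[0-9a-f]{16})-(?P<flags>[0-9a-f]{2})$"
)


def parse_traceparent(header: str):
    """Return the header fields as a dict, or None if the header is invalid."""
    match = TRACEPARENT_RE.match(header.strip())
    if match is None:
        return None
    fields = match.groupdict()
    # All-zero IDs are invalid, and version "ff" is forbidden by the spec.
    if (fields["version"] == "ff"
            or set(fields["trace_id"]) == {"0"}
            or set(fields["parent_id"]) == {"0"}):
        return None
    fields["sampled"] = bool(int(fields["flags"], 16) & 0x01)
    return fields


ctx = parse_traceparent("00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01")
print(ctx)
```

A service that receives a valid header keeps the `trace_id`, creates a new span ID for its own work, and forwards the updated header downstream.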
### Sampling
| Type | Description | Use case |
|------|-------------|----------|
| **Head-based** | Sampling decision at trace start (based on ID) | Simple, deterministic |
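| **Tail-based** | Decision after trace completion (based on result, latency) | Keeps the interesting traces (errors, slow requests); needs buffering, more complex |
- Tail-based sampling: often used to keep critical traces (5xx, p99+, slow traces)
- Tools: OTel Collector (head: `probabilistic_sampler` processor, tail: `tail_sampling` processor), Jaeger (head-based), Grafana Tempo (stores traces; tail sampling typically runs in a collector in front of it)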
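The difference between the two strategies comes down to what information is available at decision time. In the sketch below, the head rule sees only the trace ID, while the tail rule sees the finished request. Using the low 64 bits of the trace ID is an assumption made for illustration; real SDK samplers define their own algorithm.

```python
def head_sample(trace_id_hex: str, ratio: float) -> bool:
    """Keep a trace if the low 64 bits of its ID fall below ratio * 2**64."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1")
    low_bits = int(trace_id_hex[-16:], 16)
    return low_bits < int(ratio * 2**64)


def tail_keep(status_code: int, duration_ms: float, slow_ms: float = 1000.0) -> bool:
    """Tail-style rule: keep completed traces that failed or were slow."""
    return status_code >= 500 or duration_ms >= slow_ms


trace_id = "0af7651916cd43dd8448eb211c80319c"
print(head_sample(trace_id, 0.6), tail_keep(503, 120.0))
```

Because the head decision depends only on the trace ID, every service that sees the same ID reaches the same verdict without coordination.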
## Alerting
### Principles
- **Alert on symptom, not cause** — "500 errors" instead of "high CPU"
- **Reduce noise** — avoid flapping alerts and alert fatigue
- **Runbook for every alert** — what to do when alert fires
- **Alert severity** — P0 (critical), P1 (high), P2 (medium), P3 (low)
### Alertmanager (Prometheus)
```yaml
route:
  receiver: "team-pager"
  group_by: ["alertname", "cluster"]
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  routes:
    - match:
        severity: critical
      receiver: "team-pager"
      repeat_interval: 1h
    - match:
        severity: warning
      receiver: "team-slack"
receivers:
  - name: "team-pager"
    pagerduty_configs:
      - routing_key: "<KEY>"
        severity: "{{ .CommonLabels.severity }}"
  - name: "team-slack"
    slack_configs:
      - channel: "#alerts"
        title: "{{ .GroupLabels.alertname }}"
```
**Concepts**:
- **Grouping** — grouping alerts by labels (noise reduction, e.g., all down instances in a cluster)
- **Inhibition** — suppression of less severe alerts when a more severe one exists (e.g., nodedown inhibits pod alerts)
- **Silencing** — temporary alert suppression (matching labels + duration)
- **Routing tree** — hierarchical routing by label match (severity, service, team)
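Grouping and routing can be modelled in a few lines to see how many notifications a burst of alerts produces. This is a toy model of the configuration above with made-up alerts; it ignores timing (`group_wait`, `repeat_interval`), inhibition, and silences.

```python
from collections import defaultdict

GROUP_BY = ("alertname", "cluster")

alerts = [  # made-up firing alerts
    {"alertname": "InstanceDown", "cluster": "eu-1", "instance": "node-1", "severity": "critical"},
    {"alertname": "InstanceDown", "cluster": "eu-1", "instance": "node-2", "severity": "critical"},
    {"alertname": "InstanceDown", "cluster": "us-1", "instance": "node-9", "severity": "critical"},
    {"alertname": "HighLatency", "cluster": "eu-1", "instance": "api-1", "severity": "warning"},
]


def group_alerts(firing, group_by=GROUP_BY):
    """One notification per distinct combination of the group_by labels."""
    groups = defaultdict(list)
    for alert in firing:
        key = tuple(alert.get(label, "") for label in group_by)
        groups[key].append(alert)
    return dict(groups)


def receiver_for(alert):
    """Mirror of the routing tree: the first matching child route wins."""
    if alert.get("severity") == "critical":
        return "team-pager"
    if alert.get("severity") == "warning":
        return "team-slack"
    return "team-pager"  # default receiver of the root route


groups = group_alerts(alerts)
for key, members in groups.items():
    print(key, len(members), receiver_for(members[0]))
```

Four alerts collapse into three notifications here; with dozens of instances failing in one cluster, the reduction is what keeps the pager usable.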
### Event / incident management (on-call tooling)
- PagerDuty, Opsgenie, Grafana OnCall
- Escalation policies
- On-call rotations
## Structured logging
```json
{
  "timestamp": "2026-06-03T10:30:00Z",
  "level": "ERROR",
  "service": "payment-service",
  "trace_id": "abc123",
  "user_id": "u456",
  "message": "Payment gateway timeout",
  "duration_ms": 1200,
  "error": {
    "type": "TimeoutError",
    "message": "Gateway did not respond in 1000ms"
  }
}
```
### Required fields of structured log
| Field | Description | Example |
|-------|-------------|---------|
| `timestamp` | ISO 8601 / RFC 3339 | `2026-06-03T10:30:00Z` |
| `level` | Log level (RFC 5424) | `ERROR`, `WARN`, `INFO`, `DEBUG` |
| `message` | Human-readable message | `Payment processed` |
| `service` | Service name | `payment-service` |
| `trace_id` | Correlation across services | `abc123def456` |
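One way to emit these fields is a custom formatter for Python's standard `logging` module. This is a minimal sketch: the `JsonFormatter` class and the convention of attaching `trace_id` to the record are assumptions, and note that Python names the warning level `WARNING` rather than `WARN`.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object with the required fields."""

    def __init__(self, service: str):
        super().__init__()
        self.service = service

    def format(self, record: logging.LogRecord) -> str:
        created = datetime.fromtimestamp(record.created, tz=timezone.utc)
        entry = {
            "timestamp": created.isoformat(timespec="seconds").replace("+00:00", "Z"),
            "level": record.levelname,
            "message": record.getMessage(),
            "service": self.service,
            # trace_id is expected to be attached by the caller; None if absent.
            "trace_id": getattr(record, "trace_id", None),
        }
        return json.dumps(entry)


formatter = JsonFormatter(service="payment-service")
record = logging.LogRecord(
    name="payment", level=logging.ERROR, pathname="example.py", lineno=0,
    msg="Payment gateway timeout after %d ms", args=(1200,), exc_info=None,
)
record.trace_id = "abc123"
line = formatter.format(record)
print(line)
```

In application code the formatter would be set on a handler, and the trace ID passed per call through `extra={"trace_id": ...}` or a logging filter.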
### RFC 5424 log levels
| Number | Level | Usage |
|--------|-------|-------|
| 0 | EMERG | System unusable |
| 1 | ALERT | Immediate action required |
| 2 | CRIT | Critical error |
| 3 | ERROR | Error (non-critical) |
| 4 | WARN | Warning |
| 5 | NOTICE | Normal but significant event |
| 6 | INFO | Informational message |
| 7 | DEBUG | Debugging (disabled in production) |
### Correlation ID (traceparent)
- Generated at system entry (API gateway, frontend, message consumer)
- Propagated in HTTP header `X-Correlation-ID` / `traceparent`
- Enables linking logs across microservices (→ Grafana Explore, Kibana Discover)
- Implementation: middleware in app, service mesh (Envoy), API gateway
## Distributed tracing detail
### Span kinds
| Kind | Description | Example |
|------|-------------|---------|
| **CLIENT** | Calling downstream service (outbound) | HTTP client calling API |
| **SERVER** | Processing incoming request | HTTP handler |
| **INTERNAL** | Local operation within service | Computation, transformation |
| **PRODUCER** | Sending message to queue | Kafka producer |
| **CONSUMER** | Receiving message from queue | Kafka consumer |
### Trace context chain
```
Trace: abc123
└── Span: /checkout (SERVER, root)
    ├── Span: validateCart (INTERNAL)
    ├── Span: POST /orders (CLIENT → payment-service)
    │   └── Span: /processPayment (SERVER)
    │       ├── Span: authorizeCard (INTERNAL)
    │       └── Span: chargeCard (CLIENT → bank-gateway)
    │           └── Span: /charge (SERVER, external)
    └── Span: sendConfirmation (PRODUCER → kafka)
        └── Span: consumeConfirmation (CONSUMER → email-service)
```
- **W3C TraceContext** — standardized cross-service tracing
- **Baggage** — carries contextual key-value data (tenant, user role) across service boundaries alongside the trace context
## Grafana
### Provisioning dashboards as code
```yaml
apiVersion: 1
providers:
  - name: "default"
    orgId: 1
    folder: "Services"
    type: file
    options:
      path: /etc/grafana/provisioning/dashboards
```
Dashboards JSON in git → CI/CD → automatic import into Grafana.
### Variables
- **Query variable** — dynamic values (e.g., list of service names from PromQL: `label_values(up, service)`)
- **Interval variable** — selectable time spans (e.g. `1m`, `5m`, `1h`); the built-in `$__interval` adapts to the selected time range
- **Custom variable** — manual list of values (env: prod, staging, dev)
- **Chained variable** — dependent variable (select namespace → show pods in namespace)
### Annotations
- Drawing events in graphs (deploys, incidents, config changes)
- Sources: Prometheus alerts, Loki logs, GitHub Actions, custom API
- Use case: "Deploy at 14:30 → spike in latency at 14:31 → correlation"
## On-call best practices
### Escalation policies
```
Level 1: Primary on-call (response within 5 min)
└── timeout 15 min
Level 2: Secondary / senior engineer (response within 15 min)
└── timeout 15 min
Level 3: Engineering manager / incident commander
```
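The escalation chain can be expressed as data plus a small function, which is handy for testing a policy before configuring it in a paging tool. The structure below is a sketch of the three levels shown above, not any vendor's policy format.

```python
ESCALATION = [  # (level, minutes without acknowledgement before escalating further)
    ("Level 1: primary on-call", 15),
    ("Level 2: secondary / senior engineer", 15),
    ("Level 3: engineering manager / incident commander", None),
]


def paged_levels(minutes_unacknowledged: float):
    """Return every level that has been paged so far."""
    paged, elapsed = [], 0
    for name, timeout in ESCALATION:
        paged.append(name)
        if timeout is None or minutes_unacknowledged < elapsed + timeout:
            break
        elapsed += timeout
    return paged


for minutes in (0, 15, 30):
    print(minutes, paged_levels(minutes)[-1])
```

With these timeouts an unacknowledged incident reaches the incident commander after 30 minutes.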
### Incident severity matrix
| Severity | Description | Response | Communication |
|----------|-------------|----------|---------------|
| **P0 (Critical)** | Service completely unavailable, data loss, security breach | Immediate, 24/7 | Status page + Stakeholder update |
| **P1 (High)** | Major functionality degraded, part of users affected | Within 15 min | Slack channel + Team lead |
| **P2 (Medium)** | Non-critical feature broken, workaround exists | Within 1 h | Slack channel |
| **P3 (Low)** | Cosmetic issue, no user impact | Next business day | Jira ticket |
### Postmortem
- **Blameless** — goal is to learn, not blame
- **Structure**: Timeline, detection, root cause, resolution, action items
- **SRE principle**: every incident → postmortem → systemic improvement
- **Tools**: Jira, Incident.io, PagerDuty postmortem, Google Docs
## Logging patterns
### Best practices
- **Dashboard for each level** — executive, service, troubleshooting
- **Synthetic monitoring** — heartbeat checks, browser tests (Playwright, Cypress)
- **APM** — Application Performance Monitoring (database queries, external calls)
- **Anomaly detection** — ML-based outlier detection
- **Retention policy** — raw data short term, aggregations long term
- **Unified log format** — JSON, structured data
## Recommended literature
### Classic books
| Book | Authors | ISBN | Key topics |
|------|---------|------|------------|
| **Site Reliability Engineering** | Beyer, Jones, Petoff, Murphy | 978-1491929124 | How Google runs production systems — SRE principles, error budgets, toil, SLI/SLO |
| **The Site Reliability Workbook** | Beyer, Murphy, Rensin, Kawahara, Thorne | 978-1492029502 | Practical companion to SRE — case studies from Evernote, Home Depot, NY Times; SLO implementation, monitoring, on-call |
| **Observability Engineering** | Majors, Fong-Jones, Miranda | 978-1492076445 | First comprehensive book on observability — structured events, iterative hypothesis verification, core analysis loop; 2nd edition in 2026 (32 new chapters on AI, cost governance) |
### Cloud and monitoring
| Book | Author | ISBN/Year | Topics |
|------|--------|-----------|--------|
| **Cloud Observability in Action** | Michael Hausenblas | Manning, 2023 | Practical guide to observability in cloud-native environments — signal types (logs, metrics, traces, profiles), OTel Collector, SLOs, signal correlation, developer observability; open-source tools |
| **Mastering Prometheus** | William Hegedus | 978-1-80512-566-2 | Advanced Prometheus techniques — TSDB internals, custom service discovery, cardinality, remote storage (VictoriaMetrics, Mimir), SLO-based alerting; author is SRE manager at Akamai and Prometheus/Thanos contributor |
| **Observability with Grafana** | Chapman, Holmes | 978-1-80324-964-3 | Complete guide to LGTM stack (Loki, Grafana, Tempo, Mimir) — OTel instrumentation, LogQL/PromQL/TraceQL, AI/ML alerting, real user monitoring with Faro, Pyroscope profiling, k6 load testing |
| **Hands-On Monitoring and Alerting with Prometheus** | Muhammad Badawy | 978-9349887565 | Practical Prometheus guide — installation, configuration, service discovery, labeling, PromQL, Alertmanager, monitoring Linux, Windows, Docker, databases |
### AI and observability
| Book | Authors | ISBN/Year | Topics |
|------|---------|-----------|--------|
| **Observability in the AI-Native Era** | Lipsig, Grabner, Rati | 978-1-80638-959-9 | Connecting observability with AIOps — ML-based anomaly detection, root-cause analysis, self-healing systems, OTel + Prometheus + Grafana + Dynatrace/Datadog, compliance |
| **Open Source Observability** | Corless, Pawar | O'Reilly, 2025 | Report on disaggregated, modular observability stacks — flexibility, cost efficiency, data autonomy, blueprint for custom solutions from open-source components |
## Detailed tool overview
Extended information on tools from the table above:
### Grafana Sigil
AI observability product from Grafana Labs. OpenTelemetry-native SDK for instrumenting LLM agents:
- **Repository**: `github.com/grafana/sigil-sdk` (Go SDK) + `sigil-app` (Grafana plugin)
- **Features**: tracking conversations, generation, tool usage, cost tracking, quality evaluation
- **Growing problem**: 500M+ conversations, 5M+ agents in production (GrafanaCON 2026)
- **Integration**: automatic connection with Prometheus (metrics), Tempo (traces), AI Observability API
### InfraLens
Zero-instrumentation Kubernetes observability built on eBPF:
- **Repository**: `github.com/Herenn/Infralens` (Apache 2.0, Go)
- **Features**: automatic detection of service-to-service communication, topology visualization, AI-powered documentation
- **Architecture**: eBPF agent + Go backend + React frontend
- **Status**: early-stage (1 star, 10 commits), but eBPF-based observability concept is proven (Grafana Beyla, Cilium Hubble, Pixie)
### Ingero
GPU causal observability agent — first of its kind:
- **Repository**: `github.com/ingero-io/ingero` (Apache 2.0)
- **Features**: eBPF tracing from Linux kernel events through CUDA API to Python source code
- **Overhead**: < 2 %, zero code changes, single binary
- **MCP server**: native Model Context Protocol support — AI assistants can directly query GPU data
- **Use case**: diagnosis of GPU stalls, scheduler preemptions, CUDA memory spikes — causal chains instead of plain metrics
- **Version**: v0.19.0 (2026), active development
### GreptimeDB
Unified observability database — one backend for metrics, logs and traces:
- **Repository**: `github.com/GreptimeTeam/greptimedb` (Apache 2.0, Rust)
- **Architecture**: compute-storage disaggregation, object storage first (S3, GCS, Azure Blob), columnar storage
- **Querying**: SQL + PromQL in a single query, JOIN between metrics and logs possible
- **Drop-in replacement**: Prometheus (PromQL, remote write), Loki (Push API), Elasticsearch (bulk API), Jaeger (Query API)
- **Cost reduction**: up to 50× lower costs compared to traditional solutions (vendor claim)
- **Roadmap 2026**: v1.0 GA (Q1 2026), v1.1–v1.3 (Vector Index, AI Functions, Auto Rollup, adaptive resource management)
- **GreptimeDB Enterprise**: enhanced security, HA, enterprise support
### Netdata
Open-source, real-time monitoring platform for entire infrastructure:
- **Repository**: `github.com/netdata/netdata` (GPLv3+, C; 79k★)
- **Features**: per-second metrics, ML-based anomaly detection, AI-powered troubleshooting, 800+ integrations
- **Zero configuration**: auto-discovery, pre-configured alerts, ready dashboards
- **Architecture**: distributed agent → Netdata Cloud (optional), data stays local
- **Energy efficiency**: according to University of Amsterdam study, the most efficient tool for monitoring Docker systems
- **Netdata Cloud**: free tier (5 nodes), paid from $12/node/month
- **Licensing**: agent GPLv3+, dashboard NCUL1, cloud closed-source
## OpenStack Monitoring
OpenStack provides several services for telemetry and monitoring:
### Ceilometer (Telemetry)
- Metric collection (CPU, memory, network, storage) from compute, network and storage nodes
- Publishing to Gnocchi (time-series DB) or Panko (event storage)
- Notifications via oslo.messaging (RabbitMQ) — pipeline transformations
- Alarming: Aodh — threshold-based alarms, metric combinations
### Monasca
- More modern alternative to Ceilometer (primarily developed for telco use cases)
- Architecture: Monasca API → Log API → Transform → Threshold Engine → Notifier
- Backend: InfluxDB/Gnocchi, Kafka, Elasticsearch
- Supports alerting, notifications, graph dashboards
### Prometheus + OpenStack Exporter
- OpenStack-exporter for Prometheus (exports metrics from Ceilometer / API)
- Service discovery via Prometheus
- Grafana dashboards for visualization
### Masakari (VM High Availability)
- Detection and automatic recovery of VMs on hypervisor failure (host failure)
- Evacuation of instances to healthy compute node
- Integration with Pacemaker for cluster management
## Sources
Links, books and standards: [sources/monitoring/sources.en.md](sources/monitoring/sources.en.md)
*Last revision: 2026-06-03*

502
MONITORING.md Normal file

@@ -0,0 +1,502 @@
# 📊 Monitoring and observability
## OpenMetrics standard
OpenMetrics (CNCF sandbox) is the de-facto standard for metric exposition in cloud-native environments:
- Supports text representation and Protocol Buffers
- Based on the Prometheus text exposition format, which it extends and standardizes
- Specifies metric types: counter, gauge, histogram, summary, gaugehistogram, stateset, info, unknown
- `_total` suffix for cumulative values, `_bucket` for histograms
- Metadata: HELP, TYPE, UNIT; timestamps are optional
The standard is developed within [OpenObservability](https://github.com/OpenObservability/OpenMetrics).
## New tools and trends (2024–2026)
| Tool | Description |
|------|-------------|
| **Grafana Sigil** | AI observability for LLM agents (OTel-native) |
| **InfraLens** | eBPF-based, zero-instrumentation network observability |
| **Ingero** | GPU causal observability (eBPF, CUDA tracing) |
| **GreptimeDB** | Unified observability DB — replaces Prometheus + Loki + ES |
| **Netdata** | AI-powered full-stack monitoring, 800+ integrations, edge ML |
## Three pillars of observability
1. **Logs** — timestamped event records, unstructured or structured (ERROR, WARN, INFO)
2. **Metrics** — numerical data over time (latency, error rate, CPU utilization)
3. **Traces** — request tracking across services (distributed tracing)
## SLI / SLO / SLA
| Term | Meaning | Example |
|------|---------|---------|
| **SLI** (Service Level Indicator) | Measured metric | Latency p99 = 250ms |
| **SLO** (Service Level Objective) | Target value | 99.9 % of requests < 300ms |
| **SLA** (Service Level Agreement) | Legal commitment | 99.95 % uptime |
### Error budget
`Error Budget = 100 % - SLO`
- If the SLO is 99.9 %, the error budget is 0.1 % of the time (about 43 minutes in a 30-day month)
- While error budget remains, the team can deploy new features
- When exhausted — freeze on deploys, stability is priority
## Pyramid of metrics — RED vs USE vs 4 Golden Signals
### 4 Golden Signals (Google SRE)
1. **Latency** — request processing time (distinguish success vs error latency)
2. **Traffic** — number of requests / throughput (RPS, QPS, throughput)
3. **Errors** — explicit errors (5xx, 4xx) and implicit (success with wrong result)
4. **Saturation** — how "full" the service is (CPU, memory, queue depth, connection pool)
### USE (for infrastructure)
- **U**tilization — how busy the resource is (% time active)
- **S**aturation — how much is waiting in queue (run queue, I/O wait)
- **E**rrors — errors (dropped packets, disk errors, OOM)
### RED (for services)
- **R**ate — requests per second
- **E**rrors — number of erroneous requests
- **D**uration — latency (distribution, percentiles)
| Methodology | Focus | Typical metrics |
|-------------|-------|-----------------|
| **4 Golden Signals** | Services + infrastructure | Latency, RPS, errors, saturation |
| **USE** | Infrastructure | CPU util, I/O saturation, disk errors |
| **RED** | Microservices | RPS, error rate, p50/p95/p99 latency |
## PromQL examples
| Expression | Description |
|-------|-------|
| `rate(http_requests_total[5m])` | Requests per second (averaged over 5 min) |
| `increase(http_requests_total[1h])` | Total increase over 1 hour |
| `sum by (status) (rate(http_requests_total[5m]))` | Requests aggregated by status code |
| `histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))` | p99 latency |
| `avg_over_time(cpu_usage[1h])` | Average CPU usage over one hour |
| `topk(5, sum(rate(http_requests_total[5m])) by (service))` | Top 5 services by RPS |
| `max_over_time(memory_usage[24h])` | Maximum memory usage over 24 h |
| `rate(node_network_receive_drop_total[5m]) > 0` | Network interfaces dropping received packets |
| `(1 - avg(rate(node_cpu_seconds_total{mode="idle"}[5m])))` | CPU utilization (1 - idle) |
| `rate(http_request_duration_seconds_sum[5m]) / rate(http_request_duration_seconds_count[5m])` | Average latency (`_sum` and `_count` are counters, so use `rate()`, not `delta()`) |
| `absent(metric)` | Alert when a metric is missing |
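To make `rate()` and `increase()` less abstract, here is a simplified sketch of what they compute for a counter, including a counter reset. It deliberately ignores the extrapolation Prometheus applies at the window boundaries, and the function names are illustrative only.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, counter value)


def counter_increase(samples: List[Sample]) -> float:
    """Sum of increments between consecutive samples, tolerating resets."""
    total = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        if cur >= prev:
            total += cur - prev
        else:
            # Counter restarted from zero (e.g. process restart).
            total += cur
    return total


def simple_rate(samples: List[Sample]) -> float:
    """Per-second average increase over the sampled window."""
    duration = samples[-1][0] - samples[0][0]
    return counter_increase(samples) / duration


if __name__ == "__main__":
    window = [(0, 100), (60, 160), (120, 10), (180, 70)]
    print("increase:", counter_increase(window))
    print("rate/s:", simple_rate(window))
```

This reset handling is the reason counters should go through `rate()` or `increase()` rather than `delta()`.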
## Recording rules
Pre-aggregation of frequently used PromQL queries to reduce query-time load.
### When to use
- Complex queries used on several dashboards
- Queries over raw data with high cardinality
- Frequently requested aggregations (e.g. p99 latency over the last month)
### Example
```yaml
groups:
  - name: service_rules
    interval: 1m
    rules:
      - record: job:http_requests:rate5m
        expr: sum(rate(http_requests_total[5m])) by (job)
      - record: instance:cpu:utilization
        expr: (1 - avg(rate(node_cpu_seconds_total{mode="idle"}[5m])) by (instance))
      - record: service:http_latency:p99
        expr: histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket[5m])) by (le, service))
```
- **record** — name of the new metric (convention: `level:metric:aggregation`)
- **interval** — how often the rule is evaluated (typically 1-5 min)
## Metrics — tools
### Metrics
| Tool | Description |
|---------|-------|
| Prometheus | Pull-based, time-series DB, powerful query language (PromQL) |
| Grafana | Visualization, dashboards, alerting |
| Zabbix | Enterprise monitoring, agent + agentless (SNMP/IPMI/JMX), auto-discovery, trigger-based alerting |
| Datadog | SaaS; APM, logs, and metrics in one |
| New Relic | APM, browser monitoring |
| CloudWatch | AWS native |
| Azure Monitor | Azure native |
| Google Cloud Ops | GCP native |
### Logging
| Tool | Description |
|---------|-------|
| ELK Stack | Elasticsearch, Logstash, Kibana |
| Loki | Grafana Loki — lightweight, Prometheus-like |
| Splunk | Enterprise log management |
| Fluentd / Fluent Bit | Log collector and forwarder |
| Vector | High-performance log/metric collector |
### Tracing
| Tool | Description |
|---------|-------|
| Jaeger | Open-source distributed tracing |
| Zipkin | Open-source distributed tracing |
| OpenTelemetry | Standard for instrumentation (logs, metrics, traces) |
| Datadog APM | SaaS tracing |
| AWS X-Ray | AWS tracing |
## OpenTelemetry detail
### Span attributes
```yaml
resource:
  attributes:
    - service.name: "payment-service"
    - service.version: "1.2.3"
    - deployment.environment: "production"
scope:
  name: "io.opentelemetry.payment"
spans:
  - name: "processPayment"
    kind: SPAN_KIND_INTERNAL
    attributes:
      - payment.method: "credit_card"
      - payment.amount: 2499
      - payment.currency: "CZK"
    events:
      - name: "authorization.complete"
        timestamp: 1717428000000000000
```
### Context propagation (W3C TraceContext)
- **`traceparent`** — header carrying the trace-id, span-id, and trace flags
  - Format: `00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01`
  - Version (00) | Trace-ID (32 hex) | Span-ID (16 hex) | TraceFlags (01 = sampled)
- **`tracestate`** — vendor-specific data, interoperable across providers
- Propagation happens via HTTP headers, gRPC metadata, and message queue properties
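The `traceparent` layout can be checked with a few lines of code. This is a minimal sketch that only handles the version-`00` layout of four dash-separated fields; `TraceParent` and `parse_traceparent` are hypothetical names, not part of any OpenTelemetry SDK.

```python
import re
from typing import NamedTuple, Optional

# version(2 hex)-trace_id(32 hex)-span_id(16 hex)-flags(2 hex), lowercase only
_TRACEPARENT_RE = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


class TraceParent(NamedTuple):
    version: str
    trace_id: str
    span_id: str
    sampled: bool


def parse_traceparent(header: str) -> Optional[TraceParent]:
    """Return the parsed header, or None if it is malformed."""
    match = _TRACEPARENT_RE.match(header.strip())
    if match is None:
        return None
    version, trace_id, span_id, flags = match.groups()
    # Version ff and all-zero IDs are invalid per the W3C spec.
    if version == "ff" or set(trace_id) == {"0"} or set(span_id) == {"0"}:
        return None
    return TraceParent(version, trace_id, span_id, bool(int(flags, 16) & 0x01))


if __name__ == "__main__":
    print(parse_traceparent("00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01"))
```

A real service would normally leave this to its tracing SDK's propagator; the sketch is only meant to show which field sits where.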
### Sampling
| Type | Description | Use case |
|-----|-------|----------|
| **Head-based** | Decision made at the start of the trace (based on the trace ID) | Simple, deterministic |
| **Tail-based** | Decision made after the trace completes (based on outcome, latency) | Better samples, more complex to operate |
- Tail-based sampling is often used to keep critical traces (5xx, p99+, slow traces)
- Tools: Grafana Tempo (tail-based), Jaeger (head-based), OTel Collector (head + tail)
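The difference between the two sampling styles is easiest to see side by side. The sketch below is a simplified illustration: `head_sample` derives a deterministic decision from the trace ID alone, while `tail_sample` needs the finished trace's status and duration. The thresholds and function names are assumptions, not the behavior of any particular collector.

```python
def head_sample(trace_id_hex: str, ratio: float) -> bool:
    """Keep a trace if the low 64 bits of its ID fall below ratio * 2**64.

    Every service that sees the same trace ID reaches the same decision,
    which is what makes head-based sampling deterministic.
    """
    low64 = int(trace_id_hex[-16:], 16)
    return low64 < int(ratio * (1 << 64))


def tail_sample(status_code: int, duration_ms: float, slow_ms: float = 1000.0) -> bool:
    """Keep a completed trace if it failed or was slow."""
    return status_code >= 500 or duration_ms >= slow_ms


if __name__ == "__main__":
    trace_id = "0af7651916cd43dd8448eb211c80319c"
    print("head, 10 %:", head_sample(trace_id, 0.10))
    print("tail, 503 in 20 ms:", tail_sample(503, 20.0))
```

The trade-off is visible in the signatures: the tail-based decision needs data that only exists after the whole trace has been buffered.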
## Alerting
### Principles
- **Alert on symptoms, not causes** — "500 errors" rather than "high CPU"
- **Reduce noise** — avoid flapping alerts and alert fatigue
- **A runbook for every alert** — what to do when the alert fires
- **Alert severity** — P0 (critical), P1 (high), P2 (medium), P3 (low)
### Alertmanager (Prometheus)
```yaml
route:
  receiver: "team-pager"
  group_by: ["alertname", "cluster"]
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  routes:
    - match:
        severity: critical
      receiver: "team-pager"
      repeat_interval: 1h
    - match:
        severity: warning
      receiver: "team-slack"
receivers:
  - name: "team-pager"
    pagerduty_configs:
      - routing_key: "<KEY>"
        severity: "{{ .CommonLabels.severity }}"
  - name: "team-slack"
    slack_configs:
      - channel: "#alerts"
        title: "{{ .GroupLabels.alertname }}"
```
**Concepts**:
- **Grouping** — bundling alerts by labels (less noise, e.g. all down instances in one cluster)
- **Inhibition** — suppressing less severe alerts while a more severe one is firing (e.g. a node-down alert inhibits pod alerts)
- **Silencing** — temporarily muting an alert (matching labels + duration)
- **Routing tree** — hierarchical routing by label match (severity, service, team)
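Grouping is the concept that most directly reduces notification volume, so here is a rough sketch of the idea behind `group_by`. It is not Alertmanager's implementation; the alert dictionaries and the `group_alerts` helper are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Alert = Dict[str, Dict[str, str]]  # {"labels": {...}}


def group_alerts(alerts: List[Alert], group_by: Sequence[str]) -> Dict[Tuple[str, ...], List[Alert]]:
    """Bucket alerts that share the same values for the group_by labels."""
    groups: Dict[Tuple[str, ...], List[Alert]] = defaultdict(list)
    for alert in alerts:
        key = tuple(alert["labels"].get(label, "") for label in group_by)
        groups[key].append(alert)
    return dict(groups)


if __name__ == "__main__":
    firing = [
        {"labels": {"alertname": "InstanceDown", "cluster": "eu", "instance": "a"}},
        {"labels": {"alertname": "InstanceDown", "cluster": "eu", "instance": "b"}},
        {"labels": {"alertname": "InstanceDown", "cluster": "us", "instance": "c"}},
    ]
    for key, members in group_alerts(firing, ["alertname", "cluster"]).items():
        print(key, "->", len(members), "alert(s) in one notification")
```

With `group_by: ["alertname", "cluster"]` the three firing alerts collapse into two notifications, one per cluster.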
### ESM (Event / Incident Management)
- PagerDuty, Opsgenie, OnCall (Grafana)
- Escalation policies
- On-call rotations
## Structured logging
```json
{
  "timestamp": "2026-06-03T10:30:00Z",
  "level": "ERROR",
  "service": "payment-service",
  "trace_id": "abc123",
  "user_id": "u456",
  "message": "Payment gateway timeout",
  "duration_ms": 1200,
  "error": {
    "type": "TimeoutError",
    "message": "Gateway did not respond in 1000ms"
  }
}
```
### Required fields of a structured log
| Field | Description | Example |
|------|-------|---------|
| `timestamp` | ISO 8601 / RFC 3339 | `2026-06-03T10:30:00Z` |
| `level` | Log level (RFC 5424) | `ERROR`, `WARN`, `INFO`, `DEBUG` |
| `message` | Human-readable message | `Payment processed` |
| `service` | Service name | `payment-service` |
| `trace_id` | Correlation across services | `abc123def456` |
### RFC 5424 log levels
| Number | Level | Use |
|-------|-------|---------|
| 0 | EMERG | System is unusable |
| 1 | ALERT | Immediate action required |
| 2 | CRIT | Critical error |
| 3 | ERROR | Error (non-critical) |
| 4 | WARN | Warning |
| 5 | NOTICE | Normal but significant event |
| 6 | INFO | Informational message |
| 7 | DEBUG | Debugging (disabled in production) |
### Correlation ID (traceparent)
- Generated at the system entry point (API gateway, frontend, message consumer)
- Propagated in the HTTP header `X-Correlation-ID` / `traceparent`
- Lets you join logs across microservices (→ Grafana Explore, Kibana Discover)
- Implementation: middleware in the application, service mesh (Envoy), API gateway
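One way to emit the required fields together with a correlation ID is a custom formatter for Python's standard `logging` module. The sketch below is an assumption-laden example: `JsonFormatter` is a hypothetical class, and the trace ID is passed through `extra=`, whereas a real service would usually take it from its tracing context.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def __init__(self, service: str) -> None:
        super().__init__()
        self.service = service

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "service": self.service,
            # Filled from extra={"trace_id": ...}; None when no trace is active.
            "trace_id": getattr(record, "trace_id", None),
            "message": record.getMessage(),
        }
        return json.dumps(entry)


if __name__ == "__main__":
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter("payment-service"))
    logger = logging.getLogger("payment")
    logger.addHandler(handler)
    logger.error("Payment gateway timeout", extra={"trace_id": "abc123"})
```

Because every line is a self-contained JSON object carrying `trace_id`, a log backend can filter on that field to reconstruct one request across services.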
## Distributed tracing detail
### Span kinds
| Kind | Description | Example |
|------|-------|---------|
| **CLIENT** | Call to a downstream service (outbound) | HTTP client calling an API |
| **SERVER** | Handling of an incoming request | HTTP handler |
| **INTERNAL** | Local operation within a service | Computation, transformation |
| **PRODUCER** | Sending a message to a queue | Kafka producer |
| **CONSUMER** | Receiving a message from a queue | Kafka consumer |
### Trace context chain
```
Trace: abc123
├── Span: /checkout (SERVER, root)
│   ├── Span: validateCart (INTERNAL)
│   ├── Span: POST /orders (CLIENT → payment-service)
│   │   └── Span: /processPayment (SERVER)
│   │       ├── Span: authorizeCard (INTERNAL)
│   │       └── Span: chargeCard (CLIENT → bank-gateway)
│   │           └── Span: /charge (SERVER, external)
│   └── Span: sendConfirmation (PRODUCER → kafka)
│       └── Span: consumeConfirmation (CONSUMER → email-service)
```
- **W3C TraceContext** — standardizes cross-service tracing
- **Baggage** — carries contextual data (tenant, user role) between spans
## Grafana
### Provisioning dashboards as code
```yaml
apiVersion: 1
providers:
  - name: "default"
    orgId: 1
    folder: "Services"
    type: file
    options:
      path: /etc/grafana/provisioning/dashboards
```
Dashboard JSON in git → CI/CD → automatic import into Grafana.
### Variables
- **Query variable** — dynamic values (e.g. a list of service names from PromQL: `label_values(up, service)`)
- **Interval variable** — `$__auto_interval`, `$__interval` for a varying time range
- **Custom variable** — a manual list of values (env: prod, staging, dev)
- **Chained variable** — a dependent variable (choosing a namespace → shows the pods in that namespace)
### Annotations
- Draw events onto a graph (deploys, incidents, config changes)
- Sources: Prometheus alerts, Loki logs, GitHub Actions, custom API
- Use case: "Deploy at 14:30 → latency spike at 14:31 → correlation"
## On-call best practices
### Escalation policies
```
Level 1: Primary on-call (respond within 5 min)
  └── timeout 15 min
Level 2: Secondary / senior engineer (respond within 15 min)
  └── timeout 15 min
Level 3: Engineering manager / incident commander
```
### Incident severity matrix
| Severity | Description | Response | Communication |
|----------|-------|--------|------------|
| **P0 (Critical)** | Service completely unavailable, data loss, security breach | Immediately, 24/7 | Status page + stakeholder update |
| **P1 (High)** | Major functionality degraded, some users affected | Within 15 min | Slack channel + team lead |
| **P2 (Medium)** | Non-critical feature broken, workaround exists | Within 1 h | Slack channel |
| **P3 (Low)** | Cosmetic issue, no user impact | Next business day | Jira ticket |
### Postmortem
- **Blameless** — the goal is to learn, not to blame
- **Structure**: Timeline, detection, root cause, resolution, action items
- **SRE principle**: every incident → postmortem → systemic improvement
- **Tools**: Jira, Incident.io, PagerDuty postmortem, Google Docs
## Logging patterns
### Best practices
- **A dashboard for every level** — executive, service, troubleshooting
- **Synthetic monitoring** — heartbeat checks, browser tests (Playwright, Cypress)
- **APM** — Application Performance Monitoring (database queries, external calls)
- **Anomaly detection** — ML-based outlier detection
- **Retention policy** — raw data short-term, aggregates long-term
- **Uniform log format** — JSON, structured data
## Recommended reading
### Classic books
| Book | Authors | ISBN | Key topics |
|-------|--------|------|----------------|
| **Site Reliability Engineering** | Beyer, Jones, Petoff, Murphy | 978-1491929124 | How Google runs production systems — SRE principles, error budgets, toil, SLI/SLO |
| **The Site Reliability Workbook** | Beyer, Murphy, Rensin, Kawahara, Thorne | 978-1492029502 | Practical companion to the SRE book — case studies from Evernote, Home Depot, NY Times; implementing SLOs, monitoring, on-call |
| **Observability Engineering** | Majors, Fong-Jones, Miranda | 978-1492076445 | First comprehensive book on observability — structured events, iterative hypothesis verification, core analysis loop; 2nd edition in 2026 (32 new chapters on AI, cost governance) |
### Cloud and monitoring
| Book | Author | ISBN/Year | Topics |
|-------|-------|----------|--------|
| **Cloud Observability in Action** | Michael Hausenblas | Manning, 2023 | Practical guide to observability in cloud-native environments — signal types (logs, metrics, traces, profiles), OTel Collector, SLOs, signal correlation, developer observability; open-source tools |
| **Mastering Prometheus** | William Hegedus | 978-1-80512-566-2 | Advanced Prometheus techniques — TSDB internals, custom service discovery, cardinality, remote storage (VictoriaMetrics, Mimir), SLO-based alerting; the author is an SRE manager at Akamai and a Prometheus/Thanos contributor |
| **Observability with Grafana** | Chapman, Holmes | 978-1-80324-964-3 | Complete guide to the LGTM stack (Loki, Grafana, Tempo, Mimir) — instrumentation via OTel, LogQL/PromQL/TraceQL, AI/ML alerting, real user monitoring with Faro, Pyroscope profiling, k6 load testing |
| **Hands-On Monitoring and Alerting with Prometheus** | Muhammad Badawy | 978-9349887565 | Practical guide to Prometheus — installation, configuration, service discovery, labeling, PromQL, Alertmanager, monitoring Linux, Windows, Docker, databases |
### AI and observability
| Book | Authors | ISBN/Year | Topics |
|-------|--------|----------|--------|
| **Observability in the AI-Native Era** | Lipsig, Grabner, Rati | 978-1-80638-959-9 | Connecting observability with AIOps — ML-based anomaly detection, root-cause analysis, self-healing systems, OTel + Prometheus + Grafana + Dynatrace/Datadog, compliance |
| **Open Source Observability** | Corless, Pawar | O'Reilly, 2025 | Report on disaggregated, modular observability stacks — flexibility, cost efficiency, data autonomy, a blueprint for building your own solution from open-source components |
## Detailed tool overview
Extended information on the tools from the table above:
### Grafana Sigil
AI observability product from Grafana Labs. An OpenTelemetry-native SDK for instrumenting LLM agents:
- **Repository**: `github.com/grafana/sigil-sdk` (Go SDK) + `sigil-app` (Grafana plugin)
- **Features**: tracking of conversations, generations, tool usage, cost tracking, quality evaluation
- **Growing problem**: 500M+ conversations, 5M+ agents in production (GrafanaCON 2026)
- **Integration**: automatic linking with Prometheus (metrics), Tempo (traces), AI Observability API
### InfraLens
Zero-instrumentation Kubernetes observability built on eBPF:
- **Repository**: `github.com/Herenn/Infralens` (Apache 2.0, Go)
- **Features**: automatic detection of service-to-service communication, topology visualization, AI-powered documentation
- **Architecture**: eBPF agent + Go backend + React frontend
- **Status**: early-stage (1 star, 10 commits), but the concept of eBPF-based observability is proven (Grafana Beyla, Cilium Hubble, Pixie)
### Ingero
GPU causal observability agent — the first of its kind:
- **Repository**: `github.com/ingero-io/ingero` (Apache 2.0)
- **Features**: eBPF tracing from Linux kernel events through the CUDA API down to Python source code
- **Overhead**: < 2 %, zero code changes, a single binary
- **MCP server**: native Model Context Protocol support — AI assistants can query GPU data directly
- **Use case**: diagnosing GPU stalls, scheduler preemptions, CUDA memory spikes — causal chains instead of plain metrics
- **Version**: v0.19.0 (2026), actively developed
### GreptimeDB
Unified observability database — one backend for metrics, logs, and traces:
- **Repository**: `github.com/GreptimeTeam/greptimedb` (Apache 2.0, Rust)
- **Architecture**: compute-storage disaggregation, object storage first (S3, GCS, Azure Blob), columnar storage
- **Querying**: SQL + PromQL in one query, with the option to JOIN metrics and logs
- **Drop-in replacement**: Prometheus (PromQL, remote write), Loki (Push API), Elasticsearch (bulk API), Jaeger (Query API)
- **Cost reduction**: up to 50× lower cost than traditional solutions
- **Roadmap 2026**: v1.0 GA (Q1 2026), v1.1–v1.3 (Vector Index, AI Functions, Auto Rollup, adaptive resource management)
- **GreptimeDB Enterprise**: enhanced security, HA, enterprise support
### Netdata
Open-source, real-time monitoring platform for the whole infrastructure:
- **Repository**: `github.com/netdata/netdata` (GPLv3+, C; 79k★)
- **Features**: per-second metrics, ML-based anomaly detection, AI-powered troubleshooting, 800+ integrations
- **Zero configuration**: auto-discovery, pre-configured alerts, ready-made dashboards
- **Architecture**: distributed agent → Netdata Cloud (optional), data stays local
- **Energy efficiency**: according to a University of Amsterdam study, the most efficient tool for monitoring Docker systems
- **Netdata Cloud**: free tier (5 nodes), paid from $12/node/month
- **Licensing**: agent GPLv3+, dashboard NCUL1, cloud closed-source
## OpenStack Monitoring
OpenStack provides several services for telemetry and monitoring:
### Ceilometer (Telemetry)
- Collects metrics (CPU, memory, network, storage) from compute, network, and storage nodes
- Publishes to Gnocchi (time-series DB) or Panko (event storage)
- Notifications via oslo.messaging (RabbitMQ) — pipeline transformations
- Alarming: Aodh — threshold-based alarms, combinations of metrics
### Monasca
- A more modern alternative to Ceilometer (developed primarily for telco use cases)
- Architecture: Monasca API → Log API → Transform → Threshold Engine → Notifier
- Backend: InfluxDB/Gnocchi, Kafka, Elasticsearch
- Supports alarming, notifications, graph dashboards
### Prometheus + OpenStack Exporter
- OpenStack-exporter for Prometheus (exports metrics from Ceilometer / the API)
- Service discovery via Prometheus
- Grafana dashboards for visualization
### Masakari (VM High Availability)
- Detects hypervisor (host) failure and recovers the affected VMs automatically
- Evacuates instances to a healthy compute node
- Integrates with Pacemaker for cluster management
## Sources
References, books, and standards: [sources/monitoring/sources.md](sources/monitoring/sources.md)
*Last revision: 2026-06-03*

142
MYSQL.en.md Normal file
View File

@@ -0,0 +1,142 @@
# 🐬 MySQL & MariaDB
## Overview
MySQL is the most widespread open-source relational database, especially in web environments (LAMP stack). MariaDB is a fork created after Oracle acquired MySQL (through Sun Microsystems); it remains largely compatible and adds its own extensions. Default choice for WordPress, Drupal, Magento, and most PHP applications.
## Architecture (server + storage engine)
Based on *High Performance MySQL* (Schwartz, Zaitsev, Tkachenko):
```text
MySQL Server Layer
├── Connection handling (thread-per-connection)
├── Query parser & optimizer
├── Built-in functions
└── Storage Engine API
    ├── InnoDB (default, MVCC, ACID)
    ├── MyISAM (legacy, table-level locks)
    ├── MEMORY (in-memory, HEAP)
    └── ... (others)
```
### InnoDB (default engine since MySQL 5.5)
- **MVCC** — Multi-Version Concurrency Control (snapshot isolation)
- **REPEATABLE READ** (default) — next-key locking prevents phantom reads
- **Clustered index** — primary key = physical data ordering
- **Buffer pool** — cache of data and indexes in RAM (main performance parameter)
- **Doublewrite buffer** — prevents partial page writes
### Schema design tips
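- Prefer smaller data types where the value range allows (MEDIUMINT over INT, TIMESTAMP over DATETIME; note that TIMESTAMP only covers dates up to January 2038)
- Use NULL carefully (each nullable column makes indexes and comparisons more complex)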
- Use ENUM only for truly small, stable value lists
- JSON columns in MySQL 8+ — useful for flexible schema, but not for joins
### Deferred join pattern
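```sql
-- Assumes an index on (status, created_at); InnoDB appends the PK (id) to it.
-- 1. the covering index finds the 100 matching PKs without touching full rows
-- 2. only then join back to fetch the full rows
SELECT * FROM users
INNER JOIN (
    SELECT id FROM users
    WHERE status = 'active'
    ORDER BY created_at DESC
    LIMIT 100 OFFSET 1000
) AS tmp USING (id)
-- The derived table's order is not guaranteed to survive the join.
ORDER BY created_at DESC;
```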
**Join decomposition**: Sometimes it's better to split a JOIN into several simple queries (better cache utilization, fewer locks, scaling across servers).
**IN() optimization**: MySQL sorts values in the IN() list and uses binary search (O(log n)), unlike OR clauses (O(n)).
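The complexity difference behind the IN() note can be illustrated outside the database. The sketch below is not MySQL's code; it only contrasts a binary search over a sorted value list with a linear chain of equality checks, and both function names are hypothetical.

```python
from bisect import bisect_left
from typing import List


def in_sorted_list(sorted_values: List[int], needle: int) -> bool:
    """Membership test via binary search: O(log n) comparisons."""
    i = bisect_left(sorted_values, needle)
    return i < len(sorted_values) and sorted_values[i] == needle


def or_chain(values: List[int], needle: int) -> bool:
    """Equivalent of `col = v1 OR col = v2 OR ...`: O(n) comparisons."""
    return any(v == needle for v in values)


if __name__ == "__main__":
    in_list = sorted([42, 7, 1999, 3, 512])  # sorted once, reused for every row
    for candidate in (7, 8):
        print(candidate, in_sorted_list(in_list, candidate), or_chain(in_list, candidate))
```

The one-time sort pays off because the same list is probed once per examined row.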
## MariaDB differences from MySQL
| Feature | MySQL 8.x | MariaDB 11.x |
|---------|-----------|--------------|
| **Storage engine** | InnoDB (default), plus MyISAM, MEMORY, NDB (MySQL Cluster) | InnoDB (default since 10.2; XtraDB before that) + Aria + MyRocks + ColumnStore |
| **JSON** | Native JSON type | JSON alias to LONGTEXT + JSON functions |
| **CTE** | WITH (non-recursive + recursive) | WITH (non-recursive + recursive) |
| **Window functions** | Yes (8.0+) | Yes (10.2+) |
| **Sequence** | No (auto_increment only) | Yes (CREATE SEQUENCE) |
| **Thread pooling** | Enterprise only | Built-in |
| **Galera cluster** | No (natively) | Yes (native synchronous clustering) |
## ProxySQL
ProxySQL is an advanced proxy for MySQL with sophisticated routing:
| Feature | Description |
|---------|-------------|
| **Query routing** | Rules for directing queries (read/write split, sharding) |
| **Connection pooling** | Multiplexing thousands of connections into a small pool |
| **Query cache** | Result caching in memory (TTL, size limit) |
| **Query rewriting** | Rewrite SQL queries in transit |
| **Active monitoring** | Backend outage detection, automatic failover |
## Recommendations — where MySQL is better
| Area | MySQL | Competition | Why MySQL |
|------|-------|------------|-----------|
| **Web applications** | De facto standard for WP, Drupal, Magento | PostgreSQL (fewer CMS plugins) | Broadest support in web hosting providers |
| **Read-heavy (SELECT heavy)** | InnoDB buffer pool, covering index, adaptive hash | PostgreSQL (MVCC overhead on reads) | Cache-efficient, fast point lookups |
| **Replication** | Async replication, Group Replication, InnoDB Cluster | PostgreSQL (streaming replication) | Simpler setup, extensive documentation |
| **Ecosystem** | ProxySQL, Orchestrator, Vitess, PlanetScale | PostgreSQL (fewer tools) | Most tooling for cluster management |
| **JSON in MySQL 8+** | JSON data type, Multi-Value Indexes | PostgreSQL (jsonb, GIN) | Comparable, Multi-Value Index unique |
### When to use MySQL / MariaDB
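- **CMS / e-commerce** — WordPress, Drupal, Magento, Joomla (WordPress and Magento require MySQL/MariaDB; Drupal and Joomla also support other databases but are most commonly deployed on it)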
- **Read-heavy applications** — InnoDB buffer pool efficiently caches frequently read data
- **Simple replication** — Group Replication / InnoDB Cluster for HA
- **MariaDB for Galera cluster** — synchronous multi-master clustering
- **PHP applications** — native PHP MySQL extensions (mysqli, PDO_MySQL)
## MySQL / MariaDB licensing
### MySQL licensing
| Variant | License | Price | Restrictions |
|---------|---------|-------|-------------|
| **MySQL Community (GPL)** | GPL v2 | $0 | If you distribute an application that contains MySQL (e.g., embedded), you must release the entire application under GPL. Web applications (over network) ≠ distribution — GPL does not apply |
| **MySQL Standard (Commercial)** | Commercial (Oracle) | ~$2,000/server/year | No GPL restrictions, production support, MySQL Enterprise Monitor |
| **MySQL Enterprise** | Commercial (Oracle) | ~$5,000/server/year | All above + MySQL Enterprise Backup, Audit, Firewall, Thread Pool, Encryption |
| **MySQL Cluster CGE** | Commercial (Oracle) | ~$10,000/server/year | Distributed multi-master cluster (NDB), telco-grade |
**When GPL matters**: If you embed MySQL into a commercial product (e.g., desktop application with MySQL library). Web applications communicating over TCP/IP are **not** distribution — GPL does not apply.
### MariaDB licensing
| Variant | License | Price | Restrictions |
|---------|---------|-------|-------------|
| **MariaDB Community** | GPL v2 | $0 | Same as MySQL Community — GPL, but without Oracle licensing risks |
| **MariaDB Enterprise** | Business Source License (BSL) | Subscription (~$2-5k/server/year) | Automatically converts to GPL v2 after 3 years. Includes enterprise features (ColumnStore, Spider, Xpand) |
| **MariaDB SkySQL** | Managed (BSL) | Pay-per-use (~$0.10-1.00/hour) | Fully managed DBaaS |
**Key difference from Oracle MySQL**:
- MariaDB is an independent fork, not controlled by Oracle
- BSL model is more liberal — becomes open source after 3 years
- MariaDB Community includes some features that MySQL reserves for its Enterprise edition (for example thread pooling, see the comparison table above)
### When to use something else
- **Complex queries / CTE / window functions** → PostgreSQL (more advanced optimizer)
- **GIS / geospatial data** → PostgreSQL + PostGIS
- **Consistency > speed** → PostgreSQL (SSI serializable)
- **High-throughput writes** → Cassandra (MySQL master bottleneck)
- **Distributed SQL cluster** → CockroachDB, Vitess (MySQL compatible sharding)
## Sources
References, books, and standards: [sources/databases/sources.en.md](sources/databases/sources.en.md)
### Recommended reading
| Book | Authors | ISBN | Description |
|------|---------|------|-------------|
| High Performance MySQL (3rd ed.) | Schwartz, Zaitsev, Tkachenko | 978-1449314286 | Comprehensive guide to MySQL architecture, optimization, and monitoring |
*Last revision: 2026-06-03*

142
MYSQL.md Normal file
View File

@@ -0,0 +1,142 @@
# 🐬 MySQL & MariaDB
## Overview
MySQL is the most widespread open-source relational database, especially in web environments (LAMP stack). MariaDB is a fork created after Oracle acquired MySQL (through Sun Microsystems); it remains largely compatible and adds its own extensions. Default choice for WordPress, Drupal, Magento, and most PHP applications.
## Architecture (server + storage engine)
Based on *High Performance MySQL* (Schwartz, Zaitsev, Tkachenko):
```text
MySQL Server Layer
├── Connection handling (thread-per-connection)
├── Query parser & optimizer
├── Built-in functions
└── Storage Engine API
    ├── InnoDB (default, MVCC, ACID)
    ├── MyISAM (legacy, table-level locks)
    ├── MEMORY (in-memory, HEAP)
    └── ... (others)
```
### InnoDB (default engine since MySQL 5.5)
- **MVCC** — Multi-Version Concurrency Control (snapshot isolation)
- **REPEATABLE READ** (default) — next-key locking prevents phantom reads
- **Clustered index** — primary key = physical data ordering
- **Buffer pool** — cache of data and indexes in RAM (main performance parameter)
- **Doublewrite buffer** — prevents partial page writes
### Schema design tips
- Prefer smaller data types where the value range allows (MEDIUMINT over INT, TIMESTAMP over DATETIME; note that TIMESTAMP only covers dates up to January 2038)
- Use NULL carefully (each nullable column makes indexes and comparisons more complex)
- Use ENUM only for truly small, stable value lists
- JSON columns in MySQL 8+ — useful for flexible schema, but not for joins
### Deferred join pattern
```sql
-- Assumes an index on (status, created_at); InnoDB appends the PK (id) to it.
-- 1. the covering index finds the 100 matching PKs without touching full rows
-- 2. only then join back to fetch the full rows
SELECT * FROM users
INNER JOIN (
    SELECT id FROM users
    WHERE status = 'active'
    ORDER BY created_at DESC
    LIMIT 100 OFFSET 1000
) AS tmp USING (id)
-- The derived table's order is not guaranteed to survive the join.
ORDER BY created_at DESC;
```
**Join decomposition**: Sometimes it's better to split a JOIN into several simple queries (better cache utilization, fewer locks, scaling across servers).
**IN() optimization**: MySQL sorts values in the IN() list and uses binary search (O(log n)), unlike OR clauses (O(n)).
## MariaDB differences from MySQL
| Feature | MySQL 8.x | MariaDB 11.x |
|-----------|-----------|--------------|
| **Storage engine** | InnoDB (default), plus MyISAM, MEMORY, NDB (MySQL Cluster) | InnoDB (default since 10.2; XtraDB before that) + Aria + MyRocks + ColumnStore |
| **JSON** | Native JSON type | JSON alias to LONGTEXT + JSON functions |
| **CTE** | WITH (non-recursive + recursive) | WITH (non-recursive + recursive) |
| **Window functions** | Yes (8.0+) | Yes (10.2+) |
| **Sequence** | No (auto_increment only) | Yes (CREATE SEQUENCE) |
| **Thread pooling** | Enterprise only | Built-in |
| **Galera cluster** | No (natively) | Yes (native synchronous clustering) |
## ProxySQL
ProxySQL is an advanced proxy for MySQL with sophisticated routing:
| Feature | Description |
|-----------|-------|
| **Query routing** | Rules for directing queries (read/write split, sharding) |
| **Connection pooling** | Multiplexing thousands of connections into a small pool |
| **Query cache** | Result caching in memory (TTL, size limit) |
| **Query rewriting** | Rewrite SQL queries in transit |
| **Active monitoring** | Backend outage detection, automatic failover |
## Recommendations — where MySQL is better
| Area | MySQL | Competition | Why MySQL |
|--------|-------|------------|------------|
| **Web applications** | De facto standard for WP, Drupal, Magento | PostgreSQL (fewer CMS plugins) | Broadest support in web hosting providers |
| **Read-heavy (SELECT heavy)** | InnoDB buffer pool, covering index, adaptive hash | PostgreSQL (MVCC overhead on reads) | Cache-efficient, fast point lookups |
| **Replication** | Async replication, Group Replication, InnoDB Cluster | PostgreSQL (streaming replication) | Simpler setup, extensive documentation |
| **Ecosystem** | ProxySQL, Orchestrator, Vitess, PlanetScale | PostgreSQL (fewer tools) | Most tooling for cluster management |
| **JSON in MySQL 8+** | JSON data type, Multi-Value Indexes | PostgreSQL (jsonb, GIN) | Comparable, Multi-Value Index unique |
### When to use MySQL / MariaDB
- **CMS / e-commerce** — WordPress, Drupal, Magento, Joomla (WordPress and Magento require MySQL/MariaDB; Drupal and Joomla also support other databases but are most commonly deployed on it)
- **Read-heavy applications** — InnoDB buffer pool efficiently caches frequently read data
- **Simple replication** — Group Replication / InnoDB Cluster for HA
- **MariaDB for Galera cluster** — synchronous multi-master clustering
- **PHP applications** — native PHP MySQL extensions (mysqli, PDO_MySQL)
## MySQL / MariaDB licensing
### MySQL licensing
| Variant | License | Price | Restrictions |
|----------|---------|------|---------|
| **MySQL Community (GPL)** | GPL v2 | $0 | If you distribute an application that contains MySQL (e.g., embedded), you must release the entire application under GPL. Web applications (over network) ≠ distribution — GPL does not apply |
| **MySQL Standard (Commercial)** | Commercial (Oracle) | ~$2,000/server/year | No GPL restrictions, production support, MySQL Enterprise Monitor |
| **MySQL Enterprise** | Commercial (Oracle) | ~$5,000/server/year | All above + MySQL Enterprise Backup, Audit, Firewall, Thread Pool, Encryption |
| **MySQL Cluster CGE** | Commercial (Oracle) | ~$10,000/server/year | Distributed multi-master cluster (NDB), telco-grade |
**When GPL matters**: If you embed MySQL into a commercial product (e.g., desktop application with MySQL library). Web applications communicating over TCP/IP are **not** distribution — GPL does not apply.
### MariaDB licensing
| Variant | License | Price | Restrictions |
|----------|---------|------|---------|
| **MariaDB Community** | GPL v2 | $0 | Same as MySQL Community — GPL, but without Oracle licensing risks |
| **MariaDB Enterprise** | Business Source License (BSL) | Subscription (~$2-5k/server/year) | Automatically converts to GPL v2 after 3 years. Includes enterprise features (ColumnStore, Spider, Xpand) |
| **MariaDB SkySQL** | Managed (BSL) | Pay-per-use (~$0.10-1.00/hour) | Fully managed DBaaS |
**Key difference from Oracle MySQL**:
- MariaDB is an independent fork, not controlled by Oracle
- BSL model is more liberal — becomes open source after 3 years
- MariaDB Community includes some features that MySQL reserves for its Enterprise edition (for example thread pooling, see the comparison table above)
### When to use something else
- **Complex queries / CTE / window functions** → PostgreSQL (more advanced optimizer)
- **GIS / geospatial data** → PostgreSQL + PostGIS
- **Consistency > speed** → PostgreSQL (SSI serializable)
- **High-throughput writes** → Cassandra (MySQL master bottleneck)
- **Distributed SQL cluster** → CockroachDB, Vitess (MySQL compatible sharding)
## Sources
References, books, and standards: [sources/databases/sources.md](sources/databases/sources.md)
### Recommended reading
| Book | Authors | ISBN | Description |
|-------|--------|------|-------|
| High Performance MySQL (3rd ed.) | Schwartz, Zaitsev, Tkachenko | 978-1449314286 | Comprehensive guide to MySQL architecture, optimization, and monitoring |
*Last revision: 2026-06-03*

635
NETWORKING.en.md Normal file
View File

@@ -0,0 +1,635 @@
# 🌐 Network Architecture
## Reference Model (TCP/IP)
| Layer | Protocols | Devices |
|-------|-----------|---------|
| Application | HTTP/HTTPS, DNS, SMTP, SSH | — |
| Transport | TCP, UDP | — |
| Network | IP, ICMP, BGP | Routers |
| Link | Ethernet, ARP, VLAN | Switches |
## TCP Detail
### 3-way Handshake
```
Client                           Server
  |                               |
  |  SYN (seq=x)                  |
  |──────────────────────────────>|
  |                               |
  |  SYN+ACK (seq=y, ack=x+1)     |
  |<──────────────────────────────|
  |                               |
  |  ACK (seq=x+1, ack=y+1)       |
  |──────────────────────────────>|
  |                               |
  |       << established >>       |
```
- **SYN** — client sends segment with SYN flag (Synchronize Sequence Number)
- **SYN-ACK** — server responds with its own SYN + acknowledgment of client's seq number
- **ACK** — client acknowledges, connection is established
- TCP Fast Open (TFO) — data in SYN packet for 0-RTT on repeated connections
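The sequence and acknowledgment numbers in the handshake follow a simple rule: each side acknowledges the peer's initial sequence number plus one. The sketch below only models that arithmetic (modulo 2^32); it opens no sockets, and the dictionary layout is an illustrative assumption.

```python
from typing import Dict, List

SEQ_MOD = 2 ** 32  # TCP sequence numbers are 32-bit and wrap around


def three_way_handshake(client_isn: int, server_isn: int) -> List[Dict[str, object]]:
    """Return the three segments exchanged when a connection opens."""
    syn = {"flags": "SYN", "seq": client_isn}
    syn_ack = {
        "flags": "SYN+ACK",
        "seq": server_isn,
        "ack": (client_isn + 1) % SEQ_MOD,  # SYN consumes one sequence number
    }
    ack = {
        "flags": "ACK",
        "seq": (client_isn + 1) % SEQ_MOD,
        "ack": (server_isn + 1) % SEQ_MOD,
    }
    return [syn, syn_ack, ack]


if __name__ == "__main__":
    for segment in three_way_handshake(client_isn=1000, server_isn=5000):
        print(segment)
```

Real stacks pick the initial sequence numbers unpredictably; the fixed values here just keep the example readable.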
### Flow Control (Sliding Window)
- **Receiver Window (rwnd)** — how much data the receiver is willing to accept
- **Sliding Window** — sender maintains a window of unacknowledged packets
- Window scaling (RFC 7323, originally RFC 1323) — allows windows up to about 1 GiB (instead of 64 KiB)
- Zero Window — receiver advertises 0, sender stops, persist timer periodically tests
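The window-scaling limit falls out of simple arithmetic: the 16-bit window field is shifted left by a scale factor of at most 14. The sketch below also estimates the bandwidth-delay product to show when scaling is needed; the helper names are illustrative.

```python
MAX_UNSCALED_WINDOW = 65535  # 16-bit window field
MAX_SHIFT = 14               # largest scale factor allowed by the option


def max_window(shift: int) -> int:
    """Largest advertisable window in bytes for a given scale factor."""
    if not 0 <= shift <= MAX_SHIFT:
        raise ValueError("shift must be between 0 and 14")
    return MAX_UNSCALED_WINDOW << shift


def bdp_bytes(bandwidth_bps: int, rtt_ms: int) -> int:
    """Bandwidth-delay product: bytes in flight needed to fill the link."""
    return bandwidth_bps * rtt_ms // 8000


def min_shift_for(window_bytes: int) -> int:
    """Smallest scale factor whose maximum window covers window_bytes."""
    for shift in range(MAX_SHIFT + 1):
        if max_window(shift) >= window_bytes:
            return shift
    raise ValueError("window exceeds what TCP window scaling can express")


if __name__ == "__main__":
    need = bdp_bytes(1_000_000_000, 100)  # 1 Gbit/s link, 100 ms RTT
    print("bytes in flight needed:", need)
    print("minimum scale factor:", min_shift_for(need))
    print("largest possible window:", max_window(MAX_SHIFT))
```

Without scaling, a single connection on that link could keep at most 64 KiB in flight, a small fraction of what the path can carry.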
### Congestion Control
| Algorithm | Description | Use Case |
|-----------|-------------|----------|
| **Cubic** | Default in Linux (since kernel 2.6.19), cubic growth function | General networks, default for the Internet |
| **BBR** (Bottleneck Bandwidth and RTT) | Model-based, measures bandwidth and RTT, not packet loss | High-speed networks, YouTube, Google |
| **Reno** | Classic AIMD (Additive Increase Multiplicative Decrease) | Legacy, reference |
| **CDG** (CAIA Delay Gradient) | Delay-based, congestion detection by RTT gradient | Video streaming, real-time |
BBRv2 (2024+) — includes ECN signaling, coexistence with Cubic, better loss handling.
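The AIMD behavior listed for Reno can be simulated in a few lines. This is a toy model under stated assumptions: one window update per round trip, losses given as a set of round numbers, and no slow start or fast recovery. The function name and parameters are illustrative.

```python
from typing import List, Set


def aimd(cwnd: float, loss_rounds: Set[int], rounds: int,
         increase: float = 1.0, decrease: float = 0.5) -> List[float]:
    """Congestion window after each round trip under AIMD."""
    history = []
    for rnd in range(rounds):
        if rnd in loss_rounds:
            cwnd = max(1.0, cwnd * decrease)  # multiplicative decrease on loss
        else:
            cwnd += increase                  # additive increase per RTT
        history.append(cwnd)
    return history


if __name__ == "__main__":
    print(aimd(cwnd=10.0, loss_rounds={3}, rounds=6))
```

Plotting the history gives the familiar sawtooth; loss-based algorithms such as Cubic change the shape of the growth phase, while BBR avoids using loss as the primary signal.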
## DNS (Domain Name System)
- **Record types**: A, AAAA, CNAME, MX, TXT, NS, SRV, PTR, CAA, DS, DNSKEY, RRSIG, NSEC
- **DNS resolver**: recursive query through hierarchy (root → TLD → authoritative)
- **Anycast DNS** — same IP from multiple locations, routing to the nearest
- **DNS caching** — TTL control, cache poisoning protection (DNSSEC)
- **Cloud DNS** — Route53, Azure DNS, Cloud DNS
### DNS Lookup Flow (Step by Step)
```
1. User enters "api.example.com" in the browser
2. OS stub resolver checks /etc/hosts and the local cache (e.g. systemd-resolved)
3. If not in cache → query to recursive resolver (ISP / 8.8.8.8 / 1.1.1.1)
4. Resolver checks its cache
5. Not found → resolver walks the hierarchy with iterative queries:
a. Query to root nameserver (.) → returns NS for .com
b. Query to .com TLD nameserver → returns NS for example.com
c. Query to authoritative NS for example.com → returns A record (IP)
6. Resolver stores in cache (TTL), returns IP to client
7. Client establishes TCP connection to the obtained IP
```
The whole process typically takes 10-200 ms (with cache < 1 ms).
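The lookup flow above can be sketched as a toy resolver with a TTL cache. Everything here is an assumption made for illustration: the delegation tables stand in for real name servers, the names `ROOT_ZONE`, `NAME_SERVERS` and `ToyResolver` are hypothetical, and no DNS packets are sent.

```python
import time

# Hypothetical delegation data standing in for real name servers.
ROOT_ZONE = {"com.": "tld-com"}                              # root: who serves .com
NAME_SERVERS = {
    "tld-com": {"example.com.": "ns-example"},               # TLD: NS for the zone
    "ns-example": {"api.example.com.": ("192.0.2.10", 300)},  # (A record, TTL in s)
}


class ToyResolver:
    def __init__(self):
        self.cache = {}              # name -> (ip, expiry timestamp)
        self.upstream_queries = 0    # queries sent to root / TLD / authoritative

    def resolve(self, name, now=None):
        now = time.time() if now is None else now
        hit = self.cache.get(name)
        if hit and hit[1] > now:     # step 4: answer from cache while TTL is valid
            return hit[0]
        labels = name.rstrip(".").split(".")
        tld = labels[-1] + "."
        zone = ".".join(labels[-2:]) + "."
        tld_server = ROOT_ZONE[tld]                    # step 5a
        auth_server = NAME_SERVERS[tld_server][zone]   # step 5b
        ip, ttl = NAME_SERVERS[auth_server][name]      # step 5c
        self.upstream_queries += 3
        self.cache[name] = (ip, now + ttl)             # step 6
        return ip


resolver = ToyResolver()
print(resolver.resolve("api.example.com.", now=0), resolver.upstream_queries)
print(resolver.resolve("api.example.com.", now=100), resolver.upstream_queries)
```

The second call is served from the cache, which is why cached lookups are so much faster than a full walk of the hierarchy.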
### DNSSEC Detail
- **RRSIG** — digital signature for each RRset (Resource Record Set)
- **DNSKEY** — zone public key (ZSK = Zone Signing Key, KSK = Key Signing Key)
- **DS** (Delegation Signer) — DNSKEY hash passed to the parent zone (chain of trust)
- **NSEC / NSEC3** — authenticated denial of existence (proof that a record does not exist)
- **Chain of trust**: root → .com → example.com (path from trust anchor through DS records)
```
Root DNSKEY (trust anchor) → DS(.com) → .com DNSKEY → DS(example.com) → example.com DNSKEY → RRSIG(A)
```
- Validation: resolver checks signatures across the entire chain up to the trust anchor
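To make the DS/DNSKEY link concrete, the sketch below computes a SHA-256 DS digest as described in RFC 4034 (hash over the owner name in wire format followed by the DNSKEY RDATA). The key bytes are a made-up placeholder, not a real key, and the function names are assumptions.

```python
import hashlib
import struct


def name_to_wire(name):
    """Encode a domain name in canonical (lowercase) DNS wire format."""
    wire = b""
    for label in name.rstrip(".").split("."):
        if label:
            raw = label.lower().encode("ascii")
            wire += bytes([len(raw)]) + raw
    return wire + b"\x00"   # root label terminates the name


def ds_digest_sha256(owner, flags, protocol, algorithm, public_key):
    """Digest that a parent zone would publish in the DS record (digest type 2)."""
    rdata = struct.pack("!HBB", flags, protocol, algorithm) + public_key
    return hashlib.sha256(name_to_wire(owner) + rdata).hexdigest()


placeholder_key = bytes(range(32))  # NOT a real key, illustration only
print(ds_digest_sha256("example.com.", flags=257, protocol=3,
                       algorithm=13, public_key=placeholder_key))
```

A validating resolver repeats this computation on the child's DNSKEY and compares it with the DS record signed by the parent; a mismatch breaks the chain of trust.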
### DNS-based Service Discovery
| Mechanism | Description | Example |
|-----------|-------------|---------|
| **SRV record** | Service location (priority, weight, port, target) | `_http._tcp.example.com` |
| **Consul DNS** | Service discovery via DNS interface | `web.service.consul` |
| **CoreDNS** | Kubernetes DNS, plugin-based | `my-svc.my-namespace.svc.cluster.local` |
| **Kubernetes DNS** | Service discovery inside the cluster (kube-dns / CoreDNS) | `svc.cluster.local` |
| **mDNS** (Multicast DNS) | Zero-config, local network (Bonjour/Avahi) | `myprinter.local` |
## Load Balancing
| Type | OSI Layer | Description |
|------|-----------|-------------|
| L4 (NLB) | 4 | TCP/UDP, fast, lower latency |
| L7 (ALB) | 7 | HTTP/HTTPS, path-based routing, sticky sessions |
| Global | DNS | Geo-routing, latency-based, weighted |
### Algorithms
- Round Robin / Weighted RR
- Least Connections
- IP Hash (session persistence)
- Random
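A minimal sketch of three of the algorithms above, assuming targets are plain strings. The class and function names are illustrative, and real load balancers add health state, weights, and connection tracking on top of this.

```python
import hashlib
import itertools


class RoundRobin:
    def __init__(self, targets):
        self._cycle = itertools.cycle(targets)

    def pick(self):
        return next(self._cycle)


class LeastConnections:
    def __init__(self, targets):
        self.active = {t: 0 for t in targets}   # target -> open connections

    def pick(self):
        target = min(self.active, key=self.active.get)
        self.active[target] += 1
        return target

    def release(self, target):
        self.active[target] -= 1


def ip_hash(client_ip, targets):
    """Map a client IP to a stable target (simple session persistence)."""
    digest = hashlib.sha256(client_ip.encode()).digest()
    return targets[int.from_bytes(digest[:8], "big") % len(targets)]


targets = ["app-1", "app-2", "app-3"]
rr = RoundRobin(targets)
print([rr.pick() for _ in range(4)])
print(ip_hash("203.0.113.7", targets) == ip_hash("203.0.113.7", targets))
```

Note that the modulo-based IP hash remaps most clients when the target list changes; consistent hashing is the usual remedy, but it is outside this sketch.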
### Health Check Types
| Type | Description | Suitable For |
|------|-------------|--------------|
| **TCP health check** | TCP handshake to target port | L4 NLB, basic check |
| **HTTP health check** | HTTP GET to URL, expects 200 OK | L7 ALB, web services |
| **HTTPS health check** | HTTP + TLS handshake | Services with TLS termination |
| **gRPC health check** | `grpc.health.v1.Health/Check` RPC (gRPC health checking protocol) | Microservices, gRPC services |
| **ICMP ping** | Ping to target IP | Basic connectivity |
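The simplest entry in the table, the TCP health check, can be sketched with the standard library. The function name and the default timeout are assumptions; a production check would also track consecutive successes and failures before changing target state.

```python
import socket


def tcp_health_check(host, port, timeout=2.0):
    """Return True if a TCP handshake to host:port completes within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:   # refused, unreachable, or timed out
        return False
```

A usage example would be `tcp_health_check("10.0.1.15", 5432)`, where the address is a hypothetical backend; the check only proves that the port accepts connections, not that the application behind it is healthy.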
### Connection Draining
- **Connection draining** (AWS) / **Deregistration delay** — when a target is removed from the ASG/LB, the load balancer stops sending it new requests and waits for existing connections to finish (configurable: 0-3600 s, default 300 s)
- **Slow start** — new target gradually receives more requests (prevents cold cache overload)
### Cross-zone Load Balancing
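- **Enabled**: each LB node distributes traffic across the registered targets in all enabled AZs, so every target receives an equal share even with an uneven instance count
- **Disabled**: traffic is split evenly between AZs first, then each LB node distributes only among the instances in its own AZ
- AWS: on ALB cross-zone is enabled by default with no inter-AZ data charge; on NLB it is disabled by default and inter-AZ data transfer is billed once enabled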
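A short worked example shows why the setting matters with an uneven instance count. The function is a sketch that assumes DNS spreads clients evenly across the AZs; the name and the input layout are illustrative.

```python
def per_target_share(targets_per_az, cross_zone):
    """Fraction of total traffic each single target receives, keyed by AZ."""
    if cross_zone:
        total = sum(targets_per_az.values())
        return {az: 1 / total for az in targets_per_az}
    az_share = 1 / len(targets_per_az)   # each AZ gets an equal slice first
    return {az: az_share / count for az, count in targets_per_az.items()}


layout = {"az-a": 2, "az-b": 8}
print(per_target_share(layout, cross_zone=True))
print(per_target_share(layout, cross_zone=False))
```

With cross-zone disabled, each of the two targets in `az-a` carries four times the load of a target in `az-b`, which is the imbalance the feature is meant to remove.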
## Firewalls and Security
- **Stateful firewall** — tracks connection state (AWS Security Groups, Azure NSG)
- **Stateless firewall** — ACL (Network ACLs)
- **NGFW** — application layer, IPS/IDS (Palo Alto, Fortinet)
- **WAF** — web application protection (Cloudflare, AWS WAF, Azure WAF)
## Network Segmentation — Security Groups vs Network ACLs
| Property | Security Group (SG) | Network ACL (NACL) |
|----------|---------------------|---------------------|
| **State** | Stateful (automatically allows return traffic) | Stateless (explicit rule required for both directions) |
| **Level** | Instance / ENI | Subnet |
| **Rules** | Allow only | Allow and deny |
| **Evaluation** | All rules evaluated (OR) | Rules from lowest number (first match) |
| **Default** | All traffic denied (inbound), all traffic allowed (outbound) | Default NACL: all traffic allowed (inbound and outbound); custom NACL: all traffic denied until rules are added |
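| **Support** | AWS; closest equivalents are GCP firewall rules and Azure NSG (both stateful) | AWS; GCP and Azure have no separate stateless subnet ACL (Azure NSGs attach to a subnet or NIC, GCP firewall rules apply per VPC network) |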
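The difference in rule evaluation can be sketched as follows. This is a simplified model that matches on destination port only; the `Rule` type and both function names are assumptions made for illustration, not a cloud provider API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    number: int   # evaluation order, only meaningful for NACLs
    port: int
    action: str   # "allow" or "deny"


def security_group_allows(rules, port):
    """All rules are considered; any matching allow rule admits the traffic."""
    return any(r.port == port and r.action == "allow" for r in rules)


def nacl_allows(rules, port):
    """Rules are checked in ascending number order; the first match decides."""
    for rule in sorted(rules, key=lambda r: r.number):
        if rule.port == port:
            return rule.action == "allow"
    return False   # the implicit final rule denies everything else


nacl = [Rule(200, 22, "allow"), Rule(100, 22, "deny"), Rule(300, 443, "allow")]
print(nacl_allows(nacl, 22), nacl_allows(nacl, 443))
```

In the NACL example rule 100 wins over rule 200 for port 22, which is why rule numbering deserves the same care as the rules themselves.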
### Micro-segmentation
- **Zero Trust networking** — each workload has its own security group / NGFW policy
- **Service mesh** — Istio, Linkerd, Consul Connect for L7 micro-segmentation (mTLS, authorization policies)
- **Network policies** — Kubernetes NetworkPolicy for pod-to-pod traffic segmentation
- **Tanzu / NSX** — micro-segmentation at hypervisor level
## VPN
- **Site-to-Site** — IPsec, permanent connection between sites
- **Client-to-Site** — OpenVPN, WireGuard, AnyConnect
- **Cloud VPN** — AWS VPN, Azure VPN Gateway, GCP Cloud VPN
## CDN (Content Delivery Network)
- Edge locations for caching static content
- DDoS protection
- SSL/TLS termination at edge
- Providers: CloudFront, Cloudflare, Akamai, Fastly
## BGP and Routing
- **BGP** — protocol for exchanging routes between AS (Autonomous Systems)
- **ASN** — Autonomous System Number, the unique identifier of an AS (16-bit or 32-bit)
- **iBGP** — internal BGP (within AS)
- **eBGP** — external BGP (between AS)
### BGP Path Selection Algorithm
A BGP router selects a single best path per prefix. A path is only considered if its next hop is reachable; the remaining candidates are compared in the following order (simplified Cisco ordering, other vendors differ in details):
1. **WEIGHT** (Cisco-specific) — highest weight (local to router)
2. **LOCAL_PREF** — highest local preference (within AS)
3. **Originate** — prefers route originated by local router
4. **AS_PATH** — shortest AS_PATH length
5. **ORIGIN** — lowest origin type (IGP < EGP < INCOMPLETE)
6. **MED** (Multi-Exit Discriminator) — lowest MED (compared between paths from the same neighbor AS)
7. **eBGP > iBGP** — prefers external BGP over internal
8. **IGP metric** — lowest IGP cost to the BGP next hop
9. **Router ID** — prefers path from the neighbor with the lowest Router ID
10. **Neighbor IP** — prefers path from the neighbor with the lowest IP address
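The ordered comparison lends itself to a tuple sort key. The sketch below covers only a subset of the criteria (weight, local preference, local origination, AS path length, origin, MED, eBGP over iBGP, and Router ID as the final tie-breaker). It also compares MED across all paths, which real routers do only for the same neighbor AS unless configured otherwise. The `Path` type and its field names are assumptions.

```python
from dataclasses import dataclass

ORIGIN_RANK = {"IGP": 0, "EGP": 1, "INCOMPLETE": 2}


@dataclass
class Path:
    name: str
    weight: int = 0
    local_pref: int = 100
    locally_originated: bool = False
    as_path: tuple = ()
    origin: str = "IGP"
    med: int = 0
    ebgp: bool = True
    router_id: str = "0.0.0.0"


def preference_key(p):
    """Smaller tuple means more preferred; fields follow the priority order."""
    return (
        -p.weight,                      # highest weight
        -p.local_pref,                  # highest local preference
        not p.locally_originated,       # locally originated first
        len(p.as_path),                 # shortest AS path
        ORIGIN_RANK[p.origin],          # IGP < EGP < INCOMPLETE
        p.med,                          # lowest MED (simplified, see lead-in)
        not p.ebgp,                     # eBGP over iBGP
        tuple(int(o) for o in p.router_id.split(".")),  # lowest Router ID
    )


def best_path(paths):
    return min(paths, key=preference_key)


a = Path("via-isp-a", local_pref=200, as_path=(65001, 65002, 65003))
b = Path("via-isp-b", local_pref=100, as_path=(65001,))
print(best_path([a, b]).name)
```

The example illustrates that a higher LOCAL_PREF wins even against a much shorter AS path, because it is evaluated earlier.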
### iBGP Full Mesh vs Route Reflectors
| Aspect | Full Mesh | Route Reflectors |
|--------|-----------|------------------|
| **Number of sessions** | n(n-1)/2 | n-1 (each client to one RR) |
| **With 100 routers** | 4,950 sessions | 99 (with 1 RR) |
| **Scaling** | Poor (quadratic) | Linear |
| **Redundancy** | Natural | Requires multi-RR + cluster |
| **Configuration** | Simple logic | RR rules (non-transitive) |
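The session counts in the table follow from simple arithmetic. In this sketch the route reflectors are assumed to be fully meshed among themselves and every client peers with every reflector; the function names are illustrative.

```python
def full_mesh_sessions(n):
    """iBGP sessions when every router peers with every other router."""
    return n * (n - 1) // 2


def route_reflector_sessions(n, reflectors=1):
    """Sessions when n routers include `reflectors` RRs serving the rest."""
    clients = n - reflectors
    return clients * reflectors + full_mesh_sessions(reflectors)


print(full_mesh_sessions(100))             # 4950
print(route_reflector_sessions(100, 1))    # 99
print(route_reflector_sessions(100, 2))    # 197
```

Adding a second reflector for redundancy roughly doubles the session count but keeps growth linear in the number of routers.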
BGP knowledge is essential for: cloud interconnects, MPLS L3VPN, SD-WAN, and data center fabrics (VXLAN + BGP EVPN).
## VPC / Virtual Network Architecture
```
Internet ──┬── Internet Gateway (IGW)
 ┌─────────▼─────────┐
 │   Public Subnet   │
 │   ┌───────────┐   │
 │   │  ALB/NAT  │   │
 │   └─────┬─────┘   │
 └─────────┼─────────┘
 ┌─────────▼─────────┐
 │  Private Subnet   │
 │   ┌───────────┐   │
 │   │    App    │   │
 │   └─────┬─────┘   │
 └─────────┼─────────┘
 ┌─────────▼─────────┐
 │    Data Subnet    │
 │   ┌───────────┐   │
 │   │ Database  │   │
 │   └───────────┘   │
 └───────────────────┘
```
### VPC Design Patterns
**Three-tier architecture**
- Web tier (public subnets) → ALB
- App tier (private subnets) → auto-scaling
- Data tier (private subnets) → RDS / self-managed DB
- NAT Gateway / Instance in public subnet for outbound traffic from app/data tier
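A three-tier layout like this usually starts with an address plan. The sketch below carves one subnet per tier and availability zone out of a VPC range using the standard `ipaddress` module. The CIDR, tier names, AZ names, and the `/20` prefix are assumptions chosen for the example.

```python
import ipaddress


def plan_subnets(vpc_cidr, tiers, azs, new_prefix=20):
    """Assign consecutive subnets of size new_prefix to every (tier, AZ) pair."""
    vpc = ipaddress.ip_network(vpc_cidr)
    available = vpc.subnets(new_prefix=new_prefix)
    plan = {}
    for tier in tiers:
        for az in azs:
            plan[(tier, az)] = next(available)  # StopIteration if the VPC is too small
    return plan


def overlaps_any(candidate_cidr, existing_cidrs):
    """True if the candidate range collides with any existing range."""
    candidate = ipaddress.ip_network(candidate_cidr)
    return any(candidate.overlaps(ipaddress.ip_network(c)) for c in existing_cidrs)


plan = plan_subnets("10.0.0.0/16", ["web", "app", "data"], ["az-a", "az-b"])
for key, subnet in plan.items():
    print(key, subnet)
print(overlaps_any("10.0.128.0/17", ["10.0.0.0/16"]))
```

The overlap helper is useful before peering or attaching networks to a transit hub, since overlapping ranges cannot be routed between them.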
**VPC Peering**
- Direct connection between two VPCs (same or cross-account)
- Transitive peering is **not** supported (A→B, B→C does not imply A→C)
- Use cases: sharing resources (LDAP, monitoring), service endpoints
**Transit Gateway**
- Hub-and-spoke topology, transitive routing
- Supports: VPC, VPN, Direct Connect, peering between TGWs
- Route tables per attachment — environment isolation
- AWS TGW: 50 Gbps per attachment, up to 5000 attachments
**PrivateLink / VPC Endpoint**
- Private access to services without IGW, NAT, VPC peering
- **Interface Endpoint** (ENI in subnet) — for AWS services, SaaS
- **Gateway Endpoint** (S3, DynamoDB) — route table entry, free
- **AWS PrivateLink** — Service Consumer ↔ NLB/ENI ↔ Service Provider
## MTU, Jumbo Frames, PMTUD
| Network | Standard MTU | Jumbo Frames |
|---------|--------------|--------------|
| Ethernet | 1500 B | 9001 B (AWS: 9001, Azure: 1400→9000) |
| GRE tunnel | 1476 B | — |
| PPPoE | 1492 B | — |
| VLAN (802.1Q) | 1496 B | — |
| VXLAN | N/A (inner 1500 + 50) | 8950 B |
**PMTUD** (Path MTU Discovery)
- Sets DF (Don't Fragment) bit in IP header
- If path requires fragmentation → ICMP "Fragmentation Needed" (Type 3, Code 4)
- Decreases MTU until packet passes
- Common problem: ICMP blocked by firewall → black hole (TCP connection hangs)
- **Workaround**: MSS clamping (TCP MSS = MTU - 40 for IPv4, MTU - 60 for IPv6)
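The MSS and overlay numbers used in this section can be reproduced with a few lines of arithmetic. The header sizes assume no IP options, no IPv6 extension headers, and no TCP options; the function names are illustrative.

```python
IP_HEADER = {4: 20, 6: 40}   # bytes, without options / extension headers
TCP_HEADER = 20              # bytes, without options
VXLAN_OVERHEAD = 50          # inner Ethernet 14 + VXLAN 8 + UDP 8 + outer IPv4 20


def tcp_mss(mtu, ip_version=4):
    """Largest TCP payload that fits into one packet of the given MTU."""
    return mtu - IP_HEADER[ip_version] - TCP_HEADER


def underlay_mtu_for_vxlan(inner_mtu):
    """Underlay MTU needed to carry an inner packet without fragmentation."""
    return inner_mtu + VXLAN_OVERHEAD


print(tcp_mss(1500))                  # 1460, plain Ethernet over IPv4
print(tcp_mss(1500, ip_version=6))    # 1440
print(tcp_mss(1492))                  # 1452, PPPoE
print(underlay_mtu_for_vxlan(1500))   # 1550
```

The same arithmetic explains the 8950 B figure in the table: a 9000 B underlay minus 50 B of VXLAN overhead.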
**Jumbo Frames Use Cases**
- NFS / SMB (NAS)
- iSCSI / NVMe-oF (SAN)
- HPC / MPI workloads
- Data replication (DB, DRBD)
- Amazon EFS, Amazon MSK (Managed Streaming for Apache Kafka)
## Anycast vs Unicast vs Multicast
| Type | Description | Example |
|------|-------------|---------|
| **Unicast** | 1:1 — one source, one destination | Regular TCP/IP traffic |
| **Multicast** | 1:N — one source, group of receivers | IPTV, mDNS, VXLAN BUM traffic |
| **Anycast** | 1:1 from nearest — same IP from multiple locations | DNS (8.8.8.8, 1.1.1.1), Cloudflare |
| **Broadcast** | 1:ALL — all devices on the network | ARP, DHCP (limited to L2 broadcast domain) |
Anycast detail:
- Same IP prefix is announced from multiple locations (BGP)
- Traffic goes to the topologically nearest node (BGP path selection)
- **Advantages**: simple redundancy, DDoS absorption, lower latency
- **Challenges**: connection persistence (TCP), stateful anycast, routing convergence
- **Cloud**: Route53, CloudFront, Cloudflare, Google DNS
## Cloud Networking Resilience (2026)
See also: [CLOUD.en.md](CLOUD.en.md) — cloud architecture, multi-AZ, hybrid cloud connectivity.
### Cell-based Architectures
- Isolate fault domain into "cells" (group of AZ + services)
- Each cell independently deployable, own DB, own LB
- Limit blast radius: failure of one cell does not affect others
- Implementation: AWS Cell-based architecture, Azure STAG (Scale Tier Availability Group)
### DNS Resilience
- **Anycast DNS** — same IP from multiple regions, routing to the nearest
- **DNS failover** — health checks automatically remove unavailable endpoints
- **Multi-DNS provider** — Route53 + Cloudflare + UltraDNS to eliminate SPOF
### Traffic Engineering
- **BGP optimization** — AS path prepend, MED, local pref for controlling inbound/outbound traffic
- **Global Load Balancing** — GSLB at DNS level (latency-based, geo-proximity, weighted)
- **AIOps** — ML-based traffic pattern prediction and automatic scaling
### New Trends
- **Path Aware Networking** — applications choose the network path based on current conditions
- **Segment Routing (SR-MPLS / SRv6)** — MPLS simplification, programmable paths
- **Zero Trust Networking** — micro-segmentation, identity-based access, never trust / always verify
## Advanced Topics from Books
### TCP/IP Illustrated (Stevens, ISBN 978-0321336316)
Key architectural principles according to the book:
- **End-to-End Argument** — correctness and completeness of communication can only be ensured at the application level, not at lower layers. The network should be "dumb", end stations "smart".
- **Fate Sharing** — all state necessary to maintain active communication must be stored at the endpoints, not in the network.
- **Layering** — hierarchical layering of protocols per the OSI model; each layer encapsulates the PDU from the higher layer and adds its own header.
- **Multiplexing/Demultiplexing** — protocols at the same layer coexist thanks to identifiers (IP proto, TCP/UDP port).
- **Sliding Window** — efficient link utilization under high latency (window size = bandwidth × RTT).
The book covers the entire TCP/IP stack from the link layer (Ethernet, ARP, PPP) through IP, ICMP, DHCP, NAT, DNS, UDP, TCP (connection management, timeout, retransmission, congestion control, keepalive) to applications (SNMP, Telnet, FTP, SMTP, NFS, HTTP).
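The sliding-window relation quoted above (window size = bandwidth × RTT, the bandwidth-delay product) is easy to check numerically. The helper names and the sample link are assumptions made for the example.

```python
def bdp_bytes(bandwidth_bps, rtt_seconds):
    """Bytes that must be in flight to keep the link full."""
    return bandwidth_bps * rtt_seconds / 8


def max_throughput_bps(window_bytes, rtt_seconds):
    """Upper bound on throughput for a fixed window: one window per RTT."""
    return window_bytes * 8 / rtt_seconds


# A 1 Gbit/s path with 100 ms RTT needs about 12.5 MB in flight.
print(bdp_bytes(1e9, 0.100))
# An unscaled 64 KiB window on the same path caps out near 5.2 Mbit/s.
print(max_throughput_bps(65535, 0.100))
```

This is the practical reason window scaling exists: without it, long-distance high-bandwidth paths cannot be filled by a single TCP connection.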
### AI Data Center Network Design (Subramaniam, ISBN 978-0-13-543628-8)
Comprehensive, vendor-agnostic guide to designing network infrastructure for AI clusters.
**Key Concepts:**
- **Rail-Optimized Design (ROD)** — connecting GPUs across racks along "rails", each rail forms an independent network for all-reduce communication. Minimizes latency for synchronous training.
- **Rail-Unified Design (RUD)** — shared network fabric for all GPUs, more flexible but higher demands on load balancing.
- **RoCEv2 (RDMA over Converged Ethernet)** — primary transport for AI clusters: requires lossless fabric (ECN, PFC, DCQCN, SFC, CSIG).
- **Load Balancing for AI** — ECMP is insufficient, requires dynamic/global load balancing (DLB/GLB), flowlet-based rebalancing, per-packet spraying.
- **Topologies** — Clos (3-stage/5-stage), Dragonfly, Torus for scaling to tens of thousands of GPUs.
- **Ultra Ethernet Consortium (UEC)** — new standard for Ethernet in AI clusters (2025+), addresses RoCEv2 limitations.
- **Storage for AI** — NVMe-oF, GPUDirect Storage, parallel file systems for checkpointing and dataset loading.
- **KPIs** — Job Completion Time (JCT), tail latency, fabric utilization, PFC storm detection.
### Cloud Networking and Resilience (Critelli, ISBN 979-8868824357)
Practical guide to building resilient cloud networks (Apress, 2026). The author is EMEA Lead for Networking & Resilience at AWS.
**Layered Approach to Resilience (per OSI model):**
| Layer | Measures |
|-------|----------|
| L1 (Physical) | Redundant connections, diverse fibre paths, DWDM |
| L2 (Link) | MLAG, LACP, spanning-tree fast convergence |
| L3 (Network) | BGP multi-homing, AS path prepend, Anycast |
| L4 (Transport) | Connection draining, slow start, health checks |
| L7 (Application) | DNS failover, global load balancing, cell-based architectures |
**Regulatory Frameworks:** DORA (Digital Operational Resilience Act), NIS2 — require regular resilience testing, chaos engineering, business continuity plans.
**AIOps in Resilience:** ML-based traffic pattern prediction, automatic scaling, proactive fault detection (transition from reactive monitoring to predictive prevention).
### Zero Trust in Resilient Cloud and Network Architectures (Halley et al., ISBN 978-0-13-820460-0)
Cisco Press — practical guide for deploying Zero Trust architectures in hybrid and cloud environments.
**Implementation Framework:**
- **User and Device Trust** — verification of both user and device identity before granting access (SSE — Security Service Edge).
- **Application Access Policies** — granular rules at the application level, not IP addresses.
- **Greenfield vs Brownfield** — new networks built as Zero Trust from the ground up vs. migration of existing infrastructure.
- **Automation** — Terraform, Ansible for provisioning; Meraki, EVPN, Pub/Sub telemetry.
- **Industrial Zero Trust** — extending the concept to OT/ICS environments.
- **Quantum Security** — preparation for post-quantum cryptography in network architectures.
### The Segmentation Blueprint (Kulkarni, ISBN 978-0-13-546236-2)
Cisco Press (2026) — pragmatic guide to network segmentation from VLAN to nanosegmentation.
**Evolution of Segmentation:**
| Generation | Technology | Scope |
|------------|------------|-------|
| Traditional | VLAN, ACL, firewall | Subnet |
| Micro-segmentation | Security Groups, Network Policies | Workload / instance |
| Nanosegmentation | Service mesh (Istio, Linkerd), mTLS | Application / API / process |
**Segmentation Maturity Model:**
1. **Initial** — flat network, no segmentation
2. **Basic** — VLANs, firewall between environments
3. **Defined** — Security Groups, service access policies
4. **Managed** — Micro-segmentation, Network Policies, EVPN
5. **Optimized** — Nanosegmentation, service mesh, Zero Trust, AI-driven policy management
**Key Metric:** Blast radius — how many workloads are compromised when one node is breached. Goal is reduction to a minimum.
### Segment Routing for SP and Enterprise Networks (Deragisch, ISBN 978-0-13-823101-9)
Cisco Press (2024) — comprehensive guide to Segment Routing for both MPLS and IPv6 data plane.
**SR-MPLS vs SRv6:**
| Property | SR-MPLS | SRv6 |
|----------|---------|------|
| SID length | 20 bit (MPLS label) | 128 bit (IPv6 address) |
| Data plane | MPLS | IPv6 + SRH (Segment Routing Header) |
| Signaling | IGP (IS-IS/OSPF) extensions | IGP + BGP extensions |
| Maturity | Mature, widely deployed | Emerging, standardization complete |
| Use case | SP networks, MPLS migration | Cloud, DC, 5G, end-to-end programmability |
**Advantages of SR over classic MPLS:**
- Elimination of LDP/RSVP-TE (signaling is part of IGP)
- Traffic engineering state moved from nodes to packet headers (source routing)
- Fast reroute (FRR) without additional protocols
- Egress Peer Engineering (EPE) — selection of AS exit point
- Micro-loop avoidance during convergence
**Migration Strategies:** Greenfield (new network), Brownfield (gradual migration from MPLS), "SR in a box" — combination of SR and LDP.
### Understanding and Designing Azure Networking (Stuart, Moreno, 2025)
Practical guide to designing Azure networks by two Microsoft Solution Engineers (former CCIEs).
**Key Topics:**
| Area | Key Services and Concepts |
|------|---------------------------|
| **Topologies** | Hub-and-spoke, Virtual WAN, multi-hub designs, Azure Route Server |
| **Hybrid Connectivity** | ExpressRoute, VPN Gateway, SD-WAN integration |
| **Multi-cloud** | Azure ↔ AWS/GCP, cross-cloud fabrics |
| **Security** | NSG, Azure Firewall, DDoS Protection, WAF, AVNM, ZTNA |
| **DNS & PaaS** | Private Link, Private DNS Zones, Private Resolver, hybrid DNS forwarding |
| **Application Delivery** | Azure Load Balancer, App Gateway, Front Door, Traffic Manager |
| **Monitoring** | Network Watcher, Traffic Analytics, Azure Monitor, Policy-as-code |
**Design Decision Framework:** Gather requirements → analyze constraints → select topology → implement → monitor.
### Mastering Next-Gen Juniper Data Centers (Chatterjee, ISBN 978-0-13-533636-6)
Addison-Wesley (2026) — hands-on guide to EVPN VXLAN fabrics on Juniper devices.
**Key Architectures:**
- **EVPN VXLAN fabric** — multi-tenant overlay networks with BGP EVPN control plane and VXLAN data plane.
- **Multivendor interoperability** — detailed procedures for EVPN across Juniper, Cisco NX-OS, Arista EOS.
- **Multicast in EVPN VXLAN** — intra-subnet and inter-subnet multicast design (IGMP/MLD proxying, PIM, EVPN Type-6/7 routes).
- **Day-2 operations** — Juniper Apstra for automation (Terraform provider), telemetry (gNMI, OpenConfig).
- **Service chaining** — connecting NGFW, load balancers within the fabric.
- **DCI with EVPN** — Over-the-Top (OTT) and Integrated Interconnect (VXLAN stitching, MPLS transit).
**Evolution from previous book (Deploying Juniper Data Centers with EVPN VXLAN, 2024):** Expansion with advanced topics — multicast, interoperability, Apstra Day-2, observability stack.
### Intelligent Cloud Networking: AI-Driven Resource Management (Yadav, ISBN 9364220110)
Intersection of AI/ML and cloud network management (Addition Publishing, 2026).
**AI Applications in Network Management:**
| Area | Technique | Benefit |
|------|-----------|---------|
| **Flow prediction** | LSTM, Transformer | Traffic pattern prediction, proactive scaling |
| **Flow classification** | CNN, RL | Traffic type identification for QoS |
| **Load balancing** | DRL (Deep RL) | Dynamic load distribution, congestion reduction |
| **Resource management** | Q-learning, DQN | Optimization of CPU/memory/network allocation |
| **Routing optimization** | DRL, GNN | Adaptive routing based on current conditions |
| **Congestion control** | ML-based CC | Predictive congestion control (instead of reacting to loss) |
| **Anomaly detection** | Autoencoders, Isolation Forest | Real-time attack and anomaly detection |
| **Blockchain security** | Smart contracts | Decentralized access control, audit trail |
**Technology Trends:**
- **Ultra Ethernet Consortium (UEC)** — next-generation Ethernet for AI, lossless fabric, telemetry, adaptive routing.
- **Path Aware Networking** — applications choose path based on current conditions (latency, loss, cost).
- **Self-optimizing networks** — closed loop: telemetry → AI analysis → automatic action → feedback.
## OpenStack Networking (Neutron)
OpenStack Neutron is an SDN framework for managing virtual networks in a multi-tenant environment. Supports VLAN, VXLAN, GRE tunnels, security groups, QoS, and LBaaS (Octavia).
### Backends
| Backend | Description | Suitable For |
|---------|-------------|--------------|
| **OVN (Open Virtual Network)** | Native OpenFlow/OVSDB backend; replaces OVS+agent architecture | Production, scalable deployments |
| **OVS (Open vSwitch)** | Classic agent-based backend (neutron-openvswitch-agent) | Small deployments, legacy |
| **Linux Bridge** | Simple backend without OVS | Development, testing, embedded |
| **Hyper-V** | Windows Server backend | Hybrid environments |
### Important Concepts
- **Networks, Subnets, Ports** — basic network objects
- **Routers** — L3 forwarding between tenant networks (DVR for distributed routing)
- **Security Groups** — stateful firewall rules at the port level
- **Floating IPs** — public IPs mapped to instances (1:1 NAT)
- **LBaaS / Octavia** — load balancing as a service (HAProxy, amphora)
- **Trunk ports** — VLAN tagging for instances (parent + subports)
### Performance Tuning
- **DPDK** — userspace packet processing, bypass kernel, lower latency
- **SR-IOV** — passthrough VF to instance, minimal hypervisor overhead
- **NUMA pinning** — vCPU/memory/NIC affinity for compute instances
- **Hardware offload** — OVS TC Flower, ASAP²
### Use Cases
- Multi-tenant cloud (public and private)
- Telco/NFVI (DPDK, SR-IOV, low-latency)
- SDN lab / network function virtualization
## Zero Trust Networking
Zero Trust is a "never trust, always verify" security model — no entity is implicitly trusted, regardless of its location in the network.
### Principles (NIST SP 800-207)
1. **All resources are external** — there is no trusted internal network
2. **Least privilege** — access only to necessary resources
3. **Micro-segmentation** — workload isolation at the individual process/container level
4. **Encrypt everything** — TLS/mTLS for all communication
5. **Continuous verification** — every request is authenticated and authorized
6. **Dynamic policies** — rules change based on context (user, device, location, time)
### Implementation Layers
| Layer | Technology | Description |
|-------|------------|-------------|
| **Identity** | OIDC, SAML, LDAP | User and device authentication |
| **Device** | MDM, UEM, device certificates | Device state verification (compliant, patch level) |
| **Network** | Micro-segmentation, firewall, SDN | Traffic isolation between workloads |
| **Application** | mTLS, service mesh, API gateway | Applications enforce mutual authentication |
| **Data** | Encryption at rest + in transit, DLP | Data protection regardless of location |
| **Analytics** | SIEM, UEBA, AI/ML | Real-time anomaly and threat detection |
### Tools and Platforms
| Category | Tools |
|----------|-------|
| **ZTNA (Zero Trust Network Access)** | Cloudflare Access, Zscaler, Netskope, Palo Alto Prisma |
| **Service Mesh** | Istio, Linkerd, Consul Connect, Cilium |
| **Micro-segmentation** | VMware NSX, Illumio, Guardicore, Akamai |
| **BeyondCorp** | Google BeyondCorp (open-source: BeyondCorp Alliance) |
| **Security Service Edge (SSE)** | SWG, CASB, ZTNA in one (Zscaler, Netskope, Cloudflare) |
### Zero Trust in the Data Center
In a private DC, Zero Trust is deployed via:
- **EVPN VXLAN** — overlay network with tenant isolation
- **Network Policies** (Kubernetes) — per-pod firewall rules
- **Cilium** — eBPF-based L3/L7 policy enforcement
- **WireGuard / IPsec** — encryption between nodes
- **HashiCorp Boundary** — identity-based access to servers without bastion host
## Best Practices
- **Network segmentation** — separate environments (dev/staging/prod), tiers (web/app/db)
- **Least privilege access** — security groups allow only necessary traffic
- **Monitoring** — VPC Flow Logs, netflow, sFlow
- **Redundancy** — multi-AZ, multi-region for critical services
- **Encryption in transit** — TLS everywhere, mTLS for service-to-service
- **DDoS protection** — AWS Shield, Azure DDoS Protection, Cloudflare
- **MTU alignment** — consistent MTU across the entire path, check ICMP blocking for PMTUD
- **IP planning** — RFC 1918 (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16), avoid overlaps for peering
## Resources
Links, books and standards: [sources/networking/sources.en.md](sources/networking/sources.en.md)
## TLS Detail
### TLS 1.3 Handshake (1-RTT)
```
Client Server
| |
| ClientHello (key_share, sig_algs) |
|─────────────────────────────────────────>|
| |
| ServerHello + EncryptedExtensions |
| + Certificate + CertificateVerify |
| + Finished |
|<─────────────────────────────────────────|
| |
| Finished |
|─────────────────────────────────────────>|
| |
| << Application Data >> |
```
- **0-RTT** (early data) — client sends data immediately with the first message (on repeated connection with PSK)
- 0-RTT risk: replay attacks (only idempotent requests such as a side-effect-free HTTP GET are safe as early data; POST requires replay protection)
- Compared to TLS 1.2: removed obsolete ciphers, AEAD required, faster handshake (2 RTT → 1 RTT)
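On the client side, requiring TLS 1.3 takes one extra line with Python's standard `ssl` module. This is a minimal sketch; the function name is an assumption, and it presumes the interpreter is linked against an OpenSSL build with TLS 1.3 support.

```python
import ssl


def tls13_client_context():
    """Client context that verifies certificates and refuses TLS 1.2 and older."""
    context = ssl.create_default_context()            # CERT_REQUIRED + hostname check
    context.minimum_version = ssl.TLSVersion.TLSv1_3
    return context


ctx = tls13_client_context()
print(ctx.minimum_version, ctx.check_hostname)
```

The context can then be passed to `ctx.wrap_socket(sock, server_hostname=...)`; a handshake with a server that only supports TLS 1.2 fails instead of silently downgrading.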
### Cipher Suites
| Suite | Key exchange | Auth | Encryption | MAC | Status |
|-------|-------------|------|-----------|-----|--------|
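| `TLS_AES_128_GCM_SHA256` | (EC)DHE (negotiated separately) | Certificate signature (negotiated separately) | AES-128-GCM | AEAD | TLS 1.3 default |
| `TLS_AES_256_GCM_SHA384` | (EC)DHE (negotiated separately) | Certificate signature (negotiated separately) | AES-256-GCM | AEAD | Higher security |
| `TLS_CHACHA20_POLY1305_SHA256` | (EC)DHE (negotiated separately) | Certificate signature (negotiated separately) | ChaCha20-Poly1305 | AEAD | Mobile / no AES-NI |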
| `ECDHE-ECDSA-AES128-GCM-SHA256` | ECDHE | ECDSA | AES-128-GCM | AEAD | TLS 1.2 (PFS) |
| `ECDHE-RSA-AES128-GCM-SHA256` | ECDHE | RSA | AES-128-GCM | AEAD | TLS 1.2 (PFS) |
PFS (Perfect Forward Secrecy) — a later compromise of the server's long-term private key does not allow decryption of previously captured traffic, because each session uses ephemeral (EC)DHE keys.
### Certificate Chain Validation
```
1. Client receives certificate chain from server
2. Validation:
a. Date: certificate is not expired and is valid (notBefore, notAfter)
b. CRL / OCSP: certificate is not revoked (OCSP stapling to reduce latency)
c. Signature chain: each cert's signature in the chain is verified with the issuer's public key
d. Root CA: the last cert in the chain is trusted (in client's trust store)
3. CN / SAN: domain name in the certificate must match the target domain
```
Typical chain: `Leaf Cert → Intermediate CA → Root CA` (self-signed, in trust store).
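The order of the validation steps can be sketched with a toy model. It performs structural checks only: it does not verify signatures or revocation, and it matches host names exactly (no wildcards). The `Cert` type and all names in it are assumptions; real code should rely on a TLS library instead.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Cert:
    subject: str
    issuer: str
    not_before: datetime
    not_after: datetime
    san: tuple = ()


def validate_chain(chain, trust_store, hostname, now):
    """Structural checks only: no signatures or revocation are verified."""
    leaf = chain[0]
    if hostname not in leaf.san:                     # name must match the leaf's SAN
        return False
    for cert, issuer in zip(chain, chain[1:]):       # each cert issued by the next one
        if cert.issuer != issuer.subject:
            return False
    if chain[-1].subject not in trust_store:         # chain must end in a trusted root
        return False
    return all(c.not_before <= now <= c.not_after for c in chain)


utc = timezone.utc
root = Cert("Root CA", "Root CA", datetime(2020, 1, 1, tzinfo=utc), datetime(2040, 1, 1, tzinfo=utc))
inter = Cert("Intermediate CA", "Root CA", datetime(2022, 1, 1, tzinfo=utc), datetime(2032, 1, 1, tzinfo=utc))
leaf = Cert("api.example.com", "Intermediate CA", datetime(2026, 1, 1, tzinfo=utc),
            datetime(2026, 12, 31, tzinfo=utc), san=("api.example.com",))
print(validate_chain([leaf, inter, root], {"Root CA"}, "api.example.com",
                     datetime(2026, 6, 1, tzinfo=utc)))
```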
*Last revised: 2026-06-03*

635
NETWORKING.md Normal file

@@ -0,0 +1,635 @@
# 🌐 Network Architecture
## Reference Model (TCP/IP)
| Layer | Protocols | Devices |
|-------|-----------|---------|
| Application | HTTP/HTTPS, DNS, SMTP, SSH | — |
| Transport | TCP, UDP | — |
| Network | IP, ICMP, BGP | Routers |
| Link | Ethernet, ARP, VLAN | Switches |
## TCP Detail
### 3-way Handshake
```
Client Server
| |
| SYN (seq=x) |
|──────────────────────────────>|
| |
| SYN+ACK (seq=y, ack=x+1) |
|<──────────────────────────────|
| |
| ACK (seq=x+1, ack=y+1) |
|──────────────────────────────>|
| |
| << established >> |
```
- **SYN** — client sends segment with SYN flag (Synchronize Sequence Number)
- **SYN-ACK** — server responds with its own SYN + acknowledgment of client's seq number
- **ACK** — client acknowledges, connection is established
- TCP Fast Open (TFO) — data in SYN packet for 0-RTT on repeated connections
### Flow Control (Sliding Window)
- **Receiver Window (rwnd)** — how much data the receiver is willing to accept
- **Sliding Window** — sender maintains a window of sent but not yet acknowledged bytes
- Window scaling (RFC 7323, formerly RFC 1323) — allows a window of up to 1 GiB (instead of 64 KiB)
- Zero Window — receiver advertises 0, sender stops, persist timer periodically tests
### Congestion Control
| Algorithm | Description | Use Case |
|-----------|-------------|----------|
| **Cubic** | Default in Linux (since kernel 2.6.19), cubic growth function | General networks, default for the Internet |
| **BBR** (Bottleneck Bandwidth and RTT) | Model-based, measures bandwidth and RTT, not packet loss | High-speed networks, YouTube, Google |
| **Reno** | Classic AIMD (Additive Increase Multiplicative Decrease) | Legacy, reference |
| **CDG** (CAIA Delay Gradient) | Delay-based, congestion detection by RTT gradient | Video streaming, real-time |
BBRv2 (announced 2019) adds ECN signaling, better coexistence with Cubic and Reno, and improved loss handling; BBRv3 (2023) refines it further.
## DNS (Domain Name System)
- **Record types**: A, AAAA, CNAME, MX, TXT, NS, SRV, PTR, CAA, DS, DNSKEY, RRSIG, NSEC
- **DNS resolver**: recursive query through hierarchy (root → TLD → authoritative)
- **Anycast DNS** — same IP from multiple locations, routing to the nearest
- **DNS caching** — TTL control, cache poisoning protection (DNSSEC)
- **Cloud DNS** — Route53, Azure DNS, Cloud DNS
### DNS Lookup Flow (Step by Step)
```
1. User enters "api.example.com" in the browser
2. OS stub resolver checks /etc/hosts and the local cache (e.g. systemd-resolved)
3. If not in cache → query to recursive resolver (ISP / 8.8.8.8 / 1.1.1.1)
4. Resolver checks its cache
5. Not found → resolver walks the hierarchy with iterative queries:
a. Query to root nameserver (.) → returns NS for .com
b. Query to .com TLD nameserver → returns NS for example.com
c. Query to authoritative NS for example.com → returns A record (IP)
6. Resolver stores in cache (TTL), returns IP to client
7. Client establishes TCP connection to the obtained IP
```
The whole process typically takes 10-200 ms (with cache < 1 ms).
### DNSSEC Detail
- **RRSIG** — digital signature for each RRset (Resource Record Set)
- **DNSKEY** — zone public key (ZSK = Zone Signing Key, KSK = Key Signing Key)
- **DS** (Delegation Signer) — DNSKEY hash passed to the parent zone (chain of trust)
- **NSEC / NSEC3** — authenticated denial of existence (proof that a record does not exist)
- **Chain of trust**: root → .com → example.com (path from trust anchor through DS records)
```
Root DNSKEY (trust anchor) → DS(.com) → .com DNSKEY → DS(example.com) → example.com DNSKEY → RRSIG(A)
```
- Validation: resolver checks signatures across the entire chain up to the trust anchor
### DNS-based Service Discovery
| Mechanism | Description | Example |
|-----------|-------------|---------|
| **SRV record** | Service location (priority, weight, port, target) | `_http._tcp.example.com` |
| **Consul DNS** | Service discovery via DNS interface | `web.service.consul` |
| **CoreDNS** | Kubernetes DNS, plugin-based | `my-svc.my-namespace.svc.cluster.local` |
| **Kubernetes DNS** | Service discovery inside the cluster (kube-dns / CoreDNS) | `svc.cluster.local` |
| **mDNS** (Multicast DNS) | Zero-config, local network (Bonjour/Avahi) | `myprinter.local` |
## Load Balancing
| Type | OSI Layer | Description |
|------|-----------|-------------|
| L4 (NLB) | 4 | TCP/UDP, fast, lower latency |
| L7 (ALB) | 7 | HTTP/HTTPS, path-based routing, sticky sessions |
| Global | DNS | Geo-routing, latency-based, weighted |
### Algorithms
- Round Robin / Weighted RR
- Least Connections
- IP Hash (session persistence)
- Random
### Health Check Types
| Type | Description | Suitable For |
|------|-------------|--------------|
| **TCP health check** | TCP handshake to target port | L4 NLB, basic check |
| **HTTP health check** | HTTP GET to URL, expects 200 OK | L7 ALB, web services |
| **HTTPS health check** | HTTP + TLS handshake | Services with TLS termination |
| **gRPC health check** | `grpc.health.v1.Health/Check` RPC (gRPC health checking protocol) | Microservices, gRPC services |
| **ICMP ping** | Ping to target IP | Basic connectivity |
### Connection Draining
- **Connection draining** (AWS) / **Deregistration delay** — when a target is removed from the ASG/LB, the load balancer stops sending it new requests and waits for existing connections to finish (configurable: 0-3600 s, default 300 s)
- **Slow start** — new target gradually receives more requests (prevents cold cache overload)
### Cross-zone Load Balancing
- **Enabled**: each LB node distributes traffic across the registered targets in all enabled AZs, so every target receives an equal share even with an uneven instance count
- **Disabled**: traffic is split evenly between AZs first, then each LB node distributes only among the instances in its own AZ
- AWS: on ALB cross-zone is enabled by default with no inter-AZ data charge; on NLB it is disabled by default and inter-AZ data transfer is billed once enabled
## Firewalls and security
- **Stateful firewall**: tracks connection state (AWS Security Groups, Azure NSG)
- **Stateless firewall**: ACL (Network ACLs)
- **NGFW**: application-layer inspection, IPS/IDS (Palo Alto, Fortinet)
- **WAF**: web application protection (Cloudflare, AWS WAF, Azure WAF)
## Network segmentation: Security Groups vs Network ACLs
| Property | Security Group (SG) | Network ACL (NACL) |
|----------|---------------------|--------------------|
| **State** | Stateful (return traffic is allowed automatically) | Stateless (explicit rules are needed in both directions) |
| **Level** | Instance / ENI | Subnet |
| **Rules** | Allow only | Allow and deny |
| **Evaluation** | All rules are evaluated (OR) | Rules in ascending rule-number order (first match) |
| **Default** | All inbound traffic denied, all outbound traffic allowed | AWS default NACL: all traffic allowed in both directions; a newly created custom NACL denies all traffic until rules are added |
| **Support** | AWS, GCP (firewall rules), Azure (NSG) | AWS (NACL); the closest equivalents are GCP firewall rules and Azure NSGs applied at subnet level, although both are stateful |
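The difference in rule evaluation between the two models can be shown with a small simulation. This is a simplified sketch that matches only on source CIDR and a single destination port; `Rule`, `sg_allows`, and `nacl_allows` are hypothetical names, not part of any cloud SDK.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class Rule:
    cidr: str
    port: int
    allow: bool = True
    number: int = 0  # evaluation order; only meaningful for NACL rules


def _matches(rule, src_ip, port):
    return ip_address(src_ip) in ip_network(rule.cidr) and rule.port == port


def sg_allows(rules, src_ip, port):
    """Security group: allow-only rules, traffic passes if ANY rule matches."""
    return any(r.allow and _matches(r, src_ip, port) for r in rules)


def nacl_allows(rules, src_ip, port):
    """NACL: rules in ascending number order, the first match decides."""
    for rule in sorted(rules, key=lambda r: r.number):
        if _matches(rule, src_ip, port):
            return rule.allow
    return False  # implicit final deny


nacl = [
    Rule("10.0.0.0/8", 22, allow=False, number=100),
    Rule("0.0.0.0/0", 22, allow=True, number=200),
]
sg = [Rule("0.0.0.0/0", 443)]

print(nacl_allows(nacl, "10.1.2.3", 22))   # denied by rule 100 before rule 200 is reached
print(nacl_allows(nacl, "192.0.2.1", 22))  # falls through to rule 200
print(sg_allows(sg, "198.51.100.7", 443))
```

Swapping the two NACL rule numbers would let the 10.0.0.0/8 traffic through, which is why rule ordering deserves the same review as the rules themselves.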
### Microsegmentation
- **Zero Trust networking**: every workload has its own security group / NGFW policy
- **Service mesh**: Istio, Linkerd, Consul Connect for L7 microsegmentation (mTLS, authorization policies)
- **Network policies**: Kubernetes NetworkPolicy for segmenting pod-to-pod traffic
- **Tanzu / NSX**: micro-segmentation at the hypervisor level
## VPN
- **Site-to-Site**: IPsec, permanent connection between sites
- **Client-to-Site**: OpenVPN, WireGuard, AnyConnect
- **Cloud VPN**: AWS VPN, Azure VPN Gateway, GCP Cloud VPN
## CDN (Content Delivery Network)
- Edge locations for caching static content
- DDoS protection
- SSL/TLS termination at the edge
- Providers: CloudFront, Cloudflare, Akamai, Fastly
## BGP and routing
- **BGP**: protocol for exchanging routes between ASes (autonomous systems)
- **ASN**: unique identifier of an autonomous system
- **iBGP**: internal BGP (within an AS)
- **eBGP**: external BGP (between ASes)
### BGP path selection algorithm
A BGP router selects a single best path by the criteria below, in priority order. A path is considered at all only if its next hop is reachable; vendors may insert further tie-breakers (for example, oldest eBGP path) before the last two steps.
1. **WEIGHT** (Cisco-specific): highest weight (local to the router)
2. **LOCAL_PREF**: highest local preference (within the AS)
3. **Locally originated**: prefer a route originated by the local router
4. **AS_PATH**: shortest AS_PATH length
5. **ORIGIN**: IGP < EGP < INCOMPLETE (lowest wins)
6. **MED** (Multi-Exit Discriminator): lowest MED (compared only between paths from the same neighboring AS)
7. **eBGP > iBGP**: prefer external over internal BGP paths
8. **IGP metric**: lowest IGP metric to the BGP next hop
9. **Router ID**: prefer the path from the peer with the lowest Router ID
10. **Neighbor IP**: prefer the path from the peer with the lowest IP address
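A decision process of this kind maps naturally onto a sort key: each criterion becomes one element of a tuple, and the smallest tuple wins. The sketch below is a simplification under stated assumptions: it compares MED across all paths (real routers compare it only between paths from the same neighboring AS unless configured otherwise), it uses the IGP metric to the next hop and the router ID as final tie-breakers, and it assumes every path already has a reachable next hop. `Path`, `preference_key`, and `best_path` are illustrative names.

```python
from dataclasses import dataclass

ORIGIN_RANK = {"igp": 0, "egp": 1, "incomplete": 2}


@dataclass(frozen=True)
class Path:
    weight: int = 0
    local_pref: int = 100
    locally_originated: bool = False
    as_path: tuple = ()
    origin: str = "igp"
    med: int = 0
    ebgp: bool = True
    igp_metric: int = 0
    router_id: str = "0.0.0.0"


def preference_key(p):
    """Smaller tuple means more preferred; 'highest wins' fields are negated."""
    return (
        -p.weight,                 # highest weight
        -p.local_pref,             # highest local preference
        not p.locally_originated,  # locally originated first
        len(p.as_path),            # shortest AS_PATH
        ORIGIN_RANK[p.origin],     # IGP < EGP < INCOMPLETE
        p.med,                     # lowest MED (simplified: always compared)
        not p.ebgp,                # eBGP before iBGP
        p.igp_metric,              # lowest IGP metric to next hop
        tuple(int(octet) for octet in p.router_id.split(".")),  # lowest router ID
    )


def best_path(paths):
    return min(paths, key=preference_key)


long_but_preferred = Path(local_pref=200, as_path=(65001, 65002, 65003))
short_path = Path(local_pref=100, as_path=(65001,))
print(best_path([long_but_preferred, short_path]) is long_but_preferred)
```

The example shows why LOCAL_PREF is the usual tool for outbound traffic engineering: it is evaluated before AS_PATH length, so a longer path can still be selected.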
### iBGP full mesh vs Route Reflectors
| Aspect | Full mesh | Route reflectors |
|--------|-----------|------------------|
| **Session count** | n(n-1)/2 | n-1 (each client peers with the RR) |
| **With 100 routers** | 4,950 sessions | 99 (with 1 RR) |
| **Scaling** | Poor (quadratic) | Linear |
| **Redundancy** | Inherent | Requires multiple RRs + cluster |
| **Configuration** | Simple logic | RR reflection rules (client vs non-client peers) |
BGP knowledge is needed for: cloud interconnects, MPLS L3VPN, SD-WAN, data center fabrics (VXLAN + BGP EVPN)
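The session counts in the full mesh vs route reflector comparison follow from simple arithmetic. The sketch below counts the reflector itself as one of the n routers, so a single reflector gives n-1 sessions; with several reflectors it assumes every client peers with every reflector and the reflectors are fully meshed among themselves. The function names are illustrative.

```python
def full_mesh_sessions(n):
    """Every router peers with every other router: n(n-1)/2."""
    return n * (n - 1) // 2


def route_reflector_sessions(n, reflectors=1):
    """Each client peers with every reflector; reflectors are fully meshed."""
    clients = n - reflectors
    return clients * reflectors + full_mesh_sessions(reflectors)


for n in (10, 100, 1000):
    print(n, full_mesh_sessions(n), route_reflector_sessions(n),
          route_reflector_sessions(n, reflectors=2))
```

Adding a second reflector for redundancy roughly doubles the session count but keeps growth linear in the number of routers.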
## VPC / Virtual Network architecture
```
Internet ──┬── Internet Gateway (IGW)
┌──────▼──────┐
│ Public Subnet │
│ ┌──────────┐ │
│ │ ALB/NAT │ │
│ └────┬─────┘ │
└───────┼────────┘
┌───────▼────────┐
│ Private Subnet │
│ ┌──────────┐ │
│ │ App │ │
│ └────┬─────┘ │
└───────┼─────────┘
┌───────▼─────────┐
│ Data Subnet │
│ ┌────────────┐ │
│ │ Database │ │
│ └────────────┘ │
└──────────────────┘
```
### VPC design patterns
**Three-tier architecture**
- Web tier (public subnets) → ALB
- App tier (private subnets) → auto-scaling
- Data tier (private subnets) → RDS / self-managed DB
- NAT Gateway / NAT instance in a public subnet for outbound traffic from the app/data tiers
**VPC Peering**
- Direct connection between two VPCs (same or cross-account)
- Transitive peering is **not** supported (A→B and B→C do not imply A→C)
- Use cases: sharing resources (LDAP, monitoring), service endpoints
**Transit Gateway**
- Hub-and-spoke topology, transitive routing
- Supports: VPC, VPN, Direct Connect, peering between TGWs
- Route tables per attachment: isolation of environments
- AWS TGW: 50 Gbps per attachment, up to 5,000 attachments
**PrivateLink / VPC Endpoint**
- Private access to services without an IGW, NAT, or VPC peering
- **Interface Endpoint** (ENI in a subnet): for AWS services, SaaS
- **Gateway Endpoint** (S3, DynamoDB): route table entry, free of charge
- **AWS PrivateLink**: Service Consumer ↔ NLB/ENI ↔ Service Provider
## MTU, jumbo frames, PMTUD
| Network | Standard MTU | Jumbo frames |
|---------|--------------|--------------|
| Ethernet | 1500 B | 9000 B typical (AWS: 9001 B, Azure: 1400→9000) |
| GRE tunnel | 1476 B | n/a |
| PPPoE | 1492 B | n/a |
| VLAN (802.1Q) | 1500 B (the 4 B tag grows the frame to 1522 B; effectively 1496 B on devices limited to 1518 B frames) | n/a |
| VXLAN | Underlay must carry inner 1500 B + 50 B overhead | 8950 B inner on a 9000 B underlay |
**PMTUD** (Path MTU Discovery)
- The sender sets the DF (Don't Fragment) bit in the IP header
- If a hop would have to fragment the packet → ICMP "Fragmentation Needed" (Type 3, Code 4) is returned
- The sender lowers its path MTU until the packet gets through
- Common problem: ICMP blocked by a firewall → black hole (TCP connection hangs)
- **Workaround**: MSS clamping (TCP MSS = MTU - 40 for IPv4, MTU - 60 for IPv6)
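The MSS clamping rule is plain header arithmetic: the MSS is the path MTU minus the IP and TCP headers, and any tunnel encapsulation on the path has to be subtracted as well. The sketch below assumes headers without options (20 B IPv4, 40 B IPv6, 20 B TCP); `tcp_mss` and its parameters are illustrative names.

```python
IPV4_HEADER = 20  # bytes, without options
IPV6_HEADER = 40  # bytes, fixed header only
TCP_HEADER = 20   # bytes, without options


def tcp_mss(path_mtu, ipv6=False, tunnel_overhead=0):
    """Largest TCP payload that fits into one packet on the given path."""
    ip_header = IPV6_HEADER if ipv6 else IPV4_HEADER
    mss = path_mtu - tunnel_overhead - ip_header - TCP_HEADER
    if mss <= 0:
        raise ValueError("MTU too small for the given overhead")
    return mss


print(tcp_mss(1500))                      # plain Ethernet, IPv4
print(tcp_mss(1500, ipv6=True))           # plain Ethernet, IPv6
print(tcp_mss(1492))                      # PPPoE link
print(tcp_mss(1500, tunnel_overhead=24))  # GRE over a 1500 B link
print(tcp_mss(1500, tunnel_overhead=50))  # VXLAN over a 1500 B underlay
```

Clamping the MSS on the tunnel endpoint to the value for the smallest link on the path avoids relying on ICMP getting through.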
**Jumbo frames use cases**
- NFS / SMB (NAS)
- iSCSI / NVMe-oF (SAN)
- HPC / MPI workloads
- Data replication (DB, DRBD)
- Amazon EFS, Amazon MSK (Managed Streaming for Apache Kafka)
## Anycast vs Unicast vs Multicast
| Type | Description | Example |
|------|-------------|---------|
| **Unicast** | 1:1, one source, one destination | Ordinary TCP/IP traffic |
| **Multicast** | 1:N, one source, a group of receivers | IPTV, mDNS, VXLAN BUM traffic |
| **Anycast** | 1:1 to the nearest node, the same IP announced from multiple locations | DNS (8.8.8.8, 1.1.1.1), Cloudflare |
| **Broadcast** | 1:all, every device on the network | ARP, DHCP (limited to the L2 broadcast domain) |
Anycast in detail:
- The same IP prefix is announced from multiple locations (BGP)
- Traffic goes to the topologically nearest node (BGP path selection)
- **Advantages**: simple redundancy, DDoS absorption, lower latency
- **Challenges**: connection persistence (TCP), stateful anycast, routing convergence
- **Cloud**: Route53, CloudFront, Cloudflare, Google DNS
## Cloud networking resilience (2026)
See also: [CLOUD.md](CLOUD.md) for cloud architecture, multi-AZ, and hybrid cloud connectivity.
### Cell-based architectures
- Fault domains are isolated into "cells" (a group of AZs + services)
- Each cell is independently deployable, with its own DB and its own LB
- Limits the blast radius: failure of one cell does not affect the others
- Implementation: AWS Cell-based architecture, Azure STAG (Scale Tier Availability Group)
### DNS resilience
- **Anycast DNS**: the same IP from multiple regions, routed to the nearest one
- **DNS failover**: health checks automatically remove unavailable endpoints
- **Multi-DNS provider**: Route53 + Cloudflare + UltraDNS to eliminate a SPOF
### Traffic engineering
- **BGP optimization**: AS path prepend, MED, local pref to steer inbound/outbound traffic
- **Global Load Balancing**: GSLB at the DNS level (latency-based, geo-proximity, weighted)
- **AIOps**: ML-based prediction of traffic patterns and automatic scaling
### New trends
- **Path Aware Networking**: the application chooses its path through the network based on current conditions
- **Segment Routing (SR-MPLS / SRv6)**: simplified MPLS, programmable paths
- **Zero Trust Networking**: microsegmentation, identity-based access, never trust / always verify
## Extended topics by book
### TCP/IP Illustrated (Stevens, ISBN 978-0321336316)
Key architectural principles from the book:
- **End-to-End Argument**: correctness and completeness of communication can only be ensured at the application level, not in lower layers. The network should be "dumb", the end hosts "smart".
- **Fate Sharing**: all state needed to keep a communication active must be stored at the endpoints, not in the network.
- **Layering**: hierarchical layering of protocols (cf. the OSI model); each layer encapsulates the PDU of the layer above and adds its own header.
- **Multiplexing/Demultiplexing**: protocols at the same layer coexist thanks to identifiers (IP protocol number, TCP/UDP port).
- **Sliding window**: efficient use of the link at high latency; to keep the link full, the window must be at least bandwidth × RTT (the bandwidth-delay product).
The book covers the whole TCP/IP stack from the link layer (Ethernet, ARP, PPP) through IP, ICMP, DHCP, NAT, DNS, UDP, and TCP (connection management, timeout, retransmission, congestion control, keepalive) up to applications (SNMP, Telnet, FTP, SMTP, NFS, HTTP).
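The sliding-window relation (window size = bandwidth × RTT) can be checked with a short calculation. The sketch below is only the arithmetic, with bandwidth in bits per second and RTT in milliseconds; the function names are made up for the example.

```python
def bandwidth_delay_product_bytes(bandwidth_bps, rtt_ms):
    """Bytes that must be in flight to keep the link full."""
    return bandwidth_bps * rtt_ms / 1000 / 8


def max_throughput_bps(window_bytes, rtt_ms):
    """Upper bound on throughput for a fixed window: one window per RTT."""
    return window_bytes * 8 * 1000 / rtt_ms


# 1 Gbit/s link with a 50 ms round trip
print(bandwidth_delay_product_bytes(1_000_000_000, 50))
# Classic 64 KiB window (no window scaling) over the same path
print(max_throughput_bps(65_535, 50))
```

On this path a 64 KiB window caps a single connection at roughly 10 Mbit/s, which is why TCP window scaling matters on long, fast links.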
### AI Data Center Network Design (Subramaniam, ISBN 978-0-13-543628-8)
A comprehensive, vendor-agnostic guide to designing network infrastructure for AI clusters.
**Key concepts:**
- **Rail-Optimized Design (ROD)**: GPUs are interconnected across racks along "rails"; each rail forms an independent network for all-reduce communication. Minimizes latency for synchronous training.
- **Rail-Unified Design (RUD)**: a shared network fabric for all GPUs; more flexible, but more demanding on load balancing.
- **RoCEv2 (RDMA over Converged Ethernet)**: the primary transport for AI clusters; requires a lossless fabric (ECN, PFC, DCQCN, SFC, CSIG).
- **Load balancing for AI**: ECMP alone is not enough; dynamic/global load balancing (DLB/GLB), flowlet-based rebalancing, or per-packet spraying is needed.
- **Topologies**: Clos (3-stage/5-stage), Dragonfly, Torus for scaling to tens of thousands of GPUs.
- **Ultra Ethernet Consortium (UEC)**: a new standard for Ethernet in AI clusters (2025+) that addresses the limitations of RoCEv2.
- **Storage for AI**: NVMe-oF, GPUDirect Storage, parallel file systems for checkpointing and dataset loading.
- **KPIs**: Job Completion Time (JCT), tail latency, fabric utilization, PFC storm detection.
### Cloud Networking and Resilience (Critelli, ISBN 979-8868824357)
A practical guide to building resilient cloud networks (Apress, 2026). The author is EMEA Lead for Networking & Resilience at AWS.
**Layered approach to resilience (by OSI layer):**
| Layer | Measures |
|-------|----------|
| L1 (Physical) | Redundant uplinks, diverse fibre paths, DWDM |
| L2 (Data link) | MLAG, LACP, fast spanning-tree convergence |
| L3 (Network) | BGP multi-homing, AS path prepend, Anycast |
| L4 (Transport) | Connection draining, slow start, health checks |
| L7 (Application) | DNS failover, global load balancing, cell-based architectures |
**Regulatory frameworks:** DORA (Digital Operational Resilience Act) and NIS2 require regular resilience testing, chaos engineering, and business continuity plans.
**AIOps in resilience:** ML-based prediction of traffic patterns, automatic scaling, proactive fault detection (a shift from reactive monitoring to predictive prevention).
### Zero Trust in Resilient Cloud and Network Architectures (Halley et al., ISBN 978-0-13-820460-0)
Cisco Press: a practical handbook for deploying Zero Trust architectures in hybrid and cloud environments.
**Implementation framework:**
- **User and Device Trust**: verify the identity of both the user and the device before granting access (SSE, Security Service Edge).
- **Application Access Policies**: granular rules at the application level rather than by IP address.
- **Greenfield vs Brownfield**: new networks built as Zero Trust from the ground up vs. migration of existing infrastructure.
- **Automation**: Terraform and Ansible for provisioning; Meraki, EVPN, Pub/Sub telemetry.
- **Industrial Zero Trust**: extension of the concept to OT/ICS environments.
- **Quantum security**: preparing network architectures for post-quantum cryptography.
### The Segmentation Blueprint (Kulkarni, ISBN 978-0-13-546236-2)
Cisco Press (2026): a pragmatic guide to network segmentation, from VLANs to nanosegmentation.
**Evolution of segmentation:**
| Generation | Technology | Scope |
|------------|------------|-------|
| Traditional | VLAN, ACL, firewall | Subnet |
| Microsegmentation | Security Groups, Network Policies | Workload / instance |
| Nanosegmentation | Service mesh (Istio, Linkerd), mTLS | Application / API / process |
**Segmentation maturity model:**
1. **Initial**: flat network, no segmentation
2. **Basic**: VLANs, firewall between environments
3. **Defined**: Security Groups, service access policies
4. **Managed**: microsegmentation, Network Policies, EVPN
5. **Optimized**: nanosegmentation, service mesh, Zero Trust, AI-driven policy management
**Key metric:** blast radius, i.e. how many workloads are exposed when a single node is compromised. The goal is to reduce it to a minimum.
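One way to make the blast radius metric concrete is to treat allowed flows as a directed graph and count what is reachable from a compromised workload. The sketch below is an illustrative model only: `allowed_flows` is a hypothetical mapping from a workload to the workloads it may connect to, and reachability is used as a stand-in for exposure.

```python
from collections import deque


def blast_radius(allowed_flows, compromised):
    """Return the set of workloads reachable from `compromised` via allowed flows."""
    reachable = set()
    queue = deque([compromised])
    while queue:
        node = queue.popleft()
        for peer in allowed_flows.get(node, ()):
            if peer != compromised and peer not in reachable:
                reachable.add(peer)
                queue.append(peer)
    return reachable


workloads = ["web", "app", "db", "batch"]
# Flat network: every workload may talk to every other workload.
flat = {w: [p for p in workloads if p != w] for w in workloads}
# Segmented: only web → app and app → db are allowed.
segmented = {"web": ["app"], "app": ["db"]}

print(len(blast_radius(flat, "batch")))       # every other workload is exposed
print(len(blast_radius(segmented, "batch")))  # nothing is reachable from batch
print(sorted(blast_radius(segmented, "web")))
```

Comparing the counts before and after a policy change gives a simple, trackable number for segmentation progress.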
### Segment Routing for SP and Enterprise Networks (Deragisch, ISBN 978-0-13-823101-9)
Cisco Press (2024): a comprehensive guide to Segment Routing for both the MPLS and the IPv6 data plane.
**SR-MPLS vs SRv6:**
| Property | SR-MPLS | SRv6 |
|----------|---------|------|
| SID length | 20 bits (MPLS label) | 128 bits (IPv6 address) |
| Data plane | MPLS | IPv6 + SRH (Segment Routing Header) |
| Signaling | IGP (IS-IS/OSPF) extensions | IGP + BGP extensions |
| Maturity | Mature, widely deployed | Emerging, standardization completed |
| Use case | SP networks, MPLS migration | Cloud, DC, 5G, end-to-end programmability |
**Advantages of SR over classic MPLS:**
- Eliminates LDP/RSVP-TE (signaling is part of the IGP)
- Traffic engineering state moves from the nodes into packet headers (source routing)
- Fast reroute (FRR) without additional protocols
- Egress Peer Engineering (EPE): selection of the AS exit point
- Micro-loop avoidance during convergence
**Migration strategies:** greenfield (new network), brownfield (gradual migration from MPLS), "SR in a box" (a combination of SR and LDP).
### Understanding and Designing Azure Networking (Stuart, Moreno, 2025)
A practical guide to designing Azure networks by two Microsoft Solution Engineers (former CCIEs).
**Key topics:**
| Area | Key services and concepts |
|------|---------------------------|
| **Topology** | Hub-and-spoke, Virtual WAN, multi-hub designs, Azure Route Server |
| **Hybrid connectivity** | ExpressRoute, VPN Gateway, SD-WAN integration |
| **Multi-cloud** | Azure ↔ AWS/GCP, cross-cloud fabrics |
| **Security** | NSG, Azure Firewall, DDoS Protection, WAF, AVNM, ZTNA |
| **DNS & PaaS** | Private Link, Private DNS Zones, Private Resolver, hybrid DNS forwarding |
| **Application delivery** | Azure Load Balancer, App Gateway, Front Door, Traffic Manager |
| **Monitoring** | Network Watcher, Traffic Analytics, Azure Monitor, Policy-as-code |
**Design decision framework:** gather requirements → analyze constraints → choose a topology → implement → monitor.
### Mastering Next-Gen Juniper Data Centers (Chatterjee, ISBN 978-0-13-533636-6)
Addison-Wesley (2026): a hands-on guide to EVPN VXLAN fabrics on Juniper devices.
**Key architectures:**
- **EVPN VXLAN fabric**: multi-tenant overlay networks with a BGP EVPN control plane and a VXLAN data plane.
- **Multivendor interoperability**: detailed procedures for EVPN across Juniper, Cisco NX-OS, and Arista EOS.
- **Multicast in EVPN VXLAN**: intra-subnet and inter-subnet multicast design (IGMP/MLD proxying, PIM, EVPN Type-6/7 routes).
- **Day-2 operations**: Juniper Apstra for automation (Terraform provider), telemetry (gNMI, OpenConfig).
- **Service chaining**: inserting NGFWs and load balancers into the fabric.
- **DCI with EVPN**: Over-the-Top (OTT) and Integrated Interconnect (VXLAN stitching, MPLS transit).
**Evolution from the previous book (Deploying Juniper Data Centers with EVPN VXLAN, 2024):** adds advanced topics such as multicast, interoperability, Apstra Day-2, and the observability stack.
### Intelligent Cloud Networking: AI-Driven Resource Management (Yadav, ISBN 9364220110)
The intersection of AI/ML and cloud network management (Addition Publishing, 2026).
**Applications of AI in network management:**
| Area | Technique | Benefit |
|------|-----------|---------|
| **Flow prediction** | LSTM, Transformer | Prediction of traffic patterns, proactive scaling |
| **Flow classification** | CNN, RL | Identification of traffic types for QoS |
| **Load balancing** | DRL (Deep RL) | Dynamic load distribution, reduced congestion |
| **Resource management** | Q-learning, DQN | Optimized allocation of CPU/memory/network |
| **Routing optimization** | DRL, GNN | Adaptive routing based on current conditions |
| **Congestion control** | ML-based CC | Predictive congestion control (instead of reacting to loss) |
| **Anomaly detection** | Autoencoders, Isolation Forest | Real-time detection of attacks and anomalies |
| **Blockchain security** | Smart contracts | Decentralized access control, audit trail |
**Technology directions:**
- **Ultra Ethernet Consortium (UEC)**: next-generation Ethernet for AI with lossless fabric, telemetry, and adaptive routing.
- **Path Aware Networking**: the application chooses its path based on current conditions (latency, loss, cost).
- **Self-optimizing networks**: a closed loop of telemetry → AI analysis → automated action → feedback.
## OpenStack Networking (Neutron)
OpenStack Neutron is an SDN framework for managing virtual networks in multi-tenant environments. It supports VLAN, VXLAN, and GRE tunnels, security groups, QoS, and LBaaS (Octavia).
### Backends
| Backend | Description | Suitable for |
|---------|-------------|--------------|
| **OVN (Open Virtual Network)** | Native OpenFlow/OVSDB backend; replaces the OVS+agent architecture | Production, scalable deployments |
| **OVS (Open vSwitch)** | Classic agent-based backend (neutron-openvswitch-agent) | Small deployments, legacy |
| **Linux Bridge** | Simple backend without OVS | Development, testing, embedded |
| **Hyper-V** | Windows Server backend | Hybrid environments |
### Key concepts
- **Networks, Subnets, Ports**: the basic network objects
- **Routers**: L3 forwarding between tenant networks (DVR for distributed routing)
- **Security Groups**: stateful firewall rules at the port level
- **Floating IPs**: public IPs mapped to instances (1:1 NAT)
- **LBaaS / Octavia**: load balancing as a service (HAProxy, amphora)
- **Trunk ports**: VLAN tagging for instances (parent + subports)
### Performance tuning
- **DPDK**: userspace packet processing, kernel bypass, lower latency
- **SR-IOV**: VF passthrough into the instance, minimal hypervisor overhead
- **NUMA pinning**: vCPU/memory/NIC affinity for compute instances
- **Hardware offload**: OVS TC Flower, ASAP²
### Use cases
- Multi-tenant cloud (public and private)
- Telco/NFVI (DPDK, SR-IOV, low-latency)
- SDN lab / network function virtualization
## Zero Trust Networking
Zero Trust is a security model of "never trust, always verify": no entity is implicitly trusted, regardless of where it sits in the network.
### Principles (NIST SP 800-207)
1. **All resources are external**: there is no trusted internal network
2. **Least privilege**: access only to the resources that are needed
3. **Micro-segmentation**: isolation of workloads down to individual processes/containers
4. **Encrypt everything**: TLS/mTLS for all communication
5. **Continuous verification**: every request is authenticated and authorized
6. **Dynamic policies**: rules change with context (user, device, location, time)
### Implementation layers
| Layer | Technology | Description |
|-------|------------|-------------|
| **Identity** | OIDC, SAML, LDAP | Authentication of users and devices |
| **Device** | MDM, UEM, device certificates | Verification of device posture (compliance, patch level) |
| **Network** | Micro-segmentation, firewall, SDN | Isolation of traffic between workloads |
| **Application** | mTLS, service mesh, API gateway | Applications enforce mutual authentication |
| **Data** | Encryption at rest + in transit, DLP | Data protection regardless of location |
| **Analytics** | SIEM, UEBA, AI/ML | Real-time detection of anomalies and threats |
### Tools and platforms
| Category | Tools |
|----------|-------|
| **ZTNA (Zero Trust Network Access)** | Cloudflare Access, Zscaler, Netskope, Palo Alto Prisma |
| **Service Mesh** | Istio, Linkerd, Consul Connect, Cilium |
| **Micro-segmentation** | VMware NSX, Illumio, Guardicore, Akamai |
| **BeyondCorp** | Google BeyondCorp (open-source: BeyondCorp Alliance) |
| **Security Service Edge (SSE)** | SWG, CASB, ZTNA in one (Zscaler, Netskope, Cloudflare) |
### Zero Trust in the data center
In a private DC, Zero Trust is deployed through:
- **EVPN VXLAN**: overlay network with tenant isolation
- **Network Policies** (Kubernetes): per-pod firewall rules
- **Cilium**: eBPF-based L3/L7 policy enforcement
- **WireGuard / IPsec**: encryption between nodes
- **HashiCorp Boundary**: identity-based access to servers without a bastion host
## Best practices
- **Network segmentation**: separate environments (dev/staging/prod) and tiers (web/app/db)
- **Least privilege access**: security groups allow only the traffic that is needed
- **Monitoring**: VPC Flow Logs, NetFlow, sFlow
- **Redundancy**: multi-AZ, multi-region for critical services
- **Encryption in transit**: TLS everywhere, mTLS for service-to-service
- **DDoS protection**: AWS Shield, Azure DDoS Protection, Cloudflare
- **MTU alignment**: consistent MTU along the entire path; check that the ICMP needed for PMTUD is not blocked
- **IP planning**: RFC 1918 (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16); avoid overlapping ranges so that peering stays possible
## Sources
Links, books, and standards: [sources/networking/sources.md](sources/networking/sources.md)
## TLS detail
### TLS 1.3 handshake (1-RTT)
```
Client Server
| |
| ClientHello (key_share, sig_algs) |
|─────────────────────────────────────────>|
| |
| ServerHello + EncryptedExtensions |
| + Certificate + CertificateVerify |
| + Finished |
|<─────────────────────────────────────────|
| |
| Finished |
|─────────────────────────────────────────>|
| |
| << Application Data >> |
```
- **0-RTT** (early data): the client sends data right away with its first message (on a resumed connection with a PSK)
- 0-RTT risk: replay attacks (an idempotent HTTP GET is generally safe to replay; POST needs replay protection)
- Compared with TLS 1.2: obsolete ciphers removed, AEAD required, faster handshake (2 RTT → 1 RTT)
### Cipher suites
TLS 1.3 suite names cover only the AEAD cipher and the hash; key exchange and authentication are negotiated separately through extensions.
| Suite | Key exchange | Auth | Encryption | MAC | Status |
|-------|-------------|------|-----------|-----|--------|
| `TLS_AES_128_GCM_SHA256` | (EC)DHE (not part of the suite) | Certificate signature, RSA/ECDSA/EdDSA (not part of the suite) | AES-128-GCM | AEAD | TLS 1.3, mandatory to implement |
| `TLS_AES_256_GCM_SHA384` | (EC)DHE (not part of the suite) | Certificate signature (not part of the suite) | AES-256-GCM | AEAD | TLS 1.3, higher security margin |
| `TLS_CHACHA20_POLY1305_SHA256` | (EC)DHE (not part of the suite) | Certificate signature (not part of the suite) | ChaCha20-Poly1305 | AEAD | TLS 1.3, mobile / no AES-NI |
| `ECDHE-ECDSA-AES128-GCM-SHA256` | ECDHE | ECDSA | AES-128-GCM | AEAD | TLS 1.2 (PFS) |
| `ECDHE-RSA-AES128-GCM-SHA256` | ECDHE | RSA | AES-128-GCM | AEAD | TLS 1.2 (PFS) |
PFS (Perfect Forward Secrecy): if the private key is compromised, previously captured traffic still cannot be decrypted (ECDHE with ephemeral keys).
### Certificate chain validation
```
1. The client receives the certificate chain from the server
2. Validation:
   a. Validity period: the certificate is within notBefore / notAfter
   b. CRL / OCSP: the certificate has not been revoked (OCSP stapling reduces latency)
   c. Signature chain: the signature of each certificate in the chain is verified with the issuer's public key
   d. Root CA: the last certificate in the chain is trusted (present in the client's trust store)
3. CN / SAN: the domain name in the certificate must match the target domain
```
Typical chain: `Leaf Cert → Intermediate CA → Root CA` (self-signed, in the trust store).
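Most of these validation steps are what a TLS library does for the client once verification is switched on. The following is a minimal sketch using Python's standard `ssl` module, assuming a client that wants TLS 1.3 only; `strict_client_context` is a hypothetical helper name, and revocation checking (step b) is not enabled by this default configuration.

```python
import ssl


def strict_client_context():
    """Client context: verify chain and hostname, refuse anything below TLS 1.3."""
    # The default context loads the system trust store and enables
    # certificate verification and hostname checking.
    ctx = ssl.create_default_context()
    ctx.minimum_version = ssl.TLSVersion.TLSv1_3
    return ctx


ctx = strict_client_context()
print(ctx.verify_mode == ssl.CERT_REQUIRED, ctx.check_hostname)

# Usage (not executed here, needs network access):
#   import socket
#   with socket.create_connection(("example.com", 443)) as sock:
#       with ctx.wrap_socket(sock, server_hostname="example.com") as tls:
#           print(tls.version(), tls.cipher())
```

Pinning the minimum version is a deliberate trade-off: it rejects servers that only speak TLS 1.2, so it suits internal service-to-service traffic better than general-purpose clients.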
*Last revision: 2026-06-03*

207
ORACLE.en.md Normal file

@@ -0,0 +1,207 @@
# 🏛️ Oracle Database
## Overview
Oracle Database is a proprietary relational database with the broadest range of enterprise features — RAC clustering, Active Data Guard, partitioning, advanced compression, in-memory options, and Oracle Exadata integration. Dominant in the corporate world, finance, telecommunications, and mainframe ecosystem.
## Architecture
### Oracle instance + database
```
Oracle Instance (memory + processes)
├── System Global Area (SGA)
│ ├── Database Buffer Cache
│ ├── Shared Pool (library cache, dictionary cache)
│ ├── Redo Log Buffer
│ ├── Java Pool
│ ├── Large Pool (backup, parallel)
│ └── In-Memory Column Store (option)
├── Program Global Area (PGA) — per session
└── Background processes
├── PMON (process monitor)
├── SMON (system monitor)
├── DBWn (database writer)
├── LGWR (log writer)
├── CKPT (checkpoint)
├── ARCn (archiver)
└── MMON (manageability monitor)
```
### Multitenant architecture (12c+)
```
Container Database (CDB)
├── Root (CDB$ROOT) — metadata, system objects
├── Seed (PDB$SEED) — template for new PDBs
├── Pluggable Database 1 (PDB1) — application A
├── Pluggable Database 2 (PDB2) — application B
└── Pluggable Database 3 (PDB3) — application C
```
Each PDB looks like a separate database but shares SGA and background processes. Advantage: higher density, simpler patching (CDB level), resource management per PDB.
### Oracle RAC (Real Application Clusters)
Multi-instance architecture — multiple servers access the same storage:
```text
Node 1 ─── Oracle ASM ─── Shared Storage (SAN/NFS)
Node 2 ─── Oracle ASM ─── Shared Storage (SAN/NFS)
Node 3 ─── Oracle ASM ─── Shared Storage (SAN/NFS)
Cache Fusion (private interconnect)
```
- Up to 64 nodes in a cluster
- **Cache Fusion** — transfer of dirty blocks between instances via private interconnect (RAC-specific)
- **ASM** (Automatic Storage Management) — clustered filesystem + volume manager
- **Service** — workload routing (primary, report, batch)
### Oracle Data Guard
| Mode | Protection | Latency | Use case |
|------|-----------|---------|----------|
| **Maximum Protection** | Zero data loss (sync) | Highest | Critical systems |
| **Maximum Availability** | Zero data loss (sync, fallback to async) | High | Enterprise standard |
| **Maximum Performance** | Async | Lowest | Remote DR |
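- **Active Data Guard**: standby open for reads (reporting, backup); requires a separate license
- **Far Sync**: the primary ships redo synchronously to a nearby Far Sync instance, which forwards it asynchronously to the remote standby (zero data loss over long distances without the commit latency of a remote synchronous standby)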
### Oracle Exadata
Hardware+software platform for Oracle DB:
| Component | Description |
|-----------|-------------|
| **Database Servers** | x86 (Xeon), 2-8× per rack, NVMe, 1.5-6 TB RAM |
| **Storage Servers** | Total capacity up to 2.7 PB raw per rack |
| **Smart Scan** | Predicate filtering at the storage layer (instead of DB server) |
| **Smart Flash Cache** | Multiple caching layers (RAM, Flash, disk) |
| **RDMA over Converged Ethernet** | Low latency between DB and storage servers |
Suitable for: largest OLTP, data warehousing, consolidation.
## Key enterprise features
| Feature | Description | Competition |
|---------|-------------|-------------|
| **RAC** | Shared-everything cluster up to 64 nodes | MSSQL AlwaysOn FCI (active-passive; 2 nodes in Standard Edition) |
| **Active Data Guard** | Standby for reads, far sync, automatic failover | MSSQL AlwaysOn AG, PostgreSQL streaming |
| **Partitioning** | Range, List, Hash, Composite, interval, reference | PostgreSQL (declarative partitioning 10+) |
| **Advanced Compression** | Columnar, HCC (Exadata), OLTP compression | InnoDB page compression, PG TOAST |
| **In-Memory** | Column store in SGA for real-time analytics | PG (no native), MSSQL (columnstore index) |
| **Advanced Security** | TDE, data redaction, VPD, audit vault, database firewall | PG (pgcrypto, pgaudit), MSSQL (TDE, Always Encrypted) |
| **Flashback** | Querying historical data (Flashback Query, Table, Database) | PG (temporal tables via extension), MSSQL (system-versioned) |
| **Sharding** | System-managed, composite, user-defined | MongoDB (native), Vitess (MySQL), Citus (PG) |
| **ASM** | Clustered filesystem + volume manager | VMware VMFS, Windows CSV |
## Oracle licensing detail
### Editions
| Edition | Metric | Price (indicative) | Limitations |
|---------|--------|-------------------|-------------|
| **Oracle Database Standard Edition 2 (SE2)** | Per socket (no core factor) or Named User Plus | ~$17,500/socket | Max 16 CPU threads per server, max 2 sockets; no RAC since 19c (failover via Standard Edition High Availability); no partitioning, in-memory, or compression |
| **Oracle Database Enterprise Edition (EE)** | Per processor license (cores × core factor, 0.5 for x86) | ~$47,500/processor license | No resource limits; RAC, partitioning, in-memory, compression, and Advanced Security are available, but each as a separately priced **optional license** |
| **Oracle Database Enterprise Edition (RAC)** | Per processor license (EE + RAC option) | ~$47,500 + $23,000/processor license | EE + RAC clustering |
### Optional licenses (options) — EE only
| Option | Price (indicative / processor license) | Use case |
|--------|--------------------------|----------|
| **Real Application Clusters (RAC)** | ~$23,000 | Multi-instance cluster |
| **Active Data Guard** | ~$10,000 | Standby for reads, far sync, automatic failover |
| **Partitioning** | ~$11,500 | Range, list, hash, interval, reference, system |
| **Advanced Compression** | ~$11,500 | OLTP compression, HCC (Exadata), JSON compression |
| **Advanced Security** | ~$15,000 | TDE, data redaction, database firewall |
| **In-Memory Database** | ~$23,000 | Column store in SGA for real-time analytics |
| **Database Vault** | ~$5,750 | Separation of duties, multi-tenancy security |
| **Multitenant (EE)** | ~$17,500 (only beyond 3 PDBs) | CDB/PDB: up to 3 user-created PDBs per CDB are allowed in EE without this option (since 19c); more PDBs require the Multitenant option |
| **Spatial / Graph** | ~$5,750 | Geospatial data, property graph |
| **Label Security** | ~$5,750 | Row-level security with classifications |
### Oracle Cloud (OCI) licensing
| Service | Model | Price | Note |
|---------|-------|-------|------|
| **OCI Base Database (RDS-like)** | BYOL or License Included | ~$1-5/hour (BYOL cheaper) | Single instance or RAC, automatic backup, patching |
| **OCI Exadata Database Service** | BYOL or License Included | ~$5-30/hour (depending on shape) | Exadata X9M/X10M in OCI, elastic, full Exadata features |
| **OCI Autonomous Database** | Per CPU (ECPU) | ~$0.50-3.00/ECPU/hour | Auto-tuning, auto-scaling, auto-patching |
| **BYOL (Bring Your Own License)** | Own Oracle license in OCI | Infrastructure only | Can use existing perpetual license, including support |
### RAC sizing — license cost
```text
4-node RAC, each node 2× EPYC 9654 (96C) = 192 cores per node
Core factor 0.5 → 96 Oracle licenses per node
4 × 96 = 384 Oracle EE licenses
EE: 384 × $47,500 = $18,240,000
RAC option: 384 × $23,000 = $8,832,000
Support 22% annually: ($18,240,000 + $8,832,000) × 0.22 = $5,955,840/year
Tip: For RAC, consider smaller CPUs (e.g., 64C instead of 96C) — license cost often exceeds hardware cost.
```
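The same sizing arithmetic can be wrapped in a small function to compare CPU choices. This is a sketch using the indicative list prices from this section (not a quote); `rac_license_cost` and its parameter names are made up for the example.

```python
import math


def rac_license_cost(nodes, cores_per_node, core_factor=0.5,
                     ee_price=47_500, rac_price=23_000, support_percent=22):
    """Indicative EE + RAC license cost and annual support for a cluster."""
    # Fractional processor licenses are rounded up.
    licenses = math.ceil(nodes * cores_per_node * core_factor)
    ee = licenses * ee_price
    rac = licenses * rac_price
    support = (ee + rac) * support_percent // 100
    return {
        "licenses": licenses,
        "ee": ee,
        "rac": rac,
        "annual_support": support,
        "first_year_total": ee + rac + support,
    }


big = rac_license_cost(nodes=4, cores_per_node=192)    # 2× 96-core CPUs per node
small = rac_license_cost(nodes=4, cores_per_node=128)  # 2× 64-core CPUs per node
print(big)
print(big["first_year_total"] - small["first_year_total"])
```

Running both variants side by side shows how much of the budget depends on core count alone, which is the point of the tip about smaller CPUs.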
### Oracle vs PostgreSQL — comparison
| Area | Oracle | PostgreSQL |
|------|--------|------------|
| **License** | Proprietary (~$17.5k per socket for SE2, ~$47.5k per processor license for EE, plus 22% support annually) | Open source (PostgreSQL license, MIT-like) |
| **RAC clustering** | Native, shared-everything | None (Citus = shared-nothing) |
| **Multitenant** | CDB/PDB architecture | None (schemas per tenant) |
| **Parallel execution** | Mature (auto DOP, parallel index scan) | Good (parallel seq/index scan, join) |
| **Storage management** | ASM (integrated) | OS volume / LVM |
| **Materialized views** | With refresh on commit, query rewrite | No query rewrite |
| **Partitioning** | 40+ options (interval, referential, system) | Declarative (range, list, hash since 10+) |
| **In-memory** | Columnar in SGA | Not native |
| **Standby usage** | Active Data Guard (read-only, license) | Hot standby (read-only, free) |
| **Cloud** | OCI (Oracle Cloud), AWS RDS, Azure | All clouds (native) |
## Recommendations — where Oracle is better
| Area | Oracle | Competition | Why Oracle |
|------|--------|-------------|------------|
| **License cost (4-node RAC, 768 cores = 384 processor licenses)** | ~$33M (1st year incl. support) | PostgreSQL: $0 | Oracle is at a disadvantage here: 22% support is charged annually on the license fee |
| **Vendor lock-in** | High (PL/SQL-specific code, migration is difficult) | PostgreSQL: none | Oracle is at a disadvantage here: leaving requires migration tooling such as ora2pg or AWS DMS |
| **Enterprise OLTP** | RAC + ASM, zero-downtime patching | MSSQL (FCI limit 2 nodes) | Shared-everything cluster, transparent failover |
| **Finance / Banking** | Audit Vault, Database Vault, TDE, VPD | PG (pgaudit, row-level security) | Compliance certifications (SOX, PCI, GDPR) |
| **Consolidation** | Multitenant (CDB/PDB) — hundreds of DBs on 1 instance | PG (schema per tenant) | Lower overhead, simpler management |
| **Data Warehouse** | Exadata Smart Scan, parallel execution, in-memory | ClickHouse (specialized) | Hybrid workload (OLTP + DW in one DB) |
| **High-end hardware** | Exadata engineered system | PG (runs on anything) | Full-stack HW+SW optimization |
| **Partitioning** | Range of options (reference, interval, system) | PG (basic) | 10+ years lead in implementation |
| **Flashback / recovery** | Flashback Database, Table, Query — any point in time | PG (PITR, point-in-time) | Faster, more granular recovery |
| **Ecosystem** | OEM, Data Pump, SQL Developer, Toad, GoldenGate | PG (pgAdmin, pg_dump, Patroni) | Decades of enterprise tooling |
### When to use Oracle
- **Critical OLTP systems** — banking, payment processing, trading
- **Enterprise consolidation** — hundreds of DBs on one RAC cluster (multitenant)
- **Regulated environments** — finance, healthcare, government (audit, security, compliance)
- **Oracle ecosystem** — E-Business Suite, PeopleSoft, Siebel, JD Edwards
- **Exadata customers** — maximum performance for hybrid workload (OLTP + DW)
- **GoldenGate replication** — heterogeneous replication (Oracle → Kafka, Oracle → PostgreSQL)
- **Cloud migration** — OCI, AWS RDS for Oracle, Azure Oracle Database Service
### When to use something else
- **Startup / SME** → PostgreSQL (free, sufficient performance, no vendor lock-in)
- **Web / LAMP stack** → MySQL (simpler, cheaper, broad support)
- **Cloud-native** → Aurora, CockroachDB (architecture for cloud, not port of on-prem to cloud)
- **Need only SQL** → PostgreSQL (Oracle overhead not worth it)
- **Horizontal write scaling** → Cassandra (RAC is shared-everything: every node can write, but write-heavy workloads contend on Cache Fusion and shared storage)
## Sources
References, books, and standards: [sources/databases/sources.en.md](sources/databases/sources.en.md)
### Recommended reading
| Book | Authors | ISBN | Description |
|------|---------|------|-------------|
| Oracle Database 23ai New Features | Oracle Corporation | — | Official guide to new features — AI Vector Search, JSON Relational Duality, property graphs, schema privileges |
| Expert Oracle Database Architecture (3rd ed.) | Thomas Kyte, Darl Kuhn | 978-1484249602 | Comprehensive explanation of Oracle architecture, from storage to RAC and Data Guard |
*Last revision: 2026-06-03*

207
ORACLE.md Normal file

@@ -0,0 +1,207 @@
# 🏛️ Oracle Database
## Overview
Oracle Database is a proprietary relational database with the broadest range of enterprise features: RAC clustering, Active Data Guard, partitioning, advanced compression, in-memory options, and Oracle Exadata integration. It is dominant in the corporate world, finance, telecommunications, and the mainframe ecosystem.
## Architecture
### Oracle instance + database
```
Oracle Instance (memory + processes)
├── System Global Area (SGA)
│ ├── Database Buffer Cache
│ ├── Shared Pool (library cache, dictionary cache)
│ ├── Redo Log Buffer
│ ├── Java Pool
│ ├── Large Pool (backup, parallel)
│ └── In-Memory Column Store (option)
├── Program Global Area (PGA) — per session
└── Background processes
├── PMON (process monitor)
├── SMON (system monitor)
├── DBWn (database writer)
├── LGWR (log writer)
├── CKPT (checkpoint)
├── ARCn (archiver)
└── MMON (manageability monitor)
```
### Multitenant architecture (12c+)
```
Container Database (CDB)
├── Root (CDB$ROOT): metadata, system objects
├── Seed (PDB$SEED): template for new PDBs
├── Pluggable Database 1 (PDB1): application A
├── Pluggable Database 2 (PDB2): application B
└── Pluggable Database 3 (PDB3): application C
```
Each PDB looks like a standalone database but shares the SGA and background processes. Benefits: higher density, simpler patching (at the CDB level), and per-PDB resource management.
### Oracle RAC (Real Application Clusters)
A multi-instance architecture in which several servers access the same storage:
```text
Node 1 ─── Oracle ASM ─── Shared Storage (SAN/NFS)
Node 2 ─── Oracle ASM ─── Shared Storage (SAN/NFS)
Node 3 ─── Oracle ASM ─── Shared Storage (SAN/NFS)
Cache Fusion (private interconnect)
```
- Up to 64 nodes per cluster
- **Cache Fusion**: transfers dirty blocks between instances over the private interconnect (RAC-specific)
- **ASM** (Automatic Storage Management): clustered filesystem + volume manager
- **Service**: workload routing (primary, reporting, batch)
### Oracle Data Guard
| Mode | Protection | Latency | Use case |
|------|------------|---------|----------|
| **Maximum Protection** | Zero data loss (sync) | Highest | Critical systems |
| **Maximum Availability** | Zero data loss (sync, falls back to async) | High | Enterprise standard |
| **Maximum Performance** | Async | Lowest | Long-distance DR |
- **Active Data Guard**: readable standby (reporting, backup); requires a license
- **Far Sync**: the primary ships redo synchronously to a nearby Far Sync instance, which forwards it asynchronously to the remote standby (a compromise between protection and latency)
### Oracle Exadata
A combined hardware and software platform for Oracle DB:
| Component | Description |
|-----------|-------------|
| **Database Servers** | x86 (Xeon), 2-8× per rack, NVMe, 1.5-6 TB RAM |
| **Storage Servers** | Total capacity up to 2.7 PB raw per rack |
| **Smart Scan** | Predicate filtering at the storage layer (instead of on the DB server) |
| **Smart Flash Cache** | Multiple caching tiers (RAM, flash, disk) |
| **RDMA over Converged Ethernet** | Low latency between DB and storage servers |
Suited for: the largest OLTP systems, data warehousing, consolidation.
## Key enterprise features
| Feature | Description | Competition |
|---------|-------------|-------------|
| **RAC** | Shared-everything cluster of up to 64 nodes | MSSQL AlwaysOn FCI (2 nodes in Standard edition) |
| **Active Data Guard** | Readable standby, far sync, automatic failover | MSSQL AlwaysOn AG, PostgreSQL streaming replication |
| **Partitioning** | Range, List, Hash, Composite, interval, reference | PostgreSQL (declarative partitioning, 10+) |
| **Advanced Compression** | Columnar, HCC (Exadata), OLTP compression | InnoDB page compression, PG TOAST |
| **In-Memory** | Column store in the SGA for real-time analytics | PG (no native), MSSQL (columnstore index) |
| **Advanced Security** | TDE, data redaction, VPD, audit vault, database firewall | PG (pgcrypto, pgaudit), MSSQL (TDE, Always Encrypted) |
| **Flashback** | Querying historical data (Flashback Query, Table, Database) | PG (temporal tables via extension), MSSQL (system-versioned) |
| **Sharding** | System-managed, composite, user-defined | MongoDB (native), Vitess (MySQL), Citus (PG) |
| **ASM** | Clustered filesystem + volume manager | VMware VMFS, Windows CSV |
## Oracle licensing detail
### Editions
| Edition | Metric | Price (approx.) | Limitations |
|---------|--------|-----------------|-------------|
| **Oracle Database Standard Edition 2 (SE2)** | Per core (core factor 0.5) | ~$17,500 per processor license | Max 16 CPU threads (per server), max 2 sockets, no RAC (only Oracle RAC One), no partitioning, in-memory, or compression |
| **Oracle Database Enterprise Edition (EE)** | Per core (core factor 0.5) | ~$47,500 per processor license | No limits; all features available (RAC, partitioning, in-memory, compression, Advanced Security), but each as a **separately licensed option** |
| **Oracle Database Enterprise Edition (RAC)** | Per core (EE + RAC option) | ~$47,500 + $23,000 per processor license | EE + RAC clustering |
### Optional licenses (options), EE only
| Option | Price (approx., per processor license) | Use case |
|--------|----------------------------------------|----------|
| **Real Application Clusters (RAC)** | ~$23,000 | Multi-instance cluster |
| **Active Data Guard** | ~$10,000 | Readable standby, far sync, automatic failover |
| **Partitioning** | ~$11,500 | Range, list, hash, interval, reference, system |
| **Advanced Compression** | ~$11,500 | OLTP compression, HCC (Exadata), JSON compression |
| **Advanced Security** | ~$15,000 | TDE, data redaction, database firewall |
| **In-Memory Database** | ~$23,000 | Column store in the SGA for real-time analytics |
| **Database Vault** | ~$5,750 | Separation of duties, multi-tenancy security |
| **Multitenant (EE)** | Free (since 21c) | CDB/PDB: up to 3 PDBs per CDB in EE without an extra license; unlimited with the Multitenant option (~$17,500) |
| **Spatial / Graph** | ~$5,750 | Geospatial data, property graph |
| **Label Security** | ~$5,750 | Row-level security with classification labels |
### Oracle Cloud (OCI) licensing
| Service | Model | Price | Note |
|---------|-------|-------|------|
| **OCI Base Database (RDS-like)** | BYOL or License Included | ~$1-5/hour (BYOL is cheaper) | Single instance or RAC, automatic backups, patching |
| **OCI Exadata Database Service** | BYOL or License Included | ~$5-30/hour (depending on shape) | Exadata X9M/X10M in OCI, elastic, full Exadata features |
| **OCI Autonomous Database** | Per CPU (ECPU) | ~$0.50-3.00/ECPU/hour | Auto-tuning, auto-scaling, auto-patching |
| **BYOL (Bring Your Own License)** | Your own Oracle licenses in OCI | Infrastructure only | Existing perpetual licenses, including support, can be reused |
### RAC sizing: license cost
```text
4-node RAC, each node 2× EPYC 9654 (96C) = 192 cores per node
Core factor 0.5 → 96 Oracle licenses per node
4 × 96 = 384 Oracle EE licenses
EE: 384 × $47,500 = $18,240,000
RAC option: 384 × $23,000 = $8,832,000
Support at 22% per year: ($18,240,000 + $8,832,000) × 0.22 = $5,955,840/year
Tip: For RAC, consider smaller CPUs (e.g. 64C instead of 96C); license cost often exceeds hardware cost.
```
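The sizing arithmetic can be reproduced with a short script, which makes it easy to compare CPU choices. This is a minimal sketch, not an Oracle pricing tool: the function name `rac_license_cost` and its parameters are hypothetical, and the prices, the 0.5 core factor, and the 22% support rate are simply the approximate figures quoted in this section.

```python
import math


def rac_license_cost(nodes, cores_per_node, core_factor=0.5,
                     ee_price=47_500, rac_price=23_000, support_rate=0.22):
    """Estimate EE + RAC license cost for a cluster (hypothetical helper).

    Prices are the approximate list prices quoted in this document,
    not a vendor quote.
    """
    # Physical cores are multiplied by the core factor; fractions round up.
    licenses = math.ceil(nodes * cores_per_node * core_factor)
    ee = licenses * ee_price
    rac = licenses * rac_price
    # Annual support is charged as a percentage of the license fees.
    support = round((ee + rac) * support_rate)
    return {
        "licenses": licenses,
        "ee": ee,
        "rac": rac,
        "support_per_year": support,
        "first_year_total": ee + rac + support,
    }


# 4 nodes, each with 2 x 96-core CPUs = 192 cores per node
cost = rac_license_cost(nodes=4, cores_per_node=192)
for item, value in cost.items():
    print(f"{item}: {value:,}")
```

With the 64-core CPUs from the tip (`rac_license_cost(4, 128)`), the same sketch yields 256 licenses and a first-year total of about $22M, roughly $11M less than the 96-core configuration.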
### Oracle vs PostgreSQL: comparison
| Area | Oracle | PostgreSQL |
|------|--------|------------|
| **License** | Proprietary (licensed per core; ~$17.5k-47.5k per processor license + 22% annual support) | Open source (PostgreSQL license, MIT-like) |
| **RAC clustering** | Native, shared-everything | None (Citus = shared-nothing) |
| **Multitenant** | CDB/PDB architecture | None (schemas per tenant) |
| **Parallel execution** | Mature (auto DOP, parallel index scan) | Good (parallel seq/index scan, join) |
| **Storage management** | ASM (integrated) | OS volume / LVM |
| **Materialized views** | With refresh on commit, query rewrite | No query rewrite |
| **Partitioning** | 40+ options (interval, reference, system) | Declarative (range, list since 10; hash since 11) |
| **In-memory** | Columnar in SGA | Not native |
| **Standby usage** | Active Data Guard (read-only, license) | Hot standby (read-only, free) |
| **Cloud** | OCI (Oracle Cloud), AWS RDS, Azure | All clouds (native) |
## Recommendations: where Oracle is better
| Area | Oracle | Competition | Why Oracle |
|------|--------|-------------|------------|
| **License cost (4-node RAC, 384 processor licenses)** | ~$33M (1st year incl. support) | PostgreSQL: $0 | Oracle: 22% support annually on license fee |
| **Vendor lock-in** | High (GoldenGate migration difficult, PL/SQL specific) | PostgreSQL: none | MySQL and PG have migration tools from Oracle (ora2pg, AWS DMS) |
| **Enterprise OLTP** | RAC + ASM, zero-downtime patching | MSSQL (FCI limited to 2 nodes in Standard edition) | Shared-everything cluster, transparent failover |
| **Finance / Banking** | Audit Vault, Database Vault, TDE, VPD | PG (pgaudit, row-level security) | Compliance certifications (SOX, PCI, GDPR) |
| **Consolidation** | Multitenant (CDB/PDB): hundreds of DBs on 1 instance | PG (schema per tenant) | Lower overhead, simpler management |
| **Data Warehouse** | Exadata Smart Scan, parallel execution, in-memory | ClickHouse (specialized) | Hybrid workload (OLTP + DW in one DB) |
| **High-end hardware** | Exadata engineered system | PG (runs on anything) | Full-stack HW+SW optimization |
| **Partitioning** | Range of options (reference, interval, system) | PG (basic) | 10+ years lead in implementation |
| **Flashback / recovery** | Flashback Database, Table, Query: any point in time | PG (PITR, point-in-time) | Faster, more granular recovery |
| **Ecosystem** | OEM, Data Pump, SQL Developer, Toad, GoldenGate | PG (pgAdmin, pg_dump, Patroni) | Decades of enterprise tooling |
### When to use Oracle
- **Critical OLTP systems**: banking, payment processing, trading
- **Enterprise consolidation**: hundreds of DBs on one RAC cluster (multitenant)
- **Regulated environments**: finance, healthcare, government (audit, security, compliance)
- **Oracle ecosystem**: E-Business Suite, PeopleSoft, Siebel, JD Edwards
- **Exadata customers**: maximum performance for hybrid workloads (OLTP + DW)
- **GoldenGate replication**: heterogeneous replication (Oracle → Kafka, Oracle → PostgreSQL)
- **Cloud migration**: OCI, AWS RDS for Oracle, Oracle Database@Azure
### When to use something else
- **Startup / SME** → PostgreSQL (free, sufficient performance, no vendor lock-in)
- **Web / LAMP stack** → MySQL (simpler, cheaper, broad support)
- **Cloud-native** → Aurora, CockroachDB (architecture built for the cloud, not a port of on-prem to cloud)
- **Need only SQL** → PostgreSQL (Oracle overhead not worth it)
- **Horizontal write scaling** → Cassandra (RAC is shared-everything: every node can write, but write-heavy workloads contend on Cache Fusion and shared storage)
## Sources
References, books, and standards: [sources/databases/sources.md](sources/databases/sources.md)
### Recommended reading
| Book | Authors | ISBN | Description |
|------|---------|------|-------------|
| Oracle Database 23ai New Features | Oracle Corporation | - | Official guide to new features: AI Vector Search, JSON Relational Duality, property graphs, schema privileges |
| Expert Oracle Database Architecture (3rd ed.) | Thomas Kyte, Darl Kuhn | 978-1484249602 | Comprehensive explanation of Oracle architecture, from storage to RAC and Data Guard |
*Last revision: 2026-06-03*

337
OS.en.md Normal file

@@ -0,0 +1,337 @@
# Operating Systems
> Overview of Linux distributions and Microsoft Windows for server, container, and AI/GPU workloads, including support lifecycle, EOL dates, and comparison.
---
## Distribution overview
| Distribution | Family | Package manager | Init | Security | Reference platform |
|-------------|--------|----------------|------|----------|-------------------|
| **Ubuntu LTS** | Debian | apt (deb) | systemd | AppArmor | NVIDIA DGX, widest AI/GPU support |
| **Debian** | Debian | apt (deb) | systemd | AppArmor | General-purpose server, stability |
| **RHEL** | Red Hat | dnf (rpm) | systemd | SELinux | Enterprise standard, SAP, Oracle DB |
| **Rocky Linux** | Red Hat | dnf (rpm) | systemd | SELinux | RHEL binary compatible (free) |
| **AlmaLinux** | Red Hat | dnf (rpm) | systemd | SELinux | RHEL binary compatible (free) |
| **SLES** | SUSE | zypper (rpm) | systemd | AppArmor | HPC, SAP, mainframe |
| **OpenSUSE Leap** | SUSE | zypper (rpm) | systemd | AppArmor | Desktop, development |
| **OpenSUSE Tumbleweed** | SUSE | zypper (rpm) | systemd | AppArmor | Rolling release, bleeding edge |
| **Fedora** | Red Hat | dnf (rpm) | systemd | SELinux | Desktop, technology preview |
| **Arch Linux** | Independent | pacman | systemd | — | Rolling, power users |
| **Alpine Linux** | Independent | apk | OpenRC | — | Container image, embedded |
| **Flatcar Container Linux** | Independent | — (image-based) | systemd | — | K8s worker node, minimal footprint |
| **Bottlerocket** | Independent | — (image-based) | systemd | — | AWS K8s, minimal footprint |
---
## Support lifecycle and EOL dates
> **Standard:** base support (bug fixes, security). **LTS/ELS:** extended support (security only).
> ESM = Ubuntu Extended Security Maintenance, EUS = RHEL Extended Update Support, LTSS = SUSE Long Term Service Pack Support.
### Ubuntu LTS
| Version | Release | Standard support | ESM / Ubuntu Pro | Note |
|---------|---------|-----------------|------------------|------|
| **20.04 LTS** (Focal) | 2020-04 | End 2025-04 | End 2030-04 | Last release with Python 2 |
| **22.04 LTS** (Jammy) | 2022-04 | End 2027-04 | End 2032-04 | NVIDIA DGX standard |
| **24.04 LTS** (Noble) | 2024-04 | End 2029-04 | End 2034-04 | Latest GPU/CUDA support |
| **26.04 LTS** (planned) | 2026-04 | End 2031-04 | End 2036-04 | — |
### RHEL
| Version | Release | Full support | Maintenance support | Extended life cycle |
|---------|---------|-------------|-------------------|-------------------|
| **7** | 2014-06 | End 2019-08 | End 2024-06 | End 2028-06 (ELS) |
| **8** | 2019-05 | End 2024-05 | End 2029-05 | End 2034-06 (ELS) |
| **9** | 2022-05 | End 2027-05 | End 2032-05 | End 2037-06 (ELS) |
| **10** (planned) | 2025 | End 2029 | End 2034 | — |
### Rocky Linux / AlmaLinux
| Version | Release | Support until | RHEL compatible | Note |
|---------|---------|-------------|-----------------|------|
| **8** | 2021-06 | 2029-05 | Yes (since RHEL 8.4) | Alma/Rocky |
| **9** | 2022-07 | 2032-05 | Yes (since RHEL 9.0) | Alma/Rocky |
### Debian
| Version | Release | Full support | LTS support | ELTS (paid) |
|---------|---------|-------------|-------------|-------------|
| **11** (Bullseye) | 2021-08 | 2024-08 | End 2026-08 | End 2028-08 |
| **12** (Bookworm) | 2023-06 | 2026-06 | End 2028-06 | End 2030-06 |
| **13** (Trixie) | 2025 (expected) | ~3 years post-release | ~5 years post-release | — |
### SLES
| Version | Release | General support | LTSS | Note |
|---------|---------|---------------|------|------|
| **15 SP3** | 2021-06 | End 2024-12 | End 2027-12 | — |
| **15 SP4** | 2022-06 | End 2025-12 | End 2028-12 | — |
| **15 SP5** | 2023-06 | End 2026-12 | End 2029-12 | Current SP |
| **15 SP6** | 2024-10 | End 2027-12 | End 2030-12 | — |
### Fedora
| Version | Release | EOL | Note |
|---------|---------|-----|------|
| **38** | 2023-04 | 2024-05 | — |
| **39** | 2023-11 | 2024-12 | — |
| **40** | 2024-04 | 2025-05 | — |
| **41** | 2024-11 | 2025-12 | — |
Fedora releases a new version roughly every 6 months, and each release reaches EOL about 13 months after it ships. Fedora serves as the upstream for RHEL.
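The 13-month rule of thumb can be checked against the table with a few lines of Python. This is a minimal sketch under that assumption only: the helper names `add_months` and `estimated_fedora_eol` are hypothetical, dates are tracked at month granularity, and actual EOL dates are set by the Fedora project, not by this formula.

```python
from datetime import date


def add_months(d, months):
    """Return the first day of the month that lies `months` after `d`."""
    index = d.year * 12 + (d.month - 1) + months
    return date(index // 12, index % 12 + 1, 1)


def estimated_fedora_eol(release, lifetime_months=13):
    """Estimate EOL from the release month (rule of thumb, not official)."""
    return add_months(release, lifetime_months)


# Release months taken from the Fedora table in this section.
releases = {
    38: date(2023, 4, 1),
    39: date(2023, 11, 1),
    40: date(2024, 4, 1),
    41: date(2024, 11, 1),
}

for version, released in releases.items():
    eol = estimated_fedora_eol(released)
    print(f"Fedora {version}: released {released:%Y-%m}, estimated EOL {eol:%Y-%m}")
```

For all four listed releases the estimate matches the EOL month given in the table.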
### Alpine Linux
| Version | Release | EOL |
|---------|---------|-----|
| **3.18** | 2023-05 | 2025-05 |
| **3.19** | 2023-12 | 2025-12 |
| **3.20** | 2024-05 | 2026-05 |
| **3.21** | 2024-12 | 2026-12 |
---
## Kernel version per distribution
| Distribution | Kernel (default) | Kernel (HWE/enhanced) | Note |
|------------|-----------------|----------------------|------|
| Ubuntu 22.04 LTS | 5.15 (GA) | 6.5+ (HWE) | HWE from 22.04.2 |
| Ubuntu 24.04 LTS | 6.8 | — | — |
| RHEL 8 | 4.18 | — | Backported features |
| RHEL 9 | 5.14 | — | Backported features |
| RHEL 10 | 6.11+ (expected) | — | — |
| Rocky/Alma 8 | 4.18 | — | Same as RHEL 8 |
| Rocky/Alma 9 | 5.14 | — | Same as RHEL 9 |
| Debian 11 | 5.10 | 6.1 (backports) | — |
| Debian 12 | 6.1 | — | — |
| SLES 15 SP5 | 5.14 | — | — |
| SLES 15 SP6 | 6.4 | — | — |
| Fedora 40 | 6.8+ | — | Rolling upstream |
| Alpine 3.20 | 6.6 | — | — |
---
## Use case comparison
| Use case | Recommended distribution | Rationale |
|----------|------------------------|-----------|
| **AI/GPU cluster (DGX)** | Ubuntu 22.04 LTS / DGX OS | NVIDIA standard, CUDA, MLNX_OFED |
| **Enterprise K8s (OpenShift)** | RHEL 9 / RHCOS | Red Hat support, GPU Operator |
| **Vanilla K8s (on-prem)** | Ubuntu 22.04 LTS + Flatcar (workers) | Community support, minimal worker image |
| **HPC cluster (Slurm)** | Rocky Linux 9 / Ubuntu 22.04 | EL ecosystem + Lustre, or Ubuntu |
| **Traditional enterprise DB (Oracle, SAP)** | RHEL 9 / SLES 15 | Vendor certification |
| **Container host** | Ubuntu 22.04 / Alpine | Broad image compatibility / min size |
| **Development / desktop** | Fedora / Ubuntu 24.04 / OpenSUSE Tumbleweed | Latest packages, HW support |
| **Embedded / IoT** | Debian / Alpine / Yocto | Minimal footprint, stability |
| **Edge inference** | Ubuntu (ARM) / NVIDIA JetPack | Jetson, GPU support |
| **Mainframe (IBM z/Arch)** | SLES 15 / RHEL 9 | IBM certification |
---
## Package management comparison
| Feature | apt (Debian/Ubuntu) | dnf (RHEL/Rocky/Alma/Fedora) | zypper (SUSE) | pacman (Arch) | apk (Alpine) |
|---------|--------------------|------------------------------|---------------|---------------|-------------|
| **Package format** | .deb | .rpm | .rpm | .pkg.tar.zst | .apk |
| **Repo management** | /etc/apt/sources.list | /etc/yum.repos.d/ | /etc/zypp/repos.d/ | /etc/pacman.conf | /etc/apk/repositories |
| **Lock file** | — (apt-mark hold) | — (exclude) | — (lock) | — (IgnorePkg) | — |
| **Transactional update** | No | Yes (dnf history) | Yes (zypper history) | No | No |
| **Rollback** | No (manual) | Yes (dnf history rollback) | Yes (snapper + zypper) | No | No |
| **Delta updates** | No by default (optional debdelta) | Yes (deltarpm) | Yes (delta RPMs) | No | No |
| **Version (as of 2025)** | apt 2.7+ | dnf 4.18+ | zypper 1.14+ | pacman 6.1+ | apk 2.14+ |
---
## Security model comparison
| Feature | SELinux (RHEL derivatives) | AppArmor (Ubuntu/Debian/SUSE) |
|---------|--------------------------|-------------------------------|
| **Type** | Mandatory Access Control (MAC) | Mandatory Access Control (MAC) |
| **Labeling** | Context-based (user:role:type) | Path-based (profile per executable) |
| **Configuration** | Policy (modules, booleans) | Profiles (text, in /etc/apparmor.d/) |
| **Modes** | Enforcing / Permissive / Disabled | Enforce / Complain / Disabled |
| **Learning curve** | Steep (complex policies) | Moderate (simpler profiles) |
| **Default in** | RHEL, Rocky, Alma, Fedora | Ubuntu, Debian, SLES, OpenSUSE |
| **Use case** | Enterprise multi-tenant, regulated | General-purpose server, app containment |
| **Container integration** | SELinux labels on container | AppArmor profile on container |
Additional layers:
- **seccomp** — syscall filtering (default in containerd, Docker)
- **Capabilities** — Linux capabilities (drop all except required)
- **cgroups v2** — resource isolation (CPU, memory, IO, PID)
- **User namespaces** — rootless containers (Podman, Docker rootless)
---
## Recommended migration path for EOL distributions
| From | To | Recommended approach |
|------|-----|---------------------|
| Ubuntu 20.04 (EOL 2025) | Ubuntu 22.04 or 24.04 | `do-release-upgrade` or fresh install |
| RHEL 7 (EOL 2024) | RHEL 8 or 9 | `leapp` upgrade, or fresh install |
| Rocky/Alma 8 | Rocky/Alma 9 | ELevate (`leapp`-based, AlmaLinux project) or fresh install; Rocky Linux does not officially support in-place major upgrades |
| Debian 11 (EOL LTS 2026) | Debian 12 | `apt full-upgrade` + new sources.list |
| SLES 15 SP4 (EOL 2025) | SLES 15 SP6 | `zypper migration` |
| Fedora 40 (EOL 2025) | Fedora 42+ | `dnf system-upgrade` |
---
## Microsoft Windows
### Windows Server — editions
| Edition | Price (approx) | Core limits | VM rights | Use case |
|---------|---------------|-------------|-----------|----------|
| **Datacenter** | ~$6,155 (2025) | Unlimited | Unlimited Windows VMs per host | Virtualization, SDDC, S2D, HCI |
| **Standard** | ~$1,069 (2025) | 2 CPU, unlimited cores | 2 Windows VMs + Hyper-V host | General server, AD, file server |
| **Essentials** | ~$501 (2025) | 1 CPU, max 10 cores | - | Small business (≤25 users) |
| **Azure Edition** | Pay-as-you-go | Per Azure VM | Per Azure | Azure-only, hotpatching |
Licensing: Windows Server Standard and Datacenter are licensed **per core**, with a minimum of 16 core licenses per server and, when licensing by virtual machine, a minimum of 8 core licenses per VM.
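The per-core minimums translate into a simple "whichever is larger" rule. The sketch below only illustrates the two minimums stated above; the function names are hypothetical, and real quotes depend on edition, license pack sizes, and the current Microsoft licensing terms.

```python
def core_licenses_for_server(physical_cores, minimum_per_server=16):
    """Core licenses needed to license a physical server (illustrative)."""
    return max(physical_cores, minimum_per_server)


def core_licenses_for_vm(virtual_cores, minimum_per_vm=8):
    """Core licenses needed when licensing a single VM (illustrative)."""
    return max(virtual_cores, minimum_per_vm)


# A small 8-core server still needs the 16-license minimum,
# while a 64-core host needs one license per core.
for cores in (8, 16, 64):
    print(f"{cores}-core server -> {core_licenses_for_server(cores)} core licenses")

# A 4-vCPU VM is rounded up to the 8-license minimum.
for vcpus in (4, 12):
    print(f"{vcpus}-vCPU VM -> {core_licenses_for_vm(vcpus)} core licenses")
```

The practical consequence is that small hosts and small VMs pay for cores they do not have, so consolidating onto fewer, larger hosts can lower the license count.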
### Windows Server — support lifecycle
> **Mainstream:** regular updates (bug fixes, security, features). **Extended:** security updates only (free).
> **ESU:** Extended Security Updates (paid tier, ~$45-300/core/year).
| Version | Release | Mainstream support | Extended support | ESU | Note |
|---------|---------|------------------|-----------------|-----|------|
| **2012 R2** | 2013-11 | 2018-10 | 2023-10 | End 2026-10 (year 3) | ESU paid, final year |
| **2016** | 2016-10 | 2022-01 | 2027-01 | - | First release with Windows containers and Nano Server |
| **2019** | 2019-01 | 2024-01 | 2029-01 | — | Last with Nano Server (1803 only) |
| **2022** | 2021-09 | 2026-10 | 2031-10 | — | Current, TPM 2.0, Credential Guard |
| **2025** | 2024-11 | 2029-10 | 2034-10 | — | Hotpatching, PowerShell 7, SMB over QUIC |
### Windows Server — version vs edition feature grid
| Version | Hyper-V | Storage Spaces Direct | Software-defined networking | Containers | GPU DDA / vGPU | WSL2 |
|---------|---------|---------------------|---------------------------|------------|---------------|------|
| 2016 Standard | Yes | No (DC only) | No (DC only) | Windows only | Yes | No |
| 2016 Datacenter | Yes | Yes | Yes | Windows | Yes | No |
| 2019 Standard | Yes | No | No | Windows | Yes | No |
| 2019 Datacenter | Yes | Yes | Yes | Windows | Yes | No |
| 2022 Standard | Yes | No | No | Windows + Linux | Yes | No |
| 2022 Datacenter | Yes | Yes | Yes | Windows + Linux (2022.2+) | Yes | No |
| 2025 Datacenter | Yes | Yes | Yes | Windows + Linux | Yes | Yes |
### Windows Desktop — support lifecycle
> **E = Enterprise, Pro = Professional, Home = Consumer**
> LTSC = Long Term Servicing Channel (stable, no feature updates)
| Version | Release | EOL (Home/Pro) | EOL (Enterprise) | LTSC EOL | Note |
|---------|---------|---------------|-----------------|----------|------|
| **10 21H2** | 2021-11 | - | 2024-06 | - | - |
| **10 22H2** | 2022-10 | 2025-10 | 2025-10 | - | Final Windows 10 |
| **10 LTSC 2021** | 2021-11 | - | - | 2032-01 | IoT Enterprise LTSC |
| **11 22H2** | 2022-09 | 2024-10 | 2025-10 | - | - |
| **11 23H2** | 2023-10 | 2025-11 | 2026-11 | - | - |
| **11 24H2** | 2024-10 | 2026-10 | 2027-10 | - | First with Recall, Copilot+ |
| **11 LTSC 2024** | 2024-10 | - | - | 2029-10 | Enterprise LTSC |
Windows 10 support **ended 2025-10-14** — last version with classic Control Panel.
### Windows vs Linux — comparison
| Feature | Windows Server | RHEL / Ubuntu |
|---------|---------------|---------------|
| **License (server)** | $500-6,000 (per core) + CAL | $0-800 (per node subscription) |
| **License (desktop)** | $100-200 (OEM/retail) | Free |
| **Support cost** | Included in license (SA/ESU) | $200-1,300/node/year (RHEL) |
| **Package management** | MSI, AppX, winget, NuGet | APT, DNF, Zypper |
| **Package count** | ~10,000 (chocolatey) | ~60,000+ (Ubuntu repo) |
| **Desktop GUI** | Windows Shell (mandatory) | Optional (GNOME, KDE, XFCE…) |
| **Server GUI** | Windows Shell (core-only since 2022) | CLI-only (standard) |
| **Kernel** | NT hybrid kernel (kernel-mode Win32) | Monolithic Linux kernel |
| **Device support** | OEM driver model (WHQL) | Open source + vendor drivers |
| **Container types** | Windows + Linux (WSL2) | Linux (Docker, Podman, containerd) |
| **Container registry** | Docker Hub, ACR, Nexus | Docker Hub, Quay, GHCR, Nexus… |
| **Container image size** | ~4-8 GB (Windows Server Core) | ~100 MB-1 GB (Alpine/Ubuntu) |
| **GPU passthrough** | DDA (Discrete Device Assignment) | GPU Direct, VFIO, SR-IOV |
| **AI/ML support** | WSL2 (CUDA), Azure ML | Native CUDA, ROCm, oneAPI |
| **CUDA support** | Yes (via WSL2 or Docker) | Native (nvidia-container-toolkit) |
| **Orchestration** | AD / GPO / SCCM / WAC | Ansible, Puppet, Salt, Foreman |
| **RBAC/AAA** | Active Directory (+ Kerberos) | LDAP, FreeIPA, SSSD, AD |
| **Remote management** | RDP, WinRM, PowerShell Remoting | SSH, Cockpit, Webmin |
| **Filesystem** | NTFS, ReFS, CSVFS | ext4, XFS, Btrfs, ZFS |
| **Max file system size** | 256 TB (NTFS), 1.2 YB (ReFS) | 1 EB (XFS), 16 EB (ZFS) |
| **Hypervisor** | Hyper-V (Type 1) | KVM (Type 2-like), Xen |
| **Dynamic memory** | Hyper-V Dynamic Memory | KSM, virtio-balloon (KVM) |
| **Live migration** | Hyper-V Live Migration | KVM Live Migration, vMotion |
### Windows specific features
| Feature | Description | Linux alternative |
|---------|------------|-------------------|
| **Active Directory** | Identity, auth, GPO, DNS, DHCP | FreeIPA, Samba AD DC, 389-ds, SSSD |
| **Group Policy** | Central desktop/server configuration | Ansible, Puppet, Salt (agent-based) |
| **Hyper-V + S2D** | Hyper-converged storage and virtualization (HCI) | Proxmox Ceph / oVirt + Gluster |
| **Failover Clustering** | Cluster-aware apps (SQL, File Server) | Pacemaker + Corosync + DRBD |
| **IIS** | Web server, ASP.NET host | Nginx, Apache (.NET host possible) |
| **PowerShell** | Scripting, Desired State Configuration | Bash, Python, Ansible |
| **Windows Admin Center** | GUI management | Cockpit, Webmin |
| **BitLocker** | Full disk encryption | LUKS + cryptsetup |
| **Windows Defender** | Antivirus + EDR | ClamAV, Wazuh, Osquery |
| **SQL Server** | Relational database | PostgreSQL, MySQL, MariaDB |
### Recommended OS per use case (including Windows)
| Use case | OS | Rationale |
|----------|-----|-------|
| **Active Directory / GPO / hybrid ID** | Windows Server 2022/2025 | AD is Windows-only |
| **SQL Server (failover cluster)** | Windows Server Datacenter + SQL EE | Always On FCI, ReFS |
| **Exchange / SharePoint** | Windows Server 2022 | Windows-only |
| **Enterprise desktop management** | Windows 11 Enterprise + Intune/SCCM | GPO, AD, enterprise MDM |
| **.NET / ASP.NET apps** | Windows Server / Linux (.NET Core) | .NET 6+ runs on Linux |
| **HCI (Microsoft stack)** | Windows Server Datacenter + S2D + Hyper-V | Azure Stack HCI |
| **Virtualization (mixed workload)** | Windows Server Datacenter (Hyper-V) | Linux + Windows VMs under one |
| **AI/GPU inference** | Linux (Ubuntu) + CUDA | NVIDIA optimal; WSL2 alternative |
| **Container orchestration (Windows nodes)** | Windows Server 2022/2025 + containerd | Windows Pods in AKS on-prem |
| **Tier 2 apps / web / API** | Ubuntu or RHEL (Linux) | Lower TCO, smaller footprint |
### Windows Server migration paths
| From | To | Recommended approach |
|------|-----|---------------------|
| Windows Server 2012 R2 (EOL 2023) | Windows Server 2022/2025 | In-place upgrade or fresh + migration |
| Windows Server 2016 (EOL 2027) | Windows Server 2022/2025 | In-place upgrade or fresh |
| Windows Server 2019 | Windows Server 2022/2025 | In-place upgrade (`Setup.exe /auto upgrade`) |
| Windows Server 2022 | Windows Server 2025 | In-place upgrade or fresh |
| Windows Server → Cloud | Azure VM / Azure Stack HCI | Azure Migrate, Storage Migration Service |
| Windows Server → Linux | Ubuntu / RHEL (re-platform) | Migrate app to .NET Core or alternative |
### Windows — API and operational limits
| Limit | Windows Server | Windows Desktop |
|-------|---------------|----------------|
| **Max RAM** | 24 TB (2025 Datacenter) | 2 TB (Pro/Enterprise), 128 GB (Home) |
| **Max CPU sockets** | 64 (Datacenter), 2 (Standard) | 2 |
| **Max CPU cores** | Unlimited | 128 (Pro), 64 (Home) |
| **Max file size (NTFS)** | 256 TB | 256 TB |
| **Max file size (ReFS)** | 18.4 EB (2025) | — |
| **Max volume size (NTFS)** | 256 TB | 256 TB |
| **Max volume size (ReFS)** | 1.2 YB (theoretical) | — |
| **Max dedup volume** | 64 TB (Data Deduplication) | — |
| **Max cluster nodes** | 64 (Failover Cluster) | — |
| **Max VM per host** | Unlimited (Datacenter) | — |
| **VM memory per VM** | 12 TB (2022+) | — |
| **VM vCPU per VM** | 240 (2022+) | — |
| **Concurrent RDP** | 2 (admin), 200+ (RDS CAL) | 1 (Home), more (RDP host) |
| **PowerShell Remoting** | Unlimited (WinRM) | Yes (WinRM) |
---
## Related
- [AI-INFRASTRUCTURE.en.md](AI-INFRASTRUCTURE.en.md) — OS for AI workloads, GPU drivers, kernel parameters
- [KUBERNETES.en.md](KUBERNETES.en.md) — container runtime, orchestration
- [HYPERVISORS.en.md](HYPERVISORS.en.md) — hypervisors, VM host OS
- [DATACENTERS.en.md](DATACENTERS.en.md) — DC layout, HW platforms
## Sources
Links, books, and standards: [sources/infrastructure/sources.en.md](sources/infrastructure/sources.en.md)
*Last revision: 2026-06-18*

333
OS.md Normal file

@@ -0,0 +1,333 @@
# Operating Systems
> Overview of Linux distributions and Microsoft Windows for server, container, and AI/GPU workloads, including support lifecycle, EOL dates, and comparison.
---
## Distribution overview
| Distribution | Family | Package manager | Init | Security | Reference platform |
|-------------|--------|----------------|------|----------|-------------------|
| **Ubuntu LTS** | Debian | apt (deb) | systemd | AppArmor | NVIDIA DGX, widest AI/GPU support |
| **Debian** | Debian | apt (deb) | systemd | AppArmor | General-purpose server, stability |
| **RHEL** | Red Hat | dnf (rpm) | systemd | SELinux | Enterprise standard, SAP, Oracle DB |
| **Rocky Linux** | Red Hat | dnf (rpm) | systemd | SELinux | RHEL binary compatible (free) |
| **AlmaLinux** | Red Hat | dnf (rpm) | systemd | SELinux | RHEL binary compatible (free) |
| **SLES** | SUSE | zypper (rpm) | systemd | AppArmor | HPC, SAP, mainframe |
| **OpenSUSE Leap** | SUSE | zypper (rpm) | systemd | AppArmor | Desktop, development |
| **OpenSUSE Tumbleweed** | SUSE | zypper (rpm) | systemd | AppArmor | Rolling release, bleeding edge |
| **Fedora** | Red Hat | dnf (rpm) | systemd | SELinux | Desktop, technology preview |
| **Arch Linux** | Independent | pacman | systemd | — | Rolling, power users |
| **Alpine Linux** | Independent | apk | OpenRC | — | Container image, embedded |
| **Flatcar Container Linux** | Independent | — (image-based) | systemd | — | K8s worker node, minimal footprint |
| **Bottlerocket** | Independent | — (image-based) | systemd | — | AWS K8s, minimal footprint |
---
## Support lifecycle and EOL dates
> **Standard:** base support (bug fixes, security). **LTS/ELS:** extended support (security only).
> ESM = Ubuntu Expanded Security Maintenance, EUS = RHEL Extended Update Support, ELS = RHEL Extended Life Cycle Support, LTSS = SUSE Long Term Service Pack Support.
### Ubuntu LTS
| Version | Release | Standard support | ESM / Ubuntu Pro | Note |
|-------|---------|-----------------|------------------|----------|
| **20.04 LTS** (Focal) | 2020-04 | Until 2025-04 | Until 2030-04 | Last release to ship Python 2 |
| **22.04 LTS** (Jammy) | 2022-04 | Until 2027-04 | Until 2032-04 | NVIDIA DGX standard |
| **24.04 LTS** (Noble) | 2024-04 | Until 2029-04 | Until 2034-04 | Newest GPU/CUDA support |
| **26.04 LTS** (planned) | 2026-04 | Until 2031-04 | Until 2036-04 | — |
### RHEL
| Version | Release | Full support | Maintenance support | Extended life cycle |
|-------|---------|-------------|-------------------|-------------------|
| **7** | 2014-06 | Until 2019-08 | Until 2024-06 | Until 2028-06 (ELS) |
| **8** | 2019-05 | Until 2024-05 | Until 2029-05 | Until 2034-06 (ELS) |
| **9** | 2022-05 | Until 2027-05 | Until 2032-05 | Until 2037-06 (ELS) |
| **10** (planned) | 2025 | Until 2029 | Until 2034 | — |
### Rocky Linux / AlmaLinux
| Version | Release | Supported until | Compatible with RHEL | Note |
|-------|---------|-----------|-------------------|----------|
| **8** | 2021-06 | 2029-05 | Yes (from RHEL 8.4) | AlmaLinux / Rocky Linux |
| **9** | 2022-07 | 2032-05 | Yes (from RHEL 9.0) | AlmaLinux / Rocky Linux |
### Debian
| Version | Release | Full support | LTS support | ELTS (paid) |
|-------|---------|-------------|-------------|-------------|
| **11** (Bullseye) | 2021-08 | 2024-08 | Until 2026-08 | Until 2028-08 |
| **12** (Bookworm) | 2023-06 | 2026-06 | Until 2028-06 | Until 2030-06 |
| **13** (Trixie) | 2025 (expected) | ~3 years after release | ~5 years after release | — |
### SLES
| Version | Release | General support | LTSS | Note |
|-------|---------|---------------|------|----------|
| **15 SP3** | 2021-06 | Until 2024-12 | Until 2027-12 | — |
| **15 SP4** | 2022-06 | Until 2025-12 | Until 2028-12 | — |
| **15 SP5** | 2023-06 | Until 2026-12 | Until 2029-12 | Current SP |
| **15 SP6** | 2024-10 | Until 2027-12 | Until 2030-12 | — |
### Fedora
| Version | Release | EOL | Note |
|-------|---------|-----|----------|
| **38** | 2023-04 | 2024-05 | — |
| **39** | 2023-11 | 2024-12 | — |
| **40** | 2024-04 | 2025-05 | — |
| **41** | 2024-11 | 2025-12 | — |
Fedora releases a new version roughly every 6 months, with EOL about 13 months after release. It serves as the upstream for RHEL.
### Alpine Linux
| Version | Release | EOL |
|-------|---------|-----|
| **3.18** | 2023-05 | 2025-05 |
| **3.19** | 2023-12 | 2025-12 |
| **3.20** | 2024-05 | 2026-05 |
| **3.21** | 2024-12 | 2026-12 |
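The lifecycle tables lend themselves to a simple automated check. The following is a minimal sketch, assuming month-level precision and a hand-copied subset of the dates above; the dictionary name and both helper functions are hypothetical, not part of any distribution tooling.

```python
from datetime import date

# End-of-support months copied by hand from the tables above (YYYY-MM).
# Which column counts as "end of support" is a choice made for this sketch.
SUPPORT_END = {
    ("ubuntu", "20.04"): "2025-04",  # standard support
    ("ubuntu", "22.04"): "2027-04",
    ("ubuntu", "24.04"): "2029-04",
    ("rhel", "8"): "2029-05",        # maintenance support
    ("rhel", "9"): "2032-05",
    ("debian", "11"): "2026-08",     # LTS support
    ("debian", "12"): "2028-06",
}


def months_until_eol(distro: str, version: str, today: date) -> int:
    """Whole months left until end of support (negative once it has passed)."""
    year, month = map(int, SUPPORT_END[(distro, version)].split("-"))
    return (year - today.year) * 12 + (month - today.month)


def is_eol(distro: str, version: str, today: date) -> bool:
    """The EOL month itself still counts as supported."""
    return months_until_eol(distro, version, today) < 0


if __name__ == "__main__":
    today = date(2026, 6, 18)
    for key in sorted(SUPPORT_END):
        print(key, months_until_eol(*key, today))
```

A check like this could run in CI against an inventory export to flag hosts that need a migration plan.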
---
## Kernel versions per distribution
| Distribution | Kernel (default) | Kernel (HWE/enhanced) | Note |
|-----------|-----------------|----------------------|----------|
| Ubuntu 22.04 LTS | 5.15 (GA) | 6.5+ (HWE) | HWE from 22.04.2 |
| Ubuntu 24.04 LTS | 6.8 | — | — |
| RHEL 8 | 4.18 | — | Backported features |
| RHEL 9 | 5.14 | — | Backported features |
| RHEL 10 | 6.11+ (expected) | — | — |
| Rocky/Alma 8 | 4.18 | — | Same as RHEL 8 |
| Rocky/Alma 9 | 5.14 | — | Same as RHEL 9 |
| Debian 11 | 5.10 | 6.1 (backports) | — |
| Debian 12 | 6.1 | — | — |
| SLES 15 SP5 | 5.14 | — | — |
| SLES 15 SP6 | 6.4 | — | — |
| Fedora 40 | 6.8+ | — | Rolling upstream |
| Alpine 3.20 | 6.6 | — | — |
---
## Comparison by use case
| Use case | Recommended distribution | Rationale |
|----------|---------------------|-------|
| **AI/GPU cluster (DGX)** | Ubuntu 22.04 LTS / DGX OS | NVIDIA standard, CUDA, MLNX_OFED |
| **Enterprise K8s (OpenShift)** | RHEL 9 / RHCOS | Red Hat support, GPU Operator |
| **Vanilla K8s (on-prem)** | Ubuntu 22.04 LTS + Flatcar (workers) | Community support, minimal worker image |
| **HPC cluster (Slurm)** | Rocky Linux 9 / Ubuntu 22.04 | EL ecosystem + Lustre, or Ubuntu |
| **Traditional enterprise DB (Oracle, SAP)** | RHEL 9 / SLES 15 | Vendor certification |
| **Container host** | Ubuntu 22.04 / Alpine | Broad image compatibility / minimal size |
| **Development / desktop** | Fedora / Ubuntu 24.04 / OpenSUSE Tumbleweed | Current packages, HW support |
| **Embedded / IoT** | Debian / Alpine / Yocto | Minimal footprint, stability |
| **Edge inference** | Ubuntu (ARM) / NVIDIA JetPack | Jetson, GPU support |
| **Mainframe (IBM z/Arch)** | SLES 15 / RHEL 9 | IBM certification |
---
## Package management comparison
| Feature | apt (Debian/Ubuntu) | dnf (RHEL/Rocky/Alma/Fedora) | zypper (SUSE) | pacman (Arch) | apk (Alpine) |
|-----------|--------------------|------------------------------|---------------|---------------|-------------|
| **Package format** | .deb | .rpm | .rpm | .pkg.tar.zst | .apk |
| **Repo management** | /etc/apt/sources.list | /etc/yum.repos.d/ | /etc/zypp/repos.d/ | /etc/pacman.conf | /etc/apk/repositories |
| **Lock file** | — (apt-mark hold) | — (exclude) | — (lock) | — (IgnorePkg) | — |
| **Transactional update** | No | Yes (dnf history) | Yes (zypper history) | No | No |
| **Rollback** | No (manual) | Yes (dnf history rollback) | Yes (snapper + zypper) | No | No |
| **Delta updates** | Not built in (`debdelta` is a separate tool) | Yes (deltarpm) | Yes (zsync) | No | No |
| **Version (as of 2025)** | apt 2.7+ | dnf 4.18+ | zypper 1.14+ | pacman 6.1+ | apk 2.14+ |
---
## Security model comparison
| Feature | SELinux (RHEL derivatives) | AppArmor (Ubuntu/Debian/SUSE) |
|-----------|----------------------|------------------------------|
| **Type** | Mandatory Access Control (MAC) | Mandatory Access Control (MAC) |
| **Labeling** | Context-based (user:role:type) | Path-based (profile per executable) |
| **Configuration** | Policy (modules, booleans) | Profiles (plain text, in /etc/apparmor.d/) |
| **Modes** | Enforcing / Permissive / Disabled | Enforce / Complain / Disabled |
| **Learning curve** | Steep (policies are complex) | Gentle (profiles are simpler) |
| **Default in** | RHEL, Rocky, Alma, Fedora | Ubuntu, Debian, SLES, OpenSUSE |
| **Use case** | Multi-tenant enterprise, regulated environments | General-purpose server, application containment |
| **Container integration** | SELinux labels per container | AppArmor profile per container |
Additional layers:
- **seccomp** — syscall filtering (default in containerd, Docker)
- **Capabilities** — Linux capabilities (drop everything except what is needed)
- **cgroups v2** — resource isolation (CPU, memory, IO, PID)
- **User namespaces** — rootless containers (Podman, Docker rootless)
---
## Recommended migration path for EOL distributions
| From | To | Recommended procedure |
|----------------|-----|-------------------|
| Ubuntu 20.04 (EOL 2025) | Ubuntu 22.04 or 24.04 | `do-release-upgrade` or fresh install |
| RHEL 7 (EOL 2024) | RHEL 8 or 9 | `leapp` upgrade, or fresh install |
| Rocky/Alma 8 | Rocky/Alma 9 | Fresh install, or a `leapp`-based upgrade (AlmaLinux ELevate); a plain `dnf upgrade --releasever=9` is not a supported major-version upgrade |
| Debian 11 (EOL LTS 2026) | Debian 12 | `apt full-upgrade` + new sources.list |
| SLES 15 SP4 (EOL 2025) | SLES 15 SP6 | `zypper migration` |
| Fedora 40 (EOL 2025) | Fedora 42+ | `dnf system-upgrade` |
---
## Microsoft Windows
### Windows Server — editions
| Edition | Price (approx.) | Core limits | VM rights | Use case |
|-------|--------------|-------------|-----------|----------|
| **Datacenter** | ~$6 155 (2025) | Unlimited | Unlimited Windows VMs per host | Virtualization, SDDC, S2D, HCI |
| **Standard** | ~$1 069 (2025) | 2 CPU, unlimited cores | 2 Windows VMs + Hyper-V host | General-purpose server, AD, file server |
| **Essentials** | ~$501 (2025) | 1 CPU, max 10 cores | — | Small businesses (up to 25 users) |
| **Azure Edition** | Pay-as-you-go | Per Azure VM | Per Azure | Azure-only, hotpatching |
Licensing: Windows Server Standard and Datacenter are licensed **per core** (minimum 16 cores per server and 8 cores per VM).
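The per-core rule can be turned into a small calculator. This is a sketch under stated assumptions: the 16-core server minimum comes from the text above, while the 8-core minimum per physical processor is taken from Microsoft's published licensing terms and is not stated in this document. The function name is hypothetical, and current terms should be checked before relying on the numbers.

```python
def core_licenses_required(sockets: int, cores_per_socket: int,
                           min_per_socket: int = 8,
                           min_per_server: int = 16) -> int:
    """Number of core licenses needed to license one physical host.

    min_per_socket is an assumption (not from the document);
    min_per_server is the 16-core minimum quoted in the text.
    """
    if sockets < 1 or cores_per_socket < 1:
        raise ValueError("sockets and cores_per_socket must be positive")
    per_socket = max(cores_per_socket, min_per_socket)
    return max(sockets * per_socket, min_per_server)


if __name__ == "__main__":
    for sockets, cores in [(1, 4), (2, 6), (2, 24), (4, 4)]:
        print(sockets, cores, core_licenses_required(sockets, cores))
```

The small-server cases show why low core counts do not reduce the bill: a single 4-core CPU still needs 16 core licenses.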
### Windows Server — support lifecycle
> **Mainstream:** regular updates (bug fixes, security, features). **Extended:** security updates only (free).
> **ESU:** Extended Security Updates (paid add-on tier, approx. $45-300/core/year).
| Version | Release | Mainstream support | Extended support | ESU | Note |
|-------|---------|------------------|-----------------|-----|----------|
| **2012 R2** | 2013-11 | 2018-10 | 2023-10 | Until 2026-10 (year 3) | ESU is paid; final year |
| **2016** | 2016-10 | 2022-01 | 2027-01 | — | First with Nano Server and Windows containers |
| **2019** | 2019-01 | 2024-01 | 2029-01 | — | Last with Nano Server (1803 only) |
| **2022** | 2021-09 | 2026-10 | 2031-10 | — | Current, TPM 2.0, Credential Guard |
| **2025** | 2024-11 | 2029-10 | 2034-10 | — | Hotpatching, PowerShell 7, SMB over QUIC |
### Windows Server — version vs edition grid
| Version | Hyper-V | Storage Spaces Direct | Software-defined networking | Containers | GPU DDA / vGPU | WSL2 |
|-------|---------|---------------------|---------------------------|------------|---------------|------|
| 2016 Standard | Yes | No (Datacenter only) | No (Datacenter only) | Windows only | Yes | No |
| 2016 Datacenter | Yes | Yes | Yes | Windows | Yes | No |
| 2019 Standard | Yes | No | No | Windows | Yes | No |
| 2019 Datacenter | Yes | Yes | Yes | Windows | Yes | No |
| 2022 Standard | Yes | No | No | Windows + Linux | Yes | No |
| 2022 Datacenter | Yes | Yes | Yes | Windows + Linux (2022.2+) | Yes | No |
| 2025 Datacenter | Yes | Yes | Yes | Windows + Linux | Yes | Yes |
### Windows Desktop — support lifecycle
> **E = Enterprise, Pro = Professional, Home = Consumer**
> LTSC = Long Term Servicing Channel (stable, no feature updates)
| Version | Release | EOL (Home/Pro) | EOL (Enterprise) | LTSC EOL | Note |
|-------|---------|---------------|-----------------|----------|----------|
| **10 21H2** | 2021-11 | — | 2024-06 | — | — |
| **10 22H2** | 2022-10 | 2025-10 | 2025-10 | — | Last Windows 10 release |
| **10 LTSC 2021** | 2021-11 | — | — | 2032-01 | IoT Enterprise LTSC |
| **11 22H2** | 2022-09 | 2024-10 | 2025-10 | — | — |
| **11 23H2** | 2023-10 | 2025-11 | 2026-11 | — | — |
| **11 24H2** | 2024-10 | 2026-10 | 2027-10 | — | First with Recall, Copilot+ |
| **11 LTSC 2024** | 2024-10 | — | — | 2029-10 | Enterprise LTSC |
Windows 10 support **ended on 2025-10-14**; it was the last version with the classic Control Panel.
### Windows vs Linux — comparison
| Feature | Windows Server | RHEL / Ubuntu |
|-----------|---------------|---------------|
| **License (server)** | $500-6 000 (per core) + CAL | $0-800 (per node subscription) |
| **License (desktop)** | $100-200 (OEM/retail) | Free |
| **Support cost** | Included in license (SA/ESU) | $200-1 300/node/year (RHEL) |
| **Package management** | MSI, AppX, winget, NuGet | APT, DNF, Zypper |
| **Package count** | ~10 000 (chocolatey) | ~60 000+ (Ubuntu repo) |
| **Desktop GUI** | Windows Shell (mandatory) | Optional (GNOME, KDE, XFCE…) |
| **Server GUI** | Windows Shell (Core only since 2022) | CLI-only (standard) |
| **Kernel** | NT hybrid kernel (kernel-mode Win32) | Monolithic Linux kernel |
| **Device support** | OEM driver model (WHQL) | Open source + vendor drivers |
| **Container types** | Windows + Linux (WSL2) | Linux (Docker, Podman, containerd) |
| **Container registry** | Docker Hub, ACR, Nexus | Docker Hub, Quay, GHCR, Nexus… |
| **Container image size** | ~4-8 GB (Windows Server Core) | ~100 MB to 1 GB (Alpine/Ubuntu) |
| **GPU passthrough** | DDA (Discrete Device Assignment) | GPU Direct, VFIO, SR-IOV |
| **AI/ML support** | WSL2 (CUDA), Azure ML | Native CUDA, ROCm, oneAPI |
| **CUDA support** | Yes (via WSL2 or Docker) | Native (nvidia-container-toolkit) |
| **Orchestration** | AD / GPO / SCCM / WAC | Ansible, Puppet, Salt, Foreman |
| **RBAC/AAA** | Active Directory (+ Kerberos) | LDAP, FreeIPA, SSSD, AD |
| **Remote management** | RDP, WinRM, PowerShell Remoting | SSH, Cockpit, Webmin |
| **Filesystem** | NTFS, ReFS, CSVFS | ext4, XFS, Btrfs, ZFS |
| **Max file system size** | 256 TB (NTFS), 1.2 YB (ReFS) | 1 EB (XFS), 16 EB (ZFS) |
| **Hypervisor** | Hyper-V (Type 1) | KVM (Type 2-ish), Xen |
| **Dynamic memory** | Hyper-V Dynamic Memory | KSM, virtio-balloon (KVM) |
| **Live migration** | Hyper-V Live Migration | KVM Live Migration, vMotion |
### Windows specific features
| Feature | Description | Replaceable on Linux? |
|---------|-------|------------------------|
| **Active Directory** | Identity, auth, GPO, DNS, DHCP | FreeIPA, Samba AD DC, 389-ds, SSSD |
| **Group Policy** | Central configuration of desktops/servers | Ansible, Puppet, Salt (agent-based) |
| **Hyper-V + S2D** | Hyper-converged storage and virtualization (HCI) | Proxmox Ceph / oVirt + Gluster |
| **Failover Clustering** | Cluster-aware applications (SQL, File Server) | Pacemaker + Corosync + DRBD |
| **IIS** | Web server, ASP.NET host | Nginx, Apache (no ASP.NET, or a .NET host) |
| **PowerShell** | Scripting, Desired State Configuration | Bash, Python, Ansible |
| **Windows Admin Center** | GUI management | Cockpit, Webmin |
| **BitLocker** | Full disk encryption | LUKS + cryptsetup |
| **Windows Defender** | Antivirus + EDR | ClamAV, Wazuh, Osquery |
| **SQL Server** | Relational DB | PostgreSQL, MySQL, MariaDB |
### Recommended OS by use case (including Windows)
| Use case | OS | Rationale |
|----------|-----|-------|
| **Active Directory / GPO / hybrid ID** | Windows Server 2022/2025 | AD runs only on Windows |
| **SQL Server (failover cluster)** | Windows Server Datacenter + SQL EE | Always On FCI, ReFS |
| **Exchange / SharePoint** | Windows Server 2022 | Windows only |
| **Enterprise desktop management** | Windows 11 Enterprise + Intune/SCCM | GPO, AD, enterprise MDM |
| **.NET / ASP.NET applications** | Windows Server / Linux (.NET Core) | .NET 6+ runs on Linux |
| **HCI (Microsoft stack)** | Windows Server Datacenter + S2D + Hyper-V | Azure Stack HCI |
| **Virtualization (mixed workload)** | Windows Server Datacenter (Hyper-V) | Linux and Windows VMs under one hypervisor |
| **AI/GPU inference** | Linux (Ubuntu) + CUDA | Best NVIDIA support; WSL2 as an alternative |
| **Container orchestration (Windows nodes)** | Windows Server 2022/2025 + containerd | Windows Pods in AKS on-prem |
| **Tier 2 applications / web / API** | Ubuntu or RHEL (Linux) | Lower TCO, smaller footprint |
### Windows Server migration paths
| From | To | Recommended procedure |
|---------------|-----|-------------------|
| Windows Server 2012 R2 (EOL 2023) | Windows Server 2022/2025 | In-place upgrade or fresh install + migration |
| Windows Server 2016 (EOL 2027) | Windows Server 2022/2025 | In-place upgrade or fresh install |
| Windows Server 2019 | Windows Server 2022/2025 | In-place upgrade (`Setup.exe /auto upgrade`) |
| Windows Server 2022 | Windows Server 2025 | In-place upgrade or fresh install |
| Windows Server → Cloud | Azure VM / Azure Stack HCI | Azure Migrate, Storage Migration Service |
| Windows Server → Linux | Ubuntu / RHEL (re-platform) | Port the application to .NET Core or an alternative |
### Windows — API and operational limits
| Limit | Windows Server | Windows Desktop |
|-------|---------------|----------------|
| **Max RAM** | 24 TB (2025 Datacenter) | 2 TB (Pro/Enterprise), 128 GB (Home) |
| **Max CPU sockets** | 64 (Datacenter), 2 (Standard) | 2 |
| **Max CPU cores** | Unlimited | 128 (Pro), 64 (Home) |
| **Max file size (NTFS)** | 256 TB | 256 TB |
| **Max file size (ReFS)** | 18.4 EB (2025) | — |
| **Max volume size (NTFS)** | 256 TB | 256 TB |
| **Max volume size (ReFS)** | 1.2 YB (theoretical) | — |
| **Max dedup volume** | 64 TB (Data Deduplication) | — |
| **Max cluster nodes** | 64 (Failover Cluster) | — |
| **Max VM per host** | Unlimited (Datacenter) | — |
| **VM memory per VM** | 12 TB (2022+) | — |
| **VM vCPU per VM** | 240 (2022+) | — |
| **Concurrent RDP** | 2 (admin), 200+ (RDS CAL) | 1 (Home), more (RDP host) |
| **PowerShell Remoting** | Unlimited (WinRM) | Yes (WinRM) |
---
## Related
- [AI-INFRASTRUCTURE.md](AI-INFRASTRUCTURE.md) — OS for AI workloads, GPU drivers, kernel parameters
- [KUBERNETES.md](KUBERNETES.md) — container runtime, orchestration
- [HYPERVISORS.md](HYPERVISORS.md) — hypervisors, VM host OS
- [DATACENTERS.md](DATACENTERS.md) — DC layout, HW platforms
## Sources
Links, books, and standards: [sources/infrastructure/sources.md](sources/infrastructure/sources.md)
*Last revision: 2026-06-18*

178
POSTGRESQL.en.md Normal file

@@ -0,0 +1,178 @@
# 🐘 PostgreSQL
## Overview
PostgreSQL is an open-source relational database that emphasizes extensibility, SQL standards compliance, and reliability. It has been developed under the PostgreSQL name since 1996, has a strong community, and ships a new major version roughly once a year.
## Architecture
### Process model
```text
Postmaster (supervisor)
├── Backend process (1 per connection)
├── WAL writer
├── Checkpointer
├── Autovacuum launcher
├── Stats collector (PostgreSQL 14 and older; 15+ keeps statistics in shared memory)
├── Logical replication launcher
└── Archiver (WAL archiving)
```
Each connection gets its own OS process (not a thread). Advantage: isolation and stability. Disadvantage: a higher memory footprint with thousands of connections, so a connection pooler such as PgBouncer is needed.
### MVCC (Multi-Version Concurrency Control)
Each transaction sees a snapshot of data from the moment it started. Old row versions (tuples) remain in the table:
- INSERT creates a new tuple with `xmin = current_xid`
- DELETE marks tuple with `xmax = current_xid` (doesn't disappear immediately)
- UPDATE = DELETE old + INSERT new
- VACUUM reclaims the space of dead tuples that are no longer visible to any active snapshot
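The visibility rule behind these bullets can be illustrated with a toy model. This is a deliberately simplified sketch: it treats a snapshot as a single XID and a set of committed transactions, and it ignores in-progress transaction lists, hint bits, and XID wraparound, all of which real PostgreSQL handles. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class TupleVersion:
    value: str
    xmin: int                    # XID that inserted this row version
    xmax: Optional[int] = None   # XID that deleted or updated it, if any


def visible(t: TupleVersion, snapshot_xid: int, committed: Set[int]) -> bool:
    """Simplified rule: the inserter must be committed and older than the
    snapshot, and the deleter (if any) must not be."""
    def seen(xid: int) -> bool:
        return xid in committed and xid < snapshot_xid

    return seen(t.xmin) and not (t.xmax is not None and seen(t.xmax))


# An UPDATE by XID 105 marks the old version (xmax=105) and inserts a new one.
old = TupleVersion("price=10", xmin=100, xmax=105)
new = TupleVersion("price=12", xmin=105)
committed = {100, 105}

if __name__ == "__main__":
    for snap in (103, 110):
        print(snap, visible(old, snap, committed), visible(new, snap, committed))
```

A transaction whose snapshot predates XID 105 keeps seeing the old version, which is why that version cannot be vacuumed until the transaction ends.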
### VACUUM and autovacuum
| Parameter | Description | Default |
|-----------|-------------|---------|
| `autovacuum_vacuum_threshold` | Min. dead rows to trigger vacuum | 50 |
| `autovacuum_vacuum_scale_factor` | Fraction of the table's rows added to the threshold | 0.2 (20%) |
| `autovacuum_analyze_threshold` | Min. changed rows for ANALYZE | 50 |
| `autovacuum_vacuum_cost_limit` | Limits I/O of vacuum (prevents load) | -1 (falls back to `vacuum_cost_limit` = 200) |
| `autovacuum_naptime` | Interval between checks | 1 min |
| `deadlock_timeout` | Time to wait on a lock before running deadlock detection (a lock setting, not autovacuum-specific) | 1 s |
**Signs of insufficient vacuum**: table growth (bloat), degraded index scan performance, XID wraparound hazard.
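The two vacuum parameters combine into one trigger point: autovacuum processes a table once its dead tuples exceed `autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples`. The sketch below evaluates that formula with the defaults from the table; the function names are hypothetical.

```python
def autovacuum_vacuum_trigger(reltuples: float,
                              threshold: int = 50,
                              scale_factor: float = 0.2) -> float:
    """Dead-tuple count above which autovacuum will vacuum the table."""
    return threshold + scale_factor * reltuples


def needs_vacuum(dead_tuples: int, reltuples: float, **settings) -> bool:
    return dead_tuples > autovacuum_vacuum_trigger(reltuples, **settings)


if __name__ == "__main__":
    # With defaults, a 1M-row table tolerates about 200k dead rows.
    print(autovacuum_vacuum_trigger(1_000_000))
    # Lowering the scale factor makes large tables vacuum far earlier.
    print(autovacuum_vacuum_trigger(1_000_000, scale_factor=0.01))
```

This is why large, frequently updated tables are often given a lower per-table scale factor than the global default.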
### WAL (Write-Ahead Log)
Append-only log of all changes for crash recovery and replication:
```conf
wal_level = replica # or logical
archive_mode = on
archive_command = 'aws s3 cp %p s3://backups/pg-wal/%f'
```
**PITR (Point-In-Time Recovery)**:
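1. Restore the base backup (taken with `pg_basebackup`)
2. Set `recovery_target_time = '2026-06-03 10:30:00 UTC'` together with a `restore_command` that fetches archived WAL
3. Start the server in recovery; it replays the WAL archives up to the target time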
### Replication slots
- **Physical** — guarantees that the primary keeps WAL until the replica has consumed it
- **Logical** — for logical replication (selective tables, data transformation)
- **Risk**: if a replica fails or falls behind, the primary keeps accumulating WAL for its slot and the disk can fill up
- Monitoring: `pg_replication_slots`, `pg_stat_replication`
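The disk-full risk can be quantified by comparing each slot's `restart_lsn` with the current WAL position. The sketch below does the LSN arithmetic in plain Python on values already fetched from `pg_replication_slots`; the function names, slot names, and the 1 GiB limit are illustrative assumptions. Inside the database, `pg_wal_lsn_diff()` performs the same subtraction.

```python
from typing import Dict, List


def parse_lsn(lsn: str) -> int:
    """Convert an LSN such as '0/16B3748' into a byte position."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def retained_wal_bytes(current_lsn: str, restart_lsn: str) -> int:
    """WAL the primary must keep for a slot that restarts at restart_lsn."""
    return parse_lsn(current_lsn) - parse_lsn(restart_lsn)


def slots_over_limit(slots: Dict[str, str], current_lsn: str,
                     limit_bytes: int) -> List[str]:
    """Names of slots retaining more WAL than limit_bytes."""
    return [name for name, restart in slots.items()
            if retained_wal_bytes(current_lsn, restart) > limit_bytes]


if __name__ == "__main__":
    slots = {"replica_a": "0/FF000000", "replica_b": "0/10000000"}
    print(slots_over_limit(slots, "1/0", limit_bytes=2**30))
```

Alerting on this value gives early warning well before the volume holding `pg_wal` runs out of space.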
### Configuration
Main files (per Obe & Hsu):
- `postgresql.conf` — memory, network, logging, storage
- `pg_hba.conf` — access privileges
- `pg_ident.conf` — OS user to PostgreSQL role mapping
### AI-Ready PostgreSQL 18
(Kumar, Linster, 2026) — PostgreSQL 18 as a unified platform for transactions, analytics, and AI:
| Area | Technique |
|------|-----------|
| Vectors | pgvector — embeddings directly in table rows |
| Hybrid pattern | Semantic recall → SQL filtering |
| LLM integration | PostgreSQL + MCP (Model Context Protocol) |
| Embedding pipeline | Batch and stream embedding generation |
**Hybrid query**:
```sql
SELECT p.*
FROM products p
JOIN product_embeddings pe ON p.id = pe.product_id
WHERE pe.embedding <-> '[0.1, 0.3, ...]' < 0.8
AND p.in_stock = true
AND p.price < 100.00
ORDER BY pe.embedding <-> '[0.1, 0.3, ...]'
LIMIT 10;
```
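To make the semantics of the query concrete, the sketch below reproduces it in plain Python on an in-memory list. pgvector's `<->` operator is Euclidean (L2) distance; everything else here (the function names, the dictionary layout, the sample rows) is a hypothetical stand-in for the `products` and `product_embeddings` tables.

```python
import math
from typing import Dict, List, Sequence


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance, which is what pgvector's <-> operator computes."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hybrid_search(products: List[Dict], query_vec: Sequence[float],
                  max_distance: float = 0.8, max_price: float = 100.0,
                  limit: int = 10) -> List[int]:
    """Semantic filter plus relational filters, ordered by distance."""
    hits = []
    for p in products:
        d = l2_distance(p["embedding"], query_vec)
        if d < max_distance and p["in_stock"] and p["price"] < max_price:
            hits.append((d, p["id"]))
    hits.sort()
    return [pid for _, pid in hits[:limit]]


if __name__ == "__main__":
    rows = [
        {"id": 1, "embedding": [0.1, 0.3], "in_stock": True, "price": 50.0},
        {"id": 2, "embedding": [0.2, 0.3], "in_stock": True, "price": 20.0},
        {"id": 3, "embedding": [0.1, 0.3], "in_stock": False, "price": 10.0},
        {"id": 4, "embedding": [5.0, 5.0], "in_stock": True, "price": 10.0},
    ]
    print(hybrid_search(rows, [0.1, 0.3]))
```

In the database the same work happens in one statement, so the relational filters and the vector ordering share a single plan and a single round trip.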
### Extensions
| Extension | Purpose |
|-----------|---------|
| pgvector | Vector search for AI/embeddings |
| PostGIS | Geographic data, spatial queries |
| pg_stat_statements | Query performance monitoring |
| pg_duckdb | Analytical queries (DuckDB engine inside PG) |
| pg_search | Full-text and hybrid search |
| pg_cron | DB job scheduling |
| citus | Horizontal scaling (sharding) |
| timescaledb | Time-series optimization |
| pgaudit | Audit logging |
## Connection pooling
| Pooler | Type | Protocol |
|--------|------|----------|
| PgBouncer | Proxy (transaction/session) | PostgreSQL wire |
| Odyssey | Proxy (multithreaded) | PostgreSQL wire |
| pgpool-II | Proxy (replication, load balancing) | PostgreSQL wire |
| RDS Proxy | Managed proxy (AWS) | PostgreSQL wire |
**PgBouncer modes**:
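- **Session pooling** — a server connection is held for the entire client session, so there is little multiplexing gain
- **Transaction pooling** — the server connection is returned to the pool after each transaction, which is more efficient (requires that the application not depend on session state, e.g. session-level `SET` or advisory locks)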
## Recommendations — where PostgreSQL is better
| Area | PostgreSQL | Competition | Why PG |
|------|-----------|-------------|--------|
| **Extensibility** | Extensions, custom types, operators, index methods | MySQL limited | Can add anything from vectors to full-text in DB |
| **SQL standard** | Closest to ANSI SQL | MySQL deviations (GROUP BY, ALTER TABLE) | Portability, fewer surprises |
| **Geospatial data** | PostGIS (gold standard GIS) | MySQL GIS (limited) | Only real open-source choice for GIS |
| **Consistency** | SSI serializable, foreign keys, CHECK, exclusions | MySQL MyISAM has no FK or transactions; InnoDB defaults to REPEATABLE READ | Suitable for financial and critical systems |
| **Concurrent read/write** | MVCC without reader/writer blocking | MySQL MyISAM uses table-level locks, so readers and writers block each other (InnoDB also uses MVCC) | Better read scalability |
| **AI/vectors** | pgvector natively in DB | Separate vector DB (increased latency) | Hybrid queries in single SQL |
| **License** | PostgreSQL license (MIT-like) | MySQL dual license (Oracle) | No vendor lock-in |
### When to use PostgreSQL
- **Enterprise applications** — require ACID, referential integrity, complex transactions
- **Geographic systems** — GIS, map applications, location services
- **Financial systems** — accounting, banking, compliance (audit logging, SSI)
- **AI / RAG applications** — hybrid vector + relational queries in one DB
- **Analytics on relational data** — pg_duckdb, materialized views, window functions
- **Multi-tenant applications** — row-level security, schemas per tenant
## PostgreSQL licensing
| Variant | License | Price | Restrictions |
|---------|---------|-------|-------------|
| **PostgreSQL** | PostgreSQL license (MIT-like) | $0 | None — can use, modify, distribute in commercial products. No "commercial license" needed |
| **Amazon Aurora PostgreSQL** | Proprietary (AWS) | ~$0.10-1.00/hour | AWS managed, PostgreSQL compatible. AWS may use PG code thanks to PostgreSQL license |
| **YugabyteDB** | Apache 2.0 | $0 (core) | PostgreSQL compatible distributed SQL, built on PG query layer |
| **TimescaleDB** | Apache 2.0 (community) / Timescale License (enterprise) | $0 (community) | Time-series extensions for PostgreSQL. Enterprise: tiered storage, compression, multi-node |
**Key point**: The PostgreSQL license is one of the most liberal — it allows cloud providers (AWS, GCP, Azure) to offer PostgreSQL as a managed service without restrictions. This is different from MongoDB (SSPL) and Redis (RSALv2). Thanks to this, PostgreSQL has the broadest cloud support of any database.
**Impact on choice**: No license risk, no vendor lock-in, no hidden license costs. From a licensing standpoint, PostgreSQL is a safe choice for any project.
### When to use something else
- **Simple web / blog** → SQLite (lighter in embedded scenarios)
- **High-throughput key-value** → Redis (order of magnitude lower latency)
- **Time-series at massive scale** → TimescaleDB, InfluxDB
- **Globally distributed data** → CockroachDB, Spanner
- **Full-text search primarily** → Elasticsearch
## Sources
References, books, and standards: [sources/databases/sources.en.md](sources/databases/sources.en.md)
### Recommended reading
| Book | Authors | ISBN | Description |
|------|---------|------|-------------|
| PostgreSQL: Up and Running (3rd ed.) | Regina Obe, Leo Hsu | 978-1491962935 | Practical guide to administration, configuration, and extensions |
| AI-Ready PostgreSQL 18 | Kumar, Linster | — | PostgreSQL as unified platform for AI workloads |
*Last revision: 2026-06-03*

178
POSTGRESQL.md Normal file

@@ -0,0 +1,178 @@
# 🐘 PostgreSQL
## Overview
PostgreSQL is an open-source relational database that emphasizes extensibility, SQL standards compliance, and reliability. It has been developed under the PostgreSQL name since 1996, has a strong community, and ships a new major version roughly once a year.
## Architecture
### Process model
```text
Postmaster (supervisor)
├── Backend process (1 per connection)
├── WAL writer
├── Checkpointer
├── Autovacuum launcher
├── Stats collector (PostgreSQL 14 and older; 15+ keeps statistics in shared memory)
├── Logical replication launcher
└── Archiver (WAL archiving)
```
Each connection gets its own OS process (not a thread). Advantage: isolation and stability. Disadvantage: a higher memory footprint with thousands of connections, so a connection pooler such as PgBouncer is needed.
### MVCC (Multi-Version Concurrency Control)
Each transaction sees a snapshot of the data as of the moment it started. Old row versions (tuples) remain in the table:
- INSERT creates a new tuple with `xmin = current_xid`
- DELETE marks the tuple with `xmax = current_xid` (it does not disappear immediately)
- UPDATE = DELETE old + INSERT new
- VACUUM reclaims the space of dead tuples that are no longer visible to any active snapshot
### VACUUM and autovacuum
| Parameter | Description | Default |
|----------|-------|---------|
| `autovacuum_vacuum_threshold` | Min. dead rows to trigger vacuum | 50 |
| `autovacuum_vacuum_scale_factor` | Fraction of the table's rows added to the threshold | 0.2 (20%) |
| `autovacuum_analyze_threshold` | Min. changed rows for ANALYZE | 50 |
| `autovacuum_vacuum_cost_limit` | Limits I/O of vacuum (prevents load) | -1 (falls back to `vacuum_cost_limit` = 200) |
| `autovacuum_naptime` | Interval between checks | 1 min |
| `deadlock_timeout` | Time to wait on a lock before running deadlock detection (a lock setting, not autovacuum-specific) | 1 s |
**Signs of insufficient vacuum**: table growth (bloat), degraded index scan performance, XID wraparound hazard.
### WAL (Write-Ahead Log)
Append-only log of all changes, used for crash recovery and replication:
```conf
wal_level = replica # or logical
archive_mode = on
archive_command = 'aws s3 cp %p s3://backups/pg-wal/%f'
```
**PITR (Point-In-Time Recovery)**:
1. Restore the base backup (taken with `pg_basebackup`)
2. Set `recovery_target_time = '2026-06-03 10:30:00 UTC'` together with a `restore_command` that fetches archived WAL
3. Start the server in recovery; it replays the WAL archives up to the target time
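In `archive_command`, `%f` expands to the WAL segment file name, which is derived from the timeline and the WAL position. The sketch below reproduces that naming for the default 16 MB segment size, which helps when checking whether an archive bucket holds the segment covering a given LSN. The function name is hypothetical, and the segment size is an assumption that must match the cluster's `wal_segment_size`.

```python
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # default wal_segment_size (assumed)


def wal_file_name(lsn: str, timeline: int = 1,
                  segment_size: int = WAL_SEGMENT_SIZE) -> str:
    """Name of the WAL segment file that contains the given LSN."""
    hi, lo = lsn.split("/")
    position = (int(hi, 16) << 32) | int(lo, 16)
    segno = position // segment_size
    # Number of segments per 4 GiB "log id" (256 for 16 MB segments).
    segments_per_id = 0x100000000 // segment_size
    return "%08X%08X%08X" % (timeline,
                             segno // segments_per_id,
                             segno % segments_per_id)


if __name__ == "__main__":
    print(wal_file_name("0/16B3748"))
    print(wal_file_name("1/0"))
```

On a live server, `pg_walfile_name()` returns the same name for a given LSN.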
### Replication slots
- **Physical** — guarantees that the primary keeps WAL until the replica has consumed it
- **Logical** — for logical replication (selective tables, data transformation)
- **Risk**: if a replica fails or falls behind, the primary keeps accumulating WAL for its slot and the disk can fill up
- Monitoring: `pg_replication_slots`, `pg_stat_replication`
### Configuration
Main files (per Obe & Hsu):
- `postgresql.conf` — memory, network, logging, storage
- `pg_hba.conf` — access privileges
- `pg_ident.conf` — OS user to PostgreSQL role mapping
### AI-Ready PostgreSQL 18
(Kumar, Linster, 2026) — PostgreSQL 18 as a unified platform for transactions, analytics, and AI:
| Area | Technique |
|--------|----------|
| Vectors | pgvector — embeddings directly in table rows |
| Hybrid pattern | Semantic recall → SQL filtering |
| LLM integration | PostgreSQL + MCP (Model Context Protocol) |
| Embedding pipeline | Batch and stream embedding generation |
**Hybrid query**:
```sql
SELECT p.*
FROM products p
JOIN product_embeddings pe ON p.id = pe.product_id
WHERE pe.embedding <-> '[0.1, 0.3, ...]' < 0.8
AND p.in_stock = true
AND p.price < 100.00
ORDER BY pe.embedding <-> '[0.1, 0.3, ...]'
LIMIT 10;
```
### Extensions
| Extension | Purpose |
|-----------|-------|
| pgvector | Vector search for AI/embeddings |
| PostGIS | Geographic data, spatial queries |
| pg_stat_statements | Query performance monitoring |
| pg_duckdb | Analytical queries (DuckDB engine inside PG) |
| pg_search | Full-text and hybrid search |
| pg_cron | DB job scheduling |
| citus | Horizontal scaling (sharding) |
| timescaledb | Time-series optimization |
| pgaudit | Audit logging |
## Connection pooling
| Pooler | Type | Protocol |
|--------|-----|----------|
| PgBouncer | Proxy (transaction/session) | PostgreSQL wire |
| Odyssey | Proxy (multithreaded) | PostgreSQL wire |
| pgpool-II | Proxy (replication, load balancing) | PostgreSQL wire |
| RDS Proxy | Managed proxy (AWS) | PostgreSQL wire |
**PgBouncer modes**:
- **Session pooling** — a server connection is held for the entire client session, so there is little multiplexing gain
- **Transaction pooling** — the server connection is returned to the pool after each transaction, which is more efficient (requires that the application not depend on session state, e.g. session-level `SET` or advisory locks)
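The difference between the two modes shows up in how many server connections are needed for the same client activity. The toy calculation below assumes all client sessions stay connected for the whole observed window and models each session as a list of `(start, end)` transaction intervals. It is a sketch of the idea, not a model of PgBouncer internals, and the function name is hypothetical.

```python
from typing import List, Tuple

Session = List[Tuple[int, int]]  # (start, end) of each transaction


def server_connections_needed(sessions: List[Session], mode: str) -> int:
    """Peak number of server connections under the given pooling mode."""
    if mode == "session":
        # One server connection per connected client, busy or idle.
        return len(sessions)
    if mode == "transaction":
        events = []
        for transactions in sessions:
            for start, end in transactions:
                events.append((start, 1))
                events.append((end, -1))
        # At equal timestamps -1 sorts first: a connection released at
        # time t can be reused by a transaction starting at time t.
        events.sort()
        peak = current = 0
        for _, delta in events:
            current += delta
            peak = max(peak, current)
        return peak
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    sessions = [[(0, 2), (10, 12)], [(2, 4), (12, 14)], [(1, 3)]]
    print(server_connections_needed(sessions, "session"))
    print(server_connections_needed(sessions, "transaction"))
```

The more idle time clients spend between transactions, the larger the gap between the two numbers.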
## Recommendations — where PostgreSQL is better
| Area | PostgreSQL | Competition | Why PG |
|--------|-----------|------------|---------|
| **Extensibility** | Extensions, custom types, operators, index methods | MySQL limited | Can add anything from vectors to full-text in DB |
| **SQL standard** | Closest to ANSI SQL | MySQL deviations (GROUP BY, ALTER TABLE) | Portability, fewer surprises |
| **Geospatial data** | PostGIS (gold standard GIS) | MySQL GIS (limited) | Only real open-source choice for GIS |
| **Consistency** | SSI serializable, foreign keys, CHECK, exclusions | MySQL MyISAM has no FK or transactions; InnoDB defaults to REPEATABLE READ | Suitable for financial and critical systems |
| **Concurrent read/write** | MVCC without reader/writer blocking | MySQL MyISAM uses table-level locks, so readers and writers block each other (InnoDB also uses MVCC) | Better read scalability |
| **AI/vectors** | pgvector natively in DB | Separate vector DB (increased latency) | Hybrid queries in single SQL |
| **License** | PostgreSQL license (MIT-like) | MySQL dual license (Oracle) | No vendor lock-in |
### When to use PostgreSQL
- **Enterprise applications** — require ACID, referential integrity, complex transactions
- **Geographic systems** — GIS, map applications, location services
- **Financial systems** — accounting, banking, compliance (audit logging, SSI)
- **AI / RAG applications** — hybrid vector + relational queries in one DB
- **Analytics on relational data** — pg_duckdb, materialized views, window functions
- **Multi-tenant applications** — row-level security, schemas per tenant
## PostgreSQL licensing
| Variant | License | Price | Restrictions |
|----------|---------|------|---------|
| **PostgreSQL** | PostgreSQL license (MIT-like) | $0 | None — can use, modify, distribute in commercial products. No "commercial license" needed |
| **Amazon Aurora PostgreSQL** | Proprietary (AWS) | ~$0.10-1.00/hour | AWS managed, PostgreSQL compatible. AWS may use PG code thanks to PostgreSQL license |
| **YugabyteDB** | Apache 2.0 | $0 (core) | PostgreSQL compatible distributed SQL, built on PG query layer |
| **TimescaleDB** | Apache 2.0 (community) / Timescale License (enterprise) | $0 (community) | Time-series extensions for PostgreSQL. Enterprise: tiered storage, compression, multi-node |
**Key point**: The PostgreSQL license is one of the most liberal — it allows cloud providers (AWS, GCP, Azure) to offer PostgreSQL as a managed service without restrictions. This is different from MongoDB (SSPL) and Redis (RSALv2). Thanks to this, PostgreSQL has the broadest cloud support of any database.
**Impact on choice**: No license risk, no vendor lock-in, no hidden license costs. From a licensing standpoint, PostgreSQL is a safe choice for any project.
### When to use something else
- **Simple web / blog** → SQLite (lighter in embedded scenarios)
- **High-throughput key-value** → Redis (order of magnitude lower latency)
- **Time-series at massive scale** → TimescaleDB, InfluxDB
- **Globally distributed data** → CockroachDB, Spanner
- **Full-text search primarily** → Elasticsearch
## Sources
References, books, and standards: [sources/databases/sources.md](sources/databases/sources.md)
### Recommended reading
| Book | Authors | ISBN | Description |
|-------|--------|------|-------|
| PostgreSQL: Up and Running (3rd ed.) | Regina Obe, Leo Hsu | 978-1491962935 | Practical guide to administration, configuration, and extensions |
| AI-Ready PostgreSQL 18 | Kumar, Linster | — | PostgreSQL as unified platform for AI workloads |
*Last revision: 2026-06-03*

228
PROVISIONING.en.md Normal file

@@ -0,0 +1,228 @@
# 📦 Provisioning — boot, installation, server management
## Network boot (PXE / iPXE)
### PXE boot flow
```
1. Server power-on → PXE ROM in NIC / UEFI
2. DHCP Broadcast → DHCP server offers IP + next-server (TFTP) + boot file
3. TFTP downloads pxelinux.0 (BIOS) / bootx64.efi (UEFI)
4. Loads configuration (pxelinux.cfg/default or MAC/IP-based)
5. Downloads kernel + initrd via TFTP/HTTP (iPXE)
6. Kernel boot → automated installation (Kickstart / Preseed / AutoYaST)
```
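Step 4 mentions MAC/IP-based configuration. As a rough sketch of how PXELINUX picks a file, the helper below lists the names it requests under `pxelinux.cfg/`: first the MAC-based name, then the client IPv4 address in hex with one digit dropped per attempt, and finally `default`. The function name and the sample MAC and IP are illustrative assumptions, and the optional client-UUID lookup that comes first is left out.

```python
import ipaddress


def pxelinux_config_candidates(mac: str, ip: str) -> list:
    """Config file names under pxelinux.cfg/, in the order PXELINUX requests them.

    The optional client-UUID lookup that precedes these is not modelled.
    """
    # "01" is the ARP hardware type for Ethernet; the MAC is lower-case hex with dashes.
    by_mac = "01-" + mac.lower().replace(":", "-")
    # The IPv4 address as eight upper-case hex digits, shortened by one digit per attempt.
    ip_hex = f"{int(ipaddress.IPv4Address(ip)):08X}"
    by_ip = [ip_hex[:n] for n in range(8, 0, -1)]
    return [by_mac, *by_ip, "default"]


# Sample values only; any MAC and IPv4 address work the same way.
for name in pxelinux_config_candidates("88:99:AA:BB:CC:DD", "10.0.0.91"):
    print("pxelinux.cfg/" + name)
```

Because the hex name is shortened digit by digit, a single file such as `pxelinux.cfg/0A0000` can serve a whole subnet, while a MAC-named file targets one machine.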
### DHCP configuration (ISC DHCP)
```
subnet 10.0.0.0 netmask 255.255.255.0 {
  range 10.0.0.100 10.0.0.200;          # Dynamic pool (example range; adjust to your network)
  next-server 10.0.0.10;                # TFTP server
  filename "ipxe.efi";                  # Boot file (UEFI clients)
  option domain-name-servers 10.0.0.10;
  option routers 10.0.0.1;
}
```
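The configuration above hands `ipxe.efi` to every client. In practice the boot file usually depends on the client architecture (DHCP option 93), and a client that is already running iPXE should receive a script instead of the binary, otherwise it chainloads itself in a loop. The sketch below models only that decision. The script URL and the function name are assumptions for illustration; `undionly.kpxe` is the usual iPXE build for legacy BIOS clients.

```python
from typing import Optional

# Client architecture codes carried in DHCP option 93 (RFC 4578).
ARCH_BIOS = 0
ARCH_UEFI_X64 = (7, 9)

# Hypothetical location of the iPXE script served over HTTP.
IPXE_SCRIPT_URL = "http://10.0.0.10/boot.ipxe"


def choose_boot_file(arch: int, user_class: Optional[str] = None) -> str:
    """Pick the DHCP 'filename' value for one client request."""
    # iPXE announces itself with user-class "iPXE"; give it a script, not the binary again.
    if user_class == "iPXE":
        return IPXE_SCRIPT_URL
    if arch in ARCH_UEFI_X64:
        return "ipxe.efi"
    if arch == ARCH_BIOS:
        return "undionly.kpxe"
    raise ValueError(f"unsupported client architecture: {arch}")


print(choose_boot_file(7))           # first request from UEFI firmware
print(choose_boot_file(7, "iPXE"))   # second request, now from iPXE itself
```

In ISC DHCP the same decision is normally written as conditional statements on the user-class and architecture options inside the subnet block.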
### iPXE (modern PXE replacement)
- HTTP instead of TFTP (faster, more reliable)
- HTTPS support (image verification, Secure Boot)
- iSCSI boot, FCoE boot
- Scriptable: `chain http://boot.example.com/script.ipxe`
- Embedded: iPXE ROM flashed directly into NIC
### PXE vs iPXE comparison
| Feature | PXE | iPXE |
|---------|-----|------|
| Protocol | TFTP (slow, 512 B/block by default) | HTTP/HTTPS/iSCSI |
| Encryption | No | HTTPS (TLS) |
| Scripting | Menu only | Full scripting engine |
| Debugging | Limited | Built-in shell |
| UEFI/BIOS | Both | Both |
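To put the "slow, 512 B/block" entry into numbers: classic TFTP is lock-step, so each block costs at least one network round trip. The sketch below computes that lower bound. The 100 MiB image size and the 1 ms round-trip time are assumed values, not measurements.

```python
import math


def tftp_transfer_seconds(size_bytes: int, rtt_s: float, block_size: int = 512) -> float:
    """Lower bound for lock-step TFTP: one data block per round trip."""
    blocks = math.ceil(size_bytes / block_size)
    return blocks * rtt_s


initrd_bytes = 100 * 1024 * 1024  # assumed 100 MiB initrd
# 204800 blocks at 1 ms each: about 205 seconds before any server or disk overhead.
print(tftp_transfer_seconds(initrd_bytes, rtt_s=0.001))
```

HTTP streams the same file over a single TCP connection, so transfer time is bounded by bandwidth rather than by the number of round trips.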
## Automated installation
### Kickstart (RHEL/Alma/Rocky)
```
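# Minimal kickstart for RHEL 9
text
url --url="http://10.0.0.10/install/rhel9"
lang en_US.UTF-8
keyboard us
# RHEL 9 deprecates --isUtc in favor of --utc
timezone Europe/Prague --utc
rootpw --iscrypted $6$...
# Wipe the target disk and partition it automatically (destroys existing data)
zerombr
clearpart --all --initlabel
autopart
%packages
@^minimal-environment
vim
net-tools
%end
%post
echo "node001" > /etc/hostname
%end
reboot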
```
### Preseed (Debian/Ubuntu)
```
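# Note: Ubuntu Server 20.04+ installs via autoinstall (Subiquity); preseed applies to Debian and the legacy installer
d-i debian-installer/locale string en_US.UTF-8
d-i keyboard-configuration/xkb-keymap us
d-i netcfg/choose_interface select auto
d-i netcfg/get_hostname string node001
d-i clock-setup/utc boolean true
d-i time/zone string Europe/Prague
d-i partman-auto/method string regular
d-i partman-auto/choose_recipe select atomic
# Confirm partitioning without prompting (destroys existing data)
d-i partman-partitioning/confirm_write_new_label boolean true
d-i partman/choose_partition select finish
d-i partman/confirm boolean true
d-i partman/confirm_nooverwrite boolean true
d-i passwd/root-login boolean true
# Plaintext password for demonstration only; prefer passwd/root-password-crypted
d-i passwd/root-password password securepass
d-i passwd/root-password-again password securepass
d-i pkgsel/include string openssh-server vim
d-i finish-install/reboot_in_progress note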
```
## Metal as a Service
### MAAS (Canonical)
- **Discovery**: DHCP → PXE boot → hardware detection (CPU, RAM, disk, MAC)
- **Commissioning**: node goes through commissioning, stores inventory in DB
- **Deploy**: OS image (Ubuntu, RHEL, ESXi) written to disk → reboot
- **Integration**: Juju, OpenStack, Kubernetes (Charmed Kubernetes)
- **Networking**: VLAN, subnet, DNS/DHCP management, BGP peering
### Digital Rebar / RackN
- **Provisioning**: workflow-based (stages: discovery → firmware → OS → config)
- **Multi-cloud**: bare metal + cloud + edge
- **Template**: templates for OS deployment (RHEL, Ubuntu, VMware)
- **API**: fully REST-driven API, Terraform provider
## Management API — Redfish
### DMTF Standard
A RESTful API (JSON over HTTPS) and the successor to IPMI.
| Endpoint | Purpose |
|----------|---------|
| `/redfish/v1/Systems/` | Server management (power, boot, inventory) |
| `/redfish/v1/Chassis/` | Physical hardware (PSU, fan, temp, sensors) |
| `/redfish/v1/Managers/` | BMC (iLO, iDRAC, XClarity) |
| `/redfish/v1/UpdateService/` | Firmware updates |
| `/redfish/v1/EventService/` | Event subscription (webhook) |
### Redfish examples
```
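# Resource IDs (Systems/1, Chassis/1) are vendor-specific; list /redfish/v1/Systems/ to discover them
# Power on server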
POST /redfish/v1/Systems/1/Actions/ComputerSystem.Reset
Body: {"ResetType": "On"}
# Set boot override (one-shot PXE)
PATCH /redfish/v1/Systems/1
Body: {"Boot": {"BootSourceOverrideTarget": "Pxe", "BootSourceOverrideEnabled": "Once"}}
# Get sensor data
GET /redfish/v1/Chassis/1/Thermal
→ {"Temperatures": [{"Name": "CPU1", "ReadingCelsius": 45}], "Fans": [...]}
```
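The two write operations above can be prepared with the Python standard library alone. This is a minimal sketch under stated assumptions: the BMC address is hypothetical, the system ID `1` is taken from the examples and varies by vendor, and authentication (basic auth or a session token) and TLS verification settings are deliberately left out. Nothing is sent until the request object is passed to `urllib.request.urlopen`.

```python
import json
import urllib.request

BMC_URL = "https://bmc.example.com"  # hypothetical BMC address


def redfish_request(method: str, path: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a Redfish request with a JSON body."""
    data = json.dumps(body).encode("utf-8")
    request = urllib.request.Request(BMC_URL + path, data=data, method=method)
    request.add_header("Content-Type", "application/json")
    return request


def power_on(system_id: str = "1") -> urllib.request.Request:
    path = f"/redfish/v1/Systems/{system_id}/Actions/ComputerSystem.Reset"
    return redfish_request("POST", path, {"ResetType": "On"})


def pxe_once(system_id: str = "1") -> urllib.request.Request:
    boot = {"BootSourceOverrideTarget": "Pxe", "BootSourceOverrideEnabled": "Once"}
    return redfish_request("PATCH", f"/redfish/v1/Systems/{system_id}", {"Boot": boot})


for req in (pxe_once(), power_on()):
    print(req.get_method(), req.full_url, req.data.decode())
```

Setting the one-shot PXE override before the power-on request is the usual order when a node should boot straight into the installer.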
### IPMI (legacy)
- Port 623/UDP (RMCP)
- `ipmitool power on/off/status`
- `ipmitool sensor list`
- `ipmitool chassis bootdev pxe`
- Serial over LAN: `ipmitool sol activate`
## Terraform for provisioning
```hcl
# Terraform provider for VMware vSphere
provider "vsphere" {
  user           = var.vsphere_user
  password       = var.vsphere_password
  vsphere_server = var.vsphere_server
}
# The data sources referenced below (resource pool, datastore, network)
# are assumed to be declared elsewhere in the configuration.
resource "vsphere_virtual_machine" "web" {
  count            = 2 # example value; required because the name uses count.index
  name             = "web-${count.index}"
  resource_pool_id = data.vsphere_resource_pool.pool.id
  datastore_id     = data.vsphere_datastore.ds.id
  num_cpus         = 4
  memory           = 16384 # MB
  guest_id         = "rhel9_64Guest"
  network_interface {
    network_id = data.vsphere_network.net.id
  }
  disk {
    label = "os"
    size  = 80 # GiB
  }
}
```
More in [CICD.en.md](CICD.en.md#infrastructure-as-code-iac).
## Firmware management
- **BIOS/UEFI settings**: profile update during provisioning (Redfish, typically `PATCH /redfish/v1/Systems/1/Bios/Settings`, applied on the next reboot)
- **Firmware updates**: Redfish UpdateService, SUU (Dell Server Update Utility), SUM (HPE Smart Update Manager), SUM (Supermicro Update Manager)
- **Lifecycle Controller** (Dell LC): embedded management environment for firmware updates and OS deployment
- **Baseline management**: maintain consistent firmware versions across the fleet
- **Boot: UEFI vs Legacy BIOS**:
  - **UEFI**: Secure Boot, GPT, larger disks, faster boot
  - **Legacy BIOS**: MBR, compatibility, 2 TB boot disk limit
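The 2 TB figure follows from the MBR format itself: partition entries store the start and length as 32-bit sector counts. A short calculation, assuming the common 512-byte and 4096-byte logical sector sizes:

```python
TIB = 2 ** 40


def mbr_limit_bytes(sector_bytes: int = 512) -> int:
    """Largest size addressable by a 32-bit sector count in an MBR partition entry."""
    return (2 ** 32) * sector_bytes


print(mbr_limit_bytes() / TIB)      # 2.0  (512-byte sectors)
print(mbr_limit_bytes(4096) / TIB)  # 16.0 (4096-byte logical sectors)
```

GPT uses 64-bit sector addresses, which is why UEFI with GPT does not run into this limit on current disks.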
## Configuration management (post-provisioning)
| Tool | Language | Push/Pull | Use case |
|------|----------|-----------|----------|
| **Ansible** | YAML | Push (SSH) | General config management, ad-hoc |
| **Puppet** | Puppet DSL | Pull (agent) | State management, enterprise |
| **Chef** | Ruby DSL | Pull (agent) | Compliance, infrastructure automation |
| **SaltStack** | YAML/Python | Both (salt-minion) | High-speed config, event-driven |
More in [CICD.en.md](CICD.en.md).
## OpenStack Provisioning
OpenStack offers several methods for provisioning infrastructure:
### Deployment tools
| Tool | Description | Use case |
|------|-------------|----------|
| **TripleO (OpenStack on OpenStack)** | Deploys OpenStack using bare metal (Ironic) + Heat orchestration; deprecated upstream | Existing Red Hat OSP (director) deployments |
| **Kolla / Kolla-Ansible** | Containerized OpenStack services (Kolla images), deployed with Ansible | Production, flexible |
| **Kolla-Kubernetes** | OpenStack on Kubernetes; the project is retired, OpenStack-Helm is the maintained alternative | Historical reference |
| **Charmed OpenStack (Juju)** | Canonical, Juju charms for OpenStack | Ubuntu, hybrid cloud |
| **OpenStack Charms** | Juju charms for individual services | Fine-grained deployment |
| **DevStack** | Fast development deployment | Dev/test, learning |
| **OpenStack-Ansible** | Ansible playbooks for OpenStack (OSA) | Legacy, AIO |
### Ironic (Bare Metal Provisioning)
- OpenStack service for managing and provisioning bare metal servers
- Supports PXE, iPXE, Redfish, IPMI
- Concepts: **Node** (HW), **Port** (MAC), **Driver** (HW type)
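- Lifecycle (states, with the triggering verb in parentheses): enroll → manageable (`manage`) → optional inspection (`inspect`) → available (`provide`, runs cleaning) → active (`deploy`)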
- Integration with Nova: Nova runs instances on bare metal via Ironic
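The node lifecycle can be read as a small state machine. The sketch below is a simplified model, not Ironic's implementation: it tracks only the stable states and skips the transient ones (verifying, inspecting, cleaning, deploying). The verb names follow the `openstack baremetal node` commands; the table and function names are illustrative.

```python
# (current state, verb) -> resulting stable state
TRANSITIONS = {
    ("enroll", "manage"): "manageable",
    ("manageable", "inspect"): "manageable",  # inspection returns to manageable
    ("manageable", "provide"): "available",   # cleaning happens on the way
    ("available", "deploy"): "active",
    ("active", "undeploy"): "available",
}


def next_state(state: str, verb: str) -> str:
    try:
        return TRANSITIONS[(state, verb)]
    except KeyError:
        raise ValueError(f"verb {verb!r} is not allowed in state {state!r}") from None


def run(verbs, state: str = "enroll") -> str:
    """Apply a sequence of verbs and return the final state."""
    for verb in verbs:
        state = next_state(state, verb)
    return state


print(run(["manage", "inspect", "provide", "deploy"]))  # active
```

A node has to be `available` before Nova can schedule an instance onto it, which is why `provide` sits between enrollment and deployment.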
### Glance (Image Management)
- Image catalog for VM images and ISO
- Supported formats: raw, qcow2, vmdk, vhd, iso
- Image caching on compute node (for faster boot)
- Multi-backend: file, Ceph RBD, Swift, NFS
## Sources
Links, books and standards: [sources/infrastructure/sources.en.md](sources/infrastructure/sources.en.md)
*Last revision: 2026-06-03*

228
PROVISIONING.md Normal file

@@ -0,0 +1,228 @@
# 📦 Provisioning — boot, installation, server management
## Network boot (PXE / iPXE)
### PXE boot flow
```
1. Server power-on → PXE ROM in NIC / UEFI
2. DHCP Broadcast → DHCP server offers IP + next-server (TFTP) + boot file
3. TFTP downloads pxelinux.0 (BIOS) / bootx64.efi (UEFI)
4. Loads configuration (pxelinux.cfg/default or MAC/IP-based)
5. Downloads kernel + initrd via TFTP/HTTP (iPXE)
6. Kernel boot → automated installation (Kickstart / Preseed / AutoYaST)
```
### DHCP configuration (ISC DHCP)
```
subnet 10.0.0.0 netmask 255.255.255.0 {
  range 10.0.0.100 10.0.0.200;          # Dynamic pool (example range; adjust to your network)
  next-server 10.0.0.10;                # TFTP server
  filename "ipxe.efi";                  # Boot file (UEFI clients)
  option domain-name-servers 10.0.0.10;
  option routers 10.0.0.1;
}
```
### iPXE (modern PXE replacement)
- HTTP instead of TFTP (faster, more reliable)
- HTTPS support (image verification, Secure Boot)
- iSCSI boot, FCoE boot
- Scriptable: `chain http://boot.example.com/script.ipxe`
- Embedded: iPXE ROM flashed directly into the NIC
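An iPXE script of the kind fetched by `chain` is a short list of commands. The sketch below renders one for a RHEL-style install tree. The kickstart URL is a hypothetical example, the `images/pxeboot/` paths assume the standard RHEL install-tree layout, and `inst.repo` / `inst.ks` are the installer's boot options for the repository and the kickstart file.

```python
def render_ipxe_script(repo_url: str, kickstart_url: str) -> str:
    """Render an iPXE script that boots the installer kernel over HTTP."""
    # initrd=initrd.img tells the kernel which loaded image is its initrd (needed on UEFI).
    kernel_args = f"initrd=initrd.img inst.repo={repo_url} inst.ks={kickstart_url}"
    lines = [
        "#!ipxe",
        "dhcp",
        f"kernel {repo_url}/images/pxeboot/vmlinuz {kernel_args}",
        f"initrd {repo_url}/images/pxeboot/initrd.img",
        "boot",
    ]
    return "\n".join(lines) + "\n"


print(render_ipxe_script(
    "http://10.0.0.10/install/rhel9",
    "http://10.0.0.10/ks/node001.cfg",  # hypothetical kickstart location
))
```

Serving a generated script per node makes it possible to point each machine at its own kickstart file without touching the DHCP configuration.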
### PXE vs iPXE comparison
| Feature | PXE | iPXE |
|---------|-----|------|
| Protocol | TFTP (slow, 512 B/block by default) | HTTP/HTTPS/iSCSI |
| Encryption | No | HTTPS (TLS) |
| Scripting | Menu only | Full scripting engine |
| Debugging | Limited | Built-in shell |
| UEFI/BIOS | Both | Both |
## Automated installation
### Kickstart (RHEL/Alma/Rocky)
```
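# Minimal kickstart for RHEL 9
text
url --url="http://10.0.0.10/install/rhel9"
lang en_US.UTF-8
keyboard us
# RHEL 9 deprecates --isUtc in favor of --utc
timezone Europe/Prague --utc
rootpw --iscrypted $6$...
# Wipe the target disk and partition it automatically (destroys existing data)
zerombr
clearpart --all --initlabel
autopart
%packages
@^minimal-environment
vim
net-tools
%end
%post
echo "node001" > /etc/hostname
%end
reboot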
```
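The kickstart above hard-codes `node001`. For more than one server the `%post` section is usually generated per node. A minimal sketch, assuming a simple `prefix + zero-padded number` naming scheme; the template, regular expression, and function names are illustrative:

```python
import re

POST_TEMPLATE = '%post\necho "{hostname}" > /etc/hostname\n%end\n'
# Conservative check for a single DNS label (letters, digits, inner hyphens).
HOSTNAME_RE = re.compile(r"^[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?$")


def node_names(prefix: str, count: int, width: int = 3) -> list:
    """node001, node002, ... for the given prefix."""
    return [f"{prefix}{i:0{width}d}" for i in range(1, count + 1)]


def render_post(hostname: str) -> str:
    """Render the %post section for one node, rejecting unsafe hostnames."""
    if not HOSTNAME_RE.match(hostname):
        raise ValueError(f"invalid hostname: {hostname!r}")
    return POST_TEMPLATE.format(hostname=hostname)


for name in node_names("node", 3):
    print(render_post(name))
```

Validating the hostname before it is written into a shell command avoids injecting arbitrary text into the `%post` script.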
### Preseed (Debian/Ubuntu)
```
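# Note: Ubuntu Server 20.04+ installs via autoinstall (Subiquity); preseed applies to Debian and the legacy installer
d-i debian-installer/locale string en_US.UTF-8
d-i keyboard-configuration/xkb-keymap us
d-i netcfg/choose_interface select auto
d-i netcfg/get_hostname string node001
d-i clock-setup/utc boolean true
d-i time/zone string Europe/Prague
d-i partman-auto/method string regular
d-i partman-auto/choose_recipe select atomic
# Confirm partitioning without prompting (destroys existing data)
d-i partman-partitioning/confirm_write_new_label boolean true
d-i partman/choose_partition select finish
d-i partman/confirm boolean true
d-i partman/confirm_nooverwrite boolean true
d-i passwd/root-login boolean true
# Plaintext password for demonstration only; prefer passwd/root-password-crypted
d-i passwd/root-password password securepass
d-i passwd/root-password-again password securepass
d-i pkgsel/include string openssh-server vim
d-i finish-install/reboot_in_progress note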
```
## Metal as a Service
### MAAS (Canonical)
- **Discovery**: DHCP → PXE boot → hardware detection (CPU, RAM, disk, MAC)
- **Commissioning**: node goes through commissioning, stores inventory in DB
- **Deploy**: OS image (Ubuntu, RHEL, ESXi) written to disk → reboot
- **Integrace**: Juju, OpenStack, Kubernetes (Charmed Kubernetes)
- **Networking**: VLAN, subnet, DNS/DHCP management, BGP peering
### Digital Rebar / RackN
- **Provisioning**: workflow-based (stages: discovery → firmware → OS → config)
- **Multi-cloud**: bare metal + cloud + edge
- **Template**: templates for OS deployment (RHEL, Ubuntu, VMware)
- **API**: fully REST-driven API, Terraform provider
## Management API — Redfish
### DMTF Standard
A RESTful API (JSON over HTTPS) and the successor to IPMI.
| Endpoint | Purpose |
|----------|---------|
| `/redfish/v1/Systems/` | Server management (power, boot, inventory) |
| `/redfish/v1/Chassis/` | Physical hardware (PSU, fan, temp, sensors) |
| `/redfish/v1/Managers/` | BMC (iLO, iDRAC, XClarity) |
| `/redfish/v1/UpdateService/` | Firmware updates |
| `/redfish/v1/EventService/` | Event subscription (webhook) |
### Redfish examples
```
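# Resource IDs (Systems/1, Chassis/1) are vendor-specific; list /redfish/v1/Systems/ to discover them
# Power on server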
POST /redfish/v1/Systems/1/Actions/ComputerSystem.Reset
Body: {"ResetType": "On"}
# Set boot override (one-shot PXE)
PATCH /redfish/v1/Systems/1
Body: {"Boot": {"BootSourceOverrideTarget": "Pxe", "BootSourceOverrideEnabled": "Once"}}
# Get sensor data
GET /redfish/v1/Chassis/1/Thermal
→ {"Temperatures": [{"Name": "CPU1", "ReadingCelsius": 45}], "Fans": [...]}
```
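The `Thermal` response shown above is plain JSON, so picking out hot sensors takes only a few lines. In this sketch the sample payload mirrors the shape of the example response; the `CPU2` reading and the 80 °C threshold are made-up values for illustration.

```python
import json

# Sample payload shaped like the Thermal response above; CPU2 is invented test data.
SAMPLE = """
{"Temperatures": [
  {"Name": "CPU1", "ReadingCelsius": 45},
  {"Name": "CPU2", "ReadingCelsius": 82},
  {"Name": "Inlet", "ReadingCelsius": null}
]}
"""


def hot_sensors(thermal: dict, limit_c: float) -> list:
    """Names of temperature sensors reading at or above limit_c."""
    names = []
    for sensor in thermal.get("Temperatures", []):
        reading = sensor.get("ReadingCelsius")
        # Absent or unpowered sensors may report null; skip them.
        if reading is not None and reading >= limit_c:
            names.append(sensor["Name"])
    return names


print(hot_sensors(json.loads(SAMPLE), limit_c=80))  # ['CPU2']
```

The same function can be fed the body of a real `GET /redfish/v1/Chassis/<id>/Thermal` call once authentication is in place.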
### IPMI (legacy)
- Port 623/UDP (RMCP)
- `ipmitool power on/off/status`
- `ipmitool sensor list`
- `ipmitool chassis bootdev pxe`
- Serial over LAN: `ipmitool sol activate`
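When these commands are run against a remote BMC, each one needs the same connection options. The sketch below only builds the argument lists. It assumes the `lanplus` interface (IPMI v2.0) and uses `-E`, which makes `ipmitool` read the password from the `IPMI_PASSWORD` environment variable so it does not appear on the command line. Host and user are placeholder values.

```python
def ipmitool_argv(host: str, user: str, *command: str) -> list:
    """Argument list for one remote ipmitool call (password via IPMI_PASSWORD)."""
    return ["ipmitool", "-I", "lanplus", "-H", host, "-U", user, "-E", *command]


def pxe_boot_commands(host: str, user: str) -> list:
    """Set next boot to PXE, then power the server on."""
    return [
        ipmitool_argv(host, user, "chassis", "bootdev", "pxe"),
        ipmitool_argv(host, user, "power", "on"),
    ]


for argv in pxe_boot_commands("10.0.0.50", "admin"):  # placeholder BMC address and user
    print(" ".join(argv))
```

Each list can be passed to `subprocess.run(argv, check=True)` on a host that has `ipmitool` installed and `IPMI_PASSWORD` set.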
## Terraform for provisioning
```hcl
# Terraform provider for VMware vSphere
provider "vsphere" {
  user           = var.vsphere_user
  password       = var.vsphere_password
  vsphere_server = var.vsphere_server
}
# The data sources referenced below (resource pool, datastore, network)
# are assumed to be declared elsewhere in the configuration.
resource "vsphere_virtual_machine" "web" {
  count            = 2 # example value; required because the name uses count.index
  name             = "web-${count.index}"
  resource_pool_id = data.vsphere_resource_pool.pool.id
  datastore_id     = data.vsphere_datastore.ds.id
  num_cpus         = 4
  memory           = 16384 # MB
  guest_id         = "rhel9_64Guest"
  network_interface {
    network_id = data.vsphere_network.net.id
  }
  disk {
    label = "os"
    size  = 80 # GiB
  }
}
```
More in [CICD.md](CICD.md#infrastructure-as-code-iac).
## Firmware management
- **BIOS/UEFI settings**: profile update during provisioning (Redfish, typically `PATCH /redfish/v1/Systems/1/Bios/Settings`, applied on the next reboot)
- **Firmware updates**: Redfish UpdateService, SUU (Dell Server Update Utility), SUM (HPE Smart Update Manager), SUM (Supermicro Update Manager)
- **Lifecycle Controller** (Dell LC): embedded management environment for firmware updates and OS deployment
- **Baseline management**: maintain consistent firmware versions across the fleet
- **Boot: UEFI vs Legacy BIOS**:
  - **UEFI**: Secure Boot, GPT, larger disks, faster boot
  - **Legacy BIOS**: MBR, compatibility, 2 TB boot disk limit
## Configuration management (post-provisioning)
| Tool | Language | Push/Pull | Use case |
|------|----------|-----------|----------|
| **Ansible** | YAML | Push (SSH) | General config management, ad-hoc |
| **Puppet** | Puppet DSL | Pull (agent) | State management, enterprise |
| **Chef** | Ruby DSL | Pull (agent) | Compliance, infrastructure automation |
| **SaltStack** | YAML/Python | Both (salt-minion) | High-speed config, event-driven |
More in [CICD.md](CICD.md).
## OpenStack Provisioning
OpenStack offers several methods for provisioning infrastructure:
### Deployment tools
| Tool | Description | Use case |
|------|-------------|----------|
| **TripleO (OpenStack on OpenStack)** | Deploys OpenStack using bare metal (Ironic) + Heat orchestration; deprecated upstream | Existing Red Hat OSP (director) deployments |
| **Kolla / Kolla-Ansible** | Containerized OpenStack services (Kolla images), deployed with Ansible | Production, flexible |
| **Kolla-Kubernetes** | OpenStack on Kubernetes; the project is retired, OpenStack-Helm is the maintained alternative | Historical reference |
| **Charmed OpenStack (Juju)** | Canonical, Juju charms for OpenStack | Ubuntu, hybrid cloud |
| **OpenStack Charms** | Juju charms for individual services | Fine-grained deployment |
| **DevStack** | Fast development deployment | Dev/test, learning |
| **OpenStack-Ansible** | Ansible playbooks for OpenStack (OSA) | Legacy, AIO |
### Ironic (Bare Metal Provisioning)
- OpenStack service for managing and provisioning bare metal servers
- Supports PXE, iPXE, Redfish, IPMI
- Concepts: **Node** (HW), **Port** (MAC), **Driver** (HW type)
- Lifecycle (states, with the triggering verb in parentheses): enroll → manageable (`manage`) → optional inspection (`inspect`) → available (`provide`, runs cleaning) → active (`deploy`)
- Integration with Nova: Nova runs instances on bare metal via Ironic
### Glance (Image Management)
- Image catalog for VM images and ISOs
- Supported formats: raw, qcow2, vmdk, vhd, iso
- Image caching on the compute node (for faster boot)
- Multi-backend: file, Ceph RBD, Swift, NFS
## Sources
Links, books, and standards: [sources/infrastructure/sources.md](sources/infrastructure/sources.md)
*Last revision: 2026-06-03*

211
README.en.md Normal file

@@ -0,0 +1,211 @@
# 🏗️ Infrastructure Architecture — Knowledge Base
Comprehensive overview of topics, principles, and best practices for infrastructure design and operations.
Bilingual: Czech (`.md`) and English (`.en.md`).
---
## Topic Map — Relationships Between Areas
```
                 ┌─────────────┐
                 │    CLOUD    │
                 │ (IaaS/PaaS) │
                 └──────┬──────┘
         ┌──────────────┼──────────────┐
         ▼              ▼              ▼
    ┌──────────┐  ┌──────────┐  ┌──────────────┐
    │NETWORKING│  │ STORAGE  │  │  DATABASES   │
    │(L2-L7,   │  │(SAN/NAS/ │  │ (SQL/NOSQL/  │
    │ Zero Tr.)│  │ Ceph/SDS)│  │  Vector)     │
    └────┬─────┘  └────┬─────┘  └──────┬───────┘
         │             │               │
         ▼             ▼               ▼
     ┌─────────────────────────────────────┐
     │             DATACENTERS             │
     │   (Tier, power, cooling, layout)    │
     └────────────┬────────────────────────┘
     ┌────────────┼────────────┬───────────────┐
     ▼            ▼            ▼               ▼
┌──────────┐ ┌──────────┐ ┌────────┐ ┌──────────────┐
│SERVER-HW │ │SERVER-   │ │  GPU   │ │ PROVISIONING │
│(CPU,RAM, │ │CONFIG    │ │(NVIDIA/│ │ (PXE, Ironic │
│ PCIe,BM) │ │(BIOS,    │ │ AMD)   │ │  Terraform)  │
└──────────┘ │ NUMA)    │ └────────┘ └──────────────┘
             └──────────┘
┌──────────┐ ┌──────────┐ ┌────────┐ ┌────────────┐
│HYPERVISOR│ │ MONITOR  │ │  CICD  │ │  ☸ K8s     │
│(VMware,  │ │(Prom,    │ │(GitOps,│ │(CAPI, K3s, │
│ KVM, ...)│ │ Grafana) │ │ IaC)   │ │ RKE2...)   │
└──────────┘ └──────────┘ └────────┘ └────────────┘
```
---
## Navigation — Czech (`.md`)
| Area | File | Description | Related to |
|------|------|-------------|------------|
| ☁️ Cloud architecture | [CLOUD.md](CLOUD.md) | AWS/Azure/GCP, hybrid cloud, multi-cloud | GPU, NETWORKING |
| 🌐 Network architecture | [NETWORKING.md](NETWORKING.md) | DNS, BGP, VPC, Zero Trust, EVPN VXLAN, TLS | CLOUD |
| 📊 Monitoring & observability | [MONITORING.md](MONITORING.md) | Prometheus, Grafana, OTel, logging, alerting | — |
| 🔄 CI/CD & DevOps | [CICD.md](CICD.md) | Pipelines, GitOps, IaC (Terraform), deployment | — |
| 💻 Operating systems | [OS.md](OS.md) | Linux distributions, Windows Server, lifecycle, EOL, kernel | KUBERNETES, HYPERVISORS, AI-INFRASTRUCTURE |
| 🔄 Disaster Recovery | [DR.md](DR.md) | RTO, RPO, scenarios, prevention, uptime calculation | CLOUD, DATACENTERS, MONITORING |
| 🗄️ Database architecture | [DATABASES.md](DATABASES.md) | Classification, sharding, replication, caching | POSTGRESQL, MYSQL, ORACLE, MONGODB, REDIS, CASSANDRA, VEKTOROVE-DB, DATABAZOVE-ENGINY |
| 🗄️ Big Data | [BIG-DATA.md](BIG-DATA.md) | HDFS, Spark, Flink, Trino, Iceberg, Delta Lake, Lakehouse | DATABASES, CLOUD, MESSAGING, KUBERNETES |
| 🖥️ Hypervisors | [HYPERVISORS.md](HYPERVISORS.md) | VMware, Hyper-V, KVM, Proxmox, migration | STORAGE, SERVER-HW |
| 🏭 Data centers | [DATACENTERS.md](DATACENTERS.md) | Tier, power, cooling, layout, DC services, secondary DC topologies | MONITORING, MESSAGING |
| 💾 Storage | [STORAGE.md](STORAGE.md) | SAN/NAS/object, RAID, SDS, Ceph, OpenStack Cinder/Swift/Manila | — |
| 🔌 Server connectivity | [CONNECTIVITY.md](CONNECTIVITY.md) | Ethernet, FC SAN, iSCSI, NVMe-oF, SAS | — |
| 🔧 Server hardware | [SERVER-HW.md](SERVER-HW.md) | CPU, RAM, PCIe, NUMA, BMC | CONNECTIVITY |
| 🎮 GPU | [GPU.md](GPU.md) | NVIDIA/AMD, NVLink, MIG/vGPU, AI, Cyborg | — |
| ⚙️ Server config | [SERVER-CONFIG.md](SERVER-CONFIG.md) | BIOS tuning, DB/hypervisor/K8s/storage best practices | — |
| 📦 Provisioning | [PROVISIONING.md](PROVISIONING.md) | PXE, Redfish, Terraform, Ironic, OpenStack deploy | CICD |
| ☸ Kubernetes | [KUBERNETES.md](KUBERNETES.md) | K8s architecture, deployment, Cluster API (CAPI) | CICD, CLOUD, NETWORKING |
| 📨 Messaging & streaming | [MESSAGING.md](MESSAGING.md) | Kafka, RabbitMQ, Pulsar, NATS, managed queue/pubsub | DATACENTERS, CLOUD |
| 🏗️ DC Migration | [DC-MIGRATION.md](DC-MIGRATION.md) | Strategies, phases, network, DB, rollback | DATACENTERS, CLOUD, DR, NETWORKING, STORAGE |
| 🧠 AI Infrastructure | [AI-INFRASTRUCTURE.md](AI-INFRASTRUCTURE.md) | GPU, AI networking, storage, cluster, cooling, training/inference | GPU, NETWORKING, STORAGE, DATACENTERS, CLOUD |
| 📋 Legacy index | [HARDWARE.md](HARDWARE.md) | → SERVER-HW, GPU, SERVER-CONFIG, PROVISIONING | SERVER-HW, GPU, SERVER-CONFIG, PROVISIONING |
| 📋 Legacy infra | [INFRASTRUCTURE.md](INFRASTRUCTURE.md) | → HYPERVISORS, DATACENTERS, STORAGE, HARDWARE | HYPERVISORS, DATACENTERS, STORAGE, HARDWARE |
| 📋 Review workflow | [REVIEW.md](REVIEW.md) | Review and content control process | — |
| 📝 ADR template | [templates/ADR.md](templates/ADR.md) | Architecture Decision Record template | — |
### Detailed DB files
| File | Description |
|------|-------------|
| [POSTGRESQL.en.md](POSTGRESQL.en.md) | PostgreSQL — architecture, replication, tuning |
| [MYSQL.en.md](MYSQL.en.md) | MySQL & MariaDB |
| [ORACLE.en.md](ORACLE.en.md) | Oracle Database — RAC, Data Guard, tuning |
| [MONGODB.en.md](MONGODB.en.md) | MongoDB — document DB, sharding, replica sets |
| [REDIS.en.md](REDIS.en.md) | Redis — cache, session store, streams |
| [CASSANDRA.en.md](CASSANDRA.en.md) | Cassandra & ScyllaDB — wide-column NoSQL |
| [VEKTOROVE-DB.md](VEKTOROVE-DB.md) | Vector databases — Pinecone, Qdrant, Milvus, pgvector |
| [DATABAZOVE-ENGINY.md](DATABAZOVE-ENGINY.md) | Common DB concepts — transactions, indexes, locking |
---
## Navigation — English (`.en.md`)
| Area | File | Description | Related to |
|------|------|-------------|------------|
| ☁️ Cloud architecture | [CLOUD.en.md](CLOUD.en.md) | AWS/Azure/GCP, hybrid cloud, multi-cloud | GPU, NETWORKING |
| 🌐 Network architecture | [NETWORKING.en.md](NETWORKING.en.md) | DNS, BGP, VPC, Zero Trust, EVPN VXLAN, TLS | CLOUD |
| 📊 Monitoring & observability | [MONITORING.en.md](MONITORING.en.md) | Prometheus, Grafana, OTel, logging, alerting | — |
| 🔄 CI/CD & DevOps | [CICD.en.md](CICD.en.md) | Pipelines, GitOps, IaC (Terraform), deployment | — |
| 💻 Operating systems | [OS.en.md](OS.en.md) | Linux distributions, Windows Server, lifecycle, EOL, kernel | KUBERNETES, HYPERVISORS, AI-INFRASTRUCTURE |
| 🔄 Disaster Recovery | [DR.en.md](DR.en.md) | RTO, RPO, scenarios, prevention, uptime calculation | CLOUD, DATACENTERS, MONITORING |
| 🗄️ Database architecture | [DATABASES.en.md](DATABASES.en.md) | Classification, sharding, replication, caching | POSTGRESQL, MYSQL, ORACLE, MONGODB, REDIS, CASSANDRA, VECTOR-DBS, DATABASE-ENGINES |
| 🗄️ Big Data | [BIG-DATA.en.md](BIG-DATA.en.md) | HDFS, Spark, Flink, Trino, Iceberg, Delta Lake, Lakehouse | DATABASES, CLOUD, MESSAGING, KUBERNETES |
| 🖥️ Hypervisors | [HYPERVISORS.en.md](HYPERVISORS.en.md) | VMware, Hyper-V, KVM, Proxmox, migration | STORAGE, SERVER-HW |
| 🏭 Data centers | [DATACENTERS.en.md](DATACENTERS.en.md) | Tier, power, cooling, layout, DC services, secondary DC topologies | MONITORING, MESSAGING |
| 💾 Storage | [STORAGE.en.md](STORAGE.en.md) | SAN/NAS/object, RAID, SDS, Ceph | — |
| 🔌 Server connectivity | [CONNECTIVITY.en.md](CONNECTIVITY.en.md) | Ethernet, FC SAN, iSCSI, NVMe-oF, SAS | — |
| 🔧 Server hardware | [SERVER-HW.en.md](SERVER-HW.en.md) | CPU, RAM, PCIe, NUMA, BMC | CONNECTIVITY |
| 🎮 GPU | [GPU.en.md](GPU.en.md) | NVIDIA/AMD, NVLink, MIG/vGPU, AI, Cyborg | — |
| ⚙️ Server config | [SERVER-CONFIG.en.md](SERVER-CONFIG.en.md) | BIOS tuning, DB/hypervisor/K8s/storage best practices | — |
| 📦 Provisioning | [PROVISIONING.en.md](PROVISIONING.en.md) | PXE, Redfish, Terraform, Ironic, OpenStack deploy | CICD |
| ☸ Kubernetes | [KUBERNETES.en.md](KUBERNETES.en.md) | K8s architecture, deployment, Cluster API (CAPI) | CICD, CLOUD, NETWORKING |
| 📨 Messaging & streaming | [MESSAGING.en.md](MESSAGING.en.md) | Kafka, RabbitMQ, Pulsar, NATS, managed queue/pubsub | DATACENTERS, CLOUD |
| 🏗️ DC Migration | [DC-MIGRATION.en.md](DC-MIGRATION.en.md) | Strategies, phases, network, DB, rollback | DATACENTERS, CLOUD, DR, NETWORKING, STORAGE |
| 🧠 AI Infrastructure | [AI-INFRASTRUCTURE.en.md](AI-INFRASTRUCTURE.en.md) | GPU, AI networking, storage, cluster, cooling, training/inference | GPU, NETWORKING, STORAGE, DATACENTERS, CLOUD |
| 📋 Legacy index | [HARDWARE.en.md](HARDWARE.en.md) | → SERVER-HW, GPU, SERVER-CONFIG, PROVISIONING | SERVER-HW, GPU, SERVER-CONFIG, PROVISIONING |
| 📋 Legacy infra | [INFRASTRUCTURE.en.md](INFRASTRUCTURE.en.md) | → HYPERVISORS, DATACENTERS, STORAGE, HARDWARE | HYPERVISORS, DATACENTERS, STORAGE, HARDWARE |
| 📋 Review workflow | [REVIEW.en.md](REVIEW.en.md) | Review and content control process | — |
| 📝 ADR template | [templates/ADR.en.md](templates/ADR.en.md) | Architecture Decision Record template | — |
### Detailed DB files
| File | Description |
|------|-------------|
| [POSTGRESQL.en.md](POSTGRESQL.en.md) | PostgreSQL — architecture, replication, tuning |
| [MYSQL.en.md](MYSQL.en.md) | MySQL & MariaDB |
| [ORACLE.en.md](ORACLE.en.md) | Oracle Database — RAC, Data Guard, tuning |
| [MONGODB.en.md](MONGODB.en.md) | MongoDB — document DB, sharding, replica sets |
| [REDIS.en.md](REDIS.en.md) | Redis — cache, session store, streams |
| [CASSANDRA.en.md](CASSANDRA.en.md) | Cassandra & ScyllaDB — wide-column NoSQL |
| [VECTOR-DBS.en.md](VECTOR-DBS.en.md) | Vector databases — Pinecone, Qdrant, Milvus, pgvector |
| [DATABASE-ENGINES.en.md](DATABASE-ENGINES.en.md) | Common DB concepts — transactions, indexes, locking |
---
## Case Studies
| File | Description |
|------|-------------|
| [case-studies/proxmox-demo/README.md](case-studies/proxmox-demo/README.md) | Proxmox VE demo cluster — design (CZ) |
| [case-studies/proxmox-demo/README.en.md](case-studies/proxmox-demo/README.en.md) | Proxmox VE demo cluster — design (EN) |
---
## Cross-Reference Matrix
| File | References |
|------|------------|
| `CLOUD.md` / `CLOUD.en.md` | [`GPU.en.md`](GPU.en.md), [`NETWORKING.en.md`](NETWORKING.en.md), [`sources/cloud/sources.en.md`](sources/cloud/sources.en.md) |
| `NETWORKING.md` / `NETWORKING.en.md` | [`CLOUD.en.md`](CLOUD.en.md), [`sources/networking/sources.en.md`](sources/networking/sources.en.md) |
| `DATACENTERS.md` / `DATACENTERS.en.md` | [`MONITORING.en.md`](MONITORING.en.md), [`sources/infrastructure/sources.en.md`](sources/infrastructure/sources.en.md) |
| `MONITORING.md` / `MONITORING.en.md` | [`sources/monitoring/sources.en.md`](sources/monitoring/sources.en.md) |
| `CICD.md` / `CICD.en.md` | [`sources/cicd/sources.en.md`](sources/cicd/sources.en.md) |
| `DR.md` / `DR.en.md` | [`CLOUD.en.md`](CLOUD.en.md), [`DATACENTERS.en.md`](DATACENTERS.en.md), [`MONITORING.en.md`](MONITORING.en.md), [`CICD.en.md`](CICD.en.md), [`STORAGE.en.md`](STORAGE.en.md), [`sources/infrastructure/sources.en.md`](sources/infrastructure/sources.en.md) |
| `MESSAGING.md` / `MESSAGING.en.md` | [`DATACENTERS.en.md`](DATACENTERS.en.md), [`CLOUD.en.md`](CLOUD.en.md), [`sources/infrastructure/sources.en.md`](sources/infrastructure/sources.en.md) |
| `DC-MIGRATION.md` / `DC-MIGRATION.en.md` | [`DATACENTERS.en.md`](DATACENTERS.en.md), [`CLOUD.en.md`](CLOUD.en.md), [`DR.en.md`](DR.en.md), [`NETWORKING.en.md`](NETWORKING.en.md), [`STORAGE.en.md`](STORAGE.en.md), [`sources/infrastructure/sources.en.md`](sources/infrastructure/sources.en.md) |
| `AI-INFRASTRUCTURE.md` / `AI-INFRASTRUCTURE.en.md` | [`GPU.en.md`](GPU.en.md), [`NETWORKING.en.md`](NETWORKING.en.md), [`STORAGE.en.md`](STORAGE.en.md), [`DATACENTERS.en.md`](DATACENTERS.en.md), [`CLOUD.en.md`](CLOUD.en.md), [`sources/infrastructure/sources.en.md`](sources/infrastructure/sources.en.md) |
| `PROVISIONING.md` / `PROVISIONING.en.md` | [`CICD.en.md`](CICD.en.md), [`sources/infrastructure/sources.en.md`](sources/infrastructure/sources.en.md) |
| `STORAGE.md` / `STORAGE.en.md` | [`sources/infrastructure/sources.en.md`](sources/infrastructure/sources.en.md) |
| `GPU.md` / `GPU.en.md` | [`sources/infrastructure/sources.en.md`](sources/infrastructure/sources.en.md) |
| `SERVER-HW.md` / `SERVER-HW.en.md` | [`CONNECTIVITY.en.md`](CONNECTIVITY.en.md), [`sources/infrastructure/sources.en.md`](sources/infrastructure/sources.en.md) |
| `SERVER-CONFIG.md` / `SERVER-CONFIG.en.md` | [`sources/infrastructure/sources.en.md`](sources/infrastructure/sources.en.md) |
| `CONNECTIVITY.md` / `CONNECTIVITY.en.md` | [`sources/infrastructure/sources.en.md`](sources/infrastructure/sources.en.md) |
| `HYPERVISORS.md` / `HYPERVISORS.en.md` | [`STORAGE.en.md`](STORAGE.en.md), [`sources/infrastructure/sources.en.md`](sources/infrastructure/sources.en.md) |
| `DATABASES.md` / `DATABASES.en.md` | [`POSTGRESQL.en.md`](POSTGRESQL.en.md), [`MYSQL.en.md`](MYSQL.en.md), [`ORACLE.en.md`](ORACLE.en.md), [`MONGODB.en.md`](MONGODB.en.md), [`REDIS.en.md`](REDIS.en.md), [`CASSANDRA.en.md`](CASSANDRA.en.md), [`VECTOR-DBS.en.md`](VECTOR-DBS.en.md), [`DATABASE-ENGINES.en.md`](DATABASE-ENGINES.en.md), [`sources/databases/sources.en.md`](sources/databases/sources.en.md) |
| `HARDWARE.md` / `HARDWARE.en.md` | [`SERVER-HW.en.md`](SERVER-HW.en.md), [`GPU.en.md`](GPU.en.md), [`SERVER-CONFIG.en.md`](SERVER-CONFIG.en.md), [`PROVISIONING.en.md`](PROVISIONING.en.md) |
| `OS.md` / `OS.en.md` | [`AI-INFRASTRUCTURE.en.md`](AI-INFRASTRUCTURE.en.md), [`KUBERNETES.en.md`](KUBERNETES.en.md), [`HYPERVISORS.en.md`](HYPERVISORS.en.md), [`DATACENTERS.en.md`](DATACENTERS.en.md), [`sources/infrastructure/sources.en.md`](sources/infrastructure/sources.en.md) |
| `KUBERNETES.md` / `KUBERNETES.en.md` | [`CICD.en.md`](CICD.en.md), [`CLOUD.en.md`](CLOUD.en.md), [`NETWORKING.en.md`](NETWORKING.en.md), [`sources/infrastructure/sources.en.md`](sources/infrastructure/sources.en.md) |
| `BIG-DATA.md` / `BIG-DATA.en.md` | [`DATABASES.en.md`](DATABASES.en.md), [`CLOUD.en.md`](CLOUD.en.md), [`MESSAGING.en.md`](MESSAGING.en.md), [`KUBERNETES.en.md`](KUBERNETES.en.md), [`sources/infrastructure/sources.en.md`](sources/infrastructure/sources.en.md) |
| `INFRASTRUCTURE.md` / `INFRASTRUCTURE.en.md` | [`HYPERVISORS.en.md`](HYPERVISORS.en.md), [`DATACENTERS.en.md`](DATACENTERS.en.md), [`STORAGE.en.md`](STORAGE.en.md), [`HARDWARE.en.md`](HARDWARE.en.md) |
---
## Sources
Raw reference data (documentation, books, standards) by area:
| Area | Czech | English |
|------|-------|---------|
| ☁️ Cloud | [`sources/cloud/sources.md`](sources/cloud/sources.md) | [`sources/cloud/sources.en.md`](sources/cloud/sources.en.md) |
| 🌐 Networking | [`sources/networking/sources.md`](sources/networking/sources.md) | [`sources/networking/sources.en.md`](sources/networking/sources.en.md) |
| 📊 Monitoring | [`sources/monitoring/sources.md`](sources/monitoring/sources.md) | [`sources/monitoring/sources.en.md`](sources/monitoring/sources.en.md) |
| 🔄 CI/CD | [`sources/cicd/sources.md`](sources/cicd/sources.md) | [`sources/cicd/sources.en.md`](sources/cicd/sources.en.md) |
| 🗄️ Databases | [`sources/databases/sources.md`](sources/databases/sources.md) | [`sources/databases/sources.en.md`](sources/databases/sources.en.md) |
| 🏗️ Infrastructure | [`sources/infrastructure/sources.md`](sources/infrastructure/sources.md) | [`sources/infrastructure/sources.en.md`](sources/infrastructure/sources.en.md) |
---
## KB Agents
| Agent | Description |
|-------|-------------|
| [`kb-research`](.opencode/agents/kb-research.md) | Processes [todo] items — research on new topics |
| [`kb-source-scout`](.opencode/agents/kb-source-scout.md) | Finds new sources and adds them to sources/ |
| [`kb-reviewer`](.opencode/agents/kb-reviewer.md) | Audits consistency, links, duplications, formatting |
| [`kb-index`](.opencode/agents/kb-index.md) | **Maintains this index** — scans files, extracts cross-references, validates links |
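As an illustration of what "extracts cross-references, validates links" can mean in practice, the sketch below pulls relative link targets out of Markdown text and reports those missing from a set of known files. It is not the `kb-index` agent's implementation; the regular expression, function names, and sample row are assumptions.

```python
import re

# Matches [text](target) and captures the target up to the closing parenthesis.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def relative_links(markdown: str) -> list:
    """Relative link targets in document order, with #anchors stripped."""
    targets = []
    for target in LINK_RE.findall(markdown):
        if "://" in target or target.startswith(("#", "mailto:")):
            continue  # external URLs and in-page anchors are out of scope
        targets.append(target.split("#", 1)[0])
    return targets


def missing_links(markdown: str, existing: set) -> list:
    """Sorted relative targets that are not in the set of existing paths."""
    return sorted({t for t in relative_links(markdown) if t not in existing})


row = "| [CICD.en.md](CICD.en.md#iac) | [`src`](sources/a.md) | [ext](https://example.com) |"
print(relative_links(row))
print(missing_links(row, {"CICD.en.md"}))
```

In a real check the `existing` set would come from walking the repository, for example with `pathlib.Path.rglob("*.md")`.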
---
## Principles
| Czech | English |
|-------|---------|
| **Dostupnost** — SLA, redundance, failover, multi-AZ | **Availability** — SLA, redundancy, failover, multi-AZ |
| **Škálovatelnost** — horizontální vs. vertikální, auto-scaling | **Scalability** — horizontal vs. vertical, auto-scaling |
| **Bezpečnost** — defense in depth, least privilege, zero trust | **Security** — defense in depth, least privilege, zero trust |
| **Náklady** — FinOps, right-sizing, reserved instances | **Cost** — FinOps, right-sizing, reserved instances |
| **Operability** — observabilita, automation, dokumentace | **Operability** — observability, automation, documentation |
---
*This index is automatically maintained by the `kb-index` agent. Last updated: 2026-06-18.*

211
README.md Normal file

@@ -0,0 +1,211 @@
# 🏗️ Infrastructure Architecture — Knowledge Base
Comprehensive overview of topics, principles, and best practices for infrastructure design and operations.
Bilingual: Czech (`.md`) and English (`.en.md`).
---
## Topic Map — Relationships Between Areas
```
                 ┌─────────────┐
                 │    CLOUD    │
                 │ (IaaS/PaaS) │
                 └──────┬──────┘
         ┌──────────────┼──────────────┐
         ▼              ▼              ▼
    ┌──────────┐  ┌──────────┐  ┌──────────────┐
    │NETWORKING│  │ STORAGE  │  │  DATABASES   │
    │(L2-L7,   │  │(SAN/NAS/ │  │ (SQL/NOSQL/  │
    │ Zero Tr.)│  │ Ceph/SDS)│  │  Vector)     │
    └────┬─────┘  └────┬─────┘  └──────┬───────┘
         │             │               │
         ▼             ▼               ▼
     ┌─────────────────────────────────────┐
     │             DATACENTERS             │
     │   (Tier, power, cooling, layout)    │
     └────────────┬────────────────────────┘
     ┌────────────┼────────────┬───────────────┐
     ▼            ▼            ▼               ▼
┌──────────┐ ┌──────────┐ ┌────────┐ ┌──────────────┐
│SERVER-HW │ │SERVER-   │ │  GPU   │ │ PROVISIONING │
│(CPU,RAM, │ │CONFIG    │ │(NVIDIA/│ │ (PXE, Ironic │
│ PCIe,BM) │ │(BIOS,    │ │ AMD)   │ │  Terraform)  │
└──────────┘ │ NUMA)    │ └────────┘ └──────────────┘
             └──────────┘
┌──────────┐ ┌──────────┐ ┌────────┐ ┌────────────┐
│HYPERVISOR│ │ MONITOR  │ │  CICD  │ │  ☸ K8s     │
│(VMware,  │ │(Prom,    │ │(GitOps,│ │(CAPI, K3s, │
│ KVM, ...)│ │ Grafana) │ │ IaC)   │ │ RKE2...)   │
└──────────┘ └──────────┘ └────────┘ └────────────┘
```
---
## Navigation — Czech (`.md`)
| Area | File | Description | Related to |
|------|------|-------------|------------|
| ☁️ Cloud architecture | [CLOUD.md](CLOUD.md) | AWS/Azure/GCP, hybrid cloud, multi-cloud, well-architected framework | GPU, NETWORKING |
| 🌐 Network architecture | [NETWORKING.md](NETWORKING.md) | DNS, BGP, VPC, Zero Trust, EVPN VXLAN, TLS | CLOUD |
| 📊 Monitoring & observability | [MONITORING.md](MONITORING.md) | Prometheus, Grafana, OTel, logging, alerting, SLO | — |
| 🔄 CI/CD & DevOps | [CICD.md](CICD.md) | Pipelines, GitOps, IaC (Terraform), deployment strategies | — |
| 💻 Operating systems | [OS.md](OS.md) | Linux distributions, Windows Server, lifecycle, EOL, kernel | KUBERNETES, HYPERVISORS, AI-INFRASTRUCTURE |
| 🔄 Disaster Recovery | [DR.md](DR.md) | RTO, RPO, scenarios, prevention, uptime calculation | CLOUD, DATACENTERS, MONITORING |
| 🗄️ Database architecture | [DATABASES.md](DATABASES.md) | Classification, sharding, replication, caching | POSTGRESQL, MYSQL, ORACLE, MONGODB, REDIS, CASSANDRA, VEKTOROVE-DB, DATABAZOVE-ENGINY |
| 🗄️ Big Data | [BIG-DATA.md](BIG-DATA.md) | HDFS, Spark, Flink, Trino, Iceberg, Delta Lake, Lakehouse | DATABASES, CLOUD, MESSAGING, KUBERNETES |
| 🖥️ Hypervisors | [HYPERVISORS.md](HYPERVISORS.md) | VMware, Hyper-V, KVM, Proxmox, migration | STORAGE, SERVER-HW |
| 🏭 Data centers | [DATACENTERS.md](DATACENTERS.md) | Tier, power, cooling, layout, DC services, secondary DC topologies | MONITORING, MESSAGING |
| 💾 Storage | [STORAGE.md](STORAGE.md) | SAN/NAS/object, RAID, SDS, Ceph, OpenStack Cinder/Swift/Manila | — |
| 🔌 Server connectivity | [CONNECTIVITY.md](CONNECTIVITY.md) | Ethernet, FC SAN, iSCSI, NVMe-oF, SAS | — |
| 🔧 Server hardware | [SERVER-HW.md](SERVER-HW.md) | CPU, RAM, PCIe, NUMA, BMC | CONNECTIVITY |
| 🎮 GPU | [GPU.md](GPU.md) | NVIDIA/AMD, NVLink, MIG/vGPU, AI, Cyborg | — |
| ⚙️ Server config | [SERVER-CONFIG.md](SERVER-CONFIG.md) | BIOS tuning, DB/hypervisor/K8s/storage best practices | — |
| 📦 Provisioning | [PROVISIONING.md](PROVISIONING.md) | PXE, Redfish, Terraform, Ironic, OpenStack deploy | CICD |
| ☸ Kubernetes | [KUBERNETES.md](KUBERNETES.md) | K8s architecture, deployment, Cluster API (CAPI) | CICD, CLOUD, NETWORKING |
| 📨 Messaging & streaming | [MESSAGING.md](MESSAGING.md) | Kafka, RabbitMQ, Pulsar, NATS, managed queue/pubsub | DATACENTERS, CLOUD |
| 🏗️ DC migration | [DC-MIGRATION.md](DC-MIGRATION.md) | Strategies, phases, network, DB, rollback | DATACENTERS, CLOUD, DR, NETWORKING, STORAGE |
| 🧠 AI infrastructure | [AI-INFRASTRUCTURE.md](AI-INFRASTRUCTURE.md) | GPU, AI networking, storage, cluster, cooling, training/inference | GPU, NETWORKING, STORAGE, DATACENTERS, CLOUD |
| 📋 Legacy index | [HARDWARE.md](HARDWARE.md) | Legacy index → SERVER-HW, GPU, SERVER-CONFIG, PROVISIONING | SERVER-HW, GPU, SERVER-CONFIG, PROVISIONING |
| 📋 Legacy infrastructure index | [INFRASTRUCTURE.md](INFRASTRUCTURE.md) | Legacy index → HYPERVISORS, DATACENTERS, STORAGE, HARDWARE | HYPERVISORS, DATACENTERS, STORAGE, HARDWARE |
| 📋 Review workflow | [REVIEW.md](REVIEW.md) | Review and content control process | — |
| 📝 ADR template | [templates/ADR.md](templates/ADR.md) | Architecture Decision Record template | — |
### Detailed DB files
| File | Description |
|------|-------------|
| [POSTGRESQL.md](POSTGRESQL.md) | PostgreSQL — architecture, replication, tuning |
| [MYSQL.md](MYSQL.md) | MySQL & MariaDB |
| [ORACLE.md](ORACLE.md) | Oracle Database — RAC, Data Guard, tuning |
| [MONGODB.md](MONGODB.md) | MongoDB — document DB, sharding, replica sets |
| [REDIS.md](REDIS.md) | Redis — cache, session store, streams |
| [CASSANDRA.md](CASSANDRA.md) | Cassandra & ScyllaDB — wide-column, NoSQL |
| [VEKTOROVE-DB.md](VEKTOROVE-DB.md) | Vector databases — Pinecone, Qdrant, Milvus, pgvector |
| [DATABAZOVE-ENGINY.md](DATABAZOVE-ENGINY.md) | Common concepts across DBs — transactions, indexes, locking |
---
## Navigation — English (`.en.md`)
| Area | File | Description | Related to |
|------|------|-------------|------------|
| ☁️ Cloud architecture | [CLOUD.en.md](CLOUD.en.md) | AWS/Azure/GCP, hybrid cloud, multi-cloud | GPU, NETWORKING |
| 🌐 Network architecture | [NETWORKING.en.md](NETWORKING.en.md) | DNS, BGP, VPC, Zero Trust, EVPN VXLAN, TLS | CLOUD |
| 📊 Monitoring & observability | [MONITORING.en.md](MONITORING.en.md) | Prometheus, Grafana, OTel, logging, alerting | — |
| 🔄 CI/CD & DevOps | [CICD.en.md](CICD.en.md) | Pipelines, GitOps, IaC (Terraform), deployment | — |
| 💻 Operating systems | [OS.en.md](OS.en.md) | Linux distributions, Windows Server, lifecycle, EOL, kernel | KUBERNETES, HYPERVISORS, AI-INFRASTRUCTURE |
| 🔄 Disaster Recovery | [DR.en.md](DR.en.md) | RTO, RPO, scenarios, prevention, uptime calculation | CLOUD, DATACENTERS, MONITORING |
| 🗄️ Database architecture | [DATABASES.en.md](DATABASES.en.md) | Classification, sharding, replication, caching | POSTGRESQL, MYSQL, ORACLE, MONGODB, REDIS, CASSANDRA, VECTOR-DBS, DATABASE-ENGINES |
| 🗄️ Big Data | [BIG-DATA.en.md](BIG-DATA.en.md) | HDFS, Spark, Flink, Trino, Iceberg, Delta Lake, Lakehouse | DATABASES, CLOUD, MESSAGING, KUBERNETES |
| 🖥️ Hypervisors | [HYPERVISORS.en.md](HYPERVISORS.en.md) | VMware, Hyper-V, KVM, Proxmox, migration | STORAGE, SERVER-HW |
| 🏭 Data centers | [DATACENTERS.en.md](DATACENTERS.en.md) | Tier, power, cooling, layout, DC services, secondary DC topologies | MONITORING, MESSAGING |
| 💾 Storage | [STORAGE.en.md](STORAGE.en.md) | SAN/NAS/object, RAID, SDS, Ceph | — |
| 🔌 Server connectivity | [CONNECTIVITY.en.md](CONNECTIVITY.en.md) | Ethernet, FC SAN, iSCSI, NVMe-oF, SAS | — |
| 🔧 Server hardware | [SERVER-HW.en.md](SERVER-HW.en.md) | CPU, RAM, PCIe, NUMA, BMC | CONNECTIVITY |
| 🎮 GPU | [GPU.en.md](GPU.en.md) | NVIDIA/AMD, NVLink, MIG/vGPU, AI, Cyborg | — |
| ⚙️ Server config | [SERVER-CONFIG.en.md](SERVER-CONFIG.en.md) | BIOS tuning, DB/hypervisor/K8s/storage best practices | — |
| 📦 Provisioning | [PROVISIONING.en.md](PROVISIONING.en.md) | PXE, Redfish, Terraform, Ironic, OpenStack deploy | CICD |
| ☸ Kubernetes | [KUBERNETES.en.md](KUBERNETES.en.md) | K8s architecture, deployment, Cluster API (CAPI) | CICD, CLOUD, NETWORKING |
| 📨 Messaging & streaming | [MESSAGING.en.md](MESSAGING.en.md) | Kafka, RabbitMQ, Pulsar, NATS, managed queue/pubsub | DATACENTERS, CLOUD |
| 🏗️ DC Migration | [DC-MIGRATION.en.md](DC-MIGRATION.en.md) | Strategies, phases, network, DB, rollback | DATACENTERS, CLOUD, DR, NETWORKING, STORAGE |
| 🧠 AI Infrastructure | [AI-INFRASTRUCTURE.en.md](AI-INFRASTRUCTURE.en.md) | GPU, AI networking, storage, cluster, cooling, training/inference | GPU, NETWORKING, STORAGE, DATACENTERS, CLOUD |
| 📋 Legacy index | [HARDWARE.en.md](HARDWARE.en.md) | → SERVER-HW, GPU, SERVER-CONFIG, PROVISIONING | SERVER-HW, GPU, SERVER-CONFIG, PROVISIONING |
| 📋 Legacy infra | [INFRASTRUCTURE.en.md](INFRASTRUCTURE.en.md) | → HYPERVISORS, DATACENTERS, STORAGE, HARDWARE | HYPERVISORS, DATACENTERS, STORAGE, HARDWARE |
| 📋 Review workflow | [REVIEW.en.md](REVIEW.en.md) | Review and content control process | — |
| 📝 ADR template | [templates/ADR.en.md](templates/ADR.en.md) | Architecture Decision Record template | — |
### Detailed DB files
| File | Description |
|------|-------------|
| [POSTGRESQL.en.md](POSTGRESQL.en.md) | PostgreSQL — architecture, replication, tuning |
| [MYSQL.en.md](MYSQL.en.md) | MySQL & MariaDB |
| [ORACLE.en.md](ORACLE.en.md) | Oracle Database — RAC, Data Guard, tuning |
| [MONGODB.en.md](MONGODB.en.md) | MongoDB — document DB, sharding, replica sets |
| [REDIS.en.md](REDIS.en.md) | Redis — cache, session store, streams |
| [CASSANDRA.en.md](CASSANDRA.en.md) | Cassandra & ScyllaDB — wide-column, NoSQL |
| [VECTOR-DBS.en.md](VECTOR-DBS.en.md) | Vector databases — Pinecone, Qdrant, Milvus, pgvector |
| [DATABASE-ENGINES.en.md](DATABASE-ENGINES.en.md) | Common DB concepts — transactions, indexes, locking |
---
## Case Studies
| File | Description |
|------|-------------|
| [case-studies/proxmox-demo/README.md](case-studies/proxmox-demo/README.md) | Proxmox VE demo cluster — design (CZ) |
| [case-studies/proxmox-demo/README.en.md](case-studies/proxmox-demo/README.en.md) | Proxmox VE demo cluster — design (EN) |
---
## Cross-Reference Matrix
| File | References |
|---------------|--------------------------|
| `CLOUD.md` / `CLOUD.en.md` | [`GPU.md`](GPU.md), [`NETWORKING.md`](NETWORKING.md), [`sources/cloud/sources.md`](sources/cloud/sources.md) |
| `NETWORKING.md` / `NETWORKING.en.md` | [`CLOUD.md`](CLOUD.md), [`sources/networking/sources.md`](sources/networking/sources.md) |
| `DATACENTERS.md` / `DATACENTERS.en.md` | [`MONITORING.md`](MONITORING.md), [`sources/infrastructure/sources.md`](sources/infrastructure/sources.md) |
| `MONITORING.md` / `MONITORING.en.md` | [`sources/monitoring/sources.md`](sources/monitoring/sources.md) |
| `CICD.md` / `CICD.en.md` | [`sources/cicd/sources.md`](sources/cicd/sources.md) |
| `DR.md` / `DR.en.md` | [`CLOUD.md`](CLOUD.md), [`DATACENTERS.md`](DATACENTERS.md), [`MONITORING.md`](MONITORING.md), [`CICD.md`](CICD.md), [`STORAGE.md`](STORAGE.md), [`sources/infrastructure/sources.md`](sources/infrastructure/sources.md) |
| `MESSAGING.md` / `MESSAGING.en.md` | [`DATACENTERS.md`](DATACENTERS.md), [`CLOUD.md`](CLOUD.md), [`sources/infrastructure/sources.md`](sources/infrastructure/sources.md) |
| `DC-MIGRATION.md` / `DC-MIGRATION.en.md` | [`DATACENTERS.md`](DATACENTERS.md), [`CLOUD.md`](CLOUD.md), [`DR.md`](DR.md), [`NETWORKING.md`](NETWORKING.md), [`STORAGE.md`](STORAGE.md), [`sources/infrastructure/sources.md`](sources/infrastructure/sources.md) |
| `OS.md` / `OS.en.md` | [`AI-INFRASTRUCTURE.md`](AI-INFRASTRUCTURE.md), [`KUBERNETES.md`](KUBERNETES.md), [`HYPERVISORS.md`](HYPERVISORS.md), [`DATACENTERS.md`](DATACENTERS.md), [`sources/infrastructure/sources.md`](sources/infrastructure/sources.md) |
| `AI-INFRASTRUCTURE.md` / `AI-INFRASTRUCTURE.en.md` | [`GPU.md`](GPU.md), [`NETWORKING.md`](NETWORKING.md), [`STORAGE.md`](STORAGE.md), [`DATACENTERS.md`](DATACENTERS.md), [`CLOUD.md`](CLOUD.md), [`sources/infrastructure/sources.md`](sources/infrastructure/sources.md) |
| `PROVISIONING.md` / `PROVISIONING.en.md` | [`CICD.md`](CICD.md), [`sources/infrastructure/sources.md`](sources/infrastructure/sources.md) |
| `STORAGE.md` / `STORAGE.en.md` | [`sources/infrastructure/sources.md`](sources/infrastructure/sources.md) |
| `GPU.md` / `GPU.en.md` | [`sources/infrastructure/sources.md`](sources/infrastructure/sources.md) |
| `SERVER-HW.md` / `SERVER-HW.en.md` | [`CONNECTIVITY.md`](CONNECTIVITY.md), [`sources/infrastructure/sources.md`](sources/infrastructure/sources.md) |
| `SERVER-CONFIG.md` / `SERVER-CONFIG.en.md` | [`sources/infrastructure/sources.md`](sources/infrastructure/sources.md) |
| `CONNECTIVITY.md` / `CONNECTIVITY.en.md` | [`sources/infrastructure/sources.md`](sources/infrastructure/sources.md) |
| `HYPERVISORS.md` / `HYPERVISORS.en.md` | [`STORAGE.md`](STORAGE.md), [`sources/infrastructure/sources.md`](sources/infrastructure/sources.md) |
| `DATABASES.md` / `DATABASES.en.md` | [`POSTGRESQL.md`](POSTGRESQL.md), [`MYSQL.md`](MYSQL.md), [`ORACLE.md`](ORACLE.md), [`MONGODB.md`](MONGODB.md), [`REDIS.md`](REDIS.md), [`CASSANDRA.md`](CASSANDRA.md), [`VEKTOROVE-DB.md`](VEKTOROVE-DB.md), [`DATABAZOVE-ENGINY.md`](DATABAZOVE-ENGINY.md), [`sources/databases/sources.md`](sources/databases/sources.md) |
| `HARDWARE.md` / `HARDWARE.en.md` | [`SERVER-HW.md`](SERVER-HW.md), [`GPU.md`](GPU.md), [`SERVER-CONFIG.md`](SERVER-CONFIG.md), [`PROVISIONING.md`](PROVISIONING.md) |
| `INFRASTRUCTURE.md` / `INFRASTRUCTURE.en.md` | [`HYPERVISORS.md`](HYPERVISORS.md), [`DATACENTERS.md`](DATACENTERS.md), [`STORAGE.md`](STORAGE.md), [`HARDWARE.md`](HARDWARE.md) |
| `KUBERNETES.md` / `KUBERNETES.en.md` | [`CICD.md`](CICD.md), [`CLOUD.md`](CLOUD.md), [`NETWORKING.md`](NETWORKING.md), [`sources/infrastructure/sources.md`](sources/infrastructure/sources.md) |
| `BIG-DATA.md` / `BIG-DATA.en.md` | [`DATABASES.md`](DATABASES.md), [`CLOUD.md`](CLOUD.md), [`MESSAGING.md`](MESSAGING.md), [`KUBERNETES.md`](KUBERNETES.md), [`sources/infrastructure/sources.md`](sources/infrastructure/sources.md) |
---
## Sources
Raw reference data (documentation, books, standards) by area:
| Area | Czech | English |
|--------|-------|---------|
| ☁️ Cloud | [`sources/cloud/sources.md`](sources/cloud/sources.md) | [`sources/cloud/sources.en.md`](sources/cloud/sources.en.md) |
| 🌐 Networking | [`sources/networking/sources.md`](sources/networking/sources.md) | [`sources/networking/sources.en.md`](sources/networking/sources.en.md) |
| 📊 Monitoring | [`sources/monitoring/sources.md`](sources/monitoring/sources.md) | [`sources/monitoring/sources.en.md`](sources/monitoring/sources.en.md) |
| 🔄 CI/CD | [`sources/cicd/sources.md`](sources/cicd/sources.md) | [`sources/cicd/sources.en.md`](sources/cicd/sources.en.md) |
| 🗄️ Databases | [`sources/databases/sources.md`](sources/databases/sources.md) | [`sources/databases/sources.en.md`](sources/databases/sources.en.md) |
| 🏗️ Infrastructure | [`sources/infrastructure/sources.md`](sources/infrastructure/sources.md) | [`sources/infrastructure/sources.en.md`](sources/infrastructure/sources.en.md) |
---
## KB Agents
| Agent | Description |
|-------|-------------|
| [`kb-research`](.opencode/agents/kb-research.md) | Processes [todo] items: researches new topics |
| [`kb-source-scout`](.opencode/agents/kb-source-scout.md) | Finds new sources and adds them to sources/ |
| [`kb-reviewer`](.opencode/agents/kb-reviewer.md) | Audits consistency, links, duplicates, formatting |
| [`kb-index`](.opencode/agents/kb-index.md) | **Maintains this index**: scans files, extracts cross-references, validates links |
---
## Principles
| Principles |
|------------|
| **Availability** — SLA, redundancy, failover, multi-AZ |
| **Scalability** — horizontal vs. vertical, auto-scaling |
| **Security** — defense in depth, least privilege, zero trust |
| **Cost** — FinOps, right-sizing, reserved instances |
| **Operability** — observability, automation, documentation |
---
*This index is maintained automatically by the `kb-index` agent. Last updated: 2026-06-18.*

119
REDIS.en.md Normal file

@@ -0,0 +1,119 @@
# 🔴 Redis
## Overview
Redis is an in-memory key-value store with advanced data structures, used primarily as a cache, session store, message broker, and real-time database. It runs in RAM with optional disk persistence (RDB/AOF).
## Data structures
| Structure | Description | Use case |
|-----------|-------------|----------|
| **String** | Binary string (max 512 MB) | Cache values, session tokens, counters |
| **Hash** | Map field-value | User profile, cached object |
| **List** | Linked list (push/pop on both ends) | Queue (RPUSH/LPOP), log stream |
| **Set** | Unique values (unordered) | Tags, deduplication, memberships |
| **Sorted Set** | Unique + score (sorted) | Leaderboards, rate limiting, timeouts |
| **Bitmap** | Bit field | Feature flags, daily active users |
| **HyperLogLog** | Approximate cardinality (at most 12 KB per key, counts up to 2^64 items) | Unique visitors (error < 1%) |
| **Stream** | Append-only log (Kafka-like) | Event store, messaging |
| **Geospatial** | Geo-indexing (GEOADD, GEOSEARCH) | Location queries, proximity search |
| **JSON** | JSON document (RedisJSON module) | Document structures |
## Eviction policies
| Policy | Description | Use case |
|--------|-------------|----------|
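| **noeviction** | Writes fail with an error when memory is full | Data that must not be lost (e.g. transactional) |
| **allkeys-lru** | LRU on all keys | General cache, standard |
| **allkeys-lfu** | LFU on all keys | Frequently accessed data |
| **volatile-lru** | LRU on keys with TTL | Cache with expiration |
| **volatile-lfu** | LFU on keys with TTL | Cache mixed with keys that must stay |
| **volatile-ttl** | Keys with the shortest remaining TTL first | Short-lived data |
| **allkeys-random** | Random key | Testing |
| **volatile-random** | Random key among keys with TTL | Testing |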
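
The idea behind `allkeys-lru` can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how Redis implements it: Redis approximates LRU by sampling keys and limits by memory (`maxmemory`), whereas the hypothetical `LRUCache` class below tracks exact recency and limits by key count.

```python
from collections import OrderedDict
from typing import List, Optional


class LRUCache:
    """Toy model of allkeys-lru: evict the least recently used key when full."""

    def __init__(self, max_keys: int) -> None:
        self.max_keys = max_keys      # stand-in for maxmemory (assumption)
        self._data = OrderedDict()    # insertion order doubles as recency order

    def get(self, key: str) -> Optional[str]:
        if key not in self._data:
            return None
        self._data.move_to_end(key)   # a read marks the key as recently used
        return self._data[key]

    def set(self, key: str, value: str) -> None:
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.max_keys:
            self._data.popitem(last=False)  # drop the least recently used key

    def keys(self) -> List[str]:
        return list(self._data)


cache = LRUCache(max_keys=2)
cache.set("a", "1")
cache.set("b", "2")
cache.get("a")        # "a" is now more recent than "b"
cache.set("c", "3")   # cache is full, so "b" is evicted
print(cache.keys())   # → ['a', 'c']
```

A `volatile-*` variant would apply the same eviction step only to keys that carry a TTL.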
## Redis Cluster vs Sentinel
| Feature | Redis Sentinel | Redis Cluster |
|---------|---------------|---------------|
| **Scaling** | Read replicas (master + replica) | Data sharding (16384 hash slots) |
| **Auto-failover** | Yes (Sentinel) | Yes (gossip-based) |
| **Multi-key ops** | Yes (transactions on master) | Limited (same hash slot) |
| **Client communication** | Clients ask Sentinel for the current master address | Cluster nodes redirect (MOVED/ASK) |
| **Minimum nodes** | Master + Replica + 3 Sentinel | 3 masters (each with replica) |
| **Use case** | High availability, single shard | Multi-shard, horizontal scaling |
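
To illustrate the 16384 hash slots and the "same hash slot" limit on multi-key operations, here is a minimal sketch of the slot calculation described in the Redis Cluster specification: CRC16 (XMODEM variant) of the key, modulo 16384, using only the `{hash tag}` part when one is present. The function names are illustrative, not part of any client library.

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC-16/XMODEM (polynomial 0x1021, initial value 0), computed bit by bit."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: str) -> int:
    """Map a key to one of 16384 slots, honoring a non-empty {hash tag}."""
    raw = key.encode()
    start = raw.find(b"{")
    if start != -1:
        end = raw.find(b"}", start + 1)
        if end > start + 1:            # ignore a missing "}" or an empty "{}"
            raw = raw[start + 1:end]
    return crc16_xmodem(raw) % 16384


# Keys sharing a hash tag land in the same slot, so multi-key commands work.
print(key_slot("{user1000}.following") == key_slot("{user1000}.followers"))  # → True
```

In a running cluster, `CLUSTER KEYSLOT <key>` returns the slot for a key and can be used to cross-check a sketch like this one.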
## Persistence
| Method | Description | RTO | RPO | Use case |
|--------|-------------|-----|-----|----------|
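| **RDB** (Redis Database) | Periodic snapshot to `dump.rdb` | Depends on dataset size; the compact snapshot loads faster than an AOF replay | Everything since the last snapshot | Cache, loss tolerated |
| **AOF** (Append-Only File) | Append-only log of all write operations | Depends on log size; the log is replayed on start (AOF rewrite keeps it short) | About 1 s with `appendfsync everysec` | Data must not be lost |
| **RDB + AOF** | Both enabled; Redis restores from the AOF on restart | As for AOF | About 1 s | Recommended for production |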
## Modules (RediSearch, RedisJSON, RedisGraph)
Redis is extensible via modules:
- **RediSearch** — full-text search, facets, prefix/suffix search
- **RedisJSON** — JSON path queries, document manipulation
- **RedisGraph** — graph DB (queried with Cypher; end-of-life announced in 2023, support ended in January 2025)
- **RedisTimeSeries** — time-series with downsampling, retention policies
- **RedisBloom** — Bloom filters, Cuckoo filters, Top-K, Count-Min Sketch
## Memcached vs Redis
| Feature | Redis | Memcached |
|---------|-------|-----------|
| **Data structures** | String, Hash, List, Set, Sorted Set, Stream, JSON | String only |
| **Persistence** | RDB + AOF | None (purely in-memory) |
| **Replication** | Master-replica, Cluster | None built in (clients shard across servers) |
| **Eviction** | 8 policies (LRU, LFU, TTL, random, noeviction) | LRU only |
| **Lua scripting** | Yes (EVAL) | No |
| **Transactions** | Yes (MULTI/EXEC) | No |
| **Pub/Sub** | Yes | No |
| **Streaming** | Yes (Stream) | No |
## Recommendations — where Redis is better
| Area | Redis | Competition | Why Redis |
|------|-------|-------------|-----------|
| **Cache (in-memory)** | < 1 ms latency, 8 eviction policies | Memcached (LRU, strings only) | Richer data types, persistence, cluster |
| **Session store** | Hash + TTL, Cluster for HA | DynamoDB (higher latency) | Simpler, faster, native expiration |
| **Rate limiting** | Sorted Set (sliding window counter) | Counters implemented in an application DB (complex) | Atomic operations, built-in logic |
| **Leaderboard / scoring** | Sorted Set (ZADD, ZRANK, ZREVRANGE) | SQL (ORDER BY + COUNT = expensive) | O(log N) rank and insert |
| **Message queue** | List/Stream (RPUSH+BLPOP) | Kafka (heavy, JVM) | Lightweight; no separate broker to run if Redis is already deployed |
| **Real-time analytics** | HyperLogLog + Bitmap + Stream | ClickHouse (heavy analytics) | Real-time aggregation, small RAM |
| **Geolocation** | GEOADD, GEOSEARCH, GEODIST | PostGIS (heavier, disk-based) | In-memory, ideal for real-time |
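
The sliding-window rate limiter from the table can be sketched without a Redis server. The hypothetical `SlidingWindowLimiter` below keeps a sorted list of timestamps per client, and each step is annotated with the Sorted Set command it stands in for. It is a model of the technique under those assumptions, not a client implementation.

```python
import bisect


class SlidingWindowLimiter:
    """Sliding-window log: keep one timestamp per accepted request."""

    def __init__(self, limit: int, window_s: float) -> None:
        self.limit = limit
        self.window_s = window_s
        self._hits = {}   # client id -> sorted list of request timestamps

    def allow(self, client: str, now: float) -> bool:
        hits = self._hits.setdefault(client, [])
        # Like ZREMRANGEBYSCORE: drop entries that have left the window.
        cutoff = now - self.window_s
        del hits[:bisect.bisect_right(hits, cutoff)]
        # Like ZCARD: count the requests still inside the window.
        if len(hits) >= self.limit:
            return False
        # Like ZADD: record this request, keeping the list sorted.
        bisect.insort(hits, now)
        return True


limiter = SlidingWindowLimiter(limit=3, window_s=10)
results = [limiter.allow("203.0.113.7", t) for t in (0, 1, 2, 3, 11.5)]
print(results)  # → [True, True, True, False, True]
```

Against a real Redis instance the three steps would need to run atomically, for example inside MULTI/EXEC or a Lua script, so that concurrent requests cannot slip past the limit.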
### When to use Redis
- **Cache for API** — response cache, DB query cache, session cache
- **Session management** — distributed sessions across servers
- **Rate limiting** — API gateway, per-user/per-IP limits
- **Leaderboards / rankings** — real-time scoring
- **Message broker** — task queue (RQ, Celery with Redis), pub/sub notifications
- **Real-time analytics** — counting uniques, metrics, dashboards
- **Geo-proximity** — "find nearest branch" in < 1 ms
### When to use something else
- **Persistent data with SQL queries** → PostgreSQL or MySQL
- **Data volumes larger than RAM** → Memcached (multi-threaded), Dragonfly (better memory utilization)
- **Long-term message queue** → Kafka, RabbitMQ (disk-based persistence)
- **Document DB** → MongoDB (persistent, complex queries)
## Redis licensing
Redis underwent a major license change in 2024:
| Period | License | Conditions |
|--------|---------|------------|
| **Until March 2024** | BSD 3-clause (open source) | Completely free use, including managed services |
| **Since March 2024 (Redis 7.4+)** | RSALv2 + SSPLv1 (dual license) | SSPL: if you offer Redis as a managed service, you must release the source of the entire service stack. RSALv2: you may not offer Redis to third parties as a managed service |
| **Since May 2025 (Redis 8)** | RSALv2 / SSPLv1 / AGPLv3 | AGPLv3 added as a third, OSI-approved option |
| **Valkey (fork, Linux Foundation)** | BSD 3-clause | Fully open source fork of Redis 7.2, supported by Linux Foundation, AWS, Google, Oracle |
**Impact**: Cloud providers cannot offer Redis 7.4+ as a managed service without a commercial agreement with Redis. AWS (ElastiCache) and Google (Memorystore) have added **Valkey** offerings; Microsoft licenses Redis commercially for its Azure services. For self-hosted Redis, no change: RSALv2/SSPL does not restrict internal use.
## Sources
References, books, and standards: [sources/databases/sources.en.md](sources/databases/sources.en.md)
*Last revision: 2026-06-03*

119
REDIS.md Normal file

@@ -0,0 +1,119 @@
# 🔴 Redis
## Overview
Redis is an in-memory key-value store with advanced data structures, used primarily as a cache, session store, message broker, and real-time database. It runs in RAM with optional disk persistence (RDB/AOF).
## Data structures
| Structure | Description | Use case |
|-----------|-------------|----------|
| **String** | Binary string (max 512 MB) | Cache values, session tokens, counters |
| **Hash** | Map field-value | User profile, cached object |
| **List** | Linked list (push/pop on both ends) | Queue (RPUSH/LPOP), log stream |
| **Set** | Unique values (unordered) | Tags, deduplication, memberships |
| **Sorted Set** | Unique + score (sorted) | Leaderboards, rate limiting, timeouts |
| **Bitmap** | Bit field | Feature flags, daily active users |
| **HyperLogLog** | Approximate cardinality (at most 12 KB per key, counts up to 2^64 items) | Unique visitors (error < 1%) |
| **Stream** | Append-only log (Kafka-like) | Event store, messaging |
| **Geospatial** | Geo-indexing (GEOADD, GEOSEARCH) | Location queries, proximity search |
| **JSON** | JSON document (RedisJSON module) | Document structures |
## Eviction policies
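| Policy | Description | Use case |
|--------|-------------|----------|
| **noeviction** | Writes fail with an error when memory is full | Data that must not be lost (e.g. transactional) |
| **allkeys-lru** | LRU on all keys | General cache, standard |
| **allkeys-lfu** | LFU on all keys | Frequently accessed data |
| **volatile-lru** | LRU on keys with TTL | Cache with expiration |
| **volatile-lfu** | LFU on keys with TTL | Cache mixed with keys that must stay |
| **volatile-ttl** | Keys with the shortest remaining TTL first | Short-lived data |
| **allkeys-random** | Random key | Testing |
| **volatile-random** | Random key among keys with TTL | Testing |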
## Redis Cluster vs Sentinel
| Feature | Redis Sentinel | Redis Cluster |
|---------|---------------|---------------|
| **Scaling** | Read replicas (master + replica) | Data sharding (16384 hash slots) |
| **Auto-failover** | Yes (Sentinel) | Yes (gossip-based) |
| **Multi-key ops** | Yes (transactions on master) | Limited (same hash slot) |
| **Client communication** | Clients ask Sentinel for the current master address | Cluster nodes redirect (MOVED/ASK) |
| **Minimum nodes** | Master + Replica + 3 Sentinel | 3 masters (each with replica) |
| **Use case** | High availability, single shard | Multi-shard, horizontal scaling |
## Persistence
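| Method | Description | RTO | RPO | Use case |
|--------|-------------|-----|-----|----------|
| **RDB** (Redis Database) | Periodic snapshot to `dump.rdb` | Depends on dataset size; the compact snapshot loads faster than an AOF replay | Everything since the last snapshot | Cache, loss tolerated |
| **AOF** (Append-Only File) | Append-only log of all write operations | Depends on log size; the log is replayed on start (AOF rewrite keeps it short) | About 1 s with `appendfsync everysec` | Data must not be lost |
| **RDB + AOF** | Both enabled; Redis restores from the AOF on restart | As for AOF | About 1 s | Recommended for production |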
## Modules (RediSearch, RedisJSON, RedisGraph)
Redis is extensible via modules:
- **RediSearch** — full-text search, facets, prefix/suffix search
- **RedisJSON** — JSON path queries, document manipulation
- **RedisGraph** — graph DB (queried with Cypher; end-of-life announced in 2023, support ended in January 2025)
- **RedisTimeSeries** — time-series with downsampling, retention policies
- **RedisBloom** — Bloom filters, Cuckoo filters, Top-K, Count-Min Sketch
## Memcached vs Redis
| Feature | Redis | Memcached |
|---------|-------|-----------|
| **Data structures** | String, Hash, List, Set, Sorted Set, Stream, JSON | String only |
| **Persistence** | RDB + AOF | None (purely in-memory) |
| **Replication** | Master-replica, Cluster | None built in (clients shard across servers) |
| **Eviction** | 8 policies (LRU, LFU, TTL, random, noeviction) | LRU only |
| **Lua scripting** | Yes (EVAL) | No |
| **Transactions** | Yes (MULTI/EXEC) | No |
| **Pub/Sub** | Yes | No |
| **Streaming** | Yes (Stream) | No |
## Recommendations — where Redis is better
| Area | Redis | Competition | Why Redis |
|------|-------|-------------|-----------|
| **Cache (in-memory)** | < 1 ms latency, 8 eviction policies | Memcached (LRU, strings only) | Richer data types, persistence, cluster |
| **Session store** | Hash + TTL, Cluster for HA | DynamoDB (higher latency) | Simpler, faster, native expiration |
| **Rate limiting** | Sorted Set (sliding window counter) | Counters implemented in an application DB (complex) | Atomic operations, built-in logic |
| **Leaderboard / scoring** | Sorted Set (ZADD, ZRANK, ZREVRANGE) | SQL (ORDER BY + COUNT = expensive) | O(log N) rank and insert |
| **Message queue** | List/Stream (RPUSH+BLPOP) | Kafka (heavy, JVM) | Lightweight; no separate broker to run if Redis is already deployed |
| **Real-time analytics** | HyperLogLog + Bitmap + Stream | ClickHouse (heavy analytics) | Real-time aggregation, small RAM |
| **Geolocation** | GEOADD, GEOSEARCH, GEODIST | PostGIS (heavier, disk-based) | In-memory, ideal for real-time |
### When to use Redis
- **Cache for API** — response cache, DB query cache, session cache
- **Session management** — distributed sessions across servers
- **Rate limiting** — API gateway, per-user/per-IP limits
- **Leaderboards / rankings** — real-time scoring
- **Message broker** — task queue (RQ, Celery with Redis), pub/sub notifications
- **Real-time analytics** — counting uniques, metrics, dashboards
- **Geo-proximity** — "find nearest branch" in < 1 ms
### When to use something else
- **Persistent data with SQL queries** → PostgreSQL or MySQL
- **Data volumes larger than RAM** → Memcached (multi-threaded), Dragonfly (better memory utilization)
- **Long-term message queue** → Kafka, RabbitMQ (disk-based persistence)
- **Document DB** → MongoDB (persistent, complex queries)
## Redis licensing
Redis underwent a major license change in 2024:
| Period | License | Conditions |
|--------|---------|------------|
| **Until March 2024** | BSD 3-clause (open source) | Completely free use, including managed services |
| **Since March 2024 (Redis 7.4+)** | RSALv2 + SSPLv1 (dual license) | SSPL: if you offer Redis as a managed service, you must release the source of the entire service stack. RSALv2: you may not offer Redis to third parties as a managed service |
| **Since May 2025 (Redis 8)** | RSALv2 / SSPLv1 / AGPLv3 | AGPLv3 added as a third, OSI-approved option |
| **Valkey (fork, Linux Foundation)** | BSD 3-clause | Fully open source fork of Redis 7.2, supported by Linux Foundation, AWS, Google, Oracle |
**Impact**: Cloud providers cannot offer Redis 7.4+ as a managed service without a commercial agreement with Redis. AWS (ElastiCache) and Google (Memorystore) have added **Valkey** offerings; Microsoft licenses Redis commercially for its Azure services. For self-hosted Redis, no change: RSALv2/SSPL does not restrict internal use.
## Sources
References, books, and standards: [sources/databases/sources.md](sources/databases/sources.md)
*Last revision: 2026-06-03*

89
REVIEW.en.md Normal file

@@ -0,0 +1,89 @@
# 📋 Review workflow — Review and content control
## Process
```
Draft ──→ Self-review ──→ Peer review ──→ Approval ──→ Merged
  ↑                                                    │
  └────────────── Feedback loop ───────────────────────┘
```
## Phases
### 1. Draft
- Author creates new content / edits existing
- Mark files as `[draft]` in a note in the commit message
- Goal: capture ideas, structure, and facts
### 2. Self-review (author)
- [ ] Is the content **understandable**? Would a junior understand it?
- [ ] Are the **facts correct**? Verify against sources / official documentation
- [ ] Are **sources** cited? (links in `sources/`)
- [ ] Is the **structure consistent** with the rest of the KB?
- [ ] Are **abbreviations** explained?
- [ ] Are **spelling and grammar** correct?
- [ ] Is the **tone** factual, without subjective opinions?
- [ ] Does it contain **actionable best practices**, not just theory?
### 3. Peer review (colleague / reviewer)
- Author requests a review (PR / issue / @mention)
- Reviewer checks:
  - **Technical accuracy** — are data and concepts valid?
  - **Completeness** — is anything important missing?
  - **Impartiality** — does it avoid favoring one vendor without reason?
  - **Currency** — is any information outdated?
**Review template:**
```
## Review: [file name]
### Technical accuracy
- [ ] Facts are correct
- [ ] Recommendations are appropriate
- [ ] Cited sources are relevant
### Structure and form
- [ ] Logical structure
- [ ] Consistent formatting
- [ ] Language is understandable
### Comments
- [ ] [comment 1]
- [ ] [comment 2]
### Verdict
- [ ] Approved
- [ ] Approved with reservations (see comments)
- [ ] Rejected (reason: …)
```
### 4. Approval
- Approved by: the author plus at least one peer reviewer
- After approval, content is considered `[done]`
- Changes after approval require a new review cycle
### 5. Merged / Published
- Content is considered current and trustworthy
- A source marked `[done]` in `sources/` has been fully processed into the KB
## File states
| Status | Meaning |
|--------|--------|
| `[draft]` | In progress, not yet reviewed |
| `[in-review]` | Peer review in progress |
| `[done]` | Approved, current |
| `[outdated]` | Outdated, awaiting revision |
| `[deprecated]` | Replaced by another document |
## Regular revision
- **Quarterly** — check currency of the entire KB
- **Trigger** — new tool version, architecture change, EOL technology
- Each file should have a **last revision date** in its footer
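
The quarterly check can be partly automated. The sketch below assumes the footer format used elsewhere in this KB (`*Last revision: YYYY-MM-DD*`) and treats "quarterly" as 90 days; both the helper names and the 90-day threshold are assumptions for illustration.

```python
import re
from datetime import date, timedelta
from typing import Optional

# Matches the footer convention, e.g. "*Last revision: 2026-06-03*".
FOOTER = re.compile(r"Last revision:\s*(\d{4}-\d{2}-\d{2})")


def last_revision(text: str) -> Optional[date]:
    """Return the last 'Last revision: YYYY-MM-DD' date in the text, if any."""
    matches = FOOTER.findall(text)
    return date.fromisoformat(matches[-1]) if matches else None


def needs_revision(text: str, today: date, max_age_days: int = 90) -> bool:
    """True when the footer is missing or older than the review interval."""
    revised = last_revision(text)
    return revised is None or today - revised > timedelta(days=max_age_days)


doc = "# Redis\n...\n*Last revision: 2026-06-03*\n"
print(needs_revision(doc, today=date(2026, 6, 18)))   # → False
print(needs_revision(doc, today=date(2026, 10, 1)))   # → True
```

Files flagged this way would move to `[outdated]` until someone re-reviews them.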

89
REVIEW.md Normal file

@@ -0,0 +1,89 @@
# 📋 Review workflow — Review and content control
## Process
```
Draft ──→ Self-review ──→ Peer review ──→ Approval ──→ Merged
  ↑                                                    │
  └────────────── Feedback loop ───────────────────────┘
```
## Phases
### 1. Draft
- Author creates new content / edits existing
- Mark files as `[draft]` in a note in the commit message
- Goal: capture ideas, structure, and facts
### 2. Self-review (author)
- [ ] Is the content **understandable**? Would a junior understand it?
- [ ] Are the **facts correct**? Verify against sources / official documentation
- [ ] Are **sources** cited? (links in `sources/`)
- [ ] Is the **structure consistent** with the rest of the KB?
- [ ] Are **abbreviations** explained?
- [ ] Are **spelling and grammar** correct?
- [ ] Is the **tone** factual, without subjective opinions?
- [ ] Does it contain **actionable best practices**, not just theory?
### 3. Peer review (colleague / reviewer)
- Author requests a review (PR / issue / @mention)
- Reviewer checks:
  - **Technical accuracy** — are data and concepts valid?
  - **Completeness** — is anything important missing?
  - **Impartiality** — does it avoid favoring one vendor without reason?
  - **Currency** — is any information outdated?
**Review template:**
```
## Review: [file name]
### Technical accuracy
- [ ] Facts are correct
- [ ] Recommendations are appropriate
- [ ] Cited sources are relevant
### Structure and form
- [ ] Logical structure
- [ ] Consistent formatting
- [ ] Language is understandable
### Comments
- [ ] [comment 1]
- [ ] [comment 2]
### Verdict
- [ ] Approved
- [ ] Approved with reservations (see comments)
- [ ] Rejected (reason: …)
```
### 4. Approval
- Approved by: the author plus at least one peer reviewer
- After approval, content is considered `[done]`
- Changes after approval require a new review cycle
### 5. Merged / Published
- Content is considered current and trustworthy
- A source marked `[done]` in `sources/` has been fully processed into the KB
## File states
| Status | Meaning |
|--------|---------|
| `[draft]` | In progress, not yet reviewed |
| `[in-review]` | Peer review in progress |
| `[done]` | Approved, current |
| `[outdated]` | Outdated, awaiting revision |
| `[deprecated]` | Replaced by another document |
## Regular revision
- **Quarterly** — check currency of the entire KB
- **Trigger** — new tool version, architecture change, EOL technology
- Each file should have a **last revision date** in its footer

757
SERVER-CONFIG.en.md Normal file

@@ -0,0 +1,757 @@
# ⚙️ Server configuration — best practices by workload
## General BIOS/UEFI settings
| Setting | Recommendation | Rationale |
|-----------|-----------|------------|
| **Boot mode** | UEFI | Secure Boot, GPT, larger disks |
| **Power profile** | Performance / OS Control | Max performance, C-States disabled |
| **Hyper-Threading / SMT** | Enabled | Up to roughly +30 % throughput for multi-threaded workloads (workload-dependent) |
| **Virtualization** | Enabled (VT-x/AMD-V) | Required for hypervisor, containers |
| **SR-IOV** | Enabled | GPU, NIC passthrough |
| **NUMA** | Enabled | NUMA-aware scheduling |
| **ACPI** | Enabled | Power management, OS-level |
| **Secure Boot** | Enabled | Secure boot chain |
| **TPM** | Enabled | Measured boot, key storage |
---
## 1. Database servers
### CPU Selection
| DB type | CPU preference | Rationale |
|--------|---------------|------------|
| **OLTP** (PostgreSQL, MySQL) | High clock, moderate cores | Low latency per transaction, limited parallelism |
| **OLAP** (ClickHouse, Snowflake) | Many cores, AVX-512 | Columnstore, high parallelism |
| **In-memory** (Redis, Memcached) | High clock, low cache latency | Single-threaded (Redis), RAM bandwidth |
| **Document** (MongoDB) | Balance (clock × cores) | Mixed workload |
| **Distributed** (Cassandra, Scylla) | Many cores, high cache | Shard-per-core (Scylla), compaction |
| **Oracle OLTP** | High clock, moderate cores, core-factor aware | CPU license cost (core factor 0.5 for AMD EPYC and Intel Xeon) |
| **Oracle OLAP / DW** | Many cores, large SGA, in-memory option | Parallel query, Exadata Smart Scan, compression |
### Oracle CPU licensing — core factor
Oracle licenses per core with a correction factor depending on the processor. Factor 0.5 means 2 cores = 1 Oracle license.
| Processor | Core factor | 64 physical cores → Oracle licenses |
|----------|-------------|--------------------------------------|
| AMD EPYC (all series) | 0.5 | 32 |
| Intel Xeon (Scalable) | 0.5 | 32 |
| IBM POWER | 1.0 | 64 |
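| ARM (Ampere Altra) | 0.25 | 16 |
**Impact on CPU selection**: Because EPYC and Xeon share the 0.5 core factor, license cost scales directly with core count. Fewer, faster cores (high clock and IPC) give more compute per license than simply adding cores. Check the current Oracle Processor Core Factor Table before sizing.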
### Configuration by company size and storage type
#### Variant A: Small company — local NVMe RAID
| Component | Recommendation | Note |
|-----------|-----------|----------|
| **CPU** | 1× EPYC 9124 (16C) / 9224 (24C) or Intel Xeon Silver 4410Y (12C) | 1 socket, high clock |
| **RAM** | 64-256 GB (8-16 GB/core) | DDR5-4800, 1DPC |
| **OS disk** | 2× SATA/SAS SSD, RAID 1 (240-480 GB) | For OS + binaries |
| **Data disk** | 4-6× NVMe (U.2/E3.S), RAID 10 | Local data, no sharing |
| **WAL disk** | 2× NVMe RAID 1 (400-800 GB) | PostgreSQL only |
| **Network** | 2× 25 GbE (LACP) | Application traffic + management |
| **Form factor** | 1U or 2U | Single node, no cluster |
| **Storage backend** | Local RAID controller (PERC/Broadcom) | HW RAID 10 or SW RAID (mdadm) |
| **HA** | Application manages failover (patroni, repmgr, orchestrator) | Standby node on failure |
**Use case**: Startup, branch office, dev/test, < 500 users, single database server, low availability requirements.
#### Variant B: Medium company — local NVMe + asynchronous replication
| Component | Recommendation | Note |
|-----------|-----------|----------|
| **CPU** | 1-2× EPYC 9334 / 9374F (32C) or Intel Xeon Gold 5418Y (24C) | 1-2 socket, balanced |
| **RAM** | 128-512 GB (8-16 GB/core) | DDR5-4800/5600, 1DPC |
| **OS disk** | 2× NVMe RAID 1 (2× 480 GB) | OS + binaries |
| **Data disk** | 6-8× NVMe, RAID 10 | Local NVMe, 3-6 TB usable |
| **WAL disk** | 2× NVMe RAID 1 (2× 800 GB) | Separate from data |
| **Network** | 2× 25 GbE (app) + 2× 25 GbE (replication) | Application and replication networks separated |
| **Form factor** | 2U | Primary + replica node |
| **Storage backend** | SW RAID (mdadm) or HW RAID (PERC H965) | Write-back cache with BBU |
| **HA** | Patroni / repmgr / MySQL InnoDB Cluster | Asynchronous replication to 1-2 standby |
**Use case**: E-commerce, medium SaaS, 500-5000 users, RPO < 1 min, RTO < 5 min.
#### Variant C: Large company — FC SAN (enterprise)
| Component | Recommendation | Note |
|-----------|-----------|----------|
| **CPU** | 2× EPYC 9654 (96C) / 9965 (192C) or Xeon 8592+ (64C) / 6980P (128C) | 2 socket, max cores, large cache |
| **RAM** | 512 GB - 2 TB (8-16 GB/core) | DDR5, 2DPC (speed penalty), 12 channels (EPYC) |
| **OS disk** | 2× SATA SSD RAID 1 (2× 480 GB) | OS only, data on SAN |
| **Data + WAL** | LUNs from FC SAN | Hitachi VSP / Dell PowerMax / Pure //X |
| **HBA** | 2× dual-port FC HBA (32/64 Gb) | Multipath (active-active), FC-NVMe |
| **Network** | 2× 25/100 GbE (app) + 2× 32/64 Gb FC (storage) | App and storage networks separated |
| **Form factor** | 2U | 2-8 node cluster (RAC, AlwaysOn AG) |
| **Storage backend** | FC SAN — LUN per database | Thin provisioning, RAID on SAN, snapshots |
| **HA** | Oracle RAC / SQL Server AOAG / PostgreSQL Patroni | Synchronous replication, FC multipath |
**SAN advantages**: Centralized management, snapshots, cloning, disaster recovery (SRDF/Metro), separate storage network, higher availability.
**Disadvantages**: Higher latency compared to local NVMe (~50-200 µs over SAN vs ~10 µs local NVMe), higher CAPEX, vendor lock-in.
#### Variant D: Large company — Ceph / SDS backend
| Component | Recommendation | Note |
|-----------|-----------|----------|
| **CPU** | 2× EPYC 9334 (32C) / 9654 (96C) | Fewer cores than SAN variant; part of CPU goes to Ceph client |
| **RAM** | 256-512 GB | Less RAM — Ceph client cache is not as effective as local buffer |
| **OS disk** | 2× SATA SSD RAID 1 (2× 480 GB) | OS |
| **Network** | 2× 25/100 GbE (app) + 2× 25/100 GbE (Ceph public) | App and Ceph traffic over Ethernet |
| **HBA** | Storage HBA in IT/HBA mode (no RAID) | For Ceph OSD node, not DB node |
| **Form factor** | 2U | DB node + separate Ceph OSD node |
| **Storage backend** | RBD (RADOS Block Device) over Ceph | 3× replication or erasure coding |
| **HA** | Application + Ceph inherent HA | Ceph self-healing, auto-rebalance |
**Ceph advantages**: No vendor lock-in, horizontal scaling, unified platform for block/file/object, lower CAPEX.
**Disadvantages**: Higher latency and CPU overhead (Ceph client → network → OSD), variable performance, more complex troubleshooting.
#### Variant E: Cloud — RDS / CloudSQL / Azure SQL
| Component | Recommendation | Note |
|-----------|-----------|----------|
| **Compute** | AWS RDS (db.r7g/r8g), Azure SQL (GP/BC/Hyperscale) | Managed service, no OS access |
| **Storage** | EBS gp3 / io2, Azure Premium SSD v2, Cloud SQL SSD | Automatic scaling, PITR, multi-AZ |
| **Network** | Security Group, Private Link, VPC peering | No HBA, no SAN — everything over Ethernet |
| **HA** | Multi-AZ (synchronous), read replicas | Managed failover, RTO < 60 s |
| **Backup** | Automated, PITR (7-35 days) | No management required |
**Use case**: No on-prem hardware, elastic scaling, pay-per-use, lower operational overhead.
**Disadvantages**: Higher long-term costs, data residency, network latency, limited customization.
### Variant comparison
| Aspect | Local NVMe (small) | Local NVMe (medium) | FC SAN | Ceph | Cloud |
|--------|---------------------|----------------------|--------|------|-------|
| **Latency** | ~10 µs | ~10 µs | ~50-200 µs | ~100-500 µs | ~100-1000 µs |
| **Scaling** | Vertical | Vertical | Horizontal | Horizontal | Elastic |
| **CAPEX** | Low | Medium | High | Medium | None (OPEX) |
| **Operational overhead** | Low | Low | High (SAN admin) | Medium | None |
| **HA** | Application | Patroni/Cluster | RAC/AOAG | Ceph HA | Managed |
| **RPO** | 1-5 min | < 1 min | < 10 s | < 30 s | < 60 s |
| **RTO** | 5-15 min | < 5 min | < 2 min | < 5 min | < 60 s |
| **Number of servers** | 1-2 | 2-4 | 4-16 | 6-20+ | 0 (managed) |
| **Company** | Startup/SME | SME/Enterprise | Enterprise | Enterprise | Any |
### PostgreSQL parameter matrix by storage type
| Parameter | Local NVMe | FC SAN | Ceph RBD |
|----------|-----------|--------|----------|
| `random_page_cost` | 1.1 | 1.5-2.0 | 2.0-3.0 |
| `effective_io_concurrency` | 300 | 100-200 | 50-100 |
| `synchronous_commit` | on; `off` trades the last few commits on a crash for lower commit latency | on | on; `off` only if the same trade-off is acceptable |
| `full_page_writes` | on | on | on (even over Ceph) |
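The matrix above can be turned into a small generator for a `postgresql.conf` fragment. This is only a sketch: the profile names (`local_nvme`, `fc_san`, `ceph_rbd`) and the function name are made up for illustration, and where the matrix gives a range the lower bound is used as an arbitrary starting point.

```python
# Values copied from the parameter matrix; for ranges the lower bound is
# used here (an assumption, not a recommendation from the matrix itself).
PG_STORAGE_PROFILES = {
    "local_nvme": {"random_page_cost": 1.1, "effective_io_concurrency": 300},
    "fc_san": {"random_page_cost": 1.5, "effective_io_concurrency": 100},
    "ceph_rbd": {"random_page_cost": 2.0, "effective_io_concurrency": 50},
}


def render_pg_conf(storage: str) -> str:
    """Return postgresql.conf lines for one storage backend."""
    profile = PG_STORAGE_PROFILES[storage]
    lines = [f"{name} = {value}" for name, value in profile.items()]
    # full_page_writes is 'on' for every backend in the matrix.
    lines.append("full_page_writes = on")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_pg_conf("fc_san"))
```

Measured values from the actual array or cluster should replace these defaults once available.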
### Storage layout by backend type
**Local NVMe (small/medium):**
```
Mount point  FS    RAID        Disk         Purpose
/            ext4  1 (mirror)  2× SATA SSD  OS
/data        xfs   10          4-8× NVMe    Data
/wal         xfs   1 (mirror)  2× NVMe      WAL (PG)
```
**FC SAN (enterprise):**
```
Mount / device  FS    Backing storage         Purpose
/               ext4  local RAID 1 (2× SSD)   OS
/dev/sdb        xfs   FC LUN 1 (500 GB)       WAL (PG)
/dev/sdc        xfs   FC LUN 2 (2 TB)         Data
/dev/sdd        xfs   FC LUN 3 (2 TB)         Indexes (separate)
# With FC multipath, use the dm-multipath device rather than a single /dev/sdX path
```
**Ceph RBD:**
```
Mount / device  FS    Backing storage         Purpose
/               ext4  local RAID 1 (2× SSD)   OS
/dev/rbd0       xfs   rbd datastore-01        Data + WAL (Ceph RBD)
```
### Kernel tuning by variant
**Local NVMe:**
```
vm.dirty_ratio = 30
vm.dirty_background_ratio = 5
```
**FC SAN:**
```
# SAN storage: higher latency, so keep the dirty-page backlog smaller
vm.dirty_ratio = 20
vm.dirty_background_ratio = 3
# 3000 centiseconds = 30 s, the kernel default expiry for dirty pages
vm.dirty_expire_centisecs = 3000
```
**Ceph RBD:**
```
# Ceph RBD — network storage, optimize for RBD cache
vm.dirty_ratio = 15
vm.dirty_background_ratio = 2
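# RBD cache settings (ceph.conf; apply to librbd clients such as QEMU, the kernel rbd driver uses the page cache instead)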
# rbd cache = true (client-side)
# rbd cache size = 256-512 MB
```
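To see what these ratios mean in absolute terms, the thresholds can be estimated for a given host. The sketch below assumes the percentages apply to total RAM; the kernel actually applies them to available memory, so real thresholds are somewhat lower. The function name is hypothetical.

```python
def dirty_thresholds_gib(ram_gib: float, dirty_ratio: int,
                         dirty_background_ratio: int) -> tuple:
    """Return (background, blocking) dirty-page thresholds in GiB.

    background: background writeback starts above this amount of dirty data.
    blocking:   writers are throttled once this much dirty data accumulates.
    """
    background = ram_gib * dirty_background_ratio / 100
    blocking = ram_gib * dirty_ratio / 100
    return background, blocking


if __name__ == "__main__":
    # The three profiles listed above, on a hypothetical 256 GiB host.
    for name, ratio, bg in (("local NVMe", 30, 5), ("FC SAN", 20, 3), ("Ceph RBD", 15, 2)):
        background, blocking = dirty_thresholds_gib(256, ratio, bg)
        print(f"{name}: background at {background:.1f} GiB, blocking at {blocking:.1f} GiB")
```

On large-memory hosts even a few percent is tens of GiB of unflushed data, which is why the ratios shrink as storage latency grows.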
### Database-specific tuning
| Parameter | PostgreSQL | MySQL | Oracle | MongoDB |
|----------|-----------|-------|--------|---------|
| **Cache** | `shared_buffers` 25 % RAM | `innodb_buffer_pool_size` 70-80 % RAM | `SGA_TARGET` 60-80 % RAM | WiredTiger cache (`cacheSizeGB`) 50-80 % RAM |
| **OS cache** | `effective_cache_size` 75 % RAM | OS cache + InnoDB | OS cache (double buffering risk with large SGA) | OS cache |
| **Write buffer** | `wal_buffers` 64-256 MB | `innodb_log_file_size` 1-4 GB | Redo log (2-4 groups, 200 MB-4 GB) | WiredTiger log |
| **Connections** | `max_connections` 50-500 | `max_connections` 100-500 | `processes` 200-2000 | `net.maxIncomingConnections` |
| **I/O** | `effective_io_concurrency` 200 | `innodb_io_capacity` 2000 | `db_file_multiblock_read_count` 128 | WiredTiger eviction |
| **Huge pages** | `huge_pages = try` | `large-pages = ON` | `use_large_pages = only` (instance fails to start without huge pages) | `transparent_hugepage=never` |
| **Parallel query** | `max_parallel_workers` 4-8 | `innodb_parallel_read_threads` 4 | `parallel_degree_policy = auto` — up to 64 | — |
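As a rough illustration of the cache rows in the table, the percentages can be converted into starting values for a given amount of RAM. This is a sketch only: the function name and the dictionary keys are made up, and for InnoDB the lower bound of the 70-80 % range is assumed.

```python
def cache_sizes_gb(ram_gb: int) -> dict:
    """Starting-point cache sizes in whole GB (rounded down) for a host."""
    return {
        # PostgreSQL: shared_buffers 25 % of RAM, effective_cache_size 75 %.
        "postgresql.shared_buffers": ram_gb * 25 // 100,
        "postgresql.effective_cache_size": ram_gb * 75 // 100,
        # MySQL: the table gives 70-80 %; the lower bound is used here.
        "mysql.innodb_buffer_pool_size": ram_gb * 70 // 100,
    }


if __name__ == "__main__":
    for key, value in cache_sizes_gb(256).items():
        print(f"{key} = {value}GB")
```

These are starting points; the working set size and connection count of the actual workload should drive the final values.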
### Connectivity by variant
| Variant | App network | Storage network | Replication | Management |
|----------|---------|-------------|-----------|------------|
| **Local (small)** | 2× 25 GbE LACP | — | 2× 25 GbE (same) | iDRAC/iLO |
| **Local (medium)** | 2× 25 GbE LACP | — | 2× 25 GbE dedicated | iDRAC/iLO |
| **FC SAN** | 2× 25/100 GbE | 2× 32/64 Gb FC (multipath) | FC replication | iDRAC/iLO + SAN mgmt |
| **Ceph** | 2× 25/100 GbE | 2× 25/100 GbE (public net) | 2× 25/100 GbE (cluster net) | iDRAC/iLO + Ceph mgmt |
| **Cloud** | Elastic IP / Private Link | — | — | AWS Console / API |
| **Oracle Standalone** | 2× 25 GbE LACP | ASM (2× 25 GbE or FC 32G) | Data Guard 2× 25 GbE | iLO + ASM mgmt |
| **Oracle RAC** | 2-4× 25/100 GbE | 2× 64 Gb FC (multipath) | Cache Fusion interconnect | iLO + SAN mgmt |
| **Oracle Exadata** | 4-8× 100 GbE RoCE | NVMe over Fabric | RDMA interconnect | Exadata CLI + OEDA |
### Oracle-specific configuration
#### Oracle ASM — diskgroup layout
Oracle ASM (Automatic Storage Management) replaces traditional filesystem + volume manager:
| Diskgroup | Redundancy | Disks | Purpose |
|-----------|-----------|-------|-------|
| **DATA** | Normal (2× mirror) | 4-12× FC LUN/NVMe | Data files, temp files, control files |
| **FRA** (Fast Recovery Area) | Normal (2× mirror) | 2-6× FC LUN/NVMe | Archive logs, backups, flashback logs |
| **REDO** | High (3× mirror) | 2-4× FC LUN/NVMe | Online redo log groups (I/O critical) |
| **SPFILE** | Normal | 2× small LUN | Server parameter file |
**ASM striping**: Coarse (1 MB) for regular data, Fine (128 KB) for redo logs (lower write latency).
#### Variant O1: Standalone Oracle (small/medium, single instance)
| Parameter | Small (< 500 users) | Medium (500-2000 users) |
|----------|---------------------|------------------------|
| **CPU** | 1-2× EPYC 9124 (16C) / 9224 (24C) or Xeon Silver 4410Y (12C) | 2× EPYC 9334 / 9374F (32C) or Xeon Gold 5418Y (24C) |
| **RAM (SGA + PGA)** | 64-128 GB (SGA 70 %, PGA 30 %) | 128-512 GB (SGA 60-80 %, PGA 20-40 %) |
| **Huge pages** | Yes (`vm.nr_hugepages`), strongly recommended for a large SGA | Yes |
| **OS disk** | 2× SATA SSD RAID 1 (240 GB) | 2× NVMe RAID 1 (480 GB) |
| **DATA + FRA** | 4-6× NVMe, ASM normal redundancy | 6-8× NVMe or FC LUN, ASM normal |
| **REDO** | 2-4× NVMe (separate from DATA), ASM high | 4× FC LUN (separate), ASM high |
| **Archive log** | Local FRA | FC LUN (FRA diskgroup) |
| **Network (app)** | 2× 25 GbE LACP | 2-4× 25/100 GbE LACP |
| **Network (storage)** | — (local NVMe) | 2× FC 32G multipath |
| **Network (Data Guard)** | — | 2× 25 GbE dedicated |
| **DB version** | Oracle SE2 (max 16 threads) | Oracle EE (unlimited) |
**Use case**: Dev/test, small production DBs, branch offices. SE2 is limited to servers with at most 2 sockets and to 16 CPU threads per instance, and it does not include parallel execution.
#### Variant O2: Oracle Data Guard (medium/large, HA + DR)
Primary + standby in active-passive mode, Active Data Guard possible for reporting.
| Parameter | Recommendation |
|----------|-----------|
| **CPU** | 2× EPYC 9654 (96C) / 9965 (192C) or Xeon 8592+ (64C) |
| **RAM** | 256-1024 GB (SGA 60-80 %, PGA 20-40 %) |
| **Huge pages** | Yes (50-80 % RAM allocated for SGA) |
| **OS disk** | 2× NVMe RAID 1 (480 GB) |
| **Storage** | FC SAN LUN (DATA + FRA + REDO separate) or NVMe + ASM |
| **HBA** | 2× dual-port FC 32/64 Gb (multipath active-active) |
| **App network** | 2-4× 25/100 GbE LACP |
| **Storage network** | 2× FC 32/64 Gb multipath |
| **Data Guard network** | 2× 25/100 GbE dedicated (sync or async) |
| **Data Guard mode** | Maximum Availability (sync, fallback to async) — RPO = 0 |
| **Topology** | 1 primary + 1-2 standby (physical), far sync for geo-DR |
| **Active Data Guard** | Standby open for read (reporting, backup) — requires ADG license |
**Data Guard latency**:
```text
Synchronous (Maximum Availability):
  Primary COMMIT → LGWR flushes redo → sync over network → standby RFS writes standby redo log → ACK → ~1-5 ms
  RPO = 0, impact on write latency
Asynchronous (Maximum Performance):
  Primary COMMIT → LGWR flushes redo → async shipment to standby → ~0.1-1 ms
  RPO = a few seconds, negligible write impact
```
**Network requirements for Data Guard sync**:
- RTT < 2 ms for synchronous mode (recommended < 1 ms)
- Min. 10 GbE, recommended 25 GbE (throughput = REDO rate × 2)
- REDO rate: OLTP ~50-500 MB/s, batch ~500-2000 MB/s
- At REDO rate 500 MB/s and 25 GbE → 4 Gbit/s, i.e. ~16 % link utilization (~32 % with the ×2 headroom)
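The link sizing in the list above is simple arithmetic and can be checked with a few lines. The sketch assumes 1 MB = 10^6 bytes and ignores protocol overhead; the function name and the `headroom` parameter are illustrative. For 500 MB/s on 25 GbE the raw figure works out to 16 %, or 32 % once the ×2 headroom factor is applied.

```python
def link_utilization(redo_mb_per_s: float, link_gbit_per_s: float,
                     headroom: float = 1.0) -> float:
    """Fraction of the link consumed by redo transport.

    headroom multiplies the redo rate (the list above suggests x2).
    """
    required_gbit = redo_mb_per_s * 8 / 1000 * headroom
    return required_gbit / link_gbit_per_s


if __name__ == "__main__":
    for link in (10, 25, 100):
        raw = link_utilization(500, link)
        padded = link_utilization(500, link, headroom=2.0)
        print(f"{link} GbE: {raw:.0%} raw, {padded:.0%} with x2 headroom")
```

A result above 1.0 means the link cannot carry the redo stream at all, which is the case for 2000 MB/s batch peaks on 10 GbE.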
#### Variant O3: Oracle RAC (large, enterprise)
Multi-instance cluster with shared storage and Cache Fusion.
| Parameter | Recommendation |
|----------|-----------|
| **Number of nodes** | 2-4 (typical), max 64 (RAC cluster) |
| **CPU per node** | 2× EPYC 9654 (96C) / 9965 (192C) or Xeon 8592+ (64C) |
| **RAM per node** | 512-2048 GB (SGA 60-80 %, PGA 20-40 %) |
| **Huge pages** | Yes (1 GB pages if RAM > 512 GB) |
| **Storage** | FC SAN — shared LUNs (ASM normal/high redundancy) |
| **HBA** | 2× dual-port FC 64 Gb (multipath, active-active) |
| **App network** | 2-4× 25/100 GbE LACP (VIP, SCAN listener) |
| **Storage network** | 2-4× FC 64 Gb (multipath per node) |
| **Cache Fusion interconnect** | 2× 100 GbE (RoCE v2 or InfiniBand) — dedicated |
| **RAC interconnect latency** | < 5 µs (recommended), max < 10 µs |
| **ASM** | Normal redundancy (2-way mirror) |
| **Oracle Clusterware** | Voting disk (3× 1 GB LUN), OCR (3× 500 MB LUN) |
| **Service** | OLTP_service, REPORT_service, BATCH_service |
**Cache Fusion — critical interconnect**:
```
Node A (DB instance) ←──→ Node B (DB instance)
        │                         │
        └────────── ASM ──────────┘
          FC SAN (shared storage)

Cache Fusion traffic: dirty block transfer between instances
→ Latency < 5 µs, otherwise RAC scaling degrades
→ Capacity: 2× 100 GbE, dedicated switch or InfiniBand HDR100
→ Recommended MTU: 9000 (jumbo frames)
```
**RAC sizing by transaction count**:
| TPS | Nodes | CPU per node | RAM per node | Interconnect |
|-----|------|-------------|-------------|-------------|
| < 10 000 | 2 | 16-24C | 256 GB | 2× 25 GbE |
| 10 000 - 50 000 | 2-4 | 32-48C | 512 GB | 2× 100 GbE RoCE |
| 50 000 - 200 000 | 4-8 | 48-64C | 1024 GB | 2× 100 GbE RoCE / InfiniBand |
| > 200 000 | 8+ | 64-128C | 2048 GB | InfiniBand HDR100/HDR200 |
**RAC sizing — license cost calculation**:
```text
Example: 4-node RAC, each node 2× EPYC 9654 (96C) = 192 cores per node
Core factor 0.5 → 96 Oracle licenses per node
4 × 96 = 384 Oracle EE licenses
At ~$47.5k/license → ~$18.2M (licenses only, without 22 % annual support)
```
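The calculation above generalizes to other node counts and CPUs. The following is a minimal sketch: the function names are made up, the $47.5k list price and 22 % support rate are taken from the example, and rounding up fractional licenses is an assumption about how a non-integer result would be handled.

```python
import math


def oracle_processor_licenses(nodes: int, sockets_per_node: int,
                              cores_per_socket: int, core_factor: float) -> int:
    """Processor licenses = physical cores x core factor, rounded up."""
    total_cores = nodes * sockets_per_node * cores_per_socket
    return math.ceil(total_cores * core_factor)


def license_cost(licenses: int, price_per_license: float = 47_500,
                 support_rate: float = 0.22) -> dict:
    """One-time license cost and yearly support for a license count."""
    one_time = licenses * price_per_license
    return {"licenses": one_time, "annual_support": one_time * support_rate}


if __name__ == "__main__":
    # 4-node RAC, 2x 96-core EPYC per node, core factor 0.5
    count = oracle_processor_licenses(4, 2, 96, 0.5)
    cost = license_cost(count)
    print(count, cost)
```

Options such as RAC itself are licensed on top of Enterprise Edition and are not included in this figure.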
#### Variant O4: Oracle Exadata (hyperscale)
Engineered system — optimal for hybrid workload (OLTP + DW).
| Parameter | X9M / X10M | Use case |
|----------|-----------|----------|
| **Database servers** | 2-8× (Xeon on X9M, AMD EPYC on X10M; 1.5-6 TB RAM, NVMe) | Compute |
| **Storage servers** | 3-18× (NVMe + HDD, Smart Scan) | Predicate offloading |
| **Smart Scan** | Filtering at storage layer | Less data over network, higher throughput |
| **RoCE interconnect** | 100 GbE (RDMA) | Low latency, high bandwidth |
| **In-Memory Column Store** | Optional license | Real-time analytics without ETL |
| **HCC (Hybrid Columnar Compression)** | Compression in storage servers | Up to 10-15× compression for DW |
| **Rack power** | ~15-30 kW (full rack) | Higher density |
**When to choose Exadata over standalone RAC**:
- OLTP > 50 000 TPS
- Consolidation needed (multiple DBs on one cluster)
- Smart Scan significantly accelerates reporting on production data
- HCC for storage savings on DW workloads
---
## 2. Hypervisor host (ESXi / KVM / Hyper-V)
### Configuration by size and storage type
#### Variant A: Small company — local storage (2-3 hosts)
| Component | Recommendation | Note |
|-----------|-----------|----------|
| **CPU** | 1× EPYC 9224/9254 or Xeon 4410Y/5418Y (12-24C) | 1 socket, enough cores for VM density |
| **RAM** | 128-256 GB (4-8 GB/core) | DDR5, 1DPC |
| **OS disk** | 2× SATA SSD RAID 1 (2× 240-480 GB) | ESXi / Proxmox / Hyper-V boot |
| **VM storage** | 4-6× SATA/SAS SSD, RAID 5/6 or 10 | Local RAID, 4-12 TB usable |
| **Network** | 2-4× 10/25 GbE (LACP) | Shared for everything (management + VM + storage) |
| **Hypervisor** | VMware vSphere Standard / Proxmox VE / Hyper-V | Basic license, no enterprise features |
| **Storage backend** | Local RAID controller (PERC H755, Broadcom 9560) | HW RAID with cache, write-back |
| **HA** | VMware HA / Proxmox HA | Restart VM on another host on failure; requires shared or replicated storage |
| **Backup** | Veeam B&R Free / PBS (Proxmox Backup Server) | Local or USB disk |
**Use case**: Small office, branch office, dev/test, < 10 VMs, low budget, simple management.
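**Limitations**: With purely local disks, live migration is limited to slow shared-nothing migration (the disks are copied), and an HA restart on another host needs replicated storage. A host failure therefore usually means an outage until the VMs are restored or the host is repaired.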
#### Variant B: Medium company — vSAN / Ceph (3-6 hosts)
| Component | Recommendation | Note |
|-----------|-----------|----------|
| **CPU** | 1-2× EPYC 9334 (32C) / 9654 (96C) or Xeon 5418Y (24C) / 8592+ (64C) | 1-2 socket |
| **RAM** | 256-512 GB (4-8 GB/core) | DDR5, 2DPC (minimal penalty) |
| **OS disk** | 2× SATA SSD RAID 1 or 2× M.2 on a boot controller (Dell BOSS-S1 is SATA, BOSS-N1 is NVMe) | Separate from VM storage |
| **Cache tier** | 1-2× NVMe (vSAN caching / Ceph WAL+DB) | For write performance |
| **Capacity tier** | 4-8× SATA/SAS SSD or HDD (vSAN capacity / Ceph OSD) | HDD for capacity, SSD for performance |
| **Network** | 4× 25/100 GbE — 2× VM + mgmt, 2× storage (vSAN/Ceph) | Separate storage network, RDMA (RoCE v2) |
| **Hypervisor** | VMware vSAN / Proxmox Ceph / StarWind HCI | HCI license (vSAN ~$2.5k/Core) |
| **Storage backend** | vSAN OSA/ESA or Ceph (RADOS) | Distributed storage, auto-rebalance |
| **HA** | vSphere HA + vSAN / Proxmox HA + Ceph | vMotion, DRS, automated failover |
| **Failover** | N+1 (one host as reserve) | vSAN needs min. 3 hosts (or 2 + witness); 4 are recommended so a failed host can be rebuilt |
**Pure Ceph variant (Proxmox / OpenStack)**:
```
Proxmox node (3-6×):
├── CPU: 1× EPYC 9224-9334 (24-32C)
├── RAM: 128-256 GB
├── OS: 2× SATA SSD RAID 1
├── Ceph OSD: 4-8× NVMe/SATA SSD (RAW, HBA mode)
├── Network: 2× 25 GbE (public) + 2× 25 GbE (cluster)
└── Storage: Ceph 3× replication, CRUSH host failure domain
```
**VMware vSAN variant (4-6 hosts)**:
```
vSAN node (4-6×):
├── CPU: 1-2× EPYC/Xeon (16-32C)
├── RAM: 256-512 GB
├── OS: 2× M.2 on a boot controller (BOSS-S1 SATA / BOSS-N1 NVMe); SD card boot is deprecated
├── vSAN cache: 1-2× NVMe (write buffer, OSA only; ESA is single-tier)
├── vSAN capacity: NVMe (vSAN ESA) or 4-8× SATA/SAS SSD / HDD (vSAN OSA)
├── Network: 2× 25/100 GbE (VM) + 2× 25 GbE (vSAN)
└── Storage: vSAN ESA (all-NVMe) or OSA (hybrid or all-flash)
```
**Use case**: SME, enterprise division, 10-100 VMs, need for vMotion, DRS, HA, simple storage management.
#### Variant C: Large company — FC SAN (6+ hosts)
| Component | Recommendation | Note |
|-----------|-----------|----------|
| **CPU** | 2× EPYC 9654 (96C) / 9965 (192C) or Xeon 8592+ (64C) / 6980P (128C) | 2 socket, max VM density |
| **RAM** | 512 GB - 2 TB (4-8 GB/core) | DDR5, 2DPC |
| **OS disk** | 2× SATA SSD RAID 1 or M.2 boot device | Boot, image storage; SD cards are deprecated as ESXi boot media |
| **VM storage** | LUNs from FC SAN — VMFS / NFS datastores | Hitachi, Dell, Pure, HPE storage |
| **HBA** | 2× dual-port FC HBA 32/64 Gb | Multipath, FC-NVMe |
| **Network** | 4-8× 25/100 GbE — split by traffic type | Management, VM, vMotion, FT separated |
| **Hypervisor** | VMware vSphere Enterprise+ / Hyper-V DC | Enterprise license, DRS, HA, FT |
| **Storage backend** | FC SAN: VMFS 6 datastores, vVols | Thin provisioning, storage DRS, array snapshots |
| **HA** | vSphere HA + DRS + vCenter | vMotion, DRS, FT, SRM for DR |
| **Failover** | N+1 or admission control (CPU/RAM reserve) | Reserved capacity for HA failover |
**Use case**: Enterprise, 100+ VMs, mix of DB and applications, centralized storage management, enterprise SLA.
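The N+1 reserve from the Failover row translates into a percentage of cluster capacity that must stay free. The sketch below assumes identically sized hosts; the function name and parameters are illustrative and not tied to any hypervisor API.

```python
def failover_reserve_percent(hosts: int, failures_to_tolerate: int = 1) -> float:
    """Share of cluster CPU/RAM to keep free so that the remaining hosts
    can absorb the VMs of the failed ones (equal-sized hosts assumed)."""
    if hosts <= failures_to_tolerate:
        raise ValueError("cluster must have more hosts than failures to tolerate")
    return 100.0 * failures_to_tolerate / hosts


if __name__ == "__main__":
    for n in (6, 8, 16):
        print(f"{n} hosts, N+1: reserve {failover_reserve_percent(n):.1f} % of capacity")
```

The reserve shrinks as the cluster grows, which is one reason larger clusters are cheaper per VM than several small ones.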
#### Variant D: Hyperscale — Ceph / SDS (20+ hosts)
| Component | Recommendation | Note |
|-----------|-----------|----------|
| **CPU** | 2× EPYC 9654 (96C) / 9965 (192C) | 2 socket, compute optimal |
| **RAM** | 512 GB - 1 TB (2-4 GB/core) | Low overcommit ratio for consistency |
| **OS disk** | 2× M.2 NVMe RAID 1 (BOSS) | Boot |
| **Network** | 4-8× 100 GbE (compute + storage) | Separate OVN/OVS for SDN, VXLAN tunneling |
| **Hypervisor** | OpenStack (Nova) / OpenShift (KubeVirt) | Open source, API-driven, multi-tenant |
| **Storage backend** | Ceph (RADOS, RBD, RGW, CephFS) | Unified storage, erasure coding (8+3) |
| **Orchestration** | OpenStack / Kubernetes | Infrastructure-as-Code, autoscaling |
| **HA** | OpenStack HA / Kubernetes HA | Self-healing, auto-rebalance |
**Use case**: Cloud provider, hyperscale, 500+ VMs, multi-tenant, maximum automation.
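The choice between 3× replication and 8+3 erasure coding mainly changes how much raw capacity becomes usable. A minimal sketch of that arithmetic follows; the function names are made up, and the figures ignore the free-space headroom Ceph needs for rebalancing and recovery.

```python
def usable_replicated(raw_tb: float, replicas: int = 3) -> float:
    """Usable capacity with N-way replication."""
    return raw_tb / replicas


def usable_erasure_coded(raw_tb: float, k: int = 8, m: int = 3) -> float:
    """Usable capacity with k data chunks + m coding chunks."""
    return raw_tb * k / (k + m)


if __name__ == "__main__":
    raw = 330.0  # hypothetical raw capacity in TB
    print(f"3x replication: {usable_replicated(raw):.0f} TB usable")
    print(f"EC 8+3:         {usable_erasure_coded(raw):.0f} TB usable")
```

Erasure coding roughly doubles the usable share here, at the cost of more CPU work and slower recovery, and an 8+3 profile needs at least 11 failure domains.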
### Hypervisor variant comparison
| Aspect | Local (small) | vSAN/Ceph (medium) | FC SAN (large) | Ceph hyperscale |
|--------|---------------|---------------------|----------------|-----------------|
| **Storage** | Local RAID | vSAN / Ceph (HCI) | FC SAN (centralized) | Ceph (distributed) |
| **Number of hosts** | 2-3 | 3-6 | 6-50+ | 20+ |
| **VM latency** | ~10 µs (local) | ~100-500 µs | ~200 µs (SAN) | ~500-2000 µs |
| **CAPEX/host** | Low | Medium | High | Medium |
| **CAPEX storage** | Low | None (part of hosts) | High (SAN array) | None (part of hosts) |
| **Management** | Simple (per host) | vCenter / Proxmox | vCenter + SAN mgmt | OpenStack / K8s |
| **vMotion** | Limited (shared-nothing migration only) | Yes (vSAN / Ceph RBD) | Yes (FC LUN) | Yes (Ceph RBD) |
| **DRS** | No | Yes (vSphere) | Yes (vSphere) | OpenStack scheduler |
| **Scaling** | Vertical | Horizontal (add host) | Horizontal (host + SAN) | Horizontal |
### Network design by variant
#### Small (local storage)
| Traffic | VLAN | Speed | Teaming | Note |
|---------|------|----------|---------|----------|
| Management | Mgmt | 1 GbE | Active/Passive | Dedicated port (iLO/iDRAC) |
| VM + Storage | All | 2-4× 10/25 GbE | LACP | Shared, VLAN tagging |
```
┌────────────────────────────────────────────┐
│ Host                                       │
│ ┌───────┐ ┌──────────────────────────────┐ │
│ │ iLO   │ │ NIC1        NIC2             │ │
│ │ 1 GbE │ │ [LACP] 25 GbE                │ │
│ └───────┘ └──────────────┬───────────────┘ │
└──────────────────────────┼─────────────────┘
                     ┌─────┴─────┐
                     │  Switch   │
                     └───────────┘
```
#### Medium (vSAN / Ceph)
| Traffic | VLAN | Speed | Teaming | Note |
|---------|------|----------|---------|----------|
| Management | Mgmt | 1 GbE | Active/Passive | Dedicated iLO/iDRAC |
| VM | VM | 2× 25/100 GbE | LACP | VM traffic, migration |
| Storage | vSAN/Ceph | 2× 25/100 GbE | LACP or RDMA | Separate, Jumbo frames (MTU 9000) |
```
┌─────────────────────────────────────────────┐
│ Host                                        │
│ ┌───────┐ ┌────────────┐ ┌────────────────┐ │
│ │ iLO   │ │ NIC1  NIC2 │ │ NIC3  NIC4     │ │
│ │ 1 GbE │ │ VM traffic │ │ Storage (vSAN) │ │
│ └───────┘ └────────────┘ └────────────────┘ │
└─────────────────────────────────────────────┘
```
#### Large (FC SAN)
| Traffic | VLAN | Speed | Teaming | Note |
|---------|------|----------|---------|----------|
| Management | Mgmt | 1 GbE | Active/Passive | Dedicated |
| VM | VM | 2-4× 25/100 GbE | LACP | VM traffic |
| vMotion | vMotion | 2× 25 GbE | Dedicated | Multi-NIC vMotion |
| FT | FT | 2× 10/25 GbE | Dedicated | Low latency |
| Storage | — | 2× 32/64 Gb FC | Multipath | FC SAN |
```
┌───────────────────────────────────────────────────┐
│ Host                                              │
│ ┌───────┐ ┌───────────────┐ ┌────────┐ ┌────────┐ │
│ │ iLO   │ │ NIC1-4        │ │ HBA1   │ │ HBA2   │ │
│ │ 1 GbE │ │ VM+vMotion+FT │ │ 32 Gb  │ │ 32 Gb  │ │
│ └───────┘ └───────┬───────┘ └───┬────┘ └───┬────┘ │
└───────────────────┼─────────────┼──────────┼──────┘
                    │             │          │
              ┌─────┴─────┐     ┌─┴──────────┴─┐
              │ Ethernet  │     │ FC Switch    │
              │ Switch    │     │ (Brocade/    │
              │           │     │  Cisco)      │
              └───────────┘     └──────────────┘
```
### BIOS for hypervisor — all variants
| Setting | Value | Rationale |
|-----------|---------|------------|
| Hyper-Threading | Enabled | Higher VM density |
| Virtualization Technology | Enabled | VT-x/AMD-V |
| VT-d / IOMMU | Enabled | Passthrough, SR-IOV |
| Power Management | Performance / OS | Minimize VM exit latency |
| C-States | Disabled | Lower VM exit latency (important for real-time VMs) |
| NUMA | Enabled | NUMA-aware VM placement |
| SR-IOV | Enabled | NIC/GPU virtualization |
| Adjacent Sector Prefetch | Enabled (Intel) | Better sequential reads |
| DCU Streamer / IP Prefetcher | Enabled | HW prefetch for VM workload |
| Patrol Scrub | Disabled (vSAN/Ceph) | Can cause latency spikes with SDS |
### Hypervisor selection by variant
| Criterion | VMware vSphere | Proxmox VE | Hyper-V | OpenStack |
|-----------|---------------|------------|---------|-----------|
| **Size** | SME - Enterprise | SME | SME - Enterprise | Hyperscale |
| **Storage** | vSAN, SAN, NFS | Ceph, ZFS, NFS | Storage Spaces, SAN | Ceph, manila |
| **License** | ~$1-5k/core | Free (support ~$500/host) | Part of Windows Server | Open source |
| **Familiarity** | Highest | Medium | Windows admin | Low |
| **Automation** | Terraform, Ansible, PowerCLI | Ansible, Terraform, PBS | PowerShell, SCVMM | Terraform, Heat, Ansible |
| **Ecosystem** | Broadest (Veeam, Zerto, SRM) | Growing (PBS, remote migration) | Windows ecosystem | Open source (Kolla, TripleO) |
---
## 3. Kubernetes node
### Node profiles
| Role | CPU | RAM | Storage | Network | Use case |
|------|-----|-----|---------|---------|----------|
| **General purpose** | 16-32 cores | 64-128 GB | 1× NVMe OS + 1× NVMe local | 2× 25 GbE | Web, API, microservices |
| **Memory optimized** | 32-64 cores | 256-512 GB | 1× NVMe OS + 2× NVMe local | 2× 25 GbE | In-memory cache, DB |
| **Compute optimized** | 64-128 cores | 128-256 GB | 1× NVMe OS | 2× 25/100 GbE | Batch, CI/CD |
| **GPU node** | 32-64 cores | 512-1024 GB | 1× NVMe OS + 4-8× NVMe local | 4× 100 GbE | AI/ML training, inference |
| **Storage node** | 16-32 cores | 64-128 GB | 4-12× NVMe/SATA (Ceph/Longhorn) | 4× 25 GbE | SDS, persistent volumes |
### Kernel tuning
```
# /etc/sysctl.d/99-kubernetes.conf
# The net.bridge.* keys exist only after the br_netfilter kernel module is loaded
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1
net.ipv4.conf.all.forwarding = 1
# Connection tracking (for NodePort, Service)
net.netfilter.nf_conntrack_max = 2097152
net.netfilter.nf_conntrack_tcp_timeout_established = 86400
# File watchers (for kubelet, containerd)
fs.inotify.max_user_instances = 8192
fs.inotify.max_user_watches = 524288
# Memory management
vm.swappiness = 0
# Allow overcommit (CRI-O, containerd)
vm.overcommit_memory = 1
vm.panic_on_oom = 0
kernel.panic = 10
kernel.panic_on_oops = 1
```
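After applying the settings above, it helps to confirm that the running kernel actually holds the expected values. The following sketch parses a sysctl-style file and compares it against `/proc/sys`. The helper names are hypothetical, and the simple dot-to-slash key mapping is an assumption that does not hold for keys containing interface names with dots.

```python
from pathlib import Path


def parse_sysctl(text: str) -> dict:
    """Parse 'key = value' lines, ignoring blank lines and # comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop full-line and trailing comments
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def sysctl_path(key: str) -> Path:
    """Map a sysctl key to its file under /proc/sys."""
    return Path("/proc/sys") / key.replace(".", "/")


def find_mismatches(expected: dict) -> dict:
    """Return {key: (wanted, actual)} for keys that differ or are missing."""
    mismatches = {}
    for key, want in expected.items():
        path = sysctl_path(key)
        have = path.read_text().split() if path.exists() else None
        if have != want.split():
            mismatches[key] = (want, None if have is None else " ".join(have))
    return mismatches


if __name__ == "__main__":
    conf = Path("/etc/sysctl.d/99-kubernetes.conf")
    if conf.exists():
        for key, (want, have) in find_mismatches(parse_sysctl(conf.read_text())).items():
            print(f"{key}: expected {want}, found {have}")
```

A missing `net.bridge.*` key in the output usually means the `br_netfilter` module is not loaded.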
### Container storage
| Type | Recommendation | Note |
|-----|-----------|----------|
| **OS disk** | RAID 1 (2× NVMe) | Ext4/XFS, 100-200 GB |
| **Container runtime image** | RAID 1 (2× NVMe) | /var/lib/containerd, 200-500 GB |
| **Local PV** | Single NVMe | Raw device, no RAID |
| **Rook/Ceph OSD** | Raw NVMe/SATA | HBA/IT mode, no RAID |
| **Longhorn** | Raw NVMe/SATA | Ext4/XFS per volume |
---
## 4. Storage server (Ceph / MinIO / NAS)
### Ceph OSD node
| Component | Recommendation | Note |
|-----------|-----------|----------|
| **CPU** | 1-2 cores per OSD | Up to 12 OSD per node (24 cores) |
| **RAM** | 4-8 GB per OSD + OS | BlueStore cache, 16-64 GB min |
| **Network** | 2× 25/100 GbE | Public + Cluster network |
| **Storage** | 10-12× NVMe/SATA SSD OSD | HBA/IT mode, no RAID |
| **OS disk** | 2× SATA SSD RAID 1 | OS, Ceph MON/MGR |
**BIOS for Ceph:**
- SATA/NVMe: AHCI/NVMe mode (not RAID)
- C-States: Disabled (lower OSD latency)
- NUMA: Enabled
- Power: Performance
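The per-OSD rules of thumb from the table can be combined into a quick node estimate. This is a sketch under stated assumptions: the function name is made up and the 16 GB set aside for the OS and Ceph daemons is an assumed default, not a value from the table.

```python
def ceph_osd_node_sizing(osd_count: int, os_ram_gb: int = 16) -> dict:
    """Return (min, max) CPU cores and RAM in GB for one OSD node.

    Rules used: 1-2 cores per OSD, 4-8 GB RAM per OSD plus OS overhead.
    """
    return {
        "cpu_cores": (osd_count * 1, osd_count * 2),
        "ram_gb": (osd_count * 4 + os_ram_gb, osd_count * 8 + os_ram_gb),
    }


if __name__ == "__main__":
    for osds in (4, 8, 12):
        print(osds, ceph_osd_node_sizing(osds))
```

For 12 OSDs this gives 12-24 cores, which matches the "24 cores" upper figure in the table; NVMe OSDs under heavy load may need more CPU than this rule suggests.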
### MinIO node
| Component | Recommendation |
|-----------|-----------|
| **CPU** | 8-16 cores (32+ for erasure coding) |
| **RAM** | 32-64 GB + 1 GB per 1 TB storage |
| **Storage** | 4-16× NVMe (direct, no RAID) |
| **Network** | 2× 25/100 GbE |
| **OS** | Ubuntu / RHEL, XFS (for data) |
### NAS (TrueNAS, formerly FreeNAS)
- **ZFS**: RAID-Z1/Z2/Z3, compression (lz4, zstd), dedup
- **ARC cache**: 1 GB per 1 TB storage (max 64 GB)
- **L2ARC**: NVMe cache (optional, read-heavy)
- **SLOG**: NVDIMM / Optane (sync write, ZIL)
- **Network**: 2-4× 10/25 GbE LACP
---
## 5. Web / API servers
| Parameter | Recommendation |
|----------|-----------|
| **CPU** | High clock, 8-32 cores |
| **RAM** | 32-128 GB |
| **Storage** | 2× NVMe RAID 1 (OS + app) |
| **OS** | Ubuntu / RHEL, optimized kernel |
| **Network** | 2× 10/25 GbE (bonding) |
**Kernel tuning:**
```
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_fin_timeout = 15
net.core.somaxconn = 65535
net.ipv4.tcp_max_syn_backlog = 65535
net.core.netdev_max_backlog = 65535
```
---
## Quick decision tree — server selection by workload, size and storage
```mermaid
flowchart TD
W["What workload?"] --> DB["Database"]
W --> HV["Virtualization"]
W --> K8s["Kubernetes"]
W --> AI["AI/ML"]
W --> ST["Storage server"]
W --> WEB["Web / API"]
DB --> DBS{"Company size"}
DBS -->|"< 500"| DB1["1× EPYC 8-16C, 64-256 GB<br/>NVMe RAID10, 2× 25GbE"]
DBS -->|"500-5000"| DB2{"Storage"}
DB2 -->|"Local"| DB2L["1-2× EPYC 16-24C, 128-512 GB<br/>NVMe RAID10, 4× 25GbE"]
DB2 -->|"Ceph"| DB2C["2× EPYC 16-32C, 256-512 GB<br/>RBD, 4× 25/100GbE"]
DBS -->|"Enterprise"| DB3{"Storage"}
DB3 -->|"FC SAN"| DB3F["2× EPYC 48-128C, 512-2048 GB<br/>SAN LUN + 2× FC 32/64G"]
DB3 -->|"Ceph"| DB3C["2× EPYC 32-64C, 256-512 GB<br/>RBD, 4× 100GbE"]
DBS -->|"Cloud"| DBC["RDS/Azure SQL/CloudSQL<br/>Managed, Multi-AZ"]
DB --> ORACLE{"Oracle architecture?"}
ORACLE -->|"Standalone"| ORA1["1-2× EPYC 8-24C<br/>64-512 GB, ASM local/FC<br/>2× 25GbE + FC 32G"]
ORACLE -->|"Data Guard"| ORA2["2× EPYC 32-64C<br/>256-1024 GB, FC SAN<br/>2× 25/100GbE + 2× FC 64G<br/>2× 25GbE (DG sync)"]
ORACLE -->|"RAC 2-4 nodes"| ORA3["Per node: 2× EPYC 32-64C<br/>512-2048 GB, FC SAN<br/>2× 100GbE (app)<br/>2× FC 64G (storage)<br/>2× 100GbE RoCE (interconnect)"]
ORACLE -->|"Exadata"| ORA4["Engineered system<br/>2-8 DB servers + 3-18 storage<br/>RoCE 100GbE, Smart Scan<br/>15-30 kW/rack"]
HV --> HVS{"Number of hosts"}
HVS -->|"2-3"| HV1["1× EPYC 12-24C, 128-256 GB<br/>RAID5/6 SSD, 2-4× 10/25GbE"]
HVS -->|"3-6"| HV2{"HCI"}
HV2 -->|"vSAN"| HV2V["1-2× EPYC 16-32C, 256-512 GB<br/>NVMe cache + SSD, 4× 25GbE"]
HV2 -->|"Ceph"| HV2C["1× EPYC 12-24C, 128-256 GB<br/>4-8× HBA NVMe/SSD, 4× 25GbE"]
HVS -->|"6+"| HV3["2× EPYC 32-64C, 512-2048 GB<br/>FC SAN 32/64G, 4-8× 25/100GbE"]
HVS -->|"20+"| HV4["2× EPYC 64-128C, 512-1024 GB<br/>OpenStack + Ceph, 4-8× 100GbE"]
K8s --> K8T{"Node type"}
K8T -->|"General"| K8G["16-32C, 64-128 GB<br/>2× NVMe, 2× 25GbE"]
K8T -->|"Memory"| K8M["32-64C, 256-512 GB<br/>3× NVMe, 2× 25GbE"]
K8T -->|"GPU"| K8U["32-64C, 512-1024 GB<br/>6-10× NVMe, H100/B200, 4× 100GbE"]
K8T -->|"Storage"| K8S["16-32C, 64-128 GB<br/>6-14× HBA NVMe, 4× 25GbE"]
AI --> AIT{"Purpose"}
AIT -->|"Training"| AITR["GPU H100/B200, NVLink<br/>InfiniBand 400Gb/s, liquid cooling"]
AIT -->|"Inference"| AIIR["A100/H200, MIG<br/>PCIe 5.0, 2× 100GbE"]
ST --> STT{"Type"}
STT -->|"Ceph OSD"| STC["EPYC (PCIe lanes)<br/>4-8 GB/OSD, HBA, 2× 25/100GbE"]
STT -->|"MinIO"| STM["EPYC 8-16C, 32-64 GB<br/>4-16× NVMe direct, 2× 25/100GbE"]
STT -->|"NAS (ZFS)"| STN["EPYC 16-32C, 64-128 GB<br/>RAID-Z, SLOG NVMe, 2-4× 10/25GbE"]
WEB --> WEBE["EPYC high clock, 8-32C<br/>32-128 GB, 2× NVMe RAID1, 2× 10/25GbE"]
```
### Connectivity summary by platform
| Platform | App / VM network | Storage network | Replication / Cluster | Management |
|-----------|-------------|-------------|---------------------|------------|
| **DB local (small)** | 2× 25 GbE LACP | — | 2× 25 GbE (shared) | 1× 1 GbE (iLO) |
| **DB local (medium)** | 2× 25/100 GbE LACP | — | 2× 25 GbE dedicated | 1× 1 GbE (iLO) |
| **DB FC SAN** | 2× 25/100 GbE LACP | 2× 32/64 Gb FC multipath | FC replication | 1× 1 GbE (iLO) + SAN mgmt |
| **DB Ceph** | 2× 25/100 GbE | 2× 25/100 GbE (Ceph public) | 2× 25/100 GbE (Ceph cluster) | 1× 1 GbE (iLO) |
| **Hypervisor local** | 2-4× 10/25 GbE LACP | — (local) | — | 1× 1 GbE (iLO) |
| **Hypervisor vSAN** | 2× 25/100 GbE LACP | 2× 25/100 GbE (vSAN) | vSAN traffic | 1× 1 GbE (iLO) |
| **Hypervisor FC SAN** | 2-4× 25/100 GbE LACP | 2× 32/64 Gb FC multipath | 2× 25 GbE (vMotion) | 1× 1 GbE (iLO) |
| **Hypervisor Ceph** | 2× 25/100 GbE LACP | 2× 25/100 GbE (Ceph) | 2× 25 GbE (migration) | 1× 1 GbE (iLO) |
| **Kubernetes** | 2× 25/100 GbE | 2× 25/100 GbE (Ceph/Longhorn) | 2× 25/100 GbE (K8s cluster) | 1× 1 GbE (BMC) |
| **Web/API** | 2× 10/25 GbE LACP | — | — | 1× 1 GbE (BMC) |
| **Oracle Standalone** | 2× 25 GbE LACP | 2× FC 32G or NVMe local | Data Guard 2× 25 GbE | 1× 1 GbE (iLO) + ASM mgmt |
| **Oracle Data Guard** | 2× 25/100 GbE LACP | 2× FC 64G multipath | 2× 25 GbE (DG sync) | 1× 1 GbE (iLO) + SAN mgmt |
| **Oracle RAC** | 2× 100 GbE LACP (VIP/SCAN) | 2× FC 64G multipath | 2× 100 GbE RoCE (Cache Fusion) | 1× 1 GbE (iLO) + Clusterware |
| **Oracle Exadata** | 4-8× 100 GbE RoCE | NVMe over Fabric | RDMA interconnect | Exadata CLI + OEDA |
## Sources
Links, books and standards: [sources/infrastructure/sources.en.md](sources/infrastructure/sources.en.md)
*Last revision: 2026-06-03*

SERVER-CONFIG.md Normal file

@@ -0,0 +1,757 @@
# ⚙️ Server configuration: best practices by workload
## General BIOS/UEFI settings
| Setting | Recommendation | Rationale |
|-----------|-----------|------------|
| **Boot mode** | UEFI | Secure Boot, GPT, larger disks |
| **Power profile** | Performance / OS Control | Maximum performance, C-States disabled |
| **Hyper-Threading** | Enabled | +30-50 % throughput for multi-threaded workloads |
| **Virtualization** | Enabled (VT-x/AMD-V) | Required for hypervisors, containers |
| **SR-IOV** | Enabled | GPU, NIC passthrough |
| **NUMA** | Enabled | NUMA-aware scheduling |
| **ACPI** | Enabled | OS-level power management |
| **Secure Boot** | Enabled | Verified boot chain |
| **TPM** | Enabled | Measured boot, key storage |
---
## 1. Database servers
### CPU selection
| DB type | CPU preference | Rationale |
|--------|---------------|------------|
| **OLTP** (PostgreSQL, MySQL) | High clock, moderate cores | Low per-transaction latency, limited parallelism |
| **OLAP** (ClickHouse, Snowflake) | Many cores, AVX-512 | Columnstore, high parallelism |
| **In-memory** (Redis, Memcached) | High clock, low cache latency | Single-threaded (Redis), RAM bandwidth |
| **Document** (MongoDB) | Balance (clock × cores) | Mixed workload |
| **Distributed** (Cassandra, Scylla) | Many cores, high cache | Shard-per-core (Scylla), compaction |
| **Oracle OLTP** | High clock, moderate cores, core-factor aware | CPU license cost (core factor 0.5 for both AMD EPYC and Intel Xeon) |
| **Oracle OLAP / DW** | Many cores, large SGA, in-memory option | Parallel query, Exadata Smart Scan, compression |
### Oracle CPU licensing: core factor
Oracle licenses per core, adjusted by a processor-specific core factor. A factor of 0.5 means that 2 cores require 1 Oracle license.
| Processor | Core factor | 64 physical cores → Oracle licenses |
|----------|-------------|--------------------------------------|
| AMD EPYC (all series) | 0.5 | 32 |
| Intel Xeon (Scalable) | 0.5 | 32 |
| IBM POWER | 1.0 | 64 |
| ARM (Ampere Altra) | 0.25 | 16 |
**Impact on CPU selection**: EPYC and Xeon share the 0.5 factor, so the license bill depends only on the number of physical cores. For a fixed license budget, choose the CPU that delivers the most compute per licensed core.
### Configuration by company size and storage type
#### Variant A: Small company, local NVMe RAID
| Component | Recommendation | Note |
|-----------|-----------|----------|
| **CPU** | 1× EPYC 9124/9224 or Intel Xeon 4410Y (8-16C) | 1 socket, high clock |
| **RAM** | 64-256 GB (8-16 GB/core) | DDR5-4800, 1DPC |
| **OS disk** | 2× SATA/SAS SSD, RAID 1 (240-480 GB) | OS + binaries |
| **Data disk** | 4-6× NVMe (U.2/E3.S), RAID 10 | Local data, no sharing |
| **WAL disk** | 2× NVMe RAID 1 (400-800 GB) | PostgreSQL only |
| **Network** | 2× 25 GbE (LACP) | Application traffic + management |
| **Form factor** | 1U or 2U | Single node, no cluster |
| **Storage backend** | Local RAID controller (PERC/Broadcom) | HW RAID 10 or SW RAID (mdadm) |
| **HA** | Application-driven failover (patroni, repmgr, orchestrator) | Standby node takes over on failure |
**Use case**: Startup, branch office, dev/test, < 500 users, a single database server, low availability requirements.
#### Variant B: Mid-size company, local NVMe + asynchronous replication
| Component | Recommendation | Note |
|-----------|-----------|----------|
| **CPU** | 1-2× EPYC 9334/9374F or Intel Xeon 5418Y (16-24C) | 1-2 sockets, balanced |
| **RAM** | 128-512 GB (8-16 GB/core) | DDR5-4800/5600, 1DPC |
| **OS disk** | 2× NVMe RAID 1 (2× 480 GB) | OS + binaries |
| **Data disk** | 6-8× NVMe, RAID 10 | Local NVMe, 3-6 TB usable |
| **WAL disk** | 2× NVMe RAID 1 (2× 800 GB) | Separate from data |
| **Network** | 2× 25 GbE (app) + 2× 25 GbE (replication) | Application and replication networks separated |
| **Form factor** | 2U | Primary + replica node |
| **Storage backend** | SW RAID (mdadm) or HW RAID (PERC H965) | Write-back cache with BBU |
| **HA** | Patroni / repmgr / MySQL InnoDB Cluster | Asynchronous replication to 1-2 standbys |
**Use case**: E-commerce, mid-size SaaS, 500-5000 users, RPO < 1 min, RTO < 5 min.
#### Variant C: Large company, FC SAN (enterprise)
| Component | Recommendation | Note |
|-----------|-----------|----------|
| **CPU** | 2× EPYC 9654/9965 or Xeon 8592+/6980P (48-128C) | 2 sockets, max cores, large cache |
| **RAM** | 512 GB - 2 TB (8-16 GB/core) | DDR5, 2DPC (lower memory speed), 12 channels (EPYC) |
| **OS disk** | 2× SATA SSD RAID 1 (2× 480 GB) | OS only, data on SAN |
| **Data + WAL** | LUNs from FC SAN | Hitachi VSP / Dell PowerMax / Pure //X |
| **HBA** | 2× dual-port FC HBA (32/64 Gb) | Multipath (active-active), FC-NVMe |
| **Network** | 2× 25/100 GbE (app) + 2× 32/64 Gb FC (storage) | App and storage networks separated |
| **Form factor** | 2U | 2-8 node cluster (RAC, AlwaysOn AG) |
| **Storage backend** | FC SAN, one LUN per database | Thin provisioning, RAID on the SAN, snapshots |
| **HA** | Oracle RAC / SQL Server AOAG / PostgreSQL Patroni | Synchronous replication, FC multipath |
**SAN advantages**: Central management, snapshots, cloning, disaster recovery (SRDF/Metro), separate storage network, higher availability.
**Disadvantages**: Higher latency than local NVMe (~50-200 µs over SAN vs ~10 µs for local NVMe), higher CAPEX, vendor lock-in.
#### Variant D: Large company, Ceph / SDS backend
| Component | Recommendation | Note |
|-----------|-----------|----------|
| **CPU** | 2× EPYC 9334/9654 (16-32C) | Fewer cores than the SAN variant; part of the CPU goes to the Ceph client |
| **RAM** | 256-512 GB | Less RAM; the Ceph client cache is not as effective as a local buffer |
| **OS disk** | 2× SATA SSD RAID 1 (2× 480 GB) | OS |
| **Network** | 2× 25/100 GbE (app) + 2× 25/100 GbE (Ceph public) | App and Ceph traffic both over Ethernet |
| **HBA** | Storage HBA in IT/HBA mode (no RAID) | For the Ceph OSD nodes, not the DB nodes |
| **Form factor** | 2U | DB nodes + separate Ceph OSD nodes |
| **Storage backend** | RBD (RADOS Block Device) on Ceph | 3× replication or erasure coding |
| **HA** | Application + Ceph's built-in HA | Ceph self-healing, auto-rebalance |
**Ceph advantages**: No vendor lock-in, horizontal scaling, a single platform for block/file/object, lower CAPEX.
**Disadvantages**: Higher latency and CPU overhead (Ceph client → network → OSD), variable performance, more complex troubleshooting.
#### Variant E: Cloud (RDS / CloudSQL / Azure SQL)
| Component | Recommendation | Note |
|-----------|-----------|----------|
| **Compute** | AWS RDS (db.r7g/r8g), Azure SQL (GP/BC/Hyperscale) | Managed service, no OS access |
| **Storage** | EBS gp3 / io2, Azure Premium SSD v2, Cloud SQL SSD | Automatic scaling, PITR, multi-AZ |
| **Network** | Security Group, Private Link, VPC peering | No HBA, no SAN; everything over Ethernet |
| **HA** | Multi-AZ (synchronous), read replicas | Managed failover, RTO < 60 s |
| **Backup** | Automated, PITR (7-35 days) | No backup management required |
**Use case**: No on-prem hardware, elastic scaling, pay-per-use, lower operational overhead.
**Disadvantages**: Higher long-term cost, data residency, network latency, limited customization.
### Variant comparison
| Aspect | Local NVMe (small) | Local NVMe (mid-size) | FC SAN | Ceph | Cloud |
|--------|---------------------|----------------------|--------|------|-------|
| **Latency** | ~10 µs | ~10 µs | ~50-200 µs | ~100-500 µs | ~100-1000 µs |
| **Scaling** | Vertical | Vertical | Horizontal | Horizontal | Elastic |
| **CAPEX** | Low | Medium | High | Medium | None (OPEX) |
| **Operational overhead** | Low | Low | High (SAN admin) | Medium | None |
| **HA** | Application | Patroni/Cluster | RAC/AOAG | Ceph HA | Managed |
| **RPO** | 1-5 min | < 1 min | < 10 s | < 30 s | < 60 s |
| **RTO** | 5-15 min | < 5 min | < 2 min | < 5 min | < 60 s |
| **Number of servers** | 1-2 | 2-4 | 4-16 | 6-20+ | 0 (managed) |
| **Company** | Startup/SME | SME/Enterprise | Enterprise | Enterprise | Any |
### PostgreSQL parameter matrix by storage type
| Parameter | Local NVMe | FC SAN | Ceph RBD |
|----------|-----------|--------|----------|
| `random_page_cost` | 1.1 | 1.5-2.0 | 2.0-3.0 |
| `effective_io_concurrency` | 300 | 100-200 | 50-100 |
| `synchronous_commit` | off (only where losing the last few commits after a crash is acceptable) | on | off (same caveat) |
| `full_page_writes` | on | on | on (also on Ceph) |
### Storage layout by backend type
**Local NVMe (small/mid-size):**
```
Mount point  FS    RAID        Disk          Purpose
/            ext4  1 (mirror)  2× SATA SSD   OS
/data        xfs   10          4-8× NVMe     Data
/wal         xfs   1 (mirror)  2× NVMe       WAL (PG)
```
**FC SAN (enterprise):**
```
Device / mount  FS    Backing storage        Purpose
/               ext4  local RAID 1 (2× SSD)  OS
/dev/sdb        xfs   FC LUN 1 (500 GB)      WAL (PG)
/dev/sdc        xfs   FC LUN 2 (2 TB)        Data
/dev/sdd        xfs   FC LUN 3 (2 TB)        Indexes (separate)
```
**Ceph RBD:**
```
Device / mount  FS    Ceph device            Purpose
/               ext4  local RAID 1 (2× SSD)  OS
/dev/rbd0       xfs   rbd datastore-01       Data + WAL (Ceph RBD)
```
### Kernel tuning by variant
**Local NVMe:**
```
vm.dirty_ratio = 30
vm.dirty_background_ratio = 5
```
**FC SAN:**
```
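# SAN storage: higher latency, so start writeback earlier and in smaller batches
vm.dirty_ratio = 20
vm.dirty_background_ratio = 3
# 30 s before dirty pages must be written back (kernel default); the SAN cache absorbs the writes
vm.dirty_expire_centisecs = 3000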
```
**Ceph RBD:**
```
# Ceph RBD: network storage, tuned for the RBD cache
vm.dirty_ratio = 15
vm.dirty_background_ratio = 2
# RBD cache settings
# rbd cache = true (client-side)
# rbd cache size = 256-512 MB
```
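The `vm.dirty_*` ratios above are percentages of memory, so their absolute effect depends on how much RAM the host has. The following Python sketch converts them to GiB for a hypothetical 512 GiB host. The function name is illustrative, and it simplifies by using total RAM, whereas the kernel applies the ratios to available (free plus reclaimable) memory.

```python
def dirty_thresholds_gib(ram_gib: float, dirty_ratio: int,
                         dirty_background_ratio: int) -> tuple[float, float]:
    """Return (background_writeback_start, writer_throttle_limit) in GiB.

    Simplification: uses total RAM; the kernel uses available memory.
    """
    background = ram_gib * dirty_background_ratio / 100
    limit = ram_gib * dirty_ratio / 100
    return background, limit


# Ratios taken from the three profiles above; 512 GiB RAM is an assumed example.
PROFILES = {"local NVMe": (30, 5), "FC SAN": (20, 3), "Ceph RBD": (15, 2)}

for name, (ratio, bg_ratio) in PROFILES.items():
    bg, limit = dirty_thresholds_gib(512, ratio, bg_ratio)
    print(f"{name}: background flush at {bg:.1f} GiB, writers throttled at {limit:.1f} GiB")
```

On large-memory hosts these thresholds reach tens of GiB, which is why the absolute `vm.dirty_bytes` and `vm.dirty_background_bytes` settings are sometimes preferred over ratios.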
### Database-specific tuning
| Parameter | PostgreSQL | MySQL | Oracle | MongoDB |
|----------|-----------|-------|--------|---------|
| **Cache** | `shared_buffers` 25 % RAM | `innodb_buffer_pool_size` 70-80 % RAM | `SGA_TARGET` 60-80 % RAM | `WiredTiger cache` 50-80 % RAM |
| **OS cache** | `effective_cache_size` 75 % RAM | OS cache + InnoDB | OS cache (double-buffering risk with a large SGA) | OS cache |
| **Write buffer** | `wal_buffers` 64-256 MB | `innodb_log_file_size` 1-4 GB | Redo log (2-4 groups, 200 MB-4 GB) | WiredTiger log |
| **Connections** | `max_connections` 50-500 | `max_connections` 100-500 | `processes` 200-2000 | `maxIncomingConnections` |
| **I/O** | `effective_io_concurrency` 200 | `innodb_io_capacity` 2000 | `db_file_multiblock_read_count` 128 | WiredTiger eviction |
| **Huge pages** | `huge_pages = try` | `large-pages = ON` | `use_large_pages = only` (mandatory) | `transparent_hugepage=never` |
| **Parallel query** | `max_parallel_workers` 4-8 | `innodb_parallel_read_threads` 4 | `parallel_degree_policy = auto` (up to 64) | — |
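As a rough illustration of the PostgreSQL column, the percentages for `shared_buffers` and `effective_cache_size` can be turned into concrete values for a given host. This is a minimal Python sketch; `postgres_memory_settings` is a hypothetical helper, and the 25 % / 75 % split is simply the rule of thumb from the table, not a tuned result.

```python
def postgres_memory_settings(ram_gb: int) -> dict[str, str]:
    """Apply the table's rule of thumb: 25 % of RAM for shared_buffers,
    75 % of RAM for effective_cache_size (a planner hint, not an allocation)."""
    return {
        "shared_buffers": f"{ram_gb * 25 // 100}GB",
        "effective_cache_size": f"{ram_gb * 75 // 100}GB",
        "huge_pages": "try",
    }


# Example: a 256 GB database host.
for key, value in postgres_memory_settings(256).items():
    print(f"{key} = {value}")
```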
### Connectivity per variant
| Variant | App network | Storage network | Replication | Management |
|----------|---------|-------------|-----------|------------|
| **Local (small)** | 2× 25 GbE LACP | — | 2× 25 GbE (shared) | iDRAC/iLO |
| **Local (mid-size)** | 2× 25 GbE LACP | — | 2× 25 GbE dedicated | iDRAC/iLO |
| **FC SAN** | 2× 25/100 GbE | 2× 32/64 Gb FC (multipath) | FC replication | iDRAC/iLO + SAN mgmt |
| **Ceph** | 2× 25/100 GbE | 2× 25/100 GbE (public net) | 2× 25/100 GbE (cluster net) | iDRAC/iLO + Ceph mgmt |
| **Cloud** | Elastic IP / Private Link | — | — | AWS Console / API |
| **Oracle Standalone** | 2× 25 GbE LACP | ASM (2× 25 GbE or FC 32G) | Data Guard 2× 25 GbE | iLO + ASM mgmt |
| **Oracle RAC** | 2-4× 25/100 GbE | 2× 64 Gb FC (multipath) | Cache Fusion interconnect | iLO + SAN mgmt |
| **Oracle Exadata** | 4-8× 100 GbE RoCE | NVMe over Fabric | RDMA interconnect | Exadata CLI + OEDA |
### Oracle-specific configuration
#### Oracle ASM: diskgroup layout
Oracle ASM (Automatic Storage Management) replaces the traditional filesystem + volume manager stack:
| Diskgroup | Redundancy | Disks | Purpose |
|-----------|-----------|-------|-------|
| **DATA** | Normal (2× mirror) | 4-12× FC LUN/NVMe | Data files, temp files, control files |
| **FRA** (Fast Recovery Area) | Normal (2× mirror) | 2-6× FC LUN/NVMe | Archive logs, backup, flashback logs |
| **REDO** | High (3× mirror) | 3-4× FC LUN/NVMe | Online redo log groups (I/O critical) |
| **SPFILE** | Normal | 2× small LUN | Server parameter file |
**ASM striping**: Coarse (1 MB) for regular data, fine (128 KB) for redo logs (lower write latency).
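ASM redundancy directly determines how much of the raw capacity is usable. A minimal Python sketch, assuming equally sized disks and ignoring the free space ASM needs to re-mirror after a disk failure (the helper name is hypothetical):

```python
# Number of copies ASM keeps of each extent per redundancy level.
MIRROR_COPIES = {"external": 1, "normal": 2, "high": 3}


def asm_usable_tb(disk_count: int, disk_tb: float, redundancy: str) -> float:
    """Usable capacity of a diskgroup built from equally sized disks."""
    if redundancy not in MIRROR_COPIES:
        raise ValueError(f"unknown redundancy: {redundancy}")
    return disk_count * disk_tb / MIRROR_COPIES[redundancy]


print(asm_usable_tb(8, 2.0, "normal"))  # DATA: 8× 2 TB LUNs
print(asm_usable_tb(3, 0.5, "high"))    # REDO: 3× 500 GB LUNs
```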
#### Variant O1: Standalone Oracle (small/mid-size, single instance)
| Parameter | Small (< 500 users) | Medium (500-2000 users) |
|----------|---------------------|------------------------|
| **CPU** | 1-2× EPYC 9124-9224 / Xeon 4410Y (8-16C) | 2× EPYC 9334-9374F / Xeon 5418Y (16-24C) |
| **RAM (SGA + PGA)** | 64-128 GB (SGA 70 %, PGA 30 %) | 128-512 GB (SGA 60-80 %, PGA 20-40 %) |
| **Huge pages** | Yes (`vm.nr_hugepages`), mandatory for the SGA | Yes |
| **OS disk** | 2× SATA SSD RAID 1 (240 GB) | 2× NVMe RAID 1 (480 GB) |
| **DATA + FRA** | 4-6× NVMe, ASM normal redundancy | 6-8× NVMe or FC LUN, ASM normal |
| **REDO** | 3-4× NVMe (separate from DATA), ASM high | 4× FC LUN (separate), ASM high |
| **Archive log** | Local FRA | FC LUN (FRA diskgroup) |
| **Network (app)** | 2× 25 GbE LACP | 2-4× 25/100 GbE LACP |
| **Network (storage)** | — (local NVMe) | 2× FC 32G multipath |
| **Network (Data Guard)** | — | 2× 25 GbE dedicated |
| **DB version** | Oracle SE2 (max 16 threads) | Oracle EE (unlimited) |
**Use case**: Dev/test, small production databases, branch offices. The SE2 license caps the database at 16 CPU threads and limits parallel execution.
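The huge pages row can be made concrete: `vm.nr_hugepages` is the SGA size divided by the huge page size. The Python sketch below assumes the 2 MiB default huge page size on x86-64; the function name and the example SGA size are illustrative only.

```python
import math


def nr_hugepages(sga_gib: float, page_mib: int = 2) -> int:
    """Number of huge pages needed to back an SGA of the given size."""
    return math.ceil(sga_gib * 1024 / page_mib)


# Example: 64 GiB SGA on a 128 GB host.
print(f"vm.nr_hugepages = {nr_hugepages(64)}")
```

A few extra pages are commonly added as headroom so that the instance does not fail to start when the SGA is resized slightly.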
#### Variant O2: Oracle Data Guard (mid-size/large, HA + DR)
Primary + standby in active-passive mode, with optional Active Data Guard for reporting.
| Parameter | Recommendation |
|----------|-----------|
| **CPU** | 2× EPYC 9654-9965 / Xeon 8592+ (32-64C) |
| **RAM** | 256-1024 GB (SGA 60-80 %, PGA 20-40 %) |
| **Huge pages** | Yes (50-80 % of RAM allocated to the SGA) |
| **OS disk** | 2× NVMe RAID 1 (480 GB) |
| **Storage** | FC SAN LUNs (DATA, FRA and REDO kept separate) or NVMe + ASM |
| **HBA** | 2× dual-port FC 32/64 Gb (multipath active-active) |
| **App network** | 2-4× 25/100 GbE LACP |
| **Storage network** | 2× FC 32/64 Gb multipath |
| **Data Guard network** | 2× 25/100 GbE dedicated (sync or async) |
| **Data Guard mode** | Maximum Availability (sync, falls back to async): RPO = 0 while the standby is in sync |
| **Topology** | 1 primary + 1-2 standbys (physical), far sync for geo-DR |
| **Active Data Guard** | Standby open read-only (reporting, backup); requires an ADG license |
**Data Guard latency**:
```text
Synchronous (Maximum Availability):
Primary COMMIT → LGWR flushes REDO → sync over the network → standby LGWR → ACK → ~1-5 ms
RPO = 0, adds to write latency
Asynchronous (Maximum Performance):
Primary COMMIT → LGWR flushes REDO → async to the standby buffer → ~0.1-1 ms
RPO = a few seconds, negligible impact on writes
```
**Network requirements for Data Guard sync**:
- RTT < 2 ms for synchronous mode (< 1 ms recommended)
- Min. 10 GbE, 25 GbE recommended (throughput = REDO rate × 2)
- REDO rate: OLTP ~50-500 MB/s, batch ~500-2000 MB/s
- At a REDO rate of 500 MB/s on 25 GbE: 4 Gb/s, i.e. ~16 % link utilization (~32 % with the ×2 headroom)
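The link utilization figure follows from converting the redo rate to Gb/s: 500 MB/s of redo is 4 Gb/s, or 16 % of a 25 GbE link before protocol overhead. A short Python sketch of that arithmetic; the function name is hypothetical and protocol overhead is ignored.

```python
def link_utilization(redo_mb_s: float, link_gbit_s: float,
                     headroom_factor: float = 1.0) -> float:
    """Fraction of the link consumed by redo transport.

    headroom_factor=2.0 models the 'REDO rate × 2' sizing rule.
    """
    redo_gbit_s = redo_mb_s * 8 / 1000  # MB/s -> Gb/s
    return redo_gbit_s * headroom_factor / link_gbit_s


print(f"{link_utilization(500, 25):.0%}")        # raw redo stream
print(f"{link_utilization(500, 25, 2.0):.0%}")   # with ×2 headroom
print(f"{link_utilization(2000, 25, 2.0):.0%}")  # batch peak with ×2 headroom
```

With the ×2 rule, a batch peak of 2000 MB/s no longer fits a single 25 GbE link, which is where the 100 GbE option becomes relevant.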
#### Variant O3: Oracle RAC (large, enterprise)
Multi-instance cluster with shared storage and Cache Fusion.
| Parameter | Recommendation |
|----------|-----------|
| **Number of nodes** | 2-4 (typical), max 64 (RAC cluster) |
| **CPU per node** | 2× EPYC 9654-9965 / Xeon 8592+ (32-64C) |
| **RAM per node** | 512-2048 GB (SGA 60-80 %, PGA 20-40 %) |
| **Huge pages** | Yes (1 GB pages if RAM > 512 GB) |
| **Storage** | FC SAN, shared LUNs (ASM normal/high redundancy) |
| **HBA** | 2× dual-port FC 64 Gb (multipath, active-active) |
| **App network** | 2-4× 25/100 GbE LACP (VIP, SCAN listener) |
| **Storage network** | 2-4× FC 64 Gb (multipath per node) |
| **Cache Fusion interconnect** | 2× 100 GbE (RoCE v2 or InfiniBand), dedicated |
| **RAC interconnect latency** | < 5 µs (recommended), max < 10 µs |
| **ASM** | Normal redundancy (2-way mirror) |
| **Oracle Clusterware** | Voting disk (3× 1 GB LUN), OCR (3× 500 MB LUN) |
| **Service** | OLTP_service, REPORT_service, BATCH_service |
**Cache Fusion: the critical interconnect**:
```
Node A (DB instance) ←──→ Node B (DB instance)
│ │
└──────── ASM ───────────┘
FC SAN (shared storage)
Cache Fusion traffic: dirty block transfer between instances
→ Latency < 5 µs, otherwise RAC scaling degrades
→ Capacity: 2× 100 GbE, dedicated switch or InfiniBand HDR100
→ Recommended MTU: 9000 (jumbo frames)
```
**RAC sizing by transaction rate**:
| TPS | Nodes | CPU per node | RAM per node | Interconnect |
|-----|------|-------------|-------------|-------------|
| < 10 000 | 2 | 16-24C | 256 GB | 2× 25 GbE |
| 10 000 - 50 000 | 2-4 | 32-48C | 512 GB | 2× 100 GbE RoCE |
| 50 000 - 200 000 | 4-8 | 48-64C | 1024 GB | 2× 100 GbE RoCE / InfiniBand |
| > 200 000 | 8+ | 64-128C | 2048 GB | InfiniBand HDR100/HDR200 |
**RAC sizing: license cost calculation**:
```text
Example: 4-node RAC, each node 2× EPYC 9654 (96C) = 192 cores per node
Core factor 0.5 → 96 Oracle licenses per node
4 × 96 = 384 Oracle EE licenses
At ~$47.5k/license → ~$18.2M (licenses only, excluding 22 % annual support)
```
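The same calculation as a small Python sketch, so other node counts or CPUs can be plugged in. The price per license is the approximate figure used above, not a quote, and the function names are illustrative.

```python
import math


def oracle_licenses(physical_cores: int, core_factor: float) -> int:
    """Licenses for one server: cores × core factor, fractions rounded up."""
    return math.ceil(physical_cores * core_factor)


def rac_license_cost(nodes: int, cores_per_node: int, core_factor: float,
                     price_per_license: float) -> float:
    """License cost for the whole cluster, excluding annual support."""
    return nodes * oracle_licenses(cores_per_node, core_factor) * price_per_license


# 4-node RAC, 2× 96-core EPYC 9654 per node, core factor 0.5, ~$47.5k per license
licenses = 4 * oracle_licenses(2 * 96, 0.5)
cost = rac_license_cost(4, 2 * 96, 0.5, 47_500)
print(f"{licenses} licenses, ${cost / 1e6:.2f}M")
```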
#### Variant O4: Oracle Exadata (hyperscale)
Engineered system, best suited to hybrid workloads (OLTP + DW).
| Parameter | X9M / X10M | Use case |
|----------|-----------|----------|
| **Database servers** | 2-8× (Xeon on X9M, AMD EPYC on X10M; 1.5-6 TB RAM, NVMe) | Compute |
| **Storage servers** | 3-18× (NVMe + HDD, Smart Scan) | Predicate offloading |
| **Smart Scan** | Filtering at the storage layer | Less data over the network, higher throughput |
| **RoCE interconnect** | 100 GbE (RDMA) | Low latency, high bandwidth |
| **In-Memory Column Store** | Optional license | Real-time analytics without ETL |
| **HCC (Hybrid Columnar Compression)** | Compression in the storage servers | Up to 10-15× compression for DW |
| **Rack power** | ~15-30 kW (full rack) | High power density |
**When to choose Exadata over standalone RAC**:
- OLTP > 50 000 TPS
- Need for consolidation (multiple databases on one cluster)
- Smart Scan significantly speeds up reporting on production data
- HCC to save storage for DW workloads
---
## 2. Hypervisor host (ESXi / KVM / Hyper-V)
### Configuration by size and storage type
#### Variant A: Small company, local storage (2-3 hosts)
| Component | Recommendation | Note |
|-----------|-----------|----------|
| **CPU** | 1× EPYC 9224/9254 or Xeon 4410Y/5418Y (12-24C) | 1 socket, enough cores for VM density |
| **RAM** | 128-256 GB (4-8 GB/core) | DDR5, 1DPC |
| **OS disk** | 2× SATA SSD RAID 1 (2× 240-480 GB) | ESXi / Proxmox / Hyper-V boot |
| **VM storage** | 4-6× SATA/SAS SSD, RAID 5/6 or 10 | Local RAID, 4-12 TB usable |
| **Network** | 2-4× 10/25 GbE (LACP) | Shared for everything (management + VM + storage) |
| **Hypervisor** | VMware vSphere Standard / Proxmox VE / Hyper-V | Basic license, no enterprise features |
| **Storage backend** | Local RAID controller (PERC H755, Broadcom 9560) | HW RAID with write-back cache |
| **HA** | VMware HA / Proxmox HA | Restarts VMs on another host after a failure |
| **Backup** | Veeam B&R Free / PBS (Proxmox Backup Server) | Local or USB disk |
**Use case**: Small office, branch office, dev/test, < 10 VMs, low budget, simple administration.
**Limitations**: No vMotion without shared storage; a host failure causes an outage (HA restart, not seamless).
#### Variant B: Mid-size company, vSAN / Ceph (3-6 hosts)
| Component | Recommendation | Note |
|-----------|-----------|----------|
| **CPU** | 1-2× EPYC 9334/9654 or Xeon 5418Y/8592+ (16-32C) | 1-2 sockets |
| **RAM** | 256-512 GB (4-8 GB/core) | DDR5, 2DPC (minimal speed penalty) |
| **OS disk** | 2× SATA SSD RAID 1 or 2× M.2 NVMe (BOSS-S1) | Separate from VM storage |
| **Cache tier** | 1-2× NVMe (vSAN OSA cache / Ceph WAL+DB) | For write performance |
| **Capacity tier** | 4-8× SATA/SAS SSD or HDD (vSAN OSA capacity / Ceph OSD); all-NVMe for vSAN ESA | HDD for capacity, SSD for performance |
| **Network** | 4× 25/100 GbE: 2× VM + mgmt, 2× storage (vSAN/Ceph) | Separate storage network, RDMA (RoCE v2) |
| **Hypervisor** | VMware vSAN / Proxmox Ceph / StarWind HCI | HCI license (vSAN ~$2.5k/core) |
| **Storage backend** | vSAN OSA/ESA or Ceph (RADOS) | Distributed storage, auto-rebalance |
| **HA** | vSphere HA + vSAN / Proxmox HA + Ceph | vMotion, DRS, automated failover |
| **Failover** | N+1 (one host as spare) | vSAN needs at least 3 hosts (OSA and ESA); 4 leave room to rebuild after a host failure |
**Ceph-only variant (Proxmox / OpenStack)**:
```
Proxmox node (3-6×):
├── CPU: 1× EPYC 9224-9334 (12-24C)
├── RAM: 128-256 GB
├── OS: 2× SATA SSD RAID 1
├── Ceph OSD: 4-8× NVMe/SATA SSD (RAW, HBA mode)
├── Network: 2× 25 GbE (public) + 2× 25 GbE (cluster)
└── Storage: Ceph 3× replication, CRUSH host failure domain
```
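For capacity planning, the usable space of such a cluster follows from the replication factor. A minimal Python sketch with hypothetical helper names; it ignores the headroom Ceph needs for rebalancing and the near-full thresholds, so real usable capacity is lower.

```python
def replicated_usable_tb(raw_tb: float, replicas: int = 3) -> float:
    """Usable capacity of a replicated pool (3× replication by default)."""
    return raw_tb / replicas


def erasure_coded_usable_tb(raw_tb: float, k: int, m: int) -> float:
    """Usable capacity of an erasure-coded pool with k data and m coding chunks."""
    return raw_tb * k / (k + m)


# Example: 3 nodes × 4 OSDs × 4 TB = 48 TB raw.
raw = 3 * 4 * 4
print(replicated_usable_tb(raw))          # 3× replication
print(erasure_coded_usable_tb(88, 8, 3))  # 8+3 erasure coding on 88 TB raw
```

Erasure coding is far more space-efficient, but an 8+3 profile with a host failure domain needs at least 11 hosts, so it only fits the larger clusters.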
**VMware vSAN variant (4-6 hosts)**:
```
vSAN node (4-6×):
├── CPU: 1-2× EPYC/Xeon (16-32C)
├── RAM: 256-512 GB
├── OS: 2× M.2 NVMe (BOSS-S1) or SD card (deprecated)
├── vSAN cache: 1-2× NVMe (write buffer, OSA only)
├── vSAN capacity: 4-8× NVMe (vSAN ESA) or SATA SSD / HDD (vSAN OSA)
├── Network: 2× 25/100 GbE (VM) + 2× 25 GbE (vSAN)
└── Storage: vSAN ESA (all-NVMe) or OSA (hybrid)
```
**Use case**: SME, enterprise division, 10-100 VMs, need for vMotion, DRS, HA, simple storage management.
#### Variant C: Large company, FC SAN (6+ hosts)
| Component | Recommendation | Note |
|-----------|-----------|----------|
| **CPU** | 2× EPYC 9654/9965 or Xeon 8592+/6980P (32-64C) | 2 sockets, max VM density |
| **RAM** | 512 GB - 2 TB (4-8 GB/core) | DDR5, 2DPC |
| **OS disk** | 2× SATA SSD RAID 1 or SD card (vSphere) | Boot, image storage |
| **VM storage** | LUNs from FC SAN, VMFS / NFS datastores | Hitachi, Dell, Pure, HPE storage |
| **HBA** | 2× dual-port FC HBA 32/64 Gb | Multipath, FC-NVMe |
| **Network** | 4-8× 25/100 GbE, split by traffic type | Management, VM, vMotion and FT separated |
| **Hypervisor** | VMware vSphere Enterprise+ / Hyper-V DC | Enterprise license, DRS, HA, FT |
| **Storage backend** | FC SAN: VMFS 6 datastores, VVols | Thin provisioning, storage DRS, array snapshots |
| **HA** | vSphere HA + DRS + vCenter | vMotion, DRS, FT, SRM for DR |
| **Failover** | N+1 or admission control (reserved CPU/RAM) | Capacity set aside for HA failover |
**Use case**: Enterprise, 100+ VMs, mix of databases and applications, centralized storage management, enterprise SLA.
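The N+1 rule translates into how much of the cluster can actually be committed to VMs. A short Python sketch, assuming identical hosts; the function names are illustrative.

```python
def ha_reserved_fraction(hosts: int, tolerated_failures: int = 1) -> float:
    """Share of cluster capacity to keep free so the remaining hosts
    can absorb the VMs of the failed ones."""
    if hosts <= tolerated_failures:
        raise ValueError("cluster must have more hosts than tolerated failures")
    return tolerated_failures / hosts


def usable_ram_gb(hosts: int, ram_per_host_gb: int, tolerated_failures: int = 1) -> int:
    """RAM that can be committed to VMs while still surviving the failures."""
    return (hosts - tolerated_failures) * ram_per_host_gb


# Example: 6 hosts with 1 TB RAM each, N+1.
print(f"reserve {ha_reserved_fraction(6):.1%}, usable RAM {usable_ram_gb(6, 1024)} GB")
```

The reserved share shrinks as the cluster grows, which is one reason larger clusters are cheaper per VM at the same availability target.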
#### Variant D: Hyperscale, Ceph / SDS (20+ hosts)
| Component | Recommendation | Note |
|-----------|-----------|----------|
| **CPU** | 2× EPYC 9654/9965 (64-128C) | 2 sockets, compute-optimized |
| **RAM** | 512 GB - 1 TB (2-4 GB/core) | Low overcommit ratio for consistent performance |
| **OS disk** | 2× M.2 NVMe RAID 1 (BOSS) | Boot |
| **Network** | 4-8× 100 GbE (compute + storage) | Separate OVN/OVS for SDN, VXLAN tunneling |
| **Hypervisor** | OpenStack (Nova) / OpenShift (KubeVirt) | Open source, API-driven, multi-tenant |
| **Storage backend** | Ceph (RADOS, RBD, RGW, CephFS) | Unified storage, erasure coding (8+3) |
| **Orchestration** | OpenStack / Kubernetes | Infrastructure-as-Code, autoscaling |
| **HA** | OpenStack HA / Kubernetes HA | Self-healing, auto-rebalance |
**Use case**: Cloud provider, hyperscale, 500+ VMs, multi-tenant, maximum automation.
### Hypervisor variant comparison
| Aspect | Local (small) | vSAN/Ceph (mid-size) | FC SAN (large) | Ceph hyperscale |
|--------|---------------|---------------------|----------------|-----------------|
| **Storage** | Local RAID | vSAN / Ceph (HCI) | FC SAN (centralized) | Ceph (distributed) |
| **Number of hosts** | 2-3 | 3-6 | 6-50+ | 20+ |
| **VM latency** | ~10 µs (local) | ~100-500 µs | ~200 µs (SAN) | ~500-2000 µs |
| **CAPEX/host** | Low | Medium | High | Medium |
| **CAPEX storage** | Low | None (part of the hosts) | High (SAN array) | None (part of the hosts) |
| **Management** | Simple (per host) | vCenter / Proxmox | vCenter + SAN mgmt | OpenStack / K8s |
| **vMotion** | No (no shared storage) | Yes (vSAN / Ceph RBD) | Yes (FC LUN) | Yes (Ceph RBD) |
| **DRS** | No | Yes (vSphere) | Yes (vSphere) | OpenStack scheduler |
| **Scaling** | Vertical | Horizontal (add a host) | Horizontal (host + SAN) | Horizontal |
### Network design by variant
#### Small (local storage)
| Traffic | VLAN | Speed | Teaming | Note |
|---------|------|----------|---------|----------|
| Management | Mgmt | 1 GbE | Active/Passive | Dedicated port (iLO/iDRAC) |
| VM + Storage | All | 2-4× 10/25 GbE | LACP | Shared, VLAN tagging |
```
┌──────────────────────────────────────────┐
│ Host │
│ ┌──────┐ ┌─────────────────────────────┐│
│ │ iLO │ │ NIC1 NIC2 ││
│ │ 1 GbE │ │ [LACP] 25 GbE ││
│ └──────┘ └──────────┬──────────────────┘│
└──────────────────────┼───────────────────┘
┌─────┴─────┐
│ Switch │
└───────────┘
```
#### Mid-size (vSAN / Ceph)
| Traffic | VLAN | Speed | Teaming | Note |
|---------|------|----------|---------|----------|
| Management | Mgmt | 1 GbE | Active/Passive | Dedicated iLO/iDRAC |
| VM | VM | 2× 25/100 GbE | LACP | VM traffic, migration |
| Storage | vSAN/Ceph | 2× 25/100 GbE | LACP or RDMA | Separate, jumbo frames (MTU 9000) |
```
┌──────────────────────────────────────────┐
│ Host │
│ ┌──────┐ ┌──────────┐ ┌───────────────┐│
│ │ iLO │ │ NIC1 NIC2│ │ NIC3 NIC4 ││
│ │ 1 GbE │ │ VM traffic│ │ Storage (vSAN)││
│ └──────┘ └──────────┘ └───────────────┘│
└──────────────────────────────────────────┘
```
#### Large (FC SAN)
| Traffic | VLAN | Speed | Teaming | Note |
|---------|------|----------|---------|----------|
| Management | Mgmt | 1 GbE | Active/Passive | Dedicated |
| VM | VM | 2-4× 25/100 GbE | LACP | VM traffic |
| vMotion | vMotion | 2× 25 GbE | Dedicated | Multi-NIC vMotion |
| FT | FT | 2× 10/25 GbE | Dedicated | Low latency |
| Storage | — | 2× 32/64 Gb FC | Multipath | FC SAN |
```
┌──────────────────────────────────────────────┐
│ Host │
│ ┌──────┐ ┌────────────┐ ┌────┐ ┌─────────┐│
│ │ iLO │ │ NIC1-4 │ │HBA1│ │ HBA2 ││
│ │ 1 GbE │ │ VM+vMotion+FT│ │32Gb│ │ 32Gb ││
│ └──────┘ └────────────┘ └─┬──┘ └──┬──────┘│
└────────────────────────────┼───────┼───────┘
│ │
┌───────┴───┐ ┌─┴────────┐
│ Ethernet │ │ FC Switch │
│ Switch │ │ (Brocade/ │
│ │ │ Cisco) │
└───────────┘ └──────────┘
```
### BIOS for hypervisors: all variants
| Setting | Value | Rationale |
|-----------|---------|------------|
| Hyper-Threading | Enabled | Higher VM density |
| Virtualization Technology | Enabled | VT-x/AMD-V |
| VT-d / IOMMU | Enabled | Passthrough, SR-IOV |
| Power Management | Performance / OS | Minimizes VM exit latency |
| C-States | Disabled | Lower VM exit latency (important for real-time VMs) |
| NUMA | Enabled | NUMA-aware VM placement |
| SR-IOV | Enabled | NIC/GPU virtualization |
| Adjacent Sector Prefetch | Enabled (Intel) | Better sequential reads |
| DCU Streamer / IP Prefetcher | Enabled | HW prefetch for VM workloads |
| Patrol Scrub | Disabled (vSAN/Ceph) | Can cause latency spikes with SDS |
### Hypervisor selection by variant
| Criterion | VMware vSphere | Proxmox VE | Hyper-V | OpenStack |
|-----------|---------------|------------|---------|-----------|
| **Size** | SME - Enterprise | SME | SME - Enterprise | Hyperscale |
| **Storage** | vSAN, SAN, NFS | Ceph, ZFS, NFS | Storage Spaces, SAN | Ceph, Manila |
| **License** | ~$1-5k/core | Free (support ~$500/host) | Included in Windows Server | Open source |
| **Familiarity** | Highest | Medium | Windows admin | Low |
| **Automation** | Terraform, Ansible, PowerCLI | Ansible, Terraform, PBS | PowerShell, SCVMM | Terraform, Heat, Ansible |
| **Ecosystem** | Broadest (Veeam, Zerto, SRM) | Growing (PBS, remote migration) | Windows ecosystem | Open source (Kolla, TripleO) |
---
## 3. Kubernetes node
### Node profiles
| Role | CPU | RAM | Storage | Network | Use case |
|------|-----|-----|---------|---------|----------|
| **General purpose** | 16-32 cores | 64-128 GB | 1× NVMe OS + 1× NVMe local | 2× 25 GbE | Web, API, microservices |
| **Memory optimized** | 32-64 cores | 256-512 GB | 1× NVMe OS + 2× NVMe local | 2× 25 GbE | In-memory cache, DB |
| **Compute optimized** | 64-128 cores | 128-256 GB | 1× NVMe OS | | Batch, CI/CD |
| **GPU node** | 32-64 cores | 512-1024 GB | 1× NVMe OS + 4-8× NVMe local | 4× 100 GbE | AI/ML training, inference |
| **Storage node** | 16-32 cores | 64-128 GB | 4-12× NVMe/SATA (Ceph/Longhorn) | 4× 25 GbE | SDS, persistent volumes |
### Kernel tuning
```
# /etc/sysctl.d/99-kubernetes.conf
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1
net.ipv4.conf.all.forwarding = 1
# Connection tracking (for NodePort, Service)
net.netfilter.nf_conntrack_max = 2097152
net.netfilter.nf_conntrack_tcp_timeout_established = 86400
# File watchers (for kubelet, containerd)
fs.inotify.max_user_instances = 8192
fs.inotify.max_user_watches = 524288
# Memory management
vm.swappiness = 0
# Allow overcommit (CRI-O, containerd)
vm.overcommit_memory = 1
vm.panic_on_oom = 0
kernel.panic = 10
kernel.panic_on_oops = 1
```
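When maintaining such a drop-in file, note that sysctl configuration files treat `#` and `;` as comment markers only at the start of a line, so a trailing comment becomes part of the value. The Python sketch below parses text in that format and flags such entries; it is an illustrative lint helper, not part of any Kubernetes or procps tooling.

```python
def parse_sysctl_conf(text: str) -> dict[str, str]:
    """Parse 'key = value' lines; whole-line '#' and ';' comments are skipped."""
    settings: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def keys_with_inline_comment(settings: dict[str, str]) -> list[str]:
    """Keys whose value still contains a '#', i.e. a trailing comment."""
    return [key for key, value in settings.items() if "#" in value]


SAMPLE = """
# Memory management
vm.swappiness = 0
vm.overcommit_memory = 1 # trailing comment ends up in the value
"""

parsed = parse_sysctl_conf(SAMPLE)
print(parsed)
print("suspicious:", keys_with_inline_comment(parsed))
```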
### Container storage
| Type | Recommendation | Note |
|-----|-----------|----------|
| **OS disk** | RAID 1 (2× NVMe) | Ext4/XFS, 100-200 GB |
| **Container runtime image** | RAID 1 (2× NVMe) | /var/lib/containerd, 200-500 GB |
| **Local PV** | Single NVMe | Raw device, no RAID |
| **Rook/Ceph OSD** | Raw NVMe/SATA | HBA/IT mode, no RAID |
| **Longhorn** | Raw NVMe/SATA | Ext4/XFS per volume |
---
## 4. Storage server (Ceph / MinIO / NAS)
### Ceph OSD node
| Component | Recommendation | Note |
|-----------|-----------|----------|
| **CPU** | 1-2 cores per OSD | Up to 12 OSDs per node (24 cores) |
| **RAM** | 4-8 GB per OSD + OS | BlueStore cache, 16-64 GB min |
| **Network** | 2× 25/100 GbE | Public + Cluster network |
| **Storage** | 10-12× NVMe/SATA SSD OSD | HBA/IT mode, no RAID |
| **OS disk** | 2× SATA SSD RAID 1 | OS, Ceph MON/MGR |
**BIOS for Ceph:**
- SATA/NVMe: AHCI/NVMe mode (not RAID)
- C-States: Disabled (lower OSD latency)
- NUMA: Enabled
- Power: Performance
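The per-OSD rules of thumb from the Ceph OSD node table can be combined into a quick sizing check. A Python sketch under stated assumptions: it takes the upper bound of each range (2 cores and 8 GB per OSD), and the 16 GB reserved for the OS and MON/MGR daemons is an assumed value, not taken from the table.

```python
def osd_node_sizing(osd_count: int, cores_per_osd: int = 2,
                    ram_gb_per_osd: int = 8, os_ram_gb: int = 16) -> dict[str, int]:
    """CPU cores and RAM for a Ceph OSD node with the given number of OSDs."""
    return {
        "cores": osd_count * cores_per_osd,
        "ram_gb": osd_count * ram_gb_per_osd + os_ram_gb,
    }


# Example: 12 OSDs per node.
print(osd_node_sizing(12))
```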
### MinIO node
| Component | Recommendation |
|-----------|-----------|
| **CPU** | 8-16 cores (32+ for erasure coding) |
| **RAM** | 32-64 GB + 1 GB per 1 TB storage |
| **Storage** | 4-16× NVMe (direct, no RAID) |
| **Network** | 2× 25/100 GbE |
| **OS** | Ubuntu / RHEL, XFS (for data) |
### NAS (TrueNAS / FreeNAS)
- **ZFS**: RAID-Z1/Z2/Z3, compression (lz4, zstd), dedup
- **ARC cache**: 1 GB per 1 TB storage (max 64 GB)
- **L2ARC**: NVMe cache (optional, read-heavy)
- **SLOG**: NVDIMM / Optane (sync write, ZIL)
- **Network**: 2-4× 10/25 GbE LACP
---
## 5. Web / API servers
| Parameter | Recommendation |
|----------|-----------|
| **CPU** | High clock, 8-32 cores |
| **RAM** | 32-128 GB |
| **Storage** | 2× NVMe RAID 1 (OS + app) |
| **OS** | Ubuntu / RHEL, optimized kernel |
| **Network** | 2× 10/25 GbE (bonding) |
**Kernel tuning:**
```
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_fin_timeout = 15
net.core.somaxconn = 65535
net.ipv4.tcp_max_syn_backlog = 65535
net.core.netdev_max_backlog = 65535
```
---
## Quick decision tree: server selection by workload, size and storage
```mermaid
flowchart TD
W["Which workload?"] --> DB["Database"]
W --> HV["Virtualization"]
W --> K8s["Kubernetes"]
W --> AI["AI/ML"]
W --> ST["Storage server"]
W --> WEB["Web / API"]
DB --> DBS{"Company size"}
DBS -->|"< 500"| DB1["1× EPYC 8-16C, 64-256 GB<br/>NVMe RAID10, 2× 25GbE"]
DBS -->|"500-5000"| DB2{"Storage"}
DB2 -->|"Local"| DB2L["1-2× EPYC 16-24C, 128-512 GB<br/>NVMe RAID10, 4× 25GbE"]
DB2 -->|"Ceph"| DB2C["2× EPYC 16-32C, 256-512 GB<br/>RBD, 4× 25/100GbE"]
DBS -->|"Enterprise"| DB3{"Storage"}
DB3 -->|"FC SAN"| DB3F["2× EPYC 48-128C, 512-2048 GB<br/>SAN LUN + 2× FC 32/64G"]
DB3 -->|"Ceph"| DB3C["2× EPYC 32-64C, 256-512 GB<br/>RBD, 4× 100GbE"]
DBS -->|"Cloud"| DBC["RDS/Azure SQL/CloudSQL<br/>Managed, Multi-AZ"]
DB --> ORACLE{"Oracle architecture?"}
ORACLE -->|"Standalone"| ORA1["1-2× EPYC 8-24C<br/>64-512 GB, ASM local/FC<br/>2× 25GbE + FC 32G"]
ORACLE -->|"Data Guard"| ORA2["2× EPYC 32-64C<br/>256-1024 GB, FC SAN<br/>2× 25/100GbE + 2× FC 64G<br/>2× 25GbE (DG sync)"]
ORACLE -->|"RAC 2-4 nodes"| ORA3["Per node: 2× EPYC 32-64C<br/>512-2048 GB, FC SAN<br/>2× 100GbE (app)<br/>2× FC 64G (storage)<br/>2× 100GbE RoCE (interconnect)"]
ORACLE -->|"Exadata"| ORA4["Engineered system<br/>2-8 DB servers + 3-18 storage<br/>RoCE 100GbE, Smart Scan<br/>15-30 kW/rack"]
HV --> HVS{"Number of hosts"}
HVS -->|"2-3"| HV1["1× EPYC 12-24C, 128-256 GB<br/>RAID5/6 SSD, 2-4× 10/25GbE"]
HVS -->|"3-6"| HV2{"HCI"}
HV2 -->|"vSAN"| HV2V["1-2× EPYC 16-32C, 256-512 GB<br/>NVMe cache + SSD, 4× 25GbE"]
HV2 -->|"Ceph"| HV2C["1× EPYC 12-24C, 128-256 GB<br/>4-8× HBA NVMe/SSD, 4× 25GbE"]
HVS -->|"6+"| HV3["2× EPYC 32-64C, 512-2048 GB<br/>FC SAN 32/64G, 4-8× 25/100GbE"]
HVS -->|"20+"| HV4["2× EPYC 64-128C, 512-1024 GB<br/>OpenStack + Ceph, 4-8× 100GbE"]
K8s --> K8T{"Node type"}
K8T -->|"General"| K8G["16-32C, 64-128 GB<br/>2× NVMe, 2× 25GbE"]
K8T -->|"Memory"| K8M["32-64C, 256-512 GB<br/>3× NVMe, 2× 25GbE"]
K8T -->|"GPU"| K8U["32-64C, 512-1024 GB<br/>6-10× NVMe, H100/B200, 4× 100GbE"]
K8T -->|"Storage"| K8S["16-32C, 64-128 GB<br/>6-14× HBA NVMe, 4× 25GbE"]
AI --> AIT{"Purpose"}
AIT -->|"Training"| AITR["GPU H100/B200, NVLink<br/>InfiniBand 400Gb/s, liquid cooling"]
AIT -->|"Inference"| AIIR["A100/H200, MIG<br/>PCIe 5.0, 2× 100GbE"]
ST --> STT{"Type"}
STT -->|"Ceph OSD"| STC["EPYC (PCIe lanes)<br/>4-8 GB/OSD, HBA, 2× 25/100GbE"]
STT -->|"MinIO"| STM["EPYC 8-16C, 32-64 GB<br/>4-16× NVMe direct, 2× 25/100GbE"]
STT -->|"NAS (ZFS)"| STN["EPYC 16-32C, 64-128 GB<br/>RAID-Z, SLOG NVMe, 2-4× 10/25GbE"]
WEB --> WEBE["EPYC high clock, 8-32C<br/>32-128 GB, 2× NVMe RAID1, 2× 10/25GbE"]
```
### Connectivity summary by platform
| Platform | App / VM network | Storage network | Replication / Cluster | Management |
|-----------|-------------|-------------|---------------------|------------|
| **DB local (small)** | 2× 25 GbE LACP | — | 2× 25 GbE (shared) | 1× 1 GbE (iLO) |
| **DB local (medium)** | 2× 25/100 GbE LACP | — | 2× 25 GbE dedicated | 1× 1 GbE (iLO) |
| **DB FC SAN** | 2× 25/100 GbE LACP | 2× 32/64 Gb FC multipath | FC replication | 1× 1 GbE (iLO) + SAN mgmt |
| **DB Ceph** | 2× 25/100 GbE | 2× 25/100 GbE (Ceph public) | 2× 25/100 GbE (Ceph cluster) | 1× 1 GbE (iLO) |
| **Hypervisor local** | 2-4× 10/25 GbE LACP | — (local) | — | 1× 1 GbE (iLO) |
| **Hypervisor vSAN** | 2× 25/100 GbE LACP | 2× 25/100 GbE (vSAN) | vSAN traffic | 1× 1 GbE (iLO) |
| **Hypervisor FC SAN** | 2-4× 25/100 GbE LACP | 2× 32/64 Gb FC multipath | 2× 25 GbE (vMotion) | 1× 1 GbE (iLO) |
| **Hypervisor Ceph** | 2× 25/100 GbE LACP | 2× 25/100 GbE (Ceph) | 2× 25 GbE (migration) | 1× 1 GbE (iLO) |
| **Kubernetes** | 2× 25/100 GbE | 2× 25/100 GbE (Ceph/Longhorn) | 2× 25/100 GbE (K8s cluster) | 1× 1 GbE (BMC) |
| **Web/API** | 2× 10/25 GbE LACP | — | — | 1× 1 GbE (BMC) |
| **Oracle Standalone** | 2× 25 GbE LACP | 2× FC 32G or local NVMe | Data Guard 2× 25 GbE | 1× 1 GbE (iLO) + ASM mgmt |
| **Oracle Data Guard** | 2× 25/100 GbE LACP | 2× FC 64G multipath | 2× 25 GbE (DG sync) | 1× 1 GbE (iLO) + SAN mgmt |
| **Oracle RAC** | 2× 100 GbE LACP (VIP/SCAN) | 2× FC 64G multipath | 2× 100 GbE RoCE (Cache Fusion) | 1× 1 GbE (iLO) + Clusterware |
| **Oracle Exadata** | 4-8× 100 GbE RoCE | NVMe over Fabric | RDMA interconnect | Exadata CLI + OEDA |
## Sources
Links, books and standards: [sources/infrastructure/sources.md](sources/infrastructure/sources.md)
*Last revision: 2026-06-03*

SERVER-HW.en.md Normal file

@@ -0,0 +1,400 @@
# 🔧 Server hardware — components and architecture
## Form factors
| Type | Description | Advantages | Disadvantages |
|-----|-------|--------|----------|
| **Rack (1U/2U/4U)** | Standard rack mount, 19" width | Wide range of configurations, easy replacement | Limited PCIe slots in 1U |
| **Blade** | Modular server that slots into a shared chassis (HPE Synergy, Dell MX) | High density, shared power/cooling | Vendor lock-in, higher chassis cost |
| **Tower** | Standalone cabinet | Quiet, expandable | Takes space, not rack-optimized |
| **Edge / Micro** | Small, low power, industrial design | Environmental resistance, low consumption | Limited performance, fewer PCIe |
## Processors (CPU)
### Intel Xeon vs AMD EPYC
| Feature | Intel Xeon (6th gen Granite Rapids) | AMD EPYC (5th gen Turin) |
|-----------|-----------------------------------|------------------------|
| **Max cores** | 128 (P-cores) | 192 (Zen 5c) / 128 (Zen 5) |
| **PCIe lanes** | 80-96 per socket | 128 per socket |
| **Memory channels** | 8 (Xeon 6700P) / 12 (Xeon 6900P), DDR5 | 12 (DDR5) |
| **Max memory** | 4 TB | 6 TB+ |
| **Cache L3** | up to ~504 MB | up to ~512 MB (Zen 5) / ~384 MB (Zen 5c) |
| **AVX-512** | Yes (512-bit) | Yes (native 512-bit data path on Zen 5) |
| **AMX (matrix)** | Yes (Intel AMX) | No |
| **TDP** | 350-500 W | 360-500 W |
| **Infrastructure** | Intel QuickAssist, DSA, IAA | AMD Infinity Architecture |
| **Use case** | AI inference, networking, HPC | Virtualization, databases, general purpose |
### CPU selection guide
| Workload | Recommended CPU | Rationale |
|----------|---------------|------------|
| **Database (OLTP)** | EPYC (high core count, more memory channels) | More PCIe lanes for NVMe, higher memory bandwidth |
| **Database (OLAP/DW)** | Xeon (AVX-512, AMX) | Vector instructions for analytical queries |
| **Virtualization** | EPYC (more cores, lower TCO) | Higher core density, lower price per core |
| **HPC / AI training** | Xeon + GPU (AMX for preprocessing) | AMX for data preprocessing, GPU for training |
| **Web / API servers** | EPYC (good perf/core, low TDP variants) | Good performance/W ratio |
| **Storage** | EPYC (128 PCIe lanes for NVMe) | Maximum NVMe drives |
## Memory (RAM)
### DIMM types
| Type | Description | Use case | Server support |
|-----|-------|----------|---------------|
| **RDIMM** (Registered) | Registered: command/address lines are buffered by a register (RCD) | Standard server memory | All servers |
| **LRDIMM** (Load-Reduced) | Reduced electrical load: data lines are buffered as well as command/address lines | High-capacity configurations (more DIMMs per channel) | Enterprise, 4R+ |
| **NVDIMM** (Non-Volatile) | Battery-backed DRAM + flash | Write cache, metadata, persistence | Legacy (Intel Optane PMEM) |
| **3D XPoint / Optane** | PCM-based persistence (discontinued by Intel) | Legacy | Intel-only, discontinued |
### DDR5 vs DDR4 key differences
| Feature | DDR4 | DDR5 |
|-----------|------|------|
| **Channel architecture** | 1× 64-bit channel per DIMM | 2× 32-bit sub-channel per DIMM |
| **Bank groups** | 4 (single rank) | 8 (single rank) |
| **Burst length** | 8 (BL8) | 16 (BL16) |
| **On-die ECC** | No | Yes (for correcting bit errors in DRAM) |
| **PMIC** | On motherboard | On DIMM (power management IC) |
| **VDD** | 1.2 V | 1.1 V |
| **RCD** | 1× RCD per DIMM | 2× RCD (one per sub-channel) |
| **Max DIMM capacity** | 64 GB (LRDIMM) | 256 GB (RDIMM 3DS) |
| **Max speed** | 3200 MT/s | 6400 MT/s (currently 4800-5600) |
### Memory rank — detail
Rank = set of DRAM chips on a DIMM that are accessed simultaneously (64-bit data + 8-bit ECC).
| Rank | Number of DRAM chips (x8) | DIMM capacity (typ.) | Description |
|------|---------------------|---------------------|-------|
| **Single Rank (1R)** | 8-9 | 8-32 GB | All DRAM chips form one rank |
| **Dual Rank (2R)** | 16-18 | 16-128 GB | Two ranks, rank interleaving |
| **Quad Rank (4R)** | 32-36 | 64-256 GB (3DS) | Four ranks, higher capacity |
| **Octa Rank (8R)** | 64-72 | 256 GB (3DS) | Highest capacity, enterprise |
**Rank interleaving**: A dual-rank DIMM lets the memory controller address its two ranks alternately, which raises effective bandwidth (typically by 5-15 % over single-rank at the same speed).
**DDR5 rank vs DDR4**: A DDR5 single-rank DIMM already has 8 bank groups (as many as a dual-rank DDR4 DIMM), so moving to a higher rank count helps less on DDR5 than on DDR4.
**Rule**: Prefer dual-rank DIMMs over single-rank for higher density and bandwidth. Quad-rank and octa-rank DIMMs are available only as LRDIMM or 3DS RDIMM.
### DIMM population — basic rules
#### 1DPC vs 2DPC (DIMMs Per Channel)
| Configuration | DIMMs per channel | Max speed DDR5 | Bandwidth | Capacity |
|------------|-----------------|---------------|-----------|----------|
| **1DPC** | 1 | 4800-5600 MT/s | 100 % | Lower |
| **2DPC** | 2 | 4000-4400 MT/s | ~80 % | Higher |
**Important**: Populating 2 DIMMs per channel reduces memory speed. E.g. Dell R760:
- 1DPC: 5600 MT/s (with 5th Gen Xeon)
- 2DPC: 4400 MT/s (always)
#### Channel architecture (Intel Xeon 4th/5th Gen — 8 channels per CPU)
```
CPU 1 ─ Channel A  [Slot A1 (white)]  [Slot A9  (black)]    1DPC: populate white slots
      ─ Channel B  [Slot A7 (white)]  [Slot A15 (black)]    2DPC: populate white + black
      ─ Channel C  [Slot A3 (white)]  [Slot A11 (black)]
      ─ Channel D  [Slot A5 (white)]  [Slot A13 (black)]
      ─ Channel E  [Slot A4 (white)]  [Slot A12 (black)]
      ─ Channel F  [Slot A6 (white)]  [Slot A14 (black)]
      ─ Channel G  [Slot A2 (white)]  [Slot A10 (black)]
      ─ Channel H  [Slot A8 (white)]  [Slot A16 (black)]
```
#### Channel architecture (AMD EPYC — 12 channels per CPU)
```
CPU 1 ─ Channel 0-11 (12× single channel, 2 DPC)
Slot A0 (P0) / Slot A1 (P1) — per specific server model
```
AMD EPYC has 12 memory channels per CPU (vs 8 on 4th/5th Gen Intel Xeon), giving 50 % higher theoretical memory bandwidth at the same DIMM speed.
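
The channel counts above translate into peak bandwidth through simple arithmetic: each DDR channel transfers 64 data bits (8 bytes) per transfer. The sketch below works through that calculation in Python; the helper name `peak_bandwidth_gbs` is an illustrative assumption, and the results are theoretical peaks, not measured throughput.

```python
def peak_bandwidth_gbs(channels, mt_per_s, bus_bytes=8):
    """Theoretical peak memory bandwidth in GB/s for one CPU socket.

    bandwidth = channels * mega-transfers per second * bytes per transfer
    """
    return channels * mt_per_s * bus_bytes / 1000


xeon_1dpc = peak_bandwidth_gbs(8, 5600)    # 8 channels, DDR5-5600
xeon_2dpc = peak_bandwidth_gbs(8, 4400)    # same CPU populated 2DPC
epyc_1dpc = peak_bandwidth_gbs(12, 6400)   # 12 channels, DDR5-6400

print(f"Xeon 8ch  1DPC: {xeon_1dpc:.1f} GB/s")
print(f"Xeon 8ch  2DPC: {xeon_2dpc:.1f} GB/s ({xeon_2dpc / xeon_1dpc:.0%} of 1DPC)")
print(f"EPYC 12ch 1DPC: {epyc_1dpc:.1f} GB/s")
```

At equal DIMM speed, 12 channels versus 8 gives the 50 % advantage quoted above; the 2DPC figure shows where the roughly 80 % bandwidth value for 2DPC comes from.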
### Population rules by vendor
#### Dell PowerEdge (R660 / R760)
| Number of DIMMs per CPU | 1DPC (white slots) | 2DPC (white + black) | Speed |
|-------------------|-------------------|---------------------|-------|
| **1 DIMM per CPU** | A1 (Channel A) | — | 5600 MT/s |
| **2 DIMMs per CPU** | A1, A7 | — | 5600 MT/s |
| **4 DIMMs per CPU** | A1, A7, A3, A5 | — | 5600 MT/s |
| **8 DIMMs per CPU** | A1-A8 (all white) | — | 5600 MT/s |
| **16 DIMMs per CPU** | A1-A8 (white) | A9-A16 (black) | 4400 MT/s |
**Key Dell rules**:
1. All DIMMs must be DDR5 (do not mix generations)
2. Do not mix DIMM capacities (all identical)
3. Do not mix x4 and x8 DRAM chips
4. Do not mix 3DS and non-3DS RDIMM
5. If mixing DIMM speeds, all run at the lowest
6. Balance capacity across processors
7. Optimal configuration in a dual-socket server: 16× identical DIMMs (8 per CPU, 1DPC on each channel)
8. Fault Resilient Memory (FRM): only 8 or 16 DIMMs per processor
#### HPE ProLiant (DL360 / DL380 Gen11)
**Population order** (16 slots per CPU, Intel):
| DIMMs | Population order |
|-------|---------------|
| 1 | 10 |
| 2 | 1, 3 |
| 4 | 1, 3, 7, 10 |
| 6 | 3, 5, 7, 10, 14, 16 |
| 8 | 1, 3, 5, 7, 10, 12, 14, 16 |
| 12 | 1, 2, 3, 5, 6, 7, 10, 11, 12, 14, 15, 16 |
| 16 | 1-16 |
**HPE SmartMemory rules**:
1. Preferred (best-qualified) configuration: 1DPC (white slots)
2. Populate 2DPC (black slots) only after all white slots are filled
3. HBM + 4th Gen Intel: hemisphere (Hemi) mode and SGX are not supported
4. Heterogeneous mix: place the DIMMs with the higher rank count in the white slots
5. **Do not mix**: 3DS with non-3DS, x4 with x8, different ranks within a channel, or different DRAM densities (16 Gb / 24 Gb / 32 Gb)
#### HPE Gen11/Gen12 with AMD EPYC 9005 (a50012817enw)
AMD EPYC 9005 (Turin) delivers 12 memory channels per CPU and supports DDR5-6400.
| Feature | Detail |
|-----------|--------|
| **Memory channels** | 12 per CPU (vs 8 on Intel) |
| **Max DIMM slots** | 24 per CPU (2 DPC) |
| **Max speed** | DDR5-6400 (1 DPC), DDR5-4800 to DDR5-5600 (2 DPC) |
| **Max capacity** | 6 TB (24× 256 GB 3DS RDIMM, 2 DPC) |
| **DIMM types** | RDIMM (1R/2R/4R/8R), 3DS RDIMM, LRDIMM |
| **Population** | 1 DPC (white slots): 12 DIMMs, full speed; 2 DPC: 24 DIMMs, reduced speed |
| **Optimum** | 12× identical DIMMs (1 DPC on each channel) = max bandwidth |
**Rules for AMD EPYC 9005:**
1. Populate with equal capacities within a channel
2. 1 DPC = full speed 6400 MT/s, 2 DPC = lower speed
3. For optimal bandwidth: 12 DIMMs (1DPC) per CPU — all 12 channels utilized
4. Maximum capacity: 24 DIMMs (2DPC) — 24× 256 GB = 6 TB per CPU
5. Do not mix RDIMM and LRDIMM in the same system
### Memory population — decision flow
```
How many DIMMs per CPU? (8-channel Intel Xeon)
├── 1 DIMM → Channel A (slot 1), losing 87.5 % bandwidth
├── 2 DIMMs → Channels A+B, still losing 75 % bandwidth
├── 4 DIMMs → Channels A,B,C,D, better but not optimal
├── 8 DIMMs → 1DPC on all channels = MAX SPEED (5600 MT/s)
│ ✅ Recommended for performance
├── 12 DIMMs → 4 channels at 1DPC + 4 channels at 2DPC = unbalanced, mixed speed (4400 MT/s)
├── 16 DIMMs → 2DPC on all channels = MAX CAPACITY (4400 MT/s)
│ ✅ For capacity-intensive workloads
└── More than 16 → LRDIMM / 3DS only, speed penalty
Conclusion: 8 DIMMs per CPU (1DPC) = highest performance
16 DIMMs per CPU (2DPC) = highest capacity
```
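
The percentages in the decision flow follow from how many of the 8 channels are populated and which speed the DIMMs run at. The sketch below reproduces them in Python. It is a simplified model under stated assumptions: DIMMs are spread one per channel first, any second DIMM on a channel drops the whole CPU to the 2DPC speed, and the function name `relative_bandwidth` is illustrative. The unbalanced 12-DIMM case is only roughly approximated.

```python
def relative_bandwidth(dimms, channels=8, speed_1dpc=5600, speed_2dpc=4400):
    """Bandwidth relative to a fully populated 1DPC configuration."""
    if not 1 <= dimms <= 2 * channels:
        raise ValueError("dimms must be between 1 and 2 * channels")
    # Channels that hold at least one DIMM
    active_channels = min(dimms, channels)
    # More DIMMs than channels means at least one channel runs 2DPC
    speed = speed_1dpc if dimms <= channels else speed_2dpc
    return (active_channels / channels) * (speed / speed_1dpc)


for n in (1, 2, 4, 8, 16):
    print(f"{n:2d} DIMMs: {relative_bandwidth(n):.1%} of peak")
```

A single DIMM uses one of eight channels, which is the 87.5 % loss stated in the flow; 16 DIMMs use every channel but at 4400 MT/s.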
### Impact of configuration on performance
| Configuration | Relative bandwidth | Latency | Use case |
|------------|-------------------|---------|----------|
| **1DPC, 8 ch, 5600 MT/s** (8 DIMM) | 100 % | Lowest | OLTP databases, HPC, real-time |
| **2DPC, 8 ch, 4400 MT/s** (16 DIMM) | ~78 % | +10-15 % | Virtualization, VDI, in-memory DB |
| **Mixed 1+2DPC** (12 DIMM) | ~85 % | Medium | Capacity/performance compromise |
| **Unbalanced channels** | 50-70 % | High | **Avoid** |
**Vendor recommendations:**
- **Dell**: 16× identical DIMMs (8 per CPU), 1DPC, 5600 MT/s = optimal performance
- **HPE Intel**: Always populate white slots first, 1DPC for max performance, 2DPC for max capacity
- **HPE AMD EPYC 9005**: 12 channels per CPU, 1DPC = 12 DIMMs per CPU at 6400 MT/s (max bandwidth); 2DPC = 24 DIMMs per CPU (max capacity 6 TB)
- **Supermicro**: Consult specific manual for the given model (DSG, GPU, storage)
- **Lenovo**: Same rules as Intel/AMD platform — prefer 1DPC
### Memory sizing per workload
| Workload | RAM/core ratio | Typical pool | Recommended configuration |
|----------|---------------|--------------|----------------------|
| Database (OLTP) | 8-16 GB/core, DB in RAM | 256 GB - 2 TB | 8× 32-64 GB RDIMM, 1DPC |
| Database (OLAP) | 16-64 GB/core, columnstore | 512 GB - 4 TB+ | 16× 64-128 GB RDIMM, 2DPC |
| Virtualization (VM) | 4-8 GB/core, per VM density | 256 GB - 2 TB | 8-16× 32-64 GB RDIMM |
| Kubernetes (general) | 2-4 GB/core | 64-256 GB | 8× 16-32 GB RDIMM, 1DPC |
| AI training (CPU preprocessing) | 2-4 GB/core | 128-512 GB | 8× 32-64 GB RDIMM, 1DPC |
| HPC | 1-2 GB/core | 64-128 GB | 8× 16 GB RDIMM, 1DPC, high-speed |
| In-memory DB (SAP HANA) | 8-32 GB/core | 1-6 TB+ | 16× 128-256 GB LRDIMM/3DS |
| Big Data — Spark worker | 4-8 GB/core | 128-512 GB | 8-16× 32-64 GB RDIMM, 1DPC, NVMe scratch |
| Big Data — Flink worker | 8-16 GB/core (incl. managed state) | 128-512 GB | 8-16× 32-64 GB RDIMM, 1DPC, RocksDB on NVMe |
| Big Data — Trino worker | 4-8 GB/core | 64-256 GB | 8× 16-32 GB RDIMM, 1DPC |
| Big Data — HDFS DataNode | 1-2 GB/core (metadata cache) | 64-256 GB | 8× 16-32 GB RDIMM, 1DPC, max storage density |
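
The sizing table can be applied mechanically: multiply the core count by the GB/core ratio, then pick the smallest DIMM size that reaches the target at 1DPC, and fall back to 2DPC only when needed. The Python sketch below illustrates this; the function name `dimm_plan` and the list of DIMM capacities are assumptions for the example, and the default of 8 channels matches the Intel platform described earlier.

```python
DIMM_SIZES_GB = (16, 32, 64, 96, 128, 256)  # assumed list of DDR5 RDIMM capacities


def dimm_plan(cores, gb_per_core, channels=8):
    """Pick a balanced DIMM population that covers cores * gb_per_core."""
    target_gb = cores * gb_per_core
    for dpc in (1, 2):                      # prefer 1DPC for full memory speed
        for size in DIMM_SIZES_GB:          # smallest DIMM that is large enough
            total = channels * dpc * size
            if total >= target_gb:
                return {"dimms": channels * dpc, "size_gb": size,
                        "dpc": dpc, "total_gb": total}
    raise ValueError("target exceeds 2DPC capacity with the listed DIMM sizes")


# OLTP database host: 64 cores at 8 GB/core
print(dimm_plan(64, 8))
# OLAP host: 64 cores at 64 GB/core needs 2DPC
print(dimm_plan(64, 64))
```

For a 12-channel EPYC host, pass `channels=12`.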
## PCIe
| Generation | Year | Speed per lane | x16 throughput | x24 (GPU) |
|----------|-----|-------------------|-----------------|-----------|
| **PCIe 3.0** | 2010 | 985 MB/s | 15.8 GB/s | 23.6 GB/s |
| **PCIe 4.0** | 2017 | 1.97 GB/s | 31.5 GB/s | 47.3 GB/s |
| **PCIe 5.0** | 2019 | 3.94 GB/s | 63 GB/s | 94.5 GB/s |
| **PCIe 6.0** | 2022 | 7.88 GB/s | 126 GB/s | 189 GB/s |
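
The per-lane figures for PCIe 3.0 to 5.0 in the table can be derived from the raw transfer rate and the 128b/130b line encoding. The Python sketch below shows the arithmetic; the function names are illustrative assumptions. PCIe 6.0 is left out on purpose because it uses PAM4 signalling with FLIT-based encoding, so the same formula does not apply directly.

```python
ENCODING = 128 / 130  # 128b/130b line encoding used by PCIe 3.0 to 5.0

RATES_GT_S = {"3.0": 8, "4.0": 16, "5.0": 32}  # raw transfer rate per lane


def lane_throughput_gbs(gen):
    """Usable throughput of one lane in GB/s (one direction)."""
    return RATES_GT_S[gen] * ENCODING / 8  # 8 bits per byte


def link_throughput_gbs(gen, lanes):
    """Usable throughput of a link with the given lane count in GB/s."""
    return lane_throughput_gbs(gen) * lanes


for gen in RATES_GT_S:
    print(f"PCIe {gen}: {lane_throughput_gbs(gen):.3f} GB/s per lane, "
          f"x16 = {link_throughput_gbs(gen, 16):.1f} GB/s")
```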
**PCIe lane allocation**:
- GPU (x16): NVIDIA H100, AMD MI300X
- NVMe U.2 (x4): each NVMe drive
- NIC 100 GbE (x16): dual-port 100 GbE
- RAID/HBA (x8): storage controller
**CPU PCIe lane count**:
- Intel Xeon Scalable (4th gen): 64-80 lanes per socket
- AMD EPYC (4th gen Genoa): 128 lanes per socket
- Dual-socket EPYC: 128-160 usable lanes in total, because part of each socket's lanes carries the inter-socket xGMI links
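
A lane budget is a quick way to check whether a planned set of devices fits the CPU. The Python sketch below uses the per-device lane widths from the allocation list above; the function name `lane_budget`, the device keys and the default of 128 available lanes (one EPYC socket) are assumptions for the example. Real servers may reserve lanes for onboard devices or add PCIe switches, which this does not model.

```python
LANES_PER_DEVICE = {"gpu": 16, "nvme": 4, "nic_100g": 16, "hba": 8}


def lane_budget(devices, available_lanes=128):
    """Sum the lanes needed by the device counts and compare with the CPU."""
    used = sum(LANES_PER_DEVICE[kind] * count for kind, count in devices.items())
    return {"used": used, "free": available_lanes - used,
            "fits": used <= available_lanes}


# All-NVMe storage node: 24 drives plus two dual-port 100 GbE NICs
print(lane_budget({"nvme": 24, "nic_100g": 2}))
# GPU node that exceeds a single socket
print(lane_budget({"gpu": 8, "nvme": 8, "nic_100g": 4}))
```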
## NUMA
### Topology
```
Socket 0 (NUMA node 0) Socket 1 (NUMA node 1)
├── Cores 0-31 ├── Cores 32-63
├── Memory 0-256 GB ├── Memory 256-512 GB
├── PCIe root complex (GPU, NVMe) ├── PCIe root complex (NIC, NVMe)
└── I/O hub └── I/O hub
│ │
└───────── Infinity Fabric / UPI ──┘
```
- **Local access**: CPU → own memory (low latency, full bandwidth)
- **Remote access**: CPU → memory of the other socket (roughly 1.8-1.9× higher latency, lower bandwidth)
- NUMA-aware applications: databases, VMs, DPDK, AI training
### Cross-NUMA penalty
| CPU | Local latency | Remote latency | Penalty |
|-----|--------------|----------------|---------|
| AMD EPYC (Genoa) | ~80 ns | ~150 ns | ~1.9× |
| Intel Xeon (Sapphire Rapids) | ~90 ns | ~160 ns | ~1.8× |
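
The practical cost of the cross-NUMA penalty depends on how often a workload touches remote memory. The Python sketch below computes a blended average latency from the table values; the function name `average_latency_ns` and the 25 % remote-access share are illustrative assumptions, not measurements.

```python
def average_latency_ns(local_ns, remote_ns, remote_fraction):
    """Average memory latency when a share of accesses goes to the remote node."""
    if not 0.0 <= remote_fraction <= 1.0:
        raise ValueError("remote_fraction must be between 0 and 1")
    return (1 - remote_fraction) * local_ns + remote_fraction * remote_ns


# AMD EPYC (Genoa) values from the table: ~80 ns local, ~150 ns remote
print(f"Penalty: {150 / 80:.2f}x")
print(f"25 % remote accesses: {average_latency_ns(80, 150, 0.25):.1f} ns")
```

Pinning a VM or database instance to one NUMA node keeps `remote_fraction` close to zero.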
## TDP and cooling
| CPU | TDP | Core count | Cooling |
|-----|-----|-----------|----------|
| Intel Xeon Platinum 8480+ | 350 W | 56 | Air (high-performance) |
| Intel Xeon 6980P (Granite Rapids) | 500 W | 128 | Liquid recommended |
| AMD EPYC 9654 (Genoa) | 360 W | 96 | Air / Liquid |
| AMD EPYC 9965 (Turin) | 500 W | 192 | Liquid recommended |
### Cooling requirements per rack density
| Rack density | kW/rack | Cooling |
|-------------|---------|---------|
| Low | 1-5 kW | Free air cooling |
| Medium | 5-15 kW | CRAC/CRAH, hot/cold aisle |
| High | 15-40 kW | In-row cooling, rear-door HX |
| Ultra | 40-100+ kW | Direct-to-chip liquid, immersion |
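
The density table can be used as a lookup once the rack power is known. The Python sketch below estimates rack power from a server count and maps it to a cooling tier. The function names and the 1,200 W per-server figure are assumptions for the example, and boundary values (5, 15, 40 kW) are assigned to the higher tier, which the table leaves open.

```python
def rack_power_kw(servers, watts_per_server):
    """Total rack power in kW for identical servers."""
    return servers * watts_per_server / 1000


def cooling_class(kw_per_rack):
    """Map rack power density (kW) to the cooling tier from the table above."""
    if kw_per_rack <= 0:
        raise ValueError("kw_per_rack must be positive")
    if kw_per_rack < 5:
        return "Low: free air cooling"
    if kw_per_rack < 15:
        return "Medium: CRAC/CRAH, hot/cold aisle"
    if kw_per_rack < 40:
        return "High: in-row cooling, rear-door HX"
    return "Ultra: direct-to-chip liquid, immersion"


kw = rack_power_kw(20, 1200)  # 20 servers at an assumed 1.2 kW each
print(kw, cooling_class(kw))
```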
## BMC and management
| Vendor | BMC | API | Remote console | Features |
|--------|-----|-----|---------------|----------|
| **Dell** | iDRAC (9/10) | Redfish, RACADM | Virtual Console (HTML5) | Lifecycle Controller, SUU |
| **HPE** | iLO (5/6) | Redfish, iLOREST | Integrated Remote Console | Smart Update Manager (SUM) |
| **Supermicro** | BMC / IPMI | IPMI, Redfish | IPMIView, HTML5 KVM | SuperDoctor, SSM |
| **Lenovo** | XClarity Controller | Redfish, IPMI | Remote Console | XClarity Administrator |
| **Cisco** | CIMC / UCSM | Redfish, XML API | KVM Console | UCS Manager, Intersight |
### Standard functions
- Power: on/off/cycle/reset
- Boot: one-shot PXE, CD-ROM redirect, BIOS setup
- Monitoring: sensors (temp, voltage, fan, PSU)
- Alerting: SNMP traps, email, Redfish events
- Remote media: ISO mount over network
- Serial over LAN (SOL)
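
Power control through Redfish works the same way across the vendors listed above: a POST to the standard `ComputerSystem.Reset` action. The Python sketch below only builds the request so it can be inspected without a BMC. The host name, system ID and credentials are placeholders, the function name `build_reset_request` is an assumption, and the system ID differs between vendors, so it should be read from `/redfish/v1/Systems` first.

```python
import base64
import json
import urllib.request

# Subset of the ResetType values defined by the Redfish ComputerSystem schema
RESET_TYPES = {"On", "ForceOff", "GracefulShutdown", "GracefulRestart",
               "ForceRestart", "PowerCycle"}


def build_reset_request(bmc_host, system_id, reset_type, user, password):
    """Build (but do not send) a Redfish ComputerSystem.Reset request."""
    if reset_type not in RESET_TYPES:
        raise ValueError(f"unsupported reset type: {reset_type}")
    url = (f"https://{bmc_host}/redfish/v1/Systems/{system_id}"
           "/Actions/ComputerSystem.Reset")
    body = json.dumps({"ResetType": reset_type}).encode("utf-8")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    headers = {"Content-Type": "application/json",
               "Authorization": f"Basic {token}"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


# Placeholder values for illustration only
req = build_reset_request("bmc.example.internal", "1", "ForceRestart",
                          "admin", "secret")
print(req.get_method(), req.full_url)
```

Sending the request is a separate step with `urllib.request.urlopen(req)`; many BMCs ship with self-signed certificates, so TLS verification needs to be configured for the environment.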
## Vendors and series
| Vendor | Rack series | Blade series | Management |
|---------|-------------|-------------|------------|
| **Dell** | PowerEdge R6xx/R7xx (R660, R760) | MX7000, FX2 | iDRAC, OpenManage Enterprise |
| **HPE** | ProLiant DL (DL360, DL380) | Synergy, BladeSystem | iLO, OneView, OpsRamp |
| **Cisco** | UCS C-Series (C240, C245) | UCS B-Series, Fabric Interconnect | UCS Manager, Intersight |
| **Lenovo** | ThinkSystem SR (SR630, SR650) | ThinkSystem SN | XClarity |
| **Supermicro** | SuperServer (for GPU, storage, cloud) | FatTwin, MicroBlade | IPMI, SuperDoctor |
## Server connectivity
Detailed chapter on network and storage connectivity: [CONNECTIVITY.en.md](CONNECTIVITY.en.md)
## Storage controllers
| Controller | Type | RAID | Cache | Protocol |
|-----------|-----|------|-------|----------|
| **Dell PERC** (H755, H965) | HW RAID | 0/1/5/6/10/50/60 | 4-8 GB NV | NVMe, SAS, SATA |
| **Broadcom / LSI** (9560, 9670) | HW RAID / HBA | 0/1/5/6/10/50/60 | 4 GB NV | NVMe, SAS, SATA |
| **Intel VROC** | SW RAID (CPU) | 0/1/5/10 | — | NVMe only |
| **M.2 HW RAID** (BOSS-S1) | HW RAID | 0/1 | — | 2× M.2 NVMe/SATA |
### IT vs HW RAID mode
| Feature | IT (Initiator Target) / HBA | HW RAID |
|-----------|---------------------------|---------|
| **OS sees** | Each disk individually | RAID virtual disk |
| **Caching** | OS cache | RAID controller cache (BBU) |
| **RAID** | Software (mdadm, ZFS, Ceph) | Hardware + SW driver |
| **Passthrough** | Yes | No |
| **Use case** | SDS (Ceph, MinIO), ZFS | VMware VMFS, Windows, legacy |
| **Battery/Backup** | Not needed | Write-back cache requires BBU |
## Pricing (2026)
### CPU pricing (2026)
| CPU | Cores | TDP | 1ku price | $/core |
|-----|-------|-----|----------|--------|
| AMD EPYC 9965 (Turin) | 192 | 500 W | ~$11,988 | $62 |
| AMD EPYC 9655 (Turin) | 96 | 400 W | ~$6,500 | $68 |
| AMD EPYC 9475F (Turin) | 48 | 360 W | ~$5,000 | $104 |
| Intel Xeon 6980P (Granite Rapids) | 128 | 500 W | ~$12,460 | $97 |
| Intel Xeon 6980P (Granite Rapids-AP) | 128 | 500 W | $13,955 | $109 |
| Intel Xeon 6767P (Granite Rapids) | 64 | 350 W | ~$7,000 | $109 |
Sources: AMD 1ku pricing, Intel RCP, Newegg verified.
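
The `$/core` column is the list price divided by the core count, rounded to whole dollars. The short Python sketch below recomputes it from the table values so the column can be re-derived when prices change; the variable and function names are illustrative.

```python
# (model, cores, approximate price in USD) taken from the table above
CPUS = [
    ("AMD EPYC 9965", 192, 11988),
    ("AMD EPYC 9655", 96, 6500),
    ("AMD EPYC 9475F", 48, 5000),
    ("Intel Xeon 6980P", 128, 12460),
    ("Intel Xeon 6767P", 64, 7000),
]


def price_per_core(price_usd, cores):
    """Price per core in whole dollars."""
    return round(price_usd / cores)


for model, cores, price in CPUS:
    print(f"{model}: ${price_per_core(price, cores)}/core")
```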
### DDR5 RDIMM pricing (2026 — AI-driven price surge)
| Capacity | Speed | Price 2025 | Price Q2 2026 | Change |
|----------|---------|-----------|-------------|-------|
| 32 GB (2R×8) | DDR5-5600 | ~$95 | ~$400-550 | +400-500 % |
| 64 GB (2R×4) | DDR5-4800 | ~$180 | ~$700-900 | +400 % |
| 96 GB (2R×4) | DDR5-6400 | ~$300 | ~$1,200-1,600 | +400 % |
| 128 GB (2R×4) | DDR5-6400 | ~$450 | ~$1,800-2,500 | +450 % |
| 256 GB (LRDIMM) | DDR5-6400 | ~$900 | ~$4,000-5,000 | +450 % |
Trend: DDR5 prices have risen ~400-500 % since mid-2025 due to AI-driven demand. Further increases are expected in H2 2026. Source: Counterpoint, TrendForce.
### NVMe SSD pricing (enterprise, 2026)
| Capacity | Type | Price 2024 | Price Q2 2026 | Change |
|----------|-----|-----------|-------------|-------|
| 1.92 TB | NVMe U.3 (read-intensive) | ~$200 | ~$500-600 | +150 % |
| 3.84 TB | NVMe U.3 (mixed-use) | ~$400 | ~$1,000-1,200 | +150 % |
| 7.68 TB | NVMe U.3 (mixed-use) | ~$800 | ~$2,000-2,500 | +150 % |
| 15.36 TB | NVMe U.3 (mixed-use) | ~$1,500 | ~$4,000-5,000 | +170 % |
Trend: NAND flash prices have risen ~100-200 % since 2025; the average enterprise SSD now costs 2-3× more. Source: TrendForce, Xinnor.
### Total server cost (example configurations)
| Configuration | CPU | RAM | Storage | Estimated Price |
|-------------|-----|-----|------|-----------|
| DB server (OLTP) | 2× EPYC 9655 (96C) | 1 TB DDR5 | 6× 1.92 TB NVMe | ~$45,000-60,000 |
| GPU server (AI) | 2× Xeon 6980P | 2 TB DDR5 | 4× 3.84 TB NVMe | ~$80,000-120,000 (w/o GPU) |
| Hypervisor host | 2× EPYC 9475F (48C) | 512 GB DDR5 | 2× 1.92 TB NVMe + 4× 16 TB HDD | ~$25,000-35,000 |
| Storage server (Ceph) | 1× EPYC 9655 (96C) | 256 GB DDR5 | 24× 15.36 TB NVMe | ~$60,000-80,000 |
## Sources
Links, books and standards: [sources/infrastructure/sources.en.md](sources/infrastructure/sources.en.md)
*Last revision: 2026-06-03*

SERVER-HW.md Normal file

@@ -0,0 +1,400 @@
# 🔧 Server hardware: components and architecture
## Form factors
| Type | Description | Advantages | Disadvantages |
|-----|-------|--------|----------|
| **Rack (1U/2U/4U)** | Standard rack mount, 19" width | Wide range of configurations, easy replacement | Limited number of PCIe slots in 1U |
| **Blade** | Modular server that slots into a shared chassis (HPE Synergy, Dell MX) | High density, shared power/cooling | Vendor lock-in, higher chassis cost |
| **Tower** | Free-standing cabinet | Quiet, expandable | Takes up space, not rack-optimized |
| **Edge / Micro** | Small, low power, industrial design | Environmental resistance, low power draw | Limited performance, fewer PCIe slots |
## Processors (CPU)
### Intel Xeon vs AMD EPYC
| Feature | Intel Xeon (6th gen Granite Rapids) | AMD EPYC (5th gen Turin) |
|-----------|-----------------------------------|------------------------|
| **Max cores** | 128 (P-cores) | 192 (Zen 5c) / 128 (Zen 5) |
| **PCIe lanes** | 80-96 per socket | 128 per socket |
| **Memory channels** | 8 (DDR5) | 12 (DDR5) |
| **Max memory** | 4 TB | 6 TB+ |
| **Cache L3** | ~200 MB | ~384 MB |
| **AVX-512** | Yes (full width) | Yes (256bit) |
| **AMX (matrix)** | Yes (Intel AMX) | No |
| **TDP** | 350-500 W | 360-500 W |
| **Infrastructure** | Intel QuickAssist, DSA, IAA | AMD Infinity Architecture |
| **Use case** | AI inference, networking, HPC | Virtualization, databases, general purpose |
### CPU selection guide
| Workload | Recommended CPU | Rationale |
|----------|---------------|------------|
| **Database (OLTP)** | EPYC (high core count, more memory channels) | More PCIe lanes for NVMe, higher memory bandwidth |
| **Database (OLAP/DW)** | Xeon (AVX-512, AMX) | Vector instructions for analytical queries |
| **Virtualization** | EPYC (more cores, lower TCO) | Higher core density, lower price per core |
| **HPC / AI training** | Xeon + GPU (AMX for preprocessing) | AMX for data preprocessing, GPU for training |
| **Web / API servers** | EPYC (good perf/core, low TDP variants) | Good performance/W ratio |
| **Storage** | EPYC (128 PCIe lanes for NVMe) | Maximum number of NVMe drives |
## Memory (RAM)
### DIMM types
| Type | Description | Use case | Server support |
|-----|-------|----------|---------------|
| **RDIMM** (Registered) | Registered, buffered address lines (1 register) | Standard server memory | All servers |
| **LRDIMM** (Load-Reduced) | Reduced electrical load (2 registers: data + addresses) | High-capacity configurations (more DIMMs per channel) | Enterprise, 4R+ |
| **NVDIMM** (Non-Volatile) | Battery-backed DRAM + flash | Write cache, metadata, persistence | Legacy (Intel Optane PMEM) |
| **3D XPoint / Optane** | PCM-based persistence (discontinued by Intel) | Legacy | Intel-only, discontinued |
### DDR5 vs DDR4 key differences
| Feature | DDR4 | DDR5 |
|-----------|------|------|
| **Channel architecture** | 1× 64-bit channel per DIMM | 2× 32-bit sub-channel per DIMM |
| **Bank groups** | 4 (single rank) | 8 (single rank) |
| **Burst length** | 8 (BL8) | 16 (BL16) |
| **On-die ECC** | No | Yes (for correcting bit errors in DRAM) |
| **PMIC** | On motherboard | On DIMM (power management IC) |
| **VDD** | 1.2 V | 1.1 V |
| **RCD** | 1× RCD per DIMM | 2× RCD (one per sub-channel) |
| **Max DIMM capacity** | 64 GB (LRDIMM) | 256 GB (RDIMM 3DS) |
| **Max speed** | 3200 MT/s | 6400 MT/s (currently 4800-5600) |
### Memory rank — detail
Rank = set of DRAM chips on a DIMM that are accessed simultaneously (64-bit data + 8-bit ECC).
| Rank | Number of DRAM chips (x8) | DIMM capacity (typ.) | Description |
|------|---------------------|---------------------|-------|
| **Single Rank (1R)** | 8-9 | 8-32 GB | All DRAM chips form one rank |
| **Dual Rank (2R)** | 16-18 | 16-128 GB | Two ranks, rank interleaving |
| **Quad Rank (4R)** | 32-36 | 64-256 GB (3DS) | Four ranks, higher capacity |
| **Octa Rank (8R)** | 64-72 | 256 GB (3DS) | Highest capacity, enterprise |
**Rank interleaving**: A dual-rank DIMM lets the memory controller address its two ranks alternately, which raises effective bandwidth (typically by 5-15 % over single-rank at the same speed).
**DDR5 rank vs DDR4**: A DDR5 single-rank DIMM already has 8 bank groups (as many as a dual-rank DDR4 DIMM), so moving to a higher rank count helps less on DDR5 than on DDR4.
**Rule**: Prefer dual-rank DIMMs over single-rank for higher density and bandwidth. Quad-rank and octa-rank DIMMs are available only as LRDIMM or 3DS RDIMM.
### DIMM population: basic rules
#### 1DPC vs 2DPC (DIMMs Per Channel)
| Configuration | DIMMs per channel | Max speed DDR5 | Bandwidth | Capacity |
|------------|-----------------|---------------|-----------|----------|
| **1DPC** | 1 | 4800-5600 MT/s | 100 % | Lower |
| **2DPC** | 2 | 4000-4400 MT/s | ~80 % | Higher |
**Important**: Populating 2 DIMMs per channel reduces memory speed. E.g. Dell R760:
- 1DPC: 5600 MT/s (with 5th Gen Xeon)
- 2DPC: 4400 MT/s (always)
#### Channel architecture (Intel Xeon 4th/5th Gen — 8 channels per CPU)
```
CPU 1 ─ Channel A  [Slot A1 (white)]  [Slot A9  (black)]    1DPC: populate white slots
      ─ Channel B  [Slot A7 (white)]  [Slot A15 (black)]    2DPC: populate white + black
      ─ Channel C  [Slot A3 (white)]  [Slot A11 (black)]
      ─ Channel D  [Slot A5 (white)]  [Slot A13 (black)]
      ─ Channel E  [Slot A4 (white)]  [Slot A12 (black)]
      ─ Channel F  [Slot A6 (white)]  [Slot A14 (black)]
      ─ Channel G  [Slot A2 (white)]  [Slot A10 (black)]
      ─ Channel H  [Slot A8 (white)]  [Slot A16 (black)]
```
#### Channel architecture (AMD EPYC — 12 channels per CPU)
```
CPU 1 ─ Channel 0-11 (12× single channel, 2 DPC)
Slot A0 (P0) / Slot A1 (P1), depending on the specific server model
```
AMD EPYC has 12 memory channels per CPU (vs 8 on 4th/5th Gen Intel Xeon), giving 50 % higher theoretical memory bandwidth at the same DIMM speed.
### Population rules by vendor
#### Dell PowerEdge (R660 / R760)
| Number of DIMMs per CPU | 1DPC (white slots) | 2DPC (white + black) | Speed |
|-------------------|-------------------|---------------------|-------|
| **1 DIMM per CPU** | A1 (Channel A) | — | 5600 MT/s |
| **2 DIMMs per CPU** | A1, A7 | — | 5600 MT/s |
| **4 DIMMs per CPU** | A1, A7, A3, A5 | — | 5600 MT/s |
| **8 DIMMs per CPU** | A1-A8 (all white) | — | 5600 MT/s |
| **16 DIMMs per CPU** | A1-A8 (white) | A9-A16 (black) | 4400 MT/s |
**Key Dell rules**:
1. All DIMMs must be DDR5 (do not mix generations)
2. Do not mix DIMM capacities (all identical)
3. Do not mix x4 and x8 DRAM chips
4. Do not mix 3DS and non-3DS RDIMM
5. If DIMM speeds are mixed, all DIMMs run at the lowest speed
6. Balance capacity across processors
7. Optimal configuration in a dual-socket server: 16× identical DIMMs (8 per CPU, 1DPC on each channel)
8. Fault Resilient Memory (FRM): only 8 or 16 DIMMs per processor
#### HPE ProLiant (DL360 / DL380 Gen11)
**Population order** (16 slots per CPU, Intel):
| DIMMs | Population order |
|-------|---------------|
| 1 | 10 |
| 2 | 1, 3 |
| 4 | 1, 3, 7, 10 |
| 6 | 3, 5, 7, 10, 14, 16 |
| 8 | 1, 3, 5, 7, 10, 12, 14, 16 |
| 12 | 1, 2, 3, 5, 6, 7, 10, 11, 12, 14, 15, 16 |
| 16 | 1-16 |
**HPE SmartMemory rules**:
1. Preferred (best-qualified) configuration: 1DPC (white slots)
2. Populate 2DPC (black slots) only after all white slots are filled
3. HBM + 4th Gen Intel: hemisphere (Hemi) mode and SGX are not supported
4. Heterogeneous mix: place the DIMMs with the higher rank count in the white slots
5. **Do not mix**: 3DS with non-3DS, x4 with x8, different ranks within a channel, or different DRAM densities (16 Gb / 24 Gb / 32 Gb)
#### HPE Gen11/Gen12 with AMD EPYC 9005 (a50012817enw)
AMD EPYC 9005 (Turin) delivers 12 memory channels per CPU and supports DDR5-6400.
| Feature | Detail |
|-----------|--------|
| **Memory channels** | 12 per CPU (vs 8 on Intel) |
| **Max DIMM slots** | 24 per CPU (2 DPC) |
| **Max speed** | DDR5-6400 (1 DPC), DDR5-4800 to DDR5-5600 (2 DPC) |
| **Max capacity** | 6 TB (24× 256 GB 3DS RDIMM, 2 DPC) |
| **DIMM types** | RDIMM (1R/2R/4R/8R), 3DS RDIMM, LRDIMM |
| **Population** | 1 DPC (white slots): 12 DIMMs, full speed; 2 DPC: 24 DIMMs, reduced speed |
| **Optimum** | 12× identical DIMMs (1 DPC on each channel) = max bandwidth |
**Rules for AMD EPYC 9005:**
1. Populate with equal capacities within a channel
2. 1 DPC = full speed 6400 MT/s, 2 DPC = lower speed
3. For optimal bandwidth: 12 DIMMs (1DPC) per CPU, so that all 12 channels are used
4. Maximum capacity: 24 DIMMs (2DPC), i.e. 24× 256 GB = 6 TB per CPU
5. Do not mix RDIMM and LRDIMM in the same system
### Memory population — decision flow
```
How many DIMMs per CPU? (8-channel Intel Xeon)
├── 1 DIMM → Channel A (slot 1), losing 87.5 % bandwidth
├── 2 DIMMs → Channels A+B, still losing 75 % bandwidth
├── 4 DIMMs → Channels A,B,C,D, better but not optimal
├── 8 DIMMs → 1DPC on all channels = MAX SPEED (5600 MT/s)
│ ✅ Recommended for performance
├── 12 DIMMs → 4 channels at 1DPC + 4 channels at 2DPC = unbalanced, mixed speed (4400 MT/s)
├── 16 DIMMs → 2DPC on all channels = MAX CAPACITY (4400 MT/s)
│ ✅ For capacity-intensive workloads
└── More than 16 → LRDIMM / 3DS only, speed penalty
Conclusion: 8 DIMMs per CPU (1DPC) = highest performance
16 DIMMs per CPU (2DPC) = highest capacity
```
### Impact of configuration on performance
| Configuration | Relative bandwidth | Latency | Use case |
|------------|-------------------|---------|----------|
| **1DPC, 8 ch, 5600 MT/s** (8 DIMM) | 100 % | Lowest | OLTP databases, HPC, real-time |
| **2DPC, 8 ch, 4400 MT/s** (16 DIMM) | ~78 % | +10-15 % | Virtualization, VDI, in-memory DB |
| **Mixed 1+2DPC** (12 DIMM) | ~85 % | Medium | Capacity/performance trade-off |
| **Unbalanced channels** | 50-70 % | High | **Avoid** |
**Vendor recommendations:**
- **Dell**: 16 identical DIMMs (8 per CPU), 1DPC, 5600 MT/s = optimal performance
- **HPE Intel**: Always populate the white slots first; 1DPC for max performance, 2DPC for max capacity
- **HPE AMD EPYC 9005**: 12 channels per CPU; 1DPC = 12 DIMMs per CPU at 6400 MT/s (max bandwidth); 2DPC = 24 DIMMs per CPU (max capacity 6 TB)
- **Supermicro**: Follow the manual for the specific model (DSG, GPU, storage)
- **Lenovo**: Same rules as the underlying Intel/AMD platform; prefer 1DPC
### Memory sizing per workload
| Workload | RAM per core | Typical pool | Recommended configuration |
|----------|---------------|--------------|----------------------|
| Database (OLTP) | 8-16 GB/core, DB in RAM | 256 GB - 2 TB | 8× 32-64 GB RDIMM, 1DPC |
| Database (OLAP) | 16-64 GB/core, columnstore | 512 GB - 4 TB+ | 16× 64-128 GB RDIMM, 2DPC |
| Virtualization (VM) | 4-8 GB/core, depending on VM density | 256 GB - 2 TB | 8-16× 32-64 GB RDIMM |
| Kubernetes (general) | 2-4 GB/core | 64-256 GB | 8× 16-32 GB RDIMM, 1DPC |
| AI training (CPU preprocessing) | 2-4 GB/core | 128-512 GB | 8× 32-64 GB RDIMM, 1DPC |
| HPC | 1-2 GB/core | 64-128 GB | 8× 16 GB RDIMM, 1DPC, high-speed |
| In-memory DB (SAP HANA) | 8-32 GB/core | 1-6 TB+ | 16× 128-256 GB LRDIMM/3DS |
| Big Data: Spark worker | 4-8 GB/core | 128-512 GB | 8-16× 32-64 GB RDIMM, 1DPC, NVMe scratch |
| Big Data: Flink worker | 8-16 GB/core (incl. managed state) | 128-512 GB | 8-16× 32-64 GB RDIMM, 1DPC, RocksDB on NVMe |
| Big Data: Trino worker | 4-8 GB/core | 64-256 GB | 8× 16-32 GB RDIMM, 1DPC |
| Big Data: HDFS DataNode | 1-2 GB/core (metadata cache) | 64-256 GB | 8× 16-32 GB RDIMM, 1DPC, max storage density |
## PCIe
| Generation | Year | Speed per lane | x16 throughput | 24 lanes (aggregate) |
|----------|-----|-------------------|-----------------|-----------|
| **PCIe 3.0** | 2010 | 985 MB/s | 15.8 GB/s | 23.6 GB/s |
| **PCIe 4.0** | 2017 | 1.97 GB/s | 31.5 GB/s | 47.3 GB/s |
| **PCIe 5.0** | 2022 | 3.94 GB/s | 63 GB/s | 94.5 GB/s |
| **PCIe 6.0** | 2025 | 7.88 GB/s | 126 GB/s | 189 GB/s |
**PCIe lane allocation**:
- GPU (x16): NVIDIA H100, AMD MI300X
- NVMe U.2 (x4): each NVMe drive
- NIC 100 GbE (x16): dual-port 100 GbE
- RAID/HBA (x8): storage controller
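**CPU PCIe lane count**:
- Intel Xeon Scalable (4th gen): 64-80 lanes per socket
- AMD EPYC (4th gen, Genoa): 128 lanes per socket
- Dual-socket EPYC: 128-160 usable lanes in total, not 256, because part of each socket's lanes carries the inter-socket xGMI (Infinity Fabric) links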
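The per-lane figures in the PCIe table and the lane allocation list can be reproduced with a short calculation. This sketch assumes 128b/130b encoding for PCIe 3.0 to 5.0 and uses a hypothetical device mix; PCIe 6.0 is left out because it uses FLIT-based encoding.

```python
# Raw signalling rate per lane in GT/s (PCIe 3.0 to 5.0 use 128b/130b encoding)
GT_PER_S = {"3.0": 8, "4.0": 16, "5.0": 32}


def lane_gbs(gen: str) -> float:
    """Usable throughput of one lane in GB/s, per direction."""
    return GT_PER_S[gen] * (128 / 130) / 8


def link_gbs(gen: str, lanes: int) -> float:
    """Usable throughput of a link with the given lane count."""
    return lane_gbs(gen) * lanes


def lanes_needed(devices: dict[str, tuple[int, int]]) -> int:
    """Sum of lanes for a device mix given as name -> (count, link width)."""
    return sum(count * width for count, width in devices.values())


if __name__ == "__main__":
    # Hypothetical build, not taken from any vendor configuration
    build = {"GPU": (4, 16), "NVMe U.2": (8, 4), "NIC 100 GbE": (2, 16), "HBA": (1, 8)}
    print(f"PCIe 5.0 x16: {link_gbs('5.0', 16):.1f} GB/s")
    print(f"Lanes needed: {lanes_needed(build)}")
```

Comparing `lanes_needed` with the lane count of the chosen CPU shows early whether a build needs PCIe switches or a second socket.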
## NUMA
### Topology
```
Socket 0 (NUMA node 0) Socket 1 (NUMA node 1)
├── Cores 0-31 ├── Cores 32-63
├── Memory 0-256 GB ├── Memory 256-512 GB
├── PCIe root complex (GPU, NVMe) ├── PCIe root complex (NIC, NVMe)
└── I/O hub └── I/O hub
│ │
└───────── Infinity Fabric / UPI ──┘
```
- **Local access**: CPU → its own memory (low latency, full bandwidth)
- **Remote access**: CPU → memory of the other socket (higher latency, ~1.5-2×, lower bandwidth)
- NUMA-aware applications: databases, VMs, DPDK, AI training
### Cross-NUMA penalty
| CPU | Local latency | Remote latency | Penalty |
|-----|--------------|----------------|---------|
| AMD EPYC (Genoa) | ~80 ns | ~150 ns | ~1.9× |
| Intel Xeon (Sapphire Rapids) | ~90 ns | ~160 ns | ~1.8× |
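To judge how much the cross-NUMA penalty matters for a workload, the local and remote latencies can be combined into a weighted average. A minimal sketch, assuming the share of remote accesses is known (for example from profiling); the 30 % used here is purely illustrative.

```python
def average_latency_ns(local_ns: float, remote_ns: float, remote_share: float) -> float:
    """Expected memory latency when a fraction of accesses goes to the remote socket."""
    if not 0.0 <= remote_share <= 1.0:
        raise ValueError("remote_share must be between 0 and 1")
    return local_ns * (1.0 - remote_share) + remote_ns * remote_share


if __name__ == "__main__":
    # Latency values from the table above; 30 % remote accesses is an assumption
    print(f"{average_latency_ns(80, 150, 0.30):.0f} ns")
```

Pinning a VM or process and its memory to one NUMA node drives `remote_share` toward zero, which is the point of NUMA-aware placement.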
## TDP and cooling
| CPU | TDP | Core count | Cooling |
|-----|-----|-----------|----------|
| Intel Xeon Platinum 8480+ | 350 W | 56 | Air (high-performance) |
| Intel Xeon 6980P (Granite Rapids) | 500 W | 128 | Liquid recommended |
| AMD EPYC 9654 (Genoa) | 360 W | 96 | Air / Liquid |
| AMD EPYC 9965 (Turin) | 500 W | 192 | Liquid recommended |
### Cooling requirements per rack density
| Rack density | kW/rack | Cooling |
|-------------|---------|---------|
| Low | 1-5 kW | Free air cooling |
| Medium | 5-15 kW | CRAC/CRAH, hot/cold aisle |
| High | 15-40 kW | In-row cooling, rear-door HX |
| Ultra | 40-100+ kW | Direct-to-chip liquid, immersion |
## BMC a management
| Vendor | BMC | API | Remote console | Features |
|--------|-----|-----|---------------|----------|
| **Dell** | iDRAC (9/10) | Redfish, RACADM | Virtual Console (HTML5) | Lifecycle Controller, SUU |
| **HPE** | iLO (5/6) | Redfish, iLOREST | Integrated Remote Console | Smart Update Manager (SUM) |
| **Supermicro** | BMC / IPMI | IPMI, Redfish | IPMIView, HTML5 KVM | SuperDoctor, SSM |
| **Lenovo** | XClarity Controller | Redfish, IPMI | Remote Console | XClarity Administrator |
| **Cisco** | CIMC / UCSM | Redfish, XML API | KVM Console | UCS Manager, Intersight |
### Standard functions
- Power: on/off/cycle/reset
- Boot: one-shot PXE, CD-ROM redirect, BIOS setup
- Monitoring: sensors (temp, voltage, fan, PSU)
- Alerting: SNMP traps, email, Redfish events
- Remote media: ISO mount over the network
- Serial over LAN (SOL)
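Since every BMC in the table speaks Redfish, the power and boot functions above map onto a small set of standard requests. The sketch below only builds the HTTP method, path and JSON body (it sends nothing); the system ID `"1"` is a placeholder, because the actual ID differs per vendor and should be discovered from `/redfish/v1/Systems`.

```python
import json

# ResetType values defined by the Redfish ComputerSystem schema
RESET_TYPES = {"On", "ForceOff", "GracefulShutdown", "GracefulRestart",
               "ForceRestart", "Nmi", "ForceOn", "PushPowerButton", "PowerCycle"}


def reset_request(system_id: str, reset_type: str = "ForceRestart"):
    """Return (method, path, body) for a power action on one system."""
    if reset_type not in RESET_TYPES:
        raise ValueError(f"unknown ResetType: {reset_type}")
    path = f"/redfish/v1/Systems/{system_id}/Actions/ComputerSystem.Reset"
    return "POST", path, {"ResetType": reset_type}


def pxe_once_request(system_id: str):
    """Return (method, path, body) for a one-shot PXE boot override."""
    body = {"Boot": {"BootSourceOverrideEnabled": "Once",
                     "BootSourceOverrideTarget": "Pxe"}}
    return "PATCH", f"/redfish/v1/Systems/{system_id}", body


if __name__ == "__main__":
    for method, path, body in (pxe_once_request("1"), reset_request("1")):
        print(method, path, json.dumps(body))
```

In practice the requests would be sent over HTTPS with authentication, and the reset target is best read from the `Actions` property of the system resource rather than hard-coded.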
## Vendors and product lines
| Vendor | Rack series | Blade series | Management |
|---------|-------------|-------------|------------|
| **Dell** | PowerEdge R6xx/R7xx (R660, R760) | MX7000, FX2 | iDRAC, OpenManage Enterprise |
| **HPE** | ProLiant DL (DL360, DL380) | Synergy, BladeSystem | iLO, OneView, OpsRamp |
| **Cisco** | UCS C-Series (C240, C245) | UCS B-Series, Fabric Interconnect | UCS Manager, Intersight |
| **Lenovo** | ThinkSystem SR (SR630, SR650) | ThinkSystem SN | XClarity |
| **Supermicro** | SuperServer (for GPU, storage, cloud) | FatTwin, MicroBlade | IPMI, SuperDoctor |
## Server connectivity
Detailed chapter on network and storage connectivity: [CONNECTIVITY.md](CONNECTIVITY.md)
## Storage controllers
| Controller | Type | RAID | Cache | Protocol |
|-----------|-----|------|-------|----------|
| **Dell PERC** (H755, H965) | HW RAID | 0/1/5/6/10/50/60 | 4-8 GB NV | NVMe, SAS, SATA |
| **Broadcom / LSI** (9560, 9670) | HW RAID / HBA | 0/1/5/6/10/50/60 | 4 GB NV | NVMe, SAS, SATA |
| **Intel VROC** | SW RAID (CPU) | 0/1/5/10 | — | NVMe only |
| **M.2 HW RAID** (BOSS-S1) | HW RAID | 0/1 | — | 2× M.2 NVMe/SATA |
### IT vs HW RAID mode
| Property | IT (Initiator Target) / HBA | HW RAID |
|-----------|---------------------------|---------|
| **OS sees** | Each disk individually | RAID virtual disk |
| **Caching** | OS cache | RAID controller cache (BBU) |
| **RAID** | Software (mdadm, ZFS, Ceph) | Hardware + SW driver |
| **Passthrough** | Yes | No |
| **Use case** | SDS (Ceph, MinIO), ZFS | VMware VMFS, Windows, legacy |
| **Battery/Backup** | Not needed | Write-back cache requires a BBU |
## Prices (2026)
### CPU prices (2026)
| CPU | Cores | TDP | 1ku price | $/core |
|-----|-------|-----|----------|--------|
| AMD EPYC 9965 (Turin) | 192 | 500 W | ~$11 988 | $62 |
| AMD EPYC 9655 (Turin) | 96 | 400 W | ~$6 500 | $68 |
| AMD EPYC 9475F (Turin) | 48 | 360 W | ~$5 000 | $104 |
| Intel Xeon 6980P (Granite Rapids) | 128 | 500 W | ~$12 460 | $97 |
| Intel Xeon 6980P (Granite Rapids-AP) | 128 | 500 W | $13 955 | $109 |
| Intel Xeon 6767P (Granite Rapids) | 64 | 350 W | ~$7 000 | $109 |
Sources: AMD 1ku pricing, Intel RCP, Newegg verified.
### DDR5 RDIMM prices (2026, AI-driven price surge)
| Capacity | Speed | Price 2025 | Price Q2 2026 | Change |
|----------|---------|-----------|-------------|-------|
| 32 GB (2R×8) | DDR5-5600 | ~$95 | ~$400-550 | +400-500 % |
| 64 GB (2R×4) | DDR5-4800 | ~$180 | ~$700-900 | +400 % |
| 96 GB (2R×4) | DDR5-6400 | ~$300 | ~$1 200-1 600 | +400 % |
| 128 GB (2R×4) | DDR5-6400 | ~$450 | ~$1 800-2 500 | +450 % |
| 256 GB (LRDIMM) | DDR5-6400 | ~$900 | ~$4 000-5 000 | +450 % |
Trend: DDR5 prices have risen ~400-500 % since mid-2025 due to AI-driven demand. Further increases are expected in H2 2026. Source: Counterpoint, TrendForce.
### NVMe SSD prices (enterprise, 2026)
| Capacity | Type | Price 2024 | Price Q2 2026 | Change |
|----------|-----|-----------|-------------|-------|
| 1.92 TB | NVMe U.3 (read-intensive) | ~$200 | ~$500-600 | +150 % |
| 3.84 TB | NVMe U.3 (mixed-use) | ~$400 | ~$1 000-1 200 | +150 % |
| 7.68 TB | NVMe U.3 (mixed-use) | ~$800 | ~$2 000-2 500 | +150 % |
| 15.36 TB | NVMe U.3 (mixed-use) | ~$1 500 | ~$4 000-5 000 | +170 % |
Trend: NAND flash prices have risen ~100-200 % since 2025; the average enterprise SSD costs 2-3× more. Source: TrendForce, Xinnor.
### Total server price (example configurations)
| Configuration | CPU | RAM | Disk | Estimated price |
|-------------|-----|-----|------|-----------|
| DB server (OLTP) | 2× EPYC 9655 (96C) | 1 TB DDR5 | 6× 1.92 TB NVMe | ~$45 000-60 000 |
| GPU server (AI) | 2× Xeon 6980P | 2 TB DDR5 | 4× 3.84 TB NVMe | ~$80 000-120 000 (without GPU) |
| Hypervisor host | 2× EPYC 9475F (48C) | 512 GB DDR5 | 2× 1.92 TB NVMe + 4× 16 TB HDD | ~$25 000-35 000 |
| Storage server (Ceph) | 1× EPYC 9655 (96C) | 256 GB DDR5 | 24× 15.36 TB NVMe | ~$60 000-80 000 |
## Sources
Links, books and standards: [sources/infrastructure/sources.md](sources/infrastructure/sources.md)
*Last revision: 2026-06-03*

334
STORAGE.en.md Normal file

@@ -0,0 +1,334 @@
# 💾 Storage infrastructure
## Storage types
| Type | Description | Latency | Use case |
|-----|-------|---------|----------|
| **DAS** (Direct Attached) | Disks directly in server | <0.1 ms | OS, cache, local data |
| **SAN** (Storage Area Network) | Block devices over network | <1 ms | Databases, VM datastores |
| **NAS** (Network Attached Storage) | File access (NFS, SMB) | 1-3 ms | Shared files, home dirs |
| **Object storage** | REST API, flat namespace | 10-100 ms | Backups, media, big data |
## Protocols
| Protocol | Type | Speed | Note |
|----------|-----|----------|----------|
| **Fibre Channel** | SAN | 8/16/32/64 Gbps | Low latency, dedicated network |
| **iSCSI** | SAN (IP) | 1/10/25 GbE | Cheaper, over ethernet |
| **NVMe-oF** | SAN (NVMe) | 25/50/100 GbE | Lowest latency, emerging |
| **NFS** | NAS | 1/10/25 GbE | Universal, simple |
| **SMB/CIFS** | NAS | 1/10/25 GbE | Windows native |
| **S3 API** | Object | — | Standard for object storage |
## RAID
| RAID | Min. disks | Capacity | Protection | Read speed | Write speed | Use case |
|------|-----------|----------|---------|---------------|----------------|----------|
| **0** | 2 | 100 % | None | N × (striping) | N × | Temp data, cache (risky) |
| **1** | 2 | 50 % | 1 disk | N × (mirror) | 1 × | OS disk, critical data |
| **5** | 3 | 67-94 % | 1 disk | N-1 × | N-1 × (parity write penalty) | Universal file/VM storage |
| **6** | 4 | 50-88 % | 2 disks | N-2 × | N-2 × (double parity) | Large capacities, important data |
| **10** | 4 | 50 % | 1/mirror | N × | N/2 × | Databases, VM, high-performance |
| **50** | 6 | 67-94 % | 1/stripe | N-1 × | N-1 × | Large capacity + performance |
| **60** | 8 | 50-88 % | 2/stripe | N-2 × | N-2 × | Enterprise |
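The capacity ranges in the RAID table follow from how many disks are given up to redundancy. A small sketch that reproduces them, assuming equal-sized disks and, for RAID 50/60, equal-sized parity groups (`usable_fraction` is an illustrative helper, not a vendor tool):

```python
MIN_DISKS = {0: 2, 1: 2, 5: 3, 6: 4, 10: 4, 50: 6, 60: 8}


def usable_fraction(level: int, disks: int, groups: int = 1) -> float:
    """Fraction of raw capacity that remains usable for a RAID level."""
    if level not in MIN_DISKS:
        raise ValueError(f"unsupported RAID level: {level}")
    if disks < MIN_DISKS[level]:
        raise ValueError(f"RAID {level} needs at least {MIN_DISKS[level]} disks")
    if level == 0:
        return 1.0                      # striping only, no redundancy
    if level == 1:
        return 1 / disks                # every disk holds a full copy
    if level == 10:
        return 0.5                      # striped two-way mirrors
    if level in (5, 6):
        groups = 1                      # a single parity group
    elif groups < 2:
        raise ValueError("RAID 50/60 needs at least 2 parity groups")
    parity_per_group = 1 if level in (5, 50) else 2
    return (disks - parity_per_group * groups) / disks


if __name__ == "__main__":
    for level, disks in ((5, 3), (5, 16), (6, 4), (6, 16)):
        print(f"RAID {level}, {disks} disks: {usable_fraction(level, disks):.0%}")
```

With 3 to 16 disks this gives 67-94 % for RAID 5 and 50-88 % for RAID 6, matching the table.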
### Stripe size
- Small stripe (16-64 KB) — better IOPS, worse throughput (databases, OLTP)
- Large stripe (128-1024 KB) — better throughput, worse IOPS (video, media, backup)
- Write hole on RAID 5/6: if power is lost while a stripe is being written, data and parity can end up inconsistent (prevention: non-volatile or battery-backed write cache on the RAID controller)
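The write-speed column hides the parity write penalty: each small random write costs several back-end I/Os. A rough sketch using the commonly quoted penalties (2 for RAID 1/10, 4 for RAID 5, 6 for RAID 6); the disk count, per-disk IOPS and read/write mix are assumptions chosen for illustration.

```python
# Back-end I/Os caused by one front-end random write (commonly quoted values)
WRITE_PENALTY = {0: 1, 1: 2, 10: 2, 5: 4, 6: 6}


def effective_iops(raw_iops: float, level: int, write_share: float) -> float:
    """Front-end IOPS an array can serve for a given random read/write mix."""
    if not 0.0 <= write_share <= 1.0:
        raise ValueError("write_share must be between 0 and 1")
    read_share = 1.0 - write_share
    return raw_iops / (read_share + write_share * WRITE_PENALTY[level])


if __name__ == "__main__":
    raw = 8 * 200  # 8 disks at an assumed 200 IOPS each
    for level in (10, 5, 6):
        print(f"RAID {level}: {effective_iops(raw, level, 0.30):.0f} IOPS")
```

A controller write-back cache can absorb part of this penalty by coalescing writes into full stripes, so measured results may be better than the estimate.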
## Software-Defined Storage (SDS)
| Tool | Type | Use case |
|---------|-----|----------|
| **Ceph** | Object/Block/File (RADOS) | Universal SDS, OpenStack, Kubernetes |
| **MinIO** | Object (S3 API) | High-performance S3, AI/ML data lake |
| **GlusterFS** | Distributed File | Shared filesystem, POSIX |
| **Longhorn** | Block (Kubernetes) | K8s PVC, microservices |
| **Linstor** | Block (DRBD + LVM) | Linux SDS, Kubernetes |
| **VMware vSAN** | Block (HCI) | VMware ecosystem |
| **StarWind** | Block (HCI) | Hyper-V / VMware |
### Ceph
**Architecture**:
```
RADOS (Reliable Autonomic Distributed Object Store)
├── Monitors (MON) — cluster map, quorum (3/5)
├── Managers (MGR) — dashboard, balancer, orchestrator
├── OSDs (Object Storage Daemons) — data + replication
└── MDS (Metadata Server) — CephFS only
```
**CRUSH map** (Controlled Replication Under Scalable Hashing):
- Algorithm for calculating data placement (no central index)
- Layers: Root → Datacenter → Rack → Host → OSD
- Failure domain: replication across racks / hosts
- `ceph osd crush rule create-replicated replicated_rule default host`
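The key idea, computing placement from a hash instead of looking it up in a central index, can be illustrated with a toy example. The sketch below is not the CRUSH algorithm; it uses rendezvous (highest-random-weight) hashing as a stand-in and treats the host as the failure domain. All names (`place`, the host and OSD labels) are hypothetical.

```python
import hashlib


def _score(obj: str, osd: str) -> int:
    """Deterministic pseudo-random score for an (object, OSD) pair."""
    digest = hashlib.sha256(f"{obj}:{osd}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def place(obj: str, osds_by_host: dict[str, list[str]], replicas: int = 3) -> list[str]:
    """Pick one OSD on each of `replicas` different hosts for an object."""
    if replicas > len(osds_by_host):
        raise ValueError("not enough hosts for the requested replica count")
    candidates = []
    for host, osds in osds_by_host.items():
        best = max(osds, key=lambda osd: _score(obj, osd))   # best OSD on this host
        candidates.append((_score(obj, best), best))
    candidates.sort(reverse=True)                            # highest scores win
    return [osd for _, osd in candidates[:replicas]]


if __name__ == "__main__":
    cluster = {
        "host-a": ["osd.0", "osd.1"],
        "host-b": ["osd.2", "osd.3"],
        "host-c": ["osd.4", "osd.5"],
        "host-d": ["osd.6", "osd.7"],
    }
    print(place("rbd_data.1234", cluster))
```

Every client that knows the cluster map computes the same answer, which is why no lookup service is needed; real CRUSH adds weights, bucket types and the full hierarchy from root down to OSD.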
**Access interfaces**:
| Interface | Type | Use case |
|----------|-----|----------|
| **RBD** (RADOS Block Device) | Block | VM images, Kubernetes PVC (csi-rbd) |
| **RGW** (RADOS Gateway) | Object (S3/Swift API) | S3-compatible storage, backup |
| **CephFS** | File (POSIX) | Shared filesystem, home dirs |
| **NFS-Ganesha** | File (NFS) | NFS export over CephFS |
**Erasure coding**:
- K+M (data + parity chunks), e.g. 8+3 (8 data, 3 parity)
- More space-efficient than 3× replication (1.375× vs 3×)
- Higher CPU overhead, lower IOPS
- Recommended for cold data (RGW) instead of replication
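The 1.375× figure is simply (K+M)/K. A short sketch comparing an erasure-coded pool with 3× replication for the same raw capacity; the 1000 TB raw figure is an arbitrary example.

```python
def ec_overhead(k: int, m: int) -> float:
    """Raw bytes stored per logical byte for a K+M erasure-coded pool."""
    return (k + m) / k


def usable_tb(raw_tb: float, overhead: float) -> float:
    """Logical capacity available from a given raw capacity."""
    return raw_tb / overhead


if __name__ == "__main__":
    raw = 1000  # TB raw, example value
    print(f"EC 8+3 overhead: {ec_overhead(8, 3):.3f}x, usable {usable_tb(raw, ec_overhead(8, 3)):.0f} TB")
    print(f"3x replication:  3.000x, usable {usable_tb(raw, 3.0):.0f} TB")
```

An 8+3 pool survives the loss of any 3 chunks, so it also tolerates one more failure than 3× replication, which survives the loss of 2 copies.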
## Enterprise storage vendors
### Hitachi VSP (Virtual Storage Platform)
| Model | Architecture | Max capacity | IOPS / Latency | Protocols | Use case |
|-------|-------------|--------------|----------------|-----------|----------|
| **VSP 5200/5600** | Active-active, scale-up/out, 2-12 controllers | 69.3 PB raw, 287 PBe | 33M IOPS, 39 µs | FC-NVMe 32Gb, FC 16/32Gb, FICON 16Gb, iSCSI 10Gb | Mission-critical, mainframe, enterprise consolidation |
| **VSP E590/E790/E1090** | Symmetric active-active, up to 65 nodes/130 controllers | 10.62 PB raw (E1090) | 8.4M IOPS, <41 µs | FC 32Gb, iSCSI 25Gb, FC-NVMe 32Gb | Midrange enterprise, hybrid workloads |
**Key features**: SVOS common across entire portfolio, AI-driven data reduction 4:1 guarantee, Global-Active Device metro clustering, 8 nines availability (HW), 100% data availability guarantee.
---
### Huawei OceanStor Dorado
| Model | Architecture | Max capacity | IOPS / Latency | Protocols | Use case |
|-------|-------------|--------------|----------------|-----------|----------|
| **Dorado 8000/18000 V6** | SmartMatrix full-mesh, up to 32 controllers | 32 TB cache, 6400 SSD | 40M IOPS, 0.05 ms | FC 32/64Gb, FC-NVMe, iSCSI, NFS, SMB, NVMe/RoCE, S3 | Mission-critical, finance, govt, carrier |
| **Dorado 8000/18000 V7 (2025)** | SmartMatrix 4.0, up to 64/128 controllers | 500 PB+ | >100M IOPS, 0.03 ms | FC, RoCE, NVMe/TCP, NFS, SMB, S3 | AI workloads, converged block/file/object |
**Key features**: SmartMatrix survives 7/8 controllers, FlashEver (3-gen online HW upgrade in 10 years), RAID-TP (triple SSD failure), DPU-based SmartNIC, ML-based I/O prefetch, 100% ransomware detection (Tolly), #1 SPC-1 benchmark.
---
### Dell PowerStore & PowerMax
| Model | Architecture | Max capacity | IOPS / Latency | Protocols | Use case |
|-------|-------------|--------------|----------------|-----------|----------|
| **PowerStore 1500/5500/9500 (Gen 3)** | Active-active dual-node, PCIe Gen5, DDR5, RDMA 200GbE | 1.2 PB raw, 5.8 PBe | 3× IOPS vs Gen2 | FC 32/64Gb, iSCSI, NVMe/FC, NVMe/TCP, NFSv4, SMB3 | Midrange-to-high-end, VMware, containerized |
| **PowerMax 2500/8500** | Scale-out NVMe, Dynamic Fabric, up to 16 nodes | 8.8 PBe (2500), 18 PBe (8500) | 6 nines availability | FC 64Gb, FICON, NVMe/FC, NVMe/TCP, iSCSI, NFS, SMB | Mission-critical, mainframe, OLTP, cyber vault |
**Key features**: PowerStore 6:1 DRR guarantee, unified block/file/vVols out of box, Cyber Detect AI anomaly; PowerMax 5:1 DRR, Secure Snapshots 65M, SRDF/Metro, Flexible RAID up to 92% efficient, FIPS 140-3.
---
### HPE Alletra
| Model | Architecture | Max capacity | IOPS / Latency | Protocols | Use case |
|-------|-------------|--------------|----------------|-----------|----------|
| **Alletra 5000** | Active-active hybrid flash, dual controller | 1.2 PB raw | 99.9999% guarantee | FC, iSCSI | Mixed primary + secondary, cost-efficient hybrid |
| **Alletra 6000** | Active-active all-NVMe, dual controller | ~368 TB usable | <100 µs | FC, iSCSI | Business-critical DB, VDI, VMware |
| **Alletra 9000** | Active-active all-NVMe, multi-node scale-out | 24 PB+ usable | ~2-3M IOPS, <150 µs | FC, iSCSI, NVMe/FC | Mission-critical ERP, AI, consolidation |
| **Alletra Storage MP** | Disaggregated modular, block + file + object | 5.8 PB block, 11.8 PB object | 100% availability guarantee | FC, iSCSI, NVMe/FC, NFS, SMB, S3 | Multi-protocol consolidation, AI/analytics |
**Key features**: Triple Parity RAID (5000), InfoSight AI Ops, HPE GreenLake as-a-service, non-disruptive controller upgrades (MP), 100% data availability guarantee.
---
### Infinidat
| Model | Architecture | Max capacity | IOPS / Latency | Protocols | Use case |
|-------|-------------|--------------|----------------|-----------|----------|
| **InfiniBox SSA G4** | Triple-active controller, AMD EPYC PCIe 5.0, DDR5 | 1.97 PB usable / 5.9 PBe | 2.24M IOPS, 35 µs | FC 32Gb, 25/100GbE, NVMe-oF/TCP, iSCSI, NFS, SMB, S3 | Mission-critical Oracle/SQL, multi-site DR |
| **InfiniBox G4 Hybrid** | Triple-active hybrid (HDD + flash cache) | 10.9 PB raw / 32.8 PBe | 2.24M IOPS, 64 GB/s | FC, Ethernet, NVMe-oF, iSCSI, NFS, SMB, S3 | Backup, massive unstructured data |
**Key features**: Only 3-way active on the market, Neural Cache (ML-driven), InfiniRAID, Immutable snapshots, 100% availability + 1-min snapshot recovery guarantee, everything included in base price (no extra licensing).
---
### Pure Storage FlashArray
| Model | Architecture | Max capacity | IOPS / Latency | Protocols | Use case |
|-------|-------------|--------------|----------------|-----------|----------|
| **FlashArray//X (X20-X90 R5)** | Active-active, NVMe DirectFlash | 1.2 PB raw / 4.4 PBe | 250 µs, 5:1 DRR | FC, NVMe/FC, NVMe/RoCE, NVMe/TCP, iSCSI, NFS, SMB | Mission-critical DB, VMware, enterprise |
| **FlashArray//C (C50-C90 R5)** | Active-active, QLC DirectFlash | 4.2 PB raw / 16.3 PBe | 5:1 DRR | FC, NVMe-oF, iSCSI, NFS, SMB | Capacity-optimized, backup, file |
| **FlashArray//XL (XL190)** | Active-active, 40 DirectFlash modules | 1.9 PB raw / 9.4 PBe | >4M IOPS, <100 µs, 45 GB/s | FC 64Gb, 100GbE RoCE, NVMe/FC, NVMe/TCP, NFS, SMB | Largest DB consolidation, OLTP |
**Key features**: DirectFlash (no FTL layer), 99.9999% availability, Evergreen (never forklift upgrade), Purity OS unified across entire portfolio, ActiveCluster/ActiveDR, Pure1 AIOps.
---
### Lenovo ThinkSystem
| Model | Architecture | Max capacity | IOPS / Latency | Protocols | Use case |
|-------|-------------|--------------|----------------|-----------|----------|
| **DM Series** (DM3200F/5200F/7200F) | Active-active, all-NVMe, NetApp ONTAP | 1.8 PB raw / 6.8 PBe | Up to 120 NVMe SSD | FC 64Gb, iSCSI, NVMe/FC, NFS, SMB, S3 | Unified block/file, AI/ML, VMware |
| **DG Series** (DG5200/7200) | Active-active, all-QLC, ONTAP | 7.4 PB raw / 27 PBe | QLC economics | FC, NVMe/FC, NVMe/TCP, iSCSI, NFS, SMB, S3 | Capacity-optimized, backup, archive |
| **DE Series** (DE4000F-DE6600F) | Active-active, SAS/NVMe hybrid | 1.84 PB raw | 2M IOPS, <100 µs, 44 GB/s | FC 32Gb, iSCSI 25Gb, NVMe/FC, SAS, NVMe/RoCE | HPC, analytics, video surveillance |
**Key features**: DM/DG use ONTAP (SnapMirror, SnapVault, FabricPool, RAID-DP/RAID-TEC); cluster scale-out up to 12 HA pairs; DE series best price/performance in portfolio.
---
### Synology
| Model | Architecture | Max capacity | Protocols | Use case |
|-------|-------------|--------------|-----------|----------|
| **UC3200/UC3400** | Active-active dual-controller, SAS backend | 576 TB raw | iSCSI, FC 16Gb, 10/25GbE | SMB/midmarket SAN, VMware, HA |
| **DS/RS Series** (RS3626xs+, RS6426xs+) | Single-controller / HA pair, Btrfs | 864 TB raw, 1 PB volume | SMB, NFS, iSCSI, FC (HBA) | SME all-in-one NAS/SAN, backup, surveillance |
**Key features**: DSM UC for SAN, Synology HA, Snapshot Replication (16K snapshots), VMware VAAI/ODX/ALUA, Surveillance Station, low TCO.
---
### Vendor comparison — overview
| Vendor | Flagship | Max IOPS | Max capacity | Latency | Availability guarantee | Main differentiator |
|--------|----------|----------|-------------|---------|---------------------|----------------------|
| **Hitachi** | VSP 5600 | 33M | 287 PBe | 39 µs | 8 nines (HW) | Mainframe + open; 65-node cluster |
| **Huawei** | Dorado 18000 V7 | >100M | 500 PB+ | 0.03 ms | 99.99999% | SmartMatrix; #1 SPC-1 |
| **Dell** | PowerMax 8500 | — | 18 PBe | — | 6 nines | SRDF/Metro; mainframe |
| **HPE** | Alletra 9000/MP | ~3M | 11.8 PBe | <150 µs | 100% data guarantee | InfoSight AIOps; GreenLake |
| **Infinidat** | InfiniBox SSA G4 | 2.24M | 32.8 PBe | 35 µs | 100% availability | 3-way active; Neural Cache |
| **Pure** | FlashArray//XL | >4M | 16.3 PBe | <100 µs | 99.9999% | DirectFlash; Evergreen |
| **Lenovo** | DM7200F | — | 27 PBe | — | — | ONTAP ecosystem; broad portfolio |
| **Synology** | UC3400 | 690K | 576 TB | — | — | Lowest price for active-active SAN |
---
### Storage selection by use case
| Use case | Recommendation | Rationale |
|----------|-----------|-------------|
| **Mainframe + open hybrid** | Hitachi VSP / Dell PowerMax | Only ones with FICON + FC simultaneously |
| **AI/ML training** | Huawei Dorado V7 / Pure //XL | Highest IOPS, lowest latency |
| **Enterprise DB (Oracle, SQL Server)** | Infinidat / Pure //X | Low latency, consistent performance |
| **Virtualization (VMware, Hyper-V)** | Dell PowerStore / HPE Alletra 6000 | VAAI, vVols, InfoSight |
| **SMB / SME** | Synology / Lenovo DE | Low TCO, simple management |
| **Object storage / backup** | Pure //C / Lenovo DG / Infinidat Hybrid | QLC economics, high capacity |
| **Multi-protocol consolidation** | HPE Alletra MP / Huawei Dorado | Block + file + object in one platform |
## Decision diagram — storage platform selection
```mermaid
flowchart TD
Start(["Storage requirement"]) --> PROTO{"Access type"}
PROTO -->|"Block (SAN)"| BLOCK
PROTO -->|"File (NAS)"| FILE
PROTO -->|"Object"| OBJECT
BLOCK --> BPERF{"Performance tier"}
BPERF -->|"Tier 0/1<br/>< 100 µs, > 1M IOPS"| BT1["Infinidat / Pure //XL<br/>Huawei Dorado V7<br/>FC-NVMe, NVMe-oF"]
BPERF -->|"Tier 2<br/>100-500 µs"| BT2["Dell PowerStore / HPE Alletra 6000<br/>Hitachi VSP / Lenovo DM<br/>FC 32G, iSCSI 25GbE"]
BPERF -->|"Tier 3<br/>SME / low-cost"| BT3["Synology UC3400<br/>Lenovo DE / Dell PowerVault<br/>iSCSI, SAS"]
BLOCK --> BECOS{"Ecosystem"}
BECOS -->|"Mainframe"| BMF["Hitachi VSP / Dell PowerMax<br/>FICON + FC simultaneously"]
BECOS -->|"VMware"| BVM["Dell PowerStore / HPE Alletra<br/>VAAI, vVols, InfoSight"]
BECOS -->|"Oracle / SQL Server"| BDB["Infinidat / Pure //X<br/>Lowest latency"]
FILE --> FSIZE{"Scaling"}
FSIZE -->|"Enterprise"| FE["HPE Alletra MP (file)<br/>Lenovo DM / Dell PowerScale<br/>NFS, SMB, multi-protocol"]
FSIZE -->|"SMB"| FS["Synology DS/RS<br/>Lenovo DE / TrueNAS<br/>Btrfs, NFS, SMB, low TCO"]
OBJECT --> OUSE{"Use case"}
OUSE -->|"Backup / archive"| OB["Pure //C / Infinidat Hybrid<br/>Lenovo DG<br/>QLC, erasure coding, low cost/TB"]
OUSE -->|"AI/ML data lake"| OM["MinIO / Pure //C<br/>High throughput S3<br/>NVMe direct, erasure coding"]
OUSE -->|"Kubernetes PVC"| OK["Ceph RBD / Longhorn / Linstor<br/>SDS on K8s<br/>CSI, replication, snapshots"]
```
## OpenStack Storage
OpenStack offers three main storage services:
| Service | Type | Description |
|--------|-----|-------|
| **Cinder** | Block storage | Persistent volumes for instances (iSCSI, NFS, Ceph RBD) |
| **Swift** | Object storage | RESTful object store (S3-compatible via middleware) |
| **Manila** | File storage | Shared file systems (NFS, CIFS) as a managed service |
### Cinder (Block Storage)
- Multi-backend support: LVM, Ceph RBD, NFS, iSCSI, Fibre Channel
- Snapshotting, cloning, encryption at rest
- Cinder scheduler for volume distribution across backends
- QoS specs for IOPS/bandwidth limits
### Swift (Object Storage)
- Alternative to S3 for on-prem object storage
- Ring-based data distribution (consistent hashing)
- Multi-region replication (container sync)
- Stateless REST API with no single point of failure
### Manila (Shared File Systems)
- Managed NFS/CIFS for sharing between instances
- Backends: NetApp, Dell EMC, CephFS, GlusterFS
- Access rules (IP-based, cert-based, user-based)
- Use case: HPC cluster home directories, NAS for legacy apps
### Ceph as an OpenStack backend
Ceph is the most common storage backend for OpenStack: Cinder and Glance use RBD (volumes and images), Manila uses CephFS, and RGW can stand in for Swift by exposing a Swift-compatible API.
## Big Data storage
### HDFS cluster
HDFS is the primary storage for the Hadoop ecosystem (on-prem). Typical configuration:
| Parameter | Value | Note |
|-----------|-------|------|
| **Disk per DataNode** | 8-24 × HDD (14-22 TB) + 2× NVMe (metadata, cache) | Balance capacity / performance |
| **Replication factor** | 3× | Rack-aware |
| **Network** | 2× 25/100 GbE (data) + 1× 1 GbE (management) | Data + replication traffic |
| **RAM** | 64-256 GB (OS cache + metadata) | HDFS cache + OS buffer cache |
| **CPU** | 16-32 cores | HDFS overhead is low |
| **NameNode HA** | Active + Standby + JN (JournalNode) | Quorum-based HA |
| **Use case** | Sequential read/write, large files, Spark on YARN | |
**Model cluster (~0.7 PB usable):**
- 10× DataNode (12× 18 TB HDD, 2× 1.9 TB NVMe)
- 2× NameNode (HA, 256 GB RAM)
- 3× JournalNode (small VMs)
- Replication 3×: raw ~2.2 PB → ~0.72 PB usable (1 PB usable would need about 14 such DataNodes)
- Network: 25 GbE for data, 100 GbE for shuffle-heavy Spark
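As a rough cross-check of HDFS sizing, usable capacity is raw HDD capacity divided by the replication factor. The sketch below ignores reserved non-DFS space and free-space headroom, so real usable capacity will be somewhat lower; both helper names are illustrative.

```python
import math


def usable_tb(nodes: int, disks_per_node: int, disk_tb: float, replication: int = 3) -> float:
    """Logical HDFS capacity in TB for a homogeneous DataNode fleet."""
    return nodes * disks_per_node * disk_tb / replication


def nodes_for(target_usable_tb: float, disks_per_node: int, disk_tb: float,
              replication: int = 3) -> int:
    """DataNodes needed to reach a target logical capacity."""
    per_node_raw = disks_per_node * disk_tb
    return math.ceil(target_usable_tb * replication / per_node_raw)


if __name__ == "__main__":
    print(f"10 nodes, 12x 18 TB: {usable_tb(10, 12, 18):.0f} TB usable")
    print(f"Nodes for 1 PB usable: {nodes_for(1000, 12, 18)}")
```

With 12× 18 TB per node, ten DataNodes hold 2160 TB raw, and a full 1 PB of logical data at 3× replication needs about 3 PB raw.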
### Object storage as Data Lake (S3/GCS/MinIO)
For new projects (Spark on K8s, Iceberg/Delta, lakehouse), object storage is preferred over HDFS:
| Platform | Advantages | Limits |
|----------|-----------|--------|
| **MinIO** (on-prem) | S3 API, erasure coding, NVMe direct, high throughput | Single tenant (per cluster) |
| **Pure //C** (on-prem) | QLC NVMe, dedupe, S3 + NFS | Higher $/TB |
| **AWS S3** (cloud) | Unlimited capacity, Iceberg/Delta support | Egress fees |
| **Azure ADLS** (cloud) | Hierarchical namespace (HNS), POSIX-like ACLs | Vendor lock-in |
| **GCP GCS** (cloud) | Uniform + fine-grained ACLs, object versioning | Region restrictions |
### Comparison: HDFS vs Object Storage for Big Data
| Criteria | HDFS | Object Storage (S3/MinIO) |
|----------|------|-------------------------|
| **Architecture** | Master/worker (NameNode is a SPOF unless HA is configured) | Distributed, no SPOF (erasure coding) |
| **Consistency** | Strong (single writer per file) | Strong read-after-write (AWS S3 since December 2020, MinIO) |
| **Throughput** | High (rack-aware, locality) | High (network-bound) |
| **Scaling** | Horizontal (DataNode) | Horizontal (stateless) |
| **Cost** | Low (HDD) | Medium (S3 API) |
| **Metadata** | NameNode (1M blocks ~ 1 GB RAM) | Object-level (flat namespace) |
| **Spark integration** | Native (locality-optimized) | S3A connector, Hadoop Compatible |
| **2026 trend** | Legacy, declining | Standard for new projects |
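The metadata row's rule of thumb (about 1 GB of NameNode RAM per million blocks) can be turned into a quick heap estimate. This sketch assumes the HDFS default block size of 128 MiB and files large enough to fill their blocks; many small files inflate the block count and the result considerably.

```python
def namenode_heap_gb(logical_tb: float, block_mib: int = 128,
                     gb_per_million_blocks: float = 1.0) -> float:
    """Rough NameNode heap estimate from the logical data volume."""
    blocks = logical_tb * 1e12 / (block_mib * 2**20)   # number of HDFS blocks
    return blocks / 1e6 * gb_per_million_blocks


if __name__ == "__main__":
    # 720 TB of logical data is an example value
    print(f"{namenode_heap_gb(720):.1f} GB heap for 720 TB of logical data")
```

Object stores avoid this central bottleneck because metadata is kept per object rather than in one process's memory.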
For more information about Big Data see [BIG-DATA.en.md](BIG-DATA.en.md).
## Sources
Links, books and standards: [sources/infrastructure/sources.en.md](sources/infrastructure/sources.en.md)
### Recommended reading
| Book | Authors | ISBN | Description |
|-------|--------|------|-------|
| Storage Systems | Ganger, Gibson | 978-1680837540 | Textbook covering the design, implementation and operation of storage systems — from device characteristics through OS, databases and networking to server distribution and large-scale systems. An essential resource for storage infrastructure architects. |
*Last revision: 2026-06-03*

334
STORAGE.md Normal file

@@ -0,0 +1,334 @@
# 💾 Storage infrastructure
## Storage types
| Type | Description | Latency | Use case |
|-----|-------|---------|----------|
| **DAS** (Direct Attached) | Disks directly in server | <0.1 ms | OS, cache, local data |
| **SAN** (Storage Area Network) | Block devices over network | <1 ms | Databases, VM datastores |
| **NAS** (Network Attached Storage) | File access (NFS, SMB) | 1-3 ms | Shared files, home dirs |
| **Object storage** | REST API, flat namespace | 10-100 ms | Backups, media, big data |
## Protocols
| Protocol | Type | Speed | Note |
|----------|-----|----------|----------|
| **Fibre Channel** | SAN | 8/16/32/64 Gbps | Low latency, dedicated network |
| **iSCSI** | SAN (IP) | 1/10/25 GbE | Cheaper, over Ethernet |
| **NVMe-oF** | SAN (NVMe) | 25/50/100 GbE | Lowest latency, emerging |
| **NFS** | NAS | 1/10/25 GbE | Universal, simple |
| **SMB/CIFS** | NAS | 1/10/25 GbE | Windows native |
| **S3 API** | Object | - | Standard for object storage |
## RAID
| RAID | Min. disks | Capacity | Protection | Read speed | Write speed | Use case |
|------|-----------|----------|---------|---------------|----------------|----------|
| **0** | 2 | 100 % | None | N × (striping) | N × | Temp data, cache (risky) |
| **1** | 2 | 50 % | 1 disk | N × (mirror) | 1 × | OS disk, critical data |
| **5** | 3 | 67-94 % | 1 disk | N-1 × | N-1 × (parity write penalty) | Universal file/VM storage |
| **6** | 4 | 50-88 % | 2 disks | N-2 × | N-2 × (double parity) | Large capacities, important data |
| **10** | 4 | 50 % | 1/mirror | N × | N/2 × | Databases, VM, high-performance |
| **50** | 6 | 67-94 % | 1/stripe | N-1 × | N-1 × | Large capacity + performance |
| **60** | 8 | 50-88 % | 2/stripe | N-2 × | N-2 × | Enterprise |
### Stripe size
- Small stripe (16-64 KB): better IOPS, worse throughput (databases, OLTP)
- Large stripe (128-1024 KB): better throughput, worse IOPS (video, media, backup)
- Write hole on RAID 5/6: if power is lost while a stripe is being written, data and parity can end up inconsistent (prevention: non-volatile or battery-backed write cache on the RAID controller)
## Software-Defined Storage (SDS)
| Tool | Type | Use case |
|---------|-----|----------|
| **Ceph** | Object/Block/File (RADOS) | Universal SDS, OpenStack, Kubernetes |
| **MinIO** | Object (S3 API) | High-performance S3, AI/ML data lake |
| **GlusterFS** | Distributed File | Shared filesystem, POSIX |
| **Longhorn** | Block (Kubernetes) | K8s PVC, microservices |
| **Linstor** | Block (DRBD + LVM) | Linux SDS, Kubernetes |
| **VMware vSAN** | Block (HCI) | VMware ecosystem |
| **StarWind** | Block (HCI) | Hyper-V / VMware |
### Ceph
**Architecture**:
```
RADOS (Reliable Autonomic Distributed Object Store)
├── Monitors (MON) - cluster map, quorum (3/5)
├── Managers (MGR) - dashboard, balancer, orchestrator
├── OSDs (Object Storage Daemons) - data + replication
└── MDS (Metadata Server) - CephFS only
```
**CRUSH map** (Controlled Replication Under Scalable Hashing):
- Algorithm for calculating data placement (no central index)
- Layers: Root → Datacenter → Rack → Host → OSD
- Failure domain: replication across racks / hosts
- `ceph osd crush rule create-replicated replicated_rule default host`
**Access interfaces**:
| Interface | Type | Use case |
|----------|-----|----------|
| **RBD** (RADOS Block Device) | Block | VM images, Kubernetes PVC (csi-rbd) |
| **RGW** (RADOS Gateway) | Object (S3/Swift API) | S3-compatible storage, backup |
| **CephFS** | File (POSIX) | Shared filesystem, home dirs |
| **NFS-Ganesha** | File (NFS) | NFS export over CephFS |
**Erasure coding**:
- K+M (data + parity chunks), e.g. 8+3 (8 data, 3 parity)
- More space-efficient than 3× replication (1.375× vs 3×)
- Higher CPU overhead, lower IOPS
- Recommended for cold data (RGW) instead of replication
## Enterprise storage vendors
### Hitachi VSP (Virtual Storage Platform)
| Model | Architecture | Max capacity | IOPS / Latency | Protocols | Use case |
|-------|-------------|--------------|----------------|-----------|----------|
| **VSP 5200/5600** | Active-active, scale-up/out, 2-12 controllers | 69.3 PB raw, 287 PBe | 33M IOPS, 39 µs | FC-NVMe 32Gb, FC 16/32Gb, FICON 16Gb, iSCSI 10Gb | Mission-critical, mainframe, enterprise consolidation |
| **VSP E590/E790/E1090** | Symmetric active-active, up to 65 nodes/130 controllers | 10.62 PB raw (E1090) | 8.4M IOPS, <41 µs | FC 32Gb, iSCSI 25Gb, FC-NVMe 32Gb | Midrange enterprise, hybrid workloads |
**Key features**: SVOS common across the entire portfolio, AI-driven data reduction 4:1 guarantee, Global-Active Device metro clustering, 8 nines availability (HW), 100% data availability guarantee.
---
### Huawei OceanStor Dorado
| Model | Architecture | Max capacity | IOPS / Latency | Protocols | Use case |
|-------|-------------|--------------|----------------|-----------|----------|
| **Dorado 8000/18000 V6** | SmartMatrix full-mesh, up to 32 controllers | 32 TB cache, 6400 SSD | 40M IOPS, 0.05 ms | FC 32/64Gb, FC-NVMe, iSCSI, NFS, SMB, NVMe/RoCE, S3 | Mission-critical, finance, govt, carrier |
| **Dorado 8000/18000 V7 (2025)** | SmartMatrix 4.0, up to 64/128 controllers | 500 PB+ | >100M IOPS, 0.03 ms | FC, RoCE, NVMe/TCP, NFS, SMB, S3 | AI workloads, converged block/file/object |
**Key features**: SmartMatrix survives the loss of 7 of 8 controllers, FlashEver (3-gen online HW upgrade in 10 years), RAID-TP (triple SSD failure), DPU-based SmartNIC, ML-based I/O prefetch, 100% ransomware detection (Tolly), #1 SPC-1 benchmark.
---
### Dell PowerStore & PowerMax
| Model | Architecture | Max capacity | IOPS / Latency | Protocols | Use case |
|-------|-------------|--------------|----------------|-----------|----------|
| **PowerStore 1500/5500/9500 (Gen 3)** | Active-active dual-node, PCIe Gen5, DDR5, RDMA 200GbE | 1.2 PB raw, 5.8 PBe | 3× IOPS vs Gen 2 | FC 32/64Gb, iSCSI, NVMe/FC, NVMe/TCP, NFSv4, SMB3 | Midrange-to-high-end, VMware, containerized |
| **PowerMax 2500/8500** | Scale-out NVMe, Dynamic Fabric, up to 16 nodes | 8.8 PBe (2500), 18 PBe (8500) | 6 nines availability | FC 64Gb, FICON, NVMe/FC, NVMe/TCP, iSCSI, NFS, SMB | Mission-critical, mainframe, OLTP, cyber vault |
**Key features**: PowerStore 6:1 DRR guarantee, unified block/file/vVols out of the box, Cyber Detect AI anomaly detection; PowerMax 5:1 DRR, Secure Snapshots 65M, SRDF/Metro, Flexible RAID up to 92% efficient, FIPS 140-3.
---
### HPE Alletra
| Model | Architecture | Max capacity | IOPS / Latency | Protocols | Use case |
|-------|-------------|--------------|----------------|-----------|----------|
| **Alletra 5000** | Active-active hybrid flash, dual controller | 1.2 PB raw | 99.9999% availability guarantee | FC, iSCSI | Mixed primary + secondary, cost-efficient hybrid |
| **Alletra 6000** | Active-active all-NVMe, dual controller | ~368 TB usable | <100 µs | FC, iSCSI | Business-critical DB, VDI, VMware |
| **Alletra 9000** | Active-active all-NVMe, multi-node scale-out | 24 PB+ usable | ~2–3M IOPS, <150 µs | FC, iSCSI, NVMe/FC | Mission-critical ERP, AI, consolidation |
| **Alletra Storage MP** | Disaggregated modular, block + file + object | 5.8 PB block, 11.8 PB object | 100% availability guarantee | FC, iSCSI, NVMe/FC, NFS, SMB, S3 | Multi-protocol consolidation, AI/analytics |
**Key features**: Triple Parity RAID (5000), InfoSight AI Ops, HPE GreenLake as-a-service, non-disruptive controller upgrades (MP), 100% data availability guarantee.
---
### Infinidat
| Model | Architecture | Max capacity | IOPS / Latency | Protocols | Use case |
|-------|-------------|--------------|----------------|-----------|----------|
| **InfiniBox SSA G4** | Triple-active controller, AMD EPYC PCIe 5.0, DDR5 | 1.97 PB usable / 5.9 PBe | 2.24M IOPS, 35 µs | FC 32Gb, 25/100GbE, NVMe-oF/TCP, iSCSI, NFS, SMB, S3 | Mission-critical Oracle/SQL, multi-site DR |
| **InfiniBox G4 Hybrid** | Triple-active hybrid (HDD + flash cache) | 10.9 PB raw / 32.8 PBe | 2.24M IOPS, 64 GB/s | FC, Ethernet, NVMe-oF, iSCSI, NFS, SMB, S3 | Backup, massive unstructured data |
**Key features**: the only 3-way active architecture on the market, Neural Cache (ML-driven), InfiniRAID, immutable snapshots, 100% availability and 1-minute snapshot recovery guarantees, everything included in the base price (no extra licensing).
---
### Pure Storage FlashArray
| Model | Architecture | Max capacity | IOPS / Latency | Protocols | Use case |
|-------|-------------|--------------|----------------|-----------|----------|
| **FlashArray//X (X20–X90 R5)** | Active-active, NVMe DirectFlash | 1.2 PB raw / 4.4 PBe | 250 µs, 5:1 DRR | FC, NVMe/FC, NVMe/RoCE, NVMe/TCP, iSCSI, NFS, SMB | Mission-critical DB, VMware, enterprise |
| **FlashArray//C (C50–C90 R5)** | Active-active, QLC DirectFlash | 4.2 PB raw / 16.3 PBe | 5:1 DRR | FC, NVMe-oF, iSCSI, NFS, SMB | Capacity-optimized, backup, file |
| **FlashArray//XL (XL190)** | Active-active, 40 DirectFlash modules | 1.9 PB raw / 9.4 PBe | >4M IOPS, <100 µs, 45 GB/s | FC 64Gb, 100GbE RoCE, NVMe/FC, NVMe/TCP, NFS, SMB | Largest DB consolidation, OLTP |
**Key features**: DirectFlash (no FTL layer), 99.9999% availability, Evergreen (no forklift upgrades), Purity OS unified across the whole portfolio, ActiveCluster/ActiveDR, Pure1 AIOps.
---
### Lenovo ThinkSystem
| Model | Architecture | Max capacity | IOPS / Latency | Protocols | Use case |
|-------|-------------|--------------|----------------|-----------|----------|
| **DM Series** (DM3200F/5200F/7200F) | Active-active, all-NVMe, NetApp ONTAP | 1.8 PB raw / 6.8 PBe | Up to 120 NVMe SSDs | FC 64Gb, iSCSI, NVMe/FC, NFS, SMB, S3 | Unified block/file, AI/ML, VMware |
| **DG Series** (DG5200/7200) | Active-active, all-QLC, ONTAP | 7.4 PB raw / 27 PBe | QLC economics | FC, NVMe/FC, NVMe/TCP, iSCSI, NFS, SMB, S3 | Capacity-optimized, backup, archive |
| **DE Series** (DE4000F–DE6600F) | Active-active, SAS/NVMe hybrid | 1.84 PB raw | 2M IOPS, <100 µs, 44 GB/s | FC 32Gb, iSCSI 25Gb, NVMe/FC, SAS, NVMe/RoCE | HPC, analytics, video surveillance |
**Key features**: DM/DG run ONTAP (SnapMirror, SnapVault, FabricPool, RAID-DP/RAID-TEC); cluster scale-out up to 12 HA pairs; the DE series offers the best price/performance ratio in the portfolio.
---
### Synology
| Model | Architecture | Max capacity | Protocols | Use case |
|-------|-------------|--------------|-----------|----------|
| **UC3200/UC3400** | Active-active dual-controller, SAS backend | 576 TB raw | iSCSI, FC 16Gb, 10/25GbE | SMB/midmarket SAN, VMware, HA |
| **DS/RS Series** (RS3626xs+, RS6426xs+) | Single-controller / HA pair, Btrfs | 864 TB raw, 1 PB volume | SMB, NFS, iSCSI, FC (HBA) | SME all-in-one NAS/SAN, backup, surveillance |
**Key features**: DSM UC for SAN, Synology HA, Snapshot Replication (16K snapshots), VMware VAAI/ODX/ALUA, Surveillance Station, low TCO.
---
### Vendor comparison: overview
| Vendor | Flagship | Max IOPS | Max capacity | Latency | Availability guarantee | Main differentiator |
|--------|----------|----------|-------------|---------|---------------------|----------------------|
| **Hitachi** | VSP 5600 | 33M | 287 PBe | 39 µs | 8 nines (HW) | Mainframe + open; 65-node cluster |
| **Huawei** | Dorado 18000 V7 | >100M | 500 PB+ | 0.03 ms | 99.99999% | SmartMatrix; #1 SPC-1 |
| **Dell** | PowerMax 8500 | — | 18 PBe | — | 6 nines | SRDF/Metro; mainframe |
| **HPE** | Alletra 9000/MP | ~3M | 11.8 PBe | <150 µs | 100% data guarantee | InfoSight AIOps; GreenLake |
| **Infinidat** | InfiniBox SSA G4 | 2.24M | 32.8 PBe | 35 µs | 100% availability | 3-way active; Neural Cache |
| **Pure** | FlashArray//XL | >4M | 16.3 PBe | <100 µs | 99.9999% | DirectFlash; Evergreen |
| **Lenovo** | DM7200F | — | 27 PBe | — | — | ONTAP ecosystem; broad portfolio |
| **Synology** | UC3400 | 690K | 576 TB | — | — | Lowest price for an active-active SAN |
---
### Storage selection by use case
| Use case | Recommendation | Rationale |
|----------|-----------|-------------|
| **Mainframe + open hybrid** | Hitachi VSP / Dell PowerMax | The only listed platforms with FICON and FC at the same time |
| **AI/ML training** | Huawei Dorado V7 / Pure //XL | Highest IOPS, lowest latency |
| **Enterprise DB (Oracle, SQL Server)** | Infinidat / Pure //X | Low latency, consistent performance |
| **Virtualization (VMware, Hyper-V)** | Dell PowerStore / HPE Alletra 6000 | VAAI, vVols, InfoSight |
| **SMB / SME** | Synology / Lenovo DE | Low TCO, simple management |
| **Object storage / backup** | Pure //C / Lenovo DG / Infinidat Hybrid | QLC economics, high capacity |
| **Multi-protocol consolidation** | HPE Alletra MP / Huawei Dorado | Block + file + object on one platform |
## Decision diagram: storage platform selection
```mermaid
flowchart TD
Start(["Storage requirement"]) --> PROTO{"Access type"}
PROTO -->|"Block (SAN)"| BLOCK
PROTO -->|"File (NAS)"| FILE
PROTO -->|"Object"| OBJECT
BLOCK --> BPERF{"Performance tier"}
BPERF -->|"Tier 0/1<br/>< 100 µs, > 1M IOPS"| BT1["Infinidat / Pure //XL<br/>Huawei Dorado V7<br/>FC-NVMe, NVMe-oF"]
BPERF -->|"Tier 2<br/>100-500 µs"| BT2["Dell PowerStore / HPE Alletra 6000<br/>Hitachi VSP / Lenovo DM<br/>FC 32G, iSCSI 25GbE"]
BPERF -->|"Tier 3<br/>SME / low-cost"| BT3["Synology UC3400<br/>Lenovo DE / Dell PowerVault<br/>iSCSI, SAS"]
BLOCK --> BECOS{"Ecosystem"}
BECOS -->|"Mainframe"| BMF["Hitachi VSP / Dell PowerMax<br/>FICON + FC at the same time"]
BECOS -->|"VMware"| BVM["Dell PowerStore / HPE Alletra<br/>VAAI, vVols, InfoSight"]
BECOS -->|"Oracle / SQL Server"| BDB["Infinidat / Pure //X<br/>Lowest latency"]
FILE --> FSIZE{"Scale"}
FSIZE -->|"Enterprise"| FE["HPE Alletra MP (file)<br/>Lenovo DM / Dell PowerScale<br/>NFS, SMB, multi-protocol"]
FSIZE -->|"SMB"| FS["Synology DS/RS<br/>Lenovo DE / TrueNAS<br/>Btrfs, NFS, SMB, low TCO"]
OBJECT --> OUSE{"Use case"}
OUSE -->|"Backup / archive"| OB["Pure //C / Infinidat Hybrid<br/>Lenovo DG<br/>QLC, erasure coding, low cost/TB"]
OUSE -->|"AI/ML data lake"| OM["MinIO / Pure //C<br/>High throughput S3<br/>NVMe direct, erasure coding"]
OUSE -->|"Kubernetes PVC"| OK["Ceph RBD / Longhorn / Linstor<br/>SDS on K8s<br/>CSI, replication, snapshots"]
```
## OpenStack Storage
OpenStack offers three main storage services:
| Service | Type | Description |
|--------|-----|-------|
| **Cinder** | Block storage | Persistent volumes for instances (iSCSI, NFS, Ceph RBD) |
| **Swift** | Object storage | RESTful object store (S3-compatible via middleware) |
| **Manila** | File storage | Shared file systems (NFS, CIFS) as a managed service |
### Cinder (Block Storage)
- Multi-backend support: LVM, Ceph RBD, NFS, iSCSI, Fibre Channel
- Snapshots, cloning, encryption at rest
- Cinder scheduler distributes volumes across backends
- QoS specs to limit IOPS/bandwidth
### Swift (Object Storage)
- Alternative to S3 for on-prem object storage
- Ring-based data distribution (consistent hashing)
- Multi-region replication (global clusters, container sync)
- Stateless REST API with no single point of failure
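The ring-based placement mentioned above can be illustrated with a simplified sketch. It follows the general idea of mapping an object path to one of a fixed number of partitions by taking the top bits of an MD5 hash, then looking the partition up in a table. The partition power, device names, and round-robin table are assumptions for illustration; a real Swift ring also salts the hash per cluster and balances partitions by device weight and zone.

```python
import hashlib
import struct

PART_POWER = 10                    # assumption: 2**10 = 1024 partitions
PART_SHIFT = 32 - PART_POWER
DEVICES = ["node1/sdb", "node2/sdb", "node3/sdb", "node4/sdb"]  # hypothetical

# Simplification: partitions are dealt round-robin; a real ring builder
# balances them by device weight and spreads replicas across zones.
PART_TO_DEVICE = [p % len(DEVICES) for p in range(2 ** PART_POWER)]


def partition_for(object_path: str) -> int:
    """Map an object path to a partition using the top bits of its MD5 hash."""
    digest = hashlib.md5(object_path.encode("utf-8")).digest()
    return struct.unpack(">I", digest[:4])[0] >> PART_SHIFT


def device_for(object_path: str) -> str:
    """Look up the device that owns the object's partition."""
    return DEVICES[PART_TO_DEVICE[partition_for(object_path)]]


if __name__ == "__main__":
    path = "/AUTH_demo/photos/cat.jpg"  # hypothetical account/container/object
    print(path, "-> partition", partition_for(path), "on", device_for(path))
```

Because the object-to-partition mapping never changes, adding a device only means reassigning some partitions in the table, which keeps data movement bounded.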
### Manila (Shared File Systems)
- Managed NFS/CIFS shares for use across instances
- Backends: NetApp, Dell EMC, CephFS, GlusterFS
- Access rules (IP-based, cert-based, user-based)
- Use case: HPC cluster home directories, NAS for legacy apps
### Container storage (OpenStack + Ceph)
Ceph is the most common storage backend for OpenStack: Cinder (RBD), Swift API (RGW), Manila (CephFS), Glance (RBD images).
## Big Data storage
### HDFS cluster
HDFS is the primary storage layer for the Hadoop ecosystem (on-prem). Typical configuration:
| Parameter | Value | Note |
|----------|---------|----------|
| **Disks per DataNode** | 8–24 × HDD (14–22 TB) + 2× NVMe (metadata, cache) | Balance capacity / performance |
| **Replication factor** | 3× | Rack-aware |
| **Network** | 2× 25/100 GbE (data) + 1× 1 GbE (management) | Data + replication traffic |
| **RAM** | 64–256 GB (OS cache + metadata) | HDFS cache + OS buffer cache |
| **CPU** | 16–32 cores | HDFS overhead is low |
| **NameNode HA** | Active + Standby + JN (JournalNode) | Quorum-based HA |
| **Use case** | Sequential reads/writes, large files, Spark on YARN | |
**Model cluster (~2.2 PB raw, ~0.7 PB usable):**
- 10× DataNode (12× 18 TB HDD, 2× 1.9 TB NVMe)
- 2× NameNode (HA, 256 GB RAM)
- 3× JournalNode (small VMs)
- Replication 3×: 10 × 12 × 18 TB ≈ 2.2 PB raw, i.e. ~0.72 PB usable (1 PB usable would need about 14 such DataNodes)
- Network: 25 GbE for data, 100 GbE for shuffle-heavy Spark
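The raw-versus-usable relationship for a replicated HDFS cluster can be checked with a few lines of arithmetic. The sketch below is an illustration under stated assumptions: decimal terabytes, no space reserved for non-DFS data or temporary files, and hypothetical helper names.

```python
import math


def raw_tb(nodes: int, disks_per_node: int, disk_tb: float) -> float:
    """Total raw HDD capacity of the DataNodes in TB."""
    return nodes * disks_per_node * disk_tb


def usable_tb(nodes: int, disks_per_node: int, disk_tb: float,
              replication: int = 3) -> float:
    """Capacity left for user data once every block is stored `replication` times."""
    return raw_tb(nodes, disks_per_node, disk_tb) / replication


def nodes_for_usable(target_usable_tb: float, disks_per_node: int,
                     disk_tb: float, replication: int = 3) -> int:
    """Smallest number of identical DataNodes that reaches the usable target."""
    raw_needed = target_usable_tb * replication
    return math.ceil(raw_needed / (disks_per_node * disk_tb))


if __name__ == "__main__":
    print("raw TB:   ", raw_tb(10, 12, 18))
    print("usable TB:", usable_tb(10, 12, 18))
    print("nodes for 1 PB usable:", nodes_for_usable(1000, 12, 18))
```

For 10 DataNodes with 12× 18 TB each this gives 2,160 TB raw and 720 TB usable; reaching 1 PB usable at 3× replication takes 14 nodes of the same shape.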
### Object storage as a data lake (S3/GCS/MinIO)
For new projects (Spark on K8s, Iceberg/Delta, lakehouse), object storage is preferred over HDFS:
| Platform | Advantages | Limits |
|-----------|--------|--------|
| **MinIO** (on-prem) | S3 API, erasure coding, NVMe direct, high throughput | Single tenant (per cluster) |
| **Pure //C** (on-prem) | QLC NVMe, dedupe, S3 + NFS | Higher price/TB |
| **AWS S3** (cloud) | Unlimited capacity, Iceberg/Delta support | Egress fees |
| **Azure ADLS** (cloud) | Hierarchical namespace (HNS), POSIX-like ACLs | Vendor lock-in |
| **GCP GCS** (cloud) | Uniform + fine-grained ACLs, object versioning | Region restrictions |
### Comparison: HDFS vs object storage for Big Data
| Criterion | HDFS | Object storage (S3/MinIO) |
|-----------|------|-------------------------|
| **Architecture** | Master/worker (NameNode is a SPOF unless HA is configured) | Distributed, no SPOF (erasure coding) |
| **Consistency** | Strong (single writer per file) | Strong (AWS S3 since December 2020, MinIO) |
| **Throughput** | High (rack-aware, locality) | High (network-bound) |
| **Scaling** | Horizontal (DataNode) | Horizontal (stateless) |
| **Cost** | Low (HDD) | Medium (S3 API) |
| **Metadata** | NameNode (1 million blocks ~ 1 GB RAM) | Object-level (flat namespace) |
| **Spark integration** | Native (locality optimization) | S3A connector, Hadoop-compatible |
| **2026 trend** | Legacy, declining | Standard for new projects |
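The NameNode rule of thumb in the table (about 1 GB of RAM per million blocks) can be turned into a rough heap estimate. This is a sketch only: it assumes the HDFS default block size of 128 MB, files that fill whole blocks, and a hypothetical 720 TB of logical data. Many small files would push the block count far higher.

```python
def estimated_blocks(logical_data_tb: float, block_size_mb: int = 128) -> float:
    """Approximate block count, treating 1 TB as 1,000,000 MB."""
    return logical_data_tb * 1_000_000 / block_size_mb


def namenode_heap_gb(logical_data_tb: float, block_size_mb: int = 128,
                     gb_per_million_blocks: float = 1.0) -> float:
    """Rough NameNode heap needed to track the block metadata."""
    blocks = estimated_blocks(logical_data_tb, block_size_mb)
    return blocks / 1_000_000 * gb_per_million_blocks


if __name__ == "__main__":
    data_tb = 720  # hypothetical logical data volume
    print(f"blocks: {estimated_blocks(data_tb):,.0f}")
    print(f"heap:   {namenode_heap_gb(data_tb):.2f} GB")
```

Object stores avoid this single in-memory bottleneck because metadata is kept per object rather than in one process.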
For more details on Big Data, see [BIG-DATA.md](BIG-DATA.md).
## Sources
References, books, and standards: [sources/infrastructure/sources.md](sources/infrastructure/sources.md)
### Recommended reading
| Book | Authors | ISBN | Description |
|-------|--------|------|-------|
| Storage Systems | Ganger, Gibson | 978-1680837540 | Textbook covering the design, implementation, and operation of storage systems, from the characteristics of individual devices through OS, databases, and networking to distribution across servers and large-scale systems. An essential resource for storage infrastructure architects. |
*Last revision: 2026-06-03*

VECTOR-DBS.en.md Normal file

@@ -0,0 +1,105 @@
# 🧠 Vector Databases
## Overview
Specialized databases for storing and searching **embeddings** — vector representations of unstructured data (text, images, audio, video). They enable **semantic search** based on similarity, not exact matching. A key building block for RAG (Retrieval-Augmented Generation) and AI applications.
## Embeddings
- Map unstructured data into a vector space (list of numbers)
- Proximity in vector space = semantic similarity
- Generated by models: Word2Vec, BERT, OpenAI embeddings, E5, Cohere, Mistral
- Dimensions: 384 (all-MiniLM) to 3072 (OpenAI text-embedding-3-large)
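To make "proximity = semantic similarity" concrete, here is a minimal sketch of cosine similarity and a brute-force (flat) top-k search in plain Python. The 3-dimensional vectors and their labels are made up for illustration; real embeddings come from a model and have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, corpus, k=2):
    """Flat search: score every stored vector and keep the k most similar."""
    scored = [(cosine_similarity(query, vec), name) for name, vec in corpus.items()]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]


# Hypothetical toy embeddings, not produced by any real model.
CORPUS = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "invoice": [0.0, 0.1, 0.9],
}

if __name__ == "__main__":
    print(top_k([0.85, 0.15, 0.05], CORPUS, k=2))
```

Scoring every vector is exact but linear in the collection size, which is why larger collections use approximate indexes.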
## Vector indexing
| Method | Algorithm | Description | Accuracy | Speed |
|--------|-----------|-------------|----------|-------|
| **Flat (brute-force)** | Full scan | Comparison with all vectors | 100% | O(N) — slow for > 100K |
| **IVF** (Inverted File) | K-means clustering | Partition into clusters, search nearest cluster | ~95-99% | O(sqrt(N)) |
| **HNSW** (Hierarchical Navigable Small World) | Navigable graph | Multi-level graph, greedy search | ~99-100% | O(log N) |
| **IVF-PQ** | IVF + Product Quantization | Vector compression, less memory | ~90-95% | O(sqrt(N)) |
| **DiskANN** | SSD-based graph | Vectors on disk, Vamana graph | ~95-98% | O(log N) + I/O |
### Index selection
| Number of vectors | Requirement | Recommended index |
|------------------|-------------|------------------|
| < 100K | 100% accuracy | Flat |
| 100K - 10M | High accuracy, speed | HNSW |
| 10M+ | Memory efficiency | IVF-PQ, DiskANN |
| 100M+ | Scaling on SSD | DiskANN |
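The selection table can be written down as a small lookup helper. This is only a restatement of the thresholds above under the assumption that vector count is the sole input; the function name is hypothetical, and real choices also depend on memory budget, recall targets, and filtering needs.

```python
def recommend_index(vector_count: int) -> str:
    """Return the index family suggested by the selection table."""
    if vector_count < 100_000:
        return "Flat"                # exact results, full scan is still cheap
    if vector_count < 10_000_000:
        return "HNSW"                # high recall and speed, index held in RAM
    if vector_count < 100_000_000:
        return "IVF-PQ or DiskANN"   # memory efficiency starts to dominate
    return "DiskANN"                 # graph index served from SSD


if __name__ == "__main__":
    for n in (50_000, 2_000_000, 50_000_000, 500_000_000):
        print(f"{n:>11,} vectors -> {recommend_index(n)}")
```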
## Use case: RAG (Retrieval-Augmented Generation)
```text
User query → Embedding model → Vector DB search → Relevant chunks → LLM → Answer
```
Variants:
- **Naive RAG** — single retrieval + single generation
- **Advanced RAG** — pre-retrieval (query rewriting, HyDE) + post-retrieval (reranking, filtering)
- **Multi-modal RAG** — text + images + audio in one pipeline
## Tools — comparison
| Tool | Type | Indexes | Cloud | Self-hosted | Note |
|------|------|---------|-------|-------------|------|
| **Pinecone** | Managed | HNSW, IVF-PQ | Yes | No | Fully managed, no ops. Pricing by dimension and vector count |
| **Weaviate** | Open source | HNSW, Flat | Yes (WCD) | Yes | Graph + vector, hybrid queries, modular (generative search) |
| **Qdrant** | Open source | HNSW (with scalar/product/binary quantization) | Yes (Cloud) | Yes | Rust, batch API, filtering applied during vector search |
| **Milvus** | Open source | IVF, HNSW, IVF-PQ, DiskANN | Yes (Zilliz) | Yes | GPU acceleration. More complex ops (K8s required) |
| **pgvector** | PostgreSQL extension | IVFFlat, HNSW | Yes (managed PostgreSQL, e.g. Amazon RDS) | Yes | Embeddings directly in PostgreSQL. Hybrid SQL + vectors |
| **Chroma** | Open source | HNSW | No | Yes | Simple embedding + retrieval, Python-native |
| **LanceDB** | Open source | IVF-PQ | No | Yes | Multi-modal data, Arrow format, no server (embedded) |
| **Elasticsearch** | Search engine | HNSW (8.0+) | Yes (Cloud) | Yes | If you already have ES, can use for vectors too |
### pgvector vs standalone vector DB
| Feature | pgvector | Standalone (Pinecone, Qdrant, Milvus) |
|---------|----------|---------------------------------------|
| **Architecture** | Extension in PostgreSQL | Standalone service |
| **Hybrid queries** | Native SQL + vectors | Requires coordination of two systems |
| **Latency** | Higher (disk-based PG) | Lower (in-memory indexes) |
| **Scaling** | PG replication / Citus | Native sharding, rebalancing |
| **Consistency** | PG ACID transactions | Eventual consistency |
| **Operations** | One system | Two systems (operational overhead) |
## Recommendations — Tool selection
| Scenario | Recommendation | Rationale |
|----------|---------------|-----------|
| **RAG on PostgreSQL data** | pgvector | Hybrid SQL + vectors in one DB |
| **RAG production, no ops** | Pinecone | Fully managed, scalable, no operations |
| **Self-hosted RAG** | Qdrant (simpler) / Milvus (performance) | Open source, data control |
| **Full-text + vectors** | Elasticsearch / Weaviate | Combination of BM25 + vector score |
| **Research / prototyping** | Chroma | Python-native, quick start |
| **Embedded / edge** | LanceDB | No server, Arrow format |
| **Multi-modal data** | Weaviate / LanceDB | Native image, audio, video support |
| **GPU acceleration** | Milvus | CUDA support for index build |
## When to (not) use a vector DB
**Use** when:
- You need semantic search (similarity by meaning, not keywords)
- You are building a RAG / AI assistant over your own data
- Document/image deduplication (near-duplicate detection)
- Recommendation systems (similar content, similar users)
**Do not use** when:
- You need exact matching (keys, IDs, foreign keys) → SQL
- Full-text search suffices (BM25, stemming) → Elasticsearch, PostgreSQL full-text
- Vectors are just a complement to the primary DB → pgvector (simplicity)
- Fewer than 1000 documents → brute-force in application is sufficient
## Sources
References, books, and standards: [sources/databases/sources.en.md](sources/databases/sources.en.md)
### Recommended reading
| Book | Authors | Description |
|------|---------|-------------|
| Vector Databases | Borwankar (2026) | Comprehensive guide to vector DBs from concepts to production deployment |
*Last revision: 2026-06-03*

VEKTOROVE-DB.md Normal file

@@ -0,0 +1,105 @@
# 🧠 Vektorové databáze
## Přehled
Specializované databáze pro ukládání a vyhledávání **embeddingů** — vektorových reprezentací nestrukturovaných dat (text, obrázky, audio, video). Umožňují **sémantické vyhledávání** na základě podobnosti, nikoliv přesné shody. Klíčový stavební kámen pro RAG (Retrieval-Augmented Generation) a AI aplikace.
## Embeddings
- Mapují nestrukturovaná data do vektorového prostoru (seznam čísel)
- Blízkost ve vektorovém prostoru = sémantická podobnost
- Generovány modely: Word2Vec, BERT, OpenAI embeddings, E5, Cohere, Mistral
- Dimenze: 384 (all-MiniLM) až 3072 (OpenAI text-embedding-3-large)
## Indexování vektorů
| Metoda | Algoritmus | Popis | Přesnost | Rychlost |
|--------|-----------|-------|----------|----------|
| **Flat (brute-force)** | Úplné prohledání | Porovnání se všemi vektory | 100 % | O(N) — pomalé pro > 100K |
| **IVF** (Inverted File) | K-means clustering | Rozdělení do shluků, hledá se v nejbližším shluku | ~95-99 % | O(sqrt(N)) |
| **HNSW** (Hierarchical Navigable Small World) | Navigovatelný graf | Víceúrovňový graf, greedy search | ~99-100 % | O(log N) |
| **IVF-PQ** | IVF + Product Quantization | Komprese vektorů, menší paměť | ~90-95 % | O(sqrt(N)) |
| **DiskANN** | SSD-based graf | Vektory na disku, Vamana graf | ~95-98 % | O(log N) + I/O |
### Volba indexu
| Počet vektorů | Požadavek | Doporučený index |
|--------------|-----------|-----------------|
| < 100K | 100% přesnost | Flat |
| 100K - 10M | Vysoká přesnost, rychlost | HNSW |
| 10M+ | Paměťová efektivita | IVF-PQ, DiskANN |
| 100M+ | Škálování na SSD | DiskANN |
## Use case: RAG (Retrieval-Augmented Generation)
```text
User query → Embedding model → Vector DB search → Relevant chunks → LLM → Answer
```
Varianty:
- **Naive RAG** — jeden retrieval + jeden generování
- **Advanced RAG** — pre-retrieval (query rewriting, HyDE) + post-retrieval (reranking, filtering)
- **Multi-modal RAG** — text + obrázky + audio do jednoho pipeline
## Nástroje — srovnání
| Nástroj | Typ | Indexy | Cloud | Self-hosted | Poznámka |
|---------|-----|--------|-------|-------------|----------|
| **Pinecone** | Managed | HNSW, IVF-PQ | Ano | Ne | Plně spravovaná, žádný ops. Cena dle dimenze a počtu vektorů |
| **Weaviate** | Open source | HNSW, Flat | Ano (WCD) | Ano | Grafová + vektorová, hybridní dotazy, modulární (generative search) |
| **Qdrant** | Open source | HNSW, IVF-PQ, quantization | Ano (Cloud) | Ano | Rust, batch API, filtr souběžně s vektorovým search |
| **Milvus** | Open source | IVF, HNSW, IVF-PQ, DiskANN | Ano (Zilliz) | Ano | GPU akcelerace. Komplexnější ops (K8s required) |
| **pgvector** | PostgreSQL extension | IVFFlat, HNSW | Vše (díky RDS) | Ano | Embeddingy přímo v PostgreSQL. Hybridní SQL + vektory |
| **Chroma** | Open source | HNSW | Ne | Ano | Jednoduchý na embedding + retrieval, Python-native |
| **LanceDB** | Open source | IVF-PQ | Ne | Ano | Multimodální data, Arrow formát, žádný server (embedded) |
| **Elasticsearch** | Search engine | HNSW (8.0+) | Ano (Cloud) | Ano | Pokud už máte ES, lze použít i pro vektory |
### pgvector vs samostatná vektorová DB
| Vlastnost | pgvector | Samostatná (Pinecone, Qdrant, Milvus) |
|-----------|----------|---------------------------------------|
| **Architektura** | Extension v PostgreSQL | Samostatná služba |
| **Hybridní dotazy** | Nativní SQL + vektory | Nutná koordinace dvou systémů |
| **Latence** | Vyšší (disk-based PG) | Nižší (in-memory indexy) |
| **Škálování** | PG replikace / Citus | Nativní sharding, rebalancing |
| **Konzistence** | PG ACID transakce | Eventual consistency |
| **Provoz** | Jeden systém | Dva systémy (operational overhead) |
## Doporučení — Volba nástroje
| Scénář | Doporučení | Zdůvodnění |
|--------|-----------|-------------|
| **RAG na PostgreSQL datech** | pgvector | Hybridní SQL + vektory v jedné DB |
| **RAG produkce, žádný ops** | Pinecone | Plně managed, škálovatelné, žádný provoz |
| **Self-hosted RAG** | Qdrant (jednodušší) / Milvus (výkon) | Open source, kontrola nad daty |
| **Full-text + vektory** | Elasticsearch / Weaviate | Kombinace BM25 + vektorového skóre |
| **Výzkum / prototypování** | Chroma | Python-native, rychlý start |
| **Embedded / edge** | LanceDB | Žádný server, Arrow formát |
| **Multi-modal data** | Weaviate / LanceDB | Nativní podpora obrázků, audio, videa |
| **GPU akcelerace** | Milvus | CUDA podpora pro index build |
## Kdy vektorovou DB (ne)použít
**Použít** když:
- Potřebujete sémantické vyhledávání (podobnost podle významu, ne klíčových slov)
- Stavíte RAG / AI asistenta nad vlastními daty
- Deduplikace dokumentů, obrázků (near-duplicate detection)
- Doporučovací systémy (podobný obsah, podobní uživatelé)
**Nepoužít** když:
- Potřebujete přesnou shodu (klíče, ID, foreign keys) → SQL
- Full-text search stačí (BM25, stemming) → Elasticsearch, PostgreSQL full-text
- Vektory jen jako doplněk k primární DB → pgvector (jednoduchost)
- Méně než 1000 dokumentů → postačí brute-force v aplikaci
## Zdroje
Odkazy, knihy a standardy: [sources/databases/sources.md](sources/databases/sources.md)
### Doporučená literatura
| Kniha | Autoři | Popis |
|-------|--------|-------|
| Vector Databases | Borwankar (2026) | Komplexní průvodce vektorovými DB od konceptů po produkční nasazení |
*Poslední revize: 2026-06-03*


@@ -0,0 +1,378 @@
# Case study: Proxmox VE demo cluster (3× node, Ceph, HA)
## 1. Requirements and parameters
| Parameter | Value |
|----------|---------|
| Number of hosts | 3 |
| Purpose | demo, learning, development |
| Hypervisor | Proxmox VE (free) |
| Budget | low-cost (~$10,000–$15,000) |
| Storage | Ceph (HCI) |
| HA | yes |
| Location | 1 rack, standard office room |
---
## 2. Server configuration
The design combines the **Mini variant** (2–3 hosts, single-socket) with the **pure Ceph variant** from SERVER-CONFIG.md. All 3 nodes are identical.
### 2.1 Single node configuration
| Component | Specification | Rationale |
|------------|-------------|------------|
| **CPU** | 1× AMD EPYC 9224 (24C/48T, 200 W TDP) or Intel Xeon 5418Y (16C/32T) | SERVER-CONFIG.md: "Pure Ceph variant: CPU 1× EPYC 9224–9334 (12–24C)". Ceph requires 1–2 cores per OSD; with 3 OSD + Proxmox + VM, 12+ cores is the minimum. |
| **RAM** | 128 GB DDR5-4800 (4× 32 GB RDIMM, 1DPC) | SERVER-CONFIG.md: "RAM 128–256 GB" for Ceph variant. 128 GB is sufficient for demo; 4–8 GB per OSD + OS + lightweight VMs. |
| **OS disk** | 2× 240 GB SATA SSD, RAID 1 (HW RAID controller, or a software mirror such as mdadm on disks attached in HBA mode) | "OS: 2× SATA SSD RAID 1" per Ceph variant. |
| **Ceph OSD** | 3× 960 GB SATA SSD (HBA/IT mode, no HW RAID) | "Ceph OSD: 4–8× NVMe/SATA SSD (RAW, HBA mode)". For demo we reduce to 3 OSD/node. Total 9 OSD in cluster. |
| **NIC** | 2× dual-port 10 GbE SFP+ (total 4× 10 GbE) | "Network: 2× 25 GbE public + 2× 25 GbE cluster". For low-cost we choose 10 GbE (SFP+), the concept remains the same. |
| **BMC** | 1× 1 GbE (iDRAC / iLO / IPMI) | Standard management port, CONNECTIVITY.md. |
| **Form factor** | 1U rack server (Dell R660, HPE DL360 Gen11, or Supermicro) | 19" rack, suitable for 1U. |
### 2.2 CPU choice rationale
KB states for the Mini variant "1× EPYC 4124 (4C) or Xeon E-2400". However, 4 cores is insufficient for Ceph (OSD + Proxmox + VM). Therefore we choose EPYC 9224 (24C) / Xeon 5418Y (16C), which corresponds to the Ceph variant in SERVER-CONFIG.md. The price is higher, but the cluster is functional for real-world testing.
---
## 3. Storage variant — Ceph
### 3.1 Topology
```
3× Proxmox node ─── each 3× OSD (SATA SSD)
              │
         Ceph cluster
    ┌─────────┼─────────┐
 3× MON    3× MGR    9× OSD
```
### 3.2 Ceph configuration
| Parameter | Value | Note |
|----------|---------|---------|
| Replication | 3 (size = 3, min_size = 2) | Standard per STORAGE.md |
| Failure domain | host | CRUSH: replication across nodes |
| Raw capacity | 9 × 960 GB ≈ 8.6 TB | |
| Usable capacity | ~2.9 TB (8.6 / 3) | Sufficient for demo |
| OSD backend | BlueStore | Default in Ceph, recommended |
| MON quorum | 3 (1 per node) | Minimum for HA |
| Cache | RAM (BlueStore cache) | 1–2 GB per OSD |
| Network public | 2× 10 GbE LACP | VM traffic + Ceph frontend |
| Network cluster | 2× 10 GbE LACP | Ceph backend replication |
| MTU | 9000 (jumbo frames) | Recommended per NETWORKING.md |
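The capacity rows in the table can be reproduced with a short calculation. The sketch below assumes decimal units and uses Ceph's default nearfull warning ratio of 0.85 as a stand-in for practical headroom; the function name is hypothetical.

```python
def ceph_capacity_tb(osd_count: int, osd_size_gb: float,
                     replica_size: int = 3, nearfull_ratio: float = 0.85):
    """Return (raw, usable, practical) capacity in decimal TB.

    raw       = sum of all OSD sizes
    usable    = raw divided by the replica count
    practical = usable capacity before OSDs start reporting nearfull
    """
    raw = osd_count * osd_size_gb / 1000
    usable = raw / replica_size
    practical = usable * nearfull_ratio
    return raw, usable, practical


if __name__ == "__main__":
    raw, usable, practical = ceph_capacity_tb(osd_count=9, osd_size_gb=960)
    print(f"raw {raw:.2f} TB, usable {usable:.2f} TB, practical {practical:.2f} TB")
```

For 9× 960 GB OSDs with size = 3 this gives roughly 8.6 TB raw and 2.9 TB usable, with about 2.4 TB before nearfull warnings. With only three hosts and a host failure domain, a failed node leaves the pool degraded at two copies until that node returns.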
### 3.3 Storage layout on disk
```
/dev/sda 240 GB OS (RAID 1, mirror with /dev/sdb)
/dev/sdc 960 GB OSD.0 (RAW, BlueStore)
/dev/sdd 960 GB OSD.1 (RAW, BlueStore)
/dev/sde 960 GB OSD.2 (RAW, BlueStore)
```
### 3.4 Ceph pool design
| Pool | PG count | Replication | Purpose |
|------|----------|-----------|-------|
| vms | 128 | 3× | VM disks (RBD) |
| data | 64 | 3× | Data volume |
| backups | 32 | 3× | Backups (low priority) |
PG counts are approximate for the demo (9 OSDs). Production rule of thumb: total PGs ≈ (OSD_total × 100) / replication_size, rounded to a power of two.
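The production rule of thumb can be sketched as a small helper. Rounding to the nearest power of two is an assumption based on common Ceph sizing guidance, and the function name is hypothetical; recent Ceph releases also ship a PG autoscaler that can manage this automatically.

```python
def suggested_pg_total(osd_total: int, replica_size: int,
                       target_pgs_per_osd: int = 100) -> int:
    """Total PG budget for all pools, rounded to the nearest power of two."""
    raw = max(1.0, osd_total * target_pgs_per_osd / replica_size)
    lower = 1 << (int(raw).bit_length() - 1)   # largest power of two <= raw
    upper = lower * 2
    return lower if (raw - lower) <= (upper - raw) else upper


if __name__ == "__main__":
    budget = suggested_pg_total(osd_total=9, replica_size=3)
    planned = 128 + 64 + 32   # vms + data + backups from the pool table
    print(f"PG budget: {budget}, planned across pools: {planned}")
```

For 9 OSDs and size = 3 the raw value is 300, which rounds to 256; the three pools in the table add up to 224 PGs and stay within that budget.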
---
## 4. Network
### 4.1 Topology
```
                 ┌──────────────────┐
                 │  10 GbE Switch   │
                 │  (24-port SFP+)  │
                 └───┬────┬────┬────┘
          ┌──────────┘    │    └──────────┐
          │               │               │
    ┌─────┴─────┐   ┌─────┴─────┐   ┌─────┴─────┐
    │  Node 1   │   │  Node 2   │   │  Node 3   │
    │ 4×10GbE   │   │ 4×10GbE   │   │ 4×10GbE   │
    │ 1GbE BMC  │   │ 1GbE BMC  │   │ 1GbE BMC  │
    └───────────┘   └───────────┘   └───────────┘
```
### 4.2 VLAN and traffic segmentation
| VLAN | Purpose | Ports | MTU |
|------|------|-------|-----|
| VLAN 10 | Management (Proxmox web UI, SSH) | 1× 1 GbE BMC | 1500 |
| VLAN 20 | VM traffic + Ceph public | 2× 10 GbE (bond) | 9000 |
| VLAN 30 | Ceph cluster (backend) | 2× 10 GbE (bond) | 9000 |
### 4.3 Switch
| Parameter | Value |
|----------|---------|
| Model | MikroTik CRS326-24S+2Q+RM or similar L2+ switch |
| Ports | 24× SFP+ 10 GbE |
| Management | VLAN 10, IP 10.0.0.254/24 |
| Features | VLAN, LACP (LAG), Jumbo frames (MTU 9000), SNMP |
### 4.4 Cabling
| Type | Length | Quantity | Purpose |
|-----|-------|-------|-------|
| SFP+ DAC (passive) | 3 m | 12 | 10 GbE connection server ↔ switch |
| Cat6A UTP | 3 m | 3 | Management (1 GbE BMC) |
| Cat6A UTP | 1 m | 1 | Internet uplink (patch panel) |
DAC cables are cheaper than SFP+ optics + patch cords — suitable for single-rack.
---
## 5. Rack layout
### 5.1 Dimensions and positions
| U | Device | Power (W) |
|---|----------|-----------|
| U1 | Switch 10 GbE (1U) | ~60 W |
| U2 | UPS (2U) | — |
| U3 | (empty, ventilation) | — |
| U4 | Server Node 1 (1U) | ~250 W |
| U5 | Server Node 2 (1U) | ~250 W |
| U6 | Server Node 3 (1U) | ~250 W |
| U7–U15 | Empty (optional storage, patch panel) | — |
| Parameter | Value |
|----------|---------|
| Rack type | 15U wall-mount, 19", 600×600 mm |
| Total IT load | ~810 W |
| PUE estimate | ~1.5 (office room, no precision cooling) |
| Cooling | Standard office AC (ASHRAE A2: 10–35 °C). Sufficient for <1 kW. |
**Note:** KB (DATACENTERS.md) states free air cooling for low density (<5 kW/rack). Standard ventilation and AC are sufficient in an office.
### 5.2 UPS
| Parameter | Value |
|----------|---------|
| Type | VI (line-interactive) — per DATACENTERS.md for smaller racks |
| Capacity | 2000 VA / 1200 W |
| Backup time | ~15–20 min at 810 W load |
| Output | 8× C13 (for servers + switch) |
| Battery | VRLA (cheaper) or Li-ion LFP |
| Management | USB / SNMP card (automatic Proxmox shutdown) |
Optionally, this can be upgraded to a VFI (double-conversion) UPS for cleaner output, but VI is sufficient for a demo.
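The power figures from the rack and UPS tables can be cross-checked with simple arithmetic. The sketch below uses only the values stated above (per-device wattage, PUE of about 1.5, 1200 W UPS); the helper names are hypothetical, and runtime is not modeled because it depends on battery capacity that the tables do not give.

```python
IT_LOAD_W = {"switch": 60, "node1": 250, "node2": 250, "node3": 250}
PUE = 1.5                 # estimate from the rack parameters table
UPS_CAPACITY_W = 1200     # 2000 VA / 1200 W line-interactive UPS


def total_it_load(loads: dict) -> int:
    """Sum of the per-device power draw in watts."""
    return sum(loads.values())


def facility_power(it_load_w: float, pue: float) -> float:
    """Total power including cooling and losses, derived from PUE."""
    return it_load_w * pue


def ups_utilization(it_load_w: float, ups_capacity_w: float) -> float:
    """Fraction of the UPS real-power rating used by the IT load."""
    return it_load_w / ups_capacity_w


if __name__ == "__main__":
    it = total_it_load(IT_LOAD_W)
    print(f"IT load: {it} W")
    print(f"Facility power at PUE {PUE}: {facility_power(it, PUE):.0f} W")
    print(f"UPS utilization: {ups_utilization(it, UPS_CAPACITY_W):.1%}")
```

The IT load of 810 W sits at about two thirds of the UPS rating, which leaves headroom for a fourth node or a higher-TDP CPU.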
### 5.3 PDU
1× basic 1U PDU (8× C13), 230 V / 10 A — for distribution to servers.
---
## 6. Hypervisor — Proxmox VE
### 6.1 Installation and configuration
| Component | Version / Configuration |
|------------|---------------------|
| Hypervisor | Proxmox VE 8.x (Debian 12 + KVM + LXC) |
| Storage backend | Ceph Reef (18.x) or Squid (19.x), integrated in Proxmox |
| Cluster | 3-node cluster, Corosync + PMXCFS |
| HA | Proxmox HA — 1 node failure tolerance (remaining 2 take over VMs) |
| Fencing | watchdog (softdog) + Proxmox HA manager |
### 6.2 License
| Item | Price | Note |
|---------|------|----------|
| Proxmox VE | $0 | Open source, full functionality without license |
| Proxmox community support | $0 | Forum, wiki |
| Proxmox enterprise support (optional) | ~€500/host/year | Can be purchased later |
HYPERVISORS.md: Proxmox VE is "open source (free)", no license required.
### 6.3 HA setup
- HA group: all 3 nodes; a node that loses quorum is fenced by the watchdog and its HA-managed VMs are recovered on the remaining nodes
- Max VM restart: 2 attempts
- Migration: live migration via Ceph RBD (shared storage)
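The "1 node failure tolerance" of a 3-node cluster follows from majority quorum, which applies to both Corosync votes and Ceph MONs. A minimal sketch of that arithmetic, assuming one vote per node and no external quorum device:

```python
def quorum_size(nodes: int) -> int:
    """Smallest number of votes that forms a strict majority."""
    return nodes // 2 + 1


def tolerated_failures(nodes: int) -> int:
    """How many nodes can fail while the rest still hold quorum."""
    return nodes - quorum_size(nodes)


if __name__ == "__main__":
    for n in (2, 3, 4, 5):
        print(f"{n} nodes: quorum {quorum_size(n)}, "
              f"tolerates {tolerated_failures(n)} failure(s)")
```

This is why three nodes is the practical minimum for HA: a two-node cluster cannot lose either member without losing quorum, and a fourth node adds capacity but no extra failure tolerance.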
---
## 7. Budget estimate
**Disclaimer:** KB does not contain specific component prices. The following amounts are approximate market estimates (Q2 2026, USD).
### 7.1 Servers (3×)
| Item | Qty | Price/unit | Total |
|---------|------|----------|--------|
| 1U rack server (basic config, without CPU/RAM/disk) | 3 | ~$1,200 | $3,600 |
| AMD EPYC 9224 (24C) / Intel Xeon 5418Y (16C) — per KB | 3 | ~$900 | $2,700 |
| RAM 128 GB (4× 32 GB DDR5-4800 RDIMM) | 3 | ~$600 | $1,800 |
| 240 GB SATA SSD (OS) | 6 | ~$50 | $300 |
| 960 GB SATA SSD (Ceph OSD) | 9 | ~$150 | $1,350 |
| Dual-port 10 GbE SFP+ NIC (e.g. Intel X710-DA2) | 6 | ~$120 | $720 |
| **Servers total** | | | **~$10,470** |
### 7.2 Network
| Item | Qty | Price/unit | Total |
|---------|------|----------|--------|
| MikroTik CRS326-24S+2Q+RM (24× 10GbE SFP+) | 1 | ~$600 | $600 |
| SFP+ DAC cable 3 m (passive) | 12 | ~$15 | $180 |
| Network total | | | **~$780** |
### 7.3 Rack and power
| Item | Qty | Price/unit | Total |
|---------|------|----------|--------|
| 15U wall-mount rack 19" | 1 | ~$300 | $300 |
| UPS 2000 VA (line-interactive, VRLA) | 1 | ~$450 | $450 |
| 1U PDU basic (8× C13) | 1 | ~$60 | $60 |
| Rack + power total | | | **~$810** |
### 7.4 Other
| Item | Price |
|---------|------|
| Cat6A patch cables, management | ~$50 |
| Mounting material, velcro | ~$30 |
| Shipping and installation | ~$200 |
| Other total | **~$280** |
### 7.5 Total calculation
| Category | Amount |
|-----------|--------|
| Servers (3× node) | ~$10,470 |
| Network (switch + cables) | ~$780 |
| Rack + power | ~$810 |
| Other | ~$280 |
| **Total** | **~$12,340** |
| Reserve (10–15%) | ~$1,200–1,800 |
| **Total with reserve** | **~$13,500–$14,100** |
The design fits the **$10,000–$15,000** budget. With cheaper CPUs (EPYC 4124P / Xeon E-2488) it could be built for ~$8,000–9,000, but with limited performance for Ceph.
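
The totals above can be re-derived from the line items. The following is a small sketch that copies quantities and unit prices from sections 7.1 to 7.4; the dictionary layout and function names are illustrative only.

```python
# (quantity, unit price in USD) per line item, copied from sections 7.1-7.4.
BUDGET = {
    "servers": [(3, 1200), (3, 900), (3, 600), (6, 50), (9, 150), (6, 120)],
    "network": [(1, 600), (12, 15)],
    "rack_power": [(1, 300), (1, 450), (1, 60)],
    "other": [(1, 50), (1, 30), (1, 200)],
}


def category_totals(budget):
    """Sum quantity * unit price for every category."""
    return {name: sum(qty * price for qty, price in items)
            for name, items in budget.items()}


def with_reserve(total, fraction):
    """Add a proportional contingency reserve, rounded to whole dollars."""
    return round(total * (1 + fraction))


if __name__ == "__main__":
    totals = category_totals(BUDGET)
    grand_total = sum(totals.values())
    print(totals)
    print(grand_total, with_reserve(grand_total, 0.10), with_reserve(grand_total, 0.15))
```

The category sums come out at 10,470, 780, 810 and 280, for a total of 12,340. Exact 10% and 15% reserves give 13,574 and 14,191, slightly above the rounded range in the table, which adds a reserve rounded to ~$1,200–1,800.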
**Possible savings:**
- CPU: 2× EPYC 4124P (4C) + 1× more powerful node → ~$800 savings (but asymmetric cluster)
- OSD: 2× instead of 3× SSD/node → ~$500 savings (less capacity)
- Switch: 12-port instead of 24-port → ~$300 savings
---
## 8. Topology diagram
```mermaid
flowchart TB
subgraph Rack["15U Rack (office)"]
U1["U1: 10GbE Switch (MikroTik)"]
U2["U2: UPS 2000 VA"]
U4["U4: Node 1 — Proxmox + Ceph OSD"]
U5["U5: Node 2 — Proxmox + Ceph OSD"]
U6["U6: Node 3 — Proxmox + Ceph OSD"]
end
subgraph Node1["Node 1 (detail)"]
N1_CPU["CPU: EPYC 9224 (24C)"]
N1_RAM["RAM: 128 GB DDR5"]
N1_OS["OS: 2× 240 GB SSD (RAID 1)"]
N1_OSD1["OSD.0: 960 GB SSD"]
N1_OSD2["OSD.1: 960 GB SSD"]
N1_OSD3["OSD.2: 960 GB SSD"]
N1_NIC["NIC: 4× 10GbE SFP+"]
N1_BMC["BMC: 1× 1GbE"]
end
U1 ---|"4× 10GbE LACP<br/>(public + cluster)"| U4
U1 ---|"4× 10GbE LACP"| U5
U1 ---|"4× 10GbE LACP"| U6
U4 --- N1_CPU
U4 --- N1_RAM
U4 --- N1_OS
U4 --- N1_OSD1
U4 --- N1_OSD2
U4 --- N1_OSD3
U4 --- N1_NIC
U4 --- N1_BMC
subgraph Ceph["Ceph Cluster"]
CEPH_MON["3× MON (1 per node)"]
CEPH_MGR["3× MGR (1 per node)"]
CEPH_OSD["9× OSD (3 per node)"]
end
U4 --- CEPH_MON
U5 --- CEPH_MON
U6 --- CEPH_MON
U4 --- CEPH_MGR
U5 --- CEPH_MGR
U6 --- CEPH_MGR
U4 --- CEPH_OSD
U5 --- CEPH_OSD
U6 --- CEPH_OSD
subgraph Proxmox["Proxmox VE Cluster"]
PMX_HA["HA Group (3 nodes)"]
PMX_HA --- U4
PMX_HA --- U5
PMX_HA --- U6
end
subgraph Uplink["Internet / LAN"]
UPLINK_SW["Office LAN<br/>(1 GbE)"]
end
U1 ---|"1× Cat6A<br/>1 GbE"| UPLINK_SW
U1 ---|"Internet<br/>(ISP router)"| UPLINK_SW
```
---
## 9. Summary and key decisions
| Decision | Variant | Rationale |
|------------|----------|------------|
| Hypervisor | Proxmox VE | HYPERVISORS.md: "For SME / low budget — open source, built-in Ceph, no license costs". Ideal for demo. |
| Storage | Ceph (3× replication) | STORAGE.md + SERVER-CONFIG.md: Ceph is the recommended SDS for Proxmox, 3 nodes minimum for quorum. |
| CPU | Single-socket EPYC 9224 / Xeon 5418Y | Compromise between price (Mini variant ~1 socket) and performance for Ceph (Ceph variant ~12+ cores). |
| Network | 10 GbE SFP+ (instead of 25 GbE) | KB recommends 25 GbE, but for low-cost demo 10 GbE is sufficient. The concept (public/cluster network separation) remains the same. |
| Rack | 15U wall-mount | Suitable for office, no raised floor, no precision cooling. |
| UPS | 2000 VA line-interactive | DATACENTERS.md: VI type for smaller racks. Sufficient for demo. |
| License | Proxmox VE (free) | No license costs, support can be purchased later. |
### Compromises compared to production deployment
- **25 GbE → 10 GbE**: lower Ceph cluster network throughput (not an issue in demo environment)
- **HDD → SSD**: for Ceph OSD we choose SSD instead of HDD (higher price, better performance — demo focuses on functionality, not capacity)
- **2× 10 GbE public + 2× 10 GbE cluster → combined on LACP**: can be merged when ports are scarce, but separation is better
- **Cooling**: office AC, not DC-grade precision cooling (PUE ~1.5–1.8)
### What KB does not address (supplemented from practice)
KB does not contain specific component prices — the budget is an approximate market estimate. It also does not specify a concrete switch model with L2+ features (VLAN, LACP, Jumbo frames). Here we follow common practice for the SOHO/SME segment.
---
## 10. References from KB
- **DATACENTERS.md** — rack layout, power chain, UPS types, cooling classes (ASHRAE), cabling standards
- **HYPERVISORS.md** — Proxmox VE as open source variant, platform comparison, Mini variant (2–3 hosts), Ceph connectivity
- **SERVER-CONFIG.md** — Pure Ceph variant (3–6 hosts), HW specification, network design, BIOS settings
- **STORAGE.md** — Ceph architecture (MON/MGR/OSD, CRUSH map, BlueStore, replication), SDS overview
- **CONNECTIVITY.md** — Ethernet speeds (10/25 GbE), SFP+ form factor, NIC placement, management port
- **NETWORKING.md** — VLAN segmentation, MTU and jumbo frames, best practices
- **SERVER-HW.md** — CPU selection (EPYC vs Xeon), RAM population (1DPC/2DPC), NUMA, form factors
---
*Last revision: 2026-06-04*


@@ -0,0 +1,378 @@
# Case study: Proxmox VE demo cluster (3× node, Ceph, HA)
## 1. Brief and parameters
| Parameter | Value |
|-----------|-------|
| Number of hosts | 3 |
| Purpose | demo, learning, development |
| Hypervisor | Proxmox VE (free) |
| Budget | low-cost (~$10,000–$15,000) |
| Storage | Ceph (HCI) |
| HA | yes |
| Location | 1 rack, ordinary office room |
---
## 2. Server configuration
Based on a combination of the **Mini variant** (2–3 hosts, single-socket) and the **pure Ceph variant** from SERVER-CONFIG.md. All 3 nodes are identical.
### 2.1 Single-node configuration
| Component | Specification | Rationale |
|-----------|---------------|-----------|
| **CPU** | 1× AMD EPYC 9224 (24C/48T, 200 W TDP) or Intel Xeon 5418Y (24C/48T) | SERVER-CONFIG.md: "Pure Ceph variant: CPU 1× EPYC 9224–9334 (12–24C)". Ceph needs 1–2 cores per OSD; with 3 OSDs + Proxmox + VMs, 12+ cores is the minimum. |
| **RAM** | 128 GB DDR5-4800 (4× 32 GB RDIMM, 1DPC) | SERVER-CONFIG.md: "RAM 128–256 GB" for the Ceph variant. 128 GB is enough for a demo: 4–8 GB per OSD + OS + light VMs. |
| **OS disk** | 2× 240 GB SATA SSD, RAID 1 (HW controller in HBA mode, or SW mdadm) | "OS: 2× SATA SSD RAID 1" per the Ceph variant. |
| **Ceph OSD** | 3× 960 GB SATA SSD (HBA/IT mode, no HW RAID) | "Ceph OSD: 4–8× NVMe/SATA SSD (RAW, HBA mode)". Reduced to 3 OSDs per node for the demo, 9 OSDs in the cluster in total. |
| **NIC** | 2× dual-port 10 GbE SFP+ (4× 10 GbE in total) | "Network: 2× 25 GbE public + 2× 25 GbE cluster". For low cost we choose 10 GbE (SFP+); the concept stays the same. |
| **BMC** | 1× 1 GbE (iDRAC / iLO / IPMI) | Standard management port, CONNECTIVITY.md. |
| **Form factor** | 1U rack server (Dell R660, HPE DL360 Gen11, or Supermicro) | 19" rack, fits in 1U. |
### 2.2 CPU choice rationale
For the Mini variant the KB lists "1× EPYC 4124 (4C) or Xeon E-2400". For Ceph, however, 4 cores are too few (OSD + Proxmox + VMs). We therefore choose the EPYC 9224 (24C) / Xeon 5418Y (24C), which matches the Ceph variant in SERVER-CONFIG.md. The price is higher, but the cluster stays usable for realistic testing.
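
The core and memory argument can be checked with a rough per-node budget. This sketch uses the upper per-OSD figures from the table in 2.1 (2 cores and 8 GB per OSD); the host reservation of 2 cores and 8 GB for Proxmox itself is an assumption made for illustration, not a KB value, and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class NodeSpec:
    cores: int
    ram_gb: int
    osd_count: int


def vm_headroom(node, cores_per_osd=2, ram_per_osd_gb=8,
                host_cores=2, host_ram_gb=8):
    """Return (cores, RAM in GB) left for VMs after Ceph OSDs and the host OS.

    host_cores and host_ram_gb are illustrative assumptions, not KB values.
    A negative value means the node is oversubscribed before any VM starts.
    """
    cores_left = node.cores - node.osd_count * cores_per_osd - host_cores
    ram_left = node.ram_gb - node.osd_count * ram_per_osd_gb - host_ram_gb
    return cores_left, ram_left


if __name__ == "__main__":
    print("EPYC 9224 node:", vm_headroom(NodeSpec(cores=24, ram_gb=128, osd_count=3)))
    print("EPYC 4124P node:", vm_headroom(NodeSpec(cores=4, ram_gb=128, osd_count=3)))
```

Under these assumptions the 24-core node keeps 16 cores and 96 GB for VMs, while a 4-core node is already 4 cores short before the first VM is started.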
---
## 3. Storage variant: Ceph
### 3.1 Topology
```
3× Proxmox node ─── each with 3× OSD (SATA SSD)
         Ceph cluster
     ┌─────────┼─────────┐
  3× MON    3× MGR    9× OSD
```
### 3.2 Ceph configuration
| Parameter | Value | Note |
|-----------|-------|------|
| Replication | 3 (size = 3, min_size = 2) | Standard per STORAGE.md |
| Failure domain | host | CRUSH: replicas are spread across nodes |
| Raw capacity | 9 × 960 GB ≈ 8.6 TB | |
| Usable capacity | ~2.9 TB (8.6 / 3) | Sufficient for a demo |
| OSD backend | BlueStore | Ceph default, recommended |
| MON quorum | 3 (1 per node) | Minimum for HA |
| Cache | RAM (BlueStore cache) | 1–2 GB per OSD |
| Public network | 2× 10 GbE LACP | VM traffic + Ceph frontend |
| Cluster network | 2× 10 GbE LACP | Ceph backend replication |
| MTU | 9000 (jumbo frames) | Recommended per NETWORKING.md |
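
The capacity rows can be reproduced with a few lines of arithmetic. The sketch below assumes decimal units (1 TB = 1000 GB) and adds one figure that is not in the table: a practical ceiling at 85% fill, which corresponds to Ceph's default near-full warning ratio. The function name is illustrative.

```python
def ceph_capacity_tb(osd_count, osd_size_gb, replicas, fill_ratio=0.85):
    """Return (raw, replicated, practical) capacity in TB for a replicated pool.

    fill_ratio is an assumed planning ceiling (0.85 mirrors Ceph's default
    near-full warning threshold); it is not a value taken from the KB.
    """
    raw_tb = osd_count * osd_size_gb / 1000
    replicated_tb = raw_tb / replicas
    practical_tb = replicated_tb * fill_ratio
    return raw_tb, replicated_tb, practical_tb


if __name__ == "__main__":
    raw, replicated, practical = ceph_capacity_tb(9, 960, 3)
    print(f"raw={raw:.2f} TB, replicated={replicated:.2f} TB, practical={practical:.2f} TB")
```

For 9 OSDs of 960 GB with 3 replicas this gives 8.64 TB raw and 2.88 TB after replication, matching the ~8.6 TB and ~2.9 TB in the table, with roughly 2.45 TB usable before near-full warnings would appear.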
### 3.3 On-disk storage layout
```
/dev/sda   240 GB   OS (RAID 1, mirrored with /dev/sdb)
/dev/sdc   960 GB   OSD.0 (RAW, BlueStore)
/dev/sdd   960 GB   OSD.1 (RAW, BlueStore)
/dev/sde   960 GB   OSD.2 (RAW, BlueStore)
```
### 3.4 Ceph pool design
| Pool | PG count | Replication | Purpose |
|------|----------|-------------|---------|
| vms | 128 | 3× | VM disks (RBD) |
| data | 64 | 3× | Data volume |
| backups | 32 | 3× | Backups (low priority) |
PG counts are indicative for the demo (9 OSDs). Production formula for the cluster-wide total: (OSD_total × 100) / replication_size, rounded to a power of two.
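
A short sketch of the PG formula, assuming the common target of about 100 PGs per OSD and rounding to the nearest power of two. The helper names are illustrative; this is not the Ceph autoscaler's algorithm.

```python
def nearest_power_of_two(value):
    """Round a positive number to the closest power of two (ties go up)."""
    lower = 1
    while lower * 2 <= value:
        lower *= 2
    upper = lower * 2
    return lower if value - lower < upper - value else upper


def total_pg_target(osd_total, replication_size, pgs_per_osd=100):
    """Cluster-wide PG target: (OSD_total * pgs_per_osd) / replication_size."""
    return nearest_power_of_two(osd_total * pgs_per_osd / replication_size)


def pg_replicas_per_osd(pool_pg_counts, replication_size, osd_total):
    """Average number of PG replicas each OSD ends up hosting."""
    return sum(pool_pg_counts) * replication_size / osd_total


if __name__ == "__main__":
    print(total_pg_target(osd_total=9, replication_size=3))
    print(round(pg_replicas_per_osd([128, 64, 32], 3, 9), 1))
```

For 9 OSDs and 3 replicas the raw result is 300, which rounds to 256. The three pools above sum to 224 PGs, which stays under that target and places about 75 PG replicas on each OSD.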
---
## 4. Network
### 4.1 Topology
```
                 ┌─────────────────┐
                 │  10 GbE Switch  │
                 │ (24-port SFP+)  │
                 └───┬────┬────┬───┘
        ┌────────────┘    │    └────────────┐
        │                 │                 │
  ┌─────┴─────┐     ┌─────┴─────┐     ┌─────┴─────┐
  │  Node 1   │     │  Node 2   │     │  Node 3   │
  │ 4×10GbE   │     │ 4×10GbE   │     │ 4×10GbE   │
  │ ┌───────┐ │     │ ┌───────┐ │     │ ┌───────┐ │
  │ │ 1GbE  │ │     │ │ 1GbE  │ │     │ │ 1GbE  │ │
  │ │ BMC   │ │     │ │ BMC   │ │     │ │ BMC   │ │
  │ └───────┘ │     │ └───────┘ │     │ └───────┘ │
  └───────────┘     └───────────┘     └───────────┘
```
### 4.2 VLANs and traffic segmentation
| VLAN | Purpose | Ports | MTU |
|------|------|-------|-----|
| VLAN 10 | Management (Proxmox web UI, SSH) | 1× 1 GbE BMC | 1500 |
| VLAN 20 | VM traffic + Ceph public | 2× 10 GbE (bond) | 9000 |
| VLAN 30 | Ceph cluster (backend) | 2× 10 GbE (bond) | 9000 |
### 4.3 Switch
| Parameter | Value |
|-----------|-------|
| Model | MikroTik CRS326-24S+2Q+RM or a similar L2+ switch |
| Ports | 24× SFP+ 10 GbE |
| Management | VLAN 10, IP 10.0.0.254/24 |
| Features | VLAN, LACP (LAG), Jumbo frames (MTU 9000), SNMP |
### 4.4 Cabling
| Type | Length | Count | Purpose |
|------|--------|-------|---------|
| SFP+ DAC (passive) | 3 m | 12 | 10 GbE server ↔ switch links |
| Cat6A UTP | 3 m | 3 | Management (1 GbE BMC) |
| Cat6A UTP | 1 m | 1 | Internet uplink (patch panel) |
DAC cables are cheaper than SFP+ optics plus patch cords, which makes them a good fit for a single rack.
---
## 5. Rack layout
### 5.1 Dimensions and positions
| U | Device | Power (W) |
|---|--------|-----------|
| U1 | 10 GbE switch (1U) | ~60 W |
| U2 | UPS (2U) | n/a |
| U3 | (free, ventilation) | n/a |
| U4 | Server Node 1 (1U) | ~250 W |
| U5 | Server Node 2 (1U) | ~250 W |
| U6 | Server Node 3 (1U) | ~250 W |
| U7–U15 | Free (possibly storage, patch panel) | n/a |

| Parameter | Value |
|-----------|-------|
| Rack type | 15U wall-mount, 19", 600×600 mm |
| Total IT load | ~810 W |
| PUE estimate | ~1.5 (office room, no precision cooling) |
| Cooling | Ordinary office air conditioning (ASHRAE A2: 10–35 °C). Sufficient for <1 kW. |
**Note:** For low density (<5 kW/rack), the KB (DATACENTERS.md) lists free air cooling. In an office, standard ventilation and AC are enough.
### 5.2 UPS
| Parameter | Value |
|-----------|-------|
| Type | VI (line-interactive), per DATACENTERS.md for smaller racks |
| Capacity | 2000 VA / 1200 W |
| Backup time | ~15–20 min at 810 W load |
| Output | 8× C13 (for servers + switch) |
| Battery | VRLA (cheaper) or Li-ion LFP |
| Management | USB / SNMP card (automatic Proxmox shutdown) |
Optionally this can be upgraded to a VFI (double-conversion) UPS for cleaner output, but VI is sufficient for a demo.
### 5.3 PDU
1× basic 1U PDU (8× C13), 230 V / 10 A, for power distribution to the servers.
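
The power figures in section 5 can be cross-checked together. The sketch below sums the per-device loads from 5.1, applies the PUE estimate, and compares the result with the UPS and PDU ratings. It assumes a power factor close to 1 when converting watts to amps, which is a simplification; the names are illustrative.

```python
# Per-device power draw in watts, taken from the rack table in section 5.1.
RACK_LOAD_W = {"switch": 60, "node1": 250, "node2": 250, "node3": 250}


def power_budget(loads_w, pue, ups_capacity_w, pdu_voltage_v, pdu_limit_a):
    """Derive IT load, facility power and UPS/PDU utilisation.

    The PDU current assumes a power factor of ~1 (simplification).
    """
    it_load_w = sum(loads_w.values())
    pdu_current_a = it_load_w / pdu_voltage_v
    return {
        "it_load_w": it_load_w,
        "facility_power_w": it_load_w * pue,
        "ups_load_fraction": it_load_w / ups_capacity_w,
        "pdu_current_a": pdu_current_a,
        "pdu_load_fraction": pdu_current_a / pdu_limit_a,
    }


if __name__ == "__main__":
    budget = power_budget(RACK_LOAD_W, pue=1.5, ups_capacity_w=1200,
                          pdu_voltage_v=230, pdu_limit_a=10)
    for key, value in budget.items():
        print(f"{key}: {value:.3f}")
```

With these inputs the IT load is 810 W, total draw including cooling overhead is about 1,215 W, the UPS runs at roughly 68% of its 1,200 W rating, and the PDU carries about 3.5 A of its 10 A limit.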
---
## 6. Hypervisor — Proxmox VE
### 6.1 Installation and configuration
| Component | Version / Configuration |
|-----------|-------------------------|
| Hypervisor | Proxmox VE 8.x (Debian 12 + KVM + LXC) |
| Storage backend | Ceph Reef (18.2.x) or Squid (19.2.x), integrated in Proxmox VE |
| Cluster | 3-node cluster, Corosync + PMXCFS |
| HA | Proxmox HA, 1 node failure tolerance (remaining 2 take over VMs) |
| Fencing | watchdog (softdog) + Proxmox HA manager |
### 6.2 License
| Item | Price | Note |
|------|-------|------|
| Proxmox VE | $0 | Open source, full functionality without a subscription |
| Proxmox community support | $0 | Forum, wiki |
| Proxmox enterprise support (optional) | ~€500/host/year | Subscriptions are priced per CPU socket, so one per single-socket node; can be purchased later |
HYPERVISORS.md: Proxmox VE is "open source (free)", no license required.
### 6.3 HA setup
- HA group: all 3 nodes. A node that loses quorum is self-fenced by its watchdog and its HA resources are recovered on the remaining quorate nodes (Proxmox HA has no Pacemaker-style `no-quorum-policy` setting)
- Max VM restart: 2 attempts (`max_restart` = 2)
- Migration: live migration; VM disks stay on shared Ceph RBD storage, so only the memory state is transferred
---
## 7. Budget estimate
**Disclaimer:** The KB does not contain specific component prices. The following amounts are approximate market estimates (Q2 2026, USD).
### 7.1 Servers (3×)
| Item | Qty | Price/unit | Total |
|------|-----|------------|-------|
| 1U rack server (basic config, without CPU/RAM/disk) | 3 | ~$1,200 | $3,600 |
| AMD EPYC 9224 (24C) / Intel Xeon 5418Y (24C), per KB | 3 | ~$900 | $2,700 |
| RAM 128 GB (4× 32 GB DDR5-4800 RDIMM) | 3 | ~$600 | $1,800 |
| 240 GB SATA SSD (OS) | 6 | ~$50 | $300 |
| 960 GB SATA SSD (Ceph OSD) | 9 | ~$150 | $1,350 |
| Dual-port 10 GbE SFP+ NIC (e.g. Intel X710-DA2) | 6 | ~$120 | $720 |
| **Servers total** | | | **~$10,470** |
### 7.2 Network
| Item | Qty | Price/unit | Total |
|------|-----|------------|-------|
| MikroTik CRS326-24S+2Q+RM (24× 10GbE SFP+) | 1 | ~$600 | $600 |
| SFP+ DAC cable 3 m (passive) | 12 | ~$15 | $180 |
| Network total | | | **~$780** |
### 7.3 Rack and power
| Item | Qty | Price/unit | Total |
|------|-----|------------|-------|
| 15U wall-mount rack 19" | 1 | ~$300 | $300 |
| UPS 2000 VA (line-interactive, VRLA) | 1 | ~$450 | $450 |
| 1U PDU basic (8× C13) | 1 | ~$60 | $60 |
| Rack + power total | | | **~$810** |
### 7.4 Other
| Item | Price |
|------|-------|
| Cat6A patch cables, management | ~$50 |
| Mounting material, velcro | ~$30 |
| Shipping and installation | ~$200 |
| Other total | **~$280** |
### 7.5 Total calculation
| Category | Amount |
|----------|--------|
| Servers (3× node) | ~$10,470 |
| Network (switch + cables) | ~$780 |
| Rack + power | ~$810 |
| Other | ~$280 |
| **Total** | **~$12,340** |
| Reserve (10–15%) | ~$1,200–1,800 |
| **Total with reserve** | **~$13,500–$14,100** |
The design fits the **$10,000–$15,000** budget. With cheaper CPUs (EPYC 4124P / Xeon E-2488) it could be built for ~$8,000–9,000, but with limited performance for Ceph.
**Possible savings:**
- CPU: 2× EPYC 4124P (4C) + 1× more powerful node → ~$800 savings (but asymmetric cluster)
- OSD: 2× instead of 3× SSD/node → ~$500 savings (less capacity)
- Switch: 12-port instead of 24-port → ~$300 savings
---
## 8. Topology diagram
```mermaid
flowchart TB
subgraph Rack["15U Rack (office)"]
U1["U1: 10GbE Switch (MikroTik)"]
U2["U2: UPS 2000 VA"]
U4["U4: Node 1 — Proxmox + Ceph OSD"]
U5["U5: Node 2 — Proxmox + Ceph OSD"]
U6["U6: Node 3 — Proxmox + Ceph OSD"]
end
subgraph Node1["Node 1 (detail)"]
N1_CPU["CPU: EPYC 9224 (24C)"]
N1_RAM["RAM: 128 GB DDR5"]
N1_OS["OS: 2× 240 GB SSD (RAID 1)"]
N1_OSD1["OSD.0: 960 GB SSD"]
N1_OSD2["OSD.1: 960 GB SSD"]
N1_OSD3["OSD.2: 960 GB SSD"]
N1_NIC["NIC: 4× 10GbE SFP+"]
N1_BMC["BMC: 1× 1GbE"]
end
U1 ---|"4× 10GbE LACP<br/>(public + cluster)"| U4
U1 ---|"4× 10GbE LACP"| U5
U1 ---|"4× 10GbE LACP"| U6
U4 --- N1_CPU
U4 --- N1_RAM
U4 --- N1_OS
U4 --- N1_OSD1
U4 --- N1_OSD2
U4 --- N1_OSD3
U4 --- N1_NIC
U4 --- N1_BMC
subgraph Ceph["Ceph Cluster"]
CEPH_MON["3× MON (1 per node)"]
CEPH_MGR["3× MGR (1 per node)"]
CEPH_OSD["9× OSD (3 per node)"]
end
U4 --- CEPH_MON
U5 --- CEPH_MON
U6 --- CEPH_MON
U4 --- CEPH_MGR
U5 --- CEPH_MGR
U6 --- CEPH_MGR
U4 --- CEPH_OSD
U5 --- CEPH_OSD
U6 --- CEPH_OSD
subgraph Proxmox["Proxmox VE Cluster"]
PMX_HA["HA Group (3 nodes)"]
PMX_HA --- U4
PMX_HA --- U5
PMX_HA --- U6
end
subgraph Uplink["Internet / LAN"]
UPLINK_SW["Office LAN<br/>(1 GbE)"]
end
U1 ---|"1× Cat6A<br/>1 GbE"| UPLINK_SW
U1 ---|"Internet<br/>(ISP router)"| UPLINK_SW
```
---
## 9. Summary and key decisions
| Decision | Variant | Rationale |
|----------|---------|-----------|
| Hypervisor | Proxmox VE | HYPERVISORS.md: "For SME / low budget: open source, built-in Ceph, no license costs". Ideal for a demo. |
| Storage | Ceph (3× replication) | STORAGE.md + SERVER-CONFIG.md: Ceph is the recommended SDS for Proxmox, 3 nodes minimum for quorum. |
| CPU | Single-socket EPYC 9224 / Xeon 5418Y | Compromise between price (Mini variant ~1 socket) and performance for Ceph (Ceph variant ~12+ cores). |
| Network | 10 GbE SFP+ (instead of 25 GbE) | KB recommends 25 GbE, but for a low-cost demo 10 GbE is sufficient. The concept (public/cluster network separation) remains the same. |
| Rack | 15U wall-mount | Suitable for an office, no raised floor, no precision cooling. |
| UPS | 2000 VA line-interactive | DATACENTERS.md: VI type for smaller racks. Sufficient for a demo. |
| License | Proxmox VE (free) | No license costs, support can be purchased later. |
### Compromises compared to production deployment
- **25 GbE → 10 GbE**: lower Ceph cluster network throughput (not an issue in a demo environment)
- **HDD → SSD**: for Ceph OSD we choose SSD instead of HDD (higher price, better performance; the demo focuses on functionality, not capacity)
- **2× 10 GbE public + 2× 10 GbE cluster → combined on LACP**: can be merged when ports are scarce, but separation is better
- **Cooling**: office AC, not DC-grade precision cooling (PUE ~1.5–1.8)
### What the KB does not address (supplemented from practice)
The KB does not contain specific component prices, so the budget is an approximate market estimate. It also does not specify a concrete switch model providing L2+ features (VLAN, LACP, jumbo frames). Here we follow common practice for the SOHO/SME segment.
---
## 10. References from KB
- **DATACENTERS.md**: rack layout, power chain, UPS types, cooling classes (ASHRAE), cabling standards
- **HYPERVISORS.md**: Proxmox VE as open source variant, platform comparison, Mini variant (2–3 hosts), Ceph connectivity
- **SERVER-CONFIG.md**: Pure Ceph variant (3–6 hosts), HW specification, network design, BIOS settings
- **STORAGE.md**: Ceph architecture (MON/MGR/OSD, CRUSH map, BlueStore, replication), SDS overview
- **CONNECTIVITY.md**: Ethernet speeds (10/25 GbE), SFP+ form factor, NIC placement, management port
- **NETWORKING.md**: VLAN segmentation, MTU and jumbo frames, best practices
- **SERVER-HW.md**: CPU selection (EPYC vs Xeon), RAM population (1DPC/2DPC), NUMA, form factors
---
*Last revision: 2026-06-04*

BIN
sources/.DS_Store vendored Normal file

Binary file not shown.

21
sources/README.en.md Normal file

@@ -0,0 +1,21 @@
# Raw sources — Immutable reference data
This directory contains raw reference data (links, books, standards, RFCs) from which the knowledge base is built.
**Rules:**
- Content is **immutable** — once added, it does not change (append only)
- A source is tagged `[done]` if it has already been processed into the KB
- Each area has its own `sources.md`
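
The `[done]` convention lends itself to a quick check for sources that still need processing. The following sketch scans the Markdown tables of a `sources.md` file and lists rows whose Status cell lacks the tag. It assumes the table layout used in this repository (a header row ending in a `Status` column, with a non-table line between tables); the function name is hypothetical.

```python
def pending_sources(markdown_text):
    """Return the first cell of every row in a Status table not tagged [done]."""
    pending = []
    in_status_table = False
    for line in markdown_text.splitlines():
        stripped = line.strip()
        if not stripped.startswith("|"):
            in_status_table = False  # any non-table line ends the current table
            continue
        cells = [cell.strip() for cell in stripped.strip("|").split("|")]
        if cells[-1] == "Status":
            in_status_table = True  # header row of a table that tracks status
            continue
        if all(set(cell) <= set("-:") for cell in cells):
            continue  # the |---|---| separator row
        if in_status_table and "[done]" not in cells[-1]:
            pending.append(cells[0])
    return pending


if __name__ == "__main__":
    sample = (
        "## Books\n"
        "| Name | Author | ISBN | Status |\n"
        "|------|--------|------|--------|\n"
        "| Example Book A | Author X | 000 | `[done]` |\n"
        "| Example Book B | Author Y | 111 | |\n"
    )
    print(pending_sources(sample))
```

Tables without a Status column, such as the certification lists, are ignored. The sample rows are invented for the demonstration; every source listed in this commit is already tagged `[done]`.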
## Structure
```
sources/
├── README.md
├── cloud/
├── networking/
├── monitoring/
├── cicd/
├── databases/
└── infrastructure/
```

21
sources/README.md Normal file

@@ -0,0 +1,21 @@
# Raw sources: immutable reference data
This directory contains raw reference data (links, books, standards, RFCs) from which the knowledge base is built.
**Rules:**
- Content is **immutable**: once added, it does not change (append only)
- A source is tagged `[done]` once it has been processed into the KB
- Each area has its own `sources.md`
## Structure
```
sources/
├── README.md
├── cloud/
├── networking/
├── monitoring/
├── cicd/
├── databases/
└── infrastructure/
```


@@ -0,0 +1,35 @@
# CI/CD and DevOps — Sources
## Official documentation
| Source | URL | Status |
|-------|-----|--------|
| Terraform docs | https://developer.hashicorp.com/terraform/docs | `[done]` |
| ArgoCD docs | https://argo-cd.readthedocs.io/ | `[done]` |
| Flux docs | https://fluxcd.io/flux/ | `[done]` |
| GitHub Actions docs | https://docs.github.com/en/actions | `[done]` |
| GitLab CI docs | https://docs.gitlab.com/ee/ci/ | `[done]` |
## Books
| Name | Author | ISBN | Status |
|-------|-------|------|--------|
| The DevOps Handbook | Kim, Humble, Debois, Willis | 978-1942788003 | `[done]` |
| Infrastructure as Code (2nd ed.) | Kief Morris | 978-1098114671 | `[done]` |
| Terraform: Up and Running (3rd ed.) | Yevgeniy Brikman | 978-1098166045 | `[done]` |
| Continuous Delivery | Humble, Farley | 978-0321601912 | `[done]` |
## Standards
| Standard | Description | Status |
|----------|-------|--------|
| 12 Factor App | https://12factor.net/ | `[done]` |
| CNCF Cloud Native Landscape | https://landscape.cncf.io/ | `[done]` |
## New books (2024–2026)
| Name | Author | ISBN | Status |
|-------|-------|------|--------|
| CI/CD Design Patterns | Bajpai, Schildmeijer, Piwosz, Mishra | 978-1-83588-965-7 | `[done]` |
| AI-Native Software Delivery | Durkin, Minick, Gaikwad | — (O'Reilly, 2025) | `[done]` |
| DevOps Frameworks, Techniques, and Tools | Vijayakumaran, Kofler, Öggl, Springer | 978-1-4932-2670-2 | `[done]` |

35
sources/cicd/sources.md Normal file

@@ -0,0 +1,35 @@
# CI/CD and DevOps: Sources
## Official documentation
| Source | URL | Status |
|-------|-----|--------|
| Terraform docs | https://developer.hashicorp.com/terraform/docs | `[done]` |
| ArgoCD docs | https://argo-cd.readthedocs.io/ | `[done]` |
| Flux docs | https://fluxcd.io/flux/ | `[done]` |
| GitHub Actions docs | https://docs.github.com/en/actions | `[done]` |
| GitLab CI docs | https://docs.gitlab.com/ee/ci/ | `[done]` |
## Books
| Name | Author | ISBN | Status |
|-------|-------|------|--------|
| The DevOps Handbook | Kim, Humble, Debois, Willis | 978-1942788003 | `[done]` |
| Infrastructure as Code (2nd ed.) | Kief Morris | 978-1098114671 | `[done]` |
| Terraform: Up and Running (3rd ed.) | Yevgeniy Brikman | 978-1098166045 | `[done]` |
| Continuous Delivery | Humble, Farley | 978-0321601912 | `[done]` |
## Standards
| Standard | Description | Status |
|----------|-------|--------|
| 12 Factor App | https://12factor.net/ | `[done]` |
| CNCF Cloud Native Landscape | https://landscape.cncf.io/ | `[done]` |
## New books (2024–2026)
| Name | Author | ISBN | Status |
|-------|-------|------|--------|
| CI/CD Design Patterns | Bajpai, Schildmeijer, Piwosz, Mishra | 978-1-83588-965-7 | `[done]` |
| AI-Native Software Delivery | Durkin, Minick, Gaikwad | — (O'Reilly, 2025) | `[done]` |
| DevOps Frameworks, Techniques, and Tools | Vijayakumaran, Kofler, Öggl, Springer | 978-1-4932-2670-2 | `[done]` |


@@ -0,0 +1,37 @@
# Cloud architecture — Sources
## Official documentation
| Source | URL | Status |
|-------|-----|--------|
| AWS Well-Architected Framework | https://docs.aws.amazon.com/wellarchitected/latest/framework/ | `[done]` |
| Azure Well-Architected Framework | https://learn.microsoft.com/en-us/azure/well-architected/ | `[done]` |
| Google Cloud Architecture Framework | https://cloud.google.com/architecture/framework | `[done]` |
| AWS Multi-AZ / Multi-Region whitepaper | https://docs.aws.amazon.com/whitepapers/latest/aws-fault-isolation-boundaries/ | `[done]` |
## Books
| Name | Author | ISBN | Status |
|-------|-------|------|--------|
| Cloud Architecture Patterns | Bill Wilder | 978-1449319779 | `[done]` |
| Building Evolutionary Architectures | Ford, Parsons, Kua | 978-1492097549 | `[done]` |
## New books (2024–2026)
| Name | Author | ISBN | Status |
|-------|-------|------|--------|
| Multi-Cloud Administration Guide | Jeroen Mulder | 978-1-5015-1948-2 | `[done]` |
| AWS for Solutions Architects (3rd ed.) | Shrivastava, Srivastav, Thakur | 978-1-83664-193-3 | `[done]` |
| Engineering Resilient Systems on AWS | Schwarz, Moran, Bachmeier | 978-1-098-16241-2 | `[done]` |
| Building Resilient Architectures on AWS | — | 978-1-83588-711-0 | `[done]` |
| Multi-Cloud Handbook for Developers | Natarajan, Jacob | 978-1-80461-709-0 | `[done]` |
| The Azure Cloud Native Architecture Mapbook (2nd ed.) | Stéphane Eyskens | 978-1-80580-505-2 | `[done]` |
| Cloud Computing: AWS, Azure, and Google Cloud | Azhar ul Haque Sario | 978-3384756886 | `[done]` |
## Certifications
| Certification | Area |
|-------------|--------|
| AWS Solutions Architect — Associate | AWS |
| Azure Solutions Architect Expert | Azure |
| Google Professional Cloud Architect | GCP |

37
sources/cloud/sources.md Normal file

@@ -0,0 +1,37 @@
# Cloud architecture: Sources
## Official documentation
| Source | URL | Status |
|-------|-----|--------|
| AWS Well-Architected Framework | https://docs.aws.amazon.com/wellarchitected/latest/framework/ | `[done]` |
| Azure Well-Architected Framework | https://learn.microsoft.com/en-us/azure/well-architected/ | `[done]` |
| Google Cloud Architecture Framework | https://cloud.google.com/architecture/framework | `[done]` |
| AWS Multi-AZ / Multi-Region whitepaper | https://docs.aws.amazon.com/whitepapers/latest/aws-fault-isolation-boundaries/ | `[done]` |
## Books
| Name | Author | ISBN | Status |
|-------|-------|------|--------|
| Cloud Architecture Patterns | Bill Wilder | 978-1449319779 | `[done]` |
| Building Evolutionary Architectures | Ford, Parsons, Kua | 978-1492097549 | `[done]` |
## New books (2024–2026)
| Name | Author | ISBN | Status |
|-------|-------|------|--------|
| Multi-Cloud Administration Guide | Jeroen Mulder | 978-1-5015-1948-2 | `[done]` |
| AWS for Solutions Architects (3rd ed.) | Shrivastava, Srivastav, Thakur | 978-1-83664-193-3 | `[done]` |
| Engineering Resilient Systems on AWS | Schwarz, Moran, Bachmeier | 978-1-098-16241-2 | `[done]` |
| Building Resilient Architectures on AWS | — | 978-1-83588-711-0 | `[done]` |
| Multi-Cloud Handbook for Developers | Natarajan, Jacob | 978-1-80461-709-0 | `[done]` |
| The Azure Cloud Native Architecture Mapbook (2nd ed.) | Stéphane Eyskens | 978-1-80580-505-2 | `[done]` |
| Cloud Computing: AWS, Azure, and Google Cloud | Azhar ul Haque Sario | 978-3384756886 | `[done]` |
## Certifications
| Certification | Area |
|-------------|--------|
| AWS Solutions Architect — Associate | AWS |
| Azure Solutions Architect Expert | Azure |
| Google Professional Cloud Architect | GCP |


@@ -0,0 +1,34 @@
# Database architecture — Sources
## Official documentation
| Source | URL | Status |
|-------|-----|--------|
| PostgreSQL docs | https://www.postgresql.org/docs/ | `[done]` |
| MySQL docs | https://dev.mysql.com/doc/ | `[done]` |
| MongoDB docs | https://www.mongodb.com/docs/ | `[done]` |
| Redis docs | https://redis.io/docs/ | `[done]` |
| Cassandra docs | https://cassandra.apache.org/doc/ | `[done]` |
| Amazon DynamoDB docs | https://docs.aws.amazon.com/dynamodb/ | `[done]` |
## Books
| Name | Author | ISBN | Status |
|-------|-------|------|--------|
| Designing Data-Intensive Applications (1st ed.) | Martin Kleppmann | 978-1449373320 | `[done]` |
| Designing Data-Intensive Applications (2nd ed.) | Kleppmann, Riccomini | 978-1098119058 | `[done]` |
| Database Internals | Alex Petrov | 978-1492040346 | `[done]` |
| High Performance MySQL | Schwartz, Zaitsev, Tkachenko | 978-1492080510 | `[done]` |
| PostgreSQL: Up and Running | Regina Obe, Leo Hsu | 978-1491963418 | `[done]` |
| Architecting an Apache Iceberg Lakehouse | Alex Merced | 978-1-63343-510-0 | `[done]` |
| More SQL Antipatterns | Bill Karwin | 979-8888652060 | `[done]` |
| AI-Ready PostgreSQL 18 | Vibhor Kumar, Marc Linster | 978-1-80602-847-4 | `[done]` |
| Vector Databases | Nitin Borwankar | 978-1-098-17758-4 | `[done]` |
## Articles / talks
| Name | URL | Status |
|-------|-----|--------|
| CAP Theorem (Eric Brewer) | https://www.infoq.com/articles/cap-twelve-years-later-how-the-rules-have-changed/ | `[done]` |
| PACELC theorem | https://www.cs.umd.edu/~abadi/papers/abadi-pacelc.pdf | `[done]` |
| Amazon Dynamo paper (SOSP 2007) | https://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf | `[done]` |


@@ -0,0 +1,34 @@
# Database architecture: Sources
## Official documentation
| Source | URL | Status |
|-------|-----|--------|
| PostgreSQL docs | https://www.postgresql.org/docs/ | `[done]` |
| MySQL docs | https://dev.mysql.com/doc/ | `[done]` |
| MongoDB docs | https://www.mongodb.com/docs/ | `[done]` |
| Redis docs | https://redis.io/docs/ | `[done]` |
| Cassandra docs | https://cassandra.apache.org/doc/ | `[done]` |
| Amazon DynamoDB docs | https://docs.aws.amazon.com/dynamodb/ | `[done]` |
## Books
| Name | Author | ISBN | Status |
|-------|-------|------|--------|
| Designing Data-Intensive Applications (1st ed.) | Martin Kleppmann | 978-1449373320 | `[done]` |
| Designing Data-Intensive Applications (2nd ed.) | Kleppmann, Riccomini | 978-1098119058 | `[done]` |
| Database Internals | Alex Petrov | 978-1492040346 | `[done]` |
| High Performance MySQL | Schwartz, Zaitsev, Tkachenko | 978-1492080510 | `[done]` |
| PostgreSQL: Up and Running | Regina Obe, Leo Hsu | 978-1491963418 | `[done]` |
| Architecting an Apache Iceberg Lakehouse | Alex Merced | 978-1-63343-510-0 | `[done]` |
| More SQL Antipatterns | Bill Karwin | 979-8888652060 | `[done]` |
| AI-Ready PostgreSQL 18 | Vibhor Kumar, Marc Linster | 978-1-80602-847-4 | `[done]` |
| Vector Databases | Nitin Borwankar | 978-1-098-17758-4 | `[done]` |
## Articles / talks
| Name | URL | Status |
|-------|-----|--------|
| CAP Theorem (Eric Brewer) | https://www.infoq.com/articles/cap-twelve-years-later-how-the-rules-have-changed/ | `[done]` |
| PACELC theorem | https://www.cs.umd.edu/~abadi/papers/abadi-pacelc.pdf | `[done]` |
| Amazon Dynamo paper (SOSP 2007) | https://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf | `[done]` |


@@ -0,0 +1,182 @@
# Infrastructure — Sources
Split into separate files:
- [HYPERVISORS.en.md](../../HYPERVISORS.en.md) — hypervisors and virtualization
- [DATACENTERS.en.md](../../DATACENTERS.en.md) — data centers
- [STORAGE.en.md](../../STORAGE.en.md) — storage
- [HARDWARE.en.md](../../HARDWARE.en.md) — hardware and servers
## Official documentation
| Source | URL | Status |
|-------|-----|--------|
| VMware vSphere docs | https://docs.vmware.com/en/VMware-vSphere/ | `[done]` |
| Microsoft Hyper-V docs | https://learn.microsoft.com/en-us/windows-server/virtualization/hyper-v/ | `[done]` |
| Proxmox VE docs | https://pve.proxmox.com/wiki/Main_Page | `[done]` |
| OpenStack docs | https://docs.openstack.org/ | `[done]` |
| Ceph docs | https://docs.ceph.com/ | `[done]` |
| Redfish specification | https://www.dmtf.org/standards/redfish | `[done]` |
## Standards
| Standard | Description | Status |
|----------|-------|--------|
| TIA-942 | Telecommunications Infrastructure Standard for Data Centers | `[done]` |
| Uptime Institute Tier Standard | Data Center Tier Classification | `[done]` |
| ASHRAE TC 9.9 | Thermal Guidelines for Data Processing Environments | `[done]` |
| S.M.A.R.T. | Self-Monitoring, Analysis and Reporting Technology | `[done]` |
## Books
| Name | Author | ISBN | Status |
|-------|-------|------|--------|
| The Data Center as a Computer (1st ed. → 4th ed. 2025) | Barroso, Hölzle, Ranganathan | 978-3-031-99488-3 | `[done]` |
| Storage Systems | Ganger, Gibson | 978-1680837540 | `[done]` |
| Virtualization Essentials | Matthew Portnoy | 978-1119481513 | `[done]` |
| VMware vSphere Design (2nd ed.) | Forbes Guthrie, Scott Lowe | 978-1119130312 | `[done]` |
| AI Data Center Network Design and Technologies (1st ed.) | Subramaniam, Styszynski, Tambakuwala | 978-0-13-543628-8 | `[done]` |
| Electronics Cooling: From the Chip to the Datacenter | Abraham et al. | 978-0-443-47084-4 | `[done]` |
| The AI Cloud Infrastructure Blueprint | Thummarakoti, Vududala, Madupati, Kaushik | 978-1-041-16642-9 | `[done]` |
## Server connectivity
| Source | URL | Status |
|-------|-----|--------|
| HPE Gen11 NIC selection guide | https://www.hpe.com/psnow/doc/a50007643enw | `[done]` |
| Broadcom / Emulex FC HBA specs | https://www.broadcom.com/products/storage/fibre-channel-host-bus-adapters | `[done]` |
| NVIDIA Mellanox Ethernet + InfiniBand adapters | https://www.nvidia.com/en-us/networking/ethernet/ | `[done]` |
| NVMe-oF specification (NVM Express Inc.) | https://nvmexpress.org/specifications/ | `[done]` |
| Dell PowerEdge R760 NIC placement guide | https://www.dell.com/support/manuals/en-us/poweredge-r760/per760_ism_pub/ | `[done]` |
## Server memory — DIMM population
| Source | URL | Status |
|-------|-----|--------|
| Dell PowerEdge R760 Installation & Service Manual — System Memory Guidelines | https://www.dell.com/support/manuals/en-al/oth-r760/per760_ism_pub/system-memory-guidelines | `[done]` |
| Dell PowerEdge R760 — General Memory Module Installation Guidelines | https://www.dell.com/support/manuals/en-al/oth-r760/per760_ism_pub/general-memory-module-installation-guidelines | `[done]` |
| HPE Gen11 Server Memory Population Rules (4th Gen Intel Xeon) | https://www.hpe.com/psnow/doc/a50007437enw | `[done]` |
| HPE Gen11 Server Memory Population Rules (5th Gen Intel Xeon) | https://www.hpe.com/psnow/doc/a50010242enw | `[done]` |
| HPE Gen11/Gen12 Server Memory Population Rules (AMD EPYC 9005) | https://www.hpe.com/psnow/doc/a50012817enw | `[done]` |
| Single Rank vs Dual Rank vs Quad Rank vs Octa Rank Memory | https://corewavelabs.com/single-rank-vs-dual-rank-vs-quad-vs-octa-memory/ | `[done]` |
## Enterprise storage
| Source | URL | Status |
|-------|-----|--------|
| Hitachi VSP 5000 series datasheet | https://www.hitachivantara.com/en-us/products/storage/vsp-5000-series | `[done]` |
| Hitachi VSP E series datasheet | https://www.hitachivantara.com/en-us/products/storage/vsp-e-series | `[done]` |
| Huawei OceanStor Dorado V6 datasheet | https://e.huawei.com/en/products/storage/all-flash-storage/dorado-8000 | `[done]` |
| Huawei OceanStor Dorado V7 announcement | https://e.huawei.com/en/news/2025/oceanstor-dorado-v7 | `[done]` |
| Dell PowerStore documentation | https://www.dell.com/en-us/dt/storage/powerstore.htm | `[done]` |
| Dell PowerMax documentation | https://www.dell.com/en-us/dt/storage/powermax.htm | `[done]` |
| HPE Alletra documentation | https://www.hpe.com/us/en/storage/alletra.html | `[done]` |
| Infinidat InfiniBox SSA G4 datasheet | https://www.infinidat.com/en/products/infinibox-ssa | `[done]` |
| Pure Storage FlashArray documentation | https://www.purestorage.com/products/flasharray.html | `[done]` |
| Lenovo ThinkSystem DM series docs | https://lenovopress.com/storage/thinkstorage/dm-series | `[done]` |
| Lenovo ThinkSystem DE series docs | https://lenovopress.com/storage/thinkstorage/de-series | `[done]` |
| Synology Unified Controller datasheet | https://www.synology.com/en-us/products/UC3400 | `[done]` |
## OpenStack
| Source | URL | Status |
|-------|-----|--------|
| OpenStack Neutron networking docs | https://docs.openstack.org/neutron/latest/ | `[done]` |
| OpenStack Cinder block storage docs | https://docs.openstack.org/cinder/latest/ | `[done]` |
| OpenStack Swift object storage docs | https://docs.openstack.org/swift/latest/ | `[done]` |
| OpenStack Cyborg GPU lifecycle docs | https://docs.openstack.org/cyborg/latest/ | `[done]` |
| OpenStack Ironic bare metal docs | https://docs.openstack.org/ironic/latest/ | `[done]` |
| TripleO deployment docs | https://docs.openstack.org/tripleo-docs/latest/ | `[done]` |
| OpenStack Kolla (containerized deployment) docs | https://docs.openstack.org/kolla/latest/ | `[done]` |
| Canonical Charmed OpenStack docs | https://ubuntu.com/openstack/docs | `[done]` |
| OpenStack Ceilometer / Telemetry docs | https://docs.openstack.org/ceilometer/latest/ | `[done]` |
| OpenStack Masakari (VM HA) docs | https://docs.openstack.org/masakari/latest/ | `[done]` |
| OpenQA — OpenStack CI/CD | https://github.com/openstack-infra/openqa | `[done]` |
| OpenStack Charms (Juju) deployment | https://charmhub.io/openstack | `[done]` |
| OpenStack Zuul CI/CD system | https://zuul-ci.org/docs/zuul/ | `[done]` |
## VMware exit strategy
| Source | URL | Status |
|-------|-----|--------|
| VMware Alternatives in 2026: A Practical Exit Playbook — Platform9 | https://platform9.com/blog/vmware-alternatives-in-2026-a-practical-exit-playbook | `[done]` |
| VMware Exit Strategy — Intelligent Visibility | https://intelligentvisibility.com/data-center-infrastructure/vmware-exit-strategy | `[done]` |
| VMware to Proxmox Migration Guide 2026 — Petronella Tech | https://petronellatech.com/blog/vmware-to-proxmox-migration-guide | `[done]` |
| Migrating from VMware to Proxmox — Hornetsecurity | https://www.hornetsecurity.com/en/blog/migrate-vmware-to-proxmox | `[done]` |
| The Great VMware Exodus — Virtualization Howto | https://www.virtualizationhowto.com/2025/07/the-great-vmware-exodus-real-migration-stories-and-alternatives-for-2025/ | `[done]` |
| VMware to Hyper-V Migration 2026 — iShift | https://www.ishift.net/vmware-hyper-v-migration-2026 | `[done]` |
| VMware to Nutanix Migration 2026 — Redress Compliance | https://redresscompliance.com/vmware-to-nutanix | `[done]` |
| Hyper-V Licensing 2026 — Redress Compliance | https://redresscompliance.com/hyper-v-licensing-2026 | `[done]` |
| Beyond virtualization: a guide to modern vSphere alternatives — Spectro Cloud | https://www.spectrocloud.com/blog/vsphere-alternatives | `[done]` |
| VMware Migration in 2026: Proxmox, KVM, XCP-ng & Veeam — StarWind | https://starwindsoftware.com/blog/vmware-migration-to-proxmox-kvm-xcp-ng-2026 | `[done]` |
| Broadcom VMware Acquisition: What's Next — Sayers | https://www.sayers.com/blog/after-the-deal-whats-next-for-vmware-customers | `[done]` |
| Stanford University migration from VMware to Proxmox | https://itcommunity.stanford.edu/news/enterprise-technology-completes-successful-virtual-infrastructure-migration-vmware-proxmox | `[done]` |
## Sangfor
| Source | URL | Status |
|-------|-----|--------|
| Sangfor HCI — product page | https://www.sangfor.com/cloud-and-infrastructure/products/hci-hyper-converged-infrastructure | `[done]` |
| Sangfor aSV — hypervisor | https://www.sangfor.com/cloud-and-infrastructure/products/asv-hypervisor-server-virtualization | `[done]` |
| Sangfor vs VMware — feature comparison | https://www.sangfor.com/blog/cloud-and-infrastructure/sangfor-hci-vs-vmware-feature-comparison | `[done]` |
## AI infrastructure
| Source | URL | Status |
|-------|-----|--------|
| NVIDIA DGX — documentation | https://www.nvidia.com/en-us/data-center/dgx-platform/ | `[done]` |
| InfiniBand — Mellanox/NVIDIA | https://www.nvidia.com/en-us/networking/products/infiniband/ | `[done]` |
| Lustre parallel filesystem | https://www.lustre.org/ | `[done]` |
| WekaFS — AI storage | https://www.weka.io/ | `[done]` |
| vLLM — inference server | https://github.com/vllm-project/vllm | `[done]` |
| Megatron-LM — distributed training | https://github.com/NVIDIA/Megatron-LM | `[done]` |
## Kubernetes / Cluster API
| Source | URL | Status |
|-------|-----|--------|
| Cluster API (CAPI) — official documentation (The CAPI Book) | https://cluster-api.sigs.k8s.io/ | `[done]` |
| Cluster API — GitHub (kubernetes-sigs/cluster-api) | https://github.com/kubernetes-sigs/cluster-api | `[done]` |
| Cluster API — provider list | https://cluster-api.sigs.k8s.io/reference/providers.html | `[done]` |
| Kubernetes — official documentation | https://kubernetes.io/docs/ | `[done]` |
| K3s — lightweight Kubernetes | https://k3s.io/ | `[done]` |
| RKE2 — Rancher Kubernetes Engine 2 | https://docs.rke2.io/ | `[done]` |
| Talos — API-driven Kubernetes OS | https://www.talos.dev/ | `[done]` |
| Kamaji — hosted control plane provider | https://kamaji.clastix.io/ | `[done]` |
| Metal3 — bare metal provider for CAPI | https://metal3.io/ | `[done]` |
| Cluster API — ClusterClass and topologies | https://kubernetes.io/blog/2021/10/08/capi-clusterclass-and-managed-topologies/ | `[done]` |
## Big Data
| Source | URL | Status |
|-------|-----|--------|
| Apache Spark — official documentation | https://spark.apache.org/docs/latest/ | `[done]` |
| Apache Flink — official documentation | https://flink.apache.org/ | `[done]` |
| Trino — distributed SQL engine | https://trino.io/docs/current/ | `[done]` |
| Apache Iceberg — table format | https://iceberg.apache.org/ | `[done]` |
| Delta Lake — documentation | https://docs.delta.io/ | `[done]` |
| Apache Hudi | https://hudi.apache.org/ | `[done]` |
| Apache Paimon | https://paimon.apache.org/ | `[done]` |
| Apache Hadoop — documentation | https://hadoop.apache.org/docs/stable/ | `[done]` |
| Apache Airflow — documentation | https://airflow.apache.org/docs/ | `[done]` |
| Dagster — documentation | https://docs.dagster.io/ | `[done]` |
| Prefect — documentation | https://docs.prefect.io/ | `[done]` |
| HDFS architecture (Apache) | https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html | `[done]` |
## Operating systems
| Source | URL | Status |
|-------|-----|--------|
| Ubuntu lifecycle — Ubuntu Pro + ESM | https://ubuntu.com/about/release-cycle | `[done]` |
| RHEL lifecycle — Red Hat Enterprise Linux | https://access.redhat.com/support/policy/updates/errata | `[done]` |
| Rocky Linux lifecycle | https://rockylinux.org/download/ | `[done]` |
| AlmaLinux lifecycle | https://almalinux.org/ | `[done]` |
| Debian releases / LTS | https://wiki.debian.org/LTS | `[done]` |
| SLES lifecycle — SUSE | https://www.suse.com/lifecycle/ | `[done]` |
| Alpine Linux releases | https://alpinelinux.org/releases/ | `[done]` |
| Fedora lifecycle | https://docs.fedoraproject.org/en-US/releases/lifecycle/ | `[done]` |
| SELinux — Red Hat docs | https://www.redhat.com/en/topics/linux/what-is-selinux | `[done]` |
| AppArmor — Ubuntu wiki | https://wiki.ubuntu.com/AppArmor | `[done]` |
## Windows
| Source | URL | Status |
|-------|-----|--------|
| Windows Server 2022 lifecycle | https://learn.microsoft.com/en-us/lifecycle/products/windows-server-2022/ | `[done]` |
| Windows Server 2025 lifecycle | https://learn.microsoft.com/en-us/lifecycle/products/windows-server-2025/ | `[done]` |
| Windows 11 lifecycle | https://learn.microsoft.com/en-us/lifecycle/products/windows-11-enterprise/ | `[done]` |
| Windows 10 EOL | https://learn.microsoft.com/en-us/lifecycle/products/windows-10-enterprise/ | `[done]` |
| Windows Server licensing (per core) | https://learn.microsoft.com/en-us/windows-server/get-started/editions-and-support | `[done]` |
## GPU pricing
| Source | URL | Status |
|-------|-----|--------|
| NVIDIA AI GPU pricing guide (2026) | https://intuitionlabs.ai/articles/nvidia-ai-gpu-pricing-guide | `[done]` |
| GPU cloud pricing comparison (2026) | https://www.spheron.network/blog/gpu-cloud-pricing-comparison-2026/ | `[done]` |
| GPU pricing trends 2026 — CompuX | https://compux.net/docs/guides/gpu-pricing-trends-2026 | `[done]` |
| AMD MI300X pricing (2026) | https://www.thundercompute.com/blog/amd-mi300x-pricing | `[done]` |
| GPU price/performance frontier — Silicon Analysts | https://siliconanalysts.com/tools/frontier | `[done]` |
## Hardware manufacturers
| Manufacturer | Server series | Management |
|---------|---------------|------------|
| Dell | PowerEdge (R6xx, R7xx) | iDRAC / OpenManage |
| HPE | ProLiant (DL, ML, Synergy) | iLO / OneView |
| Cisco | UCS (B-Series, C-Series) | UCS Manager / Intersight |
| Lenovo | ThinkSystem (SR, ST) | XClarity |
| Supermicro | SuperServer (cloud, storage, GPU) | IPMI / SuperDoctor |

@@ -0,0 +1,197 @@
# Infrastructure — Sources
Split into separate files:
- [HYPERVISORS.md](../../HYPERVISORS.md) — hypervisors and virtualization
- [DATACENTERS.md](../../DATACENTERS.md) — data centers
- [STORAGE.md](../../STORAGE.md) — storage
- [HARDWARE.md](../../HARDWARE.md) — hardware and servers
## Official documentation
| Source | URL | Status |
|-------|-----|--------|
| VMware vSphere docs | https://docs.vmware.com/en/VMware-vSphere/ | `[done]` |
| Microsoft Hyper-V docs | https://learn.microsoft.com/en-us/windows-server/virtualization/hyper-v/ | `[done]` |
| Proxmox VE docs | https://pve.proxmox.com/wiki/Main_Page | `[done]` |
| OpenStack docs | https://docs.openstack.org/ | `[done]` |
| Ceph docs | https://docs.ceph.com/ | `[done]` |
| Redfish specification | https://www.dmtf.org/standards/redfish | `[done]` |
## Standards
| Standard | Description | Status |
|----------|-------|--------|
| TIA-942 | Telecommunications Infrastructure Standard for Data Centers | `[done]` |
| Uptime Institute Tier Standard | Data Center Tier Classification | `[done]` |
| ASHRAE TC 9.9 | Thermal Guidelines for Data Processing Environments | `[done]` |
| S.M.A.R.T. | Self-Monitoring, Analysis and Reporting Technology | `[done]` |
## Books
| Name | Author | ISBN | Status |
|-------|-------|------|--------|
| The Data Center as a Computer (1st ed. → 4th ed. 2025) | Barroso, Hölzle, Ranganathan | 978-3-031-99488-3 | `[done]` |
| Storage Systems | Ganger, Gibson | 978-1680837540 | `[done]` |
| Virtualization Essentials | Matthew Portnoy | 978-1119481513 | `[done]` |
| VMware vSphere Design (2nd ed.) | Forbes Guthrie, Scott Lowe | 978-1119130312 | `[done]` |
| AI Data Center Network Design and Technologies (1st ed.) | Subramaniam, Styszynski, Tambakuwala | 978-0-13-543628-8 | `[done]` |
| Electronics Cooling: From the Chip to the Datacenter | Abraham et al. | 978-0-443-47084-4 | `[done]` |
| The AI Cloud Infrastructure Blueprint | Thummarakoti, Vududala, Madupati, Kaushik | 978-1-041-16642-9 | `[done]` |
## Server connectivity
| Source | URL | Status |
|-------|-----|--------|
| HPE Gen11 NIC selection guide | https://www.hpe.com/psnow/doc/a50007643enw | `[done]` |
| Broadcom / Emulex FC HBA specs | https://www.broadcom.com/products/storage/fibre-channel-host-bus-adapters | `[done]` |
| NVIDIA Mellanox Ethernet + InfiniBand adapters | https://www.nvidia.com/en-us/networking/ethernet/ | `[done]` |
| NVMe-oF specification (NVM Express Inc.) | https://nvmexpress.org/specifications/ | `[done]` |
| Dell PowerEdge R760 NIC placement guide | https://www.dell.com/support/manuals/en-us/poweredge-r760/per760_ism_pub/ | `[done]` |
## Server memory — DIMM population
| Source | URL | Status |
|-------|-----|--------|
| Dell PowerEdge R760 Installation & Service Manual — System Memory Guidelines | https://www.dell.com/support/manuals/en-al/oth-r760/per760_ism_pub/system-memory-guidelines | `[done]` |
| Dell PowerEdge R760 — General Memory Module Installation Guidelines | https://www.dell.com/support/manuals/en-al/oth-r760/per760_ism_pub/general-memory-module-installation-guidelines | `[done]` |
| HPE Gen11 Server Memory Population Rules (4th Gen Intel Xeon) | https://www.hpe.com/psnow/doc/a50007437enw | `[done]` |
| HPE Gen11 Server Memory Population Rules (5th Gen Intel Xeon) | https://www.hpe.com/psnow/doc/a50010242enw | `[done]` |
| HPE Gen11/Gen12 Server Memory Population Rules (AMD EPYC 9005) | https://www.hpe.com/psnow/doc/a50012817enw | `[done]` |
| Single Rank vs Dual Rank vs Quad Rank vs Octa Rank Memory | https://corewavelabs.com/single-rank-vs-dual-rank-vs-quad-vs-octa-memory/ | `[done]` |
## Enterprise storage
| Source | URL | Status |
|-------|-----|--------|
| Hitachi VSP 5000 series datasheet | https://www.hitachivantara.com/en-us/products/storage/vsp-5000-series | `[done]` |
| Hitachi VSP E series datasheet | https://www.hitachivantara.com/en-us/products/storage/vsp-e-series | `[done]` |
| Huawei OceanStor Dorado V6 datasheet | https://e.huawei.com/en/products/storage/all-flash-storage/dorado-8000 | `[done]` |
| Huawei OceanStor Dorado V7 announcement | https://e.huawei.com/en/news/2025/oceanstor-dorado-v7 | `[done]` |
| Dell PowerStore documentation | https://www.dell.com/en-us/dt/storage/powerstore.htm | `[done]` |
| Dell PowerMax documentation | https://www.dell.com/en-us/dt/storage/powermax.htm | `[done]` |
| HPE Alletra documentation | https://www.hpe.com/us/en/storage/alletra.html | `[done]` |
| Infinidat InfiniBox SSA G4 datasheet | https://www.infinidat.com/en/products/infinibox-ssa | `[done]` |
| Pure Storage FlashArray documentation | https://www.purestorage.com/products/flasharray.html | `[done]` |
| Lenovo ThinkSystem DM series docs | https://lenovopress.com/storage/thinkstorage/dm-series | `[done]` |
| Lenovo ThinkSystem DE series docs | https://lenovopress.com/storage/thinkstorage/de-series | `[done]` |
| Synology Unified Controller datasheet | https://www.synology.com/en-us/products/UC3400 | `[done]` |
## OpenStack
| Source | URL | Status |
|-------|-----|--------|
| OpenStack Neutron networking docs | https://docs.openstack.org/neutron/latest/ | `[done]` |
| OpenStack Cinder block storage docs | https://docs.openstack.org/cinder/latest/ | `[done]` |
| OpenStack Swift object storage docs | https://docs.openstack.org/swift/latest/ | `[done]` |
| OpenStack Cyborg GPU lifecycle docs | https://docs.openstack.org/cyborg/latest/ | `[done]` |
| OpenStack Ironic bare metal docs | https://docs.openstack.org/ironic/latest/ | `[done]` |
| TripleO deployment docs | https://docs.openstack.org/tripleo-docs/latest/ | `[done]` |
| OpenStack Kolla (containerized deployment) docs | https://docs.openstack.org/kolla/latest/ | `[done]` |
| Canonical Charmed OpenStack docs | https://ubuntu.com/openstack/docs | `[done]` |
| OpenStack Ceilometer / Telemetry docs | https://docs.openstack.org/ceilometer/latest/ | `[done]` |
| OpenStack Masakari (VM HA) docs | https://docs.openstack.org/masakari/latest/ | `[done]` |
| OpenQA — OpenStack CI/CD | https://github.com/openstack-infra/openqa | `[done]` |
| OpenStack Charms (Juju) deployment | https://charmhub.io/openstack | `[done]` |
| OpenStack Zuul CI/CD system | https://zuul-ci.org/docs/zuul/ | `[done]` |
## VMware exit strategy
| Source | URL | Status |
|-------|-----|--------|
| VMware Alternatives in 2026: A Practical Exit Playbook — Platform9 | https://platform9.com/blog/vmware-alternatives-in-2026-a-practical-exit-playbook | `[done]` |
| VMware Exit Strategy — Intelligent Visibility | https://intelligentvisibility.com/data-center-infrastructure/vmware-exit-strategy | `[done]` |
| VMware to Proxmox Migration Guide 2026 — Petronella Tech | https://petronellatech.com/blog/vmware-to-proxmox-migration-guide | `[done]` |
| Migrating from VMware to Proxmox — Hornetsecurity | https://www.hornetsecurity.com/en/blog/migrate-vmware-to-proxmox | `[done]` |
| The Great VMware Exodus — Virtualization Howto | https://www.virtualizationhowto.com/2025/07/the-great-vmware-exodus-real-migration-stories-and-alternatives-for-2025/ | `[done]` |
| VMware to Hyper-V Migration 2026 — iShift | https://www.ishift.net/vmware-hyper-v-migration-2026 | `[done]` |
| VMware to Nutanix Migration 2026 — Redress Compliance | https://redresscompliance.com/vmware-to-nutanix | `[done]` |
| Hyper-V Licensing 2026 — Redress Compliance | https://redresscompliance.com/hyper-v-licensing-2026 | `[done]` |
| Beyond virtualization: a guide to modern vSphere alternatives — Spectro Cloud | https://www.spectrocloud.com/blog/vsphere-alternatives | `[done]` |
| VMware Migration in 2026: Proxmox, KVM, XCP-ng & Veeam — StarWind | https://starwindsoftware.com/blog/vmware-migration-to-proxmox-kvm-xcp-ng-2026 | `[done]` |
| Broadcom VMware Acquisition: What's Next — Sayers | https://www.sayers.com/blog/after-the-deal-whats-next-for-vmware-customers | `[done]` |
| Stanford University migration from VMware to Proxmox | https://itcommunity.stanford.edu/news/enterprise-technology-completes-successful-virtual-infrastructure-migration-vmware-proxmox | `[done]` |
## Messaging / streaming
| Source | URL | Status |
|-------|-----|--------|
| Apache Kafka docs | https://kafka.apache.org/documentation/ | `[done]` |
| RabbitMQ docs | https://www.rabbitmq.com/documentation.html | `[done]` |
| Apache Pulsar docs | https://pulsar.apache.org/docs/ | `[done]` |
| NATS docs | https://docs.nats.io/ | `[done]` |
| Designing Event-Driven Systems (Confluent) | https://www.confluent.io/designing-event-driven-systems/ | `[done]` |
| Kafka: The Definitive Guide (2nd ed.) — Confluent | https://www.confluent.io/resources/kafka-the-definitive-guide/ | `[done]` |
| Enterprise Integration Patterns — Hohpe & Woolf | https://www.enterpriseintegrationpatterns.com/ | `[done]` |
## DC migration
| Source | URL | Status |
|-------|-----|--------|
| AWS Cloud Migration — 6 Strategies for Migrating to the Cloud | https://aws.amazon.com/blogs/enterprise-strategy/6-strategies-for-migrating-applications-to-the-cloud/ | `[done]` |
| Azure Cloud Migration — Microsoft Cloud Adoption Framework | https://learn.microsoft.com/en-us/azure/cloud-adoption-framework/ | `[done]` |
| Gartner 5 Rs of Cloud Migration | https://www.gartner.com/en/documents/3984835 | `[done]` |
| VMware Site Recovery Manager — documentation | https://docs.vmware.com/en/Site-Recovery-Manager/ | `[done]` |
| Zerto — Disaster Recovery & Migration | https://www.zerto.com/resources/ | `[done]` |
| The Phoenix Project — IT Ops & Migration patterns | https://itrevolution.com/product/the-phoenix-project/ | `[done]` |
## Sangfor
| Source | URL | Status |
|-------|-----|--------|
| Sangfor HCI — product page | https://www.sangfor.com/cloud-and-infrastructure/products/hci-hyper-converged-infrastructure | `[done]` |
| Sangfor aSV — hypervisor | https://www.sangfor.com/cloud-and-infrastructure/products/asv-hypervisor-server-virtualization | `[done]` |
| Sangfor vs VMware — feature comparison | https://www.sangfor.com/blog/cloud-and-infrastructure/sangfor-hci-vs-vmware-feature-comparison | `[done]` |
## AI infrastructure
| Source | URL | Status |
|-------|-----|--------|
| NVIDIA DGX — documentation | https://www.nvidia.com/en-us/data-center/dgx-platform/ | `[done]` |
| InfiniBand — Mellanox/NVIDIA | https://www.nvidia.com/en-us/networking/products/infiniband/ | `[done]` |
| Lustre parallel filesystem | https://www.lustre.org/ | `[done]` |
| WekaFS — AI storage | https://www.weka.io/ | `[done]` |
| vLLM — inference server | https://github.com/vllm-project/vllm | `[done]` |
| Megatron-LM — distributed training | https://github.com/NVIDIA/Megatron-LM | `[done]` |
## Kubernetes / Cluster API
| Source | URL | Status |
|-------|-----|--------|
| Cluster API (CAPI) — official documentation (The CAPI Book) | https://cluster-api.sigs.k8s.io/ | `[done]` |
| Cluster API — GitHub (kubernetes-sigs/cluster-api) | https://github.com/kubernetes-sigs/cluster-api | `[done]` |
| Cluster API — provider list | https://cluster-api.sigs.k8s.io/reference/providers.html | `[done]` |
| Kubernetes — official documentation | https://kubernetes.io/docs/ | `[done]` |
| K3s — lightweight Kubernetes | https://k3s.io/ | `[done]` |
| RKE2 — Rancher Kubernetes Engine 2 | https://docs.rke2.io/ | `[done]` |
| Talos — API-driven Kubernetes OS | https://www.talos.dev/ | `[done]` |
| Kamaji — hosted control plane provider | https://kamaji.clastix.io/ | `[done]` |
| Metal3 — bare metal provider for CAPI | https://metal3.io/ | `[done]` |
| Cluster API — ClusterClass and topologies | https://kubernetes.io/blog/2021/10/08/capi-clusterclass-and-managed-topologies/ | `[done]` |
## Big Data
| Source | URL | Status |
|-------|-----|--------|
| Apache Spark — official documentation | https://spark.apache.org/docs/latest/ | `[done]` |
| Apache Flink — official documentation | https://flink.apache.org/ | `[done]` |
| Trino — distributed SQL engine | https://trino.io/docs/current/ | `[done]` |
| Apache Iceberg — table format | https://iceberg.apache.org/ | `[done]` |
| Delta Lake — documentation | https://docs.delta.io/ | `[done]` |
| Apache Hudi | https://hudi.apache.org/ | `[done]` |
| Apache Paimon | https://paimon.apache.org/ | `[done]` |
| Apache Hadoop — documentation | https://hadoop.apache.org/docs/stable/ | `[done]` |
| Apache Airflow — documentation | https://airflow.apache.org/docs/ | `[done]` |
| Dagster — documentation | https://docs.dagster.io/ | `[done]` |
| Prefect — documentation | https://docs.prefect.io/ | `[done]` |
| HDFS architecture (Apache) | https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html | `[done]` |
## Operating systems
| Source | URL | Status |
|-------|-----|--------|
| Ubuntu lifecycle — Ubuntu Pro + ESM | https://ubuntu.com/about/release-cycle | `[done]` |
| RHEL lifecycle — Red Hat Enterprise Linux | https://access.redhat.com/support/policy/updates/errata | `[done]` |
| Rocky Linux lifecycle | https://rockylinux.org/download/ | `[done]` |
| AlmaLinux lifecycle | https://almalinux.org/ | `[done]` |
| Debian releases / LTS | https://wiki.debian.org/LTS | `[done]` |
| SLES lifecycle — SUSE | https://www.suse.com/lifecycle/ | `[done]` |
| Alpine Linux releases | https://alpinelinux.org/releases/ | `[done]` |
| Fedora lifecycle | https://docs.fedoraproject.org/en-US/releases/lifecycle/ | `[done]` |
| SELinux — Red Hat docs | https://www.redhat.com/en/topics/linux/what-is-selinux | `[done]` |
| AppArmor — Ubuntu wiki | https://wiki.ubuntu.com/AppArmor | `[done]` |
## Windows
| Source | URL | Status |
|-------|-----|--------|
| Windows Server 2022 lifecycle | https://learn.microsoft.com/en-us/lifecycle/products/windows-server-2022/ | `[done]` |
| Windows Server 2025 lifecycle | https://learn.microsoft.com/en-us/lifecycle/products/windows-server-2025/ | `[done]` |
| Windows 11 lifecycle | https://learn.microsoft.com/en-us/lifecycle/products/windows-11-enterprise/ | `[done]` |
| Windows 10 EOL | https://learn.microsoft.com/en-us/lifecycle/products/windows-10-enterprise/ | `[done]` |
| Windows Server licensing (per core) | https://learn.microsoft.com/en-us/windows-server/get-started/editions-and-support | `[done]` |
## GPU pricing
| Source | URL | Status |
|-------|-----|--------|
| NVIDIA AI GPU pricing guide (2026) | https://intuitionlabs.ai/articles/nvidia-ai-gpu-pricing-guide | `[done]` |
| GPU cloud pricing comparison (2026) | https://www.spheron.network/blog/gpu-cloud-pricing-comparison-2026/ | `[done]` |
| GPU pricing trends 2026 — CompuX | https://compux.net/docs/guides/gpu-pricing-trends-2026 | `[done]` |
| AMD MI300X pricing (2026) | https://www.thundercompute.com/blog/amd-mi300x-pricing | `[done]` |
| GPU price/performance frontier — Silicon Analysts | https://siliconanalysts.com/tools/frontier | `[done]` |
## Hardware manufacturers
| Manufacturer | Server series | Management |
|---------|---------------|------------|
| Dell | PowerEdge (R6xx, R7xx) | iDRAC / OpenManage |
| HPE | ProLiant (DL, ML, Synergy) | iLO / OneView |
| Cisco | UCS (B-Series, C-Series) | UCS Manager / Intersight |
| Lenovo | ThinkSystem (SR, ST) | XClarity |
| Supermicro | SuperServer (cloud, storage, GPU) | IPMI / SuperDoctor |

@@ -0,0 +1,50 @@
# Monitoring and observability — Sources
## Official documentation
| Source | URL | Status |
|-------|-----|--------|
| Prometheus docs | https://prometheus.io/docs/ | `[done]` |
| Grafana docs | https://grafana.com/docs/ | `[done]` |
| Zabbix docs | https://www.zabbix.com/documentation/ | `[done]` |
| OpenTelemetry specification | https://opentelemetry.io/docs/specs/otel/ | `[done]` |
| OpenMetrics standard | https://openmetrics.io/ | `[done]` |
## Books
| Name | Author | ISBN | Status |
|-------|-------|------|--------|
| Site Reliability Engineering | Beyer, Jones, Petoff, Murphy | 978-1491929124 | `[done]` |
| The Site Reliability Workbook | Beyer, Murphy, Rensin, Kawahara, Thorne | 978-1492029502 | `[done]` |
| Observability Engineering | Majors, Fong-Jones, Miranda | 978-1492076445 | `[done]` |
## Articles
| Name | URL | Status |
|-------|-----|--------|
| The USE Method (Brendan Gregg) | https://www.brendangregg.com/usemethod.html | `[done]` |
| The RED Method (Tom Wilkie) | https://grafana.com/blog/2018/08/02/the-red-method-how-to-instrument-your-services/ | `[done]` |
| Google SRE book (free) | https://sre.google/sre-book/table-of-contents/ | `[done]` |
## New books (2024–2026)
| Name | Author | ISBN | Status |
|-------|-------|------|--------|
| Mastering OpenTelemetry and Observability | Steve Flanders | 978-1-394-25312-8 | `[done]` |
| OpenTelemetry Cookbook | — | 978-9349174238 | `[done]` |
| Cloud Observability in Action | Michael Hausenblas | — (Manning, 2023) | `[done]` |
| Observability in the AI-Native Era | Lipsig, Grabner, Rati | 978-1-80638-959-9 | `[done]` |
| Mastering Prometheus | William Hegedus | 978-1-80512-566-2 | `[done]` |
| Observability with Grafana (LGTM stack) | Chapman, Holmes | 978-1-80324-964-3 | `[done]` |
| Open Source Observability | Corless, Pawar | — (O'Reilly, 2025) | `[done]` |
| Hands-On Monitoring and Alerting with Prometheus | Muhammad Badawy | 978-9349887565 | `[done]` |
## New tools (2024–2026)
| Tool | Description | URL | Status |
|---------|-------|-----|--------|
| Grafana Sigil | AI observability (OpenTelemetry-native) | https://github.com/grafana/sigil | `[done]` |
| InfraLens | eBPF-based zero-instrumentation observability | https://github.com/Herenn/Infralens | `[done]` |
| Ingero | GPU causal observability (eBPF) | https://github.com/ingero-io/ingero | `[done]` |
| GreptimeDB | Unified observability DB (OTel-native) | https://github.com/GreptimeTeam/greptimedb | `[done]` |
| Netdata | AI-powered full-stack observability | https://github.com/netdata/netdata | `[done]` |

@@ -0,0 +1,50 @@
# Monitoring and observability — Sources
## Official documentation
| Source | URL | Status |
|-------|-----|--------|
| Prometheus docs | https://prometheus.io/docs/ | `[done]` |
| Grafana docs | https://grafana.com/docs/ | `[done]` |
| Zabbix docs | https://www.zabbix.com/documentation/ | `[done]` |
| OpenTelemetry specification | https://opentelemetry.io/docs/specs/otel/ | `[done]` |
| OpenMetrics standard | https://openmetrics.io/ | `[done]` |
## Books
| Name | Author | ISBN | Status |
|-------|-------|------|--------|
| Site Reliability Engineering | Beyer, Jones, Petoff, Murphy | 978-1491929124 | `[done]` |
| The Site Reliability Workbook | Beyer, Murphy, Rensin, Kawahara, Thorne | 978-1492029502 | `[done]` |
| Observability Engineering | Majors, Fong-Jones, Miranda | 978-1492076445 | `[done]` |
## Articles
| Name | URL | Status |
| The USE Method (Brendan Gregg) | https://www.brendangregg.com/usemethod.html | `[done]` |
| The RED Method (Tom Wilkie) | https://grafana.com/blog/2018/08/02/the-red-method-how-to-instrument-your-services/ | `[done]` |
| Google SRE book (free) | https://sre.google/sre-book/table-of-contents/ | `[done]` |
## New books (2024–2026)
| Name | Author | ISBN | Status |
|-------|-------|------|--------|
| Mastering OpenTelemetry and Observability | Steve Flanders | 978-1-394-25312-8 | `[done]` |
| OpenTelemetry Cookbook | — | 978-9349174238 | `[done]` |
| Cloud Observability in Action | Michael Hausenblas | — (Manning, 2023) | `[done]` |
| Observability in the AI-Native Era | Lipsig, Grabner, Rati | 978-1-80638-959-9 | `[done]` |
| Mastering Prometheus | William Hegedus | 978-1-80512-566-2 | `[done]` |
| Observability with Grafana (LGTM stack) | Chapman, Holmes | 978-1-80324-964-3 | `[done]` |
| Open Source Observability | Corless, Pawar | — (O'Reilly, 2025) | `[done]` |
| Hands-On Monitoring and Alerting with Prometheus | Muhammad Badawy | 978-9349887565 | `[done]` |
## New tools (2024–2026)
| Tool | Description | URL | Status |
|---------|-------|-----|--------|
| Grafana Sigil | AI observability (OpenTelemetry-native) | https://github.com/grafana/sigil | `[done]` |
| InfraLens | eBPF-based zero-instrumentation observability | https://github.com/Herenn/Infralens | `[done]` |
| Ingero | GPU causal observability (eBPF) | https://github.com/ingero-io/ingero | `[done]` |
| GreptimeDB | Unified observability DB (OTel-native) | https://github.com/GreptimeTeam/greptimedb | `[done]` |
| Netdata | AI-powered full-stack observability | https://github.com/netdata/netdata | `[done]` |

@@ -0,0 +1,40 @@
# Network architecture — Sources
## RFCs and standards
| RFC | Name | Status |
|-----|-------|--------|
| RFC 791 | Internet Protocol | `[done]` |
| RFC 793 | Transmission Control Protocol | `[done]` |
| RFC 1034/1035 | Domain Names — Concepts and Facilities / Implementation and Specification | `[done]` |
| RFC 4271 | Border Gateway Protocol (BGP-4) | `[done]` |
| RFC 5246 | TLS 1.2 | `[done]` |
| RFC 8446 | TLS 1.3 | `[done]` |
## Official documentation
| Source | URL | Status |
|-------|-----|--------|
| AWS VPC docs | https://docs.aws.amazon.com/vpc/ | `[done]` |
| Azure Virtual Network docs | https://learn.microsoft.com/en-us/azure/virtual-network/ | `[done]` |
| Google VPC docs | https://cloud.google.com/vpc/docs | `[done]` |
## Books
| Name | Author | ISBN | Status |
|-------|-------|------|--------|
| Computer Networking: A Top-Down Approach | Kurose, Ross | 978-0133594140 | `[done]` |
| TCP/IP Illustrated | W. Richard Stevens | 978-0321336316 | `[done]` |
## New books (2024–2026)
| Name | Author | ISBN | Status |
|-------|-------|------|--------|
| AI Data Center Network Design and Technologies | Subramaniam, Styszynski, Tambakuwala | 978-0-13-543628-8 | `[done]` |
| Cloud Networking and Resilience | Cristian Critelli | 979-8868824357 | `[done]` |
| Zero Trust in Resilient Cloud and Network Architectures | Halley, Prajapati, Leza, Saini | 978-0-13-820460-0 | `[done]` |
| The Segmentation Blueprint | Kulkarni, Sivakumar, Morais, Lloyd | 978-0-13-546236-2 | `[done]` |
| Segment Routing for SP and Enterprise Networks | Deragisch et al. | 978-0-13-823101-9 | `[done]` |
| Understanding and Designing Azure Networking | Stuart, Moreno | — (2025) | `[done]` |
| Mastering Next-Gen Juniper Data Centers | Aninda Chatterjee | 978-0-13-533636-6 | `[done]` |
| Intelligent Cloud Networking: AI-Driven Resource Management | Manoj Yadav | 9364220110 | `[done]` |


@@ -0,0 +1,40 @@
# Network architecture: Sources
## RFCs and standards
| RFC | Name | Status |
|-----|-------|--------|
| RFC 791 | Internet Protocol | `[done]` |
| RFC 793 | Transmission Control Protocol | `[done]` |
| RFC 1034/1035 | Domain Names: Concepts and Facilities (1034); Implementation and Specification (1035) | `[done]` |
| RFC 4271 | Border Gateway Protocol (BGP-4) | `[done]` |
| RFC 5246 | TLS 1.2 | `[done]` |
| RFC 8446 | TLS 1.3 | `[done]` |
## Official documentation
| Source | URL | Status |
|-------|-----|--------|
| AWS VPC docs | https://docs.aws.amazon.com/vpc/ | `[done]` |
| Azure Virtual Network docs | https://learn.microsoft.com/en-us/azure/virtual-network/ | `[done]` |
| Google VPC docs | https://cloud.google.com/vpc/docs | `[done]` |
## Books
| Name | Author | ISBN | Status |
|-------|-------|------|--------|
| Computer Networking: A Top-Down Approach | Kurose, Ross | 978-0133594140 | `[done]` |
| TCP/IP Illustrated | W. Richard Stevens | 978-0321336316 | `[done]` |
## New books (2024–2026)
| Name | Author | ISBN | Status |
|-------|-------|------|--------|
| AI Data Center Network Design and Technologies | Subramaniam, Styszynski, Tambakuwala | 978-0-13-543628-8 | `[done]` |
| Cloud Networking and Resilience | Cristian Critelli | 979-8868824357 | `[done]` |
| Zero Trust in Resilient Cloud and Network Architectures | Halley, Prajapati, Leza, Saini | 978-0-13-820460-0 | `[done]` |
| The Segmentation Blueprint | Kulkarni, Sivakumar, Morais, Lloyd | 978-0-13-546236-2 | `[done]` |
| Segment Routing for SP and Enterprise Networks | Deragisch et al. | 978-0-13-823101-9 | `[done]` |
| Understanding and Designing Azure Networking | Stuart, Moreno | — (2025) | `[done]` |
| Mastering Next-Gen Juniper Data Centers | Aninda Chatterjee | 978-0-13-533636-6 | `[done]` |
| Intelligent Cloud Networking: AI-Driven Resource Management | Manoj Yadav | 9364220110 | `[done]` |

50
templates/ADR.en.md Normal file

@@ -0,0 +1,50 @@
# ADR — Architecture Decision Record
## Decision title
<!-- Brief title (e.g. "Using PostgreSQL as primary database") -->
## Status
<!--
Proposed | Approved | Deprecated | Superseded by [ADR-XXX]
-->
## Context
<!--
Describe the problem we are solving. What are the circumstances, constraints, and requirements?
-->
## Decision
<!--
What solution did we choose and why? Describe the architectural approach.
-->
## Rationale
<!--
Why did we choose this solution? What are the main benefits compared to alternatives?
-->
## Alternatives
<!--
What other options did we consider and why did we reject them?
-->
## Consequences
<!--
- What changes? What needs to be done?
- What are the trade-offs (e.g. higher complexity for lower latency)?
- Impact on other teams / systems?
-->
## Metadata
- **Date**: YYYY-MM-DD
- **Author**: name
- **Stakeholders**: team A, team B
- **References**: [link to design doc], [link to issue]

50
templates/ADR.md Normal file

@@ -0,0 +1,50 @@
# ADR — Architecture Decision Record
## Decision title
<!-- Brief title (e.g. "Using PostgreSQL as primary database") -->
## Status
<!--
Proposed | Approved | Deprecated | Superseded by [ADR-XXX]
-->
## Context
<!--
Describe the problem we are solving. What are the circumstances, constraints, and requirements?
-->
## Decision
<!--
What solution did we choose and why? Describe the architectural approach.
-->
## Rationale
<!--
Why did we choose this solution? What are the main benefits compared to alternatives?
-->
## Alternatives
<!--
What other options did we consider and why did we reject them?
-->
## Consequences
<!--
- What changes? What needs to be done?
- What are the trade-offs (e.g. higher complexity for lower latency)?
- Impact on other teams / systems?
-->
## Metadata
- **Date**: YYYY-MM-DD
- **Author**: name
- **Stakeholders**: team A, team B
- **References**: [link to design doc], [link to issue]