# ⚙️ Server configuration: best practices by workload

## General BIOS/UEFI settings

| Setting | Recommendation | Rationale |
|---------|----------------|-----------|
| **Boot mode** | UEFI | Secure Boot, GPT, larger disks (beyond the 2 TB MBR limit) |
| **Power profile** | Performance / OS Control | Maximum performance, C-States disabled |
| **Hyper-Threading** | Enabled | More throughput for multithreaded workloads (often up to ~30 %, workload-dependent) |
| **Virtualization** | Enabled (VT-x/AMD-V) | Required for hypervisors and VM-isolated containers (e.g. Kata Containers) |
| **SR-IOV** | Enabled | GPU, NIC passthrough |
| **NUMA** | Enabled | NUMA-aware scheduling |
| **ACPI** | Enabled | OS-level power management |
| **Secure Boot** | Enabled | Verified boot chain |
| **TPM** | Enabled | Measured boot, key storage |

---

## 1. Database servers

### CPU selection

| DB type | CPU preference | Rationale |
|---------|----------------|-----------|
| **OLTP** (PostgreSQL, MySQL) | High clock, moderate cores | Low latency per transaction, limited parallelism |
| **OLAP** (ClickHouse, Snowflake) | Many cores, AVX-512 | Columnstore, high parallelism |
| **In-memory** (Redis, Memcached) | High clock, low cache latency | Single-threaded (Redis), RAM bandwidth |
| **Document** (MongoDB) | Balance (clock × cores) | Mixed workload |
| **Distributed** (Cassandra, Scylla) | Many cores, high cache | Shard-per-core (Scylla), compaction |

### Storage layout

```
Mount point  FS     RAID                  Disk type   Purpose
───────────  ─────  ────────────────────  ──────────  ────────────────────────────
/            ext4   1 (mirror)            2× SSD      OS, binaries
/data        xfs    10 (striped mirrors)  4-8× NVMe   Database data
/wal         xfs    1 (mirror)            2× NVMe     Write-ahead log (PostgreSQL)
/tmp         tmpfs  -                     RAM         Temporary files
```

### PostgreSQL specific

| Parameter | Recommendation | Note |
|-----------|----------------|------|
| `shared_buffers` | 25 % RAM | Cache for database blocks |
| `effective_cache_size` | 75 % RAM | Query planner's estimate of available cache (shared buffers + OS page cache) |
| `work_mem` | 4-64 MB per operation | Sorts and hash joins; a single query can use it several times, so size it against `max_connections` |
| `maintenance_work_mem` | 1-10 % RAM | VACUUM, CREATE INDEX, ALTER TABLE ADD FOREIGN KEY |
| `wal_buffers` | 64-256 MB | Write-ahead log buffer; the default (-1) auto-sizes to 1/32 of `shared_buffers`, capped at one WAL segment (typically 16 MB) |
| `max_connections` | 50-500 | Keep low and put a connection pooler (PgBouncer) in front |
| `random_page_cost` | 1.1 (NVMe), 4 (HDD) | Cost of a random page read; on NVMe it is close to a sequential read |
| `effective_io_concurrency` | 200 (NVMe), 2 (HDD) | Number of concurrent I/O requests to issue |
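
The rules of thumb in the table can be turned into starting values with a little arithmetic. The sketch below is illustrative only: `suggest_postgres_settings` is a hypothetical helper, and the way it budgets `work_mem` (a quarter of RAM split across `max_connections`, clamped to 4-64 MB) is an assumption, not something PostgreSQL prescribes.

```python
def suggest_postgres_settings(ram_gb: int, storage: str, max_connections: int = 200) -> dict:
    """Return starting values derived from the rules of thumb in the table."""
    if storage not in ("nvme", "hdd"):
        raise ValueError("storage must be 'nvme' or 'hdd'")
    if not 50 <= max_connections <= 500:
        raise ValueError("the table recommends 50-500 connections")

    ram_mb = ram_gb * 1024
    # Assumption: all connections together may use about 25 % of RAM for
    # work_mem; the result is clamped to the 4-64 MB range from the table.
    work_mem_mb = max(4, min(64, ram_mb // 4 // max_connections))

    return {
        "shared_buffers": f"{ram_mb // 4}MB",            # 25 % of RAM
        "effective_cache_size": f"{ram_mb * 3 // 4}MB",  # 75 % of RAM
        "work_mem": f"{work_mem_mb}MB",
        "max_connections": max_connections,
        "random_page_cost": 1.1 if storage == "nvme" else 4.0,
        "effective_io_concurrency": 200 if storage == "nvme" else 2,
    }


if __name__ == "__main__":
    # Print the result in postgresql.conf syntax for a 64 GB NVMe host.
    for key, value in suggest_postgres_settings(64, "nvme", 500).items():
        print(f"{key} = {value}")
```

Treat the output as a first draft to benchmark against the real workload rather than a final configuration.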

### MySQL / MariaDB specific

| Parameter | Recommendation | Note |
|-----------|----------------|------|
| `innodb_buffer_pool_size` | 70-80 % RAM | Main InnoDB cache |
| `innodb_log_file_size` | 1-4 GB | Redo log; larger improves write performance but lengthens crash recovery (MySQL 8.0.30+ uses `innodb_redo_log_capacity` instead) |
| `innodb_flush_log_at_trx_commit` | 1 (ACID) / 2 (perf) | 1 = flush the redo log to disk at every commit; 2 = write at commit, flush about once per second |
| `innodb_io_capacity` | 2000 (NVMe) / 200 (HDD) | IOPS budget for background flushing |
| `innodb_write_io_threads` | 4-8 | Background write threads |
| `max_connections` | 100-500 | Connection pooling recommended |
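
As a rough illustration, the InnoDB recommendations can be rendered into a `my.cnf` fragment. Both function names are hypothetical, the 75 % buffer pool figure is simply the midpoint of the 70-80 % range, and mapping `innodb_write_io_threads` to 8 on NVMe and 4 on HDD is an assumption made for the example.

```python
def suggest_innodb_settings(ram_gb: int, storage: str, strict_acid: bool = True) -> dict:
    """Map host RAM and storage type to the InnoDB values from the table."""
    if storage not in ("nvme", "hdd"):
        raise ValueError("storage must be 'nvme' or 'hdd'")
    return {
        # Midpoint of the 70-80 % range.
        "innodb_buffer_pool_size": f"{ram_gb * 3 // 4}G",
        # 1 = flush at every commit, 2 = trade durability for throughput.
        "innodb_flush_log_at_trx_commit": 1 if strict_acid else 2,
        "innodb_io_capacity": 2000 if storage == "nvme" else 200,
        # Assumption: upper end of 4-8 on NVMe, lower end on HDD.
        "innodb_write_io_threads": 8 if storage == "nvme" else 4,
    }


def render_my_cnf(settings: dict) -> str:
    """Render the settings as a [mysqld] section."""
    lines = ["[mysqld]"] + [f"{key} = {value}" for key, value in settings.items()]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_my_cnf(suggest_innodb_settings(128, "nvme")))
```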

### MongoDB specific

| Parameter | Recommendation | Note |
|-----------|----------------|------|
| **WiredTiger cache** | 50-80 % RAM | Storage engine cache |
| **WiredTiger compression** | Snappy / Zstd | On-disk compression (zlib is slower) |
| `filesystem` | XFS | Recommended FS (ext4 OK, NTFS not) |
| **ReadConcern/WriteConcern** | majority/majority | For important data |

### Kernel tuning for databases

```
# /etc/sysctl.d/99-database.conf

# Minimize swapping, prefer keeping the page cache
vm.swappiness = 1
# % of memory that may be dirty before writers are forced to flush synchronously
vm.dirty_ratio = 30
# % of memory that may be dirty before background writeback starts
vm.dirty_background_ratio = 5
# Set above 0 to reserve huge pages when the DB supports them (e.g. PostgreSQL huge_pages)
vm.nr_hugepages = 0
net.core.rmem_max = 134217728
net.core.wmem_max = 134217728
net.ipv4.tcp_rmem = 4096 87380 134217728
net.ipv4.tcp_wmem = 4096 65536 134217728
net.core.somaxconn = 4096

# I/O scheduler (NVMe = none, SATA/SAS SSD = mq-deadline, HDD = mq-deadline/bfq)
# echo none > /sys/block/nvme0n1/queue/scheduler
```
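
When huge pages are enabled, `vm.nr_hugepages` has to cover the database's shared memory. A minimal sketch of that calculation follows; `hugepages_needed` is a hypothetical helper, the 2 MiB page size is the usual x86-64 default (check `Hugepagesize` in `/proc/meminfo`), and the 5 % headroom is an arbitrary assumption.

```python
import math


def hugepages_needed(shared_memory_mb: int, hugepage_kb: int = 2048,
                     headroom: float = 0.05) -> int:
    """Return a vm.nr_hugepages value that covers the given shared memory.

    shared_memory_mb: size of the DB shared memory segment in MiB
                      (for PostgreSQL roughly shared_buffers plus overhead).
    hugepage_kb:      huge page size in KiB (assumed 2048 on x86-64).
    headroom:         extra fraction reserved on top (assumption: 5 %).
    """
    if shared_memory_mb <= 0 or hugepage_kb <= 0 or headroom < 0:
        raise ValueError("sizes must be positive and headroom non-negative")
    pages = math.ceil(shared_memory_mb * 1024 / hugepage_kb)
    return math.ceil(pages * (1 + headroom))


if __name__ == "__main__":
    # 16 GiB of shared memory on a host with 2 MiB huge pages.
    print(f"vm.nr_hugepages = {hugepages_needed(16 * 1024)}")
```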

---

## 2. Hypervisor host (ESXi / KVM / Hyper-V)

### CPU and NUMA

| Setting | Recommendation | Note |
|---------|----------------|------|
| **Overcommit ratio** | 1:1 to 4:1 (vCPU:pCPU) | Depends on workload (1:1 DB, 4:1 web) |
| **NUMA-aware sizing** | VM ≤ 1 NUMA node | Cross-NUMA penalty: ~1.5-2× memory latency |
| **CPU pinning** | Dedicated VMs: yes | Avoids vCPU migration and context switching |
| **C-States** | Disabled (in BIOS) | Lower latency, higher power draw |
| **P-State** | OS control / Performance | HW power management |
| **Hyper-Threading** | Enabled | More vCPUs; watch for noisy neighbors |
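
The overcommit and NUMA rules above reduce to two small checks. This is a sketch under simplifying assumptions: `Host`, `vcpu_budget`, and `fits_one_numa_node` are hypothetical names, every NUMA node is assumed to have the same number of cores and the same amount of RAM, and hypervisor overhead is ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    numa_nodes: int
    cores_per_node: int      # physical cores per NUMA node
    ram_gb_per_node: int


def vcpu_budget(host: Host, overcommit: float) -> int:
    """Total vCPUs the host may hand out at the given vCPU:pCPU ratio."""
    if not 1.0 <= overcommit <= 4.0:
        raise ValueError("the table recommends ratios between 1:1 and 4:1")
    return int(host.numa_nodes * host.cores_per_node * overcommit)


def fits_one_numa_node(host: Host, vcpus: int, ram_gb: int) -> bool:
    """True if the VM can be placed without spanning NUMA nodes."""
    return vcpus <= host.cores_per_node and ram_gb <= host.ram_gb_per_node


if __name__ == "__main__":
    host = Host(numa_nodes=2, cores_per_node=32, ram_gb_per_node=256)
    print("vCPU budget at 4:1:", vcpu_budget(host, 4.0))
    print("48 vCPU / 128 GB fits one node:", fits_one_numa_node(host, 48, 128))
```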

### Storage for hypervisor hosts

```
VM storage:
├── OS datastore          RAID 1 (2× SATA SSD): ESXi boot, images
├── VM datastore (gold)   RAID 10 (4× NVMe): critical VMs, DB
├── VM datastore (silver) RAID 5 (6× SAS SSD): general VMs
└── VM datastore (bronze) RAID 6 (8× SATA HDD): backup, archive

Swap datastore:           1× NVMe or SATA SSD (dedicated)
```
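
The RAID levels in the tiers above trade capacity for redundancy in different ways. The following sketch computes usable capacity per tier; `usable_capacity_gb` is a hypothetical helper, the disk sizes in the example are made up for illustration, and filesystem or controller overhead is ignored.

```python
def usable_capacity_gb(level: int, disks: int, disk_gb: int) -> int:
    """Usable capacity of a RAID set built from identical disks."""
    if level == 1:
        if disks < 2:
            raise ValueError("RAID 1 needs at least 2 disks")
        return disk_gb                      # every disk holds a full copy
    if level == 5:
        if disks < 3:
            raise ValueError("RAID 5 needs at least 3 disks")
        return (disks - 1) * disk_gb        # one disk's worth of parity
    if level == 6:
        if disks < 4:
            raise ValueError("RAID 6 needs at least 4 disks")
        return (disks - 2) * disk_gb        # two disks' worth of parity
    if level == 10:
        if disks < 4 or disks % 2:
            raise ValueError("RAID 10 needs an even number of disks, at least 4")
        return disks // 2 * disk_gb         # striped mirror pairs
    raise ValueError(f"unsupported RAID level: {level}")


if __name__ == "__main__":
    # (level, disk count, disk size in GB); sizes are hypothetical.
    tiers = {
        "gold": (10, 4, 3840),
        "silver": (5, 6, 1920),
        "bronze": (6, 8, 8000),
    }
    for name, (level, disks, size) in tiers.items():
        print(f"{name}: {usable_capacity_gb(level, disks, size)} GB usable")
```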

### Network design

| Traffic | VLAN | Speed | NIC teaming |
|---------|------|-------|-------------|
| **Management** | Mgmt VLAN | 1 GbE | Active/Passive |
| **VM traffic** | VM VLANs | 25/100 GbE | LACP (802.3ad) |
| **Storage** | Storage VLAN | 25/100 GbE | LACP / RDMA |
| **vMotion** | vMotion VLAN | 25/100 GbE | Dedicated, multi-NIC |
| **FT (Fault Tolerance)** | FT VLAN | 10 GbE | Dedicated, low latency |

### BIOS for hypervisor hosts

| Setting | Value | Rationale |
|---------|-------|-----------|
| Hyper-Threading | Enabled | Higher VM density |
| Virtualization Technology | Enabled | VT-x/AMD-V |
| VT-d / IOMMU | Enabled | Passthrough, SR-IOV |
| Power Management | Performance / OS | Minimizes VM latency |
| C-States | Disabled | Lower wake-up latency for VMs |
| NUMA | Enabled | NUMA-aware VM placement |
| SR-IOV | Enabled | NIC/GPU virtualization |

---

## 3. Kubernetes node

### Node profiles

| Role | CPU | RAM | Storage | Use case |
|------|-----|-----|---------|----------|
| **General purpose** | 16-32 cores | 64-128 GB | 1× NVMe OS + 1× NVMe local | Web, API, microservices |
| **Memory optimized** | 32-64 cores | 256-512 GB | 1× NVMe OS + 2× NVMe local | In-memory cache, DB |
| **Compute optimized** | 64-128 cores | 128-256 GB | 1× NVMe OS | Batch, CI/CD |
| **GPU node** | 32-64 cores | 512-1024 GB | 1× NVMe OS + 4-8× NVMe local | AI/ML training, inference |
| **Storage node** | 16-32 cores | 64-128 GB | 4-12× NVMe/SATA (Ceph/Longhorn) | SDS, persistent volumes |

### Kernel tuning

```
# /etc/sysctl.d/99-kubernetes.conf
# The net.bridge.* keys exist only once the br_netfilter module is loaded
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1
net.ipv4.conf.all.forwarding = 1

# Connection tracking (for NodePort, Service)
net.netfilter.nf_conntrack_max = 2097152
net.netfilter.nf_conntrack_tcp_timeout_established = 86400

# File watchers (for kubelet, containerd)
fs.inotify.max_user_instances = 8192
fs.inotify.max_user_watches = 524288

# Memory management
vm.swappiness = 0
# Allow overcommit (CRI-O, containerd)
vm.overcommit_memory = 1
vm.panic_on_oom = 0
kernel.panic = 10
kernel.panic_on_oops = 1
```
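
Settings like these tend to drift between nodes, so it can help to compare the file against the running kernel. The sketch below is one possible approach, not a standard tool: all function names are hypothetical, the parser strips anything after `#` as a convenience (an assumption about the file's style), and keys whose names themselves contain dots, such as some interface names, are not handled.

```python
from pathlib import Path
from typing import Optional


def parse_sysctl_conf(text: str) -> dict:
    """Parse 'key = value' lines, skipping blanks and '#' or ';' comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or line.startswith(";") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Normalize whitespace so multi-value keys compare reliably.
        settings[key.strip()] = " ".join(value.split())
    return settings


def proc_path(key: str) -> str:
    """Map a sysctl key to its /proc/sys path."""
    return "/proc/sys/" + key.replace(".", "/")


def read_live(key: str) -> Optional[str]:
    """Read the current kernel value, or None if it is unavailable."""
    try:
        return " ".join(Path(proc_path(key)).read_text().split())
    except OSError:
        return None


def drift(expected: dict, current: dict) -> dict:
    """Return {key: (expected, current)} for every mismatching key."""
    return {k: (v, current.get(k)) for k, v in expected.items() if current.get(k) != v}


if __name__ == "__main__":
    conf = "net.ipv4.ip_forward = 1\nvm.swappiness = 0\n"
    wanted = parse_sysctl_conf(conf)
    live = {key: read_live(key) for key in wanted}
    for key, (want, have) in drift(wanted, live).items():
        print(f"{key}: expected {want}, found {have}")
```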

### Container storage

| Type | Recommendation | Note |
|------|----------------|------|
| **OS disk** | RAID 1 (2× NVMe) | Ext4/XFS, 100-200 GB |
| **Container runtime image** | RAID 1 (2× NVMe) | /var/lib/containerd, 200-500 GB |
| **Local PV** | Single NVMe | Raw device, no RAID |
| **Rook/Ceph OSD** | Raw NVMe/SATA | HBA/IT mode, no RAID |
| **Longhorn** | Raw NVMe/SATA | Ext4/XFS per volume |

---

## 4. Storage server (Ceph / MinIO / NAS)

### Ceph OSD node

| Component | Recommendation | Note |
|-----------|----------------|------|
| **CPU** | 1-2 cores per OSD | Up to 12 OSDs per node (24 cores) |
| **RAM** | 4-8 GB per OSD + OS | BlueStore cache, 16-64 GB minimum |
| **Network** | 2× 25/100 GbE | Public + Cluster network |
| **Storage** | 10-12× NVMe/SATA SSD OSD | HBA/IT mode, no RAID |
| **OS disk** | 2× SATA SSD RAID 1 | OS, Ceph MON/MGR |
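
The per-OSD figures in the table translate into node totals as sketched below. `ceph_osd_node_sizing` is a hypothetical helper, and the 16 GB reserved for the OS and co-located MON/MGR daemons is an assumption chosen for the example.

```python
def ceph_osd_node_sizing(osds: int, cores_per_osd: int = 2,
                         ram_gb_per_osd: int = 8, os_ram_gb: int = 16) -> dict:
    """Estimate CPU cores and RAM for one OSD node from per-OSD rules."""
    if not 1 <= osds <= 12:
        raise ValueError("the table assumes at most 12 OSDs per node")
    if not 1 <= cores_per_osd <= 2:
        raise ValueError("the table recommends 1-2 cores per OSD")
    if not 4 <= ram_gb_per_osd <= 8:
        raise ValueError("the table recommends 4-8 GB RAM per OSD")
    return {
        "cpu_cores": osds * cores_per_osd,
        # Assumption: os_ram_gb covers the OS plus MON/MGR on the same node.
        "ram_gb": osds * ram_gb_per_osd + os_ram_gb,
    }


if __name__ == "__main__":
    print(ceph_osd_node_sizing(12))
```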

**BIOS for Ceph:**
- SATA/NVMe: AHCI/NVMe mode (not RAID)
- C-States: Disabled (lower OSD latency)
- NUMA: Enabled
- Power: Performance

### MinIO node

| Component | Recommendation |
|-----------|----------------|
| **CPU** | 8-16 cores (32+ for erasure coding) |
| **RAM** | 32-64 GB + 1 GB per 1 TB storage |
| **Storage** | 4-16× NVMe (direct, no RAID) |
| **Network** | 2× 25/100 GbE |
| **OS** | Ubuntu / RHEL, XFS (for data) |

### NAS (TrueNAS / FreeNAS)

- **ZFS**: RAID-Z1/Z2/Z3, compression (lz4, zstd), dedup (only with ample RAM, since the dedup table is memory-hungry)
- **ARC cache**: 1 GB per 1 TB storage (max 64 GB)
- **L2ARC**: NVMe cache (optional, read-heavy)
- **SLOG**: NVDIMM / Optane (sync write, ZIL)
- **Network**: 2-4× 10/25 GbE LACP
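
The ARC rule of thumb and the RAID-Z parity levels from the list can be expressed as two small functions. Both names are hypothetical, and the capacity figure is an upper bound that ignores RAID-Z padding, metadata, and reserved space.

```python
def arc_target_gb(pool_tb: int) -> int:
    """ARC size from the rule of thumb: 1 GB per 1 TB, capped at 64 GB."""
    if pool_tb <= 0:
        raise ValueError("pool size must be positive")
    return min(64, pool_tb)


def raidz_usable_tb(parity: int, disks: int, disk_tb: int) -> int:
    """Approximate usable capacity of one RAID-Z vdev of identical disks.

    parity: 1, 2, or 3 for RAID-Z1, RAID-Z2, RAID-Z3.
    """
    if parity not in (1, 2, 3):
        raise ValueError("parity must be 1, 2, or 3")
    if disks <= parity:
        raise ValueError("a vdev needs more disks than parity devices")
    return (disks - parity) * disk_tb


if __name__ == "__main__":
    usable = raidz_usable_tb(parity=2, disks=8, disk_tb=16)
    print(f"RAID-Z2, 8× 16 TB: ~{usable} TB usable, ARC target {arc_target_gb(usable)} GB")
```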

---

## 5. Web / API servery

| Parameter | Recommendation |
|-----------|----------------|
| **CPU** | High clock, 8-32 cores |
| **RAM** | 32-128 GB |
| **Storage** | 2× NVMe RAID 1 (OS + app) |
| **OS** | Ubuntu / RHEL, optimized kernel |
| **Network** | 2× 10/25 GbE (bonding) |

**Kernel tuning:**
```
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_fin_timeout = 15
net.core.somaxconn = 65535
net.ipv4.tcp_max_syn_backlog = 65535
net.core.netdev_max_backlog = 65535
```

---

## Quick decision tree: choosing a server

```
Which workload?
│
├── Database (OLTP)
│   → EPYC, high clock, 8-16 GB/core, RAID10 NVMe, huge pages
│
├── Database (OLAP)
│   → Xeon AMX/AVX-512, 16-64 GB/core, many cores
│
├── Virtualization
│   → EPYC, many cores, 4-8 GB/core, shared storage (SAN/NFS/vSAN)
│
├── Kubernetes
│   → EPYC, balance, 2-4 GB/core, local NVMe
│
├── AI/ML training
│   → GPU node (H100/B200), NVLink, InfiniBand, liquid cooling
│
├── AI/ML inference
│   → A100/H200, MIG, large VRAM, PCIe 5.0
│
├── Storage (Ceph/SDS)
│   → EPYC (PCIe lanes), HBA mode, 4-8 GB/OSD
│
└── Web / API
    → EPYC, high clock, 2-4 GB/core, 10 GbE

```

## Sources

Links, books, and standards: [sources/infrastructure/sources.md](sources/infrastructure/sources.md)