Stanislav Hubacek
2026-06-03 14:47:26 +02:00
parent 70ee14c2c2
commit c6fa0bff6a
# ⚙️ Server configuration: best practices by workload
## General BIOS/UEFI settings
| Setting | Recommendation | Rationale |
|---------|----------------|-----------|
| **Boot mode** | UEFI | Secure Boot, GPT, larger disks |
| **Power profile** | Performance / OS Control | Maximum performance, C-states disabled |
| **Hyper-Threading** | Enabled | +30-50 % throughput for multi-threaded workloads (workload-dependent) |
| **Virtualization** | Enabled (VT-x/AMD-V) | Required for hypervisors and VM-isolated containers |
| **SR-IOV** | Enabled | GPU, NIC passthrough |
| **NUMA** | Enabled | NUMA-aware scheduling |
| **ACPI** | Enabled | OS-level power management |
| **Secure Boot** | Enabled | Verified boot chain |
| **TPM** | Enabled | Measured boot, key storage |
---
## 1. Database servers
### CPU selection
| DB type | CPU preference | Rationale |
|---------|----------------|-----------|
| **OLTP** (PostgreSQL, MySQL) | High clock, moderate cores | Low per-transaction latency, limited parallelism |
| **OLAP** (ClickHouse, Snowflake) | Many cores, AVX-512 | Columnstore, high parallelism |
| **In-memory** (Redis, Memcached) | High clock, low cache latency | Single-threaded command execution (Redis), RAM bandwidth |
| **Document** (MongoDB) | Balance (clock × cores) | Mixed workload |
| **Distributed** (Cassandra, Scylla) | Many cores, large cache | Shard-per-core (Scylla), compaction |
### Storage layout
```
Mount point   FS     RAID                  Disk type   Purpose
───────────   ─────  ────────────────────  ──────────  ────────────────────────────
/             ext4   1 (mirror)            2× SSD      OS, binaries
/data         xfs    10 (striped mirrors)  4-8× NVMe   Database data
/wal          xfs    1 (mirror)            2× NVMe     Write-ahead log (PostgreSQL)
/tmp          tmpfs  -                     RAM         Temporary files
```
### PostgreSQL specific
| Parameter | Recommendation | Note |
|-----------|----------------|------|
| `shared_buffers` | 25 % RAM | Cache for database blocks |
| `effective_cache_size` | 75 % RAM | Query planner's estimate of memory available for caching |
| `work_mem` | 4-64 MB per operation | SORT, HASH JOIN; size it against `max_connections`, since every operation can allocate this much |
| `maintenance_work_mem` | 1-10 % RAM | VACUUM, CREATE INDEX, ANALYZE |
| `wal_buffers` | 64-256 MB | Write-ahead log buffer |
| `max_connections` | 50-500 | Use connection pooling (PgBouncer) |
| `random_page_cost` | 1.1 (NVMe), 4 (HDD) | Cost of a random page read; on NVMe it is close to a sequential read |
| `effective_io_concurrency` | 200 (NVMe), 2 (HDD) | Parallel I/O |
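
The percentage rules in the table can be turned into starting values with a few lines of arithmetic. The following is a minimal sketch, not a tuning tool: the function name `postgres_memory_settings`, the 5 % choice for `maintenance_work_mem`, and the way `work_mem` is derived from `max_connections` are assumptions made for illustration, and the results should be validated against the real workload.

```python
def postgres_memory_settings(ram_gb: float, max_connections: int = 200) -> dict:
    """Starting values derived from the rules of thumb in the table above."""
    ram_mb = int(ram_gb * 1024)
    # Assumption (not from the table): budget 25 % of RAM for sorts/hashes,
    # split across all connections, then clamp to the 4-64 MB range.
    work_mem_mb = max(4, min(64, ram_mb // 4 // max_connections))
    return {
        "shared_buffers": f"{ram_mb // 4}MB",              # 25 % of RAM
        "effective_cache_size": f"{ram_mb * 3 // 4}MB",    # 75 % of RAM
        "maintenance_work_mem": f"{ram_mb * 5 // 100}MB",  # 5 %, inside the 1-10 % range
        "work_mem": f"{work_mem_mb}MB",
    }


if __name__ == "__main__":
    # Hypothetical host: 64 GB RAM, 200 connections behind a pooler.
    for name, value in postgres_memory_settings(64, max_connections=200).items():
        print(f"{name} = {value}")
```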
### MySQL / MariaDB specific
| Parameter | Recommendation | Note |
|-----------|----------------|------|
| `innodb_buffer_pool_size` | 70-80 % RAM | Main InnoDB cache |
| `innodb_log_file_size` | 1-4 GB | Redo log; larger means better write performance but can lengthen crash recovery (MySQL 8.0.30+ uses `innodb_redo_log_capacity` instead) |
| `innodb_flush_log_at_trx_commit` | 1 (ACID) / 2 (perf) | 1 = fsync on every commit; 2 = write at commit, fsync about once per second |
| `innodb_io_capacity` | 2000 (NVMe) / 200 (HDD) | IOPS budget for background flushing |
| `innodb_write_io_threads` | 4-8 | Parallel write threads |
| `max_connections` | 100-500 | Connection pooling recommended |
### MongoDB specific
| Parameter | Recommendation | Note |
|-----------|----------------|------|
| **WiredTiger cache** | 50-80 % RAM | Storage engine cache |
| **WiredTiger compression** | Snappy / Zstd | On-disk compression (zlib is slow) |
| **Filesystem** | XFS | Recommended FS (ext4 is OK, NTFS is not) |
| **ReadConcern/WriteConcern** | majority/majority | For important data |
### Kernel tuning for databases
```
# /etc/sysctl.d/99-database.conf
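# Minimize swapping, prefer page cache
vm.swappiness = 1
# Writers block and flush synchronously once dirty pages reach 30 % of available memory
vm.dirty_ratio = 30
# Background writeback starts at 5 % dirty pages
vm.dirty_background_ratio = 5
# Explicit huge pages: 0 = none reserved; raise only if the DB uses them (e.g. PostgreSQL huge_pages)
vm.nr_hugepages = 0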
net.core.rmem_max = 134217728
net.core.wmem_max = 134217728
net.ipv4.tcp_rmem = 4096 87380 134217728
net.ipv4.tcp_wmem = 4096 65536 134217728
net.core.somaxconn = 4096
# I/O scheduler (NVMe = none, SATA SSD = mq-deadline, HDD = mq-deadline or bfq)
# echo none > /sys/block/nvme0n1/queue/scheduler
```
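
When explicit huge pages are used, `vm.nr_hugepages` has to cover the database's shared memory segment. The sketch below shows the arithmetic only. The function name, the 2048 kB page size (check `Hugepagesize` in `/proc/meminfo` on the target host), and the 5 % headroom for shared memory beyond the buffer cache are assumptions, not values from this document.

```python
import math


def hugepages_needed(shared_mem_mb: int, hugepage_kb: int = 2048,
                     headroom: float = 0.05) -> int:
    """Number of huge pages to reserve for a shared memory segment.

    shared_mem_mb: size of the DB's shared memory (e.g. shared_buffers) in MB.
    hugepage_kb:   huge page size in kB (assumed 2048, verify on the host).
    headroom:      assumed extra fraction for shared memory beyond the cache.
    """
    pages = shared_mem_mb * 1024 / hugepage_kb
    return math.ceil(pages * (1 + headroom))


if __name__ == "__main__":
    # Hypothetical example: 16 GB of shared_buffers.
    print(f"vm.nr_hugepages = {hugepages_needed(16 * 1024)}")
```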
---
## 2. Hypervisor host (ESXi / KVM / Hyper-V)
### CPU and NUMA
| Setting | Recommendation | Note |
|---------|----------------|------|
| **Overcommit ratio** | 1:1 to 4:1 (vCPU:pCPU) | Depends on workload (1:1 for DB, 4:1 for web) |
| **NUMA-aware sizing** | VM ≤ 1 NUMA node | Cross-NUMA access costs roughly 1.5-2× latency |
| **CPU pinning** | Yes for dedicated VMs | Reduces context switching |
| **C-States** | Disabled (in BIOS) | Lower latency, higher power draw |
| **P-State** | OS control / Performance | HW power management |
| **Hyper-Threading** | Enabled | More vCPUs; watch for noisy neighbors |
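
The overcommit ratio and the "VM ≤ 1 NUMA node" rule are both simple capacity checks. The sketch below illustrates them under two stated assumptions: each socket is one NUMA node, and capacity is counted in physical cores rather than hyper-threads. The `Host` class and both function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Host:
    sockets: int
    cores_per_socket: int      # physical cores, hyper-threads not counted
    ram_gb_per_socket: int     # assumption: one NUMA node per socket


def vcpu_capacity(host: Host, overcommit_ratio: float) -> int:
    """Total vCPUs the host can offer at a given vCPU:pCPU ratio."""
    return int(host.sockets * host.cores_per_socket * overcommit_ratio)


def fits_one_numa_node(host: Host, vcpus: int, ram_gb: int) -> bool:
    """True if the VM can be placed entirely inside a single NUMA node."""
    return vcpus <= host.cores_per_socket and ram_gb <= host.ram_gb_per_socket


if __name__ == "__main__":
    host = Host(sockets=2, cores_per_socket=32, ram_gb_per_socket=256)
    print("vCPUs at 1:1 (DB): ", vcpu_capacity(host, 1.0))
    print("vCPUs at 4:1 (web):", vcpu_capacity(host, 4.0))
    print("16 vCPU / 128 GB fits one node:", fits_one_numa_node(host, 16, 128))
    print("48 vCPU / 128 GB fits one node:", fits_one_numa_node(host, 48, 128))
```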
### Storage for the hypervisor
```
VM storage:
├── OS datastore RAID 1 (2× SATA SSD) — ESXi boot, images
├── VM datastore (gold) RAID 10 (4× NVMe) — critical VMs, DB
├── VM datastore (silver) RAID 5 (6× SAS SSD) — general VMs
└── VM datastore (bronze) RAID 6 (8× SATA HDD) — backup, archive
Swap datastore: 1× NVMe or SATA SSD (dedicated)
```
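
The datastore tiers trade capacity for redundancy, and the usable capacity of each RAID level follows from standard formulas. The sketch below is illustrative only: the function name is hypothetical, all disks are assumed to be the same size, and filesystem overhead and hot spares are ignored.

```python
def usable_capacity_tb(level: int, disks: int, disk_tb: float) -> float:
    """Usable capacity for the RAID levels used in the layout above."""
    if level == 1 and disks >= 2:
        return disk_tb                  # every disk holds the same data
    if level == 10 and disks >= 4 and disks % 2 == 0:
        return disks / 2 * disk_tb      # stripe across mirrored pairs
    if level == 5 and disks >= 3:
        return (disks - 1) * disk_tb    # one disk's worth of parity
    if level == 6 and disks >= 4:
        return (disks - 2) * disk_tb    # two disks' worth of parity
    raise ValueError(f"unsupported RAID {level} with {disks} disks")


if __name__ == "__main__":
    # Disk counts follow the tiers above; the 4 TB disk size is hypothetical.
    for tier, level, disks in [("gold", 10, 4), ("silver", 5, 6), ("bronze", 6, 8)]:
        print(f"{tier}: RAID {level}, {disks} disks -> "
              f"{usable_capacity_tb(level, disks, 4)} TB usable")
```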
### Network design
| Traffic | VLAN | Speed | NIC teaming |
|---------|------|-------|-------------|
| **Management** | Mgmt VLAN | 1 GbE | Active/Passive |
| **VM traffic** | VM VLANs | 25/100 GbE | LACP (802.3ad) |
| **Storage** | Storage VLAN | 25/100 GbE | LACP / RDMA |
| **vMotion** | vMotion VLAN | 25/100 GbE | Dedicated, multi-NIC |
| **FT (Fault Tolerance)** | FT VLAN | 10 GbE | Dedicated, low latency |
### BIOS for the hypervisor
| Setting | Value | Rationale |
|---------|-------|-----------|
| Hyper-Threading | Enabled | Higher VM density |
| Virtualization Technology | Enabled | VT-x/AMD-V |
| VT-d / IOMMU | Enabled | Passthrough, SR-IOV |
| Power Management | Performance / OS | Minimizes VM latency |
| C-States | Disabled | Lower VM exit latency |
| NUMA | Enabled | NUMA-aware VM placement |
| SR-IOV | Enabled | NIC/GPU virtualization |
---
## 3. Kubernetes node
### Node profily
| Role | CPU | RAM | Storage | Use case |
|------|-----|-----|---------|----------|
| **General purpose** | 16-32 cores | 64-128 GB | 1× NVMe OS + 1× NVMe local | Web, API, microservices |
| **Memory optimized** | 32-64 cores | 256-512 GB | 1× NVMe OS + 2× NVMe local | In-memory cache, DB |
| **Compute optimized** | 64-128 cores | 128-256 GB | 1× NVMe OS | Batch, CI/CD |
| **GPU node** | 32-64 cores | 512-1024 GB | 1× NVMe OS + 4-8× NVMe local | AI/ML training, inference |
| **Storage node** | 16-32 cores | 64-128 GB | 4-12× NVMe/SATA (Ceph/Longhorn) | SDS, persistent volumes |
### Kernel tuning
```
# /etc/sysctl.d/99-kubernetes.conf
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1
net.ipv4.conf.all.forwarding = 1
# Connection tracking (for NodePort, Service)
net.netfilter.nf_conntrack_max = 2097152
net.netfilter.nf_conntrack_tcp_timeout_established = 86400
# File watchers (for kubelet, containerd)
fs.inotify.max_user_instances = 8192
fs.inotify.max_user_watches = 524288
# Memory management
vm.swappiness = 0
# Always allow memory overcommit (CRI-O, containerd)
vm.overcommit_memory = 1
vm.panic_on_oom = 0
kernel.panic = 10
kernel.panic_on_oops = 1
```
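
A node can drift from the intended sysctl values, for example when a required kernel module is not loaded at boot. The sketch below parses a `sysctl.d` style fragment and compares it with live values. It is an illustration under assumptions: only full-line comments (starting with `#` or `;`) are handled, keys are mapped to `/proc/sys` by replacing dots with slashes (which breaks for keys containing interface names with dots), and all function names are hypothetical.

```python
from pathlib import PurePosixPath
from typing import Callable, Dict, Optional, Tuple


def parse_sysctl_conf(text: str) -> Dict[str, str]:
    """Parse 'key = value' lines, skipping blanks and full-line comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def proc_path(key: str) -> str:
    """Map a sysctl key to its /proc/sys path (simple dot-to-slash mapping)."""
    return str(PurePosixPath("/proc/sys", *key.split(".")))


def read_live(key: str) -> Optional[str]:
    """Read the current value, or None if the key does not exist here."""
    try:
        with open(proc_path(key)) as fh:
            return fh.read().strip()
    except OSError:
        return None


def drift(desired: Dict[str, str],
          reader: Callable[[str], Optional[str]] = read_live
          ) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {key: (wanted, actual)} for every key that differs or is missing."""
    result = {}
    for key, want in desired.items():
        have = reader(key)
        # Compare token-wise so tab- and space-separated lists match.
        if have is None or have.split() != want.split():
            result[key] = (want, have)
    return result


if __name__ == "__main__":
    sample = "net.ipv4.ip_forward = 1\nvm.swappiness = 0\n"
    for key, (want, have) in drift(parse_sysctl_conf(sample)).items():
        print(f"{key}: want {want}, have {have}")
```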
### Container storage
| Type | Recommendation | Note |
|-----|-----------|----------|
| **OS disk** | RAID 1 (2× NVMe) | Ext4/XFS, 100-200 GB |
| **Container runtime image** | RAID 1 (2× NVMe) | /var/lib/containerd, 200-500 GB |
| **Local PV** | Single NVMe | Raw device, no RAID |
| **Rook/Ceph OSD** | Raw NVMe/SATA | HBA/IT mode, no RAID |
| **Longhorn** | Raw NVMe/SATA | Ext4/XFS per volume |
---
## 4. Storage server (Ceph / MinIO / NAS)
### Ceph OSD node
| Component | Recommendation | Note |
|-----------|----------------|------|
| **CPU** | 1-2 cores per OSD | Up to 12 OSDs per node (24 cores) |
| **RAM** | 4-8 GB per OSD + OS | BlueStore cache, 16-64 GB minimum |
| **Network** | 2× 25/100 GbE | Public + cluster network |
| **Storage** | 10-12× NVMe/SATA SSD OSD | HBA/IT mode, no RAID |
| **OS disk** | 2× SATA SSD RAID 1 | OS, Ceph MON/MGR |

**BIOS for Ceph:**
- SATA/NVMe: AHCI/NVMe mode (not RAID)
- C-States: Disabled (lower OSD latency)
- NUMA: Enabled
- Power: Performance
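
The per-OSD rules in the Ceph table translate directly into a node size. The sketch below uses the upper end of each range by default. The function name and the 16 GB reserved for the OS and co-located daemons are assumptions for illustration, and the 12-OSD limit mirrors the guideline in the table rather than any Ceph restriction.

```python
def ceph_node_sizing(osds: int, cores_per_osd: int = 2,
                     ram_gb_per_osd: int = 8, os_ram_gb: int = 16) -> dict:
    """CPU cores and RAM for an OSD node, from the per-OSD rules of thumb."""
    if not 1 <= osds <= 12:
        raise ValueError("this guideline covers 1-12 OSDs per node")
    return {
        "cores": osds * cores_per_osd,
        "ram_gb": osds * ram_gb_per_osd + os_ram_gb,  # os_ram_gb is an assumed reserve
    }


if __name__ == "__main__":
    print(ceph_node_sizing(12))                                      # upper end of the ranges
    print(ceph_node_sizing(12, cores_per_osd=1, ram_gb_per_osd=4))   # lower end
```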
### MinIO node
| Component | Recommendation |
|-----------|----------------|
| **CPU** | 8-16 cores (32+ for erasure coding) |
| **RAM** | 32-64 GB + 1 GB per 1 TB storage |
| **Storage** | 4-16× NVMe (direct, no RAID) |
| **Network** | 2× 25/100 GbE |
| **OS** | Ubuntu / RHEL, XFS (for data) |
### NAS (TrueNAS / FreeNAS)
- **ZFS**: RAID-Z1/Z2/Z3, compression (lz4, zstd), dedup
- **ARC cache**: 1 GB per 1 TB storage (max 64 GB)
- **L2ARC**: NVMe cache (optional, read-heavy)
- **SLOG**: NVDIMM / Optane (sync write, ZIL)
- **Network**: 2-4× 10/25 GbE LACP
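
The two capacity-based RAM rules in this section (MinIO: a base amount plus 1 GB per TB; ZFS ARC: 1 GB per TB, capped at 64 GB) can be written down as follows. This is a sketch of the arithmetic only. Both function names are hypothetical, and the 32 GB default is the lower end of the 32-64 GB base range.

```python
def minio_ram_gb(storage_tb: float, base_gb: int = 32) -> float:
    """Base RAM plus 1 GB per TB of storage."""
    return base_gb + storage_tb


def zfs_arc_gb(pool_tb: float, cap_gb: int = 64) -> float:
    """1 GB of ARC per TB of pool capacity, capped at cap_gb."""
    return min(float(pool_tb), cap_gb)


if __name__ == "__main__":
    # Hypothetical capacities for illustration.
    print("MinIO, 100 TB:", minio_ram_gb(100), "GB RAM")
    print("ZFS, 40 TB pool:", zfs_arc_gb(40), "GB ARC")
    print("ZFS, 200 TB pool:", zfs_arc_gb(200), "GB ARC")
```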
---
## 5. Web / API servers
| Parameter | Recommendation |
|----------|-----------|
| **CPU** | High clock, 8-32 cores |
| **RAM** | 32-128 GB |
| **Storage** | 2× NVMe RAID 1 (OS + app) |
| **OS** | Ubuntu / RHEL, optimized kernel |
| **Network** | 2× 10/25 GbE (bonding) |

**Kernel tuning:**
```
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_fin_timeout = 15
net.core.somaxconn = 65535
net.ipv4.tcp_max_syn_backlog = 65535
net.core.netdev_max_backlog = 65535
```
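
Raising `net.core.somaxconn` only helps if the application also asks for a large accept queue: on Linux the backlog passed to `listen()` is silently capped at `somaxconn`, so the effective queue is the smaller of the two. The sketch below illustrates this. The function names are hypothetical, and the fallback of 4096 for hosts without `/proc` is an assumption.

```python
import socket


def read_somaxconn(path: str = "/proc/sys/net/core/somaxconn",
                   default: int = 4096) -> int:
    """Current somaxconn, or an assumed default where /proc is unavailable."""
    try:
        with open(path) as fh:
            return int(fh.read().strip())
    except (OSError, ValueError):
        return default


def effective_backlog(requested: int, somaxconn: int) -> int:
    """The kernel caps the listen() backlog at somaxconn."""
    return min(requested, somaxconn)


def open_listener(host: str = "127.0.0.1", port: int = 0,
                  backlog: int = 65535) -> socket.socket:
    """Open a TCP listener that requests a large accept queue."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    sock.listen(backlog)
    return sock


if __name__ == "__main__":
    limit = read_somaxconn()
    print("somaxconn:", limit)
    print("effective backlog for listen(65535):", effective_backlog(65535, limit))
```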
---
## Quick decision tree: choosing a server
```
Which workload?
├── Database (OLTP)
│     → EPYC, high clock, 8-16 GB/core, RAID 10 NVMe, huge pages
├── Database (OLAP)
│     → Xeon AMX/AVX-512, 16-64 GB/core, many cores
├── Virtualization
│     → EPYC, many cores, 4-8 GB/core, shared storage (SAN/NFS/vSAN)
├── Kubernetes
│     → EPYC, balance, 2-4 GB/core, local NVMe
├── AI/ML training
│     → GPU node (H100/B200), NVLink, InfiniBand, liquid cooling
├── AI/ML inference
│     → A100/H200, MIG, large VRAM, PCIe 5.0
├── Storage (Ceph/SDS)
│     → EPYC (PCIe lanes), HBA mode, 4-8 GB/OSD
└── Web / API
      → EPYC, high clock, 2-4 GB/core, 10 GbE
```
## Sources
References, books, and standards: [sources/infrastructure/sources.md](sources/infrastructure/sources.md)