# ⚙️ Server configuration: best practices by workload

## General BIOS/UEFI settings

| Setting | Recommendation | Rationale |
|---------|----------------|-----------|
| **Boot mode** | UEFI | Secure Boot, GPT, larger disks |
| **Power profile** | Performance / OS Control | Maximum performance, C-states disabled |
| **Hyper-Threading** | Enabled | Higher multi-threaded throughput (often quoted at up to ~30 %, workload-dependent) |
| **Virtualization** | Enabled (VT-x/AMD-V) | Required for hypervisors and VM-based container runtimes |
| **SR-IOV** | Enabled | NIC and GPU passthrough via virtual functions |
| **NUMA** | Enabled | NUMA-aware scheduling |
| **ACPI** | Enabled | OS-level power management |
| **Secure Boot** | Enabled | Verified boot chain |
| **TPM** | Enabled | Measured boot, key storage |

---

## 1. Database servers

### CPU selection

| DB type | CPU preference | Rationale |
|---------|----------------|-----------|
| **OLTP** (PostgreSQL, MySQL) | High clock, moderate core count | Low per-transaction latency, limited parallelism |
| **OLAP** (ClickHouse, Snowflake) | Many cores, AVX-512 | Columnar storage, high parallelism |
| **In-memory** (Redis, Memcached) | High clock, low cache latency | Single-threaded command execution (Redis), RAM bandwidth |
| **Document** (MongoDB) | Balanced (clock × cores) | Mixed workload |
| **Distributed** (Cassandra, Scylla) | Many cores, large cache | Shard-per-core (Scylla), compaction |

### Storage layout

```
Mount point    FS     RAID                  Disk type   Purpose
─────────────  ─────  ────────────────────  ──────────  ────────────────────────────
/              ext4   1 (mirror)            2× SSD      OS, binaries
/data          xfs    10 (striped mirrors)  4-8× NVMe   Database data
/wal           xfs    1 (mirror)            2× NVMe     Write-ahead log (PostgreSQL)
/tmp           tmpfs  -                     RAM         Temporary files
```

### PostgreSQL specific

| Parameter | Recommendation | Note |
|-----------|----------------|------|
| `shared_buffers` | 25 % RAM | Cache of database blocks |
| `effective_cache_size` | 75 % RAM | Planner's estimate of the OS cache; allocates nothing |
| `work_mem` | 4-64 MB per operation | Applies per sort or hash join, so size it against `max_connections` |
| `maintenance_work_mem` | 1-10 % RAM | VACUUM, CREATE INDEX, ANALYZE |
| `wal_buffers` | 64-256 MB | Write-ahead log buffer; the default (-1) auto-sizes to 1/32 of `shared_buffers`, capped at one WAL segment (16 MB) |
| `max_connections` | 50-500 | Use connection pooling (PgBouncer) |
| `random_page_cost` | 1.1 (NVMe), 4 (HDD) | Index scan cost; on NVMe random reads cost almost the same as sequential ones |
| `effective_io_concurrency` | 200 (NVMe), 2 (HDD) | Number of concurrent I/O requests |

### MySQL / MariaDB specific

| Parameter | Recommendation | Note |
|-----------|----------------|------|
| `innodb_buffer_pool_size` | 70-80 % RAM | Main InnoDB cache |
| `innodb_log_file_size` | 1-4 GB | Redo log; larger improves write throughput but lengthens crash recovery (MySQL 8.0.30+ uses `innodb_redo_log_capacity` instead) |
| `innodb_flush_log_at_trx_commit` | 1 (ACID) / 2 (performance) | 1 = flush to disk on every commit; 2 = write on commit, flush about once per second |
| `innodb_io_capacity` | 2000 (NVMe) / 200 (HDD) | IOPS budget for background flushing |
| `innodb_write_io_threads` | 4-8 | Parallel write threads |
| `max_connections` | 100-500 | Connection pooling recommended |

### MongoDB specific

| Parameter | Recommendation | Note |
|-----------|----------------|------|
| **WiredTiger cache** | 50-80 % RAM | Storage engine cache (default: 50 % of (RAM - 1 GB)) |
| **WiredTiger compression** | Snappy / Zstd | On-disk compression (zlib is slow) |
| **Filesystem** | XFS | Recommended FS (ext4 is OK, NTFS is not) |
| **ReadConcern/WriteConcern** | majority/majority | For important data |

### Kernel tuning for databases

```
# /etc/sysctl.d/99-database.conf

# Minimize swapping, prefer keeping the page cache
vm.swappiness = 1
# Dirty-page limit (% of available memory) at which writers must flush synchronously
vm.dirty_ratio = 30
# Background writeback starts at 5 % dirty pages
vm.dirty_background_ratio = 5
# Set > 0 to reserve explicit huge pages if the database supports them (e.g. PostgreSQL)
vm.nr_hugepages = 0

net.core.rmem_max = 134217728
net.core.wmem_max = 134217728
net.ipv4.tcp_rmem = 4096 87380 134217728
net.ipv4.tcp_wmem = 4096 65536 134217728
net.core.somaxconn = 4096

# I/O scheduler is set per device, not via sysctl
# (NVMe = none, SSD = mq-deadline, HDD = mq-deadline/bfq)
# echo none > /sys/block/nvme0n1/queue/scheduler
```

---

## 2. Hypervisor host (ESXi / KVM / Hyper-V)

### CPU and NUMA

| Setting | Recommendation | Note |
|---------|----------------|------|
| **Overcommit ratio** | 1:1 to 4:1 (vCPU:pCPU) | Depends on workload (1:1 for DB, 4:1 for web) |
| **NUMA-aware sizing** | VM ≤ 1 NUMA node | Cross-NUMA access costs ~1.5-2× the latency |
| **CPU pinning** | Yes, for dedicated VMs | Avoids vCPU migration and scheduling jitter |
| **C-States** | Disabled (in BIOS) | Lower latency, higher power draw |
| **P-State** | OS control / Performance | Hardware power management |
| **Hyper-Threading** | Enabled | More vCPUs; watch for noisy neighbors |

### Storage for the hypervisor

```
VM storage:
├── OS datastore           RAID 1  (2× SATA SSD)  - ESXi boot, images
├── VM datastore (gold)    RAID 10 (4× NVMe)      - critical VMs, DB
├── VM datastore (silver)  RAID 5  (6× SAS SSD)   - general VMs
└── VM datastore (bronze)  RAID 6  (8× SATA HDD)  - backup, archive

Swap datastore: 1× NVMe or SATA SSD (dedicated)
```

### Network design

| Traffic | VLAN | Speed | NIC teaming |
|---------|------|-------|-------------|
| **Management** | Mgmt VLAN | 1 GbE | Active/Passive |
| **VM traffic** | VM VLANs | 25/100 GbE | LACP (802.3ad) |
| **Storage** | Storage VLAN | 25/100 GbE | LACP / RDMA |
| **vMotion** | vMotion VLAN | 25/100 GbE | Dedicated, multi-NIC |
| **FT (Fault Tolerance)** | FT VLAN | 10 GbE | Dedicated, low latency |

### BIOS for the hypervisor

| Setting | Value | Rationale |
|---------|-------|-----------|
| Hyper-Threading | Enabled | Higher VM density |
| Virtualization Technology | Enabled | VT-x/AMD-V |
| VT-d / IOMMU | Enabled | Passthrough, SR-IOV |
| Power Management | Performance / OS | Minimizes VM latency |
| C-States | Disabled | Lower VM exit latency |
| NUMA | Enabled | NUMA-aware VM placement |
| SR-IOV | Enabled | NIC/GPU virtualization |

---

## 3. Kubernetes node

### Node profiles

| Role | CPU | RAM | Storage | Use case |
|------|-----|-----|---------|----------|
| **General purpose** | 16-32 cores | 64-128 GB | 1× NVMe OS + 1× NVMe local | Web, API, microservices |
| **Memory optimized** | 32-64 cores | 256-512 GB | 1× NVMe OS + 2× NVMe local | In-memory cache, DB |
| **Compute optimized** | 64-128 cores | 128-256 GB | 1× NVMe OS | Batch, CI/CD |
| **GPU node** | 32-64 cores | 512-1024 GB | 1× NVMe OS + 4-8× NVMe local | AI/ML training, inference |
| **Storage node** | 16-32 cores | 64-128 GB | 4-12× NVMe/SATA (Ceph/Longhorn) | SDS, persistent volumes |

### Kernel tuning

```
# /etc/sysctl.d/99-kubernetes.conf

# Bridged traffic must pass through iptables (requires the br_netfilter module)
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1
net.ipv4.conf.all.forwarding = 1

# Connection tracking (for NodePort, Service)
net.netfilter.nf_conntrack_max = 2097152
net.netfilter.nf_conntrack_tcp_timeout_established = 86400

# File watchers (for kubelet, containerd)
fs.inotify.max_user_instances = 8192
fs.inotify.max_user_watches = 524288

# Memory management
vm.swappiness = 0
# Always allow overcommit (CRI-O, containerd)
vm.overcommit_memory = 1
vm.panic_on_oom = 0
# Reboot 10 s after a kernel panic; treat an oops as a panic
kernel.panic = 10
kernel.panic_on_oops = 1
```

### Container storage

| Type | Recommendation | Note |
|------|----------------|------|
| **OS disk** | RAID 1 (2× NVMe) | ext4/XFS, 100-200 GB |
| **Container runtime images** | RAID 1 (2× NVMe) | /var/lib/containerd, 200-500 GB |
| **Local PV** | Single NVMe | Raw device, no RAID |
| **Rook/Ceph OSD** | Raw NVMe/SATA | HBA/IT mode, no RAID |
| **Longhorn** | NVMe/SATA disks, no RAID | ext4/XFS filesystem |

---

## 4. Storage server (Ceph / MinIO / NAS)

### Ceph OSD node

| Component | Recommendation | Note |
|-----------|----------------|------|
| **CPU** | 1-2 cores per OSD | Up to 12 OSDs per node (about 24 cores) |
| **RAM** | 4-8 GB per OSD + OS | BlueStore cache, 16-64 GB minimum |
| **Network** | 2× 25/100 GbE | Public + cluster network |
| **Storage** | 10-12× NVMe/SATA SSD OSDs | HBA/IT mode, no RAID |
| **OS disk** | 2× SATA SSD RAID 1 | OS, Ceph MON/MGR |

**BIOS for Ceph:**
- SATA/NVMe: AHCI/NVMe mode (not RAID)
- C-States: Disabled (lower OSD latency)
- NUMA: Enabled
- Power: Performance

### MinIO node

| Component | Recommendation |
|-----------|----------------|
| **CPU** | 8-16 cores (32+ for heavy erasure coding) |
| **RAM** | 32-64 GB + 1 GB per 1 TB of storage |
| **Storage** | 4-16× NVMe (direct-attached, no RAID) |
| **Network** | 2× 25/100 GbE |
| **OS** | Ubuntu / RHEL, XFS for data drives |

### NAS (TrueNAS / FreeNAS)

- **ZFS**: RAID-Z1/Z2/Z3, compression (lz4, zstd), dedup
- **ARC cache**: 1 GB per 1 TB of storage (max 64 GB)
- **L2ARC**: NVMe cache (optional, for read-heavy workloads)
- **SLOG**: NVDIMM / Optane (holds the ZIL for sync writes)
- **Network**: 2-4× 10/25 GbE LACP

---

## 5. Web / API servers

| Parameter | Recommendation |
|-----------|----------------|
| **CPU** | High clock, 8-32 cores |
| **RAM** | 32-128 GB |
| **Storage** | 2× NVMe RAID 1 (OS + app) |
| **OS** | Ubuntu / RHEL, tuned kernel |
| **Network** | 2× 10/25 GbE (bonding) |

**Kernel tuning:**

```
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_fin_timeout = 15
net.core.somaxconn = 65535
net.ipv4.tcp_max_syn_backlog = 65535
net.core.netdev_max_backlog = 65535
```

---

## Quick decision tree: choosing a server

```
Which workload?
│
├── Database (OLTP)
│   → EPYC, high clock, 8-16 GB/core, RAID 10 NVMe, huge pages
│
├── Database (OLAP)
│   → Xeon AMX/AVX-512, 16-64 GB/core, many cores
│
├── Virtualization
│   → EPYC, many cores, 4-8 GB/core, shared storage (SAN/NFS/vSAN)
│
├── Kubernetes
│   → EPYC, balanced, 2-4 GB/core, local NVMe
│
├── AI/ML training
│   → GPU node (H100/B200), NVLink, InfiniBand, liquid cooling
│
├── AI/ML inference
│   → A100/H200, MIG, large VRAM, PCIe 5.0
│
├── Storage (Ceph/SDS)
│   → EPYC (PCIe lanes), HBA mode, 4-8 GB/OSD
│
└── Web / API
    → EPYC, high clock, 2-4 GB/core, 10 GbE
```

## Sources

Links, books, and standards: [sources/infrastructure/sources.md](sources/infrastructure/sources.md)
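
As a worked example of the PostgreSQL rules of thumb from section 1, the percentages can be turned into concrete values for a given host. This is a minimal sketch, not a tuning tool. The function name `postgres_memory_settings`, the 25 % RAM budget shared by all connections for `work_mem`, and the 5 % midpoint for `maintenance_work_mem` are assumptions made for illustration; only the 25 %, 75 %, 4-64 MB and 1-10 % figures come from the table.

```python
def postgres_memory_settings(ram_gb: int, max_connections: int = 200) -> dict[str, str]:
    """Return starting-point PostgreSQL memory settings for a dedicated DB host."""
    if ram_gb <= 0 or max_connections <= 0:
        raise ValueError("ram_gb and max_connections must be positive")

    ram_mb = ram_gb * 1024

    # Rules of thumb taken from the PostgreSQL table.
    shared_buffers = ram_mb // 4            # 25 % of RAM
    effective_cache_size = ram_mb * 3 // 4  # 75 % of RAM

    # Assumption: all connections together may use at most 25 % of RAM
    # for sorts and hashes; the result is clamped to the table's 4-64 MB range.
    work_mem = max(4, min(64, (ram_mb // 4) // max_connections))

    # Assumption: 5 % of RAM, the middle of the table's 1-10 % range.
    maintenance_work_mem = ram_mb * 5 // 100

    return {
        "shared_buffers": f"{shared_buffers}MB",
        "effective_cache_size": f"{effective_cache_size}MB",
        "work_mem": f"{work_mem}MB",
        "maintenance_work_mem": f"{maintenance_work_mem}MB",
    }


if __name__ == "__main__":
    # Print the values in postgresql.conf syntax for a 64 GB host.
    for name, value in postgres_memory_settings(ram_gb=64, max_connections=200).items():
        print(f"{name} = {value}")
```

The clamp on `work_mem` reflects the table's note that this setting has to be sized against `max_connections`: a single query can run several sort or hash operations, so the real peak may still exceed the budget assumed here.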