knowledge-base/SERVER-CONFIG.md
Stanislav Hubacek c6fa0bff6a commit
2026-06-11 15:27:28 +02:00

# ⚙️ Server configuration: best practices by workload
## General BIOS/UEFI settings
| Setting | Recommendation | Rationale |
|---------|----------------|-----------|
| **Boot mode** | UEFI | Secure Boot, GPT, larger disks |
| **Power profile** | Performance / OS Control | Maximum performance, C-States disabled |
| **Hyper-Threading** | Enabled | Up to roughly +30-50 % throughput on multi-threaded workloads (workload-dependent) |
| **Virtualization** | Enabled (VT-x/AMD-V) | Required for hypervisors and VM-based container runtimes |
| **SR-IOV** | Enabled | GPU and NIC passthrough |
| **NUMA** | Enabled | NUMA-aware scheduling |
| **ACPI** | Enabled | OS-level power management |
| **Secure Boot** | Enabled | Verified boot chain |
| **TPM** | Enabled | Measured boot, key storage |
---
## 1. Database servers
### CPU selection
| DB type | CPU preference | Rationale |
|---------|----------------|-----------|
| **OLTP** (PostgreSQL, MySQL) | High clock, moderate core count | Low per-transaction latency, limited parallelism |
| **OLAP** (ClickHouse, Snowflake) | Many cores, AVX-512 | Columnar storage, high parallelism |
| **In-memory** (Redis, Memcached) | High clock, low cache latency | Single-threaded command execution (Redis), RAM bandwidth |
| **Document** (MongoDB) | Balanced (clock × cores) | Mixed workload |
| **Distributed** (Cassandra, Scylla) | Many cores, large cache | Shard-per-core (Scylla), compaction |
### Storage layout
```
Mount point  FS     RAID                  Disk type   Purpose
───────────  ─────  ────────────────────  ──────────  ────────────────────────────
/            ext4   1 (mirror)            2× SSD      OS, binaries
/data        xfs    10 (striped mirrors)  4-8× NVMe   Database data
/wal         xfs    1 (mirror)            2× NVMe     Write-ahead log (PostgreSQL)
/tmp         tmpfs  -                     RAM         Temporary files
```
### PostgreSQL specifics
| Parameter | Recommendation | Note |
|-----------|----------------|------|
| `shared_buffers` | 25 % of RAM | Cache for database blocks |
| `effective_cache_size` | 75 % of RAM | Estimate of available cache for the query planner |
| `work_mem` | 4-64 MB per operation | Sorts, hash joins; allocated per operation, so size it against `max_connections` |
| `maintenance_work_mem` | 1-10 % of RAM | VACUUM, CREATE INDEX, ALTER TABLE ADD FOREIGN KEY |
| `wal_buffers` | 64-256 MB | Write-ahead log buffer |
| `max_connections` | 50-500 | Use connection pooling (PgBouncer) |
| `random_page_cost` | 1.1 (NVMe), 4 (HDD) | Cost of a random page read; on NVMe it is close to a sequential read |
| `effective_io_concurrency` | 200 (NVMe), 2 (HDD) | Parallel I/O |
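
The RAM-proportional rows of the table can be turned into concrete values with a few lines of Python. This is a minimal sketch, not a tuning tool: the function names are hypothetical, the 5 % used for `maintenance_work_mem` is just one point inside the 1-10 % range, and `worst_case_work_mem_mb` assumes a single `work_mem`-sized operation per connection, which complex queries can exceed.

```python
def postgres_memory_settings(ram_gb: int) -> dict[str, str]:
    """Derive the RAM-proportional PostgreSQL settings, expressed in MB."""
    ram_mb = ram_gb * 1024
    return {
        "shared_buffers": f"{ram_mb * 25 // 100}MB",        # 25 % of RAM
        "effective_cache_size": f"{ram_mb * 75 // 100}MB",  # 75 % of RAM
        # Assumption: 5 % is an arbitrary pick inside the 1-10 % range.
        "maintenance_work_mem": f"{ram_mb * 5 // 100}MB",
    }


def worst_case_work_mem_mb(work_mem_mb: int, max_connections: int) -> int:
    """Memory used if every connection runs one work_mem-sized operation."""
    return work_mem_mb * max_connections


if __name__ == "__main__":
    for name, value in postgres_memory_settings(64).items():
        print(f"{name} = {value}")
    print("work_mem worst case:", worst_case_work_mem_mb(16, 200), "MB")
```

For a 64 GB host this yields `shared_buffers = 16384MB`, and 16 MB of `work_mem` across 200 connections already adds up to 3200 MB, which is why `work_mem` and `max_connections` have to be sized together.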
### MySQL / MariaDB specifics
| Parameter | Recommendation | Note |
|-----------|----------------|------|
| `innodb_buffer_pool_size` | 70-80 % of RAM | Main InnoDB cache |
| `innodb_log_file_size` | 1-4 GB | Redo log; larger generally means better write performance |
| `innodb_flush_log_at_trx_commit` | 1 (ACID) / 2 (perf) | 1 = fsync on every commit |
| `innodb_io_capacity` | 2000 (NVMe) / 200 (HDD) | IOPS budget for background flushing |
| `innodb_write_io_threads` | 4-8 | Parallel write threads |
| `max_connections` | 100-500 | Connection pooling recommended |
### MongoDB specifics
| Parameter | Recommendation | Note |
|-----------|----------------|------|
| **WiredTiger cache** | 50-80 % of RAM | Storage engine cache |
| **WiredTiger compression** | Snappy / Zstd | On-disk compression (zlib is slower) |
| **Filesystem** | XFS | Recommended FS (ext4 is OK, NTFS is not) |
| **ReadConcern/WriteConcern** | majority/majority | For important data |
### Kernel tuning for databases
```
# /etc/sysctl.d/99-database.conf
# Minimize swapping, prefer page cache
vm.swappiness = 1
# % of RAM that may be dirty before writers are forced to flush synchronously
vm.dirty_ratio = 30
# Start background flushing at 5 % dirty
vm.dirty_background_ratio = 5
# 0 = none reserved; raise if the DB uses explicit huge pages (e.g. PostgreSQL huge_pages)
vm.nr_hugepages = 0
net.core.rmem_max = 134217728
net.core.wmem_max = 134217728
net.ipv4.tcp_rmem = 4096 87380 134217728
net.ipv4.tcp_wmem = 4096 65536 134217728
net.core.somaxconn = 4096
# I/O scheduler (NVMe = none, SATA SSD = mq-deadline, HDD = mq-deadline/bfq)
# echo none > /sys/block/nvme0n1/queue/scheduler
```
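
When huge pages are used for PostgreSQL, `vm.nr_hugepages` has to cover at least `shared_buffers`. The sketch below estimates a value under stated assumptions: a 2 MiB huge page size (the usual x86-64 default, check `Hugepagesize` in `/proc/meminfo`) and a 5 % headroom for shared memory beyond `shared_buffers`. The function name and the headroom figure are illustrative, not taken from the PostgreSQL documentation.

```python
import math


def nr_hugepages(shared_buffers_mb: int,
                 hugepage_kb: int = 2048,
                 headroom: float = 0.05) -> int:
    """Estimate vm.nr_hugepages needed to back shared_buffers.

    hugepage_kb: huge page size in KiB (assumed 2048, i.e. 2 MiB).
    headroom:    assumed fractional margin on top of shared_buffers.
    """
    pages = shared_buffers_mb * 1024 / hugepage_kb
    return math.ceil(pages * (1 + headroom))


if __name__ == "__main__":
    # 16 GB of shared_buffers on a host with 2 MiB huge pages
    print("vm.nr_hugepages =", nr_hugepages(16384))
```

Treat the result as a starting point and verify it against the server's actual shared memory usage before reserving the pages.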
---
## 2. Hypervisor host (ESXi / KVM / Hyper-V)
### CPU and NUMA
| Setting | Recommendation | Note |
|---------|----------------|------|
| **Overcommit ratio** | 1:1 to 4:1 (vCPU:pCPU) | Depends on workload (1:1 for DB, 4:1 for web) |
| **NUMA-aware sizing** | VM ≤ 1 NUMA node | Cross-NUMA penalty: ~1.5-2× latency |
| **CPU pinning** | Yes for dedicated VMs | Avoids context switching between cores |
| **C-States** | Disabled (in BIOS) | Lower latency, higher power draw |
| **P-State** | OS control / Performance | HW power management |
| **Hyper-Threading** | Enabled | More vCPUs; watch for noisy neighbors |
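
The overcommit and NUMA rows translate into two quick checks when placing VMs. The sketch below is illustrative only: `Host`, `vcpu_budget`, and `fits_one_numa_node` are hypothetical names, and it assumes symmetric NUMA nodes and counts physical cores rather than hardware threads.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    numa_nodes: int
    cores_per_node: int   # physical cores per NUMA node
    ram_gb_per_node: int  # local memory per NUMA node


def vcpu_budget(host: Host, overcommit: float) -> int:
    """Total vCPUs the host may hand out at a given vCPU:pCPU ratio."""
    return int(host.numa_nodes * host.cores_per_node * overcommit)


def fits_one_numa_node(host: Host, vcpus: int, ram_gb: int) -> bool:
    """True if the VM fits inside a single NUMA node (CPU and RAM)."""
    return vcpus <= host.cores_per_node and ram_gb <= host.ram_gb_per_node


if __name__ == "__main__":
    host = Host(numa_nodes=2, cores_per_node=32, ram_gb_per_node=256)
    print("DB host (1:1):", vcpu_budget(host, 1.0), "vCPUs")
    print("Web host (4:1):", vcpu_budget(host, 4.0), "vCPUs")
    print("48 vCPU VM stays on one node:", fits_one_numa_node(host, 48, 128))
```

A VM that fails the second check will span NUMA nodes and pay the cross-node latency penalty listed in the table.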
### Storage for the hypervisor
```
VM storage:
├── OS datastore           RAID 1 (2× SATA SSD)  ESXi boot, images
├── VM datastore (gold)    RAID 10 (4× NVMe)     critical VMs, DB
├── VM datastore (silver)  RAID 5 (6× SAS SSD)   general VMs
└── VM datastore (bronze)  RAID 6 (8× SATA HDD)  backup, archive
Swap datastore: 1× NVMe or SATA SSD (dedicated)
```
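
The RAID levels in these tiers trade capacity for redundancy in different ways. A small sketch of the standard usable-capacity arithmetic follows; the function names are hypothetical, and the result ignores filesystem overhead and hot spares.

```python
def usable_disks(level: int, disks: int) -> int:
    """Number of disks' worth of usable capacity for a RAID level."""
    if level == 1:
        if disks < 2:
            raise ValueError("RAID 1 needs at least 2 disks")
        return 1  # every disk holds the same data
    if level == 10:
        if disks < 4 or disks % 2:
            raise ValueError("RAID 10 needs an even number of disks, at least 4")
        return disks // 2  # half the disks are mirrors
    if level == 5:
        if disks < 3:
            raise ValueError("RAID 5 needs at least 3 disks")
        return disks - 1  # one disk's worth of parity
    if level == 6:
        if disks < 4:
            raise ValueError("RAID 6 needs at least 4 disks")
        return disks - 2  # two disks' worth of parity
    raise ValueError(f"unsupported RAID level: {level}")


def usable_capacity_tb(level: int, disks: int, disk_tb: float) -> float:
    """Usable capacity in TB, before filesystem overhead."""
    return usable_disks(level, disks) * disk_tb


if __name__ == "__main__":
    for tier, level, disks in [("gold", 10, 4), ("silver", 5, 6), ("bronze", 6, 8)]:
        print(f"{tier}: {usable_disks(level, disks)} of {disks} disks usable")
```

With the disk counts from the layout above, gold keeps 2 of 4 disks, silver 5 of 6, and bronze 6 of 8.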
### Network design
| Traffic | VLAN | Speed | NIC teaming |
|---------|------|-------|-------------|
| **Management** | Mgmt VLAN | 1 GbE | Active/Passive |
| **VM traffic** | VM VLANs | 25/100 GbE | LACP (802.3ad) |
| **Storage** | Storage VLAN | 25/100 GbE | LACP / RDMA |
| **vMotion** | vMotion VLAN | 25/100 GbE | Dedicated, multi-NIC |
| **FT (Fault Tolerance)** | FT VLAN | 10 GbE | Dedicated, low latency |
### BIOS for the hypervisor
| Setting | Value | Rationale |
|---------|-------|-----------|
| Hyper-Threading | Enabled | Higher VM density |
| Virtualization Technology | Enabled | VT-x/AMD-V |
| VT-d / IOMMU | Enabled | Passthrough, SR-IOV |
| Power Management | Performance / OS | Minimizes VM latency |
| C-States | Disabled | Lower latency on VM exit |
| NUMA | Enabled | NUMA-aware VM placement |
| SR-IOV | Enabled | NIC/GPU virtualization |
---
## 3. Kubernetes node
### Node profiles
| Role | CPU | RAM | Storage | Use case |
|------|-----|-----|---------|----------|
| **General purpose** | 16-32 cores | 64-128 GB | 1× NVMe OS + 1× NVMe local | Web, API, microservices |
| **Memory optimized** | 32-64 cores | 256-512 GB | 1× NVMe OS + 2× NVMe local | In-memory cache, DB |
| **Compute optimized** | 64-128 cores | 128-256 GB | 1× NVMe OS | Batch, CI/CD |
| **GPU node** | 32-64 cores | 512-1024 GB | 1× NVMe OS + 4-8× NVMe local | AI/ML training, inference |
| **Storage node** | 16-32 cores | 64-128 GB | 4-12× NVMe/SATA (Ceph/Longhorn) | SDS, persistent volumes |
### Kernel tuning
```
# /etc/sysctl.d/99-kubernetes.conf
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1
net.ipv4.conf.all.forwarding = 1
# Connection tracking (for NodePort, Service)
net.netfilter.nf_conntrack_max = 2097152
net.netfilter.nf_conntrack_tcp_timeout_established = 86400
# File watchers (for kubelet, containerd)
fs.inotify.max_user_instances = 8192
fs.inotify.max_user_watches = 524288
# Memory management
vm.swappiness = 0
# Always allow memory overcommit (CRI-O, containerd)
vm.overcommit_memory = 1
vm.panic_on_oom = 0
kernel.panic = 10
kernel.panic_on_oops = 1
```
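
Settings like these tend to drift across nodes, so it helps to compare the file against the live values under `/proc/sys`. The following is a minimal sketch with hypothetical helper names; it handles only plain `key = value` lines and full-line comments, and it assumes keys map to `/proc/sys` paths by replacing dots with slashes (which does not hold for interface names that contain dots).

```python
from pathlib import Path
from typing import Callable


def parse_sysctl(text: str) -> dict[str, str]:
    """Parse sysctl.d-style text, skipping blank lines and # or ; comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")) or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def proc_path(key: str) -> str:
    """Map a sysctl key to its file under /proc/sys."""
    return "/proc/sys/" + key.replace(".", "/")


def read_live(key: str) -> str:
    """Read the current kernel value for a sysctl key."""
    return Path(proc_path(key)).read_text().strip()


def find_drift(expected: dict[str, str],
               read_value: Callable[[str], str] = read_live) -> dict[str, tuple[str, str]]:
    """Return {key: (expected, live)} for every setting that differs."""
    drift = {}
    for key, want in expected.items():
        have = read_value(key)
        # Compare token-wise so tab- and space-separated values match.
        if have.split() != want.split():
            drift[key] = (want, have)
    return drift


# Usage on a node (assumes the file from the block above is deployed):
#   expected = parse_sysctl(Path("/etc/sysctl.d/99-kubernetes.conf").read_text())
#   print(find_drift(expected))
```

Note that the `net.bridge.*` keys only exist once the `br_netfilter` module is loaded; until then `read_live` raises `FileNotFoundError` for them.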
### Container storage
| Type | Recommendation | Note |
|------|----------------|------|
| **OS disk** | RAID 1 (2× NVMe) | Ext4/XFS, 100-200 GB |
| **Container runtime image** | RAID 1 (2× NVMe) | /var/lib/containerd, 200-500 GB |
| **Local PV** | Single NVMe | Raw device, no RAID |
| **Rook/Ceph OSD** | Raw NVMe/SATA | HBA/IT mode, no RAID |
| **Longhorn** | Raw NVMe/SATA | Ext4/XFS per volume |
---
## 4. Storage server (Ceph / MinIO / NAS)
### Ceph OSD node
| Component | Recommendation | Note |
|-----------|----------------|------|
| **CPU** | 1-2 cores per OSD | Up to 12 OSDs per node (24 cores) |
| **RAM** | 4-8 GB per OSD + OS | BlueStore cache, 16-64 GB minimum |
| **Network** | 2× 25/100 GbE | Public + cluster network |
| **Storage** | 10-12× NVMe/SATA SSD OSD | HBA/IT mode, no RAID |
| **OS disk** | 2× SATA SSD RAID 1 | OS, Ceph MON/MGR |
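
The per-OSD rules of thumb from the table can be combined into a quick node estimate. This is a sketch under assumptions: the function name is hypothetical, the defaults take the upper end of each per-OSD range, and the 16 GB reserved for the OS and MON/MGR is an assumed figure, since the table only says "+ OS".

```python
def ceph_osd_node(osd_count: int,
                  cores_per_osd: int = 2,
                  ram_gb_per_osd: int = 8,
                  os_ram_gb: int = 16) -> dict[str, int]:
    """Estimate CPU cores and RAM for a Ceph OSD node.

    os_ram_gb is an assumption for OS + MON/MGR overhead.
    """
    if not 1 <= osd_count <= 12:
        raise ValueError("the table above sizes nodes for 1-12 OSDs")
    return {
        "cores": osd_count * cores_per_osd,
        "ram_gb": osd_count * ram_gb_per_osd + os_ram_gb,
    }


if __name__ == "__main__":
    print(ceph_osd_node(12))                                     # upper end of the ranges
    print(ceph_osd_node(10, cores_per_osd=1, ram_gb_per_osd=4))  # lower end
```

A 12-OSD node at the upper end comes out at 24 cores, matching the figure in the table.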
**BIOS for Ceph:**
- SATA/NVMe: AHCI/NVMe mode (not RAID)
- C-States: Disabled (lower OSD latency)
- NUMA: Enabled
- Power: Performance
### MinIO node
| Component | Recommendation |
|-----------|----------------|
| **CPU** | 8-16 cores (32+ for erasure coding) |
| **RAM** | 32-64 GB + 1 GB per 1 TB of storage |
| **Storage** | 4-16× NVMe (direct, no RAID) |
| **Network** | 2× 25/100 GbE |
| **OS** | Ubuntu / RHEL, XFS (for data) |
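
The RAM rule in the table (a base amount plus 1 GB per TB of storage) is easy to apply per node. A minimal sketch, with a hypothetical function name and the assumption that "storage" means the raw capacity of the node's drives:

```python
import math


def minio_ram_gb(drives: int, drive_tb: float, base_gb: int = 32) -> int:
    """Base RAM (32-64 GB per the table) plus 1 GB per raw TB, rounded up."""
    return base_gb + math.ceil(drives * drive_tb)


if __name__ == "__main__":
    print(minio_ram_gb(8, 4), "GB for 8x 4 TB drives")
    print(minio_ram_gb(16, 7.68, base_gb=64), "GB for 16x 7.68 TB drives")
```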
### NAS (TrueNAS / FreeNAS)
- **ZFS**: RAID-Z1/Z2/Z3, compression (lz4, zstd), dedup
- **ARC cache**: 1 GB per 1 TB storage (max 64 GB)
- **L2ARC**: NVMe cache (optional, read-heavy)
- **SLOG**: NVDIMM / Optane (sync write, ZIL)
- **Network**: 2-4× 10/25 GbE LACP
---
## 5. Web / API servers
| Parameter | Recommendation |
|-----------|----------------|
| **CPU** | High clock, 8-32 cores |
| **RAM** | 32-128 GB |
| **Storage** | 2× NVMe RAID 1 (OS + app) |
| **OS** | Ubuntu / RHEL, optimized kernel |
| **Network** | 2× 10/25 GbE (bonding) |
**Kernel tuning:**
```
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_fin_timeout = 15
net.core.somaxconn = 65535
net.ipv4.tcp_max_syn_backlog = 65535
net.core.netdev_max_backlog = 65535
```
---
## Quick decision tree: server selection
```
Which workload?
├── Database (OLTP)
│   → EPYC, high clock, 8-16 GB/core, RAID 10 NVMe, huge pages
├── Database (OLAP)
│   → Xeon AMX/AVX-512, 16-64 GB/core, many cores
├── Virtualization
│   → EPYC, many cores, 4-8 GB/core, shared storage (SAN/NFS/vSAN)
├── Kubernetes
│   → EPYC, balanced, 2-4 GB/core, local NVMe
├── AI/ML training
│   → GPU node (H100/B200), NVLink, InfiniBand, liquid cooling
├── AI/ML inference
│   → A100/H200, MIG, large VRAM, PCIe 5.0
├── Storage (Ceph/SDS)
│   → EPYC (PCIe lanes), HBA mode, 4-8 GB/OSD
└── Web / API
    → EPYC, high clock, 2-4 GB/core, 10 GbE
```
## Sources
Links, books, and standards: [sources/infrastructure/sources.md](sources/infrastructure/sources.md)