Upload files to "/"
128
GPU.md
Normal file
@@ -0,0 +1,128 @@
# 🎮 GPU: architecture, models, virtualization

## GPU models

### NVIDIA

| GPU | Architecture | VRAM | HBM | FP16 (TFLOPS, dense) | FP8 (TFLOPS, dense) | Interconnect | TDP |
|-----|--------------|------|-----|----------------------|---------------------|--------------|-----|
| **A100** | Ampere (2020) | 40/80 GB | HBM2e | 312 | n/a | NVLink 3 (600 GB/s) | 400 W |
| **H100** | Hopper (2022) | 80 GB | HBM3 | ~1000 | ~2000 | NVLink 4 (900 GB/s) | 700 W |
| **H200** | Hopper (2023) | 141 GB | HBM3e | ~1000 | ~2000 | NVLink 4 (900 GB/s) | 700 W |
| **B200** | Blackwell (2024) | 192 GB | HBM3e | 2250 | ~4500 | NVLink 5 (1800 GB/s) | 1000 W |
| **B100** | Blackwell (2024) | 192 GB | HBM3e | ~1800 | ~3600 | NVLink 5 | 700 W |
| **GB200** | Blackwell (2024) | n/a | HBM3e | 4500 (dual) | 9000 (dual) | NVLink 5 | 2700 W |

### AMD

| GPU | Architecture | VRAM | HBM | FP16 (TFLOPS) | Interconnect | TDP |
|-----|--------------|------|-----|---------------|--------------|-----|
| **MI250X** | CDNA 2 (2021) | 128 GB | HBM2e | 383 | Infinity Fabric | 500 W |
| **MI300X** | CDNA 3 (2023) | 192 GB | HBM3 | ~1300 (~2600 with sparsity) | Infinity Fabric (896 GB/s) | 750 W |
| **MI350** | CDNA 4 (2025) | 288 GB | HBM3e | ~3500 | Infinity Fabric | 750 W |

## GPU interconnects

| Technology | Vendor | Bandwidth | Topology | Use case |
|------------|--------|-----------|----------|----------|
| **NVLink 4** | NVIDIA | 900 GB/s (18× 50 GB/s) | GPU-GPU direct | AI training (H100, H200) |
| **NVLink 5** | NVIDIA | 1800 GB/s (18× 100 GB/s) | GPU-GPU direct | AI training (B200, GB200) |
| **Infinity Fabric** | AMD | 896 GB/s | GPU-GPU + CPU-GPU | AI training (MI300X, MI350) |
| **NVSwitch** | NVIDIA | 900 GB/s per GPU (NVLink) | Full mesh (up to 256 GPUs) | DGX SuperPOD, HGX |
| **InfiniBand (NDR)** | NVIDIA/Mellanox | 400 Gbps per port | GPU-NIC direct, RDMA | Distributed training, HPC |
| **PCIe 5.0** | Standard | 63 GB/s per x16 (per direction) | CPU-GPU | Inference, rendering |
| **Ethernet (RoCE v2)** | Standard | 100/200/400 GbE | GPU-NIC, RDMA over Converged Ethernet | AI inference, storage |

### GPU direct communication

```
GPU 0 ──NVLink── GPU 1        GPU 0 ───PCIe─── CPU ───PCIe─── GPU 1
        │                                       │
        │                                       │
     NVSwitch                               InfiniBand
        │                                       │
        │                                       │
GPU 2 ──NVLink── GPU 3        GPU 2 ───PCIe─── CPU ───PCIe─── GPU 3

NVLink topology (GPU direct)  PCIe topology (CPU mediated)
```

- **GPUDirect RDMA**: GPU ↔ NIC without the CPU (InfiniBand, RoCE)
- **GPUDirect Storage**: GPU ↔ NVMe without the CPU (NVIDIA Magnum IO)
- **NVSwitch**: full bisection bandwidth between all GPUs in a node
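
The bandwidth figures in the interconnect table can be turned into a rough transfer-time estimate. The sketch below is illustrative only: the helper names and the 10 GB payload are assumptions, and real transfers also pay protocol and software overhead that this ignores.

```python
def aggregate_bandwidth_gbs(links: int, per_link_gbs: float) -> float:
    """Total bandwidth in GB/s when all links are used in parallel."""
    return links * per_link_gbs


def transfer_time_ms(size_gb: float, bandwidth_gbs: float) -> float:
    """Ideal time in milliseconds to move size_gb gigabytes at bandwidth_gbs GB/s."""
    return size_gb / bandwidth_gbs * 1000.0


nvlink4 = aggregate_bandwidth_gbs(18, 50)    # 18 links at 50 GB/s each
nvlink5 = aggregate_bandwidth_gbs(18, 100)   # 18 links at 100 GB/s each
pcie5_x16 = 63.0                             # GB/s, taken from the table

# Hypothetical payload: 10 GB of gradients exchanged between two GPUs.
for name, bw in [("NVLink 4", nvlink4), ("NVLink 5", nvlink5), ("PCIe 5.0 x16", pcie5_x16)]:
    print(f"{name}: {transfer_time_ms(10, bw):.1f} ms")
```

The same payload takes more than an order of magnitude longer over PCIe than over NVLink, which is why tensor-parallel training is usually kept inside one NVLink domain.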

## GPU virtualization

| Technology | Description | GPU support | Use case |
|------------|-------------|-------------|----------|
| **NVIDIA vGPU (GRID)** | Time slicing + dedicated profiles | Profile series: A (virtual apps), B (virtual PC / VDI), Q (workstation), C (compute / AI) | VDI, virtualized AI |
| **NVIDIA MIG** | Hardware partitioning of the GPU | A100 (7 instances), H100/H200/B200 | AI inference, multi-tenant GPU |
| **AMD MxGPU** | SR-IOV, hardware partitioning | AMD Instinct MI, Radeon Pro | VDI, cloud gaming |
| **Intel SG (SG1)** | SR-IOV, hardware partitioning | Intel SG1, Flex, Arc | VDI, media transcoding |
| **GPU passthrough** | Whole GPU dedicated to one VM (vfio-pci) | All GPUs | AI training, HPC, highest performance |

### MIG partition table (A100 / H100)

| GPU | Partition profile | GPU memory | Compute slices |
|-----|-------------------|------------|----------------|
| **A100 40 GB** | 1g.5gb | 5 GB | 1 |
| A100 40 GB | 2g.10gb | 10 GB | 2 |
| A100 40 GB | 3g.20gb | 20 GB | 3 |
| A100 40 GB | 7g.40gb | 40 GB | 7 |
| A100 40 GB | Full split (7× 1g.5gb) | 7 × 5 GB | 7 instances |
| **H100 80 GB** | 1g.10gb | 10 GB | 1 |
| H100 80 GB | 2g.20gb | 20 GB | 2 |
| H100 80 GB | 3g.40gb | 40 GB | 3 |
| H100 80 GB | 7g.80gb | 80 GB | 7 |
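
A MIG profile name encodes its compute slices and memory (`3g.20gb` is 3 slices and 20 GB), so a requested mix of instances can be sanity-checked against a card's totals. This is a minimal sketch: the function names are hypothetical, the defaults assume a 7-slice, 40 GB card, and it checks capacity only. Real MIG also restricts where each instance may be placed, which this does not model.

```python
import re

PROFILE_RE = re.compile(r"^(\d+)g\.(\d+)gb$")


def parse_profile(name: str) -> tuple[int, int]:
    """Split a MIG profile name such as '3g.20gb' into (compute slices, memory in GB)."""
    match = PROFILE_RE.match(name)
    if match is None:
        raise ValueError(f"not a MIG profile name: {name!r}")
    return int(match.group(1)), int(match.group(2))


def fits(profiles: list[str], total_slices: int = 7, total_mem_gb: int = 40) -> bool:
    """Capacity check only: sums slices and memory and compares them to the card totals."""
    parsed = [parse_profile(p) for p in profiles]
    used_slices = sum(slices for slices, _ in parsed)
    used_mem = sum(mem for _, mem in parsed)
    return used_slices <= total_slices and used_mem <= total_mem_gb


print(fits(["3g.20gb", "2g.10gb", "1g.5gb", "1g.5gb"]))  # 7 slices and 40 GB requested
print(fits(["7g.40gb", "1g.5gb"]))                       # 8 slices requested
```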

## GPU use cases

### AI training

- **Models**: LLM (70B-405B+), vision, multimodal
- **GPU**: H100, B200, GB200, MI300X
- **Interconnect**: NVLink 5 / Infinity Fabric (within a node), InfiniBand NDR (between nodes)
- **Parallelism**: Data Parallel (DDP), Tensor Parallel (TP), Pipeline Parallel (PP), Fully Sharded Data Parallel (FSDP)
- **Framework**: PyTorch (NCCL), JAX (XLA), DeepSpeed, Megatron-LM
- **Tips**:
  - GB200: one Grace CPU plus 2× B200 linked by NVLink-C2C; 8 GPUs → 4 GB200
  - DGX B200 / HGX B200: the standard building block
  - InfiniBand: fat-tree topology to optimize all-reduce

### AI inference

- **Models**: LLM serving, embedding, image generation
- **GPU**: A100, H200, B200 (more VRAM for larger models)
- **Techniques**: MIG partitioning, TensorRT-LLM, vLLM, Triton Inference Server
- **Quantization**: FP8, INT8, INT4 → lower VRAM usage, higher throughput
- **Latency**: batch size tuning, dynamic batching, continuous batching
- **Scale**: on-prem (2-32 GPUs) / cloud (elastic)

### VDI (Virtual Desktop Infrastructure)

- **GPU**: NVIDIA A16 (1 GPU = 16 users), A10 (1 GPU = 4 users)
- **Technology**: vGPU (GRID), AMD MxGPU
- **Protocols**: VMware Blast, Citrix HDX, Microsoft RDP, PC-over-IP (HP Teradici)
- **Use case**: CAD (CATIA, SolidWorks), office work, engineering, healthcare (PACS)

### Rendering and VFX

- **GPU**: NVIDIA RTX 6000 Ada, RTX A6000, AMD Radeon Pro W7900
- **Rendering**: Blender (Cycles/OptiX), V-Ray, Octane Render, Redshift
- **Denoising**: AI-accelerated denoising on the GPU
- **Farm rendering**: Deadline, Qube! (job schedulers)

## GPU server form factors

| Form factor | GPU count | Power | Cooling | Example |
|-------------|-----------|-------|---------|---------|
| **1U** | 1-2 | 700-1400 W | Air (high-RPM) | Dell XR4510c |
| **2U** | 4-8 | 3-6 kW | Air / liquid | Dell R760xa, HPE DL380a |
| **4U** | 8-10 | 5-8 kW | Liquid | Supermicro SYS-421GE |
| **8U / chassis** | 8-16 | 10-20 kW | Air / liquid (CDU) | NVIDIA DGX H100, NVIDIA HGX, Supermicro SYS-821GE |

## Sources

Links, books and standards: [sources/infrastructure/sources.md](sources/infrastructure/sources.md)

*Last revised: 2026-06-03*
12
HARDWARE.md
Normal file
@@ -0,0 +1,12 @@
# 🔧 Hardware and servers

This file has been split into separate areas:

| Area | File |
|------|------|
| 🔧 Server hardware: components and architecture | [SERVER-HW.md](SERVER-HW.md) |
| 🎮 GPU: architecture, models, virtualization | [GPU.md](GPU.md) |
| ⚙️ Server configuration: best practices by workload | [SERVER-CONFIG.md](SERVER-CONFIG.md) |
| 📦 Provisioning: boot, installation, server management | [PROVISIONING.md](PROVISIONING.md) |

*Last revised: 2026-06-03*
197
PROVISIONING.md
Normal file
@@ -0,0 +1,197 @@
# 📦 Provisioning: boot, installation, server management

## Network boot (PXE / iPXE)

### PXE boot flow

```
1. Server power-on → PXE ROM in the NIC / UEFI
2. DHCP broadcast → the DHCP server offers an IP + next-server (TFTP) + boot file
3. TFTP download of pxelinux.0 (BIOS) / bootx64.efi (UEFI)
4. The boot loader reads its configuration (pxelinux.cfg/default or a MAC/IP-based file)
5. Kernel + initrd are downloaded over TFTP, or over HTTP with iPXE
6. Kernel boot → automated installation (Kickstart / Preseed / AutoYaST)
```

### DHCP configuration (ISC DHCP)

```
subnet 10.0.0.0 netmask 255.255.255.0 {
    next-server 10.0.0.10;                  # TFTP server
    filename "ipxe.efi";                    # Boot file (UEFI)
    option domain-name-servers 10.0.0.10;
    option routers 10.0.0.1;
}
```

### iPXE (modern replacement for PXE)

- HTTP instead of TFTP (faster, more reliable)
- HTTPS support (image verification, Secure Boot)
- iSCSI boot, FCoE boot
- Scriptable: `chain http://boot.example.com/script.ipxe`
- Embedded: iPXE ROM flashed directly into the NIC
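
To illustrate the scripting point, the sketch below renders the kind of script that `chain` could fetch. It is an assumption-laden example rather than a reference setup: the function name and the kickstart URL are hypothetical, and the `images/pxeboot/` paths assume a standard RHEL installation tree under the install URL used elsewhere in this document.

```python
def render_ipxe_script(base_url: str, kickstart_url: str) -> str:
    """Return an iPXE script that boots a RHEL-style installer over HTTP."""
    lines = [
        "#!ipxe",
        "dhcp",  # configure the NIC via DHCP
        # initrd=initrd.img tells a UEFI-booted kernel which loaded file is its initrd
        f"kernel {base_url}/images/pxeboot/vmlinuz initrd=initrd.img "
        f"inst.repo={base_url} inst.ks={kickstart_url}",
        f"initrd {base_url}/images/pxeboot/initrd.img",
        "boot",
    ]
    return "\n".join(lines) + "\n"


script = render_ipxe_script(
    base_url="http://10.0.0.10/install/rhel9",
    kickstart_url="http://10.0.0.10/ks/node001.cfg",  # hypothetical path
)
print(script)
```

Generating the script per host (for example keyed by MAC address) is one way to hand each node its own kickstart file without maintaining separate TFTP configs.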

### PXE vs iPXE comparison

| Feature | PXE | iPXE |
|---------|-----|------|
| Protocol | TFTP (slow, 512 B default block size) | HTTP/HTTPS/iSCSI |
| Encryption | No | HTTPS, TLS |
| Scripting | Menu only | Full scripting engine |
| Debugging | Limited | Built-in shell |
| UEFI/BIOS | Both | Both |

## Automated installation

### Kickstart (RHEL/Alma/Rocky)

```
# Minimal Kickstart for RHEL 9
text
url --url="http://10.0.0.10/install/rhel9"
lang en_US.UTF-8
keyboard us
timezone Europe/Prague --utc

rootpw --iscrypted $6$...

# Assumption: wipe all disks and let the installer partition automatically
zerombr
clearpart --all --initlabel
autopart

%packages
@^minimal-environment
vim
net-tools
%end

%post
echo "node001" > /etc/hostname
%end

reboot
```

### Preseed (Debian/Ubuntu)

```
d-i debian-installer/locale string en_US.UTF-8
d-i keyboard-configuration/xkb-keymap us
d-i netcfg/choose_interface select auto
d-i netcfg/get_hostname string node001
d-i clock-setup/utc boolean true
d-i time/zone string Europe/Prague

d-i partman-auto/method string regular
d-i partman-auto/choose_recipe select atomic

d-i passwd/root-login boolean true
# Example only: in production prefer passwd/root-password-crypted with a crypt(3) hash
d-i passwd/root-password password securepass
d-i passwd/root-password-again password securepass

d-i pkgsel/include string openssh-server vim
d-i finish-install/reboot_in_progress note
```

## Metal as a Service

### MAAS (Canonical)

- **Discovery**: DHCP → PXE boot → hardware detection (CPU, RAM, disk, MAC)
- **Commissioning**: the node goes through commissioning and its inventory is stored in the database
- **Deploy**: an OS image (Ubuntu, RHEL, ESXi) is written to disk → reboot
- **Integration**: Juju, OpenStack, Kubernetes (Charmed Kubernetes)
- **Networking**: VLAN, subnet, DNS/DHCP management, BGP peering

### Digital Rebar / RackN

- **Provisioning**: workflow-based (stages: discovery → firmware → OS → config)
- **Multi-cloud**: bare metal + cloud + edge
- **Templates**: templates for OS deployment (RHEL, Ubuntu, VMware)
- **API**: full REST API, Terraform provider

## Management API: Redfish

### DMTF standard

REST API (JSON), the successor to IPMI.

| Endpoint | Purpose |
|----------|---------|
| `/redfish/v1/Systems/` | Server management (power, boot, inventory) |
| `/redfish/v1/Chassis/` | Physical hardware (PSU, fans, temperature, sensors) |
| `/redfish/v1/Managers/` | BMC (iLO, iDRAC, XClarity) |
| `/redfish/v1/UpdateService/` | Firmware updates |
| `/redfish/v1/EventService/` | Event subscription (webhook) |

### Redfish examples

```
# Power on the server
POST /redfish/v1/Systems/1/Actions/ComputerSystem.Reset
Body: {"ResetType": "On"}

# Set boot override (one-shot PXE)
PATCH /redfish/v1/Systems/1
Body: {"Boot": {"BootSourceOverrideTarget": "Pxe", "BootSourceOverrideEnabled": "Once"}}

# Get sensor data
GET /redfish/v1/Chassis/1/Thermal
→ {"Temperatures": [{"Name": "CPU1", "ReadingCelsius": 45}], "Fans": [...]}
```
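
The three calls above can be built with nothing but the Python standard library. The sketch below only constructs the requests and does not send them; the BMC address and the helper name are hypothetical, and authentication and TLS verification are left out on purpose because they differ between vendors.

```python
import json
import urllib.request
from typing import Optional

BMC_URL = "https://bmc.example.com"  # hypothetical BMC address


def redfish_request(method: str, path: str, body: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) a JSON Redfish request."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Accept": "application/json"}
    if data is not None:
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(BMC_URL + path, data=data, headers=headers, method=method)


power_on = redfish_request(
    "POST",
    "/redfish/v1/Systems/1/Actions/ComputerSystem.Reset",
    {"ResetType": "On"},
)
pxe_once = redfish_request(
    "PATCH",
    "/redfish/v1/Systems/1",
    {"Boot": {"BootSourceOverrideTarget": "Pxe", "BootSourceOverrideEnabled": "Once"}},
)
thermal = redfish_request("GET", "/redfish/v1/Chassis/1/Thermal")

for req in (power_on, pxe_once, thermal):
    print(req.get_method(), req.full_url)
# Sending would be urllib.request.urlopen(req) plus credentials,
# which this sketch deliberately omits.
```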

### IPMI (legacy)

- Port 623/UDP (RMCP)
- `ipmitool power on/off/status`
- `ipmitool sensor list`
- `ipmitool chassis bootdev pxe`
- Serial over LAN: `ipmitool sol activate`

## Terraform for provisioning

```hcl
# Terraform provider for VMware vSphere
provider "vsphere" {
  user           = var.vsphere_user
  password       = var.vsphere_password
  vsphere_server = var.vsphere_server
}

# The data sources referenced below (resource pool, datastore, network)
# are assumed to be declared elsewhere in the configuration.
resource "vsphere_virtual_machine" "web" {
  count            = 2 # assumed VM count; count.index requires count to be set
  name             = "web-${count.index}"
  resource_pool_id = data.vsphere_resource_pool.pool.id
  datastore_id     = data.vsphere_datastore.ds.id
  num_cpus         = 4
  memory           = 16384
  guest_id         = "rhel9_64Guest"

  network_interface {
    network_id = data.vsphere_network.net.id
  }

  disk {
    label = "os"
    size  = 80
  }
}
```

More in [CICD.md](CICD.md#infrastructure-as-code).

## Firmware management

- **BIOS/UEFI settings**: profile-based update during provisioning (Redfish `PATCH /Systems/1/Bios/Settings`)
- **Firmware updates**: Redfish UpdateService, SUU (Dell), SUM (HPE), SMM (Supermicro)
- **Lifecycle Controller** (Dell LC): embedded environment for firmware management
- **Baseline management**: keep firmware versions consistent across the fleet
- **Boot: UEFI vs Legacy BIOS**:
  - **UEFI**: Secure Boot, GPT, larger disks, faster boot
  - **Legacy BIOS**: MBR, compatibility, 2 TB boot disk limit

## Configuration management (post-provisioning)

| Tool | Language | Push/Pull | Use case |
|------|----------|-----------|----------|
| **Ansible** | YAML | Push (SSH) | General config management, ad-hoc |
| **Puppet** | Puppet DSL | Pull (agent) | State management, enterprise |
| **Chef** | Ruby DSL | Pull (agent) | Compliance, infrastructure automation |
| **SaltStack** | YAML/Python | Both (salt-minion) | High-speed config, event-driven |

More in [CICD.md](CICD.md).

## Sources

Links, books and standards: [sources/infrastructure/sources.md](sources/infrastructure/sources.md)

*Last revised: 2026-06-03*
757
SERVER-CONFIG.md
Normal file
@@ -0,0 +1,757 @@
# ⚙️ Server configuration: best practices by workload

## General BIOS/UEFI settings

| Setting | Recommendation | Rationale |
|---------|----------------|-----------|
| **Boot mode** | UEFI | Secure Boot, GPT, larger disks |
| **Power profile** | Performance / OS Control | Maximum performance, C-states disabled |
| **Hyper-Threading** | Enabled | +30-50 % throughput for multi-threaded workloads |
| **Virtualization** | Enabled (VT-x/AMD-V) | Required for hypervisors and VM-based container runtimes |
| **SR-IOV** | Enabled | GPU, NIC passthrough |
| **NUMA** | Enabled | NUMA-aware scheduling |
| **ACPI** | Enabled | OS-level power management |
| **Secure Boot** | Enabled | Verified boot chain |
| **TPM** | Enabled | Measured boot, key storage |

---

## 1. Database servers

### CPU selection

| DB type | CPU preference | Rationale |
|---------|----------------|-----------|
| **OLTP** (PostgreSQL, MySQL) | High clock, moderate core count | Low latency per transaction, limited parallelism |
| **OLAP** (ClickHouse, Snowflake) | Many cores, AVX-512 | Column store, high parallelism |
| **In-memory** (Redis, Memcached) | High clock, low cache latency | Single-threaded (Redis), RAM bandwidth |
| **Document** (MongoDB) | Balanced (clock × cores) | Mixed workload |
| **Distributed** (Cassandra, Scylla) | Many cores, large cache | Shard-per-core (Scylla), compaction |
| **Oracle OLTP** | High clock, moderate core count, core-factor aware | CPU license cost (core factor 0.5 for both AMD EPYC and Intel Xeon) |
| **Oracle OLAP / DW** | Many cores, large SGA, In-Memory option | Parallel query, Exadata Smart Scan, compression |

### Oracle CPU licensing: core factor

Oracle licenses per core, adjusted by a processor-specific core factor. A factor of 0.5 means that 2 cores require 1 Oracle license.

| Processor | Core factor | 64 physical cores → Oracle licenses |
|-----------|-------------|-------------------------------------|
| AMD EPYC (all series) | 0.5 | 32 |
| Intel Xeon (Scalable) | 0.5 | 32 |
| IBM POWER | 1.0 | 64 |
| ARM (Ampere Altra) | 0.25 | 16 |

**Impact on CPU selection**: EPYC and Xeon share the same 0.5 factor, so license cost depends only on the number of cores. At equal license cost, the CPU that delivers more work per licensed core is the better choice.
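
The licensing arithmetic is simple enough to script when comparing CPU options. A minimal sketch, assuming the x86 and POWER factors from the table and that fractional results round up to a whole license; the function name is hypothetical, and the current Oracle Processor Core Factor Table should be checked before relying on any factor.

```python
import math

# Core factors for the x86 and POWER rows of the table; verify against
# the current Oracle Processor Core Factor Table before use.
CORE_FACTORS = {
    "AMD EPYC": 0.5,
    "Intel Xeon": 0.5,
    "IBM POWER": 1.0,
}


def oracle_licenses(physical_cores: int, core_factor: float) -> int:
    """Processor licenses = cores × core factor, rounded up to a whole license."""
    return math.ceil(physical_cores * core_factor)


for cpu, factor in CORE_FACTORS.items():
    print(f"{cpu}: 64 cores -> {oracle_licenses(64, factor)} licenses")
```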

### Configuration by company size and storage type

#### Variant A: small company, local NVMe RAID

| Component | Recommendation | Note |
|-----------|----------------|------|
| **CPU** | 1× EPYC 9124/9224 or Intel Xeon 4410Y (8-16C) | 1 socket, high clock |
| **RAM** | 64-256 GB (8-16 GB/core) | DDR5-4800, 1DPC |
| **OS disk** | 2× SATA/SAS SSD, RAID 1 (240-480 GB) | OS + binaries |
| **Data disk** | 4-6× NVMe (U.2/E3.S), RAID 10 | Local data, no sharing |
| **WAL disk** | 2× NVMe RAID 1 (400-800 GB) | PostgreSQL only |
| **Network** | 2× 25 GbE (LACP) | Application traffic + management |
| **Form factor** | 1U or 2U | Single node, no cluster |
| **Storage backend** | Local RAID controller (PERC/Broadcom) | HW RAID 10 or SW RAID (mdadm) |
| **HA** | Application-managed failover (Patroni, repmgr, Orchestrator) | Standby node takes over on failure |

**Use case**: startup, branch office, dev/test, < 500 users, a single database server, low availability requirements.

#### Variant B: mid-size company, local NVMe + asynchronous replication

| Component | Recommendation | Note |
|-----------|----------------|------|
| **CPU** | 1-2× EPYC 9334/9374F or Intel Xeon 5418Y (16-24C) | 1-2 sockets, balanced |
| **RAM** | 128-512 GB (8-16 GB/core) | DDR5-4800/5600, 1DPC |
| **OS disk** | 2× NVMe RAID 1 (2× 480 GB) | OS + binaries |
| **Data disk** | 6-8× NVMe, RAID 10 | Local NVMe, 3-6 TB usable |
| **WAL disk** | 2× NVMe RAID 1 (2× 800 GB) | Separate from data |
| **Network** | 2× 25 GbE (app) + 2× 25 GbE (replication) | Application and replication networks are separated |
| **Form factor** | 2U | Primary + replica node |
| **Storage backend** | SW RAID (mdadm) or HW RAID (PERC H965) | Write-back cache with BBU |
| **HA** | Patroni / repmgr / MySQL InnoDB Cluster | Asynchronous replication to 1-2 standbys |

**Use case**: e-commerce, mid-size SaaS, 500-5000 users, RPO < 1 min, RTO < 5 min.

#### Variant C: large company, FC SAN (enterprise)

| Component | Recommendation | Note |
|-----------|----------------|------|
| **CPU** | 2× EPYC 9654/9965 or Xeon 8592+/6980P (48-128C) | 2 sockets, max cores, large cache |
| **RAM** | 512 GB - 2 TB (8-16 GB/core) | DDR5, 2DPC (reduces memory speed), 12 channels (EPYC) |
| **OS disk** | 2× SATA SSD RAID 1 (2× 480 GB) | OS only, data on the SAN |
| **Data + WAL** | LUNs from the FC SAN | Hitachi VSP / Dell PowerMax / Pure //X |
| **HBA** | 2× dual-port FC HBA (32/64 Gb) | Multipath (active-active), FC-NVMe |
| **Network** | 2× 25/100 GbE (app) + 2× 32/64 Gb FC (storage) | App and storage networks are separated |
| **Form factor** | 2U | 2-8 node cluster (RAC, AlwaysOn AG) |
| **Storage backend** | FC SAN, one LUN per database | Thin provisioning, RAID on the SAN, snapshots |
| **HA** | Oracle RAC / SQL Server AOAG / PostgreSQL Patroni | Synchronous replication, FC multipath |

**SAN advantages**: central management, snapshots, cloning, disaster recovery (SRDF/Metro), dedicated storage network, higher availability.
**Drawbacks**: higher latency than local NVMe (~50-200 µs over the SAN vs ~10 µs for local NVMe), higher CAPEX, vendor lock-in.

#### Variant D: large company, Ceph / SDS backend

| Component | Recommendation | Note |
|-----------|----------------|------|
| **CPU** | 2× EPYC 9334/9654 (16-32C) | Fewer cores than the SAN variant; part of the CPU goes to the Ceph client |
| **RAM** | 256-512 GB | Less RAM; the Ceph client cache is not as effective as a local buffer |
| **OS disk** | 2× SATA SSD RAID 1 (2× 480 GB) | OS |
| **Network** | 2× 25/100 GbE (app) + 2× 25/100 GbE (Ceph public) | App and Ceph traffic both run over Ethernet |
| **HBA** | Storage HBA in IT/HBA mode (no RAID) | For the Ceph OSD nodes, not the DB node |
| **Form factor** | 2U | DB node + separate Ceph OSD nodes |
| **Storage backend** | RBD (RADOS Block Device) on Ceph | 3× replication or erasure coding |
| **HA** | Application + built-in Ceph HA | Ceph self-healing, auto-rebalance |

**Ceph advantages**: no vendor lock-in, horizontal scaling, one platform for block/file/object, lower CAPEX.
**Drawbacks**: higher latency and CPU overhead (Ceph client → network → OSD), variable performance, harder troubleshooting.

#### Variant E: cloud, RDS / Cloud SQL / Azure SQL

| Component | Recommendation | Note |
|-----------|----------------|------|
| **Compute** | AWS RDS (db.r7g/r8g), Azure SQL (GP/BC/Hyperscale) | Managed service, no OS access |
| **Storage** | EBS gp3 / io2, Azure Premium SSD v2, Cloud SQL SSD | Automatic scaling, PITR, multi-AZ |
| **Network** | Security Group, Private Link, VPC peering | No HBA, no SAN; everything runs over Ethernet |
| **HA** | Multi-AZ (synchronous), read replicas | Managed failover, RTO < 60 s |
| **Backup** | Automated, PITR (7-35 days) | No management required |

**Use case**: no on-prem hardware, elastic scaling, pay-per-use, lower operational overhead.
**Drawbacks**: higher long-term cost, data residency, network latency, limited customization.

### Comparison of variants

| Aspect | Local NVMe (small) | Local NVMe (mid-size) | FC SAN | Ceph | Cloud |
|--------|--------------------|-----------------------|--------|------|-------|
| **Latency** | ~10 µs | ~10 µs | ~50-200 µs | ~100-500 µs | ~100-1000 µs |
| **Scaling** | Vertical | Vertical | Horizontal | Horizontal | Elastic |
| **CAPEX** | Low | Medium | High | Medium | None (OPEX) |
| **Operational overhead** | Low | Low | High (SAN admin) | Medium | None |
| **HA** | Application | Patroni/Cluster | RAC/AOAG | Ceph HA | Managed |
| **RPO** | 1-5 min | < 1 min | < 10 s | < 30 s | < 60 s |
| **RTO** | 5-15 min | < 5 min | < 2 min | < 5 min | < 60 s |
| **Server count** | 1-2 | 2-4 | 4-16 | 6-20+ | 0 (managed) |
| **Company** | Startup/SME | SME/Enterprise | Enterprise | Enterprise | Any |

### PostgreSQL parameter matrix by storage type

| Parameter | Local NVMe | FC SAN | Ceph RBD |
|-----------|------------|--------|----------|
| `random_page_cost` | 1.1 | 1.5-2.0 | 2.0-3.0 |
| `effective_io_concurrency` | 300 | 100-200 | 50-100 |
| `synchronous_commit` | off (NVMe cache) | on (SAN cache) | off (Ceph cache) |
| `full_page_writes` | on | on | on (also on Ceph) |

### Storage layout by backend type

**Local NVMe (small/mid-size):**
```
Mount point   FS     RAID         Disk           Purpose
/             ext4   1 (mirror)   2× SATA SSD    OS
/data         xfs    10           4-8× NVMe      Data
/wal          xfs    1 (mirror)   2× NVMe        WAL (PG)
```

**FC SAN (enterprise):**
```
Mount point / device   FS     Device                     Purpose
/                      ext4   local RAID 1 (2× SSD)      OS
/dev/sdb               xfs    FC LUN 1 (500 GB)          WAL (PG)
/dev/sdc               xfs    FC LUN 2 (2 TB)            Data
/dev/sdd               xfs    FC LUN 3 (2 TB)            Indexes (separate)
```

**Ceph RBD:**
```
Mount point / device   FS     Ceph device                Purpose
/                      ext4   local RAID 1 (2× SSD)      OS
/dev/rbd0              xfs    rbd datastore-01           Data + WAL (Ceph RBD)
```

### Kernel tuning by variant

**Local NVMe:**
```
vm.dirty_ratio = 30
vm.dirty_background_ratio = 5
```

**FC SAN:**
```
# SAN storage: higher latency, less aggressive flushing
vm.dirty_ratio = 20
vm.dirty_background_ratio = 3
# Defer writes (SAN cache)
vm.dirty_expire_centisecs = 3000
```

**Ceph RBD:**
```
# Ceph RBD: network storage, tune for the RBD cache
vm.dirty_ratio = 15
vm.dirty_background_ratio = 2
# RBD cache settings
# rbd cache = true (client-side)
# rbd cache size = 256-512 MB
```

### Database-specific tuning

| Parameter | PostgreSQL | MySQL | Oracle | MongoDB |
|-----------|------------|-------|--------|---------|
| **Cache** | `shared_buffers` 25 % RAM | `innodb_buffer_pool_size` 70-80 % RAM | `SGA_TARGET` 60-80 % RAM | `WiredTiger cache` 50-80 % RAM |
| **OS cache** | `effective_cache_size` 75 % RAM | OS cache + InnoDB | OS cache (double-buffering risk with a large SGA) | OS cache |
| **Write buffer** | `wal_buffers` 64-256 MB | `innodb_log_file_size` 1-4 GB | Redo log (2-4 groups, 200 MB-4 GB) | WiredTiger log |
| **Connections** | `max_connections` 50-500 | `max_connections` 100-500 | `processes` 200-2000 | `maxIncomingConnections` |
| **I/O** | `effective_io_concurrency` 200 | `innodb_io_capacity` 2000 | `db_file_multiblock_read_count` 128 | WiredTiger eviction |
| **Huge pages** | `huge_pages = try` | `large-pages = ON` | `use_large_pages = only` (mandatory) | `transparent_hugepage=never` |
| **Parallel query** | `max_parallel_workers` 4-8 | `innodb_parallel_read_threads` 4 | `parallel_degree_policy = auto`, up to 64 | n/a |
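
The cache percentages in the table are rules of thumb, so they are easy to derive from the installed RAM. A minimal sketch, assuming the 25 % / 75 % PostgreSQL split from the table and an assumed 75 % midpoint for the InnoDB buffer pool; the function names are hypothetical and the results are starting points to validate under load, not final settings.

```python
def postgres_memory_settings(ram_gb: int) -> dict[str, str]:
    """Rule-of-thumb sizing: shared_buffers 25 % and effective_cache_size 75 % of RAM."""
    return {
        "shared_buffers": f"{ram_gb * 25 // 100}GB",
        "effective_cache_size": f"{ram_gb * 75 // 100}GB",
    }


def mysql_buffer_pool_gb(ram_gb: int, fraction: float = 0.75) -> int:
    """InnoDB buffer pool at 70-80 % of RAM; 0.75 is an assumed midpoint."""
    return int(ram_gb * fraction)


for key, value in postgres_memory_settings(256).items():
    print(f"{key} = {value}")
print(f"innodb_buffer_pool_size = {mysql_buffer_pool_gb(256)}G")
```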

### Connectivity per variant

| Variant | App network | Storage network | Replication | Management |
|---------|-------------|-----------------|-------------|------------|
| **Local (small)** | 2× 25 GbE LACP | n/a | 2× 25 GbE (shared with app) | iDRAC/iLO |
| **Local (mid-size)** | 2× 25 GbE LACP | n/a | 2× 25 GbE dedicated | iDRAC/iLO |
| **FC SAN** | 2× 25/100 GbE | 2× 32/64 Gb FC (multipath) | FC replication | iDRAC/iLO + SAN mgmt |
| **Ceph** | 2× 25/100 GbE | 2× 25/100 GbE (public net) | 2× 25/100 GbE (cluster net) | iDRAC/iLO + Ceph mgmt |
| **Cloud** | Elastic IP / Private Link | n/a | n/a | AWS Console / API |
| **Oracle Standalone** | 2× 25 GbE LACP | ASM (2× 25 GbE or FC 32G) | Data Guard 2× 25 GbE | iLO + ASM mgmt |
| **Oracle RAC** | 2-4× 25/100 GbE | 2× 64 Gb FC (multipath) | Cache Fusion interconnect | iLO + SAN mgmt |
| **Oracle Exadata** | 4-8× 100 GbE RoCE | NVMe over Fabrics | RDMA interconnect | Exadata CLI + OEDA |

### Oracle-specific configuration

#### Oracle ASM: disk group layout

Oracle ASM (Automatic Storage Management) replaces the traditional filesystem + volume manager:

| Disk group | Redundancy | Disks | Purpose |
|------------|------------|-------|---------|
| **DATA** | Normal (2× mirror) | 4-12× FC LUN/NVMe | Data files, temp files, control files |
| **FRA** (Fast Recovery Area) | Normal (2× mirror) | 2-6× FC LUN/NVMe | Archive logs, backups, flashback logs |
| **REDO** | High (3× mirror) | 2-4× FC LUN/NVMe | Online redo log groups (I/O critical) |
| **SPFILE** | Normal | 2× small LUN | Server parameter file |

**ASM striping**: coarse (1 MB) for regular data, fine (128 KB) for redo logs (lower write latency).

#### Variant O1: standalone Oracle (small/mid-size, single instance)

| Parameter | Small (< 500 users) | Medium (500-2000 users) |
|-----------|---------------------|-------------------------|
| **CPU** | 1-2× EPYC 9124-9224 / Xeon 4410Y (8-16C) | 2× EPYC 9334-9374F / Xeon 5418Y (16-24C) |
| **RAM (SGA + PGA)** | 64-128 GB (SGA 70 %, PGA 30 %) | 128-512 GB (SGA 60-80 %, PGA 20-40 %) |
| **Huge pages** | Yes (`vm.nr_hugepages`), mandatory for the SGA | Yes |
| **OS disk** | 2× SATA SSD RAID 1 (240 GB) | 2× NVMe RAID 1 (480 GB) |
| **DATA + FRA** | 4-6× NVMe, ASM normal redundancy | 6-8× NVMe or FC LUN, ASM normal |
| **REDO** | 2-4× NVMe (separate from DATA), ASM high | 4× FC LUN (separate), ASM high |
| **Archive log** | Local FRA | FC LUN (FRA disk group) |
| **Network (app)** | 2× 25 GbE LACP | 2-4× 25/100 GbE LACP |
| **Network (storage)** | n/a (local NVMe) | 2× FC 32G multipath |
| **Network (Data Guard)** | n/a | 2× 25 GbE dedicated |
| **DB edition** | Oracle SE2 (max 16 threads) | Oracle EE (unlimited) |

**Use case**: dev/test, small production databases, branch offices. An SE2 license allows at most 16 CPU threads and limits parallel execution.
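
Because the SGA must be backed by huge pages, `vm.nr_hugepages` has to be sized from the SGA. A minimal sketch, assuming the default 2 MiB huge page size on x86-64 and the 70 % SGA share from the "Small" column; the function name is hypothetical, and in practice a small margin above the exact SGA size is usually added.

```python
import math

HUGEPAGE_MIB = 2  # default x86-64 huge page size; verify with `grep Hugepagesize /proc/meminfo`


def nr_hugepages(sga_gib: float, hugepage_mib: int = HUGEPAGE_MIB) -> int:
    """Number of huge pages needed to back an SGA of sga_gib GiB, rounded up."""
    return math.ceil(sga_gib * 1024 / hugepage_mib)


ram_gib = 128
sga_gib = ram_gib * 0.70  # SGA share assumed from the "Small" column
print(f"vm.nr_hugepages = {nr_hugepages(sga_gib)}")
```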
|
||||||
|
|
||||||
|
#### Variant O2: Oracle Data Guard (medium/large, HA + DR)

Primary and standby in active-passive mode, with optional Active Data Guard for reporting.

| Parameter | Recommendation |
|----------|-----------|
| **CPU** | 2× EPYC 9654-9965 / Xeon 8592+ (32-64C) |
| **RAM** | 256-1024 GB (SGA 60-80 %, PGA 20-40 %) |
| **Huge pages** | Yes (50-80 % of RAM allocated to the SGA) |
| **OS disk** | 2× NVMe RAID 1 (480 GB) |
| **Storage** | FC SAN LUNs (DATA, FRA, and REDO kept separate) or NVMe + ASM |
| **HBA** | 2× dual-port FC 32/64 Gb (multipath active-active) |
| **App network** | 2-4× 25/100 GbE LACP |
| **Storage network** | 2× FC 32/64 Gb multipath |
| **Data Guard network** | 2× 25/100 GbE dedicated (sync or async) |
| **Data Guard mode** | Maximum Availability (sync, falls back to async), RPO = 0 |
| **Topology** | 1 primary + 1-2 physical standbys, far sync for geo-DR |
| **Active Data Guard** | Standby open for reads (reporting, backup); requires an ADG license |

**Data Guard latency**:

```text
Synchronous (Maximum Availability):
  Primary COMMIT → LGWR flushes redo → sync over the network → standby RFS write → ACK → ~1-5 ms
  RPO = 0, adds latency to every commit

Asynchronous (Maximum Performance):
  Primary COMMIT → LGWR flushes redo → async shipping to the standby buffer → ~0.1-1 ms
  RPO = a few seconds, negligible impact on writes
```

**Network requirements for synchronous Data Guard**:
- RTT < 2 ms for synchronous mode (< 1 ms recommended)
- At least 10 GbE, 25 GbE recommended (provisioned throughput = redo rate × 2)
- Redo rate: OLTP ~50-500 MB/s, batch ~500-2000 MB/s
- A redo rate of 500 MB/s is 4 Gb/s, about 16 % of a 25 GbE link (about 32 % with the 2× headroom)

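
The link-utilization figure follows from a unit conversion (MB/s to Gb/s) and a division. A minimal sketch, assuming decimal units (1 MB = 10^6 bytes) and ignoring protocol overhead; the function name `link_utilization` is hypothetical.

```python
def link_utilization(redo_mb_per_s: float, link_gbit_per_s: float,
                     headroom: float = 1.0) -> float:
    """Return the fraction of the link used by redo transport.

    headroom multiplies the redo rate, e.g. 2.0 for the "redo rate × 2" rule.
    """
    required_gbit = redo_mb_per_s * 8 / 1000 * headroom
    return required_gbit / link_gbit_per_s


if __name__ == "__main__":
    for headroom in (1.0, 2.0):
        share = link_utilization(500, 25, headroom)
        print(f"500 MB/s on 25 GbE, headroom x{headroom:.0f}: {share:.0%}")
```

The same function shows that a batch peak of 2000 MB/s already needs 16 Gb/s, which rules out a single 10 GbE link.
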
#### Variant O3: Oracle RAC (large, enterprise)

Multi-instance cluster with shared storage and Cache Fusion.

| Parameter | Recommendation |
|----------|-----------|
| **Node count** | 2-4 (typical), max 64 (RAC cluster) |
| **CPU per node** | 2× EPYC 9654-9965 / Xeon 8592+ (32-64C) |
| **RAM per node** | 512-2048 GB (SGA 60-80 %, PGA 20-40 %) |
| **Huge pages** | Yes (1 GB pages if RAM > 512 GB) |
| **Storage** | FC SAN, shared LUNs (ASM normal/high redundancy) |
| **HBA** | 2× dual-port FC 64 Gb (multipath, active-active) |
| **App network** | 2-4× 25/100 GbE LACP (VIP, SCAN listener) |
| **Storage network** | 2-4× FC 64 Gb (multipath per node) |
| **Cache Fusion interconnect** | 2× 100 GbE (RoCE v2 or InfiniBand), dedicated |
| **RAC interconnect latency** | < 5 µs (recommended), max < 10 µs |
| **ASM** | Normal redundancy (2-way mirror) |
| **Oracle Clusterware** | Voting disk (3× 1 GB LUN), OCR (3× 500 MB LUN) |
| **Services** | OLTP_service, REPORT_service, BATCH_service |

**Cache Fusion: the critical interconnect**:

```
Node A (DB instance) ←──→ Node B (DB instance)
        │                         │
        └────────── ASM ──────────┘
                     │
          FC SAN (shared storage)

Cache Fusion traffic: dirty block transfers between instances
→ Latency < 5 µs, otherwise RAC scaling degrades
→ Capacity: 2× 100 GbE, dedicated switch or InfiniBand HDR100
→ Recommended MTU: 9000 (jumbo frames)
```

**RAC sizing by transaction rate**:

| TPS | Nodes | CPU per node | RAM per node | Interconnect |
|-----|------|-------------|-------------|-------------|
| < 10 000 | 2 | 16-24C | 256 GB | 2× 25 GbE |
| 10 000 - 50 000 | 2-4 | 32-48C | 512 GB | 2× 100 GbE RoCE |
| 50 000 - 200 000 | 4-8 | 48-64C | 1024 GB | 2× 100 GbE RoCE / InfiniBand |
| > 200 000 | 8+ | 64-128C | 2048 GB | InfiniBand HDR100/HDR200 |

**RAC sizing: license cost calculation**:

```text
Example: 4-node RAC, each node 2× EPYC 9654 (96C) = 192 cores per node
Core factor 0.5 → 96 Oracle licenses per node
4 × 96 = 384 Oracle EE licenses
At ~$47.5k per license → ~$18.2M (licenses only, excluding 22 % annual support)
```

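
The license arithmetic above can be reproduced for other cluster sizes. This is a rough sketch, not a pricing tool: the per-license price and the 22 % support rate are the figures quoted in the example, rounding up per server is an assumption, and the function names are hypothetical.

```python
import math


def processor_licenses(cores: int, core_factor: float) -> int:
    """Licenses for one server: cores × core factor, rounded up (assumed rule)."""
    return math.ceil(cores * core_factor)


def rac_license_cost(nodes: int, cores_per_node: int, core_factor: float,
                     price_per_license: int) -> tuple[int, int]:
    """Return (total licenses, total list price) for a RAC cluster."""
    total = nodes * processor_licenses(cores_per_node, core_factor)
    return total, total * price_per_license


if __name__ == "__main__":
    licenses, cost = rac_license_cost(nodes=4, cores_per_node=192,
                                      core_factor=0.5, price_per_license=47_500)
    support = cost * 22 // 100  # 22 % annual support from the example
    print(f"{licenses} licenses, ${cost:,} list price, ${support:,} support per year")
```

Halving the core count per node halves the license count, which is why lower-core, higher-clock CPUs are often preferred for licensed-per-core databases.
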
#### Variant O4: Oracle Exadata (hyperscale)

Engineered system, best suited to hybrid workloads (OLTP + DW).

| Parameter | X9M / X10M | Use case |
|----------|-----------|----------|
| **Database servers** | 2-8× (Xeon, 1.5-6 TB RAM, NVMe) | Compute |
| **Storage servers** | 3-18× (NVMe + HDD, Smart Scan) | Predicate offloading |
| **Smart Scan** | Filtering at the storage layer | Less data over the network, higher throughput |
| **RoCE interconnect** | 100 GbE (RDMA) | Low latency, high bandwidth |
| **In-Memory Column Store** | Optional license | Real-time analytics without ETL |
| **HCC (Hybrid Columnar Compression)** | Compression in the storage servers | Up to 10-15× compression for DW |
| **Rack power** | ~15-30 kW (full rack) | High power density |

**When to choose Exadata over standalone RAC**:
- OLTP > 50 000 TPS
- Need for consolidation (multiple databases on one cluster)
- Smart Scan significantly speeds up reporting on production data
- HCC to save storage for DW workloads

---

## 2. Hypervisor host (ESXi / KVM / Hyper-V)

### Configuration by size and storage type

#### Variant A: Small company, local storage (2-3 hosts)

| Component | Recommendation | Note |
|-----------|-----------|----------|
| **CPU** | 1× EPYC 9224/9254 or Xeon 4410Y/5418Y (12-24C) | 1 socket, enough cores for VM density |
| **RAM** | 128-256 GB (4-8 GB/core) | DDR5, 1DPC |
| **OS disk** | 2× SATA SSD RAID 1 (2× 240-480 GB) | ESXi / Proxmox / Hyper-V boot |
| **VM storage** | 4-6× SATA/SAS SSD, RAID 5/6 or 10 | Local RAID, 4-12 TB usable |
| **Network** | 2-4× 10/25 GbE (LACP) | Shared for everything (management + VM + storage) |
| **Hypervisor** | VMware vSphere Standard / Proxmox VE / Hyper-V | Basic license, no enterprise features |
| **Storage backend** | Local RAID controller (PERC H755, Broadcom 9560) | HW RAID with write-back cache |
| **HA** | VMware HA / Proxmox HA | Restarts VMs on another host after a failure |
| **Backup** | Veeam B&R Free / PBS (Proxmox Backup Server) | Local or USB disk |

**Use case**: Small office, branch office, dev/test, < 10 VMs, low budget, simple administration.

**Limitations**: No vMotion without shared storage; a host failure causes an outage (HA restart, not seamless).

#### Variant B: Medium company, vSAN / Ceph (3-6 hosts)

| Component | Recommendation | Note |
|-----------|-----------|----------|
| **CPU** | 1-2× EPYC 9334/9654 or Xeon 5418Y/8592+ (16-32C) | 1-2 sockets |
| **RAM** | 256-512 GB (4-8 GB/core) | DDR5, 2DPC (minimal penalty) |
| **OS disk** | 2× SATA SSD RAID 1 or 2× M.2 NVMe (BOSS-S1) | Separate from VM storage |
| **Cache tier** | 1-2× NVMe (vSAN caching / Ceph WAL+DB) | For write performance |
| **Capacity tier** | 4-8× SATA/SAS SSD or HDD (vSAN capacity / Ceph OSD) | HDD for capacity, SSD for performance |
| **Network** | 4× 25/100 GbE: 2× VM + mgmt, 2× storage (vSAN/Ceph) | Separate storage network, RDMA (RoCE v2) |
| **Hypervisor** | VMware vSAN / Proxmox Ceph / StarWind HCI | HCI license (vSAN ~$2.5k/core) |
| **Storage backend** | vSAN OSA/ESA or Ceph (RADOS) | Distributed storage, auto-rebalance |
| **HA** | vSphere HA + vSAN / Proxmox HA + Ceph | vMotion, DRS, automated failover |
| **Failover** | N+1 (one host as spare) | vSAN: min. 4 hosts (ESA: min. 3) |

**Ceph-only variant (Proxmox / OpenStack)**:

```
Proxmox node (3-6×):
├── CPU: 1× EPYC 9224-9334 (12-24C)
├── RAM: 128-256 GB
├── OS: 2× SATA SSD RAID 1
├── Ceph OSD: 4-8× NVMe/SATA SSD (raw, HBA mode)
├── Network: 2× 25 GbE (public) + 2× 25 GbE (cluster)
└── Storage: Ceph 3× replication, CRUSH host failure domain
```

**VMware vSAN variant (4-6 hosts)**:

```
vSAN node (4-6×):
├── CPU: 1-2× EPYC/Xeon (16-32C)
├── RAM: 256-512 GB
├── OS: 2× M.2 NVMe (BOSS-S1) or SD card (deprecated)
├── vSAN cache: 1-2× NVMe (write buffer, OSA only)
├── vSAN capacity: 4-8× NVMe (vSAN ESA) or SATA SSD/HDD (vSAN OSA)
├── Network: 2× 25/100 GbE (VM) + 2× 25 GbE (vSAN)
└── Storage: vSAN ESA (all-NVMe) or OSA (hybrid)
```

**Use case**: SME, enterprise divisions, 10-100 VMs, need for vMotion, DRS, and HA, simple storage management.

#### Variant C: Large company, FC SAN (6+ hosts)

| Component | Recommendation | Note |
|-----------|-----------|----------|
| **CPU** | 2× EPYC 9654/9965 or Xeon 8592+/6980P (32-64C) | 2 sockets, max VM density |
| **RAM** | 512 GB - 2 TB (4-8 GB/core) | DDR5, 2DPC |
| **OS disk** | 2× SATA SSD RAID 1 or SD card (vSphere) | Boot, image storage |
| **VM storage** | LUNs from the FC SAN, VMFS / NFS datastores | Hitachi, Dell, Pure, HPE storage |
| **HBA** | 2× dual-port FC HBA 32/64 Gb | Multipath, FC-NVMe |
| **Network** | 4-8× 25/100 GbE, split by traffic type | Management, VM, vMotion, FT separated |
| **Hypervisor** | VMware vSphere Enterprise+ / Hyper-V DC | Enterprise license, DRS, HA, FT |
| **Storage backend** | FC SAN, VMFS 6 datastores, vVols | Thin provisioning, Storage DRS, array snapshots |
| **HA** | vSphere HA + DRS + vCenter | vMotion, DRS, FT, SRM for DR |
| **Failover** | N+1 or admission control (CPU/RAM reserve) | Capacity reserved for HA failover |

**Use case**: Enterprise, 100+ VMs, mix of databases and applications, centralized storage management, enterprise SLA.

#### Variant D: Hyperscale, Ceph / SDS (20+ hosts)

| Component | Recommendation | Note |
|-----------|-----------|----------|
| **CPU** | 2× EPYC 9654/9965 (64-128C) | 2 sockets, compute optimized |
| **RAM** | 512 GB - 1 TB (2-4 GB/core) | Low overcommit ratio for consistency |
| **OS disk** | 2× M.2 NVMe RAID 1 (BOSS) | Boot |
| **Network** | 4-8× 100 GbE (compute + storage) | Separate OVN/OVS for SDN, VXLAN tunneling |
| **Hypervisor** | OpenStack (Nova) / OpenShift (KubeVirt) | Open source, API-driven, multi-tenant |
| **Storage backend** | Ceph (RADOS, RBD, RGW, CephFS) | Unified storage, erasure coding (8+3) |
| **Orchestration** | OpenStack / Kubernetes | Infrastructure as code, autoscaling |
| **HA** | OpenStack HA / Kubernetes HA | Self-healing, auto-rebalance |

**Use case**: Cloud provider, hyperscale, 500+ VMs, multi-tenant, maximum automation.

### Comparison of hypervisor variants

| Aspect | Local (small) | vSAN/Ceph (medium) | FC SAN (large) | Ceph hyperscale |
|--------|---------------|---------------------|----------------|-----------------|
| **Storage** | Local RAID | vSAN / Ceph (HCI) | FC SAN (centralized) | Ceph (distributed) |
| **Host count** | 2-3 | 3-6 | 6-50+ | 20+ |
| **VM latency** | ~10 µs (local) | ~100-500 µs | ~200 µs (SAN) | ~500-2000 µs |
| **CAPEX/host** | Low | Medium | High | Medium |
| **CAPEX storage** | Low | None (part of the hosts) | High (SAN array) | None (part of the hosts) |
| **Management** | Simple (per host) | vCenter / Proxmox | vCenter + SAN mgmt | OpenStack / K8s |
| **vMotion** | No (no shared storage) | Yes (vSAN / Ceph RBD) | Yes (FC LUN) | Yes (Ceph RBD) |
| **DRS** | No | Yes (vSphere) | Yes (vSphere) | OpenStack scheduler |
| **Scaling** | Vertical | Horizontal (add a host) | Horizontal (host + SAN) | Horizontal |

### Network design by variant

#### Small (local storage)

| Traffic | VLAN | Speed | Teaming | Note |
|---------|------|----------|---------|----------|
| Management | Mgmt | 1 GbE | Active/Passive | Dedicated port (iLO/iDRAC) |
| VM + Storage | All | 2-4× 10/25 GbE | LACP | Shared, VLAN tagging |

```
┌──────────────────────────────────────────┐
│                   Host                   │
│ ┌───────┐  ┌───────────────────────────┐ │
│ │  iLO  │  │ NIC1                 NIC2 │ │
│ │ 1 GbE │  │       [LACP] 25 GbE       │ │
│ └───────┘  └─────────────┬─────────────┘ │
└──────────────────────────┼───────────────┘
                           │
                     ┌─────┴─────┐
                     │  Switch   │
                     └───────────┘
```

#### Medium (vSAN / Ceph)

| Traffic | VLAN | Speed | Teaming | Note |
|---------|------|----------|---------|----------|
| Management | Mgmt | 1 GbE | Active/Passive | Dedicated iLO/iDRAC |
| VM | VM | 2× 25/100 GbE | LACP | VM traffic, migration |
| Storage | vSAN/Ceph | 2× 25/100 GbE | LACP or RDMA | Separate, jumbo frames (MTU 9000) |

```
┌─────────────────────────────────────────────┐
│                    Host                     │
│ ┌───────┐ ┌────────────┐ ┌────────────────┐ │
│ │  iLO  │ │ NIC1  NIC2 │ │   NIC3  NIC4   │ │
│ │ 1 GbE │ │ VM traffic │ │ Storage (vSAN) │ │
│ └───────┘ └────────────┘ └────────────────┘ │
└─────────────────────────────────────────────┘
```

#### Large (FC SAN)

| Traffic | VLAN | Speed | Teaming | Note |
|---------|------|----------|---------|----------|
| Management | Mgmt | 1 GbE | Active/Passive | Dedicated |
| VM | VM | 2-4× 25/100 GbE | LACP | VM traffic |
| vMotion | vMotion | 2× 25 GbE | Dedicated | Multi-NIC vMotion |
| FT | FT | 2× 10/25 GbE | Dedicated | Low latency |
| Storage | n/a | 2× 32/64 Gb FC | Multipath | FC SAN |

```
┌───────────────────────────────────────────────┐
│                     Host                      │
│ ┌───────┐ ┌───────────────┐ ┌──────┐ ┌──────┐ │
│ │  iLO  │ │    NIC1-4     │ │ HBA1 │ │ HBA2 │ │
│ │ 1 GbE │ │ VM+vMotion+FT │ │ 32Gb │ │ 32Gb │ │
│ └───────┘ └───────┬───────┘ └───┬──┘ └──┬───┘ │
└───────────────────┼─────────────┼───────┼─────┘
                    │             │       │
              ┌─────┴─────┐    ┌──┴───────┴──┐
              │ Ethernet  │    │  FC switch  │
              │  switch   │    │  (Brocade/  │
              │           │    │   Cisco)    │
              └───────────┘    └─────────────┘
```

### BIOS for hypervisors, all variants

| Setting | Value | Rationale |
|-----------|---------|------------|
| Hyper-Threading | Enabled | Higher VM density |
| Virtualization Technology | Enabled | VT-x/AMD-V |
| VT-d / IOMMU | Enabled | Passthrough, SR-IOV |
| Power Management | Performance / OS | Minimizes VM exit latency |
| C-States | Disabled | Lower VM exit latency (important for real-time VMs) |
| NUMA | Enabled | NUMA-aware VM placement |
| SR-IOV | Enabled | NIC/GPU virtualization |
| Adjacent Sector Prefetch | Enabled (Intel) | Better sequential reads |
| DCU Streamer / IP Prefetcher | Enabled | HW prefetch for VM workloads |
| Patrol Scrub | Disabled (vSAN/Ceph) | Can cause latency spikes with SDS |

### Choosing a hypervisor by variant

| Criterion | VMware vSphere | Proxmox VE | Hyper-V | OpenStack |
|-----------|---------------|------------|---------|-----------|
| **Size** | SME - Enterprise | SME | SME - Enterprise | Hyperscale |
| **Storage** | vSAN, SAN, NFS | Ceph, ZFS, NFS | Storage Spaces, SAN | Ceph, Manila |
| **License** | ~$1-5k/core | Free (support ~$500/host) | Part of Windows Server | Open source |
| **Familiarity** | Highest | Medium | Windows admins | Low |
| **Automation** | Terraform, Ansible, PowerCLI | Ansible, Terraform, PBS | PowerShell, SCVMM | Terraform, Heat, Ansible |
| **Ecosystem** | Broadest (Veeam, Zerto, SRM) | Growing (PBS, remote migration) | Windows ecosystem | Open source (Kolla, TripleO) |

---

## 3. Kubernetes node

### Node profiles

| Role | CPU | RAM | Storage | Network | Use case |
|------|-----|-----|---------|---------|----------|
| **General purpose** | 16-32 cores | 64-128 GB | 1× NVMe OS + 1× NVMe local | 2× 25 GbE | Web, API, microservices |
| **Memory optimized** | 32-64 cores | 256-512 GB | 1× NVMe OS + 2× NVMe local | 2× 25 GbE | In-memory cache, DB |
| **Compute optimized** | 64-128 cores | 128-256 GB | 1× NVMe OS | n/a | Batch, CI/CD |
| **GPU node** | 32-64 cores | 512-1024 GB | 1× NVMe OS + 4-8× NVMe local | 4× 100 GbE | AI/ML training, inference |
| **Storage node** | 16-32 cores | 64-128 GB | 4-12× NVMe/SATA (Ceph/Longhorn) | 4× 25 GbE | SDS, persistent volumes |

### Kernel tuning

```
# /etc/sysctl.d/99-kubernetes.conf
# The net.bridge.* keys exist only once the br_netfilter module is loaded.
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1
net.ipv4.conf.all.forwarding = 1

# Connection tracking (for NodePort, Service)
net.netfilter.nf_conntrack_max = 2097152
net.netfilter.nf_conntrack_tcp_timeout_established = 86400

# File watchers (for kubelet, containerd)
fs.inotify.max_user_instances = 8192
fs.inotify.max_user_watches = 524288

# Memory management
vm.swappiness = 0
# Allow overcommit (CRI-O, containerd)
vm.overcommit_memory = 1
vm.panic_on_oom = 0
kernel.panic = 10
kernel.panic_on_oops = 1
```

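
Before rolling such a file out, it can help to lint it. The sketch below parses sysctl.d-style text, maps keys to their `/proc/sys` paths, and flags values that are not plain integers (every value in the file above is an integer, so anything else, such as a trailing comment on the same line, is suspicious). The helper names are hypothetical and the check is an illustration, not part of any Kubernetes tooling.

```python
def parse_sysctl(text: str) -> dict[str, str]:
    """Parse sysctl.d-style text into {key: value}.

    Blank lines and lines starting with '#' or ';' are skipped.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a key = value line: {raw!r}")
        settings[key.strip()] = value.strip()
    return settings


def proc_path(key: str) -> str:
    """Map a sysctl key to its /proc/sys file (dots become slashes)."""
    return "/proc/sys/" + key.replace(".", "/")


def non_integer_values(settings: dict[str, str]) -> list[str]:
    """Return keys whose value is not a plain integer."""
    return [k for k, v in settings.items() if not v.lstrip("-").isdigit()]


if __name__ == "__main__":
    sample = """
    # Memory management
    vm.swappiness = 0
    vm.overcommit_memory = 1 # Allow overcommit
    """
    parsed = parse_sysctl(sample)
    print("suspicious keys:", non_integer_values(parsed))
    print("would be written to:", proc_path("vm.swappiness"))
```
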
### Container storage

| Type | Recommendation | Note |
|-----|-----------|----------|
| **OS disk** | RAID 1 (2× NVMe) | Ext4/XFS, 100-200 GB |
| **Container runtime images** | RAID 1 (2× NVMe) | /var/lib/containerd, 200-500 GB |
| **Local PV** | Single NVMe | Raw device, no RAID |
| **Rook/Ceph OSD** | Raw NVMe/SATA | HBA/IT mode, no RAID |
| **Longhorn** | Raw NVMe/SATA | Ext4/XFS per volume |

---

## 4. Storage server (Ceph / MinIO / NAS)

### Ceph OSD node

| Component | Recommendation | Note |
|-----------|-----------|----------|
| **CPU** | 1-2 cores per OSD | Up to 12 OSDs per node (24 cores) |
| **RAM** | 4-8 GB per OSD + OS | BlueStore cache, 16-64 GB minimum |
| **Network** | 2× 25/100 GbE | Public + cluster network |
| **Storage** | 10-12× NVMe/SATA SSD OSD | HBA/IT mode, no RAID |
| **OS disk** | 2× SATA SSD RAID 1 | OS, Ceph MON/MGR |

**BIOS for Ceph:**
- SATA/NVMe: AHCI/NVMe mode (not RAID)
- C-States: Disabled (lower OSD latency)
- NUMA: Enabled
- Power: Performance

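
The per-OSD rules of thumb above translate directly into a node budget. A minimal sketch, assuming a fixed 16 GB allowance for the operating system (that reserve is an assumption, not a figure from the table); the function name `ceph_node_sizing` is hypothetical.

```python
def ceph_node_sizing(osds: int, os_reserve_gb: int = 16) -> dict[str, tuple[int, int]]:
    """Return (min, max) CPU cores and RAM in GB for one OSD node.

    Uses 1-2 cores and 4-8 GB RAM per OSD from the table; os_reserve_gb is an
    assumed allowance for the OS and is not taken from the table.
    """
    if osds < 1:
        raise ValueError("osds must be at least 1")
    return {
        "cores": (osds * 1, osds * 2),
        "ram_gb": (osds * 4 + os_reserve_gb, osds * 8 + os_reserve_gb),
    }


if __name__ == "__main__":
    for count in (4, 8, 12):
        print(count, "OSDs:", ceph_node_sizing(count))
```

For a 12-OSD node this gives 12-24 cores, which matches the upper bound quoted in the CPU row.
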
### MinIO node

| Component | Recommendation |
|-----------|-----------|
| **CPU** | 8-16 cores (32+ for erasure coding) |
| **RAM** | 32-64 GB + 1 GB per 1 TB of storage |
| **Storage** | 4-16× NVMe (direct, no RAID) |
| **Network** | 2× 25/100 GbE |
| **OS** | Ubuntu / RHEL, XFS (for data) |

### NAS (TrueNAS / FreeNAS)

- **ZFS**: RAID-Z1/Z2/Z3, compression (lz4, zstd), dedup
- **ARC cache**: 1 GB per 1 TB of storage (max 64 GB)
- **L2ARC**: NVMe cache (optional, read-heavy workloads)
- **SLOG**: NVDIMM / Optane (sync writes, ZIL)
- **Network**: 2-4× 10/25 GbE LACP

---

## 5. Web / API servers

| Parameter | Recommendation |
|----------|-----------|
| **CPU** | High clock, 8-32 cores |
| **RAM** | 32-128 GB |
| **Storage** | 2× NVMe RAID 1 (OS + app) |
| **OS** | Ubuntu / RHEL, optimized kernel |
| **Network** | 2× 10/25 GbE (bonding) |

**Kernel tuning:**
```
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_fin_timeout = 15
net.core.somaxconn = 65535
net.ipv4.tcp_max_syn_backlog = 65535
net.core.netdev_max_backlog = 65535
```

---

## Quick decision tree: choosing a server by workload, size, and storage

```mermaid
flowchart TD
    W["Which workload?"] --> DB["Database"]
    W --> HV["Virtualization"]
    W --> K8s["Kubernetes"]
    W --> AI["AI/ML"]
    W --> ST["Storage server"]
    W --> WEB["Web / API"]

    DB --> DBS{"Company size"}
    DBS -->|"< 500"| DB1["1× EPYC 8-16C, 64-256 GB<br/>NVMe RAID10, 2× 25GbE"]
    DBS -->|"500-5000"| DB2{"Storage"}
    DB2 -->|"Local"| DB2L["1-2× EPYC 16-24C, 128-512 GB<br/>NVMe RAID10, 4× 25GbE"]
    DB2 -->|"Ceph"| DB2C["2× EPYC 16-32C, 256-512 GB<br/>RBD, 4× 25/100GbE"]
    DBS -->|"Enterprise"| DB3{"Storage"}
    DB3 -->|"FC SAN"| DB3F["2× EPYC 48-128C, 512-2048 GB<br/>SAN LUN + 2× FC 32/64G"]
    DB3 -->|"Ceph"| DB3C["2× EPYC 32-64C, 256-512 GB<br/>RBD, 4× 100GbE"]
    DBS -->|"Cloud"| DBC["RDS/Azure SQL/CloudSQL<br/>Managed, Multi-AZ"]

    DB --> ORACLE{"Oracle architecture?"}
    ORACLE -->|"Standalone"| ORA1["1-2× EPYC 8-24C<br/>64-512 GB, ASM local/FC<br/>2× 25GbE + FC 32G"]
    ORACLE -->|"Data Guard"| ORA2["2× EPYC 32-64C<br/>256-1024 GB, FC SAN<br/>2× 25/100GbE + 2× FC 64G<br/>2× 25GbE (DG sync)"]
    ORACLE -->|"RAC 2-4 nodes"| ORA3["Per node: 2× EPYC 32-64C<br/>512-2048 GB, FC SAN<br/>2× 100GbE (app)<br/>2× FC 64G (storage)<br/>2× 100GbE RoCE (interconnect)"]
    ORACLE -->|"Exadata"| ORA4["Engineered system<br/>2-8 DB servers + 3-18 storage<br/>RoCE 100GbE, Smart Scan<br/>15-30 kW/rack"]

    HV --> HVS{"Host count"}
    HVS -->|"2-3"| HV1["1× EPYC 12-24C, 128-256 GB<br/>RAID5/6 SSD, 2-4× 10/25GbE"]
    HVS -->|"3-6"| HV2{"HCI"}
    HV2 -->|"vSAN"| HV2V["1-2× EPYC 16-32C, 256-512 GB<br/>NVMe cache + SSD, 4× 25GbE"]
    HV2 -->|"Ceph"| HV2C["1× EPYC 12-24C, 128-256 GB<br/>4-8× HBA NVMe/SSD, 4× 25GbE"]
    HVS -->|"6+"| HV3["2× EPYC 32-64C, 512-2048 GB<br/>FC SAN 32/64G, 4-8× 25/100GbE"]
    HVS -->|"20+"| HV4["2× EPYC 64-128C, 512-1024 GB<br/>OpenStack + Ceph, 4-8× 100GbE"]

    K8s --> K8T{"Node type"}
    K8T -->|"General"| K8G["16-32C, 64-128 GB<br/>2× NVMe, 2× 25GbE"]
    K8T -->|"Memory"| K8M["32-64C, 256-512 GB<br/>3× NVMe, 2× 25GbE"]
    K8T -->|"GPU"| K8U["32-64C, 512-1024 GB<br/>6-10× NVMe, H100/B200, 4× 100GbE"]
    K8T -->|"Storage"| K8S["16-32C, 64-128 GB<br/>6-14× HBA NVMe, 4× 25GbE"]

    AI --> AIT{"Purpose"}
    AIT -->|"Training"| AITR["GPU H100/B200, NVLink<br/>InfiniBand 400Gb/s, liquid cooling"]
    AIT -->|"Inference"| AIIR["A100/H200, MIG<br/>PCIe 5.0, 2× 100GbE"]

    ST --> STT{"Type"}
    STT -->|"Ceph OSD"| STC["EPYC (PCIe lanes)<br/>4-8 GB/OSD, HBA, 2× 25/100GbE"]
    STT -->|"MinIO"| STM["EPYC 8-16C, 32-64 GB<br/>4-16× NVMe direct, 2× 25/100GbE"]
    STT -->|"NAS (ZFS)"| STN["EPYC 16-32C, 64-128 GB<br/>RAID-Z, SLOG NVMe, 2-4× 10/25GbE"]

    WEB --> WEBE["EPYC high clock, 8-32C<br/>32-128 GB, 2× NVMe RAID1, 2× 10/25GbE"]
```

### Connectivity summary by platform

| Platform | App / VM network | Storage network | Replication / cluster | Management |
|-----------|-------------|-------------|---------------------|------------|
| **DB local (small)** | 2× 25 GbE LACP | n/a | 2× 25 GbE (shared) | 1× 1 GbE (iLO) |
| **DB local (medium)** | 2× 25/100 GbE LACP | n/a | 2× 25 GbE dedicated | 1× 1 GbE (iLO) |
| **DB FC SAN** | 2× 25/100 GbE LACP | 2× 32/64 Gb FC multipath | FC replication | 1× 1 GbE (iLO) + SAN mgmt |
| **DB Ceph** | 2× 25/100 GbE | 2× 25/100 GbE (Ceph public) | 2× 25/100 GbE (Ceph cluster) | 1× 1 GbE (iLO) |
| **Hypervisor local** | 2-4× 10/25 GbE LACP | n/a (local) | n/a | 1× 1 GbE (iLO) |
| **Hypervisor vSAN** | 2× 25/100 GbE LACP | 2× 25/100 GbE (vSAN) | vSAN traffic | 1× 1 GbE (iLO) |
| **Hypervisor FC SAN** | 2-4× 25/100 GbE LACP | 2× 32/64 Gb FC multipath | 2× 25 GbE (vMotion) | 1× 1 GbE (iLO) |
| **Hypervisor Ceph** | 2× 25/100 GbE LACP | 2× 25/100 GbE (Ceph) | 2× 25 GbE (migration) | 1× 1 GbE (iLO) |
| **Kubernetes** | 2× 25/100 GbE | 2× 25/100 GbE (Ceph/Longhorn) | 2× 25/100 GbE (K8s cluster) | 1× 1 GbE (BMC) |
| **Web/API** | 2× 10/25 GbE LACP | n/a | n/a | 1× 1 GbE (BMC) |
| **Oracle Standalone** | 2× 25 GbE LACP | 2× FC 32G or local NVMe | Data Guard 2× 25 GbE | 1× 1 GbE (iLO) + ASM mgmt |
| **Oracle Data Guard** | 2× 25/100 GbE LACP | 2× FC 64G multipath | 2× 25 GbE (DG sync) | 1× 1 GbE (iLO) + SAN mgmt |
| **Oracle RAC** | 2× 100 GbE LACP (VIP/SCAN) | 2× FC 64G multipath | 2× 100 GbE RoCE (Cache Fusion) | 1× 1 GbE (iLO) + Clusterware |
| **Oracle Exadata** | 4-8× 100 GbE RoCE | NVMe over Fabric | RDMA interconnect | Exadata CLI + OEDA |

## Sources

Links, books, and standards: [sources/infrastructure/sources.md](sources/infrastructure/sources.md)

*Last revised: 2026-06-03*

353
SERVER-HW.md
Normal file
# 🔧 Server hardware: components and architecture

## Form factors

| Type | Description | Advantages | Disadvantages |
|-----|-------|--------|----------|
| **Rack (1U/2U/4U)** | Standard rack mount, 19" wide | Wide range of configurations, easy replacement | Limited number of PCIe slots in 1U |
| **Blade** | Modular server in a chassis (HPE Synergy, Dell MX) | High density, shared power/cooling | Vendor lock-in, higher chassis cost |
| **Tower** | Free-standing case | Quiet, expandable | Takes up space, not rack-optimized |
| **Edge / Micro** | Small, low power, industrial design | Environmental resilience, low power draw | Limited performance, fewer PCIe slots |

## Processors (CPU)

### Intel Xeon vs AMD EPYC

| Feature | Intel Xeon (6th gen, Granite Rapids) | AMD EPYC (5th gen, Turin) |
|-----------|-----------------------------------|------------------------|
| **Max cores** | 128 (P-cores) | 192 (Zen 5c) / 128 (Zen 5) |
| **PCIe lanes** | 80-96 per socket | 128 per socket |
| **Memory channels** | 8 (6700P) / 12 (6900P), DDR5 | 12 (DDR5) |
| **Max memory** | 4 TB | 6 TB+ |
| **L3 cache** | ~200 MB | ~384 MB |
| **AVX-512** | Yes (full width) | Yes (256-bit) |
| **AMX (matrix)** | Yes (Intel AMX) | No |
| **TDP** | 350-500 W | 360-500 W |
| **Infrastructure** | Intel QuickAssist, DSA, IAA | AMD Infinity Architecture |
| **Use case** | AI inference, networking, HPC | Virtualization, databases, general purpose |

### CPU selection guide

| Workload | Recommended CPU | Rationale |
|----------|---------------|------------|
| **Database (OLTP)** | EPYC (high core count, more memory channels) | More PCIe lanes for NVMe, higher memory bandwidth |
| **Database (OLAP/DW)** | Xeon (AVX-512, AMX) | Vector instructions for analytical queries |
| **Virtualization** | EPYC (more cores, lower TCO) | Higher core density, lower price per core |
| **HPC / AI training** | Xeon + GPU (AMX for preprocessing) | AMX for data preprocessing, GPU for training |
| **Web / API servers** | EPYC (good perf/core, low-TDP variants) | Good performance per watt |
| **Storage** | EPYC (128 PCIe lanes for NVMe) | Maximum number of NVMe drives |

## Memory (RAM)

### DIMM types

| Type | Description | Use case | Server support |
|-----|-------|----------|---------------|
| **RDIMM** (Registered) | Registered, buffers the address lines (1 register) | Standard server memory | All servers |
| **LRDIMM** (Load-Reduced) | Reduced electrical load (2 registers: data + address) | High-capacity configurations (more DIMMs per channel) | Enterprise, 4R+ |
| **NVDIMM** (Non-Volatile) | Battery-backed DRAM + flash | Write cache, metadata, persistence | Legacy (Intel Optane PMem) |
| **3D XPoint / Optane** | PCM-based persistent memory (discontinued by Intel) | Legacy | Intel-only, discontinued |

### DDR5 vs DDR4: key differences

| Feature | DDR4 | DDR5 |
|-----------|------|------|
| **Channel architecture** | 1× 64-bit channel per DIMM | 2× 32-bit sub-channels per DIMM |
| **Bank groups** | 4 (single rank) | 8 (single rank) |
| **Burst length** | 8 (BL8) | 16 (BL16) |
| **On-die ECC** | No | Yes (corrects bit errors inside the DRAM) |
| **PMIC** | On the motherboard | On the DIMM (power management IC) |
| **VDD** | 1.2 V | 1.1 V |
| **RCD** | 1× RCD per DIMM | 2× RCD (one per sub-channel) |
| **Max DIMM capacity** | 64 GB (LRDIMM) | 256 GB (RDIMM 3DS) |
| **Max speed** | 3200 MT/s | 6400 MT/s (currently 4800-5600) |

### Memory rank: detail

A rank is the set of DRAM chips on a DIMM that are accessed simultaneously (64 data bits + 8 ECC bits).

| Rank | DRAM chips (x8) | DIMM capacity (typ.) | Description |
|------|-----------------|----------------------|-------------|
| **Single Rank (1R)** | 8-9 | 8-32 GB | All DRAM chips form a single rank |
| **Dual Rank (2R)** | 16-18 | 16-128 GB | Two ranks, rank interleaving |
| **Quad Rank (4R)** | 32-36 | 64-256 GB (3DS) | Four ranks, higher capacity |
| **Octa Rank (8R)** | 64-72 | 256 GB (3DS) | Highest capacity, enterprise |

**Rank interleaving**: A dual-rank DIMM can address its two ranks alternately, which raises effective bandwidth (by up to 5-15 % over single-rank at the same clock).

**DDR5 rank vs DDR4**: A DDR5 single-rank DIMM already has 8 bank groups (equivalent to dual-rank DDR4), so adding ranks brings a smaller gain on DDR5 than on DDR4.

**Rule**: Always prefer dual-rank DIMMs over single-rank for higher density and bandwidth. Use quad-rank and octa-rank only as LRDIMM or 3DS.

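The chip counts in the rank table follow from the 64 + 8 bit rank width. A small sketch, assuming x4 or x8 devices only; `chips_per_dimm` is a hypothetical helper, not a vendor formula:

```python
def chips_per_dimm(ranks: int, device_width: int = 8, ecc: bool = True) -> int:
    """DRAM chips on a DIMM: each rank needs 64 data bits plus 8 ECC bits."""
    if device_width not in (4, 8):
        raise ValueError("this sketch only covers x4 and x8 devices")
    data_chips = 64 // device_width
    ecc_chips = 8 // device_width if ecc else 0
    return ranks * (data_chips + ecc_chips)


for ranks in (1, 2, 4, 8):
    # Lower bound without ECC chips, upper bound with them.
    print(ranks, chips_per_dimm(ranks, ecc=False), chips_per_dimm(ranks))
```
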
### DIMM population: basic rules

#### 1DPC vs 2DPC (DIMMs Per Channel)

| Configuration | DIMMs per channel | Max DDR5 speed | Bandwidth | Capacity |
|---------------|-------------------|----------------|-----------|----------|
| **1DPC** | 1 | 4800-5600 MT/s | 100 % | Lower |
| **2DPC** | 2 | 4000-4400 MT/s | ~80 % | Higher |

**Important**: Populating 2 DIMMs per channel lowers the memory speed. Example for the Dell R760:
- 1DPC: 5600 MT/s (with 5th Gen Xeon)
- 2DPC: 4400 MT/s (always)

#### Channel architecture (Intel Xeon 4th/5th Gen, 8 channels per CPU)

```
CPU 1 ─ Channel A  [Slot A1 (white)]  [Slot A9  (black)]    1DPC: populate white slots
      ─ Channel B  [Slot A7 (white)]  [Slot A15 (black)]    2DPC: populate white + black
      ─ Channel C  [Slot A3 (white)]  [Slot A11 (black)]
      ─ Channel D  [Slot A5 (white)]  [Slot A13 (black)]
      ─ Channel E  [Slot A4 (white)]  [Slot A12 (black)]
      ─ Channel F  [Slot A6 (white)]  [Slot A14 (black)]
      ─ Channel G  [Slot A2 (white)]  [Slot A10 (black)]
      ─ Channel H  [Slot A8 (white)]  [Slot A16 (black)]
```

#### Channel architecture (AMD EPYC, 12 channels per CPU)

```
CPU 1 ─ Channel 0-11 (12× single channel, 2 DPC)
        Slot A0 (P0) / Slot A1 (P1), depending on the specific server
```

AMD EPYC has 12 memory channels (vs 8 on Intel), which gives 50 % higher theoretical memory bandwidth at the same transfer rate.

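The 50 % figure is simply the channel ratio. A minimal sketch of the peak-bandwidth arithmetic, assuming 8 data bytes per transfer per channel and the same 4800 MT/s on both platforms (`peak_bandwidth_gbs` is a hypothetical helper; real sustained bandwidth is lower):

```python
def peak_bandwidth_gbs(channels: int, mts: int, bytes_per_transfer: int = 8) -> float:
    """Theoretical peak in GB/s: channels x MT/s x bytes per transfer."""
    return channels * mts * bytes_per_transfer / 1000


intel = peak_bandwidth_gbs(8, 4800)    # 8 channels per CPU
amd = peak_bandwidth_gbs(12, 4800)     # 12 channels per CPU
print(round(intel, 1), round(amd, 1), round(amd / intel, 2))
```
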
### Vendor population rules

#### Dell PowerEdge (R660 / R760)

| DIMMs per CPU | 1DPC (white slots) | 2DPC (white + black) | Speed |
|---------------|--------------------|----------------------|-------|
| **1 DIMM per CPU** | A1 (Channel A) | - | 5600 MT/s |
| **2 DIMMs per CPU** | A1, A7 | - | 5600 MT/s |
| **4 DIMMs per CPU** | A1, A7, A3, A5 | - | 5600 MT/s |
| **8 DIMMs per CPU** | A1-A8 (all white) | - | 5600 MT/s |
| **16 DIMMs per CPU** | A1-A8 (white) | A9-A16 (black) | 4400 MT/s |

**Key rules per Dell**:
1. All DIMMs must be DDR5 (do not mix generations)
2. Do not mix DIMM capacities (all identical)
3. Do not mix x4 and x8 DRAM chips
4. Do not mix 3DS and non-3DS RDIMMs
5. If DIMM speeds are mixed, all DIMMs run at the lowest speed
6. Balance capacity across processors
7. Optimal configuration: 16× identical DIMMs per dual-socket server (8 per CPU, 1DPC on every channel)
8. Fault Resilient Memory (FRM): only 8 or 16 DIMMs per processor

#### HPE ProLiant (DL360 / DL380 Gen11)

**Population order** (16 slots per CPU, Intel):

| DIMMs | Population order |
|-------|------------------|
| 1 | 10 |
| 2 | 1, 3 |
| 4 | 1, 3, 7, 10 |
| 6 | 3, 5, 7, 10, 14, 16 |
| 8 | 1, 3, 5, 7, 10, 12, 14, 16 |
| 12 | 1, 2, 3, 5, 6, 7, 10, 11, 12, 14, 15, 16 |
| 16 | 1-16 |

**HPE SmartMemory rules**:
1. Best-qualified configuration: 1DPC (white slots)
2. 2DPC (black slots) only after all white slots are populated
3. HBM + 4th Gen Intel: Hemi (hemisphere) mode and SGX are not supported
4. Heterogeneous mix: put the higher rank count into the white slots
5. **Do not mix**: 3DS with non-3DS, x4 with x8, different ranks within a channel, 16 Gb / 24 Gb / 32 Gb DRAM

#### HPE Gen11/Gen12 with AMD EPYC 9005 (a50012817enw)

AMD EPYC 9005 (Turin) brings 12 memory channels per CPU and DDR5-6400 support.

| Property | Detail |
|----------|--------|
| **Memory channels** | 12 per CPU (vs 8 on Intel) |
| **Max DIMM slots** | 24 per CPU (2 DPC) |
| **Max speed** | DDR5-6400 (1 DPC), DDR5-4800 to 5600 (2 DPC) |
| **Max capacity** | 6 TB per CPU (24× 256 GB 3DS RDIMM, 2 DPC); 3 TB at 1 DPC (12× 256 GB) |
| **DIMM types** | RDIMM (1R/2R/4R/8R), 3DS RDIMM, LRDIMM |
| **Population** | 1 DPC (white slots): 12 DIMMs, full speed; 2 DPC: 24 DIMMs, reduced speed |
| **Optimum** | 12× identical DIMMs (1 DPC on every channel) = max bandwidth |

**Rules for AMD EPYC 9005:**
1. Populate identical capacities within a channel
2. 1 DPC = full speed of 6400 MT/s, 2 DPC = lower speed
3. For optimal bandwidth: 12 DIMMs (1DPC) per CPU, so that all 12 channels are used
4. Maximum capacity: 24 DIMMs (2DPC), 24× 256 GB = 6 TB per CPU
5. Do not mix RDIMM and LRDIMM in the same system

### Memory population: decision flow

```
How many DIMMs per CPU?
│
├── 1 DIMM   → Channel A (slot 1), you lose 87.5 % of bandwidth
│
├── 2 DIMMs  → Channels A+B, still a 75 % bandwidth loss
│
├── 4 DIMMs  → Channels A,B,C,D, better but not optimal
│
├── 8 DIMMs  → 1DPC on all channels = MAX SPEED (5600 MT/s)
│              ✅ Recommended for performance
│
├── 12 DIMMs → 4 channels at 1DPC + 4 channels at 2DPC = mixed, runs at 4400 MT/s
│
├── 16 DIMMs → 2DPC on all channels = MAX CAPACITY (4400 MT/s)
│              ✅ For capacity-heavy workloads
│
└── More than 16 → Only with LRDIMM / 3DS, speed penalty

Conclusion: 8 DIMMs per CPU (1DPC)  = highest performance
            16 DIMMs per CPU (2DPC) = highest capacity
```

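The flow above can be sketched as a small function. This assumes the 8-channel Intel layout and the 5600 / 4400 MT/s figures quoted in the flow; `evaluate` and `Population` are hypothetical names, and the loss figure only accounts for idle channels, not for the lower 2DPC speed:

```python
from typing import NamedTuple

CHANNELS = 8        # memory channels per CPU (Intel Xeon 4th/5th Gen)
SPEED_1DPC = 5600   # MT/s, value taken from the flow
SPEED_2DPC = 4400   # MT/s, applies as soon as any channel holds 2 DIMMs


class Population(NamedTuple):
    channels_used: int
    speed_mts: int
    idle_channel_loss_pct: float


def evaluate(dimms: int) -> Population:
    """Summarise a per-CPU DIMM count the way the decision flow does."""
    if not 1 <= dimms <= 2 * CHANNELS:
        raise ValueError("expected 1 to 16 DIMMs per CPU")
    channels_used = min(dimms, CHANNELS)
    speed = SPEED_1DPC if dimms <= CHANNELS else SPEED_2DPC
    loss = 100 * (1 - channels_used / CHANNELS)
    return Population(channels_used, speed, loss)


for count in (1, 2, 4, 8, 12, 16):
    print(count, evaluate(count))
```
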
### Impact of configuration on performance

| Configuration | Relative bandwidth | Latency | Use case |
|---------------|--------------------|---------|----------|
| **1DPC, 8 ch, 5600 MT/s** (8 DIMM) | 100 % | Lowest | OLTP databases, HPC, real-time |
| **2DPC, 8 ch, 4400 MT/s** (16 DIMM) | ~78 % | +10-15 % | Virtualization, VDI, in-memory DB |
| **Mixed 1+2DPC** (12 DIMM) | ~85 % | Medium | Capacity/performance compromise |
| **Unbalanced channels** | 50-70 % | High | **Avoid** |

**Vendor recommendations:**
- **Dell**: 16× identical DIMMs (8 per CPU), 1DPC, 5600 MT/s = optimal performance
- **HPE Intel**: Always fill the white slots first; 1DPC for max performance, 2DPC for max capacity
- **HPE AMD EPYC 9005**: 12 channels per CPU; 1DPC = 12 DIMMs per CPU at 6400 MT/s (max bandwidth); 2DPC = 24 DIMMs per CPU (max capacity 6 TB)
- **Supermicro**: Follow the manual for the specific model (DSG, GPU, storage)
- **Lenovo**: Same rules as the underlying Intel/AMD platform; prefer 1DPC

### Memory sizing per workload

| Workload | RAM/core ratio | Typical pool | Recommended configuration |
|----------|----------------|--------------|---------------------------|
| Database (OLTP) | 8-16 GB/core, DB in RAM | 256 GB - 2 TB | 8× 32-64 GB RDIMM, 1DPC |
| Database (OLAP) | 16-64 GB/core, columnstore | 512 GB - 4 TB+ | 16× 64-128 GB RDIMM, 2DPC |
| Virtualization (VM) | 4-8 GB/core, depending on VM density | 256 GB - 2 TB | 8-16× 32-64 GB RDIMM |
| Kubernetes (general) | 2-4 GB/core | 64-256 GB | 8× 16-32 GB RDIMM, 1DPC |
| AI training (CPU preprocessing) | 2-4 GB/core | 128-512 GB | 8× 32-64 GB RDIMM, 1DPC |
| HPC | 1-2 GB/core | 64-128 GB | 8× 16 GB RDIMM, 1DPC, high-speed |
| In-memory DB (SAP HANA) | 8-32 GB/core | 1-6 TB+ | 16× 128-256 GB LRDIMM/3DS |

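One way to turn a RAM/core ratio into a DIMM choice is to pick the smallest DIMM size that reaches the target with a fixed DIMM count. A rough sketch, assuming the DIMM sizes that appear in the table and 8 DIMMs per CPU (1DPC on an 8-channel platform); `pick_dimm_gb` is a hypothetical helper, not a vendor sizing tool:

```python
DIMM_SIZES_GB = (16, 32, 64, 128, 256)   # sizes mentioned in the table above


def pick_dimm_gb(cores: int, gb_per_core: int, dimms: int = 8):
    """Smallest DIMM size (GB) that meets cores x GB/core with `dimms` modules.

    Returns None when even the largest size is not enough, which suggests
    moving to 2DPC or to more sockets.
    """
    target_gb = cores * gb_per_core
    for size in DIMM_SIZES_GB:
        if size * dimms >= target_gb:
            return size
    return None


print(pick_dimm_gb(32, 8))     # OLTP-style ratio on a 32-core CPU
print(pick_dimm_gb(64, 16))    # heavier ratio on a 64-core CPU
print(pick_dimm_gb(128, 32))   # does not fit into 8 DIMMs
```
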
## PCIe

| Generation | Year (spec) | Throughput per lane | x16 throughput | x24 (GPU) |
|------------|-------------|---------------------|----------------|-----------|
| **PCIe 3.0** | 2010 | 985 MB/s | 15.8 GB/s | 23.6 GB/s |
| **PCIe 4.0** | 2017 | 1.97 GB/s | 31.5 GB/s | 47.3 GB/s |
| **PCIe 5.0** | 2019 | 3.94 GB/s | 63 GB/s | 94.5 GB/s |
| **PCIe 6.0** | 2022 | 7.88 GB/s | 126 GB/s | 189 GB/s |

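The table values can be reproduced from the raw transfer rate and the 128b/130b line encoding. A minimal sketch; the transfer rates are the standard 8/16/32/64 GT/s, while applying the 128/130 factor to PCIe 6.0 is an assumption made here only to match the table (6.0 actually uses PAM4 signalling with FLIT-based encoding):

```python
GT_PER_S = {"3.0": 8, "4.0": 16, "5.0": 32, "6.0": 64}
ENCODING = 128 / 130   # 128b/130b payload fraction


def lane_gbs(gen: str) -> float:
    """Usable GB/s per lane, per direction."""
    return GT_PER_S[gen] * ENCODING / 8


def link_gbs(gen: str, lanes: int) -> float:
    """Usable GB/s for a link of the given width, per direction."""
    return lane_gbs(gen) * lanes


for gen in GT_PER_S:
    print(gen, round(lane_gbs(gen), 3), round(link_gbs(gen, 16), 1))
```
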
**PCIe lane allocation**:
- GPU (x16): NVIDIA H100, AMD MI300X
- NVMe U.2 (x4): each NVMe drive
- NIC 100 GbE (x16): dual-port 100 GbE
- RAID/HBA (x8): storage controller

**CPU PCIe lane count**:
- Intel Xeon Scalable (4th Gen): 64-80 lanes per socket
- AMD EPYC (4th Gen, Genoa): 128 lanes per socket
- Dual-socket: up to 160 lanes on Intel (2× 80); 128-160 usable lanes on AMD EPYC, because part of each socket's lanes carries the inter-socket links

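A lane budget is a simple sum over the device list. A sketch using the per-device widths from the allocation list above; the device mix in the example is hypothetical:

```python
LANES_PER_DEVICE = {"gpu": 16, "nvme": 4, "nic_100gbe": 16, "hba": 8}


def lanes_needed(devices: dict) -> int:
    """Total PCIe lanes for a {device_type: count} mapping."""
    return sum(LANES_PER_DEVICE[kind] * count for kind, count in devices.items())


# Hypothetical GPU server: 4 GPUs, 8 NVMe drives, 2 NICs, 1 HBA.
build = {"gpu": 4, "nvme": 8, "nic_100gbe": 2, "hba": 1}
needed = lanes_needed(build)
print(needed, "lanes needed")
print("fits in one 80-lane socket:", needed <= 80)
print("fits in one 128-lane socket:", needed <= 128)
```
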
## NUMA

### Topology

```
Socket 0 (NUMA node 0)                 Socket 1 (NUMA node 1)
├── Cores 0-31                         ├── Cores 32-63
├── Memory 0-256 GB                    ├── Memory 256-512 GB
├── PCIe root complex (GPU, NVMe)      ├── PCIe root complex (NIC, NVMe)
└── I/O hub                            └── I/O hub
      │                                      │
      └───────── Infinity Fabric / UPI ──────┘
```

- **Local access**: CPU → its own memory (low latency, full bandwidth)
- **Remote access**: CPU → the other socket's memory (higher latency, roughly 1.8-1.9× per the cross-NUMA table below, lower bandwidth)
- NUMA-aware applications: databases, VMs, DPDK, AI training

### Cross-NUMA penalty

| CPU | Local latency | Remote latency | Penalty |
|-----|---------------|----------------|---------|
| AMD EPYC (Genoa) | ~80 ns | ~150 ns | ~1.9× |
| Intel Xeon (Sapphire Rapids) | ~90 ns | ~160 ns | ~1.8× |

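To see what the penalty means for a workload, the average latency can be weighted by the share of remote accesses. A simple linear model, assuming the approximate latencies from the table and ignoring caches and bandwidth contention; `effective_latency_ns` is a hypothetical helper:

```python
def effective_latency_ns(local_ns: float, remote_ns: float, remote_fraction: float) -> float:
    """Average memory latency when `remote_fraction` of accesses cross sockets."""
    if not 0.0 <= remote_fraction <= 1.0:
        raise ValueError("remote_fraction must be between 0 and 1")
    return (1 - remote_fraction) * local_ns + remote_fraction * remote_ns


# Approximate figures from the table (AMD EPYC Genoa).
local_ns, remote_ns = 80, 150
print(round(remote_ns / local_ns, 2))                            # penalty factor
print(round(effective_latency_ns(local_ns, remote_ns, 0.5), 1))  # half of accesses remote
```
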
## TDP and cooling

| CPU | TDP | Core count | Cooling |
|-----|-----|------------|---------|
| Intel Xeon Platinum 8480+ | 350 W | 56 | Air (high-performance) |
| Intel Xeon 6980P (Granite Rapids) | 500 W | 128 | Liquid recommended |
| AMD EPYC 9654 (Genoa) | 360 W | 96 | Air / Liquid |
| AMD EPYC 9965 (Turin) | 500 W | 192 | Liquid recommended |

### Cooling requirements per rack density

| Rack density | kW/rack | Cooling |
|--------------|---------|---------|
| Low | 1-5 kW | Free air cooling |
| Medium | 5-15 kW | CRAC/CRAH, hot/cold aisle |
| High | 15-40 kW | In-row cooling, rear-door HX |
| Ultra | 40-100+ kW | Direct-to-chip liquid, immersion |

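The density table can be read as a lookup on rack power. A small sketch, assuming each range's upper bound is exclusive (the table does not say which tier a boundary value belongs to); `cooling_for` is a hypothetical helper:

```python
# (exclusive upper bound in kW per rack, cooling approach from the table)
TIERS = [
    (5, "Free air cooling"),
    (15, "CRAC/CRAH, hot/cold aisle"),
    (40, "In-row cooling, rear-door HX"),
]


def cooling_for(kw_per_rack: float) -> str:
    """Map rack power density to the cooling approach listed in the table."""
    for upper_kw, cooling in TIERS:
        if kw_per_rack < upper_kw:
            return cooling
    return "Direct-to-chip liquid, immersion"


for kw in (3, 12, 30, 80):
    print(kw, "kW ->", cooling_for(kw))
```
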
## BMC and management

| Vendor | BMC | API | Remote console | Features |
|--------|-----|-----|----------------|----------|
| **Dell** | iDRAC (9/10) | Redfish, RACADM | Virtual Console (HTML5) | Lifecycle Controller, SUU |
| **HPE** | iLO (5/6) | Redfish, iLOREST | Integrated Remote Console | Smart Update Manager (SUM) |
| **Supermicro** | BMC / IPMI | IPMI, Redfish | IPMIView, HTML5 KVM | SuperDoctor, SSM |
| **Lenovo** | XClarity Controller | Redfish, IPMI | Remote Console | XClarity Administrator |
| **Cisco** | CIMC / UCSM | Redfish, XML API | KVM Console | UCS Manager, Intersight |

### Standard functions
- Power: on/off/cycle/reset
- Boot: one-shot PXE, CD-ROM redirect, BIOS setup
- Monitoring: sensors (temp, voltage, fan, PSU)
- Alerting: SNMP traps, email, Redfish events
- Remote media: ISO mount over the network
- Serial over LAN (SOL)

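Since every vendor above exposes Redfish, the power functions can be scripted the same way across BMCs. A minimal sketch that only builds the standard `ComputerSystem.Reset` request without sending it; the host name, system ID, and token are hypothetical placeholders, and the system ID differs between vendors:

```python
import json
import urllib.request


def build_reset_request(bmc_host: str, system_id: str,
                        reset_type: str = "ForceRestart",
                        token: str = "") -> urllib.request.Request:
    """Build (but do not send) a Redfish ComputerSystem.Reset POST request."""
    url = (f"https://{bmc_host}/redfish/v1/Systems/{system_id}"
           "/Actions/ComputerSystem.Reset")
    body = json.dumps({"ResetType": reset_type}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if token:
        headers["X-Auth-Token"] = token   # Redfish session token header
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


# Hypothetical BMC address and system ID, for illustration only.
req = build_reset_request("bmc.example.net", "1", reset_type="GracefulRestart")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Sending the request would be a call to `urllib.request.urlopen(req)` with appropriate TLS and authentication settings for the BMC in question.
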
## Vendors and product lines

| Vendor | Rack series | Blade series | Management |
|--------|-------------|--------------|------------|
| **Dell** | PowerEdge R6xx/R7xx (R660, R760) | MX7000, FX2 | iDRAC, OpenManage Enterprise |
| **HPE** | ProLiant DL (DL360, DL380) | Synergy, BladeSystem | iLO, OneView, OpsRamp |
| **Cisco** | UCS C-Series (C240, C245) | UCS B-Series, Fabric Interconnect | UCS Manager, Intersight |
| **Lenovo** | ThinkSystem SR (SR630, SR650) | ThinkSystem SN | XClarity |
| **Supermicro** | SuperServer (for GPU, storage, cloud) | FatTwin, MicroBlade | IPMI, SuperDoctor |

## Server connectivity

Detailed chapter on network and storage connectivity: [CONNECTIVITY.md](CONNECTIVITY.md)

## Storage controllers

| Controller | Type | RAID | Cache | Protocol |
|------------|------|------|-------|----------|
| **Dell PERC** (H755, H965) | HW RAID | 0/1/5/6/10/50/60 | 4-8 GB NV | NVMe, SAS, SATA |
| **Broadcom / LSI** (9560, 9670) | HW RAID / HBA | 0/1/5/6/10/50/60 | 4 GB NV | NVMe, SAS, SATA |
| **Intel VROC** | SW RAID (CPU) | 0/1/5/10 | - | NVMe only |
| **M.2 HW RAID** (Dell BOSS-S1 / BOSS-N1) | HW RAID | 0/1 | - | 2× M.2 SATA (S1) or NVMe (N1) |

### IT vs HW RAID mode

| Property | IT (Initiator Target) / HBA | HW RAID |
|----------|-----------------------------|---------|
| **OS sees** | Each disk individually | RAID virtual disk |
| **Caching** | OS cache | RAID controller cache (BBU) |
| **RAID** | Software (mdadm, ZFS, Ceph) | Hardware + SW driver |
| **Passthrough** | Yes | No |
| **Use case** | SDS (Ceph, MinIO), ZFS | VMware VMFS, Windows, legacy |
| **Battery/Backup** | Not needed | Write-back cache requires a BBU |

## Sources

Links, books, and standards: [sources/infrastructure/sources.md](sources/infrastructure/sources.md)

*Last revision: 2026-06-03*