knowledge-base/SERVER-CONFIG.en.md
Stanislav Hubacek 3fa11ef0f6 comiiit
2026-06-11 15:27:28 +02:00


⚙️ Server configuration — best practices by workload

General BIOS/UEFI settings

Setting Recommendation Rationale
Boot mode UEFI Secure Boot, GPT, larger disks
Power profile Performance / OS Control Max performance, C-States disabled
Hyper-Threading Enabled Up to ~30 % more throughput on multi-threaded workloads (workload-dependent)
Virtualization Enabled (VT-x/AMD-V) Required for hypervisor, containers
SR-IOV Enabled GPU, NIC passthrough
NUMA Enabled NUMA-aware scheduling
ACPI Enabled Power management, OS-level
Secure Boot Enabled Secure boot chain
TPM Enabled Measured boot, key storage

1. Database servers

CPU Selection

DB type CPU preference Rationale
OLTP (PostgreSQL, MySQL) High clock, moderate cores Low latency per transaction, limited parallelism
OLAP (ClickHouse, Snowflake) Many cores, AVX-512 Columnstore, high parallelism
In-memory (Redis, Memcached) High clock, low cache latency Single-threaded (Redis), RAM bandwidth
Document (MongoDB) Balance (clock × cores) Mixed workload
Distributed (Cassandra, Scylla) Many cores, high cache Shard-per-core (Scylla), compaction
Oracle OLTP High clock, moderate cores, core-factor aware CPU license cost (core factor 0.5 for AMD EPYC and Intel Xeon)
Oracle OLAP / DW Many cores, large SGA, in-memory option Parallel query, Exadata Smart Scan, compression

Oracle CPU licensing — core factor

Oracle licenses per core with a correction factor depending on the processor. Factor 0.5 means 2 cores = 1 Oracle license.

Processor Core factor 64 physical cores → Oracle licenses
AMD EPYC (all series) 0.5 32
Intel Xeon (Scalable) 0.5 32
IBM POWER 1.0 64
ARM (Ampere Altra) 0.25 16

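Impact on CPU selection: License cost scales with the physical core count, and EPYC and Xeon share the same 0.5 factor, so every extra core costs extra licenses. For Oracle workloads, prefer the SKU with the highest per-core performance (for example a high-frequency part such as the EPYC 9374F) and size the core count to the workload rather than maximizing it.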
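The license arithmetic from the table can be sketched in a few lines of Python. The function name and the dictionary are illustrative, the factors are copied from the table above, and rounding fractions up is how Oracle's licensing rules are usually read. Check the current Oracle Processor Core Factor Table before relying on any value.

```python
import math

# Core factors as listed in the table above (illustrative subset).
CORE_FACTORS = {
    "AMD EPYC": 0.5,
    "Intel Xeon Scalable": 0.5,
    "IBM POWER": 1.0,
}


def oracle_licenses(physical_cores: int, processor: str) -> int:
    """Processor licenses = physical cores x core factor, rounded up."""
    return math.ceil(physical_cores * CORE_FACTORS[processor])


if __name__ == "__main__":
    for cpu in CORE_FACTORS:
        print(f"{cpu}: 64 cores -> {oracle_licenses(64, cpu)} licenses")
```

Because the result depends only on the core count and the factor, two CPUs with the same factor differ in license cost only through how many cores they need to do the same work.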

Configuration by company size and storage type

Variant A: Small company — local NVMe RAID

Component Recommendation Note
CPU 1× EPYC 9124/9224 or Intel Xeon 4410Y (8-16C) 1 socket, high clock
RAM 64-256 GB (8-16 GB/core) DDR5-4800, 1DPC
OS disk 2× SATA/SAS SSD, RAID 1 (240-480 GB) For OS + binaries
Data disk 4-6× NVMe (U.2/E3.S), RAID 10 Local data, no sharing
WAL disk 2× NVMe RAID 1 (400-800 GB) PostgreSQL only
Network 2× 25 GbE (LACP) Application traffic + management
Form factor 1U or 2U Single node, no cluster
Storage backend Local RAID controller (PERC/Broadcom) HW RAID 10 or SW RAID (mdadm)
HA Application manages failover (patroni, repmgr, orchestrator) Standby node on failure

Use case: Startup, branch office, dev/test, < 500 users, single database server, low availability requirements.

Variant B: Medium company — local NVMe + asynchronous replication

Component Recommendation Note
CPU 1-2× EPYC 9334/9374F or Intel Xeon 5418Y (16-24C) 1-2 socket, balanced
RAM 128-512 GB (8-16 GB/core) DDR5-4800/5600, 1DPC
OS disk 2× NVMe RAID 1 (2× 480 GB) OS + binaries
Data disk 6-8× NVMe, RAID 10 Local NVMe, 3-6 TB usable
WAL disk 2× NVMe RAID 1 (2× 800 GB) Separate from data
Network 2× 25 GbE (app) + 2× 25 GbE (replication) Application and replication networks separated
Form factor 2U Primary + replica node
Storage backend SW RAID (mdadm) or HW RAID (PERC H965) Write-back cache with BBU
HA Patroni / repmgr / MySQL InnoDB Cluster Asynchronous replication to 1-2 standby

Use case: E-commerce, medium SaaS, 500-5000 users, RPO < 1 min, RTO < 5 min.

Variant C: Large company — FC SAN (enterprise)

Component Recommendation Note
CPU 2× EPYC 9654/9965 or Xeon 8592+/6980P (48-128C) 2 socket, max cores, large cache
RAM 512 GB - 2 TB (8-16 GB/core) DDR5, 2DPC (speed penalty), 12 channels (EPYC)
OS disk 2× SATA SSD RAID 1 (2× 480 GB) OS only, data on SAN
Data + WAL LUNs from FC SAN Hitachi VSP / Dell PowerMax / Pure //X
HBA 2× dual-port FC HBA (32/64 Gb) Multipath (active-active), FC-NVMe
Network 2× 25/100 GbE (app) + 2× 32/64 Gb FC (storage) App and storage networks separated
Form factor 2U 2-8 node cluster (RAC, AlwaysOn AG)
Storage backend FC SAN — LUN per database Thin provisioning, RAID on SAN, snapshots
HA Oracle RAC / SQL Server AOAG / PostgreSQL Patroni Synchronous replication, FC multipath

SAN advantages: Centralized management, snapshots, cloning, disaster recovery (SRDF/Metro), separate storage network, higher availability. Disadvantages: Higher latency compared to local NVMe (~50-200 µs over SAN vs ~10 µs local NVMe), higher CAPEX, vendor lock-in.

Variant D: Large company — Ceph / SDS backend

Component Recommendation Note
CPU 2× EPYC 9334/9654 (16-32C) Fewer cores than SAN variant — part of CPU goes to Ceph client
RAM 256-512 GB Less RAM — Ceph client cache is not as effective as local buffer
OS disk 2× SATA SSD RAID 1 (2× 480 GB) OS
Network 2× 25/100 GbE (app) + 2× 25/100 GbE (Ceph public) App and Ceph traffic over Ethernet
HBA Storage HBA in IT/HBA mode (no RAID) For Ceph OSD node, not DB node
Form factor 2U DB node + separate Ceph OSD node
Storage backend RBD (RADOS Block Device) over Ceph 3× replication or erasure coding
HA Application + Ceph inherent HA Ceph self-healing, auto-rebalance

Ceph advantages: No vendor lock-in, horizontal scaling, unified platform for block/file/object, lower CAPEX. Disadvantages: Higher latency and CPU overhead (Ceph client → network → OSD), variable performance, more complex troubleshooting.

Variant E: Cloud — RDS / CloudSQL / Azure SQL

Component Recommendation Note
Compute AWS RDS (db.r7g/r8g), Azure SQL (GP/BC/Hyperscale) Managed service, no OS access
Storage EBS gp3 / io2, Azure Premium SSD v2, Cloud SQL SSD Automatic scaling, PITR, multi-AZ
Network Security Group, Private Link, VPC peering No HBA, no SAN — everything over Ethernet
HA Multi-AZ (synchronous), read replicas Managed failover, RTO < 60 s
Backup Automated, PITR (7-35 days) No management required

Use case: No on-prem hardware, elastic scaling, pay-per-use, lower operational overhead. Disadvantages: Higher long-term costs, data residency, network latency, limited customization.

Variant comparison

Aspect Local NVMe (small) Local NVMe (medium) FC SAN Ceph Cloud
Latency ~10 µs ~10 µs ~50-200 µs ~100-500 µs ~100-1000 µs
Scaling Vertical Vertical Horizontal Horizontal Elastic
CAPEX Low Medium High Medium None (OPEX)
Operational overhead Low Low High (SAN admin) Medium None
HA Application Patroni/Cluster RAC/AOAG Ceph HA Managed
RPO 1-5 min < 1 min < 10 s < 30 s < 60 s
RTO 5-15 min < 5 min < 2 min < 5 min < 60 s
Number of servers 1-2 2-4 4-16 6-20+ 0 (managed)
Company Startup/SME SME/Enterprise Enterprise Enterprise Any

PostgreSQL parameter matrix by storage type

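| Parameter | Local NVMe | FC SAN | Ceph RBD |
|---|---|---|---|
| random_page_cost | 1.1 | 1.5-2.0 | 2.0-3.0 |
| effective_io_concurrency | 300 | 100-200 | 50-100 |
| synchronous_commit | on (default); off only if losing the most recent commits after a crash is acceptable | on | on (default); off trades durability of the most recent commits for lower commit latency |
| full_page_writes | on | on | on (even over Ceph) |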
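As a minimal sketch, the storage-dependent planner settings from the matrix can be kept as data and rendered into a postgresql.conf fragment. The profile names and the function are hypothetical. Where the matrix gives a range, the lower bound is used here purely as a starting point; the right value has to come from benchmarking.

```python
# Values from the matrix above; for ranges the lower bound is an assumed starting point.
PG_STORAGE_PROFILES = {
    "local_nvme": {"random_page_cost": 1.1, "effective_io_concurrency": 300},
    "fc_san": {"random_page_cost": 1.5, "effective_io_concurrency": 100},
    "ceph_rbd": {"random_page_cost": 2.0, "effective_io_concurrency": 50},
}


def render_pg_conf(storage: str) -> str:
    """Return postgresql.conf lines for one storage backend."""
    params = dict(PG_STORAGE_PROFILES[storage])
    # The matrix keeps full_page_writes on for every backend.
    params["full_page_writes"] = "on"
    return "\n".join(f"{name} = {value}" for name, value in params.items())


if __name__ == "__main__":
    print(render_pg_conf("fc_san"))
```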

Storage layout by backend type

Local NVMe (small/medium):

Mount point    FS       RAID       Disk            Purpose
/               ext4    1 (mirror) 2× SATA SSD     OS
/data           xfs     10          4-8× NVMe       Data
/wal            xfs     1 (mirror) 2× NVMe          WAL (PG)

FC SAN (enterprise):

Mount point    FS       Device                   Purpose
/               ext4    local RAID 1 (2× SSD)     OS
/dev/sdb        xfs     FC LUN 1 (500 GB)         WAL (PG)
/dev/sdc        xfs     FC LUN 2 (2 TB)           Data
/dev/sdd        xfs     FC LUN 3 (2 TB)           Indexes (separate)

Ceph RBD:

Mount point    FS       Ceph device               Purpose
/               ext4    local RAID 1 (2× SSD)     OS
/dev/rbd0       xfs     rbd datastore-01          Data + WAL (Ceph RBD)

Kernel tuning by variant

Local NVMe:

vm.dirty_ratio = 30
vm.dirty_background_ratio = 5

FC SAN:

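# SAN storage: higher latency, less aggressive flush
vm.dirty_ratio = 20
vm.dirty_background_ratio = 3
# Dirty pages become eligible for writeback after 30 s (kernel default, stated explicitly).
# Keep comments on their own line: sysctl.d files do not support inline comments.
vm.dirty_expire_centisecs = 3000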

Ceph RBD:

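# Ceph RBD: network storage, smaller dirty-page backlog
vm.dirty_ratio = 15
vm.dirty_background_ratio = 2
# RBD cache settings are client-side options in ceph.conf ([client] section), not sysctl keys.
# They apply to librbd clients; kernel-mapped /dev/rbdX devices use the page cache instead.
# rbd cache = true
# rbd cache size = 256-512 MB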
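To see what these percentages mean in absolute terms, the sketch below converts them into thresholds for a given RAM size. It assumes the ratios apply to total RAM, which is a simplification: the kernel applies them to available (free plus reclaimable) memory, so the real thresholds are somewhat lower. The function name is illustrative.

```python
def dirty_limits_gib(ram_gib: float, dirty_ratio: int, background_ratio: int) -> tuple[float, float]:
    """Return (background threshold, hard threshold) in GiB.

    Background writeback starts at the first value; writers are throttled
    once dirty data reaches the second.
    """
    return (ram_gib * background_ratio / 100, ram_gib * dirty_ratio / 100)


if __name__ == "__main__":
    profiles = {"Local NVMe": (30, 5), "FC SAN": (20, 3), "Ceph RBD": (15, 2)}
    for name, (ratio, background) in profiles.items():
        start, limit = dirty_limits_gib(256, ratio, background)
        print(f"{name}: writeback from {start:.1f} GiB, throttle at {limit:.1f} GiB")
```

On hosts with a lot of RAM even small percentages allow tens of gigabytes of unwritten data, which is why the slower backends use lower ratios.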

Database-specific tuning

Parameter PostgreSQL MySQL Oracle MongoDB
Cache shared_buffers 25 % RAM innodb_buffer_pool_size 70-80 % RAM SGA_TARGET 60-80 % RAM WiredTiger cache 50-80 % RAM
OS cache effective_cache_size 75 % RAM OS cache + InnoDB OS cache (double buffering risk with large SGA) OS cache
Write buffer wal_buffers 64-256 MB innodb_log_file_size 1-4 GB Redo log (2-4 groups, 200 MB-4 GB) WiredTiger log
Connections max_connections 50-500 max_connections 100-500 processes 200-2000 maxIncomingConnections
I/O effective_io_concurrency 200 innodb_io_capacity 2000 db_file_multiblock_read_count 128 WiredTiger eviction
Huge pages huge_pages = try large-pages = ON use_large_pages = only (mandatory) transparent_hugepages=never
Parallel query max_parallel_workers 4-8 innodb_parallel_read_threads 4 parallel_degree_policy = auto — up to 64
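The percentage rules in the Cache and OS cache rows translate into concrete values as in the sketch below. It covers only PostgreSQL and MySQL, uses whole gigabytes, and defaults to the lower bound of the 70-80 % InnoDB range. The function names are hypothetical, and the results are starting points rather than tuned values.

```python
def pg_memory_settings(ram_gb: int) -> dict[str, str]:
    """shared_buffers at 25 % and effective_cache_size at 75 % of RAM."""
    return {
        "shared_buffers": f"{ram_gb * 25 // 100}GB",
        "effective_cache_size": f"{ram_gb * 75 // 100}GB",
    }


def innodb_buffer_pool_gb(ram_gb: int, percent: int = 70) -> int:
    """InnoDB buffer pool size in GB; the table suggests 70-80 % of RAM."""
    return ram_gb * percent // 100


if __name__ == "__main__":
    print(pg_memory_settings(256))
    print(innodb_buffer_pool_gb(256), "GB InnoDB buffer pool")
```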

Connectivity by variant

| Variant | App network | Storage network | Replication | Management |
|---|---|---|---|---|
| Local (small) | 2× 25 GbE LACP | | 2× 25 GbE (same) | iDRAC/iLO |
| Local (medium) | 2× 25 GbE LACP | | 2× 25 GbE dedicated | iDRAC/iLO |
| FC SAN | 2× 25/100 GbE | 2× 32/64 Gb FC (multipath) | FC replication | iDRAC/iLO + SAN mgmt |
| Ceph | 2× 25/100 GbE | 2× 25/100 GbE (public net) | 2× 25/100 GbE (cluster net) | iDRAC/iLO + Ceph mgmt |
| Cloud | Elastic IP / Private Link | | | AWS Console / API |
| Oracle Standalone | 2× 25 GbE LACP | ASM (2× 25 GbE or FC 32G) | Data Guard 2× 25 GbE | iLO + ASM mgmt |
| Oracle RAC | 2-4× 25/100 GbE | 2× 64 Gb FC (multipath) | Cache Fusion interconnect | iLO + SAN mgmt |
| Oracle Exadata | 4-8× 100 GbE RoCE | NVMe over Fabric | RDMA interconnect | Exadata CLI + OEDA |

Oracle-specific configuration

Oracle ASM — diskgroup layout

Oracle ASM (Automatic Storage Management) replaces traditional filesystem + volume manager:

Diskgroup Redundancy Disks Purpose
DATA Normal (2× mirror) 4-12× FC LUN/NVMe Data files, temp files, control files
FRA (Fast Recovery Area) Normal (2× mirror) 2-6× FC LUN/NVMe Archive logs, backup, flashback logs
REDO High (3× mirror) 2-4× FC LUN/NVMe Online redo log groups (I/O critical)
SPFILE Normal 2× small LUN Server parameter file

ASM striping: Coarse (1 MB) for regular data, Fine (128 KB) for redo logs (lower write latency).

Variant O1: Standalone Oracle (small/medium, single instance)

Parameter Small (< 500 users) Medium (500-2000 users)
CPU 1-2× EPYC 9124-9224 / Xeon 4410Y (8-16C) 2× EPYC 9334-9374F / Xeon 5418Y (16-24C)
RAM (SGA + PGA) 64-128 GB (SGA 70 %, PGA 30 %) 128-512 GB (SGA 60-80 %, PGA 20-40 %)
Huge pages Yes (vm.nr_hugepages) — mandatory for SGA Yes
OS disk 2× SATA SSD RAID 1 (240 GB) 2× NVMe RAID 1 (480 GB)
DATA + FRA 4-6× NVMe, ASM normal redundancy 6-8× NVMe or FC LUN, ASM normal
REDO 2-4× NVMe (separate from DATA), ASM high 4× FC LUN (separate), ASM high
Archive log Local FRA FC LUN (FRA diskgroup)
Network (app) 2× 25 GbE LACP 2-4× 25/100 GbE LACP
Network (storage) — (local NVMe) 2× FC 32G multipath
Network (Data Guard) 2× 25 GbE dedicated
DB version Oracle SE2 (max 16 threads) Oracle EE (unlimited)

Use case: Dev/test, small production DBs, branch offices. SE2 license = max 16 CPU threads, limited parallel execution.

Variant O2: Oracle Data Guard (medium/large, HA + DR)

Primary + standby in active-passive mode, Active Data Guard possible for reporting.

Parameter Recommendation
CPU 2× EPYC 9654-9965 / Xeon 8592+ (32-64C)
RAM 256-1024 GB (SGA 60-80 %, PGA 20-40 %)
Huge pages Yes (50-80 % RAM allocated for SGA)
OS disk 2× NVMe RAID 1 (480 GB)
Storage FC SAN LUN (DATA + FRA + REDO separate) or NVMe + ASM
HBA 2× dual-port FC 32/64 Gb (multipath active-active)
App network 2-4× 25/100 GbE LACP
Storage network 2× FC 32/64 Gb multipath
Data Guard network 2× 25/100 GbE dedicated (sync or async)
Data Guard mode Maximum Availability (sync, fallback to async) — RPO = 0
Topology 1 primary + 1-2 standby (physical), far sync for geo-DR
Active Data Guard Standby open for read (reporting, backup) — requires ADG license

Data Guard latency:

Synchronous (Maximum Availability):
  Primary COMMIT → LGWR flush REDO → sync over network → Standby LGWR → ACK → ~1-5 ms
  RPO = 0, impact on write latency

Asynchronous (Maximum Performance):
  Primary COMMIT → LGWR flush REDO → async to standby buffer → ~0.1-1 ms
  RPO = a few seconds, negligible write impact

Network requirements for Data Guard sync:

  • RTT < 2 ms for synchronous mode (recommended < 1 ms)
  • Min. 10 GbE, recommended 25 GbE (throughput = REDO rate × 2)
  • REDO rate: OLTP ~50-500 MB/s, batch ~500-2000 MB/s
  • At REDO rate 500 MB/s (4 Gbit/s) and 25 GbE → ~16 % link utilization, ~32 % with the 2× headroom
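The bandwidth check behind these requirements is simple arithmetic, sketched below. The function is illustrative, uses decimal units (1 MB/s = 8 Mbit/s) and counts redo payload only; protocol overhead and retransmits are not modelled. The optional headroom multiplier corresponds to the "REDO rate × 2" rule above.

```python
def link_utilization(redo_mb_per_s: float, link_gbit: float, headroom: float = 1.0) -> float:
    """Fraction of the link consumed by redo transport."""
    redo_gbit = redo_mb_per_s * 8 / 1000
    return redo_gbit * headroom / link_gbit


if __name__ == "__main__":
    for rate in (50, 500, 2000):
        for link in (10, 25):
            share = link_utilization(rate, link, headroom=2.0)
            verdict = "ok" if share < 1 else "link too small"
            print(f"{rate} MB/s on {link} GbE with 2x headroom: {share:.0%} ({verdict})")
```

Running the loop shows that the upper end of the batch redo range does not fit on a 10 GbE link at all, so the minimum only holds for OLTP-class redo rates.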

Variant O3: Oracle RAC (large, enterprise)

Multi-instance cluster with shared storage and Cache Fusion.

Parameter Recommendation
Number of nodes 2-4 (typical), max 64 (RAC cluster)
CPU per node 2× EPYC 9654-9965 / Xeon 8592+ (32-64C)
RAM per node 512-2048 GB (SGA 60-80 %, PGA 20-40 %)
Huge pages Yes (1 GB pages if RAM > 512 GB)
Storage FC SAN — shared LUNs (ASM normal/high redundancy)
HBA 2× dual-port FC 64 Gb (multipath, active-active)
App network 2-4× 25/100 GbE LACP (VIP, SCAN listener)
Storage network 2-4× FC 64 Gb (multipath per node)
Cache Fusion interconnect 2× 100 GbE (RoCE v2 or InfiniBand) — dedicated
RAC interconnect latency < 5 µs (recommended), max < 10 µs
ASM Normal redundancy (2-way mirror)
Oracle Clusterware Voting disk (3× 1 GB LUN), OCR (3× 500 MB LUN)
Service OLTP_service, REPORT_service, BATCH_service

Cache Fusion — critical interconnect:

Node A (DB instance) ←──→ Node B (DB instance)
       │                        │
       └──────── ASM ───────────┘
              │
        FC SAN (shared storage)

Cache Fusion traffic: dirty block transfer between instances
  → Latency < 5 µs, otherwise RAC scaling degrades
  → Capacity: 2× 100 GbE, dedicated switch or InfiniBand HDR100
  → Recommended MTU: 9000 (jumbo frames)

RAC sizing by transaction count:

TPS Nodes CPU per node RAM per node Interconnect
< 10 000 2 16-24C 256 GB 2× 25 GbE
10 000 - 50 000 2-4 32-48C 512 GB 2× 100 GbE RoCE
50 000 - 200 000 4-8 48-64C 1024 GB 2× 100 GbE RoCE / InfiniBand
> 200 000 8+ 64-128C 2048 GB InfiniBand HDR100/HDR200
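The sizing table can be read as a simple threshold lookup, sketched here. The tier strings are copied from the table. How values that fall exactly on a boundary (10 000, 50 000, 200 000) are assigned is an assumption, since the table does not say; this sketch puts them in the higher tier.

```python
# (exclusive upper TPS bound, recommendation) pairs from the table above.
RAC_TIERS = [
    (10_000, "2 nodes, 16-24C, 256 GB, 2x 25 GbE"),
    (50_000, "2-4 nodes, 32-48C, 512 GB, 2x 100 GbE RoCE"),
    (200_000, "4-8 nodes, 48-64C, 1024 GB, 2x 100 GbE RoCE / InfiniBand"),
]
TOP_TIER = "8+ nodes, 64-128C, 2048 GB, InfiniBand HDR100/HDR200"


def rac_tier(tps: int) -> str:
    """Return the table row matching a target transactions-per-second figure."""
    for upper_bound, spec in RAC_TIERS:
        if tps < upper_bound:
            return spec
    return TOP_TIER


if __name__ == "__main__":
    for tps in (5_000, 30_000, 120_000, 500_000):
        print(f"{tps} TPS -> {rac_tier(tps)}")
```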

RAC sizing — license cost calculation:

Example: 4-node RAC, each node 2× EPYC 9654 (96C) = 192 cores per node
  Core factor 0.5 → 96 Oracle licenses per node
  4 × 96 = 384 Oracle EE licenses
  At ~$47.5k/license → ~$18.2M (licenses only, without 22 % annual support)
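The same calculation as a small Python sketch, which also makes the yearly support charge explicit. The list price and the 22 % support rate are the figures quoted above and are treated as inputs; the function name is hypothetical and real contracts usually involve discounts.

```python
import math


def rac_license_cost(nodes: int, cores_per_node: int, core_factor: float = 0.5,
                     price_per_license: int = 47_500, support_rate: float = 0.22):
    """Return (licenses, one-time license cost, yearly support cost) in USD."""
    licenses = math.ceil(nodes * cores_per_node * core_factor)
    license_cost = licenses * price_per_license
    yearly_support = round(license_cost * support_rate)
    return licenses, license_cost, yearly_support


if __name__ == "__main__":
    licenses, cost, support = rac_license_cost(nodes=4, cores_per_node=192)
    print(f"{licenses} licenses, ${cost:,} up front, ${support:,} support per year")
```

Halving the cores per node halves every figure, which is why core count is the first thing to question in a RAC design.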

Variant O4: Oracle Exadata (hyperscale)

Engineered system — optimal for hybrid workload (OLTP + DW).

Parameter X9M / X10M Use case
Database servers 2-8× (Xeon, 1.5-6 TB RAM, NVMe) Compute
Storage servers 3-18× (NVMe + HDD, Smart Scan) Predicate offloading
Smart Scan Filtering at storage layer Less data over network, higher throughput
RoCE interconnect 100 GbE (RDMA) Low latency, high bandwidth
In-Memory Column Store Optional license Real-time analytics without ETL
HCC (Hybrid Columnar Compression) Compression in storage servers Up to 10-15× compression for DW
Rack power ~15-30 kW (full rack) Higher density

When to choose Exadata over standalone RAC:

  • OLTP > 50 000 TPS
  • Consolidation needed (multiple DBs on one cluster)
  • Smart Scan significantly accelerates reporting on production data
  • HCC for storage savings on DW workloads

2. Hypervisor host (ESXi / KVM / Hyper-V)

Configuration by size and storage type

Variant A: Small company — local storage (2-3 hosts)

Component Recommendation Note
CPU 1× EPYC 9224/9254 or Xeon 4410Y/5418Y (12-24C) 1 socket, enough cores for VM density
RAM 128-256 GB (4-8 GB/core) DDR5, 1DPC
OS disk 2× SATA SSD RAID 1 (2× 240-480 GB) ESXi / Proxmox / Hyper-V boot
VM storage 4-6× SATA/SAS SSD, RAID 5/6 or 10 Local RAID, 4-12 TB usable
Network 2-4× 10/25 GbE (LACP) Shared for everything (management + VM + storage)
Hypervisor VMware vSphere Standard / Proxmox VE / Hyper-V Basic license, no enterprise features
Storage backend Local RAID controller (PERC H755, Broadcom 9560) HW RAID with cache, write-back
HA VMware HA / Proxmox HA Restart VM on another host on failure
Backup Veeam B&R Free / PBS (Proxmox Backup Server) Local or USB disk

Use case: Small office, branch office, dev/test, < 10 VMs, low budget, simple management. Limitations: No vMotion without shared storage, outage during host failure (HA restart, not seamless).

Variant B: Medium company — vSAN / Ceph (3-6 hosts)

Component Recommendation Note
CPU 1-2× EPYC 9334/9654 or Xeon 5418Y/8592+ (16-32C) 1-2 socket
RAM 256-512 GB (4-8 GB/core) DDR5, 2DPC (minimal penalty)
OS disk 2× SATA SSD RAID 1 or 2× M.2 NVMe (BOSS-S1) Separate from VM storage
Cache tier 1-2× NVMe (vSAN caching / Ceph WAL+DB) For write performance
Capacity tier 4-8× SATA/SAS SSD or HDD (vSAN capacity / Ceph OSD) HDD for capacity, SSD for performance
Network 4× 25/100 GbE — 2× VM + mgmt, 2× storage (vSAN/Ceph) Separate storage network, RDMA (RoCE v2)
Hypervisor VMware vSAN / Proxmox Ceph / StarWind HCI HCI license (vSAN ~$2.5k/Core)
Storage backend vSAN OSA/ESA or Ceph (RADOS) Distributed storage, auto-rebalance
HA vSphere HA + vSAN / Proxmox HA + Ceph vMotion, DRS, automated failover
Failover N+1 (one host as reserve) vSAN needs min. 3 hosts (OSA and ESA); 4 recommended so data can be rebuilt after a host failure

Pure Ceph variant (Proxmox / OpenStack):

Proxmox node (3-6×):
├── CPU: 1× EPYC 9224-9334 (12-24C)
├── RAM: 128-256 GB
├── OS: 2× SATA SSD RAID 1
├── Ceph OSD: 4-8× NVMe/SATA SSD (RAW, HBA mode)
├── Network: 2× 25 GbE (public) + 2× 25 GbE (cluster)
└── Storage: Ceph 3× replication, CRUSH host failure domain

VMware vSAN variant (4-6 hosts):

vSAN node (4-6×):
├── CPU: 1-2× EPYC/Xeon (16-32C)
├── RAM: 256-512 GB
├── OS: 2× M.2 NVMe (BOSS-S1) or SD card (deprecated)
├── vSAN cache: 1-2× NVMe (write buffer, OSA only; ESA has no separate cache tier)
├── vSAN capacity: 4-8× NVMe (vSAN ESA) or SATA/SAS SSD / HDD (vSAN OSA)
├── Network: 2× 25/100 GbE (VM) + 2× 25 GbE (vSAN)
└── Storage: vSAN ESA (all-NVMe) or OSA (hybrid)

Use case: SME, enterprise division, 10-100 VMs, need for vMotion, DRS, HA, simple storage management.

Variant C: Large company — FC SAN (6+ hosts)

Component Recommendation Note
CPU 2× EPYC 9654/9965 or Xeon 8592+/6980P (32-64C) 2 socket, max VM density
RAM 512 GB - 2 TB (4-8 GB/core) DDR5, 2DPC
OS disk 2× SATA SSD RAID 1 or 2× M.2 NVMe (BOSS), not SD card (deprecated in vSphere) Boot, image storage
VM storage LUNs from FC SAN — VMFS / NFS datastores Hitachi, Dell, Pure, HPE storage
HBA 2× dual-port FC HBA 32/64 Gb Multipath, FC-NVMe
Network 4-8× 25/100 GbE — split by traffic type Management, VM, vMotion, FT separated
Hypervisor VMware vSphere Enterprise+ / Hyper-V DC Enterprise license, DRS, HA, FT
Storage backend FC SAN: VMFS 6 datastores, vVols Thin provisioning, Storage DRS, array snapshots
HA vSphere HA + DRS + vCenter vMotion, DRS, FT, SRM for DR
Failover N+1 or admission control (CPU/RAM reserve) Reserved capacity for HA failover

Use case: Enterprise, 100+ VMs, mix of DB and applications, centralized storage management, enterprise SLA.
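The N+1 rule in the Failover row implies a fixed share of cluster capacity that must stay free. A minimal sketch, assuming identical hosts; the function name is illustrative and this is not the exact algorithm used by vSphere admission control.

```python
def usable_fraction(hosts: int, spare_hosts: int = 1) -> float:
    """Share of total cluster capacity that may be committed to VMs
    while still tolerating the loss of `spare_hosts` hosts."""
    if hosts <= spare_hosts:
        raise ValueError("cluster needs more hosts than the failures it tolerates")
    return (hosts - spare_hosts) / hosts


if __name__ == "__main__":
    for n in (3, 6, 12):
        print(f"{n} hosts, N+1: plan for at most {usable_fraction(n):.0%} utilization")
```

The reserve becomes cheaper as the cluster grows, which is one argument for fewer, larger clusters over many small ones.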

Variant D: Hyperscale — Ceph / SDS (20+ hosts)

Component Recommendation Note
CPU 2× EPYC 9654/9965 (64-128C) 2 socket, compute optimal
RAM 512 GB - 1 TB (2-4 GB/core) Low overcommit ratio for consistency
OS disk 2× M.2 NVMe RAID 1 (BOSS) Boot
Network 4-8× 100 GbE (compute + storage) Separate OVN/OVS for SDN, VXLAN tunneling
Hypervisor OpenStack (Nova) / OpenShift (KubeVirt) Open source, API-driven, multi-tenant
Storage backend Ceph (RADOS, RBD, RGW, CephFS) Unified storage, erasure coding (8+3)
Orchestration OpenStack / Kubernetes Infrastructure-as-Code, autoscaling
HA OpenStack HA / Kubernetes HA Self-healing, auto-rebalance

Use case: Cloud provider, hyperscale, 500+ VMs, multi-tenant, maximum automation.
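The capacity difference between 3× replication and the 8+3 erasure coding profile mentioned above can be estimated as follows. This is a rough sketch: it ignores the free space Ceph needs for rebalancing and its full-ratio limits, and the function names are illustrative.

```python
def replicated_usable_tb(raw_tb: float, replicas: int = 3) -> float:
    """Usable capacity with N-way replication."""
    return raw_tb / replicas


def erasure_coded_usable_tb(raw_tb: float, k: int = 8, m: int = 3) -> float:
    """Usable capacity with k data chunks and m coding chunks."""
    return raw_tb * k / (k + m)


if __name__ == "__main__":
    raw = 1100
    print(f"3x replication: {replicated_usable_tb(raw):.0f} TB usable of {raw} TB raw")
    print(f"EC 8+3:         {erasure_coded_usable_tb(raw):.0f} TB usable of {raw} TB raw")
```

Erasure coding more than doubles usable capacity for the same raw disks, at the cost of more CPU and network work per write and slower recovery.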

Hypervisor variant comparison

Aspect Local (small) vSAN/Ceph (medium) FC SAN (large) Ceph hyperscale
Storage Local RAID vSAN / Ceph (HCI) FC SAN (centralized) Ceph (distributed)
Number of hosts 2-3 3-6 6-50+ 20+
VM latency ~10 µs (local) ~100-500 µs ~200 µs (SAN) ~500-2000 µs
CAPEX/host Low Medium High Medium
CAPEX storage Low None (part of hosts) High (SAN array) None (part of hosts)
Management Simple (per host) vCenter / Proxmox vCenter + SAN mgmt OpenStack / K8s
vMotion No (no shared storage) Yes (vSAN / Ceph RBD) Yes (FC LUN) Yes (Ceph RBD)
DRS No Yes (vSphere) Yes (vSphere) OpenStack scheduler
Scaling Vertical Horizontal (add host) Horizontal (host + SAN) Horizontal

Network design by variant

Small (local storage)

Traffic VLAN Speed Teaming Note
Management Mgmt 1 GbE Active/Passive Dedicated port (iLO/iDRAC)
VM + Storage All 2-4× 10/25 GbE LACP Shared, VLAN tagging
┌──────────────────────────────────────────┐
│  Host                                   │
│  ┌──────┐ ┌─────────────────────────────┐│
│  │ iLO  │ │   NIC1   NIC2               ││
│  │ 1 GbE │ │  [LACP] 25 GbE             ││
│  └──────┘ └──────────┬──────────────────┘│
└──────────────────────┼───────────────────┘
                       │
                 ┌─────┴─────┐
                 │  Switch   │
                 └───────────┘

Medium (vSAN / Ceph)

Traffic VLAN Speed Teaming Note
Management Mgmt 1 GbE Active/Passive Dedicated iLO/iDRAC
VM VM 2× 25/100 GbE LACP VM traffic, migration
Storage vSAN/Ceph 2× 25/100 GbE LACP or RDMA Separate, Jumbo frames (MTU 9000)
┌──────────────────────────────────────────┐
│  Host                                   │
│  ┌──────┐ ┌──────────┐ ┌───────────────┐│
│  │ iLO  │ │ NIC1 NIC2│ │ NIC3 NIC4     ││
│  │ 1 GbE │ │ VM traffic│ │ Storage (vSAN)││
│  └──────┘ └──────────┘ └───────────────┘│
└──────────────────────────────────────────┘

Large (FC SAN)

Traffic VLAN Speed Teaming Note
Management Mgmt 1 GbE Active/Passive Dedicated
VM VM 2-4× 25/100 GbE LACP VM traffic
vMotion vMotion 2× 25 GbE Dedicated Multi-NIC vMotion
FT FT 2× 10/25 GbE Dedicated Low latency
Storage 2× 32/64 Gb FC Multipath FC SAN
┌──────────────────────────────────────────────┐
│  Host                                       │
│  ┌──────┐ ┌────────────┐ ┌────┐ ┌─────────┐│
│  │ iLO  │ │ NIC1-4      │ │HBA1│ │ HBA2    ││
│  │ 1 GbE │ │ VM+vMotion+FT│ │32Gb│ │ 32Gb    ││
│  └──────┘ └────────────┘ └─┬──┘ └──┬──────┘│
└────────────────────────────┼───────┼───────┘
                             │       │
                     ┌───────┴───┐ ┌─┴────────┐
                     │ Ethernet  │ │ FC Switch │
                     │ Switch    │ │ (Brocade/ │
                     │           │ │  Cisco)   │
                     └───────────┘ └──────────┘

BIOS for hypervisor — all variants

Setting Value Rationale
Hyper-Threading Enabled Higher VM density
Virtualization Technology Enabled VT-x/AMD-V
VT-d / IOMMU Enabled Passthrough, SR-IOV
Power Management Performance / OS Minimize VM exit latency
C-States Disabled Lower VM exit latency (important for real-time VMs)
NUMA Enabled NUMA-aware VM placement
SR-IOV Enabled NIC/GPU virtualization
Adjacent Sector Prefetch Enabled (Intel) Better sequential reads
DCU Streamer / IP Prefetcher Enabled HW prefetch for VM workload
Patrol Scrub Disabled (vSAN/Ceph) Can cause latency spikes with SDS

Hypervisor selection by variant

Criterion VMware vSphere Proxmox VE Hyper-V OpenStack
Size SME - Enterprise SME SME - Enterprise Hyperscale
Storage vSAN, SAN, NFS Ceph, ZFS, NFS Storage Spaces, SAN Ceph, manila
License ~$1-5k/core Free (support ~$500/host) Part of Windows Server Open source
Familiarity Highest Medium Windows admin Low
Automation Terraform, Ansible, PowerCLI Ansible, Terraform, PBS PowerShell, SCVMM Terraform, Heat, Ansible
Ecosystem Broadest (Veeam, Zerto, SRM) Growing (PBS, remote migration) Windows ecosystem Open source (Kolla, TripleO)

3. Kubernetes node

Node profiles

| Role | CPU | RAM | Storage | Network | Use case |
|---|---|---|---|---|---|
| General purpose | 16-32 cores | 64-128 GB | 1× NVMe OS + 1× NVMe local | 2× 25 GbE | Web, API, microservices |
| Memory optimized | 32-64 cores | 256-512 GB | 1× NVMe OS + 2× NVMe local | 2× 25 GbE | In-memory cache, DB |
| Compute optimized | 64-128 cores | 128-256 GB | 1× NVMe OS | | Batch, CI/CD |
| GPU node | 32-64 cores | 512-1024 GB | 1× NVMe OS + 4-8× NVMe local | 4× 100 GbE | AI/ML training, inference |
| Storage node | 16-32 cores | 64-128 GB | 4-12× NVMe/SATA (Ceph/Longhorn) | 4× 25 GbE | SDS, persistent volumes |

Kernel tuning

# /etc/sysctl.d/99-kubernetes.conf
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1
net.ipv4.conf.all.forwarding = 1

# Connection tracking (for NodePort, Service)
net.netfilter.nf_conntrack_max = 2097152
net.netfilter.nf_conntrack_tcp_timeout_established = 86400

# File watchers (for kubelet, containerd)
fs.inotify.max_user_instances = 8192
fs.inotify.max_user_watches = 524288

# Memory management
vm.swappiness = 0
# Allow overcommit (CRI-O, containerd)
vm.overcommit_memory = 1
vm.panic_on_oom = 0
kernel.panic = 10
kernel.panic_on_oops = 1
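A small checker can confirm that a node actually runs with these values. The sketch below parses sysctl.d-style text and compares it with the live values under /proc/sys. The function names are hypothetical. Only whole-line comments starting with # or ; are skipped, so anything after the = sign is treated as part of the value.

```python
from pathlib import Path
from typing import Optional


def parse_sysctl(text: str) -> dict[str, str]:
    """Parse sysctl.d-style text into {key: value}."""
    settings: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and whole-line comments.
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def sysctl_path(key: str) -> Path:
    """Map a dotted key such as vm.swappiness to its /proc/sys file."""
    return Path("/proc/sys") / key.replace(".", "/")


def find_drift(expected: dict[str, str]) -> dict[str, tuple[str, Optional[str]]]:
    """Return keys whose running value is missing or differs from the expected one."""
    drift = {}
    for key, want in expected.items():
        path = sysctl_path(key)
        have = " ".join(path.read_text().split()) if path.is_file() else None
        if have != want:
            drift[key] = (want, have)
    return drift


if __name__ == "__main__":
    conf = Path("/etc/sysctl.d/99-kubernetes.conf")
    if conf.is_file():
        for key, (want, have) in find_drift(parse_sysctl(conf.read_text())).items():
            print(f"{key}: expected {want}, running {have}")
```

A missing key (reported as None) usually means a kernel module such as br_netfilter is not loaded yet.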

Container storage

Type Recommendation Note
OS disk RAID 1 (2× NVMe) Ext4/XFS, 100-200 GB
Container runtime image RAID 1 (2× NVMe) /var/lib/containerd, 200-500 GB
Local PV Single NVMe Raw device, no RAID
Rook/Ceph OSD Raw NVMe/SATA HBA/IT mode, no RAID
Longhorn Raw NVMe/SATA Ext4/XFS per volume

4. Storage server (Ceph / MinIO / NAS)

Ceph OSD node

Component Recommendation Note
CPU 1-2 cores per OSD Up to 12 OSD per node (24 cores)
RAM 4-8 GB per OSD + OS BlueStore cache, 16-64 GB min
Network 2× 25/100 GbE Public + Cluster network
Storage 10-12× NVMe/SATA SSD OSD HBA/IT mode, no RAID
OS disk 2× SATA SSD RAID 1 OS, Ceph MON/MGR
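The per-OSD rules in the table scale linearly, as the sketch below shows. It defaults to the upper values (2 cores and 8 GB per OSD). The 16 GB reserved for the OS and co-located daemons is an assumption taken from the lower end of the "16-64 GB min" note, and the function name is illustrative.

```python
def osd_node_sizing(osds: int, cores_per_osd: int = 2, ram_per_osd_gb: int = 8,
                    os_ram_gb: int = 16) -> dict[str, int]:
    """Return the CPU cores and RAM (GB) needed for a node with `osds` OSDs."""
    return {
        "cores": osds * cores_per_osd,
        "ram_gb": osds * ram_per_osd_gb + os_ram_gb,
    }


if __name__ == "__main__":
    for count in (4, 8, 12):
        print(count, "OSDs ->", osd_node_sizing(count))
```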

BIOS for Ceph:

  • SATA/NVMe: AHCI/NVMe mode (not RAID)
  • C-States: Disabled (lower OSD latency)
  • NUMA: Enabled
  • Power: Performance

MinIO node

Component Recommendation
CPU 8-16 cores (32+ for erasure coding)
RAM 32-64 GB + 1 GB per 1 TB storage
Storage 4-16× NVMe (direct, no RAID)
Network 2× 25/100 GbE
OS Ubuntu / RHEL, XFS (for data)

NAS (TrueNAS / FreeNAS)

  • ZFS: RAID-Z1/Z2/Z3, compression (lz4, zstd), dedup
  • ARC cache: 1 GB per 1 TB storage (max 64 GB)
  • L2ARC: NVMe cache (optional, read-heavy)
  • SLOG: NVDIMM / Optane (sync write, ZIL)
  • Network: 2-4× 10/25 GbE LACP
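The two per-terabyte memory rules in this section (MinIO RAM and ZFS ARC) can be written as one-liners. This is only a sketch of the rules of thumb as stated here, not vendor sizing guidance; the function names and the 32 GB default base are illustrative.

```python
def minio_ram_gb(storage_tb: int, base_gb: int = 32) -> int:
    """MinIO rule above: 32-64 GB base plus 1 GB per TB of storage."""
    return base_gb + storage_tb


def zfs_arc_gb(storage_tb: int, cap_gb: int = 64) -> int:
    """ZFS rule above: 1 GB of ARC per TB of storage, capped at 64 GB."""
    return min(storage_tb, cap_gb)


if __name__ == "__main__":
    for tb in (20, 100, 400):
        print(f"{tb} TB: MinIO RAM {minio_ram_gb(tb)} GB, ZFS ARC {zfs_arc_gb(tb)} GB")
```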

5. Web / API servers

Parameter Recommendation
CPU High clock, 8-32 cores
RAM 32-128 GB
Storage 2× NVMe RAID 1 (OS + app)
OS Ubuntu / RHEL, optimized kernel
Network 2× 10/25 GbE (bonding)

Kernel tuning:

net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_fin_timeout = 15
net.core.somaxconn = 65535
net.ipv4.tcp_max_syn_backlog = 65535
net.core.netdev_max_backlog = 65535

Quick decision tree — server selection by workload, size and storage

flowchart TD
    W["What workload?"] --> DB["Database"]
    W --> HV["Virtualization"]
    W --> K8s["Kubernetes"]
    W --> AI["AI/ML"]
    W --> ST["Storage server"]
    W --> WEB["Web / API"]

    DB --> DBS{"Company size"}
    DBS -->|"< 500"| DB1["1× EPYC 8-16C, 64-256 GB<br/>NVMe RAID10, 2× 25GbE"]
    DBS -->|"500-5000"| DB2{"Storage"}
    DB2 -->|"Local"| DB2L["1-2× EPYC 16-24C, 128-512 GB<br/>NVMe RAID10, 4× 25GbE"]
    DB2 -->|"Ceph"| DB2C["2× EPYC 16-32C, 256-512 GB<br/>RBD, 4× 25/100GbE"]
    DBS -->|"Enterprise"| DB3{"Storage"}
    DB3 -->|"FC SAN"| DB3F["2× EPYC 48-128C, 512-2048 GB<br/>SAN LUN + 2× FC 32/64G"]
    DB3 -->|"Ceph"| DB3C["2× EPYC 32-64C, 256-512 GB<br/>RBD, 4× 100GbE"]
    DBS -->|"Cloud"| DBC["RDS/Azure SQL/CloudSQL<br/>Managed, Multi-AZ"]

    DB --> ORACLE{"Oracle architecture?"}
    ORACLE -->|"Standalone"| ORA1["1-2× EPYC 8-24C<br/>64-512 GB, ASM local/FC<br/>2× 25GbE + FC 32G"]
    ORACLE -->|"Data Guard"| ORA2["2× EPYC 32-64C<br/>256-1024 GB, FC SAN<br/>2× 25/100GbE + 2× FC 64G<br/>2× 25GbE (DG sync)"]
    ORACLE -->|"RAC 2-4 nodes"| ORA3["Per node: 2× EPYC 32-64C<br/>512-2048 GB, FC SAN<br/>2× 100GbE (app)<br/>2× FC 64G (storage)<br/>2× 100GbE RoCE (interconnect)"]
    ORACLE -->|"Exadata"| ORA4["Engineered system<br/>2-8 DB servers + 3-18 storage<br/>RoCE 100GbE, Smart Scan<br/>15-30 kW/rack"]

    HV --> HVS{"Number of hosts"}
    HVS -->|"2-3"| HV1["1× EPYC 12-24C, 128-256 GB<br/>RAID5/6 SSD, 2-4× 10/25GbE"]
    HVS -->|"3-6"| HV2{"HCI"}
    HV2 -->|"vSAN"| HV2V["1-2× EPYC 16-32C, 256-512 GB<br/>NVMe cache + SSD, 4× 25GbE"]
    HV2 -->|"Ceph"| HV2C["1× EPYC 12-24C, 128-256 GB<br/>4-8× HBA NVMe/SSD, 4× 25GbE"]
    HVS -->|"6+"| HV3["2× EPYC 32-64C, 512-2048 GB<br/>FC SAN 32/64G, 4-8× 25/100GbE"]
    HVS -->|"20+"| HV4["2× EPYC 64-128C, 512-1024 GB<br/>OpenStack + Ceph, 4-8× 100GbE"]

    K8s --> K8T{"Node type"}
    K8T -->|"General"| K8G["16-32C, 64-128 GB<br/>2× NVMe, 2× 25GbE"]
    K8T -->|"Memory"| K8M["32-64C, 256-512 GB<br/>3× NVMe, 2× 25GbE"]
    K8T -->|"GPU"| K8U["32-64C, 512-1024 GB<br/>6-10× NVMe, H100/B200, 4× 100GbE"]
    K8T -->|"Storage"| K8S["16-32C, 64-128 GB<br/>6-14× HBA NVMe, 4× 25GbE"]

    AI --> AIT{"Purpose"}
    AIT -->|"Training"| AITR["GPU H100/B200, NVLink<br/>InfiniBand 400Gb/s, liquid cooling"]
    AIT -->|"Inference"| AIIR["A100/H200, MIG<br/>PCIe 5.0, 2× 100GbE"]

    ST --> STT{"Type"}
    STT -->|"Ceph OSD"| STC["EPYC (PCIe lanes)<br/>4-8 GB/OSD, HBA, 2× 25/100GbE"]
    STT -->|"MinIO"| STM["EPYC 8-16C, 32-64 GB<br/>4-16× NVMe direct, 2× 25/100GbE"]
    STT -->|"NAS (ZFS)"| STN["EPYC 16-32C, 64-128 GB<br/>RAID-Z, SLOG NVMe, 2-4× 10/25GbE"]

    WEB --> WEBE["EPYC high clock, 8-32C<br/>32-128 GB, 2× NVMe RAID1, 2× 10/25GbE"]

Connectivity summary by platform

| Platform | App / VM network | Storage network | Replication / Cluster | Management |
|---|---|---|---|---|
| DB local (small) | 2× 25 GbE LACP | | 2× 25 GbE (shared) | 1× 1 GbE (iLO) |
| DB local (medium) | 2× 25/100 GbE LACP | | 2× 25 GbE dedicated | 1× 1 GbE (iLO) |
| DB FC SAN | 2× 25/100 GbE LACP | 2× 32/64 Gb FC multipath | FC replication | 1× 1 GbE (iLO) + SAN mgmt |
| DB Ceph | 2× 25/100 GbE | 2× 25/100 GbE (Ceph public) | 2× 25/100 GbE (Ceph cluster) | 1× 1 GbE (iLO) |
| Hypervisor local | 2-4× 10/25 GbE LACP | none (local) | | 1× 1 GbE (iLO) |
| Hypervisor vSAN | 2× 25/100 GbE LACP | 2× 25/100 GbE (vSAN) | vSAN traffic | 1× 1 GbE (iLO) |
| Hypervisor FC SAN | 2-4× 25/100 GbE LACP | 2× 32/64 Gb FC multipath | 2× 25 GbE (vMotion) | 1× 1 GbE (iLO) |
| Hypervisor Ceph | 2× 25/100 GbE LACP | 2× 25/100 GbE (Ceph) | 2× 25 GbE (migration) | 1× 1 GbE (iLO) |
| Kubernetes | 2× 25/100 GbE | 2× 25/100 GbE (Ceph/Longhorn) | 2× 25/100 GbE (K8s cluster) | 1× 1 GbE (BMC) |
| Web/API | 2× 10/25 GbE LACP | | | 1× 1 GbE (BMC) |
| Oracle Standalone | 2× 25 GbE LACP | 2× FC 32G or NVMe local | Data Guard 2× 25 GbE | 1× 1 GbE (iLO) + ASM mgmt |
| Oracle Data Guard | 2× 25/100 GbE LACP | 2× FC 64G multipath | 2× 25 GbE (DG sync) | 1× 1 GbE (iLO) + SAN mgmt |
| Oracle RAC | 2× 100 GbE LACP (VIP/SCAN) | 2× FC 64G multipath | 2× 100 GbE RoCE (Cache Fusion) | 1× 1 GbE (iLO) + Clusterware |
| Oracle Exadata | 4-8× 100 GbE RoCE | NVMe over Fabric | RDMA interconnect | Exadata CLI + OEDA |

Sources

Links, books and standards: sources/infrastructure/sources.md

Last revision: 2026-06-03