Files
Stanislav Hubacek 3fa11ef0f6 comiiit
2026-06-11 15:27:28 +02:00

379 lines
14 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Case study: Proxmox VE demo cluster (3× node, Ceph, HA)
## 1. Requirements and parameters
| Parameter | Value |
|----------|---------|
| Number of hosts | 3 |
| Purpose | demo, learning, development |
| Hypervisor | Proxmox VE (free) |
| Budget | low-cost (~$10,000$15,000) |
| Storage | Ceph (HCI) |
| HA | yes |
| Location | 1 rack, standard office room |
---
## 2. Server configuration
Based on a combination of the **Mini variant** (23 hosts, single-socket) and the **pure Ceph variant** per SERVER-CONFIG.md. Each of the 3 nodes is identical.
### 2.1 Single node configuration
| Component | Specification | Rationale |
|------------|-------------|------------|
| **CPU** | 1× AMD EPYC 9224 (24C/48T, 200 W TDP) or Intel Xeon 5418Y (16C/32T) | SERVER-CONFIG.md: "Pure Ceph variant: CPU 1× EPYC 92249334 (1224C)". Ceph requires 12 cores per OSD; with 3 OSD + Proxmox + VM, 12+ cores is the minimum. |
| **RAM** | 128 GB DDR5-4800 (4× 32 GB RDIMM, 1DPC) | SERVER-CONFIG.md: "RAM 128256 GB" for Ceph variant. 128 GB is sufficient for demo; 48 GB per OSD + OS + lightweight VMs. |
| **OS disk** | 2× 240 GB SATA SSD, RAID 1 (HW controller in HBA mode or SW mdadm) | "OS: 2× SATA SSD RAID 1" per Ceph variant. |
| **Ceph OSD** | 3× 960 GB SATA SSD (HBA/IT mode, no HW RAID) | "Ceph OSD: 48× NVMe/SATA SSD (RAW, HBA mode)". For demo we reduce to 3 OSD/node. Total 9 OSD in cluster. |
| **NIC** | 2× dual-port 10 GbE SFP+ (total 4× 10 GbE) | "Network: 2× 25 GbE public + 2× 25 GbE cluster". For low-cost we choose 10 GbE (SFP+), the concept remains the same. |
| **BMC** | 1× 1 GbE (iDRAC / iLO / IPMI) | Standard management port, CONNECTIVITY.md. |
| **Form factor** | 1U rack server (Dell R660, HPE DL360 Gen11, or Supermicro) | 19" rack, suitable for 1U. |
### 2.2 CPU choice rationale
KB states for the Mini variant "1× EPYC 4124 (4C) or Xeon E-2400". However, 4 cores is insufficient for Ceph (OSD + Proxmox + VM). Therefore we choose EPYC 9224 (24C) / Xeon 5418Y (16C), which corresponds to the Ceph variant in SERVER-CONFIG.md. The price is higher, but the cluster is functional for real-world testing.
---
## 3. Storage variant — Ceph
### 3.1 Topology
```
3× Proxmox node ─── each 3× OSD (SATA SSD)
Ceph cluster
┌─────────┼─────────┐
3× MON 3× MGR 9× OSD
```
### 3.2 Ceph configuration
| Parameter | Value | Note |
|----------|---------|---------|
| Replication | 3 (size = 3, min_size = 2) | Standard per STORAGE.md |
| Failure domain | host | CRUSH: replication across nodes |
| Raw capacity | 9 × 960 GB ≈ 8.6 TB | |
| Usable capacity | ~2.9 TB (8.6 / 3) | Sufficient for demo |
| OSD backend | BlueStore | Default in Ceph, recommended |
| MON quorum | 3 (1 per node) | Minimum for HA |
| Cache | RAM (BlueStore cache) | 12 GB per OSD |
| Network public | 2× 10 GbE LACP | VM traffic + Ceph frontend |
| Network cluster | 2× 10 GbE LACP | Ceph backend replication |
| MTU | 9000 (jumbo frames) | Recommended per NETWORKING.md |
### 3.3 Storage layout on disk
```
/dev/sda 240 GB OS (RAID 1, mirror with /dev/sdb)
/dev/sdc 960 GB OSD.0 (RAW, BlueStore)
/dev/sdd 960 GB OSD.1 (RAW, BlueStore)
/dev/sde 960 GB OSD.2 (RAW, BlueStore)
```
### 3.4 Ceph pool design
| Pool | PG count | Replication | Purpose |
|------|----------|-----------|-------|
| vms | 128 | 3× | VM disks (RBD) |
| data | 64 | 3× | Data volume |
| backups | 32 | 3× | Backups (low priority) |
PG count is approximate for demo (9 OSD). Production formula: (OSD_total × 100) / replication_size.
---
## 4. Network
### 4.1 Topology
```
┌─────────────────┐
│ 10 GbE Switch │
│ (24-port SFP+) │
└──┬──┬──┬──┬──┬──┘
┌─────────────┘ │ │ └─────────────┐
│ │ │ │
┌─────┴─────┐ ┌────┴──┴───┐ ┌───────┴──┐
│ Node 1 │ │ Node 2 │ │ Node 3 │
│ 4×10GbE │ │ 4×10GbE │ │ 4×10GbE │
│ ┌──────┐ │ │ ┌──────┐ │ │ ┌──────┐ │
│ │1GbE │ │ │ │1GbE │ │ │ │1GbE │ │
│ │BMC │ │ │ │BMC │ │ │ │BMC │ │
└─────────┘ └───────────┘ └───────────┘
```
### 4.2 VLAN and traffic segmentation
| VLAN | Purpose | Ports | MTU |
|------|------|-------|-----|
| VLAN 10 | Management (Proxmox web UI, SSH) | 1× 1 GbE BMC | 1500 |
| VLAN 20 | VM traffic + Ceph public | 2× 10 GbE (bond) | 9000 |
| VLAN 30 | Ceph cluster (backend) | 2× 10 GbE (bond) | 9000 |
### 4.3 Switch
| Parameter | Value |
|----------|---------|
| Model | MikroTik CRS326-24S+2Q+RM or similar L2+ switch |
| Ports | 24× SFP+ 10 GbE |
| Management | VLAN 10, IP 10.0.0.254/24 |
| Features | VLAN, LACP (LAG), Jumbo frames (MTU 9000), SNMP |
### 4.4 Cabling
| Type | Length | Quantity | Purpose |
|-----|-------|-------|-------|
| SFP+ DAC (passive) | 3 m | 12 | 10 GbE connection server ↔ switch |
| Cat6A UTP | 3 m | 3 | Management (1 GbE BMC) |
| Cat6A UTP | 1 m | 1 | Internet uplink (patch panel) |
DAC cables are cheaper than SFP+ optics + patch cords — suitable for single-rack.
---
## 5. Rack layout
### 5.1 Dimensions and positions
| U | Device | Power (W) |
|---|----------|-----------|
| U1 | Switch 10 GbE (1U) | ~60 W |
| U2 | UPS (2U) | — |
| U3 | (empty, ventilation) | — |
| U4 | Server Node 1 (1U) | ~250 W |
| U5 | Server Node 2 (1U) | ~250 W |
| U6 | Server Node 3 (1U) | ~250 W |
| U7U15 | Empty (optional storage, patch panel) | — |
| Parameter | Value |
|----------|---------|
| Rack type | 15U wall-mount, 19", 600×600 mm |
| Total IT load | ~810 W |
| PUE estimate | ~1.5 (office room, no precision cooling) |
| Cooling | Standard office AC (ASHRAE A2: 1035 °C). Sufficient for <1 kW. |
**Note:** KB (DATACENTERS.md) states free air cooling for low density (<5 kW/rack). Standard ventilation and AC are sufficient in an office.
### 5.2 UPS
| Parameter | Value |
|----------|---------|
| Type | VI (line-interactive) — per DATACENTERS.md for smaller racks |
| Capacity | 2000 VA / 1200 W |
| Backup time | ~1520 min at 810 W load |
| Output | 8× C13 (for servers + switch) |
| Battery | VRLA (cheaper) or Li-ion LFP |
| Management | USB / SNMP card (automatic Proxmox shutdown) |
Optionally can be upgraded to VFI (double-conversion) UPS for cleaner output, but VI is sufficient for demo.
### 5.3 PDU
1× basic 1U PDU (8× C13), 230 V / 10 A — for distribution to servers.
---
## 6. Hypervisor — Proxmox VE
### 6.1 Installation and configuration
| Component | Version / Configuration |
|------------|---------------------|
| Hypervisor | Proxmox VE 8.x (Debian 12 + KVM + LXC) |
| Storage backend | Ceph Reef / Squid (18.x) integrated in Proxmox |
| Cluster | 3-node cluster, Corosync + PMXCFS |
| HA | Proxmox HA — 1 node failure tolerance (remaining 2 take over VMs) |
| Fencing | watchdog (softdog) + Proxmox HA manager |
### 6.2 License
| Item | Price | Note |
|---------|------|----------|
| Proxmox VE | $0 | Open source, full functionality without license |
| Proxmox community support | $0 | Forum, wiki |
| Proxmox enterprise support (optional) | ~€500/host/year | Can be purchased later |
HYPERVISORS.md: Proxmox VE is "open source (free)", no license required.
### 6.3 HA setup
- HA group: all 3 nodes, no-quorum-policy = "stop" (for demo)
- Max VM restart: 2 attempts
- Migration: live migration via Ceph RBD (shared storage)
---
## 7. Budget estimate
**Disclaimer:** KB does not contain specific component prices. The following amounts are approximate market estimates (Q2 2026, USD).
### 7.1 Servers (3×)
| Item | Qty | Price/unit | Total |
|---------|------|----------|--------|
| 1U rack server (basic config, without CPU/RAM/disk) | 3 | ~$1,200 | $3,600 |
| AMD EPYC 9224 (24C) / Intel Xeon 5418Y (16C) — per KB | 3 | ~$900 | $2,700 |
| RAM 128 GB (4× 32 GB DDR5-4800 RDIMM) | 3 | ~$600 | $1,800 |
| 240 GB SATA SSD (OS) | 6 | ~$50 | $300 |
| 960 GB SATA SSD (Ceph OSD) | 9 | ~$150 | $1,350 |
| Dual-port 10 GbE SFP+ NIC (e.g. Intel X710-DA2) | 6 | ~$120 | $720 |
| **Servers total** | | | **~$10,470** |
### 7.2 Network
| Item | Qty | Price/unit | Total |
|---------|------|----------|--------|
| MikroTik CRS326-24S+2Q+RM (24× 10GbE SFP+) | 1 | ~$600 | $600 |
| SFP+ DAC cable 3 m (passive) | 12 | ~$15 | $180 |
| Network total | | | **~$780** |
### 7.3 Rack and power
| Item | Qty | Price/unit | Total |
|---------|------|----------|--------|
| 15U wall-mount rack 19" | 1 | ~$300 | $300 |
| UPS 2000 VA (line-interactive, VRLA) | 1 | ~$450 | $450 |
| 1U PDU basic (8× C13) | 1 | ~$60 | $60 |
| Rack + power total | | | **~$810** |
### 7.4 Other
| Item | Price |
|---------|------|
| Cat6A patch cables, management | ~$50 |
| Mounting material, velcro | ~$30 |
| Shipping and installation | ~$200 |
| Other total | **~$280** |
### 7.5 Total calculation
| Category | Amount |
|-----------|--------|
| Servers (3× node) | ~$10,470 |
| Network (switch + cables) | ~$780 |
| Rack + power | ~$810 |
| Other | ~$280 |
| **Total** | **~$12,340** |
| Reserve (1015%) | ~$1,2001,800 |
| **Total with reserve** | **~$13,500$14,100** |
Budget **$10,000$15,000** is achievable. Using cheaper CPUs (EPYC 4124P / Xeon E-2488), it can be built for ~$8,0009,000, but with limited performance for Ceph.
**Possible savings:**
- CPU: 2× EPYC 4124P (4C) + 1× more powerful node → ~$800 savings (but asymmetric cluster)
- OSD: 2× instead of 3× SSD/node → ~$500 savings (less capacity)
- Switch: 12-port instead of 24-port → ~$300 savings
---
## 8. Topology diagram
```mermaid
flowchart TB
subgraph Rack["15U Rack (office)"]
U1["U1: 10GbE Switch (MikroTik)"]
U2["U2: UPS 2000 VA"]
U4["U4: Node 1 — Proxmox + Ceph OSD"]
U5["U5: Node 2 — Proxmox + Ceph OSD"]
U6["U6: Node 3 — Proxmox + Ceph OSD"]
end
subgraph Node1["Node 1 (detail)"]
N1_CPU["CPU: EPYC 9224 (24C)"]
N1_RAM["RAM: 128 GB DDR5"]
N1_OS["OS: 2× 240 GB SSD (RAID 1)"]
N1_OSD1["OSD.0: 960 GB SSD"]
N1_OSD2["OSD.1: 960 GB SSD"]
N1_OSD3["OSD.2: 960 GB SSD"]
N1_NIC["NIC: 4× 10GbE SFP+"]
N1_BMC["BMC: 1× 1GbE"]
end
U1 ---|"4× 10GbE LACP<br/>(public + cluster)"| U4
U1 ---|"4× 10GbE LACP"| U5
U1 ---|"4× 10GbE LACP"| U6
U4 --- N1_CPU
U4 --- N1_RAM
U4 --- N1_OS
U4 --- N1_OSD1
U4 --- N1_OSD2
U4 --- N1_OSD3
U4 --- N1_NIC
U4 --- N1_BMC
subgraph Ceph["Ceph Cluster"]
CEPH_MON["3× MON (1 per node)"]
CEPH_MGR["3× MGR (1 per node)"]
CEPH_OSD["9× OSD (3 per node)"]
end
U4 --- CEPH_MON
U5 --- CEPH_MON
U6 --- CEPH_MON
U4 --- CEPH_MGR
U5 --- CEPH_MGR
U6 --- CEPH_MGR
U4 --- CEPH_OSD
U5 --- CEPH_OSD
U6 --- CEPH_OSD
subgraph Proxmox["Proxmox VE Cluster"]
PMX_HA["HA Group (3 nodes)"]
PMX_HA --- U4
PMX_HA --- U5
PMX_HA --- U6
end
subgraph Uplink["Internet / LAN"]
UPLINK_SW["Office LAN<br/>(1 GbE)"]
end
U1 ---|"1× Cat6A<br/>1 GbE"| UPLINK_SW
U1 ---|"Internet<br/>(ISP router)"| UPLINK_SW
```
---
## 9. Summary and key decisions
| Decision | Variant | Rationale |
|------------|----------|------------|
| Hypervisor | Proxmox VE | HYPERVISORS.md: "For SME / low budget — open source, built-in Ceph, no license costs". Ideal for demo. |
| Storage | Ceph (3× replication) | STORAGE.md + SERVER-CONFIG.md: Ceph is the recommended SDS for Proxmox, 3 nodes minimum for quorum. |
| CPU | Single-socket EPYC 9224 / Xeon 5418Y | Compromise between price (Mini variant ~1 socket) and performance for Ceph (Ceph variant ~12+ cores). |
| Network | 10 GbE SFP+ (instead of 25 GbE) | KB recommends 25 GbE, but for low-cost demo 10 GbE is sufficient. The concept (public/cluster network separation) remains the same. |
| Rack | 15U wall-mount | Suitable for office, no raised floor, no precision cooling. |
| UPS | 2000 VA line-interactive | DATACENTERS.md: VI type for smaller racks. Sufficient for demo. |
| License | Proxmox VE (free) | No license costs, support can be purchased later. |
### Compromises compared to production deployment
- **25 GbE → 10 GbE**: lower Ceph cluster network throughput (not an issue in demo environment)
- **HDD → SSD**: for Ceph OSD we choose SSD instead of HDD (higher price, better performance — demo focuses on functionality, not capacity)
- **2× 10 GbE public + 2× 10 GbE cluster → combined on LACP**: can be merged when ports are scarce, but separation is better
- **Cooling**: office AC, not DC-grade precision cooling (PUE ~1.51.8)
### What KB does not address (supplemented from practice)
KB does not contain specific component prices — the budget is an approximate market estimate. It also does not specify a concrete switch model with L2+ features (VLAN, LACP, Jumbo frames). Here we follow common practice for the SOHO/SME segment.
---
## 10. References from KB
- **DATACENTERS.md** — rack layout, power chain, UPS types, cooling classes (ASHRAE), cabling standards
- **HYPERVISORS.md** — Proxmox VE as open source variant, platform comparison, Mini variant (23 hosts), Ceph connectivity
- **SERVER-CONFIG.md** — Pure Ceph variant (36 hosts), HW specification, network design, BIOS settings
- **STORAGE.md** — Ceph architecture (MON/MGR/OSD, CRUSH map, BlueStore, replication), SDS overview
- **CONNECTIVITY.md** — Ethernet speeds (10/25 GbE), SFP+ form factor, NIC placement, management port
- **NETWORKING.md** — VLAN segmentation, MTU and jumbo frames, best practices
- **SERVER-HW.md** — CPU selection (EPYC vs Xeon), RAM population (1DPC/2DPC), NUMA, form factors
---
*Last revision: 2026-06-04*