This commit is contained in:
Stanislav Hubacek
2026-06-03 14:47:26 +02:00
parent 70ee14c2c2
commit c6fa0bff6a
31 changed files with 4212 additions and 0 deletions

204
CONNECTIVITY.md Normal file
View File

@@ -0,0 +1,204 @@
# 🔌 Server connectivity — síťová a storage konektivita
## Ethernet — síťová konektivita
### Rychlosti a formáty
| Rychlost | Označení | Form factor | Kabeláž | Rok standardu | Use case |
|----------|----------|-------------|---------|---------------|----------|
| **1 GbE** | 1000BASE-T | RJ45 (copper) | Cat5e/Cat6 | 1999 | Management, legacy |
| **10 GbE** | 10GBASE-T / SFP+ | RJ45 / SFP+ | Cat6A (30m) / Cat7 (100m) / DAC / SR/LR | 2006 | Běžný server, storage |
| **25 GbE** | 25GBASE-R | SFP28 | Cat8 (30m) / DAC (5m) / SR/LR (100m/10km) | 2016 | Standard pro servery (2020+) |
| **40 GbE** | 40GBASE-R | QSFP+ | DAC (7m) / SR (150m) / LR (10km) | 2010 | Legacy, spine |
| **50 GbE** | 50GBASE-R | SFP56 | DAC / SR / LR | 2018 | Emerging server |
| **100 GbE** | 100GBASE-R | QSFP28 | DAC (3m) / SR4 (100m) / LR4 (10km) / PSM4 (500m) | 2015 | Spine, storage, AI |
| **200 GbE** | 200GBASE-R | QSFP56 | DAC / SR4 / DR4 | 2019 | AI/ML, HPC |
| **400 GbE** | 400GBASE-R | QSFP-DD / OSFP | DAC (2.5m) / SR8 (100m) / DR4 (500m) / FR4 (2km) | 2017 | AI training, hyperscale |
| **800 GbE** | 800GBASE-R | QSFP-DD800 / OSFP | DAC (2m) / SR8 (100m) / DR8 (500m) | 2024 | Next-gen AI/ML |
**Doporučení pro servery (2026)**:
- **Standard**: 2× 25 GbE (management + data) nebo 2× 100 GbE pro náročné workloady
- **AI/ML training**: 8× 400 GbE (InfiniBand preferován pro GPU communication)
- **Storage**: 2× 25/100 GbE (iSCSI/NFS) nebo dedikovaná FC (16/32 Gbps)
### Form factor NIC
| Form factor | PCIe lanes | Rychlost | Use case |
|------------|-----------|----------|----------|
| **OCP 3.0** | x8/x16 | 25/100/200 GbE | Moderní servery (Dell, HPE), small form factor |
| **PCIe HHHL** | x8 | 25/50 GbE | Standardní 1U/2U servery |
| **PCIe FHHL** | x16 | 100/200/400 GbE | GPU servery, high-density |
| **Mezzanine** | x8 | 10/25 GbE | Blade servery (HPE Synergy, Dell MX) |
| **LOM (LAN on Motherboard)** | — | 1/10/25 GbE | Integrovaný, základní konektivita |
### NIC features
| Feature | Popis | Benefit |
|---------|-------|---------|
| **TSO/GRO** | TCP Segmentation Offload / Generic Receive Offload | Snížení CPU zátěže pro TCP |
| **LRO/LSO** | Large Receive/Send Offload | Obdoba TSO/GRO pro legacy |
| **RSS** | Receive Side Scaling | Distribuce příchozích packetů přes více CPU jader |
| **RPS/RFS** | Receive Packet Steering / Flow Steering | Softwarové RSS, cache affinity |
| **XDP** | eXpress Data Path | BPF-based packet processing (DDoS, load balancer) |
| **RDMA (RoCE v2)** | RDMA over Converged Ethernet | GPU direct communication, storage (NVMe-oF) |
| **iWARP** | RDMA over TCP | RDMA bez speciálního switch (vyšší latence) |
| **DPDK** | Data Plane Development Kit | Uživatelský prostor pro packet processing (VNF, vSwitch) |
| **VXLAN/NVGRE offload** | HW offload pro tunelování | Overlay networking (VMware NSX, OpenStack) |
| **SR-IOV** | Single Root I/O Virtualization | Direct NIC access pro VM (VF), nízká latence |
| **Flow Bifurcation** | Split NIC traffic mezi kernel a DPDK | Souběžný management a high-speed data path |
| **PTP (IEEE 1588)** | Precision Time Protocol | Finanční služby, 5G, telco |
### NIC selection per workload
| Workload | Doporučená NIC | Zdůvodnění |
|----------|---------------|------------|
| **Web / API servery** | 2× 25 GbE SFP28, OCP | Nízká cena, dostatečná bandwidth |
| **Virtualizace (VMware)** | 2× 25 GbE (SR-IOV, VXLAN offload) | SR-IOV pro VM, VXLAN pro NSX |
| **Databáze (OLTP)** | 2× 25/100 GbE (RSS, low latency) | Nízká latence, RSS pro CPU scaling |
| **Storage (NFS/iSCSI)** | 2× 25/100 GbE (RoCE v2) | RDMA pro NVMe-oF, low latency |
| **Storage (FC SAN)** | 2× 32 Gb FC HBA | SAN pro VMware VMFS, block storage |
| **AI/ML training** | 8× 400 GbE + InfiniBand NDR | GPU communication, data ingestion |
| **AI/ML inference** | 4× 100 GbE (RoCE v2) | Model serving, GPU direct |
| **HPC** | InfiniBand NDR 400 Gbps | MPI communication, low latency |
| **Telco / Edge** | 2× 25 GbE (DPDK, PTP) | VNF, 5G UPF, low latency |
---
## Storage connectivity
### Fibre Channel (FC) SAN
| Generace | Rychlost | Označení | Form factor | Dosah (SMF) | Use case |
|----------|----------|----------|-------------|-------------|----------|
| **Gen 5** | 16 Gbps | 16GFC | SFP+ | 10 km | Legacy SAN |
| **Gen 6** | 32 Gbps | 32GFC | SFP28 | 10 km | Současný standard |
| **Gen 7** | 64 Gbps | 64GFC | SFP56 | 10 km | Emerging, high-performance |
| **Gen 8** | 128 Gbps | 128GFC | QSFP28 | 10 km | Budoucnost (2026+) |
**HBA (Host Bus Adapter)**:
| Výrobce | Model | Rychlost | PCIe | Porty | Features |
|---------|-------|----------|------|-------|----------|
| **Broadcom / Emulex** | LPe35000 | 32 GFC | PCIe 3.0 x8 | 1-2 | NVMe-FC, T10-PI, SR-IOV |
| **Broadcom / Emulex** | LPe36000 | 64 GFC | PCIe 4.0 x16 | 1-2 | NVMe-FC, FC-NVMe |
| **Marvell / QLogic** | QLE2770 | 32 GFC | PCIe 3.0 x8 | 1-2 | FC-NVMe, T10-PI |
| **Marvell / QLogic** | QLE2870 | 64 GFC | PCIe 4.0 x8 | 1-2 | NVMe-FC, 64GFC |
**FC SAN topology**:
```
Server ──HBA── FC Switch ──── Storage Array (FC port)
│ │
│ ┌────┴────┐
│ │ Fabric │
│ └─────────┘
──── ISL (Inter-Switch Link) ──── backup fabric (B)
```
**Zoning** (FC):
```
Zone A: Server1_HBA1 + Storage_Port1 (production)
Zone B: Server1_HBA2 + Storage_Port2 (backup fabric)
Zone C: Backup_Server + Storage_Target (backup)
```
### iSCSI
| Vlastnost | iSCSI | Poznámka |
|-----------|-------|----------|
| **Transport** | TCP/IP (port 3260) | Po standardním ethernetu |
| **Rychlost** | 1/10/25/100 GbE | Stejná jako Ethernet |
| **Initiator** | SW (OS) nebo HW (TOE) | SW initiator zdarma, ~5-10 % CPU load |
| **Multipathing** | MPIO (Multiple Connections per Session) | Až 8 cest, active/active nebo active/passive |
| **CHAP** | Authentication | Mutual CHAP doporučen |
| **Jumbo frames** | Doporučeno MTU 9000 | Snížení CPU overhead, vyšší throughput |
| **Use case** | Malé a střední SAN, backup, DR | Levnější než FC, nižší výkon |
**iSCSI configuration**:
```
# Software initiator (Linux)
iscsiadm -m discovery -t sendtargets -p 10.0.0.100:3260
iscsiadm -m node --login -T iqn.2024-05.storage:array01
# Multipath (dm-multipath)
mpathconf --enable --with_multipathd y
# /etc/multipath.conf: aliases, failback, rr_min_io
```
### NVMe-oF (NVMe over Fabrics)
| Transport | Protokol | Latence | CPU overhead | Use case |
|-----------|----------|---------|-------------|----------|
| **NVMe over FC** | FC-NVMe (FC Gen 6/7) | <10 µs | Nízký | Enterprise SAN, VMware |
| **NVMe over RDMA (RoCE v2)** | RDMA (RoCE) | <5 µs | Velmi nízký | AI/ML, HPC, K8s (CSI) |
| **NVMe over TCP** | TCP | ~50 µs | Střední (10-20 % CPU) | Standardní Ethernet, bez RDMA |
| **NVMe over InfiniBand** | IB RC/UC | <3 µs | Nejnižší | HPC, AI training |
**NVMe-oF comparison**:
| Vlastnost | FC-NVMe | NVMe/RoCE | NVMe/TCP | NVMe/IB |
|-----------|---------|-----------|----------|---------|
| **Latence (target)** | ~8 µs | ~4 µs | ~50 µs | ~3 µs |
| **Bandwidth** | 64 Gbps | 100/200 GbE | 25/100 GbE | NDR 400 Gbps |
| **Requires special HW** | FC HBA + switch | RoCE NIC + DCB switch | Standard NIC | IB HCA + switch |
| **Ecosystem** | Broadcom, Marvell | NVIDIA, Broadcom | OS built-in | NVIDIA Mellanox |
| **Use case** | VMware, enterprise SAN | AI/ML, K8s, HPC | SMB, K8s, cost-effective | HPC, large AI |
### SAS (Serial Attached SCSI)
| Generace | Rychlost | Kabeláž | Dosah | Use case |
|----------|----------|---------|-------|----------|
| **SAS 3** | 12 Gbps | SAS cable (SFF-8644) | 6-10 m | Legacy storage, DAS |
| **SAS 4** | 22.5 Gbps | SAS cable (SFF-8644) | 6-10 m | Současný standard |
| **SAS 5** | 45 Gbps | SAS cable (SFF-8644) | 6-10 m | Emerging |
**SAS topology**: Server → SAS HBA → SAS expander → SAS disk (point-to-point, ne shared jako FC)
---
## Server connectivity — decision matrix
| Workload | Primární | Sekundární | Management |
|----------|----------|-----------|------------|
| **Web / API** | 2× 25 GbE (LACP) | — | 1× 1 GbE BMC |
| **Databáze** | 2× 25/100 GbE (RSS) | 2× 32 Gb FC (SAN) | 1× 1 GbE BMC |
| **Virtualizace** | 4× 25 GbE (SR-IOV) | 2× 32 Gb FC (VMFS) | 1× 1 GbE BMC |
| **Kubernetes** | 2× 25/100 GbE | — | 1× 1 GbE BMC |
| **Storage node** | 2× 100 GbE (RoCE) | 2× 25 GbE (management) | 1× 1 GbE BMC |
| **AI training** | 8× 400 GbE + IB NDR | 4× 100 GbE (storage) | 1× 1 GbE BMC |
| **AI inference** | 4× 100 GbE (RoCE) | 2× 25 GbE (management) | 1× 1 GbE BMC |
| **HPC** | InfiniBand NDR | 2× 100 GbE (storage) | 1× 1 GbE BMC |
---
## Server NIC placement (PCIe slot optimization)
```
2U Server (GPU/AI):
┌─────────────────────────────────────────────────┐
│ PCIe 0: GPU (x16) — NVLink / InfiniBand (x16) │
│ PCIe 1: GPU (x16) — NIC 100 GbE (x16) │
│ PCIe 2: GPU (x16) │
│ PCIe 3: GPU (x16) │
│ PCIe 4: GPU (x16) │
│ PCIe 5: GPU (x16) — NIC 100 GbE (x16) │
│ PCIe 6: Storage HBA / NIC (x8) │
│ PCIe 7: Management / OCP (x8) │
└─────────────────────────────────────────────────┘
1U Standard:
┌─────────────────────────────────┐
│ OCP: 2× 25 GbE (management) │
│ PCIe 0: NIC 25 GbE (x8) │
│ PCIe 1: Storage HBA / FC (x8) │
│ PCIe 2: GPU (x16, optional) │
│ PCIe 3: NVMe (x4, M.2) │
└─────────────────────────────────┘
```
## Zdroje
Odkazy, knihy a standardy: [sources/infrastructure/sources.md](sources/infrastructure/sources.md)