Stanislav Hubacek c6fa0bff6a commit
2026-06-11 15:27:28 +02:00

# 🔌 Server connectivity: network and storage connectivity
## Ethernet: network connectivity
### Speeds and form factors
| Speed | Designation | Form factor | Cabling | Standard year | Use case |
|-------|-------------|-------------|---------|---------------|----------|
| **1 GbE** | 1000BASE-T | RJ45 (copper) | Cat5e/Cat6 | 1999 | Management, legacy |
| **10 GbE** | 10GBASE-T / SFP+ | RJ45 / SFP+ | Cat6 (55 m) / Cat6A (100 m) / DAC / SR/LR | 2002 (fiber) / 2006 (10GBASE-T) | Typical server, storage |
| **25 GbE** | 25GBASE-R | SFP28 | DAC (5 m) / SR (100 m) / LR (10 km); Cat8 (30 m) applies to 25GBASE-T on RJ45 | 2016 | Standard for servers (2020+) |
| **40 GbE** | 40GBASE-R | QSFP+ | DAC (7 m) / SR4 (150 m) / LR4 (10 km) | 2010 | Legacy, spine |
| **50 GbE** | 50GBASE-R | SFP56 | DAC / SR / LR | 2018 | Emerging server |
| **100 GbE** | 100GBASE-R | QSFP28 | DAC (3 m) / SR4 (100 m) / LR4 (10 km) / PSM4 (500 m) | 2010 (QSFP28 variants 2014-2015) | Spine, storage, AI |
| **200 GbE** | 200GBASE-R | QSFP56 | DAC / SR4 / DR4 | 2017 | AI/ML, HPC |
| **400 GbE** | 400GBASE-R | QSFP-DD / OSFP | DAC (2.5 m) / SR8 (100 m) / DR4 (500 m) / FR4 (2 km) | 2017 | AI training, hyperscale |
| **800 GbE** | 800GBASE-R | QSFP-DD800 / OSFP | DAC (2 m) / SR8 (100 m) / DR8 (500 m) | 2024 | Next-gen AI/ML |
**Recommendations for servers (2026)**:
- **Standard**: 2× 25 GbE (management + data) or 2× 100 GbE for demanding workloads
- **AI/ML training**: 8× 400 GbE (InfiniBand preferred for GPU-to-GPU communication)
- **Storage**: 2× 25/100 GbE (iSCSI/NFS) or dedicated FC (16/32 Gbps)
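
**Worked example (sketch)**: to put the link speeds above in perspective, the following estimates how long a bulk transfer would take at line rate. The 1 TB dataset size, the `efficiency` parameter, and the function name are illustrative assumptions, not measurements; real throughput is lower because of protocol overhead and storage limits.

```python
def transfer_seconds(size_bytes: float, link_gbps: float,
                     links: int = 1, efficiency: float = 1.0) -> float:
    """Ideal time to move size_bytes over `links` parallel links of link_gbps each."""
    if link_gbps <= 0 or links <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_gbps and links must be positive, efficiency in (0, 1]")
    usable_bits_per_second = link_gbps * 1e9 * links * efficiency
    return size_bytes * 8 / usable_bits_per_second


DATASET_BYTES = 1e12  # hypothetical 1 TB dataset

# The three recommendations from the list above
for label, gbps, links in [("2x 25 GbE", 25, 2), ("2x 100 GbE", 100, 2), ("8x 400 GbE", 400, 8)]:
    print(f"{label}: {transfer_seconds(DATASET_BYTES, gbps, links):.1f} s")
```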
### NIC form factors
| Form factor | PCIe lanes | Speed | Use case |
|------------|-----------|-------|----------|
| **OCP 3.0** | x8/x16 | 25/100/200 GbE | Modern servers (Dell, HPE), small form factor |
| **PCIe HHHL** | x8 | 25/50 GbE | Standard 1U/2U servers |
| **PCIe FHHL** | x16 | 100/200/400 GbE | GPU servers, high-density |
| **Mezzanine** | x8 | 10/25 GbE | Blade servers (HPE Synergy, Dell MX) |
| **LOM (LAN on Motherboard)** | n/a | 1/10/25 GbE | Integrated, basic connectivity |
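
**Worked example (sketch)**: whether a slot can feed a NIC at line rate depends on the PCIe generation as well as the lane count. The check below uses the per-lane signalling rates of PCIe 3.0/4.0/5.0 (8/16/32 GT/s with 128b/130b encoding). It ignores TLP and flow-control overhead, which is a simplifying assumption, and the function names are hypothetical.

```python
PCIE_GT_PER_LANE = {3: 8.0, 4: 16.0, 5: 32.0}  # GT/s per lane and direction
ENCODING_EFFICIENCY = 128 / 130                 # 128b/130b line coding (PCIe 3.0 and later)


def pcie_gbps(gen: int, lanes: int) -> float:
    """Raw usable bandwidth of a PCIe link in Gbps, one direction."""
    return PCIE_GT_PER_LANE[gen] * ENCODING_EFFICIENCY * lanes


def slot_can_feed(gen: int, lanes: int, ports: int, port_gbps: float) -> bool:
    """True if the slot bandwidth covers all NIC ports running at line rate."""
    return pcie_gbps(gen, lanes) >= ports * port_gbps


print(f"PCIe 3.0 x8:  {pcie_gbps(3, 8):.1f} Gbps")
print(f"PCIe 4.0 x16: {pcie_gbps(4, 16):.1f} Gbps")
print(f"PCIe 5.0 x16: {pcie_gbps(5, 16):.1f} Gbps")
print("2x 25 GbE on PCIe 3.0 x8:", slot_can_feed(3, 8, 2, 25))
print("1x 400 GbE on PCIe 4.0 x16:", slot_can_feed(4, 16, 1, 400))
print("1x 400 GbE on PCIe 5.0 x16:", slot_can_feed(5, 16, 1, 400))
```

Under these assumptions a single 400 GbE port needs a PCIe 5.0 x16 slot, while 2× 25 GbE fits comfortably in PCIe 3.0 x8.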
### NIC features
| Feature | Description | Benefit |
|---------|-------------|---------|
| **TSO/GRO** | TCP Segmentation Offload / Generic Receive Offload | Lower CPU load for TCP |
| **LRO/LSO** | Large Receive/Send Offload | Older counterparts of TSO/GRO |
| **RSS** | Receive Side Scaling | Distributes incoming packets across multiple CPU cores |
| **RPS/RFS** | Receive Packet Steering / Receive Flow Steering | Software RSS, cache affinity |
| **XDP** | eXpress Data Path | BPF-based packet processing (DDoS mitigation, load balancing) |
| **RDMA (RoCE v2)** | RDMA over Converged Ethernet | GPU direct communication, storage (NVMe-oF) |
| **iWARP** | RDMA over TCP | RDMA without special switches (higher latency) |
| **DPDK** | Data Plane Development Kit | User-space packet processing (VNF, vSwitch) |
| **VXLAN/NVGRE offload** | Hardware offload for tunneling | Overlay networking (VMware NSX, OpenStack) |
| **SR-IOV** | Single Root I/O Virtualization | Direct NIC access for VMs (VF), low latency |
| **Flow Bifurcation** | Splits NIC traffic between the kernel and DPDK | Concurrent management and high-speed data path |
| **PTP (IEEE 1588)** | Precision Time Protocol | Precise time synchronization (financial services, 5G, telco) |
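
**RSS in a nutshell (sketch)**: the NIC hashes each packet's flow tuple and uses the low-order bits of the hash as an index into an indirection table that maps to receive queues, so all packets of one flow land on the same core. Real NICs typically use a Toeplitz hash with a secret key; the sketch below substitutes `zlib.crc32` purely for illustration, and the 128-entry table size and all function names are assumptions.

```python
import ipaddress
import struct
import zlib


def flow_hash(src_ip: str, dst_ip: str, src_port: int, dst_port: int) -> int:
    """Stand-in flow hash (CRC32, NOT the Toeplitz hash used by real NICs)."""
    key = (ipaddress.ip_address(src_ip).packed
           + ipaddress.ip_address(dst_ip).packed
           + struct.pack("!HH", src_port, dst_port))
    return zlib.crc32(key)


def build_indirection_table(num_queues: int, size: int = 128) -> list:
    """Spread queues round-robin over the indirection table."""
    return [i % num_queues for i in range(size)]


def select_queue(table: list, hash_value: int) -> int:
    """Index the table with the hash (equals the low bits for power-of-two sizes)."""
    return table[hash_value % len(table)]


table = build_indirection_table(num_queues=8)
for port in (40000, 40001, 40002):
    h = flow_hash("10.0.0.1", "10.0.0.100", port, 3260)
    print(f"src port {port} -> queue {select_queue(table, h)}")
```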
### NIC selection per workload
| Workload | Recommended NIC | Rationale |
|----------|-----------------|-----------|
| **Web / API servers** | 2× 25 GbE SFP28, OCP | Low cost, sufficient bandwidth |
| **Virtualization (VMware)** | 2× 25 GbE (SR-IOV, VXLAN offload) | SR-IOV for VMs, VXLAN for NSX |
| **Databases (OLTP)** | 2× 25/100 GbE (RSS, low latency) | Low latency, RSS for CPU scaling |
| **Storage (NFS/iSCSI)** | 2× 25/100 GbE (RoCE v2) | RDMA for NVMe-oF, low latency |
| **Storage (FC SAN)** | 2× 32 Gb FC HBA | SAN for VMware VMFS, block storage |
| **AI/ML training** | 8× 400 GbE + InfiniBand NDR | GPU communication, data ingestion |
| **AI/ML inference** | 4× 100 GbE (RoCE v2) | Model serving, GPU direct |
| **HPC** | InfiniBand NDR 400 Gbps | MPI communication, low latency |
| **Telco / Edge** | 2× 25 GbE (DPDK, PTP) | VNF, 5G UPF, low latency |
---
## Storage connectivity
### Fibre Channel (FC) SAN
| Generation | Speed | Designation | Form factor | Reach (SMF) | Use case |
|------------|-------|-------------|-------------|-------------|----------|
| **Gen 5** | 16 Gbps | 16GFC | SFP+ | 10 km | Legacy SAN |
| **Gen 6** | 32 Gbps | 32GFC | SFP28 | 10 km | Current standard |
| **Gen 7** | 64 Gbps | 64GFC | SFP56 | 10 km | Emerging, high-performance |
| **Gen 8** | 128 Gbps | 128GFC | QSFP28 | 10 km | Future (2026+) |
**HBA (Host Bus Adapter)**:
| Vendor | Model | Speed | PCIe | Ports | Features |
|--------|-------|-------|------|-------|----------|
| **Broadcom / Emulex** | LPe35000 | 32GFC | PCIe 3.0 x8 | 1-2 | FC-NVMe, T10-PI, SR-IOV |
| **Broadcom / Emulex** | LPe36000 | 64GFC | PCIe 4.0 x16 | 1-2 | FC-NVMe |
| **Marvell / QLogic** | QLE2770 | 32GFC | PCIe 3.0 x8 | 1-2 | FC-NVMe, T10-PI |
| **Marvell / QLogic** | QLE2870 | 64GFC | PCIe 4.0 x8 | 1-2 | FC-NVMe |
**FC SAN topology**:
```
Server ──HBA── FC Switch ──── Storage Array (FC port)
  │                │
  │           ┌────┴────┐
  │           │ Fabric  │
  │           └─────────┘
  └──── ISL (Inter-Switch Link) ──── backup fabric (B)
```
**Zoning** (FC):
```
Zone A: Server1_HBA1 + Storage_Port1 (production)
Zone B: Server1_HBA2 + Storage_Port2 (backup fabric)
Zone C: Backup_Server + Storage_Target (backup)
```
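
**Zoning check (sketch)**: two ports can talk only if they share at least one zone. The following models the three zones above as sets and verifies reachability, plus the common single-initiator-per-zone convention. Which members count as targets is an assumption made for this example, and the helper names are hypothetical.

```python
ZONES = {
    "Zone A": {"Server1_HBA1", "Storage_Port1"},
    "Zone B": {"Server1_HBA2", "Storage_Port2"},
    "Zone C": {"Backup_Server", "Storage_Target"},
}

# Assumption: these members are storage targets, everything else is an initiator.
TARGETS = {"Storage_Port1", "Storage_Port2", "Storage_Target"}


def can_communicate(zones: dict, a: str, b: str) -> bool:
    """Two ports see each other only if some zone contains both."""
    return any(a in members and b in members for members in zones.values())


def is_single_initiator(members: set, targets: set) -> bool:
    """True if the zone holds exactly one non-target (initiator) member."""
    return len(members - targets) == 1


print(can_communicate(ZONES, "Server1_HBA1", "Storage_Port1"))  # same zone
print(can_communicate(ZONES, "Server1_HBA1", "Storage_Port2"))  # different zones
for name, members in ZONES.items():
    print(name, "single initiator:", is_single_initiator(members, TARGETS))
```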
### iSCSI
| Property | iSCSI | Note |
|----------|-------|------|
| **Transport** | TCP/IP (port 3260) | Runs over standard Ethernet |
| **Speed** | 1/10/25/100 GbE | Same as the underlying Ethernet |
| **Initiator** | SW (OS) or HW (TOE) | SW initiator is free, ~5-10 % CPU load |
| **Multipathing** | MPIO (Multipath I/O) or MC/S (Multiple Connections per Session) | Up to 8 paths, active/active or active/passive |
| **CHAP** | Authentication | Mutual CHAP recommended |
| **Jumbo frames** | MTU 9000 recommended | Lower CPU overhead, higher throughput |
| **Use case** | Small and mid-size SAN, backup, DR | Cheaper than FC, lower performance |
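
**Why jumbo frames help (worked example)**: every Ethernet frame carries fixed overhead, so larger frames mean fewer frames per second and a better payload ratio. The sketch below assumes IPv4 and TCP headers without options and ignores the iSCSI PDU header; the function names are illustrative.

```python
ETH_OVERHEAD = 14 + 4 + 8 + 12  # header + FCS + preamble/SFD + inter-frame gap (bytes)
IP_TCP_HEADERS = 20 + 20        # IPv4 + TCP, no options (assumption)


def tcp_efficiency(mtu: int) -> float:
    """Fraction of wire bytes that is TCP payload for full-size frames."""
    return (mtu - IP_TCP_HEADERS) / (mtu + ETH_OVERHEAD)


def frames_per_second(link_gbps: float, mtu: int) -> float:
    """Full-size frames per second needed to saturate the link."""
    return link_gbps * 1e9 / ((mtu + ETH_OVERHEAD) * 8)


for mtu in (1500, 9000):
    print(f"MTU {mtu}: efficiency {tcp_efficiency(mtu):.2%}, "
          f"{frames_per_second(10, mtu):,.0f} frames/s at 10 GbE")
```

The payload gain is only a few percent; the larger effect is the roughly sixfold drop in frames (and therefore per-packet CPU work) at the same throughput.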
**iSCSI configuration**:
```
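# Software initiator (Linux, open-iscsi): discover targets on the portal
iscsiadm -m discovery -t sendtargets -p 10.0.0.100:3260
# Log in to the discovered target on that portal
iscsiadm -m node -T iqn.2024-05.storage:array01 -p 10.0.0.100:3260 --login
# Multipath (dm-multipath): create /etc/multipath.conf and start multipathd (RHEL-family helper)
mpathconf --enable --with_multipathd y
# Tune /etc/multipath.conf as needed: aliases, failback, rr_min_io
# Verify that all paths are visible
multipath -ll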
```
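
**Target name format (sketch)**: the target above uses the IQN scheme `iqn.<yyyy-mm>.<naming authority>[:<unique name>]`. A simplified validator, useful for catching typos before they reach `iscsiadm`, might look like this. The regular expression is a deliberately strict approximation, not a complete implementation of the iSCSI naming rules, and `parse_iqn` is a hypothetical helper.

```python
import re

# Simplified pattern: lowercase authority, month 01-12, optional ":unique" suffix.
IQN_RE = re.compile(
    r"^iqn\.(\d{4})-(0[1-9]|1[0-2])\."
    r"([a-z0-9](?:[a-z0-9.-]*[a-z0-9])?)"
    r"(?::(.+))?$"
)


def parse_iqn(name: str) -> dict:
    """Split an IQN into its parts or raise ValueError if it does not match."""
    match = IQN_RE.match(name)
    if match is None:
        raise ValueError(f"not a valid IQN (simplified check): {name!r}")
    year, month, authority, unique = match.groups()
    return {"year": int(year), "month": int(month),
            "authority": authority, "unique": unique}


print(parse_iqn("iqn.2024-05.storage:array01"))
```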
### NVMe-oF (NVMe over Fabrics)
| Transport | Protocol | Latency | CPU overhead | Use case |
|-----------|----------|---------|-------------|----------|
| **NVMe over FC** | FC-NVMe (FC Gen 6/7) | <10 µs | Low | Enterprise SAN, VMware |
| **NVMe over RDMA (RoCE v2)** | RDMA (RoCE) | <5 µs | Very low | AI/ML, HPC, K8s (CSI) |
| **NVMe over TCP** | TCP | ~50 µs | Medium (10-20 % CPU) | Standard Ethernet, no RDMA |
| **NVMe over InfiniBand** | IB RC/UC | <3 µs | Lowest | HPC, AI training |
**NVMe-oF comparison**:
| Property | FC-NVMe | NVMe/RoCE | NVMe/TCP | NVMe/IB |
|----------|---------|-----------|----------|---------|
| **Latency (target)** | ~8 µs | ~4 µs | ~50 µs | ~3 µs |
| **Bandwidth** | 64 Gbps | 100/200 GbE | 25/100 GbE | NDR 400 Gbps |
| **Requires special HW** | FC HBA + switch | RoCE NIC + DCB switch | Standard NIC | IB HCA + switch |
| **Ecosystem** | Broadcom, Marvell | NVIDIA, Broadcom | OS built-in | NVIDIA Mellanox |
| **Use case** | VMware, enterprise SAN | AI/ML, K8s, HPC | SMB, K8s, cost-effective | HPC, large AI |
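
**What the latency means for IOPS (worked example)**: by Little's law, sustained IOPS equal the number of outstanding I/Os divided by the per-I/O latency. The sketch below adds the fabric latency targets from the comparison table to a device latency of 80 µs, which is a purely hypothetical figure chosen for illustration; the function name is also an assumption.

```python
FABRIC_LATENCY_US = {"FC-NVMe": 8, "NVMe/RoCE": 4, "NVMe/TCP": 50, "NVMe/IB": 3}
DEVICE_LATENCY_US = 80  # hypothetical SSD service time, not a measured value


def iops(fabric_us: float, device_us: float, queue_depth: int = 1) -> float:
    """Little's law: throughput = outstanding I/Os / latency per I/O."""
    return queue_depth / ((fabric_us + device_us) * 1e-6)


for transport, fabric_us in FABRIC_LATENCY_US.items():
    qd1 = iops(fabric_us, DEVICE_LATENCY_US)
    qd32 = iops(fabric_us, DEVICE_LATENCY_US, queue_depth=32)
    print(f"{transport}: {qd1:,.0f} IOPS at QD1, {qd32:,.0f} IOPS at QD32")
```

The transport matters most at low queue depth; with many outstanding I/Os the device or link bandwidth tends to become the limit instead.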
### SAS (Serial Attached SCSI)
| Generation | Speed | Cabling | Reach | Use case |
|------------|-------|---------|-------|----------|
| **SAS 3** | 12 Gbps | SAS cable (SFF-8644) | 6-10 m | Legacy storage, DAS |
| **SAS 4** | 22.5 Gbps | SAS cable (SFF-8644) | 6-10 m | Current standard |
| **SAS 5** | 45 Gbps | SAS cable (SFF-8644) | 6-10 m | Emerging |
**SAS topology**: Server → SAS HBA → SAS expander → SAS disk (point-to-point links, not a shared fabric like FC)
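
**Usable SAS bandwidth (worked example)**: the line rates in the table include encoding overhead. SAS 3 uses 8b/10b coding and SAS 4 uses 128b/150b, which gives about 1200 MB/s and 2400 MB/s per lane. The sketch below also assumes a 4-lane wide port, the usual width of an SFF-8644 connector; SAS 5 is left out because its encoding is not stated here.

```python
# (line rate in Gbps, encoding efficiency)
SAS_LINKS = {
    "SAS 3": (12.0, 8 / 10),      # 8b/10b
    "SAS 4": (22.5, 128 / 150),   # 128b/150b
}


def lane_mb_per_s(generation: str) -> float:
    """Usable payload bandwidth of a single lane in MB/s."""
    rate_gbps, efficiency = SAS_LINKS[generation]
    return rate_gbps * 1e9 * efficiency / 8 / 1e6


def wide_port_mb_per_s(generation: str, lanes: int = 4) -> float:
    """Usable bandwidth of a wide port (assumed x4 by default)."""
    return lane_mb_per_s(generation) * lanes


for gen in SAS_LINKS:
    print(f"{gen}: {lane_mb_per_s(gen):.0f} MB/s per lane, "
          f"{wide_port_mb_per_s(gen):.0f} MB/s per x4 port")
```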
---
## Server connectivity — decision matrix
| Workload | Primary | Secondary | Management |
|----------|---------|-----------|------------|
| **Web / API** | 2× 25 GbE (LACP) | n/a | 1× 1 GbE BMC |
| **Databases** | 2× 25/100 GbE (RSS) | 2× 32 Gb FC (SAN) | 1× 1 GbE BMC |
| **Virtualization** | 4× 25 GbE (SR-IOV) | 2× 32 Gb FC (VMFS) | 1× 1 GbE BMC |
| **Kubernetes** | 2× 25/100 GbE | — | 1× 1 GbE BMC |
| **Storage node** | 2× 100 GbE (RoCE) | 2× 25 GbE (management) | 1× 1 GbE BMC |
| **AI training** | 8× 400 GbE + IB NDR | 4× 100 GbE (storage) | 1× 1 GbE BMC |
| **AI inference** | 4× 100 GbE (RoCE) | 2× 25 GbE (management) | 1× 1 GbE BMC |
| **HPC** | InfiniBand NDR | 2× 100 GbE (storage) | 1× 1 GbE BMC |
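
**Aggregate bandwidth per row (sketch)**: when comparing rows of the matrix it can help to turn a cell such as `2× 25/100 GbE` into a bandwidth range. The parser below is a hypothetical helper that only understands the `N× S GbE` and `N× S Gb FC` notation used in this table; InfiniBand entries are not counted.

```python
import re

# count × speed[/alternative speed] unit, e.g. "2× 25/100 GbE" or "2× 32 Gb FC"
PORT_RE = re.compile(r"(\d+)×\s*(\d+)(?:/(\d+))?\s*(GbE|Gb FC)")


def port_bandwidth_range(spec: str) -> tuple:
    """Return (low, high) aggregate Gbps for the ports named in a table cell."""
    low = high = 0
    for count, speed, alt_speed, _unit in PORT_RE.findall(spec):
        speeds = [int(speed)] + ([int(alt_speed)] if alt_speed else [])
        low += int(count) * min(speeds)
        high += int(count) * max(speeds)
    return low, high


for cell in ("2× 25 GbE (LACP)", "2× 25/100 GbE (RSS)",
             "2× 32 Gb FC (SAN)", "8× 400 GbE + IB NDR"):
    print(f"{cell}: {port_bandwidth_range(cell)} Gbps")
```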
---
## Server NIC placement (PCIe slot optimization)
```
2U Server (GPU/AI):
┌─────────────────────────────────────────────────┐
│ PCIe 0: GPU (x16) — NVLink / InfiniBand (x16) │
│ PCIe 1: GPU (x16) — NIC 100 GbE (x16) │
│ PCIe 2: GPU (x16) │
│ PCIe 3: GPU (x16) │
│ PCIe 4: GPU (x16) │
│ PCIe 5: GPU (x16) — NIC 100 GbE (x16) │
│ PCIe 6: Storage HBA / NIC (x8) │
│ PCIe 7: Management / OCP (x8) │
└─────────────────────────────────────────────────┘
1U Standard:
┌─────────────────────────────────┐
│ OCP: 2× 25 GbE (management) │
│ PCIe 0: NIC 25 GbE (x8) │
│ PCIe 1: Storage HBA / FC (x8) │
│ PCIe 2: GPU (x16, optional) │
│ PCIe 3: NVMe (x4, M.2) │
└─────────────────────────────────┘
```
## Sources
Links, books, and standards: [sources/infrastructure/sources.md](sources/infrastructure/sources.md)