# 🔌 Server connectivity: network and storage

## Ethernet: network connectivity

### Speeds and formats

| Speed | Designation | Form factor | Cabling (max reach) | Standard year | Use case |
|-------|-------------|-------------|---------------------|---------------|----------|
| **1 GbE** | 1000BASE-T | RJ45 (copper) | Cat5e/Cat6 (100 m) | 1999 | Management, legacy |
| **10 GbE** | 10GBASE-T / 10GBASE-R | RJ45 / SFP+ | Cat6 (55 m) / Cat6A (100 m) / DAC / SR / LR | 2006 (10GBASE-T) | Common server, storage |
| **25 GbE** | 25GBASE-R | SFP28 | Cat8 (30 m) / DAC (5 m) / SR (100 m) / LR (10 km) | 2016 | Standard for servers (2020+) |
| **40 GbE** | 40GBASE-R | QSFP+ | DAC (7 m) / SR4 (150 m) / LR4 (10 km) | 2010 | Legacy, spine |
| **50 GbE** | 50GBASE-R | SFP56 | DAC / SR / LR | 2018 | Emerging server |
| **100 GbE** | 100GBASE-R | QSFP28 | DAC (3 m) / SR4 (100 m) / LR4 (10 km) / PSM4 (500 m) | 2015 | Spine, storage, AI |
| **200 GbE** | 200GBASE-R | QSFP56 | DAC / SR4 / DR4 | 2017 | AI/ML, HPC |
| **400 GbE** | 400GBASE-R | QSFP-DD / OSFP | DAC (2.5 m) / SR8 (100 m) / DR4 (500 m) / FR4 (2 km) | 2017 | AI training, hyperscale |
| **800 GbE** | 800GBASE-R | QSFP-DD800 / OSFP | DAC (2 m) / SR8 (100 m) / DR8 (500 m) | 2024 | Next-gen AI/ML |

**Recommendations for servers (2026)**:
- **Standard**: 2× 25 GbE (management + data) or 2× 100 GbE for demanding workloads
- **AI/ML training**: 8× 400 GbE (InfiniBand preferred for GPU communication)
- **Storage**: 2× 25/100 GbE (iSCSI/NFS) or dedicated FC (16/32 Gbps)

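One way to confirm that a port actually negotiated the speed planned above is to read it from sysfs. The following is a minimal sketch, assuming a Linux host; the function name and the `SYSFS_ROOT` override are illustrative and not part of any standard tool.

```shell
#!/usr/bin/env bash
# Succeeds when the interface runs at or above the expected speed (in Gb/s).
# SYSFS_ROOT is a hypothetical override used only for testing; it defaults to /sys.
link_at_least_gbps() {
    local iface="$1" want_gbps="$2" mbps
    # The kernel reports the negotiated speed in Mb/s; a link that is down
    # typically reads as -1 or fails to read at all.
    mbps="$(cat "${SYSFS_ROOT:-/sys}/class/net/${iface}/speed" 2>/dev/null)" || return 2
    [ "$mbps" -ge "$(( want_gbps * 1000 ))" ]
}
```

For example, `link_at_least_gbps eth0 25 || echo "eth0 is below 25 GbE"` flags a port that fell back to a lower rate (`eth0` is a placeholder name). `ethtool eth0` reports the same value along with the supported link modes.
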
### NIC form factor

| Form factor | PCIe lanes | Speed | Use case |
|-------------|------------|-------|----------|
| **OCP 3.0** | x8/x16 | 25/100/200 GbE | Modern servers (Dell, HPE), small form factor |
| **PCIe HHHL** | x8 | 25/50 GbE | Standard 1U/2U servers |
| **PCIe FHHL** | x16 | 100/200/400 GbE | GPU servers, high-density |
| **Mezzanine** | x8 | 10/25 GbE | Blade servers (HPE Synergy, Dell MX) |
| **LOM (LAN on Motherboard)** | n/a | 1/10/25 GbE | Integrated, basic connectivity |

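The lane counts in the table follow from simple arithmetic: a slot must carry at least the aggregate port speed of the NIC. The sketch below estimates raw slot bandwidth for PCIe 3.0 to 5.0, which all use 128b/130b encoding at 8, 16, and 32 GT/s per lane. It ignores packet (TLP) overhead, so real throughput is somewhat lower, and the function names are illustrative.

```shell
#!/usr/bin/env bash
# Raw PCIe bandwidth in Gb/s for generations 3 to 5 (128b/130b line encoding).
# Packet (TLP) overhead is ignored, so achievable throughput is lower.
pcie_gbps() {
    local gen="$1" lanes="$2" gts
    case "$gen" in
        3) gts=8 ;;
        4) gts=16 ;;
        5) gts=32 ;;
        *) echo "unsupported PCIe generation: $gen" >&2; return 1 ;;
    esac
    awk -v g="$gts" -v l="$lanes" 'BEGIN { printf "%.1f\n", g * 128 / 130 * l }'
}

# Succeeds when the slot can feed the given aggregate port speed (Gb/s).
slot_fits_nic() {
    local gen="$1" lanes="$2" nic_gbps="$3" slot
    slot="$(pcie_gbps "$gen" "$lanes")" || return 2
    awk -v s="$slot" -v n="$nic_gbps" 'BEGIN { exit !(s >= n) }'
}
```

Under these assumptions `pcie_gbps 3 8` prints `63.0`, so a PCIe 3.0 x8 slot cannot sustain a 100 GbE port, while `slot_fits_nic 5 16 400` succeeds.
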
### NIC features

| Feature | Description | Benefit |
|---------|-------------|---------|
| **TSO/GRO** | TCP Segmentation Offload / Generic Receive Offload | Reduced CPU load for TCP |
| **LRO/LSO** | Large Receive Offload / Large Send Offload | LRO is the hardware predecessor of GRO; LSO is another name for TSO |
| **RSS** | Receive Side Scaling | Distribution of incoming packets across multiple CPU cores |
| **RPS/RFS** | Receive Packet Steering / Receive Flow Steering | Software equivalent of RSS, cache affinity |
| **XDP** | eXpress Data Path | eBPF-based packet processing in the driver (DDoS mitigation, load balancing) |
| **RDMA (RoCE v2)** | RDMA over Converged Ethernet | GPU direct communication, storage (NVMe-oF) |
| **iWARP** | RDMA over TCP | RDMA without lossless (DCB) switch configuration, at higher latency |
| **DPDK** | Data Plane Development Kit | Userspace packet processing that bypasses the kernel (VNF, vSwitch) |
| **VXLAN/NVGRE offload** | HW offload for tunneling | Overlay networking (VMware NSX, OpenStack) |
| **SR-IOV** | Single Root I/O Virtualization | Direct NIC access for VMs through virtual functions (VFs), low latency |
| **Flow Bifurcation** | Split NIC traffic between kernel and DPDK | Concurrent management and high-speed data path |
| **PTP (IEEE 1588)** | Precision Time Protocol | Financial services, 5G, telco |

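On Linux, the offloads in the table can be inspected with `ethtool -k` and toggled with `ethtool -K`. The helper below is a small sketch that reads `ethtool -k` output on stdin and prints whether one feature is on or off. The function name is illustrative, and the output layout it expects (one `name: state` pair per line, sub-features indented) is an assumption based on common ethtool versions.

```shell
#!/usr/bin/env bash
# Read `ethtool -k <iface>` output on stdin and print the state (on/off) of one
# feature. Returns non-zero when the feature is not listed.
offload_state() {
    local feature="$1"
    awk -F': ' -v f="$feature" '
        { name = $1; sub(/^[ \t]+/, "", name) }   # sub-features are indented
        name == f { split($2, v, " "); print v[1]; found = 1; exit }
        END { exit !found }
    '
}
```

Typical use would be `ethtool -k eth0 | offload_state generic-receive-offload`, followed by `ethtool -K eth0 gro on` if the feature is off and not marked `[fixed]` (`eth0` is a placeholder).
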
### NIC selection per workload

| Workload | Recommended NIC | Rationale |
|----------|-----------------|-----------|
| **Web / API servers** | 2× 25 GbE SFP28, OCP | Low cost, sufficient bandwidth |
| **Virtualization (VMware)** | 2× 25 GbE (SR-IOV, VXLAN offload) | SR-IOV for VMs, VXLAN for NSX |
| **Database (OLTP)** | 2× 25/100 GbE (RSS, low latency) | Low latency, RSS for CPU scaling |
| **Storage (NFS/iSCSI)** | 2× 25/100 GbE (RoCE v2) | RDMA for NFS over RDMA, iSER, or NVMe-oF; low latency |
| **Storage (FC SAN)** | 2× 32 Gb FC HBA | SAN for VMware VMFS, block storage |
| **AI/ML training** | 8× 400 GbE + InfiniBand NDR | GPU communication, data ingestion |
| **AI/ML inference** | 4× 100 GbE (RoCE v2) | Model serving, GPU direct |
| **HPC** | InfiniBand NDR 400 Gbps | MPI communication, low latency |
| **Telco / Edge** | 2× 25 GbE (DPDK, PTP) | VNF, 5G UPF, low latency |

---

## Storage connectivity

### Fibre Channel (FC) SAN

| Generation | Speed | Designation | Form factor | Reach (SMF) | Use case |
|------------|-------|-------------|-------------|-------------|----------|
| **Gen 5** | 16 Gbps | 16GFC | SFP+ | 10 km | Legacy SAN |
| **Gen 6** | 32 Gbps | 32GFC | SFP28 | 10 km | Current standard |
| **Gen 7** | 64 Gbps | 64GFC | SFP56 | 10 km | Emerging, high-performance |
| **Gen 8** | 128 Gbps | 128GFC | QSFP28 | 10 km | Emerging (first production deployments) |

**HBA (Host Bus Adapter)**:

| Manufacturer | Model | Speed | PCIe | Ports | Features |
|--------------|-------|-------|------|-------|----------|
| **Broadcom / Emulex** | LPe35000 | 32GFC | PCIe 3.0 x8 | 1-2 | FC-NVMe, T10-PI, SR-IOV |
| **Broadcom / Emulex** | LPe36000 | 64GFC | PCIe 4.0 x16 | 1-2 | FC-NVMe |
| **Marvell / QLogic** | QLE2770 | 32GFC | PCIe 3.0 x8 | 1-2 | FC-NVMe, T10-PI |
| **Marvell / QLogic** | QLE2870 | 64GFC | PCIe 4.0 x8 | 1-2 | FC-NVMe |

**FC SAN topology**:

```
Server ──HBA── FC Switch ──── Storage Array (FC port)
   │               │
   │          ┌────┴────┐
   │          │ Fabric  │
   │          └─────────┘
   │
   └──── ISL (Inter-Switch Link) ──── backup fabric (B)
```

**Zoning** (FC):

```
Zone A: Server1_HBA1 + Storage_Port1 (production)
Zone B: Server1_HBA2 + Storage_Port2 (backup fabric)
Zone C: Backup_Server + Storage_Target (backup)
```

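Zones like these are usually defined by the WWPNs of the HBA ports. On a Linux host the WWPNs can be read from the `fc_host` class in sysfs. The sketch below prints them in the colon notation that switch zoning tools commonly expect. The function name and the `SYSFS_ROOT` override are illustrative.

```shell
#!/usr/bin/env bash
# Print "<host> <WWPN>" for every FC HBA port, with the WWPN in colon notation.
# SYSFS_ROOT is a hypothetical override used only for testing; it defaults to /sys.
list_wwpns() {
    local root="${SYSFS_ROOT:-/sys}" dir raw
    for dir in "$root"/class/fc_host/host*; do
        [ -r "$dir/port_name" ] || continue
        raw="$(cat "$dir/port_name")"   # e.g. 0x10000090fa1b2c3d
        raw="${raw#0x}"
        # Insert a colon after every byte, then drop the trailing one.
        printf '%s %s\n' "${dir##*/}" "$(printf '%s' "$raw" | sed 's/../&:/g; s/:$//')"
    done
}
```

Running `list_wwpns` on a server with two HBA ports would print one line per port, which can then be matched against the zone members on each fabric.
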
### iSCSI

| Property | iSCSI | Note |
|----------|-------|------|
| **Transport** | TCP/IP (port 3260) | Over standard Ethernet |
| **Speed** | 1/10/25/100 GbE | Same as Ethernet |
| **Initiator** | SW (OS) or HW (TOE) | Software initiator is free, roughly 5-10% CPU load |
| **Multipathing** | MPIO (Multipath I/O) or MC/S (Multiple Connections per Session) | Up to 8 paths, active/active or active/passive |
| **CHAP** | Authentication | Mutual CHAP recommended |
| **Jumbo frames** | Recommended MTU 9000 | Reduced CPU overhead, higher throughput |
| **Use case** | Small and medium SAN, backup, DR | Cheaper than FC, lower performance |

**iSCSI configuration**:

```
# Software initiator (Linux, open-iscsi): discover targets, then log in
iscsiadm -m discovery -t sendtargets -p 10.0.0.100:3260
iscsiadm -m node -T iqn.2024-05.storage:array01 -p 10.0.0.100:3260 --login

# Multipath (dm-multipath, RHEL-family helper): enable and start multipathd
mpathconf --enable --with_multipathd y
# Then tune /etc/multipath.conf: aliases, failback, rr_min_io
```

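Jumbo frames only help when every hop on the storage path accepts MTU 9000. A common check is to send an unfragmentable ping whose size exactly fills the MTU. The sketch below computes that payload size, assuming IPv4 without options; the function name is illustrative.

```shell
#!/usr/bin/env bash
# Largest ICMP echo payload that fits in one unfragmented IPv4 packet:
# MTU minus the 20-byte IPv4 header and the 8-byte ICMP header.
icmp_payload_for_mtu() {
    echo "$(( $1 - 20 - 8 ))"
}
```

With Linux iputils, `ping -M do -s "$(icmp_payload_for_mtu 9000)" -c 3 10.0.0.100` sends 8972-byte payloads with the don't-fragment bit set. If a switch or target port is still at MTU 1500, the ping fails instead of silently fragmenting. The interface MTU itself is set with `ip link set dev eth0 mtu 9000` (`eth0` is a placeholder).
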
### NVMe-oF (NVMe over Fabrics)

| Transport | Protocol | Latency | CPU overhead | Use case |
|-----------|----------|---------|--------------|----------|
| **NVMe over FC** | FC-NVMe (FC Gen 6/7) | <10 µs | Low | Enterprise SAN, VMware |
| **NVMe over RDMA (RoCE v2)** | RDMA (RoCE) | <5 µs | Very low | AI/ML, HPC, K8s (CSI) |
| **NVMe over TCP** | TCP | ~50 µs | Moderate (10-20% CPU) | Standard Ethernet, no RDMA |
| **NVMe over InfiniBand** | IB RC/UC | <3 µs | Lowest | HPC, AI training |

**NVMe-oF comparison**:

| Property | FC-NVMe | NVMe/RoCE | NVMe/TCP | NVMe/IB |
|----------|---------|-----------|----------|---------|
| **Latency (target)** | ~8 µs | ~4 µs | ~50 µs | ~3 µs |
| **Bandwidth** | 64 Gbps | 100/200 GbE | 25/100 GbE | NDR 400 Gbps |
| **Requires special HW** | FC HBA + switch | RoCE NIC + DCB switch | Standard NIC | IB HCA + switch |
| **Ecosystem** | Broadcom, Marvell | NVIDIA, Broadcom | OS built-in | NVIDIA Mellanox |
| **Use case** | VMware, enterprise SAN | AI/ML, K8s, HPC | SMB, K8s, cost-effective | HPC, large AI |

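For the Ethernet transports, a Linux host typically attaches to an NVMe-oF subsystem with `nvme-cli`. The sketch below only prints the `nvme connect` command for the TCP and RDMA transports so it can be reviewed before it is run. The function name, the address, and the NQN are hypothetical. FC is left out because it uses a different address format.

```shell
#!/usr/bin/env bash
# Print (do not run) the nvme-cli command that connects to an NVMe-oF subsystem.
# Port 4420 is the registered NVMe-oF service port and is used as the default.
nvmeof_connect_cmd() {
    local transport="$1" addr="$2" nqn="$3" port="${4:-4420}"
    case "$transport" in
        tcp|rdma) ;;
        *) echo "unsupported transport: $transport" >&2; return 1 ;;
    esac
    printf 'nvme connect -t %s -a %s -s %s -n %s\n' "$transport" "$addr" "$port" "$nqn"
}
```

For example, `nvmeof_connect_cmd tcp 10.0.0.100 nqn.2024-05.storage:array01` prints a command that can be pasted into a root shell. `nvme discover -t tcp -a 10.0.0.100 -s 4420` lists the subsystems a target exports, and `nvme list` shows the resulting namespaces after connecting.
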
### SAS (Serial Attached SCSI)

| Generation | Speed | Cabling | Reach | Use case |
|------------|-------|---------|-------|----------|
| **SAS 3** | 12 Gbps | SAS cable (SFF-8644) | 6-10 m | Legacy storage, DAS |
| **SAS 4** | 22.5 Gbps (marketed as 24G SAS) | SAS cable (SFF-8644) | 6-10 m | Current standard |
| **SAS 5** | 45 Gbps | SAS cable (SFF-8644) | 6-10 m | Emerging |

**SAS topology**: Server → SAS HBA → SAS expander → SAS disk (point-to-point links, not a shared fabric like FC)

---

## Server connectivity: decision matrix

| Workload | Primary | Secondary | Management |
|----------|---------|-----------|------------|
| **Web / API** | 2× 25 GbE (LACP) | none | 1× 1 GbE BMC |
| **Database** | 2× 25/100 GbE (RSS) | 2× 32 Gb FC (SAN) | 1× 1 GbE BMC |
| **Virtualization** | 4× 25 GbE (SR-IOV) | 2× 32 Gb FC (VMFS) | 1× 1 GbE BMC |
| **Kubernetes** | 2× 25/100 GbE | none | 1× 1 GbE BMC |
| **Storage node** | 2× 100 GbE (RoCE) | 2× 25 GbE (management) | 1× 1 GbE BMC |
| **AI training** | 8× 400 GbE + IB NDR | 4× 100 GbE (storage) | 1× 1 GbE BMC |
| **AI inference** | 4× 100 GbE (RoCE) | 2× 25 GbE (management) | 1× 1 GbE BMC |
| **HPC** | InfiniBand NDR | 2× 100 GbE (storage) | 1× 1 GbE BMC |

---

## Server NIC placement (PCIe slot optimization)

```
2U Server (GPU/AI):
┌───────────────────────────────────────────────┐
│ PCIe 0: GPU (x16) + NVLink / InfiniBand (x16) │
│ PCIe 1: GPU (x16) + NIC 100 GbE (x16)         │
│ PCIe 2: GPU (x16)                             │
│ PCIe 3: GPU (x16)                             │
│ PCIe 4: GPU (x16)                             │
│ PCIe 5: GPU (x16) + NIC 100 GbE (x16)         │
│ PCIe 6: Storage HBA / NIC (x8)                │
│ PCIe 7: Management / OCP (x8)                 │
└───────────────────────────────────────────────┘

1U Standard:
┌───────────────────────────────┐
│ OCP: 2× 25 GbE (management)   │
│ PCIe 0: NIC 25 GbE (x8)       │
│ PCIe 1: Storage HBA / FC (x8) │
│ PCIe 2: GPU (x16, optional)   │
│ PCIe 3: NVMe (x4, M.2)        │
└───────────────────────────────┘
```

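Pairing a GPU with a NIC in neighbouring slots pays off mainly when both devices hang off the same CPU socket, which Linux exposes as the NUMA node of each PCI device. The sketch below compares two devices by PCI address. It assumes a Linux host, and the function names, the `SYSFS_ROOT` override, and the example addresses are illustrative.

```shell
#!/usr/bin/env bash
# Print the NUMA node of a PCI device (-1 means the platform reports no affinity).
# SYSFS_ROOT is a hypothetical override used only for testing; it defaults to /sys.
pci_numa_node() {
    cat "${SYSFS_ROOT:-/sys}/bus/pci/devices/$1/numa_node" 2>/dev/null
}

# Succeeds when two PCI devices (for example a GPU and a NIC) share a NUMA node.
same_numa_node() {
    local a b
    a="$(pci_numa_node "$1")" || return 2
    b="$(pci_numa_node "$2")" || return 2
    [ "$a" = "$b" ]
}
```

For example, `same_numa_node 0000:17:00.0 0000:18:00.0 || echo "GPU and NIC are on different sockets"`, with the addresses taken from `lspci -D`. When the check fails, moving the NIC to a riser wired to the other socket is usually the first thing to try.
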
### NVIDIA Mellanox ConnectX NICs

NVIDIA Mellanox is a leading manufacturer of NIC adapters for AI/HPC and cloud data centers.

| Model | PCIe | Max speed | Form factor | Key features |
|-------|------|-----------|-------------|--------------|
| **ConnectX-5** | PCIe 3.0 x16 | 100 GbE (dual) | HHHL | RoCE, NVMe-oF target offload, MPI offload |
| **ConnectX-6 Dx** | PCIe 4.0 x16 | 200 GbE (1-port) / 100 GbE (2-port) | HHHL, OCP 3.0 | ASAP² vSwitch offload, IPsec/TLS inline crypto, AES-XTS, 215 Mpps DPDK |
| **ConnectX-6 Lx** | PCIe 4.0 x8 | 25 GbE (dual) | HHHL, OCP 3.0 | RoCE, Secure Boot, low-power |
| **ConnectX-7** | PCIe 5.0 x16 | 400 GbE (1-port) / 200 GbE (2-port) | HHHL | NDR InfiniBand + 400 GbE, GPUDirect, SHARP |
| **ConnectX-8** | PCIe 6.0 x16 | 800 GbE (1-port) / 400 GbE (2-port) | HHHL | XDR InfiniBand, sub-500 ns latency, in-network computing, multi-host |

**Platforms**: Spectrum-X Ethernet (end-to-end AI networking), Quantum InfiniBand, BlueField DPU.

### Broadcom Emulex FC HBA

| Model | Speed | PCIe | Ports | Features |
|-------|-------|------|-------|----------|
| **LPe35000** (Gen 7) | 32GFC | PCIe 3.0 x8 | 1-2 | FC-NVMe, T10-PI (DIF), SR-IOV, Silicon Root of Trust |
| **LPe35002** (Gen 7) | 32GFC | PCIe 3.0 x8 | 2 | FC-NVMe, Secure Boot, digitally signed firmware |
| **LPe36000** (Gen 7) | 64GFC | PCIe 4.0 x16 | 1-2 | First 64GFC HBA on the market, 10M IOPS, 3× lower latency than Gen 6 |

**Key features**: NVMe over FC support, T10 DIF (Data Integrity Field), 10 million hours MTBF, NIST SP 800-193 compliant. Gen 7 delivers up to 10M IOPS and 3× lower latency compared to Gen 6.

### NVMe-oF specification

NVMe over Fabrics (NVMe-oF) extends the NVMe protocol from local PCIe to network transports. The first specification, NVMe-oF 1.0, was released in June 2016. Since NVMe 2.0 the fabrics material is part of the NVMe specification family, currently NVMe 2.3 (August 2025). Supported transports:

| Transport | Specification | Use case |
|-----------|---------------|----------|
| **NVMe over PCIe** | NVMe Base | Local NVMe SSD |
| **NVMe over RDMA** (RoCE, InfiniBand, iWARP) | NVMe RDMA Transport | AI/ML, HPC, lowest latency (<5 µs) |
| **NVMe over TCP** | NVMe TCP Transport | Standard Ethernet, no RDMA, latency ~50 µs |
| **NVMe over FC** (FC-NVMe) | INCITS T11 | Enterprise SAN, FC fabric |

The NVMe 2.x family also defines the Computational Programs and Subsystem Local Memory (SLM) command sets and Zoned Namespaces (ZNS, introduced with NVMe 2.0). NVMe-MI defines the management interface.

### Dell PowerEdge R760: NIC placement

The Dell PowerEdge R760 supports:
- **OCP 3.0** adapters (up to 2×): 1/10/25/100 GbE
- **PCIe Gen5** slots: 8× slots (6× FHHL + 2× LP)
- **LOM**: 2× 1 GbE Broadcom 5720 on motherboard
- Maximum NIC speed: 100 GbE (QSFP56)
- Supported types: RJ45, SFP+, SFP28, QSFP28, QSFP56

Recommended configurations:
- Standard: OCP 3.0 2× 25 GbE + PCIe storage HBA
- AI/ML: PCIe 100 GbE (riser config 1, slot 1-2) + GPU in other slots

### HPE Gen11 NIC options

HPE ProLiant Gen11 (DL360/DL380) supports:
- **OCP 3.0** slots (up to 2): 10/25/100/200 GbE (Broadcom, Intel, NVIDIA Mellanox)
- **PCIe Gen5** adapters: 8× slots (DL380) / 3× slots (DL360)
- **iLO 6** dedicated management port (1 GbE)
- Supported NICs: Broadcom BCM57412 (10 GbE), BCM57504 (25 GbE), NVIDIA ConnectX-6 Dx (100 GbE)

## Sources

Links, books, and standards: [sources/infrastructure/sources.md](sources/infrastructure/sources.md)

### Recommended literature

| Book | Authors | ISBN | Description |
|------|---------|------|-------------|
| AI Data Center Network Design and Technologies (1st ed., 2026) | Subramaniam, Styszynski, Tambakuwala | 978-0-13-543628-8 | First vendor-agnostic guide to network design for AI training and inference. Covers high-radix fabric, lossless Ethernet/IP, UEC technologies, cooling and power for AI clusters. Authors from HPE Juniper Networking. |

*Last revision: 2026-06-03*