🎮 GPU: architecture, models, virtualization
GPU models
NVIDIA
| GPU | Architecture | VRAM | HBM | FP16 (TFLOPS, dense) | FP8 (TFLOPS, dense) | Interconnect | TDP |
|---|---|---|---|---|---|---|---|
| A100 | Ampere (2020) | 40/80 GB | HBM2 (40 GB) / HBM2e (80 GB) | 312 | n/a | NVLink 3 (600 GB/s) | 400 W |
| H100 | Hopper (2022) | 80 GB | HBM3 | ~1000 | ~2000 | NVLink 4 (900 GB/s) | 700 W |
| H200 | Hopper (2023) | 141 GB | HBM3e | ~1000 | ~2000 | NVLink 4 (900 GB/s) | 700 W |
| B200 | Blackwell (2024) | 192 GB | HBM3e | 2250 | ~4500 | NVLink 5 (1800 GB/s) | 1000 W |
| B100 | Blackwell (2024) | 192 GB | HBM3e | ~1800 | ~3600 | NVLink 5 (1800 GB/s) | 700 W |
| GB200 | Blackwell (2024) | 384 GB (2× 192 GB) | HBM3e | 4500 (2× B200) | 9000 (2× B200) | NVLink 5 (1800 GB/s per GPU) | 2700 W |
AMD
| GPU | Architecture | VRAM | HBM | FP16 (TFLOPS, dense) | Interconnect | TDP |
|---|---|---|---|---|---|---|
| MI250X | CDNA 2 (2021) | 128 GB | HBM2e | 383 | Infinity Fabric | 500 W |
| MI300X | CDNA 3 (2023) | 192 GB | HBM3 | ~1300 (~2600 with sparsity) | Infinity Fabric (896 GB/s) | 750 W |
| MI350 | CDNA 4 (2025) | 288 GB | HBM3e | ~2300 (MI350X) | Infinity Fabric | 1000 W (MI350X) |
GPU interconnects
| Technology | Vendor | Bandwidth | Topology | Use case |
|---|---|---|---|---|
| NVLink 4 | NVIDIA | 900 GB/s (18 × 50 GB/s) | Direct GPU-to-GPU | AI training (H100, H200) |
| NVLink 5 | NVIDIA | 1800 GB/s (18 × 100 GB/s) | Direct GPU-to-GPU | AI training (B200, GB200) |
| Infinity Fabric | AMD | 896 GB/s | GPU-to-GPU + CPU-to-GPU | AI training (MI300X, MI350) |
| NVSwitch | NVIDIA | 900 GB/s per GPU (NVLink) | All-to-all through the switch (up to 256 GPUs) | DGX SuperPOD, HGX |
| InfiniBand (NDR) | NVIDIA/Mellanox | 400 Gbps per port | GPU-NIC direct, RDMA | Distributed training, HPC |
| PCIe 5.0 | Standard | 63 GB/s per x16 (per direction) | CPU-GPU | Inference, rendering |
| Ethernet (RoCE v2) | Standard | 100/200/400 GbE | GPU-NIC, RDMA over Converged Ethernet | AI inference, storage |
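
The interconnect figures mix units: NVLink and PCIe are quoted in GB/s, InfiniBand and Ethernet in Gbps. A minimal Python sketch, assuming 8 bits per byte and ignoring encoding and protocol overhead, puts them on one scale. The helper names are illustrative and do not come from any library.

```python
# Figures are copied from the interconnect table; helper names are hypothetical.

def gbps_to_gigabytes_per_s(gbps: float) -> float:
    """Convert a line rate in gigabits/s to gigabytes/s (8 bits per byte, no overhead)."""
    return gbps / 8.0


def nvlink_aggregate(links: int, gb_per_s_per_link: float) -> float:
    """Aggregate NVLink bandwidth as number of links times per-link bandwidth."""
    return links * gb_per_s_per_link


nvlink4 = nvlink_aggregate(18, 50)       # NVLink 4: 18 links at 50 GB/s
nvlink5 = nvlink_aggregate(18, 100)      # NVLink 5: 18 links at 100 GB/s
ib_ndr = gbps_to_gigabytes_per_s(400)    # one InfiniBand NDR port

print(f"NVLink 4: {nvlink4:.0f} GB/s, NVLink 5: {nvlink5:.0f} GB/s")
print(f"InfiniBand NDR: {ib_ndr:.0f} GB/s per port")
print(f"NVLink 4 vs one NDR port: {nvlink4 / ib_ndr:.0f}x")
```

The gap between intra-node NVLink and a single inter-node port is the usual reason to keep the most communication-heavy parallelism inside one node.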
GPU direct communication
- GPUDirect RDMA: GPU ↔ NIC without going through the CPU (InfiniBand, RoCE)
- GPUDirect Storage: GPU ↔ NVMe without going through the CPU (NVIDIA Magnum IO)
- NVSwitch: full bisection bandwidth between all GPUs in a node
GPU virtualization
| Technology | Description | GPU support | Use case |
|---|---|---|---|
| NVIDIA vGPU (GRID) | Time slicing + dedicated profiles | Profile series: A (virtual apps), B (virtual PC / VDI), Q (pro viz), C (compute / AI) | VDI, virtualized AI |
| NVIDIA MIG | Hardware GPU partitioning | A100 (7 instances), H100/H200/B200 | AI inference, multi-tenant GPU |
| AMD MxGPU | SR-IOV, hardware partitioning | AMD Instinct MI, Radeon Pro | VDI, cloud gaming |
| Intel Server GPU (SG1) | SR-IOV, hardware partitioning | Intel SG1, Flex, Arc | VDI, media transcoding |
| GPU passthrough | Whole GPU dedicated to a single VM (vfio-pci) | All GPUs | AI training, HPC, highest performance |
MIG partition table (A100 / H100)
| GPU | Partition profile | GPU memory | Compute slices |
|---|---|---|---|
| A100 40 GB | 1g.5gb | 5 GB | 1 |
| A100 40 GB | 2g.10gb | 10 GB | 2 |
| A100 40 GB | 3g.20gb | 20 GB | 3 |
| A100 40 GB | 7g.40gb | 40 GB | 7 |
| A100 40 GB | Full split (7 × 1g.5gb) | 7 × 5 GB | 7 instances |
| H100 80 GB | 1g.10gb (or 1g.10gb+me with media extensions) | 10 GB | 1 |
| H100 80 GB | 2g.20gb | 20 GB | 2 |
| H100 80 GB | 3g.40gb | 40 GB | 3 |
| H100 80 GB | 7g.80gb | 80 GB | 7 |
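
MIG profile names encode the compute-slice count and the memory size, for example `3g.20gb`. The sketch below parses such names and checks a requested mix against the 7-slice compute budget. `MigProfile`, `parse_mig_profile` and `fits` are hypothetical helpers, not part of any NVIDIA tool, and the check is only a necessary condition: real MIG placement also constrains memory slices and instance positions.

```python
import re
from typing import List, NamedTuple


class MigProfile(NamedTuple):
    compute_slices: int
    memory_gb: int
    media_extensions: bool


# Matches names such as "3g.20gb" or "1g.10gb+me".
_PROFILE_RE = re.compile(r"^(\d+)g\.(\d+)gb(\+me)?$")


def parse_mig_profile(name: str) -> MigProfile:
    """Split a MIG profile name into compute slices, memory (GB) and the +me flag."""
    match = _PROFILE_RE.match(name)
    if match is None:
        raise ValueError(f"not a MIG profile name: {name!r}")
    slices, mem, me = match.groups()
    return MigProfile(int(slices), int(mem), me is not None)


def fits(profiles: List[str], total_slices: int = 7) -> bool:
    """Rough check: the requested profiles must not exceed the compute-slice budget."""
    used = sum(parse_mig_profile(p).compute_slices for p in profiles)
    return used <= total_slices


print(parse_mig_profile("3g.20gb"))
print(fits(["3g.20gb", "2g.10gb", "1g.5gb"]))  # 6 slices requested
print(fits(["7g.40gb", "1g.5gb"]))             # 8 slices requested
```

A mix that passes this check can still be rejected by the driver, so treat the result as a quick sanity filter before attempting the actual partitioning.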
GPU use cases
AI Training
- Models: LLM (70B-405B+), vision, multimodal
- GPU: H100, B200, GB200, MI300X
- Interconnect: NVLink 4/5 or Infinity Fabric (within a node), InfiniBand NDR (between nodes)
- Parallelism: Data Parallel (DDP), Tensor Parallel (TP), Pipeline Parallel (PP), Fully Sharded Data Parallel (FSDP)
- Frameworks: PyTorch (NCCL), JAX (XLA), DeepSpeed, Megatron-LM
- Tips:
  - GB200: 2× B200 plus one Grace CPU, linked over NVLink-C2C; 8 GPUs → 4 GB200
  - DGX B200 / HGX B200: the standard building block
  - InfiniBand: fat-tree topology to optimize all-reduce
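
When data, tensor and pipeline parallelism are combined, their degrees multiply to the total GPU count. The sketch below derives the data-parallel degree once TP and PP are chosen. The node size of 8 GPUs and the 16-node layout are assumptions for illustration only, and `data_parallel_degree` is a hypothetical helper.

```python
def data_parallel_degree(total_gpus: int, tp: int, pp: int) -> int:
    """DP degree that remains once tensor and pipeline parallelism are fixed."""
    model_parallel = tp * pp
    if total_gpus % model_parallel != 0:
        raise ValueError("total_gpus must be divisible by tp * pp")
    return total_gpus // model_parallel


GPUS_PER_NODE = 8   # assumption: one HGX/DGX-style node with 8 GPUs
nodes = 16          # assumption: example cluster size
tp, pp = 8, 4       # hypothetical layout: TP fills one node, PP spans 4 nodes

total = nodes * GPUS_PER_NODE
dp = data_parallel_degree(total, tp, pp)
print(f"{total} GPUs = TP {tp} x PP {pp} x DP {dp}")
```

Keeping `tp` no larger than `GPUS_PER_NODE` places tensor-parallel traffic on the intra-node link, leaving the slower inter-node network for pipeline and data-parallel traffic.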
AI Inference
- Models: LLM serving, embeddings, image generation
- GPU: A100, H200, B200 (larger VRAM for larger models)
- Techniques: MIG partitioning, TensorRT-LLM, vLLM, Triton Inference Server
- Quantization: FP8, INT8, INT4 → lower VRAM use, higher throughput
- Latency: batch-size tuning, dynamic batching, continuous batching
- Scale: on-prem (2-32 GPUs) / cloud (elastic)
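
The effect of quantization on VRAM can be estimated from bytes per parameter. This is a rough sketch for model weights only: it ignores the KV cache, activations and quantization metadata, and takes 1 GB as 1e9 bytes. `weight_memory_gb` is a hypothetical helper.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"FP16": 2.0, "FP8": 1.0, "INT8": 1.0, "INT4": 0.5}


def weight_memory_gb(params_billion: float, precision: str) -> float:
    """Weight memory in GB: (params_billion * 1e9 params) * bytes / 1e9 bytes per GB."""
    return params_billion * BYTES_PER_PARAM[precision]


for precision in BYTES_PER_PARAM:
    print(f"70B @ {precision}: {weight_memory_gb(70, precision):.0f} GB")
```

Under these assumptions a 70B model needs about 140 GB for weights at FP16 and about 70 GB at FP8, which is why lower precision lets a model fit on fewer or smaller GPUs.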
VDI (Virtual Desktop Infrastructure)
- GPU: NVIDIA A16 (1 GPU = 16 users), A10 (1 GPU = 4 users); actual density depends on the vGPU profile size
- Technology: vGPU (GRID), AMD MxGPU
- Protocols: VMware Blast, Citrix HDX, Microsoft RDP, PC-over-IP (HP Teradici)
- Use cases: CAD (CATIA, SolidWorks), office work, engineering, healthcare (PACS)
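
User density per GPU follows from the frame buffer divided by the per-user vGPU profile size. The sketch below reproduces the two figures above under stated assumptions: 16 GB of frame buffer per A16 GPU with a 1 GB profile, and 24 GB on the A10 with a 6 GB profile. The profile sizes are chosen to match the list, not taken from a sizing guide, and `users_per_gpu` is a hypothetical helper.

```python
def users_per_gpu(framebuffer_gb: int, profile_gb: int) -> int:
    """Number of fixed-size vGPU profiles that fit in one GPU's frame buffer."""
    if profile_gb <= 0:
        raise ValueError("profile_gb must be positive")
    return framebuffer_gb // profile_gb


a16_users = users_per_gpu(framebuffer_gb=16, profile_gb=1)  # light office desktops
a10_users = users_per_gpu(framebuffer_gb=24, profile_gb=6)  # heavier CAD-style desktops

print(f"A16: {a16_users} users per GPU, A10: {a10_users} users per GPU")
```

Frame buffer is only one limit; encoder sessions and compute load can cap density lower in practice.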
Rendering and VFX
- GPU: NVIDIA RTX 6000 Ada, RTX A6000, AMD Radeon Pro W7900
- Rendering: Blender (Cycles/OptiX), V-Ray, Octane Render, Redshift
- Denoising: AI-accelerated denoising on the GPU
- Render farm: Deadline, Qube! (job schedulers)
GPU server form factors
| Form factor | GPU count | Power | Cooling | Example |
|---|---|---|---|---|
| 1U | 1-2 | 700-1400 W | Air (high-RPM fans) | Dell XR4510c |
| 2U | 4-8 | 3-6 kW | Air / Liquid | Dell R760xa, HPE DL380a |
| 4U | 8-10 | 5-8 kW | Air / Liquid | Supermicro SYS-421GE |
| 8U / Chassis | 8-16 | 10-20 kW | Air / Liquid (CDU) | NVIDIA DGX H100, Supermicro SYS-821GE (HGX) |
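
The power column usually decides rack density before rack space does. A minimal sketch of that trade-off follows; the 40 kW rack budget and 42U height are assumptions for illustration, and `servers_per_rack` is a hypothetical helper.

```python
def servers_per_rack(rack_budget_kw: float, server_kw: float,
                     rack_units: int, server_units: int) -> int:
    """Servers that fit in a rack, limited by both power budget and rack space."""
    by_power = int(rack_budget_kw // server_kw)
    by_space = rack_units // server_units
    return min(by_power, by_space)


RACK_KW = 40.0   # assumption: power budget per rack
RACK_U = 42      # assumption: usable rack height

eight_u = servers_per_rack(RACK_KW, server_kw=10.0, rack_units=RACK_U, server_units=8)
two_u = servers_per_rack(RACK_KW, server_kw=6.0, rack_units=RACK_U, server_units=2)

print(f"8U servers at 10 kW: {eight_u} per rack")
print(f"2U servers at 6 kW: {two_u} per rack")
```

In both cases power is the binding limit: space alone would allow 5 of the 8U servers and 21 of the 2U servers.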
OpenStack Cyborg (GPU lifecycle management)
Cyborg is the OpenStack service for managing accelerators (GPU, FPGA, DPU, NPU).
Key capabilities
- Discovery: automatic detection of GPUs on compute nodes (NVIDIA, AMD, Intel)
- Inventory: tracking of available accelerators in the cluster
- Lifecycle: attach/detach a GPU to or from a VM, firmware update, reset
- Scheduling: GPU-aware scheduling through the Placement API (with Nova)
- Cyborg API: REST API for accelerator management
Integration
| Component | Role |
|---|---|
| Nova | VM scheduling with GPU requirements (flavor extra_specs: accel:device_profile) |
| Placement | Resource providers for GPUs (inventory, traits) |
| Neutron | SR-IOV VF passthrough for GPU networking |
| Ironic | Bare metal + GPU provisioning |
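
The Nova integration hinges on a flavor whose `accel:device_profile` extra spec names a Cyborg device profile. The sketch below only builds the two data structures as plain dictionaries and calls no OpenStack API. The layout of the profile follows my understanding of Cyborg's device-profile format and should be checked against your release; the profile name `gpu-1x`, the `PGPU` resource class and the trait `CUSTOM_GPU_EXAMPLE` are assumptions.

```python
import json


def device_profile(name: str, resource_class: str, count: int, trait: str) -> dict:
    """Build a device-profile body with one request group (assumed layout)."""
    return {
        "name": name,
        "groups": [
            {
                f"resources:{resource_class}": str(count),
                f"trait:{trait}": "required",
            }
        ],
    }


def flavor_extra_specs(profile_name: str) -> dict:
    """Extra spec that ties a Nova flavor to the named device profile."""
    return {"accel:device_profile": profile_name}


# Hypothetical names, chosen for illustration only.
profile = device_profile("gpu-1x", "PGPU", 1, "CUSTOM_GPU_EXAMPLE")
specs = flavor_extra_specs(profile["name"])

print(json.dumps(profile, indent=2))
print(specs)
```

With a profile like this registered in Cyborg and the extra spec set on a flavor, booting a VM from that flavor is what asks the scheduler for a matching accelerator.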
Sources
Links, books and standards: sources/infrastructure/sources.md
Last revised: 2026-06-03