Fossil/knowledge-base


Stanislav Hubacek ef3c2f75b1 18.6.2026

2026-06-18 16:25:33 +02:00


🎮 GPU — architecture, models, virtualization

GPU models

NVIDIA

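GPU	Architecture	VRAM	HBM	FP16 dense (TFLOPS)	FP8 dense (TFLOPS)	Interconnect	TDP
A100	Ampere (2020)	40/80 GB	HBM2 / HBM2e	312	n/a	NVLink 3 (600 GB/s)	400 W
H100	Hopper (2022)	80 GB	HBM3	~990	~1980	NVLink 4 (900 GB/s)	700 W
H200	Hopper (2023)	141 GB	HBM3e	~990	~1980	NVLink 4 (900 GB/s)	700 W
B200	Blackwell (2024)	192 GB	HBM3e	~2250	~4500	NVLink 5 (1800 GB/s)	1000 W
B100	Blackwell (2024)	192 GB	HBM3e	~1800	~3600	NVLink 5 (1800 GB/s)	700 W
GB200	Grace Blackwell (2024)	2× 192 GB	HBM3e	~4500 (2 GPUs)	~9000 (2 GPUs)	NVLink 5 (1800 GB/s per GPU)	2700 W

Throughput figures are Tensor Core peaks without structured sparsity; sparse figures are 2× these values. H200 uses the same compute die as H100 and differs in memory capacity and bandwidth.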

AMD

GPU	Architecture	VRAM	HBM	FP16 dense (TFLOPS)	Interconnect	TDP
MI250X	CDNA 2 (2021)	128 GB	HBM2e	383	Infinity Fabric	500 W
MI300X	CDNA 3 (2023)	192 GB	HBM3	~1300	Infinity Fabric (896 GB/s)	750 W
MI350X	CDNA 4 (2025)	288 GB	HBM3e	~2300	Infinity Fabric	1000 W

GPU interconnects

Technology	Provider	Bandwidth	Topology	Use case
NVLink 4	NVIDIA	900 GB/s (18× 50 GB/s)	GPU-GPU direct	AI training (H100, H200)
NVLink 5	NVIDIA	1800 GB/s (18× 100 GB/s)	GPU-GPU direct	AI training (B200, GB200)
Infinity Fabric	AMD	896 GB/s	GPU-GPU + CPU-GPU	AI training (MI300X, MI350)
NVSwitch	NVIDIA	900 GB/s per GPU (NVLink 4)	Switched all-to-all (up to 256 GPUs with NVLink Switch System)	DGX SuperPOD, HGX
InfiniBand (NDR)	NVIDIA/Mellanox	400 Gbps per port	GPU-NIC direct, RDMA	Distributed training, HPC
PCIe 5.0	Standard	~63 GB/s per x16 (per direction)	CPU-GPU	Inference, rendering
Ethernet (RoCE v2)	Standard	100/200/400 GbE	GPU-NIC, RDMA over Converged Ethernet	AI inference, storage
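
To get a feel for what the bandwidth column means in practice, the sketch below estimates the idealized time to move a payload over each link. It assumes peak bandwidth with no protocol overhead, and the 140 GB payload (roughly the FP16 weights of a 70B-parameter model) is a hypothetical example. Vendors quote some figures as bidirectional totals and others per direction, so treat the results as order-of-magnitude only.

```python
# Peak link bandwidths from the interconnect table, in GB/s.
# Network links are quoted in Gbit/s, so divide by 8 to get GB/s.
LINK_GB_PER_S = {
    "NVLink 4": 900.0,
    "NVLink 5": 1800.0,
    "Infinity Fabric": 896.0,
    "InfiniBand NDR": 400 / 8,  # 400 Gbit/s per port -> 50 GB/s
    "PCIe 5.0 x16": 63.0,
}


def transfer_seconds(size_gb: float, link: str) -> float:
    """Idealized time to move size_gb over a link at its peak bandwidth."""
    return size_gb / LINK_GB_PER_S[link]


if __name__ == "__main__":
    payload_gb = 140.0  # hypothetical payload, not a figure from the table
    for name in LINK_GB_PER_S:
        print(f"{name:16s} {transfer_seconds(payload_gb, name):6.2f} s")
```

The gap between NVLink and a single InfiniBand port is the reason tensor parallelism is usually kept inside a node while data parallelism spans nodes.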

GPU direct communication

GPU 0 ──NVLink── GPU 1        GPU 0 ───PCIe─── CPU ───PCIe─── GPU 1
  │                            │
  │                            │
NVSwitch                     InfiniBand
  │                            │
  │                            │
GPU 2 ──NVLink── GPU 3        GPU 2 ───PCIe─── CPU ───PCIe─── GPU 3

NVLink topology (GPU-direct)    PCIe topology (CPU-mediated)

GPUDirect RDMA: GPU ↔ NIC transfers that bypass the CPU and host memory (InfiniBand, RoCE)
GPUDirect Storage: GPU ↔ NVMe transfers that bypass the CPU bounce buffer (part of NVIDIA Magnum IO)
NVSwitch: full bisection bandwidth between all GPUs in a node

GPU virtualization

Technology	Description	GPU support	Use case
NVIDIA vGPU (formerly GRID)	Time slicing + dedicated profiles	Profile series: A (virtual apps), B (VDI), Q (workstation / pro viz), C (compute / AI)	VDI, virtualized AI
NVIDIA MIG	Hardware GPU partitioning	A100 (7 inst.), H100/H200/B200	AI inference, multi-tenant GPU
AMD MxGPU	SR-IOV, hardware partitioning	AMD MI (pro), Radeon Pro	VDI, cloud gaming
Intel SG (SG1)	SR-IOV, hardware partitioning	Intel SG1, Flex, Arc	VDI, media transcoding
GPU passthrough	Whole GPU dedicated to one VM (vfio-pci)	Any GPU in an isolatable IOMMU group	AI training, HPC, highest performance

MIG partition table (A100 / H100)

GPU	Partition profile	GPU memory	Compute slices
A100 40 GB	1g.5gb	5 GB	1
A100 40 GB	2g.10gb	10 GB	2
A100 40 GB	3g.20gb	20 GB	3
A100 40 GB	7g.40gb	40 GB	7
A100 40 GB	Full split (7× 1g.5gb)	7 × 5 GB	7 instances
H100 80 GB	1g.10gb	10 GB	1
H100 80 GB	2g.20gb	20 GB	2
H100 80 GB	3g.40gb	40 GB	3
H100 80 GB	7g.80gb	80 GB	7
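
MIG profile names encode the number of compute slices and the memory size, so a requested mix can be sanity-checked before creating instances. The sketch below parses names such as `3g.20gb` and checks them against a 7-slice budget. It is a simplified check under stated assumptions: real MIG placement also restricts which slice positions each profile may occupy, which this code ignores, and `parse_profile` and `fits` are hypothetical helper names.

```python
import re

TOTAL_SLICES = 7  # compute slices exposed by a MIG-capable GPU (A100, H100)


def parse_profile(name: str) -> tuple[int, int]:
    """Split a profile name like '3g.20gb' into (compute_slices, memory_gb)."""
    match = re.fullmatch(r"(\d+)g\.(\d+)gb(\+me)?", name)
    if match is None:
        raise ValueError(f"not a MIG profile name: {name!r}")
    return int(match.group(1)), int(match.group(2))


def fits(profiles: list[str], total_memory_gb: int) -> bool:
    """Check the slice and memory budget only; placement rules are not modeled."""
    parsed = [parse_profile(p) for p in profiles]
    slices = sum(s for s, _ in parsed)
    memory = sum(m for _, m in parsed)
    return slices <= TOTAL_SLICES and memory <= total_memory_gb


if __name__ == "__main__":
    # 40 GB is the memory of the full-GPU profile 7g.40gb.
    print(fits(["1g.5gb"] * 7, total_memory_gb=40))
    print(fits(["7g.40gb", "1g.5gb"], total_memory_gb=40))
```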

GPU use cases

AI Training

Models: LLM (70B-405B+), vision, multimodal
GPU: H100, B200, GB200, MI300X
Interconnect: NVLink 4/5 or Infinity Fabric (within node), InfiniBand NDR (between nodes)
Parallelism: Data Parallel (DDP), Tensor Parallel (TP), Pipeline Parallel (PP), Fully Sharded Data Parallel (FSDP)
Framework: PyTorch (NCCL), JAX (XLA), DeepSpeed, Megatron-LM
Tips:
- GB200: one Grace CPU plus 2× B200 linked via NVLink-C2C, so 8 GPUs correspond to 4 GB200 superchips
- DGX B200 / HGX B200: standard 8-GPU building block
- InfiniBand: fat-tree topology for all-reduce optimization
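
The parallelism degrees listed above multiply together: one model replica occupies TP × PP GPUs, and the remaining factor is the data-parallel degree. The sketch below works through that arithmetic. The cluster size and model size in the usage section are hypothetical, and the memory estimate covers weights only (no optimizer state, gradients, or activations).

```python
def data_parallel_degree(total_gpus: int, tp: int, pp: int) -> int:
    """Return the data-parallel degree left over once TP and PP are fixed."""
    model_parallel = tp * pp  # GPUs that together hold one model replica
    if total_gpus % model_parallel != 0:
        raise ValueError("total_gpus must be a multiple of tp * pp")
    return total_gpus // model_parallel


def weights_per_gpu_gb(params_billion: float, bytes_per_param: float,
                       tp: int, pp: int) -> float:
    """Weight memory per GPU when weights are split evenly across TP x PP."""
    return params_billion * bytes_per_param / (tp * pp)


if __name__ == "__main__":
    # Hypothetical layout: 64 GPUs, TP=8 inside a node, PP=2 across nodes.
    print(data_parallel_degree(total_gpus=64, tp=8, pp=2))
    # 70B parameters in FP16 (2 bytes each), sharded over 16 GPUs.
    print(weights_per_gpu_gb(70, 2, tp=8, pp=2))
```

Keeping TP within a node (8 GPUs on NVLink) and mapping PP and DP onto InfiniBand is a common choice because tensor parallelism communicates far more often than the other two.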

AI Inference

Models: LLM serving, embedding, image gen
GPU: A100, H200, B200 (larger VRAM for larger models)
Techniques: MIG partition, TensorRT-LLM, vLLM, Triton Inference Server
Quantization: FP8, INT8, INT4 → lower VRAM, higher throughput
Latency: batch size optimization, dynamic batching, continuous batching
Scale: on-prem (2-32 GPU) / cloud (elastic)
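
The effect of quantization on VRAM can be estimated from bytes per parameter. The sketch below computes weight memory for a model and the minimum GPU count for a given VRAM size. The 1.2× overhead factor for KV cache and runtime buffers is an assumption for illustration; real overhead depends on context length and batch size.

```python
import math

# Storage size of one parameter at each precision, in bytes.
BYTES_PER_PARAM = {"FP16": 2.0, "FP8": 1.0, "INT8": 1.0, "INT4": 0.5}


def weight_memory_gb(params_billion: float, precision: str) -> float:
    """Weight memory in GB: 1e9 params x bytes per param / 1e9 bytes per GB."""
    return params_billion * BYTES_PER_PARAM[precision]


def min_gpus(params_billion: float, precision: str, vram_gb: float,
             overhead: float = 1.2) -> int:
    """Smallest GPU count whose combined VRAM holds weights plus overhead."""
    needed = weight_memory_gb(params_billion, precision) * overhead
    return math.ceil(needed / vram_gb)


if __name__ == "__main__":
    for precision in BYTES_PER_PARAM:
        gb = weight_memory_gb(70, precision)
        print(f"70B @ {precision}: {gb:.0f} GB, "
              f"{min_gpus(70, precision, vram_gb=80)} x 80 GB GPU")
```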

VDI (Virtual Desktop Infrastructure)

GPU: NVIDIA A16 (4 GPUs per board, up to 16 users per GPU), A10 (about 4 users per GPU, depending on vGPU profile)
Technology: vGPU (Grid), AMD MxGPU
Protocols: VMware Blast, Citrix HDX, Microsoft RDP, PCoIP (HP Anyware, formerly Teradici)
Use case: CAD (CATIA, SolidWorks), Office, engineering, healthcare (PACS)
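
User density per GPU follows from the frame buffer size divided by the vGPU profile size. The sketch below reproduces the per-GPU figures above, assuming 16 GB per A16 GPU with 1 GB profiles and 24 GB on the A10 with 6 GB profiles. The profile sizes are assumptions chosen for illustration; heavier workloads such as CAD need larger profiles and yield fewer users.

```python
def users_per_gpu(framebuffer_gb: int, profile_gb: int) -> int:
    """Number of equally sized vGPU profiles that fit in one GPU's frame buffer."""
    if profile_gb <= 0:
        raise ValueError("profile_gb must be positive")
    return framebuffer_gb // profile_gb


if __name__ == "__main__":
    print(users_per_gpu(framebuffer_gb=16, profile_gb=1))  # A16, one of 4 GPUs
    print(users_per_gpu(framebuffer_gb=24, profile_gb=6))  # A10
```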

Rendering and VFX

GPU: NVIDIA RTX 6000 Ada, RTX A6000, AMD Radeon Pro W7900
Rendering: Blender (Cycles/OptiX), V-Ray, Octane Render, Redshift
Denoising: AI-accelerated denoising on GPU
Farm rendering: Deadline, Qube! (job scheduler)

GPU pricing

For detailed pricing comparisons (purchase price, cloud on-demand, $/M token inference cost, $/GB HBM, price trends 2024→2026), see:

AI-INFRASTRUCTURE.en.md — GPU pricing and price/performance

GPU server form factors

Form factor	GPU count	Power	Cooling	Example
1U	1-2	700-1400 W	Air (high-RPM)	Dell XR4510c
2U	4-8	3-6 kW	Air / Liquid	Dell R760xa, HPE DL380a
4U	8-10	5-8 kW	Air / Liquid	Supermicro SYS-421GE
8U / Chassis	8-16	10-20 kW	Air / Liquid (CDU)	NVIDIA DGX H100, Supermicro SYS-821GE
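
The power column translates directly into rack density. The sketch below estimates how many servers fit in a rack power budget. The 40 kW rack budget and the 10 % headroom are hypothetical values, and the calculation ignores rack units and cooling capacity.

```python
def servers_per_rack(rack_w: int, server_w: int, headroom_pct: int = 10) -> int:
    """Servers that fit in a rack power budget after reserving headroom.

    Integer watts are used to avoid floating-point rounding at the boundaries.
    """
    usable_w = rack_w * (100 - headroom_pct) // 100
    return usable_w // server_w


if __name__ == "__main__":
    print(servers_per_rack(rack_w=40_000, server_w=6_000))   # 2U-class, 6 kW
    print(servers_per_rack(rack_w=40_000, server_w=10_000))  # 8U-class, 10 kW
```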

OpenStack Cyborg (GPU lifecycle management)

Cyborg is an OpenStack service for managing accelerators (GPU, FPGA, DPU, NPU).

Key capabilities

Discovery — automatic GPU detection on compute nodes (NVIDIA, AMD, Intel)
Inventory — tracking available accelerators in the cluster
Lifecycle — attach/detach GPU to VM, firmware update, reset
Scheduling — Placement API for GPU-aware scheduling (Nova)
Cyborg API — REST API for accelerator management

Integration

Component	Role
Nova	VM scheduling with GPU requirements (extra_specs: `accel:device_profile`)
Placement	Resource provider for GPU (inventory, traits)
Neutron	SR-IOV VF passthrough for GPU networking
Ironic	Bare metal + GPU provisioning
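
The Nova integration works by pointing a flavor at a Cyborg device profile through the `accel:device_profile` extra spec. The sketch below builds the two pieces of data involved as plain Python structures. The request body shape follows the Cyborg v2 device profile API as far as I recall it, and the resource class `PGPU`, the trait name, and the profile name `gpu-dp` are assumptions for illustration; check them against the Cyborg driver in use.

```python
import json
from typing import Optional


def device_profile_body(name: str, resource_class: str = "PGPU",
                        count: int = 1, trait: Optional[str] = None) -> list:
    """Build a device profile request body with a single request group."""
    group = {f"resources:{resource_class}": str(count)}
    if trait is not None:
        group[f"trait:{trait}"] = "required"
    return [{"name": name, "groups": [group]}]


def flavor_extra_specs(profile_name: str) -> dict:
    """Extra spec that tells Nova which device profile a flavor needs."""
    return {"accel:device_profile": profile_name}


if __name__ == "__main__":
    # Hypothetical profile name and trait, not taken from a real deployment.
    body = device_profile_body("gpu-dp", trait="CUSTOM_GPU_EXAMPLE")
    print(json.dumps(body, indent=2))
    print(flavor_extra_specs("gpu-dp"))
```

With the profile registered in Cyborg, any VM booted from a flavor carrying this extra spec is scheduled through Placement onto a host that can satisfy the request group.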

Sources

Links, books and standards: sources/infrastructure/sources.en.md

Last revision: 2026-06-03
