🎮 GPU — architecture, models, virtualization
GPU models
NVIDIA
| GPU | Architecture | VRAM | HBM | FP16 (TFLOPS, dense) | FP8 (TFLOPS, dense) | Interconnect | TDP |
|---|---|---|---|---|---|---|---|
| A100 | Ampere (2020) | 40/80 GB | HBM2 / HBM2e | 312 | - | NVLink 3 (600 GB/s) | 400 W |
| H100 | Hopper (2022) | 80 GB | HBM3 | ~990 | ~1980 | NVLink 4 (900 GB/s) | 700 W |
| H200 | Hopper (2023) | 141 GB | HBM3e | ~990 | ~1980 | NVLink 4 (900 GB/s) | 700 W |
| B200 | Blackwell (2024) | 192 GB | HBM3e | 2250 | ~4500 | NVLink 5 (1800 GB/s) | 1000 W |
| B100 | Blackwell (2024) | 192 GB | HBM3e | ~1800 | ~3600 | NVLink 5 (1800 GB/s) | 700 W |
| GB200 | Blackwell (2024) | - | HBM3e | 4500 (dual) | 9000 (dual) | NVLink 5 | 2700 W |

Throughput figures are peak Tensor Core values without structured sparsity; sparse figures are roughly double. The H200 uses the same compute die as the H100 and differs mainly in memory capacity and bandwidth.
AMD
| GPU | Architecture | VRAM | HBM | FP16 (TFLOPS, dense) | Interconnect | TDP |
|---|---|---|---|---|---|---|
| MI250X | CDNA 2 (2021) | 128 GB | HBM2e | 383 | Infinity Fabric | 500 W |
| MI300X | CDNA 3 (2023) | 192 GB | HBM3 | ~1300 (~2600 with sparsity) | Infinity Fabric (896 GB/s) | 750 W |
| MI350 | CDNA 4 (2025) | 288 GB | HBM3e | ~2300 (MI350X) | Infinity Fabric | 1000 W (MI350X) |
GPU interconnects
| Technology | Provider | Bandwidth | Topology | Use case |
|---|---|---|---|---|
| NVLink 4 | NVIDIA | 900 GB/s (18× 50 GB/s) | GPU-GPU direct | AI training (H100, H200) |
| NVLink 5 | NVIDIA | 1800 GB/s (18× 100 GB/s) | GPU-GPU direct | AI training (B200, GB200) |
| Infinity Fabric | AMD | 896 GB/s | GPU-GPU + CPU-GPU | AI training (MI300X, MI350) |
| NVSwitch | NVIDIA | 900 GB/s per GPU (NVLink) | All-to-all through the switch (up to 256 GPUs) | DGX SuperPOD, HGX |
| InfiniBand (NDR) | NVIDIA/Mellanox | 400 Gbps per port (50 GB/s) | GPU-NIC direct, RDMA | Distributed training, HPC |
| PCIe 5.0 | Standard | ~63 GB/s per x16, per direction | CPU-GPU | Inference, rendering |
| Ethernet (RoCE v2) | Standard | 100/200/400 GbE | GPU-NIC, RDMA over Converged Ethernet | AI inference, storage |
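The bandwidth figures above can be turned into rough transfer-time estimates. The following is a minimal sketch, assuming an idealized ring all-reduce in which each GPU moves 2(N-1)/N times the payload and every link sustains its peak rate. The function names are illustrative, and measured throughput will be lower than these numbers.

```python
def gbps_to_gbytes_per_s(gbps: float) -> float:
    """Convert a link rate in gigabits per second to gigabytes per second."""
    return gbps / 8.0


def link_aggregate_gbytes_per_s(links: int, per_link_gbytes_per_s: float) -> float:
    """Aggregate bandwidth of several identical links, e.g. 18 x 50 GB/s."""
    return links * per_link_gbytes_per_s


def ring_allreduce_seconds(payload_gb: float, gpus: int, bw_gbytes_per_s: float) -> float:
    """Idealized ring all-reduce time.

    Each GPU sends and receives 2*(N-1)/N of the payload; latency and
    protocol overhead are ignored (assumption).
    """
    if gpus < 2:
        return 0.0
    traffic_gb = 2.0 * (gpus - 1) / gpus * payload_gb
    return traffic_gb / bw_gbytes_per_s


if __name__ == "__main__":
    candidates = [
        ("NVLink 4", link_aggregate_gbytes_per_s(18, 50)),
        ("InfiniBand NDR", gbps_to_gbytes_per_s(400)),
        ("PCIe 5.0 x16", 63.0),
    ]
    for name, bw in candidates:
        t = ring_allreduce_seconds(payload_gb=10, gpus=8, bw_gbytes_per_s=bw)
        print(f"{name}: {t:.3f} s for a 10 GB all-reduce across 8 GPUs")
```

The comparison is mainly useful for ratios: under these assumptions the same all-reduce takes about 18 times longer over a single NDR port than over NVLink 4.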
GPU direct communication
- GPUDirect RDMA: GPU ↔ NIC transfers that bypass the CPU and host memory (InfiniBand, RoCE)
- GPUDirect Storage: GPU ↔ NVMe transfers that bypass the CPU bounce buffer (part of NVIDIA Magnum IO)
- NVSwitch: full bisection bandwidth between all GPUs in a node
GPU virtualization
| Technology | Description | GPU support | Use case |
|---|---|---|---|
| NVIDIA vGPU (GRID) | Time slicing + dedicated profiles | Profiles: A-series (virtual apps), B-series (VDI), Q-series (pro viz), C-series (compute/AI) | VDI, virtualized AI |
| NVIDIA MIG | Hardware GPU partitioning | A100 (up to 7 instances), H100, H200, B200 | AI inference, multi-tenant GPU |
| AMD MxGPU | SR-IOV, hardware partitioning | AMD Instinct MI, Radeon Pro | VDI, cloud gaming |
| Intel Server GPU (SG1) | SR-IOV, hardware partitioning | Intel SG1, Flex, Arc | VDI, media transcoding |
| GPU passthrough | Whole GPU dedicated to one VM (vfio-pci) | All GPUs | AI training, HPC, highest performance |
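For the passthrough row, binding a GPU to vfio-pci usually starts from its PCI vendor:device ID. The sketch below extracts those IDs from `lspci -nn`-style text and formats them as a `vfio-pci.ids=` kernel parameter. The helper names are illustrative, and the device ID in the usage note is a placeholder, not a real product ID.

```python
import re

# Matches the "[vendor:device]" pair printed by `lspci -nn`, e.g. [10de:abcd].
# Class codes such as [0302] have no colon and are therefore skipped.
PCI_ID = re.compile(r"\[([0-9a-f]{4}):([0-9a-f]{4})\]")


def vfio_ids(lspci_text: str, vendor: str = "10de") -> list[str]:
    """Return unique vendor:device IDs for one vendor, in first-seen order.

    The default vendor 10de is NVIDIA's PCI vendor ID.
    """
    ids: list[str] = []
    for line in lspci_text.splitlines():
        for ven, dev in PCI_ID.findall(line.lower()):
            pair = f"{ven}:{dev}"
            if ven == vendor and pair not in ids:
                ids.append(pair)
    return ids


def kernel_param(ids: list[str]) -> str:
    """Format the IDs as a vfio-pci.ids= kernel command-line parameter."""
    return "vfio-pci.ids=" + ",".join(ids)
```

Given a line such as `3b:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:abcd]`, `kernel_param(vfio_ids(text))` yields `vfio-pci.ids=10de:abcd`. Two identical GPUs share one ID, which is why duplicates are dropped.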
MIG partition table (A100 / H100)
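| GPU | Partition profile | GPU memory | Compute slices |
|---|---|---|---|
| A100 40 GB | 1g.5gb | 5 GB | 1 |
| A100 40 GB | 2g.10gb | 10 GB | 2 |
| A100 40 GB | 3g.20gb | 20 GB | 3 |
| A100 40 GB | 7g.40gb | 40 GB | 7 |
| A100 40 GB | Full (7× 1g.5gb) | 7 × 5 GB | 7 instances |
| H100 80 GB | 1g.10gb | 10 GB | 1 |
| H100 80 GB | 2g.20gb | 20 GB | 2 |
| H100 80 GB | 3g.40gb | 40 GB | 3 |
| H100 80 GB | 7g.80gb | 80 GB | 7 |

The 5/10/20/40 GB profiles belong to the A100 40 GB; the A100 80 GB uses the same profile sizes as the H100 80 GB. A `+me` suffix (for example 1g.10gb+me) marks a variant that also includes the media engines.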
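MIG profile names encode their size: `Ng.Mgb` means N compute slices and M GB of memory. The sketch below parses such names and checks whether a requested mix stays within 7 compute slices and a given memory total. It is a simplified capacity check under those two assumptions only: real MIG placement also restricts where each instance may start, so `nvidia-smi` remains the authority on valid combinations.

```python
import re

# "3g.20gb" -> 3 compute slices, 20 GB; an optional "+me" suffix is accepted.
PROFILE = re.compile(r"^(\d+)g\.(\d+)gb(\+me)?$")


def parse_profile(name: str) -> tuple[int, int]:
    """Return (compute_slices, memory_gb) for a MIG profile name."""
    match = PROFILE.match(name)
    if match is None:
        raise ValueError(f"not a MIG profile name: {name!r}")
    return int(match.group(1)), int(match.group(2))


def fits(profiles: list[str], total_memory_gb: int, total_slices: int = 7) -> bool:
    """Check slice and memory totals only; placement rules are not modeled."""
    slices = 0
    memory = 0
    for name in profiles:
        s, mem = parse_profile(name)
        slices += s
        memory += mem
    return slices <= total_slices and memory <= total_memory_gb


if __name__ == "__main__":
    print(fits(["1g.5gb"] * 7, total_memory_gb=40))
    print(fits(["3g.20gb", "3g.20gb", "1g.5gb"], total_memory_gb=40))
```

The second request uses exactly 7 compute slices but asks for 45 GB on a 40 GB budget, so it is rejected on memory rather than on compute.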
GPU use cases
AI Training
- Models: LLM (70B-405B+), vision, multimodal
- GPU: H100, B200, GB200, MI300X
- Interconnect: NVLink 4/5 or Infinity Fabric (within a node), InfiniBand NDR (between nodes)
- Parallelism: Data Parallel (DDP), Tensor Parallel (TP), Pipeline Parallel (PP), Fully Sharded (FSDP)
- Framework: PyTorch (NCCL), JAX (XLA), DeepSpeed, Megatron-LM
- Tips:
  - GB200: one Grace CPU plus 2× B200 GPUs, linked by NVLink-C2C; 8 GPUs correspond to 4 GB200 superchips
  - DGX B200 / HGX B200: the standard 8-GPU building block
  - InfiniBand: fat-tree topology for all-reduce optimization
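The parallelism degrees listed above multiply: the number of GPUs in a job is DP × TP × PP. A minimal sketch of that arithmetic, together with a rough estimate of sharded model state, follows. It assumes the commonly cited 16 bytes per parameter for mixed-precision Adam (2 for weights, 2 for gradients, 12 for optimizer state) and ignores activations; the function names are illustrative.

```python
def world_size(dp: int, tp: int, pp: int) -> int:
    """Total GPUs for a job with the given data/tensor/pipeline degrees."""
    return dp * tp * pp


def model_state_gb(params_billion: float, bytes_per_param: float = 16.0) -> float:
    """Weights + gradients + optimizer state in GB (activations excluded).

    1e9 parameters times N bytes equals N GB, so the factors of 1e9 cancel.
    """
    return params_billion * bytes_per_param


def sharded_state_gb_per_gpu(
    params_billion: float, shard_gpus: int, bytes_per_param: float = 16.0
) -> float:
    """Per-GPU model state when it is sharded evenly, as with FSDP."""
    return model_state_gb(params_billion, bytes_per_param) / shard_gpus


if __name__ == "__main__":
    gpus = world_size(dp=8, tp=4, pp=2)
    print(gpus, "GPUs")
    print(model_state_gb(70), "GB of model state for a 70B model")
    print(sharded_state_gb_per_gpu(70, gpus), "GB per GPU when fully sharded")
```

Under these assumptions a 70B model carries about 1120 GB of state, which is why it cannot be trained on a single 80 GB or 192 GB GPU without sharding.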
AI Inference
- Models: LLM serving, embedding, image gen
- GPU: A100, H200, B200 (larger VRAM for larger models)
- Techniques: MIG partition, TensorRT-LLM, vLLM, Triton Inference Server
- Quantization: FP8, INT8, INT4 → lower VRAM, higher throughput
- Latency: batch size optimization, dynamic batching, continuous batching
- Scale: on-prem (2-32 GPU) / cloud (elastic)
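The effect of quantization on VRAM can be estimated from bytes per parameter. The sketch below computes weight size per precision and lists which of the GPUs named above could hold the weights on a single device. The 20% headroom reserved for KV cache and activations is an assumption, as are the function names; actual requirements depend on context length and batch size.

```python
# Bytes per parameter for each weight format.
BYTES_PER_PARAM = {"FP16": 2.0, "FP8": 1.0, "INT8": 1.0, "INT4": 0.5}

# VRAM in GB for the GPUs listed in this section (largest variant).
GPU_VRAM_GB = {"A100": 80, "H200": 141, "B200": 192}


def weight_gb(params_billion: float, precision: str) -> float:
    """Size of the model weights in GB at the given precision."""
    return params_billion * BYTES_PER_PARAM[precision]


def gpus_that_fit(params_billion: float, precision: str, usable: float = 0.8) -> list[str]:
    """GPUs whose usable VRAM (assumed 80% of total) holds the weights."""
    need = weight_gb(params_billion, precision)
    return [name for name, vram in GPU_VRAM_GB.items() if need <= vram * usable]


if __name__ == "__main__":
    for precision in ("FP16", "FP8", "INT4"):
        size = weight_gb(70, precision)
        print(f"70B at {precision}: {size:.0f} GB, fits on {gpus_that_fit(70, precision)}")
```

For a 70B model this gives 140 GB at FP16, 70 GB at FP8 and 35 GB at INT4, which illustrates why lower precision lets the same model move from a multi-GPU setup to a single card.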
VDI (Virtual Desktop Infrastructure)
- GPU: NVIDIA A16 (1 GPU = 16 users), A10 (1 GPU = 4 users)
- Technology: vGPU (Grid), AMD MxGPU
- Protocols: VMware Blast, Citrix HDX, Microsoft RDP, PCoIP (HP Anyware, formerly Teradici)
- Use case: CAD (CATIA, SolidWorks), Office, engineering, healthcare (PACS)
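The users-per-GPU figures above follow from dividing a GPU's framebuffer by the vGPU profile size. A minimal sketch of that sizing arithmetic is shown below, assuming 16 GB per physical GPU on the A16 with a 1 GB profile and 24 GB on the A10 with a 6 GB profile. The profile sizes and function names are assumptions for illustration, and vGPU licensing or per-profile instance limits may lower the result.

```python
def users_per_gpu(framebuffer_gb: int, profile_gb: int) -> int:
    """How many fixed-size vGPU profiles fit into one GPU's framebuffer."""
    if profile_gb <= 0:
        raise ValueError("profile size must be positive")
    return framebuffer_gb // profile_gb


def gpus_needed(users: int, framebuffer_gb: int, profile_gb: int) -> int:
    """Physical GPUs required for a user count, rounded up."""
    per_gpu = users_per_gpu(framebuffer_gb, profile_gb)
    if per_gpu == 0:
        raise ValueError("profile does not fit on this GPU")
    return -(-users // per_gpu)  # ceiling division


if __name__ == "__main__":
    print(users_per_gpu(16, 1))     # A16, 1 GB office profile
    print(users_per_gpu(24, 6))     # A10, 6 GB CAD profile
    print(gpus_needed(100, 16, 1))  # GPUs for 100 office users
```

Office workloads with small profiles are limited by user density, while CAD users with large profiles exhaust the framebuffer after a handful of sessions.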
Rendering and VFX
- GPU: NVIDIA RTX 6000 Ada, RTX A6000, AMD Radeon Pro W7900
- Rendering: Blender (Cycles/OptiX), V-Ray, Octane Render, Redshift
- Denoising: AI-accelerated denoising on GPU
- Farm rendering: Deadline, Qube! (job scheduler)
GPU pricing
For detailed pricing comparisons (purchase price, cloud on-demand, inference cost in $ per million tokens, $ per GB of HBM, price trends 2024→2026), see:
GPU server form factors
| Form factor | GPU count | Power | Cooling | Example |
|---|---|---|---|---|
| 1U | 1-2 | 700-1400 W | Air (high-RPM) | Dell XR4510c |
| 2U | 4-8 | 3-6 kW | Air / Liquid | Dell R760xa, HPE DL380a |
| 4U | 8-10 | 5-8 kW | Air / Liquid | Supermicro SYS-421GE |
| 8U / Chassis | 8-16 | 10-20 kW | Air / Liquid (CDU) | NVIDIA DGX H100, NVIDIA HGX, Supermicro SYS-821GE |
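The power column usually decides rack density before rack space does. The following sketch computes how many servers fit in a rack under both limits. The 42U rack, the 4U reserved for switches and PDUs, and the rack power budgets in the example are assumptions for illustration, not values from the table.

```python
def servers_per_rack(
    rack_kw: float,
    server_kw: float,
    server_units: int,
    rack_units: int = 42,
    reserved_units: int = 4,
) -> int:
    """Servers per rack, limited by whichever of power or space runs out first."""
    by_power = int(rack_kw // server_kw)
    by_space = (rack_units - reserved_units) // server_units
    return min(by_power, by_space)


if __name__ == "__main__":
    # 8U server drawing about 10 kW in an assumed 40 kW liquid-cooled rack.
    print(servers_per_rack(rack_kw=40, server_kw=10, server_units=8))
    # 2U server drawing about 6 kW in an assumed 17 kW air-cooled rack.
    print(servers_per_rack(rack_kw=17, server_kw=6, server_units=2))
```

In the second case the rack has space for 19 two-unit servers but power for only 2, which is the typical situation for GPU servers in air-cooled facilities.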
OpenStack Cyborg (GPU lifecycle management)
Cyborg is an OpenStack service for managing accelerators (GPU, FPGA, DPU, NPU).
Key capabilities
- Discovery — automatic GPU detection on compute nodes (NVIDIA, AMD, Intel)
- Inventory — tracking available accelerators in the cluster
- Lifecycle — attach/detach GPU to VM, firmware update, reset
- Scheduling — Placement API for GPU-aware scheduling (Nova)
- Cyborg API — REST API for accelerator management
Integration
| Component | Role |
|---|---|
| Nova | VM scheduling with GPU requirements (extra_specs: accel:device_profile) |
| Placement | Resource provider for GPU (inventory, traits) |
| Neutron | SR-IOV VF passthrough for GPU networking |
| Ironic | Bare metal + GPU provisioning |
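The link between Nova and Cyborg is a named device profile that a flavor references through the accel:device_profile extra spec. The sketch below builds both pieces as plain Python data, assuming the `resources:` / `trait:` group keys used by Cyborg device profiles. The profile name `gpu-1x`, the resource class `PGPU` and the trait `CUSTOM_GPU_EXAMPLE` are hypothetical placeholders; the exact request shape and the names your drivers report should be checked against the Cyborg API reference and your Placement inventory.

```python
import json
from typing import Optional


def device_profile(
    name: str, resource_class: str, count: int = 1, trait: Optional[str] = None
) -> dict:
    """Build a device profile with a single request group (illustrative layout)."""
    group = {f"resources:{resource_class}": str(count)}
    if trait is not None:
        group[f"trait:{trait}"] = "required"
    return {"name": name, "groups": [group]}


def flavor_extra_specs(profile_name: str) -> dict:
    """Extra spec that makes a Nova flavor request the named device profile."""
    return {"accel:device_profile": profile_name}


if __name__ == "__main__":
    # All three names below are hypothetical placeholders.
    profile = device_profile("gpu-1x", "PGPU", count=1, trait="CUSTOM_GPU_EXAMPLE")
    print(json.dumps(profile, indent=2))
    print(json.dumps(flavor_extra_specs(profile["name"])))
```

Keeping the hardware request in the device profile means flavors only carry a name, so operators can change the underlying resource class or trait without editing every flavor.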
Sources
Links, books and standards: sources/infrastructure/sources.en.md
Last revision: 2026-06-03