# ๐ŸŽฎ GPU โ€” architecture, models, virtualization ## GPU models ### NVIDIA | GPU | Architecture | VRAM | HBM | FP16 (TFLOPS) | FP8 (TFLOPS) | Interconnect | TDP | |-----|-------------|------|-----|--------------|-------------|-------------|-----| | **A100** | Ampere (2020) | 40/80 GB | HBM2e | 312 | โ€” | NVLink 3 (600 GB/s) | 400 W | | **H100** | Hopper (2022) | 80 GB | HBM3 | 1000 | 2000 (sparse) | NVLink 4 (900 GB/s) | 700 W | | **H200** | Hopper (2023) | 141 GB | HBM3e | 1650 | ~3300 | NVLink 4 (900 GB/s) | 700 W | | **B200** | Blackwell (2024) | 192 GB | HBM3e | 2250 | ~4500 | NVLink 5 (1800 GB/s) | 700 W | | **B100** | Blackwell (2024) | 192 GB | HBM3e | ~1800 | ~3600 | NVLink 5 | 700 W | | **GB200** | Blackwell (2024) | โ€” | HBM3e | 4500 (dual) | 9000 (dual) | NVLink 5 | 2700 W | ### AMD | GPU | Architecture | VRAM | HBM | FP16 (TFLOPS) | Interconnect | TDP | |-----|-------------|------|-----|--------------|-------------|-----| | **MI250X** | CDNA 2 (2021) | 128 GB | HBM2e | 383 | Infinity Fabric | 500 W | | **MI300X** | CDNA 3 (2023) | 192 GB | HBM3 | ~2600 | Infinity Fabric (896 GB/s) | 750 W | | **MI350** | CDNA 4 (2025) | 288 GB | HBM3e | ~3500 | Infinity Fabric | 750 W | ## GPU interconnects | Technology | Provider | Bandwidth | Topology | Use case | |------------|-------------|-----------|-----------|----------| | **NVLink 4** | NVIDIA | 900 GB/s (18ร— 50 GB/s) | GPU-GPU direct | AI training (H100, H200) | | **NVLink 5** | NVIDIA | 1800 GB/s (18ร— 100 GB/s) | GPU-GPU direct | AI training (B200, GB200) | | **Infinity Fabric** | AMD | 896 GB/s | GPU-GPU + CPU-GPU | AI training (MI300X, MI350) | | **NVSwitch** | NVIDIA | 900 GB/s per GPU (NVLink) | Full-mesh (256 GPU) | DGX SuperPOD, HGX | | **InfiniBand (NDR)** | NVIDIA/Mellanox | 400 Gbps per port | GPU-NIC direct, RDMA | Distributed training, HPC | | **PCIe 5.0** | Standard | 63 GB/s per x16 | CPU-GPU | Inference, rendering | | **Ethernet (RoCE v2)** | Standard | 100/200/400 GbE | GPU-NIC, RDMA over converged ethernet | AI inference, storage | ### GPU direct communication ``` GPU 0 โ”€โ”€NVLinkโ”€โ”€ GPU 1 GPU 0 โ”€โ”€โ”€PCIeโ”€โ”€โ”€ CPU โ”€โ”€โ”€PCIeโ”€โ”€โ”€ GPU 1 โ”‚ โ”‚ โ”‚ โ”‚ NVSwitch InfiniBand โ”‚ โ”‚ โ”‚ โ”‚ GPU 2 โ”€โ”€NVLinkโ”€โ”€ GPU 3 GPU 2 โ”€โ”€โ”€PCIeโ”€โ”€โ”€ CPU โ”€โ”€โ”€PCIeโ”€โ”€โ”€ GPU 3 NVLink topologie (GPU direct) PCIe topologie (CPU mediated) ``` - **GPU Direct RDMA** โ€” GPU โ†” NIC without CPU (InfiniBand, RoCE) - **GPU Direct Storage** โ€” GPU โ†” NVMe without CPU (NVIDIA Magnum IO) - **NVSwitch** โ€” full bisection bandwidth between all GPUs in a node ## GPU virtualization | Technology | Description | GPU support | Use case | |------------|-------|-------------|----------| | **NVIDIA vGPU (Grid)** | Time slicing + dedicated profiles | A-series (VDI), Q-series (pro viz), B-series (AI) | VDI, virtualized AI | | **NVIDIA MIG** | Hardware GPU partitioning | A100 (7 inst.), H100/H200/B200 | AI inference, multi-tenant GPU | | **AMD MxGPU** | SR-IOV, hardware partitioning | AMD MI (pro), Radeon Pro | VDI, cloud gaming | | **Intel SG (SG1)** | SR-IOV, hardware partitioning | Intel SG1, Flex, Arc | VDI, media transcoding | | **GPU passthrough** | Dedicated GPU to whole VM (VFIO-pci) | All GPUs | AI training, HPC, highest performance | ### MIG partition table (A100 / H100) | GPU | Partition profile | GPU Memory | Compute units | |-----|------------------|-----------|--------------| | **A100 80 GB** | 1g.5gb | 5 GB | 1 | | A100 80 GB | 2g.10gb | 10 GB | 2 | | A100 80 GB | 3g.20gb | 20 GB | 3 | | A100 80 GB | 7g.40gb | 40 GB | 7 | | A100 80 GB | Full (7ร— 1g) | 7 ร— 5 GB | 7 instances | | **H100 80 GB** | 1g.6gb+me | 6 GB | 1 | | H100 80 GB | 2g.12gb+me | 12 GB | 2 | | H100 80 GB | 3g.24gb+me | 24 GB | 3 | | H100 80 GB | 7g.80gb | 80 GB | 7 | ## GPU use cases ### AI Training - **Models**: LLM (70B-405B+), vision, multimodal - **GPU**: H100, B200, GB200, MI300X - **Interconnect**: NVLink 5 / Infinity Fabric (within node), InfiniBand NDR (between nodes) - **Parallelism**: Data Parallel (DDP), Tensor Parallel (TP), Pipeline Parallel (PP), Fully Sharded (FSDP) - **Framework**: PyTorch (NCCL), JAX (XLA), DeepSpeed, Megatron-LM - **Tips**: - GB200: 2ร— B200 connected via NVLink, 8 GPU โ†’ 4 GB200 - DGX B200 / HGX B200: standard building block - InfiniBand: fat tree topology for all-reduce optimization ### AI Inference - **Models**: LLM serving, embedding, image gen - **GPU**: A100, H200, B200 (larger VRAM for larger models) - **Techniques**: MIG partition, TensorRT-LLM, vLLM, Triton Inference Server - **Quantization**: FP8, INT8, INT4 โ†’ lower VRAM, higher throughput - **Latency**: batch size optimization, dynamic batching, continuous batching - **Scale**: on-prem (2-32 GPU) / cloud (elastic) ### VDI (Virtual Desktop Infrastructure) - **GPU**: NVIDIA A16 (1 GPU = 16 users), A10 (1 GPU = 4 users) - **Technology**: vGPU (Grid), AMD MxGPU - **Protocols**: VMware Blast, Citrix HDX, Microsoft RDP, PC-over-IP (HP Teradici) - **Use case**: CAD (CATIA, SolidWorks), Office, engineering, healthcare (PACS) ### Rendering and VFX - **GPU**: NVIDIA RTX 6000 Ada, RTX A6000, AMD Radeon Pro W7900 - **Rendering**: Blender (Cycles/OptiX), V-Ray, Octane Render, Redshift - **Denoising**: AI-accelerated denoising on GPU - **Farm rendering**: Deadline, Qube! (job scheduler) ## GPU pricing Detailed pricing comparisons (purchase price, cloud on-demand, $/M token inference cost, $/GB HBM, price trends 2024โ†’2026) see: - [AI-INFRASTRUCTURE.en.md โ€” GPU pricing and price/performance](AI-INFRASTRUCTURE.en.md#gpu-pricing-and-priceperformance) ## GPU server form factors | Form factor | GPU count | Power | Cooling | Example | |------------|-----------|-------|---------|---------| | **1U** | 1-2 | 700-1400 W | Air (high-RPM) | Dell XR4510c | | **2U** | 4-8 | 3-6 kW | Air / Liquid | Dell R760xa, HPE DL380a | | **4U** | 8-10 | 5-8 kW | Liquid | NVIDIA DGX H100, Dell R760xa | | **8U / Chassis** | 8-16 | 10-20 kW | Liquid (CDU) | NVIDIA HGX, Supermicro SYS-821GE | ## OpenStack Cyborg (GPU lifecycle management) Cyborg is an OpenStack service for managing accelerators (GPU, FPGA, DPU, NPU). ### Key capabilities - **Discovery** โ€” automatic GPU detection on compute nodes (NVIDIA, AMD, Intel) - **Inventory** โ€” tracking available accelerators in the cluster - **Lifecycle** โ€” attach/detach GPU to VM, firmware update, reset - **Scheduling** โ€” Placement API for GPU-aware scheduling (Nova) - **Cyborg API** โ€” REST API for accelerator management ### Integration | Component | Role | |------------|------| | **Nova** | VM scheduling with GPU requirements (extra_specs: `accel:device_profile`) | | **Placement** | Resource provider for GPU (inventory, traits) | | **Neutron** | SR-IOV VF passthrough for GPU networking | | **Ironic** | Bare metal + GPU provisioning | ## Sources Links, books and standards: [sources/infrastructure/sources.en.md](sources/infrastructure/sources.en.md) *Last revision: 2026-06-03*