From 0c2002c5409eb76fbd8da1254390170fab63e961 Mon Sep 17 00:00:00 2001
From: Stanislav Hubacek
Date: Thu, 11 Jun 2026 13:54:16 +0200
Subject: [PATCH] Delete GPU.md

---
 GPU.md | 128 ---------------------------------------------------------
 1 file changed, 128 deletions(-)
 delete mode 100644 GPU.md

diff --git a/GPU.md b/GPU.md
deleted file mode 100644
index 6529b1c..0000000
--- a/GPU.md
+++ /dev/null
@@ -1,128 +0,0 @@
-# 🎮 GPU: architecture, models, virtualization
-
-## GPU models
-
-### NVIDIA
-
-| GPU | Architecture | VRAM | HBM | FP16 (TFLOPS, dense) | FP8 (TFLOPS, dense) | Interconnect | TDP |
-|-----|-------------|------|-----|--------------|-------------|-------------|-----|
-| **A100** | Ampere (2020) | 40/80 GB | HBM2/HBM2e | 312 | n/a | NVLink 3 (600 GB/s) | 400 W |
-| **H100** | Hopper (2022) | 80 GB | HBM3 | ~1000 | ~2000 | NVLink 4 (900 GB/s) | 700 W |
-| **H200** | Hopper (2023) | 141 GB | HBM3e | ~1000 | ~2000 | NVLink 4 (900 GB/s) | 700 W |
-| **B200** | Blackwell (2024) | 192 GB | HBM3e | 2250 | ~4500 | NVLink 5 (1800 GB/s) | 1000 W |
-| **B100** | Blackwell (2024) | 192 GB | HBM3e | ~1800 | ~3600 | NVLink 5 | 700 W |
-| **GB200** | Blackwell (2024) | n/a | HBM3e | 4500 (dual) | 9000 (dual) | NVLink 5 | 2700 W |
-
-### AMD
-
-| GPU | Architecture | VRAM | HBM | FP16 (TFLOPS) | Interconnect | TDP |
-|-----|-------------|------|-----|--------------|-------------|-----|
-| **MI250X** | CDNA 2 (2021) | 128 GB | HBM2e | 383 | Infinity Fabric | 500 W |
-| **MI300X** | CDNA 3 (2023) | 192 GB | HBM3 | ~1300 (dense; ~2600 with sparsity) | Infinity Fabric (896 GB/s) | 750 W |
-| **MI350** | CDNA 4 (2025) | 288 GB | HBM3e | ~3500 | Infinity Fabric | 750 W |
-
-## GPU interconnects
-
-| Technology | Vendor | Bandwidth | Topology | Use case |
-|------------|-------------|-----------|-----------|----------|
-| **NVLink 4** | NVIDIA | 900 GB/s (18× 50 GB/s) | GPU-GPU direct | AI training (H100, H200) |
-| **NVLink 5** | NVIDIA | 1800 GB/s (18× 100 GB/s) | GPU-GPU direct | AI training (B200, GB200) |
-| **Infinity Fabric** | AMD | 896 GB/s | GPU-GPU + CPU-GPU | AI training (MI300X, MI350) |
-| **NVSwitch** | NVIDIA | 900 GB/s per GPU (NVLink) | Full-mesh (256 GPU) | DGX SuperPOD, HGX |
-| **InfiniBand (NDR)** | NVIDIA/Mellanox | 400 Gbps per port | GPU-NIC direct, RDMA | Distributed training, HPC |
-| **PCIe 5.0** | Standard | 63 GB/s per x16 | CPU-GPU | Inference, rendering |
-| **Ethernet (RoCE v2)** | Standard | 100/200/400 GbE | GPU-NIC, RDMA over Converged Ethernet | AI inference, storage |
-
-### GPU direct communication
-
-```
-GPU 0 ──NVLink── GPU 1         GPU 0 ───PCIe─── CPU ───PCIe─── GPU 1
-    │                                            │
-    │                                            │
-NVSwitch                                     InfiniBand
-    │                                            │
-    │                                            │
-GPU 2 ──NVLink── GPU 3         GPU 2 ───PCIe─── CPU ───PCIe─── GPU 3
-
-NVLink topology (GPU direct)   PCIe topology (CPU mediated)
-```
-
-- **GPUDirect RDMA**: GPU ↔ NIC without the CPU (InfiniBand, RoCE)
-- **GPUDirect Storage**: GPU ↔ NVMe without the CPU (NVIDIA Magnum IO)
-- **NVSwitch**: full bisection bandwidth between all GPUs in a node
-
-## GPU virtualization
-
-| Technology | Description | GPU support | Use case |
-|------------|-------|-------------|----------|
-| **NVIDIA vGPU (GRID)** | Time slicing + dedicated profiles | Profile series: A (virtual apps), B (virtual PC), Q (pro viz), C (compute/AI) | VDI, virtualized AI |
-| **NVIDIA MIG** | Hardware GPU partitioning | A100 (7 inst.), H100/H200/B200 | AI inference, multi-tenant GPU |
-| **AMD MxGPU** | SR-IOV, hardware partitioning | AMD MI (pro), Radeon Pro | VDI, cloud gaming |
-| **Intel SG (SG1)** | SR-IOV, hardware partitioning | Intel SG1, Flex, Arc | VDI, media transcoding |
-| **GPU passthrough** | Whole GPU dedicated to a single VM (vfio-pci) | All GPUs | AI training, HPC, highest performance |
-
-### MIG partition table (A100 / H100)
-
-| GPU | Partition profile | GPU Memory | Compute slices |
-|-----|------------------|-----------|--------------|
-| **A100 40 GB** | 1g.5gb | 5 GB | 1 |
-| A100 40 GB | 2g.10gb | 10 GB | 2 |
-| A100 40 GB | 3g.20gb | 20 GB | 3 |
-| A100 40 GB | 7g.40gb | 40 GB | 7 |
-| A100 40 GB | Full (7× 1g) | 7 × 5 GB | 7 instances |
-| **H100 80 GB** | 1g.10gb | 10 GB | 1 |
-| H100 80 GB | 2g.20gb | 20 GB | 2 |
-| H100 80 GB | 3g.40gb | 40 GB | 3 |
-| H100 80 GB | 7g.80gb | 80 GB | 7 |
-
-## GPU use cases
-
-### AI Training
-
-- **Models**: LLM (70B-405B+), vision, multimodal
-- **GPU**: H100, B200, GB200, MI300X
-- **Interconnect**: NVLink 5 / Infinity Fabric (within a node), InfiniBand NDR (between nodes)
-- **Parallelism**: Data Parallel (DDP), Tensor Parallel (TP), Pipeline Parallel (PP), Fully Sharded (FSDP)
-- **Framework**: PyTorch (NCCL), JAX (XLA), DeepSpeed, Megatron-LM
-- **Tips**:
-  - GB200: one Grace CPU plus 2× B200 linked over NVLink-C2C, so 8 GPUs → 4 GB200
-  - DGX B200 / HGX B200: the standard building block
-  - InfiniBand: fat-tree topology to optimize all-reduce
-
-### AI Inference
-
-- **Models**: LLM serving, embedding, image gen
-- **GPU**: A100, H200, B200 (larger VRAM for larger models)
-- **Techniques**: MIG partitioning, TensorRT-LLM, vLLM, Triton Inference Server
-- **Quantization**: FP8, INT8, INT4 → lower VRAM use, higher throughput
-- **Latency**: batch size tuning, dynamic batching, continuous batching
-- **Scale**: on-prem (2-32 GPU) / cloud (elastic)
-
-### VDI (Virtual Desktop Infrastructure)
-
-- **GPU**: NVIDIA A16 (1 GPU = 16 users), A10 (1 GPU = 4 users)
-- **Technology**: vGPU (GRID), AMD MxGPU
-- **Protocols**: VMware Blast, Citrix HDX, Microsoft RDP, PCoIP (HP Anyware, formerly Teradici)
-- **Use case**: CAD (CATIA, SolidWorks), Office, engineering, healthcare (PACS)
-
-### Rendering and VFX
-
-- **GPU**: NVIDIA RTX 6000 Ada, RTX A6000, AMD Radeon Pro W7900
-- **Rendering**: Blender (Cycles/OptiX), V-Ray, Octane Render, Redshift
-- **Denoising**: AI-accelerated denoising on the GPU
-- **Farm rendering**: Deadline, Qube! (job scheduler)
-
-## GPU server form factors
-
-| Form factor | GPU count | Power | Cooling | Example |
-|------------|-----------|-------|---------|---------|
-| **1U** | 1-2 | 700-1400 W | Air (high-RPM) | Dell XR4510c |
-| **2U** | 4-8 | 3-6 kW | Air / Liquid | Dell R760xa, HPE DL380a |
-| **4U** | 8-10 | 5-8 kW | Liquid | NVIDIA DGX H100, Dell R760xa |
-| **8U / Chassis** | 8-16 | 10-20 kW | Liquid (CDU) | NVIDIA HGX, Supermicro SYS-821GE |
-
-## Sources
-
-Links, books, and standards: [sources/infrastructure/sources.md](sources/infrastructure/sources.md)
-
-*Last revised: 2026-06-03*