# 🎮 GPU: architecture, models, virtualization
## GPU models
### NVIDIA
| GPU | Architecture | VRAM | HBM | FP16 dense (TFLOPS) | FP8 dense (TFLOPS) | Interconnect | TDP |
|-----|-------------|------|-----|--------------|-------------|-------------|-----|
| **A100** | Ampere (2020) | 40/80 GB | HBM2 / HBM2e | 312 | n/a | NVLink 3 (600 GB/s) | 400 W |
| **H100** | Hopper (2022) | 80 GB | HBM3 | ~1000 | ~2000 | NVLink 4 (900 GB/s) | 700 W |
| **H200** | Hopper (2023) | 141 GB | HBM3e | ~1000 | ~2000 | NVLink 4 (900 GB/s) | 700 W |
| **B200** | Blackwell (2024) | 192 GB | HBM3e | 2250 | ~4500 | NVLink 5 (1800 GB/s) | 1000 W |
| **B100** | Blackwell (2024) | 192 GB | HBM3e | ~1800 | ~3600 | NVLink 5 | 700 W |
| **GB200** | Blackwell (2024) | n/a | HBM3e | 4500 (dual) | 9000 (dual) | NVLink 5 | 2700 W |
### AMD
| GPU | Architecture | VRAM | HBM | FP16 (TFLOPS) | Interconnect | TDP |
|-----|-------------|------|-----|--------------|-------------|-----|
| **MI250X** | CDNA 2 (2021) | 128 GB | HBM2e | 383 | Infinity Fabric | 500 W |
| **MI300X** | CDNA 3 (2023) | 192 GB | HBM3 | ~1300 (~2600 with sparsity) | Infinity Fabric (896 GB/s) | 750 W |
| **MI350** | CDNA 4 (2025) | 288 GB | HBM3e | ~3500 | Infinity Fabric | 750 W |
## GPU interconnects
| Technology | Vendor | Bandwidth | Topology | Use case |
|------------|-------------|-----------|-----------|----------|
| **NVLink 4** | NVIDIA | 900 GB/s (18× 50 GB/s) | GPU-GPU direct | AI training (H100, H200) |
| **NVLink 5** | NVIDIA | 1800 GB/s (18× 100 GB/s) | GPU-GPU direct | AI training (B200, GB200) |
| **Infinity Fabric** | AMD | 896 GB/s | GPU-GPU + CPU-GPU | AI training (MI300X, MI350) |
| **NVSwitch** | NVIDIA | 900 GB/s per GPU (NVLink) | Full-mesh (256 GPU) | DGX SuperPOD, HGX |
| **InfiniBand (NDR)** | NVIDIA/Mellanox | 400 Gbps per port | GPU-NIC direct, RDMA | Distributed training, HPC |
| **PCIe 5.0** | Standard | ~63 GB/s per direction (x16) | CPU-GPU | Inference, rendering |
| **Ethernet (RoCE v2)** | Standard | 100/200/400 GbE | GPU-NIC, RDMA over converged ethernet | AI inference, storage |
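
The per-link figures in the table multiply out to the aggregate bandwidths, and they give a rough lower bound on how long a transfer takes. Below is a minimal Python sketch, assuming ideal throughput with no latency or protocol overhead. The function names and the 140 GB payload are illustrative assumptions, not part of the original notes.

```python
def aggregate_bandwidth_gb_s(links: int, per_link_gb_s: float) -> float:
    """Total bandwidth of several identical links, in GB/s."""
    return links * per_link_gb_s


def gbit_to_gbyte_per_s(gbit_per_s: float) -> float:
    """Convert a line rate in Gbit/s to GB/s (8 bits per byte)."""
    return gbit_per_s / 8


def ideal_transfer_seconds(payload_gb: float, bandwidth_gb_s: float) -> float:
    """Lower bound on transfer time, ignoring latency and protocol overhead."""
    return payload_gb / bandwidth_gb_s


# Bandwidth figures taken from the interconnect table.
links_gb_s = {
    "NVLink 4": aggregate_bandwidth_gb_s(18, 50),
    "NVLink 5": aggregate_bandwidth_gb_s(18, 100),
    "InfiniBand NDR (1 port)": gbit_to_gbyte_per_s(400),
    "PCIe 5.0 x16": 63,
}

PAYLOAD_GB = 140  # hypothetical payload, e.g. 70B parameters at 2 bytes each

for name, bandwidth in links_gb_s.items():
    seconds = ideal_transfer_seconds(PAYLOAD_GB, bandwidth)
    print(f"{name}: {bandwidth:g} GB/s, {seconds:.2f} s")
```

Real transfers are slower than these bounds, so the numbers are mainly useful for comparing interconnects against each other.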
### GPU direct communication
```
GPU 0 ──NVLink── GPU 1          GPU 0 ───PCIe─── CPU ───PCIe─── GPU 1
        │                                         │
        │                                         │
     NVSwitch                                 InfiniBand
        │                                         │
        │                                         │
GPU 2 ──NVLink── GPU 3          GPU 2 ───PCIe─── CPU ───PCIe─── GPU 3

NVLink topology (GPU direct)    PCIe topology (CPU mediated)
```
- **GPUDirect RDMA**: GPU ↔ NIC without going through the CPU (InfiniBand, RoCE)
- **GPUDirect Storage**: GPU ↔ NVMe without going through the CPU (NVIDIA Magnum IO)
- **NVSwitch**: full bisection bandwidth between all GPUs in a node
## GPU virtualization
| Technology | Description | GPU support | Use case |
|------------|-------|-------------|----------|
| **NVIDIA vGPU (Grid)** | Time slicing + dedicated profiles | Profile series: A (virtual apps), B (virtual PCs), Q (pro viz), C (compute/AI) | VDI, virtualized AI |
| **NVIDIA MIG** | Hardware partitioning of the GPU | A100 (7 inst.), H100/H200/B200 | AI inference, multi-tenant GPU |
| **AMD MxGPU** | SR-IOV, hardware partitioning | AMD Instinct MI, Radeon Pro | VDI, cloud gaming |
| **Intel SG (SG1)** | SR-IOV, hardware partitioning | Intel SG1, Flex, Arc | VDI, media transcoding |
| **GPU passthrough** | Whole GPU dedicated to a single VM (vfio-pci) | All GPUs | AI training, HPC, highest performance |
### MIG partition table (A100 / H100)
| GPU | Partition profile | GPU memory | Compute slices |
|-----|------------------|-----------|--------------|
| **A100 40 GB** | 1g.5gb | 5 GB | 1 |
| A100 40 GB | 2g.10gb | 10 GB | 2 |
| A100 40 GB | 3g.20gb | 20 GB | 3 |
| A100 40 GB | 7g.40gb | 40 GB | 7 |
| A100 40 GB | Full (7× 1g) | 7 × 5 GB | 7 instances |
| **H100 80 GB** | 1g.10gb | 10 GB | 1 |
| H100 80 GB | 2g.20gb | 20 GB | 2 |
| H100 80 GB | 3g.40gb | 40 GB | 3 |
| H100 80 GB | 7g.80gb | 80 GB | 7 |
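
As a quick sanity check when planning a MIG layout, the compute slices and memory of the requested profiles can be summed against the GPU's limits. The sketch below uses the A100 40 GB profile names (`1g.5gb` through `7g.40gb`); the helper `fits_on_gpu` is a hypothetical name. The check is only a necessary condition: actual MIG placement rules are stricter, so `nvidia-smi` remains the authority on which combinations are valid.

```python
# (compute slices, memory in GB) per profile, A100 40 GB naming.
PROFILES = {
    "1g.5gb": (1, 5),
    "2g.10gb": (2, 10),
    "3g.20gb": (3, 20),
    "7g.40gb": (7, 40),
}

MAX_COMPUTE_SLICES = 7
TOTAL_MEMORY_GB = 40


def fits_on_gpu(requested: list[str]) -> bool:
    """Return True if the summed slices and memory stay within one GPU.

    Necessary but not sufficient: real MIG placement has extra constraints.
    """
    compute = sum(PROFILES[name][0] for name in requested)
    memory = sum(PROFILES[name][1] for name in requested)
    return compute <= MAX_COMPUTE_SLICES and memory <= TOTAL_MEMORY_GB


print(fits_on_gpu(["1g.5gb"] * 7))          # True: 7 slices, 35 GB
print(fits_on_gpu(["3g.20gb", "3g.20gb"]))  # True: 6 slices, 40 GB
print(fits_on_gpu(["7g.40gb", "1g.5gb"]))   # False: 8 slices, 45 GB
```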
## GPU use cases
### AI Training
- **Models**: LLM (70B-405B+), vision, multimodal
- **GPU**: H100, B200, GB200, MI300X
- **Interconnect**: NVLink 4/5 / Infinity Fabric (within a node), InfiniBand NDR (between nodes)
- **Parallelism**: Data Parallel (DDP), Tensor Parallel (TP), Pipeline Parallel (PP), Fully Sharded (FSDP)
- **Framework**: PyTorch (NCCL), JAX (XLA), DeepSpeed, Megatron-LM
- **Tips**:
  - GB200: 2× B200 linked by NVLink; 8 GPUs correspond to 4 GB200
  - DGX B200 / HGX B200: the standard building block
  - InfiniBand: fat-tree topology to optimize all-reduce
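
When the parallelism strategies above are combined, the total GPU count is commonly factored as data parallel × tensor parallel × pipeline parallel. A minimal sketch of that arithmetic follows; the function name and the 64-GPU example are assumptions for illustration, and real frameworks add their own constraints on valid layouts.

```python
def data_parallel_degree(world_size: int, tensor_parallel: int, pipeline_parallel: int) -> int:
    """Derive the data-parallel degree from the total GPU count.

    Assumes world_size = DP * TP * PP, so TP * PP must divide world_size.
    """
    model_parallel = tensor_parallel * pipeline_parallel
    if world_size % model_parallel != 0:
        raise ValueError(
            f"{world_size} GPUs cannot be split into TP={tensor_parallel} x PP={pipeline_parallel}"
        )
    return world_size // model_parallel


# Hypothetical cluster: 8 nodes with 8 GPUs each.
# TP=8 keeps tensor-parallel traffic on NVLink inside one node;
# PP and DP traffic then crosses nodes over InfiniBand.
dp = data_parallel_degree(world_size=64, tensor_parallel=8, pipeline_parallel=2)
print(f"DP={dp}, TP=8, PP=2")  # DP=4, TP=8, PP=2
```

Keeping the tensor-parallel group within a single node is a common choice because it is the most bandwidth-hungry dimension.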
### AI Inference
- **Models**: LLM serving, embedding, image generation
- **GPU**: A100, H200, B200 (larger VRAM for larger models)
- **Techniques**: MIG partitioning, TensorRT-LLM, vLLM, Triton Inference Server
- **Quantization**: FP8, INT8, INT4 → lower VRAM use, higher throughput
- **Latency**: batch size tuning, dynamic batching, continuous batching
- **Scale**: on-prem (2-32 GPU) / cloud (elastic)
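
The effect of quantization on VRAM can be estimated from the parameter count and the bytes stored per parameter. The sketch below covers model weights only; KV cache, activations, and runtime overhead come on top, so the GPU counts are lower bounds. The function names are hypothetical.

```python
import math

# Bytes stored per parameter for each weight format.
BYTES_PER_PARAM = {"FP16": 2.0, "FP8": 1.0, "INT8": 1.0, "INT4": 0.5}


def weights_gb(params_billion: float, fmt: str) -> float:
    """Approximate weight size in GB (1e9 params * bytes per param / 1e9)."""
    return params_billion * BYTES_PER_PARAM[fmt]


def min_gpus_for_weights(params_billion: float, fmt: str, vram_gb: float) -> int:
    """Lower bound on GPU count: weights only, no KV cache or activations."""
    return math.ceil(weights_gb(params_billion, fmt) / vram_gb)


for fmt in BYTES_PER_PARAM:
    size = weights_gb(70, fmt)
    gpus = min_gpus_for_weights(70, fmt, vram_gb=80)
    print(f"70B @ {fmt}: {size:g} GB of weights, at least {gpus} x 80 GB GPU")
```

For a 70B model this gives 140 GB at FP16 versus 35 GB at INT4, which is why quantized models fit on a single 80 GB card while FP16 needs at least two.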
### VDI (Virtual Desktop Infrastructure)
- **GPU**: NVIDIA A16 (1 GPU = 16 users), A10 (1 GPU = 4 users)
- **Technology**: vGPU (Grid), AMD MxGPU
- **Protocols**: VMware Blast, Citrix HDX, Microsoft RDP, PC-over-IP (HP Teradici)
- **Use case**: CAD (CATIA, SolidWorks), Office, engineering, healthcare (PACS)
### Rendering and VFX
- **GPU**: NVIDIA RTX 6000 Ada, RTX A6000, AMD Radeon Pro W7900
- **Rendering**: Blender (Cycles/OptiX), V-Ray, Octane Render, Redshift
- **Denoising**: AI-accelerated denoising on the GPU
- **Farm rendering**: Deadline, Qube! (job scheduler)
## GPU server form factors
| Form factor | GPU count | Power | Cooling | Example |
|------------|-----------|-------|---------|---------|
| **1U** | 1-2 | 700-1400 W | Air (high-RPM) | Dell XR4510c |
| **2U** | 4-8 | 3-6 kW | Air / Liquid | Dell R760xa, HPE DL380a |
| **4U** | 8-10 | 5-8 kW | Air / Liquid | Supermicro SYS-421GE-TNRT |
| **8U / Chassis** | 8-16 | 10-20 kW | Air / Liquid (CDU) | NVIDIA DGX H100 (HGX baseboard), Supermicro SYS-821GE |
## Sources
Links, books, and standards: [sources/infrastructure/sources.md](sources/infrastructure/sources.md)
*Last revised: 2026-06-03*