knowledge-base/STORAGE.md, last commit ef3c2f75b1 by Stanislav Hubacek (2026-06-18 16:25:33 +02:00)

# 💾 Storage infrastructure
## Storage types
| Type | Description | Latency | Use case |
|-----|-------|---------|----------|
| **DAS** (Direct Attached Storage) | Disks installed directly in the server | <0.1 ms (flash) | OS, cache, local data |
| **SAN** (Storage Area Network) | Block devices over a network | <1 ms | Databases, VM datastores |
| **NAS** (Network Attached Storage) | File-level access (NFS, SMB) | 1-3 ms | Shared files, home dirs |
| **Object storage** | REST API, flat namespace | 10-100 ms | Backups, media, big data |
## Protocols
| Protocol | Type | Speed | Notes |
|----------|-----|----------|----------|
| **Fibre Channel** | SAN | 8/16/32/64 Gbps | Low latency, dedicated network |
| **iSCSI** | SAN (IP) | 1/10/25 GbE | Cheaper, runs over Ethernet |
| **NVMe-oF** | SAN (NVMe) | 25/50/100 GbE | Lowest latency, emerging |
| **NFS** | NAS | 1/10/25 GbE | Universal, simple |
| **SMB/CIFS** | NAS | 1/10/25 GbE | Windows native |
| **S3 API** | Object | — | De facto standard for object storage |
## RAID
| RAID | Min. disks | Usable capacity | Fault tolerance | Read speed | Write speed | Use case |
|------|-----------|----------|---------|---------------|----------------|----------|
| **0** | 2 | 100 % | None | N × (striping) | N × | Temp data, cache (risky) |
| **1** | 2 | 50 % | 1 disk | N × (mirror) | 1 × | OS disk, critical data |
| **5** | 3 | 67-94 % | 1 disk | N-1 × | N-1 × (parity write penalty) | General-purpose file/VM storage |
| **6** | 4 | 50-88 % | 2 disks | N-2 × | N-2 × (double parity) | Large capacities, important data |
| **10** | 4 | 50 % | 1 per mirror | N × | N/2 × | Databases, VMs, high performance |
| **50** | 6 | 67-94 % | 1 per RAID 5 group | N-1 × | N-1 × | Large capacity + performance |
| **60** | 8 | 50-88 % | 2 per RAID 6 group | N-2 × | N-2 × | Enterprise |

N = number of disks. The capacity ranges assume 3-16 disks per RAID 5/50 group and 4-16 disks per RAID 6/60 group.
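
The capacity and write columns can be turned into a quick sizing estimate. The sketch below is a minimal illustration, assuming a single RAID group of identical disks and the common rule-of-thumb penalties for small random writes (2 back-end I/Os per write for RAID 1/10, 4 for RAID 5, 6 for RAID 6). The table's N-1 × write figure applies to sequential full-stripe writes, which this sketch ignores, along with controller cache and hot spares; `estimate_raid` and its parameters are hypothetical.

```python
from dataclasses import dataclass

# Rule-of-thumb back-end I/Os per small random front-end write:
# RAID 1/10 write both mirror copies; RAID 5 reads old data + old parity and
# writes new data + new parity; RAID 6 does the same with two parity blocks.
WRITE_PENALTY = {"0": 1, "1": 2, "10": 2, "5": 4, "6": 6}


@dataclass
class RaidEstimate:
    usable_tb: float
    usable_fraction: float
    random_read_iops: float
    random_write_iops: float


def data_disks(level: str, disks: int) -> int:
    """Number of disks' worth of capacity that holds user data (hypothetical helper)."""
    if level == "0" and disks >= 2:
        return disks
    if level == "1" and disks == 2:
        return 1
    if level == "10" and disks >= 4 and disks % 2 == 0:
        return disks // 2
    if level == "5" and disks >= 3:
        return disks - 1
    if level == "6" and disks >= 4:
        return disks - 2
    raise ValueError(f"unsupported combination: RAID {level} with {disks} disks")


def estimate_raid(level: str, disks: int, disk_tb: float, disk_iops: float) -> RaidEstimate:
    """Rough estimate for one RAID group; ignores cache, spares and full-stripe writes."""
    usable = data_disks(level, disks)
    total_iops = disks * disk_iops  # every spindle can serve random reads
    return RaidEstimate(
        usable_tb=usable * disk_tb,
        usable_fraction=usable / disks,
        random_read_iops=total_iops,
        random_write_iops=total_iops / WRITE_PENALTY[level],
    )


for level, n in [("5", 8), ("6", 8), ("10", 8)]:
    est = estimate_raid(level, n, disk_tb=10.0, disk_iops=200.0)
    print(f"RAID {level:>2}: {est.usable_tb:.0f} TB usable ({est.usable_fraction:.0%}), "
          f"~{est.random_write_iops:.0f} random-write IOPS")
```

For eight 10 TB disks at 200 IOPS each, RAID 5 yields 70 TB usable but only about 400 random-write IOPS, while RAID 10 yields 40 TB and about 800 write IOPS, which is why the table points databases at RAID 10.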
### Stripe size
- Small stripe (16-64 KB): better IOPS, lower throughput (databases, OLTP)
- Large stripe (128-1024 KB): better throughput, lower IOPS (video, media, backup)
- Write hole in RAID 5/6: if a crash or power loss interrupts a stripe update between writing the data and writing the parity, data and parity become inconsistent, and a later rebuild silently reconstructs wrong data (mitigation: non-volatile cache, battery-backed RAID controller)
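
The write penalty, the write hole and the stripe-size trade-off all follow from how single parity is computed. A minimal sketch of that arithmetic with toy 4-byte chunks, assuming RAID 5 parity is a plain XOR of the data chunks (RAID 6 adds a second parity computed differently, not shown here); the helper names are hypothetical and this illustrates the math, not controller code.

```python
def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equally sized blocks; RAID 5 parity is the XOR of a stripe's data chunks."""
    size = len(blocks[0])
    if any(len(block) != size for block in blocks):
        raise ValueError("all blocks must have the same size")
    out = bytearray(size)
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


def rmw_parity(old_parity: bytes, old_chunk: bytes, new_chunk: bytes) -> bytes:
    """Read-modify-write: new parity = old parity XOR old data XOR new data."""
    return xor_blocks(old_parity, old_chunk, new_chunk)


def is_full_stripe_write(offset_kb: int, length_kb: int, stripe_unit_kb: int, data_disks: int) -> bool:
    """A write covering whole, aligned stripes can compute parity without reading anything."""
    width_kb = stripe_unit_kb * data_disks
    return length_kb > 0 and offset_kb % width_kb == 0 and length_kb % width_kb == 0


# One stripe with three data chunks and one parity chunk.
d0, d1, d2 = b"AAAA", b"BBBB", b"CCCC"
parity = xor_blocks(d0, d1, d2)

# Losing any one chunk is recoverable: XOR the survivors with the parity.
assert xor_blocks(d0, d2, parity) == d1

# Small update of d1: read old d1 + old parity, write new d1 + new parity (4 I/Os).
new_d1 = b"ZZZZ"
new_parity = rmw_parity(parity, d1, new_d1)
assert new_parity == xor_blocks(d0, new_d1, d2)

# Write hole: new d1 reached the disk but the parity update was lost in a crash.
# If the disk holding d0 later fails, the rebuild silently produces wrong data.
rebuilt_d0 = xor_blocks(new_d1, d2, parity)  # uses the stale parity
assert rebuilt_d0 != d0

# Stripe size: 64 KB stripe units x 4 data disks = 256 KB full stripe.
print(is_full_stripe_write(0, 1024, 64, 4))  # 1 MB aligned sequential write
print(is_full_stripe_write(0, 8, 64, 4))     # 8 KB database page, needs read-modify-write
```

A small stripe unit lets more independent small I/Os run on different disks in parallel, while a large one makes long sequential writes more likely to cover whole stripes and skip the read-modify-write cycle.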
## Software-Defined Storage (SDS)
| Tool | Type | Use case |
|---------|-----|----------|
| **Ceph** | Object/Block/File (RADOS) | General-purpose SDS, OpenStack, Kubernetes |
| **MinIO** | Object (S3 API) | High-performance S3, AI/ML data lake |
| **GlusterFS** | Distributed File | Shared filesystem, POSIX |
| **Longhorn** | Block (Kubernetes) | K8s PVCs, microservices |
| **Linstor** | Block (DRBD + LVM) | Linux SDS, Kubernetes |
| **VMware vSAN** | Block (HCI) | VMware ecosystem |
| **StarWind** | Block (HCI) | Hyper-V / VMware |
### Ceph
**Architecture**:
```
RADOS (Reliable Autonomic Distributed Object Store)
├── Monitors (MON): cluster map, quorum (typically 3 or 5 MONs)
├── Managers (MGR): dashboard, balancer, orchestrator
├── OSDs (Object Storage Daemons): data + replication
└── MDS (Metadata Server): CephFS only
```
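
The "3 or 5" monitor count follows from majority quorum: Ceph monitors agree on the cluster map via Paxos, so n MONs need floor(n/2) + 1 of them alive. A minimal sketch of that arithmetic (the helper names are hypothetical) shows why even counts add no resilience: 4 MONs tolerate one failure, the same as 3.

```python
def quorum_size(monitors: int) -> int:
    """Votes needed for a strict majority of the monitor map (Paxos-style quorum)."""
    return monitors // 2 + 1


def tolerated_failures(monitors: int) -> int:
    """How many MONs can be down while the rest still form a quorum."""
    return monitors - quorum_size(monitors)


for n in range(1, 8):
    print(f"{n} MONs: quorum {quorum_size(n)}, tolerates {tolerated_failures(n)} failure(s)")
```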
**CRUSH map** (Controlled Replication Under Scalable Hashing):
- Algorithm that computes where data is placed (no central lookup table)
- Hierarchy: Root → Datacenter → Rack → Host → OSD
- Failure domain: replicas are spread across racks or hosts
- Example (replicated rule, root `default`, failure domain `host`): `ceph osd crush rule create-replicated replicated_rule default host`
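
Because placement is computed rather than looked up, any client can find an object's OSDs on its own. The sketch below imitates that idea with a straw2-style weighted draw and `host` as the failure domain. It is a simplified illustration, not Ceph's CRUSH code: real Ceph first hashes the object into a placement group (PG) and runs CRUSH per PG, uses its own hash function and integer logarithm, and supports arbitrary hierarchy levels. The cluster layout and function names are hypothetical.

```python
import hashlib
import math

# Hypothetical cluster: host -> {OSD: CRUSH weight (typically the disk size in TiB)}.
CLUSTER = {
    "host-a": {"osd.0": 4.0, "osd.1": 4.0},
    "host-b": {"osd.2": 8.0},
    "host-c": {"osd.3": 4.0, "osd.4": 4.0},
    "host-d": {"osd.5": 8.0},
}


def unit_hash(*parts: object) -> float:
    """Deterministic pseudo-random number in (0, 1] derived from the inputs."""
    digest = hashlib.sha256("/".join(map(str, parts)).encode()).digest()
    n = int.from_bytes(digest[:8], "big") >> 11  # keep 53 bits
    return (n + 1) / 2**53


def straw2_choose(key: str, items: dict, attempt: int) -> str:
    """Straw2-style draw: each item draws ln(u) / weight and the highest draw wins,
    so an item is picked with probability proportional to its weight."""
    best, best_draw = "", -math.inf
    for name, weight in sorted(items.items()):
        if weight <= 0:
            continue
        draw = math.log(unit_hash(key, name, attempt)) / weight
        if draw > best_draw:
            best, best_draw = name, draw
    return best


def place(key: str, replicas: int = 3) -> list:
    """Pick `replicas` distinct hosts (failure domain = host), then one OSD per host."""
    host_weights = {host: sum(osds.values()) for host, osds in CLUSTER.items()}
    if replicas > len(host_weights):
        raise ValueError("not enough failure domains for the requested replica count")
    hosts = []
    for attempt in range(100 * replicas):  # retry with a new attempt number on collision
        host = straw2_choose(key, host_weights, attempt)
        if host not in hosts:
            hosts.append(host)
        if len(hosts) == replicas:
            return [straw2_choose(key, CLUSTER[host], 0) for host in hosts]
    raise RuntimeError("could not find enough distinct hosts")


print(place("pg-1.7f"))
```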

**Access interfaces**:

| Interface | Type | Use case |
|----------|-----|----------|
| **RBD** (RADOS Block Device) | Block | VM images, Kubernetes PVCs (csi-rbd) |
| **RGW** (RADOS Gateway) | Object (S3/Swift API) | S3-compatible storage, backup |
| **CephFS** | File (POSIX) | Shared filesystem, home dirs |
| **NFS-Ganesha** | File (NFS) | NFS export of CephFS |

**Erasure coding**:
- K+M (data + parity chunks), e.g. 8+3 (8 data chunks, 3 parity chunks); survives the loss of any M chunks
- More space-efficient than 3× replication (1.375× vs 3× raw capacity per byte stored)
- Higher CPU overhead, lower IOPS
- Recommended instead of replication for cold data (RGW)
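
The overhead figures follow directly from K and M: a K+M pool stores (K+M)/K raw bytes per user byte and survives the loss of any M chunks, but needs at least K+M failure domains (for example hosts) so every chunk lands separately. A minimal sketch comparing profiles; the names are hypothetical and the 4+2 profile is an extra illustration, not a recommendation from the text.

```python
from fractions import Fraction
from typing import NamedTuple


class Protection(NamedTuple):
    raw_per_usable: Fraction   # raw bytes stored per byte of user data
    tolerated_failures: int    # chunks/copies that can be lost without data loss
    min_failure_domains: int   # each chunk/copy needs its own host/rack


def erasure_coding(k: int, m: int) -> Protection:
    return Protection(Fraction(k + m, k), m, k + m)


def replication(size: int) -> Protection:
    return Protection(Fraction(size), size - 1, size)


profiles = [("EC 8+3", erasure_coding(8, 3)), ("EC 4+2", erasure_coding(4, 2)),
            ("3x replication", replication(3))]
for name, p in profiles:
    print(f"{name}: {float(p.raw_per_usable):.3f}x raw, survives {p.tolerated_failures} failures, "
          f"needs {p.min_failure_domains} failure domains")
```

For 1 PB of user data that is 1.375 PB raw with 8+3 versus 3 PB with 3× replication; the price is that 8+3 needs 11 hosts (or racks) to honour the configured failure domain.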
## Enterprise storage vendors
PBe denotes effective capacity after data reduction; DRR is the data reduction ratio.
### Hitachi VSP (Virtual Storage Platform)
| Model | Architecture | Max capacity | IOPS / Latency | Protocols | Use case |
|-------|-------------|--------------|----------------|-----------|----------|
| **VSP 5200/5600** | Active-active, scale-up/out, 2-12 controllers | 69.3 PB raw, 287 PBe | 33M IOPS, 39 µs | FC-NVMe 32Gb, FC 16/32Gb, FICON 16Gb, iSCSI 10Gb | Mission-critical, mainframe, enterprise consolidation |
| **VSP E590/E790/E1090** | Symmetric active-active, up to 65 nodes/130 controllers | 10.62 PB raw (E1090) | 8.4M IOPS, <41 µs | FC 32Gb, iSCSI 25Gb, FC-NVMe 32Gb | Midrange enterprise, hybrid workloads |

**Key features**: SVOS shared across the entire portfolio, AI-driven data reduction with a 4:1 guarantee, Global-Active Device metro clustering, 8 nines availability (HW), 100% data availability guarantee.

---
### Huawei OceanStor Dorado
| Model | Architecture | Max capacity | IOPS / Latency | Protocols | Use case |
|-------|-------------|--------------|----------------|-----------|----------|
| **Dorado 8000/18000 V6** | SmartMatrix full-mesh, up to 32 controllers | 32 TB cache, 6,400 SSDs | 40M IOPS, 0.05 ms | FC 32/64Gb, FC-NVMe, iSCSI, NFS, SMB, NVMe/RoCE, S3 | Mission-critical, finance, government, carriers |
| **Dorado 8000/18000 V7 (2025)** | SmartMatrix 4.0, up to 64/128 controllers | 500 PB+ | >100M IOPS, 0.03 ms | FC, RoCE, NVMe/TCP, NFS, SMB, S3 | AI workloads, converged block/file/object |

**Key features**: SmartMatrix survives the failure of 7 out of 8 controllers, FlashEver (online hardware upgrades across 3 generations over 10 years), RAID-TP (tolerates three simultaneous SSD failures), DPU-based SmartNIC, ML-based I/O prefetch, 100% ransomware detection (Tolly test), #1 in the SPC-1 benchmark.

---
### Dell PowerStore & PowerMax
| Model | Architecture | Max capacity | IOPS / Latency | Protocols | Use case |
|-------|-------------|--------------|----------------|-----------|----------|
| **PowerStore 1500/5500/9500 (Gen 3)** | Active-active dual-node, PCIe Gen5, DDR5, RDMA 200GbE | 1.2 PB raw, 5.8 PBe | 3× the IOPS of Gen 2 | FC 32/64Gb, iSCSI, NVMe/FC, NVMe/TCP, NFSv4, SMB3 | Midrange to high-end, VMware, containerized |
| **PowerMax 2500/8500** | Scale-out NVMe, Dynamic Fabric, up to 16 nodes | 8.8 PBe (2500), 18 PBe (8500) | 6 nines availability | FC 64Gb, FICON, NVMe/FC, NVMe/TCP, iSCSI, NFS, SMB | Mission-critical, mainframe, OLTP, cyber vault |

**Key features**: PowerStore with a 6:1 DRR guarantee, unified block/file/vVols out of the box and Cyber Detect AI anomaly detection; PowerMax with 5:1 DRR, 65M Secure Snapshots, SRDF/Metro, Flexible RAID up to 92% efficient and FIPS 140-3.

---
### HPE Alletra
| Model | Architecture | Max capacity | IOPS / Latency | Protocols | Use case |
|-------|-------------|--------------|----------------|-----------|----------|
| **Alletra 5000** | Active-active hybrid flash, dual controller | 1.2 PB raw | 99.9999% availability guarantee | FC, iSCSI | Mixed primary + secondary, cost-efficient hybrid |
| **Alletra 6000** | Active-active all-NVMe, dual controller | ~368 TB usable | <100 µs | FC, iSCSI | Business-critical DB, VDI, VMware |
| **Alletra 9000** | Active-active all-NVMe, multi-node scale-out | 24 PB+ usable | ~2-3M IOPS, <150 µs | FC, iSCSI, NVMe/FC | Mission-critical ERP, AI, consolidation |
| **Alletra Storage MP** | Disaggregated modular, block + file + object | 5.8 PB block, 11.8 PB object | 100% availability guarantee | FC, iSCSI, NVMe/FC, NFS, SMB, S3 | Multi-protocol consolidation, AI/analytics |

**Key features**: Triple Parity RAID (5000), InfoSight AIOps, HPE GreenLake as-a-service, non-disruptive controller upgrades (MP), 100% data availability guarantee.

---
### Infinidat
| Model | Architecture | Max capacity | IOPS / Latency | Protocols | Use case |
|-------|-------------|--------------|----------------|-----------|----------|
| **InfiniBox SSA G4** | Triple-active controllers, AMD EPYC PCIe 5.0, DDR5 | 1.97 PB usable / 5.9 PBe | 2.24M IOPS, 35 µs | FC 32Gb, 25/100GbE, NVMe-oF/TCP, iSCSI, NFS, SMB, S3 | Mission-critical Oracle/SQL, multi-site DR |
| **InfiniBox G4 Hybrid** | Triple-active hybrid (HDD + flash cache) | 10.9 PB raw / 32.8 PBe | 2.24M IOPS, 64 GB/s | FC, Ethernet, NVMe-oF, iSCSI, NFS, SMB, S3 | Backup, massive unstructured data |

**Key features**: the only 3-way active-active controller architecture on the market, Neural Cache (ML-driven), InfiniRAID, immutable snapshots, guarantees of 100% availability and 1-minute snapshot recovery, everything included in the base price (no extra licensing).

---
### Pure Storage FlashArray
| Model | Architecture | Max capacity | IOPS / Latency | Protocols | Use case |
|-------|-------------|--------------|----------------|-----------|----------|
| **FlashArray//X (X20-X90 R5)** | Active-active, NVMe DirectFlash | 1.2 PB raw / 4.4 PBe | 250 µs, 5:1 DRR | FC, NVMe/FC, NVMe/RoCE, NVMe/TCP, iSCSI, NFS, SMB | Mission-critical DB, VMware, enterprise |
| **FlashArray//C (C50-C90 R5)** | Active-active, QLC DirectFlash | 4.2 PB raw / 16.3 PBe | 5:1 DRR | FC, NVMe-oF, iSCSI, NFS, SMB | Capacity-optimized, backup, file |
| **FlashArray//XL (XL190)** | Active-active, 40 DirectFlash modules | 1.9 PB raw / 9.4 PBe | >4M IOPS, <100 µs, 45 GB/s | FC 64Gb, 100GbE RoCE, NVMe/FC, NVMe/TCP, NFS, SMB | Largest database consolidation, OLTP |

**Key features**: DirectFlash (no per-drive FTL layer), 99.9999% availability, Evergreen (no forklift upgrades), Purity OS unified across the entire portfolio, ActiveCluster/ActiveDR, Pure1 AIOps.

---
### Lenovo ThinkSystem
| Model | Architecture | Max capacity | IOPS / Latency | Protocols | Use case |
|-------|-------------|--------------|----------------|-----------|----------|
| **DM Series** (DM3200F/5200F/7200F) | Active-active, all-NVMe, NetApp ONTAP | 1.8 PB raw / 6.8 PBe | Up to 120 NVMe SSDs | FC 64Gb, iSCSI, NVMe/FC, NFS, SMB, S3 | Unified block/file, AI/ML, VMware |
| **DG Series** (DG5200/7200) | Active-active, all-QLC, ONTAP | 7.4 PB raw / 27 PBe | QLC economics | FC, NVMe/FC, NVMe/TCP, iSCSI, NFS, SMB, S3 | Capacity-optimized, backup, archive |
| **DE Series** (DE4000F-DE6600F) | Active-active, SAS/NVMe hybrid | 1.84 PB raw | 2M IOPS, <100 µs, 44 GB/s | FC 32Gb, iSCSI 25Gb, NVMe/FC, SAS, NVMe/RoCE | HPC, analytics, video surveillance |

**Key features**: DM/DG run ONTAP (SnapMirror, SnapVault, FabricPool, RAID-DP/RAID-TEC); cluster scale-out up to 12 HA pairs; the DE series offers the best price/performance in the portfolio.

---
### Synology
| Model | Architecture | Max capacity | Protocols | Use case |
|-------|-------------|--------------|-----------|----------|
| **UC3200/UC3400** | Active-active dual-controller, SAS backend | 576 TB raw | iSCSI, FC 16Gb, 10/25GbE | SMB/mid-market SAN, VMware, HA |
| **DS/RS Series** (RS3626xs+, RS6426xs+) | Single controller / HA pair, Btrfs | 864 TB raw, 1 PB volume | SMB, NFS, iSCSI, FC (HBA) | SME all-in-one NAS/SAN, backup, surveillance |

**Key features**: DSM UC for SAN, Synology HA, Snapshot Replication (16K snapshots), VMware VAAI/ODX/ALUA, Surveillance Station, low TCO.

---
### Vendor comparison: overview
| Vendor | Flagship | Max IOPS | Max capacity | Latency | Availability guarantee | Main differentiator |
|--------|----------|----------|-------------|---------|---------------------|----------------------|
| **Hitachi** | VSP 5600 | 33M | 287 PBe | 39 µs | 8 nines (HW) | Mainframe + open systems; 65-node cluster |
| **Huawei** | Dorado 18000 V7 | >100M | 500 PB+ | 0.03 ms | 99.99999% | SmartMatrix; #1 SPC-1 |
| **Dell** | PowerMax 8500 | — | 18 PBe | — | 6 nines | SRDF/Metro; mainframe |
| **HPE** | Alletra 9000/MP | ~3M | 11.8 PBe | <150 µs | 100% data guarantee | InfoSight AIOps; GreenLake |
| **Infinidat** | InfiniBox SSA G4 | 2.24M | 32.8 PBe | 35 µs | 100% availability | 3-way active; Neural Cache |
| **Pure** | FlashArray//XL | >4M | 16.3 PBe | <100 µs | 99.9999% | DirectFlash; Evergreen |
| **Lenovo** | DM7200F | — | 27 PBe | — | — | ONTAP ecosystem; broad portfolio |
| **Synology** | UC3400 | 690K | 576 TB | — | — | Lowest price for an active-active SAN |
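
The availability column is easier to compare as a yearly downtime budget: n nines leave a fraction 10^-n of the year for outages. A minimal sketch, assuming a 365.25-day year; six nines (99.9999%) works out to roughly 32 seconds per year and eight nines to well under a second. The function names are hypothetical.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # average year including leap years: 31,557,600 s


def downtime_from_nines(nines: int) -> float:
    """Allowed downtime in seconds per year for an availability of `nines` nines."""
    return 10.0 ** -nines * SECONDS_PER_YEAR


def downtime_from_percent(percent: float) -> float:
    """Same budget from a percentage such as 99.9999."""
    return (100.0 - percent) / 100.0 * SECONDS_PER_YEAR


for n in (5, 6, 7, 8):
    print(f"{n} nines: {downtime_from_nines(n):8.2f} s/year")
```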
---
### Choosing storage by use case
| Use case | Recommendation | Rationale |
|----------|-----------|-------------|
| **Mainframe + open-systems hybrid** | Hitachi VSP / Dell PowerMax | The only vendors in this overview offering FICON and FC in one array |
| **AI/ML training** | Huawei Dorado V7 / Pure //XL | Highest IOPS, lowest latency |
| **Enterprise DB (Oracle, SQL Server)** | Infinidat / Pure //X | Low latency, consistent performance |
| **Virtualization (VMware, Hyper-V)** | Dell PowerStore / HPE Alletra 6000 | VAAI, vVols, InfoSight |
| **SMB / SME** | Synology / Lenovo DE | Low TCO, simple management |
| **Object storage / backup** | Pure //C / Lenovo DG / Infinidat Hybrid | Low cost per TB (QLC flash or HDD), high capacity |
| **Multi-protocol consolidation** | HPE Alletra MP / Huawei Dorado | Block + file + object in one platform |
## Decision diagram: choosing a storage platform
```mermaid
flowchart TD
Start(["Storage requirement"]) --> PROTO{"Access type"}
PROTO -->|"Block (SAN)"| BLOCK
PROTO -->|"File (NAS)"| FILE
PROTO -->|"Object"| OBJECT
BLOCK --> BPERF{"Performance tier"}
BPERF -->|"Tier 0/1<br/>< 100 µs, > 1M IOPS"| BT1["Infinidat / Pure //XL<br/>Huawei Dorado V7<br/>FC-NVMe, NVMe-oF"]
BPERF -->|"Tier 2<br/>100-500 µs"| BT2["Dell PowerStore / HPE Alletra 6000<br/>Hitachi VSP / Lenovo DM<br/>FC 32G, iSCSI 25GbE"]
BPERF -->|"Tier 3<br/>SME / low-cost"| BT3["Synology UC3400<br/>Lenovo DE / Dell PowerVault<br/>iSCSI, SAS"]
BLOCK --> BECOS{"Ecosystem"}
BECOS -->|"Mainframe"| BMF["Hitachi VSP / Dell PowerMax<br/>FICON + FC in one array"]
BECOS -->|"VMware"| BVM["Dell PowerStore / HPE Alletra<br/>VAAI, vVols, InfoSight"]
BECOS -->|"Oracle / SQL Server"| BDB["Infinidat / Pure //X<br/>Lowest latency"]
BECOS -->|"Kubernetes PVC"| BK8S["Ceph RBD / Longhorn / Linstor<br/>SDS on K8s<br/>CSI, replication, snapshots"]
FILE --> FSIZE{"Scale"}
FSIZE -->|"Enterprise"| FE["HPE Alletra MP (file)<br/>Lenovo DM / Dell PowerScale<br/>NFS, SMB, multi-protocol"]
FSIZE -->|"SMB / SME"| FS["Synology DS/RS<br/>Lenovo DE / TrueNAS<br/>Btrfs, NFS, SMB, low TCO"]
OBJECT --> OUSE{"Use case"}
OUSE -->|"Backup / archive"| OB["Pure //C / Infinidat Hybrid<br/>Lenovo DG<br/>QLC, erasure coding, low cost/TB"]
OUSE -->|"AI/ML data lake"| OM["MinIO / Pure //C<br/>High-throughput S3<br/>NVMe direct, erasure coding"]
```
## OpenStack Storage
OpenStack provides three main storage services:

| Service | Type | Description |
|--------|-----|-------|
| **Cinder** | Block storage | Persistent volumes for instances (iSCSI, NFS, Ceph RBD) |
| **Swift** | Object storage | RESTful object store (S3-compatible via the s3api middleware) |
| **Manila** | File storage | Shared file systems (NFS, CIFS) as a managed service |
### Cinder (Block Storage)
- Multi-backend support: LVM, Ceph RBD, NFS, iSCSI, Fibre Channel
- Snapshots, cloning, encryption at rest
- The Cinder scheduler places volumes across backends
- QoS specs to cap IOPS/bandwidth
### Swift (Object Storage)
- Alternative to S3 for on-prem object storage
- Ring-based data distribution (consistent hashing)
- Multi-region replication (global clusters, container sync)
- Stateless REST API proxies, no single point of failure
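
The ring works in two steps: a hash maps each object path to one of 2^part_power partitions, and a precomputed table maps each partition to devices. Because the first step never changes, rebalancing only moves whole partitions between devices. A minimal sketch, assuming a toy 4-device ring; real Swift also salts the path with a cluster-wide hash prefix/suffix and builds the partition table with `swift-ring-builder`, so the constants, device names and the round-robin assignment here are hypothetical.

```python
import hashlib
import struct

PART_POWER = 10                 # 2**10 = 1024 partitions (hypothetical; fixed for the ring's lifetime)
PART_SHIFT = 32 - PART_POWER
REPLICAS = 3
DEVICES = ["z1-d1", "z2-d1", "z3-d1", "z4-d1"]  # hypothetical devices, one per zone


def get_partition(account: str, container: str, obj: str) -> int:
    """Top PART_POWER bits of the MD5 of the object path select the partition."""
    path = f"/{account}/{container}/{obj}".encode()
    top32 = struct.unpack_from(">I", hashlib.md5(path).digest())[0]
    return top32 >> PART_SHIFT


def get_nodes(partition: int) -> list:
    """Naive replica assignment: consecutive devices, so replicas land in distinct zones.
    Swift's ring builder instead balances partitions by device weight and zone."""
    return [DEVICES[(partition + r) % len(DEVICES)] for r in range(REPLICAS)]


part = get_partition("AUTH_demo", "photos", "cat.jpg")
print(part, get_nodes(part))
```

Any stateless proxy that holds the same ring file computes the same devices for an object, which is what lets Swift run without a central metadata server.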
### Manila (Shared File Systems)
- Managed NFS/CIFS shares for sharing data between instances
- Backends: NetApp, Dell EMC, CephFS, GlusterFS
- Access rules (IP-based, cert-based, user-based)
- Use cases: HPC cluster home directories, NAS for legacy apps
### Ceph as the OpenStack storage backend
Ceph is the most common storage backend for OpenStack: Cinder (RBD), Swift API (RGW), Manila (CephFS), Glance (RBD images).
## Big Data storage
### HDFS cluster
HDFS is the primary storage layer of the Hadoop ecosystem (on-prem). Typical configuration:

| Parameter | Value | Notes |
|----------|---------|----------|
| **Disks per DataNode** | 8-24 × HDD (14-22 TB) + 2 × NVMe (metadata, cache) | Balances capacity and performance |
| **Replication factor** | 3× | Rack-aware |
| **Network** | 2 × 25/100 GbE (data) + 1 × 1 GbE (management) | Data + replication traffic |
| **RAM** | 64-256 GB (OS cache + metadata) | HDFS cache + OS buffer cache |
| **CPU** | 16-32 cores | HDFS itself has low CPU overhead |
| **NameNode HA** | Active + Standby + JN (JournalNodes) | Quorum-based HA |
| **Use case** | Sequential reads/writes, large files, Spark on YARN | |
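
**Model cluster, ~0.7 PB usable:**
- 10 × DataNode (12 × 18 TB HDD, 2 × 1.9 TB NVMe)
- 2 × NameNode (HA, 256 GB RAM)
- 3 × JournalNode (small VMs)
- Raw capacity 10 × 12 × 18 TB ≈ 2.2 PB; with 3× replication that leaves ≈ 0.72 PB usable (1 PB usable needs ~3 PB raw, i.e. about 14 DataNodes of this spec)
- Network: 25 GbE for data, 100 GbE for shuffle-heavy Spark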
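
The sizing arithmetic for the model cluster, as a minimal sketch with decimal TB and no allowance for free-space headroom or non-HDFS overhead (both push real deployments higher); the function names are hypothetical.

```python
import math


def hdfs_usable_tb(datanodes: int, disks_per_node: int, disk_tb: float, replication: int = 3) -> float:
    """Usable capacity = raw capacity / replication factor (no free-space headroom)."""
    raw_tb = datanodes * disks_per_node * disk_tb
    return raw_tb / replication


def datanodes_needed(usable_tb: float, disks_per_node: int, disk_tb: float, replication: int = 3) -> int:
    """Smallest DataNode count whose raw capacity covers usable_tb x replication."""
    raw_needed_tb = usable_tb * replication
    return math.ceil(raw_needed_tb / (disks_per_node * disk_tb))


print(hdfs_usable_tb(10, 12, 18))      # 10 nodes x 12 x 18 TB -> 720.0 TB usable
print(datanodes_needed(1000, 12, 18))  # nodes of the same spec for 1 PB usable -> 14
```

Keeping disks per node and disk size as parameters makes it easy to compare denser nodes against more nodes for the same usable target.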
### Object storage jako Data Lake (S3/GCS/MinIO)
For new projects (Spark on K8s, Iceberg/Delta, lakehouse), object storage is preferred over HDFS:

| Platform | Advantages | Limitations |
|-----------|--------|--------|
| **MinIO** (on-prem) | S3 API, erasure coding, NVMe direct, high throughput | Single tenant (per cluster) |
| **Pure //C** (on-prem) | QLC NVMe, dedupe, S3 + NFS | Higher cost/TB |
| **AWS S3** (cloud) | Virtually unlimited capacity, Iceberg/Delta support | Egress fees |
| **Azure ADLS** (cloud) | Hierarchical namespace (HNS), POSIX-like ACLs | Vendor lock-in |
| **GCP GCS** (cloud) | Uniform + fine-grained ACLs, object versioning | Region restrictions |
### Comparison: HDFS vs object storage for Big Data
| Criterion | HDFS | Object storage (S3/MinIO) |
|-----------|------|-------------------------|
| **Architecture** | Master/worker (NameNode is a SPOF unless HA is configured) | Distributed, no SPOF (erasure coding) |
| **Consistency** | Strong (single writer per file) | Strong read-after-write (AWS S3 since December 2020, MinIO) |
| **Throughput** | High (rack-aware, data locality) | High (network-bound) |
| **Scaling** | Horizontal (DataNodes) | Horizontal (stateless) |
| **Cost** | Low (HDD) | Medium (S3 API) |
| **Metadata** | NameNode RAM (~1 GB per 1 million blocks) | Per-object (flat namespace) |
| **Spark integration** | Native (locality optimization) | S3A connector, Hadoop-compatible |
| **2026 trend** | Legacy, declining | Standard for new projects |
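
The "~1 GB per 1 million blocks" rule of thumb explains why HDFS struggles with many small files: the block count, not the byte count, drives NameNode RAM. A rough sketch under that rule, assuming every file has the same average size and the default 128 MB block size; 720 TB stored as 1 GB files needs about 6 GB of heap, while the same data as 1 MB files needs about 755 GB. The helper name is hypothetical, and real heap use also grows with the number of files and directories.

```python
import math


def namenode_heap_gb(data_tb: float, avg_file_mb: float, block_mb: float = 128,
                     gb_per_million_blocks: float = 1.0) -> float:
    """Rule-of-thumb NameNode heap for `data_tb` of logical (pre-replication) data."""
    files = data_tb * 1024 * 1024 / avg_file_mb          # TB -> MB, then divide into files
    blocks_per_file = math.ceil(avg_file_mb / block_mb)   # a partial block still counts as a block
    blocks = files * blocks_per_file
    return blocks / 1_000_000 * gb_per_million_blocks


# 720 TB of data stored as 1 GB files vs. 1 MB files
print(f"{namenode_heap_gb(720, 1024):.1f} GB")
print(f"{namenode_heap_gb(720, 1):.1f} GB")
```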

For more on Big Data, see [BIG-DATA.md](BIG-DATA.md).
## Sources
Links, books and standards: [sources/infrastructure/sources.md](sources/infrastructure/sources.md)
### Recommended reading
| Book | Authors | ISBN | Description |
|-------|--------|------|-------|
| Storage Systems | Ganger, Gibson | 978-1680837540 | Textbook covering the design, implementation and operation of storage systems, from the characteristics of individual devices through the OS, databases and networking to distribution across servers and large-scale systems. An essential resource for storage infrastructure architects. |

*Last revised: 2026-06-03*