18.6.2026
STORAGE.md
@@ -270,6 +270,57 @@ OpenStack offers three main storage services:
Ceph is the most common storage backend for OpenStack: Cinder (RBD), Swift-compatible object storage (RGW), Manila (CephFS), and Glance (RBD images).

## Big Data storage

### HDFS cluster

HDFS is the primary storage layer for the on-prem Hadoop ecosystem. A typical configuration:

| Parameter | Value | Note |
|-----------|-------|------|
| **Disks per DataNode** | 8–24 × HDD (14–22 TB) + 2 × NVMe (metadata, cache) | Balance of capacity and performance |
| **Replication factor** | 3× | Rack-aware |
| **Network** | 2 × 25/100 GbE (data) + 1 × 1 GbE (management) | Data + replication traffic |
| **RAM** | 64–256 GB (OS cache + metadata) | HDFS cache + OS buffer cache |
| **CPU** | 16–32 cores | HDFS overhead is low |
| **NameNode HA** | Active + Standby + JournalNodes (JN) | Quorum-based HA |
| **Use case** | Sequential reads/writes, large files, Spark on YARN | |
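
The "rack-aware" note on the replication factor can be made concrete with a small sketch. Under the default HDFS block placement policy, the first replica goes to the writer's node, the second to a node on a different rack, and the third to another node on that same remote rack. The function below is a simplified, deterministic illustration of that idea, not HDFS code: `place_replicas` and the rack and node names are hypothetical, and real HDFS picks nodes randomly and also weighs free space and load.

```python
from typing import Dict, List


def place_replicas(writer: str, racks: Dict[str, List[str]]) -> List[str]:
    """Return three DataNodes for one block, rack-aware (simplified)."""
    # Locate the rack that contains the writer's DataNode.
    local_rack = next((r for r, nodes in racks.items() if writer in nodes), None)
    if local_rack is None:
        raise ValueError(f"unknown writer node: {writer}")

    # A remote rack needs two nodes to hold replicas 2 and 3.
    remote = [r for r in sorted(racks) if r != local_rack and len(racks[r]) >= 2]
    if not remote:
        raise ValueError("need a second rack with at least two DataNodes")

    # Replica 1: the writer's node. Replicas 2 and 3: two distinct nodes on one
    # remote rack (first in sorted order here; HDFS would choose randomly).
    second, third = racks[remote[0]][:2]
    return [writer, second, third]


# Hypothetical two-rack topology.
topology = {"rack-a": ["dn1", "dn2"], "rack-b": ["dn3", "dn4"]}
print(place_replicas("dn1", topology))
```

With this layout, losing either rack still leaves at least one replica, and only one hop of the write pipeline crosses racks.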

**Model cluster, ~0.7 PB usable (~2.2 PB raw):**

- 10 × DataNode (12 × 18 TB HDD, 2 × 1.9 TB NVMe)
- 2 × NameNode (HA, 256 GB RAM)
- 3 × JournalNode (small VMs)
- Replication 3× → ~2.2 PB raw yields ~0.72 PB usable, before free-space headroom; 1 PB usable would need about 3 PB raw
- Network: 25 GbE for data, 100 GbE for shuffle-heavy Spark
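
The arithmetic behind the model cluster can be checked with a short sketch. It assumes decimal units (1 PB = 1000 TB), counts only the HDD capacity, and ignores filesystem overhead and free-space headroom; the function names are illustrative.

```python
import math
from typing import Tuple


def cluster_capacity_tb(nodes: int, disks_per_node: int, disk_tb: float,
                        replication: int = 3) -> Tuple[float, float]:
    """Return (raw_tb, usable_tb) for an HDFS cluster with N-way replication."""
    raw_tb = nodes * disks_per_node * disk_tb
    return raw_tb, raw_tb / replication


def nodes_for_usable_tb(target_tb: float, disks_per_node: int, disk_tb: float,
                        replication: int = 3) -> int:
    """Smallest DataNode count whose usable capacity reaches target_tb."""
    per_node_usable_tb = disks_per_node * disk_tb / replication
    return math.ceil(target_tb / per_node_usable_tb)


# The 10-node example: 12 x 18 TB HDD per DataNode, replication factor 3.
raw, usable = cluster_capacity_tb(nodes=10, disks_per_node=12, disk_tb=18)
print(f"raw={raw} TB, usable={usable} TB")

# DataNodes of the same type needed for 1 PB (1000 TB) usable.
print(nodes_for_usable_tb(1000, disks_per_node=12, disk_tb=18))
```

For the 10-node example this gives 2160 TB raw and 720 TB usable. Reaching 1 PB usable with the same node type takes 14 DataNodes, and more once headroom is reserved.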

### Object storage as a Data Lake (S3/GCS/MinIO)

For new projects (Spark on K8s, Iceberg/Delta, lakehouse), object storage is preferred over HDFS:

| Platform | Advantages | Limitations |
|----------|------------|-------------|
| **MinIO** (on-prem) | S3 API, erasure coding, direct NVMe, high throughput | Single tenant (per cluster) |
| **Pure //C** (on-prem) | QLC NVMe, dedupe, S3 + NFS | Higher cost per TB |
| **AWS S3** (cloud) | Virtually unlimited capacity, Iceberg/Delta support | Egress fees |
| **Azure ADLS** (cloud) | Hierarchical namespace (HNS), POSIX-like ACLs | Vendor lock-in |
| **GCP GCS** (cloud) | Uniform + fine-grained ACLs, object versioning | Region restrictions |
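
Because these platforms expose an S3 API, Spark usually reaches them through the Hadoop S3A connector. The sketch below only assembles the relevant Spark properties as a dictionary, so it runs without Spark installed. The `fs.s3a.*` keys are standard S3A options; the helper name, endpoint URL, and credentials are placeholders chosen for illustration.

```python
from typing import Dict


def s3a_spark_conf(endpoint: str, access_key: str, secret_key: str,
                   path_style: bool = True) -> Dict[str, str]:
    """Build Spark properties that point the S3A connector at an S3 endpoint."""
    return {
        "spark.hadoop.fs.s3a.endpoint": endpoint,
        "spark.hadoop.fs.s3a.access.key": access_key,
        "spark.hadoop.fs.s3a.secret.key": secret_key,
        # Self-hosted endpoints such as MinIO typically use path-style URLs
        # (endpoint/bucket/key) rather than virtual-hosted bucket names.
        "spark.hadoop.fs.s3a.path.style.access": str(path_style).lower(),
        # Enable TLS only when the endpoint URL asks for it.
        "spark.hadoop.fs.s3a.connection.ssl.enabled":
            str(endpoint.startswith("https://")).lower(),
    }


# Placeholder endpoint and credentials, not real values.
conf = s3a_spark_conf("http://minio.example.internal:9000",
                      "EXAMPLE_KEY", "EXAMPLE_SECRET")
for key, value in sorted(conf.items()):
    print(f"{key}={value}")
```

Each pair can be passed to `SparkSession.builder.config(key, value)`, after which data is addressed with `s3a://bucket/path` URIs. In production, credentials would normally come from a credential provider rather than plain properties.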

### Comparison: HDFS vs Object Storage for Big Data

| Criterion | HDFS | Object Storage (S3/MinIO) |
|-----------|------|---------------------------|
| **Architecture** | Master/worker (NameNode is a SPOF unless HA is configured) | Distributed, no SPOF (erasure coding) |
| **Consistency** | Strong (single writer per file) | Strong read-after-write (AWS S3 since December 2020, MinIO) |
| **Throughput** | High (rack-aware, data locality) | High (network-bound) |
| **Scaling** | Horizontal (add DataNodes) | Horizontal (stateless) |
| **Cost** | Low (HDD) | Medium (S3 API) |
| **Metadata** | NameNode (1 million blocks ≈ 1 GB RAM) | Object-level (flat namespace) |
| **Spark integration** | Native (locality optimization) | S3A connector, Hadoop-compatible |
| **2026 trend** | Legacy, declining | Standard for new projects |
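
The NameNode rule of thumb from the Metadata row (about 1 GB of RAM per 1 million blocks) can be turned into a rough sizing estimate. This is a minimal sketch under stated assumptions: the 128 MiB block size is the HDFS default, the block count is a lower bound that assumes data sits in large files, and the 1 PB input is purely illustrative. Many small files produce far more blocks and file objects than this estimate.

```python
def block_count(data_bytes: int, block_size: int = 128 * 1024**2) -> int:
    """Lower bound on HDFS blocks, assuming data is stored in large files."""
    return -(-data_bytes // block_size)  # ceiling division on integers


def namenode_heap_gb(blocks: int, gb_per_million_blocks: float = 1.0) -> float:
    """Apply the rule of thumb: ~1 GB of NameNode RAM per 1 million blocks."""
    return blocks / 1_000_000 * gb_per_million_blocks


data_bytes = 10**15  # 1 PB of logical (pre-replication) data, illustrative
blocks = block_count(data_bytes)
print(blocks, round(namenode_heap_gb(blocks), 1))
```

For 1 PB in large files this comes to about 7.5 million blocks and roughly 7.5 GB of NameNode memory, which is why the small-file count, not raw capacity, tends to be the real limit.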

For more detail on Big Data, see [BIG-DATA.md](BIG-DATA.md).

## Sources

Links, books, and standards: [sources/infrastructure/sources.md](sources/infrastructure/sources.md)