283
STORAGE.en.md
Normal file
# 💾 Storage infrastructure

## Storage types

| Type | Description | Latency | Use case |
|------|-------------|---------|----------|
| **DAS** (Direct Attached) | Disks directly in server | <0.1 ms | OS, cache, local data |
| **SAN** (Storage Area Network) | Block devices over network | <1 ms | Databases, VM datastores |
| **NAS** (Network Attached Storage) | File access (NFS, SMB) | 1-3 ms | Shared files, home dirs |
| **Object storage** | REST API, flat namespace | 10-100 ms | Backups, media, big data |

## Protocols

| Protocol | Type | Speed | Note |
|----------|------|-------|------|
| **Fibre Channel** | SAN | 8/16/32/64 Gbps | Low latency, dedicated network |
| **iSCSI** | SAN (IP) | 1/10/25 GbE | Cheaper, over Ethernet |
| **NVMe-oF** | SAN (NVMe) | 25/50/100 GbE | Lowest latency, emerging |
| **NFS** | NAS | 1/10/25 GbE | Universal, simple |
| **SMB/CIFS** | NAS | 1/10/25 GbE | Windows native |
| **S3 API** | Object | — | Standard for object storage |

## RAID

| RAID | Min. disks | Capacity | Protection | Read speed | Write speed | Use case |
|------|------------|----------|------------|------------|-------------|----------|
| **0** | 2 | 100 % | None | N × (striping) | N × | Temp data, cache (risky) |
| **1** | 2 | 50 % | 1 disk | N × (mirror) | 1 × | OS disk, critical data |
| **5** | 3 | 67-94 % | 1 disk | N-1 × | N-1 × (parity write penalty) | Universal file/VM storage |
| **6** | 4 | 50-88 % | 2 disks | N-2 × | N-2 × (double parity) | Large capacities, important data |
| **10** | 4 | 50 % | 1/mirror | N × | N/2 × | Databases, VM, high-performance |
| **50** | 6 | 67-94 % | 1/stripe | N-1 × | N-1 × | Large capacity + performance |
| **60** | 8 | 50-88 % | 2/stripe | N-2 × | N-2 × | Enterprise |

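The capacity ranges in the RAID table follow from how many disks' worth of space each level gives up to redundancy. The sketch below is illustrative only: it assumes equal-sized disks, two-way mirrors for RAID 1/10, and an upper bound of 16 disks per array (an assumption chosen because it reproduces the 94 % and 88 % figures). The function name is hypothetical.

```python
def usable_fraction(level: int, disks: int) -> float:
    """Fraction of raw capacity left for data, assuming equal-sized disks."""
    if level == 0:
        return 1.0                      # striping only, no redundancy
    if level in (1, 10):
        return 0.5                      # assumes two-way mirrors
    if level == 5:
        return (disks - 1) / disks      # one disk's worth of parity
    if level == 6:
        return (disks - 2) / disks      # two disks' worth of parity
    raise ValueError(f"unsupported RAID level: {level}")


# Smallest valid array versus the assumed 16-disk upper bound.
for level, minimum in ((5, 3), (6, 4)):
    low = usable_fraction(level, minimum)
    high = usable_fraction(level, 16)
    print(f"RAID {level}: {low:.0%} to {high:.0%}")
```

RAID 50 and 60 follow the same arithmetic per underlying RAID 5 or RAID 6 group, which is why their capacity ranges match.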
### Stripe size

- Small stripe (16-64 KB) — better IOPS, worse throughput (databases, OLTP)
- Large stripe (128-1024 KB) — better throughput, worse IOPS (video, media, backup)
- Write hole on RAID 5/6: if power is lost after data is written but before the matching parity is updated, data and parity become inconsistent (prevention: non-volatile cache, battery-backed RAID controller)

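To make the stripe-size trade-off concrete, the following sketch maps a logical byte offset onto a plain striped layout and counts how many disks a single request touches. It assumes simple round-robin striping without parity rotation; real controllers may lay data out differently, and both function names are hypothetical.

```python
def locate(offset: int, stripe_unit: int, disks: int) -> tuple[int, int]:
    """Map a logical byte offset to (disk index, byte offset on that disk)."""
    unit = offset // stripe_unit          # index of the stripe unit overall
    disk = unit % disks                   # units rotate round-robin across disks
    row = unit // disks                   # full stripes that precede this unit
    return disk, row * stripe_unit + offset % stripe_unit


def disks_touched(offset: int, length: int, stripe_unit: int, disks: int) -> int:
    """Count the distinct disks that one contiguous request spans."""
    first = offset // stripe_unit
    last = (offset + length - 1) // stripe_unit
    return min(disks, last - first + 1)
```

With a 64 KiB stripe unit on four disks, `disks_touched(0, 8192, 65536, 4)` returns 1, while a 1 MiB sequential read (`disks_touched(0, 1 << 20, 65536, 4)`) returns 4.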
## Software-Defined Storage (SDS)

| Tool | Type | Use case |
|------|------|----------|
| **Ceph** | Object/Block/File (RADOS) | Universal SDS, OpenStack, Kubernetes |
| **MinIO** | Object (S3 API) | High-performance S3, AI/ML data lake |
| **GlusterFS** | Distributed File | Shared filesystem, POSIX |
| **Longhorn** | Block (Kubernetes) | K8s PVC, microservices |
| **Linstor** | Block (DRBD + LVM) | Linux SDS, Kubernetes |
| **VMware vSAN** | Block (HCI) | VMware ecosystem |
| **StarWind** | Block (HCI) | Hyper-V / VMware |

### Ceph

**Architecture**:

```
RADOS (Reliable Autonomic Distributed Object Store)
├── Monitors (MON) — cluster map, quorum (typically 3 or 5 monitors)
├── Managers (MGR) — dashboard, balancer, orchestrator
├── OSDs (Object Storage Daemons) — data + replication
└── MDS (Metadata Server) — CephFS only
```

**CRUSH map** (Controlled Replication Under Scalable Hashing):

- Algorithm for calculating data placement (no central index)
- Layers: Root → Datacenter → Rack → Host → OSD
- Failure domain: replication across racks / hosts
- Example rule that replicates under the `default` root with `host` as the failure domain: `ceph osd crush rule create-replicated replicated_rule default host`

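The idea of computing placement from a hash rather than looking it up in a central index can be illustrated with a small sketch. This is not the actual CRUSH algorithm: it swaps in rendezvous (highest-random-weight) hashing as a simpler stand-in and only models one failure-domain level (host). The topology, host names and OSD names are hypothetical.

```python
import hashlib


def _score(obj: str, osd: str) -> int:
    """Deterministic pseudo-random score for an (object, OSD) pair."""
    digest = hashlib.sha256(f"{obj}:{osd}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def place(obj: str, topology: dict[str, list[str]], replicas: int = 3) -> list[str]:
    """Pick one OSD on each of `replicas` distinct hosts, using only hashing."""
    # Best-scoring OSD per host, so no two replicas share a failure domain.
    best_per_host = [
        max(osds, key=lambda osd: _score(obj, osd))
        for osds in topology.values()
        if osds
    ]
    if len(best_per_host) < replicas:
        raise ValueError("not enough hosts for the requested replica count")
    ranked = sorted(best_per_host, key=lambda osd: _score(obj, osd), reverse=True)
    return ranked[:replicas]


# Hypothetical four-host cluster with two OSDs per host.
topology = {
    "host-a": ["osd.0", "osd.1"],
    "host-b": ["osd.2", "osd.3"],
    "host-c": ["osd.4", "osd.5"],
    "host-d": ["osd.6", "osd.7"],
}
print(place("rbd_data.1234", topology))
```

Any client holding the same topology computes the same answer independently, which is the property that removes the need for a central index.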
**Access interfaces**:

| Interface | Type | Use case |
|-----------|------|----------|
| **RBD** (RADOS Block Device) | Block | VM images, Kubernetes PVC (csi-rbd) |
| **RGW** (RADOS Gateway) | Object (S3/Swift API) | S3-compatible storage, backup |
| **CephFS** | File (POSIX) | Shared filesystem, home dirs |
| **NFS-Ganesha** | File (NFS) | NFS export over CephFS |

**Erasure coding**:

- K+M (data + parity chunks), e.g. 8+3 (8 data, 3 parity)
- More space-efficient than 3× replication (1.375× raw overhead for 8+3 vs 3×)
- Higher CPU overhead, lower IOPS
- Recommended for cold data (RGW) instead of replication

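The 1.375× figure comes straight from the K+M arithmetic: every K data chunks are stored alongside M parity chunks, so the raw footprint is (K+M)/K times the logical data. A minimal sketch, with hypothetical function names, that also treats 3× replication as the special case K=1, M=2:

```python
def storage_overhead(k: int, m: int) -> float:
    """Raw bytes stored per logical byte for a K+M layout."""
    return (k + m) / k


def usable_tb(raw_tb: float, k: int, m: int) -> float:
    """Logical capacity available from a given raw capacity."""
    return raw_tb * k / (k + m)


print(storage_overhead(8, 3))   # 8 data + 3 parity chunks
print(storage_overhead(1, 2))   # 3x replication modelled as 1 data + 2 copies
```

`storage_overhead(8, 3)` evaluates to 1.375 and `storage_overhead(1, 2)` to 3.0. Either layout survives the loss of M chunks, so 8+3 tolerates three failures where 3× replication tolerates two.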
## Enterprise storage vendors

### Hitachi VSP (Virtual Storage Platform)

| Model | Architecture | Max capacity | IOPS / Latency | Protocols | Use case |
|-------|--------------|--------------|----------------|-----------|----------|
| **VSP 5200/5600** | Active-active, scale-up/out, 2–12 controllers | 69.3 PB raw, 287 PBe | 33M IOPS, 39 µs | FC-NVMe 32Gb, FC 16/32Gb, FICON 16Gb, iSCSI 10Gb | Mission-critical, mainframe, enterprise consolidation |
| **VSP E590/E790/E1090** | Symmetric active-active, up to 65 nodes/130 controllers | 10.62 PB raw (E1090) | 8.4M IOPS, <41 µs | FC 32Gb, iSCSI 25Gb, FC-NVMe 32Gb | Midrange enterprise, hybrid workloads |

**Key features**: SVOS common across entire portfolio, AI-driven data reduction 4:1 guarantee, Global-Active Device metro clustering, 8 nines availability (HW), 100% data availability guarantee.

---

### Huawei OceanStor Dorado

| Model | Architecture | Max capacity | IOPS / Latency | Protocols | Use case |
|-------|--------------|--------------|----------------|-----------|----------|
| **Dorado 8000/18000 V6** | SmartMatrix full-mesh, up to 32 controllers | 32 TB cache, 6400 SSD | 40M IOPS, 0.05 ms | FC 32/64Gb, FC-NVMe, iSCSI, NFS, SMB, NVMe/RoCE, S3 | Mission-critical, finance, govt, carrier |
| **Dorado 8000/18000 V7 (2025)** | SmartMatrix 4.0, up to 64/128 controllers | 500 PB+ | >100M IOPS, 0.03 ms | FC, RoCE, NVMe/TCP, NFS, SMB, S3 | AI workloads, converged block/file/object |

**Key features**: SmartMatrix survives 7/8 controllers, FlashEver (3-gen online HW upgrade in 10 years), RAID-TP (triple SSD failure), DPU-based SmartNIC, ML-based I/O prefetch, 100% ransomware detection (Tolly), #1 SPC-1 benchmark.

---

### Dell PowerStore & PowerMax

| Model | Architecture | Max capacity | IOPS / Latency | Protocols | Use case |
|-------|--------------|--------------|----------------|-----------|----------|
| **PowerStore 1500/5500/9500 (Gen 3)** | Active-active dual-node, PCIe Gen5, DDR5, RDMA 200GbE | 1.2 PB raw, 5.8 PBe | 3× IOPS vs Gen2 | FC 32/64Gb, iSCSI, NVMe/FC, NVMe/TCP, NFSv4, SMB3 | Midrange-to-high-end, VMware, containerized |
| **PowerMax 2500/8500** | Scale-out NVMe, Dynamic Fabric, up to 16 nodes | 8.8 PBe (2500), 18 PBe (8500) | 6 nines availability | FC 64Gb, FICON, NVMe/FC, NVMe/TCP, iSCSI, NFS, SMB | Mission-critical, mainframe, OLTP, cyber vault |

**Key features**: PowerStore 6:1 DRR guarantee, unified block/file/vVols out of box, Cyber Detect AI anomaly; PowerMax 5:1 DRR, Secure Snapshots 65M, SRDF/Metro, Flexible RAID up to 92% efficient, FIPS 140-3.

---

### HPE Alletra

| Model | Architecture | Max capacity | IOPS / Latency | Protocols | Use case |
|-------|--------------|--------------|----------------|-----------|----------|
| **Alletra 5000** | Active-active hybrid flash, dual controller | 1.2 PB raw | 99.9999% guarantee | FC, iSCSI | Mixed primary + secondary, cost-efficient hybrid |
| **Alletra 6000** | Active-active all-NVMe, dual controller | ~368 TB usable | <100 µs | FC, iSCSI | Business-critical DB, VDI, VMware |
| **Alletra 9000** | Active-active all-NVMe, multi-node scale-out | 2–4 PB+ usable | ~2–3M IOPS, <150 µs | FC, iSCSI, NVMe/FC | Mission-critical ERP, AI, consolidation |
| **Alletra Storage MP** | Disaggregated modular, block + file + object | 5.8 PB block, 11.8 PB object | 100% availability guarantee | FC, iSCSI, NVMe/FC, NFS, SMB, S3 | Multi-protocol consolidation, AI/analytics |

**Key features**: Triple Parity RAID (5000), InfoSight AI Ops, HPE GreenLake as-a-service, non-disruptive controller upgrades (MP), 100% data availability guarantee.

---

### Infinidat

| Model | Architecture | Max capacity | IOPS / Latency | Protocols | Use case |
|-------|--------------|--------------|----------------|-----------|----------|
| **InfiniBox SSA G4** | Triple-active controller, AMD EPYC PCIe 5.0, DDR5 | 1.97 PB usable / 5.9 PBe | 2.24M IOPS, 35 µs | FC 32Gb, 25/100GbE, NVMe-oF/TCP, iSCSI, NFS, SMB, S3 | Mission-critical Oracle/SQL, multi-site DR |
| **InfiniBox G4 Hybrid** | Triple-active hybrid (HDD + flash cache) | 10.9 PB raw / 32.8 PBe | 2.24M IOPS, 64 GB/s | FC, Ethernet, NVMe-oF, iSCSI, NFS, SMB, S3 | Backup, massive unstructured data |

**Key features**: Only 3-way active on the market, Neural Cache (ML-driven), InfiniRAID, Immutable snapshots, 100% availability + 1-min snapshot recovery guarantee, everything included in base price (no extra licensing).

---

### Pure Storage FlashArray

| Model | Architecture | Max capacity | IOPS / Latency | Protocols | Use case |
|-------|--------------|--------------|----------------|-----------|----------|
| **FlashArray//X (X20–X90 R5)** | Active-active, NVMe DirectFlash | 1.2 PB raw / 4.4 PBe | 250 µs, 5:1 DRR | FC, NVMe/FC, NVMe/RoCE, NVMe/TCP, iSCSI, NFS, SMB | Mission-critical DB, VMware, enterprise |
| **FlashArray//C (C50–C90 R5)** | Active-active, QLC DirectFlash | 4.2 PB raw / 16.3 PBe | 5:1 DRR | FC, NVMe-oF, iSCSI, NFS, SMB | Capacity-optimized, backup, file |
| **FlashArray//XL (XL190)** | Active-active, 40 DirectFlash modules | 1.9 PB raw / 9.4 PBe | >4M IOPS, <100 µs, 45 GB/s | FC 64Gb, 100GbE RoCE, NVMe/FC, NVMe/TCP, NFS, SMB | Largest DB consolidation, OLTP |

**Key features**: DirectFlash (no FTL layer), 99.9999% availability, Evergreen (never forklift upgrade), Purity OS unified across entire portfolio, ActiveCluster/ActiveDR, Pure1 AIOps.

---

### Lenovo ThinkSystem

| Model | Architecture | Max capacity | IOPS / Latency | Protocols | Use case |
|-------|--------------|--------------|----------------|-----------|----------|
| **DM Series** (DM3200F/5200F/7200F) | Active-active, all-NVMe, NetApp ONTAP | 1.8 PB raw / 6.8 PBe | Up to 120 NVMe SSD | FC 64Gb, iSCSI, NVMe/FC, NFS, SMB, S3 | Unified block/file, AI/ML, VMware |
| **DG Series** (DG5200/7200) | Active-active, all-QLC, ONTAP | 7.4 PB raw / 27 PBe | QLC economics | FC, NVMe/FC, NVMe/TCP, iSCSI, NFS, SMB, S3 | Capacity-optimized, backup, archive |
| **DE Series** (DE4000F–DE6600F) | Active-active, SAS/NVMe hybrid | 1.84 PB raw | 2M IOPS, <100 µs, 44 GB/s | FC 32Gb, iSCSI 25Gb, NVMe/FC, SAS, NVMe/RoCE | HPC, analytics, video surveillance |

**Key features**: DM/DG use ONTAP (SnapMirror, SnapVault, FabricPool, RAID-DP/RAID-TEC); cluster scale-out up to 12 HA pairs; DE series best price/performance in portfolio.

---

### Synology

| Model | Architecture | Max capacity | Protocols | Use case |
|-------|--------------|--------------|-----------|----------|
| **UC3200/UC3400** | Active-active dual-controller, SAS backend | 576 TB raw | iSCSI, FC 16Gb, 10/25GbE | SMB/midmarket SAN, VMware, HA |
| **DS/RS Series** (RS3626xs+, RS6426xs+) | Single-controller / HA pair, Btrfs | 864 TB raw, 1 PB volume | SMB, NFS, iSCSI, FC (HBA) | SME all-in-one NAS/SAN, backup, surveillance |

**Key features**: DSM UC for SAN, Synology HA, Snapshot Replication (16K snapshots), VMware VAAI/ODX/ALUA, Surveillance Station, low TCO.

---

### Vendor comparison — overview

| Vendor | Flagship | Max IOPS | Max capacity | Latency | Availability guarantee | Main differentiator |
|--------|----------|----------|--------------|---------|------------------------|---------------------|
| **Hitachi** | VSP 5600 | 33M | 287 PBe | 39 µs | 8 nines (HW) | Mainframe + open; 65-node cluster |
| **Huawei** | Dorado 18000 V7 | >100M | 500 PB+ | 0.03 ms | 99.99999% | SmartMatrix; #1 SPC-1 |
| **Dell** | PowerMax 8500 | — | 18 PBe | — | 6 nines | SRDF/Metro; mainframe |
| **HPE** | Alletra 9000/MP | ~3M | 11.8 PBe | <150 µs | 100% data guarantee | InfoSight AIOps; GreenLake |
| **Infinidat** | InfiniBox SSA G4 | 2.24M | 32.8 PBe | 35 µs | 100% availability | 3-way active; Neural Cache |
| **Pure** | FlashArray//XL | >4M | 16.3 PBe | <100 µs | 99.9999% | DirectFlash; Evergreen |
| **Lenovo** | DM7200F | — | 27 PBe | — | — | ONTAP ecosystem; broad portfolio |
| **Synology** | UC3400 | 690K | 576 TB | — | — | Lowest price for active-active SAN |

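The availability column mixes "nines" and percentages, so it can help to convert them into expected downtime. The sketch below assumes a 365-day year and treats the figure as a simple long-run average; vendor guarantees are contractual terms and may be measured differently. The function name is hypothetical.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumes a non-leap year


def downtime_seconds_per_year(nines: int) -> float:
    """Expected yearly downtime for an availability of `nines` nines."""
    unavailability = 10 ** -nines    # e.g. 6 nines -> 0.000001
    return SECONDS_PER_YEAR * unavailability


for nines in (6, 7, 8):
    print(f"{nines} nines: {downtime_seconds_per_year(nines):.3f} s/year")
```

Six nines (99.9999%) works out to roughly 31.5 seconds per year, and eight nines to about a third of a second.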
---

### Storage selection by use case

| Use case | Recommendation | Rationale |
|----------|----------------|-----------|
| **Mainframe + open hybrid** | Hitachi VSP / Dell PowerMax | The only platforms listed here that offer FICON and FC simultaneously |
| **AI/ML training** | Huawei Dorado V7 / Pure //XL | Highest IOPS, lowest latency |
| **Enterprise DB (Oracle, SQL Server)** | Infinidat / Pure //X | Low latency, consistent performance |
| **Virtualization (VMware, Hyper-V)** | Dell PowerStore / HPE Alletra 6000 | VAAI, vVols, InfoSight |
| **SMB / SME** | Synology / Lenovo DE | Low TCO, simple management |
| **Object storage / backup** | Pure //C / Lenovo DG / Infinidat Hybrid | QLC economics, high capacity |
| **Multi-protocol consolidation** | HPE Alletra MP / Huawei Dorado | Block + file + object in one platform |

## Decision diagram — storage platform selection

```mermaid
flowchart TD
    Start(["Storage requirement"]) --> PROTO{"Access type"}
    PROTO -->|"Block (SAN)"| BLOCK
    PROTO -->|"File (NAS)"| FILE
    PROTO -->|"Object"| OBJECT

    BLOCK --> BPERF{"Performance tier"}
    BPERF -->|"Tier 0/1<br/>< 100 µs, > 1M IOPS"| BT1["Infinidat / Pure //XL<br/>Huawei Dorado V7<br/>FC-NVMe, NVMe-oF"]
    BPERF -->|"Tier 2<br/>100-500 µs"| BT2["Dell PowerStore / HPE Alletra 6000<br/>Hitachi VSP / Lenovo DM<br/>FC 32G, iSCSI 25GbE"]
    BPERF -->|"Tier 3<br/>SME / low-cost"| BT3["Synology UC3400<br/>Lenovo DE / Dell PowerVault<br/>iSCSI, SAS"]

    BLOCK --> BECOS{"Ecosystem"}
    BECOS -->|"Mainframe"| BMF["Hitachi VSP / Dell PowerMax<br/>FICON + FC simultaneously"]
    BECOS -->|"VMware"| BVM["Dell PowerStore / HPE Alletra<br/>VAAI, vVols, InfoSight"]
    BECOS -->|"Oracle / SQL Server"| BDB["Infinidat / Pure //X<br/>Lowest latency"]

    FILE --> FSIZE{"Scaling"}
    FSIZE -->|"Enterprise"| FE["HPE Alletra MP (file)<br/>Lenovo DM / Dell PowerScale<br/>NFS, SMB, multi-protocol"]
    FSIZE -->|"SMB"| FS["Synology DS/RS<br/>Lenovo DE / TrueNAS<br/>Btrfs, NFS, SMB, low TCO"]

    OBJECT --> OUSE{"Use case"}
    OUSE -->|"Backup / archive"| OB["Pure //C / Infinidat Hybrid<br/>Lenovo DG<br/>QLC, erasure coding, low cost/TB"]
    OUSE -->|"AI/ML data lake"| OM["MinIO / Pure //C<br/>High throughput S3<br/>NVMe direct, erasure coding"]
    OUSE -->|"Kubernetes PVC"| OK["Ceph RBD / Longhorn / Linstor<br/>SDS on K8s<br/>CSI, replication, snapshots"]
```

## OpenStack Storage

OpenStack offers three main storage services:

| Service | Type | Description |
|---------|------|-------------|
| **Cinder** | Block storage | Persistent volumes for instances (iSCSI, NFS, Ceph RBD) |
| **Swift** | Object storage | RESTful object store (S3-compatible via middleware) |
| **Manila** | File storage | Shared file systems (NFS, CIFS) as a managed service |

### Cinder (Block Storage)

- Multi-backend support: LVM, Ceph RBD, NFS, iSCSI, Fibre Channel
- Snapshotting, cloning, encryption at rest
- Cinder scheduler for volume distribution across backends
- QoS specs for IOPS/bandwidth limits

### Swift (Object Storage)

- Alternative to S3 for on-prem object storage
- Ring-based data distribution (consistent hashing)
- Multi-region replication (container sync)
- Stateless REST API with no single point of failure

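The ring-based distribution mentioned above can be illustrated with a generic consistent-hashing ring. This is a sketch of the general technique only: Swift's own ring builder works with a fixed set of partitions and a device assignment table, which this example does not reproduce. The class name, node names and virtual-node count are assumptions.

```python
import bisect
import hashlib


class HashRing:
    """Generic consistent-hashing ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes: list[str], vnodes: int = 64) -> None:
        # Each node gets `vnodes` points on the ring to even out the load.
        self._ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._keys = [point for point, _ in self._ring]

    @staticmethod
    def _hash(value: str) -> int:
        return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")

    def lookup(self, name: str) -> str:
        """Return the node that owns `name`: the next ring point clockwise."""
        idx = bisect.bisect_right(self._keys, self._hash(name)) % len(self._ring)
        return self._ring[idx][1]


ring = HashRing(["node-1", "node-2", "node-3"])
print(ring.lookup("account/container/object.jpg"))
```

Adding a fourth node to such a ring only moves the objects that land on the new node's points; everything else keeps its owner, which is what keeps rebalancing traffic bounded.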
### Manila (Shared File Systems)

- Managed NFS/CIFS for sharing between instances
- Backends: NetApp, Dell EMC, CephFS, GlusterFS
- Access rules (IP-based, cert-based, user-based)
- Use case: HPC cluster home directories, NAS for legacy apps

### Container storage (OpenStack + Ceph)

Ceph is the most common storage backend for OpenStack and can sit behind every storage service: Cinder (RBD volumes), the Swift API (served by RGW), Manila (CephFS), and Glance (RBD images).

## Sources

Links, books and standards: [sources/infrastructure/sources.md](sources/infrastructure/sources.md)

### Recommended reading

| Book | Authors | ISBN | Description |
|------|---------|------|-------------|
| Storage Systems | Ganger, Gibson | 978-1680837540 | Textbook covering the design, implementation and operation of storage systems — from device characteristics through OS, databases and networking to server distribution and large-scale systems. An essential resource for storage infrastructure architects. |

*Last revision: 2026-06-03*