# 💾 Storage infrastructure

## Storage types

| Type | Description | Latency | Use case |
|------|-------------|---------|----------|
| **DAS** (Direct Attached Storage) | Disks directly in server | <0.1 ms | OS, cache, local data |
| **SAN** (Storage Area Network) | Block devices over network | <1 ms | Databases, VM datastores |
| **NAS** (Network Attached Storage) | File access (NFS, SMB) | 1-3 ms | Shared files, home dirs |
| **Object storage** | REST API, flat namespace | 10-100 ms | Backups, media, big data |

## Protocols

| Protocol | Type | Speed | Note |
|----------|------|-------|------|
| **Fibre Channel** | SAN | 8/16/32/64 Gbps | Low latency, dedicated network |
| **iSCSI** | SAN (IP) | 1/10/25 GbE | Cheaper, over Ethernet |
| **NVMe-oF** | SAN (NVMe) | 25/50/100 GbE | Lowest latency, emerging |
| **NFS** | NAS | 1/10/25 GbE | Universal, simple |
| **SMB/CIFS** | NAS | 1/10/25 GbE | Windows native |
| **S3 API** | Object | n/a | Standard for object storage |

## RAID

| RAID | Min. disks | Capacity | Protection | Read speed | Write speed | Use case |
|------|------------|----------|------------|------------|-------------|----------|
| **0** | 2 | 100 % | None | N × (striping) | N × | Temp data, cache (risky) |
| **1** | 2 | 50 % | 1 disk | N × (mirror) | 1 × | OS disk, critical data |
| **5** | 3 | 67-94 % | 1 disk | N-1 × | N-1 × (parity write penalty) | Universal file/VM storage |
| **6** | 4 | 50-88 % | 2 disks | N-2 × | N-2 × (double parity) | Large capacities, important data |
| **10** | 4 | 50 % | 1/mirror | N × | N/2 × | Databases, VM, high-performance |
| **50** | 6 | 67-94 % | 1/stripe | N-1 × | N-1 × | Large capacity + performance |
| **60** | 8 | 50-88 % | 2/stripe | N-2 × | N-2 × | Enterprise |

### Stripe size

- Small stripe (16-64 KB): better IOPS, worse throughput (databases, OLTP)
- Large stripe (128-1024 KB): better throughput, worse IOPS (video, media, backup)
- Write hole on RAID 5/6: data/parity inconsistency during power loss
while writing parity (prevention: non-volatile cache, battery-backed RAID controller)

## Software-Defined Storage (SDS)

| Tool | Type | Use case |
|------|------|----------|
| **Ceph** | Object/Block/File (RADOS) | Universal SDS, OpenStack, Kubernetes |
| **MinIO** | Object (S3 API) | High-performance S3, AI/ML data lake |
| **GlusterFS** | Distributed File | Shared filesystem, POSIX |
| **Longhorn** | Block (Kubernetes) | K8s PVC, microservices |
| **Linstor** | Block (DRBD + LVM) | Linux SDS, Kubernetes |
| **VMware vSAN** | Block (HCI) | VMware ecosystem |
| **StarWind** | Block (HCI) | Hyper-V / VMware |

### Ceph

**Architecture**:

```
RADOS (Reliable Autonomic Distributed Object Store)
├── Monitors (MON): cluster map, quorum (3/5)
├── Managers (MGR): dashboard, balancer, orchestrator
├── OSDs (Object Storage Daemons): data + replication
└── MDS (Metadata Server): CephFS only
```

**CRUSH map** (Controlled Replication Under Scalable Hashing):

- Algorithm for calculating data placement (no central index)
- Layers: Root → Datacenter → Rack → Host → OSD
- Failure domain: replication across racks / hosts
- `ceph osd crush rule create-replicated replicated_rule default host`

**Access interfaces**:

| Interface | Type | Use case |
|-----------|------|----------|
| **RBD** (RADOS Block Device) | Block | VM images, Kubernetes PVC (csi-rbd) |
| **RGW** (RADOS Gateway) | Object (S3/Swift API) | S3-compatible storage, backup |
| **CephFS** | File (POSIX) | Shared filesystem, home dirs |
| **NFS-Ganesha** | File (NFS) | NFS export over CephFS |

**Erasure coding**:

- K+M (data + parity chunks), e.g.
8+3 (8 data, 3 parity)
- More space-efficient than 3× replication (1.375× vs 3×)
- Higher CPU overhead, lower IOPS
- Recommended for cold data (RGW) instead of replication

## Enterprise storage vendors

### Hitachi VSP (Virtual Storage Platform)

| Model | Architecture | Max capacity | IOPS / Latency | Protocols | Use case |
|-------|--------------|--------------|----------------|-----------|----------|
| **VSP 5200/5600** | Active-active, scale-up/out, 2–12 controllers | 69.3 PB raw, 287 PBe | 33M IOPS, 39 µs | FC-NVMe 32Gb, FC 16/32Gb, FICON 16Gb, iSCSI 10Gb | Mission-critical, mainframe, enterprise consolidation |
| **VSP E590/E790/E1090** | Symmetric active-active, up to 65 nodes/130 controllers | 10.62 PB raw (E1090) | 8.4M IOPS, <41 µs | FC 32Gb, iSCSI 25Gb, FC-NVMe 32Gb | Midrange enterprise, hybrid workloads |

**Key features**: SVOS common across entire portfolio, AI-driven data reduction 4:1 guarantee, Global-Active Device metro clustering, 8 nines availability (HW), 100% data availability guarantee.

---

### Huawei OceanStor Dorado

| Model | Architecture | Max capacity | IOPS / Latency | Protocols | Use case |
|-------|--------------|--------------|----------------|-----------|----------|
| **Dorado 8000/18000 V6** | SmartMatrix full-mesh, up to 32 controllers | 32 TB cache, 6400 SSD | 40M IOPS, 0.05 ms | FC 32/64Gb, FC-NVMe, iSCSI, NFS, SMB, NVMe/RoCE, S3 | Mission-critical, finance, govt, carrier |
| **Dorado 8000/18000 V7 (2025)** | SmartMatrix 4.0, up to 64/128 controllers | 500 PB+ | >100M IOPS, 0.03 ms | FC, RoCE, NVMe/TCP, NFS, SMB, S3 | AI workloads, converged block/file/object |

**Key features**: SmartMatrix survives 7/8 controllers, FlashEver (3-gen online HW upgrade in 10 years), RAID-TP (triple SSD failure), DPU-based SmartNIC, ML-based I/O prefetch, 100% ransomware detection (Tolly), #1 SPC-1 benchmark.
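
The capacity figures quoted so far for parity-based protection (the RAID 5/6 capacity ranges, the 1.375× overhead of 8+3 erasure coding, and triple-parity schemes such as RAID-TP) all come from the same ratio of data units to total units. The following is a minimal sketch of that arithmetic; the function names are hypothetical and it ignores spares, metadata and filesystem overhead.

```python
from fractions import Fraction


def usable_fraction(data_units: int, parity_units: int) -> Fraction:
    """Share of raw capacity that holds user data in a data+parity layout."""
    return Fraction(data_units, data_units + parity_units)


def raw_overhead(data_units: int, parity_units: int) -> Fraction:
    """Raw bytes stored per byte of user data (3 for 3x replication)."""
    return Fraction(data_units + parity_units, data_units)


# Erasure coding 8+3 versus 3x replication (1 data copy + 2 extra copies).
ec_8_3 = raw_overhead(8, 3)      # 11/8 = 1.375
replica_3 = raw_overhead(1, 2)   # 3

if __name__ == "__main__":
    print(f"EC 8+3 overhead:    {float(ec_8_3):.3f}x")
    print(f"3x replication:     {float(replica_3):.3f}x")
    # Single, double and triple parity over the same 8 data disks.
    for parity in (1, 2, 3):
        share = usable_fraction(8, parity)
        print(f"8 data + {parity} parity: {float(share):.1%} usable")
```

With 2 data disks and 1 parity disk the usable share is 2/3, and with 15 data disks and 1 parity disk it is 15/16, which matches the 67-94 % range given for RAID 5 in the table above.
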

---

### Dell PowerStore & PowerMax

| Model | Architecture | Max capacity | IOPS / Latency | Protocols | Use case |
|-------|--------------|--------------|----------------|-----------|----------|
| **PowerStore 1500/5500/9500 (Gen 3)** | Active-active dual-node, PCIe Gen5, DDR5, RDMA 200GbE | 1.2 PB raw, 5.8 PBe | 3× IOPS vs Gen2 | FC 32/64Gb, iSCSI, NVMe/FC, NVMe/TCP, NFSv4, SMB3 | Midrange-to-high-end, VMware, containerized |
| **PowerMax 2500/8500** | Scale-out NVMe, Dynamic Fabric, up to 16 nodes | 8.8 PBe (2500), 18 PBe (8500) | 6 nines availability | FC 64Gb, FICON, NVMe/FC, NVMe/TCP, iSCSI, NFS, SMB | Mission-critical, mainframe, OLTP, cyber vault |

**Key features**: PowerStore 6:1 DRR guarantee, unified block/file/vVols out of box, Cyber Detect AI anomaly; PowerMax 5:1 DRR, Secure Snapshots 65M, SRDF/Metro, Flexible RAID up to 92% efficient, FIPS 140-3.

---

### HPE Alletra

| Model | Architecture | Max capacity | IOPS / Latency | Protocols | Use case |
|-------|--------------|--------------|----------------|-----------|----------|
| **Alletra 5000** | Active-active hybrid flash, dual controller | 1.2 PB raw | 99.9999% guarantee | FC, iSCSI | Mixed primary + secondary, cost-efficient hybrid |
| **Alletra 6000** | Active-active all-NVMe, dual controller | ~368 TB usable | <100 µs | FC, iSCSI | Business-critical DB, VDI, VMware |
| **Alletra 9000** | Active-active all-NVMe, multi-node scale-out | 2–4 PB+ usable | ~2–3M IOPS, <150 µs | FC, iSCSI, NVMe/FC | Mission-critical ERP, AI, consolidation |
| **Alletra Storage MP** | Disaggregated modular, block + file + object | 5.8 PB block, 11.8 PB object | 100% availability guarantee | FC, iSCSI, NVMe/FC, NFS, SMB, S3 | Multi-protocol consolidation, AI/analytics |

**Key features**: Triple Parity RAID (5000), InfoSight AI Ops, HPE GreenLake as-a-service, non-disruptive controller upgrades (MP), 100% data availability guarantee.
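
Availability figures such as "6 nines" (99.9999%) or "8 nines" in the tables above are easier to compare as a downtime budget per year. The sketch below shows the conversion; the function names are hypothetical, and the year length of 365.25 days is an assumption, since vendors may define the measurement window differently.

```python
# Assumption: an average year of 365.25 days; contracts may use another window.
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def availability_percent(nines: int) -> float:
    """Availability as a percentage, e.g. 6 nines is about 99.9999."""
    if nines < 1:
        raise ValueError("nines must be at least 1")
    return 100.0 * (1.0 - 10.0 ** -nines)


def annual_downtime_seconds(nines: int) -> float:
    """Maximum downtime per year implied by the given number of nines."""
    if nines < 1:
        raise ValueError("nines must be at least 1")
    return SECONDS_PER_YEAR * 10.0 ** -nines


if __name__ == "__main__":
    for nines in (4, 5, 6, 7, 8):
        print(
            f"{nines} nines ({availability_percent(nines):.6f} %): "
            f"{annual_downtime_seconds(nines):.3f} s/year"
        )
```

Under these assumptions, 6 nines allows roughly 31.6 seconds of downtime per year and 8 nines roughly 0.3 seconds.
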

---

### Infinidat

| Model | Architecture | Max capacity | IOPS / Latency | Protocols | Use case |
|-------|--------------|--------------|----------------|-----------|----------|
| **InfiniBox SSA G4** | Triple-active controller, AMD EPYC PCIe 5.0, DDR5 | 1.97 PB usable / 5.9 PBe | 2.24M IOPS, 35 µs | FC 32Gb, 25/100GbE, NVMe-oF/TCP, iSCSI, NFS, SMB, S3 | Mission-critical Oracle/SQL, multi-site DR |
| **InfiniBox G4 Hybrid** | Triple-active hybrid (HDD + flash cache) | 10.9 PB raw / 32.8 PBe | 2.24M IOPS, 64 GB/s | FC, Ethernet, NVMe-oF, iSCSI, NFS, SMB, S3 | Backup, massive unstructured data |

**Key features**: Only 3-way active on the market, Neural Cache (ML-driven), InfiniRAID, Immutable snapshots, 100% availability + 1-min snapshot recovery guarantee, everything included in base price (no extra licensing).

---

### Pure Storage FlashArray

| Model | Architecture | Max capacity | IOPS / Latency | Protocols | Use case |
|-------|--------------|--------------|----------------|-----------|----------|
| **FlashArray//X (X20–X90 R5)** | Active-active, NVMe DirectFlash | 1.2 PB raw / 4.4 PBe | 250 µs, 5:1 DRR | FC, NVMe/FC, NVMe/RoCE, NVMe/TCP, iSCSI, NFS, SMB | Mission-critical DB, VMware, enterprise |
| **FlashArray//C (C50–C90 R5)** | Active-active, QLC DirectFlash | 4.2 PB raw / 16.3 PBe | 5:1 DRR | FC, NVMe-oF, iSCSI, NFS, SMB | Capacity-optimized, backup, file |
| **FlashArray//XL (XL190)** | Active-active, 40 DirectFlash modules | 1.9 PB raw / 9.4 PBe | >4M IOPS, <100 µs, 45 GB/s | FC 64Gb, 100GbE RoCE, NVMe/FC, NVMe/TCP, NFS, SMB | Largest DB consolidation, OLTP |

**Key features**: DirectFlash (no FTL layer), 99.9999% availability, Evergreen (never forklift upgrade), Purity OS unified across entire portfolio, ActiveCluster/ActiveDR, Pure1 AIOps.
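
The tables above quote both raw and effective ("PBe") capacities. Assuming PBe means effective capacity after data reduction, dividing effective by raw gives the reduction that is implied relative to raw capacity. The sketch below does only that division on the figures quoted above; the function and dictionary names are hypothetical.

```python
def implied_ratio(raw_pb: float, effective_pb: float) -> float:
    """Effective capacity divided by raw capacity (both in PB)."""
    if raw_pb <= 0:
        raise ValueError("raw capacity must be positive")
    return effective_pb / raw_pb


# (raw PB, effective PBe) pairs copied from the tables above.
quoted = {
    "InfiniBox G4 Hybrid": (10.9, 32.8),
    "FlashArray//X": (1.2, 4.4),
    "FlashArray//C": (4.2, 16.3),
    "FlashArray//XL": (1.9, 9.4),
}

if __name__ == "__main__":
    for model, (raw_pb, effective_pb) in quoted.items():
        ratio = implied_ratio(raw_pb, effective_pb)
        print(f"{model}: {ratio:.2f}:1 relative to raw")
```

The implied ratios come out below the advertised 5:1 DRR. One plausible reason is that data reduction applies to usable capacity, which is smaller than raw capacity once protection overhead and spares are deducted, but the source does not state how each vendor derives its PBe figure.
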

---

### Lenovo ThinkSystem

| Model | Architecture | Max capacity | IOPS / Latency | Protocols | Use case |
|-------|--------------|--------------|----------------|-----------|----------|
| **DM Series** (DM3200F/5200F/7200F) | Active-active, all-NVMe, NetApp ONTAP | 1.8 PB raw / 6.8 PBe | Up to 120 NVMe SSD | FC 64Gb, iSCSI, NVMe/FC, NFS, SMB, S3 | Unified block/file, AI/ML, VMware |
| **DG Series** (DG5200/7200) | Active-active, all-QLC, ONTAP | 7.4 PB raw / 27 PBe | QLC economics | FC, NVMe/FC, NVMe/TCP, iSCSI, NFS, SMB, S3 | Capacity-optimized, backup, archive |
| **DE Series** (DE4000F–DE6600F) | Active-active, SAS/NVMe hybrid | 1.84 PB raw | 2M IOPS, <100 µs, 44 GB/s | FC 32Gb, iSCSI 25Gb, NVMe/FC, SAS, NVMe/RoCE | HPC, analytics, video surveillance |

**Key features**: DM/DG use ONTAP (SnapMirror, SnapVault, FabricPool, RAID-DP/RAID-TEC); cluster scale-out up to 12 HA pairs; DE series best price/performance in portfolio.

---

### Synology

| Model | Architecture | Max capacity | Protocols | Use case |
|-------|--------------|--------------|-----------|----------|
| **UC3200/UC3400** | Active-active dual-controller, SAS backend | 576 TB raw | iSCSI, FC 16Gb, 10/25GbE | SMB/midmarket SAN, VMware, HA |
| **DS/RS Series** (RS3626xs+, RS6426xs+) | Single-controller / HA pair, Btrfs | 864 TB raw, 1 PB volume | SMB, NFS, iSCSI, FC (HBA) | SME all-in-one NAS/SAN, backup, surveillance |

**Key features**: DSM UC for SAN, Synology HA, Snapshot Replication (16K snapshots), VMware VAAI/ODX/ALUA, Surveillance Station, low TCO.

---

### Vendor comparison: overview

| Vendor | Flagship | Max IOPS | Max capacity | Latency | Availability guarantee | Main differentiator |
|--------|----------|----------|--------------|---------|------------------------|---------------------|
| **Hitachi** | VSP 5600 | 33M | 287 PBe | 39 µs | 8 nines (HW) | Mainframe + open; 65-node cluster |
| **Huawei** | Dorado 18000 V7 | >100M | 500 PB+ | 0.03 ms | 99.99999% | SmartMatrix; #1 SPC-1 |
| **Dell** | PowerMax 8500 | n/a | 18 PBe | n/a | 6 nines | SRDF/Metro; mainframe |
| **HPE** | Alletra 9000/MP | ~3M | 11.8 PBe | <150 µs | 100% data guarantee | InfoSight AIOps; GreenLake |
| **Infinidat** | InfiniBox SSA G4 | 2.24M | 32.8 PBe | 35 µs | 100% availability | 3-way active; Neural Cache |
| **Pure** | FlashArray//XL | >4M | 16.3 PBe | <100 µs | 99.9999% | DirectFlash; Evergreen |
| **Lenovo** | DM7200F | n/a | 27 PBe | n/a | n/a | ONTAP ecosystem; broad portfolio |
| **Synology** | UC3400 | 690K | 576 TB | n/a | n/a | Lowest price for active-active SAN |

---

### Storage selection by use case

| Use case | Recommendation | Rationale |
|----------|----------------|-----------|
| **Mainframe + open hybrid** | Hitachi VSP / Dell PowerMax | Only ones with FICON + FC simultaneously |
| **AI/ML training** | Huawei Dorado V7 / Pure //XL | Highest IOPS, lowest latency |
| **Enterprise DB (Oracle, SQL Server)** | Infinidat / Pure //X | Low latency, consistent performance |
| **Virtualization (VMware, Hyper-V)** | Dell PowerStore / HPE Alletra 6000 | VAAI, vVols, InfoSight |
| **SMB / SME** | Synology / Lenovo DE | Low TCO, simple management |
| **Object storage / backup** | Pure //C / Lenovo DG / Infinidat Hybrid | QLC economics, high capacity |
| **Multi-protocol consolidation** | HPE Alletra MP / Huawei Dorado | Block + file + object in one platform |

## Decision diagram: storage platform selection

```mermaid
flowchart TD
    Start(["Storage requirement"]) --> PROTO{"Access type"}
    PROTO -->|"Block (SAN)"| BLOCK
    PROTO -->|"File (NAS)"| FILE
    PROTO -->|"Object"| OBJECT

    BLOCK --> BPERF{"Performance tier"}
    BPERF -->|"Tier 0/1<br/>< 100 µs, > 1M IOPS"| BT1["Infinidat / Pure //XL<br/>Huawei Dorado V7<br/>FC-NVMe, NVMe-oF"]
    BPERF -->|"Tier 2<br/>100-500 µs"| BT2["Dell PowerStore / HPE Alletra 6000<br/>Hitachi VSP / Lenovo DM<br/>FC 32G, iSCSI 25GbE"]
    BPERF -->|"Tier 3<br/>SME / low-cost"| BT3["Synology UC3400<br/>Lenovo DE / Dell PowerVault<br/>iSCSI, SAS"]

    BLOCK --> BECOS{"Ecosystem"}
    BECOS -->|"Mainframe"| BMF["Hitachi VSP / Dell PowerMax<br/>FICON + FC simultaneously"]
    BECOS -->|"VMware"| BVM["Dell PowerStore / HPE Alletra<br/>VAAI, vVols, InfoSight"]
    BECOS -->|"Oracle / SQL Server"| BDB["Infinidat / Pure //X<br/>Lowest latency"]

    FILE --> FSIZE{"Scaling"}
    FSIZE -->|"Enterprise"| FE["HPE Alletra MP (file)<br/>Lenovo DM / Dell PowerScale<br/>NFS, SMB, multi-protocol"]
    FSIZE -->|"SMB"| FS["Synology DS/RS<br/>Lenovo DE / TrueNAS<br/>Btrfs, NFS, SMB, low TCO"]

    OBJECT --> OUSE{"Use case"}
    OUSE -->|"Backup / archive"| OB["Pure //C / Infinidat Hybrid<br/>Lenovo DG<br/>QLC, erasure coding, low cost/TB"]
    OUSE -->|"AI/ML data lake"| OM["MinIO / Pure //C<br/>High throughput S3<br/>NVMe direct, erasure coding"]
    OUSE -->|"Kubernetes PVC"| OK["Ceph RBD / Longhorn / Linstor<br/>SDS on K8s<br/>CSI, replication, snapshots"]
```

## OpenStack Storage

OpenStack offers three main storage services:

| Service | Type | Description |
|---------|------|-------------|
| **Cinder** | Block storage | Persistent volumes for instances (iSCSI, NFS, Ceph RBD) |
| **Swift** | Object storage | RESTful object store (S3-compatible via middleware) |
| **Manila** | File storage | Shared file systems (NFS, CIFS) as a managed service |

### Cinder (Block Storage)

- Multi-backend support: LVM, Ceph RBD, NFS, iSCSI, Fibre Channel
- Snapshotting, cloning, encryption at rest
- Cinder scheduler for volume distribution across backends
- QoS specs for IOPS/bandwidth limits

### Swift (Object Storage)

- Alternative to S3 for on-prem object storage
- Ring-based data distribution (consistent hashing)
- Multi-region replication (container sync)
- Stateless REST API with no single point of failure

### Manila (Shared File Systems)

- Managed NFS/CIFS for sharing between instances
- Backends: NetApp, Dell EMC, CephFS, GlusterFS
- Access rules (IP-based, cert-based, user-based)
- Use case: HPC cluster home directories, NAS for legacy apps

### Container storage (OpenStack + Ceph)

Ceph is the most common storage backend for OpenStack: Cinder (RBD), Swift (RGW), Manila (CephFS), Glance (RBD images).

## Sources

Links, books and standards: [sources/infrastructure/sources.md](sources/infrastructure/sources.md)

### Recommended reading

| Book | Authors | ISBN | Description |
|------|---------|------|-------------|
| Storage Systems | Ganger, Gibson | 978-1680837540 | Textbook covering the design, implementation and operation of storage systems, from device characteristics through OS, databases and networking to server distribution and large-scale systems. An essential resource for storage infrastructure architects. |

*Last revision: 2026-06-03*