knowledge-base/DATACENTERS.en.md
Stanislav Hubacek ef3c2f75b1 18.6.2026
2026-06-18 16:25:33 +02:00

# 🏭 Datacenters
## Tier classification (TIA-942 / Uptime Institute)
| Tier | Availability | Downtime / year | Redundancy |
|------|-------------|-----------------|------------|
| **Tier I** | 99.671 % | 28.8 h | N — no redundancy |
| **Tier II** | 99.741 % | 22.7 h | N+1 — redundant components |
| **Tier III** | 99.982 % | 1.6 h | N+1 — concurrently maintainable |
| **Tier IV** | 99.995 % | 26.3 min | 2N+1 — fault tolerant |
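
The downtime column follows directly from the availability percentage. A minimal sketch of that arithmetic, assuming a non-leap year of 8760 hours (the function name is illustrative only):

```python
HOURS_PER_YEAR = 365 * 24  # 8760 h, leap years ignored


def downtime_hours_per_year(availability_percent: float) -> float:
    """Yearly downtime implied by an availability percentage."""
    return (1 - availability_percent / 100) * HOURS_PER_YEAR


# Availability figures copied from the tier table above.
for tier, availability in [("I", 99.671), ("II", 99.741), ("III", 99.982), ("IV", 99.995)]:
    hours = downtime_hours_per_year(availability)
    print(f"Tier {tier}: {hours:.1f} h/year ({hours * 60:.1f} min/year)")
```
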
## Key subsystems
| System | Description |
|--------|-------------|
| **Power** | UPS, generators (diesel), ATS, PDU, redundant feeds (A/B feed) |
| **Cooling** | CRAC/CRAH, chilled water, free cooling, containment (hot/cold aisle) |
| **Physical security** | CCTV, biometric access, mantrap, rack security locks |
| **Cabling** | Structured cabling (Cat6A/7/8 copper, OM3/OM4 multi-mode and OS2 single-mode fiber), patch panels |
| **Fire suppression** | Alarm, clean agents (Novec 1230, FM-200), inert gases (Inergen, Argonite), VESDA (very early smoke detection) |
| **Monitoring** | DCIM (Data Center Infrastructure Management), SNMP, BMS (Building Management System) |
## Aisle containment
```
┌────────────────────────────────────┐
│ Rack Row │
│ ┌──┐ ┌──┐ ┌──┐ ┌──┐ ┌──┐ ┌──┐ │
Cold │ │ │ │ │ │ │ │ │ │ │ │ │ │ Cold
Aisle <──│ └──┘ └──┘ └──┘ └──┘ └──┘ └──┘ ──> Aisle
│ ┌──┐ ┌──┐ ┌──┐ ┌──┐ ┌──┐ ┌──┐ │
Hot │ │ │ │ │ │ │ │ │ │ │ │ │ │ Hot
Aisle ──>│ └──┘ └──┘ └──┘ └──┘ └──┘ └──┘ <── Aisle
└────────────────────────────────────┘
```
## Environmental classes (ASHRAE TC 9.9)
ASHRAE Technical Committee 9.9 defines temperature and humidity envelopes for IT equipment in DC.
| Class | Temperature (recommended) | Temperature (allowable) | Usage |
|-------|--------------------------|-------------------------|-------|
| **A1** | 18-27 °C | 15-32 °C | Enterprise DC, strict control |
| **A2** | 18-27 °C | 10-35 °C | Standard DC |
| **A3** | 18-27 °C | 5-40 °C | Looser environment |
| **A4** | 18-27 °C | 5-45 °C | Maximum cooling savings |
| **H1** | 18-22 °C | 5-25 °C | High-density air-cooled (AI/ML) |
- 5th edition (2021) added class H1 for high-density and extended liquid cooling W-classes (W17, W27, W32, W40, W45, W+)
- 2024: new S-classes for Technology Cooling System (TCS) liquid cooling
- Humidity: recommended -9 °C DP to 15 °C DP and 70 % RH (at low pollutant levels); max 50 % RH in high-corrosivity environments
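
The temperature envelopes can be turned into a simple inlet-temperature check, for example in a monitoring script. The sketch below uses the ranges from the table above. The function and dictionary names are hypothetical, and humidity is deliberately left out.

```python
# Dry-bulb ranges in °C, copied from the class table above.
RECOMMENDED_C = {"A1": (18, 27), "A2": (18, 27), "A3": (18, 27), "A4": (18, 27), "H1": (18, 22)}
ALLOWABLE_C = {"A1": (15, 32), "A2": (10, 35), "A3": (5, 40), "A4": (5, 45), "H1": (5, 25)}


def classify_inlet(ashrae_class: str, inlet_c: float) -> str:
    """Return 'recommended', 'allowable' or 'out of range' for a server inlet temperature."""
    low, high = RECOMMENDED_C[ashrae_class]
    if low <= inlet_c <= high:
        return "recommended"
    low, high = ALLOWABLE_C[ashrae_class]
    if low <= inlet_c <= high:
        return "allowable"
    return "out of range"


for cls, temp in [("A1", 20.0), ("A2", 30.0), ("H1", 24.0), ("A1", 33.0)]:
    print(cls, temp, classify_inlet(cls, temp))
```
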
## Power
### Power chain
```
Grid ──> Transformer ──> UPS ──> PDU ──> Rack PDU ──> Server PSU
├──> Generator (ATS switches on outage)
└──> STS/ATS (Static Transfer Switch)
```
A/B feed topology:
```
Grid A ──> UPS A ──> PDU A1 ──> Rack PDU A ──> PSU A (server)
Grid B ──> UPS B ──> PDU B1 ──> Rack PDU B ──> PSU B (server)
```
Each server has 2 PSUs — each powered from a different branch (A/B). On failure of one branch, the server continues without interruption.
### UPS types
| IEC 62040-3 class | Topology | Description | Transfer time | Use case |
|--------------|-------------|-------------|-----------|----------|
| **VFD** (Voltage & Frequency Dependent) | Passive standby | UPS in bypass, switches to inverter on failure | 4-10 ms | SOHO, edge |
| **VI** (Voltage Independent) | Line-interactive | Voltage regulation via autotransformer | 2-4 ms | Smaller racks, office |
| **VFI** (Voltage & Frequency Independent) | Double-conversion | AC → DC → AC, full isolation, zero switching time | 0 ms | Enterprise DC, Tier III/IV |
For DC the standard is **VFI (double-conversion)** — online UPS, zero switching time, full isolation from the grid.
### Battery technologies
| Type | Density (Wh/L) | Lifespan (cycles) | Lifespan (years) | Temperature | Cost/kWh | Note |
|------|---------------|-------------------|------------------|-------------|----------|------|
| **VRLA** (AGM/Gel) | 50-80 | 200-500 | 3-5 | 20-25 °C | ~$150-200 | Cheap, large, heavy, temperature sensitive |
| **Li-ion (LFP)** | 200-350 | 3000-5000 | 10-15 | 0-40 °C | ~$300-500 | Small, light, long life, BMS required |
| **Li-ion (NMC)** | 250-400 | 1000-2000 | 8-12 | 0-40 °C | ~$250-400 | Higher density, thermal runaway risk |
| **NiCd** | 80-150 | 1000-2000 | 10-15 | 20-50 °C | ~$400-600 | Extreme temperatures, memory effect |
| **Flow battery** (vanadium, zinc-bromine) | 20-40 | 10,000+ | 20+ | 10-35 °C | ~$500-800 | Very high cycle life, large, long-term backup |
Li-ion (LFP) is becoming the standard for new DCs due to longer life, smaller footprint, and better behavior at high temperatures.
### Generator sizing
| Variant | Size | Fuel | Start time | Run time | Use case |
|---------|------|------|------------|----------|----------|
| **Diesel** | 500-2500 kVA | Diesel | 10-30 s | 24-72 h (depending on tank) | Standard for enterprise DC |
| **Nat. gas** | 200-1500 kVA | Natural gas | 10-30 s | Unlimited (pipeline) | Less common, lower emissions |
| **CHP** (cogeneration) | 500-2000 kVA | Natural gas | 5-15 min | Unlimited | Combined power + cooling (absorption chiller) |
Sizing: the generator should cover 100 % of the IT load plus 100 % of the cooling load (incl. chillers), typically 1.3-1.8× the IT load. Size the diesel tank for at least 24 h of operation, commonly 48-72 h. Fuel consumption is roughly 0.3-0.4 L per kWh generated.
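
The sizing rule can be sketched as a small calculation. This is an illustration under stated assumptions, not a design tool: it works in kW and ignores power factor (generators are rated in kVA), and the default load factor, runtime and fuel figure are simply picked from the ranges quoted above.

```python
def generator_sizing(
    it_load_kw: float,
    load_factor: float = 1.5,      # assumed, within the 1.3-1.8x range above
    runtime_h: float = 48.0,       # assumed, within the 24-72 h range above
    fuel_l_per_kwh: float = 0.35,  # assumed, within the 0.3-0.4 L/kWh range above
) -> tuple[float, float]:
    """Return (generator load in kW, diesel tank volume in litres)."""
    generator_kw = it_load_kw * load_factor
    tank_litres = generator_kw * runtime_h * fuel_l_per_kwh
    return generator_kw, tank_litres


generator_kw, tank_litres = generator_sizing(it_load_kw=1000)
print(f"Generator load: {generator_kw:.0f} kW, tank: {tank_litres:.0f} L")
```
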
### ATS vs STS
| Feature | ATS (Automatic Transfer Switch) | STS (Static Transfer Switch) |
|---------|--------------------------------|-----------------------------|
| **Switching** | 4-10 ms (mechanical relay) | < 4 ms (thyristor) |
| **Lifespan** | ~10,000 switches | Unlimited (solid-state) |
| **Cost** | Low | High (~3-5× ATS) |
| **Use case** | Generator → UPS feed | Between two UPS outputs |
### PDU types
| Type | Description | Use case |
|------|-------------|----------|
| **Basic** | Passive splitter (no monitoring) | Edge, office |
| **Metered** | Current measurement at PDU level | Standard DC |
| **Monitored** | Measurement per outlet, SNMP, web GUI | Enterprise DC |
| **Switched** | On/off per outlet, remote reboot | Enterprise DC, colo |
| **High-density** | 3-phase, 60-100 A, C19 outlets | GPU/HPC/AI racks |
### Power calculation
```
Total Power = Σ(P_server + P_storage + P_network) + P_cooling + P_losses = P_IT × PUE
P_server = P_idle + (P_max - P_idle) × Utilization%
P_cooling + P_losses = P_IT × (PUE - 1)
Example:
100 servers × 500 W (avg) = 50 kW IT load
PUE = 1.5 → total 50 kW × 1.5 = 75 kW
UPS + generator → sized for 75 kW × 1.2 (safety factor) = 90 kW
```
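
The same example as runnable arithmetic. The sketch relies on the definition PUE = total facility power / IT power, so facility power is IT load × PUE. The idle and maximum server wattages are hypothetical values chosen to give the 500 W average used above.

```python
def server_power_w(idle_w: float, max_w: float, utilization: float) -> float:
    """Linear server power model: idle draw plus the utilised share of the dynamic range."""
    return idle_w + (max_w - idle_w) * utilization


def facility_power_kw(it_load_kw: float, pue: float) -> float:
    """Total facility power, from PUE = total / IT."""
    return it_load_kw * pue


def sized_capacity_kw(it_load_kw: float, pue: float, safety_factor: float = 1.2) -> float:
    """Capacity to provision for UPS and generator, including a safety margin."""
    return facility_power_kw(it_load_kw, pue) * safety_factor


per_server_w = server_power_w(idle_w=200, max_w=800, utilization=0.5)  # hypothetical server
it_load_kw = 100 * per_server_w / 1000
print(f"IT load: {it_load_kw:.0f} kW")
print(f"Facility: {facility_power_kw(it_load_kw, pue=1.5):.0f} kW")
print(f"Provision: {sized_capacity_kw(it_load_kw, pue=1.5):.0f} kW")
```
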
### PUE (Power Usage Effectiveness)
```
PUE = Total Facility Energy / IT Equipment Energy
```
| PUE | Efficiency | Type |
|-----|-----------|------|
| 1.0-1.1 | Excellent | Hyperscale (Google, Meta) |
| 1.1-1.3 | Very good | Modern DC |
| 1.3-1.6 | Good / average | Enterprise DC |
| 1.6-2.0 | Below average | Older DC |
| >2.0 | Poor | Legacy |
PUE is measured at the whole DC level, not per rack. Includes: UPS losses, cooling, lighting, distribution losses. Excludes: well-to-tank fuel production, embodied carbon. Target for modern DC: PUE < 1.2.
### WUE and CUE
| Metric | Description | Formula | Target |
|--------|-------------|---------|--------|
| **WUE** (Water Usage Effectiveness) | Water consumption per IT energy | WUE = Annual Water Usage / IT Energy (L/kWh) | < 0.5 L/kWh |
| **CUE** (Carbon Usage Effectiveness) | CO₂ emissions per IT energy | CUE = Total CO₂ / IT Energy (kg CO₂/kWh) | < 0.2 kg CO₂/kWh |
WUE is critical in dry regions (southwest US, Australia, Middle East). Adiabatic cooling consumes significantly more water than closed-loop cooling.
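
All three metrics are ratios against annual IT energy, so they can be computed from the same set of yearly totals. A minimal sketch; the input numbers are made up for illustration and the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AnnualTotals:
    facility_energy_kwh: float
    it_energy_kwh: float
    water_litres: float
    co2_kg: float


def efficiency_metrics(totals: AnnualTotals) -> dict[str, float]:
    """Compute PUE (dimensionless), WUE (L/kWh) and CUE (kg CO2/kWh)."""
    return {
        "pue": totals.facility_energy_kwh / totals.it_energy_kwh,
        "wue": totals.water_litres / totals.it_energy_kwh,
        "cue": totals.co2_kg / totals.it_energy_kwh,
    }


# Illustrative year: 1 MW average IT load, 1.4 MW average facility load.
year = AnnualTotals(
    facility_energy_kwh=12_264_000,
    it_energy_kwh=8_760_000,
    water_litres=3_504_000,
    co2_kg=1_314_000,
)
for name, value in efficiency_metrics(year).items():
    print(f"{name.upper()}: {value:.2f}")
```
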
### 3-phase vs Single-phase
| Feature | Single-phase (230 V) | 3-phase (400 V) |
|---------|---------------------|-----------------|
| **Voltage** | 230 V (L-N) | 230/400 V (L-N/L-L) |
| **Power per feed** | ~7.4 kW (32 A) | ~22 kW (32 A, 3-ph) |
| **Efficiency** | Lower (more losses) | Higher (lower current) |
| **Use case** | Smaller racks, office | Standard in DC, high-density |
| **PDU** | Single-phase (C13/C19) | 3-phase (C13/C19, 3-ph monitoring) |
| **Balancing** | Not applicable | Phase balancing required (L1/L2/L3) |
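
The "power per feed" figures come from the standard formulas P = V × I for single-phase and P = √3 × V_LL × I for a balanced 3-phase feed. A short sketch, assuming a power factor of 1.0 and no continuous-load derating:

```python
import math


def single_phase_kw(v_line_neutral: float, amps: float, power_factor: float = 1.0) -> float:
    """Real power of a single-phase feed in kW."""
    return v_line_neutral * amps * power_factor / 1000


def three_phase_kw(v_line_line: float, amps: float, power_factor: float = 1.0) -> float:
    """Real power of a balanced 3-phase feed in kW."""
    return math.sqrt(3) * v_line_line * amps * power_factor / 1000


print(f"1-ph 230 V / 32 A: {single_phase_kw(230, 32):.2f} kW")
print(f"3-ph 400 V / 32 A: {three_phase_kw(400, 32):.2f} kW")
```
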
### Rack power density
| Cat. | Type | kW/rack | Power | Cooling |
|------|------|---------|-------|---------|
| Low | Office, storage | 1-3 kW | 1-ph, 16 A | Air (free cooling) |
| Medium | Standard compute | 5-10 kW | 3-ph, 32 A | Air (CRAC/CRAH) |
| High | GPU, HPC | 15-30 kW | 3-ph, 60 A | Air + liquid assist |
| Ultra | AI/ML clusters | 40-100+ kW | 3-ph, 100+ A | Direct-to-chip / immersion |
### Rack PDU connectors
| Connector | Max current | Device type |
|-----------|-------------|-------------|
| **C13** | 10 A (250 V) | Servers, switches, 1U |
| **C19** | 16 A (250 V) | Higher power servers, UPS |
| **IEC 60309** (3-ph) | 16-125 A | Rack PDU inputs |
| **NEMA L6-30** | 30 A (250 V) | US spec |
## Cooling
### Cooling — technology overview
| Technology | Type | Output (kW/rack) | Typical PUE | CAPEX | Use case |
|-----------|------|-----------------|-------------|-------|----------|
| **Free air cooling** | Air | < 5 | 1.05-1.15 | Low | Climatically suitable locations |
| **CRAC (DX)** | Air | 5-10 | 1.4-1.8 | Medium | Smaller DC, retrofit |
| **CRAH (CW)** | Air | 5-15 | 1.2-1.5 | High | Enterprise DC |
| **In-row cooling** | Air | 10-25 | 1.2-1.4 | High | High-density racks |
| **Rear-door HX** | Hybrid | 15-30 | 1.1-1.3 | Medium | Retrofits, GPU |
| **Direct-to-chip** | Liquid | 40-100+ | 1.05-1.15 | High | AI/ML, HPC |
| **Immersion (single-phase)** | Liquid | 50-100+ | 1.03-1.10 | High | Bitcoin, hyperscale |
| **Immersion (two-phase)** | Liquid | 100-200+ | 1.03-1.08 | Very high | Extreme density |
### Chilled water vs Direct Expansion (DX)
| Feature | Chilled water (CW) | Direct Expansion (DX) |
|---------|-------------------|----------------------|
| **Medium** | Water + glycol | Refrigerant (R134a, R410A, R454B) |
| **CRAC/CRAH** | CRAH (air handler with chilled-water coil) | CRAC (air conditioner with refrigerant compressor) |
| **Efficiency** | Higher (COP 5-7) | Lower (COP 2-4) |
| **Water temperature** | 7-12 °C (standard), 18-22 °C (high-temp) | 5-10 °C (evaporator) |
| **Complexity** | Higher (chillers, pumps, pipes, cooling tower) | Simpler |
| **Maintenance** | Higher (water treatment, legionella prevention) | Lower |
| **Use case** | Large DC > 500 kW, enterprise | Smaller DC, edge, retrofit |
### Containment types
| Type | Description | Efficiency | Implementation |
|------|-------------|------------|----------------|
| **Cold aisle containment (CAC)** | Enclosed cold aisle, warm air returns to room | High | Doors at aisle ends, ceiling panels |
| **Hot aisle containment (HAC)** | Enclosed hot aisle, warm air goes directly to return | Higher | Doors + ceiling panels, return to CRAH |
| **Chimney / rear duct** | Each rack has its own exhaust chimney to ceiling | Highest | Individual ducts per rack, expensive |
| **Open aisle** | No containment, cold and warm air mix | Low | Legacy, cheap |
Recommendation: CAC/HAC at density > 5 kW/rack. HAC is 5-10 % more efficient than CAC (warm air is directly extracted, does not mix with room).
### CFD modeling
Computational Fluid Dynamics (CFD) simulates airflow in DC before physical implementation:
- Identification of hot spots (warm air recirculation into cold aisle)
- Optimization of perforated tile positions
- Analysis of bypass airflow (cable cutouts, open rack positions without blanking panels)
- Simulation of CRAH unit failure (what-if scenarios)
- Tools: Future Facilities (6Sigma DC), Ansys Fluent, OpenFOAM
### Free cooling
- **Air-side** — intake of outside air at suitable temperature (filtration, humidification)
- **Water-side** — use of cold water from outdoor chillers (strainer cycle) without compressor
- **Climate zone** — free cooling usable ~2000-8000 hours/year depending on location
  - Scandinavia: 7000-8000 h/year
  - Central Europe: 4000-6000 h/year
  - Southern Europe: 2000-4000 h/year
- **Hybrid** — combination of free cooling + mechanical cooling (most common)
- **Economizer types**: Class A1 (dry cooler), Class A2 (evaporative), Class B (air-side)
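
The hours-per-year figures above are typically estimated by counting the hours in which the outdoor temperature is low enough for the economizer to carry the load. A rough sketch follows. The 18 °C threshold is an assumption (the real limit depends on supply temperature and heat-exchanger approach), and the temperature list is a toy sample rather than weather data.

```python
from collections.abc import Iterable


def free_cooling_hours(hourly_temps_c: Iterable[float], max_outdoor_c: float = 18.0) -> int:
    """Count hours whose outdoor temperature allows free cooling (threshold is assumed)."""
    return sum(1 for temp in hourly_temps_c if temp <= max_outdoor_c)


sample_hours = [5.0, 12.0, 18.0, 19.0, 25.0]  # toy data, one value per hour
print(free_cooling_hours(sample_hours), "of", len(sample_hours), "hours")
```

With a full year of hourly readings (8760 values) the result is directly comparable to the regional ranges listed above.
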
### Liquid cooling detail
| Type | Inlet temperature | Capacity (kW/rack) | Medium | Installation |
|------|-----------------|-------------------|--------|-------------|
| **Cold plate (D2C)** | 20-45 °C | 40-100+ | Water, propylene glycol | CDU per rack or per row |
| **Rear-door HX** | 18-27 °C | 15-30 | Water | Passive, no server modification |
| **Immersion (1-ph)** | 35-50 °C | 50-100+ | Dielectric oil | Tank, CDU, heat exchanger |
| **Immersion (2-ph)** | 25-35 °C | 100-200+ | Dielectric (boiling) | Tank + condenser |
**CDU (Coolant Distribution Unit)**:
- Regulates coolant temperature, flow, and pressure delivered to the racks
- Primary loop (facility water) + secondary loop (rack coolant)
- Sizing: 1 CDU per 4-8 racks (40-100 kW per CDU)
- Redundancy: N+1 CDU, dual coolant loops
**Water quality requirements**:
- Conductivity: < 1 µS/cm (demineralized water)
- pH: 6.5-8.0
- Particulates: < 50 µm (filtration)
- Corrosion prevention: inhibitors, glycol (10-30 %)
- Biological growth prevention: UV, biocides
### Adiabatic cooling
Using water evaporation to cool air:
- **Direct adiabatic** — air passes through water (media pad), cools and humidifies
- **Indirect adiabatic** — air cools via heat exchanger without direct contact with water
- **Water consumption**: 3-5 L/kWh (direct), 1-2 L/kWh (indirect)
- Efficiency depends on air humidity — more effective in dry climates
## Cabling and structured cabling
### TIA-942 cabling hierarchy
```
Entrance Room (ER)
├── Backbone cabling (fiber single-mode / multi-mode)
│ │
│ ├── Main Distribution Area (MDA)
│ │ │
│ │ ├── Horizontal Distribution Area (HDA)
│ │ │ │
│ │ │ └── Equipment Distribution Area (EDA) → rack
│ │ │
│ │ └── Intermediate Distribution Area (IDA) — optional
│ │
│ └── Telecommunication Room (TR) — for office
└── Backbone cabling (fiber / copper)
```
### Copper cabling categories
| Category | Frequency | Speed | Length | Connector | Use case |
|----------|-----------|-------|--------|-----------|----------|
| **Cat5e** | 100 MHz | 1 GbE | 100 m | RJ45 | Legacy, voice |
| **Cat6** | 250 MHz | 1 GbE (10 GbE up to 55 m) | 100 m (10 GbE: 55 m) | RJ45 | Standard DC, enterprise |
| **Cat6A** | 500 MHz | 10 GbE | 100 m | RJ45 | Standard for new DC |
| **Cat7** (GG45) | 600 MHz | 10 GbE | 100 m | GG45/TERA | Niche, replaced by Cat6A/8 |
| **Cat8.1** | 2000 MHz | 25/40 GbE | 30 m | RJ45 | Top-of-rack, storage |
| **Cat8.2** | 2000 MHz | 25/40 GbE | 30 m | GG45/TERA | Top-of-rack, storage |
In DC, **Cat6A** (10 GbE up to 100 m) is standard for horizontal cabling. Cat8 is limited to short links of up to 30 m, such as server to top-of-rack or end-of-row switch.
### Fiber optic types
| Type | Core | Modal BW | Speed | Max length | Use case |
|------|------|----------|-------|-----------|----------|
| **OS1** (SM) | 9 µm | — | 100 GbE - 800 GbE | 10-80 km | Backbone, campus, WAN |
| **OS2** (SM) | 9 µm | — | 100 GbE - 800 GbE | 2-80 km (CWDM/DWDM) | Backbone, DWDM |
| **OM1** (MM) | 62.5 µm | 200 MHz·km | 1 GbE | 275 m | Legacy |
| **OM2** (MM) | 50 µm | 500 MHz·km | 10 GbE | 82 m | Legacy |
| **OM3** (MM) | 50 µm | 2000 MHz·km | 10 GbE up to 300 m, 100 GbE up to 100 m | 300 m (10G) | Standard DC, VCSEL |
| **OM4** (MM) | 50 µm | 4700 MHz·km | 100 GbE up to 150 m, 400 GbE up to 100 m | 400 m (10G, per IEEE 802.3) | High-performance DC standard |
| **OM5** (MM) | 50 µm | 4700+ MHz·km | 200/400 GbE SWDM | 150 m (100G) | Emerging, SWDM |
For new DC: **OM4** as standard for multi-mode, **OS2** for single-mode backbone (LR, DWDM). OM5 is not widely deployed — OM4 + parallel optics (SR4) is more common.
### Connector types
| Connector | Type | Insertion loss | Fiber count | Use case |
|-----------|------|---------------|-------------|----------|
| **LC** | Duplex | < 0.15 dB | 2 | Standard for SFP/SFP+/QSFP |
| **SC** | Duplex | < 0.2 dB | 2 | Older installations, patch panels |
| **MPO/MTP** (12-f) | Multi-fiber | < 0.35 dB | 12 | 40/100/400 GbE parallel (SR4, DR4, SR4.2) |
| **MPO/MTP** (24-f) | Multi-fiber | < 0.5 dB | 24 | 100 GbE SR10, 400 GbE SR8 |
| **SN** | Duplex (mini) | < 0.15 dB | 2 | High-density (QSFP-DD, OSFP) |
| **CS** | Duplex (mini) | < 0.15 dB | 2 | High-density (QSFP-DD, OSFP) |
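
The insertion-loss figures are mainly used to check a link against the transceiver's loss budget. A worst-case sketch using the per-connector maxima from the table above; the fiber attenuation (3.0 dB/km) and the 1.9 dB budget in the example are assumptions, so take the real values from the cable and transceiver datasheets.

```python
# Worst-case insertion loss per mated connector in dB, from the table above.
CONNECTOR_LOSS_DB = {"LC": 0.15, "SC": 0.2, "MPO-12": 0.35, "MPO-24": 0.5, "SN": 0.15, "CS": 0.15}


def link_loss_db(length_m: float, connectors: list[str], fiber_db_per_km: float) -> float:
    """Worst-case channel loss: fiber attenuation plus every connector in the path."""
    fiber_loss = length_m / 1000 * fiber_db_per_km
    connector_loss = sum(CONNECTOR_LOSS_DB[name] for name in connectors)
    return fiber_loss + connector_loss


# 100 m trunk with an MPO cassette at each end and LC patching on both sides.
loss = link_loss_db(100, ["MPO-12", "MPO-12", "LC", "LC"], fiber_db_per_km=3.0)
budget_db = 1.9  # assumed transceiver budget, check the datasheet
print(f"Link loss {loss:.2f} dB, within budget: {loss <= budget_db}")
```
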
### MPO/MTP polarity
| Method | Description | Use case |
|--------|-------------|----------|
| **Type A** (Straight) | Fiber 1→1, 2→2, ... | Duplex applications; needs one A-to-A (crossed) patch cord at one end |
| **Type B** (Crossed) | Fiber 1→12, 2→11, ... | Parallel optics (SR4, SR8) — standard |
| **Type C** (Pairs crossed) | Pairs 1-2→2-1, 3-4→4-3 | Duplex applications; not suited to parallel optics without conversion |
### Breakout cassettes
```
MPO (12-f) ──> Breakout cassette ──> 6× LC duplex (12 fibers = 6× duplex)
MPO (24-f) ──> Breakout cassette ──> 12× LC duplex (24 fibers = 12× duplex)
```
Use case: connecting MPO ports (switch) to LC ports (servers, storage). Cassettes are passive components mounted in the patch panel; they contain no active electronics.
### Copper vs fiber decision
| Criterion | Copper (Cat6A/8) | Fiber (OM4/OS2) |
|-----------|-----------------|-----------------|
| **Reach** | 30-100 m | 100 m - 80 km |
| **Speed** | 1-40 GbE | 1-800 GbE |
| **Transceiver cost** | Lower (RJ45) | Higher (SFP+/QSFP) |
| **Cable cost** | Lower | Higher (patch cord) |
| **Port power** | 2-5 W (25 GbE) | 1-3 W (25 GbE SR) |
| **EMI immunity** | Susceptible | Immune |
| **Weight (100 m)** | ~3-4 kg | ~0.5-1 kg |
| **Recommendation** | Up to 30 m, server→ToR switch | Backbone, storage, >30 m |
### Cabling best practices
- **Horizontal cabling**: max 90 m permanent link + 10 m patch cords (TIA-942)
- **Fiber management**: slack spools, cable managers, minimum bend radius 10× cable diameter
- **Color coding**: OS1/OS2 (yellow), OM3 (aqua), OM4 (magenta/purple), OM5 (lime green)
- **Labeling**: both ends, patch panels, faceplates — standard ANSI/TIA-606-B
- **Overhead vs underfloor**: overhead (ladder rack) is preferred in DC (better airflow, easier changes)
- **MPO cassettes**: plan 15-20 % fiber reserve for future needs
## Physical security
### Multi-layer security model (defense in depth)
```
Layer 1: Perimeter (fence, gate, guards)
Layer 2: Building (walls, locks, CCTV, card readers)
Layer 3: DC hall (biometrics, mantrap, CCTV, motion detection)
Layer 4: Rack / Cage (electronic locks, sensors)
Layer 5: Data (encryption, HSM, access control)
```
### Access control
| Method | Factor | Level | Note |
|--------|--------|-------|------|
| **RFID / proximity card** | Something you have | Standard | Basic access, cheap |
| **Smart card (PKI)** | Something you have + PIN | Medium | Certificate on card, anti-passback |
| **Biometric (fingerprint)** | Something you are | High | Fast; touchless readers available for better hygiene |
| **Biometric (palm/finger vein)** | Something you are | Very high | Hard to forge, contactless |
| **Biometric (iris/retina)** | Something you are | Highest | Very accurate, slow, expensive |
| **Multi-factor** | 2+ factors | Highest | Card + biometrics + PIN — Tier IV DC |
### Mantrap design
```
Outer door ──> Mantrap (vestibule) ──> Inner door
├── Weight sensor (anti-tailgating)
├── CCTV (both doors)
├── Intercom (emergency exit)
└── Motion detector (in mantrap)
```
- Only one door opens at a time
- Anti-tailgating: weight sensor detects multiple persons
- Exit via breakout button + motion detection
- Emergency exit: panic bar + alarm
### CCTV
| Element | Recommendation |
|---------|----------------|
| **Resolution** | Min. 1080p, ideally 4K (6 MP+) |
| **FPS** | 15-30 FPS (recording), 30+ FPS (realtime monitoring) |
| **Retention** | Min. 30 days (90 days for audit) |
| **Storage** | NVR (on-prem), cloud (AWS KVS, Azure Video Indexer) |
| **AI analytics** | Face detection, ANPR (license plate), object detection |
| **Field of view** | Every door, every aisle — no blind spots |
### Asset tracking
| Technology | Accuracy | Cost | Use case |
|-----------|----------|------|----------|
| **Barcode** | Rack-level | Very low | Manual inventory |
| **RFID (passive)** | Rack-level (door sweep) | Low | Automatic rack open detection |
| **RFID (active, UWB)** | 10-30 cm | Medium | Real-time tracking |
| **Bluetooth BLE** | 1-3 m | Low | Approximate position |
| **GPS** | 1-10 m | Medium | Outdoor tracking |
## DC layout and design
### Raised floor vs Slab
| Feature | Raised floor | Slab (solid floor) |
|---------|-------------|-------------------|
| **Airflow** | Underfloor air distribution (raised floor as plenum) | Overhead air, in-row cooling |
| **Flexibility** | Easy addition of perforated tiles | Limited (overhead cooling required) |
| **Weight** | Limit 500-1000 kg/m² (depends on height) | Limited only by the slab's structural rating |
| **Cost** | Higher (~$200-400/m²) | Lower (~$100-200/m²) |
| **Height** | 600-900 mm (standard), 900-1200 mm (high-density) | — |
| **Trend** | Declining (shift to in-row/overhead cooling) | Growing (new DC, high-density) |
Modern high-density DCs (AI/ML, GPU) are moving away from raised floors to slab with overhead or in-row cooling: racks weigh more (1000-2000 kg), and an underfloor plenum cannot deliver sufficient airflow at these densities.
### Rack layout and dimensions
| Parameter | Standard | High-density | Note |
|-----------|----------|-------------|------|
| **Rack width** | 600 mm (19") | 600-750 mm | 750 mm for GPU (cabling, cooling) |
| **Rack depth** | 1000-1200 mm | 1200-1500 mm | GPU servers, longer cables |
| **Rack height** | 42U | 48U / 52U | Higher rack = better power density |
| **Aisle width (cold)** | 1200-1500 mm | 1500-1800 mm | Service access, airflow |
| **Aisle width (hot)** | 900-1200 mm | 1200-1500 mm | Narrower than cold |
| **Max rack load** | 500-800 kg | 1000-2000 kg | Floor reinforcement required |
### Space planning
```
For Tier III DC (example):
IT space: 1000 m²
└── 20 rows × 10 racks = 200 racks at 42U
└── 200 racks × 5 kW avg = 1 MW IT load
└── PUE 1.4 → 1.4 MW facility
Support spaces:
└── UPS + batteries: 200 m²
└── Generators: 100 m² (outdoor)
└── Cooling (chillers, cooling tower): 300 m²
└── Offices, storage, loading dock: 400 m²
Total: ~2000 m² (50% IT, 50% support)
```
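
The example above can be reproduced with a few lines of arithmetic, which makes it easy to vary rack count, density or PUE. The function name and dictionary keys are illustrative only.

```python
def plan_space(
    rows: int,
    racks_per_row: int,
    kw_per_rack: float,
    pue: float,
    it_area_m2: float,
    support_areas_m2: dict[str, float],
) -> dict[str, float]:
    """Derive rack count, loads and floor area from the basic planning inputs."""
    racks = rows * racks_per_row
    it_load_kw = racks * kw_per_rack
    total_area = it_area_m2 + sum(support_areas_m2.values())
    return {
        "racks": racks,
        "it_load_kw": it_load_kw,
        "facility_load_kw": it_load_kw * pue,
        "total_area_m2": total_area,
        "it_area_share": it_area_m2 / total_area,
    }


plan = plan_space(
    rows=20, racks_per_row=10, kw_per_rack=5, pue=1.4, it_area_m2=1000,
    support_areas_m2={"ups": 200, "generators": 100, "cooling": 300, "offices": 400},
)
for key, value in plan.items():
    print(f"{key}: {value}")
```
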
### Zone approach (TIA-942)
| Zone | Description | Access | Security |
|------|-------------|--------|----------|
| **Z1** (Public) | Reception, offices | Free | Minimal |
| **Z2** (Office) | Administration, NOC | Employees + guests | RFID |
| **Z3** (DC support) | UPS, generators, cooling | DC operators | RFID + biometrics |
| **Z4** (DC hall) | Servers, storage, networking | DC operators + approved | RFID + biometrics + mantrap |
| **Z5** (Rack/cage) | Specific rack or cage | Only authorized personnel | Electronic lock |
## Fire suppression
### Detection
| System | Type | Detection time | False alarms | Use case |
|--------|------|----------------|--------------|----------|
| **VESDA** (Very Early Smoke Detection) | Aspiration, laser sensor | < 30 s (4 alarm levels) | Very low | Standard for DC |
| **Spot detection** | Ionization / optical smoke detector | 2-5 min | Medium | Legacy, smaller DC |
| **Heat detection** | Thermal detector (temperature / rate of rise) | 5-10 min | Very low | Backup for VESDA |
| **Line-type (LHD)** | Linear heat detection cable | 2-5 min | Low | Cable trays, above ceiling |
VESDA is the standard — active aspiration draws air from DC, laser sensor detects smoke particles at 4 levels (Alert → Action → Fire 1 → Fire 2). Enables intervention before visible smoke.
### Suppression systems
| System | Medium | Advantages | Disadvantages | Typical DC |
|--------|--------|------------|---------------|-----------|
| **Novec 1230** (FK-5-1-12) | Gas | Safe for people, zero ODP, short atmospheric lifetime (5 days) | Higher cost | Enterprise DC |
| **FM-200** (HFC-227ea) | Gas | Fast (10 s), effective, zero ODP | High GWP (3220) | Legacy DC |
| **Inergen** (IG-541) | Inert gas (52% N₂, 40% Ar, 8% CO₂) | Safe for people, naturally occurring gases | Large volume, high pressure | Enterprise DC |
| **Argonite** (IG-55) | 50% Ar, 50% N₂ | Safe, natural | Large volume, higher pressure | Enterprise DC |
| **Water mist** | Water (fine mist) | Cooling, smoke suppression, low cost | Water in DC (risk), local application only | Retrofits |
| **Pre-action sprinkler** | Water | Dual activation (detection + sprinkler) | Water risk, drainage required | Tier I-II |
**Concentration**: Novec (4-6 % by volume), FM-200 (7-9 %), Inergen (35-50 %). At these design concentrations Novec and Inergen remain breathable, leaving at least 5-7 min for evacuation.
### Detection zones
```
DC hall ──> zones of ~200 m² (max)
├── VESDA (each zone its own aspirator)
├── Smoke detectors (ceiling + floor)
└── Heat detection (backup)
```
## DCIM (Data Center Infrastructure Management)
### What DCIM covers
| Area | Metrics | Output |
|------|---------|--------|
| **Power** | Per PDU, per outlet, per rack, total | Capacity planning, PUE, kW/rack |
| **Cooling** | Temperature, humidity, airflow (sensors per rack) | Hot spot maps, airflow optimization |
| **Asset** | What is in which rack, U position, serial, warranty | Asset inventory, lease management |
| **Network** | Port utilization, patch panel connections | Patch management, port tracking |
| **Space** | Free U in rack, free racks | Capacity planning, "what-if" simulations |
### Tools
| Tool | Type | Platform | Cost | Note |
|------|------|----------|------|------|
| **Nlyte (Carrier)** | Enterprise DCIM | On-prem / Cloud | $$$ | Market leader, complex |
| **Sunbird DCIM** | Enterprise DCIM | Cloud | $$$ | Power monitoring, asset tracking |
| **Device42** | DCIM + IPAM | On-prem / Cloud | $$ | Integrated IPAM, CMDB |
| **NetBox** | Open source DCIM | On-prem | Free | IPAM, DCIM, asset tracking |
| **OpenDCIM** | Open source | On-prem | Free | Basic DCIM, asset management |
| **RackTables** | Open source | On-prem | Free | Simple, asset + networking |
| **Vendor-specific** | Dell OME, HPE OneView | On-prem | Part of HW | Vendor-specific only |
## Site selection
### Criteria for DC site selection
| Category | Criterion | Weight |
|----------|-----------|--------|
| **Power** | Electricity availability (grid capacity), cost/kWh, possibility of two independent feeds | High |
| **Connectivity** | Fiber backbone availability, number of connectivity providers, latency to major POP | High |
| **Natural risks** | Earthquakes, floods, hurricanes, tornadoes, wildfires — historical data + predictions | High |
| **Climate** | Average temperature, humidity (free cooling potential) | Medium |
| **Workforce** | Availability of technicians, DC operators, network/admin engineers | Medium |
| **Taxes and regulation** | Tax incentives, environmental regulations, building permits | Medium |
| **Security** | Crime, political stability, terrorist risk | High |
| **Transport accessibility** | Proximity to airport, highway (for HW deliveries, personnel) | Low |
### Natural risks — mapping
| Risk | Areas | Mitigation |
|------|-------|------------|
| **Earthquakes** | Pacific Ring of Fire (CA, Japan, Chile) | Base isolation, seismic bracing, flexible connections |
| **Hurricanes** | Caribbean, southeastern US, southeast Asia | Reinforced construction, generators above flood level |
| **Floods** | River valleys, coastal areas | Location outside flood zone, barriers |
| **Wildfires** | California, Australia, Mediterranean | Defensive zones, air filtration, monitoring |
### Power availability by region
| Region | Grid reliability | Cost/kWh (industrial) | Note |
|--------|-----------------|------------------------|------|
| **Northern Europe** (SE, NO, FI) | High (99.99 %) | $0.04-0.08 | Cheap green energy, cool climate |
| **Central Europe** (DE, NL, CZ) | High (99.99 %) | $0.10-0.20 | Stable, growing renewables |
| **Eastern US** (VA, NC) | High | $0.05-0.08 | Largest DC hub (Ashburn, VA) |
| **Western US** (CA, OR) | Medium (PG&E issues) | $0.10-0.15 | CAISO grid, blackout risk |
| **Singapore** | High | $0.15-0.20 | Moratorium on new DC (2019-2022), water constraints |
| **Dubai / UAE** | High | $0.06-0.10 | Cheap energy, high temperature (cooling) |
## Compliance and certification
| Standard / Certification | Area | Description |
|-------------------------|------|-------------|
| **TIA-942** (Rated 1-4) | DC design | Classification of redundancy, cabling, security (analogous to Uptime Tier) |
| **Uptime Institute** (Tier I-IV) | DC design | Operational certification, construction documentation |
| **ISO 27001** | ISMS | Information security, risk management |
| **ISO 27701** | Privacy | Extension of ISO 27001 for GDPR compliance |
| **SOC 2** (Type I/II) | Service org | Controls: Security, Availability, Processing Integrity, Confidentiality, Privacy |
| **PCI DSS** | Payment cards | Physical security, access to cardholder data |
| **HIPAA** | Healthcare | USA, health data protection |
| **FedRAMP** | US government | Cloud service authorization, DC security |
| **GDPR** | EU | Personal data protection, data residency |
| **NIST SP 800-53** | DC security | Security control catalog for US federal |
| **ISO 14001** | EMS | Environmental management, sustainability |
## Sustainability
### Carbon footprint of DC
```
Total emissions = Scope 1 (direct) + Scope 2 (energy) + Scope 3 (supply chain)
Scope 1: Generators (diesel), refrigerant leaks
Scope 2: Purchased electricity (grid mix)
Scope 3: HW manufacturing, transport, EOL recycling (~60-80 % of total emissions)
```
### Emission reduction
| Measure | Impact on PUE | Emission reduction | Payback |
|---------|--------------|-------------------|---------|
| **Temperature increase** (22→27 °C) | 0.1-0.2 | 10-20 % cooling | Immediate |
| **Free cooling** | 0.1-0.3 | 20-40 % cooling | 1-2 years |
| **Liquid cooling** | 0.2-0.4 | 30-50 % cooling | 2-4 years |
| **LED lighting + sensors** | 0.01-0.02 | < 1 % | 1 year |
| **PPA (Power Purchase Agreement)** | — | 100 % Scope 2 | Variable |
| **Renewable sources** (rooftop solar) | — | 5-15 % consumption | 5-10 years |
| **Green generator** (HVO biodiesel) | — | 90 % CO₂ reduction | +30 % fuel cost |
### Sustainability certifications
| Certification | Description |
|--------------|-------------|
| **LEED** (BD+C: DC) | U.S. Green Building Council — design and construction |
| **BREEAM** | UK, European sustainability assessment |
| **Climate Neutral Data Centre Pact** (EU) | Self-regulatory, PUE ≤ 1.3 (cool climates) or 1.4 (warm climates) by 2030 for existing DCs |
| **ISO 50001** | Energy management system |
| **Energy Star** | EPA, energy efficiency (US only) |
## Decision diagram — DC topology design
```mermaid
flowchart TD
Start(["DC design"]) --> TIER{"Required Tier?"}
TIER -->|"Tier I / II"| T1["N / N+1 redundancy<br/>Simple power, single path<br/>CRAC/CRAH, free cooling<br/>PUE 1.4-1.6, cost 1×"]
TIER -->|"Tier III"| T3["N+1, concurrently maintainable<br/>Dual path (A/B feed)<br/>Hot aisle containment<br/>PUE 1.2-1.4, cost 2×"]
TIER -->|"Tier IV"| T4["2N+1, fault tolerant<br/>Dual redundant + STS<br/>Hot + cold containment<br/>PUE 1.1-1.3, cost 3×"]
TIER --> POWER{"Power chain"}
POWER -->|"UPS"| UPS{"UPS type"}
UPS -->|"Enterprise DC"| UPS1["VFI double-conversion<br/>Li-ion (LFP), 10-15 years<br/>N+1 or 2N modular"]
UPS -->|"Edge / office"| UPS2["VI line-interactive<br/>VRLA, 3-5 years"]
POWER -->|"Generator"| GEN["Diesel 500-2500 kVA<br/>Tank for 24-72 h<br/>ATS 4-10 ms switching"]
POWER -->|"PDU"| PDU["3-phase 400 V<br/>Monitored/Switched<br/>A/B feed to racks"]
Start --> DENS{"Power density"}
DENS -->|"< 10 kW/rack"| COOL1["Air cooling<br/>CRAC/CRAH, raised floor<br/>Hot aisle containment<br/>ASHRAE A1-A2"]
DENS -->|"10-25 kW/rack"| COOL2["Hybrid<br/>In-row cooling<br/>Rear door HX<br/>ASHRAE A1-H1"]
DENS -->|"> 25 kW/rack"| COOL3["Liquid cooling<br/>CDU, direct-to-chip<br/>Immersion single/two-phase<br/>ASHRAE W-classes"]
Start --> CLIM{"Climate zone"}
CLIM -->|"Moderate (CZ, DE)"| FC1["Free cooling 4000-6000 h/year<br/>Chiller + economizer<br/>PUE saving 0.2-0.3"]
CLIM -->|"Warm (ES, US South)"| FC2["Chiller year-round<br/>Adiabatic cooling<br/>PUE 1.3-1.6"]
CLIM -->|"Cold (SE, NO)"| FC3["Free cooling 7000+ h/year<br/>Air-side economizer<br/>PUE < 1.2"]
```
## Secondary data center topologies
When planning a second DC, the topology choice depends on distance, RPO/RTO targets, and budget.
### Distance classification
| Category | Distance | Latency (round-trip) | Use case |
|-----------|-----------|---------------------|----------|
| **Metro (Campus)** | 1-20 km | < 1 ms | Synchronous replication, stretched cluster |
| **Metro** | 20-100 km | 1-5 ms | Metro cluster, mostly sync replication |
| **Regional** | 100-500 km | 5-20 ms | Asynchronous replication, warm standby |
| **Continent** | 500-3000 km | 20-100 ms | Asynchronous replication, cold standby |
| **Global** | 3000+ km | > 100 ms | Async only, no real-time dependencies |
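
As a rough illustration, the distance classes above can be expressed as a lookup. The function name `distance_category` is hypothetical, and assigning a boundary value (for example exactly 20 km) to the farther category is an assumption, since the table's ranges share their endpoints.

```python
import bisect

# Upper bounds (km) of the first four categories, taken from the table above.
_BOUNDS_KM = [20, 100, 500, 3000]
_CATEGORIES = ["Metro (Campus)", "Metro", "Regional", "Continent", "Global"]


def distance_category(km: float) -> str:
    """Return the distance class for a site-to-site distance in km."""
    if km < 0:
        raise ValueError("distance must be non-negative")
    # bisect_right places a boundary value into the farther category (assumption).
    return _CATEGORIES[bisect.bisect_right(_BOUNDS_KM, km)]


if __name__ == "__main__":
    for km in (5, 60, 250, 1200, 8000):
        print(km, "km ->", distance_category(km))
```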
### Topologies by operational mode
#### Active-Active (Hot-Hot)
```
DC-A (Primary) DC-B (Active)
┌────────────────────┐ ┌────────────────────┐
│ App Active │ │ App Active │
│ DB Active │◄─sync─►│ DB Active │
│ Users → LB → A │ │ Users → LB → B │
└────────────────────┘ └────────────────────┘
│ │
└──── Global Load Balancer ────┘
```
| Parameter | Value |
|----------|---------|
| **RTO** | 0 to seconds (automatic failover, traffic is redirected) |
| **RPO** | 0 (sync replication, commit is confirmed only after write to both DCs) |
| **Max distance** | < 100 km (latency < 5 ms RTT for sync DB replication) |
| **Operating costs** | 2× (both DCs fully active, both fully equipped) |
| **Advantages** | Zero downtime, instant switchover, full utilization of both DCs |
| **Disadvantages** | Requires synchronous replication → distance limit, complex networking, split-brain risk |
**Split-brain solutions**: STONITH (Shoot The Other Node In The Head), watchdog, quorum (3rd node in 3rd location / cloud), fencing, SCSI-3 persistent reservation.
**Use case**: Financial services, telco, payment gateways, where even a minute of downtime can cost millions.
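The quorum approach from the split-brain solutions above can be illustrated with a simple majority check. This is a sketch under the assumption that each site and the witness hold one vote; the function name `has_quorum` is hypothetical and real cluster managers add fencing on top of the vote count.

```python
def has_quorum(reachable_votes: int, total_votes: int) -> bool:
    """A partition may keep serving only if it holds a strict majority of votes."""
    if not 0 <= reachable_votes <= total_votes:
        raise ValueError("reachable_votes must be between 0 and total_votes")
    return reachable_votes > total_votes // 2


if __name__ == "__main__":
    # Two DCs plus a witness in a 3rd location: 3 votes in total.
    # The inter-DC link fails; DC-A still reaches the witness, DC-B does not.
    print("DC-A keeps quorum:", has_quorum(2, 3))
    print("DC-B keeps quorum:", has_quorum(1, 3))
    # Without a witness (2 votes) neither side has a majority after a split.
    print("No witness, either side:", has_quorum(1, 2))
```

The last case shows why a third vote matters: with only two sites, a link failure leaves no side with a majority, so the cluster either stops or risks split-brain.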
#### Active-Passive (Hot-Warm, MetroCluster)
```
DC-A (Primary) DC-B (Standby)
┌────────────────────┐ ┌────────────────────┐
│ App Active │ │ App Standby │
│ DB Primary │──sync──►│ DB Standby │
│ Users → LB → A │ │ ~~~ (waiting) ~~~ │
│ DNS: A-record │ │ DNS: health check │
└────────────────────┘ └────────────────────┘
```
| Parameter | Value |
|----------|---------|
| **RTO** | tens of seconds to minutes (DNS failover + App startup) |
| **RPO** | 0 (sync) or seconds (async) |
| **Max distance** | sync < 100 km, async unlimited |
| **Operating costs** | 1.5-1.8× (second DC has reduced or idle compute) |
| **MetroCluster** | Specific implementation: FC SAN over DWDM, sync mirror, automatic failover |
**MetroCluster** (NetApp, Dell EMC, HPE):
- Storage-based cluster with synchronous mirroring between DCs
- Automatic failover on entire DC failure
- Requires dedicated DWDM or dark fiber interconnection
- Typical distance: up to 50 km (for latency < 1 ms RTT)
- Use case: enterprise storage, primary+secondary DC in metropolitan area
#### Hot-Cold (Warm Standby → Cold)
```
DC-A (Primary) DC-B (Cold Standby)
┌────────────────────┐ ┌────────────────────┐
│ App Active │ │ ~~~ powered off ~~~│
│ DB Active │──async─►│ Backup storage │
│ Users → A │ │ ~~~ no compute ~~~│
└────────────────────┘ └────────────────────┘
```
| Parameter | Value |
|----------|---------|
| **RTO** | hours to days (purchase/rent HW, restore from backup) |
| **RPO** | hours (last backup) |
| **Max distance** | unlimited |
| **Operating costs** | 1.1-1.3× (only storage and facility, compute only at failover) |
| **Typical use case** | Low-cost DR, compliance, last resort |
#### Pilot Light
```
DC-A (Primary) DC-B (Pilot Light)
┌────────────────────┐ ┌────────────────────┐
│ App Active │ │ ~~~ off ~~~ │
│ DB Active │──async─►│ DB replica (mini) │
│ All services │ │ Core services only│
│ │ │ (DNS, LDAP, mon) │
└────────────────────┘ └────────────────────┘
On DR: spin-up compute
from IaC, rest from backup
```
- DC-B runs with minimum compute (only core services and DB replica)
- Application layer is spun up from IaC (Terraform, Ansible) only during DR
- Compromise between cost and RTO
### Comparison table
| Topology | RTO | RPO | Cost (× primary) | Max distance | Failover |
|-----------|-----|-----|-------------------|-------------|----------|
| **Active-Active** | 0 to seconds | 0 | 2.0× | < 100 km | Auto (traffic) |
| **MetroCluster** | s-min | 0 | 1.8-2.0× | < 50 km | Auto (storage) |
| **Active-Passive (sync)** | min | 0 | 1.5-1.8× | < 100 km | Semi-auto |
| **Active-Passive (async)** | min-h | s-min | 1.3-1.5× | unlimited | Semi-auto |
| **Pilot Light** | h | min-h | 1.2-1.4× | unlimited | Manual |
| **Warm Standby** | min-h | s-min | 1.5-1.8× | unlimited | Semi-auto |
| **Cold Standby** | days | h | 1.1-1.3× | unlimited | Manual |
### Stretched Cluster
```
┌──── Site A (50 km) ────┐ ┌──── Site B ──────────┐
│ ┌──────────────────┐ │ │ ┌──────────────────┐ │
│ │ ESXi / Hyper-V │ │ │ │ ESXi / Hyper-V │ │
│ │ VM │ │ │ │ VM (complement) │ │
│ └────────┬─────────┘ │ │ └────────┬─────────┘ │
│ │ │ │ │ │
│ ┌────────▼─────────┐ │ │ ┌────────▼─────────┐ │
│ │ Storage (SAN) │──┼────┼──│ Storage (SAN) │ │
│ │ MetroCluster │ │ │ │ MetroCluster │ │
│ └──────────────────┘ │ │ └──────────────────┘ │
└────────────────────────┘ └────────────────────────┘
┌─────▼──────┐
│ vCenter / │
│ Cluster │
│ (single) │
└────────────┘
```
- One cluster stretched across two sites (single management domain)
- VMs can live-migrate between sites (vMotion over distance)
- Storage synchronously mirrored (MetroCluster, VPLEX, vSAN Stretched Cluster)
- **Requirements**: dark fiber / DWDM, low latency (< 5 ms), high link reliability
- **Risks**: split-brain when the inter-site link fails, dependency on the inter-site network
- **Use case**: enterprise with own dark fiber between two DCs in a metropolitan area
### Decision tree
```mermaid
flowchart TD
Start(["Secondary DC"]) --> RPO{"Required RPO?"}
RPO -->|"0 (no data loss)"| SYNC{"Sync replication possible?"}
SYNC -->|"Yes, < 100 km"| ACT{"Want zero downtime?"}
ACT -->|"Yes"| AA["Active-Active<br/>RTO=0, RPO=0, 2× cost"]
ACT -->|"No"| AP["Active-Passive<br/>RTO=min, RPO=0, 1.5×"]
SYNC -->|"No, > 100 km"| ASYNC["Active-Passive (async)<br/>RTO=min, RPO=s, 1.3×"]
RPO -->|"minutes-hours"| WARM{"Want fast failover?"}
WARM -->|"Yes"| PILOT["Pilot Light<br/>RTO=h, RPO=min, 1.2×"]
WARM -->|"No"| COLD["Cold Standby<br/>RTO=days, RPO=h, 1.1×"]
Start --> DIST{"Distance between DCs"}
DIST -->|"< 50 km, own fiber"| MC["MetroCluster / Stretched Cluster<br/>Single management, sync storage"]
DIST -->|"50-300 km"| REG["Regional DR<br/>Active-Passive, async replication"]
DIST -->|"> 300 km"| GLOBAL["Global DR<br/>Cold standby, backup & restore"]
```
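The RPO branch of the decision tree can be written down as a short function. This is a hedged sketch: the name `choose_topology` and its parameters are hypothetical, and treating exactly 100 km as "sync not possible" is an assumption because the diagram only labels "< 100 km" and "> 100 km".

```python
SYNC_LIMIT_KM = 100  # sync replication threshold used in the diagram


def choose_topology(rpo_zero: bool, distance_km: float,
                    zero_downtime: bool = False,
                    fast_failover: bool = False) -> str:
    """Follow the RPO branch of the decision tree and return a topology name."""
    if rpo_zero:
        if distance_km < SYNC_LIMIT_KM:
            # Sync replication is feasible within the distance limit.
            return "Active-Active" if zero_downtime else "Active-Passive (sync)"
        # Too far for sync replication: the diagram falls back to async.
        return "Active-Passive (async)"
    # RPO of minutes to hours is acceptable.
    return "Pilot Light" if fast_failover else "Cold Standby"


if __name__ == "__main__":
    print(choose_topology(rpo_zero=True, distance_km=40, zero_downtime=True))
    print(choose_topology(rpo_zero=True, distance_km=400))
    print(choose_topology(rpo_zero=False, distance_km=400, fast_failover=True))
```

The distance branch of the diagram (MetroCluster, regional DR, global DR) is a separate, parallel view and is not modelled here.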
### Physical infrastructure for DC interconnection
| Technology | Bandwidth | Max distance | Latency | Use case |
|------------|-----------|-------------|---------|----------|
| **Dark fiber** | 100-800 GbE | 10-80 km (single-mode) | < 0.1 ms | MetroCluster, stretched cluster |
| **DWDM** | 400 GbE-1.6 TbE (per lambda) | 80-120 km (without amplifier) | < 0.5 ms | Metro, metro cluster |
| **CWDM** | 10-25 GbE (per channel) | 10-40 km | < 0.3 ms | Campus, smaller metro |
| **MPLS L2VPN** | 10-100 GbE | unlimited | 1-10 ms | Regional DR, async replication |
| **Internet IPsec** | 1-10 GbE | unlimited | 5-50 ms | Cold standby, backup |
### Impact of individual technologies on DC topology selection
Choosing a secondary DC topology is not purely an infrastructure decision — each layer (DB, hypervisor, orchestration, messaging) brings its own constraints.
#### Databases
| DB technology | Sync replication | Max distance | Auto-failover | Split-brain handling | Note |
|---------------|---------------|-------------|---------------|-------------------|----------|
| **PostgreSQL** | Synchronous commit (`synchronous_standby_names`) | < 100 km (latency < 10 ms) | Patroni / repmgr + etcd | Quorum (etcd, 3+ nodes) | Streaming replication; retain WAL with `wal_keep_size` (`wal_keep_segments` before PostgreSQL 13) or replication slots |
| **MySQL** | Group Replication (multi-primary, single-primary) | < 100 km | MySQL InnoDB Cluster + MySQL Router | Paxos (Group Replication, 3+ node) | Semi-sync as compromise |
| **Oracle** | Data Guard (SYNC/FASTSYNC/ASYNC), RAC extended | sync < 100 km, async unlimited | Data Guard Broker / FSFO (Fast Start Failover) | Observer (3rd node) | Far Sync for remote DCs |
| **MSSQL** | AlwaysOn Availability Groups (SYNCHRONOUS_COMMIT) | < 100 km | AlwaysOn + Cluster quorum | File share majority / cloud witness | Multi-site cluster support |
| **MongoDB** | Majority write concern + journaling | < 100 km | Replica set auto-election | Arbiter (voting member) | Priority-based failover |
| **Cassandra** | N/A (multi-master, eventual consistency) | unlimited | Yes (peer-to-peer) | None (multi-master, gossip protocol) | Snitch-aware topology, NetworkTopologyStrategy |
| **Redis** | Redis Sentinel / Redis Cluster (async) | unlimited (async) | Sentinel / Cluster failover | Quorum (Sentinel, majority) | PSYNC replication, replication lag |
Key limitation for **sync replication**: latency should stay below 5 ms RTT, because every commit waits for confirmation from both DCs. At 100 km the RTT is about 1 ms, which is fine. At 1000 km (about 10 ms RTT), sync replication can reduce transaction throughput by 80+ %.
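The latency figures above follow from the propagation speed in fiber. The sketch below assumes roughly 200 km per millisecond (about two thirds of the speed of light) and ignores equipment and routing delay; the function names and the 0.5 ms local commit time are hypothetical values chosen for illustration.

```python
FIBER_KM_PER_MS = 200.0  # assumption: light in fiber covers about 200 km per ms


def fiber_rtt_ms(distance_km: float) -> float:
    """Round-trip propagation delay over fiber, ignoring equipment latency."""
    return 2 * distance_km / FIBER_KM_PER_MS


def max_serial_commits_per_s(distance_km: float, local_commit_ms: float = 0.5) -> float:
    """Upper bound on commits per second for one serialized writer.

    Each commit pays the local commit time plus one round trip to the remote DC.
    """
    return 1000.0 / (local_commit_ms + fiber_rtt_ms(distance_km))


if __name__ == "__main__":
    for km in (0, 100, 1000):
        print(f"{km:>5} km: RTT {fiber_rtt_ms(km):.1f} ms, "
              f"about {max_serial_commits_per_s(km):.0f} serial commits/s")
```

Under these assumptions a single serialized writer drops from about 2000 commits/s locally to under 100 commits/s at 1000 km. Workloads with many concurrent sessions are hit less hard, because commits from different sessions overlap.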
Suitable for **Active-Active**:
- **Cassandra / ScyllaDB** — native multi-DC, eventual consistency, no split-brain
- **MySQL Group Replication (multi-primary)** — 3+ DC for quorum
- **CockroachDB / TiDB** — native multi-region, ACID across DCs
- **Redis Enterprise** — Active-Active (CRDT-based)
Suitable for **Active-Passive**:
- **PostgreSQL + Patroni** — auto-failover, etcd quorum
- **Oracle Data Guard** — FSFO, far sync for remote DCs
- **MSSQL AlwaysOn** — cloud witness
- **MongoDB Replica Set** — arbitration node in 3rd location
#### Hypervisors
| Hypervisor | Cluster technology | Stretched cluster | Max distance | Split-brain |
|-----------|-------------------|-------------------|-------------|-------------|
| **VMware vSphere** | vSAN Stretched Cluster, Metro vCenter, Site Recovery Manager | Yes (vSAN Stretched Cluster, Metro Cluster) | < 50 km (vSAN Stretched Cluster), < 10 ms RTT | Fencing (STONITH), witness host |
| **Hyper-V** | Storage Replica + Failover Cluster | Yes (Cluster Sets) | < 50 km (sync), unlimited (async) | File share witness / cloud witness |
| **Proxmox VE** | Proxmox HA + Ceph | Limited (Ceph stretch cluster) | < 50 km (Ceph sync) | Ceph monitor quorum (3+ DC) |
| **XCP-ng / XenServer** | Xen Orchestra HA + SR (Storage Repository) replication | Limited | depends on storage replication | — |
| **Nutanix AHV** | Metro Availability (sync), Async DR | Yes (Metro) | < 100 km (sync), unlimited (async) | Witness VM (cloud / 3rd site) |
| **KVM / oVirt** | oVirt HA + GlusterFS / NFS | Limited | depends on storage replication | — |
**vSAN Stretched Cluster specific requirements:**
- Dedicated vSAN network (25 GbE min., < 5 ms RTT)
- Witness host in 3rd location (or cloud witness)
- All VM storage policies set to FTT=1 (mirroring)
- Storage policy: `site-A + site-B + witness`
#### Kubernetes and container platforms
| Platform | Multi-cluster DR | Replication | Max distance | Failover |
|-----------|-----------------|-----------|-------------|----------|
| **Vanilla K8s** | KubeFed, Cluster API, Velero + Restic | Velero (backup/restore), Rook (Ceph) | unlimited | Manual (Velero restore) |
| **OpenShift** | ACM (Advanced Cluster Management), Velero | OADP (OpenShift API for Data Protection) | unlimited | ACM failover (subscription) |
| **Rancher** | Rancher Multi-Cluster App, Velero | Longhorn (sync/async DR), Velero | unlimited | Semi-auto |
| **Google GKE** | Multi-cluster Services, Backup for GKE | Config Sync, Backup for GKE | unlimited | Manual |
| **Azure AKS** | Azure Arc + Velero + Azure Traffic Manager | AKS backup (Velero), Azure Site Recovery | unlimited | Manual (Velero) |
| **AWS EKS** | EKS multi-cluster, Velero + S3 cross-region | Velero (S3), Rook (EBS snapshots) | unlimited | Manual |
**Key K8s DR principles:**
- **Applications must be stateless** (or state externalized to DB/storage)
- **Velero** — backup/restore entire cluster (PV, resources, helm releases)
- **Rook/Ceph** — cross-region mirroring RBD volumes
- **KubeFed / ACM** — subscription-based deploy to multiple clusters
- **Ingress/Gateway API** — traffic routing between clusters
- **External DNS** — DNS failover on cluster outage
#### Messaging / streaming
| Platform | Replication | Topology | DR support | Max distance |
|-----------|-----------|-----------|------------|-------------|
| **Apache Kafka** | MirrorMaker 2, Confluent Cluster Linking, KRaft quorum | Active-Passive (MM2), Active-Active (Cluster Linking) | MM2: async, Cluster Linking: async | unlimited |
| **RabbitMQ** | Classic Queue Mirroring, Quorum Queues | Active-Passive (Warm Standby) | Federation / Shovel (async) | unlimited |
| **Red Hat AMQ** | (Artemis) Cluster + HA | Active-Passive (shared store / replication) | Live-backup pair | < 100 km (sync) |
| **NATS** | NATS JetStream (cluster + cross-account) | Active-Active (Leaf nodes, cross-account) | Super-cluster, failover | unlimited |
| **Apache Pulsar** | BookKeeper (bookie rack-aware), geo-replication | Active-Active (geo-replication) | Built-in (cluster-level) | unlimited (async) |
| **AWS SQS/SNS** | Managed, AWS region pairs | Active-Active (multi-region) | Built-in (AWS managed) | unlimited |
| **Azure Service Bus** | Managed, paired region | Active-Passive (paired region) | Built-in (geo-recovery) | unlimited |
| **Oracle Service Bus (OSB)** | Oracle WebLogic Cluster + JDBC store + AQ | Active-Passive (WebLogic Cluster + Data Guard) | OSB/WLS cluster + Oracle RAC/Data Guard sync | < 100 km (Data Guard sync), unlimited (async) |
**Messaging DR recommendations:**
- **Kafka**: use Cluster Linking for Active-Active, or MirrorMaker 2 for Active-Passive; replicate only critical topics
- **RabbitMQ**: Quorum Queues + Federation upstream for DR; avoid Classic Queue Mirroring (deprecated)
- **Pulsar**: native geo-replication, bookie rack-aware for stretched cluster; easiest DR among messaging platforms
- **OSB**: WebLogic cluster + Oracle RAC/Data Guard; DR depends on DB layer, not on OSB itself
### Per-layer limitations summary table
| Layer | Limiting factor for secondary DC | Max distance for sync | Impact on topology selection |
|--------|-----------------------------------|----------------------|--------------------------|
| **Storage** | Sync mirror latency, DWDM cost | < 50 km (MetroCluster) | Stretched cluster only in metro |
| **Databases** | Commit wait for sync replication | < 100 km (5 ms RTT) | Active-Active only with multi-master DB |
| **Hypervisor** | Stretched cluster quorum + fencing | < 50 km (vSAN, 5 ms) | MetroCluster / stretched cluster |
| **Kubernetes** | Velero restore time, Rook mirror latency | unlimited (async) | Active-Passive, cold standby |
| **Messaging** | Replication lag, offset management | unlimited (async) | Active-Active (Kafka, Pulsar, NATS) or Active-Passive |
| **Network** | Dark fiber/DWDM cost, latency | < 100 km (metro fiber) | Limits sync replication options |
| **Application** | Stateful/stateless, connection draining | depends on architecture | Stateless app → any topology |
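
Because every layer in the stack must tolerate the inter-site latency, the effective sync distance is set by the most restrictive layer. A minimal sketch, using the limits from the table above; the dictionary keys and the function name `max_sync_distance_km` are hypothetical, and layers listed as "unlimited (async)" are treated as imposing no sync limit.

```python
# Sync distance limits (km) per layer, taken from the summary table above.
SYNC_LIMIT_KM = {
    "storage": 50,
    "databases": 100,
    "hypervisor": 50,
    "network": 100,
}


def max_sync_distance_km(layers):
    """Return the tightest sync limit among the given layers, or None if none applies."""
    limits = [SYNC_LIMIT_KM[layer] for layer in layers if layer in SYNC_LIMIT_KM]
    return min(limits) if limits else None


if __name__ == "__main__":
    print(max_sync_distance_km(["databases", "network"]))
    print(max_sync_distance_km(["storage", "databases", "hypervisor"]))
    print(max_sync_distance_km(["kubernetes", "messaging"]))
```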
## Disk monitoring — S.M.A.R.T.
Self-Monitoring, Analysis and Reporting Technology — predictive monitoring of HDD/SSD.
| Key attribute | ID | Description |
|--------------|----|-------------|
| Reallocated Sectors Count | 5 | Number of remapped sectors (increase = end of disk life) |
| Power-On Hours | 9 | Total operating time in hours |
| Reported Uncorrectable Errors | 187 | Uncorrectable errors (red flag) |
| CRC Error Count | 199 | Errors on SATA link (cable/controller) |
| SSD Life Left | 231 | % remaining SSD life |
| Media Wearout Indicator | 233 | Normalized NAND wear level (vendor-specific; some vendors report total NAND writes here) |
Tools: `smartmontools` (smartctl, smartd), Prometheus exporters (`smartctl_exporter`, or `node_exporter` via the textfile collector), OTel Collector.
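A minimal sketch of how the error counters from the table could be checked in a script. It assumes the report has the shape produced by `smartctl --json -A` for ATA devices (an `ata_smart_attributes.table` list with `id` and `raw.value` fields); the `SAMPLE` data is hand-written for illustration, not real output, and the rule "raw value above zero means warning" is a simplification.

```python
# Attributes from the table above where any non-zero raw value deserves attention.
WATCHED = {
    5: "Reallocated Sectors Count",
    187: "Reported Uncorrectable Errors",
    199: "CRC Error Count",
}

# Hand-written sample in the assumed smartctl JSON shape (not real device output).
SAMPLE = {
    "ata_smart_attributes": {
        "table": [
            {"id": 5, "name": "Reallocated_Sector_Ct", "raw": {"value": 8}},
            {"id": 9, "name": "Power_On_Hours", "raw": {"value": 41000}},
            {"id": 199, "name": "UDMA_CRC_Error_Count", "raw": {"value": 0}},
        ]
    }
}


def smart_warnings(report: dict) -> list:
    """Return one warning string per watched attribute with a non-zero raw value."""
    table = report.get("ata_smart_attributes", {}).get("table", [])
    warnings = []
    for attr in table:
        label = WATCHED.get(attr.get("id"))
        raw = attr.get("raw", {}).get("value", 0)
        if label is not None and raw > 0:
            warnings.append(f"{label} (ID {attr['id']}): raw value {raw}")
    return warnings


if __name__ == "__main__":
    for line in smart_warnings(SAMPLE):
        print("WARNING:", line)
```

In practice the trend matters more than a single reading, so a growing value between two polls is a stronger signal than a static non-zero count.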
## Sources
Links, books and standards: [sources/infrastructure/sources.en.md](sources/infrastructure/sources.en.md)
### Recommended literature
| Book | Authors | ISBN | Description |
|------|---------|------|-------------|
| The Data Center as a Computer (4th ed., 2025) | Barroso, Hölzle, Ranganathan | 978-3-031-99488-3 | Comprehensive design evolution of warehouse-scale computer (WSC) by Google architects. Covers hardware, software, power, cooling, networking and 25 years of WSC experience. Key publication for datacenter architecture. |
| Electronics Cooling: From the Chip to the Datacenter (Vol. 62) | Abraham et al. | 978-0-443-47084-4 | Practical guide to thermal management from transistor level to datacenter. Covers conduction, convection, liquid immersion and phase change cooling. Essential resource for DC cooling design. |
## Datacenter backbone services
When building a new DC, basic infrastructure services must be deployed first — without them, higher layers cannot operate:
### DNS
| Role | Service | Description |
|------|---------|-------------|
| **Authoritative** | BIND, PowerDNS, NSD | Primary DNS zone for internal domains |
| **Recursive** | Unbound, BIND (caching), CoreDNS | Resolver for internal + external queries |
| **Anycast** | DNS anycast (BGP) | Redundancy, lower latency |
| **Integration** | Infoblox, BlueCat, dnsmasq | IPAM + DNS + DHCP in one |
Best practices: separate auth and recursive resolvers, DNSSEC, split-horizon (internal vs external view), TSIG for zone transfers, monitoring (DNS query latency, NXDOMAIN rate).
### NTP (time synchronization)
- **Primary**: GPS-disciplined NTP servers (Microchip S600, Meinberg)
- **Secondary**: Stratum 1/2 NTP (ntpd, chrony, NTPsec)
- **All nodes**: chrony (modern replacement for ntpd), local NTP server on each rack switch (boundary clock)
- **Precision**: PTP (IEEE 1588) for telco/fintech — sub-microsecond accuracy
- **DC topology**: GPS antenna → Grandmaster (PTP) → Boundary clock (rack switch) → Ordinary clock (server)
### DHCP + IPAM
| Tool | Description |
|------|-------------|
| **ISC DHCP** | Legacy, still widely deployed |
| **Kea** | Modern replacement for ISC DHCP (ISC + Linux Foundation) |
| **Infoblox / BlueCat** | Enterprise IPAM + DHCP + DNS |
| **NetBox / phpIPAM** | Open-source IPAM |
### LDAP / Identity Management
| Tool | Description |
|------|-------------|
| **FreeIPA** | Integrated IDM (LDAP + Kerberos + DNS + CA) — Linux |
| **Active Directory** | Microsoft, LDAP + Kerberos + Group Policy |
| **389 Directory Server** | Open-source LDAP (Red Hat) |
| **OpenLDAP** | Classic open-source LDAP |
| **Keycloak / Authentik** | Modern OIDC/SAML/LDAP gateways |
### PKI and certificates
- **Enterprise CA**: EJBCA, Smallstep, HashiCorp Vault (PKI engine)
- **ACME**: cert-manager (Kubernetes), certbot (Let's Encrypt)
- **mTLS**: Vault PKI, SPIRE (SPIFFE), Cilium
- **Best practices**: root CA offline, intermediate CA per environment, short-lived certificates (max 90 days), revocation (CRL/OCSP)
### Monitoring and observability
See [MONITORING.en.md](MONITORING.en.md). Before running the first workloads, the DC must have:
- Metric collection (Prometheus, Zabbix)
- Centralized logs (Loki, ELK)
- Alerting (Alertmanager, PagerDuty)
- Uptime monitoring (heartbeat checks)
### Deployment logistics — step order
```
1. DNS (at least recursive + local resolver)
2. NTP (time synchronization)
3. DHCP + IPAM (first servers get IPs)
4. LDAP / IAM (users, groups, access rights)
5. PKI (certificates for encryption)
6. Configuration management (Ansible, Puppet)
7. Monitoring + logging (see what's happening)
8. Container registry / Package repo (docker registry, apt/yum mirror)
9. Load balancer (for services)
10. Storage backend (Ceph, NFS, SAN)
11. Orchestration (Kubernetes, OpenStack)
```
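The step order above is essentially a dependency ordering. The sketch below derives such an order with a topological sort; the dependency edges in `BOOTSTRAP_DEPS` are illustrative assumptions (the list above only gives the sequence, not the exact dependencies), and the service names are hypothetical identifiers.

```python
from graphlib import TopologicalSorter

# Each service maps to the services that must already be running (assumed edges).
BOOTSTRAP_DEPS = {
    "dns": set(),
    "ntp": {"dns"},
    "dhcp_ipam": {"dns", "ntp"},
    "ldap_iam": {"dns", "ntp"},
    "pki": {"ntp", "ldap_iam"},
    "config_mgmt": {"dhcp_ipam", "ldap_iam", "pki"},
    "monitoring": {"config_mgmt"},
    "registry": {"config_mgmt", "pki"},
    "load_balancer": {"config_mgmt"},
    "storage": {"config_mgmt", "monitoring"},
    "orchestration": {"registry", "load_balancer", "storage"},
}


def bootstrap_order(deps: dict) -> list:
    """Return one valid deployment order; raises graphlib.CycleError on circular deps."""
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    for step, service in enumerate(bootstrap_order(BOOTSTRAP_DEPS), start=1):
        print(f"{step:>2}. {service}")
```

Several orders can satisfy the same dependencies, so the output may differ from the numbered list in the position of independent services while still respecting every edge.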
## OpenStack in the datacenter
OpenStack brings a software abstraction layer to the DC, enabling multi-tenancy and self-service:
### Control plane architecture
- **Controller nodes** — management services (Keystone, Nova API, Neutron API, Horizon, RabbitMQ, DB)
- **Compute nodes** — hypervisor (KVM), Nova Compute, Neutron agent
- **Storage nodes** — Ceph OSD, Cinder volumes, Swift object storage
- **Network nodes** — Neutron L3 router, DHCP agent, DVR
### Requirements for DC infrastructure
| Component | Requirement |
|-----------|-------------|
| **Controller** | 3-5 node HA cluster, 16+ vCPU, 32+ GB RAM, SSD |
| **Compute** | Dense performance per rack (GPU, high-core), NUMA-aware design |
| **Storage (Ceph)** | 10-25 GbE networking, NVMe/SSD OSD, 3+ replica |
| **Network** | 25/100 GbE spine-leaf, L3 BGP underlay, VXLAN overlay |
| **Rack power** | 10-30 kW/rack for GPU compute |
### Use cases
- Private cloud for enterprise (multi-tenant, self-service Horizon)
- NFVI for telco (DPDK, SR-IOV, low-latency)
- Academic / HPC clusters (Ironic, Cyborg, Manila)
- Government / regulated environments (on-prem, audit trail)
*Last revision: 2026-06-03*