# 🏭 Datacenters

## Tier classification (TIA-942 / Uptime Institute)

| Tier | Availability | Downtime / year | Redundancy |
|------|-------------|-----------------|------------|
| **Tier I** | 99.671 % | 28.8 h | N (no redundancy) |
| **Tier II** | 99.741 % | 22.7 h | N+1 (redundant components) |
| **Tier III** | 99.982 % | 1.6 h | N+1 (concurrently maintainable) |
| **Tier IV** | 99.995 % | 26.3 min | 2N+1 (fault tolerant) |

## Key subsystems

| System | Description |
|--------|-------------|
| **Power** | UPS, generators (diesel), ATS, PDU, redundant feeds (A/B feed) |
| **Cooling** | CRAC/CRAH, chilled water, free cooling, containment (hot/cold aisle) |
| **Physical security** | CCTV, biometric access, mantrap, rack security locks |
| **Cabling** | Structured cabling (Cat6A/7/8 copper, OM3/OM4 multi-mode and OS2 single-mode fiber), patch panels |
| **Fire suppression** | Alarm, gaseous agents (Novec 1230, FM-200, inert gases), VESDA (very early smoke detection) |
| **Monitoring** | DCIM (Data Center Infrastructure Management), SNMP, BMS (Building Management System) |

## Aisle containment

```
          ┌────────────────────────────────────┐
          │              Rack Row              │
          │ ┌──┐ ┌──┐ ┌──┐ ┌──┐ ┌──┐ ┌──┐      │
 Cold     │ │  │ │  │ │  │ │  │ │  │ │  │      │     Cold
 Aisle <──│ └──┘ └──┘ └──┘ └──┘ └──┘ └──┘    ──> Aisle
          │ ┌──┐ ┌──┐ ┌──┐ ┌──┐ ┌──┐ ┌──┐      │
 Hot      │ │  │ │  │ │  │ │  │ │  │ │  │      │     Hot
 Aisle ──>│ └──┘ └──┘ └──┘ └──┘ └──┘ └──┘    <── Aisle
          └────────────────────────────────────┘
```

## Environmental classes (ASHRAE TC 9.9)

ASHRAE Technical Committee 9.9 defines temperature and humidity envelopes for IT equipment in data centers (DC).

| Class | Temperature (recommended) | Temperature (allowable) | Usage |
|-------|--------------------------|-------------------------|-------|
| **A1** | 18-27 °C | 15-32 °C | Enterprise DC, strict control |
| **A2** | 18-27 °C | 10-35 °C | Standard DC |
| **A3** | 18-27 °C | 5-40 °C | Looser environment |
| **A4** | 18-27 °C | 5-45 °C | Maximum cooling savings |
| **H1** | 18-22 °C | 5-25 °C | High-density air-cooled (AI/ML) |

- The 5th edition (2021) added class H1 for high-density air-cooled equipment and extended the liquid cooling W-classes (W17, W27, W32, W40, W45, W+)
- 2024: new S-classes for Technology Cooling System (TCS) liquid cooling
- Humidity: recommended range from −9 °C dew point (DP) up to 70 % RH (at low pollutant levels); max 50 % RH in highly corrosive environments

## Power

### Power chain

```
Grid ──> Transformer ──> UPS ──> PDU ──> Rack PDU ──> Server PSU
           │
           ├──> Generator (ATS switches on outage)
           └──> STS/ATS (Static Transfer Switch)
```

A/B feed topology:

```
Grid A ──> UPS A ──> PDU A1 ──> Rack PDU A ──> PSU A (server)
                                                 │
Grid B ──> UPS B ──> PDU B1 ──> Rack PDU B ──> PSU B (server)
```

Each server has 2 PSUs, each powered from a different branch (A/B). If one branch fails, the server continues without interruption.

### UPS types

| IEC 62040-3 class | Topology | Description | Switching | Use case |
|-------------------|----------|-------------|-----------|----------|
| **VFD** (Voltage & Frequency Dependent) | Passive standby | UPS in bypass, switches to inverter on failure | 4-10 ms | SOHO, edge |
| **VI** (Voltage Independent) | Line-interactive | Voltage regulation via autotransformer | 2-4 ms | Smaller racks, office |
| **VFI** (Voltage & Frequency Independent) | Double-conversion | AC → DC → AC, full isolation, zero switching time | 0 ms | Enterprise DC, Tier III/IV |

For DC the standard is **VFI (double-conversion)**: an online UPS with zero switching time and full isolation from the grid.

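
The A/B feed rule described in this section implies a capacity check: if either branch fails, the surviving feed has to carry the entire rack. The sketch below illustrates that check. The function names are hypothetical, and the 80 % continuous-load derating is an assumed design margin, not a value taken from this document.

```python
def survives_feed_loss(rack_load_kw: float, feed_capacity_kw: float,
                       derating: float = 0.8) -> bool:
    """Return True if a single feed can carry the whole rack load.

    derating=0.8 is an assumed continuous-load limit, not a value from the text.
    """
    usable_kw = feed_capacity_kw * derating
    return rack_load_kw <= usable_kw


def normal_feed_share_kw(rack_load_kw: float) -> float:
    """Load per feed in normal operation, assuming both PSUs share evenly."""
    return rack_load_kw / 2


if __name__ == "__main__":
    # Two feeds of 22 kW each (roughly 32 A three-phase at 400 V), one per branch.
    for load_kw in (12.0, 20.0):
        print(load_kw, normal_feed_share_kw(load_kw),
              survives_feed_loss(load_kw, 22.0))
```

In normal operation each feed then runs at no more than about half of its usable capacity, which is the price of surviving a branch failure without load shedding.
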
### Battery technologies

| Type | Density (Wh/L) | Lifespan (cycles) | Lifespan (years) | Temperature | Cost/kWh | Note |
|------|---------------|-------------------|------------------|-------------|----------|------|
| **VRLA** (AGM/Gel) | 50-80 | 200-500 | 3-5 | 20-25 °C | ~$150-200 | Cheap, large, heavy, temperature sensitive |
| **Li-ion (LFP)** | 200-350 | 3000-5000 | 10-15 | 0-40 °C | ~$300-500 | Small, light, long life, BMS required |
| **Li-ion (NMC)** | 250-400 | 1000-2000 | 8-12 | 0-40 °C | ~$250-400 | Higher density, thermal runaway risk |
| **NiCd** | 80-150 | 1000-2000 | 10-15 | −20 to 50 °C | ~$400-600 | Extreme temperatures, memory effect |
| **Flow battery** (V/Zn/Br) | 20-40 | 10,000+ | 20+ | 10-35 °C | ~$500-800 | Very high cycle count, large, long-term backup |

Li-ion (LFP) is becoming the standard for new DCs due to longer life, smaller footprint, and better behavior at high temperatures.

### Generator sizing

| Variant | Size | Fuel | Start time | Run time | Use case |
|---------|------|------|------------|----------|----------|
| **Diesel** | 500-2500 kVA | Diesel | 10-30 s | 24-72 h (depending on tank) | Standard for enterprise DC |
| **Nat. gas** | 200-1500 kVA | Natural gas | 10-30 s | Unlimited (pipeline) | Less common, lower emissions |
| **CHP** (cogeneration) | 500-2000 kVA | Natural gas | 5-15 min | Unlimited | Combined power + cooling (absorption chiller) |

Sizing: the generator should cover 100 % of the IT load plus 100 % of the cooling load (incl. chillers), typically 1.3-1.8× the IT load. The diesel tank should hold fuel for at least 24 h of operation, commonly 48-72 h. Fuel consumption is roughly 0.3-0.4 L per kWh generated.

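
The sizing rule above can be turned into a quick estimate. This is a rough sketch under stated assumptions: the function names are hypothetical, the default factor of 1.5 is simply a midpoint of the 1.3-1.8× range, and 0.35 L/kWh is a midpoint of the 0.3-0.4 L/kWh range.

```python
def generator_load_kw(it_load_kw: float, facility_factor: float = 1.5) -> float:
    """Generator load as a multiple of IT load (1.3-1.8x per the sizing rule)."""
    if not 1.3 <= facility_factor <= 1.8:
        raise ValueError("facility_factor outside the 1.3-1.8 range used here")
    return it_load_kw * facility_factor


def tank_volume_litres(generator_kw: float, runtime_h: float,
                       litres_per_kwh: float = 0.35) -> float:
    """Fuel needed to run at generator_kw for runtime_h hours."""
    return generator_kw * runtime_h * litres_per_kwh


if __name__ == "__main__":
    gen_kw = generator_load_kw(1000.0)          # 1 MW IT load
    litres = tank_volume_litres(gen_kw, 48.0)   # 48 h autonomy
    print(f"generator: {gen_kw:.0f} kW, tank: {litres:.0f} L")
```

For a 1 MW IT load this gives a 1500 kW generator load and about 25,000 L of diesel for 48 h, which shows why tank size is usually the limiting factor for long run times.
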
### ATS vs STS

| Feature | ATS (Automatic Transfer Switch) | STS (Static Transfer Switch) |
|---------|--------------------------------|-----------------------------|
| **Switching** | 4-10 ms (mechanical relay) | < 4 ms (thyristor) |
| **Lifespan** | ~10,000 switches | Unlimited (solid-state) |
| **Cost** | Low | High (~3-5× ATS) |
| **Use case** | Generator → UPS feed | Between two UPS outputs |

### PDU types

| Type | Description | Use case |
|------|-------------|----------|
| **Basic** | Passive splitter (no monitoring) | Edge, office |
| **Metered** | Current measurement at PDU level | Standard DC |
| **Monitored** | Measurement per outlet, SNMP, web GUI | Enterprise DC |
| **Switched** | On/off per outlet, remote reboot | Enterprise DC, colo |
| **High-density** | 3-phase, 60-100 A, C19 outlets | GPU/HPC/AI racks |

### Power calculation

```
P_IT       = Σ(P_server + P_storage + P_network)
P_server   = P_idle + (P_max - P_idle) × Utilization
P_overhead = P_cooling + P_losses = P_IT × (PUE - 1)
Total      = P_IT + P_overhead = P_IT × PUE

Example:
  100 servers × 500 W (avg) = 50 kW IT load
  PUE = 1.5 → total 75 kW (25 kW cooling and losses)
  UPS + generator → sized for 75 kW × 1.2 (safety factor) = 90 kW
```

### PUE (Power Usage Effectiveness)

```
PUE = Total Facility Energy / IT Equipment Energy
```

| PUE | Efficiency | Type |
|-----|-----------|------|
| 1.0-1.1 | Excellent | Hyperscale (Google, Meta) |
| 1.1-1.3 | Very good | Modern DC |
| 1.3-1.6 | Good / average | Enterprise DC |
| 1.6-2.0 | Below average | Older DC |
| >2.0 | Poor | Legacy |

PUE is measured at the whole-DC level, not per rack. It includes UPS losses, cooling, lighting, and distribution losses. It excludes well-to-tank fuel production and embodied carbon. Target for a modern DC: PUE < 1.2.

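
The worked example in the power calculation can be reproduced in a few lines. This is a minimal sketch, not a sizing tool: the function names are hypothetical, and it relies only on the PUE definition (total facility power = IT power × PUE, so cooling and losses together equal IT power × (PUE − 1)).

```python
def server_power_w(p_idle_w: float, p_max_w: float, utilization: float) -> float:
    """Linear server power model: idle power plus utilization-scaled headroom."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return p_idle_w + (p_max_w - p_idle_w) * utilization


def facility_power_kw(it_load_kw: float, pue: float) -> float:
    """Total facility power implied by the PUE definition."""
    return it_load_kw * pue


def overhead_kw(it_load_kw: float, pue: float) -> float:
    """Cooling plus losses: everything that is not IT load."""
    return it_load_kw * (pue - 1.0)


if __name__ == "__main__":
    # Assumed server: 200 W idle, 800 W max, 50 % utilization -> 500 W average.
    per_server_w = server_power_w(200.0, 800.0, 0.5)
    it_kw = 100 * per_server_w / 1000.0
    total_kw = facility_power_kw(it_kw, 1.5)
    sized_kw = total_kw * 1.2  # safety factor from the example
    print(it_kw, overhead_kw(it_kw, 1.5), total_kw, sized_kw)
```

The 200 W and 800 W figures are illustrative inputs chosen so that the average lands on the 500 W used in the example.
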
### WUE and CUE

| Metric | Description | Formula | Target |
|--------|-------------|---------|--------|
| **WUE** (Water Usage Effectiveness) | Water consumption per IT energy | WUE = Annual Water Usage / IT Energy (L/kWh) | < 0.5 L/kWh |
| **CUE** (Carbon Usage Effectiveness) | CO₂ emissions per IT energy | CUE = Total CO₂ / IT Energy (kg CO₂/kWh) | < 0.2 kg CO₂/kWh |

WUE is critical in dry regions (southwest US, Australia, Middle East). Adiabatic cooling consumes significantly more water than closed-loop cooling.

### 3-phase vs Single-phase

| Feature | Single-phase (230 V) | 3-phase (400 V) |
|---------|---------------------|-----------------|
| **Voltage** | 230 V (L-N) | 230/400 V (L-N/L-L) |
| **Power per feed** | ~7.4 kW (32 A) | ~22 kW (32 A, 3-ph) |
| **Efficiency** | Lower (more losses) | Higher (lower current) |
| **Use case** | Smaller racks, office | Standard in DC, high-density |
| **PDU** | Single-phase (C13/C19) | 3-phase (C13/C19, 3-ph monitoring) |
| **Balancing** | Not applicable (single phase) | Phase balancing required (L1/L2/L3) |

### Rack power density

| Cat. | Type | kW/rack | Power | Cooling |
|------|------|---------|-------|---------|
| Low | Office, storage | 1-3 kW | 1-ph, 16 A | Air (free cooling) |
| Medium | Standard compute | 5-10 kW | 3-ph, 32 A | Air (CRAC/CRAH) |
| High | GPU, HPC | 15-30 kW | 3-ph, 60 A | Air + liquid assist |
| Ultra | AI/ML clusters | 40-100+ kW | 3-ph, 100+ A | Direct-to-chip / immersion |

### Rack PDU connectors

| Connector | Max current | Device type |
|-----------|-------------|-------------|
| **C13** | 10 A (250 V) | Servers, switches, 1U |
| **C19** | 16 A (250 V) | Higher power servers, UPS |
| **IEC 60309** (3-ph) | 16-125 A | Rack PDU inputs |
| **NEMA L6-30** | 30 A (250 V) | US spec |

## Cooling

### Cooling: technology overview

| Technology | Type | Output (kW/rack) | Typical PUE | CAPEX | Use case |
|-----------|------|-----------------|-------------|-------|----------|
| **Free air cooling** | Air | < 5 | 1.05-1.15 | Low | Climatically suitable locations |
| **CRAC (DX)** | Air | 5-10 | 1.4-1.8 | Medium | Smaller DC, retrofit |
| **CRAH (CW)** | Air | 5-15 | 1.2-1.5 | High | Enterprise DC |
| **In-row cooling** | Air | 10-25 | 1.2-1.4 | High | High-density racks |
| **Rear-door HX** | Hybrid | 15-30 | 1.1-1.3 | Medium | Retrofits, GPU |
| **Direct-to-chip** | Liquid | 40-100+ | 1.05-1.15 | High | AI/ML, HPC |
| **Immersion (single-phase)** | Liquid | 50-100+ | 1.03-1.10 | High | Bitcoin mining, hyperscale |
| **Immersion (two-phase)** | Liquid | 100-200+ | 1.03-1.08 | Very high | Extreme density |

### Chilled water vs Direct Expansion (DX)

| Feature | Chilled water (CW) | Direct Expansion (DX) |
|---------|-------------------|----------------------|
| **Medium** | Water + glycol | Refrigerant (R134a, R410A, R454B) |
| **Unit type** | CRAH (chilled-water coil) | CRAC (refrigerant compressor) |
| **Efficiency** | Higher (COP 5-7) | Lower (COP 2-4) |
| **Supply temperature** | 7-12 °C (standard), 18-22 °C (high-temp) | −5 to 10 °C (evaporator) |
| **Complexity** | Higher (chillers, pumps, pipes, cooling tower) | Simpler |
| **Maintenance** | Higher (water treatment, legionella prevention) | Lower |
| **Use case** | Large DC > 500 kW, enterprise | Smaller DC, edge, retrofit |

### Containment types

| Type | Description | Efficiency | Implementation |
|------|-------------|------------|----------------|
| **Cold aisle containment (CAC)** | Enclosed cold aisle, warm air returns to room | High | Doors at aisle ends, ceiling panels |
| **Hot aisle containment (HAC)** | Enclosed hot aisle, warm air goes directly to return | Higher | Doors + ceiling panels, return to CRAH |
| **Chimney / rear duct** | Each rack has its own exhaust chimney to ceiling | Highest | Individual ducts per rack, expensive |
| **Open aisle** | No containment, cold and warm air mix | Low | Legacy, cheap |

Recommendation: use CAC or HAC at densities above 5 kW/rack. HAC is 5-10 % more efficient than CAC because warm air is extracted directly and does not mix with room air.

### CFD modeling

Computational Fluid Dynamics (CFD) simulates airflow in the DC before physical implementation:

- Identification of hot spots (warm air recirculation into the cold aisle)
- Optimization of perforated tile positions
- Reduction of bypass airflow (cable openings, unblanked rack positions)
- Simulation of CRAH unit failure (what-if scenarios)
- Tools: Future Facilities (6Sigma DC), Ansys Fluent, OpenFOAM

### Free cooling

- **Air-side**: intake of outside air at suitable temperature (filtration, humidification)
- **Water-side**: cooling water chilled outdoors by dry coolers or cooling towers (strainer cycle), bypassing the chiller compressor
- **Climate zone**: free cooling is usable ~2000-8000 hours/year depending on location
  - Scandinavia: 7000-8000 h/year
  - Central Europe: 4000-6000 h/year
  - Southern Europe: 2000-4000 h/year
- **Hybrid**: combination of free cooling + mechanical cooling (most common)
- **Economizer types**: Class A1 (dry cooler), Class A2 (evaporative), Class B (air-side)

### Liquid cooling detail

| Type | Inlet temperature | Capacity (kW/rack) | Medium | Installation |
|------|-----------------|-------------------|--------|-------------|
| **Cold plate (D2C)** | 20-45 °C | 40-100+ | Water, propylene glycol | CDU per rack or per row |
| **Rear-door HX** | 18-27 °C | 15-30 | Water | Passive, no server modification |
| **Immersion (1-ph)** | 35-50 °C | 50-100+ | Dielectric oil | Tank, CDU, heat exchanger |
| **Immersion (2-ph)** | 25-35 °C | 100-200+ | Dielectric (boiling) | Tank + condenser |

**CDU (Coolant Distribution Unit)**:

- Controls the temperature and pressure of coolant delivered to racks
- Primary loop (facility water) + secondary loop (rack coolant)
- Sizing: 1 CDU per 4-8 racks (40-100 kW per CDU)
- Redundancy: N+1 CDU, dual coolant loops

**Water quality requirements**:

- Conductivity: < 1 µS/cm (demineralized water)
- pH: 6.5-8.0
- Particulates: < 50 µm (filtration)
- Corrosion prevention: inhibitors, glycol (10-30 %)
- Biological growth prevention: UV, biocides

### Adiabatic cooling

Using water evaporation to cool air:

- **Direct adiabatic**: air passes through a wetted media pad, cools and humidifies
- **Indirect adiabatic**: air is cooled via a heat exchanger without direct contact with water
- **Water consumption**: 3-5 L/kWh (direct), 1-2 L/kWh (indirect)
- Efficiency depends on air humidity; it is more effective in dry climates

## Cabling and structured cabling

### TIA-942 cabling hierarchy

```
Entrance Room (ER)
  │
  ├── Backbone cabling (fiber single-mode / multi-mode)
  │     │
  │     ├── Main Distribution Area (MDA)
  │     │     │
  │     │     ├── Horizontal Distribution Area (HDA)
  │     │     │     │
  │     │     │     └── Equipment Distribution Area (EDA) → rack
  │     │     │
  │     │     └── Intermediate Distribution Area (IDA), optional
  │     │
  │     └── Telecommunication Room (TR), for office
  │
  └── Backbone cabling (fiber / copper)
```

### Copper cabling categories

| Category | Frequency | Speed | Length | Connector | Use case |
|----------|-----------|-------|--------|-----------|----------|
| **Cat5e** | 100 MHz | 1 GbE | 100 m | RJ45 | Legacy, voice |
| **Cat6** | 250 MHz | 1 GbE (10 GbE up to 55 m) | 100 m (10 GbE: 55 m) | RJ45 | Standard DC, enterprise |
| **Cat6A** | 500 MHz | 10 GbE | 100 m | RJ45 | Standard for new DC |
| **Cat7** (GG45) | 600 MHz | 10 GbE | 100 m | GG45/TERA | Niche, replaced by Cat6A/8 |
| **Cat8.1** | 2000 MHz | 25/40 GbE | 30 m | RJ45 | Top-of-rack, storage |
| **Cat8.2** | 2000 MHz | 25/40 GbE | 30 m | GG45/TERA | Top-of-rack, storage |

In DC, **Cat6A** (10 GbE up to 100 m) is standard for horizontal cabling. Cat8 is limited to short links such as server to top-of-rack switch (channel up to 30 m).

### Fiber optic types

| Type | Core | Modal BW | Speed | Max length | Use case |
|------|------|----------|-------|-----------|----------|
| **OS1** (SM) | 9 µm | n/a | 100 GbE - 800 GbE | 10-80 km | Backbone, campus, WAN |
| **OS2** (SM) | 9 µm | n/a | 100 GbE - 800 GbE | 2-80 km (CWDM/DWDM) | Backbone, DWDM |
| **OM1** (MM) | 62.5 µm | 200 MHz·km | 1 GbE | 275 m | Legacy |
| **OM2** (MM) | 50 µm | 500 MHz·km | 10 GbE | 82 m | Legacy |
| **OM3** (MM) | 50 µm | 2000 MHz·km | 10 GbE up to 300 m, 100 GbE up to 100 m | 300 m (10G) | Standard DC, VCSEL |
| **OM4** (MM) | 50 µm | 4700 MHz·km | 100 GbE up to 150 m, 400 GbE up to 100 m | 400 m (10G) | High-performance DC standard |
| **OM5** (MM) | 50 µm | 4700+ MHz·km | 200/400 GbE SWDM | 150 m (100G) | Emerging, SWDM |

For new DC: **OM4** as the standard for multi-mode, **OS2** for single-mode backbone (LR, DWDM). OM5 is not widely deployed; OM4 + parallel optics (SR4) is more common.

### Connector types

| Connector | Type | Insertion loss | Fiber count | Use case |
|-----------|------|---------------|-------------|----------|
| **LC** | Duplex | < 0.15 dB | 2 | Standard for SFP/SFP+/QSFP |
| **SC** | Duplex | < 0.2 dB | 2 | Older installations, patch panels |
| **MPO/MTP** (12-f) | Multi-fiber | < 0.35 dB | 12/24 | 40/100/400 GbE parallel |
| **MPO/MTP** (24-f) | Multi-fiber | < 0.5 dB | 24 | 400 GbE (SR4.2, DR4) |
| **SN** | Duplex (mini) | < 0.15 dB | 2 | High-density (QSFP-DD, OSFP) |
| **CS** | Duplex (mini) | < 0.15 dB | 2 | High-density (QSFP-DD, OSFP) |

### MPO/MTP polarity

| Method | Description | Use case |
|--------|-------------|----------|
| **Type A** (Straight) | Fiber 1→1, 2→2, ... | Duplex applications with cross-over at both ends |
| **Type B** (Crossed) | Fiber 1→12, 2→11, ... | Parallel optics (SR4, SR8), standard |
| **Type C** (Pairs crossed) | Pairs 1-2→2-1, 3-4→4-3 | 40 GbE SR4 (4×10G) |

### Breakout cassettes

```
MPO (12-f) ──> Breakout cassette ──> 6× LC duplex  (12 fibers = 6× duplex)
MPO (24-f) ──> Breakout cassette ──> 12× LC duplex (24 fibers = 12× duplex)
```

Use case: connecting MPO ports (switch) with LC ports (servers, storage). Cassettes are in the patch panel, not in the active path.

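
The three MPO/MTP polarity methods in this section are simple position mappings, which makes them easy to check in code before ordering trunks. A minimal sketch, assuming 1-based fiber positions and hypothetical function names:

```python
def mpo_far_end_position(position: int, method: str, fibers: int = 12) -> int:
    """Map a near-end fiber position to the far-end position of an MPO trunk.

    method "A": straight (1->1, 2->2, ...)
    method "B": fully crossed (1->12, 2->11, ...)
    method "C": adjacent pairs flipped (1->2, 2->1, 3->4, ...)
    """
    if not 1 <= position <= fibers:
        raise ValueError("position out of range")
    if method == "A":
        return position
    if method == "B":
        return fibers + 1 - position
    if method == "C":
        # Odd positions move up by one, even positions move down by one.
        return position + 1 if position % 2 == 1 else position - 1
    raise ValueError(f"unknown polarity method: {method}")


def duplex_channels(fibers: int) -> int:
    """Number of LC duplex ports a breakout cassette exposes."""
    return fibers // 2


if __name__ == "__main__":
    for method in ("A", "B", "C"):
        print(method, [mpo_far_end_position(p, method) for p in range(1, 13)])
    print(duplex_channels(12), duplex_channels(24))
```

Printing the full mapping for each method is a quick way to confirm that transmit fibers on one end land on receive positions on the other.
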
### Copper vs fiber decision

| Criterion | Copper (Cat6A/8) | Fiber (OM4/OS2) |
|-----------|-----------------|-----------------|
| **Reach** | 30-100 m | 100 m - 80 km |
| **Speed** | 1-40 GbE | 1-800 GbE |
| **Transceiver cost** | Lower (RJ45) | Higher (SFP+/QSFP) |
| **Cable cost** | Lower | Higher (patch cord) |
| **Port power** | 2-5 W (25 GbE) | 1-3 W (25 GbE SR) |
| **EMI immunity** | Susceptible | Immune |
| **Weight (100 m)** | ~3-4 kg | ~0.5-1 kg |
| **Recommendation** | Up to 30 m, server→ToR switch | Backbone, storage, >30 m |

### Cabling best practices

- **Horizontal cabling**: max 90 m permanent link + 10 m patch cords (TIA-942)
- **Fiber management**: slack spools, cable managers, minimum bend radius 10× cable diameter
- **Color coding**: OS1/OS2 (yellow), OM3 (aqua), OM4 (magenta/purple), OM5 (lime green)
- **Labeling**: both ends, patch panels, faceplates, following ANSI/TIA-606-B
- **Overhead vs underfloor**: overhead (ladder rack) is preferred in DC (better airflow, easier changes)
- **MPO cassettes**: plan 15-20 % fiber reserve for future needs

## Physical security

### Multi-layer security model (defense in depth)

```
Layer 1: Perimeter (fence, gate, guards)
Layer 2: Building (walls, locks, CCTV, card readers)
Layer 3: DC hall (biometrics, mantrap, CCTV, motion detection)
Layer 4: Rack / Cage (electronic locks, sensors)
Layer 5: Data (encryption, HSM, access control)
```

### Access control

| Method | Factor | Level | Note |
|--------|--------|-------|------|
| **RFID / proximity card** | Something you have | Standard | Basic access, cheap |
| **Smart card (PKI)** | Something you have + PIN | Medium | Certificate on card, anti-passback |
| **Biometric (fingerprint)** | Something you are | High | Fast; touchless readers avoid hygiene concerns |
| **Biometric (palm/finger vein)** | Something you are | Very high | Hard to forge, contactless |
| **Biometric (iris/retina)** | Something you are | Highest | Very accurate, slow, expensive |
| **Multi-factor** | 2+ factors | Highest | Card + biometrics + PIN, typical for Tier IV DC |

### Mantrap design

```
Outer door ──> Mantrap (vestibule) ──> Inner door
                 │
                 ├── Weight sensor (anti-tailgating)
                 ├── CCTV (both doors)
                 ├── Intercom (emergency exit)
                 └── Motion detector (in mantrap)
```

- Only one door opens at a time
- Anti-tailgating: weight sensor detects multiple persons
- Exit via breakout button + motion detection
- Emergency exit: panic bar + alarm

### CCTV

| Element | Recommendation |
|---------|----------------|
| **Resolution** | Min. 1080p, ideally 4K (8 MP) |
| **FPS** | 15-30 FPS (recording), 30+ FPS (realtime monitoring) |
| **Retention** | Min. 30 days (90 days for audit) |
| **Storage** | NVR (on-prem), cloud (AWS KVS, Azure Video Indexer) |
| **AI analytics** | Face detection, ANPR (license plate), object detection |
| **Field of view** | Every door, every aisle, no blind spots |

### Asset tracking

| Technology | Accuracy | Cost | Use case |
|-----------|----------|------|----------|
| **Barcode** | Rack-level | Very low | Manual inventory |
| **RFID (passive)** | Rack-level (door sweep) | Low | Automatic rack open detection |
| **RFID (active, UWB)** | 10-30 cm | Medium | Real-time tracking |
| **Bluetooth BLE** | 1-3 m | Low | Approximate position |
| **GPS** | 1-10 m | Medium | Outdoor tracking |

## DC layout and design

### Raised floor vs Slab

| Feature | Raised floor | Slab (solid floor) |
|---------|-------------|-------------------|
| **Airflow** | Underfloor air distribution (raised floor as plenum) | Overhead air, in-row cooling |
| **Flexibility** | Easy addition of perforated tiles | Limited (overhead cooling required) |
| **Weight** | Limit 500-1000 kg/m² (depends on height) | Limited only by the slab structure |
| **Cost** | Higher (~$200-400/m²) | Lower (~$100-200/m²) |
| **Height** | 600-900 mm (standard), 900-1200 mm (high-density) | n/a |
| **Trend** | Declining (shift to in-row/overhead cooling) | Growing (new DC, high-density) |

Modern high-density DCs (AI/ML, GPU) are moving away from raised floors to slab + overhead/in-row cooling, for two reasons: rack weights are higher (1000-2000 kg), and an underfloor plenum cannot deliver enough airflow.

### Rack layout and dimensions

| Parameter | Standard | High-density | Note |
|-----------|----------|-------------|------|
| **Rack width** | 600 mm (19") | 600-750 mm | 750 mm for GPU (cabling, cooling) |
| **Rack depth** | 1000-1200 mm | 1200-1500 mm | GPU servers, longer cables |
| **Rack height** | 42U | 48U / 52U | Taller rack = more equipment per footprint |
| **Aisle width (cold)** | 1200-1500 mm | 1500-1800 mm | Service access, airflow |
| **Aisle width (hot)** | 900-1200 mm | 1200-1500 mm | Narrower than cold |
| **Max rack load** | 500-800 kg | 1000-2000 kg | Floor reinforcement required |

### Space planning

```
For Tier III DC (example):

IT space: 1000 m²
  └── 20 rows × 10 racks = 200 racks at 42U
  └── 200 racks × 5 kW avg = 1 MW IT load
  └── PUE 1.4 → 1.4 MW facility

Support spaces:
  └── UPS + batteries: 200 m²
  └── Generators: 100 m² (outdoor)
  └── Cooling (chillers, cooling tower): 300 m²
  └── Offices, storage, loading dock: 400 m²

Total: ~2000 m² (50% IT, 50% support)
```

### Zone approach (TIA-942)

| Zone | Description | Access | Security |
|------|-------------|--------|----------|
| **Z1** (Public) | Reception, offices | Free | Minimal |
| **Z2** (Office) | Administration, NOC | Employees + guests | RFID |
| **Z3** (DC support) | UPS, generators, cooling | DC operators | RFID + biometrics |
| **Z4** (DC hall) | Servers, storage, networking | DC operators + approved | RFID + biometrics + mantrap |
| **Z5** (Rack/cage) | Specific rack or cage | Only authorized personnel | Electronic lock |

## Fire suppression

### Detection

| System | Type | Detection time | False alarms | Use case |
|--------|------|----------------|--------------|----------|
| **VESDA** (Very Early Smoke Detection) | Aspiration, laser sensor | < 30 s (4 alarm levels) | Very low | Standard for DC |
| **Spot detection** | Ionization / optical smoke detector | 2-5 min | Medium | Legacy, smaller DC |
| **Heat detection** | Thermal detector (temperature / rate of rise) | 5-10 min | Very low | Backup for VESDA |
| **Line-type (LHD)** | Linear heat detection cable | 2-5 min | Low | Cable trays, above ceiling |

VESDA is the standard: active aspiration draws air from the DC hall, and a laser sensor detects smoke particles at 4 levels (Alert → Action → Fire 1 → Fire 2). This enables intervention before smoke is visible.

### Suppression systems

| System | Medium | Advantages | Disadvantages | Typical DC |
|--------|--------|------------|---------------|-----------|
| **Novec 1230** (FK-5-1-12) | Gas | Safe for people, zero ODP, short atmospheric lifetime (5 days) | Higher cost | Enterprise DC |
| **FM-200** (HFC-227ea) | Gas | Fast (10 s), effective | High GWP (3220) despite zero ODP | Legacy DC |
| **Inergen** (IG-541) | Inert gas (52% N₂, 40% Ar, 8% CO₂) | Safe for people, naturally occurring gases | Large volume, high pressure | Enterprise DC |
| **Argonite** (IG-55) | 50% Ar, 50% N₂ | Safe, natural | Large volume, higher pressure | Enterprise DC |
| **Water mist** | Water (fine mist) | Cooling, smoke suppression, low cost | Water in DC (risk), local application only | Retrofits |
| **Pre-action sprinkler** | Water | Dual activation (detection + sprinkler) | Water risk, drainage required | Tier I-II |

**Concentration**: Novec (4-6 % by volume), FM-200 (7-9 %), Inergen (35-50 %). At these design concentrations Novec and Inergen remain breathable, allowing at least 5-7 min for evacuation.

### Detection zones ``` DC hall ──> zones of ~200 mΒ² (max) β”‚ β”œβ”€β”€ VESDA (each zone its own aspirator) β”œβ”€β”€ Smoke detectors (ceiling + floor) └── Heat detection (backup) ``` ## DCIM (Data Center Infrastructure Management) ### What DCIM covers | Area | Metrics | Output | |------|---------|--------| | **Power** | Per PDU, per outlet, per rack, total | Capacity planning, PUE, kW/rack | | **Cooling** | Temperature, humidity, airflow (sensors per rack) | Hot spot maps, airflow optimization | | **Asset** | What is in which rack, U position, serial, warranty | Asset inventory, lease management | | **Network** | Port utilization, patch panel connections | Patch management, port tracking | | **Space** | Free U in rack, free racks | Capacity planning, "what-if" simulations | ### Tools | Tool | Type | Platform | Cost | Note | |------|------|----------|------|------| | **Nlyte (Carrier)** | Enterprise DCIM | On-prem / Cloud | $$$ | Market leader, complex | | **Sunbird DCIM** | Enterprise DCIM | Cloud | $$$ | Power monitoring, asset tracking | | **Device42** | DCIM + IPAM | On-prem / Cloud | $$ | Integrated IPAM, CMDB | | **NetBox** | Open source DCIM | On-prem | Free | IPAM, DCIM, asset tracking | | **OpenDCIM** | Open source | On-prem | Free | Basic DCIM, asset management | | **RackTables** | Open source | On-prem | Free | Simple, asset + networking | | **Vendor-specific** | Dell OME, HPE OneView | On-prem | Part of HW | Vendor-specific only | ## Site selection ### Criteria for DC site selection | Category | Criterion | Weight | |----------|-----------|--------| | **Power** | Electricity availability (grid capacity), cost/kWh, possibility of two independent feeds | High | | **Connectivity** | Fiber backbone availability, number of connectivity providers, latency to major POP | High | | **Natural risks** | Earthquakes, floods, hurricanes, tornadoes, wildfires β€” historical data + predictions | High | | **Climate** | Average temperature, humidity (free cooling 
potential) | Medium | | **Workforce** | Availability of technicians, DC operators, network/admin engineers | Medium | | **Taxes and regulation** | Tax incentives, environmental regulations, building permits | Medium | | **Security** | Crime, political stability, terrorist risk | High | | **Transport accessibility** | Proximity to airport, highway (for HW deliveries, personnel) | Low | ### Natural risks β€” mapping | Risk | Areas | Mitigation | |------|-------|------------| | **Earthquakes** | Pacific Ring of Fire (CA, Japan, Chile) | Base isolation, seismic bracing, flexible connections | | **Hurricanes** | Caribbean, southeastern US, southeast Asia | Reinforced construction, generators above flood level | | **Floods** | River valleys, coastal areas | Location outside flood zone, barriers | | **Wildfires** | California, Australia, Mediterranean | Defensive zones, air filtration, monitoring | ### Power availability by region | Region | Grid reliability | Cost/kWh (industrial) | Note | |--------|-----------------|------------------------|------| | **Northern Europe** (SE, NO, FI) | High (99.99 %) | $0.04-0.08 | Cheap green energy, cool climate | | **Central Europe** (DE, NL, CZ) | High (99.99 %) | $0.10-0.20 | Stable, growing renewables | | **Eastern US** (VA, NC) | High | $0.05-0.08 | Largest DC hub (Ashburn, VA) | | **Western US** (CA, OR) | Medium (PG&E issues) | $0.10-0.15 | CALISO grid, blackout risk | | **Singapore** | High | $0.15-0.20 | Moratorium on new DC (2023), water | | **Dubai / UAE** | High | $0.06-0.10 | Cheap energy, high temperature (cooling) | ## Compliance and certification | Standard / Certification | Area | Description | |-------------------------|------|-------------| | **TIA-942** (Rated 1-4) | DC design | Classification of redundancy, cabling, security (analogous to Uptime Tier) | | **Uptime Institute** (Tier I-IV) | DC design | Operational certification, construction documentation | | **ISO 27001** | ISMS | Information security, risk 
management | | **ISO 27701** | Privacy | Extension of ISO 27001 for GDPR compliance | | **SOC 2** (Type I/II) | Service org | Controls: Security, Availability, Confidentiality, Integrity, Privacy | | **PCI DSS** | Payment cards | Physical security, access to cardholder data | | **HIPAA** | Healthcare | USA, health data protection | | **FedRAMP** | US government | Cloud service authorization, DC security | | **GDPR** | EU | Personal data protection, data residency | | **NIST SP 800-53** | DC security | Security control catalog for US federal | | **ISO 14001** | EMS | Environmental management, sustainability | ## Sustainability ### Carbon footprint of DC ``` Total emissions = Scope 1 (direct) + Scope 2 (energy) + Scope 3 (supply chain) Scope 1: Generators (diesel), refrigerant leaks Scope 2: Purchased electricity (grid mix) Scope 3: HW manufacturing, transport, EOL recycling (~60-80 % of total emissions) ``` ### Emission reduction | Measure | Impact on PUE | Emission reduction | Payback | |---------|--------------|-------------------|---------| | **Temperature increase** (22β†’27 Β°C) | βˆ’0.1-0.2 | 10-20 % cooling | Immediate | | **Free cooling** | βˆ’0.1-0.3 | 20-40 % cooling | 1-2 years | | **Liquid cooling** | βˆ’0.2-0.4 | 30-50 % cooling | 2-4 years | | **LED lighting + sensors** | βˆ’0.01-0.02 | < 1 % | 1 year | | **PPA (Power Purchase Agreement)** | β€” | 100 % Scope 2 | Variable | | **Renewable sources** (rooftop solar) | β€” | 5-15 % consumption | 5-10 years | | **Green generator** (HVO biodiesel) | β€” | 90 % COβ‚‚ reduction | +30 % fuel cost | ### Sustainability certifications | Certification | Description | |--------------|-------------| | **LEED** (BD+C: DC) | U.S. 
Green Building Council β€” design and construction | | **BREEAM** | UK, European sustainability assessment | | **Climate Neutral Data Centre Pact** (EU) | Self-regulatory, PUE < 1.4 by 2030 | | **ISO 50001** | Energy management system | | **Energy Star** | EPA, energy efficiency (US only) | ## Decision diagram β€” DC topology design ```mermaid flowchart TD Start(["DC design"]) --> TIER{"Required Tier?"} TIER -->|"Tier I / II"| T1["N / N+1 redundancy
Simple power, single path
CRAC/CRAH, free cooling
PUE 1.4-1.6, cost 1Γ—"] TIER -->|"Tier III"| T3["N+1, concurrently maintainable
Dual path (A/B feed)
Hot aisle containment
PUE 1.2-1.4, cost 2Γ—"] TIER -->|"Tier IV"| T4["2N+1, fault tolerant
Dual redundant + STS
Hot + cold containment
PUE 1.1-1.3, cost 3Γ—"] TIER --> POWER{"Power chain"} POWER -->|"UPS"| UPS{"UPS type"} UPS -->|"Enterprise DC"| UPS1["VFI double-conversion
Li-ion (LFP), 10-15 years
N+1 or 2N modular"] UPS -->|"Edge / office"| UPS2["VI line-interactive
VRLA, 3-5 years"] POWER -->|"Generator"| GEN["Diesel 500-2500 kVA
Tank for 24-72 h
ATS 4-10 ms switching"] POWER -->|"PDU"| PDU["3-phase 400 V
Monitored/Switched
A/B feed to racks"] Start --> DENS{"Power density"} DENS -->|"< 10 kW/rack"| COOL1["Air cooling
CRAC/CRAH, raised floor
Hot aisle containment
ASHRAE A1-A2"] DENS -->|"10-25 kW/rack"| COOL2["Hybrid
In-row cooling
Rear door HX
ASHRAE A1-H1"] DENS -->|"> 25 kW/rack"| COOL3["Liquid cooling
CDU, direct-to-chip
Immersion single/two-phase
ASHRAE W-classes"] Start --> CLIM{"Climate zone"} CLIM -->|"Moderate (CZ, DE)"| FC1["Free cooling 4000-6000 h/year
Chiller + economizer
PUE saving 0.2-0.3"] CLIM -->|"Warm (ES, US South)"| FC2["Chiller year-round
Adiabatic cooling
PUE 1.3-1.6"]
    CLIM -->|"Cold (SE, NO)"| FC3["Free cooling 7000+ h/year
Air-side economizer
PUE < 1.2"]
```

## Secondary data center topologies

When planning a second DC, the choice of topology depends on distance, RPO/RTO, and budget.

### Distance classification

| Category | Distance | Latency (round-trip) | Use case |
|-----------|-----------|---------------------|----------|
| **Metro (Campus)** | 1–20 km | < 1 ms | Synchronous replication, stretched cluster |
| **Metro** | 20–100 km | 1–5 ms | Metro cluster, mostly sync replication |
| **Regional** | 100–500 km | 5–20 ms | Asynchronous replication, warm standby |
| **Continent** | 500–3000 km | 20–100 ms | Asynchronous replication, cold standby |
| **Global** | 3000+ km | > 100 ms | Async only, no real-time dependencies |

### Topologies by operational mode

#### Active-Active (Hot-Hot)

```
  DC-A (Primary)                DC-B (Active)
┌────────────────────┐        ┌────────────────────┐
│  App Active        │        │  App Active        │
│  DB Active         │◄─sync─►│  DB Active         │
│  Users → LB → A    │        │  Users → LB → B    │
└────────────────────┘        └────────────────────┘
          │                              │
          └──── Global Load Balancer ────┘
```

| Parameter | Value |
|----------|---------|
| **RTO** | 0 to a few seconds (automatic failover, traffic is redirected) |
| **RPO** | 0 (sync replication, commit is confirmed only after write to both DCs) |
| **Max distance** | < 100 km (latency < 5 ms RTT for sync DB replication) |
| **Operating costs** | 2× (both DCs fully active, both fully equipped) |
| **Advantages** | Zero downtime, instant switchover, full utilization of both DCs |
| **Disadvantages** | Requires synchronous replication → distance limit, complex networking, split-brain risk |

**Split-brain solutions**: STONITH (Shoot The Other Node In The Head), watchdog, quorum (3rd node in 3rd location / cloud), fencing, SCSI-3 persistent reservation.

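
The quorum option listed above can be sketched as a strict-majority vote. This is a minimal illustration only, assuming three hypothetical voters (`dc-a`, `dc-b` and a `witness` in a third location); the function name `has_quorum` and the voter names are invented for this example and do not correspond to any particular cluster manager.

```python
def has_quorum(reachable, all_voters):
    """Return True if `reachable` contains a strict majority of `all_voters`."""
    # Ignore anything that is not a configured voter.
    visible = set(reachable) & set(all_voters)
    return 2 * len(visible) > len(all_voters)


# Hypothetical voters: one per DC plus a witness in a third location.
VOTERS = {"dc-a", "dc-b", "witness"}

# The inter-DC link fails; only DC-A can still reach the witness.
side_a = {"dc-a", "witness"}
side_b = {"dc-b"}

print(has_quorum(side_a, VOTERS))  # True: 2 of 3 votes, DC-A keeps serving
print(has_quorum(side_b, VOTERS))  # False: 1 of 3 votes, DC-B must stop writes

# Without a third voter, a clean split leaves neither side with a majority.
print(has_quorum({"dc-a"}, {"dc-a", "dc-b"}))  # False
```

With only two voters a symmetric split stops both sides, which is why the witness is placed in a third location rather than inside either DC.
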
**Use case**: Financial services, telco, payment gateways, where even a minute of downtime costs millions.

#### Active-Passive (Hot-Warm, MetroCluster)

```
  DC-A (Primary)                DC-B (Standby)
┌────────────────────┐        ┌────────────────────┐
│  App Active        │        │  App Standby       │
│  DB Primary        │──sync─►│  DB Standby        │
│  Users → LB → A    │        │  ~~~ (waiting) ~~~ │
│  DNS: A-record     │        │  DNS: health check │
└────────────────────┘        └────────────────────┘
```

| Parameter | Value |
|----------|---------|
| **RTO** | tens of seconds–minutes (DNS failover + App startup) |
| **RPO** | 0 (sync) or seconds (async) |
| **Max distance** | sync < 100 km, async unlimited |
| **Operating costs** | 1.5–1.8× (second DC has reduced or idle compute) |
| **MetroCluster** | Specific implementation: FC SAN over DWDM, sync mirror, automatic failover |

**MetroCluster** (NetApp, Dell EMC, HPE):
- Storage-based cluster with synchronous mirroring between DCs
- Automatic failover on entire DC failure
- Requires dedicated DWDM or dark fiber interconnection
- Typical distance: up to 50 km (for latency < 1 ms RTT)
- Use case: enterprise storage, primary+secondary DC in metropolitan area

#### Hot-Cold (Warm Standby → Cold)

```
  DC-A (Primary)                 DC-B (Cold Standby)
┌────────────────────┐         ┌────────────────────┐
│  App Active        │         │ ~~~ powered off ~~~│
│  DB Active         │──async─►│  Backup storage    │
│  Users → A         │         │  ~~~ no compute ~~~│
└────────────────────┘         └────────────────────┘
```

| Parameter | Value |
|----------|---------|
| **RTO** | hours–days (purchase/rent HW, restore from backup) |
| **RPO** | hours (last backup) |
| **Max distance** | unlimited |
| **Operating costs** | 1.1–1.3× (only storage and facility, compute only at failover) |
| **Typical use case** | Low-cost DR, compliance, last resort |

#### Pilot Light

```
  DC-A (Primary)                 DC-B (Pilot Light)
┌────────────────────┐         ┌────────────────────┐
│  App Active        │         │  ~~~ off ~~~       │
│  DB Active         │──async─►│  DB replica (mini) │
│  All services      │         │  Core services only│
│                    │         │  (DNS, LDAP, mon)  │
└────────────────────┘         └────────────────────┘
On DR: spin-up compute from IaC, rest from backup
```

- DC-B runs with minimum compute (only core services and DB replica)
- Application layer is spun up from IaC (Terraform, Ansible) only during DR
- Compromise between cost and RTO

### Comparison table

| Topology | RTO | RPO | Cost (× primary) | Max distance | Failover |
|-----------|-----|-----|-------------------|-------------|----------|
| **Active-Active** | 0–s | 0 | 2.0× | < 100 km | Auto (traffic) |
| **MetroCluster** | s–min | 0 | 1.8–2.0× | < 50 km | Auto (storage) |
| **Active-Passive (sync)** | min | 0 | 1.5–1.8× | < 100 km | Semi-auto |
| **Active-Passive (async)** | min–h | s–min | 1.3–1.5× | unlimited | Semi-auto |
| **Pilot Light** | h | min–h | 1.2–1.4× | unlimited | Manual |
| **Warm Standby** | min–h | s–min | 1.5–1.8× | unlimited | Semi-auto |
| **Cold Standby** | days | h | 1.1–1.3× | unlimited | Manual |

### Stretched Cluster

```
┌──── Site A (50 km) ────┐    ┌──── Site B ────────────┐
│  ┌──────────────────┐  │    │  ┌──────────────────┐  │
│  │  ESXi / Hyper-V  │  │    │  │  ESXi / Hyper-V  │  │
│  │  VM              │  │    │  │  VM (complement) │  │
│  └────────┬─────────┘  │    │  └────────┬─────────┘  │
│           │            │    │           │            │
│  ┌────────▼─────────┐  │    │  ┌────────▼─────────┐  │
│  │  Storage (SAN)   │──┼────┼──│  Storage (SAN)   │  │
│  │  MetroCluster    │  │    │  │  MetroCluster    │  │
│  └──────────────────┘  │    │  └──────────────────┘  │
└────────────────────────┘    └────────────────────────┘
                           │
                     ┌─────▼──────┐
                     │ vCenter /  │
                     │  Cluster   │
                     │  (single)  │
                     └────────────┘
```

- One cluster stretched across two sites (single management domain)
- VMs can live-migrate between sites (vMotion over distance)
- Storage synchronously mirrored (MetroCluster, VPLEX, vSAN Stretched Cluster)
- **Requirements**: dark fiber / DWDM, low latency (< 5 ms), high link reliability
- **Risks**: split-brain, brain drain (split-site cluster), network dependency
- **Use case**: enterprise with own dark fiber between two DCs in a metropolitan area

### Decision tree

```mermaid
flowchart TD
    Start(["Secondary DC"]) --> RPO{"Required RPO?"}
    RPO -->|"0 (no data loss)"| SYNC{"Sync replication possible?"}
    SYNC -->|"Yes, < 100 km"| ACT{"Want zero downtime?"}
    ACT -->|"Yes"| AA["Active-Active
RTO=0, RPO=0, 2× cost"]
    ACT -->|"No"| AP["Active-Passive
RTO=min, RPO=0, 1.5×"]
    SYNC -->|"No, > 100 km"| ASYNC["Active-Passive (async)
RTO=min, RPO=s, 1.3×"]
    RPO -->|"minutes–hours"| WARM{"Want fast failover?"}
    WARM -->|"Yes"| PILOT["Pilot Light
RTO=h, RPO=min, 1.2×"]
    WARM -->|"No"| COLD["Cold Standby
RTO=days, RPO=h, 1.1×"]
    Start --> DIST{"Distance between DCs"}
    DIST -->|"< 50 km, own fiber"| MC["MetroCluster / Stretched Cluster
Single management, sync storage"]
    DIST -->|"50–300 km"| REG["Regional DR
Active-Passive, async replication"]
    DIST -->|"> 300 km"| GLOBAL["Global DR
Cold standby, backup & restore"]
```

### Physical infrastructure for DC interconnection

| Technology | Bandwidth | Max distance | Latency | Use case |
|------------|-----------|-------------|---------|----------|
| **Dark fiber** | 100 GbE–800 GbE | 10–80 km (single-mode) | < 0.1 ms | MetroCluster, stretched cluster |
| **DWDM** | 400 GbE–1.6 TbE (per lambda) | 80–120 km (without amplifier) | < 0.5 ms | Metro, metro cluster |
| **CWDM** | 10–25 GbE (per channel) | 10–40 km | < 0.3 ms | Campus, smaller metro |
| **MPLS L2VPN** | 10–100 GbE | unlimited | 1–10 ms | Regional DR, async replication |
| **Internet IPsec** | 1–10 GbE | unlimited | 5–50 ms | Cold standby, backup |

### Impact of individual technologies on DC topology selection

Choosing a secondary DC topology is not purely an infrastructure decision: each layer (DB, hypervisor, orchestration, messaging) brings its own constraints.

#### Databases

| DB technology | Sync replication | Max distance | Auto-failover | Split-brain handling | Note |
|---------------|---------------|-------------|---------------|-------------------|----------|
| **PostgreSQL** | Synchronous commit (`synchronous_standby_names`) | < 100 km (latency < 10 ms) | Patroni / repmgr + etcd | Quorum (etcd, 3+ nodes) | Streaming replication, needs `wal_keep_size` (`wal_keep_segments` before PostgreSQL 13) |
| **MySQL** | Group Replication (multi-primary, single-primary) | < 100 km | MySQL InnoDB Cluster + MySQL Router | Paxos (Group Replication, 3+ nodes) | Semi-sync as compromise |
| **Oracle** | Data Guard (SYNC/FASTSYNC/ASYNC), RAC extended | sync < 100 km, async unlimited | Data Guard Broker / FSFO (Fast-Start Failover) | Observer (3rd node) | Far Sync for remote DCs |
| **MSSQL** | AlwaysOn Availability Groups (SYNCHRONOUS_COMMIT) | < 100 km | AlwaysOn + Cluster quorum | File share majority / cloud witness | Multi-site cluster support |
| **MongoDB** | Majority write concern + journaling | < 100 km | Replica set auto-election | Arbiter (voting member) | Priority-based failover |
| **Cassandra** | N/A (multi-master, eventual consistency) | unlimited | Yes (peer-to-peer) | None (multi-master, gossip protocol) | Snitch-aware topology, NetworkTopologyStrategy |
| **Redis** | Redis Sentinel / Redis Cluster (async) | unlimited (async) | Sentinel / Cluster failover | Quorum (Sentinel, majority) | PSYNC replication, replication lag |

Key limitation for **sync replication**: latency < 5 ms RTT (commit must wait for confirmation from both DCs). Light in fiber needs roughly 5 µs per km one way, so at 100 km the RTT is about 1 ms, which is acceptable. At 1000 km (about 10 ms RTT) sync replication reduces transaction throughput by 80+ %.

Suitable for **Active-Active**:
- **Cassandra / ScyllaDB**: native multi-DC, eventual consistency, no split-brain
- **MySQL Group Replication (multi-primary)**: 3+ DC for quorum
- **CockroachDB / TiDB**: native multi-region, ACID across DCs
- **Redis Enterprise**: Active-Active (CRDT-based)

Suitable for **Active-Passive**:
- **PostgreSQL + Patroni**: auto-failover, etcd quorum
- **Oracle Data Guard**: FSFO, Far Sync for remote DCs
- **MSSQL AlwaysOn**: cloud witness
- **MongoDB Replica Set**: arbiter in 3rd location

#### Hypervisors

| Hypervisor | Cluster technology | Stretched cluster | Max distance | Split-brain |
|-----------|-------------------|-------------------|-------------|-------------|
| **VMware vSphere** | vSAN Stretched Cluster, Metro vCenter, Site Recovery Manager | Yes (vSAN Stretched Cluster, Metro Cluster) | < 50 km (vSAN Stretched Cluster), < 5 ms RTT | Fencing (STONITH), witness host |
| **Hyper-V** | Storage Replica + Failover Cluster | Yes (Cluster Sets) | < 50 km (sync), unlimited (async) | File share witness / cloud witness |
| **Proxmox VE** | Proxmox HA + Ceph | Limited (Ceph stretch cluster) | < 50 km (Ceph sync) | Ceph monitor quorum (3+ DC) |
| **XCP-ng / XenServer** | Xen Orchestra HA + SR (Storage Repository) replication | Limited | depends on storage replication | - |
| **Nutanix AHV** | Metro Availability (sync), Async DR | Yes (Metro) | < 100 km (sync), unlimited (async) | Witness VM (cloud / 3rd site) |
| **KVM / oVirt** | oVirt HA + GlusterFS / NFS | Limited | depends on storage replication | - |

**vSAN Stretched Cluster specific requirements:**
- Dedicated vSAN network (25 GbE min., < 5 ms RTT)
- Witness host in 3rd location (or cloud witness)
- All VM policies (FTT=1, mirroring striped)
- Storage policy: `site-A + site-B + witness`

#### Kubernetes and container platforms

| Platform | Multi-cluster DR | Replication | Max distance | Failover |
|-----------|-----------------|-----------|-------------|----------|
| **Vanilla K8s** | KubeFed, Cluster API, Velero + Restic | Velero (backup/restore), Rook (Ceph) | unlimited | Manual (Velero restore) |
| **OpenShift** | ACM (Advanced Cluster Management), Velero | OADP (OpenShift API for Data Protection) | unlimited | ACM failover (subscription) |
| **Rancher** | Rancher Multi-Cluster App, Velero | Longhorn (sync/async DR), Velero | unlimited | Semi-auto |
| **Google GKE** | Multi-cluster Services, Backup for GKE | Config Sync, Backup for GKE | unlimited | Manual |
| **Azure AKS** | Azure Arc + Velero + Azure Traffic Manager | AKS backup (Velero), Azure Site Recovery | unlimited | Manual (Velero) |
| **AWS EKS** | EKS multi-cluster, Velero + S3 cross-region | Velero (S3), Rook (EBS snapshots) | unlimited | Manual |

**Key K8s DR principles:**
- **Applications must be stateless** (or state externalized to DB/storage)
- **Velero**: backup/restore of the entire cluster (PV, resources, Helm releases)
- **Rook/Ceph**: cross-region mirroring of RBD volumes
- **KubeFed / ACM**: subscription-based deploy to multiple clusters
- **Ingress/Gateway API**: traffic routing between clusters
- **External DNS**: DNS failover on cluster outage

#### Messaging / streaming

| Platform | Replication | Topology | DR support | Max distance |
|-----------|-----------|-----------|------------|-------------|
| **Apache Kafka** | MirrorMaker 2, Confluent Cluster Linking, KRaft quorum | Active-Passive (MM2), Active-Active (Cluster Linking) | MM2: async, Cluster Linking: async | unlimited |
| **RabbitMQ** | Classic Queue Mirroring, Quorum Queues | Active-Passive (Warm Standby) | Federation / Shovel (async) | unlimited |
| **Red Hat AMQ** | (Artemis) Cluster + HA | Active-Passive (shared store / replication) | Live-backup pair | < 100 km (sync) |
| **NATS** | NATS JetStream (cluster + cross-account) | Active-Active (Leaf nodes, cross-account) | Super-cluster, failover | unlimited |
| **Apache Pulsar** | BookKeeper (bookie rack-aware), geo-replication | Active-Active (geo-replication) | Built-in (cluster-level) | unlimited (async) |
| **AWS SQS/SNS** | Managed, AWS region pairs | Active-Active (multi-region) | Built-in (AWS managed) | unlimited |
| **Azure Service Bus** | Managed, paired region | Active-Passive (paired region) | Built-in (geo-recovery) | unlimited |
| **Oracle Service Bus (OSB)** | Oracle WebLogic Cluster + JDBC store + AQ | Active-Passive (WebLogic Cluster + Data Guard) | OSB/WLS cluster + Oracle RAC/Data Guard sync | < 100 km (Data Guard sync), unlimited (async) |

**Messaging DR recommendations:**
- **Kafka**: use Cluster Linking for Active-Active, or MirrorMaker 2 for Active-Passive; replicate only critical topics
- **RabbitMQ**: Quorum Queues + Federation upstream for DR; avoid Classic Queue Mirroring (deprecated)
- **Pulsar**: native geo-replication, bookie rack-aware for stretched cluster; easiest DR among messaging platforms
- **OSB**: WebLogic cluster + Oracle RAC/Data Guard; DR depends on DB layer, not on OSB itself

### Per-layer limitations summary table

| Layer | Limiting factor for secondary DC | Max distance for sync | Impact on topology selection |
|--------|-----------------------------------|----------------------|--------------------------|
| **Storage** | Sync mirror latency, DWDM cost | < 50 km (MetroCluster) | Stretched cluster only in metro |
| **Databases** | Commit wait for sync replication | < 100 km (5 ms RTT) | Active-Active only with multi-master DB |
| **Hypervisor** | Stretched cluster quorum + fencing | < 50 km (vSAN, 5 ms) | MetroCluster / stretched cluster |
| **Kubernetes** | Velero restore time, Rook mirror latency | unlimited (async) | Active-Passive, cold standby |
| **Messaging** | Replication lag, offset management | unlimited (async) | Active-Active (Kafka, Pulsar, NATS) or Active-Passive |
| **Network** | Dark fiber/DWDM cost, latency | < 100 km (metro fiber) | Limits sync replication options |
| **Application** | Stateful/stateless, connection draining | depends on architecture | Stateless app → any topology |

## Disk monitoring: S.M.A.R.T.

S.M.A.R.T. (Self-Monitoring, Analysis and Reporting Technology) provides predictive monitoring of HDD/SSD health.

| Key attribute | ID | Description |
|--------------|----|-------------|
| Reallocated Sectors Count | 5 | Number of remapped sectors (increase = end of disk life) |
| Power-On Hours | 9 | Total operating time in hours |
| Reported Uncorrectable Errors | 187 | Uncorrectable errors (red flag) |
| CRC Error Count | 199 | Errors on SATA link (cable/controller) |
| SSD Life Left | 231 | % remaining SSD life |
| Media Wearout Indicator | 233 | Total NAND writes (vendor-specific) |

Tools: `smartmontools` (smartctl, smartd), Prometheus exporter (`node_exporter`), OTel Collector.

## Sources

Links, books and standards: [sources/infrastructure/sources.en.md](sources/infrastructure/sources.en.md)

### Recommended literature

| Book | Authors | ISBN | Description |
|------|---------|------|-------------|
| The Data Center as a Computer (4th ed., 2025) | Barroso, Hölzle, Ranganathan | 978-3-031-99488-3 | Comprehensive account of the design evolution of the warehouse-scale computer (WSC) by Google architects. Covers hardware, software, power, cooling, networking and 25 years of WSC experience. Key publication for datacenter architecture. |
| Electronics Cooling: From the Chip to the Datacenter (Vol. 62) | Abraham et al. | 978-0-443-47084-4 | Practical guide to thermal management from transistor level to datacenter. Covers conduction, convection, liquid immersion and phase change cooling. Essential resource for DC cooling design. |

## Datacenter backbone services

When building a new DC, basic infrastructure services must be deployed first; without them, higher layers cannot operate:

### DNS

| Role | Service | Description |
|------|---------|-------------|
| **Authoritative** | BIND, PowerDNS, NSD | Primary DNS zone for internal domains |
| **Recursive** | Unbound, BIND (caching), CoreDNS | Resolver for internal + external queries |
| **Anycast** | DNS anycast (BGP) | Redundancy, lower latency |
| **Integration** | Infoblox, BlueCat, dnsmasq | IPAM + DNS + DHCP in one |

Best practices: separate auth and recursive resolvers, DNSSEC, split-horizon (internal vs external view), TSIG for zone transfers, monitoring (DNS query latency, NXDOMAIN rate).

### NTP (time synchronization)

- **Primary**: GPS-disciplined NTP servers (Microchip S600, Meinberg)
- **Secondary**: Stratum 1/2 NTP (ntpd, chrony, NTPsec)
- **All nodes**: chrony (modern replacement for ntpd), local NTP server on each rack switch (boundary clock)
- **Precision**: PTP (IEEE 1588) for telco/fintech, sub-microsecond accuracy
- **DC topology**: GPS antenna → Grandmaster (PTP) → Boundary clock (rack switch) → Ordinary clock (server)

### DHCP + IPAM

| Tool | Description |
|------|-------------|
| **ISC DHCP** | Legacy, still widely deployed |
| **Kea** | Modern replacement for ISC DHCP (ISC + Linux Foundation) |
| **Infoblox / BlueCat** | Enterprise IPAM + DHCP + DNS |
| **NetBox / phpIPAM** | Open-source IPAM |

### LDAP / Identity Management

| Tool | Description |
|------|-------------|
| **FreeIPA** | Integrated IDM (LDAP + Kerberos + DNS + CA) for Linux |
| **Active Directory** | Microsoft, LDAP + Kerberos + Group Policy |
| **389 Directory Server** | Open-source LDAP (Red Hat) |
| **OpenLDAP** | Classic open-source LDAP |
| **Keycloak / Authentik** | Modern OIDC/SAML/LDAP gateways |

### PKI and certificates

- **Enterprise CA**: EJBCA, Smallstep, HashiCorp Vault (PKI engine)
- **ACME**: cert-manager (Kubernetes), certbot (Let's Encrypt)
- **mTLS**: Vault PKI, SPIRE (SPIFFE), Cilium
- **Best practices**: root CA offline, intermediate CA per environment, short-lived certificates (max 90 days), revocation (CRL/OCSP)

### Monitoring and observability

See [MONITORING.en.md](MONITORING.en.md). Before running the first workloads, the DC must have:
- Metric collection (Prometheus, Zabbix)
- Centralized logs (Loki, ELK)
- Alerting (Alertmanager, PagerDuty)
- Uptime monitoring (heartbeat checks)

### Deployment logistics: step order

```
1. DNS (at least recursive + local resolver)
2. NTP (time synchronization)
3. DHCP + IPAM (first servers get IPs)
4. LDAP / IAM (users, groups, access rights)
5. PKI (certificates for encryption)
6. Configuration management (Ansible, Puppet)
7. Monitoring + logging (see what's happening)
8. Container registry / Package repo (docker registry, apt/yum mirror)
9. Load balancer (for services)
10. Storage backend (Ceph, NFS, SAN)
11. Orchestration (Kubernetes, OpenStack)
```

## OpenStack in the datacenter

OpenStack brings a software abstraction layer to the DC, enabling multi-tenancy and self-service:

### Control plane architecture

- **Controller nodes**: management services (Keystone, Nova API, Neutron API, Horizon, RabbitMQ, DB)
- **Compute nodes**: hypervisor (KVM), Nova Compute, Neutron agent
- **Storage nodes**: Ceph OSD, Cinder volumes, Swift object storage
- **Network nodes**: Neutron L3 router, DHCP agent, DVR

### Requirements for DC infrastructure

| Component | Requirement |
|-----------|-------------|
| **Controller** | 3-5 node HA cluster, 16+ vCPU, 32+ GB RAM, SSD |
| **Compute** | Dense performance per rack (GPU, high-core), NUMA-aware design |
| **Storage (Ceph)** | 10-25 GbE networking, NVMe/SSD OSD, 3+ replicas |
| **Network** | 25/100 GbE spine-leaf, L3 BGP underlay, VXLAN overlay |
| **Rack power** | 10-30 kW/rack for GPU compute |

### Use cases

- Private cloud for enterprise (multi-tenant, self-service Horizon)
- NFVI for telco (DPDK, SR-IOV, low-latency)
- Academic / HPC clusters (Ironic, Cyborg, Manila)
- Government / regulated environments (on-prem, audit trail)

*Last revision: 2026-06-03*