Stanislav Hubacek 3fa11ef0f6 comiiit
2026-06-11 15:27:28 +02:00

# 🏭 Datacenters
## Tier classification (TIA-942 / Uptime Institute)
| Tier | Availability | Downtime / year | Redundancy |
|------|-------------|-----------------|------------|
| **Tier I** | 99.671 % | 28.8 h | N — no redundancy |
| **Tier II** | 99.741 % | 22.7 h | N+1 — redundant components |
| **Tier III** | 99.982 % | 1.6 h | N+1 — concurrently maintainable |
| **Tier IV** | 99.995 % | 26.3 min | 2N+1 — fault tolerant |
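The downtime column is the unavailable fraction of an 8,760-hour year. A minimal Python sketch that reproduces it (the function name is illustrative, not taken from any standard):

```python
HOURS_PER_YEAR = 8760  # 365-day year


def annual_downtime_hours(availability_pct: float) -> float:
    """Expected downtime per year implied by an availability percentage."""
    return (1 - availability_pct / 100) * HOURS_PER_YEAR


tiers = {"I": 99.671, "II": 99.741, "III": 99.982, "IV": 99.995}
for tier, availability in tiers.items():
    hours = annual_downtime_hours(availability)
    print(f"Tier {tier}: {hours:.1f} h/year ({hours * 60:.1f} min)")
```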
## Key subsystems
| System | Description |
|--------|-------------|
| **Power** | UPS, generators (diesel), ATS, PDU, redundant feeds (A/B feed) |
| **Cooling** | CRAC/CRAH, chilled water, free cooling, containment (hot/cold aisle) |
| **Physical security** | CCTV, biometric access, mantrap, rack security locks |
| **Cabling** | Structured cabling (Cat6A/7/8, OM3/OM4 multimode and OS2 single-mode fiber), patch panels |
| **Fire suppression** | Alarm, clean-agent gases (Novec 1230, FM-200), inert gases (Inergen), VESDA (very early smoke detection) |
| **Monitoring** | DCIM (Data Center Infrastructure Management), SNMP, BMS (Building Management System) |
## Aisle containment
```
┌────────────────────────────────────┐
│ Rack Row │
│ ┌──┐ ┌──┐ ┌──┐ ┌──┐ ┌──┐ ┌──┐ │
Cold │ │ │ │ │ │ │ │ │ │ │ │ │ │ Cold
Aisle <──│ └──┘ └──┘ └──┘ └──┘ └──┘ └──┘ ──> Aisle
│ ┌──┐ ┌──┐ ┌──┐ ┌──┐ ┌──┐ ┌──┐ │
Hot │ │ │ │ │ │ │ │ │ │ │ │ │ │ Hot
Aisle ──>│ └──┘ └──┘ └──┘ └──┘ └──┘ └──┘ <── Aisle
└────────────────────────────────────┘
```
## Environmental classes (ASHRAE TC 9.9)
ASHRAE Technical Committee 9.9 defines temperature and humidity envelopes for IT equipment in DC.
| Class | Temperature (recommended) | Temperature (allowable) | Usage |
|-------|--------------------------|-------------------------|-------|
| **A1** | 18-27 °C | 15-32 °C | Enterprise DC, strict control |
| **A2** | 18-27 °C | 10-35 °C | Standard DC |
| **A3** | 18-27 °C | 5-40 °C | Looser environment |
| **A4** | 18-27 °C | 5-45 °C | Maximum cooling savings |
| **H1** | 18-22 °C | 5-25 °C | High-density air-cooled (AI/ML) |
- 5th edition (2021) added class H1 for high-density and extended liquid cooling W-classes (W17, W27, W32, W40, W45, W+)
- 2024: new S-classes for Technology Cooling System (TCS) liquid cooling
- Humidity (recommended): -9 °C to 15 °C dew point and max. 70 % RH at low pollutant levels; max. 50 % RH where corrosive pollutants are high
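A small helper that checks a measured server inlet temperature against the dry-bulb envelopes in the table. It is a sketch only: the dictionary simply transcribes the table above, and humidity and dew point are not checked.

```python
# class -> ((recommended low, high), (allowable low, high)) in °C, from the table above
ENVELOPES = {
    "A1": ((18, 27), (15, 32)),
    "A2": ((18, 27), (10, 35)),
    "A3": ((18, 27), (5, 40)),
    "A4": ((18, 27), (5, 45)),
    "H1": ((18, 22), (5, 25)),
}


def classify_inlet(ashrae_class: str, inlet_c: float) -> str:
    """Return 'recommended', 'allowable' or 'out of range' for an inlet temperature."""
    (rec_lo, rec_hi), (allow_lo, allow_hi) = ENVELOPES[ashrae_class]
    if rec_lo <= inlet_c <= rec_hi:
        return "recommended"
    if allow_lo <= inlet_c <= allow_hi:
        return "allowable"
    return "out of range"


print(classify_inlet("A2", 30.5))  # allowable
print(classify_inlet("H1", 24.0))  # allowable (H1 recommended range ends at 22 °C)
```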
## Power
### Power chain
```
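Grid ──> Transformer ──> ATS ──> UPS ──> STS ──> PDU ──> Rack PDU ──> Server PSU
                          ▲
Generator ────────────────┘
ATS: switches the UPS input to the generator on grid outage
STS: Static Transfer Switch, selects between two UPS outputs (optional)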
```
A/B feed topology:
```
Grid A ──> UPS A ──> PDU A1 ──> Rack PDU A ──> PSU A (server)
Grid B ──> UPS B ──> PDU B1 ──> Rack PDU B ──> PSU B (server)
```
Each server has 2 PSUs — each powered from a different branch (A/B). On failure of one branch, the server continues without interruption.
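Because the surviving branch must carry the whole server load, each feed is planned to run at no more than about half of its usable capacity in normal operation. A minimal Python check, assuming an 80 % continuous-load derating (an illustrative value, not a quoted code requirement):

```python
def feed_survives_failover(load_a_kw: float, load_b_kw: float,
                           feed_capacity_kw: float, derating: float = 0.8) -> bool:
    """Check that one feed alone can carry the combined A+B load.

    Dual-PSU servers share load between A and B; when one branch fails the
    other picks up everything. `derating` is an assumed continuous-load limit;
    use local electrical code and PDU vendor values instead.
    """
    combined_kw = load_a_kw + load_b_kw
    return combined_kw <= feed_capacity_kw * derating


# Rack drawing 4.5 kW on each feed (9 kW in total)
print(feed_survives_failover(4.5, 4.5, feed_capacity_kw=22.0))  # True
print(feed_survives_failover(4.5, 4.5, feed_capacity_kw=11.0))  # False
```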
### UPS types
| Classification | IEC 62040-3 | Description | Switching | Use case |
|--------------|-------------|-------------|-----------|----------|
| **VFD** (Voltage & Frequency Dependent) | Passive standby | UPS in bypass, switches to inverter on failure | 4-10 ms | SOHO, edge |
| **VI** (Voltage Independent) | Line-interactive | Voltage regulation via autotransformer | 2-4 ms | Smaller racks, office |
| **VFI** (Voltage & Frequency Independent) | Double-conversion | AC → DC → AC, full isolation, zero switching time | 0 ms | Enterprise DC, Tier III/IV |
For DC the standard is **VFI (double-conversion)** — online UPS, zero switching time, full isolation from the grid.
### Battery technologies
| Type | Density (Wh/L) | Lifespan (cycles) | Lifespan (years) | Temperature | Cost/kWh | Note |
|------|---------------|-------------------|------------------|-------------|----------|------|
| **VRLA** (AGM/Gel) | 50-80 | 200-500 | 3-5 | 20-25 °C | ~$150-200 | Cheap, large, heavy, temperature sensitive |
| **Li-ion (LFP)** | 200-350 | 3000-5000 | 10-15 | 0-40 °C | ~$300-500 | Small, light, long life, BMS required |
| **Li-ion (NMC)** | 250-400 | 1000-2000 | 8-12 | 0-40 °C | ~$250-400 | Higher density, thermal runaway risk |
| **NiCd** | 80-150 | 1000-2000 | 10-15 | 20-50 °C | ~$400-600 | Extreme temperatures, memory effect |
| **Flow battery** (V/Zn/Br) | 20-40 | 10,000+ | 20+ | 10-35 °C | ~$500-800 | Very high cycle life, large, long-duration backup |
Li-ion (LFP) is becoming the standard for new DCs due to longer life, smaller footprint, and better behavior at high temperatures.
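Battery runtime is usable stored energy divided by load. A first-order Python estimate; the default depth of discharge and inverter efficiency are assumptions, not values from the table:

```python
def ups_runtime_min(battery_kwh: float, load_kw: float,
                    usable_fraction: float = 0.9,
                    inverter_efficiency: float = 0.95) -> float:
    """First-order UPS battery runtime estimate in minutes.

    Assumptions (illustrative): `usable_fraction` is the depth of discharge
    allowed by the BMS, `inverter_efficiency` covers DC->AC losses.
    Ignores rate-dependent capacity loss (e.g. Peukert effect in VRLA),
    temperature and battery ageing.
    """
    if load_kw <= 0:
        raise ValueError("load must be positive")
    usable_kwh = battery_kwh * usable_fraction * inverter_efficiency
    return usable_kwh / load_kw * 60


# Hypothetical 100 kWh LFP string feeding a 200 kW critical load
print(f"{ups_runtime_min(100, 200):.1f} min")
```

The battery only has to bridge the generator start and transfer (10-30 s according to the generator table below) plus a safety margin, which is why UPS autonomy is usually specified in minutes rather than hours.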
### Generator sizing
| Variant | Size | Fuel | Start time | Run time | Use case |
|---------|------|------|------------|----------|----------|
| **Diesel** | 500-2500 kVA | Diesel | 10-30 s | 24-72 h (depending on tank) | Standard for enterprise DC |
| **Nat. gas** | 200-1500 kVA | Natural gas | 10-30 s | Unlimited (pipeline) | Less common, lower emissions |
| **CHP** (cogeneration) | 500-2000 kVA | Natural gas | 5-15 min | Unlimited | Combined power + cooling (absorption chiller) |
Sizing: The generator should cover 100 % of the IT load plus 100 % of the cooling load (incl. chillers), typically 1.3-1.8× IT load. The diesel tank should cover at least 24 h of operation, commonly 48-72 h. Fuel consumption is ~0.3-0.4 L per kWh generated.
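The sizing rule and the tank volume follow from simple products. A sketch, assuming a 0.8 power factor for the kW to kVA conversion (a common generator rating convention, stated here as an assumption) and conservatively assuming full load for the whole runtime:

```python
def generator_rating_kva(it_load_kw: float, load_factor: float = 1.5,
                         power_factor: float = 0.8) -> float:
    """IT load x (IT + cooling) factor, converted from kW to kVA."""
    return it_load_kw * load_factor / power_factor


def tank_litres(load_kw: float, hours: float, litres_per_kwh: float = 0.3) -> float:
    """Fuel needed to run `load_kw` for `hours` at a given specific consumption."""
    return load_kw * hours * litres_per_kwh


# Hypothetical 1 MW IT load with factor 1.5 (within the 1.3-1.8x range above)
print(f"Generator: {generator_rating_kva(1000):,.0f} kVA")
print(f"48 h tank: {tank_litres(1500, 48):,.0f} L")
```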
### ATS vs STS
| Feature | ATS (Automatic Transfer Switch) | STS (Static Transfer Switch) |
|---------|--------------------------------|-----------------------------|
| **Switching** | 4-10 ms (mechanical relay) | < 4 ms (thyristor) |
| **Lifespan** | ~10,000 switches | Unlimited (solid-state) |
| **Cost** | Low | High (~3-5× ATS) |
| **Use case** | Generator → UPS feed | Between two UPS outputs |
### PDU types
| Type | Description | Use case |
|------|-------------|----------|
| **Basic** | Passive splitter (no monitoring) | Edge, office |
| **Metered** | Current measurement at PDU level | Standard DC |
| **Monitored** | Measurement per outlet, SNMP, web GUI | Enterprise DC |
| **Switched** | On/off per outlet, remote reboot | Enterprise DC, colo |
| **High-density** | 3-phase, 60-100 A, C19 outlets | GPU/HPC/AI racks |
### Power calculation
```
Total Power = P_IT + P_cooling + P_losses
P_IT        = Σ(P_server + P_storage + P_network)
P_server    = P_idle + (P_max - P_idle) × Utilization (0-1)
P_cooling + P_losses = P_IT × (PUE - 1)
Example:
100 servers × 500 W (avg) = 50 kW IT load
PUE = 1.5 → total 75 kW (of which 25 kW cooling + losses)
UPS + generator → sized for 75 kW × 1.2 (safety factor) = 90 kW
```
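The example can be reproduced in Python. The idle and maximum wattages are hypothetical values chosen so that the average comes out at the 500 W used above:

```python
def server_power_w(p_idle_w: float, p_max_w: float, utilization: float) -> float:
    """Linear power model: idle power plus a utilization-proportional share (0-1)."""
    return p_idle_w + (p_max_w - p_idle_w) * utilization


def facility_power_kw(it_kw: float, pue: float) -> float:
    """Total facility power implied by PUE = facility / IT."""
    return it_kw * pue


# Hypothetical server: 200 W idle, 800 W max, 50 % utilization -> 500 W average
it_kw = 100 * server_power_w(200, 800, 0.5) / 1000
total_kw = facility_power_kw(it_kw, pue=1.5)
overhead_kw = total_kw - it_kw  # cooling + losses = IT x (PUE - 1)
sized_kw = total_kw * 1.2       # 20 % safety factor
print(f"IT {it_kw:.0f} kW, total {total_kw:.0f} kW, "
      f"overhead {overhead_kw:.0f} kW, sizing {sized_kw:.0f} kW")
```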
### PUE (Power Usage Effectiveness)
```
PUE = Total Facility Energy / IT Equipment Energy
```
| PUE | Efficiency | Type |
|-----|-----------|------|
| 1.0-1.1 | Excellent | Hyperscale (Google, Meta) |
| 1.1-1.3 | Very good | Modern DC |
| 1.3-1.6 | Good / average | Enterprise DC |
| 1.6-2.0 | Below average | Older DC |
| >2.0 | Poor | Legacy |
PUE is measured at the whole DC level, not per rack. Includes: UPS losses, cooling, lighting, distribution losses. Excludes: well-to-tank fuel production, embodied carbon. Target for modern DC: PUE < 1.2.
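Computing PUE from annual energy totals and mapping it onto the table's bands is a one-liner each. A sketch; treating each band's upper bound as inclusive is an assumption, since the table's ranges touch at their edges:

```python
def pue(total_facility_kwh: float, it_kwh: float) -> float:
    """PUE = total facility energy / IT equipment energy (use annual totals)."""
    if it_kwh <= 0:
        raise ValueError("IT energy must be positive")
    return total_facility_kwh / it_kwh


def pue_band(value: float) -> str:
    """Map a PUE value onto the bands of the table above (upper bound inclusive)."""
    for upper, label in [(1.1, "Excellent"), (1.3, "Very good"),
                         (1.6, "Good / average"), (2.0, "Below average")]:
        if value <= upper:
            return label
    return "Poor"


# Hypothetical year: 13,140 MWh facility energy, 10,000 MWh IT energy
value = pue(13_140_000, 10_000_000)
print(f"PUE {value:.2f}: {pue_band(value)}")
```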
### WUE and CUE
| Metric | Description | Formula | Target |
|--------|-------------|---------|--------|
| **WUE** (Water Usage Effectiveness) | Water consumption per IT energy | WUE = Annual Water Usage / IT Energy (L/kWh) | < 0.5 L/kWh |
| **CUE** (Carbon Usage Effectiveness) | CO₂ emissions per IT energy | CUE = Total CO₂ / IT Energy (kg CO₂/kWh) | < 0.2 kg CO₂/kWh |
WUE is critical in dry regions (southwest US, Australia, Middle East). Adiabatic cooling consumes significantly more water than closed-loop cooling.
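Both metrics are ratios over annual IT energy. A sketch with hypothetical site figures; the grid emission factor of 0.25 kg CO₂/kWh is an assumption:

```python
def wue(annual_water_l: float, it_energy_kwh: float) -> float:
    """Water Usage Effectiveness in L per kWh of IT energy."""
    return annual_water_l / it_energy_kwh


def cue(total_co2_kg: float, it_energy_kwh: float) -> float:
    """Carbon Usage Effectiveness in kg CO2 per kWh of IT energy."""
    return total_co2_kg / it_energy_kwh


# Hypothetical site: 10 GWh IT energy, 12 GWh facility energy (PUE 1.2),
# 4,000 m3 of water per year, grid factor 0.25 kg CO2/kWh (assumed)
it_kwh, facility_kwh = 10_000_000, 12_000_000
print(f"WUE = {wue(4_000_000, it_kwh):.2f} L/kWh")
print(f"CUE = {cue(facility_kwh * 0.25, it_kwh):.2f} kg CO2/kWh")
```

When all facility energy comes from a single grid, CUE reduces to grid emission factor × PUE (here 0.25 × 1.2 = 0.3), so reaching the < 0.2 target depends mostly on low-carbon electricity rather than on PUE alone.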
### 3-phase vs Single-phase
| Feature | Single-phase (230 V) | 3-phase (400 V) |
|---------|---------------------|-----------------|
| **Voltage** | 230 V (L-N) | 230/400 V (L-N/L-L) |
| **Power per feed** | ~7.4 kW (32 A) | ~22 kW (32 A, 3-ph) |
| **Efficiency** | Lower (more losses) | Higher (lower current) |
| **Use case** | Smaller racks, office | Standard in DC, high-density |
| **PDU** | Single-phase (C13/C19) | 3-phase (C13/C19, 3-ph monitoring) |
| **Balancing** | Not applicable (single phase) | Phase balancing required (L1/L2/L3) |
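The "power per feed" figures follow from P = V × I for single phase and P = √3 × V_LL × I for three phase, with a power factor of 1 (an assumption; modern server PSUs are close to it). The imbalance check uses one common definition, the largest deviation from the mean phase current:

```python
import math


def single_phase_kw(v_ln: float, amps: float, power_factor: float = 1.0) -> float:
    return v_ln * amps * power_factor / 1000


def three_phase_kw(v_ll: float, amps: float, power_factor: float = 1.0) -> float:
    # Balanced three-phase load: P = sqrt(3) * V_LL * I * PF
    return math.sqrt(3) * v_ll * amps * power_factor / 1000


def phase_imbalance_pct(i_l1: float, i_l2: float, i_l3: float) -> float:
    """Largest deviation from the mean phase current, as % of the mean."""
    currents = (i_l1, i_l2, i_l3)
    mean = sum(currents) / 3
    return max(abs(i - mean) for i in currents) / mean * 100


print(f"1-ph 230 V / 32 A: {single_phase_kw(230, 32):.2f} kW")
print(f"3-ph 400 V / 32 A: {three_phase_kw(400, 32):.2f} kW")
print(f"Imbalance L1/L2/L3 = 20/18/12 A: {phase_imbalance_pct(20, 18, 12):.0f} %")
```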
### Rack power density
| Cat. | Type | kW/rack | Power | Cooling |
|------|------|---------|-------|---------|
| Low | Office, storage | 1-3 kW | 1-ph, 16 A | Air (free cooling) |
| Medium | Standard compute | 5-10 kW | 3-ph, 32 A | Air (CRAC/CRAH) |
| High | GPU, HPC | 15-30 kW | 3-ph, 60 A | Air + liquid assist |
| Ultra | AI/ML clusters | 40-100+ kW | 3-ph, 100+ A | Direct-to-chip / immersion |
### Rack PDU connectors
| Connector | Max current | Device type |
|-----------|-------------|-------------|
| **C13** | 10 A (250 V) | Servers, switches, 1U |
| **C19** | 16 A (250 V) | Higher power servers, UPS |
| **IEC 60309** (3-ph) | 16-125 A | Rack PDU inputs |
| **NEMA L6-30** | 30 A (250 V) | US spec |
## Cooling
### Cooling — technology overview
| Technology | Type | Output (kW/rack) | Typical PUE | CAPEX | Use case |
|-----------|------|-----------------|-------------|-------|----------|
| **Free air cooling** | Air | < 5 | 1.05-1.15 | Low | Climatically suitable locations |
| **CRAC (DX)** | Air | 5-10 | 1.4-1.8 | Medium | Smaller DC, retrofit |
| **CRAH (CW)** | Air | 5-15 | 1.2-1.5 | High | Enterprise DC |
| **In-row cooling** | Air | 10-25 | 1.2-1.4 | High | High-density racks |
| **Rear-door HX** | Hybrid | 15-30 | 1.1-1.3 | Medium | Retrofits, GPU |
| **Direct-to-chip** | Liquid | 40-100+ | 1.05-1.15 | High | AI/ML, HPC |
| **Immersion (single-phase)** | Liquid | 50-100+ | 1.03-1.10 | High | Bitcoin, hyperscale |
| **Immersion (two-phase)** | Liquid | 100-200+ | 1.03-1.08 | Very high | Extreme density |
### Chilled water vs Direct Expansion (DX)
| Feature | Chilled water (CW) | Direct Expansion (DX) |
|---------|-------------------|----------------------|
| **Medium** | Water + glycol | Refrigerant (R134a, R410A, R454B) |
| **CRAC/CRAH** | CRAH (Computer Room Air Handler, chilled-water coil, no compressor) | CRAC (Computer Room Air Conditioner, refrigerant compressor) |
| **Efficiency** | Higher (COP 5-7) | Lower (COP 2-4) |
| **Water temperature** | 7-12 °C (standard), 18-22 °C (high-temp) | 5-10 °C (evaporator) |
| **Complexity** | Higher (chillers, pumps, pipes, cooling tower) | Simpler |
| **Maintenance** | Higher (water treatment, legionella prevention) | Lower |
| **Use case** | Large DC > 500 kW, enterprise | Smaller DC, edge, retrofit |
### Containment types
| Type | Description | Efficiency | Implementation |
|------|-------------|------------|----------------|
| **Cold aisle containment (CAC)** | Enclosed cold aisle, warm air returns to room | High | Doors at aisle ends, ceiling panels |
| **Hot aisle containment (HAC)** | Enclosed hot aisle, warm air goes directly to return | Higher | Doors + ceiling panels, return to CRAH |
| **Chimney / rear duct** | Each rack has its own exhaust chimney to ceiling | Highest | Individual ducts per rack, expensive |
| **Open aisle** | No containment, cold and warm air mix | Low | Legacy, cheap |
Recommendation: CAC/HAC at density > 5 kW/rack. HAC is 5-10 % more efficient than CAC (warm air is directly extracted, does not mix with room).
### CFD modeling
Computational Fluid Dynamics (CFD) simulates airflow in DC before physical implementation:
- Identification of hot spots (warm air recirculation into cold aisle)
- Optimization of perforated tile positions
- Quantification of bypass airflow (cable cut-outs, missing blanking panels) so that it can be minimized
- Simulation of CRAH unit failure (what-if scenarios)
- Tools: Future Facilities (6Sigma DC), Ansys Fluent, OpenFOAM
### Free cooling
- **Air-side** — intake of outside air at suitable temperature (filtration, humidification)
- **Water-side** — dry coolers or cooling towers cool the chilled-water loop directly (via a plate heat exchanger, or the older "strainer cycle") while chiller compressors are off
- **Climate zone** — free cooling usable ~2000-8000 hours/year depending on location
- Scandinavia: 7000-8000 h/year
- Central Europe: 4000-6000 h/year
- Southern Europe: 2000-4000 h/year
- **Hybrid** — combination of free cooling + mechanical cooling (most common)
- **Economizer types**: Class A1 (dry cooler), Class A2 (evaporative), Class B (air-side)
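Free-cooling hours for a site can be estimated by walking through hourly weather data for a typical year. A sketch for a water-side economizer; the approach temperature and setpoints are assumptions, and for cooling towers the input should be wet-bulb rather than dry-bulb temperature:

```python
def free_cooling_hours(hourly_outdoor_c, supply_c, return_c, approach_k):
    """Split hours into full free cooling, partial (hybrid) and mechanical only.

    Full: outdoor air alone can reach the supply setpoint.
    Partial: outdoor air is still below the return temperature, so it can
    pre-cool the loop while the chiller trims the rest.
    `approach_k` is the assumed approach of the dry cooler / heat exchanger.
    """
    full = partial = mechanical = 0
    for t in hourly_outdoor_c:
        if t <= supply_c - approach_k:
            full += 1
        elif t <= return_c - approach_k:
            partial += 1
        else:
            mechanical += 1
    return full, partial, mechanical


# Hypothetical: 18 °C supply water, 28 °C return, 4 K approach
sample = [5, 12, 18, 22, 25, 30]
print(free_cooling_hours(sample, supply_c=18, return_c=28, approach_k=4))  # (2, 2, 2)
```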
### Liquid cooling detail
| Type | Inlet temperature | Capacity (kW/rack) | Medium | Installation |
|------|-----------------|-------------------|--------|-------------|
| **Cold plate (D2C)** | 20-45 °C | 40-100+ | Water, propylene glycol | CDU per rack or per row |
| **Rear-door HX** | 18-27 °C | 15-30 | Water | Passive, no server modification |
| **Immersion (1-ph)** | 35-50 °C | 50-100+ | Dielectric oil | Tank, CDU, heat exchanger |
| **Immersion (2-ph)** | 25-35 °C | 100-200+ | Dielectric (boiling) | Tank + condenser |
**CDU (Coolant Distribution Unit)**:
- Provides coolant temperature and pressure to racks
- Primary loop (facility water) + secondary loop (rack coolant)
- Sizing: 1 CDU per 4-8 racks (40-100 kW per CDU)
- Redundancy: N+1 CDU, dual coolant loops
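The flow a CDU must deliver follows from the heat balance Q = ṁ · c_p · ΔT. A sketch for plain water; the property values are approximate, and for glycol mixtures c_p and density should be taken from the fluid's data sheet:

```python
WATER_CP_J_PER_KG_K = 4186.0  # specific heat of water (approx., 20-40 °C)
WATER_DENSITY_KG_M3 = 997.0


def coolant_flow_lpm(heat_kw: float, delta_t_k: float,
                     cp: float = WATER_CP_J_PER_KG_K,
                     density: float = WATER_DENSITY_KG_M3) -> float:
    """Volumetric flow needed to remove `heat_kw` at a given supply/return delta T."""
    mass_flow_kg_s = heat_kw * 1000 / (cp * delta_t_k)  # from Q = m_dot * cp * dT
    return mass_flow_kg_s / density * 1000 * 60         # m3/s -> L/min


# 100 kW rack, 10 K temperature rise across the cold plates
print(f"{coolant_flow_lpm(100, 10):.0f} L/min")
```

Halving ΔT doubles the required flow, which is why CDU pump capacity and loop ΔT are sized together.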
**Water quality requirements**:
- Conductivity: < 1 µS/cm (demineralized water)
- pH: 6.5-8.0
- Particulates: < 50 µm (filtration)
- Corrosion prevention: inhibitors, glycol (10-30 %)
- Biological growth prevention: UV, biocides
### Adiabatic cooling
Using water evaporation to cool air:
- **Direct adiabatic** — air passes through water (media pad), cools and humidifies
- **Indirect adiabatic** — air cools via heat exchanger without direct contact with water
- **Water consumption**: 3-5 L/kWh (direct), 1-2 L/kWh (indirect)
- Efficiency depends on air humidity — more effective in dry climates
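The humidity dependence follows from the wet-bulb depression: direct evaporative cooling can at best bring the air down to the wet-bulb temperature. A sketch using the saturation-effectiveness relation, with an assumed media-pad effectiveness of 0.85:

```python
def direct_evaporative_supply_c(dry_bulb_c: float, wet_bulb_c: float,
                                effectiveness: float = 0.85) -> float:
    """Supply air temperature after a direct evaporative (media pad) stage.

    T_supply = T_db - e * (T_db - T_wb). `effectiveness` (0-1) is an assumed
    pad value; e = 1 would mean air leaves at the wet-bulb temperature.
    """
    return dry_bulb_c - effectiveness * (dry_bulb_c - wet_bulb_c)


# Same 35 °C day in a dry and a humid climate (wet-bulb values are illustrative)
print(f"dry:   {direct_evaporative_supply_c(35, 18):.1f} °C")
print(f"humid: {direct_evaporative_supply_c(35, 28):.1f} °C")
```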
## Cabling and structured cabling
### TIA-942 cabling hierarchy
```
Entrance Room (ER)
├── Backbone cabling (fiber single-mode / multi-mode)
│ │
│ ├── Main Distribution Area (MDA)
│ │ │
│ │ ├── Horizontal Distribution Area (HDA)
│ │ │ │
│ │ │ └── Equipment Distribution Area (EDA) → rack
│ │ │
│ │ └── Intermediate Distribution Area (IDA) — optional
│ │
│ └── Telecommunication Room (TR) — for office
└── Backbone cabling (fiber / copper)
```
### Copper cabling categories
| Category | Frequency | Speed | Length | Connector | Use case |
|----------|-----------|-------|--------|-----------|----------|
| **Cat5e** | 100 MHz | 1 GbE | 100 m | RJ45 | Legacy, voice |
| **Cat6** | 250 MHz | 1 GbE (10 GbE up to 55 m) | 100 m (10 GbE: 55 m) | RJ45 | Standard DC, enterprise |
| **Cat6A** | 500 MHz | 10 GbE | 100 m | RJ45 | Standard for new DC |
| **Cat7** (GG45) | 600 MHz | 10 GbE | 100 m | GG45/TERA | Niche, replaced by Cat6A/8 |
| **Cat8.1** | 2000 MHz | 25/40 GbE | 30 m | RJ45 | Top-of-rack, storage |
| **Cat8.2** | 2000 MHz | 25/40 GbE | 30 m | GG45/TERA | Top-of-rack, storage |
In DC, **Cat6A** (10 GbE up to 100 m) is standard for horizontal cabling. Cat8 is used only for short links of max. 30 m (e.g. server to top-of-rack switch).
### Fiber optic types
| Type | Core | Modal BW | Speed | Max length | Use case |
|------|------|----------|-------|-----------|----------|
| **OS1** (SM) | 9 µm | — | 100 GbE - 800 GbE | 10-80 km | Backbone, campus, WAN |
| **OS2** (SM) | 9 µm | — | 100 GbE - 800 GbE | 2-80 km (CWDM/DWDM) | Backbone, DWDM |
| **OM1** (MM) | 62.5 µm | 200 MHz·km | 1 GbE | 275 m | Legacy |
| **OM2** (MM) | 50 µm | 500 MHz·km | 10 GbE | 82 m | Legacy |
| **OM3** (MM) | 50 µm | 2000 MHz·km | 10 GbE up to 300 m, 100 GbE (SR4) up to 70 m | 300 m (10G) | Standard DC, VCSEL |
| **OM4** (MM) | 50 µm | 4700 MHz·km | 10 GbE up to 400 m, 100 GbE (SR4) up to 100 m, 400 GbE (SR8) up to 100 m | 400 m (10G) | High-performance DC standard |
| **OM5** (MM) | 50 µm | 4700+ MHz·km | 200/400 GbE SWDM | 150 m (100G) | Emerging, SWDM |
For new DC: **OM4** as standard for multi-mode, **OS2** for single-mode backbone (LR, DWDM). OM5 is not widely deployed — OM4 + parallel optics (SR4) is more common.
### Connector types
| Connector | Type | Insertion loss | Fiber count | Use case |
|-----------|------|---------------|-------------|----------|
| **LC** | Duplex | < 0.15 dB | 2 | Standard for SFP/SFP+/QSFP |
| **SC** | Duplex | < 0.2 dB | 2 | Older installations, patch panels |
| **MPO/MTP** (12-f) | Multi-fiber | < 0.35 dB | 12/24 | 40/100/400 GbE parallel |
| **MPO/MTP** (24-f) | Multi-fiber | < 0.5 dB | 24 | 400 GbE (SR4.2, DR4) |
| **SN** | Duplex (mini) | < 0.15 dB | 2 | High-density (QSFP-DD, OSFP) |
| **CS** | Duplex (mini) | < 0.15 dB | 2 | High-density (QSFP-DD, OSFP) |
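Whether a passive channel works with a given transceiver is a loss-budget check: fiber attenuation plus the insertion loss of every mated connector pair and splice. A sketch; the 3.5 dB/km OM4 attenuation at 850 nm and the 1.9 dB channel limit are assumed example values, and the real limit must come from the transceiver specification:

```python
def link_loss_db(length_m: float, atten_db_per_km: float,
                 mated_connector_losses_db: list[float],
                 splices: int = 0, splice_loss_db: float = 0.1) -> float:
    """Worst-case passive insertion loss of a fiber channel."""
    fiber_db = length_m / 1000 * atten_db_per_km
    return fiber_db + sum(mated_connector_losses_db) + splices * splice_loss_db


# Hypothetical 100 m OM4 channel with 2 mated MPO pairs at 0.35 dB each (table above)
loss = link_loss_db(100, 3.5, [0.35, 0.35])
budget_db = 1.9  # assumed channel limit; check the transceiver datasheet
print(f"{loss:.2f} dB of {budget_db} dB budget, margin {budget_db - loss:.2f} dB")
```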
### MPO/MTP polarity
| Method | Description | Use case |
|--------|-------------|----------|
| **Type A** (Straight) | Fiber 1→1, 2→2, ... | Duplex applications with cross-over at both ends |
| **Type B** (Crossed) | Fiber 1→12, 2→11, ... | Parallel optics (SR4, SR8) — standard |
| **Type C** (Pairs crossed) | Pairs 1-2→2-1, 3-4→4-3 | Duplex applications (pair flip in the trunk); not suited to parallel optics |
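The fiber mappings in the table can be generated programmatically, e.g. to verify a trunk's end-to-end mapping during documentation or testing. A sketch for an even fiber count (12 by default); the function name is illustrative:

```python
def mpo_polarity_map(method: str, fibers: int = 12) -> dict[int, int]:
    """Fiber position at end 1 -> position at end 2, per the polarity table above."""
    if fibers % 2:
        raise ValueError("MPO fiber count must be even")
    positions = range(1, fibers + 1)
    if method == "A":  # straight-through
        return {i: i for i in positions}
    if method == "B":  # reversed
        return {i: fibers + 1 - i for i in positions}
    if method == "C":  # adjacent pairs flipped
        return {i: i + 1 if i % 2 else i - 1 for i in positions}
    raise ValueError(f"unknown polarity method: {method}")


for method in "ABC":
    m = mpo_polarity_map(method)
    print(method, [m[i] for i in (1, 2, 3, 12)])
```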
### Breakout cassettes
```
MPO (12-f) ──> Breakout cassette ──> 6× LC duplex (12 fibers = 6× duplex)
MPO (24-f) ──> Breakout cassette ──> 12× LC duplex (24 fibers = 12× duplex)
```
Use case: Connecting MPO ports (switch) with LC ports (servers, storage). Cassettes are in the patch panel, not in the active path.
### Copper vs fiber decision
| Criterion | Copper (Cat6A/8) | Fiber (OM4/OS2) |
|-----------|-----------------|-----------------|
| **Reach** | 30-100 m | 100 m - 80 km |
| **Speed** | 1-40 GbE | 1-800 GbE |
| **Transceiver cost** | Lower (RJ45) | Higher (SFP+/QSFP) |
| **Cable cost** | Lower | Higher (patch cord) |
| **Port power** | 2-5 W (25 GbE) | 1-3 W (25 GbE SR) |
| **EMI immunity** | Susceptible | Immune |
| **Weight (100 m)** | ~3-4 kg | ~0.5-1 kg |
| **Recommendation** | Up to 30 m, server→ToR switch | Backbone, storage, >30 m |
### Cabling best practices
- **Horizontal cabling**: max 90 m permanent link + 10 m patch cords (TIA-942)
- **Fiber management**: slack spools, cable managers, minimum bend radius 10× cable diameter
- **Color coding**: OS1/OS2 (yellow), OM3 (aqua), OM4 (magenta/purple), OM5 (lime green)
- **Labeling**: both ends, patch panels, faceplates — standard ANSI/TIA-606-B
- **Overhead vs underfloor**: overhead (ladder rack) is preferred in DC (better airflow, easier changes)
- **MPO cassettes**: plan 15-20 % fiber reserve for future needs
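Two of these rules are easy to automate in a cabling plan: the 90 m + 10 m channel limit and the fiber reserve rounded up to whole MPO trunks. A sketch; the 12-fiber trunk size and the 20 % default reserve are assumptions drawn from the ranges above:

```python
def channel_ok(permanent_link_m: float, patch_cords_m: list[float]) -> bool:
    """Copper channel limits from the list above: <= 90 m link, <= 10 m of cords."""
    return permanent_link_m <= 90 and sum(patch_cords_m) <= 10


def mpo_trunks_needed(fibers_required: int, reserve_pct: int = 20,
                      fibers_per_trunk: int = 12) -> int:
    """Trunks needed incl. growth reserve (integer ceiling division, no float rounding)."""
    numerator = fibers_required * (100 + reserve_pct)
    denominator = 100 * fibers_per_trunk
    return -(-numerator // denominator)


print(channel_ok(85, [3, 5]))  # True
print(channel_ok(92, [2, 2]))  # False: permanent link too long
print(mpo_trunks_needed(100))  # 10 trunks = 120 fibers
```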
## Physical security
### Multi-layer security model (defense in depth)
```
Layer 1: Perimeter (fence, gate, guards)
Layer 2: Building (walls, locks, CCTV, card readers)
Layer 3: DC hall (biometrics, mantrap, CCTV, motion detection)
Layer 4: Rack / Cage (electronic locks, sensors)
Layer 5: Data (encryption, HSM, access control)
```
### Access control
| Method | Factor | Level | Note |
|--------|--------|-------|------|
| **RFID / proximity card** | Something you have | Standard | Basic access, cheap |
| **Smart card (PKI)** | Something you have + PIN | Medium | Certificate on card, anti-passback |
| **Biometric (fingerprint)** | Something you are | High | Fast, hygienic (touchless readers) |
| **Biometric (palm/finger vein)** | Something you are | Very high | Hard to forge, contactless |
| **Biometric (iris/retina)** | Something you are | Highest | Very accurate, slow, expensive |
| **Multi-factor** | 2+ factors | Highest | Card + biometrics + PIN — Tier IV DC |
### Mantrap design
```
Outer door ──> Mantrap (vestibule) ──> Inner door
├── Weight sensor (anti-tailgating)
├── CCTV (both doors)
├── Intercom (emergency exit)
└── Motion detector (in mantrap)
```
- Only one door opens at a time
- Anti-tailgating: weight sensor detects multiple persons
- Exit via request-to-exit (REX) button + motion detection
- Emergency exit: panic bar + alarm
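The interlock rules above behave like a small state machine. A hypothetical Python model: the class name and the 130 kg single-person threshold are assumptions, and real systems implement this in the access-control panel:

```python
from dataclasses import dataclass


@dataclass
class Mantrap:
    max_single_person_kg: float = 130.0  # assumed anti-tailgating threshold
    outer_open: bool = False
    inner_open: bool = False
    occupant_kg: float = 0.0             # reading from the floor weight sensor

    def request_outer(self) -> bool:
        # Interlock: outer door stays locked while the inner door is open
        if self.inner_open:
            return False
        self.outer_open = True
        return True

    def request_inner(self, authenticated: bool) -> bool:
        # Interlock: inner door stays locked while the outer door is open
        if self.outer_open or not authenticated:
            return False
        if self.occupant_kg > self.max_single_person_kg:
            return False  # possible tailgating: keep doors locked, alert security
        self.inner_open = True
        return True

    def close_all(self) -> None:
        self.outer_open = self.inner_open = False


m = Mantrap()
print(m.request_outer(), m.request_inner(authenticated=True))  # True False
```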
### CCTV
| Element | Recommendation |
|---------|----------------|
| **Resolution** | Min. 1080p, ideally 4K (6 MP+) |
| **FPS** | 15-30 FPS (recording), 30+ FPS (realtime monitoring) |
| **Retention** | Min. 30 days (90 days for audit) |
| **Storage** | NVR (on-prem), cloud (AWS KVS, Azure Video Indexer) |
| **AI analytics** | Face detection, ANPR (license plate), object detection |
| **Field of view** | Every door, every aisle — no blind spots |
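Retention requirements translate directly into storage capacity. A sketch for continuous recording; the 4 Mbit/s average bitrate per camera is a hypothetical value, since real bitrates depend on codec, resolution, FPS and scene activity, and RAID overhead is not included:

```python
def cctv_storage_tb(cameras: int, bitrate_mbps: float, retention_days: int) -> float:
    """Storage for continuous recording, in decimal TB (10**12 bytes)."""
    bytes_per_second = cameras * bitrate_mbps * 1_000_000 / 8
    return bytes_per_second * 86_400 * retention_days / 1e12


# Hypothetical: 50 cameras at an average 4 Mbit/s each
for days in (30, 90):
    print(f"{days} days: {cctv_storage_tb(50, 4, days):.1f} TB")
```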
### Asset tracking
| Technology | Accuracy | Cost | Use case |
|-----------|----------|------|----------|
| **Barcode** | Rack-level | Very low | Manual inventory |
| **RFID (passive)** | Rack-level (door sweep) | Low | Automatic rack open detection |
| **RFID (active, UWB)** | 10-30 cm | Medium | Real-time tracking |
| **Bluetooth BLE** | 1-3 m | Low | Approximate position |
| **GPS** | 1-10 m | Medium | Outdoor tracking |
## DC layout and design
### Raised floor vs Slab
| Feature | Raised floor | Slab (solid floor) |
|---------|-------------|-------------------|
| **Airflow** | Underfloor air distribution (raised floor as plenum) | Overhead air, in-row cooling |
| **Flexibility** | Easy addition of perforated tiles | Limited (overhead cooling required) |
| **Weight** | Limit 500-1000 kg/m² (depends on height) | Unlimited |
| **Cost** | Higher (~$200-400/m²) | Lower (~$100-200/m²) |
| **Height** | 600-900 mm (standard), 900-1200 mm (high-density) | — |
| **Trend** | Declining (shift to in-row/overhead cooling) | Growing (new DC, high-density) |
Modern high-density DCs (AI/ML, GPU) are moving away from raised floors to slab floors with overhead or in-row cooling, because racks weigh 1000-2000 kg and a raised floor cannot deliver sufficient airflow.
### Rack layout and dimensions
| Parameter | Standard | High-density | Note |
|-----------|----------|-------------|------|
| **Rack width** | 600 mm (19") | 600-750 mm | 750 mm for GPU (cabling, cooling) |
| **Rack depth** | 1000-1200 mm | 1200-1500 mm | GPU servers, longer cables |
| **Rack height** | 42U | 48U / 52U | Higher rack = better power density |
| **Aisle width (cold)** | 1200-1500 mm | 1500-1800 mm | Service access, airflow |
| **Aisle width (hot)** | 900-1200 mm | 1200-1500 mm | Narrower than cold |
| **Max rack load** | 500-800 kg | 1000-2000 kg | Floor reinforcement required |
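The need for floor reinforcement becomes obvious when rack mass is spread over the rack's own footprint. A sketch of that average distributed load, compared against the upper end of the raised-floor limit above; a real structural check also needs point loads on feet or casters and rolling loads:

```python
def floor_load_kg_m2(rack_mass_kg: float, width_mm: float, depth_mm: float) -> float:
    """Average distributed load of a rack over its own footprint."""
    return rack_mass_kg / (width_mm / 1000 * depth_mm / 1000)


RAISED_FLOOR_LIMIT_KG_M2 = 1000  # upper end of the raised-floor range above

load = floor_load_kg_m2(1500, 600, 1200)  # high-density rack from the table
print(f"{load:.0f} kg/m2, exceeds raised-floor limit: {load > RAISED_FLOOR_LIMIT_KG_M2}")
```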
### Space planning
```
For Tier III DC (example):
IT space: 1000 m²
└── 20 rows × 10 racks = 200 racks at 42U
└── 200 racks × 5 kW avg = 1 MW IT load
└── PUE 1.4 → 1.4 MW facility
Support spaces:
└── UPS + batteries: 200 m²
└── Generators: 100 m² (outdoor)
└── Cooling (chillers, cooling tower): 300 m²
└── Offices, storage, loading dock: 400 m²
Total: ~2000 m² (50% IT, 50% support)
```
### Zone approach (TIA-942)
| Zone | Description | Access | Security |
|------|-------------|--------|----------|
| **Z1** (Public) | Reception, offices | Free | Minimal |
| **Z2** (Office) | Administration, NOC | Employees + guests | RFID |
| **Z3** (DC support) | UPS, generators, cooling | DC operators | RFID + biometrics |
| **Z4** (DC hall) | Servers, storage, networking | DC operators + approved | RFID + biometrics + mantrap |
| **Z5** (Rack/cage) | Specific rack or cage | Only authorized personnel | Electronic lock |
## Fire suppression
### Detection
| System | Type | Detection time | False alarms | Use case |
|--------|------|----------------|--------------|----------|
| **VESDA** (Very Early Smoke Detection) | Aspiration, laser sensor | < 30 s (4 alarm levels) | Very low | Standard for DC |
| **Spot detection** | Ionization / optical smoke detector | 2-5 min | Medium | Legacy, smaller DC |
| **Heat detection** | Thermal detector (temperature / rate of rise) | 5-10 min | Very low | Backup for VESDA |
| **Line-type (LHD)** | Linear heat detection cable | 2-5 min | Low | Cable trays, above ceiling |
VESDA is the standard — active aspiration draws air from DC, laser sensor detects smoke particles at 4 levels (Alert → Action → Fire 1 → Fire 2). Enables intervention before visible smoke.
### Suppression systems
| System | Medium | Advantages | Disadvantages | Typical DC |
|--------|--------|------------|---------------|-----------|
| **Novec 1230** (FK-5-1-12) | Gas | Safe for people, zero ODP, short atmospheric lifetime (5 days) | Higher cost | Enterprise DC |
| **FM-200** (HFC-227ea) | Gas | Fast (10 s), effective | High GWP (3220), no ODP | Legacy DC |
| **Inergen** (IG-541) | Inert gas (52% N₂, 40% Ar, 8% CO₂) | Safe for people at design concentration, naturally occurring gases | Large volume, high pressure | Enterprise DC |
| **Argonite** (IG-55) | 50% Ar, 50% N₂ | Safe, natural | Large volume, higher pressure | Enterprise DC |
| **Water mist** | Water (fine mist) | Cooling, smoke suppression, low cost | Water in DC (risk), local application only | Retrofits |
| **Pre-action sprinkler** | Water | Dual activation (detection + sprinkler) | Water risk, drainage required | Tier I-II |
**Concentration**: Novec (4-6 % volume), FM-200 (7-9 %), Inergen (35-50 %). Novec and Inergen are safe for breathing (min. 5-7 min evacuation).
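For halocarbon agents such as Novec 1230 and FM-200, the agent mass follows the total-flooding formula W = (V / S) × C / (100 − C) used in NFPA 2001 and ISO 14520. A sketch in which the specific vapour volume S is an assumed illustrative value; real designs take S from the agent data sheet at the minimum expected room temperature and add corrections (altitude, safety factor) omitted here. Inert gases (Inergen, Argonite) use a different, logarithmic formula.

```python
def halocarbon_agent_kg(volume_m3: float, design_conc_pct: float,
                        specific_volume_m3_per_kg: float) -> float:
    """Total-flooding agent mass for halocarbon agents.

    W = (V / S) * (C / (100 - C))
    V: protected volume (m3), C: design concentration (vol %),
    S: specific vapour volume of the agent at design temperature (m3/kg).
    """
    c = design_conc_pct
    return volume_m3 / specific_volume_m3_per_kg * c / (100 - c)


# Hypothetical 10 m x 10 m x 3 m room, 5.3 % Novec 1230, S = 0.072 m3/kg (assumed)
print(f"{halocarbon_agent_kg(300, 5.3, 0.072):.0f} kg")
```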
### Detection zones
```
DC hall ──> zones of ~200 m² (max)
├── VESDA (each zone its own aspirator)
├── Smoke detectors (ceiling + floor)
└── Heat detection (backup)
```
## DCIM (Data Center Infrastructure Management)
### What DCIM covers
| Area | Metrics | Output |
|------|---------|--------|
| **Power** | Per PDU, per outlet, per rack, total | Capacity planning, PUE, kW/rack |
| **Cooling** | Temperature, humidity, airflow (sensors per rack) | Hot spot maps, airflow optimization |
| **Asset** | What is in which rack, U position, serial, warranty | Asset inventory, lease management |
| **Network** | Port utilization, patch panel connections | Patch management, port tracking |
| **Space** | Free U in rack, free racks | Capacity planning, "what-if" simulations |
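Space questions such as "does a 4U server still fit in this rack?" reduce to finding contiguous free U. A sketch, assuming occupied positions are exported from the DCIM as a set of U numbers with U1 at the bottom; this is not the API of any particular DCIM tool:

```python
def free_u_blocks(rack_height_u: int, occupied: set[int]) -> list[tuple[int, int]]:
    """Return (start_u, size) for each contiguous run of free units, bottom to top."""
    blocks, start = [], None
    for u in range(1, rack_height_u + 1):
        if u not in occupied:
            if start is None:
                start = u
        elif start is not None:
            blocks.append((start, u - start))
            start = None
    if start is not None:
        blocks.append((start, rack_height_u + 1 - start))
    return blocks


def can_place(rack_height_u: int, occupied: set[int], device_u: int) -> bool:
    return any(size >= device_u for _, size in free_u_blocks(rack_height_u, occupied))


# Hypothetical 42U rack: storage in U1-U10, switch in U20-U21, patch panels in U40-U42
occupied = set(range(1, 11)) | {20, 21} | {40, 41, 42}
print(free_u_blocks(42, occupied))  # [(11, 9), (22, 18)]
print(can_place(42, occupied, 4), can_place(42, occupied, 20))  # True False
```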
### Tools
| Tool | Type | Platform | Cost | Note |
|------|------|----------|------|------|
| **Nlyte (Carrier)** | Enterprise DCIM | On-prem / Cloud | $$$ | Market leader, complex |
| **Sunbird DCIM** | Enterprise DCIM | Cloud | $$$ | Power monitoring, asset tracking |
| **Device42** | DCIM + IPAM | On-prem / Cloud | $$ | Integrated IPAM, CMDB |
| **NetBox** | Open source DCIM | On-prem | Free | IPAM, DCIM, asset tracking |
| **OpenDCIM** | Open source | On-prem | Free | Basic DCIM, asset management |
| **RackTables** | Open source | On-prem | Free | Simple, asset + networking |
| **Vendor-specific** | Dell OME, HPE OneView | On-prem | Part of HW | Vendor-specific only |
## Site selection
### Criteria for DC site selection
| Category | Criterion | Weight |
|----------|-----------|--------|
| **Power** | Electricity availability (grid capacity), cost/kWh, possibility of two independent feeds | High |
| **Connectivity** | Fiber backbone availability, number of connectivity providers, latency to major POP | High |
| **Natural risks** | Earthquakes, floods, hurricanes, tornadoes, wildfires — historical data + predictions | High |
| **Climate** | Average temperature, humidity (free cooling potential) | Medium |
| **Workforce** | Availability of technicians, DC operators, network/admin engineers | Medium |
| **Taxes and regulation** | Tax incentives, environmental regulations, building permits | Medium |
| **Security** | Crime, political stability, terrorist risk | High |
| **Transport accessibility** | Proximity to airport, highway (for HW deliveries, personnel) | Low |
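The criteria table lends itself to a weighted scoring model for comparing candidate sites. A sketch; mapping High/Medium/Low to 3/2/1 and scoring each criterion 1-5 are assumptions, and the candidate scores are hypothetical:

```python
WEIGHTS = {"High": 3, "Medium": 2, "Low": 1}  # assumed numeric mapping

CRITERIA = {  # criterion -> weight, transcribed from the table above
    "Power": "High", "Connectivity": "High", "Natural risks": "High",
    "Climate": "Medium", "Workforce": "Medium", "Taxes and regulation": "Medium",
    "Security": "High", "Transport accessibility": "Low",
}


def site_score(scores: dict[str, float]) -> float:
    """Weighted average of 1-5 scores per criterion (5 = best)."""
    total_weight = sum(WEIGHTS[w] for w in CRITERIA.values())
    return sum(WEIGHTS[w] * scores[c] for c, w in CRITERIA.items()) / total_weight


# Hypothetical candidates: cheap green power vs. better connectivity
site_a = {**{c: 3 for c in CRITERIA}, "Power": 5, "Climate": 5, "Connectivity": 2}
site_b = {**{c: 3 for c in CRITERIA}, "Connectivity": 5}
for name, scores in [("A", site_a), ("B", site_b)]:
    print(name, round(site_score(scores), 2))
```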
### Natural risks — mapping
| Risk | Areas | Mitigation |
|------|-------|------------|
| **Earthquakes** | Pacific Ring of Fire (CA, Japan, Chile) | Base isolation, seismic bracing, flexible connections |
| **Hurricanes** | Caribbean, southeastern US, southeast Asia | Reinforced construction, generators above flood level |
| **Floods** | River valleys, coastal areas | Location outside flood zone, barriers |
| **Wildfires** | California, Australia, Mediterranean | Defensive zones, air filtration, monitoring |
### Power availability by region
| Region | Grid reliability | Cost/kWh (industrial) | Note |
|--------|-----------------|------------------------|------|
| **Northern Europe** (SE, NO, FI) | High (99.99 %) | $0.04-0.08 | Cheap green energy, cool climate |
| **Central Europe** (DE, NL, CZ) | High (99.99 %) | $0.10-0.20 | Stable, growing renewables |
| **Eastern US** (VA, NC) | High | $0.05-0.08 | Largest DC hub (Ashburn, VA) |
| **Western US** (CA, OR) | Medium (PG&E issues) | $0.10-0.15 | CAISO grid, blackout risk |
| **Singapore** | High | $0.15-0.20 | Moratorium on new DCs 2019-2022, now selective capacity allocation; water |
| **Dubai / UAE** | High | $0.06-0.10 | Cheap energy, high temperature (cooling) |
## Compliance and certification
| Standard / Certification | Area | Description |
|-------------------------|------|-------------|
| **TIA-942** (Rated 1-4) | DC design | Classification of redundancy, cabling, security (analogous to Uptime Tier) |
| **Uptime Institute** (Tier I-IV) | DC design | Operational certification, construction documentation |
| **ISO 27001** | ISMS | Information security, risk management |
| **ISO 27701** | Privacy | Privacy information management extension of ISO 27001, supports GDPR compliance |
| **SOC 2** (Type I/II) | Service org | Controls: Security, Availability, Processing Integrity, Confidentiality, Privacy |
| **PCI DSS** | Payment cards | Physical security, access to cardholder data |
| **HIPAA** | Healthcare | USA, health data protection |
| **FedRAMP** | US government | Cloud service authorization, DC security |
| **GDPR** | EU | Personal data protection, data residency |
| **NIST SP 800-53** | DC security | Security control catalog for US federal |
| **ISO 14001** | EMS | Environmental management, sustainability |
## Sustainability
### Carbon footprint of DC
```
Total emissions = Scope 1 (direct) + Scope 2 (energy) + Scope 3 (supply chain)
Scope 1: Generators (diesel), refrigerant leaks
Scope 2: Purchased electricity (grid mix)
Scope 3: HW manufacturing, transport, EOL recycling (~60-80 % of total emissions)
```
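Scopes 1 and 2 can be estimated from metered quantities multiplied by emission factors. A sketch in which every factor is an assumption (diesel ≈ 2.68 kg CO₂/L, R410A GWP 2088, grid 0.25 kg CO₂/kWh); use your grid operator's and suppliers' figures. Scope 3 needs supplier-specific data and is not modeled:

```python
def scope1_kg(diesel_litres: float, diesel_kg_co2_per_l: float,
              refrigerant_leak_kg: float, refrigerant_gwp: float) -> float:
    """Direct emissions: generator fuel burned + refrigerant leaks (as CO2e)."""
    return diesel_litres * diesel_kg_co2_per_l + refrigerant_leak_kg * refrigerant_gwp


def scope2_kg(electricity_kwh: float, grid_kg_co2_per_kwh: float) -> float:
    """Location-based Scope 2: purchased electricity x grid emission factor."""
    return electricity_kwh * grid_kg_co2_per_kwh


# Hypothetical year: 5,000 L diesel for generator tests, 10 kg R410A leaked,
# 12 GWh purchased electricity
s1 = scope1_kg(5_000, 2.68, 10, 2088)
s2 = scope2_kg(12_000_000, 0.25)
print(f"Scope 1: {s1 / 1000:.1f} t CO2e, Scope 2: {s2 / 1000:.0f} t CO2e")
```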
### Emission reduction
| Measure | Impact on PUE | Emission reduction | Payback |
|---------|--------------|-------------------|---------|
| **Temperature increase** (22→27 °C) | 0.1-0.2 | 10-20 % cooling | Immediate |
| **Free cooling** | 0.1-0.3 | 20-40 % cooling | 1-2 years |
| **Liquid cooling** | 0.2-0.4 | 30-50 % cooling | 2-4 years |
| **LED lighting + sensors** | 0.01-0.02 | < 1 % | 1 year |
| **PPA (Power Purchase Agreement)** | — | 100 % Scope 2 | Variable |
| **Renewable sources** (rooftop solar) | — | 5-15 % consumption | 5-10 years |
| **Green generator fuel** (HVO renewable diesel) | — | Up to ~90 % lifecycle CO₂ reduction | +30 % fuel cost |
### Sustainability certifications
| Certification | Description |
|--------------|-------------|
| **LEED** (BD+C: DC) | U.S. Green Building Council — design and construction |
| **BREEAM** | UK, European sustainability assessment |
| **Climate Neutral Data Centre Pact** (EU) | Self-regulatory, PUE < 1.4 by 2030 |
| **ISO 50001** | Energy management system |
| **Energy Star** | EPA, energy efficiency (US only) |
## Decision diagram — DC topology design
```mermaid
flowchart TD
Start(["DC design"]) --> TIER{"Required Tier?"}
TIER -->|"Tier I / II"| T1["N / N+1 redundancy<br/>Simple power, single path<br/>CRAC/CRAH, free cooling<br/>PUE 1.4-1.6, cost 1×"]
TIER -->|"Tier III"| T3["N+1, concurrently maintainable<br/>Dual path (A/B feed)<br/>Hot aisle containment<br/>PUE 1.2-1.4, cost 2×"]
TIER -->|"Tier IV"| T4["2N+1, fault tolerant<br/>Dual redundant + STS<br/>Hot + cold containment<br/>PUE 1.1-1.3, cost 3×"]
TIER --> POWER{"Power chain"}
POWER -->|"UPS"| UPS{"UPS type"}
UPS -->|"Enterprise DC"| UPS1["VFI double-conversion<br/>Li-ion (LFP), 10-15 years<br/>N+1 or 2N modular"]
UPS -->|"Edge / office"| UPS2["VI line-interactive<br/>VRLA, 3-5 years"]
POWER -->|"Generator"| GEN["Diesel 500-2500 kVA<br/>Tank for 24-72 h<br/>ATS 4-10 ms switching"]
POWER -->|"PDU"| PDU["3-phase 400 V<br/>Monitored/Switched<br/>A/B feed to racks"]
Start --> DENS{"Power density"}
DENS -->|"< 10 kW/rack"| COOL1["Air cooling<br/>CRAC/CRAH, raised floor<br/>Hot aisle containment<br/>ASHRAE A1-A2"]
DENS -->|"10-25 kW/rack"| COOL2["Hybrid<br/>In-row cooling<br/>Rear door HX<br/>ASHRAE A1-H1"]
DENS -->|"> 25 kW/rack"| COOL3["Liquid cooling<br/>CDU, direct-to-chip<br/>Immersion single/two-phase<br/>ASHRAE W-classes"]
Start --> CLIM{"Climate zone"}
CLIM -->|"Moderate (CZ, DE)"| FC1["Free cooling 4000-6000 h/year<br/>Chiller + economizer<br/>PUE saving 0.2-0.3"]
CLIM -->|"Warm (ES, US South)"| FC2["Chiller year-round<br/>Adiabatic cooling<br/>PUE 1.3-1.6"]
CLIM -->|"Cold (SE, NO)"| FC3["Free cooling 7000+ h/year<br/>Air-side economizer<br/>PUE < 1.2"]
```
## Disk monitoring — S.M.A.R.T.
Self-Monitoring, Analysis and Reporting Technology — predictive monitoring of HDD/SSD.
| Key attribute | ID | Description |
|--------------|----|-------------|
| Reallocated Sectors Count | 5 | Number of remapped sectors (a growing count indicates the disk is nearing end of life) |
| Power-On Hours | 9 | Total operating time in hours |
| Reported Uncorrectable Errors | 187 | Uncorrectable errors (red flag) |
| CRC Error Count | 199 | Errors on SATA link (cable/controller) |
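| SSD Life Left | 231 | % remaining SSD life (vendor-specific ID) |
| Media Wearout Indicator | 233 | NAND wear level (Intel: normalized, 100 = new); some vendors report total NAND writes under this ID |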
Tools: `smartmontools` (`smartctl`, `smartd`), Prometheus (`smartctl_exporter`, or `node_exporter` with a textfile-collector script), OpenTelemetry (OTel) Collector.
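smartmontools 7.0 and later can emit JSON (`smartctl --json -A /dev/sda`). ATA attributes appear as a list under `ata_smart_attributes.table`, each entry with an `id` and a `raw.value`. The sketch below flags the red-flag attributes from the table above. Note the following assumptions:

- The sample JSON is hypothetical and trimmed.
- Treating any non-zero raw value as an alert is an assumption of this sketch.
- NVMe drives report health in a different section, which this sketch ignores.

```python
import json

# Attributes from the table above whose raw value should stay at zero on a healthy disk.
# Which IDs to alert on, and the zero threshold, are assumptions for this sketch.
RED_FLAG_IDS = {
    5: "Reallocated Sectors Count",
    187: "Reported Uncorrectable Errors",
    199: "CRC Error Count",
}


def red_flags(smartctl_json: str) -> dict[int, int]:
    """Return {attribute_id: raw_value} for red-flag attributes with a non-zero raw value."""
    data = json.loads(smartctl_json)
    table = data.get("ata_smart_attributes", {}).get("table", [])
    flagged = {}
    for attr in table:
        attr_id = attr.get("id")
        raw = attr.get("raw", {}).get("value", 0)
        if attr_id in RED_FLAG_IDS and raw > 0:
            flagged[attr_id] = raw
    return flagged


# Hypothetical, trimmed output of `smartctl --json -A /dev/sda`
sample = """{"ata_smart_attributes": {"table": [
  {"id": 5,   "name": "Reallocated_Sector_Ct", "raw": {"value": 8}},
  {"id": 9,   "name": "Power_On_Hours",        "raw": {"value": 31000}},
  {"id": 199, "name": "UDMA_CRC_Error_Count",  "raw": {"value": 0}}
]}}"""

for attr_id, raw in red_flags(sample).items():
    print(f"{RED_FLAG_IDS[attr_id]} (ID {attr_id}): raw={raw}")
```

A non-zero ID 199 usually points at the cable or controller rather than the disk itself, so it is worth routing to a different alert than IDs 5 and 187.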
## Sources
Links, books and standards: [sources/infrastructure/sources.md](sources/infrastructure/sources.md)
### Recommended literature
| Book | Authors | ISBN | Description |
|------|---------|------|-------------|
| The Data Center as a Computer (4th ed., 2025) | Barroso, Hölzle, Ranganathan | 978-3-031-99488-3 | Comprehensive treatment of warehouse-scale computer (WSC) design by Google architects. Covers hardware, software, power, cooling, networking and 25 years of WSC experience. Key publication for datacenter architecture. |
| Electronics Cooling: From the Chip to the Datacenter (Vol. 62) | Abraham et al. | 978-0-443-47084-4 | Practical guide to thermal management from the transistor level to the datacenter. Covers conduction, convection, liquid immersion and phase-change cooling. Essential resource for DC cooling design. |
## Datacenter backbone services
When building a new DC, basic infrastructure services must be deployed first — without them, higher layers cannot operate:
### DNS
| Role | Service | Description |
|------|---------|-------------|
| **Authoritative** | BIND, PowerDNS, NSD | Authoritative zones for internal domains |
| **Recursive** | Unbound, BIND (caching), CoreDNS | Resolver for internal + external queries |
| **Anycast** | DNS anycast (BGP) | Redundancy, lower latency |
| **Integration** | Infoblox, BlueCat, dnsmasq | IPAM + DNS + DHCP in one (dnsmasq: DNS + DHCP only) |
Best practices: separate auth and recursive resolvers, DNSSEC, split-horizon (internal vs external view), TSIG for zone transfers, monitoring (DNS query latency, NXDOMAIN rate).
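The latency and NXDOMAIN-rate metrics mentioned above come from timing individual queries and reading the response code. This is a minimal standard-library sketch that hand-builds an RFC 1035 query and extracts the RCODE. A production probe would more likely use an existing DNS library or exporter, and the resolver address in the usage note is hypothetical.

```python
import random
import socket
import struct
import time

RCODE_NXDOMAIN = 3


def build_query(name: str, qid: int, qtype: int = 1) -> bytes:
    """Build a minimal DNS query (RFC 1035): header + one question, recursion desired."""
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)  # ID, flags (RD), QDCOUNT=1
    qname = b"".join(
        len(label).to_bytes(1, "big") + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)  # QTYPE (1 = A), QCLASS IN


def parse_rcode(response: bytes) -> int:
    """Extract the 4-bit RCODE from the header flags (0 = NOERROR, 3 = NXDOMAIN)."""
    (flags,) = struct.unpack("!H", response[2:4])
    return flags & 0x000F


def probe(server: str, name: str, timeout: float = 2.0) -> tuple[float, int]:
    """Send one UDP query and return (latency in ms, RCODE); raises on timeout."""
    qid = random.randint(0, 0xFFFF)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        start = time.perf_counter()
        sock.sendto(build_query(name, qid), (server, 53))
        response, _ = sock.recvfrom(512)
        latency_ms = (time.perf_counter() - start) * 1000
    if struct.unpack("!H", response[:2])[0] != qid:
        raise ValueError("response ID does not match query ID")
    return latency_ms, parse_rcode(response)
```

For example, `latency_ms, rcode = probe("10.0.0.53", "ipa.dc1.example")` queries a hypothetical internal resolver. The NXDOMAIN rate is then the share of probes where `rcode == RCODE_NXDOMAIN`.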
### NTP (time synchronization)
- **Primary**: GPS-disciplined NTP servers (Microchip S600, Meinberg)
- **Secondary**: Stratum 1/2 NTP (ntpd, chrony, NTPsec)
- **All nodes**: chrony (modern replacement for ntpd) pointing at a local time source, e.g. an NTP server on each rack switch (with PTP, the switch acts as a boundary clock)
- **Precision**: PTP (IEEE 1588) for telco/fintech — sub-microsecond accuracy
- **DC topology**: GPS antenna → Grandmaster (PTP) → Boundary clock (rack switch) → Ordinary clock (server)
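chrony and ntpd derive clock offset and round-trip delay from four timestamps per exchange (RFC 5905):

- t1: client transmit
- t2: server receive
- t3: server transmit
- t4: client receive

Below is a minimal sketch of that arithmetic. The timestamps are made up, and the offset formula is only exact when the network path is symmetric.

```python
def ntp_offset_delay(t1: float, t2: float, t3: float, t4: float) -> tuple[float, float]:
    """Return (offset, delay) in seconds from one NTP exchange (RFC 5905 on-wire formulas).

    t1: client transmit, t2: server receive, t3: server transmit, t4: client receive.
    A positive offset means the server clock is ahead of the client clock.
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


# Made-up exchange: server 10 ms ahead, 5 ms each way, 1 ms server processing
offset, delay = ntp_offset_delay(100.000, 100.015, 100.016, 100.011)
print(f"offset {offset * 1000:.3f} ms, delay {delay * 1000:.3f} ms")
```

Asymmetric paths bias the computed offset by half the asymmetry, and averaging many exchanges does not remove that bias.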
### DHCP + IPAM
| Tool | Description |
|------|-------------|
| **ISC DHCP** | Legacy, end-of-life since 2022, still widely deployed |
| **Kea** | Modern replacement for ISC DHCP, developed by ISC |
| **Infoblox / BlueCat** | Enterprise IPAM + DHCP + DNS |
| **NetBox / phpIPAM** | Open-source IPAM |
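At its core, IPAM is subnet bookkeeping. This minimal sketch shows the underlying arithmetic with Python's standard `ipaddress` module, carving a supernet into one subnet per rack. The addressing plan, the rack names and the first-host-as-gateway convention are hypothetical. The sketch does not talk to any IPAM API.

```python
import ipaddress


def rack_subnets(supernet: str, prefix: int, racks: list[str]) -> dict[str, ipaddress.IPv4Network]:
    """Assign one /prefix subnet from the IPv4 supernet to each rack, in order."""
    net = ipaddress.IPv4Network(supernet)
    subnets = net.subnets(new_prefix=prefix)
    plan = {}
    for rack in racks:
        try:
            plan[rack] = next(subnets)
        except StopIteration:
            raise ValueError(f"{supernet} has no room for rack {rack}") from None
    return plan


# Hypothetical plan: 10.20.0.0/16 split into a /24 per rack, first host = gateway (assumption)
plan = rack_subnets("10.20.0.0/16", 24, ["r01", "r02", "r03"])
for rack, subnet in plan.items():
    gateway = next(subnet.hosts())
    print(rack, subnet, "gw", gateway, "usable", subnet.num_addresses - 2)
```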
### LDAP / Identity Management
| Tool | Description |
|------|-------------|
| **FreeIPA** | Integrated IDM (LDAP + Kerberos + DNS + CA) — Linux |
| **Active Directory** | Microsoft, LDAP + Kerberos + Group Policy |
| **389 Directory Server** | Open-source LDAP (Red Hat) |
| **OpenLDAP** | Classic open-source LDAP |
| **Keycloak / Authentik** | Modern identity providers (OIDC/SAML) that can federate with or expose LDAP |
### PKI and certificates
- **Enterprise CA**: EJBCA, Smallstep, HashiCorp Vault (PKI engine)
- **ACME**: cert-manager (Kubernetes), certbot (Let's Encrypt)
- **mTLS**: Vault PKI, SPIRE (SPIFFE), Cilium
- **Best practices**: root CA offline, intermediate CA per environment, short-lived certificates (max 90 days), revocation (CRL/OCSP)
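The "max 90 days" rule can be audited against live endpoints. This minimal standard-library sketch reads `notBefore`/`notAfter` from the dictionary returned by `SSLSocket.getpeercert()` and converts them with `ssl.cert_time_to_seconds()`. The sample certificate dictionary is hypothetical. For an internal CA, `cafile` would point at its root certificate (path not specified here).

```python
import socket
import ssl

MAX_LIFETIME_DAYS = 90  # policy from the best practices above


def lifetime_days(cert: dict) -> float:
    """Validity period of a certificate dict as returned by SSLSocket.getpeercert()."""
    not_before = ssl.cert_time_to_seconds(cert["notBefore"])
    not_after = ssl.cert_time_to_seconds(cert["notAfter"])
    return (not_after - not_before) / 86400


def fetch_peer_cert(host: str, port: int = 443, cafile=None, timeout: float = 5.0) -> dict:
    """Connect with TLS (certificate and hostname verification on) and return the peer cert."""
    context = ssl.create_default_context(cafile=cafile)
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with context.wrap_socket(raw, server_hostname=host) as tls:
            return tls.getpeercert()


def violates_policy(cert: dict, max_days: int = MAX_LIFETIME_DAYS) -> bool:
    """True if the certificate was issued for longer than the allowed lifetime."""
    return lifetime_days(cert) > max_days


# Hypothetical certificate dict, shaped like getpeercert() output
sample = {"notBefore": "Jun  1 00:00:00 2026 GMT", "notAfter": "Aug 30 00:00:00 2026 GMT"}
print(round(lifetime_days(sample)), violates_policy(sample))
```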
### Monitoring and observability
See [MONITORING.md](MONITORING.md). Before the first workloads run, the DC must have:
- Metric collection (Prometheus, Zabbix)
- Centralized logs (Loki, ELK)
- Alerting (Alertmanager, PagerDuty)
- Uptime monitoring (heartbeat checks)
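A heartbeat check can be as small as a timed TCP connect to each backbone service. This is a minimal standard-library sketch. Real deployments would typically use Prometheus `blackbox_exporter` or a similar prober, and the addresses in the usage note are hypothetical.

```python
import socket
import time


def tcp_heartbeat(host: str, port: int, timeout: float = 2.0) -> tuple[bool, float]:
    """Try a TCP connect; return (reachable, elapsed seconds). No payload is exchanged."""
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.perf_counter() - start
    except OSError:  # refused, unreachable or timed out
        return False, time.perf_counter() - start
```

For example, `tcp_heartbeat("10.0.0.53", 53)` checks a hypothetical resolver and `tcp_heartbeat("10.0.0.10", 636)` checks LDAPS. A successful connect proves only that the port is open, not that the service answers correctly.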
### Deployment logistics — step order
```
1. DNS (at least recursive + local resolver)
2. NTP (time synchronization)
3. DHCP + IPAM (first servers get IPs)
4. LDAP / IAM (users, groups, access rights)
5. PKI (certificates for encryption)
6. Configuration management (Ansible, Puppet)
7. Monitoring + logging (see what's happening)
8. Container registry / Package repo (docker registry, apt/yum mirror)
9. Load balancer (for services)
10. Storage backend (Ceph, NFS, SAN)
11. Orchestration (Kubernetes, OpenStack)
```
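The numbered list is one valid ordering of the dependencies between services. This minimal sketch uses Python's `graphlib` (3.9+) to derive an order from an explicit dependency map and to fail loudly on cycles. The edges are assumptions loosely following the list, not a definitive model.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency graph: service -> services that must be running first.
DEPENDS_ON: dict[str, set[str]] = {
    "dns": set(),
    "ntp": {"dns"},
    "dhcp_ipam": {"dns"},
    "ldap_iam": {"dns", "ntp"},  # Kerberos (FreeIPA/AD) rejects clients with large clock skew
    "pki": {"dns", "ntp", "ldap_iam"},
    "config_mgmt": {"dhcp_ipam", "ldap_iam", "pki"},
    "monitoring": {"config_mgmt"},
    "registry_repo": {"pki", "config_mgmt"},
    "load_balancer": {"pki", "dhcp_ipam"},
    "storage": {"config_mgmt", "monitoring"},
    "orchestration": {"registry_repo", "load_balancer", "storage"},
}


def rollout_order(graph: dict[str, set[str]]) -> list[str]:
    """Return one valid deployment order; raises graphlib.CycleError on circular dependencies."""
    return list(TopologicalSorter(graph).static_order())


order = rollout_order(DEPENDS_ON)
print(" -> ".join(order))
```

Any order the sorter returns is valid for the assumed edges. It may differ from the numbered list wherever two services do not depend on each other.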
## OpenStack in the datacenter
OpenStack brings a software abstraction layer to the DC, enabling multi-tenancy and self-service:
### Control plane architecture
- **Controller nodes** — management services (Keystone, Nova API, Neutron API, Horizon, RabbitMQ, DB)
- **Compute nodes** — hypervisor (KVM), Nova Compute, Neutron agent
- **Storage nodes** — Ceph OSD, Cinder volumes, Swift object storage
- **Network nodes** — Neutron L3 router, DHCP agent, DVR
### Requirements for DC infrastructure
| Component | Requirement |
|-----------|-------------|
| **Controller** | 3-5 node HA cluster, 16+ vCPU, 32+ GB RAM, SSD |
| **Compute** | High compute density per rack (GPU, high-core-count CPUs), NUMA-aware design |
| **Storage (Ceph)** | 10-25 GbE networking, NVMe/SSD OSDs, replication factor 3+ |
| **Network** | 25/100 GbE spine-leaf, L3 BGP underlay, VXLAN overlay |
| **Rack power** | 10-30 kW/rack for GPU compute |
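The replica requirement in the table drives capacity planning. With replication, usable space is roughly raw capacity divided by the replica count. By default, Ceph starts warning at the nearfull ratio (0.85) and blocks writes at the full ratio (0.95). This back-of-the-envelope sketch makes two assumptions: the cluster size, and holding back one host's worth of capacity for re-replication after a host failure. Erasure-coded pools follow different arithmetic.

```python
def ceph_usable_tb(hosts: int, osds_per_host: int, osd_tb: float,
                   replicas: int = 3, nearfull_ratio: float = 0.85,
                   reserve_one_host: bool = True) -> float:
    """Rough usable capacity (TB) of a replicated Ceph pool with host failure domain.

    Raw capacity is optionally reduced by one host (room to re-replicate after a host
    failure, an assumption of this sketch), divided by the replica count and capped
    at the nearfull ratio.
    """
    min_hosts = replicas + 1 if reserve_one_host else replicas
    if hosts < min_hosts:
        raise ValueError(f"need at least {min_hosts} hosts for {replicas} replicas")
    raw = hosts * osds_per_host * osd_tb
    if reserve_one_host:
        raw -= osds_per_host * osd_tb
    return raw / replicas * nearfull_ratio


# Hypothetical cluster: 6 hosts x 10 NVMe OSDs x 7.68 TB = 460.8 TB raw
print(round(ceph_usable_tb(6, 10, 7.68), 1))
```

The result (about 108.8 TB) shows why roughly a quarter of the raw capacity is a realistic planning figure for a 3-replica pool on a small cluster.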
### Use cases
- Private cloud for enterprise (multi-tenant, self-service Horizon)
- NFVI for telco (DPDK, SR-IOV, low-latency)
- Academic / HPC clusters (Ironic, Cyborg, Manila)
- Government / regulated environments (on-prem, audit trail)
*Last revision: 2026-06-03*