# 🏗️ Data Center Migration

## Migration strategies

| Strategy | RTO | RPO | Risk | Cost | Duration | Description |
|-----------|-----|-----|--------|---------|-------------|-------|
| **Cold / Big Bang** | hours–days | days | High | Low | days | Shut everything down, move, power up |
| **Phased / Wave** | minutes (per wave) | minutes | Medium | Medium | weeks–months | Workloads moved in waves |
| **Rolling** | 0 (live) | 0 | Low | High | months | Live migration per VM/service |
| **Parallel Run** | 0 | 0 | Low | Very high | months | Both DCs operational, gradual cutover |
| **Pilot Light** | hours | minutes | Medium | Low | weeks | Critical services in new DC, rest migrates |
| **Lift & Shift** | hours | minutes | Medium | Low | weeks | VMs/servers moved without configuration changes |
| **Re-platform** | hours | minutes | Low | Medium | months | Optimization during migration (OS upgrade, resize) |
| **Re-architect** | 0 | 0 | Low | High | months–years | Application redesigned for new platform |

---

## Decision tree

```mermaid
flowchart TD
    Start(["DC Migration"]) --> APP{"Application\nstateful?"}
    APP -->|"Yes"| DOWNTIME{"Tolerates\ndowntime?"}
    APP -->|"No"| ROLLING["Rolling / Parallel Run"]

    DOWNTIME -->|"Yes, hours+"| COLD["Cold / Big Bang\nSimplest, cheapest\nRisk: all at once"]
    DOWNTIME -->|"Yes, minutes"| PHASED["Phased / Wave\nBy application / business unit"]
    DOWNTIME -->|"No (zero downtime)"| SYNC{"Sync replication\npossible?"}

    SYNC -->|"Yes, < 100 km"| ROLLING
    SYNC -->|"No"| PARALLEL["Parallel Run\nBoth DCs active, gradual cutover"]

    ROLLING --> ROLL_HA{"VMware,\nHyper-V?"}
    ROLL_HA -->|"Yes"| VMOTION["vMotion / Storage vMotion (VMware)\nLive Migration (Hyper-V), 0 downtime"]
    ROLL_HA -->|"No"| ROLL_REPL["Storage + DB replication\nGradual workload migration"]
```
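
The decision tree can also be expressed as a small function, which is handy when classifying many workloads from an inventory export. This is a minimal sketch: the `Workload` fields and the returned labels are hypothetical names chosen to mirror the questions and leaves of the diagram, not part of any tool.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical inputs, one per question in the decision tree."""
    stateful: bool
    tolerated_downtime: str = "none"          # "hours", "minutes", or "none"
    sync_replication_possible: bool = False   # e.g. DCs less than 100 km apart
    live_migration_hypervisor: bool = False   # runs on VMware or Hyper-V


def choose_strategy(w: Workload) -> str:
    """Walk the decision tree and return the label of the leaf reached."""
    if w.stateful:
        if w.tolerated_downtime == "hours":
            return "Cold / Big Bang"
        if w.tolerated_downtime == "minutes":
            return "Phased / Wave"
        if w.tolerated_downtime != "none":
            raise ValueError(f"unknown downtime class: {w.tolerated_downtime!r}")
        if not w.sync_replication_possible:
            return "Parallel Run"
    # Stateless workloads, and zero-downtime workloads with synchronous
    # replication available, both end up in the rolling branch.
    if w.live_migration_hypervisor:
        return "Rolling: live migration"
    return "Rolling: storage + DB replication"
```

For example, `choose_strategy(Workload(stateful=True, tolerated_downtime="minutes"))` returns `"Phased / Wave"`.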

---

## Migration phases

### 1. Discovery and assessment

| Task | Tools | Output |
|------|----------|--------|
| HW and SW inventory | RVTools, NetBox, CMDB | Server, VM, and service list |
| Dependency mapping | ServiceNow, AppDynamics, manual | Application dependency graph |
| Traffic analysis | NetFlow, sFlow, vRNI | Bandwidth, latency, peak usage |
| Performance baseline | Prometheus, Zabbix, vRealize | CPU/RAM/disk/network per workload |
| License audit | Flexera, SAM | Licenses, support, compliance |

**Output:** workload list with RTO/RPO, dependencies, and criticality.

### 2. Planning

- **Wave plan** — workload division into migration waves (10–50 VMs per wave)
- **Dependency ordering** — DNS, NTP, LDAP, PKI first
- **Cutover window** — time window for the switch (typically a weekend)
- **Rollback plan** — conditions and procedure for reversal
- **Test plan** — what and how to test post-migration
- **Communication plan** — who is informed, when, and how
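
Wave planning and dependency ordering can be prototyped with a topological sort over the dependency graph produced in the discovery phase. The sketch below uses Python's standard `graphlib` module; the function name `plan_waves`, the workload names, and the default wave size of 50 (the upper end of the range above) are illustrative assumptions.

```python
from graphlib import TopologicalSorter


def plan_waves(dependencies: dict[str, set[str]], max_per_wave: int = 50) -> list[list[str]]:
    """Group workloads into waves so each one migrates after its dependencies.

    `dependencies` maps a workload to the set of workloads it depends on.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves: list[list[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose dependencies are done
        # Split an oversized batch so no wave exceeds max_per_wave workloads.
        for i in range(0, len(ready), max_per_wave):
            waves.append(ready[i:i + max_per_wave])
        sorter.done(*ready)
    return waves


# Hypothetical dependency graph: core services first, applications last.
deps = {
    "dns": set(),
    "ntp": set(),
    "ldap": {"dns", "ntp"},
    "app-db": {"ldap"},
    "web-app": {"app-db", "ldap"},
}
print(plan_waves(deps))
```

A circular dependency surfaces as a `CycleError`, which is usually a sign that the affected workloads have to move together in one wave.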

### 3. New DC preparation

- **Infrastructure** — DNS, NTP, DHCP, LDAP/AD, PKI, monitoring (see [DATACENTERS.en.md](DATACENTERS.en.md) — deployment order)
- **Network** — BGP peering, VXLAN/VLAN, firewall rules, load balancers
- **Storage** — SAN zoning, NAS exports, Ceph cluster
- **Virtualization** — vCenter, Hyper-V cluster, Proxmox

### 4. Replication and synchronization

| Layer | Method | Tools |
|--------|--------|----------|
| **Storage (block)** | SAN sync/async mirror, LUN replication | NetApp SnapMirror, Dell EMC RecoverPoint, Pure ActiveCluster |
| **Storage (file)** | DFS-R, rsync, robocopy | Windows DFS-R, rsync, robocopy |
| **Storage (object)** | Cross-region replication | MinIO replication, S3 CRR |
| **Databases** | Log shipping, CDC, streaming replication | PostgreSQL Patroni, Oracle Data Guard, MSSQL Always On, MySQL Group Replication |
| **VM** | Storage vMotion, replication | VMware vSphere Replication, Hyper-V Replica, Zerto |
| **Kubernetes** | Backup/restore (Velero + Restic), Ceph RBD mirroring (Rook) | Velero, Rook |
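
Whichever method is used, the initial seed is bounded by the inter-DC link. A rough estimate helps decide how far ahead of cutover replication has to start. This is a back-of-the-envelope sketch: the function name and the default 70 % link efficiency are assumptions, and real throughput depends on the replication tool, latency, and competing traffic.

```python
def seed_time_hours(data_tb: float, link_gbps: float, efficiency: float = 0.7) -> float:
    """Estimate hours needed to copy `data_tb` decimal terabytes over the link.

    `efficiency` is the assumed usable fraction of the nominal bandwidth.
    """
    if link_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_gbps must be > 0 and efficiency in (0, 1]")
    bits = data_tb * 1e12 * 8                      # terabytes to bits
    effective_bps = link_gbps * 1e9 * efficiency   # usable bits per second
    return bits / effective_bps / 3600


# 100 TB over a 10 Gbit/s link at 70 % efficiency: roughly 31.7 hours.
print(f"{seed_time_hours(100, 10):.1f} h")
```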

### 5. Workload migration

#### Wave migration (recommended for medium/large DCs)

```mermaid
gantt
    title Wave migration
    dateFormat  YYYY-MM-DD
    section Wave 1 - Core
    DNS, NTP, LDAP    :done, w1a, 2026-07-01, 3d
    Monitoring + logging :done, w1b, after w1a, 2d
    section Wave 2 - Network
    Load balancers     :active, w2a, 2026-07-06, 2d
    Firewalls          :active, w2b, 2026-07-08, 2d
    section Wave 3 - Storage
    NAS migration      :w3a, 2026-07-10, 5d
    SAN replication    :w3b, 2026-07-10, 3d
    section Wave 4 - Dev/Test
    Dev VMs            :w4a, 2026-07-15, 5d
    section Wave 5 - Prod tier 3
    Internal apps      :w5a, 2026-07-22, 5d
    section Wave 6 - Prod tier 2
    Business apps      :w6a, 2026-07-29, 5d
    section Wave 7 - Prod tier 1
    Critical apps      :w7a, 2026-08-05, 5d
```

#### Typical single-wave procedure

1. **Day -7**: Start data replication (initial seed)
2. **Day -1**: Incremental sync, final test
3. **Day 0 (cutover)**:
   - Stop application in source DC
   - Final sync (last delta)
   - Start application in target DC
   - DNS/Traffic switch
   - Smoke test
4. **Day +1**: Monitoring (performance, errors, lag)
5. **Day +7**: Rollback window ends (migration confirmed successful)
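
The Day 0 steps lend themselves to a scripted runbook that stops at the first failure and triggers the rollback. The sketch below shows only the control flow; `run_cutover`, `day0_steps`, and the placeholder lambdas are hypothetical and would be replaced by calls into your own tooling.

```python
from typing import Callable

# A step is a (description, action) pair; the action returns True on success.
Step = tuple[str, Callable[[], bool]]


def run_cutover(steps: list[Step], rollback: Callable[[], None]) -> bool:
    """Run the steps in order; on the first failure call rollback and stop."""
    for name, action in steps:
        print(f"[cutover] {name}")
        if not action():
            print(f"[cutover] FAILED: {name}, rolling back")
            rollback()
            return False
    return True


# Placeholder actions that always succeed; swap in real checks and commands.
day0_steps: list[Step] = [
    ("Stop application in source DC", lambda: True),
    ("Final sync (last delta)", lambda: True),
    ("Start application in target DC", lambda: True),
    ("DNS/traffic switch", lambda: True),
    ("Smoke test", lambda: True),
]
```

Calling `run_cutover(day0_steps, rollback=my_rollback)` returns `True` only if every step succeeded; `my_rollback` stands for whatever procedure the rollback plan defines for the wave.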

### 6. Network strategies

#### IP re-addressing

| Approach | Description | Pros | Cons |
|---------|-------|--------|----------|
| **Keep IP** | Same IPs, BGP anycast or stretched VLAN | No application config changes | Stretched VLAN/L2 limitations |
| **Change IP** | New IP range, DNS/BGP routing change | Clean architecture | Config changes, DNS TTL |
| **NAT translation** | NAT between old and new IP space | No application changes | Latency, troubleshooting complexity |

**Keep IP** is only possible with:
- L2 stretch between DCs (VXLAN, OTV) — distance limited
- BGP anycast for VIPs (load balancers)
- Applications tolerant of ARP cache changes

#### DNS cutover

```
1. Lower TTL to 60–300 s (one week ahead)
2. At cutover, change A/AAAA records to new IPs
3. Wait for propagation (per TTL)
4. Monitor traffic
```
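
The timing behind these steps can be made explicit: a lowered TTL only takes full effect once the previous TTL has expired in resolver caches, and after cutover the old address can still be served for up to the new TTL. A minimal sketch, with hypothetical function names and example dates, assuming resolvers honor TTLs (some do not):

```python
from datetime import datetime, timedelta


def earliest_cutover(ttl_lowered_at: datetime, old_ttl_s: int) -> datetime:
    """Earliest time at which all caches should hold the record with the low TTL."""
    return ttl_lowered_at + timedelta(seconds=old_ttl_s)


def propagation_complete(cutover_at: datetime, new_ttl_s: int) -> datetime:
    """Time after which caches should no longer return the old address."""
    return cutover_at + timedelta(seconds=new_ttl_s)


# Example: TTL lowered from 86400 s to 300 s, cutover one week later.
lowered = datetime(2026, 7, 1, 9, 0)
cutover = datetime(2026, 7, 8, 22, 0)
print(earliest_cutover(lowered, 86400))    # 2026-07-02 09:00:00
print(propagation_complete(cutover, 300))  # 2026-07-08 22:05:00
```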

#### Traffic steering

| Technique | Use case |
|----------|----------|
| **BGP** | AS-path prepending or local preference changes to shift traffic between DCs |
| **DNS** | Lower TTL, change A records |
| **Load balancer** | Change pool members, health check |
| **GSLB** | Global Server Load Balancing (F5 GTM, NSX ALB) |
| **Cloud DNS** | Amazon Route 53, Azure Traffic Manager, Google Cloud DNS |

### 7. Database migration

See individual DB files for details. Summary table:

| DB | Method | RPO | RTO | Note |
|----|--------|-----|-----|----------|
| **PostgreSQL** | Streaming replication + Patroni switchover | 0 (sync) / a few MB of WAL (async) | min | Patroni auto-failover |
| **MySQL** | Group Replication / async replication | 0 (sync) / seconds | min | InnoDB Cluster |
| **Oracle** | Data Guard switchover | 0 (sync) | min | Far sync for remote DCs |
| **MSSQL** | Always On AG failover | 0 (sync) | min | Cloud witness |
| **MongoDB** | Replica set election | seconds | < 1 min | Priority-based failover |
| **Cassandra** | Multi-DC replication | eventual | 0 | Native multi-master |

### 8. Testing

| Phase | What to test | Method |
|------|-------------|--------|
| **Pre-migration** | Application in new DC (isolated) | Dry run on replicated data |
| **Cutover** | Functionality, availability, latency | Smoke test, synthetic transactions |
| **Post-migration** | Performance, integration, monitoring | A/B comparison with baseline, canary traffic |
| **Rollback** | Return to old DC | Tested rollback plan |

### 9. Rollback plan

Each wave must have a defined rollback:

| Condition | Action |
|----------|------|
| Application fails to start in new DC | DNS switch back, stop replication |
| Performance worse than baseline (> 20 %) | Rollback, root cause analysis |
| Integration failure (API timeout, DB connection) | Rollback, dependency check |
| Security incident | Rollback, forensic analysis |

Rollback must be tested **before** the real cutover.
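
The rollback conditions in the table can be encoded as an automated check that runs during the rollback window. The sketch below is one possible shape: the `WaveHealth` fields are hypothetical, and latency is used as a stand-in for "performance", which is an assumption; substitute whatever metric your baseline was measured in.

```python
from dataclasses import dataclass


@dataclass
class WaveHealth:
    """Hypothetical post-cutover health snapshot for one wave."""
    app_started: bool
    baseline_latency_ms: float
    current_latency_ms: float
    integration_failures: int = 0
    security_incident: bool = False


def rollback_reasons(h: WaveHealth, max_degradation: float = 0.20) -> list[str]:
    """Return the rollback conditions that are met (empty list means: stay)."""
    reasons = []
    if not h.app_started:
        reasons.append("application failed to start")
    elif h.current_latency_ms > h.baseline_latency_ms * (1 + max_degradation):
        # Latency is only meaningful once the application is running.
        reasons.append("performance worse than baseline")
    if h.integration_failures > 0:
        reasons.append("integration failure")
    if h.security_incident:
        reasons.append("security incident")
    return reasons
```

For instance, a wave with a 100 ms baseline and 125 ms measured latency yields `["performance worse than baseline"]`, since the degradation exceeds the 20 % threshold.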

---

## Special cases

### Mainframe migration

- **IBM z/OS** — GDPS (Geographically Dispersed Parallel Sysplex)
- HyperSwap for storage mirroring
- Cross-system coupling facility (XCF)
- Often the last migrated component

### COTS applications (Oracle EBS, SAP)

- Require vendor-specific migration procedures
- Oracle EBS: AutoConfig, cloning (ADXLC)
- SAP: System Copy (Homogeneous / Heterogeneous), SWPM, SUM
- Re-licensing may be required when the hardware changes

### Cloud migration (On-prem → Cloud)

See [CLOUD.en.md](CLOUD.en.md) — migration strategies (6 Rs):

| Strategy | Description |
|-----------|-------|
| **Re-host (Lift & Shift)** | VM → Cloud VM (AWS MGN, Azure Migrate) |
| **Re-platform** | OS upgrade, managed DB (RDS, Cloud SQL) |
| **Re-architect** | Application rewritten as cloud-native |
| **Retire** | Decommission unnecessary applications |
| **Retain** | Application stays on-prem (review later) |
| **Repurchase** | SaaS replacement |

---

## Recommended approach per DC size

| DC Size | VM Count | Recommended strategy | Duration | Team |
|-------------|----------|---------------------|-------------|-----|
| **Small** | < 50 | Big Bang (weekend) | 2–4 days | 3–5 people |
| **Medium** | 50–499 | Phased (5–10 waves) | 2–8 weeks | 5–10 people |
| **Large** | 500–4999 | Phased + Rolling | 3–12 months | 10–30 people |
| **Enterprise** | 5000+ | Parallel Run / Rolling | 12–36 months | 30+ people |
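
As a quick lookup, the sizing table can be turned into a function, for example to annotate an inventory report. This is only a sketch: the function name is hypothetical, and each range's upper bound is treated as exclusive so that every VM count maps to exactly one row.

```python
def recommend_by_size(vm_count: int) -> tuple[str, str]:
    """Map a VM count to (DC size, recommended strategy) per the table above."""
    if vm_count < 0:
        raise ValueError("vm_count must be non-negative")
    if vm_count < 50:
        return ("Small", "Big Bang (weekend)")
    if vm_count < 500:
        return ("Medium", "Phased (5–10 waves)")
    if vm_count < 5000:
        return ("Large", "Phased + Rolling")
    return ("Enterprise", "Parallel Run / Rolling")


print(recommend_by_size(320))  # ('Medium', 'Phased (5–10 waves)')
```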

---

## Related

- [DATACENTERS.en.md](DATACENTERS.en.md) — DC topologies, secondary DC, deployment order
- [CLOUD.en.md](CLOUD.en.md) — cloud migration strategies (6 Rs)
- [DR.en.md](DR.en.md) — disaster recovery, RTO/RPO
- [NETWORKING.en.md](NETWORKING.en.md) — BGP, DNS, VXLAN, traffic steering
- [STORAGE.en.md](STORAGE.en.md) — storage replication

## Sources

Links, books, and standards: [sources/infrastructure/sources.en.md](sources/infrastructure/sources.en.md)

*Last revision: 2026-06-12*