# πŸ—οΈ Data Center Migration ## Migration strategies | Strategy | RTO | RPO | Risk | Cost | Duration | Description | |-----------|-----|-----|--------|---------|-------------|-------| | **Cold / Big Bang** | hours–days | days | High | Low | days | Shut everything down, move, power up | | **Phased / Wave** | minutes (per wave) | minutes | Medium | Medium | weeks–months | Workloads moved in waves | | **Rolling** | 0 (live) | 0 | Low | High | months | Live migration per VM/service | | **Parallel Run** | 0 | 0 | Low | Very high | months | Both DCs operational, gradual cutover | | **Pilot Light** | hours | minutes | Medium | Low | weeks | Critical services in new DC, rest migrates | | **Lift & Shift** | hours | minutes | Medium | Low | weeks | VMs/servers moved without configuration changes | | **Re-platform** | hours | minutes | Low | Medium | months | Optimization during migration (OS upgrade, resize) | | **Re-architect** | 0 | 0 | Low | High | months–years | Application redesigned for new platform | --- ## Decision tree ```mermaid flowchart TD Start(["DC Migration"]) --> APP{"Application\nstateful?"} APP -->|"Yes"| DOWNTIME{"Tolerates\ndowntime?"} APP -->|"No"| ROLLING["Rolling / Parallel Run"] DOWNTIME -->|"Yes, hours+"| COLD["Cold / Big Bang\nSimplest, cheapest\nRisk: all at once"] DOWNTIME -->|"Yes, minutes"| PHASED["Phased / Wave\nBy application / business unit"] DOWNTIME -->|"No (zero downtime)"| SYNC{"Sync replication\npossible?"} SYNC -->|"Yes, < 100 km"| ROLLING SYNC -->|"No"| PARALLEL["Parallel Run\nBoth DCs active, gradual cutover"] ROLLING --> ROLL_HA{"VMware,\nHyper-V?"} ROLL_HA -->|"Yes"| VMOTION["vMotion / Storage vMotion\nLive migration, 0 downtime"] ROLL_HA -->|"No"| ROLL_REPL["Storage + DB replication\nGradual workload migration"] ``` --- ## Migration phases ### 1. 
Discovery and assessment | Task | Tools | Output | |------|----------|--------| | HW and SW inventory | RVTools, NetBox, CMDB | Server, VM, and service list | | Dependency mapping | ServiceNow, AppDynamics, manual | Application dependency graph | | Traffic analysis | NetFlow, sFlow, vRNI | Bandwidth, latency, peak usage | | Performance baseline | Prometheus, Zabbix, vRealize | CPU/RAM/disk/network per workload | | License audit | Flexera, SAM | Licenses, support, compliance | **Output:** workload list with RTO/RPO, dependencies, and criticality. ### 2. Planning - **Wave plan** β€” workload division into migration waves (10–50 VMs per wave) - **Dependency ordering** β€” DNS, NTP, LDAP, PKI first - **Cutover window** β€” time window for switching (typically weekend) - **Rollback plan** β€” conditions and procedure for reversal - **Test plan** β€” what and how to test post-migration - **Communication plan** β€” who, when, how is informed ### 3. New DC preparation - **Infrastructure** β€” DNS, NTP, DHCP, LDAP/AD, PKI, monitoring (see [DATACENTERS.en.md](DATACENTERS.en.md) β€” deployment order) - **Network** β€” BGP peering, VXLAN/VLAN, firewall rules, load balancers - **Storage** β€” SAN zoning, NAS exports, Ceph cluster - **Virtualization** β€” vCenter, Hyper-V cluster, Proxmox ### 4. Replication and synchronization | Layer | Method | Tools | |--------|--------|----------| | **Storage (block)** | SAN sync/async mirror, LUN replication | NetApp SnapMirror, Dell EMC RecoverPoint, Pure ActiveCluster | | **Storage (file)** | DFS-R, rsync, robocopy | Windows DFS, Rsync | | **Storage (object)** | Cross-region replication | MinIO replication, S3 CRR | | **Databases** | Log shipping, CDC, streaming replication | PostgreSQL Patroni, Oracle Data Guard, MSSQL AlwaysOn, MySQL Group Replication | | **VM** | Storage vMotion, replication | VMware vSphere Replication, Hyper-V Replica, Zerto | | **Kubernetes** | Velero + Restic, Rook Ceph mirror | Velero, Rook | ### 5. 
Workload migration #### Wave migration (recommended for medium/large DCs) ```mermaid gantt title Wave migration dateFormat YYYY-MM-DD section Wave 1 - Core DNS, NTP, LDAP :done, w1a, 2026-07-01, 3d Monitoring + logging :done, w1b, after w1a, 2d section Wave 2 - Network Load balancers :active, w2a, 2026-07-06, 2d Firewalls :active, w2b, 2026-07-08, 2d section Wave 3 - Storage NAS migration :w3a, 2026-07-10, 5d SAN replication :w3b, 2026-07-10, 3d section Wave 4 - Dev/Test Dev VMs :w4a, 2026-07-15, 5d section Wave 5 - Prod tier 3 Internal apps :w5a, 2026-07-22, 5d section Wave 6 - Prod tier 2 Business apps :w6a, 2026-07-29, 5d section Wave 7 - Prod tier 1 Critical apps :w7a, 2026-08-05, 5d ``` #### Typical single wave procedure: 1. **Day -7**: Sync data replication (initial seed) 2. **Day -1**: Incremental sync, final test 3. **Day 0 (cutover)**: - Stop application in source DC - Final sync (last delta) - Start application in target DC - DNS/Traffic switch - Smoke test 4. **Day +1**: Monitoring (performance, errors, lag) 5. **Day +7**: Rollback window end (success confirmation) ### 6. Network strategies #### IP re-addressing | Approach | Description | Pros | Cons | |---------|-------|--------|----------| | **Keep IP** | Same IPs, BGP anycast or stretch VLAN | No application config changes | Stretched VLAN/L2 limitations | | **Change IP** | New IP range, DNS/BGP routing change | Clean architecture | Config changes, DNS TTL | | **NAT translation** | NAT between old and new IP space | No application changes | Latency, troubleshooting complexity | **Keep IP** is only possible with: - L2 stretch between DCs (VXLAN, OTV) β€” distance limited - BGP anycast for VIPs (load balancers) - Applications tolerant to ARP cache changes #### DNS cutover ``` 1. Lower TTL to 60–300 s (one week ahead) 2. At cutover, change A/AAAA records to new IPs 3. Wait for propagation (per TTL) 4. 
Monitor traffic ``` #### Traffic steering | Technique | Use case | |----------|----------| | **BGP** | Change AS path / local pref for traffic steering | | **DNS** | Lower TTL, change A records | | **Load balancer** | Change pool members, health check | | **GSLB** | Global Server Load Balancing (F5 GTM, NSX ALB) | | **Cloud DNS** | AWS Route53, Azure Traffic Manager, Google Cloud DNS | ### 7. Database migration See individual DB files for details. Summary table: | DB | Method | RPO | RTO | Note | |----|--------|-----|-----|----------| | **PostgreSQL** | Streaming replication + Patroni switchover | 0 (sync) / ~MB (async) | min | Patroni auto-failover | | **MySQL** | Group Replication / async replication | 0 (sync) / seconds | min | InnoDB Cluster | | **Oracle** | Data Guard switchover | 0 (sync) | min | Far sync for remote DCs | | **MSSQL** | AlwaysOn AG failover | 0 (sync) | min | Cloud witness | | **MongoDB** | Replica set election | seconds | < 1 min | Priority-based failover | | **Cassandra** | Multi-DC replication | eventual | 0 | Native multi-master | ### 8. Testing | Phase | What to test | Method | |------|-------------|--------| | **Pre-migration** | Application in new DC (isolated) | Dry run on replicated data | | **Cutover** | Functionality, availability, latency | Smoke test, synthetic transactions | | **Post-migration** | Performance, integration, monitoring | A/B comparison with baseline, canary traffic | | **Rollback** | Return to old DC | Tested rollback plan | ### 9. Rollback plan Each wave must have a defined rollback: | Condition | Action | |----------|------| | Application fails to start in new DC | DNS switch back, stop replication | | Performance worse than baseline (> 20 %) | Rollback, root cause analysis | | Integration failure (API timeout, DB connection) | Rollback, dependency check | | Security incident | Rollback, forensic analysis | Rollback must be tested **before** the real cutover. 
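The rollback table can also be turned into an automated go/no-go check that runs after each cutover. The sketch below is a minimal Python illustration, not part of any tool named in this document. `WaveHealth`, `rollback_reason`, and all field names are hypothetical. Using p95 latency as "performance" is an assumption, since the table does not name a metric. The only threshold taken from the table is the 20 % degradation limit.

```python
from dataclasses import dataclass
from typing import Optional

# From the rollback table: roll back when performance is more than 20 % worse than baseline.
PERF_DEGRADATION_LIMIT = 0.20


@dataclass
class WaveHealth:
    """Post-cutover observations for one wave (hypothetical fields, illustration only)."""
    app_started: bool        # did the application start in the new DC?
    baseline_p95_ms: float   # assumed metric: p95 latency measured in the source DC
    current_p95_ms: float    # same metric measured in the target DC after cutover
    integration_errors: int  # e.g. API timeouts, failed DB connections
    security_incident: bool


def rollback_reason(h: WaveHealth) -> Optional[str]:
    """Return the first rollback condition that applies, or None to keep the wave."""
    if h.baseline_p95_ms <= 0:
        raise ValueError("baseline latency must be positive")
    if not h.app_started:
        return "application fails to start: switch DNS back, stop replication"
    # Relative degradation against the baseline, e.g. 0.30 means 30 % slower.
    degradation = (h.current_p95_ms - h.baseline_p95_ms) / h.baseline_p95_ms
    if degradation > PERF_DEGRADATION_LIMIT:
        return f"performance {degradation:.0%} worse than baseline: roll back, root cause analysis"
    if h.integration_errors > 0:
        return "integration failure: roll back, check dependencies"
    if h.security_incident:
        return "security incident: roll back, forensic analysis"
    return None


if __name__ == "__main__":
    # 10 % slower than baseline: within the limit, the wave stays in the new DC.
    print(rollback_reason(WaveHealth(True, 100.0, 110.0, 0, False)))
    # 30 % slower than baseline: exceeds the 20 % limit, so a rollback is suggested.
    print(rollback_reason(WaveHealth(True, 100.0, 130.0, 0, False)))
```

The checks follow the order of the table. A real runbook might evaluate the security condition first, and would feed the fields from the monitoring system that produced the baseline in phase 1.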
---

## Special cases

### Mainframe migration

- **IBM z/OS** – GDPS (Geographically Dispersed Parallel Sysplex)
- HyperSwap for storage mirroring
- Cross-system coupling facility (XCF)
- Often the last component to be migrated

### COTS applications (Oracle EBS, SAP)

- Require vendor-specific migration procedures
- Oracle EBS: AutoConfig, cloning (ADXLC)
- SAP: System Copy (homogeneous / heterogeneous), SWPM, SUM
- Licenses may need to be re-issued after a hardware change

### Cloud migration (On-prem → Cloud)

See [CLOUD.en.md](CLOUD.en.md) for the migration strategies (6 Rs):

| Strategy | Description |
|----------|-------------|
| **Re-host (Lift & Shift)** | VM → Cloud VM (AWS MGN, Azure Migrate) |
| **Re-platform** | OS upgrade, managed DB (RDS, Cloud SQL) |
| **Re-architect** | Application rewritten as cloud-native |
| **Retire** | Decommission unnecessary applications |
| **Retain** | Application stays on-prem (review later) |
| **Repurchase** | SaaS replacement |

---

## Recommended approach per DC size

| DC size | VM count | Recommended strategy | Duration | Team |
|---------|----------|----------------------|----------|------|
| **Small** | < 50 | Big Bang (weekend) | 2–4 days | 3–5 people |
| **Medium** | 50–500 | Phased (5–10 waves) | 2–8 weeks | 5–10 people |
| **Large** | 500–5000 | Phased + Rolling | 3–12 months | 10–30 people |
| **Enterprise** | 5000+ | Parallel Run / Rolling | 12–36 months | 30+ people |

---

## Related

- [DATACENTERS.en.md](DATACENTERS.en.md) – DC topologies, secondary DC, deployment order
- [CLOUD.en.md](CLOUD.en.md) – cloud migration strategies (6 Rs)
- [DR.en.md](DR.en.md) – disaster recovery, RTO/RPO
- [NETWORKING.en.md](NETWORKING.en.md) – BGP, DNS, VXLAN, traffic steering
- [STORAGE.en.md](STORAGE.en.md) – storage replication

## Sources

Links, books, and standards: [sources/infrastructure/sources.en.md](sources/infrastructure/sources.en.md)

*Last revision: 2026-06-12*