# 🏗️ Data Center Migration

## Migration strategies

| Strategy | RTO | RPO | Risk | Cost | Duration | Description |
|----------|-----|-----|------|------|----------|-------------|
| **Cold / Big Bang** | hours–days | days | High | Low | days | Shut everything down, move, power up |
| **Phased / Wave** | minutes (per wave) | minutes | Medium | Medium | weeks–months | Workloads moved in waves |
| **Rolling** | 0 (live) | 0 | Low | High | months | Live migration per VM/service |
| **Parallel Run** | 0 | 0 | Low | Very high | months | Both DCs operational, gradual cutover |
| **Pilot Light** | hours | minutes | Medium | Low | weeks | Critical services in new DC, rest migrates |
| **Lift & Shift** | hours | minutes | Medium | Low | weeks | VMs/servers moved without configuration changes |
| **Re-platform** | hours | minutes | Low | Medium | months | Optimization during migration (OS upgrade, resize) |
| **Re-architect** | 0 | 0 | Low | High | months–years | Application redesigned for new platform |

---

## Decision tree

```mermaid
flowchart TD
    Start(["DC Migration"]) --> APP{"Application\nstateful?"}
    APP -->|"Yes"| DOWNTIME{"Tolerates\ndowntime?"}
    APP -->|"No"| ROLLING["Rolling / Parallel Run"]

    DOWNTIME -->|"Yes, hours+"| COLD["Cold / Big Bang\nSimplest, cheapest\nRisk: all at once"]
    DOWNTIME -->|"Yes, minutes"| PHASED["Phased / Wave\nBy application / business unit"]
    DOWNTIME -->|"No (zero downtime)"| SYNC{"Sync replication\npossible?"}

    SYNC -->|"Yes, < 100 km"| ROLLING
    SYNC -->|"No"| PARALLEL["Parallel Run\nBoth DCs active, gradual cutover"]

    ROLLING --> ROLL_HA{"VMware,\nHyper-V?"}
    ROLL_HA -->|"Yes"| VMOTION["vMotion / Storage vMotion\nLive migration, 0 downtime"]
    ROLL_HA -->|"No"| ROLL_REPL["Storage + DB replication\nGradual workload migration"]
```

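The flowchart can also be read as a small function. The sketch below is one possible encoding, not part of the original document: the name `choose_strategy` and the string labels for the downtime answers (`"hours"`, `"minutes"`, `"none"`) are assumptions made for illustration, while the returned names are the node labels from the chart. It stops at the strategy choice and does not model the hypervisor branch.

```python
def choose_strategy(stateful: bool,
                    downtime: str = "none",
                    sync_replication: bool = False) -> str:
    """Walk the decision tree and return the label of the strategy node.

    downtime is a hypothetical label: "hours", "minutes" or "none".
    sync_replication means synchronous replication between the DCs is
    feasible (the chart uses < 100 km as the rule of thumb).
    """
    if not stateful:
        # Stateless applications go straight to the rolling branch.
        return "Rolling / Parallel Run"
    if downtime == "hours":
        return "Cold / Big Bang"
    if downtime == "minutes":
        return "Phased / Wave"
    if downtime != "none":
        raise ValueError(f"unknown downtime label: {downtime!r}")
    # Zero downtime required: the answer depends on sync replication.
    return "Rolling / Parallel Run" if sync_replication else "Parallel Run"


# Example: a stateful application that tolerates a few minutes of downtime.
print(choose_strategy(stateful=True, downtime="minutes"))  # prints "Phased / Wave"
```
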
---

## Migration phases

### 1. Discovery and assessment

| Task | Tools | Output |
|------|-------|--------|
| HW and SW inventory | RVTools, NetBox, CMDB | Server, VM, and service list |
| Dependency mapping | ServiceNow, AppDynamics, manual | Application dependency graph |
| Traffic analysis | NetFlow, sFlow, vRNI | Bandwidth, latency, peak usage |
| Performance baseline | Prometheus, Zabbix, vRealize | CPU/RAM/disk/network per workload |
| License audit | Flexera, SAM | Licenses, support, compliance |

**Output:** workload list with RTO/RPO, dependencies, and criticality.

### 2. Planning

- **Wave plan** — divide workloads into migration waves (10–50 VMs per wave)
- **Dependency ordering** — migrate DNS, NTP, LDAP, and PKI first
- **Cutover window** — time window for the switch (typically a weekend)
- **Rollback plan** — conditions and procedure for reverting
- **Test plan** — what to test after migration, and how
- **Communication plan** — who is informed, when, and how

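Wave planning and dependency ordering combine naturally into a topological sort: a workload may only migrate after everything it depends on. The following is a minimal sketch using Python's standard `graphlib`; the function name `plan_waves`, the workload names, and the default cap of 50 per wave (taken from the upper end of the range above) are illustrative assumptions.

```python
from graphlib import TopologicalSorter


def plan_waves(dependencies: dict[str, set[str]],
               max_per_wave: int = 50) -> list[list[str]]:
    """Group workloads into waves so dependencies always migrate earlier.

    dependencies maps a workload to the set of workloads it depends on.
    Workloads that become ready together are split into chunks of at
    most max_per_wave.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves: list[list[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted for a stable plan
        sorter.done(*ready)
        for i in range(0, len(ready), max_per_wave):
            waves.append(ready[i:i + max_per_wave])
    return waves


# Hypothetical example: core services first, then what builds on them.
deps = {"ldap": {"dns", "ntp"}, "db": {"dns"}, "app": {"ldap", "db"}}
print(plan_waves(deps))
# prints [['dns', 'ntp'], ['db', 'ldap'], ['app']]
```

A real plan would also weigh criticality and business calendars; this only guarantees that the ordering constraint holds.
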
### 3. New DC preparation

- **Infrastructure** — DNS, NTP, DHCP, LDAP/AD, PKI, monitoring (see [DATACENTERS.en.md](DATACENTERS.en.md) — deployment order)
- **Network** — BGP peering, VXLAN/VLAN, firewall rules, load balancers
- **Storage** — SAN zoning, NAS exports, Ceph cluster
- **Virtualization** — vCenter, Hyper-V cluster, Proxmox

### 4. Replication and synchronization

| Layer | Method | Tools |
|-------|--------|-------|
| **Storage (block)** | SAN sync/async mirror, LUN replication | NetApp SnapMirror, Dell EMC RecoverPoint, Pure ActiveCluster |
| **Storage (file)** | DFS-R, rsync, robocopy | Windows DFS, Rsync |
| **Storage (object)** | Cross-region replication | MinIO replication, S3 CRR |
| **Databases** | Log shipping, CDC, streaming replication | PostgreSQL Patroni, Oracle Data Guard, MSSQL AlwaysOn, MySQL Group Replication |
| **VM** | Storage vMotion, replication | VMware vSphere Replication, Hyper-V Replica, Zerto |
| **Kubernetes** | Velero + Restic, Rook Ceph mirror | Velero, Rook |

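Whichever replication method is used, the initial copy has to fit into the schedule, so a rough transfer-time estimate is useful when sizing the inter-DC link. The sketch below is back-of-the-envelope arithmetic only: the function name `seed_time_hours` and the default 70 % link efficiency are assumptions, decimal units are used (1 TB = 10^12 bytes), and compression, deduplication, and change rate during the copy are ignored.

```python
def seed_time_hours(data_tb: float, link_gbps: float,
                    efficiency: float = 0.7) -> float:
    """Estimate hours needed to copy data_tb terabytes over a link.

    efficiency is the assumed usable fraction of the nominal bandwidth
    (protocol overhead, shared links); 0.7 is a guess, not a measurement.
    """
    if data_tb < 0 or link_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("invalid input")
    bits = data_tb * 1e12 * 8                      # terabytes -> bits
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 3600


# Hypothetical example: 100 TB over a 10 Gbit/s link at 70 % efficiency.
print(f"{seed_time_hours(100, 10):.1f} h")  # prints "31.7 h"
```
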
### 5. Workload migration

#### Wave migration (recommended for medium/large DCs)

```mermaid
gantt
    title Wave migration
    dateFormat YYYY-MM-DD
    section Wave 1 - Core
    DNS, NTP, LDAP :done, w1a, 2026-07-01, 3d
    Monitoring + logging :done, w1b, after w1a, 2d
    section Wave 2 - Network
    Load balancers :active, w2a, 2026-07-06, 2d
    Firewalls :active, w2b, 2026-07-08, 2d
    section Wave 3 - Storage
    NAS migration :w3a, 2026-07-10, 5d
    SAN replication :w3b, 2026-07-10, 3d
    section Wave 4 - Dev/Test
    Dev VMs :w4a, 2026-07-15, 5d
    section Wave 5 - Prod tier 3
    Internal apps :w5a, 2026-07-22, 5d
    section Wave 6 - Prod tier 2
    Business apps :w6a, 2026-07-29, 5d
    section Wave 7 - Prod tier 1
    Critical apps :w7a, 2026-08-05, 5d
```

#### Typical single-wave procedure

1. **Day -7**: Start data replication (initial seed)
2. **Day -1**: Incremental sync, final test
3. **Day 0 (cutover)**:
   - Stop application in source DC
   - Final sync (last delta)
   - Start application in target DC
   - DNS/traffic switch
   - Smoke test
4. **Day +1**: Monitoring (performance, errors, lag)
5. **Day +7**: End of rollback window (success confirmation)

### 6. Network strategies

#### IP re-addressing

| Approach | Description | Pros | Cons |
|----------|-------------|------|------|
| **Keep IP** | Same IPs, BGP anycast or stretched VLAN | No application config changes | Stretched VLAN/L2 limitations |
| **Change IP** | New IP range, DNS/BGP routing change | Clean architecture | Config changes, DNS TTL |
| **NAT translation** | NAT between old and new IP space | No application changes | Latency, troubleshooting complexity |

**Keep IP** is only possible with:

- L2 stretch between DCs (VXLAN, OTV) — distance limited
- BGP anycast for VIPs (load balancers)
- Applications tolerant of ARP cache changes

#### DNS cutover

```
1. Lower TTL to 60–300 s (one week ahead)
2. At cutover, change A/AAAA records to new IPs
3. Wait for propagation (per TTL)
4. Monitor traffic
```

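The timing behind these steps follows from how caching works: a resolver may keep serving a record for one full TTL after it fetched it. The sketch below makes that arithmetic explicit; the function names and the example timestamps are hypothetical, and resolvers that ignore or clamp TTLs are not modelled.

```python
from datetime import datetime, timedelta


def earliest_cutover(ttl_lowered_at: datetime, old_ttl_s: int) -> datetime:
    """Earliest moment at which every cache has seen the lowered TTL.

    A record cached just before the change can live for up to the old TTL.
    """
    return ttl_lowered_at + timedelta(seconds=old_ttl_s)


def propagation_done(records_changed_at: datetime, new_ttl_s: int) -> datetime:
    """Moment after which well-behaved resolvers return the new IPs."""
    return records_changed_at + timedelta(seconds=new_ttl_s)


# Hypothetical example: old TTL of one day, new TTL of 300 s.
lowered = datetime(2026, 7, 1, 12, 0)
print(earliest_cutover(lowered, 86_400))                    # prints 2026-07-02 12:00:00
print(propagation_done(datetime(2026, 7, 8, 2, 0), 300))    # prints 2026-07-08 02:05:00
```

Lowering the TTL a week ahead, as in step 1, leaves a wide margin over the old TTL; the old DC should keep serving traffic at least until the propagation time has passed.
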
#### Traffic steering

| Technique | Use case |
|-----------|----------|
| **BGP** | Change AS path / local pref for traffic steering |
| **DNS** | Lower TTL, change A records |
| **Load balancer** | Change pool members, health check |
| **GSLB** | Global Server Load Balancing (F5 GTM, NSX ALB) |
| **Cloud DNS** | Amazon Route 53, Azure Traffic Manager, Google Cloud DNS |

### 7. Database migration

See individual DB files for details. Summary table:

| DB | Method | RPO | RTO | Note |
|----|--------|-----|-----|------|
| **PostgreSQL** | Streaming replication + Patroni switchover | 0 (sync) / ~MB of WAL lag (async) | minutes | Patroni auto-failover |
| **MySQL** | Group Replication / async replication | 0 (sync) / seconds | minutes | InnoDB Cluster |
| **Oracle** | Data Guard switchover | 0 (sync) | minutes | Far sync for remote DCs |
| **MSSQL** | AlwaysOn AG failover | 0 (sync) | minutes | Cloud witness |
| **MongoDB** | Replica set election | seconds | < 1 min | Priority-based failover |
| **Cassandra** | Multi-DC replication | eventual | 0 | Native multi-master |

### 8. Testing

| Phase | What to test | Method |
|-------|--------------|--------|
| **Pre-migration** | Application in new DC (isolated) | Dry run on replicated data |
| **Cutover** | Functionality, availability, latency | Smoke test, synthetic transactions |
| **Post-migration** | Performance, integration, monitoring | A/B comparison with baseline, canary traffic |
| **Rollback** | Return to old DC | Tested rollback plan |

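The post-migration comparison with the baseline can be automated in a few lines. The sketch below is an illustration under stated assumptions: the function name `regressions` and the metric names are hypothetical, every metric is treated as "higher is worse" (latency, error rate), and the 20 % default threshold is borrowed from the rollback condition used elsewhere in this document.

```python
def regressions(baseline: dict[str, float],
                current: dict[str, float],
                threshold: float = 0.20) -> dict[str, float]:
    """Return metrics that got worse than baseline by more than threshold.

    Values are relative changes, e.g. 0.25 means 25 % worse.
    Metrics missing from current or with a non-positive baseline are skipped.
    """
    worse: dict[str, float] = {}
    for name, base in baseline.items():
        if name not in current or base <= 0:
            continue
        change = (current[name] - base) / base
        if change > threshold:
            worse[name] = round(change, 3)
    return worse


# Hypothetical measurements from the old DC (baseline) and the new DC.
baseline = {"p95_latency_ms": 120.0, "error_rate": 0.010}
current = {"p95_latency_ms": 150.0, "error_rate": 0.011}
print(regressions(baseline, current))  # prints {'p95_latency_ms': 0.25}
```

A non-empty result would be a signal to investigate or roll back, not a verdict on its own; throughput-style metrics where higher is better need the sign flipped.
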
### 9. Rollback plan

Each wave must have a defined rollback:

| Condition | Action |
|-----------|--------|
| Application fails to start in new DC | DNS switch back, stop replication |
| Performance worse than baseline (> 20 %) | Rollback, root cause analysis |
| Integration failure (API timeout, DB connection) | Rollback, dependency check |
| Security incident | Rollback, forensic analysis |

Rollback must be tested **before** the real cutover.

---

## Special cases

### Mainframe migration

- **IBM z/OS** — GDPS (Geographically Dispersed Parallel Sysplex)
- HyperSwap for storage mirroring
- Cross-system coupling facility (XCF)
- Often the last migrated component

### COTS applications (Oracle EBS, SAP)

- Require vendor-specific migration procedures
- Oracle EBS: AutoConfig, cloning (ADXLC)
- SAP: System Copy (Homogeneous / Heterogeneous), SWPM, SUM
- Possible re-licensing on hardware change

### Cloud migration (On-prem → Cloud)

See [CLOUD.en.md](CLOUD.en.md) — migration strategies (6 Rs):

| Strategy | Description |
|----------|-------------|
| **Re-host (Lift & Shift)** | VM → Cloud VM (AWS MGN, Azure Migrate) |
| **Re-platform** | OS upgrade, managed DB (RDS, Cloud SQL) |
| **Re-architect** | Application rewritten as cloud-native |
| **Retire** | Decommission unnecessary applications |
| **Retain** | Application stays on-prem (review later) |
| **Repurchase** | SaaS replacement |

---

## Recommended approach per DC size

| DC Size | VM Count | Recommended strategy | Duration | Team |
|---------|----------|----------------------|----------|------|
| **Small** | < 50 | Big Bang (weekend) | 2–4 days | 3–5 people |
| **Medium** | 50–500 | Phased (5–10 waves) | 2–8 weeks | 5–10 people |
| **Large** | 500–5000 | Phased + Rolling | 3–12 months | 10–30 people |
| **Enterprise** | 5000+ | Parallel Run / Rolling | 12–36 months | 30+ people |

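For tooling that pre-fills a migration proposal, the table can be expressed as a lookup. This is a sketch with a hypothetical function name, `recommend`. Because the ranges in the table share their boundary values (500 and 5000), the code assumes each lower bound is inclusive, so exactly 500 VMs counts as Large and exactly 5000 as Enterprise; that tie-break is an assumption, not something the table states.

```python
def recommend(vm_count: int) -> tuple[str, str]:
    """Map a VM count to (DC size class, recommended strategy).

    Lower bounds are treated as inclusive (assumption): 50 -> Medium,
    500 -> Large, 5000 -> Enterprise.
    """
    if vm_count < 0:
        raise ValueError("vm_count must be non-negative")
    if vm_count < 50:
        return ("Small", "Big Bang (weekend)")
    if vm_count < 500:
        return ("Medium", "Phased (5–10 waves)")
    if vm_count < 5000:
        return ("Large", "Phased + Rolling")
    return ("Enterprise", "Parallel Run / Rolling")


# Hypothetical example: an estate of 320 VMs.
print(recommend(320))  # prints ('Medium', 'Phased (5–10 waves)')
```
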
---

## Related

- [DATACENTERS.en.md](DATACENTERS.en.md) — DC topologies, secondary DC, deployment order
- [CLOUD.en.md](CLOUD.en.md) — cloud migration strategies (6 Rs)
- [DR.en.md](DR.en.md) — disaster recovery, RTO/RPO
- [NETWORKING.en.md](NETWORKING.en.md) — BGP, DNS, VXLAN, traffic steering
- [STORAGE.en.md](STORAGE.en.md) — storage replication

## Sources

Links, books, and standards: [sources/infrastructure/sources.md](sources/infrastructure/sources.md)

*Last revision: 2026-06-12*