Stanislav Hubacek b53714113c new files

2026-06-16 15:47:45 +02:00


🏗️ Data Center Migration

Migration strategies

Strategy	RTO	RPO	Risk	Cost	Duration	Description
Cold / Big Bang	hours–days	days	High	Low	days	Shut everything down, move, power up
Phased / Wave	minutes (per wave)	minutes	Medium	Medium	weeks–months	Workloads moved in waves
Rolling	0 (live)	0	Low	High	months	Live migration per VM/service
Parallel Run	0	0	Low	Very high	months	Both DCs operational, gradual cutover
Pilot Light	hours	minutes	Medium	Low	weeks	Critical services in new DC, rest migrates
Lift & Shift	hours	minutes	Medium	Low	weeks	VMs/servers moved without configuration changes
Re-platform	hours	minutes	Low	Medium	months	Optimization during migration (OS upgrade, resize)
Re-architect	0	0	Low	High	months–years	Application redesigned for new platform

Decision tree

flowchart TD
    Start(["DC Migration"]) --> APP{"Application\nstateful?"}
    APP -->|"Yes"| DOWNTIME{"Tolerates\ndowntime?"}
    APP -->|"No"| ROLLING["Rolling / Parallel Run"]

    DOWNTIME -->|"Yes, hours+"| COLD["Cold / Big Bang\nSimplest, cheapest\nRisk: all at once"]
    DOWNTIME -->|"Yes, minutes"| PHASED["Phased / Wave\nBy application / business unit"]
    DOWNTIME -->|"No (zero downtime)"| SYNC{"Sync replication\npossible?"}

    SYNC -->|"Yes, < 100 km"| ROLLING
    SYNC -->|"No"| PARALLEL["Parallel Run\nBoth DCs active, gradual cutover"]

    ROLLING --> ROLL_HA{"VMware,\nHyper-V?"}
    ROLL_HA -->|"Yes"| VMOTION["vMotion / Storage vMotion (VMware)\nLive Migration (Hyper-V), 0 downtime"]
    ROLL_HA -->|"No"| ROLL_REPL["Storage + DB replication\nGradual workload migration"]
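
The decision tree can also be read as a small function. The sketch below is one possible encoding in Python; the function name, the parameter names, and the returned labels are illustrative assumptions and do not come from any tool.

```python
def choose_strategy(stateful: bool, downtime: str = "none",
                    sync_replication: bool = False,
                    live_migration: bool = False) -> str:
    """Walk the decision tree and return the label of the leaf reached.

    downtime: "hours" (hours or more), "minutes", or "none" (zero downtime).
    live_migration: True if the hypervisor supports live migration.
    """
    if stateful:
        if downtime == "hours":
            return "Cold / Big Bang"
        if downtime == "minutes":
            return "Phased / Wave"
        if downtime != "none":
            raise ValueError(f"unknown downtime tolerance: {downtime!r}")
        if not sync_replication:
            return "Parallel Run"
    # Stateless workloads, and stateful ones with sync replication,
    # both end up on the Rolling branch of the tree.
    if live_migration:
        return "Rolling: hypervisor live migration"
    return "Rolling: storage + DB replication"


print(choose_strategy(stateful=True, downtime="minutes"))
```

The hypothetical `downtime` values mirror the three edges leaving the "Tolerates downtime?" node; anything else is rejected rather than silently mapped to a strategy.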

Migration phases

1. Discovery and assessment

Task	Tools	Output
HW and SW inventory	RVTools, NetBox, CMDB	Server, VM, and service list
Dependency mapping	ServiceNow, AppDynamics, manual	Application dependency graph
Traffic analysis	NetFlow, sFlow, vRNI	Bandwidth, latency, peak usage
Performance baseline	Prometheus, Zabbix, vRealize	CPU/RAM/disk/network per workload
License audit	Flexera, SAM	Licenses, support, compliance

Output: workload list with RTO/RPO, dependencies, and criticality.

2. Planning

Wave plan — workload division into migration waves (10–50 VMs per wave)
Dependency ordering — DNS, NTP, LDAP, PKI first
Cutover window — time window for the switch (typically a weekend)
Rollback plan — conditions and procedure for reversal
Test plan — what to test post-migration and how
Communication plan — who is informed, when, and how
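
Dependency ordering and wave sizing can be derived mechanically from the dependency graph produced in the discovery phase. The following is a minimal sketch using Python's standard `graphlib`; the function name `plan_waves`, the sample workload names, and the default cap of 50 are assumptions chosen to match the 10–50 VMs per wave guideline above.

```python
from graphlib import TopologicalSorter


def plan_waves(dependencies: dict[str, set[str]],
               max_per_wave: int = 50) -> list[list[str]]:
    """Group workloads into waves so each migrates after its dependencies.

    dependencies maps a workload to the set of workloads it depends on.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        # Split a large dependency level so no wave exceeds the size cap.
        for i in range(0, len(ready), max_per_wave):
            waves.append(ready[i:i + max_per_wave])
        sorter.done(*ready)
    return waves


# Hypothetical workloads: core services first, applications last.
deps = {
    "dns": set(),
    "ntp": set(),
    "ldap": {"dns", "ntp"},
    "pki": {"ldap"},
    "crm-app": {"ldap", "pki"},
}
for number, wave in enumerate(plan_waves(deps), start=1):
    print(f"Wave {number}: {', '.join(wave)}")
```

A real wave plan would also weigh criticality and business calendars; this sketch only guarantees that nothing moves before the services it depends on.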

3. New DC preparation

Infrastructure — DNS, NTP, DHCP, LDAP/AD, PKI, monitoring (see DATACENTERS.en.md — deployment order)
Network — BGP peering, VXLAN/VLAN, firewall rules, load balancers
Storage — SAN zoning, NAS exports, Ceph cluster
Virtualization — vCenter, Hyper-V cluster, Proxmox

4. Replication and synchronization

Layer	Method	Tools
Storage (block)	SAN sync/async mirror, LUN replication	NetApp SnapMirror, Dell EMC RecoverPoint, Pure ActiveCluster
Storage (file)	DFS-R, rsync, robocopy	Windows DFS, Rsync
Storage (object)	Cross-region replication	MinIO replication, S3 CRR
Databases	Log shipping, CDC, streaming replication	PostgreSQL Patroni, Oracle Data Guard, MSSQL AlwaysOn, MySQL Group Replication
VM	Storage vMotion, replication	VMware vSphere Replication, Hyper-V Replica, Zerto
Kubernetes	Velero + Restic, Rook Ceph mirror	Velero, Rook
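
Whichever replication method is used, the initial seed has to fit into the schedule. A rough estimate follows from data volume and link bandwidth; the sketch below assumes a hypothetical `efficiency` factor of 0.7 to account for protocol overhead and competing traffic, which should be replaced by a measured value.

```python
def seed_time_hours(data_tib: float, link_gbps: float,
                    efficiency: float = 0.7) -> float:
    """Estimate the hours needed to copy data_tib TiB over a link_gbps link.

    efficiency is an assumed fraction of the nominal bandwidth that is
    actually usable for replication.
    """
    if link_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_gbps must be > 0 and efficiency in (0, 1]")
    data_bits = data_tib * 2**40 * 8
    usable_bits_per_second = link_gbps * 1e9 * efficiency
    return data_bits / usable_bits_per_second / 3600


# 100 TiB over a 10 Gbit/s inter-DC link: roughly a day and a half.
print(f"{seed_time_hours(100, 10):.1f} h")
```

If the estimate exceeds the time available before cutover, the usual options are a faster link, a physical seed transfer, or smaller waves.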

5. Workload migration

Wave migration (recommended for medium/large DCs)

gantt
    title Wave migration
    dateFormat  YYYY-MM-DD
    section Wave 1 - Core
    DNS, NTP, LDAP    :done, w1a, 2026-07-01, 3d
    Monitoring + logging :done, w1b, after w1a, 2d
    section Wave 2 - Network
    Load balancers     :active, w2a, 2026-07-06, 2d
    Firewalls          :active, w2b, 2026-07-08, 2d
    section Wave 3 - Storage
    NAS migration      :w3a, 2026-07-10, 5d
    SAN replication    :w3b, 2026-07-10, 3d
    section Wave 4 - Dev/Test
    Dev VMs            :w4a, 2026-07-15, 5d
    section Wave 5 - Prod tier 3
    Internal apps      :w5a, 2026-07-22, 5d
    section Wave 6 - Prod tier 2
    Business apps      :w6a, 2026-07-29, 5d
    section Wave 7 - Prod tier 1
    Critical apps      :w7a, 2026-08-05, 5d

Typical single wave procedure:

Day -7: Start data replication (initial seed)
Day -1: Incremental sync, final test
Day 0 (cutover):
- Stop application in source DC
- Final sync (last delta)
- Start application in target DC
- Switch DNS/traffic
- Smoke test
Day +1: Monitoring (performance, errors, lag)
Day +7: Rollback window closes (success confirmation)
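
For planning, the relative days of the procedure can be turned into calendar dates per wave. A minimal sketch, assuming the offsets listed above; the `MILESTONES` table and the function name are illustrative.

```python
from datetime import date, timedelta

# Day offsets relative to cutover, taken from the procedure above.
MILESTONES = {
    -7: "Start data replication (initial seed)",
    -1: "Incremental sync, final test",
    0: "Cutover",
    1: "Monitoring",
    7: "Rollback window closes",
}


def wave_schedule(cutover: date) -> list[tuple[date, str]]:
    """Return (date, milestone) pairs for one wave, in chronological order."""
    return [(cutover + timedelta(days=offset), label)
            for offset, label in sorted(MILESTONES.items())]


for day, label in wave_schedule(date(2026, 7, 11)):
    print(day.isoformat(), label)
```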

6. Network strategies

IP re-addressing

Approach	Description	Pros	Cons
Keep IP	Same IPs, BGP anycast or stretch VLAN	No application config changes	Stretched VLAN/L2 limitations
Change IP	New IP range, DNS/BGP routing change	Clean architecture	Config changes, DNS TTL
NAT translation	NAT between old and new IP space	No application changes	Latency, troubleshooting complexity

Keep IP is only possible with:

L2 stretch between DCs (VXLAN, OTV) — distance limited
BGP anycast for VIPs (load balancers)
Applications tolerant to ARP cache changes
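
For the Change IP approach, a common simplification is to keep each host's offset within its subnet so that old and new addresses map one to one. The sketch below uses Python's standard `ipaddress` module; the function name and the example ranges are assumptions for illustration.

```python
import ipaddress


def remap_ip(old_ip: str, old_net: str, new_net: str) -> str:
    """Map old_ip from old_net to the same host offset in new_net."""
    old_network = ipaddress.ip_network(old_net)
    new_network = ipaddress.ip_network(new_net)
    address = ipaddress.ip_address(old_ip)
    if address not in old_network:
        raise ValueError(f"{old_ip} is not in {old_net}")
    if (old_network.version != new_network.version
            or old_network.prefixlen != new_network.prefixlen):
        raise ValueError("old and new networks must have the same family and size")
    offset = int(address) - int(old_network.network_address)
    return str(new_network.network_address + offset)


# Hypothetical ranges: host .37 keeps its position in the new subnet.
print(remap_ip("10.1.20.37", "10.1.20.0/24", "172.16.5.0/24"))
```

Keeping offsets stable makes firewall rule translation and troubleshooting easier, because the last octet of a host stays recognizable after the move.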

DNS cutover

1. Lower TTL to 60–300 s (one week ahead)
2. At cutover, change A/AAAA records to new IPs
3. Wait for propagation (per TTL)
4. Monitor traffic
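
The timing of steps 1 and 3 follows from the TTL values: a resolver that cached the record just before the TTL was lowered keeps it for the full old TTL, so the TTL must be lowered at least one old TTL before cutover. A minimal sketch, assuming resolvers honor TTLs (some clamp them to their own minimum or maximum); the function name is illustrative.

```python
from datetime import datetime, timedelta


def dns_cutover_times(cutover: datetime, old_ttl_s: int,
                      new_ttl_s: int) -> tuple[datetime, datetime]:
    """Return (latest moment to lower the TTL, moment caches should be refreshed)."""
    # Caches filled just before the change keep the record for old_ttl_s.
    lower_ttl_by = cutover - timedelta(seconds=old_ttl_s)
    # After the record change, stale answers expire within new_ttl_s.
    propagated_by = cutover + timedelta(seconds=new_ttl_s)
    return lower_ttl_by, propagated_by


lower_by, done_by = dns_cutover_times(datetime(2026, 7, 11, 22, 0), 86400, 300)
print("Lower TTL no later than:", lower_by)
print("Old record expired by:  ", done_by)
```

Lowering the TTL a week ahead, as in step 1, leaves a comfortable margin over this minimum for any common TTL value.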

Traffic steering

Technique	Use case
BGP	Change AS path / local pref for traffic steering
DNS	Lower TTL, change A records
Load balancer	Change pool members, health check
GSLB	Global Server Load Balancing (F5 GTM, NSX ALB)
Cloud DNS	AWS Route53, Azure Traffic Manager, Google Cloud DNS

7. Database migration

See individual DB files for details. Summary table:

DB	Method	RPO	RTO	Note
PostgreSQL	Streaming replication + Patroni switchover	0 (sync) / ~MB of WAL (async)	minutes	Patroni auto-failover
MySQL	Group Replication / async replication	~0 (Group Replication) / seconds (async)	minutes	InnoDB Cluster
Oracle	Data Guard switchover	0 (sync)	minutes	Far sync for remote DCs
MSSQL	Always On AG failover	0 (sync)	minutes	Cloud witness
MongoDB	Replica set election	seconds	< 1 min	Priority-based failover
Cassandra	Multi-DC replication	eventual	0	Native multi-master
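
For asynchronous PostgreSQL replication, the RPO at any moment is the amount of WAL the replica has not yet received or replayed. A PostgreSQL LSN is written as two hexadecimal numbers separated by a slash (high and low 32 bits), so the lag in bytes can be computed as below. This is a sketch; in practice the values would come from `pg_current_wal_lsn()` on the primary and `pg_last_wal_replay_lsn()` on the replica, and the sample LSNs are made up.

```python
def lsn_to_bytes(lsn: str) -> int:
    """Convert a PostgreSQL LSN such as '16/B374D848' to a byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) + int(low, 16)


def replication_lag_bytes(primary_lsn: str, replica_lsn: str) -> int:
    """Return how many bytes of WAL the replica is behind the primary."""
    return max(0, lsn_to_bytes(primary_lsn) - lsn_to_bytes(replica_lsn))


# Made-up sample values for illustration.
lag = replication_lag_bytes("16/B374D848", "16/B3740000")
print(f"Replica is {lag} bytes behind")
```

A switchover is typically started only once this lag has dropped to zero after the application has been stopped in the source DC.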

8. Testing

Phase	What to test	Method
Pre-migration	Application in new DC (isolated)	Dry run on replicated data
Cutover	Functionality, availability, latency	Smoke test, synthetic transactions
Post-migration	Performance, integration, monitoring	A/B comparison with baseline, canary traffic
Rollback	Return to old DC	Tested rollback plan

9. Rollback plan

Each wave must have a defined rollback:

Condition	Action
Application fails to start in new DC	DNS switch back, stop replication
Performance worse than baseline (> 20 %)	Rollback, root cause analysis
Integration failure (API timeout, DB connection)	Rollback, dependency check
Security incident	Rollback, forensic analysis

Rollback must be tested before the real cutover.
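
The conditions in the table can be checked automatically after cutover. The sketch below is one way to encode them; the `WaveHealth` fields, the use of latency as the performance metric, and the function name are assumptions, while the 20 % threshold comes from the table.

```python
from dataclasses import dataclass


@dataclass
class WaveHealth:
    """Post-cutover observations for one wave (hypothetical field names)."""
    app_started: bool
    baseline_latency_ms: float
    current_latency_ms: float
    integration_ok: bool
    security_incident: bool = False


def rollback_reasons(health: WaveHealth, max_degradation: float = 0.20) -> list[str]:
    """Return the rollback conditions that are met; empty means keep going."""
    reasons = []
    if not health.app_started:
        reasons.append("application failed to start in new DC")
    if health.current_latency_ms > health.baseline_latency_ms * (1 + max_degradation):
        reasons.append("performance worse than baseline")
    if not health.integration_ok:
        reasons.append("integration failure")
    if health.security_incident:
        reasons.append("security incident")
    return reasons


observed = WaveHealth(app_started=True, baseline_latency_ms=40.0,
                      current_latency_ms=55.0, integration_ok=True)
print(rollback_reasons(observed))
```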

Special cases

Mainframe migration

IBM z/OS — GDPS (Geographically Dispersed Parallel Sysplex)
HyperSwap for transparent failover between mirrored storage volumes
Cross-system coupling facility (XCF)
Often the last migrated component

COTS applications (Oracle EBS, SAP)

Require vendor-specific migration procedures
Oracle EBS: AutoConfig, cloning (ADXLC)
SAP: System Copy (Homogeneous / Heterogeneous), SWPM, SUM
Re-licensing may be required on hardware change

Cloud migration (On-prem → Cloud)

See CLOUD.en.md — migration strategies (6 Rs):

Strategy	Description
Re-host (Lift & Shift)	VM → Cloud VM (AWS MGN, Azure Migrate)
Re-platform	OS upgrade, managed DB (RDS, Cloud SQL)
Re-architect	Application rewritten as cloud-native
Retire	Decommission unnecessary applications
Retain	Application stays on-prem (review later)
Repurchase	SaaS replacement

Recommended approach per DC size

DC Size	VM Count	Recommended strategy	Duration	Team
Small	< 50	Big Bang (weekend)	2–4 days	3–5 people
Medium	50–500	Phased (5–10 waves)	2–8 weeks	5–10 people
Large	500–5000	Phased + Rolling	3–12 months	10–30 people
Enterprise	5000+	Parallel Run / Rolling	12–36 months	30+ people
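
The size table can be expressed as a simple lookup. The ranges in the table overlap at 500 and 5000 VMs; the sketch below assumes that 500 counts as Medium and 5000 as Enterprise, and the function name is illustrative.

```python
def recommend_by_size(vm_count: int) -> tuple[str, str]:
    """Return (DC size, recommended strategy) for a given VM count."""
    if vm_count < 0:
        raise ValueError("vm_count must not be negative")
    if vm_count < 50:
        return "Small", "Big Bang (weekend)"
    if vm_count <= 500:  # assumption: the boundary value 500 is Medium
        return "Medium", "Phased (5–10 waves)"
    if vm_count < 5000:  # assumption: the boundary value 5000 is Enterprise
        return "Large", "Phased + Rolling"
    return "Enterprise", "Parallel Run / Rolling"


print(recommend_by_size(320))
```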

Related documents

DATACENTERS.en.md — DC topologies, secondary DC, deployment order
CLOUD.en.md — cloud migration strategies (6 Rs)
DR.en.md — disaster recovery, RTO/RPO
NETWORKING.en.md — BGP, DNS, VXLAN, traffic steering
STORAGE.en.md — storage replication

Sources

Links, books, and standards: sources/infrastructure/sources.md

Last revision: 2026-06-12
