new files

Stanislav Hubacek
2026-06-16 15:47:45 +02:00
parent 3fa11ef0f6
commit b53714113c
11 changed files with 2298 additions and 7 deletions

DC-MIGRATION.en.md Normal file

# 🏗️ Data Center Migration
## Migration strategies
| Strategy | RTO | RPO | Risk | Cost | Duration | Description |
|-----------|-----|-----|--------|---------|-------------|-------|
| **Cold / Big Bang** | hours–days | days | High | Low | days | Shut everything down, move, power up |
| **Phased / Wave** | minutes (per wave) | minutes | Medium | Medium | weeks–months | Workloads moved in waves |
| **Rolling** | 0 (live) | 0 | Low | High | months | Live migration per VM/service |
| **Parallel Run** | 0 | 0 | Low | Very high | months | Both DCs operational, gradual cutover |
| **Pilot Light** | hours | minutes | Medium | Low | weeks | Critical services in new DC, rest migrates |
| **Lift & Shift** | hours | minutes | Medium | Low | weeks | VMs/servers moved without configuration changes |
| **Re-platform** | hours | minutes | Low | Medium | months | Optimization during migration (OS upgrade, resize) |
| **Re-architect** | 0 | 0 | Low | High | months–years | Application redesigned for new platform |
---
## Decision tree
```mermaid
flowchart TD
Start(["DC Migration"]) --> APP{"Application\nstateful?"}
APP -->|"Yes"| DOWNTIME{"Tolerates\ndowntime?"}
APP -->|"No"| ROLLING["Rolling / Parallel Run"]
DOWNTIME -->|"Yes, hours+"| COLD["Cold / Big Bang\nSimplest, cheapest\nRisk: all at once"]
DOWNTIME -->|"Yes, minutes"| PHASED["Phased / Wave\nBy application / business unit"]
DOWNTIME -->|"No (zero downtime)"| SYNC{"Sync replication\npossible?"}
SYNC -->|"Yes, < 100 km"| ROLLING
SYNC -->|"No"| PARALLEL["Parallel Run\nBoth DCs active, gradual cutover"]
ROLLING --> ROLL_HA{"VMware,\nHyper-V?"}
ROLL_HA -->|"Yes"| VMOTION["vMotion / Storage vMotion (VMware)\nLive Migration (Hyper-V)\nNear-zero downtime"]
ROLL_HA -->|"No"| ROLL_REPL["Storage + DB replication\nGradual workload migration"]
```
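
The decision tree above can be read as a small function. The following is a minimal sketch in Python, assuming hypothetical parameter names (`stateful`, `downtime`, `sync_replication`, `live_migration`) that do not appear elsewhere in this document. It only mirrors the branches of the flowchart and does not replace a real assessment.

```python
def choose_strategy(stateful: bool,
                    downtime: str = "none",
                    sync_replication: bool = False,
                    live_migration: bool = False) -> str:
    """Walk the decision tree and return the suggested strategy.

    downtime is the tolerated outage: "hours", "minutes", or "none".
    All parameter names are illustrative assumptions.
    """
    if downtime not in ("hours", "minutes", "none"):
        raise ValueError(f"unknown downtime tolerance: {downtime!r}")

    if stateful:
        if downtime == "hours":
            return "Cold / Big Bang"
        if downtime == "minutes":
            return "Phased / Wave"
        # Zero downtime required: the answer depends on sync replication.
        if not sync_replication:
            return "Parallel Run"

    # Stateless applications, and stateful ones with sync replication,
    # both end up on the Rolling branch of the tree.
    if live_migration:
        return "Rolling (hypervisor live migration)"
    return "Rolling (storage + DB replication)"
```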
---
## Migration phases
### 1. Discovery and assessment
| Task | Tools | Output |
|------|----------|--------|
| HW and SW inventory | RVTools, NetBox, CMDB | Server, VM, and service list |
| Dependency mapping | ServiceNow, AppDynamics, manual | Application dependency graph |
| Traffic analysis | NetFlow, sFlow, vRNI | Bandwidth, latency, peak usage |
| Performance baseline | Prometheus, Zabbix, vRealize | CPU/RAM/disk/network per workload |
| License audit | Flexera, SAM | Licenses, support, compliance |

**Output:** workload list with RTO/RPO, dependencies, and criticality.
### 2. Planning
- **Wave plan** — workload division into migration waves (10–50 VMs per wave)
- **Dependency ordering** — DNS, NTP, LDAP, PKI first
- **Cutover window** — time window for switching (typically a weekend)
- **Rollback plan** — conditions and procedure for reversal
- **Test plan** — what and how to test post-migration
- **Communication plan** — who is informed, when, and how
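
Dependency ordering and wave sizing can be sketched as a topological sort followed by chunking. The example below is a rough illustration using Python's standard `graphlib` module; the function name `plan_waves`, the service names, and the tiny wave size are assumptions chosen for readability, not part of any real plan.

```python
from graphlib import TopologicalSorter


def plan_waves(dependencies: dict[str, set[str]],
               max_per_wave: int = 50) -> list[list[str]]:
    """Group workloads into waves so that dependencies always move first.

    dependencies maps a workload to the set of workloads it depends on.
    A dependency cycle raises graphlib.CycleError.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    waves: list[list[str]] = []
    while sorter.is_active():
        # Everything returned here has all of its dependencies migrated.
        ready = sorted(sorter.get_ready())
        for start in range(0, len(ready), max_per_wave):
            waves.append(ready[start:start + max_per_wave])
        sorter.done(*ready)
    return waves


# Hypothetical services: core infrastructure has no dependencies.
deps = {
    "ldap": {"dns", "ntp"},
    "pki": {"dns", "ntp"},
    "crm": {"ldap", "pki"},
    "wiki": {"ldap"},
}
for number, wave in enumerate(plan_waves(deps, max_per_wave=2), start=1):
    print(f"Wave {number}: {', '.join(wave)}")
# Wave 1: dns, ntp
# Wave 2: ldap, pki
# Wave 3: crm, wiki
```

In practice the graph would come from the dependency mapping produced in the discovery phase, and wave boundaries would also respect business units and cutover windows.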
### 3. New DC preparation
- **Infrastructure** — DNS, NTP, DHCP, LDAP/AD, PKI, monitoring (see [DATACENTERS.en.md](DATACENTERS.en.md) — deployment order)
- **Network** — BGP peering, VXLAN/VLAN, firewall rules, load balancers
- **Storage** — SAN zoning, NAS exports, Ceph cluster
- **Virtualization** — vCenter, Hyper-V cluster, Proxmox
### 4. Replication and synchronization
| Layer | Method | Tools |
|--------|--------|----------|
| **Storage (block)** | SAN sync/async mirror, LUN replication | NetApp SnapMirror, Dell EMC RecoverPoint, Pure ActiveCluster |
| **Storage (file)** | DFS-R, rsync, robocopy | Windows DFS, Rsync |
| **Storage (object)** | Cross-region replication | MinIO replication, S3 CRR |
| **Databases** | Log shipping, CDC, streaming replication | PostgreSQL Patroni, Oracle Data Guard, MSSQL AlwaysOn, MySQL Group Replication |
| **VM** | Storage vMotion, replication | VMware vSphere Replication, Hyper-V Replica, Zerto |
| **Kubernetes** | Velero + Restic, Rook Ceph mirror | Velero, Rook |
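
Whichever method is used, the initial seed has to fit into the time available before cutover. The following back-of-the-envelope sketch estimates seed duration from data volume and link speed; the function name and the default `efficiency` factor of 0.7 (protocol overhead, shared links) are assumptions and should be replaced with measured throughput.

```python
def seed_time_hours(data_tb: float, link_gbps: float,
                    efficiency: float = 0.7) -> float:
    """Estimate hours needed to copy data_tb terabytes over a link.

    Uses decimal units (1 TB = 10**12 bytes, 1 Gbit/s = 10**9 bit/s).
    efficiency is the assumed usable fraction of the nominal link speed.
    """
    if link_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link speed must be > 0 and efficiency in (0, 1]")
    bits_to_copy = data_tb * 1e12 * 8
    usable_bits_per_second = link_gbps * 1e9 * efficiency
    return bits_to_copy / usable_bits_per_second / 3600


# 50 TB over a 10 Gbit/s inter-DC link at 70 % efficiency.
print(f"{seed_time_hours(50, 10):.1f} h")  # 15.9 h
```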
### 5. Workload migration
#### Wave migration (recommended for medium/large DCs)
```mermaid
gantt
title Wave migration
dateFormat YYYY-MM-DD
section Wave 1 - Core
DNS, NTP, LDAP :done, w1a, 2026-07-01, 3d
Monitoring + logging :done, w1b, after w1a, 2d
section Wave 2 - Network
Load balancers :active, w2a, 2026-07-06, 2d
Firewalls :active, w2b, 2026-07-08, 2d
section Wave 3 - Storage
NAS migration :w3a, 2026-07-10, 5d
SAN replication :w3b, 2026-07-10, 3d
section Wave 4 - Dev/Test
Dev VMs :w4a, 2026-07-15, 5d
section Wave 5 - Prod tier 3
Internal apps :w5a, 2026-07-22, 5d
section Wave 6 - Prod tier 2
Business apps :w6a, 2026-07-29, 5d
section Wave 7 - Prod tier 1
Critical apps :w7a, 2026-08-05, 5d
```
#### Typical single wave procedure:
1. **Day -7**: Start data replication (initial seed)
2. **Day -1**: Incremental sync, final test
3. **Day 0 (cutover)**:
   - Stop application in source DC
   - Final sync (last delta)
   - Start application in target DC
   - DNS/Traffic switch
   - Smoke test
4. **Day +1**: Monitoring (performance, errors, lag)
5. **Day +7**: Rollback window end (success confirmation)
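
The day offsets above translate directly into calendar dates once a cutover date is fixed. A minimal sketch, assuming a hypothetical `wave_schedule` helper and shortened milestone labels:

```python
from datetime import date, timedelta

# (offset in days relative to cutover, milestone), taken from the list above.
WAVE_MILESTONES = [
    (-7, "Initial seed"),
    (-1, "Incremental sync, final test"),
    (0, "Cutover"),
    (1, "Monitoring"),
    (7, "Rollback window end"),
]


def wave_schedule(cutover: date) -> list[tuple[date, str]]:
    """Return the calendar date of every milestone for one wave."""
    return [(cutover + timedelta(days=offset), label)
            for offset, label in WAVE_MILESTONES]


# Example cutover date (a Saturday), chosen only for illustration.
for day, label in wave_schedule(date(2026, 7, 18)):
    print(day.isoformat(), label)
```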
### 6. Network strategies
#### IP re-addressing
| Approach | Description | Pros | Cons |
|---------|-------|--------|----------|
| **Keep IP** | Same IPs, BGP anycast or stretch VLAN | No application config changes | Stretched VLAN/L2 limitations |
| **Change IP** | New IP range, DNS/BGP routing change | Clean architecture | Config changes, DNS TTL |
| **NAT translation** | NAT between old and new IP space | No application changes | Latency, troubleshooting complexity |

**Keep IP** is only possible with:
- L2 stretch between DCs (VXLAN, OTV) — distance limited
- BGP anycast for VIPs (load balancers)
- Applications tolerant to ARP cache changes
#### DNS cutover
```
1. Lower TTL to 60–300 s (one week ahead)
2. At cutover, change A/AAAA records to new IPs
3. Wait for propagation (per TTL)
4. Monitor traffic
```
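
The timing behind these steps follows from how caching works: resolvers may keep the old record for up to the old TTL, so the TTL has to be lowered at least that long before cutover, and after cutover clients may still reach the old IP for up to the lowered TTL. A small sketch of that arithmetic, with a hypothetical function name and example values; resolvers that ignore TTLs are not modelled, which is one reason for the generous one-week margin.

```python
from datetime import datetime, timedelta


def dns_cutover_times(cutover: datetime, old_ttl_s: int,
                      low_ttl_s: int = 300) -> tuple[datetime, datetime]:
    """Return (latest moment to lower the TTL, moment old answers expire).

    Assumes resolvers honour TTLs.
    """
    latest_ttl_change = cutover - timedelta(seconds=old_ttl_s)
    old_answers_expired = cutover + timedelta(seconds=low_ttl_s)
    return latest_ttl_change, old_answers_expired


# Cutover at 22:00 with a previous TTL of one day and a lowered TTL of 300 s.
lower_by, expired_at = dns_cutover_times(datetime(2026, 7, 18, 22, 0), 86400)
print(lower_by)    # 2026-07-17 22:00:00
print(expired_at)  # 2026-07-18 22:05:00
```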
#### Traffic steering
| Technique | Use case |
|----------|----------|
| **BGP** | Change AS path / local pref for traffic steering |
| **DNS** | Lower TTL, change A records |
| **Load balancer** | Change pool members, health check |
| **GSLB** | Global Server Load Balancing (F5 GTM, NSX ALB) |
| **Cloud DNS** | Amazon Route 53, Azure Traffic Manager, Google Cloud DNS |
### 7. Database migration
See individual DB files for details. Summary table:
| DB | Method | RPO | RTO | Note |
|----|--------|-----|-----|----------|
| **PostgreSQL** | Streaming replication + Patroni switchover | 0 (sync) / a few MB of WAL (async) | minutes | Patroni auto-failover |
| **MySQL** | Group Replication / async replication | 0 (sync) / seconds | minutes | InnoDB Cluster |
| **Oracle** | Data Guard switchover | 0 (sync) | minutes | Far sync for remote DCs |
| **MSSQL** | AlwaysOn AG failover | 0 (sync) | minutes | Cloud witness |
| **MongoDB** | Replica set election | seconds | < 1 min | Priority-based failover |
| **Cassandra** | Multi-DC replication | eventual | 0 | Native multi-master |
### 8. Testing
| Phase | What to test | Method |
|------|-------------|--------|
| **Pre-migration** | Application in new DC (isolated) | Dry run on replicated data |
| **Cutover** | Functionality, availability, latency | Smoke test, synthetic transactions |
| **Post-migration** | Performance, integration, monitoring | A/B comparison with baseline, canary traffic |
| **Rollback** | Return to old DC | Tested rollback plan |
### 9. Rollback plan
Each wave must have a defined rollback:
| Condition | Action |
|----------|------|
| Application fails to start in new DC | DNS switch back, stop replication |
| Performance more than 20 % worse than baseline | Rollback, root cause analysis |
| Integration failure (API timeout, DB connection) | Rollback, dependency check |
| Security incident | Rollback, forensic analysis |

Rollback must be tested **before** the real cutover.

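
The rollback conditions can be encoded as an explicit check so that the decision during cutover is not left to judgment under pressure. The sketch below is an assumption-laden illustration: the function name is hypothetical, and latency stands in for "performance", which in practice would be compared per metric against the baseline from the discovery phase.

```python
def rollback_reasons(app_started: bool,
                     baseline_latency_ms: float,
                     current_latency_ms: float,
                     integration_ok: bool,
                     security_incident: bool,
                     max_degradation: float = 0.20) -> list[str]:
    """Return the rollback conditions that are met (empty list: stay)."""
    if baseline_latency_ms <= 0:
        raise ValueError("baseline latency must be positive")

    reasons = []
    if not app_started:
        reasons.append("application failed to start")
    # Relative slowdown against the pre-migration baseline.
    degradation = (current_latency_ms - baseline_latency_ms) / baseline_latency_ms
    if degradation > max_degradation:
        reasons.append(f"performance {degradation:.0%} worse than baseline")
    if not integration_ok:
        reasons.append("integration failure")
    if security_incident:
        reasons.append("security incident")
    return reasons


print(rollback_reasons(True, 100.0, 125.0, True, False))
# ['performance 25% worse than baseline']
```

An empty list means none of the conditions in the table is met and the wave can stay in the new DC.
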
---
## Special cases
### Mainframe migration
- **IBM z/OS** — GDPS (Geographically Dispersed Parallel Sysplex)
- HyperSwap for storage mirroring
- Cross-system coupling facility (XCF)
- Often the last migrated component
### COTS applications (Oracle EBS, SAP)
- Require vendor-specific migration procedures
- Oracle EBS: Autoconfig, cloning (ADXLC)
- SAP: System Copy (Homogeneous / Heterogeneous), SWPM, SUM
- Re-licensing may be required on hardware change
### Cloud migration (On-prem → Cloud)
See [CLOUD.en.md](CLOUD.en.md) — migration strategies (6 Rs):
| Strategy | Description |
|-----------|-------|
| **Re-host (Lift & Shift)** | VM → Cloud VM (AWS MGN, Azure Migrate) |
| **Re-platform** | OS upgrade, managed DB (RDS, Cloud SQL) |
| **Re-architect** | Application rewritten as cloud-native |
| **Retire** | Decommission unnecessary applications |
| **Retain** | Application stays on-prem (review later) |
| **Repurchase** | SaaS replacement |
---
## Recommended approach per DC size
| DC Size | VM Count | Recommended strategy | Duration | Team |
|-------------|----------|---------------------|-------------|-----|
| **Small** | < 50 | Big Bang (weekend) | 2–4 days | 3–5 people |
| **Medium** | 50–500 | Phased (5–10 waves) | 2–8 weeks | 5–10 people |
| **Large** | 500–5000 | Phased + Rolling | 3–12 months | 10–30 people |
| **Enterprise** | 5000+ | Parallel Run / Rolling | 12–36 months | 30+ people |
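
As a quick lookup, the table can be expressed as a function of VM count. This is only a sketch: the name `recommend` is hypothetical, and because the table's ranges share their boundary values (50, 500, 5000), the code assumes each boundary belongs to the larger tier.

```python
SIZE_TIERS = [
    # (exclusive upper bound on VM count, DC size, recommended strategy)
    (50, "Small", "Big Bang (weekend)"),
    (500, "Medium", "Phased"),
    (5000, "Large", "Phased + Rolling"),
    (float("inf"), "Enterprise", "Parallel Run / Rolling"),
]


def recommend(vm_count: int) -> tuple[str, str]:
    """Map a VM count to (DC size, recommended strategy)."""
    if vm_count < 0:
        raise ValueError("VM count cannot be negative")
    for upper_bound, size, strategy in SIZE_TIERS:
        if vm_count < upper_bound:
            return size, strategy
    raise AssertionError("unreachable: last tier has no upper bound")


print(recommend(320))  # ('Medium', 'Phased')
```
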
---
## Related
- [DATACENTERS.en.md](DATACENTERS.en.md) — DC topologies, secondary DC, deployment order
- [CLOUD.en.md](CLOUD.en.md) — cloud migration strategies (6 Rs)
- [DR.en.md](DR.en.md) — disaster recovery, RTO/RPO
- [NETWORKING.en.md](NETWORKING.en.md) — BGP, DNS, VXLAN, traffic steering
- [STORAGE.en.md](STORAGE.en.md) — storage replication
## Sources
Links, books, and standards: [sources/infrastructure/sources.md](sources/infrastructure/sources.md)

*Last revision: 2026-06-12*