new files

2026-06-16 15:47:45 +02:00
parent 3fa11ef0f6
commit b53714113c
11 changed files with 2298 additions and 7 deletions
--- a/DC-MIGRATION.en.md
+++ b/DC-MIGRATION.en.md
@@ -0,0 +1,246 @@
+# 🏗️ Data Center Migration
+
+## Migration strategies
+
+| Strategy | RTO | RPO | Risk | Cost | Duration | Description |
+|-----------|-----|-----|--------|---------|-------------|-------|
+| **Cold / Big Bang** | hours–days | days | High | Low | days | Shut everything down, move, power up |
+| **Phased / Wave** | minutes (per wave) | minutes | Medium | Medium | weeks–months | Workloads moved in waves |
+| **Rolling** | 0 (live) | 0 | Low | High | months | Live migration per VM/service |
+| **Parallel Run** | 0 | 0 | Low | Very high | months | Both DCs operational, gradual cutover |
+| **Pilot Light** | hours | minutes | Medium | Low | weeks | Critical services run in the new DC first; the rest migrate afterwards |
+| **Lift & Shift** | hours | minutes | Medium | Low | weeks | VMs/servers moved without configuration changes |
+| **Re-platform** | hours | minutes | Low | Medium | months | Optimization during migration (OS upgrade, resize) |
+| **Re-architect** | 0 | 0 | Low | High | months–years | Application redesigned for new platform |
+
+---
+
+## Decision tree
+
+```mermaid
+flowchart TD
+    Start(["DC Migration"]) --> APP{"Application\nstateful?"}
+    APP -->|"Yes"| DOWNTIME{"Tolerates\ndowntime?"}
+    APP -->|"No"| ROLLING["Rolling / Parallel Run"]
+
+    DOWNTIME -->|"Yes, hours+"| COLD["Cold / Big Bang\nSimplest, cheapest\nRisk: all at once"]
+    DOWNTIME -->|"Yes, minutes"| PHASED["Phased / Wave\nBy application / business unit"]
+    DOWNTIME -->|"No (zero downtime)"| SYNC{"Sync replication\npossible?"}
+
+    SYNC -->|"Yes, < 100 km"| ROLLING
+    SYNC -->|"No"| PARALLEL["Parallel Run\nBoth DCs active, gradual cutover"]
+
+    ROLLING --> ROLL_HA{"VMware,\nHyper-V?"}
+    ROLL_HA -->|"Yes"| VMOTION["vMotion + Storage vMotion (VMware)\nLive Migration (Hyper-V)\nNear-zero downtime"]
+    ROLL_HA -->|"No"| ROLL_REPL["Storage + DB replication\nGradual workload migration"]
+```
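+
+The 100 km bound on synchronous replication in the tree above follows from propagation delay: every acknowledged write waits for a round trip to the remote site. A rough estimate, assuming light travels through fiber at about 200,000 km/s (5 µs per km) and ignoring switch, protocol, and array latency. The function name and the `route_factor` parameter are illustrative, not taken from any tool:
+
+```python
+FIBER_KM_PER_S = 200_000  # approximate speed of light in optical fiber
+
+
+def sync_write_penalty_ms(distance_km: float, route_factor: float = 1.0) -> float:
+    """Round-trip propagation delay added to each synchronous write, in ms.
+
+    route_factor is a hypothetical multiplier for fiber routes that are
+    longer than the straight-line distance between the two sites.
+    """
+    one_way_s = distance_km * route_factor / FIBER_KM_PER_S
+    return 2 * one_way_s * 1000
+
+
+if __name__ == "__main__":
+    for km in (10, 100, 500):
+        print(f"{km:>4} km -> +{sync_write_penalty_ms(km):.1f} ms per write")
+```
+
+At 100 km this adds roughly 1 ms to every write before any equipment latency, which is why longer distances usually push the design toward asynchronous replication.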
+
+---
+
+## Migration phases
+
+### 1. Discovery and assessment
+
+| Task | Tools | Output |
+|------|----------|--------|
+| HW and SW inventory | RVTools, NetBox, CMDB | Server, VM, and service list |
+| Dependency mapping | ServiceNow, AppDynamics, manual | Application dependency graph |
+| Traffic analysis | NetFlow, sFlow, vRNI | Bandwidth, latency, peak usage |
+| Performance baseline | Prometheus, Zabbix, vRealize | CPU/RAM/disk/network per workload |
+| License audit | Flexera, other SAM tools | Licenses, support contracts, compliance |
+
+**Output:** workload list with RTO/RPO, dependencies, and criticality.
+
+### 2. Planning
+
+- **Wave plan** — workload division into migration waves (10–50 VMs per wave)
+- **Dependency ordering** — DNS, NTP, LDAP, PKI first
+- **Cutover window** — time window for switching (typically a weekend)
+- **Rollback plan** — conditions and procedure for reversal
+- **Test plan** — what and how to test post-migration
+- **Communication plan** — who is informed, when, and how
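+
+Dependency ordering can be derived mechanically once the dependency graph from the discovery phase exists. A minimal sketch using Python's standard `graphlib`; the service names and the `DEPENDS_ON` map are made up for illustration:
+
+```python
+from graphlib import TopologicalSorter
+
+# Hypothetical dependency map: service -> services that must be up first.
+DEPENDS_ON = {
+    "dns": set(),
+    "ntp": set(),
+    "ldap": {"dns", "ntp"},
+    "pki": {"ldap"},
+    "monitoring": {"dns", "ntp"},
+    "erp-db": {"dns", "ldap"},
+    "erp-app": {"erp-db", "pki"},
+}
+
+
+def migration_layers(depends_on: dict[str, set[str]]) -> list[list[str]]:
+    """Group services into layers; each layer depends only on earlier ones."""
+    sorter = TopologicalSorter(depends_on)
+    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
+    layers = []
+    while sorter.is_active():
+        ready = sorted(sorter.get_ready())
+        layers.append(ready)
+        sorter.done(*ready)
+    return layers
+
+
+if __name__ == "__main__":
+    for number, layer in enumerate(migration_layers(DEPENDS_ON), start=1):
+        print(f"layer {number}: {', '.join(layer)}")
+```
+
+Each layer is a candidate for one wave or a group of waves. A `CycleError` points at a circular dependency that has to be resolved by hand, for example by migrating the services involved together.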
+
+### 3. New DC preparation
+
+- **Infrastructure** — DNS, NTP, DHCP, LDAP/AD, PKI, monitoring (see [DATACENTERS.en.md](DATACENTERS.en.md) — deployment order)
+- **Network** — BGP peering, VXLAN/VLAN, firewall rules, load balancers
+- **Storage** — SAN zoning, NAS exports, Ceph cluster
+- **Virtualization** — vCenter, Hyper-V cluster, Proxmox
+
+### 4. Replication and synchronization
+
+| Layer | Method | Tools |
+|--------|--------|----------|
+| **Storage (block)** | SAN sync/async mirror, LUN replication | NetApp SnapMirror, Dell EMC RecoverPoint, Pure ActiveCluster |
+| **Storage (file)** | Replicated folders, incremental file copy | Windows DFS-R, rsync, Robocopy |
+| **Storage (object)** | Cross-region replication | MinIO replication, S3 CRR |
+| **Databases** | Log shipping, CDC, streaming replication | PostgreSQL Patroni, Oracle Data Guard, MSSQL Always On, MySQL Group Replication |
+| **VM** | Storage vMotion, replication | VMware vSphere Replication, Hyper-V Replica, Zerto |
+| **Kubernetes** | Backup and restore of resources and volumes (Velero with Restic or Kopia), Ceph RBD mirroring | Velero, Rook |
+
+### 5. Workload migration
+
+#### Wave migration (recommended for medium/large DCs)
+
+```mermaid
+gantt
+    title Wave migration
+    dateFormat  YYYY-MM-DD
+    section Wave 1 - Core
+    DNS, NTP, LDAP    :done, w1a, 2026-07-01, 3d
+    Monitoring + logging :done, w1b, after w1a, 2d
+    section Wave 2 - Network
+    Load balancers     :active, w2a, 2026-07-06, 2d
+    Firewalls          :active, w2b, 2026-07-08, 2d
+    section Wave 3 - Storage
+    NAS migration      :w3a, 2026-07-10, 5d
+    SAN replication    :w3b, 2026-07-10, 3d
+    section Wave 4 - Dev/Test
+    Dev VMs            :w4a, 2026-07-15, 5d
+    section Wave 5 - Prod tier 3
+    Internal apps      :w5a, 2026-07-22, 5d
+    section Wave 6 - Prod tier 2
+    Business apps      :w6a, 2026-07-29, 5d
+    section Wave 7 - Prod tier 1
+    Critical apps      :w7a, 2026-08-05, 5d
+```
+
+#### Typical procedure for a single wave
+
+1. **Day -7**: Start data replication (initial seed)
+2. **Day -1**: Incremental sync, final test
+3. **Day 0 (cutover)**:
+   - Stop application in source DC
+   - Final sync (last delta)
+   - Start application in target DC
+   - Switch DNS/traffic to the target DC
+   - Smoke test
+4. **Day +1**: Monitoring (performance, errors, lag)
+5. **Day +7**: Rollback window closes (migration confirmed successful)
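+
+The day offsets above translate directly into calendar dates once a cutover date is fixed. A small sketch, with the example cutover date chosen arbitrarily and the milestone labels abbreviated from the list:
+
+```python
+from datetime import date, timedelta
+
+# Day offsets relative to cutover, taken from the procedure above.
+MILESTONES = [
+    (-7, "Initial seed replication"),
+    (-1, "Incremental sync, final test"),
+    (0, "Cutover"),
+    (1, "Monitoring"),
+    (7, "Rollback window ends"),
+]
+
+
+def wave_schedule(cutover: date) -> list[tuple[date, str]]:
+    """Return (calendar date, milestone) pairs for one wave."""
+    return [(cutover + timedelta(days=offset), label) for offset, label in MILESTONES]
+
+
+if __name__ == "__main__":
+    # 2026-08-08 is a Saturday, used here only as an example cutover date.
+    for day, label in wave_schedule(date(2026, 8, 8)):
+        print(f"{day.isoformat()} ({day:%a}): {label}")
+```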
+
+### 6. Network strategies
+
+#### IP re-addressing
+
+| Approach | Description | Pros | Cons |
+|---------|-------|--------|----------|
+| **Keep IP** | Same IPs via BGP anycast or a stretched VLAN | No application config changes | Stretched L2 limitations (distance, larger failure domain) |
+| **Change IP** | New IP range, DNS/BGP routing change | Clean architecture | Config changes, DNS TTL |
+| **NAT translation** | NAT between old and new IP space | No application changes | Latency, troubleshooting complexity |
+
+**Keep IP** is only possible with:
+- L2 stretch between DCs (VXLAN, OTV) — distance limited
+- BGP anycast for VIPs (load balancers)
+- Applications tolerant of ARP cache changes
+
+#### DNS cutover
+
+```
+1. Lower TTL to 60–300 s, at least one old TTL before cutover (e.g. one week ahead)
+2. At cutover, change A/AAAA records to new IPs
+3. Wait for cached records to expire (up to one TTL)
+4. Monitor traffic
+```
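+
+The timing behind these steps is simple arithmetic: the lowered TTL only takes full effect after the old TTL has expired everywhere, and old addresses can stay cached for up to one new TTL after the record change. A sketch under the assumption that resolvers honor TTLs (some do not, and some clients cache longer). The function and key names are illustrative:
+
+```python
+from datetime import datetime, timedelta
+
+
+def dns_cutover_times(ttl_lowered_at: datetime, old_ttl_s: int,
+                      new_ttl_s: int, cutover_at: datetime) -> dict:
+    """Derive the key timestamps of a TTL-based DNS cutover.
+
+    Resolvers may keep the old record for up to old_ttl_s after the TTL is
+    lowered, so the short TTL is only fully in effect after that delay.
+    """
+    short_ttl_effective = ttl_lowered_at + timedelta(seconds=old_ttl_s)
+    if cutover_at < short_ttl_effective:
+        raise ValueError("cutover is scheduled before the short TTL is in effect")
+    return {
+        "short_ttl_effective": short_ttl_effective,
+        # Well-behaved resolvers should have dropped the old IPs by this time.
+        "old_ips_expired": cutover_at + timedelta(seconds=new_ttl_s),
+    }
+
+
+if __name__ == "__main__":
+    times = dns_cutover_times(
+        ttl_lowered_at=datetime(2026, 8, 1, 9, 0),
+        old_ttl_s=86_400,  # previous TTL of one day (assumed)
+        new_ttl_s=300,
+        cutover_at=datetime(2026, 8, 8, 22, 0),
+    )
+    for name, when in times.items():
+        print(f"{name}: {when:%Y-%m-%d %H:%M}")
+```
+
+Traffic still arriving at the old IPs well after `old_ips_expired` usually indicates hard-coded addresses or clients that ignore TTLs, which is worth checking before the old DC is shut down.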
+
+#### Traffic steering
+
+| Technique | Use case |
+|----------|----------|
+| **BGP** | Shift traffic with AS-path prepending or local preference |
+| **DNS** | Lower TTL, change A/AAAA records |
+| **Load balancer** | Change pool members, adjust health checks |
+| **GSLB** | Global Server Load Balancing (F5 GTM, NSX ALB) |
+| **Cloud DNS** | Amazon Route 53, Azure Traffic Manager, Google Cloud DNS |
+
+### 7. Database migration
+
+See individual DB files for details. Summary table:
+
+| DB | Method | RPO | RTO | Note |
+|----|--------|-----|-----|----------|
+| **PostgreSQL** | Streaming replication + Patroni switchover | 0 (sync) / a few MB of WAL (async) | minutes | Patroni auto-failover |
+| **MySQL** | Group Replication / async replication | 0 (Group Replication) / seconds (async) | minutes | InnoDB Cluster |
+| **Oracle** | Data Guard switchover | 0 (sync) | minutes | Far Sync for remote DCs |
+| **MSSQL** | Always On AG failover | 0 (sync) | minutes | Cloud witness |
+| **MongoDB** | Replica set election | seconds | < 1 min | Priority-based failover |
+| **Cassandra** | Multi-DC replication | eventual | 0 | Masterless, every node accepts writes |
+
+### 8. Testing
+
+| Phase | What to test | Method |
+|------|-------------|--------|
+| **Pre-migration** | Application in new DC (isolated) | Dry run on replicated data |
+| **Cutover** | Functionality, availability, latency | Smoke test, synthetic transactions |
+| **Post-migration** | Performance, integration, monitoring | A/B comparison with baseline, canary traffic |
+| **Rollback** | Return to old DC | Tested rollback plan |
+
+### 9. Rollback plan
+
+Each wave must have a defined rollback:
+
+| Condition | Action |
+|----------|------|
+| Application fails to start in new DC | Switch DNS back, restart the application in the source DC, stop replication |
+| Performance more than 20 % worse than baseline | Rollback, root cause analysis |
+| Integration failure (API timeout, DB connection) | Rollback, dependency check |
+| Security incident | Rollback, forensic analysis |
+
+Rollback must be tested **before** the real cutover.
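+
+The conditions in the table can be turned into an explicit go/no-go check for the cutover runbook. A minimal sketch: the `WaveHealth` fields are hypothetical, and p95 latency is used as the performance metric purely as an assumption, since the table does not name one:
+
+```python
+from dataclasses import dataclass
+
+MAX_DEGRADATION = 0.20  # rollback table: more than 20 % worse than baseline
+
+
+@dataclass
+class WaveHealth:
+    app_started: bool
+    baseline_p95_ms: float  # latency before migration (assumed metric)
+    current_p95_ms: float   # latency measured in the new DC
+    integrations_ok: bool
+    security_incident: bool = False
+
+
+def rollback_reasons(health: WaveHealth) -> list[str]:
+    """Return every rollback condition that is met; empty means keep going."""
+    reasons = []
+    if not health.app_started:
+        reasons.append("application failed to start")
+    slowdown = health.current_p95_ms / health.baseline_p95_ms - 1
+    if slowdown > MAX_DEGRADATION:
+        reasons.append(f"performance {slowdown:.0%} worse than baseline")
+    if not health.integrations_ok:
+        reasons.append("integration failure")
+    if health.security_incident:
+        reasons.append("security incident")
+    return reasons
+
+
+if __name__ == "__main__":
+    print(rollback_reasons(WaveHealth(True, 100.0, 130.0, True)))
+```
+
+Returning all matching reasons rather than a single boolean keeps the decision auditable in the post-cutover report.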
+
+---
+
+## Special cases
+
+### Mainframe migration
+
+- **IBM z/OS** — GDPS (Geographically Dispersed Parallel Sysplex)
+- HyperSwap for nondisruptive switchover between mirrored (Metro Mirror) volumes
+- Cross-system coupling facility (XCF)
+- Often the last component to be migrated
+
+### COTS applications (Oracle EBS, SAP)
+
+- Require vendor-specific migration procedures
+- Oracle EBS: AutoConfig, cloning (ADXLC)
+- SAP: System Copy (Homogeneous / Heterogeneous), SWPM, SUM
+- Re-licensing may be required when the hardware changes
+
+### Cloud migration (On-prem → Cloud)
+
+See [CLOUD.en.md](CLOUD.en.md) — migration strategies (6 Rs):
+
+| Strategy | Description |
+|-----------|-------|
+| **Re-host (Lift & Shift)** | VM → Cloud VM (AWS MGN, Azure Migrate) |
+| **Re-platform** | OS upgrade, managed DB (RDS, Cloud SQL) |
+| **Re-architect** | Application rewritten as cloud-native |
+| **Retire** | Decommission unnecessary applications |
+| **Retain** | Application stays on-prem (review later) |
+| **Repurchase** | SaaS replacement |
+
+---
+
+## Recommended approach per DC size
+
+| DC Size | VM Count | Recommended strategy | Duration | Team |
+|-------------|----------|---------------------|-------------|-----|
+| **Small** | < 50 | Big Bang (weekend) | 2–4 days | 3–5 people |
+| **Medium** | 50–500 | Phased (5–10 waves) | 2–8 weeks | 5–10 people |
+| **Large** | 500–5000 | Phased + Rolling | 3–12 months | 10–30 people |
+| **Enterprise** | 5000+ | Parallel Run / Rolling | 12–36 months | 30+ people |
+
+---
+
+## Related
+
+- [DATACENTERS.en.md](DATACENTERS.en.md) — DC topologies, secondary DC, deployment order
+- [CLOUD.en.md](CLOUD.en.md) — cloud migration strategies (6 Rs)
+- [DR.en.md](DR.en.md) — disaster recovery, RTO/RPO
+- [NETWORKING.en.md](NETWORKING.en.md) — BGP, DNS, VXLAN, traffic steering
+- [STORAGE.en.md](STORAGE.en.md) — storage replication
+
+## Sources
+
+Links, books, and standards: [sources/infrastructure/sources.md](sources/infrastructure/sources.md)
+
+*Last revision: 2026-06-12*