Multi-Site Migration Sequencing¶
Scope¶
This file covers sequencing and coordination of migrations across multiple physical sites: pilot selection, wave ordering, staffing models, inter-site dependencies, and go/no-go governance. For single-site migration mechanics (6 Rs, cutover, rollback), see general/workload-migration.md. For disaster recovery pairing, see general/disaster-recovery.md.
Checklist¶
- [Critical] Has a pilot site been selected based on low risk, representative workloads, and cooperative local staff?
- [Critical] Are inter-site dependencies mapped? (DR pairs, AD/LDAP replication, shared databases, cross-site backup targets)
- [Critical] Are go/no-go criteria defined and agreed upon between migration waves?
- [Critical] Is a rollback plan defined at the appropriate scope for each wave? (per-VM, per-application, per-site)
- [Critical] Is there a site readiness checklist that must be completed before each site's migration begins?
- [Recommended] Is urgency-based sequencing factored in? (lease expiry, licensing deadlines, end-of-support dates, contract renewals)
- [Recommended] Is the staffing model decided? (dedicated team per site, traveling team, hub-and-spoke)
- [Recommended] Are hardware procurement lead times mapped per site and factored into the schedule?
- [Recommended] Is a lessons-learned feedback loop established between waves? (retrospective after each wave, process updates before next)
- [Recommended] Are regional constraints accounted for? (timezone coverage, local team availability, language, regulatory differences)
- [Recommended] Is the parallel vs sequential execution model decided based on staff capacity and risk tolerance?
- [Optional] Are site-local stakeholders identified and briefed on their responsibilities during migration?
- [Optional] Is there a shared dashboard or tracking tool for cross-site migration status?
Why This Matters¶
Multi-site migrations fail when treated as independent single-site projects. Sites share dependencies — DR replication, Active Directory forests, centralized backup, shared WAN links — and migrating one site without understanding its relationship to others causes outages at sites that were not even being migrated. Sequencing mistakes compound: a botched early wave erodes organizational trust and makes later waves politically harder to execute.
The pilot site sets the tone for the entire program. Choosing a site that is too complex, too politically charged, or staffed by a team that is resistant to change turns the pilot into a cautionary tale rather than a confidence builder. Choosing a site that is too simple teaches nothing applicable to the harder sites that follow.
Hardware procurement is the most common schedule-breaker in on-premises migrations. A 12-week lead time for server delivery means the migration schedule is set by the supply chain, not the project plan. Sites that require new hardware must be identified early enough for procurement to complete before the migration window opens.
Pilot Site Selection¶
Selection Criteria¶
| Factor | Ideal Pilot Characteristics |
|---|---|
| Workload complexity | Small number of workloads (10-50 VMs), representative of the broader estate |
| Business risk | Non-revenue-critical applications, tolerant of brief disruptions |
| Local team | Cooperative, technically capable, willing to participate actively |
| Dependencies | Minimal cross-site dependencies, no shared databases with other sites |
| Infrastructure age | Old enough to benefit clearly from migration, not so old that tooling fails |
| Size | Small enough to complete in one wave, large enough to exercise the full process |
| Geography | Same timezone or overlapping hours as the core migration team |
Pilot Anti-Patterns¶
- Headquarters as pilot — Too much political pressure, too many stakeholders, too many critical workloads
- Smallest site as pilot — If it has only 3 VMs, the pilot teaches nothing about wave management or dependency handling
- Most troubled site as pilot — Fixing years of technical debt during a migration pilot guarantees failure
- Remote site with no local staff — Cannot validate the remote execution model until the process itself is proven
Sequencing Strategies¶
Urgency-Based Sequencing¶
External deadlines override all other sequencing preferences. Map every site against hard deadlines:
| Urgency Driver | Typical Lead Time | Impact of Missing Deadline |
|---|---|---|
| Data center lease expiry | 6-18 months notice | Forced emergency move or expensive lease extension |
| Hardware end-of-support | Known years in advance | No vendor support, no replacement parts, compliance risk |
| Software licensing deadlines | Annual renewal cycles | Unexpected license costs if workloads are still running at renewal |
| Regulatory compliance dates | Varies by jurisdiction | Fines, audit failures, operating restrictions |
| Vendor contract renewals | Annual or multi-year | Renewal locks in costs for workloads that should have been migrated |
Rule: Sites with immovable deadlines within the next 6 months go first, regardless of other factors.
Size-Based Sequencing¶
After urgency constraints are satisfied, sequence from small to large:
- Small sites (< 50 VMs) — Build confidence, refine runbooks, train the team
- Medium sites (50-200 VMs) — Test wave management, parallel execution, dependency handling
- Large sites (200+ VMs) — Apply proven process with full staffing, longest cutover windows
This approach builds organizational muscle memory. Each wave is harder than the last, but the team is also more experienced.
Regional Considerations¶
| Factor | Impact on Sequencing |
|---|---|
| Timezone | Migration cutovers during off-hours require either local staff or travel; cluster sites by timezone to minimize travel |
| Local team availability | Sites with strong local teams can run more independently; sites without local expertise need core team presence |
| Language | Documentation, communication plans, and training materials may need translation; factor this into preparation time |
| Local regulations | Some jurisdictions require data residency, notification periods, or union consultation before infrastructure changes |
| Network connectivity | Sites with poor WAN links take longer for data replication; schedule extra time or ship data physically |
Execution Models¶
Parallel vs Sequential¶
| Model | Pros | Cons | When to Use |
|---|---|---|---|
| Strictly sequential | Maximum learning between waves, lowest staff requirement, simplest coordination | Slowest overall timeline | Early waves, small teams, first multi-site migration |
| Parallel (2-3 sites) | Faster overall timeline, keeps momentum | Requires more staff, harder to incorporate lessons learned, higher coordination overhead | Mid-program after process is proven, adequate staffing |
| Fully parallel (all sites) | Fastest possible timeline | Extremely high risk, no feedback loop, massive staffing requirement | Almost never appropriate; only under extreme deadline pressure with experienced team |
Recommended progression: Start sequential for waves 1-2, move to limited parallelism (2 sites) for waves 3-5, scale up only if the team and process support it.
Staffing Models¶
| Model | Description | Pros | Cons |
|---|---|---|---|
| Dedicated team per site | Each site gets its own migration team for the duration | Deep site knowledge, single point of accountability | Expensive, inconsistent process across sites, no knowledge sharing |
| Traveling team | One core team moves from site to site | Consistent process, deep expertise, strong feedback loop | Travel fatigue, single point of failure, sequential only |
| Hub-and-spoke | Central architecture/planning team + local execution teams per site | Scales to parallel execution, consistent standards, local knowledge | Requires strong local teams, coordination overhead, potential for drift |
Hub-and-spoke is the most common model for large programs. The central team owns the runbooks, tooling, and standards. Local teams execute with central oversight. The central team travels for cutovers and complex phases.
Hardware Procurement Planning¶
For on-premises target environments, hardware procurement is on the critical path:
| Phase | Duration | Activities |
|---|---|---|
| Specification | 2-4 weeks | Capacity planning, vendor selection, quote requests |
| Procurement | 1-2 weeks | Purchase orders, contract negotiation, payment |
| Manufacturing/shipping | 4-16 weeks | Vendor build time, shipping, customs (international) |
| Receiving and racking | 1-2 weeks | Physical installation, cabling, power verification |
| Burn-in and base config | 1-2 weeks | Firmware updates, BIOS configuration, OS installation, network configuration |
Total: 9-26 weeks from specification to ready-for-workloads. Start procurement for each site at least 6 months before its planned migration window. For international shipments, add time for customs and import regulations.
Inter-Site Dependencies¶
Common Cross-Site Dependencies¶
| Dependency | Migration Impact | Mitigation |
|---|---|---|
| DR pairs | Migrating the primary site without the DR site breaks disaster recovery | Migrate DR pairs together, or establish temporary DR at the target before migrating the primary |
| Active Directory replication | AD sites and services must reflect the new topology | Add new AD sites before migration, adjust replication topology, update DNS forwarders |
| Centralized backup | Backup targets at one site may serve other sites | Establish new backup targets at the destination before migrating backup sources |
| Shared databases | Applications at site A may depend on databases at site B | Map all cross-site data flows; co-migrate tightly coupled applications or implement replication |
| WAN-dependent applications | Some applications are designed for LAN latency and break over WAN | Identify latency-sensitive cross-site communication; co-locate or re-architect |
| Centralized services | DHCP, DNS, NTP, SIEM, log aggregation, certificate authorities | Deploy local instances at the target before migrating workloads that depend on them |
Dependency Mapping Process¶
- Inventory cross-site network flows — Firewall logs, NetFlow data, application documentation
- Identify shared infrastructure services — DNS, AD, backup, monitoring, certificate services
- Map DR relationships — Which sites are paired, what replication technology, what RPO/RTO
- Interview application owners — Cross-site dependencies are often undocumented
- Build a dependency matrix — Sites as rows and columns, dependencies in cells, migration sequence constraints as output
Feedback Loop Between Waves¶
Post-Wave Retrospective (Mandatory)¶
After each wave, before the next wave begins:
- What worked — Identify process steps that went smoothly, tools that performed well
- What failed — Document every issue, root cause, time to resolve
- What was missing — Checklist items that should have been there but were not
- Runbook updates — Incorporate lessons into the runbook before the next wave uses it
- Tooling updates — Fix scripts, automation, monitoring gaps discovered during the wave
- Timeline recalibration — Adjust estimates for remaining waves based on actual durations
Metrics to Track Across Waves¶
| Metric | Purpose |
|---|---|
| Cutover duration (planned vs actual) | Are estimates improving? |
| Rollback count | Are rollbacks decreasing wave over wave? |
| Unplanned issues per wave | Is the team catching problems earlier? |
| Time to resolve unplanned issues | Is the team getting faster at recovery? |
| Stakeholder satisfaction | Are site teams confident in the process? |
| Post-migration incident rate | Are migrated workloads stable? (measure for 2-4 weeks post-cutover) |
Go/No-Go Governance¶
Go/No-Go Criteria Between Waves¶
| Category | Go Criteria | No-Go Trigger |
|---|---|---|
| Previous wave stability | All workloads from previous wave stable for defined soak period (typically 2-4 weeks) | Active incidents from previous wave unresolved |
| Site readiness | Site readiness checklist 100% complete | Hardware not staged, network not prepared, or local team not briefed |
| Staffing | Migration team available and not fatigued from back-to-back waves | Key personnel unavailable, team burnout indicators |
| Stakeholder approval | Site owner and business sponsor have signed off | Unresolved objections from site stakeholders |
| Rollback readiness | Rollback plan reviewed, rollback resources available | No tested rollback path |
| External factors | No conflicting change freezes, business events, or regulatory deadlines | Conflicts with year-end freeze, audit period, or peak business season |
Decision Authority¶
Define clearly before the program begins:
- Who can declare Go — Program manager with concurrence from site owner and technical lead
- Who can declare No-Go — Any of the above, plus security, compliance, or infrastructure leads
- Who can override No-Go — Executive sponsor only, with documented risk acceptance
- Escalation path — If Go/No-Go decision is disputed, who arbitrates and on what timeline
Site Readiness Checklist¶
Before any workload migration begins at a site, every item must be confirmed:
- Target hardware staged, racked, cabled, powered, and burn-in complete
- Target network configured (VLANs, firewall rules, routing, WAN connectivity verified)
- Target infrastructure services operational (DNS, DHCP, NTP, AD site configured)
- Monitoring and alerting configured for target environment
- Backup infrastructure at target site operational and tested
- Local team briefed on migration plan, schedule, and their responsibilities
- Local team trained on new environment operations (if different from source)
- Communication plan distributed to all affected users at the site
- Maintenance windows approved and communicated
- Rollback plan reviewed with local team and tested where feasible
- Escalation contacts confirmed and available for the cutover window
- Source environment performance baselines captured for post-migration comparison
Rollback Scope¶
Rollback Granularity Options¶
| Scope | Description | Complexity | When Appropriate |
|---|---|---|---|
| Per-VM | Revert a single VM to source | Low | Individual VM fails validation; all others are fine |
| Per-application | Revert all VMs/components of one application | Medium | Application-level failure (e.g., multi-tier app, one tier has issues) |
| Per-wave | Revert all workloads in the current migration wave | High | Systemic issue affecting multiple workloads in the wave |
| Per-site | Revert the entire site migration | Very High | Fundamental infrastructure problem at the target site (network, storage, power) |
Rollback Design Principles¶
- Decide rollback scope before cutover — Do not decide during an incident
- Keep source systems running through the soak period — Do not decommission until rollback window closes
- Data sync-back is the hardest problem — If the migrated workload accepted writes, rolling back means those writes must be replayed or reconciled at the source
- DNS TTLs must be low during the rollback window — High TTLs make traffic redirection slow
- Test the rollback procedure at least once per execution model (first VM rollback, first application rollback, first wave rollback)
- Define the rollback decision authority — Same person who declares Go should have authority to declare rollback
- Time-box the rollback window — Typically 24-72 hours post-cutover; after that, fix-forward is the only option
Common Decisions (ADR Triggers)¶
- Pilot site selection — which site and why, what makes it representative without being risky
- Site sequencing rationale — urgency vs size vs regional grouping, trade-offs documented
- Parallel vs sequential execution — how many sites in parallel, what triggers scaling up
- Staffing model — dedicated vs traveling vs hub-and-spoke, cost and risk trade-offs
- Inter-site dependency resolution — how DR pairs, AD replication, and shared services are handled during migration
- Go/no-go governance — who has authority, what criteria are mandatory vs advisory
- Rollback scope per wave — at what granularity rollback is planned, how data sync-back works
- Feedback loop cadence — retrospective format, mandatory vs optional attendees, how runbook updates are enforced
See Also¶
general/workload-migration.md— Single-site migration mechanics, 6 Rs, cutover planninggeneral/disaster-recovery.md— DR pairing and replication strategiesgeneral/hardware-sizing.md— Capacity planning and hardware specificationgeneral/networking-physical.md— Physical network design and WAN connectivitygeneral/governance.md— Change management and approval processes