VMware Data Protection

Scope

This document covers VMware data protection capabilities including vSphere Replication, VCF Live Site Recovery (formerly Site Recovery Manager / SRM), backup integration via VADP, snapshot management, and DR planning.

Checklist

Why This Matters

Data protection failures are discovered at the worst possible time -- during recovery. Untested VCF Live Site Recovery (formerly SRM) recovery plans frequently fail because of stale configuration (removed VMs, changed networks, expired credentials), so regular non-disruptive testing is essential rather than optional. VM snapshots are the most misunderstood vSphere feature: they are not backups, VM performance degrades as the snapshot chain grows, and a snapshot chain that fills a datastore pauses every VM on that datastore. CBT (Changed Block Tracking) bugs have historically caused silent backup corruption (KB 2090639, KB 86234), which is why periodic CBT resets are used as a preventive measure. Backup proxy sizing directly determines the backup window -- too few proxies push backup jobs into production hours. Ransomware increasingly targets the backup infrastructure itself, so immutable or air-gapped backup copies are a requirement rather than a best practice.
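The snapshot risk described above lends itself to an automated policy check. The following is a minimal sketch, assuming snapshot metadata has already been exported from vCenter into plain records. The `SnapshotInfo` type and the `find_violations` function are hypothetical names introduced for illustration, not part of any VMware SDK. The thresholds are the 72-hour age limit and 3-level chain depth cited under "Snapshot policy enforcement" in Common Decisions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Thresholds taken from the snapshot guidance in this document.
MAX_SNAPSHOT_AGE = timedelta(hours=72)
MAX_CHAIN_DEPTH = 3


@dataclass(frozen=True)
class SnapshotInfo:
    """Hypothetical record for one snapshot, exported from inventory."""
    vm_name: str
    snapshot_name: str
    created: datetime  # must be timezone-aware
    depth: int         # 1 = first snapshot in the chain


def find_violations(snapshots, now=None):
    """Return (snapshot, reasons) pairs for snapshots that break the policy."""
    if now is None:
        now = datetime.now(timezone.utc)
    violations = []
    for snap in snapshots:
        reasons = []
        if now - snap.created > MAX_SNAPSHOT_AGE:
            reasons.append("older than 72 hours")
        if snap.depth > MAX_CHAIN_DEPTH:
            reasons.append("chain deeper than 3 levels")
        if reasons:
            violations.append((snap, reasons))
    return violations
```

A scheduled job could feed this function the current snapshot inventory and raise an alert, or open a ticket, for each returned pair. Whether a violation triggers auto-delete or manual review is the decision covered under Common Decisions.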

Common Decisions (ADR Triggers)

vSphere Replication vs array-based replication -- vSphere Replication for storage-agnostic, per-VM replication with an RPO as low as 5 minutes vs array-based replication (SnapMirror, SRDF) for sub-minute RPO, storage-level deduplication, and lower ESXi host overhead at scale
VCF Live Site Recovery vs manual DR runbooks -- VCF Live Site Recovery (formerly SRM) for automated, tested, repeatable DR orchestration vs manual runbooks for environments too small to justify licensing or with non-VMware DR targets
Backup platform selection -- Veeam (market leader, broad feature set, per-VM licensing) vs Commvault (enterprise breadth, complex) vs Cohesity/Rubrik (scale-out, immutable by design) vs Dell Avamar/NetWorker (Dell ecosystem integration)
Backup transport mode -- Hot-Add (backup proxy mounts VM disks, no SAN required, good for vSAN) vs Direct SAN (fastest, requires SAN zoning to proxy) vs NBD/NBDSSL (network-based, slowest, most flexible)
Snapshot policy enforcement -- automated snapshot age monitoring with auto-delete vs manual review; VMware recommends no snapshot older than 72 hours and no chain deeper than 3 levels
Immutable backup architecture -- Linux hardened repository (XFS, immutable flag) vs object storage with object lock (S3, MinIO) vs purpose-built appliance (Cohesity, Rubrik) vs tape for air-gap
RPO/RTO tiering -- Tier 1 (RPO <15min, RTO <1hr, synchronous replication + VCF Live Site Recovery) vs Tier 2 (RPO <1hr, RTO <4hr, async replication + backup) vs Tier 3 (RPO <24hr, RTO <24hr, daily backup only)
vSAN native snapshots vs traditional snapshots -- vSAN ESA native snapshots for improved snapshot performance without chain penalty vs traditional snapshots for vSAN OSA and non-vSAN environments
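The RPO/RTO tiering decision above can be expressed as a small lookup. This is a sketch under stated assumptions: the `Tier` type and the `select_tier` function are hypothetical names, the tier bounds are copied from the list above, and the selection rule (pick the least demanding tier whose bounds still satisfy the workload's requirement) is one reasonable interpretation rather than a prescribed method.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass(frozen=True)
class Tier:
    name: str
    rpo_limit: timedelta  # tier delivers an RPO below this bound
    rto_limit: timedelta  # tier delivers an RTO below this bound
    mechanism: str


# Ordered from least to most demanding, using the bounds from this document.
TIERS_LEAST_DEMANDING_FIRST = (
    Tier("Tier 3", timedelta(hours=24), timedelta(hours=24),
         "daily backup only"),
    Tier("Tier 2", timedelta(hours=1), timedelta(hours=4),
         "async replication + backup"),
    Tier("Tier 1", timedelta(minutes=15), timedelta(hours=1),
         "synchronous replication + VCF Live Site Recovery"),
)


def select_tier(required_rpo: timedelta,
                required_rto: timedelta) -> Optional[Tier]:
    """Return the least demanding tier that satisfies both requirements.

    Returns None when the requirement is stricter than the Tier 1 bounds,
    which signals that the workload needs an individual design review.
    """
    for tier in TIERS_LEAST_DEMANDING_FIRST:
        if tier.rpo_limit <= required_rpo and tier.rto_limit <= required_rto:
            return tier
    return None
```

For example, a workload that tolerates up to 1 hour of data loss and 4 hours of downtime maps to Tier 2, while one that tolerates only 30 minutes of data loss maps to Tier 1, because the Tier 2 bound of 1 hour does not cover it.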

Reference Links

vSphere Replication documentation -- vSphere Replication deployment, configuration, and recovery point objectives
VCF Live Site Recovery documentation -- orchestrated disaster recovery with recovery plans and automated failover
vSphere Storage APIs for Data Protection (VADP) -- VADP and Changed Block Tracking for backup integration