Ransomware Resilience Architecture

Scope

This file covers ransomware-specific resilience controls including immutable backup architecture, backup isolation, detection patterns, recovery playbooks, network segmentation for lateral movement containment, identity hardening, and incident response sequencing. It focuses on the architectural decisions that determine whether an organization can survive and recover from a ransomware attack. For general backup product selection and sizing, see general/enterprise-backup.md. For broader security controls (IAM, encryption, compliance), see general/security.md. For DR site design and failover orchestration, see general/disaster-recovery.md.

Checklist

Why This Matters

Ransomware is one of the most financially destructive cyber threats facing organizations today. Some industry surveys put the average ransom payment above $1.5 million, but the total cost of an incident (downtime, recovery, legal fees, regulatory fines, and reputational damage) often reaches tens of millions of dollars for large organizations. Attacks that destroy backup infrastructure alongside production systems cause the most severe outcomes because they eliminate the organization's ability to recover without paying the ransom.

The most damaging ransomware operators are not opportunistic script runners. They are organized groups that may spend days, weeks, or even months inside a compromised network before deploying encryption. During this dwell time, they systematically identify and compromise backup infrastructure, exfiltrate sensitive data for double-extortion leverage, and stage encryption payloads across every reachable system. The encryption event itself is the final step in a carefully planned operation.

This means that ransomware resilience is not a backup problem alone — it is an architecture problem that spans identity management, network segmentation, detection capabilities, backup isolation, and recovery procedures. An organization with excellent backups but flat network segmentation and shared admin credentials will still suffer a catastrophic breach because the attacker will reach and destroy the backups. Conversely, strong network segmentation with poor backup immutability leaves the organization unable to recover even if the attack is contained. Every layer must be addressed, and they must be designed together as an integrated defense.

The organizations that recover quickly from ransomware attacks share common traits: immutable backups that the attacker could not reach, documented recovery playbooks that were tested before the incident, isolated recovery environments ready to receive restored workloads, and identity infrastructure that could be rebuilt independently from compromised domain controllers.

Common Decisions (ADR Triggers)

ADR: Immutable Backup Storage Strategy

Context: The organization must protect backup data from deletion or encryption by ransomware operators who have obtained administrative credentials.

Options:

Approach	Immutability Enforcement	Restore Speed	Cost	Operational Complexity
S3 Object Lock (compliance mode)	Platform-enforced, irrevocable for retention period	Fast (network-speed restore from object storage)	Moderate (storage + API + egress costs)	Low (managed service)
Hardened Linux repository (Veeam)	OS-level immutability flags, single-use SSH credentials	Fast (local or LAN restore)	Low (commodity Linux server)	Moderate (requires Linux hardening expertise)
Air-gapped tape vault	Physical isolation (tapes removed from network)	Slow (hours to retrieve and mount media)	Low (tape media is inexpensive)	High (tape rotation logistics, media management)
Purpose-built appliance (Data Domain Retention Lock, ExaGrid)	Appliance-enforced retention locking (WORM or delayed-delete, depending on product)	Fast (local appliance)	High (proprietary hardware)	Low (vendor-managed firmware)
Cloud-native vault (AWS Backup Vault Lock, Azure immutable vault)	Platform-enforced, policy-driven	Moderate (cloud restore speeds)	Moderate (managed service pricing)	Low (managed service)
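
The compliance-mode row of the table is configured at the bucket level. The following is a minimal Python sketch that builds the `ObjectLockConfiguration` payload accepted by boto3's `put_object_lock_configuration` call. It does not call AWS itself. The 30-day retention and the bucket name `example-backup-vault` are placeholder assumptions. One reasonable rule of thumb is to set the retention period longer than the attacker dwell time you plan for, because compliance-mode retention cannot be shortened or removed by any user, including the account root user, until it expires.

```python
"""Sketch: S3 Object Lock default retention in COMPLIANCE mode.

Builds the request payload only; applying it requires boto3 and credentials
for the isolated backup account. Retention days are a placeholder assumption.
"""

VALID_MODES = {"COMPLIANCE", "GOVERNANCE"}


def object_lock_configuration(days: int, mode: str = "COMPLIANCE") -> dict:
    """Return an ObjectLockConfiguration dict for put_object_lock_configuration."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}")
    if days <= 0:
        raise ValueError("retention days must be a positive integer")
    return {
        "ObjectLockEnabled": "Enabled",
        # Default retention applies to every new object version written to the bucket.
        "Rule": {"DefaultRetention": {"Mode": mode, "Days": days}},
    }


if __name__ == "__main__":
    config = object_lock_configuration(days=30)
    print(config)
    # Applying it (not executed here; the bucket name is hypothetical):
    #   import boto3
    #   s3 = boto3.client("s3")
    #   # Outside us-east-1, also pass CreateBucketConfiguration.
    #   s3.create_bucket(Bucket="example-backup-vault", ObjectLockEnabledForBucket=True)
    #   s3.put_object_lock_configuration(
    #       Bucket="example-backup-vault", ObjectLockConfiguration=config)
```

The helper accepts GOVERNANCE for completeness. However, principals holding `s3:BypassGovernanceRetention` can override governance-mode locks, and an operator with stolen admin credentials is likely to have that privilege. This is why the table specifies compliance mode.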

Recommendation: Layer at least two approaches — a fast-restore immutable copy (hardened repository or object lock) for operational recovery, plus an air-gapped or physically isolated copy for maximum resilience against sophisticated attackers who may find ways to compromise cloud management planes.
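
The layering rule can be expressed as a simple check over a backup plan. This is a hypothetical model, not any product's API. Each copy records whether it is immutable, how fast it restores (using the table's Fast/Moderate/Slow categories), and whether it is physically isolated. The check reports gaps against the recommendation:

- no fast immutable copy
- no isolated copy
- both roles covered only by the same approach

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class RestoreSpeed(Enum):
    FAST = "fast"
    MODERATE = "moderate"
    SLOW = "slow"


@dataclass(frozen=True)
class BackupCopy:
    approach: str              # label from the options table
    immutable: bool            # platform-, OS-, or appliance-enforced immutability
    restore_speed: RestoreSpeed
    physically_isolated: bool  # air-gapped / offline after the backup window


def layering_gaps(copies: list[BackupCopy]) -> list[str]:
    """Return gaps against the 'layer at least two approaches' rule (empty = OK)."""
    gaps: list[str] = []
    fast = [c for c in copies if c.immutable and c.restore_speed is RestoreSpeed.FAST]
    isolated = [c for c in copies if c.physically_isolated]
    if not fast:
        gaps.append("no fast-restore immutable copy for operational recovery")
    if not isolated:
        gaps.append("no air-gapped or physically isolated copy")
    # Both roles must be covered by two different approaches, not a single one.
    if fast and isolated and not any(f.approach != i.approach for f in fast for i in isolated):
        gaps.append("fast and isolated roles are covered by a single approach")
    return gaps


plan = [
    BackupCopy("Hardened Linux repository (Veeam)", True, RestoreSpeed.FAST, False),
    BackupCopy("Air-gapped tape vault", True, RestoreSpeed.SLOW, True),
]
cloud_only = [
    BackupCopy("S3 Object Lock (compliance mode)", True, RestoreSpeed.FAST, False),
]
print(layering_gaps(plan))        # → []
print(layering_gaps(cloud_only))  # → ['no air-gapped or physically isolated copy']
```

The cloud-only plan fails the check because it has no second, isolated copy. Per the recommendation, that copy exists to cover attackers who find a way into the cloud management plane itself.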

ADR: Backup Infrastructure Isolation Model

Context: Backup infrastructure must be isolated from production to prevent a compromised production environment from reaching backup data.

Options:
- Separate network segment with firewall rules: Backup servers on a dedicated VLAN with strict firewall rules allowing only backup traffic (one-way from production to backup). Low cost, moderate isolation. Risk: firewall misconfiguration, and rules accumulate exceptions over time.
- Separate Active Directory forest (or no domain): Backup servers joined to a dedicated forest with no trust relationships to production AD, or using local accounts only. A separate domain inside the production forest is not sufficient, because domains in the same forest trust each other transitively. Eliminates the credential-reuse attack path. Requires managing a separate identity infrastructure.
- Separate cloud account/subscription: Backup vaults in a dedicated AWS account, Azure subscription, or GCP project with no cross-account IAM roles linking it to production. Strongest cloud isolation. Requires cross-account backup configuration and separate billing.
- Physical air gap: Backup copies written to media that is physically disconnected from any network after the backup window completes. Maximum isolation. Slowest restore, operationally intensive.
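
For the separate-account option, the key property is that nothing in production can assume a role in the backup account. The following is a minimal sketch, assuming AWS and standard IAM trust-policy JSON. It flags `Allow` statements whose AWS principal belongs to a production account or is a wildcard. The account ID `111111111111` and the role name are placeholders. The check deliberately ignores `Condition` blocks, `NotPrincipal`, and federated principals, so treat it as a first-pass lint rather than a complete policy analyzer.

```python
from __future__ import annotations

import json
import re

ACCOUNT_ID = re.compile(r"^\d{12}$")


def _as_list(value):
    """IAM allows either a single item or a list in most policy fields."""
    return value if isinstance(value, list) else [value]


def _account_of(principal: str) -> str | None:
    if principal == "*" or ACCOUNT_ID.match(principal):
        return principal
    parts = principal.split(":")
    # ARN layout: arn:partition:service:region:account-id:resource
    if len(parts) >= 6 and parts[0] == "arn":
        return parts[4]
    return None


def production_trust_findings(trust_policy: str, production_accounts: set[str]) -> list[str]:
    """Return AWS principals in Allow statements that production (or anyone) could use."""
    policy = json.loads(trust_policy)
    findings: list[str] = []
    for stmt in _as_list(policy.get("Statement", [])):
        if stmt.get("Effect") != "Allow":
            continue
        principal = stmt.get("Principal", {})
        candidates = ["*"] if principal == "*" else _as_list(principal.get("AWS", []))
        for p in candidates:
            account = _account_of(p)
            if account == "*" or account in production_accounts:
                findings.append(p)
    return findings


backup_role_trust = json.dumps({
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Principal": {"Service": "backup.amazonaws.com"},
         "Action": "sts:AssumeRole"},
        {"Effect": "Allow", "Principal": {"AWS": "arn:aws:iam::111111111111:role/ProdAdmin"},
         "Action": "sts:AssumeRole"},
    ],
})
print(production_trust_findings(backup_role_trust, {"111111111111"}))
# → ['arn:aws:iam::111111111111:role/ProdAdmin']
```

Running a check like this on every role in the backup account helps catch the cross-account trust that otherwise silently undoes the isolation.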

Decision drivers: Threat model sophistication (nation-state vs. commodity ransomware), acceptable recovery time (air-gapped copies are slower to restore), operational capacity to manage separate infrastructure, and compliance requirements for backup segregation.

ADR: Ransomware Detection and Response Approach

Context: The organization needs to detect ransomware activity during the encryption phase (or earlier, during reconnaissance and lateral movement) and respond fast enough to limit damage.

Options:
- Endpoint Detection and Response (EDR): Agents on every endpoint detect encryption behavior, process injection, and known ransomware indicators. Fastest detection on endpoints. Requires full agent deployment coverage and SOC monitoring. Products: CrowdStrike Falcon, Microsoft Defender for Endpoint, SentinelOne.
- Network Detection and Response (NDR): Passive network monitoring detects lateral movement, C2 communication, and anomalous SMB/RDP traffic. Detects threats on devices that cannot run agents, including IoT/OT devices. Cannot inspect the contents of encrypted internal traffic without TLS inspection. Products: Darktrace, ExtraHop, Vectra.
- Canary files and honeypots: Decoy files and shares that generate an alert on any access. Very low false-positive rate, because legitimate users have no reason to touch decoys (indexing, antivirus, and backup jobs that do touch them must be excluded). Only detects attackers who interact with decoys, so coverage is partial. Low cost, simple to deploy.
- Backup anomaly detection: Backup software monitors for unusual changes in backup job size, deduplication ratio, or file entropy between backup runs. Built into Veeam, Rubrik, Cohesity. Late detection: by the time the anomaly appears in a backup run, encryption is already in progress.
- SIEM correlation with automated response: Aggregates signals from EDR, NDR, backup anomaly, and canary alerts into a SIEM, with SOAR playbooks for automated containment. Most comprehensive, highest operational complexity.
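
Entropy-based backup anomaly detection rests on one observation. Encrypted data looks random, so its Shannon entropy approaches 8 bits per byte, while most documents and databases sit well below that. The following is a minimal sketch of the idea. It compares the share of high-entropy files between two backup runs rather than applying an absolute threshold, because already-compressed formats such as ZIP or JPEG are high-entropy even when clean. The 7.5 bits/byte threshold and the 20-point jump are illustrative assumptions, and this does not describe how any particular vendor implements the feature.

```python
from __future__ import annotations

import math
import os
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Entropy in bits per byte: 0.0 for constant data, near 8.0 for random data."""
    if not data:
        return 0.0
    n = len(data)
    return sum((c / n) * math.log2(n / c) for c in Counter(data).values())


def high_entropy_share(files: dict[str, bytes], threshold: float = 7.5) -> float:
    """Fraction of files whose content meets or exceeds the entropy threshold."""
    if not files:
        return 0.0
    return sum(shannon_entropy(b) >= threshold for b in files.values()) / len(files)


def entropy_anomaly(previous: dict[str, bytes], current: dict[str, bytes],
                    threshold: float = 7.5, max_jump: float = 0.2) -> bool:
    """Flag the current run if the high-entropy share rose by more than max_jump."""
    jump = high_entropy_share(current, threshold) - high_entropy_share(previous, threshold)
    return jump > max_jump


if __name__ == "__main__":
    # Simulated runs: ten text files, then half of them replaced by random bytes
    # as a stand-in for ciphertext.
    baseline = {f"doc{i}.txt": b"quarterly report " * 256 for i in range(10)}
    attacked = dict(baseline)
    for i in range(5):
        attacked[f"doc{i}.txt"] = os.urandom(4096)
    print(entropy_anomaly(baseline, baseline))  # → False
    print(entropy_anomaly(baseline, attacked))  # → True
```

The comparison between runs is what makes this a late signal. It can only fire once encrypted files have already been captured by a backup job.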

Decision drivers: Existing security tooling investments, SOC maturity and staffing, network encryption posture (TLS inspection feasibility), acceptable false-positive rate for automated containment, and budget for managed detection and response (MDR) services.

ADR: Recovery Environment Architecture

Context: After a ransomware incident, restored workloads must be validated as clean before reconnecting to production networks. The organization needs a pre-staged environment for this purpose.

Options:
- Dedicated isolated recovery environment (IRE): Pre-staged compute (physical or virtual) on a quarantined network segment with no connectivity to production. Restored VMs boot in the IRE and undergo malware scanning and validation. They are then migrated to rebuilt production infrastructure. Fastest recovery, highest infrastructure cost.
- Cloud-based clean room: A temporary isolated VPC/VNet spun up in the cloud for recovery validation. Restored backups are mounted in the cloud environment and scanned, then either migrated to production cloud or exported back to on-premises. Elastic cost (pay only during recovery); requires cloud backup replication.
- In-place restore on production hardware after wipe: Rebuild production infrastructure from scratch (reimage servers, rebuild AD), then restore backups directly onto it. No dedicated recovery infrastructure cost. Slower, because rebuild and restore happen sequentially rather than in parallel. Risk: incomplete eradication if any compromised system is missed during rebuild.
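
The IRE acts as a gate between restore and production. One way to make that gate explicit is sketched below, under stated assumptions:

- pick the latest restore point that predates the earliest known indicator of compromise (IOC)
- refuse promotion unless the malware scan and application checks inside the IRE also pass

The workload name, the dates, and the `app_checks_passed` flag are hypothetical; real validation steps depend on the workload.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class RestoredWorkload:
    name: str
    restore_point: datetime    # when the backup being restored was taken
    malware_scan_clean: bool   # result of scanning inside the IRE
    app_checks_passed: bool    # hypothetical workload-specific validation


def select_restore_point(points: list[datetime], earliest_ioc: datetime) -> datetime | None:
    """Latest restore point strictly older than the earliest known IOC, if any."""
    candidates = [p for p in points if p < earliest_ioc]
    return max(candidates) if candidates else None


def promotion_blockers(w: RestoredWorkload, earliest_ioc: datetime) -> list[str]:
    """Reasons the workload must stay in the IRE; an empty list allows promotion."""
    blockers: list[str] = []
    if w.restore_point >= earliest_ioc:
        blockers.append("restore point is not older than the earliest known IOC")
    if not w.malware_scan_clean:
        blockers.append("malware scan in the IRE was not clean")
    if not w.app_checks_passed:
        blockers.append("application validation failed")
    return blockers


earliest_ioc = datetime(2024, 3, 10, tzinfo=timezone.utc)
points = [datetime(2024, 3, day, tzinfo=timezone.utc) for day in (1, 5, 9, 12)]
chosen = select_restore_point(points, earliest_ioc)
vm = RestoredWorkload("erp-db01", chosen, malware_scan_clean=True, app_checks_passed=True)
print(chosen.date())                          # → 2024-03-09
print(promotion_blockers(vm, earliest_ioc))   # → []
```

The restore-point check alone is not sufficient. Because of attacker dwell time, the earliest known IOC may be later than the actual intrusion. For that reason, the scan and validation checks are independent blockers rather than optional extras.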

Decision drivers: Target recovery time (IRE enables parallel restore while production is rebuilt), budget for standing recovery infrastructure, cloud readiness of backup data, and whether the organization has tested each approach in a DR exercise.