Compute

Scope

Compute platform selection, sizing, scaling, placement, and lifecycle management for all workload types. This file covers which compute decisions need to be made and the trade-offs involved. For provider-specific implementation details, see the provider compute files. Applies to all deployment models: public cloud IaaS/PaaS, on-premises virtualization, bare metal, and hybrid.

Checklist

Why This Matters

Compute is the foundation of the application tier and typically the largest line item in cloud spend. Incorrect sizing leads either to wasted budget (industry surveys often put idle or underutilized resources at roughly 30-40% of cloud compute spend) or to performance degradation during peak load. Getting compute decisions wrong at the architecture phase is costly to fix later: migrating from VMs to containers, re-architecting from stateful to stateless, or changing instance families (especially across CPU architectures, such as x86 to Arm) often requires application changes, not just infrastructure changes.

High availability depends on compute placement decisions made at design time. Deploying all instances in a single availability zone or on a single hypervisor host creates a single point of failure that no amount of application-level resilience can compensate for. Similarly, autoscaling that is designed reactively, after the first outage, tends to lag behind demand, because scaling policies need to account for instance launch time, application warm-up, and downstream dependency readiness.

OS lifecycle management is one of the most frequently deferred compute decisions and one of the most dangerous to ignore. Unpatched vulnerabilities are a leading attack vector in breaches, yet many organizations treat patching as an operational task rather than an architectural decision. The choice between in-place patching and immutable image replacement has significant implications for CI/CD pipeline design, deployment velocity, and incident response capability.

Common Decisions (ADR Triggers)

ADR: Compute Platform Selection

Context: The architecture must support the application's performance, availability, and operational requirements while aligning with team skills and cost targets.

Options:

| Criterion | Virtual Machines | Containers (Kubernetes) | Serverless (FaaS) | Bare Metal |
| --- | --- | --- | --- | --- |
| Workload fit | Monoliths, legacy apps, OS-dependent software | Microservices, 12-factor apps, polyglot stacks | Event-driven, intermittent, API backends | License-bound (Oracle per-core), HPC, GPU-direct |
| Density | 1 app per VM typical (lower density) | 10-50 containers per node (highest density) | Managed by provider (no capacity planning) | 1 workload per server (lowest density) |
| Scaling speed | Minutes (VM boot + app start) | Seconds (container start) | Milliseconds when warm; cold starts range from about 100 ms to 10 s depending on runtime | N/A (manual provisioning) |
| Operational overhead | OS patching, antivirus, monitoring agents | Cluster management, networking (CNI), upgrades | Minimal (vendor-managed runtime) | Full hardware lifecycle: firmware, BIOS, RAID, OS |
| Cost model | Per-hour/second, reserved instances available | Node cost + cluster overhead (control plane, monitoring) | Per-invocation + duration, zero cost at idle | CapEx hardware + data center costs, or cloud bare metal premium |
| Portability | OVA/VMDK export, limited portability | High: container images run anywhere with K8s | Low: vendor-specific runtimes and triggers | N/A |

Decision drivers: Application architecture (monolith vs. microservices), team Kubernetes expertise, cold start tolerance, cost at scale, and compliance requirements (some regulations require dedicated hardware).

ADR: Autoscaling Strategy

Context: The application experiences variable load and must scale automatically to maintain performance SLAs without over-provisioning.

Options:

- Target tracking: Maintains a metric (CPU, request count) at a target value. Simplest to configure, works well for gradual load changes. Reacts slowly to spikes.
- Step scaling: Defines thresholds with corresponding scale-out increments (e.g., CPU > 70% add 2, > 85% add 4). Better for bursty traffic with known patterns. More complex to tune.
- Scheduled scaling: Pre-scales at known times (business hours, batch windows, campaign launches). Predictable and cost-efficient when patterns are stable. Does not handle unexpected traffic.
- Predictive scaling: Uses ML to forecast load and pre-provision capacity. Combines scheduled and reactive approaches. Available in AWS ASG and some Kubernetes solutions. Requires historical data to be accurate.
- Event-driven (KEDA): Scales from zero based on external metrics (SQS queue depth, Kafka lag, cron schedules). Ideal for async processing workloads. Requires metric adapter configuration.
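The difference between target tracking and step scaling can be illustrated with a minimal sketch. The function names and the `STEP_POLICY` table are hypothetical; the proportional rule is similar in spirit to the formula documented for the Kubernetes Horizontal Pod Autoscaler, and managed services may apply additional smoothing and cooldowns that are not modeled here. The step thresholds reuse the example values from the list above.

```python
import math

def target_tracking_desired(current_instances: int,
                            current_metric: float,
                            target_metric: float) -> int:
    """Proportional rule: resize so the per-instance metric returns to the target."""
    if current_instances <= 0 or target_metric <= 0:
        raise ValueError("instance count and target must be positive")
    desired = math.ceil(current_instances * current_metric / target_metric)
    return max(1, desired)  # never scale below one instance in this sketch

# (CPU threshold in percent, instances to add), highest threshold first.
# Values are the illustrative ones from the text, not a recommendation.
STEP_POLICY = [(85.0, 4), (70.0, 2)]

def step_scaling_desired(current_instances: int,
                         cpu_percent: float,
                         steps=STEP_POLICY) -> int:
    """Fixed increments: apply the first (highest) threshold that is exceeded."""
    for threshold, increment in steps:
        if cpu_percent > threshold:
            return current_instances + increment
    return current_instances  # below every threshold: no change

if __name__ == "__main__":
    print(target_tracking_desired(4, current_metric=90.0, target_metric=60.0))
    print(step_scaling_desired(4, cpu_percent=72.0))
    print(step_scaling_desired(4, cpu_percent=90.0))
```

Target tracking sizes the response to the size of the deviation, while step scaling adds a fixed increment per threshold, which is why the latter needs tuning against known traffic patterns.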

Decision drivers: Traffic pattern predictability, acceptable response time during scale-out (including instance boot + application warm-up), cost tolerance for pre-provisioned headroom, and complexity budget.
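One way to reason about pre-provisioned headroom is to estimate how much load can arrive while new capacity is still booting and warming up. The sketch below is a rough back-of-the-envelope model, not a sizing method from this document: it assumes linear load growth, and every parameter name and value is hypothetical.

```python
import math

def headroom_instances(growth_rps_per_s: float,
                       boot_s: float,
                       warmup_s: float,
                       rps_per_instance: float) -> int:
    """Instances of spare capacity needed to absorb growth during scale-out lag.

    Assumes load grows linearly at growth_rps_per_s while new instances
    spend boot_s + warmup_s seconds becoming ready to serve traffic.
    """
    if rps_per_instance <= 0:
        raise ValueError("rps_per_instance must be positive")
    lag_s = boot_s + warmup_s                # time before new capacity helps
    extra_rps = growth_rps_per_s * lag_s     # load that arrives in the meantime
    return math.ceil(extra_rps / rps_per_instance)

if __name__ == "__main__":
    # Hypothetical numbers: +5 req/s every second, 90 s boot, 30 s warm-up,
    # 200 req/s served per instance.
    print(headroom_instances(5.0, 90.0, 30.0, 200.0))
```

Shorter boot and warm-up times reduce the headroom required, which is one reason faster-starting platforms can run closer to their utilization target.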

ADR: OS Patching Strategy

Context: The organization must keep operating systems patched for security compliance while minimizing application downtime.

Options:

- In-place patching with maintenance windows: Patch running instances during scheduled windows. Simple, but requires reboot scheduling. Creates drift over time as instances accumulate unique patch histories. Tools: AWS SSM Patch Manager, WSUS, Ansible.
- Immutable image replacement: Build new golden images (AMI, VM template) via CI/CD pipeline, replace instances via rolling deployment. No drift, fully reproducible. Requires image pipeline investment and longer deployment time. Tools: Packer, EC2 Image Builder, Azure Image Builder.
- Container base image updates: Rebuild container images with updated base images, redeploy via Kubernetes rolling update. Fastest patch cycle, but it patches only the OS packages inside the container image; the host node OS (including the kernel) requires separate patching via node rotation (EKS managed node group updates, GKE auto-upgrade).

Recommendation: Use immutable image replacement for production workloads and container base image updates for Kubernetes. Reserve in-place patching for legacy workloads that cannot be rebuilt from automation. Regardless of strategy, scan images with vulnerability scanners (Trivy, Qualys, Nessus) before deployment.
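The rolling replacement behind the immutable approach can be sketched as a simple batch plan. This is a toy in-memory model under stated assumptions: the `Instance` type, the function names, and the `max_unavailable` parameter are hypothetical, and a real rollout would also wait for health checks before moving to the next batch.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Instance:
    name: str
    image: str  # identifier of the golden image the instance was built from

def plan_batches(fleet: list, max_unavailable: int) -> list:
    """Split the fleet into batches so at most max_unavailable are replaced at once."""
    if max_unavailable < 1:
        raise ValueError("max_unavailable must be at least 1")
    return [fleet[i:i + max_unavailable]
            for i in range(0, len(fleet), max_unavailable)]

def rolling_replace(fleet: list, new_image: str, max_unavailable: int) -> list:
    """Return the fleet after every instance has been replaced from new_image."""
    replaced = []
    for batch in plan_batches(fleet, max_unavailable):
        # Instances are never patched in place: each one is swapped for a
        # fresh instance built from the new image.
        replaced.extend(Instance(name=old.name, image=new_image) for old in batch)
    return replaced

if __name__ == "__main__":
    fleet = [Instance(f"web-{i}", "img-v1") for i in range(5)]
    for batch in plan_batches(fleet, max_unavailable=2):
        print([inst.name for inst in batch])
    print(rolling_replace(fleet, "img-v2", max_unavailable=2))
```

A smaller batch size lowers the capacity lost during the rollout but lengthens the total deployment time, which is the trade-off noted for immutable replacement above.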

ADR: Spot/Preemptible Instance Strategy

Context: The organization wants to reduce compute costs for workloads that can tolerate interruption.

Options:

- No spot usage: All on-demand or reserved. Highest cost, zero interruption risk. Appropriate for databases, stateful services, and SLA-bound workloads.
- Spot for batch/CI only: Use spot instances for batch processing, CI/CD runners, and data pipelines. Moderate savings with contained blast radius. Most common starting point.
- Spot for stateless production: Run stateless web/API tiers on spot with on-demand fallback. Highest savings (commonly cited as 60-90% below on-demand pricing), but requires robust health checks, graceful shutdown handlers, and diversified instance pools. Risk: simultaneous reclamation during capacity crunches.
- Mixed fleet (spot + on-demand baseline): Run the minimum required capacity on on-demand/reserved, burst with spot. Balanced cost and reliability. Requires capacity-aware load balancing.

Decision drivers: Workload fault tolerance, acceptable interruption frequency, instance type availability in the target region/AZ, and team readiness to handle spot interruption signals.
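The cost effect of a mixed fleet can be estimated with simple arithmetic. The sketch below is illustrative only: the function name, the prices, and the 70% spot discount are assumptions, and it ignores interruptions, reserved-instance pricing, and the time the fleet actually spends at peak.

```python
def mixed_fleet_plan(baseline: int,
                     peak: int,
                     on_demand_hourly: float,
                     spot_discount: float) -> dict:
    """Split peak capacity into an on-demand baseline plus a spot burst."""
    if not 0.0 <= spot_discount < 1.0:
        raise ValueError("spot_discount must be in [0, 1)")
    on_demand = baseline
    spot = max(0, peak - baseline)  # burst capacity runs on spot
    spot_hourly = on_demand_hourly * (1.0 - spot_discount)
    mixed_cost = on_demand * on_demand_hourly + spot * spot_hourly
    all_on_demand_cost = max(baseline, peak) * on_demand_hourly
    return {
        "on_demand_instances": on_demand,
        "spot_instances": spot,
        "mixed_hourly_cost": mixed_cost,
        "all_on_demand_hourly_cost": all_on_demand_cost,
    }

if __name__ == "__main__":
    # Hypothetical inputs: 4 baseline instances, 10 at peak,
    # $0.10 per instance-hour on demand, 70% spot discount.
    print(mixed_fleet_plan(baseline=4, peak=10,
                           on_demand_hourly=0.10, spot_discount=0.70))
```

Keeping the baseline on on-demand or reserved capacity bounds the impact of a simultaneous spot reclamation to the burst portion of the fleet.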