Deployment

Scope

This file covers deployment strategy decisions: how application code moves from source control to production, including deployment models, rollback procedures, infrastructure as code, database migration coordination, environment promotion, and deployment observability. For provider-specific CI/CD pipeline implementation, see the provider files. For container orchestration details, see general/container-orchestration.md.

Checklist

Why This Matters

Deployment is the highest-risk routine activity in operations. Industry data consistently shows that deployments and configuration changes cause the majority of production incidents. A poorly designed deployment process creates a compounding problem: teams that fear deployments deploy less frequently, which means each deployment carries more changes, which increases the blast radius when something goes wrong, which reinforces the fear. Breaking this cycle requires investment in deployment automation, fast rollback, and deployment observability.

The deployment strategy directly affects availability, developer velocity, and operational cost. A blue-green deployment gives near-instant rollback but requires double the capacity while both environments run. A canary deployment minimizes blast radius but requires traffic splitting and automated metrics analysis. Rolling updates are simple to implement but roll back slowly, because reverting means rolling forward or redeploying the previous version. There is no universally correct answer: the choice depends on the application's downtime tolerance, the team's operational maturity, and the infrastructure budget.

Infrastructure as code has shifted from a best practice to a baseline expectation. Manual infrastructure changes create drift, make environments unreproducible, and eliminate the ability to audit what changed and when. However, adopting IaC introduces its own challenges — state management, secret handling, module versioning, and blast radius control — that must be addressed in the architecture rather than discovered during an incident.

Common Decisions (ADR Triggers)

ADR: Deployment Strategy Selection

Context: The application requires a deployment model that balances availability, rollback speed, and infrastructure cost.

Options:

| Criterion | Rolling Update | Blue-Green | Canary | Recreate |
| --- | --- | --- | --- | --- |
| Downtime | None (if health checks are configured) | None (instant cutover) | None (gradual shift) | Yes (full restart) |
| Rollback Speed | Slow (must roll forward or redeploy the previous version) | Instant (switch back to the old environment) | Fast (shift traffic back) | Slow (full redeploy) |
| Infrastructure Cost | No extra capacity needed | 2x capacity during deployment | Minimal extra (canary instances) | No extra capacity |
| Complexity | Low | Moderate (routing, environment sync) | High (traffic splitting, metrics analysis) | Lowest |
| Database Compatibility | Must handle mixed versions during rollout | Can run migrations before cutover (schema must stay compatible with the old environment for rollback) | Must handle mixed versions | Single version at a time |
| Best Fit | Stateless microservices, Kubernetes default | Mission-critical with zero-downtime requirement | High-traffic services needing gradual validation | Dev/test environments, batch processing |

Decision drivers: Downtime tolerance (SLA commitments), rollback time requirement, infrastructure budget, database migration complexity, and team operational maturity.
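
The "metrics analysis" that a canary deployment depends on can be illustrated with a small decision function. This is a minimal sketch, not a production analysis engine: the `WindowStats` type, the `canary_decision` name, and both thresholds are hypothetical, and a real system would normally also compare latency and saturation signals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WindowStats:
    """Request counts observed for one version during an analysis window."""
    requests: int
    errors: int

    @property
    def error_rate(self) -> float:
        # An empty window carries no evidence of errors; report 0.0.
        return self.errors / self.requests if self.requests else 0.0


def canary_decision(
    baseline: WindowStats,
    canary: WindowStats,
    min_requests: int = 500,          # hypothetical minimum sample size
    max_rate_increase: float = 0.01,  # hypothetical tolerance: +1 percentage point
) -> str:
    """Return 'wait', 'promote', or 'rollback' for the current window."""
    if canary.requests < min_requests:
        return "wait"  # not enough canary traffic to judge yet
    if canary.error_rate > baseline.error_rate + max_rate_increase:
        return "rollback"  # shift traffic back to the stable version
    return "promote"  # safe to raise the canary's traffic share


stable = WindowStats(requests=10_000, errors=50)
print(canary_decision(stable, WindowStats(requests=1_000, errors=30)))
```

Comparing the canary against a live baseline, instead of a fixed error budget, keeps the gate from failing a release because of an unrelated incident that affects both versions.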

ADR: Infrastructure as Code Tool Selection

Context: The organization needs to manage infrastructure declaratively with version control, drift detection, and repeatable provisioning.

Options:

- Terraform: Multi-cloud, largest ecosystem of providers, HCL language (purpose-built, easy to learn, limited expressiveness). Requires state file management (remote backend, state locking). Mature, battle-tested. Licensed under the Business Source License (BSL) since August 2023, which is not an open-source license; the OpenTofu fork remains open source under the MPL.
- Pulumi: Uses general-purpose languages (Python, TypeScript, Go, C#) instead of a DSL. Same state management concerns as Terraform. Better for teams that prefer real programming constructs (loops, conditionals, type checking). Smaller provider ecosystem.
- CloudFormation / CDK: AWS-native, no state file to manage (AWS manages it). CDK generates CloudFormation templates from languages such as TypeScript and Python. AWS-only; not viable for multi-cloud. Deep integration with AWS services.
- Crossplane: Kubernetes-native IaC using CRDs. Infrastructure is managed like any other Kubernetes resource. Best for platform engineering teams building internal developer platforms. Steep learning curve, requires Kubernetes expertise.

Decision drivers: Cloud strategy (single vs. multi-cloud), team language preferences, state management tolerance, existing Kubernetes investment, and licensing considerations.
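
Drift detection, which every tool above performs in some form, comes down to comparing declared configuration with what actually exists. The toy sketch below shows only that comparison; the `detect_drift` function and the attribute names are hypothetical, and real tools build the observed side by querying provider APIs (for example, during `terraform plan`).

```python
from typing import Any


def detect_drift(
    declared: dict[str, Any], observed: dict[str, Any]
) -> dict[str, tuple[Any, Any]]:
    """Map each drifted key to (declared value, observed value).

    A key missing on one side is reported with None for that side.
    """
    drift = {}
    for key in sorted(declared.keys() | observed.keys()):
        want, have = declared.get(key), observed.get(key)
        if want != have:
            drift[key] = (want, have)
    return drift


# Hypothetical attributes: what the code declares vs. what is running.
declared = {"instance_type": "m5.large", "min_size": 2, "encrypted": True}
observed = {"instance_type": "m5.xlarge", "min_size": 2, "encrypted": True, "public_ip": True}

for key, (want, have) in detect_drift(declared, observed).items():
    print(f"{key}: declared={want!r} observed={have!r}")
```

Reporting attributes that exist only on the observed side matters as much as reporting changed values, since manual additions are a common source of drift.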

ADR: Database Migration Strategy During Deployment

Context: Application deployments frequently require database schema changes that must be coordinated with code deployment.

Options:

- Expand-contract (recommended): Phase 1 deploys a backward-compatible schema change (add a column, create a table). Phase 2 deploys application code that uses the new schema. Phase 3 removes the old columns or tables once no running version depends on them. A breaking change takes several deployments instead of one, but each step is backward compatible, which allows zero-downtime rollout and safe rollback.
- Pre-deployment migration job: A migration runs before the new application version starts. The workflow is simpler, but the old application version must tolerate the new schema during rollout, and rollback requires a reverse migration.
- Application-managed migration at startup: Each instance checks for and applies pending migrations on boot (e.g., Flyway or Alembic invoked at startup). Multiple instances starting simultaneously can race, so this approach requires distributed locking.

Decision drivers: Downtime tolerance, rollback requirements, team discipline for backward-compatible migrations, and database size (large tables make some ALTER operations impractical without online DDL tools like pt-online-schema-change or gh-ost).
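
The three expand-contract phases can be walked through on an in-memory SQLite database. This is a sketch under stated assumptions: the `users` table and its `fullname` and `display_name` columns are invented for illustration, and each phase would ship as a separate deployment in practice, not as one script.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, fullname TEXT)")
conn.execute("INSERT INTO users (fullname) VALUES ('Ada Lovelace')")

# Phase 1 (expand): add a nullable column. Old code that only reads or
# writes `fullname` keeps working, so this can ship ahead of the new code.
conn.execute("ALTER TABLE users ADD COLUMN display_name TEXT")
conn.execute("INSERT INTO users (fullname) VALUES ('Alan Turing')")  # old writer

# Phase 2 (migrate): new code writes both columns, and a backfill copies
# existing rows. Rolling the code back at this point loses no data.
conn.execute(
    "INSERT INTO users (fullname, display_name) VALUES (?, ?)",
    ("Grace Hopper", "Grace Hopper"),
)
conn.execute("UPDATE users SET display_name = fullname WHERE display_name IS NULL")

# Phase 3 (contract): once no deployed version reads `fullname`, remove it.
# The table is rebuilt here because older SQLite versions lack DROP COLUMN.
conn.executescript(
    """
    CREATE TABLE users_new (id INTEGER PRIMARY KEY, display_name TEXT);
    INSERT INTO users_new (id, display_name) SELECT id, display_name FROM users;
    DROP TABLE users;
    ALTER TABLE users_new RENAME TO users;
    """
)

rows = conn.execute("SELECT id, display_name FROM users ORDER BY id").fetchall()
print(rows)
```

The old writer in Phase 1 is the important detail: between expand and contract, both code versions must be able to run against the same schema, which is what makes rollback safe.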

ADR: Environment Promotion Strategy

Context: The organization needs a defined path for promoting changes from development to production with appropriate validation at each stage.

Options:

- Three-environment (dev/staging/prod): Minimum viable pipeline. Staging mirrors production configuration. Automated tests gate promotion. Simple to manage, but staging may diverge from production over time.
- Four-environment (dev/staging/pre-prod/prod): Pre-prod is a production-scale environment for performance testing and final validation. Higher infrastructure cost, better confidence.
- Per-PR ephemeral environments: Spin up a full environment for each pull request using Kubernetes namespaces or serverless platforms. Excellent developer experience, high infrastructure cost, complex to manage stateful dependencies.

Decision drivers: Infrastructure budget, deployment frequency target, need for performance testing, regulatory requirements for production-like validation, and team size.
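
Whichever environment model is chosen, gated promotion follows the same shape: a build advances only while each stage's validation passes. The sketch below assumes the three-environment model; the `STAGES` list, the `promote` function, and the gate callables are hypothetical stand-ins for real deploy steps and test suites.

```python
from typing import Callable

# Hypothetical stage order for the three-environment model.
STAGES = ["dev", "staging", "prod"]


def promote(build: str, gates: dict[str, Callable[[str], bool]]) -> list[str]:
    """Deploy `build` stage by stage and stop at the first failing gate.

    Returns the stages the build reached. A stage's gate runs after the
    build is deployed there and must pass before the next stage starts.
    """
    reached = []
    for stage in STAGES:
        reached.append(stage)  # stand-in for the real deploy step
        gate = gates.get(stage, lambda _build: True)  # no gate means pass
        if not gate(build):
            break  # validation failed: do not promote further
    return reached


# A failing staging gate keeps the build out of prod.
print(promote("build-42", {"staging": lambda build: False}))
```

Promoting the same build artifact through every stage, and not rebuilding per environment, is what makes a passing gate in staging meaningful evidence for production.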