Container Orchestration

Scope

This file covers cloud-agnostic container orchestration decisions: whether to containerize, which orchestration platform to use, and how to design the container platform (networking, storage, security, multi-tenancy). It addresses what to decide; the provider-specific files cover how to implement those decisions. It does not cover provider-specific managed Kubernetes implementations (see provider container files), CI/CD pipeline design (see general/deployment.md), or application architecture patterns such as microservices (see patterns/microservices.md).

Checklist

Why This Matters

Container orchestration is one of the most consequential platform decisions in modern infrastructure. Choosing the wrong abstraction level creates years of technical debt, whether that means over-investing in Kubernetes for a small team running a handful of services or under-investing in orchestration for a growing microservices fleet. Teams that adopt Kubernetes without dedicated platform engineering capacity frequently end up with insecure, unreliable clusters that are harder to operate than the VMs they replaced.

The decisions in this file cascade through every other architectural concern. Namespace design affects security boundaries, network policy scope, and RBAC complexity. Storage strategy determines whether stateful workloads can run reliably or will suffer data loss during node failures. Ingress and service mesh choices affect latency, observability, and the ability to do canary deployments. Getting these foundational decisions wrong is expensive to reverse because workloads, CI/CD pipelines, and operational runbooks all build on top of them.

A common failure pattern is treating container orchestration as purely an infrastructure concern. In practice, it requires close collaboration between platform teams and application developers on image build standards, resource sizing, health check design, and graceful shutdown behavior. Organizations that skip this alignment end up with clusters full of misconfigured workloads — containers without resource limits consuming entire nodes, missing readiness probes causing traffic to hit unready pods, and images pulled from public registries without vulnerability scanning.
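The three misconfigurations named above can be checked mechanically before a workload reaches a cluster. The following is a minimal sketch, not a complete policy engine: it assumes the pod spec has already been parsed into a Python dict shaped like a Kubernetes `spec`, and the names `TRUSTED_REGISTRIES`, `image_registry`, and `lint_pod_spec` and the registry host are hypothetical, chosen for illustration. Whether to require CPU limits as well as memory limits is itself a policy decision; this sketch flags both.

```python
from typing import Any

# Hypothetical allow-list of registry hosts; replace with your own policy.
TRUSTED_REGISTRIES = {"registry.internal.example.com"}


def image_registry(image: str) -> str:
    """Return the registry host of an image reference (Docker Hub if none is given)."""
    first, _, rest = image.partition("/")
    # The first path component is a registry host only if it looks like one.
    if rest and ("." in first or ":" in first or first == "localhost"):
        return first
    return "docker.io"


def lint_pod_spec(pod_spec: dict[str, Any]) -> list[str]:
    """Collect findings for missing limits, missing readiness probes, and untrusted images."""
    findings = []
    for container in pod_spec.get("containers", []):
        name = container.get("name", "<unnamed>")
        limits = container.get("resources", {}).get("limits", {})
        for resource in ("cpu", "memory"):
            if resource not in limits:
                findings.append(f"{name}: no {resource} limit")
        if "readinessProbe" not in container:
            findings.append(f"{name}: no readiness probe")
        registry = image_registry(container.get("image", ""))
        if registry not in TRUSTED_REGISTRIES:
            findings.append(f"{name}: image from untrusted registry {registry}")
    return findings


# Illustrative input: one well-configured container and one bare sidecar.
example_spec = {
    "containers": [
        {
            "name": "api",
            "image": "registry.internal.example.com/team/api:1.4.2",
            "resources": {"limits": {"cpu": "500m", "memory": "256Mi"}},
            "readinessProbe": {"httpGet": {"path": "/healthz", "port": 8080}},
        },
        {"name": "sidecar", "image": "nginx:1.27"},
    ]
}

for finding in lint_pod_spec(example_spec):
    print(finding)
```

In this example only the `sidecar` container produces findings. In practice a check like this would more likely run as an admission policy or a CI step agreed between the platform team and application developers, which is the alignment the paragraph above describes.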

Common Decisions (ADR Triggers)

Containers vs VMs vs serverless — which workload types go where and why
Orchestration platform selection — Kubernetes vs ECS vs Nomad vs managed service
Managed vs self-managed Kubernetes — operational model and control plane ownership
Container runtime — containerd vs CRI-O vs sandboxed runtimes for sensitive workloads
Image registry and supply chain security — private registry, scanning policy, image signing
Cluster topology — single cluster vs multi-cluster, cluster-per-environment vs shared clusters
Namespace and tenancy model — how teams, environments, and applications map to namespaces
Ingress controller and service mesh — whether to adopt a service mesh and which one
CNI plugin selection — overlay vs routable networking, network policy engine
Persistent storage strategy — in-cluster stateful workloads vs external managed services
GitOps adoption — ArgoCD vs Flux vs imperative deployment pipelines
