GCP Containers

Scope

GKE (Standard and Autopilot), Cloud Run (services, jobs, functions), GKE Enterprise (formerly Anthos), Artifact Registry, container security and fleet management.

Checklist

Why This Matters

GKE is widely regarded as one of the most mature managed Kubernetes offerings, and Autopilot adds a pod-level abstraction that removes node management from the operator. The choice between Autopilot and Standard has significant implications. Autopilot enforces security best practices (no privileged containers, no host network) and bills per pod resource request. Standard gives full node access but requires managing node pools, OS patching, and capacity. Autopilot also supports GPU workloads, which makes it viable for AI/ML inference without manually managed GPU node pools.

Workload Identity is critical because the alternatives carry real risk: a node-level service account grants every pod on the node the same permissions, and mounted key files are long-lived credentials that can leak.

Cloud Run provides a simpler serverless container model for request-driven workloads that do not need the complexity of Kubernetes. Its per-request billing can be dramatically cheaper for sporadic traffic. Cloud Run Jobs extends the model to batch workloads with a defined start and end, supporting parallel task execution and scheduled runs. Cloud Run also supports GPU-attached instances for ML inference and multi-container deployments (sidecars) for workloads that need auxiliary containers.
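The claim that per-request billing can be cheaper for sporadic traffic can be checked with simple arithmetic. The sketch below is a simplified model, not a pricing calculator: it assumes a concurrency of 1 and ignores free tiers, per-request fees, and networking. The `Rates` class, the function names, and every price in the usage section are hypothetical placeholders; substitute figures from the current pricing page.

```python
from dataclasses import dataclass

SECONDS_PER_MONTH = 30 * 24 * 3600  # seconds in a 30-day month


@dataclass(frozen=True)
class Rates:
    """Hypothetical unit prices (currency units per second)."""
    per_vcpu_second: float
    per_gib_second: float

    def per_instance_second(self, vcpu: float, gib: float) -> float:
        """Price of running one instance of the given size for one second."""
        return vcpu * self.per_vcpu_second + gib * self.per_gib_second


def request_billed_cost(requests: int, seconds_per_request: float,
                        vcpu: float, gib: float, rates: Rates) -> float:
    """Monthly cost when resources are billed only while a request is served."""
    busy_seconds = requests * seconds_per_request
    return busy_seconds * rates.per_instance_second(vcpu, gib)


def always_on_cost(instances: int, vcpu: float, gib: float, rates: Rates) -> float:
    """Monthly cost of instances that are billed for every second of the month."""
    return instances * SECONDS_PER_MONTH * rates.per_instance_second(vcpu, gib)


def break_even_requests(seconds_per_request: float, vcpu: float, gib: float,
                        request_rates: Rates, always_on_rates: Rates,
                        instances: int = 1) -> float:
    """Requests per month at which both billing models cost the same."""
    per_request = seconds_per_request * request_rates.per_instance_second(vcpu, gib)
    return always_on_cost(instances, vcpu, gib, always_on_rates) / per_request


if __name__ == "__main__":
    # Placeholder numbers for illustration only, not published prices.
    request_rates = Rates(per_vcpu_second=0.00003, per_gib_second=0.000003)
    always_on_rates = Rates(per_vcpu_second=0.00002, per_gib_second=0.000002)

    sporadic = request_billed_cost(50_000, 0.2, 1, 0.5, request_rates)
    steady = always_on_cost(1, 1, 0.5, always_on_rates)
    threshold = break_even_requests(0.2, 1, 0.5, request_rates, always_on_rates)

    print(f"request-billed: {sporadic:.2f}")
    print(f"always-on:      {steady:.2f}")
    print(f"break-even at about {threshold:,.0f} requests per month")
```

When both models use the same rates, the break-even point reduces to the seconds in the month divided by the seconds per request, so a service that is busy for only a small fraction of the month favors request-based billing.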

Common Decisions (ADR Triggers)

GKE mode -- Autopilot (managed nodes, pod billing, security defaults, GPU support) vs Standard (full control, node billing, custom configurations)
GKE vs Cloud Run -- Kubernetes orchestration vs serverless containers, persistent workloads vs request-driven, team Kubernetes expertise
Cloud Run services vs Cloud Run Jobs -- always-listening request-driven services vs batch/scheduled tasks with defined completion, parallel task execution for data processing
Multi-cluster strategy -- single regional cluster vs multi-zonal vs multi-regional with Multi Cluster Ingress, fleet management
GitOps tooling -- Config Sync (managed, GKE Enterprise) vs Argo CD vs Flux, policy enforcement via Policy Controller vs standalone Gatekeeper
Image management -- Artifact Registry (regional, multi-region) vs self-hosted registry, vulnerability scanning policy (block critical CVEs)
Ingress model -- GKE Gateway Controller vs classic GCE Ingress vs nginx Ingress Controller vs Istio ingress gateway
GKE Enterprise adoption -- GKE on-prem vs GKE Enterprise on AWS/Azure vs GKE Attached Clusters, Connect gateway for fleet management
Cloud Run scaling -- min instances (cost vs cold start latency) vs zero-to-N scaling, CPU always-allocated (background tasks) vs request-only (cost savings)
GPU workloads -- GKE Autopilot GPU (managed, simpler) vs GKE Standard GPU node pools (full control) vs Cloud Run GPU (serverless inference)
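For the Cloud Run Jobs option in the list above, parallel task execution usually means each task works on its own share of the input. A minimal sketch, assuming the job reads the `CLOUD_RUN_TASK_INDEX` and `CLOUD_RUN_TASK_COUNT` environment variables that Cloud Run sets for job tasks. The `task_slice` helper and the list of work items are hypothetical, and a real job would likely read its work list from storage or a queue.

```python
import os
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def task_slice(items: Sequence[T], task_index: int, task_count: int) -> List[T]:
    """Return the items assigned to one task, distributed round-robin."""
    if task_count < 1:
        raise ValueError("task_count must be at least 1")
    if not 0 <= task_index < task_count:
        raise ValueError("task_index must be in the range [0, task_count)")
    # Task i takes items i, i + task_count, i + 2 * task_count, ...
    return list(items[task_index::task_count])


def main() -> None:
    # The defaults let the same script run locally as a single task.
    task_index = int(os.environ.get("CLOUD_RUN_TASK_INDEX", "0"))
    task_count = int(os.environ.get("CLOUD_RUN_TASK_COUNT", "1"))

    # Hypothetical work items standing in for files, rows, or messages.
    work = [f"shard-{n:03d}" for n in range(10)]

    for item in task_slice(work, task_index, task_count):
        print(f"task {task_index + 1} of {task_count} processing {item}")


if __name__ == "__main__":
    main()
```

Round-robin slicing keeps tasks independent of each other: every item is handled by exactly one task, and no coordination between tasks is needed as long as all of them see the same ordered work list.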

Reference Architectures

Google Cloud Architecture Center: Containers -- reference architectures for GKE cluster design, multi-tenant patterns, and CI/CD pipelines
Google Cloud: GKE best practices -- comprehensive guidance on cluster setup, networking, security, and cost optimization
Google Cloud: Migrate to containers -- reference architecture for modernizing applications from VMs to GKE or Cloud Run
Google Cloud: Multi-cluster Kubernetes with GKE -- reference design for multi-region GKE with global load balancing and fleet management
Google Cloud: Cloud Run production best practices -- reference patterns for concurrency tuning, cold start mitigation, and cost optimization for serverless containers