GCP DNS (Cloud DNS, Cloud Service Mesh, Global Load Balancing)
Scope
Cloud DNS (public and private zones, DNSSEC, routing policies), Cloud Service Mesh (unified product combining the former Traffic Director managed control plane and Anthos Service Mesh managed service mesh), Global and Regional Load Balancing, hybrid DNS integration.
Checklist
- [Critical] Choose zone type: public zones for internet-facing DNS resolution vs private zones for VPC-internal name resolution; private zones are associated with specific VPC networks and override public DNS resolution for the same domain within those VPCs
- [Recommended] Enable DNSSEC for public zones: Cloud DNS supports both DNSSEC signing (managed key rotation) and DNSSEC validation; establish chain of trust by adding DS records at the domain registrar; monitor DNSSEC status via Cloud Monitoring
- [Recommended] Configure Cloud DNS routing policies for traffic management: weighted round robin (traffic splitting by percentage for canary deployments), geolocation (route by client geography to specific IP sets), failover (primary/backup with health-checked targets); routing policies are set per record set
- [Recommended] Set up health checks for DNS failover routing: Cloud DNS failover policies require health-checked targets via Google Cloud load balancers or direct health check resources; configure health check intervals, thresholds, and protocol (HTTP, HTTPS, TCP) appropriate to the backend
- [Recommended] Design private zones for internal service discovery: use private zones (e.g., internal.mycompany.com) linked to VPCs for internal DNS; enable inbound DNS forwarding for on-premises clients to resolve GCP private zones; configure outbound DNS forwarding for GCP workloads to resolve on-premises DNS
- [Optional] Configure Cloud DNS peering for cross-VPC resolution: DNS peering forwards queries from one VPC to another VPC's private zone without full VPC peering; useful in hub-and-spoke network topologies where spoke VPCs need to resolve hub private zones
- [Recommended] Plan Global External Application Load Balancer routing for HTTP workloads: URL maps with host rules and path matchers for content-based routing, traffic splitting for canary deployments (5%/95% weighted backend services), header-based routing for A/B testing; provides Layer 7 routing as an alternative to DNS-level routing
- [Recommended] Configure Cloud Service Mesh for service mesh traffic management: managed control plane for Envoy proxies on VMs or GKE pods; supports weighted traffic splitting, request mirroring, fault injection, circuit breaking, and locality-aware load balancing; provides unified mTLS, observability, and authorization policies; operates at the application layer independent of DNS
- [Recommended] Set appropriate TTL values on DNS records: lower TTL (30-60s) for records subject to failover or deployment changes; higher TTL (300-3600s) for stable records; Cloud DNS bills per million queries (plus a monthly per-zone charge), and higher TTLs let recursive resolvers cache answers longer, which reduces query volume and cost for high-traffic domains
- [Recommended] Implement split-horizon DNS: use both a public zone and a private zone for the same domain; internal VPC clients resolve private IPs from the private zone while external clients resolve public IPs from the public zone; avoids traffic hairpinning through external load balancers for internal service-to-service communication
- [Recommended] Configure Cloud DNS for GKE: by default, GKE runs kube-dns in the cluster to resolve Service names and forwards other queries to the VPC resolver, which serves Cloud DNS private zones; Cloud DNS for GKE (managed DNS within GKE) replaces kube-dns for better scalability and reduced in-cluster resource consumption; supports Kubernetes DNS policies
- [Optional] Monitor DNS with Cloud Monitoring: query volume metrics, DNSSEC validation status, private zone resolution latency; set up alerts on DNS query errors and zone propagation delays; enable DNS query logging for security analysis and troubleshooting
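The failover and weighted round robin policies in this checklist can be modeled in a few lines of Python to show how an answer is chosen. This is a minimal sketch under stated assumptions, not Cloud DNS's implementation or API: `WeightedItem`, `pick_weighted`, and `resolve_failover` are hypothetical names, health status is passed in rather than probed, the addresses come from documentation ranges, and geolocation policies and partial traffic to the backup are left out.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class WeightedItem:
    """One weighted entry in a routing policy (hypothetical model)."""
    weight: float             # relative weight; 0 means never selected
    rrdatas: tuple[str, ...]  # addresses returned when this entry is chosen


def pick_weighted(items: list[WeightedItem], rng: random.Random) -> WeightedItem:
    """Weighted round robin: choose an entry with probability weight / total weight."""
    total = sum(item.weight for item in items)
    if total <= 0:
        raise ValueError("at least one entry needs a positive weight")
    point = rng.random() * total  # uniform in [0, total)
    cumulative = 0.0
    for item in items:
        cumulative += item.weight
        if point < cumulative:
            return item
    # Floating-point rounding can leave point at the very top; use the last positive entry.
    return next(item for item in reversed(items) if item.weight > 0)


def resolve_failover(
    primary: tuple[str, ...],
    healthy: set[str],
    backup: list[WeightedItem],
    rng: random.Random,
) -> tuple[str, ...]:
    """Answer with the healthy primary targets; if none are healthy, answer from the backup policy."""
    live = tuple(ip for ip in primary if ip in healthy)
    if live:
        return live
    return pick_weighted(backup, rng).rrdatas


if __name__ == "__main__":
    rng = random.Random(42)
    canary_split = [WeightedItem(95, ("203.0.113.10",)), WeightedItem(5, ("203.0.113.20",))]
    answers = [pick_weighted(canary_split, rng).rrdatas[0] for _ in range(10_000)]
    print("canary share:", answers.count("203.0.113.20") / len(answers))  # roughly 0.05
    # The only primary target is unhealthy, so the answer comes from the backup policy.
    print(resolve_failover(("198.51.100.1",), set(), canary_split, rng))
```

Keeping the health decision outside the selection function mirrors the split in the checklist: health checks decide which targets are eligible, and the routing policy decides how to spread answers across them.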
Why This Matters
DNS is the foundation of service discovery and traffic routing in GCP. Cloud DNS offers a 100% availability SLA for authoritative DNS, but misconfigured routing policies, missing health checks, or incorrect private zone associations silently direct traffic to wrong or unhealthy endpoints. Unlike AWS Route 53, which combines DNS and traffic routing in one service, GCP splits these concerns across Cloud DNS (authoritative DNS), the Global Load Balancer (Layer 7 traffic routing), and Cloud Service Mesh (service mesh control plane).
This separation provides more flexibility but requires understanding which layer to use for each routing requirement. DNS-level routing (Cloud DNS routing policies) works for any protocol, but failover speed depends on record TTLs because clients keep cached answers until they expire. Load balancer routing (URL maps, traffic splitting) works only for HTTP/HTTPS, but it fails over as soon as health checks mark a backend unhealthy and supports sophisticated content-based routing. Cloud Service Mesh provides the most advanced routing (header-based, fault injection, circuit breaking) plus mTLS, distributed tracing, and authorization policies, but requires deploying Envoy proxies.
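Private DNS zone configuration is critical for Private Service Connect and private Google API access. Without private zones that map googleapis.com to the private.googleapis.com or restricted.googleapis.com VIPs (or to a Private Service Connect endpoint), VMs resolve Google APIs to their public IP addresses. With Private Google Access enabled that traffic still stays on Google's network, but it does not use the restricted VIP that limits access to APIs supported by VPC Service Controls, and on-premises clients reaching Google APIs over Cloud VPN or Interconnect need these VIPs (or a Private Service Connect endpoint) to do so privately. This is a common security gap in enterprise GCP deployments.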
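A private zone named googleapis.com, associated with the VPC, typically holds two record sets: an A record for the chosen VIP hostname and a wildcard CNAME that sends every *.googleapis.com name to it. The sketch below builds those record sets as plain dictionaries whose keys mirror the Cloud DNS `ResourceRecordSet` fields (`name`, `type`, `ttl`, `rrdatas`); `googleapis_record_sets` is a hypothetical helper rather than a client-library call, and the 300-second TTL is an assumed default. The VIP ranges are the Google-published 199.36.153.8/30 (private) and 199.36.153.4/30 (restricted).

```python
import ipaddress

# Google-published VIP ranges for Private Google Access.
VIP_RANGES = {
    "private": ipaddress.ip_network("199.36.153.8/30"),
    "restricted": ipaddress.ip_network("199.36.153.4/30"),
}


def googleapis_record_sets(vip: str = "restricted", ttl: int = 300) -> list[dict]:
    """Build record sets for a private googleapis.com. zone that points at one VIP (hypothetical helper)."""
    if vip not in VIP_RANGES:
        raise ValueError(f"vip must be one of {sorted(VIP_RANGES)}")
    target = f"{vip}.googleapis.com."
    # Iterating a network yields every address in it; hosts() would drop the first and last of a /30.
    addresses = [str(ip) for ip in VIP_RANGES[vip]]
    return [
        {"name": target, "type": "A", "ttl": ttl, "rrdatas": addresses},
        {"name": "*.googleapis.com.", "type": "CNAME", "ttl": ttl, "rrdatas": [target]},
    ]


if __name__ == "__main__":
    for record_set in googleapis_record_sets("restricted"):
        print(record_set)
```

Choosing the restricted VIP limits reachable APIs to those supported by VPC Service Controls; the private VIP covers most Google APIs. The zone only takes effect for networks it is associated with.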
Common Decisions (ADR Triggers)
- Cloud DNS routing policies vs Global Load Balancer for traffic management -- Cloud DNS routing policies work at the DNS layer for any protocol (TCP, UDP, gRPC) with TTL-dependent failover speed. Global External Application Load Balancer works at Layer 7 (HTTP/HTTPS) with fast failover (no DNS caching delay), content-based routing (URL path, headers), and integrated WAF (Cloud Armor). Use Cloud DNS routing for non-HTTP workloads or multi-cloud/hybrid routing. Use Global Load Balancer for web applications needing advanced routing, WAF, or fast failover.
- Cloud DNS failover vs Global Load Balancer health checks -- Cloud DNS failover routing uses health checks to switch from primary to backup record sets at the DNS layer (failover time = health check detection + DNS TTL). Global Load Balancer removes unhealthy backends from the serving pool within seconds (no DNS dependency). Use Global Load Balancer for HTTP workloads requiring fast failover. Use Cloud DNS failover for non-HTTP workloads or when routing must be DNS-based (multi-cloud, external endpoints).
- Cloud Service Mesh vs standalone Istio -- Cloud Service Mesh is the Google-managed service mesh providing a unified control plane for Envoy sidecars with mTLS, traffic management, observability, and authorization policies. Standalone Istio provides equivalent functionality but requires self-managing the control plane. Use Cloud Service Mesh for GKE-based workloads wanting managed mesh operations. Use standalone Istio for multi-cloud portability or specific Istio version requirements.
- DNS peering vs Shared VPC for cross-project DNS -- DNS peering forwards DNS queries from one VPC to another for resolution against private zones. Shared VPC provides a common network (and DNS resolution) across projects. DNS peering is lighter-weight and does not require network-level integration. Shared VPC provides full network connectivity and centralized DNS. Use DNS peering when only DNS resolution (not network connectivity) is needed across VPCs.
- Cloud DNS for GKE vs default kube-dns -- Default kube-dns runs as pods within the cluster, consuming cluster resources and depending on an autoscaler that may need tuning for large clusters. Cloud DNS for GKE offloads DNS resolution to Cloud DNS infrastructure with automatic scaling and higher query capacity. Use Cloud DNS for GKE in production clusters, especially those with high DNS query volumes or many services. Default kube-dns is adequate for small development clusters.
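The failover-time formula in these decisions (health check detection plus DNS TTL) and the TTL cost trade-off can be checked with simple arithmetic. The sketch below assumes resolvers honor the TTL exactly (some clamp or extend it), approximates detection as probe interval times unhealthy threshold while ignoring probe timeouts, and uses hypothetical function names.

```python
SECONDS_PER_DAY = 86_400


def detection_seconds(interval_s: float, unhealthy_threshold: int) -> float:
    """Approximate time for health checks to mark a target unhealthy (ignores probe timeout)."""
    return interval_s * unhealthy_threshold


def dns_failover_seconds(interval_s: float, unhealthy_threshold: int, ttl_s: float) -> float:
    """Worst case for DNS-based failover: detection plus one full TTL of cached answers."""
    return detection_seconds(interval_s, unhealthy_threshold) + ttl_s


def lb_failover_seconds(interval_s: float, unhealthy_threshold: int) -> float:
    """Load balancer failover: backends leave the pool once detected; no DNS caching involved."""
    return detection_seconds(interval_s, unhealthy_threshold)


def max_queries_per_resolver_per_day(ttl_s: float) -> float:
    """Upper bound on authoritative queries from one caching resolver for a continuously requested name."""
    return SECONDS_PER_DAY / ttl_s


if __name__ == "__main__":
    for ttl in (30, 60, 300, 3600):
        print(
            f"TTL {ttl:>4}s: DNS failover ~{dns_failover_seconds(5, 2, ttl):.0f}s, "
            f"LB failover ~{lb_failover_seconds(5, 2):.0f}s, "
            f"<= {max_queries_per_resolver_per_day(ttl):.0f} queries/resolver/day"
        )
```

With a 5-second interval and a threshold of 2, a 60-second TTL gives a worst-case DNS failover of about 70 seconds versus about 10 seconds behind a load balancer, while raising the TTL from 60 to 3600 seconds cuts the per-resolver ceiling from 1440 to 24 queries per day.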
Reference Architectures
Multi-Region Active/Active Web Application
Global External Application Load Balancer with anycast IP -> backend services in us-central1 and europe-west1. URL map with default backend service using capacity-based load balancing (traffic distributed by backend utilization). Health checks on /healthz endpoint, 5-second interval, 2 consecutive failures for removal. Cloud DNS public zone with A record pointing to the load balancer's anycast IP. Cloud Armor WAF policy attached to the backend service. Failover takes roughly as long as health-check detection (about 10 seconds with these settings) because unhealthy backends are removed from the load balancer pool; there is no DNS propagation delay.
Hybrid DNS with On-Premises Integration
Cloud DNS private zone (gcp.company.internal) linked to hub VPC with inbound DNS forwarding enabled (Cloud DNS inbound policy creates forwarding IP addresses). On-premises DNS servers configured to forward gcp.company.internal queries to the inbound forwarding IPs. Cloud DNS outbound forwarding zone (onprem.company.internal) forwarding to on-premises DNS servers (10.0.1.10, 10.0.1.11) via Cloud VPN or Interconnect. Spoke VPCs resolve gcp.company.internal via DNS peering to hub VPC.
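Which path a query takes in this design comes down to matching the query name against the most specific zone visible to the VPC. The model below is a simplified sketch: `ZoneBinding` and `match_zone` are hypothetical, the `app` and `ldap` host names and the `hub-vpc` peering target are placeholders, and it ignores other parts of the VPC name resolution order such as outbound server policies.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ZoneBinding:
    """A zone visible to one VPC network (hypothetical model, not the Cloud DNS API)."""
    dns_name: str                  # zone suffix with trailing dot, e.g. "gcp.company.internal."
    kind: str                      # "private", "forwarding", or "peering"
    targets: tuple[str, ...] = ()  # forwarding name servers or the peered network


def normalize(name: str) -> str:
    """Lowercase and ensure exactly one trailing dot."""
    return name.lower().rstrip(".") + "."


def match_zone(qname: str, bindings: list[ZoneBinding]) -> Optional[ZoneBinding]:
    """Pick the zone whose name is the longest suffix of qname; None means fall through to public DNS."""
    query = normalize(qname)
    best: Optional[ZoneBinding] = None
    for binding in bindings:
        suffix = normalize(binding.dns_name)
        # Compare on label boundaries so "xgcp.company.internal." does not match "gcp.company.internal.".
        if query == suffix or query.endswith("." + suffix):
            if best is None or len(suffix) > len(normalize(best.dns_name)):
                best = binding
    return best


if __name__ == "__main__":
    hub = [
        ZoneBinding("gcp.company.internal.", "private"),
        ZoneBinding("onprem.company.internal.", "forwarding", ("10.0.1.10", "10.0.1.11")),
    ]
    spoke = [ZoneBinding("gcp.company.internal.", "peering", ("hub-vpc",))]
    queries = [
        ("hub", hub, "app.gcp.company.internal"),
        ("hub", hub, "ldap.onprem.company.internal"),
        ("spoke", spoke, "app.gcp.company.internal"),
        ("hub", hub, "www.example.com"),
    ]
    for vpc, bindings, name in queries:
        zone = match_zone(name, bindings)
        action = f"{zone.kind} {zone.targets}" if zone else "public DNS"
        print(f"{vpc}: {name} -> {action}")
```

Matching on whole labels and preferring the longest suffix is what lets a narrower zone (for example a forwarding zone for one subdomain) take precedence over a broader private zone.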
Canary Deployment with Traffic Splitting
Global External Application Load Balancer with URL map traffic splitting: stable backend service (95% weight) and canary backend service (5% weight). Both backend services backed by GKE NEGs (Network Endpoint Groups) in the same cluster. Cloud DNS public zone with A record pointing to load balancer anycast IP. Gradual weight adjustment (5% -> 25% -> 50% -> 100%) during rollout. Cloud Monitoring dashboards comparing error rates and latency between stable and canary backends. Automatic rollback via Cloud Deploy if canary error rate exceeds threshold.
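The gradual weight adjustment and rollback gate can be expressed as a small planning function. This is a sketch, not Cloud Deploy's behavior: `plan_rollout` is hypothetical, error rates are supplied by the caller (in practice they would come from Cloud Monitoring), the threshold is an absolute error rate, and the returned pairs are assumed to be applied as relative weights on the stable and canary backend services.

```python
ROLLOUT_STEPS = (5, 25, 50, 100)  # canary percentage at each stage


def plan_rollout(canary_error_rates: list[float], max_error_rate: float) -> list[tuple[int, int]]:
    """Return the (stable, canary) weights applied stage by stage.

    canary_error_rates[i] is the error rate observed while stage i was live.
    The first stage whose rate exceeds max_error_rate triggers a rollback to (100, 0).
    """
    applied: list[tuple[int, int]] = []
    for canary_pct, error_rate in zip(ROLLOUT_STEPS, canary_error_rates):
        applied.append((100 - canary_pct, canary_pct))
        if error_rate > max_error_rate:
            applied.append((100, 0))  # roll back: all traffic to the stable backend service
            break
    return applied


if __name__ == "__main__":
    print(plan_rollout([0.002, 0.003, 0.002, 0.002], max_error_rate=0.01))
    # [(95, 5), (75, 25), (50, 50), (0, 100)]
    print(plan_rollout([0.002, 0.04], max_error_rate=0.01))
    # [(95, 5), (75, 25), (100, 0)]
```

A missing observation ends the plan at the current stage, which models waiting for more data before promoting.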
Service Mesh with Cloud Service Mesh
Cloud Service Mesh configured as managed control plane. Envoy sidecar proxies on GKE pods (or GCE VMs) receive routing configuration from Cloud Service Mesh. Routing rules: 90% of traffic to the v1 backend and 10% to the v2 backend, with header-based routing (requests carrying x-debug-routing: v2 always route to v2). Fault injection rules for resilience testing (inject 500 errors into 1% of requests). Circuit breakers on backend services cap concurrent requests at 1000. Envoy outlier detection complements health checks by ejecting backends that return 5 consecutive 5xx responses or exceed a 5% error rate. mTLS enforced between all services in the mesh.
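Circuit breaking and outlier detection are easy to conflate: the first caps load on a backend, the second removes a misbehaving backend from the pool. The hypothetical `BackendEndpoint` class below models both with the thresholds in this architecture (1000 concurrent requests, 5 consecutive 5xx responses). It is a simplified sketch, not Envoy's implementation: real ejections expire after an ejection time and are limited by a maximum ejection percentage, and the error-rate check is omitted.

```python
class BackendEndpoint:
    """Hypothetical model of one mesh backend with a concurrency cap and consecutive-5xx ejection."""

    def __init__(self, name: str, max_requests: int = 1000, consecutive_5xx_limit: int = 5):
        self.name = name
        self.max_requests = max_requests
        self.consecutive_5xx_limit = consecutive_5xx_limit
        self.in_flight = 0
        self.consecutive_5xx = 0
        self.ejected = False  # simplified: never un-ejected in this sketch

    def try_start_request(self) -> bool:
        """Circuit breaker: reject new requests once max_requests are already in flight."""
        if self.in_flight >= self.max_requests:
            return False
        self.in_flight += 1
        return True

    def finish_request(self, status: int) -> None:
        """Outlier detection: eject after consecutive_5xx_limit 5xx responses in a row."""
        self.in_flight -= 1
        if 500 <= status <= 599:
            self.consecutive_5xx += 1
            if self.consecutive_5xx >= self.consecutive_5xx_limit:
                self.ejected = True
        else:
            self.consecutive_5xx = 0  # any non-5xx response resets the streak


def eligible(endpoints: list[BackendEndpoint]) -> list[BackendEndpoint]:
    """Endpoints the load balancer may still pick: those not ejected by outlier detection."""
    return [e for e in endpoints if not e.ejected]


if __name__ == "__main__":
    v1, v2 = BackendEndpoint("v1"), BackendEndpoint("v2")
    for _ in range(5):
        if v2.try_start_request():
            v2.finish_request(503)
    print([e.name for e in eligible([v1, v2])])  # ['v1']
```

The circuit breaker protects a healthy but saturated backend by failing fast, while outlier detection protects clients from a backend that is answering but failing; both are configured per backend service.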
Reference Links
- Cloud DNS documentation -- public zones, private zones, DNSSEC, and routing policies
- Cloud DNS routing policies -- weighted round robin, geolocation, and failover routing
- Cloud Load Balancing documentation -- global and regional load balancers, URL maps, and traffic splitting
- Cloud Service Mesh documentation -- managed Envoy control plane, mTLS, traffic management, and observability
See Also
- general/hybrid-dns.md -- hybrid DNS architecture patterns
- providers/gcp/networking.md -- GCP VPC and load balancing integration
- providers/gcp/disaster-recovery.md -- Cloud DNS failover routing for DR