AWS Networking

Scope

AWS networking services beyond VPC fundamentals. Covers Transit Gateway (multi-VPC, multi-account, multi-region, peering, route tables), Direct Connect (dedicated vs hosted, LAG, redundancy, MACsec), PrivateLink and VPC Endpoints (interface vs gateway, endpoint policies, producer-side endpoint services, Gateway Load Balancer endpoints), Network Firewall (stateful inspection, IDS/IPS, rule groups), Global Accelerator (anycast, endpoint groups), Route 53 Resolver (hybrid DNS forwarding), VPC Lattice (application-layer service-to-service), VPC peering vs Transit Gateway decision framework, NACLs vs Security Groups design philosophy, multi-account networking patterns, and IPv6 dual-stack adoption.

Checklist

Why This Matters

AWS networking services form the connective tissue between every workload, account, and region. A poorly designed Transit Gateway topology creates routing black holes or allows lateral movement between environments that should be isolated. Missing Direct Connect redundancy means a single fiber cut takes down hybrid connectivity to an entire region. Overly permissive VPC endpoint policies (the default policy allows full access) let any principal in the VPC use the endpoint to reach any resource the service exposes, including resources in accounts outside your organization. Identity-based IAM policies alone do not close that data-exfiltration path, which is why endpoint policies belong in a data perimeter.

Network Firewall fills the gap between Security Groups (which cannot inspect payload content) and third-party appliances (which require managing EC2 instances, licensing, and scaling). Without centralized traffic inspection, east-west traffic between VPCs traverses Transit Gateway with no deep packet inspection, so a compromised workload in one VPC can reach resources in another VPC constrained only by route tables, NACLs, and Security Groups.

The Transit Gateway vs VPC Peering decision has long-term cost and operational implications. Transit Gateway charges $0.02/GB for data processing, which is significant at high throughput, but VPC Peering does not support transitive routing or centralized inspection. Choosing the wrong model early leads to a painful re-architecture as the network grows. Similarly, skipping Global Accelerator for latency-sensitive global applications leaves traffic on the public internet for most of its path, adding latency and jitter that Global Accelerator reduces by admitting traffic at the nearest AWS edge location via anycast and carrying it over the AWS backbone.
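As a rough illustration of that trade-off, the sketch below counts the peering connections a full mesh needs and prices a hub-spoke Transit Gateway using the us-east-1 list prices quoted in the Pricing section ($0.05 per attachment-hour, $0.02/GB processed). The 730-hour month and the traffic figure are assumptions, and peering's own cross-AZ and cross-region transfer charges are left out for brevity.

```python
# Rough Transit Gateway vs full-mesh peering comparison. Prices are the
# us-east-1 list prices quoted on this page (assumed; re-check before use).
HOURS_PER_MONTH = 730
TGW_ATTACHMENT_PER_HOUR = 0.05  # USD per VPC attachment-hour
TGW_PROCESSING_PER_GB = 0.02    # USD per GB sent into the Transit Gateway


def full_mesh_peerings(n_vpcs: int) -> int:
    """Peering connections needed for any-to-any reachability.

    Peering is non-transitive, so every pair of VPCs needs its own link.
    """
    return n_vpcs * (n_vpcs - 1) // 2


def tgw_monthly_cost(n_vpcs: int, gb_per_month: float) -> float:
    """Hub-spoke Transit Gateway: one attachment per VPC plus processing."""
    attachment_cost = n_vpcs * TGW_ATTACHMENT_PER_HOUR * HOURS_PER_MONTH
    processing_cost = gb_per_month * TGW_PROCESSING_PER_GB
    return attachment_cost + processing_cost


if __name__ == "__main__":
    for n in (3, 10, 50):
        print(f"{n:>2} VPCs: {full_mesh_peerings(n):>4} peerings vs {n} TGW attachments")
    # 10 VPCs exchanging 20 TB/month (20,000 GB) through the hub.
    print(f"TGW monthly cost: ${tgw_monthly_cost(10, 20_000):,.2f}")
```

At 10 VPCs a full mesh already needs 45 peering connections, while the Transit Gateway costs about $765/month ($365 in attachments, $400 in processing) for 20 TB of east-west traffic.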

Common Decisions (ADR Triggers)

Transit Gateway vs VPC Peering -- Transit Gateway provides centralized routing, transitive connectivity, traffic inspection insertion, and scales to thousands of VPCs but charges $0.02/GB data processing; VPC Peering has no data processing charge (same-AZ traffic is free; cross-AZ and cross-region traffic pay standard data transfer rates) and lower latency, but is non-transitive, and a full mesh of n VPCs needs n(n-1)/2 connections, which becomes unmanageable beyond a handful of VPCs
Direct Connect dedicated vs hosted -- dedicated connections (1/10/100 Gbps) provide a physical port you own, with MACsec encryption support (10 Gbps and faster ports at supported locations) and LAG aggregation; hosted connections (50 Mbps-10 Gbps) are provisioned by a partner with lower commitment but no MACsec and shared infrastructure
Direct Connect redundancy model -- single connection (dev/test only), two connections on separate devices at one location (tolerates device failure), one connection at each of two locations (tolerates facility failure; recommended minimum for production), two connections at each of two locations (maximum resiliency: tolerates a facility failure plus a device failure)
Network Firewall vs third-party NVA -- managed service with AWS-native integration, automatic scaling, Suricata-compatible rules, and built-in TLS inspection using ACM certificates vs Palo Alto, Fortinet, or Check Point for teams with existing rule sets, vendor-specific inspection features, or multi-cloud consistency
Centralized vs distributed inspection -- centralized inspection VPC with Transit Gateway routing (simpler management, single chokepoint) vs distributed firewall endpoints per VPC (lower latency, higher throughput, more complex management)
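VPC endpoint strategy -- deploy interface endpoints in every VPC (simple routing and DNS, higher cost) vs centralized endpoints in a shared services VPC accessed via Transit Gateway (fewer endpoint-hours, but adds Transit Gateway data processing charges and latency, and requires Route 53 private hosted zones associated with each spoke VPC so endpoint DNS names resolve to the central endpoints)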
PrivateLink producer pattern -- expose internal services to consumers via NLB-backed endpoint service (L4, TLS passthrough, allowed-principals authorization, no transitive networking required) vs VPC peering / Transit Gateway (full L3 reachability, requires non-overlapping CIDRs, broader blast radius); PrivateLink is preferred for SaaS-style one-to-many exposure and cross-account boundaries where the producer wants no routing relationship to consumer VPCs
Centralized inspection: Network Firewall vs Gateway Load Balancer -- Network Firewall is AWS-managed with Suricata rules and no appliance management; GWLB + GWLBe inserts third-party NVAs (Palo Alto, Fortinet, Check Point) transparently into the traffic path, preserving existing rule sets, vendor-specific decryption and inspection features, and multi-cloud rule consistency, at the cost of managing the appliance fleet
Global Accelerator vs CloudFront -- Global Accelerator for non-HTTP TCP/UDP workloads, gaming, IoT, and static anycast IP requirements; CloudFront for HTTP/S content delivery with caching; both use the AWS global edge network
VPC Lattice vs Transit Gateway for service mesh -- VPC Lattice for L7 service-to-service with IAM auth and path-based routing (application team-managed); Transit Gateway for L3/L4 network connectivity (network team-managed); can coexist
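IPv6 adoption strategy -- dual-stack from day one (avoids a retrofit), IPv6-only for new workloads (simplifies addressing and reduces NAT costs, though reaching IPv4-only destinations still requires DNS64 and NAT64 through a NAT gateway), or IPv4-only with a future migration plan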
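The Direct Connect redundancy tiers listed above reduce to two questions: does the layout survive losing one AWS device, and does it survive losing a whole location? A minimal sketch, assuming each connection is described by a hypothetical (location, device) pair; the tier labels paraphrase this list rather than any AWS API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DxConnection:
    location: str  # Direct Connect location (facility); labels are hypothetical
    device: str    # AWS device the port lands on, unique only within a location


def _survives_device_loss(conns: list[DxConnection]) -> bool:
    # Devices are only unique per location, so key them by (location, device).
    return len({(c.location, c.device) for c in conns}) >= 2


def _survives_location_loss(conns: list[DxConnection]) -> bool:
    return len({c.location for c in conns}) >= 2


def classify(conns: list[DxConnection]) -> str:
    """Map a connection layout onto the redundancy tiers described above."""
    if not conns:
        return "no connectivity"
    locations = {c.location for c in conns}
    if _survives_location_loss(conns) and all(
        _survives_device_loss([c for c in conns if c.location != lost])
        for lost in locations
    ):
        return "maximum resiliency"      # a facility AND a device can fail
    if _survives_location_loss(conns):
        return "production minimum"      # tolerates a facility failure
    if _survives_device_loss(conns):
        return "device redundancy only"  # one facility, separate devices
    return "dev/test only"               # any single failure is an outage


if __name__ == "__main__":
    one_per_site = [DxConnection("site-a", "dev-1"), DxConnection("site-b", "dev-1")]
    print(classify(one_per_site))
```

Keying devices by location matters: two connections on "dev-1" at different sites are on different physical devices.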

Pricing Links

AWS Pricing Pages

Transit Gateway Pricing — $0.05/hr per attachment + $0.02/GB data processed
Direct Connect Pricing — port-hour fees by speed (dedicated 1 Gbps: ~$0.30/hr, 10 Gbps: ~$2.25/hr at US locations) + data transfer out rates by region
PrivateLink / VPC Endpoint Pricing — interface endpoints: $0.01/hr per AZ + $0.01/GB data processed; Gateway endpoints (S3, DynamoDB): free
AWS Network Firewall Pricing — $0.395/hr per firewall endpoint + $0.065/GB data processed
Global Accelerator Pricing — $0.025/hr per accelerator + $0.015-$0.035/GB data transfer premium (varies by region)
VPC Peering Pricing — no charge for the peering connection itself; traffic within the same AZ is free, cross-AZ traffic is billed at $0.01/GB in each direction, and cross-region traffic at inter-region data transfer rates
VPC Lattice Pricing — $0.025/hr per service + $0.025/GB data processed, plus a per-request charge for HTTP/HTTPS services
AWS Pricing Calculator — interactive cost estimation tool

Common Cost Surprises

Transit Gateway data processing at scale — $0.02/GB applies to all traffic sent into the Transit Gateway. A hub-spoke architecture with 20 TB/mo of east-west traffic pays $400/mo in data processing alone. VPC Peering for high-throughput same-region connections avoids the processing charge entirely (cross-AZ data transfer still applies).
Network Firewall always-on cost — each firewall endpoint costs $0.395/hr (~$288/mo) regardless of traffic volume. Deploying in 3 AZs across an inspection VPC costs ~$864/mo before any data processing charges. Data processing adds $0.065/GB, so a production deployment inspecting 10 TB/mo pays ~$1,514/mo.
Interface endpoint multiplication — each interface endpoint costs $0.01/hr per AZ (~$7.20/mo per AZ). Deploying 10 endpoints across 3 AZs costs $216/mo. In a multi-account environment with endpoints per VPC, costs multiply rapidly. Centralizing endpoints in a shared VPC reduces cost but adds Transit Gateway charges.
Direct Connect data transfer asymmetry — inbound data over Direct Connect is free, but outbound data transfer rates vary by region ($0.02-$0.08/GB). A workload sending 10 TB/mo outbound over Direct Connect in US East pays ~$200/mo in transfer charges alone, on top of port fees.
Global Accelerator data transfer premium — Global Accelerator charges a DT premium on top of standard EC2 data transfer. For US/EU traffic this adds ~$0.015/GB. An application serving 50 TB/mo globally pays ~$750/mo in accelerator DT premium plus the $18/mo base fee.
Transit Gateway inter-region peering — inter-region peering connections incur inter-region data transfer rates (typically $0.02/GB between US regions) in addition to the per-attachment hourly charge and the data processing charge at the sending Transit Gateway. For specific high-throughput paths, evaluate whether direct inter-region VPC peering, which pays the same inter-region transfer but no Transit Gateway processing, is cheaper.
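The figures in this list can be recomputed with a few lines of arithmetic. This sketch hard-codes the us-east-1 list prices quoted on this page (assumptions to re-check against current pricing) and uses a 730-hour month throughout, so results differ by a few dollars from the list, which mixes 720- and 730-hour months (for example, $219 rather than $216 for the interface endpoint case).

```python
# Recompute the "Common Cost Surprises" figures from the us-east-1 list prices
# quoted on this page (assumed; verify against current AWS pricing).
HOURS = 730  # average hours in a month


def tgw_processing(gb: float) -> float:
    return gb * 0.02  # Transit Gateway data processing


def network_firewall(endpoints: int, gb: float) -> float:
    return endpoints * 0.395 * HOURS + gb * 0.065  # endpoint-hours + processing


def interface_endpoints(endpoints: int, azs: int) -> float:
    return endpoints * azs * 0.01 * HOURS  # billed per endpoint per AZ


def dx_transfer_out(gb: float, rate_per_gb: float = 0.02) -> float:
    return gb * rate_per_gb  # US East DTO rate; other regions run higher


def global_accelerator(gb: float, premium_per_gb: float = 0.015) -> float:
    return 0.025 * HOURS + gb * premium_per_gb  # fixed fee + DT premium


if __name__ == "__main__":
    print(f"TGW, 20 TB east-west:      ${tgw_processing(20_000):>9,.2f}")
    print(f"Network Firewall, 3 AZs:   ${network_firewall(3, 0):>9,.2f}")
    print(f"  plus 10 TB inspected:    ${network_firewall(3, 10_000):>9,.2f}")
    print(f"10 endpoints x 3 AZs:      ${interface_endpoints(10, 3):>9,.2f}")
    print(f"DX, 10 TB out (US East):   ${dx_transfer_out(10_000):>9,.2f}")
    print(f"Global Accelerator, 50 TB: ${global_accelerator(50_000):>9,.2f}")
```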

NLB Target Groups: Attributes That Are Load-Bearing

NLB target groups carry several attributes that change the security and operational characteristics of any PrivateLink-touching workload. The single most consequential one is preserve_client_ip, but several others are worth knowing.

`preserve_client_ip` and the target SG design

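When preserve_client_ip = true on an NLB target group with target type instance, the target instance sees the actual client source IP in the packet, not the NLB's private IP. This changes the security group design completely. One exception: client IP preservation has no effect on traffic arriving through AWS PrivateLink, which always carries the NLB's private IP as its source, so the distinction below applies to clients that reach the NLB directly.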

preserve_client_ip = false (default for ip target type, can be set on instance): the target SG admits the NLB's private IPs, which fall within the VPC CIDR, so SG ingress on the application port from the VPC CIDR is the typical, working pattern.
preserve_client_ip = true (default for instance target type, can be set on ip): the target SG must admit the client source IPs on the application port. The NLB does not appear as the source of client traffic; only the original client does. Ingress from the VPC CIDR alone is therefore wrong: the SG needs the client IP ranges directly (a customer-managed prefix list is the right shape) or 0.0.0.0/0 on the application port (usually too broad). The VPC CIDR rule is still needed for health checks, which always originate from the NLB's private IPs.

The audit consequence: a target SG that "looks correct" (VPC CIDR ingress only) for a preserve_client_ip = true target group is silently broken. Health checks pass, so the targets report healthy, while client traffic is dropped. The reviewer must check the target group attribute, not just the SG rules. See providers/aws/security-groups.md for the SG side of the same pattern.
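That audit can be expressed as a small check. This is a sketch with hypothetical function names: it models a security group as a flat list of ingress CIDRs on the target port (real SGs also use SG references, prefix lists, and per-port rules) and assumes health checks arrive from the VPC CIDR. It models clients reaching the NLB directly; traffic arriving through a PrivateLink endpoint always carries the NLB's private IP as its source.

```python
import ipaddress


def covered(sources: list[str], sg_ingress_cidrs: list[str]) -> bool:
    """True if every source CIDR falls inside at least one SG ingress CIDR."""
    rules = [ipaddress.ip_network(c) for c in sg_ingress_cidrs]
    for src in map(ipaddress.ip_network, sources):
        # subnet_of() rejects mixed IPv4/IPv6 comparisons, so match versions.
        if not any(src.version == r.version and src.subnet_of(r) for r in rules):
            return False
    return True


def audit_target_sg(preserve_client_ip: bool, vpc_cidr: str,
                    client_cidrs: list[str], sg_ingress_cidrs: list[str]) -> list[str]:
    """Return findings for one target SG on the target group's port."""
    findings = []
    # Health checks come from the NLB's private IPs regardless of the attribute.
    if not covered([vpc_cidr], sg_ingress_cidrs):
        findings.append("health checks from the NLB (VPC CIDR) are not admitted")
    # With preservation on, client packets keep their original source IP, so
    # the VPC CIDR rule no longer covers them.
    if preserve_client_ip and not covered(client_cidrs, sg_ingress_cidrs):
        findings.append("client source IPs are not admitted")
    return findings


if __name__ == "__main__":
    # The "looks correct" SG: VPC CIDR only, with client IP preservation on.
    print(audit_target_sg(True, "10.0.0.0/16", ["203.0.113.0/24"], ["10.0.0.0/16"]))
```

The first scenario is exactly the silent failure described above: only the client finding fires, so health checks keep passing.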

Other target group attributes worth knowing

target_type — instance, ip, lambda, or alb. instance registers EC2 instances by ID. ip registers IPs (allowing on-prem targets reached via Direct Connect or VPN, or PrivateLink-routed targets in other VPCs). lambda registers a Lambda function. alb registers an ALB as a target so an NLB fronts the ALB (the right pattern for "PrivateLink-fronted HTTP service" — NLB for the static IP and PrivateLink integration, ALB for the L7 features).
deregistration_delay.timeout_seconds — default 300s. How long the target group waits for in-flight connections to drain before fully deregistering a target. Tune higher (up to 3600s) for long-lived connections, lower (30s) for stateless services where the long default delays autoscaling responsiveness.
cross_zone.enabled — the full attribute name is load_balancing.cross_zone.enabled. Set at the target group level, it overrides the LB-level cross-zone setting; it accepts true, false, or use_load_balancer_configuration (the default, which inherits the LB setting), and the similar names make the two levels easy to confuse. When true, the LB can route a connection received in AZ-A to a target registered in AZ-B; when false, connections stay zone-local. Cross-zone load balancing on NLB incurs cross-AZ data transfer charges ($0.01/GB) whichever level enables it. Decide per target group, not per LB.
stickiness.enabled — NLB stickiness is keyed on the client source IP rather than ALB-style cookies, which makes it a coarser tool. It matters when a client opens multiple connections that must land on the same target (e.g., certain database protocols); a single long-lived connection, such as a gRPC stream, already stays on one target for its lifetime.
Health check protocol/interval/threshold — defaults are TCP / 30s interval / 5 healthy threshold / 2 unhealthy threshold, so a failed target takes roughly a minute to be marked unhealthy. That is too slow for production workloads that need fast failover; for latency-sensitive workloads, tune the interval down to 10s and keep the unhealthy threshold at 2. The flip side: too-aggressive health checks can flap on transient issues. Tune deliberately, not by accident.
Mixed-protocol listeners — an NLB can carry both TLS and TCP listeners targeting the same target group. The audit gotcha: if the workload is supposed to be TLS-only, a TCP listener is a finding unless the target itself terminates TLS on that port (TLS passthrough); otherwise that listener carries cleartext. Inventory the listeners, not just the target groups.
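Two of these attributes lend themselves to quick arithmetic and inventory checks. The sketch below estimates how long a failed target keeps receiving traffic (consecutive failed checks times the interval, ignoring the check timeout and the offset to the next scheduled check) and flags non-TLS listeners on a TLS-only workload, allowing for ports where the target terminates TLS itself. The (port, protocol) listener shape is a hypothetical inventory format, not an AWS API response.

```python
def detection_seconds(interval_s: int, unhealthy_threshold: int) -> int:
    """Rough time for a dead target to be marked unhealthy: N consecutive
    failed checks spaced `interval_s` apart. Ignores the check timeout and
    the offset between the failure and the next scheduled check."""
    return interval_s * unhealthy_threshold


def listener_findings(listeners: list[tuple[int, str]], tls_only: bool,
                      passthrough_ports: frozenset[int] = frozenset()) -> list[int]:
    """Ports of non-TLS listeners on a workload that should be TLS-only.

    listeners: hypothetical (port, protocol) pairs, e.g. (443, "TLS").
    passthrough_ports: TCP ports where the target terminates TLS itself.
    """
    if not tls_only:
        return []
    return sorted(port for port, proto in listeners
                  if proto.upper() != "TLS" and port not in passthrough_ports)


if __name__ == "__main__":
    print(detection_seconds(30, 3), "s vs", detection_seconds(10, 2), "s")
    print(listener_findings([(443, "TLS"), (8080, "TCP")], tls_only=True))
```

Passthrough ports are an explicit allow-list because a TCP listener alone cannot reveal whether the target speaks TLS; someone has to confirm it.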

The single-AZ deployment trap

A common deployment shape is a multi-AZ NLB with all targets registered in a single AZ (for example, because the targets are in a single-instance ASG, or because someone forgot to spread the targets across zones). NLBs use Route 53 DNS to return per-AZ IPs to clients. If a client resolves to an AZ that has no healthy targets, the request blackholes — there is no cross-zone fallback unless cross_zone.enabled = true.

This shape is invisible to most monitoring because the LB itself is healthy and the targets are healthy. Only the per-zone DNS resolution exposes the bug. Symptoms: roughly half of requests (two-thirds in a 3-AZ deployment) fail with a timeout, while the rest succeed normally. The fix: either spread targets across all AZs the NLB serves, or enable cross_zone.enabled = true on the target group with explicit acceptance of the data transfer cost.

The audit signal: any target group with all targets in one AZ behind a multi-AZ NLB. Capture target group registrations from the bundle and check the AZ distribution.
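A sketch of that AZ-distribution check, assuming registrations are exported as hypothetical (target_id, az) pairs and that DNS spreads clients roughly evenly across the NLB's per-AZ addresses. The failing share it reports is that rough estimate, not a measured rate.

```python
from collections import Counter


def az_distribution_findings(nlb_azs: set[str],
                             registrations: list[tuple[str, str]],
                             cross_zone: bool = False) -> dict:
    """Flag AZs enabled on the NLB that have no registered targets.

    nlb_azs: AZs enabled on the NLB, e.g. {"us-east-1a", "us-east-1b"}.
    registrations: hypothetical (target_id, az) pairs for one target group.
    cross_zone: whether cross-zone load balancing is effectively enabled.
    """
    per_az = Counter(az for _target, az in registrations)
    empty = sorted(az for az in nlb_azs if per_az[az] == 0)
    served = [az for az in nlb_azs if per_az[az] > 0]
    if cross_zone and served:
        # Every LB node can forward to targets registered in other AZs.
        failing_share = 0.0
    elif nlb_azs:
        # Zone-local routing: clients resolving to an empty AZ's IP time out.
        failing_share = len(empty) / len(nlb_azs)
    else:
        failing_share = 0.0
    return {"empty_azs": empty, "failing_share": failing_share}


if __name__ == "__main__":
    print(az_distribution_findings(
        {"us-east-1a", "us-east-1b", "us-east-1c"},
        [("i-aaa", "us-east-1a"), ("i-bbb", "us-east-1a")],
    ))
```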

Reference Links

AWS Transit Gateway Documentation — architecture, route tables, multicast, peering, and Connect attachments
AWS Direct Connect User Guide — connection types, LAG, virtual interfaces, MACsec, and resiliency recommendations
AWS PrivateLink Documentation — interface endpoints, gateway endpoints, endpoint services, and endpoint policies
Create an endpoint service (PrivateLink producer) — NLB/GWLB-backed endpoint services, allowed principals, acceptance settings, and private DNS name verification
VPC endpoint policies — resource-policy syntax for restricting principals, actions, and resources reachable through interface and gateway endpoints (data-perimeter pattern)
Gateway Load Balancer Documentation — GENEVE-encapsulated traffic insertion to third-party security appliances via PrivateLink (GWLBe)
AWS Network Firewall Documentation — deployment models, stateful/stateless rule groups, Suricata-compatible rules, and logging
AWS Global Accelerator Documentation — standard vs custom routing accelerators, endpoint groups, and health checks
Amazon VPC Lattice Documentation — service networks, services, target groups, listeners, auth policies, and observability
AWS Architecture Center: Networking & Content Delivery — curated reference architectures for multi-VPC, hybrid, and global networking
AWS Prescriptive Guidance: Multi-account network architecture — Transit Gateway hub-spoke with centralized inspection for AWS Organizations
AWS Direct Connect Resiliency Recommendations — interactive tool for selecting the appropriate redundancy model