AWS Database

Scope

Consolidated overview of AWS database services and selection framework. Covers database selection decision framework (relational vs NoSQL vs graph vs time-series vs in-memory vs ledger), DocumentDB (MongoDB-compatible), Neptune (graph), Timestream (time-series), Keyspaces (Cassandra-compatible), MemoryDB (Redis-compatible durable), QLDB (ledger), cross-service comparison, database connectivity patterns, and migration paths. RDS/Aurora, DynamoDB, and ElastiCache are covered in dedicated files and cross-referenced here.

Checklist

Cross-Service Comparison

Service	Data Model	Latency	Max Storage	Serverless Option	Multi-Region	Primary Use Case
RDS/Aurora	Relational	Low ms	128 TiB (Aurora)	Aurora Serverless v2	Global Database	OLTP, complex queries
DynamoDB	Key-value, document	Single-digit ms	Unlimited	On-demand mode	Global tables	Known access patterns at scale
ElastiCache	Key-value, data structures	Sub-ms	635 GiB/node	ElastiCache Serverless	Global Datastore	Caching, ephemeral data
DocumentDB	Document (MongoDB)	Low ms	128 TiB	Elastic clusters	Global clusters	MongoDB-compatible workloads
Neptune	Graph (property + RDF)	Low ms	128 TiB	Neptune Serverless	Global Database	Relationships, knowledge graphs
Timestream	Time-series	Low ms	Unlimited	Serverless-native	No (single region)	IoT, DevOps metrics, analytics
Keyspaces	Wide-column (Cassandra)	Single-digit ms	Unlimited	Serverless-native	Multi-Region replication	Cassandra-compatible workloads
MemoryDB	Key-value, data structures	Sub-ms reads, low ms writes	635 GiB/node	No	Multi-Region	Durable in-memory primary DB
QLDB	Ledger (document)	Low ms	100 GB/ledger	Serverless-native	No	Immutable audit (end-of-support Jul 2025)

Database Connectivity Patterns

RDS Proxy

RDS Proxy sits between applications and RDS/Aurora databases, providing connection pooling, multiplexing, and faster failover. It is critical for Lambda-to-database connectivity, where rapid function scaling can open thousands of short-lived connections and exhaust the database's connection limit. AWS cites failover time reductions of up to 66%, because the proxy preserves application connections and routes them to the standby instance directly instead of waiting on DNS propagation. RDS Proxy supports IAM authentication and integrates with Secrets Manager for credential management.
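As a rough illustration of the IAM authentication path, the sketch below builds connection arguments for a PostgreSQL driver that connects through a proxy endpoint. It assumes a PostgreSQL target on port 5432; the helper name `proxy_connect_kwargs`, the host, the user, and the database name are hypothetical. `generate_db_auth_token` is the boto3 RDS client call that produces a short-lived token used in place of a password.

```python
from typing import Any, Dict


def proxy_connect_kwargs(rds_client: Any, proxy_host: str, user: str,
                         dbname: str, port: int = 5432) -> Dict[str, Any]:
    """Build keyword arguments for a PostgreSQL driver connecting via RDS Proxy.

    rds_client is expected to be a boto3 RDS client (or anything exposing
    generate_db_auth_token with the same keyword arguments).
    """
    # The token stands in for a password, so no long-lived secret is stored.
    token = rds_client.generate_db_auth_token(
        DBHostname=proxy_host, Port=port, DBUsername=user
    )
    return {
        "host": proxy_host,
        "port": port,
        "user": user,
        "password": token,
        "dbname": dbname,
        "sslmode": "require",  # IAM authentication through the proxy requires TLS
    }


# Usage sketch (requires boto3 and a driver such as psycopg2; names are placeholders):
#   import boto3, psycopg2
#   kwargs = proxy_connect_kwargs(boto3.client("rds"), "<proxy-endpoint>", "app_user", "appdb")
#   conn = psycopg2.connect(**kwargs)
```

Passing the client in as an argument keeps the helper easy to exercise with a stub, and in a Lambda function the boto3 client can be created once outside the handler and reused across invocations.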

VPC Endpoints (PrivateLink)

Serverless database services (DynamoDB, Timestream, Keyspaces, QLDB) are accessed via public endpoints by default. A VPC gateway endpoint (DynamoDB) or interface endpoints (Timestream, Keyspaces, QLDB) keep traffic within the AWS network, avoiding NAT gateway data processing charges and improving security posture. DocumentDB clusters already run inside a VPC; its interface endpoint covers the management API rather than cluster traffic. Gateway endpoints carry no additional charge, while interface endpoints incur per-hour and per-GB charges in exchange for taking that traffic off the NAT gateway.
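The difference between the two endpoint types shows up in the request parameters. The sketch below builds keyword arguments for the boto3 EC2 `create_vpc_endpoint` call. The helper name `vpc_endpoint_request` and all resource IDs are hypothetical, and the service suffixes (`dynamodb`, and `cassandra` for Keyspaces) should be checked against the endpoint services available in the target Region.

```python
from typing import Any, Dict, List, Optional


def vpc_endpoint_request(region: str, vpc_id: str, service: str,
                         route_table_ids: Optional[List[str]] = None,
                         subnet_ids: Optional[List[str]] = None,
                         security_group_ids: Optional[List[str]] = None) -> Dict[str, Any]:
    """Return keyword arguments for ec2.create_vpc_endpoint."""
    request: Dict[str, Any] = {
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.{service}",
    }
    if service == "dynamodb":
        # Gateway endpoints are attached to route tables, not subnets.
        request["VpcEndpointType"] = "Gateway"
        request["RouteTableIds"] = route_table_ids or []
    else:
        # Interface endpoints place network interfaces in the given subnets.
        request["VpcEndpointType"] = "Interface"
        request["SubnetIds"] = subnet_ids or []
        request["SecurityGroupIds"] = security_group_ids or []
        request["PrivateDnsEnabled"] = True
    return request


# Usage sketch (IDs are placeholders):
#   import boto3
#   ec2 = boto3.client("ec2", region_name="us-east-1")
#   ec2.create_vpc_endpoint(**vpc_endpoint_request(
#       "us-east-1", "vpc-0123", "dynamodb", route_table_ids=["rtb-0123"]))
```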

Cross-Account and Cross-VPC Access

For multi-account architectures, database access patterns include VPC peering or Transit Gateway for VPC-deployed databases (RDS, Aurora, DocumentDB, Neptune, ElastiCache, MemoryDB), and VPC endpoints for serverless services. Resource Access Manager (RAM) can share Aurora DB clusters across accounts. Security group referencing works across peered VPCs in the same Region, using the account-id/sg-id format.
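A security group reference across a peering connection pairs the peer account ID with the peer security group ID. The sketch below builds the `IpPermissions` structure accepted by the boto3 EC2 `authorize_security_group_ingress` call. The helper name, the account ID, and the group IDs are hypothetical, and port 5432 is only an example.

```python
from typing import Any, Dict, List


def cross_account_ingress(peer_account_id: str, peer_sg_id: str,
                          port: int) -> List[Dict[str, Any]]:
    """Allow TCP traffic on one port from a security group in a peered account."""
    return [{
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        # UserId carries the peer account ID; GroupId is the peer's security group.
        "UserIdGroupPairs": [{"UserId": peer_account_id, "GroupId": peer_sg_id}],
    }]


# Usage sketch (IDs are placeholders):
#   import boto3
#   boto3.client("ec2").authorize_security_group_ingress(
#       GroupId="sg-database",
#       IpPermissions=cross_account_ingress("111122223333", "sg-app", 5432))
```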

Migration Paths

Source	Target	Tool	Notes
On-premises MySQL/PostgreSQL	RDS/Aurora	DMS + SCT	Full load + CDC for minimal downtime
On-premises MongoDB	DocumentDB	DMS	Online migration with change streams
On-premises Cassandra	Keyspaces	cqlsh COPY or custom tooling	No native DMS support for Keyspaces as target
Self-managed Redis	MemoryDB or ElastiCache	Online migration tool or snapshot import	Snapshot-based for initial load, replication for cutover
DynamoDB	Aurora	DMS	When workload evolves to need relational queries
RDS/Aurora	DynamoDB	DMS or custom ETL	When access patterns become key-value dominant
Oracle/SQL Server	Aurora PostgreSQL	DMS + SCT	Heterogeneous migration with schema conversion
QLDB	DynamoDB + application audit	Custom export + DynamoDB Streams	Required because QLDB reached end-of-support in July 2025
Timestream	Timestream for InfluxDB	Export/import	For InfluxDB-compatible workloads
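For the DMS rows above, "full load + CDC" corresponds to the `full-load-and-cdc` migration type, and the tables to migrate are chosen with selection rules passed as a JSON string. The sketch below builds that JSON. The helper name `selection_rules` and the schema and table names are hypothetical, and this covers only the simplest include rules.

```python
import json
from typing import List


def selection_rules(schema: str, tables: List[str]) -> str:
    """Build a DMS table-mappings JSON string that includes the given tables."""
    rules = []
    for i, table in enumerate(tables, start=1):
        rules.append({
            "rule-type": "selection",
            "rule-id": str(i),
            "rule-name": f"include-{i}",
            "object-locator": {"schema-name": schema, "table-name": table},
            "rule-action": "include",
        })
    return json.dumps({"rules": rules})


# Usage sketch (ARNs are placeholders):
#   import boto3
#   boto3.client("dms").create_replication_task(
#       ReplicationTaskIdentifier="orders-migration",
#       SourceEndpointArn="<source-arn>", TargetEndpointArn="<target-arn>",
#       ReplicationInstanceArn="<instance-arn>",
#       MigrationType="full-load-and-cdc",
#       TableMappings=selection_rules("public", ["orders", "customers"]))
```

Using `%` as the table name selects every table in the schema.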

Why This Matters

AWS offers 15+ purpose-built database engines, and selecting the wrong one is expensive to reverse. Each is optimized for a specific data model and access pattern -- using a relational database for graph traversals or a key-value store for ad-hoc queries results in poor performance and escalating costs. The most common and costly mistake is defaulting to a relational database for every workload when a purpose-built service would be a better fit for the data model.

DocumentDB is frequently misunderstood as a drop-in MongoDB replacement -- it implements the MongoDB wire protocol but has compatibility gaps that surface during migration. Neptune's value depends entirely on whether the workload genuinely requires relationship traversal; storing graph data that is only queried with simple lookups pays the graph engine's overhead without using its strengths. Timestream provides significant cost savings over storing time-series data in relational databases but requires understanding its dual-tier storage model (memory store and magnetic store) to avoid unexpected query costs on magnetic store data.

MemoryDB fills a gap between ElastiCache (fast but volatile) and traditional databases (durable but slower) -- it is appropriate when Redis data structures are needed as a primary database, not just a cache. Keyspaces eliminates Cassandra operational burden but does not support the full Cassandra feature set, making compatibility testing essential before migration.

QLDB reached end-of-support on July 31, 2025. Any existing QLDB ledgers must be migrated to alternative services. Do not recommend QLDB for new projects.

Common Decisions (ADR Triggers)

Relational vs NoSQL vs purpose-built -- RDS/Aurora for complex queries and ACID transactions vs DynamoDB for scale and known access patterns vs purpose-built (Neptune, Timestream, Keyspaces, MemoryDB) for specific data models; defaulting to relational increases cost and complexity for non-relational workloads
DocumentDB vs DynamoDB for document workloads -- DocumentDB for MongoDB-compatible applications requiring rich queries, aggregation pipelines, and secondary indexes with familiar MongoDB drivers vs DynamoDB for key-value/document workloads with known access patterns and massive scale; DocumentDB requires cluster management while DynamoDB is fully serverless
DocumentDB vs self-managed MongoDB on EC2 -- DocumentDB for reduced operational burden with trade-off of MongoDB compatibility gaps vs self-managed MongoDB for full feature compatibility (transactions, change streams resume, client-side encryption) with significant operational overhead
Neptune vs relational joins -- Neptune for workloads requiring multi-hop relationship traversal (social networks, fraud detection, recommendation engines, knowledge graphs) vs relational JOINs for simple 1-2 hop relationships where graph overhead is not justified
Neptune Serverless vs provisioned -- Serverless (scales in Neptune Capacity Units) for variable graph query workloads and development environments vs provisioned instances for predictable, high-throughput graph query workloads
Timestream vs time-series in Aurora/RDS -- Timestream for native time-series optimization (automatic partitioning, retention policies, interpolation functions, scheduled queries) vs PostgreSQL TimescaleDB extension in RDS for teams wanting a single relational engine for both transactional and time-series data
Keyspaces vs self-managed Cassandra -- Keyspaces for serverless Cassandra-compatible access with no cluster management vs self-managed Cassandra on EC2 for full feature compatibility, tunable consistency, and workloads requiring Cassandra-specific features not available in Keyspaces
MemoryDB vs ElastiCache -- MemoryDB when Redis data structures are the primary data store requiring durability (Multi-AZ transactional log, no data loss on failover) vs ElastiCache when used as a cache layer in front of another database (some data loss on failover is acceptable); MemoryDB write latency is slightly higher due to durability guarantees
Single-engine vs polyglot persistence -- single database engine for operational simplicity vs multiple purpose-built engines for workload optimization; polyglot persistence increases operational overhead, monitoring complexity, and cross-service data consistency challenges
Database connectivity -- RDS Proxy for serverless-to-database connection pooling vs application-level pooling (PgBouncer, HikariCP) for container/EC2 workloads vs direct connections for low-connection-count applications
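The first-pass choice implied by the triggers above can be sketched as a small lookup. This is an illustrative simplification, not a selection tool: the function name, the flags, and the data-model labels are assumptions, and real decisions also depend on scale, cost, and team experience.

```python
from typing import Dict

# Data model -> service, taken from the cross-service comparison in this document.
SERVICE_BY_MODEL: Dict[str, str] = {
    "relational": "RDS/Aurora",
    "key-value": "DynamoDB",
    "graph": "Neptune",
    "time-series": "Timestream",
    "wide-column": "Keyspaces",
}


def suggest_service(data_model: str, *, in_memory: bool = False,
                    durable: bool = True, mongodb_api: bool = False) -> str:
    """Return a first-pass service suggestion for a workload."""
    if in_memory:
        # Durable in-memory primary store vs cache in front of another database.
        return "MemoryDB" if durable else "ElastiCache"
    if data_model == "document":
        # MongoDB-compatible applications vs known access patterns at scale.
        return "DocumentDB" if mongodb_api else "DynamoDB"
    if data_model == "ledger":
        raise ValueError("QLDB is past end-of-support; see the migration paths")
    try:
        return SERVICE_BY_MODEL[data_model]
    except KeyError:
        raise ValueError(f"unknown data model: {data_model!r}") from None
```

For example, `suggest_service("document", mongodb_api=True)` returns `"DocumentDB"`, while an in-memory workload that can tolerate data loss maps to ElastiCache.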

Reference Links

AWS Database Services overview -- service comparison and selection guidance for all AWS database offerings
AWS Prescriptive Guidance: Database strategy -- framework for selecting and migrating to purpose-built databases on AWS
Amazon DocumentDB Developer Guide -- cluster architecture, MongoDB compatibility, scaling, and security configuration
Amazon Neptune User Guide -- graph data model selection (property graph vs RDF), query languages, serverless configuration, and best practices
Amazon Timestream Developer Guide -- time-series data modeling, retention tiers, scheduled queries, and integration with IoT and DevOps tooling
Amazon Keyspaces Developer Guide -- CQL compatibility, capacity modes, table design, and migration from Apache Cassandra
Amazon MemoryDB Developer Guide -- durability architecture, Multi-AZ replication, and comparison with ElastiCache
AWS Database Migration Service User Guide -- heterogeneous and homogeneous migration patterns, Schema Conversion Tool, and change data capture
AWS Architecture Center: Databases -- reference architectures for database selection, migration, and multi-database patterns