Apache Cassandra

Scope

This file covers Apache Cassandra (and DataStax) architecture decisions: partition key and clustering column design, data modeling from query patterns, compaction strategy selection (STCS, LCS, TWCS), replication factor and consistency level tuning, multi-datacenter replication, cluster sizing and topology, DataStax Astra (managed Cassandra-as-a-Service), DataStax Enterprise (DSE) features, anti-patterns to avoid, and operational procedures (repair, compaction, upgrades). For general database strategy (engine selection, replication patterns, encryption), see general/data.md. For migration methodology and cutover planning, see general/database-migration.md.

Checklist

Why This Matters

Apache Cassandra provides linear horizontal scalability and multi-datacenter replication that few other databases match, but achieving these capabilities requires a fundamentally different approach to data modeling than relational databases. The most common Cassandra failure is not hardware; it is a data model that treats Cassandra like a relational database with flexible querying. In Cassandra, the data model must be designed backward from the query patterns: if you need data by user and by date, you create two tables with different partition keys, each denormalized to serve its specific query. Organizations that normalize their data and then lean on secondary indexes or ALLOW FILTERING, both of which can force a single query to touch many nodes or scan many partitions, often discover that their "scalable" database performs worse than the single-node relational database it replaced.
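The "by user and by date" case above can be sketched as follows. This is a minimal illustration, not a prescribed schema: the table names, column names, and the `fan_out` helper are all hypothetical, and the CQL is held in plain strings so the snippet runs without a cluster or driver.

```python
from datetime import datetime, timezone

# Hypothetical schemas: one table per query, each with its own partition key.
EVENTS_BY_USER = """
CREATE TABLE IF NOT EXISTS events_by_user (
    user_id    text,
    event_time timestamp,
    event_type text,
    PRIMARY KEY ((user_id), event_time)
) WITH CLUSTERING ORDER BY (event_time DESC);
"""

EVENTS_BY_DATE = """
CREATE TABLE IF NOT EXISTS events_by_date (
    event_date date,
    event_time timestamp,
    user_id    text,
    event_type text,
    PRIMARY KEY ((event_date), event_time, user_id)
) WITH CLUSTERING ORDER BY (event_time DESC, user_id ASC);
"""


def fan_out(user_id, event_time, event_type):
    """Return one denormalized row per query table for a single event.

    The application writes the same event twice, once per table, instead of
    normalizing it and joining or filtering at read time.
    """
    return {
        "events_by_user": {
            "user_id": user_id,
            "event_time": event_time,
            "event_type": event_type,
        },
        "events_by_date": {
            "event_date": event_time.date(),  # partition key for date lookups
            "event_time": event_time,
            "user_id": user_id,
            "event_type": event_type,
        },
    }


if __name__ == "__main__":
    when = datetime(2024, 3, 5, 12, 0, tzinfo=timezone.utc)
    for table, row in fan_out("u1", when, "login").items():
        print(table, row)
```

Each read then targets exactly one partition in one table. The cost is a second write per event and the application's responsibility for keeping both tables in step.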

Partition design drives everything in Cassandra. A partition that grows without bound, such as one that stores all events for a user, eventually causes garbage collection pauses, read timeouts, and compaction failures once it reaches hundreds of megabytes. The bucketing pattern (adding a time bucket to the partition key, such as user_id + month) distributes data across multiple partitions while preserving query locality. Getting partition design wrong requires creating a new table and migrating all of the data, because a table's primary key, and therefore its partition key, cannot be altered after creation.
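A minimal sketch of the user_id + month bucketing described above, assuming timezone-aware timestamps and a `YYYY-MM` text bucket. The helper names and the `events_by_user_month` table are hypothetical. The second helper shows the trade-off: a range read must visit one partition per month it spans.

```python
from datetime import datetime, timezone

# Hypothetical table whose partition key is the (user_id, bucket) pair.
EVENTS_BY_USER_MONTH = """
CREATE TABLE IF NOT EXISTS events_by_user_month (
    user_id    text,
    bucket     text,        -- e.g. '2024-03'
    event_time timestamp,
    event_type text,
    PRIMARY KEY ((user_id, bucket), event_time)
);
"""


def month_bucket(ts):
    """Format a timezone-aware timestamp as a YYYY-MM bucket in UTC."""
    return ts.astimezone(timezone.utc).strftime("%Y-%m")


def partition_key(user_id, ts):
    """Composite partition key: one partition per user per month."""
    return (user_id, month_bucket(ts))


def buckets_between(start, end):
    """List every month bucket a range read must visit, oldest first."""
    start, end = start.astimezone(timezone.utc), end.astimezone(timezone.utc)
    year, month = start.year, start.month
    buckets = []
    while (year, month) <= (end.year, end.month):
        buckets.append(f"{year:04d}-{month:02d}")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return buckets


if __name__ == "__main__":
    ts = datetime(2024, 3, 31, 23, 30, tzinfo=timezone.utc)
    print(partition_key("u42", ts))
    print(buckets_between(datetime(2023, 11, 15, tzinfo=timezone.utc), ts))
```

The bucket width is a sizing decision: a month is only appropriate if one user's events for a month stay comfortably small; busier workloads may need day or hour buckets.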

Operational disciplines that are optional in other databases are mandatory in Cassandra. Repair must complete on every node within gc_grace_seconds (10 days by default) so that tombstones reach all replicas before they are purged; otherwise a replica that missed the delete can resurrect the data. Compaction must be monitored to prevent disk exhaustion. JVM garbage collection must be tuned because Cassandra runs as a long-lived JVM process where GC pauses directly impact request latency. Organizations that deploy Cassandra without dedicated operational expertise usually encounter at least one of these issues in production, and remediation is often significantly more disruptive than prevention.
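The repair requirement reduces to a deadline check: tombstones become eligible for purging after the table's `gc_grace_seconds` (864000 seconds, or 10 days, by default), so each node needs a completed repair inside that window. The sketch below is a hypothetical monitoring helper, not part of any Cassandra tool, and the three-day safety margin is an assumption chosen for illustration.

```python
from datetime import datetime, timedelta, timezone

GC_GRACE_SECONDS = 864_000  # Cassandra's default gc_grace_seconds (10 days)


def repair_deadline(last_repair, gc_grace_seconds=GC_GRACE_SECONDS):
    """Latest time by which the next repair should have completed."""
    return last_repair + timedelta(seconds=gc_grace_seconds)


def repair_overdue(last_repair, now,
                   gc_grace_seconds=GC_GRACE_SECONDS,
                   safety_margin=timedelta(days=3)):
    """True once 'now' is within the safety margin of the deadline.

    The margin (an assumed value) leaves time for the repair itself to run
    before tombstones become eligible for purging.
    """
    return now >= repair_deadline(last_repair, gc_grace_seconds) - safety_margin


if __name__ == "__main__":
    last = datetime(2024, 1, 1, tzinfo=timezone.utc)
    print(repair_deadline(last))
    print(repair_overdue(last, datetime(2024, 1, 9, tzinfo=timezone.utc)))
```

In practice the last-repair time would come from whatever schedules repairs for the cluster; the source of that timestamp is left open here.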

Common Decisions (ADR Triggers)

ADR: Cassandra vs Other Distributed Databases

Context: The organization needs a horizontally scalable database and must evaluate Cassandra against alternatives.

Options:

| Criterion | Apache Cassandra | Amazon DynamoDB | Google Cloud Spanner | CockroachDB |
|---|---|---|---|---|
| Consistency Model | Tunable (eventual to strong) | Eventual by default; strongly consistent reads optional | Strong (global) | Strong (serializable) |
| Multi-Region | Native active-active | Global Tables | Native multi-region | Native multi-region |
| Data Model | Wide-column (CQL) | Key-value / document | Relational (SQL) | Relational (SQL) |
| Operational Overhead | High (self-hosted) | None (serverless) | None (managed) | Moderate (self-hosted) |
| Query Flexibility | Low (query-driven design) | Low (key-based access) | High (SQL joins) | High (SQL joins) |
| Cost Model | Infrastructure + ops | Per-request + storage | Per-node + storage | Infrastructure + ops |

Decision drivers: Consistency requirements (strong vs eventual), data model complexity (relational joins needed vs key-based access), multi-region replication needs, operational team expertise, cloud provider commitment, and total cost of ownership.
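Cassandra's "tunable" consistency in the comparison above comes down to simple arithmetic: a read is guaranteed to overlap the latest acknowledged write when the number of replicas read plus the number of replicas written exceeds the replication factor. A small sketch of that rule, with hypothetical function names:

```python
def quorum(replication_factor):
    """Replicas required for a QUORUM operation: a strict majority."""
    return replication_factor // 2 + 1


def read_sees_latest_write(replication_factor, write_replicas, read_replicas):
    """True when every read set must intersect every write set (R + W > RF)."""
    return read_replicas + write_replicas > replication_factor


if __name__ == "__main__":
    rf = 3
    q = quorum(rf)
    print("QUORUM at RF=3 needs", q, "replicas")
    print("QUORUM write + QUORUM read:", read_sees_latest_write(rf, q, q))
    print("ONE write + ONE read:", read_sees_latest_write(rf, 1, 1))
```

With a replication factor of 3, QUORUM reads and writes (2 + 2 > 3) overlap, while ONE/ONE (1 + 1) does not and is only eventually consistent.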

ADR: DataStax Astra vs Self-Hosted Cassandra

Context: The organization has chosen Cassandra and must decide between managed (Astra) and self-hosted deployment.

Options:

- DataStax Astra: Serverless pricing based on operations and storage. Zero operational overhead for repair, compaction, and upgrades. Multi-cloud availability. Stargate API layer for REST/GraphQL access. Limited CQL feature set. Higher per-operation cost at large scale.
- Self-hosted (VM): Full control over configuration, compaction tuning, and version. Requires DBA expertise for repair scheduling, capacity planning, and upgrades. Lower per-GB cost at scale. Can use any Cassandra version or fork. Operational cost often exceeds infrastructure cost.
- Self-hosted (Kubernetes with K8ssandra): The K8ssandra operator provides automated deployment, repair (Reaper), and backups (Medusa). Reduces operational burden compared with raw VM deployment. Adds Kubernetes complexity. Requires understanding of StatefulSet behavior and persistent volume management.
- DataStax Enterprise (DSE): Commercial distribution with added features (Search, Graph, Analytics). Includes management tools and support. Higher licensing cost. The features may justify the cost if DSE Search or Graph replaces standalone components.

Decision drivers: Operational team Cassandra expertise, scale (small clusters favor Astra, large clusters favor self-hosted for cost), need for DSE-specific features, Kubernetes maturity, and support requirements.

ADR: Compaction Strategy Selection

Context: Each Cassandra table must use a compaction strategy that matches its read/write pattern.

Options:

- SizeTieredCompactionStrategy (STCS): Default strategy. Groups similarly-sized SSTables for compaction. Best for write-heavy workloads. Needs up to 50% free disk, because a major compaction can temporarily hold both the old and the new SSTables. Read amplification increases between compactions as data spreads across many SSTables. Worst-case temporary space usage is the highest of all strategies.
- LeveledCompactionStrategy (LCS): Organizes SSTables into levels; within each level above L0, SSTables cover non-overlapping key ranges. Read amplification is low (typically 1-2 SSTables per read). Write amplification is higher (each write may be compacted multiple times across levels). Requires only about 10% free disk. Best for read-heavy workloads or workloads with frequent updates.
- TimeWindowCompactionStrategy (TWCS): Creates SSTables per time window (e.g., 1 hour). SSTables within the same window are compacted together, and SSTables from different windows are never compacted with each other. Optimal for time-series data with TTLs where data is never updated after insertion. Once all data in a window has expired, its SSTables are dropped whole, avoiding tombstone overhead.

Decision drivers: Read vs write ratio, data mutation frequency (updates vs append-only), TTL usage pattern, available disk space, and latency sensitivity for reads.
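The decision drivers above can be sketched as a small rule of thumb that also renders the corresponding CQL. The function names, the keyspace and table names, and the ordering of the rules are assumptions for illustration; real selection should be validated against measured workload behavior. The compaction sub-options used (`class`, `compaction_window_unit`, `compaction_window_size`) are standard CQL table options.

```python
TWCS = "TimeWindowCompactionStrategy"
LCS = "LeveledCompactionStrategy"
STCS = "SizeTieredCompactionStrategy"


def choose_compaction(time_series_with_ttl, updates_after_insert, read_heavy):
    """Pick a strategy from coarse workload traits (illustrative heuristic)."""
    if time_series_with_ttl and not updates_after_insert:
        return TWCS  # append-only data that expires one window at a time
    if read_heavy or updates_after_insert:
        return LCS   # keep the number of SSTables touched per read low
    return STCS      # write-heavy default


def alter_table_cql(keyspace, table, strategy, window_hours=1):
    """Render an ALTER TABLE statement that sets the compaction strategy."""
    options = {"class": strategy}
    if strategy == TWCS:
        options["compaction_window_unit"] = "HOURS"
        options["compaction_window_size"] = str(window_hours)
    rendered = ", ".join(f"'{key}': '{value}'" for key, value in options.items())
    return f"ALTER TABLE {keyspace}.{table} WITH compaction = {{{rendered}}};"


if __name__ == "__main__":
    strategy = choose_compaction(time_series_with_ttl=True,
                                 updates_after_insert=False,
                                 read_heavy=False)
    print(alter_table_cql("metrics", "readings", strategy))
```

Because compaction is set per table, a single cluster can mix strategies: TWCS for TTL'd time-series tables alongside LCS for frequently updated lookup tables.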

Reference Links

Apache Cassandra Documentation -- architecture, CQL, configuration, and operational procedures
DataStax Cassandra Documentation -- DataStax Enterprise features, Astra DB, drivers, and best practices
Cassandra Data Modeling Guide -- query-driven data modeling, partition design, and denormalization patterns
K8ssandra Documentation -- Kubernetes operator for Cassandra with Reaper, Medusa, and Stargate
The Last Pickle - Cassandra Blog -- advanced Cassandra operations, repair strategies, and performance tuning