5.5 KiB
🐘 Cassandra & ScyllaDB
Overview
Apache Cassandra is a distributed wide-column NoSQL database designed for high availability and linear scalability with no single point of failure. Inspired by the Amazon Dynamo paper (2007) and Google Bigtable. ScyllaDB is a C++ reimplementation compatible with the Cassandra protocol, with drastically lower latency and higher throughput.
Architecture (Dynamo-inspired)
Consistent hashing
Data is divided on a hash ring, each node is responsible for a token range:
0 ─── node A ─── hash(key1)
│
90 ─── node B ─── hash(key2)
│
180 ─── node C ─── hash(key3)
│
270 ─── node D ─── hash(key4)
- Adding/removing a node affects only K/N keys (thanks to virtual nodes)
- Virtual nodes (vnodes) — each physical node has ~100-200 tokens on the ring (more even distribution)
Quorum (N, R, W)
- N = replication factor (typically 3)
- R = read quorum (typically 2)
- W = write quorum (typically 2)
- Condition: R + W > N (for strong-ish consistency)
- Sloppy quorum — when a node is unavailable, data is temporarily stored on another
- Hinted handoff — temporary write with hint, data transferred upon recovery
Gossip protocol
Decentralized dissemination of membership information — each node periodically communicates with 1-3 random nodes. No central point of failure.
Vector clocks
Capturing causality of object versions. On conflict (partition merge), both versions are returned — application merges.
Merkle trees
Anti-entropy — hash tree for detecting divergence between replicas. Fast detection of which data ranges differ.
Write path
Client → Coordinator → [1. Write to commit log (disk)]
[2. Write to memtable (RAM)]
[3. Acknowledge client]
→ [4. Flush memtable → SSTable (periodically)]
→ [5. Compaction (merge SSTables)]
Read path
Client → Coordinator → [1. Check bloom filter]
[2. Check row cache / key cache]
[3. Read from SSTable (disk)]
[4. Merge with memtable]
[5. Repair if stale (read repair)]
Cassandra vs ScyllaDB
| Feature | Cassandra | ScyllaDB |
|---|---|---|
| Language | Java (JVM) | C++ (seastar framework) |
| Architecture | Thread-per-connection | Shared-nothing, CPU sharding |
| Latency | 5-20 ms (typical) | 1-3 ms (typical) |
| Throughput | Good | 5-10× higher on same HW |
| GC pauses | Yes (JVM) | No (no GC) |
| NUMA | OS-dependent | Native NUMA aware |
| Workload | Standard | High-throughput, real-time |
| Price | Open source | Open source + Enterprise |
Data model
- Keyspace = namespace (analogy to DB)
- Table = column definition (not schema-less)
- Partition key = hash key for ring distribution
- Clustering columns = ordering within a partition
- Primary key = Partition key + Clustering columns
Recommendations — where Cassandra is better
| Area | Cassandra | Competition | Why Cassandra |
|---|---|---|---|
| Write throughput | Linear scaling, no master bottleneck | PostgreSQL (master writes) | Every node writes, no single point of failure |
| Availability | AP from CAP — always writable | MongoDB (CP, primary down = read-only) | "Always-writeable" philosophy |
| Multi-DC | Native, per-DC replication | CockroachDB (complex) | Simple configuration, latency-tolerant |
| Time-series | Wide-row model, TTL, compaction | InfluxDB (specialized) | Can combine with other workloads |
| IoT / sensor data | Linear scaling, no master | MongoDB (sharding complex) | Predictable performance under growth |
| Geographic distribution | Native multi-DC, hinted handoff | Spanner (vendor lock-in) | Open source, no dependencies |
When to use Cassandra / ScyllaDB
- IoT / sensor data ingest — millions of writes/s, no data loss
- Time-series at massive scale — metrics, logs, event data
- User activity history — write-heavy workloads
- Multi-DC applications — data available in every location
- Recommendation systems — wide-row model for "what user has seen"
- Message / event store — high-throughput append with TTL
When to use something else
- Relations, JOINs, transactions → PostgreSQL (Cassandra has no JOINs, limited transactions)
- Full-text search → Elasticsearch
- Aggregation / OLAP → ClickHouse (Cassandra is not an analytical DB)
- Small data (< 100 GB) → PostgreSQL (Cassandra overhead not worth it)
- Frequent reads by secondary keys → DynamoDB (SADA indexes) — Cassandra has limited secondary indexes
ScyllaDB specific
ScyllaDB is advantageous when:
- You need 5-10× higher throughput on the same HW
- You have latency-sensitive workload (real-time scoring, ad-tech)
- You want to eliminate JVM/GC issues
- You need predictable performance (P99 < 5 ms)
Sources
References, books, and standards: sources/databases/sources.md
Recommended reading
| Paper / Book | Authors | Description |
|---|---|---|
| Dynamo: Amazon's Highly Available Key-value Store (SOSP 2007) | DeCandia et al. | Foundational paper for Cassandra architecture |
| Cassandra: The Definitive Guide (3rd ed.) | E. Hewitt | Comprehensive guide to deployment and operations |
Last revision: 2026-06-03