Stanislav Hubacek 3fa11ef0f6 comiiit

2026-06-11 15:27:28 +02:00

🐘 Cassandra & ScyllaDB

Overview

Apache Cassandra is a distributed wide-column NoSQL database designed for high availability and linear scalability with no single point of failure. Its design combines ideas from the Amazon Dynamo paper (2007), which contributes the distribution model, and Google Bigtable, which contributes the data and storage model. ScyllaDB is a C++ reimplementation compatible with the Cassandra protocol (CQL) that targets lower latency and higher throughput on the same hardware.

Architecture (Dynamo-inspired)

Consistent hashing

Data is distributed around a hash ring; each node is responsible for a token range:

   0 ─── node A ─── hash(key1)
         │
   90 ─── node B ─── hash(key2)
         │
  180 ─── node C ─── hash(key3)
         │
  270 ─── node D ─── hash(key4)

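Adding/removing a node affects only about K/N keys (K = total keys, N = nodes); this is a property of consistent hashing itself
Virtual nodes (vnodes): each physical node owns many tokens on the ring (~100-200 here; the Cassandra default was 256 and is 16 since 4.0), which evens out distribution and spreads rebalancing across many peers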
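
The ring lookup described above can be sketched in a few lines of Python. This is a minimal illustration, not Cassandra's implementation: the `HashRing` class, the `token` helper, the use of MD5, and the choice of 16 vnodes per node are all assumptions made for the example (Cassandra's default partitioner is Murmur3-based).

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a ring position (MD5 is only an illustrative choice)."""
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")


class HashRing:
    """Toy consistent-hash ring with virtual nodes (hypothetical helper)."""

    def __init__(self, vnodes: int = 16):
        self.vnodes = vnodes
        self._tokens = []  # sorted ring positions
        self._owner = {}   # ring position -> physical node name

    def add_node(self, node: str) -> None:
        # Each physical node claims `vnodes` positions on the ring.
        for i in range(self.vnodes):
            t = token(f"{node}#{i}")
            bisect.insort(self._tokens, t)
            self._owner[t] = node

    def remove_node(self, node: str) -> None:
        self._tokens = [t for t in self._tokens if self._owner[t] != node]
        self._owner = {t: n for t, n in self._owner.items() if n != node}

    def lookup(self, key: str) -> str:
        # The owner is the first token at or after the key's position,
        # wrapping around to the start of the ring.
        i = bisect.bisect_left(self._tokens, token(key)) % len(self._tokens)
        return self._owner[self._tokens[i]]
```

When a node joins, the only keys that change owner are the ones the new node takes over; every other key stays where it was, which is the K/N property in practice.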

Quorum (N, R, W)

N = replication factor (typically 3)
R = read quorum (typically 2)
W = write quorum (typically 2)
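Condition: R + W > N, so every read quorum overlaps the most recent write quorum (with N = 3, quorum reads and writes give 2 + 2 > 3)
Sloppy quorum (Dynamo): when a replica is unavailable, the write is temporarily accepted by another healthy node
Hinted handoff: the write is stored together with a hint and forwarded once the replica recovers. In Cassandra the coordinator keeps the hint, and hints do not count toward the consistency level (except ANY)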
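
The overlap condition can be checked by brute force. The sketch below assumes replicas are simply numbered 0..N-1; the function names are made up for this example. It enumerates every possible write set and read set and confirms that they always share a replica exactly when R + W > N.

```python
from itertools import combinations


def quorum_size(n: int) -> int:
    """Smallest majority of n replicas, e.g. 2 of 3."""
    return n // 2 + 1


def always_overlaps(n: int, r: int, w: int) -> bool:
    """True if every possible read set shares a replica with every write set."""
    replicas = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )


# Compare the brute-force result with the R + W > N rule for small clusters.
for n in range(1, 6):
    for r in range(1, n + 1):
        for w in range(1, n + 1):
            assert always_overlaps(n, r, w) == (r + w > n)
```

With N = 3, reading and writing at quorum (2 and 2) overlaps, while R = 1, W = 1 does not, which is why the latter is only eventually consistent.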

Gossip protocol

Decentralized dissemination of membership and node-state information: each node periodically (every second in Cassandra) exchanges state with 1-3 other nodes. No central point of failure.
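
A small simulation gives a feel for how quickly such an exchange spreads membership information. This is a sketch under simplifying assumptions: nodes are integers, each node's view is a dict of endpoint to heartbeat version, peers are picked uniformly at random, and every exchange is a full push-pull merge. The function names and parameters are hypothetical and do not mirror Cassandra's gossip messages.

```python
import random


def gossip_round(views, fanout, rng):
    """One round: every node does a push-pull exchange with `fanout` random peers."""
    nodes = list(views)
    for node in nodes:
        others = [p for p in nodes if p != node]
        for peer in rng.sample(others, min(fanout, len(others))):
            # Merge both views, keeping the highest heartbeat version per endpoint.
            merged = dict(views[peer])
            for endpoint, version in views[node].items():
                merged[endpoint] = max(version, merged.get(endpoint, 0))
            views[node] = merged
            views[peer] = dict(merged)


def rounds_until_converged(n, fanout=2, seed=0, max_rounds=1000):
    """Rounds needed until every node knows all n endpoints (None if never)."""
    rng = random.Random(seed)
    views = {i: {i: 1} for i in range(n)}  # each node starts knowing only itself
    for rounds in range(1, max_rounds + 1):
        gossip_round(views, fanout, rng)
        if all(len(view) == n for view in views.values()):
            return rounds
    return None
```

Calling `rounds_until_converged(50)` typically returns a small number of rounds, since the set of informed nodes grows roughly geometrically.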

Vector clocks

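Vector clocks capture causality between object versions. In Dynamo, when versions conflict (for example after a network partition heals), both are returned and the application merges them. Cassandra does not use vector clocks: each cell carries a write timestamp and the newest timestamp wins (last-write-wins).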
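
The causality check that vector clocks provide, as used in Dynamo, can be sketched as follows. Clocks are modelled as plain dicts from node name to counter; the helper names `increment`, `compare`, and `merge` are assumptions for this example.

```python
def increment(clock: dict, node: str) -> dict:
    """Return a copy of `clock` with `node`'s counter advanced by one."""
    new = dict(clock)
    new[node] = new.get(node, 0) + 1
    return new


def compare(a: dict, b: dict) -> str:
    """Classify a relative to b: 'equal', 'before', 'after', or 'concurrent'."""
    nodes = a.keys() | b.keys()
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"  # neither descends from the other: a real conflict


def merge(a: dict, b: dict) -> dict:
    """Clock for a reconciled version: element-wise maximum."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}
```

Two versions written on different nodes from the same ancestor compare as `"concurrent"`, which is the case where both versions are handed back for the application to merge.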

Merkle trees

Anti-entropy: replicas build hash trees over their token ranges and compare them top-down, so only the ranges whose hashes differ need to be exchanged. Cassandra uses this during repair.
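
A minimal sketch of that comparison, assuming each replica's data has already been split into a power-of-two number of ranges and serialized to bytes. `build_tree` and `diff` are hypothetical helpers, and SHA-256 is an arbitrary choice of hash.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_tree(ranges: list) -> list:
    """Return tree levels: levels[0] holds leaf hashes, levels[-1] holds the root.

    Assumes len(ranges) is a power of two.
    """
    levels = [[_h(chunk) for chunk in ranges]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([_h(prev[i] + prev[i + 1]) for i in range(0, len(prev), 2)])
    return levels


def diff(a: list, b: list, level=None, index: int = 0) -> list:
    """Indices of leaf ranges that differ, descending only into unequal subtrees."""
    if level is None:
        level = len(a) - 1  # start at the root
    if a[level][index] == b[level][index]:
        return []           # identical subtree: nothing below needs checking
    if level == 0:
        return [index]
    return diff(a, b, level - 1, 2 * index) + diff(a, b, level - 1, 2 * index + 1)
```

If the roots match, a single hash comparison proves the replicas agree; otherwise only the mismatching branches are walked.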

Write path

Client → Coordinator → Replica → [1. Append to commit log (disk)]
                                 [2. Write to memtable (RAM)]
                                 [3. Acknowledge (coordinator replies once W replicas ack)]
                                 → [4. Flush memtable → SSTable (when the memtable fills)]
                                 → [5. Compaction (merge SSTables)]
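
The steps on a single replica can be modelled with a toy log-structured store. This sketch makes several simplifying assumptions: the commit log and SSTables are in-memory Python objects rather than files, the flush runs synchronously inside `write`, and the flush threshold counts keys instead of bytes. The `Replica` class and its parameters are invented for this example.

```python
class Replica:
    """Toy single-node write path: commit log, memtable, SSTables, compaction."""

    def __init__(self, memtable_limit: int = 2):
        self.commit_log = []   # stands in for the append-only on-disk log
        self.memtable = {}     # key -> (timestamp, value)
        self.sstables = []     # immutable sorted dicts, oldest first
        self.memtable_limit = memtable_limit

    def write(self, key, value, timestamp):
        self.commit_log.append((key, value, timestamp))  # 1. durability first
        current = self.memtable.get(key)
        if current is None or timestamp >= current[0]:   # 2. newest timestamp wins
            self.memtable[key] = (timestamp, value)
        if len(self.memtable) >= self.memtable_limit:    # 4. flush when full
            self.flush()
        return "ack"                                     # 3. acknowledge

    def flush(self):
        # Persist the memtable as a sorted, immutable table; the log can be recycled.
        self.sstables.append(dict(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log.clear()

    def compact(self):
        # 5. Merge all SSTables, keeping the newest version of each key.
        merged = {}
        for table in self.sstables:
            for key, (ts, value) in table.items():
                if key not in merged or ts > merged[key][0]:
                    merged[key] = (ts, value)
        self.sstables = [dict(sorted(merged.items()))]
```

Writes never modify an existing SSTable; stale versions only disappear when compaction rewrites the tables.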

Read path

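Client → Coordinator → Replica → [1. Check memtable]
                                 [2. Check row cache (if enabled)]
                                 [3. Check bloom filter of each SSTable]
                                 [4. Locate partition via key cache / partition index]
                                 [5. Read from matching SSTables (disk)]
                                 [6. Merge results by timestamp]
                                 → [7. Coordinator compares replicas; read repair if stale]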
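
The role of the bloom filter and the final merge can be shown with a toy read on one replica. This is a sketch, not Cassandra's storage engine: caches and indexes are left out, and `BloomFilter`, `SSTable`, and `read` are hypothetical names with arbitrary sizing.

```python
import hashlib


class BloomFilter:
    """Tiny bloom filter: may report false positives, never false negatives."""

    def __init__(self, size: int = 256, hashes: int = 3):
        self.size, self.hashes, self.bits = size, hashes, 0

    def _positions(self, key: str):
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:4], "big") % self.size

    def add(self, key: str) -> None:
        for p in self._positions(key):
            self.bits |= 1 << p

    def might_contain(self, key: str) -> bool:
        return all((self.bits >> p) & 1 for p in self._positions(key))


class SSTable:
    def __init__(self, rows: dict):
        self.rows = dict(rows)  # key -> (timestamp, value)
        self.bloom = BloomFilter()
        for key in self.rows:
            self.bloom.add(key)
        self.disk_reads = 0     # counts simulated disk accesses

    def get(self, key: str):
        if not self.bloom.might_contain(key):
            return None         # definitely absent: skip the disk read
        self.disk_reads += 1
        return self.rows.get(key)


def read(key: str, memtable: dict, sstables: list):
    """Collect every version of `key` and return the value with the newest timestamp."""
    versions = [memtable[key]] if key in memtable else []
    for table in sstables:
        hit = table.get(key)
        if hit is not None:
            versions.append(hit)
    return max(versions, key=lambda v: v[0])[1] if versions else None
```

A key can exist in the memtable and in several SSTables at once; the read has to consult all of them and keep the newest version, which is why fewer SSTables (after compaction) means cheaper reads.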

Cassandra vs ScyllaDB

Feature	Cassandra	ScyllaDB
Language	Java (JVM)	C++ (Seastar framework)
Architecture	Staged thread pools shared across cores	Shared-nothing, shard-per-core
Latency	5-20 ms (typical, workload-dependent)	1-3 ms (typical, workload-dependent)
Throughput	Good	Reported 5-10× higher on same HW
GC pauses	Yes (JVM)	No (no GC)
NUMA	OS-dependent	NUMA-aware by design
Workload	Standard	High-throughput, real-time
Price	Open source	Open source + Enterprise

Data model

Keyspace = namespace (analogy to DB)
Table = declared set of typed columns (a schema is required; not schema-less)
Partition key = hashed to a token that decides which replicas on the ring own the row
Clustering columns = sort order of rows within a partition
Primary key = Partition key + Clustering columns
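
The relationship between these terms can be modelled in a few lines. The sketch below is a conceptual stand-in for a table, not a driver API: `Table`, `insert`, and `range_query` are invented names, and a single clustering column is assumed.

```python
import bisect
from collections import defaultdict


class Table:
    """Toy table: partition key selects a partition, clustering key orders rows in it."""

    def __init__(self):
        # partition key -> list of (clustering key, row), kept sorted by clustering key
        self.partitions = defaultdict(list)

    def insert(self, partition_key, clustering_key, row):
        rows = self.partitions[partition_key]
        keys = [ck for ck, _ in rows]
        i = bisect.bisect_left(keys, clustering_key)
        if i < len(rows) and rows[i][0] == clustering_key:
            rows[i] = (clustering_key, row)        # same primary key: upsert
        else:
            rows.insert(i, (clustering_key, row))  # keep the partition sorted

    def range_query(self, partition_key, start, end):
        """Rows of one partition with start <= clustering key <= end, in order."""
        rows = self.partitions.get(partition_key, [])
        keys = [ck for ck, _ in rows]
        lo = bisect.bisect_left(keys, start)
        hi = bisect.bisect_right(keys, end)
        return [row for _, row in rows[lo:hi]]
```

For a sensor table with the sensor ID as partition key and the reading time as clustering column, "readings of one sensor in a time window" is a contiguous slice of one partition, which is the access pattern this model is built for.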

Recommendations — where Cassandra is better

Area	Cassandra	Competition	Why Cassandra
Write throughput	Linear scaling, no master bottleneck	PostgreSQL (single-primary writes)	Every node accepts writes, no single point of failure
Availability	AP in CAP terms, consistency tunable per query	MongoDB (CP, no writes until a new primary is elected)	"Always-writeable" philosophy
Multi-DC	Native, per-DC replication	CockroachDB (complex)	Simple configuration, latency-tolerant
Time-series	Wide-row model, TTL, compaction	InfluxDB (specialized)	Can combine with other workloads
IoT / sensor data	Linear scaling, no master	MongoDB (sharding complex)	Predictable performance under growth
Geographic distribution	Native multi-DC, hinted handoff	Spanner (vendor lock-in)	Open source, no cloud-vendor dependency

When to use Cassandra / ScyllaDB

IoT / sensor data ingest: very high sustained write rates (up to millions of writes/s on large clusters) with durable, replicated writes
Time-series at massive scale — metrics, logs, event data
User activity history — write-heavy workloads
Multi-DC applications — data available in every location
Recommendation systems — wide-row model for "what user has seen"
Message / event store — high-throughput append with TTL

When to use something else

Relations, JOINs, transactions → PostgreSQL (Cassandra has no JOINs, limited transactions)
Full-text search → Elasticsearch
Aggregation / OLAP → ClickHouse (Cassandra is not an analytical DB)
Small data (< 100 GB) → PostgreSQL (Cassandra overhead not worth it)
Frequent reads by secondary keys → DynamoDB (global secondary indexes, GSI); Cassandra has limited secondary indexes

ScyllaDB specific

ScyllaDB is advantageous when:

You need substantially higher throughput on the same HW (5-10× is the commonly reported range)
You have a latency-sensitive workload (real-time scoring, ad-tech)
You want to eliminate JVM/GC issues
You need predictable tail latency (for example, P99 < 5 ms)

Sources

References, books, and standards: sources/databases/sources.md

Paper / Book	Authors	Description
Dynamo: Amazon's Highly Available Key-value Store (SOSP 2007)	DeCandia et al.	Foundational paper for Cassandra architecture
Cassandra: The Definitive Guide (3rd ed.)	J. Carpenter, E. Hewitt	Comprehensive guide to deployment and operations
