Files
knowledge-base/CASSANDRA.en.md
Stanislav Hubacek 3fa11ef0f6 comiiit
2026-06-11 15:27:28 +02:00

5.5 KiB
Raw Blame History

🐘 Cassandra & ScyllaDB

Overview

Apache Cassandra is a distributed wide-column NoSQL database designed for high availability and linear scalability with no single point of failure. Inspired by the Amazon Dynamo paper (2007) and Google Bigtable. ScyllaDB is a C++ reimplementation compatible with the Cassandra protocol, with drastically lower latency and higher throughput.

Architecture (Dynamo-inspired)

Consistent hashing

Data is divided on a hash ring, each node is responsible for a token range:

   0 ─── node A ─── hash(key1)
         │
   90 ─── node B ─── hash(key2)
         │
  180 ─── node C ─── hash(key3)
         │
  270 ─── node D ─── hash(key4)
  • Adding/removing a node affects only K/N keys (thanks to virtual nodes)
  • Virtual nodes (vnodes) — each physical node has ~100-200 tokens on the ring (more even distribution)

Quorum (N, R, W)

  • N = replication factor (typically 3)
  • R = read quorum (typically 2)
  • W = write quorum (typically 2)
  • Condition: R + W > N (for strong-ish consistency)
  • Sloppy quorum — when a node is unavailable, data is temporarily stored on another
  • Hinted handoff — temporary write with hint, data transferred upon recovery

Gossip protocol

Decentralized dissemination of membership information — each node periodically communicates with 1-3 random nodes. No central point of failure.

Vector clocks

Capturing causality of object versions. On conflict (partition merge), both versions are returned — application merges.

Merkle trees

Anti-entropy — hash tree for detecting divergence between replicas. Fast detection of which data ranges differ.

Write path

Client → Coordinator → [1. Write to commit log (disk)]
                       [2. Write to memtable (RAM)]
                       [3. Acknowledge client]
                       → [4. Flush memtable → SSTable (periodically)]
                       → [5. Compaction (merge SSTables)]

Read path

Client → Coordinator → [1. Check bloom filter]
                       [2. Check row cache / key cache]
                       [3. Read from SSTable (disk)]
                       [4. Merge with memtable]
                       [5. Repair if stale (read repair)]

Cassandra vs ScyllaDB

Feature Cassandra ScyllaDB
Language Java (JVM) C++ (seastar framework)
Architecture Thread-per-connection Shared-nothing, CPU sharding
Latency 5-20 ms (typical) 1-3 ms (typical)
Throughput Good 5-10× higher on same HW
GC pauses Yes (JVM) No (no GC)
NUMA OS-dependent Native NUMA aware
Workload Standard High-throughput, real-time
Price Open source Open source + Enterprise

Data model

  • Keyspace = namespace (analogy to DB)
  • Table = column definition (not schema-less)
  • Partition key = hash key for ring distribution
  • Clustering columns = ordering within a partition
  • Primary key = Partition key + Clustering columns

Recommendations — where Cassandra is better

Area Cassandra Competition Why Cassandra
Write throughput Linear scaling, no master bottleneck PostgreSQL (master writes) Every node writes, no single point of failure
Availability AP from CAP — always writable MongoDB (CP, primary down = read-only) "Always-writeable" philosophy
Multi-DC Native, per-DC replication CockroachDB (complex) Simple configuration, latency-tolerant
Time-series Wide-row model, TTL, compaction InfluxDB (specialized) Can combine with other workloads
IoT / sensor data Linear scaling, no master MongoDB (sharding complex) Predictable performance under growth
Geographic distribution Native multi-DC, hinted handoff Spanner (vendor lock-in) Open source, no dependencies

When to use Cassandra / ScyllaDB

  • IoT / sensor data ingest — millions of writes/s, no data loss
  • Time-series at massive scale — metrics, logs, event data
  • User activity history — write-heavy workloads
  • Multi-DC applications — data available in every location
  • Recommendation systems — wide-row model for "what user has seen"
  • Message / event store — high-throughput append with TTL

When to use something else

  • Relations, JOINs, transactions → PostgreSQL (Cassandra has no JOINs, limited transactions)
  • Full-text search → Elasticsearch
  • Aggregation / OLAP → ClickHouse (Cassandra is not an analytical DB)
  • Small data (< 100 GB) → PostgreSQL (Cassandra overhead not worth it)
  • Frequent reads by secondary keys → DynamoDB (SADA indexes) — Cassandra has limited secondary indexes

ScyllaDB specific

ScyllaDB is advantageous when:

  • You need 5-10× higher throughput on the same HW
  • You have latency-sensitive workload (real-time scoring, ad-tech)
  • You want to eliminate JVM/GC issues
  • You need predictable performance (P99 < 5 ms)

Sources

References, books, and standards: sources/databases/sources.md

Paper / Book Authors Description
Dynamo: Amazon's Highly Available Key-value Store (SOSP 2007) DeCandia et al. Foundational paper for Cassandra architecture
Cassandra: The Definitive Guide (3rd ed.) E. Hewitt Comprehensive guide to deployment and operations

Last revision: 2026-06-03