Files
knowledge-base/CASSANDRA.en.md
Stanislav Hubacek ef3c2f75b1 18.6.2026
2026-06-18 16:25:33 +02:00

136 lines
5.5 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# 🐘 Cassandra & ScyllaDB
## Overview
Apache Cassandra is a distributed wide-column NoSQL database designed for high availability and linear scalability with no single point of failure. Inspired by the Amazon Dynamo paper (2007) and Google Bigtable. ScyllaDB is a C++ reimplementation compatible with the Cassandra protocol, with drastically lower latency and higher throughput.
## Architecture (Dynamo-inspired)
### Consistent hashing
Data is divided on a hash ring, each node is responsible for a token range:
```text
0 ─── node A ─── hash(key1)
90 ─── node B ─── hash(key2)
180 ─── node C ─── hash(key3)
270 ─── node D ─── hash(key4)
```
- Adding/removing a node affects only K/N keys (thanks to virtual nodes)
- **Virtual nodes** (vnodes) — each physical node has ~100-200 tokens on the ring (more even distribution)
### Quorum (N, R, W)
- N = replication factor (typically 3)
- R = read quorum (typically 2)
- W = write quorum (typically 2)
- Condition: R + W > N (for strong-ish consistency)
- **Sloppy quorum** — when a node is unavailable, data is temporarily stored on another
- **Hinted handoff** — temporary write with hint, data transferred upon recovery
### Gossip protocol
Decentralized dissemination of membership information — each node periodically communicates with 1-3 random nodes. No central point of failure.
### Vector clocks
Capturing causality of object versions. On conflict (partition merge), both versions are returned — application merges.
### Merkle trees
Anti-entropy — hash tree for detecting divergence between replicas. Fast detection of which data ranges differ.
### Write path
```text
Client → Coordinator → [1. Write to commit log (disk)]
[2. Write to memtable (RAM)]
[3. Acknowledge client]
→ [4. Flush memtable → SSTable (periodically)]
→ [5. Compaction (merge SSTables)]
```
### Read path
```text
Client → Coordinator → [1. Check bloom filter]
[2. Check row cache / key cache]
[3. Read from SSTable (disk)]
[4. Merge with memtable]
[5. Repair if stale (read repair)]
```
## Cassandra vs ScyllaDB
| Feature | Cassandra | ScyllaDB |
|---------|-----------|----------|
| **Language** | Java (JVM) | C++ (seastar framework) |
| **Architecture** | Thread-per-connection | Shared-nothing, CPU sharding |
| **Latency** | 5-20 ms (typical) | 1-3 ms (typical) |
| **Throughput** | Good | 5-10× higher on same HW |
| **GC pauses** | Yes (JVM) | No (no GC) |
| **NUMA** | OS-dependent | Native NUMA aware |
| **Workload** | Standard | High-throughput, real-time |
| **Price** | Open source | Open source + Enterprise |
## Data model
- **Keyspace** = namespace (analogy to DB)
- **Table** = column definition (not schema-less)
- **Partition key** = hash key for ring distribution
- **Clustering columns** = ordering within a partition
- **Primary key** = Partition key + Clustering columns
## Recommendations — where Cassandra is better
| Area | Cassandra | Competition | Why Cassandra |
|------|-----------|-------------|---------------|
| **Write throughput** | Linear scaling, no master bottleneck | PostgreSQL (master writes) | Every node writes, no single point of failure |
| **Availability** | AP from CAP — always writable | MongoDB (CP, primary down = read-only) | "Always-writeable" philosophy |
| **Multi-DC** | Native, per-DC replication | CockroachDB (complex) | Simple configuration, latency-tolerant |
| **Time-series** | Wide-row model, TTL, compaction | InfluxDB (specialized) | Can combine with other workloads |
| **IoT / sensor data** | Linear scaling, no master | MongoDB (sharding complex) | Predictable performance under growth |
| **Geographic distribution** | Native multi-DC, hinted handoff | Spanner (vendor lock-in) | Open source, no dependencies |
### When to use Cassandra / ScyllaDB
- **IoT / sensor data ingest** — millions of writes/s, no data loss
- **Time-series at massive scale** — metrics, logs, event data
- **User activity history** — write-heavy workloads
- **Multi-DC applications** — data available in every location
- **Recommendation systems** — wide-row model for "what user has seen"
- **Message / event store** — high-throughput append with TTL
### When to use something else
- **Relations, JOINs, transactions** → PostgreSQL (Cassandra has no JOINs, limited transactions)
- **Full-text search** → Elasticsearch
- **Aggregation / OLAP** → ClickHouse (Cassandra is not an analytical DB)
- **Small data (< 100 GB)** → PostgreSQL (Cassandra overhead not worth it)
- **Frequent reads by secondary keys** → DynamoDB (SADA indexes) — Cassandra has limited secondary indexes
### ScyllaDB specific
ScyllaDB is advantageous when:
- You need 5-10× higher throughput on the same HW
- You have latency-sensitive workload (real-time scoring, ad-tech)
- You want to eliminate JVM/GC issues
- You need predictable performance (P99 < 5 ms)
## Sources
References, books, and standards: [sources/databases/sources.en.md](sources/databases/sources.en.md)
### Recommended reading
| Paper / Book | Authors | Description |
|--------------|---------|-------------|
| Dynamo: Amazon's Highly Available Key-value Store (SOSP 2007) | DeCandia et al. | Foundational paper for Cassandra architecture |
| Cassandra: The Definitive Guide (3rd ed.) | E. Hewitt | Comprehensive guide to deployment and operations |
*Last revision: 2026-06-03*