# 🐘 Cassandra & ScyllaDB ## Overview Apache Cassandra is a distributed wide-column NoSQL database designed for high availability and linear scalability with no single point of failure. Inspired by the Amazon Dynamo paper (2007) and Google Bigtable. ScyllaDB is a C++ reimplementation compatible with the Cassandra protocol, with drastically lower latency and higher throughput. ## Architecture (Dynamo-inspired) ### Consistent hashing Data is divided on a hash ring, each node is responsible for a token range: ```text 0 ─── node A ─── hash(key1) │ 90 ─── node B ─── hash(key2) │ 180 ─── node C ─── hash(key3) │ 270 ─── node D ─── hash(key4) ``` - Adding/removing a node affects only K/N keys (thanks to virtual nodes) - **Virtual nodes** (vnodes) — each physical node has ~100-200 tokens on the ring (more even distribution) ### Quorum (N, R, W) - N = replication factor (typically 3) - R = read quorum (typically 2) - W = write quorum (typically 2) - Condition: R + W > N (for strong-ish consistency) - **Sloppy quorum** — when a node is unavailable, data is temporarily stored on another - **Hinted handoff** — temporary write with hint, data transferred upon recovery ### Gossip protocol Decentralized dissemination of membership information — each node periodically communicates with 1-3 random nodes. No central point of failure. ### Vector clocks Capturing causality of object versions. On conflict (partition merge), both versions are returned — application merges. ### Merkle trees Anti-entropy — hash tree for detecting divergence between replicas. Fast detection of which data ranges differ. ### Write path ```text Client → Coordinator → [1. Write to commit log (disk)] [2. Write to memtable (RAM)] [3. Acknowledge client] → [4. Flush memtable → SSTable (periodically)] → [5. Compaction (merge SSTables)] ``` ### Read path ```text Client → Coordinator → [1. Check bloom filter] [2. Check row cache / key cache] [3. Read from SSTable (disk)] [4. Merge with memtable] [5. Repair if stale (read repair)] ``` ## Cassandra vs ScyllaDB | Feature | Cassandra | ScyllaDB | |---------|-----------|----------| | **Language** | Java (JVM) | C++ (seastar framework) | | **Architecture** | Thread-per-connection | Shared-nothing, CPU sharding | | **Latency** | 5-20 ms (typical) | 1-3 ms (typical) | | **Throughput** | Good | 5-10× higher on same HW | | **GC pauses** | Yes (JVM) | No (no GC) | | **NUMA** | OS-dependent | Native NUMA aware | | **Workload** | Standard | High-throughput, real-time | | **Price** | Open source | Open source + Enterprise | ## Data model - **Keyspace** = namespace (analogy to DB) - **Table** = column definition (not schema-less) - **Partition key** = hash key for ring distribution - **Clustering columns** = ordering within a partition - **Primary key** = Partition key + Clustering columns ## Recommendations — where Cassandra is better | Area | Cassandra | Competition | Why Cassandra | |------|-----------|-------------|---------------| | **Write throughput** | Linear scaling, no master bottleneck | PostgreSQL (master writes) | Every node writes, no single point of failure | | **Availability** | AP from CAP — always writable | MongoDB (CP, primary down = read-only) | "Always-writeable" philosophy | | **Multi-DC** | Native, per-DC replication | CockroachDB (complex) | Simple configuration, latency-tolerant | | **Time-series** | Wide-row model, TTL, compaction | InfluxDB (specialized) | Can combine with other workloads | | **IoT / sensor data** | Linear scaling, no master | MongoDB (sharding complex) | Predictable performance under growth | | **Geographic distribution** | Native multi-DC, hinted handoff | Spanner (vendor lock-in) | Open source, no dependencies | ### When to use Cassandra / ScyllaDB - **IoT / sensor data ingest** — millions of writes/s, no data loss - **Time-series at massive scale** — metrics, logs, event data - **User activity history** — write-heavy workloads - **Multi-DC applications** — data available in every location - **Recommendation systems** — wide-row model for "what user has seen" - **Message / event store** — high-throughput append with TTL ### When to use something else - **Relations, JOINs, transactions** → PostgreSQL (Cassandra has no JOINs, limited transactions) - **Full-text search** → Elasticsearch - **Aggregation / OLAP** → ClickHouse (Cassandra is not an analytical DB) - **Small data (< 100 GB)** → PostgreSQL (Cassandra overhead not worth it) - **Frequent reads by secondary keys** → DynamoDB (SADA indexes) — Cassandra has limited secondary indexes ### ScyllaDB specific ScyllaDB is advantageous when: - You need 5-10× higher throughput on the same HW - You have latency-sensitive workload (real-time scoring, ad-tech) - You want to eliminate JVM/GC issues - You need predictable performance (P99 < 5 ms) ## Sources References, books, and standards: [sources/databases/sources.md](sources/databases/sources.md) ### Recommended reading | Paper / Book | Authors | Description | |--------------|---------|-------------| | Dynamo: Amazon's Highly Available Key-value Store (SOSP 2007) | DeCandia et al. | Foundational paper for Cassandra architecture | | Cassandra: The Definitive Guide (3rd ed.) | E. Hewitt | Comprehensive guide to deployment and operations | *Last revision: 2026-06-03*