# 🗄️ Database Architecture

## Database Classification

### Relational (SQL)

| DB | License | Use Case | Details |
|----|---------|----------|--------|
| **PostgreSQL** | Open source | Universal, geospatial, analytics, AI | [POSTGRESQL.en.md](POSTGRESQL.en.md) |
| **MySQL / MariaDB** | Open source | Web, LAMP stack, e-commerce | [MYSQL.en.md](MYSQL.en.md) |
| **Microsoft SQL Server** | Proprietary | Enterprise .NET, Windows ecosystem | — |
| **Oracle DB** | Proprietary | Enterprise, finance, mainframe, RAC cluster | [ORACLE.en.md](ORACLE.en.md) |
| **Amazon Aurora** | Managed | MySQL/PostgreSQL compatible, cloud-native | — |

### NoSQL

| Type | DB | Use Case | Details |
|-----|----|----------|--------|
| **Document** | MongoDB, Couchbase | JSON data, flexible schema | [MONGODB.en.md](MONGODB.en.md) |
| **Key-Value / Cache** | Redis, Memcached, DynamoDB | Cache, session store, real-time | [REDIS.en.md](REDIS.en.md) |
| **Wide-column** | Cassandra, ScyllaDB | Time-series, IoT, big data | [CASSANDRA.en.md](CASSANDRA.en.md) |
| **Vector** | Pinecone, Qdrant, Milvus, pgvector | Embeddings, RAG, semantic search | [VECTOR-DBS.en.md](VECTOR-DBS.en.md) |
| **Graph** | Neo4j, Dgraph | Relationships, recommendations, social graphs | — |

### Storage Engines

Common concepts across databases: [DATABASE-ENGINES.en.md](DATABASE-ENGINES.en.md)

---

## Transaction Isolation Levels

| Level | Dirty Read | Non-repeatable Read | Phantom Read | Serialization Anomaly |
|--------|-----------|---------------------|-------------|----------------------|
| **Read Uncommitted** | Yes (possible) | Yes | Yes | Yes |
| **Read Committed** | No (prevented) | Yes | Yes | Yes |
| **Repeatable Read** | No | No | No (PostgreSQL: No) | Yes |
| **Serializable** | No | No | No | No |

**Anomalies**:
- **Dirty Read** — reading data from an uncommitted transaction (data may be rolled back)
- **Non-repeatable Read** — same query returns different data (another transaction updated the row in the meantime)
- **Phantom Read** — same query returns new rows (another transaction inserted data matching the condition)
- **Serialization Anomaly** — the result of transactions is not equivalent to any serial order

### PostgreSQL vs MySQL Differences

- **PostgreSQL**: Read Uncommitted behaves like Read Committed. Repeatable Read = Snapshot Isolation (also prevents phantom reads). Serializable = SSI.
- **MySQL InnoDB**: Repeatable Read uses next-key locking (prevents phantom reads).

---

## CAP Theorem

In a distributed system, only 2 out of 3 are possible: **C**onsistency, **A**vailability, **P**artition tolerance.

In practice: P is always required, we choose between CP (consistency) and AP (availability).

### PACELC Extension

PACELC extends CAP with behavior under normal conditions (no partition):
- **P**artition → **A**vailability vs **C**onsistency
- **E**lse (no partition) → **L**atency vs **C**onsistency

| DB | Partition Choice | Else Choice |
|----|----------------|------------|
| Cassandra | AP (availability) | LC (low latency, eventual consistency) |
| DynamoDB (default) | AP | LC |
| MongoDB | CP (primary) | LC |
| PostgreSQL (single) | CP | CC |
| CockroachDB | CP | CC |

### Quorum Details

- **R** (read quorum) + **W** (write quorum) > **N** (replication factor)
- Typical: N=3, R=2, W=2 (tolerates 1 node down)
- **Sloppy quorum** — when a node is unavailable, data is temporarily stored on another node
- **Hinted handoff** — temporary write to another node with a hint, data is transferred upon recovery

---

## Replication

| Type | Description | Latency |
|-----|-------|---------|
| Synchronous | Write confirmed only after replication to all nodes | High, but consistent |
| Asynchronous | Write confirmed immediately, replication in the background | Low, possible data loss |
| Semi-synchronous | Confirmation from majority of nodes | Compromise |

### Topologies

- **Leader-Follower** (Master-Slave) — reads from replicas
- **Leader-Leader** (Multi-master) — writes to multiple nodes
- **Quorum-based** — R + W > N (Cassandra, DynamoDB)

---

## Sharding

Data distribution across nodes based on a shard key.

```
                     ┌─────────┐
                     │  Proxy  │
                     │  Router │
                     └────┬────┘
               ┌──────────┼──────────┐
          ┌────▼───┐ ┌───▼────┐ ┌───▼────┐
          │Shard A │ │Shard B │ │Shard C │
          │ 0-100  │ │101-200 │ │201-300 │
          └────────┘ └────────┘ └────────┘
```

### Methods

| Method | Description | Advantage | Disadvantage |
|--------|-------|--------|----------|
| **Hash-based** | `shard_id = hash(key) % N` | Even distribution | Loss of range queries |
| **Range-based** | Data by range (A-M, N-Z) | Preserves ordering | Hot spots |
| **Consistent hashing** | Hash ring, vnodes | Min. rebalancing when number of shards changes | More complex |

### Routing

- **Proxy-based** — application goes to proxy, which routes (Vitess, ProxySQL, mongos)
- **Client-side** — application knows which shard to target
- **DNS-based** — each shard has its own endpoint

---

## Data Consistency Patterns

| Pattern | Description | Example |
|---------|-------|---------|
| **Strong consistency** | After a write, every read sees the latest data | Single DB, Raft, Spanner |
| **Eventual consistency** | After a write, data propagates over time | DNS, DynamoDB (default), Cassandra |
| **Read-after-write** | The author always sees their own write (others are eventual) | Social networks, comments |
| **Causal consistency** | Causally dependent operations are seen in the correct order | COPS, Orbe, MongoDB (causal clusters) |
| **Monotonic reads** | You do not see older data after seeing newer data | Cassandra (MONOTONIC_READ consistency) |
| **Monotonic writes** | Writes from a single client are in order | Queue-based, single leader |

---

## Data Migration

### Schema Migration

```
V1__initial_schema.sql
V2__add_users_table.sql
V3__add_email_index.sql
V4__add_orders_table.sql
```

### Zero-Downtime Migration

1. **Expand** — add new column/table (application tolerates both states)
2. **Migrate** — backfill data, update application to new schema
3. **Contract** — remove old column/table

### Tools

| Tool | Language | Strategy | Zero-Downtime | Rollback |
|---------|-------|-----------|--------------|----------|
| **Flyway** | Java (multi-lang CLI) | Versioned SQL | Limited (additive only) | `undo` (limited, enterprise) |
| **Liquibase** | Java (multi-lang CLI) | Changesets (XML/YAML/JSON/SQL) | Yes (changeset design) | `rollback <count>` |
| **Alembic** | Python | Auto-generation, versioned | Yes (branching) | `downgrade` |
| **Prisma Migrate** | TypeScript | Declarative schema → diff | Yes (shadow DB) | `migrate diff` |
| **gh-ost** | Go | Triggerless online DDL (MySQL) | Yes (binlog stream) | No (progressive) |
| **pgroll** | Go | Online schema migration (PG) | Yes (views, multiple versions) | Yes (immediate) |

---

## SQL Antipatterns

Based on *More SQL Antipatterns* (Karwin, 2026) — 14 new antipatterns:

### Language Antipatterns

| Antipattern | Problem | Solution |
|-------------|---------|--------|
| **Fear of JOINs** | Manual pairing in application instead of JOIN | Use JOIN correctly |
| **Relational Division** | Finding sets in WHERE | Relational division (subquery with GROUP BY/HAVING) |
| **Pagination via OFFSET** | OFFSET is O(n) — the larger the offset, the slower | Keyset pagination (WHERE id > last_seen) |
| **Non-Sargable queries** | Functions on columns in WHERE (`WHERE YEAR(date) = 2026`) | Rewrite as range condition |

### Optimization Antipatterns

| Antipattern | Problem | Solution |
|-------------|---------|--------|
| **Premature denormalization** | Denormalization without reason | Measure, then optimize |
| **JSON overuse** | JSON as a universal solution | Use JSON only for genuinely flexible data |
| **Cacheless transactions** | Relying on query cache (removed in MySQL 8) | Application-level caching |

### Application Antipatterns

| Antipattern | Problem | Solution |
|-------------|---------|--------|
| **Polling** | Regularly querying for changes | LISTEN/NOTIFY, Kafka, Change Data Capture |
| **Transaction encapsulation** | Each model manages its own transaction | Unit of Work pattern |
| **Fear of deadlocks** | Trying to prevent all deadlocks | Mitigation, not prevention |
| **Data hoarding** | Storing everything forever | Data retention policies, archiving |

### Mini-Antipatterns

- `LIMIT` without `ORDER BY` — nondeterministic results
- `NATURAL JOIN` — fragile, implicit join condition
- `N+1 queries` — query in a loop instead of JOIN/batch
- Redundant indexes — duplicate/overlapping indexes unnecessarily slow writes

---

## Designing Data-Intensive Applications (2nd Edition)

*Kleppmann, Riccomini (2026)* — substantially revised edition.

### What's New Compared to 1st Edition

| Area | What's New |
|--------|-----------|
| **Cloud-native** | Storage = object store (S3, Blob), not local disk. Separation of control/data/compute plane |
| **AI workloads** | Vector indexes, DataFrames as a data model, batch processing for training data |
| **Local-first software** | DuckDB, PGlite, SQLite — databases running on laptop/edge, sync when connected |
| **Formal methods** | Randomized testing, formal verification (important for AI-generated code) |
| **Legal & ethics** | GDPR, ethics of predictive analytics, bias, algorithmic accountability |
| **Streaming → SQL views** | Materialize, incremental view maintenance — streaming as SQL |

### Key Principles (unchanged)

**Reliability**, **Scalability**, **Maintainability** — the three pillars of good data systems.

---

## Apache Iceberg Lakehouse

Based on *Architecting an Apache Iceberg Lakehouse* (Merced, 2026):

### What is a Data Lakehouse

An architecture combining the flexibility and low cost of a **data lake** (object storage) with the performance and governance of a **data warehouse**. Apache Iceberg is an open source table format.

### Iceberg Metadata Architecture

```
Table metadata (.metadata.json)
    └── Snapshot manifest list
            └── Manifests (file-level stats)
                    └── Data files (Parquet/ORC/Avro)
```

### Key Features

| Feature | Description |
|-----------|-------|
| **ACID transactions** | Safe concurrent read/write |
| **Schema evolution** | Add/drop/rename columns without rewrite |
| **Time travel** | Query historical snapshots |
| **Partition evolution** | Change partition strategy without data rewrite |
| **Hidden partitioning** | Automatic partition filters (user does not need to specify) |
| **Multi-engine** | Spark, Flink, Trino, Dremio, Snowflake over the same data |

For a broader overview of the Big Data ecosystem (HDFS, Spark, Flink, Trino, Delta Lake, Hudi) see [BIG-DATA.en.md](BIG-DATA.en.md).

### When to Use Iceberg

- Multi-tool access to the same governed data
- ACID on lake data
- Streaming + batch in a single table
- Reducing duplication (one canonical copy instead of ETL to warehouse)

---

## Best Practices

- **Connection pooling** — PgBouncer, RDS Proxy, ProxySQL
- **Indexing based on query patterns** — do not have unnecessary indexes
- **Read replicas** for reporting and analytics
- **Backup & recovery** — point-in-time recovery (PITR), regular tests
- **Query monitoring** — slow query log, pg_stat_statements, performance_schema
- **Encryption at rest & in transit**
- **Migrations in CI/CD** — part of the pipeline, not manual
- **Choose DB based on workload** — no single universal DB (polyglot persistence)

---

## Database License Model Comparison

| DB | License | Price (self-hosted) | Price (managed cloud) | Vendor lock-in | Note |
|----|---------|-------------------|---------------------|----------------|----------|
| **PostgreSQL** | PostgreSQL license (MIT-like) | $0 | ~$0.10-1.00/hr (RDS, CloudSQL, Aurora) | Low | Fully open source, no restrictions |
| **MySQL** | GPL v2 / Commercial (Oracle) | $0 (GPL) / ~$2,000/server/year (commercial) | ~$0.10-1.00/hr (RDS, PlanetScale) | Medium (Oracle owned) | GPL = need to release application? (depends on distribution) |
| **MariaDB** | GPL v2 / Business Source | $0 (GPL) | ~$0.10-1.00/hr (SkySQL) | Low | Fully compatible MySQL fork, no Oracle influence |
| **Oracle SE2** | Proprietary (per core) | ~$17,500/core + 22% support/year | ~$1-5/hr (RDS, OCI) | High | Core factor 0.5 (EPYC/Xeon), max 16 threads |
| **Oracle EE** | Proprietary (per core + options) | ~$47,500/core + options + 22% support | ~$2-30/hr (OCI, RDS) | High | Options double the price (RAC, partitioning, compression) |
| **SQL Server Standard** | Proprietary (per core + CAL) | ~$1,000/core + $200/CAL | ~$0.20-1.00/hr (Azure SQL) | Medium | Windows Server license required additionally |
| **SQL Server Enterprise** | Proprietary (per core + CAL) | ~$7,000/core + $200/CAL | ~$1-5/hr (Azure SQL) | Medium | AlwaysOn, partitioning, in-memory OLTP |
| **MongoDB** | SSPL (Community) / Commercial (Enterprise) | $0 (Community) / ~$10k/server/year (Enterprise) | ~$0.10-5.00/hr (Atlas) | Medium | SSPL restricts managed cloud services |
| **Redis** | RSALv2 + SSPL (7.4+) / BSD (Valkey) | $0 (Valkey) | ~$0.10-1.00/hr (ElastiCache, Memorystore → Valkey) | Low (Valkey) | Redis 7.4+ license change → Valkey fork |
| **Cassandra** | Apache 2.0 | $0 | ~$0.10-1.00/hr (Keyspaces, Amazon Managed) | Low | Fully open source, no restrictions |
| **ScyllaDB** | Apache 2.0 (OSS) / Enterprise | $0 (OSS) / Enterprise subscription | ~$0.50-3.00/hr (ScyllaDB Cloud) | Low (OSS) | Enterprise: monitoring, security, support |
| **CockroachDB** | BSL (Business Source License) / Enterprise | $0 (core) / Enterprise subscription | ~$0.50-3.00/hr (CockroachDB Cloud) | Medium | BSL: converts to MIT after 3 years. Enterprise: multi-region, backup |

**Key Recommendations**:
- **Lowest TCO**: PostgreSQL (no license, broadest cloud support)
- **Highest vendor lock-in**: Oracle (PL/SQL, proprietary options, expensive migration)
- **License risk**: Redis (license change) → use Valkey for new projects
- **Cloud-native licensing**: MongoDB Atlas, CockroachDB Cloud, ScyllaDB Cloud — pay-per-use, no license management

## Resources

Links, books and standards: [sources/databases/sources.en.md](sources/databases/sources.en.md)

### Recommended Reading

| Book | Authors | ISBN | Key Takeaway |
|-------|--------|------|----------------|
| Database Internals | Alex Petrov | 978-1492040346 | In-depth explanation of storage engines (B-Tree, LSM-Tree, WAL, MVCC), distributed systems |
| Designing Data-Intensive Applications (2nd ed.) | Kleppmann, Riccomini | — | Cloud-native, AI, local-first, formal methods |
| High Performance MySQL (4th ed.) | Schwartz, Zaitsev, Tkachenko | 978-1492075292 | MySQL architecture, schema/index optimization |
| Expert Oracle Architecture (3rd ed.) | Kyte, Kuhn | 978-1484249602 | Oracle architecture, RAC, Data Guard, tuning |
| AI-Ready PostgreSQL 18 | Kumar, Linster | — | PostgreSQL as a unified platform for AI |
| More SQL Antipatterns | Bill Karwin (2026) | — | 14 antipatterns, keyset pagination |
| Vector Databases | Borwankar (2026) | — | Embeddings, vector indexes, RAG |
| Architecting an Apache Iceberg Lakehouse | Merced (2026) | — | Lakehouse architecture, Iceberg metadata |

*Last revision: 2026-06-03*