15 KiB
🗄️ Database Architecture
Database Classification
Relational (SQL)
| DB | License | Use Case | Details |
|---|---|---|---|
| PostgreSQL | Open source | Universal, geospatial, analytics, AI | POSTGRESQL.en.md |
| MySQL / MariaDB | Open source | Web, LAMP stack, e-commerce | MYSQL.en.md |
| Microsoft SQL Server | Proprietary | Enterprise .NET, Windows ecosystem | — |
| Oracle DB | Proprietary | Enterprise, finance, mainframe, RAC cluster | ORACLE.en.md |
| Amazon Aurora | Managed | MySQL/PostgreSQL compatible, cloud-native | — |
NoSQL
| Type | DB | Use Case | Details |
|---|---|---|---|
| Document | MongoDB, Couchbase | JSON data, flexible schema | MONGODB.en.md |
| Key-Value / Cache | Redis, Memcached, DynamoDB | Cache, session store, real-time | REDIS.en.md |
| Wide-column | Cassandra, ScyllaDB | Time-series, IoT, big data | CASSANDRA.en.md |
| Vector | Pinecone, Qdrant, Milvus, pgvector | Embeddings, RAG, semantic search | VECTOR-DBS.en.md |
| Graph | Neo4j, Dgraph | Relationships, recommendations, social graphs | — |
Storage Engines
Common concepts across databases: DATABASE-ENGINES.en.md
Transaction Isolation Levels
| Level | Dirty Read | Non-repeatable Read | Phantom Read | Serialization Anomaly |
|---|---|---|---|---|
| Read Uncommitted | Yes (possible) | Yes | Yes | Yes |
| Read Committed | No (prevented) | Yes | Yes | Yes |
| Repeatable Read | No | No | No (PostgreSQL: No) | Yes |
| Serializable | No | No | No | No |
Anomalies:
- Dirty Read — reading data from an uncommitted transaction (data may be rolled back)
- Non-repeatable Read — same query returns different data (another transaction updated the row in the meantime)
- Phantom Read — same query returns new rows (another transaction inserted data matching the condition)
- Serialization Anomaly — the result of transactions is not equivalent to any serial order
PostgreSQL vs MySQL Differences
- PostgreSQL: Read Uncommitted behaves like Read Committed. Repeatable Read = Snapshot Isolation (also prevents phantom reads). Serializable = SSI.
- MySQL InnoDB: Repeatable Read uses next-key locking (prevents phantom reads).
CAP Theorem
In a distributed system, only 2 out of 3 are possible: Consistency, Availability, Partition tolerance.
In practice: P is always required, we choose between CP (consistency) and AP (availability).
PACELC Extension
PACELC extends CAP with behavior under normal conditions (no partition):
- Partition → Availability vs Consistency
- Else (no partition) → Latency vs Consistency
| DB | Partition Choice | Else Choice |
|---|---|---|
| Cassandra | AP (availability) | LC (low latency, eventual consistency) |
| DynamoDB (default) | AP | LC |
| MongoDB | CP (primary) | LC |
| PostgreSQL (single) | CP | CC |
| CockroachDB | CP | CC |
Quorum Details
- R (read quorum) + W (write quorum) > N (replication factor)
- Typical: N=3, R=2, W=2 (tolerates 1 node down)
- Sloppy quorum — when a node is unavailable, data is temporarily stored on another node
- Hinted handoff — temporary write to another node with a hint, data is transferred upon recovery
Replication
| Type | Description | Latency |
|---|---|---|
| Synchronous | Write confirmed only after replication to all nodes | High, but consistent |
| Asynchronous | Write confirmed immediately, replication in the background | Low, possible data loss |
| Semi-synchronous | Confirmation from majority of nodes | Compromise |
Topologies
- Leader-Follower (Master-Slave) — reads from replicas
- Leader-Leader (Multi-master) — writes to multiple nodes
- Quorum-based — R + W > N (Cassandra, DynamoDB)
Sharding
Data distribution across nodes based on a shard key.
┌─────────┐
│ Proxy │
│ Router │
└────┬────┘
┌──────────┼──────────┐
┌────▼───┐ ┌───▼────┐ ┌───▼────┐
│Shard A │ │Shard B │ │Shard C │
│ 0-100 │ │101-200 │ │201-300 │
└────────┘ └────────┘ └────────┘
Methods
| Method | Description | Advantage | Disadvantage |
|---|---|---|---|
| Hash-based | shard_id = hash(key) % N |
Even distribution | Loss of range queries |
| Range-based | Data by range (A-M, N-Z) | Preserves ordering | Hot spots |
| Consistent hashing | Hash ring, vnodes | Min. rebalancing when number of shards changes | More complex |
Routing
- Proxy-based — application goes to proxy, which routes (Vitess, ProxySQL, mongos)
- Client-side — application knows which shard to target
- DNS-based — each shard has its own endpoint
Data Consistency Patterns
| Pattern | Description | Example |
|---|---|---|
| Strong consistency | After a write, every read sees the latest data | Single DB, Raft, Spanner |
| Eventual consistency | After a write, data propagates over time | DNS, DynamoDB (default), Cassandra |
| Read-after-write | The author always sees their own write (others are eventual) | Social networks, comments |
| Causal consistency | Causally dependent operations are seen in the correct order | COPS, Orbe, MongoDB (causal clusters) |
| Monotonic reads | You do not see older data after seeing newer data | Cassandra (MONOTONIC_READ consistency) |
| Monotonic writes | Writes from a single client are in order | Queue-based, single leader |
Data Migration
Schema Migration
V1__initial_schema.sql
V2__add_users_table.sql
V3__add_email_index.sql
V4__add_orders_table.sql
Zero-Downtime Migration
- Expand — add new column/table (application tolerates both states)
- Migrate — backfill data, update application to new schema
- Contract — remove old column/table
Tools
| Tool | Language | Strategy | Zero-Downtime | Rollback |
|---|---|---|---|---|
| Flyway | Java (multi-lang CLI) | Versioned SQL | Limited (additive only) | undo (limited, enterprise) |
| Liquibase | Java (multi-lang CLI) | Changesets (XML/YAML/JSON/SQL) | Yes (changeset design) | rollback <count> |
| Alembic | Python | Auto-generation, versioned | Yes (branching) | downgrade |
| Prisma Migrate | TypeScript | Declarative schema → diff | Yes (shadow DB) | migrate diff |
| gh-ost | Go | Triggerless online DDL (MySQL) | Yes (binlog stream) | No (progressive) |
| pgroll | Go | Online schema migration (PG) | Yes (views, multiple versions) | Yes (immediate) |
SQL Antipatterns
Based on More SQL Antipatterns (Karwin, 2026) — 14 new antipatterns:
Language Antipatterns
| Antipattern | Problem | Solution |
|---|---|---|
| Fear of JOINs | Manual pairing in application instead of JOIN | Use JOIN correctly |
| Relational Division | Finding sets in WHERE | Relational division (subquery with GROUP BY/HAVING) |
| Pagination via OFFSET | OFFSET is O(n) — the larger the offset, the slower | Keyset pagination (WHERE id > last_seen) |
| Non-Sargable queries | Functions on columns in WHERE (WHERE YEAR(date) = 2026) |
Rewrite as range condition |
Optimization Antipatterns
| Antipattern | Problem | Solution |
|---|---|---|
| Premature denormalization | Denormalization without reason | Measure, then optimize |
| JSON overuse | JSON as a universal solution | Use JSON only for genuinely flexible data |
| Cacheless transactions | Relying on query cache (removed in MySQL 8) | Application-level caching |
Application Antipatterns
| Antipattern | Problem | Solution |
|---|---|---|
| Polling | Regularly querying for changes | LISTEN/NOTIFY, Kafka, Change Data Capture |
| Transaction encapsulation | Each model manages its own transaction | Unit of Work pattern |
| Fear of deadlocks | Trying to prevent all deadlocks | Mitigation, not prevention |
| Data hoarding | Storing everything forever | Data retention policies, archiving |
Mini-Antipatterns
LIMITwithoutORDER BY— nondeterministic resultsNATURAL JOIN— fragile, implicit join conditionN+1 queries— query in a loop instead of JOIN/batch- Redundant indexes — duplicate/overlapping indexes unnecessarily slow writes
Designing Data-Intensive Applications (2nd Edition)
Kleppmann, Riccomini (2026) — substantially revised edition.
What's New Compared to 1st Edition
| Area | What's New |
|---|---|
| Cloud-native | Storage = object store (S3, Blob), not local disk. Separation of control/data/compute plane |
| AI workloads | Vector indexes, DataFrames as a data model, batch processing for training data |
| Local-first software | DuckDB, PGlite, SQLite — databases running on laptop/edge, sync when connected |
| Formal methods | Randomized testing, formal verification (important for AI-generated code) |
| Legal & ethics | GDPR, ethics of predictive analytics, bias, algorithmic accountability |
| Streaming → SQL views | Materialize, incremental view maintenance — streaming as SQL |
Key Principles (unchanged)
Reliability, Scalability, Maintainability — the three pillars of good data systems.
Apache Iceberg Lakehouse
Based on Architecting an Apache Iceberg Lakehouse (Merced, 2026):
What is a Data Lakehouse
An architecture combining the flexibility and low cost of a data lake (object storage) with the performance and governance of a data warehouse. Apache Iceberg is an open source table format.
Iceberg Metadata Architecture
Table metadata (.metadata.json)
└── Snapshot manifest list
└── Manifests (file-level stats)
└── Data files (Parquet/ORC/Avro)
Key Features
| Feature | Description |
|---|---|
| ACID transactions | Safe concurrent read/write |
| Schema evolution | Add/drop/rename columns without rewrite |
| Time travel | Query historical snapshots |
| Partition evolution | Change partition strategy without data rewrite |
| Hidden partitioning | Automatic partition filters (user does not need to specify) |
| Multi-engine | Spark, Flink, Trino, Dremio, Snowflake over the same data |
For a broader overview of the Big Data ecosystem (HDFS, Spark, Flink, Trino, Delta Lake, Hudi) see BIG-DATA.en.md.
When to Use Iceberg
- Multi-tool access to the same governed data
- ACID on lake data
- Streaming + batch in a single table
- Reducing duplication (one canonical copy instead of ETL to warehouse)
Best Practices
- Connection pooling — PgBouncer, RDS Proxy, ProxySQL
- Indexing based on query patterns — do not have unnecessary indexes
- Read replicas for reporting and analytics
- Backup & recovery — point-in-time recovery (PITR), regular tests
- Query monitoring — slow query log, pg_stat_statements, performance_schema
- Encryption at rest & in transit
- Migrations in CI/CD — part of the pipeline, not manual
- Choose DB based on workload — no single universal DB (polyglot persistence)
Database License Model Comparison
| DB | License | Price (self-hosted) | Price (managed cloud) | Vendor lock-in | Note |
|---|---|---|---|---|---|
| PostgreSQL | PostgreSQL license (MIT-like) | $0 | ~$0.10-1.00/hr (RDS, CloudSQL, Aurora) | Low | Fully open source, no restrictions |
| MySQL | GPL v2 / Commercial (Oracle) | $0 (GPL) / ~$2,000/server/year (commercial) | ~$0.10-1.00/hr (RDS, PlanetScale) | Medium (Oracle owned) | GPL = need to release application? (depends on distribution) |
| MariaDB | GPL v2 / Business Source | $0 (GPL) | ~$0.10-1.00/hr (SkySQL) | Low | Fully compatible MySQL fork, no Oracle influence |
| Oracle SE2 | Proprietary (per core) | ~$17,500/core + 22% support/year | ~$1-5/hr (RDS, OCI) | High | Core factor 0.5 (EPYC/Xeon), max 16 threads |
| Oracle EE | Proprietary (per core + options) | ~$47,500/core + options + 22% support | ~$2-30/hr (OCI, RDS) | High | Options double the price (RAC, partitioning, compression) |
| SQL Server Standard | Proprietary (per core + CAL) | ~$1,000/core + $200/CAL | ~$0.20-1.00/hr (Azure SQL) | Medium | Windows Server license required additionally |
| SQL Server Enterprise | Proprietary (per core + CAL) | ~$7,000/core + $200/CAL | ~$1-5/hr (Azure SQL) | Medium | AlwaysOn, partitioning, in-memory OLTP |
| MongoDB | SSPL (Community) / Commercial (Enterprise) | $0 (Community) / ~$10k/server/year (Enterprise) | ~$0.10-5.00/hr (Atlas) | Medium | SSPL restricts managed cloud services |
| Redis | RSALv2 + SSPL (7.4+) / BSD (Valkey) | $0 (Valkey) | ~$0.10-1.00/hr (ElastiCache, Memorystore → Valkey) | Low (Valkey) | Redis 7.4+ license change → Valkey fork |
| Cassandra | Apache 2.0 | $0 | ~$0.10-1.00/hr (Keyspaces, Amazon Managed) | Low | Fully open source, no restrictions |
| ScyllaDB | Apache 2.0 (OSS) / Enterprise | $0 (OSS) / Enterprise subscription | ~$0.50-3.00/hr (ScyllaDB Cloud) | Low (OSS) | Enterprise: monitoring, security, support |
| CockroachDB | BSL (Business Source License) / Enterprise | $0 (core) / Enterprise subscription | ~$0.50-3.00/hr (CockroachDB Cloud) | Medium | BSL: converts to MIT after 3 years. Enterprise: multi-region, backup |
Key Recommendations:
- Lowest TCO: PostgreSQL (no license, broadest cloud support)
- Highest vendor lock-in: Oracle (PL/SQL, proprietary options, expensive migration)
- License risk: Redis (license change) → use Valkey for new projects
- Cloud-native licensing: MongoDB Atlas, CockroachDB Cloud, ScyllaDB Cloud — pay-per-use, no license management
Resources
Links, books and standards: sources/databases/sources.en.md
Recommended Reading
| Book | Authors | ISBN | Key Takeaway |
|---|---|---|---|
| Database Internals | Alex Petrov | 978-1492040346 | In-depth explanation of storage engines (B-Tree, LSM-Tree, WAL, MVCC), distributed systems |
| Designing Data-Intensive Applications (2nd ed.) | Kleppmann, Riccomini | — | Cloud-native, AI, local-first, formal methods |
| High Performance MySQL (4th ed.) | Schwartz, Zaitsev, Tkachenko | 978-1492075292 | MySQL architecture, schema/index optimization |
| Expert Oracle Architecture (3rd ed.) | Kyte, Kuhn | 978-1484249602 | Oracle architecture, RAC, Data Guard, tuning |
| AI-Ready PostgreSQL 18 | Kumar, Linster | — | PostgreSQL as a unified platform for AI |
| More SQL Antipatterns | Bill Karwin (2026) | — | 14 antipatterns, keyset pagination |
| Vector Databases | Borwankar (2026) | — | Embeddings, vector indexes, RAG |
| Architecting an Apache Iceberg Lakehouse | Merced (2026) | — | Lakehouse architecture, Iceberg metadata |
Last revision: 2026-06-03