# 🗄️ Database Architecture

## Database Classification

### Relational (SQL)

| DB | License | Use Case | Details |
|----|---------|----------|--------|
| **PostgreSQL** | Open source | Universal, geospatial, analytics, AI | [POSTGRESQL.en.md](POSTGRESQL.en.md) |
| **MySQL / MariaDB** | Open source | Web, LAMP stack, e-commerce | [MYSQL.en.md](MYSQL.en.md) |
| **Microsoft SQL Server** | Proprietary | Enterprise .NET, Windows ecosystem | - |
| **Oracle DB** | Proprietary | Enterprise, finance, mainframe, RAC cluster | [ORACLE.en.md](ORACLE.en.md) |
| **Amazon Aurora** | Managed | MySQL/PostgreSQL compatible, cloud-native | - |

### NoSQL

| Type | DB | Use Case | Details |
|-----|----|----------|--------|
| **Document** | MongoDB, Couchbase | JSON data, flexible schema | [MONGODB.en.md](MONGODB.en.md) |
| **Key-Value / Cache** | Redis, Memcached, DynamoDB | Cache, session store, real-time | [REDIS.en.md](REDIS.en.md) |
| **Wide-column** | Cassandra, ScyllaDB | Time-series, IoT, big data | [CASSANDRA.en.md](CASSANDRA.en.md) |
| **Vector** | Pinecone, Qdrant, Milvus, pgvector | Embeddings, RAG, semantic search | [VECTOR-DBS.en.md](VECTOR-DBS.en.md) |
| **Graph** | Neo4j, Dgraph | Relationships, recommendations, social graphs | - |

### Storage Engines

Common concepts across databases: [DATABASE-ENGINES.en.md](DATABASE-ENGINES.en.md)

---

## Transaction Isolation Levels

The table follows the SQL standard: "Yes" means the anomaly is possible at that level, "No" means it is prevented.

| Level | Dirty Read | Non-repeatable Read | Phantom Read | Serialization Anomaly |
|--------|-----------|---------------------|-------------|----------------------|
| **Read Uncommitted** | Yes (possible) | Yes | Yes | Yes |
| **Read Committed** | No (prevented) | Yes | Yes | Yes |
| **Repeatable Read** | No | No | Yes (PostgreSQL: No) | Yes |
| **Serializable** | No | No | No | No |

**Anomalies**:

- **Dirty Read**: reading data written by an uncommitted transaction (the data may be rolled back)
- **Non-repeatable Read**: the same query returns different data (another transaction updated the row in the
meantime)
- **Phantom Read**: the same query returns new rows (another transaction inserted data matching the condition)
- **Serialization Anomaly**: the result of the transactions is not equivalent to any serial order

### PostgreSQL vs MySQL Differences

- **PostgreSQL**: Read Uncommitted behaves like Read Committed. Repeatable Read = Snapshot Isolation (also prevents phantom reads). Serializable = SSI (Serializable Snapshot Isolation).
- **MySQL InnoDB**: Repeatable Read uses next-key locking (prevents phantom reads).

---

## CAP Theorem

In a distributed system, only 2 out of 3 are possible: **C**onsistency, **A**vailability, **P**artition tolerance.

In practice, P is always required, so the choice is between CP (consistency) and AP (availability).

### PACELC Extension

PACELC extends CAP with behavior under normal conditions (no partition):

- **P**artition → **A**vailability vs **C**onsistency
- **E**lse (no partition) → **L**atency vs **C**onsistency

| DB | Partition Choice | Else Choice |
|----|----------------|------------|
| Cassandra | AP (availability) | EL (low latency, eventual consistency) |
| DynamoDB (default) | AP | EL |
| MongoDB | CP (primary) | EL |
| PostgreSQL (single) | CP | EC |
| CockroachDB | CP | EC |

### Quorum Details

- **R** (read quorum) + **W** (write quorum) > **N** (replication factor), so every read quorum overlaps every write quorum in at least one node
- Typical: N=3, R=2, W=2 (tolerates 1 node down)
- **Sloppy quorum**: when a node is unavailable, data is temporarily stored on another node
- **Hinted handoff**: temporary write to another node with a hint; the data is transferred once the original node recovers

---

## Replication

| Type | Description | Latency |
|-----|-------|---------|
| Synchronous | Write confirmed only after replication to all nodes | High, but consistent |
| Asynchronous | Write confirmed immediately, replication in the background | Low, possible data loss |
| Semi-synchronous | Write confirmed once at least one replica (or a configured subset) acknowledges | Compromise |

### Topologies

- **Leader-Follower** (Master-Slave): writes go to the leader, reads can be served from replicas
- **Leader-Leader**
(Multi-master): writes to multiple nodes
- **Quorum-based**: R + W > N (Cassandra, DynamoDB)

---

## Sharding

Data is distributed across nodes based on a shard key.

```
           ┌─────────┐
           │  Proxy  │
           │  Router │
           └────┬────┘
     ┌──────────┼──────────┐
┌────▼───┐  ┌───▼────┐ ┌───▼────┐
│Shard A │  │Shard B │ │Shard C │
│ 0-100  │  │101-200 │ │201-300 │
└────────┘  └────────┘ └────────┘
```

### Methods

| Method | Description | Advantage | Disadvantage |
|--------|-------|--------|----------|
| **Hash-based** | `shard_id = hash(key) % N` | Even distribution | Loss of range queries |
| **Range-based** | Data by range (A-M, N-Z) | Preserves ordering | Hot spots |
| **Consistent hashing** | Hash ring, vnodes | Minimal rebalancing when the number of shards changes | More complex |

### Routing

- **Proxy-based**: the application talks to a proxy, which routes the request (Vitess, ProxySQL, mongos)
- **Client-side**: the application knows which shard to target
- **DNS-based**: each shard has its own endpoint

---

## Data Consistency Patterns

| Pattern | Description | Example |
|---------|-------|---------|
| **Strong consistency** | After a write, every read sees the latest data | Single DB, Raft, Spanner |
| **Eventual consistency** | After a write, data propagates over time | DNS, DynamoDB (default), Cassandra |
| **Read-after-write** | The author always sees their own write (others are eventual) | Social networks, comments |
| **Causal consistency** | Causally dependent operations are seen in the correct order | COPS, Orbe, MongoDB (causally consistent sessions) |
| **Monotonic reads** | You do not see older data after seeing newer data | Reads pinned to a single replica (session affinity) |
| **Monotonic writes** | Writes from a single client are applied in order | Queue-based, single leader |

---

## Data Migration

### Schema Migration

```
V1__initial_schema.sql
V2__add_users_table.sql
V3__add_email_index.sql
V4__add_orders_table.sql
```

### Zero-Downtime Migration

1. **Expand**: add the new column/table (the application tolerates both states)
2. **Migrate**: backfill data, update the application to the new schema
3. **Contract**: remove the old column/table

### Tools

| Tool | Language | Strategy | Zero-Downtime | Rollback |
|---------|-------|-----------|--------------|----------|
| **Flyway** | Java (multi-lang CLI) | Versioned SQL | Limited (additive only) | `undo` (limited, enterprise) |
| **Liquibase** | Java (multi-lang CLI) | Changesets (XML/YAML/JSON/SQL) | Yes (changeset design) | `rollback` |
| **Alembic** | Python | Auto-generation, versioned | Yes (branching) | `downgrade` |
| **Prisma Migrate** | TypeScript | Declarative schema → diff | Yes (shadow DB) | `migrate diff` |
| **gh-ost** | Go | Triggerless online DDL (MySQL) | Yes (binlog stream) | No (progressive) |
| **pgroll** | Go | Online schema migration (PG) | Yes (views, multiple versions) | Yes (immediate) |

---

## SQL Antipatterns

Based on *More SQL Antipatterns* (Karwin, 2026), which covers 14 new antipatterns:

### Language Antipatterns

| Antipattern | Problem | Solution |
|-------------|---------|--------|
| **Fear of JOINs** | Manual pairing in the application instead of JOIN | Use JOIN correctly |
| **Relational Division** | Finding sets in WHERE | Relational division (subquery with GROUP BY/HAVING) |
| **Pagination via OFFSET** | OFFSET is O(n): the larger the offset, the slower the query | Keyset pagination (WHERE id > last_seen) |
| **Non-Sargable queries** | Functions on columns in WHERE (`WHERE YEAR(date) = 2026`) | Rewrite as a range condition |

### Optimization Antipatterns

| Antipattern | Problem | Solution |
|-------------|---------|--------|
| **Premature denormalization** | Denormalization without a measured reason | Measure, then optimize |
| **JSON overuse** | JSON as a universal solution | Use JSON only for genuinely flexible data |
| **Cacheless transactions** | Relying on query cache (removed in MySQL 8) | Application-level caching |

### Application Antipatterns

| Antipattern | Problem | Solution |
|-------------|---------|--------|
| **Polling** | Regularly querying for changes | LISTEN/NOTIFY, Kafka, Change Data Capture |
| **Transaction encapsulation** | Each model manages its own transaction | Unit of Work pattern |
| **Fear of deadlocks** | Trying to prevent all deadlocks | Mitigation, not prevention |
| **Data hoarding** | Storing everything forever | Data retention policies, archiving |

### Mini-Antipatterns

- `LIMIT` without `ORDER BY`: nondeterministic results
- `NATURAL JOIN`: fragile, implicit join condition
- `N+1 queries`: query in a loop instead of JOIN/batch
- Redundant indexes: duplicate/overlapping indexes slow down writes unnecessarily

---

## Designing Data-Intensive Applications (2nd Edition)

*Kleppmann, Riccomini (2026)*: substantially revised edition.

### What's New Compared to 1st Edition

| Area | What's New |
|--------|-----------|
| **Cloud-native** | Storage = object store (S3, Blob), not local disk. Separation of control/data/compute plane |
| **AI workloads** | Vector indexes, DataFrames as a data model, batch processing for training data |
| **Local-first software** | DuckDB, PGlite, SQLite: databases running on laptop/edge, sync when connected |
| **Formal methods** | Randomized testing, formal verification (important for AI-generated code) |
| **Legal & ethics** | GDPR, ethics of predictive analytics, bias, algorithmic accountability |
| **Streaming → SQL views** | Materialize, incremental view maintenance: streaming as SQL |

### Key Principles (unchanged)

**Reliability**, **Scalability**, **Maintainability**: the three pillars of good data systems.
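
The incremental view maintenance idea from the "What's New" table can be illustrated with a toy sketch: rather than recomputing an aggregate over all rows, each change event is folded into the stored result as a delta. The class `OrderTotalsView`, the event tuples, and the `recompute` helper are hypothetical names chosen for illustration; this is not how Materialize or any specific engine implements it.

```python
class OrderTotalsView:
    """Toy materialized view: order count and total amount per customer."""

    def __init__(self):
        self.rows = {}  # customer -> (order_count, total_cents)

    def apply(self, op, customer, amount_cents):
        """Fold one change event into the view without rescanning base data."""
        if op not in ("insert", "delete"):
            raise ValueError(f"unsupported operation: {op}")
        sign = 1 if op == "insert" else -1
        count, total = self.rows.get(customer, (0, 0))
        count += sign
        total += sign * amount_cents
        if count == 0:
            self.rows.pop(customer, None)  # no orders left, drop the group
        else:
            self.rows[customer] = (count, total)


def recompute(orders):
    """Full recomputation, equivalent to GROUP BY customer over all orders."""
    rows = {}
    for customer, amount_cents in orders:
        count, total = rows.get(customer, (0, 0))
        rows[customer] = (count + 1, total + amount_cents)
    return rows


# Hypothetical change stream: three inserts and one delete.
events = [
    ("insert", "alice", 1200),
    ("insert", "bob", 500),
    ("insert", "alice", 300),
    ("delete", "bob", 500),
]
view = OrderTotalsView()
for op, customer, amount_cents in events:
    view.apply(op, customer, amount_cents)
```

Each event costs O(1) here, while `recompute` scans every remaining order. Real systems also have to handle joins, out-of-order events, and durability, which this sketch ignores.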

---

## Apache Iceberg Lakehouse

Based on *Architecting an Apache Iceberg Lakehouse* (Merced, 2026):

### What is a Data Lakehouse

An architecture combining the flexibility and low cost of a **data lake** (object storage) with the performance and governance of a **data warehouse**. Apache Iceberg is an open source table format.

### Iceberg Metadata Architecture

```
Table metadata (.metadata.json)
└── Snapshot manifest list
    └── Manifests (file-level stats)
        └── Data files (Parquet/ORC/Avro)
```

### Key Features

| Feature | Description |
|-----------|-------|
| **ACID transactions** | Safe concurrent read/write |
| **Schema evolution** | Add/drop/rename columns without rewrite |
| **Time travel** | Query historical snapshots |
| **Partition evolution** | Change partition strategy without data rewrite |
| **Hidden partitioning** | Automatic partition filters (user does not need to specify) |
| **Multi-engine** | Spark, Flink, Trino, Dremio, Snowflake over the same data |

For a broader overview of the Big Data ecosystem (HDFS, Spark, Flink, Trino, Delta Lake, Hudi) see [BIG-DATA.en.md](BIG-DATA.en.md).
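
To make the snapshot and time-travel features more concrete, here is a minimal toy model, assuming a simplified metadata layout in which each snapshot directly lists its data files (real Iceberg goes through manifest lists and manifests, as in the tree above). `Snapshot`, `TableMetadata`, `commit`, and `snapshot_as_of` are hypothetical names for illustration, not the Iceberg API.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    timestamp_ms: int
    data_files: Tuple[str, ...]  # flattened stand-in for manifest list -> manifests


@dataclass
class TableMetadata:
    snapshots: List[Snapshot] = field(default_factory=list)

    def commit(self, timestamp_ms, data_files):
        """Append a new immutable snapshot; older snapshots stay readable."""
        snapshot = Snapshot(len(self.snapshots) + 1, timestamp_ms, tuple(data_files))
        self.snapshots.append(snapshot)
        return snapshot

    def snapshot_as_of(self, timestamp_ms) -> Optional[Snapshot]:
        """Time travel: latest snapshot committed at or before the timestamp."""
        candidates = [s for s in self.snapshots if s.timestamp_ms <= timestamp_ms]
        return max(candidates, key=lambda s: s.timestamp_ms) if candidates else None


# Hypothetical table history with two commits.
table = TableMetadata()
table.commit(1_000, ["a.parquet"])
table.commit(2_000, ["a.parquet", "b.parquet"])

# A reader pinned to t=1500 sees only the files of the first snapshot.
old_view = table.snapshot_as_of(1_500)
```

Because snapshots are never modified in place, a reader that resolved `old_view` keeps a consistent file list even while new commits arrive.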

### When to Use Iceberg

- Multi-tool access to the same governed data
- ACID on lake data
- Streaming + batch in a single table
- Reducing duplication (one canonical copy instead of ETL to a warehouse)

---

## Best Practices

- **Connection pooling**: PgBouncer, RDS Proxy, ProxySQL
- **Indexing based on query patterns**: avoid unnecessary indexes
- **Read replicas** for reporting and analytics
- **Backup & recovery**: point-in-time recovery (PITR), regular restore tests
- **Query monitoring**: slow query log, pg_stat_statements, performance_schema
- **Encryption at rest & in transit**
- **Migrations in CI/CD**: part of the pipeline, not manual
- **Choose DB based on workload**: no single universal DB (polyglot persistence)

---

## Database License Model Comparison

| DB | License | Price (self-hosted) | Price (managed cloud) | Vendor lock-in | Note |
|----|---------|-------------------|---------------------|----------------|----------|
| **PostgreSQL** | PostgreSQL license (MIT-like) | $0 | ~$0.10-1.00/hr (RDS, CloudSQL, Aurora) | Low | Fully open source, no restrictions |
| **MySQL** | GPL v2 / Commercial (Oracle) | $0 (GPL) / ~$2,000/server/year (commercial) | ~$0.10-1.00/hr (RDS, PlanetScale) | Medium (Oracle owned) | GPL may require releasing the application source, depending on how it is distributed |
| **MariaDB** | GPL v2 / Business Source | $0 (GPL) | ~$0.10-1.00/hr (SkySQL) | Low | Fully compatible MySQL fork, no Oracle influence |
| **Oracle SE2** | Proprietary (per core) | ~$17,500/core + 22% support/year | ~$1-5/hr (RDS, OCI) | High | Core factor 0.5 (EPYC/Xeon), max 16 threads |
| **Oracle EE** | Proprietary (per core + options) | ~$47,500/core + options + 22% support | ~$2-30/hr (OCI, RDS) | High | Options double the price (RAC, partitioning, compression) |
| **SQL Server Standard** | Proprietary (per core + CAL) | ~$1,000/core + $200/CAL | ~$0.20-1.00/hr (Azure SQL) | Medium | Windows Server license required additionally |
| **SQL Server Enterprise** | Proprietary (per core + CAL) | ~$7,000/core + $200/CAL | ~$1-5/hr (Azure SQL) | Medium | AlwaysOn, partitioning, in-memory OLTP |
| **MongoDB** | SSPL (Community) / Commercial (Enterprise) | $0 (Community) / ~$10k/server/year (Enterprise) | ~$0.10-5.00/hr (Atlas) | Medium | SSPL restricts managed cloud services |
| **Redis** | RSALv2 + SSPL (7.4+) / BSD (Valkey) | $0 (Valkey) | ~$0.10-1.00/hr (ElastiCache, Memorystore → Valkey) | Low (Valkey) | Redis 7.4+ license change → Valkey fork |
| **Cassandra** | Apache 2.0 | $0 | ~$0.10-1.00/hr (Keyspaces, Amazon Managed) | Low | Fully open source, no restrictions |
| **ScyllaDB** | Apache 2.0 (OSS) / Enterprise | $0 (OSS) / Enterprise subscription | ~$0.50-3.00/hr (ScyllaDB Cloud) | Low (OSS) | Enterprise: monitoring, security, support |
| **CockroachDB** | BSL (Business Source License) / Enterprise | $0 (core) / Enterprise subscription | ~$0.50-3.00/hr (CockroachDB Cloud) | Medium | BSL: converts to Apache 2.0 after 3 years. Enterprise: multi-region, backup |

**Key Recommendations**:

- **Lowest TCO**: PostgreSQL (no license, broadest cloud support)
- **Highest vendor lock-in**: Oracle (PL/SQL, proprietary options, expensive migration)
- **License risk**: Redis (license change) → use Valkey for new projects
- **Cloud-native licensing**: MongoDB Atlas, CockroachDB Cloud, ScyllaDB Cloud (pay-per-use, no license management)

## Resources

Links, books and standards: [sources/databases/sources.en.md](sources/databases/sources.en.md)

### Recommended Reading

| Book | Authors | ISBN | Key Takeaway |
|-------|--------|------|----------------|
| Database Internals | Alex Petrov | 978-1492040347 | In-depth explanation of storage engines (B-Tree, LSM-Tree, WAL, MVCC), distributed systems |
| Designing Data-Intensive Applications (2nd ed.) | Kleppmann, Riccomini | - | Cloud-native, AI, local-first, formal methods |
| High Performance MySQL (4th ed.) | Botros, Tinley | 978-1492080510 | MySQL architecture, schema/index optimization |
| Expert Oracle Architecture (3rd ed.) | Kyte, Kuhn | 978-1484249602 | Oracle architecture, RAC, Data Guard, tuning |
| AI-Ready PostgreSQL 18 | Kumar, Linster | - | PostgreSQL as a unified platform for AI |
| More SQL Antipatterns | Bill Karwin (2026) | - | 14 antipatterns, keyset pagination |
| Vector Databases | Borwankar (2026) | - | Embeddings, vector indexes, RAG |
| Architecting an Apache Iceberg Lakehouse | Merced (2026) | - | Lakehouse architecture, Iceberg metadata |

*Last revision: 2026-06-03*