# 🗄️ Database architecture

## Database classification

### Relational (SQL)

| DB | License | Use case | Detail |
|----|---------|----------|--------|
| **PostgreSQL** | Open source | General purpose, geospatial, analytics, AI | [POSTGRESQL.md](POSTGRESQL.md) |
| **MySQL / MariaDB** | Open source | Web, LAMP stack, e-commerce | [MYSQL.md](MYSQL.md) |
| **Microsoft SQL Server** | Proprietary | Enterprise .NET, Windows ecosystem | - |
| **Oracle DB** | Proprietary | Enterprise, finance, mainframe, RAC cluster | [ORACLE.md](ORACLE.md) |
| **Amazon Aurora** | Managed | MySQL/PostgreSQL compatible, cloud-native | - |

### NoSQL

| Type | DB | Use case | Detail |
|------|----|----------|--------|
| **Document** | MongoDB, Couchbase | JSON data, flexible schema | [MONGODB.md](MONGODB.md) |
| **Key-Value / Cache** | Redis, Memcached, DynamoDB | Cache, session store, real-time | [REDIS.md](REDIS.md) |
| **Wide-column** | Cassandra, ScyllaDB | Time series, IoT, large data volumes | [CASSANDRA.md](CASSANDRA.md) |
| **Vector** | Pinecone, Qdrant, Milvus, pgvector | Embeddings, RAG, semantic search | [VEKTOROVE-DB.md](VEKTOROVE-DB.md) |
| **Graph** | Neo4j, Dgraph | Relationships, recommendations, social graphs | - |

### Storage engines

Concepts shared across databases: [DATABAZOVE-ENGINY.md](DATABAZOVE-ENGINY.md)

---

## Transaction isolation levels

| Level | Dirty Read | Non-repeatable Read | Phantom Read | Serialization Anomaly |
|-------|------------|---------------------|--------------|-----------------------|
| **Read Uncommitted** | Yes (possible) | Yes | Yes | Yes |
| **Read Committed** | No (prevented) | Yes | Yes | Yes |
| **Repeatable Read** | No | No | Yes per the SQL standard (PostgreSQL: No) | Yes |
| **Serializable** | No | No | No | No |

**Anomalies**:

- **Dirty Read**: reading data written by an uncommitted transaction (the data may still be rolled back)
- **Non-repeatable Read**: the same query returns different data (another transaction updated the row in the meantime)
- **Phantom Read**: the same query returns new rows (another transaction inserted data matching the condition)
- **Serialization Anomaly**: the outcome of the transactions is not equivalent to any serial ordering

### PostgreSQL vs MySQL differences

- **PostgreSQL**: Read Uncommitted behaves like Read Committed. Repeatable Read = Snapshot Isolation (which also prevents phantom reads). Serializable = SSI (Serializable Snapshot Isolation).
- **MySQL InnoDB**: Repeatable Read uses next-key locking (which prevents phantom reads).

---

## CAP theorem

In a distributed system you can have only 2 of the 3: **C**onsistency, **A**vailability, **P**artition tolerance. In practice network partitions cannot be ruled out, so P is always required and the real choice is between CP (consistency) and AP (availability).

### PACELC extension

PACELC extends CAP with the behavior under normal conditions (no partition):

- **P**artition → **A**vailability vs **C**onsistency
- **E**lse (no partition) → **L**atency vs **C**onsistency

| DB | Choice under partition | Choice otherwise |
|----|------------------------|------------------|
| Cassandra | PA (availability) | EL (low latency, eventual consistency) |
| DynamoDB (default) | PA | EL |
| MongoDB | PC (primary-based) | EL |
| PostgreSQL (single node) | PC | EC |
| CockroachDB | PC | EC |

### Quorum detail

- **R** (read quorum) + **W** (write quorum) > **N** (replication factor). This guarantees that every read quorum overlaps every write quorum in at least one node.
- Typical: N=3, R=2, W=2 (tolerates 1 node down)
- **Sloppy quorum**: when a node is unavailable, the data is temporarily stored on another node
- **Hinted handoff**: a temporary write to another node together with a hint; once the original node recovers, the data is transferred to it

---

## Replication

| Type | Description | Latency |
|------|-------------|---------|
| Synchronous | The write is acknowledged only after it has been replicated to all nodes | High, but consistent |
| Asynchronous | The write is acknowledged immediately, replication happens in the background | Low, data loss possible |
| Semi-synchronous | The write is acknowledged by a subset of the nodes (at least one replica, or a majority) | Compromise |

### Topologies

- **Leader-Follower** (Master-Slave): writes go to the leader, reads can be served from replicas
- **Leader-Leader** (Multi-master): writes are accepted on multiple nodes
- **Quorum-based**: R + W > N (Cassandra, DynamoDB)

---

## Sharding

Data is distributed across nodes according to the shard key.

```
          ┌─────────┐
          │  Proxy  │
          │ Router  │
          └────┬────┘
     ┌─────────┼──────────┐
┌────▼───┐ ┌───▼────┐ ┌───▼────┐
│Shard A │ │Shard B │ │Shard C │
│ 0-100  │ │101-200 │ │201-300 │
└────────┘ └────────┘ └────────┘
```

### Methods

| Method | Description | Advantage | Disadvantage |
|--------|-------------|-----------|--------------|
| **Hash-based** | `shard_id = hash(key) % N` | Even distribution | Loses range queries; changing N remaps most keys |
| **Range-based** | Data split by range (A-M, N-Z) | Preserves ordering | Hot spots |
| **Consistent hashing** | Hash ring, vnodes | Minimal data movement when the number of shards changes | More complex |

### Routing

- **Proxy-based**: the application talks to a proxy, which routes the query (Vitess, ProxySQL, mongos)
- **Client-side**: the application knows which shard to go to
- **DNS-based**: each shard has its own endpoint

---

## Data consistency patterns

| Pattern | Description | Example |
|---------|-------------|---------|
| **Strong consistency** | After a write, every read sees the latest data | Single DB, Raft, Spanner |
| **Eventual consistency** | After a write, the data propagates over time | DNS, DynamoDB (default), Cassandra |
| **Read-after-write** | Authors always see their own writes (everyone else: eventual) | Social networks, comments |
| **Causal consistency** | Causally dependent operations are seen in the correct order | COPS, Orbe, MongoDB (causally consistent sessions) |
| **Monotonic reads** | Once you have seen newer data, you never see older data | Sticky sessions (a user always reads from the same replica) |
| **Monotonic writes** | Writes from one client are applied in order | Queue-based, single leader |

---

## Data migration

### Schema migrations

```
V1__initial_schema.sql
V2__add_users_table.sql
V3__add_email_index.sql
V4__add_orders_table.sql
```

### Zero-downtime migrations

1. **Expand**: add the new column/table (the application tolerates both states)
2. **Migrate**: backfill the data, update the application to the new schema
3. **Contract**: remove the old column/table

### Tools

| Tool | Language | Strategy | Zero-downtime | Rollback |
|------|----------|----------|---------------|----------|
| **Flyway** | Java (multi-lang CLI) | Versioned SQL | Limited (additive changes only) | `undo` (limited, enterprise) |
| **Liquibase** | Java (multi-lang CLI) | Changesets (XML/YAML/JSON/SQL) | Yes (changeset design) | `rollback` |
| **Alembic** | Python | Auto-generation, versioned | Yes (branching) | `downgrade` |
| **Prisma Migrate** | TypeScript | Declarative schema → diff | Yes (shadow DB) | `migrate diff` |
| **gh-ost** | Go | Triggerless online DDL (MySQL) | Yes (binlog stream) | No (progressive) |
| **pgroll** | Go | Online schema migrations (PG) | Yes (views, multiple versions) | Yes (instant) |

---

## SQL Antipatterns

Based on *More SQL Antipatterns* (Karwin, 2026), which covers 14 new antipatterns:

### Language antipatterns

| Antipattern | Problem | Solution |
|-------------|---------|----------|
| **Fear of JOINs** | Matching rows manually in the application instead of using JOIN | Use JOINs properly |
| **Relational Division** | Matching a whole set of values with WHERE conditions | Relational division (subquery with GROUP BY/HAVING) |
| **Pagination via OFFSET** | OFFSET is O(n): the larger the offset, the slower the query | Keyset pagination (`WHERE id > last_seen`) |
| **Non-Sargable queries** | A function applied to a column in WHERE (`WHERE YEAR(date) = 2026`) | Rewrite as a range condition (`date >= '2026-01-01' AND date < '2027-01-01'`) |

### Optimization antipatterns

| Antipattern | Problem | Solution |
|-------------|---------|----------|
| **Premature denormalization** | Denormalizing without a reason | Measure first, then optimize |
| **JSON overuse** | JSON as a universal solution | Use JSON only for genuinely flexible data |
| **Cacheless transactions** | Relying on the query cache (removed in MySQL 8) | Application-level caching |

### Application antipatterns

| Antipattern | Problem | Solution |
|-------------|---------|----------|
| **Polling** | Querying periodically for changes | LISTEN/NOTIFY, Kafka, Change Data Capture |
| **Transaction encapsulation** | Every model manages its own transaction | Unit of Work pattern |
| **Fear of deadlocks** | Trying to prevent every deadlock | Mitigation, not prevention |
| **Data hoarding** | Storing everything forever | Data retention policies, archiving |

### Mini-antipatterns

- `LIMIT` without `ORDER BY`: non-deterministic results
- `NATURAL JOIN`: fragile, implicit join condition
- `N+1 queries`: a query inside a loop instead of a JOIN or a batch
- Redundant indexes: duplicate or overlapping indexes slow down writes for no benefit

---

## Designing Data-Intensive Applications (2nd edition)

*Kleppmann, Riccomini (2026)*: a substantially revised edition.

### What is new compared to the 1st edition

| Area | What is new |
|------|-------------|
| **Cloud-native** | Storage is an object store (S3, Blob) rather than local disk. Separation of control, data, and compute planes |
| **AI workloads** | Vector indexes, DataFrames as a data model, batch processing for training data |
| **Local-first software** | DuckDB, PGlite, SQLite: databases running on a laptop or at the edge, syncing when connected |
| **Formal methods** | Randomized testing, formal verification (important for AI-generated code) |
| **Legal & ethics** | GDPR, ethics of predictive analytics, bias, algorithmic accountability |
| **Streaming → SQL views** | Materialize, incremental view maintenance: streaming expressed as SQL |

### Key principles (unchanged)

Reliability, scalability, and maintainability: the three pillars of good data systems.

---

## Apache Iceberg Lakehouse

Based on *Architecting an Apache Iceberg Lakehouse* (Merced, 2026):

### What is a data lakehouse

An architecture that combines the flexibility and low cost of a **data lake** (object storage) with the performance and governance of a **data warehouse**. Apache Iceberg is an open source table format.
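
To make the term "table format" more concrete, here is a toy Python model. It is not the Iceberg API: `ToyTable`, `Snapshot`, `commit_append`, `scan`, and the `s3://bucket/...` paths are hypothetical names chosen for illustration. The sketch assumes only the general idea that a table is a sequence of immutable snapshots, each listing the data files that make up one version of the table.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Snapshot:
    """One immutable table version: an id plus the data files it references."""
    snapshot_id: int
    data_files: tuple[str, ...]


@dataclass
class ToyTable:
    """Hypothetical stand-in for a table format; NOT the Iceberg API."""
    snapshots: list[Snapshot] = field(default_factory=list)

    def commit_append(self, new_files: list[str]) -> Snapshot:
        # A commit never rewrites existing files; it publishes a new snapshot
        # that lists the previous files plus the newly added ones.
        previous = self.snapshots[-1].data_files if self.snapshots else ()
        snapshot = Snapshot(len(self.snapshots) + 1, previous + tuple(new_files))
        self.snapshots.append(snapshot)
        return snapshot

    def scan(self, snapshot_id: int | None = None) -> tuple[str, ...]:
        # A reader picks one snapshot (the latest by default) and sees
        # exactly the files listed in that snapshot.
        if not self.snapshots:
            return ()
        if snapshot_id is None:
            return self.snapshots[-1].data_files
        for snapshot in self.snapshots:
            if snapshot.snapshot_id == snapshot_id:
                return snapshot.data_files
        raise KeyError(f"unknown snapshot {snapshot_id}")


table = ToyTable()
first = table.commit_append(["s3://bucket/t/a.parquet"])
table.commit_append(["s3://bucket/t/b.parquet"])
print(table.scan())                   # latest snapshot: both files
print(table.scan(first.snapshot_id))  # older snapshot: only the first file
```

Because a commit only adds a snapshot and never modifies earlier files, a reader pinned to an older snapshot keeps a consistent view while writers continue. The real format layers manifests, file-level statistics, and an atomic metadata swap on top of this idea.
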
### Iceberg metadata architecture

```
Table metadata (.metadata.json)
└── Snapshot manifest list
    └── Manifests (file-level stats)
        └── Data files (Parquet/ORC/Avro)
```

### Key features

| Feature | Description |
|---------|-------------|
| **ACID transactions** | Safe concurrent reads and writes |
| **Schema evolution** | Add, drop, or rename columns without rewriting data |
| **Time travel** | Query historical snapshots |
| **Partition evolution** | Change the partitioning strategy without rewriting data |
| **Hidden partitioning** | Automatic partition filters (users do not have to specify them) |
| **Multi-engine** | Spark, Flink, Trino, Dremio, Snowflake on top of the same data |

### When to use Iceberg

- Multi-tool access to the same governed data
- ACID on data lake data
- Streaming and batch in a single table
- Less duplication (one canonical copy instead of ETL into a warehouse)

---

## Best practices

- **Connection pooling**: PgBouncer, RDS Proxy, ProxySQL
- **Index according to query patterns**: avoid unnecessary indexes
- **Read replicas** for reporting and analytics
- **Backup & recovery**: point-in-time recovery (PITR), regular restore tests
- **Query monitoring**: slow query log, pg_stat_statements, performance_schema
- **Encryption at rest & in transit**
- **Migrations in CI/CD**: part of the pipeline, not a manual step
- **Choose the DB by workload**: there is no single universal DB (polyglot persistence)

---

## Comparison of database licensing models

| DB | License | Price (self-hosted) | Price (managed cloud) | Vendor lock-in | Note |
|----|---------|---------------------|-----------------------|----------------|------|
| **PostgreSQL** | PostgreSQL license (MIT-like) | $0 | ~$0.10-1.00/hour (RDS, CloudSQL, Aurora) | Low | Fully open source, no restrictions |
| **MySQL** | GPL v2 / Commercial (Oracle) | $0 (GPL) / ~$2,000/server/year (commercial) | ~$0.10-1.00/hour (RDS, PlanetScale) | Medium (owned by Oracle) | GPL: must the application be released? It depends on how it is distributed |
| **MariaDB** | GPL v2 / Business Source | $0 (GPL) | ~$0.10-1.00/hour (SkySQL) | Low | Largely compatible MySQL fork, independent of Oracle |
| **Oracle SE2** | Proprietary (per socket) | ~$17,500/socket + 22% support/year | ~$1-5/hour (RDS, OCI) | High | Max 2 sockets and 16 CPU threads per server |
| **Oracle EE** | Proprietary (per core + options) | ~$47,500/processor license + options + 22% support | ~$2-30/hour (OCI, RDS) | High | Core factor 0.5 on x86 (EPYC/Xeon); options (RAC, partitioning, compression) can double the price |
| **SQL Server Standard** | Proprietary (per core + CAL) | ~$1,000/core + $200/CAL | ~$0.20-1.00/hour (Azure SQL) | Medium | A Windows Server license is needed on top when running on Windows |
| **SQL Server Enterprise** | Proprietary (per core + CAL) | ~$7,000/core + $200/CAL | ~$1-5/hour (Azure SQL) | Medium | AlwaysOn, partitioning, in-memory OLTP |
| **MongoDB** | SSPL (Community) / Commercial (Enterprise) | $0 (Community) / ~$10k/server/year (Enterprise) | ~$0.10-5.00/hour (Atlas) | Medium | SSPL restricts managed cloud offerings |
| **Redis** | RSALv2 + SSPL (7.4+) / BSD (Valkey) | $0 (Valkey) | ~$0.10-1.00/hour (ElastiCache, Memorystore → Valkey) | Low (Valkey) | License change in Redis 7.4+ → Valkey fork |
| **Cassandra** | Apache 2.0 | $0 | ~$0.10-1.00/hour (Amazon Keyspaces) | Low | Fully open source, no restrictions |
| **ScyllaDB** | AGPL v3 (OSS) / Enterprise | $0 (OSS) / Enterprise subscription | ~$0.50-3.00/hour (ScyllaDB Cloud) | Low (OSS) | Enterprise: monitoring, security, support |
| **CockroachDB** | BSL (Business Source License) / Enterprise | $0 (core) / Enterprise subscription | ~$0.50-3.00/hour (CockroachDB Cloud) | Medium | BSL: converts to Apache 2.0 after 3 years. Enterprise: multi-region, backup |

**Key recommendations**:

- **Lowest TCO**: PostgreSQL (no license fees, broadest cloud support)
- **Highest vendor lock-in**: Oracle (PL/SQL, proprietary options, expensive migration)
- **License risk**: Redis (license change) → use Valkey for new projects
- **Cloud-native licensing**: MongoDB Atlas, CockroachDB Cloud, ScyllaDB Cloud: pay-per-use, no license management

## Sources

Links, books, and standards: [sources/databases/sources.md](sources/databases/sources.md)

### Recommended reading

| Book | Authors | ISBN | Key contribution |
|------|---------|------|------------------|
| Database Internals | Alex Petrov | 978-1492040346 | In-depth coverage of storage engines (B-Tree, LSM-Tree, WAL, MVCC) and distributed systems |
| Designing Data-Intensive Applications (2nd ed.) | Kleppmann, Riccomini | - | Cloud-native, AI, local-first, formal methods |
| High Performance MySQL (4th ed.) | Botros, Tinley | 978-1492075292 | MySQL architecture, schema/index optimization |
| Expert Oracle Architecture (3rd ed.) | Kyte, Kuhn | 978-1484249602 | Oracle architecture, RAC, Data Guard, tuning |
| AI-Ready PostgreSQL 18 | Kumar, Linster | - | PostgreSQL as a unified platform for AI |
| More SQL Antipatterns | Bill Karwin (2026) | - | 14 antipatterns, keyset pagination |
| Vector Databases | Borwankar (2026) | - | Embeddings, vector indexes, RAG |
| Architecting an Apache Iceberg Lakehouse | Merced (2026) | - | Lakehouse architecture, Iceberg metadata |

*Last revised: 2026-06-03*