# 🗄️ Database architecture

## Database classification

### Relational (SQL)

| DB | License | Use case |
|----|---------|----------|
| PostgreSQL | Open source | General purpose, geospatial, analytics |
| MySQL | Open source | Web, LAMP stack |
| MariaDB | Open source | MySQL-compatible |
| Microsoft SQL Server | Proprietary | Enterprise .NET |
| Oracle DB | Proprietary | Enterprise |
| Amazon Aurora | Managed | MySQL/PostgreSQL-compatible |

### NoSQL

| Type | DB | Use case |
|------|----|----------|
| Document | MongoDB, Couchbase | JSON data, flexible schema |
| Key-Value | Redis, DynamoDB | Cache, session store, real-time |
| Wide-column | Cassandra, ScyllaDB | Time-series, IoT, large data volumes |
| Graph | Neo4j, Dgraph | Relationships, recommendations, social graphs |

## Transaction isolation levels

| Level | Dirty Read | Non-repeatable Read | Phantom Read | Serialization Anomaly |
|-------|-----------|---------------------|-------------|----------------------|
| **Read Uncommitted** | Yes (possible) | Yes | Yes | Yes |
| **Read Committed** | No (prevented) | Yes | Yes | Yes |
| **Repeatable Read** | No | No | Yes (PostgreSQL: No) | Yes |
| **Serializable** | No | No | No | No |

**Anomalies**:
- **Dirty Read**: reading data written by an uncommitted transaction (the data may still be rolled back)
- **Non-repeatable Read**: the same query returns different data (another transaction updated the row in the meantime)
- **Phantom Read**: the same query returns new rows (another transaction inserted data matching the condition)
- **Serialization Anomaly**: the outcome of the transactions is not equivalent to any serial ordering of them

### PostgreSQL specifics

- **Read Uncommitted** behaves like **Read Committed** (it is not implemented separately)
- **Repeatable Read** in PG is **Snapshot Isolation**, so it also prevents phantom reads
- **Serializable** in PG uses Serializable Snapshot Isolation (SSI), which detects serialization conflicts and aborts one of the transactions

## PostgreSQL in detail

### MVCC (Multi-Version Concurrency Control)

- Every transaction reads from a snapshot of the data: in Repeatable Read and Serializable the snapshot is taken at the first statement of the transaction, in Read Committed each statement gets a fresh snapshot
- Old row versions (tuples) stay in the table; their visibility is tracked by the `xmin` / `xmax` system columns
- INSERT creates a new tuple with `xmin = current_xid`
- DELETE marks the tuple with `xmax = current_xid` (it does not disappear immediately)
- UPDATE = DELETE old + INSERT new
- VACUUM reclaims dead tuples that are no longer visible to the oldest active snapshot

### VACUUM and autovacuum

| Parameter | Description | Default |
|-----------|-------------|---------|
| `autovacuum_vacuum_threshold` | Minimum number of dead rows before a vacuum is triggered | 50 |
| `autovacuum_vacuum_scale_factor` | Fraction of the table size added to the threshold | 0.2 (20 %) |
| `autovacuum_analyze_threshold` | Minimum number of changed rows before ANALYZE | 50 |
| `autovacuum_vacuum_cost_limit` | Limits vacuum I/O (load protection) | -1 (falls back to `vacuum_cost_limit`, default 200) |
| `autovacuum_naptime` | Interval between checks | 1 min |
| `deadlock_timeout` | Lock wait time before the deadlock check runs (a general lock setting, not autovacuum-specific) | 1 s |

**Symptoms of insufficient vacuuming**:
- Table growth (bloat): old tuples take up space
- Slower index-only scans (the visibility map is out of date, so the heap must be visited)
- XID wraparound risk: forces an aggressive anti-wraparound vacuum and, in the worst case, the database stops accepting writes

### WAL archiving and PITR

```conf
# postgresql.conf
wal_level = replica          # or logical
archive_mode = on
archive_command = 'aws s3 cp %p s3://backups/pg-wal/%f'
```

- **WAL** (Write-Ahead Log): append-only log of all changes
- **WAL archiving**: continuous backup of WAL segments (16 MB by default)
- **PITR** (Point-In-Time Recovery): restore to an arbitrary point in time
  1. Restore a base backup (pg_basebackup)
  2. Replay the archived WAL up to the target time
  3. The target is set with `recovery_target_time = '2026-06-03 10:30:00 UTC'` (configured before replay starts)

### Replication slots

- **Physical replication slot**: guarantees that the primary does not remove WAL until the replica has consumed it
- **Logical replication slot**: used for logical replication (selected tables, data transformation)
- **Risk**: if the replica goes down and the slot stays inactive, WAL accumulates on disk (disk full)
- Monitoring: `pg_replication_slots`, `pg_stat_replication`

## Replication

| Type | Description | Latency |
|------|-------------|---------|
| Synchronous | Write is acknowledged only after it has been replicated to all synchronous nodes | High, but consistent |
| Asynchronous | Write is acknowledged immediately, replication runs in the background | Low, data loss possible |
| Semi-synchronous | Write is acknowledged once at least one replica (or a configured subset) confirms receipt | Compromise |

### Topologies

- **Leader-Follower** (Master-Slave): reads served from replicas
- **Leader-Leader** (Multi-master): writes accepted on multiple nodes
- **Quorum-based**: R + W > N (Cassandra, DynamoDB)

## Index types

| Type | Algorithm | Use case | Operations | Note |
|------|-----------|----------|------------|------|
| **B-tree** | Balanced tree | Most queries: =, <, >, BETWEEN, IN, LIKE (prefix) | Fast lookup, sorting | Default in PostgreSQL and MySQL |
| **Hash** | Hash table | Equality (=) only | Fast lookup | Small, no range queries |
| **GiST** | Generalized Search Tree | Full-text, geometry, intervals, IP ranges | Fast overlaps, contains, distance | Used for geometry (PostGIS) and full-text |
| **GIN** (Generalized Inverted Index) | Inverted index | Arrays, JSONB, full-text search | contains (@>), overlaps (&&) | Slow to build, fast to read |
| **BRIN** (Block Range Index) | Min/max per block range | Large tables with physically ordered data (time-series) | Range queries on correlated data | Extremely small (can be orders of magnitude smaller than a B-tree) |
| **SP-GiST** | Space-partitioned GiST | Quad-tree, KD-tree, radix tree | Partitioned search | Geographic clusters, network data |

## Index selection guide

| Query pattern | Recommended index | Example |
|--------------|-----------------|---------|
| `WHERE user_id = 123` | B-tree on `user_id` | User lookup |
| `WHERE status = 'active' AND created_at > '2026-01-01'` | Composite B-tree on `(status, created_at)` | Filter + range |
| `WHERE data @> '{"key": "value"}'` | GIN on the JSONB column | JSONB queries |
| `WHERE tags && ARRAY['urgent']` | GIN on the array | Tags |
| `WHERE position <@> POINT(50, 14) < 1000` | GiST on geometry | Location queries |
| `WHERE event_time BETWEEN '2026-06-01' AND '2026-06-02'` | BRIN on `event_time` | Time-series, logs |
| `WHERE email = 'user@example.com'` | Hash on `email` | Equality only |

## Query execution

### Seq scan vs Index scan vs Bitmap scan

The percentages below are rough rules of thumb; the planner decides by estimated cost.

| Type | Description | When it is used |
|------|-------------|-----------------|
| **Seq Scan** | Reads every row of the table | Large share of the table (>10 %), small table, no suitable index |
| **Index Scan** | Searches the index, then accesses the heap randomly | Small subset of the data (<5 %), selective queries |
| **Index Only Scan** | Reads data from the index only (not from the heap) | All required columns are in the index (covering index) |
| **Bitmap Index Scan** | Builds a bitmap of matching rows from one or more indexes (combined with BitmapAnd/BitmapOr) | AND/OR conditions on several indexed columns |
| **Bitmap Heap Scan** | Fetches the rows listed in the bitmap in physical order | AND/OR combinations, ~5-20 % of the table |

### EXPLAIN ANALYZE

```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders
WHERE user_id = 456 AND created_at > '2026-01-01';

-- Output (abbreviated, default text format):
-- Index Scan using idx_orders_user_created on orders
--   Index Cond: ((user_id = 456) AND (created_at > '2026-01-01'::date))
--   Buffers: shared hit=3
-- Planning Time: 0.12 ms
-- Execution Time: 0.34 ms
```

Key metrics: `Execution Time`, `Planning Time`, `Buffers` (hit/read/dirtied), `Rows Removed by Filter`, `actual time` (startup..total)

## Sharding

Data is distributed across nodes according to a shard key.
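As a minimal sketch of how a router might map a shard key to a node, assuming a fixed number of shards and hash-modulo placement. The names `SHARDS` and `shard_for` are hypothetical and not taken from any particular proxy or database.

```python
import hashlib

# Hypothetical shard names; a real deployment would hold connection details.
SHARDS = ["shard_a", "shard_b", "shard_c"]


def shard_for(shard_key: str, shards: list[str] = SHARDS) -> str:
    """Pick a shard using a stable hash of the key, modulo the shard count."""
    # hashlib produces the same digest in every process, unlike the built-in
    # hash(), which is randomized per interpreter run for strings.
    digest = hashlib.sha256(shard_key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


if __name__ == "__main__":
    for key in ("user:1", "user:2", "user:3"):
        print(key, "->", shard_for(key))
```

With this scheme, changing the number of shards remaps most keys, which is the main reason consistent hashing is often preferred when nodes are added or removed regularly.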
```
            ┌─────────┐
            │  Proxy  │
            │ Router  │
            └────┬────┘
     ┌───────────┼───────────┐
┌────▼───┐   ┌───▼────┐  ┌───▼────┐
│Shard A │   │Shard B │  │Shard C │
│ 0-100  │   │101-200 │  │201-300 │
└────────┘   └────────┘  └────────┘
```

### Hash-based sharding

```
shard_id = hash(shard_key) % number_of_shards
```

### Range-based sharding

- Data is split by value range (e.g. users A-M, N-Z)
- Can lead to uneven distribution (hot spots)

### Consistent hashing

```
Hash ring:
   0 ─── shard A ─── hash(key1)
   │
  90 ─── shard B ─── hash(key2)
   │
 180 ─── shard C ─── hash(key3)
   │
 270 ─── shard D ─── hash(key4)
```

- Minimizes data movement when a node is added or removed (only the neighboring shard is affected)
- **Virtual nodes** (vnodes): each physical node owns several virtual points on the ring (better distribution)
- **Used by**: Cassandra (vnodes), Riak (vnodes), DynamoDB (consistent hashing)

### Rebalancing

- **Manual**: stop writes, redistribute the data, restart
- **Automatic**: incremental migration (Cassandra vnodes)
- **Proxy-based**: Vitess (shard splitting), Citus (shard rebalancing)

### Routing approaches

- **Proxy-based**: the application talks to a proxy, which routes the query (Vitess, ProxySQL)
- **Client-side**: the application knows which shard to go to
- **DNS-based**: each shard has its own endpoint

## CAP theorem

A distributed system can provide only 2 of the following 3 guarantees:
- **C**onsistency: everyone sees the same data
- **A**vailability: every request receives a response
- **P**artition tolerance: the system keeps working despite a communication failure between nodes

In practice P is always required, so the real choice is between CP (consistency) and AP (availability).
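The CP versus AP choice can be illustrated with a toy two-node model. This is a sketch under simplifying assumptions, not how any real database implements replication: the `Node` class, the `mode` labels, and the `partitioned` flag are all hypothetical names introduced for the example.

```python
from dataclasses import dataclass, field


class Unavailable(Exception):
    """Raised by a CP node that cannot reach its peer."""


@dataclass
class Node:
    mode: str  # "CP" or "AP" (toy labels for this sketch)
    data: dict = field(default_factory=dict)
    pending: list = field(default_factory=list)  # writes not yet replicated

    def write(self, peer: "Node", key: str, value: str, partitioned: bool) -> None:
        if not partitioned:
            # Healthy network: apply the write on both replicas.
            self.data[key] = value
            peer.data[key] = value
        elif self.mode == "CP":
            # Consistency first: refuse the write rather than let replicas diverge.
            raise Unavailable("peer unreachable, write rejected")
        else:
            # Availability first: accept locally and reconcile after the partition.
            self.data[key] = value
            self.pending.append((key, value))


if __name__ == "__main__":
    ap_a, ap_b = Node("AP"), Node("AP")
    ap_a.write(ap_b, "cart", "book", partitioned=True)
    print("AP replicas diverge:", ap_a.data, ap_b.data)
```

During a partition the CP node stays consistent but returns an error, while the AP node stays available but its replicas temporarily disagree.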
### PACELC extension

PACELC extends CAP with the behavior under normal conditions (no partition):
- **P**artition → **A**vailability vs **C**onsistency
- **E**lse (no partition) → **L**atency vs **C**onsistency

| DB | Choice under partition | Choice otherwise |
|----|----------------|------------|
| Cassandra | PA (availability) | EL (low latency, eventual consistency) |
| DynamoDB (default) | PA | EL |
| MongoDB | PC (primary) | EL |
| PostgreSQL (single) | PC | EC |
| CockroachDB | PC | EC |

### Quorum detail

- **R** (read quorum) + **W** (write quorum) > **N** (replication factor)
- Typical: N=3, R=2, W=2 (tolerates 1 node down)
- **Sloppy quorum**: when a node is unavailable, data is stored temporarily on another node, and consistency is restored after recovery
- **Hinted handoff**: a temporary write to another node together with a hint; once the original node recovers, the data is transferred to it

## Caching

| Layer | Tool | Use case |
|-------|------|----------|
| Application cache | Redis, Memcached | Session, API response cache |
| Database cache | Built-in | Query cache, buffer pool |
| CDN cache | CloudFront, Fastly | Static assets, API responses |
| HTTP cache | Varnish, nginx | Reverse proxy cache, ESI |

### Cache strategies

| Strategy | Description | Use case |
|----------|-------------|----------|
| **Cache-aside (lazy loading)** | The application reads from the cache; on a miss it goes to the DB and fills the cache, usually with a TTL plus invalidation | General purpose, the most common |
| **Read-through** | The cache itself loads from the DB on a miss | Simple lookups |
| **Write-through** | A write goes to the cache and the DB at the same time | Consistency |
| **Write-behind** | A write goes to the cache immediately and to the DB asynchronously | Throughput |

### Redis detail

**Data structures**:

| Structure | Description | Use case |
|-----------|-------------|----------|
| **String** | Binary-safe string (max 512 MB) | Cached values, session tokens, counters |
| **Hash** | Field-value map | User profile, cached object |
| **List** | Linked list (push/pop at both ends) | Queue (RPUSH/LPOP), log stream |
| **Set** | Unique values (unordered) | Tags, deduplication, memberships |
| **Sorted Set** | Unique values + score (ordering) | Leaderboards, rate limiting, timeouts |
| **Bitmap** | Bit array | Feature flags, daily active users |
| **HyperLogLog** | Approximate cardinality (up to 12 KB for up to 2^64 elements) | Unique visitors (error < 1 %) |
| **Stream** | Append-only log (Kafka-like) | Event store, messaging |

**Eviction policies**:

| Policy | Description | Use case |
|--------|-------------|----------|
| **noeviction** | Writes fail when memory is full | Transactional data that must not be lost |
| **allkeys-lru** | LRU across all keys | General cache, the usual choice |
| **allkeys-lfu** | LFU across all keys | Frequently accessed data |
| **volatile-lru** | LRU across keys with a TTL | Cache with expiration |
| **volatile-ttl** | Keys closest to expiration first | Short-lived data |
| **allkeys-random** | Random key | Testing |

**Redis Cluster vs Sentinel**:

| Property | Redis Sentinel | Redis Cluster |
|----------|---------------|---------------|
| **Scaling** | Read replicas (master + replica) | Data sharding (16384 hash slots) |
| **Auto-failover** | Yes (Sentinel) | Yes (gossip-based) |
| **Multi-key ops** | Yes (transactions on the master) | Limited (keys must be in the same hash slot) |
| **Client communication** | Clients ask Sentinel for the current master address | Cluster nodes redirect (MOVED/ASK) |
| **Minimum nodes** | Master + replica + 3 Sentinels | 3 masters (each with a replica) |
| **Use case** | High availability, single shard | Multi-shard, horizontal scaling |

### Connection pooling

| Pooler | DB | Type | Protocol |
|--------|-----|------|----------|
| **PgBouncer** | PostgreSQL | Proxy (transaction/session) | PostgreSQL wire |
| **RDS Proxy** | PostgreSQL, MySQL | Managed proxy (AWS) | PostgreSQL / MySQL wire |
| **ProxySQL** | MySQL | Proxy (advanced routing) | MySQL wire |
| **Odyssey** | PostgreSQL | Proxy (multithreaded) | PostgreSQL wire |
| **pgpool-II** | PostgreSQL | Proxy (replication, load balancing) | PostgreSQL wire |

**PgBouncer modes**:
- **Session pooling**: the server connection is held for the whole client session, so each idle client still occupies a connection (overhead)
- **Transaction pooling**: the connection is returned to the pool when the transaction ends, which is more efficient but requires that the application does not rely on session state

## Data migration

### Schema migrations (PostgreSQL / MySQL)

```
V1__initial_schema.sql
V2__add_users_table.sql
V3__add_email_index.sql
V4__add_orders_table.sql
```

### Zero-downtime migration

1. **Expand**: add the new column/table (the application tolerates both states)
2. **Migrate**: backfill the data, update the application to the new schema
3. **Contract**: remove the old column/table

### Tools in detail

| Tool | Language | Strategy | Zero-downtime | Rollback |
|------|----------|----------|--------------|----------|
| **Flyway** | Java (multi-lang CLI) | Versioned SQL | Limited (additive changes only) | `undo` (limited, enterprise) |
| **Liquibase** | Java (multi-lang CLI) | Changesets (XML/YAML/JSON/SQL) | Yes (changeset design) | `rollback` |
| **Alembic** | Python | Auto-generation, versioned | Yes (branching) | `downgrade` |
| **Prisma Migrate** | TypeScript | Declarative schema → diff | Yes (shadow DB) | `migrate diff` |
| **gh-ost** | Go | Triggerless online DDL (MySQL) | Yes (binlog stream) | No (progressive) |
| **pgroll** | Go | Online schema migration (PG) | Yes (views, multiple versions) | Yes (instant) |

## Data consistency patterns

| Pattern | Description | Example |
|---------|-------------|---------|
| **Strong consistency** | After a write, every read sees the latest data | Single DB, Raft, Spanner |
| **Eventual consistency** | After a write, the data propagates over time | DNS, DynamoDB (default), Cassandra |
| **Read-after-write** | The author always sees their own write (others see it eventually) | Social networks, comments |
| **Causal consistency** | Causally dependent operations are seen in the correct order | COPS, Orbe, MongoDB (causally consistent sessions) |
| **Monotonic reads** | Once you have seen newer data, you never see older data | Sticky sessions that always read from the same replica |
| **Monotonic writes** | Writes from one client are applied in order | Queue-based, single leader |

## Storage engines overview

### B-Tree vs LSM-Tree

| Property | B-Tree | LSM-Tree |
|----------|--------|----------|
| Writes | In-place update (random I/O) | Append-only (sequential I/O) |
| Reads | Fast (directly in the page) | Slower (merge across several SSTables) |
| Space efficiency | Worse (page fragmentation) | Better (compact SSTables) |
| Write amplification | Lower | Higher (compaction); workload-dependent |
| Read amplification | Lower | Higher (Bloom filters help) |
| Typical DBs | PostgreSQL, MySQL (InnoDB) | Cassandra, RocksDB, LevelDB |

### Write-Ahead Log (WAL)

- Append-only log used for crash recovery
- Every change is written to the WAL before it is applied to the data page
- Checkpoint = the point from which the WAL is needed during recovery

### MVCC (Multi-Version Concurrency Control)

- Every transaction reads from a snapshot of the data rather than from rows being modified concurrently
- Old row versions stay in the table until they are cleaned up (vacuum/GC)
- Isolation levels: Read Committed, Repeatable Read, Serializable

## Best practices

- **Connection pooling**: PgBouncer, RDS Proxy, ProxySQL
- **Index by query pattern**: avoid unnecessary indexes
- **Read replicas** for reporting and analytics
- **Backup & recovery**: point-in-time recovery (PITR), regular restore tests
- **Query monitoring**: slow query log, pg_stat_statements, performance_schema
- **Encryption at rest & in transit**
- **Migrations in CI/CD**: part of the pipeline, not run manually

## Sources

Links, books, and standards: [sources/databases/sources.md](sources/databases/sources.md)