# Database architecture

## Database classification

### Relational (SQL)

| DB | License | Use case | Detail |
|----|---------|----------|--------|
| **PostgreSQL** | Open source | General purpose, geospatial, analytics, AI | [POSTGRESQL.md](POSTGRESQL.md) |
| **MySQL / MariaDB** | Open source | Web, LAMP stack, e-commerce | [MYSQL.md](MYSQL.md) |
| **Microsoft SQL Server** | Proprietary | Enterprise .NET, Windows ecosystem | - |
| **Oracle DB** | Proprietary | Enterprise, finance, mainframe, RAC cluster | [ORACLE.md](ORACLE.md) |
| **Amazon Aurora** | Managed | MySQL/PostgreSQL compatible, cloud-native | - |

### NoSQL

| Type | DB | Use case | Detail |
|------|----|----------|--------|
| **Document** | MongoDB, Couchbase | JSON data, flexible schema | [MONGODB.md](MONGODB.md) |
| **Key-Value / Cache** | Redis, Memcached, DynamoDB | Cache, session store, real-time | [REDIS.md](REDIS.md) |
| **Wide-column** | Cassandra, ScyllaDB | Time series, IoT, large data volumes | [CASSANDRA.md](CASSANDRA.md) |
| **Vector** | Pinecone, Qdrant, Milvus, pgvector | Embeddings, RAG, semantic search | [VEKTOROVE-DB.md](VEKTOROVE-DB.md) |
| **Graph** | Neo4j, Dgraph | Relationships, recommendations, social graphs | - |

### Storage engines

Concepts shared across databases: [DATABAZOVE-ENGINY.md](DATABAZOVE-ENGINY.md)

---

## Transaction isolation levels

| Level | Dirty Read | Non-repeatable Read | Phantom Read | Serialization Anomaly |
|-------|-----------|---------------------|-------------|----------------------|
| **Read Uncommitted** | Yes (possible) | Yes | Yes | Yes |
| **Read Committed** | No (prevented) | Yes | Yes | Yes |
| **Repeatable Read** | No | No | Yes (PostgreSQL: No) | Yes |
| **Serializable** | No | No | No | No |

**Anomalies**:

- **Dirty Read**: reading data written by an uncommitted transaction (the data may still be rolled back)
- **Non-repeatable Read**: the same query returns different data (another transaction updated the row in the meantime)
- **Phantom Read**: the same query returns new rows (another transaction inserted data matching the condition)
- **Serialization Anomaly**: the outcome of the transactions is not equivalent to any serial ordering

### PostgreSQL vs MySQL differences

- **PostgreSQL**: Read Uncommitted behaves like Read Committed. Repeatable Read is implemented as Snapshot Isolation (which also prevents phantom reads). Serializable is implemented as SSI (Serializable Snapshot Isolation).
- **MySQL InnoDB**: Repeatable Read uses next-key locking (which prevents phantom reads).

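
The difference between Read Committed and a snapshot-based Repeatable Read can be illustrated with a toy multi-version store. This is only a sketch: `VersionedStore` and `Txn` are hypothetical names, and real engines implement MVCC very differently. It assumes that Read Committed sees the latest committed version on every read, while Repeatable Read reads from the snapshot taken at transaction start.

```python
from dataclasses import dataclass, field


@dataclass
class VersionedStore:
    """Toy multi-version store: each key maps to a list of (commit_ts, value)."""
    versions: dict = field(default_factory=dict)
    clock: int = 0

    def commit_write(self, key, value):
        # Every committed write gets a new, strictly increasing timestamp.
        self.clock += 1
        self.versions.setdefault(key, []).append((self.clock, value))

    def read_as_of(self, key, ts):
        # Newest version committed at or before ts, or None if there is none.
        visible = [v for commit_ts, v in self.versions.get(key, []) if commit_ts <= ts]
        return visible[-1] if visible else None


class Txn:
    def __init__(self, store, isolation):
        self.store = store
        self.isolation = isolation
        self.start_ts = store.clock  # snapshot taken when the transaction begins

    def read(self, key):
        if self.isolation == "read committed":
            # Each read sees the latest committed data.
            return self.store.read_as_of(key, self.store.clock)
        # "repeatable read" is modeled here as snapshot isolation.
        return self.store.read_as_of(key, self.start_ts)


store = VersionedStore()
store.commit_write("balance", 100)

rc = Txn(store, "read committed")
rr = Txn(store, "repeatable read")
first_rc, first_rr = rc.read("balance"), rr.read("balance")

store.commit_write("balance", 50)  # a concurrent transaction commits an update

second_rc, second_rr = rc.read("balance"), rr.read("balance")
print(first_rc, second_rc)  # read committed: the second read differs
print(first_rr, second_rr)  # repeatable read: both reads agree
```

The Read Committed transaction observes a non-repeatable read, while the snapshot-based transaction keeps seeing the value from its start.
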

---

## CAP theorem

In a distributed system, only two of the three properties can be guaranteed at once: **C**onsistency, **A**vailability, and **P**artition tolerance.

In practice, network partitions cannot be ruled out, so P is always required and the real choice is between CP (consistency) and AP (availability).

### PACELC extension

PACELC extends CAP to cover behavior under normal conditions (no partition):

- **P**artition → **A**vailability vs **C**onsistency
- **E**lse (no partition) → **L**atency vs **C**onsistency

| DB | Choice during a partition | Choice otherwise (Else) |
|----|---------------------------|-------------------------|
| Cassandra | AP (availability) | L (low latency, eventual consistency) |
| DynamoDB (default) | AP | L |
| MongoDB | CP (primary) | L |
| PostgreSQL (single) | CP | C |
| CockroachDB | CP | C |

### Quorum detail

- **R** (read quorum) + **W** (write quorum) > **N** (replication factor)
- Typical setup: N=3, R=2, W=2 (tolerates 1 node down)
- **Sloppy quorum**: when a node is unavailable, the data is stored temporarily on a different node
- **Hinted handoff**: a temporary write to another node carries a hint, and the data is transferred once the original node recovers

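
The rule R + W > N can be checked directly: it holds exactly when every possible read set shares at least one node with every possible write set. The sketch below uses hypothetical helper names (`quorum_overlaps`, `always_intersect`, `tolerated_failures`) and brute-forces the overlap for small clusters. It models strict quorums only, not sloppy quorums.

```python
from itertools import combinations


def quorum_overlaps(n: int, r: int, w: int) -> bool:
    """True when the arithmetic rule R + W > N holds."""
    return r + w > n


def always_intersect(n: int, r: int, w: int) -> bool:
    """Brute force: does every R-node read set share a node with every W-node write set?"""
    nodes = range(n)
    return all(
        set(read_set) & set(write_set)
        for read_set in combinations(nodes, r)
        for write_set in combinations(nodes, w)
    )


def tolerated_failures(n: int, r: int, w: int) -> int:
    """Nodes that can be down while both reads and writes still reach quorum."""
    return n - max(r, w)


for n, r, w in [(3, 2, 2), (3, 1, 1), (5, 3, 3)]:
    print(n, r, w, quorum_overlaps(n, r, w), always_intersect(n, r, w), tolerated_failures(n, r, w))
```

For N=3, R=2, W=2 the rule and the brute-force check both pass and one node may fail. With R=1, W=1 a read can miss the latest write entirely.
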

---

## Replication

| Type | Description | Latency |
|------|-------------|---------|
| Synchronous | A write is acknowledged only after it has been replicated to all nodes | High, but consistent |
| Asynchronous | A write is acknowledged immediately and replicated in the background | Low, with possible data loss |
| Semi-synchronous | A write is acknowledged once a majority of nodes confirm it | Compromise |

### Topologies

- **Leader-Follower** (Master-Slave): reads are served from replicas
- **Leader-Leader** (Multi-master): writes are accepted on multiple nodes
- **Quorum-based**: R + W > N (Cassandra, DynamoDB)

---

## Sharding

Data is distributed across nodes according to a shard key.

```
           ┌─────────┐
           │  Proxy  │
           │ Router  │
           └────┬────┘
     ┌──────────┼──────────┐
┌────▼───┐ ┌────▼───┐ ┌────▼───┐
│Shard A │ │Shard B │ │Shard C │
│ 0-100  │ │101-200 │ │201-300 │
└────────┘ └────────┘ └────────┘
```

### Methods

| Method | Description | Advantage | Disadvantage |
|--------|-------------|-----------|--------------|
| **Hash-based** | `shard_id = hash(key) % N` | Even distribution | Efficient range queries are lost |
| **Range-based** | Data split by range (A-M, N-Z) | Preserves ordering | Hot spots |
| **Consistent hashing** | Hash ring, vnodes | Minimal data movement when the number of shards changes | More complex |

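
To show why consistent hashing moves less data than `hash(key) % N`, the sketch below counts how many keys change shard when a fourth shard is added. `HashRing`, `stable_hash`, and `modulo_shard` are hypothetical names, MD5 is used only as a convenient stable hash, and 100 vnodes per shard is an arbitrary assumption.

```python
import bisect
import hashlib


def stable_hash(value: str) -> int:
    # hashlib gives the same result across processes, unlike built-in hash() for str.
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")


def modulo_shard(key: str, n: int) -> int:
    return stable_hash(key) % n


class HashRing:
    """Consistent hash ring with virtual nodes (vnodes)."""

    def __init__(self, shards, vnodes=100):
        self._ring = sorted(
            (stable_hash(f"{shard}#{i}"), shard) for shard in shards for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def shard_for(self, key: str) -> str:
        # First vnode clockwise from the key's position, wrapping around the ring.
        idx = bisect.bisect_right(self._points, stable_hash(key)) % len(self._ring)
        return self._ring[idx][1]


keys = [f"user-{i}" for i in range(10_000)]

# Going from 3 to 4 shards with plain modulo hashing.
moved_modulo = sum(modulo_shard(k, 3) != modulo_shard(k, 4) for k in keys)

# The same change on a hash ring.
before = HashRing(["A", "B", "C"])
after = HashRing(["A", "B", "C", "D"])
moved_ring = sum(before.shard_for(k) != after.shard_for(k) for k in keys)

print(f"modulo: {moved_modulo / len(keys):.0%} of keys moved")
print(f"ring:   {moved_ring / len(keys):.0%} of keys moved")
```

On the ring, the only keys that move are the ones the new shard takes over. With modulo hashing most keys are reassigned.
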

### Routing

- **Proxy-based**: the application connects to a proxy, which routes the request (Vitess, ProxySQL, mongos)
- **Client-side**: the application knows which shard to contact
- **DNS-based**: each shard has its own endpoint

---

## Data consistency patterns

| Pattern | Description | Example |
|---------|-------------|---------|
| **Strong consistency** | After a write, every read sees the latest data | Single DB, Raft, Spanner |
| **Eventual consistency** | After a write, the data propagates over time | DNS, DynamoDB (default), Cassandra |
| **Read-after-write** | Authors always see their own writes (other readers get eventual consistency) | Social networks, comments |
| **Causal consistency** | Causally dependent operations are seen in the correct order | COPS, Orbe, MongoDB (causally consistent sessions) |
| **Monotonic reads** | Once you have seen newer data, you never see older data | Reads pinned to a single replica (sticky sessions) |
| **Monotonic writes** | Writes from one client are applied in order | Queue-based, single leader |

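
One common way to approximate read-after-write consistency is to send a user's reads to the leader for a short window after that user writes. The sketch below is an illustration under assumptions: `ReadRouter` is a hypothetical class, and the 5-second window is an arbitrary value that would need to exceed the actual replication lag.

```python
import time


class ReadRouter:
    """Route a user's reads to the leader for a short window after they write."""

    def __init__(self, window_seconds=5.0, clock=time.monotonic):
        self.window = window_seconds
        self.clock = clock  # injectable so the behavior can be tested without sleeping
        self.last_write = {}

    def record_write(self, user_id):
        self.last_write[user_id] = self.clock()

    def target_for_read(self, user_id):
        wrote_at = self.last_write.get(user_id)
        if wrote_at is not None and self.clock() - wrote_at < self.window:
            return "leader"
        return "replica"


now = [0.0]  # fake clock for the demonstration
router = ReadRouter(window_seconds=5.0, clock=lambda: now[0])
router.record_write("alice")
after_write = router.target_for_read("alice")   # within the window
other_user = router.target_for_read("bob")      # never wrote
now[0] = 10.0
after_window = router.target_for_read("alice")  # window has passed
print(after_write, other_user, after_window)
```
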

---

## Data migration

### Schema migrations

```
V1__initial_schema.sql
V2__add_users_table.sql
V3__add_email_index.sql
V4__add_orders_table.sql
```

### Zero-downtime migrations

1. **Expand**: add the new column or table (the application tolerates both states)
2. **Migrate**: backfill the data and update the application to the new schema
3. **Contract**: remove the old column or table

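
The three phases above can be sketched with an in-memory SQLite database. The table and column names (`users`, `fullname`, `display_name`) and the batch size are hypothetical. In production each phase would normally ship as a separate deployment rather than one script.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, fullname TEXT)")
conn.executemany(
    "INSERT INTO users (fullname) VALUES (?)",
    [("Ada Lovelace",), ("Alan Turing",), ("Grace Hopper",)],
)
conn.commit()

# 1. Expand: add the new column as nullable so old application code keeps working.
conn.execute("ALTER TABLE users ADD COLUMN display_name TEXT")

# 2. Migrate: backfill in small batches to keep each transaction short.
BATCH = 2
while True:
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "UPDATE users SET display_name = fullname "
            "WHERE id IN (SELECT id FROM users WHERE display_name IS NULL LIMIT ?)",
            (BATCH,),
        )
    if cur.rowcount == 0:
        break

remaining = conn.execute(
    "SELECT COUNT(*) FROM users WHERE display_name IS NULL"
).fetchone()[0]
names = [row[0] for row in conn.execute("SELECT display_name FROM users ORDER BY id")]

# 3. Contract: drop the old column only once nothing reads it any more.
# DROP COLUMN needs SQLite 3.35+, so this step is guarded.
if remaining == 0 and sqlite3.sqlite_version_info >= (3, 35, 0):
    conn.execute("ALTER TABLE users DROP COLUMN fullname")

columns = [row[1] for row in conn.execute("PRAGMA table_info(users)")]
print(columns, names)
```
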

### Tools

| Tool | Language | Strategy | Zero-downtime | Rollback |
|------|----------|----------|---------------|----------|
| **Flyway** | Java (multi-language CLI) | Versioned SQL | Limited (additive changes only) | `undo` (limited, enterprise) |
| **Liquibase** | Java (multi-language CLI) | Changesets (XML/YAML/JSON/SQL) | Yes (through changeset design) | `rollback-count <n>` |
| **Alembic** | Python | Auto-generation, versioned | Yes (branching) | `downgrade` |
| **Prisma Migrate** | TypeScript | Declarative schema → diff | Yes (shadow DB) | `migrate diff` |
| **gh-ost** | Go | Triggerless online DDL (MySQL) | Yes (binlog stream) | No (progressive) |
| **pgroll** | Go | Online schema migrations (PG) | Yes (views, multiple versions) | Yes (instant) |

---

## SQL Antipatterns

Based on *More SQL Antipatterns* (Karwin, 2026), which describes 14 new antipatterns:

### Language antipatterns

| Antipattern | Problem | Solution |
|-------------|---------|----------|
| **Fear of JOINs** | Rows are matched manually in the application instead of with JOIN | Use JOINs properly |
| **Relational Division** | Matching whole sets of values in WHERE | Relational division (subquery with GROUP BY/HAVING) |
| **Pagination via OFFSET** | OFFSET is O(n): the larger the offset, the slower the query | Keyset pagination (`WHERE id > last_seen`) |
| **Non-Sargable queries** | A function is applied to a column in WHERE (`WHERE YEAR(date) = 2026`) | Rewrite as a range condition |

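
Two of the fixes in the table, keyset pagination and a sargable range condition, can be sketched against an in-memory SQLite database. The `events` table, its columns, and the `page_after` helper are hypothetical and exist only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, created_at TEXT)")
conn.executemany(
    "INSERT INTO events (id, created_at) VALUES (?, ?)",
    [(i, f"2026-01-{(i % 28) + 1:02d}") for i in range(1, 101)],
)


def page_after(last_seen_id: int, page_size: int = 10):
    """Keyset pagination: seek past the last row seen instead of skipping with OFFSET."""
    return conn.execute(
        "SELECT id, created_at FROM events WHERE id > ? ORDER BY id LIMIT ?",
        (last_seen_id, page_size),
    ).fetchall()


first = page_after(0)
second = page_after(first[-1][0])  # continue from the last id on the previous page
print([row[0] for row in second])

# Sargable date filter: a half-open range on the bare column can use an index,
# unlike wrapping the column in a function such as YEAR(created_at).
in_2026 = conn.execute(
    "SELECT COUNT(*) FROM events WHERE created_at >= ? AND created_at < ?",
    ("2026-01-01", "2027-01-01"),
).fetchone()[0]
print(in_2026)
```

Keyset pagination requires a stable, unique sort key, which is why the query orders by `id`.
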

### Optimization antipatterns

| Antipattern | Problem | Solution |
|-------------|---------|----------|
| **Premature denormalization** | Denormalizing without a measured reason | Measure first, then optimize |
| **JSON overuse** | JSON treated as a universal solution | Use JSON only for genuinely flexible data |
| **Cacheless transactions** | Relying on the query cache (removed in MySQL 8) | Application-level caching |

### Application antipatterns

| Antipattern | Problem | Solution |
|-------------|---------|----------|
| **Polling** | Periodically querying for changes | LISTEN/NOTIFY, Kafka, Change Data Capture |
| **Transaction encapsulation** | Each model manages its own transaction | Unit of Work pattern |
| **Fear of deadlocks** | Trying to prevent every deadlock | Mitigate rather than prevent |
| **Data hoarding** | Storing everything forever | Data retention policies, archiving |

### Mini-antipatterns

- `LIMIT` without `ORDER BY`: nondeterministic results
- `NATURAL JOIN`: fragile, implicit join condition
- `N+1 queries`: a query inside a loop instead of a JOIN or a batch
- Redundant indexes: duplicate or overlapping indexes slow down writes needlessly

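
The N+1 pattern from the list above is easiest to see side by side with its JOIN replacement. The following sketch uses a hypothetical `authors`/`books` schema in SQLite and counts the queries issued by the loop-based version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE books (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'Karwin'), (2, 'Petrov');
    INSERT INTO books VALUES (1, 1, 'SQL Antipatterns'), (2, 2, 'Database Internals');
    """
)

# N+1: one query for the parent rows, then one more query per parent row.
n_plus_one = {}
queries = 1
for author_id, name in conn.execute("SELECT id, name FROM authors ORDER BY id").fetchall():
    rows = conn.execute(
        "SELECT title FROM books WHERE author_id = ? ORDER BY id", (author_id,)
    ).fetchall()
    queries += 1
    n_plus_one[name] = [title for (title,) in rows]

# Single JOIN: the same result in one round trip.
joined = {}
for name, title in conn.execute(
    "SELECT a.name, b.title FROM authors a JOIN books b ON b.author_id = a.id ORDER BY b.id"
):
    joined.setdefault(name, []).append(title)

print(queries, n_plus_one == joined)
```

With two authors the loop issues three queries. The count grows linearly with the number of parent rows, while the JOIN stays at one.
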

---

## Designing Data-Intensive Applications (2nd edition)

*Kleppmann, Riccomini (2026)*: a substantially revised edition.

### What is new compared with the 1st edition

| Area | What is new |
|------|-------------|
| **Cloud-native** | Storage is an object store (S3, Blob) rather than local disk. Separation of control, data, and compute planes |
| **AI workloads** | Vector indexes, DataFrames as a data model, batch processing for training data |
| **Local-first software** | DuckDB, PGlite, SQLite: databases running on a laptop or at the edge, syncing when connected |
| **Formal methods** | Randomized testing, formal verification (important for AI-generated code) |
| **Legal & ethics** | GDPR, ethics of predictive analytics, bias, algorithmic accountability |
| **Streaming → SQL views** | Materialize, incremental view maintenance: streaming expressed as SQL |

### Key principles (unchanged)

**Reliability**, **Scalability**, and **Maintainability** remain the three pillars of good data systems.

---

## Apache Iceberg Lakehouse

Based on *Architecting an Apache Iceberg Lakehouse* (Merced, 2026):

### What is a data lakehouse

An architecture that combines the flexibility and low cost of a **data lake** (object storage) with the performance and governance of a **data warehouse**. Apache Iceberg is an open source table format.

### Iceberg metadata architecture

```
Table metadata (.metadata.json)
└── Snapshot manifest list
    └── Manifests (file-level stats)
        └── Data files (Parquet/ORC/Avro)
```
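
The hierarchy above is what lets an engine skip data files without opening them. The following toy model is only a sketch of that idea: the class names, the `plan_scan` function, and the single `min_ts`/`max_ts` statistic per file are simplifying assumptions, not the actual Iceberg specification or API.

```python
from dataclasses import dataclass


@dataclass
class DataFile:
    path: str
    min_ts: str  # lower bound of the timestamp column in this file
    max_ts: str  # upper bound of the timestamp column in this file


@dataclass
class Manifest:
    files: list


@dataclass
class Snapshot:
    snapshot_id: int
    manifests: list  # stands in for the manifest list


@dataclass
class TableMetadata:
    snapshots: list
    current_snapshot_id: int

    def snapshot(self, snapshot_id=None):
        # Time travel: read an older snapshot by id, otherwise the current one.
        wanted = self.current_snapshot_id if snapshot_id is None else snapshot_id
        return next(s for s in self.snapshots if s.snapshot_id == wanted)


def plan_scan(table, lo, hi, snapshot_id=None):
    """Return paths of files whose [min_ts, max_ts] range overlaps [lo, hi]."""
    snap = table.snapshot(snapshot_id)
    return [
        f.path
        for m in snap.manifests
        for f in m.files
        if f.max_ts >= lo and f.min_ts <= hi
    ]


jan = DataFile("data/jan.parquet", "2026-01-01", "2026-01-31")
feb = DataFile("data/feb.parquet", "2026-02-01", "2026-02-28")
table = TableMetadata(
    snapshots=[
        Snapshot(1, [Manifest([jan])]),
        Snapshot(2, [Manifest([jan]), Manifest([feb])]),
    ],
    current_snapshot_id=2,
)

current_plan = plan_scan(table, "2026-02-10", "2026-02-20")
historical_plan = plan_scan(table, "2026-01-01", "2026-12-31", snapshot_id=1)
print(current_plan, historical_plan)
```

The first scan prunes the January file using statistics alone. The second reads the table as of snapshot 1, before the February file was added.
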

### Key features

| Feature | Description |
|---------|-------------|
| **ACID transactions** | Safe concurrent reads and writes |
| **Schema evolution** | Add, drop, or rename columns without rewriting data |
| **Time travel** | Query historical snapshots |
| **Partition evolution** | Change the partitioning strategy without rewriting data |
| **Hidden partitioning** | Partition filters are applied automatically (users do not have to specify them) |
| **Multi-engine** | Spark, Flink, Trino, Dremio, and Snowflake over the same data |

For a more detailed overview of the Big Data ecosystem (HDFS, Spark, Flink, Trino, Delta Lake, Hudi), see [BIG-DATA.md](BIG-DATA.md).

### When to use Iceberg

- Multiple tools need access to the same governed data
- ACID guarantees on data in the lake
- Streaming and batch in a single table
- Less duplication (one canonical copy instead of ETL into a warehouse)

---

## Best practices

- **Connection pooling**: PgBouncer, RDS Proxy, ProxySQL
- **Index according to query patterns**: avoid unnecessary indexes
- **Read replicas** for reporting and analytics
- **Backup & recovery**: point-in-time recovery (PITR), tested regularly
- **Query monitoring**: slow query log, pg_stat_statements, performance_schema
- **Encryption at rest & in transit**
- **Migrations in CI/CD**: part of the pipeline, not run manually
- **Choose the DB by workload**: there is no single universal DB (polyglot persistence)

---

## Comparison of database licensing models

| DB | License | Price (self-hosted) | Price (managed cloud) | Vendor lock-in | Note |
|----|---------|---------------------|-----------------------|----------------|------|
| **PostgreSQL** | PostgreSQL license (MIT-like) | $0 | ~$0.10-1.00/hr (RDS, CloudSQL, Aurora) | Low | Fully open source, no restrictions |
| **MySQL** | GPL v2 / Commercial (Oracle) | $0 (GPL) / ~$2,000/server/year (commercial) | ~$0.10-1.00/hr (RDS, PlanetScale) | Medium (owned by Oracle) | GPL may require releasing the application's source, depending on how it is distributed |
| **MariaDB** | GPL v2 / Business Source | $0 (GPL) | ~$0.10-1.00/hr (SkySQL) | Low | Largely compatible MySQL fork, independent of Oracle |
| **Oracle SE2** | Proprietary (per socket) | ~$17,500/socket + 22% support/year | ~$1-5/hr (RDS, OCI) | High | Licensed per occupied socket (no core factor), max 16 threads |
| **Oracle EE** | Proprietary (per core + options) | ~$47,500/core + options + 22% support | ~$2-30/hr (OCI, RDS) | High | Options can double the price (RAC, partitioning, compression) |
| **SQL Server Standard** | Proprietary (per core + CAL) | ~$1,000/core + $200/CAL | ~$0.20-1.00/hr (Azure SQL) | Medium | A Windows Server license is needed in addition when running on Windows |
| **SQL Server Enterprise** | Proprietary (per core + CAL) | ~$7,000/core + $200/CAL | ~$1-5/hr (Azure SQL) | Medium | AlwaysOn, partitioning, in-memory OLTP |
| **MongoDB** | SSPL (Community) / Commercial (Enterprise) | $0 (Community) / ~$10k/server/year (Enterprise) | ~$0.10-5.00/hr (Atlas) | Medium | SSPL restricts offering it as a managed cloud service |
| **Redis** | RSALv2 + SSPL (7.4+) / BSD (Valkey) | $0 (Valkey) | ~$0.10-1.00/hr (ElastiCache, Memorystore: Valkey) | Low (Valkey) | License change in Redis 7.4+ led to the Valkey fork |
| **Cassandra** | Apache 2.0 | $0 | ~$0.10-1.00/hr (Amazon Keyspaces) | Low | Fully open source, no restrictions |
| **ScyllaDB** | AGPL v3 (OSS) / Enterprise | $0 (OSS) / Enterprise subscription | ~$0.50-3.00/hr (ScyllaDB Cloud) | Low (OSS) | Enterprise: monitoring, security, support |
| **CockroachDB** | BSL (Business Source License) / Enterprise | $0 (core) / Enterprise subscription | ~$0.50-3.00/hr (CockroachDB Cloud) | Medium | BSL: converts to Apache 2.0 after 3 years. Enterprise: multi-region, backup |

**Key recommendations**:

- **Lowest TCO**: PostgreSQL (no license fee, broadest cloud support)
- **Highest vendor lock-in**: Oracle (PL/SQL, proprietary options, expensive migration)
- **License risk**: Redis (license change); use Valkey for new projects
- **Cloud-native licensing**: MongoDB Atlas, CockroachDB Cloud, ScyllaDB Cloud (pay-per-use, no license management)

## Sources

Links, books, and standards: [sources/databases/sources.md](sources/databases/sources.md)

### Recommended reading

| Book | Authors | ISBN | Key contribution |
|------|---------|------|------------------|
| Database Internals | Alex Petrov | 978-1492040346 | In-depth coverage of storage engines (B-Tree, LSM-Tree, WAL, MVCC) and distributed systems |
| Designing Data-Intensive Applications (2nd ed.) | Kleppmann, Riccomini | - | Cloud-native, AI, local-first, formal methods |
| High Performance MySQL (4th ed.) | Schwartz, Zaitsev, Tkachenko | 978-1492075292 | MySQL architecture, schema and index optimization |
| Expert Oracle Architecture (3rd ed.) | Kyte, Kuhn | 978-1484249602 | Oracle architecture, RAC, Data Guard, tuning |
| AI-Ready PostgreSQL 18 | Kumar, Linster | - | PostgreSQL as a unified platform for AI |
| More SQL Antipatterns | Bill Karwin (2026) | - | 14 antipatterns, keyset pagination |
| Vector Databases | Borwankar (2026) | - | Embeddings, vector indexes, RAG |
| Architecting an Apache Iceberg Lakehouse | Merced (2026) | - | Lakehouse architecture, Iceberg metadata |

*Last revised: 2026-06-03*