comiiit
POSTGRESQL.en.md
# 🐘 PostgreSQL

## Overview

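PostgreSQL is an advanced open-source relational database with an emphasis on extensibility, SQL standards compliance, and reliability. It has been developed under the PostgreSQL name since 1996 (its roots are in the Berkeley POSTGRES project, started in 1986), and it has a strong community and an active release cycle with one major version per year.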

## Architecture

### Process model

```text
Postmaster (supervisor)
├── Backend process (1 per connection)
├── WAL writer
├── Checkpointer
├── Autovacuum launcher
├── Stats collector (PostgreSQL 14 and older; removed in 15)
├── Logical replication launcher
└── Archiver (WAL archiving)
```

Each connection is served by its own OS process, not a thread. Advantage: isolation and stability. Disadvantage: a higher memory footprint with thousands of connections, so a connection pooler such as PgBouncer is usually needed.
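
As a rough way to observe this process model on a running server, the sketch below lists server processes from `pg_stat_activity` and compares the number of client backends with `max_connections`. It assumes PostgreSQL 10 or newer, where the `backend_type` column exists.

```sql
-- One row per server process; client connections appear as 'client backend'.
SELECT pid, backend_type, state, usename, application_name
FROM pg_stat_activity
ORDER BY backend_type, pid;

-- Compare the number of client backends against the configured limit.
SELECT count(*) FILTER (WHERE backend_type = 'client backend') AS client_backends,
       current_setting('max_connections')::int               AS max_connections
FROM pg_stat_activity;
```

If `client_backends` regularly approaches `max_connections`, that is usually the point at which a pooler pays off.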

### MVCC (Multi-Version Concurrency Control)

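Each query sees a consistent snapshot of the data. Under the default READ COMMITTED isolation level a new snapshot is taken for every statement; under REPEATABLE READ and SERIALIZABLE the snapshot is taken at the transaction's first statement and kept for the whole transaction. Old row versions (tuples) remain in the table:

- INSERT creates a new tuple with `xmin = current_xid`
- DELETE marks the tuple with `xmax = current_xid` (it does not disappear immediately)
- UPDATE = DELETE old + INSERT new (the new version is a separate tuple)
- VACUUM reclaims the space of dead tuples that are no longer visible to any active snapshot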
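
The tuple versioning described above can be observed through the hidden system columns `ctid`, `xmin`, and `xmax`. This is a minimal sketch; the table `mvcc_demo` is a hypothetical scratch table, not part of the original text.

```sql
-- Hypothetical demo table; run in a scratch database.
CREATE TABLE mvcc_demo (id int PRIMARY KEY, note text);
INSERT INTO mvcc_demo VALUES (1, 'first version');

-- xmin is the inserting transaction; xmax is normally 0 while the tuple is live.
SELECT ctid, xmin, xmax, * FROM mvcc_demo;

-- An UPDATE writes a new tuple: expect a different ctid and a newer xmin.
UPDATE mvcc_demo SET note = 'second version' WHERE id = 1;
SELECT ctid, xmin, xmax, * FROM mvcc_demo;

DROP TABLE mvcc_demo;
```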

### VACUUM and autovacuum

| Parameter | Description | Default |
|-----------|-------------|---------|
| `autovacuum_vacuum_threshold` | Min. dead rows to trigger vacuum | 50 |
| `autovacuum_vacuum_scale_factor` | Fraction of the table's rows added to the threshold | 0.2 (20%) |
| `autovacuum_analyze_threshold` | Min. changed rows for ANALYZE | 50 |
| `autovacuum_vacuum_cost_limit` | Limits I/O of vacuum (prevents load) | -1 (uses `vacuum_cost_limit`, default 200) |
| `autovacuum_naptime` | Interval between checks | 1 min |
| `deadlock_timeout` | Lock wait before a deadlock check runs (general lock setting, not autovacuum-specific) | 1 s |

Autovacuum processes a table once its dead rows exceed `autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor × row count`.

**Signs of insufficient vacuum**: table growth (bloat), degraded index scan performance, XID wraparound hazard.
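
The signs listed above can be checked from the statistics views. The sketch below is an approximation: it uses the global settings and ignores per-table overrides, and the table name `orders` in the last statement is hypothetical.

```sql
-- Tables with the most dead tuples, next to the approximate autovacuum trigger point.
-- reltuples is an estimate (and -1 on PostgreSQL 14+ for never-analyzed tables).
SELECT s.relname,
       s.n_dead_tup,
       s.last_autovacuum,
       current_setting('autovacuum_vacuum_threshold')::int
         + current_setting('autovacuum_vacuum_scale_factor')::float8 * c.reltuples
         AS approx_trigger_rows
FROM pg_stat_user_tables s
JOIN pg_class c ON c.oid = s.relid
ORDER BY s.n_dead_tup DESC
LIMIT 10;

-- Transaction ID age per database; high values mean wraparound protection is getting closer.
SELECT datname, age(datfrozenxid) AS xid_age
FROM pg_database
ORDER BY xid_age DESC;

-- Per-table override for a large, frequently updated table (hypothetical name).
ALTER TABLE orders SET (autovacuum_vacuum_scale_factor = 0.02);
```

Lowering the scale factor per table is often preferable to changing the global default, because a 20% threshold on a very large table means a very large number of dead rows before vacuum starts.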

### WAL (Write-Ahead Log)

Append-only log of all changes, used for crash recovery and replication:

```conf
wal_level = replica  # or logical
archive_mode = on
archive_command = 'aws s3 cp %p s3://backups/pg-wal/%f'
```
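
To confirm that archiving configured this way actually works, the archiver statistics can be queried on the primary. This is a minimal sketch assuming PostgreSQL 10 or newer (for `pg_current_wal_lsn`).

```sql
-- Current write position in the WAL (primary only).
SELECT pg_current_wal_lsn();

-- Archiver health: failed_count should stay flat, last_archived_time should be recent.
SELECT archived_count, last_archived_wal, last_archived_time,
       failed_count, last_failed_wal, last_failed_time
FROM pg_stat_archiver;

-- Confirm the settings that are actually in effect.
SELECT name, setting
FROM pg_settings
WHERE name IN ('wal_level', 'archive_mode', 'archive_command');
```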
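
**PITR (Point-In-Time Recovery)**:
1. Restore a base backup (taken with `pg_basebackup`)
2. Set the recovery target, e.g. `recovery_target_time = '2026-06-03 10:30:00 UTC'`, together with a `restore_command` that fetches archived WAL
3. Start the server in recovery mode (PostgreSQL 12+: create `recovery.signal` in the data directory) so that it replays the WAL archives up to the target time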
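
The SQL side of a point-in-time recovery can be sketched as follows. The restore point name is hypothetical, and the last statement assumes the server was configured with `recovery_target_action = 'pause'`.

```sql
-- Before a risky change: create a named restore point
-- (usable later as recovery_target_name instead of a timestamp).
SELECT pg_create_restore_point('before_schema_migration');

-- On the restored server, while WAL is being replayed:
SELECT pg_is_in_recovery();              -- true during recovery
SELECT pg_last_xact_replay_timestamp();  -- commit time of the last replayed transaction

-- If recovery paused at the target, inspect the data first,
-- then end recovery and open the server for writes.
SELECT pg_wal_replay_resume();
```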

### Replication slots

- **Physical**: guarantees the primary does not remove WAL until the replica has consumed it
- **Logical**: for logical replication (selective tables, data transformation)
- **Risk**: if a replica fails or falls behind, its slot keeps retaining WAL and the primary's disk can fill up
- Monitoring: `pg_replication_slots`, `pg_stat_replication`
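
A monitoring query for the disk-full risk described above might look like this. The slot name `replica_1` is hypothetical, and the sketch assumes it runs on the primary of a PostgreSQL 10 or newer cluster.

```sql
-- Create a physical slot for a standby (hypothetical slot name).
SELECT pg_create_physical_replication_slot('replica_1');

-- WAL retained by each slot; inactive slots with large values are the ones to watch.
SELECT slot_name, slot_type, active,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
FROM pg_replication_slots;

-- Streaming lag per connected standby.
SELECT application_name, state, sent_lsn, replay_lsn, replay_lag
FROM pg_stat_replication;

-- Remove a slot whose consumer is gone for good, so WAL can be recycled.
SELECT pg_drop_replication_slot('replica_1');
```

On PostgreSQL 13 and newer, `max_slot_wal_keep_size` can additionally cap how much WAL a slot may retain, at the cost of invalidating a slot that falls too far behind.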

### Configuration

Main files (per Obe & Hsu):
- `postgresql.conf`: memory, network, logging, storage
- `pg_hba.conf`: client authentication (which hosts, users, and databases may connect, and with which method)
- `pg_ident.conf`: OS user to PostgreSQL role mapping
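
The location and state of these files can be inspected from a SQL session. This is a minimal sketch assuming PostgreSQL 10 or newer (for `pg_hba_file_rules`) and a role with sufficient privileges.

```sql
-- Where the server reads its configuration from.
SHOW config_file;
SHOW hba_file;
SHOW ident_file;

-- Parsed pg_hba.conf rules; a non-null error column marks a line that failed to parse.
SELECT line_number, type, database, user_name, address, auth_method, error
FROM pg_hba_file_rules;

-- Re-read the files without restarting.
SELECT pg_reload_conf();

-- Settings changed in a file that still need a server restart to take effect.
SELECT name, setting FROM pg_settings WHERE pending_restart;
```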

### AI-Ready PostgreSQL 18

(Kumar, Linster, 2026) — PostgreSQL 18 as a unified platform for transactions, analytics, and AI:

| Area | Technique |
|------|-----------|
| Vectors | pgvector: embeddings directly in table rows |
| Hybrid pattern | Semantic recall → SQL filtering |
| LLM integration | PostgreSQL + MCP (Model Context Protocol) |
| Embedding pipeline | Batch and stream embedding generation |

**Hybrid query**:
```sql
-- '[0.1, 0.3, ...]' stands for the query embedding; <-> is pgvector's L2 distance.
-- The original selected pm.name, but no table is aliased as pm, so only p.* is returned.
SELECT p.*
FROM products p
JOIN product_embeddings pe ON p.id = pe.product_id
WHERE pe.embedding <-> '[0.1, 0.3, ...]' < 0.8
  AND p.in_stock = true
  AND p.price < 100.00
ORDER BY pe.embedding <-> '[0.1, 0.3, ...]'
LIMIT 10;
```
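
For the hybrid query to run, the two tables and the vector column have to exist. The schema below is a hypothetical sketch (column names beyond those used in the query, the 3-dimensional vectors, and the sample rows are all assumptions), and the HNSW index assumes pgvector 0.5.0 or newer.

```sql
-- Requires the pgvector extension to be installed on the server.
CREATE EXTENSION IF NOT EXISTS vector;

-- Hypothetical schema matching the query above; 3 dimensions keep the example readable.
CREATE TABLE products (
    id       bigint PRIMARY KEY,
    name     text NOT NULL,
    price    numeric(10, 2) NOT NULL,
    in_stock boolean NOT NULL DEFAULT true
);

CREATE TABLE product_embeddings (
    product_id bigint PRIMARY KEY REFERENCES products (id),
    embedding  vector(3) NOT NULL
);

INSERT INTO products VALUES (1, 'kettle', 39.90, true), (2, 'toaster', 129.00, true);
INSERT INTO product_embeddings VALUES (1, '[0.1, 0.3, 0.5]'), (2, '[0.9, 0.1, 0.2]');

-- Approximate nearest-neighbour index for the <-> (L2 distance) operator.
CREATE INDEX ON product_embeddings USING hnsw (embedding vector_l2_ops);

-- Same shape as the hybrid query, with a concrete query vector.
SELECT p.name, pe.embedding <-> '[0.1, 0.3, 0.4]' AS distance
FROM products p
JOIN product_embeddings pe ON p.id = pe.product_id
WHERE p.in_stock AND p.price < 100.00
ORDER BY distance
LIMIT 10;
```

Keeping embeddings in a separate table, as the original query does, lets the relational table stay narrow while the vector index is maintained independently.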

### Extensions

| Extension | Purpose |
|-----------|---------|
| pgvector | Vector search for AI/embeddings |
| PostGIS | Geographic data, spatial queries |
| pg_stat_statements | Query performance monitoring |
| pg_duckdb | Analytical queries (DuckDB engine inside PG) |
| pg_search | Full-text and hybrid search |
| pg_cron | DB job scheduling |
| citus | Horizontal scaling (sharding) |
| timescaledb | Time-series optimization |
| pgaudit | Audit logging |
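
Installing and using an extension from the table follows the same pattern in most cases. The sketch below uses `pg_stat_statements` as the example and assumes PostgreSQL 13 or newer for the column names; the module must also be listed in `shared_preload_libraries`, which requires a restart.

```sql
-- Extensions shipped on this server and which ones are installed in the current database.
SELECT name, default_version, installed_version
FROM pg_available_extensions
ORDER BY name;

CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Top statements by cumulative execution time
-- (older versions name these columns total_time / mean_time).
SELECT calls,
       round(total_exec_time::numeric, 1) AS total_ms,
       round(mean_exec_time::numeric, 2)  AS mean_ms,
       left(query, 80)                    AS query
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
```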

## Connection pooling

| Pooler | Type | Protocol |
|--------|------|----------|
| PgBouncer | Proxy (transaction/session) | PostgreSQL wire |
| Odyssey | Proxy (multithreaded) | PostgreSQL wire |
| pgpool-II | Proxy (replication, load balancing) | PostgreSQL wire |
| RDS Proxy | Managed proxy (AWS) | PostgreSQL wire |

**PgBouncer modes**:
- **Session pooling**: a server connection is held for the entire client session, which saves connection setup cost but does not reduce the number of server connections
- **Transaction pooling**: the server connection is returned to the pool after each transaction, which is more efficient but requires that the application not rely on session state (session-level `SET`, advisory locks, `LISTEN`)
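
The statelessness requirement of transaction pooling usually means replacing session-scoped commands with transaction-scoped ones. The sketch below shows that substitution; the timeout value and lock key are arbitrary.

```sql
-- Session-level state: may leak to, or be lost for, other clients under transaction pooling.
-- SET statement_timeout = '5s';
-- SELECT pg_advisory_lock(42);

-- Transaction-scoped equivalents: the state ends with COMMIT/ROLLBACK.
BEGIN;
SET LOCAL statement_timeout = '5s';  -- reverts at transaction end
SELECT pg_advisory_xact_lock(42);    -- released automatically at transaction end
SELECT now();                        -- placeholder for the real work
COMMIT;
```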

## Recommendations: where PostgreSQL is better

| Area | PostgreSQL | Competition | Why PG |
|------|-----------|-------------|--------|
| **Extensibility** | Extensions, custom types, operators, index methods | MySQL limited | Can add anything from vectors to full-text in DB |
| **SQL standard** | Close conformance to the SQL standard | MySQL deviations (GROUP BY, ALTER TABLE) | Portability, fewer surprises |
| **Geospatial data** | PostGIS (gold standard GIS) | MySQL GIS (limited) | De facto standard open-source choice for GIS |
| **Consistency** | SSI serializable, foreign keys, CHECK, exclusion constraints | MySQL MyISAM has no FKs or transactions; InnoDB defaults to REPEATABLE READ and its SERIALIZABLE is lock-based, not SSI | Suitable for financial and critical systems |
| **Concurrent read/write** | MVCC without reader/writer blocking | MySQL MyISAM uses table-level locks, so readers and writers block each other; InnoDB also has MVCC | Readers and writers do not block each other on any table |
| **AI/vectors** | pgvector as an extension inside the DB | Separate vector DB (increased latency) | Hybrid queries in single SQL |
| **License** | PostgreSQL license (MIT-like) | MySQL dual license: GPLv2 or commercial (Oracle) | No vendor lock-in |
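
The "Consistency" row mentions exclusion constraints and SSI; a minimal sketch of both is shown below. The table `room_bookings` is hypothetical and the timestamps are arbitrary.

```sql
-- btree_gist lets a GiST index handle plain equality on scalar columns.
CREATE EXTENSION IF NOT EXISTS btree_gist;

-- Hypothetical booking table: no two bookings of the same room may overlap in time.
CREATE TABLE room_bookings (
    room_id int       NOT NULL,
    during  tstzrange NOT NULL,
    CHECK (NOT isempty(during)),
    EXCLUDE USING gist (room_id WITH =, during WITH &&)
);

INSERT INTO room_bookings VALUES (1, '[2026-06-03 10:00+00, 2026-06-03 11:00+00)');

-- Rejected by the exclusion constraint: same room, overlapping range.
INSERT INTO room_bookings VALUES (1, '[2026-06-03 10:30+00, 2026-06-03 11:30+00)');

-- Serializable Snapshot Isolation is opt-in per transaction;
-- the application should be ready to retry on SQLSTATE 40001.
BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT count(*) FROM room_bookings WHERE room_id = 1;
COMMIT;
```

The exclusion constraint moves an invariant that would otherwise need application-level locking into the schema itself.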

### When to use PostgreSQL

- **Enterprise applications**: require ACID, referential integrity, complex transactions
- **Geographic systems**: GIS, map applications, location services
- **Financial systems**: accounting, banking, compliance (audit logging, SSI)
- **AI / RAG applications**: hybrid vector + relational queries in one DB
- **Analytics on relational data**: pg_duckdb, materialized views, window functions
- **Multi-tenant applications**: row-level security, schemas per tenant
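
For the multi-tenant case, row-level security can be sketched as follows. The table `invoices` and the custom setting `app.tenant_id` are assumptions made for this example, not built-in names.

```sql
-- Hypothetical multi-tenant table.
CREATE TABLE invoices (
    id        bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    tenant_id int            NOT NULL,
    amount    numeric(12, 2) NOT NULL
);

ALTER TABLE invoices ENABLE ROW LEVEL SECURITY;
-- Also apply policies to the table owner (superusers and BYPASSRLS roles stay exempt).
ALTER TABLE invoices FORCE ROW LEVEL SECURITY;

-- app.tenant_id is a custom setting chosen for this sketch.
CREATE POLICY tenant_isolation ON invoices
    USING (tenant_id = current_setting('app.tenant_id')::int);

-- Per request: set the tenant inside the transaction, then query as usual.
BEGIN;
SET LOCAL app.tenant_id = '42';
SELECT * FROM invoices;  -- for non-exempt roles, only tenant 42's rows are visible
COMMIT;
```

Because the policy has no separate `WITH CHECK` clause, the `USING` expression also applies to inserted and updated rows, so a session cannot write rows for another tenant.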

## PostgreSQL licensing

| Variant | License | Price | Restrictions |
|---------|---------|-------|-------------|
| **PostgreSQL** | PostgreSQL license (MIT-like) | $0 | None: can use, modify, distribute in commercial products. No "commercial license" needed |
| **Amazon Aurora PostgreSQL** | Proprietary (AWS) | ~$0.10-1.00/hour (approximate; varies by instance class and region) | AWS managed, PostgreSQL compatible. AWS may use PG code thanks to the PostgreSQL license |
| **YugabyteDB** | Apache 2.0 | $0 (core) | PostgreSQL-compatible distributed SQL, built on the PG query layer |
| **TimescaleDB** | Apache 2.0 (core) / Timescale License, TSL (community edition) | $0 (self-hosted) | Time-series extension for PostgreSQL. TSL features such as compression are free to use but may not be offered as a managed service; tiered storage is part of the paid cloud offering |

**Key point**: The PostgreSQL license is one of the most permissive: it allows cloud providers (AWS, GCP, Azure) to offer PostgreSQL as a managed service without restrictions. This differs from MongoDB (SSPL) and from Redis, which moved to RSALv2/SSPLv1 in 2024. Partly because of this, PostgreSQL has some of the broadest cloud support of any database.

**Impact on choice**: No license fees and no license-driven vendor lock-in for the core database. From a licensing standpoint, PostgreSQL is a safe choice for most projects.

### When to use something else

- **Simple web / blog** → SQLite (embedded, no server process to run)
- **High-throughput key-value** → Redis (in-memory, typically much lower latency)
- **Time-series at massive scale** → TimescaleDB (itself a PostgreSQL extension), InfluxDB
- **Globally distributed data** → CockroachDB, Spanner
- **Full-text search primarily** → Elasticsearch

## Sources

References, books, and standards: [sources/databases/sources.md](sources/databases/sources.md)

### Recommended reading

| Book | Authors | ISBN | Description |
|------|---------|------|-------------|
| PostgreSQL: Up and Running (3rd ed.) | Regina Obe, Leo Hsu | 978-1491962935 | Practical guide to administration, configuration, and extensions |
| AI-Ready PostgreSQL 18 | Kumar, Linster | not listed | PostgreSQL as unified platform for AI workloads |

*Last revision: 2026-06-03*