# 🧠 Vector Databases

## Overview

Vector databases are specialized databases for storing and searching **embeddings**: vector representations of unstructured data (text, images, audio, video). They enable **semantic search** based on similarity rather than exact matching, which makes them a key building block for RAG (Retrieval-Augmented Generation) and other AI applications.

## Embeddings

- Map unstructured data into a vector space (each item becomes a list of numbers)
- Proximity in vector space approximates semantic similarity
- Generated by models: Word2Vec, BERT, OpenAI embeddings, E5, Cohere, Mistral
- Typical dimensions: 384 (all-MiniLM-L6-v2) to 3072 (OpenAI text-embedding-3-large)

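
To make "proximity = similarity" concrete, here is a minimal sketch that compares toy embeddings with cosine similarity. The 4-dimensional vectors and their names are hypothetical, made up for illustration; real models emit hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings, invented for this example.
cat = [0.9, 0.1, 0.0, 0.3]
kitten = [0.8, 0.2, 0.1, 0.4]
invoice = [0.0, 0.9, 0.7, 0.1]

# Related concepts should score higher than unrelated ones.
print(cosine_similarity(cat, kitten) > cosine_similarity(cat, invoice))
```

Cosine similarity is only one common choice; Euclidean distance and dot product are also widely used, and the right metric depends on how the embedding model was trained.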
## Vector indexing

| Method | Algorithm | Description | Accuracy (recall) | Search cost |
|--------|-----------|-------------|-------------------|-------------|
| **Flat (brute-force)** | Full scan | Compares the query with every vector | 100% (exact) | O(N); slow beyond ~100K vectors |
| **IVF** (Inverted File) | K-means clustering | Partitions vectors into clusters and searches only the clusters nearest the query | ~95-99% | ~O(sqrt(N)) |
| **HNSW** (Hierarchical Navigable Small World) | Navigable graph | Multi-level graph traversed by greedy search | ~99-100% | ~O(log N) |
| **IVF-PQ** | IVF + Product Quantization | Compresses vectors to reduce memory | ~90-95% | ~O(sqrt(N)) |
| **DiskANN** | SSD-based graph | Keeps vectors on disk and searches a Vamana graph | ~95-98% | ~O(log N) + I/O |

Accuracy and cost figures are indicative; they depend on index parameters and on the dataset.

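
The IVF row can be illustrated with a toy sketch: vectors are assigned to their nearest centroid, and a query scans only the `nprobe` closest clusters. The centroids here are hard-coded and assumed to come from a k-means run; all function names are hypothetical and do not belong to any particular library.

```python
import math


def build_ivf(vectors, centroids):
    """Put each vector id into the inverted list of its nearest centroid."""
    lists = {c: [] for c in range(len(centroids))}
    for vid, vec in enumerate(vectors):
        nearest = min(range(len(centroids)),
                      key=lambda c: math.dist(vec, centroids[c]))
        lists[nearest].append(vid)
    return lists


def ivf_search(query, vectors, centroids, lists, k=1, nprobe=1):
    """Scan only the nprobe clusters whose centroids are closest to the query."""
    probe = sorted(range(len(centroids)),
                   key=lambda c: math.dist(query, centroids[c]))[:nprobe]
    candidates = [vid for c in probe for vid in lists[c]]
    return sorted(candidates, key=lambda vid: math.dist(query, vectors[vid]))[:k]


vectors = [(0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (5.2, 4.9), (9.9, 0.1)]
centroids = [(0.1, 0.05), (5.1, 5.0), (10.0, 0.0)]  # assumed k-means output
lists = build_ivf(vectors, centroids)
print(ivf_search((5.0, 5.0), vectors, centroids, lists, k=2))
```

Raising `nprobe` scans more clusters, which trades speed for recall; with `nprobe` equal to the number of clusters the search degenerates into a flat scan.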
### Index selection

| Number of vectors | Requirement | Recommended index |
|------------------|-------------|------------------|
| < 100K | 100% accuracy | Flat |
| 100K - 10M | High accuracy, speed | HNSW |
| 10M+ | Memory efficiency | IVF-PQ, DiskANN |
| 100M+ | Scaling on SSD | DiskANN |

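
A rough back-of-the-envelope estimate shows why memory efficiency starts to matter around 10M vectors. The sketch below assumes float32 storage and a hypothetical product-quantization setup (96 sub-vectors, 8-bit codes); it ignores graph links, codebooks, and metadata, so real indexes need more.

```python
GIB = 1024 ** 3


def raw_vector_bytes(num_vectors, dims, bytes_per_value=4):
    """Memory for uncompressed vectors (float32 by default)."""
    return num_vectors * dims * bytes_per_value


def pq_code_bytes(num_vectors, num_subvectors, bits_per_code=8):
    """Memory for product-quantized codes only (codebooks not included)."""
    return num_vectors * num_subvectors * bits_per_code // 8


n, d = 10_000_000, 768  # assumed collection size and embedding dimension
raw = raw_vector_bytes(n, d)
compressed = pq_code_bytes(n, num_subvectors=96)

print(f"raw float32: {raw / GIB:.1f} GiB")        # about 28.6 GiB
print(f"PQ codes:    {compressed / GIB:.2f} GiB")  # under 1 GiB
print(f"compression: {raw // compressed}x")
```

The compression comes at the cost of accuracy, which is why IVF-PQ sits lower in the accuracy column than HNSW.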
## Use case: RAG (Retrieval-Augmented Generation)

```text
User query → Embedding model → Vector DB search → Relevant chunks → LLM → Answer
```

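
The flow above can be sketched end to end with stand-ins. Everything here is an assumption for illustration: `embed` is a toy bag-of-words counter rather than a real embedding model, the "vector DB" is a Python list, and the final prompt is only printed instead of being sent to an LLM.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(query, chunks, k=2):
    """Stand-in for the vector DB search: rank chunks by similarity."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(query, context_chunks):
    context = "\n".join(f"- {c}" for c in context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


chunks = [
    "HNSW builds a multi-level navigable graph.",
    "IVF partitions vectors into k-means clusters.",
    "PostgreSQL supports ACID transactions.",
]
query = "How does HNSW search the graph?"
prompt = build_prompt(query, retrieve(query, chunks, k=1))
print(prompt)  # in a real pipeline this string would go to the LLM
```

In production, `embed` would call an embedding model and `retrieve` would query a vector index, but the shape of the pipeline stays the same.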
Variants:
- **Naive RAG**: a single retrieval followed by a single generation
- **Advanced RAG**: adds pre-retrieval steps (query rewriting, HyDE) and post-retrieval steps (reranking, filtering)
- **Multi-modal RAG**: text, images, and audio in one pipeline

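
As a rough sketch of the Advanced RAG variant, the function below wires a pre-retrieval rewrite, a cheap first-stage retrieval, and a post-retrieval rerank with filtering. The scoring functions are hypothetical stand-ins: token overlap plays the role of vector search and Jaccard similarity plays the role of a reranking model.

```python
def advanced_retrieve(query, chunks, rewrite, first_stage_score, rerank_score,
                      candidates=3, k=1, min_score=0.0):
    rewritten = rewrite(query)  # pre-retrieval: query rewriting
    pool = sorted(chunks, key=lambda c: first_stage_score(rewritten, c),
                  reverse=True)[:candidates]
    scored = [(rerank_score(rewritten, c), c) for c in pool]  # reranking
    kept = [(s, c) for s, c in scored if s > min_score]       # filtering
    kept.sort(key=lambda sc: sc[0], reverse=True)
    return [c for _, c in kept[:k]]


def tokens(text):
    return set(text.lower().replace("?", "").split())


def rewrite(query):
    """Toy rewrite step: expand one abbreviation."""
    return query.lower().replace("k8s", "kubernetes")


def overlap(query, chunk):
    return len(tokens(query) & tokens(chunk))


def jaccard(query, chunk):
    a, b = tokens(query), tokens(chunk)
    return len(a & b) / len(a | b)


chunks = [
    "Milvus cluster mode runs on Kubernetes",
    "Chroma runs in a Python process",
    "Qdrant is written in Rust",
]
print(advanced_retrieve("Does Milvus need k8s?", chunks, rewrite, overlap, jaccard))
```

Without the rewrite step the query token "k8s" would never match "Kubernetes", which is the kind of gap that pre-retrieval processing is meant to close.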
## Tools — comparison

| Tool | Type | Indexes | Cloud | Self-hosted | Note |
|------|------|---------|-------|-------------|------|
| **Pinecone** | Managed | Proprietary (not user-configurable) | Yes | No | Fully managed, no ops. Pricing driven by stored data (vector count × dimension) and usage |
| **Weaviate** | Open source | HNSW, Flat | Yes (Weaviate Cloud) | Yes | GraphQL API, hybrid (BM25 + vector) queries, modular (generative search) |
| **Qdrant** | Open source | HNSW with optional quantization (scalar, product, binary) | Yes (Qdrant Cloud) | Yes | Written in Rust, batch API, payload filters applied during vector search |
| **Milvus** | Open source | IVF, HNSW, IVF-PQ, DiskANN | Yes (Zilliz Cloud) | Yes | GPU acceleration. More complex ops (distributed mode typically runs on K8s) |
| **pgvector** | PostgreSQL extension | IVFFlat, HNSW | Yes (managed PostgreSQL, e.g. Amazon RDS) | Yes | Embeddings directly in PostgreSQL. Hybrid SQL + vectors |
| **Chroma** | Open source | HNSW | No | Yes | Simple embedding + retrieval, Python-native |
| **LanceDB** | Open source | IVF-PQ | No | Yes | Multi-modal data, Arrow format, no server (embedded) |
| **Elasticsearch** | Search engine | HNSW (8.0+) | Yes (Elastic Cloud) | Yes | If you already run ES, you can use it for vectors too |

### pgvector vs standalone vector DB

| Feature | pgvector | Standalone (Pinecone, Qdrant, Milvus) |
|---------|----------|---------------------------------------|
| **Architecture** | Extension in PostgreSQL | Standalone service |
| **Hybrid queries** | Native SQL + vectors in one query | Relational data lives elsewhere, so two systems must be coordinated |
| **Latency** | Often higher (general-purpose, disk-backed PostgreSQL) | Often lower (purpose-built, mostly in-memory indexes) |
| **Scaling** | PG replication / Citus | Native sharding, rebalancing |
| **Consistency** | PostgreSQL ACID transactions | Often eventual or tunable consistency |
| **Operations** | One system | Two systems: primary DB plus vector DB (operational overhead) |

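
The "hybrid queries" row boils down to combining a structured filter with a similarity ordering. The sketch below mimics that in plain Python over hypothetical rows. In pgvector the same idea would roughly be a `WHERE` clause combined with `ORDER BY embedding <-> query LIMIT k`, where `<->` is pgvector's L2 distance operator.

```python
import math

# Hypothetical rows: relational columns plus an embedding column.
rows = [
    {"id": 1, "lang": "en", "embedding": (0.1, 0.9)},
    {"id": 2, "lang": "de", "embedding": (0.1, 0.8)},
    {"id": 3, "lang": "en", "embedding": (0.9, 0.1)},
    {"id": 4, "lang": "en", "embedding": (0.2, 0.8)},
]


def hybrid_query(rows, query_vec, lang, k=2):
    matching = (r for r in rows if r["lang"] == lang)  # like WHERE lang = ...
    ranked = sorted(matching,
                    key=lambda r: math.dist(r["embedding"], query_vec))  # like ORDER BY distance
    return [r["id"] for r in ranked[:k]]  # like LIMIT k


print(hybrid_query(rows, (0.1, 0.85), lang="en"))
```

Row 2 is close to the query but is excluded by the filter, which is the behavior a single SQL statement gives for free and a two-system setup has to coordinate.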
## Recommendations — Tool selection

| Scenario | Recommendation | Rationale |
|----------|---------------|-----------|
| **RAG on PostgreSQL data** | pgvector | Hybrid SQL + vectors in one DB |
| **RAG production, no ops** | Pinecone | Fully managed, scalable, no operations |
| **Self-hosted RAG** | Qdrant (simpler) / Milvus (performance) | Open source, data control |
| **Full-text + vectors** | Elasticsearch / Weaviate | Combination of BM25 + vector score |
| **Research / prototyping** | Chroma | Python-native, quick start |
| **Embedded / edge** | LanceDB | No server, Arrow format |
| **Multi-modal data** | Weaviate / LanceDB | Native image, audio, video support |
| **GPU acceleration** | Milvus | CUDA support for index build |

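
For the "full-text + vectors" scenario, one common way to combine a BM25 ranking with a vector ranking is reciprocal rank fusion (RRF). The sketch below is a generic illustration with hypothetical document ids; it is not a claim about how Elasticsearch or Weaviate fuse scores internally.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked id lists; k=60 is a commonly used constant."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


bm25_ranking = ["doc_a", "doc_b", "doc_c"]    # hypothetical keyword results
vector_ranking = ["doc_b", "doc_d", "doc_a"]  # hypothetical vector results

fused = reciprocal_rank_fusion([bm25_ranking, vector_ranking])
print(fused)
```

RRF uses only rank positions, so it avoids having to normalize BM25 scores and similarity scores onto a shared scale.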
## When to (not) use a vector DB

**Use** when:
- You need semantic search (similarity by meaning, not keywords)
- You are building a RAG / AI assistant over your own data
- You need document or image deduplication (near-duplicate detection)
- You are building a recommendation system (similar content, similar users)

**Do not use** when:
- You need exact matching (keys, IDs, foreign keys) → SQL
- Full-text search suffices (BM25, stemming) → Elasticsearch, PostgreSQL full-text
- Vectors only complement data in a primary DB → pgvector instead of a standalone vector DB (simplicity)
- You have fewer than 1000 documents → brute-force search in the application is sufficient

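
For small collections, the brute-force option can be a few lines of application code. The sketch below shows exact top-k search and a threshold-based near-duplicate check; the document ids, vectors, and the 0.95 threshold are hypothetical and would need tuning on real data.

```python
import heapq
import math
from itertools import combinations


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def top_k(query, docs, k=2):
    """Exact search: score every document and keep the k most similar ids."""
    return heapq.nlargest(k, docs, key=lambda doc_id: cosine(query, docs[doc_id]))


def near_duplicates(docs, threshold=0.95):
    """All id pairs whose cosine similarity reaches the threshold (O(N^2))."""
    return [(a, b) for a, b in combinations(docs, 2)
            if cosine(docs[a], docs[b]) >= threshold]


docs = {
    "report_v1": (1.0, 0.0, 0.2),
    "report_v2": (0.98, 0.05, 0.2),
    "logo": (0.0, 1.0, 0.0),
}
print(top_k((1.0, 0.0, 0.0), docs, k=2))
print(near_duplicates(docs))
```

The pairwise comparison grows quadratically, so beyond a few thousand items an approximate index becomes the practical way to find near-duplicates.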
## Sources

References, books, and standards: [sources/databases/sources.md](sources/databases/sources.md)

### Recommended reading

| Book | Authors | Description |
|------|---------|-------------|
| Vector Databases | Borwankar (2026) | Comprehensive guide to vector DBs from concepts to production deployment |

*Last revision: 2026-06-03*