🧠 Vector Databases

Overview

Specialized databases for storing and searching embeddings — vector representations of unstructured data (text, images, audio, video). They enable semantic search based on similarity, not exact matching. A key building block for RAG (Retrieval-Augmented Generation) and AI applications.

Embeddings

Map unstructured data into a vector space (list of numbers)
Proximity in vector space = semantic similarity
Generated by models: Word2Vec, BERT, OpenAI embeddings, E5, Cohere, Mistral
Typical dimensions: 384 (all-MiniLM-L6-v2) to 3072 (OpenAI text-embedding-3-large)
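To make "proximity = semantic similarity" concrete, here is a minimal sketch of cosine similarity, one commonly used proximity measure. The three-dimensional vectors and their names (cat, kitten, car) are made-up illustrations, not output of any real embedding model.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Hypothetical toy "embeddings"; real models emit hundreds of dimensions.
cat = [0.9, 0.8, 0.1]
kitten = [0.85, 0.75, 0.2]
car = [0.1, 0.2, 0.95]

sim_cat_kitten = cosine_similarity(cat, kitten)
sim_cat_car = cosine_similarity(cat, car)
print(f"cat vs kitten: {sim_cat_kitten:.3f}, cat vs car: {sim_cat_car:.3f}")
```

Vectors pointing in nearly the same direction score close to 1, unrelated ones score close to 0.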

Vector indexing

Method	Algorithm	Description	Accuracy (recall)	Speed
Flat (brute-force)	Full scan	Comparison with all vectors	100%	O(N), slow for > 100K vectors
IVF (Inverted File)	K-means clustering	Partition into clusters, search only the nearest clusters (nprobe)	~95-99%	~O(sqrt(N)) with about sqrt(N) clusters
HNSW (Hierarchical Navigable Small World)	Navigable graph	Multi-level graph, greedy search	~99-100%	O(log N)
IVF-PQ	IVF + Product Quantization	Vector compression, less memory	~90-95%	O(sqrt(N))
DiskANN	SSD-based graph	Vectors on disk, Vamana graph	~95-98%	O(log N) + I/O
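The IVF row can be illustrated with a toy implementation. This is a simplified sketch in pure Python, not how any production library does it: the function names (build_ivf, search_ivf), the evenly spaced centroid initialization, and the synthetic two-cluster data are all assumptions made for the example.

```python
import random

def dist2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(vectors, centroids):
    """Put each vector index into the bucket of its nearest centroid."""
    buckets = [[] for _ in centroids]
    for idx, v in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: dist2(v, centroids[c]))
        buckets[nearest].append(idx)
    return buckets

def build_ivf(vectors, n_lists, iterations=10):
    """Run a few k-means rounds; return (centroids, inverted lists)."""
    if not 0 < n_lists <= len(vectors):
        raise ValueError("n_lists must be between 1 and len(vectors)")
    # Simplified deterministic init: evenly spaced input vectors.
    step = len(vectors) // n_lists
    centroids = [list(vectors[i * step]) for i in range(n_lists)]
    for _ in range(iterations):
        buckets = assign(vectors, centroids)
        for c, members in enumerate(buckets):
            if members:  # keep the old centroid if a cluster went empty
                dim = len(centroids[c])
                centroids[c] = [
                    sum(vectors[i][d] for i in members) / len(members)
                    for d in range(dim)
                ]
    return centroids, assign(vectors, centroids)

def search_ivf(query, vectors, centroids, buckets, k=3, n_probe=1):
    """Scan only the n_probe buckets whose centroids are closest to the query."""
    probe = sorted(range(len(centroids)),
                   key=lambda c: dist2(query, centroids[c]))[:n_probe]
    candidates = [i for c in probe for i in buckets[c]]
    return sorted(candidates, key=lambda i: dist2(query, vectors[i]))[:k]

# Synthetic data: two well-separated 2-D clusters of 50 points each.
rng = random.Random(42)
blob_a = [[rng.gauss(0, 0.5), rng.gauss(0, 0.5)] for _ in range(50)]
blob_b = [[rng.gauss(10, 0.5), rng.gauss(10, 0.5)] for _ in range(50)]
vectors = blob_a + blob_b

centroids, buckets = build_ivf(vectors, n_lists=2)
query = [10.0, 10.0]
approx = search_ivf(query, vectors, centroids, buckets, k=3, n_probe=1)
exact = sorted(range(len(vectors)), key=lambda i: dist2(query, vectors[i]))[:3]
```

Raising n_probe trades speed for recall: with n_probe equal to the number of lists, the search degenerates into a full scan.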

Index selection

Number of vectors	Requirement	Recommended index
< 100K	100% accuracy	Flat
100K - 10M	High accuracy, speed	HNSW
10M+	Memory efficiency	IVF-PQ, DiskANN
100M+	Scaling on SSD	DiskANN
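The selection table can be read as a simple decision rule. The sketch below encodes it together with a raw-memory estimate that shows why large collections push toward compressed or disk-based indexes. The function names are hypothetical, the thresholds come straight from the table, and treating exactly 10M vectors as the upper tier is an assumption, since the table leaves that boundary open.

```python
def recommend_index(n_vectors: int) -> str:
    """Map a collection size to the index suggested by the table above."""
    if n_vectors < 0:
        raise ValueError("n_vectors must be non-negative")
    if n_vectors < 100_000:
        return "Flat"
    if n_vectors < 10_000_000:
        return "HNSW"
    if n_vectors < 100_000_000:
        return "IVF-PQ or DiskANN"
    return "DiskANN"

def raw_memory_gib(n_vectors: int, dim: int, bytes_per_value: int = 4) -> float:
    """Memory for uncompressed vectors only (float32 by default), no index overhead."""
    return n_vectors * dim * bytes_per_value / 2**30

for n in (50_000, 2_000_000, 10_000_000, 500_000_000):
    print(f"{n:>11,} vectors: {recommend_index(n):<18} "
          f"raw float32 at 1536 dims = {raw_memory_gib(n, 1536):.1f} GiB")
```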

Use case: RAG (Retrieval-Augmented Generation)

User query → Embedding model → Vector DB search → Relevant chunks → LLM → Answer
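The pipeline above can be sketched end to end with stand-ins for every external component. Everything here is hypothetical: embed is a toy bag-of-words counter rather than a real embedding model, InMemoryVectorStore replaces an actual vector database, and stub_llm only reports what it received instead of calling a language model.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy stand-in for an embedding model: sparse token counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

class InMemoryVectorStore:
    """Stand-in for a vector DB: stores (embedding, chunk) pairs, scans all on search."""
    def __init__(self):
        self._items = []

    def add(self, chunk):
        self._items.append((embed(chunk), chunk))

    def search(self, query_vec, k=2):
        ranked = sorted(self._items,
                        key=lambda item: cosine(query_vec, item[0]),
                        reverse=True)
        return [chunk for _, chunk in ranked[:k]]

def stub_llm(prompt):
    """Placeholder for a real LLM call."""
    return f"[stub LLM received a prompt of {len(prompt)} characters]"

def answer(query, store, llm=stub_llm, k=2):
    """Query -> embedding -> vector search -> prompt with chunks -> LLM."""
    chunks = store.search(embed(query), k=k)
    context = "\n".join(f"- {chunk}" for chunk in chunks)
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    return llm(prompt), chunks

store = InMemoryVectorStore()
store.add("HNSW builds a multi-level graph and searches it greedily.")
store.add("PostgreSQL provides ACID transactions.")
store.add("Bananas are rich in potassium.")

reply, retrieved = answer("How does HNSW search work?", store, k=1)
print(retrieved[0])
print(reply)
```

Swapping in a real embedding model, vector database client, and LLM API changes the three stand-ins but not the shape of answer.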

Variants:

Naive RAG — single retrieval + single generation
Advanced RAG — pre-retrieval (query rewriting, HyDE) + post-retrieval (reranking, filtering)
Multi-modal RAG — text + images + audio in one pipeline

Tools — comparison

Tool	Type	Indexes	Cloud	Self-hosted	Note
Pinecone	Managed	Proprietary (not user-selectable)	Yes	No	Fully managed, no ops. Cost grows with dimension and vector count
Weaviate	Open source	HNSW, Flat	Yes (WCD)	Yes	Graph + vector, hybrid queries, modular (generative search)
Qdrant	Open source	HNSW with scalar, product, or binary quantization	Yes (Cloud)	Yes	Written in Rust, batch API, payload filters applied during vector search
Milvus	Open source	IVF, HNSW, IVF-PQ, DiskANN	Yes (Zilliz)	Yes	GPU acceleration. More complex ops (Kubernetes for cluster deployments)
pgvector	PostgreSQL extension	IVFFlat, HNSW	Yes (managed PostgreSQL, e.g. Amazon RDS)	Yes	Embeddings directly in PostgreSQL. Hybrid SQL + vectors
Chroma	Open source	HNSW	No	Yes	Simple embedding + retrieval, Python-native
LanceDB	Open source	IVF-PQ	No	Yes	Multi-modal data, Arrow format, no server (embedded)
Elasticsearch	Search engine	HNSW (8.0+)	Yes (Cloud)	Yes	If you already run Elasticsearch, it can serve vector search too

pgvector vs standalone vector DB

Feature	pgvector	Standalone (Pinecone, Qdrant, Milvus)
Architecture	Extension in PostgreSQL	Standalone service
Hybrid queries	Native SQL + vectors	Requires coordination of two systems
Latency	Higher (disk-based PG)	Lower (in-memory indexes)
Scaling	PG replication / Citus	Native sharding, rebalancing
Consistency	PG ACID transactions	Varies by product, often eventual or tunable consistency
Operations	One system	Two systems (operational overhead)
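The "Native SQL + vectors" row means a relational filter and a nearest-neighbour ordering can live in one statement. The sketch below only builds such a statement. The documents table, its tenant_id, content, and embedding columns, and both function names are hypothetical. `<=>` is pgvector's cosine distance operator, and the `%s` placeholders assume a psycopg-style driver that is not shown here.

```python
def to_pgvector_literal(vec):
    """Format a Python sequence as a pgvector text literal, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(str(float(x)) for x in vec) + "]"

def build_hybrid_query(query_vec, tenant_id, limit=5):
    """Return (sql, params): relational filter plus cosine-distance ordering."""
    literal = to_pgvector_literal(query_vec)
    sql = (
        "SELECT id, content, embedding <=> %s::vector AS cosine_distance "
        "FROM documents "                      # hypothetical table
        "WHERE tenant_id = %s "                # ordinary SQL predicate
        "ORDER BY embedding <=> %s::vector "   # nearest neighbours first
        "LIMIT %s"
    )
    return sql, (literal, tenant_id, literal, limit)

sql, params = build_hybrid_query([0.1, 0.2, 0.3], tenant_id=42, limit=5)
print(sql)
print(params)
```

With a standalone vector database, the tenant filter and the vector search would typically be expressed through that product's own filter API, and any join against relational data happens in the application.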

Recommendations — Tool selection

Scenario	Recommendation	Rationale
RAG on PostgreSQL data	pgvector	Hybrid SQL + vectors in one DB
RAG production, no ops	Pinecone	Fully managed, scalable, no operations
Self-hosted RAG	Qdrant (simpler) / Milvus (performance)	Open source, data control
Full-text + vectors	Elasticsearch / Weaviate	Combination of BM25 + vector score
Research / prototyping	Chroma	Python-native, quick start
Embedded / edge	LanceDB	No server, Arrow format
Multi-modal data	Weaviate / LanceDB	Native image, audio, video support
GPU acceleration	Milvus	CUDA support for index build

When to (not) use a vector DB

Use when:

You need semantic search (similarity by meaning, not keywords)
You are building a RAG / AI assistant over your own data
Document/image deduplication (near-duplicate detection)
Recommendation systems (similar content, similar users)

Do not use when:

You need exact matching (keys, IDs, foreign keys) → SQL
Full-text search suffices (BM25, stemming) → Elasticsearch, PostgreSQL full-text
Vectors are just a complement to the primary DB → pgvector instead of a standalone vector DB (simplicity)
Fewer than 1000 documents → brute-force search in the application is sufficient
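For the small-collection case, brute-force search in the application can be as short as the sketch below. It assumes the embeddings are already in memory as plain Python lists. The document IDs and two-dimensional vectors are made up.

```python
import heapq
import math

def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]

def top_k(query, docs, k=3):
    """Score every document against the query and keep the k best (score, id) pairs."""
    q = normalize(query)
    scored = (
        (sum(a * b for a, b in zip(q, normalize(vec))), doc_id)
        for doc_id, vec in docs.items()
    )
    return heapq.nlargest(k, scored)

docs = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}  # hypothetical embeddings
results = top_k([1.0, 0.0], docs, k=2)
print(results)
```

At this scale the full scan is exact and needs no index to build or maintain. Pre-normalizing the stored vectors once would avoid repeating that work on every query.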

Sources

References, books, and standards: sources/databases/sources.en.md
