knowledge-base/MONGODB.en.md
Stanislav Hubacek ef3c2f75b1 18.6.2026
2026-06-18 16:25:33 +02:00


# 🥬 MongoDB
## Overview
MongoDB is the most widespread document-oriented NoSQL database. It stores data as BSON (binary JSON) documents with a flexible schema. It suits rapidly developed applications whose schema changes often or varies from record to record.
## Data model
- **Database** → Collection → Document (JSON/BSON)
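- **Document** — key-value fields, nested objects, arrays
- **Flexible schema** — each document can have different fields (possible, but not recommended)
- **ObjectId** — default primary key (12 bytes: 4-byte timestamp + 5-byte per-process random value + 3-byte counter; older versions used a machine identifier and PID in place of the random value)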
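
Because an ObjectId starts with a timestamp, the creation time of a document can be recovered from its `_id` alone. The following is a minimal sketch in plain JavaScript, assuming the id is given as a 24-character hex string; `objectIdTimestamp` is a hypothetical helper and the sample id is made up for illustration.

```javascript
// Hypothetical helper: extract the creation time embedded in an ObjectId.
// Assumes a 24-character hex string whose first 8 hex characters (4 bytes)
// are a big-endian Unix timestamp in seconds.
function objectIdTimestamp(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("expected a 24-character hex ObjectId string");
  }
  const seconds = parseInt(hex.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

// Illustrative id, not taken from a real database.
const created = objectIdTimestamp("5f5e1000a1b2c3d4e5f60718");
console.log(created.toISOString());
```

In mongosh and the official drivers the same information is available through `ObjectId.getTimestamp()`, so a helper like this is only needed when the id arrives as a plain string.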
## Architecture
```
mongod (individual node)
├── WiredTiger storage engine (default since 3.2)
│   ├── B-Tree indexes (B-Tree, not LSM)
│   ├── MVCC (snapshot isolation)
│   ├── Compression (zlib, snappy, zstd)
│   └── Cache (WiredTiger internal cache)
├── Replication (replica set)
│   ├── Primary (all writes)
│   └── Secondary (replication, optional reads)
└── Sharding (cluster)
    ├── mongos (router)
    ├── Config servers (metadata)
    └── Shards (replica sets)
```
### Replica set
- Primary node = all writes, secondary = replication (oplog)
- Automatic failover (election among secondaries)
- Up to 50 nodes in a replica set, max 7 voting nodes
- Read preference: primary (default), primaryPreferred, secondary, secondaryPreferred, nearest
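
Automatic failover only works while a majority of voting members can reach each other. The sketch below works through that arithmetic for a few replica set sizes; `electionMath` is a hypothetical helper, and the limit of 7 voting members is the one stated in the list above.

```javascript
// Sketch: how many voting members are needed to elect a primary,
// and how many of them may fail before elections become impossible.
const MAX_VOTING_MEMBERS = 7;

function electionMath(votingMembers) {
  if (
    !Number.isInteger(votingMembers) ||
    votingMembers < 1 ||
    votingMembers > MAX_VOTING_MEMBERS
  ) {
    throw new RangeError(
      `voting members must be an integer between 1 and ${MAX_VOTING_MEMBERS}`
    );
  }
  // A strict majority of the voting members must agree on the new primary.
  const majority = Math.floor(votingMembers / 2) + 1;
  return { votingMembers, majority, faultTolerance: votingMembers - majority };
}

for (const n of [3, 4, 5, 7]) {
  console.log(electionMath(n));
}
```

Going from 3 to 4 voting members raises the required majority without raising fault tolerance, which is why odd member counts are the usual choice.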
### Sharding
- Shard key = the field(s) that determine how documents are distributed across shards
- **Range sharding** — documents with adjacent key values land on the same shard (good for range queries, risk of hot spots)
- **Hashed sharding** — even distribution (good for write throughput, bad for range queries)
- **Zoned sharding** — data placed according to zones (geo-distribution, compliance)
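
The hot-spot risk of range sharding shows up most clearly with a monotonically increasing shard key. The following toy model routes 300 consecutive keys with each strategy. It is a sketch under stated assumptions, not MongoDB's actual chunk logic: the shard names and range boundaries are hypothetical, and FNV-1a stands in for MongoDB's own hash function.

```javascript
// Toy model of shard-key routing (illustrative only).
const SHARDS = ["shardA", "shardB", "shardC"];

// Range sharding: each shard owns a contiguous key range.
// Upper bounds are hypothetical: A < 1000, B < 2000, C takes the rest.
const RANGE_UPPER_BOUNDS = [1000, 2000, Infinity];
function rangeShard(key) {
  return SHARDS[RANGE_UPPER_BOUNDS.findIndex((upper) => key < upper)];
}

// Hashed sharding: route by a hash of the key.
// 32-bit FNV-1a is an assumption made for this sketch.
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}
function hashedShard(key) {
  return SHARDS[fnv1a(String(key)) % SHARDS.length];
}

// Count how many keys each shard receives under a routing function.
function countByShard(keys, route) {
  const counts = Object.fromEntries(SHARDS.map((s) => [s, 0]));
  for (const key of keys) counts[route(key)] += 1;
  return counts;
}

// Monotonically increasing keys, e.g. an order counter starting at 2000.
const keys = Array.from({ length: 300 }, (_, i) => 2000 + i);
const rangeCounts = countByShard(keys, rangeShard);
const hashedCounts = countByShard(keys, hashedShard);
console.log("range :", rangeCounts);
console.log("hashed:", hashedCounts);
```

With range routing every new key falls into the last range, so a single shard absorbs all inserts. Hashing spreads the same keys across shards, at the cost of scattering any range query over all of them.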
## Index types
| Type | Description |
|------|-------------|
| **Single field** | Standard B-Tree index |
| **Compound** | Multiple fields in index (order matters) |
| **Multikey** | Index on array field — each value separately |
| **Text** | Full-text search |
| **Geospatial (2d, 2dsphere)** | Geo queries (near, within, intersect) |
| **Hashed** | For hashed sharding |
| **TTL** | Automatic document deletion after expiration |
| **Wildcard** | Index on unknown/irregular fields |
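
To make the table concrete, the sketch below lists one key pattern per index type, in the form passed to `db.collection.createIndex(keys, options)` in mongosh. The specs are plain data so the snippet runs without a database. The collection name `products`, all field names, and the helper `toMongoshCommand` are hypothetical.

```javascript
// One example key pattern (and options, where needed) per index type.
// Field names are hypothetical.
const indexSpecs = [
  { type: "Single field", keys: { email: 1 }, options: { unique: true } },
  { type: "Compound", keys: { status: 1, created_at: -1 } },
  // An ordinary index becomes multikey automatically when the field holds arrays.
  { type: "Multikey", keys: { tags: 1 } },
  { type: "Text", keys: { description: "text" } },
  { type: "Geospatial", keys: { location: "2dsphere" } },
  { type: "Hashed", keys: { customer_id: "hashed" } },
  // TTL: documents are removed about an hour after the date stored in last_seen.
  { type: "TTL", keys: { last_seen: 1 }, options: { expireAfterSeconds: 3600 } },
  // Wildcard: indexes every field found under "attributes".
  { type: "Wildcard", keys: { "attributes.$**": 1 } },
];

// Render a spec as the mongosh command it corresponds to.
function toMongoshCommand(collection, spec) {
  const args = [JSON.stringify(spec.keys)];
  if (spec.options) args.push(JSON.stringify(spec.options));
  return `db.${collection}.createIndex(${args.join(", ")})`;
}

for (const spec of indexSpecs) {
  console.log(`${spec.type}: ${toMongoshCommand("products", spec)}`);
}
```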
## Aggregation pipeline
MongoDB's framework for transforming data through a sequence of stages, where each stage consumes the output of the previous one:
```javascript
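db.orders.aggregate([
  // Keep only shipped orders
  { $match: { status: "shipped" } },
  // Sum the order amounts per customer
  { $group: { _id: "$customer_id", total: { $sum: "$amount" } } },
  // Highest total first
  { $sort: { total: -1 } },
  // Return the top 10 customers
  { $limit: 10 }
])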
```
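
To show what each stage of the pipeline above does, here is a rough plain-JavaScript equivalent that runs on an in-memory array instead of a collection. It is a sketch for illustration only: the sample documents and the function `topCustomers` are hypothetical, and a real deployment would let the server execute the pipeline.

```javascript
// Hypothetical sample data standing in for the orders collection.
const orders = [
  { customer_id: "c1", status: "shipped", amount: 40 },
  { customer_id: "c2", status: "shipped", amount: 25 },
  { customer_id: "c1", status: "shipped", amount: 10 },
  { customer_id: "c3", status: "pending", amount: 99 },
];

function topCustomers(docs, limit = 10) {
  const totals = new Map();
  for (const doc of docs) {
    if (doc.status !== "shipped") continue; // $match
    const sum = totals.get(doc.customer_id) ?? 0; // $group by customer_id
    totals.set(doc.customer_id, sum + doc.amount); // $sum of amount
  }
  return [...totals]
    .map(([_id, total]) => ({ _id, total }))
    .sort((a, b) => b.total - a.total) // $sort: { total: -1 }
    .slice(0, limit); // $limit
}

console.log(topCustomers(orders));
```

For this sample the result contains `c1` with a total of 50 followed by `c2` with 25; the pending order of `c3` is filtered out by the match step.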
## Recommendations — where MongoDB is better
| Area | MongoDB | Competition | Why MongoDB |
|------|---------|-------------|-------------|
| **Flexible schema** | Schema-less, changes without migration | PostgreSQL (ALTER TABLE + migration) | Rapid development, MVP, frequent model changes |
| **JSON / documents** | Native BSON, nested objects | PostgreSQL (jsonb, with its own operators and SQL/JSON path syntax instead of `$` operators) | Simpler object mapping from code |
| **Horizontal scaling** | Native sharding (mongos + config) | MySQL (Vitess external) | Built-in, simple to set up |
| **Geo-distribution** | Zoned sharding, replica set per region | Cassandra (AP model, different philosophy) | CP from CAP, consistency + distribution |
| **Aggregation** | Aggregation pipeline, $lookup (LEFT JOIN) | PostgreSQL (SQL JOINs, more powerful) | Useful for denormalized data |
| **Development speed** | ODM libraries (Mongoose), natural JSON | SQL (schema first, migrations) | Fastest time-to-market |
### When to use MongoDB
- **Rapid development / MVP** — schema evolves frequently, no migrations
- **Catalog data** — products with varying attributes (e-commerce, marketplace)
- **Content management** — diverse content (blog, CMS, headless CMS)
- **Real-time analytics** — aggregations, dashboards, event data
- **IoT / sensor data** — diverse message structures
- **Mobile applications** — JSON documents naturally map to API responses
### When to use something else
- **Financial transactions** → PostgreSQL (ACID, referential integrity; MongoDB has supported multi-document transactions since 4.0 but has no foreign keys)
- **Complex reports / JOINs** → PostgreSQL or ClickHouse
- **Relationship data (friends, follows)** → Neo4j (graph DB)
- **High-throughput writes** → Cassandra (AP model, no master bottleneck)
- **Small data, single server** → SQLite (simpler, no daemon)
## MongoDB licensing
MongoDB changed its license in 2018 from GNU AGPL v3 to **SSPL** (Server Side Public License):
| Variant | License | Price | Conditions |
|---------|---------|-------|------------|
| **MongoDB Community** | SSPL | Free | SSPL: if you offer MongoDB as a managed service, you must release the entire stack (incl. orchestration, monitoring) as open source. Internal use without restrictions |
| **MongoDB Enterprise Advanced** | Commercial | ~$10,000/server/year (Atlas: pay-per-use) | Enterprise features (LDAP, Kerberos, auditing, encryption), 24/7 support |
| **MongoDB Atlas** | Managed | Pay-per-use (~$0.10-5.00/hour depending on instance) | Fully managed, multi-cloud, auto-scaling, backup, monitoring |
**Impact**: SSPL resembles the source-available model Redis adopted in 2024: self-hosted internal use is unrestricted, but cloud providers (AWS, Azure) cannot offer MongoDB as a managed service without a commercial agreement. Alternative: **FerretDB** (open-source proxy compatible with the MongoDB wire protocol).
## Sources
References, books, and standards: [sources/databases/sources.en.md](sources/databases/sources.en.md)
*Last revision: 2026-06-03*