🥬 MongoDB
Overview
MongoDB is the most widely used document-oriented NoSQL database. It stores data as BSON (binary JSON) documents with a flexible schema. It suits rapidly developed applications whose schema changes often or whose records vary in structure.
Data model
- Database → Collection → Document (JSON/BSON)
- Document — fields with key-value, nested objects, arrays
- Flexible schema — each document can have different fields (but not recommended)
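- ObjectId — default value of the `_id` primary key (12 bytes: 4-byte timestamp + 5-byte per-process random value + 3-byte counter; versions before 3.4 used a 3-byte machine ID + 2-byte PID in place of the random value)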
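Because the first four bytes of an ObjectId are a Unix timestamp, the creation time can be read straight from the ID. The following is a minimal sketch in plain JavaScript, not the driver's implementation; `parseObjectId` is a hypothetical helper, and the sample ID is just a well-formed example value.

```javascript
// Split a 24-character hex ObjectId string into its three byte groups.
// Assumption: the input is a well-formed 12-byte ObjectId in hex form.
function parseObjectId(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("expected 24 hex characters (12 bytes)");
  }
  // Bytes 0-3: seconds since the Unix epoch, big-endian.
  const seconds = parseInt(hex.slice(0, 8), 16);
  return {
    createdAt: new Date(seconds * 1000),
    // Bytes 4-8: the 5 bytes between timestamp and counter.
    middle: hex.slice(8, 18),
    // Bytes 9-11: 3-byte incrementing counter.
    counter: parseInt(hex.slice(18), 16),
  };
}

const parsed = parseObjectId("507f1f77bcf86cd799439011");
console.log(parsed.createdAt.toISOString()); // 2012-10-17T21:13:27.000Z
console.log(parsed.middle, parsed.counter);
```

In mongosh and the official drivers the same timestamp is available through `ObjectId.getTimestamp()`, so a helper like this is only needed when handling raw ID strings.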
Architecture
mongod (individual node)
├── WiredTiger storage engine (default since 3.2)
│   ├── B-Tree indexes (B-Tree, not LSM)
│   ├── MVCC (snapshot isolation)
│   ├── Compression (zlib, snappy, zstd)
│   └── Cache (WiredTiger internal cache)
├── Replication (replica set)
│   ├── Primary (all writes)
│   └── Secondary (replication, optional reads)
└── Sharding (cluster)
    ├── mongos (router)
    ├── Config servers (metadata)
    └── Shards (replica sets)
Replica set
- Primary node = all writes, secondary = replication (oplog)
- Automatic failover (election among secondaries)
- Up to 50 nodes in a replica set, max 7 voting nodes
- Read preference: primary (default), primaryPreferred, secondary, secondaryPreferred, nearest
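An election succeeds only when a candidate collects votes from a majority of the voting members, which is why the number of voting nodes determines how many failures a replica set can survive. The sketch below works through that arithmetic using the limits from the list above; `electionMath` is a hypothetical helper, not part of any MongoDB API.

```javascript
// Limits taken from the list above.
const MAX_MEMBERS = 50;
const MAX_VOTING = 7;

// Returns how many votes are needed to elect a primary and how many
// voting members may be lost before no primary can be elected.
function electionMath(totalMembers, votingMembers) {
  if (!Number.isInteger(totalMembers) || totalMembers < 1 || totalMembers > MAX_MEMBERS) {
    throw new RangeError(`a replica set has 1 to ${MAX_MEMBERS} members`);
  }
  const votingLimit = Math.min(totalMembers, MAX_VOTING);
  if (!Number.isInteger(votingMembers) || votingMembers < 1 || votingMembers > votingLimit) {
    throw new RangeError(`voting members must be between 1 and ${votingLimit}`);
  }
  const majority = Math.floor(votingMembers / 2) + 1;
  return { majority, faultTolerance: votingMembers - majority };
}

console.log(electionMath(3, 3));  // { majority: 2, faultTolerance: 1 }
console.log(electionMath(4, 4));  // { majority: 3, faultTolerance: 1 }
console.log(electionMath(50, 7)); // { majority: 4, faultTolerance: 3 }
```

Note that going from three to four voting members raises the majority without raising fault tolerance, which is the usual argument for an odd number of voters.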
Sharding
- Shard key = determines how documents are distributed across shards
- Range sharding — adjacent key values land on the same shard (good for range queries, risk of hot spots)
- Hashed sharding — even distribution (good for write throughput, bad for range queries)
- Zoned sharding — data placed according to zones (geo-distribution, compliance)
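The trade-off between range and hashed sharding can be made concrete with a toy simulation. This is a sketch under stated assumptions: four shards, hypothetical split points, and FNV-1a as a stand-in hash (MongoDB's real hashed shard keys use a different hash function). It inserts 1,000 monotonically increasing keys and then checks how many shards a short range query would touch.

```javascript
const SHARDS = 4;

// Range sharding: each shard owns a contiguous key interval.
// Hypothetical split points: shard 0 < 250 <= shard 1 < 500 <= shard 2 < 750 <= shard 3.
const boundaries = [250, 500, 750];
function rangeShard(key) {
  const i = boundaries.findIndex((b) => key < b);
  return i === -1 ? boundaries.length : i;
}

// Stand-in 32-bit FNV-1a hash (an assumption for this demo only).
function fnv1a(text) {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0;
}
function hashedShard(key) {
  return fnv1a(String(key)) % SHARDS;
}

// Count how many keys each shard receives.
function distribute(keys, shardOf) {
  const counts = new Array(SHARDS).fill(0);
  for (const k of keys) counts[shardOf(k)]++;
  return counts;
}

// Number of distinct shards a set of keys lives on.
function shardsTouched(keys, shardOf) {
  return new Set(keys.map(shardOf)).size;
}

// Monotonically increasing keys, e.g. an auto-incrementing order number.
const newKeys = Array.from({ length: 1000 }, (_, i) => 1000 + i);
const rangeCounts = distribute(newKeys, rangeShard);
const hashedCounts = distribute(newKeys, hashedShard);
console.log("range :", rangeCounts);  // every insert hits the last shard
console.log("hashed:", hashedCounts); // inserts spread over all shards

// A range query over 100 consecutive keys.
const scan = newKeys.slice(0, 100);
console.log("shards touched (range) :", shardsTouched(scan, rangeShard));
console.log("shards touched (hashed):", shardsTouched(scan, hashedShard));
```

With range sharding all new writes pile onto one shard (the hot spot), but the range query is served by a single shard; with hashing the writes spread out and the same query has to fan out to every shard.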
Index types
| Type | Description |
|---|---|
| Single field | Standard B-Tree index |
| Compound | Multiple fields in index (order matters) |
| Multikey | Index on array field — each value separately |
| Text | Full-text search |
| Geospatial (2d, 2dsphere) | Geo queries (near, within, intersect) |
| Hashed | For hashed sharding |
| TTL | Automatic document deletion after expiration |
| Wildcard | Index on unknown/irregular fields |
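Each index type in the table corresponds to a particular shape of the key specification passed to `createIndex`. The sketch below lists one plausible specification per type as plain data and prints the matching mongosh command; the collection name `events` and all field names are hypothetical, and `toShellCommand` is an illustrative helper, not a MongoDB function.

```javascript
// One example key specification per index type from the table above.
// Field names are hypothetical.
const indexSpecs = [
  { type: "Single field", keys: { email: 1 }, options: {} },
  // Order matters: this serves queries on customer_id, or customer_id + created_at.
  { type: "Compound", keys: { customer_id: 1, created_at: -1 }, options: {} },
  // No special syntax: the index becomes multikey when `tags` holds arrays.
  { type: "Multikey", keys: { tags: 1 }, options: {} },
  { type: "Text", keys: { body: "text" }, options: {} },
  { type: "Geospatial", keys: { location: "2dsphere" }, options: {} },
  { type: "Hashed", keys: { user_id: "hashed" }, options: {} },
  // TTL: the indexed field must hold a date; documents expire one hour after it.
  { type: "TTL", keys: { created_at: 1 }, options: { expireAfterSeconds: 3600 } },
  // "$**" indexes every field, including ones not known in advance.
  { type: "Wildcard", keys: { "$**": 1 }, options: {} },
];

// Render the mongosh command that would create the index.
function toShellCommand(collection, spec) {
  const args = [JSON.stringify(spec.keys)];
  if (Object.keys(spec.options).length > 0) {
    args.push(JSON.stringify(spec.options));
  }
  return `db.${collection}.createIndex(${args.join(", ")})`;
}

for (const spec of indexSpecs) {
  console.log(`${spec.type}: ${toShellCommand("events", spec)}`);
}
```

The printed commands can be pasted into mongosh against a test collection; in application code the same `keys` and `options` objects would be passed to the driver's `createIndex` call.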
Aggregation pipeline
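The aggregation pipeline is MongoDB's framework for transforming data through a sequence of stages, each of which receives the output of the previous one: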
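db.orders.aggregate([
  { $match: { status: "shipped" } },                                // keep only shipped orders
  { $group: { _id: "$customer_id", total: { $sum: "$amount" } } },  // sum the amounts per customer
  { $sort: { total: -1 } },                                         // largest total first
  { $limit: 10 }                                                    // top 10 customers
])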
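To see what each stage of the pipeline above does, it can help to run the same logic over an in-memory array. This is a plain JavaScript sketch of the equivalent steps, not how the server executes the pipeline; the sample orders and the `topCustomers` helper are made up for illustration.

```javascript
// Hypothetical sample data; field names follow the pipeline above.
const orders = [
  { customer_id: "c1", status: "shipped", amount: 40 },
  { customer_id: "c2", status: "shipped", amount: 25 },
  { customer_id: "c1", status: "pending", amount: 99 },
  { customer_id: "c1", status: "shipped", amount: 10 },
  { customer_id: "c3", status: "shipped", amount: 70 },
];

function topCustomers(docs, limit) {
  // $match: keep only shipped orders.
  const shipped = docs.filter((d) => d.status === "shipped");

  // $group: sum `amount` per customer_id.
  const totals = new Map();
  for (const d of shipped) {
    totals.set(d.customer_id, (totals.get(d.customer_id) ?? 0) + d.amount);
  }

  // $sort (descending by total) followed by $limit.
  return [...totals]
    .map(([_id, total]) => ({ _id, total }))
    .sort((a, b) => b.total - a.total)
    .slice(0, limit);
}

const result = topCustomers(orders, 10);
console.log(result);
// [ { _id: 'c3', total: 70 }, { _id: 'c1', total: 50 }, { _id: 'c2', total: 25 } ]
```

The pending order for `c1` is dropped by the match step before grouping, which is why `c1` totals 50 rather than 149. Placing `$match` first matters in the real pipeline too, since it lets the server use an index and shrink the data before grouping.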
Recommendations — where MongoDB is better
| Area | MongoDB | Competition | Why MongoDB |
|---|---|---|---|
| Flexible schema | Schema-less, changes without migration | PostgreSQL (ALTER TABLE + migration) | Rapid development, MVP, frequent model changes |
| JSON / documents | Native BSON, nested objects | PostgreSQL (jsonb, queried with its own operators and jsonpath rather than MongoDB-style $ operators) | Simpler object mapping from code |
| Horizontal scaling | Native sharding (mongos + config) | MySQL (Vitess external) | Built-in, simple to set up |
| Geo-distribution | Zoned sharding, replica set per region | Cassandra (AP model, different philosophy) | CP from CAP, consistency + distribution |
| Aggregation | Aggregation pipeline, $lookup (LEFT JOIN) | PostgreSQL (SQL JOINs, more powerful) | Useful for denormalized data |
| Development speed | ODM libraries (Mongoose), natural JSON | SQL (schema first, migrations) | Short time-to-market |
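The table describes `$lookup` as a LEFT JOIN: every document from the input collection is kept, and matches from the other collection are attached as an array that is empty when nothing matches. The sketch below emulates that behaviour on in-memory arrays. The parameter names `localField`, `foreignField` and `as` mirror the real stage, but the `lookup` function and the sample data are hypothetical, and the emulation ignores details such as how missing fields are matched.

```javascript
// Emulate the left-outer-join behaviour of $lookup on plain arrays.
function lookup(localDocs, foreignDocs, { localField, foreignField, as }) {
  return localDocs.map((doc) => ({
    ...doc,
    // All foreign documents whose key equals this document's key.
    [as]: foreignDocs.filter((f) => f[foreignField] === doc[localField]),
  }));
}

// Hypothetical sample collections.
const customers = [
  { _id: "c1", name: "Ada" },
  { _id: "c2", name: "Linus" },
];
const orderDocs = [
  { _id: 1, customer_id: "c1", amount: 40 },
  { _id: 2, customer_id: "c9", amount: 15 }, // no matching customer
];

const joined = lookup(orderDocs, customers, {
  localField: "customer_id",
  foreignField: "_id",
  as: "customer",
});
console.log(JSON.stringify(joined, null, 2));
```

Order 2 survives the join with `customer: []`, which is the difference from an inner join. In a real pipeline the array is often flattened afterwards with `$unwind`.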
When to use MongoDB
- Rapid development / MVP — schema evolves frequently, no mandatory schema migrations
- Catalog data — products with varying attributes (e-commerce, marketplace)
- Content management — diverse content (blog, CMS, headless CMS)
- Real-time analytics — aggregations, dashboards, event data
- IoT / sensor data — diverse message structures
- Mobile applications — JSON documents naturally map to API responses
When to use something else
- Financial transactions → PostgreSQL (ACID, referential integrity)
- Complex reports / JOINs → PostgreSQL or ClickHouse
- Relationship data (friends, follows) → Neo4j (graph DB)
- High-throughput writes → Cassandra (AP model, no master bottleneck)
- Small data, single server → SQLite (simpler, no daemon)
MongoDB licensing
MongoDB changed its license in 2018 from GNU AGPL v3 to SSPL (Server Side Public License):
| Variant | License | Price | Conditions |
|---|---|---|---|
| MongoDB Community | SSPL | Free | SSPL: if you offer MongoDB as a managed service, you must release the entire stack (incl. orchestration, monitoring) as open source. Internal use without restrictions |
| MongoDB Enterprise Advanced | Commercial | ~$10,000/server/year (Atlas: pay-per-use) | Enterprise features (LDAP, Kerberos, auditing, encryption), 24/7 support |
| MongoDB Atlas | Managed | Pay-per-use (~$0.10-5.00/hour depending on instance) | Fully managed, multi-cloud, auto-scaling, backup, monitoring |
Impact: SSPL resembles the source-available licensing that Redis adopted in 2024. Self-hosted internal use is unrestricted, but cloud providers (AWS, Azure) cannot offer MongoDB as a managed service unless they sign a commercial agreement or release their entire service stack under SSPL. Alternative: FerretDB (open-source proxy that speaks the MongoDB wire protocol on top of PostgreSQL).
Sources
References, books, and standards: sources/databases/sources.en.md
Last revision: 2026-06-03