
# 🥬 MongoDB

## Overview

MongoDB is a widely used document-oriented NoSQL database. It stores data as BSON (Binary JSON) documents with a flexible schema. It suits rapidly evolving applications whose data model changes often or varies from record to record.

## Data model

- **Database** → Collection → Document (JSON/BSON)
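- **Document** — key-value fields, nested objects, arrays
- **Flexible schema** — documents in one collection can have different fields; in practice a consistent shape is recommended, and it can be enforced with schema validation (`$jsonSchema`)
- **ObjectId** — default type of the `_id` primary key, 12 bytes: 4-byte timestamp + 5-byte random value (per process) + 3-byte incrementing counter. Older versions split the middle 5 bytes into a machine identifier and a process ID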
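The ObjectId byte layout can be illustrated by splitting the 24-character hex form by hand. This is a minimal sketch in plain JavaScript, assuming the current layout (4-byte timestamp, 5-byte random value, 3-byte counter). The function name `parseObjectId` and the sample id are made up for illustration and are not part of any MongoDB API.

```javascript
// Splits a 24-character hex ObjectId string into its three parts.
// Assumed layout: 4-byte timestamp | 5-byte random value | 3-byte counter.
function parseObjectId(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("expected 24 hex characters");
  }
  const seconds = parseInt(hex.slice(0, 8), 16); // bytes 0-3, seconds since Unix epoch
  return {
    createdAt: new Date(seconds * 1000),
    random: hex.slice(8, 18).toLowerCase(),      // bytes 4-8
    counter: parseInt(hex.slice(18, 24), 16),    // bytes 9-11
  };
}

// Hypothetical id, invented for this example.
const parts = parseObjectId("65a1b2c3d4e5f6a7b8c90001");
console.log(parts.createdAt.toISOString(), parts.random, parts.counter);
```

Because the timestamp comes first, ObjectIds generated over time sort roughly by creation time, which matters when `_id` is considered as a shard key.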

## Architecture

```
mongod (individual node)
  ├── WiredTiger storage engine (default since 3.2)
  │   ├── B-Tree indexes (not LSM)
  │   ├── MVCC (snapshot isolation)
  │   ├── Compression (zlib, snappy, zstd)
  │   └── Cache (WiredTiger internal cache)
  ├── Replication (replica set)
  │   ├── Primary (all writes)
  │   └── Secondary (replication, optional reads)
  └── Sharding (cluster)
      ├── mongos (router)
      ├── Config servers (metadata)
      └── Shards (replica sets)
```

### Replica set

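- The primary accepts all writes; secondaries replicate them by applying the primary's oplog
- Automatic failover: when the primary becomes unavailable, the voting members elect a new primary from the eligible secondaries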
- Up to 50 nodes in a replica set, max 7 voting nodes
- Read preference: primary (default), primaryPreferred, secondary, secondaryPreferred, nearest
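An election succeeds only when a strict majority of voting members can reach each other, which is why replica sets are usually sized with an odd number of voters. The arithmetic can be sketched as follows; the function names are illustrative only.

```javascript
// Votes needed to elect a primary: a strict majority of the voting members.
function majorityNeeded(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// Number of voting members that can be unavailable while an election can still succeed.
function faultTolerance(votingMembers) {
  return votingMembers - majorityNeeded(votingMembers);
}

for (const n of [3, 4, 5, 7]) {
  console.log(`${n} voters: majority ${majorityNeeded(n)}, tolerates ${faultTolerance(n)} down`);
}
```

Going from 3 to 4 voters does not improve fault tolerance (both tolerate one failure), so the extra voter adds cost without adding resilience.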

### Sharding

- The shard key determines how documents are distributed across shards
- **Range sharding** — documents with adjacent key values land on the same shard (good for range queries; monotonically increasing keys create hot spots)
- **Hashed sharding** — even distribution (good for write throughput; range queries are scattered across shards)
- **Zoned sharding** — data placed according to zones (geo-distribution, compliance)
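The hot-spot risk of range sharding can be seen in a small simulation. This is a sketch under simplifying assumptions: four shards, fixed-width ranges, and a generic 32-bit integer mixing function standing in for the hash. It is not the hash function MongoDB uses internally, and real clusters split and migrate chunks dynamically.

```javascript
const SHARDS = 4;

// Range sharding: each shard owns a contiguous key interval of width `rangeSize`;
// keys beyond the last boundary fall into the last shard's open-ended range.
function rangeShard(key, rangeSize) {
  return Math.min(Math.floor(key / rangeSize), SHARDS - 1);
}

// Hashed sharding: a hash of the key picks the shard.
// Stand-in integer mixer, NOT MongoDB's internal hash.
function hashedShard(key) {
  let h = key >>> 0;
  h = Math.imul(h ^ (h >>> 16), 0x45d9f3b) >>> 0;
  h = Math.imul(h ^ (h >>> 16), 0x45d9f3b) >>> 0;
  h = (h ^ (h >>> 16)) >>> 0;
  return h % SHARDS;
}

// Counts how many keys each shard receives under a given placement function.
function distribute(keys, pickShard) {
  const counts = new Array(SHARDS).fill(0);
  for (const k of keys) counts[pickShard(k)] += 1;
  return counts;
}

// Monotonically increasing keys, as produced by counters or timestamps.
const newKeys = Array.from({ length: 1000 }, (_, i) => 3000 + i);
const rangeCounts = distribute(newKeys, (k) => rangeShard(k, 1000));
const hashedCounts = distribute(newKeys, hashedShard);
console.log("range: ", rangeCounts);  // all 1000 writes hit the last shard
console.log("hashed:", hashedCounts); // writes are spread over the shards
```

The same property that makes hashing spread writes also explains why a range query on a hashed key has to be sent to every shard.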

## Index types

| Type | Description |
|------|-------------|
| **Single field** | Standard B-Tree index |
| **Compound** | Multiple fields in index (order matters) |
| **Multikey** | Index on an array field; each array element gets its own index entry |
| **Text** | Full-text search |
| **Geospatial (2d, 2dsphere)** | Geo queries (near, within, intersect) |
| **Hashed** | For hashed sharding |
| **TTL** | Automatic document deletion after expiration |
| **Wildcard** | Index on unknown/irregular fields |
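As a rough illustration of the table above, each index type corresponds to a particular key specification passed to `createIndex(keys, options)`. The field names (`email`, `tags`, `location`, and so on) and the helper `applyIndexSpecs` are hypothetical. Note that a collection can hold only one text index.

```javascript
// Each entry is [keys, options] as accepted by createIndex(keys, options).
// All field names are hypothetical.
const indexSpecs = [
  [{ email: 1 }, { unique: true }],              // single field
  [{ status: 1, createdAt: -1 }, {}],            // compound (field order matters)
  [{ tags: 1 }, {}],                             // multikey when `tags` holds an array
  [{ description: "text" }, {}],                 // text
  [{ location: "2dsphere" }, {}],                // geospatial
  [{ customerId: "hashed" }, {}],                // hashed
  [{ expiresAt: 1 }, { expireAfterSeconds: 0 }], // TTL: expire at the stored date
  [{ "attributes.$**": 1 }, {}],                 // wildcard over all subfields
];

// Applies every spec to a collection object that exposes createIndex(keys, options).
function applyIndexSpecs(collection) {
  return indexSpecs.map(([keys, options]) => collection.createIndex(keys, options));
}
```

In `mongosh` this could be called as `applyIndexSpecs(db.products)`. With the Node.js driver `createIndex` returns promises, so the result would need `Promise.all`.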

## Aggregation pipeline

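The aggregation pipeline passes documents through a sequence of stages, each transforming the output of the previous one. The following example returns the ten customers with the highest total amount of shipped orders: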

```javascript
db.orders.aggregate([
  { $match: { status: "shipped" } },
  { $group: { _id: "$customer_id", total: { $sum: "$amount" } } },
  { $sort: { total: -1 } },
  { $limit: 10 }
])
```
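To make the stage semantics concrete, the same pipeline can be mirrored over an in-memory array in plain JavaScript. This is only an explanatory sketch with invented sample data; it is not how the server executes the pipeline.

```javascript
// Hypothetical sample of the `orders` collection.
const orders = [
  { customer_id: "c1", status: "shipped", amount: 40 },
  { customer_id: "c2", status: "shipped", amount: 25 },
  { customer_id: "c1", status: "pending", amount: 99 },
  { customer_id: "c1", status: "shipped", amount: 10 },
  { customer_id: "c3", status: "shipped", amount: 70 },
];

function topCustomers(docs, limit) {
  // $match: keep shipped orders only
  const shipped = docs.filter((d) => d.status === "shipped");

  // $group: one bucket per customer_id, summing amount into total
  const totals = new Map();
  for (const d of shipped) {
    totals.set(d.customer_id, (totals.get(d.customer_id) || 0) + d.amount);
  }

  // $sort by total descending, then $limit
  return [...totals]
    .map(([_id, total]) => ({ _id, total }))
    .sort((a, b) => b.total - a.total)
    .slice(0, limit);
}

console.log(topCustomers(orders, 10));
// [ { _id: 'c3', total: 70 }, { _id: 'c1', total: 50 }, { _id: 'c2', total: 25 } ]
```

Placing `$match` first matters in the real pipeline as well: it reduces the number of documents later stages must process and lets the server use an index on `status`.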

## Recommendations — where MongoDB is better

| Area | MongoDB | Competition | Why MongoDB |
|------|---------|-------------|-------------|
| **Flexible schema** | No enforced schema by default, fields can change without DDL | PostgreSQL (ALTER TABLE + migration) | Rapid development, MVP, frequent model changes |
| **JSON / documents** | Native BSON, nested objects, query operators (`$gt`, `$in`, `$elemMatch`) | PostgreSQL (jsonb with its own operators and JSON path syntax) | Simpler object mapping from code |
| **Horizontal scaling** | Native sharding (mongos + config servers) | MySQL (external tooling such as Vitess) | Built into the server, no third-party layer |
| **Geo-distribution** | Zoned sharding, replica set per region | Cassandra (AP model, different philosophy) | CP in CAP terms: consistency is preferred over availability during a partition |
| **Aggregation** | Aggregation pipeline, `$lookup` (left outer join) | PostgreSQL (SQL JOINs, more powerful) | Useful for denormalized data |
| **Development speed** | ODM libraries (Mongoose), natural JSON | SQL (schema first, migrations) | Short time-to-market |
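The `$lookup` stage mentioned in the table behaves like a left outer join whose matches are collected into an array field. A simplified in-memory sketch of the basic equality form (`from`, `localField`, `foreignField`, `as`) is shown here. The sample documents are invented, and the helper ignores the special handling of arrays and missing fields that the real stage has.

```javascript
// In-memory approximation of:
// { $lookup: { from: "customers", localField: "customer_id",
//              foreignField: "_id", as: "customer" } }
function lookup(localDocs, foreignDocs, localField, foreignField, as) {
  return localDocs.map((doc) => ({
    ...doc,
    // Every local document is kept; unmatched ones get an empty array.
    [as]: foreignDocs.filter((f) => f[foreignField] === doc[localField]),
  }));
}

// Hypothetical sample data.
const customerDocs = [{ _id: "c1", name: "Ada" }, { _id: "c2", name: "Grace" }];
const orderDocs = [
  { _id: 1, customer_id: "c1", amount: 40 },
  { _id: 2, customer_id: "c9", amount: 15 }, // no matching customer
];

const joined = lookup(orderDocs, customerDocs, "customer_id", "_id", "customer");
console.log(JSON.stringify(joined, null, 2));
```

The array-valued result is why a `$lookup` is often followed by `$unwind` when a flat, SQL-like row per match is needed.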

### When to use MongoDB

- **Rapid development / MVP** — schema evolves frequently, usually without formal migrations
- **Catalog data** — products with varying attributes (e-commerce, marketplace)
- **Content management** — diverse content (blog, CMS, headless CMS)
- **Real-time analytics** — aggregations, dashboards, event data
- **IoT / sensor data** — diverse message structures
- **Mobile applications** — JSON documents naturally map to API responses

### When to use something else

- **Financial transactions** → PostgreSQL (ACID, referential integrity; MongoDB has supported multi-document transactions since 4.0 but has no foreign keys)
- **Complex reports / JOINs** → PostgreSQL or ClickHouse
- **Relationship data (friends, follows)** → Neo4j (graph DB)
- **High-throughput writes** → Cassandra (AP model, no master bottleneck)
- **Small data, single server** → SQLite (simpler, no daemon)

## MongoDB licensing

MongoDB changed its license in 2018 from GNU AGPL v3 to **SSPL** (Server Side Public License):

| Variant | License | Price | Conditions |
|---------|---------|-------|------------|
| **MongoDB Community** | SSPL | Free | SSPL: if you offer MongoDB as a managed service, you must release the entire stack (incl. orchestration, monitoring) as open source. Internal use without restrictions |
| **MongoDB Enterprise Advanced** | Commercial | ~$10,000/server/year (indicative) | Enterprise features (LDAP, Kerberos, auditing, encryption), 24/7 support |
| **MongoDB Atlas** | Managed | Pay-per-use (~$0.10-5.00/hour depending on instance) | Fully managed, multi-cloud, auto-scaling, backup, monitoring |

**Impact**: Self-hosted internal use is unrestricted, but cloud providers (e.g. AWS, Azure) cannot offer MongoDB itself as a managed service unless they release their service stack under SSPL or sign a commercial agreement with MongoDB. Redis adopted a similar source-available model in 2024. Alternative: **FerretDB** (open-source proxy that speaks the MongoDB wire protocol and stores data in PostgreSQL).

## Sources

References, books, and standards: [sources/databases/sources.en.md](sources/databases/sources.en.md)

*Last revision: 2026-06-03*