# 🥬 MongoDB

## Overview

MongoDB is the most widespread document-oriented NoSQL database. It stores data as BSON (binary JSON) documents with a flexible schema. It suits rapidly developed applications whose schema changes often or varies between records.

## Data model

- **Database** → Collection → Document (JSON/BSON)
- **Document**: key-value fields, nested objects, arrays
- **Flexible schema**: each document can have different fields (possible, but not recommended)
- **ObjectID**: default primary key (12 bytes: 4-byte timestamp + 5-byte per-process random value + 3-byte counter; older versions used machine ID + PID in place of the random value)

## Architecture

```
mongod (individual node)
├── WiredTiger storage engine (default since 3.2)
│   ├── B-Tree indexes (B-Tree, not LSM)
│   ├── MVCC (snapshot isolation)
│   ├── Compression (zlib, snappy, zstd)
│   └── Cache (WiredTiger internal cache)
├── Replication (replica set)
│   ├── Primary (all writes)
│   └── Secondary (replication, optional reads)
└── Sharding (cluster)
    ├── mongos (router)
    ├── Config servers (metadata)
    └── Shards (replica sets)
```

### Replica set

- The primary node takes all writes; secondaries replicate from it via the oplog
- Automatic failover (the remaining members elect a new primary)
- Up to 50 nodes in a replica set, of which at most 7 can vote
- Read preference: primary (default), primaryPreferred, secondary, secondaryPreferred, nearest

### Sharding

- The shard key determines how data is distributed across shards
- **Range sharding**: adjacent key values land on the same shard (good for range queries, risk of hot spots)
- **Hashed sharding**: even distribution (good for write throughput, bad for range queries)
- **Zoned sharding**: data placed according to zones (geo-distribution, compliance)

## Index types

| Type | Description |
|------|-------------|
| **Single field** | Standard B-Tree index |
| **Compound** | Multiple fields in one index (field order matters) |
| **Multikey** | Index on an array field; each element is indexed separately |
| **Text** | Full-text search |
| **Geospatial (2d, 2dsphere)** | Geo queries (near, within, intersect) |
| **Hashed** | For hashed sharding |
| **TTL** | Automatic document deletion after expiration |
| **Wildcard** | Index on unknown/irregular fields |

## Aggregation pipeline

MongoDB's pipeline framework for data transformations:

```javascript
// Top 10 customers by total amount of shipped orders
db.orders.aggregate([
  { $match: { status: "shipped" } },                                  // keep shipped orders only
  { $group: { _id: "$customer_id", total: { $sum: "$amount" } } },   // sum amount per customer
  { $sort: { total: -1 } },                                           // largest total first
  { $limit: 10 }                                                      // keep the top 10
])
```

## Recommendations: where MongoDB is better

| Area | MongoDB | Competition | Why MongoDB |
|------|---------|-------------|-------------|
| **Flexible schema** | Schema-less, changes without migration | PostgreSQL (ALTER TABLE + migration) | Rapid development, MVP, frequent model changes |
| **JSON / documents** | Native BSON, nested objects | PostgreSQL (jsonb, but without MongoDB-style `$` query operators) | Simpler object mapping from code |
| **Horizontal scaling** | Native sharding (mongos + config servers) | MySQL (Vitess, external) | Built in, simple to set up |
| **Geo-distribution** | Zoned sharding, replica set per region | Cassandra (AP model, different philosophy) | CP in CAP terms: consistency + distribution |
| **Aggregation** | Aggregation pipeline, `$lookup` (left outer join) | PostgreSQL (SQL JOINs, more powerful) | Useful for denormalized data |
| **Development speed** | ODM (Mongoose), natural JSON | SQL (schema first, migrations) | Short time-to-market |

### When to use MongoDB

- **Rapid development / MVP**: schema evolves frequently, no migrations
- **Catalog data**: products with varying attributes (e-commerce, marketplace)
- **Content management**: diverse content (blog, CMS, headless CMS)
- **Real-time analytics**: aggregations, dashboards, event data
- **IoT / sensor data**: diverse message structures
- **Mobile applications**: JSON documents map naturally to API responses

### When to use something else

- **Financial transactions** → PostgreSQL (ACID, referential integrity)
- **Complex reports / JOINs** → PostgreSQL or ClickHouse
- **Relationship data (friends, follows)** → Neo4j (graph DB)
- **High-throughput writes** → Cassandra (AP model, no master bottleneck)
- **Small data, single server** → SQLite (simpler, no daemon)

## MongoDB licensing

MongoDB changed its license in 2018 from GNU AGPL v3 to **SSPL** (Server Side Public License):

| Variant | License | Price | Conditions |
|---------|---------|-------|------------|
| **MongoDB Community** | SSPL | Free | If you offer MongoDB as a managed service, you must release the entire stack (incl. orchestration, monitoring) as open source. Internal use is unrestricted |
| **MongoDB Enterprise Advanced** | Commercial | ~$10,000/server/year (Atlas: pay-per-use) | Enterprise features (LDAP, Kerberos, auditing, encryption), 24/7 support |
| **MongoDB Atlas** | Managed | Pay-per-use (~$0.10-5.00/hour depending on instance) | Fully managed, multi-cloud, auto-scaling, backup, monitoring |

**Impact**: SSPL is similar to the Redis licensing model. Self-hosted internal use is unrestricted, but cloud providers (AWS, Azure) cannot offer MongoDB as a managed service without a commercial agreement. Alternative: **FerretDB** (open source proxy compatible with the MongoDB wire protocol).

## Sources

References, books, and standards: [sources/databases/sources.md](sources/databases/sources.md)

*Last revision: 2026-06-03*
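
As a small illustration of the ObjectID layout from the data model section: the first 4 of the 12 bytes hold the creation time as big-endian Unix seconds, so a creation timestamp can be recovered from the key alone. The sketch below is plain JavaScript and needs no driver; the function name `objectIdTimestamp` and the sample ID are hypothetical, chosen only for illustration.

```javascript
// Minimal sketch: decode the creation time embedded in an ObjectID.
// Assumes the ID is given as its usual 24-character hex string.
// objectIdTimestamp is a hypothetical helper, not part of any MongoDB API.
function objectIdTimestamp(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("expected a 24-character hex string");
  }
  // The first 8 hex characters (4 bytes) are Unix seconds, big-endian.
  const seconds = parseInt(hex.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

// Sample ID made up for illustration.
const created = objectIdTimestamp("507f1f77bcf86cd799439011");
console.log(created.toISOString());
```

Because the timestamp leads the key, default `_id` values sort roughly by creation time, which is also why range-sharding on `_id` tends to send all new inserts to one shard.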