knowledge-base/MESSAGING.en.md
Stanislav Hubacek ef3c2f75b1 18.6.2026
2026-06-18 16:25:33 +02:00

# 📨 Messaging and streaming platforms
## Platform overview
| Platform | Type | Language | Protocol | Persistence | Use case |
|-----------|-----|-------|----------|-------------|----------|
| **Apache Kafka** | Distributed event store | Java/Scala | Binary (TCP) | Disk (log) | Event streaming, data pipeline, log aggregation |
| **RabbitMQ** | Message broker | Erlang | AMQP 0-9-1, MQTT, STOMP | Disk / RAM | Application messaging, task queue, RPC |
| **Apache Pulsar** | Distributed messaging + streaming | Java | Binary (TCP) + REST | Disk (segmented log) | Streaming + queue in one, multi-tenant |
| **NATS** | Lightweight messaging | Go | NATS protocol (TCP) | Memory / JetStream (disk) | Microservices, IoT, edge, low-latency |
| **AWS SQS** | Managed queue | — | HTTPS | Managed | Decoupling services, serverless |
| **AWS SNS** | Managed pub/sub | — | HTTPS, SQS, Lambda, email | Managed | Push notifications, fanout |
| **Azure Service Bus** | Managed messaging | — | AMQP, HTTPS | Managed | Enterprise messaging, sessions, transactions |
| **Google Pub/Sub** | Managed streaming | — | gRPC, REST | Managed | Event-driven, data pipeline |
| **Red Hat AMQ 7** (Artemis) | Message broker | Java | AMQP, MQTT, STOMP, OpenWire | Disk | Enterprise, JMS, high-availability |
| **Oracle Service Bus (OSB)** | Enterprise ESB | Java | HTTP/S, JMS, SOAP, REST, MQ, FTP, AQ | Managed (WebLogic) | Enterprise integration, SOA, protocol mediation, routing |
---
## Platform details
### Apache Kafka
**Architecture:**
```
Producer ──► Topic ──► Partition ──► Consumer Group
├── Partition 0 (Leader) ──► Broker 1
├── Partition 1 (Follower) ──► Broker 2
└── Partition 2 (Follower) ──► Broker 3
```
| Concept | Description |
|---------|-------|
| **Topic** | Logical message category |
| **Partition** | Append-only log, ordered sequence of messages |
| **Broker** | Server in Kafka cluster |
| **Producer** | Publishes messages to topic |
| **Consumer** | Reads messages from partition (within consumer group) |
| **Consumer Group** | Group of consumers that divide a topic's partitions among themselves |
| **Offset** | Position in partition (tracked by consumer) |
| **KRaft** | Raft-based controller quorum that replaces ZooKeeper (production-ready since Kafka 3.3; ZooKeeper mode removed in Kafka 4.0) |
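
The Offset and Consumer Group concepts can be made concrete with a small calculation: consumer lag per partition is the partition's newest offset minus the offset the group has committed. The following is a minimal sketch in plain Python; the function name and the offset snapshot are hypothetical and do not come from any Kafka client library.

```python
def consumer_lag(end_offsets, committed):
    """Per-partition lag: log end offset minus the group's committed offset."""
    return {
        partition: max(end - committed.get(partition, 0), 0)
        for partition, end in end_offsets.items()
    }


# Hypothetical snapshot of a three-partition topic.
end_offsets = {0: 1500, 1: 1200, 2: 900}   # newest offset per partition
committed = {0: 1500, 1: 1100, 2: 400}     # consumer group's position

lag = consumer_lag(end_offsets, committed)
print(lag)                # {0: 0, 1: 100, 2: 500}
print(sum(lag.values()))  # 600
```

A lag that keeps growing on one partition (partition 2 here) usually points at a slow or stuck consumer rather than at the brokers.
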
**Replication and HA:**
| Parameter | Value |
|----------|---------|
| Replication factor | 2–3 (typically 3 for production) |
| ISR (In-Sync Replicas) | Number of replicas keeping up with leader |
| Min ISR | Minimum ISR for acknowledging writes (acks=all) |
| acks=0 | Fire-and-forget (fastest, possible data loss) |
| acks=1 | Write acknowledged by leader (compromise) |
| acks=all | Write acknowledged by all ISR (safest) |
| Leader failover | Automatic election of new leader from ISR |
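
How `acks` and the minimum ISR interact can be sketched as a small decision function. This is a simplified model written for illustration, not broker code: the function names are hypothetical, and real brokers also consider timeouts and leader epochs.

```python
def write_accepted(acks, isr_size, min_insync_replicas):
    """Simplified model of whether a produce request succeeds.

    acks "0" / "1": only the leader matters, so a shrunken ISR does not
                    block the write (at the price of possible data loss).
    acks "all":     the write is rejected when the ISR is smaller than
                    min.insync.replicas.
    """
    if acks in ("0", "1"):
        return isr_size >= 1  # a leader must exist
    if acks == "all":
        return isr_size >= min_insync_replicas
    raise ValueError(f"unknown acks value: {acks!r}")


def tolerated_broker_failures(replication_factor, min_insync_replicas):
    """Replicas that may be lost while acks=all writes still succeed."""
    return replication_factor - min_insync_replicas


print(write_accepted("all", isr_size=2, min_insync_replicas=2))  # True
print(write_accepted("all", isr_size=1, min_insync_replicas=2))  # False
print(write_accepted("1", isr_size=1, min_insync_replicas=2))    # True
print(tolerated_broker_failures(3, 2))                           # 1
```

With replication factor 3 and a minimum ISR of 2, one broker can fail without blocking `acks=all` producers; a second failure makes writes fail rather than silently lose durability.
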
**Important configuration:**
```properties
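# Production (broker defaults; the replication factor can also be set per topic at creation)
default.replication.factor=3
min.insync.replicas=2
# Retention (comments sit on their own lines: .properties files have no trailing comments)
# 7 days
log.retention.hours=168
# -1 = unlimited; set a byte limit if disk space is constrained
log.retention.bytes=-1
# 1 GB per segment
log.segment.bytes=1073741824
# Performance
# Default partition count for new topics; adjust per need (scale-out)
num.partitions=3
# One of: snappy, gzip, lz4, zstd
compression.type=snappy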
```
**Partitioning strategies:**
| Strategy | Key | Advantage | Disadvantage |
|----------|------|--------|----------|
| Round-robin | null | Even distribution | Per-key ordering lost |
| Key-based | user_id, order_id | Same key → same partition | Uneven distribution (hot keys) |
| Custom partitioner | Custom logic | Per use-case optimization | More complex maintenance |
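
The trade-off between the first two strategies can be sketched in a few lines. This is an illustration only: `choose_partition` is a hypothetical helper, and `zlib.crc32` stands in for the murmur2 hash that Kafka's default partitioner uses for keyed messages.

```python
import zlib
from collections import Counter
from itertools import count

NUM_PARTITIONS = 3
_next_slot = count()  # shared round-robin counter for keyless messages


def choose_partition(key, num_partitions=NUM_PARTITIONS):
    """Key-based when a key is present, round-robin when the key is None."""
    if key is None:
        return next(_next_slot) % num_partitions
    # Stand-in hash (assumption): Kafka itself uses murmur2 here.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# The same key always lands on the same partition, preserving per-key order.
assert choose_partition("order-17") == choose_partition("order-17")

# A hot key concentrates load on a single partition.
keys = ["user-1"] * 8 + ["user-2", "user-3"]
load = Counter(choose_partition(k) for k in keys)
print(load.most_common(1))  # the partition that holds "user-1" gets at least 8 messages
```

Because the partition is derived from the hash modulo the partition count, changing the number of partitions later remaps keys, which is worth deciding on before relying on per-key ordering.
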
### RabbitMQ
**Architecture:**
```
Producer ──► Exchange ──► Binding ──► Queue ──► Consumer
┌───────────┼───────────┐
▼ ▼ ▼
Direct Topic Fanout
Exchange Exchange Exchange
```
| Concept | Description |
|---------|-------|
| **Exchange** | Receives messages from producer, routes to queue |
| **Binding** | Exchange → queue link with routing key |
| **Queue** | FIFO message queue (consumed by consumer) |
| **Virtual Host (vhost)** | Tenant isolation within a single cluster |
| **Publisher Confirm** | Broker acknowledges message receipt |
| **Consumer Ack** | Consumer acknowledges message processing |
**Exchange types:**
| Type | Routing | Use case |
|-----|---------|----------|
| **Direct** | routing_key = binding_key | Task queue, point-to-point |
| **Topic** | routing_key matches binding pattern (wildcards: `*` = exactly one word, `#` = zero or more words) | Pub/sub with filtering |
| **Fanout** | All bound queues | Broadcast, event notification |
| **Headers** | AMQP headers match | Complex routing (not routing key dependent) |
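
The routing rules for direct, topic, and fanout exchanges can be sketched without a broker. The code below is a self-contained model, not RabbitMQ's implementation; `topic_matches` and `route` are hypothetical names, and bindings are represented as simple `(binding_key, queue)` pairs.

```python
def topic_matches(pattern, routing_key):
    """Topic-exchange match: '*' is exactly one word, '#' is zero or more words."""
    def match(p, k):
        if not p:
            return not k
        head, rest = p[0], p[1:]
        if head == "#":
            # '#' may absorb anywhere from zero to all remaining words.
            return any(match(rest, k[i:]) for i in range(len(k) + 1))
        if not k:
            return False
        return (head == "*" or head == k[0]) and match(rest, k[1:])

    return match(pattern.split("."), routing_key.split("."))


def route(exchange_type, bindings, routing_key):
    """Return the queues that receive a message with the given routing key."""
    if exchange_type == "fanout":
        return [queue for _, queue in bindings]
    if exchange_type == "direct":
        return [queue for key, queue in bindings if key == routing_key]
    if exchange_type == "topic":
        return [queue for key, queue in bindings if topic_matches(key, routing_key)]
    raise ValueError(f"unsupported exchange type: {exchange_type!r}")


bindings = [("order.*", "q_orders"), ("order.#", "q_audit"), ("#.eu", "q_eu")]
print(route("topic", bindings, "order.created"))     # ['q_orders', 'q_audit']
print(route("topic", bindings, "order.created.eu"))  # ['q_audit', 'q_eu']
print(route("fanout", bindings, "ignored"))          # ['q_orders', 'q_audit', 'q_eu']
```
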
**Queue types:**
```properties
# Classic Queue (not replicated; classic mirrored queues are deprecated and removed in RabbitMQ 4.0)
x-queue-type: classic
# Quorum Queue (recommended for production)
x-queue-type: quorum
x-quorum-initial-group-size: 3
x-dead-letter-exchange: dlx
# Stream Queue (for large backlogs)
x-queue-type: stream
x-max-length-bytes: 1073741824
```
**HA and clustering:**
| Mode | Description | Use case |
|-------|-------|----------|
| **Quorum Queues** | Raft-based replication (3–5 nodes), auto failover | Production, HA messaging |
| **Federation** | Async message forwarding between independent RabbitMQ clusters | Multi-region, DR |
| **Shovel** | Point-to-point message forwarding (Federation at queue level) | Migration, specific routing |
| **Warm Standby (DR)** | Secondary cluster kept in sync with the primary, promoted on failover | Warm DR |
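
Why quorum queues are usually sized with an odd number of members follows from Raft's majority rule. A minimal sketch of the arithmetic, with hypothetical helper names:

```python
def quorum_size(members):
    """Majority needed to elect a leader and confirm writes."""
    return members // 2 + 1


def tolerated_failures(members):
    """Members that can be lost while a majority remains."""
    return members - quorum_size(members)


for members in (3, 4, 5):
    print(members, quorum_size(members), tolerated_failures(members))
# 3 2 1
# 4 3 1
# 5 3 2
```

A fourth member raises the majority to three without tolerating any additional failure, so three or five members are the common choices.
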
### Apache Pulsar
**Unique architecture (compute/storage separation):**
```
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ Producer │ │ Consumer │ │ Consumer │
└──────┬───────┘ └──────┬───────┘ └──────┬───────┘
│ │ │
┌──────▼───────────────────▼───────────────────▼──────┐
│ Broker (stateless) │
│ Subscription: Exclusive / Shared / Failover │
└──────────────────────┬──────────────────────────────┘
┌──────────────────────▼──────────────────────────────┐
│ BookKeeper (stateful storage) │
│ ├── Bookie 1 ├── Bookie 2 ├── Bookie 3 ├── ... │
│ └── Ledger (append-only, segmented log) │
└─────────────────────────────────────────────────────┘
```
| Concept | Description |
|---------|-------|
| **Topic** | Logical category (partitioned or non-partitioned) |
| **Subscription** | Delivery mode (Exclusive, Shared, Failover, Key_Shared) |
| **Ledger** | Storage unit in BookKeeper (append-only) |
| **Bookie** | Storage node (BookKeeper) |
| **Managed Ledger** | Segmented log with cache and retention |
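
The four subscription modes differ mainly in which consumer receives a given message. The sketch below models only that choice; it is not Pulsar client code. `make_dispatcher` is a hypothetical helper, the consumer names are placeholders, and `zlib.crc32` modulo the consumer count is a simplification of the hash-range assignment Pulsar uses for Key_Shared.

```python
import zlib
from itertools import count


def make_dispatcher(mode, consumers):
    """Return a function mapping a message key to the consumer that receives it."""
    turn = count()

    def dispatch(key):
        if mode in ("Exclusive", "Failover"):
            # One active consumer; in Failover the next one takes over if it dies.
            return consumers[0]
        if mode == "Shared":
            # Round-robin across consumers, so no ordering guarantee.
            return consumers[next(turn) % len(consumers)]
        if mode == "Key_Shared":
            # Same key goes to the same consumer, keeping per-key ordering.
            return consumers[zlib.crc32(key.encode("utf-8")) % len(consumers)]
        raise ValueError(f"unknown subscription mode: {mode!r}")

    return dispatch


consumers = ["c1", "c2", "c3"]
shared = make_dispatcher("Shared", consumers)
key_shared = make_dispatcher("Key_Shared", consumers)

print([shared("k") for _ in range(4)])                 # ['c1', 'c2', 'c3', 'c1']
print(key_shared("user-1") == key_shared("user-1"))    # True
```
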
**Advantages over Kafka:**
- Compute/storage separation — independent scaling
- Geo-replication built-in (native)
- Multi-tenant (namespaces, isolation)
- TTL, retry, dead letter topic (built-in)
- At-least-once / effectively-once delivery
### NATS
| Feature | Description |
|---------|-------|
| **Core NATS** | Pub/sub, request-reply, < 1 ms latency |
| **JetStream** | Persistence, exactly-once, key-value store, object store |
| **Leaf nodes** | Hierarchical cluster connection |
| **Super-cluster** | Multi-region clustering (global) |
**Use case:** IoT, edge computing, microservices communication, low-latency messaging.
### Oracle Service Bus (OSB)
Part of Oracle SOA Suite, runs on WebLogic Server. Enterprise service bus for integration in Oracle-heavy environments.
| Concept | Description |
|---------|-------|
| **Proxy Service** | Inbound endpoint (HTTP, JMS, MQ, SOAP, REST) |
| **Business Service** | Target backend service |
| **Pipeline** | Message processing — routing, transformation, validation |
| **Split-Join** | Parallel/sequential orchestration of multiple services |
| **Reporting** | Message tracking, SLA monitoring |
**Key features:**
- **Protocol mediation** — translation between SOAP/REST/JMS/MQ/FTP
- **Message transformation** — XSLT, XQuery, MFL (non-XML)
- **Throttling, SLA, alerting** — built-in
- **Oracle AQ (Advanced Queuing)** — integration with Oracle DB queues
- **XPath, XQuery, XSLT 2.0/3.0** — native support
- **Error handling** — fault policies, error queues, retry
**Use case:** Enterprise SOA, Oracle DB → Kafka bridging, legacy mainframe wrapping, B2B integration.
**Alternatives:** IBM Integration Bus (IIB), MuleSoft Anypoint, WSO2 EI, Apache Camel / ServiceMix.
---
## Platform comparison
### Performance and scaling
| Platform | Max throughput | Latency (P99) | Messages/s (1 broker) | Scaling |
|-----------|--------------|---------------|-------------------------|-----------|
| **Kafka** | > 1 GB/s | 2–10 ms | ~1,000,000 | Partitions (horizontal) |
| **Pulsar** | > 1 GB/s | 5–15 ms | ~1,000,000 | Brokers + Bookies |
| **RabbitMQ** | ~100 MB/s | < 1 ms (RAM) | ~100,000 | Clustering (node) |
| **NATS** | > 10 GB/s | < 0.5 ms | ~10,000,000 | Clustering + Leaf nodes |
| **OSB** | < 1 GB/s | 10–100 ms | ~10,000 | Vertical (WebLogic cluster) |
### Delivery guarantees
| Platform | At most once | At least once | Exactly once | Ordering |
|-----------|-------------|---------------|-------------|----------|
| **Kafka** | Yes | Yes (acks=all + min.insync) | Yes (idempotent + transactional) | Per partition |
| **Pulsar** | Yes | Yes | Yes (dedup + transactional) | Per partition |
| **RabbitMQ** | Yes | Yes (Publisher Confirm + Consumer Ack) | Limited | Per queue |
| **NATS** | Yes | Yes (JetStream) | Limited | Per subject |
| **OSB** | Yes | Yes (XA transactions, exactly-once delivery) | Yes (XA + WS-AT) | Per pipeline |
### When to use what
| Use case | Recommended platform | Reasoning |
|----------|---------------------|------------|
| **Event sourcing / audit log** | Kafka, Pulsar | Append-only log, high throughput, replay |
| **CDC (Change Data Capture)** | Kafka (Kafka Connect + Debezium) | Connector ecosystem |
| **Task queue (job processing)** | RabbitMQ, SQS | Dead letter, retry, priority, scheduling |
| **API messaging / microservices** | NATS, RabbitMQ | Low latency, simplicity |
| **Data pipeline (ETL)** | Kafka (ksqlDB, Kafka Streams) | Stream processing in platform |
| **IoT / Edge** | NATS, MQTT (RabbitMQ) | Lightweight, leaf nodes |
| **Enterprise SOA / EAI** | OSB, IBM IIB, MuleSoft | Protocol mediation, XA, B2B, legacy wrapping |
| **Multi-tenant cloud** | Pulsar | Native multi-tenant, geo-replication |
| **Serverless / event-driven** | SQS/SNS, Pub/Sub | Managed, auto-scaling |
---
## DR and high availability
See [DATACENTERS.en.md](DATACENTERS.en.md) — section "Impact of individual technologies on DC topology selection" for detailed DR mapping per platform.
### Best practices
- **Don't lose messages in queue** — prefer acknowledgement-based consumption (not auto-ack)
- **Dead letter queue** — every main queue has a DLQ for undeliverable messages
- **Monitor lag** — consumer lag is a key metric (Kafka: `records-lag-max` under the JMX MBean `kafka.consumer:type=consumer-fetch-manager-metrics`)
- **Idempotent consumer** — same message may be delivered twice
- **Retry with backoff** — exponential backoff on processing failure
- **Schema registry** — avoid deserialization errors (Avro, Protobuf, JSON Schema)
- **Encryption** — TLS in transit, encryption at rest (Kafka: cluster-side + topic-level)
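
Several of the practices above (explicit acknowledgement, dead letter queue, idempotent consumer, retry with backoff) can be combined in one consumer wrapper. The following is a broker-agnostic sketch: the class, its status strings, and the in-memory `seen` set and `dead_letters` list are assumptions made for illustration. A real deployment would keep processed IDs in a durable store and publish failures to an actual DLQ.

```python
import time


def backoff_delays(max_attempts, base=0.5, cap=30.0):
    """Exponential backoff: base * 2**attempt seconds, capped at `cap`."""
    return [min(cap, base * 2 ** attempt) for attempt in range(max_attempts)]


class IdempotentConsumer:
    def __init__(self, handler, max_attempts=3, sleep=time.sleep):
        self.handler = handler
        self.max_attempts = max_attempts
        self.sleep = sleep
        self.seen = set()        # IDs of messages already processed
        self.dead_letters = []   # stand-in for a dead letter queue

    def on_message(self, msg_id, payload):
        """Process one delivery; every returned status means 'safe to ack'."""
        if msg_id in self.seen:
            return "duplicate"   # redelivery: ack without reprocessing
        delays = backoff_delays(self.max_attempts)
        for attempt in range(self.max_attempts):
            try:
                self.handler(payload)
            except Exception:
                if attempt < self.max_attempts - 1:
                    self.sleep(delays[attempt])  # wait before the next try
                continue
            self.seen.add(msg_id)
            return "processed"
        self.dead_letters.append((msg_id, payload))
        return "dead-lettered"


processed = []
consumer = IdempotentConsumer(processed.append, sleep=lambda seconds: None)
print(consumer.on_message("m1", "hello"))  # processed
print(consumer.on_message("m1", "hello"))  # duplicate
print(backoff_delays(4))                   # [0.5, 1.0, 2.0, 4.0]
```

Acknowledging only after `on_message` returns keeps the at-least-once guarantee, while the ID check makes redeliveries harmless.
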
---
## Related
- [DATACENTERS.en.md](DATACENTERS.en.md) — DR topology, per-platform mapping
- [CLOUD.en.md](CLOUD.en.md) — managed messaging (SQS, SNS, Service Bus, Pub/Sub)
## Sources
Links, books, and standards: [sources/infrastructure/sources.en.md](sources/infrastructure/sources.en.md)
*Last revision: 2026-06-12*