Files
knowledge-base/MESSAGING.en.md
Stanislav Hubacek b53714113c new files
2026-06-16 15:47:45 +02:00

12 KiB
Raw Blame History

📨 Messaging and streaming platforms

Platform overview

Platform Type Language Protocol Persistence Use case
Apache Kafka Distributed event store Java/Scala Binary (TCP) Disk (log) Event streaming, data pipeline, log aggregation
RabbitMQ Message broker Erlang AMQP 0-9-1, MQTT, STOMP Disk / RAM Application messaging, task queue, RPC
Apache Pulsar Distributed messaging + streaming Java Binary (TCP) + REST Disk (segmented log) Streaming + queue in one, multi-tenant
NATS Lightweight messaging Go NATS protocol (TCP) Memory / JetStream (disk) Microservices, IoT, edge, low-latency
AWS SQS Managed queue HTTPS Managed Decoupling services, serverless
AWS SNS Managed pub/sub HTTPS, SQS, Lambda, email Managed Push notifications, fanout
Azure Service Bus Managed messaging AMQP, HTTPS Managed Enterprise messaging, sessions, transactions
Google Pub/Sub Managed streaming gRPC, REST Managed Event-driven, data pipeline
Red Hat AMQ 7 (Artemis) Message broker Java AMQP, MQTT, STOMP, OpenWire Disk Enterprise, JMS, high-availability
Oracle Service Bus (OSB) Enterprise ESB Java HTTP/S, JMS, SOAP, REST, MQ, FTP, AQ Managed (WebLogic) Enterprise integration, SOA, protocol mediation, routing

Platform details

Apache Kafka

Architecture:

Producer ──► Topic ──► Partition ──► Consumer Group
                │
                ├── Partition 0 (Leader) ──► Broker 1
                ├── Partition 1 (Follower) ──► Broker 2
                └── Partition 2 (Follower) ──► Broker 3
Concept Description
Topic Logical message category
Partition Append-only log, ordered sequence of messages
Broker Server in Kafka cluster
Producer Publishes messages to topic
Consumer Reads messages from partition (within consumer group)
Consumer Group Group of consumers sharing topic reading
Offset Position in partition (tracked by consumer)
KRaft Controller quorum (replaces Zookeeper from Kafka 3.x)

Replication and HA:

Parameter Value
Replication factor 23 (typically 3 for production)
ISR (In-Sync Replicas) Number of replicas keeping up with leader
Min ISR Minimum ISR for acknowledging writes (acks=all)
acks=0 Fire-and-forget (fastest, possible data loss)
acks=1 Write acknowledged by leader (compromise)
acks=all Write acknowledged by all ISR (safest)
Leader failover Automatic election of new leader from ISR

Important configuration:

# Production
replication.factor=3
min.insync.replicas=2
default.replication.factor=3

# Retention
log.retention.hours=168     # 7 days
log.retention.bytes=-1      # unlimited (or limit)
log.segment.bytes=1073741824 # 1 GB per segment

# Performance
num.partitions=3            # adjust per need (scale-out)
compression.type=snappy     # (snappy, gzip, lz4, zstd)

Partitioning strategies:

Strategy Key Advantage Disadvantage
Round-robin null Even distribution Per-key ordering lost
Key-based user_id, order_id Same key → same partition Uneven distribution (hot keys)
Custom partitioner Custom logic Per use-case optimization More complex maintenance

RabbitMQ

Architecture:

Producer ──► Exchange ──► Binding ──► Queue ──► Consumer
                  │
      ┌───────────┼───────────┐
      ▼           ▼           ▼
  Direct      Topic      Fanout
  Exchange   Exchange   Exchange
Concept Description
Exchange Receives messages from producer, routes to queue
Binding Exchange → queue link with routing key
Queue FIFO message queue (consumed by consumer)
Virtual Host (vhost) Tenant isolation within a single cluster
Publisher Confirm Broker acknowledges message receipt
Consumer Ack Consumer acknowledges message processing

Exchange types:

Type Routing Use case
Direct routing_key = binding_key Task queue, point-to-point
Topic routing_key match binding pattern (wildcard *, #) Pub/sub with filtering
Fanout All bound queues Broadcast, event notification
Headers AMQP headers match Complex routing (not routing key dependent)

Queue types:

# Classic Queue (deprecated in production)
x-queue-type: classic

# Quorum Queue (recommended for production)
x-queue-type: quorum
x-quorum-initial-group-size: 3
x-dead-letter-exchange: dlx

# Stream Queue (for large backlogs)
x-queue-type: stream
x-max-length-bytes: 1073741824

HA and clustering:

Mode Description Use case
Quorum Queues Raft-based replication (35 node), auto failover Production, HA messaging
Federation Async message forwarding between independent RabbitMQ clusters Multi-region, DR
Shovel Point-to-point message forwarding (Federation at queue level) Migration, specific routing
Warm Standby (DR) Secondary cluster, started on failover Cold DR

Apache Pulsar

Unique architecture (compute/storage separation):

┌──────────────┐    ┌──────────────┐    ┌──────────────┐
│  Producer    │    │  Consumer    │    │  Consumer    │
└──────┬───────┘    └──────┬───────┘    └──────┬───────┘
       │                   │                   │
┌──────▼───────────────────▼───────────────────▼──────┐
│               Broker (stateless)                    │
│         Subscription: Exclusive / Shared / Failover │
└──────────────────────┬──────────────────────────────┘
                       │
┌──────────────────────▼──────────────────────────────┐
│           BookKeeper (stateful storage)              │
│  ├── Bookie 1  ├── Bookie 2  ├── Bookie 3  ├── ... │
│  └── Ledger (append-only, segmented log)            │
└─────────────────────────────────────────────────────┘
Concept Description
Topic Logical category (partitioned or non-partitioned)
Subscription Delivery mode (Exclusive, Shared, Failover, Key_Shared)
Ledger Storage unit in BookKeeper (append-only)
Bookie Storage node (BookKeeper)
Managed Ledger Segmented log with cache and retention

Advantages over Kafka:

  • Compute/storage separation — independent scaling
  • Geo-replication built-in (native)
  • Multi-tenant (namespaces, isolation)
  • TTL, retry, dead letter topic (built-in)
  • Read-at-least-once / effectively-once

NATS

Feature Description
Core NATS Pub/sub, request-reply, < 1 ms latency
JetStream Persistence, exactly-once, key-value store, object store
Leaf nodes Hierarchical cluster connection
Super-cluster Multi-region clustering (global)

Use case: IoT, edge computing, microservices communication, low-latency messaging.

Oracle Service Bus (OSB)

Part of Oracle SOA Suite, runs on WebLogic Server. Enterprise service bus for integration in Oracle-heavy environments.

Concept Description
Proxy Service Inbound endpoint (HTTP, JMS, MQ, SOAP, REST)
Business Service Target backend service
Pipeline Message processing — routing, transformation, validation
Split-Join Parallel/sequential orchestration of multiple services
Reporting Message tracking, SLA monitoring

Key features:

  • Protocol mediation — translation between SOAP/REST/JMS/MQ/FTP
  • Message transformation — XSLT, XQuery, MFL (non-XML)
  • Throttling, SLA, alerting — built-in
  • Oracle AQ (Advanced Queuing) — integration with Oracle DB queues
  • XPath, XQuery, XSLT 2.0/3.0 — native support
  • Error handling — fault policies, error queues, retry

Use case: Enterprise SOA, Oracle DB → Kafka bridging, legacy mainframe wrapping, B2B integration.

Alternatives: IBM Integration Bus (IIB), MuleSoft Anypoint, WSO2 EI, Apache Camel / ServiceMix.


Platform comparison

Performance and scaling

Platform Max throughput Latency (P99) Messages/s (1 broker) Scaling
Kafka > 1 GB/s 210 ms ~1,000,000 Partitions (horizontal)
Pulsar > 1 GB/s 515 ms ~1,000,000 Brokers + Bookies
RabbitMQ ~100 MB/s < 1 ms (RAM) ~100,000 Clustering (node)
NATS > 10 GB/s < 0.5 ms ~10,000,000 Clustering + Leaf nodes
OSB < 1 GB/s 10100 ms ~10,000 Vertical (WebLogic cluster)

Delivery guarantees

Platform At most once At least once Exactly once Ordering
Kafka Yes Yes (acks=all + min.insync) Yes (idempotent + transactional) Per partition
Pulsar Yes Yes Yes (dedup + transactional) Per partition
RabbitMQ Yes Yes (Publisher Confirm + Consumer Ack) Limited Per queue
NATS Yes Yes (JetStream) Limited Per subject
OSB Yes Yes (XA transactions, exactly-once delivery) Yes (XA + WS-AT) Per pipeline

When to use what

Use case Recommended platform Reasoning
Event sourcing / audit log Kafka, Pulsar Append-only log, high throughput, replay
CDC (Change Data Capture) Kafka (Kafka Connect + Debezium) Connector ecosystem
Task queue (job processing) RabbitMQ, SQS Dead letter, retry, priority, scheduling
API messaging / microservices NATS, RabbitMQ Low latency, simplicity
Data pipeline (ETL) Kafka (KSQL, Kafka Streams) Stream processing in platform
IoT / Edge NATS, MQTT (RabbitMQ) Lightweight, leaf nodes
Enterprise SOA / EAI OSB, IBM IIB, MuleSoft Protocol mediation, XA, B2B, legacy wrapping
Multi-tenant cloud Pulsar Native multi-tenant, geo-replication
Serverless / event-driven SQS/SNS, Pub/Sub Managed, auto-scaling

DR and high availability

See DATACENTERS.en.md — section "Impact of individual technologies on DC topology selection" for detailed DR mapping per platform.

Best practices

  • Don't lose messages in queue — prefer acknowledgement-based consumption (not auto-ack)
  • Dead letter queue — every main queue has a DLQ for undeliverable messages
  • Monitor lag — consumer lag is a key metric (Kafka: kafka.consumer:consumer_lag)
  • Idempotent consumer — same message may be delivered twice
  • Retry with backoff — exponential backoff on processing failure
  • Schema registry — avoid deserialization errors (Avro, Protobuf, JSON Schema)
  • Encryption — TLS in transit, encryption at rest (Kafka: cluster-side + topic-level)

Sources

Links, books, and standards: sources/infrastructure/sources.md

Last revision: 2026-06-12