Apache Kafka

David Liu edited this page Jun 16, 2025 · 22 revisions

Apache Kafka is an open-source stream-processing platform originally developed at LinkedIn and donated to the Apache Software Foundation.

  • Written in Scala and Java

Figures: kafka-internal-produce, kafka-internal-consume

No animal, please.

ZK design

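  • ZooKeeper stores Kafka's metadata, including:
    • Topic: partition count, replicas
    • Broker: address, health status
    • Consumer group: registration, offsets
  • When a new broker joins the cluster or an existing broker fails, the other nodes fetch the latest broker information from ZooKeeper and adjust accordingly.
  • Consistency: ZooKeeper applies updates in a single global order, and its watch mechanism notifies nodes of changes, so all nodes in the cluster converge on the same view of the metadata.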
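
The watch-and-refresh pattern described above can be sketched in a few lines. This is a toy in-memory stand-in, not ZooKeeper's client API: TinyRegistry and BrokerView are hypothetical names, and the only real detail borrowed is that classic ZooKeeper watches fire once and must be re-registered, and that Kafka brokers register under /brokers/ids.

```python
from collections import defaultdict


class TinyRegistry:
    """Hypothetical in-memory stand-in for a coordination service (not ZooKeeper's API)."""

    def __init__(self):
        self._children = defaultdict(dict)  # path -> {child name: data}
        self._watches = defaultdict(list)   # path -> one-shot callbacks

    def get_children(self, path, watch=None):
        # Reading may also leave a one-shot watch on the path.
        if watch is not None:
            self._watches[path].append(watch)
        return sorted(self._children[path])

    def _fire(self, path):
        # Clear the watches before calling them: each watch fires at most once.
        callbacks, self._watches[path] = self._watches[path], []
        for callback in callbacks:
            callback(path)

    def register(self, path, name, data):
        self._children[path][name] = data
        self._fire(path)

    def unregister(self, path, name):
        self._children[path].pop(name, None)
        self._fire(path)


class BrokerView:
    """A node that keeps its list of live brokers up to date via watches."""

    def __init__(self, registry, path="/brokers/ids"):
        self.registry = registry
        self.live = []
        self.refresh(path)

    def refresh(self, path):
        # Re-read the children and re-register the watch in one step.
        self.live = self.registry.get_children(path, watch=self.refresh)
```

Because the view re-reads and re-registers in the same call, it never misses a later change, although it may skip intermediate states if several changes happen between notifications.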

KRaft

  • Metadata Log
    • Where the cluster metadata is stored
    • Like a Kafka message log, the metadata log is persistent and sequential (append-only)
  • Offsets are stored in Kafka directly
  • Consumer group coordination is still managed by the Controller node, but changes to metadata are replicated via the Raft protocol
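
To make the "persistent and sequential" idea concrete, here is a minimal sketch of an append-only metadata log whose current state is derived by replaying records in order. The record types (register_broker, unregister_broker, create_topic) are made up for illustration and are not KRaft's actual record schema; replication via Raft is left out entirely.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataLog:
    """Append-only list of metadata records (hypothetical, in-memory only)."""
    records: list = field(default_factory=list)

    def append(self, record):
        self.records.append(record)
        return len(self.records) - 1  # offset of the new record


def replay(log, upto=None):
    """Rebuild cluster state by applying records in offset order.

    `upto` is an inclusive offset; None means replay the whole log.
    """
    state = {"brokers": {}, "topics": {}}
    end = len(log.records) if upto is None else upto + 1
    for record in log.records[:end]:
        kind = record["type"]
        if kind == "register_broker":
            state["brokers"][record["id"]] = record["address"]
        elif kind == "unregister_broker":
            state["brokers"].pop(record["id"], None)
        elif kind == "create_topic":
            state["topics"][record["name"]] = record["partitions"]
    return state
```

Any node that has the same log prefix computes the same state, which is why replicating the log is enough to replicate the metadata.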

Kafka Streams

A fluent, functional Java API (shipped as a library) for complex operations such as:

  • Grouping a stream by a key
  • Joining streams
  • Turning a compacted topic into a table
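
The three operations above can be illustrated without the library. The following is a plain-Python sketch of the semantics only, not the Kafka Streams API (which is Java); the function names are hypothetical, and records are modelled as (key, value) tuples with None standing in for a tombstone.

```python
from collections import defaultdict


def to_table(changelog):
    """Fold a compacted-topic changelog into a table: the latest value per key wins."""
    table = {}
    for key, value in changelog:
        if value is None:          # tombstone: the key was deleted
            table.pop(key, None)
        else:
            table[key] = value
    return table


def group_by_key(stream):
    """Collect the values of a stream per key, preserving arrival order."""
    groups = defaultdict(list)
    for key, value in stream:
        groups[key].append(value)
    return dict(groups)


def join_stream_with_table(stream, table):
    """Inner join: enrich each stream record with the table row for its key."""
    return [(key, (value, table[key])) for key, value in stream if key in table]


users = [("u1", "Ann"), ("u2", "Bob"), ("u1", "Anna"), ("u2", None)]
clicks = [("u1", "/home"), ("u2", "/cart"), ("u1", "/pay")]

user_table = to_table(users)
clicks_by_user = group_by_key(clicks)
enriched = join_stream_with_table(clicks, user_table)
```

Here user_table keeps only the latest name for u1 and drops u2 because of the tombstone, so the join emits records for u1 only.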

Sink

Provided by

Kafka broker

A Kafka broker, also known as a node, is a server in the cluster that receives and sends data.

  • A Kafka cluster is a group of multiple Kafka brokers.
  • Each Kafka broker is identified with an ID (integer).
  • Topic partition data is distributed (load balanced) across the brokers; each broker holds a subset of the topic partitions.
  • After connecting to any broker (the bootstrap broker), a client can reach the entire cluster.
  • Three brokers is a good number to start with; there is no fixed limit on how many brokers a cluster can have.
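
As a rough illustration of how partitions end up spread over brokers, the sketch below assigns replicas round-robin. It is a simplification under stated assumptions: assign_partitions is a hypothetical helper, and Kafka's real assignment also takes other factors (such as a randomized starting broker and rack awareness) into account.

```python
def assign_partitions(broker_ids, num_partitions, replication_factor=1):
    """Spread partition replicas over brokers round-robin.

    Returns {partition: [broker ids]}, where the first id in each list is
    treated as the preferred leader in this sketch.
    """
    if replication_factor > len(broker_ids):
        raise ValueError("replication factor cannot exceed the number of brokers")
    brokers = sorted(broker_ids)
    assignment = {}
    for partition in range(num_partitions):
        assignment[partition] = [
            brokers[(partition + replica) % len(brokers)]
            for replica in range(replication_factor)
        ]
    return assignment


layout = assign_partitions([1, 2, 3], num_partitions=6, replication_factor=2)
```

With three brokers and six partitions, each broker leads two partitions and holds four replicas in total, which is the load-balancing effect described above.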

Vendors

Confluent

ksqlDB is well worth checking out for developers looking to build streaming applications while taking advantage of their familiarity with relational databases.

Strimzi provides a way to run an Apache Kafka cluster on Kubernetes in various deployment configurations.

bitnami

kafka-ui

  • IBM Event Streams is a high-throughput message bus built with Apache Kafka.
  • The Lite plan (free) is available in the Dallas region (us-south)
    • Offers access to 1 partition in a multi-tenant Event Streams cluster.

Oracle

  • OCI Streaming Service
  • Transactional Event Queues in Oracle Database

AWS

  • Amazon Managed Streaming for Apache Kafka (MSK)