Terminology

Broker

Brokers are the main component of Kafka. They are responsible simply for storing and retrieving messages. They know nothing about the format of the messages they store, and therefore cannot do things like message validation.

Zookeeper

Zookeeper is another Apache product that Kafka uses as a reliable key/value store and for leader elections. Note: Kafka may eventually get rid of Zookeeper and manage this data itself.

Cluster

A cluster is a collection of one or more Kafka Brokers plus (for now) Zookeeper servers. A producer or consumer is typically associated with a cluster via a configuration made up of a few major components (sketched in code after this list):

Bootstrap Servers: a list of known Kafka Brokers given as the initial entry point into the cluster. These bootstrap servers are aware of the other servers in the cluster and pass the appropriate information on to connecting producers or consumers.

Security Configuration: there are a variety of ways to authenticate and authorize actions within a Kafka cluster. Grafka allows you to pass text configuration covering many of the available options, but there is currently no way to pass external files for things like JKS keystores or JAAS configs.
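For illustration, here is a minimal sketch of what those two components might look like when building a Java client's configuration. The host names, mechanism, and credentials are hypothetical placeholders, and SASL/PLAIN over TLS is only one of the available security options:

```java
import java.util.Properties;

public class ClusterConfigExample {
    public static Properties clusterProperties() {
        Properties props = new Properties();
        // Bootstrap servers: any known brokers; the client discovers the rest of the cluster from them.
        props.put("bootstrap.servers", "broker1.example.com:9092,broker2.example.com:9092");
        // One possible security setup (SASL over TLS); many other mechanisms exist.
        props.put("security.protocol", "SASL_SSL");
        props.put("sasl.mechanism", "PLAIN");
        props.put("sasl.jaas.config",
                "org.apache.kafka.common.security.plain.PlainLoginModule required "
                + "username=\"alice\" password=\"secret\";");
        return props;
    }
}
```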

Producers

A producer is any entity that is able to push messages to Kafka. Many languages have client library support for producers, and there are command line tools for pushing messages as well.

Kafka Brokers will not block inappropriate messages, so it's recommended that you use a mature client library that builds these protections in.

It's common for applications to be both producers and consumers.
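As a rough sketch of the producer side using the official Java client (the broker address, topic name, key, and value are placeholders):

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ProducerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // hypothetical broker
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        // try-with-resources flushes buffered messages and closes the producer
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // The broker stores the bytes as-is; serialization and any validation happen client-side.
            producer.send(new ProducerRecord<>("example-topic", "some-key", "some value"));
        }
    }
}
```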

Consumers

Consumers read messages from Kafka brokers. One key element of Kafka is that consumer offsets (essentially a pointer to the last message read) are stored centrally in the Kafka cluster itself.
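A minimal sketch of a Java consumer; the broker address, group id, and topic are placeholders, and auto-commit is disabled here only to make the offset commit explicit:

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ConsumerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // hypothetical broker
        props.put("group.id", "example-group");              // offsets are tracked per group
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("enable.auto.commit", "false");            // commit offsets explicitly below

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("example-topic"));
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("offset=%d key=%s value=%s%n",
                        record.offset(), record.key(), record.value());
            }
            // The committed offset is stored in the cluster itself, so a restarted
            // consumer in the same group resumes where it left off.
            consumer.commitSync();
        }
    }
}
```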

Consumer Groups

Consumers with the same groupId are part of the same consumer group, and the retrieved messages are balanced across them. This makes scaling easy: you can dynamically add new consumers to the group, and Kafka takes care of splitting up the work appropriately.
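One way to see that balancing is to ask the cluster how a group's partitions are currently assigned. Below is a sketch using the Java AdminClient, assuming a hypothetical group named example-group:

```java
import java.util.Collections;
import java.util.Properties;
import java.util.concurrent.ExecutionException;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.ConsumerGroupDescription;

public class ConsumerGroupExample {
    public static void main(String[] args) throws ExecutionException, InterruptedException {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // hypothetical broker
        try (AdminClient admin = AdminClient.create(props)) {
            // Show which partitions Kafka has assigned to each member of "example-group".
            ConsumerGroupDescription description = admin
                    .describeConsumerGroups(Collections.singletonList("example-group"))
                    .describedGroups()
                    .get("example-group")
                    .get();
            description.members().forEach(member ->
                    System.out.println(member.consumerId() + " -> "
                            + member.assignment().topicPartitions()));
        }
    }
}
```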

Topic

A topic is a named queue of messages. Topics have many possible configurations, dealing with things like replication, acknowledgements (how to confirm messages have been received), and message retention.
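As a sketch, a topic with some of those configurations can be created via the Java AdminClient; the topic name, partition and replication counts, and retention value here are arbitrary examples:

```java
import java.util.Collections;
import java.util.Map;
import java.util.Properties;
import java.util.concurrent.ExecutionException;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class TopicExample {
    public static void main(String[] args) throws ExecutionException, InterruptedException {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // hypothetical broker
        try (AdminClient admin = AdminClient.create(props)) {
            // 6 partitions, replicated to 3 brokers, messages retained for 7 days.
            NewTopic topic = new NewTopic("example-topic", 6, (short) 3)
                    .configs(Map.of("retention.ms", "604800000"));
            admin.createTopics(Collections.singletonList(topic)).all().get();
        }
    }
}
```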

Message

A message consists of a few different parts: an optional key, a timestamp, and finally the value, which holds the message payload. A message may be encoded in a variety of ways, and it's up to the producers and consumers to agree on this encoding.
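A small sketch showing those parts spelled out when constructing a record with the Java client (the topic, key, and payload are made up):

```java
import org.apache.kafka.clients.producer.ProducerRecord;

public class MessageExample {
    public static void main(String[] args) {
        // A record with its parts spelled out: topic, partition (optional), timestamp,
        // optional key, and the value carrying the payload.
        ProducerRecord<String, String> record = new ProducerRecord<>(
                "example-topic",                 // topic
                null,                            // partition (null = let Kafka choose)
                System.currentTimeMillis(),      // timestamp
                "order-42",                      // optional key
                "{\"status\":\"shipped\"}");     // value: the payload, encoded however producer/consumer agree
        System.out.println(record);
    }
}
```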

Kafka Connect

Connect is a framework built on top of a Kafka cluster that provides the ability to create and run basic producer/consumer functionality for common tasks.

Schema Registry

It's common for Kafka messages to be encoded using Apache Avro. This reduces message size, provides a built-in mechanism for schema evolution, and adds flexibility in terms of delivery. This strategy, however, relies on producers and consumers agreeing on a known schema. The schema registry is responsible for maintaining versioned copies of these schemas, so that producers and consumers can stay in sync.
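As a hedged sketch, producer-side configuration for Avro with Confluent's Schema Registry might look roughly like this; the registry URL is a placeholder, and the serializer class comes from the separate kafka-avro-serializer dependency, not from Kafka itself:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;

public class AvroProducerConfigExample {
    public static KafkaProducer<String, Object> createProducer() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // hypothetical broker
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // Serializes values with Avro, registering/looking up schemas in the registry.
        props.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
        props.put("schema.registry.url", "http://schema-registry.example.com:8081");
        return new KafkaProducer<>(props);
    }
}
```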

Kafka Brokers are not aware of the Schema Registry, so it's possible (but uncommon) for producers and consumers to use multiple different schema registries. Grafka currently expects the schema registry config and cluster config to be associated 1:1, which is not a requirement of Kafka. See ticket #15 for more info.
