Kafka is a scalable distributed message queue. This means, it allows producers to push high volume of messages to a queue and consumers to pull messages from the queue. In this post I will describe Kafka’s key abstractions and architecture principles.
A Kafka cluster is composed from nodes which are called brokers. Producer client connects to a broker and push a message. Internally, the broker create a sequence number per message, append the message to a log file located in the node’s local hard disk. For efficient fetch of a message – the brokers manage an offset table with a sequence number as the key and value of the seek location of the message in the log file and its size. In Kafka, the consumer client is responsible to track which messages he already consumed (using zookeeper) and not the server, it request a fix number of (or all) updates since the last message he consumed. Other message queues approach this tracking differently – their server track this data for each of their consumers. In Kafka, After a message is consumed by a client it is not removed by the server so other consumers can consume it later including the consumer that just consumed it. This can be helpful in case data need to be processed again (replay) by the consumer after some fault in the first processing. The messages will be kept for some period . To emphasis – the queue is actually implemented by continuous log file on node’s local disk.
You might wonder at this point how Kafka manages efficient reads and writes to local disk. Kafka interaction with the local disk has two characteristic: 1) it appends new writes to an open file(i.e., the message-log-file) and 2) it usually reads continues area from disk. This characteristic lets Kafka rely on OS disk cache for optimizing its reads from and writes to disk. The OS cache is highly efficient, always enabled and it’s optimized for appends calls and for multi reads from a continues area on disk. More advantages of using OS level cache is that the Kafka’s processes uses little JVM’s heap space so avoiding all GC related issues, and in case the broker restart its does not need to run any cache-warm-up as the OS cache is not cleared when process terminates. Another optimization strategy used by Kafka is to reduce unnecessary copy of data – for example copy of data from network buffers from the OS into the application buffers. This is achieved by using Linux’s sendfile API that read data directly from disk (hopefully from OS disk cache) and routing it directly to a network socket without passing through the application and so avoid the unnecessary copy of buffers.
Now, consider the use case where several producers and consumers are interested in different type of messages. In order to support this use case – Kafka is using the abstraction of ‘topic‘. Producer is sending messages to a topic and consumers are reading messages from a topic.
In order to support large volume of messages and parallel consumption – Kafka introduced the concept of topic’s partitioning which means that when producing to a topic – the producer also specify the method for partitioning the messages e.g., via round robin or by specifying a partition function and message’s field that together define a partition per message. for example if there are 5 partitions with a single topic on a Kafka cluster with 5 brokers – then each broker will manage (lead) one of the partitions. Order between messages is kept between messages in the same partition only.
High availability is achieved by replicating partition to other brokers so in case of a failure in one node – another node starts to manage the partitions. Usually, there are more topics and partitions than the number of cluster nodes so a typical broker will have several managed partitions and several replication for other partitions which are managed in different brokers. Replication to brokers achieved by a broker act like a regular client that consumes a specific partition. All producers of a partition write to the same broker that manage this partition and the same goes for consumers of a partition- they all read from the same broker that manage the partition.
Kafka uses the concept of consumer groups to allow a pool of processes to divide up the work of consuming and processing records. These processes can either be running on the same machine or, as is more likely, they can be distributed over many machines to provide additional scalability and fault tolerance for processing.
Each Kafka consumer is able to configure a consumer group that it belongs to, and can dynamically set the list of topics it wants to subscribe to or subscribe to all topics matching certain pattern through. Kafka will deliver each message in the subscribed topics to one process in each consumer group. This is achieved by balancing the partitions in the topic over the consumer processes in each group. So if there is a topic with four partitions, and a consumer group with two processes, each process would consume from two partitions. This group membership is maintained dynamically: if a process fails the partitions assigned to it will be reassigned to other processes in the same group, and if a new process joins the group, partitions will be moved from existing consumers to this new process.
So if two processes subscribe to a topic both specifying different groups they will each get all the records in that topic; if they both specify the same group they will each get about half the records.
Conceptually you can think of a consumer group as being a single logical subscriber that happens to be made up of multiple processes. As a multi-subscriber system, Kafka naturally supports having any number of consumer groups for a given topic without duplicating data.
This is a slight generalization of the functionality that is common in messaging systems. To get semantics similar to a queue in a traditional messaging system all processes would be part of a single consumer group and hence record delivery would be balanced over the group like with a queue. Unlike a traditional messaging system, though, you can have multiple such groups. To get semantics similar to pub-sub in a traditional messaging system each process would have its own consumer group, so each process would subscribe to all the records published to the topic.