General Terms used in Kafka
Producer: A producer is an application that sends messages (data) to Kafka. A message may have a meaning and a schema for us, but to Kafka it is simply an array of bytes.
Cluster: A cluster is a group of computers sharing a workload for a common purpose. Since Kafka is a distributed system, each computer in the group typically runs one instance of the Kafka broker, and together these brokers are referred to as a Kafka cluster.
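
To illustrate the point that a message is only bytes to the broker, here is a minimal sketch in plain Python. It does not use any Kafka client library; the `serialize` and `deserialize` helpers and the sample order record are hypothetical, and JSON is just one possible encoding.

```python
import json

def serialize(record: dict) -> bytes:
    """Turn a structured record into the raw bytes a broker would store."""
    return json.dumps(record, sort_keys=True).encode("utf-8")

def deserialize(payload: bytes) -> dict:
    """Rebuild the record on the consumer side; the broker never does this."""
    return json.loads(payload.decode("utf-8"))

# Hypothetical order record; the schema matters only to producer and consumer.
order = {"order_id": 17, "amount": 250}
payload = serialize(order)
restored = deserialize(payload)
```

The schema lives entirely in the producer and consumer code, which is why both sides have to agree on the encoding.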

Topic: A topic is a unique name for a Kafka stream. Producers send data to the Kafka broker, and consumers ask the Kafka broker for data. But which data set does a consumer need? A topic is a labeled set of data that a consumer can ask for.
For example, 5 producers send data to a Kafka cluster. The cluster keeps the data in named groups, and each producer states which group a message belongs to when it sends it. Each such group is called a stream, and the name of the stream is called a topic. Global orders is a topic in the picture. Each consumer may subscribe to more than one topic and will then receive the data topic by topic automatically.
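
The idea of a topic as a labeled set of data can be sketched with a toy in-memory broker. This is an illustration only: `ToyBroker` and its `send` and `read` methods are hypothetical names, not part of any Kafka API, and the topic names are made up.

```python
class ToyBroker:
    """In-memory stand-in for a broker; hypothetical, not a Kafka API."""

    def __init__(self):
        # Maps each topic name to the list of messages sent to it.
        self._topics = {}

    def send(self, topic: str, message: bytes) -> None:
        """The producer names the topic for every message it sends."""
        self._topics.setdefault(topic, []).append(message)

    def read(self, topic: str) -> list:
        """The consumer asks for data by topic name."""
        return list(self._topics.get(topic, []))

broker = ToyBroker()
broker.send("global-orders", b"order-1")
broker.send("payments", b"payment-1")
broker.send("global-orders", b"order-2")

# A consumer interested only in orders never sees the payment message.
orders = broker.read("global-orders")
```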

Partitions: Data may be huge, and sometimes it may be larger than the capacity of a single computer. So the main challenge is to store the data. One solution is to break the data into smaller parts and distribute them across multiple computers so that each part fits within the capacity of one computer. Each part is called a partition, and each partition is stored on one machine. The number of partitions is fixed by the designer when the topic is created. It can be increased later but never decreased, and increasing it changes which partition a given key maps to, so it is best decided in advance.
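
One common way to decide which partition a message goes to is to hash its key and take the remainder modulo the number of partitions. The sketch below shows the idea in plain Python. It uses CRC32 as a simple stand-in hash, which is an assumption made for illustration and not the hash Kafka's own partitioner uses; `choose_partition` is a hypothetical helper.

```python
import zlib

def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a message key to a partition index in [0, num_partitions).

    CRC32 is a stand-in hash chosen for this sketch only.
    """
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    return zlib.crc32(key) % num_partitions

NUM_PARTITIONS = 3  # assumed partition count for the example topic

# The same key always lands in the same partition.
first = choose_partition(b"customer-42", NUM_PARTITIONS)
second = choose_partition(b"customer-42", NUM_PARTITIONS)
```

Because the result depends on the partition count, changing that count can send an existing key to a different partition, which is one reason to settle the number early.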

Offset
An offset is a sequence id given to each message as it arrives in a partition. Once a number is assigned to a message, it cannot be changed. The first message gets offset 0, the second message gets offset 1, and so on. These offset numbers are local to a partition. So to find a message we need the topic name, the partition id and the offset.
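
The way offsets are handed out can be sketched with an append-only list per partition. This is a toy model in plain Python, not Kafka code; `PartitionLog` and the topic name are hypothetical.

```python
class PartitionLog:
    """Append-only log for one partition; hypothetical toy model."""

    def __init__(self):
        self._messages = []

    def append(self, message: bytes) -> int:
        """Store the message and return its offset, which never changes."""
        offset = len(self._messages)
        self._messages.append(message)
        return offset

    def read(self, offset: int) -> bytes:
        return self._messages[offset]

# One log per (topic, partition) pair; offsets restart at 0 in each log.
logs = {
    ("global-orders", 0): PartitionLog(),
    ("global-orders", 1): PartitionLog(),
}

a = logs[("global-orders", 0)].append(b"first")
b = logs[("global-orders", 0)].append(b"second")
c = logs[("global-orders", 1)].append(b"other partition")

# Locating a message takes all three coordinates: topic, partition, offset.
found = logs[("global-orders", 0)].read(1)
```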

Consumer Group:
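A consumer group is a group of consumers acting as a single logical unit. Members of the same group share the work of reading a topic: its partitions are divided among them, so each message is handled by only one member of the group.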
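
How the members of a group might split the partitions of a topic can be sketched as a simple round-robin assignment. This is only an illustration of the idea, not necessarily the strategy a real Kafka cluster applies; `assign_partitions` and the consumer names are hypothetical.

```python
def assign_partitions(partitions, consumers):
    """Spread partitions over group members in round-robin order.

    Each partition gets exactly one owner within the group.
    """
    members = sorted(consumers)
    if not members:
        raise ValueError("a consumer group needs at least one member")
    assignment = {member: [] for member in members}
    for index, partition in enumerate(sorted(partitions)):
        owner = members[index % len(members)]
        assignment[owner].append(partition)
    return assignment

# Four partitions shared by a group of two consumers.
assignment = assign_partitions([0, 1, 2, 3], ["consumer-a", "consumer-b"])
```

With more consumers than partitions, some members end up with an empty list in this sketch, since a partition is never split between two members of the same group.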

Statlearner