了解Kafka主题和分区

65
投票

这篇文章已经有了答案，但我正在用Kafka权威指南中的一些图片添加我的观点

在回答每个问题之前，让我们添加生产者组件的概述：

1. When a producer is producing a message - It will specify the topic it wants to send the message to, is that right? Does it care about partitions?

生产者将决定目标分区以放置任何消息，具体取决于：

分区ID，如果在消息中指定的话
key％num partitions，如果没有提到分区id
循环，如果消息中既没有分区ID也没有消息密钥，这意味着只有值可用

2. When a subscriber is running - Does it specify its group id so that it can be part of a cluster of consumers of the same topic or several topics that this group of consumers is interested in?

您应该始终配置group.id，除非您使用简单的赋值API，并且不需要在Kafka中存储偏移量。它不会成为任何团体的一部分。 source

3. Does each consumer group have a corresponding partition on the broker or does each consumer have one?

在一个使用者组中，每个分区仅由一个使用者处理。这些是可能的情况

消费者数量少于主题分区数量，然后可以将多个分区分配给该组中的一个消费者
消费者数量与主题分区数量相同，然后分区和消费者映射可以如下所示，
消费者数量高于主题分区数量，然后分区和消费者映射可以如下所示，无效，请查看消费者5

4. As the partitions created by the broker, therefore not a concern for the consumers?

消费者应该知道分区的数量，如问题3中所讨论的。

5. Since this is a queue with an offset for each partition, is it responsibility of the consumer to specify which messages it wants to read? Does it need to save its state?

Kafka（作为特定的组协调器）通过向内部__consumer_offsets主题生成消息来处理偏移状态，通过将enable.auto.commit设置为false，可以将此行为配置为手动。在这种情况下，consumer.commitSync()和consumer.commitAsync()可以帮助管理偏移量。

有关集团协调员的更多信息

它是Kafka服务器端集群中的一个选举代理。
消费者与组协调器交互以进行偏移提交和获取请求。
消费者向组协调员发送定期心跳。

6. What happens when a message is deleted from the queue? - For example: The retention was for 3 hours, then the time passes, how is the offset being handled on both sides?

如果任何消费者在保留期后开始消息，则消息将根据auto.offset.reset配置消耗，该配置可能是latest/earliest。从技术上讲，它是latest（开始处理新消息）因为所有消息在那个时间到期并且保留是主题级别配置。

97
投票

让我们按顺序:)

1 - 当生产者正在生成消息时 - 它将指定要将消息发送到的主题，是吗？它关心分区吗？

默认情况下，生产者不关心分区。您可以选择使用自定义分区程序来获得更好的控制，但它完全是可选的。

2 - 当订户正在运行时 - 它是否指定了其组ID，以便它可以是同一主题的消费者群集的一部分，或者是该群体消费者感兴趣的几个主题？

是的，消费者加入（或创建，如果他们独自一人）消费者群体来共享负载。同一组中没有两个消费者会收到相同的消息。

3 - 每个消费者组在代理上是否有相应的分区，或者每个消费者都有一个分区？

都不是。在两个条件下，消费者组中的所有消费者都被分配了一组分区：同一组中没有两个消费者具有共同的任何分区 - 并且为每个现有分区分配整个消费者组。

4 - 代理是否创建了分区，因此不关心消费者？

它们不是，但你可以从3看到，拥有比现有分区更多的消费者是完全没用的，所以它是你消费的最大并行度水平。

5 - 由于这是一个每个分区都有一个偏移量的队列，因此消费者有责任指定它想要读取的消息吗？是否需要保存其状态？

是的，消费者为每个分区保留每个主题的偏移量。这完全由卡夫卡处理，不用担心。

6 - 从队列中删除邮件时会发生什么？ - 例如：保留时间为3小时，然后时间过去了，两侧的偏移量如何处理？

如果消费者曾经请求代理上的分区不可用的偏移量（例如，由于删除），则它进入错误模式，并最终将此分区的自身重置为可用的最新消息或最旧消息（取决于auto.offset.reset配置值），并继续工作。

10
投票

Kafka使用Topic概念来为消息流带来秩序。

为了平衡负载，可以将主题划分为多个分区并在代理之间进行复制。

分区是有序的，不可变的消息序列，它们被连续附加，即提交日志。

分区中的消息具有序列标识号，该标识号唯一标识分区中的每条消息。

分区允许主题的日志扩展超出适合单个服务器（代理）的大小，并充当并行性的单位。

主题的分区分布在Kafka集群中的代理上，其中每个代理处理数据并请求分区的共享。

每个分区都在可配置数量的代理上进行复制，以确保容错。

在这篇文章中解释得很好：http://codeflex.co/what-is-apache-kafka/

问题描述投票：93回答：3

3个回答

这篇文章已经有了答案，但我正在用Kafka权威指南中的一些图片添加我的观点

1. When a producer is producing a message - It will specify the topic it wants to send the message to, is that right? Does it care about partitions?

2. When a subscriber is running - Does it specify its group id so that it can be part of a cluster of consumers of the same topic or several topics that this group of consumers is interested in?

3. Does each consumer group have a corresponding partition on the broker or does each consumer have one?

4. As the partitions created by the broker, therefore not a concern for the consumers?

5. Since this is a queue with an offset for each partition, is it responsibility of the consumer to specify which messages it wants to read? Does it need to save its state?

6. What happens when a message is deleted from the queue? - For example: The retention was for 3 hours, then the time passes, how is the offset being handled on both sides?

最新问题

了解Kafka主题和分区

问题描述 投票：93回答：3

3个回答

这篇文章已经有了答案，但我正在用Kafka权威指南中的一些图片添加我的观点

1. When a producer is producing a message - It will specify the topic it wants to send the message to, is that right? Does it care about partitions?

2. When a subscriber is running - Does it specify its group id so that it can be part of a cluster of consumers of the same topic or several topics that this group of consumers is interested in?

3. Does each consumer group have a corresponding partition on the broker or does each consumer have one?

4. As the partitions created by the broker, therefore not a concern for the consumers?

5. Since this is a queue with an offset for each partition, is it responsibility of the consumer to specify which messages it wants to read? Does it need to save its state?

6. What happens when a message is deleted from the queue? - For example: The retention was for 3 hours, then the time passes, how is the offset being handled on both sides?

最新问题

问题描述投票：93回答：3