Kafka Q&A Notes: Ask, Learn, Answer

Posted by Yancy on 2017-09-11

1. For a production deployment we plan to run three broker instances; is a single ZooKeeper node enough?

Answer: It can work, but for fault tolerance it is better to deploy a ZooKeeper ensemble. There is no required ratio between brokers and ZooKeeper nodes; they are independent clusters.

2. Why does a Kafka cluster need ZooKeeper?
Answer: ZooKeeper acts as the coordination service for Kafka. It stores some of the cluster's metadata (partitions, offsets, and so on). A ZooKeeper ensemble exists to keep the ZooKeeper service itself highly available, so whether to build an ensemble depends on your availability requirements.
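To make the metadata point concrete, below is a minimal sketch that lists the broker and topic registrations Kafka keeps in ZooKeeper, using the kazoo client library; the host string is a placeholder, and the paths assume a default deployment with no chroot.

```python
# Minimal sketch: inspect the metadata Kafka keeps in ZooKeeper.
# The host string is a placeholder; paths assume no chroot is configured.
from kazoo.client import KazooClient

zk = KazooClient(hosts="zk1:2181")
zk.start()

# Live broker registrations: one ephemeral znode per broker id.
print("broker ids:", zk.get_children("/brokers/ids"))

# Topics, with each topic's partition assignment stored as JSON.
for topic in zk.get_children("/brokers/topics"):
    data, _ = zk.get("/brokers/topics/%s" % topic)
    print(topic, data.decode("utf-8"))

zk.stop()
```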
3. Viewing a Kafka topic's consumption records fails with WARN Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)

java.net.ConnectException: Connection timed out

Answer:
1. Kafka cannot reach ZooKeeper. Check whether the ZK ensemble is reachable (for example, can you telnet to each node?) and whether the Kafka cluster itself is healthy; a connectivity probe is sketched after this list.
2. Check that zookeeper.connect in server.properties is configured correctly. If both look fine, restart the service.
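A quick way to run check 1 without telnet is a small Python probe, a minimal sketch assuming placeholder host names: it opens a TCP connection to each ZooKeeper node and sends the ruok four-letter command, to which a healthy server replies imok. Note that ZooKeeper 3.5+ only answers four-letter words listed in its 4lw.commands.whitelist setting.

```python
# Minimal connectivity probe for a ZooKeeper ensemble.
# Host names are placeholders; on ZooKeeper 3.5+ the "ruok" command
# must be enabled via 4lw.commands.whitelist, or the server closes
# the connection without replying.
import socket

ZK_NODES = [("zk1", 2181), ("zk2", 2181), ("zk3", 2181)]

for host, port in ZK_NODES:
    try:
        with socket.create_connection((host, port), timeout=5) as s:
            s.sendall(b"ruok")           # four-letter health-check command
            reply = s.recv(4)            # a healthy server replies b"imok"
            print("%s:%d -> %s" % (host, port, reply.decode() or "no reply"))
    except OSError as exc:               # includes connection timeouts
        print("%s:%d -> unreachable (%s)" % (host, port, exc))
```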
4. Does Kafka support compressed transport?
5. With compression enabled, how does a Kafka consumer maintain its own offset?
Problem:
Consuming Kafka data across AWS datacenters is slow for us and hits a network bottleneck; our Kafka cluster sits in an overseas Dallas datacenter, and the consumers are in Dallas as well.
Answer from the official documentation:
Offset management on the consumer
The data received by a consumer for a topic might contain both compressed as well as uncompressed messages. The consumer iterator transparently decompresses compressed data and only returns an uncompressed message. The offset maintenance in the consumer gets a little tricky. In the zookeeper consumer, the consumed offset is updated each time a message is returned. This consumed offset should be a valid fetch offset for correct failure recovery. Since data is stored in compressed format on the broker, valid fetch offsets are the compressed message boundaries. Hence, for compressed data, the consumed offset will be advanced one compressed message at a time. This has the side effect of possible duplicates in the event of a consumer failure. For uncompressed data, consumed offset will be advanced one message at a time.
What I don't quite understand from this: after the producer compresses 100 messages into one and sends it to the broker, how does the broker store it, and how does the consumer fetch the compressed data back and maintain its offset?
### Message compression
With network bandwidth as a potential bottleneck, Kafka offers message-set compression. Kafka supports efficient compression through recursive message sets: compression is effective when multiple messages are compressed together rather than each message individually, so a batch of messages is compressed as a unit and sent to the broker. Compressed message sets reduce network load, but decompression adds some overhead: the broker has to decompress the message set in order to assign offsets to the messages inside it.
Each message is addressable by a monotonically increasing logical offset that is unique within its partition. When compressed data arrives, the lead broker decompresses the message set and assigns an offset to each message; once the offsets are assigned, the leader recompresses the message set and writes it to disk. A toy model of this flow follows.
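The toy model below (plain Python with gzip and json, deliberately not Kafka's real wire format) walks through that flow: the producer compresses a batch into one blob, the broker decompresses it to assign consecutive offsets and recompresses, and the consumer decompresses transparently while its consumed offset only advances at compressed-batch boundaries, which is why duplicates are possible after a failure mid-batch.

```python
# Toy model of the flow above (illustrative only, not Kafka's wire format).
import gzip
import json

# Producer side: compress a batch of 100 messages into a single blob.
batch = ["msg-%d" % i for i in range(100)]
blob = gzip.compress(json.dumps(batch).encode())

# Broker side: decompress, assign consecutive offsets, recompress, store.
log_end_offset = 500                       # next free offset in the partition
messages = json.loads(gzip.decompress(blob))
with_offsets = [(log_end_offset + i, m) for i, m in enumerate(messages)]
stored_blob = gzip.compress(json.dumps(with_offsets).encode())

# Consumer side: the iterator decompresses transparently, but the only
# valid fetch offsets are compressed-batch boundaries, so the consumed
# offset advances one whole batch at a time.
for offset, message in json.loads(gzip.decompress(stored_blob)):
    pass                                   # process each message here
consumed_offset = log_end_offset + len(messages)
print("next valid fetch offset:", consumed_offset)
```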
In Kafka, compression is done by the producer, using the GZIP or Snappy codec, and is controlled by producer-side parameters (a client sketch follows this list):
compression.codec: the compression codec to use; defaults to none, with gzip and snappy as the other options.
compressed.topics: restricts compression to specific topics; defaults to null. When compression.codec is not none, compression is enabled for the listed topics; if compressed.topics is null, compression is enabled for all topics.
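As a concrete illustration, here is a minimal producer sketch using the kafka-python client, whose compression_type parameter plays the role of compression.codec above (the per-topic compressed.topics knob belongs to the old Scala producer and has no direct kafka-python equivalent); the broker address and topic name are placeholders.

```python
# Minimal producer sketch with the kafka-python client; compression_type
# corresponds to the compression.codec setting described above.
# Broker address and topic name are placeholders.
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="broker1:9092",
    compression_type="gzip",      # or "snappy"; None disables compression
    batch_size=32 * 1024,         # larger batches tend to compress better
    linger_ms=50,                 # wait briefly so batches can fill up
)

for i in range(100):
    producer.send("my-topic", b"message-%d" % i)

producer.flush()                  # whole batches are compressed and sent
producer.close()
```

Because compression happens per batch, batch_size and linger_ms directly affect the compression ratio: bigger, fuller batches give the codec more redundancy to exploit.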
A message set (ByteBufferMessageSet) may contain both compressed and uncompressed data. To tell them apart, a compression-attributes byte is added to the message header: its lowest two bits encode the compression codec, and 00 means the data is uncompressed.
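To see what those two bits encode, a couple of lines suffice; this sketch follows the two-bit layout described above, with the codec numbering (0 = none, 1 = GZIP, 2 = Snappy) used by the message format of this era.

```python
# Decode the compression codec from a message's attributes byte,
# following the two-bit layout described above.
COMPRESSION_MASK = 0x03           # the lowest two bits

CODECS = {0: "none", 1: "gzip", 2: "snappy"}

def compression_codec(attributes):
    return CODECS[attributes & COMPRESSION_MASK]

print(compression_codec(0b00000000))   # -> none (uncompressed data)
print(compression_codec(0b00000001))   # -> gzip
```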
