Spring Cloud Stream Kafka application startup is extremely slow when enable.idempotence is true


My Spring Cloud Stream (SCS) application has two Kafka producers, configured like this:

spring:
  cloud:
    function:
      definition: myProducer1;myProducer2
    stream:
      bindings:
        myproducer1-out-0:
          destination: topic1
          producer:
            useNativeEncoding: true
        myproducer2-out-0:
          destination: topic2
          producer:
            useNativeEncoding: true
      kafka:
        binder:
          brokers: ${kafka.brokers:localhost}
          min-partition-count: 3
          replication-factor: 3
          producerProperties:
            enable:
              idempotence: false
            retries: 10000
            acks: all
            key:
              serializer: io.confluent.kafka.serializers.KafkaAvroSerializer
              subject:
                name:
                  strategy: io.confluent.kafka.serializers.subject.RecordNameStrategy
            value:
              serializer: io.confluent.kafka.serializers.KafkaAvroSerializer
              subject:
                name:
                  strategy: io.confluent.kafka.serializers.subject.RecordNameStrategy
            schema:
              registry:
                url: ${schema-registry.url:http://localhost:8081}

It starts up in about 10 seconds:

 o.s.c.s.m.DirectWithAttributesChannel    : Channel 'my-app-1.myproducer2-out-0' has 1 subscriber(s).
 o.s.b.web.embedded.netty.NettyWebServer  : Netty started on port(s): 8084
 e.p.i.m.MyAppApplicationKt     : Started MyAppApplicationKt in 11.288 seconds (JVM running for 11.868)

I need my producers to be idempotent, so I set enable.idempotence: true. The only change relative to the configuration above is this one producer property:
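spring:
  cloud:
    stream:
      kafka:
        binder:
          producerProperties:
            enable:
              idempotence: true

With this change, the startup time became 7 times slower (sometimes even more than 10 times):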

 o.s.c.s.m.DirectWithAttributesChannel    : Channel 'my-app-1.myproducer2-out-0' has 1 subscriber(s).
 o.s.b.web.embedded.netty.NettyWebServer  : Netty started on port(s): 8084
 e.p.i.m.MyAppApplicationKt     : Started MyAppApplicationKt in 71.489 seconds (JVM running for 72.127)

How can I make the startup faster?

UPDATE:

I detected that the problem is a force-close during startup (Proceeding to force close the producer since pending requests could not be completed within timeout 30000 ms.). Sometimes it happens for one of the producers, sometimes for both, and sometimes for neither. When it does not appear, the startup is as fast as before.

In the following log it happened for only one of the producers:

o.a.k.clients.producer.KafkaProducer     : [Producer clientId=producer-1] Instantiated an idempotent producer.
o.a.k.c.s.authenticator.AbstractLogin    : Successfully logged in.
o.a.kafka.common.utils.AppInfoParser     : Kafka version: 2.3.1
o.a.kafka.common.utils.AppInfoParser     : Kafka commitId: 18a913733fb71c01
o.a.kafka.common.utils.AppInfoParser     : Kafka startTimeMs: 1586864007183
org.apache.kafka.clients.Metadata        : [Producer clientId=producer-1] Cluster ID: lkc-nvqmv
o.a.k.clients.producer.KafkaProducer     : [Producer clientId=producer-1] Closing the Kafka producer with timeoutMillis = 30000 ms.
o.a.k.c.p.internals.TransactionManager   : [Producer clientId=producer-1] ProducerId set to 32029 with epoch 0

After being stuck for 30 seconds on the ProducerId set to 32029 with epoch 0 message, it logs Proceeding to force close... and then initializes the second producer without any problem:

o.a.k.clients.producer.KafkaProducer     : [Producer clientId=producer-1] Proceeding to force close the producer since pending 
o.s.c.s.m.DirectWithAttributesChannel    : Channel 'my-app-1.myproducer1-out-0' has 1 subscriber(s).
o.s.c.s.b.k.p.KafkaTopicProvisioner      : Using kafka topic for outbound: topic2
o.a.k.clients.admin.AdminClientConfig    : AdminClientConfig values: 
...
o.a.k.clients.producer.KafkaProducer     : [Producer clientId=producer-2] Instantiated an idempotent producer.
o.a.k.c.s.authenticator.AbstractLogin    : Successfully logged in.
o.a.kafka.common.utils.AppInfoParser     : Kafka version: 2.3.1
o.a.kafka.common.utils.AppInfoParser     : Kafka commitId: 18a913733fb71c01
o.a.kafka.common.utils.AppInfoParser     : Kafka startTimeMs: 1586864038612
org.apache.kafka.clients.Metadata        : [Producer clientId=producer-2] Cluster ID: lkc-nvqmv
o.a.k.clients.producer.KafkaProducer     : [Producer clientId=producer-2] Closing the Kafka producer with timeoutMillis = 30000 ms.
o.a.k.c.p.internals.TransactionManager   : [Producer clientId=producer-2] ProducerId set to 32030 with epoch 0
o.a.k.clients.producer.KafkaProducer     : [Producer clientId=producer-2] Proceeding to force close the producer since pending 
o.s.c.s.m.DirectWithAttributesChannel    : Channel 'my-app-1.myproducer2-out-0' has 1 subscriber(s).
o.s.b.web.embedded.netty.NettyWebServer  : Netty started on port(s): 8084
e.p.i.m.MetricsIngestorApplicationKt     : Started MetricsIngestorApplicationKt in 66.834 seconds (JVM running for 67.544)

UPDATE 2:

I debugged the logic behind this, and it happens during the doBindProducer() method, which fetches the partitions for the topic by creating a temporary producer in KafkaMessageChannelBinder.createProducerMessageHandler():

    @Override
    protected MessageHandler createProducerMessageHandler(
            final ProducerDestination destination,
            ExtendedProducerProperties<KafkaProducerProperties> producerProperties,
            MessageChannel channel, MessageChannel errorChannel) throws Exception {
        /*
         * IMPORTANT: With a transactional binder, individual producer properties for
         * Kafka are ignored; the global binder
         * (spring.cloud.stream.kafka.binder.transaction.producer.*) properties are used
         * instead, for all producers. A binder is transactional when
         * 'spring.cloud.stream.kafka.binder.transaction.transaction-id-prefix' has text.
         */
        final ProducerFactory<byte[], byte[]> producerFB = this.transactionManager != null
                ? this.transactionManager.getProducerFactory()
                : getProducerFactory(null, producerProperties);
        Collection<PartitionInfo> partitions = provisioningProvider.getPartitionsForTopic(
                producerProperties.getPartitionCount(), false, () -> {
                    Producer<byte[], byte[]> producer = producerFB.createProducer();
                    List<PartitionInfo> partitionsFor = producer
                            .partitionsFor(destination.getName());
                    producer.close();
                    if (this.transactionManager == null) {
                        ((DisposableBean) producerFB).destroy();
                    }
                    return partitionsFor;
                }, destination.getName());

After this List<PartitionInfo> is retrieved correctly, it gets stuck in the destroy() call on the producer factory (which closes the underlying KafkaProducer) until the 30-second timeout expires.
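For context, here is a minimal standalone sketch (not binder code; the broker address and topic name are placeholder assumptions) of the same sequence with the plain Kafka client. An idempotent producer performs an InitProducerId request on startup, which is what the ProducerId set to ... with epoch 0 log line corresponds to; the no-argument close() can wait up to the configured timeout, while the Duration overload bounds the wait explicitly:

import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.PartitionInfo;
import org.apache.kafka.common.serialization.ByteArraySerializer;

public class PartitionProbe {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class);
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class);
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);
        props.put(ProducerConfig.ACKS_CONFIG, "all"); // required when idempotence is enabled

        // Same sequence as the binder: create the producer, fetch partitions, close.
        KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(props);
        List<PartitionInfo> partitions = producer.partitionsFor("topic1"); // placeholder topic
        System.out.println("partitions: " + partitions.size());
        // Bounding the close avoids waiting the full 30 seconds if pending
        // requests cannot complete; an unbounded close() may block longer.
        producer.close(Duration.ofSeconds(5));
    }
}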


Why does it get stuck there? Could it be a bug in the binder?

Tags: apache-kafka, apache-kafka-streams, spring-cloud-stream, spring-cloud-stream-binder-kafka
1 Answer

I don't know why the close is timing out, but that timeout ought to be configurable.

Please open an issue against the binder; it currently does not support reducing the default close timeout (30 seconds).
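For illustration only: on a plain Spring Kafka DefaultKafkaProducerFactory (outside the binder), the physical close timeout is configurable, which is roughly what such a binder feature would expose. The broker address and serializer choices below are assumptions for the sketch, not the binder's internal configuration:

import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.ByteArraySerializer;
import org.springframework.kafka.core.DefaultKafkaProducerFactory;

public class CloseTimeoutConfig {
    public static DefaultKafkaProducerFactory<byte[], byte[]> producerFactory() {
        Map<String, Object> configs = new HashMap<>();
        configs.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        configs.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class);
        configs.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class);
        configs.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);

        DefaultKafkaProducerFactory<byte[], byte[]> factory =
                new DefaultKafkaProducerFactory<>(configs);
        // The default physical close timeout is 30 seconds (the "timeoutMillis
        // = 30000 ms" seen in the logs); a smaller value bounds the close wait.
        factory.setPhysicalCloseTimeout(5);
        return factory;
    }
}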
