Как я могу использовать `ConsumerRebalanceListener` в потоковой передаче искры? - PullRequest
0 голосов
/ 06 февраля 2019

Я использую spark-streaming-kafka-0-10_2.11 и spark-streaming_2.1.Я использую следующий код для использования kafka:

import org.apache.kafka.clients.consumer.ConsumerRecord
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.streaming.kafka010._
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "localhost:9092,anotherhost:9092",
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "use_a_separate_group_id_for_each_stream",
  "auto.offset.reset" -> "latest",
  "enable.auto.commit" -> (false: java.lang.Boolean)
)

val topics = Array("topicA", "topicB")
val stream = KafkaUtils.createDirectStream[String, String](
  streamingContext,
  PreferConsistent,
  Subscribe[String, String](topics, kafkaParams)
)

stream.foreachRDD { rdd =>
  val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges

  // some time later, after outputs have completed
  stream.asInstanceOf[CanCommitOffsets].commitAsync(offsetRanges)
}

Иногда возникает следующее исключение:

org.apache.kafka.clients.consumer.commitfailedexception: commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. this means that the time between subsequent calls to poll() was longer than the configured session.timeout.ms, which typically implies that the poll loop is spending too much time message processing. you can address this either by increasing the session timeout or by reducing the maximum size of batches returned in poll() with max.poll.records.

Итак, я хочу использовать ConsumerRebalanceListener для сохранения TopicPartitionИнформация.Я знаю, что мы можем использовать это так:

    val consumer = new KafkaConsumer[String,String](props)

    val listener = new RebalanceListener

    consumer.subscribe(Collections.singletonList(topic), listener)

Итак, мой вопрос, как я могу использовать слушателя в DirectStream?

...