Я использую spark-streaming-kafka-0-10_2.11
и spark-streaming_2.1
.Я использую следующий код для использования kafka:
import org.apache.kafka.clients.consumer.ConsumerRecord
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.streaming.kafka010._
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
val kafkaParams = Map[String, Object](
"bootstrap.servers" -> "localhost:9092,anotherhost:9092",
"key.deserializer" -> classOf[StringDeserializer],
"value.deserializer" -> classOf[StringDeserializer],
"group.id" -> "use_a_separate_group_id_for_each_stream",
"auto.offset.reset" -> "latest",
"enable.auto.commit" -> (false: java.lang.Boolean)
)
val topics = Array("topicA", "topicB")
val stream = KafkaUtils.createDirectStream[String, String](
streamingContext,
PreferConsistent,
Subscribe[String, String](topics, kafkaParams)
)
stream.foreachRDD { rdd =>
val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
// some time later, after outputs have completed
stream.asInstanceOf[CanCommitOffsets].commitAsync(offsetRanges)
}
Иногда возникает следующее исключение:
org.apache.kafka.clients.consumer.commitfailedexception: commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. this means that the time between subsequent calls to poll() was longer than the configured session.timeout.ms, which typically implies that the poll loop is spending too much time message processing. you can address this either by increasing the session timeout or by reducing the maximum size of batches returned in poll() with max.poll.records.
Итак, я хочу использовать ConsumerRebalanceListener
для сохранения TopicPartition
Информация.Я знаю, что мы можем использовать это так:
val consumer = new KafkaConsumer[String,String](props)
val listener = new RebalanceListener
consumer.subscribe(Collections.singletonList(topic), listener)
Итак, мой вопрос, как я могу использовать слушателя в DirectStream
?