How do I read data from MongoDB in Zeppelin using Spark?
0 votes
25 April 2018

I am working with Zeppelin on HDP 2.6, and I want to read a collection from MongoDB using the spark2 interpreter. My Scala and Spark versions:

util.Properties.versionString
spark.version
res22: String = version 2.11.8
res23: String = 2.2.0.2.6.4.0-91
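The connector has to be on the spark2 interpreter's classpath. A minimal sketch of how I understand the jars can be pulled in through Zeppelin's %dep interpreter (assuming the _2.11 artifact to match the Scala version above; %dep has to run before the Spark interpreter starts):

%dep
z.load("org.mongodb.spark:mongo-spark-connector_2.11:2.2.2")
z.load("org.mongodb:mongo-java-driver:3.5.0")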

I am using MongoDB 3.4.14, mongo-spark-connector 2.2.2, and mongo-java-driver 3.5.0. When I try this:

import com.mongodb.spark.config.ReadConfig
import com.mongodb.spark.sql._ // brings the mongo() implicit into scope on DataFrameReader

val customReadConfig = ReadConfig(Map("readPreference.name" -> "secondaryPreferred", "uri" -> "mongodb://127.0.0.1:27017/test.collections"))
val df5 = spark.sparkSession.read.mongo(customReadConfig)

I get this error:

customReadConfig: com.mongodb.spark.config.ReadConfig.Self = ReadConfig(test,collections,Some(mongodb://127.0.0.1:27017/test.collections),1000,DefaultMongoPartitioner,Map(),15,ReadPreferenceConfig(secondaryPreferred,None),ReadConcernConfig(None),false)
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 20.0 failed 1 times, most recent failure: Lost task 0.0 in stage 20.0 (TID 20, localhost, executor driver): java.lang.NoClassDefFoundError: scala/collection/GenTraversableOnce$class
at com.mongodb.spark.rdd.MongoRDD$MongoCursorIterator.<init>(MongoRDD.scala:174)
at com.mongodb.spark.rdd.MongoRDD.compute(MongoRDD.scala:152)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
...
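For completeness, I believe the same read can also be written without the DataFrameReader implicit, through the connector's explicit loader; a sketch using the same ReadConfig (df5b is just an illustrative name):

import com.mongodb.spark.MongoSpark

// Equivalent of the read above via the explicit MongoSpark.load API
val df5b = MongoSpark.load(spark.sparkSession, customReadConfig)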