Cannot load a GeoJson file in Spark Scala using magellan in Databricks
Asked 31 January 2019

I am trying to load a GeoJSON file using the Magellan library, but I keep running into errors. My code:

import magellan.{Point, Polygon, PolyLine}
import org.apache.spark.sql.magellan.dsl.expressions._
import org.apache.spark.sql.SaveMode
import scala.io.Source
import org.json4s._
import org.json4s.jackson.JsonMethods._

// Databricks notebooks usually provide this, but the explicit import makes .toDS work anywhere
import spark.implicits._

// Sourcing the GeoJSON from a URL
val jsonStr = Source.fromURL("https://raw.githubusercontent.com/datasets/geo-countries/master/data/countries.geojson").mkString
val jsonRDD = Seq(jsonStr).toDS  // a Dataset[String] with a single row

// Writing into an S3 bucket
jsonRDD.coalesce(1).write.mode(SaveMode.Overwrite).json("s3a://my-bucket/github_geo.json")

// Reading it back through the Magellan GeoJSON data source
val neighborhoods = sqlContext.read
  .format("magellan")
  .option("type", "geojson")
  .load("s3a://my-bucket/github_geo.json/")
  .count()
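Since the write itself succeeds, one thing worth checking before Magellan gets involved is what actually landed in S3. A minimal sketch (same placeholder bucket path as above, not part of the original code) that reads the output back with Spark's built-in JSON reader, independently of Magellan's json4s-based parser:

// Hypothetical sanity check: confirm the S3 path resolves and inspect the layout
// of the written file (e.g. whether each line is raw GeoJSON or a {"value": "..."}
// record, since write.json on a Dataset[String] serializes the "value" column).
val rawCheck = spark.read.json("s3a://my-bucket/github_geo.json/")
rawCheck.printSchema()
println(s"row count: ${rawCheck.count()}")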

It throws an error after the write to S3 succeeds. Below is the error:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, 10.125.220.160, executor 0): java.lang.NoSuchMethodError: org.json4s.jackson.JsonMethods$.parse(Lorg/json4s/JsonInput;Z)Lorg/json4s/JsonAST$JValue;
at magellan.GeoJSONRelation.magellan$GeoJSONRelation$$parseShapeWithMeta(GeoJSONRelation.scala:63)
at magellan.GeoJSONRelation$$anonfun$_buildScan$1.apply(GeoJSONRelation.scala:55)
at magellan.GeoJSONRelation$$anonfun$_buildScan$1.apply(GeoJSONRelation.scala:52)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.agg_doAggregateWithoutKey_0$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$11$$anon$1.hasNext(WholeStageCodegenExec.scala:622)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
at org.apache.spark.scheduler.Task.doRunTask(Task.scala:139)
at org.apache.spark.scheduler.Task.run(Task.scala:112)

I tried the link below, and it still did not work:

https://forums.databricks.com/questions/6928/cannot-parse-json-using-json4s-on-databricks.html
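For context, a NoSuchMethodError on org.json4s.jackson.JsonMethods$.parse usually points to a json4s binary mismatch: the json4s version bundled with the cluster runtime is not the one Magellan was compiled against. A minimal sketch (hypothetical, assuming a Databricks notebook) to see which json4s jar the driver classloader actually resolves:

// Hypothetical diagnostic: locate the json4s jar that is actually on the classpath,
// to compare its version with the json4s version Magellan expects.
val src = Option(classOf[org.json4s.jackson.JsonMethods].getProtectionDomain.getCodeSource)
println(s"json4s loaded from: ${src.map(_.getLocation).getOrElse("unknown")}")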

...