How do I run Spark in the sbt console? - PullRequest
0 votes
/ September 21, 2019

I am trying to run Spark on my laptop in standalone (local) mode, but I get a strange error.

Here is a small example:

puggini-pro13:stackoverflow lpuggini$ sbt console
[residual] arg = '-sbt-create'
[residual] arg = 'console'
[residual] arg = 'console'
[process_args] java_version = '13'
[sbt_options] declare -a sbt_options='()'
[addMemory] arg = '1024'
[addJava] arg = '-Xms1024m'
[addJava] arg = '-Xmx1024m'
[addJava] arg = '-Xss4M'
[addJava] arg = '-XX:ReservedCodeCacheSize=128m'
[copyRt] java9_rt = '/Users/lpuggini/.sbt/0.13/java9-rt-ext-adoptopenjdk_13/rt.jar'
[addJava] arg = '-Dscala.ext.dirs=/Users/lpuggini/.sbt/0.13/java9-rt-ext-adoptopenjdk_13'
# Executing command line:
java
-Dfile.encoding=UTF-8
-Xms1024m
-Xmx1024m
-Xss4M
-XX:ReservedCodeCacheSize=128m
-Dscala.ext.dirs=/Users/lpuggini/.sbt/0.13/java9-rt-ext-adoptopenjdk_13
-jar
/usr/local/Cellar/sbt/1.3.1/libexec/bin/sbt-launch.jar
console

[info] Loading project definition from /Users/lpuggini/ProgrammingProjects/spark_coursera/stackoverflow/project
[info] Compiling 8 Scala sources to /Users/lpuggini/ProgrammingProjects/spark_coursera/stackoverflow/project/target/scala-2.10/sbt-0.13/classes...
[warn] /Users/lpuggini/ProgrammingProjects/spark_coursera/stackoverflow/project/CommonBuild.scala:3: trait Build in package sbt is deprecated: Use .sbt format instead
[warn] trait CommonBuild extends Build {
[warn]                           ^
[warn] one warning found
error: error while loading String, class file '/Library/Java/JavaVirtualMachines/adoptopenjdk-13.jdk/Contents/Home(java/lang/String.class)' is broken
(class java.lang.NullPointerException/null)
[info] Set current project to bigdata-stackoverflow (in build file:/Users/lpuggini/ProgrammingProjects/spark_coursera/stackoverflow/)
[info] Compiling 2 Scala sources to /Users/lpuggini/ProgrammingProjects/spark_coursera/stackoverflow/target/scala-2.11/classes...
[info] Starting scala interpreter...
[info] 
Welcome to Scala 2.11.12 (OpenJDK 64-Bit Server VM, Java 13).
Type in expressions for evaluation. Or try :help.

scala> import org.apache.spark.SparkConf
import org.apache.spark.SparkConf

scala> import org.apache.spark.SparkContext
import org.apache.spark.SparkContext

scala> import org.apache.spark.SparkContext._
import org.apache.spark.SparkContext._

scala>  val conf: SparkConf = new SparkConf().setAppName("wikipedia").setMaster("local")
conf: org.apache.spark.SparkConf = org.apache.spark.SparkConf@6ce3b4b7

scala>   val sc: SparkContext = new SparkContext(conf)
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
19/09/21 18:19:21 INFO SparkContext: Running Spark version 2.1.0
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.hadoop.security.authentication.util.KerberosUtil (file:/Users/lpuggini/Library/Caches/Coursier/v1/https/repo1.maven.org/maven2/org/apache/hadoop/hadoop-auth/2.2.0/hadoop-auth-2.2.0.jar) to method sun.security.krb5.Config.getInstance()
WARNING: Please consider reporting this to the maintainers of org.apache.hadoop.security.authentication.util.KerberosUtil
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
19/09/21 18:19:22 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/09/21 18:19:22 INFO SecurityManager: Changing view acls to: lpuggini
19/09/21 18:19:22 INFO SecurityManager: Changing modify acls to: lpuggini
19/09/21 18:19:22 INFO SecurityManager: Changing view acls groups to: 
19/09/21 18:19:22 INFO SecurityManager: Changing modify acls groups to: 
19/09/21 18:19:22 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(lpuggini); groups with view permissions: Set(); users  with modify permissions: Set(lpuggini); groups with modify permissions: Set()
19/09/21 18:19:22 INFO Utils: Successfully started service 'sparkDriver' on port 61525.
19/09/21 18:19:22 INFO SparkEnv: Registering MapOutputTracker
19/09/21 18:19:22 INFO SparkEnv: Registering BlockManagerMaster
19/09/21 18:19:22 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
19/09/21 18:19:22 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
19/09/21 18:19:22 INFO DiskBlockManager: Created local directory at /private/var/folders/93/x9dn9rkx00l0kxvfp5qs4n_54gn5pn/T/blockmgr-74ed4a79-815c-4e82-a9de-280643886ede
19/09/21 18:19:22 INFO MemoryStore: MemoryStore started with capacity 434.4 MB
19/09/21 18:19:22 INFO SparkEnv: Registering OutputCommitCoordinator
19/09/21 18:19:23 INFO Utils: Successfully started service 'SparkUI' on port 4040.
19/09/21 18:19:23 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://192.168.1.81:4040
19/09/21 18:19:23 INFO Executor: Starting executor ID driver on host localhost
19/09/21 18:19:23 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 61526.
19/09/21 18:19:23 INFO NettyBlockTransferService: Server created on 192.168.1.81:61526
19/09/21 18:19:23 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
19/09/21 18:19:23 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 192.168.1.81, 61526, None)
19/09/21 18:19:23 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.1.81:61526 with 434.4 MB RAM, BlockManagerId(driver, 192.168.1.81, 61526, None)
19/09/21 18:19:23 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 192.168.1.81, 61526, None)
19/09/21 18:19:23 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 192.168.1.81, 61526, None)
sc: org.apache.spark.SparkContext = org.apache.spark.SparkContext@5504a112

scala>     val lines   = sc.textFile("src/main/resources/stackoverflow/stackoverflow.csv")
19/09/21 18:20:42 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 78.8 KB, free 434.3 MB)
19/09/21 18:20:42 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 14.4 KB, free 434.3 MB)
19/09/21 18:20:42 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.1.81:61526 (size: 14.4 KB, free: 434.4 MB)
19/09/21 18:20:42 INFO SparkContext: Created broadcast 0 from textFile at <console>:17
lines: org.apache.spark.rdd.RDD[String] = src/main/resources/stackoverflow/stackoverflow.csv MapPartitionsRDD[1] at textFile at <console>:17

scala> lines.first()
java.lang.StringIndexOutOfBoundsException: begin 0, end 3, length 2
  at java.base/java.lang.String.checkBoundsBeginEnd(String.java:3720)
  at java.base/java.lang.String.substring(String.java:1909)
  at org.apache.hadoop.util.Shell.<clinit>(Shell.java:48)
  at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:76)
  at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:362)
  at org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$30.apply(SparkContext.scala:1014)
  at org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$30.apply(SparkContext.scala:1014)
  at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:179)
  at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:179)
  at scala.Option.foreach(Option.scala:257)
  at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:179)
  at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:198)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
  at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
  at org.apache.spark.rdd.RDD$$anonfun$take$1.apply(RDD.scala:1332)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
  at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
  at org.apache.spark.rdd.RDD.take(RDD.scala:1326)
  at org.apache.spark.rdd.RDD$$anonfun$first$1.apply(RDD.scala:1367)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
  at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
  at org.apache.spark.rdd.RDD.first(RDD.scala:1366)
  ... 42 elided

scala> 

How can I fix this?
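
A note on reading the trace, offered as a sketch rather than a confirmed diagnosis: the exception comes from String.substring, called from the static initializer of org.apache.hadoop.util.Shell, with the message "begin 0, end 3, length 2", and the sbt launcher log reports java_version = '13'. One plausible reading is that a three-character prefix is being taken from the two-character version string "13". The helper below is hypothetical (it is not Hadoop or Spark code) and only reproduces that kind of failure in isolation; "1.8.0_222" is just an example of an older-style version string.

```scala
import scala.util.Try

object VersionPrefixCheck {
  // Hypothetical helper, not part of Hadoop or Spark: takes the first three
  // characters of a version string, the operation the exception message describes.
  def versionPrefix(version: String): Try[String] =
    Try(version.substring(0, 3))

  def main(args: Array[String]): Unit = {
    // Older-style version string (example value): the prefix exists.
    println(versionPrefix("1.8.0_222")) // Success(1.8)

    // Two-character version string, as reported in the log above:
    // substring(0, 3) fails, so this prints a Failure wrapping
    // StringIndexOutOfBoundsException.
    println(versionPrefix("13"))

    // Version string of the JVM this code is actually running on.
    println(System.getProperty("java.version"))
  }
}
```

In the sbt console this can be pasted in and run with VersionPrefixCheck.main(Array.empty). If the last line prints a value shorter than three characters, the reading above fits. Spark 2.1.0 was released for Java 7 and 8, so starting sbt on a Java 8 JDK is a reasonable first thing to try.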
