Spark-Submit ClassNotFoundException even with the package name

I have a main Scala class in the package sparks.seql.project, and I am submitting my Spark job in client mode with the command:

spark-submit --class sparks.seql.project.EcommDataReader --master yarn --deploy-mode client D:\Spark-Workspace\Big-Data-Project\untitled\out\artifacts\Spark_Scala_jar\Spark-Scala.jar

However, I get:

2020-05-04 22:32:37,919 WARN deploy.SparkSubmit$$anon$2: Failed to load sparks.seql.project.EcommDataReader.
java.lang.ClassNotFoundException: sparks.seql.project.EcommDataReader
    at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at org.apache.spark.util.Utils$.classForName(Utils.scala:238)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:810)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
2020-05-04 22:32:38,004 INFO util.ShutdownHookManager: Shutdown hook called
2020-05-04 22:32:38,010 INFO util.ShutdownHookManager: Deleting directory C:\Users\Harsh\AppData\Local\Temp\spark-83beb237-5bf6-4e84-881a-60b63135f7e3
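
A quick sanity check (assuming the JDK's jar tool is on PATH; findstr plays the role of grep on Windows) is to list the jar contents and look for the class:

jar tf D:\Spark-Workspace\Big-Data-Project\untitled\out\artifacts\Spark_Scala_jar\Spark-Scala.jar | findstr EcommDataReader

If the artifact were packaged correctly, this should print entries like sparks/seql/project/EcommDataReader.class and sparks/seql/project/EcommDataReader$.class (the Scala compiler emits both for an object with a main method); an empty result means the class never made it into the jar.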

My code looks like this:

package sparks.seql.project

import org.apache.log4j.{Level, Logger}
import org.apache.spark.sql.types.IntegerType
import org.apache.spark.sql.{DataFrame, Row, SaveMode, SparkSession}
import org.apache.spark.storage.StorageLevel
import org.apache.spark.sql.functions._

object EcommDataReader {

  val spark = SparkSession.builder.appName("SapientQ1")
    .master("yarn")
    .enableHiveSupport()
    //    .config("hive.metastore.uris","thrift://localhost:9083")
    .config("spark.sql.warehouse.dir", "file://f:/sparkWarehouse/")
    .config("spark.shuffle.service.enabled", "true")
    .config("hive.exec.dynamic.partition", "true")
    .config("hive.exec.dynamic.partition.mode", "non-strict")
    .config("spark.yarn.am.memory","512m")
    .config("spark.yarn.am.cores","1")

    .config("spark.eventLog.dir","D:/sparkeventlog/")
    .getOrCreate()
  import spark.implicits._

  def main(args: Array[String]): Unit = {

  Logger.getLogger("org").setLevel(Level.OFF)
...
...

This seems like a minor issue, but nothing I try works!

UPDATE:

build.sbt

name := "Spark-Scala"

version := "0.1"

scalaVersion := "2.11.11"

val sparkVersion = "2.4.3"

val kafkaVersion = "0.11.0.0"

dependencyOverrides += "com.fasterxml.jackson.core" % "jackson-core" % "2.8.7"
dependencyOverrides += "com.fasterxml.jackson.core" % "jackson-databind" % "2.8.7"
dependencyOverrides += "com.fasterxml.jackson.module" % "jackson-module-scala_2.11" % "2.8.7"

//libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "2.7.7"
//
//libraryDependencies += "org.apache.spark" %% "spark-catalyst" % "2.1.0" % Test
//
//libraryDependencies += "org.apache.hive" % "hive-exec" % "2.1.0"
//
//libraryDependencies += "org.apache.hive" % "hive-metastore" % "2.1.0"
//libraryDependencies += "org.apache.hive" % "hive-jdbc" % "0.13.1"

libraryDependencies += "org.apache.spark" % "spark-core_2.11" % sparkVersion

libraryDependencies += "org.apache.spark" % "spark-streaming_2.11" % sparkVersion

libraryDependencies += "org.apache.spark" % "spark-sql_2.11" % sparkVersion

libraryDependencies += "org.apache.kafka" % "kafka-clients" % kafkaVersion

libraryDependencies += "org.apache.spark" % "spark-streaming-kafka-0-10_2.11" % sparkVersion

libraryDependencies += "org.apache.kafka" % "connect-json" % "2.4.0"

libraryDependencies += "org.apache.spark" %% "spark-yarn" % "2.4.3"

libraryDependencies += "org.apache.spark" %% "spark-hive" % "2.4.3"

libraryDependencies += "org.apache.spark" % "spark-sql-kafka-0-10_2.11" % "2.4.3"

//libraryDependencies += "org.apache.spark" %% "spark-hive" % "2.1.0"

libraryDependencies += "com.typesafe.play" %% "play-json" % "2.7.4"

I build the jar via Build -> Build Artifacts... -> Build in IntelliJ.
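
For comparison, a sketch of building the same jar with sbt itself instead of the IntelliJ artifact configuration (run from the project root; with name := "Spark-Scala", version := "0.1", and scalaVersion := "2.11.11" as above, sbt writes the jar under target/scala-2.11/):

sbt clean package

The resulting target/scala-2.11/spark-scala_2.11-0.1.jar is built straight from src/main/scala, so if spark-submit finds the class in that jar but not in the IntelliJ one, the artifact definition in Build Artifacts is what is dropping the compiled classes.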
