I have a main Scala class in the package sparks.seql.project, and I submit my Spark job in client mode with the command:
spark-submit --class sparks.seql.project.EcommDataReader --master yarn --deploy-mode client D:\Spark-Workspace\Big-Data-Project\untitled\out\artifacts\Spark_Scala_jar\Spark-Scala.jar
However, I get:
2020-05-04 22:32:37,919 WARN deploy.SparkSubmit$$anon$2: Failed to load sparks.seql.project.EcommDataReader.
java.lang.ClassNotFoundException: sparks.seql.project.EcommDataReader
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.spark.util.Utils$.classForName(Utils.scala:238)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:810)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
2020-05-04 22:32:38,004 INFO util.ShutdownHookManager: Shutdown hook called
2020-05-04 22:32:38,010 INFO util.ShutdownHookManager: Deleting directory C:\Users\Harsh\AppData\Local\Temp\spark-83beb237-5bf6-4e84-881a-60b63135f7e3
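One quick way to rule out an obvious packaging problem is to list the jar contents with the JDK's jar tool (a minimal check, assuming jar is on the PATH; for a Scala object both EcommDataReader.class and EcommDataReader$.class should appear under sparks/seql/project/):

jar tf D:\Spark-Workspace\Big-Data-Project\untitled\out\artifacts\Spark_Scala_jar\Spark-Scala.jar | findstr EcommDataReader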
My code looks like this:
package sparks.seql.project

import org.apache.log4j.{Level, Logger}
import org.apache.spark.sql.types.IntegerType
import org.apache.spark.sql.{DataFrame, Row, SaveMode, SparkSession}
import org.apache.spark.storage.StorageLevel
import org.apache.spark.sql.functions._

object EcommDataReader {

  val spark = SparkSession.builder.appName("SapientQ1")
    .master("yarn")
    .enableHiveSupport()
    // .config("hive.metastore.uris", "thrift://localhost:9083")
    .config("spark.sql.warehouse.dir", "file://f:/sparkWarehouse/")
    .config("spark.shuffle.service.enabled", "true")
    .config("hive.exec.dynamic.partition", "true")
    .config("hive.exec.dynamic.partition.mode", "non-strict")
    .config("spark.yarn.am.memory", "512m")
    .config("spark.yarn.am.cores", "1")
    .config("spark.eventLog.dir", "D:/sparkeventlog/")
    .getOrCreate()

  import spark.implicits._

  def main(args: Array[String]): Unit = {
    Logger.getLogger("org").setLevel(Level.OFF)
    ...
    ...
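To isolate the packaging from the Spark code itself, a bare-bones entry point in the same package could be submitted instead (PingMain is a hypothetical name; it has no Spark dependencies, so if spark-submit cannot load it either, the problem is in how the jar is built):

package sparks.seql.project

// Hypothetical minimal entry point: if spark-submit cannot load this
// class either, the jar packaging is at fault, not the Spark code.
object PingMain {
  def main(args: Array[String]): Unit =
    println("PingMain loaded from the jar")
}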
Although this looks like a minor issue, nothing I try works!
UPDATE:
build.sbt
name := "Spark-Scala"
version := "0.1"
scalaVersion := "2.11.11"
val sparkVersion = "2.4.3"
val kafkaVersion = "0.11.0.0"
dependencyOverrides += "com.fasterxml.jackson.core" % "jackson-core" % "2.8.7"
dependencyOverrides += "com.fasterxml.jackson.core" % "jackson-databind" % "2.8.7"
dependencyOverrides += "com.fasterxml.jackson.module" % "jackson-module-scala_2.11" % "2.8.7"
//libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "2.7.7"
//
//libraryDependencies += "org.apache.spark" %% "spark-catalyst" % "2.1.0" % Test
//
//libraryDependencies += "org.apache.hive" % "hive-exec" % "2.1.0"
//
//libraryDependencies += "org.apache.hive" % "hive-metastore" % "2.1.0"
//libraryDependencies += "org.apache.hive" % "hive-jdbc" % "0.13.1"
libraryDependencies += "org.apache.spark" % "spark-core_2.11" % sparkVersion
libraryDependencies += "org.apache.spark" % "spark-streaming_2.11" % sparkVersion
libraryDependencies += "org.apache.spark" % "spark-sql_2.11" % sparkVersion
libraryDependencies += "org.apache.kafka" % "kafka-clients" % kafkaVersion
libraryDependencies += "org.apache.spark" % "spark-streaming-kafka-0-10_2.11" % sparkVersion
libraryDependencies += "org.apache.kafka" % "connect-json" % "2.4.0"
libraryDependencies += "org.apache.spark" %% "spark-yarn" % "2.4.3"
libraryDependencies += "org.apache.spark" %% "spark-hive" % "2.4.3"
libraryDependencies += "org.apache.spark" % "spark-sql-kafka-0-10_2.11" % "2.4.3"
//libraryDependencies += "org.apache.spark" %% "spark-hive" % "2.1.0"
libraryDependencies += "com.typesafe.play" %% "play-json" % "2.7.4"
I build the jar via Build -> Build Artifacts... -> Build in IntelliJ.
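For comparison, packaging from sbt instead of the IntelliJ artifact builder is a common way to rule the artifact step out (a sketch, assuming the sbt-assembly plugin; the version shown is illustrative):

// project/plugins.sbt
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.10")

// in build.sbt, Spark dependencies can be marked provided so they are
// not bundled into the fat jar (the cluster supplies them at runtime):
libraryDependencies += "org.apache.spark" %% "spark-core" % sparkVersion % "provided"

Running sbt assembly should then produce target/scala-2.11/Spark-Scala-assembly-0.1.jar, which can be passed to spark-submit in place of the IntelliJ artifact.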