I ran the following script, and then an error occurred.
Code:
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('ml-bank').getOrCreate()
# Raw string so the backslashes in the Windows path are not treated as escape sequences
my_data = spark.read.csv(r'C:\Personal\Portfolio\spark-ml\ind-ban-comment.csv', header=True, inferSchema=True)
my_data.printSchema()
Error:
Py4JJavaError: An error occurred while calling o33.csv.
: java.lang.IllegalStateException: No active or default Spark session found
at org.apache.spark.sql.SparkSession$.$anonfun$active$2(SparkSession.scala:1009)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.sql.SparkSession$.$anonfun$active$1(SparkSession.scala:1009)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.sql.SparkSession$.active(SparkSession.scala:1008)
at org.apache.spark.sql.execution.datasources.v2.FileDataSourceV2.sparkSession(FileDataSourceV2.scala:42)
at org.apache.spark.sql.execution.datasources.v2.FileDataSourceV2.sparkSession$(FileDataSourceV2.scala:42)
at org.apache.spark.sql.execution.datasources.v2.csv.CSVDataSourceV2.sparkSession$lzycompute(CSVDataSourceV2.scala:26)
at org.apache.spark.sql.execution.datasources.v2.csv.CSVDataSourceV2.sparkSession(CSVDataSourceV2.scala:26)
at org.apache.spark.sql.execution.datasources.v2.FileDataSourceV2.qualifiedPathName(FileDataSourceV2.scala:59)
at org.apache.spark.sql.execution.datasources.v2.FileDataSourceV2.$anonfun$getTableName$1(FileDataSourceV2.scala:53)
at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at scala.collection.TraversableLike.map(TraversableLike.scala:238)
at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
at scala.collection.AbstractTraversable.map(Traversable.scala:108)
at org.apache.spark.sql.execution.datasources.v2.FileDataSourceV2.getTableName(FileDataSourceV2.scala:53)
at org.apache.spark.sql.execution.datasources.v2.FileDataSourceV2.getTableName$(FileDataSourceV2.scala:52)
at org.apache.spark.sql.execution.datasources.v2.csv.CSVDataSourceV2.getTableName(CSVDataSourceV2.scala:26)
at org.apache.spark.sql.execution.datasources.v2.csv.CSVDataSourceV2.getTable(CSVDataSourceV2.scala:34)
at org.apache.spark.sql.DataFrameReader.$anonfun$load$1(DataFrameReader.scala:220)
at scala.Option.map(Option.scala:230)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:206)
at org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:648)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Unknown Source)
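As an aside, the original path literal ('C:\Personal\...') uses bare backslashes in a normal Python string. Here the sequences happen to be invalid escapes and are kept literally, but in general a backslash pair like '\n' silently becomes a newline, which is why a raw string is safer for Windows paths. A small sketch of the difference (plain Python, unrelated to the stack trace above):

```python
# In a normal string, '\n' is a newline; in a raw string, backslashes stay literal.
plain = 'C:\new\data.csv'    # contains a real newline where '\n' was typed
raw = r'C:\new\data.csv'     # exactly the characters as written
print(len(plain), len(raw))  # lengths differ: 14 vs 15
```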
I saw someone suggest using Java 8 in another post, but my Java is already version 8. I would like to know what the problem is.
Thank you!