Я новичок в Spark и пытаюсь прочитать файл CSV с помощью проекта Java maven, но получаю ошибку ArrayIndexOutOfBoundsException.
Зависимости:
<dependencies>
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-core -->
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.12</artifactId>
<version>2.4.0</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-sql -->
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.12</artifactId>
<version>2.4.0</version>
</dependency>
</dependencies>
Код:
SparkSession spark = SparkSession
.builder()
.appName("Java Spark SQL Example")
.config("spark.master", "local")
.getOrCreate();
//Read file
Dataset<Row> df = spark.read()
.format("csv")
.load("test.csv");
Файл CSV
name,code
A,1
B,3
C,5
Вот трассировка стека
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
18/11/12 11:16:25 INFO SparkContext: Running Spark version 2.4.0
18/11/12 11:16:25 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/11/12 11:16:25 INFO SparkContext: Submitted application: Java Spark SQL Example
18/11/12 11:16:25 INFO SecurityManager: Changing view acls to: vkumar
18/11/12 11:16:25 INFO SecurityManager: Changing modify acls to: vkumar
18/11/12 11:16:25 INFO SecurityManager: Changing view acls groups to:
18/11/12 11:16:25 INFO SecurityManager: Changing modify acls groups to:
18/11/12 11:16:25 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(vkumar); groups with view permissions: Set(); users with modify permissions: Set(vkumar); groups with modify permissions: Set()
18/11/12 11:16:26 INFO Utils: Successfully started service 'sparkDriver' on port 54382.
18/11/12 11:16:26 INFO SparkEnv: Registering MapOutputTracker
18/11/12 11:16:26 INFO SparkEnv: Registering BlockManagerMaster
18/11/12 11:16:26 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
18/11/12 11:16:26 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
18/11/12 11:16:26 INFO DiskBlockManager: Created local directory at /private/var/folders/32/7klbv5_94wbddn9kgzvbwnkr0000gn/T/blockmgr-ddd2a79d-7fae-48e4-9658-1a8e2a8bb734
18/11/12 11:16:26 INFO MemoryStore: MemoryStore started with capacity 4.1 GB
18/11/12 11:16:26 INFO SparkEnv: Registering OutputCommitCoordinator
18/11/12 11:16:26 INFO Utils: Successfully started service 'SparkUI' on port 4040.
18/11/12 11:16:26 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://kirmac133.domain.com:4040
18/11/12 11:16:26 INFO Executor: Starting executor ID driver on host localhost
18/11/12 11:16:26 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 54383.
18/11/12 11:16:26 INFO NettyBlockTransferService: Server created on kirmac133.domain.com:54383
18/11/12 11:16:26 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
18/11/12 11:16:26 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, kirmac133.domain.com, 54383, None)
18/11/12 11:16:26 INFO BlockManagerMasterEndpoint: Registering block manager kirmac133.domain.com:54383 with 4.1 GB RAM, BlockManagerId(driver, kirmac133.domain.com, 54383, None)
18/11/12 11:16:26 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, kirmac133.domain.com, 54383, None)
18/11/12 11:16:26 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, kirmac133.domain.com, 54383, None)
18/11/12 11:16:26 INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir ('file:/Users/vkumar/Documents/intellij/tmvalidator/spark-warehouse').
18/11/12 11:16:26 INFO SharedState: Warehouse path is 'file:/Users/vkumar/Documents/intellij/tmvalidator/spark-warehouse'.
18/11/12 11:16:27 INFO StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint
18/11/12 11:16:28 INFO FileSourceStrategy: Pruning directories with:
18/11/12 11:16:28 INFO FileSourceStrategy: Post-Scan Filters: (length(trim(value#0, None)) > 0)
18/11/12 11:16:28 INFO FileSourceStrategy: Output Data Schema: struct<value: string>
18/11/12 11:16:28 INFO FileSourceScanExec: Pushed Filters:
18/11/12 11:16:29 INFO CodeGenerator: Code generated in 130.688148 ms
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 10582
at com.thoughtworks.paranamer.BytecodeReadingParanamer$ClassReader.accept(BytecodeReadingParanamer.java:563)
at com.thoughtworks.paranamer.BytecodeReadingParanamer$ClassReader.access$200(BytecodeReadingParanamer.java:338)
at com.thoughtworks.paranamer.BytecodeReadingParanamer.lookupParameterNames(BytecodeReadingParanamer.java:103)
at com.thoughtworks.paranamer.CachingParanamer.lookupParameterNames(CachingParanamer.java:90)
at com.fasterxml.jackson.module.scala.introspect.BeanIntrospector$.getCtorParams(BeanIntrospector.scala:44)
at com.fasterxml.jackson.module.scala.introspect.BeanIntrospector$.$anonfun$apply$1(BeanIntrospector.scala:58)
at com.fasterxml.jackson.module.scala.introspect.BeanIntrospector$.$anonfun$apply$1$adapted(BeanIntrospector.scala:58)
at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:240)
at scala.collection.Iterator.foreach(Iterator.scala:937)
at scala.collection.Iterator.foreach$(Iterator.scala:937)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1425)
at scala.collection.IterableLike.foreach(IterableLike.scala:70)
at scala.collection.IterableLike.foreach$(IterableLike.scala:69)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at scala.collection.TraversableLike.flatMap(TraversableLike.scala:240)
at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:237)
at scala.collection.AbstractTraversable.flatMap(Traversable.scala:104)
at com.fasterxml.jackson.module.scala.introspect.BeanIntrospector$.findConstructorParam$1(BeanIntrospector.scala:58)
at com.fasterxml.jackson.module.scala.introspect.BeanIntrospector$.$anonfun$apply$19(BeanIntrospector.scala:176)
at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:233)
at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:32)
at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:29)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:194)
at scala.collection.TraversableLike.map(TraversableLike.scala:233)
at scala.collection.TraversableLike.map$(TraversableLike.scala:226)
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:194)
at com.fasterxml.jackson.module.scala.introspect.BeanIntrospector$.$anonfun$apply$14(BeanIntrospector.scala:170)
at com.fasterxml.jackson.module.scala.introspect.BeanIntrospector$.$anonfun$apply$14$adapted(BeanIntrospector.scala:169)
at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:240)
at scala.collection.immutable.List.foreach(List.scala:388)
at scala.collection.TraversableLike.flatMap(TraversableLike.scala:240)
at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:237)
at scala.collection.immutable.List.flatMap(List.scala:351)
at com.fasterxml.jackson.module.scala.introspect.BeanIntrospector$.apply(BeanIntrospector.scala:169)
at com.fasterxml.jackson.module.scala.introspect.ScalaAnnotationIntrospector$._descriptorFor(ScalaAnnotationIntrospectorModule.scala:22)
at com.fasterxml.jackson.module.scala.introspect.ScalaAnnotationIntrospector$.fieldName(ScalaAnnotationIntrospectorModule.scala:30)
at com.fasterxml.jackson.module.scala.introspect.ScalaAnnotationIntrospector$.findImplicitPropertyName(ScalaAnnotationIntrospectorModule.scala:78)
at com.fasterxml.jackson.databind.introspect.AnnotationIntrospectorPair.findImplicitPropertyName(AnnotationIntrospectorPair.java:467)
at com.fasterxml.jackson.databind.introspect.POJOPropertiesCollector._addFields(POJOPropertiesCollector.java:351)
at com.fasterxml.jackson.databind.introspect.POJOPropertiesCollector.collectAll(POJOPropertiesCollector.java:283)
at com.fasterxml.jackson.databind.introspect.POJOPropertiesCollector.getJsonValueMethod(POJOPropertiesCollector.java:169)
at com.fasterxml.jackson.databind.introspect.BasicBeanDescription.findJsonValueMethod(BasicBeanDescription.java:223)
at com.fasterxml.jackson.databind.ser.BasicSerializerFactory.findSerializerByAnnotations(BasicSerializerFactory.java:348)
at com.fasterxml.jackson.databind.ser.BeanSerializerFactory._createSerializer2(BeanSerializerFactory.java:210)
at com.fasterxml.jackson.databind.ser.BeanSerializerFactory.createSerializer(BeanSerializerFactory.java:153)
at com.fasterxml.jackson.databind.SerializerProvider._createUntypedSerializer(SerializerProvider.java:1203)
at com.fasterxml.jackson.databind.SerializerProvider._createAndCacheUntypedSerializer(SerializerProvider.java:1157)
at com.fasterxml.jackson.databind.SerializerProvider.findValueSerializer(SerializerProvider.java:481)
at com.fasterxml.jackson.databind.SerializerProvider.findTypedValueSerializer(SerializerProvider.java:679)
at com.fasterxml.jackson.databind.ser.DefaultSerializerProvider.serializeValue(DefaultSerializerProvider.java:107)
at com.fasterxml.jackson.databind.ObjectMapper._configAndWriteValue(ObjectMapper.java:3559)
at com.fasterxml.jackson.databind.ObjectMapper.writeValueAsString(ObjectMapper.java:2927)
at org.apache.spark.rdd.RDDOperationScope.toJson(RDDOperationScope.scala:52)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:142)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
at org.apache.spark.sql.execution.SparkPlan.getByteArrayRdd(SparkPlan.scala:247)
at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:339)
at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:38)
at org.apache.spark.sql.Dataset.collectFromPlan(Dataset.scala:3384)
at org.apache.spark.sql.Dataset.$anonfun$head$1(Dataset.scala:2545)
at org.apache.spark.sql.Dataset.$anonfun$withAction$2(Dataset.scala:3365)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:78)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3365)
at org.apache.spark.sql.Dataset.head(Dataset.scala:2545)
at org.apache.spark.sql.Dataset.take(Dataset.scala:2759)
at org.apache.spark.sql.execution.datasources.csv.TextInputCSVDataSource$.infer(CSVDataSource.scala:232)
at org.apache.spark.sql.execution.datasources.csv.CSVDataSource.inferSchema(CSVDataSource.scala:68)
at org.apache.spark.sql.execution.datasources.csv.CSVFileFormat.inferSchema(CSVFileFormat.scala:63)
at org.apache.spark.sql.execution.datasources.DataSource.$anonfun$getOrInferFileFormatSchema$12(DataSource.scala:183)
at scala.Option.orElse(Option.scala:289)
at org.apache.spark.sql.execution.datasources.DataSource.getOrInferFileFormatSchema(DataSource.scala:180)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:373)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
at kir.com.tmvalidator.Validator.init(Validator.java:30)
at kir.com.tmvalidator.Home.main(Home.java:7)
18/11/12 11:16:29 INFO SparkContext: Invoking stop() from shutdown hook
18/11/12 11:16:29 INFO SparkUI: Stopped Spark web UI at http://kirmac133.domain.com:4040
18/11/12 11:16:29 INFO ContextCleaner: Cleaned accumulator 4
18/11/12 11:16:29 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
18/11/12 11:16:29 INFO MemoryStore: MemoryStore cleared
18/11/12 11:16:29 INFO BlockManager: BlockManager stopped
18/11/12 11:16:29 INFO BlockManagerMaster: BlockManagerMaster stopped
18/11/12 11:16:29 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
18/11/12 11:16:29 INFO SparkContext: Successfully stopped SparkContext
18/11/12 11:16:29 INFO ShutdownHookManager: Shutdown hook called
18/11/12 11:16:29 INFO ShutdownHookManager: Deleting directory /private/var/folders/32/7klbv5_94wbddn9kgzvbwnkr0000gn/T/spark-1bde2bb5-603d-4fae-9128-7d92f259077b