I am performing the following operations:
Dataset<Row> df1 = spark.read().format(AVRO_MODE).load(path1);
Dataset<Row> distinctDf1 = df1.distinct();
Dataset<Row> df2 = spark.read().format("com.databricks.spark.avro").load(path2);
Dataset<Row> joinOutput = distinctDf1.join(df2, getColumnStrings(columnList), "leftanti");
joinOutput.repartition(numPartitions, functions.col("col1")).write().format("com.databricks.spark.avro").save(outputPath);
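For reference, here is what distinct() followed by a "leftanti" join on key columns is expected to compute, sketched in plain Java collections rather than Spark. This is only an illustration of the semantics, not how Spark executes it; the class name LeftAntiSketch and the map-based row model are hypothetical stand-ins for Spark's Row and for the columns returned by getColumnStrings(columnList).

```java
import java.util.*;
import java.util.stream.*;

/** Plain-Java illustration (not Spark) of distinct() followed by a "leftanti" join. */
public class LeftAntiSketch {

    // Rows are modelled as maps from column name to value (hypothetical stand-in for Row).
    // Assumes non-null key values: Spark equi-joins never match null keys, this sketch would.
    static List<Map<String, Object>> distinctLeftAnti(List<Map<String, Object>> left,
                                                      List<Map<String, Object>> right,
                                                      List<String> keys) {
        // Collect every key tuple that appears on the right side.
        Set<List<Object>> rightKeys = right.stream()
                .map(r -> keys.stream().map(r::get).collect(Collectors.toList()))
                .collect(Collectors.toSet());

        // distinct(): drop duplicate rows (LinkedHashSet keeps first-seen order).
        // leftanti: keep only left rows whose key tuple has no match on the right.
        return new ArrayList<>(new LinkedHashSet<>(left)).stream()
                .filter(l -> !rightKeys.contains(
                        keys.stream().map(l::get).collect(Collectors.toList())))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Map<String, Object>> left = List.of(
                Map.of("id", 1, "v", "a"),
                Map.of("id", 1, "v", "a"),   // exact duplicate, removed by distinct()
                Map.of("id", 2, "v", "b"),
                Map.of("id", 3, "v", "c"));
        List<Map<String, Object>> right = List.of(Map.of("id", 2, "w", "x"));

        List<Map<String, Object>> out = distinctLeftAnti(left, right, List.of("id"));
        System.out.println(out.stream().map(m -> m.get("id")).collect(Collectors.toList()));
        // prints [1, 3]
    }
}
```

The result is the deduplicated left rows that have no matching key on the right, which is what joinOutput should contain before it is repartitioned and written.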
When I run the above, I get the following error:
18/10/26 02:35:19 [Driver]: ERROR yarn.ApplicationMaster: User class threw exception: java.lang.StackOverflowError
java.lang.StackOverflowError
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.get(TreeNode.scala:58)
at org.apache.spark.sql.catalyst.trees.TreeNode.<init>(TreeNode.scala:80)
at org.apache.spark.sql.catalyst.expressions.Expression.<init>(Expression.scala:55)
at org.apache.spark.sql.catalyst.expressions.LeafExpression.<init>(Expression.scala:335)
at org.apache.spark.sql.catalyst.expressions.BoundReference.<init>(BoundAttribute.scala:32)
at org.apache.spark.sql.execution.CodegenSupport$$anonfun$4.apply(WholeStageCodegenExec.scala:153)
at org.apache.spark.sql.execution.CodegenSupport$$anonfun$4.apply(WholeStageCodegenExec.scala:152)
at scala.collection.immutable.Stream$$anonfun$map$1.apply(Stream.scala:418)
at scala.collection.immutable.Stream$$anonfun$map$1.apply(Stream.scala:418)
at scala.collection.immutable.Stream$Cons.tail(Stream.scala:1233)
at scala.collection.immutable.Stream$Cons.tail(Stream.scala:1223)
at scala.collection.immutable.Stream.drop(Stream.scala:858)
at scala.collection.immutable.Stream.drop(Stream.scala:202)
at scala.collection.LinearSeqOptimized$class.apply(LinearSeqOptimized.scala:64)
Any suggestions as to why this error occurs would be appreciated.