Check this out:
scala> val df = Seq((101,Array("a","b","c","d","e")),(102,Array("q","w","e")),(103,Array("z","x","w","t","e","q","s"))).toDF("emp","list")
df: org.apache.spark.sql.DataFrame = [emp: int, list: array<string>]
scala> df.show(false)
+---+---------------------+
|emp|list                 |
+---+---------------------+
|101|[a, b, c, d, e]      |
|102|[q, w, e]            |
|103|[z, x, w, t, e, q, s]|
+---+---------------------+
scala> val udf_slice = udf( (x:Seq[String]) => x.grouped(3).toList )
udf_slice: org.apache.spark.sql.expressions.UserDefinedFunction = UserDefinedFunction(<function1>,ArrayType(ArrayType(StringType,true),true),Some(List(ArrayType(StringType,true))))
scala> df.select(col("*"), explode(udf_slice($"list")).as("newlist")).select($"emp", $"newlist"(0).as("col1"), $"newlist"(1).as("col2"), $"newlist"(2).as("col3") ).show(false)
+---+----+----+----+
|emp|col1|col2|col3|
+---+----+----+----+
|101|a   |b   |c   |
|101|d   |e   |null|
|102|q   |w   |e   |
|103|z   |x   |w   |
|103|t   |e   |q   |
|103|s   |null|null|
+---+----+----+----+
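To see what the UDF body contributes on its own, here is a minimal plain-Scala sketch with no Spark involved. `chunksOfThree` repeats the `grouped(3).toList` logic from `udf_slice`; `columnAt` is a hypothetical helper, not part of the answer, that stands in for the way `$"newlist"(i)` produced `null` for a missing index in the output above (it returns `None` instead).

```scala
object GroupedSketch {
  // Same logic as the UDF body: split a sequence into chunks of at most 3 elements.
  def chunksOfThree(xs: Seq[String]): List[Seq[String]] = xs.grouped(3).toList

  // Hypothetical helper: models an out-of-range array index as None,
  // where the Spark output above shows null.
  def columnAt(chunk: Seq[String], i: Int): Option[String] = chunk.lift(i)

  def main(args: Array[String]): Unit = {
    val chunks = chunksOfThree(Seq("a", "b", "c", "d", "e"))
    // Print each chunk as three "columns", padding short chunks with "null".
    chunks.foreach { c =>
      println((0 until 3).map(i => columnAt(c, i).getOrElse("null")).mkString("|"))
    }
    // prints:
    // a|b|c
    // d|e|null
  }
}
```

The last chunk is simply shorter than 3, so the padding in `col2`/`col3` comes from the column lookup, not from `grouped`.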
Spark 2.4: I just tried to implement this without UDFs, but the slice() function does not accept other columns as parameters for the range, so this version still uses a small UDF for the slicing:
val df = Seq((101,Array("a","b","c","d","e")),(102,Array("q","w","e")),(103,Array("z","x","w","t","e","q","s"))).toDF("emp","list")
df.show(false)
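// One dummy element per chunk of 3, i.e. ceil(size / 3) elements
val df2 = df.withColumn("list_size_arr", array_repeat(lit(1), ceil(size('list)/3).cast("int")) )
// posexplode yields one row per chunk, with the chunk index in "pos"
val df3 = df2.select(col("*"),posexplode('list_size_arr))
// Plain Scala slice over [start, end); an end index past the array is clamped
val udf_slice = udf( (x:Seq[String],start:Int, end:Int ) => x.slice(start,end) )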
df3.withColumn("newlist",udf_slice('list,'pos*3, ('pos+1)*3 )).select($"emp", $"newlist").show(false)
Results:
+---+---------------------+
|emp|list                 |
+---+---------------------+
|101|[a, b, c, d, e]      |
|102|[q, w, e]            |
|103|[z, x, w, t, e, q, s]|
+---+---------------------+
+---+---------+
|emp|newlist  |
+---+---------+
|101|[a, b, c]|
|101|[d, e]   |
|102|[q, w, e]|
|103|[z, x, w]|
|103|[t, e, q]|
|103|[s]      |
+---+---------+
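The index arithmetic behind this version can be checked without Spark. The sketch below is plain Scala and only models the logic: `chunkCount` mirrors `ceil(size('list)/3).cast("int")`, and `slices` applies the same `[pos*3, (pos+1)*3)` range that `udf_slice` receives for each exploded position. The object and method names are made up for illustration.

```scala
object SliceIndexSketch {
  val ChunkSize = 3

  // Number of chunks for a list of n elements: ceil(n / 3) as an Int.
  def chunkCount(n: Int): Int = math.ceil(n.toDouble / ChunkSize).toInt

  // For each position 0 until chunkCount, take the half-open range
  // [pos * 3, (pos + 1) * 3). Seq.slice clamps the end index, so the
  // last chunk may be shorter than 3.
  def slices(xs: Seq[String]): Seq[Seq[String]] =
    (0 until chunkCount(xs.size)).map { pos =>
      xs.slice(pos * ChunkSize, (pos + 1) * ChunkSize)
    }

  def main(args: Array[String]): Unit = {
    slices(Seq("z", "x", "w", "t", "e", "q", "s"))
      .foreach(s => println(s.mkString("[", ", ", "]")))
    // prints:
    // [z, x, w]
    // [t, e, q]
    // [s]
  }
}
```

An empty list gives `chunkCount(0) == 0`, so in this model it produces no chunks at all.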
To get the values into separate columns:
val df4 = df3.withColumn("newlist",udf_slice('list,'pos*3, ('pos+1)*3 )).select($"emp", $"newlist")
df4.select($"emp", $"newlist"(0).as("col1"), $"newlist"(1).as("col2"), $"newlist"(2).as("col3") ).show(false)
+---+----+----+----+
|emp|col1|col2|col3|
+---+----+----+----+
|101|a   |b   |c   |
|101|d   |e   |null|
|102|q   |w   |e   |
|103|z   |x   |w   |
|103|t   |e   |q   |
|103|s   |null|null|
+---+----+----+----+