A possible solution could be:
scala> output.show
+---+---------+
| id|vectorCol|
+---+---------+
|  0|[1.2,1.3]|
|  1|[2.2,2.3]|
|  2|[3.2,3.3]|
+---+---------+
scala> output.printSchema
root
 |-- id: integer (nullable = false)
 |-- vectorCol: vector (nullable = true)
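(For context: the session above starts from an existing output DataFrame with an ML vector column. This sketch is not part of the original session, but the input can be reproduced roughly like this, assuming spark-shell where spark.implicits._ is already in scope:)

// Sketch only: builds the sample input shown above.
import org.apache.spark.ml.linalg.Vectors

val output = Seq(
  (0, Vectors.dense(1.2, 1.3)),
  (1, Vectors.dense(2.2, 2.3)),
  (2, Vectors.dense(3.2, 3.3))
).toDF("id", "vectorCol")  // vectorCol is stored as Spark's ML vector UDT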
scala> import org.apache.spark.ml.linalg.DenseVector
import org.apache.spark.ml.linalg.DenseVector
scala> val toArr: Any => Array[Double] = _.asInstanceOf[DenseVector].toArray
toArr: Any => Array[Double] = <function1>
scala> val toArrUdf = udf(toArr)
toArrUdf: org.apache.spark.sql.expressions.UserDefinedFunction = UserDefinedFunction(<function1>,ArrayType(DoubleType,false),None)
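(One caveat, as a hedged variant rather than part of the original answer: the cast to DenseVector throws a ClassCastException if any row holds a SparseVector, which many ML transformers produce. Casting to the common Vector trait handles both; the names toArrSafe/toArrSafeUdf are mine:)

// Safer variant of the same Any-based trick: works for dense and sparse rows.
import org.apache.spark.ml.linalg.Vector
import org.apache.spark.sql.functions.udf

val toArrSafe: Any => Array[Double] = _.asInstanceOf[Vector].toArray
val toArrSafeUdf = udf(toArrSafe)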
scala> val df1 = output.withColumn("features_arr",toArrUdf('vectorCol))
scala> df1.show
+---+---------+------------+
| id|vectorCol|features_arr|
+---+---------+------------+
|  0|[1.2,1.3]|  [1.2, 1.3]|
|  1|[2.2,2.3]|  [2.2, 2.3]|
|  2|[3.2,3.3]|  [3.2, 3.3]|
+---+---------+------------+
scala> df1.printSchema
root
 |-- id: integer (nullable = false)
 |-- vectorCol: vector (nullable = true)
 |-- features_arr: array (nullable = true)
 |    |-- element: double (containsNull = false)
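(If you are on Spark 3.0 or later, the same conversion is built in and no UDF is needed at all; a minimal sketch using the ML functions API:)

// Spark 3.0+ only: vector_to_array handles dense and sparse vectors natively.
import org.apache.spark.ml.functions.vector_to_array

val df2 = output.withColumn("features_arr", vector_to_array($"vectorCol"))

This keeps the conversion inside Spark's optimizer and avoids the serialization overhead of a UDF.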
A possible PySpark implementation can be found at this link.
Let me know if this helps!