Попытка реализовать алгоритм регрессора дерева решений для некоторых обучающих данных, но когда я вызываю fit (), я получаю ошибку.
(trainingData, testData) = data.randomSplit([0.7, 0.3])
vecAssembler = VectorAssembler(inputCols=["_1", "_2", "_3", "_4", "_5", "_6", "_7", "_8", "_9", "_10"], outputCol="features")
dt = DecisionTreeRegressor(featuresCol="features", labelCol="_11")
dt_model = dt.fit(trainingData)
Создает ошибку
File "spark.py", line 100, in <module>
main()
File "spark.py", line 87, in main
dt_model = dt.fit(trainingData)
File "/opt/spark/python/pyspark/ml/base.py", line 132, in fit
return self._fit(dataset)
File "/opt/spark/python/pyspark/ml/wrapper.py", line 295, in _fit
java_model = self._fit_java(dataset)
File "/opt/spark/python/pyspark/ml/wrapper.py", line 292, in _fit_java
return self._java_obj.fit(dataset._jdf)
File "/opt/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
File "/opt/spark/python/pyspark/sql/utils.py", line 79, in deco
raise IllegalArgumentException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.IllegalArgumentException: u'requirement failed: Column features must be of type struct<type:tinyint,size:int,indices:array<int>,values:array<double>> but was actually struct<type:tinyint,size:int,indices:array<int>,values:array<double>>.'
Ноструктуры данных точно такие же.