I am building a random forest classifier using pyspark. I want to set featureSubsetStrategy to a number rather than auto, sqrt, etc. The documentation states:
featureSubsetStrategy = Param(parent='undefined', name='featureSubsetStrategy', doc='The number of features to consider for splits at each tree node. Supported options: auto, all, onethird, sqrt, log2, (0.0-1.0], [1-n].')
However, when I choose a number such as 0.2, for example, I get the following error:
TypeError: Invalid param value given for param "featureSubsetStrategy". Could not convert <class 'float'> to string type
The same thing happens if I use featureSubsetStrategy=5. How do you set it so that it can be an int or a float?
Example:
# setting target label
label_col = 'veh_pref_Economy'
# random forest parameters
max_depth = 2
subset_strategy = 0.2037
impurity = 'gini'
min_instances_per_node = 41
num_trees = 1
seed = 1246
from pyspark.ml.classification import RandomForestClassifier

rf_econ_gen = (RandomForestClassifier()
.setLabelCol(label_col)
.setFeaturesCol("features")
.setMaxDepth(max_depth)
.setFeatureSubsetStrategy(subset_strategy)
.setImpurity(impurity)
.setMinInstancesPerNode(min_instances_per_node)
.setNumTrees(num_trees)
.setSeed(seed))
Returns:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
~/spark-2.2.1-bin-hadoop2.7/python/pyspark/ml/param/__init__.py in _set(self, **kwargs)
418 try:
--> 419 value = p.typeConverter(value)
420 except TypeError as e:
~/spark-2.2.1-bin-hadoop2.7/python/pyspark/ml/param/__init__.py in toString(value)
203 else:
--> 204 raise TypeError("Could not convert %s to string type" % type(value))
205
TypeError: Could not convert <class 'float'> to string type
During handling of the above exception, another exception occurred:
TypeError Traceback (most recent call last)
<ipython-input-28-71b9c2a0f1a0> in <module>()
3 .setFeaturesCol("features")
4 .setMaxDepth(max_depth)
----> 5 .setFeatureSubsetStrategy(subset_strategy)
6 .setImpurity(impurity)
7 .setMinInstancesPerNode(min_instances_per_node)
~/spark-2.2.1-bin-hadoop2.7/python/pyspark/ml/regression.py in setFeatureSubsetStrategy(self, value)
632 Sets the value of :py:attr:`featureSubsetStrategy`.
633 """
--> 634 return self._set(featureSubsetStrategy=value)
635
636 @since("1.4.0")
~/spark-2.2.1-bin-hadoop2.7/python/pyspark/ml/param/__init__.py in _set(self, **kwargs)
419 value = p.typeConverter(value)
420 except TypeError as e:
--> 421 raise TypeError('Invalid param value given for param "%s". %s' % (p.name, e))
422 self._paramMap[p] = value
423 return self
TypeError: Invalid param value given for param "featureSubsetStrategy". Could not convert <class 'float'> to string type
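Judging by the traceback, the param's type converter only accepts strings, so the numeric value would need to be passed as its string form (e.g. "0.2037"). Below is a minimal sketch of that behavior; `to_string` is a simplified stand-in I wrote to mirror the converter in the traceback, not pyspark's actual implementation:

```python
def to_string(value):
    """Simplified stand-in for the string type converter seen in the
    traceback: a str passes through, anything else raises TypeError."""
    if isinstance(value, str):
        return value
    raise TypeError("Could not convert %s to string type" % type(value))


subset_strategy = 0.2037

# Passing the float directly fails, as in the question.
try:
    to_string(subset_strategy)
except TypeError as e:
    print(e)  # Could not convert <class 'float'> to string type

# Converting it to a string first satisfies the converter, which suggests
# .setFeatureSubsetStrategy(str(subset_strategy)) as a possible workaround.
print(to_string(str(subset_strategy)))
```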