PySpark: random forest featureSubsetStrategy does not accept an int or float
asked 11 September 2018

I am building a random forest classifier in PySpark. I want to set featureSubsetStrategy to a number rather than auto, sqrt, and so on. The documentation states:

featureSubsetStrategy = Param(parent='undefined', name='featureSubsetStrategy', doc='The number of features to consider for splits at each tree node. Supported options: auto, all, onethird, sqrt, log2, (0.0-1.0], [1-n].')

However, when I pick a number such as 0.2, I get the following error:

TypeError: Invalid param value given for param "featureSubsetStrategy". Could not convert <class 'float'> to string type

The same thing happens if I use featureSubsetStrategy=5. How do you set it so that it accepts an int or a float?

Example:

from pyspark.ml.classification import RandomForestClassifier

# setting target label
label_col = 'veh_pref_Economy'

# random forest parameters
max_depth = 2
subset_strategy = 0.2037
impurity = 'gini'
min_instances_per_node = 41
num_trees = 1
seed = 1246

rf_econ_gen = (RandomForestClassifier()
                 .setLabelCol(label_col)
                 .setFeaturesCol("features")
                 .setMaxDepth(max_depth)
                 .setFeatureSubsetStrategy(subset_strategy)
                 .setImpurity(impurity)
                 .setMinInstancesPerNode(min_instances_per_node)
                 .setNumTrees(num_trees)
                 .setSeed(seed))

This returns:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
~/spark-2.2.1-bin-hadoop2.7/python/pyspark/ml/param/__init__.py in _set(self, **kwargs)
    418                 try:
--> 419                     value = p.typeConverter(value)
    420                 except TypeError as e:

~/spark-2.2.1-bin-hadoop2.7/python/pyspark/ml/param/__init__.py in toString(value)
    203         else:
--> 204             raise TypeError("Could not convert %s to string type" % type(value))
    205 

TypeError: Could not convert <class 'float'> to string type

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
<ipython-input-28-71b9c2a0f1a0> in <module>()
      3                  .setFeaturesCol("features")
      4                  .setMaxDepth(max_depth)
----> 5                  .setFeatureSubsetStrategy(subset_strategy)
      6                  .setImpurity(impurity)
      7                  .setMinInstancesPerNode(min_instances_per_node)

~/spark-2.2.1-bin-hadoop2.7/python/pyspark/ml/regression.py in setFeatureSubsetStrategy(self, value)
    632         Sets the value of :py:attr:`featureSubsetStrategy`.
    633         """
--> 634         return self._set(featureSubsetStrategy=value)
    635 
    636     @since("1.4.0")

~/spark-2.2.1-bin-hadoop2.7/python/pyspark/ml/param/__init__.py in _set(self, **kwargs)
    419                     value = p.typeConverter(value)
    420                 except TypeError as e:
--> 421                     raise TypeError('Invalid param value given for param "%s". %s' % (p.name, e))
    422             self._paramMap[p] = value
    423         return self

TypeError: Invalid param value given for param "featureSubsetStrategy". Could not convert <class 'float'> to string type
...
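Judging by the traceback, the param's type converter (toString in pyspark/ml/param/__init__.py) only accepts string values, and the doc string quoted above lists the fraction and count forms alongside the other string options. So my guess is that the fraction or feature count has to be passed as a string such as "0.2037" or "5", not as a Python float or int. A minimal sketch of that assumption, reusing the same settings as the failing example above with only the strategy value converted to a string:

from pyspark.ml.classification import RandomForestClassifier

# Sketch, assuming the string form is what the param expects: the numeric
# value is wrapped in str() so the string type converter can handle it.
rf_econ_gen = (RandomForestClassifier()
                 .setLabelCol(label_col)
                 .setFeaturesCol("features")
                 .setMaxDepth(max_depth)
                 .setFeatureSubsetStrategy(str(subset_strategy))  # "0.2037"
                 .setImpurity(impurity)
                 .setMinInstancesPerNode(min_instances_per_node)
                 .setNumTrees(num_trees)
                 .setSeed(seed))

Is passing the value as a string the intended way, or is there a setter that takes a numeric type directly?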