I am using the Melbourne housing dataset from Kaggle to fit a regression model, with price as the target value. You can find the dataset here
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.ensemble.partial_dependence import partial_dependence, plot_partial_dependence
from sklearn.preprocessing import Imputer
cols_to_use = ['Distance', 'Landsize', 'BuildingArea']
data = pd.read_csv('data/melb_house_pricing.csv')
# drop rows where target is NaN
data = data.loc[~(data['Price'].isna())]
y = data.Price
X = data[cols_to_use]
my_imputer = Imputer()
imputed_X = my_imputer.fit_transform(X)
print(f"Contains NaNs in training data: {np.isnan(imputed_X).sum()}")
print(f"Contains NaNs in target data: {np.isnan(y).sum()}")
print(f"Contains Infinity: {np.isinf(imputed_X).sum()}")
print(f"Contains Infinity: {np.isinf(y).sum()}")
my_model = GradientBoostingRegressor()
my_model.fit(imputed_X, y)
# Here we make the plot
my_plots = plot_partial_dependence(my_model,
                                   features=[0, 2], # column numbers of plots we want to show
                                   X=X, # raw predictors data.
                                   feature_names=['Distance', 'Landsize', 'BuildingArea'], # labels on graphs
                                   grid_resolution=10) # number of values to plot on x axis
Even after using sklearn's Imputer, I get the following error:
Contains NaNs in training data: 0
Contains NaNs in target data: 0
Contains Infinity: 0
Contains Infinity: 0
/Users/adimyth/.local/lib/python3.7/site-packages/sklearn/utils/deprecation.py:85: DeprecationWarning: Function plot_partial_dependence is deprecated; The function ensemble.plot_partial_dependence has been deprecated in favour of sklearn.inspection.plot_partial_dependence in 0.21 and will be removed in 0.23.
  warnings.warn(msg, category=DeprecationWarning)
Traceback (most recent call last):
  File "partial_dependency_plots.py", line 29, in <module>
    grid_resolution=10) # number of values to plot on x axis
  File "/Users/adimyth/.local/lib/python3.7/site-packages/sklearn/utils/deprecation.py", line 86, in wrapped
    return fun(*args, **kwargs)
  File "/Users/adimyth/.local/lib/python3.7/site-packages/sklearn/ensemble/partial_dependence.py", line 286, in plot_partial_dependence
    X = check_array(X, dtype=DTYPE, order='C')
  File "/Users/adimyth/.local/lib/python3.7/site-packages/sklearn/utils/validation.py", line 542, in check_array
    allow_nan=force_all_finite == 'allow-nan')
  File "/Users/adimyth/.local/lib/python3.7/site-packages/sklearn/utils/validation.py", line 56, in _assert_all_finite
    raise ValueError(msg_err.format(type_err, X.dtype))
ValueError: Input contains NaN, infinity or a value too large for dtype('float32').
As you can see, when I print the number of NaNs in imputed_X, I get 0. So why am I still getting a ValueError? Any help?
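The traceback shows the error being raised by `check_array(X, dtype=DTYPE, order='C')` inside `plot_partial_dependence`, i.e. while validating the `X` argument of the plotting call, which in the code above is `X=X` (the raw `data[cols_to_use]`), not `imputed_X`. One way to narrow this down is to run the same kind of validation on both objects. The sketch below is a minimal check under a few assumptions: it uses a small made-up DataFrame in place of the Melbourne data, and `SimpleImputer` from `sklearn.impute` (the replacement for the `Imputer` class, which newer scikit-learn releases no longer ship); `passes_validation` is a hypothetical helper written just for this check.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.utils import check_array

# Made-up stand-in for data[cols_to_use]: a small frame with missing values
X = pd.DataFrame({
    "Distance": [2.5, 13.5, np.nan, 6.4],
    "Landsize": [202.0, np.nan, 134.0, 120.0],
    "BuildingArea": [np.nan, 79.0, 150.0, np.nan],
})

# fit_transform returns a NEW array; the original X is left unchanged
imputed_X = SimpleImputer(strategy="mean").fit_transform(X)

print("NaNs in imputed_X:", int(np.isnan(imputed_X).sum()))
print("NaNs in raw X:", int(X.isna().sum().sum()))


def passes_validation(arr):
    """Hypothetical helper: True if check_array accepts arr (it rejects NaN by default)."""
    try:
        check_array(arr, dtype=np.float32)
        return True
    except ValueError:
        return False


print("check_array accepts imputed_X:", passes_validation(imputed_X))
print("check_array accepts raw X:", passes_validation(X))
```

If the raw frame fails this check while `imputed_X` passes, the NaNs reaching `check_array` come from the `X` passed to the plotting function rather than from the data used in `fit`.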