I'm using RandomizedSearchCV with KNeighborsClassifier to try to predict loan defaults.
RandomizedSearchCV seems great in theory, but when I check it, the best_estimator_ it finds predicts the same label for every sample.
The data is split 75% paid off and 25% defaulted, so I get 75% accuracy, but the model just predicts PAIDOFF for everything.
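The 75% figure is exactly what a model that ignores the features entirely would score. A minimal sketch using scikit-learn's DummyClassifier on hypothetical labels with the same 75/25 split (not the real loan data) shows the baseline, and how balanced accuracy exposes it:

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, balanced_accuracy_score

# Hypothetical labels with the 75/25 split described above (not the real data)
y = np.array(["PAIDOFF"] * 75 + ["COLLECTION"] * 25)
X = np.zeros((len(y), 1))  # features are irrelevant to this baseline

# Always predicts the most frequent class seen during fit
baseline = DummyClassifier(strategy="most_frequent").fit(X, y)
y_pred = baseline.predict(X)

acc = accuracy_score(y, y_pred)           # 0.75: looks respectable
bal = balanced_accuracy_score(y, y_pred)  # 0.5: mean per-class recall, chance level
print(acc, bal)
```

Balanced accuracy averages the recall of each class, so a model that never predicts COLLECTION is capped at 0.5 however skewed the data is.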
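import numpy as np
from sklearn.model_selection import RandomizedSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Five evenly spaced k values from 1 up to a third of the training set
n_neighbors = [int(x) for x in np.linspace(start=1, stop=len(X_train) / 3, num=5)]
weights = ['uniform', 'distance']
algorithm = ['auto', 'ball_tree', 'kd_tree', 'brute']
leaf_size = [int(x) for x in np.linspace(10, 100, num=5)]
p = [1, 2]  # 1 = Manhattan distance, 2 = Euclidean distance
random_grid = {'n_neighbors': n_neighbors,
               'weights': weights,
               'algorithm': algorithm,
               'leaf_size': leaf_size,
               'p': p}
knn_clf = KNeighborsClassifier()
# scoring is left at its default, so candidates are ranked by plain accuracy
knn_random = RandomizedSearchCV(estimator=knn_clf, param_distributions=random_grid, n_iter=25, cv=3, verbose=1)
knn_random.fit(X_train, y_train)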
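One plausible contributor is the top of the n_neighbors grid: as k approaches the size of the training set, each vote drifts toward the overall class proportions, and the majority class wins almost everywhere. The sketch below shows the extreme case k = number of training points on synthetic data from make_classification (a hypothetical stand-in for the loan data, and a larger k than the grid's maximum of len(X_train)/3):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical stand-in for the loan data: roughly 75% class 0, 25% class 1
X, y = make_classification(n_samples=300, n_features=5,
                           weights=[0.75, 0.25], random_state=0)
majority = np.bincount(y).argmax()

# Extreme case: every training point is a neighbour of every query, so with
# uniform weights each prediction is just the training-set majority vote
knn_all = KNeighborsClassifier(n_neighbors=len(X), weights="uniform").fit(X, y)
pred_all = knn_all.predict(X)

# A small k for comparison: predictions depend on the local neighbourhood
knn_small = KNeighborsClassifier(n_neighbors=5).fit(X, y)
pred_small = knn_small.predict(X)

print(np.unique(pred_all))    # only the majority class
print(np.unique(pred_small))
```

With a search ranked by accuracy, a large-k candidate that collapses to the majority class can easily tie or beat smaller k values on imbalanced data.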
Is there anything I can do to combat this? Is there a parameter I can pass to prevent it? Or is there something I can do with my data?
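On the "parameter" question: RandomizedSearchCV does accept a scoring argument, and KNeighborsClassifier has no class_weight option, so changing the metric is one lever that needs no extra libraries. A minimal sketch under stated assumptions: synthetic data in place of the real X_train/y_train, feature scaling via a Pipeline, a smaller k range, and stratified folds are all illustrative choices, not the original setup. algorithm and leaf_size are left out because they mainly affect how neighbours are searched (speed and memory), not which neighbours are found.

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the loan data; replace with the real features/labels
X, y = make_classification(n_samples=400, n_features=6,
                           weights=[0.75, 0.25], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Scale features so no single column dominates the distance metric
pipe = Pipeline([("scale", StandardScaler()), ("knn", KNeighborsClassifier())])

param_dist = {
    "knn__n_neighbors": randint(1, 31),  # k in 1..30 (illustrative range)
    "knn__weights": ["uniform", "distance"],
    "knn__p": [1, 2],
}

search = RandomizedSearchCV(
    pipe,
    param_distributions=param_dist,
    n_iter=25,
    scoring="balanced_accuracy",  # a majority-only model scores 0.5 here, not 0.75
    cv=StratifiedKFold(n_splits=3, shuffle=True, random_state=0),
    random_state=0,
)
search.fit(X_train, y_train)

print(search.best_params_)
print(search.score(X_test, y_test))  # balanced accuracy on the held-out split
```

Other built-in scoring strings such as "f1_macro" or "roc_auc" would serve the same purpose; resampling the training data is the data-side alternative.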
y_test:
38 PAIDOFF
189 PAIDOFF
140 PAIDOFF
286 COLLECTION
142 PAIDOFF
101 PAIDOFF
187 PAIDOFF
139 PAIDOFF
149 PAIDOFF
11 PAIDOFF
269 COLLECTION
231 PAIDOFF
258 PAIDOFF
84 PAIDOFF
242 PAIDOFF
344 COLLECTION
104 PAIDOFF
214 PAIDOFF
109 PAIDOFF
76 PAIDOFF
41 PAIDOFF
262 COLLECTION
125 PAIDOFF
107 PAIDOFF
27 PAIDOFF
14 PAIDOFF
92 PAIDOFF
194 PAIDOFF
113 PAIDOFF
333 COLLECTION
...
320 COLLECTION
15 PAIDOFF
72 PAIDOFF
122 PAIDOFF
243 PAIDOFF
184 PAIDOFF
294 COLLECTION
280 COLLECTION
218 PAIDOFF
197 PAIDOFF
133 PAIDOFF
143 PAIDOFF
179 PAIDOFF
249 PAIDOFF
80 PAIDOFF
331 COLLECTION
137 PAIDOFF
103 PAIDOFF
120 PAIDOFF
248 PAIDOFF
5 PAIDOFF
236 PAIDOFF
219 PAIDOFF
322 COLLECTION
283 COLLECTION
135 PAIDOFF
124 PAIDOFF
293 COLLECTION
166 PAIDOFF
85 PAIDOFF
Predictions:
array(['PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF',
'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF',
'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF',
'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF',
'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF',
'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF',
'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF',
'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF',
'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF',
'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF',
'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF',
'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF',
'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF',
'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF',
'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF',
'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF',
'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF',
'PAIDOFF', 'PAIDOFF'], dtype=object)
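A confusion matrix makes the failure visible where accuracy hides it. As a small illustration, the sketch takes only the first ten labels of the y_test excerpt above (index 38 through 11) and pairs them with the all-PAIDOFF prediction:

```python
from sklearn.metrics import confusion_matrix, recall_score

# First ten labels of the y_test excerpt above, against an all-PAIDOFF prediction
y_true = ["PAIDOFF", "PAIDOFF", "PAIDOFF", "COLLECTION", "PAIDOFF",
          "PAIDOFF", "PAIDOFF", "PAIDOFF", "PAIDOFF", "PAIDOFF"]
y_pred = ["PAIDOFF"] * len(y_true)

# Rows are true labels, columns are predicted labels, in this order
labels = ["COLLECTION", "PAIDOFF"]
cm = confusion_matrix(y_true, y_pred, labels=labels)
print(cm)
# [[0 1]
#  [0 9]]

# Recall on the default class: fraction of COLLECTION loans actually caught
rec = recall_score(y_true, y_pred, pos_label="COLLECTION")
print(rec)  # 0.0
```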