I'm using LogisticRegression (after standardization) with strong regularization (C = 0.01) and leave-one-out cross-validation (LOO) to classify this dataset:
A B C label
kid_no
1 129.614649 -99.151354 -2.98 1.0
2 200.693534 42.648611 -1.95 1.0
3 -125.098999 -36.014559 -1.07 1.0
4 13.527978 -2.371960 -2.79 1.0
5 -30.609239 16.293491 -12.30 1.0
6 -72.125555 -84.524834 -2.43 0.0
7 -153.870569 -36.243889 -2.95 1.0
10 258.214146 -16.578850 -1.63 0.0
11 86.980021 130.938644 -5.71 1.0
15 -56.648229 14.971427 -2.40 0.0
16 -189.636275 2.129260 -2.66 1.0
18 162.890974 -51.377910 -0.84 1.0
19 -100.995694 -19.742378 -0.61 1.0
20 11.096394 2.665950 -1.25 0.0
22 -3.162355 57.226235 3.65 0.0
24 28.869347 -98.280911 -3.18 1.0
25 145.096445 -21.978577 -0.34 0.0
31 90.369657 -88.607350 -0.80 0.0
32 -33.243305 62.774210 -4.78 1.0
33 -151.254309 -8.350768 -4.26 1.0
34 75.864617 40.304751 0.01 1.0
35 11.810198 70.309071 -2.16 0.0
36 -93.183271 -0.128391 -0.59 0.0
37 -197.610640 2.743368 -0.81 0.0
38 -52.334223 62.294261 -2.09 0.0
40 29.642026 -68.567169 -3.08 0.0
41 -43.990224 13.344720 -0.55 1.0
99 59.092902 113.274902 -2.39 1.0
These are the predict_proba results:
y_test y_pred
[1.] [0.45194154 0.54805846]
[1.] [0.45011247 0.54988753]
[1.] [0.45199647 0.54800353]
[1.] [0.44222063 0.55777937]
[1.] [0.41174885 0.58825115]
[0.] [0.4061526 0.5938474]
[1.] [0.44335584 0.55664416]
[0.] [0.40688221 0.59311779]
[1.] [0.43070897 0.56929103]
[0.] [0.40482294 0.59517706]
[1.] [0.44294859 0.55705141]
[1.] [0.4588555 0.5411445]
[1.] [0.45314094 0.54685906]
[0.] [0.4117038 0.5882962]
[0.] [0.42328142 0.57671858]
[1.] [0.44838752 0.55161248]
[0.] [0.41578366 0.58421634]
[0.] [0.41484554 0.58515446]
[1.] [0.43148373 0.56851627]
[1.] [0.43602517 0.56397483]
[1.] [0.45577475 0.54422525]
[0.] [0.40255148 0.59744852]
[0.] [0.41232277 0.58767723]
[0.] [0.40682665 0.59317335]
[0.] [0.40262185 0.59737815]
[0.] [0.40497565 0.59502435]
[1.] [0.45181924 0.54818076]
[1.] [0.44330359 0.55669641]
Surprisingly, (y_pred[:, 1] < 0.58) == True gives me 27/28, i.e. 96% accuracy! So I'm getting the opposite probabilities!
Any explanation?
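The flipped-threshold check can be reproduced with a few lines. This is only a sketch: y_true and proba here are hypothetical stand-ins holding a handful of the values listed above, not the full 28-row output.

```python
import numpy as np

# Assumed: y_true holds the fold labels and proba the matching
# predict_proba rows collected over the LOO folds (abbreviated here).
y_true = np.array([1., 1., 1., 0., 0.])
proba = np.array([
    [0.4519, 0.5481],
    [0.4501, 0.5499],
    [0.4520, 0.5480],
    [0.4062, 0.5938],
    [0.4069, 0.5931],
])

# The inverted rule: treat p(class 1) < 0.58 as a prediction of class 1.
flipped = (proba[:, 1] < 0.58).astype(float)
print((flipped == y_true).mean())  # 1.0 on this subset
```

On these rows the inverted rule is perfect, which mirrors the 27/28 result reported above.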
This is my code:
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut

loo = LeaveOneOut()
for train_index, test_index in loo.split(X):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
    clf = LogisticRegression(C=0.01)
    clf.fit(X_train, y_train)
    y_pred = clf.predict_proba(X_test)
    print(y_test, y_pred[0])
clf.classes_ is array([0., 1.])
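Note that the loop above never actually standardizes the features, even though the question mentions standardization. A minimal sketch of doing the scaling inside each fold (via a Pipeline, so the held-out sample's statistics never leak into the fit) is below; the random X and y are placeholder data standing in for the real A/B/C table:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Placeholder data with the same shape as the table above (assumption:
# the real X and y are loaded elsewhere).
rng = np.random.default_rng(0)
X = rng.normal(size=(28, 3))
y = rng.integers(0, 2, size=28).astype(float)

loo = LeaveOneOut()
probs = []
for train_index, test_index in loo.split(X):
    # Fitting the scaler inside the loop keeps the held-out sample
    # out of the mean/std estimation.
    clf = make_pipeline(StandardScaler(), LogisticRegression(C=0.01))
    clf.fit(X[train_index], y[train_index])
    probs.append(clf.predict_proba(X[test_index])[0])

probs = np.array(probs)
# Each row sums to 1; the column order follows clf.classes_,
# i.e. column 0 is p(class 0) and column 1 is p(class 1).
print(probs.shape)  # (28, 2)
```

Whether scaling is fit per fold or once up front does not change the column ordering of predict_proba, which is always aligned with clf.classes_.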