LogisticRegression predict_proba returns strange probabilities
0 votes
/ July 13, 2020

I'm using logistic regression (after standardization) with strong regularization (C=0.01) and leave-one-out cross-validation (LOO) to classify this dataset:

          A        B          C    label
kid_no              
1   129.614649  -99.151354  -2.98   1.0
2   200.693534  42.648611   -1.95   1.0
3   -125.098999 -36.014559  -1.07   1.0
4   13.527978   -2.371960   -2.79   1.0
5   -30.609239  16.293491   -12.30  1.0
6   -72.125555  -84.524834  -2.43   0.0
7   -153.870569 -36.243889  -2.95   1.0
10  258.214146  -16.578850  -1.63   0.0
11  86.980021   130.938644  -5.71   1.0
15  -56.648229  14.971427   -2.40   0.0
16  -189.636275 2.129260    -2.66   1.0
18  162.890974  -51.377910  -0.84   1.0
19  -100.995694 -19.742378  -0.61   1.0
20  11.096394   2.665950    -1.25   0.0
22  -3.162355   57.226235   3.65    0.0
24  28.869347   -98.280911  -3.18   1.0
25  145.096445  -21.978577  -0.34   0.0
31  90.369657   -88.607350  -0.80   0.0
32  -33.243305  62.774210   -4.78   1.0
33  -151.254309 -8.350768   -4.26   1.0
34  75.864617   40.304751   0.01    1.0
35  11.810198   70.309071   -2.16   0.0
36  -93.183271  -0.128391   -0.59   0.0
37  -197.610640 2.743368    -0.81   0.0
38  -52.334223  62.294261   -2.09   0.0
40  29.642026   -68.567169  -3.08   0.0
41  -43.990224  13.344720   -0.55   1.0
99  59.092902   113.274902  -2.39   1.0
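For context, the setup described above looks roughly like this. This is a sketch, not the exact script: the data is a random stand-in for the 28×3 table, and fitting the scaler inside each fold is an assumption about where the standardization happens.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut
from sklearn.preprocessing import StandardScaler

# Stand-in for the 28x3 table above (columns A, B, C); alternating labels
rng = np.random.default_rng(0)
X = rng.normal(size=(28, 3))
y = np.tile([0.0, 1.0], 14)

loo = LeaveOneOut()
probas = []
for train_index, test_index in loo.split(X):
    # Fit the scaler on the training fold only, so the held-out row
    # never influences the standardization
    scaler = StandardScaler().fit(X[train_index])
    clf = LogisticRegression(C=0.01)
    clf.fit(scaler.transform(X[train_index]), y[train_index])
    probas.append(clf.predict_proba(scaler.transform(X[test_index]))[0])

probas = np.asarray(probas)  # shape (28, 2); columns follow clf.classes_
```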

These are the predict_proba results:

y_test    y_pred
[1.] [0.45194154 0.54805846]
[1.] [0.45011247 0.54988753]
[1.] [0.45199647 0.54800353]
[1.] [0.44222063 0.55777937]
[1.] [0.41174885 0.58825115]
[0.] [0.4061526 0.5938474]
[1.] [0.44335584 0.55664416]
[0.] [0.40688221 0.59311779]
[1.] [0.43070897 0.56929103]
[0.] [0.40482294 0.59517706]
[1.] [0.44294859 0.55705141]
[1.] [0.4588555 0.5411445]
[1.] [0.45314094 0.54685906]
[0.] [0.4117038 0.5882962]
[0.] [0.42328142 0.57671858]
[1.] [0.44838752 0.55161248]
[0.] [0.41578366 0.58421634]
[0.] [0.41484554 0.58515446]
[1.] [0.43148373 0.56851627]
[1.] [0.43602517 0.56397483]
[1.] [0.45577475 0.54422525]
[0.] [0.40255148 0.59744852]
[0.] [0.41232277 0.58767723]
[0.] [0.40682665 0.59317335]
[0.] [0.40262185 0.59737815]
[0.] [0.40497565 0.59502435]
[1.] [0.45181924 0.54818076]
[1.] [0.44330359 0.55669641]

Surprisingly, `(y_pred[:, 1] < 0.58) == True` gives me 27/28, i.e. 96% accuracy! So I'm getting the opposite probabilities!
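The flipped-threshold check can be replayed directly on the probabilities printed above (values copied from the output; the baseline at 0.5 predicts class 1 for every row, so it only scores the class-1 base rate):

```python
import numpy as np

# Copied from the printed output above: true label and P(class 1) per fold
y_true = np.array([1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0,
                   0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1], dtype=float)
p1 = np.array([0.54805846, 0.54988753, 0.54800353, 0.55777937, 0.58825115,
               0.5938474,  0.55664416, 0.59311779, 0.56929103, 0.59517706,
               0.55705141, 0.5411445,  0.54685906, 0.5882962,  0.57671858,
               0.55161248, 0.58421634, 0.58515446, 0.56851627, 0.56397483,
               0.54422525, 0.59744852, 0.58767723, 0.59317335, 0.59737815,
               0.59502435, 0.54818076, 0.55669641])

# Usual rule: predict 1 when P(class 1) >= 0.5 -- here that is every row
plain_acc = ((p1 >= 0.5).astype(float) == y_true).mean()

# Flipped rule from the question: predict 1 when P(class 1) < 0.58
flipped_acc = ((p1 < 0.58).astype(float) == y_true).mean()
```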

Any explanation?

Here is my code:


from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut

loo = LeaveOneOut()
# X, y are the feature matrix and labels from the table above
for train_index, test_index in loo.split(X):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
    clf = LogisticRegression(C=0.01)
    clf.fit(X_train, y_train)
    y_pred = clf.predict_proba(X_test)
    print(y_test, y_pred[0])

clf.classes_ is array([0., 1.])
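Since the column order of predict_proba follows clf.classes_, column 1 here really is P(label = 1.0). A minimal sanity check on toy data (all names below are made up for illustration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy 1-D data: class 0. on the left, class 1. on the right
X_toy = np.array([[0.0], [1.0], [2.0], [3.0]])
y_toy = np.array([0.0, 0.0, 1.0, 1.0])

clf_toy = LogisticRegression().fit(X_toy, y_toy)
print(clf_toy.classes_)             # sorted unique labels: [0. 1.]
p_toy = clf_toy.predict_proba(X_toy)
# p_toy[:, 0] is P(label 0.), p_toy[:, 1] is P(label 1.)
```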

...