All p-values insignificant or NaN in Logit
5 August 2020

I am using Logit from statsmodels with about 25 regressors, a mix of categorical, ordinal, and continuous variables.

My code is below, followed by its output:

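import numpy as np
import statsmodels.api as sm

# Design matrix built from the 25 regressor columns (no constant is added here)
a = np.asarray(data_nobands[[*all 25 columns*]], dtype=float)
mod_logit = sm.Logit(np.asarray(data_nobands['cured'], dtype=float), a)
# Nelder-Mead optimizer with standard errors clustered by agreement number
logit_res = mod_logit.fit(method="nm", cov_type="cluster", cov_kwds={"groups": data_nobands['AGREEMENT_NUMBER']})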

"""
                           Logit Regression Results                          
==============================================================================
Dep. Variable:                      y   No. Observations:                17316
Model:                          Logit   Df Residuals:                    17292
Method:                           MLE   Df Model:                           23
Date:                Wed, 05 Aug 2020   Pseudo R-squ.:                -0.02503
Time:                        19:49:27   Log-Likelihood:                -10274.
converged:                      False   LL-Null:                       -10023.
Covariance Type:              cluster   LLR p-value:                     1.000
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
x1          3.504e-05      0.009      0.004      0.997      -0.017       0.017
x2          1.944e-05        nan        nan        nan         nan         nan
x3          3.504e-05      2.173   1.61e-05      1.000      -4.259       4.259
x4          3.504e-05      2.912    1.2e-05      1.000      -5.707       5.707
x5          3.504e-05      0.002      0.016      0.988      -0.004       0.004
x6          3.504e-05      0.079      0.000      1.000      -0.154       0.154
x7          3.504e-05      0.003      0.014      0.989      -0.005       0.005
x8          3.504e-05      0.012      0.003      0.998      -0.023       0.023
x9          3.504e-05      0.020      0.002      0.999      -0.039       0.039
x10         3.504e-05      0.021      0.002      0.999      -0.041       0.041
x11         3.504e-05      0.011      0.003      0.997      -0.021       0.022
x12         8.831e-06   5.74e-06      1.538      0.124   -2.42e-06    2.01e-05
x13          4.82e-06   9.23e-06      0.522      0.602   -1.33e-05    2.29e-05
x14         3.504e-05      0.000      0.248      0.804      -0.000       0.000
x15         3.504e-05   4.02e-05      0.871      0.384   -4.38e-05       0.000
x16         1.815e-05   1.58e-05      1.152      0.249   -1.27e-05     4.9e-05
x17         3.504e-05      0.029      0.001      0.999      -0.057       0.057
x18         3.504e-05      0.000      0.190      0.849      -0.000       0.000
x19         9.494e-06        nan        nan        nan         nan         nan
x20         1.848e-05        nan        nan        nan         nan         nan
x21         3.504e-05      0.026      0.001      0.999      -0.051       0.051
x22         3.504e-05      0.037      0.001      0.999      -0.072       0.072
x23           -0.0005      0.000     -2.596      0.009      -0.001      -0.000
x24         3.504e-05      0.006      0.006      0.995      -0.011       0.011
x25         3.504e-05      0.011      0.003      0.998      -0.022       0.022
==============================================================================
"""

With any other method, such as bfgs, lbfgs, or minimize, the result is as follows:

"""
                           Logit Regression Results                          
==============================================================================
Dep. Variable:                      y   No. Observations:                17316
Model:                          Logit   Df Residuals:                    17292
Method:                           MLE   Df Model:                           23
Date:                Wed, 05 Aug 2020   Pseudo R-squ.:                 -0.1975
Time:                        19:41:22   Log-Likelihood:                -12003.
converged:                      False   LL-Null:                       -10023.
Covariance Type:              cluster   LLR p-value:                     1.000
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
x1                  0      0.152          0      1.000      -0.299       0.299
x2                  0    724.618          0      1.000   -1420.225    1420.225
x3                  0     20.160          0      1.000     -39.514      39.514
x4                  0     23.008          0      1.000     -45.094      45.094
x5                  0      0.010          0      1.000      -0.020       0.020
x6                  0      1.335          0      1.000      -2.617       2.617
x7                  0      0.020          0      1.000      -0.039       0.039
x8                  0      0.109          0      1.000      -0.214       0.214
x9                  0      0.070          0      1.000      -0.137       0.137
x10                 0      0.175          0      1.000      -0.343       0.343
x11                 0      0.045          0      1.000      -0.088       0.088
x12                 0   1.24e-05          0      1.000   -2.42e-05    2.42e-05
x13                 0   2.06e-05          0      1.000   -4.04e-05    4.04e-05
x14                 0      0.001          0      1.000      -0.002       0.002
x15                 0   5.16e-05          0      1.000      -0.000       0.000
x16                 0    1.9e-05          0      1.000   -3.73e-05    3.73e-05
x17                 0      0.079          0      1.000      -0.155       0.155
x18                 0      0.000          0      1.000      -0.001       0.001
x19                 0   1145.721          0      1.000   -2245.573    2245.573
x20                 0        nan        nan        nan         nan         nan
x21                 0      0.028          0      1.000      -0.055       0.055
x22                 0      0.037          0      1.000      -0.072       0.072
x23                 0      0.000          0      1.000      -0.000       0.000
x24                 0      0.005          0      1.000      -0.010       0.010
x25                 0      0.015          0      1.000      -0.029       0.029
==============================================================================
"""

As you can see, the p-values are either NaN or highly insignificant. What could be the problem?
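Both summaries report `converged: False`, and the coefficients are either exactly 0 or nearly all equal to 3.504e-05, so the optimizer appears to have barely moved from its starting values. The summaries also list Df Model 23 and Df Residuals 17292 for 25 columns, which would be consistent with a rank-deficient design matrix, and standard errors as small as 1e-05 hint that some regressors are on a very large scale. None of this is confirmed for the actual data. The following is only a minimal diagnostic sketch using NumPy: `diagnose_design` and `standardize` are hypothetical helper names, and the demo matrix is synthetic, standing in for the real regressors.

```python
import numpy as np


def diagnose_design(X, names=None):
    """Summarize properties of a design matrix that often stop a Logit fit
    from converging: missing values, rank deficiency, constant columns,
    and badly mismatched column scales."""
    X = np.asarray(X, dtype=float)
    n_cols = X.shape[1]
    if names is None:
        names = [f"x{i + 1}" for i in range(n_cols)]

    # Use only complete rows for the linear-algebra checks.
    complete = ~np.isnan(X).any(axis=1)
    Xc = X[complete]
    if Xc.shape[0] == 0:
        raise ValueError("every row contains a missing value")

    std = Xc.std(axis=0)
    return {
        "n_missing": int(np.isnan(X).sum()),
        "n_cols": n_cols,
        # rank < n_cols means at least one column is a linear combination of others
        "rank": int(np.linalg.matrix_rank(Xc)),
        "condition_number": float(np.linalg.cond(Xc)),
        "constant_cols": [names[i] for i in range(n_cols) if std[i] == 0],
        "max_abs_by_col": {n: float(v) for n, v in zip(names, np.abs(Xc).max(axis=0))},
    }


def standardize(X):
    """Center and scale non-constant columns and prepend an intercept column.
    Assumes X has no missing values. Returns the new matrix and the boolean
    mask of columns that were kept."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    keep = std > 0
    Z = (X[:, keep] - mean[keep]) / std[keep]
    return np.column_stack([np.ones(Z.shape[0]), Z]), keep


# Synthetic stand-in for the real data: one small-scale column, one
# large-scale column, and one column that duplicates the first up to a factor.
rng = np.random.default_rng(0)
small = rng.normal(size=200)
large = rng.normal(loc=50_000, scale=20_000, size=200)
demo = np.column_stack([small, large, 2 * small])

report = diagnose_design(demo)
print(report["rank"], "of", report["n_cols"], "columns are linearly independent")
print("condition number:", report["condition_number"])

Z, kept = standardize(demo)
```

If the rank comes out lower than the number of columns, dropping the redundant column (for example one dummy level per categorical variable) would be the first thing to try. After rescaling, the model could be refit on the standardized matrix with `sm.Logit(y, Z).fit()`, which uses the default Newton solver, possibly with a larger `maxiter`. Whether that resolves the NaN standard errors here is an open question.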

...