Я использую lo git со статистическими моделями, которые имеют около 25 регрессоров, от категориальных, порядковых и непрерывных переменных.
Мой код следующий, с его выводом:
a = np.asarray(data_nobands[[*all 25 columns*]], dtype=float)
mod_logit = sm.Logit(np.asarray(data_nobands['cured'], dtype=float),a)
logit_res = mod_logit.fit(method="nm", cov_type="cluster", cov_kwds={"groups":data_nobands['AGREEMENT_NUMBER']})
"""
Logit Regression Results
==============================================================================
Dep. Variable: y No. Observations: 17316
Model: Logit Df Residuals: 17292
Method: MLE Df Model: 23
Date: Wed, 05 Aug 2020 Pseudo R-squ.: -0.02503
Time: 19:49:27 Log-Likelihood: -10274.
converged: False LL-Null: -10023.
Covariance Type: cluster LLR p-value: 1.000
==============================================================================
coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
x1 3.504e-05 0.009 0.004 0.997 -0.017 0.017
x2 1.944e-05 nan nan nan nan nan
x3 3.504e-05 2.173 1.61e-05 1.000 -4.259 4.259
x4 3.504e-05 2.912 1.2e-05 1.000 -5.707 5.707
x5 3.504e-05 0.002 0.016 0.988 -0.004 0.004
x6 3.504e-05 0.079 0.000 1.000 -0.154 0.154
x7 3.504e-05 0.003 0.014 0.989 -0.005 0.005
x8 3.504e-05 0.012 0.003 0.998 -0.023 0.023
x9 3.504e-05 0.020 0.002 0.999 -0.039 0.039
x10 3.504e-05 0.021 0.002 0.999 -0.041 0.041
x11 3.504e-05 0.011 0.003 0.997 -0.021 0.022
x12 8.831e-06 5.74e-06 1.538 0.124 -2.42e-06 2.01e-05
x13 4.82e-06 9.23e-06 0.522 0.602 -1.33e-05 2.29e-05
x14 3.504e-05 0.000 0.248 0.804 -0.000 0.000
x15 3.504e-05 4.02e-05 0.871 0.384 -4.38e-05 0.000
x16 1.815e-05 1.58e-05 1.152 0.249 -1.27e-05 4.9e-05
x17 3.504e-05 0.029 0.001 0.999 -0.057 0.057
x18 3.504e-05 0.000 0.190 0.849 -0.000 0.000
x19 9.494e-06 nan nan nan nan nan
x20 1.848e-05 nan nan nan nan nan
x21 3.504e-05 0.026 0.001 0.999 -0.051 0.051
x22 3.504e-05 0.037 0.001 0.999 -0.072 0.072
x23 -0.0005 0.000 -2.596 0.009 -0.001 -0.000
x24 3.504e-05 0.006 0.006 0.995 -0.011 0.011
x25 3.504e-05 0.011 0.003 0.998 -0.022 0.022
==============================================================================
"""
С любыми другими method
, такими как bfgs, lbfgs, минимизируйте, результат будет следующим:
"""
Logit Regression Results
==============================================================================
Dep. Variable: y No. Observations: 17316
Model: Logit Df Residuals: 17292
Method: MLE Df Model: 23
Date: Wed, 05 Aug 2020 Pseudo R-squ.: -0.1975
Time: 19:41:22 Log-Likelihood: -12003.
converged: False LL-Null: -10023.
Covariance Type: cluster LLR p-value: 1.000
==============================================================================
coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
x1 0 0.152 0 1.000 -0.299 0.299
x2 0 724.618 0 1.000 -1420.225 1420.225
x3 0 20.160 0 1.000 -39.514 39.514
x4 0 23.008 0 1.000 -45.094 45.094
x5 0 0.010 0 1.000 -0.020 0.020
x6 0 1.335 0 1.000 -2.617 2.617
x7 0 0.020 0 1.000 -0.039 0.039
x8 0 0.109 0 1.000 -0.214 0.214
x9 0 0.070 0 1.000 -0.137 0.137
x10 0 0.175 0 1.000 -0.343 0.343
x11 0 0.045 0 1.000 -0.088 0.088
x12 0 1.24e-05 0 1.000 -2.42e-05 2.42e-05
x13 0 2.06e-05 0 1.000 -4.04e-05 4.04e-05
x14 0 0.001 0 1.000 -0.002 0.002
x15 0 5.16e-05 0 1.000 -0.000 0.000
x16 0 1.9e-05 0 1.000 -3.73e-05 3.73e-05
x17 0 0.079 0 1.000 -0.155 0.155
x18 0 0.000 0 1.000 -0.001 0.001
x19 0 1145.721 0 1.000 -2245.573 2245.573
x20 0 nan nan nan nan nan
x21 0 0.028 0 1.000 -0.055 0.055
x22 0 0.037 0 1.000 -0.072 0.072
x23 0 0.000 0 1.000 -0.000 0.000
x24 0 0.005 0 1.000 -0.010 0.010
x25 0 0.015 0 1.000 -0.029 0.029
==============================================================================
"""
Как вы можете видеть, я получаю p-значения "nan" или очень нет существенный. В чем может быть проблема?