Я использовал метод логистической регрессии в обеих программах, и мне было интересно, почему я получаю разные результаты, особенно с коэффициентами.Результат, Infection, равен (1, 0), а Flushed является непрерывной переменной.
Python:
import statsmodels.api as sm
logit_model=sm.Logit(data['INFECTION'], data['Flushed'])
result=logit_model.fit()
print(result.summary())
Результаты:
Logit Regression Results
==============================================================================
Dep. Variable: INFECTION No. Observations: 414
Model: Logit Df Residuals: 413
Method: MLE Df Model: 0
Date: Fri, 24 Aug 2018 Pseudo R-squ.: -1.388
Time: 15:47:42 Log-Likelihood: -184.09
converged: True LL-Null: -77.104
LLR p-value: nan
==============================================================================
coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
Flushed -0.6467 0.070 -9.271 0.000 -0.783 -0.510
==============================================================================
R:
mylogit <- glm(INFECTION ~ Flushed, data = cvc, family = "binomial")
summary(mylogit)
Результаты:
Call:
glm(formula = INFECTION ~ Flushed, family = "binomial", data = cvc)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.0598 -0.3107 -0.2487 -0.2224 2.8051
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.91441 0.38639 -10.131 < 2e-16 ***
Flushed 0.22696 0.06049 3.752 0.000175 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1