Different results in SAS and R for factorial logistic regression
0 votes
/ October 22, 2018

I am trying to run these factorial logistic regressions in both SAS and R, but I get different results for dry = rt * chi_ur. Why?

My data:

id  dry rt  chi_ur
1   1   0   1
2   0   0   0
3   0   0   0
4   0   0   0
5   0   0   1
6   0   0   0
7   0   0   0
8   0   0   1
9   0   0   0
10  0   0   0
11  0   0   0
12  0   0   0
13  1   0   0
14  0   0   0
15  0   0   1
16  0   0   1
17  0   0   0
18  1   0   0
19  0   0   0
20  0   0   0
21  0   0   1
22  1   1   0
23  0   1   1
24  0   0   1
25  0   0   1
26  1   0   0
27  1   0   0
28  0   0   0
29  1   0   0
30  1   0   0
31  1   0   1
32  1   0   0
33  0   0   0
34  1   0   0
35  0   0   0
36  0   0   1
37  1   0   0
38  1   0   0
39  0   0   1
40  0   1   0
41  0   1   0
42  1   1   0
43  0   1   0
44  0   0   0
45  0   0   0
46  0   0   1
47  0   0   0
48  0   0   1
49  1   0   0
50  0   0   1
51  0   0   0
52  1   0   0
53  1   0   0
54  1   0   0
55  1   0   0
56  0   0   0
57  1   0   0
58  0   0   0
59  1   0   0
60  1   0   0
61  0   0   0
62  0   1   0
63  0   0   0
64  0   0   0
65  1   1   0
66  0   0   0
67  1   0   0
68  1   0   0
69  1   0   0
70  1   0   0
71  1   0   0
72  1   0   0
73  1   0   0
74  1   0   0
75  1   0   0
76  1   0   0
77  0   1   0
78  1   0   0
79  0   1   0
80  0   1   0
81  1   0   0
82  1   0   0
83  1   0   0
84  1   0   0
85  1   0   0
86  0   0   1
87  1   0   0
88  1   0   0
89  1   0   0
90  1   0   1
91  1   0   
92  1   0   
93  0   0   
94  0   1   
95  0   1   
96  0   1   
97  1   0   
98  1   0   

R code:

summary(glm(dry ~ chi_ur, data = en, family = binomial))
summary(glm(dry ~ rt, data = en, family = binomial))
summary(glm(dry ~ rt*chi_ur, data = en, family = binomial))

SAS code:

proc logistic data = en.en1 desc;
class chi_ur ;
model dry = chi_ur / expb;
run;

proc logistic data = en.en1 desc;
class rt ;
model dry = rt / expb;
run;

proc logistic data = en.en1 desc;
class rt chi_ur ;
model dry = rt chi_ur rt*chi_ur/ expb;
run;

My R results:

> summary(glm(dry ~ chi_ur, data = en, family = binomial))

Call:
glm(formula = dry ~ chi_ur, family = binomial, data = en)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.2601  -1.2601  -0.6231   1.0969   1.8626  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)  
(Intercept)   0.1924     0.2352   0.818   0.4133  
chi_ur       -1.7328     0.6782  -2.555   0.0106 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 124.59  on 89  degrees of freedom
Residual deviance: 116.37  on 88  degrees of freedom
  (8 observations deleted due to missingness)
AIC: 120.37

Number of Fisher Scoring iterations: 3

> summary(glm(dry ~ rt, data = en, family = binomial))

Call:
glm(formula = dry ~ rt, family = binomial, data = en)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.2181  -1.2181  -0.6945   1.1372   1.7552  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)  
(Intercept)  0.09531    0.21847   0.436   0.6626  
rt          -1.39459    0.68700  -2.030   0.0424 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 135.69  on 97  degrees of freedom
Residual deviance: 130.81  on 96  degrees of freedom
AIC: 134.81

Number of Fisher Scoring iterations: 4

> summary(glm(dry ~ rt*chi_ur, data = en, family = binomial))

Call:
glm(formula = dry ~ rt * chi_ur, family = binomial, data = en)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.3304  -1.3304  -0.6444   1.0317   1.8297  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)   
(Intercept)    0.3528     0.2559   1.379  0.16798   
rt            -1.2001     0.7360  -1.631  0.10297   
chi_ur        -1.8192     0.6897  -2.637  0.00835 **
rt:chi_ur    -12.8996  1455.3979  -0.009  0.99293   
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 124.59  on 89  degrees of freedom
Residual deviance: 113.07  on 86  degrees of freedom
  (8 observations deleted due to missingness)
AIC: 121.07

Number of Fisher Scoring iterations: 14

My SAS results:

The SAS System     

The LOGISTIC Procedure

Model Information 
Data Set EN.EN1 
Response Variable dry 
Number of Response Levels 2 
Model binary logit 
Optimization Technique Fisher's scoring     

Number of Observations Read 98 
Number of Observations Used 90    

Response Profile 
Ordered
Value dry Total
Frequency 
1 1 43 
2 0 47 

Probability modeled is dry='1'.   

Note: 8 observations were deleted due to missing values for the response or explanatory variables. 

Class Level Information 
Class Value Design
Variables 
chi_ur 0 1 
  1 -1 


Model Convergence Status 
Convergence criterion (GCONV=1E-8) satisfied.        

Model Fit Statistics 
Criterion Intercept Only Intercept and
Covariates 
AIC 126.589 120.371 
SC 129.088 125.371 
-2 Log L 124.589 116.371         

Testing Global Null Hypothesis: BETA=0 
Test Chi-Square DF Pr > ChiSq 
Likelihood Ratio 8.2175 1 0.0041 
Score 7.6262 1 0.0058 
Wald 6.5262 1 0.0106    

Type 3 Analysis of Effects 
Effect DF Wald
Chi-Square Pr > ChiSq 
chi_ur 1 6.5262 0.0106     

Analysis of Maximum Likelihood Estimates
Parameter     DF  Estimate  Standard Error  Wald Chi-Square  Pr > ChiSq  Exp(Est)
Intercept      1   -0.6740          0.3391           3.9498      0.0469     0.510
chi_ur     0   1    0.8664          0.3391           6.5262      0.0106     2.378

Odds Ratio Estimates 
Effect Point Estimate 95% Wald
Confidence Limits 
chi_ur 0 vs 1 5.656 1.497 21.372 

Association of Predicted Probabilities and
Observed Responses 
Percent Concordant 27.7 Somers' D 0.228 
Percent Discordant 4.9 Gamma 0.700 
Percent Tied 67.4 Tau-a 0.115 
Pairs 2021 c 0.614     
  --------------------------------------------------------------------------------
The SAS System 


The LOGISTIC Procedure

Model Information 
Data Set EN.EN1 
Response Variable dry 
Number of Response Levels 2 
Model binary logit 
Optimization Technique Fisher's scoring 

Number of Observations Read 98 
Number of Observations Used 98      

Response Profile 
Ordered
Value dry Total
Frequency 
1 1 47 
2 0 51     


Probability modeled is dry='1'.    

Class Level
Information 
Class Value Design
Variables 
rt 0 1 
  1 -1 

Model Convergence Status 
Convergence criterion (GCONV=1E-8) satisfied. 

Model Fit Statistics 
Criterion Intercept Only Intercept and
Covariates 
AIC 137.694 134.806 
SC 140.279 139.976 
-2 Log L 135.694 130.806 

Testing Global Null Hypothesis: BETA=0 
Test Chi-Square DF Pr > ChiSq 
Likelihood Ratio 4.8871 1 0.0271 
Score 4.6063 1 0.0319 
Wald 4.1208 1 0.0424 

Type 3 Analysis of Effects 
Effect DF Wald
Chi-Square Pr > ChiSq 
rt 1 4.1208 0.0424 

Analysis of Maximum Likelihood Estimates
Parameter     DF  Estimate  Standard Error  Wald Chi-Square  Pr > ChiSq  Exp(Est)
Intercept      1   -0.6020          0.3435           3.0712      0.0797     0.548
rt         0   1    0.6973          0.3435           4.1208      0.0424     2.008

Odds Ratio Estimates 
Effect Point Estimate 95% Wald
Confidence Limits 
rt 0 vs 1 4.033 1.049 15.504 


Association of Predicted Probabilities and
Observed Responses 
Percent Concordant 20.2 Somers' D 0.152 
Percent Discordant 5.0 Gamma 0.603 
Percent Tied 74.8 Tau-a 0.077 
Pairs 2397 c 0.576 

--------------------------------------------------------------------------------
The SAS System 

The LOGISTIC Procedure

Model Information 
Data Set EN.EN1 
Response Variable dry 
Number of Response Levels 2 
Model binary logit 
Optimization Technique Fisher's scoring 

Number of Observations Read 98 
Number of Observations Used 90 

Response Profile 
Ordered
Value dry Total
Frequency 
1 1 43 
2 0 47 

Probability modeled is dry='1'. 

Note: 8 observations were deleted due to missing values for the response or explanatory variables. 

Class Level Information 
Class Value Design
Variables 
rt 0 1 
  1 -1 
chi_ur 0 1 
  1 -1 

Model Convergence Status 
Quasi-complete separation of data points detected. 

Warning: The maximum likelihood estimate may not exist. 


Warning: The LOGISTIC procedure continues in spite of the above warning. Results shown are based on the last maximum likelihood iteration. Validity of the model fit is questionable. 


Model Fit Statistics 
Criterion Intercept Only Intercept and
Covariates 
AIC 126.589 121.066 
SC 129.088 131.065 
-2 Log L 124.589 113.066 

Testing Global Null Hypothesis: BETA=0 
Test Chi-Square DF Pr > ChiSq 
Likelihood Ratio 11.5228 3 0.0092 
Score 10.6138 3 0.0140 
Wald 8.6501 3 0.0343       

Joint Tests 
Effect DF Wald
Chi-Square Pr > ChiSq 
rt 1 0.0007 0.9787 
chi_ur 1 0.0009 0.9765 
rt*chi_ur 1 0.0005 0.9830 

Note: Under full-rank parameterizations, Type 3 effect tests are replaced by joint tests. The joint test for an effect is a test that all the parameters associated with that effect are zero. Such joint tests might not be equivalent to Type 3 effect tests under GLM parameterization. 

Analysis of Maximum Likelihood Estimates
Parameter        DF  Estimate  Standard Error  Wald Chi-Square  Pr > ChiSq  Exp(Est)
Intercept         1   -3.5417           111.8           0.0010      0.9747     0.029
rt         0      1    2.9849           111.8           0.0007      0.9787    19.785
chi_ur     0      1    3.2945           111.8           0.0009      0.9765    26.963
rt*chi_ur  0 0    1   -2.3849           111.8           0.0005      0.9830     0.092

Association of Predicted Probabilities and
Observed Responses 
Percent Concordant 40.7 Somers' D 0.319 
Percent Discordant 8.8 Gamma 0.646 
Percent Tied 50.6 Tau-a 0.161 
Pairs 2021 c 0.660 

I find it a bit suspicious that the standard errors in the SAS Analysis of Maximum Likelihood Estimates table for the interaction model are all identical ...
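The SAS output for the interaction model reports quasi-complete separation, and a cross-tabulation of the complete cases shows where that comes from. The sketch below is an illustration only: the `cells` data frame is a hypothetical helper, and its counts were tabulated by hand from the 90 rows of the data above that have a value for chi_ur.

```r
# Cell counts tabulated by hand from the 90 complete rows of the data above.
# `cells` is an illustrative helper, not part of the original code.
cells <- data.frame(
  rt     = c(0, 0, 1, 1),
  chi_ur = c(0, 1, 0, 1),
  events = c(37, 3, 3, 0),   # rows with dry == 1
  n      = c(63, 16, 10, 1)  # rows in the cell
)

# Observed proportion of dry == 1 in each cell
cells$p_hat <- cells$events / cells$n

# Flag cells in which the outcome never varies
cells$constant <- cells$events == 0 | cells$events == cells$n

print(cells)
```

The rt = 1, chi_ur = 1 cell holds a single observation with dry = 0. A model with the rt*chi_ur interaction fits each cell separately, so the fitted logit for that cell drifts toward minus infinity and no finite maximum likelihood estimate exists. R and SAS simply stop iterating at different points, which is consistent with the huge standard errors in both outputs and with the identical -2 Log L (113.07).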

Any ideas? How can I fix this? Thanks!

1 Answer

0 votes
/ October 22, 2018

I suspect this is because you did not specify the parameterization (PARAM=) and REF= options in the CLASS statement of PROC LOGISTIC, so the two programs parameterize the model differently. R also does not state explicitly what the "event" is; assuming it models 1, the results should then be similar.

class rt (param=ref);
...
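The effect of the parameterization can also be checked from the R side. The sketch below is an illustration under stated assumptions: the `cells` data frame and the names `fit_ref` and `fit_eff` are hypothetical, and the counts were tabulated by hand from the 90 complete rows of the question's data. It fits the chi_ur model once with R's default reference (treatment) coding and once with sum-to-zero (effect) coding via `contr.sum`, which is the coding shown in the Class Level Information tables of the SAS output (1 / -1).

```r
# Cell counts tabulated by hand from the 90 complete rows of the question's data.
# `cells`, `fit_ref` and `fit_eff` are illustrative names, not from the original post.
cells <- data.frame(
  rt     = factor(c(0, 0, 1, 1)),
  chi_ur = factor(c(0, 1, 0, 1)),
  events = c(37, 3, 3, 0),   # rows with dry == 1
  n      = c(63, 16, 10, 1)  # rows in the cell
)

# Reference (treatment) coding, R's default: level 0 is the baseline
fit_ref <- glm(cbind(events, n - events) ~ chi_ur,
               data = cells, family = binomial)

# Effect (sum-to-zero) coding: level 0 -> +1, level 1 -> -1
fit_eff <- glm(cbind(events, n - events) ~ chi_ur,
               data = cells, family = binomial,
               contrasts = list(chi_ur = "contr.sum"))

round(coef(fit_ref), 4)  # compare with the R output in the question
round(coef(fit_eff), 4)  # compare with the SAS output in the question
```

Under effect coding the intercept is the average of the two group logits and the slope is half of the reference-coded slope with the sign flipped (0.8664 = 1.7328 / 2), so the two fits describe the same model. The odds ratio SAS reports for chi_ur 0 vs 1 (5.656) is exp(1.7328), the reference-coded estimate from R with the sign reversed.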