Thursday, May 24, 2018

Logistic regression: R data analysis example

Logistic regression: R data analysis example




  1. logit model
The code below estimates a logistic regression model of ever-smoking (eversmk=1,0) on multiple 
variates using the function glm(). The function summary() returns statistical results.
> lr<-glm(eversmk~mscd+lastage+male+educate+bigexp,data=data1, family=binomial(link='logit'))
> #summary
> summary(lr)

Call:
glm(formula = eversmk ~ mscd + lastage + male + educate + bigexp,
   family = binomial(link = "logit"), data = data1)

Deviance Residuals:
   Min      1Q Median       3Q Max
-2.0523  -1.0992 0.7147   1.0135 1.7003

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  0.668017 0.104441   6.396 1.59e-10 ***
mscd         0.585898 0.068376   8.569 < 2e-16 ***
lastage     -0.020541 0.001629 -12.612  < 2e-16 ***
male         1.170651 0.040600  28.833 < 2e-16 ***
educate      0.086159 0.021617   3.986 6.73e-05 ***
bigexp       0.155788 0.042812   3.639 0.000274 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

   Null deviance: 15987  on 11683 degrees of freedom
Residual deviance: 14855  on 11678 degrees of freedom
AIC: 14867

Number of Fisher Scoring iterations: 4

  1. OLS Coefficients
The code below return estimated coefficients. The function confint() returns 95%CI . 
The function vcov() returns variance-covariance matrix of coefficients.
> #coefficients
> summary(lr)$coefficients
              Estimate Std. Error    z value Pr(>|z|)
(Intercept)  0.66801673 0.104440901   6.396122 1.593728e-10
mscd         0.58589821 0.068376131   8.568753 1.046113e-17
lastage     -0.02054140 0.001628713 -12.612044  1.812304e-36
male         1.17065129 0.040600495  28.833424 8.177325e-183
educate      0.08615878 0.021616789   3.985734 6.727172e-05
bigexp       0.15578756 0.042812090   3.638868 2.738386e-04
> #95%CI
> confint(lr)
Waiting for profiling to be done...
                 2.5 % 97.5 %
(Intercept)  0.46343431 0.87286402
mscd         0.45243451 0.72051492
lastage     -0.02373898 -0.01735411
male         1.09126786 1.25042635
educate      0.04380033 0.12854326
bigexp       0.07194211 0.23977142
> #variance-covariance matrix
> round(vcov(lr),8)
             (Intercept)     mscd lastage     male educate bigexp
(Intercept)  0.01090790 0.00111201 -0.00013035 -0.00080027 -0.00085761 -0.00013945
mscd          0.00111201 0.00467530 -0.00002223 -0.00007440  0.00000676 -0.00064822
lastage     -0.00013035 -0.00002223  0.00000265 -0.00000135 -0.00000825 -0.00001129
male        -0.00080027 -0.00007440 -0.00000135  0.00164840 0.00007378 0.00013915
educate     -0.00085761 0.00000676 -0.00000825  0.00007378 0.00046729 0.00005713
bigexp      -0.00013945 -0.00064822 -0.00001129  0.00013915 0.00005713 0.00183288

  1. Expected probability
The function predict( type='response') return the predicted probability of the fitting model.
> mean_Y<-predict(lr, type='response')
> hist(mean_Y, xlab='Expected probability', main='')

No comments:

Post a Comment