Thursday, May 24, 2018

Logistic regression: binary LR using 2x2 table

Logistic regression: binary LR using 2x2 table


 
Consider some special cases of logistic regression model: there is only one binary predictor X (x=1,0).
  1. Mathematics Principle

Consider a 2x2 table

Outcome status
+
-
Exposure
status
+
a
b
-
c
d
Transform it into another format showed as the table.
Y
X
counts
Probability

+
+
a
P(Y+|X+)=a/(a+b)
Odds(X+)=P(Y+|X+)P(Y-|X+)=P(Y+|X+)1-P(Y+|X+)=ab
-
+
b
P(Y-|X+)=b/(a+b)
+
-
c
P(Y+|X-)=c/(c+d)
Odds(X-)=P(Y+|X-)P(Y-|X-)=P(Y+|X-)1-P(Y+|X-)=cd
-
-
d
P(Y-|X-)=d/(c+d)




OR=Odds(X+)Odds(X-)=adbc
Consider the table using the model of logistic regression. The outcome status works dichotomous dependent variable Y(Y=0,1), and the exposure status works as dichotomous predictor X(X=0,1). So there is logistic regression model:
logit P(X)=log (odds) =ln(P (X) 1-P (X) )=β0+β1X
log (odds(X+)) =β0+β1×1=ln⁡(ab)
log (odds(X-)) =β0+β1×0=ln⁡(cd)
So
β0=ln (cd) =ln⁡(odds(X-)),  β1=ln (ab) -ln (cd) =ln⁡(adbc)=ln⁡(OR)
SEβ0=c+dc×d,  SEβ1=1a+1b+1c+1d
OR=eβ1,  odds(X-)=eβ0,   odds(X+)=eβ0+β1

  1. Case study

The expenditure was converted to big-expenditure ($bigexp=1 if expenditure exceed $1,000 or 0 otherwise), and smoking disease is yes($mscd=1) or no ($mscd=0). Explore logistic regression of expenditure ($bigexp) on smoking disease ($mscd).
> lrA=glm(data=data1,bigexp~mscd,family=binomial(link="logit"))
> summary.glm(lrA)

Call:
glm(formula = bigexp ~ mscd, family = binomial(link = "logit"),
   data = data1)

Deviance Residuals:
   Min      1Q Median       3Q Max
-1.6592  -0.8834 -0.8834   1.5032 1.5032

Coefficients:
           Estimate Std. Error z value Pr(>|z|)    
(Intercept) -0.73953    0.02100 -35.21 <2e-16 ***
mscd         1.82505 0.06677   27.33 <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

   Null deviance: 15411  on 11683 degrees of freedom
Residual deviance: 14534  on 11682 degrees of freedom
AIC: 14538

Number of Fisher Scoring iterations: 4


The next, we will calculate coefficients and their standard errors by hand, which are comparable with the above results. There is 2x2 table
> table(data1$bigexp,data1$mscd)
              mscd
             0 1
bigexp  0 7016 333
        1 3349 986
> #beta0
> (beta0<-log(3349/7016))
[1] -0.7395315
> (SE_beta0<-sqrt((7016+3349)/(7016*3349)))
[1] 0.02100305
> #beta1
> (beta1<-log((7016*986)/(333*3349)))
[1] 1.825045
> (SE_beta0<-sqrt(1/7016+1/3349+1/333+1/986))
[1] 0.06677073


No comments:

Post a Comment