Logistic regression: binary LR using 2x2 table
Consider some special cases of logistic regression model: there is only one binary predictor X (x=1,0).
Mathematics Principle
Consider a 2x2 table
Outcome status
| |||
+
|
-
| ||
Exposure
status
|
+
|
a
|
b
|
-
|
c
|
d
|
Transform it into another format showed as the table.
Y
|
X
|
counts
|
Probability
| |
+
|
+
|
a
|
P(Y+|X+)=a/(a+b)
|
Odds(X+)=P(Y+|X+)P(Y-|X+)=P(Y+|X+)1-P(Y+|X+)=ab
|
-
|
+
|
b
|
P(Y-|X+)=b/(a+b)
| |
+
|
-
|
c
|
P(Y+|X-)=c/(c+d)
|
Odds(X-)=P(Y+|X-)P(Y-|X-)=P(Y+|X-)1-P(Y+|X-)=cd
|
-
|
-
|
d
|
P(Y-|X-)=d/(c+d)
| |
OR=Odds(X+)Odds(X-)=adbc
|
Consider the table using the model of logistic regression. The outcome status works dichotomous dependent variable Y(Y=0,1), and the exposure status works as dichotomous predictor X(X=0,1). So there is logistic regression model:
logit P(X)=log (odds) =ln(P (X) 1-P (X) )=β0+β1X
log (odds(X+)) =β0+β1×1=ln(ab)
log (odds(X-)) =β0+β1×0=ln(cd)
So
β0=ln (cd) =ln(odds(X-)), β1=ln (ab) -ln (cd) =ln(adbc)=ln(OR)
SEβ0=c+dc×d, SEβ1=1a+1b+1c+1d
OR=eβ1, odds(X-)=eβ0, odds(X+)=eβ0+β1
Case study
The expenditure was converted to big-expenditure ($bigexp=1 if expenditure exceed $1,000 or 0 otherwise), and smoking disease is yes($mscd=1) or no ($mscd=0). Explore logistic regression of expenditure ($bigexp) on smoking disease ($mscd).
> lrA=glm(data=data1,bigexp~mscd,family=binomial(link="logit"))
> summary.glm(lrA)
Call:
glm(formula = bigexp ~ mscd, family = binomial(link = "logit"),
data = data1)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.6592 -0.8834 -0.8834 1.5032 1.5032
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.73953 0.02100 -35.21 <2e-16 ***
mscd 1.82505 0.06677 27.33 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 15411 on 11683 degrees of freedom
Residual deviance: 14534 on 11682 degrees of freedom
AIC: 14538
Number of Fisher Scoring iterations: 4
The next, we will calculate coefficients and their standard errors by hand, which are comparable with the above results. There is 2x2 table
> table(data1$bigexp,data1$mscd)
mscd
0 1
bigexp 0 7016 333
1 3349 986
> #beta0
> (beta0<-log(3349/7016))
[1] -0.7395315
> (SE_beta0<-sqrt((7016+3349)/(7016*3349)))
[1] 0.02100305
> #beta1
> (beta1<-log((7016*986)/(333*3349)))
[1] 1.825045
> (SE_beta0<-sqrt(1/7016+1/3349+1/333+1/986))
[1] 0.06677073
No comments:
Post a Comment