Logistic regression: binary LR using 2x2 table

Consider some special cases of logistic regression model: there is only one binary predictor X (x=1,0).

Mathematics Principle

Consider a 2x2 table

		Outcome status
		+	-
Exposure status	+	a	b
Exposure status	-	c	d

Transform it into another format showed as the table.

Y	X	counts	Probability
+	+	a	P(Y+\|X+)=a/(a+b)	Odds(X+)=P(Y+\|X+)P(Y-\|X+)=P(Y+\|X+)1-P(Y+\|X+)=ab
-	+	b	P(Y-\|X+)=b/(a+b)	Odds(X+)=P(Y+\|X+)P(Y-\|X+)=P(Y+\|X+)1-P(Y+\|X+)=ab
+	-	c	P(Y+\|X-)=c/(c+d)	Odds(X-)=P(Y+\|X-)P(Y-\|X-)=P(Y+\|X-)1-P(Y+\|X-)=cd
-	-	d	P(Y-\|X-)=d/(c+d)	Odds(X-)=P(Y+\|X-)P(Y-\|X-)=P(Y+\|X-)1-P(Y+\|X-)=cd
				OR=Odds(X+)Odds(X-)=adbc

Consider the table using the model of logistic regression. The outcome status works dichotomous dependent variable Y(Y=0,1), and the exposure status works as dichotomous predictor X(X=0,1). So there is logistic regression model:

logit P(X)=log (odds) =ln(P (X) 1-P (X) )=β0+β1X

log (odds(X+)) =β0+β1×1=ln⁡(ab)

log (odds(X-)) =β0+β1×0=ln⁡(cd)

β0=ln (cd) =ln⁡(odds(X-)), β1=ln (ab) -ln (cd) =ln⁡(adbc)=ln⁡(OR)

SEβ0=c+dc×d, SEβ1=1a+1b+1c+1d

OR=eβ1, odds(X-)=eβ0, odds(X+)=eβ0+β1

Case study

The expenditure was converted to big-expenditure ($bigexp=1 if expenditure exceed $1,000 or 0 otherwise), and smoking disease is yes($mscd=1) or no ($mscd=0). Explore logistic regression of expenditure ($bigexp) on smoking disease ($mscd).

> lrA=glm(data=data1,bigexp~mscd,family=binomial(link="logit"))

> summary.glm(lrA)

Call:

glm(formula = bigexp ~ mscd, family = binomial(link = "logit"),

data = data1)

Deviance Residuals:

Min 1Q Median 3Q Max

-1.6592 -0.8834 -0.8834 1.5032 1.5032

Coefficients:

Estimate Std. Error z value Pr(>|z|)

(Intercept) -0.73953 0.02100 -35.21 <2e-16 ***

mscd 1.82505 0.06677 27.33 <2e-16 ***

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 15411 on 11683 degrees of freedom

Residual deviance: 14534 on 11682 degrees of freedom

AIC: 14538

Number of Fisher Scoring iterations: 4

The next, we will calculate coefficients and their standard errors by hand, which are comparable with the above results. There is 2x2 table

> table(data1$bigexp,data1$mscd)

mscd

0 1

bigexp 0 7016 333

1 3349 986

> #beta0

> (beta0<-log(3349/7016))

[1] -0.7395315

> (SE_beta0<-sqrt((7016+3349)/(7016*3349)))

[1] 0.02100305

> #beta1

> (beta1<-log((7016*986)/(333*3349)))

[1] 1.825045

> (SE_beta0<-sqrt(1/7016+1/3349+1/333+1/986))

[1] 0.06677073

Y	X	counts	Probability
+	+	a	P(Y+\|X+)=a/(a+b)	Odds(X+)=P(Y+\|X+)P(Y-\|X+)=P(Y+\|X+)1-P(Y+\|X+)=ab
-	+	b	P(Y-\|X+)=b/(a+b)	Odds(X+)=P(Y+\|X+)P(Y-\|X+)=P(Y+\|X+)1-P(Y+\|X+)=ab
+	-	c	P(Y+\|X-)=c/(c+d)	Odds(X-)=P(Y+\|X-)P(Y-\|X-)=P(Y+\|X-)1-P(Y+\|X-)=cd
-	-	d	P(Y-\|X-)=d/(c+d)	Odds(X-)=P(Y+\|X-)P(Y-\|X-)=P(Y+\|X-)1-P(Y+\|X-)=cd
				OR=Odds(X+)Odds(X-)=adbc

Tiezheng Yuan Ph.D.

Thursday, May 24, 2018

Logistic regression: binary LR using 2x2 table

Logistic regression: binary LR using 2x2 table

Mathematics Principle

Case study

No comments:

Post a Comment