Multiple Linear regression:
Coefficient estimation of MLR
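Suppose that there is a linear relationship between the response Y and the explanatory variables X. Consider the multiple linear regression (MLR) model
$$Y = X\beta + \varepsilon,$$
where $Y$ is the $n \times 1$ response vector, $X$ is the $n \times k$ design matrix (its first column is all ones for the intercept), $\beta$ is the $k \times 1$ coefficient vector, and $\varepsilon$ is the $n \times 1$ error vector.
Remember the assumptions on the conditional mean and variance:
$$E(Y \mid X) = X\beta, \qquad \operatorname{Var}(Y \mid X) = \sigma^2 I_n.$$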
- Maximum likelihood and least squares
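Under the Gaussian assumption, the errors $\varepsilon_i$ are independent random variables with $\varepsilon_i \sim N(0, \sigma^2)$, so $Y \mid X \sim N(X\beta, \sigma^2 I_n)$ and the likelihood is
$$L(\beta, \sigma^2) = (2\pi\sigma^2)^{-n/2} \exp\left(-\frac{(Y - X\beta)'(Y - X\beta)}{2\sigma^2}\right).$$
For any fixed $\sigma^2$, maximizing $L$ over $\beta$ is the same as minimizing the residual sum of squares (RSS). So the maximum likelihood estimate of $\beta$ coincides with the ordinary least squares (OLS) estimate, denoted $\hat\beta$, and the unobserved errors are estimated by the residuals $\hat\varepsilon = Y - X\hat\beta$.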
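As a numerical check that the Gaussian maximum likelihood estimate of β equals the OLS estimate, the R sketch below minimizes the negative log-likelihood with `optim()` and compares the result with the closed-form OLS solution. It uses simulated data; the names `x1`, `x2`, `beta_true`, `negloglik` and `negloglik_grad` are hypothetical and not part of the case study below. Because `optim()` is an iterative optimizer, the two estimates agree only up to a small numerical tolerance.

```r
# Simulated data (hypothetical): intercept plus two regressors
set.seed(1)
n <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
X <- cbind(1, x1, x2)
beta_true <- c(-2, 0.5, 1.5)
Y <- drop(X %*% beta_true + rnorm(n, sd = 0.8))

# Negative Gaussian log-likelihood; theta = (beta, log sigma) keeps sigma > 0
negloglik <- function(theta) {
  beta <- theta[1:3]
  sigma <- exp(theta[4])
  r <- Y - drop(X %*% beta)
  n / 2 * log(2 * pi) + n * log(sigma) + sum(r^2) / (2 * sigma^2)
}

# Analytic gradient: d/dbeta = -X'r / sigma^2, d/dlog(sigma) = n - RSS / sigma^2
negloglik_grad <- function(theta) {
  beta <- theta[1:3]
  sigma <- exp(theta[4])
  r <- Y - drop(X %*% beta)
  c(-drop(crossprod(X, r)) / sigma^2, n - sum(r^2) / sigma^2)
}

# Numerical maximum likelihood
fit_ml <- optim(c(0, 0, 0, 0), negloglik, gr = negloglik_grad, method = "BFGS",
                control = list(reltol = 1e-12, maxit = 1000))
beta_ml <- fit_ml$par[1:3]
sigma2_ml <- exp(fit_ml$par[4])^2

# Closed-form OLS estimate from the normal equations
beta_ols <- drop(solve(crossprod(X), crossprod(X, Y)))
rss <- sum((Y - drop(X %*% beta_ols))^2)

print(rbind(ML = beta_ml, OLS = beta_ols))
# The ML estimate of sigma^2 is RSS/n (biased), not RSS/(n - k)
print(c(ML = sigma2_ml, RSS_over_n = rss / n))
```

Note that the likelihood also yields an estimate of $\sigma^2$, namely RSS/n, which differs from the unbiased RSS/(n - k) used later for standard errors.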
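Because $\beta'X'Y$ is a scalar, $\beta'X'Y = Y'X\beta$, so
$$RSS(\beta) = (Y - X\beta)'(Y - X\beta) = Y'Y - 2\beta'X'Y + \beta'X'X\beta.$$
By matrix calculus, $\partial(\beta'a)/\partial\beta = a$ and $\partial(\beta'A\beta)/\partial\beta = 2A\beta$ for a symmetric matrix $A$. $X'X$ is symmetric because $(X'X)' = X'X$, and it is invertible when $X$ has full column rank $k$, in which case $(X'X)^{-1}X'X = I$. The derivative is
$$\frac{\partial RSS}{\partial \beta} = -2X'Y + 2X'X\beta.$$
Setting $\partial RSS / \partial \beta = 0$ gives the normal equations $X'X\hat\beta = X'Y$. So
$$\hat\beta = (X'X)^{-1}X'Y.$$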
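The closed-form solution can be checked numerically. The R sketch below solves the normal equations $X'X\hat\beta = X'Y$ on simulated data (`x1`, `x2` and the true coefficients are made up for illustration, not taken from the data set `d`) and compares the result with `lm()` and with `qr.solve()`, which solves the least squares problem through a QR decomposition of X instead of forming $X'X$.

```r
set.seed(42)
n <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
y <- 1 + 2 * x1 - 0.5 * x2 + rnorm(n)

X <- cbind(1, x1, x2)   # design matrix with an intercept column

# Solve the normal equations X'X b = X'y without forming an explicit inverse
beta_normal <- drop(solve(crossprod(X), crossprod(X, y)))

# Least squares via the QR decomposition of X
beta_qr <- drop(qr.solve(X, y))

# lm() for reference
beta_lm <- coef(lm(y ~ x1 + x2))

print(cbind(normal_eq = beta_normal, qr = beta_qr, lm = beta_lm))
```

Solving the linear system (or using QR) is usually preferred to computing $(X'X)^{-1}$ explicitly, since it is cheaper and numerically more stable; the three results agree up to rounding.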
Here is a case study of multiple linear regression, regressing wt on age and ht in the data frame d:
> lm2<-lm(wt~age+ht, data=d)
> summary(lm2)
lm(formula = wt ~ age + ht, data = d)
Min 1Q Median 3Q Max
-2.48498 -0.53548 0.01508 0.51986 2.77917
Estimate Std. Error t value Pr(>|t|)
(Intercept) -8.297442 0.865929 -9.582 <2e-16 ***
age 0.005368 0.010169 0.528 0.598
ht 0.228086 0.014205 16.057 <2e-16 ***
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.9035 on 182 degrees of freedom
Multiple R-squared: 0.9163, Adjusted R-squared: 0.9154
F-statistic: 995.8 on 2 and 182 DF, p-value: < 2.2e-16
> coef(lm2)
(Intercept) age ht
-8.297442239 0.005368228 0.228085501
So the fitted model is
$$\widehat{wt} = -8.297442 + 0.005368\,age + 0.228086\,ht.$$
We can also calculate $\hat\beta$ directly from the formula $\hat\beta = (X'X)^{-1}X'Y$. Here is the R code:
> X<-cbind(rep(1, nrow(d)), as.matrix(d[,c('age','ht')]))
> Y<-as.matrix(d$wt)
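> library(MASS)
> #ginv() computes the Moore-Penrose inverse; solve(t(X)%*%X) gives the same result when X'X is invertible
> (OLS_beta<-ginv(t(X)%*%X)%*%t(X)%*%Y)
             [,1]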
[1,] -8.297442239
[2,] 0.005368228
[3,] 0.228085501
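- Gauss-Markov Theorem
Here we discuss the expected value, variance, and variance-covariance matrix of the coefficient estimates.
Gauss-Markov Theorem: if $E(\varepsilon \mid X) = 0$, $\operatorname{Var}(\varepsilon \mid X) = \sigma^2 I_n$ and $X$ has full column rank, the OLS estimator is the Best Linear Unbiased Estimator (BLUE), where "best" means it has the smallest variance among all linear unbiased estimators. Normality of the errors is not required.
The statement has three parts: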
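- $\hat\beta$ is an unbiased estimator of $\beta$.
Assumptions: $Y = X\beta + \varepsilon$, $E(\varepsilon \mid X) = 0$, and $X$ has rank $k$ (no perfect collinearity).
Proof:
$$\hat\beta = (X'X)^{-1}X'Y = (X'X)^{-1}X'(X\beta + \varepsilon) = \beta + (X'X)^{-1}X'\varepsilon,$$
$$E(\hat\beta \mid X) = \beta + (X'X)^{-1}X'E(\varepsilon \mid X) = \beta,$$
because $(X'X)^{-1}X'X = I$, $I\beta = \beta$, and $E(\varepsilon \mid X) = 0$.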
- $\hat\beta$ is linear: $\hat\beta = (X'X)^{-1}X'Y$ is a linear function of $Y$.
- $\hat\beta$ has the minimal variance among all linear unbiased estimators.
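The "best" part of the theorem can be illustrated without simulation. Any linear estimator $\tilde\beta = AY$ with $AX = I$ is unbiased, since $E(AY \mid X) = AX\beta = \beta$, and it has $\operatorname{Var}(\tilde\beta \mid X) = \sigma^2 AA'$. The R sketch below compares OLS, $A = (X'X)^{-1}X'$, with a hypothetical competitor, a weighted least squares estimator $A = (X'WX)^{-1}X'W$ with arbitrary positive weights `w`, on a simulated design matrix with an assumed $\sigma^2 = 1$. Gauss-Markov predicts that the difference of their covariance matrices is positive semidefinite.

```r
set.seed(7)
n <- 40
X <- cbind(1, rnorm(n), rnorm(n))   # simulated design matrix
sigma2 <- 1                         # assumed Var(eps | X) = sigma2 * I

# OLS as a linear estimator: beta_hat = A_ols %*% Y
A_ols <- solve(crossprod(X), t(X))

# Hypothetical competitor: weighted least squares with arbitrary weights
w <- runif(n, 0.2, 5)
A_w <- solve(crossprod(X, w * X), t(w * X))   # (X'WX)^{-1} X'W

# Both are unbiased because A %*% X = I
print(round(A_w %*% X, 10))

# Var(A Y | X) = sigma2 * A A'
V_ols <- sigma2 * A_ols %*% t(A_ols)   # equals sigma2 * (X'X)^{-1}
V_w <- sigma2 * A_w %*% t(A_w)

# Gauss-Markov: V_w - V_ols should be positive semidefinite
eig <- eigen(V_w - V_ols, symmetric = TRUE)$values
print(rbind(OLS = diag(V_ols), weighted = diag(V_w)))
print(eig)
```

All eigenvalues of the difference are non-negative, so every coefficient (and every linear combination of coefficients) has at least as large a variance under the weighted estimator as under OLS when the errors are homoskedastic.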
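Proof of the variance-covariance matrix of the OLS estimates:
By definition, the covariance matrix of a random vector $Z$ is $\operatorname{Cov}(Z) = E[(Z - E(Z))(Z - E(Z))']$, and $E(\hat\beta \mid X) = \beta$. So
$$\operatorname{Var}(\hat\beta \mid X) = E[(\hat\beta - \beta)(\hat\beta - \beta)' \mid X].$$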
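Because $\hat\beta = (X'X)^{-1}X'Y$ and $Y = X\beta + \varepsilon$, we have $\hat\beta - \beta = (X'X)^{-1}X'\varepsilon$. So
$$\operatorname{Var}(\hat\beta \mid X) = E[(X'X)^{-1}X'\varepsilon\varepsilon'X(X'X)^{-1} \mid X] = (X'X)^{-1}X'E(\varepsilon\varepsilon' \mid X)X(X'X)^{-1}.$$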
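Because $E(\varepsilon\varepsilon' \mid X) = \operatorname{Var}(\varepsilon \mid X) = \sigma^2 I_n$, so
$$\operatorname{Var}(\hat\beta \mid X) = \sigma^2 (X'X)^{-1}X'X(X'X)^{-1} = \sigma^2 (X'X)^{-1}.$$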
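Both the unbiasedness result and $\operatorname{Var}(\hat\beta \mid X) = \sigma^2(X'X)^{-1}$ can be checked by Monte Carlo simulation. The R sketch below keeps a simulated design matrix X fixed, redraws Gaussian errors many times with assumed (hypothetical) values of β and σ, refits OLS each time, and compares the average and the empirical covariance of the estimates with β and $\sigma^2(X'X)^{-1}$. Agreement is only up to simulation noise.

```r
set.seed(99)
n <- 60
X <- cbind(1, rnorm(n), rnorm(n))   # fixed (conditioned-on) design matrix
beta <- c(0.5, 1, -1)               # assumed true coefficients
sigma <- 2                          # assumed error standard deviation

A <- solve(crossprod(X), t(X))      # (X'X)^{-1} X'
R <- 20000                          # number of simulated samples

# Each row of est is one OLS estimate computed from a fresh error vector
est <- t(replicate(R, drop(A %*% (X %*% beta + rnorm(n, sd = sigma)))))

mean_mc <- colMeans(est)                  # approximates E(beta_hat | X)
V_mc <- cov(est)                          # approximates Var(beta_hat | X)
V_theory <- sigma^2 * solve(crossprod(X))

print(rbind(true = beta, simulated_mean = mean_mc))
print(round(V_mc, 4))
print(round(V_theory, 4))
```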
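Because $\sigma^2$ is unknown, we replace it with the unbiased estimator $\hat\sigma^2 = \hat\varepsilon'\hat\varepsilon/(n-k)$ (here $n - k = 185 - 3 = 182$). So
$$\widehat{\operatorname{Var}}(\hat\beta \mid X) = \hat\sigma^2 (X'X)^{-1}, \qquad SE(\hat\beta_j) = \sqrt{\hat\sigma^2 [(X'X)^{-1}]_{jj}}.$$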
> #estimate of sigma^2: RSS/(n-k), with n-k = 182
> OLS_residuals<-Y-X%*%OLS_beta
> (OLS_var<-as.numeric(t(OLS_residuals)%*%OLS_residuals/(nrow(X)-ncol(X))))
[1] 0.8162765
> OLS_var*diag(1, 3)
[,1] [,2] [,3]
[1,] 0.8162765 0.0000000 0.0000000
[2,] 0.0000000 0.8162765 0.0000000
[3,] 0.0000000 0.0000000 0.8162765
> #variance-covariance matrix
> (var_cov<-OLS_var*diag(1, 3)%*%ginv(t(X)%*%X))
[,1] [,2] [,3]
[1,] 0.749832532 0.0076531233 -0.0121572259
[2,] 0.007653123 0.0001034006 -0.0001341481
[3,] -0.012157226 -0.0001341481 0.0002017823
> #standard error of OLS beta
> (se_beta<-sqrt(diag(var_cov)))
[1] 0.86592871 0.01016861 0.01420501
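The estimated covariance matrix and standard errors computed from these formulas should agree with what `lm()` reports. The R sketch below checks this on simulated data (the variables are hypothetical). It relies on standard functions from R's stats package: `model.matrix()`, `vcov()`, and `summary()`, where `summary(fit)$sigma` is the residual standard error, whose square is $\hat\sigma^2$.

```r
set.seed(3)
n <- 120
x1 <- rnorm(n)
x2 <- rnorm(n)
y <- 1 + 2 * x1 - x2 + rnorm(n, sd = 0.9)

fit <- lm(y ~ x1 + x2)
X <- model.matrix(fit)   # n x k design matrix, including the intercept column
k <- ncol(X)
res <- residuals(fit)

sigma2_hat <- sum(res^2) / (n - k)          # unbiased estimate: divide by n - k
V_hat <- sigma2_hat * solve(crossprod(X))   # estimated Var(beta_hat | X)
se_hat <- sqrt(diag(V_hat))

print(c(sigma2_hat = sigma2_hat, rse_squared = summary(fit)$sigma^2))
print(cbind(manual = se_hat, lm = summary(fit)$coefficients[, "Std. Error"]))
```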
Here are the 95% confidence intervals of $\beta_j$, computed as $\hat\beta_j \pm t_{0.975,\,n-k}\, SE(\hat\beta_j)$:
> (t95<-qt(0.975, df=182))
[1] 1.973084
> data.frame('beta'=OLS_beta, 'lower_bound'=OLS_beta-t95*se_beta,
+ 'upper_bound'=OLS_beta+t95*se_beta)
beta lower_bound upper_bound
1 -8.297442239 -10.00599239 -6.58889209
2 0.005368228 -0.01469529 0.02543175
3 0.228085501 0.20005782 0.25611318
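The manual interval can be cross-checked against `confint()`, the stats function that computes t-based confidence intervals for `lm` fits. The R sketch below uses simulated data (hypothetical variables) and also shows that `qt(0.975, df)` and `qt(0.025, df, lower.tail = FALSE)` give the same positive critical value, while `qt(0.975, df, lower.tail = FALSE)` gives its negative.

```r
set.seed(11)
n <- 80
x1 <- rnorm(n)
x2 <- rnorm(n)
y <- 3 + 0.5 * x1 + rnorm(n)

fit <- lm(y ~ x1 + x2)
est <- coef(fit)
se <- sqrt(diag(vcov(fit)))
df_res <- df.residual(fit)   # n - k

t975 <- qt(0.975, df = df_res)                          # upper 2.5% point (positive)
t975_alt <- qt(0.025, df = df_res, lower.tail = FALSE)  # the same value

ci_manual <- cbind(lower = est - t975 * se, upper = est + t975 * se)
print(ci_manual)
print(confint(fit, level = 0.95))
```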
- Normality and significance tests of coefficients
The t test checks the significance of an individual regression coefficient in the multiple linear regression model. Adding a significant variable to a regression model can improve it, while adding an unimportant variable may make the model worse, for example by inflating the variance of the other coefficient estimates. The hypotheses for testing the significance of the $j$-th regression coefficient are
$$H_0: \beta_j = 0 \quad \text{vs.} \quad H_1: \beta_j \neq 0.$$
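Under the classical linear model (CLM) assumptions, which add $\varepsilon \mid X \sim N(0, \sigma^2 I_n)$ to the Gauss-Markov assumptions, $\hat\beta$ follows a multivariate Gaussian (MVG) distribution with mean $\beta$ and variance-covariance matrix $\sigma^2(X'X)^{-1}$. So
$$\hat\beta \mid X \sim N_k\left(\beta, \sigma^2 (X'X)^{-1}\right).$$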
Standardizing the $j$-th coefficient gives a standard normal variable:
$$\frac{\hat\beta_j - \beta_j}{\sigma\sqrt{[(X'X)^{-1}]_{jj}}} \sim N(0, 1).$$
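The population $\sigma^2$ is unknown, so we estimate it with $\hat\sigma^2 = \hat\varepsilon'\hat\varepsilon/(n-k)$ and use the standard error $SE(\hat\beta_j) = \hat\sigma\sqrt{[(X'X)^{-1}]_{jj}}$ instead of the standard deviation of $\hat\beta_j$. The standardized estimate then follows a t distribution with $n - k$ degrees of freedom:
$$\frac{\hat\beta_j - \beta_j}{SE(\hat\beta_j)} \sim t_{n-k}.$$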
So the $(1-\alpha) \times 100\%$ confidence interval of $\beta_j$ is
$$\hat\beta_j \pm t_{1-\alpha/2,\,n-k}\, SE(\hat\beta_j).$$
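$H_0: \beta_j = 0$ implies that, holding the other regressors fixed, no linear relationship exists between $X_j$ and $Y$.
Under $H_0$, the t statistic is
$$t_j = \frac{\hat\beta_j - 0}{SE(\hat\beta_j)} \sim t_{n-k},$$
and the two-sided p-value is $2P(T_{n-k} \ge |t_j|)$.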
Here is the R code:
> (t_stat<-(OLS_beta-0)/se_beta)
[1,] -9.5821309
[2,] 0.5279217
[3,] 16.0566930
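> #pt(..., lower.tail=T) gives the one-sided lower-tail probability P(T <= t), not a p-value
> pt(t_stat, df=182, lower.tail=T)
[1,] 3.635395e-18
[2,] 7.009016e-01
[3,] 1.000000e+00
> #two-sided p values, 2*P(T >= |t|), as reported by summary(lm2)
> 2*pt(abs(t_stat), df=182, lower.tail=F)
For age, the two-sided p-value is 2 × (1 − 0.7009016) ≈ 0.598, which matches Pr(>|t|) in summary(lm2); the p-values of the intercept and ht are below 2e-16.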
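The two-sided p-value computation can be checked against the `Pr(>|t|)` column of `summary()` on simulated data. In the R sketch below (hypothetical variables), `x2` has no effect on `y` by construction, and the lower-tail probabilities from `pt()` are printed next to the two-sided p-values to show how they differ.

```r
set.seed(5)
n <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
y <- 1 + 0.8 * x1 + rnorm(n)   # x2 has no effect by construction

fit <- lm(y ~ x1 + x2)
t_stat <- coef(fit) / sqrt(diag(vcov(fit)))   # t_j = beta_hat_j / SE(beta_hat_j)
df_res <- df.residual(fit)

p_lower <- pt(t_stat, df = df_res)                             # P(T <= t): not a p-value
p_two <- 2 * pt(abs(t_stat), df = df_res, lower.tail = FALSE)  # 2 * P(T >= |t|)

print(cbind(t = t_stat, lower_tail = p_lower, two_sided = p_two,
            summary = summary(fit)$coefficients[, "Pr(>|t|)"]))
```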
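Then we could expand the method to a linear combination of coefficients. Suppose $\theta = c'\beta$ for a known $k \times 1$ vector $c$, estimated by $\hat\theta = c'\hat\beta$. Since $\hat\beta$ is Gaussian, so is $\hat\theta$:
$$\hat\theta \mid X \sim N\left(c'\beta, \sigma^2 c'(X'X)^{-1}c\right).$$
The variance-covariance matrix of $\hat\beta$ supplies every term needed. For a combination of two coefficients, $\theta = a\beta_i + b\beta_j$, it gives the variances $\operatorname{Var}(\hat\beta_i)$ and $\operatorname{Var}(\hat\beta_j)$ and the covariance $\operatorname{Cov}(\hat\beta_i, \hat\beta_j)$. So
$$\operatorname{Var}(\hat\theta) = a^2\operatorname{Var}(\hat\beta_i) + b^2\operatorname{Var}(\hat\beta_j) + 2ab\operatorname{Cov}(\hat\beta_i, \hat\beta_j).$$
So the 95% CI of $\theta$ is
$$\hat\theta \pm t_{0.975,\,n-k}\sqrt{\widehat{\operatorname{Var}}(\hat\theta)}.$$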
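An interval for a linear combination of coefficients can be sketched in R with `vcov()`. The example below takes the hypothetical combination $\theta = \beta_1 + \beta_2$, i.e. $c = (0, 1, 1)'$, on simulated data, computes $\operatorname{Var}(\hat\theta)$ both as $c'\hat V c$ and term by term, and cross-checks the interval by reparametrizing the model so that $\theta$ becomes a single coefficient: since $\beta_1 x_1 + \beta_2 x_2 = (\beta_1 + \beta_2)x_1 + \beta_2(x_2 - x_1)$, the coefficient of `x1` in `lm(y ~ x1 + I(x2 - x1))` is $\theta$.

```r
set.seed(8)
n <- 90
x1 <- rnorm(n)
x2 <- rnorm(n)
y <- 2 + x1 + 0.5 * x2 + rnorm(n)

fit <- lm(y ~ x1 + x2)
V <- vcov(fit)                 # estimated sigma^2 (X'X)^{-1}
cvec <- c(0, 1, 1)             # hypothetical combination: theta = beta_1 + beta_2

theta_hat <- sum(cvec * coef(fit))
var_theta <- drop(t(cvec) %*% V %*% cvec)
# Same variance written out: Var(b1) + Var(b2) + 2 Cov(b1, b2)
var_manual <- V[2, 2] + V[3, 3] + 2 * V[2, 3]

t975 <- qt(0.975, df = df.residual(fit))
ci_theta <- theta_hat + c(-1, 1) * t975 * sqrt(var_theta)

# Cross-check: in this reparametrized model the x1 coefficient equals theta
fit_re <- lm(y ~ x1 + I(x2 - x1))
print(c(theta_hat = theta_hat, lower = ci_theta[1], upper = ci_theta[2]))
print(confint(fit_re)["x1", ])
```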
