Multiple Linear regression:
Coefficient estimation of MLR
Suppose that there is a linear relationship between the variables X and Y. Consider the multiple linear regression (MLR) model
$$Y = X\beta + \varepsilon$$
Recall the following formulas:
E(Y|X) = Xβ
Var(Y|X) = σ²I
- Maximum likelihood and least squares
Under the Gaussian assumption, the errors εi are independent random variables, so ε follows a multivariate Gaussian distribution:
$$\varepsilon \sim N(0, \sigma^2 I)$$
So
$$Y \mid X \sim N(X\beta, \sigma^2 I)$$
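The link between the Gaussian likelihood and least squares can be written out explicitly. The following is a sketch of the standard argument, assuming ε ~ N(0, σ²I) and writing n for the number of observations (a symbol introduced here for illustration):

```latex
\begin{aligned}
L(\beta, \sigma^2) &= (2\pi\sigma^2)^{-n/2}
  \exp\!\left(-\frac{1}{2\sigma^2}(Y - X\beta)'(Y - X\beta)\right) \\
\log L(\beta, \sigma^2) &= -\frac{n}{2}\log(2\pi\sigma^2)
  - \frac{1}{2\sigma^2}(Y - X\beta)'(Y - X\beta)
\end{aligned}
```

For any fixed σ², only the last term depends on β, so maximizing the log-likelihood over β is the same as minimizing (Y - Xβ)'(Y - Xβ), the residual sum of squares. Under this assumption the maximum likelihood estimate of β coincides with the least squares estimate.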
Minimize the residual sum of squares (RSS) with the ordinary least squares (OLS) method to estimate the unknown parameters β and the unobserved errors ε. The estimates are denoted $\hat\beta$ and $\hat\varepsilon$.
Because $\hat\varepsilon = Y - X\hat\beta$,
$$RSS = \hat\varepsilon'\hat\varepsilon = (Y - X\hat\beta)'(Y - X\hat\beta)$$
So
$$RSS = Y'Y - \hat\beta'X'Y - Y'X\hat\beta + \hat\beta'X'X\hat\beta = Y'Y - 2\hat\beta'X'Y + \hat\beta'X'X\hat\beta$$
By matrix calculus, and because X'X is a symmetric matrix, X'X = (X'X)' and (X'X)^-1 X'X = I.
Set the derivative of RSS with respect to $\hat\beta$ to zero:
$$\frac{\partial RSS}{\partial \hat\beta} = -2X'Y + 2X'X\hat\beta = 0$$
So $X'X\hat\beta = X'Y$, and
$$\hat\beta = (X'X)^{-1}X'Y$$
Here is a case study of multiple linear regression:
> lm2<-lm(wt~age+ht, data=d)
> summary(lm2)
Call:
lm(formula = wt ~ age + ht, data = d)
Residuals:
     Min       1Q   Median       3Q      Max
-2.48498 -0.53548  0.01508  0.51986  2.77917
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -8.297442   0.865929  -9.582   <2e-16 ***
age          0.005368   0.010169   0.528    0.598
ht           0.228086   0.014205  16.057   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.9035 on 182 degrees of freedom
Multiple R-squared: 0.9163, Adjusted R-squared: 0.9154
F-statistic: 995.8 on 2 and 182 DF, p-value: < 2.2e-16
> coef(lm2)
(Intercept) age ht
-8.297442239 0.005368228 0.228085501
So the estimated coefficients are -8.297 for the intercept, 0.005 for age, and 0.228 for ht, which gives the fitted model
$$\widehat{wt} = -8.297 + 0.005\,age + 0.228\,ht$$
Calculate $\hat\beta$ based on the formula
$$\hat\beta = (X'X)^{-1}X'Y$$
Here is the R code for calculating $\hat\beta$:
> X<-cbind(rep(1, nrow(d)), as.matrix(d[,c('age','ht')]))
> Y<-as.matrix(d$wt)
> library(MASS)
> (OLS_beta<-ginv(t(X)%*%X)%*%t(X)%*%Y)
[,1]
[1,] -8.297442239
[2,] 0.005368228
[3,] 0.228085501
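The same estimate can be obtained without forming the inverse explicitly. The sketch below is only an illustration: the data frame `sim` is simulated (an assumption, standing in for the data frame d, which is not shown), and the variable names are hypothetical. It solves the normal equations with `solve()` and `crossprod()`, repeats the calculation with a QR decomposition, and compares both with `lm()`.

```r
# Simulated data (assumption: a stand-in for the data frame d)
set.seed(1)
n   <- 200
age <- runif(n, 1, 36)
ht  <- 60 + 0.8 * age + rnorm(n, sd = 6)
wt  <- -8 + 0.01 * age + 0.23 * ht + rnorm(n, sd = 0.9)
sim <- data.frame(wt, age, ht)

# Design matrix with an intercept column, and the response vector
X <- model.matrix(~ age + ht, data = sim)
Y <- sim$wt

# Solve the normal equations X'X b = X'Y directly
beta_solve <- solve(crossprod(X), crossprod(X, Y))

# Least squares through a QR decomposition of X
beta_qr <- qr.coef(qr(X), Y)

# Both should agree with lm() up to floating-point error
fit <- lm(wt ~ age + ht, data = sim)
all.equal(as.numeric(beta_solve), as.numeric(coef(fit)))
all.equal(as.numeric(beta_qr), as.numeric(coef(fit)))
```

Solving the linear system is usually preferred over computing an inverse when X'X is well conditioned; the generalized inverse from `ginv()` remains useful when X'X is singular.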
- Gauss-Markov Theorem.
Here, we are going to discuss the expected value, the variance, and the variance-covariance matrix of the coefficients.
Gauss-Markov Theorem: the OLS estimator is the Best Linear Unbiased Estimator (BLUE), that is, the most efficient estimator among all linear unbiased estimators.
There are three main statements to prove.
$\hat\beta$ is an unbiased estimator of β:
Proof of $E(\hat\beta \mid X) = \beta$.
Assumption: Y=Xβ+ε, E(ε|X)=0, and X has rank k (no perfect collinearity).
Because (X'X)^-1 X'X = I, E(ε|X) = 0, and Iβ = β,
$$E(\hat\beta \mid X) = E[(X'X)^{-1}X'(X\beta + \varepsilon) \mid X] = \beta + (X'X)^{-1}X'E(\varepsilon \mid X) = \beta$$
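Unbiasedness can also be checked numerically. The following is a minimal Monte Carlo sketch under assumed values: the true coefficients `beta_true`, the design matrix, and the error standard deviation are all hypothetical choices made for illustration. The design is held fixed, new errors are drawn in each replication, and the average of the estimates is compared with the true coefficients.

```r
set.seed(42)
n         <- 100
beta_true <- c(-8, 0.01, 0.23)                      # hypothetical true coefficients
X <- cbind(1, runif(n, 1, 36), rnorm(n, 80, 10))    # fixed design across replications

# Draw new errors, recompute Y, and store the OLS estimate each time
n_rep <- 2000
est <- replicate(n_rep, {
  Y <- X %*% beta_true + rnorm(n, sd = 0.9)
  as.numeric(solve(crossprod(X), crossprod(X, Y)))
})

# Row means approximate E(beta_hat | X); they should be close to beta_true
mc_mean <- rowMeans(est)
round(cbind(beta_true, mc_mean), 4)
```

The agreement is only approximate because the number of replications is finite; increasing `n_rep` tightens it.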
Proof of the variance of $\hat\beta$:
$$Var(\hat\beta \mid X) = Var((X'X)^{-1}X'Y \mid X) = (X'X)^{-1}X'\,Var(Y \mid X)\,X(X'X)^{-1} = \sigma^2 (X'X)^{-1}$$
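$\hat\beta$ is a linear estimator: it is a linear function of Y, and so it has a linear relationship with β.
Proof: $\hat\beta = (X'X)^{-1}X'Y = AY$ with $A = (X'X)^{-1}X'$, and substituting Y = Xβ + ε gives $\hat\beta = \beta + A\varepsilon$.
$\hat\beta$ has minimal variance among all linear unbiased estimators.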
Proof of the variance-covariance matrix of the OLS estimates:
$$Var(\hat\beta \mid X) = E[(\hat\beta - E(\hat\beta))(\hat\beta - E(\hat\beta))' \mid X]$$
This uses the definition COV(Z) = E[(Z-E(Z))(Z-E(Z))'] for a random vector Z, together with $E(\hat\beta) = \beta$. So
$$Var(\hat\beta \mid X) = E[(\hat\beta - \beta)(\hat\beta - \beta)' \mid X]$$
Because $\hat\beta = (X'X)^{-1}X'Y$ and Y=Xβ+ε,
$$\hat\beta - \beta = (X'X)^{-1}X'\varepsilon$$
So
$$Var(\hat\beta \mid X) = E[(X'X)^{-1}X'\varepsilon\varepsilon'X(X'X)^{-1} \mid X] = (X'X)^{-1}X'\,E(\varepsilon\varepsilon' \mid X)\,X(X'X)^{-1}$$
Because $E(\varepsilon\varepsilon' \mid X) = \sigma^2 I$,
$$Var(\hat\beta \mid X) = \sigma^2 (X'X)^{-1}$$
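The minimal-variance statement itself can be sketched as follows. This is the standard textbook argument, written with symbols introduced here for illustration: $\tilde\beta = CY$ is any other linear unbiased estimator, $A = (X'X)^{-1}X'$, and $D = C - A$.

```latex
\begin{aligned}
E(\tilde\beta \mid X) &= CX\beta = \beta \ \text{for all } \beta
  \;\Rightarrow\; CX = I \;\Rightarrow\; DX = CX - AX = 0 \\
Var(\tilde\beta \mid X) &= \sigma^2 CC'
  = \sigma^2 (AA' + AD' + DA' + DD') \\
AD' &= (X'X)^{-1}X'D' = (X'X)^{-1}(DX)' = 0, \qquad AA' = (X'X)^{-1} \\
Var(\tilde\beta \mid X) &= \sigma^2 (X'X)^{-1} + \sigma^2 DD'
  = Var(\hat\beta \mid X) + \sigma^2 DD'
\end{aligned}
```

Because DD' is positive semidefinite, the variance of any linear unbiased estimator exceeds that of the OLS estimator by a positive semidefinite matrix, with equality only when D = 0.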
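Because σ² is unknown, replace σ² with its estimate $\hat\sigma^2 = \hat\varepsilon'\hat\varepsilon/(n-k)$, where n is the number of observations and k is the number of columns of X (n - k = 182 here). The estimate becomes more precise as n grows. So
$$\widehat{Var}(\hat\beta) = \hat\sigma^2 (X'X)^{-1}$$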
> #residual variance estimate (sigma^2 hat), with 182 = n - k degrees of freedom
> OLS_residuals<-Y-X%*%OLS_beta
> (OLS_var<-as.numeric(t(OLS_residuals)%*%OLS_residuals/182))
[1] 0.8162765
> OLS_var*diag(1, 3)
[,1] [,2] [,3]
[1,] 0.8162765 0.0000000 0.0000000
[2,] 0.0000000 0.8162765 0.0000000
[3,] 0.0000000 0.0000000 0.8162765
> #variance-covariance matrix
> (var_cov<-OLS_var*diag(1, 3)%*%ginv(t(X)%*%X))
[,1] [,2] [,3]
[1,] 0.749832532 0.0076531233 -0.0121572259
[2,] 0.007653123 0.0001034006 -0.0001341481
[3,] -0.012157226 -0.0001341481 0.0002017823
> #standard error of OLS beta
> (se_beta<-sqrt(diag(var_cov)))
[1] 0.86592871 0.01016861 0.01420501
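The manual calculation can be cross-checked against R's built-in accessors. The sketch below uses simulated data (an assumption; the data frame `sim` and its coefficients are hypothetical) and derives the degrees of freedom from the design matrix instead of hard-coding them.

```r
# Simulated data (assumption: a stand-in for the data frame d)
set.seed(7)
n   <- 150
sim <- data.frame(age = runif(n, 1, 36), ht = rnorm(n, 80, 10))
sim$wt <- -8 + 0.01 * sim$age + 0.23 * sim$ht + rnorm(n, sd = 0.9)

fit <- lm(wt ~ age + ht, data = sim)
X   <- model.matrix(fit)
res <- residuals(fit)

# Estimate of sigma^2: RSS / (n - k), with k = number of columns of X
df_res     <- nrow(X) - ncol(X)
sigma2_hat <- sum(res^2) / df_res

# Estimated variance-covariance matrix: sigma2_hat * (X'X)^{-1}
var_cov <- sigma2_hat * solve(crossprod(X))
se_beta <- sqrt(diag(var_cov))

# Compare with the values reported by lm()
all.equal(var_cov, vcov(fit), check.attributes = FALSE)
all.equal(se_beta, summary(fit)$coefficients[, "Std. Error"],
          check.attributes = FALSE)
```

`vcov()` and the "Std. Error" column of `summary()` are the usual ways to read these quantities from a fitted model, so the manual version is mainly useful for understanding the formula.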
Here are the 95% confidence intervals of $\hat\beta$:
> (t95<-qt(0.975, df=182))
[1] 1.973084
> data.frame('beta'=OLS_beta, 'lower_bound'=OLS_beta-t95*se_beta,
+ 'upper_bound'=OLS_beta+t95*se_beta)
beta lower_bound upper_bound
1 -8.297442239 -10.00599239 -6.58889209
2 0.005368228 -0.01469529 0.02543175
3 0.228085501 0.20005782 0.25611318
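The interval formula, estimate plus or minus the t quantile times the standard error, can be compared with `confint()`. This is a sketch on simulated data (an assumption; `sim` and its coefficients are hypothetical), not a rerun of the analysis above.

```r
# Simulated data (assumption: a stand-in for the data frame d)
set.seed(7)
n   <- 150
sim <- data.frame(age = runif(n, 1, 36), ht = rnorm(n, 80, 10))
sim$wt <- -8 + 0.01 * sim$age + 0.23 * sim$ht + rnorm(n, sd = 0.9)
fit <- lm(wt ~ age + ht, data = sim)

beta_hat <- coef(fit)
se_beta  <- sqrt(diag(vcov(fit)))

# Upper 2.5% point of the t distribution with n - k degrees of freedom
t_crit <- qt(0.975, df = df.residual(fit))

ci_manual  <- cbind(lower = beta_hat - t_crit * se_beta,
                    upper = beta_hat + t_crit * se_beta)
ci_builtin <- confint(fit, level = 0.95)

all.equal(ci_manual, ci_builtin, check.attributes = FALSE)
```

Note that `qt(0.975, df)` with the default lower tail returns the positive critical value; requesting the upper tail for the same probability would flip its sign and swap the bounds.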
- Normality and Significance test of coefficients
The t test is used to check the significance of individual regression coefficients in the multiple linear regression model. Adding a significant variable to a regression model makes the model more effective, while adding an unimportant variable may make the model worse. The hypotheses for testing the significance of a particular regression coefficient are stated below.
Under the classical linear model (CLM) assumptions, $\hat\beta$ follows a multivariate Gaussian (MVG) distribution with mean β and variance-covariance matrix σ²(X'X)^-1. So
$$\hat\beta \sim N(\beta, \sigma^2 (X'X)^{-1})$$
We could obtain a standard normal distribution for the OLS estimator of the j-th coefficient:
$$\frac{\hat\beta_j - \beta_j}{\sigma\sqrt{[(X'X)^{-1}]_{jj}}} \sim N(0, 1)$$
The population σ² is unknown, so we estimate it with the OLS estimate $\hat\sigma^2$ and use the standard error $se(\hat\beta_j) = \hat\sigma\sqrt{[(X'X)^{-1}]_{jj}}$ instead of the standard deviation of $\hat\beta_j$. The resulting statistic follows a t distribution with n - k degrees of freedom:
$$\frac{\hat\beta_j - \beta_j}{se(\hat\beta_j)} \sim t_{n-k}$$
So the 100(1-α)% confidence interval of $\beta_j$ is
$$\hat\beta_j \pm t_{1-\alpha/2,\,n-k}\; se(\hat\beta_j)$$
H0: $\beta_j = 0$, which implies that no linear relationship exists between $X_j$ and Y once the other predictors are in the model.
H1: $\beta_j \neq 0$
Under H0, the t statistic is
$$t = \frac{\hat\beta_j - 0}{se(\hat\beta_j)} \sim t_{n-k}$$
Here is the R code:
#t-statistics
> (t_stat<-(OLS_beta-0)/se_beta)
[,1]
[1,] -9.5821309
[2,] 0.5279217
[3,] 16.0566930
#lower-tail probabilities P(T <= t); the two-sided p value is 2*min(p, 1-p)
> pt(t_stat, df=182, lower.tail=T)
[,1]
[1,] 3.635395e-18
[2,] 7.009016e-01
[3,] 1.000000e+00
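The values in the Pr(>|t|) column of `summary()` are two-sided p values, so they count both tails of the t distribution. The sketch below shows that calculation on simulated data (an assumption; `sim` and its coefficients are hypothetical) and compares it with the values reported by `summary()`.

```r
# Simulated data (assumption: a stand-in for the data frame d)
set.seed(7)
n   <- 150
sim <- data.frame(age = runif(n, 1, 36), ht = rnorm(n, 80, 10))
sim$wt <- -8 + 0.01 * sim$age + 0.23 * sim$ht + rnorm(n, sd = 0.9)
fit <- lm(wt ~ age + ht, data = sim)

coefs  <- summary(fit)$coefficients
t_stat <- coefs[, "Estimate"] / coefs[, "Std. Error"]

# Two-sided p value: twice the upper-tail probability of |t|
p_two_sided <- 2 * pt(abs(t_stat), df = df.residual(fit), lower.tail = FALSE)

all.equal(p_two_sided, coefs[, "Pr(>|t|)"])
```

Using `abs(t_stat)` with the upper tail avoids the problem that a lower-tail probability is close to 1 for a large positive statistic and close to 0 for a large negative one, even though both are strong evidence against H0.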
Then we could extend the method. Suppose
[equation]
which follows a Gaussian distribution. So there is a variance-covariance matrix
[equation]
in which there are the variances [equation] and the covariances [equation]. So the 95% CI is
[equation]
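One common use of the full variance-covariance matrix, assumed here purely for illustration, is a confidence interval for a linear combination c'β of the coefficients, whose estimated variance is c' V c with V the estimated variance-covariance matrix of the coefficients. The sketch below uses simulated data and a hypothetical weight vector `c_vec` that picks the sum of the age and ht coefficients.

```r
# Simulated data (assumption: a stand-in for the data frame d)
set.seed(7)
n   <- 150
sim <- data.frame(age = runif(n, 1, 36), ht = rnorm(n, 80, 10))
sim$wt <- -8 + 0.01 * sim$age + 0.23 * sim$ht + rnorm(n, sd = 0.9)
fit <- lm(wt ~ age + ht, data = sim)

V     <- vcov(fit)        # estimated variance-covariance matrix of the coefficients
c_vec <- c(0, 1, 1)       # hypothetical combination: beta_age + beta_ht

# Point estimate and standard error of c'beta
est <- sum(c_vec * coef(fit))
se  <- sqrt(as.numeric(t(c_vec) %*% V %*% c_vec))

# Same standard error written as Var(b1) + Var(b2) + 2 * Cov(b1, b2)
se_alt <- sqrt(V["age", "age"] + V["ht", "ht"] + 2 * V["age", "ht"])

# 95% confidence interval based on the t distribution
t_crit <- qt(0.975, df = df.residual(fit))
c(estimate = est, lower = est - t_crit * se, upper = est + t_crit * se)
```

The covariance term matters: ignoring it and adding only the two variances would give the wrong standard error whenever the coefficient estimates are correlated.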