Thursday, May 24, 2018

Linear Regression: R2

Call:
lm(formula = wt ~ age, data = d)

Residuals:
    Min      1Q  Median      3Q     Max
-3.7237 -0.8276  0.1854  0.9183  4.5043

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 5.444528   0.204316   26.65   <2e-16 ***
age         0.157003   0.005845   26.86   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.401 on 183 degrees of freedom
Multiple R-squared:  0.7977,    Adjusted R-squared:  0.7966
F-statistic: 721.4 on 1 and 183 DF,  p-value: < 2.2e-16

The output above is the summary of a linear regression model with weight as Y (the dependent variable) and age as X (the independent variable):
Y = β0 + β1X + ε
The coefficient table gives the estimates β0 = 5.4445 (intercept) and β1 = 0.1570 (slope for age). Here R2, known as the coefficient of determination, is 0.7977. This means that 79.77% of the variance in weight is accounted for by the linear regression on age. The remaining 20.23% is not explained by the model and may be due to other factors or random variation.
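To read these values programmatically instead of from the printed summary, one option is to pull them out of the summary object. The sketch below uses simulated data, because the data frame d is not reproduced in this post. The seed, the age range, and the coefficients used to generate d_sim are assumptions chosen only to resemble the output above, so the numbers it prints will not match 0.7977 exactly.

```r
# Simulated stand-in for the data frame `d` (hypothetical values, not the original data)
set.seed(1)
n   <- 185                                   # 183 residual df + 2 estimated coefficients
age <- runif(n, min = 1, max = 60)           # assumed age range
wt  <- 5.4 + 0.157 * age + rnorm(n, sd = 1.4)
d_sim <- data.frame(age = age, wt = wt)

fit <- lm(wt ~ age, data = d_sim)
s   <- summary(fit)

r2     <- s$r.squared        # "Multiple R-squared" in the printed summary
adj_r2 <- s$adj.r.squared    # "Adjusted R-squared" in the printed summary

# Adjusted R-squared penalizes the number of predictors p (here p = 1)
p <- 1
adj_manual <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(c(r2 = r2, adj_r2 = adj_r2, adj_manual = adj_manual))
```

Applying the same adjustment to the values reported above gives 1 - (1 - 0.7977) * 184 / 183 ≈ 0.7966, which agrees with the printed adjusted R-squared.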

Definition of r-squared
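R-squared, denoted R2 or r2, is the proportion of the variance in the dependent variable that is explained by (predictable from) the independent variable or variables. Consider the regression sum of squares (SSR), the error sum of squares (SSE), and the total sum of squares (SST), where y_i is the observed value, ŷ_i is the fitted value, and ȳ is the mean of the observed values:
SSR = Σ (ŷ_i − ȳ)^2
SSE = Σ (y_i − ŷ_i)^2
SST = Σ (y_i − ȳ)^2
For a least-squares fit with an intercept, SST = SSR + SSE, so R2 = SSR/SST = 1 − SSE/SST.
If the fit is perfect, ε = 0 and ŷ_i = y_i for every observation. Therefore SSE = 0, SST = SSR + SSE = SSR, and R2 = SSR/SST = 1.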
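The decomposition into SSR, SSE, and SST can be checked numerically. The following is a minimal sketch on simulated data (the seed, sample size, and true coefficients are arbitrary assumptions), comparing the hand-computed ratio with the value that summary() reports.

```r
# Simulated data; the generating values are arbitrary
set.seed(42)
x <- rnorm(50)
y <- 2 + 3 * x + rnorm(50)

fit   <- lm(y ~ x)
y_hat <- fitted(fit)

SST <- sum((y - mean(y))^2)       # total sum of squares
SSR <- sum((y_hat - mean(y))^2)   # regression (explained) sum of squares
SSE <- sum((y - y_hat)^2)         # error (residual) sum of squares

# For least squares with an intercept, the two parts add up to the total
print(all.equal(SST, SSR + SSE))

# Both forms of R2 agree with the value from summary()
print(c(SSR / SST, 1 - SSE / SST, summary(fit)$r.squared))
```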

How to use R2
R2 is a statistic that gives some information about the goodness of fit of a model, but it is not a complete measure of goodness of fit on its own. When two models are fitted to the same data, a higher R2 indicates that the model fits those data more closely. R2 values based on different data sets cannot be compared. A high R2 alone does not show that a model fits well, and a low R2 alone does not show that it fits badly.
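One way to see why R2 alone is not enough is Anscombe's quartet, which ships with base R as the data frame anscombe (columns x1 to x4 and y1 to y4). This is a sketch added for illustration, not part of the original analysis: the four data sets look very different when plotted, yet a straight-line fit gives nearly the same R2 for each.

```r
# `anscombe` is a built-in data set (datasets package); no download needed
r2_values <- sapply(1:4, function(i) {
  f <- as.formula(paste0("y", i, " ~ x", i))   # y1 ~ x1, y2 ~ x2, ...
  summary(lm(f, data = anscombe))$r.squared
})
names(r2_values) <- paste0("set", 1:4)

# All four values are close to 0.67, although only the first set
# is reasonably described by a straight line
print(round(r2_values, 3))
```

Plotting each pair, for example with plot(anscombe$x2, anscombe$y2), shows the curved pattern and the outliers that the shared R2 hides.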
For ordinary least-squares regression with an intercept, R2 ranges from 0 to 1, and an R2 of 1 indicates that the regression line fits the data perfectly. Values outside the range 0 to 1 can occur when R2 is used to measure the agreement between observed and modeled values and the modeled values are not obtained by linear regression, depending on which formulation of R2 is used.
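As an illustration of a value outside the 0 to 1 range, the sketch below applies the formulation 1 − SSE/SST to predictions that do not come from a least-squares fit. The helper r2() and the deliberately wrong model bad_pred are hypothetical, made up for this example, and the data are simulated.

```r
# Simulated data; the generating values are arbitrary
set.seed(7)
x <- runif(40, min = 0, max = 10)
y <- 1 + 0.5 * x + rnorm(40)

# Hypothetical helper: R2 as 1 - SSE/SST for any vector of predictions
r2 <- function(obs, pred) {
  1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2)
}

ols_pred <- fitted(lm(y ~ x))   # least-squares predictions
bad_pred <- 10 - 0.5 * x        # made-up model with the wrong slope sign, not fitted to the data

r2_ols <- r2(y, ols_pred)       # lies between 0 and 1
r2_bad <- r2(y, bad_pred)       # negative: worse than predicting mean(y) for every point

print(c(ols = r2_ols, bad = r2_bad))
```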

R code
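R2 is a statistical measure of how well the regression line approximates the observed data points. In the example here, the fitted values are calculated by ordinary least-squares regression.
Here is the R code, which computes R2 directly from the sums of squares:
> lm1 <- lm(wt ~ age, data = d)            # the model summarized above
> SYY <- sum((d$wt - mean(d$wt))^2)        # total sum of squares (SST)
> RSS <- sum((d$wt - predict(lm1))^2)      # residual sum of squares (SSE)
> 1 - RSS/SYY                              # R2
[1] 0.7976624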

In simple linear least-squares regression (one predictor and an intercept), R2 equals the square of the Pearson correlation coefficient r between X and Y, which is why it is also written r2. With nonlinear and multiple regression, the convention is to always write R2.
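The relation to the Pearson correlation is easy to verify. The sketch below uses simulated data (seed and coefficients are arbitrary assumptions). It also shows the multiple-regression counterpart for a least-squares fit with an intercept, where R2 equals the squared correlation between the observed and the fitted values.

```r
# Simulated data; the generating values are arbitrary
set.seed(3)
x <- rnorm(100)
z <- rnorm(100)
y <- 1 + 2 * x + rnorm(100)

# Simple regression: R2 equals the squared Pearson correlation of x and y
fit1      <- lm(y ~ x)
r         <- cor(x, y)
r2_simple <- summary(fit1)$r.squared
print(all.equal(r^2, r2_simple))

# Multiple regression: R2 equals the squared correlation
# between observed and fitted values
y2       <- 1 + 2 * x - z + rnorm(100)
fit2     <- lm(y2 ~ x + z)
r2_multi <- summary(fit2)$r.squared
print(all.equal(cor(y2, fitted(fit2))^2, r2_multi))
```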
