Tiezheng Yuan Ph.D.: Coefficients of Correlation and Covariance

Coefficients of Correlation and Covariance

This post covers three quantities: the population Pearson correlation coefficient, the sample Pearson correlation coefficient, and the sample covariance.
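For reference, the usual textbook definitions of these three quantities can be written as below. The notation is an assumption on my part: x_i and y_i are the paired observations for subject i, \bar{x} and \bar{y} are the sample means, and n is the number of pairs.

```latex
% Population Pearson correlation coefficient
\rho_{X,Y} = \frac{\operatorname{cov}(X,Y)}{\sigma_X \sigma_Y}
           = \frac{E\left[(X-\mu_X)(Y-\mu_Y)\right]}{\sigma_X \sigma_Y}

% Sample Pearson correlation coefficient
r = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}
         {\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}}

% Sample covariance
\operatorname{cov}(x,y) = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})
```

Dividing the sample covariance by the product of the two sample standard deviations gives r, because the factors of n-1 cancel.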

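For longitudinal data, the correlation matrix is a t x t matrix whose (j, k) entry is the correlation between the measurements at time points j and k.

Here j=1,...,t indexes the time points and i=1,...,n indexes the subjects. For example, consider this longitudinal data set: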

> head(obs.matrix)
   wt.2 wt.3 wt.0 wt.1 wt.4
3  13.1 13.8   NA   NA   NA
6  15.8 16.2 14.9 15.1   NA
20   NA   NA   NA   NA 13.3
21 14.3 14.9 14.2 14.3   NA
27 14.9   NA   NA 14.7 16.0
40   NA   NA   NA   NA 13.3
> dim(obs.matrix)
[1] 87  5

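The correlation matrix is computed with the function cor(), whose default is method='pearson'. The correlation matrix is symmetric, with 1s on the diagonal.
A correlation coefficient ranges from -1 to 1. r=1 means a perfect increasing linear relation, r=0 indicates no linear correlation,
and r=-1 means a perfect decreasing linear relation, in which Y decreases as X increases. Because the data contain missing values, use="pairwise.complete.obs" computes each entry from the subjects observed at both time points.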

> cor.matrix<-round(cor(obs.matrix,use="pairwise.complete.obs"),3)
> cor.matrix
      wt.2  wt.3  wt.0  wt.1  wt.4
wt.2 1.000 0.965 0.963 0.964 0.973
wt.3 0.965 1.000 0.971 0.943 0.966
wt.0 0.963 0.971 1.000 0.955 0.946
wt.1 0.964 0.943 0.955 1.000 0.934
wt.4 0.973 0.966 0.946 0.934 1.000
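To make the meaning of r=1, r=0 and r=-1 concrete, here is a minimal sketch on toy vectors. All the numbers and names below are made up for illustration and are not taken from obs.matrix.

```r
# Toy data (invented for illustration, not from obs.matrix)
x     <- c(1, 2, 3, 4, 5)
y_pos <- 2 * x + 1       # exact increasing linear relation
y_neg <- -3 * x + 10     # exact decreasing linear relation

x_sym <- c(-2, -1, 0, 1, 2)
y_sq  <- x_sym^2         # strong relation, but quadratic rather than linear

r_pos <- cor(x, y_pos)       # perfect positive linear correlation
r_neg <- cor(x, y_neg)       # perfect negative linear correlation
r_sq  <- cor(x_sym, y_sq)    # no linear correlation despite the dependence

print(c(r_pos = r_pos, r_neg = r_neg, r_sq = r_sq))

# A correlation matrix of complete data is symmetric with a unit diagonal
toy <- data.frame(u = x, v = y_pos, w = c(2, 1, 4, 3, 5))
m   <- cor(toy)
print(isSymmetric(m))
print(all.equal(unname(diag(m)), rep(1, 3)))
```

The quadratic case is a reminder that r=0 rules out only a linear relation, not dependence in general.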

The covariance matrix is computed the same way with the function cov(), again with the default method='pearson':

> cov.matrix<-round(cov(obs.matrix,use="pairwise.complete.obs"),3)
> cov.matrix
      wt.2  wt.3  wt.0  wt.1  wt.4
wt.2 3.008 3.054 2.849 2.824 3.120
wt.3 3.054 3.932 2.912 2.794 3.831
wt.0 2.849 2.912 2.820 2.492 2.965
wt.1 2.824 2.794 2.492 2.644 2.802
wt.4 3.120 3.831 2.965 2.802 3.720
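One thing to keep in mind with use="pairwise.complete.obs" is that each entry of the matrix may be based on a different set of subjects. The diagonal variances use every subject observed at that time point, while each off-diagonal entry uses only the subjects observed at both time points. So rescaling the pairwise covariance matrix by its own diagonal does not, in general, give back the pairwise correlation matrix. The sketch below shows this on a small invented data frame. The values and the column names t0, t1, t2 are assumptions for illustration only.

```r
# Toy longitudinal data with missing values (invented for illustration)
toy <- data.frame(
  t0 = c(10.0, 11.5, 12.0,   NA, 14.0, 15.5, 13.2, 12.8),
  t1 = c(10.4, 12.1,   NA, 13.0, 14.6, 16.0, 13.1, 13.5),
  t2 = c(  NA, 12.5, 13.1, 13.6, 15.2,   NA, 14.0, 13.3)
)

# Pairwise deletion: each entry uses the rows where both columns are observed
cov_pw <- cov(toy, use = "pairwise.complete.obs")
cor_pw <- cor(toy, use = "pairwise.complete.obs")

# Listwise deletion: only rows without any NA are used for every entry
cov_lw <- cov(toy, use = "complete.obs")
cor_lw <- cor(toy, use = "complete.obs")

# Listwise: rescaling the covariance matrix reproduces the correlation matrix
same_lw <- isTRUE(all.equal(cov2cor(cov_lw), cor_lw))

# Pairwise: variances and covariances come from different row subsets,
# so the same rescaling does not reproduce cor_pw for these data
same_pw <- isTRUE(all.equal(cov2cor(cov_pw), cor_pw))

print(c(listwise = same_lw, pairwise = same_pw))
```

This is why the 3.054 in cov.matrix divided by sqrt(3.008*3.932) does not equal the 0.965 in cor.matrix: the two diagonal variances were not computed on the same subjects as the covariance.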

Next, I would like to show how to calculate the correlation coefficient between two time points, wt.2 and wt.3.
Here is the scatterplot, annotated with r:

> plot(wt.2~wt.3, data=obs.matrix)
> (r<-cor(obs.matrix$wt.2,obs.matrix$wt.3, use="pairwise.complete.obs"))
[1] 0.9647316
> text(12,15, bquote(italic(r)~'='~.(round(r,4))))

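The correlation coefficient between wt.2 and wt.3 is r=0.9647316 and the covariance is 3.054, matching the corresponding entries of cor.matrix and cov.matrix. Both values can be reproduced step by step: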

> #keep only the subjects with both wt.2 and wt.3 observed (remove NA)
> sub<-obs.matrix[!(is.na(obs.matrix[,'wt.2'])|is.na(obs.matrix[,'wt.3'])),c('wt.2','wt.3')]
> #mean of wt.2
> (wt.2.mean<-mean(sub$wt.2))
[1] 13.89273
> #mean of wt.3
> (wt.3.mean<-mean(sub$wt.3))
[1] 14.51818
> #variance of wt.2
> (wt.2.var<-var(sub$wt.2))
[1] 3.126613
> #variance of wt.3
> (wt.3.var<-var(sub$wt.3))
[1] 3.205219
> #sum of cross-products of deviations (named sxy so it does not mask base c())
> sxy<-sum((sub$wt.2-wt.2.mean)*(sub$wt.3-wt.3.mean))
> sxy/(nrow(sub)-1) #sample covariance
[1] 3.054024
> #product of the square roots of the two sums of squared deviations
> v<-sqrt(sum((sub$wt.2-wt.2.mean)^2))*sqrt(sum((sub$wt.3-wt.3.mean)^2))
> sxy/v #correlation coefficient
[1] 0.9647316
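The steps above can be wrapped into a small function and checked against the built-in cov() and cor(). This is only a sketch: pearson_by_hand is a hypothetical helper name, not a base R function, and the two vectors are invented toy values rather than the real wt.2 and wt.3 columns.

```r
# Hypothetical helper (not part of base R): sample covariance and Pearson r
# from first principles, using only the pairs where both values are observed
pearson_by_hand <- function(x, y) {
  keep <- !(is.na(x) | is.na(y))   # drop pairs with a missing value
  x <- x[keep]
  y <- y[keep]
  dx  <- x - mean(x)               # deviations from the mean of x
  dy  <- y - mean(y)               # deviations from the mean of y
  sxy <- sum(dx * dy)              # sum of cross-products
  list(
    n           = length(x),
    covariance  = sxy / (length(x) - 1),
    correlation = sxy / (sqrt(sum(dx^2)) * sqrt(sum(dy^2)))
  )
}

# Toy vectors with missing values (invented for illustration)
a <- c(13.1, 15.8,   NA, 14.3, 14.9, 12.7)
b <- c(13.8, 16.2, 13.3, 14.9,   NA, 13.0)

res <- pearson_by_hand(a, b)

# Compare with the built-in functions on the same complete pairs
print(all.equal(res$covariance,  cov(a, b, use = "pairwise.complete.obs")))
print(all.equal(res$correlation, cor(a, b, use = "pairwise.complete.obs")))
```

The same relation holds for the numbers printed above: dividing the sample covariance by the square root of the product of the two variances, 3.054024/sqrt(3.126613*3.205219), gives approximately 0.9647.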

Tiezheng Yuan Ph.D.

Thursday, May 24, 2018
