Tiezheng Yuan Ph.D.: R: Numeric vector

Abstract: Manipulation of numeric vectors in R

Create a vector

Create a simple numeric vector

> v=c(1,2,3,4,6,7,20,100,34,54) # a sample vector

> v

[1] 1 2 3 4 6 7 20 100 34 54

> names(v)<-v # give each element a name (here, its own value)

> v

  1   2   3   4   6   7  20 100  34  54
  1   2   3   4   6   7  20 100  34  54

Other functions generate regular numeric vectors:

> rep(10, length=4) # a four-element vector with every element set to 10

[1] 10 10 10 10

> rep(c(1,2,3), length=4*3) # repeat c(1,2,3) until the vector has 12 elements

[1] 1 2 3 1 2 3 1 2 3 1 2 3
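rep() also accepts the arguments times and each, which control how the input is repeated. A small sketch comparing them with length.out (the full name of the length argument used above); the variable names are only illustrative:

```r
x <- c(1, 2, 3)

by_length <- rep(x, length.out = 12)  # recycle x until 12 elements are filled
by_times  <- rep(x, times = 4)        # repeat the whole vector 4 times
by_each   <- rep(x, each = 2)         # repeat every element twice in place

print(by_length)  # 1 2 3 1 2 3 1 2 3 1 2 3
print(by_times)   # 1 2 3 1 2 3 1 2 3 1 2 3
print(by_each)    # 1 1 2 2 3 3
```

Here times = 4 and length.out = 12 give the same result only because 12 is exactly 4 copies of a three-element vector.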

> a=1:10 # get a vector from 1 to 10

> a

[1] 1 2 3 4 5 6 7 8 9 10

> 10:1 # a decreasing sequence from 10 down to 1

[1] 10 9 8 7 6 5 4 3 2 1

The function seq() generates more flexible numeric sequences:

> seq(from=10, to=100, by=5) # numbers from 10 to 100 in steps of 5

[1] 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100

> seq(from=10, to=100, length=5) # 5 equally spaced numbers from 10 to 100

[1] 10.0 32.5 55.0 77.5 100.0
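When a sequence runs over a length that might be zero, the colon operator can surprise you, because 1:0 counts down instead of being empty. The base R functions seq_len() and seq_along() are the safer counterparts; a short sketch:

```r
n <- 0
bad  <- 1:n         # 1:0 counts down and gives c(1, 0), not an empty vector
good <- seq_len(n)  # integer(0): an empty sequence, as intended

x <- c(10, 32.5, 55)
idx <- seq_along(x) # one index per element of x

print(bad)   # 1 0
print(good)  # integer(0)
print(idx)   # 1 2 3
```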

Indexing a vector

> v[1] # the first element

1
1

> v[c(1,3)] # the first and third elements

1 3
1 3

> v[2:6] # the elements from the second to the sixth

2 3 4 6 7
2 3 4 6 7

> v[length(v)] # the last element

54
54

> tail(v, n=1) # same as above

54
54

> tail(v, n=2) # the last two elements

34 54
34 54

> rev(v)[1] # also the last element: the first element of the reversed vector

54
54

> v['1'] # the element whose name is '1'

1
1
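Indexing by name works with any character vector of names, and an index past the end returns NA rather than an error. A small sketch using a hypothetical named vector scores:

```r
scores <- c(a = 1, b = 2, c = 3)  # hypothetical named vector

by_name  <- scores["b"]           # a single element selected by its name
by_names <- scores[c("a", "c")]   # several elements selected by name
past_end <- scores[5]             # an index past the end gives NA

print(by_name)
print(by_names)
print(unname(by_names))  # drop the names and keep only the values: 1 3
print(past_end)
```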

Change elements of a vector

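You can replace the values of elements by assigning to them through an index. First, we remove the names given above so that the vector prints plainly:

> names(v)=NULL # remove the names

> v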

[1] 1 2 3 4 6 7 20 100 34 54

> v[4]=233 #replace the fourth element

> v

[1] 1 2 3 233 6 7 20 100 34 54

> v[2:5]=c(3,33,333,3333) # replace the second to fifth elements at once

> v

[1] 1 3 33 333 3333 7 20 100 34 54
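Assignment also works past the current end of a vector: R extends the vector and fills any gap with NA. A logical index can then replace all matching elements in one assignment. A minimal sketch:

```r
x <- c(1, 2, 3)
x[5] <- 10        # assigning past the end extends x; the gap is filled with NA
print(x)          # 1 2 3 NA 10

x[is.na(x)] <- 0  # logical index: replace every NA in one assignment
print(x)          # 1 2 3 0 10
```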

You can also drop elements with negative indices. This returns a new vector; v itself is unchanged unless the result is assigned back:

> v

[1] 1 3 33 333 3333 7 20 100 34 54

> v[-1] #drop the first element

[1] 3 33 333 3333 7 20 100 34 54

> v[-1:-3] # drop the first three elements

[1] 333 3333 7 20 100 34 54

> v[-(1:3)] # the same as above, written as a negated range

[1] 333 3333 7 20 100 34 54

> v[-length(v)] # drop the last element

[1] 1 3 33 333 3333 7 20 100 34
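Negative indices drop elements by position. To drop elements by value, a logical index is usually simpler and avoids a common pitfall with -which(). A sketch on the same vector:

```r
v <- c(1, 3, 33, 333, 3333, 7, 20, 100, 34, 54)

kept <- v[v != 3]  # drop every element equal to 3
print(kept)

# Pitfall: when which() finds nothing it returns integer(0),
# and v[-integer(0)] selects nothing instead of everything.
none_dropped <- v[-which(v > 1e6)]  # length 0, probably not what was meant
safe         <- v[!(v > 1e6)]       # logical negation keeps all 10 elements

print(length(none_dropped))  # 0
print(length(safe))          # 10
```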

Or append elements using append() like this:

> v=1:10

> v

[1] 1 2 3 4 5 6 7 8 9 10

> append(v, c(20,3,2)) # add three elements to the end of the vector

[1] 1 2 3 4 5 6 7 8 9 10 20 3 2

Inserting elements into the middle of a vector takes a little more work. Here we first define a function named insert(), which places the elements ins after position pos.after:

> insert <- function(vector, ins, pos.after){ # define a function named insert()
    if(pos.after < 1){ # prepend: 1:pos.after would misbehave for pos.after = 0
      com.vector <- c(ins, vector)
    }else if(pos.after < length(vector)){ # insert in the middle
      com.vector <- c(vector[1:pos.after], ins, vector[(pos.after+1):length(vector)])
    }else{ # append
      com.vector <- c(vector, ins)
    }
    return(com.vector)
  }

> v=c(1,2,3)

> v

[1] 1 2 3

> insert(v, c(333,2), 2) #insert two elements after the second element

[1] 1 2 333 2 3

> insert(v, c(333,2), 20) # pos.after is past the end, so the elements are appended

[1] 1 2 3 333 2
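Base R's append() also takes an after argument (its default is length(x)), which covers the same cases as insert() above, including prepending with after = 0. A sketch reproducing the examples:

```r
v <- c(1, 2, 3)

mid   <- append(v, c(333, 2), after = 2)  # insert after the second element
front <- append(v, c(333, 2), after = 0)  # after = 0 prepends
back  <- append(v, c(333, 2))             # default after = length(v) appends

print(mid)    # 1 2 333 2 3
print(front)  # 333 2 1 2 3
print(back)   # 1 2 3 333 2
```

As with insert(v, c(333,2), 20), an after value larger than the length of the vector simply appends.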

Statistics of a vector

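Here are functions that compute statistics of a vector. The examples use the vector v from "Change elements of a vector" again:

> v=c(1,3,33,333,3333,7,20,100,34,54)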

> summary(v)

Min. 1st Qu. Median Mean 3rd Qu. Max.

1.00 10.25 33.50 391.80 88.50 3333.00

> min(v) #the minimum

[1] 1

> max(v) # the maximum

[1] 3333

> mean(v) #mean value

[1] 391.8

> median(v) #median value

[1] 33.5

> range(v) #range of the vector: the min and max values

[1] 1 3333

> sum(v) # the sum of all elements

[1] 3918

> length(v) # the number of elements

[1] 10
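The functions above can be checked against their definitions. var() and sd() (from base R's stats package) are two further statistics; note that they use the sample denominator n - 1. A sketch:

```r
v <- c(1, 3, 33, 333, 3333, 7, 20, 100, 34, 54)

m  <- sum(v) / length(v)                # the mean from its definition: 3918 / 10
s2 <- sum((v - m)^2) / (length(v) - 1)  # sample variance, denominator n - 1

print(m)                           # 391.8
print(all.equal(m, mean(v)))       # TRUE
print(all.equal(s2, var(v)))       # TRUE
print(all.equal(sqrt(s2), sd(v)))  # TRUE: sd is the square root of var
```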

Another issue is how to deal with NA (missing) elements when using the mean/median/min/max functions. For example:

> mean(c(1,3,NA)) #return NA if NA is included in the vector

[1] NA

> mean(c(1,3,NA), na.rm=T) # na.rm=T removes the NA before computing

[1] 2
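The na.rm argument works the same way for sum(), min(), max() and median(). Alternatively, is.na() finds the missing elements so they can be counted or dropped before computing. A short sketch:

```r
x <- c(1, 3, NA)

print(sum(x, na.rm = TRUE))  # 4
print(max(x, na.rm = TRUE))  # 3
print(sum(is.na(x)))         # 1: is.na() flags missing values, sum() counts the TRUEs
clean <- x[!is.na(x)]        # drop the NA elements explicitly
print(mean(clean))           # 2
```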

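Quantiles are values taken at regular intervals from the inverse of the cumulative distribution function (CDF) of a random variable. The 2-quantile is the median. By default, quantile() returns the quartiles (0%, 25%, 50%, 75% and 100%); a second argument requests other probabilities.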

> quantile(v) # quartiles by default

0% 25% 50% 75% 100%

1.00 10.25 33.50 88.50 3333.00

> quantile(v, 0.4) # quantile at 0.4

40%

27.8
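By default quantile() uses the rule its documentation calls type 7: for probability p and the data sorted as x(1) <= ... <= x(n), it takes the position h = (n - 1)p + 1 and interpolates linearly between the two neighbouring sorted values. For p = 0.4 and n = 10, h = 4.6, giving 20 + 0.6 * (33 - 20) = 27.8. A sketch of that rule, written as a hypothetical helper quantile7() for comparison with the built-in function:

```r
# Hypothetical helper: linear interpolation between order statistics (type 7).
quantile7 <- function(x, p) {
  x  <- sort(x)
  n  <- length(x)
  h  <- (n - 1) * p + 1  # fractional position in the sorted data
  lo <- floor(h)
  hi <- ceiling(h)
  x[lo] + (h - lo) * (x[hi] - x[lo])
}

v <- c(1, 3, 33, 333, 3333, 7, 20, 100, 34, 54)
p <- c(0, 0.25, 0.4, 0.5, 0.75, 1)

mine    <- quantile7(v, p)
builtin <- unname(quantile(v, p))  # drop the "0%", "25%", ... names
print(mine)                        # 1.00 10.25 27.80 33.50 88.50 3333.00
print(all.equal(mine, builtin))    # TRUE
```

Other definitions are available through the type argument of quantile(), so results from other software may differ slightly.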

Order of a vector

Here we use the functions sort() and order() to order the elements. The difference is that sort() returns the sorted values directly, while order() returns the indices that would sort the vector.

> v

[1] 1 3 33 333 3333 7 20 100 34 54

> sort(v) # the default is increasing

[1] 1 3 7 20 33 34 54 100 333 3333

> sort(v, decreasing=T) # decreasing order

[1] 3333 333 100 54 34 33 20 7 3 1

> order(v, decreasing=T) # indices of the elements, from the largest value to the smallest

[1] 5 4 8 10 9 3 7 6 2 1

> v[order(v, decreasing=T)] # index v by that order: same as sort(v, decreasing=T)

[1] 3333 333 100 54 34 33 20 7 3 1

> rev(v) # reverse the order of the elements (no sorting)

[1] 54 34 100 20 7 3333 333 33 3 1
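Because order() returns indices rather than values, it can reorder a second vector by the first, which sort() cannot do. A sketch with two hypothetical parallel vectors:

```r
values <- c(33, 7, 100)     # hypothetical measurements
labels <- c("x", "y", "z")  # one label per measurement

idx <- order(values)        # 2 1 3: positions of the smallest, middle, largest
print(values[idx])          # 7 33 100, the same as sort(values)
print(labels[idx])          # "y" "x" "z": labels reordered to match
```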

Filter a vector

Sometimes we need only the subset of elements that meet a condition. A logical comparison used as an index does this:

> v

[1] 1 3 33 333 3333 7 20 100 34 54

> v[v>=10] # all elements greater than or equal to 10

[1] 33 333 3333 20 100 34 54

> v[v>=10&v<200] # elements that are at least 10 and less than 200

[1] 33 20 100 34 54

The function which() can also be used for filtering: it returns the indices of the elements for which a condition is TRUE, and the condition can be any logical expression.

> v

[1] 1 3 33 333 3333 7 20 100 34 54

> which(v>2)

[1] 2 3 4 5 6 7 8 9 10

> which(v*0.5>10) # any logical expression works as the condition, here equivalent to v>20

[1] 3 4 5 8 9 10
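One practical difference between logical indexing and which() shows up when the vector contains NA: a comparison with NA yields NA, which a logical index passes through as NA, while which() ignores it. A short sketch:

```r
x <- c(5, NA, 15)

logical_sel <- x[x > 10]         # NA 15: NA > 10 is NA, and an NA index yields NA
which_sel   <- x[which(x > 10)]  # 15: which() keeps only the TRUE positions

print(logical_sel)
print(which_sel)
print(which(x > 10))  # 3
```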

Mathematics of a vector

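Note: v+v adds the two vectors element by element, which is different from c(v, v), which joins them end to end; c() is also how extra values are appended.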

> v+v # element-wise addition of two vectors

[1] 2 6 66 666 6666 14 40 200 68 108

> c(v, 100, 200) # combine/append vector

[1] 1 3 33 333 3333 7 20 100 34 54 100 200

> v/3 #all elements divided by 3

[1] 0.3333333 1.0000000 11.0000000 111.0000000 1111.0000000 2.3333333 6.6666667 33.3333333 11.3333333 18.0000000

> log2(v) # base-2 logarithm of every element

[1] 0.000000 1.584963 5.044394 8.379378 11.702606 2.807355 4.321928 6.643856 5.087463 5.754888
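Besides log2(), base R has log() with an optional base argument (natural logarithm by default) and log10(); like the arithmetic operators, they apply to every element. A sketch checking that the forms agree:

```r
v <- c(1, 3, 33, 333, 3333, 7, 20, 100, 34, 54)

nat <- log(v)            # natural logarithm (base e)
b2  <- log(v, base = 2)  # same values as log2(v)
b10 <- log10(v)          # base-10 logarithm

print(all.equal(b2, log2(v)))  # TRUE
print(all.equal(exp(nat), v))  # TRUE: exp() undoes log()
print(b10[8])                  # 2, since v[8] is 100
```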

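In arithmetic between two vectors, R recycles the shorter vector to the length of the longer one. If the longer length is not a multiple of the shorter length, R still returns a result but issues a warning. Here v[-1] has 9 elements, so its first element (3) is reused to pair with the 10th element of v: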

> v+v[-1]

[1] 4 36 366 3666 3340 27 120 134 88 57

Warning message:

In v + v[-1] :

longer object length is not a multiple of shorter object length
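The warning appears only when the longer length is not a multiple of the shorter one. When it is, R recycles the shorter vector silently; this is also what happens in v/3, where the single number 3 is recycled to every element. A sketch:

```r
x <- 1:6
y <- c(10, 20)

z <- x + y  # y is recycled as 10 20 10 20 10 20; 6 is a multiple of 2, so no warning
print(z)    # 11 22 13 24 15 26

w <- x * 2  # a single number is recycled to every element
print(w)    # 2 4 6 8 10 12
```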

Unique or duplicated elements

Here are examples of the R functions unique() and duplicated().

> a=c(1,2,3,2,1,1,6,0)

> unique(a) # remove duplicated elements, keeping the first occurrence of each value

[1] 1 2 3 6 0

> duplicated(a) # TRUE for each element that repeats an earlier value

[1] FALSE FALSE FALSE TRUE TRUE TRUE FALSE FALSE

> a[duplicated(a)]

[1] 2 1 1

> unique(a[duplicated(a)]) # 2 and 1 are the duplicated elements

[1] 2 1
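duplicated() marks only the second and later copies. Its fromLast argument scans from the end instead, so combining both directions flags every copy, and table() counts the occurrences of each value. A sketch on the same vector a:

```r
a <- c(1, 2, 3, 2, 1, 1, 6, 0)

# TRUE for every element whose value occurs more than once, including first copies
all_copies <- duplicated(a) | duplicated(a, fromLast = TRUE)
print(a[all_copies])  # 1 2 2 1 1

counts <- table(a)    # how often each value occurs (sorted by value)
print(counts)
print(names(counts)[as.vector(counts) > 1])  # "1" "2": values seen more than once
```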

Moreover, how can we find the elements shared by two vectors, or the elements found in only one of them? Here b is a second example vector:

> a

[1] 1 2 3 2 1 1 6 0

> b=c(1,40,23,2)

> b

[1] 1 40 23 2

> intersect(a,b) # those shared elements

[1] 1 2

> setdiff(a,b) # elements in a but not in b

[1] 3 6 0

> setdiff(b,a) # elements in b but not in a

[1] 40 23
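union() completes the set operations, and the %in% operator gives the element-by-element membership test behind them. Unlike setdiff(), which returns each value once, indexing with %in% keeps any duplicates. A sketch:

```r
a <- c(1, 2, 3, 2, 1, 1, 6, 0)
b <- c(1, 40, 23, 2)

u    <- union(a, b)  # unique values from either vector: 1 2 3 6 0 40 23
in_b <- a %in% b     # element-wise membership test, same length as a

print(u)
print(in_b)          # TRUE TRUE FALSE TRUE TRUE TRUE FALSE FALSE
print(a[!in_b])      # 3 6 0, the same values as setdiff(a, b) here
```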

writing date: 2015.01.23


Friday, January 30, 2015
