Friday, January 30, 2015

R: Numeric vector



Abstract: Manipulating numeric vectors in R


Create a vector
Create a simple numeric vector
> v=c(1,2,3,4,6,7,20,100,34,54) # a sample vector
> v
[1] 1 2 3 4 6 7 20 100 34 54

> names(v)<-v # assign names to the elements of the vector
> v
1 2 3 4 6 7 20 100 34 54
1 2 3 4 6 7 20 100 34 54

Here, we generate other types of numeric vectors
> rep(10, length=4) # get a four-element vector, each element initialized to 10
[1] 10 10 10 10

> rep(c(1,2,3), length=4*3) # replicate a vector
[1] 1 2 3 1 2 3 1 2 3 1 2 3
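
Besides the length argument used above, rep() also takes times and each. Here is a small sketch of the difference (the variable names are only for illustration):

```r
x <- c(1, 2, 3)

whole_twice <- rep(x, times = 2)  # repeat the whole vector: 1 2 3 1 2 3
each_twice  <- rep(x, each = 2)   # repeat every element in place: 1 1 2 2 3 3
```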

> a=1:10 # get a vector from 1 to 10
> a
[1] 1 2 3 4 5 6 7 8 9 10
> 10:1 # from the max to the min
[1] 10 9 8 7 6 5 4 3 2 1

Here are more complicated numeric sequences
> seq(from=10, to=100, by=5) # numbers from 10-100, the interval is 5
[1] 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100
> seq(from=10, to=100, length=5) # 5 numbers from 10-100 with equal intervals
[1] 10.0 32.5 55.0 77.5 100.0
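
One thing to watch with the colon operator: when the upper bound is computed and happens to be 0, 1:n counts down instead of giving an empty vector. A minimal sketch, assuming you want one index per element, uses seq_len() and seq_along() instead:

```r
n <- 0
bad  <- 1:n          # counts down: 1 0
good <- seq_len(n)   # an empty integer vector, as intended

x <- c(10, 20, 30)
idx <- seq_along(x)  # 1 2 3, one index per element of x
```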

Index in a vector
> v[1] # the first element
[1] 1

> v[c(1,3)] # the first and third elements
[1] 1 3

> v[2:6] # the elements from the second to the sixth
[1] 2 3 4 6 7

> v[length(v)] # the last element
[1] 54
> tail(v, n=1) # same as above
[1] 54
> tail(v, n=2) # the last two elements
[1] 34 54
> rev(v)[1] # the last element
[1] 54

> v['1'] # the element with the name ‘1’
1
1

Change elements of a vector
You can replace the value of any element by assigning to its index
> v
[1] 1 2 3 4 6 7 20 100 34 54
> v[4]=233 #replace the fourth element

> v
[1] 1 2 3 233 6 7 20 100 34 54
> v[2:5]=c(3,33,333, 3333) # replace the second through fifth elements at once
> v
[1] 1 3 33 333 3333 7 20 100 34 54

You can also remove elements like this:
> v
[1] 1 3 33 333 3333 7 20 100 34 54

> v[-1] #drop the first element
[1] 3 33 333 3333 7 20 100 34 54

> v[-1:-3] # drop the first three elements
[1] 333 3333 7 20 100 34 54
> v[-(1:3)] # the same as above, but another style
[1] 333 3333 7 20 100 34 54

> v[-length(v)] # drop the last element
[1] 1 3 33 333 3333 7 20 100 34
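
Note that negative indexing only returns a shortened copy. The vector itself stays the same until you assign the result back, as this short sketch shows:

```r
v <- c(1, 3, 33, 333)

shorter <- v[-1]           # a copy without the first element
len_before <- length(v)    # still 4, v is untouched

v <- v[-1]                 # assign back to really drop the element
len_after <- length(v)     # now 3
```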

Or append elements using append() like this:
> v=1:10
> v
[1] 1 2 3 4 5 6 7 8 9 10

> append(v, c(20,3,2)) # add three elements to the end of the vector
[1] 1 2 3 4 5 6 7 8 9 10 20 3 2

Inserting elements into the middle of a vector is a little more complicated. Here we first define a function named insert().
> insert <- function(vector, ins, pos.after){
  if(pos.after < length(vector)){ # insert after position pos.after
    # seq_len() also handles pos.after=0 (insert at the front)
    com.vector <- c(vector[seq_len(pos.after)], ins, vector[(pos.after+1):length(vector)])
  }else{ # append
    com.vector <- c(vector, ins)
  }
  return(com.vector)
} # define a function named insert()
> v=c(1,2,3)
> v
[1] 1 2 3
> insert(v, c(333,2), 2) #insert two elements after the second element
[1] 1 2 333 2 3
> insert(v, c(333,2), 20) # append elements
[1] 1 2 3 333 2
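
For comparison, base R's append() has an after argument that covers the same cases, so a hand-written helper is optional. A short sketch (the variable names are only for illustration):

```r
v <- c(1, 2, 3)

middle <- append(v, c(333, 2), after = 2)   # 1 2 333 2 3
front  <- append(v, 0, after = 0)           # 0 1 2 3
end    <- append(v, c(333, 2), after = 20)  # past the end, so it appends: 1 2 3 333 2
```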

Statistics of a vector
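Here are the functions for summary statistics. The examples use the ten-element vector from the replacement examples above:
> v=c(1,3,33,333,3333,7,20,100,34,54)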
> summary(v)
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.00 10.25 33.50 391.80 88.50 3333.00

> min(v) #the minimum
[1] 1
> max(v) # the maximum
[1] 3333

> mean(v) #mean value
[1] 391.8
> median(v) #median value
[1] 33.5

> range(v) #range of the vector: the min and max values
[1] 1 3333

> sum(v) # sum values of the vector
[1] 3918

> length(v) # the number of elements
[1] 10

Another issue is how to deal with NA elements when using the mean/median/min/max functions. For example:
> mean(c(1,3,NA)) #return NA if NA is included in the vector
[1] NA
> mean(c(1,3,NA), na.rm=TRUE) # na.rm=TRUE drops the NA values before computing
[1] 2
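
The same na.rm argument works for sum(), max(), min() and median(), and is.na() tells you how many values are missing. A small sketch with the same three-element vector:

```r
x <- c(1, 3, NA)

n_missing <- sum(is.na(x))          # 1 missing value
total     <- sum(x, na.rm = TRUE)   # 4
largest   <- max(x, na.rm = TRUE)   # 3
middle    <- median(x, na.rm = TRUE) # 2
```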

Quantiles are values taken at regular intervals from the inverse of the cumulative distribution function (CDF) of a random variable. The 2-quantile is the median.
> quantile(v) #quantile
0% 25% 50% 75% 100%
1.00 10.25 33.50 88.50 3333.00

> quantile(v, 0.4) # quantile at 0.4
40%
27.8
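
quantile() also accepts several probabilities at once through its probs argument. A short sketch with the same vector, using the default interpolation method:

```r
v <- c(1, 3, 33, 333, 3333, 7, 20, 100, 34, 54)

# ask for the 10%, 50% and 90% quantiles in one call
q <- quantile(v, probs = c(0.1, 0.5, 0.9))
q_names <- names(q)   # "10%" "50%" "90%"
```

The result is a named vector, so a single value can be picked by name, for example q["50%"].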

Order of a vector
Here we use the functions sort() and order() to order the elements. The difference is that sort() returns the ordered vector directly, while order() returns the indices that would put the vector in order.
> v
[1] 1 3 33 333 3333 7 20 100 34 54

> sort(v) # the default is increasing
[1] 1 3 7 20 33 34 54 100 333 3333

> sort(v, decreasing=T) # this is the decreasing
[1] 3333 333 100 54 34 33 20 7 3 1

> order(v, decreasing=T) # return the index of the vector
[1] 5 4 8 10 9 3 7 6 2 1
> v[order(v, decreasing=T)] # the decreasing
[1] 3333 333 100 54 34 33 20 7 3 1

> rev(v) #reverse the elements
[1] 54 34 100 20 7 3333 333 33 3 1
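
The indices from order() are most useful when one vector should be sorted by the values of another. A minimal sketch, with two made-up parallel vectors (score and label are hypothetical names):

```r
score <- c(20, 5, 12)
label <- c("b", "c", "a")    # label[i] belongs to score[i]

idx <- order(score)          # 2 3 1
sorted_labels <- label[idx]  # labels rearranged by increasing score: "c" "a" "b"
```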

Filter a vector
Sometimes we need a subset of a vector
> v
[1] 1 3 33 333 3333 7 20 100 34 54

> v[v>=10] # get all elements greater than or equal to 10
[1] 33 333 3333 20 100 34 54
> v[v>=10&v<200] # get elements that are at least 10 and less than 200
[1] 33 20 100 34 54

The function which() can also be used for filtering vectors. It returns the indices of the elements that satisfy a condition, and the condition can be any logical expression.
> v
[1] 1 3 33 333 3333 7 20 100 34 54
> which(v>2)
[1] 2 3 4 5 6 7 8 9 10
> which(v*0.5>10) # the condition can be an arithmetic expression
[1] 3 4 5 8 9 10
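
To see how the logical filter and which() relate, here is a short sketch that selects the same elements both ways, plus which.max() and which.min() for the position of the extremes:

```r
v <- c(1, 3, 33, 333, 3333, 7, 20, 100, 34, 54)

keep <- v > 100               # logical vector, one TRUE/FALSE per element
by_mask  <- v[keep]           # 333 3333
by_index <- v[which(keep)]    # same elements, selected by position

pos_max <- which.max(v)       # 5, position of the largest element
pos_min <- which.min(v)       # 1, position of the smallest element
```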

Mathematics of a vector
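Note: ‘v+v’ is different from ‘c(v,v)’. The first adds the two vectors element by element, the second combines them into one longer vector.
> v+v # element-wise sum of two vectors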
[1] 2 6 66 666 6666 14 40 200 68 108
> c(v, 100, 200) # combine/append vector
[1] 1 3 33 333 3333 7 20 100 34 54 100 200

> v/3 #all elements divided by 3
[1] 0.3333333 1.0000000 11.0000000 111.0000000 1111.0000000 2.3333333 6.6666667 33.3333333 11.3333333 18.0000000

> log2(v) # base-2 logarithm of every element
[1] 0.000000 1.584963 5.044394 8.379378 11.702606 2.807355 4.321928 6.643856 5.087463 5.754888

The vectors in an element-wise calculation should have the same length. Otherwise R recycles the shorter vector, and it gives a warning when the longer length is not a multiple of the shorter one.
> v+v[-1]
[1] 4 36 366 3666 3340 27 120 134 88 57
Warning message:
In v + v[-1] :
longer object length is not a multiple of shorter object length
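
When the longer length is an exact multiple of the shorter one, the shorter vector is recycled silently. A small sketch of that case:

```r
v <- 1:6

# c(10, 20) is reused three times: 10 20 10 20 10 20
shifted <- v + c(10, 20)   # 11 22 13 24 15 26

# a single number is just a vector of length one, recycled for every element
doubled <- v * 2           # 2 4 6 8 10 12
```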

Unique or duplicated elements
Here are the examples of R functions unique() and duplicated().
> a=c(1,2,3,2,1,1,6,0)
> unique(a) # remove any duplicated elements
[1] 1 2 3 6 0
> duplicated(a) # which are duplicated elements
[1] FALSE FALSE FALSE TRUE TRUE TRUE FALSE FALSE
> a[duplicated(a)]
[1] 2 1 1
> unique(a[duplicated(a)]) # 2 and 1 are the duplicated elements
[1] 2 1
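
If you also want to know how often each value occurs, table() is one option. A minimal sketch with the same vector; note that table() stores the values as character names, so they are converted back with as.numeric():

```r
a <- c(1, 2, 3, 2, 1, 1, 6, 0)

counts <- table(a)                       # counts per distinct value, sorted by value
n_ones <- as.integer(counts["1"])        # the value 1 occurs 3 times

# values that occur more than once
repeated <- as.numeric(names(counts)[as.vector(counts) > 1])  # 1 2
```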

Moreover, how can we find the elements shared by two vectors, or the elements found in only one of them?
> a
[1] 1 2 3 2 1 1 6 0
> b=c(1,40,23,2) # a second vector for comparison
> b
[1] 1 40 23 2

> intersect(a,b) # those shared elements
[1] 1 2

> setdiff(a,b) # the elements found in a but not in b
[1] 3 6 0
> setdiff(b,a) # the elements found in b but not in a
[1] 40 23
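
Two related tools are union(), which collects the distinct elements of both vectors, and the %in% operator, which tests membership element by element. A short sketch with the same two vectors:

```r
a <- c(1, 2, 3, 2, 1, 1, 6, 0)
b <- c(1, 40, 23, 2)

all_values <- union(a, b)   # 1 2 3 6 0 40 23
in_b <- a %in% b            # TRUE for every element of a that also occurs in b
shared <- a[in_b]           # 1 2 2 1 1, duplicates are kept here
```

Unlike intersect(), filtering with %in% keeps duplicates and the original order of a.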


writing date: 2015.01.23


