Chapter 3 Data structures in R
Understanding R's data structures is very important. The basic data structures in R include vectors, matrices, data frames, factors and lists. After working through this chapter you will know how to choose and work with the data structure you need.
3.1 Vector
Vectors are a fundamental concept in R, and many functions in R return their results as vectors.
A vector is a one-dimensional array of values. The values can be character, logical, integer or numeric, but a vector can only contain values of the same type.
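The same-type rule can be seen by mixing types inside c(). The following is a minimal sketch; the variable names are made up for illustration. Note that typeof() reports numeric values as "double".

```r
# Mixing types in c() coerces every element to one common type.
mixed_num_lgl <- c(1, TRUE, FALSE)   # logical values become numeric
mixed_num_chr <- c(1, "a", TRUE)     # everything becomes character

typeof(mixed_num_lgl)   # "double"
typeof(mixed_num_chr)   # "character"
mixed_num_chr           # "1" "a" "TRUE"
```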
HINT: You can get the documentation of a function using ? or help(), e.g. ?rep or help(rep)
TRY:
x <- rep(1,3)
y <- 1:3
z <- c(1,2,3)
c() is the function used to combine values. Try the following two commands:
c(x,y)
x+y
Quiz
1. What is the difference between c(a,b) and a+b?
2. Create a vector with "R" "is" "fun"
HINT: use c()
A new vector can be created by slicing an existing vector with numerical indexes. To slice between two indexes, we can use the colon operator :.
Here is an example that creates a vector of students' marks. You can use names() to assign names to the values inside the vector.
marks <- c(50, 100, 90, 80, 70)
student_names <- c("Amy","Bobby","Cindy","Eddy","Dylon")
names(marks) <- student_names
marks[c(2:4)]
marks[c("Bobby","Cindy")]
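Besides positions and names, a vector can also be sliced with negative indexes and with logical conditions. A minimal sketch, using a hypothetical vector temps that is not part of the example above:

```r
# Hypothetical data: daily temperatures, named by weekday.
temps <- c(Mon = 18, Tue = 21, Wed = 19, Thu = 24, Fri = 22)

first_three <- temps[1:3]          # positions 1 to 3
without_mon <- temps[-1]           # a negative index drops position 1
warm_days   <- temps[temps > 20]   # a logical index keeps the TRUE positions

names(warm_days)   # "Tue" "Thu" "Fri"
```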
Quiz
1. Get the highest mark amongst Amy, Cindy, Dylon
HINT: use max()
3.2 Matrix
A matrix is a two-dimensional data structure in R; it can be thought of as a two-dimensional vector. As with a vector, all values in a matrix must be of the same type, and all columns must have the same length.
3.2.1 Make a matrix
You can simply make a matrix like this:
m <- matrix(1:15, nrow = 3, ncol = 5)
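matrix() fills the values column by column unless told otherwise. The sketch below compares the default with byrow = TRUE; the names by_col and by_row are made up for illustration.

```r
by_col <- matrix(1:6, nrow = 2, ncol = 3)                 # default: filled column by column
by_row <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)   # filled row by row

by_col[1, ]   # 1 3 5
by_row[1, ]   # 1 2 3
dim(by_col)   # 2 3
```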
You can change the column names and row names:
#Change names
colnames(m) <- c("A","B","C","D","E")
rownames(m) <- c("X","Y","Z")
You can also make a matrix with cbind() and rbind(), which bind columns or rows.
cbind(c(1:9),c(11:19))
rbind(c(1:9),c(11:19))
We can also bind a column or row to the existing matrix.
cbind(m,c(16,17,18))
rbind(m,c(20,21,22,23,24))
The values in a matrix can be accessed as [row_index, column_index].
m <- matrix(1:15, nrow = 3, ncol = 5)
# select rows 1 & 2 and columns 1 & 2
m[c(1,2),c(1,2)]
# select all columns
m[c(1,2),]
The which() function returns the indices at which a logical object is TRUE; which.min() and which.max() return the index of the smallest and largest value. Try:
which.min(m)
which.max(m)
which(m == 7)
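For a matrix, which() returns a single position counted down the columns. Its arr.ind argument gives the row and column instead. A minimal sketch, using a separate matrix m_demo (a name chosen here for illustration) so the example does not change m:

```r
m_demo <- matrix(1:15, nrow = 3, ncol = 5)

linear_idx <- which(m_demo == 7)                   # one position, counted down the columns
row_col    <- which(m_demo == 7, arr.ind = TRUE)   # row and column of the same cell

linear_idx   # 7
row_col      # row 1, col 3
```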
3.2.2 Modify a matrix
To assign a value, use <- or =. Both do the same work:
<- assignment (right to left)
= assignment (right to left)
Set all elements less than 5 to 0
m[m<5] <- 0
You can transpose a matrix with t()
t(m)
Remove last row
m <- m[-3,]
# or
m <- m[-nrow(m),]
3.3 Data Frame
A data frame is a two-dimensional data structure in R. It is similar to a matrix, but different columns of a data frame can hold different data types.
x <- data.frame("SID" = 1:3, "Age" = c(23,25,21), "Name" = c("Amy","Bobby","Cindy"), "Mark" = c(100,82,75))
Using [ returns a data frame.
TRY:
x["Name"]
Accessing with [[ or $ is similar. Both return the result as a vector.
x[["Name"]]
x$Name
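The difference between the three accessors can be checked with class(). The sketch below uses a small hypothetical data frame df rather than x:

```r
# Hypothetical data frame for illustration.
df <- data.frame(Name = c("Amy", "Bobby"), Age = c(23, 25))

class(df["Age"])     # "data.frame"
class(df[["Age"]])   # "numeric"
class(df$Age)        # "numeric"

identical(df[["Age"]], df$Age)   # TRUE
```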
Select students with a mark greater than 80 and store them in a new data frame called x_highmark.
x_highmark <- x[x$Mark>80,]
Quiz
1. Select students with a mark greater than 80 and get their average age.
HINT: use mean()
3.4 Factor
Factors are R's data structure for categorical variables.
category = c(0,1,1,1,1,2,2,2,1,2,1,1,1)
fdata = factor(category)
fdata
## [1] 0 1 1 1 1 2 2 2 1 2 1 1 1
## Levels: 0 1 2
Or you can create the factor with specific labels.
fdata1 = factor(category,labels=c("A","B","C"))
fdata1
## [1] A B B B B C C C B C B B B
## Levels: A B C
You cannot treat the values in a factor as numerical data. For example, if you use mean(fdata) you will get the warning below.
mean(fdata)
## Warning in mean.default(fdata): argument is not numeric or logical: returning NA
## [1] NA
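Note that as.numeric(fdata) returns the internal integer codes of the levels (1, 2, 3), not the original values (0, 1, 2), so mean(as.numeric(fdata)) gives 2.230769. To calculate the mean of the original numeric values of the fdata variable, convert the factor to character first:
mean(as.numeric(as.character(fdata)))
## [1] 1.230769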
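When converting a factor to numbers, it helps to distinguish the level codes from the level labels. A minimal sketch with a hypothetical factor f built from made-up scores:

```r
# Hypothetical factor built from numeric values.
scores <- c(10, 20, 20, 30)
f <- factor(scores)

codes  <- as.numeric(f)                 # internal level codes: 1 2 2 3
values <- as.numeric(as.character(f))   # original values: 10 20 20 30

mean(codes)    # 2
mean(values)   # 20
```

Going through as.character() first recovers the labels, which is usually what is wanted when the levels are numbers.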
3.5 List
A list is a data structure that can hold mixed data types. It is the most flexible data structure in R. A data frame is a special case of a list.
We can check the data type with the typeof() function, find the length using length(), and use str() to inspect the structure.
x <- list("a" = 1000, "b" = TRUE, "c" = 1:3)
typeof(x)
length(x)
str(x)
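List elements can be reached with $, [[ and [, much as with a data frame. The sketch below uses a hypothetical list item and also checks that a data frame really is a list:

```r
# Hypothetical list for illustration.
item <- list(price = 1000, in_stock = TRUE, sizes = 1:3)

item$price             # 1000
item[["sizes"]][2]     # 2
class(item["price"])   # "list": single brackets keep the list wrapper

# A data frame is itself a list of equal-length columns.
is.list(data.frame(a = 1:2))   # TRUE
typeof(data.frame(a = 1:2))    # "list"
```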