Chapter 3 Data structures in R
Understanding R's data structures is essential for working with data effectively. The basic data structures in R are the vector, matrix, data frame, factor and list. After working through this chapter you will be able to choose and work with the data structure that fits your task.
3.1 Vector
Vectors are a fundamental concept in R, and many functions in R return their results as vectors.
A vector is a one-dimensional array of values. The values can be character, logical, integer or numeric, but a vector can only contain values of the same type.
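As a small illustration of the same-type rule (a sketch with made-up values and variable names), mixing types inside c() does not raise an error. Instead, R converts every element to the most general type present:

```r
# Mixing types inside c() coerces all values to one common type
nums  <- c(1, 2.5, 3)       # all numeric
mixed <- c(1, "a", TRUE)    # numeric and logical are converted to character
flags <- c(TRUE, FALSE, 2)  # logical is converted to numeric

class(nums)   # "numeric"
class(mixed)  # "character"
class(flags)  # "numeric"
mixed         # "1" "a" "TRUE"
flags         # 1 0 2
```

Checking class() after building a vector is a quick way to spot an unintended conversion.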
HINT: You can get the documentation of a function using ? or help(), e.g. ?rep or help(rep).
TRY:
x <- rep(1,3)
y <- 1:3
z <- c(1,2,3)
c() is the function to combine values. Try the following two commands:
c(x,y)
x+y
Quiz
1. What is the difference between c(a,b) and a+b?
2. Create a vector containing "R", "is" and "fun".
HINT: use c()
A new vector can be created by slicing an existing vector with numerical indexes, written in square brackets after the vector name. Indexes in R start at 1. To slice between two indexes, use the colon operator : (for example, 2:4 selects positions 2, 3 and 4).
Here is an example that creates a vector of students' marks. You can use names() to assign names to the values in the vector, and then select values either by position or by name.
marks <- c(50, 100, 90, 80, 70)
student_names <- c("Amy","Bobby","Cindy","Eddy","Dylon")
names(marks) <- student_names
marks[c(2:4)]
marks[c("Bobby","Cindy")]
Quiz
1. Get the highest mark amongst Amy, Cindy and Dylon.
HINT: use max()
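Besides positions and names, a vector can also be sliced with negative indexes or with a logical condition. The following is a minimal sketch that rebuilds the marks vector from the example above; the threshold of 80 is only an illustrative choice:

```r
# Rebuild the named marks vector from the example above
marks <- c(50, 100, 90, 80, 70)
names(marks) <- c("Amy", "Bobby", "Cindy", "Eddy", "Dylon")

# Negative index: everything except the first element
marks[-1]

# Logical index: only the marks of 80 or more
high <- marks[marks >= 80]
high

# Names of the students behind those marks
names(high)   # "Bobby" "Cindy" "Eddy"
```

A logical condition is often more convenient than positions because it keeps working when the order or length of the vector changes.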
3.2 Matrix
A matrix is a two-dimensional data structure in R; it can be thought of as a vector with two dimensions. As with a vector, all values in a matrix must be of the same type, and all columns must have the same length.
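As a small sketch of these rules (the names by_col, by_row and mixed are made up for illustration), matrix() fills its values column by column unless byrow = TRUE is given, and a single character value turns the whole matrix into character:

```r
# matrix() fills column by column unless byrow = TRUE is given
by_col <- matrix(1:6, nrow = 2)
by_row <- matrix(1:6, nrow = 2, byrow = TRUE)

by_col[1, ]   # 1 3 5
by_row[1, ]   # 1 2 3
dim(by_col)   # 2 3  (rows, columns)

# One character value coerces every element to character
mixed <- matrix(c(1, 2, "x", 4), nrow = 2)
class(mixed[1, 1])  # "character"
```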
3.2.1 Make a matrix
You can make a matrix like this:
m <- matrix(1:15, nrow = 3, ncol = 5)
You can change the column names and row names:
#Change names
colnames(m) <- c("A","B","C","D","E")
rownames(m) <- c("X","Y","Z")
You can also make a matrix with cbind() and rbind(), which bind vectors together as columns or as rows.
cbind(c(1:9),c(11:19))
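rbind(c(1:9),c(11:19))
We can also bind a column or row to an existing matrix. In the example below, the new column has one value per row of m and the new row has one value per column of m.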
cbind(m,c(16,17,18))
rbind(m,c(20,21,22,23,24))
A value in a matrix can be accessed as [row_index, column_index].
m <- matrix(1:15, nrow = 3, ncol = 5)
# select rows 1 & 2 and columns 1 & 2
m[c(1,2),c(1,2)]
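# select rows 1 & 2 and all columns
m[c(1,2),]
The which() function returns the indices at which a logical object is TRUE. The related functions which.min() and which.max() return the index of the smallest and largest value. Try: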
which.min(m)
which.max(m)
which(m == 7)
3.2.2 Modify a matrix
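To assign a value you can use either <- or =. At the top level both do the same work, assigning the value on the right to the name on the left:
<- assignment (right to left)
= assignment (right to left)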
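A minimal sketch of the two operators side by side (the variable names a, b and r are made up for illustration). Inside a function call, = is also used to name arguments, which is one reason many R style guides prefer <- for assignment:

```r
# At the top level, <- and = both assign the right-hand value to the left-hand name
a <- c(1, 2, 3)
b = c(1, 2, 3)
identical(a, b)   # TRUE

# Inside a function call, = names an argument rather than assigning a variable
r <- rep(x = 0, times = 2)
r   # 0 0
```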
Assign 0 to all elements less than 5:
m[m<5] <- 0
You can transpose a matrix with t():
t(m)
Remove the last row:
m <- m[-3,]
# or
m <- m[-nrow(m),]
3.3 DataFrame
A data frame is a two-dimensional data structure in R. It is similar to a matrix, but its columns can hold different data types; within a single column, all values still share one type.
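To make the contrast with a matrix concrete, here is a short sketch using a hypothetical data frame df and matrix mat (neither appears elsewhere in this chapter). stringsAsFactors = FALSE is set explicitly so that the result does not depend on the R version:

```r
# Hypothetical data frame: each column keeps its own type
df <- data.frame(id = 1:2, name = c("Amy", "Bobby"), passed = c(TRUE, FALSE),
                 stringsAsFactors = FALSE)
sapply(df, class)   # id: "integer", name: "character", passed: "logical"

# The same kind of values in a matrix are all coerced to character
mat <- cbind(id = 1:2, name = c("Amy", "Bobby"))
class(mat[1, "id"])   # "character"
```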
x <- data.frame("SID" = 1:3, "Age" = c(23,25,21), "Name" = c("Amy","Bobby","Cindy"), "Mark" = c(100,82,75))
Using [ returns a data frame.
TRY:
x["Name"]
Accessing with [[ or $ works similarly, but both return the result as a vector.
x[["Name"]]
x$Name
Select students with a mark greater than 80 and store them in a new data frame called x_highmark.
x_highmark <- x[x$Mark>80,]
Quiz
1. Select students with a mark greater than 80 and get their average age.
HINT: use mean()
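Row filtering and column selection can be combined in a single [row, column] expression. The sketch below rebuilds the data frame x from this section and uses an illustrative age threshold; the name young is made up for the example:

```r
x <- data.frame(SID = 1:3, Age = c(23, 25, 21),
                Name = c("Amy", "Bobby", "Cindy"), Mark = c(100, 82, 75),
                stringsAsFactors = FALSE)

# Rows where Age is below 25, keeping only the Name and Mark columns
young <- x[x$Age < 25, c("Name", "Mark")]
young
nrow(young)   # 2
young$Name    # "Amy" "Cindy"
```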
3.4 Factor
Factors are the data structure R uses for categorical variables, that is, variables that take a limited set of values called levels.
category = c(0,1,1,1,1,2,2,2,1,2,1,1,1)
fdata = factor(category)
fdata
## [1] 0 1 1 1 1 2 2 2 1 2 1 1 1
## Levels: 0 1 2
You can also give the levels specific labels when creating the factor.
fdata1 = factor(category,labels=c("A","B","C"))
fdata1
## [1] A B B B B C C C B C B B B
## Levels: A B C
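A few helper functions are useful for inspecting a factor. The following sketch rebuilds fdata1 from the example above and shows its levels, the count per level, and the integer codes R stores internally:

```r
category <- c(0, 1, 1, 1, 1, 2, 2, 2, 1, 2, 1, 1, 1)
fdata1 <- factor(category, labels = c("A", "B", "C"))

levels(fdata1)       # "A" "B" "C"
nlevels(fdata1)      # 3
table(fdata1)        # counts per level: A 1, B 8, C 4
as.integer(fdata1)   # the integer codes stored internally (1 for A, 2 for B, 3 for C)
```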
You cannot treat the values of a factor as numerical data. For example, mean(fdata) returns NA with the warning shown below.
mean(fdata)
## Warning in mean.default(fdata): argument is not numeric or logical: returning NA
## [1] NA
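Be careful when converting a factor back to numbers: as.numeric(fdata) returns the internal integer codes of the levels (1, 2, 3), not the original values (0, 1, 2), so the result below is the mean of the codes:
mean(as.numeric(fdata))
## [1] 2.230769
To calculate the mean of the original numeric values, convert the factor to character first, as in mean(as.numeric(as.character(fdata))).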
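To make the difference between level codes and original values concrete, here is a short sketch that rebuilds fdata and compares the two conversions (the names codes and values are made up for illustration):

```r
category <- c(0, 1, 1, 1, 1, 2, 2, 2, 1, 2, 1, 1, 1)
fdata <- factor(category)

codes  <- as.numeric(fdata)                 # level codes: 1, 2, 3
values <- as.numeric(as.character(fdata))   # original values: 0, 1, 2

mean(codes)    # 2.230769
mean(values)   # 1.230769, the same as mean(category)
```

The codes are exactly one higher than the original values here only because the levels happen to be 0, 1 and 2; in general there is no fixed relationship between them.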
3.5 List
A list is a data structure that can hold elements of mixed data types. It is the most flexible data structure in R. A data frame is a special case of a list in which every element has the same length.
We can check the data type with the typeof() function, find the number of elements with length(), and inspect the structure with str().
x <- list("a" = 1000, "b" = TRUE, "c" = 1:3)
typeof(x)
length(x)
str(x)
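Elements of a list are accessed much like the columns of a data frame. The sketch below reuses the list x from above; the data frame df is a hypothetical example added only to show that a data frame is itself a list:

```r
x <- list("a" = 1000, "b" = TRUE, "c" = 1:3)

x[["c"]]        # the element itself: 1 2 3
x$a             # same idea with $: 1000
class(x["c"])   # single brackets return a list: "list"

# A data frame is stored as a list of equal-length columns
df <- data.frame(n = 1:2)
is.list(df)     # TRUE
typeof(df)      # "list"
```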