Chapter 3 Data structures in R

Understanding R's data structures is very important. The basic data structures in R include the vector, matrix, data frame, factor and list. After working through this chapter you will know how to choose and work with the data structure you need.

3.1 Vector

Vectors are a fundamental concept in R, and many functions in R return their results as vectors. A vector is a one-dimensional array of values. The values can be character, logical, integer or numeric, but a single vector can only contain values of the same type.
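To see the same-type rule in action, the sketch below mixes types inside c(). R converts every value to the most general type present instead of raising an error. The variable names mixed_chr and mixed_num are only illustrative.

```r
# Mixing types in c() coerces every value to one common type
mixed_chr <- c(1, "a", TRUE)    # number, string, logical -> all character
mixed_num <- c(1, TRUE, FALSE)  # number and logicals -> all numeric

class(mixed_chr)  # "character"
mixed_chr         # "1" "a" "TRUE"
class(mixed_num)  # "numeric"
mixed_num         # 1 1 0
```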

HINT: You can get the documentation of a function using ? or help(), e.g. ?rep or help(rep)

TRY:

x <- rep(1,3)
y <- 1:3
z <- c(1,2,3)

c() is the function that combines values. Try the following two commands:

c(x,y)
x+y

Quiz

1. What is the difference between c(a,b) and a+b?
2. Create a vector with "R" "is" "fun"
HINT: use c()

A new vector can be created by slicing an existing vector with numerical indexes. To slice between two indexes, we can use the colon operator : . Elements can also be selected by name once names() has been used to assign names to the values inside the vector. Here is an example that creates a vector of students' marks.

marks <- c(50, 100, 90, 80, 70) 
student_names <- c("Amy","Bobby","Cindy","Eddy","Dylon")
names(marks) <- student_names
marks[c(2:4)]               # select by position: elements 2 to 4
marks[c("Bobby","Cindy")]   # select by name
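As a quick check of the two indexing styles, the sketch below rebuilds the marks vector and confirms that positions and names pick out the same elements. It reuses the data from the example. The names by_position and by_name are introduced here for illustration.

```r
marks <- c(50, 100, 90, 80, 70)
names(marks) <- c("Amy", "Bobby", "Cindy", "Eddy", "Dylon")

2:4                                    # the colon operator expands to 2 3 4
by_position <- marks[2:3]              # second and third elements
by_name <- marks[c("Bobby", "Cindy")]  # the same elements, chosen by name
identical(by_position, by_name)        # TRUE

marks[-1]  # a negative index drops that position (here, Amy)
```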

Quiz

1. Get the highest mark amongst Amy, Cindy, Dylon
HINT: use max()

3.2 Matrix

A matrix is a two-dimensional data structure in R; it can be thought of as a two-dimensional vector. As with a vector, all values in a matrix must be of the same type, and all columns must have the same length.

3.2.1 Make a matrix

You can make a matrix like this:

m <- matrix(1:15, nrow = 3, ncol = 5)
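One detail worth knowing is the fill order: matrix() fills column by column unless told otherwise. The sketch below compares the default with byrow = TRUE. The name m_by_row is introduced here for illustration.

```r
m <- matrix(1:15, nrow = 3, ncol = 5)
m_by_row <- matrix(1:15, nrow = 3, ncol = 5, byrow = TRUE)

dim(m)         # 3 5  (rows, columns)
m[1, ]         # 1 4 7 10 13  (values were filled down the columns)
m_by_row[1, ]  # 1 2 3 4 5    (values were filled along the rows)
```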

You can change the column names and row names:

#Change names
colnames(m) <- c("A","B","C","D","E")
rownames(m) <- c("X","Y","Z")

You can also make a matrix with cbind() and rbind(), which bind vectors together as columns or as rows.

cbind(c(1:9),c(11:19))
rbind(c(1:9),c(11:19))
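The two calls produce differently shaped results from the same input, which dim() makes visible. A minimal sketch, with by_col and by_row as illustrative names:

```r
by_col <- cbind(1:9, 11:19)  # each vector becomes a column
by_row <- rbind(1:9, 11:19)  # each vector becomes a row

dim(by_col)  # 9 2
dim(by_row)  # 2 9
```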

We can also bind a column or row to the existing matrix.

cbind(m,c(16,17,18))
rbind(m,c(20,21,22,23,24))
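Note that cbind() and rbind() return a new matrix and leave the original untouched, so the result has to be assigned if it should be kept. A minimal sketch on a separate matrix, with m_demo, m_wide and m_tall as illustrative names:

```r
m_demo <- matrix(1:15, nrow = 3, ncol = 5)

m_wide <- cbind(m_demo, c(16, 17, 18))          # one extra column
m_tall <- rbind(m_demo, c(20, 21, 22, 23, 24))  # one extra row

dim(m_demo)  # 3 5  (the original is unchanged)
dim(m_wide)  # 3 6
dim(m_tall)  # 4 5
```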

A value in a matrix can be accessed as [row_index, column_index].

m <- matrix(1:15, nrow = 3, ncol = 5)
# select rows 1 & 2 and columns 1 & 2
m[c(1,2),c(1,2)] 
# select rows 1 & 2 and all columns
m[c(1,2),] 

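The which() function returns the indices at which a logical object is TRUE, while which.min() and which.max() return the index of the smallest and the largest value. Try: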

which.min(m)
which.max(m)
which(m == 7)
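For a matrix, these functions return a single position counted down the columns, not a row and column pair. The sketch below shows this and uses the arr.ind argument of which() to get the row and column numbers instead. The name pos is illustrative.

```r
m <- matrix(1:15, nrow = 3, ncol = 5)

which.min(m)   # 1   (position of the smallest value)
which.max(m)   # 15  (position of the largest value)
which(m == 7)  # 7   (seventh element, counting down the columns)

# arr.ind = TRUE reports the row and column instead of one position
pos <- which(m == 7, arr.ind = TRUE)
pos[1, "row"]  # 1
pos[1, "col"]  # 3
```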

3.2.2 Modify a matrix

Assign a value. For ordinary assignments like the ones in this chapter, <- and = do the same work:

<-         assignment (right to left)
=          assignment (right to left)
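One difference between the two operators is worth being aware of: inside a function call, = names an argument rather than creating a variable. A small sketch, where a_val, b_val, square and side_len are made-up names:

```r
a_val <- 5  # assignment with <-
b_val = 5   # assignment with =, same effect here
identical(a_val, b_val)  # TRUE

# Inside a call, = binds an argument; it does not create a variable
square <- function(side_len) side_len^2
area <- square(side_len = 4)
area                # 16
exists("side_len")  # FALSE: no variable named side_len was created
```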

Set all elements less than 5 to 0:

m[m<5] <- 0
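The expression m < 5 produces a logical matrix of the same shape as m, and that mask decides which elements get overwritten. The sketch below works on a separate matrix (m_demo and mask are illustrative names) so the matrix from the text is not affected.

```r
m_demo <- matrix(1:15, nrow = 3, ncol = 5)

mask <- m_demo < 5  # TRUE where the value is below 5
sum(mask)           # 4  (the values 1, 2, 3 and 4)

m_demo[mask] <- 0   # overwrite only the masked elements
m_demo[, 1]         # 0 0 0
m_demo[, 2]         # 0 5 6
```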

You can transpose a matrix with t():

t(m)
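Transposing swaps rows and columns, so a 3 x 5 matrix becomes 5 x 3 and the element at [i, j] moves to [j, i]. A brief check on a separate matrix, with m_demo and tm as illustrative names:

```r
m_demo <- matrix(1:15, nrow = 3, ncol = 5)
tm <- t(m_demo)

dim(tm)       # 5 3
m_demo[2, 4]  # 11
tm[4, 2]      # 11
```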

Remove the last row (the two forms below are equivalent here, so run only one of them):

m <- m[-3,]
# or
m <- m[-nrow(m),]

3.3 Data Frame

A data frame is a two-dimensional data structure in R. It is similar to a matrix, but different columns of a data frame can hold different data types.

x <- data.frame("SID" = 1:3, "Age" = c(23,25,21), "Name" = c("Amy","Bobby","Cindy"), "Mark" = c(100,82,75))

Using [ returns a data frame. TRY:

x["Name"]

Accessing with [[ or $ is similar: both return the column as a vector.

x[["Name"]]
x$Name
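The difference between the three access forms is easiest to see by checking the class of each result. The sketch below rebuilds the data frame as df_demo (an illustrative name) and sets stringsAsFactors = FALSE explicitly, because the default for that argument changed in R 4.0.

```r
df_demo <- data.frame(SID = 1:3, Age = c(23, 25, 21),
                      Name = c("Amy", "Bobby", "Cindy"),
                      Mark = c(100, 82, 75),
                      stringsAsFactors = FALSE)

class(df_demo["Name"])    # "data.frame" (still a table, with one column)
class(df_demo[["Name"]])  # "character"  (the column itself)
identical(df_demo[["Name"]], df_demo$Name)  # TRUE
```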

Select students with a mark greater than 80 and store them in a new data frame called x_highmark.

x_highmark <- x[x$Mark>80,]
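The row filter works in two steps: the comparison yields one TRUE or FALSE per row, and that logical vector picks the rows to keep. A sketch on a small made-up data frame (scores, keep, high and all the values are hypothetical):

```r
scores <- data.frame(Name = c("Ann", "Ben", "Cat"),
                     Mark = c(65, 91, 88),
                     stringsAsFactors = FALSE)

keep <- scores$Mark > 80  # FALSE TRUE TRUE
high <- scores[keep, ]    # nothing after the comma means "all columns"

nrow(high)  # 2
high$Name   # "Ben" "Cat"
```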

Quiz

1. Select students with a mark greater than 80 and get their average age.
HINT: use mean()

3.4 Factor

Factors are the data structure R uses for categorical variables.

category = c(0,1,1,1,1,2,2,2,1,2,1,1,1)
fdata = factor(category)
fdata
##  [1] 0 1 1 1 1 2 2 2 1 2 1 1 1
## Levels: 0 1 2

Or you can create the factor with specific labels for its levels.

fdata1 = factor(category,labels=c("A","B","C"))
fdata1
##  [1] A B B B B C C C B C B B B
## Levels: A B C
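To inspect a factor, levels() lists the categories and table() counts how often each one occurs. A short sketch using the same data as above:

```r
category <- c(0, 1, 1, 1, 1, 2, 2, 2, 1, 2, 1, 1, 1)
fdata1 <- factor(category, labels = c("A", "B", "C"))

levels(fdata1)   # "A" "B" "C"
nlevels(fdata1)  # 3
table(fdata1)    # counts per level: A 1, B 8, C 4
```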

You cannot treat the values in a factor as numerical data. For example, mean(fdata) returns NA with the warning below.

mean(fdata)
## Warning in mean.default(fdata): argument is not numeric or logical: returning NA
## [1] NA

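If we want to calculate the mean of the original numeric values of the fdata variable, we have to convert the level labels back to numbers first. Calling as.numeric(fdata) directly would return the internal level codes (1, 2, 3) rather than the original values (0, 1, 2), so we go through as.character():

mean(as.numeric(as.character(fdata)))
## [1] 1.230769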
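A point worth checking when converting a factor back to numbers: as.numeric() applied directly to a factor returns the integer level codes (1, 2, 3, ...), while going through as.character() first recovers the numbers that the labels spell out. A small sketch with the same data, where codes and values are illustrative names:

```r
category <- c(0, 1, 1, 1, 1, 2, 2, 2, 1, 2, 1, 1, 1)
fdata <- factor(category)

codes <- as.numeric(fdata)                 # level codes: 1 for "0", 2 for "1", 3 for "2"
values <- as.numeric(as.character(fdata))  # the numbers the labels spell out

mean(codes)     # 2.230769 (mean of the codes, shifted up by 1)
mean(values)    # 1.230769
mean(category)  # 1.230769, the same as mean(values)
```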

3.5 List

A list is a data structure that can hold elements of mixed data types. It is the most flexible data structure in R. A data frame is a special case of a list.

We can check the data type with the typeof() function, find the length with length(), and inspect the structure with str().

x <- list("a" = 1000, "b" = TRUE, "c" = 1:3)
typeof(x)
length(x)
str(x)
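Elements of a list are retrieved much like data frame columns: [[ or $ returns the element itself, while [ returns a smaller list. The sketch below uses lst_demo, an illustrative name, and also checks the statement that a data frame is a list.

```r
lst_demo <- list(a = 1000, b = TRUE, c = 1:3)

typeof(lst_demo)      # "list"
length(lst_demo)      # 3
lst_demo$a            # 1000
lst_demo[["c"]]       # 1 2 3
class(lst_demo["c"])  # "list" (a list of length 1)

is.list(data.frame(v = 1:2))  # TRUE
```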