Thursday, October 22, 2015

Before we get started …

  • Install the packages "datasets" and "data.table", and load them:
require(datasets)
require(data.table)
## Loading required package: data.table
## Warning: package 'data.table' was built under R version 3.1.3
  • We will use the "sleep" dataset in this lab session.
  • Before we get started, type the following:
dt <- data.table(sleep)

Recap with quick questions

Question 1

What are the differences between data.frame() and data.table()?

Answer 1

# - data.table can extract rows without a trailing comma, e.g. dt[2:3]
# - data.table allows several functions to be computed on the data in a single call
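
The two differences listed above can be tried side by side. This is a minimal sketch, assuming the data.table package is installed; the names `df`, `summary_dt`, `total`, and `average` are chosen here for illustration and are not part of the lab.

```r
library(data.table)

df <- data.frame(sleep)   # plain data.frame copy of the built-in dataset
dt <- data.table(sleep)   # data.table version, as in the setup step

# Row extraction: a data.frame needs the trailing comma, a data.table does not.
# (On a data.frame, df[2:3] without the comma selects columns 2 and 3 instead.)
df[2:3, ]
dt[2:3]

# Several summaries computed in one call inside j.
summary_dt <- dt[, .(total = sum(extra), average = mean(extra))]
summary_dt
```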

Question 2

Extract the second and third rows in "sleep". (How to do so using data.frame() and data.table()?)

Answer 2

# data.frame()
sleep[2:3,]
##   extra group ID
## 2  -1.6     1  2
## 3  -0.2     1  3
# data.table()
dt[2:3]
##    extra group ID
## 1:  -1.6     1  2
## 2:  -0.2     1  3

Question 3

Extract the second and third columns in "sleep". (How to do so using data.frame() and data.table()?)

Answer 3

# data.frame()
sleep[,2:3]

# data.table()
dt[,.(group,ID)]

Question 4

What is the sum of the first column in "sleep"? (How to do so using data.frame() and data.table()?)

Answer 4

# data.frame()
sum(sleep$extra)
## [1] 30.8
# data.table()
dt[, .(total = sum(extra))]
##    total
## 1:  30.8
dt[, sum(extra)]
## [1] 30.8

Question 5

Reorder the data "sleep" by the first column.

Answer 5

# data.frame()
sleep[order(sleep$extra),]
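
The answer above covers only the data.frame case. A possible data.table counterpart is sketched below, assuming `dt` is built from "sleep" as in the setup step; `dt_sorted` and `dt_copy` are illustrative names.

```r
library(data.table)

dt <- data.table(sleep)

# order() is evaluated inside the table, so the column can be named directly.
dt_sorted <- dt[order(extra)]
head(dt_sorted, 3)

# setorder() sorts by reference instead of returning a sorted copy,
# so it is applied to a copy here to leave dt unchanged.
dt_copy <- copy(dt)
setorder(dt_copy, extra)
```

The first form returns a new sorted table; the second modifies its argument in place, which may matter for large data.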

Question 6

Explore dataset "trees". How many rows and columns?

Answer 6

dim(trees)
## [1] 31  3
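
Beyond dim(), a few base R functions give a quick overview of a dataset. The calls below are one way to explore "trees", not the only one.

```r
str(trees)      # column names, types, and a preview of the values
head(trees)     # first six rows
summary(trees)  # per-column minimum, quartiles, mean, and maximum
nrow(trees)     # number of rows
ncol(trees)     # number of columns
```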

Question 7

In "trees", extract the rows that have Girth strictly greater than 12.0 and Height greater than or equal to 80.

Answer 7

attach(trees)
trees[Girth > 12.0 & Height >= 80, ]
##    Girth Height Volume
## 17  12.9     85   33.8
## 18  13.3     86   27.4
## 22  14.2     80   31.7
## 26  17.3     81   55.4
## 27  17.5     82   55.7
## 28  17.9     80   58.3
## 29  18.0     80   51.5
## 30  18.0     80   51.0
## 31  20.6     87   77.0
detach(trees)
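
The same rows can be selected without attach(), which avoids leaving the dataset on the search path if detach() is forgotten. This is a sketch of two equivalent alternatives; `tall_thick` and `tall_thick2` are illustrative names.

```r
# subset() evaluates the condition inside the data frame.
tall_thick <- subset(trees, Girth > 12.0 & Height >= 80)

# Equivalent explicit form using the $ operator.
tall_thick2 <- trees[trees$Girth > 12.0 & trees$Height >= 80, ]

tall_thick
```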

Exercises

Before we get started …

We will use the "trees" dataset in this exercise.

  • What is the maximum value of Height?
  • Sort the data "trees" by Volume.
  • Create a new column Average_Height, and fill it with the average of Height.
  • Extract rows that have Volume not equal to 10.3.
  • Create a new column Factor_Height by converting the numeric Height into a factor. Label values of 75 or lower as level "1", values from 76 to 80 as level "2", and values of 81 or higher as level "3".
  • Count the occurrences of Volume by Factor_Height.
  • So, what is a factor? What can we do with it?

  • For the sixth exercise (counting the occurrences of Volume by Factor_Height), the final output should look like the following:
  • The final output of your dataframe should look like the following:

Answers

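attach(trees)
# Maximum value of Height
max(Height)

# Sort the rows by Volume
trees[order(Volume), ]

# New column holding the average height (same value in every row)
trees$Average_Height <- mean(Height)

# Rows whose Volume is not equal to 10.3
trees[Volume != 10.3, ]

# Convert Height to a factor, then merge its 21 distinct values into three levels
factor_height <- factor(Height)
levels(factor_height) <- c(rep(1, 10), rep(2, 5), rep(3, 6))
trees$Factor_Height <- factor_height

# Cross-tabulate Volume against Factor_Height
table(Volume, trees$Factor_Height)
detach(trees)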

## A factor stores categorical data. It can be used to group, count, and summarize observations by category.
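
The level assignment in the answer depends on Height having exactly 21 distinct values in a known order. As an alternative sketch, base R's cut() bins the values by explicit break points instead. This swaps in a different technique from the one used in the answer; `trees_copy` is an illustrative name, and the breaks assume the level boundaries of 75 and 80 from the exercise.

```r
trees_copy <- trees   # work on a copy so the built-in dataset is untouched

# Bin Height into three levels: up to 75, above 75 up to 80, above 80.
# With the default right = TRUE, each interval includes its upper bound.
trees_copy$Factor_Height <- cut(
  trees_copy$Height,
  breaks = c(-Inf, 75, 80, Inf),
  labels = c("1", "2", "3")
)

# Number of trees in each height level
table(trees_copy$Factor_Height)
```

Because the bins are defined by value rather than by position, this form keeps working if rows are added or removed.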