Thursday, October 29, 2015

Quick Recap

  • What are the differences? (dataset: trees)
trees[order(trees$Height),]

library(data.table)
dt <- data.table(trees)
dt[, .(Girth, Volume), by=Height]
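
To make the comparison concrete, here is a minimal sketch, assuming the data.table package is installed (the second half is skipped if it is not). The names sorted and grouped are illustrative only. order() returns a row permutation, so the rows come back sorted by Height; by= groups the rows, and the groups appear in order of first appearance rather than sorted.

```r
# Base R: rows of trees sorted by increasing Height
sorted <- trees[order(trees$Height), ]
head(sorted$Height)

# data.table: rows grouped by Height; groups keep their order of first appearance
if (requireNamespace("data.table", quietly = TRUE)) {
  dt <- data.table::data.table(trees)
  grouped <- dt[, .(Girth, Volume), by = Height]
  head(grouped$Height)
}
```

If sorted groups are wanted from data.table, keyby=Height is the usual alternative to by=Height.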

Before we get started …

  • load the package "datasets" (it ships with R, so there is nothing to install)
require(datasets)
  • We will use "warpbreaks" in this lab session
str(warpbreaks)
## 'data.frame':    54 obs. of  3 variables:
##  $ breaks : num  26 30 54 25 70 52 51 26 67 18 ...
##  $ wool   : Factor w/ 2 levels "A","B": 1 1 1 1 1 1 1 1 1 1 ...
##  $ tension: Factor w/ 3 levels "L","M","H": 1 1 1 1 1 1 1 1 1 2 ...

Recap with quick questions

Question 1

Count the occurrences of each value of breaks, and assign the result to a variable, freq.

Answer 1

freq <- table(warpbreaks[,1]); freq
## 
## 10 12 13 14 15 16 17 18 19 20 21 24 25 26 27 28 29 30 31 35 36 39 41 42 43 
##  1  1  1  1  3  2  2  3  2  2  4  2  1  4  1  3  4  2  1  1  2  2  1  1  1 
## 44 51 52 54 67 70 
##  1  1  1  1  1  1

Question 2

Based on Question 1, compute the proportion of each value of breaks, and assign the output to a variable, percent.

Answer 2

percent <- freq/sum(freq)
percent
## 
##         10         12         13         14         15         16 
## 0.01851852 0.01851852 0.01851852 0.01851852 0.05555556 0.03703704 
##         17         18         19         20         21         24 
## 0.03703704 0.05555556 0.03703704 0.03703704 0.07407407 0.03703704 
##         25         26         27         28         29         30 
## 0.01851852 0.07407407 0.01851852 0.05555556 0.07407407 0.03703704 
##         31         35         36         39         41         42 
## 0.01851852 0.01851852 0.03703704 0.03703704 0.01851852 0.01851852 
##         43         44         51         52         54         67 
## 0.01851852 0.01851852 0.01851852 0.01851852 0.01851852 0.01851852 
##         70 
## 0.01851852
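
A quick way to sanity-check a table of proportions is that its entries sum to 1. The sketch below uses base R's prop.table(), which divides each count by the total count; it is one possible way to compute the proportions, not the only one.

```r
# Counts of each distinct value of breaks
freq <- table(warpbreaks$breaks)

# prop.table() divides every count by the total number of observations
percent <- prop.table(freq)

# Sanity checks: the proportions sum to 1 and equal count / number of rows
sum(percent)
all.equal(as.vector(percent), as.vector(freq) / nrow(warpbreaks))
```

Dividing by the sum of the break values themselves, rather than by the number of observations, would give numbers that do not sum to 1.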

Question 3

Draw a histogram for freq, and draw a red line on the histogram based on its density.

Answer 3

# freq=FALSE plots densities instead of counts, so the curve shares the y-axis scale
hist(freq, freq=FALSE)
lines(density(freq), col='red')

Question 4

Draw a pie chart for percent.

Answer 4

pie(percent)

Question 5

What can a boxplot be used for?

Answer 5

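# a boxplot summarizes the distribution of the data:
# the median, the lower and upper quartiles, whiskers reaching
# the smallest and largest non-outlying values, and any outliers.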
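
The numbers behind a boxplot can be inspected directly. A minimal sketch using base R's fivenum() and boxplot.stats() on the breaks column; the names x and bs are illustrative.

```r
x <- warpbreaks$breaks

# Tukey's five-number summary: minimum, lower hinge, median, upper hinge, maximum
fivenum(x)

# What boxplot() actually draws
bs <- boxplot.stats(x)
bs$stats   # lower whisker, lower hinge, median, upper hinge, upper whisker
bs$out     # points beyond the whiskers, drawn individually as outliers
```

The whiskers in bs$stats stop at the most extreme points within 1.5 times the box length of the hinges (the default coef), which is why they can differ from the raw minimum and maximum.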

Question 6

Draw a boxplot on breaks, and rotate the y-axis labels 90 degrees clockwise.

Answer 6

boxplot(warpbreaks[,1], las=1)
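
To see what las changes, the two settings can be drawn side by side. This is a small sketch, assuming a graphics device is available: las=0 (the default) keeps tick labels parallel to their axis, while las=1 makes them always horizontal.

```r
op <- par(mfrow = c(1, 2))   # two plots side by side; op stores the old setting
boxplot(warpbreaks$breaks, las = 0, main = 'las = 0 (default)')
boxplot(warpbreaks$breaks, las = 1, main = 'las = 1')
par(op)                      # restore the previous layout
```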

Question 7

Draw a bar chart for the number of breaks with type A wool, and give it the title Type A wool.

Answer 7

attach(warpbreaks)
barplot(breaks[wool=='A'], main='Type A wool')

detach(warpbreaks)
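
attach() leaves the data frame on the search path until detach() runs, so a forgotten detach() can cause name clashes later. One possible alternative is with(), which evaluates a single expression with the columns in scope. The name breaks_a below is illustrative.

```r
# Evaluate the subsetting expression inside warpbreaks; no attach()/detach() needed
breaks_a <- with(warpbreaks, breaks[wool == 'A'])

# Same bar chart as in Answer 7
barplot(breaks_a, main = 'Type A wool')
```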

Exercises

Before we get started …

We will use the "anorexia" dataset in this exercise.

require(MASS)
## Loading required package: MASS
## Warning: package 'MASS' was built under R version 3.1.3
str(anorexia)
## 'data.frame':    72 obs. of  3 variables:
##  $ Treat : Factor w/ 3 levels "CBT","Cont","FT": 2 2 2 2 2 2 2 2 2 2 ...
##  $ Prewt : num  80.7 89.4 91.8 74 78.1 88.3 87.3 75.1 80.6 78.4 ...
##  $ Postwt: num  80.2 80.1 86.4 86.3 76.1 78.1 75.1 86.7 73.5 84.6 ...

Exercises

  • [a] What is the difference between '=' and '=='?
  • [b] Draw a histogram for Postwt, and rename the x-axis label as Weight of patient after study period, in lbs.
  • [c] Draw a boxplot for Prewt based on Treat.
  • [d] Draw a horizontal barplot for Postwt with Treat labeled as "FT".
  • [e] Make a contingency table for Prewt based on Treat, and draw a bar plot.
  • [f] Draw a pie chart for Treat (remember to use proportions), and create a legend at the top right of the graph. Assign the three colors 'red', 'yellow', and 'green' to the labels.
  • [g] Draw a box around the above pie chart.

Answers

# [a] '=' assigns a value to a variable (or names a function argument);
# whereas '==' tests whether two values are equal.
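
A short illustration of the difference; the variables x and v are made up for this example.

```r
x = 5             # assignment: x now holds 5 (same effect as x <- 5 here)
x == 5            # comparison: TRUE
x == 6            # comparison: FALSE

v <- c(1, 2, 3)
v == 2            # compares element by element: FALSE TRUE FALSE
v[v == 2]         # the logical result can be used to subset: 2
```

This is the pattern behind expressions such as breaks[wool=='A'], where '==' builds the logical index and '=' is only used to pass arguments.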

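attach(anorexia)
# [b] histogram with a custom x-axis label
hist(Postwt, xlab='Weight of patient after study period, in lbs')
# [c] one box per treatment group
boxplot(Prewt~Treat)
# [d] horizontal bars for the FT group only
barplot(Postwt[Treat=='FT'], horiz=TRUE)
# [e] contingency table of Prewt by Treat, drawn as stacked bars
barplot(table(Prewt, Treat))
# [f] pie chart of treatment proportions, with a matching legend
cols <- c('red','yellow','green')
pie(table(Treat)/length(Treat), col=cols)
legend('topright',c('CBT','Cont','FT'), fill=cols)
# [g] box around the pie chart
box()
detach(anorexia)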