Thursday, November 12, 2015

Recap with quick questions

Question 1

Give an example of each of the following types of statistical variable.

  • nominal variables
  • ordinal variables
  • interval variables
  • ratio variables

Answer 1

# nominal: gender (male vs. female)
# ordinal: ranking (very fast, fast, slow)
# interval: geographic longitude
# ratio: temperature in kelvin (zero is absolute zero)
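
As a rough sketch, the four types could be represented in R as follows. All values and variable names here are hypothetical and chosen only for illustration.

```r
# Hypothetical example data, not taken from the exercises
sex <- factor(c("male", "female", "female"))      # nominal: categories without order

speed <- factor(c("slow", "very fast", "fast"),
                levels = c("slow", "fast", "very fast"),
                ordered = TRUE)                    # ordinal: categories with an order

longitude <- c(-73.9, 2.35, 139.7)                 # interval: differences are meaningful, zero is arbitrary
height_ft <- c(70, 65, 63)                         # ratio: true zero, so ratios are meaningful

is.ordered(sex)        # FALSE
is.ordered(speed)      # TRUE
speed[1] < speed[2]    # TRUE: "slow" ranks below "very fast"
```

Comparisons such as `<` are only defined for the ordered factor; R returns NA with a warning if they are tried on a nominal one.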

Question 2

What is a z-score? What can it be used for?

Answer 2

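# a z-score is the number of standard deviations a value lies above or below the mean; converting values to z-scores standardizes the distribution
# z-scores can be used to find a value's percentile rank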
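
A minimal sketch of both points, using the Height column of the built-in trees data. The percentile rank from pnorm() assumes the data are roughly normal, which is an assumption and not something the recap states; ecdf() gives an empirical rank without that assumption.

```r
x <- trees$Height                      # built-in data set from the datasets package

z_manual <- (x - mean(x)) / sd(x)      # z = (value - mean) / standard deviation
z_scale  <- as.vector(scale(x))        # scale() returns a one-column matrix

all.equal(z_manual, z_scale)           # TRUE

# Percentile rank, assuming an approximately normal distribution
pct_normal <- pnorm(z_manual) * 100

# Empirical percentile rank, with no distributional assumption
pct_empirical <- ecdf(x)(x) * 100
```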

Question 3

In R, what are the function names for z-score, median, minimum and mode?

Answer 3

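# z-score: scale()
# median: median()
# minimum: min()

# mode: no built-in function; take the most frequent value from a frequency table
# sorted_freq <- sort(table(Height), decreasing = TRUE)
# max_freq <- sorted_freq[1]
# names(max_freq)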
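
The frequency-table steps above can be wrapped in a small helper. The name stat_mode is hypothetical, and unlike the snippet above this sketch returns every value tied for the highest count.

```r
# stat_mode is a hypothetical helper name.
# Base R's mode() is unrelated: it returns the storage mode of an object.
stat_mode <- function(x) {
  freq <- table(x)                                  # count each distinct value
  names(freq)[as.vector(freq) == max(freq)]         # values with the highest count, as strings
}

stat_mode(c(1, 2, 2, 3, 3))    # "2" "3"
stat_mode(trees$Height)
```

The result is a character vector because it comes from the names of the table; as.numeric() converts it back when the data are numeric.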

Question 4

What is the mean of a set of z-scores?

Answer 4

# 0
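
This can be checked quickly in R; the sketch below uses the Volume column of the built-in trees data. The result is zero only up to floating-point rounding.

```r
z <- as.vector(scale(trees$Volume))   # z-scores of Volume

mean(z)   # effectively 0, apart from rounding error
sd(z)     # 1: z-scores also have a standard deviation of 1
```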

Question 5

What does a positive z-score mean?

Answer 5

# a positive z-score means the value is above the mean
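
A short check of this, sketched with the Girth column of the built-in trees data: the values with a positive z-score are the same values that exceed the mean.

```r
x <- trees$Girth
z <- as.vector(scale(x))          # z-scores of Girth

all((z > 0) == (x > mean(x)))     # TRUE: positive z-score <=> value above the mean
x[z > 0]                          # the above-average girths
```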

Question 6

Why do we need to calculate variability?

Answer 6

# to find out how spread out the data are around the center
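
To illustrate why the mean alone is not enough, here is a small sketch with two made-up vectors (hypothetical values) that share the same mean but differ in spread.

```r
a <- c(9, 10, 10, 11)     # values close together
b <- c(0, 5, 15, 20)      # values far apart

mean(a); mean(b)          # both 10
var(a);  var(b)           # variance is much larger for b
sd(a);   sd(b)            # so is the standard deviation
range(a); range(b)        # minimum and maximum of each vector
```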

Exercises

Before we get started …

  • Load the datasets library
  • We will use the trees data set again
require(datasets)
str(trees)
## 'data.frame':    31 obs. of  3 variables:
##  $ Girth : num  8.3 8.6 8.8 10.5 10.7 10.8 11 11 11.1 11.2 ...
##  $ Height: num  70 65 63 72 81 83 66 75 80 75 ...
##  $ Volume: num  10.3 10.3 10.2 16.4 18.8 19.7 15.6 18.2 22.6 19.9 ...

Exercises

  • [a] What is the maximum value in the Height column?
  • [b] What are the median and the variance of the Volume column?
  • [c] From the Girth column, extract the values strictly greater than 12.0 and scale them.
  • [d] From the Volume column, extract the values less than or equal to 31.7 and scale them. Plot a histogram of the scaled values and add a vertical red line at x = 0.
  • [e] Which value in the Height column is the mode?

Answers

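attach(trees)  # make the columns available by name

# [a] maximum of Height
max(Height)

# [b] median and variance of Volume
median(Volume)
var(Volume)

# [c] scale the Girth values strictly greater than 12.0
scale(Girth[Girth > 12.0])

# [d] histogram of the scaled Volume values <= 31.7, with a red line at 0
v <- scale(Volume[Volume <= 31.7])
hist(v)
abline(v = 0, col = 'red')

# [e] mode of Height: the most frequent value
names(sort(-table(Height))[1])

detach(trees)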