Thursday, November 12, 2015

## Question 1

Give an example of each of the following types of statistical variable.

• nominal variables
• ordinal variables
• interval variables
• ratio variables

## Answer 1

# nominal: gender (male vs. female)
# ordinal: ranking (very fast, fast, slow)
# interval: geographic longitude
# ratio: temperature in kelvin (0 K is absolute zero, a true zero point)
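
The four measurement levels can be sketched in R as follows. This is only an illustration: the variable names and values are made up, and R itself distinguishes only unordered factors, ordered factors, and plain numbers, so the interval/ratio distinction lives in how you interpret the numbers.

```r
# Nominal: categories with no inherent order
gender <- factor(c("male", "female", "female", "male"))

# Ordinal: categories with a meaningful order but no fixed spacing
speed <- factor(c("slow", "very fast", "fast", "slow"),
                levels = c("slow", "fast", "very fast"),
                ordered = TRUE)

# Interval: differences are meaningful, but the zero point is arbitrary
longitude <- c(-122.4, 2.35, 139.7)

# Ratio: a true zero exists, so ratios are meaningful
kelvin <- c(150, 300)

is.ordered(gender)      # FALSE: nominal levels cannot be ranked
is.ordered(speed)       # TRUE
speed[2] > speed[3]     # TRUE: "very fast" ranks above "fast"
kelvin[2] / kelvin[1]   # 2: 300 K is twice as hot as 150 K
```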

## Question 2

What is z-score? What can it be used for?

## Answer 2

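# a z-score is the number of standard deviations a value lies above or below the mean: z = (x - mean) / sd
# converting values to z-scores standardizes a distribution, so values measured on different scales can be compared
# z-scores can be used to find percentile ranks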
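
As a minimal sketch of the definition, the z-scores of a made-up sample can be computed by hand and compared with `scale()`. The percentile step uses `pnorm()`, which assumes the data are approximately normally distributed; that assumption is mine, not part of the answer above.

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)   # made-up sample

# z-score by hand: distance from the mean in units of standard deviation
z <- (x - mean(x)) / sd(x)

# scale() returns a one-column matrix; flatten it to compare
z_scaled <- as.vector(scale(x))
all.equal(z, z_scaled)   # TRUE

# Under a normal distribution, z = 1 sits at roughly the 84th percentile
pnorm(1)
```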

## Question 3

In R, what are the function names for z-score, median, minimum and mode?

## Answer 3

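# scale()
# median()
# min()

# R has no built-in function for the statistical mode (mode() returns the storage type), so count frequencies instead:
# sorted_freq <- sort(table(Height), decreasing = TRUE)
# max_freq <- sorted_freq[1]
# names(max_freq)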

## Question 4

What is the mean of a z-score?

## Answer 4

# 0
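
This can be checked numerically. The sketch below uses the `Height` column of the built-in `trees` data set; any numeric vector with more than one distinct value would behave the same way.

```r
# Standardize the tree heights and flatten the one-column matrix
z <- as.vector(scale(trees$Height))

mean(z)   # effectively 0 (up to floating-point rounding)
sd(z)     # 1
```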

## Question 5

What is a positive z-score?

## Answer 5

# a positive z-score means the value is above the mean
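
A small illustration with made-up scores whose mean is 80: values above the mean get positive z-scores, values below it get negative ones, and the mean itself maps to 0.

```r
x <- c(60, 70, 80, 90, 100)   # made-up scores, mean is 80
z <- (x - mean(x)) / sd(x)

x[z > 0]   # 90 100: above the mean
x[z < 0]   # 60 70: below the mean
z[3]       # 0: exactly at the mean
```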

## Question 6

Why do we need to calculate variability?

## Answer 6

# to find out how spread out the data are around the center (widely scattered or tightly clustered)
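
To see why the mean alone is not enough, consider two made-up samples with the same mean but very different spread. Only the variance and standard deviation tell them apart.

```r
a <- c(48, 49, 50, 51, 52)   # tightly clustered
b <- c(10, 30, 50, 70, 90)   # widely scattered

mean(a)   # 50
mean(b)   # 50

var(a)    # 2.5
var(b)    # 1000

sd(a)     # about 1.58
sd(b)     # about 31.6
```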

## Before we get started …

• Load the datasets library
• We will use the trees data set again
require(datasets)
str(trees)
## 'data.frame':    31 obs. of  3 variables:
##  $ Girth : num  8.3 8.6 8.8 10.5 10.7 10.8 11 11 11.1 11.2 ...
##  $ Height: num  70 65 63 72 81 83 66 75 80 75 ...
##  $ Volume: num  10.3 10.3 10.2 16.4 18.8 19.7 15.6 18.2 22.6 19.9 ...

## Exercises

• [a] What is the maximum value in the Height column?
• [b] What are the median and the variance of the Volume column?
• [c] From the Girth column, extract the values strictly larger than 12.0 (12.0 itself excluded). Scale the extracted values.
• [d] From the Volume column, extract the values less than or equal to 31.7. Scale them, plot a histogram of the scaled values, and add a vertical red line at x = 0.
• [e] What is the mode of the Height column?

## Answers

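attach(trees)

# [a] maximum height
max(Height)

# [b] median and variance of volume
median(Volume)
var(Volume)

# [c] z-scores of the girth values strictly greater than 12.0
scale(Girth[Girth > 12.0])

# [d] histogram of the scaled volumes <= 31.7, with a red line at 0
v <- scale(Volume[Volume <= 31.7])
hist(v)
abline(v = 0, col = 'red')

# [e] mode of height: the most frequent value comes first after sorting
names(sort(-table(Height)))[1]

detach(trees)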
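
Because the mode is needed repeatedly and base R does not provide it, the frequency-table approach can be wrapped in a small helper. `stat_mode` is a hypothetical name chosen for this sketch, not a base R function. Unlike taking only the first sorted entry, it returns every value tied for the highest frequency.

```r
# stat_mode: hypothetical helper, returns all most-frequent values of x
stat_mode <- function(x) {
  freq <- table(x)
  modes <- names(freq)[freq == max(freq)]
  # table() keeps values as character names; convert back for numeric input
  if (is.numeric(x)) as.numeric(modes) else modes
}

stat_mode(trees$Height)       # mode of the Height column
stat_mode(c(1, 1, 2, 2, 3))   # a tie: returns 1 and 2
```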