In this module, you’ll review a few basic hypothesis tests and learn how to make R do the calculations for you.

If your memory of hypothesis testing is fresh, you may be able to skip the review parts of these sections. You are only expected to know the basic idea behind each test, not every detail.

Videos: 10-45 min

Readings: 30 min

Activities: 30 min

Check-ins: 2

A **t-test** is used when your hypotheses involve **one or two mean values**, such as

\[ H_0: \mu_1 = \mu_2 \]
\[ H_a: \mu_1 > \mu_2 \]
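The alternative hypothesis above is one-sided (it uses \(>\)). Below is a minimal sketch of how to request that direction from base R's `t.test()` through its `alternative` argument. The data (`mpg` by transmission type `am` in the built-in `mtcars` data set) is our own choice of example and does not come from this module.

```r
# mtcars ships with base R; am == 1 is manual, am == 0 is automatic
manual    <- mtcars$mpg[mtcars$am == 1]
automatic <- mtcars$mpg[mtcars$am == 0]

# H0: mu_manual = mu_automatic  vs  Ha: mu_manual > mu_automatic
one_sided <- t.test(manual, automatic, alternative = "greater")

# The default is alternative = "two.sided", shown for comparison
two_sided <- t.test(manual, automatic)

c(one_sided = one_sided$p.value, two_sided = two_sided$p.value)
```

Here the observed difference points the same way as \(H_a\), so the one-sided p-value is half of the two-sided one. The order of the two vectors sets which mean plays the role of \(\mu_1\).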

Functions: `t.test()` in base R, or `t_test()` in the `infer` package.

```
##
## Welch Two Sample t-test
##
## data: bill_length_mm by sex
## t = -6.6725, df = 329.29, p-value = 1.066e-10
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -4.865676 -2.649908
## sample estimates:
## mean in group female mean in group male
## 42.09697 45.85476
```

```
## Warning: The statistic is based on a difference or ratio; by default, for
## difference-based statistics, the explanatory variable is subtracted in the
## order "female" - "male", or divided in the order "female" / "male" for ratio-
## based statistics. To specify this order yourself, supply `order = c("female",
## "male")`.
```

```
## # A tibble: 1 x 6
## statistic t_df p_value alternative lower_ci upper_ci
## <dbl> <dbl> <dbl> <chr> <dbl> <dbl>
## 1 -6.67 329. 1.07e-10 two.sided -4.87 -2.65
```
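The code that produced the two outputs above is not shown. Judging from the `data: bill_length_mm by sex` line and the group means, it was probably run on the `penguins` data from the palmerpenguins package. That is an assumption. Under it, the calls might look like this sketch:

```r
# Assumption: the output above came from the penguins data in the
# palmerpenguins package (the original code chunk is not shown).
library(palmerpenguins)
library(infer)

# Base R: formula `response ~ group`; rows with a missing value in either
# variable are dropped. Welch's test (var.equal = FALSE) is the default.
welch <- t.test(bill_length_mm ~ sex, data = palmerpenguins::penguins)
welch

# infer: the same test, returned as a one-row tibble. Supplying `order`
# fixes the subtraction as female - male, which should avoid the warning.
t_test(
  palmerpenguins::penguins,
  bill_length_mm ~ sex,
  order = c("female", "male")
)
```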

State (in words) the null and alternative hypotheses for the test whose output appears above.

State your conclusion.

A **Chi-Square** test is used when your hypotheses involve counts or percents.

Similarly, one option is `chisq.test()` in base R, which needs a *two-way table* as input.

The other option is `chisq_test()` in `infer`, which takes a data frame and variables as input. Be careful, though: the variables must be categorical for a chi-squared test to be appropriate.

```
## Warning in chisq.test(my_table[, -1]): Chi-squared approximation may be
## incorrect
```

```
##
## Pearson's Chi-squared test
##
## data: my_table[, -1]
## X-squared = 18.036, df = 4, p-value = 0.001214
```
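The statistic, degrees of freedom, and p-value above match the `infer` output for `cyl` and `gear` in `mtcars` later in this section. So we assume, though the chunk isn't shown, that `my_table` held those counts. Under that assumption, a base R sketch can build the two-way table with `table()` and pass it directly to `chisq.test()`. Printing `$expected` shows where the "approximation may be incorrect" warning comes from: `chisq.test()` issues it when any expected count is below 5.

```r
# mtcars ships with base R. table() builds the two-way table of counts:
# rows are cyl (4, 6, 8), columns are gear (3, 4, 5).
counts <- table(cyl = mtcars$cyl, gear = mtcars$gear)
counts

res <- chisq.test(counts)  # warns: some expected counts are below 5
res

# Expected counts under independence; cells below 5 trigger the warning
res$expected
```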

```
library(dplyr)  # for %>% and mutate()
library(infer)  # for chisq_test()

mtcars %>%
  mutate(
    cyl = factor(cyl),
    gear = factor(gear)
  ) %>%
  chisq_test(
    response = gear,
    explanatory = cyl
  )
```

```
## Warning in stats::chisq.test(table(x), ...): Chi-squared approximation may be
## incorrect
```

```
## # A tibble: 1 x 3
## statistic chisq_df p_value
## <dbl> <int> <dbl>
## 1 18.0 4 0.00121
```

Why did we include the `[,-1]` in the first code chunk?

Why did we include the `mutate()` step in the second code chunk?

What happens if you swap the response and explanatory variables in the second code chunk?

What do you conclude from this test?

The tests above, and others like them, assume a *distribution* for your test statistic.

We assume that a sample mean (or a difference of sample means) is approximately Normal because of the Central Limit Theorem. After standardizing with an estimated standard error, the test statistic approximately follows a t distribution. There is also underlying math showing that the test statistic for the Chi-Square test has (you guessed it!) an approximately Chi-Square distribution.
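To make "assume a distribution" concrete, here is a base R sketch. It computes the Welch t statistic by hand and reads its p-value off a t distribution with `pt()`. It then reads the chi-square p-value off a chi-square distribution with `pchisq()`. The `mtcars` example is our own choice, not the module's. If the by-hand numbers agree with `t.test()` and `chisq.test()`, that shows those functions rely on the same distributional assumptions.

```r
# Our own example: mpg by transmission type (am) in mtcars
x <- mtcars$mpg[mtcars$am == 0]  # automatic
y <- mtcars$mpg[mtcars$am == 1]  # manual

# Welch t statistic: difference in means over its estimated standard error
vx <- var(x) / length(x)
vy <- var(y) / length(y)
t_stat <- (mean(x) - mean(y)) / sqrt(vx + vy)

# Welch-Satterthwaite degrees of freedom
df_welch <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))

# Assumed distribution: t with df_welch degrees of freedom (both tails)
p_t <- 2 * pt(-abs(t_stat), df = df_welch)
welch <- t.test(x, y)
c(by_hand = p_t, t.test = welch$p.value)

# Chi-square: the p-value is the upper-tail area of a chi-square distribution
chi <- suppressWarnings(chisq.test(table(mtcars$cyl, mtcars$gear)))
p_chi <- pchisq(unname(chi$statistic), df = unname(chi$parameter),
                lower.tail = FALSE)
c(by_hand = p_chi, chisq.test = chi$p.value)
```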

These are called **parametric tests**.

However, sometimes we aren’t confident that the assumptions behind a distribution are met, or we are interested in a test statistic whose distribution is not easy to derive. In these cases, we might want to use a **nonparametric test**.

(Bootstrapping is a form of *nonparametric analysis*!)
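Bootstrapping is nonparametric because it resamples the observed data instead of assuming a distribution for the statistic. Below is a minimal base R sketch of a 95% percentile bootstrap interval for a difference in means. The `mtcars` example, the seed, and the 5000 resamples are all arbitrary choices on our part.

```r
set.seed(331)  # arbitrary seed, for reproducibility

x <- mtcars$mpg[mtcars$am == 0]  # automatic
y <- mtcars$mpg[mtcars$am == 1]  # manual

# Resample each group with replacement and recompute the difference in means
boot_diffs <- replicate(5000, {
  mean(sample(x, replace = TRUE)) - mean(sample(y, replace = TRUE))
})

# 95% percentile interval: the middle 95% of the bootstrap differences
ci <- quantile(boot_diffs, c(0.025, 0.975))
ci
```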

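The **permutation test** relies on randomly reshuffling the data (for example, the group labels) to determine how “extreme” the observed result is compared with what we would expect if the null hypothesis were true.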

(Stop at 8 minutes in.)
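Here is a base R sketch of the idea behind a permutation test. We assume a two-sided test for a difference in means, again using `mpg` by `am` in `mtcars` as our own example. If the null hypothesis is true, the group labels are interchangeable. So we shuffle them many times and ask how often a shuffled difference is at least as extreme as the one we observed. The helper `diff_means()` is hypothetical, written just for this sketch.

```r
set.seed(331)  # arbitrary seed, for reproducibility

# Hypothetical helper: mean of group 1 minus mean of group 0
diff_means <- function(values, groups) {
  mean(values[groups == 1]) - mean(values[groups == 0])
}
observed <- diff_means(mtcars$mpg, mtcars$am)  # manual minus automatic

# sample() with no other arguments permutes the labels (no replacement)
perm_diffs <- replicate(5000, diff_means(mtcars$mpg, sample(mtcars$am)))

# Two-sided p-value: share of shuffles at least as extreme as observed
p_value <- mean(abs(perm_diffs) >= abs(observed))
p_value
```

Unlike the bootstrap sketch, this samples without replacement. Each shuffle keeps the same mpg values and the same group sizes, and only breaks any link between them.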