The Tidyverse and dplyr


In this lesson, you’ll learn how to wrangle data using the dplyr package in the tidyverse.

When you are finished, you should be able to…


Time Estimates:
     Videos: 40 min
     Readings: 0-60 min
     Activities: 20 min
     Check-ins: 1



What is the Tidyverse?


Required Video: Intro to the Tidyverse




Optional Video: The beginning of the word ‘tidyverse’



Wrangling data with dplyr


Required Reading: Tibbles
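
As a small companion to the reading, the sketch below contrasts single-column subsetting on a base data frame and on a tibble. The toy data and the names `df` and `tb` are invented for this illustration, and the code assumes the tibble package is installed.

```r
library(tibble)

# Toy data, made up for illustration only
df <- data.frame(species = c("A", "A", "B"), mass = c(10, 12, 20))
tb <- as_tibble(df)

# A base data frame drops to a plain vector when one column is taken with [ , ]
class(df[, "mass"])

# A tibble stays a tibble under [ , ]
class(tb[, "mass"])

# Use [[ ]] (or $) to get the underlying vector from either one
tb[["mass"]]
```

This difference in object type is worth keeping in mind whenever a result is passed to a function that expects a vector rather than a table.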


Required Video: dplyr



See the slides.     
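
For a quick reference alongside the video, here is a minimal sketch of the main dplyr verbs, one at a time. It uses the built-in `mtcars` data as a stand-in; the object names (`cars_tbl`, `four_cyl`, and so on) are hypothetical and not taken from the course materials.

```r
library(dplyr)

# Convert the built-in mtcars data to a tibble, keeping car names as a column
cars_tbl <- tibble::as_tibble(mtcars, rownames = "model")

# filter(): keep rows that meet a condition
four_cyl <- cars_tbl %>% filter(cyl == 4)

# select(): keep columns by name
slim <- cars_tbl %>% select(model, mpg, cyl)

# mutate(): add a new column computed from existing ones
with_ratio <- cars_tbl %>% mutate(hp_per_wt = hp / wt)

# arrange(): sort rows, here from highest to lowest mpg
by_mpg <- cars_tbl %>% arrange(desc(mpg))

# group_by() + summarize(): collapse each group to a single row
mpg_by_cyl <- cars_tbl %>%
  group_by(cyl) %>%
  summarize(avg_mpg = mean(mpg))
```

Each verb takes a data frame as its first argument and returns a data frame, which is what lets the steps be chained with `%>%`.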


Recommended Reading: Data Wrangling


Recommended Reading: Data Transformation


Recommended Tutorial: Practice with Dplyr




Check-In 1: dplyr


Question 1: Suppose we would like to study how the ratio of penguin body mass to flipper length differs across species. Rearrange the following steps into a pipeline order that accomplishes this goal.

# a
arrange(avg_mass_flipper_ratio)

# b
group_by(species)

# c
penguins

# d
summarize(
  avg_mass_flipper_ratio = median(mass_flipper_ratio)
)

# e
mutate(
  mass_flipper_ratio = body_mass_g / flipper_length_mm
)

Question 2:

Consider the base R code below.

mean(penguins[penguins$species == "Adelie", "body_mass_g"])

For each of the following dplyr pipelines, indicate whether it

  1. Returns the exact same thing as the base R code;
  2. Returns the correct information, but the wrong object type;
  3. Returns incorrect information; or
  4. Returns an error.

# a
penguins %>%
  filter("body_mass_g") %>%
  pull("Adelie") %>%
  mean()


# b
penguins %>%
  filter(species == "Adelie") %>%
  select(body_mass_g) %>%
  summarize(mean(body_mass_g))


# c
penguins %>%
  pull(body_mass_g) %>%
  filter(species == "Adelie") %>%
  mean()

# d
penguins %>%
  filter(species == "Adelie") %>%
  select(body_mass_g) %>%
  mean()

# e
penguins %>%
  filter(species == "Adelie") %>%
  pull(body_mass_g) %>%
  mean()

# f
penguins %>%
  select(species == "Adelie") %>%
  filter(body_mass_g) %>%
  summarize(mean(body_mass_g))

Walkthrough of cereals activity


Optional Video: Live coding of cereals dataset