Lab 3: Billboard Hot 100


Instructions

  • Please answer these questions using code.

  • The only R printouts should be the answers to the questions. Make sure your code does not display any extra information.

  • If the question is fully answered by the code, you do not need to also answer it in text form. For example, you do not need to write “The median weight of penguins is 4050”; you can just compute the median.

  • However, for more open-ended questions, you should still provide text to motivate and interpret your work.

  • Think about making your code readable. Use white space (spaces and new lines) carefully, use variable names that are clear, and be deliberate about how you use code chunks.


The Data

Today, we will study song popularity. In the US, the Billboard Hot 100 is a list that comes out every week, showing the 100 most played songs that week.

The following code will load a dataset of Billboard Hot 100 songs. More information about the creation of this dataset, as well as some analyses by the author, can be found here: https://mikekling.com/analyzing-the-billboard-hot-100/

The dataset you are provided is a limited version of the full data, containing:

  • The title

  • The artist

  • The highest rank the song ever reached (1 is the best)

  • The number of weeks the song was on the chart

  • The latest date the song appeared on the Billboard Hot 100

songs <- read.table("https://www.dropbox.com/s/jrwjthqo9b5o07g/billboard_songs.txt?dl=1", header = TRUE, stringsAsFactors = FALSE)

Advice

This is a very large dataset! Consider using a function like sample_n to create a small dataset with only, say, 200 of the rows. You can try all your code out on the smaller dataset first, and then only run the analysis of the full data after you have perfected everything.
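As a minimal sketch of this advice, the example below draws 200 random rows with sample_n from dplyr. It uses a made-up data frame, toy_songs, so that it runs without downloading anything; the column names here are placeholders, not the real ones. With the real data you would pass songs instead.

```r
library(dplyr)

# Toy stand-in for the full `songs` data frame (columns are placeholders).
toy_songs <- data.frame(
  title = paste("Song", 1:1000),
  weeks = rep(1:20, times = 50),
  stringsAsFactors = FALSE
)

set.seed(123)  # fix the seed so the same rows are drawn on every run

# Keep 200 randomly chosen rows to develop and debug code on.
songs_small <- sample_n(toy_songs, 200)

nrow(songs_small)
```

Setting a seed is optional, but it keeps your small dataset the same between runs, which makes debugging easier.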


Setup

Do any data cleaning you need.

Hint: You’ll want to create a datetime object for the date the song leaves the chart.
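One possible way to follow this hint is sketched below on a toy data frame. The column names chart.date and weeks.on.chart, and the YYYYMMDD date format, are assumptions for illustration only; check the real column names and format with str(songs) first. The entry-date line also assumes a song's weeks on the chart were consecutive, which may not hold for every song.

```r
# Toy data: column names and the YYYYMMDD format are assumptions.
toy_songs <- data.frame(
  title          = c("Song A", "Song B"),
  weeks.on.chart = c(3L, 1L),
  chart.date     = c(19960629L, 20010106L)
)

# Convert the numeric date to a Date object.
toy_songs$last_date <- as.Date(as.character(toy_songs$chart.date),
                               format = "%Y%m%d")

# Date arithmetic works in days. Assuming consecutive weeks, a song that
# charted for n weeks entered 7 * (n - 1) days before its last chart date.
toy_songs$entry_date <- toy_songs$last_date - 7 * (toy_songs$weeks.on.chart - 1)

toy_songs[, c("title", "entry_date", "last_date")]
```

If you prefer lubridate, ymd() parses the same kind of value and is a reasonable alternative to as.Date().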


Questions

  1. What 10 songs spent the longest on the charts? Give only the title, artist, and weeks.

  2. Find the oldest song(s) in this dataset, i.e., the earliest songs to enter the charts. What date did they leave the charts?

  3. What hit songs could have been played at your 10th birthday party? That is, which songs that eventually peaked at #1 entered the charts within two months (before or after) your 10th birthday? Give only the song title, artist, and date of chart entry.

  4. Which five artists had the most number 1 hits?

For this question, you may ignore songs with more than one artist listed.

  5. What is the most common word, at least 4 letters long, used in the title of any song? Give only the word itself, and its count.

Hint: “hello” and “Hello” are the same word!
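The hint can be illustrated on a few made-up titles: converting everything to lower case before counting makes “hello” and “Hello” land in the same group. This is only a sketch in base R; toy_titles is invented, and real titles may also contain punctuation that you would need to handle.

```r
# Invented titles, used only to show the effect of case normalization.
toy_titles <- c("Hello", "hello again", "Say Hello")

# Lower-case first, then split each title on white space.
words <- unlist(strsplit(tolower(toy_titles), "\\s+"))

# Keep words with at least 4 letters, then count and sort.
words <- words[nchar(words) >= 4]
word_counts <- sort(table(words), decreasing = TRUE)

word_counts[1]
```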

  6. Let’s take a look at artists who work together on songs. Which artists have featured on the most Billboard charting songs?

Hint: The functions separate() or str_split() might be useful to you.

Definitions:

RAE SREMMURD featuring NICKI MINAJ & YOUNG THUG

In this string, Nicki Minaj and Young Thug are considered to be featured.

JESSIE J, ARIANA GRANDE & NICKI MINAJ

In this string, Jessie J and Ariana Grande and Nicki Minaj all worked together (collaborated), but nobody was featured.
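The two definitions above can be sketched with str_split from stringr. The helper featured_artists is hypothetical and written only for illustration. It assumes the credit always uses the exact word “featuring” and that featured names are separated by “, ” or “ & ”; the real data may use other spellings, and an act whose own name contains “&” or a comma would be split incorrectly.

```r
library(stringr)

# Hypothetical helper: return the featured artists in one credit string.
featured_artists <- function(credit) {
  # Split once on " featuring "; everything after it is the featured part.
  parts <- str_split(credit, fixed(" featuring "), n = 2)[[1]]
  if (length(parts) < 2) {
    return(character(0))  # no "featuring", so nobody is featured
  }
  # Separate the individual featured names.
  str_split(parts[2], " & |, ")[[1]]
}

featured_artists("RAE SREMMURD featuring NICKI MINAJ & YOUNG THUG")
featured_artists("JESSIE J, ARIANA GRANDE & NICKI MINAJ")
```

The first call returns the two featured artists; the second returns an empty character vector, matching the definition that collaborators are not featured.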


Challenge

Choose a musical artist or band that has charted in at least 5 of the years in this dataset.

Make a visualization that summarizes their Billboard success over time.