Practice: Grammar of Graphics

Instructions and Data

We recommend that you use a new Quarto document to write the commands associated with this set of practice problems.

# required package
library(tidyverse)

ggplot2 cheat sheet. While working on the practice problems, keep the ggplot2 cheat sheet at hand:

https://posit.co/wp-content/uploads/2022/10/data-visualization-1.pdf

Iris data set

We are going to use the famous iris data set, which is a built-in data frame in R. This data set contains petal and sepal measurements of iris flowers from three different species (see image below).

# iris data set: first 5 rows
head(iris, n = 5)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
# frequency of Species
count(iris, Species)
     Species  n
1     setosa 50
2 versicolor 50
3  virginica 50

1) Mappings and Geometries (Fill in the blanks)

In your Quarto document, create code chunks and complete the commands to obtain the following graphics (use one chunk per graphic!).

  1. Histogram of Sepal.Length
# a) histogram of Sepal.Length
ggplot(data = ______,
       mapping = aes(x = ______________)) +
  geom____________()
Show answer
ggplot(data = iris,
       mapping = aes(x = Sepal.Length)) +
  geom_histogram()
  1. Density plot of Sepal.Length
# b) density plot of Sepal.Length
ggplot(data = ______,
       mapping = aes(x = ______________)) +
  geom____________()
Show answer
ggplot(data = iris,
       mapping = aes(x = Sepal.Length)) +
  geom_density()
  1. Violin plots of Sepal.Length (x) by Species (y)
# c) violin plots of Sepal.Length (x) by Species (y)
ggplot(data = ______,
       mapping = aes(x = ____________, y = ___________)) +
  geom____________()
Show answer
ggplot(data = iris,
       mapping = aes(x = Sepal.Length, y = Species)) +
  geom_violin()
  1. Scatter plot of Sepal.Length (x) and Sepal.Width (y)
# d) scatterplot of Sepal.Length (x) and Sepal.Width (y)
ggplot(data = ______,
       mapping = aes(x = ______________, y = ______________)) +
  geom____________()
Show answer
ggplot(data = iris,
       mapping = aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point()
  1. Scatter plot of Sepal.Length (x) and Sepal.Width (y) coloring (color) points by Species.
# e) scatter plot of Sepal.Length (x) and Sepal.Width (y)
# coloring (color) points by Species
ggplot(data = ______,
       mapping = aes(___ = _____________, 
                     ___ = _____________,
                     _____ = _____________)) +
  geom____________()
Show answer
ggplot(data = iris,
       mapping = aes(x = Sepal.Length, 
                     y = Sepal.Width,
                     color = Species)) +
  geom_point()
  1. Boxplots of Sepal.Length (x) by Species (y)
# f) boxplots of Sepal.Length (x) by Species (y)
ggplot(data = ______,
       mapping = aes(x = ______________, y = ______________)) +
  geom_____________()
Show answer
ggplot(data = iris,
       mapping = aes(x = Sepal.Length, y = Species)) +
  geom_boxplot()
  1. Boxplots of Sepal.Length (y) by Species (x)
# g) boxplots of Sepal.Length (y) by Species (x)
ggplot(data = ______,
       mapping = aes(x = ______________, y = ______________)) +
  geom_____________()
Show answer
ggplot(data = iris,
       mapping = aes(x = Species, y = Sepal.Length)) +
  geom_boxplot()
  1. Density plots of Sepal.Length, color-filled (fill) by Species
# h) density plots of Sepal.Length, color-filled (fill) by Species
ggplot(data = ______,
       mapping = aes(x = ______________,
                     fill = ____________)) +
  geom___________()
Show answer
ggplot(data = iris,
       mapping = aes(x = Sepal.Length, fill = Species)) +
  geom_density()

2) Settings -vs- Mappings

In the Grammar of Graphics, it is important to understand the difference between a mapping and a setting. Recall that a setting is when you set or fix the value of a visual attribute to a constant or a value that does NOT come from the data frame.
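
As a minimal sketch of this distinction (it uses the petal measurements, so the exercises below are left for you), the first plot maps color to the Species column inside aes(), while the second sets color to a constant outside aes(). The object names p_mapping and p_setting, and the color "purple", are only illustrative choices.

```r
library(ggplot2)

# mapping: color comes from a column of the data frame (inside aes)
p_mapping <- ggplot(data = iris,
                    mapping = aes(x = Petal.Length,
                                  y = Petal.Width,
                                  color = Species)) +
  geom_point()

# setting: color is fixed to a constant value (outside aes)
p_setting <- ggplot(data = iris,
                    mapping = aes(x = Petal.Length,
                                  y = Petal.Width)) +
  geom_point(color = "purple")

# display both plots
p_mapping
p_setting
```

The mapped version gets one color per species and a legend; the set version draws every point in the same color and has no legend.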

  1. Histogram of Sepal.Length, filling the bars in "orange" (fill) and changing the color of the bar borders to "white" (color)
# a) histogram of Sepal.Length, filling the bars in "orange" (fill)
# and changing the color of the bar borders to "white" (color)
ggplot(data = ______,
       mapping = aes(x = ______________)) +
  geom____________(fill = _______, color = _________)
Show answer
ggplot(data = iris,
       mapping = aes(x = Sepal.Length)) +
  geom_histogram(fill = "orange", color = "white")
  1. Scatter plot of Sepal.Length (x) and Sepal.Width (y), coloring points in "red" (color) and changing the size of points to 3 (size)
# b) scatter plot of Sepal.Length (x) and Sepal.Width (y)
# coloring points in "red" (color) 
# change size of points to 3 (size)
ggplot(data = ______,
       mapping = aes(___ = _____________, 
                     ___ = _____________)) +
  geom__________(color = ________, size = ___)
Show answer
ggplot(data = iris,
       mapping = aes(x = Sepal.Length, 
                     y = Sepal.Width)) +
  geom_point(color = "red", size = 3)
  1. Bar plot of Species from a random sample of 40 flowers, filling the bars in "turquoise" (fill).
# c) bar plot of Species from a random sample of 40 flowers
# filling color of bars to "turquoise"
set.seed(246)
iris_sample = slice_sample(iris, n = 40, replace = TRUE)

ggplot(data = ____________,
       mapping = aes(x = _________)) +
  geom__________(fill = ________)
Show answer
# random sample of 40 flowers
set.seed(246)
iris_sample = slice_sample(iris, n = 40, replace = TRUE)

ggplot(data = iris_sample,
       mapping = aes(x = Species)) +
  geom_bar(fill = "turquoise")

3) Labels, Annotations and Themes

  1. Choose two numerical variables from iris and graph a scatter plot, set the color of points to "blue", and add the following:
  • title
  • x-axis label
  • y-axis label
# a) scatterplot, set color of points to "blue"
# adding title and axis labels
ggplot(data = ______,
       mapping = aes(___ = _____________, 
                     ___ = _____________)) +
  geom____________(color = _____) +
  labs(title = __________,
       x = __________,
       y = __________)
Show answer
ggplot(data = iris,
       mapping = aes(x = Sepal.Length,
                     y = Sepal.Width)) +
  geom_point(color = "blue") +
  labs(title = "Relationship between Sepal Length and Sepal Width",
       x = "Sepal Length",
       y = "Sepal Width")
  1. Choose a different pair of numerical variables from iris and graph another scatter plot, color coding points by Species, and add a text annotation to highlight something interesting or unusual in the plot.
# b) scatterplot, coloring (color) points by Species,
# adding an annotation
ggplot(data = ______,
       mapping = aes(___ = _____________,
                     ___ = _____________,
                     _____ = _____________)) +
  geom____________() +
  annotate(geom = "text",
           x = __________,
           y = __________,
           label = ___________)
Show answer
ggplot(data = iris,
       mapping = aes(x = Petal.Length,
                     y = Sepal.Width,
                     color = Species)) +
  geom_point() +
  annotate(geom = "text",
           x = 1.5,
           y = 2.2,
           label = "a tiny setosa")
  1. Choose one of your previous two scatter plots and re-graph it, this time using a ggplot theme that is different from the default one. Also, add labels and the annotation.
# c) yet another scatterplot
ggplot(data = ______,
       mapping = aes(________)) +
  geom________() +
  labs(_____) +
  annotate(_____) +
  theme________()
Show answer
ggplot(data = iris,
       mapping = aes(x = Petal.Length,
                     y = Sepal.Width,
                     color = Species)) +
  geom_point() +
  labs(title = "Relationship between Petal Length and Sepal Width",
       x = "Petal Length",
       y = "Sepal Width") +
  annotate(geom = "text",
           x = 1.5,
           y = 2.2,
           label = "a tiny setosa") +
  theme_minimal()

4) More Questions

  1. Are the following lines of code equivalent (i.e. give you the same plot)?
ggplot(iris, aes(x = Petal.Length, y = Petal.Width, color = Species)) + 
  geom_point()

ggplot(iris, aes(x = Petal.Length, y = Petal.Width)) + 
  geom_point(aes(color = Species))
Show answer
# Answer: True
# (a mapping given in ggplot() is inherited by geom_point())
  1. Are the following lines of code equivalent (i.e. give you the same plot)?
ggplot(iris, aes(Petal.Length, Petal.Width)) + 
  geom_point(color = "blue")

ggplot(iris, aes(Petal.Length, Petal.Width, color = "blue")) + 
  geom_point()
Show answer
# Answer: False
# (inside aes(), "blue" is treated as a data value mapped to color,
# not as the color itself)
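
One way to check this, sketched here under the assumption that ggplot2 is loaded, is to inspect the colors that ggplot2 actually computes for the points with layer_data(). The object names p_set and p_map are illustrative only.

```r
library(ggplot2)

# color set to a constant, outside aes()
p_set <- ggplot(iris, aes(Petal.Length, Petal.Width)) +
  geom_point(color = "blue")

# the string "blue" mapped as if it were a variable, inside aes()
p_map <- ggplot(iris, aes(Petal.Length, Petal.Width, color = "blue")) +
  geom_point()

# colors computed for the points in each plot
unique(layer_data(p_set)$colour)
unique(layer_data(p_map)$colour)
```

In the first plot every point is drawn in "blue". In the second plot the string "blue" acts as a one-level categorical variable, so the points receive a color from the default discrete palette and a legend appears.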
  1. Are the following lines of code equivalent?
ggplot(iris, aes(Petal.Length, Petal.Width)) + 
  geom_point()

ggplot(iris) + 
  geom_point(aes(Petal.Length, Petal.Width))
Show answer
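# Answer: True
# (with a single layer, it makes no difference whether the mapping
# is given in ggplot() or in geom_point())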
  1. What is the problem with the following code? How can it be fixed?
# scatter plot of Petal.Width, and Petal.Length
ggplot(iris) +
  geom_point(x = Petal.Width, y = Petal.Length)
Show answer
# Answer: the x and y mappings need to be specified inside aes()
  1. What is the problem with the following code? How can it be fixed?
# scatter plot of Petal.Width, and Petal.Length
ggplot(iris, aes(Petal.Length, Petal.Width))
  geom_point(aes(color = Species))
Show answer
# Answer: the plus sign '+' is missing at the end of the ggplot() line
  1. What is the problem with the following code? How can it be fixed?
# scatter plot of Petal.Width, and Petal.Length
ggplot(iris, aes(x = Petal.Length  y = Petal.Width)) +
  goem_point(aes(color = Species))
Show answer
# Answer: missing comma between arguments 'x' and 'y' inside aes();
# misspelled 'goem_point()' (it should be 'geom_point()')