1 Tables and Tidyverse

The overarching topic in this text is working with tabular data (i.e. data arranged in rows and columns). The main reason to focus on tables is that tabular data is the most common format in which data is handled for most types of analysis.
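To make "rows and columns" concrete, here is a minimal sketch of a small table built with base R's data.frame(). The table name `scores`, its column names, and its values are made up for illustration only.

```r
# A small, made-up table: each row is one observation,
# each column is one variable
scores <- data.frame(
  name  = c("Ana", "Ben", "Cy"),
  age   = c(23, 31, 27),
  score = c(88.5, 92.0, 79.5)
)

dim(scores)     # number of rows and number of columns
names(scores)   # column names
scores[2, ]     # the second row
scores$score    # the "score" column as a vector
```

Every column has the same number of entries, which is what gives the data its rectangular shape.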

Hey, what if my data is not in tabular format?

While it is true that not all data sets are stored or organized in tables, in most data analysis projects—sooner or later—you will be handling data in some sort of rectangular structure. Because of this, I firmly believe that the best way to get you started learning about data analysis with R is by getting your hands dirty manipulating tables.

Obviously there are limitations. Not everything that is done in data analysis can be done with tables. But having a solid foundation in working with data arranged in this format will pay off later in your data analysis work.

Installing Tidyverse

Since this text focuses on Tidyverse tools, you will need to install the associated ecosystem of Tidyverse packages. This is very easy to do.

Recall that there are a couple of different ways to install R packages. One common option is to call the install.packages() function in R’s console, passing the name of the package in quotation marks. To install several packages at once, pass their names as a character vector, e.g. c("dplyr", "ggplot2"). Like this:

# run the command below in the console
# (don't include this command in any Rmd or qmd file)
# don't worry too much if you get a warning message
install.packages("tidyverse")
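If you want an installation step that is safe to repeat, one possible approach is to first check which packages are missing and install only those. The sketch below is one way to do that, not the only one. The names `wanted`, `is_installed`, and `to_install` are hypothetical, and the two packages listed are just an example selection.

```r
# Packages we would like to have (an example selection)
wanted <- c("dplyr", "ggplot2")

# TRUE for each package that can already be loaded on this machine
is_installed <- vapply(wanted, requireNamespace, logical(1), quietly = TRUE)

# Names of the packages that still need to be installed
to_install <- wanted[!is_installed]

# Install only when working interactively (e.g. in the console),
# never when a script or document is being rendered
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}
```

The interactive() check follows the advice given in the comments above: installation belongs in the console, not in an Rmd or qmd file.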

Another option in RStudio is to use the Packages tab, located in the pane that contains other tabs such as Files, Plots, Help, etc. In the Packages tab, click the “Install” button and follow the steps to install "tidyverse".

Remember that you only need to install a package once! After a package has been installed on your machine, there is no need to call install.packages() again for the same package. What you should always call, in order to use the functions in a package, is the library() function:

# you should include this command in your source file(s)
library(tidyverse)

About loading packages: Another rule to keep in mind is to always load any required packages at the very top of your script files (e.g. .R or .Rmd or .qmd or .Rnw files). Avoid calling the library() function in the middle of a script. Instead, load all the packages before anything else.
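To see what loading does, the following sketch attaches a package at the top of a script and then checks that it appears in the session's search path. It uses "splines" purely as a stand-in, because that package ships with R and the example therefore runs without installing anything. In your own files the call would be library(tidyverse).

```r
# At the very top of a script: load everything the script needs.
# "splines" is only a stand-in that comes with every R installation.
library(splines)

# search() lists the packages currently attached to the session
"package:splines" %in% search()
```

A package stays attached only for the current R session, which is why the library() call has to appear in every script that uses the package.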

A bit about Tidyverse

Tidyverse is not a single package. Instead, it is a collection of packages. This means that when you install "tidyverse", you are actually installing several packages, including the following core ones:

  • "ggplot2": for creating plots and graphics

  • "dplyr": for manipulating tables

  • "tidyr": for tidying up your data

  • "readr": for importing rectangular data

  • "tibble": provides “improved” R data frames

  • "stringr": for string manipulation

  • "forcats": for working with R factors

  • "purrr": for functional programming in R

To learn more about Tidyverse, visit:

https://tidyverse.tidyverse.org/
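You can also ask R which packages belong to the collection. The sketch below assumes "tidyverse" may or may not be installed yet, so it checks first. The variable name `pkgs` is just for illustration.

```r
# Only query the collection when tidyverse is actually installed
if (requireNamespace("tidyverse", quietly = TRUE)) {
  # Names of the packages that belong to the tidyverse collection
  pkgs <- tidyverse::tidyverse_packages()
  print(pkgs)
} else {
  # Nothing to list yet
  pkgs <- character(0)
  message("tidyverse is not installed yet")
}
```

The exact list depends on the installed version of "tidyverse", so it may be longer than the packages named in this section.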

Note

We won’t cover all the functionality provided by Tidyverse. Instead, we will focus on "dplyr" and "ggplot2".