The overarching topic of this text is working with tabular data (i.e. data arranged in rows and columns). The main reason to focus on tables is that tabular data is the most common format in which data is handled for most types of analysis.
Hey, what if my data is not in tabular format?
While it is true that not all data sets are stored or organized in tables, in most data analysis projects—sooner or later—you will be handling data in some sort of rectangular structure. Because of this, I firmly believe that the best way to get you started learning about data analysis with R is by getting your hands dirty manipulating tables.
Obviously there are limitations. Not everything that is done in data analysis can be done with tables. But having a solid foundation around data arranged in this format will pay off down your data analysis road.
Since this text focuses on Tidyverse tools, you will need to install the associated ecosystem of Tidyverse packages. This is very easy to do.
Recall that there are a couple of different ways to install R packages. One
common option is to call the
install.packages() function in R’s console,
passing the name of the package in quotation marks (to install several packages
at once, pass a character vector of quoted names built with c()). Like this:
# run the command below in the console
# (don't include this command in any Rmd or qmd file)
# don't worry too much if you get a warning message
install.packages("tidyverse")
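As a minimal sketch of the same idea, the code below checks which packages are missing before installing anything. The helper name missing_packages() is hypothetical and not part of any package; requireNamespace() and install.packages() are standard R functions.

```r
# Hypothetical helper: return the names in `pkgs` that cannot be loaded,
# which in practice means they are not installed.
# Note: requireNamespace() loads a package's namespace if it is available.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(c("tidyverse"))

# Install only from an interactive console session,
# never from a rendered Rmd or qmd file.
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}
```

Guarding the call with interactive() is one way to keep the installation step out of rendered documents, in line with the advice above.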
Another option in RStudio is to use the
Packages tab, located in the pane that contains other tabs such as
Help. In the
Packages tab you can find the “Install” button;
click it and follow the steps to install the package.
Remember that you only need to install a package once! After a package has been
installed on your machine, there is no need to call
install.packages() again
on the same package. What you should always invoke, in order to use the
functions in a package, is the
library() function:
# you should include this command in your source file(s)
library(tidyverse)
About loading packages: Another rule to keep in mind is to always load any
required packages at the very top of your script files (e.g.
.Rnw files). Avoid calling the
library() function in the middle
of a script. Instead, load all the packages before anything else.
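The layout described above might look like the following sketch. The packages tools and stats ship with R and stand in for library(tidyverse) so that the example runs on any installation; the file names are made up for illustration.

```r
# ---- 1. Packages: loaded first, before any other code ----
# `tools` and `stats` ship with R; in a real script this is where
# library(tidyverse) would go.
library(tools)
library(stats)

# ---- 2. Everything else comes after the library() calls ----
files <- c("analysis.R", "report.Rmd", "slides.qmd")  # hypothetical file names
extensions <- file_ext(files)                          # file_ext() comes from tools

# library() attaches a package to the search path; check that both are there.
loaded <- c("package:tools", "package:stats") %in% search()
```

Keeping every library() call in one block at the top makes a script's dependencies visible at a glance.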
Tidyverse is not a single package. Instead, it is a collection of packages.
This means that when you install
"tidyverse", you are actually installing a set of packages, including:
"ggplot2": for creating plots and graphics
"dplyr": for manipulating tables
"tidyr": for tidying up your data
"readr": for importing rectangular data
"tibble": provides “improved” R data frames
"stringr": for string manipulation
"forcats": for working with R factors
"purrr": for functional programming in R
To learn more about other tidyverse details, visit:
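As a quick local check, the packages listed above can be tested for presence on your machine. This is only a sketch: the vector name core and the result name is_installed are chosen here for illustration, while system.file() is a base R function that returns an empty string when a package is not installed.

```r
# Packages named in the list above.
core <- c("ggplot2", "dplyr", "tidyr", "readr",
          "tibble", "stringr", "forcats", "purrr")

# TRUE for each package found in the library, FALSE otherwise.
is_installed <- vapply(
  core,
  function(p) nzchar(system.file(package = p)),
  logical(1)
)

print(is_installed)
```

If "tidyverse" itself is installed, calling tidyverse::tidyverse_packages() lists every package in the collection.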