12 Introduction

In the previous part of the book (Data Tables), you had your first contact with data tables in R. In particular, you were introduced to basic functions from the packages "dplyr" and "ggplot2", which let you perform simple manipulation and visualization of tabular data.

Before moving on to the full conceptual framework of data tables, and before teaching you more data analysis skills, we need to introduce the fundamentals of data objects in R. To enjoy and exploit R as our main computational tool, one of the first things you need to learn about is the set of objects or “containers” that R provides to handle data (e.g. vectors, factors, matrices, arrays, and lists).

As we mentioned in the chapter “What do we mean by data?”, every program needs some mechanism to handle and organize data values so that we can do computations on them.

Figure 12.1: Abstract view of data objects in data analysis and programming languages

In general, programming languages tend to offer two types or levels for handling data:

  • Data Types
  • Data Structures

Data types are the simplest building blocks (integer, real, logical, character). Think of these as the atoms.
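As a rough illustration of these four data types in R, the sketch below stores one value of each type and asks R to report its type with `typeof()`. The variable names are made up for this example. Note that R reports real numbers as "double".

```r
# One value of each basic data type (variable names are illustrative)
n_int  <- 3L        # integer: the L suffix asks R for an integer
n_real <- 3.14      # real number, which R reports as "double"
flag   <- TRUE      # logical
word   <- "hello"   # character

typeof(n_int)    # "integer"
typeof(n_real)   # "double"
typeof(flag)     # "logical"
typeof(word)     # "character"
```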

Data structures, also known as data objects, are the containers that hold values of one or more data types. If we think of data types as atoms, then data objects are like molecules formed from a set of atoms or from a set of other molecules.
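To make the container idea a little more concrete, here is a minimal sketch in R with invented values and names. A vector groups several values of the same type, and a list groups values of different types, including another container.

```r
# A vector: several values of the same data type (illustrative values)
heights <- c(1.72, 1.65, 1.80)

# A list: values of different types, including another container
person <- list(name = "Leia", age = 30L, heights = heights)

typeof(heights)     # "double"
length(heights)     # 3
typeof(person)      # "list"
person$heights[2]   # 1.65
```

In terms of the analogy, `heights` is a molecule made of atoms, while `person` is a molecule that also contains another molecule.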

Programs use a variety of objects for storing data. Some of the most common names you will come across are:

  • lists
  • arrays
  • sets
  • tables
  • dictionaries

A Word of Caution

In this book, we will focus on the data objects available in R. Keep in mind, however, that other programs may have objects with the same name that are intrinsically different from their implementation in R. Conversely, other languages may have objects similar to those available in R, but under different names.

For example, R and Python both have an object called list, but they are quite different creatures. In some respects, a Python “list” is actually closer to an R “vector”. Likewise, MATLAB has matrices and arrays, which R also provides; although each language has its own syntax for manipulating these objects, they behave similarly.
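To see the R side of this comparison, the following sketch (with made-up object names) gives the same mixed input to an R vector and to an R list. It only illustrates R's behavior. It is not a full comparison with Python or MATLAB.

```r
# A vector keeps a single data type, so mixed input is converted
mixed_vector <- c(1, "a", TRUE)
typeof(mixed_vector)         # "character"
mixed_vector                 # "1" "a" "TRUE"

# A list keeps each element with its own data type
mixed_list <- list(1, "a", TRUE)
sapply(mixed_list, typeof)   # "double" "character" "logical"
```

The same name can hide different behavior, so it helps to check how an object actually treats its contents rather than relying on what it is called.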

Much of what we cover in this part of the book may feel like it’s being covered in a vacuum, as if we were just looking at isolated trees instead of the entire forest. If this turns out to be the case for some of you, just put that feeling aside and let yourself go. As we move on to the next parts of the book, you should be able to connect the dots and see how the seemingly isolated trees form a beautiful forest.