4 More About Vectors
In the previous chapter we started the topic of data objects by introducing R vectors and some of their basic properties. In this chapter we continue the discussion of vectors, specifically the notions of vectorization, recycling, and subsetting.
4.1 Motivation: Future Value
Let’s bring back the savings example from the previous chapters: you have $1000 and you decide to deposit this money in a savings account that pays you an annual interest rate of 2%. We’ve already seen how to calculate the amount of money that you would have at the end of the first, second and third years. Let’s now calculate the saved amount for a 10-year period.
How much money will you have at the end of each year during a 10-year period?
To answer this question, we could compute individual amount objects (e.g.
amount1
, amount2
, amount3
, etc) to get the saved amount at the end of
each year. For example:
# inputs
<- 1000
deposit <- 0.02
rate
# amounts at the end of years 1, 2, 3, ..., 10
= deposit * (1 + rate)
amount1 = amount1 * (1 + rate)
amount2 = amount2 * (1 + rate)
amount3 = amount3 * (1 + rate)
amount4 = amount4 * (1 + rate)
amount5 = amount5 * (1 + rate)
amount6 = amount6 * (1 + rate)
amount7 = amount7 * (1 + rate)
amount8 = amount8 * (1 + rate)
amount9 = amount8 * (1 + rate) amount10
The problem with this piece of code is that it is too repetitive, time consuming, boring, and error prone (can you spot the error?). Even worse, imagine if you were interested in computing the amount of your investment for a 20-year or a 30-year or a longer year period?
The good news is that we don’t have to be so repetitive. Before describing what the alternative—and more efficient—approach is, we need to do a bit of algebra.
4.1.1 Future Value Formula
In one year you’ll have:
\[ 1000 \times (1.02) = 1020 \]
In two years you’ll have:
\[ 1000 \times (1.02) \times (1.02) = 1000 \times (1.02)^2 = 1040.4 \]
In three years you’ll have:
\[ 1000 \times (1.02) \times (1.02) \times (1.02) = 1000 \times (1.02)^3 = 1061.208 \]
Do you see a pattern?
If you deposit $1000 at a rate of return \(r\), how much will you have at the end of year \(t\)? The answer is given by the Future Value (FV) formula. In its simplest version, the formula is:
\[ \text{FV} = \text{PV} \times (1 + r)^n \]
\(\text{FV}\) = future value (how much you’ll have)
\(\text{PV}\) = present value (the initial deposit)
\(r\) = rate of return (e.g. annual rate of return)
\(n\) = number of periods (e.g. number of years)
Keep in mind that there are more sophisticated versions of the FV formula. For now, let’s keep things simple and use the above equation.
If you deposit $1000 at a rate of 2%, how much will you have at the end of year 10?
<- 1000
deposit <- 0.02
rate <- 10
year
<- deposit * (1 + rate)^year
amount10
amount10> [1] 1218.994
Using the formula of the Future Value you can directly compute the amount that you would have at the end of the tenth year. But what about calculating the amounts at the end of each year during that time period? Enter vectorization!
4.2 Vectorization
In order to explain what vectorization is, let me first show you the following
R code. Compared to the code snippet above, note that the code below uses a
vector years
containing a numeric sequence from 1 to 10, thanks to the :
(“colon”) operator. This vector years
is then used to play the role of the
exponent in the Future Value formula:
<- 1000
deposit <- 0.02
rate <- 1:10 # vector of years
years
# example of vectorization (or vectorized code)
<- deposit * (1 + rate)^years
amounts
amounts> [1] 1020.000 1040.400 1061.208 1082.432 1104.081 1126.162 1148.686 1171.659
> [9] 1195.093 1218.994
The computed object amounts
is exactly what we are looking for. This vector
contains the saved amounts at the end of each year, from the first year till
the tenth year.
The code used to obtain amounts
is an example of one of the most fundamental
and powerful kinds of operations (computations) in R, and it has its special
name: vectorization, also referred to as vectorized code.
When you write code like this:
= deposit * (1 + rate)^years amounts
we say that your code is vectorized. Technically speaking, this code uses not just vectorization but it also uses something else called recycling, which we will explain in the next section. But let’s describe vectorization first.
So what is vectorization?
Simply put, vectorization means that a given function or operation will be applied to all the elements of one or more vectors, element by element.
Say you want to create a vector log_amounts
by taking the logarithm of
amounts
. All you have to do is apply the log()
function to amounts
:
<- log(amounts) log_amounts
When you create the vector log_amounts
, what you’re doing is applying a
function to a vector, which in turn acts on all the elements of the vector.
Hence the reason why, in R parlance, we call it vectorization.
Most functions that operate with vectors in R are vectorized functions. This means that an action is applied to all elements of the vector without the need to explicitly type commands to traverse all of its values, element by element.
In many other programming languages, you would have to use a set of commands to loop over each element of a vector (or list of numbers) to transform them. But not in R.
Another simple example of vectorization would be the calculation of the square root of all the amounts:
sqrt(amounts)
> [1] 31.93744 32.25523 32.57619 32.90034 33.22771 33.55834 33.89227 34.22951
> [9] 34.57011 34.91410
Be careful. Not every function that takes in a vector is necessarily vectorized.
An example of a function that does not perform vectorization is the mean()
function:
mean(amounts)
> [1] 1116.872
As expected, mean()
returns the average or mean value of all the numeric
values in the vector amounts
. It is not vectorized because it does compute
the mean of every element of the input vector.
So be careful, just because a function does a computation with an input vector, it does not mean that its vectorized. Vectorization happens when the same function or action is applied to every element of a vector.
Why should you care about vectorization?
If you are new to programming, learning about R’s vectorization will be very natural and you won’t stop to think about it too much. If you have some previous programming experience in other languages (e.g. C, python, perl), you know that vectorization does not tend to be a native thing.
Vectorization is essential in R. It saves you from typing many lines of code, and you will exploit vectorization with other useful functions known as the apply family functions (we’ll talk about them later in the book).
4.3 Recycling
Closely related with the concept of vectorization we have the notion of Recycling. To explain recycling let’s see an example.
The values in the vector amounts
are given in dollars, but what if you need
to convert them into values expressed in thousands of dollars?. To convert
from dollars to thousands-of-dollars you just need to divide by 1000; for
example
- 1,000 dollars becomes 1 thousands-dollars
- 10,000 dollars becomes 10 thousands-dollars
- 1 dollar becomes 0.001 thousands-dollars
Here is how to create a new vector thousands
:
<- amounts / 1000
thousands
thousands> [1] 1.020000 1.040400 1.061208 1.082432 1.104081 1.126162 1.148686 1.171659
> [9] 1.195093 1.218994
What you just did (assuming that you did things correctly) is called Recycling, which is what R does when you operate with two (or more) vectors of different length.
To understand this concept, you need to remember that R does not have a data structure for scalars (single numbers). Scalars are in reality vectors of length 1.
The conversion from dollars to thousands-of-dollars requires this operation:
amounts / 1000
. Although it may not be obvious, we are operating with two
vectors of different length: amounts
has 10 elements, whereas 1000
is a
one-element vector. So how does R know what to do in this case?
Well, R uses its recycling principle, which takes the shorter vector (in
this case 1000
) and recycles its content to form a temporary vector that
matches the length of the longer vector (i.e. amounts
).
Another recycling example
Here’s another example of recycling. Saved amounts of elements in an odd number position will be divided by two; values of elements in an even number position will be divided by 10:
<- c(1/2, 1/10)
units <- amounts * units
new_amounts
new_amounts> [1] 510.0000 104.0400 530.6040 108.2432 552.0404 112.6162 574.3428 117.1659
> [9] 597.5463 121.8994
In this piece of code, the elements of units
are recycled (i.e. repeated) as
many times as the number of elements in amounts
.
To achieve the same result without using recycling you would have to create a
vector new_units
(i.e. the values to divide by) of the same length as amounts
.
For example, you could create a vector new_units
with the replicate function
rep()
having ten elements in which those values in odd positions are 1/2
and those values in even positions are 1/10
:
<- rep(c(1/2, 1/10), length.out = length(amounts))
new_units * new_units
amounts > [1] 510.0000 104.0400 530.6040 108.2432 552.0404 112.6162 574.3428 117.1659
> [9] 597.5463 121.8994
4.3.1 Vectorization and Recycling
Let’s bring back the code that uses the Future Value to obtain the vector
amounts
:
= deposit * (1 + rate)^years amounts
Recall that deposit
and rate
are vectors of length 1. And so it is the
number 1
, it is a vector containing just one element. In contrast, years
has 10 elements. This means that R is dealing with four vectors some of which
have different lengths.
In pictures, we have the following diagram:

Figure 4.1: Diagram depicting vectors of different lengths.
How does R take care of this?
The following diagram depicts what R does behind the scenes: R recycles the
shorter vectors to match the length of the longest vector. In this example,
vectors deposit
, rate
, and 1
are the shorter vectors, which are then
recycle to match the length of the longest vector years
. The computation
process is completed with vectorization.
As you can tell, this is an example of vectorization & recycling rules in R.
4.4 Manipulating Vectors: Subsetting
In addition to creating vectors, you should also learn how to do some basic
manipulation of vectors. The most common type of manipulation is called
subsetting, also known as indexing or subscripting, which we use to
extract and also replace elements of a vector (or another R object). To do so,
you use what I like to call bracket notation. This implies using (square)
brackets [ ]
to get access to the elements of a vector.
To subset a vector, you type the name of the vector, followed by an opening and a closing bracket. Inside the brackets you specify an indexing vector which could be a numeric vector, a logical vector, and sometimes a character vector. Let’s see these options in more detail in the following subsections.

Figure 4.2: Use of brackets for subscripting vectors.
4.4.1 Numeric Subsetting
This type of subsetting, as the name indicates, is when the indexing vector consists of a numeric vector with one or more values that correspond to the position(s) of the vector element(s).
The simplest type of numeric subsetting is when we use a single number (which is a vector of length one).
# amount at end of year 1
1]
amounts[> [1] 1020
The numeric indexing vector can have more than one element. For example, if
we want to extract the elements in positions 1, 2 and 3, we could provide a
numeric sequence 1:3
:
# amounts at end of years 1, 2, and 3
1:3]
amounts[> [1] 1020.000 1040.400 1061.208
The numeric positions don’t have to be consecutive numbers. You can also use a vector of non-consecutive numbers:
# amounts at end of years 2 and 4
c(2, 4)]
amounts[> [1] 1040.400 1082.432
Likewise, we can also use a vector with repeated numbers:
# repeated amounts
c(2, 2, 2)]
amounts[> [1] 1040.4 1040.4 1040.4
In addition to the previous subscripting options, we can specify negative numbers to indicate that we want to exclude an element in the associated position:
# exclude 2nd year
-2]
amounts[> [1] 1020.000 1061.208 1082.432 1104.081 1126.162 1148.686 1171.659 1195.093
> [9] 1218.994
# exclude 2nd and 4th years
-c(2, 4)]
amounts[> [1] 1020.000 1061.208 1104.081 1126.162 1148.686 1171.659 1195.093 1218.994
4.4.2 Character Subsetting
Sometimes, you may have a vector with named elements. When this is the case, you can use a character vector—containing one or more of the element names—as the indexing vector.
None of the vectors that we have created so far have named elements. So let’s
see how to do this. One way to give names to the elements of an existing vector
is with the function names()
= amounts[1:3]
amounts3 names(amounts3) = c("y1", "y2", "y3")
amounts3> y1 y2 y3
> 1020.000 1040.400 1061.208
When a vector, like amounts3
, has named elements, we can use those names for
subsetting purposes. Instead of using a numeric vector we use a character
vector. Hence the term character subsetting.
For example, to extract the element in amounts3
that has name "y1"
we pass
this string inside the brackets:
"y1"]
amounts3[> y1
> 1020
To get the elements in amounts3
that have names "y1"
and "y3"
, we can
write
c("y1", "y3")]
amounts3[> y1 y3
> 1020.000 1061.208
And like in the numeric subsetting case, we can also write a command such as:
c("y2", "y2", "y2", "y1")]
amounts3[> y2 y2 y2 y1
> 1040.4 1040.4 1040.4 1020.0
4.4.3 Logical Subsetting
Another type of subsetting is when we use a logical vector as the indexing vector.
Let me show you an example of logical subsetting. In this case, we will use a
logical vector with three elements c(TRUE, FALSE, FALSE)
and pass this
inside the brackets:
c(TRUE, FALSE, FALSE)]
amounts3[> y1
> 1020
As you can tell, the retrieved element in amounts3
is the one associated to
the TRUE
position, whereas those elements associated to the FALSE
values
are excluded. This is how the logical values (in the indexing vector) are used:
TRUE
means inclusionFALSE
means exclusion
So, if we want to extract only the element in the second position, we could write something like this:
c(FALSE, TRUE, FALSE)]
amounts3[> y2
> 1040.4
Now, I have to say that doing logical subsetting in this way is not really
how we tend to use it in practice. In other words, we won’t be providing an
explicit logical vector, typing a bunch of TRUE
’s and FALSE
’s values.
Instead, what we typically do is to provide a command that, when executed by R,
will return a logical vector.
Consider the following example. We create a vector x
, and then we use the
greater than symbol >
to compute a mathematical comparison which in turn
will return a logical vector.
= c(2, 4, 6, 8)
x > 5
x > [1] FALSE FALSE TRUE TRUE
Knowing that x > 5
produces a logical vector in which FALSE
indicates that
the number is less than or equal 5, and TRUE
indicates that the number
is greater than 5, we can write the following command to subset those
elements in x
that are greater than five:
> 5]
x[x > [1] 6 8
This is a simple example of logical subsetting because the indexing vector
is the logical vector that comes from executing the comparison x > 5
.
Here is a less simple example of logical subsetting to extract the elements in
x
that are greater than 3 and less than or equal to 6. This requires
two comparison expressions, x > 3
and x <= 6
, and the use of the logical
AND operator &
to form a compound logical comparison:
> 3 & x <= 6]
x[x > [1] 4 6
4.4.4 Summary of Subsetting
In summary, the things that you can specify inside the brackets are three kind of vectors:
numeric vectors
logical vectors (the length of the logical vector must match the length of the vector to be subset)
character vectors (if the elements have names)
In addition to the brackets []
, some common functions that you can use on
vectors are:
length()
gives the number of valuessort()
sorts the values in increasing or decreasing waysrev()
reverses the valuesunique()
extracts unique elements
length(amounts3)
length(amounts3)]
amounts3[sort(amounts3, decreasing = TRUE)
rev(amounts3)
4.5 Exercises
1) Explain, in your own words, the concept of vectorization a.k.a. vectorized operations in R.
2) Explain, in your own words, the recycling principle in R.
3) Write 2 different R commands to return the first five elements of a
vector x
(assume x
has more than 5 elements).
4) Suppose y <- c(1, 4, 9, 16, 25)
. Write down the R command to return a
vector z
, in which each element of z
is the square root of each element of
the vector y
.
5) Which command will fail to return the first five elements of a vector
x
? (assume x
has more than 5 elements).
x[1:5]
x[c(1,2,3,4,5)]
head(x, n = 5)
x[seq(1, 5)]
x(1:5)
6) For each the following parts, state whether the given function or operator is vectorized, and provide a code example to support your claim.
length()
*
log()
max()
trunc()
7) Consider the following code to obtain vectors name
and mpg
from the
data set mtcars
that contains data about 32 automobiles (1973–74 models).
# vectors name and mpg
= rownames(mtcars)
name = mtcars$mpg mpg
Use seq()
, and bracket notation, to subset (extract):
all the even elements in
name
(i.e. extract positions 2, 4, 6, etc)all the odd elements in
mpg
(i.e. extract positions 1, 3, 5, etc)all the elements in positions multiples of 5 (e.g. extract positions 5, 10, 15, etc) of
mpg
all the even elements in
name
but this time in reverse order; hint therev()
function is your friend.
8) Consider the following code to obtain vectors name
, mpg
and cyl
from the data set mtcars
that contains data about 32 automobiles (1973–74
models).
# vectors name, mpg and cyl
= rownames(mtcars)
name = mtcars$mpg # miles-per-gallon
mpg = mtcars$cyl # number of cylinders cyl
Write commands, using bracket notation, to answer the parts listed below.
You may need to use is.na()
, sum()
, min()
, max()
, which()
,
which.min()
, which.max()
:
name of cars that have a fuel consumption of less than 15 mpg
name of cars that have 6 cylinders
largest mpg value of all cars with 8 cylinders
name of car(s) with mpg equal to the median mpg
name of car(s) with mpg of at most 22, and at least 6 cylinders