# inputs
deposit = 1000
rate = 0.02
# amounts at the end of years 1, 2, and 3
amount1 = deposit * (1 + rate)
amount2 = amount1 * (1 + rate)
amount3 = amount2 * (1 + rate)
3 Creating Vectors
In the preceding chapter you started learning about the basic properties of vectors. We focused on the common flavors of vectors (e.g. logical, integer, double, and character), reviewed special values (e.g. NULL, NA), talked about the length or size of a vector, and we also mentioned that elements in a vector can have names.
Likewise, you have seen two basic ways to create simple vectors:
- creation of one-element vectors, e.g. money = 100, rate = 0.02, account = "savings"
- creation of small vectors with more than one element using the combine function c(), e.g. years = c(1L, 2L, 3L)
In this chapter I discuss two broad topics: 1) a review of various functions and ways to create vectors, and 2) a description of the coercion notion. Why should you learn about various forms of vector creation in R? Because as I said, R is—to a large extent—a vector-based language, and you have to be ready to create multiple kinds of vectors, and to take advantage of some of the mechanisms that R provides for doing this. As for the topic of coercion, you also need to understand the behavior of R when working with vectors of different data types.
3.1 Creating vectors with c()
We’ve seen how to create simple vectors containing just one element (i.e. length-1 vectors). We’ve also seen the basic use of the combine function c() to create a vector containing several elements:
# combine amounts in a single vector
amounts = c(amount1, amount2, amount3)
amounts
[1] 1020.000 1040.400 1061.208
The c() function is one of the primary functions to create vectors of length greater than one. Here’s another example of how to create a vector flavors with some ice-cream flavors:
flavors <- c("lemon", "vanilla", "chocolate")
flavors
[1] "lemon" "vanilla" "chocolate"
Basically, you call c() and you type in the values, separating them by commas.
If your vector has only one element, you don’t need to call the c() function.
# no need to use c() to create a one-element vector
lemon = c("lemon")
# instead just do this
lemon = "lemon"
One more thing that you can do when using c() is to give names to the elements of the created vector. This is done by joining pairs of values of the form: 'name' = value, where 'name' is the name given to the value of an element. For instance, you can create the vector amounts and give names to each element like this:
# give names to elements when using c()
# (names specified with quotes)
amounts = c(
  "year1" = amount1,
  "year2" = amount2,
  "year3" = amount3)
amounts
year1 year2 year3
1020.000 1040.400 1061.208
As you can tell, the names of each element are specified as character values: "year1", "year2", and "year3". Interestingly, you can also specify names without quoting them:
# give names to elements when using c()
# (names unquoted)
amounts2 = c(
  year1 = amount1,
  year2 = amount2,
  year3 = amount3)
amounts2
year1 year2 year3
1020.000 1040.400 1061.208
This way of giving names to the elements of a vector can feel a bit surprising, especially to users that have previous programming experience but are new to the R syntax. Personally, I don’t really care about having two different (and apparently confusing) ways to give names to elements when using functions like c(). Having said that, I can perfectly understand the initial shock and confusion that this may cause to non-experienced useRs.
To be consistent with most other languages, and also to play defensively, I tend to recommend quoting the values that are supposed to be the names of the elements in a vector. Again, this is my personal biased suggestion, and it is not a rule by any means.
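As a quick check that both naming styles produce the same vector, here is a minimal sketch. The variable names quoted, unquoted, and spaced are hypothetical and only used for illustration; the last line shows a case where quoting is required rather than optional.

```r
# amounts from the running example
deposit <- 1000
rate <- 0.02
amount1 <- deposit * (1 + rate)
amount2 <- amount1 * (1 + rate)

# same names, written with and without quotes
quoted <- c("year1" = amount1, "year2" = amount2)
unquoted <- c(year1 = amount1, year2 = amount2)

# both vectors are identical, names included
identical(quoted, unquoted)

# names() returns the names as a character vector
names(quoted)

# quotes are required when a name is not a syntactically
# valid R name, e.g. when it contains a space
spaced <- c("year 1" = amount1)
names(spaced)
```
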
3.2 Default Vectors
R comes with a set of functions to initialize vectors of a specific data type. The generic function is vector() but there are also type-specific versions:
- vector()
- logical()
- integer()
- double() and numeric()
- character()
The function vector(), as the name indicates, lets you create a vector of a given mode and of a certain length. By default, vector() creates a "logical" vector of length = 0.
log = vector()
log
logical(0)
length(log)
[1] 0
Notice what happens when you print log: the output displayed is logical(0). This is the notation that R uses to indicate that a vector is of length zero. The previous call is equivalent to:
vector(mode = "logical", length = 0)
A common question that some useRs have when they encounter things like logical(0) is “when do you use zero-length vectors?” The quick answer is: you can use zero-length vectors to initialize a vector that will later be populated with more elements. This typically happens when you know that a vector of a certain type is needed to store several values, but you don’t know in advance how many elements will be computed.
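To make this concrete, here is a small sketch of the "initialize empty, then populate" pattern. The names balances and balance, and the stopping threshold of 1050, are assumptions made up for this illustration.

```r
# start with an empty (zero-length) double vector
balances <- double(0)

# append values as they are computed; we do not know in advance
# how many years it takes a 1000 deposit at 2% to exceed 1050
balance <- 1000
while (balance <= 1050) {
  balance <- balance * 1.02
  balances <- c(balances, balance)
}

# the vector now holds one element per computed year
length(balances)
balances
```

Growing a vector with c() inside a loop is fine for short vectors like this one; when the final length is known beforehand, creating the vector with that length up front is usually preferable.
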
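All the other functions, e.g. logical(), integer(), etc., take just one argument, length, to indicate the number of elements of the output vector. Keep in mind that the initial value(s) cannot be chosen: each function fills the vector with the default value of its type, as shown below (the elements can still be reassigned afterwards):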
logical(length = 1)
[1] FALSE
integer(length = 2)
[1] 0 0
double(length = 3)
[1] 0 0 0
character(length = 4)
[1] "" "" "" ""
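The type-specific functions and vector() with an explicit mode should give the same results. The following sketch checks this with identical(); the variable names same_chr, same_int, and same_dbl are hypothetical.

```r
# vector() with an explicit mode matches the type-specific function
same_chr <- identical(vector(mode = "character", length = 2), character(2))
same_int <- identical(vector(mode = "integer", length = 3), integer(3))
same_dbl <- identical(vector(mode = "double", length = 3), double(3))
c(same_chr, same_int, same_dbl)

# numeric() is another way to create a double vector
typeof(numeric(3))
```
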
3.3 Numeric Sequences
A common situation when creating vectors involves creating numeric sequences. If the numeric sequence is short and simple, it could be created with the combine function c(), for example:
s1 = c(1, 2, 3, 4)
s1
[1] 1 2 3 4
Often, you will have to create less simple and/or longer sequences. For these purposes there are two useful functions:
- the colon operator ":"
- the sequence function seq() and its siblings seq.int(), seq_along() and seq_len()
3.3.1 Sequences with :
The colon operator : lets you create numeric sequences by indicating the starting and ending values. For instance, if you want to generate an integer sequence starting at 1 and ending at 10, you use this command:
ints = 1:10
ints
[1] 1 2 3 4 5 6 7 8 9 10
Notice that the colon operator, when used with whole numbers, will produce an integer sequence:
typeof(ints)
[1] "integer"
However, when the starting value is not a whole number, then the generated sequence will be of type double, with one-unit steps. For example:
dbls = 1.5:5.5
dbls
[1] 1.5 2.5 3.5 4.5 5.5
typeof(dbls)
[1] "double"
Run the following commands to see how R generates different sequences:
1.5:5
1.5:5.1
1.5:5.5
1.5:5.9
You can also create a descending sequence by starting with a value on the left-hand side of : that is greater than the value on the right-hand side:
# descending (reversed) sequence
10:1
[1] 10 9 8 7 6 5 4 3 2 1
# the colon operator also works with negative numbers
# (ascending here, because -10 is less than -1)
-10:-1
[1] -10 -9 -8 -7 -6 -5 -4 -3 -2 -1
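One detail worth keeping in mind when mixing : with arithmetic is operator precedence: the colon is evaluated before binary operators such as - and +, but after the unary minus. The sketch below illustrates this; n and the other variable names are hypothetical.

```r
n <- 5

# the colon is evaluated before the subtraction: (1:n) - 1
shifted <- 1:n - 1
shifted       # 0 1 2 3 4

# use parentheses to get the sequence from 1 to n - 1
shorter <- 1:(n - 1)
shorter       # 1 2 3 4

# the unary minus is applied before the colon: (-1):3
from_neg <- -1:3
from_neg      # -1 0 1 2 3

# negate the whole sequence instead
negated <- -(1:3)
negated       # -1 -2 -3
```
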
3.3.2 Sequences with seq()
The colon operator : can be very useful but it has its limitations. Its main downside is that the generated sequences are of one-unit steps. But what if you want a sequence with steps different from one-unit? For instance, what if you are interested in something like: 2, 4, 6, 8, ...?
In addition to the colon operator, R also provides the more generic seq() function for creating numeric sequences. This function comes with a couple of parameters that let you generate sequences in various forms.
The simplest usage of seq() involves passing values for the arguments from (the starting value) and to (the ending value):
# equivalent to 1:10
seq(from = 1, to = 10)
[1] 1 2 3 4 5 6 7 8 9 10
As you can tell, the sequence is created with one-unit steps. But this can be changed with the by argument. Say you want steps of two units, then specify by = 2:
seq(from = 1, to = 10, by = 2)
[1] 1 3 5 7 9
Now, what if you want a decreasing sequence, for example 10, 9, …, 1? You can also use seq() to achieve this goal. The starting value from is 10, the ending value to is 1, and the step size by has to be -1:
seq(from = 10, to = 1, by = -1)
[1] 10 9 8 7 6 5 4 3 2 1
Sometimes you may be interested in creating a sequence of a specific length. When this is the case, you need to use the length.out argument. For example, say we want to start with 2, getting the sequence of the first six even numbers. One way to obtain this sequence is with from = 2, steps of size by = 2, and a length of length.out = 6:
seq(from = 2, length.out = 6, by = 2)
[1] 2 4 6 8 10 12
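The arguments of seq() can be combined in other ways too. As a brief sketch (the names grid and stepped are hypothetical), you can give from, to, and length.out and let seq() work out the step size, and you can see what happens when by does not divide the range evenly:

```r
# five equally spaced values from 0 to 1; seq() computes the step
grid <- seq(from = 0, to = 1, length.out = 5)
grid
# [1] 0.00 0.25 0.50 0.75 1.00

# when by does not divide the range evenly, the sequence stops
# at the last value that does not go past to
stepped <- seq(from = 1, to = 10, by = 4)
stepped
# [1] 1 5 9
```
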
3.3.3 Sequences with seq_len() and seq_along()
seq() comes with sibling functions such as seq.int(), seq_len() and seq_along(). These are more specialized functions than the generic seq(), and they can be more efficient to generate certain sequences.
The function seq.int() is a primitive (lower-level) version of seq() that takes the same main arguments. Despite what its name may suggest, it is not restricted to integer sequences; the difference from seq() is that seq.int() can be more efficient:
# equivalent to seq(from = 5, to = 10), but more efficient
seq.int(from = 5, to = 10)
[1] 5 6 7 8 9 10
If you want a sequence of consecutive positive integers starting at 1, seq_len() is your friend:
# equivalent to seq(from = 1, to = 10), but more efficient
seq_len(10)
[1] 1 2 3 4 5 6 7 8 9 10
The third type of sequence function is seq_along(). This function takes a vector of any length, and it produces a sequence of consecutive positive integers of the same length as the input vector.
accounts = c("savings", "checking", "brokerage", "retirement")
seq_along(accounts)
[1] 1 2 3 4
If the input vector has length zero, then seq_along() returns a zero-length integer vector:
null = NULL # length(null) is zero
seq_along(null)
integer(0)
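This behavior is the main reason to prefer seq_along() over the colon operator when looping over the positions of a vector. The following sketch contrasts the two for an empty input; the names empty and visits are hypothetical.

```r
empty <- character(0)

# 1:length(empty) is 1:0, a descending sequence with two elements
1:length(empty)
# [1] 1 0

# seq_along() returns a zero-length vector instead
seq_along(empty)
# integer(0)

# so a loop over seq_along() is simply skipped for an empty vector
visits <- 0
for (i in seq_along(empty)) {
  visits <- visits + 1
}
visits
# [1] 0
```
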
3.4 Replicated Vectors
Sometimes you need to create vectors containing repeated elements. To do this you can use the function rep(). This function takes a vector as the main input, and then it optionally takes various arguments (times, length.out, and each) that let you control the way in which the elements of the input vector should be repeated.
rep(1, times = 5) # repeat 1 five times
[1] 1 1 1 1 1
rep(c(1, 2), times = 3) # repeat 1 2 three times
[1] 1 2 1 2 1 2
rep(c(1, 2), each = 2) # each element repeated twice
[1] 1 1 2 2
rep(c(1, 2), length.out = 5) # repeat until length of 5
[1] 1 2 1 2 1
Here are two less simple examples:
rep(c(3, 2, 1), times = 3:1)
[1] 3 3 3 2 2 1
rep(c(3, 2, 1), times = 3, each = 2)
[1] 3 3 2 2 1 1 3 3 2 2 1 1 3 3 2 2 1 1
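rep() is not limited to numbers. As an illustrative sketch (the vectors quarters, quarter_labels, and year_labels are made up for this example), times and each can be used together to build matching label vectors:

```r
quarters <- c("Q1", "Q2", "Q3", "Q4")

# quarter labels for two years: the whole vector repeated twice
quarter_labels <- rep(quarters, times = 2)
quarter_labels

# matching year labels: each year repeated once per quarter
year_labels <- rep(c(2023, 2024), each = length(quarters))
year_labels
```
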
3.5 Coercion
One of the basic properties of vectors that you learned in the preceding chapter is that vectors are atomic objects. This is just the fancy way to say that all the elements of a vector have to be of the same data type. If I show you the following vectors, and ask you about their data types, you should have no problem answering this question:
one = c(TRUE, FALSE)
two = 2:4
three = c(11, 22, 33)
four = c("one", "two", "three")
If you have doubts about the data type of any of the above vectors, recall that you can use typeof() to get the answer.
But what if I create vectors by mixing elements of different data types? For example:
uno = c(FALSE, 1L) # logical & integer
dos = c(1L, 2L, 3) # integer & double
tres = c(1, 2, "3") # double & character
Enter coercion principles!
Coercion is another fundamental concept that you should learn about vectors. This has to do with the mechanisms that R uses to make sure that all the elements in a vector are of the same data type.
There are two coercion mechanisms or approaches:
- implicit coercion rules
- explicit coercion functions
3.5.1 Implicit Coercion Rules
Implicit coercion is what R does when we try to combine values of different types into a single vector. Here’s an example:
mixed <- c(TRUE, 1L, 2.0, "three")
mixed
[1] "TRUE" "1" "2" "three"
In this command we are mixing different data types: a logical TRUE, an integer 1L, a double 2.0, and a character "three". Now, even though the input values are of different data flavors, R has decided to convert everything into type "character". Technically speaking, R has implicitly coerced the values into characters, without asking for our permission and without even letting us know that it did so.
If you are not familiar with implicit coercion rules, you may get an initial impression that R is acting weirdly, in a nonsensical way. The more familiar you become with R, the more you will notice some interesting coercion patterns. But you don’t need to struggle figuring out what R will do. You just have to remember the following hierarchy:
\[ \mathsf{character > double > integer > logical} \]
Here’s how R works in terms of coercion:
- characters have priority over other data types: as long as one element is a character, all other elements are coerced into characters
- if a vector has numbers (double and integer) and logicals, double will dominate
- finally, when mixing integers and logicals, integers will dominate
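The hierarchy can be checked directly with typeof(). This is only a small sketch; the variable names are hypothetical.

```r
# logical + integer gives integer
int_wins <- typeof(c(TRUE, 1L))
int_wins       # "integer"

# adding a double gives double
dbl_wins <- typeof(c(TRUE, 1L, 2.5))
dbl_wins       # "double"

# adding a character gives character
chr_wins <- typeof(c(TRUE, 1L, 2.5, "a"))
chr_wins       # "character"

# logicals become 1 and 0 when coerced to numbers
as_numbers <- c(TRUE, FALSE, 2.5)
as_numbers     # 1.0 0.0 2.5

# and the strings "TRUE" and "FALSE" when coerced to characters
as_strings <- c(TRUE, "a")
as_strings     # "TRUE" "a"
```
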
Also, when certain operations are applied to certain data types, R may apply its coercion rules. An example of this behavior is when you have a logical vector on which you apply arithmetic operations:
# logical vector
logs = c(TRUE, FALSE, TRUE)
# addition (creates integers)
logs2 = logs + logs
typeof(logs2)
[1] "integer"
# multiplication (creates doubles)
logs3 = logs * 3
typeof(logs3)
[1] "double"
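A common practical use of this behavior is counting with logical vectors: because TRUE is treated as 1 and FALSE as 0, sum() counts the TRUE values and mean() gives their proportion. A minimal sketch, with made-up values and hypothetical names:

```r
balances <- c(1020, 1040.4, 1061.208)

# logical vector: which balances are above 1030?
above <- balances > 1030
above            # FALSE TRUE TRUE

# number of balances above 1030
n_above <- sum(above)
n_above          # 2

# proportion of balances above 1030
prop_above <- mean(above)
prop_above
```
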
3.5.2 Explicit Coercion Functions
The other type of coercion mechanism, known as explicit coercion, is done when you explicitly tell R to convert a certain type of vector into a different data type by using explicit coercion functions such as:
- as.integer()
- as.double()
- as.character()
- as.logical()
Depending on the type of input vector, and the coercion function, you may achieve what you want, or R may fail to convert things accordingly.
We can take deposit, which is of type double, and convert it into an integer with no issues:
int_deposit = as.integer(deposit)
int_deposit
[1] 1000
Interestingly, the way an integer number is displayed is exactly the same as its double version. To confirm that int_deposit is indeed of type integer you can use the is.integer() function:
is.integer(deposit)
[1] FALSE
is.integer(int_deposit)
[1] TRUE
What about trying to convert a character string such as "string" into an integer? You can try to apply as.integer() but in this case the attempt is fruitless:
as.integer("string")
Warning: NAs introduced by coercion
[1] NA
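A few more conversions are sketched below to show what the explicit coercion functions do with other inputs. The variable names are hypothetical; note in particular that as.integer() drops the decimal part rather than rounding.

```r
# decimals are dropped (truncated toward zero), not rounded
truncated <- as.integer(c(3.9, -3.9))
truncated        # 3 -3

# strings that look like numbers are converted
from_text <- as.integer("12")
from_text        # 12
dbl_text <- as.double("3.14")
dbl_text         # 3.14

# numbers become strings
as_text <- as.character(c(1L, 2L))
as_text          # "1" "2"

# only a few spellings are recognized as logical values;
# anything else, such as "yes", becomes NA
flags <- as.logical(c("TRUE", "false", "T", "yes"))
flags            # TRUE FALSE TRUE NA

# for numbers, zero is FALSE and everything else is TRUE
nonzero <- as.logical(0:2)
nonzero          # FALSE TRUE TRUE
```
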
3.6 Exercises
1) What is the data type, as returned by typeof(), of each of the following vectors? Try guessing the data type without running any commands.
- x: where x <- c(TRUE, FALSE)
- y: where y <- c(x, 10)
- z: where z <- c(y, 10, "a")
Show answer
x = c(TRUE, FALSE)
y = c(x, 10)
z = c(y, 10, "a")
typeof(x)
typeof(y)
typeof(z)
2) What is the data type, as returned by typeof(), of each of the following vectors? Try guessing the data type without running any commands.
- x: where x <- c('1', '2', '3', '4')
- y: where y <- (x == 1)
- z: where z <- y + 0
- w: where w <- c(x, "5.5")
- yz1: where yz1 <- c(y, z, pi)
Show answer
x <- c('1', '2', '3', '4')
y <- (x == 1)
z <- y + 0
w <- c(x, "5.5")
yz1 <- c(y, z, pi)
typeof(x)
typeof(y)
typeof(z)
typeof(w)
typeof(yz1)
3) Consider the data about the so-called terrestrial planets provided in the table below. These planets include Mercury, Venus, Earth, and Mars. They are called terrestrial because they are “Earth-like” planets, in contrast to the Jovian planets, which are the planets similar to Jupiter (i.e. Jupiter, Saturn, Uranus and Neptune). The main characteristics of terrestrial planets are that they are relatively small in size and in mass, with a solid rocky surface and metals deep in their interiors.
planet | gravity | daylength | temp | moons | haswater |
---|---|---|---|---|---|
Mercury | 3.7 | 4222.6 | 167 | 0 | FALSE |
Venus | 8.9 | 2802 | 464 | 0 | FALSE |
Earth | 9.8 | 24 | 15 | 1 | TRUE |
Mars | 3.7 | 24.7 | -65 | 2 | TRUE |
Create vectors for each of the columns in the data table displayed above, according to the following data-type specifications:
- planet: character vector
- gravity: real (i.e. double) vector (\(m/s^2\))
- daylength: real (i.e. double) vector (hours)
- temp: integer vector (mean temperature in Celsius)
- moons: integer vector (number of moons)
- haswater: logical vector indicating whether a planet has known bodies of liquid water on its surface
Show answer
planet = c("Mercury", "Venus", "Earth", "Mars")
gravity = c(3.7, 8.9, 9.8, 3.7)
daylength = c(4222.6, 2802, 24, 24.7)
temp = c(167L, 464L, 15L, -65L)
moons = c(0L, 0L, 1L, 2L)
haswater = c(FALSE, FALSE, TRUE, TRUE)
4) Refer to the vectors created in the previous question. Without running any R commands, try to guess the data type, as returned by typeof(), of the new vector you would get by combining, i.e. using the function c(), the following:
- planet with gravity
- planet with temp
- planet with haswater
- gravity with daylength
- gravity with temp
- temp with moons
- temp with haswater
5) Figure out how to use the function seq() to create the following vector
[1] 1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.0
Show answer
seq(from = 1, to = 2, by = 0.1)
6) Figure out how to use the function seq() to create the following vector
[1] 1000 900 800 700 600 500 400 300 200 100
Show answer
seq(from = 1000, to = 100, by = -100)
7) Figure out how to use the colon operator : to create the following vector
[1] 5 4 3 2 1 0 -1 -2 -3 -4 -5
Show answer
5:-5
8) Figure out how to use the colon operator : to create the following vector
[1] 9.25 8.25 7.25 6.25 5.25 4.25 3.25 2.25 1.25
Show answer
9.25:1.25
9) Find out how to use the function rep() and the input vector 1:3 to create the following vector:
[1] 1 1 2 2 3 3
Show answer
rep(1:3, each = 2)
10) Find out how to use the function rep() and the input vector 1:3 to create the following vector:
[1] 1 2 3 1 2 3
Show answer
rep(1:3, times = 2)
11) Find out how to use the function rep() and the input vector 1:4 to create the following vector:
[1] 1 2 2 3 3 3 4 4 4 4
Show answer
rep(1:4, times = 1:4)
12) Use the seq() function to create vectors for each of the following parts, and find their sum().
- What is the sum of the first 100 positive odd numbers?
Show answer
x = seq(from = 1, by = 2, length.out = 100)
sum(x)
- Find the sum of the first 64 terms of the arithmetic series: \(3 + 9 + 15 + 21 + \dots\)
Show answer
x = seq(from = 3, by = 6, length.out = 64)
sum(x)
- Find the partial sum of the arithmetic series below: \(7 + 12 + 17 + 22 + \dots + 187\)
Show answer
x = seq(from = 7, to = 187, by = 5)
sum(x)