7 Creating Vectors
In the preceding chapter you started learning about the basic properties of
vectors. We focused on the common flavors of vectors (e.g. logical, integer,
double, and character), reviewed special values (e.g. NULL, NA), talked
about the length or size of a vector, and we also mentioned that elements in a
vector can have names.
Likewise, you have seen two basic ways to create simple vectors:
creation of one-element vectors, e.g. money = 100, rate = 0.02, account = "savings"
creation of small vectors with more than one element using the combine function c(), e.g. years = c(1L, 2L, 3L).
In this chapter I discuss two broad topics: 1) a review of various functions and ways to create vectors, and 2) a description of the notion of coercion. Why should you learn about various forms of vector creation in R? Because, as I said, R is to a large extent a vector-based language, and you have to be ready to create multiple kinds of vectors, and to take advantage of some of the mechanisms that R provides for doing this. As for the topic of coercion, you also need to understand the behavior of R when working with vectors of different data types.
7.1 Creating vectors with c()
We’ve seen how to create simple vectors containing just one element (i.e. length-1 vectors):
# inputs
deposit = 1000
rate = 0.02
# amounts at the end of years 1, 2, and 3
amount1 = deposit * (1 + rate)
amount2 = amount1 * (1 + rate)
amount3 = amount2 * (1 + rate)
We’ve also seen the basic use of the combine function c() to create a
vector containing several elements:
# combine amounts in a single vector
amounts = c(amount1, amount2, amount3)
amounts
> [1] 1020.000 1040.400 1061.208
The c() function is one of the primary functions to create vectors of length
greater than one. Here’s another example of how to create a vector flavors
with some ice-cream flavors:
flavors <- c("lemon", "vanilla", "chocolate")
flavors
> [1] "lemon" "vanilla" "chocolate"
Basically, you call c() and you type in the values, separating them by
commas.
If your vector has only one element, you don’t need to call the c() function.
# no need to use c() to create a one-element vector
lemon = c("lemon")
# instead just do this
lemon = "lemon"
One more thing that you can do when using c() is to give names to the
elements of the created vector. This is done by joining pairs of
values of the form: 'name' = value, where 'name' is the name given to the
value of an element. For instance, you can create the vector
amounts and give names to each element like this:
# give names to elements when using c()
# (names specified with quotes)
amounts = c(
  "year1" = amount1,
  "year2" = amount2,
  "year3" = amount3)
amounts
>    year1    year2    year3
> 1020.000 1040.400 1061.208
As you can tell, the names of each element are specified as character
values: "year1", "year2", and "year3". Interestingly, you can also
specify names without quoting them:
# give names to elements when using c()
# (names unquoted)
amounts2 = c(
  year1 = amount1,
  year2 = amount2,
  year3 = amount3)
amounts2
>    year1    year2    year3
> 1020.000 1040.400 1061.208
This way of giving names to the elements of a vector can feel a bit surprising,
especially to users that have previous programming experience but are new to
the R syntax. Personally, I don’t really mind having two different (and
apparently confusing) ways to give names to elements when using functions like
c(). Having said that, I can perfectly understand the initial shock and
confusion that this may cause to inexperienced useRs.
To be consistent with most other languages, and also to play defensively, I tend to recommend quoting the values that are supposed to be the names of the elements in a vector. Again, this is my personal biased suggestion, and it is not a rule by any means.
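One practical reason to prefer quoted names can be sketched with a small example. The vectors balances, balances2 and balances3 below are hypothetical names used only for illustration; names() is the base R function that retrieves or sets element names.

```r
# quoted names may contain spaces or other characters
# that are not valid in unquoted names
balances <- c("year 1" = 1020, "year 2" = 1040.4)
names(balances)
# > [1] "year 1" "year 2"

# unquoted names have to be syntactically valid R names
balances2 <- c(year1 = 1020, year2 = 1040.4)

# names can also be assigned after the vector has been created
balances3 <- c(1020, 1040.4)
names(balances3) <- c("year1", "year2")

# both approaches produce the same named vector
identical(balances2, balances3)
# > [1] TRUE
```

Writing "year 1" without quotes would be a syntax error, so quoting is the option that works in every case.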
7.2 Default Vectors
R comes with a set of functions to initialize vectors of a specific data type.
The generic function is vector() but there are also type-specific versions:
vector()
logical()
integer()
double() and numeric()
character()
The function vector(), as the name indicates, lets you create a vector of
a given mode and of a certain length. By default, vector() creates a
"logical" vector of length = 0.
log = vector()
log
> logical(0)
length(log)
> [1] 0
Notice what happens when you print log: the output displayed is logical(0).
This is the notation that R uses to indicate that a logical vector is of length zero.
The previous call is equivalent to:
vector(mode = "logical", length = 0)
A common question that some useRs have when they encounter things like
logical(0) is “when do you use zero-length vectors?” The quick answer is:
you can use zero-length vectors to initialize a vector that will later be
populated with more elements. This typically happens when you know that a
vector of certain type is needed to store several values, but you don’t know
in advance how many elements will be computed.
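Here is a minimal sketch of that idea, under the assumption that we want to record yearly balances until a deposit has grown by at least 10%. The names balances and balance, the 10% target, and the use of a while loop are my own choices for illustration.

```r
# inputs (same values used earlier in the chapter)
deposit <- 1000
rate <- 0.02

# start with a zero-length double vector
balances <- double(0)

# we don't know in advance how many years are needed,
# so the vector is extended one element at a time
balance <- deposit
while (balance < 1.10 * deposit) {
  balance <- balance * (1 + rate)
  balances <- c(balances, balance)
}

length(balances)
# > [1] 5
```

Growing a vector with c() inside a loop is fine for small cases like this one; when the final length is known in advance, it is usually better to create a vector of that length up front.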
All the other functions, e.g. logical(), integer(), etc., take just one
argument, length, to indicate the number of elements of the output vector.
Keep in mind that you cannot choose the initial values: every element is set
to the default value of the corresponding type (FALSE, 0, or the empty string ""):
logical(length = 1)
> [1] FALSE
integer(length = 2)
> [1] 0 0
double(length = 3)
> [1] 0 0 0
character(length = 4)
> [1] "" "" "" ""
7.3 Numeric Sequences
A common situation when creating vectors involves creating numeric sequences.
If the numeric sequence is short and simple, it could be created with the
combine function c(), for example:
s1 = c(1, 2, 3, 4)
s1
> [1] 1 2 3 4
Often, you will have to create longer or less simple sequences. For these purposes there are two useful functions:
the colon operator ":"
the sequence function seq() and its siblings seq.int(), seq_along() and seq_len()
7.3.1 Sequences with :
The colon operator : lets you create numeric sequences by indicating the
starting and ending values. For instance, if you want to generate an integer
sequence starting at 1 and ending at 10, you use this command:
ints = 1:10
ints
> [1] 1 2 3 4 5 6 7 8 9 10
Notice that the colon operator, when used with whole numbers, will produce an integer sequence:
typeof(ints)
> [1] "integer"
However, when the starting value is not a whole number, then the generated
sequence will be of type double, with one-unit steps. For example:
dbls = 1.5:5.5
dbls
> [1] 1.5 2.5 3.5 4.5 5.5
typeof(dbls)
> [1] "double"
Run the following commands to see how R generates different sequences:
1.5:5
1.5:5.1
1.5:5.5
1.5:5.9
You can also create a descending sequence by starting with a value on the
left-hand side of : that is greater than the value on the right-hand side:
# descending (reversed) sequence
10:1
> [1] 10 9 8 7 6 5 4 3 2 1
# negative numbers work too (ascending here, since -10 is less than -1)
-10:-1
> [1] -10 -9 -8 -7 -6 -5 -4 -3 -2 -1
7.3.2 Sequences with seq()
The colon operator : can be very useful but it has its limitations. Its main
downside is that the generated sequences are of one-unit steps. But what if
you want a sequence with steps different from one-unit? For instance, what if
you are interested in something like: 2, 4, 6, 8, ...?
In addition to the colon operator, R also provides the more generic seq()
function for creating numeric sequences. This function comes with a couple of
parameters that let you generate sequences in various forms.
The simplest usage of seq() involves passing values for the arguments from
(the starting value) and to (the ending value):
# equivalent to 1:10
seq(from = 1, to = 10)
> [1] 1 2 3 4 5 6 7 8 9 10
As you can tell, the sequence is created with one-unit steps. But this can
be changed with the by argument. Say you want steps of two units; then
specify by = 2:
seq(from = 1, to = 10, by = 2)
> [1] 1 3 5 7 9
Now, what if you want a decreasing sequence, for example 10, 9, …, 1?
You can also use seq() to achieve this goal. The starting value from is 10,
the ending value to is 1, and the step size by has to be -1:
seq(from = 10, to = 1, by = -1)
> [1] 10 9 8 7 6 5 4 3 2 1
Sometimes you may be interested in creating a sequence of a specific length.
When this is the case, you need to use the length.out argument. For example,
say we want the first six even numbers, starting at 2.
One way to obtain this sequence is with from = 2, steps of size by = 2,
and a length of length.out = 6:
seq(from = 2, length.out = 6, by = 2)
> [1] 2 4 6 8 10 12
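The length.out argument can also be combined with from and to, in which case seq() works out the step size for you. The following is a small sketch; the name quarters is just an illustrative choice.

```r
# five equally spaced values between 0 and 1;
# the step size (0.25) is computed by seq()
quarters <- seq(from = 0, to = 1, length.out = 5)
quarters
# > [1] 0.00 0.25 0.50 0.75 1.00
```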
7.3.3 Sequences with seq_len() and seq_along()
seq() comes with sibling functions such as seq.int(), seq_len() and
seq_along(). These are more specialized functions than the generic seq(),
and they can be more efficient to generate certain sequences.
The function seq.int() is commonly used to generate integer sequences. The
difference from seq() is that seq.int() is implemented as a primitive
function, which can make it more efficient:
# equivalent to seq(from = 5, to = 10), but more efficient
seq.int(from = 5, to = 10)
> [1] 5 6 7 8 9 10
If you want a sequence of consecutive positive integers starting at 1,
seq_len() is your friend:
# equivalent to seq(from = 1, to = 10), but more efficient
seq_len(10)
> [1] 1 2 3 4 5 6 7 8 9 10
The third type of sequence function is seq_along(). This function takes
a vector of any length, and it produces a sequence of consecutive positive
integers of the same length as the input vector.
accounts = c("savings", "checking", "brokerage", "retirement")
seq_along(accounts)
> [1] 1 2 3 4
If the input vector has length zero, then seq_along() returns an integer
vector of length zero:
null = NULL  # length(null) is zero
seq_along(null)
> integer(0)
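This behavior is the main reason why seq_along() tends to be a safer choice than the colon operator when the input may be empty. The sketch below compares both approaches; the names empty, risky and safe are hypothetical.

```r
# a zero-length character vector
empty <- character(0)

# the colon operator counts down from 1 to 0,
# producing two positions that do not exist in 'empty'
risky <- 1:length(empty)
risky
# > [1] 1 0

# seq_along() correctly produces no positions at all
safe <- seq_along(empty)
safe
# > integer(0)
```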
7.4 Replicated Vectors
Sometimes you need to create vectors containing repeated elements. To do this
you can use the function rep(). This function takes a vector as the main
input, and then it optionally takes various arguments (times, length.out,
and each) that let you control the way in which the elements of the input
vector should be repeated.
rep(1, times = 5) # repeat 1 five times
> [1] 1 1 1 1 1
rep(c(1, 2), times = 3) # repeat 1 2 three times
> [1] 1 2 1 2 1 2
rep(c(1, 2), each = 2) # each element repeated twice
> [1] 1 1 2 2
rep(c(1, 2), length.out = 5) # repeat until length of 5
> [1] 1 2 1 2 1
Here are two less simple examples:
rep(c(3, 2, 1), times = 3:1)
> [1] 3 3 3 2 2 1
rep(c(3, 2, 1), times = 3, each = 2)
> [1] 3 3 2 2 1 1 3 3 2 2 1 1 3 3 2 2 1 1
7.5 Coercion
One of the basic properties of vectors that you learned in the preceding chapter is that vectors are atomic objects. This is just the fancy way to say that all the elements of a vector have to be of the same data type. If I show you the following vectors, and ask you about their data types, you should have no problem answering this question:
one = c(TRUE, FALSE)
two = 2:4
three = c(11, 22, 33)
four = c("one", "two", "three")
If you have doubts about the data type of any of the above vectors, recall
that you can use typeof() to get the answer.
But what if I create vectors by mixing elements of different data types? For example:
uno = c(FALSE, 1L)    # logical & integer
dos = c(1L, 2L, 3)    # integer & double
tres = c(1, 2, "3")   # double & character
Enter coercion principles!
Coercion is another fundamental concept that you should learn about vectors. This has to do with the mechanisms that R uses to make sure that all the elements in a vector are of the same data type.
There are two coercion mechanisms or approaches:
implicit coercion rules
explicit coercion functions
7.5.1 Implicit Coercion Rules
Implicit coercion is what R does when we try to combine values of different types into a single vector. Here’s an example:
mixed <- c(TRUE, 1L, 2.0, "three")
mixed
> [1] "TRUE" "1" "2" "three"
In this command we are mixing different data types: a logical TRUE, an integer
1L, a double 2.0, and a character "three". Now, even though the input
values are of different data flavors, R has decided to convert everything into
type "character". Technically speaking, R has implicitly coerced the
values as characters, without asking for our permission and without even
letting us know that it did so.
If you are not familiar with implicit coercion rules, you may get the initial impression that R is acting weirdly, in a nonsensical way. As you become more familiar with R, you will notice some interesting coercion patterns. But you don’t need to struggle to figure out what R will do. You just have to remember the following hierarchy:
\[ \mathsf{character > double > integer > logical} \]
Here’s how R works in terms of coercion:
characters have priority over other data types: as long as one element is a character, all other elements are coerced into characters
if a vector has numbers (double and integer) and logicals, double will dominate
finally, when mixing integers and logicals, integers will dominate
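The hierarchy can be checked with typeof() on a few mixed vectors. This is just a quick sketch: the vectors are defined inside the block so that it can be run on its own.

```r
uno <- c(FALSE, 1L)    # logical & integer
dos <- c(1L, 2L, 3)    # integer & double
tres <- c(1, 2, "3")   # double & character

typeof(uno)
# > [1] "integer"
typeof(dos)
# > [1] "double"
typeof(tres)
# > [1] "character"

# the logical FALSE became the integer 0
uno
# > [1] 0 1
```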
Also, when certain operations are applied to certain data types, R may apply its coercion rules. An example of this behavior is when you have a logical vector on which you apply arithmetic operations:
# logical vector
logs = c(TRUE, FALSE, TRUE)
# addition (creates integers)
logs2 = logs + logs
typeof(logs2)
> [1] "integer"
# multiplication by the double 3 (creates doubles)
logs3 = logs * 3
typeof(logs3)
> [1] "double"
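A common way to take advantage of this behavior is to count or average logical values, since TRUE is treated as 1 and FALSE as 0. The following sketch uses a hypothetical vector has_water, with names n_true and prop_true chosen only for illustration.

```r
# hypothetical logical vector
has_water <- c(FALSE, FALSE, TRUE, FALSE)

# number of TRUE values (logicals are coerced to integers)
n_true <- sum(has_water)
n_true
# > [1] 1
typeof(n_true)
# > [1] "integer"

# proportion of TRUE values
prop_true <- mean(has_water)
prop_true
# > [1] 0.25
```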
7.5.2 Explicit Coercion Functions
The other type of coercion mechanism, known as explicit coercion, is done when you explicitly tell R to convert a certain type of vector into a different data type by using explicit coercion functions such as:
as.integer()
as.double()
as.character()
as.logical()
Depending on the type of input vector, and the coercion function, you may achieve what you want, or R may fail to convert things accordingly.
We can take deposit, which is of type double, and convert it into an integer
with no issues:
int_deposit = as.integer(deposit)
int_deposit
> [1] 1000
Interestingly, the way an integer number is displayed is exactly the same
as its double version. To confirm that int_deposit is indeed of type
integer you can use the is.integer() function:
is.integer(deposit)
> [1] FALSE
is.integer(int_deposit)
> [1] TRUE
What about trying to convert a character string such as "string" into an
integer? You can try to apply as.integer() but in this case the attempt is
fruitless:
as.integer("string")
> Warning: NAs introduced by coercion
> [1] NA
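To get a better feeling for when explicit coercion succeeds, here is a short sketch with a few more conversions. The object names (a, b, d, e, f, g) are arbitrary and only used for illustration.

```r
# a string that looks like a number is converted without problems
a <- as.integer("42")
a
# > [1] 42

# doubles lose their decimal part (no rounding takes place)
b <- as.integer(3.9)
b
# > [1] 3

# integers become character strings
d <- as.character(1:3)
d
# > [1] "1" "2" "3"

# zero becomes FALSE, any other number becomes TRUE
e <- as.logical(0:2)
e
# > [1] FALSE  TRUE  TRUE

# only certain strings can be converted into logical values
f <- as.logical("TRUE")
f
# > [1] TRUE
g <- as.logical("yes")
g
# > [1] NA
```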
7.6 Exercises
1) What is the data type, as returned by typeof(), of each of the
following vectors? Try guessing the data type without running any commands.
x: where x <- c(TRUE, FALSE)
y: where y <- c(x, 10)
z: where z <- c(y, 10, "a")
2) What is the data type, as returned by typeof(), of each of the
following vectors? Try guessing the data type without running any commands.
x: where x <- c('1', '2', '3', '4')
y: where y <- (x == 1)
z: where z <- y + 0
w: where w <- c(x, "5.5")
yz1: where yz1 <- c(y, z, pi)
3) Consider the data about the so-called terrestrial planets provided in the table below. These planets are Mercury, Venus, Earth, and Mars. They are called terrestrial because they are “Earth-like” planets, in contrast to the Jovian planets, which are similar to Jupiter (i.e. Jupiter, Saturn, Uranus and Neptune). The main characteristics of terrestrial planets are that they are relatively small in size and in mass, with a solid rocky surface and metals deep in their interiors.
planet | gravity | daylength | temp | moons | haswater |
---|---|---|---|---|---|
Mercury | 3.7 | 4222.6 | 167 | 0 | FALSE |
Venus | 8.9 | 2802 | 464 | 0 | FALSE |
Earth | 9.8 | 24 | 15 | 1 | TRUE |
Mars | 3.7 | 24.7 | -65 | 2 | FALSE |
Create vectors for each of the columns in the data table displayed above, according to the following data-type specifications:
planet: character vector
gravity: real (i.e. double) vector (\(m/s^2\))
daylength: real (i.e. double) vector (hours)
temp: integer vector (mean temperature in Celsius)
moons: integer vector (number of moons)
haswater: logical vector indicating whether a planet has known bodies of liquid water on its surface
4) Refer to the vectors created in the previous question. Without running
any R commands, try to guess the data type, as returned by typeof(), that you
would get if you created a new vector by combining, with the function c(), the
following:
combine planet with gravity
combine planet with temp
combine planet with haswater
combine gravity with daylength
combine gravity with temp
combine temp with moons
combine temp with haswater
5) How do you use the function seq() to create the following vector?
[1] 1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.0
6) Write a command using the function rep() and the input vector 1:3 to
create the following vector:
[1] 1 1 2 2 3 3
7) Write a command using the function rep() and the input vector 1:3 to
create the following vector:
[1] 1 2 3 1 2 3
8) Write a command using the function rep() and the input vector 1:4 to
create the following vector:
[1] 1 2 2 3 3 3 4 4 4 4