3 Creating Vectors

In the preceding chapter you started learning about the basic properties of vectors. We focused on the common flavors of vectors (e.g. logical, integer, double, and character), reviewed special values (e.g. NULL, NA), talked about the length or size of a vector, and we also mentioned that elements in a vector can have names.

Likewise, you have seen two basic ways to create simple vectors:

creation of one-element vectors, e.g. money = 100, rate = 0.02, account = "savings"
creation of small vectors with more than one element, using the combine function c(), e.g. years = c(1L, 2L, 3L).

In this chapter I discuss two broad topics: 1) a review of various functions and ways to create vectors, and 2) the notion of coercion. Why should you learn about various forms of vector creation in R? Because, as I said, R is to a large extent a vector-based language, and you have to be ready to create multiple kinds of vectors and to take advantage of the mechanisms that R provides for doing this. As for coercion, you also need to understand how R behaves when working with vectors of different data types.

3.1 Creating vectors with `c()`

We’ve seen how to create simple vectors containing just one element (i.e. length-1 vectors):

# inputs
deposit = 1000
rate = 0.02

# amounts at the end of years 1, 2, and 3
amount1 = deposit * (1 + rate)
amount2 = amount1 * (1 + rate)
amount3 = amount2 * (1 + rate)

We’ve also seen the basic use of the combine function c() to create a vector containing several elements:

# combine amounts in a single vector
amounts = c(amount1, amount2, amount3)
amounts
> [1] 1020.000 1040.400 1061.208

The c() function is one of the primary functions to create vectors of length greater than one. Here’s another example that creates a vector flavors containing some ice-cream flavors:

flavors <- c("lemon", "vanilla", "chocolate")

flavors
> [1] "lemon"     "vanilla"   "chocolate"

Basically, you call c() and you type in the values, separating them by commas.
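The values passed to c() can themselves be vectors, in which case the result is a single flat vector. The following is a small sketch of this behavior; the names first_two and all_three are made up for illustration and reuse the amounts computed earlier:

```r
# c() also accepts vectors as inputs; the result is one flat vector
first_two = c(1020, 1040.4)
all_three = c(first_two, 1061.208)

all_three
# [1] 1020.000 1040.400 1061.208

length(all_three)
# [1] 3
```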

If your vector has only one element, you don’t need to call the c() function.

# no need to use c() to create a one-element vector
lemon = c("lemon")

# instead just do this
lemon = "lemon"

One more thing that you can do when using c() is to give names to the elements of the created vector. This is done by joining pairs of values of the form: 'name' = value, where 'name' is the name given to the value of an element. For instance, you can create the vector amounts and give names to each element like this:

# give names to elements when using c()
# (names specified with quotes)
amounts = c(
  "year1" = amount1, 
  "year2" = amount2, 
  "year3" = amount3)

amounts
>    year1    year2    year3 
> 1020.000 1040.400 1061.208

As you can tell, the names of each element are specified as character values: "year1", "year2", and "year3". Interestingly, you can also specify names without quoting them:

# give names to elements when using c()
# (names unquoted)
amounts2 = c(
  year1 = amount1, 
  year2 = amount2, 
  year3 = amount3)

amounts2
>    year1    year2    year3 
> 1020.000 1040.400 1061.208

This way of giving names to the elements of a vector can feel a bit surprising, especially to users that have previous programming experience but are new to the R syntax. Personally, I don’t really care about having 2 different—and apparently confusing—ways to give names to elements when using functions like c(). Having said that, I can perfectly understand the initial shock and confusion that this may cause to non-experienced useRs.

To be consistent with most other languages, and also to play defensively, I tend to recommend quoting the values that are supposed to be the names of the elements in a vector. Again, this is my personal biased suggestion, and it is not a rule by any means.
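As a quick check of the two naming styles, the sketch below compares a quoted and an unquoted version of the same vector (the object names quoted, unquoted, and spaced are hypothetical). It also shows one case where quoting seems to be the safer habit: names that are not syntactically valid, such as names containing a space.

```r
# same names, with and without quotes
quoted = c("year1" = 1020, "year2" = 1040.4)
unquoted = c(year1 = 1020, year2 = 1040.4)

# both calls produce exactly the same object
identical(quoted, unquoted)
# [1] TRUE

# names() retrieves the names as a character vector
names(quoted)
# [1] "year1" "year2"

# a name with a space must be quoted
spaced = c("year 1" = 1020, "year 2" = 1040.4)
names(spaced)
# [1] "year 1" "year 2"
```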

3.2 Default Vectors

R comes with a set of functions to initialize vectors of a specific data type. The generic function is vector() but there are also type-specific versions:

vector()
logical()
integer()
double() and numeric()
character()

The function vector(), as the name indicates, lets you create a vector of a given mode and of a certain length. By default, vector() creates a "logical" vector of length = 0.

log = vector()
log
> logical(0)

length(log)
> [1] 0

Notice what happens when you print log: the output displayed is logical(0). This is the notation that R uses to indicate that a vector is of length zero. The previous call is equivalent to:

vector(mode = "logical", length = 0)

A common question that some useRs have when they encounter things like logical(0) is “when do you use zero-length vectors”? The quick answer is: you can use zero-length vectors to initialize a vector that will later be populated with more elements. This typically happens when you know that a vector of certain type is needed to store several values, but you don’t know in advance how many elements will be computed.
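Here is a minimal sketch of that idea. It assumes a hypothetical goal (keep compounding the deposit until the balance exceeds 1100), so the number of values is not known before the loop runs. The names balances, target, and amount are made up for this illustration.

```r
# inputs (target is a hypothetical threshold)
deposit = 1000
rate = 0.02
target = 1100

# start with a zero-length double vector
balances = double(0)

# keep adding yearly amounts until the target is exceeded
amount = deposit
while (amount <= target) {
  amount = amount * (1 + rate)
  balances = c(balances, amount)
}

# number of years that were needed
length(balances)
# [1] 5
```

Growing a vector with c() inside a loop is fine for small cases like this one; when the final length is known in advance, creating the vector with that length from the start is usually preferable.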

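All the other functions, e.g. logical(), integer(), etc., take just one argument, length, to indicate the number of elements of the output vector. Keep in mind that the default value(s) used to fill the initialized vector cannot be changed: logical vectors are filled with FALSE, numeric vectors with 0, and character vectors with empty strings "":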

logical(length = 1)
> [1] FALSE

integer(length = 2)
> [1] 0 0

double(length = 3)
> [1] 0 0 0

character(length = 4)
> [1] "" "" "" ""
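The defaults are fixed by each function, but the elements of the resulting vector are ordinary values that can be overwritten afterwards. The sketch below uses bracket notation to assign into a pre-sized vector; the name filled is hypothetical.

```r
# initialize a double vector of length 3 (all zeros)
filled = double(length = 3)
filled
# [1] 0 0 0

# overwrite the first two elements
filled[1] = 1020
filled[2] = 1040.4

filled[2]
# [1] 1040.4

# the length and the type stay the same
length(filled)
# [1] 3
typeof(filled)
# [1] "double"
```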

3.3 Numeric Sequences

A common situation when creating vectors involves creating numeric sequences. If the numeric sequence is short and simple, it could be created with the combine function c(), for example:

s1 = c(1, 2, 3, 4)
s1
> [1] 1 2 3 4

Often, you will have to create less simple and/or longer sequences. For these purposes there are two useful functions:

the colon operator ":"
the sequence function seq() and its siblings seq.int(), seq_along() and seq_len()

3.3.1 Sequences with `:`

The colon operator : lets you create numeric sequences by indicating the starting and ending values. For instance, if you want to generate an integer sequence starting at 1 and ending at 10, you use this command:

ints = 1:10
ints
>  [1]  1  2  3  4  5  6  7  8  9 10

Notice that the colon operator, when used with whole numbers, will produce an integer sequence:

typeof(ints)
> [1] "integer"

However, when the starting value is not a whole number, then the generated sequence will be of type double, with one-unit steps. For example:

dbls = 1.5:5.5
dbls
> [1] 1.5 2.5 3.5 4.5 5.5

typeof(dbls)
> [1] "double"

Run the following commands to see how R generates different sequences:

1.5:5
1.5:5.1
1.5:5.5
1.5:5.9

You can also create a descending sequence by starting with a value on the left-hand side of : that is greater than the value on the right-hand side:

# descending (reversed) sequence
10:1
>  [1] 10  9  8  7  6  5  4  3  2  1

# this also applies to negative numbers
-10:-1
>  [1] -10  -9  -8  -7  -6  -5  -4  -3  -2  -1
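One detail worth keeping in mind when combining : with arithmetic is operator precedence. In R, the unary minus binds more tightly than the colon, and the colon binds more tightly than binary operators such as + and -. The sketch below (with a hypothetical value n) shows how parentheses change the result:

```r
n = 5

# parsed as (1:n) - 1
1:n - 1
# [1] 0 1 2 3 4

# parentheses give the sequence from 1 to n - 1
1:(n - 1)
# [1] 1 2 3 4

# parsed as (-2):2
-2:2
# [1] -2 -1  0  1  2

# negating a whole sequence requires parentheses
-(2:4)
# [1] -2 -3 -4
```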

3.3.2 Sequences with `seq()`

The colon operator : can be very useful but it has its limitations. Its main downside is that the generated sequences are of one-unit steps. But what if you want a sequence with steps different from one-unit? For instance, what if you are interested in something like: 2, 4, 6, 8, ...?

In addition to the colon operator, R also provides the more generic seq() function for creating numeric sequences. This function comes with a couple of parameters that let you generate sequences in various forms.

The simplest usage of seq() involves passing values for the arguments from (the starting value) and to (the ending value):

# equivalent to 1:10
seq(from = 1, to = 10)
>  [1]  1  2  3  4  5  6  7  8  9 10

As you can tell, the sequence is created with one-unit steps. But this can be changed with the by argument. Say you want steps of two-units, then specify by = 2:

seq(from = 1, to = 10, by = 2)
> [1] 1 3 5 7 9

Now, what if you want a decreasing sequence, for example 10, 9, …, 1? You can also use seq() to achieve this goal. The starting value from is 10, the ending value to is 1, and the step size by has to be -1:

seq(from = 10, to = 1, by = -1)
>  [1] 10  9  8  7  6  5  4  3  2  1

Sometimes you may be interested in creating a sequence of a specific length. When this is the case, you need to use the length.out argument. For example, say we want the first six even numbers, starting with 2. One way to obtain this sequence is with from = 2, steps of size by = 2, and a length of length.out = 6:

seq(from = 2, length.out = 6, by = 2)
> [1]  2  4  6  8 10 12
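Two further combinations of these arguments may be useful. The sketch below shows that length.out can be paired with to (so that R works out the step size), and that the ending value to is not always part of the output when by does not land on it. The object names evenly and stepped are made up for this illustration.

```r
# from, to, and length.out: R computes the step size
evenly = seq(from = 0, to = 1, length.out = 5)
evenly
# [1] 0.00 0.25 0.50 0.75 1.00

# from, to, and by: the sequence stops at or before "to"
stepped = seq(from = 1, to = 10, by = 4)
stepped
# [1] 1 5 9
```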

3.3.3 Sequences with `seq_len()` and `seq_along()`

seq() comes with sibling functions such as seq.int(), seq_len() and seq_along(). These are more specialized functions than the generic seq(), and they can be more efficient to generate certain sequences.

The function seq.int() is a primitive (internal) version of seq(). It is not limited to integer sequences; the main difference from seq() is that seq.int() can be much faster, at the cost of a few restrictions:

# equivalent to seq(from = 5, to = 10), but more efficient
seq.int(from = 5, to = 10)
> [1]  5  6  7  8  9 10

If you want a sequence of consecutive positive integers starting at 1, seq_len() is your friend:

# equivalent to seq(from = 1, to = 10), but more efficient
seq_len(10)
>  [1]  1  2  3  4  5  6  7  8  9 10

The third type of sequence function is seq_along(). This function takes a vector of any length, and it produces a sequence of consecutive positive integers of the same length as the input vector.

accounts = c("savings", "checking", "brokerage", "retirement")
seq_along(accounts)
> [1] 1 2 3 4

If the input vector has length zero, then seq_along() returns an integer vector of length zero, displayed as integer(0):

null = NULL  # length(null) is zero
seq_along(null)
> integer(0)
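This zero-length behavior is one reason seq_along() is often preferred over 1:length(x) when iterating over a vector. The following sketch, using a hypothetical empty vector named empty, compares both approaches:

```r
# an empty character vector
empty = character(0)

# 1:length(empty) is 1:0, a descending sequence of length 2
1:length(empty)
# [1] 1 0

# seq_along() correctly gives a zero-length sequence
seq_along(empty)
# integer(0)

# a loop over seq_along(empty) runs zero times
count = 0
for (i in seq_along(empty)) {
  count = count + 1
}
count
# [1] 0
```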

3.4 Replicated Vectors

Sometimes you need to create vectors containing repeated elements. To do this you can use the function rep(). This function takes a vector as the main input, and then it optionally takes various arguments (times, length.out, and each) that let you control the way in which the elements of the input vector should be repeated.

rep(1, times = 5)        # repeat 1 five times
> [1] 1 1 1 1 1

rep(c(1, 2), times = 3)  # repeat 1 2 three times
> [1] 1 2 1 2 1 2

rep(c(1, 2), each = 2)   # each element repeated twice
> [1] 1 1 2 2

rep(c(1, 2), length.out = 5)  # repeat until length of 5
> [1] 1 2 1 2 1

Here are two less simple examples:

rep(c(3, 2, 1), times = 3:1)
> [1] 3 3 3 2 2 1

rep(c(3, 2, 1), times = 3, each = 2)
>  [1] 3 3 2 2 1 1 3 3 2 2 1 1 3 3 2 2 1 1
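rep() is not restricted to numbers: it works with vectors of any type, and the output keeps the type of the input. The sketch below illustrates this with a character vector and an integer vector (the names kinds, cut, and ints are hypothetical):

```r
# repeat character elements a different number of times
kinds = rep(c("savings", "checking"), times = c(2, 1))
kinds
# [1] "savings"  "savings"  "checking"

# each and length.out combined: repeat each element twice,
# then truncate the result to three elements
cut = rep(c("a", "b"), each = 2, length.out = 3)
cut
# [1] "a" "a" "b"

# the data type of the input is preserved
ints = rep(1:2, times = 2)
typeof(ints)
# [1] "integer"
```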

3.5 Coercion

One of the basic properties of vectors that you learned in the preceding chapter is that vectors are atomic objects. This is just the fancy way to say that all the elements of a vector have to be of the same data type. If I show you the following vectors, and ask you about their data types, you should have no problem answering this question:

one = c(TRUE, FALSE)
two = 2:4
three = c(11, 22, 33)
four = c("one", "two", "three")

If you have doubts about the data type of any of the above vectors, recall that you can use typeof() to get the answer.

But what if I create vectors by mixing elements of different data types? For example:

uno = c(FALSE, 1L)    # logical & integer
dos = c(1L, 2L, 3)    # integer & double
tres = c(1, 2, "3")   # double & character

Enter coercion principles!

Coercion is another fundamental concept that you should learn about vectors. This has to do with the mechanisms that R uses to make sure that all the elements in a vector are of the same data type.

There are two coercion mechanisms or approaches:

implicit coercion rules
explicit coercion functions

3.5.1 Implicit Coercion Rules

Implicit coercion is what R does when we try to combine values of different types into a single vector. Here’s an example:

mixed <- c(TRUE, 1L, 2.0, "three")
mixed
> [1] "TRUE"  "1"     "2"     "three"

In this command we are mixing different data types: a logical TRUE, an integer 1L, a double 2.0, and a character "three". Now, even though the input values are of different data flavors, R has decided to convert everything into type "character". Technically speaking, R has implicitly coerced the values as characters, without asking for our permission and without even letting us know that it did so.

If you are not familiar with implicit coercion rules, you may get an initial impression that R is acting weirdly, in a nonsensical way. As you become more familiar with R, you will notice some interesting coercion patterns. But you don’t need to struggle figuring out what R will do. You just have to remember the following hierarchy:

\[ \mathsf{character > double > integer > logical} \]

Here’s how R works in terms of coercion:

characters have priority over other data types: as long as one element is a character, all other elements are coerced into characters
if a vector has numbers (double and integer) and logicals, double will dominate
finally, when mixing integers and logicals, integers will dominate
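The hierarchy can be checked with typeof() on the three mixed vectors introduced at the start of this section. This is just a verification sketch of the rules listed above:

```r
# logical & integer: integer dominates
uno = c(FALSE, 1L)
typeof(uno)
# [1] "integer"

# integer & double: double dominates
dos = c(1L, 2L, 3)
typeof(dos)
# [1] "double"

# double & character: character dominates
tres = c(1, 2, "3")
typeof(tres)
# [1] "character"
```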

Also, when certain operations are applied to certain data types, R may apply its coercion rules. An example of this behavior is when you have a logical vector on which you apply arithmetic operations:

# logical vector
logs = c(TRUE, FALSE, TRUE)

# addition (creates integers)
logs2 = logs + logs
typeof(logs2)
> [1] "integer"

# multiplication (creates doubles)
logs3 = logs * 3
typeof(logs3)
> [1] "double"
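A practical consequence of this behavior is that functions such as sum() and mean() can be applied directly to logical vectors: TRUE is treated as 1 and FALSE as 0. The sketch below counts the TRUE values and computes their proportion (the names how_many and proportion are made up):

```r
logs = c(TRUE, FALSE, TRUE)

# number of TRUE values (the result is an integer)
how_many = sum(logs)
how_many
# [1] 2
typeof(how_many)
# [1] "integer"

# proportion of TRUE values
proportion = mean(logs)
proportion
# [1] 0.6666667
```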

3.5.2 Explicit Coercion Functions

The other type of coercion mechanism, known as explicit coercion, is done when you explicitly tell R to convert a certain type of vector into a different data type by using explicit coercion functions such as:

as.integer()
as.double()
as.character()
as.logical()

Depending on the type of input vector, and the coercion function, you may achieve what you want, or R may fail to convert things accordingly.

We can take deposit, which is of type double, and convert it into an integer with no issues:

int_deposit = as.integer(deposit)
int_deposit
> [1] 1000

Interestingly, the way an integer number is displayed is exactly the same as its double version. To confirm that int_deposit is indeed of type integer you can use the is.integer() function:

is.integer(deposit)
> [1] FALSE
is.integer(int_deposit)
> [1] TRUE

What about trying to convert a character string such as "string" into an integer? You can try to apply as.integer() but in this case the attempt is fruitless:

as.integer("string")
> Warning: NAs introduced by coercion
> [1] NA
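Here are a few more explicit coercions, as a sketch of what tends to work and what does not. The object names are hypothetical. Note that as.integer() drops the decimal part of a double rather than rounding it, and that as.logical() only recognizes a limited set of strings such as "TRUE", "true", "T", "FALSE", "false", and "F".

```r
# character strings that look like numbers can be converted
from_text = as.integer("42")
from_text
# [1] 42

# doubles are truncated toward zero, not rounded
truncated = as.integer(c(3.9, -3.9))
truncated
# [1]  3 -3

# numbers to character
as_text = as.character(c(1.5, 2))
as_text
# [1] "1.5" "2"

# strings to logical: unrecognized strings become NA
flags = as.logical(c("TRUE", "false", "yes"))
flags
# [1]  TRUE FALSE    NA

# numbers to logical: zero is FALSE, anything else is TRUE
nonzero = as.logical(0:2)
nonzero
# [1] FALSE  TRUE  TRUE
```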

3.6 Exercises

1) What is the data type, as returned by typeof(), of each of the following vectors? Try guessing the data type without running any commands.

x: where x <- c(TRUE, FALSE)
y: where y <- c(x, 10)
z: where z <- c(y, 10, "a")

2) What is the data type, as returned by typeof(), of each of the following vectors? Try guessing the data type without running any commands.

x: where x <- c('1', '2', '3', '4')
y: where y <- (x == 1)
z: where z <- y + 0
w: where w <- c(x, "5.5")
yz1: where yz1 <- c(y, z, pi)

3) Consider the data about the so-called terrestrial planets provided in the table below. These planets are Mercury, Venus, Earth, and Mars. They are called terrestrial because they are “Earth-like” planets, in contrast to the Jovian planets, which are the planets similar to Jupiter (i.e. Jupiter, Saturn, Uranus and Neptune). The main characteristics of terrestrial planets are that they are relatively small in size and mass, with a solid rocky surface and metals deep in their interiors.

planet	gravity	daylength	temp	moons	haswater
Mercury	3.7	4222.6	167	0	FALSE
Venus	8.9	2802	464	0	FALSE
Earth	9.8	24	15	1	TRUE
Mars	3.7	24.7	-65	2	FALSE

Create vectors for each of the columns in the data table displayed above, according to the following data-type specifications:

planet: character vector
gravity: real (i.e. double) vector (\(m/s^2\))
daylength: real (i.e. double) vector (hours)
temp: integer vector (mean temperature in Celsius)
moons: integer vector (number of moons)
haswater: logical vector indicating whether a planet has known bodies of liquid water on its surface

4) Refer to the vectors created in the previous question. Without running any R commands, try to guess the data type, as returned by typeof(), of the new vector you would get by combining, i.e. using the function c(), the following:

planet with gravity
planet with temp
planet with haswater
gravity with daylength
gravity with temp
temp with moons
temp with haswater

5) Figure out how to use the function seq() to create the following vector

 [1] 1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.0

6) Figure out how to use the function seq() to create the following vector

 [1] 1000  900  800  700  600  500  400  300  200  100

7) Figure out how to use the colon operator : to create the following vector

 [1]  5  4  3  2  1  0 -1 -2 -3 -4 -5

8) Figure out how to use the colon operator : to create the following vector

[1] 9.25 8.25 7.25 6.25 5.25 4.25 3.25 2.25 1.25

9) Find out how to use the function rep() and the input vector 1:3 to create the following vector:

[1] 1 1 2 2 3 3

10) Find out how to use the function rep() and the input vector 1:3 to create the following vector:

[1] 1 2 3 1 2 3

11) Find out how to use the function rep() and the input vector 1:4 to create the following vector:

 [1] 1 2 2 3 3 3 4 4 4 4

12) Use the seq() function to create vectors for each of the following parts, and find their sum().

What is the sum of the first 100 positive odd numbers?
Find the sum of the first 64 terms of the arithmetic series: \(3 + 9 + 15 + 21 + \dots\)
Find the partial sum of the arithmetic series below: \(7 + 12 + 17 + 22 + \dots + 187\)