# 3 Creating Vectors

In the preceding chapter you started learning about the basic properties of
vectors. We focused on the common flavors of vectors (e.g. logical, integer,
double, and character), reviewed special values (e.g. `NULL`, `NA`), talked
about the length or size of a vector, and we also mentioned that elements in a
vector can have names.

Likewise, you have seen two basic ways to create simple vectors:

- creation of one-element vectors, e.g. `money = 100`, `rate = 0.02`, `account = "savings"`
- creation of small vectors with more than one element, using the combine function `c()`, e.g. `years = c(1L, 2L, 3L)`

In this chapter I discuss two broad topics: 1) a review of various functions
and ways to **create vectors**, and 2) a description of the **coercion** notion.
Why should you learn about various forms of vector creation in R? Because as I
said, R is—to a large extent—a vector-based language, and you have to be
ready to create multiple kinds of vectors, and to take advantage of some of the
mechanisms that R provides for doing this. As for the topic of coercion, you
also need to understand the behavior of R when working with vectors of different
data types.

## 3.1 Creating vectors with `c()`

We’ve seen how to create simple vectors containing just one element (i.e. length-1 vectors):

```
# inputs
deposit = 1000
rate = 0.02

# amounts at the end of years 1, 2, and 3
amount1 = deposit * (1 + rate)
amount2 = amount1 * (1 + rate)
amount3 = amount2 * (1 + rate)
```

We’ve also seen the basic use of the **combine** function `c()` to create a
vector containing several elements:

```
# combine amounts in a single vector
amounts = c(amount1, amount2, amount3)
amounts
> [1] 1020.000 1040.400 1061.208
```
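Because arithmetic in R operates on whole vectors, the same three amounts could also be obtained in a single expression. The following is a minimal sketch of that idea, reusing the values of `deposit` and `rate` from above; the names `years` and `amounts_alt` are introduced here only for illustration.

```r
# inputs (same values as above)
deposit <- 1000
rate <- 0.02

# illustrative helper vector: the years of interest
years <- 1:3

# compound the deposit for 1, 2, and 3 years in one step
amounts_alt <- deposit * (1 + rate)^years
amounts_alt
```

Printing `amounts_alt` should display the same three values as `amounts`.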

The `c()` function is one of the primary functions to create vectors of length
greater than one. Here’s another example of how to create a vector `flavors`
with some ice-cream flavors:

```
flavors <- c("lemon", "vanilla", "chocolate")
flavors
> [1] "lemon" "vanilla" "chocolate"
```

Basically, you call `c()` and you type in the values, separating them by
commas.

If your vector has only one element, you don’t need to call the `c()` function.

```
# no need to use c() to create a one-element vector
lemon = c("lemon")

# instead just do this
lemon = "lemon"
```

One more thing that you can do when using `c()` is to give names to the
elements of the created vector. This is done by joining pairs of
values of the form `'name' = value`, where `'name'` is the name given to the
`value` of an element. For instance, you can create the vector
`amounts` and give names to each element like this:

```
# give names to elements when using c()
# (names specified with quotes)
amounts = c(
  "year1" = amount1,
  "year2" = amount2,
  "year3" = amount3)

amounts
>    year1    year2    year3
> 1020.000 1040.400 1061.208
```

As you can tell, the names of each element are specified as `character` values:
`"year1"`, `"year2"`, and `"year3"`. Interestingly, you can also
specify names without quoting them:

```
# give names to elements when using c()
# (names unquoted)
amounts2 = c(
  year1 = amount1,
  year2 = amount2,
  year3 = amount3)

amounts2
>    year1    year2    year3
> 1020.000 1040.400 1061.208
```

This way of giving names to the elements of a vector can feel a bit surprising,
especially to users who have previous programming experience but are new to
the R syntax. Personally, I don’t really mind having two different (and
apparently confusing) ways to give names to elements when using functions like
`c()`. Having said that, I can perfectly understand the initial shock and
confusion that this may cause to inexperienced useRs.

To be consistent with most other languages, and also to play defensively, I tend to recommend quoting the values that are supposed to be the names of the elements in a vector. Again, this is my personal biased suggestion, and it is not a rule by any means.
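One practical reason for quoting is that some names are not valid R identifiers, for example names that contain a space. The sketch below (the vector names `quoted`, `unquoted`, and `spaced` are made up for illustration) checks that both styles give the same result for ordinary names, and shows a case where quotes are needed.

```r
# quoted and unquoted names produce the same named vector
quoted <- c("year1" = 1020, "year2" = 1040.4)
unquoted <- c(year1 = 1020, year2 = 1040.4)
identical(quoted, unquoted)

# names() retrieves the names as a character vector
names(unquoted)

# a name containing a space has to be quoted
spaced <- c("year 1" = 1020, "year 2" = 1040.4)
names(spaced)
```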

## 3.2 Default Vectors

R comes with a set of functions to initialize vectors of a specific data type.
The generic function is `vector()` but there are also type-specific versions:

- `vector()`
- `logical()`
- `integer()`
- `double()` and `numeric()`
- `character()`

The function `vector()`, as the name indicates, lets you create a vector of
a given `mode` and of a certain `length`. By default, `vector()` creates a
`"logical"` vector of `length = 0`.

```
log = vector()
log
> logical(0)

length(log)
> [1] 0
```

Notice what happens when you print `log`: the output displayed is `logical(0)`.
This is the notation that R uses to indicate that a vector is of length zero.
The previous call is equivalent to:

`vector(mode = "logical", length = 0)`

A common question that some useRs have when they encounter things like
`logical(0)` is “when do you use zero-length vectors?” The quick answer is:
you can use zero-length vectors to initialize a vector that will later be
populated with more elements. This typically happens when you know that a
vector of a certain type is needed to store several values, but you don’t know
in advance how many elements will be computed.
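As a minimal sketch of this pattern, the code below starts from a zero-length double vector and appends one balance per year until a target is reached. The `target` value and the variable names are assumptions chosen for illustration; the deposit and rate are the ones used earlier in the chapter.

```r
deposit <- 1000
rate <- 0.02
target <- 1100            # hypothetical savings goal

balances <- numeric(0)    # zero-length double vector
balance <- deposit

# keep compounding until the target is reached;
# the number of iterations is not known in advance
while (balance < target) {
  balance <- balance * (1 + rate)
  balances <- c(balances, balance)
}

length(balances)
balances
```

With these inputs the loop runs five times, so `balances` ends up with five elements. For long loops, growing a vector one element at a time can be slow, so this is best reserved for cases where the final length really is unknown.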

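All the other functions, e.g. `logical()`, `integer()`, etc., take just one
argument, `length`, to indicate the number of elements of the output vector.
Keep in mind that you cannot choose the initial values: each function fills the
vector with a fixed default for its type (`FALSE` for logical, `0` for integer
and double, and `""` for character):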

```
logical(length = 1)
> [1] FALSE
integer(length = 2)
> [1] 0 0
double(length = 3)
> [1] 0 0 0
character(length = 4)
> [1] "" "" "" ""
```
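The type-specific functions behave like shortcuts for `vector()` with the corresponding `mode`. The following quick check is only a sketch (the variable names are illustrative) that compares both forms with `identical()`:

```r
# vector() with an explicit mode vs. the type-specific function
same_int <- identical(vector(mode = "integer", length = 2), integer(length = 2))
same_chr <- identical(vector(mode = "character", length = 4), character(length = 4))

# numeric() and double() create the same kind of vector
same_num <- identical(numeric(length = 3), double(length = 3))

c(same_int, same_chr, same_num)
```

All three comparisons should return `TRUE`.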

## 3.3 Numeric Sequences

A common situation when creating vectors involves creating numeric sequences.
If the numeric sequence is short and simple, it could be created with the
combine function `c()`, for example:

```
s1 = c(1, 2, 3, 4)
s1
> [1] 1 2 3 4
```

Often, you will have to create longer or less simple sequences. For these purposes there are two useful tools:

- the colon operator `:`
- the sequence function `seq()` and its siblings `seq.int()`, `seq_along()` and `seq_len()`

### 3.3.1 Sequences with `:`

The colon operator `:` lets you create numeric sequences by indicating the
starting and ending values. For instance, if you want to generate an integer
sequence starting at 1 and ending at 10, you use this command:

```
ints = 1:10
ints
> [1] 1 2 3 4 5 6 7 8 9 10
```

Notice that the colon operator, when used with whole numbers, produces an integer sequence:

```
typeof(ints)
> [1] "integer"
```

However, when the starting value is not a whole number, then the generated
sequence will be of type `double`, with one-unit steps. For example:

```
dbls = 1.5:5.5
dbls
> [1] 1.5 2.5 3.5 4.5 5.5

typeof(dbls)
> [1] "double"
```

Run the following commands to see how R generates different sequences:

```
1.5:5
1.5:5.1
1.5:5.5
1.5:5.9
```
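If you run these commands, you should see that every sequence starts at 1.5, advances in steps of one, and stops at the last value that does not exceed the right-hand side. The sketch below stores the results (in variables named here just for illustration) so that their lengths can be compared.

```r
# the right-hand side is an upper limit, not necessarily the last value
s_a <- 1.5:5     # stops at 4.5
s_b <- 1.5:5.1   # stops at 4.5
s_c <- 1.5:5.5   # reaches 5.5 exactly
s_d <- 1.5:5.9   # stops at 5.5

# number of elements in each sequence
c(length(s_a), length(s_b), length(s_c), length(s_d))
```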

You can also create a **descending** sequence by starting with a value on the
left-hand side of `:` that is greater than the value on the right-hand side:

```
# descending (reversed) sequence
10:1
> [1] 10 9 8 7 6 5 4 3 2 1
```

```
# this also applies to negative numbers
-10:-1
> [1] -10 -9 -8 -7 -6 -5 -4 -3 -2 -1
```

### 3.3.2 Sequences with `seq()`

The colon operator `:` can be very useful but it has its limitations. Its main
downside is that the generated sequences always have one-unit steps. But what if
you want a sequence with a different step size? For instance, what if
you are interested in something like `2, 4, 6, 8, ...`?

In addition to the colon operator, R also provides the more generic `seq()`
function for creating numeric sequences. This function comes with a couple of
parameters that let you generate sequences in various forms.

The simplest usage of `seq()` involves passing values for the arguments `from`
(the starting value) and `to` (the ending value):

```
# equivalent to 1:10
seq(from = 1, to = 10)
> [1] 1 2 3 4 5 6 7 8 9 10
```

As you can tell, the sequence is created with one-unit steps. But this can
be changed with the `by` argument. Say you want steps of two units; then
specify `by = 2`:

```
seq(from = 1, to = 10, by = 2)
> [1] 1 3 5 7 9
```

Now, what if you want a decreasing sequence, for example 10, 9, …, 1?
You can also use `seq()` to achieve this goal. The starting value `from` is 10,
the ending value `to` is 1, and the step size `by` has to be `-1`:

```
seq(from = 10, to = 1, by = -1)
> [1] 10 9 8 7 6 5 4 3 2 1
```

Sometimes you may be interested in creating a sequence of a specific length.
When this is the case, you need to use the `length.out` argument. For example,
say we want the first six even numbers, starting with 2.
One way to obtain this sequence is with `from = 2`, steps of size `by = 2`,
and a length of `length.out = 6`:

```
seq(from = 2, length.out = 6, by = 2)
> [1] 2 4 6 8 10 12
```
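The `length.out` argument can also be combined with `from` and `to`, in which case `seq()` works out the step size for you. The following is a small sketch of that usage; the variable names `quarters` and `evens` are introduced only for illustration.

```r
# five equally spaced values from 0 to 1 (the implied step is 0.25)
quarters <- seq(from = 0, to = 1, length.out = 5)
quarters

# the first six even numbers again, fixing the end point instead of the step
evens <- seq(from = 2, to = 12, length.out = 6)
evens
```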

### 3.3.3 Sequences with `seq_len()` and `seq_along()`

`seq()` comes with sibling functions such as `seq.int()`, `seq_len()` and
`seq_along()`. These are more specialized functions than the generic `seq()`,
and they can be more efficient to generate certain sequences.

The function `seq.int()` is a primitive (internal) version of `seq()`. Despite
what its name may suggest, it is not limited to integer sequences. The
difference from `seq()` is that `seq.int()` has a few restrictions but can be
considerably faster:

```
# equivalent to seq(from = 5, to = 10), but more efficient
seq.int(from = 5, to = 10)
> [1] 5 6 7 8 9 10
```

If you want a sequence of consecutive positive integers starting at 1,
`seq_len()` is your friend:

```
# equivalent to seq(from = 1, to = 10), but more efficient
seq_len(10)
> [1] 1 2 3 4 5 6 7 8 9 10
```

The third type of sequence function is `seq_along()`. This function takes
a vector of any length, and it produces a sequence of consecutive positive
integers of the same length as the input vector.

```
accounts = c("savings", "checking", "brokerage", "retirement")
seq_along(accounts)
> [1] 1 2 3 4
```

If the input vector has length zero, then `seq_along()` returns a zero-length
integer vector, displayed as `integer(0)`:

```
null = NULL  # length(null) is zero
seq_along(null)
> integer(0)
```
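This behavior matters when a sequence is used to drive a loop. The sketch below (all variable names are illustrative) compares `seq_along()` with the colon operator applied to `length()`, which counts down from 1 to 0 when the input is empty:

```r
empty <- character(0)

# the colon operator produces a descending sequence: 1 0
colon_index <- 1:length(empty)
colon_index

# seq_along() produces a zero-length sequence
along_index <- seq_along(empty)
along_index

# count how many times a loop body would run with each index
loops_colon <- 0
for (i in colon_index) loops_colon <- loops_colon + 1

loops_along <- 0
for (i in along_index) loops_along <- loops_along + 1

c(colon = loops_colon, along = loops_along)
```

The loop over `colon_index` runs twice even though there is nothing to iterate over, whereas the loop over `along_index` does not run at all.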

## 3.4 Replicated Vectors

Sometimes you need to create vectors containing repeated elements. To do this
you can use the function `rep()`. This function takes a vector as the main
input, and then it optionally takes various arguments (`times`, `length.out`,
and `each`) that let you control the way in which the elements of the input
vector should be repeated.

```
rep(1, times = 5) # repeat 1 five times
> [1] 1 1 1 1 1
rep(c(1, 2), times = 3) # repeat 1 2 three times
> [1] 1 2 1 2 1 2
rep(c(1, 2), each = 2) # each element repeated twice
> [1] 1 1 2 2
rep(c(1, 2), length.out = 5) # repeat until length of 5
> [1] 1 2 1 2 1
```

Here are two less simple examples:

```
rep(c(3, 2, 1), times = 3:1)
> [1] 3 3 3 2 2 1
rep(c(3, 2, 1), times = 3, each = 2)
> [1] 3 3 2 2 1 1 3 3 2 2 1 1 3 3 2 2 1 1
```
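One way to read these two calls: when `times` is a vector of the same length as the input, each element is repeated its own number of times; when `each` and `times` are given together, `each` is applied first and the result is then repeated `times` times. The sketch below checks the second reading with illustrative variable names.

```r
x <- c(3, 2, 1)

# each and times in a single call
both <- rep(x, times = 3, each = 2)

# the same result in two explicit steps: each first, then times
two_steps <- rep(rep(x, each = 2), times = 3)
identical(both, two_steps)

# element-wise times also works with character vectors
letters_rep <- rep(c("a", "b", "c"), times = c(1, 2, 3))
letters_rep
```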

## 3.5 Coercion

One of the basic properties of vectors that you learned in the preceding chapter
is that vectors are *atomic* objects. This is just the fancy way to say that all
the elements of a vector have to be of the same data type. If I show you the
following vectors, and ask you about their data types, you should have no
problem answering this question:

```
one = c(TRUE, FALSE)
two = 2:4
three = c(11, 22, 33)
four = c("one", "two", "three")
```

If you have doubts about the data type of any of the above vectors, recall
that you can use `typeof()` to get the answer.

But what if I create vectors by mixing elements of different data types? For example:

```
uno = c(FALSE, 1L)   # logical & integer
dos = c(1L, 2L, 3)   # integer & double
tres = c(1, 2, "3")  # double & character
```

Enter **coercion** principles!

Coercion is another fundamental concept that you should learn about vectors. This has to do with the mechanisms that R uses to make sure that all the elements in a vector are of the same data type.

There are two coercion mechanisms or approaches:

- implicit coercion rules
- explicit coercion functions

### 3.5.1 Implicit Coercion Rules

**Implicit coercion** is what R does when we try to combine values of different
types into a single vector. Here’s an example:

```
mixed <- c(TRUE, 1L, 2.0, "three")
mixed
> [1] "TRUE" "1" "2" "three"
```

In this command we are mixing different data types: a logical `TRUE`, an integer
`1L`, a double `2.0`, and a character `"three"`. Now, even though the input
values are of different data flavors, R has decided to convert everything into
type `"character"`. Technically speaking, R has **implicitly coerced** the
values as characters, without asking for our permission and without even
letting us know that it did so.

If you are not familiar with implicit coercion rules, you may get the initial impression that R is acting weirdly, in a nonsensical way. As you become more familiar with R, you will notice some interesting coercion patterns. But you don’t need to struggle figuring out what R will do. You just have to remember the following hierarchy:

\[ \mathsf{character > double > integer > logical} \]

Here’s how R works in terms of coercion:

- characters have priority over other data types: as long as one element is a character, all other elements are coerced into characters
- if a vector has numbers (double and integer) and logicals, double will dominate
- finally, when mixing integers and logicals, integers will dominate
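The hierarchy can be verified with `typeof()`. The following sketch stores the type of a few mixed vectors; the variable names are made up for illustration.

```r
# type of the result when mixing two data types with c()
lgl_int <- typeof(c(TRUE, 1L))    # logical & integer
lgl_dbl <- typeof(c(TRUE, 2.5))   # logical & double
int_dbl <- typeof(c(1L, 2.5))     # integer & double
int_chr <- typeof(c(1L, "a"))     # integer & character

c(lgl_int, lgl_dbl, int_dbl, int_chr)

# the values themselves are converted too:
# TRUE becomes 1 among numbers, and "TRUE" among characters
num_mix <- c(TRUE, 2.5)
chr_mix <- c(TRUE, "a")
```

The four types should come out as `"integer"`, `"double"`, `"double"`, and `"character"`, in that order.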

Also, when certain operations are applied to certain data types, R may apply its coercion rules. An example of this behavior is when you have a logical vector on which you apply arithmetic operations:

```
# logical vector
logs = c(TRUE, FALSE, TRUE)

# addition (creates integers)
logs2 = logs + logs
typeof(logs2)
> [1] "integer"

# multiplication (creates doubles)
logs3 = logs * 3
typeof(logs3)
> [1] "double"
```
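A common use of this behavior is counting: since `TRUE` is treated as 1 and `FALSE` as 0, summing a logical vector gives the number of `TRUE` values, and averaging it gives their proportion. The sketch below uses a small made-up vector of temperatures; the names `temps` and `above_100` are illustrative.

```r
# illustrative temperatures (in Celsius)
temps <- c(167, 464, 15, -65)

# a comparison returns a logical vector
above_100 <- temps > 100

# TRUE counts as 1 and FALSE as 0
n_above <- sum(above_100)      # how many values exceed 100
prop_above <- mean(above_100)  # proportion of values exceeding 100

typeof(n_above)
```

Here two of the four values exceed 100, so `n_above` is 2 and `prop_above` is 0.5.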

### 3.5.2 Explicit Coercion Functions

The other type of coercion mechanism, known as **explicit coercion**, is done
when you explicitly tell R to convert a certain type of vector into a different
data type by using explicit coercion functions such as:

- `as.integer()`
- `as.double()`
- `as.character()`
- `as.logical()`

Depending on the type of input vector, and the coercion function, you may achieve what you want, or R may fail to convert things accordingly.

We can take `deposit`, which is of type double, and convert it into an integer
with no issues:

```
int_deposit = as.integer(deposit)
int_deposit
> [1] 1000
```

Interestingly, the way an `integer` number is displayed is exactly the same
as its `double` version. To confirm that `int_deposit` is indeed of type
`integer` you can use the `is.integer()` function:

```
is.integer(deposit)
> [1] FALSE
is.integer(int_deposit)
> [1] TRUE
```

What about trying to convert a character string such as `"string"` into an
integer? You can try to apply `as.integer()` but in this case the attempt is
fruitless:

```
as.integer("string")
> Warning: NAs introduced by coercion
> [1] NA
```
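A few more conversions are sketched below to show when explicit coercion succeeds, when it silently changes a value, and when it yields `NA`. The variable names are illustrative, and `suppressWarnings()` is used only to keep the output quiet.

```r
# doubles are truncated toward zero, not rounded
trunc_pos <- as.integer(3.9)
trunc_neg <- as.integer(-3.9)

# strings that look like numbers convert cleanly
from_text <- as.integer("42")

# only certain spellings are recognized as logical values
lgl_text <- as.logical(c("TRUE", "false", "T", "yes"))

# zero becomes FALSE, any other number becomes TRUE
lgl_num <- as.logical(c(0, 1, 2))

# a string that is not a number becomes NA (warning suppressed here)
not_a_number <- suppressWarnings(as.integer("string"))
```

Note that `as.integer(3.9)` gives 3 rather than 4, and that `"yes"` is not recognized as a logical value, so it becomes `NA`.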

## 3.6 Exercises

**1)** What is the data type, as returned by `typeof()`, of each of the
following vectors? Try guessing the data type without running any commands.

- `x`: where `x <- c(TRUE, FALSE)`
- `y`: where `y <- c(x, 10)`
- `z`: where `z <- c(y, 10, "a")`

**2)** What is the data type, as returned by `typeof()`, of each of the
following vectors? Try guessing the data type without running any commands.

- `x`: where `x <- c('1', '2', '3', '4')`
- `y`: where `y <- (x == 1)`
- `z`: where `z <- y + 0`
- `w`: where `w <- c(x, "5.5")`
- `yz1`: where `yz1 <- c(y, z, pi)`

**3)** Consider the data about the so-called **Terrestrial** planets provided
in the table below. These planets include Mercury, Venus, Earth, and Mars. They
are called terrestrial because they are “Earth-like” planets, in contrast to the
**Jovian** planets, which are similar to Jupiter (i.e. Jupiter,
Saturn, Uranus and Neptune). The main characteristic of terrestrial planets is that they are relatively small in size and in mass, with a solid rocky surface
and metals deep in their interiors.

| planet | gravity | daylength | temp | moons | haswater |
|---|---|---|---|---|---|
| Mercury | 3.7 | 4222.6 | 167 | 0 | FALSE |
| Venus | 8.9 | 2802 | 464 | 0 | FALSE |
| Earth | 9.8 | 24 | 15 | 1 | TRUE |
| Mars | 3.7 | 24.7 | -65 | 2 | FALSE |

Create vectors for each of the columns in the data table displayed above, according to the following data-type specifications:

- `planet`: character vector
- `gravity`: real (i.e. double) vector (\(m/s^2\))
- `daylength`: real (i.e. double) vector (hours)
- `temp`: integer vector (mean temperature in Celsius)
- `moons`: integer vector (number of moons)
- `haswater`: logical vector indicating whether a planet has known bodies of liquid water on its surface

**4)** Refer to the vectors created in the previous question. Without running
any R commands, try to guess the data type, as returned by `typeof()`, of the
new vector you would get by combining (i.e. using the function `c()`) the
following:

- `planet` with `gravity`
- `planet` with `temp`
- `planet` with `haswater`
- `gravity` with `daylength`
- `gravity` with `temp`
- `temp` with `moons`
- `temp` with `haswater`

**5)** Figure out how to use the function `seq()` to create the following vector:

` [1] 1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.0`

**6)** Figure out how to use the function `seq()` to create the following vector:

` [1] 1000 900 800 700 600 500 400 300 200 100`

**7)** Figure out how to use the colon operator `:` to create the following vector:

` [1] 5 4 3 2 1 0 -1 -2 -3 -4 -5`

**8)** Figure out how to use the colon operator `:` to create the following vector:

`[1] 9.25 8.25 7.25 6.25 5.25 4.25 3.25 2.25 1.25`

**9)** Find out how to use the function `rep()` and the input vector `1:3` to
create the following vector:

`[1] 1 1 2 2 3 3`

**10)** Find out how to use the function `rep()` and the input vector `1:3` to
create the following vector:

`[1] 1 2 3 1 2 3`

**11)** Find out how to use the function `rep()` and the input vector `1:4` to
create the following vector:

` [1] 1 2 2 3 3 3 4 4 4 4`

**12)** Use the `seq()` function to create vectors for each of the following
parts, and find their `sum()`.

- What is the sum of the first 100 positive odd numbers?
- Find the sum of the first 64 terms of the arithmetic series: \(3 + 9 + 15 + 21 + \dots\)
- Find the partial sum of the arithmetic series below: \(7 + 12 + 17 + 22 + \dots + 187\)