This part introduces the pipe operators, which allow you to write function calls in a more human-readable way. These operators can be extremely useful within tidyverse operations that require many steps.
We should note that there are two operators: %>% and |>. The %>% operator comes from the "magrittr" package, and it is the older of the two pipes available in R. The |> operator is more recent (introduced in R 4.1.0), and it is now part of "base" R.
To understand how the pipe operators work, let’s see a simple example. Suppose we want to generate three random numbers, following a uniform distribution, in the interval [0, 2]. We can do this with the function runif() as follows:
set.seed(123)
rand = runif(n = 3, min = 0, max = 2)
rand
[1] 0.5751550 1.5766103 0.8179538
The set.seed() function is used to set a random seed so that every time this code is run, runif() returns the same random numbers (for reproducibility purposes).
As you can tell, runif() takes three arguments:
n: number of values
min: lower limit of the distribution
max: upper limit of the distribution
It turns out that we can use the pipe operators to compute the same output. Here is how:
set.seed(123)
3 |> runif(min = 0, max = 2)
[1] 0.5751550 1.5766103 0.8179538
So what does the pipe do? It allows you to take a function call such as f(x, y) and write it as x |> f(y). In other words, you start with the first argument, then the pipe, then the function with the second (and any further) arguments.
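One way to check this rewriting rule is to compare both forms directly. The following is a minimal sketch, assuming R 4.1.0 or later (so that |> is available); paste_pair() is a hypothetical helper introduced only for illustration.

```r
# Hypothetical two-argument function, used only for illustration
paste_pair = function(x, y) paste(x, y, sep = "-")

# Conventional call: f(x, y)
direct = paste_pair("a", "b")

# Piped call: x |> f(y); the left-hand side becomes the first argument
piped = "a" |> paste_pair("b")

# Both forms give the same result
identical(direct, piped)  # TRUE

# The native pipe is rewritten when the code is parsed, so quoting
# the piped expression shows the ordinary call
quote(x |> f(y))          # f(x, y)
```

The last line suggests that, for the native pipe, the piped form is just another way of spelling the nested call.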
Keep in mind that the preceding example is extremely basic and it does not show the full potential of the pipe. To better appreciate its capabilities, let’s move on to a more interesting example.
Consider again the table sep2010:
sep2010
# A tibble: 8 × 5
name wind pressure category days
<chr> <dbl> <dbl> <chr> <int>
1 Gaston 35 1005 ts 1
2 Hermine 60 989 ts 4
3 Igor 135 924 cat4 13
4 Julia 120 948 cat4 8
5 Karl 110 956 cat3 4
6 Lisa 75 982 cat1 6
7 Matthew 50 998 ts 3
8 Nicole 40 994 ts 1
Say we are interested in the wind speed of hurricanes (that is, the storms whose category is not "ts"), and that in addition to having speeds measured in knots we also want them in miles-per-hour (mph) and in kilometers-per-hour (kph). And not only that, but we also want to arrange the hurricanes in table sep2010 by wind speed in increasing order.
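The unit conversions themselves are simple multiplications: one knot is approximately 1.15078 mph and exactly 1.852 kph, the same factors used in the code in this section. As a minimal base R sketch of the arithmetic (the vector knots is a hypothetical name, with values copied from the wind column for the four hurricanes):

```r
# Wind speeds in knots, copied from the wind column of sep2010
knots = c(Igor = 135, Julia = 120, Karl = 110, Lisa = 75)

# Conversion factors: 1 knot = 1.15078 mph (approx.) = 1.852 kph
mph = knots * 1.15078
kph = knots * 1.852

# Arrange in increasing order of wind speed
idx = order(knots)
knots[idx]
round(mph[idx], 1)
round(kph[idx], 1)
```

This only illustrates the calculation; the data-frame versions that follow do the same work on the whole table.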
One option is to do calculations step-by-step, storing the intermediate results in their own data objects.
# manipulation step-by-step
dat1 = filter(sep2010, category != "ts")
dat2 = select(dat1, name, wind)
dat3 = mutate(
  dat2,
  wind_mph = wind * 1.15078,
  wind_kph = wind * 1.852)
dat4 = arrange(dat3, wind)
dat4
# A tibble: 4 × 4
name wind wind_mph wind_kph
<chr> <dbl> <dbl> <dbl>
1 Lisa 75 86.3 139.
2 Karl 110 127. 204.
3 Julia 120 138. 222.
4 Igor 135 155. 250.
Another option, if you don’t want to name the intermediate results, requires wrapping the function calls inside each other:
# inside-out style (hard to read)
arrange(
  mutate(
    select(
      filter(sep2010, category != "ts"),
      name, wind),
    wind_mph = wind * 1.15078,
    wind_kph = wind * 1.852),
  wind)
# A tibble: 4 × 4
name wind wind_mph wind_kph
<chr> <dbl> <dbl> <dbl>
1 Lisa 75 86.3 139.
2 Karl 110 127. 204.
3 Julia 120 138. 222.
4 Igor 135 155. 250.
This is difficult to read because the order of the operations runs from the inside out, and the arguments end up a long way away from their function. It gets particularly ugly when you want to chain many operations at once.
To get around this problem, you can use a pipe: either %>% or |>.
As we mentioned, x |> f(y) turns into f(x, y), so you can use it to rewrite multiple operations in a form that you can read left-to-right, top-to-bottom:
# same manipulation, written with the pipe
sep2010 |>
  filter(category != "ts") |>
  select(name, wind) |>
  mutate(
    wind_mph = wind * 1.15078,
    wind_kph = wind * 1.852) |>
  arrange(wind)
# A tibble: 4 × 4
name wind wind_mph wind_kph
<chr> <dbl> <dbl> <dbl>
1 Lisa 75 86.3 139.
2 Karl 110 127. 204.
3 Julia 120 138. 222.
4 Igor 135 155. 250.
Notice how convenient the pipe can be in this case. It is easier to read, you can follow the flow of commands in a linear way, and it allows you to write commands in a way that is very close to the mental (or verbal) order in which you are thinking of the operations to be performed.
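The same pipeline can also be written with the magrittr pipe %>%. The sketch below assumes the dplyr package is installed (loading it makes %>% available) and R 4.1.0 or later for |>. To keep the block self-contained, a reduced copy of sep2010 is rebuilt as a plain data frame from the values printed earlier, with only the three columns the pipeline needs; this reconstruction is an assumption, not how the original table was created.

```r
library(dplyr)  # filter(), select(), mutate(), arrange(), and %>%

# Reduced copy of sep2010, rebuilt from the printed table (assumption)
sep2010 = data.frame(
  name = c("Gaston", "Hermine", "Igor", "Julia",
           "Karl", "Lisa", "Matthew", "Nicole"),
  wind = c(35, 60, 135, 120, 110, 75, 50, 40),
  category = c("ts", "ts", "cat4", "cat4", "cat3", "cat1", "ts", "ts")
)

# Pipeline written with the magrittr pipe
res_magrittr = sep2010 %>%
  filter(category != "ts") %>%
  select(name, wind) %>%
  mutate(
    wind_mph = wind * 1.15078,
    wind_kph = wind * 1.852) %>%
  arrange(wind)

# Same pipeline written with the native pipe
res_native = sep2010 |>
  filter(category != "ts") |>
  select(name, wind) |>
  mutate(
    wind_mph = wind * 1.15078,
    wind_kph = wind * 1.852) |>
  arrange(wind)

# Check that both versions produce the same table
all.equal(res_magrittr, res_native)
```

For simple chains like this one, where each step passes its result as the first argument of the next function, the two pipes can be used interchangeably.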