6  The Pipe Operators

This part introduces the pipe operators, which allow you to write function calls in a more human-readable way. These operators can be extremely useful in tidyverse operations that require many steps.

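We should note that there are two pipe operators: %>%, which comes from the tidyverse package magrittr, and |>, the native pipe built into R since version 4.1.0.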

6.1 Basic Piping

To understand how the pipe operators work, let’s see a simple example. Suppose we want to generate three random numbers, following a uniform distribution, in the interval [0, 2]. We can do this with the function runif() as follows:

set.seed(123)
rand = runif(n = 3, min = 0, max = 2)
rand
[1] 0.5751550 1.5766103 0.8179538

The set.seed() function fixes the seed of the random number generator, so that every time this code is run, runif() returns the same random numbers (for reproducibility purposes).
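As a quick check of this behavior, here is a minimal sketch that draws three samples: one right after setting the seed, one without resetting it, and one after resetting it to the same value. The object names first, second, and third are only for illustration.

```r
# first draw, right after setting the seed
set.seed(123)
first = runif(n = 3, min = 0, max = 2)

# second draw, without resetting the seed: the generator has moved on
second = runif(n = 3, min = 0, max = 2)

# third draw, after resetting the seed to the same value
set.seed(123)
third = runif(n = 3, min = 0, max = 2)

identical(first, third)   # TRUE: same seed, same numbers
identical(first, second)  # FALSE: the seed was not reset
```

The seed has to be set immediately before the call whose output you want to reproduce; setting it once does not make every later call return the same values.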

As you can tell, runif() takes three arguments:

  • n: number of values
  • min: lower limit of the distribution
  • max: upper limit of the distribution

It turns out that we can use the pipe operators to compute the same output. Here is how:

set.seed(123)
3 |> runif(min = 0, max = 2)
[1] 0.5751550 1.5766103 0.8179538

So what does the pipe do? It allows you to take a function call such as f(x, y) and express it as x |> f(y). In other words, you start with the first argument, then the pipe, then the function with the second (and any further) arguments. A call with a single argument works the same way: f(x) becomes x |> f().
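The rewriting rule can be checked directly with base R functions. The following is a small sketch, assuming R 4.1.0 or later (required for |>); the vector x and the objects same_one and same_two are made up for this illustration.

```r
x = c(4, 9, 16)

# one argument: f(x) becomes x |> f()
sqrt(x)
x |> sqrt()

# two arguments: f(x, y) becomes x |> f(y)
# (the parentheses around x / 3 matter: the pipe binds tighter than /)
round(x / 3, digits = 1)
(x / 3) |> round(digits = 1)

# both spellings return the same value
same_one = identical(sqrt(x), x |> sqrt())
same_two = identical(round(x / 3, digits = 1), (x / 3) |> round(digits = 1))
same_one
same_two
```

Note that the right-hand side of |> must be a function call, with parentheses: x |> sqrt() works, whereas x |> sqrt is a syntax error.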

Keep in mind that the preceding example is extremely basic and it does not show the full potential of the pipe. To better appreciate its capabilities, let’s move on to a more interesting example.

6.2 The Power of Piping

Consider again the table sep2010:

sep2010
# A tibble: 8 × 5
  name     wind pressure category  days
  <chr>   <dbl>    <dbl> <chr>    <int>
1 Gaston     35     1005 ts           1
2 Hermine    60      989 ts           4
3 Igor      135      924 cat4        13
4 Julia     120      948 cat4         8
5 Karl      110      956 cat3         4
6 Lisa       75      982 cat1         6
7 Matthew    50      998 ts           3
8 Nicole     40      994 ts           1

Say we are interested in the wind speed of hurricanes, so we set aside the tropical storms (category "ts"). In addition to having speeds measured in knots, we also want them in miles per hour (mph) and in kilometers per hour (kph). On top of that, we want to arrange the remaining hurricanes from table sep2010 by wind speed in increasing order.

Step-by-step computations

One option is to do calculations step-by-step, storing the intermediate results in their own data objects.

# manipulation step-by-step
dat1 = filter(sep2010, category != "ts")
dat2 = select(dat1, name, wind)
dat3 = mutate(
  dat2,
  wind_mph = wind * 1.15078,
  wind_kph = wind * 1.852)
dat4 = arrange(dat3, wind)
dat4
# A tibble: 4 × 4
  name   wind wind_mph wind_kph
  <chr> <dbl>    <dbl>    <dbl>
1 Lisa     75     86.3     139.
2 Karl    110    127.      204.
3 Julia   120    138.      222.
4 Igor    135    155.      250.

Nested function calls

Another option, if you don’t want to name the intermediate results, requires wrapping the function calls inside each other:

# inside-out style (hard to read)
arrange(
  mutate(
    select(
      filter(sep2010, category != "ts"), 
      name, wind),
    wind_mph = wind * 1.15078,
    wind_kph = wind * 1.852),
  wind)
# A tibble: 4 × 4
  name   wind wind_mph wind_kph
  <chr> <dbl>    <dbl>    <dbl>
1 Lisa     75     86.3     139.
2 Karl    110    127.      204.
3 Julia   120    138.      222.
4 Igor    135    155.      250.

This is difficult to read because the order of the operations runs from the inside out. It gets particularly ugly when you chain many operations at once, because the arguments end up a long way from the function they belong to.

To get around this problem, you can use a pipe: either %>% or |>.

6.2.1 The pipe operator

As we mentioned, x |> f(y) turns into f(x, y), so you can use it to rewrite a sequence of operations in a form that reads left-to-right, top-to-bottom:

# manipulation with the pipe
sep2010 |> 
  filter(category != "ts") |>
  select(name, wind) |>
  mutate(
    wind_mph = wind * 1.15078,
    wind_kph = wind * 1.852) |>
  arrange(wind)
# A tibble: 4 × 4
  name   wind wind_mph wind_kph
  <chr> <dbl>    <dbl>    <dbl>
1 Lisa     75     86.3     139.
2 Karl    110    127.      204.
3 Julia   120    138.      222.
4 Igor    135    155.      250.

Notice how convenient the pipe can be in this case. It is easier to read, you can follow the flow of commands in a linear way, and it allows you to write commands in a way that is very close to the mental (or verbal) order in which you are thinking of the operations to be performed.
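To try this outside the chapter's session, here is a self-contained sketch that runs the same pipeline with both pipes. It makes a few assumptions: the package dplyr is installed, R is version 4.1.0 or later, and sep2010 is rebuilt by hand from the values printed above as a plain data frame rather than the original tibble. The names res_native and res_magrittr are only for illustration.

```r
library(dplyr)

# rebuild sep2010 from the values printed earlier
# (assumption: a plain data frame stands in for the original tibble)
sep2010 = data.frame(
  name = c("Gaston", "Hermine", "Igor", "Julia",
           "Karl", "Lisa", "Matthew", "Nicole"),
  wind = c(35, 60, 135, 120, 110, 75, 50, 40),
  pressure = c(1005, 989, 924, 948, 956, 982, 998, 994),
  category = c("ts", "ts", "cat4", "cat4", "cat3", "cat1", "ts", "ts"),
  days = c(1L, 4L, 13L, 8L, 4L, 6L, 3L, 1L)
)

# pipeline written with the native pipe
res_native = sep2010 |>
  filter(category != "ts") |>
  select(name, wind) |>
  mutate(
    wind_mph = wind * 1.15078,
    wind_kph = wind * 1.852) |>
  arrange(wind)

# same pipeline written with the magrittr pipe (re-exported by dplyr)
res_magrittr = sep2010 %>%
  filter(category != "ts") %>%
  select(name, wind) %>%
  mutate(
    wind_mph = wind * 1.15078,
    wind_kph = wind * 1.852) %>%
  arrange(wind)

res_native

# check that both pipes give the same result
all.equal(res_native, res_magrittr)
```

For a chain like this one, where each step passes its result as the first argument of the next function, the two operators are interchangeable.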