4 Vectorized loops with apply()

We finished the last chapter by writing code that simulates playing Game-A 10 times. As a recap, here is that code:

set.seed(133)

die = 1:6
number_games = 10

# initialize output matrix (to be populated in for-loop)
games = matrix(0, nrow = number_games, ncol = 4)

for (game in 1:number_games) {
  games[game, ] = sample(die, size = 4, replace = TRUE)
}

rownames(games) = paste0("game", 1:number_games)
colnames(games) = paste0("roll", 1:4)
games

The key part of this code is the for() loop, which stores the outcome of each game in the corresponding row of the games matrix.

We also wrote a second for() loop to determine whether each game (each row in games) had at least one six. This was done with the any() function, as depicted in the following diagram:

Figure 4.1: Diagram depicting the application of any() to all the rows of matrix games.

4.1 Function apply()

Instead of writing a loop to see which games are wins and which are losses, we can use the function apply(), which R users often describe as a vectorized loop function: it does the looping for us, so we don’t have to write the loop explicitly.

As the name indicates, apply() lets you apply a function to the parts of a matrix, which you select with the MARGIN argument. These parts can be:

  • its rows: MARGIN = 1

  • its columns: MARGIN = 2

  • both rows and columns, that is, each individual cell: MARGIN = c(1, 2)

For example, say you want to get the sum() of all the elements in each row of games. Here’s how to do that with apply():

# row sum
apply(X = games, MARGIN = 1, FUN = sum)
##  game1  game2  game3  game4  game5  game6  game7  game8  game9 game10 
##     12     13     16     15      7     13     13      8     10     17

We pass three inputs to apply(). The first, X, is the input matrix; the second is the MARGIN value; and the third, FUN, is the function to be applied. MARGIN = 1 means that FUN is applied row by row.
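To make "row by row" concrete, the following sketch uses a small made-up matrix (toy_games is a hypothetical stand-in for games, so the values are not the simulated ones) and checks that each entry returned by apply() is simply sum() evaluated on one row:

```r
# made-up 3 x 4 matrix standing in for `games` (3 games, 4 rolls each)
toy_games = matrix(
  c(1, 6, 4, 1,
    2, 3, 5, 2,
    6, 6, 1, 3),
  nrow = 3, byrow = TRUE)

# apply() hands each row of toy_games to sum(), one row at a time
row_sums = apply(X = toy_games, MARGIN = 1, FUN = sum)
row_sums
## [1] 12 12 16

# the second entry is what sum() returns for the second row alone
sum(toy_games[2, ])
## [1] 12
```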

Here’s another example. Say you want to obtain the product of all the elements in each column of games. This requires specifying MARGIN = 2 and FUN = prod:

# column product
apply(X = games, MARGIN = 2, FUN = prod)
## roll1 roll2 roll3 roll4 
## 38400 19440  5760   720

Or what if you want to get the minimum in each row of games? All you have to do is apply() the min function:

# row minimum
apply(X = games, MARGIN = 1, FUN = min)
##  game1  game2  game3  game4  game5  game6  game7  game8  game9 game10 
##      1      1      2      1      1      2      1      1      1      1
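The examples so far use MARGIN = 1 and MARGIN = 2. As a minimal sketch of the third option, MARGIN = c(1, 2), the code below applies a function to every cell of a small made-up matrix. Both toy_games and the helper is_six() are hypothetical names introduced only for this illustration:

```r
# made-up 2 x 4 matrix standing in for `games`
toy_games = matrix(
  c(1, 6, 4, 1,
    2, 3, 5, 2),
  nrow = 2, byrow = TRUE)

# hypothetical helper: is a single value equal to six?
is_six = function(x) {
  x == 6
}

# MARGIN = c(1, 2): is_six() is called once per cell,
# and the result keeps the shape of the input matrix
six_cells = apply(X = toy_games, MARGIN = c(1, 2), FUN = is_six)
six_cells
##       [,1]  [,2]  [,3]  [,4]
## [1,] FALSE  TRUE FALSE FALSE
## [2,] FALSE FALSE FALSE FALSE
```

For a comparison this simple, toy_games == 6 gives the same matrix directly; the cell-wise form of apply() is mostly useful when the function does not already work on a whole matrix.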

4.2 Anonymous functions and apply()

Sometimes there is no built-in function that does exactly what you need for the argument FUN. For instance, say you want to obtain the range of each row, that is, the maximum minus the minimum. R has a range() function, but it does not return a single value; it returns the minimum and the maximum of an input vector:

game_1 = c(1, 6, 4, 1)
range(game_1)
## [1] 1 6

If you want the range as a single number, you need to compute max() minus min():

game_1 = c(1, 6, 4, 1)
range_1 = max(game_1) - min(game_1)
range_1
## [1] 5

Because R does not have a built-in function that returns the range as a single value, we need to supply our own function to the FUN argument of apply(). When the function is fairly simple, we can create an anonymous function inside apply(). Here’s how:

# row ranges (with anonymous function)
apply(
  X = games, 
  MARGIN = 1, 
  FUN = function(x) max(x) - min(x))
##  game1  game2  game3  game4  game5  game6  game7  game8  game9 game10 
##      5      5      3      5      3      2      5      3      5      5

The function passed to FUN is called an anonymous function because it is created without being assigned a name.

An alternative option is to first create a function outside apply(), and then pass this function like any other function. This alternative is often preferred when the body of the function to be passed to apply() involves several lines of code.

For example, in the following code chunk we create a function vector_range(), which computes the statistical range, and then pass it to apply() to get the range of each row of the matrix games:

# auxiliary function to compute range
vector_range = function(x) {
  max(x) - min(x)
}

# row ranges
apply(
  X = games, 
  MARGIN = 1, 
  FUN = vector_range)
##  game1  game2  game3  game4  game5  game6  game7  game8  game9 game10 
##      5      5      3      5      3      2      5      3      5      5
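For contrast, here is a sketch of what happens if range() itself is passed to apply(), again on a made-up matrix (toy_games is a hypothetical stand-in for games). Because range() returns two values per row, apply() returns a matrix rather than a vector. Wrapping range() in diff() is one alternative way to get the single-number range:

```r
# made-up 3 x 4 matrix standing in for `games`
toy_games = matrix(
  c(1, 6, 4, 1,
    2, 3, 5, 2,
    6, 6, 1, 3),
  nrow = 3, byrow = TRUE)

# range() yields (min, max) for each row, so the result is a matrix
# with 2 rows and one column per row of toy_games
min_max = apply(X = toy_games, MARGIN = 1, FUN = range)
min_max
##      [,1] [,2] [,3]
## [1,]    1    2    1
## [2,]    6    5    6

# diff(range(x)) is max(x) - min(x), so this returns one value per row
row_ranges = apply(
  X = toy_games,
  MARGIN = 1,
  FUN = function(x) diff(range(x)))
row_ranges
## [1] 5 3 5
```

Note that the per-row results are stored in the columns of min_max, which is easy to overlook when the applied function returns more than one value.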

4.2.1 Number of wins with apply()

Let’s go back to the task of finding which games are wins. Because there’s no built-in function that checks whether any element of a vector is equal to six, we need to create an anonymous function for the FUN argument:

wins = apply(
  X = games, 
  MARGIN = 1,
  FUN = function(x) any(x == 6))

wins
##  game1  game2  game3  game4  game5  game6  game7  game8  game9 game10 
##   TRUE   TRUE  FALSE   TRUE  FALSE  FALSE   TRUE  FALSE   TRUE   TRUE
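The loop from the previous chapter is not reproduced in this section, so the following is only a sketch of what such a loop might look like, compared against the apply() version on a made-up matrix. The names toy_games, loop_wins, and apply_wins are assumptions made for this illustration:

```r
# made-up 3 x 4 matrix standing in for `games`
toy_games = matrix(
  c(1, 6, 4, 1,
    2, 3, 5, 2,
    6, 6, 1, 3),
  nrow = 3, byrow = TRUE)

# explicit loop: check each row for at least one six
loop_wins = logical(nrow(toy_games))
for (game in 1:nrow(toy_games)) {
  loop_wins[game] = any(toy_games[game, ] == 6)
}

# same check with apply() and an anonymous function
apply_wins = apply(
  X = toy_games,
  MARGIN = 1,
  FUN = function(x) any(x == 6))

loop_wins
## [1]  TRUE FALSE  TRUE
apply_wins
## [1]  TRUE FALSE  TRUE
```

Both versions give the same answer; apply() just removes the need to initialize the output vector and write the loop by hand.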

We can now compute the proportion of wins:

prop_wins = sum(wins) / number_games
prop_wins
## [1] 0.6
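Since TRUE counts as 1 and FALSE as 0 in arithmetic, the same proportion can also be obtained with mean(). A small sketch with a made-up logical vector (toy_wins is hypothetical, not the simulated wins):

```r
# made-up outcomes for 4 games
toy_wins = c(TRUE, FALSE, TRUE, TRUE)

# proportion of wins as a sum divided by the number of games
sum(toy_wins) / length(toy_wins)
## [1] 0.75

# mean() of a logical vector gives the same proportion
mean(toy_wins)
## [1] 0.75
```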