= 1:6
die
die
[1] 1 2 3 4 5 6
In the preceding chapter we described Game-A, and we were able to derive the probability of winning this game:
\[\begin{align*} Prob(\text{at least one six in 4 rolls}) &= 1 - Prob(\text{no six in 4 rolls}) \\ &= 1 - \left( \frac{5}{6} \right)^4 \\ &= 0.517747 \end{align*}\]
In this example, we are able to find the exact probability associated to the event of “winning Game-A”. However, not all probability problems have an analytical solution. When this is the case, we often rely on computers to implement simulations that allow us to find an approximate solution, and have a better understanding of the systematic long-term patterns that arise in random processes.
To go ahead with our exploration of simulations, let us pretend that the probability of winning Game-A had no analytical solution. Instead, let’s see how to use R for running various simulations to get an idea of the probabilities involved behind Game-A.
Because Game-A involves rolling a die four times, the first step is to create—in a computational sense—an object die, and learn how to roll it.
Perhaps the most straightforward way to create a die
object is with a numeric vector. One option for this is with a numeric sequence as follows:
= 1:6
die
die
[1] 1 2 3 4 5 6
Once we have an object die
, how do we roll it? Rolling a die, computationally speaking, implies getting a sample of size 1 from the elements in vector die
. To do this in R, we can use the function sample()
. This function takes an input vector and draws a random sample—of a given size—from the vector’s elements.
# random seed (for reproducibility purposes)
set.seed(5)
sample(die, size = 1)
[1] 2
Because sample()
is a function that generates random numbers, every time you invoke it you will get a different output.
# another "roll"
sample(die, size = 1)
[1] 3
# one more "roll"
sample(die, size = 1)
[1] 1
To be able to reproduce results with any random generator function, you can use set.seed()
to set the so-called random seed that the algorithms behind sample()
and friends employ to generate those values. By setting this seed, you can obtain the same reproducible output:
# random seed (for reproducibility purposes)
set.seed(5)
sample(die, size = 1)
[1] 2
sample()
What about rolling a pair of dice? We could repeat the sample()
command twice:
set.seed(5)
= sample(die, size = 1)
roll1 = sample(die, size = 1) roll2
To avoid repeating the sample()
command multiple times we can also change the value of the size
argument in sample()
:
set.seed(5)
sample(die, size = 2)
[1] 2 3
The issue, not evident here, is that sample()
–by default—draws samples without replacement. This means that if we want to get a sample of size six, we would get a simple rearrangement of the elements in die
, which may not be what we really want:
set.seed(5)
sample(die, size = 6)
[1] 2 3 1 5 4 6
Even worse, for bigger samples of size 7 or greater, we would run into some limitations:
set.seed(5)
sample(die, size = 7)
Error in sample.int(length(x), size, replace, prob): cannot take a sample larger than the population when 'replace = FALSE'
So, how do we draw samples with replacement? To draw samples with replacement we need to use the replace
argument as follows:
set.seed(5)
sample(die, size = 2, replace = TRUE)
[1] 2 3
So far, so good. We have a mechanism to simulate rolling a die four times:
set.seed(20)
= sample(die, size = 4, replace = TRUE)
four_rolls four_rolls
[1] 6 3 2 1
With an output vector of four rolls, the next step involves determining if any of the numbers is a six. To do this, we use a comparison to see which elements in four_rolls
are equal to 6:
== 6 four_rolls
[1] TRUE FALSE FALSE FALSE
This comparison returns a logical vector, which we can then pass to sum()
in order to count the number of TRUE
values:
= sum(four_rolls == 6)
count_sixes count_sixes
[1] 1
Because all we care about is knowing if there’s at least one six, we can also use the any()
function; here are a couple of examples of what this function does when testing equality to 6:
any(c(6, 3, 2, 1) == 6)
[1] TRUE
any(c(5, 3, 2, 1) == 6)
[1] FALSE
any(c(4, 6, 2, 6) == 6)
[1] TRUE
If any()
returns TRUE
, we win that game. If it returns FALSE
, we lose.
To summarize, here’s some code that allows us to play Game-A once, determining the winning status:
# playing Game-A once
set.seed(20)
= sample(die, size = 4, replace = TRUE)
four_rolls = any(four_rolls)
win win
[1] TRUE