26 Simulations (part 1)
In this chapter we are going to write code to simulate investing a lump-sum
in the stock market. We will use some functions from the collection of
"tidyverse"
packages.
# required package
library(tidyverse)
26.1 Motivation
We are going to consider investing in a total stock market index fund which is a financial instrument that tracks the performance of the U.S. stock market. This type of instrument typically takes the form of a mutual fund or an exchange-traded fund (ETF) that invests in a basket of stocks that passively tracks the performance of the U.S. stock market.
26.1.1 Investing a Lump-sum
Say you are going to start your investment with a lump-sum of $1000 in a U.S. total stock market index fund. In theory, this is money that you don’t plan to use within the next 5 years, and you are willing to invest it for the long run (10 years or more). How much money would you expect to get at the end of a 10-year period? Or after 15 years? Or after 20 years? Or after a longer period?
26.1.2 Example with Future Value
Suppose you want to invest $1000 in a financial product (e.g. Money Market, Certificate of Deposit, Mutual Fund, ETF) that has a hypothetical annual return rate of 5%. Assume that you will keep this investment for 10 years. How much money would you expect to get at the end of this period?
To answer this question, you can use the Future Value formula, assuming compound interest.
Here are the ingredients for the Future Value formula:
- \(C_0\) = cash flow at period 0 (i.e. how much you invest)
- \(r\) = rate of return (in a given period)
- \(n\) = number of periods
- \(\text{FV}\) = future value (i.e. what you’ll get)
\[ \text{FV} = C_0 (1 + r)^n \]
One of the critical ingredients to calculate the future of your investment is the annual rate of return \(r\). The issue is that annual rates of return when investing in the stock market fluctuate every year—because the market is volatile. Consequently, the exact answer to the above question is impossible to know.
However, we are going to make some assumptions, and run some simulations to get an idea of the behavior for this type of investment. For the sake of simplicity, we will ignore inflation.
26.2 Variable rates of return and volatility
For simulation purposes, we are going to assume a probability distribution on the rates of return. More specifically, we are going to assume that annual rates of return \(r_y\) follow a normal distribution with a certain mean \(\mu\) and a certain standard deviation \(\sigma\).
\[ \text{rate}_y \sim \mathcal{N}(\mu = \text{avg rate}, \hspace{1mm} \sigma = \text{avg volatility}) \]
Which \(\mu\) and which \(\sigma\)? We could look for historical data and use an average annual rate of return for \(\mu\), as well as an average annual volatility for \(\sigma\)
average annual return: \(\mu = 10\%\)
average annual volatility: \(\sigma = 18\%\)
The following diagram illustrates the flow of compounding for a certain invested amount \(amt_0\). Observe that a random rate of return \(t_t\) is generated for each year \(t = 1, 2, 3, ...\)

Figure 26.1: Random rates of return following a normal distribution
26.2.1 Example: Running 3 simulations
One the goals of this project is to run various simulations that give you a theoretical idea for the behavior of this investment.
Here’s a toy example with the annual balance of 3 simulations (see table below).
The first row (0) shows the
initial investment amount of $1000. Each year $360 are contributed at the end
of the year. The last row (6) shows the balance of the investment at the end
of year 5. Column 1 corresponds to the 1st simulation, column 2 to the 2nd
simulation, etc. Rates of return for each year are generated via rnorm()
in
such a way that \(r_y \sim N(\mu = 10\%, \sigma = 18\%)\)
<- 1000
initial_amount <- 5
years <- 3
repetitions <- 0.10
annual_return <- 0.18
annual_volatility
set.seed(12345)
<- matrix(0, nrow = years+1, ncol = repetitions)
balances
for (rept in 1:repetitions) {
<- rep(initial_amount, years+1)
balance <- rnorm(years, mean = annual_return, sd = annual_volatility)
return_rates for (y in 1:years) {
+1] <- balance[y] * (1 + return_rates[y])^1
balance[y
}<- balance
balances[ ,rept]
}
# adding row-and-column names
rownames(balances) <- 0:years
colnames(balances) <- paste0("sim", 1:repetitions)
# converting into data.frame (for tidyverse convenience)
= as.data.frame(balances)
balances $year = 0:years
balances balances
> sim1 sim2 sim3 year
> 0 1000.000 1000.0000 1000.000 0
> 1 1205.395 772.7679 1079.075 1
> 2 1479.868 937.6903 1539.966 2
> 3 1598.739 984.8438 1796.698 3
> 4 1628.109 1032.9547 2144.609 4
> 5 1968.481 965.3189 2069.342 5
26.2.2 Reshaping the table of simulations
To graph timelines with "ggplot2"
, it is more convenient to arrange the
content in what is called “tall” format. One option to do this is with the
melt()
function from package "reshape2"
. Assuming that the data displayed
above is in an object balances
, we can melt it as follows:
= pivot_longer(
sim_dat
balances, cols = starts_with("sim"),
names_to = c("simulation"),
values_to = "amount")
sim_dat> # A tibble: 18 × 3
> year simulation amount
> <int> <chr> <dbl>
> 1 0 sim1 1000
> 2 0 sim2 1000
> 3 0 sim3 1000
> 4 1 sim1 1205.
> 5 1 sim2 773.
> 6 1 sim3 1079.
> 7 2 sim1 1480.
> 8 2 sim2 938.
> 9 2 sim3 1540.
> 10 3 sim1 1599.
> 11 3 sim2 985.
> 12 3 sim3 1797.
> 13 4 sim1 1628.
> 14 4 sim2 1033.
> 15 4 sim3 2145.
> 16 5 sim1 1968.
> 17 5 sim2 965.
> 18 5 sim3 2069.
26.2.3 Graphing the Simulations
Having the data in this “tall’ format, we can easily graph it with ggplot()
and friends (btw: by now you should be abe to figure out the ggplot()
command
that produces the following graph):
ggplot(data = sim_dat, aes(x = year, y = amount)) +
geom_line(aes(group = simulation), alpha = 0.5) +
theme_minimal() +
geom_hline(yintercept = 0, color = "tomato") +
xlab("Years") +
ylab("Portfolio balance")
26.3 Running 100 simulations
What if we decide to increase the number of simulations to say 100? Here’s what could happen.
<- 1000
initial_amount <- 10
years <- 100
repetitions <- 0.10
annual_return <- 0.18
annual_volatility
set.seed(12345)
<- matrix(0, nrow = years+1, ncol = repetitions)
balances
for (rept in 1:repetitions) {
<- rep(initial_amount, years+1)
balance <- rnorm(years, mean = annual_return, sd = annual_volatility)
return_rates for (y in 1:years) {
+1] <- balance[y] * (1 + return_rates[y])^1
balance[y
}<- balance
balances[ ,rept]
}
# adding row-and-column names
rownames(balances) <- 0:years
colnames(balances) <- paste0("sim", 1:repetitions)
# converting into data.frame (for tidyverse convenience)
= as.data.frame(balances)
balances $year = 0:years
balances
= pivot_longer(
sim_dat as.data.frame(balances),
cols = starts_with("sim"),
names_to = c("simulation"),
values_to = "amount")
And then graph them:
In an attempt to provide a better way to observe the general behavior of the simulations, we can add reference lines for the 10-th and 90-th percentiles (colored in yellow), as well as the median line (blue line).
26.4 More Summary Statistics
Something of interest, could be the computation of numeric summaries that help us make sense of the obtained results from 100 simulations. For instance, we could calculate some quantiles of the portfolio balance along the retirement period.
We can then compute statistics
# average and median returns
<- apply(balances, 1, mean)
ret_mean <- apply(balances, 1, median)
ret_median <- apply(balances, 1, quantile, 0.1)
ret_perc10 <- apply(balances, 1, quantile, 0.9)
ret_perc90
# assembling separate data frames
<- data.frame(
dat_median year = 0:years,
amount = ret_median
)
<- data.frame(
dat_perc10 year = 0:years,
amount = ret_perc10
)
<- data.frame(
dat_perc90 year = 0:years,
amount = ret_perc90
)
With these statistics, we can enrich the plot
ggplot() +
geom_line(data = sim_dat,
aes(x = year, y = amount, group = simulation),
alpha = 0.2) +
geom_line(data = dat_perc10, aes(x = year, y = amount),
color = "orange", size = 1.2, alpha = 0.7) +
geom_line(data = dat_perc90, aes(x = year, y = amount),
color = "orange", size = 1.2, alpha = 0.7) +
geom_line(data = dat_median, aes(x = year, y = amount),
color = "blue", size = 1.2) +
theme_minimal() +
geom_hline(yintercept = 0, color = "tomato") +
xlab("Years") +
ylab("Portfolio balance")