STAT 20: Introduction to Probability and Statistics
Many chance problems can be treated like drawing (with replacement) from a box of numbered tickets.
Box models provide an analogy for many chance processes, which helps in analyzing chance variability.
\(X\) = Getting heads when tossing a fair coin (once).
Box with tickets:
\[ \boxed{ \ \fbox{0} \quad \fbox{1} \ } \]
Draw one ticket out of this box.
\(X\) = Getting heads when tossing a biased coin (2/3 chance of heads).
Box with tickets:
\[ \boxed{ \ \fbox{0} \quad \fbox{1} \quad \fbox{1} \ } \]
Draw one ticket out of this box.
\(X\) = Getting heads when tossing a biased coin (1/4 chance of heads).
Box with tickets:
\[ \boxed{ \ \fbox{0} \quad \fbox{0} \quad \fbox{0} \quad \fbox{1} \ } \]
Draw one ticket out of this box.
\(X\) = Number of heads when tossing a fair coin five times.
Box with tickets:
\[ \boxed{ \ \fbox{0} \quad \fbox{1} \ } \]
Draw five tickets with replacement out of this box, and add them up.
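As a rough sketch, the box for five tosses of a fair coin can be simulated in R with `sample()`. The names `box`, `draws`, and `num_heads`, and the seed, are illustrative choices rather than part of the original slides.

```r
# box with one ticket for tails (0) and one ticket for heads (1)
box <- c(0, 1)

set.seed(20)  # arbitrary seed, only for reproducibility

# draw five tickets with replacement and add them up
draws <- sample(box, size = 5, replace = TRUE)
num_heads <- sum(draws)
num_heads
```

Each run without the seed gives a different count between 0 and 5, which is the chance variability the box model is meant to capture.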
\(X\) = number of spots when rolling a die (once).
Box with tickets:
\[ \boxed{ \ \fbox{1} \quad \fbox{2} \quad \fbox{3} \quad \fbox{4} \quad \fbox{5} \quad \fbox{6} \ } \]
Draw one ticket out of this box.
\(X\) = Sum of the spots when rolling two dice.
Box with tickets:
\[ \boxed{ \ \fbox{1} \quad \fbox{2} \quad \fbox{3} \quad \fbox{4} \quad \fbox{5} \quad \fbox{6} \ } \]
Draw two tickets with replacement out of this box, and add them.
\(X\) is a random variable with the distribution shown below:
\[ X = \begin{cases} 3, \; \text{ with prob } 1/3\\ 4, \; \text{ with prob } 1/4\\ 5, \; \text{ with prob } 5/12 \end{cases} \]
Box with tickets:
\[ \boxed{ \ \fbox{3} \ \fbox{3} \ \fbox{3} \ \fbox{3} \quad \fbox{4} \ \fbox{4} \ \fbox{4} \quad \fbox{5} \ \fbox{5} \ \fbox{5} \ \fbox{5} \ \fbox{5} \ } \]
Draw one ticket out of this box.
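One way to check that this box matches the stated distribution is to build it in R and tabulate the tickets. This is a minimal sketch; the names `box`, `box_probs`, and `x` are assumptions made for illustration.

```r
# 12 tickets: four 3s, three 4s, five 5s
box <- c(rep(3, 4), rep(4, 3), rep(5, 5))

# proportion of each ticket value in the box: 1/3, 1/4, 5/12
box_probs <- table(box) / length(box)
box_probs

# one draw from the box
x <- sample(box, size = 1)
x
```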
\(X\) = Number of spins landing on red in five spins of a roulette wheel.
Box with 38 tickets:
\[ \boxed{ \ \underset{\text{18 black}}{\fbox{0} \ \fbox{0} \dots \fbox{0}} \quad \underset{\text{18 red}}{\fbox{1} \ \fbox{1} \dots \fbox{1}} \quad \underset{\text{2 green}}{\fbox{0} \ \fbox{0}} \ } \]
Draw five tickets with replacement out of this box, and add them.
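The roulette box can be sketched the same way. The code below assumes a wheel with 38 pockets, as in the box above; the names `box` and `num_red` and the seed are illustrative.

```r
# 38 tickets: 18 black (0), 18 red (1), 2 green (0)
box <- c(rep(0, 18), rep(1, 18), rep(0, 2))

set.seed(38)  # arbitrary seed, only for reproducibility

# number of reds in five spins: five draws with replacement, added up
num_red <- sum(sample(box, size = 5, replace = TRUE))
num_red

# chance of red on a single spin is the proportion of 1s in the box
mean(box)
```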
\(X\) = number of spots when rolling a die (once).
Draw one ticket out of this box:
\[ \boxed{ \ \fbox{1} \quad \fbox{2} \quad \fbox{3} \quad \fbox{4} \quad \fbox{5} \quad \fbox{6} \ } \]
E(X) = ?
Var(X) = ?
E(X) = Average of tickets in box
Var(X) = Variance of tickets in box
\[ E(X) = \frac{1 + 2 + 3 + 4 + 5 + 6}{6} = 3.5 \]
\[ Var(X) = \frac{(1-3.5)^2 + (2-3.5)^2 + \dots + (5-3.5)^2 + (6-3.5)^2}{6} = \frac{35}{12} \approx 2.92 \]
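The same two quantities can be computed directly from the tickets in R. This is a sketch with illustrative names (`box`, `box_avg`, `box_var`); note that the built-in `var()` divides by one less than the number of tickets, so it does not return the variance of the box.

```r
box <- 1:6

# E(X): average of the tickets in the box
box_avg <- mean(box)

# Var(X): average squared deviation from the box average
# (divide by the number of tickets, not by one less)
box_var <- mean((box - box_avg)^2)

box_avg   # 3.5
box_var   # 35/12, about 2.92

# var() uses the (number of tickets - 1) divisor, so it gives a different number
var(box)  # 3.5
```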
Draw two tickets with replacement out of this box, and add them.
\[ \boxed{ \ \fbox{1} \quad \fbox{2} \quad \fbox{3} \quad \fbox{4} \quad \fbox{5} \quad \fbox{6} \ } \]
Sum of dice \(S = X_1 + X_2\), where \(X_1\) is the number on the first ticket, and \(X_2\) is the number on the second ticket.
\(E(S) = ?\)
\(Var(S) = ?\)
Draw two tickets with replacement out of this box, and add them.
\[ \boxed{ \ \fbox{1} \quad \fbox{2} \quad \fbox{3} \quad \fbox{4} \quad \fbox{5} \quad \fbox{6} \ } \]
Sum of dice \(S = X_1 + X_2\), where \(X_1\) is the number on the first ticket, and \(X_2\) is the number on the second ticket.
\(E(S) = E(X_1 + X_2) = E(X_1) + E(X_2) = 2E(X) = 2(3.5) = 7\)
Draw two tickets with replacement out of this box, and add them.
\[ \boxed{ \ \fbox{1} \quad \fbox{2} \quad \fbox{3} \quad \fbox{4} \quad \fbox{5} \quad \fbox{6} \ } \]
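Sum of dice \(S = X_1 + X_2\), where \(X_1\) is the number on the first ticket, and \(X_2\) is the number on the second ticket.
\(Var(S) = Var(X_1 + X_2) = Var(X_1) + Var(X_2) = 2 \times Var(X) = 2(35/12) \approx 5.83\)
The variances add because the two draws are made with replacement, so they are independent.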
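As a check on these two results, the 36 equally likely pairs of tickets can be enumerated exactly in R. This is a sketch; `sums`, `exp_sum`, and `var_sum` are illustrative names.

```r
box <- 1:6

# all 36 equally likely pairs (X1, X2) and their sums
sums <- as.vector(outer(box, box, FUN = "+"))

exp_sum <- mean(sums)                 # E(S)
var_sum <- mean((sums - exp_sum)^2)   # Var(S)

exp_sum   # 7
var_sum   # 35/6, about 5.83
```

Both values agree with twice the average and twice the variance of the box.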
Expected value of \(S\), the sum of \(n\) draws from a box model:
\[ \Large E(S) = (\text{# of draws}) \times (\text{avg of box}) \]
\[ \begin{align} E(S) &= E(X_1 + \dots + X_n) \\ &= E(X_1) + \dots + E(X_n)\\ &= n \times E(X) \end{align} \]
Variance of \(S\), the sum of \(n\) draws (with replacement) from a box model:
\[ \Large Var(S) = (\text{# of draws}) \times (\text{variance of box}) \]
\[ \begin{align} Var(S) &= Var(X_1 + \dots + X_n) \\ &= Var(X_1) + \dots + Var(X_n)\\ &= n \times Var(X) \end{align} \]
How far off do we expect to be from the expected value?
\[ \Large SD(S) = (\text{# of draws})^{1/2} \times (\text{SD of box}) \]
\[ \begin{align} SD(S) &= \left( nVar(X) \right)^{1/2} \\ &= \sqrt{n} \times SD(X) \end{align} \]
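The three formulas can be bundled into one small helper. The function `sum_of_draws()` below is hypothetical (it is not a built-in R function) and assumes the draws are made with replacement.

```r
# expected value, variance, and SD of the sum of n draws
# (with replacement) from a box of tickets
sum_of_draws <- function(box, n) {
  box_avg <- mean(box)
  box_sd <- sqrt(mean((box - box_avg)^2))
  c(expected = n * box_avg,
    variance = n * box_sd^2,
    sd = sqrt(n) * box_sd)
}

# sum of two dice
sum_of_draws(box = 1:6, n = 2)
```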
The Normal distribution is the most important continuous distribution in Statistics.
Also known as the Gaussian distribution.
If a random variable \(X\) follows a normal distribution, we write:
\[ X \sim N(\mu, \sigma) \quad \text{or} \quad X \sim N(\mu, \sigma^2) \] where \(\mu\) is the mean, and \(\sigma\) is the SD (\(\sigma^2\) is the Var).
Total area under the curve?
Total area under the curve is 1
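A numerical check of this fact, sketched with `integrate()` applied to the Normal density `dnorm()`. The second call uses a mean of 1 and an SD of 2, values picked only for illustration.

```r
# numerically integrate the standard Normal density over the whole real line
area <- integrate(dnorm, lower = -Inf, upper = Inf)
area$value

# same check for a Normal curve with mean 1 and SD 2
area2 <- integrate(dnorm, lower = -Inf, upper = Inf, mean = 1, sd = 2)
area2$value
```

Both values come out equal to 1 up to numerical error, whatever the mean and SD.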
dnorm()
computes the density \(f(x)\) of \(X \sim N(\mu, \sigma)\)
pnorm()
computes the CDF \(F(x) = P(X \leq x)\) of \(X\)
qnorm()
is the inverse of the CDF; given a probability (or percentile) it returns the value on the x-axis that corresponds to that percentile
rnorm()
generates random numbers from a Normal distribution
dnorm()
computes the density \(f(x)\) of \(X \sim N(\mu, \sigma)\)
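A short illustration of `dnorm()`. The evaluation points and parameters are arbitrary choices; in R the third argument is the SD, not the variance.

```r
# density of the standard Normal at x = 0, which equals 1 / sqrt(2 * pi)
dnorm(0, mean = 0, sd = 1)   # 0.3989423

# density of a Normal with mean 1 and SD 2, evaluated at x = 0.5
dnorm(0.5, mean = 1, sd = 2)
```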
pnorm()
computes \(F(x)\) of \(X \sim N(\mu, \sigma)\)
example 1: \(\quad F(0.5) = P(X \leq 0.5)\) for \(X \sim N(1, 2)\)
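This example might be computed as follows, assuming that the 2 in \(N(1, 2)\) is the SD (R's `pnorm()` takes an SD, not a variance).

```r
# P(X <= 0.5) for a Normal with mean 1 and SD 2
p <- pnorm(0.5, mean = 1, sd = 2)
p   # about 0.40

# same probability after standardizing: z = (0.5 - 1) / 2 = -0.25
pnorm(-0.25)
```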
qnorm()
computes the inverse of \(F(x)\) for \(X \sim N(\mu, \sigma)\)
example 1: which \(x\) gives \(P(X \leq x) = 0.20\) for \(X \sim N(1, 2)\)
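A sketch of this example, again assuming that the 2 in \(N(1, 2)\) is the SD.

```r
# the 20th percentile of a Normal with mean 1 and SD 2
x <- qnorm(0.20, mean = 1, sd = 2)
x   # about -0.68

# check: plugging x back into the CDF recovers 0.20
pnorm(x, mean = 1, sd = 2)
```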
rnorm()
generates random numbers from a Normal distribution
example 1: generate 3 values from \(X \sim N(0, 1)\)
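This example could look like the following; the seed is an arbitrary choice so that the same three values come out on every run.

```r
set.seed(123)  # arbitrary seed, only for reproducibility

# three random values from the standard Normal
z <- rnorm(n = 3, mean = 0, sd = 1)
z
```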
As we increase the number of draws, the sum \(S\) and the average \(\bar{X}\) of the draws will follow an approximately Normal distribution.
Central Limit Theorem
Let \(\mu\) be the average of the box, and \(\sigma\) the SD of the box. For a large number of draws \(n\), approximately:
\(S \sim N(n \times \mu, \ \sqrt{n} \times \sigma)\)
\(\bar{X} \sim N(\mu, \ \sigma / \sqrt{n})\)
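To see the approximation at work, the sketch below compares a Normal-curve probability with a simulated one for the sum of 100 die rolls. The cutoff 340, the seed, and the number of repetitions are arbitrary choices, and no continuity correction is applied, so a small gap between the two numbers is expected.

```r
box <- 1:6
n <- 100

mu <- mean(box)                      # average of the box
sigma <- sqrt(mean((box - mu)^2))    # SD of the box

# Normal approximation to P(S <= 340)
approx_prob <- pnorm(340, mean = n * mu, sd = sqrt(n) * sigma)

# simulated proportion of sums that are at most 340
set.seed(2024)
sums <- replicate(10000, sum(sample(box, size = n, replace = TRUE)))
sim_prob <- mean(sums <= 340)

c(normal_approx = approx_prob, simulated = sim_prob)
```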
Net gain while betting on red on a roulette spin.
If we bet a dollar on red, then our net gain is
\[ \text{gain} = \begin{cases} +1 & \text{with prob } \frac{18}{38} \\ -1 & \text{with prob } \frac{20}{38} \end{cases} \]
# define the gain for a single spin
gain <- c(1, -1)
# define the corresponding probabilities
prob_gain <- c(18/38, 20/38)
# expected gain: each possible gain weighted by its probability
exp_gain <- sum(gain * prob_gain)
exp_gain
[1] -0.05263158
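The SD of the gain follows the same pattern as the expected gain. The sketch below also applies the sum-of-draws formulas to 100 bets; `var_gain`, `sd_gain`, and the choice `n = 100` are illustrative assumptions.

```r
gain <- c(1, -1)
prob_gain <- c(18/38, 20/38)

exp_gain <- sum(gain * prob_gain)

# variance: squared deviations from the expected gain, weighted by probability
var_gain <- sum((gain - exp_gain)^2 * prob_gain)
sd_gain <- sqrt(var_gain)
sd_gain

# expected net gain and SD after n = 100 one-dollar bets on red
n <- 100
c(expected = n * exp_gain, sd = sqrt(n) * sd_gain)
```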
library(ggplot2)

# simulate the net gain in 10 spins of roulette, 1000 times
gains <- replicate(
  n = 1000, # 1000 repetitions
  expr = {
    # gain on each of 10 spins, betting a dollar on red every time
    spins <- sample(x = gain, size = 10, prob = prob_gain, replace = TRUE)
    # net gain in 10 spins
    sum(spins)
  })

# empirical histogram
data.frame(gains) |>
  ggplot(aes(x = gains)) +
  geom_histogram(color = "white", binwidth = 2) +
  labs(title = "N = 10",
       x = "net gain") +
  theme_bw()
# simulate the net gain in 100 spins of roulette, 1000 times
gains <- replicate(
  n = 1000, # 1000 repetitions
  expr = {
    # gain on each of 100 spins, betting a dollar on red every time
    spins <- sample(x = gain, size = 100, prob = prob_gain, replace = TRUE)
    # net gain in 100 spins
    sum(spins)
  })

# empirical histogram
data.frame(gains) |>
  ggplot(aes(x = gains)) +
  geom_histogram(color = "white", binwidth = 2) +
  labs(title = "N = 100",
       x = "net gain") +
  theme_bw()
# simulate the net gain in 1000 spins of roulette, 1000 times
gains <- replicate(
  n = 1000, # 1000 repetitions
  expr = {
    # gain on each of 1000 spins, betting a dollar on red every time
    spins <- sample(x = gain, size = 1000, prob = prob_gain, replace = TRUE)
    # net gain in 1000 spins
    sum(spins)
  })

# empirical histogram
data.frame(gains) |>
  ggplot(aes(x = gains)) +
  geom_histogram(color = "white", binwidth = 8) +
  labs(title = "N = 1000",
       x = "net gain") +
  theme_bw()
# simulate the net gain in 5000 spins of roulette, 1000 times
gains <- replicate(
  n = 1000, # 1000 repetitions
  expr = {
    # gain on each of 5000 spins, betting a dollar on red every time
    spins <- sample(x = gain, size = 5000, prob = prob_gain, replace = TRUE)
    # net gain in 5000 spins
    sum(spins)
  })

# empirical histogram
data.frame(gains) |>
  ggplot(aes(x = gains)) +
  geom_histogram(color = "white", binwidth = 15) +
  labs(title = "N = 5000",
       x = "net gain") +
  theme_bw()
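The center and spread of these histograms can be compared with the box-model formulas. The sketch below does this for 1000 spins; the seed and the names `mu`, `sigma`, and `n_spins` are illustrative assumptions.

```r
gain <- c(1, -1)
prob_gain <- c(18/38, 20/38)

# average and SD of the box for a single spin
mu <- sum(gain * prob_gain)
sigma <- sqrt(sum((gain - mu)^2 * prob_gain))

n_spins <- 1000
set.seed(1000)  # arbitrary seed, only for reproducibility

# net gain in 1000 spins, repeated 1000 times
gains <- replicate(
  n = 1000,
  expr = sum(sample(x = gain, size = n_spins, prob = prob_gain, replace = TRUE)))

# simulated versus theoretical center and spread
c(sim_mean = mean(gains), theory_mean = n_spins * mu)
c(sim_sd = sd(gains), theory_sd = sqrt(n_spins) * sigma)
```

The simulated mean and SD should land close to \(n \times \mu\) and \(\sqrt{n} \times \sigma\), which are the parameters of the approximating Normal curve.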