Commit f4b2eea

committed
other edits
1 parent 74b7e35 commit f4b2eea

File tree

1 file changed

+7
-5
lines changed

6_STATINFERENCE/Statistical Inference Course Notes.Rmd

Lines changed: 7 additions & 5 deletions
@@ -647,12 +647,12 @@ grid.arrange(g, p, ncol = 2)
647647

648648
### Example - CLT with Bernoulli Trials (Coin Flips)
649649
- for this example, we will simulate $n$ flips of a possibly unfair coin
650-
- $X_i$ be the 0 or 1 result of the $i^{th}$ flip of a possibly unfair coin
650+
- let $X_i$ be the 0 or 1 result of the $i^{th}$ flip of a possibly unfair coin
651651
+ sample proportion, $\hat p$, is the average of the coin flips
652652
+ $E[X_i] = p$ and $Var(X_i) = p(1-p)$
653653
+ standard error of the mean is $SE = \sqrt{p(1-p)/n}$
654654
+ in principle, normalizing the sample proportion $\hat p$, we should get an approximately standard normal distribution $$\frac{\hat p - p}{\sqrt{p(1-p)/n}} \sim N(0,~1)$$
655-
- therefore, we will flip a coin $n$ times, take the sample proportion of heads (successes with probability $p$), subtract off 0.5 (ideal sample proportion) and multiply the result by divide by $\frac{1}{2 \sqrt{n}}$ and compare it to the standard normal
655+
- therefore, we will flip a coin $n$ times, take the sample proportion of heads (successes with probability $p$), subtract off 0.5 (the value of $p$ for a fair coin), divide the result by $\frac{1}{2 \sqrt{n}}$ (the standard error when $p = 0.5$; equivalently, multiply by $2\sqrt{n}$) and compare it to the standard normal
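
The standardization described above can be checked with a short simulation. This is only a minimal sketch, not the chunk from the notes: the seed, the number of flips `n`, and the number of simulations `nosim` are all assumed values, and the coin is taken to be fair so that centering at 0.5 is appropriate.

```r
set.seed(1)      # assumed seed, only for reproducibility
n     <- 30      # flips per simulated experiment (assumed)
p     <- 0.5     # fair coin, so centering at 0.5 is correct
nosim <- 10000   # number of simulated experiments (assumed)

# each row is one experiment of n flips; the row mean is the sample proportion
flips <- matrix(rbinom(nosim * n, size = 1, prob = p), nrow = nosim)
p_hat <- rowMeans(flips)

# standardize: subtract p and divide by the standard error sqrt(p(1 - p)/n)
z <- (p_hat - p) / sqrt(p * (1 - p) / n)

# for p = 0.5 the standard error is 1/(2 sqrt(n)), so dividing by it
# is the same as multiplying by 2 * sqrt(n)
z_alt <- (p_hat - 0.5) * 2 * sqrt(n)

# if the CLT approximation is reasonable, these should be near 0 and 1
c(mean = mean(z), sd = sd(z))
```

Repeating the run with a larger `n` or a `p` away from 0.5 (and centering at that `p` instead) gives a feel for how quickly the normal approximation becomes acceptable.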
656656

657657
```{r, echo = FALSE, fig.width=6, fig.height = 3, fig.align='center'}
658658
# specify number of simulations
@@ -711,7 +711,7 @@ g + facet_grid(. ~ size)
711711
* **95% confidence interval for the population mean $\mu$** is defined as $$\bar X \pm 2\sigma/\sqrt{n}$$ for the sample mean $\bar X \sim N(\mu, \sigma^2/n)$
712712
* you can choose to use 1.96 to be more accurate for the confidence interval
713713
* $P(\bar{X} > \mu + 2\sigma/\sqrt{n}~or~\bar{X} < \mu - 2\sigma/\sqrt{n}) \approx 5\%$
714-
* **interpretation**: if we were to repeated samples of size $n$ from the population and construct this confidence interval for each case, approximately 95% of the intervals will contain $\mu$
714+
* **interpretation**: if we were to repeatedly draw samples of size $n$ from the population and construct this confidence interval for each case, approximately 95% of the intervals will contain $\mu$
715715
* confidence intervals get **narrower** with less variability or
716716
larger sample sizes
717717
* ***Note:** Poisson and binomial distributions have exact intervals that don't require the CLT*
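
The repeated-sampling interpretation can be illustrated by simulation. The sketch below assumes a normal population with known $\sigma$; the values of `mu`, `sigma`, `n`, and `nosim` are hypothetical and chosen only for illustration.

```r
set.seed(2)      # assumed seed, only for reproducibility
mu    <- 10      # hypothetical population mean
sigma <- 2       # hypothetical (known) population standard deviation
n     <- 50      # sample size (assumed)
nosim <- 5000    # number of repeated samples (assumed)

# for each repeated sample, build the interval and record whether it contains mu
covered <- replicate(nosim, {
  x  <- rnorm(n, mean = mu, sd = sigma)
  ci <- mean(x) + c(-1, 1) * qnorm(0.975) * sigma / sqrt(n)
  ci[1] < mu && mu < ci[2]
})

# proportion of intervals that contain mu; expected to be close to 0.95
coverage <- mean(covered)
coverage
```

Replacing `qnorm(0.975)` with 2 gives the slightly wider rule-of-thumb interval, whose coverage should sit a little above 95%.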
@@ -729,9 +729,10 @@ mean(x) + c(-1, 1) * qnorm(0.975) * sd(x)/sqrt(length(x))
729729
### Confidence Interval - Bernoulli Distribution/Wald Interval
730730
* for Bernoulli distributions, $X_i$ is 0 or 1 with success probability $p$ and the variance is $\sigma^2 = p(1 - p)$
731731
* the confidence interval takes the form of $$\hat{p} \pm z_{1-\alpha/2}\sqrt{\frac{p(1-p)}{n}}$$
732-
* since the population proportion $p$ is unknown, we can use $\hat{p} = X/n$ as estimate
732+
* since the population proportion $p$ is unknown, we can use the sample proportion of successes $\hat{p} = X/n$ as an estimate
733733
* $p(1-p)$ is largest when $p = 1/2$, so a conservative 95% confidence interval can be calculated by $$\begin{aligned}
734-
\hat{p} \pm Z_{0.95} \sqrt{\frac{0.5(1-0.5)}{n}} & = \hat{p} \pm 1.96 \sqrt{\frac{1}{4n}}\\
734+
\hat{p} \pm z_{0.975} \sqrt{\frac{0.5(1-0.5)}{n}} & = \hat{p} \pm \texttt{qnorm(.975)} \sqrt{\frac{1}{4n}}\\
735+
& = \hat{p} \pm 1.96 \sqrt{\frac{1}{4n}}\\
735736
& = \hat{p} \pm \frac{1.96}{2} \sqrt{\frac{1}{n}}\\
736737
& \approx \hat{p} \pm \frac{1}{\sqrt{n}}\\
737738
\end{aligned}$$
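
As a rough numerical check of the derivation above, the Wald interval can be compared with the quick $\hat{p} \pm 1/\sqrt{n}$ approximation. The counts `x` and `n` below are hypothetical, not data from the notes.

```r
x <- 56          # hypothetical number of successes
n <- 100         # hypothetical number of trials
p_hat <- x / n   # sample proportion of successes

# Wald interval: plug p_hat into the standard error
wald <- p_hat + c(-1, 1) * qnorm(0.975) * sqrt(p_hat * (1 - p_hat) / n)

# quick interval: uses the worst case p = 1/2 and rounds 1.96/2 up to 1
quick <- p_hat + c(-1, 1) / sqrt(n)

rbind(wald = wald, quick = quick)
```

Because $p(1-p) \le 1/4$ and $1.96/2 < 1$, the quick interval is never narrower than the Wald interval, which makes it a convenient conservative approximation.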
@@ -948,6 +949,7 @@ t.test(g2, g1, paired = TRUE)
948949
* $S_p\left(\frac{1}{n_x} + \frac{1}{n_y}\right)^{1/2}$ = standard error
949950
* $S_p^2 = \{(n_x - 1) S_x^2 + (n_y - 1) S_y^2\}/(n_x + n_y - 2)$ = pooled variance estimator
950951
* this is effectively a weighted average between the two variances, such that different sample sizes are taken into account
952+
* for equal sample sizes, $n_x = n_y$, the pooled variance reduces to $S_p^2 = \frac{S_x^2 + S_y^2}{2}$ (the average of the two group variances)
951953
* ***Note:** this interval assumes **constant variance** across two groups; if variance is different, use the next interval*
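
The pooled variance and the resulting interval can be sketched on simulated data. The two groups below are hypothetical (simulated with equal variance and equal sizes), and the interval is assumed to take the usual form: difference in means $\pm$ a $t_{n_x + n_y - 2}$ quantile times the standard error. The result is compared against `t.test` with `var.equal = TRUE`.

```r
set.seed(3)                          # assumed seed, only for reproducibility
x <- rnorm(12, mean = 5, sd = 2)     # hypothetical group x
y <- rnorm(12, mean = 6, sd = 2)     # hypothetical group y
n_x <- length(x)
n_y <- length(y)

# pooled variance: weighted average of the two sample variances
sp2 <- ((n_x - 1) * var(x) + (n_y - 1) * var(y)) / (n_x + n_y - 2)

# with equal sample sizes this is the plain average of the two variances
equal_n_check <- all.equal(sp2, (var(x) + var(y)) / 2)

# standard error of the difference in means
se <- sqrt(sp2) * sqrt(1 / n_x + 1 / n_y)

# 95% interval for mean(y) - mean(x), assuming constant variance
ci <- mean(y) - mean(x) + c(-1, 1) * qt(0.975, df = n_x + n_y - 2) * se

# the same interval from the built-in test
tt <- t.test(y, x, var.equal = TRUE)$conf.int

rbind(manual = ci, t.test = as.numeric(tt))
```

Changing the group sizes so that `n_x` and `n_y` differ shows the weighting at work: the pooled variance moves toward the variance of the larger group.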
952954

953955
### Independent Group t Intervals - Different Variance

0 commit comments
