Commit 8923e45

committed
more edits
1 parent 28df974 commit 8923e45

3 files changed

+25
-10
lines changed

6_STATINFERENCE/Statistical Inference Course Notes.Rmd

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -304,7 +304,7 @@ ggplot(dat, aes(x = x, y = y, color = factor)) + geom_line(size = 2)
304304
```
305305

306306

307-
* **variance** = measure of spread, the square of expected distance from the mean (expressed in $X$'s units$^2$)
307+
* **variance** = measure of spread or dispersion, the expected squared distance of the variable from its mean (expressed in $X$'s units$^2$)
308308
- as we can see from above, higher variances $\rightarrow$ more spread, lower $\rightarrow$ smaller spread
309309
* $Var(X) = E[(X-\mu)^2] = E[X^2] - E[X]^2$
310310
* **standard deviation** $= \sqrt{Var(X)}$ $\rightarrow$ has same units as X
@@ -352,7 +352,7 @@ grid.raster(readPNG("figures/8.png"))
352352
```
353353

354354
* **distribution for mean of random samples**
355-
* expected value of the **mean** of distribution of means = expected value of the sample = population mean
355+
* expected value of the **mean** of distribution of means = expected value of the sample mean = population mean
356356
* $E[\bar X]=\mu$
357357
* variance of the distribution of means
358358
* $Var(\bar X) = \sigma^2/n$
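
The two identities touched by this hunk can be checked with a short derivation. This is a sketch that assumes the standard setting the notes appear to use: $X_1, \dots, X_n$ are independent draws from a population with mean $\mu$ and variance $\sigma^2$, and $\bar X = \frac{1}{n}\sum_{i=1}^n X_i$.

$$
\begin{aligned}
Var(X) &= E[(X-\mu)^2] = E[X^2] - 2\mu E[X] + \mu^2 = E[X^2] - E[X]^2 \\
E[\bar X] &= \frac{1}{n}\sum_{i=1}^n E[X_i] = \mu \\
Var(\bar X) &= \frac{1}{n^2}\sum_{i=1}^n Var(X_i) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}
\end{aligned}
$$

The last line relies on independence, which lets the variance of the sum split into the sum of the variances.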

8_PREDMACHLEARN/Practical Machine Learning Course Notes HTML.Rmd

Lines changed: 12 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -859,10 +859,10 @@ matlines(testFaith$waiting,pred1,type="l",,col=c(1,2,2),lty = c(1,1,1), lwd=3)
859859
+ multiple predictors (dummy/indicator variables) are created for factor variables
860860
- `plot(lm$finalModel)` = construct 4 diagnostic plots for evaluating the model
861861
+ ***Note**: more information on these plots can be found at `?plot.lm` *
862-
+ ***Residual vs Fitted***
862+
+ ***Residuals vs Fitted***
863863
+ ***Normal Q-Q***
864864
+ ***Scale-Location***
865-
+ ***Residual vs Leverage***
865+
+ ***Residuals vs Leverage***
866866

867867
```{r fig.align = 'center'}
868868
# create train and test sets
@@ -878,15 +878,23 @@ par(mfrow = c(2, 2))
878878
plot(finMod,pch=19,cex=0.5,col="#00000010")
879879
```
880880

881-
* plotting residuals by index can be helpful in showing missing variables
881+
* plotting residuals by fitted values and coloring with a variable not used in the model helps spot a trend in that variable.
882+
883+
```{r fig.width = 4, fig.height = 3, fig.align = 'center'}
884+
# plot residuals against fitted values, colored by a variable not in the model
885+
qplot(finMod$fitted, finMod$residuals, color=race, data=training)
886+
```
887+
888+
* plotting residuals by index (i.e., row numbers) can be helpful in showing missing variables
882889
- `plot(finMod$residuals)` = plot the residuals against index (row number)
883-
- if there's a trend/pattern in the residuals, it is highly likely that another variable (such as age/time) should be included
890+
- if there's a trend/pattern in the residuals, it is highly likely that another variable (such as age/time) should be included.
884891
+ residuals should not have relationship to index
885892

886893
```{r fig.width = 4, fig.height = 3, fig.align = 'center'}
887894
# plot residual by index
888895
plot(finMod$residuals,pch=19,cex=0.5)
889896
```
897+
890898
* here the residuals increase linearly with the index, and the highest residuals are concentrated in the higher indexes, so there must be a missing variable
891899

892900
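
The idea behind the added bullets (color the residuals by a variable the model does not use) can be illustrated with a minimal, self-contained sketch on simulated data. Everything here is hypothetical and is not the course's dataset: the names `x`, `g`, `y`, `fit`, and `group_means` are made up for illustration, and base graphics are used instead of `qplot` so that no extra packages are needed.

```r
# Simulated data: y depends on x and on a grouping variable g
set.seed(1)
n <- 200
x <- runif(n, 0, 10)
g <- factor(sample(c("a", "b"), n, replace = TRUE))
y <- 1 + 2 * x + 3 * (g == "b") + rnorm(n)

# Fit a model that deliberately leaves g out
fit <- lm(y ~ x)

# Residuals against fitted values, colored by the omitted variable
plot(fit$fitted.values, fit$residuals,
     col = as.integer(g) + 1, pch = 19, cex = 0.5,
     xlab = "fitted values", ylab = "residuals")

# Mean residual per group; a clear gap suggests g belongs in the model
group_means <- tapply(fit$residuals, g, mean)
print(group_means)
```

If the two colors separate into distinct bands in the plot, or the group means differ noticeably, the omitted variable is probably carrying signal that the model is missing.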

8_PREDMACHLEARN/Practical Machine Learning Course Notes.Rmd

Lines changed: 11 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -875,10 +875,10 @@ matlines(testFaith$waiting,pred1,type="l",,col=c(1,2,2),lty = c(1,1,1), lwd=3)
875875
+ multiple predictors (dummy/indicator variables) are created for factor variables
876876
- `plot(lm$finalModel)` = construct 4 diagnostic plots for evaluating the model
877877
+ ***Note**: more information on these plots can be found at `?plot.lm` *
878-
+ ***Residual vs Fitted***
878+
+ ***Residuals vs Fitted***
879879
+ ***Normal Q-Q***
880880
+ ***Scale-Location***
881-
+ ***Residual vs Leverage***
881+
+ ***Residuals vs Leverage***
882882

883883
```{r fig.align = 'center'}
884884
# create train and test sets
@@ -894,9 +894,16 @@ par(mfrow = c(2, 2))
894894
plot(finMod,pch=19,cex=0.5,col="#00000010")
895895
```
896896

897-
* plotting residuals by index can be helpful in showing missing variables
897+
* plotting residuals by fitted values and coloring with a variable not used in the model helps spot a trend in that variable.
898+
899+
```{r fig.width = 4, fig.height = 3, fig.align = 'center'}
900+
# plot residuals against fitted values, colored by a variable not in the model
901+
qplot(finMod$fitted, finMod$residuals, color=race, data=training)
902+
```
903+
904+
* plotting residuals by index (i.e., row numbers) can be helpful in showing missing variables
898905
- `plot(finMod$residuals)` = plot the residuals against index (row number)
899-
- if there's a trend/pattern in the residuals, it is highly likely that another variable (such as age/time) should be included
906+
- if there's a trend/pattern in the residuals, it is highly likely that another variable (such as age/time) should be included.
900907
+ residuals should not have relationship to index
901908

902909
```{r fig.width = 4, fig.height = 3, fig.align = 'center'}
