Commit e4362a6

committed
more notes
1 parent 082dbb2 commit e4362a6

File tree

1 file changed

+17
-1
lines changed

Introduction to Statistical Learning/Chapter 8.ipynb

Lines changed: 17 additions & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -83,7 +83,23 @@
8383
"### Tree Pruning\n",
8484
"It is possible to build a decision tree so specific (one with so many branches) that each observation can be predicted exactly. This would be complete memorization, ie overfitting, of the data. Because we want to have the tree work with unseen data, we can prune the tree.\n",
8585
"\n",
86-
"One strategy would be some have some threshold for stopping a branch from splitting - it must have decreased RSS by a certain amount."
86+
"One strategy would be to set a threshold for stopping a branch from splitting: a split is made only if it decreases RSS by at least a certain amount. Since this might miss a good split deeper in the tree, pruning is preferred.\n",
87+
"\n",
88+
"Pruning works by:\n",
89+
"1. Growing a very large tree, stopping only when each terminal node is down to some minimum number of observations.\n",
90+
"2. Adding a penalty term $\\alpha|T|$ to the RSS of the grown tree, where $|T|$ is the number of terminal nodes, and for each $\\alpha$ finding the subtree that minimizes RSS $+ \\alpha|T|$.\n",
91+
"3. This gives a function that maps each $\\alpha$ to a particular subtree. So $\\alpha = 0$ maps to the original huge tree and, for example, $\\alpha = 5$ could map to a tree with only half of the terminal nodes.\n",
92+
"\n",
93+
"Choose $\\alpha$ through cross-validation by:\n",
94+
"1. Splitting training data into K folds\n",
95+
"2. For each fold, growing a large tree on the other K-1 folds and applying the penalty term exactly as above (mapping each $\\alpha$ to a particular subtree).\n",
96+
"3. Evaluating each $\\alpha$ (subtree) on the left-out fold.\n",
97+
"4. Averaging the left-out-fold error for each $\\alpha$ over the K folds.\n",
98+
"\n",
99+
"Then pick the $\\alpha$ with the lowest average error and use it to choose the subtree of the tree grown on the full training data, as above.\n",
100+
"\n",
101+
"### Classification Trees\n",
102+
"At each terminal node, predict the most commonly occurring class among the training observations in that node."
87103
]
88104
},
89105
{
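
The pruning notes added in this commit can be made concrete with a small sketch. It is not code from the notebook: the candidate subtrees, their RSS values, the per-fold errors, and the function names `best_subtree` and `choose_alpha` are all made up for illustration. It shows how a value of $\alpha$ maps to the subtree minimizing RSS $+ \alpha|T|$, and how $\alpha$ could then be picked by averaging held-out error across folds.

```python
# Hypothetical candidate subtrees obtained by pruning one large tree.
# Each entry is (number of terminal nodes |T|, training RSS); values are invented.
SUBTREES = [(8, 10.0), (4, 18.0), (2, 30.0), (1, 60.0)]


def best_subtree(alpha, subtrees=SUBTREES):
    """Return the (|T|, RSS) pair that minimizes RSS + alpha * |T|."""
    return min(subtrees, key=lambda t: t[1] + alpha * t[0])


def choose_alpha(fold_errors):
    """Pick alpha by cross-validation.

    fold_errors maps each alpha to a list of held-out errors, one per fold.
    The alpha with the lowest mean held-out error is returned.
    """
    mean_error = {a: sum(errs) / len(errs) for a, errs in fold_errors.items()}
    return min(mean_error, key=mean_error.get)


if __name__ == "__main__":
    # alpha = 0 applies no penalty, so the largest tree wins on training RSS.
    print(best_subtree(0))
    # A larger alpha penalizes terminal nodes and selects a smaller subtree.
    print(best_subtree(5))

    # Invented held-out errors for three alphas over K = 3 folds.
    errors = {0.0: [4.0, 5.0, 6.0], 5.0: [3.0, 3.5, 4.0], 100.0: [7.0, 8.0, 9.0]}
    print(choose_alpha(errors))
```

With these invented numbers, $\alpha = 0$ keeps the full 8-leaf tree and $\alpha = 5$ selects the 4-leaf subtree, mirroring the "half of the terminal nodes" example in the notes. In practice the candidate subtrees and errors would come from a fitted tree rather than a hard-coded list.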
