Commit 52d7a7b

Updated tutorial
1 parent 7379e33 commit 52d7a7b

16 files changed, +1625 -37 lines changed

.DS_Store

0 Bytes
Binary file not shown.

Sklearn/.DS_Store

0 Bytes
Binary file not shown.

Sklearn/DecisionTrees/.DS_Store

8 KB
Binary file not shown.

Sklearn/DecisionTrees/.ipynb_checkpoints/Classification_Trees_using_Python-checkpoint.ipynb

Lines changed: 1201 additions & 0 deletions
Large diffs are not rendered by default.

Sklearn/DecisionTrees/.ipynb_checkpoints/DecisionTreeAnatomy-checkpoint.ipynb

Lines changed: 154 additions & 0 deletions
Large diffs are not rendered by default.

Sklearn/DecisionTrees/Classification_Trees_using_Python.ipynb

Lines changed: 6 additions & 37 deletions
@@ -35,27 +35,7 @@
  "cell_type": "markdown",
  "metadata": {},
  "source": [
- "Make information about dataset?"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "Parameters | Number\n",
- "--- | ---\n",
- "Classes | 3\n",
- "Samples per class | [59, 71, 48]\n",
- "Samples total | 178\n",
- "Dimensionality | 13\n",
- "Features | Real Positive"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "We will be using the [House Sales in King County dataset](https://www.kaggle.com/harlfoxem/housesalesprediction)."
+ "The Iris dataset is one of datasets scikit-learn comes with that do not require the downloading of any file from some external website. The code below will load the iris dataset."
  ]
  },
  {
@@ -66,22 +46,11 @@
  },
  "outputs": [],
  "source": [
- "url = 'https://raw.githubusercontent.com/mGalarnyk/Python_Tutorials/master/Kaggle/HousingSalesKC/kc_house_data.csv'\n",
- "\n",
- "df = pd.read_csv(url)"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 3,
- "metadata": {
- "collapsed": true
- },
- "outputs": [],
- "source": [
- "df.drop(['date', 'id', 'yr_renovated', 'zipcode', 'lat', 'long']\n",
- " , axis = 1\n",
- " , inplace = True)"
+ "import pandas as pd\n",
+ "from sklearn.datasets import load_iris\n",
+ "data = load_iris()\n",
+ "df = pd.DataFrame(data.data, columns=data.feature_names)\n",
+ "df['target'] = data.target"
  ]
  },
  {

Sklearn/DecisionTrees/DecisionTreeAnatomy.ipynb

Lines changed: 154 additions & 0 deletions
Large diffs are not rendered by default.
Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
+digraph Tree {
+node [shape=box, style="filled", color="black"] ;
+0 [label = "Root Node", fillcolor="cyan"] ;
+1 [label = "Leaf/Terminal\nNode", fillcolor="springgreen"] ;
+0 -> 1 [labeldistance=2.5, labelangle=45, headlabel="True"] ;
+2 [label="Decision Node", fillcolor="pink"] ;
+0 -> 2 [labeldistance=2.5, labelangle=-45, headlabel="False"] ;
+3 [label = "Leaf/Terminal\nNode", fillcolor="springgreen"] ;
+2 -> 3 ;
+4 [label = "Leaf/Terminal\nNode", fillcolor="springgreen"] ;
+2 -> 4 ;
+}
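
The anatomy diagram above labels three kinds of node. As a rough sketch (not part of the commit), the same roles can be derived from the edge list alone: a root has no incoming edge, a leaf has no outgoing edge, and a decision node has both. The `edges` list mirrors the `->` lines of the DOT file; the helper name `classify_nodes` is made up for illustration.

```python
# Edges copied from the "->" lines of the DOT file above: (parent, child).
edges = [(0, 1), (0, 2), (2, 3), (2, 4)]


def classify_nodes(edges):
    """Label each node as root, decision, or leaf using only the edge list.

    Hypothetical helper for illustration; it is not part of the notebook.
    """
    parents = {parent for parent, _ in edges}
    children = {child for _, child in edges}
    roles = {}
    for node in sorted(parents | children):
        if node not in children:
            roles[node] = "root"      # nothing points at it
        elif node in parents:
            roles[node] = "decision"  # has a parent and children
        else:
            roles[node] = "leaf"      # has a parent, no children
    return roles


roles = classify_nodes(edges)
print(roles)
```

The result should agree with the labels in the DOT file: node 0 is the root, node 2 is the only decision node, and nodes 1, 3 and 4 are leaves.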
17.7 KB
Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
+digraph Tree {
+node [shape=box] ;
+0 [label="petal length (cm) <= 2.45\ngini = 0.667\nsamples = 150\nvalue = [50, 50, 50]\nclass = setosa"] ;
+1 [label="gini = 0.0\nsamples = 50\nvalue = [50, 0, 0]\nclass = setosa"] ;
+0 -> 1 [labeldistance=2.5, labelangle=45, headlabel="True"] ;
+2 [label="petal width (cm) <= 1.75\ngini = 0.5\nsamples = 100\nvalue = [0, 50, 50]\nclass = versicolor"] ;
+0 -> 2 [labeldistance=2.5, labelangle=-45, headlabel="False"] ;
+3 [label="gini = 0.168\nsamples = 54\nvalue = [0, 49, 5]\nclass = versicolor"] ;
+2 -> 3 ;
+4 [label="gini = 0.043\nsamples = 46\nvalue = [0, 1, 45]\nclass = virginica"] ;
+2 -> 4 ;
+}
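
Each node label in the tree above carries a `gini` figure next to its `value` counts. As a quick check (a sketch, not code from the commit), those figures can be recomputed from the counts with the usual Gini impurity formula, one minus the sum of squared class proportions. The `node_values` dictionary is copied by hand from the DOT labels, and the function name `gini` is chosen here for illustration.

```python
def gini(counts):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    total = sum(counts)
    return 1.0 - sum((count / total) ** 2 for count in counts)


# "value = [...]" lists copied by hand from the DOT labels, keyed by node id.
node_values = {
    0: [50, 50, 50],
    1: [50, 0, 0],
    2: [0, 50, 50],
    3: [0, 49, 5],
    4: [0, 1, 45],
}

# Round to three decimals to match the precision shown in the labels.
gini_by_node = {node: round(gini(counts), 3) for node, counts in node_values.items()}
print(gini_by_node)
```

Rounded to three decimals, the recomputed values match the labels: 0.667, 0.0, 0.5, 0.168 and 0.043 for nodes 0 through 4.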
