
Commit 8be5807

Update 09_tabular to fastai v2.2.7 (fastai#413)
saleElapsed is now detected as a continuous variable right away.
1 parent c3ceea7 commit 8be5807
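The commit message says `cont_cat_split` now puts `saleElapsed` in the continuous list on its own. The diff therefore drops the manual `cont_nn.append`/`cat_nn.remove` cell and the `.astype(int)` cast. Below is a minimal sketch of the kind of dtype-and-cardinality rule involved. It is a simplified, hypothetical re-implementation (`split_cont_cat` is our name, not a fastai function). It assumes that float columns always count as continuous and that integer columns count as continuous only when they have more than `max_card` distinct values. The toy data and the `max_card=9000` threshold (echoing the 9,000 figure in the questionnaire) are made up for illustration.

```python
import numpy as np
import pandas as pd


def split_cont_cat(df, max_card=20, dep_var=None):
    """Hypothetical, simplified version of a continuous/categorical split.

    Assumed rule (not fastai's actual code): float columns are continuous;
    integer columns are continuous only if they have more than `max_card`
    distinct values; everything else is categorical.
    """
    cont, cat = [], []
    for col in df.columns:
        if col == dep_var:
            continue  # never classify the target column
        dtype = df[col].dtype
        is_int = pd.api.types.is_integer_dtype(dtype)
        is_float = pd.api.types.is_float_dtype(dtype)
        if is_float or (is_int and df[col].nunique() > max_card):
            cont.append(col)
        else:
            cat.append(col)
    return cont, cat


# Made-up toy frame: only 5 distinct sale times (seconds since the epoch).
elapsed = np.array([1.0e9, 1.1e9, 1.2e9, 1.3e9, 1.4e9])
df_int = pd.DataFrame({
    'saleElapsed': elapsed.astype(np.int64),
    'ProductSize': ['Large', 'Small', 'Large', 'Mini', 'Small'],
    'SalePrice': [10.1, 9.8, 10.5, 9.2, 10.0],
})
# Same values, stored as floats instead of integers.
df_float = df_int.assign(saleElapsed=df_int['saleElapsed'].astype(float))

print(split_cont_cat(df_int, max_card=9000, dep_var='SalePrice'))
# ([], ['saleElapsed', 'ProductSize'])
print(split_cont_cat(df_float, max_card=9000, dep_var='SalePrice'))
# (['saleElapsed'], ['ProductSize'])
```

With the same five distinct values, the integer copy of the column lands in the categorical list and the float copy lands in the continuous one. Under these assumptions, the column's dtype, not its cardinality, decides the split.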


2 files changed: +17 -33 lines


09_tabular.ipynb

Lines changed: 15 additions & 21 deletions
@@ -9366,33 +9366,27 @@
  "cell_type": "markdown",
  "metadata": {},
  "source": [
- "In this case, however, there's one variable that we absolutely do not want to treat as categorical: the `saleElapsed` variable. A categorical variable cannot, by definition, extrapolate outside the range of values that it has seen, but we want to be able to predict auction sale prices in the future. Therefore, we need to make this a continuous variable:"
+ "In this case, there's one variable that we absolutely do not want to treat as categorical: the `saleElapsed` variable. A categorical variable cannot, by definition, extrapolate outside the range of values that it has seen, but we want to be able to predict auction sale prices in the future. Let's verify that `cont_cat_split` did the correct thing."
  ]
  },
  {
  "cell_type": "code",
  "execution_count": 98,
  "metadata": {},
- "outputs": [],
- "source": [
- "cont_nn.append('saleElapsed')\n",
- "cat_nn.remove('saleElapsed')"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "Also, to use this as a continuous variable, we have to ensure it's of a numeric type:"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 106,
- "metadata": {},
- "outputs": [],
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "['saleElapsed']"
+ ]
+ },
+ "execution_count": 98,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
  "source": [
- "df_nn['saleElapsed'] = df_nn['saleElapsed'].astype(int)"
+ "cont_nn"
  ]
  },
  {
@@ -9975,7 +9969,7 @@
  "1. What's a good type of plot for showing tree interpreter results?\n",
  "1. What is the \"extrapolation problem\"?\n",
  "1. How can you tell if your test or validation set is distributed in a different way than your training set?\n",
- "1. Why do we make `saleElapsed` a continuous variable, even although it has less than 9,000 distinct values?\n",
+ "1. Why do we ensure `saleElapsed` is a continuous variable, even although it has less than 9,000 distinct values?\n",
  "1. What is \"boosting\"?\n",
  "1. How could we use embeddings with a random forest? Would we expect this to help?\n",
  "1. Why might we not always use a neural net for tabular modeling?"

clean/09_tabular.ipynb

Lines changed: 2 additions & 12 deletions
@@ -1153,17 +1153,7 @@
  "metadata": {},
  "outputs": [],
  "source": [
- "cont_nn.append('saleElapsed')\n",
- "cat_nn.remove('saleElapsed')"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "df_nn['saleElapsed'] = df_nn['saleElapsed'].astype(int)"
+ "cont_nn"
  ]
  },
  {
@@ -1375,7 +1365,7 @@
  "1. What's a good type of plot for showing tree interpreter results?\n",
  "1. What is the \"extrapolation problem\"?\n",
  "1. How can you tell if your test or validation set is distributed in a different way than your training set?\n",
- "1. Why do we make `saleElapsed` a continuous variable, even although it has less than 9,000 distinct values?\n",
+ "1. Why do we ensure `saleElapsed` is a continuous variable, even although it has less than 9,000 distinct values?\n",
  "1. What is \"boosting\"?\n",
  "1. How could we use embeddings with a random forest? Would we expect this to help?\n",
  "1. Why might we not always use a neural net for tabular modeling?"
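The questionnaire item asks why `saleElapsed` must be continuous. The notebook's reasoning is that a categorical variable cannot extrapolate beyond the values it has seen. A toy sketch can illustrate this, assuming made-up data in which price rises linearly with time. In the sketch, a per-label lookup stands in for categorical treatment. A least-squares line (`np.polyfit`) stands in for a model that uses the value numerically, such as the neural net that `df_nn` is being prepared for. Neither stand-in is how fastai actually models the column.

```python
import numpy as np

# Made-up training data: days since the first auction, and the sale price.
train_days = [0, 100, 200, 300]
train_price = [10.0, 11.0, 12.0, 13.0]
future_day = 500  # an auction later than anything seen in training

# Categorical stand-in: each distinct value is only a label. A lookup of the
# price per label has nothing to say about an unseen label, so it can only
# fall back to a default (here, the overall mean price).
price_by_label = dict(zip(train_days, train_price))
fallback = float(np.mean(train_price))


def predict_categorical(day):
    return price_by_label.get(day, fallback)


# Continuous stand-in: the value is a number, so a model can learn a trend
# (here, a least-squares straight line) and follow it past the training range.
slope, intercept = np.polyfit(train_days, train_price, deg=1)


def predict_continuous(day):
    return slope * day + intercept


print(predict_categorical(future_day))           # 11.5
print(round(predict_continuous(future_day), 6))  # 15.0
```

For a future date, the lookup can only return its fallback value, whereas the fitted line continues the trend it learned from the training range.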
