Commit 01199b3

add filter
1 parent e9b9815 commit 01199b3

4 files changed

+392
-1
lines changed

Chapter3/filter.ipynb

Lines changed: 144 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -701,6 +701,150 @@
701701
"source": [
702702
"df.loc[df.groupby(\"type\")[\"type\"].transform(\"size\") > 1]"
703703
]
704+
},
705+
{
706+
"cell_type": "markdown",
707+
"id": "7afb3774",
708+
"metadata": {},
709+
"source": [
710+
"### df.filter: Filter Columns Based on a Subset of Their Names"
711+
]
712+
},
713+
{
714+
"cell_type": "markdown",
715+
"id": "54f71d1b",
716+
"metadata": {},
717+
"source": [
718+
"To keep only the columns of a pandas DataFrame whose names contain a given substring, use `DataFrame.filter` with the `like` argument. The example below keeps only the columns whose names contain \"cat\"."
719+
]
720+
},
721+
{
722+
"cell_type": "code",
723+
"execution_count": 6,
724+
"id": "a121a0b3",
725+
"metadata": {},
726+
"outputs": [
727+
{
728+
"data": {
729+
"text/html": [
730+
"<div>\n",
731+
"<style scoped>\n",
732+
" .dataframe tbody tr th:only-of-type {\n",
733+
" vertical-align: middle;\n",
734+
" }\n",
735+
"\n",
736+
" .dataframe tbody tr th {\n",
737+
" vertical-align: top;\n",
738+
" }\n",
739+
"\n",
740+
" .dataframe thead th {\n",
741+
" text-align: right;\n",
742+
" }\n",
743+
"</style>\n",
744+
"<table border=\"1\" class=\"dataframe\">\n",
745+
" <thead>\n",
746+
" <tr style=\"text-align: right;\">\n",
747+
" <th></th>\n",
748+
" <th>cat1</th>\n",
749+
" <th>cat2</th>\n",
750+
" <th>num1</th>\n",
751+
" </tr>\n",
752+
" </thead>\n",
753+
" <tbody>\n",
754+
" <tr>\n",
755+
" <th>0</th>\n",
756+
" <td>a</td>\n",
757+
" <td>b</td>\n",
758+
" <td>1</td>\n",
759+
" </tr>\n",
760+
" <tr>\n",
761+
" <th>1</th>\n",
762+
" <td>b</td>\n",
763+
" <td>c</td>\n",
764+
" <td>2</td>\n",
765+
" </tr>\n",
766+
" </tbody>\n",
767+
"</table>\n",
768+
"</div>"
769+
],
770+
"text/plain": [
771+
" cat1 cat2 num1\n",
772+
"0 a b 1\n",
773+
"1 b c 2"
774+
]
775+
},
776+
"execution_count": 6,
777+
"metadata": {},
778+
"output_type": "execute_result"
779+
}
780+
],
781+
"source": [
782+
"import pandas as pd\n",
783+
"\n",
784+
"df = pd.DataFrame({\"cat1\": [\"a\", \"b\"], \"cat2\": [\"b\", \"c\"], \"num1\": [1, 2]})\n",
785+
"df \n"
786+
]
787+
},
788+
{
789+
"cell_type": "code",
790+
"execution_count": 7,
791+
"id": "2de5974c",
792+
"metadata": {},
793+
"outputs": [
794+
{
795+
"data": {
796+
"text/html": [
797+
"<div>\n",
798+
"<style scoped>\n",
799+
" .dataframe tbody tr th:only-of-type {\n",
800+
" vertical-align: middle;\n",
801+
" }\n",
802+
"\n",
803+
" .dataframe tbody tr th {\n",
804+
" vertical-align: top;\n",
805+
" }\n",
806+
"\n",
807+
" .dataframe thead th {\n",
808+
" text-align: right;\n",
809+
" }\n",
810+
"</style>\n",
811+
"<table border=\"1\" class=\"dataframe\">\n",
812+
" <thead>\n",
813+
" <tr style=\"text-align: right;\">\n",
814+
" <th></th>\n",
815+
" <th>cat1</th>\n",
816+
" <th>cat2</th>\n",
817+
" </tr>\n",
818+
" </thead>\n",
819+
" <tbody>\n",
820+
" <tr>\n",
821+
" <th>0</th>\n",
822+
" <td>a</td>\n",
823+
" <td>b</td>\n",
824+
" </tr>\n",
825+
" <tr>\n",
826+
" <th>1</th>\n",
827+
" <td>b</td>\n",
828+
" <td>c</td>\n",
829+
" </tr>\n",
830+
" </tbody>\n",
831+
"</table>\n",
832+
"</div>"
833+
],
834+
"text/plain": [
835+
" cat1 cat2\n",
836+
"0 a b\n",
837+
"1 b c"
838+
]
839+
},
840+
"execution_count": 7,
841+
"metadata": {},
842+
"output_type": "execute_result"
843+
}
844+
],
845+
"source": [
846+
"df.filter(like='cat', axis=1)"
847+
]
704848
}
705849
],
706850
"metadata": {
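
The notebook cell above uses only the `like` argument. As a minimal sketch, assuming the same toy DataFrame, the snippet below contrasts `like` with the two other selectors that `DataFrame.filter` accepts, `regex` and `items`. The variable names `by_substring`, `by_regex`, and `by_items` are illustrative and do not appear in the commit.

```python
import pandas as pd

# Same toy frame as in the notebook cell above.
df = pd.DataFrame({"cat1": ["a", "b"], "cat2": ["b", "c"], "num1": [1, 2]})

# like: keep columns whose label contains the substring "cat".
by_substring = df.filter(like="cat", axis=1)

# regex: keep columns whose label matches the pattern.
by_regex = df.filter(regex=r"^cat\d$", axis=1)

# items: keep columns by exact label.
by_items = df.filter(items=["cat1", "num1"], axis=1)

print(list(by_substring.columns))  # ['cat1', 'cat2']
print(list(by_regex.columns))      # ['cat1', 'cat2']
print(list(by_items.columns))      # ['cat1', 'num1']
```

Note that `filter` selects on labels only and never inspects the values stored in the columns; `items`, `like`, and `regex` are mutually exclusive, so only one of them can be passed per call.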

docs/Chapter3/filter.html

Lines changed: 103 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -608,6 +608,11 @@ <h1 class="site-logo" id="site-title">Effective Python for Data Scientists</h1>
608608
4.6.3. Filter a pandas DataFrame by Value Counts
609609
</a>
610610
</li>
611+
<li class="toc-h2 nav-item toc-entry">
612+
<a class="reference internal nav-link" href="#df-filter-filter-columns-based-on-a-subset-of-their-names">
613+
4.6.4. df.filter: Filter Columns Based on a Subset of Their Names
614+
</a>
615+
</li>
611616
</ul>
612617

613618
</nav>
@@ -1130,6 +1135,104 @@ <h2><span class="section-number">4.6.3. </span>Filter a pandas DataFrame by Valu
11301135
</script></div>
11311136
</div>
11321137
</div>
1138+
<div class="section" id="df-filter-filter-columns-based-on-a-subset-of-their-names">
1139+
<h2><span class="section-number">4.6.4. </span>df.filter: Filter Columns Based on a Subset of Their Names<a class="headerlink" href="#df-filter-filter-columns-based-on-a-subset-of-their-names" title="Permalink to this headline"></a></h2>
1140+
<p>To keep only the columns of a pandas DataFrame whose names contain a given substring, use <code class="docutils literal notranslate"><span class="pre">DataFrame.filter</span></code> with the <code class="docutils literal notranslate"><span class="pre">like</span></code> argument. The example below keeps only the columns whose names contain “cat”.</p>
1141+
<div class="cell docutils container">
1142+
<div class="cell_input docutils container">
1143+
<div class="highlight-ipython3 notranslate"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">pandas</span> <span class="k">as</span> <span class="nn">pd</span>
1144+
1145+
<span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">DataFrame</span><span class="p">({</span><span class="s2">&quot;cat1&quot;</span><span class="p">:</span> <span class="p">[</span><span class="s2">&quot;a&quot;</span><span class="p">,</span> <span class="s2">&quot;b&quot;</span><span class="p">],</span> <span class="s2">&quot;cat2&quot;</span><span class="p">:</span> <span class="p">[</span><span class="s2">&quot;b&quot;</span><span class="p">,</span> <span class="s2">&quot;c&quot;</span><span class="p">],</span> <span class="s2">&quot;num1&quot;</span><span class="p">:</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">]})</span>
1146+
<span class="n">df</span>
1147+
</pre></div>
1148+
</div>
1149+
</div>
1150+
<div class="cell_output docutils container">
1151+
<div class="output text_html"><div>
1152+
<style scoped>
1153+
.dataframe tbody tr th:only-of-type {
1154+
vertical-align: middle;
1155+
}
1156+
1157+
.dataframe tbody tr th {
1158+
vertical-align: top;
1159+
}
1160+
1161+
.dataframe thead th {
1162+
text-align: right;
1163+
}
1164+
</style>
1165+
<table border="1" class="dataframe">
1166+
<thead>
1167+
<tr style="text-align: right;">
1168+
<th></th>
1169+
<th>cat1</th>
1170+
<th>cat2</th>
1171+
<th>num1</th>
1172+
</tr>
1173+
</thead>
1174+
<tbody>
1175+
<tr>
1176+
<th>0</th>
1177+
<td>a</td>
1178+
<td>b</td>
1179+
<td>1</td>
1180+
</tr>
1181+
<tr>
1182+
<th>1</th>
1183+
<td>b</td>
1184+
<td>c</td>
1185+
<td>2</td>
1186+
</tr>
1187+
</tbody>
1188+
</table>
1189+
</div></div></div>
1190+
</div>
1191+
<div class="cell docutils container">
1192+
<div class="cell_input docutils container">
1193+
<div class="highlight-ipython3 notranslate"><div class="highlight"><pre><span></span><span class="n">df</span><span class="o">.</span><span class="n">filter</span><span class="p">(</span><span class="n">like</span><span class="o">=</span><span class="s1">&#39;cat&#39;</span><span class="p">,</span> <span class="n">axis</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
1194+
</pre></div>
1195+
</div>
1196+
</div>
1197+
<div class="cell_output docutils container">
1198+
<div class="output text_html"><div>
1199+
<style scoped>
1200+
.dataframe tbody tr th:only-of-type {
1201+
vertical-align: middle;
1202+
}
1203+
1204+
.dataframe tbody tr th {
1205+
vertical-align: top;
1206+
}
1207+
1208+
.dataframe thead th {
1209+
text-align: right;
1210+
}
1211+
</style>
1212+
<table border="1" class="dataframe">
1213+
<thead>
1214+
<tr style="text-align: right;">
1215+
<th></th>
1216+
<th>cat1</th>
1217+
<th>cat2</th>
1218+
</tr>
1219+
</thead>
1220+
<tbody>
1221+
<tr>
1222+
<th>0</th>
1223+
<td>a</td>
1224+
<td>b</td>
1225+
</tr>
1226+
<tr>
1227+
<th>1</th>
1228+
<td>b</td>
1229+
<td>c</td>
1230+
</tr>
1231+
</tbody>
1232+
</table>
1233+
</div></div></div>
1234+
</div>
1235+
</div>
11331236
</div>
11341237

11351238
<script type="text/x-thebe-config">
