Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion configuration.ipynb
Original file line number Diff line number Diff line change
Expand Up @@ -103,7 +103,7 @@
"source": [
"import azureml.core\n",
"\n",
"print(\"This notebook was created using version 1.19.0 of the Azure ML SDK\")\n",
"print(\"This notebook was created using version 1.20.0 of the Azure ML SDK\")\n",
"print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
]
},
Expand Down
135 changes: 98 additions & 37 deletions contrib/fairness/fairlearn-azureml-mitigation.ipynb
Original file line number Diff line number Diff line change
Expand Up @@ -46,7 +46,7 @@
"Please see the [configuration notebook](../../configuration.ipynb) for information about creating one, if required.\n",
"This notebook also requires the following packages:\n",
"* `azureml-contrib-fairness`\n",
"* `fairlearn==0.4.6`\n",
"* `fairlearn==0.4.6` (v0.5.0 will work with minor modifications)\n",
"* `joblib`\n",
"* `shap`\n",
"\n",
Expand All @@ -62,13 +62,20 @@
"# !pip install --upgrade scikit-learn>=0.22.1"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally, please ensure that when you downloaded this notebook, you also downloaded the `fairness_nb_utils.py` file from the same location, and placed it in the same directory as this notebook."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"LoadingData\"></a>\n",
"## Loading the Data\n",
"We use the well-known `adult` census dataset, which we load using `shap` (for convenience). We start with a fairly unremarkable set of imports:"
"We use the well-known `adult` census dataset, which we will fetch from the OpenML website. We start with a fairly unremarkable set of imports:"
]
},
{
Expand All @@ -79,17 +86,24 @@
"source": [
"from fairlearn.reductions import GridSearch, DemographicParity, ErrorRate\n",
"from fairlearn.widget import FairlearnDashboard\n",
"from sklearn import svm\n",
"from sklearn.preprocessing import LabelEncoder, StandardScaler\n",
"\n",
"from sklearn.compose import ColumnTransformer\n",
"from sklearn.datasets import fetch_openml\n",
"from sklearn.impute import SimpleImputer\n",
"from sklearn.linear_model import LogisticRegression\n",
"from sklearn.model_selection import train_test_split\n",
"from sklearn.preprocessing import StandardScaler, OneHotEncoder\n",
"from sklearn.compose import make_column_selector as selector\n",
"from sklearn.pipeline import Pipeline\n",
"\n",
"import pandas as pd"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can now load and inspect the data from the `shap` package:"
"We can now load and inspect the data:"
]
},
{
Expand All @@ -98,13 +112,13 @@
"metadata": {},
"outputs": [],
"source": [
"from utilities import fetch_openml_with_retries\n",
"from fairness_nb_utils import fetch_openml_with_retries\n",
"\n",
"data = fetch_openml_with_retries(data_id=1590)\n",
" \n",
"# Extract the items we want\n",
"X_raw = data.data\n",
"Y = (data.target == '>50K') * 1\n",
"y = (data.target == '>50K') * 1\n",
"\n",
"X_raw[\"race\"].value_counts().to_dict()"
]
Expand All @@ -113,7 +127,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"We are going to treat the sex of each individual as a protected attribute (where 0 indicates female and 1 indicates male), and in this particular case we are going separate this attribute out and drop it from the main data (this is not always the best option - see the [Fairlearn website](http://fairlearn.github.io/) for further discussion). We also separate out the Race column, but we will not perform any mitigation based on it. Finally, we perform some standard data preprocessing steps to convert the data into a format suitable for the ML algorithms"
"We are going to treat the sex and race of each individual as protected attributes, and in this particular case we are going to remove these attributes from the main data (this is not always the best option - see the [Fairlearn website](http://fairlearn.github.io/) for further discussion). Protected attributes are often denoted by 'A' in the literature, and we follow that convention here:"
]
},
{
Expand All @@ -123,23 +137,14 @@
"outputs": [],
"source": [
"A = X_raw[['sex','race']]\n",
"X = X_raw.drop(labels=['sex', 'race'],axis = 1)\n",
"X_dummies = pd.get_dummies(X)\n",
"\n",
"sc = StandardScaler()\n",
"X_scaled = sc.fit_transform(X_dummies)\n",
"X_scaled = pd.DataFrame(X_scaled, columns=X_dummies.columns)\n",
"\n",
"\n",
"le = LabelEncoder()\n",
"Y = le.fit_transform(Y)"
"X_raw = X_raw.drop(labels=['sex', 'race'],axis = 1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"With our data prepared, we can make the conventional split in to 'test' and 'train' subsets:"
"We now preprocess our data. To avoid the problem of data leakage, we split our data into training and test sets before performing any other transformations. Subsequent transformations (such as scalings) will be fit to the training data set, and then applied to the test dataset."
]
},
{
Expand All @@ -148,21 +153,76 @@
"metadata": {},
"outputs": [],
"source": [
"from sklearn.model_selection import train_test_split\n",
"X_train, X_test, Y_train, Y_test, A_train, A_test = train_test_split(X_scaled, \n",
" Y, \n",
" A,\n",
" test_size = 0.2,\n",
" random_state=0,\n",
" stratify=Y)\n",
"\n",
"# Work around indexing issue\n",
"(X_train, X_test, y_train, y_test, A_train, A_test) = train_test_split(\n",
" X_raw, y, A, test_size=0.3, random_state=12345, stratify=y\n",
")\n",
"\n",
"# Ensure indices are aligned between X, y and A,\n",
"# after all the slicing and splitting of DataFrames\n",
"# and Series\n",
"\n",
"X_train = X_train.reset_index(drop=True)\n",
"A_train = A_train.reset_index(drop=True)\n",
"X_test = X_test.reset_index(drop=True)\n",
"y_train = y_train.reset_index(drop=True)\n",
"y_test = y_test.reset_index(drop=True)\n",
"A_train = A_train.reset_index(drop=True)\n",
"A_test = A_test.reset_index(drop=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We have two types of column in the dataset - categorical columns which will need to be one-hot encoded, and numeric ones which will need to be rescaled. We also need to take care of missing values. We use a simple approach here, but please bear in mind that this is another way that bias could be introduced (especially if one subgroup tends to have more missing values).\n",
"\n",
"For this preprocessing, we make use of `Pipeline` objects from `sklearn`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"numeric_transformer = Pipeline(\n",
" steps=[\n",
" (\"impute\", SimpleImputer()),\n",
" (\"scaler\", StandardScaler()),\n",
" ]\n",
")\n",
"\n",
"categorical_transformer = Pipeline(\n",
" [\n",
" (\"impute\", SimpleImputer(strategy=\"most_frequent\")),\n",
" (\"ohe\", OneHotEncoder(handle_unknown=\"ignore\", sparse=False)),\n",
" ]\n",
")\n",
"\n",
"preprocessor = ColumnTransformer(\n",
" transformers=[\n",
" (\"num\", numeric_transformer, selector(dtype_exclude=\"category\")),\n",
" (\"cat\", categorical_transformer, selector(dtype_include=\"category\")),\n",
" ]\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, the preprocessing pipeline is defined, we can run it on our training data, and apply the generated transform to our test data:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"X_train = preprocessor.fit_transform(X_train)\n",
"X_test = preprocessor.transform(X_test)"
]
},
{
"cell_type": "markdown",
"metadata": {},
Expand All @@ -181,7 +241,7 @@
"source": [
"unmitigated_predictor = LogisticRegression(solver='liblinear', fit_intercept=True)\n",
"\n",
"unmitigated_predictor.fit(X_train, Y_train)"
"unmitigated_predictor.fit(X_train, y_train)"
]
},
{
Expand All @@ -198,7 +258,7 @@
"outputs": [],
"source": [
"FairlearnDashboard(sensitive_features=A_test, sensitive_feature_names=['Sex', 'Race'],\n",
" y_true=Y_test,\n",
" y_true=y_test,\n",
" y_pred={\"unmitigated\": unmitigated_predictor.predict(X_test)})"
]
},
Expand Down Expand Up @@ -249,9 +309,10 @@
"metadata": {},
"outputs": [],
"source": [
"sweep.fit(X_train, Y_train,\n",
"sweep.fit(X_train, y_train,\n",
" sensitive_features=A_train.sex)\n",
"\n",
"# For Fairlearn v0.5.0, need sweep.predictors_\n",
"predictors = sweep._predictors"
]
},
Expand All @@ -273,9 +334,9 @@
" classifier = lambda X: m.predict(X)\n",
" \n",
" error = ErrorRate()\n",
" error.load_data(X_train, pd.Series(Y_train), sensitive_features=A_train.sex)\n",
" error.load_data(X_train, pd.Series(y_train), sensitive_features=A_train.sex)\n",
" disparity = DemographicParity()\n",
" disparity.load_data(X_train, pd.Series(Y_train), sensitive_features=A_train.sex)\n",
" disparity.load_data(X_train, pd.Series(y_train), sensitive_features=A_train.sex)\n",
" \n",
" errors.append(error.gamma(classifier)[0])\n",
" disparities.append(disparity.gamma(classifier).max())\n",
Expand Down Expand Up @@ -329,15 +390,15 @@
"source": [
"FairlearnDashboard(sensitive_features=A_test, \n",
" sensitive_feature_names=['Sex', 'Race'],\n",
" y_true=Y_test.tolist(),\n",
" y_true=y_test.tolist(),\n",
" y_pred=predictions_dominant)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"When using sex as the sensitive feature, we see a Pareto front forming - the set of predictors which represent optimal tradeoffs between accuracy and disparity in predictions. In the ideal case, we would have a predictor at (1,0) - perfectly accurate and without any unfairness under demographic parity (with respect to the protected attribute \"sex\"). The Pareto front represents the closest we can come to this ideal based on our data and choice of estimator. Note the range of the axes - the disparity axis covers more values than the accuracy, so we can reduce disparity substantially for a small loss in accuracy. Finally, we also see that the unmitigated model is towards the top right of the plot, with high accuracy, but worst disparity.\n",
"When using sex as the sensitive feature and accuracy as the metric, we see a Pareto front forming - the set of predictors which represent optimal tradeoffs between accuracy and disparity in predictions. In the ideal case, we would have a predictor at (1,0) - perfectly accurate and without any unfairness under demographic parity (with respect to the protected attribute \"sex\"). The Pareto front represents the closest we can come to this ideal based on our data and choice of estimator. Note the range of the axes - the disparity axis covers more values than the accuracy, so we can reduce disparity substantially for a small loss in accuracy. Finally, we also see that the unmitigated model is towards the top right of the plot, with high accuracy, but worst disparity.\n",
"\n",
"By clicking on individual models on the plot, we can inspect their metrics for disparity and accuracy in greater detail. In a real example, we would then pick the model which represented the best trade-off between accuracy and disparity given the relevant business constraints."
]
Expand Down Expand Up @@ -444,7 +505,7 @@
"from fairlearn.metrics._group_metric_set import _create_group_metric_set\n",
"\n",
"\n",
"dash_dict = _create_group_metric_set(y_true=Y_test,\n",
"dash_dict = _create_group_metric_set(y_true=y_test,\n",
" predictions=predictions_dominant_ids,\n",
" sensitive_features=sf,\n",
" prediction_type='binary_classification')"
Expand Down
File renamed without changes.
Loading