Azure · harneetvirk · Jan 11, 2021 · Jan 11, 2021
diff --git a/configuration.ipynb b/configuration.ipynb
@@ -103,7 +103,7 @@
       "source": [
         "import azureml.core\n",
         "\n",
-        "print(\"This notebook was created using version 1.19.0 of the Azure ML SDK\")\n",
+        "print(\"This notebook was created using version 1.20.0 of the Azure ML SDK\")\n",
         "print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
       ]
     },

diff --git a/contrib/fairness/fairlearn-azureml-mitigation.ipynb b/contrib/fairness/fairlearn-azureml-mitigation.ipynb
@@ -46,7 +46,7 @@
         "Please see the [configuration notebook](../../configuration.ipynb) for information about creating one, if required.\n",
         "This notebook also requires the following packages:\n",
         "* `azureml-contrib-fairness`\n",
-        "* `fairlearn==0.4.6`\n",
+        "* `fairlearn==0.4.6` (v0.5.0 will work with minor modifications)\n",
         "* `joblib`\n",
         "* `shap`\n",
         "\n",
@@ -62,13 +62,20 @@
         "# !pip install --upgrade scikit-learn>=0.22.1"
       ]
     },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "Finally, please ensure that when you downloaded this notebook, you also downloaded the `fairness_nb_utils.py` file from the same location, and placed it in the same directory as this notebook."
+      ]
+    },
     {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
         "<a id=\"LoadingData\"></a>\n",
         "## Loading the Data\n",
-        "We use the well-known `adult` census dataset, which we load using `shap` (for convenience). We start with a fairly unremarkable set of imports:"
+        "We use the well-known `adult` census dataset, which we will fetch from the OpenML website. We start with a fairly unremarkable set of imports:"
       ]
     },
     {
@@ -79,17 +86,24 @@
       "source": [
         "from fairlearn.reductions import GridSearch, DemographicParity, ErrorRate\n",
         "from fairlearn.widget import FairlearnDashboard\n",
-        "from sklearn import svm\n",
-        "from sklearn.preprocessing import LabelEncoder, StandardScaler\n",
+        "\n",
+        "from sklearn.compose import ColumnTransformer\n",
+        "from sklearn.datasets import fetch_openml\n",
+        "from sklearn.impute import SimpleImputer\n",
         "from sklearn.linear_model import LogisticRegression\n",
+        "from sklearn.model_selection import train_test_split\n",
+        "from sklearn.preprocessing import StandardScaler, OneHotEncoder\n",
+        "from sklearn.compose import make_column_selector as selector\n",
+        "from sklearn.pipeline import Pipeline\n",
+        "\n",
         "import pandas as pd"
       ]
     },
     {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
-        "We can now load and inspect the data from the `shap` package:"
+        "We can now load and inspect the data:"
       ]
     },
     {
@@ -98,13 +112,13 @@
       "metadata": {},
       "outputs": [],
       "source": [
-        "from utilities import fetch_openml_with_retries\n",
+        "from fairness_nb_utils import fetch_openml_with_retries\n",
         "\n",
         "data = fetch_openml_with_retries(data_id=1590)\n",
         "    \n",
         "# Extract the items we want\n",
         "X_raw = data.data\n",
-        "Y = (data.target == '>50K') * 1\n",
+        "y = (data.target == '>50K') * 1\n",
         "\n",
         "X_raw[\"race\"].value_counts().to_dict()"
       ]
@@ -113,7 +127,7 @@
       "cell_type": "markdown",
       "metadata": {},
       "source": [
-        "We are going to treat the sex of each individual as a protected attribute (where 0 indicates female and 1 indicates male), and in this particular case we are going separate this attribute out and drop it from the main data (this is not always the best option - see the [Fairlearn website](http://fairlearn.github.io/) for further discussion). We also separate out the Race column, but we will not perform any mitigation based on it. Finally, we perform some standard data preprocessing steps to convert the data into a format suitable for the ML algorithms"
+        "We are going to treat the sex and race of each individual as protected attributes, and in this particular case we are going to remove these attributes from the main data (this is not always the best option - see the [Fairlearn website](http://fairlearn.github.io/) for further discussion). Protected attributes are often denoted by 'A' in the literature, and we follow that convention here:"
       ]
     },
     {
@@ -123,23 +137,14 @@
       "outputs": [],
       "source": [
         "A = X_raw[['sex','race']]\n",
-        "X = X_raw.drop(labels=['sex', 'race'],axis = 1)\n",
-        "X_dummies = pd.get_dummies(X)\n",
-        "\n",
-        "sc = StandardScaler()\n",
-        "X_scaled = sc.fit_transform(X_dummies)\n",
-        "X_scaled = pd.DataFrame(X_scaled, columns=X_dummies.columns)\n",
-        "\n",
-        "\n",
-        "le = LabelEncoder()\n",
-        "Y = le.fit_transform(Y)"
+        "X_raw = X_raw.drop(labels=['sex', 'race'],axis = 1)"
       ]
     },
     {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
-        "With our data prepared, we can make the conventional split in to 'test' and 'train' subsets:"
+        "We now preprocess our data. To avoid data leakage, we split our data into training and test sets before performing any other transformations. Subsequent transformations (such as scaling) will be fit to the training set only, and then applied to the test set."
       ]
     },
     {
@@ -148,21 +153,76 @@
       "metadata": {},
       "outputs": [],
       "source": [
-        "from sklearn.model_selection import train_test_split\n",
-        "X_train, X_test, Y_train, Y_test, A_train, A_test = train_test_split(X_scaled, \n",
-        "                                                    Y, \n",
-        "                                                    A,\n",
-        "                                                    test_size = 0.2,\n",
-        "                                                    random_state=0,\n",
-        "                                                    stratify=Y)\n",
-        "\n",
-        "# Work around indexing issue\n",
+        "(X_train, X_test, y_train, y_test, A_train, A_test) = train_test_split(\n",
+        "    X_raw, y, A, test_size=0.3, random_state=12345, stratify=y\n",
+        ")\n",
+        "\n",
+        "# Ensure indices are aligned between X, y and A,\n",
+        "# after all the slicing and splitting of DataFrames\n",
+        "# and Series\n",
+        "\n",
         "X_train = X_train.reset_index(drop=True)\n",
-        "A_train = A_train.reset_index(drop=True)\n",
         "X_test = X_test.reset_index(drop=True)\n",
+        "y_train = y_train.reset_index(drop=True)\n",
+        "y_test = y_test.reset_index(drop=True)\n",
+        "A_train = A_train.reset_index(drop=True)\n",
         "A_test = A_test.reset_index(drop=True)"
       ]
     },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "We have two types of column in the dataset - categorical columns which will need to be one-hot encoded, and numeric ones which will need to be rescaled. We also need to take care of missing values. We use a simple approach here, but please bear in mind that this is another way that bias could be introduced (especially if one subgroup tends to have more missing values).\n",
+        "\n",
+        "For this preprocessing, we make use of `Pipeline` objects from `sklearn`:"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {},
+      "outputs": [],
+      "source": [
+        "numeric_transformer = Pipeline(\n",
+        "    steps=[\n",
+        "        (\"impute\", SimpleImputer()),\n",
+        "        (\"scaler\", StandardScaler()),\n",
+        "    ]\n",
+        ")\n",
+        "\n",
+        "categorical_transformer = Pipeline(\n",
+        "    [\n",
+        "        (\"impute\", SimpleImputer(strategy=\"most_frequent\")),\n",
+        "        (\"ohe\", OneHotEncoder(handle_unknown=\"ignore\", sparse=False)),\n",
+        "    ]\n",
+        ")\n",
+        "\n",
+        "preprocessor = ColumnTransformer(\n",
+        "    transformers=[\n",
+        "        (\"num\", numeric_transformer, selector(dtype_exclude=\"category\")),\n",
+        "        (\"cat\", categorical_transformer, selector(dtype_include=\"category\")),\n",
+        "    ]\n",
+        ")"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "Now that the preprocessing pipeline is defined, we can fit it on our training data and apply the fitted transform to our test data:"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {},
+      "outputs": [],
+      "source": [
+        "X_train = preprocessor.fit_transform(X_train)\n",
+        "X_test = preprocessor.transform(X_test)"
+      ]
+    },
     {
       "cell_type": "markdown",
       "metadata": {},
@@ -181,7 +241,7 @@
       "source": [
         "unmitigated_predictor = LogisticRegression(solver='liblinear', fit_intercept=True)\n",
         "\n",
-        "unmitigated_predictor.fit(X_train, Y_train)"
+        "unmitigated_predictor.fit(X_train, y_train)"
       ]
     },
     {
@@ -198,7 +258,7 @@
       "outputs": [],
       "source": [
         "FairlearnDashboard(sensitive_features=A_test, sensitive_feature_names=['Sex', 'Race'],\n",
-        "                   y_true=Y_test,\n",
+        "                   y_true=y_test,\n",
         "                   y_pred={\"unmitigated\": unmitigated_predictor.predict(X_test)})"
       ]
     },
@@ -249,9 +309,10 @@
       "metadata": {},
       "outputs": [],
       "source": [
-        "sweep.fit(X_train, Y_train,\n",
+        "sweep.fit(X_train, y_train,\n",
         "          sensitive_features=A_train.sex)\n",
         "\n",
+        "# For Fairlearn v0.5.0, need sweep.predictors_\n",
         "predictors = sweep._predictors"
       ]
     },
@@ -273,9 +334,9 @@
         "    classifier = lambda X: m.predict(X)\n",
         "    \n",
         "    error = ErrorRate()\n",
-        "    error.load_data(X_train, pd.Series(Y_train), sensitive_features=A_train.sex)\n",
+        "    error.load_data(X_train, pd.Series(y_train), sensitive_features=A_train.sex)\n",
         "    disparity = DemographicParity()\n",
-        "    disparity.load_data(X_train, pd.Series(Y_train), sensitive_features=A_train.sex)\n",
+        "    disparity.load_data(X_train, pd.Series(y_train), sensitive_features=A_train.sex)\n",
         "    \n",
         "    errors.append(error.gamma(classifier)[0])\n",
         "    disparities.append(disparity.gamma(classifier).max())\n",
@@ -329,15 +390,15 @@
       "source": [
         "FairlearnDashboard(sensitive_features=A_test, \n",
         "                   sensitive_feature_names=['Sex', 'Race'],\n",
-        "                   y_true=Y_test.tolist(),\n",
+        "                   y_true=y_test.tolist(),\n",
         "                   y_pred=predictions_dominant)"
       ]
     },
     {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
-        "When using sex as the sensitive feature, we see a Pareto front forming - the set of predictors which represent optimal tradeoffs between accuracy and disparity in predictions. In the ideal case, we would have a predictor at (1,0) - perfectly accurate and without any unfairness under demographic parity (with respect to the protected attribute \"sex\"). The Pareto front represents the closest we can come to this ideal based on our data and choice of estimator. Note the range of the axes - the disparity axis covers more values than the accuracy, so we can reduce disparity substantially for a small loss in accuracy. Finally, we also see that the unmitigated model is towards the top right of the plot, with high accuracy, but worst disparity.\n",
+        "When using sex as the sensitive feature and accuracy as the metric, we see a Pareto front forming - the set of predictors which represent optimal tradeoffs between accuracy and disparity in predictions. In the ideal case, we would have a predictor at (1,0) - perfectly accurate and without any unfairness under demographic parity (with respect to the protected attribute \"sex\"). The Pareto front represents the closest we can come to this ideal based on our data and choice of estimator. Note the range of the axes - the disparity axis covers a wider range of values than the accuracy axis, so we can reduce disparity substantially for a small loss in accuracy. Finally, we also see that the unmitigated model is towards the top right of the plot, with high accuracy, but the worst disparity.\n",
         "\n",
         "By clicking on individual models on the plot, we can inspect their metrics for disparity and accuracy in greater detail. In a real example, we would then pick the model which represented the best trade-off between accuracy and disparity given the relevant business constraints."
       ]
@@ -444,7 +505,7 @@
         "from fairlearn.metrics._group_metric_set import _create_group_metric_set\n",
         "\n",
         "\n",
-        "dash_dict = _create_group_metric_set(y_true=Y_test,\n",
+        "dash_dict = _create_group_metric_set(y_true=y_test,\n",
         "                                     predictions=predictions_dominant_ids,\n",
         "                                     sensitive_features=sf,\n",
         "                                     prediction_type='binary_classification')"

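Note on the Pareto-front paragraph in the mitigation notebook: the cell that picks the dominant models is outside the hunks shown here, so the following is only a minimal sketch of how non-dominated models could be selected from the `errors` and `disparities` lists. The function name `pareto_front` and the sample numbers are hypothetical and are not taken from the notebook. Lower is treated as better on both axes, which corresponds to working with error rather than accuracy.

```python
from typing import List, Tuple


def pareto_front(points: List[Tuple[float, float]]) -> List[int]:
    """Return indices of non-dominated (error, disparity) points.

    A point is dominated if some other point is no worse on both
    axes and strictly better on at least one. Lower is better.
    """
    front = []
    for i, (err_i, disp_i) in enumerate(points):
        dominated = False
        for j, (err_j, disp_j) in enumerate(points):
            if j == i:
                continue
            no_worse = err_j <= err_i and disp_j <= disp_i
            strictly_better = err_j < err_i or disp_j < disp_i
            if no_worse and strictly_better:
                dominated = True
                break
        if not dominated:
            front.append(i)
    return front


# Hypothetical (error, disparity) pairs for five models;
# these are illustrative values, not results from the notebook.
candidates = [(0.15, 0.20), (0.16, 0.10), (0.18, 0.05), (0.17, 0.12), (0.20, 0.05)]
front_indices = pareto_front(candidates)
print(front_indices)
```

The quadratic scan is fine for the small number of predictors a grid search typically produces; a sort-based pass would only matter for much larger candidate sets.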
diff --git a/contrib/fairness/utilities.py b/contrib/fairness/fairness_nb_utils.py
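Note on the renamed module: its contents are not part of this diff; the notebook only imports `fetch_openml_with_retries` from it. For readers unfamiliar with the pattern, here is a generic sketch of a retry wrapper, assuming a fixed number of attempts and a fixed delay between them. The name `call_with_retries`, its parameters, and the `flaky_fetch` stand-in are hypothetical and may differ from what the real helper does.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], T], max_attempts: int = 3, delay_seconds: float = 0.0
) -> T:
    """Call fn until it succeeds or max_attempts is reached.

    The exception from the final failed attempt is re-raised.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts:
                raise
            time.sleep(delay_seconds)
    raise AssertionError("unreachable")


# Stand-in for a flaky download (hypothetical): fails twice, then succeeds.
calls = {"count": 0}


def flaky_fetch() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated network failure")
    return "dataset"


result = call_with_retries(flaky_fetch, max_attempts=5)
print(result, calls["count"])
```

With scikit-learn installed, the same wrapper could presumably be applied to a download such as `lambda: fetch_openml(data_id=1590, as_frame=True)`, with a non-zero `delay_seconds` to avoid hammering the server.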