akashperfect
diff --git a/configuration.ipynb b/configuration.ipynb
Lines changed: 1 addition & 1 deletion
diff --git a/how-to-use-azureml/automated-machine-learning/automl_setup.cmd b/how-to-use-azureml/automated-machine-learning/automl_setup.cmd
Lines changed: 1 addition & 1 deletion
diff --git a/how-to-use-azureml/automated-machine-learning/automl_setup_linux.sh b/how-to-use-azureml/automated-machine-learning/automl_setup_linux.sh
Lines changed: 1 addition & 1 deletion
diff --git a/how-to-use-azureml/automated-machine-learning/automl_setup_mac.sh b/how-to-use-azureml/automated-machine-learning/automl_setup_mac.sh
Lines changed: 1 addition & 1 deletion
diff --git a/how-to-use-azureml/automated-machine-learning/classification-bank-marketing-all-features/auto-ml-classification-bank-marketing-all-features.ipynb b/how-to-use-azureml/automated-machine-learning/classification-bank-marketing-all-features/auto-ml-classification-bank-marketing-all-features.ipynb
Lines changed: 70 additions & 15 deletions
diff --git a/how-to-use-azureml/automated-machine-learning/forecasting-bike-share/auto-ml-forecasting-bike-share.ipynb b/how-to-use-azureml/automated-machine-learning/forecasting-bike-share/auto-ml-forecasting-bike-share.ipynb
Lines changed: 1 addition & 1 deletion
diff --git a/how-to-use-azureml/automated-machine-learning/forecasting-energy-demand/auto-ml-forecasting-energy-demand.ipynb b/how-to-use-azureml/automated-machine-learning/forecasting-energy-demand/auto-ml-forecasting-energy-demand.ipynb
Lines changed: 6 additions & 20 deletions
diff --git a/how-to-use-azureml/automated-machine-learning/forecasting-grouping/build.py b/how-to-use-azureml/automated-machine-learning/forecasting-grouping/build.py
Lines changed: 3 additions & 1 deletion
diff --git a/how-to-use-azureml/automated-machine-learning/forecasting-grouping/deploy/score.py b/how-to-use-azureml/automated-machine-learning/forecasting-grouping/deploy/score.py
Lines changed: 1 addition & 1 deletion
diff --git a/how-to-use-azureml/automated-machine-learning/forecasting-high-frequency/automl-forecasting-function.ipynb b/how-to-use-azureml/automated-machine-learning/forecasting-high-frequency/automl-forecasting-function.ipynb
Lines changed: 8 additions & 13 deletions
@@ -103,7 +103,7 @@
       "source": [
         "import azureml.core\n",
         "\n",
-        "print(\"This notebook was created using version 1.0.76.1 of the Azure ML SDK\")\n",
+        "print(\"This notebook was created using version 1.0.76.2 of the Azure ML SDK\")\n",
         "print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
       ]
     },
 
@@ -14,7 +14,7 @@ IF "%CONDA_EXE%"=="" GOTO CondaMissing
 call conda activate %conda_env_name% 2>nul:
 
 if not errorlevel 1 (
-  echo Upgrading azureml-sdk[automl,notebooks,explain] in existing conda environment %conda_env_name%
+  echo Upgrading existing conda environment %conda_env_name%
   call pip uninstall azureml-train-automl -y -q
   call conda env update --name %conda_env_name% --file %automl_env_file%
   if errorlevel 1 goto ErrorExit
 
@@ -22,7 +22,7 @@ fi
 
 if source activate $CONDA_ENV_NAME 2> /dev/null
 then
-   echo "Upgrading azureml-sdk[automl,notebooks,explain] in existing conda environment" $CONDA_ENV_NAME
+   echo "Upgrading existing conda environment" $CONDA_ENV_NAME
    pip uninstall azureml-train-automl -y -q
    conda env update --name $CONDA_ENV_NAME --file $AUTOML_ENV_FILE &&
    jupyter nbextension uninstall --user --py azureml.widgets
 
@@ -22,7 +22,7 @@ fi
 
 if source activate $CONDA_ENV_NAME 2> /dev/null
 then
-   echo "Upgrading azureml-sdk[automl,notebooks,explain] in existing conda environment" $CONDA_ENV_NAME
+   echo "Upgrading existing conda environment" $CONDA_ENV_NAME
    pip uninstall azureml-train-automl -y -q
    conda env update --name $CONDA_ENV_NAME --file $AUTOML_ENV_FILE &&
    jupyter nbextension uninstall --user --py azureml.widgets
 
@@ -285,14 +285,16 @@
         "|**task**|classification or regression or forecasting|\n",
         "|**primary_metric**|This is the metric that you want to optimize. Classification supports the following primary metrics: <br><i>accuracy</i><br><i>AUC_weighted</i><br><i>average_precision_score_weighted</i><br><i>norm_macro_recall</i><br><i>precision_score_weighted</i>|\n",
         "|**iteration_timeout_minutes**|Time limit in minutes for each iteration.|\n",
-        "|**blacklist_models** or **whitelist_models** |*List* of *strings* indicating machine learning algorithms for AutoML to avoid in this run.<br><br> Allowed values for **Classification**<br><i>LogisticRegression</i><br><i>SGD</i><br><i>MultinomialNaiveBayes</i><br><i>BernoulliNaiveBayes</i><br><i>SVM</i><br><i>LinearSVM</i><br><i>KNN</i><br><i>DecisionTree</i><br><i>RandomForest</i><br><i>ExtremeRandomTrees</i><br><i>LightGBM</i><br><i>GradientBoosting</i><br><i>TensorFlowDNN</i><br><i>TensorFlowLinearClassifier</i><br><br>Allowed values for **Regression**<br><i>ElasticNet</i><br><i>GradientBoosting</i><br><i>DecisionTree</i><br><i>KNN</i><br><i>LassoLars</i><br><i>SGD</i><br><i>RandomForest</i><br><i>ExtremeRandomTrees</i><br><i>LightGBM</i><br><i>TensorFlowLinearRegressor</i><br><i>TensorFlowDNN</i><br><br>Allowed values for **Forecasting**<br><i>ElasticNet</i><br><i>GradientBoosting</i><br><i>DecisionTree</i><br><i>KNN</i><br><i>LassoLars</i><br><i>SGD</i><br><i>RandomForest</i><br><i>ExtremeRandomTrees</i><br><i>LightGBM</i><br><i>TensorFlowLinearRegressor</i><br><i>TensorFlowDNN</i><br><i>Arima</i><br><i>Prophet</i>|\n",
+        "|**blacklist_models** | *List* of *strings* indicating machine learning algorithms for AutoML to avoid in this run. <br><br> Allowed values for **Classification**<br><i>LogisticRegression</i><br><i>SGD</i><br><i>MultinomialNaiveBayes</i><br><i>BernoulliNaiveBayes</i><br><i>SVM</i><br><i>LinearSVM</i><br><i>KNN</i><br><i>DecisionTree</i><br><i>RandomForest</i><br><i>ExtremeRandomTrees</i><br><i>LightGBM</i><br><i>GradientBoosting</i><br><i>TensorFlowDNN</i><br><i>TensorFlowLinearClassifier</i><br><br>Allowed values for **Regression**<br><i>ElasticNet</i><br><i>GradientBoosting</i><br><i>DecisionTree</i><br><i>KNN</i><br><i>LassoLars</i><br><i>SGD</i><br><i>RandomForest</i><br><i>ExtremeRandomTrees</i><br><i>LightGBM</i><br><i>TensorFlowLinearRegressor</i><br><i>TensorFlowDNN</i><br><br>Allowed values for **Forecasting**<br><i>ElasticNet</i><br><i>GradientBoosting</i><br><i>DecisionTree</i><br><i>KNN</i><br><i>LassoLars</i><br><i>SGD</i><br><i>RandomForest</i><br><i>ExtremeRandomTrees</i><br><i>LightGBM</i><br><i>TensorFlowLinearRegressor</i><br><i>TensorFlowDNN</i><br><i>Arima</i><br><i>Prophet</i>|\n",
+        "| **whitelist_models** |  *List* of *strings* indicating machine learning algorithms for AutoML to use in this run. Same values listed above for **blacklist_models** allowed for **whitelist_models**.|\n",
         "|**experiment_exit_score**| Value indicating the target for *primary_metric*. <br>Once the target is surpassed the run terminates.|\n",
         "|**experiment_timeout_minutes**| Maximum amount of time in minutes that all iterations combined can take before the experiment terminates.|\n",
         "|**enable_early_stopping**| Flag to enble early termination if the score is not improving in the short term.|\n",
         "|**featurization**| 'auto' / 'off'  Indicator for whether featurization step should be done automatically or not. Note: If the input data is sparse, featurization cannot be turned on.|\n",
         "|**n_cross_validations**|Number of cross validation splits.|\n",
         "|**training_data**|Input dataset, containing both features and label column.|\n",
         "|**label_column_name**|The name of the label column.|\n",
+        "|**model_explainability**|Indicate to explain each trained pipeline or not.|\n",
         "\n",
         "**_You can find more information about primary metrics_** [here](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-auto-train#primary-metric)"
       ]
@@ -324,6 +326,7 @@
         "                             training_data = train_data,\n",
         "                             label_column_name = label,\n",
         "                             validation_data = validation_dataset,\n",
+        "                             model_explainability=True,\n",
         "                             **automl_settings\n",
         "                            )"
       ]
@@ -456,6 +459,72 @@
         "RunDetails(remote_run).show() "
       ]
     },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "### Retrieve the Best Model's explanation\n",
+        "Retrieve the explanation from the best_run which includes explanations for engineered features and raw features. Make sure that the run for generating explanations for the best model is completed."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {},
+      "outputs": [],
+      "source": [
+        "# Wait for the best model explanation run to complete\n",
+        "from azureml.train.automl.run import AutoMLRun\n",
+        "model_explainability_run_id = remote_run.get_properties().get('ModelExplainRunId')\n",
+        "print(model_explainability_run_id)\n",
+        "if model_explainability_run_id is not None:\n",
+        "    model_explainability_run = AutoMLRun(experiment=experiment, run_id=model_explainability_run_id)\n",
+        "    model_explainability_run.wait_for_completion()\n",
+        "\n",
+        "# Get the best run object\n",
+        "best_run, fitted_model = remote_run.get_output()"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "#### Download engineered feature importance from artifact store\n",
+        "You can use ExplanationClient to download the engineered feature explanations from the artifact store of the best_run."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {},
+      "outputs": [],
+      "source": [
+        "client = ExplanationClient.from_run(best_run)\n",
+        "engineered_explanations = client.download_model_explanation(raw=False)\n",
+        "exp_data = engineered_explanations.get_feature_importance_dict()\n",
+        "exp_data"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "#### Download raw feature importance from artifact store\n",
+        "You can use ExplanationClient to download the raw feature explanations from the artifact store of the best_run."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {},
+      "outputs": [],
+      "source": [
+        "client = ExplanationClient.from_run(best_run)\n",
+        "engineered_explanations = client.download_model_explanation(raw=True)\n",
+        "exp_data = engineered_explanations.get_feature_importance_dict()\n",
+        "exp_data"
+      ]
+    },
     {
       "cell_type": "markdown",
       "metadata": {},
@@ -572,20 +641,6 @@
         "best_run, fitted_model = remote_run.get_output()"
       ]
     },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "metadata": {},
-      "outputs": [],
-      "source": [
-        "import os\n",
-        "import shutil\n",
-        "\n",
-        "sript_folder = os.path.join(os.getcwd(), 'inference')\n",
-        "project_folder = '/inference'\n",
-        "os.makedirs(project_folder, exist_ok=True)"
-      ]
-    },
     {
       "cell_type": "code",
       "execution_count": null,
 
@@ -42,7 +42,7 @@
         "\n",
         "AutoML highlights here include built-in holiday featurization, accessing engineered feature names, and working with the `forecast` function. Please also look at the additional forecasting notebooks, which document lagging, rolling windows, forecast quantiles, other ways to use the forecast function, and forecaster deployment.\n",
         "\n",
-        "Make sure you have executed the [configuration](../configuration.ipynb) before running this notebook.\n",
+        "Make sure you have executed the [configuration notebook](../../../configuration.ipynb) before running this notebook.\n",
         "\n",
         "Notebook synopsis:\n",
         "1. Creating an Experiment in an existing Workspace\n",
 
@@ -31,8 +31,8 @@
         "1. [Results](#Results)\n",
         "\n",
         "Advanced Forecasting\n",
-        "1. [Advanced Training](#Advanced Training)\n",
-        "1. [Advanced Results](#Advanced Results)"
+        "1. [Advanced Training](#advanced_training)\n",
+        "1. [Advanced Results](#advanced Results)"
       ]
     },
     {
@@ -463,11 +463,7 @@
       "metadata": {},
       "source": [
         "### Forecast Function\n",
-        "For forecasting, we will use the forecast function instead of the predict function. There are two reasons for this.\n",
-        "\n",
-        "We need to pass the recent values of the target variable y, whereas the scikit-compatible predict function only takes the non-target variables 'test'. In our case, the test data immediately follows the training data, and we fill the target variable with NaN. The NaN serves as a question mark for the forecaster to fill with the actuals. Using the forecast function will produce forecasts using the shortest possible forecast horizon. The last time at which a definite (non-NaN) value is seen is the forecast origin - the last time when the value of the target is known.\n",
-        "\n",
-        "Using the predict method would result in getting predictions for EVERY horizon the forecaster can predict at. This is useful when training and evaluating the performance of the forecaster at various horizons, but the level of detail is excessive for normal use."
+        "For forecasting, we will use the forecast function instead of the predict function. Using the predict method would result in getting predictions for EVERY horizon the forecaster can predict at. This is useful when training and evaluating the performance of the forecaster at various horizons, but the level of detail is excessive for normal use. Forecast function also can handle more complicated scenarios, see notebook on [high frequency forecasting](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/automated-machine-learning/forecasting-high-frequency/automl-forecasting-function.ipynb)."
       ]
     },
     {
@@ -476,15 +472,10 @@
       "metadata": {},
       "outputs": [],
       "source": [
-        "# Replace ALL values in y by NaN.\n",
-        "# The forecast origin will be at the beginning of the first forecast period.\n",
-        "# (Which is the same time as the end of the last training period.)\n",
-        "y_query = y_test.copy().astype(np.float)\n",
-        "y_query.fill(np.nan)\n",
         "# The featurized data, aligned to y, will also be returned.\n",
         "# This contains the assumptions that were made in the forecast\n",
         "# and helps align the forecast to the original data\n",
-        "y_predictions, X_trans = fitted_model.forecast(X_test, y_query)"
+        "y_predictions, X_trans = fitted_model.forecast(X_test)"
       ]
     },
     {
@@ -557,7 +548,7 @@
       "cell_type": "markdown",
       "metadata": {},
       "source": [
-        "## Advanced Training\n",
+        "## Advanced Training <a id=\"advanced_training\"></a>\n",
         "We did not use lags in the previous model specification. In effect, the prediction was the result of a simple regression on date, grain and any additional features. This is often a very good prediction as common time series patterns like seasonality and trends can be captured in this manner. Such simple regression is horizon-less: it doesn't matter how far into the future we are predicting, because we are not using past data. In the previous example, the horizon was only used to split the data for cross-validation."
       ]
     },
@@ -652,15 +643,10 @@
       "metadata": {},
       "outputs": [],
       "source": [
-        "# Replace ALL values in y by NaN.\n",
-        "# The forecast origin will be at the beginning of the first forecast period.\n",
-        "# (Which is the same time as the end of the last training period.)\n",
-        "y_query = y_test.copy().astype(np.float)\n",
-        "y_query.fill(np.nan)\n",
         "# The featurized data, aligned to y, will also be returned.\n",
         "# This contains the assumptions that were made in the forecast\n",
         "# and helps align the forecast to the original data\n",
-        "y_predictions, X_trans = fitted_model_lags.forecast(X_test, y_query)"
+        "y_predictions, X_trans = fitted_model_lags.forecast(X_test)"
       ]
     },
     {
 
@@ -8,6 +8,7 @@
 from azureml.core.compute import ComputeTarget
 from azureml.core.conda_dependencies import CondaDependencies
 from azureml.core.dataset import Dataset
+from azureml.data import TabularDataset
 from azureml.pipeline.core import PipelineData, PipelineParameter, TrainingOutput, StepSequence
 from azureml.pipeline.steps import PythonScriptStep
 from azureml.train.automl import AutoMLConfig
@@ -34,8 +35,9 @@ def _get_configs(automlconfig: AutoMLConfig,
         group_name = valid_chars.sub('', group_name)
         for key in group.index:
             single = single._dataflow.filter(data._dataflow[key] == group[key])
+        t_dataset = TabularDataset._create(single)
         group_conf = copy.deepcopy(automlconfig)
-        group_conf.user_settings['training_data'] = single
+        group_conf.user_settings['training_data'] = t_dataset
         group_conf.user_settings['label_column_name'] = target_column
         group_conf.user_settings['compute_target'] = compute_target
         configs[group_name] = group_conf
 
@@ -44,7 +44,7 @@ def run(raw_data):
                 model_path = Model.get_model_path(cur_group)
                 model = joblib.load(model_path)
                 models[cur_group] = model
-            _, xtrans = models[cur_group].forecast(df_one, np.repeat(np.nan, len(df_one)))
+            _, xtrans = models[cur_group].forecast(df_one)
             dfs.append(xtrans)
         df_ret = pd.concat(dfs)
         df_ret.reset_index(drop=False, inplace=True)
 
@@ -377,9 +377,7 @@
         "\n",
         "![Forecasting after training](forecast_function_at_train.png)\n",
         "\n",
-        "The `X_test` and `y_query` below, taken together, form the **forecast request**. The two are interpreted as aligned - `y_query` could actally be a column in `X_test`. `NaN`s in `y_query` are the question marks. These will be filled with the forecasts.\n",
-        "\n",
-        "When the forecast period immediately follows the training period, the models retain the last few points of data. You can simply fill `y_query` filled with question marks - the model has the data for the lookback already.\n"
+        "We use `X_test` as a **forecast request** to generate the predictions."
       ]
     },
     {
@@ -408,8 +406,7 @@
       "metadata": {},
       "outputs": [],
       "source": [
-        "y_query = np.repeat(np.NaN, X_test.shape[0])\n",
-        "y_pred_no_gap, xy_nogap =  fitted_model.forecast(X_test, y_query)\n",
+        "y_pred_no_gap, xy_nogap =  fitted_model.forecast(X_test)\n",
         "\n",
         "# xy_nogap contains the predictions in the _automl_target_col column.\n",
         "# Those same numbers are output in y_pred_no_gap\n",
@@ -437,7 +434,7 @@
       "metadata": {},
       "outputs": [],
       "source": [
-        "quantiles =  fitted_model.forecast_quantiles(X_test, y_query)\n",
+        "quantiles =  fitted_model.forecast_quantiles(X_test)\n",
         "quantiles"
       ]
     },
@@ -460,10 +457,10 @@
         "# specify which quantiles you would like \n",
         "fitted_model.quantiles = [0.01, 0.5, 0.95]\n",
         "# use forecast_quantiles function, not the forecast() one\n",
-        "y_pred_quantiles =  fitted_model.forecast_quantiles(X_test, y_query)\n",
+        "y_pred_quantiles =  fitted_model.forecast_quantiles(X_test)\n",
         "\n",
         "# it all nicely aligns column-wise\n",
-        "pd.concat([X_test.reset_index(), pd.DataFrame({'query' : y_query}), y_pred_quantiles], axis=1)"
+        "pd.concat([X_test.reset_index(), y_pred_quantiles], axis=1)"
       ]
     },
     {
@@ -539,9 +536,7 @@
       "outputs": [],
       "source": [
         "try: \n",
-        "    y_query = y_away.copy()\n",
-        "    y_query.fill(np.NaN)\n",
-        "    y_pred_away, xy_away = fitted_model.forecast(X_away, y_query)\n",
+        "    y_pred_away, xy_away = fitted_model.forecast(X_away)\n",
         "    xy_away\n",
         "except Exception as e:\n",
         "    print(e)"
@@ -551,7 +546,7 @@
       "cell_type": "markdown",
       "metadata": {},
       "source": [
-        "How should we read that eror message? The forecast origin is at the last time the model saw an actual value of `y` (the target). That was at the end of the training data! Because the model received all `NaN` (and not an actual target value), it is attempting to forecast from the end of training data. But the requested forecast periods are past the maximum horizon. We need to provide a define `y` value to establish the forecast origin.\n",
+        "How should we read that eror message? The forecast origin is at the last time the model saw an actual value of `y` (the target). That was at the end of the training data! The model is attempting to forecast from the end of training data. But the requested forecast periods are past the maximum horizon. We need to provide a define `y` value to establish the forecast origin.\n",
         "\n",
         "We will use this helper function to take the required amount of context from the data preceding the testing data. It's definition is intentionally simplified to keep the idea in the clear."
       ]
@@ -740,7 +735,7 @@
       "name": "python",
       "nbconvert_exporter": "python",
       "pygments_lexer": "ipython3",
-      "version": "3.6.7"
+      "version": "3.6.8"
     },
     "tags": [
       "Forecasting",
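
For readers comparing the old and new call sites: the forecasting hunks above drop the explicit NaN-filled `y_query` argument from `forecast` and `forecast_quantiles`. The sketch below shows only what the removed lines were constructing, using NumPy alone. The `y_test` values are made up for illustration, and the builtin `float` stands in for the `np.float` alias from the removed lines, since that alias no longer exists in recent NumPy releases. Nothing here calls the Azure ML SDK.

```python
import numpy as np

# Hypothetical stand-in for the notebook's y_test (values are made up).
y_test = np.array([10.0, 12.5, 11.0])

# Pattern from the removed notebook cells: copy the actuals, then blank them out.
# The builtin float replaces the np.float alias used in the removed lines.
y_query = y_test.copy().astype(float)
y_query.fill(np.nan)

# Pattern from the removed score.py line: build the NaN vector directly.
y_query_alt = np.repeat(np.nan, len(y_test))
```

Both forms yield an all-NaN vector aligned with the test rows, which is what the old calls passed as the second argument.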