Commit e149565

Merge pull request Azure#679 from Azure/release_update/Release-30
update samples - test
2 parents 0c2c450 + 75610ec commit e149565

33 files changed

+830
-119
lines changed

configuration.ipynb

Lines changed: 1 addition & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -103,7 +103,7 @@
103103
"source": [
104104
"import azureml.core\n",
105105
"\n",
106-
"print(\"This notebook was created using version 1.0.76.1 of the Azure ML SDK\")\n",
106+
"print(\"This notebook was created using version 1.0.76.2 of the Azure ML SDK\")\n",
107107
"print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
108108
]
109109
},
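
Note on the version check: the cell above prints the SDK version the notebook was created with next to `azureml.core.VERSION`. The following is a minimal sketch of how two such dotted version strings could be compared. The helpers `parse_version` and `is_older` are hypothetical and not part of the Azure ML SDK, and the sketch assumes purely numeric components (no suffixes such as `rc1`).

```python
def parse_version(version: str) -> tuple:
    """Turn a dotted version string such as '1.0.76.2' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def is_older(installed: str, expected: str) -> bool:
    """Return True when the installed version sorts before the expected one."""
    # Tuples compare element by element, so (1, 0, 9) < (1, 0, 76),
    # whereas the plain strings "1.0.9" and "1.0.76" would sort the other way.
    return parse_version(installed) < parse_version(expected)


if __name__ == "__main__":
    # Hypothetical values: the two version strings that appear in this diff.
    print(is_older("1.0.76.1", "1.0.76.2"))
```

Comparing integer tuples rather than raw strings avoids lexicographic surprises once a component reaches two or more digits.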

how-to-use-azureml/automated-machine-learning/automl_setup.cmd

Lines changed: 1 addition & 1 deletion
@@ -14,7 +14,7 @@ IF "%CONDA_EXE%"=="" GOTO CondaMissing
1414
call conda activate %conda_env_name% 2>nul:
1515

1616
if not errorlevel 1 (
17-
echo Upgrading azureml-sdk[automl,notebooks,explain] in existing conda environment %conda_env_name%
17+
echo Upgrading existing conda environment %conda_env_name%
1818
call pip uninstall azureml-train-automl -y -q
1919
call conda env update --name %conda_env_name% --file %automl_env_file%
2020
if errorlevel 1 goto ErrorExit

how-to-use-azureml/automated-machine-learning/automl_setup_linux.sh

Lines changed: 1 addition & 1 deletion
@@ -22,7 +22,7 @@ fi
2222

2323
if source activate $CONDA_ENV_NAME 2> /dev/null
2424
then
25-
echo "Upgrading azureml-sdk[automl,notebooks,explain] in existing conda environment" $CONDA_ENV_NAME
25+
echo "Upgrading existing conda environment" $CONDA_ENV_NAME
2626
pip uninstall azureml-train-automl -y -q
2727
conda env update --name $CONDA_ENV_NAME --file $AUTOML_ENV_FILE &&
2828
jupyter nbextension uninstall --user --py azureml.widgets

how-to-use-azureml/automated-machine-learning/automl_setup_mac.sh

Lines changed: 1 addition & 1 deletion
@@ -22,7 +22,7 @@ fi
2222

2323
if source activate $CONDA_ENV_NAME 2> /dev/null
2424
then
25-
echo "Upgrading azureml-sdk[automl,notebooks,explain] in existing conda environment" $CONDA_ENV_NAME
25+
echo "Upgrading existing conda environment" $CONDA_ENV_NAME
2626
pip uninstall azureml-train-automl -y -q
2727
conda env update --name $CONDA_ENV_NAME --file $AUTOML_ENV_FILE &&
2828
jupyter nbextension uninstall --user --py azureml.widgets

how-to-use-azureml/automated-machine-learning/classification-bank-marketing-all-features/auto-ml-classification-bank-marketing-all-features.ipynb

Lines changed: 70 additions & 15 deletions
@@ -285,14 +285,16 @@
285285
"|**task**|classification or regression or forecasting|\n",
286286
"|**primary_metric**|This is the metric that you want to optimize. Classification supports the following primary metrics: <br><i>accuracy</i><br><i>AUC_weighted</i><br><i>average_precision_score_weighted</i><br><i>norm_macro_recall</i><br><i>precision_score_weighted</i>|\n",
287287
"|**iteration_timeout_minutes**|Time limit in minutes for each iteration.|\n",
288-
"|**blacklist_models** or **whitelist_models** |*List* of *strings* indicating machine learning algorithms for AutoML to avoid in this run.<br><br> Allowed values for **Classification**<br><i>LogisticRegression</i><br><i>SGD</i><br><i>MultinomialNaiveBayes</i><br><i>BernoulliNaiveBayes</i><br><i>SVM</i><br><i>LinearSVM</i><br><i>KNN</i><br><i>DecisionTree</i><br><i>RandomForest</i><br><i>ExtremeRandomTrees</i><br><i>LightGBM</i><br><i>GradientBoosting</i><br><i>TensorFlowDNN</i><br><i>TensorFlowLinearClassifier</i><br><br>Allowed values for **Regression**<br><i>ElasticNet</i><br><i>GradientBoosting</i><br><i>DecisionTree</i><br><i>KNN</i><br><i>LassoLars</i><br><i>SGD</i><br><i>RandomForest</i><br><i>ExtremeRandomTrees</i><br><i>LightGBM</i><br><i>TensorFlowLinearRegressor</i><br><i>TensorFlowDNN</i><br><br>Allowed values for **Forecasting**<br><i>ElasticNet</i><br><i>GradientBoosting</i><br><i>DecisionTree</i><br><i>KNN</i><br><i>LassoLars</i><br><i>SGD</i><br><i>RandomForest</i><br><i>ExtremeRandomTrees</i><br><i>LightGBM</i><br><i>TensorFlowLinearRegressor</i><br><i>TensorFlowDNN</i><br><i>Arima</i><br><i>Prophet</i>|\n",
288+
"|**blacklist_models** | *List* of *strings* indicating machine learning algorithms for AutoML to avoid in this run. <br><br> Allowed values for **Classification**<br><i>LogisticRegression</i><br><i>SGD</i><br><i>MultinomialNaiveBayes</i><br><i>BernoulliNaiveBayes</i><br><i>SVM</i><br><i>LinearSVM</i><br><i>KNN</i><br><i>DecisionTree</i><br><i>RandomForest</i><br><i>ExtremeRandomTrees</i><br><i>LightGBM</i><br><i>GradientBoosting</i><br><i>TensorFlowDNN</i><br><i>TensorFlowLinearClassifier</i><br><br>Allowed values for **Regression**<br><i>ElasticNet</i><br><i>GradientBoosting</i><br><i>DecisionTree</i><br><i>KNN</i><br><i>LassoLars</i><br><i>SGD</i><br><i>RandomForest</i><br><i>ExtremeRandomTrees</i><br><i>LightGBM</i><br><i>TensorFlowLinearRegressor</i><br><i>TensorFlowDNN</i><br><br>Allowed values for **Forecasting**<br><i>ElasticNet</i><br><i>GradientBoosting</i><br><i>DecisionTree</i><br><i>KNN</i><br><i>LassoLars</i><br><i>SGD</i><br><i>RandomForest</i><br><i>ExtremeRandomTrees</i><br><i>LightGBM</i><br><i>TensorFlowLinearRegressor</i><br><i>TensorFlowDNN</i><br><i>Arima</i><br><i>Prophet</i>|\n",
289+
"| **whitelist_models** | *List* of *strings* indicating machine learning algorithms for AutoML to use in this run. Same values listed above for **blacklist_models** allowed for **whitelist_models**.|\n",
289290
"|**experiment_exit_score**| Value indicating the target for *primary_metric*. <br>Once the target is surpassed the run terminates.|\n",
290291
"|**experiment_timeout_minutes**| Maximum amount of time in minutes that all iterations combined can take before the experiment terminates.|\n",
291292
"|**enable_early_stopping**| Flag to enble early termination if the score is not improving in the short term.|\n",
292293
"|**featurization**| 'auto' / 'off' Indicator for whether featurization step should be done automatically or not. Note: If the input data is sparse, featurization cannot be turned on.|\n",
293294
"|**n_cross_validations**|Number of cross validation splits.|\n",
294295
"|**training_data**|Input dataset, containing both features and label column.|\n",
295296
"|**label_column_name**|The name of the label column.|\n",
297+
"|**model_explainability**|Indicate to explain each trained pipeline or not.|\n",
296298
"\n",
297299
"**_You can find more information about primary metrics_** [here](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-auto-train#primary-metric)"
298300
]
@@ -324,6 +326,7 @@
324326
" training_data = train_data,\n",
325327
" label_column_name = label,\n",
326328
" validation_data = validation_dataset,\n",
329+
" model_explainability=True,\n",
327330
" **automl_settings\n",
328331
" )"
329332
]
@@ -456,6 +459,72 @@
456459
"RunDetails(remote_run).show() "
457460
]
458461
},
462+
{
463+
"cell_type": "markdown",
464+
"metadata": {},
465+
"source": [
466+
"### Retrieve the Best Model's explanation\n",
467+
"Retrieve the explanation from the best_run which includes explanations for engineered features and raw features. Make sure that the run for generating explanations for the best model is completed."
468+
]
469+
},
470+
{
471+
"cell_type": "code",
472+
"execution_count": null,
473+
"metadata": {},
474+
"outputs": [],
475+
"source": [
476+
"# Wait for the best model explanation run to complete\n",
477+
"from azureml.train.automl.run import AutoMLRun\n",
478+
"model_explainability_run_id = remote_run.get_properties().get('ModelExplainRunId')\n",
479+
"print(model_explainability_run_id)\n",
480+
"if model_explainability_run_id is not None:\n",
481+
" model_explainability_run = AutoMLRun(experiment=experiment, run_id=model_explainability_run_id)\n",
482+
" model_explainability_run.wait_for_completion()\n",
483+
"\n",
484+
"# Get the best run object\n",
485+
"best_run, fitted_model = remote_run.get_output()"
486+
]
487+
},
488+
{
489+
"cell_type": "markdown",
490+
"metadata": {},
491+
"source": [
492+
"#### Download engineered feature importance from artifact store\n",
493+
"You can use ExplanationClient to download the engineered feature explanations from the artifact store of the best_run."
494+
]
495+
},
496+
{
497+
"cell_type": "code",
498+
"execution_count": null,
499+
"metadata": {},
500+
"outputs": [],
501+
"source": [
502+
"client = ExplanationClient.from_run(best_run)\n",
503+
"engineered_explanations = client.download_model_explanation(raw=False)\n",
504+
"exp_data = engineered_explanations.get_feature_importance_dict()\n",
505+
"exp_data"
506+
]
507+
},
508+
{
509+
"cell_type": "markdown",
510+
"metadata": {},
511+
"source": [
512+
"#### Download raw feature importance from artifact store\n",
513+
"You can use ExplanationClient to download the raw feature explanations from the artifact store of the best_run."
514+
]
515+
},
516+
{
517+
"cell_type": "code",
518+
"execution_count": null,
519+
"metadata": {},
520+
"outputs": [],
521+
"source": [
522+
"client = ExplanationClient.from_run(best_run)\n",
523+
"engineered_explanations = client.download_model_explanation(raw=True)\n",
524+
"exp_data = engineered_explanations.get_feature_importance_dict()\n",
525+
"exp_data"
526+
]
527+
},
459528
{
460529
"cell_type": "markdown",
461530
"metadata": {},
@@ -572,20 +641,6 @@
572641
"best_run, fitted_model = remote_run.get_output()"
573642
]
574643
},
575-
{
576-
"cell_type": "code",
577-
"execution_count": null,
578-
"metadata": {},
579-
"outputs": [],
580-
"source": [
581-
"import os\n",
582-
"import shutil\n",
583-
"\n",
584-
"sript_folder = os.path.join(os.getcwd(), 'inference')\n",
585-
"project_folder = '/inference'\n",
586-
"os.makedirs(project_folder, exist_ok=True)"
587-
]
588-
},
589644
{
590645
"cell_type": "code",
591646
"execution_count": null,
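
Note on the explanation cells: the cells added to this notebook end by displaying `exp_data`, the result of `get_feature_importance_dict()`. Assuming that result is a plain mapping from feature name to a numeric importance, a short sketch of ranking it might look like the following. The helper `top_features` and the sample values are hypothetical and only illustrate the idea.

```python
def top_features(importances: dict, k: int = 3) -> list:
    """Return the k (name, importance) pairs with the largest absolute importance."""
    ranked = sorted(
        importances.items(),
        key=lambda item: abs(item[1]),  # rank by magnitude, ignoring sign
        reverse=True,
    )
    return ranked[:k]


if __name__ == "__main__":
    # Made-up importances standing in for exp_data.
    sample = {"age": 0.12, "duration": 0.48, "job": 0.03, "balance": 0.20}
    for name, value in top_features(sample, k=2):
        print(name, value)
```

Ranking by absolute value is a design choice in this sketch; drop the `abs` call if signed importances should keep their order.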

how-to-use-azureml/automated-machine-learning/forecasting-bike-share/auto-ml-forecasting-bike-share.ipynb

Lines changed: 1 addition & 1 deletion
@@ -42,7 +42,7 @@
4242
"\n",
4343
"AutoML highlights here include built-in holiday featurization, accessing engineered feature names, and working with the `forecast` function. Please also look at the additional forecasting notebooks, which document lagging, rolling windows, forecast quantiles, other ways to use the forecast function, and forecaster deployment.\n",
4444
"\n",
45-
"Make sure you have executed the [configuration](../configuration.ipynb) before running this notebook.\n",
45+
"Make sure you have executed the [configuration notebook](../../../configuration.ipynb) before running this notebook.\n",
4646
"\n",
4747
"Notebook synopsis:\n",
4848
"1. Creating an Experiment in an existing Workspace\n",

how-to-use-azureml/automated-machine-learning/forecasting-energy-demand/auto-ml-forecasting-energy-demand.ipynb

Lines changed: 6 additions & 20 deletions
@@ -31,8 +31,8 @@
3131
"1. [Results](#Results)\n",
3232
"\n",
3333
"Advanced Forecasting\n",
34-
"1. [Advanced Training](#Advanced Training)\n",
35-
"1. [Advanced Results](#Advanced Results)"
34+
"1. [Advanced Training](#advanced_training)\n",
35+
"1. [Advanced Results](#advanced Results)"
3636
]
3737
},
3838
{
@@ -463,11 +463,7 @@
463463
"metadata": {},
464464
"source": [
465465
"### Forecast Function\n",
466-
"For forecasting, we will use the forecast function instead of the predict function. There are two reasons for this.\n",
467-
"\n",
468-
"We need to pass the recent values of the target variable y, whereas the scikit-compatible predict function only takes the non-target variables 'test'. In our case, the test data immediately follows the training data, and we fill the target variable with NaN. The NaN serves as a question mark for the forecaster to fill with the actuals. Using the forecast function will produce forecasts using the shortest possible forecast horizon. The last time at which a definite (non-NaN) value is seen is the forecast origin - the last time when the value of the target is known.\n",
469-
"\n",
470-
"Using the predict method would result in getting predictions for EVERY horizon the forecaster can predict at. This is useful when training and evaluating the performance of the forecaster at various horizons, but the level of detail is excessive for normal use."
466+
"For forecasting, we will use the forecast function instead of the predict function. Using the predict method would result in getting predictions for EVERY horizon the forecaster can predict at. This is useful when training and evaluating the performance of the forecaster at various horizons, but the level of detail is excessive for normal use. Forecast function also can handle more complicated scenarios, see notebook on [high frequency forecasting](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/automated-machine-learning/forecasting-high-frequency/automl-forecasting-function.ipynb)."
471467
]
472468
},
473469
{
@@ -476,15 +472,10 @@
476472
"metadata": {},
477473
"outputs": [],
478474
"source": [
479-
"# Replace ALL values in y by NaN.\n",
480-
"# The forecast origin will be at the beginning of the first forecast period.\n",
481-
"# (Which is the same time as the end of the last training period.)\n",
482-
"y_query = y_test.copy().astype(np.float)\n",
483-
"y_query.fill(np.nan)\n",
484475
"# The featurized data, aligned to y, will also be returned.\n",
485476
"# This contains the assumptions that were made in the forecast\n",
486477
"# and helps align the forecast to the original data\n",
487-
"y_predictions, X_trans = fitted_model.forecast(X_test, y_query)"
478+
"y_predictions, X_trans = fitted_model.forecast(X_test)"
488479
]
489480
},
490481
{
@@ -557,7 +548,7 @@
557548
"cell_type": "markdown",
558549
"metadata": {},
559550
"source": [
560-
"## Advanced Training\n",
551+
"## Advanced Training <a id=\"advanced_training\"></a>\n",
561552
"We did not use lags in the previous model specification. In effect, the prediction was the result of a simple regression on date, grain and any additional features. This is often a very good prediction as common time series patterns like seasonality and trends can be captured in this manner. Such simple regression is horizon-less: it doesn't matter how far into the future we are predicting, because we are not using past data. In the previous example, the horizon was only used to split the data for cross-validation."
562553
]
563554
},
@@ -652,15 +643,10 @@
652643
"metadata": {},
653644
"outputs": [],
654645
"source": [
655-
"# Replace ALL values in y by NaN.\n",
656-
"# The forecast origin will be at the beginning of the first forecast period.\n",
657-
"# (Which is the same time as the end of the last training period.)\n",
658-
"y_query = y_test.copy().astype(np.float)\n",
659-
"y_query.fill(np.nan)\n",
660646
"# The featurized data, aligned to y, will also be returned.\n",
661647
"# This contains the assumptions that were made in the forecast\n",
662648
"# and helps align the forecast to the original data\n",
663-
"y_predictions, X_trans = fitted_model_lags.forecast(X_test, y_query)"
649+
"y_predictions, X_trans = fitted_model_lags.forecast(X_test)"
664650
]
665651
},
666652
{
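
Note on the table-of-contents links: this file's diff changes the two "Advanced Forecasting" links to `#advanced_training` and `#advanced Results`, and the hunks shown here add an explicit `<a id="advanced_training"></a>` anchor only for the first. A rough sketch of checking link fragments against explicit anchors is given below. The helper `unmatched_fragments` is hypothetical, and it deliberately ignores the anchors Jupyter generates automatically from heading text, so it can report fragments that still resolve in practice.

```python
import re

# Matches the fragment in markdown links of the form [text](#fragment).
LINK_RE = re.compile(r"\]\(#([^)]*)\)")
# Matches explicit HTML anchors of the form <a id="name">.
ANCHOR_RE = re.compile(r'<a\s+id="([^"]+)"')


def unmatched_fragments(markdown_cells: list) -> list:
    """Return link fragments that have no explicit <a id="..."> anchor."""
    text = "\n".join(markdown_cells)
    anchors = set(ANCHOR_RE.findall(text))
    return [frag for frag in LINK_RE.findall(text) if frag not in anchors]


if __name__ == "__main__":
    cells = [
        "1. [Advanced Training](#advanced_training)",
        "1. [Advanced Results](#advanced Results)",
        '## Advanced Training <a id="advanced_training"></a>',
    ]
    print(unmatched_fragments(cells))
```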

how-to-use-azureml/automated-machine-learning/forecasting-grouping/build.py

Lines changed: 3 additions & 1 deletion
@@ -8,6 +8,7 @@
88
from azureml.core.compute import ComputeTarget
99
from azureml.core.conda_dependencies import CondaDependencies
1010
from azureml.core.dataset import Dataset
11+
from azureml.data import TabularDataset
1112
from azureml.pipeline.core import PipelineData, PipelineParameter, TrainingOutput, StepSequence
1213
from azureml.pipeline.steps import PythonScriptStep
1314
from azureml.train.automl import AutoMLConfig
@@ -34,8 +35,9 @@ def _get_configs(automlconfig: AutoMLConfig,
3435
group_name = valid_chars.sub('', group_name)
3536
for key in group.index:
3637
single = single._dataflow.filter(data._dataflow[key] == group[key])
38+
t_dataset = TabularDataset._create(single)
3739
group_conf = copy.deepcopy(automlconfig)
38-
group_conf.user_settings['training_data'] = single
40+
group_conf.user_settings['training_data'] = t_dataset
3941
group_conf.user_settings['label_column_name'] = target_column
4042
group_conf.user_settings['compute_target'] = compute_target
4143
configs[group_name] = group_conf
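
Note on the per-group configuration: the loop above deep-copies the base `AutoMLConfig` for each group and then overrides `training_data` and related settings on the copy. The pattern can be sketched with plain dictionaries as below. The function `per_group_settings` and the sample data are hypothetical stand-ins, not the `AutoMLConfig` API.

```python
import copy


def per_group_settings(base: dict, groups: dict) -> dict:
    """Build one independent settings dict per group from a shared base."""
    configs = {}
    for name, data in groups.items():
        # deepcopy keeps nested values in the base from being shared,
        # so an override for one group cannot leak into another.
        conf = copy.deepcopy(base)
        conf["training_data"] = data
        configs[name] = conf
    return configs


if __name__ == "__main__":
    base = {"task": "forecasting", "options": {"n_cross_validations": 3}}
    groups = {"store_a": [1, 2, 3], "store_b": [4, 5, 6]}
    configs = per_group_settings(base, groups)
    configs["store_a"]["options"]["n_cross_validations"] = 5
    print(base["options"], configs["store_b"]["options"])
```

A shallow copy would share the nested `options` dict between groups, which is why the sketch, like the script, uses `copy.deepcopy`.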

how-to-use-azureml/automated-machine-learning/forecasting-grouping/deploy/score.py

Lines changed: 1 addition & 1 deletion
@@ -44,7 +44,7 @@ def run(raw_data):
4444
model_path = Model.get_model_path(cur_group)
4545
model = joblib.load(model_path)
4646
models[cur_group] = model
47-
_, xtrans = models[cur_group].forecast(df_one, np.repeat(np.nan, len(df_one)))
47+
_, xtrans = models[cur_group].forecast(df_one)
4848
dfs.append(xtrans)
4949
df_ret = pd.concat(dfs)
5050
df_ret.reset_index(drop=False, inplace=True)
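
Note on the model cache: the context lines above load a model for `cur_group` and store it in `models` before calling `forecast`. The guard that decides when to load is not visible in this hunk, so the following is only a sketch of a load-once cache under that assumption. `get_model` and the `loader` callable are hypothetical; in the script the loading is done with `Model.get_model_path` and `joblib.load`.

```python
def get_model(cache: dict, group: str, loader):
    """Return the cached model for a group, loading it on first use."""
    if group not in cache:
        cache[group] = loader(group)
    return cache[group]


if __name__ == "__main__":
    calls = []

    def fake_loader(group):
        # Stand-in for the real model loading; records each invocation.
        calls.append(group)
        return "model-for-" + group

    models = {}
    get_model(models, "store_a", fake_loader)
    get_model(models, "store_a", fake_loader)
    print(calls)
```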

how-to-use-azureml/automated-machine-learning/forecasting-high-frequency/automl-forecasting-function.ipynb

Lines changed: 8 additions & 13 deletions
@@ -377,9 +377,7 @@
377377
"\n",
378378
"![Forecasting after training](forecast_function_at_train.png)\n",
379379
"\n",
380-
"The `X_test` and `y_query` below, taken together, form the **forecast request**. The two are interpreted as aligned - `y_query` could actally be a column in `X_test`. `NaN`s in `y_query` are the question marks. These will be filled with the forecasts.\n",
381-
"\n",
382-
"When the forecast period immediately follows the training period, the models retain the last few points of data. You can simply fill `y_query` filled with question marks - the model has the data for the lookback already.\n"
380+
"We use `X_test` as a **forecast request** to generate the predictions."
383381
]
384382
},
385383
{
@@ -408,8 +406,7 @@
408406
"metadata": {},
409407
"outputs": [],
410408
"source": [
411-
"y_query = np.repeat(np.NaN, X_test.shape[0])\n",
412-
"y_pred_no_gap, xy_nogap = fitted_model.forecast(X_test, y_query)\n",
409+
"y_pred_no_gap, xy_nogap = fitted_model.forecast(X_test)\n",
413410
"\n",
414411
"# xy_nogap contains the predictions in the _automl_target_col column.\n",
415412
"# Those same numbers are output in y_pred_no_gap\n",
@@ -437,7 +434,7 @@
437434
"metadata": {},
438435
"outputs": [],
439436
"source": [
440-
"quantiles = fitted_model.forecast_quantiles(X_test, y_query)\n",
437+
"quantiles = fitted_model.forecast_quantiles(X_test)\n",
441438
"quantiles"
442439
]
443440
},
@@ -460,10 +457,10 @@
460457
"# specify which quantiles you would like \n",
461458
"fitted_model.quantiles = [0.01, 0.5, 0.95]\n",
462459
"# use forecast_quantiles function, not the forecast() one\n",
463-
"y_pred_quantiles = fitted_model.forecast_quantiles(X_test, y_query)\n",
460+
"y_pred_quantiles = fitted_model.forecast_quantiles(X_test)\n",
464461
"\n",
465462
"# it all nicely aligns column-wise\n",
466-
"pd.concat([X_test.reset_index(), pd.DataFrame({'query' : y_query}), y_pred_quantiles], axis=1)"
463+
"pd.concat([X_test.reset_index(), y_pred_quantiles], axis=1)"
467464
]
468465
},
469466
{
@@ -539,9 +536,7 @@
539536
"outputs": [],
540537
"source": [
541538
"try: \n",
542-
" y_query = y_away.copy()\n",
543-
" y_query.fill(np.NaN)\n",
544-
" y_pred_away, xy_away = fitted_model.forecast(X_away, y_query)\n",
539+
" y_pred_away, xy_away = fitted_model.forecast(X_away)\n",
545540
" xy_away\n",
546541
"except Exception as e:\n",
547542
" print(e)"
@@ -551,7 +546,7 @@
551546
"cell_type": "markdown",
552547
"metadata": {},
553548
"source": [
554-
"How should we read that eror message? The forecast origin is at the last time the model saw an actual value of `y` (the target). That was at the end of the training data! Because the model received all `NaN` (and not an actual target value), it is attempting to forecast from the end of training data. But the requested forecast periods are past the maximum horizon. We need to provide a define `y` value to establish the forecast origin.\n",
549+
"How should we read that eror message? The forecast origin is at the last time the model saw an actual value of `y` (the target). That was at the end of the training data! The model is attempting to forecast from the end of training data. But the requested forecast periods are past the maximum horizon. We need to provide a define `y` value to establish the forecast origin.\n",
555550
"\n",
556551
"We will use this helper function to take the required amount of context from the data preceding the testing data. It's definition is intentionally simplified to keep the idea in the clear."
557552
]
@@ -740,7 +735,7 @@
740735
"name": "python",
741736
"nbconvert_exporter": "python",
742737
"pygments_lexer": "ipython3",
743-
"version": "3.6.7"
738+
"version": "3.6.8"
744739
},
745740
"tags": [
746741
"Forecasting",
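
Note on the forecast origin and maximum horizon: a markdown cell in this notebook explains that the model forecasts from the end of the training data and that requests past the maximum horizon fail. A simplified sketch of that check is shown below. It assumes a fixed time step and a horizon counted in steps from the origin; `beyond_horizon` and its parameters are hypothetical and do not reproduce the AutoML implementation.

```python
from datetime import datetime, timedelta


def beyond_horizon(origin, requested, freq, max_horizon):
    """Return the requested timestamps that lie past origin + max_horizon steps."""
    limit = origin + freq * max_horizon
    return [t for t in requested if t > limit]


if __name__ == "__main__":
    origin = datetime(2000, 1, 1, 0)        # last time the target was known
    step = timedelta(hours=1)               # assumed data frequency
    requested = [origin + step * 3, origin + step * 5]
    print(beyond_horizon(origin, requested, step, max_horizon=4))
```

Under these assumptions, any timestamp returned by the helper would need recent target values as context before it could be forecast.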
