jjryle
diff --git a/how-to-use-azureml/automated-machine-learning/README.md b/how-to-use-azureml/automated-machine-learning/README.md
Lines changed: 80 additions & 45 deletions
@@ -106,52 +106,87 @@ jupyter notebook
 <a name="samples"></a>
 # Automated ML SDK Sample Notebooks
 
-- [auto-ml-classification-credit-card-fraud.ipynb](classification-credit-card-fraud/auto-ml-classification-credit-card-fraud.ipynb)
-    - Dataset: Kaggle's [credit card fraud detection dataset](https://www.kaggle.com/mlg-ulb/creditcardfraud)
-    - Simple example of using automated ML for classification to fraudulent credit card transactions
-    - Uses azure compute for training
-
-- [auto-ml-regression.ipynb](regression/auto-ml-regression.ipynb)
-    - Dataset: Hardware Performance Dataset
-    - Simple example of using automated ML for regression
-    - Uses azure compute for training
-
-- [auto-ml-regression-explanation-featurization.ipynb](regression-explanation-featurization/auto-ml-regression-explanation-featurization.ipynb)
+## Classification
+- **Classify Credit Card Fraud**
+    - Dataset: [Kaggle's credit card fraud detection dataset](https://www.kaggle.com/mlg-ulb/creditcardfraud)
+      - **[Jupyter Notebook (remote run)](classification-credit-card-fraud/auto-ml-classification-credit-card-fraud.ipynb)**
+          - run the experiment remotely on AML Compute cluster
+          - test the performance of the best model in the local environment
+      - **[Jupyter Notebook (local run)](local-run-classification-credit-card-fraud/auto-ml-classification-credit-card-fraud-local.ipynb)**
+          - run the experiment in the local environment
+          - use Mimic Explainer for computing feature importance
+          - deploy the best model along with the explainer to an Azure Kubernetes Service (AKS) cluster, which computes the raw and engineered feature importances at inference time
+- **Predict Term Deposit Subscriptions in a Bank**
+    - Dataset: [UCI's bank marketing dataset](https://www.kaggle.com/janiobachmann/bank-marketing-dataset)
+        - **[Jupyter Notebook](classification-bank-marketing-all-features/auto-ml-classification-bank-marketing-all-features.ipynb)**
+          - run experiment remotely on AML Compute cluster to generate ONNX compatible models
+          - view the featurization steps that were applied during training
+          - view feature importance for the best model
+          - download the best model in ONNX format and use it for inferencing using ONNXRuntime
+          - deploy the best model in PKL format to Azure Container Instance (ACI)
+- **Predict Newsgroup based on Text from News Article**
+    - Dataset: [20 newsgroups text dataset](https://scikit-learn.org/0.19/datasets/twenty_newsgroups.html)
+        - **[Jupyter Notebook](classification-text-dnn/auto-ml-classification-text-dnn.ipynb)**
+          - AutoML highlights here include using deep neural networks (DNNs) to create embedded features from text data
+          - AutoML uses Bidirectional Encoder Representations from Transformers (BERT) when GPU compute is used
+          - AutoML uses a Bidirectional Long Short-Term Memory (BiLSTM) network when CPU compute is used, so the choice of DNN matches the available hardware
+
+## Regression
+- **Predict Performance of Hardware Parts**
     - Dataset: Hardware Performance Dataset
-    - Shows featurization and excplanation
-    - Uses azure compute for training
-
-- [auto-ml-forecasting-energy-demand.ipynb](forecasting-energy-demand/auto-ml-forecasting-energy-demand.ipynb)
-    - Dataset: [NYC energy demand data](forecasting-a/nyc_energy.csv)
-    - Example of using automated ML for training a forecasting model
-
-- [auto-ml-classification-credit-card-fraud-local.ipynb](local-run-classification-credit-card-fraud/auto-ml-classification-credit-card-fraud-local.ipynb)
-    - Dataset: Kaggle's [credit card fraud detection dataset](https://www.kaggle.com/mlg-ulb/creditcardfraud)
-    - Simple example of using automated ML for classification to fraudulent credit card transactions
-    - Uses local compute for training
-
-- [auto-ml-classification-bank-marketing-all-features.ipynb](classification-bank-marketing-all-features/auto-ml-classification-bank-marketing-all-features.ipynb)
-    - Dataset: UCI's [bank marketing dataset](https://www.kaggle.com/janiobachmann/bank-marketing-dataset)
-    - Simple example of using automated ML for classification to predict term deposit subscriptions for a bank
-    - Uses azure compute for training
-
-- [auto-ml-forecasting-orange-juice-sales.ipynb](forecasting-orange-juice-sales/auto-ml-forecasting-orange-juice-sales.ipynb)
-    - Dataset: [Dominick's grocery sales of orange juice](forecasting-b/dominicks_OJ.csv)
-    - Example of training an automated ML forecasting model on multiple time-series
-
-- [auto-ml-forecasting-bike-share.ipynb](forecasting-bike-share/auto-ml-forecasting-bike-share.ipynb)
-    - Dataset: forecasting for a bike-sharing
-    - Example of training an automated ML forecasting model on multiple time-series
-
-- [auto-ml-forecasting-function.ipynb](forecasting-forecast-function/auto-ml-forecasting-function.ipynb)
-    - Example of training an automated ML forecasting model on multiple time-series
-
-- [auto-ml-forecasting-beer-remote.ipynb](forecasting-beer-remote/auto-ml-forecasting-beer-remote.ipynb)
-    - Example of training an automated ML forecasting model on multiple time-series
-    - Beer Production Forecasting
-
-- [auto-ml-continuous-retraining.ipynb](continuous-retraining/auto-ml-continuous-retraining.ipynb)
-    - Continuous retraining using Pipelines and Time-Series TabularDataset
+        - **[Jupyter Notebook](regression/auto-ml-regression.ipynb)**
+            - run the experiment remotely on AML Compute cluster
+            - get the best trained model for a metric other than the one the experiment was optimized for
+            - test the performance of the best model in the local environment
+        - **[Jupyter Notebook (advanced)](regression-explanation-featurization/auto-ml-regression-explanation-featurization.ipynb)**
+            - run the experiment remotely on AML Compute cluster
+            - customize featurization: override column purpose within the dataset, configure transformer parameters
+            - get the best trained model for a metric other than the one the experiment was optimized for
+            - run a model explanation experiment on the remote cluster
+            - deploy the model along with the explainer and run online inferencing
+
+## Time Series Forecasting
+- **Forecast Energy Demand**
+    - Dataset: [NYC energy demand data](http://mis.nyiso.com/public/P-58Blist.htm)
+        - **[Jupyter Notebook](forecasting-energy-demand/auto-ml-forecasting-energy-demand.ipynb)**
+          - run experiment remotely on AML Compute cluster
+          - use lags and rolling window features
+          - view the featurization steps that were applied during training
+          - get the best model, use it to forecast on test data and compare the accuracy of predictions against real data
+- **Forecast Orange Juice Sales (Multi-Series)**
+    - Dataset: [Dominick's grocery sales of orange juice](forecasting-orange-juice-sales/dominicks_OJ.csv)
+        - **[Jupyter Notebook](forecasting-orange-juice-sales/auto-ml-forecasting-orange-juice-sales.ipynb)**
+          - run experiment remotely on AML Compute cluster
+          - customize time-series featurization: change column purpose and override transformer hyperparameters
+          - evaluate the performance of the best model locally
+          - deploy the best model as a webservice on Azure Container Instance (ACI)
+          - get online predictions from the deployed model
+- **Forecast Demand of a Bike-Sharing Service**
+    - Dataset: [Bike demand data](forecasting-bike-share/bike-no.csv)
+        - **[Jupyter Notebook](forecasting-bike-share/auto-ml-forecasting-bike-share.ipynb)**
+          - run experiment remotely on AML Compute cluster
+          - integrate holiday features
+          - run a rolling forecast for a test set that is longer than the forecast horizon
+          - compute metrics on the predictions from the remote forecast
+- **The Forecast Function Interface**
+    - Dataset: Generated for sample purposes
+        - **[Jupyter Notebook](forecasting-forecast-function/auto-ml-forecasting-function.ipynb)**
+          - train a forecaster using a remote AML Compute cluster
+          - explore the capabilities of the forecast function (e.g. forecasting farther into the horizon)
+          - generate confidence intervals
+- **Forecast Beverage Production**
+    - Dataset: [Monthly beer production data](forecasting-beer-remote/Beer_no_valid_split_train.csv)
+        - **[Jupyter Notebook](forecasting-beer-remote/auto-ml-forecasting-beer-remote.ipynb)**
+          - train using a remote AML Compute cluster
+          - enable the DNN learning model
+          - forecast on a remote compute cluster and compare the performance of different models
+- **Continuous Retraining with NOAA Weather Data**
+    - Dataset: [NOAA weather data from Azure Open Datasets](https://azure.microsoft.com/en-us/services/open-datasets/)
+        - **[Jupyter Notebook](continuous-retraining/auto-ml-continuous-retraining.ipynb)**
+          - continuously retrain a model using Pipelines and AutoML
+          - create a Pipeline to upload a time series dataset to an Azure blob
+          - create a Pipeline to run an AutoML experiment and register the best resulting model in the Workspace
+          - publish the training pipeline created and schedule it to run daily
 
 <a name="documentation"></a>
 See [Configure automated machine learning experiments](https://docs.microsoft.com/azure/machine-learning/service/how-to-configure-auto-train) to learn how more about the the settings and features available for automated machine learning experiments.