Commit cedf8ef: "updates"

1 parent 087a203

File tree: 1 file changed (+22, -13 lines)


automl/README.md

Lines changed: 22 additions & 13 deletions
@@ -2,19 +2,21 @@
 1. [Automated ML Introduction](#introduction)
 1. [Running samples in Azure Notebooks](#jupyter)
 1. [Running samples in a Local Conda environment](#localconda)
-1. [Auto ML SDK Sample Notebooks](#samples)
+1. [Automated ML SDK Sample Notebooks](#samples)
 1. [Documentation](#documentation)
 1. [Running using python command](#pythoncommand)
 1. [Troubleshooting](#troubleshooting)
 
-# Automated ML introduction <a name="introduction"></a>
+<a name="introduction"></a>
+# Automated ML introduction
 Automated machine learning (automated ML) builds high quality machine learning models for you by automating model and hyperparameter selection. Bring a labelled dataset that you want to build a model for, and automated ML will give you a high quality machine learning model that you can use for predictions.
 
 If you are new to Data Science, automated ML will help you get jumpstarted by simplifying machine learning model building. It abstracts you from needing to perform model selection and hyperparameter selection, and in one step creates a high quality trained model for you to use.
 
 If you are an experienced data scientist, automated ML will help increase your productivity by intelligently performing the model and hyperparameter selection for your training, and it generates high quality models much quicker than manually specifying several combinations of the parameters and running training jobs. Automated ML provides visibility and access to all the training jobs and the performance characteristics of the models to help you further tune the pipeline if you desire.
 
-## Running samples in Azure Notebooks - Jupyter based notebooks in the Azure cloud <a name="jupyter"></a>
+<a name="jupyter"></a>
+## Running samples in Azure Notebooks - Jupyter based notebooks in the Azure cloud
 
 1. [![Azure Notebooks](https://notebooks.azure.com/launch.png)](https://aka.ms/aml-clone-azure-notebooks)
 [Import sample notebooks](https://aka.ms/aml-clone-azure-notebooks) into Azure Notebooks if they are not already there.
@@ -27,7 +29,8 @@ If you are an experienced data scientist, automated ML will help increase your p
 
 ![set kernel to Python 3.6](../images/python36.png)
 
-## Running samples in a Local Conda environment <a name="localconda"></a>
+<a name="localconda"></a>
+## Running samples in a Local Conda environment
 
 To run these notebooks on your own notebook server, use these installation instructions.
 
@@ -73,7 +76,8 @@ automl_setup_linux.sh
 - Please make sure you use the Python [conda env:azure_automl] kernel when trying the sample notebooks.
 - Follow the instructions in the individual notebooks to explore the various features in automated ML.
 
-# Auto ML SDK Sample Notebooks <a name="samples"></a>
+<a name="samples"></a>
+# Automated ML SDK Sample Notebooks
 - [00.configuration.ipynb](00.configuration.ipynb)
     - Register Machine Learning Services Resource Provider
     - Create new Azure ML Workspace
@@ -100,7 +104,7 @@ automl_setup_linux.sh
 
 - [03b.auto-ml-remote-batchai.ipynb](03b.auto-ml-remote-batchai.ipynb)
     - Dataset: scikit-learn's [digit dataset](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html#sklearn.datasets.load_digits)
-    - Example of using Auto ML for classification using a remote Batch AI compute for training
+    - Example of using automated ML for classification using a remote Batch AI compute for training
     - Parallel execution of iterations
     - Async tracking of progress
     - Cancelling individual iterations or the entire run
@@ -156,14 +160,15 @@ automl_setup_linux.sh
 - [13.auto-ml-dataprep.ipynb](13.auto-ml-dataprep.ipynb)
     - Using DataPrep for reading data
 
-
-# Documentation <a name="documentation"></a>
+<a name="documentation"></a>
+# Documentation
 ## Table of Contents
 1. [Automated ML Settings](#automlsettings)
 2. [Cross validation split options](#cvsplits)
 3. [Get Data Syntax](#getdata)
 
-## Automated ML Settings <a name="automlsettings"></a>
+<a name="automlsettings"></a>
+## Automated ML Settings
 |Property|Description|Default|
 |-|-|-|
 |**primary_metric**|This is the metric that you want to optimize.<br><br> Classification supports the following primary metrics <br><i>accuracy</i><br><i>AUC_weighted</i><br><i>balanced_accuracy</i><br><i>average_precision_score_weighted</i><br><i>precision_score_weighted</i><br><br> Regression supports the following primary metrics <br><i>spearman_correlation</i><br><i>normalized_root_mean_squared_error</i><br><i>r2_score</i><br><i>normalized_mean_absolute_error</i><br><i>normalized_root_mean_squared_log_error</i>|Classification: accuracy <br><br> Regression: spearman_correlation|
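As a hedged illustration of how these documented settings are typically supplied: the sketch below collects them in a plain dictionary (in the SDK this would be passed to `AutoMLConfig`, which is deliberately not imported here). All values are illustrative, not recommendations.

```python
# Hedged sketch: the documented settings as a plain dictionary, as they
# would be passed to the SDK (e.g. AutoMLConfig(**automl_settings)).
# The values below are made up for illustration.
automl_settings = {
    "primary_metric": "AUC_weighted",   # a classification metric from the table
    "n_cross_validations": 5,           # see "Cross validation split options"
    "exit_score": 0.95,                 # terminate once primary_metric surpasses this
    "blacklist_algos": ["KNeighborsClassifier", "LinearSVMWrapper"],
}

# Sanity-check the chosen metric against the documented classification metrics.
classification_metrics = {
    "accuracy", "AUC_weighted", "balanced_accuracy",
    "average_precision_score_weighted", "precision_score_weighted",
}
assert automl_settings["primary_metric"] in classification_metrics
```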
@@ -177,7 +182,8 @@ automl_setup_linux.sh
 |**exit_score**|*double* value indicating the target for *primary_metric*. <br> Once the target is surpassed, the run terminates.|None|
 |**blacklist_algos**|*Array* of *strings* indicating pipelines to ignore for automated ML.<br><br> Allowed values for **Classification**<br><i>LogisticRegression</i><br><i>SGDClassifierWrapper</i><br><i>NBWrapper</i><br><i>BernoulliNB</i><br><i>SVCWrapper</i><br><i>LinearSVMWrapper</i><br><i>KNeighborsClassifier</i><br><i>DecisionTreeClassifier</i><br><i>RandomForestClassifier</i><br><i>ExtraTreesClassifier</i><br><i>gradient boosting</i><br><i>LightGBMClassifier</i><br><br>Allowed values for **Regression**<br><i>ElasticNet</i><br><i>GradientBoostingRegressor</i><br><i>DecisionTreeRegressor</i><br><i>KNeighborsRegressor</i><br><i>LassoLars</i><br><i>SGDRegressor</i><br><i>RandomForestRegressor</i><br><i>ExtraTreesRegressor</i>|None|
 
-## Cross validation split options <a name="cvsplits"></a>
+<a name="cvsplits"></a>
+## Cross validation split options
 ### K-Folds Cross Validation
 Use the *n_cross_validations* setting to specify the number of cross validations. The training data set will be randomly split into *n_cross_validations* folds of equal size. During each cross validation round, one of the folds will be used for validation of the model trained on the remaining folds. This process repeats for *n_cross_validations* rounds until each fold has been used once as the validation set. Finally, the average scores across all *n_cross_validations* rounds will be reported, and the corresponding model will be retrained on the whole training data set.
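The splitting behaviour described above can be sketched in plain Python. This is a hedged illustration of the fold logic only (the function name is my own, not the SDK's implementation):

```python
import random

def k_fold_indices(n_samples, n_cross_validations, seed=0):
    """Randomly partition sample indices into n_cross_validations
    roughly equal folds, mirroring the behaviour described above."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::n_cross_validations] for i in range(n_cross_validations)]

folds = k_fold_indices(10, n_cross_validations=5)

# Each round: one fold is held out for validation, the rest train the model.
for val_fold in folds:
    train = [i for f in folds if f is not val_fold for i in f]
    assert len(val_fold) + len(train) == 10
```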

@@ -187,7 +193,8 @@ Use *validation_size* to specify the percentage of the training data set that sh
 ### Custom train and validation set
 You can specify separate train and validation sets either through get_data() or directly to the fit method.
 
-## get_data() syntax <a name="getdata"></a>
+<a name="getdata"></a>
+## get_data() syntax
 The *get_data()* function can be used to return a dictionary with these values:
 
 |Key|Type|Dependency|Mutually Exclusive with|Description|
@@ -203,7 +210,8 @@ The *get_data()* function can be used to return a dictionary with these values:
 |columns|Array of strings|data_train||*Optional* Whitelist of columns to use for features|
 |cv_splits_indices|Array of integers|data_train||*Optional* List of indexes to split the data for cross validation|
 
-# Running using python command <a name="pythoncommand"></a>
+<a name="pythoncommand"></a>
+# Running using python command
 Jupyter notebook provides a File / Download as / Python (.py) option for saving the notebook as a Python file.
 You can then run this file using the python command.
 However, on Windows the file needs to be modified before it can be run.
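The condition itself is not shown in this diff. Assuming it is Python's standard entry-point guard (the usual requirement on Windows, where spawned child processes re-import the script), a minimal sketch of the modified file:

```python
# Hedged sketch, assuming the required condition is Python's
# `if __name__ == "__main__":` guard (the condition is not shown
# in this diff, so treat this as an assumption).

def main():
    # The notebook's main code, indented under the guard via this function.
    return "run complete"

if __name__ == "__main__":
    # On Windows, worker processes re-import this module; the guard
    # prevents them from re-running the main code at import time.
    print(main())
```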
@@ -213,7 +221,8 @@ The following condition must be added to the main code in the file:
 
 The main code of the file must be indented so that it is under this condition.
 
-# Troubleshooting <a name="troubleshooting"></a>
+<a name="troubleshooting"></a>
+# Troubleshooting
 ## Iterations fail and the log contains "MemoryError"
 This can be caused by insufficient memory on the DSVM. Automated ML loads all training data into memory, so the available memory should be larger than the training data size.
 If you are using a remote DSVM, memory is needed for each concurrent iteration. The concurrent_iterations setting specifies the maximum number of concurrent iterations. For example, if the training data size is 8 GB and concurrent_iterations is set to 10, at least 80 GB of memory is required.
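The sizing rule above reduces to simple arithmetic; a minimal sketch (the function name is my own, for illustration):

```python
def min_memory_gb(training_data_gb, concurrent_iterations):
    """Lower bound implied by the rule above: every concurrent
    iteration loads the full training data into memory."""
    return training_data_gb * concurrent_iterations

# The worked example from the text: 8 GB of data, 10 concurrent iterations.
assert min_memory_gb(8, 10) == 80
```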
