Commit a06da4d

updates from Jeff, Gaurav
1 parent cedf8ef commit a06da4d

1 file changed: +33 -5 lines changed

automl/README.md

Lines changed: 33 additions & 5 deletions
@@ -34,7 +34,20 @@ If you are an experienced data scientist, automated ML will help increase your p

 To run these notebook on your own notebook server, use these installation instructions.

-It is best if you create a new conda environment locally to try this SDK, so it doesn't mess up with your existing Python environment.
+The instructions below will install everything you need and then start a Jupyter notebook. To start your Jupyter notebook manually, use:
+
+```
+conda activate azure_automl
+jupyter notebook
+```
+
+or on Mac:
+
+```
+source activate azure_automl
+jupyter notebook
+```
+

 ### 1. Install mini-conda from [here](https://conda.io/miniconda.html), choose Python 3.7 or higher.
 - **Note**: if you already have conda installed, you can keep using it but it should be version 4.4.10 or later (as shown by: conda -V). If you have a previous version installed, you can update it using the command: conda update conda.
@@ -64,12 +77,11 @@ bash automl_setup_mac.sh
 cd to the **automl** folder where the sample notebooks were extracted and then run:

 ```
-automl_setup_linux.sh
+bash automl_setup_linux.sh
 ```

 ### 4. Running configuration.ipynb
 - Before running any samples you next need to run the configuration notebook. Click on 00.configuration.ipynb notebook
-- Please make sure you use the Python [conda env:azure_automl] kernel when running this notebook.
 - Execute the cells in the notebook to Register Machine Learning Services Resource Provider and create a workspace. (*instructions in notebook*)

 ### 5. Running Samples
@@ -164,8 +176,9 @@ automl_setup_linux.sh
 # Documentation
 ## Table of Contents
 1. [Automated ML Settings ](#automlsettings)
-2. [Cross validation split options](#cvsplits)
-3. [Get Data Syntax](#getdata)
+1. [Cross validation split options](#cvsplits)
+1. [Get Data Syntax](#getdata)
+1. [Data pre-processing and featurization](#preprocessing)

 <a name="automlsettings"></a>
 ## Automated ML Settings
@@ -210,6 +223,21 @@ The *get_data()* function can be used to return a dictionary with these values:
 |columns|Array of strings|data_train||*Optional* Whitelist of columns to use for features|
 |cv_splits_indices|Array of integers|data_train||*Optional* List of indexes to split the data for cross validation|

+<a name="preprocessing"></a>
+## Data pre-processing and featurization
+If you use "preprocess=True", the following data preprocessing steps are performed automatically for you:
+
+1. Dropping high cardinality or no variance features
+- Features with no useful information are dropped from training and validation sets. These include features with all values missing, same value across all rows or with extremely high cardinality (e.g., hashes, IDs or GUIDs).
+2. Missing value imputation
+- For numerical features, missing values are imputed with average of values in the column.
+- For categorical features, missing values are imputed with most frequent value.
+3. Generating additional features
+- For DateTime features: Year, Month, Day, Day of week, Day of year, Quarter, Week of the year, Hour, Minute, Second.
+- For Text features: Term frequency based on bi-grams and tri-grams, Count vectorizer.
+4. Transformations and encodings
+- Numeric features with very few unique values are transformed into categorical features.
+
 <a name="pythoncommand"></a>
 # Running using python command
 Jupyter notebook provides a File / Download as / Python (.py) option for saving the notebook as a Python file.
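
The preprocessing steps listed in the README section added by this commit (dropping uninformative features, imputing missing values, deriving date parts) can be illustrated with a small standard-library sketch. This is not the automated ML implementation: the function names, the column-as-list data layout, and the `max_cardinality_ratio` threshold are all hypothetical and chosen only to make the described behavior concrete.

```python
from collections import Counter
from datetime import datetime
from statistics import mean


def drop_uninformative(columns, max_cardinality_ratio=0.95):
    """Drop columns that are all missing, hold a single distinct value,
    or are string columns where nearly every value is unique (ID-like).
    The 0.95 threshold is an assumption made for this sketch."""
    kept = {}
    for name, values in columns.items():
        present = [v for v in values if v is not None]
        distinct = set(present)
        if len(distinct) <= 1:
            continue  # all values missing, or no variance
        is_text = all(isinstance(v, str) for v in present)
        if is_text and len(distinct) / len(present) >= max_cardinality_ratio:
            continue  # extremely high cardinality, e.g. hashes, IDs, GUIDs
        kept[name] = values
    return kept


def impute(values):
    """Fill None with the column mean (numeric) or most frequent value (otherwise)."""
    present = [v for v in values if v is not None]
    if not present:
        return list(values)  # nothing to impute from
    if all(isinstance(v, (int, float)) for v in present):
        fill = mean(present)
    else:
        fill = Counter(present).most_common(1)[0][0]
    return [fill if v is None else v for v in values]


def expand_datetime(name, values):
    """Derive some of the listed date parts from a column of datetime objects."""
    return {
        f"{name}_year": [d.year for d in values],
        f"{name}_month": [d.month for d in values],
        f"{name}_day": [d.day for d in values],
        f"{name}_day_of_week": [d.weekday() for d in values],
        f"{name}_day_of_year": [d.timetuple().tm_yday for d in values],
        f"{name}_quarter": [(d.month - 1) // 3 + 1 for d in values],
    }


# Toy data, invented for illustration only.
data = {
    "age": [30, None, 50, 40],
    "city": ["Paris", "Oslo", None, "Oslo"],
    "row_id": ["a1", "b2", "c3", "d4"],   # ID-like, dropped
    "constant": [1, 1, 1, 1],             # no variance, dropped
    "signup": [datetime(2018, 1, 15), datetime(2018, 4, 2),
               datetime(2018, 7, 30), datetime(2018, 12, 31)],
}

kept = drop_uninformative(data)
age_filled = impute(kept["age"])
city_filled = impute(kept["city"])
signup_parts = expand_datetime("signup", kept["signup"])
```

In this toy run, `row_id` and `constant` are dropped, the missing `age` is replaced by the mean of the remaining values, and the missing `city` by the most frequent one. The text featurization and numeric-to-categorical conversion mentioned in the list are left out to keep the sketch short.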
