automl/README.md (33 additions & 5 deletions)
@@ -34,7 +34,20 @@ If you are an experienced data scientist, automated ML will help increase your p
 
 To run these notebook on your own notebook server, use these installation instructions.
 
-It is best if you create a new conda environment locally to try this SDK, so it doesn't mess up with your existing Python environment.
+The instructions below will install everything you need and then start a Jupyter notebook. To start your Jupyter notebook manually, use:
+
+```
+conda activate azure_automl
+jupyter notebook
+```
+
+or on Mac:
+
+```
+source activate azure_automl
+jupyter notebook
+```
+
 
 ### 1. Install mini-conda from [here](https://conda.io/miniconda.html), choose Python 3.7 or higher.
 -**Note**: if you already have conda installed, you can keep using it but it should be version 4.4.10 or later (as shown by: conda -V). If you have a previous version installed, you can update it using the command: conda update conda.
@@ -64,12 +77,11 @@ bash automl_setup_mac.sh
 cd to the **automl** folder where the sample notebooks were extracted and then run:
 
 ```
-automl_setup_linux.sh
+bash automl_setup_linux.sh
 ```
 
 ### 4. Running configuration.ipynb
 - Before running any samples you next need to run the configuration notebook. Click on 00.configuration.ipynb notebook
-- Please make sure you use the Python [conda env:azure_automl] kernel when running this notebook.
 - Execute the cells in the notebook to Register Machine Learning Services Resource Provider and create a workspace. (*instructions in notebook*)
 
 ### 5. Running Samples
@@ -164,8 +176,9 @@ automl_setup_linux.sh
 # Documentation
 ## Table of Contents
 1. [Automated ML Settings ](#automlsettings)
-2. [Cross validation split options](#cvsplits)
-3. [Get Data Syntax](#getdata)
+1. [Cross validation split options](#cvsplits)
+1. [Get Data Syntax](#getdata)
+1. [Data pre-processing and featurization](#preprocessing)
 
 <a name="automlsettings"></a>
 ## Automated ML Settings
@@ -210,6 +223,21 @@ The *get_data()* function can be used to return a dictionary with these values:
 |columns|Array of strings|data_train||*Optional* Whitelist of columns to use for features|
 |cv_splits_indices|Array of integers|data_train||*Optional* List of indexes to split the data for cross validation|
 
+<a name="preprocessing"></a>
+## Data pre-processing and featurization
+If you use "preprocess=True", the following data preprocessing steps are performed automatically for you:
+
+1. Dropping high cardinality or no variance features
+- Features with no useful information are dropped from training and validation sets. These include features with all values missing, same value across all rows or with extremely high cardinality (e.g., hashes, IDs or GUIDs).
+2. Missing value imputation
+- For numerical features, missing values are imputed with average of values in the column.
+- For categorical features, missing values are imputed with most frequent value.
+3. Generating additional features
+- For DateTime features: Year, Month, Day, Day of week, Day of year, Quarter, Week of the year, Hour, Minute, Second.
+- For Text features: Term frequency based on bi-grams and tri-grams, Count vectorizer.
+4. Transformations and encodings
+- Numeric features with very few unique values are transformed into categorical features.
+
 <a name="pythoncommand"></a>
 # Running using python command
 Jupyter notebook provides a File / Download as / Python (.py) option for saving the notebook as a Python file.
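
For readers of this change: the preprocessing steps listed in the new "Data pre-processing and featurization" section can be approximated in plain pandas. The sketch below is an illustration only and is not the SDK's implementation. The function names, the `max_unique_ratio` cutoff for "extremely high cardinality", and the `max_unique` cutoff for "very few unique values" are all assumptions made for the example, and the text n-gram features are left out.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype, is_numeric_dtype


def drop_uninformative(df, max_unique_ratio=0.95):
    """Step 1: drop all-missing, single-valued, and ID-like text columns."""
    keep = []
    for col in df.columns:
        values = df[col].dropna()
        if values.empty or values.nunique() == 1:
            continue  # only missing values, or one distinct value
        is_text = not (is_numeric_dtype(df[col]) or is_datetime64_any_dtype(df[col]))
        if is_text and values.nunique() / len(values) > max_unique_ratio:
            continue  # assumed cutoff standing in for hashes, IDs, GUIDs
        keep.append(col)
    return df[keep]


def impute_missing(df):
    """Step 2: column mean for numeric features, most frequent value otherwise."""
    out = df.copy()
    for col in out.columns:
        if is_numeric_dtype(out[col]):
            out[col] = out[col].fillna(out[col].mean())
        elif not is_datetime64_any_dtype(out[col]):
            mode = out[col].mode(dropna=True)
            if not mode.empty:
                out[col] = out[col].fillna(mode.iloc[0])
    return out


def few_unique_to_categorical(df, max_unique=5):
    """Step 4: treat numeric columns with very few distinct values as categorical."""
    out = df.copy()
    for col in out.columns:
        if is_numeric_dtype(out[col]) and out[col].nunique() <= max_unique:
            out[col] = out[col].astype("category")
    return out


def expand_datetimes(df):
    """Step 3: add calendar and clock parts for every datetime column."""
    out = df.copy()
    for col in df.columns:
        if is_datetime64_any_dtype(df[col]):
            dt = df[col].dt
            for part in ("year", "month", "day", "dayofweek", "dayofyear",
                         "quarter", "hour", "minute", "second"):
                out[f"{col}_{part}"] = getattr(dt, part)
            out[f"{col}_week"] = dt.isocalendar().week  # requires pandas >= 1.1
    return out


# Small made-up frame to exercise each step.
df = pd.DataFrame({
    "id": ["a1", "b2", "c3", "d4"],
    "constant": [1, 1, 1, 1],
    "empty": [None, None, None, None],
    "age": [20.0, None, 40.0, 30.0],
    "color": ["red", None, "red", "blue"],
    "flag": [0, 1, 0, 1],
    "when": pd.to_datetime(["2018-01-01", "2018-06-15", "2018-06-15", "2018-12-31"]),
})

clean = drop_uninformative(df)
imputed = impute_missing(clean)
result = expand_datetimes(few_unique_to_categorical(imputed, max_unique=2))
print(result.dtypes)
```

The categorical conversion runs before the datetime expansion so that derived columns such as the year are not themselves turned into categories; that ordering is a choice made for this sketch, not something the README states.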