
Commit 14ecfb0

Merge pull request Azure#448 from jeff-shepherd/master
Update new notebooks to use dataprep and add sql files
2 parents cd3c980 + 61b396b commit 14ecfb0

15 files changed: +4282 −3087 lines


how-to-use-azureml/automated-machine-learning/classification-bank-marketing/auto-ml-classification-bank-marketing.ipynb

Lines changed: 727 additions & 740 deletions
Large diffs are not rendered by default.

how-to-use-azureml/automated-machine-learning/classification-credit-card-fraud/auto-ml-classification-credit-card-fraud.ipynb

Lines changed: 710 additions & 716 deletions
Large diffs are not rendered by default.

how-to-use-azureml/automated-machine-learning/regression-concrete-strength/auto-ml-regression-concrete-strength.ipynb

Lines changed: 798 additions & 810 deletions
Large diffs are not rendered by default.

how-to-use-azureml/automated-machine-learning/regression-hardware-performance/auto-ml-regression-hardware-performance.ipynb

Lines changed: 798 additions & 821 deletions
Large diffs are not rendered by default.
Lines changed: 113 additions & 0 deletions
@@ -0,0 +1,113 @@
# Table of Contents
1. [Introduction](#introduction)
1. [Setup using Azure Data Studio](#azuredatastudiosetup)
1. [Energy demand example using Azure Data Studio](#azuredatastudioenergydemand)
1. [Setup using SQL Server Management Studio for SQL Server 2017 on Windows](#ssms2017)
1. [Setup using SQL Server Management Studio for SQL Server 2019 on Linux](#ssms2019)
1. [Energy demand example using SQL Server Management Studio](#ssmsenergydemand)

<a name="introduction"></a>
# Introduction
SQL Server 2017 or 2019 can call Azure ML automated machine learning to create models trained on data from SQL Server.
This uses the sp_execute_external_script stored procedure, which can call Python scripts.
SQL Server 2017 and SQL Server 2019 can both run on Windows or Linux.
However, this integration is not available for SQL Server 2017 on Linux.
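As a quick orientation, the call pattern underneath is ordinary sp_execute_external_script usage. The snippet below is a minimal sketch, not part of the setup scripts; it simply echoes a one-row result set through the SQL-hosted Python runtime, and assumes the 'external scripts enabled' option has already been turned on (see the setup steps below):

```sql
-- Minimal smoke test for sp_execute_external_script with Python.
-- Assumes 'external scripts enabled' is already on (see the setup sections below).
EXEC sp_execute_external_script
    @language = N'Python',
    @script = N'OutputDataSet = InputDataSet',   -- pass the input query result straight through
    @input_data_1 = N'SELECT 1 AS python_ok'
WITH RESULT SETS ((python_ok INT));
```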

This folder shows how to set up the integration and has a sample that uses the integration to train and predict based on an energy demand dataset.

This integration is part of SQL Server and so can be used from any SQL client.
These instructions show how to use it from Azure Data Studio or SQL Server Management Studio.

<a name="azuredatastudiosetup"></a>
## Setup using Azure Data Studio

These steps show how to set up the integration using Azure Data Studio.

1. If you don't already have SQL Server, you can install it from [https://www.microsoft.com/en-us/sql-server/sql-server-downloads](https://www.microsoft.com/en-us/sql-server/sql-server-downloads)
1. Install Azure Data Studio from [https://docs.microsoft.com/en-us/sql/azure-data-studio/download?view=sql-server-2017](https://docs.microsoft.com/en-us/sql/azure-data-studio/download?view=sql-server-2017)
1. Start Azure Data Studio and connect to SQL Server. [https://docs.microsoft.com/en-us/sql/azure-data-studio/sql-notebooks?view=sql-server-2017](https://docs.microsoft.com/en-us/sql/azure-data-studio/sql-notebooks?view=sql-server-2017)
1. Create a database named "automl" (a T-SQL alternative is sketched after this list).
1. Open the notebook how-to-use-azureml\automated-machine-learning\sql-server\setup\auto-ml-sql-setup.ipynb and follow the instructions in it.

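If you prefer a query window over the Azure Data Studio UI for step 4, a minimal equivalent (assuming default database options are acceptable) is:

```sql
-- Create the database that the setup notebook and the samples expect.
CREATE DATABASE [automl];
GO
```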
<a name="azuredatastudioenergydemand"></a>
## Energy demand example using Azure Data Studio

Once you have completed the setup, you can try the energy demand sample in the notebook energy-demand\auto-ml-sql-energy-demand.ipynb.
This has cells to train a model, predict based on the model and show metrics for each pipeline run in training the model.

<a name="ssms2017"></a>
## Setup using SQL Server Management Studio for SQL Server 2017 on Windows

These instructions set up the integration for SQL Server 2017 on Windows.

1. If you don't already have SQL Server, you can install it from [https://www.microsoft.com/en-us/sql-server/sql-server-downloads](https://www.microsoft.com/en-us/sql-server/sql-server-downloads)
2. Enable external scripts with the following commands:
```sql
sp_configure 'external scripts enabled',1
reconfigure with override
```
3. Stop SQL Server.
4. Install the automated machine learning libraries using the following commands from an Administrator command prompt (if you are using a non-default SQL Server instance name, replace MSSQLSERVER in the second command with the instance name):
```sh
cd "C:\Program Files\Microsoft SQL Server"
cd "MSSQL14.MSSQLSERVER\PYTHON_SERVICES"
python.exe -m pip install azureml-sdk[automl]
python.exe -m pip install --upgrade numpy
python.exe -m pip install --upgrade sklearn
```
5. Start SQL Server and the "SQL Server Launchpad" service.
6. In Windows Firewall, click on advanced settings and in Outbound Rules, disable "Block network access for R local user accounts in SQL Server instance xxxx".
7. Execute the files in the setup folder in SQL Server Management Studio: aml_model.sql, aml_connection.sql, AutoMLGetMetrics.sql, AutoMLPredict.sql and AutoMLTrain.sql.
8. Create an Azure Machine Learning Workspace. You can use the instructions at: [https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-workspace](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-workspace)
9. Create a config.json file using the subscription id, resource group name and workspace name that you used to create the workspace. The file is described at: [https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-environment#workspace](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-environment#workspace)
10. Create an Azure service principal. You can do this with the commands:
```sh
az login
az account set --subscription subscriptionid
az ad sp create-for-rbac --name principlename --password password
```
11. Insert the values \<tenant\>, \<AppId\> and \<password\> returned by create-for-rbac above into the aml_connection table. Set \<path\> as the absolute path to your config.json file. Set the name to “Default”. (A sketch of this insert follows this list.)

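As a sketch of step 11, assuming aml_connection has no required columns beyond those read back by AutoMLGetMetrics.sql (ConnectionName, TenantId, AppId, Password, ConfigFile), the insert looks like this:

```sql
-- Placeholders are illustrative; substitute the tenant, appId and password returned
-- by "az ad sp create-for-rbac" and the absolute path to your config.json file.
INSERT INTO dbo.aml_connection (ConnectionName, TenantId, AppId, Password, ConfigFile)
VALUES (N'Default', N'<tenant>', N'<AppId>', N'<password>', N'<path>');
```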
<a name="ssms2019"></a>
## Setup using SQL Server Management Studio for SQL Server 2019 on Linux

1. Install SQL Server 2019 from: [https://www.microsoft.com/en-us/sql-server/sql-server-downloads](https://www.microsoft.com/en-us/sql-server/sql-server-downloads)
2. Install machine learning support from: [https://docs.microsoft.com/en-us/sql/linux/sql-server-linux-setup-machine-learning?view=sqlallproducts-allversions#ubuntu](https://docs.microsoft.com/en-us/sql/linux/sql-server-linux-setup-machine-learning?view=sqlallproducts-allversions#ubuntu)
3. Then install SQL Server Management Studio from [https://docs.microsoft.com/en-us/sql/ssms/download-sql-server-management-studio-ssms?view=sql-server-2017](https://docs.microsoft.com/en-us/sql/ssms/download-sql-server-management-studio-ssms?view=sql-server-2017)
4. Enable external scripts with the following commands:
```sql
sp_configure 'external scripts enabled',1
reconfigure with override
```
5. Stop SQL Server.
6. Install the automated machine learning libraries using the following commands from a terminal (a quick verification query is sketched after this list):
```sh
sudo /opt/mssql/mlservices/bin/python/python -m pip install azureml-sdk[automl]
sudo /opt/mssql/mlservices/bin/python/python -m pip install --upgrade numpy
sudo /opt/mssql/mlservices/bin/python/python -m pip install --upgrade sklearn
```
7. Start SQL Server.
8. Execute the files aml_model.sql, aml_connection.sql, AutoMLGetMetrics.sql, AutoMLPredict.sql and AutoMLTrain.sql in SQL Server Management Studio.
9. Create an Azure Machine Learning Workspace. You can use the instructions at: [https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-workspace](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-workspace)
10. Create a config.json file using the subscription id, resource group name and workspace name that you used to create the workspace. The file is described at: [https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-environment#workspace](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-environment#workspace)
11. Create an Azure service principal. You can do this with the commands:
```sh
az login
az account set --subscription subscriptionid
az ad sp create-for-rbac --name principlename --password password
```
12. Insert the values \<tenant\>, \<AppId\> and \<password\> returned by create-for-rbac above into the aml_connection table. Set \<path\> as the absolute path to your config.json file. Set the name to “Default”. (The same insert sketch shown under the Windows section above applies here.)

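As an optional check after step 6 (a sketch, not part of the setup scripts; it assumes external scripts are already enabled), you can confirm the SDK is visible to the SQL-hosted Python runtime:

```sql
-- Print the installed azureml-sdk version from inside SQL Server's Python runtime.
EXEC sp_execute_external_script
    @language = N'Python',
    @script = N'import azureml.core; print("azureml-sdk version:", azureml.core.VERSION)';
```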
<a name="ssmsenergydemand"></a>
## Energy demand example using SQL Server Management Studio

Once you have completed the setup, you can try the energy demand sample queries.
First you need to load the sample data into the database.
1. In SQL Server Management Studio, right-click the database, select Tasks, then Import Flat File.
1. Select the file MachineLearningNotebooks\notebooks\how-to-use-azureml\automated-machine-learning\forecasting-energy-demand\nyc_energy.csv.
1. When you get to the column definition page, allow nulls for all columns.

You can then run the queries in the energy-demand folder:
* TrainEnergyDemand.sql runs AutoML, trains multiple models on the data and selects the best model.
* PredictEnergyDemand.sql predicts based on the most recent training run.
* GetMetrics.sql returns all the metrics for each model in the most recent training run.
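
Each training run also records its output in the dbo.aml_model table (see TrainEnergyDemand.sql and the setup scripts). A quick way to inspect recent runs, sketched using only the columns the sample scripts reference, is:

```sql
-- List the most recent AutoML training runs recorded by AutoMLTrain.
SELECT TOP 5 RunId, ExperimentName, WorkspaceName, CreatedDate
FROM dbo.aml_model
ORDER BY CreatedDate DESC;
```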
Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
-- This lists all the metrics for all iterations for the most recent run.

DECLARE @RunId NVARCHAR(43)
DECLARE @ExperimentName NVARCHAR(255)

SELECT TOP 1 @ExperimentName=ExperimentName, @RunId=SUBSTRING(RunId, 1, 43)
FROM aml_model
ORDER BY CreatedDate DESC

EXEC dbo.AutoMLGetMetrics @RunId, @ExperimentName
Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
-- This shows using the AutoMLPredict stored procedure to predict using a forecasting model for the nyc_energy dataset.

DECLARE @Model NVARCHAR(MAX) = (SELECT TOP 1 Model FROM dbo.aml_model
                                WHERE ExperimentName = 'automl-sql-forecast'
                                ORDER BY CreatedDate DESC)

EXEC dbo.AutoMLPredict @input_query='
SELECT CAST(timeStamp AS NVARCHAR(30)) AS timeStamp,
       demand,
       precip,
       temp
FROM nyc_energy
WHERE demand IS NOT NULL AND precip IS NOT NULL AND temp IS NOT NULL
AND timeStamp >= ''2017-02-01''',
@label_column='demand',
@model=@model
WITH RESULT SETS ((timeStamp NVARCHAR(30), actual_demand FLOAT, precip FLOAT, temp FLOAT, predicted_demand FLOAT))
Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
-- This shows using the AutoMLTrain stored procedure to create a forecasting model for the nyc_energy dataset.
-- The computed is_validate_column (0 for rows before 2017-01-01, 1 otherwise) is passed via the
-- @is_validate_column parameter so AutoMLTrain can tell training rows from validation rows.

INSERT INTO dbo.aml_model(RunId, ExperimentName, Model, LogFileText, WorkspaceName)
EXEC dbo.AutoMLTrain @input_query='
SELECT CAST(timeStamp as NVARCHAR(30)) as timeStamp,
       demand,
       precip,
       temp,
       CASE WHEN timeStamp < ''2017-01-01'' THEN 0 ELSE 1 END AS is_validate_column
FROM nyc_energy
WHERE demand IS NOT NULL AND precip IS NOT NULL AND temp IS NOT NULL
and timeStamp < ''2017-02-01''',
@label_column='demand',
@task='forecasting',
@iterations=10,
@iteration_timeout_minutes=5,
@time_column_name='timeStamp',
@is_validate_column='is_validate_column',
@experiment_name='automl-sql-forecast',
@primary_metric='normalized_root_mean_squared_error'
Lines changed: 141 additions & 0 deletions
@@ -0,0 +1,141 @@
{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# Train a model and use it for prediction\r\n",
        "\r\n",
        "Before running this notebook, run the auto-ml-sql-setup.ipynb notebook."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/automated-machine-learning/sql-server/energy-demand/auto-ml-sql-energy-demand.png)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Set the default database"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "USE [automl]\r\n",
        "GO"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Use the AutoMLTrain stored procedure to create a forecasting model for the nyc_energy dataset."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "INSERT INTO dbo.aml_model(RunId, ExperimentName, Model, LogFileText, WorkspaceName)\r\n",
        "EXEC dbo.AutoMLTrain @input_query='\r\n",
        "SELECT CAST(timeStamp as NVARCHAR(30)) as timeStamp,\r\n",
        " demand,\r\n",
        "\t precip,\r\n",
        "\t temp,\r\n",
        "\t CASE WHEN timeStamp < ''2017-01-01'' THEN 0 ELSE 1 END AS is_validate_column\r\n",
        "FROM nyc_energy\r\n",
        "WHERE demand IS NOT NULL AND precip IS NOT NULL AND temp IS NOT NULL\r\n",
        "and timeStamp < ''2017-02-01''',\r\n",
        "@label_column='demand',\r\n",
        "@task='forecasting',\r\n",
        "@iterations=10,\r\n",
        "@iteration_timeout_minutes=5,\r\n",
        "@time_column_name='timeStamp',\r\n",
        "@is_validate_column='is_validate_column',\r\n",
        "@experiment_name='automl-sql-forecast',\r\n",
        "@primary_metric='normalized_root_mean_squared_error'"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Use the AutoMLPredict stored procedure to predict using the forecasting model for the nyc_energy dataset."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "DECLARE @Model NVARCHAR(MAX) = (SELECT TOP 1 Model FROM dbo.aml_model\r\n",
        " WHERE ExperimentName = 'automl-sql-forecast'\r\n",
        "\t\t\t\t\t\t\t\tORDER BY CreatedDate DESC)\r\n",
        "\r\n",
        "EXEC dbo.AutoMLPredict @input_query='\r\n",
        "SELECT CAST(timeStamp AS NVARCHAR(30)) AS timeStamp,\r\n",
        " demand,\r\n",
        "\t precip,\r\n",
        "\t temp\r\n",
        "FROM nyc_energy\r\n",
        "WHERE demand IS NOT NULL AND precip IS NOT NULL AND temp IS NOT NULL\r\n",
        "AND timeStamp >= ''2017-02-01''',\r\n",
        "@label_column='demand',\r\n",
        "@model=@model\r\n",
        "WITH RESULT SETS ((timeStamp NVARCHAR(30), actual_demand FLOAT, precip FLOAT, temp FLOAT, predicted_demand FLOAT))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## List all the metrics for all iterations for the most recent training run."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "DECLARE @RunId NVARCHAR(43)\r\n",
        "DECLARE @ExperimentName NVARCHAR(255)\r\n",
        "\r\n",
        "SELECT TOP 1 @ExperimentName=ExperimentName, @RunId=SUBSTRING(RunId, 1, 43)\r\n",
        "FROM aml_model\r\n",
        "ORDER BY CreatedDate DESC\r\n",
        "\r\n",
        "EXEC dbo.AutoMLGetMetrics @RunId, @ExperimentName"
      ]
    }
  ],
  "metadata": {
    "authors": [
      {
        "name": "jeffshep"
      }
    ],
    "kernelspec": {
      "display_name": "SQL",
      "language": "sql",
      "name": "SQL"
    },
    "language_info": {
      "name": "sql",
      "version": ""
    }
  },
  "nbformat": 4,
  "nbformat_minor": 2
}
Lines changed: 70 additions & 0 deletions
@@ -0,0 +1,70 @@
-- This procedure returns a list of metrics for each iteration of a run.
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE OR ALTER PROCEDURE [dbo].[AutoMLGetMetrics]
(
    @run_id NVARCHAR(250),                           -- The RunId
    @experiment_name NVARCHAR(32)='automl-sql-test', -- This can be used to find the experiment in the Azure Portal.
    @connection_name NVARCHAR(255)='default'         -- The AML connection to use.
) AS
BEGIN
    DECLARE @tenantid NVARCHAR(255)
    DECLARE @appid NVARCHAR(255)
    DECLARE @password NVARCHAR(255)
    DECLARE @config_file NVARCHAR(255)

    SELECT @tenantid=TenantId, @appid=AppId, @password=Password, @config_file=ConfigFile
    FROM aml_connection
    WHERE ConnectionName = @connection_name;

    EXEC sp_execute_external_script @language = N'Python', @script = N'import pandas as pd
import logging
import azureml.core
import numpy as np
from azureml.core.experiment import Experiment
from azureml.train.automl.run import AutoMLRun
from azureml.core.authentication import ServicePrincipalAuthentication
from azureml.core.workspace import Workspace

auth = ServicePrincipalAuthentication(tenantid, appid, password)

ws = Workspace.from_config(path=config_file, auth=auth)

experiment = Experiment(ws, experiment_name)

ml_run = AutoMLRun(experiment = experiment, run_id = run_id)

children = list(ml_run.get_children())
iterationlist = []
metricnamelist = []
metricvaluelist = []

for run in children:
    properties = run.get_properties()
    if "iteration" in properties:
        iteration = int(properties["iteration"])
        for metric_name, metric_value in run.get_metrics().items():
            if isinstance(metric_value, float):
                iterationlist.append(iteration)
                metricnamelist.append(metric_name)
                metricvaluelist.append(metric_value)

metrics = pd.DataFrame({"iteration": iterationlist, "metric_name": metricnamelist, "metric_value": metricvaluelist})
'
    , @output_data_1_name = N'metrics'
    , @params = N'@run_id NVARCHAR(250),
                  @experiment_name NVARCHAR(32),
                  @tenantid NVARCHAR(255),
                  @appid NVARCHAR(255),
                  @password NVARCHAR(255),
                  @config_file NVARCHAR(255)'
    , @run_id = @run_id
    , @experiment_name = @experiment_name
    , @tenantid = @tenantid
    , @appid = @appid
    , @password = @password
    , @config_file = @config_file
    WITH RESULT SETS ((iteration INT, metric_name NVARCHAR(100), metric_value FLOAT))
END
