Skip to content

Commit b55ac36

Browse files
authored
Merge pull request Azure#428 from rastala/master
update cluster creation
2 parents 4ecc58d + de16231 commit b55ac36

File tree

29 files changed

+674
-196
lines changed

29 files changed

+674
-196
lines changed

configuration.ipynb

Lines changed: 92 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -39,6 +39,7 @@
3939
" 1. Workspace parameters\n",
4040
" 1. Access your workspace\n",
4141
" 1. Create a new workspace\n",
42+
" 1. Create compute resources\n",
4243
"1. [Next steps](#Next%20steps)\n",
4344
"\n",
4445
"---\n",
@@ -241,6 +242,97 @@
241242
"ws.write_config()"
242243
]
243244
},
245+
{
246+
"cell_type": "markdown",
247+
"metadata": {},
248+
"source": [
249+
"### Create compute resources for your training experiments\n",
250+
"\n",
251+
"Many of the sample notebooks use Azure ML managed compute (AmlCompute) to train models using a dynamically scalable pool of compute. In this section you will create default compute clusters for use by the other notebooks and any other operations you choose.\n",
252+
"\n",
253+
"To create a cluster, you need to specify a compute configuration that specifies the type of machine to be used and the scalability behaviors. Then you choose a name for the cluster that is unique within the workspace that can be used to address the cluster later.\n",
254+
"\n",
255+
"The cluster parameters are:\n",
256+
"* vm_size - this describes the virtual machine type and size used in the cluster. All machines in the cluster are the same type. You can get the list of vm sizes available in your region by using the CLI command\n",
257+
"\n",
258+
"```shell\n",
259+
"az vm list-skus -o tsv\n",
260+
"```\n",
261+
"* min_nodes - this sets the minimum size of the cluster. If you set the minimum to 0 the cluster will shut down all nodes while note in use. Setting this number to a value higher than 0 will allow for faster start-up times, but you will also be billed when the cluster is not in use.\n",
262+
"* max_nodes - this sets the maximum size of the cluster. Setting this to a larger number allows for more concurrency and a greater distributed processing of scale-out jobs.\n",
263+
"\n",
264+
"\n",
265+
"To create a **CPU** cluster now, run the cell below. The autoscale settings mean that the cluster will scale down to 0 nodes when inactive and up to 4 nodes when busy."
266+
]
267+
},
268+
{
269+
"cell_type": "code",
270+
"execution_count": null,
271+
"metadata": {},
272+
"outputs": [],
273+
"source": [
274+
"from azureml.core.compute import ComputeTarget, AmlCompute\n",
275+
"from azureml.core.compute_target import ComputeTargetException\n",
276+
"\n",
277+
"# Choose a name for your CPU cluster\n",
278+
"cpu_cluster_name = \"cpu-cluster\"\n",
279+
"\n",
280+
"# Verify that cluster does not exist already\n",
281+
"try:\n",
282+
" cpu_cluster = ComputeTarget(workspace=ws, name=cpu_cluster_name)\n",
283+
" print(\"Found existing cpu-cluster\")\n",
284+
"except ComputeTargetException:\n",
285+
" print(\"Creating new cpu-cluster\")\n",
286+
" \n",
287+
" # Specify the configuration for the new cluster\n",
288+
" compute_config = AmlCompute.provisioning_configuration(vm_size=\"STANDARD_D2_V2\",\n",
289+
" min_nodes=0,\n",
290+
" max_nodes=4)\n",
291+
"\n",
292+
" # Create the cluster with the specified name and configuration\n",
293+
" cpu_cluster = ComputeTarget.create(ws, cpu_cluster_name, compute_config)\n",
294+
" \n",
295+
" # Wait for the cluster to complete, show the output log\n",
296+
" cpu_cluster.wait_for_completion(show_output=True)"
297+
]
298+
},
299+
{
300+
"cell_type": "markdown",
301+
"metadata": {},
302+
"source": [
303+
"To create a **GPU** cluster, run the cell below. Note that your subscription must have sufficient quota for GPU VMs or the command will fail. To increase quota, see [these instructions](https://docs.microsoft.com/en-us/azure/azure-supportability/resource-manager-core-quotas-request). "
304+
]
305+
},
306+
{
307+
"cell_type": "code",
308+
"execution_count": null,
309+
"metadata": {},
310+
"outputs": [],
311+
"source": [
312+
"from azureml.core.compute import ComputeTarget, AmlCompute\n",
313+
"from azureml.core.compute_target import ComputeTargetException\n",
314+
"\n",
315+
"# Choose a name for your GPU cluster\n",
316+
"gpu_cluster_name = \"gpu-cluster\"\n",
317+
"\n",
318+
"# Verify that cluster does not exist already\n",
319+
"try:\n",
320+
" gpu_cluster = ComputeTarget(workspace=ws, name=gpu_cluster_name)\n",
321+
" print(\"Found existing gpu cluster\")\n",
322+
"except ComputeTargetException:\n",
323+
" print(\"Creating new gpu-cluster\")\n",
324+
" \n",
325+
" # Specify the configuration for the new cluster\n",
326+
" compute_config = AmlCompute.provisioning_configuration(vm_size=\"STANDARD_NC6\",\n",
327+
" min_nodes=0,\n",
328+
" max_nodes=4)\n",
329+
" # Create the cluster with the specified name and configuration\n",
330+
" gpu_cluster = ComputeTarget.create(ws, gpu_cluster_name, compute_config)\n",
331+
"\n",
332+
" # Wait for the cluster to complete, show the output log\n",
333+
" gpu_cluster.wait_for_completion(show_output=True)"
334+
]
335+
},
244336
{
245337
"cell_type": "markdown",
246338
"metadata": {},

how-to-use-azureml/automated-machine-learning/dataprep-remote-execution/auto-ml-dataprep-remote-execution.ipynb

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -205,7 +205,7 @@
205205
"from azureml.core.compute import ComputeTarget\n",
206206
"\n",
207207
"# Choose a name for your cluster.\n",
208-
"amlcompute_cluster_name = \"cpucluster\"\n",
208+
"amlcompute_cluster_name = \"cpu-cluster\"\n",
209209
"\n",
210210
"found = False\n",
211211
"\n",

how-to-use-azureml/automated-machine-learning/remote-amlcompute/auto-ml-remote-amlcompute.ipynb

Lines changed: 4 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -119,7 +119,9 @@
119119
"metadata": {},
120120
"source": [
121121
"### Create or Attach existing AmlCompute\n",
122-
"You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for your AutoML run. In this tutorial, you create an AmlCompute as your training compute resource.\n",
122+
"You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for your AutoML run. In this tutorial, you create `AmlCompute` as your training compute resource.\n",
123+
"\n",
124+
"**Creation of AmlCompute takes approximately 5 minutes.** If the AmlCompute with that name is already in your workspace this code will skip the creation process.\n",
123125
"\n",
124126
"As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota."
125127
]
@@ -134,12 +136,10 @@
134136
"from azureml.core.compute import ComputeTarget\n",
135137
"\n",
136138
"# Choose a name for your cluster.\n",
137-
"amlcompute_cluster_name = \"cpucluster\"\n",
139+
"amlcompute_cluster_name = \"cpu-cluster\"\n",
138140
"\n",
139141
"found = False\n",
140-
"\n",
141142
"# Check if this compute target already exists in the workspace.\n",
142-
"\n",
143143
"cts = ws.compute_targets\n",
144144
"if amlcompute_cluster_name in cts and cts[amlcompute_cluster_name].type == 'AmlCompute':\n",
145145
" found = True\n",

how-to-use-azureml/deployment/onnx/onnx-train-pytorch-aml-deploy-mnist.ipynb

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -98,7 +98,7 @@
9898
"from azureml.core.compute_target import ComputeTargetException\n",
9999
"\n",
100100
"# choose a name for your cluster\n",
101-
"cluster_name = \"gpucluster\"\n",
101+
"cluster_name = \"gpu-cluster\"\n",
102102
"\n",
103103
"try:\n",
104104
" compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n",

how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-getting-started.ipynb

Lines changed: 26 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -206,8 +206,15 @@
206206
"cell_type": "markdown",
207207
"metadata": {},
208208
"source": [
209-
"#### Retrieve default Azure Machine Learning compute\n",
210-
"Azure Machine Learning Compute is a service for provisioning and managing clusters of Azure virtual machines for running machine learning workloads. Let's get the default Azure Machine Learning Compute in the current workspace. We will then run the training script on this compute target."
209+
"#### Retrieve or create a Azure Machine Learning compute\n",
210+
"Azure Machine Learning Compute is a service for provisioning and managing clusters of Azure virtual machines for running machine learning workloads. Let's create a new Azure Machine Learning Compute in the current workspace, if it doesn't already exist. We will then run the training script on this compute target.\n",
211+
"\n",
212+
"If we could not find the compute with the given name in the previous cell, then we will create a new compute here. We will create an Azure Machine Learning Compute containing **STANDARD_D2_V2 CPU VMs**. This process is broken down into the following steps:\n",
213+
"\n",
214+
"1. Create the configuration\n",
215+
"2. Create the Azure Machine Learning compute\n",
216+
"\n",
217+
"**This process will take about 3 minutes and is providing only sparse output in the process. Please make sure to wait until the call returns before moving to the next cell.**"
211218
]
212219
},
213220
{
@@ -216,7 +223,23 @@
216223
"metadata": {},
217224
"outputs": [],
218225
"source": [
219-
"aml_compute = ws.get_default_compute_target(\"CPU\")"
226+
"from azureml.core.compute import ComputeTarget, AmlCompute\n",
227+
"from azureml.core.compute_target import ComputeTargetException\n",
228+
"\n",
229+
"aml_compute_target = \"cpu-cluster\"\n",
230+
"try:\n",
231+
" aml_compute = AmlCompute(ws, aml_compute_target)\n",
232+
" print(\"found existing compute target.\")\n",
233+
"except ComputeTargetException:\n",
234+
" print(\"creating new compute target\")\n",
235+
" \n",
236+
" provisioning_config = AmlCompute.provisioning_configuration(vm_size = \"STANDARD_D2_V2\",\n",
237+
" min_nodes = 1, \n",
238+
" max_nodes = 4) \n",
239+
" aml_compute = ComputeTarget.create(ws, aml_compute_target, provisioning_config)\n",
240+
" aml_compute.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)\n",
241+
" \n",
242+
"print(\"Azure Machine Learning Compute attached\")\n"
220243
]
221244
},
222245
{

how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-how-to-use-azurebatch-to-run-a-windows-executable.ipynb

Lines changed: 19 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -113,7 +113,25 @@
113113
"metadata": {},
114114
"outputs": [],
115115
"source": [
116-
"batch_compute = ws.get_default_compute_target(\"CPU\")"
116+
"batch_compute_name = 'mybatchcompute' # Name to associate with new compute in workspace\n",
117+
"\n",
118+
"# Batch account details needed to attach as compute to workspace\n",
119+
"batch_account_name = \"<batch_account_name>\" # Name of the Batch account\n",
120+
"batch_resource_group = \"<batch_resource_group>\" # Name of the resource group which contains this account\n",
121+
"\n",
122+
"try:\n",
123+
" # check if already attached\n",
124+
" batch_compute = BatchCompute(ws, batch_compute_name)\n",
125+
"except ComputeTargetException:\n",
126+
" print('Attaching Batch compute...')\n",
127+
" provisioning_config = BatchCompute.attach_configuration(resource_group=batch_resource_group, \n",
128+
" account_name=batch_account_name)\n",
129+
" batch_compute = ComputeTarget.attach(ws, batch_compute_name, provisioning_config)\n",
130+
" batch_compute.wait_for_completion()\n",
131+
" print(\"Provisioning state:{}\".format(batch_compute.provisioning_state))\n",
132+
" print(\"Provisioning errors:{}\".format(batch_compute.provisioning_errors))\n",
133+
"\n",
134+
"print(\"Using Batch compute:{}\".format(batch_compute.cluster_resource_id))"
117135
]
118136
},
119137
{

how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-how-to-use-estimatorstep.ipynb

Lines changed: 32 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -76,8 +76,18 @@
7676
"cell_type": "markdown",
7777
"metadata": {},
7878
"source": [
79-
"## Get default AmlCompute\n",
80-
"You can create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, you use default `AmlCompute` as your training compute resource."
79+
"## Create or Attach existing AmlCompute\n",
80+
"You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, you create `AmlCompute` as your training compute resource."
81+
]
82+
},
83+
{
84+
"cell_type": "markdown",
85+
"metadata": {},
86+
"source": [
87+
"If we could not find the cluster with the given name, then we will create a new cluster here. We will create an `AmlCompute` cluster of `STANDARD_NC6` GPU VMs. This process is broken down into 3 steps:\n",
88+
"1. create the configuration (this step is local and only takes a second)\n",
89+
"2. create the cluster (this step will take about **20 seconds**)\n",
90+
"3. provision the VMs to bring the cluster to the initial size (of 1 in this case). This step will take about **3-5 minutes** and is providing only sparse output in the process. Please make sure to wait until the call returns before moving to the next cell"
8191
]
8292
},
8393
{
@@ -86,7 +96,25 @@
8696
"metadata": {},
8797
"outputs": [],
8898
"source": [
89-
"cpu_cluster = ws.get_default_compute_target(\"CPU\")\n",
99+
"from azureml.core.compute import ComputeTarget, AmlCompute\n",
100+
"from azureml.core.compute_target import ComputeTargetException\n",
101+
"\n",
102+
"# choose a name for your cluster\n",
103+
"cluster_name = \"cpu-cluster\"\n",
104+
"\n",
105+
"try:\n",
106+
" cpu_cluster = ComputeTarget(workspace=ws, name=cluster_name)\n",
107+
" print('Found existing compute target')\n",
108+
"except ComputeTargetException:\n",
109+
" print('Creating a new compute target...')\n",
110+
" compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6', max_nodes=4)\n",
111+
"\n",
112+
" # create the cluster\n",
113+
" cpu_cluster = ComputeTarget.create(ws, cluster_name, compute_config)\n",
114+
"\n",
115+
" # can poll for a minimum number of nodes and for a specific timeout. \n",
116+
" # if no min node count is provided it uses the scale settings for the cluster\n",
117+
" cpu_cluster.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)\n",
90118
"\n",
91119
"# use get_status() to get a detailed status for the current cluster. \n",
92120
"print(cpu_cluster.get_status().serialize())"
@@ -96,7 +124,7 @@
96124
"cell_type": "markdown",
97125
"metadata": {},
98126
"source": [
99-
"Now that you have created the compute target, let's see what the workspace's `compute_targets` property returns. You should now see one entry named 'cpucluster' of type `AmlCompute`."
127+
"Now that you have created the compute target, let's see what the workspace's `compute_targets` property returns. You should now see one entry named 'cpu-cluster' of type `AmlCompute`."
100128
]
101129
},
102130
{

how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-parameter-tuning-with-hyperdrive.ipynb

Lines changed: 21 additions & 22 deletions
Original file line numberDiff line numberDiff line change
@@ -184,26 +184,14 @@
184184
"metadata": {},
185185
"source": [
186186
"## Retrieve or create a Azure Machine Learning compute\n",
187-
"Azure Machine Learning Compute is a service for provisioning and managing clusters of Azure virtual machines for running machine learning workloads.\n",
188-
"Let's check the available computes first."
189-
]
190-
},
191-
{
192-
"cell_type": "code",
193-
"execution_count": null,
194-
"metadata": {},
195-
"outputs": [],
196-
"source": [
197-
"cts = ws.compute_targets\n",
198-
"for name, ct in cts.items():\n",
199-
" print(name, ct.type, ct.provisioning_state)"
200-
]
201-
},
202-
{
203-
"cell_type": "markdown",
204-
"metadata": {},
205-
"source": [
206-
"Now let's get the default Azure Machine Learning Compute in the current workspace. We will then run the training script on this compute target."
187+
"Azure Machine Learning Compute is a service for provisioning and managing clusters of Azure virtual machines for running machine learning workloads. Let's create a new Azure Machine Learning Compute in the current workspace, if it doesn't already exist. We will then run the training script on this compute target.\n",
188+
"\n",
189+
"If we could not find the compute with the given name in the previous cell, then we will create a new compute here. This process is broken down into the following steps:\n",
190+
"\n",
191+
"1. Create the configuration\n",
192+
"2. Create the Azure Machine Learning compute\n",
193+
"\n",
194+
"**This process will take a few minutes and is providing only sparse output in the process. Please make sure to wait until the call returns before moving to the next cell.**\n"
207195
]
208196
},
209197
{
@@ -212,9 +200,20 @@
212200
"metadata": {},
213201
"outputs": [],
214202
"source": [
215-
"compute_target = ws.get_default_compute_target(\"GPU\")\n",
203+
"cluster_name = \"gpu-cluster\"\n",
204+
"\n",
205+
"try:\n",
206+
" compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n",
207+
" print('Found existing compute target {}.'.format(cluster_name))\n",
208+
"except ComputeTargetException:\n",
209+
" print('Creating a new compute target...')\n",
210+
" compute_config = AmlCompute.provisioning_configuration(vm_size=\"STANDARD_NC6\",\n",
211+
" max_nodes=4)\n",
212+
"\n",
213+
" compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n",
214+
" compute_target.wait_for_completion(show_output=True, timeout_in_minutes=20)\n",
216215
"\n",
217-
"print(compute_target.get_status().serialize())"
216+
"print(\"Azure Machine Learning Compute attached\")"
218217
]
219218
},
220219
{

how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-publish-and-run-using-rest-endpoint.ipynb

Lines changed: 14 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -79,7 +79,20 @@
7979
"metadata": {},
8080
"outputs": [],
8181
"source": [
82-
"aml_compute = ws.get_default_compute_target(\"CPU\")"
82+
"from azureml.core.compute_target import ComputeTargetException\n",
83+
"\n",
84+
"aml_compute_target = \"cpu-cluster\"\n",
85+
"try:\n",
86+
" aml_compute = AmlCompute(ws, aml_compute_target)\n",
87+
" print(\"found existing compute target.\")\n",
88+
"except ComputeTargetException:\n",
89+
" print(\"creating new compute target\")\n",
90+
" \n",
91+
" provisioning_config = AmlCompute.provisioning_configuration(vm_size = \"STANDARD_D2_V2\",\n",
92+
" min_nodes = 1, \n",
93+
" max_nodes = 4) \n",
94+
" aml_compute = ComputeTarget.create(ws, aml_compute_target, provisioning_config)\n",
95+
" aml_compute.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)\n"
8396
]
8497
},
8598
{

0 commit comments

Comments
 (0)