Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
92 changes: 92 additions & 0 deletions configuration.ipynb
Original file line number Diff line number Diff line change
Expand Up @@ -39,6 +39,7 @@
" 1. Workspace parameters\n",
" 1. Access your workspace\n",
" 1. Create a new workspace\n",
" 1. Create compute resources\n",
"1. [Next steps](#Next%20steps)\n",
"\n",
"---\n",
Expand Down Expand Up @@ -241,6 +242,97 @@
"ws.write_config()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create compute resources for your training experiments\n",
"\n",
"Many of the sample notebooks use Azure ML managed compute (AmlCompute) to train models using a dynamically scalable pool of compute. In this section you will create default compute clusters for use by the other notebooks and any other operations you choose.\n",
"\n",
"To create a cluster, you need to specify a compute configuration that specifies the type of machine to be used and the scalability behaviors. Then you choose a name for the cluster that is unique within the workspace that can be used to address the cluster later.\n",
"\n",
"The cluster parameters are:\n",
"* vm_size - this describes the virtual machine type and size used in the cluster. All machines in the cluster are the same type. You can get the list of vm sizes available in your region by using the CLI command\n",
"\n",
"```shell\n",
"az vm list-skus -o tsv\n",
"```\n",
"* min_nodes - this sets the minimum size of the cluster. If you set the minimum to 0 the cluster will shut down all nodes while note in use. Setting this number to a value higher than 0 will allow for faster start-up times, but you will also be billed when the cluster is not in use.\n",
"* max_nodes - this sets the maximum size of the cluster. Setting this to a larger number allows for more concurrency and a greater distributed processing of scale-out jobs.\n",
"\n",
"\n",
"To create a **CPU** cluster now, run the cell below. The autoscale settings mean that the cluster will scale down to 0 nodes when inactive and up to 4 nodes when busy."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.compute import ComputeTarget, AmlCompute\n",
"from azureml.core.compute_target import ComputeTargetException\n",
"\n",
"# Choose a name for your CPU cluster\n",
"cpu_cluster_name = \"cpu-cluster\"\n",
"\n",
"# Verify that cluster does not exist already\n",
"try:\n",
" cpu_cluster = ComputeTarget(workspace=ws, name=cpu_cluster_name)\n",
" print(\"Found existing cpu-cluster\")\n",
"except ComputeTargetException:\n",
" print(\"Creating new cpu-cluster\")\n",
" \n",
" # Specify the configuration for the new cluster\n",
" compute_config = AmlCompute.provisioning_configuration(vm_size=\"STANDARD_D2_V2\",\n",
" min_nodes=0,\n",
" max_nodes=4)\n",
"\n",
" # Create the cluster with the specified name and configuration\n",
" cpu_cluster = ComputeTarget.create(ws, cpu_cluster_name, compute_config)\n",
" \n",
" # Wait for the cluster to complete, show the output log\n",
" cpu_cluster.wait_for_completion(show_output=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To create a **GPU** cluster, run the cell below. Note that your subscription must have sufficient quota for GPU VMs or the command will fail. To increase quota, see [these instructions](https://docs.microsoft.com/en-us/azure/azure-supportability/resource-manager-core-quotas-request). "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.compute import ComputeTarget, AmlCompute\n",
"from azureml.core.compute_target import ComputeTargetException\n",
"\n",
"# Choose a name for your GPU cluster\n",
"gpu_cluster_name = \"gpu-cluster\"\n",
"\n",
"# Verify that cluster does not exist already\n",
"try:\n",
" gpu_cluster = ComputeTarget(workspace=ws, name=gpu_cluster_name)\n",
" print(\"Found existing gpu cluster\")\n",
"except ComputeTargetException:\n",
" print(\"Creating new gpu-cluster\")\n",
" \n",
" # Specify the configuration for the new cluster\n",
" compute_config = AmlCompute.provisioning_configuration(vm_size=\"STANDARD_NC6\",\n",
" min_nodes=0,\n",
" max_nodes=4)\n",
" # Create the cluster with the specified name and configuration\n",
" gpu_cluster = ComputeTarget.create(ws, gpu_cluster_name, compute_config)\n",
"\n",
" # Wait for the cluster to complete, show the output log\n",
" gpu_cluster.wait_for_completion(show_output=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -205,7 +205,7 @@
"from azureml.core.compute import ComputeTarget\n",
"\n",
"# Choose a name for your cluster.\n",
"amlcompute_cluster_name = \"cpucluster\"\n",
"amlcompute_cluster_name = \"cpu-cluster\"\n",
"\n",
"found = False\n",
"\n",
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -119,7 +119,9 @@
"metadata": {},
"source": [
"### Create or Attach existing AmlCompute\n",
"You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for your AutoML run. In this tutorial, you create an AmlCompute as your training compute resource.\n",
"You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for your AutoML run. In this tutorial, you create `AmlCompute` as your training compute resource.\n",
"\n",
"**Creation of AmlCompute takes approximately 5 minutes.** If the AmlCompute with that name is already in your workspace this code will skip the creation process.\n",
"\n",
"As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota."
]
Expand All @@ -134,12 +136,10 @@
"from azureml.core.compute import ComputeTarget\n",
"\n",
"# Choose a name for your cluster.\n",
"amlcompute_cluster_name = \"cpucluster\"\n",
"amlcompute_cluster_name = \"cpu-cluster\"\n",
"\n",
"found = False\n",
"\n",
"# Check if this compute target already exists in the workspace.\n",
"\n",
"cts = ws.compute_targets\n",
"if amlcompute_cluster_name in cts and cts[amlcompute_cluster_name].type == 'AmlCompute':\n",
" found = True\n",
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -98,7 +98,7 @@
"from azureml.core.compute_target import ComputeTargetException\n",
"\n",
"# choose a name for your cluster\n",
"cluster_name = \"gpucluster\"\n",
"cluster_name = \"gpu-cluster\"\n",
"\n",
"try:\n",
" compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n",
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -206,8 +206,15 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Retrieve default Azure Machine Learning compute\n",
"Azure Machine Learning Compute is a service for provisioning and managing clusters of Azure virtual machines for running machine learning workloads. Let's get the default Azure Machine Learning Compute in the current workspace. We will then run the training script on this compute target."
"#### Retrieve or create a Azure Machine Learning compute\n",
"Azure Machine Learning Compute is a service for provisioning and managing clusters of Azure virtual machines for running machine learning workloads. Let's create a new Azure Machine Learning Compute in the current workspace, if it doesn't already exist. We will then run the training script on this compute target.\n",
"\n",
"If we could not find the compute with the given name in the previous cell, then we will create a new compute here. We will create an Azure Machine Learning Compute containing **STANDARD_D2_V2 CPU VMs**. This process is broken down into the following steps:\n",
"\n",
"1. Create the configuration\n",
"2. Create the Azure Machine Learning compute\n",
"\n",
"**This process will take about 3 minutes and is providing only sparse output in the process. Please make sure to wait until the call returns before moving to the next cell.**"
]
},
{
Expand All @@ -216,7 +223,23 @@
"metadata": {},
"outputs": [],
"source": [
"aml_compute = ws.get_default_compute_target(\"CPU\")"
"from azureml.core.compute import ComputeTarget, AmlCompute\n",
"from azureml.core.compute_target import ComputeTargetException\n",
"\n",
"aml_compute_target = \"cpu-cluster\"\n",
"try:\n",
" aml_compute = AmlCompute(ws, aml_compute_target)\n",
" print(\"found existing compute target.\")\n",
"except ComputeTargetException:\n",
" print(\"creating new compute target\")\n",
" \n",
" provisioning_config = AmlCompute.provisioning_configuration(vm_size = \"STANDARD_D2_V2\",\n",
" min_nodes = 1, \n",
" max_nodes = 4) \n",
" aml_compute = ComputeTarget.create(ws, aml_compute_target, provisioning_config)\n",
" aml_compute.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)\n",
" \n",
"print(\"Azure Machine Learning Compute attached\")\n"
]
},
{
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -113,7 +113,25 @@
"metadata": {},
"outputs": [],
"source": [
"batch_compute = ws.get_default_compute_target(\"CPU\")"
"batch_compute_name = 'mybatchcompute' # Name to associate with new compute in workspace\n",
"\n",
"# Batch account details needed to attach as compute to workspace\n",
"batch_account_name = \"<batch_account_name>\" # Name of the Batch account\n",
"batch_resource_group = \"<batch_resource_group>\" # Name of the resource group which contains this account\n",
"\n",
"try:\n",
" # check if already attached\n",
" batch_compute = BatchCompute(ws, batch_compute_name)\n",
"except ComputeTargetException:\n",
" print('Attaching Batch compute...')\n",
" provisioning_config = BatchCompute.attach_configuration(resource_group=batch_resource_group, \n",
" account_name=batch_account_name)\n",
" batch_compute = ComputeTarget.attach(ws, batch_compute_name, provisioning_config)\n",
" batch_compute.wait_for_completion()\n",
" print(\"Provisioning state:{}\".format(batch_compute.provisioning_state))\n",
" print(\"Provisioning errors:{}\".format(batch_compute.provisioning_errors))\n",
"\n",
"print(\"Using Batch compute:{}\".format(batch_compute.cluster_resource_id))"
]
},
{
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -76,8 +76,18 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Get default AmlCompute\n",
"You can create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, you use default `AmlCompute` as your training compute resource."
"## Create or Attach existing AmlCompute\n",
"You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, you create `AmlCompute` as your training compute resource."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If we could not find the cluster with the given name, then we will create a new cluster here. We will create an `AmlCompute` cluster of `STANDARD_NC6` GPU VMs. This process is broken down into 3 steps:\n",
"1. create the configuration (this step is local and only takes a second)\n",
"2. create the cluster (this step will take about **20 seconds**)\n",
"3. provision the VMs to bring the cluster to the initial size (of 1 in this case). This step will take about **3-5 minutes** and is providing only sparse output in the process. Please make sure to wait until the call returns before moving to the next cell"
]
},
{
Expand All @@ -86,7 +96,25 @@
"metadata": {},
"outputs": [],
"source": [
"cpu_cluster = ws.get_default_compute_target(\"CPU\")\n",
"from azureml.core.compute import ComputeTarget, AmlCompute\n",
"from azureml.core.compute_target import ComputeTargetException\n",
"\n",
"# choose a name for your cluster\n",
"cluster_name = \"cpu-cluster\"\n",
"\n",
"try:\n",
" cpu_cluster = ComputeTarget(workspace=ws, name=cluster_name)\n",
" print('Found existing compute target')\n",
"except ComputeTargetException:\n",
" print('Creating a new compute target...')\n",
" compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6', max_nodes=4)\n",
"\n",
" # create the cluster\n",
" cpu_cluster = ComputeTarget.create(ws, cluster_name, compute_config)\n",
"\n",
" # can poll for a minimum number of nodes and for a specific timeout. \n",
" # if no min node count is provided it uses the scale settings for the cluster\n",
" cpu_cluster.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)\n",
"\n",
"# use get_status() to get a detailed status for the current cluster. \n",
"print(cpu_cluster.get_status().serialize())"
Expand All @@ -96,7 +124,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Now that you have created the compute target, let's see what the workspace's `compute_targets` property returns. You should now see one entry named 'cpucluster' of type `AmlCompute`."
"Now that you have created the compute target, let's see what the workspace's `compute_targets` property returns. You should now see one entry named 'cpu-cluster' of type `AmlCompute`."
]
},
{
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -184,26 +184,14 @@
"metadata": {},
"source": [
"## Retrieve or create a Azure Machine Learning compute\n",
"Azure Machine Learning Compute is a service for provisioning and managing clusters of Azure virtual machines for running machine learning workloads.\n",
"Let's check the available computes first."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"cts = ws.compute_targets\n",
"for name, ct in cts.items():\n",
" print(name, ct.type, ct.provisioning_state)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now let's get the default Azure Machine Learning Compute in the current workspace. We will then run the training script on this compute target."
"Azure Machine Learning Compute is a service for provisioning and managing clusters of Azure virtual machines for running machine learning workloads. Let's create a new Azure Machine Learning Compute in the current workspace, if it doesn't already exist. We will then run the training script on this compute target.\n",
"\n",
"If we could not find the compute with the given name in the previous cell, then we will create a new compute here. This process is broken down into the following steps:\n",
"\n",
"1. Create the configuration\n",
"2. Create the Azure Machine Learning compute\n",
"\n",
"**This process will take a few minutes and is providing only sparse output in the process. Please make sure to wait until the call returns before moving to the next cell.**\n"
]
},
{
Expand All @@ -212,9 +200,20 @@
"metadata": {},
"outputs": [],
"source": [
"compute_target = ws.get_default_compute_target(\"GPU\")\n",
"cluster_name = \"gpu-cluster\"\n",
"\n",
"try:\n",
" compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n",
" print('Found existing compute target {}.'.format(cluster_name))\n",
"except ComputeTargetException:\n",
" print('Creating a new compute target...')\n",
" compute_config = AmlCompute.provisioning_configuration(vm_size=\"STANDARD_NC6\",\n",
" max_nodes=4)\n",
"\n",
" compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n",
" compute_target.wait_for_completion(show_output=True, timeout_in_minutes=20)\n",
"\n",
"print(compute_target.get_status().serialize())"
"print(\"Azure Machine Learning Compute attached\")"
]
},
{
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -79,7 +79,20 @@
"metadata": {},
"outputs": [],
"source": [
"aml_compute = ws.get_default_compute_target(\"CPU\")"
"from azureml.core.compute_target import ComputeTargetException\n",
"\n",
"aml_compute_target = \"cpu-cluster\"\n",
"try:\n",
" aml_compute = AmlCompute(ws, aml_compute_target)\n",
" print(\"found existing compute target.\")\n",
"except ComputeTargetException:\n",
" print(\"creating new compute target\")\n",
" \n",
" provisioning_config = AmlCompute.provisioning_configuration(vm_size = \"STANDARD_D2_V2\",\n",
" min_nodes = 1, \n",
" max_nodes = 4) \n",
" aml_compute = ComputeTarget.create(ws, aml_compute_target, provisioning_config)\n",
" aml_compute.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)\n"
]
},
{
Expand Down
Loading