diff --git a/configuration.ipynb b/configuration.ipynb index 57a665413..03124438c 100644 --- a/configuration.ipynb +++ b/configuration.ipynb @@ -39,6 +39,7 @@ " 1. Workspace parameters\n", " 1. Access your workspace\n", " 1. Create a new workspace\n", + " 1. Create compute resources\n", "1. [Next steps](#Next%20steps)\n", "\n", "---\n", @@ -241,6 +242,97 @@ "ws.write_config()" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create compute resources for your training experiments\n", + "\n", + "Many of the sample notebooks use Azure ML managed compute (AmlCompute) to train models using a dynamically scalable pool of compute. In this section you will create default compute clusters for use by the other notebooks and any other operations you choose.\n", + "\n", + "To create a cluster, you need to provide a compute configuration that specifies the type of machine to be used and the scalability behaviors. Then you choose a name for the cluster that is unique within the workspace and can be used to address the cluster later.\n", + "\n", + "The cluster parameters are:\n", + "* vm_size - this describes the virtual machine type and size used in the cluster. All machines in the cluster are the same type. You can get the list of vm sizes available in your region by using the CLI command\n", + "\n", + "```shell\n", + "az vm list-skus -o tsv\n", + "```\n", + "* min_nodes - this sets the minimum size of the cluster. If you set the minimum to 0 the cluster will shut down all nodes while not in use. Setting this number to a value higher than 0 will allow for faster start-up times, but you will also be billed when the cluster is not in use.\n", + "* max_nodes - this sets the maximum size of the cluster. Setting this to a larger number allows for more concurrency and greater distributed processing of scale-out jobs.\n", + "\n", + "\n", + "To create a **CPU** cluster now, run the cell below.
The autoscale settings mean that the cluster will scale down to 0 nodes when inactive and up to 4 nodes when busy." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.compute import ComputeTarget, AmlCompute\n", + "from azureml.core.compute_target import ComputeTargetException\n", + "\n", + "# Choose a name for your CPU cluster\n", + "cpu_cluster_name = \"cpu-cluster\"\n", + "\n", + "# Verify that cluster does not exist already\n", + "try:\n", + " cpu_cluster = ComputeTarget(workspace=ws, name=cpu_cluster_name)\n", + " print(\"Found existing cpu-cluster\")\n", + "except ComputeTargetException:\n", + " print(\"Creating new cpu-cluster\")\n", + " \n", + " # Specify the configuration for the new cluster\n", + " compute_config = AmlCompute.provisioning_configuration(vm_size=\"STANDARD_D2_V2\",\n", + " min_nodes=0,\n", + " max_nodes=4)\n", + "\n", + " # Create the cluster with the specified name and configuration\n", + " cpu_cluster = ComputeTarget.create(ws, cpu_cluster_name, compute_config)\n", + " \n", + " # Wait for the cluster to complete, show the output log\n", + " cpu_cluster.wait_for_completion(show_output=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To create a **GPU** cluster, run the cell below. Note that your subscription must have sufficient quota for GPU VMs or the command will fail. To increase quota, see [these instructions](https://docs.microsoft.com/en-us/azure/azure-supportability/resource-manager-core-quotas-request). 
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.compute import ComputeTarget, AmlCompute\n", + "from azureml.core.compute_target import ComputeTargetException\n", + "\n", + "# Choose a name for your GPU cluster\n", + "gpu_cluster_name = \"gpu-cluster\"\n", + "\n", + "# Verify that cluster does not exist already\n", + "try:\n", + " gpu_cluster = ComputeTarget(workspace=ws, name=gpu_cluster_name)\n", + " print(\"Found existing gpu cluster\")\n", + "except ComputeTargetException:\n", + " print(\"Creating new gpu-cluster\")\n", + " \n", + " # Specify the configuration for the new cluster\n", + " compute_config = AmlCompute.provisioning_configuration(vm_size=\"STANDARD_NC6\",\n", + " min_nodes=0,\n", + " max_nodes=4)\n", + " # Create the cluster with the specified name and configuration\n", + " gpu_cluster = ComputeTarget.create(ws, gpu_cluster_name, compute_config)\n", + "\n", + " # Wait for the cluster to complete, show the output log\n", + " gpu_cluster.wait_for_completion(show_output=True)" + ] + }, { "cell_type": "markdown", "metadata": {}, diff --git a/how-to-use-azureml/automated-machine-learning/dataprep-remote-execution/auto-ml-dataprep-remote-execution.ipynb b/how-to-use-azureml/automated-machine-learning/dataprep-remote-execution/auto-ml-dataprep-remote-execution.ipynb index 3c2d1d11d..e4b50a4e5 100644 --- a/how-to-use-azureml/automated-machine-learning/dataprep-remote-execution/auto-ml-dataprep-remote-execution.ipynb +++ b/how-to-use-azureml/automated-machine-learning/dataprep-remote-execution/auto-ml-dataprep-remote-execution.ipynb @@ -205,7 +205,7 @@ "from azureml.core.compute import ComputeTarget\n", "\n", "# Choose a name for your cluster.\n", - "amlcompute_cluster_name = \"cpucluster\"\n", + "amlcompute_cluster_name = \"cpu-cluster\"\n", "\n", "found = False\n", "\n", diff --git 
a/how-to-use-azureml/automated-machine-learning/remote-amlcompute/auto-ml-remote-amlcompute.ipynb b/how-to-use-azureml/automated-machine-learning/remote-amlcompute/auto-ml-remote-amlcompute.ipynb index 9af1b1c31..f00e51dc5 100644 --- a/how-to-use-azureml/automated-machine-learning/remote-amlcompute/auto-ml-remote-amlcompute.ipynb +++ b/how-to-use-azureml/automated-machine-learning/remote-amlcompute/auto-ml-remote-amlcompute.ipynb @@ -119,7 +119,9 @@ "metadata": {}, "source": [ "### Create or Attach existing AmlCompute\n", - "You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for your AutoML run. In this tutorial, you create an AmlCompute as your training compute resource.\n", + "You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for your AutoML run. In this tutorial, you create `AmlCompute` as your training compute resource.\n", + "\n", + "**Creation of AmlCompute takes approximately 5 minutes.** If the AmlCompute with that name is already in your workspace this code will skip the creation process.\n", "\n", "As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota." 
] @@ -134,12 +136,10 @@ "from azureml.core.compute import ComputeTarget\n", "\n", "# Choose a name for your cluster.\n", - "amlcompute_cluster_name = \"cpucluster\"\n", + "amlcompute_cluster_name = \"cpu-cluster\"\n", "\n", "found = False\n", - "\n", "# Check if this compute target already exists in the workspace.\n", - "\n", "cts = ws.compute_targets\n", "if amlcompute_cluster_name in cts and cts[amlcompute_cluster_name].type == 'AmlCompute':\n", " found = True\n", diff --git a/how-to-use-azureml/deployment/onnx/onnx-train-pytorch-aml-deploy-mnist.ipynb b/how-to-use-azureml/deployment/onnx/onnx-train-pytorch-aml-deploy-mnist.ipynb index 5d44a1d5e..9a7f20351 100644 --- a/how-to-use-azureml/deployment/onnx/onnx-train-pytorch-aml-deploy-mnist.ipynb +++ b/how-to-use-azureml/deployment/onnx/onnx-train-pytorch-aml-deploy-mnist.ipynb @@ -98,7 +98,7 @@ "from azureml.core.compute_target import ComputeTargetException\n", "\n", "# choose a name for your cluster\n", - "cluster_name = \"gpucluster\"\n", + "cluster_name = \"gpu-cluster\"\n", "\n", "try:\n", " compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n", diff --git a/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-getting-started.ipynb b/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-getting-started.ipynb index 27d83e03a..c11afc663 100644 --- a/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-getting-started.ipynb +++ b/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-getting-started.ipynb @@ -206,8 +206,15 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "#### Retrieve default Azure Machine Learning compute\n", - "Azure Machine Learning Compute is a service for provisioning and managing clusters of Azure virtual machines for running machine learning workloads. Let's get the default Azure Machine Learning Compute in the current workspace. 
We will then run the training script on this compute target." + "#### Retrieve or create an Azure Machine Learning compute\n", + "Azure Machine Learning Compute is a service for provisioning and managing clusters of Azure virtual machines for running machine learning workloads. Let's create a new Azure Machine Learning Compute in the current workspace, if it doesn't already exist. We will then run the training script on this compute target.\n", + "\n", + "If a compute with the given name cannot be found, then we will create a new compute here. We will create an Azure Machine Learning Compute containing **STANDARD_D2_V2 CPU VMs**. This process is broken down into the following steps:\n", + "\n", + "1. Create the configuration\n", + "2. Create the Azure Machine Learning compute\n", + "\n", + "**This process will take about 3 minutes and provides only sparse output along the way. Please make sure to wait until the call returns before moving to the next cell.**" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ - "aml_compute = ws.get_default_compute_target(\"CPU\")" + "from azureml.core.compute import ComputeTarget, AmlCompute\n", + "from azureml.core.compute_target import ComputeTargetException\n", + "\n", + "aml_compute_target = \"cpu-cluster\"\n", + "try:\n", + "    aml_compute = AmlCompute(ws, aml_compute_target)\n", + "    print(\"found existing compute target.\")\n", + "except ComputeTargetException:\n", + "    print(\"creating new compute target\")\n", + "    \n", + "    provisioning_config = AmlCompute.provisioning_configuration(vm_size = \"STANDARD_D2_V2\",\n", + "                                                                min_nodes = 1, \n", + "                                                                max_nodes = 4)    \n", + "    aml_compute = ComputeTarget.create(ws, aml_compute_target, provisioning_config)\n", + "    aml_compute.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)\n", + "    \n", + "print(\"Azure Machine Learning Compute attached\")\n" ] }, { diff --git
a/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-how-to-use-azurebatch-to-run-a-windows-executable.ipynb b/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-how-to-use-azurebatch-to-run-a-windows-executable.ipynb index faea75529..07cf623c3 100644 --- a/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-how-to-use-azurebatch-to-run-a-windows-executable.ipynb +++ b/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-how-to-use-azurebatch-to-run-a-windows-executable.ipynb @@ -113,7 +113,25 @@ "metadata": {}, "outputs": [], "source": [ - "batch_compute = ws.get_default_compute_target(\"CPU\")" + "batch_compute_name = 'mybatchcompute' # Name to associate with new compute in workspace\n", + "\n", + "# Batch account details needed to attach as compute to workspace\n", + "batch_account_name = \"\" # Name of the Batch account\n", + "batch_resource_group = \"\" # Name of the resource group which contains this account\n", + "\n", + "try:\n", + " # check if already attached\n", + " batch_compute = BatchCompute(ws, batch_compute_name)\n", + "except ComputeTargetException:\n", + " print('Attaching Batch compute...')\n", + " provisioning_config = BatchCompute.attach_configuration(resource_group=batch_resource_group, \n", + " account_name=batch_account_name)\n", + " batch_compute = ComputeTarget.attach(ws, batch_compute_name, provisioning_config)\n", + " batch_compute.wait_for_completion()\n", + " print(\"Provisioning state:{}\".format(batch_compute.provisioning_state))\n", + " print(\"Provisioning errors:{}\".format(batch_compute.provisioning_errors))\n", + "\n", + "print(\"Using Batch compute:{}\".format(batch_compute.cluster_resource_id))" ] }, { diff --git a/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-how-to-use-estimatorstep.ipynb 
b/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-how-to-use-estimatorstep.ipynb index 12d843932..b2de5ff41 100644 --- a/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-how-to-use-estimatorstep.ipynb +++ b/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-how-to-use-estimatorstep.ipynb @@ -76,8 +76,18 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Get default AmlCompute\n", - "You can create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, you use default `AmlCompute` as your training compute resource." + "## Create or Attach existing AmlCompute\n", + "You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, you create `AmlCompute` as your training compute resource." ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If we could not find the cluster with the given name, then we will create a new cluster here. We will create an `AmlCompute` cluster of `STANDARD_D2_V2` CPU VMs. This process is broken down into 3 steps:\n", + "1. create the configuration (this step is local and only takes a second)\n", + "2. create the cluster (this step will take about **20 seconds**)\n", + "3. provision the VMs to bring the cluster to the initial size (of 1 in this case). This step will take about **3-5 minutes** and provides only sparse output along the way. Please make sure to wait until the call returns before moving to the next cell." ] }, { @@ -86,7 +96,25 @@ "metadata": {}, "outputs": [], "source": [ - "cpu_cluster = ws.get_default_compute_target(\"CPU\")\n", + "from azureml.core.compute import ComputeTarget, AmlCompute\n", + "from azureml.core.compute_target import ComputeTargetException\n", + "\n", + "# choose a name for your cluster\n", + "cluster_name = \"cpu-cluster\"\n", + "\n", + "try:\n", + "    cpu_cluster = ComputeTarget(workspace=ws, name=cluster_name)\n", + "    print('Found existing compute target')\n", + "except ComputeTargetException:\n", + "    print('Creating a new compute target...')\n", + "    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2', max_nodes=4)\n", + "\n", + "    # create the cluster\n", + "    cpu_cluster = ComputeTarget.create(ws, cluster_name, compute_config)\n", + "\n", + "    # can poll for a minimum number of nodes and for a specific timeout. \n", + "    # if no min node count is provided it uses the scale settings for the cluster\n", + "    cpu_cluster.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)\n", "\n", "# use get_status() to get a detailed status for the current cluster. \n", "print(cpu_cluster.get_status().serialize())" @@ -96,7 +124,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Now that you have created the compute target, let's see what the workspace's `compute_targets` property returns. You should now see one entry named 'cpucluster' of type `AmlCompute`." + "Now that you have created the compute target, let's see what the workspace's `compute_targets` property returns. You should now see one entry named 'cpu-cluster' of type `AmlCompute`."
] }, { diff --git a/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-parameter-tuning-with-hyperdrive.ipynb b/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-parameter-tuning-with-hyperdrive.ipynb index 7249f9117..940f816e5 100644 --- a/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-parameter-tuning-with-hyperdrive.ipynb +++ b/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-parameter-tuning-with-hyperdrive.ipynb @@ -184,26 +184,14 @@ "metadata": {}, "source": [ "## Retrieve or create a Azure Machine Learning compute\n", - "Azure Machine Learning Compute is a service for provisioning and managing clusters of Azure virtual machines for running machine learning workloads.\n", - "Let's check the available computes first." ] }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "cts = ws.compute_targets\n", - "for name, ct in cts.items():\n", - "    print(name, ct.type, ct.provisioning_state)" ] }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Now let's get the default Azure Machine Learning Compute in the current workspace. We will then run the training script on this compute target." + "Azure Machine Learning Compute is a service for provisioning and managing clusters of Azure virtual machines for running machine learning workloads. Let's create a new Azure Machine Learning Compute in the current workspace, if it doesn't already exist. We will then run the training script on this compute target.\n", + "\n", + "If a compute with the given name cannot be found in the workspace, then we will create a new compute here. This process is broken down into the following steps:\n", + "\n", + "1. Create the configuration\n", + "2.
Create the Azure Machine Learning compute\n", + "\n", + "**This process will take a few minutes and provides only sparse output along the way. Please make sure to wait until the call returns before moving to the next cell.**\n" ] }, { @@ -212,9 +200,20 @@ "metadata": {}, "outputs": [], "source": [ - "compute_target = ws.get_default_compute_target(\"GPU\")\n", + "cluster_name = \"gpu-cluster\"\n", + "\n", + "try:\n", + "    compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n", + "    print('Found existing compute target {}.'.format(cluster_name))\n", + "except ComputeTargetException:\n", + "    print('Creating a new compute target...')\n", + "    compute_config = AmlCompute.provisioning_configuration(vm_size=\"STANDARD_NC6\",\n", + "                                                           max_nodes=4)\n", + "\n", + "    compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n", + "    compute_target.wait_for_completion(show_output=True, timeout_in_minutes=20)\n", "\n", - "print(compute_target.get_status().serialize())" + "print(\"Azure Machine Learning Compute attached\")" ] }, { diff --git a/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-publish-and-run-using-rest-endpoint.ipynb b/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-publish-and-run-using-rest-endpoint.ipynb index 0296f720f..422303561 100644 --- a/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-publish-and-run-using-rest-endpoint.ipynb +++ b/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-publish-and-run-using-rest-endpoint.ipynb @@ -79,7 +79,20 @@ "metadata": {}, "outputs": [], "source": [ - "aml_compute = ws.get_default_compute_target(\"CPU\")" + "from azureml.core.compute_target import ComputeTargetException\n", + "\n", + "aml_compute_target = \"cpu-cluster\"\n", + "try:\n", + "    aml_compute = AmlCompute(ws, aml_compute_target)\n", + "    print(\"found existing compute target.\")\n", + "except
ComputeTargetException:\n", + " print(\"creating new compute target\")\n", + " \n", + " provisioning_config = AmlCompute.provisioning_configuration(vm_size = \"STANDARD_D2_V2\",\n", + " min_nodes = 1, \n", + " max_nodes = 4) \n", + " aml_compute = ComputeTarget.create(ws, aml_compute_target, provisioning_config)\n", + " aml_compute.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)\n" ] }, { diff --git a/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-setup-schedule-for-a-published-pipeline.ipynb b/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-setup-schedule-for-a-published-pipeline.ipynb index a85f9bc66..5b9976c7c 100644 --- a/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-setup-schedule-for-a-published-pipeline.ipynb +++ b/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-setup-schedule-for-a-published-pipeline.ipynb @@ -54,7 +54,7 @@ "metadata": {}, "source": [ "### Compute Targets\n", - "#### Retrieve the default Azure Machine Learning Compute" + "#### Retrieve an already attached Azure Machine Learning Compute" ] }, { @@ -63,7 +63,31 @@ "metadata": {}, "outputs": [], "source": [ - "aml_compute_target = ws.get_default_compute_target(\"CPU\")" + "from azureml.core import Run, Experiment, Datastore\n", + "\n", + "from azureml.widgets import RunDetails\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.compute import AmlCompute, ComputeTarget\n", + "aml_compute_target = \"cpu-cluster\"\n", + "try:\n", + " aml_compute = AmlCompute(ws, aml_compute_target)\n", + " print(\"Found existing compute target: {}\".format(aml_compute_target))\n", + "except:\n", + " print(\"Creating new compute target: {}\".format(aml_compute_target))\n", + " \n", + " provisioning_config = AmlCompute.provisioning_configuration(vm_size = 
\"STANDARD_D2_V2\",\n", + " min_nodes = 1, \n", + " max_nodes = 4) \n", + " aml_compute = ComputeTarget.create(ws, aml_compute_target, provisioning_config)\n", + " aml_compute.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)" ] }, { diff --git a/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-setup-versioned-pipeline-endpoints.ipynb b/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-setup-versioned-pipeline-endpoints.ipynb index 353f2261f..5afa90700 100644 --- a/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-setup-versioned-pipeline-endpoints.ipynb +++ b/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-setup-versioned-pipeline-endpoints.ipynb @@ -85,10 +85,24 @@ "metadata": {}, "outputs": [], "source": [ + "from azureml.core import Run, Experiment, Datastore\n", + "from azureml.core.compute import AmlCompute, ComputeTarget\n", "from azureml.pipeline.steps import PythonScriptStep\n", "from azureml.pipeline.core import Pipeline\n", "\n", - "aml_compute = ws.get_default_compute_target(\"CPU\")\n", + "#Retrieve an already attached Azure Machine Learning Compute\n", + "aml_compute_target = \"cpu-cluster\"\n", + "try:\n", + " aml_compute = AmlCompute(ws, aml_compute_target)\n", + " print(\"Found existing compute target: {}\".format(aml_compute_target))\n", + "except:\n", + " print(\"Creating new compute target: {}\".format(aml_compute_target))\n", + " \n", + " provisioning_config = AmlCompute.provisioning_configuration(vm_size = \"STANDARD_D2_V2\",\n", + " min_nodes = 1, \n", + " max_nodes = 4) \n", + " aml_compute = ComputeTarget.create(ws, aml_compute_target, provisioning_config)\n", + " aml_compute.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)\n", "\n", "# source_directory\n", "source_directory = '.'\n", diff --git 
a/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-with-automated-machine-learning-step.ipynb b/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-with-automated-machine-learning-step.ipynb index bf7795e83..b43f6afa7 100644 --- a/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-with-automated-machine-learning-step.ipynb +++ b/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-with-automated-machine-learning-step.ipynb @@ -139,7 +139,31 @@ "metadata": {}, "outputs": [], "source": [ - "compute_target = ws.get_default_compute_target(\"CPU\")" + "# Choose a name for your cluster.\n", + "amlcompute_cluster_name = \"cpu-cluster\"\n", + "\n", + "found = False\n", + "# Check if this compute target already exists in the workspace.\n", + "cts = ws.compute_targets\n", + "if amlcompute_cluster_name in cts and cts[amlcompute_cluster_name].type == 'AmlCompute':\n", + " found = True\n", + " print('Found existing compute target.')\n", + " compute_target = cts[amlcompute_cluster_name]\n", + " \n", + "if not found:\n", + " print('Creating a new compute target...')\n", + " provisioning_config = AmlCompute.provisioning_configuration(vm_size = \"STANDARD_D2_V2\", # for GPU, use \"STANDARD_NC6\"\n", + " #vm_priority = 'lowpriority', # optional\n", + " max_nodes = 4)\n", + "\n", + " # Create the cluster.\n", + " compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, provisioning_config)\n", + " \n", + " # Can poll for a minimum number of nodes and for a specific timeout.\n", + " # If no min_node_count is provided, it will use the scale settings for the cluster.\n", + " compute_target.wait_for_completion(show_output = True, min_node_count = 1, timeout_in_minutes = 10)\n", + " \n", + " # For a more detailed view of current AmlCompute status, use get_status()." 
] }, { diff --git a/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-with-data-dependency-steps.ipynb b/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-with-data-dependency-steps.ipynb index 62d0b576f..831fdf6ab 100644 --- a/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-with-data-dependency-steps.ipynb +++ b/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-with-data-dependency-steps.ipynb @@ -137,7 +137,22 @@ "metadata": {}, "outputs": [], "source": [ - "aml_compute = ws.get_default_compute_target(\"CPU\")\n" + "from azureml.core.compute_target import ComputeTargetException\n", + "\n", + "aml_compute_target = \"cpu-cluster\"\n", + "try:\n", + " aml_compute = AmlCompute(ws, aml_compute_target)\n", + " print(\"found existing compute target.\")\n", + "except ComputeTargetException:\n", + " print(\"creating new compute target\")\n", + " \n", + " provisioning_config = AmlCompute.provisioning_configuration(vm_size = \"STANDARD_D2_V2\",\n", + " min_nodes = 1, \n", + " max_nodes = 4) \n", + " aml_compute = ComputeTarget.create(ws, aml_compute_target, provisioning_config)\n", + " aml_compute.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)\n", + " \n", + "print(\"Aml Compute attached\")\n" ] }, { @@ -418,13 +433,31 @@ "RunDetails(pipeline_run1).show()" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Wait for pipeline run to complete" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "pipeline_run1.wait_for_completion(show_output=True)" + ] + }, { "cell_type": "markdown", "metadata": {}, "source": [ "### See Outputs\n", "\n", - "See where outputs of each pipeline step are located on your datastore." 
+ "See where outputs of each pipeline step are located on your datastore.\n", + "\n", + "***Wait for pipeline run to complete, to make sure all the outputs are ready***" ] }, { diff --git a/how-to-use-azureml/machine-learning-pipelines/pipeline-batch-scoring/pipeline-batch-scoring.ipynb b/how-to-use-azureml/machine-learning-pipelines/pipeline-batch-scoring/pipeline-batch-scoring.ipynb index 0c592e683..bf2a4daec 100644 --- a/how-to-use-azureml/machine-learning-pipelines/pipeline-batch-scoring/pipeline-batch-scoring.ipynb +++ b/how-to-use-azureml/machine-learning-pipelines/pipeline-batch-scoring/pipeline-batch-scoring.ipynb @@ -162,7 +162,7 @@ "metadata": {}, "source": [ "### Create and attach Compute targets\n", - "Use the below code to get the default Compute target. " + "Use the below code to create and attach Compute targets. " ] }, { @@ -171,9 +171,33 @@ "metadata": {}, "outputs": [], "source": [ - "cluster_type = os.environ.get(\"AML_CLUSTER_TYPE\", \"GPU\")\n", + "# choose a name for your cluster\n", + "aml_compute_name = os.environ.get(\"AML_COMPUTE_NAME\", \"gpu-cluster\")\n", + "cluster_min_nodes = os.environ.get(\"AML_COMPUTE_MIN_NODES\", 0)\n", + "cluster_max_nodes = os.environ.get(\"AML_COMPUTE_MAX_NODES\", 1)\n", + "vm_size = os.environ.get(\"AML_COMPUTE_SKU\", \"STANDARD_NC6\")\n", "\n", - "compute_target = ws.get_default_compute_target(cluster_type)" + "\n", + "if aml_compute_name in ws.compute_targets:\n", + " compute_target = ws.compute_targets[aml_compute_name]\n", + " if compute_target and type(compute_target) is AmlCompute:\n", + " print('found compute target. just use it. 
' + aml_compute_name)\n", + "else:\n", + " print('creating a new compute target...')\n", + " provisioning_config = AmlCompute.provisioning_configuration(vm_size = vm_size, # NC6 is GPU-enabled\n", + " vm_priority = 'lowpriority', # optional\n", + " min_nodes = cluster_min_nodes, \n", + " max_nodes = cluster_max_nodes)\n", + "\n", + " # create the cluster\n", + " compute_target = ComputeTarget.create(ws, aml_compute_name, provisioning_config)\n", + " \n", + " # can poll for a minimum number of nodes and for a specific timeout. \n", + " # if no min node count is provided it will use the scale settings for the cluster\n", + " compute_target.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)\n", + " \n", + " # For a more detailed view of current Azure Machine Learning Compute status, use get_status()\n", + " print(compute_target.get_status().serialize())" ] }, { diff --git a/how-to-use-azureml/machine-learning-pipelines/pipeline-style-transfer/pipeline-style-transfer.ipynb b/how-to-use-azureml/machine-learning-pipelines/pipeline-style-transfer/pipeline-style-transfer.ipynb index 06af5b9cc..b0deef241 100644 --- a/how-to-use-azureml/machine-learning-pipelines/pipeline-style-transfer/pipeline-style-transfer.ipynb +++ b/how-to-use-azureml/machine-learning-pipelines/pipeline-style-transfer/pipeline-style-transfer.ipynb @@ -94,7 +94,7 @@ "outputs": [], "source": [ "# AmlCompute\n", - "cpu_cluster_name = \"cpucluster\"\n", + "cpu_cluster_name = \"cpu-cluster\"\n", "try:\n", " cpu_cluster = AmlCompute(ws, cpu_cluster_name)\n", " print(\"found existing cluster.\")\n", @@ -108,7 +108,7 @@ " cpu_cluster.wait_for_completion(show_output=True)\n", " \n", "# AmlCompute\n", - "gpu_cluster_name = \"gpucluster\"\n", + "gpu_cluster_name = \"gpu-cluster\"\n", "try:\n", " gpu_cluster = AmlCompute(ws, gpu_cluster_name)\n", " print(\"found existing cluster.\")\n", @@ -351,7 +351,6 @@ " inputs=[models_dir, ffmpeg_images],\n", " outputs=[processed_images],\n", " 
pip_packages=[\"mpi4py\", \"torch\", \"torchvision\"],\n", - "                      runconfig=amlcompute_run_config,\n", "                      use_gpu=True,\n", "                      source_directory=scripts_folder\n", ")\n", diff --git a/how-to-use-azureml/training-with-deep-learning/distributed-chainer/distributed-chainer.ipynb b/how-to-use-azureml/training-with-deep-learning/distributed-chainer/distributed-chainer.ipynb index af3180fa2..4d577c755 100644 --- a/how-to-use-azureml/training-with-deep-learning/distributed-chainer/distributed-chainer.ipynb +++ b/how-to-use-azureml/training-with-deep-learning/distributed-chainer/distributed-chainer.ipynb @@ -95,8 +95,10 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Get default AmlCompute\n", - "You can create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, we use Azure ML managed compute ([AmlCompute](https://docs.microsoft.com/azure/machine-learning/service/how-to-set-up-training-targets#amlcompute)) for our remote training compute resource. Specifically, the below code gets the default compute cluster.\n", + "## Create or attach existing AmlCompute\n", + "You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, we use Azure ML managed compute ([AmlCompute](https://docs.microsoft.com/azure/machine-learning/service/how-to-set-up-training-targets#amlcompute)) for our remote training compute resource. Specifically, the below code creates a `STANDARD_NC6` GPU cluster that autoscales from `0` to `4` nodes.\n", + "\n", + "**Creation of AmlCompute takes approximately 5 minutes.** If the AmlCompute with that name is already in your workspace, this code will skip the creation process.\n", + "\n", "As with other Azure services, there are limits on certain resources (e.g.
AmlCompute) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota." ] @@ -107,7 +109,24 @@ "metadata": {}, "outputs": [], "source": [ - "compute_target = ws.get_default_compute_target(type=\"GPU\")\n", + "from azureml.core.compute import ComputeTarget, AmlCompute\n", + "from azureml.core.compute_target import ComputeTargetException\n", + "\n", + "# choose a name for your cluster\n", + "cluster_name = \"gpu-cluster\"\n", + "\n", + "try:\n", + " compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n", + " print('Found existing compute target.')\n", + "except ComputeTargetException:\n", + " print('Creating a new compute target...')\n", + " compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6',\n", + " max_nodes=4)\n", + "\n", + " # create the cluster\n", + " compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n", + "\n", + " compute_target.wait_for_completion(show_output=True)\n", "\n", "# use get_status() to get a detailed status for the current AmlCompute. \n", "print(compute_target.get_status().serialize())" @@ -117,7 +136,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "The above code retrieves the default GPU compute. If you instead want to use default CPU compute, provide type=\"CPU\"." + "The above code creates GPU compute. If you instead want to create CPU compute, provide a different VM size to the `vm_size` parameter, such as `STANDARD_D2_V2`." ] }, { @@ -223,7 +242,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "The above code specifies that we will run our training script on `2` nodes, with one worker per node. In order to execute a distributed run using MPI, you must provide the argument `distributed_training=MpiConfiguration()`. 
Using this estimator with these settings, Chainer and its dependencies will be installed for you. However, if your script also uses other packages, make sure to install them via the `Chainer` constructor's `pip_packages` or `conda_packages` parameters." + "The above code specifies that we will run our training script on `2` nodes, with one worker per node. In order to execute a distributed run using MPI, you must provide the argument `distributed_backend='mpi'`. Using this estimator with these settings, Chainer and its dependencies will be installed for you. However, if your script also uses other packages, make sure to install them via the `Chainer` constructor's `pip_packages` or `conda_packages` parameters." ] }, { diff --git a/how-to-use-azureml/training-with-deep-learning/distributed-cntk-with-custom-docker/distributed-cntk-with-custom-docker.ipynb b/how-to-use-azureml/training-with-deep-learning/distributed-cntk-with-custom-docker/distributed-cntk-with-custom-docker.ipynb index 52f901ecf..523f58edc 100644 --- a/how-to-use-azureml/training-with-deep-learning/distributed-cntk-with-custom-docker/distributed-cntk-with-custom-docker.ipynb +++ b/how-to-use-azureml/training-with-deep-learning/distributed-cntk-with-custom-docker/distributed-cntk-with-custom-docker.ipynb @@ -98,8 +98,10 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Get default AmlCompute\n", - "You can create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, we use default `AmlCompute` as the training compute resource.\n", + "## Create or Attach existing AmlCompute\n", + "You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. 
In this tutorial, you create `AmlCompute` as your training compute resource.\n", + "\n", + "**Creation of AmlCompute takes approximately 5 minutes.** If the AmlCompute with that name is already in your workspace this code will skip the creation process.\n", "\n", "As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota." ] @@ -110,7 +112,24 @@ "metadata": {}, "outputs": [], "source": [ - "compute_target = ws.get_default_compute_target(type=\"GPU\")\n", + "from azureml.core.compute import ComputeTarget, AmlCompute\n", + "from azureml.core.compute_target import ComputeTargetException\n", + "\n", + "# choose a name for your cluster\n", + "cluster_name = \"gpu-cluster\"\n", + "\n", + "try:\n", + " compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n", + " print('Found existing compute target.')\n", + "except ComputeTargetException:\n", + " print('Creating a new compute target...')\n", + " compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6',\n", + " max_nodes=4)\n", + "\n", + " # create the cluster\n", + " compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n", + "\n", + " compute_target.wait_for_completion(show_output=True)\n", "\n", "# use get_status() to get a detailed status for the current AmlCompute\n", "print(compute_target.get_status().serialize())" @@ -270,7 +289,6 @@ "metadata": {}, "outputs": [], "source": [ - "from azureml.core.runconfig import MpiConfiguration\n", "from azureml.train.estimator import Estimator\n", "\n", "script_params = {\n", @@ -284,7 +302,8 @@ " entry_script='cntk_distr_mnist.py',\n", " script_params=script_params,\n", " node_count=2,\n", - " distributed_training=MpiConfiguration(),\n", + " process_count_per_node=1,\n", + " 
distributed_backend='mpi',\n", " pip_packages=['cntk-gpu==2.6'],\n", " custom_docker_image='microsoft/mmlspark:gpu-0.12',\n", " use_gpu=True)" @@ -296,7 +315,7 @@ "source": [ "We would like to train our model using a [pre-built Docker container](https://hub.docker.com/r/microsoft/mmlspark/). To do so, specify the name of the docker image to the argument `custom_docker_image`. Finally, we provide the `cntk` package to `pip_packages` to install CNTK 2.6 on our custom image.\n", "\n", - "The above code specifies that we will run our training script on `2` nodes, with one worker per node. In order to run distributed CNTK, which uses MPI, you must provide the argument `distributed_training=MpiConfiguration()`." + "The above code specifies that we will run our training script on `2` nodes, with one worker per node. In order to run distributed CNTK, which uses MPI, you must provide the argument `distributed_backend='mpi'`." ] }, { diff --git a/how-to-use-azureml/training-with-deep-learning/distributed-pytorch-with-horovod/distributed-pytorch-with-horovod.ipynb b/how-to-use-azureml/training-with-deep-learning/distributed-pytorch-with-horovod/distributed-pytorch-with-horovod.ipynb index 81085fd9f..83b4de134 100644 --- a/how-to-use-azureml/training-with-deep-learning/distributed-pytorch-with-horovod/distributed-pytorch-with-horovod.ipynb +++ b/how-to-use-azureml/training-with-deep-learning/distributed-pytorch-with-horovod/distributed-pytorch-with-horovod.ipynb @@ -96,8 +96,10 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Get default AmlCompute\n", - "You can create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, we use Azure ML managed compute ([AmlCompute](https://docs.microsoft.com/azure/machine-learning/service/how-to-set-up-training-targets#amlcompute)) for our remote training compute resource. 
Specifically, the below code uses the default compute in the workspace.\n", +        "## Create or attach existing AmlCompute\n", +        "You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, we use Azure ML managed compute ([AmlCompute](https://docs.microsoft.com/azure/machine-learning/service/how-to-set-up-training-targets#amlcompute)) for our remote training compute resource. Specifically, the below code creates a `STANDARD_NC6` GPU cluster that autoscales from `0` to `4` nodes.\n", +        "\n", +        "**Creation of AmlCompute takes approximately 5 minutes.** If the AmlCompute with that name is already in your workspace, this code will skip the creation process.\n", +        "\n",         "As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota."
] @@ -108,7 +110,24 @@ "metadata": {}, "outputs": [], "source": [ - "compute_target = ws.get_default_compute_target(type=\"GPU\")\n", + "from azureml.core.compute import ComputeTarget, AmlCompute\n", + "from azureml.core.compute_target import ComputeTargetException\n", + "\n", + "# choose a name for your cluster\n", + "cluster_name = \"gpu-cluster\"\n", + "\n", + "try:\n", + " compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n", + " print('Found existing compute target.')\n", + "except ComputeTargetException:\n", + " print('Creating a new compute target...')\n", + " compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6',\n", + " max_nodes=4)\n", + "\n", + " # create the cluster\n", + " compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n", + "\n", + " compute_target.wait_for_completion(show_output=True)\n", "\n", "# use get_status() to get a detailed status for the current AmlCompute. \n", "print(compute_target.get_status().serialize())" @@ -118,7 +137,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "The above code retrieves the default GPU compute. If you instead want to use default CPU compute, provide type=\"CPU\"." + "The above code creates GPU compute. If you instead want to create CPU compute, provide a different VM size to the `vm_size` parameter, such as `STANDARD_D2_V2`." ] }, { @@ -236,7 +255,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "The above code specifies that we will run our training script on `2` nodes, with one worker per node. In order to execute a distributed run using MPI/Horovod, you must provide the argument `distributed_training=MpiConfiguration()`. Using this estimator with these settings, PyTorch, Horovod and their dependencies will be installed for you. However, if your script also uses other packages, make sure to install them via the `PyTorch` constructor's `pip_packages` or `conda_packages` parameters." 
+ "The above code specifies that we will run our training script on `2` nodes, with one worker per node. In order to execute a distributed run using MPI/Horovod, you must provide the argument `distributed_backend='mpi'`. Using this estimator with these settings, PyTorch, Horovod and their dependencies will be installed for you. However, if your script also uses other packages, make sure to install them via the `PyTorch` constructor's `pip_packages` or `conda_packages` parameters." ] }, { diff --git a/how-to-use-azureml/training-with-deep-learning/distributed-tensorflow-with-horovod/distributed-tensorflow-with-horovod.ipynb b/how-to-use-azureml/training-with-deep-learning/distributed-tensorflow-with-horovod/distributed-tensorflow-with-horovod.ipynb index ca35a2dc9..41eae0959 100644 --- a/how-to-use-azureml/training-with-deep-learning/distributed-tensorflow-with-horovod/distributed-tensorflow-with-horovod.ipynb +++ b/how-to-use-azureml/training-with-deep-learning/distributed-tensorflow-with-horovod/distributed-tensorflow-with-horovod.ipynb @@ -98,8 +98,10 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Get default AmlCompute\n", - "You can create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, you use default `AmlCompute` as your training compute resource.\n", + "## Create or Attach existing AmlCompute\n", + "You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, you create `AmlCompute` as your training compute resource.\n", + "\n", + "**Creation of AmlCompute takes approximately 5 minutes.** If the AmlCompute with that name is already in your workspace this code will skip the creation process.\n", "\n", "As with other Azure services, there are limits on certain resources (e.g. 
AmlCompute) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota." ] @@ -110,7 +112,24 @@ "metadata": {}, "outputs": [], "source": [ - "compute_target = ws.get_default_compute_target(\"GPU\")\n", + "from azureml.core.compute import ComputeTarget, AmlCompute\n", + "from azureml.core.compute_target import ComputeTargetException\n", + "\n", + "# choose a name for your cluster\n", + "cluster_name = \"gpu-cluster\"\n", + "\n", + "try:\n", + " compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n", + " print('Found existing compute target')\n", + "except ComputeTargetException:\n", + " print('Creating a new compute target...')\n", + " compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6', \n", + " max_nodes=4)\n", + "\n", + " # create the cluster\n", + " compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n", + "\n", + " compute_target.wait_for_completion(show_output=True)\n", "\n", "# use get_status() to get a detailed status for the current cluster. \n", "print(compute_target.get_status().serialize())" @@ -120,7 +139,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "The above code retrieves the default GPU compute. If you instead want to use default CPU compute, provide type=\"CPU\"." + "The above code creates a GPU cluster. If you instead want to create a CPU cluster, provide a different VM size to the `vm_size` parameter, such as `STANDARD_D2_V2`." ] }, { @@ -304,7 +323,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "The above code specifies that we will run our training script on `2` nodes, with one worker per node. In order to execute a distributed run using MPI/Horovod, you must provide the argument `distributed_training=MpiConfiguration()`. 
Using this estimator with these settings, TensorFlow, Horovod and their dependencies will be installed for you. However, if your script also uses other packages, make sure to install them via the `TensorFlow` constructor's `pip_packages` or `conda_packages` parameters.\n", + "The above code specifies that we will run our training script on `2` nodes, with one worker per node. In order to execute a distributed run using MPI/Horovod, you must provide the argument `distributed_backend='mpi'`. Using this estimator with these settings, TensorFlow, Horovod and their dependencies will be installed for you. However, if your script also uses other packages, make sure to install them via the `TensorFlow` constructor's `pip_packages` or `conda_packages` parameters.\n", "\n", "Note that we passed our training data reference `ds_data` to our script's `--input_data` argument. This will 1) mount our datastore on the remote compute and 2) provide the path to the data zip file on our datastore." ] diff --git a/how-to-use-azureml/training-with-deep-learning/distributed-tensorflow-with-parameter-server/distributed-tensorflow-with-parameter-server.ipynb b/how-to-use-azureml/training-with-deep-learning/distributed-tensorflow-with-parameter-server/distributed-tensorflow-with-parameter-server.ipynb index bae85a2c5..565883883 100644 --- a/how-to-use-azureml/training-with-deep-learning/distributed-tensorflow-with-parameter-server/distributed-tensorflow-with-parameter-server.ipynb +++ b/how-to-use-azureml/training-with-deep-learning/distributed-tensorflow-with-parameter-server/distributed-tensorflow-with-parameter-server.ipynb @@ -98,8 +98,10 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Get default AmlCompute\n", - "You can create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. 
In this tutorial, you use default `AmlCompute` as your training compute resource.\n", + "## Create or Attach existing AmlCompute\n", + "You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, you create `AmlCompute` as your training compute resource.\n", + "\n", + "**Creation of AmlCompute takes approximately 5 minutes.** If the AmlCompute with that name is already in your workspace this code will skip the creation process.\n", "\n", "As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota." ] @@ -110,7 +112,24 @@ "metadata": {}, "outputs": [], "source": [ - "compute_target = ws.get_default_compute_target(type=\"GPU\")\n", + "from azureml.core.compute import ComputeTarget, AmlCompute\n", + "from azureml.core.compute_target import ComputeTargetException\n", + "\n", + "# choose a name for your cluster\n", + "cluster_name = \"gpu-cluster\"\n", + "\n", + "try:\n", + " compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n", + " print('Found existing compute target.')\n", + "except ComputeTargetException:\n", + " print('Creating a new compute target...')\n", + " compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6', \n", + " max_nodes=4)\n", + "\n", + " # create the cluster\n", + " compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n", + "\n", + " compute_target.wait_for_completion(show_output=True)\n", "\n", "# use get_status() to get a detailed status for the current cluster. 
\n", "print(compute_target.get_status().serialize())" @@ -220,7 +239,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "The above code specifies that we will run our training script on `2` nodes, with two workers and one parameter server. In order to execute a native distributed TensorFlow run, you must provide the argument `distributed_training=TensorflowConfiguration()`. Using this estimator with these settings, TensorFlow and its dependencies will be installed for you. However, if your script also uses other packages, make sure to install them via the `TensorFlow` constructor's `pip_packages` or `conda_packages` parameters." + "The above code specifies that we will run our training script on `2` nodes, with two workers and one parameter server. In order to execute a native distributed TensorFlow run, you must provide the argument `distributed_backend='ps'`. Using this estimator with these settings, TensorFlow and its dependencies will be installed for you. However, if your script also uses other packages, make sure to install them via the `TensorFlow` constructor's `pip_packages` or `conda_packages` parameters." ] }, { diff --git a/how-to-use-azureml/training-with-deep-learning/how-to-use-estimator/dummy_train.py b/how-to-use-azureml/training-with-deep-learning/how-to-use-estimator/dummy_train.py index d6b723b34..96fc8959a 100644 --- a/how-to-use-azureml/training-with-deep-learning/how-to-use-estimator/dummy_train.py +++ b/how-to-use-azureml/training-with-deep-learning/how-to-use-estimator/dummy_train.py @@ -9,8 +9,16 @@ parser = argparse.ArgumentParser() parser.add_argument('--numbers-in-sequence', type=int, dest='num_in_sequence', default=10, help='number of fibonacci numbers in sequence') + +# This is how you can use a bool argument in Python. If you want the 'my_bool_var' to be True, just pass it +# in Estimator's script_param as script+params:{'my_bool_var': ''}. 
+# And, if you want to use it as False, then do not pass it in the Estimator's script_params. +# You can reverse the behavior by setting action='store_false' in the next line. +parser.add_argument("--my_bool_var", action='store_true') + args = parser.parse_args() num = args.num_in_sequence +my_bool_var = args.my_bool_var   def fibo(n): @@ -23,6 +31,7 @@ def fibo(n): try:     from azureml.core import Run     run = Run.get_context() +    print("The value of boolean parameter 'my_bool_var' is {}".format(my_bool_var))     print("Log Fibonacci numbers.")     for i in range(0, num - 1):         run.log('Fibonacci numbers', fibo(i)) diff --git a/how-to-use-azureml/training-with-deep-learning/how-to-use-estimator/how-to-use-estimator.ipynb b/how-to-use-azureml/training-with-deep-learning/how-to-use-estimator/how-to-use-estimator.ipynb index cce3ce1bd..6070cb56b 100644 --- a/how-to-use-azureml/training-with-deep-learning/how-to-use-estimator/how-to-use-estimator.ipynb +++ b/how-to-use-azureml/training-with-deep-learning/how-to-use-estimator/how-to-use-estimator.ipynb @@ -113,8 +113,18 @@    "cell_type": "markdown",    "metadata": {},    "source": [ -    "## Get default AmlCompute\n", -    "You can create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, you use default `AmlCompute` as your training compute resource." +    "## Create or Attach existing AmlCompute\n", +    "You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, you create `AmlCompute` as your training compute resource." +   ] +  }, +  { +   "cell_type": "markdown", +   "metadata": {}, +   "source": [ +    "If we could not find the cluster with the given name, then we will create a new cluster here. We will create an `AmlCompute` cluster of `STANDARD_D2_V2` CPU VMs. This process is broken down into 3 steps:\n", +    "1. create the configuration (this step is local and only takes a second)\n", +    "2. create the cluster (this step will take about **20 seconds**)\n", +    "3. provision the VMs to bring the cluster to the initial size (of 1 in this case). This step will take about **3-5 minutes** and provides only sparse output in the process. Please make sure to wait until the call returns before moving to the next cell"    ]   },   { @@ -123,7 +133,25 @@    "metadata": {},    "outputs": [],    "source": [ -    "cpu_cluster = ws.get_default_compute_target(\"CPU\")\n", +    "from azureml.core.compute import ComputeTarget, AmlCompute\n", +    "from azureml.core.compute_target import ComputeTargetException\n", +    "\n", +    "# choose a name for your cluster\n", +    "cluster_name = \"cpu-cluster\"\n", +    "\n", +    "try:\n", +    "    cpu_cluster = ComputeTarget(workspace=ws, name=cluster_name)\n", +    "    print('Found existing compute target')\n", +    "except ComputeTargetException:\n", +    "    print('Creating a new compute target...')\n", +    "    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2', max_nodes=4)\n", +    "\n", +    "    # create the cluster\n", +    "    cpu_cluster = ComputeTarget.create(ws, cluster_name, compute_config)\n", +    "\n", +    "    # can poll for a minimum number of nodes and for a specific timeout. \n", +    "    # if no min node count is provided it uses the scale settings for the cluster\n", +    "    cpu_cluster.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)\n",     "\n",     "# use get_status() to get a detailed status for the current cluster. \n",     "print(cpu_cluster.get_status().serialize())" @@ -207,8 +235,11 @@    "outputs": [],    "source": [     "# use a conda environment, don't use Docker, on local computer\n", +    "# Let's see how you can pass bool arguments in the script_params. 
Passing `'--my_bool_var': ''` will set my_bool_var to True and\n", +    "# if you want it to be False, just do not pass it in the script_params.\n",     "script_params = {\n", -    "    '--numbers-in-sequence': 10\n", +    "    '--numbers-in-sequence': 10,\n", +    "    '--my_bool_var': ''\n",     "}\n",     "est = Estimator(source_directory='.', script_params=script_params, compute_target='local', entry_script='dummy_train.py', use_docker=False)\n",     "run = exp.submit(est)\n", diff --git a/how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-chainer/train-hyperparameter-tune-deploy-with-chainer.ipynb b/how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-chainer/train-hyperparameter-tune-deploy-with-chainer.ipynb index e6a3f48d1..31282d863 100644 --- a/how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-chainer/train-hyperparameter-tune-deploy-with-chainer.ipynb +++ b/how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-chainer/train-hyperparameter-tune-deploy-with-chainer.ipynb @@ -95,8 +95,10 @@    "cell_type": "markdown",    "metadata": {},    "source": [ -    "## Get default AmlCompute\n", -    "You can create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, we use Azure ML managed compute ([AmlCompute](https://docs.microsoft.com/azure/machine-learning/service/how-to-set-up-training-targets#amlcompute)) for our remote training compute resource.\n", +    "## Create or Attach existing AmlCompute\n", +    "You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. 
In this tutorial, we use Azure ML managed compute ([AmlCompute](https://docs.microsoft.com/azure/machine-learning/service/how-to-set-up-training-targets#amlcompute)) for our remote training compute resource.\n", + "\n", + "**Creation of AmlCompute takes approximately 5 minutes.** If the AmlCompute with that name is already in your workspace, this code will skip the creation process.\n", "\n", "As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota." ] @@ -107,7 +109,24 @@ "metadata": {}, "outputs": [], "source": [ - "compute_target = ws.get_default_compute_target(type=\"GPU\")\n", + "from azureml.core.compute import ComputeTarget, AmlCompute\n", + "from azureml.core.compute_target import ComputeTargetException\n", + "\n", + "# choose a name for your cluster\n", + "cluster_name = \"gpu-cluster\"\n", + "\n", + "try:\n", + " compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n", + " print('Found existing compute target.')\n", + "except ComputeTargetException:\n", + " print('Creating a new compute target...')\n", + " compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6', \n", + " max_nodes=4)\n", + "\n", + " # create the cluster\n", + " compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n", + "\n", + " compute_target.wait_for_completion(show_output=True)\n", "\n", "# use get_status() to get a detailed status for the current cluster. \n", "print(compute_target.get_status().serialize())" @@ -117,7 +136,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "The above code retrieves the default GPU compute. If you instead want to use default CPU compute, provide type=\"CPU\"." + "The above code creates a GPU cluster. 
If you instead want to create a CPU cluster, provide a different VM size to the `vm_size` parameter, such as `STANDARD_D2_V2`." ] }, { diff --git a/how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-keras/train-hyperparameter-tune-deploy-with-keras.ipynb b/how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-keras/train-hyperparameter-tune-deploy-with-keras.ipynb index d5ab58fc4..afe3a2f36 100644 --- a/how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-keras/train-hyperparameter-tune-deploy-with-keras.ipynb +++ b/how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-keras/train-hyperparameter-tune-deploy-with-keras.ipynb @@ -239,8 +239,18 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Get default AmlCompute\n", - "You can create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, you use default `AmlCompute` as your training compute resource." + "## Create or Attach existing AmlCompute\n", + "You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, you create `AmlCompute` as your training compute resource." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If we could not find the cluster with the given name, then we will create a new cluster here. We will create an `AmlCompute` cluster of `STANDARD_NC6` GPU VMs. This process is broken down into 3 steps:\n", + "1. create the configuration (this step is local and only takes a second)\n", + "2. create the cluster (this step will take about **20 seconds**)\n", + "3. provision the VMs to bring the cluster to the initial size (of 1 in this case). 
This step will take about **3-5 minutes** and is providing only sparse output in the process. Please make sure to wait until the call returns before moving to the next cell" ] }, { @@ -249,7 +259,26 @@ "metadata": {}, "outputs": [], "source": [ - "compute_target = ws.get_default_compute_target(type=\"GPU\")\n", + "from azureml.core.compute import ComputeTarget, AmlCompute\n", + "from azureml.core.compute_target import ComputeTargetException\n", + "\n", + "# choose a name for your cluster\n", + "cluster_name = \"gpu-cluster\"\n", + "\n", + "try:\n", + " compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n", + " print('Found existing compute target')\n", + "except ComputeTargetException:\n", + " print('Creating a new compute target...')\n", + " compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6', \n", + " max_nodes=4)\n", + "\n", + " # create the cluster\n", + " compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n", + "\n", + " # can poll for a minimum number of nodes and for a specific timeout. \n", + " # if no min node count is provided it uses the scale settings for the cluster\n", + " compute_target.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)\n", "\n", "# use get_status() to get a detailed status for the current cluster. \n", "print(compute_target.get_status().serialize())" @@ -259,7 +288,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Now that you have retrtieved the compute target, let's see what the workspace's `compute_targets` property returns." + "Now that you have created the compute target, let's see what the workspace's `compute_targets` property returns. You should now see one entry named \"gpu-cluster\" of type `AmlCompute`." 
] }, { @@ -351,7 +380,7 @@    "metadata": {},    "source": [     "## Create TensorFlow estimator & add Keras\n", -    "Next, we construct an `azureml.train.dnn.TensorFlow` estimator object, use the `gpucluster` as compute target, and pass the mount-point of the datastore to the training code as a parameter.\n", +    "Next, we construct an `azureml.train.dnn.TensorFlow` estimator object, use the `gpu-cluster` as compute target, and pass the mount-point of the datastore to the training code as a parameter.\n",     "The TensorFlow estimator is providing a simple way of launching a TensorFlow training job on a compute target. It will automatically provide a docker image that has TensorFlow installed. In this case, we add `keras` package (for the Keras framework obviously), and `matplotlib` package for plotting a \"Loss vs. Accuracy\" chart and record it in run history."    ]   }, @@ -524,7 +553,7 @@    "cell_type": "markdown",    "metadata": {},    "source": [ -    "In the training script, the Keras model is saved into two files, `model.json` and `model.h5`, in the `outputs/models` folder on the gpucluster AmlCompute node. Azure ML automatically uploaded anything written in the `./outputs` folder into run history file store. Subsequently, we can use the `run` object to download the model files. They are under the the `outputs/model` folder in the run history file store, and are downloaded into a local folder named `model`." +    "In the training script, the Keras model is saved into two files, `model.json` and `model.h5`, in the `outputs/models` folder on the gpu-cluster AmlCompute node. Azure ML automatically uploads anything written in the `./outputs` folder into run history file store. Subsequently, we can use the `run` object to download the model files. They are under the `outputs/model` folder in the run history file store, and are downloaded into a local folder named `model`." 
] }, { diff --git a/how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-tensorflow/train-hyperparameter-tune-deploy-with-tensorflow.ipynb b/how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-tensorflow/train-hyperparameter-tune-deploy-with-tensorflow.ipynb index f1e89a3c1..da497aae6 100644 --- a/how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-tensorflow/train-hyperparameter-tune-deploy-with-tensorflow.ipynb +++ b/how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-tensorflow/train-hyperparameter-tune-deploy-with-tensorflow.ipynb @@ -261,8 +261,18 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Get default AmlCompute\n", - "You can create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, you use default `AmlCompute` as your training compute resource." + "## Create or Attach existing AmlCompute\n", + "You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, you create `AmlCompute` as your training compute resource." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If we cannot find a cluster with the given name, we will create a new one here. We will create an `AmlCompute` cluster of `STANDARD_NC6` GPU VMs. This process is broken down into 3 steps:\n", + "1. create the configuration (this step is local and only takes a second)\n", + "2. create the cluster (this step will take about **20 seconds**)\n", + "3. provision the VMs to bring the cluster to the initial size (of 1 in this case). This step will take about **3-5 minutes** and provides only sparse output in the process. 
Please make sure to wait until the call returns before moving to the next cell" ] }, { @@ -271,7 +281,26 @@ "metadata": {}, "outputs": [], "source": [ - "compute_target = ws.get_default_compute_target(type=\"GPU\")\n", + "from azureml.core.compute import ComputeTarget, AmlCompute\n", + "from azureml.core.compute_target import ComputeTargetException\n", + "\n", + "# choose a name for your cluster\n", + "cluster_name = \"gpu-cluster\"\n", + "\n", + "try:\n", + " compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n", + " print('Found existing compute target')\n", + "except ComputeTargetException:\n", + " print('Creating a new compute target...')\n", + " compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6', \n", + " max_nodes=4)\n", + "\n", + " # create the cluster\n", + " compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n", + "\n", + " # can poll for a minimum number of nodes and for a specific timeout. \n", + " # if no min node count is provided it uses the scale settings for the cluster\n", + " compute_target.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)\n", "\n", "# use get_status() to get a detailed status for the current cluster. \n", "print(compute_target.get_status().serialize())" @@ -281,7 +310,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Now that you have retrieved the compute target, let's see what the workspace's `compute_targets` property returns." + "Now that you have created the compute target, let's see what the workspace's `compute_targets` property returns. You should now see one entry named 'gpu-cluster' of type `AmlCompute`." 
] }, { diff --git a/how-to-use-azureml/training/train-hyperparameter-tune-deploy-with-sklearn/train-hyperparameter-tune-deploy-with-sklearn.ipynb b/how-to-use-azureml/training/train-hyperparameter-tune-deploy-with-sklearn/train-hyperparameter-tune-deploy-with-sklearn.ipynb index 329793384..8a8ab0375 100644 --- a/how-to-use-azureml/training/train-hyperparameter-tune-deploy-with-sklearn/train-hyperparameter-tune-deploy-with-sklearn.ipynb +++ b/how-to-use-azureml/training/train-hyperparameter-tune-deploy-with-sklearn/train-hyperparameter-tune-deploy-with-sklearn.ipynb @@ -108,7 +108,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Get default AmlCompute" + "## Create AmlCompute" ] }, { @@ -126,7 +126,24 @@ "metadata": {}, "outputs": [], "source": [ - "compute_target = ws.get_default_compute_target(type=\"CPU\")\n", + "from azureml.core.compute import ComputeTarget, AmlCompute\n", + "from azureml.core.compute_target import ComputeTargetException\n", + "\n", + "# choose a name for your cluster\n", + "cluster_name = \"gpu-cluster\"\n", + "\n", + "try:\n", + " compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n", + " print('Found existing compute target.')\n", + "except ComputeTargetException:\n", + " print('Creating a new compute target...')\n", + " compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6',\n", + " max_nodes=4)\n", + "\n", + " # create the cluster\n", + " compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n", + "\n", + " compute_target.wait_for_completion(show_output=True)\n", "\n", "# use get_status() to get a detailed status for the current cluster. \n", "print(compute_target.get_status().serialize())" @@ -136,7 +153,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "The above code retrieves the default CPU compute." + "The above code creates GPU compute. 
If you instead want to create CPU compute, provide a different VM size to the `vm_size` parameter, such as `STANDARD_D2_V2`." ] }, { diff --git a/how-to-use-azureml/training/train-hyperparameter-tune-deploy-with-sklearn/train_iris.py b/how-to-use-azureml/training/train-hyperparameter-tune-deploy-with-sklearn/train_iris.py index dc532abdc..b7a924345 100644 --- a/how-to-use-azureml/training/train-hyperparameter-tune-deploy-with-sklearn/train_iris.py +++ b/how-to-use-azureml/training/train-hyperparameter-tune-deploy-with-sklearn/train_iris.py @@ -22,7 +22,7 @@ def main(): help='Penalty parameter of the error term') args = parser.parse_args() - run.log('Kernel type', np.string(args.kernel)) + run.log('Kernel type', np.str(args.kernel)) run.log('Penalty', np.float(args.penalty)) # loading the iris dataset diff --git a/how-to-use-azureml/training/train-on-amlcompute/train-on-amlcompute.ipynb b/how-to-use-azureml/training/train-on-amlcompute/train-on-amlcompute.ipynb index 40e57e675..de79e0527 100644 --- a/how-to-use-azureml/training/train-on-amlcompute/train-on-amlcompute.ipynb +++ b/how-to-use-azureml/training/train-on-amlcompute/train-on-amlcompute.ipynb @@ -188,79 +188,6 @@ "myenv.python.conda_dependencies = CondaDependencies.create(conda_packages=['scikit-learn'])" ] }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Get the default compute target\n", - "\n", - "In this case, we use the default `AmlCompute`target from the workspace." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core import ScriptRunConfig\n", - "from azureml.core.runconfig import DEFAULT_CPU_IMAGE\n", - "\n", - "src = ScriptRunConfig(source_directory=project_folder, script='train.py')\n", - "\n", - "# Use default compute target\n", - "src.run_config.target = ws.get_default_compute_target(type=\"CPU\").name\n", - "\n", - "# Set environment\n", - "src.run_config.environment = myenv" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Submit run" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run = experiment.submit(src)\n", - "\n", - "# Show run details\n", - "run" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Note: if you need to cancel a run, you can follow [these instructions](https://aka.ms/aml-docs-cancel-run)." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%time\n", - "# Shows output of the run on stdout.\n", - "run.wait_for_completion(show_output=True)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run.get_metrics()" - ] - }, { "cell_type": "markdown", "metadata": {}, @@ -283,7 +210,7 @@ "from azureml.core.compute_target import ComputeTargetException\n", "\n", "# Choose a name for your CPU cluster\n", - "cpu_cluster_name = \"cpucluster\"\n", + "cpu_cluster_name = \"cpu-cluster\"\n", "\n", "# Verify that cluster does not exist already\n", "try:\n", @@ -310,13 +237,28 @@ "metadata": {}, "outputs": [], "source": [ + "from azureml.core import ScriptRunConfig\n", + "from azureml.core.runconfig import DEFAULT_CPU_IMAGE\n", + "\n", + "src = ScriptRunConfig(source_directory=project_folder, script='train.py')\n", + "\n", "# Set compute target to the one created in previous step\n", 
"src.run_config.target = cpu_cluster.name\n", + "\n", + "# Set environment\n", + "src.run_config.environment = myenv\n", " \n", "run = experiment.submit(config=src)\n", "run" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Note: if you need to cancel a run, you can follow [these instructions](https://aka.ms/aml-docs-cancel-run)." + ] + }, { "cell_type": "code", "execution_count": null, @@ -364,7 +306,7 @@ "from azureml.core.compute_target import ComputeTargetException\n", "\n", "# Choose a name for your CPU cluster\n", - "cpu_cluster_name = \"cpucluster\"\n", + "cpu_cluster_name = \"cpu-cluster\"\n", "\n", "# Verify that cluster does not exist already\n", "try:\n", @@ -463,7 +405,7 @@ "outputs": [], "source": [ "#Delete () is used to deprovision and delete the AmlCompute target. Useful if you want to re-use the compute name \n", - "#'cpucluster' in this case but use a different VM family for instance.\n", + "#'cpu-cluster' in this case but use a different VM family for instance.\n", "\n", "#cpu_cluster.delete()" ] diff --git a/tutorials/img-classification-part1-training.ipynb b/tutorials/img-classification-part1-training.ipynb index d890a57bd..81fd73e1b 100644 --- a/tutorials/img-classification-part1-training.ipynb +++ b/tutorials/img-classification-part1-training.ipynb @@ -126,7 +126,9 @@ "metadata": {}, "source": [ "### Create or Attach existing compute resource\n", - "By using Azure Machine Learning Compute, a managed service, data scientists can train machine learning models on clusters of Azure virtual machines. Examples include VMs with GPU support. In this tutorial, you use default Azure Machine Learning Compute as your training environment." + "By using Azure Machine Learning Compute, a managed service, data scientists can train machine learning models on clusters of Azure virtual machines. Examples include VMs with GPU support. In this tutorial, you create Azure Machine Learning Compute as your training environment. 
The code below creates a compute cluster for you if one doesn't already exist in your workspace.\n", + "\n", + "**Creation of compute takes approximately 5 minutes.** If the AmlCompute with that name is already in your workspace, the code will skip the creation process." ] }, { @@ -140,10 +142,38 @@ }, "outputs": [], "source": [ + "from azureml.core.compute import AmlCompute\n", + "from azureml.core.compute import ComputeTarget\n", "import os\n", "\n", - "cluster_type = os.environ.get(\"AML_COMPUTE_CLUSTER_TYPE\", \"CPU\")\n", - "compute_target = ws.get_default_compute_target(cluster_type)" + "# choose a name for your cluster\n", + "compute_name = os.environ.get(\"AML_COMPUTE_CLUSTER_NAME\", \"cpu-cluster\")\n", + "compute_min_nodes = os.environ.get(\"AML_COMPUTE_CLUSTER_MIN_NODES\", 0)\n", + "compute_max_nodes = os.environ.get(\"AML_COMPUTE_CLUSTER_MAX_NODES\", 4)\n", + "\n", + "# This example uses a CPU VM. To use a GPU VM, set the SKU to STANDARD_NC6\n", + "vm_size = os.environ.get(\"AML_COMPUTE_CLUSTER_SKU\", \"STANDARD_D2_V2\")\n", + "\n", + "\n", + "if compute_name in ws.compute_targets:\n", + " compute_target = ws.compute_targets[compute_name]\n", + " if compute_target and type(compute_target) is AmlCompute:\n", + " print('found existing compute target: ' + compute_name)\n", + "else:\n", + " print('creating a new compute target...')\n", + " provisioning_config = AmlCompute.provisioning_configuration(vm_size = vm_size,\n", + " min_nodes = compute_min_nodes, \n", + " max_nodes = compute_max_nodes)\n", + "\n", + " # create the cluster\n", + " compute_target = ComputeTarget.create(ws, compute_name, provisioning_config)\n", + " \n", + " # can poll for a minimum number of nodes and for a specific timeout. 
\n", + " # if no min node count is provided it will use the scale settings for the cluster\n", + " compute_target.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)\n", + " \n", + " # For a more detailed view of current AmlCompute status, use get_status()\n", + " print(compute_target.get_status().serialize())" ] }, { @@ -324,8 +354,8 @@ "# get hold of the current run\n", "run = Run.get_context()\n", "\n", - "print('Train a logistic regression model with regularizaion rate of', args.reg)\n", - "clf = LogisticRegression(C=1.0/args.reg, random_state=42)\n", + "print('Train a logistic regression model with regularization rate of', args.reg)\n", + "clf = LogisticRegression(C=1.0/args.reg, solver=\"liblinear\", multi_class=\"auto\", random_state=42)\n", "clf.fit(X_train, y_train)\n", "\n", "print('Predict the test set')\n", @@ -386,14 +416,13 @@ "source": [ "### Create an estimator\n", "\n", - "An estimator object is used to submit the run. Create your estimator by running the following code to define:\n", + "An estimator object is used to submit the run. Azure Machine Learning has pre-configured estimators for common machine learning frameworks, as well as a generic `Estimator`. Create an `SKLearn` estimator for the scikit-learn model by specifying:\n", "\n", "* The name of the estimator object, `est`\n", "* The directory that contains your scripts. All the files in this directory are uploaded into the cluster nodes for execution. \n", "* The compute target. In this case you will use the AmlCompute you created\n", "* The training script name, train.py\n", "* Parameters required from the training script \n", - "* Python packages needed for training\n", "\n", "In this tutorial, this target is AmlCompute. All files in the script folder are uploaded into the cluster nodes for execution. The data_folder is set to use the datastore (`ds.path('mnist').as_mount()`)." 
] @@ -408,18 +437,17 @@ }, "outputs": [], "source": [ - "from azureml.train.estimator import Estimator\n", + "from azureml.train.sklearn import SKLearn\n", "\n", "script_params = {\n", " '--data-folder': ds.path('mnist').as_mount(),\n", - " '--regularization': 0.05\n", + " '--regularization': 0.5\n", "}\n", "\n", - "est = Estimator(source_directory=script_folder,\n", + "est = SKLearn(source_directory=script_folder,\n", " script_params=script_params,\n", " compute_target=compute_target,\n", - " entry_script='train.py',\n", - " conda_packages=['scikit-learn'])" + " entry_script='train.py')" ] }, { @@ -646,18 +674,6 @@ "language": "python", "name": "python36" }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.8" - }, "msauthor": "roastala" }, "nbformat": 4,
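The recurring change across every notebook in this diff is the same idempotent get-or-create pattern: try to look up a named compute target, and provision it only on a `ComputeTargetException`. A minimal sketch of that control flow follows, using a plain dict as a stand-in for the Azure ML workspace — the `get_compute_target` and `get_or_create_cluster` helpers and the dict-based registry are invented for illustration and are not the Azure ML SDK.

```python
class ComputeTargetException(Exception):
    """Stand-in for azureml.core.compute_target.ComputeTargetException."""


def get_compute_target(workspace, name):
    # Mirrors ComputeTarget(workspace=ws, name=...): raise if the name is absent.
    if name not in workspace:
        raise ComputeTargetException(name)
    return workspace[name]


def get_or_create_cluster(workspace, name, vm_size, min_nodes=0, max_nodes=4):
    try:
        # First try to reuse an existing cluster, so rerunning the cell is cheap.
        target = get_compute_target(workspace, name)
        print("Found existing compute target:", name)
    except ComputeTargetException:
        # Not found: record the requested VM size and autoscale bounds.
        print("Creating a new compute target:", name)
        target = {"name": name, "vm_size": vm_size,
                  "min_nodes": min_nodes, "max_nodes": max_nodes}
        workspace[name] = target
    return target


ws = {}
first = get_or_create_cluster(ws, "cpu-cluster", "STANDARD_D2_V2")
second = get_or_create_cluster(ws, "cpu-cluster", "STANDARD_D2_V2")
assert first is second  # the second call reuses the cluster instead of recreating it
```

Because lookup comes first, the cell is safe to run repeatedly in any of the notebooks; the real SDK version additionally blocks on `wait_for_completion` so later cells only run once the cluster is provisioned.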