Add documentation on the preview ADB linking experience #580
Merged
add linking docs
commit f08e68c8e942c48fba3119794d555c97d1dff9d3
Azure Databricks is a managed Spark offering on Azure that customers already use for advanced analytics. It provides a collaborative notebook-based environment with CPU- or GPU-based compute clusters.
In this section, you will find sample notebooks on how to use the Azure Machine Learning SDK with Azure Databricks. You can train a model using Spark MLlib and then deploy it to ACI/AKS from within Azure Databricks. You can also use the Automated ML capability (**public preview**) of the Azure ML SDK with Azure Databricks.
- Customers who use Azure Databricks for advanced analytics can now use the same cluster to run experiments with or without automated machine learning.
- You can keep the data within the same cluster.
- You can leverage the local worker nodes with autoscale and auto-termination capabilities.
- You can use multiple cores of your Azure Databricks cluster to perform simultaneous training.
- You can further tune the model generated by automated machine learning if you choose to.
- Every run (including the best run) is available as a pipeline, which you can tune further if needed.
- A model trained using Azure Databricks can be registered in an Azure ML workspace and then deployed to Azure managed compute (ACI or AKS) using the Azure Machine Learning SDK.
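As an illustrative sketch of that last bullet, registering and deploying a Databricks-trained model could look like the following; the model path, scoring script, environment file, and service name are hypothetical placeholders, not values from this repo:

```python
from azureml.core import Environment, Workspace
from azureml.core.model import InferenceConfig, Model
from azureml.core.webservice import AciWebservice

# Load the workspace; assumes a workspace config.json is available to from_config().
ws = Workspace.from_config()

# Register a model previously saved by your Databricks training job.
model = Model.register(workspace=ws,
                       model_path="/dbfs/models/income-model",  # hypothetical path
                       model_name="income-model")

# Deploy to Azure Container Instances with a scoring script and environment you provide.
env = Environment.from_conda_specification("income-env", "env.yml")  # hypothetical files
inference_config = InferenceConfig(entry_script="score.py", environment=env)
aci_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=1)
service = Model.deploy(ws, "income-service", [model], inference_config, aci_config)
service.wait_for_deployment(show_output=True)
```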
Please follow our [Azure doc](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-environment#azure-databricks) to install the SDK in your Azure Databricks cluster before trying any of the sample notebooks.
**Single file** -
The following archive contains all the sample notebooks. Instead of downloading notebooks individually, you can import the [DBC](Databricks_AMLSDK_1-4_6.dbc) archive into your Databricks workspace and run them from there.
Notebooks 1-4 must be run sequentially. They are related to an income-prediction experiment based on this [dataset](https://archive.ics.uci.edu/ml/datasets/adult) and demonstrate how to prepare data, train, and operationalize a Spark ML model with the Azure ML Python SDK from within Azure Databricks.
Notebook 6 is an Automated ML sample notebook for classification.

Learn more about [how to use Azure Databricks as a development environment](https://docs.microsoft.com/azure/machine-learning/service/how-to-configure-environment#azure-databricks) for Azure Machine Learning service.
**Databricks as a Compute Target from AML Pipelines**
You can use Azure Databricks as a compute target from [Azure Machine Learning Pipelines](https://docs.microsoft.com/en-us/azure/machine-learning/service/concept-ml-pipelines). Take a look at this notebook for details: [aml-pipelines-use-databricks-as-compute-target.ipynb](https://github.com/Azure/MachineLearningNotebooks/tree/master/how-to-use-azureml/azure-databricks/databricks-as-remote-compute-target/aml-pipelines-use-databricks-as-compute-target.ipynb).
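As a rough sketch of what the linked notebook covers (not a substitute for it), attaching a Databricks workspace and running a notebook there as a pipeline step might look like this; the resource names, notebook path, cluster settings, and token handling below are illustrative assumptions:

```python
from azureml.core import Workspace
from azureml.core.compute import ComputeTarget, DatabricksCompute
from azureml.pipeline.core import Pipeline
from azureml.pipeline.steps import DatabricksStep

ws = Workspace.from_config()

# Attach the Databricks workspace as a compute target (one-time setup).
attach_config = DatabricksCompute.attach_configuration(
    resource_group="my-rg",                    # hypothetical values
    workspace_name="my-databricks-ws",
    access_token="<databricks-access-token>")
databricks_compute = ComputeTarget.attach(ws, "adb-compute", attach_config)
databricks_compute.wait_for_completion(show_output=True)

# Run an existing Databricks notebook as a pipeline step on a new cluster.
step = DatabricksStep(
    name="adb_notebook_step",
    notebook_path="/Users/me/train",           # hypothetical notebook path
    run_name="db_notebook_run",
    compute_target=databricks_compute,
    spark_version="4.0.x-scala2.11",
    node_type="Standard_D3_v2",
    num_workers=1,
    allow_reuse=True)

pipeline = Pipeline(workspace=ws, steps=[step])
pipeline.submit("databricks-pipeline-demo")
```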
# Linked Azure Databricks and Azure ML Workspaces (Preview)
Customers can now link Azure Databricks and Azure ML workspaces to better enable MLOps scenarios by managing their tracking data in a single place: the Azure ML workspace.
## Linking the Workspaces (Admin operation)
1. The Azure Databricks blade in the Azure portal now includes a new button to link an Azure ML workspace.
|  | ||
| 2. Both a new or existing Azure ML Workspace can be linked in the resulting prompt. Follow any instructions to set up the Azure ML workspace. | ||
akshaya-a marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
|  | ||
| 3. After a successful link operation, you should see the Azure Databricks overview reflect the linked status | ||
|  | ||
|
|
||
## Configure MLflow to send data to Azure ML (All roles)
1. Add `azureml-mlflow` as a library to any notebook or cluster that should send data to Azure ML. You can do this via:
   1. [DBUtils](https://docs.azuredatabricks.net/user-guide/dev-tools/dbutils.html#dbutils-library)
      ```
      dbutils.library.installPyPI("azureml-mlflow")
      dbutils.library.restartPython() # Removes Python state
      ```
   2. [Cluster Libraries](https://docs.azuredatabricks.net/user-guide/libraries.html#install-a-library-on-a-cluster)
      
2. [Set the MLflow tracking URI](https://mlflow.org/docs/latest/tracking.html#where-runs-are-recorded) to the following scheme:
   ```
   adbazureml://${azuremlRegion}.experiments.azureml.net/history/v1.0/subscriptions/${azuremlSubscriptionId}/resourceGroups/${azuremlResourceGroupName}/providers/Microsoft.MachineLearningServices/workspaces/${azuremlWorkspaceName}
   ```
   1. You can configure this automatically on your clusters using this helper script:
      1. [AzureML Tracking Cluster Init Script](./linking/README.md)
3. That's it! If configured correctly, you'll now be able to see your MLflow tracking data in both Azure ML (via the REST API and all clients) and Azure Databricks (in the MLflow UI and using the MLflow client).
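If you prefer to set the tracking URI in a single notebook rather than cluster-wide, a minimal sketch looks like this; the region, subscription, resource group, and workspace values are placeholders you must substitute:

```python
import mlflow

# Hypothetical placeholder values for your Azure ML workspace.
region = "westus2"
subscription_id = "<subscription-id>"
resource_group = "<resource-group>"
workspace_name = "<workspace-name>"

tracking_uri = (
    f"adbazureml://{region}.experiments.azureml.net/history/v1.0"
    f"/subscriptions/{subscription_id}"
    f"/resourceGroups/{resource_group}"
    f"/providers/Microsoft.MachineLearningServices/workspaces/{workspace_name}"
)
mlflow.set_tracking_uri(tracking_uri)

# Subsequent MLflow runs now send tracking data to Azure ML as well.
with mlflow.start_run():
    mlflow.log_param("alpha", 0.5)
```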
## Known Preview Limitations

While we roll this experience out for customer feedback, there are some known limitations we'd love comments on, in addition to any other issues you see in your workflow.
### 1-to-1 Workspace linking
Currently, an Azure ML workspace can only be linked to one Azure Databricks workspace at a time.

### Data synchronization
At the moment, tracking data is only sent to Azure ML as it is generated. Later edits, such as changing tags via the Azure Databricks MLflow UI, won't be reflected in the Azure ML UI.
### Java and R support
The experience is currently available only from the Python MLflow client.
For more on SDK concepts, please refer to [notebooks](https://github.com/Azure/MachineLearningNotebooks).

**Please let us know your feedback.**


how-to-use-azureml/azure-databricks/linking/README.md
# Adding an init script to an Azure Databricks cluster

The [azureml-cluster-init.sh](./azureml-cluster-init.sh) script configures the environment to
1. Use the configured AzureML Workspace with `Workspace.from_config()`
2. Set the default MLflow Tracking Server to be the AzureML managed one

Modify azureml-cluster-init.sh by providing the values for region, subscriptionId, resourceGroupName, and workspaceName of your target Azure ML workspace in the highlighted section at the top of the script.
To create the Azure Databricks cluster-scoped init script:

1. Create the base directory you want to store the init script in, if it does not exist.
   ```
   dbutils.fs.mkdirs("dbfs:/databricks/<directory>/")
   ```
2. Create the script by copying the contents of azureml-cluster-init.sh.
   ```
   dbutils.fs.put("/databricks/<directory>/azureml-cluster-init.sh","""
   <configured_contents_of_azureml-cluster-init.sh>
   """, True)
   ```

3. Check that the script exists.
   ```
   display(dbutils.fs.ls("dbfs:/databricks/<directory>/azureml-cluster-init.sh"))
   ```
4. Configure the cluster to run the script.
   * Using the cluster configuration page:
     1. On the cluster configuration page, click the Advanced Options toggle.
     1. At the bottom of the page, click the Init Scripts tab.
     1. In the Destination drop-down, select a destination type. Example: 'DBFS'
     1. Specify a path to the init script:
        ```
        dbfs:/databricks/<directory>/azureml-cluster-init.sh
        ```
     1. Click Add.
   * Using the API:
     ```
     curl -n -X POST -H 'Content-Type: application/json' -d '{
       "cluster_id": "<cluster_id>",
       "num_workers": <num_workers>,
       "spark_version": "<spark_version>",
       "node_type_id": "<node_type_id>",
       "cluster_log_conf": {
         "dbfs" : {
           "destination": "dbfs:/cluster-logs"
         }
       },
       "init_scripts": [ {
         "dbfs": {
           "destination": "dbfs:/databricks/<directory>/azureml-cluster-init.sh"
         }
       } ]
     }' https://<databricks-instance>/api/2.0/clusters/edit
     ```
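Note that `curl -n` reads credentials from your `~/.netrc` file; you can instead authenticate with a Databricks personal access token by passing `-H 'Authorization: Bearer <token>'`.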
24 changes: 24 additions & 0 deletions
how-to-use-azureml/azure-databricks/linking/azureml-cluster-init.sh
```
#!/bin/bash
# This script configures the environment to
# 1. Use the configured AzureML Workspace with azureml.core.Workspace.from_config()
# 2. Set the default MLflow Tracking Server to be the AzureML managed one

############## START CONFIGURATION #################
# Provide the required *AzureML* workspace information
region=""            # example: westus2
subscriptionId=""    # example: bcb65f42-f234-4bff-91cf-9ef816cd9936
resourceGroupName="" # example: dev-rg
workspaceName=""     # example: myazuremlws

# Optional config directory
configLocation="/databricks/config.json"
############### END CONFIGURATION #################

# Drop the workspace configuration on the cluster
sudo touch $configLocation
sudo echo {\"subscription_id\": \"${subscriptionId}\", \"resource_group\": \"${resourceGroupName}\", \"workspace_name\": \"${workspaceName}\"} > $configLocation

# Set the MLflow Tracking URI
trackingUri="adbazureml://${region}.experiments.azureml.net/history/v1.0/subscriptions/${subscriptionId}/resourceGroups/${resourceGroupName}/providers/Microsoft.MachineLearningServices/workspaces/${workspaceName}"
sudo echo export MLFLOW_TRACKING_URI=${trackingUri} >> /databricks/spark/conf/spark-env.sh
```
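As a quick sanity check (a sketch, assuming the cluster was started with this init script and the `azureml-sdk` package is installed), you could run the following in a notebook on that cluster:

```python
import os

from azureml.core import Workspace

# The init script wrote the workspace config to this location.
ws = Workspace.from_config(path="/databricks/config.json")
print("Linked Azure ML workspace:", ws.name)

# The init script exported this variable via spark-env.sh.
print("MLFLOW_TRACKING_URI:", os.environ.get("MLFLOW_TRACKING_URI"))
```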