SQL Server Big Data Clusters bundle Spark and HDFS together with SQL Server. The Azure Data Studio IDE provides built-in notebooks that enable data scientists and data engineers to run Spark notebooks and jobs in Python, R, or Scala against the Big Data Cluster. This folder contains sample Spark notebooks for SQL Server Big Data Clusters:
- Data Loading - Transforming CSV to Parquet
- Data Virtualization - Spark to SQL using the MSSQL Spark connector
- Data Virtualization - Spark to SQL using the Spark JDBC connector
- Data Virtualization - Spark with external object stores
- Configure - Configuring a Spark session using a notebook
- Install - Installing third-party packages
- Restful-Access - Accessing Spark in a Big Data Cluster via the RESTful Livy APIs
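As a sketch of what the Data Loading notebook covers, the CSV-to-Parquet conversion boils down to a read and a rewrite. The function below is illustrative, not the notebook's exact code: it assumes an active `SparkSession` (the PySpark3 kernel exposes one as `spark`), and the HDFS paths in the usage comment are placeholders.

```python
def csv_to_parquet(spark, csv_path, parquet_path):
    """Read a CSV file from HDFS and rewrite it as Parquet.

    `spark` is an active SparkSession (the notebook kernels create one
    for you as `spark`); both paths are illustrative HDFS locations.
    """
    df = (spark.read
          .option("header", "true")       # first row holds column names
          .option("inferSchema", "true")  # let Spark guess column types
          .csv(csv_path))
    df.write.mode("overwrite").parquet(parquet_path)
    return df.count()  # number of rows written, handy as a sanity check

# In a PySpark3 notebook cell the call might look like:
# csv_to_parquet(spark, "/tmp/clickstream.csv", "/tmp/clickstream_parquet")
```

Parquet's columnar layout and embedded schema make the converted data much faster to query from Spark or from SQL Server data virtualization than the original CSV.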
To run a sample notebook:

- From Azure Data Studio, connect to the SQL Server master instance in a Big Data Cluster.
- Right-click the server name, select Manage, switch to the SQL Server Big Data Cluster tab, and open the notebook in Azure Data Studio. Wait for the "Kernel" and the target context ("Attach to") dropdowns to be populated. If required, set the relevant "Kernel" (e.g., PySpark3); "Attach to" should be the endpoint of your Big Data Cluster.
- Run each cell in the notebook sequentially.
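The Restful-Access notebook drives Spark through Livy's REST API instead of a notebook kernel: you POST a JSON body describing a Spark application to Livy's `/batches` endpoint. The sketch below only builds and prints such a payload; the gateway URL, port, path, and credentials in the comments are assumptions that vary by deployment, so substitute the values for your own cluster endpoint.

```python
import json

# Hypothetical endpoint -- replace with your cluster's gateway address
# (the host, port, and path depend on your deployment).
LIVY_BATCHES_URL = "https://<gateway-ip>:30443/gateway/default/livy/v1/batches"

def build_batch_payload(app_file, args=None, conf=None):
    """Build the JSON body for a Livy POST /batches request.

    `app_file` is the Spark application (e.g. a .py file) already
    uploaded to HDFS; `args` and `conf` are optional extras.
    """
    payload = {"file": app_file}
    if args:
        payload["args"] = args
    if conf:
        payload["conf"] = conf
    return payload

payload = build_batch_payload(
    "hdfs:///jobs/csv_to_parquet.py",        # illustrative HDFS path
    conf={"spark.executor.memory": "2g"},
)
print(json.dumps(payload))

# To actually submit (requires the `requests` package and valid
# credentials for your cluster):
# import requests
# r = requests.post(LIVY_BATCHES_URL, json=payload,
#                   auth=("<user>", "<password>"))
# print(r.json())  # Livy returns the batch id and state
```

After submission, polling `GET /batches/<id>` on the same endpoint reports the job's state and log output.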