MLeap on SQL Server Big Data cluster

This folder shows how to build a model with Spark ML, export it as an MLeap bundle, and score it in SQL Server using the Java Language Extension.

Figure: Train_Score_Export_with_Spark.jpg (overview of the train, export, and score workflow)

Model training with Spark ML

In this sample code, AdultCensusIncome.csv is used to build a Spark ML pipeline model. Download the dataset from the internet and place it on HDFS in the SQL Server Big Data Cluster so that Spark can access it.

The data is first read into Spark and split into training and testing sets. A pipeline model is then trained on the training data and exported as an MLeap bundle.

An equivalent Jupyter notebook is also included here, if that is preferred over the pure Python script.

Model scoring with SQL Server

Now that the Spark ML pipeline model is serialized in the portable MLeap bundle format, we can score it from Java without a Spark runtime.

To score the model in SQL Server with the Java Language Extension, we first need to build a Java application that loads the bundle and runs scoring. The mssql-mleap-app folder shows how that can be done.

Then, from T-SQL, we can call the Java application to score the model against data in a database table.