MLeap on SQL Server Big Data cluster

This folder shows how to build a model with Spark ML, export it as an MLeap bundle, and score it in SQL Server using the Java Language Extension.

Figure: Train_Score_Export_with_Spark.jpg (overview of the train, export, and score workflow)

Model training with Spark ML

In this sample code, AdultCensusIncome.csv is used to build a Spark ML pipeline model. Download the dataset from the internet and place it on HDFS in the SQL Server Big Data Cluster so that Spark can access it.

The data is first read into Spark and split into training and testing sets. A pipeline model is then trained on the training data and exported as an MLeap bundle.

An equivalent Jupyter notebook is also included here, if that is preferred over the pure Python script.

Model scoring with SQL Server

Now that the Spark ML pipeline model is serialized in the portable MLeap bundle format, we can score it from Java without a Spark runtime.

To score the model in SQL Server with the Java Language Extension, we first need to build a Java application that loads the bundle and runs scoring. The mssql-mleap-app folder shows how that can be done.

Then, from T-SQL, we can call the Java application to score the model against data in a database table.