Merged
1 change: 1 addition & 0 deletions .gitignore
@@ -13,6 +13,7 @@
*.userprefs

# Build results
_site/
[Dd]ebug/
[Dd]ebugPublic/
[Rr]elease/
74 changes: 74 additions & 0 deletions Powershell_Instructions.md
@@ -0,0 +1,74 @@
---
layout: default
title: PowerShell Instructions
---


## PowerShell Instructions
---------------------------

<div class="row">
<div class="col-md-6">
<div class="toc">
<li> <a href="#setup">Setup</a></li>
<li> <a href="#execute-powershell-script">Execute PowerShell Script</a></li>
<li> <a href="#score-production-data">Score Production Data</a></li>
<li> <a href="#review-data">Review Data</a></li>
<li> <a href="#visualizing-results">Visualizing Results</a> </li>
<li> <a href="#other-steps">Other Steps</a></li>
</div>
</div>
<div class="col-md-6">
If you have deployed a VM through the
<a href="{{ site.aka_url }}">Azure AI Gallery</a>, all the steps below have already been performed, and the database on that machine already contains all the resulting tables and stored procedures. You can explore this solution in more detail by examining the folders and running the Python code or stored procedures to re-create the model, or skip ahead and try out the model in the included <a href="jupyter.html">Jupyter notebook</a>.
</div>
</div>

If you are configuring your own server, continue with the steps below to run the PowerShell script.

## Setup
-----------

First, make sure you have set up your SQL Server by <a href="SetupSQL.html">following these instructions</a>. Then proceed with the steps below to run the solution template using the automated PowerShell file.

## Execute PowerShell Script
----------------------------

Running this PowerShell script creates the data tables and stored procedures that operationalize this solution in the `{{ site.db_name }}_R` and `{{ site.db_name }}_Py` databases. It also executes these procedures to create a full database with the results of each step (dataset creation, modeling, and scoring), as described [here](dba.html).


1. Log on to the machine that contains the SQL Server you wish to use.

2. Install [Git](https://gitforwindows.org/) if it is not already present. During the install, check the box to add LFS support.

3. Download <a href="https://raw.githubusercontent.com/Microsoft/ml-server-text-classification/dev/Resources/ActionScripts/TextClassificationSetup.ps1" download>TextClassificationSetup.ps1</a> to your computer.

4. Open a command or PowerShell window as Administrator.

5. Change to the directory where you downloaded the .ps1 file and execute the command:

        .\TextClassificationSetup.ps1

6. Answer any prompts that appear.
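
The launch step above can also be scripted. The following is a minimal Python sketch, not part of the solution: it only assembles a `powershell.exe` command line for the downloaded script and leaves running it, from an elevated session, to the caller. The helper name `build_setup_command` and the download folder are hypothetical, and `-ExecutionPolicy Bypass` is an assumption that may be unnecessary if your execution policy already allows the script.

```python
from pathlib import PureWindowsPath


def build_setup_command(download_dir, script_name="TextClassificationSetup.ps1"):
    """Return the argument list that runs the setup script with powershell.exe.

    download_dir is a hypothetical folder holding the downloaded .ps1 file.
    """
    script_path = PureWindowsPath(download_dir) / script_name
    return [
        "powershell.exe",
        "-NoProfile",
        # Assumption: Bypass applies to this one process only.
        "-ExecutionPolicy", "Bypass",
        "-File", str(script_path),
    ]


if __name__ == "__main__":
    # Printing only; pass the list to subprocess.run(..., check=True) from an
    # elevated session to actually start the setup.
    print(build_setup_command(r"C:\Users\me\Downloads"))
```

Returning a list rather than a single string avoids quoting problems when the download path contains spaces.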

The script makes the following modifications to your SQL Server:

* Creates the SQLRUserGroup for running R and Python code.
* Reconfigures SQL Server to allow running of external scripts.
* Installs the latest SQL Server 2017 Cumulative Update if no updates have been installed (this solution requires at least CU1 to run successfully).
* Clones the solution code and data into the c:\Solutions\{{ site.folder_name }} directory.
* Creates the solution databases `{{ site.db_name }}_R` and `{{ site.db_name }}_Py`.
* Runs the solution workflow to populate all database tables.

<div class="alert alert-info">
If you wish to run the solution code on a computer other than the SQL Server machine, see <a href="local.html">Setup for Local Code Execution</a>.
</div>
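
One way to confirm that the external-scripts reconfiguration took effect is to read the server setting and then run a trivial external script. The sketch below only assembles the T-SQL text; it does not connect to a server. `sys.configurations` and `sp_execute_external_script` are standard SQL Server objects, but the function name is hypothetical and these statements are an illustration, not an excerpt from the setup script.

```python
def external_script_checks(language="Python"):
    """Return T-SQL statements that check whether external scripts can run."""
    if language not in ("Python", "R"):
        raise ValueError("language must be 'Python' or 'R'")
    return [
        # value_in_use is 1 once the setting is active.
        "SELECT value_in_use FROM sys.configurations "
        "WHERE name = 'external scripts enabled';",
        # print(1) is valid in both R and Python.
        f"EXEC sp_execute_external_script @language = N'{language}', "
        "@script = N'print(1)';",
    ]


if __name__ == "__main__":
    for statement in external_script_checks("R"):
        print(statement)
```

The statements can be pasted into a SQL Server Management Studio query window.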

## Review Data
--------------

Once the PowerShell script has completed successfully, open SQL Server Management Studio to view all the datasets that have been created in the `{{ site.db_name }}_R` or `{{ site.db_name }}_Py` databases.
Click `Refresh` if necessary.

[Click here](tables.html) to view the details of all tables created in this solution.
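
The tables can also be listed from code instead of Management Studio. The sketch below is a hedged illustration: it assumes Windows authentication, a local server, the "ODBC Driver 17 for SQL Server" driver, and the third-party `pyodbc` package, none of which the solution prescribes. The database name in the usage comment follows from `db_name` in `_config.yml` plus the `_Py` suffix described above.

```python
LIST_TABLES_SQL = (
    "SELECT TABLE_SCHEMA, TABLE_NAME FROM INFORMATION_SCHEMA.TABLES "
    "WHERE TABLE_TYPE = 'BASE TABLE' ORDER BY TABLE_SCHEMA, TABLE_NAME;"
)


def connection_string(database, server="localhost"):
    """Build an ODBC connection string that uses Windows authentication."""
    return (
        "DRIVER={ODBC Driver 17 for SQL Server};"
        f"SERVER={server};DATABASE={database};Trusted_Connection=yes;"
    )


def list_tables(database, server="localhost"):
    """Return (schema, table) pairs for every base table in the database."""
    import pyodbc  # third-party; imported lazily so the helpers above work without it

    connection = pyodbc.connect(connection_string(database, server))
    try:
        rows = connection.cursor().execute(LIST_TABLES_SQL).fetchall()
        return [(row.TABLE_SCHEMA, row.TABLE_NAME) for row in rows]
    finally:
        connection.close()


# Usage (requires a reachable server):
#     list_tables("TextClassification_Py")
```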

46 changes: 46 additions & 0 deletions SetupSQL.md
@@ -0,0 +1,46 @@
---
layout: default
title: "On-Prem: Setup SQL Server 2017"
---

## On-Prem: Setup SQL Server
--------------------------

<div class="row">
<div class="col-md-6">
<div class="toc">
<li><a href="#prepare-your-sql-server-installation">Prepare your SQL Server Installation</a></li>
<li><a href="#ready-to-run-code">Ready to Run Code</a></li>
</div>
</div>
<div class="col-md-6">
The instructions on this page will help you add this solution to your on-premises SQL Server 2016 or higher.
<p>
If you instead would like to try this solution out on a virtual machine, visit the <a href="{{ site.aka_url }}">Azure AI Gallery</a> and use the Deploy button. All the configuration described below will be done for you, as well as the initial deployment of the solution. </p>
</div>
</div>

## Prepare your SQL Server Installation
-------------------------------------------

The rest of this page assumes you are configuring your on-premises SQL Server 2016 or 2017 for this solution.

If you need a trial version of SQL Server, see [What's New in SQL Server 2017](https://docs.microsoft.com/en-us/sql/sql-server/what-s-new-in-sql-server-2017) for download or VM options.

Complete the steps in the Set up Microsoft Machine Learning Services (In-Database) Instructions. The setup instructions can be found at <a href="https://msdn.microsoft.com/en-us/library/mt696069.aspx" target="_blank">https://msdn.microsoft.com/en-us/library/mt696069.aspx</a>.

Make sure to install both the in-database (SQL) and standalone versions of R and Python.


### Ready to Run Code
---------------------

You are now ready to run the code for this solution.

* Install the solution by following these <a href="Powershell_Instructions.html">PowerShell Instructions</a> for deployment.

* Typically, a data scientist will create and test a predictive model from their favorite R IDE, at which point the models will be stored in SQL Server and then scored in production using Transact-SQL (T-SQL) stored procedures.
You can follow this workflow using the steps in [For the Data Scientist](data-scientist.html) or [For the Database Analyst](dba.html).



17 changes: 17 additions & 0 deletions _config.yml
@@ -0,0 +1,17 @@
author:
  name: Microsoft

description: Text Classification using SQL Server 2017 + ML Services with R or Python
# names
solution_name: TextClassification
folder_name: TextClassification
db_name: TextClassification
ps1_name: TextClassificationSetup.ps1
pbix_name: TextClassification.pbix


# urls
code_url: "https://github.com/Microsoft/ml-server-text-classification"
website_url: "https://microsoft.github.io/ml-server-text-classification/"
aka_url: "http://aka.ms/text-classification"
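
The pages in this change refer to these settings through placeholders such as `{{ site.db_name }}`, which Jekyll's Liquid engine fills in at build time. The sketch below is a simplified stand-in written for illustration, not Jekyll's or Liquid's implementation: it handles only flat `site.<key>` placeholders, and the function name `render_site_vars` is hypothetical.

```python
import re

# Values copied from the _config.yml shown above.
SITE = {
    "solution_name": "TextClassification",
    "folder_name": "TextClassification",
    "db_name": "TextClassification",
    "ps1_name": "TextClassificationSetup.ps1",
}

_PLACEHOLDER = re.compile(r"\{\{\s*site\.(\w+)\s*\}\}")


def render_site_vars(text, site=SITE):
    """Replace {{ site.key }} placeholders; unknown keys are left untouched."""
    def substitute(match):
        key = match.group(1)
        return str(site[key]) if key in site else match.group(0)

    return _PLACEHOLDER.sub(substitute, text)


if __name__ == "__main__":
    print(render_site_vars("{{ site.db_name }}_R and {{site.db_name}}_Py"))
    # prints: TextClassification_R and TextClassification_Py
```

This is why the documentation's `{{ site.db_name }}_R` and `{{ site.db_name }}_Py` resolve to the database names `TextClassification_R` and `TextClassification_Py`.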

25 changes: 25 additions & 0 deletions _includes/finalsteps.md
@@ -0,0 +1,25 @@
<a name="step4"></a>

### Step 4: Visualize the Results
-------------------------
The Power BI file **TextClassification.pbix** included with this solution can be used to visualize how the model performs.

On the `Training Summary` tab, you can see the predicted and actual labels for the Test data, along with the Micro Average Accuracy and Macro Average Accuracy values.

<img src="images/pbi1.png" />
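
To make the two accuracy values concrete, the sketch below computes them under the usual definitions: micro average accuracy is the overall fraction of correct predictions, and macro average accuracy is the unweighted mean of the per-class accuracies. The source does not show how the report computes them, so treat these definitions as an assumption; the function name and the toy labels are hypothetical.

```python
from collections import defaultdict


def micro_macro_accuracy(actual, predicted):
    """Return (micro, macro) average accuracy for two equal-length label lists."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and equal in length")

    total = defaultdict(int)    # number of examples per actual class
    correct = defaultdict(int)  # correctly predicted examples per actual class
    for truth, guess in zip(actual, predicted):
        total[truth] += 1
        if truth == guess:
            correct[truth] += 1

    micro = sum(correct.values()) / len(actual)
    macro = sum(correct[label] / total[label] for label in total) / len(total)
    return micro, macro


if __name__ == "__main__":
    actual = ["sci", "sci", "sci", "rec"]
    predicted = ["sci", "sci", "sci", "sci"]
    print(micro_macro_accuracy(actual, predicted))  # prints: (0.75, 0.5)
```

In the toy example the rare class is always misclassified, so the macro value (0.5) drops well below the micro value (0.75); comparing the two is a quick check for poor performance on small classes.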

On the `Scoring New Text` tab you can view the predicted labels for new text.

<img src="images/pbi2.png" />

## Template Contents
---------------------

[View the contents of this solution template](contents.html).


To try this out yourself:

* View the [Quick Start](quick.html).

[&lt; Home](index.html)
1 change: 1 addition & 0 deletions _includes/inputdata.html
@@ -0,0 +1 @@
This solution uses a preprocessed version of the [NewsGroups20](http://scikit-learn.org/stable/datasets/twenty_newsgroups.html) dataset, in which each record contains a Subject, a Text, and a Label. Its structure is similar to that of a support ticket dataset, which would have two comparable data fields: Title and Problem description. You can therefore adapt this solution to your own support ticket data by providing the training and test datasets in the same structure.
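
As a rough illustration of that mapping, the sketch below reshapes support tickets into the Subject, Text, and Label structure described above. The helper name, the sample tickets, and their labels are all hypothetical; the file format the solution expects is not specified here, so the sketch stops at in-memory records.

```python
def ticket_to_record(title, problem_description, label=None):
    """Map one support ticket onto the Subject/Text/Label structure."""
    return {
        "Subject": title.strip(),
        "Text": problem_description.strip(),
        # Label stays None for new tickets that still need to be scored.
        "Label": label,
    }


if __name__ == "__main__":
    # Hypothetical tickets, for illustration only.
    tickets = [
        ("Cannot log in", "Password reset email never arrives.", "account"),
        ("Printer offline ", "The office printer shows as offline.", "hardware"),
    ]
    for record in (ticket_to_record(*ticket) for ticket in tickets):
        print(record)
```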
9 changes: 9 additions & 0 deletions _includes/pysetup.html
@@ -0,0 +1,9 @@
<p>Debra would work on her own machine, using <a href="https://docs.microsoft.com/en-us/sql/advanced-analytics/python/sql-server-python-services">Machine Learning Services with Python</a> to execute these Python scripts. In case you want to run the code from the VM, ML Services Python has already been installed.</p>

<p>The Python code is present in the <strong>{{site.folder_name}}/Python</strong> directory. </p>

<p>OPTIONAL: You can execute the Python code on your local computer if you wish, but you must first <a href="local.html">prepare both the VM and your computer</a>. </p>

Follow these instructions to <a href="jupyter.html">view and execute the Python code with the Jupyter Notebook</a>.

You can also execute the Python code with an IDE. Both PyCharm and Visual Studio are installed on your VM. For each, you must first configure the Python interpreter to use <code>C:\Program Files\Microsoft\ML Server\PYTHON_SERVER\python.exe</code>.
5 changes: 5 additions & 0 deletions _includes/requirements.md
@@ -0,0 +1,5 @@
Running the R or Python scripts requires the following:

* SQL Server 2017 with RevoscalePy (version 9.2) and MicrosoftML (version 1.5.0) installed and configured;
* A SQL database on which the user has write permission and permission to execute stored procedures;
* For more information about Machine Learning Services in SQL Server, please visit: [https://msdn.microsoft.com/en-us/library/mt604847.aspx](https://msdn.microsoft.com/en-us/library/mt604847.aspx)
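
For the Python side, a quick way to see whether the required packages are visible to a given interpreter is sketched below. It assumes the packages are imported as `revoscalepy` and `microsoftml`, checks presence only (not the versions listed above), and uses a hypothetical function name.

```python
import importlib.util

# Import names assumed for the Python packages listed above.
REQUIRED_PACKAGES = ("revoscalepy", "microsoftml")


def missing_packages(names=REQUIRED_PACKAGES):
    """Return the names that cannot be found by the current interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages were found.")
```

Run it with the interpreter you intend to use for the solution, since each Python installation has its own set of packages.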
85 changes: 85 additions & 0 deletions _layouts/default.html
@@ -0,0 +1,85 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<link rel="shortcut icon" type="image/png" href="images/favicon.png">
<title>{{ page.title }}</title>
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="description" content="{{ site.description }}">
<meta name="author" content="{{ site.author.name }}">

<!-- Le styles -->
<link href="stylesheets/bootstrap.min.css" rel="stylesheet"/>

<!-- HTML5 shim, for IE6-8 support of HTML5 elements -->
<!--[if lt IE 9]>
<script src="/js/html5shiv.js"></script>
<![endif]-->
<link rel="stylesheet" href="stylesheets/mystyles.css">

</head>
<body>
<div class="container">
<div class="row">
<div class="col-sm-9">
<div class="jumboton teal">
<h1>{{ site.solution_name }}</h1>
<p>Implemented with Microsoft Machine Learning Services</p>
</div>
<div class="content">
{{ content }}
</div>
</div><!--/col -->

<div class="col-sm-3 sidebar-offcanvas" id="sidebar">
<img src="images/TextAnalysis.png">
<div class="list-group">
<a href="index.html" class="list-group-item">Home</a>
<a href="data-scientist.html" class="list-group-item">For the Data Scientist</a>
<a href="dba.html" class="list-group-item">For the Database Analyst</a>
<a href="quick.html" class="list-group-item">Quick Start</a>
</div>
<hr />
<div class="center">
<a class="btn btn-large btn-info" href="{{ site.code_url }}">
View On <strong>GitHub</strong></a>
</div>
<hr />
<p class="details">Other Links</p>
<div class="toc">
<li><a href="contents.html">Packet Contents</a></li>
<li><a href="sitemap.html">Site Map</a></li>
<li><a href="https://aka.ms/ml-server-samples">Other ML Server Solutions</a></li>

</div>
</div><!--/col -->
</div><!-- /row -->

<div class="row">
<hr />
<footer>
<p>
This project has adopted the <a href="https://opensource.microsoft.com/codeofconduct/">Microsoft Open Source Code of Conduct</a>. For more information see the <a href="https://opensource.microsoft.com/codeofconduct/faq/">Code of Conduct FAQ</a> or contact <a href="mailto:[email protected]">[email protected]</a> with any additional questions or comments.</p>
<p><small>Hosted on GitHub Pages</small> </p>
</footer>
</div>

</div><!-- /container -->

<script src="//ajax.aspnetcdn.com/ajax/jQuery/jquery-3.3.1.min.js"></script>


<!-- <script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','https://www.google-analytics.com/analytics.js','ga');

ga('create', 'UA-88854735-5', 'auto');
ga('send', 'pageview');

</script> -->


</body>
</html>
106 changes: 106 additions & 0 deletions contents.md
@@ -0,0 +1,106 @@
---
layout: default
title: Template Contents
---

## Template Contents
--------------------

The following is the directory structure for this template:

- [**Data**](#data) This contains the data for scoring. Other data is downloaded during the solution workflow.
- [**R**](#model-development-in-r) This contains the R code to prepare the training/testing/evaluation sets, train the multi-class classifier, and evaluate the model.
- [**Python**](#model-development-in-python) This contains the Python code to prepare the training/testing/evaluation sets, train the multi-class classifier, and evaluate the model.
- [**SQLR**](#operationalize-in-sql-r) Stored procedures in SQL that implement the model training workflow with R code.
- [**SQLPy**](#operationalize-in-sql-python) Stored procedures in SQL that implement the model training workflow with Python code.
- [**Resources**](#resources-for-the-solution-package) This directory contains other resources for the solution package.




### Data
----------------------------
Data for training and testing will also be downloaded and added to this directory, so more files will be present once the solution has been run once.
<table class="table table-striped table-condensed">
<tr><th> File </th><th> Description</th></tr>
<tr><td>News_To_Score </td><td> Text file containing new data for scoring. </td></tr>
</table>

### Model Development in R
-------------------------

<table class="table table-striped table-condensed">
<tr><th> File </th><th> Description </th></tr>
<tr><td>TextClassificationR.ipynb </td><td> Create features on the fly for the training and testing set, train model, make predictions, and evaluate the model in Jupyter notebook.</td></tr>
<tr><td>run_modeling_main.R </td><td> Create features on the fly for the training and testing set, train model, make predictions, and evaluate the model.</td></tr>
</table>

* See [For the Data Scientist](data-scientist.html) for more details about these files.


### Operationalize in SQL R
-------------------------------------------------------
Stored procedures in SQL implement the model training workflow with R code.

<table class="table table-striped table-condensed">
<tr><th> File </th><th> Description </th></tr>
<tr><td>Load_Data.ps1</td><td>Loads all data for the solution if you'd like to create a second instance of the solution on the same server</td></tr>
<tr><td>execute_yourself.sql</td><td>Runs through all the steps of the solution</td></tr>
<tr><td>step0_create_tables.sql</td><td>Create data tables, invoked in Load_Data.ps1</td></tr>
<tr><td>step1_create_features_train.sql</td><td>Create features on the fly and train model </td></tr>
<tr><td>step2_score.sql</td><td>Scores data with model created in step1 </td></tr>
<tr><td>step3_evaluate.sql</td><td>Evaluates model created in step1 </td></tr>
</table>

* See [For the Database Analyst](dba.html) for more information.
* Follow the [PowerShell Instructions](Powershell_Instructions.html) to execute the PowerShell script which creates these stored procedures.

### Model Development in Python
-------------------------

<table class="table table-striped table-condensed">
<tr><th> File </th><th> Description </th></tr>
<tr><td>TextClassificationR.ipynb </td><td> Create features on the fly for the training and testing set, train model, make predictions, and evaluate the model in Jupyter notebook.</td></tr>
<tr><td>run_modeling_main.py </td><td> Create features on the fly for the training and testing set, train model, make predictions, and evaluate the model.</td></tr>
</table>


* See [For the Data Scientist](data-scientist.html) for more details about these files.


### Operationalize in SQL Python
-------------------------------------------------------
Stored procedures in SQL implement the model training workflow with Python code.

<table class="table table-striped table-condensed">
<tr><th> File </th><th> Description </th></tr>
<tr><td>Load_Data.ps1</td><td>Loads all data for the solution if you'd like to create a second instance of the solution on the same server</td></tr>
<tr><td>execute_yourself.sql</td><td>Runs through all the steps of the solution</td></tr>
<tr><td>step0_create_tables.sql</td><td>Create data tables, invoked in Load_Data.ps1</td></tr>
<tr><td>step1_create_features_train.sql</td><td>Create features on the fly and train model </td></tr>
<tr><td>step2_score.sql</td><td>Scores data with model created in step1 </td></tr>
<tr><td>step3_evaluate.sql</td><td>Evaluates model created in step1 </td></tr>
</table>

* See [For the Database Analyst](dba.html) for more information.
* Follow the [PowerShell Instructions](Powershell_Instructions.html) to execute the PowerShell script which creates these stored procedures.

### Resources for the Solution Package
------------------------------------

<table class="table table-striped table-condensed">
<tr><th> File </th><th> Description </th></tr>

<tr><td> .\Resources\ActionScripts\ConfigureSQL.ps1</td><td>Configures SQL, called from SetupVM.ps1 </td></tr>
<tr><td> .\Resources\ActionScripts\CreateDatabase.sql</td><td>Creates the database for this solution, called from ConfigureSQL.ps1 </td></tr>
<tr><td> .\Resources\ActionScripts\CreateSQLObjectsPy.sql</td><td>Creates the tables and stored procedures for this solution, called from ConfigureSQL.ps1 </td></tr>
<tr><td> .\Resources\ActionScripts\CreateSQLObjectsR.sql</td><td>Creates the tables and stored procedures for this solution, called from ConfigureSQL.ps1 </td></tr>
<tr><td> .\Resources\ActionScripts\TextClassificationSetup.ps1</td><td>Configures SQL, creates and populates database</td></tr>
<tr><td> .\Resources\ActionScripts\SolutionHelp.url</td><td>URL to the help page </td></tr>

</table>




[&lt; Home](index.html)