SQL Database don't have built-in support for transitive closure clustering, so the only workaround is to implement this algorithm in .Net framework and expose it as T-SQL function. A discussion on the problem, the algorithm and a pure T-SQL based solution can be found here:
- Transitive Closure Clustering with SQL Server, UDA and JSON
- T-SQL Puzzle Challenge Grouping Connected Items
This code sample demonstrates how to create CLR User-Defined aggregate that implements clustering.
About this sample
Build the CLR/TransitiveClosure aggregate
Add RegEx functions to your SQL database
Test the functions
Disclaimers
- Applies to: SQL Server 2016+ Enterprise / Developer / Evaluation Edition
- Key features:
- CLR, JSON
- Programming Language: .NET C#
- Author: Davide Mauri, Jovan Popovic [jovanpop-msft]
- Download the source code and open the solution using Visual Studio.
- Change the password in .pfk file and rebuild the solution in Release mode.
- Open and save TransitiveClosure.tt to generate output T-SQL file that will contain script that inserts .dll file with the Transitive closure clustering aggregate.
File TransitiveClosure.sql contains the code that will import aggregate into SQL Database.
If you have not added CLR assemblies in your database, you should use the following script to enable CLR:
sp_configure @configname=clr_enabled, @configvalue=1
GO
RECONFIGURE
GO
Once you enable CLR, you can use the T-SQL script to add the clustering aggregate. The script depends on the location where you have built the project, and might look like:
CREATE ASSEMBLY TransitiveClosure FROM 'D:\GitHub\sql-server-samples\samples\features\sql-clr\TransitiveClosure\bin\Release\TransitiveClosureAggregatorLibrary.dll' WITH PERMISSION_SET = SAFE;
GO
CREATE SCHEMA TC;
GO
CREATE AGGREGATE TC.CLUSTERING(@id1 INT, @id2 INT)
RETURNS NVARCHAR(MAX)
EXTERNAL NAME TransitiveClosure.[TransitiveClosure.Aggregate];
This code will import assembly in SQL Database and add an aggregate that provides clustering functionalities.
Once you create the assembly and expose the aggregate, you can use it to cluster some relational data in T-SQL code:
declare @edges table(n1 int, n2 int);
insert into @edges
values
(1,2),(2,3),(3,4),(4,5),(2,21),(2,22),
(7,8),(8,9),(9,10);
select TC.CLUSTERING(n1,n2)
from @edges;
The result will be JSON document that groups the numbers that belong to the same cluster.
{
"0":[1,2,3,4,5,21,22],
"1":[7,8,9,10]
}You can transform this JSON document into relational formatusing OPENJSON function:
select cluster = [key], elements = value
from openjson(
(select TC.CLUSTERING(n1,n2) from @edges)
);
The result of this query is:
| cluster | elements |
|---|---|
| 0 | [1,2,3,4,5,21,22] |
| 1 | [7,8,9,10] |
The code included in this sample is not intended to be a set of best practices on how to build scalable enterprise grade applications. This is beyond the scope of this sample.