luke-ad-murphy/SQL-KS-statistic-distribution-comparison

Overview

This project evaluates whether a test distribution aligns with an expected (control) distribution by comparing their cumulative distribution functions (CDFs).

Methodology

Two key calculations determine whether the test distribution meets expectations:

  1. Kolmogorov-Smirnov (K-S) Test

The Kolmogorov-Smirnov test is a non-parametric statistical test that assesses whether two samples originate from the same underlying distribution. Its statistic is the maximum absolute difference between the two CDFs at any given point.
D = max over all x of |F_control(x) − F_test(x)|
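The repository implements this comparison in SQL; as a hedged illustration only, the same statistic can be sketched in Python by evaluating both empirical CDFs at every observed value and taking the largest gap (the function name and sample data below are made up):

```python
import bisect

def ks_statistic(control, test):
    """Return max |ECDF_control(x) - ECDF_test(x)| over all observed x."""
    control = sorted(control)
    test = sorted(test)
    # Evaluate at every distinct observed value; the maximum gap
    # between two step-function ECDFs occurs at one of these points.
    points = sorted(set(control) | set(test))
    n, m = len(control), len(test)

    def ecdf(sorted_vals, size, x):
        # Fraction of sample values <= x.
        return bisect.bisect_right(sorted_vals, x) / size

    return max(abs(ecdf(control, n, x) - ecdf(test, m, x)) for x in points)

# Illustrative samples: they differ only in how many 2s vs 3s appear.
control = [1, 2, 2, 3, 4, 5, 6, 7, 8, 9]
test = [1, 2, 3, 3, 4, 5, 6, 7, 8, 9]
d = ks_statistic(control, test)  # largest ECDF gap, at x = 2
```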

Example Results

Pass Case: In the first graph below, the control (expected) CDF is shown in white, and the test CDF in red. The K-S test returns a value of 0.1158, meaning the largest observed difference between the two CDFs is 11.58 percentage points. Based on scenario testing, a tolerance of 0.15 (15%) provides reasonable results, so this test passes.

Fail Case: In the second graph, the K-S test returns 0.4434, indicating a significant deviation between the two distributions. As a result, this test fails.

(Graph 1: pass case; Graph 2: fail case)
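The pass/fail decision above reduces to comparing the K-S statistic against the 0.15 tolerance chosen from scenario testing. A minimal sketch (the function name is hypothetical; the two statistics are the example values from the text):

```python
# Tolerance of 0.15 (15 percentage points), chosen via scenario
# testing per the text above.
KS_TOLERANCE = 0.15

def ks_verdict(d_statistic, tolerance=KS_TOLERANCE):
    """Pass when the largest CDF gap stays within tolerance."""
    return "pass" if d_statistic <= tolerance else "fail"

print(ks_verdict(0.1158))  # pass case: gap of 11.58 percentage points
print(ks_verdict(0.4434))  # fail case: significant deviation
```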

  2. Statistical Significance (p-value)

To ensure the result is statistically significant, the OS/SDK/device/chipset sample size must be sufficiently large; a result is considered significant when the p-value falls below the 0.05 threshold. The coefficient used for calculating the p-value in this context is 0.01.
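The source does not show its p-value calculation, so as an assumption-laden sketch, the standard asymptotic two-sample K-S p-value (Kolmogorov distribution with the usual small-sample correction) can be written as:

```python
import math

def ks_p_value(d, n, m, terms=100):
    """Approximate two-sided p-value for a two-sample K-S statistic d
    on samples of sizes n and m, via the asymptotic Kolmogorov
    distribution with a standard small-sample correction.
    This is a generic textbook formula, not the repository's SQL logic."""
    en = math.sqrt(n * m / (n + m))
    lam = (en + 0.12 + 0.11 / en) * d
    if lam < 1e-12:
        return 1.0  # identical distributions: no evidence of difference
    # Alternating series for Q_KS(lam); terms shrink rapidly.
    p = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                  for k in range(1, terms + 1))
    return min(max(p, 0.0), 1.0)

# A large gap on large samples is significant at the 0.05 threshold;
# a small gap is not.
print(ks_p_value(0.5, 100, 100) < 0.05)
print(ks_p_value(0.05, 100, 100) > 0.05)
```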

For further reference, see the attached kstest.pdf file, which contains a detailed breakdown of the calculations and results.

About

Compare two number series to observe whether they are from the same distribution
