docet85/outlier_detector

Outlier Detector toolkit

Badges: build status, codecov, License: MIT, code style: black, pre-commit.

This project provides a set of tools for outlier detection that mark or filter out samples as they arrive in your Python analysis code.

Most of the tools rely on the two-tailed Dixon's Q-test (https://en.wikipedia.org/wiki/Dixon%27s_Q_test), which is designed for small sample sets.
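To make the test concrete, here is a minimal sketch of the Q statistic itself. It is not the toolkit's implementation: the helper name `dixon_q` and the constant `Q_CRIT_95_N7` are hypothetical, and the critical value 0.568 (two-tailed, 95% confidence, 7 values) is taken from commonly published Q tables, so check it against your own reference before relying on it.

```python
def dixon_q(values):
    """Return (q_low, q_high): the Q statistic for the smallest and largest value.

    Q = gap / range, where gap is the distance between the suspect value
    and its nearest neighbour in the sorted data.
    """
    ordered = sorted(values)
    spread = ordered[-1] - ordered[0]
    if len(ordered) < 3 or spread == 0:
        raise ValueError("need at least 3 values that are not all equal")
    q_low = (ordered[1] - ordered[0]) / spread
    q_high = (ordered[-1] - ordered[-2]) / spread
    return q_low, q_high


# Assumed critical value: two-tailed, 95% confidence, n = 7 (from published tables).
Q_CRIT_95_N7 = 0.568

distribution = [0.1, 1.1, 4.78, 2.0, 7.2, 5.3]
sample = -14.5

# Test the new sample together with the known distribution (7 values in total).
q_low, q_high = dixon_q(distribution + [sample])
print(f"q_low={q_low:.3f} q_high={q_high:.3f}")
print("low tail rejected:", q_low > Q_CRIT_95_N7)
print("high tail rejected:", q_high > Q_CRIT_95_N7)
```

With this data the low-tail statistic is about 0.673, above the assumed critical value, so -14.5 would be flagged; the high tail (7.2) would not.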

Installation

pip install outlier-detector

TL;DR

I have a sample and a known data distribution: is the sample an outlier?

sample = -14.5
distribution = [0.1, 1.1, 4.78, 2.0, 7.2, 5.3]

from outlier_detector.functions import is_outlier
print(is_outlier(distribution, sample))

I have a distribution and I iterate over it: is the n-th sample an outlier?

distribution = [0.1, 1.1, 4.78, 2.0, 7.2, 5.3, 8.1, -14.1, 5.4]
from outlier_detector.detectors import OutlierDetector
od = OutlierDetector(buffer_samples=5)
for sample in distribution:
    print(od.is_outlier(sample))

I have a generating object from which I pop samples, and I want only valid samples: how can I reject outliers?

distribution = [0.1, 1.1, 4.78, 2.0, 7.2, 5.3, 8.1, -14.1, 5.4]
from outlier_detector.filters import filter_outlier

class MyGen:
    def __init__(self):
        self.cursor = -1

    @filter_outlier()
    def pop(self):
        self.cursor += 1
        return distribution[self.cursor]

g = MyGen()
while True:
    try:
        r = g.pop()
        print(r)
    except IndexError:
        print('No more data')
        break

I have a generating object from which I pop samples, and I want to iterate only over valid samples: how can I reject outliers and get an iterator?

distribution = [0.1, 1.1, 4.78, 2.0, 7.2, 5.3, 8.1, -14.1, 5.4]
from outlier_detector.filters import OutlierFilter

class MyGen:
    def __init__(self):
        self.cursor = -1

    def pop(self):
        self.cursor += 1
        return distribution[self.cursor]

g = MyGen()
of = OutlierFilter()
try:
    for sample in of.filter(g.pop):
        print(sample)
except IndexError:
    print('No more data')

Documentation

The toolkit is organized around three patterns, so you can pick whichever fits your code best: functions for static analysis, detectors for objects with internal buffers, and filters for decorators and iterators.
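The buffered-detector and decorator patterns can be sketched in plain Python. This is an illustration under stated assumptions, not the toolkit's code: `RollingDetector`, `reject_outliers`, `q_test_rejects`, and the `Q95` table are hypothetical names, the critical values are the commonly tabulated two-tailed 95% ones, and the choice to keep rejected samples out of the buffer is a design decision of this sketch only.

```python
from collections import deque
from functools import wraps

# Assumed two-tailed 95% critical values for Dixon's Q-test, keyed by sample count.
Q95 = {3: 0.970, 4: 0.829, 5: 0.710, 6: 0.625,
       7: 0.568, 8: 0.526, 9: 0.493, 10: 0.466}


def q_test_rejects(buffer, sample):
    """Return True if `sample` is an extreme of buffer + [sample] and fails the Q-test."""
    ordered = sorted(list(buffer) + [sample])
    spread = ordered[-1] - ordered[0]
    if len(ordered) not in Q95 or spread == 0:
        return False  # sample count outside the table, or no spread: accept
    if sample == ordered[0]:
        gap = ordered[1] - ordered[0]
    elif sample == ordered[-1]:
        gap = ordered[-1] - ordered[-2]
    else:
        return False  # only an extreme value can be an outlier
    return gap / spread > Q95[len(ordered)]


class RollingDetector:
    """Hypothetical detector that remembers the last `size` accepted samples."""

    def __init__(self, size=5):
        self.buffer = deque(maxlen=size)

    def is_outlier(self, sample):
        rejected = q_test_rejects(self.buffer, sample)
        if not rejected:
            self.buffer.append(sample)  # only accepted samples enter the buffer
        return rejected


def reject_outliers(size=5):
    """Hypothetical decorator: call the wrapped function until a sample passes."""
    def decorate(func):
        detector = RollingDetector(size)

        @wraps(func)
        def wrapper(*args, **kwargs):
            while True:
                sample = func(*args, **kwargs)  # exceptions from func propagate
                if not detector.is_outlier(sample):
                    return sample
        return wrapper
    return decorate


data = iter([0.1, 1.1, 4.78, 2.0, 7.2, 5.3, 8.1, -14.1, 5.4])


@reject_outliers(size=5)
def pop():
    return next(data)


kept = []
try:
    while True:
        kept.append(pop())
except StopIteration:
    pass  # the source ran out of samples
print(kept)
```

In this sketch the value -14.1 is dropped and every other sample is returned in order. Because the wrapper simply re-raises whatever the wrapped function raises, the caller still sees the end-of-data exception.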

For full documentation, see the doc file.

About

Minimal tool for outlier detection on small sample sets
