GitHub - rrunner/log_priority_queue: Assign priority to log messages using UML and NLP techniques

Log message priority queue application

Introduction

This application builds a priority queue for system logs using Unsupervised Machine Learning (UML) and Natural Language Processing (NLP) techniques.

The objective is to classify log messages by rules and outlier characteristics, so that the log messages generated by a system can be prioritised.

Usage

In the terminal:

Activate the virtual environment

Activate the virtual environment and install the dependencies listed in the requirements.txt file.

Run the application

The application accepts input from Excel files and database tables.

    python log_priority_queue.py --help

High-level methodology description

A basic TF-IDF vectorizer is fitted to the training data, and the same vectorizer is then used to transform the prediction data. This maps the textual data to points in a coordinate system, i.e. each log message becomes a vector whose dimensionality is set by the number of unique words in the training and prediction data.
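The vectorization step can be sketched with the standard library alone. This is a minimal illustration, not the application's actual code: the function names, the whitespace tokenizer, the smoothed IDF formula and the sample messages are all assumptions. The sketch also assumes the vocabulary is learned from the training messages only, so words seen only in the prediction data are dropped; fitting on the combined data would only change the argument passed to `fit_tfidf`.

```python
import math
from collections import Counter


def tokenize(message):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return message.lower().split()


def fit_tfidf(train_messages):
    """Learn the vocabulary and smoothed IDF weights from the training messages."""
    docs = [set(tokenize(m)) for m in train_messages]
    vocabulary = sorted(set().union(*docs))
    n_docs = len(docs)
    idf = {}
    for term in vocabulary:
        doc_freq = sum(1 for d in docs if term in d)
        # Smoothed IDF (an assumed variant), so every weight stays positive.
        idf[term] = math.log((1 + n_docs) / (1 + doc_freq)) + 1.0
    return vocabulary, idf


def transform_tfidf(messages, vocabulary, idf):
    """Map each message to an L2-normalised TF-IDF vector over the vocabulary."""
    vectors = []
    for message in messages:
        counts = Counter(tokenize(message))
        raw = [counts[term] * idf[term] for term in vocabulary]
        norm = math.sqrt(sum(x * x for x in raw))
        vectors.append([x / norm for x in raw] if norm > 0 else raw)
    return vectors


# Hypothetical sample messages, for illustration only.
train = ["backup job finished ok", "backup job failed", "disk usage at 91 percent"]
predict = ["backup job failed", "kernel panic detected"]

vocabulary, idf = fit_tfidf(train)
train_vectors = transform_tfidf(train, vocabulary, idf)
predict_vectors = transform_tfidf(predict, vocabulary, idf)
```

Under these assumptions every vector has one component per vocabulary word, and a prediction message made up entirely of unseen words maps to the zero vector.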

The training data is fitted with unsupervised K-means clustering, where the number of clusters is allowed to increase as long as each cluster contains a certain number of log messages. The model is trained each time the application is run. By fitting the model on each run, decommissioned systems, batch load jobs etc. are excluded from training by design.
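One way to read the cluster-growing rule is sketched below with a small pure-Python K-means. The stopping rule (stop at the first cluster count that produces a cluster below the minimum size), the parameter names `min_cluster_size` and `max_k`, and the fixed random seed are assumptions made for this illustration; the application may select the number of clusters differently.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def fit_growing_kmeans(points, min_cluster_size, max_k=10):
    """Increase k until some cluster would fall below min_cluster_size."""
    best_labels = [0] * len(points)  # k = 1: everything in one cluster
    for k in range(2, min(max_k, len(points)) + 1):
        labels = kmeans(points, k)
        sizes = [labels.count(c) for c in range(k)]
        if min(sizes) < min_cluster_size:
            break
        best_labels = labels
    return best_labels
```

With six points forming two tight groups of three and `min_cluster_size=3`, this sketch settles on two clusters, because any split into three clusters must leave at least one cluster with fewer than three points.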

An outlier is defined as a prediction data point that deviates from all training data points in terms of cosine similarity.
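The outlier definition can be illustrated as follows. This sketch assumes that "deviates from all training data points" means the highest cosine similarity to any training vector falls below a cut-off; the function names and the `threshold` value of 0.3 are hypothetical and not taken from the application.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_outlier(vector, train_vectors, threshold=0.3):
    """Flag a point whose best match in the training data is below threshold."""
    best = max(cosine_similarity(vector, t) for t in train_vectors)
    return best < threshold
```

For example, with training vectors `[1, 0, 0]` and `[0.8, 0.6, 0]`, the point `[0.9, 0.1, 0]` is close to the first training vector and is not flagged, while `[0, 0, 1]` is orthogonal to both and is flagged.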

The idea is to assign each prediction data point a priority, on the basis of rules and outlier statistics, so that the whole batch of prediction data can be arranged by priority. Outliers and 'failure' items are assigned a high priority. Repeated log messages and 'success' items are assigned a lower priority.
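The prioritisation step might look roughly like the sketch below. The three priority levels, the keyword rules (`"fail"`, `"success"`), the `ScoredMessage` fields and the sample batch are all assumptions chosen for illustration; the application's real rules are not described here.

```python
from dataclasses import dataclass

HIGH, MEDIUM, LOW = 0, 1, 2  # lower number sorts first


@dataclass
class ScoredMessage:
    text: str
    is_outlier: bool        # result of an outlier check on the message vector
    seen_in_training: bool  # True if the same message occurred in the training data


def assign_priority(item):
    """Outliers and failures rank high; repeats and successes rank low."""
    text = item.text.lower()
    if item.is_outlier or "fail" in text:
        return HIGH
    if item.seen_in_training or "success" in text:
        return LOW
    return MEDIUM


def prioritise(items):
    """Return the batch ordered by priority (stable within a priority level)."""
    return sorted(items, key=assign_priority)


# Hypothetical batch of prediction data.
batch = [
    ScoredMessage("Load job completed with success", False, True),
    ScoredMessage("Connection to db FAILED", False, False),
    ScoredMessage("Unrecognised token xyz", True, False),
    ScoredMessage("Heartbeat received", False, False),
]
ordered = prioritise(batch)
```

In this sketch the failure and the outlier come first, the unclassified heartbeat message sits in the middle, and the repeated success message comes last.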

Technical details

Logging

The application includes logging. Each run produces a logfile that is stored in the /log subfolder.

Testing

The application includes a test suite using pytest.

