Hybrid Log Classification System

A project showcasing a hybrid approach to log classification that combines rule-based, machine learning, and LLM-based classifiers for robust and accurate log categorization.

Overview

This project implements a three-layer classification pipeline designed for high-volume and diverse log environments:

Regex Classifier - Fast, deterministic rule-based classification
ML Classifier - Probabilistic classification using machine learning models
LLM Classifier - Advanced fallback using Large Language Models for complex cases

Each layer returns a result only when its confidence meets that layer's threshold; otherwise the log is passed to the next layer. This keeps classification fast for clear-cut logs while preserving accuracy for ambiguous ones.

Project Structure

├── app/                          # Main application 
├── models/                       # Trained model artifacts
├── data/                         # Data
├── scripts/                      # Scripts to train ML model
├── requirements.txt              # Python dependencies
├── run_api.sh                    # API startup script
└── README.md                     # This file

Installation

Prerequisites

Python 3.8+
pip

Setup

Clone the repository:

git clone https://github.com/sejalshitole/Log-Classification.git
cd Log-Classification

Create a virtual environment:

python -m venv venv
source venv/bin/activate

Install dependencies:

pip install -r requirements.txt

Configure environment variables:

cp .env.example .env

Edit .env and set your configuration:

GEMINI_API_KEY=your_api_key_here

Usage

Running the API

Start the FastAPI server:

bash run_api.sh

or manually:

uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload

The API will be available at http://127.0.0.1:8000, with interactive documentation at http://127.0.0.1:8000/docs

API Endpoints

Classify a single log:

curl -X POST "http://127.0.0.1:8000/v1/logs/analyze" \
  -H "Content-Type: application/json" \
  -d '{"log": "ERROR [2024-01-01 10:00:00] Database connection failed"}'

Response:

{
  "label": "ERROR",
  "confidence": 0.98,
  "layer": "regex",
  "llm_explanation": null
}
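The same request can be sent from Python. The following is a minimal sketch using only the standard library; it assumes the server is running locally on port 8000 as described above. The helper names `build_analyze_request` and `classify_log` are hypothetical and not part of the project.

```python
import json
import urllib.request

# Endpoint taken from the curl example; adjust host and port for your deployment.
ANALYZE_URL = "http://127.0.0.1:8000/v1/logs/analyze"


def build_analyze_request(log_line: str, url: str = ANALYZE_URL) -> urllib.request.Request:
    """Build the same POST request as the curl example (hypothetical helper)."""
    body = json.dumps({"log": log_line}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def classify_log(log_line: str, timeout: float = 10.0) -> dict:
    """Send one log line to a running server and return the decoded JSON body."""
    request = build_analyze_request(log_line)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage (requires the API server to be running):
# result = classify_log("ERROR [2024-01-01 10:00:00] Database connection failed")
# print(result["label"], result["confidence"], result["layer"])
```

Building the request separately from sending it makes the payload easy to inspect without a live server.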

Batch process CSV file:

curl -X POST "http://127.0.0.1:8000/v1/logs/analyze/batch" \
  -F "file=@logs.csv" \
  -F "log_column=message"
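The batch call above expects a CSV file whose log text lives in the column named by log_column. As a rough sketch, such a file could be produced with Python's csv module. The function name `write_logs_csv` and the sample log lines are illustrative assumptions; only the column name `message` comes from the curl example.

```python
import csv
import io
from typing import Iterable, TextIO


def write_logs_csv(logs: Iterable[str], fileobj: TextIO, column: str = "message") -> int:
    """Write one log line per row under a single header column; return the row count."""
    writer = csv.DictWriter(fileobj, fieldnames=[column])
    writer.writeheader()
    count = 0
    for line in logs:
        writer.writerow({column: line})
        count += 1
    return count


# Example: build the CSV in memory. To create logs.csv on disk instead, use
# open("logs.csv", "w", newline="", encoding="utf-8") as the file object.
sample_logs = [
    "ERROR [2024-01-01 10:00:00] Database connection failed",
    "INFO [2024-01-01 10:00:05] Retrying connection, attempt 2",
]
buffer = io.StringIO()
rows_written = write_logs_csv(sample_logs, buffer)
```

The csv module quotes fields containing commas automatically, so log lines with commas remain in a single column.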

API Response Format

{
  "label": "Type of Log",
  "confidence": 0.0-1.0,
  "layer": "regex|ml|llm",
  "llm_explanation": "Optional explanation from LLM"
}
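On the client side, a response in this format could be validated with a small typed container. This is a sketch under stated assumptions: the class `ClassificationResult` is hypothetical and does not describe the project's own models; the field names and the allowed layer values are taken from the format above.

```python
from dataclasses import dataclass
from typing import Optional

# Layer names as listed in the response format.
VALID_LAYERS = {"regex", "ml", "llm"}


@dataclass(frozen=True)
class ClassificationResult:
    """Client-side view of one classification response (hypothetical helper)."""

    label: str
    confidence: float
    layer: str
    llm_explanation: Optional[str] = None

    @classmethod
    def from_dict(cls, payload: dict) -> "ClassificationResult":
        """Validate a decoded JSON response and convert it to a typed object."""
        confidence = float(payload["confidence"])
        if not 0.0 <= confidence <= 1.0:
            raise ValueError(f"confidence out of range: {confidence}")
        layer = payload["layer"]
        if layer not in VALID_LAYERS:
            raise ValueError(f"unknown layer: {layer!r}")
        return cls(
            label=str(payload["label"]),
            confidence=confidence,
            layer=layer,
            llm_explanation=payload.get("llm_explanation"),
        )


# Example using the response shown for the single-log endpoint.
example = ClassificationResult.from_dict(
    {"label": "ERROR", "confidence": 0.98, "layer": "regex", "llm_explanation": None}
)
```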

Training the ML Model

Train or retrain the machine learning classifier:

python -m scripts.train_ml

Classification Pipeline

The hybrid classifier uses a cascading approach:

Input Log
    ↓
[1] Regex Classifier (fast, deterministic)
    ├─ Confidence ≥ REGEX_CONFIDENCE? → Return
    └─ Confidence < REGEX_CONFIDENCE? → Next layer
    ↓
[2] ML Classifier (probabilistic)
    ├─ Confidence ≥ ML_CONFIDENCE? → Return
    └─ Confidence < ML_CONFIDENCE? → Next layer
    ↓
[3] LLM Classifier (slow, most accurate)
    └─ Return with explanation
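
The cascade above can be sketched in a few lines of Python. This is an illustration of the control flow only, not the project's implementation: the threshold values, the function name `classify`, and the layer call signatures are all assumptions. Each layer is passed in as a callable so the sketch stays independent of any particular regex set, model, or LLM client.

```python
from typing import Callable, Optional, Tuple

# Threshold names follow the diagram; the numeric values are assumed for illustration.
REGEX_CONFIDENCE = 0.9
ML_CONFIDENCE = 0.7

# Assumed layer signatures: regex and ML layers return (label, confidence);
# the LLM layer additionally returns an explanation string.
ScoredLabel = Tuple[Optional[str], float]
ExplainedLabel = Tuple[str, float, str]


def classify(
    log: str,
    regex_layer: Callable[[str], ScoredLabel],
    ml_layer: Callable[[str], ScoredLabel],
    llm_layer: Callable[[str], ExplainedLabel],
    regex_threshold: float = REGEX_CONFIDENCE,
    ml_threshold: float = ML_CONFIDENCE,
) -> dict:
    """Try each layer in order and return the first sufficiently confident result."""
    # [1] Regex: cheap and deterministic, so it runs first.
    label, confidence = regex_layer(log)
    if label is not None and confidence >= regex_threshold:
        return {"label": label, "confidence": confidence,
                "layer": "regex", "llm_explanation": None}

    # [2] ML model: used only when no rule matched with enough confidence.
    label, confidence = ml_layer(log)
    if label is not None and confidence >= ml_threshold:
        return {"label": label, "confidence": confidence,
                "layer": "ml", "llm_explanation": None}

    # [3] LLM: final fallback, always returns a result with an explanation.
    label, confidence, explanation = llm_layer(log)
    return {"label": label, "confidence": confidence,
            "layer": "llm", "llm_explanation": explanation}
```

Because the slower layers are called only after the faster ones decline, the LLM is invoked only for logs that neither the rules nor the model can classify confidently.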
