🧪 Mauzalyzer

Ultimate Data Duplication & Similarity Analyzer — built for developers, analysts, and data engineers who need quick insights from messy, large, or structured data files.

Created by mauzware
Works on Linux 🐧 and Windows 🧩
Fast, powerful, and customizable via CLI ⚙️

📦 Features

🔁 Detects exact duplicates across rows and columns
🔍 Finds similar values using fuzzy matching
🧠 Automatically categorizes values: Numeric, Textual, Mixed, Unknown
📊 Outputs detailed summaries and top frequent values
💾 Exports reports in JSON, TXT, or XML
📁 Supports both CSV and XLSX files
⚡ Modes: Fast, Standard, and Detailed
🧼 Experimental support for messy CSVs (--messy)
🌈 Colorful CLI with optional logging and quiet/debug modes
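
To make the detection features above more concrete, here is a minimal standard-library sketch. It is not Mauzalyzer's actual code: the function names (`categorize`, `exact_duplicates`, `similar_pairs`) and the categorization rules are assumptions, `difflib` stands in for whichever fuzzy matcher the tool really uses, and the threshold of 85 is borrowed from the `similarity_threshold=85` default that appears in the `scan_csv` excerpt in this README.

```python
from collections import Counter
from difflib import SequenceMatcher
from itertools import combinations


def categorize(value):
    """Label a cell as Numeric, Textual, Mixed, or Unknown (assumed rules)."""
    text = "" if value is None else str(value).strip()
    if not text:
        return "Unknown"
    try:
        float(text)
        return "Numeric"
    except ValueError:
        pass
    has_digit = any(ch.isdigit() for ch in text)
    has_alpha = any(ch.isalpha() for ch in text)
    if has_digit and has_alpha:
        return "Mixed"
    return "Textual" if has_alpha else "Unknown"


def exact_duplicates(values):
    """Return {value: count} for every value that occurs more than once."""
    return {v: n for v, n in Counter(values).items() if n > 1}


def similar_pairs(values, threshold=85):
    """Return (a, b, score) for distinct values scoring >= threshold (0-100)."""
    unique = sorted(set(values))
    pairs = []
    for a, b in combinations(unique, 2):
        # Case-insensitive similarity ratio scaled to a 0-100 score.
        score = round(SequenceMatcher(None, a.lower(), b.lower()).ratio() * 100)
        if score >= threshold:
            pairs.append((a, b, score))
    return pairs


column = ["Berlin", "berlin ", "Berlin", "Munich", "42", "A42"]
print([categorize(v) for v in column])
print(exact_duplicates(column))  # {'Berlin': 2}
print(similar_pairs(column))     # [('Berlin', 'berlin ', 92)]
```

Comparing every pair of unique values is quadratic, which may be one reason a tool like this offers a Fast mode that skips the deeper similarity pass.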

🛠️ Installation

Make sure you have Python 3.9+ installed (tested on 3.13+).
Use either pip or pip3, whichever command is linked to your Python 3 installation.

Windows

git clone https://github.com/mauzware/Mauzalyzer.git
cd Mauzalyzer
pip install -r requirements.txt
python mauzalyzer.py --help

Linux Debian/Ubuntu

git clone https://github.com/mauzware/Mauzalyzer.git
cd Mauzalyzer
pip install -r requirements.txt
python mauzalyzer.py --help

Kali Linux

On Kali, all required modules come pre-installed.

git clone https://github.com/mauzware/Mauzalyzer.git
cd Mauzalyzer
python mauzalyzer.py --help

If any modules are missing, install them in one of two ways:

Create a virtual environment (see Virtual Environment Setup) and run: pip3 install -r requirements.txt
Install them individually with apt: sudo apt install python3-[module_name]

Virtual Environment Setup

sudo apt update
sudo apt install python3-venv -y

git clone https://github.com/mauzware/Mauzalyzer.git
cd Mauzalyzer
python3 -m venv Mauzalyzer-env
source Mauzalyzer-env/bin/activate
pip install -r requirements.txt
python mauzalyzer.py --help
deactivate #Run this only when you are done, it leaves the virtual environment

🖥️ Usage

Use either python or python3, whichever command starts Python 3 on your system.

python mauzalyzer.py [OPTIONS] source
python3 mauzalyzer.py [OPTIONS] source

Examples:

python3 mauzalyzer.py Your_File.csv #Basic scan
python mauzalyzer.py Your_File.xlsx #Basic scan
python3 mauzalyzer.py Your_File.csv --detailed #Detailed scan
python mauzalyzer.py --fast Your_File.xlsx #Fast scan
python3 mauzalyzer.py Your_File.csv -o Report_Name --output-format=txt #Saving output in TXT format
python mauzalyzer.py Your_File.xlsx -o Report_Name --output-format=xml #Saving output in XML format

🔧 Basic Options

| Option | Description |
| --- | --- |
| `--fast` | Fast scanning (basic checks only) |
| `--detailed` | Detailed scanning with deep similarity analysis |
| `--type csv/xlsx` | Manually set the file type |
| `--chunksize` | Set custom chunk size for large files |
| `--messy` | Preprocess messy CSV files |
| `-o`, `--output` | Custom output file name |
| `--output-format` | Output format: `json`, `txt`, `xml` |

🛡️ Utility Flags

| Flag | Description |
| --- | --- |
| `--version` | Display version and author info |
| `--help` | Show help screen |
| `--debug` | Enable full debug traceback |
| `--quiet` | Suppress all output |
| `-v`, `--verbose` | Show verbose output |
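
For readers who want to see how the options and flags above fit together, the following `argparse` sketch declares the same names. It is an illustration, not the tool's real parser: the assumption that `--fast` and `--detailed` are mutually exclusive, the `json` default for `--output-format`, the placeholder version string, and the helper `scan_mode` are all guesses. `--help` is not declared because `argparse` adds it automatically.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented options (layout is assumed)."""
    parser = argparse.ArgumentParser(prog="mauzalyzer.py")
    parser.add_argument("source", help="CSV or XLSX file to scan")

    # Assumption: --fast and --detailed cannot be combined; neither means Standard.
    mode = parser.add_mutually_exclusive_group()
    mode.add_argument("--fast", action="store_true")
    mode.add_argument("--detailed", action="store_true")

    parser.add_argument("--type", choices=["csv", "xlsx"])
    parser.add_argument("--chunksize", type=int)
    parser.add_argument("--messy", action="store_true")
    parser.add_argument("-o", "--output")
    # Assumption: json is the default because it is listed first.
    parser.add_argument("--output-format", choices=["json", "txt", "xml"], default="json")

    parser.add_argument("--debug", action="store_true")
    parser.add_argument("--quiet", action="store_true")
    parser.add_argument("-v", "--verbose", action="store_true")
    parser.add_argument("--version", action="version", version="%(prog)s (placeholder version)")
    return parser


def scan_mode(args):
    """Map the two mode flags onto the three documented modes."""
    return "fast" if args.fast else "detailed" if args.detailed else "standard"


args = build_parser().parse_args(
    ["Your_File.csv", "--detailed", "-o", "Report_Name", "--output-format=txt"]
)
print(scan_mode(args), args.output, args.output_format)  # detailed Report_Name txt
```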

📸 Screenshots

💡 Help menu on Linux: [screenshot]

💡 Help menu on Windows: [screenshot]

💡 Mauzalyzer in action: [screenshot]

📂 Output Example

{
  "analysis_date": "2025-04-17T17:03:33",
  "data_source": "Your_Input_File.xlsx",
  "findings": [...],
  "summary": {...}
}

Reports are saved to the data_report/ folder, and each file name includes a timestamp and a hash to keep it unique.
The data_report/ folder is created automatically on first use.
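
One way a timestamp-plus-hash naming scheme like this could work is sketched below. Everything specific here is an assumption: the helper names `report_path` and `save_json_report`, the file name layout, and the choice to hash the source name together with the timestamp using SHA-256. Only the `data_report/` folder and the four top-level JSON keys come from this README.

```python
import hashlib
import json
import tempfile
from datetime import datetime
from pathlib import Path


def report_path(source, fmt="json", out_dir="data_report", now=None):
    """Return a unique report path: <source stem>_<timestamp>_<short hash>.<fmt>."""
    now = now or datetime.now()
    stamp = now.strftime("%Y%m%d_%H%M%S")
    # Assumption: the hash covers the source name and the full timestamp.
    digest = hashlib.sha256(f"{source}|{now.isoformat()}".encode()).hexdigest()[:8]
    folder = Path(out_dir)
    folder.mkdir(parents=True, exist_ok=True)  # created on first use
    return folder / f"{Path(source).stem}_{stamp}_{digest}.{fmt}"


def save_json_report(source, findings, summary, out_dir="data_report", now=None):
    """Write a report using the top-level keys from the output example."""
    now = now or datetime.now()
    report = {
        "analysis_date": now.isoformat(timespec="seconds"),
        "data_source": str(source),
        "findings": findings,
        "summary": summary,
    }
    path = report_path(source, "json", out_dir=out_dir, now=now)
    path.write_text(json.dumps(report, indent=2), encoding="utf-8")
    return path


# Demo in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    saved = save_json_report(
        "Your_Input_File.xlsx", [], {},
        out_dir=Path(tmp) / "data_report",
        now=datetime(2025, 4, 17, 17, 3, 33),
    )
    print(saved.name)
```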

⚡ Bonus: Optional Header Row Removal

This code helps remove repeated or stray header rows inside messy CSVs (usually when a report was exported from Excel or multiple tables were merged).

❗️ When to use:

You scanned a file and noticed odd duplicated values such as "type" or "sale_date"
You know your file contains repeated header rows (for example, you saw them when you opened the file in Excel)

📋 Instructions:

In mauzalyzer.py, find the method 'remove_headers(df)' and edit its 'known_headers' list so that it contains every word you want treated as a header row.
You can add or remove values freely; the list is yours to tune.
Then go to the method 'scan_csv()', find the line '#df = remove_headers(df)', and delete the leading #. Header removal is now enabled.

Example:

def scan_csv(self, similarity_threshold=85):
    try:
        df = self.safe_read_csv(file_path)
        #df = remove_headers(df)  # <-- delete the leading # on this line to enable header removal
        # ... rest of scan_csv() unchanged
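
The body of `remove_headers(df)` is not reproduced in this README, and it operates on a DataFrame. As a rough illustration of the idea only, here is a standard-library sketch that works on rows from `csv.reader` instead. The function name `drop_header_rows`, the matching rule (a row is dropped when every non-empty cell is a known header word), the `amount` column, and the sample data are all assumptions.

```python
import csv
import io

# Hypothetical list; in Mauzalyzer the equivalent list is called known_headers.
KNOWN_HEADERS = {"type", "sale_date", "amount"}


def drop_header_rows(rows, known_headers=KNOWN_HEADERS):
    """Keep the first row as the header and drop later rows that repeat it."""
    known = {h.strip().lower() for h in known_headers}
    cleaned = []
    for index, row in enumerate(rows):
        # Ignore empty cells so partially filled header rows are still caught.
        cells = [c.strip().lower() for c in row if c.strip()]
        is_header_like = bool(cells) and all(c in known for c in cells)
        if index == 0 or not is_header_like:
            cleaned.append(row)
    return cleaned


messy = (
    "type,sale_date,amount\n"
    "book,2025-01-03,12\n"
    "type,sale_date,amount\n"
    "pen,2025-01-04,3\n"
)
rows = list(csv.reader(io.StringIO(messy)))
print(drop_header_rows(rows))
```

Requiring every non-empty cell to match keeps ordinary data rows that merely contain one header word, such as a product literally named "type".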

🚧 Future Plans: Mauzalyzer v2.0 (coming soon...)

Mauzalyzer Engineers are already cooking up new features for v2.0. Stay tuned! 👾

👁️ Better schema detection for extremely messy files
🗂️ Support for more formats: JSON, XML, TXT (as inputs)
🎛️ GUI mode (TBD)
🔧 Interactive mode for manual value inspection
⚙️ Additional CLI support

👨‍💻 Author

mauzware

📜 License

This project is open-source and distributed under the terms of the MIT License. You are free to use, modify, and distribute it with proper attribution.

All kudos go to my professor, who taught me everything I know. I think she will be proud of me for using this many emojis. 😅
To all my friends who supported me on this wonderful journey — I haven't forgotten you, folks. Big thanks and much love to all of you! ❤️
