SDPython's Data Cleaning & Preprocessing Toolkit 🛠️

📑 Table of Contents

Overview
Features
Installation
- Creating a Virtual Environment
- Installing Dependencies
Modules
Examples
Contributing
License
Contact

🌟 Overview

This Data Cleaning & Preprocessing Toolkit is a Python library that makes the data wrangling phase of your Data Science projects as smooth as butter. It is not meant to replace your favorite libraries (pandas, nltk); it extends them with a few functions that I found myself writing over and over. The focus is on code reusability and ease of use, so the helpers can be dropped into an existing workflow with little setup.

"In God we trust. All others must bring data." - W. Edwards Deming (Ostensibly).

🎁 Features

Missing Value Handling: Say goodbye to missing data woes.
Outlier Management: Handle outliers like a pro.
Feature Engineering: Transform raw data into insights.
Text Cleaning: Get your text data in shape.
Data Integrity Checks: Ensure your data is clean and reliable.
Class Imbalance Handling: Balance your datasets like a yogi.

🛠️ Installation

Creating a Virtual Environment

Open your terminal and run:
```
python3 -m venv SDPython
```

This will create a new Python virtual environment named SDPython.

Activate the virtual environment:
- macOS and Linux:
```
source SDPython/bin/activate
```
- Windows:
```
.\SDPython\Scripts\Activate
```
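
To confirm that activation worked, one quick check from Python is to compare `sys.prefix` with `sys.base_prefix`; inside an environment created by `venv` the two differ. This is only a small sketch, and the helper name `in_virtualenv` is made up for illustration (it is not part of this toolkit).

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter is running inside a venv."""
    # In a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("interpreter prefix:", sys.prefix)
```

If the first line prints `False`, the environment is probably not activated in the current shell.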

Installing Dependencies

After activating the virtual environment, navigate to the directory where requirements.txt is located and run:

```
pip install -r requirements.txt
```

This will install all the required packages.
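
To double-check the install, a short script along these lines reports which packages can be imported. The package list is an assumption based on the libraries named in the Overview (pandas, nltk); the authoritative list is whatever requirements.txt contains, and `check_packages` is a hypothetical helper, not a function shipped with the toolkit.

```python
from importlib.util import find_spec

# Assumed package names, taken from the Overview; adjust to match requirements.txt.
EXPECTED_PACKAGES = ["pandas", "nltk"]


def check_packages(names):
    """Map each top-level package name to True if it can be imported."""
    return {name: find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, ok in check_packages(EXPECTED_PACKAGES).items():
        print(f"{name}: {'installed' if ok else 'MISSING'}")
```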

📦 Modules

missing_values.py: Impute and manage missing values.
outliers.py: Detect and handle outliers effectively.
feature_engineering.py: Tools for feature creation and transformation.
text_cleaning.py: Essential text cleaning operations.
integrity_checks.py: Functions for checking data integrity.
imbalance_handling.py: Address class imbalance problems.

Check out the examples/ folder for usage examples!
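
To give a flavor of the kind of helper these modules collect, here is a minimal sketch of interquartile-range (IQR) outlier detection using only the standard library. It is not the code in outliers.py: the function name `iqr_outliers` and the default multiplier of 1.5 (the conventional Tukey fence) are assumptions made for illustration.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    # quantiles(..., n=4) returns the three quartile cut points.
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # Keep the original order of the input when reporting outliers.
    return [v for v in values if v < lower or v > upper]


if __name__ == "__main__":
    data = [10, 12, 11, 13, 12, 11, 95]
    print(iqr_outliers(data))  # [95]
```

With pandas, the same fences could be computed from `Series.quantile(0.25)` and `Series.quantile(0.75)`; note that pandas interpolates quantiles differently by default, so borderline values may be classified differently.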

📖 Examples

Visit the examples/ folder for Jupyter Notebooks demonstrating how to use each module.

🔍 Each example walks you through the process, explaining every step.

👥 Contributing

We love contributions! Please see CONTRIBUTING.md for details on how you can contribute.

📜 License

This project is licensed under the MIT License - see the LICENSE file for details.

📞 Contact

Website - jasongodfrey.info
Email - jason.godfrey@accelerate.com

If you like this project, please give it a ⭐ on GitHub! 😊
