ja-godfrey/SDPython-toolkit

SDPython's Data Cleaning & Preprocessing Toolkit 🛠️





🌟 Overview

This Data Cleaning & Preprocessing Toolkit is a Python library that makes the data wrangling phase of your Data Science projects as smooth as butter. It is not meant to replace your favorite libraries (pandas, nltk); it extends them with a few functions that I found myself writing over and over. With its focus on code reusability and ease of use, this toolkit is a handy addition to your Data Science arsenal.

"In God we trust. All others must bring data." - W. Edwards Deming (Ostensibly).


🎁 Features

  • Missing Value Handling: Say goodbye to missing data woes.
  • Outlier Management: Handle outliers like a pro.
  • Feature Engineering: Transform raw data into insights.
  • Text Cleaning: Get your text data in shape.
  • Data Integrity Checks: Ensure your data is clean and reliable.
  • Class Imbalance Handling: Balance your datasets like a yogi.
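
As a rough illustration of the kind of helper the first two features refer to, here is a minimal sketch in plain pandas. The function names `fill_missing_with_median` and `clip_outliers_iqr` are hypothetical and are not the toolkit's API; see the examples/ folder for the actual modules.

```python
import pandas as pd


def fill_missing_with_median(df: pd.DataFrame, columns: list[str]) -> pd.DataFrame:
    """Return a copy of df with NaNs in the given numeric columns
    replaced by each column's median. (Hypothetical helper.)"""
    out = df.copy()
    for col in columns:
        out[col] = out[col].fillna(out[col].median())
    return out


def clip_outliers_iqr(df: pd.DataFrame, column: str, k: float = 1.5) -> pd.DataFrame:
    """Return a copy of df with values in `column` clipped to
    [Q1 - k*IQR, Q3 + k*IQR]. (Hypothetical helper.)"""
    out = df.copy()
    q1 = out[column].quantile(0.25)
    q3 = out[column].quantile(0.75)
    iqr = q3 - q1
    out[column] = out[column].clip(lower=q1 - k * iqr, upper=q3 + k * iqr)
    return out


# Toy data: one missing value and one obvious outlier.
df = pd.DataFrame({"score": [10.0, 12.0, None, 11.0, 95.0]})

filled = fill_missing_with_median(df, ["score"])
clipped = clip_outliers_iqr(filled, "score")

print(filled["score"].tolist())   # [10.0, 12.0, 11.5, 11.0, 95.0]
print(clipped["score"].tolist())  # [10.0, 12.0, 11.5, 11.0, 13.5]
```

Both helpers return a copy rather than modifying the input, so the original DataFrame stays available for comparison.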

🛠️ Installation

Creating a Virtual Environment

  1. Open your terminal and run:

    python3 -m venv SDPython
    

This will create a new Python virtual environment named SDPython.

  2. Activate the virtual environment:

    • macOS and Linux:

      source SDPython/bin/activate
    • Windows:

      .\SDPython\Scripts\Activate

Installing Dependencies

After activating the virtual environment, navigate to the directory where requirements.txt is located and run:

pip install -r requirements.txt

This will install all the required packages.
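
Before opening the notebooks, it can help to confirm that Python is running from the SDPython environment and that the packages were installed. The sketch below uses only the standard library. The package names checked (`pandas`, `nltk`) are an assumption based on the overview, since the contents of requirements.txt are not shown here.

```python
import importlib.util
import sys


def in_virtualenv() -> bool:
    """True when the interpreter runs inside a venv
    (its prefix differs from the base installation's prefix)."""
    return sys.prefix != sys.base_prefix


def missing_packages(names: list[str]) -> list[str]:
    """Return the top-level package names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print("Interpreter:", sys.executable)
print("Virtual environment active:", in_virtualenv())

# Assumed package names; adjust to match requirements.txt.
print("Missing packages:", missing_packages(["pandas", "nltk"]))
```

If the script reports that no virtual environment is active, re-run the activation command for your platform and try again.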

📦 Modules

Check out the examples/ folder for usage examples!


📖 Examples

Visit the examples/ folder for Jupyter Notebooks demonstrating how to use each module.

🔍 Each example walks you through the process, explaining every step.


👥 Contributing

We love contributions! Please see CONTRIBUTING.md for details on how you can contribute.


📜 License

This project is licensed under the MIT License - see the LICENSE.md file for details.


📞 Contact

If you like this project, please give it a ⭐ on GitHub! 😊

About

A Python toolkit that extends the functionality of Python data analysis packages with easy-to-use scripts, aimed at helping other fellows in the Strategic Data Project.
