
chinmaynadgir/DataPreprocessing-and-ML-operations


Data Science Portfolio - Chinmay S Nadgir

Overview

This repository contains five publication-quality Jupyter notebooks that demonstrate end-to-end data science workflows, with a focus on practical, production-oriented techniques. The modules cover:

Module 1: Data Loading

Techniques and best practices to ingest data efficiently from multiple file formats with robust error handling and memory optimization.
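
As a rough illustration of the idea (not the notebook's actual code), the sketch below streams a CSV file row by row and skips malformed rows instead of aborting the load. It assumes only the Python standard library; the function name `iter_valid_rows` and the sample columns are hypothetical.

```python
import csv
import io
from typing import Iterator, List, TextIO


def iter_valid_rows(handle: TextIO, expected_columns: int) -> Iterator[List[str]]:
    """Yield data rows one at a time, skipping rows with the wrong column count."""
    reader = csv.reader(handle)
    next(reader, None)  # skip the header row
    for row in reader:
        if len(row) != expected_columns:
            continue  # malformed row: skip it rather than fail the whole load
        yield row


# Hypothetical sample data; the second data row is missing a field.
sample = io.StringIO("id,age,charges\n1,34,1200.5\n2,41\n3,29,980.0\n")
rows = list(iter_valid_rows(sample, expected_columns=3))
print(rows)
```

Because the function is a generator, only one row is held in memory at a time until the caller chooses to collect the results.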

Module 2: Data Preprocessing

Systematic data cleaning, missing value treatment, outlier detection, feature engineering, and categorical encoding strategies.
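
Two of these steps, missing value treatment and outlier detection, can be sketched with the standard library alone. This is a minimal example under assumed choices (median imputation and the 1.5 × IQR rule); the notebook may use different methods, and the helper names and sample values are hypothetical.

```python
import statistics


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]


def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR below Q1 or above Q3."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical column with one missing entry and one extreme value.
raw = [10, 12, None, 11, 13, 95, 12]
filled = impute_median(raw)
outliers = iqr_outliers(filled)
print(filled, outliers)
```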

Module 3: Statistics & Probability

Core statistical concepts including hypothesis testing, probability distributions, correlation analysis, and regression modeling with rigorous interpretation.
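
For instance, correlation analysis and simple linear regression reduce to a few sums. The sketch below computes the Pearson coefficient and an ordinary least-squares line by hand, assuming plain Python lists; the function names and the data are hypothetical and not taken from the notebook.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var_x
    return slope, mean_y - slope * mean_x


# Hypothetical, nearly linear data.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
r = pearson_r(xs, ys)
slope, intercept = fit_line(xs, ys)
print(r, slope, intercept)
```

A coefficient close to 1 indicates a strong positive linear relationship; it says nothing about causation.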

Module 4: Data Visualization

Design and implementation of static and interactive visualizations following best practices to enhance data understanding and communication.

Module 5: Exploratory Data Analysis (EDA)

A business-driven, end-to-end EDA workflow from data quality assessment to hypothesis testing, insights extraction, and actionable recommendations.


Other Data Science Projects

In addition to the modules above, prior data science work includes foundational projects in data mining and machine learning covering:

  • Data Preprocessing: Comprehensive handling of raw, noisy, and missing data; transformation methods such as normalization and discretization; and dimensionality reduction techniques for large datasets.

  • Algorithm Implementations: Application of the Apriori algorithm for association rule mining, used to discover frequent itemsets in market basket data, and implementation of K-means clustering on insurance policy data for customer segmentation and risk analysis.

  • Datasets Used:

    • Grocery shopping dataset (~9,800 rows, 32 features) for frequent itemset mining and association analysis.
    • Insurance policy dataset (~1,340 rows, 7 features) for unsupervised clustering and premium prediction.
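
To make the frequent itemset mining step concrete, here is a minimal, unoptimized sketch of the Apriori level-wise search. It is not the project's implementation: the function name `apriori`, the support threshold, and the four toy baskets are all assumptions for illustration, and it uses absolute support counts rather than fractions.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: count} for itemsets found in at least min_support transactions."""
    transactions = [frozenset(t) for t in transactions]
    items = {item for t in transactions for item in t}
    current = {frozenset([item]) for item in items}  # size-1 candidates
    frequent = {}
    k = 1
    while current:
        # Count how many transactions contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: union pairs of frequent itemsets into size-k candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if every (k-1)-subset is frequent.
        current = {c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent


# Hypothetical toy baskets, not rows from the grocery dataset.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
]
result = apriori(baskets, min_support=2)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

The prune step relies on the Apriori property: an itemset can only be frequent if all of its subsets are frequent, which keeps the candidate set small as k grows.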



About

Data preprocessing and machine learning operations on sample datasets.
