NoebejaraPaul/sql-data-cleaning

📉 Layoffs Data Cleaning

Introduction

This project focuses on cleaning a dataset of global layoffs from major companies (2020-2023). Using SQL, I ensured the data is clean, standardized, and ready for analysis.

πŸ” Check out the SQL queries used in this project: SQL Cleaning Scripts

🔎 Background

Handling real-world data often involves dealing with duplicates, inconsistencies, and missing values. This project applies a structured SQL-based cleaning process to prepare the layoffs dataset for meaningful analysis.

🛠 Data Cleaning Process

🔹 Removing Duplicates

  • Used ROW_NUMBER() with PARTITION BY to identify duplicate records
  • Created a staging table (layoffs_staging2) to store deduplicated data
  • Ensured only unique entries were retained
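The deduplication step above could look roughly like the following sketch in MySQL 8. The `_demo` table names, the column list, and the sample rows are hypothetical stand-ins chosen for illustration; the actual script may partition on a different set of columns.

```sql
-- Hypothetical miniature of the raw table; column names are assumptions.
DROP TABLE IF EXISTS layoffs_staging_demo;
CREATE TABLE layoffs_staging_demo (
    company        VARCHAR(100),
    location       VARCHAR(100),
    industry       VARCHAR(100),
    total_laid_off INT,
    `date`         VARCHAR(20)
);

INSERT INTO layoffs_staging_demo VALUES
    ('Acme',   'Austin', 'Retail',  50, '1/15/2023'),
    ('Acme',   'Austin', 'Retail',  50, '1/15/2023'),  -- exact duplicate
    ('Globex', 'Berlin', 'Finance', 20, '3/2/2022');

-- Number the rows inside each group of identical values, then keep only
-- the first row of every group in a second staging table.
DROP TABLE IF EXISTS layoffs_staging2_demo;
CREATE TABLE layoffs_staging2_demo AS
SELECT company, location, industry, total_laid_off, `date`
FROM (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY company, location, industry, total_laid_off, `date`
           ) AS row_num
    FROM layoffs_staging_demo
) AS ranked
WHERE row_num = 1;

-- Each company should now appear exactly once.
SELECT company, COUNT(*) AS row_count
FROM layoffs_staging2_demo
GROUP BY company;
```

Filtering on `row_num = 1` while building the second table avoids a separate `DELETE`, which is one possible design; storing `row_num` in the staging table and deleting rows where it exceeds 1 would work as well.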

🔹 Standardizing Data

  • Fixed spelling inconsistencies in company names, locations, and industries
  • Standardized text formats
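As an illustration of the standardization step, here is a minimal sketch in MySQL. The table `standardize_demo`, its rows, and the specific inconsistencies (stray whitespace, label variants, a trailing period) are invented examples, not values taken from the dataset.

```sql
-- Hypothetical sample with typical inconsistencies.
DROP TABLE IF EXISTS standardize_demo;
CREATE TABLE standardize_demo (
    company  VARCHAR(100),
    location VARCHAR(100),
    industry VARCHAR(100)
);

INSERT INTO standardize_demo VALUES
    ('  Acme',  'Austin',  'Crypto Currency'),
    ('Globex ', 'Berlin.', 'CryptoCurrency'),
    ('Initech', 'Berlin',  'Crypto');

-- Note: MySQL Workbench's safe-update mode may reject UPDATE statements
-- whose WHERE clause does not use a key column.

-- Strip leading and trailing whitespace from company names.
UPDATE standardize_demo
SET company = TRIM(company);

-- Collapse spelling variants of one industry into a single label.
UPDATE standardize_demo
SET industry = 'Crypto'
WHERE industry LIKE 'Crypto%';

-- Remove a stray trailing period from location names.
UPDATE standardize_demo
SET location = TRIM(TRAILING '.' FROM location)
WHERE location LIKE '%.';

-- Inspect the distinct values that remain.
SELECT DISTINCT company, location, industry
FROM standardize_demo
ORDER BY company;
```

Running a `SELECT DISTINCT` on each text column before and after the updates is a simple way to spot variants that still need merging.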

🔹 Handling Missing Values

  • Replaced missing values where necessary
  • Removed rows with crucial missing data
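One way to replace missing values is to copy them from another row that describes the same company. The sketch below assumes that approach; the table `missing_demo`, its columns, and its rows are hypothetical, and the README does not state which technique the actual script uses.

```sql
-- Hypothetical sample: the same company appears with and without an industry.
DROP TABLE IF EXISTS missing_demo;
CREATE TABLE missing_demo (
    company  VARCHAR(100),
    location VARCHAR(100),
    industry VARCHAR(100)
);

INSERT INTO missing_demo VALUES
    ('Acme',   'Austin', 'Retail'),
    ('Acme',   'Austin', ''),      -- blank instead of NULL
    ('Globex', 'Berlin', NULL);    -- no other row to copy from

-- Note: MySQL Workbench's safe-update mode may reject UPDATE statements
-- whose WHERE clause does not use a key column.

-- Treat blank strings as missing so one NULL check covers both cases.
UPDATE missing_demo
SET industry = NULL
WHERE industry = '';

-- Self-join: fill a missing industry from a matching row that has one.
UPDATE missing_demo AS t1
JOIN missing_demo AS t2
    ON  t1.company  = t2.company
    AND t1.location = t2.location
SET t1.industry = t2.industry
WHERE t1.industry IS NULL
  AND t2.industry IS NOT NULL;

-- Rows that could not be filled keep industry = NULL.
SELECT company, location, industry
FROM missing_demo
ORDER BY company;
```

Rows with no matching counterpart, such as the Globex row here, stay `NULL` and can then be reviewed or removed separately.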

🔹 Filtering Unwanted Rows

  • Removed incomplete and erroneous records
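A sketch of how incomplete rows might be filtered out, assuming a record counts as unusable when both of its layoff measures are missing. The table `filter_demo`, the `percentage_laid_off` column, and the sample rows are hypothetical; the criteria in the actual script may differ.

```sql
-- Hypothetical sample: one complete row, one partial row, one empty row.
DROP TABLE IF EXISTS filter_demo;
CREATE TABLE filter_demo (
    company             VARCHAR(100),
    total_laid_off      INT,
    percentage_laid_off DECIMAL(5, 2)
);

INSERT INTO filter_demo VALUES
    ('Acme',    50,   0.10),
    ('Globex',  NULL, NULL),   -- carries no layoff information at all
    ('Initech', NULL, 0.25);   -- partial information, kept

-- Note: MySQL Workbench's safe-update mode may reject DELETE statements
-- whose WHERE clause does not use a key column.

-- Drop rows where both measures are missing.
DELETE FROM filter_demo
WHERE total_laid_off IS NULL
  AND percentage_laid_off IS NULL;

-- List the rows that survive the filter.
SELECT company, total_laid_off, percentage_laid_off
FROM filter_demo
ORDER BY company;
```

Requiring both columns to be `NULL` keeps rows with partial information; a stricter rule would use `OR` instead of `AND`.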

βš™οΈ Tools Used

  • 🛢 MySQL Workbench for SQL queries and data cleaning
  • 📂 Kaggle for dataset sourcing

📚 What I Learned

  • The importance of data cleaning in ensuring accuracy and reliability for analysis
  • How to use SQL functions like ROW_NUMBER() to detect duplicates efficiently
  • Best practices for handling missing data and maintaining data integrity
  • The significance of standardizing values to avoid inconsistencies

πŸ” Conclusion

Data cleaning is a crucial step in any data analysis project. By applying structured SQL techniques, I transformed raw, messy data into a reliable dataset suitable for insights and decision-making.

💡 Closing Thoughts

This project reinforced the importance of systematic data cleaning, preparing me for future data-driven projects. Next steps include automating the cleaning process and integrating it with visualization tools for deeper analysis.
