jugal101/Data-Analysis-Task-5

Titanic Dataset Exploratory Data Analysis (EDA)

Overview

This repository contains an exploratory data analysis (EDA) of the Titanic dataset, which provides information about the passengers aboard the RMS Titanic and their survival status. The analysis includes data cleaning, visualization, and statistical summaries to uncover patterns and relationships.

Dataset Information

  • Source: Kaggle Titanic Dataset
  • Rows: 891 passengers
  • Columns: 12 features including survival status, passenger class, age, gender, fare, etc.
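A minimal sketch of loading the data and confirming its dimensions with pandas. The default path assumes the CSV sits in the working directory under the name listed in Files; `load_titanic` and `describe_shape` are hypothetical helper names, not functions taken from the notebook.

```python
import pandas as pd

# Assumption: titanic.csv is in the current working directory.
DEFAULT_PATH = "titanic.csv"


def load_titanic(path=DEFAULT_PATH):
    """Read the passenger table into a DataFrame."""
    return pd.read_csv(path)


def describe_shape(df):
    """Return (row count, column count, list of column names)."""
    rows, cols = df.shape
    return rows, cols, list(df.columns)
```

Calling `describe_shape(load_titanic())` on the Kaggle training file should report the 891 rows and 12 columns quoted above.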

Key Findings

Demographic Insights

  • Gender Distribution: 577 males (64.8%) and 314 females (35.2%)
  • Passenger Classes:
    • 3rd Class: 491 passengers (55.1%)
    • 1st Class: 216 passengers (24.2%)
    • 2nd Class: 184 passengers (20.7%)
  • Survival Rate: 342 survived (38.4%), 549 perished (61.6%)
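The counts and percentages above can be produced with a single `value_counts` call per column. The sketch below is one way to do it; `count_with_share` is a hypothetical helper, and the `sex` series is a stand-in rebuilt from the counts quoted above rather than read from the CSV.

```python
import pandas as pd


def count_with_share(series):
    """Count each category and express it as a percentage of non-missing rows."""
    counts = series.value_counts()  # NaN values are dropped by default
    percent = (counts / counts.sum() * 100).round(1)
    return pd.DataFrame({"count": counts, "percent": percent})


# Stand-in for df["Sex"], rebuilt from the quoted counts (577 male, 314 female).
sex = pd.Series(["male"] * 577 + ["female"] * 314, name="Sex")
summary = count_with_share(sex)
print(summary)
```

With the real DataFrame, the same helper would presumably be applied to `df["Sex"]`, `df["Pclass"]`, and `df["Survived"]`.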

Survival Analysis

  • Gender Impact: Women survived at a much higher rate (74.2%) than men (18.9%)
  • Class Impact: 1st class passengers had the highest survival rate (63%)
  • Age Impact: Children (<10 years) had higher survival rates than older passengers
  • Fare Impact: Passengers who paid higher fares were more likely to survive
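Group-wise survival rates like these are typically the mean of the 0/1 `Survived` column within each group. The sketch below assumes the standard Kaggle column names (`Survived`, `Sex`); `survival_rate_by` is a hypothetical helper. The demo frame is rebuilt from this README's own figures: 74.2% of 314 women and 18.9% of 577 men correspond to 233 and 109 survivors, which sum to the 342 reported above.

```python
import pandas as pd


def survival_rate_by(df, column, target="Survived"):
    """Percentage of survivors within each value of `column`."""
    return (df.groupby(column)[target].mean() * 100).round(1)


# Reconstructed from the README's figures, not read from titanic.csv.
demo = pd.DataFrame({
    "Sex": ["female"] * 314 + ["male"] * 577,
    "Survived": [1] * 233 + [0] * 81 + [1] * 109 + [0] * 468,
})
rates = survival_rate_by(demo, "Sex")
print(rates)
```

On the full dataset the same call with `"Pclass"` would give the per-class rates; age and fare would first need binning, for example with `pd.cut`.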

Data Quality

  • Missing values detected:
    • Age: 177 missing (19.9%)
    • Cabin: 687 missing (77.1%)
    • Embarked: 2 missing (0.2%)
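A missing-value table of this kind can be sketched as below. `missing_report` is a hypothetical helper, and the small frame is made-up toy data used only to show the output shape.

```python
import pandas as pd


def missing_report(df):
    """List columns with missing values, with counts and percentage of rows."""
    missing = df.isna().sum()
    report = pd.DataFrame({
        "missing": missing,
        "percent": (missing / len(df) * 100).round(1),
    })
    # Keep only columns that actually have gaps, worst first.
    return report[report["missing"] > 0].sort_values("missing", ascending=False)


# Toy data (not from the dataset): two gaps in Age, three in Cabin, none in Fare.
toy = pd.DataFrame({
    "Age": [22.0, None, 30.0, None],
    "Cabin": [None, None, None, "C85"],
    "Fare": [7.25, 71.28, 8.05, 53.10],
})
report = missing_report(toy)
print(report)
```

Run against the full dataset, the report should list Cabin, Age, and Embarked with the counts given above.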

Visualizations

The analysis includes the following visualizations:

  1. Univariate Analysis:
    • Histograms for numerical variables (Age, Fare, etc.)
    • Boxplots for outlier detection
    • Count plots for categorical variables (Sex, Pclass, Survived)
  2. Bivariate Analysis:
    • Survival rates by gender, class, and embarkation port
    • Age and fare distributions by survival status
  3. Multivariate Analysis:
    • Pairplot showing relationships between numerical variables
    • Correlation heatmap
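As an illustration of the bivariate plots, here is a minimal bar chart of survival rate per group. It uses plain Matplotlib rather than Seaborn to keep the sketch small, so it is not necessarily how the notebook draws its figures; `plot_survival_rate` is a hypothetical helper and the frame is toy data.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd


def plot_survival_rate(df, column, target="Survived"):
    """Draw one bar per group showing the percentage of survivors."""
    rates = df.groupby(column)[target].mean() * 100
    fig, ax = plt.subplots(figsize=(5, 3))
    ax.bar(rates.index.astype(str), rates.values)
    ax.set_xlabel(column)
    ax.set_ylabel("Survival rate (%)")
    ax.set_ylim(0, 100)
    ax.set_title(f"Survival rate by {column}")
    fig.tight_layout()
    return fig, ax


# Toy data (not from the dataset), only to exercise the function.
toy = pd.DataFrame({
    "Sex": ["female", "female", "male", "male"],
    "Survived": [1, 1, 0, 1],
})
fig, ax = plot_survival_rate(toy, "Sex")
```

Inside Jupyter, the `matplotlib.use("Agg")` line can be dropped so that figures render inline.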

Files

  • Titanic_EDA.ipynb: Jupyter Notebook containing the complete analysis
  • titanic.csv: Dataset file
  • Titanic_EDA.pdf: PDF report of findings

How to Run

  1. Clone this repository
  2. Install required packages: pandas, numpy, matplotlib, seaborn
  3. Open and run the Jupyter Notebook
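Before opening the notebook, a quick check like the following can confirm that the packages from step 2 are importable. This is a convenience sketch using only the standard library; `missing_packages` is a hypothetical helper and not part of the repository.

```python
from importlib.util import find_spec

# Package names taken from step 2 above.
REQUIRED = ("pandas", "numpy", "matplotlib", "seaborn")


def missing_packages(names=REQUIRED):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


absent = missing_packages()
if absent:
    print("Install before running the notebook:", ", ".join(absent))
else:
    print("All required packages are available.")
```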

Dependencies

  • Python 3.x
  • Jupyter Notebook
  • Libraries: Pandas, NumPy, Matplotlib, Seaborn

Author

[Your Name]

License

This project is licensed under the MIT License.
