This repository contains an exploratory data analysis (EDA) of the Titanic dataset, which provides information about the passengers aboard the RMS Titanic and their survival status. The analysis includes data cleaning, visualization, and statistical summaries to uncover patterns and relationships.
- Source: Kaggle Titanic Dataset
- Rows: 891 passengers
- Columns: 12 features including survival status, passenger class, age, gender, fare, etc.
- Gender Distribution: 577 males (64.8%) and 314 females (35.2%)
- Passenger Classes:
  - 3rd Class: 491 passengers (55.1%)
  - 1st Class: 216 passengers (24.2%)
  - 2nd Class: 184 passengers (20.7%)
- Survival Rate: 342 survived (38.4%), 549 perished (61.6%)
- Gender Impact: Females had a much higher survival rate (74.2%) than males (18.9%)
- Class Impact: 1st class passengers had the highest survival rate (63%)
- Age Impact: Children (under 10 years) had higher survival rates than older passengers
- Fare Impact: Passengers who paid higher fares had better survival chances
- Missing values detected:
  - Age: 177 missing (19.9%)
  - Cabin: 687 missing (77.1%)
  - Embarked: 2 missing (0.2%)
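
The survival rates and missing-value counts listed above can be reproduced with a few lines of pandas. The following is a minimal sketch, not the notebook's actual code: the helper names `survival_rate_by` and `missing_summary` are hypothetical, the age bins are an assumption, and the tiny `sample` frame is made-up data standing in for `titanic.csv` (it only borrows the Kaggle column names `Survived`, `Sex`, `Age`, and `Cabin`).

```python
import pandas as pd


def survival_rate_by(df: pd.DataFrame, column: str) -> pd.Series:
    """Percentage of survivors within each group of `column` (hypothetical helper)."""
    # Survived is 0/1, so the group mean is the survival rate.
    return (df.groupby(column, observed=True)["Survived"].mean() * 100).round(1)


def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Count and percentage of missing values per column (hypothetical helper)."""
    missing = df.isna().sum()
    percent = (missing / len(df) * 100).round(1)
    summary = pd.DataFrame({"missing": missing, "percent": percent})
    # Keep only columns that actually have gaps, largest first.
    return summary[summary["missing"] > 0].sort_values("missing", ascending=False)


# Made-up rows for illustration only; replace with pd.read_csv("titanic.csv").
sample = pd.DataFrame(
    {
        "Survived": [1, 0, 1, 0, 1, 0],
        "Sex": ["female", "male", "female", "male", "female", "female"],
        "Age": [4.0, 30.0, None, 40.0, 8.0, 60.0],
        "Cabin": ["C85", None, None, None, "E46", None],
    }
)

by_sex = survival_rate_by(sample, "Sex")

# Assumed age bins; the left edge is inclusive, so "<10" covers ages 0 to 9.
sample["AgeGroup"] = pd.cut(
    sample["Age"],
    bins=[0, 10, 18, 60, 120],
    right=False,
    labels=["<10", "10-17", "18-59", "60+"],
)
by_age = survival_rate_by(sample, "AgeGroup")

gaps = missing_summary(sample)

print(by_sex)
print(by_age)
print(gaps)
```

Rows with a missing `Age` get no age group and are left out of the age breakdown, which is worth keeping in mind given how many ages are missing in the real dataset.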
The analysis includes the following visualizations:
- Univariate Analysis:
  - Histograms for numerical variables (Age, Fare, etc.)
  - Boxplots for outlier detection
  - Count plots for categorical variables (Sex, Pclass, Survived)
- Bivariate Analysis:
  - Survival rates by gender, class, and embarkation port
  - Age and fare distributions by survival status
- Multivariate Analysis:
  - Pairplot showing relationships between numerical variables
  - Correlation heatmap
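
As a rough illustration of one plot from each group, the sketch below draws an age histogram, a survival-rate bar chart by sex, and a correlation heatmap. It is an assumption-laden example rather than the notebook's code: it uses plain Matplotlib instead of Seaborn to keep dependencies small, the function name `plot_overview` is hypothetical, and `sample` is made-up data that only borrows the Kaggle column names.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend for scripts; drop this line inside a notebook
import matplotlib.pyplot as plt
import pandas as pd


def plot_overview(df: pd.DataFrame):
    """Draw one univariate, one bivariate, and one multivariate plot (hypothetical helper)."""
    fig, axes = plt.subplots(1, 3, figsize=(15, 4))

    # Univariate: distribution of passenger ages (missing ages dropped).
    axes[0].hist(df["Age"].dropna(), bins=20)
    axes[0].set_title("Age distribution")
    axes[0].set_xlabel("Age")

    # Bivariate: share of survivors per sex.
    rates = df.groupby("Sex")["Survived"].mean()
    axes[1].bar(list(rates.index), rates.to_numpy())
    axes[1].set_title("Survival rate by sex")
    axes[1].set_ylim(0, 1)

    # Multivariate: pairwise Pearson correlations between numeric columns.
    corr = df[["Survived", "Pclass", "Age", "Fare"]].corr()
    image = axes[2].imshow(corr.to_numpy(), vmin=-1, vmax=1, cmap="coolwarm")
    axes[2].set_xticks(range(len(corr.columns)))
    axes[2].set_xticklabels(corr.columns, rotation=45)
    axes[2].set_yticks(range(len(corr.columns)))
    axes[2].set_yticklabels(corr.columns)
    axes[2].set_title("Correlation heatmap")
    fig.colorbar(image, ax=axes[2])

    fig.tight_layout()
    return fig, corr


# Made-up rows for illustration only; replace with pd.read_csv("titanic.csv").
sample = pd.DataFrame(
    {
        "Survived": [1, 0, 1, 0, 1],
        "Pclass": [1, 3, 2, 3, 1],
        "Sex": ["female", "male", "female", "male", "male"],
        "Age": [29.0, 40.0, None, 22.0, 4.0],
        "Fare": [80.0, 7.25, 13.0, 8.05, 120.0],
    }
)

fig, corr = plot_overview(sample)
fig.savefig("overview.png")  # assumed output file name
```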
- Titanic_EDA.ipynb: Jupyter Notebook containing the complete analysis
- titanic.csv: Dataset file
- Titanic_EDA.pdf: PDF report of findings
- Clone this repository
- Install required packages: `pandas`, `numpy`, `matplotlib`, `seaborn`
- Open and run the Jupyter Notebook
- Python 3.x
- Jupyter Notebook
- Libraries: Pandas, NumPy, Matplotlib, Seaborn
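
Before opening the notebook, it can help to confirm that the libraries listed above are importable. This is a small optional sketch using only the standard library; the function name `missing_packages` is hypothetical and not part of this repository.

```python
from importlib.util import find_spec

# Import names of the libraries the notebook relies on.
REQUIRED = ("pandas", "numpy", "matplotlib", "seaborn")


def missing_packages(names=REQUIRED):
    """Return the names from `names` that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages()
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are installed.")
```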
[Your Name]
This project is licensed under the MIT License.