mnmzz/CCMACLRL_COM231_PROJECT

💫 WWE Superstar Popularity Tier Prediction

Team Name: Artificial Ledger  

Subject & Section: CCMACLRL - COM231ML

Professor: Elizer Ponio Jr    

No. of Units: 3 Units

Prerequisite: None

A comprehensive machine learning system that predicts WWE superstar popularity tiers (Main Eventer, Midcard, Enhancement) based on career statistics and performance metrics. This multi-class classification project demonstrates end-to-end ML pipeline development with robust validation and deployment-ready features.

🎯 Project Overview

This project implements a robust machine learning pipeline for classifying WWE superstars into popularity tiers based on various performance metrics and career statistics. The system features dynamic data acquisition, comprehensive exploratory data analysis, and multiple classification algorithms with hyperparameter optimization.

🎯 Business Problem

WWE management needs to understand what factors contribute to a wrestler's success and popularity tier placement. This model helps in:

  • Talent development and scouting
  • Brand strategy optimization
  • Performance metric analysis
  • Predictive roster management

🌐 References to any work

We use arXiv to find references for any work that is not our own and cite them in our final paper.

So far we have not found any work similar to ours. The project is original work, and even Kaggle does not have a similar dataset or paper.

Our partial web app 🌐: https://alt-wrestlers-predictor.netlify.app/

📊 Dataset Features

💫 Primary Dataset: wwe_rosters.csv

  • 185 wrestlers with comprehensive career statistics
  • 15+ features including match history, title reigns, social media presence
  • Target Variable: popularity_tier (Main Eventer, Midcard, Enhancement)

💫 External Validation: other_brand_rosters.csv

  • 550 wrestlers from various wrestling promotions (AEW, NJPW, Impact, etc.)
  • Used for model generalization testing

🎪 Superstar Profile

  • 🧬 Basic Info: Wrestler ID, Name, Brand, Age, Weight Class
  • 📅 Career Timeline: Debut Year, Years Active
  • ⚔️ Match Statistics: Total Matches, Avg Matches per Month, Career Win Percentage

🏅 Championship History

  • 👑 Title Reigns: World Titles, Secondary Titles, Tag Team Titles
  • 💎 Total Championships: Combined title count
  • ⭐ Current Champion Status: Binary indicator

📈 Performance Metrics

  • 🎪 Main Event Appearances: PPV main events count
  • 📱 Social Media Presence: Followers in millions
  • 🔥 Finisher Popularity: Move effectiveness rating

🚀 Key Features

🗃️ 1. Dynamic Data Management

  • 🔄 Robust data loading from GitHub with error handling
  • 🌐 Support for multiple datasets (WWE and other brands)
  • 📋 Comprehensive dataset summaries and validation
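
A minimal sketch of what loading with error handling could look like, assuming the data is read with `pandas.read_csv`. The function name `load_roster` and the `REQUIRED_COLUMNS` set are hypothetical and not taken from the notebook. An in-memory CSV stands in for the hosted file so the example runs offline.

```python
import io

import pandas as pd

# Hypothetical subset of required columns (assumed names, not from the notebook).
REQUIRED_COLUMNS = {"name", "brand", "popularity_tier"}


def load_roster(source):
    """Load a roster CSV from a URL, path, or file-like object.

    Returns a DataFrame, or None if loading or validation fails.
    """
    try:
        df = pd.read_csv(source)
    except Exception as exc:  # network error, missing file, malformed CSV
        print(f"Could not load data: {exc}")
        return None

    missing = REQUIRED_COLUMNS - set(df.columns)
    if missing:
        print(f"Missing required columns: {sorted(missing)}")
        return None
    return df


# Usage with an in-memory CSV standing in for the hosted file.
sample_csv = io.StringIO(
    "name,brand,popularity_tier\n"
    "Wrestler A,RAW,Main Eventer\n"
    "Wrestler B,NXT,Enhancement\n"
)
roster = load_roster(sample_csv)
print(roster.shape)
```

Returning `None` instead of raising keeps a notebook cell from stopping on a bad URL. Raising a custom exception would be an equally reasonable choice.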

🔍 2. Advanced Exploratory Data Analysis

  • ❌ Missing values analysis with visualizations
  • 🎯 Target variable distribution analysis
  • 📊 Comprehensive numerical feature analysis including:
    • 📈 Distribution histograms with KDE
    • 📦 Boxplots by popularity tier
    • 🔗 Correlation heatmaps and insights
    • 🔄 Feature pairplot analysis
    • 📋 Statistical summaries
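
The tabular part of the analysis above can be sketched with pandas alone. The toy values below are made up for illustration, and the column names are assumed to match the feature list in this README. The plots themselves (histograms, boxplots, heatmaps) are left out.

```python
import numpy as np
import pandas as pd

# Toy frame; values are invented for illustration only.
df = pd.DataFrame(
    {
        "age": [34, 29, np.nan, 41],
        "total_matches": [820, 150, 400, np.nan],
        "popularity_tier": ["Main Eventer", "Enhancement", "Midcard", "Midcard"],
    }
)

# Missing values per column, as counts and percentages.
missing = pd.DataFrame(
    {
        "missing_count": df.isna().sum(),
        "missing_pct": df.isna().mean() * 100,
    }
)

# Share of each popularity tier (target distribution).
tier_share = df["popularity_tier"].value_counts(normalize=True)

# Per-tier summary of one numerical feature; a boxplot by tier shows the same quantities.
matches_by_tier = df.groupby("popularity_tier")["total_matches"].describe()

print(missing)
print(tier_share)
print(matches_by_tier)
```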

⚙️ 3. Sophisticated Feature Engineering

  • 🎯 Dynamic feature creation (matches_per_year, titles_per_year, main_event_frequency)
  • 🏷️ Categorical feature encoding (brand, weight_class)
  • 📏 Standardized preprocessing pipeline
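
The README names the engineered features but not their formulas. The sketch below assumes simple ratios (matches per year active, titles per year active, PPV main events per match) and guards against division by zero. The formulas and the helper name `add_engineered_features` are assumptions, not the notebook's definitions.

```python
import numpy as np
import pandas as pd


def add_engineered_features(df):
    """Return a copy of df with three ratio features added (assumed formulas)."""
    out = df.copy()
    # Replace zeros with NaN so division never hits zero, then fill results with 0.
    years = out["years_active"].replace(0, np.nan)
    matches = out["total_matches"].replace(0, np.nan)
    total_titles = (
        out["world_title_reigns"]
        + out["secondary_title_reigns"]
        + out["tag_title_reigns"]
    )
    out["matches_per_year"] = (out["total_matches"] / years).fillna(0.0)
    out["titles_per_year"] = (total_titles / years).fillna(0.0)
    out["main_event_frequency"] = (out["main_evented_ppv"] / matches).fillna(0.0)
    return out


# Toy rows with invented values: one veteran and one debutant with no matches yet.
toy = pd.DataFrame(
    {
        "years_active": [10, 0],
        "total_matches": [500, 0],
        "world_title_reigns": [2, 0],
        "secondary_title_reigns": [3, 0],
        "tag_title_reigns": [0, 0],
        "main_evented_ppv": [25, 0],
        "brand": ["RAW", "NXT"],
        "weight_class": ["Heavyweight", "Cruiserweight"],
    }
)
features = add_engineered_features(toy)

# One-hot encode the categorical columns named in the list above.
encoded = pd.get_dummies(features, columns=["brand", "weight_class"])
print(encoded.columns.tolist())
```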

🚀 Other key features

  • Career Metrics: years_active, total_matches, career_win_percentage
  • Accolades: world_title_reigns, secondary_title_reigns, tag_title_reigns
  • Performance: avg_matches_per_month, main_evented_ppv
  • Popularity: social_media_followers_millions
  • Demographics: age, weight_class, brand

🧠 Wrestler Roster Data source link

🤖 4. Multi-Model Classification

Implementation of multiple classification algorithms:

  • 🌲 Random Forest Classifier
  • 🎯 Support Vector Machine (SVM)
  • 📈 Gradient Boosting Classifier
  • 📊 Logistic Regression
  • 👥 K-Nearest Neighbors
  • 🌳 Decision Tree Classifier
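
As a rough illustration, the six algorithms listed above can be trained side by side with scikit-learn. Synthetic three-class data from `make_classification` stands in for the roster dataset, and all hyperparameters shown are arbitrary defaults rather than the project's settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the 185-row roster with three tiers.
X, y = make_classification(
    n_samples=185, n_features=10, n_informative=5, n_classes=3, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

models = {
    "Random Forest": RandomForestClassifier(random_state=42),
    "SVM": SVC(random_state=42),
    "Gradient Boosting": GradientBoostingClassifier(random_state=42),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "K-Nearest Neighbors": KNeighborsClassifier(),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
}

# Fit each model and record its accuracy on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)

for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {score:.3f}")
```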

⚡ 5. Hyperparameter Optimization

  • 🔧 GridSearchCV for optimal parameter tuning
  • ✅ Cross-validation with configurable folds
  • 📝 Comprehensive model evaluation
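
A small sketch of grid search with cross-validation, assuming scikit-learn's `GridSearchCV`. The parameter grid is deliberately tiny and illustrative and is not the grid used in the notebook. Synthetic data again replaces the roster.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the roster data.
X, y = make_classification(
    n_samples=185, n_features=10, n_informative=5, n_classes=3, random_state=42
)

# Illustrative grid; real grids are usually larger.
param_grid = {"n_estimators": [25, 50], "max_depth": [3, None]}

search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=5,  # number of folds; configurable
    scoring="accuracy",
)
search.fit(X, y)

print(search.best_params_)
print(round(search.best_score_, 3))
```

After fitting, `search.best_estimator_` is the model refit on all the data with the best parameters, so it can be used directly for prediction.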

📊 6. Comprehensive Model Evaluation

  • 🎯 Multiple performance metrics:
    • ✅ Accuracy, Precision, Recall, F1-Score
    • 📋 Classification reports
    • 🎭 Confusion matrices
  • 🔄 Cross-validation scores
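
The metrics listed above map onto standard scikit-learn helpers. The sketch below uses synthetic data and an arbitrary mapping of integer labels to tier names, so the numbers it prints say nothing about the real model.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import cross_val_score, train_test_split

# Arbitrary label order for illustration: 0, 1, 2 mapped to tier names.
TIERS = ["Enhancement", "Midcard", "Main Eventer"]

X, y = make_classification(
    n_samples=185, n_features=10, n_informative=5, n_classes=3, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

model = RandomForestClassifier(n_estimators=50, random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
report = classification_report(
    y_test, y_pred, labels=[0, 1, 2], target_names=TIERS, zero_division=0
)
# Rows are true tiers, columns are predicted tiers.
cm = confusion_matrix(y_test, y_pred, labels=[0, 1, 2])

# 5-fold cross-validation on a fresh, unfitted copy of the model.
cv_scores = cross_val_score(
    RandomForestClassifier(n_estimators=50, random_state=42), X, y, cv=5
)

print(f"Accuracy: {accuracy:.3f}")
print(report)
print(cm)
print(f"CV accuracy: {cv_scores.mean():.3f} +/- {cv_scores.std():.3f}")
```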

🏆 Model Performance Metrics

📈 Key Results from Analysis:

  • Dataset Size: 185 WWE superstars + 550 other brand superstars
  • Feature Count: 18 comprehensive metrics per superstar
  • Target Distribution: Multi-class classification across 3 tiers
  • Cross-Validation: 5-fold CV for robust performance estimation

🎯 Evaluation Metrics Tracked:

  • Overall Accuracy - Total correct predictions
  • Precision per Class - Main Eventer, Midcard, Enhancement
  • Recall per Class - Sensitivity for each tier
  • F1-Score - Harmonic mean of precision and recall
  • Confusion Matrix - Detailed classification breakdown
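
To make the definitions above concrete, the following worked example derives per-class precision, recall, and F1 directly from a confusion matrix. The matrix values are invented for illustration and are not results from this project.

```python
import numpy as np

# Invented confusion matrix: rows are true tiers, columns are predicted tiers.
# Order: Main Eventer, Midcard, Enhancement.
cm = np.array(
    [
        [8, 2, 0],
        [1, 6, 3],
        [0, 2, 8],
    ]
)

true_positives = np.diag(cm)
precision = true_positives / cm.sum(axis=0)  # correct / everything predicted as that tier
recall = true_positives / cm.sum(axis=1)     # correct / everything truly in that tier
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
accuracy = true_positives.sum() / cm.sum()

for tier, p, r, f in zip(["Main Eventer", "Midcard", "Enhancement"], precision, recall, f1):
    print(f"{tier}: precision={p:.3f} recall={r:.3f} f1={f:.3f}")
print(f"accuracy={accuracy:.3f}")
```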

🛠 Technical Implementation

⚙️ Configuration Management

Centralized configuration class for dynamic parameter management:

  • 🌐 Data URLs and paths
  • 🤖 Model parameters and test configurations
  • 🎯 Feature engineering specifications
  • 🔧 Hyperparameter grids for all models
  • 🎨 Visualization settings
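
One possible shape for such a configuration class is a dataclass. Every field name and default below is an assumption made for illustration. Only the relative dataset paths, the target column, and the 5-fold setting come from this README.

```python
from dataclasses import dataclass, field


@dataclass
class Config:
    """Hypothetical central configuration; field names are illustrative."""

    # Data locations (relative paths from the project structure in this README).
    wwe_data_path: str = "datasets/training/wwe_rosters.csv"
    other_data_path: str = "datasets/test/other_brand_rosters.csv"
    target_column: str = "popularity_tier"

    # Model and test settings (test_size and random_state are assumed values).
    test_size: float = 0.2
    random_state: int = 42
    cv_folds: int = 5

    # Feature engineering specification.
    categorical_features: list = field(
        default_factory=lambda: ["brand", "weight_class"]
    )

    # Hyperparameter grids, keyed by model name (illustrative values).
    param_grids: dict = field(
        default_factory=lambda: {
            "random_forest": {"n_estimators": [50, 100], "max_depth": [None, 5]},
            "knn": {"n_neighbors": [3, 5, 7]},
        }
    )

    # Visualization settings: one color per tier (arbitrary choices).
    tier_colors: dict = field(
        default_factory=lambda: {
            "Main Eventer": "gold",
            "Midcard": "silver",
            "Enhancement": "peru",
        }
    )


config = Config()
print(config.cv_folds, config.categorical_features)
```

Using `default_factory` for the list and dict fields gives each `Config` instance its own copy, so changing one instance does not affect another.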

🏗️ Model Architecture

  • 📥 DataLoader: Dynamic data acquisition and validation
  • 🔍 DataExplorer: Comprehensive EDA with advanced visualizations
  • 🤖 ModelTrainer: Multi-model training and evaluation framework

🎨 Visualization Suite

  • 🎨 Custom color schemes for different popularity tiers
  • 📊 Multiple plot types (bar, pie, distribution, correlation, pairplot)
  • 📝 Statistical annotations and insights
  • 🎯 Professional-grade matplotlib and seaborn visualizations

📊 Data Insights from EDA

🎯 Target Distribution Analysis

  • Comprehensive analysis of popularity tier distribution
  • Statistical summary of class balance
  • Temporal analysis of tier distribution over debut years

🔍 Feature Correlations

  • Identification of highly correlated features (>0.7)
  • Top feature correlations analysis
  • Pairplot visualization for key feature relationships
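
A sketch of how feature pairs above the 0.7 threshold could be listed, assuming Pearson correlation on the numerical columns. The helper name `highly_correlated_pairs` and the toy values are hypothetical.

```python
import numpy as np
import pandas as pd


def highly_correlated_pairs(df, threshold=0.7):
    """Return feature pairs whose absolute Pearson correlation exceeds threshold."""
    corr = df.select_dtypes("number").corr().abs()
    # Keep only the upper triangle so each pair appears once and the diagonal is dropped.
    upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(upper).stack()
    return pairs[pairs > threshold].sort_values(ascending=False)


# Toy data with invented values: the first two columns move together, the third does not.
toy = pd.DataFrame(
    {
        "total_matches": [1, 2, 3, 4, 5],
        "main_evented_ppv": [2, 4, 6, 8, 11],
        "age": [5, 1, 4, 2, 3],
    }
)
pairs = highly_correlated_pairs(toy)
print(pairs)
```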

📈 Statistical Summaries

  • Detailed descriptive statistics for all numerical features
  • Variance, skewness, and kurtosis analysis
  • Feature distribution insights
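
Variance, skewness, and kurtosis are all available as pandas methods, so a summary table like the one described above can be assembled in a few lines. The column names and values here are invented for illustration.

```python
import pandas as pd

# Toy data: one symmetric column and one right-skewed column.
df = pd.DataFrame(
    {
        "symmetric_feature": [1, 2, 3, 4, 5],
        "skewed_feature": [1, 1, 1, 2, 10],
        "brand": ["RAW", "NXT", "RAW", "SmackDown", "NXT"],
    }
)

numeric = df.select_dtypes("number")
summary = pd.DataFrame(
    {
        "variance": numeric.var(),       # sample variance (ddof=1)
        "skewness": numeric.skew(),      # 0 for symmetric data
        "kurtosis": numeric.kurt(),      # excess kurtosis (normal distribution = 0)
    }
)
print(summary)
```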

🎮 Usage

The notebook is organized into clear phases:

  1. ⚙️ Configuration & Imports - Environment setup and library imports
  2. 📥 Data Loading & Exploration - Dynamic data acquisition and initial analysis
  3. 🔍 Comprehensive EDA - Detailed statistical and visual analysis
  4. ⚙️ Feature Engineering - Creation of enhanced features
  5. 🤖 Model Training & Evaluation - Multi-algorithm implementation
  6. 🔧 Hyperparameter Tuning - Optimization for best performance
  7. 📊 Results & Insights - Comprehensive model comparison
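
The phases above can be condensed into a single scikit-learn `Pipeline`, sketched here on randomly generated rows. This is an illustration of the overall flow under assumed column names, not a reproduction of the notebook. Because the labels are random, the predictions carry no meaning.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Phase 2 stand-in: random rows instead of the real roster file.
rng = np.random.default_rng(42)
n = 120
df = pd.DataFrame(
    {
        "years_active": rng.integers(1, 25, n),
        "total_matches": rng.integers(10, 2000, n),
        "brand": rng.choice(["RAW", "SmackDown", "NXT"], n),
        "popularity_tier": rng.choice(["Main Eventer", "Midcard", "Enhancement"], n),
    }
)

# Phase 4: a simple engineered feature (assumed formula).
df["matches_per_year"] = df["total_matches"] / df["years_active"]

numeric_cols = ["years_active", "total_matches", "matches_per_year"]
categorical_cols = ["brand"]
X = df[numeric_cols + categorical_cols]
y = df["popularity_tier"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Phase 5: scaling, encoding, and the classifier in one pipeline.
preprocess = ColumnTransformer(
    [
        ("num", StandardScaler(), numeric_cols),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ]
)
pipeline = Pipeline(
    [
        ("preprocess", preprocess),
        ("model", RandomForestClassifier(n_estimators=50, random_state=42)),
    ]
)
pipeline.fit(X_train, y_train)
preds = pipeline.predict(X_test)
print(preds[:5])
```

Keeping preprocessing inside the pipeline means the scaler and encoder are fitted on the training split only, which avoids leaking test data into the model.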

🔧 Requirements

pandas>=1.3.0
numpy>=1.21.0
matplotlib>=3.4.0
seaborn>=0.11.0
scikit-learn>=1.0.0

📁 Project Structure

CCMACLRL_COM231_PROJECT/
│
├── 🎨assets/                             # Images and background files
│
├── 📁datasets/
│   ├── 📁test/
│   │   └── other_brand_rosters.csv      # External validation
│   │
│   └── 📁training/
│       └── wwe_rosters.csv              # Primary training data
│
├── model/
│   └── 🌐 wwe_popularity_predictor.pkl
│
├── notebook/
│   └── 🐍 WWE_Popularity_Prediction.ipynb
│
├── docs/                                # Research paper
│   └── 🔍 research_paper.pdf
│
├── 📄 LICENCE
└── 📖 README.md

🏆 Popularity Tiers

The system classifies superstars into three main tiers:

  • 🎪 Main Eventer: Top-tier performers, championship contenders

  • 🥈 Midcard: Regular performers with consistent appearances

  • 🌟 Enhancement: Developing talent and roster support

🔮 Future Enhancements

  • 🔄 Real-time data integration from wrestling APIs

  • 🧠 Advanced ensemble methods and neural networks

  • 🌐 Web application for interactive predictions

  • 📊 Expanded feature set including match ratings and fan sentiment

  • 📱 Mobile app for on-the-go predictions

  • 🔍 Advanced feature importance analysis

🏆 Contributing

If you would like to contribute to the WWE Superstar Popularity Tier Prediction project, please follow these steps:

  1. Fork the repository.
  2. Create a new branch for your feature or bug fix.
  3. Make your changes and commit them.
  4. Push your changes to your forked repository.
  5. Submit a pull request to the main repository.

🧠 Submitting Changes

🧠 Contributions are welcome! If you have ideas for improvements or want to add new features, follow these steps:

  1. Fork the repository.
  2. Create a new branch.
  3. Make your changes and commit them.
  4. Push to your fork and submit a pull request. 💕💕💕💕

👋 Contributors

Special thanks to all my groupmates:

🛸 FAQ

🛸 Reporting Issues

Some changes still need to be addressed:
- TBA
- TBA
- TBA
🤖 If you encounter any issues or have suggestions, please open an issue to let us know.

🔑 License & Citation

This project is developed for educational and portfolio purposes. WWE data is used under fair use for academic research.

@misc{wwe_popularity_2025,
  title = {WWE Superstar Popularity Tier Prediction},
  author = {Jay Arre Talosig},
  year = {2025},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/flexycode/CCMACLRL_COM231_PROJECT}}
}
Check the License tab for copyright permission

🔭 Acknowledgements     

✨ Professor Elizer Jr. D. Ponio

Professor Elizer Jr. Ponio is a software engineer, lecturer, and machine learning engineer at the National University. He holds a Bachelor of Science degree and a Master's degree in Computer Science, which give him a strong foundation in computer science principles. Prof. Ponio's expertise in software engineering and machine learning is evident in his teaching style and practical approach. He is dedicated to giving students a comprehensive understanding of the subject matter and incorporates real-world applications into his instruction. His combination of academic qualifications, industry experience, and passion for teaching makes him a valuable asset to the National University community.

📫 Changelogs

Chronological list of updates, bug fixes, new features, and other modifications for our Machine Learning Project.

💻 [01.0.0] - 2025-09-29      

Role & Project Management

  • 💻 Final Project requirements for our project
  • ✨ RAW
  • ✨ SmackDown
  • ✨ NXT

💻 [02.0.0] - 2025-10-11      

Development Progress

  • 💻 Uploaded the Python Notebook

Commit message for pushing or pull-request

🧊 ML Final Project

⭐ If you find this project useful, please give it a star on GitHub! 🎯 Predicting wrestling stardom through data science and machine learning. Building the future of sports analytics one superstar at a time! 🏆

