💫 WWE Superstar Popularity Tier Prediction

Team Name: Artificial Ledger

Subject & Section: CCMACLRL - COM231ML

Schedule: TUE 11:00AM - 01:40 PM VR09CCIT - FRI 11:00AM - 03:00 PM 408 MB

Professor: Elizer Ponio Jr

No. of Units: 3 Units

Prerequisite: None

Kaggle: WWE Popularity - Multiclass Classification

Project Website: Artificial Ledger Wrestlers Predictor

📊 Table of Contents

Introduction
Key Features
Folder Structure
Contributing
License
Acknowledgements
FAQ
Changelog

🧠 Introduction

🏆 WWE Superstar Popularity Tier Prediction

A comprehensive machine learning system that predicts WWE superstar popularity tiers (Main Eventer, Midcard, Enhancement) based on career statistics and performance metrics. This multi-class classification project demonstrates end-to-end ML pipeline development with robust validation and deployment-ready features.

🎯 Project Overview

This project implements a robust machine learning pipeline for classifying WWE superstars into popularity tiers based on various performance metrics and career statistics. The system features dynamic data acquisition, comprehensive exploratory data analysis, and multiple classification algorithms with hyperparameter optimization.

🎯 Business Problem

WWE management needs to understand what factors contribute to a wrestler's success and popularity tier placement. This model helps in:

Talent development and scouting
Brand strategy optimization
Performance metric analysis
Predictive roster management

🌐 References to any work

We use arXiv to find and cite any work that is not our own in our final paper.

So far we have not found any prior work similar to ours. The dataset and the approach are original, and we could not find a comparable dataset or paper on Kaggle either.

Our Partial WebApp 🌐 : https://alt-wrestlers-predictor.netlify.app/

📊 Dataset Features

💫 Primary Dataset: `wwe_rosters.csv`

185 wrestlers with comprehensive career statistics
15+ features including match history, title reigns, social media presence
Target Variable: popularity_tier (Main Eventer, Midcard, Enhancement)

💫 External Validation: `other_brand_rosters.csv`

550 wrestlers from various wrestling promotions (AEW, NJPW, Impact, etc.)
Used for model generalization testing

🎪 Superstar Profile

🧬 Basic Info: Wrestler ID, Name, Brand, Age, Weight Class
📅 Career Timeline: Debut Year, Years Active
⚔️ Match Statistics: Total Matches, Avg Matches per Month, Career Win Percentage

🏅 Championship History

👑 Title Reigns: World Titles, Secondary Titles, Tag Team Titles
💎 Total Championships: Combined title count
⭐ Current Champion Status: Binary indicator

📈 Performance Metrics

🎪 Main Event Appearances: PPV main events count
📱 Social Media Presence: Followers in millions
🔥 Finisher Popularity: Move effectiveness rating

🚀 Key Features

🗃️ 1. Dynamic Data Management

🔄 Robust data loading from GitHub with error handling
🌐 Support for multiple datasets (WWE and other brands)
📋 Comprehensive dataset summaries and validation
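
As a rough illustration of the loading step described above, the sketch below wraps `pandas.read_csv` in basic error handling and a column check. The function name `load_roster` and the `required_columns` parameter are hypothetical and are not taken from the notebook; the real loader may be structured differently.

```python
import io

import pandas as pd


def load_roster(source, required_columns=("popularity_tier",)):
    """Load a roster CSV from a URL, local path, or file-like object.

    Returns a DataFrame, or None if loading or validation fails.
    (Hypothetical helper; the notebook's actual loader may differ.)
    """
    try:
        df = pd.read_csv(source)
    except (OSError, pd.errors.ParserError, pd.errors.EmptyDataError) as exc:
        # OSError covers missing files as well as URL/HTTP failures.
        print(f"Could not load {source}: {exc}")
        return None

    missing = [col for col in required_columns if col not in df.columns]
    if missing:
        print(f"Dataset is missing required columns: {missing}")
        return None
    return df


# Small in-memory example so the sketch runs without network access.
demo_csv = io.StringIO("name,popularity_tier\nExample Wrestler,Midcard\n")
roster = load_roster(demo_csv)

# A path that does not exist is reported and returns None instead of raising.
missing_roster = load_roster("does_not_exist.csv")
```

Returning `None` rather than raising keeps a notebook cell from stopping on a bad URL, at the cost of requiring the caller to check the result.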

🔍 2. Advanced Exploratory Data Analysis

❌ Missing values analysis with visualizations
🎯 Target variable distribution analysis
📊 Comprehensive numerical feature analysis including:
- 📈 Distribution histograms with KDE
- 📦 Boxplots by popularity tier
- 🔗 Correlation heatmaps and insights
- 🔄 Feature pairplot analysis
- 📋 Statistical summaries

⚙️ 3. Sophisticated Feature Engineering

🎯 Dynamic feature creation (matches_per_year, titles_per_year, main_event_frequency)
🏷️ Categorical feature encoding (brand, weight_class)
📏 Standardized preprocessing pipeline
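
The derived features named above could be computed along these lines. The formulas are assumptions inferred from the feature names (for example, `main_event_frequency` is taken here to mean PPV main events divided by total matches); the notebook may define them differently. The input column names follow the feature list in this README.

```python
import numpy as np
import pandas as pd


def add_rate_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add per-year and per-match rate features (assumed definitions)."""
    out = df.copy()

    # Replace zeros with NaN so division does not produce infinities,
    # then fill the resulting gaps with 0.0.
    years = out["years_active"].replace(0, np.nan)
    matches = out["total_matches"].replace(0, np.nan)

    total_titles = (
        out["world_title_reigns"]
        + out["secondary_title_reigns"]
        + out["tag_title_reigns"]
    )

    out["matches_per_year"] = (out["total_matches"] / years).fillna(0.0)
    out["titles_per_year"] = (total_titles / years).fillna(0.0)
    out["main_event_frequency"] = (out["main_evented_ppv"] / matches).fillna(0.0)
    return out


# Two made-up rows: a ten-year veteran and a wrestler in their debut year.
sample = pd.DataFrame({
    "years_active": [10, 0],
    "total_matches": [500, 12],
    "world_title_reigns": [2, 0],
    "secondary_title_reigns": [1, 0],
    "tag_title_reigns": [2, 0],
    "main_evented_ppv": [25, 0],
})
features = add_rate_features(sample)
print(features[["matches_per_year", "titles_per_year", "main_event_frequency"]])
```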

🚀 Other key features

Career Metrics: years_active, total_matches, career_win_percentage
Accolades: world_title_reigns, secondary_title_reigns, tag_title_reigns
Performance: avg_matches_per_month, main_evented_ppv
Popularity: social_media_followers_millions
Demographics: age, weight_class, brand

🧠 Wrestler Roster Data source link

🤖 4. Multi-Model Classification

Implementation of multiple classification algorithms:

🌲 Random Forest Classifier
🎯 Support Vector Machine (SVM)
📈 Gradient Boosting Classifier
📊 Logistic Regression
👥 K-Nearest Neighbors
🌳 Decision Tree Classifier
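
A minimal sketch of how the six algorithms above might be compared under one preprocessing pipeline. It runs on synthetic data rather than `wwe_rosters.csv`, so the scores it prints say nothing about the real project. The weight class values, the toy target rule, and all model settings are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the roster data (not the real dataset).
rng = np.random.default_rng(42)
n = 150
data = pd.DataFrame({
    "total_matches": rng.integers(10, 1500, n),
    "career_win_percentage": rng.uniform(20, 80, n),
    "social_media_followers_millions": rng.uniform(0.01, 15, n),
    "brand": rng.choice(["RAW", "SmackDown", "NXT"], n),
    "weight_class": rng.choice(["Heavyweight", "Cruiserweight"], n),
})
# Toy target: split a made-up score into three equally sized tiers.
score = data["career_win_percentage"] + 3 * data["social_media_followers_millions"]
data["popularity_tier"] = pd.qcut(
    score, 3, labels=["Enhancement", "Midcard", "Main Eventer"]
).astype(str)

numeric = ["total_matches", "career_win_percentage", "social_media_followers_millions"]
categorical = ["brand", "weight_class"]

# Scale numeric columns and one-hot encode categorical ones.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), numeric),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])

models = {
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=42),
    "SVM": SVC(),
    "Gradient Boosting": GradientBoostingClassifier(n_estimators=50, random_state=42),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "K-Nearest Neighbors": KNeighborsClassifier(),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
}

X, y = data[numeric + categorical], data["popularity_tier"]
results = {}
for name, model in models.items():
    pipeline = Pipeline([("preprocess", preprocess), ("model", model)])
    # Mean accuracy over 5 stratified folds.
    results[name] = cross_val_score(pipeline, X, y, cv=5).mean()

for name, acc in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {acc:.3f}")
```

Putting the preprocessing inside the `Pipeline` means the scaler and encoder are refit on each training fold, which avoids leaking statistics from the validation fold.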

⚡ 5. Hyperparameter Optimization

🔧 GridSearchCV for optimal parameter tuning
✅ Cross-validation with configurable folds
📝 Comprehensive model evaluation
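
The tuning step can be sketched with `GridSearchCV` as follows. The parameter grid, the `f1_macro` scoring choice, and the synthetic data from `make_classification` are all assumptions for illustration; the grids used in the notebook are not listed in this README.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic 3-class problem, sized like the primary dataset (185 rows).
X, y = make_classification(
    n_samples=185, n_features=10, n_informative=5, n_classes=3, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Hypothetical grid: 2 x 2 = 4 candidate settings.
param_grid = {"n_estimators": [50, 100], "max_depth": [None, 5]}

search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=5,                # number of folds is configurable
    scoring="f1_macro",  # macro F1 weights the three tiers equally
)
search.fit(X_train, y_train)

print("Best parameters:", search.best_params_)
print(f"Best cross-validated macro F1: {search.best_score_:.3f}")

# score() uses the same scorer as the search, so this is macro F1 on held-out data.
test_f1_macro = search.score(X_test, y_test)
print(f"Held-out macro F1: {test_f1_macro:.3f}")
```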

📊 6. Comprehensive Model Evaluation

🎯 Multiple performance metrics:
- ✅ Accuracy, Precision, Recall, F1-Score
- 📋 Classification reports
- 🎭 Confusion matrices
🔄 Cross-validation scores

🏆 Model Performance Metrics

📈 Key Results from Analysis:

Dataset Size: 185 WWE superstars + 550 other brand superstars
Feature Count: 18 comprehensive metrics per superstar
Target Distribution: Multi-class classification across 3 tiers
Cross-Validation: 5-fold CV for robust performance estimation

🎯 Evaluation Metrics Tracked:

Overall Accuracy - Total correct predictions
Precision per Class - Main Eventer, Midcard, Enhancement
Recall per Class - Sensitivity for each tier
F1-Score - Harmonic mean of precision and recall
Confusion Matrix - Detailed classification breakdown
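
To make the relationship between these metrics concrete, here is a small worked example that derives them from a confusion matrix using plain Python. The counts are invented for illustration and are not results from this project.

```python
TIERS = ["Main Eventer", "Midcard", "Enhancement"]

# Hypothetical confusion matrix (NOT project results).
# Rows are true tiers, columns are predicted tiers, in TIERS order.
confusion = [
    [8, 2, 0],
    [1, 15, 4],
    [0, 3, 7],
]


def per_class_metrics(matrix, labels):
    """Compute precision, recall, and F1 for each class."""
    metrics = {}
    for i, label in enumerate(labels):
        true_positives = matrix[i][i]
        predicted_total = sum(row[i] for row in matrix)  # column sum
        actual_total = sum(matrix[i])                    # row sum
        precision = true_positives / predicted_total if predicted_total else 0.0
        recall = true_positives / actual_total if actual_total else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[label] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics


total = sum(sum(row) for row in confusion)
accuracy = sum(confusion[i][i] for i in range(len(TIERS))) / total
metrics = per_class_metrics(confusion, TIERS)

print(f"Accuracy: {accuracy:.2f}")
for tier, m in metrics.items():
    print(f"{tier}: precision={m['precision']:.2f} "
          f"recall={m['recall']:.2f} f1={m['f1']:.2f}")
```

In this made-up matrix, 30 of 40 predictions fall on the diagonal, so accuracy is 0.75, while per-class recall ranges from 0.70 to 0.80. Per-class figures like these are what reveal which tiers a model confuses.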

🛠 Technical Implementation

⚙️ Configuration Management

Centralized configuration class for dynamic parameter management:

🌐 Data URLs and paths
🤖 Model parameters and test configurations
🎯 Feature engineering specifications
🔧 Hyperparameter grids for all models
🎨 Visualization settings
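
One way such a configuration class could look is sketched below. The class layout, field names, and default values such as `test_size` and `random_state` are assumptions; only the dataset paths, the target column, and the 5-fold setting come from this README.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Config:
    """Hypothetical central configuration for the pipeline."""

    # Paths as shown in the project structure.
    train_path: str = "datasets/training/wwe_rosters.csv"
    external_path: str = "datasets/test/other_brand_rosters.csv"

    # Modelling settings.
    target: str = "popularity_tier"
    test_size: float = 0.2       # assumed value
    cv_folds: int = 5
    random_state: int = 42       # assumed value

    # Feature engineering specification.
    categorical_features: tuple = ("brand", "weight_class")

    # Example hyperparameter grids (illustrative values only).
    param_grids: dict = field(default_factory=lambda: {
        "random_forest": {"n_estimators": [50, 100], "max_depth": [None, 5]},
        "svm": {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]},
        "knn": {"n_neighbors": [3, 5, 7]},
    })


config = Config()
print(config.train_path, config.cv_folds)

# Override a single setting without touching the defaults.
quick_config = Config(cv_folds=3)
```

Marking the dataclass as frozen keeps settings from being changed by accident halfway through a notebook run.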

🏗️ Model Architecture

📥 DataLoader: Dynamic data acquisition and validation
🔍 DataExplorer: Comprehensive EDA with advanced visualizations
🤖 ModelTrainer: Multi-model training and evaluation framework

🎨 Visualization Suite

🎨 Custom color schemes for different popularity tiers
📊 Multiple plot types (bar, pie, distribution, correlation, pairplot)
📝 Statistical annotations and insights
🎯 Professional-grade matplotlib and seaborn visualizations

📊 Data Insights from EDA

🎯 Target Distribution Analysis

Comprehensive analysis of popularity tier distribution
Statistical summary of class balance
Temporal analysis of tier distribution over debut years

🔍 Feature Correlations

Identification of highly correlated features (>0.7)
Top feature correlations analysis
Pairplot visualization for key feature relationships
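
A minimal sketch of how feature pairs above the 0.7 threshold could be listed with pandas. The helper name `high_correlation_pairs` is hypothetical, and the demo values are made up.

```python
import numpy as np
import pandas as pd


def high_correlation_pairs(df: pd.DataFrame, threshold: float = 0.7) -> pd.Series:
    """Return feature pairs whose absolute Pearson correlation exceeds threshold."""
    corr = df.select_dtypes(include="number").corr().abs()
    # Keep only the strict upper triangle so each pair appears once
    # and self-correlations are excluded.
    upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(upper).stack()
    return pairs[pairs > threshold].sort_values(ascending=False)


# Made-up values: matches and main events rise together, age does not.
demo = pd.DataFrame({
    "total_matches": [100, 200, 300, 400, 500],
    "main_evented_ppv": [1, 3, 5, 8, 10],
    "age": [30, 25, 41, 28, 35],
})
pairs = high_correlation_pairs(demo)
print(pairs)
```

Highly correlated pairs are candidates for dropping or combining, since they carry overlapping information.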

📈 Statistical Summaries

Detailed descriptive statistics for all numerical features
Variance, skewness, and kurtosis analysis
Feature distribution insights
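
The summary statistics listed above can be produced in one call with pandas. This is a sketch with made-up values, and `distribution_summary` is a hypothetical helper name.

```python
import pandas as pd


def distribution_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Mean, spread, and shape statistics for every numerical column."""
    numeric = df.select_dtypes(include="number")
    # One row per feature, one column per statistic.
    return numeric.agg(["mean", "std", "var", "skew", "kurt"]).T


demo = pd.DataFrame({
    "total_matches": [100, 200, 300, 400, 500],
    "name": ["a", "b", "c", "d", "e"],  # non-numeric columns are ignored
})
summary = distribution_summary(demo)
print(summary)
```

Here `var` is the sample variance and `kurt` is excess kurtosis, so a normal distribution scores about 0 on both `skew` and `kurt`.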

🎮 Usage

The notebook is organized into clear phases:

⚙️ Configuration & Imports - Environment setup and library imports
📥 Data Loading & Exploration - Dynamic data acquisition and initial analysis
🔍 Comprehensive EDA - Detailed statistical and visual analysis
⚙️ Feature Engineering - Creation of enhanced features
🤖 Model Training & Evaluation - Multi-algorithm implementation
🔧 Hyperparameter Tuning - Optimization for best performance
📊 Results & Insights - Comprehensive model comparison

🔧 Requirements

pandas>=1.3.0
numpy>=1.21.0
matplotlib>=3.4.0
seaborn>=0.11.0
scikit-learn>=1.0.0

📁 Project Structure

CCMACLRL_COM231_PROJECT/
│
├── 🎨 assets/                           # Images and background files
│
├── 📁 datasets/
│   ├── 📁 test/
│   │   └── other_brand_rosters.csv      # External validation
│   │
│   └── 📁 training/
│       └── wwe_rosters.csv              # Primary training data
│
├── model/
│   └── 🌐 wwe_popularity_predictor.pkl
│
├── notebook/
│   └── 🐍 WWE_Popularity_Prediction.ipynb
│
├── docs/                                # Research paper
│   └── 🔍 research_paper.pdf
│
├── 📄 LICENSE
└── 📖 README.md

🏆 Popularity Tiers

The system classifies superstars into three main tiers:

🎪 Main Eventer: Top-tier performers, championship contenders
🥈 Midcard: Regular performers with consistent appearances
🌟 Enhancement: Developing talent and roster support

🔮 Future Enhancements

🔄 Real-time data integration from wrestling APIs
🧠 Advanced ensemble methods and neural networks
🌐 Web application for interactive predictions
📊 Expanded feature set including match ratings and fan sentiment
📱 Mobile app for on-the-go predictions
🔍 Advanced feature importance analysis

🏆 Contributing

If you would like to contribute to this project, please follow these steps:

Fork the repository.
Create a new branch for your feature or bug fix.
Make your changes and commit them.
Push your changes to your forked repository.
Submit a pull request to the main repository.

🧠 Submitting Changes

🧠 Contributions are welcome! If you have ideas for improvements or want to add new features, submit a pull request using the steps above. 💕💕💕💕

👋 Contributors

Special thanks to all my groupmates:

😎 Jay Arre Talosig - Machine Learning Engineer | Blockchain Developer | Bioinformatics Scientist
🧭 Queen Maegan Pedido - Machine Learning Engineer | Software Engineer
💥 Moira Mercado - Machine Learning Engineer | Software Engineer
🎲 James Adrian Castro - Machine Learning Engineer | Software Engineer

🛸 FAQ

🛸 Reporting Issues

Some changes still need to be addressed:
- TBA
- TBA
- TBA

🤖 If you encounter any issues or have suggestions, please open an issue to let us know.

🔑 License & Citation

This project is developed for educational and portfolio purposes. WWE data is used under fair use for academic research.

@misc{wwe_popularity_2025,
  title = {WWE Superstar Popularity Tier Prediction},
  author = {Jay Arre Talosig},
  year = {2025},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/flexycode/CCMACLRL_COM231_PROJECT}}
}

Check the License tab for copyright permissions.

🔭 Acknowledgements

✨ Professor Elizer Jr. D. Ponio

Professor Elizer Jr. Ponio is a software engineer, lecturer, and machine learning engineer at National University. He holds bachelor's and master's degrees in Computer Science, which give him a strong foundation in computer science principles. Prof. Ponio's expertise in software engineering and machine learning is evident in his teaching style and practical approach. He is dedicated to giving students a comprehensive understanding of the subject matter and incorporates real-world applications into his instruction. His combination of academic qualifications, industry experience, and passion for teaching makes him a valuable asset to the National University community.

📫 Changelogs

Chronological list of updates, bug fixes, new features, and other modifications for our Machine Learning Project.

💻 [01.0.0] - 2025-09-29

Role & Project Management

💻 Final Project requirements for our project
✨ RAW
✨ SmackDown
✨ NXT

💻 [02.0.0] - 2025-10-11

Development Progress

💻 Uploaded the Python Notebook

Commit message for pushing or pull-request

🧊 ML Final Project

⭐ If you find this project useful, please give it a star on GitHub! 🎯 Predicting wrestling stardom through data science and machine learning. Building the future of sports analytics one superstar at a time! 🏆
