Team Name: Artificial Ledger
Subject & Section: CCMACLRL - COM231ML
Professor: Elizer Ponio Jr
No. of Units: 3 Units
Prerequisite: None
Project Website: Artificial Ledger Wrestlers Predictor
A comprehensive machine learning system that predicts WWE superstar popularity tiers (Main Eventer, Midcard, Enhancement) based on career statistics and performance metrics. This multi-class classification project demonstrates end-to-end ML pipeline development with robust validation and deployment-ready features.
This project implements a robust machine learning pipeline for classifying WWE superstars into popularity tiers based on various performance metrics and career statistics. The system features dynamic data acquisition, comprehensive exploratory data analysis, and multiple classification algorithms with hyperparameter optimization.
WWE management needs to understand what factors contribute to a wrestler's success and popularity tier placement. This model helps in:
- Talent development and scouting
- Brand strategy optimization
- Performance metric analysis
- Predictive roster management
We use arXiv to find and cite references for any work in our final paper that is not our own.
So far we have not found any work similar to ours. The project is original work, and even Kaggle does not have a comparable dataset or paper.
Our Partial WebApp 🌐 : https://alt-wrestlers-predictor.netlify.app/
- 185 wrestlers with comprehensive career statistics
- 15+ features including match history, title reigns, social media presence
- Target Variable: `popularity_tier` (Main Eventer, Midcard, Enhancement)
- 550 wrestlers from various wrestling promotions (AEW, NJPW, Impact, etc.)
- Used for model generalization testing
- 🧬 Basic Info: Wrestler ID, Name, Brand, Age, Weight Class
- 📅 Career Timeline: Debut Year, Years Active
- ⚔️ Match Statistics: Total Matches, Avg Matches per Month, Career Win Percentage
- 👑 Title Reigns: World Titles, Secondary Titles, Tag Team Titles
- 💎 Total Championships: Combined title count
- ⭐ Current Champion Status: Binary indicator
- 🎪 Main Event Appearances: PPV main events count
- 📱 Social Media Presence: Followers in millions
- 🔥 Finisher Popularity: Move effectiveness rating
- 🔄 Robust data loading from GitHub with error handling
- 🌐 Support for multiple datasets (WWE and other brands)
- 📋 Comprehensive dataset summaries and validation
- ❌ Missing values analysis with visualizations
- 🎯 Target variable distribution analysis
- 📊 Comprehensive numerical feature analysis including:
- 📈 Distribution histograms with KDE
- 📦 Boxplots by popularity tier
- 🔗 Correlation heatmaps and insights
- 🔄 Feature pairplot analysis
- 📋 Statistical summaries
- 🎯 Dynamic feature creation (`matches_per_year`, `titles_per_year`, `main_event_frequency`)
- 🏷️ Categorical feature encoding (`brand`, `weight_class`)
- 📏 Standardized preprocessing pipeline
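One plausible way to derive the three engineered features is sketched below. The README names the features but does not give their formulas, so the ratios used here (matches divided by years active, total title reigns divided by years active, PPV main events divided by total matches) are assumptions, as is the helper name `add_engineered_features`.

```python
import numpy as np
import pandas as pd


def add_engineered_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with three ratio features added (assumed formulas)."""
    out = df.copy()
    # Replace zero denominators with NaN so a ratio becomes NaN rather than inf.
    years = out["years_active"].replace(0, np.nan)
    matches = out["total_matches"].replace(0, np.nan)
    total_titles = (
        out["world_title_reigns"]
        + out["secondary_title_reigns"]
        + out["tag_title_reigns"]
    )
    out["matches_per_year"] = out["total_matches"] / years
    out["titles_per_year"] = total_titles / years
    out["main_event_frequency"] = out["main_evented_ppv"] / matches
    return out


# Made-up rows purely for demonstration.
demo = pd.DataFrame({
    "years_active": [10, 2],
    "total_matches": [1000, 100],
    "world_title_reigns": [4, 0],
    "secondary_title_reigns": [3, 1],
    "tag_title_reigns": [3, 0],
    "main_evented_ppv": [50, 0],
})
features = add_engineered_features(demo)
print(features[["matches_per_year", "titles_per_year", "main_event_frequency"]])
```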
- Career Metrics: `years_active`, `total_matches`, `career_win_percentage`
- Accolades: `world_title_reigns`, `secondary_title_reigns`, `tag_title_reigns`
- Performance: `avg_matches_per_month`, `main_evented_ppv`
- Popularity: `social_media_followers_millions`
- Demographics: `age`, `weight_class`, `brand`
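The categorical encoding and standardized preprocessing mentioned earlier could be combined in a single scikit-learn `ColumnTransformer`, as in the sketch below. The choice of one-hot encoding and `StandardScaler` is an assumption, since the README does not say which encoder or scaler the notebook uses, and the `weight_class` values in the demo rows are invented.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

CATEGORICAL = ["brand", "weight_class"]
NUMERIC = ["age", "years_active", "total_matches", "career_win_percentage"]

preprocessor = ColumnTransformer([
    # Scale numeric columns to zero mean and unit variance.
    ("num", StandardScaler(), NUMERIC),
    # One-hot encode categories; unseen categories encode as all zeros.
    ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
])

# Made-up rows purely for demonstration.
demo = pd.DataFrame({
    "brand": ["RAW", "SmackDown", "NXT"],
    "weight_class": ["Heavyweight", "Cruiserweight", "Heavyweight"],
    "age": [38, 29, 24],
    "years_active": [15, 7, 2],
    "total_matches": [1200, 450, 90],
    "career_win_percentage": [61.5, 48.0, 35.2],
})
encoded = preprocessor.fit_transform(demo)
print(encoded.shape)  # 4 numeric + 3 brand + 2 weight_class columns
```

Setting `handle_unknown="ignore"` matters here because the external validation set contains brands that never appear in the WWE training data.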
- WWE: World Wrestling Entertainment Roster
- AEW: All Elite Wrestling Roster
- TNA: Total Nonstop Action Wrestling Roster
- ROH: Ring of Honor Roster
- NJPW: New Japan Pro Wrestling Roster
- NWA: National Wrestling Alliance Roster
- OVW: Ohio Valley Wrestling Roster
- AAA: Lucha Libre AAA Worldwide Roster
- AJPW: All Japan Pro Wrestling Roster
- NOAH: Pro Wrestling NOAH Roster
- MLW: Major League Wrestling Roster
- FPW: Filipino Pro Wrestling Roster
Implementation of multiple classification algorithms:
- 🌲 Random Forest Classifier
- 🎯 Support Vector Machine (SVM)
- 📈 Gradient Boosting Classifier
- 📊 Logistic Regression
- 👥 K-Nearest Neighbors
- 🌳 Decision Tree Classifier
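A rough sketch of how the six algorithms above might be compared side by side is shown below. It uses synthetic data from `make_classification` as a stand-in for the roster dataset, default or near-default hyperparameters, and 5-fold cross-validation; none of these settings are taken from the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic 3-class stand-in for the roster data (not real wrestler data).
X, y = make_classification(
    n_samples=185, n_features=12, n_informative=6, n_classes=3, random_state=0
)

# Scale-sensitive models are wrapped in a pipeline with StandardScaler.
models = {
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "Gradient Boosting": GradientBoostingClassifier(n_estimators=50, random_state=0),
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "K-Nearest Neighbors": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
}

# Mean accuracy over 5 stratified folds for each model.
scores = {
    name: cross_val_score(model, X, y, cv=5).mean()
    for name, model in models.items()
}
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<20} {score:.3f}")
```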
- 🔧 GridSearchCV for optimal parameter tuning
- ✅ Cross-validation with configurable folds
- 📝 Comprehensive model evaluation
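The tuning step can be illustrated with a small `GridSearchCV` run. The parameter grid, the macro-F1 scoring choice, and the synthetic data are all assumptions for this sketch; the notebook's actual grids are not shown in this README.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic 3-class stand-in for the roster data (not real wrestler data).
X, y = make_classification(
    n_samples=185, n_features=12, n_informative=6, n_classes=3, random_state=42
)

# Deliberately tiny placeholder grid so the example runs quickly.
param_grid = {"n_estimators": [25, 50], "max_depth": [3, None]}

# The number of folds is configurable via n_splits.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=cv,
    scoring="f1_macro",  # macro-F1 weights the three tiers equally
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

After fitting, `search.best_estimator_` holds a model refit on the full training data with the best parameter combination.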
- 🎯 Multiple performance metrics:
- ✅ Accuracy, Precision, Recall, F1-Score
- 📋 Classification reports
- 🎭 Confusion matrices
- 🔄 Cross-validation scores
- Dataset Size: 185 WWE superstars + 550 other brand superstars
- Feature Count: 18 comprehensive metrics per superstar
- Target Distribution: Multi-class classification across 3 tiers
- Cross-Validation: 5-fold CV for robust performance estimation
- Overall Accuracy - Total correct predictions
- Precision per Class - Main Eventer, Midcard, Enhancement
- Recall per Class - Sensitivity for each tier
- F1-Score - Harmonic mean of precision and recall
- Confusion Matrix - Detailed classification breakdown
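The metrics above can be computed with scikit-learn as in the short example below. The true and predicted labels are invented to keep the arithmetic easy to follow; they are not results from the project.

```python
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    confusion_matrix,
    precision_recall_fscore_support,
)

labels = ["Main Eventer", "Midcard", "Enhancement"]

# Invented labels for illustration only.
y_true = ["Main Eventer", "Main Eventer", "Midcard", "Midcard", "Midcard",
          "Enhancement", "Enhancement", "Enhancement"]
y_pred = ["Main Eventer", "Midcard", "Midcard", "Midcard", "Enhancement",
          "Enhancement", "Enhancement", "Main Eventer"]

accuracy = accuracy_score(y_true, y_pred)  # 5 of 8 predictions are correct

# Rows are true tiers, columns are predicted tiers, both in `labels` order.
cm = confusion_matrix(y_true, y_pred, labels=labels)

# Per-class precision, recall, and F1 (harmonic mean of the two).
precision, recall, f1, support = precision_recall_fscore_support(
    y_true, y_pred, labels=labels, zero_division=0
)

print(f"Accuracy: {accuracy:.3f}")
print(cm)
print(classification_report(y_true, y_pred, labels=labels, zero_division=0))
```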
Centralized configuration class for dynamic parameter management:
- 🌐 Data URLs and paths
- 🤖 Model parameters and test configurations
- 🎯 Feature engineering specifications
- 🔧 Hyperparameter grids for all models
- 🎨 Visualization settings
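A centralized configuration class along these lines could be written as a dataclass. Everything in this sketch is illustrative: the field names, the default values, the parameter grids, and the colors are assumptions, with only the dataset paths, the target name, and the 5-fold default taken from elsewhere in this README.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


def _default_param_grids() -> Dict[str, Dict[str, List]]:
    # Placeholder grids; the notebook's actual grids are not shown here.
    return {
        "random_forest": {"n_estimators": [100, 200], "max_depth": [None, 10]},
        "svm": {"C": [0.1, 1, 10], "kernel": ["rbf", "linear"]},
    }


def _default_tier_colors() -> Dict[str, str]:
    # Placeholder colors for plots.
    return {"Main Eventer": "gold", "Midcard": "silver", "Enhancement": "steelblue"}


@dataclass
class Config:
    # Data paths (relative to the repository root).
    train_path: str = "datasets/training/wwe_rosters.csv"
    test_path: str = "datasets/test/other_brand_rosters.csv"
    # Model and evaluation settings.
    target: str = "popularity_tier"
    test_size: float = 0.2
    cv_folds: int = 5
    random_state: int = 42
    # Feature engineering.
    categorical_features: Tuple[str, ...] = ("brand", "weight_class")
    # Mutable defaults go through default_factory so instances do not share them.
    param_grids: Dict[str, Dict[str, List]] = field(default_factory=_default_param_grids)
    tier_colors: Dict[str, str] = field(default_factory=_default_tier_colors)


# Override a single setting without touching the rest.
config = Config(cv_folds=10)
print(config.cv_folds, config.target)
```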
- 📥 DataLoader: Dynamic data acquisition and validation
- 🔍 DataExplorer: Comprehensive EDA with advanced visualizations
- 🤖 ModelTrainer: Multi-model training and evaluation framework
- 🎨 Custom color schemes for different popularity tiers
- 📊 Multiple plot types (bar, pie, distribution, correlation, pairplot)
- 📝 Statistical annotations and insights
- 🎯 Professional-grade matplotlib and seaborn visualizations
- Comprehensive analysis of popularity tier distribution
- Statistical summary of class balance
- Temporal analysis of tier distribution over debut years
- Identification of highly correlated features (>0.7)
- Top feature correlations analysis
- Pairplot visualization for key feature relationships
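The check for highly correlated features (absolute correlation above 0.7) can be sketched as follows. The helper name `highly_correlated_pairs` and the demo values are made up for this example; only the 0.7 threshold comes from the list above.

```python
import numpy as np
import pandas as pd


def highly_correlated_pairs(df: pd.DataFrame, threshold: float = 0.7) -> pd.Series:
    """Return feature pairs whose absolute Pearson correlation exceeds threshold."""
    corr = df.select_dtypes("number").corr().abs()
    # Keep the strict upper triangle so each pair appears once, without the diagonal.
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(mask).stack()
    return pairs[pairs > threshold].sort_values(ascending=False)


# Made-up values: total_matches and years_active rise together, age does not.
demo = pd.DataFrame({
    "total_matches": [100, 200, 300, 400, 500],
    "years_active": [2, 4, 6, 8, 11],
    "age": [30, 25, 40, 28, 35],
})
result = highly_correlated_pairs(demo)
print(result)
```

Pairs flagged this way are candidates for dropping or combining, since strongly correlated inputs add little information and can destabilize linear models.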
- Detailed descriptive statistics for all numerical features
- Variance, skewness, and kurtosis analysis
- Feature distribution insights
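A compact way to produce these statistics with pandas is sketched below. The helper name `distribution_summary` and the demo columns are hypothetical.

```python
import pandas as pd


def distribution_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Descriptive statistics plus variance, skewness, and excess kurtosis."""
    numeric = df.select_dtypes("number")
    summary = numeric.describe().T  # one row per feature
    summary["variance"] = numeric.var()
    summary["skewness"] = numeric.skew()
    summary["kurtosis"] = numeric.kurt()
    return summary


# Made-up columns: one symmetric, one with a long right tail.
demo = pd.DataFrame({
    "symmetric": [1, 2, 3, 4, 5],
    "right_skewed": [1, 1, 1, 2, 10],
})
summary = distribution_summary(demo)
print(summary[["mean", "variance", "skewness", "kurtosis"]])
```

A skewness near zero suggests a roughly symmetric feature, while a large positive value (as with follower counts, typically) hints that a log transform may help.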
The notebook is organized into clear phases:
- ⚙️ Configuration & Imports - Environment setup and library imports
- 📥 Data Loading & Exploration - Dynamic data acquisition and initial analysis
- 🔍 Comprehensive EDA - Detailed statistical and visual analysis
- ⚙️ Feature Engineering - Creation of enhanced features
- 🤖 Model Training & Evaluation - Multi-algorithm implementation
- 🔧 Hyperparameter Tuning - Optimization for best performance
- 📊 Results & Insights - Comprehensive model comparison
pandas>=1.3.0
numpy>=1.21.0
matplotlib>=3.4.0
seaborn>=0.11.0
scikit-learn>=1.0.0

CCMACLRL_COM231_PROJECT/
│
├── 🎨 assets/                        # Images and background files
│
├── 📁 datasets/
│   ├── 📁 test/
│   │   └── other_brand_rosters.csv   # External validation
│   │
│   └── 📁 training/
│       └── wwe_rosters.csv           # Primary training data
│
├── model/
│   └── 🌐 wwe_popularity_predictor.pkl
│
├── notebook/
│   └── 🐍 WWE_Popularity_Prediction.ipynb
│
├── docs/                             # Research paper
│   └── 🔍 research_paper.pdf
│
├── 📄 LICENCE
└── 📖 README.md
The system classifies superstars into three main tiers:
- 🎪 Main Eventer: Top-tier performers, championship contenders
- 🥈 Midcard: Regular performers with consistent appearances
- 🌟 Enhancement: Developing talent and roster support
- 🔄 Real-time data integration from wrestling APIs
- 🧠 Advanced ensemble methods and neural networks
- 🌐 Web application for interactive predictions
- 📊 Expanded feature set including match ratings and fan sentiment
- 📱 Mobile app for on-the-go predictions
- 🔍 Advanced feature importance analysis
🧠 Contributions are welcome! If you have ideas for improvements or would like to contribute to the Artificial Ledger Wrestlers Predictor, please follow these steps:
- Fork the repository.
- Create a new branch for your feature or bug fix.
- Make your changes and commit them.
- Push your changes to your forked repository.
- Submit a pull request to the main repository. 💕
- 😎 Jay Arre Talosig - Machine Learning Engineer | Blockchain Developer | Bioinformatics Scientist
- 🧭 Queen Maegan Pedido - Machine Learning Engineer | Software Engineer
- 💥 Moira Mercado - Machine Learning Engineer | Software Engineer
- 🎲 James Adrian Castro - Machine Learning Engineer | Software Engineer
Some changes need to be addressed:
- TBA
- TBA
- TBA

This project is developed for educational and portfolio purposes. WWE data is used under fair use for academic research.
@misc{wwe_popularity_2025,
title = {WWE Superstar Popularity Tier Prediction},
author = {Jay Arre Talosig},
year = {2025},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/flexycode/CCMACLRL_COM231_PROJECT}}
}
Check the License tab for copyright permissions.
Professor Elizer Ponio Jr. is a software engineer, lecturer, and machine learning engineer at the National University. With a Bachelor of Science degree and a Master's degree in Computer Science, he brings a strong foundation in computer science principles. Prof. Ponio's expertise in software engineering and machine learning is evident in his teaching style and practical approach. He is dedicated to providing students with a comprehensive understanding of the subject matter and incorporates real-world applications into his instruction. Prof. Ponio's combination of academic qualifications, industry experience, and passion for teaching makes him a valuable asset to the National University community.
Chronological list of updates, bug fixes, new features, and other modifications for our Machine Learning Project.
- 💻 Final Project requirements for our project
- ✨ RAW
- ✨ SmackDown
- ✨ NXT
- 💻 Uploaded the Python Notebook
🧊 ML Final Project
⭐ If you find this project useful, please give it a star on GitHub! 🎯 Predicting wrestling stardom through data science and machine learning. Building the future of sports analytics one superstar at a time! 🏆












