A basketball analytics framework that uses machine learning to predict team performance and identify competitive NBA team compositions.
Superteam uses XGBoost regression to predict team performance metrics (plus-minus scores) based on collective player statistics from NBA games. The system can:
- Simulate matchups between any two NBA teams
- Simulate an entire regular season to rank all 30 teams
- Run tournament-style brackets with random team compositions
- Find optimal team compositions within salary cap constraints
- Suggest trades to improve team performance
- Build optimal teams around a specific player
- Data Collection: Fetches comprehensive box score data from 7 different NBA API endpoints
- Machine Learning: XGBoost models trained on 10,000+ games
- Interactive Dashboard: Streamlit web application for exploration
- Trade Analysis: Find value-matched trades to improve team performance
- Salary Cap Awareness: Optional salary cap constraints for realistic team building
- Python 3.9+
- MongoDB Atlas account (for data storage)
- Clone the repository:
    git clone <repository-url>
    cd super_team
- Create a virtual environment:
    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
- Install the package:
    pip install -e ".[dev]"
- Configure environment variables:
    cp .env.example .env
    # Edit .env with your MongoDB credentials
Run the Streamlit dashboard:
    streamlit run src/superteam/app.py
The dashboard provides the following applications:
- Raw Data: Browse player statistics with a custom scoring metric
- Simulate Matchup: Compare any two NBA teams
- Simulate Regular Season: Rank all 30 NBA teams
- Simulate Tournament: Run multi-round tournament brackets
- Build Team Around Player: Optimize roster with a specific player
- Get Super Team: Find the best team composition
- Trade Finder: Suggest roster improvements via trades
To collect fresh data from the NBA API:
    python src/superteam/collect_data.py
This process:
- Fetches game data from multiple NBA API endpoints
- Stores player and team performance statistics in MongoDB
- Implements rate limiting to respect API constraints
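The rate limiting mentioned above can be sketched as a loop that spaces out successive requests. This is a minimal illustration, not the code in collect_data.py: the function name `fetch_with_rate_limit`, the `fetch` callable, and the 0.6-second interval are all assumptions made for the example.

```python
import time
from typing import Callable, Iterable, List


def fetch_with_rate_limit(
    game_ids: Iterable[str],
    fetch: Callable[[str], dict],
    min_interval: float = 0.6,  # assumed value, not taken from the project
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> List[dict]:
    """Call fetch(game_id) for each id, keeping calls at least min_interval seconds apart."""
    results = []
    last_call = None
    for game_id in game_ids:
        if last_call is not None:
            # Wait only for the part of the interval that has not elapsed yet.
            wait = min_interval - (clock() - last_call)
            if wait > 0:
                sleep(wait)
        last_call = clock()
        results.append(fetch(game_id))
    return results
```

The `clock` and `sleep` parameters exist only so the pacing logic can be exercised without real delays; in normal use the defaults apply.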
To train new models:
    python src/superteam/model.py
The training script:
- Loads game data from MongoDB
- Preprocesses features for matchup prediction
- Trains XGBoost regression models
- Saves models for different team sizes (1, 5, 8, 10, 13 players)
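Since one model is saved per team size, code that loads a model needs to map a team size to a file. The sketch below assumes the file naming shown in the project structure (for example `5_player_model.json` under `models/`); the helper name `model_path` is hypothetical.

```python
from pathlib import Path

# Team sizes for which the training script saves a model.
SUPPORTED_TEAM_SIZES = (1, 5, 8, 10, 13)


def model_path(team_size: int, models_dir: str = "models") -> Path:
    """Return the expected model file for a team size (hypothetical helper)."""
    if team_size not in SUPPORTED_TEAM_SIZES:
        raise ValueError(
            f"No model for team size {team_size}; expected one of {SUPPORTED_TEAM_SIZES}"
        )
    return Path(models_dir) / f"{team_size}_player_model.json"
```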
super_team/
├── src/
│ └── superteam/ # Main package
│ ├── __init__.py # Package initialization
│ ├── app.py # Streamlit web application
│ ├── simulation.py # Team simulation & optimization
│ ├── helpers.py # Utility functions
│ ├── model.py # Model training script
│ ├── collect_data.py # Data collection from NBA API
│ ├── models.py # Pydantic data models
│ ├── constants.py # Configuration (environment variables)
│ └── logger.py # Logging configuration
├── tests/ # Test suite
│ ├── conftest.py # Pytest fixtures
│ ├── test_helpers.py # Helper function tests
│ ├── test_simulation.py # Simulation function tests
│ └── test_models.py # Pydantic model tests
├── notebooks/ # Jupyter notebooks
├── data/ # Player data CSVs
├── models/ # Trained XGBoost models
│ ├── 1_player_model.json
│ ├── 5_player_model.json
│ ├── 8_player_model.json
│ ├── 10_player_model.json
│ └── 13_player_model.json
├── pyproject.toml # Python packaging config
├── requirements.txt # Python dependencies
├── .env.example # Example environment file
└── doc/ # Documentation
| Variable | Description | Default |
|---|---|---|
| MONGO_PW | MongoDB password | (required) |
| MONGO_DB | MongoDB database name | dev |
| MONGO_NAME | MongoDB username | superteam |
| LOG_LEVEL | Logging level | INFO |
| LOG_FILE | Log file path (optional) | (none) |
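As an illustration of how these variables and defaults fit together, the sketch below reads them from a mapping such as `os.environ`. It is not the contents of constants.py: the `Settings` class and `load_settings` function are hypothetical names, and only the variable names and defaults come from the table.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    mongo_pw: str
    mongo_db: str = "dev"
    mongo_name: str = "superteam"
    log_level: str = "INFO"
    log_file: Optional[str] = None


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables, applying the documented defaults."""
    if not env.get("MONGO_PW"):
        # MONGO_PW has no default, so fail early with a clear message.
        raise RuntimeError("MONGO_PW is required")
    return Settings(
        mongo_pw=env["MONGO_PW"],
        mongo_db=env.get("MONGO_DB", "dev"),
        mongo_name=env.get("MONGO_NAME", "superteam"),
        log_level=env.get("LOG_LEVEL", "INFO"),
        log_file=env.get("LOG_FILE") or None,  # empty string is treated as unset
    )
```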
The model uses the following hyperparameters:
- Booster: gbtree
- Learning rate: 0.1
- Estimators: 100
- Max depth: 4
- Early stopping: 50 rounds
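Written as a Python dictionary, these settings might look like the following. The key names follow XGBoost's scikit-learn-style parameter names; whether model.py uses that interface or the native training API is an assumption here.

```python
# Hyperparameters from the list above, using XGBoost's scikit-learn-style names.
XGB_PARAMS = {
    "booster": "gbtree",
    "learning_rate": 0.1,
    "n_estimators": 100,
    "max_depth": 4,
    "early_stopping_rounds": 50,
}

# Assumed usage (requires the xgboost package, so it is left as a comment):
#   model = xgboost.XGBRegressor(**XGB_PARAMS)
#   model.fit(X_train, y_train, eval_set=[(X_val, y_val)])
# Early stopping only takes effect when a validation set is supplied via eval_set.
```

In recent XGBoost releases `early_stopping_rounds` is accepted by the estimator constructor; older releases expected it as an argument to `fit` instead.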
- Collection: Box scores are fetched from 7 NBA API endpoints (advanced stats, tracking, traditional, four factors, misc, scoring, usage)
- Storage: Raw data is stored in MongoDB collections
- Preprocessing: Statistics are flattened and normalized for model input
- Training: XGBoost models are trained on historical matchup data
- Prediction: Models predict plus-minus for team matchups
- For each team, get the top N players by minutes played
- Create feature vectors from player statistics
- Stack features for both teams into a single input
- The model predicts the plus-minus differential
- The team with the higher predicted plus-minus wins
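The steps above can be sketched in plain Python. This is an illustration under stated assumptions, not the code in simulation.py: players are dictionaries with a `minutes` key, each roster has at least N players, `predict` is a stand-in for a trained model, and the model is assumed to be queried once per team with that team's features first.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Player = Dict[str, float]


def top_n_by_minutes(roster: Sequence[Player], n: int) -> List[Player]:
    """Select the n players with the most minutes played."""
    return sorted(roster, key=lambda p: p["minutes"], reverse=True)[:n]


def team_features(roster: Sequence[Player], n: int, stat_keys: Sequence[str]) -> List[float]:
    """Flatten the chosen stats of the top-n players into one feature list."""
    features: List[float] = []
    for player in top_n_by_minutes(roster, n):
        features.extend(player[key] for key in stat_keys)
    return features


def predict_matchup(
    team_a: Sequence[Player],
    team_b: Sequence[Player],
    n: int,
    stat_keys: Sequence[str],
    predict: Callable[[List[float]], float],  # hypothetical stand-in for the model
) -> Tuple[str, float, float]:
    """Return the winner label and each team's predicted plus-minus."""
    feats_a = team_features(team_a, n, stat_keys)
    feats_b = team_features(team_b, n, stat_keys)
    # Stack both teams into a single input row; the team listed first is the
    # one whose plus-minus is being predicted (an assumption of this sketch).
    pred_a = predict(feats_a + feats_b)
    pred_b = predict(feats_b + feats_a)
    winner = "A" if pred_a >= pred_b else "B"  # ties go to team A in this sketch
    return winner, pred_a, pred_b
```

With a toy predictor such as `lambda row: sum(row[: len(row) // 2]) - sum(row[len(row) // 2 :])`, the function can be exercised without a trained model.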
The get_super_team function uses a genetic algorithm-style approach, in practice a random search that keeps the best team found so far:
- Start with a random team
- Generate random challenger teams
- If a challenger beats the current best (and meets the salary cap), it becomes the new best
- Repeat for specified iterations
- Return best team found
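The loop described above can be sketched as follows. This is not the project's get_super_team implementation: the name `search_super_team`, the `beats` callable (a stand-in for a model-based matchup), and the `salary` key on each player are assumptions for illustration.

```python
import random
from typing import Callable, Dict, List, Optional, Sequence

Player = Dict[str, float]


def total_salary(team: Sequence[Player]) -> float:
    return sum(player["salary"] for player in team)


def search_super_team(
    pool: Sequence[Player],
    team_size: int,
    iterations: int,
    beats: Callable[[Sequence[Player], Sequence[Player]], bool],  # hypothetical matchup check
    salary_cap: Optional[float] = None,
    seed: Optional[int] = None,
) -> List[Player]:
    """Random search: keep the current best team, replace it when a challenger wins."""
    rng = random.Random(seed)
    # Start with a random team. Note: this sketch does not cap-check the starting team.
    best = rng.sample(list(pool), team_size)
    for _ in range(iterations):
        challenger = rng.sample(list(pool), team_size)
        # Skip challengers that exceed the optional salary cap.
        if salary_cap is not None and total_salary(challenger) > salary_cap:
            continue
        if beats(challenger, best):
            best = challenger
    return best
```

Because challengers are drawn independently rather than produced by crossover or mutation, this sketch is a plain random search; more iterations raise the chance of finding a stronger team but give no guarantee of optimality.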
Run the test suite:
    pytest
Run with coverage:
    pytest --cov=superteam --cov-report=html
The test suite includes:
- Unit tests for helper functions
- Unit tests for simulation functions
- Unit tests for Pydantic data models
- Fork the repository
- Create a feature branch
- Make your changes
- Run tests: pytest
- Submit a pull request
This project is for educational and research purposes.
- NBA API for providing comprehensive basketball statistics
- XGBoost for the machine learning framework
- Streamlit for the interactive dashboard