Superteam

A basketball analytics framework that uses machine learning to predict team performance and identify competitive NBA team compositions.

Overview

Superteam uses XGBoost regression to predict team performance metrics (plus-minus scores) based on collective player statistics from NBA games. The system can:

Simulate matchups between any two NBA teams
Simulate an entire regular season to rank all 30 teams
Run tournament-style brackets with random team compositions
Find optimal team compositions within salary cap constraints
Suggest trades to improve team performance
Build optimal teams around a specific player

Features

Data Collection: Fetches comprehensive box score data from 7 different NBA API endpoints
Machine Learning: XGBoost models trained on 10,000+ games
Interactive Dashboard: Streamlit web application for exploration
Trade Analysis: Find value-matched trades to improve team performance
Salary Cap Awareness: Optional salary cap constraints for realistic team building

Installation

Prerequisites

Python 3.9+
MongoDB Atlas account (for data storage)

Setup

Clone the repository:

```
git clone <repository-url>
cd super_team
```

Create a virtual environment:

```
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

Install the package:
```
pip install -e ".[dev]"
```

Configure environment variables:

```
cp .env.example .env
# Edit .env with your MongoDB credentials
```

Usage

Web Application

Run the Streamlit dashboard:

```
streamlit run src/superteam/app.py
```

The dashboard provides the following applications:

Raw Data: Browse player statistics with a custom scoring metric
Simulate Matchup: Compare any two NBA teams
Simulate Regular Season: Rank all 30 NBA teams
Simulate Tournament: Run multi-round tournament brackets
Build Team Around Player: Optimize roster with a specific player
Get Super Team: Find the best team composition
Trade Finder: Suggest roster improvements via trades
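The Simulate Regular Season view ranks all 30 teams from predicted matchups. A minimal sketch of one way to produce such a ranking is a single round robin, assuming every pair of teams meets exactly once; the app itself may follow a different schedule. `rank_teams` and the `predict` callable are hypothetical names, with `predict` standing in for the trained model.

```python
from itertools import combinations
from typing import Callable, Dict, List, Tuple

# Hypothetical signature: returns the predicted plus-minus of team_a versus
# team_b (positive means team_a is favoured). In the real app this would wrap
# the trained XGBoost model; here it is any callable.
PredictFn = Callable[[str, str], float]


def rank_teams(teams: List[str], predict: PredictFn) -> List[Tuple[str, int]]:
    """Play every pair of teams once and rank them by number of predicted wins."""
    wins: Dict[str, int] = {team: 0 for team in teams}
    for team_a, team_b in combinations(teams, 2):
        # A positive prediction is a win for team_a; a prediction of exactly 0
        # is counted as a win for team_b.
        if predict(team_a, team_b) > 0:
            wins[team_a] += 1
        else:
            wins[team_b] += 1
    # Sort by wins (descending), breaking ties alphabetically for determinism.
    return sorted(wins.items(), key=lambda item: (-item[1], item[0]))
```

For example, with a toy predictor built from fixed strengths `{"BOS": 5, "DEN": 3, "LAL": 1}` and `predict = lambda a, b: strength[a] - strength[b]`, the ranking is `[("BOS", 2), ("DEN", 1), ("LAL", 0)]`.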

Data Collection

To collect fresh data from the NBA API:

```
python src/superteam/collect_data.py
```

This process:

Fetches game data from multiple NBA API endpoints
Stores player and team performance statistics in MongoDB
Implements rate limiting to respect API constraints
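The README does not say how rate limiting is implemented. A common approach is to enforce a minimum delay between consecutive requests; the sketch below shows that idea with a hypothetical `RateLimiter` helper, and the interval you pass in is a placeholder rather than a documented NBA API limit.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimiter:
    """Enforce a minimum delay between the starts of consecutive calls (hypothetical helper)."""

    def __init__(self, min_interval_s: float) -> None:
        self.min_interval_s = min_interval_s
        self._last_call = float("-inf")  # so the first call never waits

    def call(self, fn: Callable[..., T], *args, **kwargs) -> T:
        # Sleep just long enough that calls start at least min_interval_s apart.
        elapsed = time.monotonic() - self._last_call
        if elapsed < self.min_interval_s:
            time.sleep(self.min_interval_s - elapsed)
        self._last_call = time.monotonic()
        return fn(*args, **kwargs)
```

Each endpoint request would then go through `limiter.call(fetch_fn, ...)`, where `fetch_fn` is whatever function performs the request.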

Model Training

To train new models:

```
python src/superteam/model.py
```

The training script:

Loads game data from MongoDB
Preprocesses features for matchup prediction
Trains XGBoost regression models
Saves models for different team sizes (1, 5, 8, 10, 13 players)
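Because one model is saved per team size, callers need to pick the file that matches the number of players being compared. The helper below is a hypothetical sketch that follows the `N_player_model.json` naming shown under `models/`; `load_model` assumes the files were saved from an `xgboost.XGBRegressor`, which the README does not confirm.

```python
from pathlib import Path

# Team sizes for which the README lists a saved model under models/.
SUPPORTED_TEAM_SIZES = (1, 5, 8, 10, 13)


def model_path(team_size: int, model_dir: Path = Path("models")) -> Path:
    """Return the expected model file for a team size (N_player_model.json naming)."""
    if team_size not in SUPPORTED_TEAM_SIZES:
        raise ValueError(
            f"no model for {team_size} players; expected one of {SUPPORTED_TEAM_SIZES}"
        )
    return model_dir / f"{team_size}_player_model.json"


def load_model(team_size: int):
    # Imported lazily so model_path() works without xgboost installed.
    from xgboost import XGBRegressor

    model = XGBRegressor()
    model.load_model(str(model_path(team_size)))
    return model
```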

Project Structure

```
super_team/
├── src/
│   └── superteam/          # Main package
│       ├── __init__.py     # Package initialization
│       ├── app.py          # Streamlit web application
│       ├── simulation.py   # Team simulation & optimization
│       ├── helpers.py      # Utility functions
│       ├── model.py        # Model training script
│       ├── collect_data.py # Data collection from NBA API
│       ├── models.py       # Pydantic data models
│       ├── constants.py    # Configuration (environment variables)
│       └── logger.py       # Logging configuration
├── tests/                  # Test suite
│   ├── conftest.py         # Pytest fixtures
│   ├── test_helpers.py     # Helper function tests
│   ├── test_simulation.py  # Simulation function tests
│   └── test_models.py      # Pydantic model tests
├── notebooks/              # Jupyter notebooks
├── data/                   # Player data CSVs
├── models/                 # Trained XGBoost models
│   ├── 1_player_model.json
│   ├── 5_player_model.json
│   ├── 8_player_model.json
│   ├── 10_player_model.json
│   └── 13_player_model.json
├── pyproject.toml          # Python packaging config
├── requirements.txt        # Python dependencies
├── .env.example            # Example environment file
└── doc/                    # Documentation
```

Configuration

Environment Variables

| Variable | Description | Default |
|---|---|---|
| `MONGO_PW` | MongoDB password | (required) |
| `MONGO_DB` | MongoDB database name | `dev` |
| `MONGO_NAME` | MongoDB username | `superteam` |
| `LOG_LEVEL` | Logging level | `INFO` |
| `LOG_FILE` | Log file path (optional) | (none) |
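A minimal sketch of reading these variables with their defaults, in the spirit of what `constants.py` is described as doing. The `Settings` and `load_settings` names are hypothetical, and the sketch reads only the process environment; loading `.env` into it (for example with python-dotenv) is assumed to happen elsewhere.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Typed view of the environment variables in the table above."""

    mongo_pw: str
    mongo_db: str = "dev"
    mongo_name: str = "superteam"
    log_level: str = "INFO"
    log_file: Optional[str] = None


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    # MONGO_PW has no default, so fail fast with a clear message.
    password = env.get("MONGO_PW")
    if not password:
        raise RuntimeError("MONGO_PW is not set; copy .env.example to .env and fill it in")
    return Settings(
        mongo_pw=password,
        mongo_db=env.get("MONGO_DB", "dev"),
        mongo_name=env.get("MONGO_NAME", "superteam"),
        log_level=env.get("LOG_LEVEL", "INFO"),
        # An empty LOG_FILE is treated the same as an unset one.
        log_file=env.get("LOG_FILE") or None,
    )
```

Passing a plain dict instead of `os.environ` makes the function easy to test.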

Model Configuration

The model uses the following hyperparameters:

Booster: gbtree
Learning rate: 0.1
Estimators: 100
Max depth: 4
Early stopping: 50 rounds
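For reference, the hyperparameters above map onto keyword arguments of XGBoost's scikit-learn wrapper, `xgboost.XGBRegressor`. This is a sketch, not necessarily how `model.py` builds its models: `XGB_PARAMS` and `make_regressor` are hypothetical names, and passing `early_stopping_rounds` to the constructor assumes xgboost 1.6 or later.

```python
# Hyperparameters from the list above as XGBRegressor keyword arguments
# (assumes xgboost >= 1.6, where early_stopping_rounds is a constructor argument).
XGB_PARAMS = {
    "booster": "gbtree",
    "learning_rate": 0.1,
    "n_estimators": 100,
    "max_depth": 4,
    "early_stopping_rounds": 50,
}


def make_regressor():
    # Imported lazily so XGB_PARAMS can be inspected without xgboost installed.
    from xgboost import XGBRegressor

    return XGBRegressor(**XGB_PARAMS)
```

Early stopping only takes effect when `fit` receives a validation set, e.g. `make_regressor().fit(X_train, y_train, eval_set=[(X_val, y_val)])`, where the four arrays are placeholders. With 100 estimators and a patience of 50 rounds, early stopping can trim at most roughly the second half of training.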

How It Works

Data Pipeline

Collection: Box scores are fetched from 7 NBA API endpoints (advanced stats, tracking, traditional, four factors, misc, scoring, usage)
Storage: Raw data is stored in MongoDB collections
Preprocessing: Statistics are flattened and normalized for model input
Training: XGBoost models are trained on historical matchup data
Prediction: Models predict plus-minus for team matchups
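The preprocessing step "flatten and normalize" can be illustrated with a minimal sketch. Both choices here are assumptions: nested stat groups are flattened into `group_stat` keys, and normalization is a per-column z-score. The project may use a different scheme (for example min-max scaling), and `flatten` and `zscore_columns` are hypothetical names.

```python
from statistics import mean, pstdev
from typing import Any, Dict, List, Tuple


def flatten(record: Dict[str, Any], prefix: str = "") -> Dict[str, float]:
    """Flatten nested stat dicts, e.g. {"advanced": {"pie": 0.1}} -> {"advanced_pie": 0.1}.

    Non-numeric fields (names, IDs stored as strings, booleans) are dropped.
    """
    flat: Dict[str, float] = {}
    for key, value in record.items():
        name = f"{prefix}_{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        elif isinstance(value, (int, float)) and not isinstance(value, bool):
            flat[name] = float(value)
    return flat


def zscore_columns(rows: List[Dict[str, float]]) -> List[Dict[str, float]]:
    """Standardize each column to mean 0, std 1; columns with zero spread become 0.

    Assumes rows is non-empty and every row has the same keys.
    """
    stats: Dict[str, Tuple[float, float]] = {}
    for col in rows[0]:
        values = [row[col] for row in rows]
        stats[col] = (mean(values), pstdev(values))
    return [
        {col: (row[col] - mu) / sd if sd > 0 else 0.0 for col, (mu, sd) in stats.items()}
        for row in rows
    ]
```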

Simulation Algorithm

For each team, get the top N players by minutes played
Create feature vectors from player statistics
Stack features for both teams into a single input
Model predicts plus-minus differential
Team with higher prediction wins
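The steps above can be sketched as follows. The player record layout, the `predict` callable (standing in for the trained model), and the choice to predict once with each team listed first and compare the two results are all assumptions for illustration; `team_features` and `simulate_matchup` are hypothetical names, not necessarily those in `simulation.py`.

```python
from typing import Any, Callable, Dict, List, Sequence

# Hypothetical player record: {"name": str, "minutes": float, "stats": list of floats}.
Player = Dict[str, Any]
Predictor = Callable[[List[float]], float]


def team_features(roster: Sequence[Player], n: int) -> List[float]:
    """Concatenate the stat vectors of a team's top-n players by minutes played."""
    if len(roster) < n:
        raise ValueError(f"roster has {len(roster)} players, need at least {n}")
    top = sorted(roster, key=lambda p: p["minutes"], reverse=True)[:n]
    features: List[float] = []
    for player in top:
        features.extend(player["stats"])
    return features


def simulate_matchup(
    team_a: Sequence[Player], team_b: Sequence[Player], n: int, predict: Predictor
) -> str:
    """Return "A" or "B" for the team with the higher predicted plus-minus."""
    a_feats = team_features(team_a, n)
    b_feats = team_features(team_b, n)
    # Stack both teams into one input row, once in each order, so each team
    # gets a prediction from its own point of view.
    pm_a = predict(a_feats + b_feats)
    pm_b = predict(b_feats + a_feats)
    # Ties go to team B.
    return "A" if pm_a > pm_b else "B"
```

With a real XGBoost model, `predict` could be a small wrapper such as `lambda x: float(model.predict(np.array([x]))[0])`.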

Team Optimization

The `get_super_team` function uses an iterative random-search approach:

Start with a random team
Generate random challenger teams
If a challenger beats the current best (and fits under the salary cap, when one is set), it becomes the new best
Repeat for specified iterations
Return best team found
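A minimal sketch of this loop, assuming a `beats(challenger, best)` callable that wraps a simulated matchup and a per-player salary lookup. The names are hypothetical, and replacing an over-cap starting team with the first under-cap challenger is an assumption the README does not spell out.

```python
import random
from typing import Callable, List, Mapping, Optional, Sequence


def get_super_team_sketch(
    players: Sequence[str],
    salaries: Mapping[str, float],
    team_size: int,
    beats: Callable[[List[str], List[str]], bool],
    iterations: int = 100,
    salary_cap: Optional[float] = None,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Random search: keep the best team found so far, challenge it with random teams."""
    rng = rng or random.Random()
    pool = list(players)

    def under_cap(team: List[str]) -> bool:
        return salary_cap is None or sum(salaries[p] for p in team) <= salary_cap

    best = rng.sample(pool, team_size)  # start with a random team
    for _ in range(iterations):
        challenger = rng.sample(pool, team_size)
        if not under_cap(challenger):
            continue  # challengers over the cap are never accepted
        # An over-cap starting team is replaced by the first valid challenger.
        if not under_cap(best) or beats(challenger, best):
            best = challenger
    return best
```

Because challengers are drawn independently rather than derived from the current best, more iterations simply mean more random samples of the roster space.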

Testing

Run the test suite:

```
pytest
```

Run with coverage:

```
pytest --cov=superteam --cov-report=html
```

The test suite includes:

Unit tests for helper functions
Unit tests for simulation functions
Unit tests for Pydantic data models

Contributing

Fork the repository
Create a feature branch
Make your changes
Run the tests: `pytest`
Submit a pull request

License

This project is for educational and research purposes.

Acknowledgments

NBA API for providing comprehensive basketball statistics
XGBoost for the machine learning framework
Streamlit for the interactive dashboard

marcoloco23/super_team
