A backtesting framework for testing trading strategies with statistical rigor using real market data.
This project implements a backtesting framework guided by the Unix philosophy and a few related design goals:
- Single Responsibility: Each component does one thing well
- Composability: Components work together seamlessly
- Statistical Rigor: Proper significance testing and edge detection
- Simplicity: Easy to understand and extend
- Testability: Comprehensive test coverage (65+ tests)
- Data Pipeline (`backtest/data_loader.py`, `backtest/downloader.py`)
  - Downloads real market data from Polygon.io flat files
  - Flexible timeframe detection (minute vs. day data)
  - Local caching to minimize API calls
- Strategy Framework (`backtest/strategy.py`)
  - Abstract base class for all trading strategies
  - Simple interface: `on_data(bar) -> List[Order]`
  - Built-in position tracking and convenience methods
- Portfolio Management (`backtest/portfolio.py`)
  - Executes orders and tracks cash/positions
  - Supports both long and short positions
  - Real-time portfolio valuation
- Order System (`backtest/order.py`)
  - Simple Order and Position classes
  - Market and limit order support
  - Clean separation of concerns
- Backtesting Engine (`backtest/engine.py`)
  - Orchestrates strategy execution over market data
  - Linear processing with clear results
  - Performance metrics and statistics
- Statistical Testing Framework (`backtest/statistical_testing.py`)
  - Rigorous statistical analysis of strategy performance
  - Transaction cost modeling (configurable %)
  - Cross-sectional and time-series testing
  - T-tests, confidence intervals, and significance testing
  - Market cap filtering and stock selection
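To make the portfolio component above more concrete, here is a minimal sketch of cash and position tracking with long and short support. The `Portfolio` class, its `execute` and `value` methods, and the `XYZ` ticker are hypothetical stand-ins written for this example. They are not the actual contents of `backtest/portfolio.py`.

```python
from dataclasses import dataclass, field


@dataclass
class Portfolio:
    """Hypothetical stand-in: tracks cash and signed share counts per ticker."""

    cash: float
    positions: dict[str, int] = field(default_factory=dict)

    def execute(self, ticker: str, quantity: int, price: float) -> None:
        """Fill an order at `price`; quantity > 0 buys, quantity < 0 sells or shorts."""
        self.cash -= quantity * price
        new_quantity = self.positions.get(ticker, 0) + quantity
        if new_quantity:
            self.positions[ticker] = new_quantity
        else:
            # Drop flat positions so the dict only holds open ones.
            self.positions.pop(ticker, None)

    def value(self, prices: dict[str, float]) -> float:
        """Mark to market: cash plus signed position values at current prices."""
        return self.cash + sum(q * prices[t] for t, q in self.positions.items())


portfolio = Portfolio(cash=100_000.0)
portfolio.execute("SPY", 100, 500.0)  # long 100 shares
portfolio.execute("XYZ", -50, 40.0)   # short 50 shares of a made-up ticker
total = portfolio.value({"SPY": 510.0, "XYZ": 35.0})
print(total)  # 101250.0
```

Storing a signed quantity per ticker lets one code path handle both long and short positions: a short sale adds cash and leaves a negative position whose value shrinks the total as the price rises.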
- Buy and Hold (`strategies/buy_and_hold.py`): Simple benchmark strategy
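A custom strategy would plug into the `on_data(bar) -> List[Order]` interface. The sketch below shows the general shape under stated assumptions: `Bar`, `Order`, `Strategy`, and `BuyOnceStrategy` are simplified stand-ins defined here so the example runs on its own. The real classes in `backtest/strategy.py` and `backtest/order.py` may have different fields and constructors.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Bar:
    """Hypothetical minimal bar: one ticker and its closing price."""
    ticker: str
    close: float


@dataclass
class Order:
    """Hypothetical minimal market order: positive quantity means buy."""
    ticker: str
    quantity: int


class Strategy(ABC):
    """Stand-in for the abstract base class: one method per incoming bar."""

    @abstractmethod
    def on_data(self, bar: Bar) -> list[Order]:
        ...


class BuyOnceStrategy(Strategy):
    """Buys a fixed dollar amount of each ticker the first time it is seen."""

    def __init__(self, investment_per_ticker: float) -> None:
        self.investment_per_ticker = investment_per_ticker
        self._bought: set[str] = set()

    def on_data(self, bar: Bar) -> list[Order]:
        if bar.ticker in self._bought or bar.close <= 0:
            return []
        quantity = int(self.investment_per_ticker // bar.close)
        if quantity == 0:
            return []  # not enough capital for a single share
        self._bought.add(bar.ticker)
        return [Order(bar.ticker, quantity)]


strategy = BuyOnceStrategy(investment_per_ticker=10_000)
first = strategy.on_data(Bar("SPY", 500.0))
second = strategy.on_data(Bar("SPY", 505.0))
print(first, second)
```

Returning a list of orders (possibly empty) keeps the strategy free of execution details, which matches the separation between strategy and portfolio described above.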
- Python 3.13+
- uv package manager
- Polygon.io API credentials (free tier available)
- Clone the repository:

      git clone <repository-url>
      cd backtest

- Install dependencies:

      uv sync

- Set up credentials:

      cp .env.example .env
      # Edit .env with your Polygon.io API key

Run a basic backtest:

    from backtest.downloader import PolygonDownloader
    from backtest.data_loader import DataLoader
    from backtest.engine import Engine
    from strategies.buy_and_hold import BuyAndHoldStrategy
    from datetime import date

    # Download data
    downloader = PolygonDownloader()
    data_file = downloader.download_stock_day_data(date(2025, 1, 2))

    # Load SPY data
    data = DataLoader.from_polygon_csv(data_file)
    spy_data = [bar for bar in data if bar.ticker == "SPY"]

    # Run backtest
    engine = Engine(initial_cash=100000)
    strategy = BuyAndHoldStrategy(investment_per_ticker=100000)
    results = engine.run(strategy, spy_data)
    print(f"Total Return: {results.total_return:.2%}")

Test a strategy for statistical significance:

    from backtest.statistical_testing import StatisticalTester
    from strategies.buy_and_hold import BuyAndHoldStrategy
    from datetime import date

    # Test buy-and-hold strategy for statistical significance
    tester = StatisticalTester(transaction_cost_pct=0.05)  # 5% total costs
    results, summary = tester.run_cross_sectional_test(
        strategy_class=BuyAndHoldStrategy,
        start_date=date(2025, 1, 2),
        end_date=date(2025, 1, 31),
        n_stocks=100,
        initial_cash=100000
    )
    tester.print_summary(summary, "Buy-and-Hold Edge Test")
    # Output: Statistical analysis with p-values, confidence intervals, win rates

Example output:

    ============================================================
    BUY-AND-HOLD EDGE TEST RESULTS
    ============================================================
    Sample Size: 50 stocks
    Benchmark Return (SPY): 2.94%

    PERFORMANCE METRICS:
    Mean Return: -1.95%
    Standard Deviation: 10.70%
    Win Rate vs Benchmark: 26.0%
    Mean Sharpe Ratio: -0.019

    STATISTICAL SIGNIFICANCE TEST:
    Null Hypothesis: Mean return = Benchmark return
    T-statistic: -3.227
    P-value: 0.0022
    95% Confidence Interval: [-4.99%, 1.10%]

    🔴 SIGNIFICANT UNDERPERFORMANCE (p < 0.05)
    The strategy performs significantly worse than benchmark.
    ============================================================
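The figures in the sample output above can be cross-checked by hand. The sketch below assumes the report uses a one-sample Student's t-test of the mean return against the benchmark with n - 1 degrees of freedom, and that the confidence interval is built around the mean return itself. Both are assumptions about the framework, not confirmed behaviour. The critical value 2.0096 is the standard two-sided 95% value for 49 degrees of freedom.

```python
import math

# Summary statistics copied from the sample output (in percent).
n = 50
mean_return = -1.95
std_dev = 10.70
benchmark_return = 2.94

# Two-sided 95% critical value of Student's t with n - 1 = 49 degrees of freedom.
T_CRIT_95_DF49 = 2.0096

standard_error = std_dev / math.sqrt(n)
t_statistic = (mean_return - benchmark_return) / standard_error
ci_low = mean_return - T_CRIT_95_DF49 * standard_error
ci_high = mean_return + T_CRIT_95_DF49 * standard_error

print(f"t = {t_statistic:.3f}")
print(f"95% CI for the mean return: [{ci_low:.2f}%, {ci_high:.2f}%]")
```

Under these assumptions the t-statistic comes out at about -3.23 and the interval at about [-4.99%, 1.09%], which agrees with the reported values up to rounding of the inputs. The interval does not contain the 2.94% benchmark return, which is consistent with the null hypothesis being rejected.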
Run the tests and quality checks:

    # Run all tests (65+ tests)
    uv run python -m pytest

    # Run specific test categories
    uv run python -m pytest tests/test_statistical_testing.py

    # Code quality checks
    uv run ruff check .
    uv run ty check

Project layout:

    backtest/
    ├── backtest/                     # Core engine components
    │   ├── __init__.py
    │   ├── data_loader.py            # Data loading from Polygon files
    │   ├── downloader.py             # Data downloading from S3
    │   ├── engine.py                 # Main backtesting orchestration
    │   ├── order.py                  # Order and Position classes
    │   ├── portfolio.py              # Portfolio management
    │   ├── strategy.py               # Strategy base class
    │   └── statistical_testing.py    # Statistical analysis framework
    ├── strategies/                   # Example trading strategies
    │   └── buy_and_hold.py           # Buy and hold implementation
    ├── tests/                        # Comprehensive test suite (65+ tests)
    ├── data/                         # Downloaded market data (local cache)
    ├── test_spy_backtest.py          # SPY backtest example
    ├── test_statistical_edge.py      # Statistical edge testing example
    ├── .env.example                  # Environment template
    ├── pyproject.toml                # Project configuration
    └── README.md
- Real Polygon.io market data integration (10,552+ stocks)
- Flexible data loading with automatic timeframe detection
- Portfolio management with long/short position support
- Complete order execution simulation
- Performance metrics and reporting
- Transaction Cost Modeling: Configurable costs (commission, slippage, fees)
- Stock Selection: Market cap filtering, volume filtering, random sampling
- Cross-Sectional Testing: Test strategy across many stocks, same period
- Time-Series Testing: Test same stocks across multiple periods (planned)
- Statistical Significance: T-tests, p-values, confidence intervals
- Edge Detection: Quantitative evidence of strategy alpha vs benchmark
- 65+ comprehensive tests covering all components
- Type checking with `ty` (all types validated)
- Code quality with `ruff` (all standards met)
- Unix philosophy: each component does one thing well
This engine uses Polygon.io flat files which provide:
- Real historical US stock market data
- Minute and daily aggregates
- High-quality, institutional-grade data
- S3-compatible API access
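Because the flat files come as either minute or daily aggregates, a loader has to tell the two apart. One possible approach, sketched here, is to look at the typical gap between consecutive bars of a single ticker. The function name `detect_timeframe`, the one-hour threshold, and the assumption that timestamps are Unix times in nanoseconds are choices made for this example, not a description of what `backtest/data_loader.py` actually does.

```python
import statistics

NANOS_PER_SECOND = 1_000_000_000


def detect_timeframe(timestamps_ns: list[int]) -> str:
    """Classify bars for one ticker as 'minute' or 'day' from their spacing.

    Assumes timestamps are Unix times in nanoseconds (an assumption of
    this sketch). The median gap is used so that overnight and weekend
    gaps in minute data do not skew the result.
    """
    if len(timestamps_ns) < 2:
        raise ValueError("need at least two bars to infer a timeframe")
    ordered = sorted(timestamps_ns)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    median_gap_seconds = statistics.median(gaps) / NANOS_PER_SECOND
    # Anything spaced less than an hour apart is treated as minute data.
    return "minute" if median_gap_seconds < 3600 else "day"


minute_bars = [i * 60 * NANOS_PER_SECOND for i in range(5)]
daily_bars = [i * 86_400 * NANOS_PER_SECOND for i in range(5)]
print(detect_timeframe(minute_bars), detect_timeframe(daily_bars))  # minute day
```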
The engine includes realistic transaction costs:
- Configurable percentage: Default 5% total costs
- Round-trip costs: Buy + sell transactions
- Real-world accuracy: Accounts for commission, slippage, regulatory fees
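As a rough illustration of how a round-trip cost affects returns, the sketch below splits the total cost evenly between the buy and the sell and applies each half multiplicatively. The even split and the function name `net_return` are assumptions made for this example. The framework may apply its configured percentage differently.

```python
def net_return(gross_return: float, total_cost_pct: float = 0.05) -> float:
    """Return after round-trip costs, charging half the total on each side.

    Assumes costs are split evenly between entry and exit, which is a
    simplification chosen for this sketch.
    """
    per_side = total_cost_pct / 2
    # Pay the entry cost, earn the gross return, then pay the exit cost.
    return (1 - per_side) * (1 + gross_return) * (1 - per_side) - 1


print(f"{net_return(0.00):.4%}")  # flat price still loses about 4.94%
print(f"{net_return(0.10):.4%}")  # a 10% gross gain nets about 4.57%
```

With a 5% total cost, a position needs a gross gain of a little over 5% just to break even, which is why the cost setting has such a large effect on single-stock results.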
- Null Hypothesis: Strategy return = Benchmark return
- Sample Selection: Market cap filtered, random sampling
- Statistical Testing: Student's t-test, 95% confidence
- Performance Metrics: Mean return, standard deviation, win rate, Sharpe ratio
- Significance Analysis: P-values, confidence intervals, effect size
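The performance metrics in this methodology can be computed from a list of per-stock returns with the standard library alone. The sketch below is illustrative: the `summarize` function, its dictionary keys, and the sample returns are made up for the example, and the effect size shown is Cohen's d against the benchmark, which may not be the definition the framework uses.

```python
import statistics


def summarize(returns: list[float], benchmark_return: float) -> dict[str, float]:
    """Cross-sectional summary of per-stock returns against a benchmark."""
    mean = statistics.mean(returns)
    std = statistics.stdev(returns)  # sample standard deviation (n - 1)
    wins = sum(r > benchmark_return for r in returns)
    return {
        "mean": mean,
        "std": std,
        "win_rate": wins / len(returns),
        # Cohen's d: distance from the benchmark in standard deviations.
        "effect_size": (mean - benchmark_return) / std,
    }


# Made-up returns for five stocks over the same period.
sample_returns = [0.05, -0.02, 0.01, -0.08, 0.03]
summary = summarize(sample_returns, benchmark_return=0.02)
print(summary)
```

The effect size complements the p-value: with a large enough sample even a tiny shortfall becomes statistically significant, so it helps to report how large the gap is relative to the spread of returns.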
- Buy-and-Hold Individual Stocks: Shows significant underperformance vs SPY (p=0.0022)
- Transaction Cost Impact: 5% costs significantly erode single-stock strategies
- Diversification Value: SPY's diversification and automatic rebalancing give it a systematic advantage over single-stock holdings
- Statistical Rigor: Proper significance testing reveals true strategy edge
- Do one thing well: Each class has a single, clear responsibility
- Work together: Components compose cleanly
- Text streams: Data flows through simple, predictable interfaces
- Reusable components across strategies
- Common utilities in base classes
- Consistent interfaces throughout
- Proper significance testing at 95% confidence
- Transaction cost modeling for realistic results
- Large sample sizes for statistical power
- Benchmark comparisons for edge detection
- Minimal external dependencies (numpy, scipy, pandas only)
- Clear, readable code
- Educational focus with production-quality implementation
- https://blog.headlandstech.com/2017/08/03/quantitative-trading-summary/
- https://jspauld.com/post/35126549635/how-i-made-500k-with-machine-learning-and-hft
- Follow the existing code style (ruff compliant)
- Add tests for new functionality (keep the suite at 65+ tests)
- Run quality checks before submitting (`ruff check .`, `ty check`)
- Keep components focused and simple (Unix philosophy)
- Include statistical validation for new strategies
MIT