Positive EV MLB Model
A machine learning model to find and exploit pricing inefficiencies in Major League Baseball moneyline markets.
Problem
Sports betting markets are competitive and noisy. Identifying consistent, profitable opportunities (positive expected value, or +EV) requires a systematic approach that can parse complex data signals, from player performance to market odds movement, without succumbing to common biases.
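As a minimal sketch of what "+EV" means for a single moneyline bet, the expected profit per unit staked can be computed from a win probability and the offered odds. The function names below are illustrative assumptions, not part of the project's code, and the win probability is taken as given rather than produced by the model.

```python
def american_to_decimal(odds: int) -> float:
    """Convert American moneyline odds to decimal odds (total return per unit staked)."""
    if odds == 0:
        raise ValueError("American odds cannot be zero")
    if odds > 0:
        return 1.0 + odds / 100.0
    return 1.0 + 100.0 / abs(odds)


def expected_value(win_prob: float, odds: int) -> float:
    """Expected profit per unit staked: win the net payout with probability p, lose the stake otherwise."""
    if not 0.0 <= win_prob <= 1.0:
        raise ValueError("win_prob must be in [0, 1]")
    net_payout = american_to_decimal(odds) - 1.0
    return win_prob * net_payout - (1.0 - win_prob)
```

A bet is +EV when `expected_value` returns a positive number, that is, when the estimated win probability exceeds the break-even probability implied by the odds.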
Approach
The model is built on a feature-rich dataset assembled from multiple sources, including historical game data and odds. Key steps:
- Feature Engineering: Created rolling averages, situational stats, and odds-implied metrics.
- Labeling: Used a triple-barrier method to label each game as win, loss, or no bet according to whether the odds were beaten, giving a more robust target than a simple win/loss label.
- Modeling: Trained an XGBoost model to predict the probability of beating the closing line odds, a proxy for profitable bets.
- Policy: Implemented a fractional Kelly criterion policy for bankroll management to optimize bet sizing based on the model's confidence.
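Two of the steps above can be sketched in a few lines: an odds-implied metric (here, the bookmaker's implied probability with the margin removed) and a fractional Kelly stake. This is an illustration under stated assumptions rather than the project's implementation. The function names, the proportional margin-removal method, the 0.25 Kelly fraction, and the 5% stake cap are all assumptions chosen for the example.

```python
def implied_prob(american_odds: int) -> float:
    """Raw implied probability from American moneyline odds (still includes the bookmaker margin)."""
    if american_odds == 0:
        raise ValueError("American odds cannot be zero")
    if american_odds > 0:
        return 100.0 / (american_odds + 100.0)
    return -american_odds / (-american_odds + 100.0)


def no_vig_probs(home_odds: int, away_odds: int) -> tuple[float, float]:
    """Remove the margin by normalizing both sides so they sum to 1 (proportional method, an assumption)."""
    home = implied_prob(home_odds)
    away = implied_prob(away_odds)
    total = home + away
    return home / total, away / total


def fractional_kelly(win_prob: float, decimal_odds: float,
                     fraction: float = 0.25, cap: float = 0.05) -> float:
    """Fraction of bankroll to stake: full Kelly scaled by `fraction`, floored at 0 and capped at `cap`."""
    if decimal_odds <= 1.0:
        raise ValueError("decimal_odds must be greater than 1")
    b = decimal_odds - 1.0  # net payout per unit staked
    full_kelly = (b * win_prob - (1.0 - win_prob)) / b
    return min(max(full_kelly, 0.0) * fraction, cap)
```

For example, with an estimated win probability of 0.55 at even money (decimal odds 2.0), full Kelly is 10% of bankroll, so quarter Kelly stakes 2.5%. When the estimated edge is negative, the stake is zero, which corresponds to a "no bet" decision.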
Outcome
Backtesting showed a consistently positive return on investment over several seasons. A live paper-trading phase produced results in line with the backtest, supporting the model's edge. The synthetic chart below illustrates a typical profit curve from one simulated season, showing steady growth.