College basketball teams play only about 30 games with the same group of players before the season ends and roster turnover begins, so choosing and interpreting the right metrics is important for any model. Here at SMA, we generally interpret basketball by putting more weight on metrics that are less volatile than whether or not the ball falls into the hoop (for example, measures that predict future FG% better than FG% itself does). Shot proximity is important in this regard; for example, the model doesn't like long two-point jump shots. They're the most inefficient shot in basketball, and as such, our model penalizes teams that take a lot of them, regardless of the FG% outcome. The college three-point line is about 2 feet closer than the NBA's (depending on where you shoot along the arc), so the most efficient teams need to be taking threes or high-percentage, close-proximity shots. Team passing ability matters along the same lines, and so does offensive rebounding ability.
Our CBB model is not a four-factor (http://www.basketball-reference.com/about/factors.html) model. There just isn't a large enough sample to develop a reliable regression equation for one team (with new components every year) in one 30-ish game season.
While every model acknowledges that defense is important, many of them slightly undervalue it. It's also another area where a model can get too wrapped up in FG% allowed. Yes, effective FG% allowed over the course of a season is worth looking at, but we'd rather look at turnover and defensive rebounding metrics.
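As a rough illustration of the defensive metrics named above, here is a minimal sketch using the standard public definitions of effective FG%, turnover percentage, and defensive rebounding percentage (the same definitions used on the Basketball-Reference four factors page). The function names and the sample box score are hypothetical, and this is not the model's actual code or weighting.

```python
def effective_fg_pct(fgm: float, fg3m: float, fga: float) -> float:
    """eFG% = (FGM + 0.5 * 3PM) / FGA; made threes count for 1.5 makes."""
    return (fgm + 0.5 * fg3m) / fga if fga else 0.0


def turnover_pct(tov: float, fga: float, fta: float) -> float:
    """TOV% = TOV / (FGA + 0.44 * FTA + TOV), i.e. turnovers per estimated play."""
    plays = fga + 0.44 * fta + tov
    return tov / plays if plays else 0.0


def defensive_rebound_pct(drb: float, opp_orb: float) -> float:
    """DRB% = DRB / (DRB + opponent ORB): share of available defensive boards secured."""
    available = drb + opp_orb
    return drb / available if available else 0.0


# Hypothetical opponent box score, used only to show the calls.
efg_allowed = effective_fg_pct(fgm=24, fg3m=6, fga=60)
forced_tov = turnover_pct(tov=15, fga=60, fta=20)
drb_rate = defensive_rebound_pct(drb=27, opp_orb=9)
print(f"eFG% allowed={efg_allowed:.3f}, forced TOV%={forced_tov:.3f}, DRB%={drb_rate:.3f}")
```

Forced turnover rate and defensive rebounding rate are computed from counts of events rather than from made shots, which is why they can be looked at separately from FG% allowed.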
Free throws are important in college basketball, and anyone who's ever had action on a game or watched some of these kids in the 4th quarter knows that. Teams get into the bonus and double-bonus quickly, and poor free throw shooting is an easy way to give away possession after possession. Losing a cover because an 18-year-old kid can't make free throws is infuriating.
In line with free throws, fouls are important too, but they're also volatile because you're dealing with a different crew of striped shirts and whistles in every game. That being said, we do consider a top- or bottom-tier fouls-committed rate or fouls-drawn rate to be a big factor in our model.
We put a heavier weight on recent play. College teams "gelling" is a real concept with such high year-to-year turnover.
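One common way to weight recent play more heavily is an exponentially decaying weighted average. The sketch below is only an illustration of that general idea: the function name and the `half_life` parameter are assumptions, and the actual weighting scheme used in the model is not described here.

```python
def recency_weighted_mean(values, half_life=5.0):
    """Weighted mean of per-game values ordered oldest to newest.

    The most recent game gets weight 1.0; a game `half_life` games
    earlier gets weight 0.5, and so on (hypothetical scheme).
    """
    n = len(values)
    if n == 0:
        raise ValueError("need at least one game")
    weights = [0.5 ** ((n - 1 - i) / half_life) for i in range(n)]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


# Hypothetical per-game efficiency margins for a team that has improved lately.
margins = [-4.0, -2.0, 1.0, 3.0, 6.0, 8.0]
print(recency_weighted_mean(margins, half_life=3.0))  # higher than the plain average
print(sum(margins) / len(margins))                    # plain average: 2.0
```

A shorter `half_life` reacts faster to a team that is coming together, at the cost of more noise from any single game.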
At the core of our NBA model is PIE (Player Impact Estimate), a metric first introduced by NBA.com in 2013. It's a stat that takes a stab at measuring overall efficiency at both the player and team level. From NBA.com:
A high PIE % is highly correlated to winning. In fact, a team's PIE rating and a team's winning percentage correlate at an R square of .908.
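For readers who want to see what goes into PIE, here is a minimal sketch of the team-level calculation. It follows the formula as it is generally published in the NBA.com stats glossary (positive box-score events minus negative ones, divided by the same total for the whole game); treat the exact terms as an assumption to verify against that glossary. The `BoxScore` class and function names are hypothetical, and this is not the model's own code.

```python
from dataclasses import dataclass


@dataclass
class BoxScore:
    pts: float
    fgm: float
    ftm: float
    fga: float
    fta: float
    dreb: float
    oreb: float
    ast: float
    stl: float
    blk: float
    pf: float
    tov: float


def pie_terms(b: BoxScore) -> float:
    """Sum of box-score terms used in the published PIE formula."""
    return (b.pts + b.fgm + b.ftm - b.fga - b.fta
            + b.dreb + 0.5 * b.oreb + b.ast + b.stl + 0.5 * b.blk
            - b.pf - b.tov)


def team_pie(team: BoxScore, opponent: BoxScore) -> float:
    """Team's share of the game total; the two teams' PIE values sum to 1."""
    team_total = pie_terms(team)
    return team_total / (team_total + pie_terms(opponent))


# Hypothetical single-game box scores.
home = BoxScore(110, 40, 20, 85, 25, 35, 10, 25, 8, 5, 20, 12)
away = BoxScore(102, 38, 18, 88, 22, 32, 9, 21, 6, 4, 22, 14)
print(f"home PIE={team_pie(home, away):.3f}, away PIE={team_pie(away, home):.3f}")
```

Because every term comes from the box score rather than the final result, a team can post a PIE above 50% in a loss, which is the sense in which the metric is independent of record.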
We also believe recent team play is a better predictor of a team's future performance than their play from several months ago, and as such, recency is more heavily weighted.
Our NBA model doesn't care about a team's record. It objectively measures a team's efficiency throughout each game, from start to finish, possession by possession, regardless of whether the game ended as a win or a loss. A team being on an 0-4 slide doesn't mean much if they played top teams competitively. Conversely, a team being on a four-game win streak in which they barely beat the Lakers/Nets/Suns/Magic doesn't paint the whole picture either. Objective measures of efficiency are what give our model the ability to find value in lines that the betting market has over- or underpriced because of the general betting public's ignorance.
Much like our NBA model, our WNBA model is based on an advanced metric called PIE, a stat that takes a stab at measuring overall efficiency at both the player and team level. Our WNBA model objectively measures a team's efficiency throughout each game, from start to finish, possession by possession, then develops projections based on that efficiency. Objective measures of efficiency are what give our model the ability to find value in lines that the betting market has over- or underpriced because of the general betting public's ignorance.
Our new NCAAF model uses the public Fremeau Efficiency Index (FEI) as the core of its projections. FEI is a college football rating system based on opponent-adjusted drive efficiency. As in college basketball, with so many teams of varying talent levels in college football, it's important to weight each performance only as much as the opponent's strength dictates.
Approximately 20,000 possessions are tracked each year in college football. FEI filters out first-half "clock-kills" and end-of-game "garbage time" drives and scores.
"Unadjusted game efficiency (GE) is a measure of net success on non-garbage possessions, and opponent adjustments are calculated with special emphasis placed on quality performances against good teams, win or lose. Offensive FEI (OFEI) is value generated per offensive non-garbage possession adjusted for the strength of opponent defenses faced. Defensive FEI (DFEI) is value generated per opponent offensive non-garbage possession adjusted for the strength of opponent offenses faced. Special Teams Efficiency (STE) is the average value generated per non-garbage possession by a team's non-offensive and non-defensive units."
This method of advanced evaluation of college football teams is then poured into our equations, allowing our model to produce specific game projections.
Our new NFL model uses an advanced stat concept known as DVOA (Defense-adjusted Value Over Average), created by the Football Outsiders research/analysis group. DVOA measures a team's efficiency by comparing success on every single play to a league average based on situation and opponent.
DVOA is a method of evaluating teams, units, or players. It takes every single play during the NFL season and compares each one to a league-average baseline based on situation. DVOA measures not just yardage, but yardage towards a first down: Five yards on third-and-4 are worth more than five yards on first-and-10 and much more than five yards on third-and-12. Red zone plays are worth more than other plays. Performance is also adjusted for the quality of the opponent. DVOA is a percentage, so a team with a DVOA of 10.0% is 10 percent better than the average team, and a quarterback with a DVOA of -20.0% is 20 percent worse than the average quarterback. Because DVOA measures scoring, defenses are better when they are negative.
This method of advanced evaluation is then poured into our equations, allowing our model to produce specific game projections.
While most MLB models make projections based on how a team has been hitting as a whole, our offensive projections are based on each and every player in that team's lineup for the day. This means our model waits for each lineup to be posted (usually within a few hours of first pitch), then analyzes it on a player-by-player basis. This approach is meant to give the most accurate prediction of a team's performance.
The pitching/hitting evaluation component of the model uses advanced MLB metrics that go way over the casual baseball fan's head. Exit velocity, batted ball profiles, splits, plate discipline metrics, park factors, performance with or against certain pitches/velocities (combined with pitch usage rates), BABIP, FIP/xFIP, SIERA, and wRC+ are among the many metrics incorporated in the model. The challenge of MLB is analyzing advanced data to determine which players have been lucky and unlucky in relation to their actual performance. This is something that public/square bettors are very poor at figuring out, leaving a lot of value on the table in the betting market. Much like a player projection system, our model identifies a "true" performance level for players and projects games accordingly.
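To make the lucky/unlucky idea concrete, here is a minimal sketch of two of the metrics listed above, using their standard public formulas: FIP and BABIP. The function names, the sample stat lines, and the FIP constant value are assumptions for illustration (the real constant changes each season), and comparing ERA to FIP is just one simple way to flag possible luck, not a description of how the model does it.

```python
def fip(hr: float, bb: float, hbp: float, k: float, ip: float,
        constant: float = 3.10) -> float:
    """FIP = (13*HR + 3*(BB + HBP) - 2*K) / IP + constant.

    `constant` is a league/season scaling term; 3.10 is a placeholder assumption.
    """
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


def babip(h: float, hr: float, ab: float, k: float, sf: float) -> float:
    """BABIP = (H - HR) / (AB - K - HR + SF): average on balls put in play."""
    return (h - hr) / (ab - k - hr + sf)


def era_minus_fip(era: float, fip_value: float) -> float:
    """Positive gap: results worse than fielding-independent stats suggest."""
    return era - fip_value


# Hypothetical pitcher season line.
pitcher_fip = fip(hr=20, bb=50, hbp=5, k=200, ip=200)
print(f"FIP={pitcher_fip:.3f}, ERA-FIP={era_minus_fip(4.10, pitcher_fip):+.3f}")

# Hypothetical hitter season line.
print(f"BABIP={babip(h=150, hr=20, ab=550, k=100, sf=5):.3f}")
```

A pitcher whose ERA sits well above his FIP, or a hitter whose BABIP sits far from his established level, is the kind of player whose surface results may be misleading.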
Our tennis model is based on an Elo concept that factors in not just who a given player has beaten (or lost to), but how they actually performed during the match/how much they won or lost by. It factors in service and return strength by a number of metrics scraped from tennisabstract, as well as a player's court surface strength, incorporating each player's recent form. It generates win probabilities, then compares the probabilities to the implied probability from the current line. That's where value is derived.
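The last step described above, comparing a model win probability to the probability implied by the line, can be sketched as follows. The Elo expected-score formula and the American-odds conversion are standard; the function names and the sample ratings and odds are hypothetical, and the margin-of-victory, serve/return, surface, and recent-form adjustments mentioned above are not shown.

```python
def elo_win_prob(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score for player A against player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def implied_prob_american(odds: int) -> float:
    """Implied probability from American (moneyline) odds, vig included."""
    if odds < 0:
        return -odds / (-odds + 100.0)
    return 100.0 / (odds + 100.0)


def edge(model_prob: float, odds: int) -> float:
    """Model probability minus the line's implied probability."""
    return model_prob - implied_prob_american(odds)


# Hypothetical match: player A rated 1900, player B rated 1800, A priced at -150.
p_a = elo_win_prob(1900, 1800)
print(f"model={p_a:.3f}, implied={implied_prob_american(-150):.3f}, "
      f"edge={edge(p_a, -150):+.3f}")
```

A positive edge means the model rates the player as more likely to win than the price suggests. Because the implied probabilities on both sides of a line add up to more than 100% (the bookmaker's margin), a small positive edge may not be enough to represent real value.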