
Deep Dive: Predictive QB Stats

[Header image: baker3.png]

The Question

​​

A few weeks ago, a friend of mine (who knows I’m into football analytics) texted me what I’m sure he thought was a fairly innocuous and straightforward question. He was having an argument with a friend and wanted data to back up his assertion that the stats his opponent was citing were meaningless, while the ones he was using were more predictive. So he decided to call in a favor from the one football fan he could safely assume would be looking at spreadsheets on a Friday night:

 

[Image: text message exchange (text.png)]

 

After telling Brandon about CPOE and EPA and forwarding him a link to airyards.com, it became apparent that he was looking at the 2019 season passing stats section of pro-football-reference.com and was hoping to distill meaning from some combination of the familiar stats in front of him. The goal, from my friend’s point of view, was to use well-known, readily-available statistics to make his point rather than trying to explain advanced stats that he had just been made aware of himself. Further, Brandon had season-level data in front of him and was intent on using that as his baseline for comparison; game- and play-level information was simply overkill from his perspective.

​

I knew what had to be done. After a series of texts that reached no firm conclusion, I told Brandon that I’d have to get back to him, and fired up RStudio (slight lie, it was already running) to start my analysis.

Research Summary

​

Football analytics has risen to prominence of late, and many who have been plugged in to its notable findings are puzzled by the slow uptake of excellent metrics that quantify quarterback play -- ESPN’s Total Quarterback Rating (QBR), Completion Percentage Over Expected (CPOE), Expected Points Added (EPA), etc. -- and by the continued reliance on metrics that are not quite as good (cough, Passer Rating). In this blog post, my aim is twofold. First, I will quantify the value of various metrics (both advanced and conventional) in predicting the only thing that matters in the NFL: wins. Second, I hope to show that the right conventional stats can be combined to form a reasonable approximation of value when applied to the task of stratifying quarterback performance. My goal is to bridge the gap between fans who are skeptical of advanced metrics and those who advocate for them by showing that reasonable evaluations of quarterback play can be made by either approach. My hypothesis, then, is that the output of a model built from conventional QB stats can do a decent (p < 0.05) job of correlating with the outputs of models built from advanced measures when both are applied to predicting season-long win percentage for the teams whose QBs generate them.

​

 

The Data

​

I began my study of quarterback stats where my friend did: the season-level passing statistics page of pro-football-reference.com (PFR). Given that PFR is the best source on the internet for a wide variety of NFL stats, and that my main hypothesis centers on testing conventional stats in particular, this seemed like a great initial source. I collected data for every passer season beginning in 2006 (the first season for which ESPN’s advanced QBR metric is calculated) and used this as the baseline for testing my hypothesis.

 

[Image: PFR season-level passing statistics page (pfr_sshot.png)]
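For anyone who wants to reproduce the pull, a minimal sketch using the rvest package is below. The /years/<season>/passing.htm URL pattern and the header-row cleanup are assumptions about PFR’s current page layout; the data set actually used here is available in the github repo linked at the end of this post.

```r
# Sketch: pull one season's passing table from Pro-Football-Reference (layout assumptions noted above).
library(rvest)
library(dplyr)

get_pfr_passing <- function(season) {
  url <- paste0("https://www.pro-football-reference.com/years/", season, "/passing.htm")
  read_html(url) %>%
    html_node("table") %>%             # the passing table is the first table on the page
    html_table() %>%
    filter(Player != "Player") %>%     # PFR repeats the header row inside long tables
    mutate(season = season)
}

# 2006 onward, matching the availability of ESPN's QBR
passing_raw <- bind_rows(lapply(2006:2019, get_pfr_passing))
```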

 

nflscrapR served as my second data source, providing EPA directly and the inputs needed to derive CPOE. One limitation here is a smaller sample than the PFR data, as nflscrapR play-by-play currently goes back only to 2009.
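As a rough sketch of how the nflscrapR inputs come together, the snippet below aggregates EPA per dropback and derives a simplified CPOE (a logistic regression of completions on air yards; real CPOE models use many more features). It assumes a local copy of the pre-scraped play-by-play CSVs from the nflscrapR-data repository.

```r
# Sketch: season-level EPA per dropback and a simple CPOE from nflscrapR play-by-play.
# Assumes a local copy of the pre-scraped CSVs from the nflscrapR-data repository.
library(dplyr)

pbp <- read.csv("reg_pbp_2019.csv", stringsAsFactors = FALSE) %>%
  filter(play_type == "pass", !is.na(epa), !is.na(air_yards))

# Simplified completion probability model: logistic regression on air yards only.
cp_model <- glm(complete_pass ~ air_yards, data = pbp, family = binomial())
pbp$cp_exp <- predict(cp_model, type = "response")

qb_season <- pbp %>%
  group_by(passer_player_name) %>%
  summarise(
    dropbacks  = n(),
    epa_per_db = mean(epa),
    cpoe       = mean(complete_pass - cp_exp)   # completion % over (modeled) expectation
  ) %>%
  filter(dropbacks >= 100)
```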

​

Next, Football Outsiders’ Value Over Average (VOA), Defense-adjusted Value Over Average (DVOA) and Defense-adjusted Yards Above Replacement (DYAR) formed a third data set for testing and comparing well-known advanced metrics against the conventional stats collected from PFR.

​

Lastly, 538’s Josh Hermsmeyer (twitter: @friscojosh) recently posted CPOE data on his excellent site airyards.com, so CPOE from airyards was included as an additional advanced data point. As with the nflscrapR data, airyards CPOE covers a shorter time horizon (2012-2019) than my main data set, but is useful in the analysis nonetheless.

​

In addition, several derived variables were calculated to arrive at our final data set, which is detailed in the table below.

 

[Table: summary of input data sources and variables (input_data_summary.png)]
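For reference, the blended efficiency stats discussed below follow PFR’s standard formulas; the remaining derived columns (win_pct, ns_fd, adj_ppr) are my own constructions for this study. A sketch of roughly how they are calculated is below (the cleaned column names are assumptions):

```r
# Sketch: derived variables added to the season-level PFR data.
# Column names (att, yds, td, int, sk, sk_yds, first_downs, games, wins, ties) are assumed.
library(dplyr)

passing <- passing %>%
  mutate(
    ypa          = yds / att,                                    # yards per attempt
    net_ypa      = (yds - sk_yds) / (att + sk),                  # net yards per attempt
    adj_ypa      = (yds + 20 * td - 45 * int) / att,             # adjusted yards per attempt
    adj_net_ypa  = (yds + 20 * td - 45 * int - sk_yds) / (att + sk),
    sack_pct     = sk / (att + sk),
    td_int_ratio = td / pmax(int, 1),                            # guard against zero-interception seasons
    ns_fd        = first_downs - td,                             # assumed: non-scoring first downs
    fd_pg        = first_downs / games,
    fd_pa        = first_downs / att,
    adj_ppr      = (first_downs + td) / att,                     # assumed definition of adj. positive play rate
    win_pct      = (wins + 0.5 * ties) / games
  )
```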

 

Methodology

​

Once all of the data was pulled and aggregated at the season level, the variables from PFR were processed and tested according to the following steps:

  • Filter out passers with fewer than 100 attempts. This was an arbitrary condition of the conversation with my friend and was included here for congruence with his initial question.

  • Filter data to omit the 2019 season (we will predict win_pct in 2019 later on, so we don’t want to use that data to build our models)

  • Remove blended and adjusted metrics to isolate unadjusted performance statistics. These make our final model less explicable than if we only use the raw performance stats that describe specific outcome counts and rates. Variables excluded in this step include:

    • Net Yards per Attempt (net_ypa), Adjusted Net Yards per Attempt (adj_net_ypa) and Adjusted Yards per Attempt (adj_ypa); these stats blend a combination of YPA, touchdowns, interceptions, sacks and sack yardage in varying degrees

    • First downs (first_downs), first downs per game (fd_pg) and first downs per attempt (fd_pa); these stats include both first downs and touchdowns

    • Adjusted Positive Play Rate (adj_ppr); this stat blends first downs, touchdowns and passing attempts

  • Use linear regression to determine each remaining variable’s individual correlation to season level Win Percentage (win_pct)

  • Use linear regression to determine year-over-year (YOY) stability for the same metrics, identifying the extent to which metrics follow a QB from year to year as his own team changes around him or as he changes teams himself
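A condensed sketch of those two regression loops is below; the player and season columns and the candidate variable list are assumptions, and the full version is in the github code. The year-over-year piece simply regresses a QB’s value in one season on his value in the prior season.

```r
# Sketch: univariate correlation to win_pct and year-over-year stability for each candidate stat.
library(dplyr)

candidate_vars <- c("cmp_pct", "ypa", "td", "int", "pass_yds", "sack_pct", "td_int_ratio", "ns_fd")

test_variable <- function(var, data) {
  # Correlation to same-season win percentage
  fit_win <- lm(reformulate(var, response = "win_pct"), data = data)

  # Year-over-year stability: this season's value regressed on last season's value for the same QB
  # (assumes rows sorted by player and season represent consecutive passer seasons)
  lagged <- data %>%
    arrange(player, season) %>%
    group_by(player) %>%
    mutate(prev_val = lag(.data[[var]])) %>%
    ungroup() %>%
    filter(!is.na(prev_val))
  fit_yoy <- lm(lagged[[var]] ~ lagged$prev_val)

  data.frame(
    variable = var,
    win_p    = summary(fit_win)$coefficients[2, 4],
    win_rsq  = summary(fit_win)$adj.r.squared,
    yoy_p    = summary(fit_yoy)$coefficients[2, 4],
    yoy_rsq  = summary(fit_yoy)$adj.r.squared
  )
}

univariate_results <- bind_rows(lapply(candidate_vars, test_variable, data = passing))
```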

​

At this point, it is worth evaluating both the statistical significance (p-value) and the R-squared (the share of variance in win_pct explained by each variable), as the output of the steps above tells us which variables individually bear on variation in win percentage. These values are listed in the table below:

[Table: univariate correlation to win_pct and year-over-year stability by variable (univariate_stability_smy.png)]

 

Some interesting findings here include:

  • Interceptions don't seem to matter much in the grand scheme of things (great news for Baker and Browns fans heading into 2020, and likewise for whichever team signs Jameis Winston to a prove-it deal this offseason). Of course, throwing a high number of interceptions doesn't help win individual games, but measured at the season level, no cut of a quarterback's interceptions follows him from year to year, suggesting that interceptions are some mix of poor luck and poor supporting play at the NFL level.

  • Yards per Attempt, First Downs and Touchdowns are important (as a note: many times statistical analysis presents us with some very obvious findings).

  • Touchdown-to-Interception Ratio (td_int_ratio) is among our least 'sticky' variables (column: YOY Stability/Adj. Rsq.), meaning it is very likely not a true quarterback skill that a player carries with him from year to year and team to team. A favorite of fans and commentators alike, "TD:INT" is a fun number to look at, but its variation is more attributable to factors outside the control of your favorite team's quarterback.

​

With our variables now individually understood, we continue with filtering out those that - statistically speaking - don't seem to be important to our work going forward. To accomplish this, we:

​

  • Filter out metrics with p-values > 0.0001 for either correlation to win_pct or for YOY stability

  • Filter out metrics whose R-squared for YOY stability is below the average across all variables tested (0.1674, the average R-squared for YOY correlation among the modeled PFR variables)
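Expressed in code, these two filters reduce to something like the sketch below, using the summary table built by the univariate loop above:

```r
# Sketch: keep only stats that are both significant and more stable than average year over year.
library(dplyr)

keep_vars <- univariate_results %>%
  filter(
    win_p   < 0.0001,                               # significant correlation to win_pct
    yoy_p   < 0.0001,                               # significant year-over-year relationship
    yoy_rsq >= mean(univariate_results$yoy_rsq)     # stickier than the average candidate (~0.167 here)
  ) %>%
  pull(variable)
```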

​

After applying these filters, the significant variables remaining from PFR season-level passing statistics are depicted below:

 

[Table: significant variables remaining after filtering (initial_model_vars.png)]

 

With variables now selected that individually both 1) bear statistical significance to variations in win percentage and 2) are more stable than average (relative to the other available conventional PFR statistics) from year to year, we can now build a multivariate model to predict team-level win percentage from a combination of these quarterback stats:

​

 

[Figure: initial multivariate regression output (first_conv_model.png)]
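For reference, producing output like the above is a single lm() call over the surviving variables; a sketch, with the training data and variable list assumed from the prior steps, is below.

```r
# Sketch: initial multivariate model on the conventional stats that survived the filters.
train <- subset(passing, season < 2019 & att >= 100)
conv_model_v1 <- lm(reformulate(keep_vars, response = "win_pct"), data = train)
summary(conv_model_v1)   # coefficients ("Estimate"), p-values ("Pr(>|t|)") and adjusted R-squared
```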

​

Inspecting the initial seven-variable model, we see evidence of multicollinearity (a condition where explanatory variables correlate more closely with one another than they do with the target variable). We suspect this because of two facts:

​

1) variables that individually show statistical significance (p < 0.0001; labelled “Pr(>|t|)” above) appear to lose significance in the multivariate model, and

​

2) we observe sign-switching (positive to negative, or vice-versa) in explanatory variables’ regression coefficients (“Estimate”, above) between the univariate models and our multivariate model.

​

To deal with multicollinearity, we search for high pairwise correlations between explanatory variables (in this case, absolute values greater than roughly 0.60). In our first correlation analysis (below), we find a very high pairwise correlation between total pass yards (pass_yds) and total non-scoring first downs (ns_fd). Looking at the first column of the output, we identify which of the pair correlates more closely with win percentage (win_pct) and drop the other from the analysis, allowing it to be “explained” by the variable with the higher correlation to win_pct. In this case, pass_yds (0.504) correlates more strongly with win_pct than ns_fd (0.485) does, so we drop ns_fd going forward.

[Table: pairwise correlation matrix of candidate variables and win_pct (cor1_mu.png)]
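A sketch of that pairwise check is below; it is just cor() over the model variables with win_pct in the first column, plus a scan for offending pairs.

```r
# Sketch: check for explanatory variables that correlate more with each other than with win_pct.
model_vars <- c("win_pct", keep_vars)
cor_mat <- round(cor(train[, model_vars], use = "pairwise.complete.obs"), 3)
cor_mat

# Flag pairs of explanatory variables with |correlation| above ~0.60 (ignoring the win_pct column)...
high_pairs <- which(abs(cor_mat) > 0.60 & row(cor_mat) > col(cor_mat) & col(cor_mat) > 1,
                    arr.ind = TRUE)
high_pairs
# ...then, for each flagged pair, keep the variable with the stronger correlation to win_pct
# (first column of cor_mat) and drop the other -- e.g., drop ns_fd in favor of pass_yds here.
```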

 

 

With ns_fd dropped from the analysis, we re-run our regression, look for further evidence of multicollinearity, eliminate high pairwise correlations, and rinse and repeat until evidence of multicollinearity no longer exists in the model. Sparing readers the gory details (they’re all documented in the code shared to my github page), what we end up with after the whole process is the following model:

​

[Figure: final conventional model formula (final_model_conv_formula.png)]
[Figure: final conventional model regression output (final_model_conv.png)]

 

From the regression output above, we see that our model is statistically significant (p-value: < 0.0001) and that approximately 35.4% of the variation in a team’s season-level win percentage can be explained (Adjusted R-squared = 0.354) by a weighted combination of sack rate (sack_pct), completion percentage (cmp_pct) and total touchdowns (td).

​

An Alternate Approach

​

Going back to the beginning, you may wonder what the results would look like if we hadn’t filtered out the adjusted stats (Net Yards per Attempt, Adjusted Net Yards per Attempt, the First Down metrics, etc.) and had run the same analysis. Using the same process described above (see “Methodology”) but keeping these metrics in the mix, we arrive at the following conclusions:

 

[Table: significant variables remaining when adjusted stats are retained (initial_model_vars_app2.png)]

 

After iterating through combinations of these variables and eliminating those that correlate more strongly with one-another than with season-level win percentage, we arrive at the following model of QB performance:

 

[Figure: adjusted-stats model formula (adj_model_formula.png)]
[Figure: adjusted-stats model regression output (adj_model_output.png)]

 

From the regression output above, we see that our model is statistically significant (p-value: < 0.0001) and that approximately 37.8% of the variation in a team’s season-level win percentage can be explained (Adjusted R-squared = 0.3775) by a weighted combination of sack rate (sack_pct) and the blended statistic “adjusted positive play rate” (adj_ppr). As a reminder, this process was heavily summarized here; for those interested, each step in the creation of the model above can be reviewed and reproduced within the code and commentary linked to the brownalytics github page.

​

Results

So what season-level win percentage would our models have expected QBs to attain in 2019 given their underlying statistical performance? And how do the models rank the 2019 season’s QBs according to that output?
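Generating these expectations is a straightforward predict() call on the held-out season; a sketch is below, with the final fitted models (conv_model_final, adj_model_final) assumed from the steps above.

```r
# Sketch: apply the fitted models to the held-out 2019 passer seasons.
qbs_2019 <- subset(passing, season == 2019 & att >= 100)

qbs_2019$xWinPct_conv <- predict(conv_model_final, newdata = qbs_2019)   # conventional model
qbs_2019$xWinPct_adj  <- predict(adj_model_final,  newdata = qbs_2019)   # adjusted-stats model

# Rank QBs by expected win percentage (1 = highest expected win percentage)
qbs_2019$xWinPct_rank <- rank(-qbs_2019$xWinPct_conv, ties.method = "min")
```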

 

These estimates are listed in the table below. The column “xWinPct_rank” ranks each QB by the expected win percentage (xWinPct) that each of our models (the purely conventional model and the model incorporating adjusted stats) would expect his team to attain given his output in the categories each model includes:

 

[Table: 2019 expected win percentage (xWinPct) and rank by QB (qbs_2019.png)]

 

Are the rankings perfect? Definitely not! Average absolute error in both outputs tested at approximately 0.16 (or 16%, equivalent to roughly +/- 2.5 wins per season), which means our findings should be taken with a grain of salt. But part of the value of building analytical models like these is that we learn just how uncertain our predictions are. Recall that in creating the conventional model, we found that only about 35% of the variation in a team’s win percentage is explained by the stats that drive the xWinPct and xWinPct_rank values presented above (38% in the adjusted model). That means 62% to 65% of the variation in a team’s win percentage is influenced by things that aren’t picked up in the stats examined here (coaching, non-QB personnel, luck, etc.). Given that so much of what makes teams successful is not under a quarterback’s direct (or even partial) control, an average error of 16%, or about 2.5 games, based solely on three passing statistics isn’t a bad result.
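The error figure quoted above is just the mean absolute difference between expected and actual win percentage, translated into games over a 16-game season; a sketch:

```r
# Sketch: average absolute error of the expected win percentages, translated into wins.
mae <- mean(abs(qbs_2019$xWinPct_conv - qbs_2019$win_pct), na.rm = TRUE)
mae        # ~0.16 in this analysis
mae * 16   # ~2.5 wins over a 16-game season
```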

​

So you might ask: what are models like these good for? Given the availability of more accurate alternatives, it may not be worth the effort to create them or to use them in decision-making. But if the data to drive more precise methods of decision-making are unavailable, less explicable, or more expensive to produce (in terms of time, computing power, money or all three), these simple models offer a decent solution for turning readily-available statistics into a reasonably good weighting of quarterback performance.

 

Advanced Metrics for Estimating QB Contribution to Win Percentage

​

So, how do our models based on both conventional stats and adjusted conventional stats stack up to those using advanced metrics such as QBR, EPA, CPOE and DVOA? We used a similar process to test those inputs and found the following:

 

[Table: variance in win_pct explained by advanced-metric models vs. the conventional models (adv_model_vars.png)]

 

What we find is encouraging for our conventional models, as advanced statistics don’t seem to describe an overwhelmingly higher amount of variance in win percentage than do the models created above (“Conventional Stats Model” and “Adjusted Conventional Stats Model” in the figure; catchy names, I know).  But the true test of our initial hypothesis revolves around how well the outputs of our models (xWinPct in the previous section) correlate to the outputs of those models of win_pct which can be derived using the advanced statistics referenced above.

​

To perform our final test, we used the advanced QB stats already introduced to predict season-level team win_pct and tested the correlation between these predictions and those created by the “Conventional Stats Model” and the “Adjusted Conventional Stats Model”. Recall that our goal at the outset was for our models’ predictions to bear a relationship to more advanced models at a p < 0.05 level of significance. The results of these comparisons are summarized below:
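Mechanically, each comparison is a simple regression of one set of predictions on the other; a sketch is below, with a hypothetical advanced-stat model (adv_model) standing in for the QBR/EPA/CPOE/DVOA models actually tested.

```r
# Sketch: regress the conventional model's predictions on an advanced-stat model's predictions.
x_adv  <- predict(adv_model, newdata = qbs_2019)   # hypothetical win_pct model built on advanced stats
x_conv <- qbs_2019$xWinPct_conv

comparison <- lm(x_conv ~ x_adv)
summary(comparison)$adj.r.squared              # strength of the relationship
summary(comparison)$coefficients[2, 4]         # p-value, judged against the p < 0.05 threshold
```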

 

[Table: correlation between conventional-model and advanced-model xWinPct predictions (xWinPct_output_smy.png)]

 

As the ‘p-value’ columns show, both conventional models performed very well in terms of correlating with the predictions made from more advanced statistics, as p-values for these comparisons were below our desired threshold of 0.05. Additionally, R-squared values for these correlations are, for the most part, strong, lending further confidence to our models’ ability to estimate win percentage from their underlying statistical inputs.

 

Further Work

​

A few potential issues with our approach that could be cleaned up in future iterations are listed below, both as an acknowledgement that other approaches to this question may yield better results and as thought starters for anyone who wants to take on the challenge of improving on the baseline laid out here:

  • Use game-level data instead of season-level data to increase sample size and introduce wider variation (game-by-game performance varies more widely than season-level averages).

  • Include a wider sample of quarterbacks. This study included only those with >= 100 attempts in each season and could be subject to some selection bias as a result (note: I re-ran the conventional model with the threshold set at >= 60 attempts and found similar results).

  • Try different types of models (I used linear regression exclusively here) to produce alternate results that may improve predictive accuracy.

  • Normalize input variables, a practice not incorporated here that could affect outcomes, particularly early in the analysis where variables were tested for significance before being included in the first iteration of what ultimately became the conventional model.

​

Conclusion

​

I hope this post lays out clearly that there is perfectly useful information contained in conventional quarterback stats. I also hope that readers who may be on the fence about advanced stats recognize, through the comparisons made here, that advanced stats, while sometimes challenging to understand fully, are generally better than conventional stats. At some point, though, even the analytically-inclined reader will likely be at a tailgate or a watch party with a friend who wants to talk QB rankings but who is also skeptical of “Analytics” for any number of reasons. In these situations, it’s important to recognize that not everyone has access to advanced stats and, further, that not everyone is ready to immediately accept the idea that EPA, DVOA or CPOE (things they may well have never heard of) tell a better story than more familiar passing metrics. In those cases, I hope the reader will be able to draw on the work done in crafting this post to make a simple yet reasoned case for the conventional stats that do matter (touchdowns, completion percentage, and sack percentage) with regard to predicting season-level quarterback success. In so doing, I hope the reader will help bridge the gap between analytically curious and analytically skeptical fans while simultaneously ensuring that their Sunday invitation to watch football and eat buffalo chicken dip remains stable week-over-week and year-over-year.

 

All code and data used to produce this analysis can be accessed at the following link: https://github.com/brownalytics/qb_stats

​

Acknowledgements:

A big "Thank you!" goes out to Ben Robinson (twitter: @benj_robinson) and Lee Sharpe (twitter: @LeeSharpeNFL), who helped me edit the initial monstrosity of a write-up that I sent them, and who reassured me about sharing my amateurish coding and very amateurish statistical skills with the world! If you like football and quality analysis, do yourself a favor and follow them on twitter.

​

Sources:

​

​Airyards. "Completion Percentage Over Expected - 2012-19". Airyards.com. https://airyards.com/cpoe.html (accessed February 6, 2020).

​

Football Outsiders. NFL Quarterback Ratings, 2006-2019 seasons. www.footballoutsiders.com (accessed February 6, 2020).

​

Horowitz, Maksim, Sam Ventura & Ron Yurko (2016). nflscrapR: R package for accessing NFL play-by-play data, providing expected points and win probability estimates (accessed February 6, 2020).

​

Pro Football Reference. NFL Passing, 2006-2019 seasons. www.pro-football-reference.com (accessed February 6, 2020).

​
