Interaction designer focused on advanced analytics, data visualization, and other complex problems

College Football's Curveball: A Review

March 13, 2013

A number of people have produced some great analysis of decisions and strategies in the game of football. Brian Burke and his team at Advanced NFL Stats, for example, have done fantastic analysis of data from NFL games, covering a wide range of topics including home-field advantage, the effect of weather conditions, time management, play-calling balance, going for it on fourth down, and a host of other strategy questions. Burke has even published basic play-by-play data for every NFL season since 2002, so you can do your own analysis.[1]

If the existing data sets that support these analyses have a drawback, though, it is that the play-by-play accounts record little detail: they tend to include attributes like team, venue, time, down, distance to go, and yards gained, and not much else. In particular, they rarely capture either team’s behavior, such as offensive and defensive formations, the direction of the play, or defensive tactics like blitzing.

At its core, the analysis process is founded on data, and good data in football can be hard to find. Good analysis of data is even harder to find.

So I was excited to see that during the 2012 college football season, Football Study Hall hired two interns to catalog and code every play in a sample of 109 college games. Although they haven’t released the raw data, they have started to release some analysis based on the resulting data set of more than 15,000 plays. A 109-game sample might sound like a lot, but it is actually a small fraction of a college football season in which 125 teams play more than 700 games.[2] So this isn’t the largest football play data set, and it isn’t the first to be published, but it promises to be an interesting one because of the detail it tracks, including pre-snap formations, pass distances through the air, defensive behaviors like blitzing, and the reasons for incomplete passes. That level of detail should make it possible to explore a wide range of the strategies and decisions that coaches and players make during a game.

That’s what I’m hoping for, anyway. I want to see analysis as interesting and useful as Romer’s paper on going for it on fourth down or the articles at Advanced NFL Stats: analysis that, fueled by the added detail in this new data set, gives you insight into the effectiveness of particular strategies or decisions.

The first analysis article in the series is “College football’s curveball” by Bill Connelly, which looks at the use and effectiveness of one particular type of formation, the “no-back” formation. Unfortunately, the analysis in this article is unfocused and, in some cases, deeply flawed. There are moments where the truly unique potential of the underlying data set shows through, but those moments are often not explored in depth and are surrounded by a lot of far less valuable information.

Early on in the article, there is a very interesting (but very short) discussion suggesting that blitzing is an effective strategy against this type of formation:

… opponents only blitz against the no-back 25 percent of the time, choosing instead to rush four and react to the quarterback’s quick read. Considering the results — blitzing results in 4.5 yards per play, not blitzing results in 6.2 — this appears to be a generally faulty approach.

This is exactly the kind of strategic analysis I was hoping for from this project: important, detailed, and only possible because of the richness of the data set they have collected. Connelly clearly understands the game extremely well, and he can explain the numbers with team strategies and behaviors that make sense.

Unfortunately, the numbers themselves are not always so well thought out. Another major part of the article’s analysis of defensive performance against this formation is a table ranking teams by how well their defenses did against the no-back offense. That table is not nearly as strong as the section on blitzing, and it is very likely misleading.

Connelly uses the difference in average yards per play between no-back plays and all plays as the measure of a team’s defensive performance against a no-back offense. This technique embeds an assumption that a team’s expected performance against a no-back offense should be very close to its performance against an average play. It seems reasonable to assume that defensive performance against a no-back offense is related to overall defensive performance, but there are a couple of major problems with this approach. First, even if there is a relationship, it is not necessarily one-to-one, as the yard differential assumes: a defense that gives up one yard per play less than another overall will not necessarily give up one yard less against the no-back. Defenses might tend to struggle more (or less) against this strategy. It is also possible that no such relationship exists.
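Written out, the differential metric amounts to fixing the slope of a linear model at one. In the notation below (introduced here for illustration, not taken from the article), the first line is team i’s differential, the second is a general linear relationship between no-back and all-play yards per play allowed, and the third substitutes one into the other:

```latex
\begin{aligned}
D_i &= \bar{y}^{\mathrm{nb}}_i - \bar{y}^{\mathrm{all}}_i \\
\bar{y}^{\mathrm{nb}}_i &= \alpha + \beta\,\bar{y}^{\mathrm{all}}_i + \varepsilon_i \\
\Rightarrow\quad D_i &= \alpha + (\beta - 1)\,\bar{y}^{\mathrm{all}}_i + \varepsilon_i
\end{aligned}
```

Only when β = 1 does ranking by D rank teams by their residuals ε (the constant α shifts every team equally and does not change the order). If β < 1, the term (β − 1) times the all-plays average is least negative for the stingiest overall defenses, so they receive larger differentials and look as if they struggled, whatever they actually did against the no-back set. If there is no relationship at all (β = 0), D mostly measures overall defensive quality with the sign flipped.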

Second, and more importantly, the data for all plays includes the “no-back” plays. Any variation in the no-back measure also shows up, diluted, in the all-plays measure, so comparing the two will tend to show a relationship between them even if none exists on the field. Imagine a team that is really good against the no-back offense and average otherwise: an overall average that includes the no-back plays would make its defense look slightly above average. Ideally, we would compare independent measures, so that any relationship we find comes from properties of the system we are measuring rather than properties of the data we are using. We can do this by comparing the no-back plays to all of the other (one-or-more-back) plays, instead of to the entire set of plays.
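Separating the two groups is simple arithmetic if the per-team play counts are available (an assumption: the article’s table may not report them directly). A team’s total yardage allowed is its yards-per-play average times its play count, so subtracting the no-back yardage and dividing by the remaining plays gives the one-or-more-back average. The function name `other_plays_ypp` and all numbers below are hypothetical, chosen to mirror the “good against no-back, average otherwise” team described above at the 5.4% overall no-back usage rate:

```python
def other_plays_ypp(all_ypp: float, all_plays: int,
                    no_back_ypp: float, no_back_plays: int) -> float:
    """Yards per play allowed on one-or-more-back plays, backed out of an
    all-plays average by subtracting the no-back plays' share of the yardage."""
    other_plays = all_plays - no_back_plays
    if other_plays <= 0:
        raise ValueError("no one-or-more-back plays left after removing no-back plays")
    total_yards = all_ypp * all_plays              # yardage allowed on every play
    no_back_yards = no_back_ypp * no_back_plays    # yardage allowed on no-back plays
    return (total_yards - no_back_yards) / other_plays


# Hypothetical team: 1,000 plays faced, 54 (5.4%) of them from no-back sets;
# very good against the no-back, average (5.5 yards per play) otherwise.
all_plays, no_back_plays = 1000, 54
no_back_ypp, other_ypp = 3.0, 5.5

# The all-plays average blends both groups, so it inherits part of the
# team's no-back performance and looks slightly better than average.
all_ypp = (no_back_ypp * no_back_plays
           + other_ypp * (all_plays - no_back_plays)) / all_plays
print(round(all_ypp, 3))  # 5.365

# Removing the no-back plays recovers the independent comparison measure.
print(round(other_plays_ypp(all_ypp, all_plays, no_back_ypp, no_back_plays), 3))  # 5.5
```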

To do that, I created the following scatter plot from the data in Connelly’s article, plotting each team’s defensive performance against the no-back offense over its defensive performance against all other offenses.

I have added a linear regression trend line, which shows a slight upward slope, but that slope is nowhere near the 1-for-1 relationship assumed in the original article. Even worse, the regression model indicates that the trend explains only about 1% of the variation in the data (R² ≈ 0.01). In other words, we are not seeing any meaningful relationship between the yards per play that defenses gave up against no-back offenses and the yards per play they gave up against all other offenses.
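A linear regression trend line like this one is typically an ordinary least-squares fit, and the “share of the variation explained” is its R². The sketch below computes both in plain Python. Here x would be each defense’s yards per play allowed on one-or-more-back plays and y its yards per play allowed on no-back plays. The function name `fit_trend` and the toy data are illustrative only, not the article’s numbers:

```python
def fit_trend(x: list[float], y: list[float]) -> tuple[float, float, float]:
    """Ordinary least-squares fit of y = intercept + slope * x.
    Returns (slope, intercept, r_squared)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least two points")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For a one-variable fit, R^2 is the squared correlation of x and y:
    # the share of the variance in y that the trend line accounts for.
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Toy data (not the article's): y rises exactly 2 yards per 1-yard rise in x.
print(fit_trend([4.0, 5.0, 6.0], [3.0, 5.0, 7.0]))  # (2.0, -5.0, 1.0)
```

A slope near 1 with an R² near 1 would support the differential metric; the plotted data instead gives a shallow slope and an R² of about 0.01.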

To be fair, the article’s discussion of this data does seem to acknowledge that its numbers do not show good defensive teams faring any better against the no-back offense than bad ones. So far, we have only illustrated that more clearly than the original article did. But relying on the “differential” metric skews both the rankings and the assessment of individual teams’ performances. TCU, for example, is called out as a team that “struggled” against no-back offenses, even though its performance was actually better than average and better than the trend.

There is yet another problem with these rankings, even if we rank teams solely by their performance against the no-back offense: the average performance numbers are very noisy, and much of that noise may come from sampling error. We started from a relatively small sample of plays, the no-back formation is relatively uncommon (it was used on 5.4% of plays overall), and the data set is further subdivided by school, leaving some extremely small samples. Combined with the large standard deviation of yards per play,[3] this produces wide margins of error around each school’s no-back yards-per-play figure. This graph adds error bars that give an idea of the scope of the problem:
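To get a feel for the size of those error bars, here is a rough sketch of a margin of error for one team’s no-back average. It assumes plays are independent and uses a normal approximation with the 9.35-yard per-play standard deviation from footnote 3; both are simplifications, since yards per play is heavily skewed by big plays. The confidence level behind the graph’s error bars isn’t stated, so the 95% default here is an assumption, and the play counts are hypothetical:

```python
from math import sqrt
from statistics import NormalDist

SD_YPP = 9.35  # per-play standard deviation of yards gained (footnote 3 estimate)


def margin_of_error(n_plays: int, sd: float = SD_YPP, confidence: float = 0.95) -> float:
    """Half-width of a normal-approximation confidence interval for a mean
    yards-per-play figure computed from n_plays independent plays."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return z * sd / sqrt(n_plays)


for n in (20, 54, 200):
    print(n, round(margin_of_error(n), 1))
# 20 4.1
# 54 2.5
# 200 1.3
```

The margin shrinks only with the square root of the number of plays, so quadrupling a team’s sample merely halves it.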

It is clear, then, that the defensive rankings listed in the article are dominated by noise and are practically useless. This problem of not adequately tracking or reporting error when comparing (and especially ranking) performance seems to be fairly common in sports analysis. You should be careful about drawing conclusions from a ranking system that accounts neither for error nor for the probability that two entries in the list will actually perform in the order they are ranked.
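The probability that two entries will perform in the order they are ranked can be approximated directly. The sketch below makes the same rough assumptions as a margin-of-error calculation (independent plays, a normal approximation, the 9.35-yard standard deviation from footnote 3) and treats the sampling distribution of the observed gap as the uncertainty about the true gap, which is equivalent to assuming no prior information. The function name and team numbers are hypothetical:

```python
from math import sqrt
from statistics import NormalDist

SD_YPP = 9.35  # per-play standard deviation of yards gained (footnote 3 estimate)


def prob_ranked_order_holds(mean_a: float, n_a: int,
                            mean_b: float, n_b: int,
                            sd: float = SD_YPP) -> float:
    """Approximate probability that defense A's underlying no-back yards per
    play is lower (better) than defense B's, given observed averages and play
    counts. Normal approximation, independent plays, flat prior."""
    se_diff = sd * sqrt(1 / n_a + 1 / n_b)  # standard error of the difference
    return NormalDist().cdf((mean_b - mean_a) / se_diff)


# Hypothetical: A allowed 4.0 yards per play on 30 no-back plays, B allowed 5.5 on 30.
print(round(prob_ranked_order_holds(4.0, 30, 5.5, 30), 2))  # 0.73
```

Under these assumptions, even a 1.5-yard gap on 30 plays apiece leaves roughly a one-in-four chance that the two teams are ranked in the wrong order.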

The subdivision of the data by school, though, did have a side benefit. In breaking up the data this way, Connelly noticed a pattern:

Honestly, one could make a pretty good case that the best way to know how to defend a no-back set is to run it pretty frequently yourself.

I like this hypothesis for several reasons. First, it is about an actual decision that teams made, which increases the likelihood that it is important or has strategic implications. Second, it groups the data into much larger samples, which decreases the noise and makes it easier to find significant effects. And third, although Connelly seems unsure about this hypothesis, we can check it.

From this data, we won’t be able to tell whether running a no-back offense leads to good defensive performance against one, but we can tell whether there is a difference between the schools that ran the no-back offense frequently and those that didn’t. Let’s take a look at another graph of the same data, this time with each point color-coded by how often the team’s own offense ran no-back plays: red for teams Connelly considered frequent users and dark blue for those he considered infrequent. The light blue marks are for schools for which no information about offensive plays was available (for the purpose of calculating averages, I have counted them as infrequent). Also, since we’ve seen no evidence of a relationship between one-or-more-back yards and no-back yards, I’ve dropped the one-or-more-back yards from the graph and simply ordered the teams by how often their offense uses the no-back formation (as far as the published data allows).

Adding in the averages for both the frequent and the infrequent offensive users of the no-back formation, we can see that there does appear to be a difference between these two groups, and the difference appears to be statistically significant. Unfortunately, the article did not contain data on the offensive use of the no-back formation for many teams, which complicates drawing conclusions from the data.
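A sketch of that comparison, under the same rough assumptions used for the margins of error (independent plays, a normal approximation, and the 9.35-yard standard deviation from footnote 3): pool each group’s no-back plays, take the play-weighted average yards per play allowed, and run a two-sample z-test on the gap. Teams with unknown offensive usage go into the infrequent group, matching how the averages for the graph were calculated. The function names and team numbers are hypothetical, and a small p-value only says the gap is unlikely to be sampling noise; it cannot say why the groups differ:

```python
from math import sqrt
from statistics import NormalDist

SD_YPP = 9.35  # per-play standard deviation of yards gained (footnote 3 estimate)


def group_mean(teams: list[tuple[float, int]]) -> tuple[float, int]:
    """Play-weighted yards per play over (no_back_ypp, no_back_plays) pairs."""
    plays = sum(n for _, n in teams)
    if plays == 0:
        raise ValueError("group has no plays")
    return sum(ypp * n for ypp, n in teams) / plays, plays


def compare_groups(teams, sd: float = SD_YPP) -> tuple[float, float, float]:
    """teams: list of (frequent_user, no_back_ypp, no_back_plays), where
    frequent_user is True, False, or None (unknown, counted as infrequent).
    Returns (infrequent_mean - frequent_mean, z, two_sided_p)."""
    frequent = [(y, n) for f, y, n in teams if f is True]
    infrequent = [(y, n) for f, y, n in teams if f is not True]  # False or None
    mean_f, n_f = group_mean(frequent)
    mean_i, n_i = group_mean(infrequent)
    se = sd * sqrt(1 / n_f + 1 / n_i)  # standard error of the difference in means
    z = (mean_i - mean_f) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return mean_i - mean_f, z, p


# Hypothetical teams: (frequent user?, yards/play allowed vs. no-back, no-back plays faced)
teams = [(True, 4.5, 120), (True, 5.25, 80),
         (False, 6.0, 150), (None, 7.0, 150), (False, 6.5, 100)]
diff, z, p = compare_groups(teams)
print(round(diff, 2), round(z, 2), round(p, 3))  # 1.7 2.1 0.036
```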

I remain excited by the potential of the data that Football Study Hall has collected, and I think Connelly has a great understanding of the strategy of the game, one that leads him in some insightful directions (such as his discussion of blitzing in this article). But good analysis requires defensible methods and results, and his analysis sometimes falls short on those points.

  1. A slightly more analysis-friendly version of the data for the 2005 NFL season is available from The Football Project, and college football data from 2005-2012 are available from College Football Statistics.

  2. The full college season had more than 157,000 plays. For contrast, the 32 NFL teams play a total of 256 games in the regular season, running more than 40,000 plays. 

  3. Connelly’s original article does not state the standard deviation of yards per play, so I have had to estimate it. The value I use, 9.35 yards, was derived from College Football Statistics’ database of all college football plays in 2012.