The Road to WAR (for hockey), Part 4: You can’t spell “An Incremental Improvement” without two “team”s

Through the previous three parts of this series, I’ve outlined the history and problem of finding a single catch-all statistic, why rates are an effective means of capturing this, and how I’ve chosen to divide the problem into several event types. It’s finally time to get to some common currencies, but to jump all the way to player evaluation in one step is excessive when there are more issues to shake down first. So in this post (and the next one), I’ll be applying these methods to create team rankings: the first will be retrospective and heavier on advanced methods, and the second will look at a more predictive method that can be done without regression.1

First, we have the rates of events for each team in a game. At a basic level, we have

[team events] = [baseline rate]*[circumstances]

But at any one time, we’re looking at (a minimum of) two competing processes affected by the teams in the game: scoring by each side on the opposing net. And so we break it down:

[home scoring] = [baseline and circumstance]*[home offense]*[away defense]
[away scoring] = [baseline and circumstance]*[away offense]*[home defense]
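As a toy illustration of these two competing rates (every number below is hypothetical, not a fitted value):

```python
# Multiplicative scoring-rate model from the text; each team factor is
# relative to an average team (1.0). All inputs here are made up.

def scoring_rate(baseline, circumstance, offense, defense):
    """Expected events per 60 minutes for one side."""
    return baseline * circumstance * offense * defense

baseline = 2.5   # hypothetical league-average goals per 60 minutes
home_ice = 1.05  # hypothetical circumstance bump for the home team

# Home scoring: home offense against away defense (and vice versa)
home_rate = scoring_rate(baseline, home_ice, offense=1.10, defense=0.95)
away_rate = scoring_rate(baseline, 1.00, offense=0.92, defense=1.03)
```

A team better than average on offense (multiplier above 1) raises its own rate; a team better than average on defense (multiplier below 1) suppresses its opponent's.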

where each of the home and away adjustments is with respect to an average team, or a relative rate of 1. I’m saving the replacement discussion for later in this series to keep things simple, but you can see where it would fit here: change the baseline to a team of replacements2 and make the team ability with respect to that level instead.

My team laid out a lot of the groundwork for this with goals scored in our AoAS paper, but the method works equally well with shot attempts. And since we know that splitting shot attempts by basic goal probability reveals stark differences, let’s do that for now, along with blocked shots, for the past 6 NHL seasons (2008-2014), using the Poisson process regression method from that paper: that’s 30 teams, 2 states (O/D), 4 event types and 6 seasons for even strength alone, and double that count for PP and SH estimates. We also account for all those good things from earlier in the series: score effects, fatigue and faceoff results.3 At this point, though, we shouldn’t expect too much from these results compared to the numbers unadjusted by process regression; the schedule is balanced enough, compared to player substitutions, that the raw numbers should be fairly close to the adjusted ones, and we’re not likely to change our opinion about a whole team because of this.
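To make the estimation step concrete, here is a minimal sketch in Python — my own construction on simulated data, not the paper’s code — of a log-linear Poisson model with one offense and one defense effect per team, fit by iteratively reweighted least squares, with a tiny ridge term to handle the offense/defense identifiability:

```python
import numpy as np

rng = np.random.default_rng(0)
n_teams, n_games = 6, 600

home = rng.integers(0, n_teams, n_games)
away = (home + rng.integers(1, n_teams, n_games)) % n_teams  # never self

# True log-scale team effects (centered); baseline of 2.5 events/game
true_off = rng.normal(0, 0.1, n_teams); true_off -= true_off.mean()
true_def = rng.normal(0, 0.1, n_teams); true_def -= true_def.mean()
log_base = np.log(2.5)

# Home side's count: baseline x home offense x away defense
y = rng.poisson(np.exp(log_base + true_off[home] + true_def[away]))

# Design: indicator for the shooting team's offense, opponent's defense
X = np.zeros((n_games, 2 * n_teams))
X[np.arange(n_games), home] = 1.0
X[np.arange(n_games), n_teams + away] = 1.0

beta = np.zeros(2 * n_teams)
for _ in range(30):                      # IRLS iterations
    mu = np.exp(log_base + X @ beta)
    z = X @ beta + (y - mu) / mu         # working response
    XtW = X.T * mu                       # X^T diag(mu)
    beta = np.linalg.solve(XtW @ X + 1e-6 * np.eye(2 * n_teams), XtW @ z)

off_hat = np.exp(beta[:n_teams])         # multiplicative offense ratings
```

The real model adds circumstance covariates (score state, home ice, special teams) and fits each event type separately; the ridge term pins down the direction in which raising every offense and lowering every defense would cancel out.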

Our adjustment factors vary about 1 — better offensive and worse defensive teams will have higher ratings, and vice versa for lower ratings. Letting the baseline rate be that for a home team in a tie game, here are the rates at even strength for each of the three zones plus blocked shots:


[Figures: 2013-2014 team rate estimates: Blocked, Low Prob, and Home Plate Minus Slot]


The last three plots look very close to what we’ll see on the Hextally app. There’s enough distinction between these plots to demonstrate that not all shots are equal in intent; the correlation between these rates is high but far from perfect, with the similarities explained by offensive zone possession and the differences by offensive style and noise. When we put all unblocked shots together with mean goal probabilities, we get a single plot of the expected goals a team would get with average shooting from each position:



We have the same graphs for power-play and short-handed expected goal-scoring rates:



Now, each of these was estimated against a reference baseline: as if that team were at home with the game tied, every player were an average shooter, and every goaltender were of average ability, controlling for shot distance and location. As a team rating this is pretty well cleaned up, and we’re only one step away from turning it into a shot-attempt-based measure of goals relative to average.
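As a sketch of that last step — all rates, probabilities and multipliers below are hypothetical placeholders, not estimates from the model:

```python
# Combine per-zone shot rates with league-average goal probabilities to
# get an expected-goals rate, then convert a rate multiplier into goals
# above average over a season. Every input here is a made-up example.

goal_prob = {"perimeter": 0.03, "home_plate": 0.08, "slot": 0.18}
shot_rate = {"perimeter": 22.0, "home_plate": 12.0, "slot": 5.0}  # per 60

# Expected goals per 60 with average shooting from each position
xg_per60 = sum(shot_rate[z] * goal_prob[z] for z in goal_prob)

def goals_above_average(baseline_per60, multiplier, minutes):
    """Extra goals vs. an average team (multiplier = 1) over `minutes`."""
    return baseline_per60 * (multiplier - 1.0) * minutes / 60.0

# e.g. an offense 8% above average over ~4000 even-strength minutes
extra = goals_above_average(xg_per60, 1.08, 4000)
```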

Shot Quality Assurance, Redux

With this mechanism for learning team ability at each of these separate shot types, we have the means to do a little prediction: namely, we want to see whether we gain anything extra in predicting future team ratings when we mix and match these different types. Consider 5 different ratings we can measure: shot zones 1 through 3, blocked shots, and a merged model for all shot attempts (which I’ll call T-Corsi for here). Each has an offensive and a defensive component for each team. Now, consider three linear regressions of shot class outcomes at year y+1 given year y predictors:

  • T-Corsi at year y;
  • that shot class alone at year y;
  • all shot classes at year y.
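Before looking at the real numbers, here’s a small sketch of that comparison on simulated ratings (not the actual team data), fitting the three regressions with plain least squares and computing adjusted R-squared:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 30  # teams

# Simulated year-y ratings for four shot classes; T-Corsi as their mean.
classes = rng.normal(1.0, 0.1, size=(n, 4))
t_corsi = classes.mean(axis=1)
# Year y+1 outcome for one class: a persistent piece plus noise
outcome = 0.6 * classes[:, 0] + rng.normal(0, 0.05, n)

def adj_r2(X, y):
    """Adjusted R-squared of an OLS fit of y on X with an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    r2 = 1 - resid.var() / y.var()
    p = X1.shape[1] - 1
    return 1 - (1 - r2) * (len(y) - 1) / (len(y) - p - 1)

r2_tcorsi = adj_r2(t_corsi.reshape(-1, 1), outcome)  # T-Corsi only
r2_self = adj_r2(classes[:, [0]], outcome)           # that class alone
r2_all = adj_r2(classes, outcome)                    # all classes
```

In this simulation the class’s own past value is the true signal, so lumping it into T-Corsi dilutes the predictor — the same pattern the tables below show for the real ratings.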

Running the linear regression for each of these, separately for the offensive and defensive ratings, gives the following tables of adjusted R-squared values4:

Offensive Outcome T-Corsi Self Combined
Perimeter 0.282 0.322 0.334
Home Plate 0.231 0.245 0.263
Slot 0.002 0.084 0.111
Blocked 0.378 0.485 0.478
T-Corsi 0.404 0.404 0.411

Defensive Outcome T-Corsi Self Combined
Perimeter 0.313 0.366 0.373
Home Plate 0.265 0.287 0.311
Slot 0.319 0.414 0.439
Blocked 0.436 0.488 0.521
T-Corsi 0.586 0.586 0.572

Immediately we see two things: first, defensive skill is more persistent than offensive skill, year after year, and way more persistent at stopping shots in the slot. Offensive T-Corsi can’t predict a team’s ability to shoot in the slot at all, even though we know these shots are materially more valuable.

Second, even with more observations behind it, T-Corsi is a worse predictor of every offensive and defensive sub-ability than that ability alone in the previous season, though we do get an improvement when we include each component with its own weight. There must be persistent parts of each team’s construction that cause shots of each type to be more or less pronounced, and not taking these into account lessens our ability to predict what will happen next.

Given that we now have a measure of how a team changes the rate at which scoring events happen for and against it, we have a mechanism for adjusting what should be expected of a team in any given situation. In the next post, I’ll put things together to get a counting measure of expected goals with these adjustments in place, giving us a more retrospective evaluation of how a team performed in a season.