Measuring WAR is as much about context as it is about performance. Since our goal is to value measures that are predictive of future performance, a team that plays against strong opposition should be compensated because any baseline team would do worse in expectation; a team that plays a series of games at home with sufficient rest should expect to do worse than their record suggests when they’re on the road. How we adjust these events to a relative baseline — average or replacement — determines the success or failure of a system.
The events we have break down into two groups:
- Processes, such as shooting events, broken into groups based on their degree of effectiveness.
- Binary success/failure events, which are the consequences of these processes: faceoff wins/losses, whether shots become goals, (eventually) whether a zone entry was successful while retaining possession.
It’s easier to get a handle on what “baseline” means for the second group: we can get the expected probability of success for a single event with our baseline, like a standard shooting percentage or faceoff win fraction, and match that against what we observed. But for processes over time, it’s not so clear what we ought to do. We have three choices:
- Change the value of the event itself from the perspective of the agent in question. A shot registered against the New Jersey Devils should be worth more than one registered against the Colorado Avalanche because it’s harder to do; we might not know it for sure, but we sure expect it from the data. If we didn’t have time on ice, this would be the natural option — and any approximation to this that doesn’t use time on ice will probably have to use this approach — but making this adjustment doesn’t follow naturally from the model, so I’d rather avoid it if possible.
- Explicitly divide the credit or blame for an event between the various factors. We know Sidney Crosby is a good player, so he should get more credit for a shot on goal than his defensive defensemen. This is essentially how most of the linear regression models work: each goal, point, expected goal, etc. is divided up between the people on the ice according to their respective abilities. (Heck, GVT assumes equality and splits it evenly, but it can be safer to do that than to assume you know what the weights should be.)
- Keep the value of the event the same, but note the context in which it occurred so as to adjust it for each player. This means we count it a bit differently for everyone in context, but a goal is a goal no matter who scored it or who was on the ice; it’s just that we would expect more goals from some players and fewer from others.
This last point is obviously where I want to go with this, and not just because I’m knocking the other two methods. It’s because when we think about rates over time, it’s a lot easier to picture why they work the way they do. Just like a car going 50 miles an hour will cover the same distance in one hour that a cyclist going 25 will do in 2, we can think about it this way: an excellent offensive player either increases the scoring rate over the same time, or makes it as if the clock went slower but kept the pace of play the same. We can think of actions as ones that affect the clock: a center-ice faceoff takes away from the scoring time for both teams no matter who wins it.
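The rate-versus-clock equivalence is easy to check with a little arithmetic. Here is a minimal sketch, assuming a constant scoring rate and a purely multiplicative player effect; the function name and every number in it are made up for illustration and are not drawn from the data.

```python
def expected_goals(rate_per_60, minutes):
    """Expected goals at a constant scoring rate (goals per 60 minutes)."""
    return rate_per_60 * minutes / 60.0

# Hypothetical numbers, chosen only for illustration.
baseline_rate = 2.4   # goals per 60 minutes with a baseline player
minutes = 20.0        # actual time on ice
boost = 1.25          # assumed multiplicative effect of an excellent player

# View 1: the player raises the scoring rate over the same time.
faster_rate = expected_goals(baseline_rate * boost, minutes)

# View 2: the rate stays the same, but the clock "runs slower",
# so the player gets more effective scoring time.
longer_clock = expected_goals(baseline_rate, minutes * boost)

print(faster_rate, longer_clock)
```

Both views give the same expected goal total, which is why the context can be folded into the clock while the value of each goal stays fixed.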
And herein we have our first definition of baseline for any scoring process: the total number of goals we expect to score if we replace the player/team/agent in question with an appropriate replacement. For teams, we can use the league average, since all teams play the same amount of time in the regular season. For circumstances like goal leads or deficits, we can do this with respect to a tied game for the away team (only because everything else goes up from here). For players, this gets trickier because we haven’t defined average or replacement yet, though the definition will hold once we do. For now, Goals Above Baseline for teams follows immediately from the last post: it is the observed number of goals scored for (or against) a team minus the expected number of goals, given each of the situations we observed previously. This is calculated for offense, defense, power-play and short-handed situations, using the average zone value for shots in each of the three scoring zones of interest.
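As a rough sketch of the observed-minus-expected bookkeeping, the snippet below computes Goals Above Baseline per situation. It simplifies heavily: it assumes a single baseline rate per situation rather than zone-by-zone shot values, and the situation labels, rates, minutes and goal counts are all hypothetical placeholders, not figures from the table.

```python
def goals_above_baseline(observed, minutes, baseline_rate_per_60):
    """Observed goals minus the goals a baseline team would be expected
    to score in the same amount of time, computed per situation."""
    result = {}
    for situation, goals in observed.items():
        expected = baseline_rate_per_60[situation] * minutes[situation] / 60.0
        result[situation] = goals - expected
    return result

# Hypothetical season totals for one team (not real data).
observed = {"even_strength": 160, "power_play": 45, "short_handed": 5}
minutes = {"even_strength": 3900.0, "power_play": 450.0, "short_handed": 430.0}

# Assumed league-average scoring rates, in goals per 60 minutes.
baseline_rate_per_60 = {"even_strength": 2.3, "power_play": 6.0, "short_handed": 0.7}

gab = goals_above_baseline(observed, minutes, baseline_rate_per_60)
for situation, value in gab.items():
    print(situation, round(value, 2))
```

The same function works for goals against by passing goals allowed as `observed`; in that case a negative number is the good outcome.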
The full table for all team-seasons of Retrospective Shooting Expected Goals Above Baseline is below. A few highlights:
- Top four spots for Even Strength Goals Above Baseline since 2008: the 2010-2011 Lightning, 2013-2014 Sharks and 2013-2014 Kings, and 2009-2010 Blackhawks. I feel less silly for picking the Sharks in my pool now.
- Bottom four spots: 13-14 Sabres, 13-14 Leafs, 10-11 Oilers and 09-10 Oilers. Again, no big surprises.
- Best power play teams: 10-11 Sharks, 08-09 Red Wings, 08-09 Ducks. That Ducks team had little else going for it, unlike the other high achievers here.
- Worst power play teams: 09-10 Thrashers, 09-10 Panthers, 08-09 Oilers. These names recur throughout the rest of this list. The 13-14 Devils crack its top 5 for being exceptionally awful compared to their recent past, and I can’t imagine that’ll persist.