
2014 Pittsburgh Hockey Analytics Workshop — Updated!

We are pleased to announce the 2014 Pittsburgh Hockey Analytics Workshop, hosted by the Carnegie Mellon University Department of Statistics and war-on-ice.com.  The workshop will consist of short talks and panel discussions about hockey analytics, as well as plenty of social time to help bind the community together.

  • When:  Saturday, November 8, 2014, 11am — 5pm
  • Where:  Carnegie Mellon University (Doherty Hall 2315), 5000 Forbes Avenue, Pittsburgh, PA
  • Registration:  Here
  • Attendance Fee:  None, though we will ask for donations to cover snacks and volunteer expenses — see the Donate button on our home page

A full list of speakers and topics is now available:  See here for the workshop program.

Confirmed speakers include (but are not limited to):

  • Stephen Burtch (@SteveBurtch), sportsnet.ca
  • Jen Lute Costella (@RegressedPDO), Puck Daddy
  • Sean Gentille (@seangentille), The Sporting News
  • Jesse Marshall (@jmarshfof), Faceoff Factor and The Pensblog
  • Your war-on-ice.com founders (@acthomasca, @stat_sam), Carnegie Mellon University

Social events for workshop attendees:

Friday night meetup:

Saturday night meetup:

  • When:  6pm (or immediately following the workshop)
  • Where:  Hough’s
  • What:  Believe it or not, we’ll actually be watching the game (Penguins @ Sabres, 7pm)

Workshop Parking:  Free in the Carnegie Mellon University East Campus Garage (note:  due to construction, the garage entrance is at the intersection of Forbes Avenue and Beeler Street).  Follow the map below to park and to get to the workshop room!

Google Maps

  1. The full Carnegie Mellon University Campus Map is here.
  2. When you enter Doherty Hall, walk to the end of the hallway and go up the stairs (or elevator) to the second floor.  Continue down the hallway to room 2315.

In the media:

In Labs: Optimal Time To Pull The Goalie

We’ve included a new app in the Labs section of the site that simulates the end-of-game scoring process using the standard Poisson scoring assumption. This is the same model that’s been used for the last 40 years, but got a richer treatment in Beaudoin and Swartz’s 2010 article in The American Statistician. (One can make the argument that these times are still too conservative, but for now, the fewer assumptions we make, the better.)

This assumes you know the scoring rates for both teams at even strength, for the trailing team with an extra attacker, and for the leading team shooting on an empty net; the defaults are reasonable baseline values and can be adjusted with the sliders.

We run 100,000 simulations of goal-scoring times under these rates, across a range of pull times and 3rd-period deficits. The optimal time to pull with a 1- or 2-goal deficit is calculated as the point at which the gain in the probability of tying the game is greatest, and is displayed as a red or green line, respectively.
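The simulation logic can be sketched as follows — a minimal Monte Carlo version of the standard Poisson end-of-game model, with hypothetical rate values (goals per 60 minutes), not the app's actual code:

```python
import random

def prob_tie(pull_time, lam_even=2.5, lam_attacker=4.5, lam_empty=6.5,
             deficit=1, n_sims=100000, period_len=20.0, seed=0):
    """Estimate the probability that the trailing team ties a 3rd-period
    deficit when it pulls its goalie with `pull_time` minutes remaining.
    Rates are goals per 60 minutes; these defaults are illustrative
    baseline values, not the app's."""
    rng = random.Random(seed)
    ties = 0
    for _ in range(n_sims):
        t, d = period_len, deficit   # minutes remaining, current deficit
        while t > 0 and d > 0:
            pulled = t <= pull_time  # goalie comes out at pull_time
            lam_for = (lam_attacker if pulled else lam_even) / 60.0
            lam_against = (lam_empty if pulled else lam_even) / 60.0
            total = lam_for + lam_against
            wait = rng.expovariate(total)  # time until the next goal
            if wait > t:
                break                # the horn sounds first
            t -= wait
            if rng.random() < lam_for / total:
                d -= 1               # trailing team scores
            else:
                d += 1               # leading team scores
        if d <= 0:
            ties += 1
    return ties / n_sims
```

Sweeping `pull_time` over a grid and comparing `prob_tie` against the never-pull case gives the kind of gain curve from which an optimal pull time can be read off.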

The Road to WAR (for hockey), Part 5: Getting Goals Above Baseline

Measuring WAR is as much about context as it is about performance. Since our goal is a measure that is predictive of future performance, a team that plays against strong opposition should be compensated, because any baseline team would do worse against it in expectation; a team that plays a series of well-rested home games should expect to do worse than its record suggests once it hits the road. How we adjust these events to a relative baseline — average or replacement — determines the success or failure of a system.

The events we have break down into two groups:

  • Processes, such as shooting events, broken into groups based on their degree of effectiveness.
  • Binary success/failure events, which are the consequences of these processes: faceoff wins/losses, whether shots become goals, (eventually) whether a zone entry was successful while retaining possession.

Continue reading

The Road to WAR (for hockey), Part 4: You can’t spell “An Incremental Improvement” without two “team”s

Through the previous three parts of this series, I’ve outlined the history and problem of finding a single catch-all statistic, why rates are an effective means of capturing this, and how I’ve chosen to divide the problem into several event types. It’s finally time to get to some common currencies, but to jump all the way to player evaluation in one step is excessive when there are more issues to shake down first. So in this post (and the next one), I’ll be applying these methods to create team rankings: the first will be retrospective and heavier on advanced methods, and the second will look at a more predictive method that can be done without regression.

Continue reading

The Road to WAR (for hockey), Part 3: Shot Quality Assurance, plus A Bonus on Travel Fatigue

Note: This was originally posted at acthomas.ca. Further updates will be posted to this entry.

We’ve got the notion now that the rate at which events occur, and what those events do to the game, drives my understanding of it. And the last post mentioned a little bit about differing types of events, though we haven’t established anything definitive about effectiveness and how we should divide things up. The data we have available for the past six seasons carries this sort of location information for shot attempts:

  • GOALs and SHOTs have (x,y) coordinates and distances from the net.
  • MISSed shots have distances from the net. (I impute these locations based off SHOT locations, conditioning on distance from the net; it’s not perfect but it’s a start.)
  • BLOCKed shots have neither.

We know that different arenas have biases when it comes to recording missed and blocked shots, and that there are a number of quality issues with the (x,y) locations of shots, but we do the best we can with what we have. One of the simplest approaches is to bin the data into smaller blocks that are descriptive of general areas but insensitive to small changes. Balancing these areas so each holds an equal number of shots is difficult without making them less comprehensible — there’s a reason we have shorthand terms for the point, the slot and down low, after all — but if the counts are reasonably close, we can still get a pretty good picture.
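As a toy illustration of this kind of binning — with made-up boundaries, and assuming RTSS-style coordinates in feet with the attacking net near (89, 0) — one might write:

```python
def shot_zone(x, y):
    """Assign a shot's rink coordinates to a coarse zone.
    Assumes coordinates where x runs along the length of the ice,
    y across its width, and the attacking net sits near (89, 0).
    The boundaries below are illustrative, not the ones the site uses."""
    dist = ((89 - x) ** 2 + y ** 2) ** 0.5  # distance from the net
    if x < 54:
        return "point"        # near or outside the blue line
    if dist <= 15 and abs(y) <= 9:
        return "slot"         # high-danger area in front of the net
    if x > 80 and abs(y) > 9:
        return "down_low"     # below the circles, off to the side
    return "other"
```

Because the zones are coarse, a few feet of recording error rarely changes a shot's bin — which is exactly the insensitivity to small changes mentioned above.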

Continue reading

The Road to WAR (for hockey), 2: All Rate Now

Note: This was originally posted at acthomas.ca. Further updates will be posted to this entry.

Summary: In which I introduce the use of multiple factors to produce the rates of events that occur in hockey.

The main purpose and major obstacle of single-number summaries is finding a common currency to relate each event within its proper context. We can only do that with counting measures when we find a way to adjust one unit to another in a sensible way, and the easiest way that I know is to express things in terms of the rates at which events happen. And there are two processes at work in any hockey game: when each team tries to score goals on the other. So if we know the rates at which these events are expected to occur, we have a baseline; better yet, we can see how circumstances like game score, home ice advantage and more tend to affect these rates during the game. That is what I’ll call Point 1:

The relative value of an agent — a team, player, combination of players, or circumstance — is how they change the rate at which events occur, for and against.

This is of course meaningless to our purpose without Point 2:

The only events that matter are predictive or indicative of a goal being scored.

Neither of these should be disputable if what we’re trying to do is predict the outcomes of games, before or during.
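To make Point 1 concrete, here is a trivial sketch of the rate-based view, where each circumstance scales a baseline rate multiplicatively; the factor values shown are hypothetical, not fitted:

```python
def scoring_rate(baseline_per60, factors):
    """Combine multiplicative adjustments on a baseline scoring rate.
    Each circumstance (home ice, game score, personnel) scales the
    rate at which goals occur for or against; an agent's value is
    exactly how much it moves these rates."""
    rate = baseline_per60
    for f in factors.values():
        rate *= f
    return rate

# e.g. a team trailing by one at home might see an elevated rate
# (both factor values here are made up for illustration):
rate = scoring_rate(2.5, {"home_ice": 1.05, "trailing_by_1": 1.10})
```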

Continue reading

The Road to WAR (for hockey), Part 1: The Single-Number Dream

Note: This was originally posted at acthomas.ca. Further updates will be posted to this entry.

I’m extremely heartened by a new-found appreciation for statistical methods in hockey, by teams and fans alike, and somehow gifted with a small amount of time to collect a number of thoughts on the current state of the field. One of the biggest questions along the way is the best way to summarize a player’s accomplishments in one number.

The blogger community has focused largely on descriptive counting measures and fractions that have individual predictive power, while being less clear than I’d like about what those predictions actually are. The statistical community, myself included, has put out plenty of work on how the game might flow, but we haven’t done nearly as well at explaining our methods to a larger audience, or at making sure the tools are there to use them. I’ve tried to do that with nhlscrapr, to make the NHL RTSS data easier to process, and with hextally, to display shot data with respect to league averages, but there’s more I can share that will help to bring the quicker-moving world of bloggers closer to tool-builders like me.

Over this and the next sequence of posts, I’ll lay out my vision for how these pieces can fit together in creating a general formula for Wins Above Replacement in hockey, a single currency to compare all parts of the game, with the best level of data we have available to us at any time. I have a few guidelines I want to follow:

1) This system should be forward-looking; that is, no new information intrinsic to the system should affect our estimates from the past. I want this to be based on a predictive idea so that past performance is indicative of the (immediate) future; my only exception to this would be if we learned of bias in the data which needed to be corrected after the fact.

This limits our use of standard regression tools to disentangle the impact of different factors, since those require processing the entire data set at once, mixing past and future through matrix inversion. Ain’t gonna happen that way.

2) Every piece should be linearly decomposable into its constituent parts. Part of this is that if one piece can be improved — such as goaltender performance — we can just hot-swap that piece with the improvement.

3) If possible, our methods should depend on generative models so that we can simulate game outcomes as closely as possible. Partly, it validates these statistical methods by letting us see how a game could come about; partly it gives us transparent prediction and forecasting.

4) Relating to the previous point: everything should be validated on its ability to predict future outcomes. We shall not judge by eyeball fit but by overall measures of predictive power.

5) No magic numbers — that is, constants that appear in the equation only because we needed something to fit, like the idea that a replacement goaltender’s save percentage should be 0.900 without justification. Some can’t be avoided in the short term — say, declaring the 10 worst goaltenders in the league “replacements” — but the more we can justify these through design choices, or fit them explicitly from data, the better. At best, these numbers are placeholders.

6) Numbers should be independent of managerial choices. This is tougher than it sounds, since the best players get the best ice time and scrubs never see the power play, but it’s as much a commitment to form as it is a promise. For example, it’s unfair to judge a pitcher’s ability as lesser because they happen to pitch a scoreless 6th and not a scoreless 9th; the manager made that call.

7) If we can’t see the counterfactual case, we can’t include it. For now, that includes hits and turnovers, because I don’t have enough data to see what would have happened if the player didn’t make the hit or didn’t cough up the puck. (See point 5 — I can’t simulate it.) Maybe with better data, but not today.

With those guidelines set up (and ready to be broken if necessary), it’s a little more straightforward to describe which of these rules the systems we’ve seen before actually follow. First, let’s assume that goaltenders are easy: Goals Against Average is a team metric far more than an individual one, and basic save percentage does a decent job. We can (and will) do more later.

Goal Plus-Minus — Various (1950s-present)

This is worth mentioning purely as a baseline value: a player’s goal differential is predictive of their future performance, because goals are the quality that matters most to the game. Unfortunately it’s a bit akin to the idea that eating fat will make you fat because they’re made of the same stuff; the small sample size and the confounding with linemates are both problems that make this essentially unusable as a true comparative statistic.

Pros:
Easy to calculate.
Minimal amounts of data are needed compared to the other methods.
Easiest appreciation of defensive ability around.

Cons:
Essentially even strength only.
Way too sensitive to team effectiveness.
Doesn’t blend well with shooter statistics (goals and assists).

Bear both these points in mind as we move forward.

Player Contribution — Alan Ryder, 2003

This is the earliest attempt I’ve found at assessing total value; it works on the concept of Win Shares, in that players who are responsible for success above the baseline receive credit in terms of “Marginal Goals”, which are then converted to wins as a base currency.

Pros:
Was first (and is criminally unknown by the community at large).
Uses commonsense methods for calculating goaltending prowess.
Accounts for penalty-taking and penalty-killing, assigning everything piecewise.

Cons:
Uses magic numbers, at least in its original incarnation, particularly for threshold values. Does not adjust for relative skill of players (opponents or teammates).
Uses goals as the currency for players, not expected goals or other goal precursors.

It’s a good starting point, because it breaks everything down into its component pieces. And it certainly has pieces worth saving and preserving; once the game is broken down like this, it’s tough to avoid. This leads to…

Goals Versus Threshold (GVT), 2009

A reasonably popular measure used by Hockey Prospectus, this has many of the same elements as PC above, with one exception: regular GVT scores tend to be published. Otherwise, it attributes scoring contributions to each player on the ice when goals are scored, gives extra credit for goals and assists, and does not use expected shots or account for teammates. Both methods are mission-specific: each author limited the detail required so that the method could be applied across decades. Which is wonderful and admirable for all us geeks who want a fuller picture of the game over time, but less useful for prediction given what’s collected today.

So that leads to our modern-era predictors, or our best one (or two) shot numbers:

Modern Plus-Minus: Fenwick Close, 2011 (ish)

Take all shots, missed shots and goals — but not blocked shots — for and against a team when a player is on the ice, counting only those when the game is “close” (with varying definitions, but essentially within one goal). There are a few more of these around — varieties named Corsi and Fenwick after their coiners — but this would seem to be the most respected version of those modern pure plus-minus statistics. I include this not only for completeness but to point out that many critics of these methods falsely suggest that their proponents treat it as a one-number be-all and end-all.

Pros:
Captures a lot of scoring-type events that were otherwise ignored.
Removes noisier/known-to-be-biased observations.
Popular.

Cons:
Treats shots and goals as equal contributions.
Ignores shot quality (strongly by choice; many might call this a pro).
Still doesn’t directly account for teammate and opponent collinearity.
Removes potentially useful observations.
Popular with people who call this “possession”, even though it’s a necessary combination of possession and offensive-zone location and a direct goal precursor.

There have definitely been attempts to correct the shortcomings of these numbers with respect to linemates and opponents: collect the same statistic for a player’s ice-sharers and take the average, weighted by shared ice time, giving Quality of Teammates and Quality of Competition respectively. But these aren’t integrated with the original statistic, merely used as an “eye test” to gauge whether they are higher or lower.

Regression-Adjusted Plus/Minus – Macdonald, 2010
Total Hockey Rating (THoR) – Schuckers and Curro, 2012
Logistic Event Ratings — Gramacy, Taddy, Jensen, 2013

Here we’re getting into methods that could actually be considered “advanced”, since they go beyond simple counting statistics and into a little more theory. Each of these three methods takes an individual event as the outcome of interest and uses linear or logistic regression to establish how much each contributing factor — in this case, the players on the ice, along with other factors like zone position and (particularly) home-ice advantage — moves that outcome.

These models produce coefficients for each player or term in the model; Goals above Replacement come from examining the change in probability or expectation across all events and adding up the relative difference.
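As a minimal sketch of the design these papers share — not any one author's implementation — each shift becomes a row with +1/−1 player indicators, and a ridge penalty (one common way to tame the sparsity and collinearity) shrinks ratings toward zero. The shift format, learning rate and iteration count here are all illustrative choices:

```python
def ridge_apm(shifts, players, lam=1.0, iters=2000, lr=0.01):
    """Toy regression-adjusted plus-minus.
    Each shift is (on_ice_for, on_ice_against, outcome): players on the
    ice for the team of interest get +1 in the design row, opponents
    get -1. A ridge penalty keeps the massively collinear on-ice
    indicators identifiable. Real implementations use sparse matrices
    and weight rows by time on ice."""
    idx = {p: i for i, p in enumerate(players)}
    rows = []
    for on_for, on_against, y in shifts:
        x = [0.0] * len(players)
        for p in on_for:
            x[idx[p]] = 1.0
        for p in on_against:
            x[idx[p]] = -1.0
        rows.append((x, y))
    beta = [0.0] * len(players)
    for _ in range(iters):                      # plain gradient descent
        grad = [lam * b for b in beta]          # ridge penalty gradient
        for x, y in rows:
            resid = sum(b * xi for b, xi in zip(beta, x)) - y
            for j, xi in enumerate(x):
                grad[j] += resid * xi
        beta = [b - lr * g for b, g in zip(beta, grad)]
    return dict(zip(players, beta))
```

With only two players and one repeated matchup, the ridge solution splits the observed differential between them symmetrically, shrunk toward zero by the penalty.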

Pros:
Each rating has teammates and opponents included by default.
Incorporates multiple event types (THoR, at least).

Cons:
Everything has to be fit at once, so estimates of past ability change whenever new data arrive.
Can be computationally expensive; tools are standard but data takes some doing (mainly due to massive sparsity).
Models are not generative.

G-net through the Mean Even-Strength Hazard model — Thomas et al, 2013

My team and I designed our approach to address some of our dissatisfactions with existing models — essentially all the models we’ve just covered — though it has weaknesses of its own. I’ll explain in more detail in the next posts how this works, but the premise is that hockey is a game of events on two goals: a strong offensive player raises the rate at which their team scores, and a strong defensive player lowers the rate at which their team allows goals.

Pros:
It can be used to actually simulate games, so the terms are directly interpretable.
Directly incorporates teammate and opponent effects, and other changes in the state of the game.
Uses time on ice directly rather than as regression weights.

Cons:
SLOW. It took running overnight on a multicore processor to calculate everything we needed the first time, even though parts of it were written in C.
Built for an academic audience rather than general interest (befitting our day jobs).
Needs the full slate of RTSS data to work properly.

Another con: my colleagues and I have done a pretty poor job of sharing our results on a rolling basis. That’s something I intend to change with this series.
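The generative premise can be sketched like this — a toy log-linear hazard where skater effects scale a baseline even-strength rate. The coefficient values are hypothetical placeholders, not our fitted model:

```python
import math
import random

def simulate_goals(minutes, beta_off, beta_def, base_per60=2.5, seed=1):
    """Sketch of the generative idea behind a hazard-style model:
    a team's even-strength scoring rate is a baseline scaled by the
    offensive effects of its skaters on the ice and the defensive
    effects of the opposing skaters (negative = suppresses scoring).
    Returns the simulated goal count over `minutes` of play."""
    rng = random.Random(seed)
    # log-linear rate: baseline times exp(sum of on-ice effects)
    rate = (base_per60 / 60.0) * math.exp(sum(beta_off) + sum(beta_def))
    goals, t = 0, 0.0
    while True:
        t += rng.expovariate(rate)  # exponential wait to the next goal
        if t > minutes:
            return goals
        goals += 1
```

Because the model is generative, running this for both nets and both teams simulates a whole game — which is what makes the terms directly interpretable.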

UPDATE, August 13 2014: Rob Vollman of Hockey Prospectus gives us this list.

Why This Site Isn’t Going Anywhere

Since even mild service disruptions in sites like capgeek and ShiftChart have made people wonder if they’ve disappeared for good, we want to reassure you of a few things:

1) This resource isn’t going to disappear. If we’re down and you don’t know why, it’s technical.

2) We’ve got the source code, and we’re going to share it by season’s beginning, so that if we did disappear, someone else could relaunch with another name. This was part of our promise to our friends out there.

3) Two of us run this site, so that if one of us leaves, the other will continue running it and recruit another partner. (The Sith model.)