We at www.war-on-ice.com are happy to host Manny’s newest Bombay Ratings App! We continue to encourage others in the hockey research community to follow Manny’s lead and develop public applications that will further the frontiers of research in hockey analytics.
While working on the Similarity Calculator, I stumbled upon a study in which Euclidean distance was used to compare NBA players to Michael Jordan. The author used the distances to generate a list of the most similar players to the man most would agree is the best ever. From this idea, Bombay ratings were just a conceptual hop, skip and jump away. Instead of choosing my own Michael Jordan from a list of historical players, I invented one.
Gordon Bombay (no relation to the legendary coach) played two seasons in the NHL. In his first season, as a forward, he led the league in every single statistical category and eventually led his team to the Stanley Cup. Seeking a challenge, Bombay converted to defence in his second season. Undaunted, he repeated his rookie-season success, once again unmatched by his peers in every single facet of the game. Bombay promptly retired, and no player since 2005 has been able to surpass his accomplishments.
Bombay’s stats at either position are equal to the best recorded values among regular skaters at that position since the 2005-2006 season. Thus, he possesses the best stats we can imagine without stepping outside the boundaries of what real players have been able to accomplish. If you don’t wish to entertain hypotheticals, consider an alternative explanation: The similarity calculation evaluates “distance” between players, each occupying a position in imaginary space. This space has as many dimensions as there are categories by which you choose to compare players, and the limits of each dimension are set by the maximum and minimum recorded values since 2005-2006. Gordon Bombay is simply a marker we’ve decided to place at the positive-most position in space — the position where the positive extrema of each dimension meet. In a three-dimensional plot, this is simply a corner. The Bombay Rating is the similarity between a player and Gordon Bombay.
Hence, we’ve laid the foundation for a method of easily evaluating how “good” a player’s stats are, one that is surprisingly flexible and reasonably effective at producing intuitively pleasing results. The Bombay function essentially does what we all do when we pull up a player’s statistics. The advantage is that it’s more precise, quicker, and returns a single number. Recall that the similarity calculation is a function of the chosen dimensions and corresponding weights. It follows that the Bombay rating is a function of those same variables. While this permits fluidity in what the method can accomplish, it also makes it entirely dependent on the quality of the measures used.
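To make the mechanics concrete, here is a minimal sketch of a Bombay calculation, assuming a player’s stats, the chosen weights, and the per-category extrema among regular skaters are stored as numeric vectors in the same category order (all names here are illustrative, not the app’s actual code):

```r
# Bombay rating: similarity between a player and the per-category maxima.
bombay_rating <- function(stats, weights, max_vals, min_vals) {
  d    <- sqrt(sum(weights * (stats - max_vals)^2))     # distance to "Gordon Bombay"
  dmax <- sqrt(sum(weights * (max_vals - min_vals)^2))  # largest possible distance
  100 * (1 - d / dmax)                                  # similarity, as a percentage
}
```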
The Bombay app I developed uses a variety of 5v5 stats to assign ratings to skaters based on the selected weights and generate charts comparing players to Gordon Bombay in each of the chosen categories.
The outer edge of the chart represents a 100% similarity to Bombay in that measure. This is only achieved if a selected player-season possesses the best recorded value in that metric among regular skaters since 2005. The dashed grey polygon represents another fictional player – one whose stats are all equal to the league average for regular skaters at that position. Note that league average does not signify a 50% similarity. At the default weights, this hypothetical average forward has a Bombay rating near 46 and the defenceman, 45.
I should confess that the default weights are largely arbitrary. I believe the correct weights to use are case-dependent, and I certainly encourage users to assign their own. I’ve found that using the “Defence” preset weights as a starting point to evaluate bottom-six or defensively-oriented forwards often produces more agreeable results. Individual season rankings can be viewed by toggling the “Table” tab and further filtered using the inputs at the bottom of each column. Using preset weights, the names atop the Forward rankings (Ovechkin, Sedin, Jagr, Crosby, Zetterberg, Malkin, Sakic) are who you’d expect; the Defencemen rankings (Visnovsky, Karlsson, Giordano, Byfuglien, Campbell, Niskanen, Weber) match expectations to a much lesser extent. It’s no secret that the evaluation of defencemen, by analytical and traditional methods alike, leaves something to be desired at times. With better measures of defensive ability will come better results from this method.
Bombay ratings can easily be computed using aggregate player stats. You can view career Bombays here. While I wouldn’t necessarily trust default Bombays to provide a single number indicative of player quality over metrics like WAR and GvT, I believe the method has very interesting potential and flexibility. For one, it can easily be expanded as new stats become available. For another, the same method can be applied in other leagues, namely Canadian Major Junior and college leagues.
Eric Tulsky previously found that the back-to-back edge was worth a full percentage point in the second half of back-to-backs, from .912 to .901, using data from the 2011-12 and 2012-13 seasons. Since we now have quality goaltending data at war-on-ice.com from 2005-06 through this season (2014-15), it’s worth a fresh look. Here’s the effective difference by season.
The reputation for tired goalies has apparently been built on the two worst years in our record; in fact, in three other seasons the effective change in save percentage is positive.
Given the additional tools we have at our disposal, let’s break them out and see if they tell us anything new about this. Let’s do it in this sequence using good old logistic regression:
The negative changes in save “percentage” in thousandths for each factor:
Model | Away Goalie | Back-To-Back (Home) | Back-To-Back (Away)
---|---|---|---
1 | 3.3 | 1.3 | 3.1
2 | 1.2 | 2.1 | 3.4
3 | 0.6 | 1.9 | 3.4
4 | 0.1 | 1.8 | 2.9
5 | -0.03 | 2.3 | 3.7
The home advantage on save percentage disappears as we add more factors, while the difference in “tired” performance persists, but only at three and a half points below their usual performance, not 11. I was personally expecting the differences to be bigger, and I was also expecting shot danger to play a bigger role than effectively none. Still, while we still don’t have a good idea whether there’s greater risk of injury, or other unknown factors, we can be confident that coaches aren’t completely nuts if they send their Number One out in back-to-backs.
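For those who want to poke at this themselves, a sketch of the kind of shot-level logistic regression described above might look like the following, with illustrative column names (one row per shot on goal, `saved` as the 0/1 outcome):

```r
# Shot-level model of save probability with goalie-fatigue indicators.
fit <- glm(saved ~ away_goalie + b2b_home + b2b_away + shot_danger,
           family = binomial(), data = shots)
summary(fit)
# Coefficients are on the log-odds scale; evaluate the fitted probabilities at
# the league baseline (~.920) to express each factor in save-percentage points.
```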
Replication materials:
We encourage others interested in the analysis of hockey data to follow Manny’s lead and create interesting apps for www.war-on-ice.com.
The wheels of this project were set in motion when I began toying around with a number of methods for visualizing hockey players’ stats. One idea that made the cut involved plotting all regular skaters since the 2005-2006 season and separating forwards and defensemen by two measures (typically Rel CF% and P/60 at 5v5). I could then show the position of a particular skater on the graph, and more interestingly, generate a list of the skaters closest to that position. These would be the player’s closest statistical comparables according to the two dimensions chosen. Here’s an example of what that looked like:
The method I used to identify the points closest to a given player’s position was simply to take the shortest distances as calculated by the Pythagorean theorem. This method worked fine for two variables, but the real fun begins when you expand to four or more.
In order to generalize the player similarity calculation for n-dimensional space, we need to work in the Euclidean realm. Euclidean space is an abstraction of the physical space we’re familiar with, and is defined by a set of rules. Abiding by these rules allows us to derive a function for “distance” analogous to the one used above. In simple terms, we’re calculating the distance between two points in imaginary space, where the n dimensions are given by the measures by which we’ve chosen to compare players. With help from @xtos__ and @IneffectiveMath, I came up with the following distance function, for players p and q compared across measures 1 through n with weights w:

d(p, q) = sqrt( w_1(p_1 − q_1)^2 + w_2(p_2 − q_2)^2 + … + w_n(p_n − q_n)^2 )
And the Similarity calculation:

Similarity = 1 − d(p, q) / sqrt( w_1(max_1 − min_1)^2 + … + w_n(max_n − min_n)^2 )

In decimal form, Similarity is the distance between the two points in Euclidean n-space divided by the maximum allowable distance for that function, subtracted from one. The expression in the denominator of the Similarity formula is derived from assuming the distance between both points is equal to the difference between the maximum and minimum recorded values for each measure used. The nature of the Similarity equation means that a 98% similarity between players indicates the “distance” between them is 2% of the maximum allowable distance.
To understand how large the maximum distance is, imagine two hypothetical player-seasons. The highest recorded values since 2005 for each measure used belong to the first player-season; the lowest recorded values all belong to the second. The distance between these two players is the maximum allowable distance.
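A minimal sketch of both calculations, assuming players p and q are given as numeric vectors over the chosen measures, with weights w and the recorded extrema per measure (all names are illustrative):

```r
# Similarity between two players in weighted Euclidean n-space.
similarity <- function(p, q, w, max_vals, min_vals) {
  d    <- sqrt(sum(w * (p - q)^2))                # weighted Euclidean distance
  dmax <- sqrt(sum(w * (max_vals - min_vals)^2))  # maximum allowable distance
  1 - d / dmax                                    # 1 = identical, 0 = opposite corners
}
```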
Stylistic similarities between players are not directly taken into account, but can be implicit in the players’ statistics. Contextual factors such as strength of team/teammates and other usage indicators can be included in the similarity calculation, but are given zero weight in the default calculation. In addition, the role played by luck is ignored.
The Statistical Similarity Calculator uses this calculation to return a list of the closest comparables to a given player-season, given some weights assigned to a set of statistical measures. It should be noted that the app will never return a player-season belonging to the chosen player, except, of course, in the top row, which is shown for comparison’s sake.
Under “Summary,” you will find a second table displaying the chosen player’s stats, the average stats for the n closest comparables, and the difference between them.
This tool can be used to compare the deployment and usage between players who achieved similar production, or the difference between a player’s possession stats and those of others who played in similar situations. You may also find use in evaluating the average salary earned by players who statistically resemble another. I’ll continue to look for new ways to use this tool, and I hope you will as well.
** Many thanks to Andrew, Sam, and Alexandra of WAR On Ice for their help, their data, and their willingness to host the app on their site. **
How valuable is a penalty drawn or taken to a team? In goals, the marginal effect is clear: you get up to 2 minutes during which your rate of goals for goes up and your rate of goals against goes down. And if it’s your best penalty killers who are penalized, they don’t get to help clean up the mess they’ve made in the process.
The secondary effects are less clear. For example, what changes in terms of a team’s future effort when a player takes an ill-advised penalty? We’re not in a position to answer this when it comes to the share of responsibility to the penalty taker; we can only assess a team’s performance during those times.
And so, for the time being we’re left with assigning credit and blame to the penalty taker and drawer in terms of an expected-goals measure. To get goals above replacement, we need to know the rate at which a replacement player at each position would take or draw penalties — aside from misconducts and matching fighting majors — and we do this with the same approach we used for faceoffs, the Poor Man’s Replacement method:
The results for the 10 seasons since 2005 are below. Note that we do not have penalties-drawn data for the 2005-06 and 2006-07 seasons.
The “replacement” rate for taking penalties for forwards and defensemen is higher than the league average. When it comes to penalties drawn, forwards draw penalties at a greater rate than defensemen, which is to be expected on scoring plays; replacement rate at each position is roughly the same as the league average otherwise. This suggests that if drawing penalties is a skill, it’s exceptionally rare, whereas general discipline to avoid taking penalties is clearly a behaviour seen in full-time players.
Now it’s simple enough to get the number of penalties drawn and taken by replacement players at each position, and subtract this from their actual results. The final table is available in full here.
We convert to goals with an approximation: a team on the powerplay scores at a clip of roughly 6.5 goals/60 and allows 0.78 shorthanded goals/60. We measure each of those rates against a 5v5 baseline of 2.5 goals per 60 minutes, assume that 20 percent of powerplays end early in a goal, for an average of 1.8 minutes of PP time per penalty, and reach an average figure of 0.17 net goals per penalty taken or drawn. For now we use the relation that 6 goals equal one win.
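As a back-of-the-envelope check of that 0.17 figure, using only the rates quoted above:

```r
pp_for     <- 6.5   # powerplay goals for per 60
pp_against <- 0.78  # shorthanded goals allowed per 60
base_5v5   <- 2.5   # baseline 5v5 goals per 60, each direction
pp_length  <- 1.8   # average minutes of PP time per penalty
net_per60  <- (pp_for - base_5v5) + (base_5v5 - pp_against)  # net swing: 5.72
net_per60 / 60 * pp_length  # ~0.17 net goals per penalty taken or drawn
```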
The champion in total penalty WAR over the last 10 seasons is Dustin Brown, and it’s not even close: 8.47 wins above replacement in that time. Per 60 minutes, though, he’s the third-ranked player in the top 50 over that span; Nazem Kadri and Darren Helm take the 1 and 2 spots.
The special prize here goes to Patrick Kaleta of the Buffalo Sabres, whose penalties-drawn rate is well above average and the best among the top 200. We knew this about him already, but it helps his case that his penalties-taken rate isn’t as bad as a replacement player’s, which gives him an extra boost.
Links:
In the next week we’ll be releasing our proposed three main elements from which we can derive WAR using the data we have, in what we feel is the ascending order of importance: faceoffs, shooting/goaltending success, and shot attempt rates.
For each process, the pathway we’re laying out to establish value sounds straightforward:
We’ve been talking about parts 1, 3 and 4 in previous entries in this series, and we will continue to do so in the parts to come. But we need to establish what “replacement” means, because there are two important qualities we need to factor in.
First, there’s the standard definition: a level of performance against which we judge everyone else, under the assumption that it’s the level of skill that a team could purchase at the league minimum price. This is fairly clear-cut in most examples in, say, baseball: for every position, there’s a different baseline expected level of performance, and the average can be calculated at each position by that standard; replacement level can then be calculated relative to the average. A shortstop that hits 20 home runs in a season is more valuable than a first baseman with the same numbers, because “replacement-level” shortstops will tend to have less power.
But a benchmark for performance isn’t sufficient here. When we measure team achievement, we simultaneously adjust for the strengths of their opponents to get a more precise estimate. To do the same thing for player-player interactions, we have to adjust for player strengths, but since estimates for replacement players are inherently unstable — there’s so little data on each player, almost by definition — it helps us even more to have a single standard for each type of replacement player to ensure that our adjustments are accurate.
One standard definition that I like for replacement players is based on the total number of regular players in the league, like in the Baumer-Jensen-Matthews OpenWAR method: for 30 teams with 25 regular players, any player beyond the original 750 can be considered “replacement”. This makes sense if players have only one or two roles, like fielding and batting. Where this differs in hockey is that a replacement player at even strength would come from the minor leagues, but a replacement player on the power play might be a regular roster player promoted from the third line, and so establishing an exact count of players in those other roles may prove more difficult.
For this reason, let’s test out what I’m calling the poor man’s replacement: for the statistic in question, set a threshold value, and pool all players under it together as the canonical “replacement” player. Let’s test this with a standard model for faceoff ability. A modified Bradley-Terry model can be cast as a logistic regression; for every faceoff between players A and B, when player A is on the home team, we get the model
log( Pr(Player A wins) / Pr(Player B wins) ) = home_bonus + R_A − R_B
In this case there are two classes of replacement: centers, for whom faceoff skill is expected as part of the job, and non-centers, typically brought in as a second choice after the designated taker is thrown out of the faceoff. [UPDATE, 2015-03-23: The current replacement threshold is a player taking fewer than 50 faceoffs in a season.] We fit this model for the home-ice advantage, every non-replacement player, the replacement center and the replacement non-center, for all 12 seasons in the war-on-ice.com database — 889,733 faceoffs in total as of Saturday, March 21, 2015. The results are fairly consistent across all 12 years for these main factors:
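A sketch of that fit, assuming a data frame `fo` with one row per faceoff, a 0/1 outcome `home_win`, and `home_taker`/`away_taker` columns in which sub-threshold players have already been pooled into “ReplacementC” and “ReplacementNonC” (all names are illustrative):

```r
# Modified Bradley-Terry model as a logistic regression: +1 for the home
# taker, -1 for the away taker; the intercept estimates the home-ice bonus.
takers <- sort(unique(c(fo$home_taker, fo$away_taker)))
X <- matrix(0, nrow(fo), length(takers), dimnames = list(NULL, takers))
X[cbind(seq_len(nrow(fo)), match(fo$home_taker, takers))] <- 1
X[cbind(seq_len(nrow(fo)), match(fo$away_taker, takers))] <- -1
X <- X[, takers != "ReplacementC"]  # pin the replacement center at strength 0
fit <- glm(fo$home_win ~ X, family = binomial())
# Remaining coefficients are each taker's strength R relative to that baseline.
```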
We now have terms for each player in each season that estimate their individual strength over all events. This calibrates the model for the next step: calculating the number of faceoffs that the appropriate replacement player would be expected to win against the same opponents (and home vs. away). For example, an average player would beat Patrice Bergeron only 40% of the time, but would beat a replacement non-center over 60% of the time; a replacement center would fare worse in both matchups (roughly 32% and 53%, respectively). We then calculate Faceoff Wins Above Replacement by subtracting this expected total from each player’s actual win total.
Converting Faceoff Wins to Goals
Now that we have a measure of faceoffs won against replacement, the simplest conversion into goals is to take a direct conversion factor from one to the other. My earlier estimate from a small, biased sample of college hockey gives about 0.015 goals per faceoff won; Schuckers’ group used a similar method to calculate NP20, and their estimate is 1 goal per 76.5 faceoffs won, or 0.013 G/W. We can narrow this estimate by location and man-situation if necessary using similar adjustment factors, though we prefer to treat all faceoffs as fungible so that coach’s usage, or differences in team scoring abilities, do not play a direct role.
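Applying a flat factor is then a one-liner; with a player’s faceoff wins above replacement in `fwar_wins` (an illustrative name):

```r
fwar_wins * 0.013  # goals, using the Schuckers-group estimate (1 G per 76.5 W)
fwar_wins * 0.015  # goals, using the college-hockey estimate
```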
Overall Results
We link below to the season-by-season and total results in their own Google spreadsheets; they’ll be added to the main site after this alpha period. The top producer above replacement is Patrice Bergeron, with Joe Thornton close behind; more interesting to me is that Rod Brind’Amour performed nearly as well in only 7 seasons of our data as the others did in 11 or 12. Brind’Amour also owns the best season on record, with 6.64 goals above replacement in 2005-06. That’s probably driven in part by the low level of replacement quality that year, but it would likely still stand up due to the sheer number of faceoffs he took.
It should be pointed out that we didn’t necessarily need the more elaborate Bradley-Terry model; just using the Poor Man’s Replacement with individual win-loss records and home/away counts yields nearly the same results. But it’s just as important to show that the slightly more complex method will work in new situations without making too many unusual assumptions.
Links
In December, I submitted a paper on ZTTs to the Sloan Sports Analytics Conference; the paper was not selected as a finalist in the research paper competition. A slightly modified version of this paper can be found here. The results and text are unchanged from the December submission, except for minor typos.
Since then, I’ve identified some flaws with this work that I didn’t (have time to) explore in November/December:
Thanks to everyone who gave feedback on earlier versions of this work. Please feel free to share your own feedback with me on Twitter. I hope to revisit this in the summer, or when player tracking data allows us greater precision to evaluate players and teams with ZTTs.
Finally, I’d like to conclude this post with a comment on null results in quantitative research. For those unfamiliar with scientific jargon, the term “null result” is typically associated with completed scientific studies that are “unsuccessful” in proving a hypothesized claim (statistically, studies where there is not enough evidence to reject the null hypothesis).
I’m not sure I would call my findings with ZTT a “null result,” but the results I did find were not as grandiose or game-changing as I had originally hoped they would be; they could best be described as “weak.” I’m sure that others have experienced similar results when trying to further research into hockey analytics and other fields. For those people, I encourage you to publish your null/weak results! They are interesting on their own.
For example, if I found that a team’s ability to quickly transition the puck out of their defensive zone had absolutely no effect on that team’s ability to suppress goals or shots in the future, would you think that was interesting? If I found that a team’s ability to keep the puck in the offensive zone for longer periods of time had no effect on a team’s ability to score in the future, would you think that was interesting? I would. And if I was someone interested in exploring this topic in the future, I’d want to know what previous researchers have found.
Publish all of your results, regardless of how “strong” or “weak” they are. Putting this information out there can only benefit the research community.
Summary: This is an update to our previous post on the best metrics for predicting player performance. Here, we split the analysis out by position (forwards vs. defensemen).
In this analysis, using all data going back to the 2005-06 NHL season:
Glossary: See here.
Data: See here, and add FF% = Fenwick For% = percentage of unblocked shot attempts directed at the opposing goal when a player is on the ice.
Methods: See here, and add FF% to the list of metrics evaluated. (Note: In our original analysis, we also evaluated FF%, but it was found to be worse than CF% and SCF%, so we did not include it in our post.)
Results — Past-vs-Future Correlations:
For forwards, SCF% had the highest past-vs-future correlation with future GF% across all seasons in this analysis. Here are the results (in order of correlation magnitude):
For defensemen, FF% had the highest past-vs-future correlation with future GF% across all seasons (with SCF% finishing a close second) in this analysis. Here are the results (in order of correlation magnitude):
Results — Future | Past Regression Models (all seasons):
For forwards, we used past SCF% and past CF% as explanatory variables in a multiple linear regression model of future GF% across all seasons. The results of this model are summarized here:
Interestingly, the magnitude of the SCF% coefficient increased for forwards, indicating that SCF% is a better predictor of future GF% for forwards than it is for defensemen in this analysis.
Note that we also repeated this analysis using FF% instead of CF%, but FF% was found to have an insignificant effect on future GF% when accounting for SCF% in the model (results not shown). This is very interesting: CF% was found to be significant even after accounting for SCF%, while FF% was not. This may indicate that the additional information included in CF% — blocked shots, for and against — is driving some of the metric’s predictability of future GF%. Our hypothesis is that blocked shots against (i.e. shot attempts taken by the opposition at the player’s goal) are driving the effect here, since forwards do a lot of shot-blocking at the points in the defensive zone.
For defensemen, we first considered past SCF%, past CF%, and past FF% as explanatory variables in a multiple linear regression model of future GF% across all seasons. Since CF% and FF% are highly collinear, we opted to fit two separate two-explanatory-variable regressions, as sketched below, and examine the results: (1) future GF% given past SCF% and past CF%, and (2) future GF% given past SCF% and past FF%.
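A sketch of the two fits, with illustrative column names (`dmen` holding one row per defenseman past-season/future-season pair):

```r
m_cf <- lm(future_GFpct ~ past_SCFpct + past_CFpct, data = dmen)  # model (1)
m_ff <- lm(future_GFpct ~ past_SCFpct + past_FFpct, data = dmen)  # model (2)
summary(m_cf); summary(m_ff)
```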
The results of the first model, which models future GF% given past SCF% and past CF%, are summarized here:
In other words, SCF% has a more substantial effect* on future GF% than does CF% in this analysis.
The results of the second regression model, which models future GF% given past SCF% and past FF%, are summarized here:
Interestingly, SCF% is less predictive of future GF% for defensemen than it is for forwards, and FF% is the superior predictor of future GF% for defensemen by a small margin in this analysis.
Season-to-Season Past-vs-Future Correlation Plot, Forwards:
Season-to-season, the metric with the highest past-vs-future correlation for forwards varies. Recall that across all seasons, SCF% has the highest past-vs-future correlation. This seems to be backed up by the graph, where SCF% is highest by a relatively large margin in 4 of 9 seasons.
Season-to-Season Past-vs-Future Correlation Plot, Defensemen:
Season-to-season, the metric with the highest past-vs-future correlation for defensemen varies quite a bit. Recall, though, that across all seasons, FF% has the highest past-vs-future correlation. This seems to be backed up by the graph, where FF% appears to be a bit more consistent from season to season than other metrics.
Notes:
Needless to say, we won’t be replicating his massive efforts any time soon.
What we do have is access to public data on annual compensation, from past USA Today records and current NHLPA postings, dating across our database from 2002 to the present. After cleaning and matching, we’ve added it to the Goaltender History, Skater History and Skater Comparison apps when individual season data is present. (Goaltender Comparisons will be added soon.)
This has total compensation in salary and bonuses by year; it is not adjusted for inflation or cap share. We see it first as a stopgap, and second as a starting point for deeper discussions about what users want that we can provide.
Since the 2005-06 NHL season, the percentage of Scoring Chances For (SCF%) is a better predictor of future Goals For (GF%) than Corsi For (CF%) is for individual players. (Under specific conditions, of course, but it’s promising either way.)
Combining data across all seasons since 2005-06, the season-to-season correlations are:
Using multiple linear regression and combining across all seasons, we find that:
Glossary:
Our definition of what constitutes a “scoring chance” came about through discussions with people in the community, because it’s a fairly subjective term. It’s clear that the community wants something with more definition than measures based on distance alone (like our three danger zones), something that captures more about both scoring probabilities and in-game opportunities. Here we show that this definition has predictive advantages over other commonly used measures, even in their score-adjusted states.
Data:
Methods:
Using the data described above, we iterated through each season, finding players who played at least 500 minutes in that season and the following season. This gave us a matrix that looks something like this (except, for all players, not just the handful listed here):
Name | past CF% | past GF% | past SCF% | future GF%
---|---|---|---|---
Jake.Muzzin | 61.8 | 57.7 | 60.5 | 51.2
Marc-Edouard.Vlasic | 59.6 | 60.0 | 62.5 | 57.8
Drew.Doughty | 59.3 | 58.5 | 57.0 | 54.4
Justin.Williams | 61.4 | 58.1 | 60.2 | 55.6
Brent.Seabrook | 58.2 | 56.2 | 58.2 | 56.4
Duncan.Keith | 57.9 | 56.7 | 58.2 | 64.7
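One way such a matrix might be assembled, assuming a data frame `skaters` with one row per player-season and illustrative columns Name, season, TOI, CFpct, GFpct and SCFpct:

```r
# Keep regulars (500+ minutes), then pair each season with the next one.
regulars <- subset(skaters, TOI >= 500)
future   <- transform(regulars, season = season - 1)[, c("Name", "season", "GFpct")]
names(future)[3] <- "future_GFpct"
paired <- merge(regulars, future, by = c("Name", "season"))  # past stats + next season's GF%
```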
Then, we did two things:
Results — Past-vs-Future Correlations:
Above is a graph of the past-vs-future correlations of each metric with GF% over time. From this, we make the following observations:
As mentioned in the intro, these results hold when combining data across all seasons since 2005-06, where the season-to-season correlations are:
In other words, combining across all seasons, SCF% is more highly correlated with future GF% than is CF%.
Results — Future | Past Regression Models:
First, we used past SCF% and past CF% as explanatory variables in separate, univariate linear regression models of future GF% (one model for each season and explanatory variable). Not surprisingly, these results were nearly identical to the past-vs-future correlations:
Second, we used both past SCF% and past CF% as explanatory variables in a multiple linear regression model of future GF% across all seasons. The results of this regression are summarized here:
In other words, SCF% has a more substantial effect* on future GF% than does CF%.
*Note that since SCF% and CF% can both be approximated with a Normal(mean = 50%, standard deviation = 10%) distribution — that is, they are on the same scale — we can directly compare the magnitude of their regression coefficients.
Author’s note: One thing we found interesting is that in the across-season regression, both CF% and SCF% had significant coefficients. In these analyses, it’s common for only one independent variable to explain most of the variance in the dependent variable (due to collinearity). The fact that both are significant indicates that CF% and SCF% are accounting for (at least slightly) different parts of the variance in future GF%. Predictive models of player performance would do well to include both metrics.
Future Work:
Appendix:
Below is the R script used for this analysis. Feel free to try it out and add your own analyses.
In the Road to WAR series (which I’m delighted to return to) we’ve been using event rates as a basis for how we model hockey. One of its strongest points is that it works at multiple scales; from a single event in a game to multiple seasons, we can quickly calculate both expected values and variances once we know how the rates should be altered, up or down, so that our comparisons take sample size into account.
The strongest reason I prefer the rates approach? Not only is it a reasonable model for how the game goes, it lets us add different combinations of effects, observations and quirks simultaneously while judging their impact, separately or cumulatively. And we’ve got a few such effects that have been getting people’s attention and that basically go together:
The Schuckers-Macdonald approach takes care of most of this in their linear model, and we couldn’t be happier with their presentation. After hearing and sharing plenty of complaints about the quality of event counts in NHL games (present company included), the authors (who are also our friends) put together a model for capturing how much overcounting or undercounting happens in each arena. Some rinks are more severe than others, and you should read the paper to see their approach. What I personally found most reassuring was that “homer” bias is extremely rare; count bias very rarely favors one team or the other. (For now, we’re not adding “homer” bias to the model, but it’s easily included.)
We wanted to replicate this result for the site while trying it with a slightly different model, partly out of personal preference, partly just to see if things would change under a different specification. (According to Brian, they didn’t, noticeably.) We’re using a Poisson multiplicative linear model where
Expected (Number of events recorded over a time interval) = Time Elapsed * Event Rate
with the base event rate multiplied by each factor. Each row of the data table corresponds to a team’s performance in a single game at a known game state; for example, in the first game of the season, Toronto faced Montreal at home and recorded 7 shots in the first period, at five-on-five, with the score tied. There are many rows for each game corresponding to each observed state; we model the events for explicitly and the events against implicitly (automatically, since each team has its own rows.)
In logarithmic form, everything is additive in the specification of the expected rate:
log(Event Rate) = log(Base Event Rate)
                  + indicator(home/away)
                  + factor(score differential and period of game)
                  + factor(home rink)
                  + factor(the event team "for")
                  + factor(the opposing team "against")
The “for” and “against” terms are to filter out the fact that, shockingly, teams have different underlying ability. We estimate this model for every season from 2005-2006 until the present, separately for blocked, missed and saved shots, and separately for 5v5, power play and 4v4. And since the rink count bias is expected to persist from year to year, we estimate a linear model where one season’s rink bias predicts the next and attenuate each bias by the observed annual correlation. If the estimate from one year was uncorrelated to that from the next, it would be far more likely that this measurement was noise — for blocked and missed shots, the average measured year-to-year correlation was around 0.8, while for saved shots it was closer to 0.5.
(Note: we repeated this procedure by dividing events by “danger zone,” since we expected the event count to be more biased the longer the shot, since we thought this would be where judgments are less than clear. Surprisingly, there was comparable, non-negligible bias at each level.)
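A sketch of a single season’s fit for one event type and strength, assuming a data frame `states` with one row per team-game-state, an `events` count and `seconds` of exposure (all names are illustrative):

```r
# Poisson multiplicative rate model; the offset turns counts into rates.
fit <- glm(events ~ is_home + score_period + rink + team_for + team_against
             + offset(log(seconds / 3600)),
           family = poisson(), data = states)
exp(coef(fit))  # multiplicative factors on the base hourly event rate
```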
While we have included score differential in factorial form before, the note that so-called score effects have a temporal dependence came to my attention from Fangda Li, so we added period-specific effects to the corrections for saved, missed and blocked shots independently. And there’s good reason to believe these effects aren’t necessarily the same. Here’s an extract of the score/period table, averaged over all seasons since 2005-2006 (updated from first posting):
Score Differential | Period | Shot Multiplier | Miss Multiplier | Block Multiplier
---|---|---|---|---
Trail 3+ | 1 | 1.06 | 1.06 | 1.09
Trail 3+ | 2 | 1.15 | 1.11 | 1.20
Trail 3+ | 3 | 1.13 | 1.12 | 1.16
Trail 2 | 1 | 1.02 | 1.00 | 1.11
Trail 2 | 2 | 1.12 | 1.13 | 1.20
Trail 2 | 3 | 1.14 | 1.15 | 1.23
Trail 1 | 1 | 1.03 | 1.02 | 1.09
Trail 1 | 2 | 1.12 | 1.11 | 1.16
Trail 1 | 3 | 1.10 | 1.11 | 1.23
Tied | 1 | 1.03 | 1.01 | 1.04
Tied | 2 | 1.08 | 1.08 | 1.07
Tied | 3 | 0.99 | 1.01 | 1.01
Lead 1 | 1 | 0.94 | 0.94 | 0.94
Lead 1 | 2 | 1.02 | 1.01 | 0.99
Lead 1 | 3 | 0.85 | 0.84 | 0.74
Lead 2 | 1 | 0.90 | 0.89 | 0.90
Lead 2 | 2 | 0.98 | 0.98 | 0.92
Lead 2 | 3 | 0.82 | 0.82 | 0.70
Lead 3+ | 1 | 0.86 | 0.92 | 0.86
Lead 3+ | 2 | 0.93 | 0.94 | 0.85
Lead 3+ | 3 | 0.81 | 0.81 | 0.67
When splitting by period, two things are clear: first, the persistent effects of score are largely independent of period, and not linear in score. Second, there is a pronounced third-period drop-off, but only for teams in the lead.
Implementation
Now that the adjustment factors have been calculated, we can take the McCurdy interpretation and divide the value of each individual event by its “inflation” factor: Corsi events recorded when trailing, at home, or in overcounting buildings count for less than one event relative to the global rate, and those recorded when leading, on the road, or in undercounting buildings count for more. It may feel odd to speak of 0.9 blocked shots, but we already do this when we quote event rates per 60 minutes or in percentage form.
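In code, the deflation step is just a per-event weight; a minimal sketch with illustrative names:

```r
# Each recorded event counts as the inverse of its estimated inflation factor.
events$weight  <- 1 / (events$score_period_mult * events$rink_mult * events$home_mult)
adjusted_corsi <- sum(events$weight[events$team == "TOR"])  # adjusted event count
```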
These adjustments can be done one at a time in the Team Comparison app. We’ll be rolling them out to the other apps as soon as we shake out the initial issues.
Links
Final results and script: all-rink-adjustments.csv.zip
Data, in R’s native format: all-team-seasons-20052006.RData (figure out the other seasons for yourself)