We at www.war-on-ice.com are happy to host Manny’s newest Bombay Ratings App! We continue to encourage others in the hockey research community to follow Manny’s lead and develop public applications that will further the frontiers of research in hockey analytics.
While working on the Similarity Calculator, I stumbled upon a study in which Euclidean distance was used to compare NBA players to Michael Jordan. The author used the distances to generate a list of the most similar players to the man most would agree is the best ever. From this idea, Bombay ratings were just a conceptual hop, skip and jump away. Instead of choosing my own Michael Jordan from a list of historical players, I invented one.
Gordon Bombay (no relation to the legendary coach) played two seasons in the NHL. In his first season, as a forward, he led the league in every single statistical category and eventually led his team to the Stanley Cup. Seeking a challenge, Bombay converted to defence in his second season. Undaunted, he repeated his rookie-season success, once again unmatched by his peers in every single facet of the game. Bombay promptly retired, and no player since 2005 has been able to surpass his accomplishments.
Bombay’s stats at either position are equal to the best recorded values among regular skaters at that position since the 2005-2006 season. Thus, he possesses the best stats we can imagine without stepping outside the boundaries of what real players have been able to accomplish. If you don’t wish to entertain hypotheticals, consider an alternative explanation: The similarity calculation evaluates “distance” between players, each occupying a position in imaginary space. This space has as many dimensions as there are categories by which you choose to compare players, and the limits of each dimension are set by the maximum and minimum recorded values since 2005-2006. Gordon Bombay is simply a marker we’ve decided to place at the positive-most position in space — the position where the positive extrema of each dimension meet. In a three-dimensional plot, this is simply a corner. The Bombay Rating is the similarity between a player and Gordon Bombay.
Hence, we’ve laid the foundation for a method of easily evaluating how “good” a player’s stats are, one that is surprisingly flexible and reasonably effective at producing intuitively pleasing results. The Bombay function essentially does what we all do when we pull up a player’s statistics. The advantage is that it’s more precise, quicker, and returns a single number. Recall that the similarity calculation is a function of the chosen dimensions and corresponding weights. It follows that the Bombay rating is a function of those same variables. While this permits fluidity in what the method can accomplish, it also makes it entirely dependent on the quality of the measures used.
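To make the mechanics concrete, here is a minimal sketch of a Bombay calculation, assuming a player’s stats, the chosen weights, and the per-category extrema among regular skaters are stored as numeric vectors in the same category order (all names here are illustrative, not the app’s actual code):

```r
# Bombay rating: similarity between a player and the per-category maxima.
bombay_rating <- function(stats, weights, max_vals, min_vals) {
  d    <- sqrt(sum(weights * (stats - max_vals)^2))     # distance to "Gordon Bombay"
  dmax <- sqrt(sum(weights * (max_vals - min_vals)^2))  # largest possible distance
  100 * (1 - d / dmax)                                  # similarity, as a percentage
}
```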
The Bombay app I developed uses a variety of 5v5 stats to assign ratings to skaters based on the selected weights and generate charts comparing players to Gordon Bombay in each of the chosen categories.
The outer edge of the chart represents a 100% similarity to Bombay in that measure. This is only achieved if a selected player-season possesses the best recorded value in that metric among regular skaters since 2005. The dashed grey polygon represents another fictional player – one whose stats are all equal to the league average for regular skaters at that position. Note that league average does not signify a 50% similarity. At the default weights, this hypothetical average forward has a Bombay rating near 46 and the defenceman, 45.
I should confess that the default weights are largely arbitrary. I believe the correct weights to use are case-dependent, and I certainly encourage users to assign their own. I’ve found that using the “Defence” preset weights as a starting point to evaluate bottom-six or defensively-oriented forwards often produces more agreeable results. Individual season rankings can be viewed by toggling the “Table” tab and further filtered using the inputs at the bottom of each column. Using preset weights, the names atop the Forward rankings (Ovechkin, Sedin, Jagr, Crosby, Zetterberg, Malkin, Sakic) are who you’d expect; the Defencemen rankings (Visnovsky, Karlsson, Giordano, Byfuglien, Campbell, Niskanen, Weber) match expectations to a much lesser extent. It’s no secret that the evaluation of defencemen, by analytical and traditional methods alike, leaves something to be desired at times. With better measures of defensive ability will come better results from this method.
Bombay ratings can easily be computed using aggregate player stats. You can view career Bombays here. While I wouldn’t necessarily trust default Bombays to provide a single number indicative of player quality over metrics like WAR and GvT, I believe the method has very interesting potential and flexibility. For one, it can easily be expanded as new stats become available. For another, the same method can be applied in other leagues, namely Canadian Major Junior and college leagues.
Eric Tulsky previously found that the back-to-back edge was worth a full percentage point in the second half of back-to-backs, from .912 to .901, using data from the 2011-12 and 2012-13 seasons. Since we now have quality goaltending data at war-on-ice.com from 2005-06 through this season (2014-15), it’s worth a fresh look. Here’s the effective difference by season.
The reputation for tired goalies has apparently been built on the two worst years in our record; in fact, in three other seasons the effective change in save percentage is positive.
Given the additional tools we have at our disposal, let’s break them out and see if they tell us anything new about this. Let’s do it in this sequence using good old logistic regression:
The negative changes in save “percentage” in thousandths for each factor:
Model | Away Goalie | Back-To-Back (Home) | Back-To-Back (Away)
---|---|---|---
1 | 3.3 | 1.3 | 3.1
2 | 1.2 | 2.1 | 3.4
3 | 0.6 | 1.9 | 3.4
4 | 0.1 | 1.8 | 2.9
5 | -0.03 | 2.3 | 3.7
The home advantage on save percentage disappears as we add more factors, while the difference in “tired” performance persists, but only at three and a half points below their usual performance, not 11. I was personally expecting the differences to be bigger, and I was also expecting shot danger to play a bigger role than effectively none. Still, while we still don’t have a good idea whether there’s greater risk of injury, or other unknown factors, we can be confident that coaches aren’t completely nuts if they send their Number One out in back-to-backs.
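For those who want to poke at this themselves, a sketch of the kind of shot-level logistic regression described above might look like the following, with illustrative column names (one row per shot on goal, `saved` as the 0/1 outcome):

```r
# Shot-level model of save probability with goalie-fatigue indicators.
fit <- glm(saved ~ away_goalie + b2b_home + b2b_away + shot_danger,
           family = binomial(), data = shots)
summary(fit)
# Coefficients are on the log-odds scale; evaluate the fitted probabilities at
# the league baseline (~.920) to express each factor in save-percentage points.
```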
Replication materials:
We encourage others interested in the analysis of hockey data to follow Manny’s lead and create interesting apps for www.war-on-ice.com.
The wheels of this project were set in motion when I began toying around with a number of methods for visualizing hockey players’ stats. One idea that made the cut involved plotting all regular skaters since the 2005-2006 season and separating forwards and defensemen by two measures (typically Rel CF% and P/60 at 5v5). I could then show the position of a particular skater on the graph, and more interestingly, generate a list of the skaters closest to that position. These would be the player’s closest statistical comparables according to the two dimensions chosen. Here’s an example of what that looked like:
The method I used to identify the points closest to a given player’s position was simply to take the shortest distances as calculated by the Pythagorean theorem. This method worked fine for two variables, but the real fun begins when you expand to four or more.
In order to generalize the player similarity calculation for n-dimensional space, we need to work in the Euclidean realm. Euclidean space is an abstraction of the physical space we’re familiar with, and is defined by a set of rules. Abiding by these rules allows us to derive a function for “distance” analogous to the one used above. In simple terms, we’re calculating the distance between two points in imaginary space, where the n dimensions are given by the measures by which we’ve chosen to compare players. With help from @xtos__ and @IneffectiveMath, I came up with the following distance function, for players p and q compared across measures 1 through n with weights w:

d(p, q) = sqrt( w_1(p_1 − q_1)^2 + w_2(p_2 − q_2)^2 + … + w_n(p_n − q_n)^2 )
And the Similarity calculation:

Similarity = 1 − d(p, q) / sqrt( w_1(max_1 − min_1)^2 + … + w_n(max_n − min_n)^2 )

In decimal form, Similarity is the distance between the two points in Euclidean n-space divided by the maximum allowable distance for that function, subtracted from one. The expression in the denominator of the Similarity formula is derived from assuming the distance between both points is equal to the difference between the maximum and minimum recorded values for each measure used. The nature of the Similarity equation means that a 98% similarity between players indicates the “distance” between them is 2% of the maximum allowable distance.
To understand how large the maximum distance is, imagine two hypothetical player-seasons. The highest recorded values since 2005 for each measure used belong to the first player-season; the lowest recorded values all belong to the second. The distance between these two players is the maximum allowable distance.
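A minimal sketch of both calculations, assuming players p and q are given as numeric vectors over the chosen measures, with weights w and the recorded extrema per measure (all names are illustrative):

```r
# Similarity between two players in weighted Euclidean n-space.
similarity <- function(p, q, w, max_vals, min_vals) {
  d    <- sqrt(sum(w * (p - q)^2))                # weighted Euclidean distance
  dmax <- sqrt(sum(w * (max_vals - min_vals)^2))  # maximum allowable distance
  1 - d / dmax                                    # 1 = identical, 0 = opposite corners
}
```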
Stylistic similarities between players are not directly taken into account, but can be implicit in the players’ statistics. Contextual factors such as strength of team/teammates and other usage indicators can be included in the similarity calculation, but are given zero weight in the default calculation. In addition, the role played by luck is ignored.
The Statistical Similarity Calculator uses this calculation to return a list of the closest comparables to a given player-season, given some weights assigned to a set of statistical measures. It should be noted that the app will never return a player-season belonging to the chosen player, except, of course, in the top row, which is shown for comparison’s sake.
Under “Summary,” you will find a second table displaying the chosen player’s stats, the average stats for the n closest comparables, and the difference between them.
This tool can be used to compare the deployment and usage between players who achieved similar production, or the difference between a player’s possession stats and those of others who played in similar situations. You may also find use in evaluating the average salary earned by players who statistically resemble another. I’ll continue to look for new ways to use this tool, and I hope you will as well.
** Many thanks to Andrew, Sam, and Alexandra of WAR On Ice for their help, their data, and their willingness to host the app on their site. **
How valuable is a penalty drawn or taken to a team? In goals, the marginal effect is clear: you get up to 2 minutes during which your rate of goals for goes up and your rate of goals against goes down. And if it’s your best penalty killers who are penalized, they don’t get to help clean up the mess they’ve made in the process.
The secondary effects are less clear. For example, what changes in terms of a team’s future effort when a player takes an ill-advised penalty? We’re not in a position to answer this when it comes to the share of responsibility to the penalty taker; we can only assess a team’s performance during those times.
And so, for the time being we’re left with assigning credit and blame to the penalty taker and drawer in terms of an expected-goals measure. To get goals above replacement, we need to know the rate at which a replacement player at each position would take or draw penalties — aside from misconducts and matching fighting majors — and we do this with the same approach we used for faceoffs, the Poor Man’s Replacement method:
The results for the 10 seasons since 2005 are below. Note that we do not have penalties-drawn data for the 2005-06 and 2006-07 seasons.
The “replacement” rate for taking penalties for forwards and defensemen is higher than the league average. When it comes to penalties drawn, forwards draw penalties at a greater rate than defensemen, which is to be expected on scoring plays; replacement rate at each position is roughly the same as the league average otherwise. This suggests that if drawing penalties is a skill, it’s exceptionally rare, whereas general discipline to avoid taking penalties is clearly a behaviour seen in full-time players.
Now it’s simple enough to get the number of penalties drawn and taken by replacement players at each position, and subtract this from their actual results. The final table is available in full here.
We convert to goals with an approximation: a team on the powerplay scores at a clip of roughly 6.5 goals/60 and allows 0.78 shorthanded goals/60. We measure each of those rates against a 5v5 baseline of 2.5 goals per 60 minutes, assume that 20 percent of powerplays end early in a goal, for an average of 1.8 minutes of PP time per penalty, and reach an average figure of 0.17 net goals per penalty taken or drawn. For now we use the relation that 6 goals equal one win.
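As a back-of-the-envelope check of that 0.17 figure, using only the rates quoted above:

```r
pp_for     <- 6.5   # powerplay goals for per 60
pp_against <- 0.78  # shorthanded goals allowed per 60
base_5v5   <- 2.5   # baseline 5v5 goals per 60, each direction
pp_length  <- 1.8   # average minutes of PP time per penalty
net_per60  <- (pp_for - base_5v5) + (base_5v5 - pp_against)  # net swing: 5.72
net_per60 / 60 * pp_length  # ~0.17 net goals per penalty taken or drawn
```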
The champion in total penalty WAR over the last 10 seasons is Dustin Brown, and it’s not even close: 8.47 wins above replacement in that time. Per 60 minutes, though, he’s the third-ranked player in the top 50 over that span; Nazem Kadri and Darren Helm take the 1 and 2 spots.
The special prize here goes to Patrick Kaleta of the Buffalo Sabres, whose penalties-drawn rate is well above average and the best among the top 200. We knew this about him already, but it helps his case that his penalties-taken rate isn’t as bad as a replacement player’s, which gives him an extra boost.
Links:
In the next week we’ll be releasing our proposed three main elements from which we can derive WAR using the data we have, in what we feel is the ascending order of importance: faceoffs, shooting/goaltending success, and shot attempt rates.
For each process, the pathway we’re laying out to establish value sounds straightforward:
We’ve been talking about parts 1, 3 and 4 in previous entries in this series, and we will continue to do so in the parts to come. But we need to establish what “replacement” means, because there are two important qualities we need to factor in.
First, there’s the standard definition: a level of performance against which we judge everyone else, under the assumption that it’s the level of skill that a team could purchase at the league minimum price. This is fairly clear-cut in most examples in, say, baseball: for every position, there’s a different baseline expected level of performance, and the average can be calculated at each position by that standard; replacement level can then be calculated relative to the average. A shortstop that hits 20 home runs in a season is more valuable than a first baseman with the same numbers, because “replacement-level” shortstops will tend to have less power.
But a benchmark for performance isn’t sufficient here. When we measure team achievement, we simultaneously adjust for the strengths of their opponents to get a more precise estimate. To do the same thing for player-player interactions, we have to adjust for player strengths, but since estimates for replacement players are inherently unstable — there’s so little data on each player, almost by definition — it helps us even more to have a single standard for each type of replacement player to ensure that our adjustments are accurate.
One standard definition that I like for replacement players is based on the total number of regular players in the league, like in the Baumer-Jensen-Matthews OpenWAR method: for 30 teams with 25 regular players, any player beyond the original 750 can be considered “replacement”. This makes sense if players have only one or two roles, like fielding and batting. Where this differs in hockey is that a replacement player at even strength would come from the minor leagues, but a replacement player on the power play might be a regular roster player promoted from the third line, and so establishing an exact count of players in those other roles may prove more difficult.
For this reason, let’s test out what I’m calling the poor man’s replacement: for the statistic in question, set a threshold value, and pool all players under it together as the canonical “replacement” player. Let’s test this with a standard model for faceoff ability. A modified Bradley-Terry model can be cast as a logistic regression; for every faceoff between players A and B, when player A is on the home team, we get the model
log( Pr(Player A wins) / Pr(Player B wins) ) = home_bonus + R_A − R_B
In this case there are two classes of replacement: centers, for whom faceoff skill is expected as part of the job, and non-centers, typically brought in as a second choice after the designated taker is thrown out of the faceoff. [UPDATE, 2015-03-23: The current replacement threshold is a player taking fewer than 50 faceoffs in a season.] We fit this model for the home-ice advantage, every non-replacement player, the replacement center and the replacement non-center, for all 12 seasons in the war-on-ice.com database — 889,733 faceoffs in total as of Saturday, March 21, 2015. The results are fairly consistent across all 12 years for these main factors:
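A sketch of that fit, assuming a data frame `fo` with one row per faceoff, a 0/1 outcome `home_win`, and `home_taker`/`away_taker` columns in which sub-threshold players have already been pooled into “ReplacementC” and “ReplacementNonC” (all names are illustrative):

```r
# Modified Bradley-Terry model as a logistic regression: +1 for the home
# taker, -1 for the away taker; the intercept estimates the home-ice bonus.
takers <- sort(unique(c(fo$home_taker, fo$away_taker)))
X <- matrix(0, nrow(fo), length(takers), dimnames = list(NULL, takers))
X[cbind(seq_len(nrow(fo)), match(fo$home_taker, takers))] <- 1
X[cbind(seq_len(nrow(fo)), match(fo$away_taker, takers))] <- -1
X <- X[, takers != "ReplacementC"]  # pin the replacement center at strength 0
fit <- glm(fo$home_win ~ X, family = binomial())
# Remaining coefficients are each taker's strength R relative to that baseline.
```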
We now have terms for each player in each season that estimate their individual strength over all events. This calibrates the model for the next step: calculating the number of faceoffs that the appropriate replacement player would be expected to win against the same opponents (and home vs. away). For example, an average player would beat Patrice Bergeron only 40% of the time, but would beat a replacement non-center over 60% of the time; a replacement center would fare worse in both matchups (roughly 32% and 53%, respectively). We then calculate Faceoff Wins Above Replacement by subtracting this expected total from each player’s actual win total.
Converting Faceoff Wins to Goals
Now that we have a measure of faceoffs won against replacement, the simplest conversion into goals is to take a direct conversion factor from one to the other. My earlier estimate from a small, biased sample of college hockey gives about 0.015 goals per faceoff won; Schuckers’ group used a similar method to calculate NP20, and their estimate is 1 goal per 76.5 faceoffs won, or 0.013 G/W. We can narrow this estimate by location and man-situation if necessary using similar adjustment factors, though we prefer to treat all faceoffs as fungible so that coach’s usage, or differences in team scoring abilities, do not play a direct role.
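Applying a flat factor is then a one-liner; with a player’s faceoff wins above replacement in `fwar_wins` (an illustrative name):

```r
fwar_wins * 0.013  # goals, using the Schuckers-group estimate (1 G per 76.5 W)
fwar_wins * 0.015  # goals, using the college-hockey estimate
```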
Overall Results
We link below to the season-by-season and total results in their own Google spreadsheets; they’ll be added to the main site after this alpha period. The top producer above replacement is Patrice Bergeron, with Joe Thornton close behind; more interesting to me is that Rod Brind’Amour performed nearly as well in only 7 seasons of our data as the others did in 11 or 12. Brind’Amour also owns the best season on record, with 6.64 goals above replacement in 2005-06. That’s probably driven in part by the low level of replacement quality that year, but it would likely still stand up due to the sheer number of faceoffs he took.
It should be pointed out that we didn’t necessarily need the more elaborate Bradley-Terry model; just using the Poor Man’s Replacement with individual win-loss records and home/away counts yields nearly the same results. But it’s just as important to show that the slightly more complex method will work in new situations without making too many unusual assumptions.
Links
In December, I submitted a paper on ZTTs to the Sloan Sports Analytics Conference; the paper was not selected as a finalist in the research paper competition. A slightly modified version of this paper can be found here. The results and text are unchanged from the December submission, except for minor typos.
Since then, I’ve identified some flaws with this work that I didn’t (have time to) explore in November/December:
Thanks to everyone who gave feedback on earlier versions of this work. Please feel free to share your own feedback with me on Twitter. I hope to revisit this in the summer, or when player tracking data allows us greater precision to evaluate players and teams with ZTTs.
Finally, I’d like to conclude this post with a comment on null results in quantitative research. For those unfamiliar with scientific jargon, the term “null result” is typically associated with completed scientific studies that are “unsuccessful” in proving a hypothesized claim (statistically, studies where there is not enough evidence to reject the null hypothesis).
I’m not sure I would call my findings with ZTT a “null result,” but the results I did find were not as grandiose or game-changing as I had originally hoped they would be; they could best be described as “weak.” I’m sure that others have experienced similar results when trying to further research into hockey analytics and other fields. For those people, I encourage you to publish your null/weak results! They are interesting on their own.
For example, if I found that a team’s ability to quickly transition the puck out of their defensive zone had absolutely no effect on that team’s ability to suppress goals or shots in the future, would you think that was interesting? If I found that a team’s ability to keep the puck in the offensive zone for longer periods of time had no effect on a team’s ability to score in the future, would you think that was interesting? I would. And if I was someone interested in exploring this topic in the future, I’d want to know what previous researchers have found.
Publish all of your results, regardless of how “strong” or “weak” they are. Putting this information out there can only benefit the research community.
Summary: This is an update to our previous post on the best metrics for predicting player performance. Here, we split the analysis out by position (forwards vs. defensemen).
In this analysis, using all data going back to the 2005-06 NHL season:
Glossary: See here.
Data: See here, and add FF% = Fenwick For% = percentage of unblocked shot attempts directed at the opposing goal when a player is on the ice.
Methods: See here, and add FF% to the list of metrics evaluated. (Note: In our original analysis, we also evaluated FF%, but it was found to be worse than CF% and SCF%, so we did not include it in our post.)
Results — Past-vs-Future Correlations:
For forwards, SCF% had the highest past-vs-future correlation with future GF% across all seasons in this analysis. Here are the results (in order of correlation magnitude):
For defensemen, FF% had the highest past-vs-future correlation with future GF% across all seasons (with SCF% finishing a close second) in this analysis. Here are the results (in order of correlation magnitude):
Results — Future | Past Regression Models (all seasons):
For forwards, we used past SCF% and past CF% as explanatory variables in a multiple linear regression model of future GF% across all seasons. The results of this model are summarized here:
Interestingly, the magnitude of the SCF% coefficient increased for forwards, indicating that SCF% is a better predictor of future GF% for forwards than it is for defensemen in this analysis.
Note that we also repeated this analysis using FF% instead of CF%, but FF% was found to have an insignificant effect on future GF% when accounting for SCF% in the model (results not shown). This is very interesting: CF% was found to be significant even after accounting for SCF%, while FF% was not. This may indicate that the additional information included in CF% — blocked shots, for and against — is driving some of the metric’s predictability of future GF%. Our hypothesis is that blocked shots against (i.e. shot attempts taken by the opposition at the player’s goal) are driving the effect here, since forwards do a lot of shot-blocking at the points in the defensive zone.
For defensemen, we first considered past SCF%, past CF%, and past FF% as explanatory variables in a multiple linear regression model of future GF% across all seasons. Since CF% and FF% are highly collinear, we opted to fit two separate two-explanatory-variable regressions, as sketched below, and examine the results: (1) future GF% given past SCF% and past CF%, and (2) future GF% given past SCF% and past FF%.
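A sketch of the two fits, with illustrative column names (`dmen` holding one row per defenseman past-season/future-season pair):

```r
m_cf <- lm(future_GFpct ~ past_SCFpct + past_CFpct, data = dmen)  # model (1)
m_ff <- lm(future_GFpct ~ past_SCFpct + past_FFpct, data = dmen)  # model (2)
summary(m_cf); summary(m_ff)
```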
The results of the first model, which models future GF% given past SCF% and past CF%, are summarized here:
In other words, SCF% has a more substantial effect* on future GF% than does CF% in this analysis.
The results of the second regression model, which models future GF% given past SCF% and past FF%, are summarized here:
Interestingly, SCF% is less predictive of future GF% for defensemen than it is for forwards, and FF% is the superior predictor of future GF% for defensemen by a small margin in this analysis.
Season-to-Season Past-vs-Future Correlation Plot, Forwards:
Season-to-season, the metric with the highest past-vs-future correlation for forwards varies. Recall that across all seasons, SCF% has the highest past-vs-future correlation. This seems to be backed up by the graph, where SCF% is highest by a relatively large margin in 4 of 9 seasons.
Season-to-Season Past-vs-Future Correlation Plot, Defensemen:
Season-to-season, the metric with the highest past-vs-future correlation for defensemen varies quite a bit. Recall, though, that across all seasons, FF% has the highest past-vs-future correlation. This seems to be backed up by the graph, where FF% appears to be a bit more consistent from season to season than other metrics.
Notes:
Needless to say, we won’t be replicating his massive efforts any time soon.
What we do have is access to public data on annual compensation, from past USA Today records and current NHLPA postings, dating across our database from 2002 to the present. After cleaning and matching, we’ve added it to the Goaltender History, Skater History and Skater Comparison apps when individual season data is present. (Goaltender Comparisons will be added soon.)
This has total compensation in salary and bonuses by year; it is not adjusted for inflation or cap share. We see it first as a stopgap, and second as a starting point for deeper discussions about what users want that we can provide.
Since the 2005-06 NHL season, the percentage of Scoring Chances For (SCF%) is a better predictor of future Goals For (GF%) than Corsi For (CF%) is for individual players. (Under specific conditions, of course, but it’s promising either way.)
Combining data across all seasons since 2005-06, the season-to-season correlations are:
Using multiple linear regression and combining across all seasons, we find that:
Glossary:
Our definition of what constitutes a “scoring chance” came about through discussions with people in the community, because it’s a fairly subjective term. It’s clear that the community wants something with more definition than measures based on distance alone (like our three danger zones), something that captures more about both scoring probabilities and in-game opportunities. Here we show that this definition has predictive advantages over other commonly used measures, even in their score-adjusted states.
Data:
Methods:
Using the data described above, we iterated through each season, finding players who played at least 500 minutes in that season and the following season. This gave us a matrix that looks something like this (except, for all players, not just the handful listed here):
Name | past CF% | past GF% | past SCF% | future GF%
---|---|---|---|---
Jake.Muzzin | 61.8 | 57.7 | 60.5 | 51.2
Marc-Edouard.Vlasic | 59.6 | 60.0 | 62.5 | 57.8
Drew.Doughty | 59.3 | 58.5 | 57.0 | 54.4
Justin.Williams | 61.4 | 58.1 | 60.2 | 55.6
Brent.Seabrook | 58.2 | 56.2 | 58.2 | 56.4
Duncan.Keith | 57.9 | 56.7 | 58.2 | 64.7
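One way such a matrix might be assembled, assuming a data frame `skaters` with one row per player-season and illustrative columns Name, season, TOI, CFpct, GFpct and SCFpct:

```r
# Keep regulars (500+ minutes), then pair each season with the next one.
regulars <- subset(skaters, TOI >= 500)
future   <- transform(regulars, season = season - 1)[, c("Name", "season", "GFpct")]
names(future)[3] <- "future_GFpct"
paired <- merge(regulars, future, by = c("Name", "season"))  # past stats + next season's GF%
```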
Then, we did two things:
Results — Past-vs-Future Correlations:
Above is a graph of the past-vs-future correlations of each metric with GF% over time. From this, we make the following observations:
As mentioned in the intro, these results hold when combining data across all seasons since 2005-06, where the season-to-season correlations are:
In other words, combining across all seasons, SCF% is more highly correlated with future GF% than is CF%.
Results — Future | Past Regression Models:
First, we used past SCF% and past CF% as explanatory variables in separate, univariate linear regression models of future GF% (one model for each season and explanatory variable). Not surprisingly, these results were nearly identical to the past-vs-future correlations:
Second, we used both past SCF% and past CF% as explanatory variables in a multiple linear regression model of future GF% across all seasons. The results of this regression are summarized here:
In other words, SCF% has a more substantial effect* on future GF% than does CF%.
*Note that since SCF% and CF% can both be approximated with a Normal(mean = 50%, standard deviation = 10%) distribution — that is, they are on the same scale — we can directly compare the magnitude of their regression coefficients.
Author’s note: One thing we found interesting is that in the across-season regression, both CF% and SCF% had significant coefficients. In these analyses, it’s common for only one independent variable to explain most of the variance in the dependent variable (due to collinearity). The fact that both are significant indicates that CF% and SCF% are accounting for (at least slightly) different parts of the variance in future GF%. Predictive models of player performance would do well to include both metrics.
Future Work:
Appendix:
Below is the R script used for this analysis. Feel free to try it out and add your own analyses.
In the Road to WAR series (which I’m delighted to return to) we’ve been using event rates as a basis for how we model hockey. One of its strongest points is that it works at multiple scales; from a single event in a game to multiple seasons, we can quickly calculate both expected values and variances once we know how the rates should be altered, up or down, so that our comparisons take sample size into account.
The strongest reason I prefer the rates approach? Not only is it a reasonable model for how the game goes, it lets us add different combinations of effects, observations and quirks simultaneously while judging their impact, separately or cumulatively. And we’ve got a few such effects that have been getting people’s attention and that basically go together:
The Schuckers-Macdonald approach takes care of most of this in their linear model, and we couldn’t be happier with their presentation. After hearing and sharing plenty of complaints about the quality of event counts in NHL games (present company included), the authors (who are also our friends) put together a model for capturing how much overcounting or undercounting happens in each arena. Some rinks are more severe than others, and you should read the paper to see their approach. What I personally found most reassuring was that “homer” bias is extremely rare; count bias very rarely favors one team or the other. (For now, we’re not adding “homer” bias to the model, but it’s easily included.)
We wanted to replicate this result for the site while trying it with a slightly different model, partly out of personal preference, partly just to see if things would change under a different specification. (According to Brian, they didn’t, noticeably.) We’re using a Poisson multiplicative linear model where
Expected (Number of events recorded over a time interval) = Time Elapsed * Event Rate
with the base event rate multiplied by each factor. Each row of the data table corresponds to a team’s performance in a single game at a known game state; for example, in the first game of the season, Toronto faced Montreal at home and recorded 7 shots in the first period, at five-on-five, with the score tied. There are many rows for each game corresponding to each observed state; we model the events for explicitly and the events against implicitly (automatically, since each team has its own rows.)
In logarithmic form, everything is additive in the specification of the expected rate:
log(Event Rate) = log(Base Event Rate)
                  + indicator(home/away)
                  + factor(score differential and period of game)
                  + factor(home rink)
                  + factor(the event team "for")
                  + factor(the opposing team "against")
The “for” and “against” terms are to filter out the fact that, shockingly, teams have different underlying ability. We estimate this model for every season from 2005-2006 until the present, separately for blocked, missed and saved shots, and separately for 5v5, power play and 4v4. And since the rink count bias is expected to persist from year to year, we estimate a linear model where one season’s rink bias predicts the next and attenuate each bias by the observed annual correlation. If the estimate from one year was uncorrelated to that from the next, it would be far more likely that this measurement was noise — for blocked and missed shots, the average measured year-to-year correlation was around 0.8, while for saved shots it was closer to 0.5.
(Note: we repeated this procedure by dividing events by “danger zone,” since we expected the event count to be more biased the longer the shot, since we thought this would be where judgments are less than clear. Surprisingly, there was comparable, non-negligible bias at each level.)
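A sketch of a single season’s fit for one event type and strength, assuming a data frame `states` with one row per team-game-state, an `events` count and `seconds` of exposure (all names are illustrative):

```r
# Poisson multiplicative rate model; the offset turns counts into rates.
fit <- glm(events ~ is_home + score_period + rink + team_for + team_against
             + offset(log(seconds / 3600)),
           family = poisson(), data = states)
exp(coef(fit))  # multiplicative factors on the base hourly event rate
```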
While we have included score differential in factorial form before, the note that so-called score effects have a temporal dependence came to my attention from Fangda Li, so we added period-specific effects to the corrections for saved, missed and blocked shots independently. And there’s good reason to believe these effects aren’t necessarily the same. Here’s an extract of the score/period table, averaged over all seasons since 2005-2006 (updated from first posting):
Score Differential | Period | Shot Multiplier | Miss Multiplier | Block Multiplier
---|---|---|---|---
Trail 3+ | 1 | 1.06 | 1.06 | 1.09
Trail 3+ | 2 | 1.15 | 1.11 | 1.20
Trail 3+ | 3 | 1.13 | 1.12 | 1.16
Trail 2 | 1 | 1.02 | 1.00 | 1.11
Trail 2 | 2 | 1.12 | 1.13 | 1.20
Trail 2 | 3 | 1.14 | 1.15 | 1.23
Trail 1 | 1 | 1.03 | 1.02 | 1.09
Trail 1 | 2 | 1.12 | 1.11 | 1.16
Trail 1 | 3 | 1.10 | 1.11 | 1.23
Tied | 1 | 1.03 | 1.01 | 1.04
Tied | 2 | 1.08 | 1.08 | 1.07
Tied | 3 | 0.99 | 1.01 | 1.01
Lead 1 | 1 | 0.94 | 0.94 | 0.94
Lead 1 | 2 | 1.02 | 1.01 | 0.99
Lead 1 | 3 | 0.85 | 0.84 | 0.74
Lead 2 | 1 | 0.90 | 0.89 | 0.90
Lead 2 | 2 | 0.98 | 0.98 | 0.92
Lead 2 | 3 | 0.82 | 0.82 | 0.70
Lead 3+ | 1 | 0.86 | 0.92 | 0.86
Lead 3+ | 2 | 0.93 | 0.94 | 0.85
Lead 3+ | 3 | 0.81 | 0.81 | 0.67
When splitting by period, two things are clear: first, the persistent effects of score are largely independent of period, and not linear in score. Second, there is a pronounced third-period drop-off, but only for teams in the lead.
Implementation
Now that the adjustment factors have been calculated, we can take the McCurdy interpretation and divide the value of each individual event by its “inflation” factor: Corsi events recorded when trailing, at home, or in overcounting buildings count for less than one event relative to the global rate, and those recorded when leading, on the road, or in undercounting buildings count for more. It may feel odd to speak of 0.9 blocked shots, but we already do this when we quote event rates per 60 minutes or in percentage form.
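In code, the deflation step is just a per-event weight; a minimal sketch with illustrative names:

```r
# Each recorded event counts as the inverse of its estimated inflation factor.
events$weight  <- 1 / (events$score_period_mult * events$rink_mult * events$home_mult)
adjusted_corsi <- sum(events$weight[events$team == "TOR"])  # adjusted event count
```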
These adjustments can be done one at a time in the Team Comparison app. We’ll be rolling them out to the other apps as soon as we shake out the initial issues.
Links
Final results and script: all-rink-adjustments.csv.zip
Data, in R’s native format: all-team-seasons-20052006.RData (figure out the other seasons for yourself)