The following is as complete a list as we have of the terms used on the site (or links to the relevant posts).
WAR/GAR: The whole series is linked here.
Time on ice:
General Shot-Based Event Counts: The bread and butter of modern hockey analysis.
Goal Events (G): Shots that are not saved and cross the goal line. The ceremony consists of a bright red light of shame, a stripe-fashioned league employee exaggeratedly drawing attention to the goaltender’s failure to do their job, and a small party at center ice to begin the process anew.
Shots On Goal (S): Shot attempts that are somehow the goaltender’s responsibility. Includes goals, as above, and saved shots, which comprise the bulk of a goaltender’s productive output.
Take the above quantities and replace “G” with “S” like so:
New ones:
Fenwick (F)/Unblocked Shot Attempts (USAT): Take shots on goal and add shots that miss the net entirely. Named for its inventor, Battle of Alberta author Matt Fenwick, not a fictional Duchy that changed the course of the fictional world.
Take the above quantities and replace “G” with “F” like so:
New ones:
Corsi (C)/Shot Attempts (SAT): Corsi events consist of all shot attempts: blocked shots, missed shots, saves and goals. Should probably just be called “shots”, so we’ll do that here. Named for goaltending coach and mustache champion Jim Corsi; the metric itself was coined by Tim Barnes (aka Vic Ferrari).
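The four event classes nest inside one another: every goal is a shot on goal, every shot on goal is unblocked, and every unblocked attempt is a shot attempt. A minimal sketch of that nesting, assuming each attempt has already been labelled with exactly one hypothetical outcome string ("goal", "save", "miss" or "block"):

```python
from collections import Counter

# Hypothetical outcome labels; each shot attempt carries exactly one.
GOAL, SAVE, MISS, BLOCK = "goal", "save", "miss", "block"

def shot_counts(outcomes):
    """Return the nested counts G <= S <= F <= C for a list of attempt outcomes."""
    n = Counter(outcomes)
    goals = n[GOAL]
    shots = goals + n[SAVE]      # shots on goal: goals plus saved shots
    fenwick = shots + n[MISS]    # unblocked attempts: add missed shots
    corsi = fenwick + n[BLOCK]   # all attempts: add blocked shots
    return {"G": goals, "S": shots, "F": fenwick, "C": corsi}

print(shot_counts(["save", "miss", "goal", "block", "save", "block"]))
```

Building each count on top of the previous one makes the nesting explicit, so the counts can never come out of order.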
Take the above quantities and replace “G” with “C” like so:
New ones:
Location Data:
The NHL has (x,y) location data available for all shots on goal, as well as hits and penalties. ESPN and Sportsnet both also have location data for missed shots and for blocked shots, although for blocked shots it records where the shot was blocked, not where it was taken (both are useful).
This location data has systematic bias from rink to rink as well as random measurement error. We can’t do much for the random error, but to correct the bias in shot location data, we use the basic method proposed by Schuckers and Curro (Appendix A):
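As we understand it, the Schuckers and Curro correction is a CDF (quantile) match: each shot distance recorded in a given rink is replaced by the reference distance at the same empirical quantile, so every rink's distribution lines up with the reference distribution. The sketch below assumes that reading, works on a single variable (distance), and uses a hypothetical function name. Their appendix may differ in its details. Choosing the reference sample (for example, visiting-team shots in all other rinks, so home-team quality does not leak into the correction) is left to the caller and is also an assumption.

```python
import bisect

def debias(rink_vals, reference_vals):
    """Map each rink measurement to the reference value at the same empirical quantile."""
    rink_sorted = sorted(rink_vals)
    ref_sorted = sorted(reference_vals)
    n, m = len(rink_sorted), len(ref_sorted)
    out = []
    for d in rink_vals:
        k = bisect.bisect_right(rink_sorted, d)  # rink ECDF at d is k / n
        idx = -((-k * m) // n) - 1               # ceil(k * m / n) - 1, in integer arithmetic
        out.append(ref_sorted[idx])
    return out

# A rink that records every shot about 2 ft too far out:
print(debias([12, 22, 32], [10, 20, 30]))
```

Integer arithmetic for the quantile index avoids floating-point rounding pushing a value into the neighbouring quantile.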
Having a de-biased measure for shot location is essential for any measures that are going to compare from building to building. Speaking of:
Danger Zones And Types:
There are all sorts of mechanisms for judging the relative worth of a shot given its (x,y) coordinates and other information. Schuckers has a comprehensive method for evaluating expected goals called DIGR; for our purposes, we simplified the available data into three main features:
A shot’s Danger is then defined by this method:
For goaltenders, we then have
Scoring Chances (SC):
All shot attempts that have danger 2 or greater. As originally described here.
Take the above quantities and replace “G” with “SC” like so:
High-Danger Scoring Chances (HSC):
All shot attempts that have danger 3 or greater. Take the above quantities and replace “G” with “HSC” like so:
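Both definitions are thresholds on the danger level, so counting them is straightforward once each attempt has a danger value. This sketch assumes danger is already coded as an integer from 1 (low) to 3 (high). It does not reproduce the danger calculation itself. The hypothetical share() helper shows a for/against percentage, e.g. SCF% = SCF / (SCF + SCA).

```python
def chance_counts(dangers):
    """Count attempts, scoring chances and high-danger chances from danger levels (1-3)."""
    sc = sum(1 for d in dangers if d >= 2)   # scoring chances: danger 2 or greater
    hsc = sum(1 for d in dangers if d >= 3)  # high-danger chances: danger 3 or greater
    return {"C": len(dangers), "SC": sc, "HSC": hsc}

def share(for_count, against_count):
    """Fraction of events in the team's favour, e.g. SCF / (SCF + SCA)."""
    total = for_count + against_count
    return for_count / total if total else float("nan")

team = chance_counts([1, 2, 3, 3, 1, 2])
opp = chance_counts([1, 1, 2, 3])
print(team, round(share(team["SC"], opp["SC"]), 3))
```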
Adjusted Save Percentage (AdSv%):
Defined here, it is the weighting of a goaltender’s save percentage in each danger level by the fraction of shots that would be expected from the league-wide distribution.
Score Situations:
Score effects are the acknowledged differences in team performance that depend on the score differential. Several popular methods are used to account for the score in whatever results are presented; we host two. The first, Score Close, was pioneered by Tore Purdy (aka JLikens) and simply includes situations where teams are within one goal of each other in the first and second periods, or tied in the third period and beyond.
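The Score Close filter reduces to a simple predicate. This sketch assumes periods are numbered 1 to 3, with overtime as 4 and above, and treats the third period and overtime alike.

```python
def is_score_close(period, home_goals, away_goals):
    """Score Close: within one goal in periods 1-2, tied in period 3 and overtime."""
    diff = abs(home_goals - away_goals)
    if period <= 2:
        return diff <= 1
    return diff == 0

print(is_score_close(2, 3, 2), is_score_close(3, 3, 2), is_score_close(4, 1, 1))
```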
The second, manual score adjustment, has a few different predecessors:
Not surprisingly, we went with our own adjustment: a full Poisson model for score, period and rink effects for each shot type in each danger zone. Adding the rink bias correction to our score and period correction was inspired by Schuckers and Macdonald.
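The full model fits score, period and rink effects jointly, which requires iterative Poisson regression. With score state as the only factor, though, the Poisson maximum-likelihood rate for each state is simply events divided by minutes played, which makes the idea easy to sketch. Everything below is a hypothetical illustration, not our fitted model: the state names, the choice of the tied state as the reference, and the numbers.

```python
def score_weights(events, minutes, reference="tied"):
    """
    One-factor Poisson model: events[s] ~ Poisson(rate[s] * minutes[s]).
    The maximum-likelihood rate for each score state is events / minutes.
    Returns multipliers that rescale each state's events to the reference rate.
    """
    rates = {s: events[s] / minutes[s] for s in events}
    return {s: rates[reference] / rates[s] for s in rates}

def adjusted_count(counts, weights):
    """Score-adjusted total: each state's count times that state's multiplier."""
    return sum(counts[s] * weights[s] for s in counts)

# Hypothetical league-wide shot attempts and minutes, by the shooting team's score state.
league_events = {"leading": 900, "tied": 1000, "trailing": 1200}
league_minutes = {"leading": 1000, "tied": 1000, "trailing": 1000}
weights = score_weights(league_events, league_minutes)
print(weights)
print(adjusted_count({"leading": 9, "tied": 10, "trailing": 12}, weights))
```

Attempts taken while leading are scaled up and attempts taken while trailing are scaled down, so a team that spends a lot of time trailing does not look better than it is.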
Charts:
Bubble charts: The main look and design of the bubble charts, with four variables displayed simultaneously, comes from Rob Vollman’s Player Usage Charts, including the starting variables: x-axis for zone starts, y-axis for quality of competition, color for Relative Corsi. Our expansions and additions include every variable we have at our disposal, including different game and score states.
Hextally: Directly inspired by Kirk Goldsberry’s NBA Shot Charts for Grantland and 538 (but perhaps the lack of a player with an appropriate name in the NBA made it difficult.) Expanded for both shot success probabilities (standard for basketball) and the rate of shots taken from each area of the ice (not standard for basketball, even if it were played on the ice.)
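A Hextally bins shot locations into hexagons. The sketch below shows that binning step using the standard cube-rounding method for pointy-top hexagons, then converts counts into rates per 60 minutes. The hexagon size, orientation and coordinate origin are assumptions, not the site's settings, and the function names are hypothetical.

```python
import math
from collections import Counter

def hex_round(q, r):
    """Round fractional axial coordinates to the nearest hexagon (cube rounding)."""
    x, z = q, r
    y = -x - z
    rx, ry, rz = round(x), round(y), round(z)
    dx, dy, dz = abs(rx - x), abs(ry - y), abs(rz - z)
    # Reset the coordinate with the largest rounding error so that x + y + z == 0.
    if dx > dy and dx > dz:
        rx = -ry - rz
    elif dy > dz:
        ry = -rx - rz
    else:
        rz = -rx - ry
    return rx, rz

def hex_of(x, y, size):
    """Axial (q, r) of the pointy-top hexagon with circumradius `size` containing (x, y)."""
    q = (math.sqrt(3) / 3 * x - y / 3) / size
    r = (2 / 3 * y) / size
    return hex_round(q, r)

def hex_rates(shots, size, toi_minutes):
    """Shots per 60 minutes in each hexagon, from (x, y) locations and time on ice."""
    counts = Counter(hex_of(x, y, size) for x, y in shots)
    return {h: 60 * n / toi_minutes for h, n in counts.items()}

print(hex_rates([(0, 0), (0.1, 0.1), (2 * math.sqrt(3), 0)], size=2, toi_minutes=60))
```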
Shift Charts: We started with the original NHL shift charts before adding our own features. Both ShiftChart.com and timeonice.com (now defunct) hosted their own versions, inspired by the same NHL.com template.
Shot Attempt Timelines: ExtraSkater (defunct) made them popular, but Behind The Net had the first ones we could find online.
Pulling the Goalie: Original post is here.
Raw Teammate/Competition Statistics:
For each of the teammate and competition statistics, we compute relative numbers on a game-by-game basis by taking an exponentially weighted prediction of the next game’s numbers.
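A sketch of a one-step-ahead exponentially weighted prediction. The smoothing weight alpha and the prior used before the first game are hypothetical choices. The site's actual values are not given here.

```python
def ewma_predictions(values, prior, alpha=0.3):
    """
    One-step-ahead exponentially weighted predictions.
    preds[i] is the prediction for game i, made from games 0..i-1 only
    (preds[0] is the prior); the last element predicts the next, unplayed game.
    """
    pred = prior
    preds = [pred]
    for v in values:
        pred = alpha * v + (1 - alpha) * pred  # blend the newest game into the running estimate
        preds.append(pred)
    return preds

print(ewma_predictions([50, 60, 40], prior=50, alpha=0.5))
```

Because each prediction uses only earlier games, a game's own result never leaks into the baseline it is compared against.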
Other:
To use this, you’ll need to install the devtools library first; then, once it’s loaded, run the command
install_github("war-on-ice/nhlscrapr")
This is updated for the 15-16 season and includes all our other most recent upgrades.
With our danger breakdown, the standard save percentage of Saves/(Saves + Goals) can be expressed as
Sv% = (S_l + S_m + S_h)/(S_l + G_l + S_m + G_m + S_h + G_h)
where S and G are saves and goals, and the subscripts l, m and h denote low, medium and high danger.
The adjustment reweights each danger-level save percentage by the league-wide distribution of shots on goal across danger levels, where AllShots_z is the league-wide number of shots on goal at danger level z:
AdSv% = (S_l/(S_l + G_l) * AllShots_l + S_m/(S_m + G_m) * AllShots_m + S_h/(S_h + G_h) * AllShots_h) / (AllShots_l + AllShots_m + AllShots_h)
If the shots a goalie faces are distributed across danger levels in the same proportions as league-wide shots, their unadjusted and adjusted save percentages will be equal.
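The two formulas translate directly into Python. In this sketch the dictionaries are keyed by danger level, using the assumed labels "l", "m" and "h". The example checks the equality claim: when a goalie's shots follow the league's proportions, Sv% and AdSv% coincide.

```python
def save_pct(saves, goals):
    """Standard Sv%: total saves over total shots on goal, pooled across danger levels."""
    s, g = sum(saves.values()), sum(goals.values())
    return s / (s + g)

def adjusted_save_pct(saves, goals, league_shots):
    """AdSv%: per-danger Sv% weighted by league-wide shots on goal at each danger level."""
    total = sum(league_shots.values())
    return sum(
        saves[z] / (saves[z] + goals[z]) * league_shots[z] for z in league_shots
    ) / total

saves = {"l": 90, "m": 45, "h": 15}
goals = {"l": 10, "m": 5, "h": 5}
print(save_pct(saves, goals))                                           # goalie faced 100:50:20
print(adjusted_save_pct(saves, goals, {"l": 1000, "m": 500, "h": 200}))  # league also 100:50:20
print(adjusted_save_pct(saves, goals, {"l": 600, "m": 300, "h": 100}))   # league has fewer high-danger shots
```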
We have a few objectives with the site, moving forward:
You should join up if you want to give back to the community that you enjoy so much, by honing your own data-ninja skills on stuff that you know will help people out.
You shouldn’t join up if:
If you’re still interested, here are some more details about the site.
We tend to divide site operations into two categories:
Still with us? If you are interested in joining the team, please send us a note at waronice.com@gmail.com. Please include the responsibilities you’re interested in and any prior experience you have.
We’ve always taken requests for data and shared privately, but since the news of our comrade’s hiring, and given our desire to stimulate further research, we’ve begun the process of sharing everything we’ve put together (and we mean everything), in the hope that our existing database format will make it even easier for community members to build their own tools and run their own analyses.
Here’s what we’re posting publicly in the first round:
The full file list is available here and will be updated as needed.
Here is a description for all variables in nhlscrapr and in the derived WAR files.
UPDATE: Here is the processed contracts table.
Entry Name | Score (lower is better) |
Perfect | 0 |
Don’t Toews Me Bro! | 6.8716238602 |
The Brass Bonanza | 7.0910481644 |
microdino | 7.2444384897 |
Corsi Hockey League | 7.2457407713 |
Sean Dooley | 7.272956604 |
Adam Odenwelder | 7.3923966229 |
derek8 | 7.5417095449 |
Clown Predictions, Bro | 7.6645163552 |
JohnScottScoringMachine | 7.7063411263 |
Corsi Calamity | 7.7112029266 |
HawksNumbers | 7.7604368224 |
Ben Lutz | 7.8195240411 |
BentleyNathan1 | 7.8491414946 |
Sebastian Mankowski | 7.8646730078 |
danbowie | 7.9074140647 |
Carolyn Tries Her Hand | 8.1370624749 |
dvgmacdonald | 8.1405948215 |
thehaze | 8.204558261 |
trevor | 8.2293711818 |
mikael johansson | 8.2495068414 |
Jay32600 | 8.2540820936 |
Flesh and Bone | 8.3085470955 |
Fancy Fenwicks | 8.3136010606 |
whichocho | 8.3401821184 |
Losing Entry | 8.3442167105 |
YT | 8.4366901828 |
sad bruins fan | 8.4866993775 |
Tom.Andreu | 8.5119277864 |
Andrei D | 8.5239516775 |
Holmgren | 8.5284431538 |
Danton Danielson | 8.6351995803 |
Daniel Sandler | 8.650645999 |
Smittens | 8.6721579007 |
Ken Peterson | 8.6911295819 |
Corey | 8.7173200392 |
Ovi is God | 8.7205833069 |
Eric Single | 8.7439823599 |
Getzlaf’s Forehead | 8.7872070859 |
QuickkNess | 8.8246743919 |
altrockposeur | 8.8278775422 |
Nilesh Shah | 8.8403094971 |
StatsbyLopez | 8.8705267611 |
Steven David | 8.8833133041 |
Andrew Wisneski | 8.8868548107 |
John Barr | 8.9228851524 |
Adam R | 8.9424973307 |
quack attack | 8.9674498118 |
seanbailly | 9.0954598132 |
Flashes of Quincy | 9.1015593194 |
Kurtis Wells | 9.1390202842 |
Midnight Ramblers | 9.1484016026 |
Félix Magny | 9.1547807581 |
The Math of Khan | 9.1563240565 |
kncpt | 9.1598845942 |
Sabres Win | 9.1612333752 |
Matt Cane | 9.163722536 |
evohnave | 9.1648887848 |
Hero Squad | 9.2113579453 |
Mcurcio94 | 9.2398682861 |
clib542 | 9.2730859542 |
Sapp Macintosh | 9.2840653428 |
RangerSmurf | 9.2915912403 |
gut feels | 9.3410879975 |
Jason Richland | 9.4032997817 |
MBwinz | 9.4134528979 |
Iron Ringer | 9.468006452 |
Legs Feed The Wolf | 9.5184845615 |
Lugnut Ninety-Two | 9.5650159312 |
Nick | 9.5677243517 |
Andrew Pritchard | 9.5986201249 |
Jenni | 9.5990676441 |
Sean | 9.6218580301 |
Mysonzdad | 9.654913344 |
Josh Norman | 9.6743593226 |
Pat Holden | 9.7033732654 |
Adrian F | 9.717788625 |
Neel | 9.7563674943 |
Wolfram Ott | 9.7777860246 |
PDOwned | 9.7863596557 |
Nikm8 | 9.787107307 |
Peace On Liquid Water | 9.7934054506 |
RyanPrice | 9.8405501909 |
Jon Stolte | 9.8851916059 |
ScuttlePuck | 9.9107184334 |
Winterhawk11 | 9.9367867374 |
aasiaat | 9.9585003239 |
Jon Garcia | 9.9856430718 |
PhilKessel | 10.1325615448 |
artigascruz | 10.1368246784 |
The Philosopher | 10.1498649026 |
Regression to the mean streak | 10.1720356159 |
CBJ 2016 | 10.3383148238 |
Zone Entries | 10.3457779324 |
Wrathman | 10.3840926146 |
clarendonbandit | 10.3850245284 |
Kerfuffle on Frozen Water | 10.4104886866 |
Andrew Rasmussen | 10.4440463984 |
bigMac | 10.6044809869 |
UponFurtherReview | 10.6437886061 |
amcassells | 10.663771638 |
BlueMoons68 | 10.6866708283 |
Karan | 10.7103734232 |
BFitz | 10.7969523003 |
Not Gonna Win | 10.839300026 |
Stefan | 10.9414853807 |
ZachMacDonald | 11.043075811 |
Chris Kang | 11.1936846403 |
kd5mdk | 11.3652086796 |
kadri’s drunks | 11.6127684377 |
Ohno | 11.7344710644 |
Micah Blake McCurdy | 11.9241412719 |
Stanley’s Bandwagon | 12.0942643644 |
the Flamingos | 12.1588773778 |
JH | 12.7173764514 |
fbourassa | 13.4674255004 |
Paddyboy | 14.963914817 |
Aerofan79 | 15.0526656878 |
A J G | 16.1596523009 |
Sean Burke | 16.3475285916 |
finlayj | 20.0711222646 |
Back in January when CapGeek went offline, we grabbed a series of contract pieces from USA Today and the NHLPA website to help tide the community over. We didn’t have any plans to take the reins on a new site, partly because it would be a lot of work, partly because we didn’t have a clue where to get the data from an original source, but mainly because we hoped that the permanence of the shutdown was overstated.
What changed for us? Once the initial salary data went up, we were approached by those with the original, genuine contract data, from the same basic sources that Matthew had access to, who wanted to see the work continue. We then set to work converting it into a new database, recruited volunteers to help us crawl through it (particularly the incomparable Alexandra Mandrycky), spent way too many hours bothering our resident cap expert (Mike Colligan) with questions, partnered up with other efforts (including the ninja himself, Greg Sinclair) and, at the end of February, posted an initial set of contracts for all the players we could find who were active this past season, along with a buyout and cap recapture tool. The reaction broke the site temporarily, so we knew we’d have to build a better infrastructure before we could go big.
Well, we’re ready to go big today. The “beta” version of our contracts database is now ready for consumption, reliably dating back to the 2009-10 season, and including everything we could find on contract structure, signing and performance bonuses (particularly the achievable A and B bonuses). Other features:
We and our (friendly) competitors surely have a ways to go before the functionality of CapGeek is once again matched; Matthew Wuest put over five years of work into it, after all. And if our experience after ExtraSkater’s shuttering has taught us anything, it’s that a gap in the market invites fresher and better alternatives from people who would not have acted while a single site dominated, so we’re expecting greater work from ourselves and others in following Matthew’s lead.
But of all the roles that CapGeek played — to have the most trusted database, the quickest reaction time to contract announcements, and the user-friendliest interface — we’re most confident that we can serve the community best in the first role, and so that’s where the bulk of our energy has gone. Which is why we’re opening all our data — contracts, game statistics, and (soon) transactions — to anyone who wants to use it (but not sell it), as long as we’re cited as the original source.
Let’s go forward together. With everyone jumping on board this train, this should be a fun off-season.
1. The use of this site comes with absolutely no warranty.
2. We are not responsible for the consequences of what you do with the data — legally or morally.
3. Our site’s prime purpose is for research: you can use what you like for plain fandom, articles, personal exploration and knowledge, tweets, blog posts, and so forth. Our work to build the site started as the result of our needs for academic work and that’s still what we do with it first. In that spirit, we ask that you cite our work and, in particular, the specific page you obtained the data from so that others can obtain it as well. If you would like to show extra appreciation, the Donate button is on the site’s front page.
4. Automatically scraping our pages for data is not only strictly prohibited, it’s not worth your time to do it; it’s an unnecessary strain on our servers, and that’s why we have Download buttons. If you want data that’s more complicated than our current queries, you can ask us by email at waronice.com@gmail.com or on Twitter at @war_on_ice.
5. We reserve the right to offer our services in consulting arrangements to help you get the most from transforming and aggregating the data in meaningful ways. We offer what we can for free because we treasure the community and value openness, but we’re not able to take every request for free simply because we all have full-time jobs.
6. You may not sell any raw data you obtain from the site. This is not just because you can get it from here for free and your customers would be fleeced; it’s because this comes from the league and their partners and it’s not our place to sell what they allow us to use. This is not to say you can’t use it in articles that are posted behind a paywall, or use it in anything that’s at all transformative. You know what we mean here; if you have any doubt, ask us.
7. Our underlying software is licensed under the GPLv2. We will be sharing what we can on GitHub for your convenience.
We at www.war-on-ice.com are happy to host Manny’s newest Bombay Ratings App! We continue to encourage others in the hockey research community to follow Manny’s lead and develop public applications that will further the frontiers of research in hockey analytics.
While working on the Similarity Calculator, I stumbled upon a study in which Euclidean distance was used to compare NBA players to Michael Jordan. The author used the distances to generate a list of the most similar players to the man most would agree is the best ever. From this idea, Bombay ratings were just a conceptual hop, skip and jump away. Instead of choosing my own Michael Jordan from a list of historical players, I invented one.
Gordon Bombay (no relation to the legendary coach) played two seasons in the NHL. In his first season, as a forward, he led the league in every single statistical category and eventually his team to the Stanley Cup. Seeking a challenge, Bombay converted to defence in his second season. Undaunted, he repeated his rookie season success, once again unmatched by his peers in every single facet of the game. Bombay promptly retired, and no player since 2005 has been able to surpass his accomplishments.
Bombay’s stats at either position are equal to the best recorded values among regular skaters at that position since the 2005-2006 season. Thus, he possesses the best stats we can imagine without stepping outside the boundaries of what real players have been able to accomplish. If you don’t wish to entertain hypotheticals, consider an alternative explanation: The similarity calculation evaluates “distance” between players, each occupying a position in imaginary space. This space has as many dimensions as there are categories by which you choose to compare players, and the limits of each dimension are set by the maximum and minimum recorded values since 2005-2006. Gordon Bombay is simply a marker we’ve decided to place at the positive-most position in space — the position where the positive extrema of each dimension meet. In a three-dimensional plot, this is simply a corner. The Bombay Rating is the similarity between a player and Gordon Bombay.
Hence, we’ve laid the foundation for a surprisingly flexible method of evaluating how “good” a player’s stats are, one that is reasonably effective at producing intuitively pleasing results. The Bombay function essentially does what we all do when we pull up a player’s statistics; the advantage is that it’s more precise, quicker, and returns a single number. Recall that the similarity calculation is a function of the chosen dimensions and corresponding weights. It follows that the Bombay rating is a function of those same variables. While this makes the method flexible, it also makes it entirely dependent on the quality of the measures used.
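The exact similarity formula from the Similarity Calculator isn't given here, so the following sketch assumes one. Each stat is min-max scaled using the extremes since 2005-06, which puts Gordon Bombay at 1 in every dimension. Similarity is then a weighted Euclidean distance mapped linearly onto 0-100, where 0 is the opposite, all-minimum corner. The sketch also assumes higher is better for every stat; a "lower is better" stat would need its scaling flipped. The function and parameter names are hypothetical.

```python
import math

def bombay_rating(player, minima, maxima, weights):
    """
    Hypothetical 0-100 similarity to Gordon Bombay.
    Each stat is min-max scaled to [0, 1] (higher assumed better),
    so Bombay sits at 1 in every weighted dimension.
    """
    sq = 0.0
    for stat, w in weights.items():
        scaled = (player[stat] - minima[stat]) / (maxima[stat] - minima[stat])
        sq += w * (1.0 - scaled) ** 2              # weighted squared gap to Bombay
    d = math.sqrt(sq)
    d_max = math.sqrt(sum(weights.values()))       # distance from the all-minimum corner
    return 100 * (1 - d / d_max)

lo, hi = {"a": 0, "b": 10}, {"a": 10, "b": 20}
w = {"a": 1.0, "b": 1.0}
print(bombay_rating({"a": 10, "b": 20}, lo, hi, w))  # Bombay himself
print(bombay_rating({"a": 5, "b": 15}, lo, hi, w))   # middle of both ranges
```

Changing the weights changes both the distance and its maximum, which mirrors the dependence on the chosen dimensions and weights described above.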
The Bombay app I developed uses a variety of 5v5 stats to assign ratings to skaters based on the selected weights and generate charts comparing players to Gordon Bombay in each of the chosen categories.
The outer edge of the chart represents a 100% similarity to Bombay in that measure. This is only achieved if a selected player-season possesses the best recorded value in that metric among regular skaters since 2005. The dashed grey polygon represents another fictional player – one whose stats are all equal to the league average for regular skaters at that position. Note that league average does not signify a 50% similarity. At the default weights, this hypothetical average forward has a Bombay rating near 46 and the defenceman, 45.
I should confess that the default weights are largely arbitrary. I believe the correct weights to use are case-dependent, and I certainly encourage users to assign their own. I’ve found that using the “Defence” preset weights as a starting point to evaluate bottom-six or defensively-oriented forwards often produces more agreeable results. Individual season rankings can be viewed by toggling the “Table” tab and further filtered using the inputs at the bottom of each column. Using preset weights, the names atop the Forward rankings (Ovechkin, Sedin, Jagr, Crosby, Zetterberg, Malkin, Sakic) are who you’d expect; Defencemen, to a much lesser extent (Visnovsky, Karlsson, Giordano, Byfuglien, Campbell, Niskanen, Weber). It’s no secret that the evaluation of defencemen, by analytical and traditional methods alike, leaves something to be desired at times. With better measures of defensive ability will come better results by this method.
Bombay ratings can easily be computed using aggregate player stats. You can view career Bombays here. While I wouldn’t necessarily trust default Bombays to provide a single number indicative of player quality over metrics like WAR and GvT, I believe the method has very interesting potential and flexibility. For one, it can easily be expanded as new stats become available. Secondly, the same method can be applied in other leagues, namely Canadian Major Junior and college leagues.
Entries: Only one entry permitted per person. Violators of this rule will be disqualified (and publicly shamed on Twitter). Sam, Andrew, and Alexandra will be participating, but their entries will be ineligible to win the prize.
Prize: First place receives a $50 Amazon.com gift card. All other entries will be awarded with nothing, because you don’t play to lose in the Stanley Cup Playoffs.
Donations: Entry is free, as this is a “for fun” contest only. Although not required, we suggest making a small donation through the link on our home page.
Half of all donations received between now and the end of the Stanley Cup Playoffs will be forwarded to Colon Cancer Canada in memory of capgeek.com founder Matthew Wuest. We also encourage people to donate directly to Colon Cancer Canada if they prefer.
Official Scores: All statistics will be taken from www.war-on-ice.com. Any stat changes that occur after noon on the day following the completion of the Stanley Cup Playoffs will not be taken into account.
Scoring: Answers to each question will be standardized using the scale() function in R. The Euclidean distance between each entry and the actual results will be calculated using the dist() function in R. The entry with the lowest Euclidean distance from the actual results will be the winner.
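For readers who don't use R, here is a Python sketch of the same scoring. It assumes the actual results are pooled with the entries before standardizing. Like R's scale(), it centres each question on its mean and divides by the sample standard deviation. As in dist(), the score is the Euclidean distance between rows. The function name and data layout are hypothetical.

```python
import math
import statistics

def contest_scores(entries, actual):
    """
    entries: dict of entry name -> list of numeric answers, one per question.
    actual: list of actual results in the same question order.
    Returns each entry's Euclidean distance to the actual results after scaling.
    """
    names = list(entries)
    rows = [list(entries[n]) for n in names] + [list(actual)]
    scaled_cols = []
    for col in zip(*rows):                      # standardize question by question
        mu = statistics.mean(col)
        sd = statistics.stdev(col)              # sample SD, as in R's scale()
        # R's scale() gives NaN for a constant column; here it simply contributes 0.
        scaled_cols.append([(v - mu) / sd if sd > 0 else 0.0 for v in col])
    scaled_rows = list(zip(*scaled_cols))
    target = scaled_rows[-1]                    # the actual results
    return {n: math.dist(row, target) for n, row in zip(names, scaled_rows[:-1])}

scores = contest_scores({"perfect": [2, 20], "a": [1, 10], "b": [3, 30]}, [2, 20])
print(min(scores, key=scores.get), scores)
```

Standardizing first keeps a question answered in hundreds (say, total shots) from drowning out one answered in single digits.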
Subsequent Rounds: Additional questions will be sent to participants for rounds 2, 3, and 4. These will be sent shortly after the conclusion of the previous round. Participants who fail to submit entries for rounds 2, 3, and 4 will be assigned a random response from one of the other contestants for each question. Note that there will be limited time between rounds to complete these questions, so please plan accordingly.
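One reading of the fill-in rule, sketched in Python: for each unanswered question, draw the answer of a randomly chosen other contestant who did answer it. The data layout (None marks a missing answer) and the seeded generator, used here for reproducibility, are assumptions for illustration.

```python
import random

def fill_missing(answers, seed=None):
    """
    answers: dict of name -> list of answers, with None for unanswered questions.
    Each missing answer is replaced by the answer of a randomly chosen
    other contestant who did answer that question.
    """
    rng = random.Random(seed)
    filled = {name: list(vals) for name, vals in answers.items()}
    for name, vals in answers.items():
        for q, v in enumerate(vals):
            if v is None:
                pool = [other[q] for o, other in answers.items()
                        if o != name and other[q] is not None]
                if pool:
                    filled[name][q] = rng.choice(pool)
    return filled

print(fill_missing({"a": [1, None], "b": [2, 5], "c": [3, 7]}, seed=1))
```

Drawing only from the original submissions means a filled-in answer is never copied from another filled-in answer.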
Submission Times: Any submission received after the first puck drops each round will be disqualified (for Round 1) or assigned random responses, as described in “Subsequent Rounds” above (for Rounds 2, 3, and 4).
Disclaimer: We reserve the right to disqualify any entry for any reason at our discretion.