nhlscrapr updates now on GitHub

We will no longer be making updates to nhlscrapr on CRAN; instead you can get the most recent version from the war-on-ice repository on GitHub.

To use this, install the devtools package first; once it is loaded, run the command

install_github("war-on-ice/nhlscrapr")

This is updated for the 2015-16 season and includes all our other most recent upgrades.

A Quick Note on Adjusted Save Percentage

Different goaltenders face different distributions of shots from across the ice due to the offenses they face and the defenses in front of them. We adjust save percentage by re-weighting the components according to the league-wide distribution of shots, so that the value better translates between different goaltenders. This is similar to stratified sampling in survey methodology, and also goes by the name benchmarking.

With our danger breakdown, standard save percentage of Saves/(Saves + Goals) is expressed as

Sv% = (Saves_low + Saves_med + Saves_high)/(Saves_low + Goals_low + Saves_med + Goals_med + Saves_high + Goals_high)

The adjustment is to re-weight every danger-based save percentage by the league-wide distribution of shots on goal in each danger zone:

AdSv% = (Saves_low/(Saves_low + Goals_low) * AllShots_low + Saves_med/(Saves_med + Goals_med) * AllShots_med + Saves_high/(Saves_high + Goals_high) * AllShots_high) / (AllShots_low + AllShots_med + AllShots_high)

If the shots faced by the goalie are split across the danger zones in the same proportions as the league-wide shots, then their unadjusted and adjusted save percentages will be equal.
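As a rough illustration of the adjustment described above, here is a minimal Python sketch. It is not taken from nhlscrapr: the function names, the dictionary layout, and every number in it are made up for the example. It only shows the arithmetic: each zone's save percentage is weighted by the league-wide shot count in that zone.

```python
# Hypothetical league-wide shots on goal by danger zone (invented numbers).
LEAGUE_SHOTS = {"low": 30000, "med": 20000, "high": 10000}


def save_pct(saves, goals):
    """Standard save percentage: Saves / (Saves + Goals), pooled over zones."""
    total_saves = sum(saves.values())
    total_goals = sum(goals.values())
    return total_saves / (total_saves + total_goals)


def adjusted_save_pct(saves, goals, league_shots):
    """Weight each zone's save percentage by the league-wide shots in that zone.

    Assumes the goalie faced at least one shot in every zone; a zone with
    no shots would divide by zero and needs separate handling.
    """
    weighted = sum(
        saves[zone] / (saves[zone] + goals[zone]) * league_shots[zone]
        for zone in league_shots
    )
    return weighted / sum(league_shots.values())


# Goalie A faces shots in the same 3:2:1 ratio as the (invented) league.
typical_saves = {"low": 291, "med": 184, "high": 85}
typical_goals = {"low": 9, "med": 16, "high": 15}

# Goalie B has identical per-zone save percentages but sees far more
# high-danger shots, which drags the unadjusted number down.
busy_saves = {"low": 97, "med": 184, "high": 255}
busy_goals = {"low": 3, "med": 16, "high": 45}

print(round(save_pct(typical_saves, typical_goals), 4))                         # 0.9333
print(round(adjusted_save_pct(typical_saves, typical_goals, LEAGUE_SHOTS), 4))  # 0.9333
print(round(save_pct(busy_saves, busy_goals), 4))                               # 0.8933
print(round(adjusted_save_pct(busy_saves, busy_goals, LEAGUE_SHOTS), 4))        # 0.9333
```

In this toy case the two goalies stop the same fraction of shots in every zone, so their adjusted figures agree even though their unadjusted figures differ by four points.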



Sharing is Caring

Back in March, I met Darryl Metcalf for the first time at the Sloan conference. We were talking about how we had advertised that we would be open with our data and infrastructure, and he said something to the effect of “when?” And as I recall, my answer was “real soon”.

We’ve always taken requests for data and shared privately, but since the news of our comrade’s hiring, and our desire to stimulate further research, we’ve begun the process to share everything we’ve put together — and we mean everything — with the hopes that given our existing database format, it will be even easier for community members to make their own tools and run their own analyses.

Here’s what we’re posting publicly in the first round:

  • The raw output from nhlscrapr, including our integration from other sources and corrections for rink bias.
  • The full underlying database for all player, team and goaltender statistics.
  • By-game and by-season calculations for Goals Above Replacement.

The full file list is available here and will be updated as needed.

Here is a description for all variables in nhlscrapr and in the derived WAR files.

UPDATE: Here is the processed contracts table.


Playoffs Prediction Contest Scoreboard

Entry names and final scores listed below.

Entry.Name Score
Perfect 0
Don’t Toews Me Bro! 6.8716238602
The Brass Bonanza 7.0910481644
microdino 7.2444384897
Corsi Hockey League 7.2457407713
Sean Dooley 7.272956604
Adam Odenwelder 7.3923966229
derek8 7.5417095449
Clown Predictions, Bro 7.6645163552
JohnScottScoringMachine 7.7063411263
Corsi Calamity 7.7112029266
HawksNumbers 7.7604368224
Ben Lutz 7.8195240411
BentleyNathan1 7.8491414946
Sebastian Mankowski 7.8646730078
danbowie 7.9074140647
Carolyn Tries Her Hand 8.1370624749
dvgmacdonald 8.1405948215
thehaze 8.204558261
trevor 8.2293711818
mikael johansson 8.2495068414
Jay32600 8.2540820936
Flesh and Bone 8.3085470955
Fancy Fenwicks 8.3136010606
whichocho 8.3401821184
Losing Entry 8.3442167105
YT 8.4366901828
sad bruins fan :( 8.4866993775
Tom.Andreu 8.5119277864
Andrei D 8.5239516775
Holmgren 8.5284431538
Danton Danielson 8.6351995803
Daniel Sandler 8.650645999
Smittens 8.6721579007
Ken Peterson 8.6911295819
Corey 8.7173200392
Ovi is God 8.7205833069
Eric Single 8.7439823599
Getzlaf’s Forehead 8.7872070859
QuickkNess 8.8246743919
altrockposeur 8.8278775422
Nilesh Shah 8.8403094971
StatsbyLopez 8.8705267611
Steven David 8.8833133041
Andrew Wisneski 8.8868548107
John Barr 8.9228851524
Adam R 8.9424973307
quack attack 8.9674498118
seanbailly 9.0954598132
Flashes of Quincy 9.1015593194
Kurtis Wells 9.1390202842
Midnight Ramblers 9.1484016026
Félix Magny 9.1547807581
The Math of Khan 9.1563240565
kncpt 9.1598845942
Sabres Win 9.1612333752
Matt Cane 9.163722536
evohnave 9.1648887848
Hero Squad 9.2113579453
Mcurcio94 9.2398682861
clib542 9.2730859542
Sapp Macintosh 9.2840653428
RangerSmurf 9.2915912403
gut feels 9.3410879975
Jason Richland 9.4032997817
MBwinz 9.4134528979
Iron Ringer 9.468006452
Legs Feed The Wolf 9.5184845615
Lugnut Ninety-Two 9.5650159312
Nick 9.5677243517
Andrew Pritchard 9.5986201249
Jenni 9.5990676441
Sean 9.6218580301
Mysonzdad 9.654913344
Josh Norman 9.6743593226
Pat Holden 9.7033732654
Adrian F 9.717788625
Neel 9.7563674943
Wolfram Ott 9.7777860246
PDOwned 9.7863596557
Nikm8 9.787107307
Peace On Liquid Water 9.7934054506
RyanPrice 9.8405501909
Jon Stolte 9.8851916059
ScuttlePuck 9.9107184334
Winterhawk11 9.9367867374
aasiaat 9.9585003239
Jon Garcia 9.9856430718
PhilKessel 10.1325615448
artigascruz 10.1368246784
The Philosopher 10.1498649026
Regression to the mean streak 10.1720356159
CBJ 2016 10.3383148238
Zone Entries 10.3457779324
Wrathman 10.3840926146
clarendonbandit 10.3850245284
Kerfuffle on Frozen Water 10.4104886866
Andrew Rasmussen 10.4440463984
bigMac 10.6044809869
UponFurtherReview 10.6437886061
amcassells 10.663771638
BlueMoons68 10.6866708283
Karan 10.7103734232
BFitz 10.7969523003
Not Gonna Win 10.839300026
Stefan 10.9414853807
ZachMacDonald 11.043075811
Chris Kang 11.1936846403
kd5mdk 11.3652086796
kadri’s drunks 11.6127684377
Ohno 11.7344710644
Micah Blake McCurdy 11.9241412719
Stanley’s Bandwagon 12.0942643644
the Flamingos 12.1588773778
JH 12.7173764514
fbourassa 13.4674255004
Paddyboy 14.963914817
Aerofan79 15.0526656878
A J G 16.1596523009
Sean Burke 16.3475285916
finlayj 20.0711222646

Recapped, geek

Summary: visit the site to see what we’ve put together. Read below for what we have to share.

Back in January when CapGeek went offline, we grabbed a series of contract pieces from USA Today and the NHLPA website to help tide the community over. We didn’t have any plans to take the reins on a new site, partly because it would be a lot of work, partly because we didn’t have a clue where to get the data from an original source, but mainly because we hoped that the permanence of the shutdown was overstated.

What changed for us? Once the initial salary data went up, we were approached by those with the original, genuine contract data, from the same basic sources that Matthew had access to, who wanted to see the work continue. We then set to work converting it over into a new database, recruited volunteers to help us crawl through it (particularly the incomparable Alexandra Mandrycky), spent way too many hours bothering our resident cap expert (Mike Colligan) with questions, partnered up with other efforts (including the ninja himself, Greg Sinclair) and posted an initial set of contracts for all the players we could find who were active this past season at the end of February, along with a buyout and cap recapture tool. The reaction broke the site temporarily, so we knew we’d have to build a better infrastructure before we could go big.

Well, we’re ready to go big today. The “beta” version of our contracts database is now ready for consumption, reliably dating back to the 2009-10 season, and including everything we could find on contract structure, signing and performance bonuses (particularly the achievable A and B bonuses). Other features:

  • You can find any player contract or statistical breakdown from the homepage.
  • We have a link under every contract to see what the buyout terms would be for any year after the first. (We’ve only found one example where a contract was bought out without a game being played — Tim Kennedy with the Sabres — and that was a result of an arbitration decision.)
  • We’ve made it clearer which contracts have slid, which have been bought out, and what years have been “retired” out.
  • We have a quick summary of active contracts, new signings, and performance bonuses achieved at the cap home.
  • We’ll be posting quick team summaries, including team-level obligations like buyouts, performance bonus overages and retained salaries, as soon as they’re ready.

We and our (friendly) competitors surely have a ways to go before the functionality of CapGeek is once again matched; Matthew Wuest put over five years of work into it, after all. And if our experience after ExtraSkater’s shuttering has taught us anything, it’s that a gap in the market brings fresher and better alternatives from those who would not have acted while a single site dominated, so we’re expecting greater work from ourselves and others in following Matthew’s lead.

But of all the roles that CapGeek played — to have the most trusted database, the quickest reaction time to contract announcements, and the user-friendliest interface — we’re most confident that we can serve the community best in the first role, and so that’s where the bulk of our energy has gone. Which is why we’re opening all our data — contracts, game statistics, and (soon) transactions — to anyone who wants to use it (but not sell it), as long as we’re cited as the original source.

Let’s go forward together. With everyone jumping on board this train, this should be a fun off-season.


Site Terms of Use

We built this site together for two reasons: to disseminate our own research ideas and purposes, and to have tools that everyone can use for their own research. To clarify our goals and our means, and in light of other sites having similarly stated policies, we state our positions on all of these matters.

1. The use of this site comes with absolutely no warranty.

2. We are not responsible for the consequences of what you do with the data — legally or morally.

3. Our site’s prime purpose is for research: you can use what you like for plain fandom, articles, personal exploration and knowledge, tweets, blog posts, and so forth. Our work to build the site started as the result of our needs for academic work and that’s still what we do with it first. In that spirit, we ask that you cite our work and, in particular, the specific page you obtained the data from so that others can obtain it as well. If you would like to show extra appreciation, the Donate button is on the site’s front page.

4. Automatically scraping our pages for data is not only strictly prohibited, it’s not worth your time to do it; it’s an unnecessary strain on our servers, and that’s why we have Download buttons. If you want data that’s more complicated than our current queries, you can ask us by email or on Twitter at @war_on_ice.

5. We reserve the right to offer our services in consulting arrangements to help you get the most from transforming and aggregating the data in meaningful ways. We offer what we can for free because we treasure the community and value openness, but we’re not able to take every request for free simply because we all have full time jobs.

6. You may not sell any raw data you obtain from the site. This is not just because you can get it from here for free and your customers would be fleeced; it’s because this comes from the league and their partners and it’s not our place to sell what they allow us to use. This is not to say you can’t use it in articles that are posted behind a paywall, or use it in anything that’s at all transformative. You know what we mean here; if you have any doubt, ask us.

7. Our underlying software is licensed under the GPLv2. We will be sharing what we can on GitHub for your convenience.

GUEST POST: Hockey and Euclid — Introduction to Bombay Ratings

Note:  This is part two of a series of guest posts written by @MannyElk.  In the first installment of Hockey and Euclid, Manny outlined the player similarity calculation used in the Similarity Calculator. The explanations to follow will assume knowledge from that article, so we urge anybody who wishes to understand the derivation of the Bombay function in detail to get caught up.  

We are happy to host Manny’s newest Bombay Ratings App! We continue to encourage others in the hockey research community to follow Manny’s lead and develop public applications that will further the frontiers of research in hockey analytics.

While working on the Similarity Calculator, I stumbled upon a study in which Euclidean distance was used to compare NBA players to Michael Jordan.  The author used the distances to generate a list of the most similar players to the man most would agree is the best ever.  From this idea, Bombay ratings were just a conceptual hop, skip and jump away.  Instead of choosing my own Michael Jordan from a list of historical players, I invented one.

The Road To WAR, Part 11: Shot Rates For And Against, or that quality that we deliberately avoid calling “possession”

This is the big one that drives most of what we see in the game, but is also the most difficult to calculate directly: how would the shot rates for and against a team behave if we swapped out a player with their equivalent replacement?

First, here’s the progression in methods that we’ve seen so far:

  1. Good old plus-minus (+/-), which no one seems to think is good but everyone agrees is old. It was the number that was used for the longest time to capture supposed relative defensive ability, but among its flaws are that it’s too dependent on goaltenders, too dependent on linemates, and the sample sizes are too small to produce a strong signal. Relative plus-minus doesn’t have the first problem, if the only job is to compare against one’s own teammates, but can still suffer with too much common time with other players.
  2. Corsi/Fenwick/Bowman numbers take away the impact of the goaltender and of shooting skill, in favor of at least a tenfold increase in sample size. They add in contributions from usage, like zone starts, which can now be detected statistically, but they still have the linemate and competition problem.
  3. Regression-adjusted statistics for shot differential; see our comprehensive historical list here, then add in Stephen Burtch’s dCorsi and Domenic Galamini’s Usage-Adjusted Corsi. Essentially, make adjustments to the macro-level stats depending on whom they played with and against.

You could hypothetically drop in any of the above pieces and spin them into a measure of goals; the conversion can then be slotted alongside the other contributions to get a total value. But we have a few other needs:

  1. We want to adjust for teammates and competition simultaneously, including replacement level players.
  2. We need to separate offensive and defensive contributions.
  3. We adjust for usage, including whether a faceoff was won or lost, and score situation.
  4. We model separately for each shot danger, because we know that forwards and defensemen contribute differently between and within these groups.
  5. We also want to distinguish between performance (what happened) and talent (what would be most likely in future).

The Road to WAR Series (Index)

All the articles in the Road to WAR series.

  1. The Single Number Dream
  2. All Rate Now
  3. Shot Quality Assurance, plus A Bonus on Travel Fatigue
  4. You can’t spell “An Incremental Improvement” without two “team”s
  5. Getting Goals Above Baseline
  6. Rate-Based Event Adjustments For Score Effects, Home Advantage and Event Count Bias
  7. What do we mean by “replacement”? A case study with faceoffs
  8. Penalties Taken And Drawn
  9. Historical Shooting and Goaltending
  10. Modern Goaltending and Shooting
  11. Shot Rates For And Against