WAR On Ice: The Blog
http://blog.war-on-ice.com
Your site for modern hockey analytics
Fri, 27 Nov 2015 21:43:42 +0000

Annotated Glossary
http://blog.war-on-ice.com/annotated-glossary/
Thu, 26 Nov 2015 04:17:25 +0000

Note (Nov 27, 2015): As a blanket disclaimer, we didn’t define most of the terms or notations used on this site. These are all products of the community, both as individuals and as collectives. Our innovations are specifically labelled here; the rest should not be taken as ours, especially if a reference is provided.

What follows is as complete a list as we have of the terms used on the site (or links to the relevant pages).

WAR/GAR: The whole series is linked here.

Time on ice:

  • TOI is the time in minutes when a player is on the ice.
  • TOIoff is the time when a player is off the ice, but in a game in which they played.
  • TOI% is the percentage of time they spend on the ice.
  • TOI60 is the number of minutes out of 60 that the player was on the ice.
  • TOI/Gm is the number of minutes spent on the ice per game.

General Shot-Based Event Counts: The bread and butter of modern hockey analysis.

Goal Events (G): Shots that are not saved and cross the goal line. The ceremony consists of a bright red light of shame, a stripe-fashioned league employee exaggeratedly drawing attention to the goaltender’s failure to do their job, and a small party at center ice to begin the process anew.

  • G: Goals scored by the individual/team.
  • A: All assists. A1: Primary assists; A2: Secondary assists.
  • G60: Goals scored by the individual/team, per 60 minutes. A60 and P60 for all assists and points.
  • GF: Goals scored by a player or their teammates when the focal player is on the ice. GFoff: goals scored by a player’s teammates when the player is off the ice.
  • GA: Goals scored against a team when the focal player is on the ice.
  • G+/-: Goal differential (GF – GA). Similar to plus-minus but should always include the specific game scenario when identified (such as 5v5).
  • GF60: GF / TOI * 60 minutes.
  • GA60: GA / TOI * 60 minutes.
  • GF%: The share of goals scored by the focal player’s team compared to all goals scored when the player is on the ice — GF/(GF + GA).
  • GF%off: The share of goals scored by the focal player’s team compared to all goals scored when the player is off the ice: GFoff/(GFoff + GAoff).
  • GF%Rel: The on-ice share of goals for a player’s team minus the off-ice share of goals.
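To make the arithmetic concrete, here is a minimal Python sketch of the goal-based rates and shares defined above. The function names and the choice to return None when a denominator is zero are our own assumptions for illustration, not the site's code; percentages are on a 0 to 100 scale.

```python
def per60(count, toi_minutes):
    """Events per 60 minutes of ice time; None if there was no ice time."""
    if toi_minutes <= 0:
        return None
    return count / toi_minutes * 60.0


def share(events_for, events_against):
    """Percentage of all events that went the focal team's way."""
    total = events_for + events_against
    if total == 0:
        return None
    return 100.0 * events_for / total


def goal_summary(gf, ga, toi, gf_off, ga_off):
    """On-ice goal rates and shares for one player in one game state."""
    on_ice = share(gf, ga)
    off_ice = share(gf_off, ga_off)
    rel = None if on_ice is None or off_ice is None else on_ice - off_ice
    return {
        "G+/-": gf - ga,
        "GF60": per60(gf, toi),
        "GA60": per60(ga, toi),
        "GF%": on_ice,
        "GF%off": off_ice,
        "GF%Rel": rel,
    }


# Hypothetical season line: 30 GF and 20 GA in 900 on-ice minutes,
# 40 GF and 50 GA for the team while the player sat on the bench.
summary = goal_summary(gf=30, ga=20, toi=900, gf_off=40, ga_off=50)
```

The same helpers apply unchanged to the S, F, C and SC families below: swap the goal counts for the matching shot counts.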

Shots On Goal (S): Shot attempts that are somehow the goaltender’s responsibility. Includes goals, as above, and saved shots, which comprise the bulk of a goaltender’s productive output.

Take the above quantities and replace “G” with “S” like so:

  • SF: Shots on goal produced by a player or their teammates when the focal player is on the ice. SFoff: Shots on goal produced by a player’s teammates when the player is off the ice.

New ones:

  • iSF: Individual shots-on-goal for.
  • SH: Individual saved shots.
  • PSh%: Personal shooting percentage: a player’s goals (G) divided by their individual shots on goal (iSF).
  • OSh%: On-ice shooting percentage: a player’s on-ice goals for (GF) divided by their on-ice shots on goal (SF).
  • OSv%: On-ice save percentage: one minus a player’s on-ice goals against (GA) divided by their on-ice shots on goal against (SA).
  • PDO: The sum of on-ice save and shooting percentages (OSh% + OSv%). League average is 100 by construction. Origins: Irreverent Oilers, with a nice write-up here.
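As a quick worked example, PDO can be sketched in Python as below. We take on-ice save percentage to mean the share of on-ice shots against that were stopped, which is what makes the league average come out to 100; the function name and the None return for empty denominators are assumptions for illustration.

```python
def pdo(gf, sf, ga, sa):
    """On-ice shooting percentage plus on-ice save percentage (0-200 scale)."""
    if sf == 0 or sa == 0:
        return None
    on_ice_sh = 100.0 * gf / sf          # OSh%: goals for per shot on goal for
    on_ice_sv = 100.0 * (sa - ga) / sa   # OSv%: saves per shot on goal against
    return on_ice_sh + on_ice_sv


# Hypothetical player: 10 goals on 100 shots for, 8 goals on 100 shots against.
example = pdo(gf=10, sf=100, ga=8, sa=100)   # 10.0 + 92.0
```

Across the whole league every goal for is someone else's goal against, so the two percentages sum to 100 on average.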

Fenwick (F)/Unblocked Shot Attempts (USAT): Take shots on goal and add shots that miss the net entirely. Named for its inventor, Battle of Alberta author Matt Fenwick, not a fictional Duchy that changed the course of the fictional world.

Take the above quantities and replace “G” with “F” like so:

  • FF: Unblocked shots produced by a player or their teammates when the focal player is on the ice. FFoff: Unblocked shots produced by a player’s teammates when the player is off the ice.

New ones:

  • iFF: Individual unblocked shot attempts for.
  • MS: Individual shots that missed the net.
  • PFenSh%: Personal Fenwick shooting percentage: a player’s goals (G) divided by their individual unblocked shots (iFF).
  • OFenSh%: On-ice shooting percentage: a player’s on-ice goals for (GF) divided by their on-ice unblocked shots (FF).
  • OFenSv%: On-ice save percentage: one minus a player’s on-ice goals against (GA) divided by their on-ice unblocked shots against (FA).
  • FenPDO: The sum of on-ice Fenwick save and shooting percentages (OFenSh% + OFenSv%). League average is 100 by construction. Not on the site at present.
  • FP60: Fenwick Pace (per 60 minutes), equal to FF60 + FA60.
  • OFOn%: On-ice Fenwick on-goal percentage: a player’s on-ice shots on net (SF) divided by their on-ice unblocked shots (FF).
  • OFAOn%: On-ice Fenwick on-goal percentage against: a player’s on-ice shots on net against (SA) divided by their on-ice unblocked shots allowed (FA).

Corsi (C)/Shot Attempts (SAT): Corsi events consist of all shot attempts: blocked, missed, saves, goals. Should probably just be called “shots”, so we’ll do that here. Named for goaltending coach and mustache champion Jim Corsi, coined and minted by Tim Barnes (aka Vic Ferrari).

Take the above quantities and replace “G” with “C” like so:

  • CF: Shots produced by a player or their teammates when the focal player is on the ice. CFoff: Shots produced by a player’s teammates when the player is off the ice.

New ones:

  • iCF: Individual shots.
  • BK: Shots that an individual attempted that were blocked.
  • AB: Attempts Blocked, shots that the individual themselves blocked.
  • CP60: Corsi Pace (per 60 minutes), equal to CF60 + CA60.
  • OCOn%: On-ice Corsi on-goal percentage: a player’s on-ice shots on net (SF) divided by their on-ice shots (CF).
  • OCAOn%: On-ice Corsi on-goal percentage against: a player’s on-ice shots on net against (SA) divided by their on-ice shots allowed (CA).
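The three shot families nest inside one another, which a short Python sketch may make clearer. The event labels GOAL, SHOT, MISS and BLOCK are assumed here to stand for goals, saved shots, missed shots and blocked shots; they are illustrative names, not a documented schema.

```python
from collections import Counter

# Assumed play-by-play labels (hypothetical, for illustration only).
ON_GOAL = {"GOAL", "SHOT"}             # shots on goal: goals plus saves
UNBLOCKED = ON_GOAL | {"MISS"}         # Fenwick: add shots that missed the net
ALL_ATTEMPTS = UNBLOCKED | {"BLOCK"}   # Corsi: add blocked shots


def shot_counts(event_types):
    """Tally G, S, F and C from a sequence of event labels."""
    tally = Counter(event_types)
    return {
        "G": tally["GOAL"],
        "S": sum(tally[t] for t in ON_GOAL),
        "F": sum(tally[t] for t in UNBLOCKED),
        "C": sum(tally[t] for t in ALL_ATTEMPTS),
    }


# Non-shot events such as hits are ignored.
counts = shot_counts(["GOAL", "SHOT", "SHOT", "MISS", "BLOCK", "BLOCK", "HIT"])
```

Feeding it only the events for (or against) a team while a player is on the ice gives the SF/FF/CF (or SA/FA/CA) totals.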

Location Data:

The NHL has (x,y) location data available for all shots-on-goal, as well as hits and penalties. ESPN and Sportsnet both have this data available for missed shot locations and the location where shots were blocked (not from where they were taken, though both are useful).

This location data has systematic bias from rink to rink as well as random measurement error. We can’t do much for the random error, but to correct the bias in shot location data, we use the basic method proposed by Schuckers and Curro (Appendix A):

  • Get the distances for each shot from the net, conditioned on type (slap/not) and whether they were at home or away for the team.
  • Calculate the cumulative distribution functions for the distances of these shots at home and away for each team (which assumes that the shot distance distribution is truly the same, both home and away). Assume that all distances differ around the same league average and that there is no net league bias (which for standardization is fine).
  • The adjusted distance for a shot is then calculated by quantile: what fraction of shots in this building of this type were at this distance? (Say, 25% of non slap shots were within 17 feet of the net.) Take that quantile and get the number for that team on the road. (Say, 25% of non slap shots were within 19 feet of the net on the road).
  • Project the shot on a line from the center of the goal line (which is the reference point for distance) going through the shot; move the shot to a position on that line with the correct distance.
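The steps above can be sketched in Python as follows. This is a simplified illustration using empirical quantiles on raw lists, not the code from Schuckers and Curro or from our scraper; the function names are ours, and the goal-line position of x = 89 feet assumes coordinates measured from center ice.

```python
import bisect
import math


def adjusted_distance(distance, home_distances, away_distances):
    """Map a home-rink shot distance onto the team's road distribution.

    Both lists should hold distances for one team and one shot type.
    """
    if not home_distances or not away_distances:
        raise ValueError("need at least one home and one away shot")
    home = sorted(home_distances)
    away = sorted(away_distances)
    # Number of home shots at or inside this distance (empirical CDF rank).
    rank = bisect.bisect_right(home, distance)
    # Index of the same quantile in the road sample (integer ceiling).
    idx = -(-rank * len(away) // len(home)) - 1
    idx = min(len(away) - 1, max(0, idx))
    return away[idx]


def move_shot(x, y, new_distance, goal_x=89.0, goal_y=0.0):
    """Slide a shot along the line from the goal center to the new distance."""
    dx, dy = x - goal_x, y - goal_y
    old_distance = math.hypot(dx, dy)
    if old_distance == 0:
        return (x, y)
    scale = new_distance / old_distance
    return (goal_x + dx * scale, goal_y + dy * scale)


# Toy numbers: 75% of home shots are within 17 feet, 75% of road shots within 19.
new_d = adjusted_distance(17, [10, 15, 17, 20], [12, 16, 19, 25])
new_xy = move_shot(72.0, 0.0, new_d)
```

In practice one would interpolate between quantiles rather than pick the nearest road shot, but the idea is the same.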

Having a de-biased measure for shot location is essential for any measures that are going to compare from building to building. Speaking of:

Danger Zones And Types: 

There are all sorts of mechanisms for judging the relative worth of a shot given its (x,y) coordinates and other information. Schuckers has a comprehensive method for evaluating expected goals called DIGR; for our purposes, we simplified the available data into three main features:

  1. Shot location, by block. There are any number of ways to dissect the impact of location, but the most straightforward is by grouping into location blocks rather than smoothing a continuous function over a surface (as in DIGR). There were a few inspirations for this scheme:
  2. Shot features, gleaned from the play by play: rebounds are classified as shots taken within 3 seconds of another shot attempt, and rush shots are taken within 4 seconds of an event in another zone (a definition derived from David Johnson’s work).
  3. Blocked shots pose an extra problem: they’re shots that have been recorded at the point at which they’ve been blocked, and are also more likely to be shots of less quality and speed by nature of their blocking.

A shot’s Danger is then defined by this method:

  1. Start with the zone in which the shot attempt was recorded, 1 through 3.
  2. Add 1 if it was a rebound or a rush shot.
  3. Subtract 1 if it was a blocked shot.
  4. Increase to 1 if it was equal to 0.
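Written out as a small Python function, the four steps look like this. The function and argument names are ours, chosen for illustration.

```python
def shot_danger(zone, rebound=False, rush=False, blocked=False):
    """Danger score for a shot attempt from its zone (1-3) and features."""
    if zone not in (1, 2, 3):
        raise ValueError("zone must be 1, 2 or 3")
    danger = zone              # step 1: start with the recorded zone
    if rebound or rush:
        danger += 1            # step 2: a single bonus for rebound or rush
    if blocked:
        danger -= 1            # step 3: blocked shots lose a level
    return max(danger, 1)      # step 4: never drop below 1


# A blocked shot from the lowest zone stays at 1; a rebound from zone 3 is 4.
examples = [shot_danger(1, blocked=True), shot_danger(3, rebound=True)]
```

Note that a shot that is both a rebound and a rush shot still gains only one level.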

For goaltenders, we then have

  • G.U, S.U: Goals and Saves with unknown danger.
  • G.L, S.L: Goals and Saves with low (1) danger.
  • G.M, S.M: Goals and Saves with medium (2) danger.
  • G.H, S.H: Goals and Saves with high (3+) danger.

Scoring Chances (SC):

All shot attempts that have danger 2 or greater. As originally described here.

Take the above quantities and replace “G” with “SC” like so:

  • SCF: Scoring chances produced by a player or their teammates when the focal player is on the ice. SCFoff: Scoring chances produced by a player’s teammates when the player is off the ice.
  • iSC: Individual scoring chances.
  • SCP60: Scoring Chance Pace (per 60 minutes), equal to SCF60 + SCA60.

 

High-Danger Scoring Chances (HSC):

All shot attempts that have danger 3 or greater. Take the above quantities and replace “G” with “HSC” like so:

  • HSCF: High-danger scoring chances produced by a player or their teammates when the focal player is on the ice. HSCFoff: High-danger scoring chances produced by a player’s teammates when the player is off the ice.
  • iHSC: Individual high-danger scoring chances.
  • HSCP60: High-Danger Scoring Chance Pace (per 60 minutes), equal to HSCF60 + HSCA60.

 

Adjusted Save Percentage (AdSv%):

Defined here, it is the weighting of a goaltender’s save percentage in each danger level by the fraction of shots that would be expected from the league-wide distribution.

Score Situations:

Score Effects are the acknowledged differences in team performance based on the difference in score. Several popular methods are used to account for the score in whatever results are presented; we host two. The first, Score Close, was pioneered by Tore Purdy (aka JLikens) and simply includes situations where teams are within 1 goal of each other in Periods 1 and 2, and tied afterwards.
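The Score Close filter is simple enough to state as a one-line test per event. The Python sketch below is illustrative only; the function name is ours, and we assume that "afterwards" covers the third period and any overtime.

```python
def is_score_close(period, score_diff):
    """True when an event falls inside the Score Close window."""
    if period <= 2:
        return abs(score_diff) <= 1   # within one goal in periods 1 and 2
    return score_diff == 0            # tied in the third period or later


# (period, goal differential) pairs for a handful of hypothetical events.
flags = [is_score_close(p, d) for p, d in [(1, 1), (2, -2), (3, 1), (3, 0)]]
```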

The second, manual score adjustment, has a few different predecessors:

Not surprisingly, we went with our adjustments, and implement a full Poisson model for score, period and rink effects for each shot type by each danger zone. Adding the rink bias correction to our score and period correction was inspired by Schuckers and Macdonald.

Charts:

Bubble charts: The main look and design of the bubble charts, with four variables displayed simultaneously, comes from Rob Vollman’s Player Usage Charts, including the starting variables: x-axis for zone starts, y-axis for quality of competition, color for Relative Corsi. Our expansions and additions include every variable we have at our disposal including different game and score states.

Hextally: Directly inspired by Kirk Goldsberry’s NBA Shot Charts for Grantland and 538 (but perhaps the lack of a player with an appropriate name in the NBA made it difficult.) Expanded for both shot success probabilities (standard for basketball) and the rate of shots taken from each area of the ice (not standard for basketball, even if it were played on the ice.)

Shift Charts: We started with the original NHL shift charts before adding our own features. Both ShiftChart.com and timeonice.com (now defunct) hosted their own versions, inspired by the same NHL.com template.

Shot Attempt Timelines: ExtraSkater (defunct) made them popular, but Behind The Net had the first ones we could find online.

Pulling the Goalie: Original post is here.

Raw Teammate/Competition Statistics:

For each of the teammate and competition statistics, we calculate relative numbers on a game-by-game basis by taking an exponentially weighted prediction of the next game’s numbers.

  • TOIT60, TOIC60: The average time on ice per 60 minutes for teammates and competition in previous games, weighted by mutual time on ice.
  • CorT%, CorC%: The share of Corsi events for the teammates and competition in previous games, weighted by mutual time on ice.
  • tCF60, cCF60: The rate of Corsi events recorded on-ice for the teammates and competition in previous games, weighted by mutual time on ice.
  • tCA60, cCA60: The rate of Corsi events recorded on-ice against the teammates and competition in previous games, weighted by mutual time on ice.
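The "weighted by mutual time on ice" part of these definitions is an ordinary weighted average, sketched below in Python. This illustrates only that weighting step; the exponentially weighted game-by-game prediction is not shown, and the function name and input layout are assumptions.

```python
def toi_weighted_mean(pairs):
    """Average of other players' values, weighted by shared ice time.

    `pairs` holds (value, shared_toi_minutes) tuples, one per teammate
    or opponent.
    """
    total_toi = sum(toi for _, toi in pairs)
    if total_toi == 0:
        return None
    return sum(value * toi for value, toi in pairs) / total_toi


# Hypothetical: 300 minutes with a 55% Corsi teammate, 100 with a 45% one.
cor_t = toi_weighted_mean([(55.0, 300), (45.0, 100)])
```

Passing teammates' values gives the T/t versions; passing opponents' values gives the C/c versions.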

Other:

  • Penalties: PN are non-coincidental penalties taken by a player; PN- are non-coincidental penalties drawn by a player. PenD is the difference, PN- minus PN; PenD60 is the net rate of penalties drawn every 60 minutes.
  • Faceoffs: FO_W are faceoff wins. FO_L are faceoff losses. FO%^ is a shrunken faceoff win percentage, to avoid extreme results: identical to FO% if more than 20 faceoffs were taken, a combination of this and a 40% success ratio if fewer.
  • Zone starts: ZSO, ZSN and ZSD are the number of faceoffs taken in the offensive, neutral and defensive zones for which the player was present; ZSOoff, ZSNoff and ZSDoff are the number of faceoffs taken in the offensive, neutral and defensive zones for which the player was absent. ZSO% is the number of offensive-zone starts divided by the offensive plus defensive starts. ZSO%Off is the same share for when that player is absent; ZSO%Rel is ZSO% minus ZSO%Off.
  • GV are giveaways, TK are takeaways. HIT are hits delivered, HIT- are hits absorbed. None of these are recorded reliably in NHL buildings.
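For the zone-start and penalty entries, a short Python sketch of the arithmetic follows. Function names and the None returns for empty denominators are assumptions; the shrunken faceoff percentage is left out because the exact blend is not spelled out here.

```python
def zso_pct(zso, zsd):
    """Offensive-zone start share; neutral-zone starts are excluded."""
    total = zso + zsd
    if total == 0:
        return None
    return 100.0 * zso / total


def zso_rel(zso, zsd, zso_off, zsd_off):
    """ZSO%Rel: on-ice ZSO% minus the share when the player is absent."""
    on_ice = zso_pct(zso, zsd)
    off_ice = zso_pct(zso_off, zsd_off)
    if on_ice is None or off_ice is None:
        return None
    return on_ice - off_ice


def pend60(drawn, taken, toi_minutes):
    """PenD60: penalties drawn minus penalties taken, per 60 minutes."""
    if toi_minutes <= 0:
        return None
    return (drawn - taken) / toi_minutes * 60.0


# Hypothetical player: 60 offensive and 40 defensive starts on the ice,
# 45 and 55 off it; 12 penalties drawn and 8 taken in 600 minutes.
rel = zso_rel(60, 40, 45, 55)
net_pen = pend60(drawn=12, taken=8, toi_minutes=600)
```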
nhlscrapr updates now on GitHub
http://blog.war-on-ice.com/nhlscrapr-updates-now-on-github/
Sun, 11 Oct 2015 23:15:38 +0000

We will no longer be making updates to nhlscrapr on CRAN; instead you can get the most recent version from the war-on-ice repository on GitHub.

To use this, you’ll need to install the library devtools first, then once that’s loaded, use the command

install_github("war-on-ice/nhlscrapr")

This is updated for the 15-16 season and includes all our other most recent upgrades.

A Quick Note on Adjusted Save Percentage
http://blog.war-on-ice.com/a-quick-note-on-adjusted-save-percentage/
Fri, 11 Sep 2015 13:17:19 +0000

Different goaltenders face different distributions of shots from across the ice due to the offenses they face and the defenses in front of them. We adjust save percentage by re-weighing the components according to the league-wide distribution of shots, so that the value better translates between different goaltenders. This is similar to stratified sampling in survey methodology, and also goes by the name benchmarking.

With our danger breakdown, standard save percentage of Saves/(Saves + Goals) is expressed as

Sv% = (Saves_low + Saves_med + Saves_high)/(Saves_low + Goals_low + Saves_med + Goals_med + Saves_high + Goals_high)

The adjustment is to re-weigh every danger-based save percentage by the league-wide distribution of shots on goal in each zone:

AdSv% = (S_l/(S_l + G_l) * AllShots_l + S_m/(S_m + G_m) * AllShots_m + S_h/(S_h + G_h) * AllShots_h ) / (AllShots_l + AllShots_m + AllShots_h)

If the shots faced by the goalie have the same ratio as the league average, then their unadjusted and adjusted save percentages will be equal.
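Here is the AdSv% formula as a small Python sketch, with a check of the equal-ratio property. The dictionary keys and the function name are our own choices for illustration; a goalie with no shots faced at some danger level raises an error because that level's save percentage is undefined.

```python
def adjusted_save_pct(saves, goals, league_shots):
    """Danger-level save percentages re-weighted to the league shot mix.

    All three arguments are dicts keyed by danger level.
    """
    weighted = 0.0
    total = 0.0
    for level, league_count in league_shots.items():
        faced = saves[level] + goals[level]
        if faced == 0:
            raise ValueError("no shots faced at danger level " + level)
        weighted += saves[level] / faced * league_count
        total += league_count
    return weighted / total


league = {"low": 500, "med": 300, "high": 200}

# A goalie who faced an unusually dangerous mix: 100 shots at each level.
hard_mix = adjusted_save_pct(
    saves={"low": 97, "med": 92, "high": 85},
    goals={"low": 3, "med": 8, "high": 15},
    league_shots=league,
)

# A goalie whose mix matches the league (50/30/20): raw Sv% is 91/100.
same_mix = adjusted_save_pct(
    saves={"low": 48, "med": 27, "high": 16},
    goals={"low": 2, "med": 3, "high": 4},
    league_shots=league,
)
```

The first goalie's raw save percentage is 274/300, about .913, while the adjusted figure is .931, since the league as a whole sees far fewer high-danger shots than this goalie did.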

 

 

Building the Team
http://blog.war-on-ice.com/building-the-team/
Fri, 21 Aug 2015 19:56:57 +0000

The WAR On Ice operation is looking for new people to help keep the site operations going. Can we count on you?

We have a few objectives with the site, moving forward:

  1. Keep data accessible. We know that you like downloading things into a .csv and then doing magic with Excel, or R, or Python, or your abacus. We’re not going to change that. In fact, we’d like to make it easier for you to do.
  2. Act as a community portal for new projects. Have you been working on a project, and you’d like it hosted somewhere? Particularly if it uses our data to start with, if it survives vetting, we’ll work with you and help to get it online.
  3. Maintain accurate information. We want to uphold our reputation as a reliable source of data first and foremost.
  4. Solve the “bus problem”. That is: if one of us gets hit by a bus, the site should keep on going.

You should join up if you want to give back to the community that you enjoy so much, by honing your own data-ninja skills on stuff that you know will help people out.

You shouldn’t join up if:

  1. You want to make any money off this enterprise. Our ad revenue and donations feed straight back into server costs. If there’s anything left over, we donate back to the community or to charities recognized by the community.
  2. You want to become super famous. Your time would probably be better spent making YouTube videos or animated GIFs like others we love.
  3. You enjoy the blissful ignorance of how the data gets created in the first place and what could be wrong with it.

If you’re still interested, here are some more details about the site.

We tend to divide site operations into two categories:

  1. Database Creation: Our database is assembled by starting with a number of different sources: for in-game information, NHL.com and sportsnet.ca; for roster info, NHL.com and TSN; for contract and transaction data, we’ve curated our own sources. Getting this data into a usable format is no easy task, let alone integrating it all into a common framework. This is going to be even more work when the NHL decides to completely overhaul their data back-end, which we expect to happen sometime mid-season with no advance notice. Responsibilities to be taken on:
    • Scraper/Manipulator Extraordinaire: Maintain the creation of our master database from original game sources.
    • Contract/Cap Guru: Maintain and possibly add to our original database of contracts.
    • Database Manager: Take the previous two parts and integrate them together in the back end so anyone else in the community can use them with ease.
    • Quality Control: When someone does something new, run validations to make sure nothing else went to hell.
  2. App Creation: What good is data if we have no way to show it? Right now, we have a variety of “apps” running on WAR On Ice that allow users to query information and see it displayed in semi-aesthetically pleasing tables and charts. Yes, we know that they’re slow. That’s where you come in. You can write applications that will hook into our data — skater performance, contract information, whatever your heart desires. Right now, most of our apps are built in Shiny, but for longer-term solutions, they need to be translated into something more efficient and less server-intensive. (Servers are expensive). If everything is built upon a common database format, anyone can develop their own apps on top of this without necessarily relying on our coding method. Responsibilities to be taken on:
    • Existing apps: Take the old app structure from Shiny and rebuild them as you see fit from our back-end.
    • New apps: Build your own apps, extending the ones we already have. This may include working from your own ideas or ideas from the community.
    • Graphics Specialist: Design new interactive graphics for our current modes that are even spiffier, and shareable/exportable to a wider audience.

Still with us? If you are interested in joining the team, please send us a note to waronice.com@gmail.com. Please include: the responsibilities you’re interested in and any prior experience you have.

Sharing is Caring
http://blog.war-on-ice.com/sharing-is-caring/
Fri, 31 Jul 2015 18:59:01 +0000

Back in March, I met Darryl Metcalf for the first time at the Sloan conference. We were talking about how we had advertised that we would be open with our data and infrastructure, and he said something to the effect of “when?” And as I recall, my answer was “real soon”.

We’ve always taken requests for data and shared privately, but since the news of our comrade’s hiring, and our desire to stimulate further research, we’ve begun the process to share everything we’ve put together — and we mean everything — with the hopes that given our existing database format, it will be even easier for community members to make their own tools and run their own analyses.

Here’s what we’re posting publicly in the first round:

  • The raw output from nhlscrapr, including our integration from other sources and corrections for rink bias.
  • The full underlying database for all player, team and goaltender statistics.
  • By-game and by-season calculations for Goals Above Replacement.

The full file list is available here and will be updated as needed.

Here is a description for all variables in nhlscrapr and in the derived WAR files.

UPDATE: Here is the processed contracts table.

 

Playoffs Prediction Contest Scoreboard
http://blog.war-on-ice.com/playoffs-prediction-contest-scoreboard/
Fri, 19 Jun 2015 16:50:35 +0000

Entry names and final scores listed below.

Entry.Name Score
Perfect 0
Don’t Toews Me Bro! 6.8716238602
The Brass Bonanza 7.0910481644
microdino 7.2444384897
Corsi Hockey League 7.2457407713
Sean Dooley 7.272956604
Adam Odenwelder 7.3923966229
derek8 7.5417095449
Clown Predictions, Bro 7.6645163552
JohnScottScoringMachine 7.7063411263
Corsi Calamity 7.7112029266
HawksNumbers 7.7604368224
Ben Lutz 7.8195240411
BentleyNathan1 7.8491414946
Sebastian Mankowski 7.8646730078
danbowie 7.9074140647
Carolyn Tries Her Hand 8.1370624749
dvgmacdonald 8.1405948215
thehaze 8.204558261
trevor 8.2293711818
mikael johansson 8.2495068414
Jay32600 8.2540820936
Flesh and Bone 8.3085470955
Fancy Fenwicks 8.3136010606
whichocho 8.3401821184
Losing Entry 8.3442167105
YT 8.4366901828
sad bruins fan :( 8.4866993775
Tom.Andreu 8.5119277864
Andrei D 8.5239516775
Holmgren 8.5284431538
Danton Danielson 8.6351995803
Daniel Sandler 8.650645999
Smittens 8.6721579007
Ken Peterson 8.6911295819
Corey 8.7173200392
Ovi is God 8.7205833069
Eric Single 8.7439823599
Getzlaf’s Forehead 8.7872070859
QuickkNess 8.8246743919
altrockposeur 8.8278775422
Nilesh Shah 8.8403094971
StatsbyLopez 8.8705267611
Steven David 8.8833133041
Andrew Wisneski 8.8868548107
John Barr 8.9228851524
Adam R 8.9424973307
quack attack 8.9674498118
seanbailly 9.0954598132
Flashes of Quincy 9.1015593194
Kurtis Wells 9.1390202842
Midnight Ramblers 9.1484016026
Félix Magny 9.1547807581
The Math of Khan 9.1563240565
kncpt 9.1598845942
Sabres Win 9.1612333752
Matt Cane 9.163722536
evohnave 9.1648887848
Hero Squad 9.2113579453
Mcurcio94 9.2398682861
clib542 9.2730859542
Sapp Macintosh 9.2840653428
RangerSmurf 9.2915912403
gut feels 9.3410879975
Jason Richland 9.4032997817
MBwinz 9.4134528979
Iron Ringer 9.468006452
Legs Feed The Wolf 9.5184845615
Lugnut Ninety-Two 9.5650159312
Nick 9.5677243517
Andrew Pritchard 9.5986201249
Jenni 9.5990676441
Sean 9.6218580301
Mysonzdad 9.654913344
Josh Norman 9.6743593226
Pat Holden 9.7033732654
Adrian F 9.717788625
Neel 9.7563674943
Wolfram Ott 9.7777860246
PDOwned 9.7863596557
Nikm8 9.787107307
Peace On Liquid Water 9.7934054506
RyanPrice 9.8405501909
Jon Stolte 9.8851916059
ScuttlePuck 9.9107184334
Winterhawk11 9.9367867374
aasiaat 9.9585003239
Jon Garcia 9.9856430718
PhilKessel 10.1325615448
artigascruz 10.1368246784
The Philosopher 10.1498649026
Regression to the mean streak 10.1720356159
CBJ 2016 10.3383148238
Zone Entries 10.3457779324
Wrathman 10.3840926146
clarendonbandit 10.3850245284
Kerfuffle on Frozen Water 10.4104886866
Andrew Rasmussen 10.4440463984
bigMac 10.6044809869
UponFurtherReview 10.6437886061
amcassells 10.663771638
BlueMoons68 10.6866708283
Karan 10.7103734232
BFitz 10.7969523003
Not Gonna Win 10.839300026
Stefan 10.9414853807
ZachMacDonald 11.043075811
Chris Kang 11.1936846403
kd5mdk 11.3652086796
kadri’s drunks 11.6127684377
Ohno 11.7344710644
Micah Blake McCurdy 11.9241412719
Stanley’s Bandwagon 12.0942643644
the Flamingos 12.1588773778
JH 12.7173764514
fbourassa 13.4674255004
Paddyboy 14.963914817
Aerofan79 15.0526656878
A J G 16.1596523009
Sean Burke 16.3475285916
finlayj 20.0711222646
Recapped, geek
http://blog.war-on-ice.com/recapped-geek/
Fri, 29 May 2015 23:40:11 +0000

Summary: go to http://war-on-ice.com/cap/ to see what we’ve put together. Read below for what we have to share.

Back in January when CapGeek went offline, we grabbed a series of contract pieces from USA Today and the NHLPA website to help tide the community over. We didn’t have any plans to take the reins on a new site, partly because it would be a lot of work, partly because we didn’t have a clue where to get the data from an original source, but mainly because we hoped that the permanence of the shutdown was overstated.

What changed for us? Once the initial salary data went up, we were approached by those with the original, genuine contract data, from the same basic sources that Matthew had access to, who wanted to see the work continue. We then set to work converting it over into a new database, recruited volunteers to help us crawl through it (particularly the incomparable Alexandra Mandrycky), spent way too many hours bothering our resident cap expert (Mike Colligan) with questions, partnered up with other efforts (including the ninja himself, Greg Sinclair) and posted an initial set of contracts for all the players we could find who were active this past season at the end of February, along with a buyout and cap recapture tool. The reaction broke the site temporarily, so we knew we’d have to build a better infrastructure before we could go big.

Well, we’re ready to go big today. The “beta” version of our contracts database is now ready for consumption, reliably dating back to the 2009-10 season, and including everything we could find on contract structure, signing and performance bonuses (particularly the achievable A and B bonuses). Other features:

  • You can find any player contract or statistical breakdown from the homepage at war-on-ice.com.
  • We have a link under every contract to see what the buyout terms would be for any year after the first. (We’ve only found one example where a contract was bought out without a game being played — Tim Kennedy with the Sabres — and that was a result of an arbitration decision.)
  • We’ve made it clearer which contracts have slid, which have been bought out, and what years have been “retired” out.
  • We have a quick summary of active contracts, new signings, and performance bonuses achieved at the cap home, war-on-ice.com/cap.
  • We’ll be posting quick team summaries, including team-level obligations like buyouts, performance bonus overages and retained salaries, as soon as they’re ready.

We and our (friendly) competitors surely have a ways to go before the functionality of CapGeek is once again matched; Matthew Wuest put over five years of work into it, after all. And if our experience after ExtraSkater’s shuttering has taught us anything, it’s that a gap in the market draws fresher and better alternatives from those who would not have acted while one site dominated, so we’re expecting greater work from ourselves and others in following Matthew’s lead.

But of all the roles that CapGeek played — to have the most trusted database, the quickest reaction time to contract announcements, and the user-friendliest interface — we’re most confident that we can serve the community best in the first role, and so that’s where the bulk of our energy has gone. Which is why we’re opening all our data — contracts, game statistics, and (soon) transactions — to anyone who wants to use it (but not sell it), as long as we’re cited as the original source.

Let’s go forward together. With everyone jumping on board this train, this should be a fun off-season.

 

Site Terms of Use
http://blog.war-on-ice.com/site-terms-of-use/
Sun, 17 May 2015 14:21:00 +0000

We at war-on-ice.com built this site together for two reasons: to disseminate our own research ideas and purposes, and to have tools that everyone can use for their own research. To clarify our goals and our means, and in light of other sites having similarly stated policies, we state our positions on all of these matters.

1. The use of this site comes with absolutely no warranty.

2. We are not responsible for the consequences of what you do with the data — legally or morally.

3. Our site’s prime purpose is for research: you can use what you like for plain fandom, articles, personal exploration and knowledge, tweets, blog posts, and so forth. Our work to build the site started as the result of our needs for academic work and that’s still what we do with it first. In that spirit, we ask that you cite our work and, in particular, the specific page you obtained the data from so that others can obtain it as well. If you would like to show extra appreciation, the Donate button is on the site’s front page.

4. Automatically scraping our pages for data is not only strictly prohibited, it’s not worth your time to do it; it’s an unnecessary strain on our servers, and that’s why we have Download buttons. If you want data that’s more complicated than our current queries, you can ask us by email at waronice.com@gmail.com or on Twitter at @war_on_ice.

5. We reserve the right to offer our services in consulting arrangements to help you get the most from transforming and aggregating the data in meaningful ways. We offer what we can for free because we treasure the community and value openness, but we’re not able to take every request for free simply because we all have full time jobs.

6. You may not sell any raw data you obtain from the site. This is not just because you can get it from here for free and your customers would be fleeced; it’s because this comes from the league and their partners and it’s not our place to sell what they allow us to use. This is not to say you can’t use it in articles that are posted behind a paywall, or use it in anything that’s at all transformative. You know what we mean here; if you have any doubt, ask us.

7. Our underlying software is licensed under the GPLv2. We will be sharing what we can on GitHub for your convenience.

GUEST POST: Hockey and Euclid — Introduction to Bombay Ratings
http://blog.war-on-ice.com/bombay-ratings/
Mon, 20 Apr 2015 14:12:46 +0000

Note: This is part two of a series of guest posts written by @MannyElk. In the first installment of Hockey and Euclid, Manny outlined the player similarity calculation used in the Similarity Calculator. The explanations to follow will assume knowledge from that article, so we urge anybody who wishes to understand the derivation of the Bombay function in detail to get caught up.

We at www.war-on-ice.com are happy to host Manny’s newest Bombay Ratings App!  We continue to encourage others in the hockey research community to follow Manny’s lead and develop public applications that will further the frontiers of research in hockey analytics.

While working on the Similarity Calculator, I stumbled upon a study in which Euclidean distance was used to compare NBA players to Michael Jordan.  The author used the distances to generate a list of the most similar players to the man most would agree is the best ever.  From this idea, Bombay ratings were just a conceptual hop, skip and jump away.  Instead of choosing my own Michael Jordan from a list of historical players, I invented one.

Gordon Bombay (no relation to the legendary coach) played two seasons in the NHL.  In his first season, as a forward, he led the league in every single statistical category and eventually his team to the Stanley Cup.  Seeking a challenge, Bombay converted to defence in his second season.  Undaunted, he repeated his rookie season success, once again unmatched by his peers in every single facet of the game.  Bombay promptly retired, and no player since 2005 has been able to surpass his accomplishments.

Bombay’s stats at either position are equal to the best recorded values among regular skaters at that position since the 2005-2006 season.  Thus, he possesses the best stats we can imagine without stepping outside the boundaries of what real players have been able to accomplish.  If you don’t wish to entertain hypotheticals, consider an alternative explanation:  The similarity calculation evaluates “distance” between players, each occupying a position in imaginary space.  This space has as many dimensions as there are categories by which you choose to compare players, and the limits of each dimension are set by the maximum and minimum recorded values since 2005-2006.  Gordon Bombay is simply a marker we’ve decided to place at the positive-most position in space — the position where the positive extrema of each dimension meet. In a three-dimensional plot, this is simply a corner.  The Bombay Rating is the similarity between a player and Gordon Bombay.

Hence, we’ve laid the foundation for a method by which we can easily evaluate how “good” a player’s stats are, one that is surprisingly flexible and reasonably effective at producing intuitively pleasing results.  The Bombay function essentially does what we all do when we pull up a player’s statistics.  The advantage is that it’s more precise, quicker, and returns a single number.  Recall that the similarity calculation is a function of the chosen dimensions and corresponding weights.  It follows that the Bombay rating is a function of those same variables.  While this permits fluidity in what can be accomplished by the method, it also makes it entirely dependent on the quality of the measures used.
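To make the geometry concrete, here is a minimal sketch of a rating of this kind. It is not the app's actual code: the exact similarity formula is defined in the earlier Similarity Calculator article, so this sketch assumes each stat is rescaled between its worst and best recorded values and that similarity is one minus the weighted Euclidean distance to the "best" corner, divided by the largest possible distance. The function name, stat names, and numbers are all hypothetical.

```python
import math

def bombay_rating(player, best, worst, weights):
    """Similarity (0 to 100) between a player and the best-in-everything corner.

    Assumed formula, for illustration only: weighted Euclidean distance to the
    corner, divided by the largest possible distance, subtracted from 1.
    """
    sq_dist = 0.0
    max_sq_dist = 0.0
    for stat, weight in weights.items():
        # Rescale so 0.0 is the worst recorded value and 1.0 is the best.
        # Passing best and worst explicitly also handles "lower is better" stats.
        scaled = (player[stat] - worst[stat]) / (best[stat] - worst[stat])
        # Gordon Bombay sits at 1.0 in every dimension.
        sq_dist += weight * (1.0 - scaled) ** 2
        # The opposite corner is a full unit away in every dimension.
        max_sq_dist += weight
    return 100.0 * (1.0 - math.sqrt(sq_dist / max_sq_dist))

# Made-up limits and weights for two hypothetical categories.
best = {"P60": 4.0, "CF%": 60.0}
worst = {"P60": 0.0, "CF%": 40.0}
weights = {"P60": 2.0, "CF%": 1.0}

print(bombay_rating({"P60": 2.0, "CF%": 50.0}, best, worst, weights))
```

Under these assumptions a player holding the best recorded value in every category rates 100, and one holding the worst in every category rates 0. Changing the weights changes how much a shortfall in each category costs, which is why the rating depends so heavily on the measures and weights chosen.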

The Bombay app I developed uses a variety of 5v5 stats to assign ratings to skaters based on the selected weights and generate charts comparing players to Gordon Bombay in each of the chosen categories.

[Figure: screenshot of the Bombay Ratings app, showing a chart that compares a selected player to Gordon Bombay in each chosen category.]

The outer edge of the chart represents a 100% similarity to Bombay in that measure.  This is only achieved if a selected player-season possesses the best recorded value in that metric among regular skaters since 2005.  The dashed grey polygon represents another fictional player – one whose stats are all equal to the league average for regular skaters at that position.  Note that league average does not signify a 50% similarity.  At the default weights, this hypothetical average forward has a Bombay rating near 46 and the defenceman, 45.

I should confess that the default weights are largely arbitrary.  I believe the correct weights to use are case-dependent, and I certainly encourage users to assign their own.  I’ve found that using the “Defence” preset weights as a starting point to evaluate bottom-six or defensively-oriented forwards often produces more agreeable results.  Individual season rankings can be viewed by toggling the “Table” tab and further filtered using the inputs at the bottom of each column.  Using preset weights, the names atop the Forward rankings (Ovechkin, Sedin, Jagr, Crosby, Zetterberg, Malkin, Sakic) are who you’d expect; those atop the Defencemen rankings, to a much lesser extent (Visnovsky, Karlsson, Giordano, Byfuglien, Campbell, Niskanen, Weber).  It’s no secret that the evaluation of defencemen, by analytical and traditional methods alike, leaves something to be desired at times.  With better measures of defensive ability will come better results by this method.

Bombay ratings can easily be computed using aggregate player stats.  You can view career Bombays here.  While I wouldn’t necessarily trust default Bombays to provide a single number indicative of player quality over metrics like WAR and GvT, I believe the method has very interesting potential and flexibility. For one, it can easily be expanded as new stats become available.  Secondly, the same method can be applied in other leagues, namely Canadian Major Junior and college leagues.

]]>
http://blog.war-on-ice.com/bombay-ratings/feed/ 2
Stanley Cup Playoff Prediction Contest http://blog.war-on-ice.com/stanley-cup-prediction-contest/ http://blog.war-on-ice.com/stanley-cup-prediction-contest/#comments Sun, 12 Apr 2015 14:55:27 +0000 http://blog.war-on-ice.com/?p=423 Enter Here:  Submit your entry for the 2015 WAR On Ice Stanley Cup Playoff Prediction Contest (Round 1) here.  Rules are below.

Entries:  Only one entry permitted per person.  Violators of this rule will be disqualified (and publicly shamed on Twitter).  Sam, Andrew, and Alexandra will be participating, but their entries will be ineligible to win the prize.

Prize:  First place receives a $50 Amazon.com gift card.  All other entries will be awarded nothing, because you don’t play to lose in the Stanley Cup Playoffs.

Donations:  Entry is free, as this is a “for fun” contest only.  Although not required, we suggest making a small donation through the link on our home page.

Half of all donations received between now and the end of the Stanley Cup Playoffs will be forwarded to Colon Cancer Canada in memory of capgeek.com founder Matthew Wuest.  We also encourage people to donate directly to Colon Cancer Canada if they prefer.

Official Scores:  All statistics will be taken from www.war-on-ice.com.  Any stat changes that occur after noon on the day following the completion of the Stanley Cup Playoffs will not be taken into account.

Scoring:  Answers to each question will be standardized using the scale() function in R.  The Euclidean distance between each entry and the actual results will be calculated using the dist() function in R.  The entry with the lowest Euclidean distance from the actual results will be the winner.
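For readers who want to see how this scoring plays out, the following is a rough Python sketch of the same idea rather than the contest's actual R script. It assumes the actual results are stacked with the entries as one extra row before standardizing, and it mirrors the defaults of scale() (center each column, divide by the sample standard deviation) and dist() (Euclidean distance). The function names and sample data are hypothetical.

```python
import math
import statistics

def standardize_columns(rows):
    """Center each column and divide by its sample SD, like R's scale() defaults."""
    columns = list(zip(*rows))
    means = [statistics.mean(col) for col in columns]
    # statistics.stdev uses the n - 1 denominator, as R's sd() does.
    # A question where every row gives the same answer has SD 0 and would
    # raise ZeroDivisionError below; such questions need separate handling.
    sds = [statistics.stdev(col) for col in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]

def rank_entries(entries, actual):
    """Return (name, distance) pairs sorted from closest to farthest."""
    names = list(entries)
    # Assumption: the actual results are standardized together with the entries.
    scaled = standardize_columns([entries[name] for name in names] + [actual])
    actual_scaled = scaled[-1]
    distances = {name: math.dist(row, actual_scaled)
                 for name, row in zip(names, scaled)}
    return sorted(distances.items(), key=lambda pair: pair[1])

# Hypothetical answers to two numeric questions.
entries = {"alice": [10, 3], "bob": [20, 5], "carol": [14, 4]}
actual = [15, 4]
for name, distance in rank_entries(entries, actual):
    print(name, round(distance, 3))
```

Standardizing first keeps a question with a large numeric range (total goals, say) from swamping a question with a small range (number of sweeps, say) when the distances are added up.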

Subsequent Rounds:  Additional questions will be sent to participants for rounds 2, 3, and 4.  These will be sent shortly after the conclusion of the previous round.  Participants who fail to submit entries for rounds 2, 3, and 4 will be assigned a random response from one of the other contestants for each question.  Note that there will be limited time between rounds to complete these questions, so please plan accordingly.
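The random assignment for missed rounds can be sketched as follows. This is one reading of the rule, not the organizers' code: it assumes the draw is made independently for each question and only from contestants who actually submitted that round. The function name and data layout are hypothetical.

```python
import random

def fill_missing(responses, n_questions, rng=None):
    """Replace missing submissions (None) with answers drawn from other contestants.

    Assumption: each question is drawn independently, and only contestants who
    submitted are eligible donors. With no donors, rng.choice raises IndexError.
    """
    rng = rng or random.Random()
    filled = {}
    for name, answers in responses.items():
        if answers is not None:
            filled[name] = list(answers)
            continue
        donors = [other_answers
                  for other, other_answers in responses.items()
                  if other != name and other_answers is not None]
        # Pick a donor separately for every question.
        filled[name] = [rng.choice(donors)[q] for q in range(n_questions)]
    return filled

# Hypothetical round with one contestant who missed the deadline.
round_two = {"alice": [4, 11], "bob": [6, 9], "carol": None}
print(fill_missing(round_two, n_questions=2, rng=random.Random(2015)))
```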

Submission Times:  Any submission received after the first puck drops each round will be disqualified (for Round 1) or assigned random responses, as described in “Subsequent Rounds” above (for Rounds 2, 3, and 4).

Disclaimer:  We reserve the right to disqualify any entry for any reason at our discretion.

]]>
http://blog.war-on-ice.com/stanley-cup-prediction-contest/feed/ 0