Predicting NHL hockey injuries

I recently built a database of hockey injuries, player attributes and player performance covering the 2009-2010 to 2015-2016 seasons. Injury data are from Man Games Lost, player attributes are from Stattleship and player performance data are from several online sources. My goal is to build a tool that predicts hockey player injuries and the impact of those injuries on player and team success.

Importantly, this is not an epidemiological study, or a study of the health, economic or social impacts of hockey injuries. I may do that in the future, but good work has already been done in this area, so I am not sure I have much to offer. This study is an attempt to determine a) how predictable hockey injuries are and b) whether there are any general rules for understanding the likelihood of injury among NHL hockey players.

My first attempt at this is quick and dirty, but it is nonetheless interesting. In this analysis, I modelled injuries as a function of age, total games lost to injury in the previous 5 years and player weight. The formula for the model is:


I compared the predictions from the model (which is based on data from 2009 to 2015) to injury data in 2016, and found that the model predicts player injuries considerably better than chance. However, the predictions are not bounded at 0 and 82 (the minimum and maximum number of games a player can play in a season), so the model needs tweaking at the extremes. (Technical note: I used a linear model above, but also fit the data with a generalized linear model, negative binomial to be precise, and the direction and statistical significance of the coefficients were the same, so I am presenting the results of the linear model for ease of interpretation.) In general, the model is a good starting point. In the graphic below, you can see the actual man games lost by team for the 2015-2016 season (y-axis) plotted against the man games lost predicted by the model (x-axis):


The correlation between predicted injuries and real injuries is not perfect, but it is better than I expected. There are some outliers: Toronto had way more man games lost (455) than predicted by the model (135). This could be just unexplained model error (‘randomness’), or perhaps it has something to do with the Auston Matthews sweepstakes… Note that the model underpredicted injuries to my beloved Oilers; this is good news, actually. It suggests that the Oil may have just had an unlucky year, and can expect fewer man games lost next season!
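For readers who want to reproduce this kind of team-level comparison, here is a minimal Python sketch. It is not the code behind the graphic. It assumes per-player predictions are clipped to the 0 to 82 range (one simple stopgap for the unbounded predictions, not a proper fix), summed by team, and then compared to actual totals with a Pearson correlation. The team codes and all numbers are made up for illustration, except the Toronto pair quoted above.

```python
from collections import defaultdict
from math import sqrt

MAX_GAMES = 82  # games in a regular season, as stated in the post


def clip_prediction(games_lost):
    """Bound a raw model prediction to the feasible range [0, 82]."""
    return max(0.0, min(float(MAX_GAMES), games_lost))


def team_totals(player_rows):
    """Sum clipped per-player predictions into team-level man games lost."""
    totals = defaultdict(float)
    for team, predicted_games in player_rows:
        totals[team] += clip_prediction(predicted_games)
    return dict(totals)


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical (team, predicted games lost) rows; NOT real data.
players = [("AAA", -3.0), ("AAA", 12.5), ("BBB", 95.0), ("BBB", 20.0), ("CCC", 40.0)]
predicted = team_totals(players)

# Hypothetical actual totals for the same made-up teams.
actual = {"AAA": 30.0, "BBB": 110.0, "CCC": 35.0}

teams = sorted(predicted)
r = pearson([predicted[t] for t in teams], [actual[t] for t in teams])
print(f"correlation between predicted and actual: {r:.2f}")

# The one real pair quoted in the post: Toronto, 455 actual vs 135 predicted.
toronto_residual = 455 - 135
print(f"Toronto residual (actual - predicted): {toronto_residual}")
```

A large positive residual like Toronto's flags a team that lost far more games to injury than the model expected; whether that is noise or something systematic is exactly what the sketch cannot tell you.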

What’s interesting is the complexity of the relationship between age and previous injury. Here is a graph that shows the predictions of this model (holding weight at the NHL average):


This graph is not particularly intuitive, but most importantly, it says that injuries become more common with age overall, and that previous injury history predicts the risk of future injury. It also says that injuries early in a career are more predictive of future injuries than injuries later in a career. This seems plausible; a 25-year-old who has missed 100 games in the last 5 years might be fairly classified as ‘injury prone’. Some of these players have shortened careers, and others adapt their game and/or lifestyle in a way that helps them later in their career. Late career injuries may be career ending, but are also less likely to reflect some inherent vulnerability. A player who makes it to 35 in the NHL is probably more often fighting the effects of aging than some underlying vulnerability to injury.
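One way a linear model can produce this pattern is through an interaction between age and previous games lost. The sketch below is an assumption, since the fitted formula is not reproduced here: it uses a hypothetical `age_x_prior` term and invented coefficients, chosen only so that predicted injuries rise with age while the effect of injury history shrinks with age.

```python
def predicted_games_lost(age, prior_games_lost, weight, coef):
    """Linear predictor with an assumed age x prior-injury interaction."""
    return (
        coef["intercept"]
        + coef["age"] * age
        + coef["prior"] * prior_games_lost
        + coef["weight"] * weight
        + coef["age_x_prior"] * age * prior_games_lost
    )


def prior_injury_slope(age, coef):
    """Extra predicted games lost per additional prior game lost, at a given age."""
    return coef["prior"] + coef["age_x_prior"] * age


# Invented coefficients for illustration only; NOT the fitted values from the post.
coef = {
    "intercept": -20.0,
    "age": 1.0,            # positive: more games lost with age overall
    "prior": 0.9,          # positive: injury history predicts future injury
    "weight": 0.01,
    "age_x_prior": -0.02,  # negative: history matters less for older players
}

for age in (25, 35):
    slope = prior_injury_slope(age, coef)
    print(f"age {age}: each prior game lost adds about {slope:.2f} predicted games lost")
```

With a negative interaction coefficient, 100 games lost in the previous 5 years shifts the prediction more for a 25-year-old than for a 35-year-old, which matches the qualitative reading of the graph above.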

I need more data to properly untangle this, and need to model the data in a way that accounts for loss to follow-up or ‘censoring’.  Nevertheless, the fundamental question has probably been answered: injuries are not completely random, and we can use numeric statistical models to understand future man games lost.