Monthly Archives: July 2016

Hockey GM tenure length

I scraped a little data from the internet to analyze a couple of simple questions:

  1. What is the average tenure of an NHL GM?
  2. Is there a difference in the length of tenure between Canadian and U.S. based teams?

The data include GMs who worked in the 1950s onward.

You can access the data here.

Results

1. First, GMs last an average of about 5.5 years in their job.  I didn’t break the data down by month, so this number could be off by a few tenths of a year.  But that’s the ballpark figure.  The old-time GMs (like Art Ross, Frank Selke and Harry Sinden) are outliers that drag the mean upward, so it should be no surprise that the median tenure is lower, at 4 years. Here’s a histogram of the distribution:

hockey_GM_histogram

2. There is a clear difference in the average (and median) length of GM tenure based on whether or not they are employed by a Canadian team.

          Mean (years)   Median (years)
U.S.      5.76           4
Canada    4.79           3
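
For anyone who wants to repeat this kind of comparison, here is a minimal sketch of the mean and median calculation by team location. The tenure values and the `summarize` helper are made up for illustration; they are not the scraped data and will not reproduce the numbers in the table.

```python
from statistics import mean, median

# Hypothetical tenure records: (team location, years in the job).
# These values are invented for illustration, not taken from the real data set.
tenures = [
    ("US", 2), ("US", 4), ("US", 4), ("US", 15),
    ("Canada", 1), ("Canada", 3), ("Canada", 3), ("Canada", 9),
]

def summarize(records):
    """Return {location: (mean, median)} of tenure length in years."""
    by_location = {}
    for location, years in records:
        by_location.setdefault(location, []).append(years)
    return {loc: (mean(vals), median(vals)) for loc, vals in by_location.items()}

summary = summarize(tenures)
for location, (avg, med) in summary.items():
    print(f"{location}: mean={avg:.2f}, median={med}")
```

Note how the single long tenure in each toy group pulls the mean above the median, which is the same pattern the old-time GMs produce in the real data.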

Conclusions

I can’t say anything about cause here–maybe GMs working for Canadian teams are forced to be more budget conscious?  Or maybe Canadian fans are more fickle?  Or the Canadian media is more critical?  Who knows, and who can even say the trend will continue into the future?

However, if GMs know about this trend, it could influence how they think about running teams.  A GM who has to worry about employment may make decisions more likely to maximize short-term success even at some expense of long-term success.  This could involve trading away young talent or future picks for veterans, or signing players to long-term deals in exchange for a short-term payoff.  Seems like something I’d keep an eye on if I were the owner of a Canadian team…

Racial bias in police shootings

I used the Washington Post data on 1499 killings of Americans by police officers from 2015 through July 2016 to analyze the following question:

Are African Americans who are killed by police officers more likely to be unarmed than non-African Americans killed by police?  I am interested in this specific question because answering it gives a fairly clear indication of whether African Americans are treated differently when they encounter police.  One concern about racial bias is that police may (knowingly or unknowingly) be more likely to interpret a situation as dangerous depending on characteristics of the person they encounter.  If we assume that African Americans and non-African Americans are equally likely to be armed in any encounter with police (it’s hard to know if this assumption is true) then any difference in the probability of being unarmed when shot and killed by police might depend on how the officer is interpreting the situation, and may reflect a racial bias.

The analysis is simple.  I predicted the log-odds of being armed (classified two different ways: one as armed vs. unarmed and one as armed vs. unarmed or armed with a toy weapon) as a function of race (either African American or Other) using logistic regression.  I added ‘signs of mental illness’ and ‘gender’ to the model as well, but they aren’t consequential.  If you don’t care about the statistics, skip to the interpretation section below.
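
For readers who want to see the mechanics, here is a minimal sketch of a logistic regression with a single binary predictor, fit by Newton-Raphson in plain Python. The counts are hypothetical, the `fit_logistic` helper is my own illustration rather than the code used for this post, and the sketch leaves out the mental illness and gender covariates.

```python
import math

def fit_logistic(xs, ys, iterations=25):
    """Fit logit(P(y=1)) = b0 + b1*x by Newton-Raphson; returns (b0, b1)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        # Accumulate the gradient (g) and the negative Hessian (h) of the log-likelihood.
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system h * step = g in closed form.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical counts, invented for illustration only:
# x = 1 for African American, 0 for Other; y = 1 for unarmed, 0 for armed.
xs = [1] * 100 + [0] * 100
ys = [1] * 30 + [0] * 70 + [1] * 20 + [0] * 80

b0, b1 = fit_logistic(xs, ys)
print(f"intercept={b0:.3f}, race coefficient={b1:.3f}")
```

With only one binary predictor, the fitted intercept is simply the log-odds in the reference group, and the coefficient is the difference in log-odds between the two groups.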

Results

The results for predicting the log-odds of ‘unarmed’ or ‘toy weapon’:

results3

The results for predicting the log-odds of ‘unarmed’:

results4

Converting the predicted log odds into a probability, we get:

Probability (as %) of an African American being armed when killed by police: 64.1%

Probability (as %) of a non-African American being armed when killed by police: 71.0%
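
The conversion between log-odds and probability is the inverse logit. As a rough sketch (the function and variable names are mine, and only the two probabilities come from the results above), the round trip and the implied odds ratio look like this:

```python
import math

def inv_logit(log_odds):
    """Convert log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))

def logit(p):
    """Convert a probability to log-odds."""
    return math.log(p / (1.0 - p))

# Probabilities of being armed, as reported above.
p_african_american = 0.641
p_other = 0.710

log_odds_aa = logit(p_african_american)
log_odds_other = logit(p_other)

# Odds ratio of being armed (African American vs. Other), recovered from the log-odds.
odds_ratio = math.exp(log_odds_aa - log_odds_other)

# Absolute gap in percentage points.
diff_points = (p_other - p_african_american) * 100

print(f"odds ratio={odds_ratio:.2f}, gap={diff_points:.1f} percentage points")
```

The gap between the two groups is an absolute difference in percentage points, which is not the same thing as the odds ratio the regression coefficient describes.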

I ran the same analysis using data from the Mapping Police Violence site (based on their classification of unarmed), and found roughly the same effect:

results5

Interpretation

Based on these data (which are publicly available and can be openly scrutinized) the results are unequivocal: African Americans who are killed by police are about 7 percentage points less likely to be armed than non-African Americans killed by police (64.1% versus 71.0%).  Perhaps this is because African Americans are less likely to be armed in police encounters when compared to non-African Americans.  It may also reflect a racial bias in the interpretation of danger by police officers.  Keep in mind that this analysis implicitly controls for any differences in the rates of crime by ethnic status; all the subjects in the analyses were shot and killed by police, but African Americans were simply less likely to be armed.

Predicting NHL hockey injuries

I recently built a database of information about hockey injuries, player attributes and player performance for the 2009-2010 to 2015-2016 seasons.  Injury data are from Man Games Lost, player attributes are from Stattleship and player performance data are from several online sources.  My goal is to come up with a tool that predicts hockey player injury, and the impact of injuries on player and team success.

Importantly, this is not an epidemiological study, or a study of the health, economic or social impacts of hockey injuries.  I may do that in the future, but good work has already been done in this area, so I am not sure I have much to offer.  This study is an attempt to determine a) how predictable hockey injuries are and b) whether there are any general rules for understanding the likelihood of injury among NHL hockey players.

My first attempt at this is quick and dirty, but is, nonetheless, interesting.  In this analysis, I modelled injuries as a function of age, total games lost to injury in the previous 5 years and player weight.  The formula for the model is:

formula

I compared the predictions from the model (which is based on data from 2009 to 2015) to injury data in 2016, and found that the model predicts player injuries considerably better than chance, although the predictions are not bounded at 0 and 82 (the minimum and maximum number of games a player can miss in a season) so it needs tweaking at the extremes.  (Technical note: I used a linear model above, but also fit the data with a generalized linear model, negative binomial to be precise, and the direction and statistical significance of the coefficients were the same, so I am presenting the results of the linear model for ease of interpretation.)  In general, the model is a good starting point.  In the graphic below, you can see the actual man games lost by team for the 2015-2016 season (y-axis) plotted against the man games lost predicted by the model (x-axis):

team_man_games_lost

The correlation between predicted injuries and real injuries is not perfect, but it was better than expected. There are some outliers–Toronto had way more man games lost (455) than predicted by the model (135).  This could be just unexplained model error (‘randomness’), or perhaps has something to do with the Auston Matthews sweepstakes…  Note that the model underpredicted injuries to my beloved Oilers; this is good news, actually.  This suggests that the Oil may have just had an unlucky year, and can expect fewer man games lost next season!
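
Two of the steps mentioned above are easy to sketch: clipping a linear model's predictions to the 0 to 82 range, and measuring how well predicted and actual team totals line up with a Pearson correlation. Everything in this example is an assumption for illustration (the helper names and all the numbers are made up); it is not the code or data behind the plot.

```python
import math

MAX_GAMES = 82  # games in an NHL regular season

def clip_prediction(games_lost, low=0.0, high=MAX_GAMES):
    """Force a raw linear-model prediction into the feasible range."""
    return max(low, min(high, games_lost))

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# Hypothetical raw per-player predictions, some outside the feasible range.
raw_predictions = [-3.2, 5.0, 12.5, 90.1]
clipped = [clip_prediction(v) for v in raw_predictions]

# Hypothetical team totals (predicted vs. actual man games lost).
predicted_team = [120, 150, 200, 240, 310]
actual_team = [140, 130, 260, 220, 330]
r = pearson(predicted_team, actual_team)

print(clipped)
print(f"correlation={r:.2f}")
```

Clipping is only a patch; a count model such as the negative binomial mentioned in the technical note handles the lower bound more naturally.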

What’s interesting is the complexity of the relationship between age and previous injury. Here is a graph that shows the predictions of this model (holding weight at the NHL average):

hockey_injury

This graph is not particularly intuitive, but most importantly, it says that injuries become more common with age overall, and that a previous injury history predicts the risk of future injury.  It also says that injuries early in a career are more predictive of future injuries than injuries later in the career.  This seems plausible; a 25-year-old who has missed 100 games in the last 5 years might be fairly classified as ‘injury prone’.  Some of these players have shortened careers, and others adapt their game and/or lifestyle in a way that helps them later in their career.  Late career injuries may be career ending, but also less likely to reflect some inherent vulnerability.  A player who makes it to 35 in the NHL is probably more often fighting the effects of aging than some underlying vulnerability to injury.
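
One way a linear model can produce this pattern is through an age by previous-injury interaction term. The sketch below assumes such a term and uses coefficients invented purely to illustrate the shape; they are not the fitted values from the model, and weight is left out.

```python
# Hypothetical coefficients, chosen only to illustrate the pattern described above.
B_INTERCEPT = -5.0
B_AGE = 0.4
B_PREV = 0.35        # effect of each previously lost game
B_AGE_PREV = -0.008  # negative interaction: the effect of past injuries shrinks with age

def predict_games_lost(age, prev_games_lost):
    """Predicted man games lost under the assumed interaction model."""
    return (B_INTERCEPT
            + B_AGE * age
            + B_PREV * prev_games_lost
            + B_AGE_PREV * age * prev_games_lost)

# Extra predicted games lost for a player with 100 previous games lost vs. none.
extra_at_25 = predict_games_lost(25, 100) - predict_games_lost(25, 0)
extra_at_35 = predict_games_lost(35, 100) - predict_games_lost(35, 0)

# Baseline risk still rises with age for a player with no injury history.
baseline_25 = predict_games_lost(25, 0)
baseline_35 = predict_games_lost(35, 0)

print(f"extra at 25: {extra_at_25:.1f}, extra at 35: {extra_at_35:.1f}")
```

With a negative interaction coefficient, the same injury history adds more predicted games lost for a 25-year-old than for a 35-year-old, even though the baseline prediction rises with age.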

I need more data to properly untangle this, and need to model the data in a way that accounts for loss to follow-up or ‘censoring’.  Nevertheless, the fundamental question has probably been answered: injuries are not completely random, and we can use numeric statistical models to understand future man games lost.