Mischief on Wikipedia

About a decade ago, I made a fraudulent entry on Wikipedia. I was curious how long it would take for an obscure, and false, entry posted on Wikipedia to be removed by content experts. The entry was an addition to a list of Lithuanian gods:

wiki_record

There is no Lithuanian god named Vytautus (which is usually spelled ‘Vytautas’) and no ‘god of good grooming’ in any pantheon of gods that I am aware of.

Well, the entry stayed on Wikipedia for several years, and was then removed, I assume, by an actual authority on Lithuanian gods. However, sometime between when I inserted the entry and 2013, BBC broadcaster Andrew Marr (or his researchers) must have discovered the entry, because he wrote the following in his book, A History of the World (which accompanied a BBC television series).

wikipedia_excerpt

I’m not sure there is any lesson here that everyone doesn’t already know: Wikipedia is a good starting point for research, but cannot be treated as the sole authority on most subjects. I suppose it is encouraging that some expert did eventually delete the fraudulent content, which suggests that perhaps the truth on Wikipedia can emerge given enough time and scrutiny. Then again, now that it is in a book, perhaps it should be added back into Wikipedia, with appropriate sourcing, of course…;)

Ecological analysis of homicide rates

Introduction

About two years ago I scoured the internet to put together some international statistics on homicide.  My intent was to determine what explains the geographic variation in homicide rates worldwide.  I did some quick analysis on these data, and I think some of the findings are interesting.

Methods

The data I use are all from public internet sources, though I can’t remember which for all of them.  There may be some errors here and there, but I am pretty confident of the data set as a whole.  I excluded countries with incomplete data.

I am using a linear regression model to predict the natural log of the homicide rates as a function of a set of predictor variables.  I use the log of the rate because the distribution of homicide rates is non-normal.  By using the natural log, I can use standard linear regression modelling in a way that is reasonably sound, and relatively easy to interpret.
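As a rough illustration of this kind of model (not the SAS code or the data used for this post), here is a minimal Python sketch that regresses the natural log of a rate on a single predictor with ordinary least squares. The data are synthetic, and the function name `fit_log_linear`, the sample size, and the "true" slope are assumptions made up for the example.

```python
import numpy as np

def fit_log_linear(X, rate):
    """Fit ln(rate) = b0 + X @ b by ordinary least squares.

    Returns the coefficient vector (intercept first) and the R-squared
    of the fit on the log scale.
    """
    y = np.log(rate)
    A = np.column_stack([np.ones(len(y)), X])  # add an intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    centered = y - y.mean()
    r2 = 1.0 - (resid @ resid) / (centered @ centered)
    return beta, r2

# Synthetic data only: a made-up Gini-like predictor and a made-up slope.
rng = np.random.default_rng(0)
gini = rng.uniform(25, 60, size=200)
true_slope = 0.025
rate = np.exp(-0.5 + true_slope * gini + rng.normal(0, 0.3, size=200))

beta, r2 = fit_log_linear(gini.reshape(-1, 1), rate)
print("estimated slope:", beta[1])
print("R-squared on the log scale:", r2)
```

Because the outcome is on the log scale, the R-squared describes variation in log rates, and each slope is read as an approximate proportional change in the rate per unit of the predictor.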

The predictor variables were selected based on what I’ve read and heard about the social and structural influences on homicide rates. These include: income inequality (in the form of a Gini coefficient), gross domestic product per capita (with a squared term), religiosity (as measured by a 2009 Gallup poll), the proportion of the population aged 20-24, literacy rate, and gun ownership rate based on a compilation of data on Wikipedia. I also include dummy terms for region of the world. I considered adding terms for conflict zones and the drug trade, but these measures seemed too qualitative, so I left them out of the model.

I ran the models in SAS using PROC REG.  You can download the data here.

Results

Visually, the geographical differences in homicide rates at the global scale are striking.

homicide1

 

Rates are two orders of magnitude higher in some Central American countries than they are in the countries with the lowest homicide rates (like Japan).  Within regions there is considerable variation too.  For example, in Europe, Russia has the highest rate (at over 10 per 100,000) while Iceland has a rate less than 1 in a million.

homicide2

So the next task is to find out what might explain this geographic variation (as far as it can be explained using ecological data).

First, the results of the full model with all predictor variables:

homicide3

As we can see from the top part of Table 1, the R² is fairly large; it indicates that roughly 63% of the variation in (log) homicide rates between countries is explained by the predictors in this model. Since I am modelling a human system, and the year-to-year vagaries of human systems call the explanatory power of these models into question, we should expect weaker predictive power in the real world. But the model clearly does a pretty good job of accounting for global-scale geographic variation in past homicide rates.

For every 1 unit increase in the Gini coefficient, there is roughly a 2.5% (0.02537*100) increase in the homicide rate. To put this in context, all else being equal, if the U.S. had a Gini coefficient equal to Canada’s, according to the model, the U.S. would have a 31% lower homicide rate. It would still have a higher homicide rate than Canada, but it would be more in line with other OECD countries.

Interpreting GDP is more complicated. The linear term suggests that for every $1,000 increase in GDP per capita there is a 6% decline in the homicide rate. But the squared term suggests that this curve flattens out. Nevertheless, it would appear that both relative wealth and absolute wealth are important predictors of geographic variation in homicide rates at the global scale.
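To make the Gini interpretation concrete, here is a small Python sketch of how a coefficient from a log-rate model converts into a percent change. Only the coefficient 0.02537 comes from Table 1; the function name `pct_change` and the 12-point gap are hypothetical values chosen for illustration, not the actual U.S. and Canadian figures behind the 31% estimate.

```python
import math

GINI_COEF = 0.02537  # Gini coefficient estimate quoted from Table 1

def pct_change(coef, delta):
    """Percent change in the rate when a predictor changes by `delta`,
    for a model of the form ln(rate) = ... + coef * predictor."""
    return (math.exp(coef * delta) - 1.0) * 100.0

# Exact effect of a 1-unit increase; close to the coef * 100 shortcut.
one_point = pct_change(GINI_COEF, 1.0)

# Hypothetical example: a country lowering its Gini by 12 points.
hypothetical_gap = 12.0
drop = pct_change(GINI_COEF, -hypothetical_gap)

print(f"1-point increase: {one_point:.2f}% change in the rate")
print(f"{hypothetical_gap:.0f}-point decrease: {drop:.1f}% change in the rate")
```

For small coefficients the shortcut of multiplying by 100 is very close to the exact value, but the two drift apart as the change in the predictor gets larger, so the exact form is the safer one for country-to-country comparisons.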

None of religiosity, literacy rates, rates of gun ownership, or the proportion of the population aged 20-24 explains a significant degree of geographic variation in homicide rates. However, there are distinct regional differences in homicide. Compared to North America and Europe, Central American countries have a 178% higher homicide rate independent of the other model variables. It is very plausible that the higher rate in Central America is partly explained by connections to the drug trade, though perhaps not entirely. Costa Rica and Nicaragua are part of the northward flow of drugs, but they don’t have homicide rates as high as other Central American countries, or even Mexico. Still, it seems reasonable to suggest that connections to North American illicit drug consumption could explain at least the difference between the Central and South American homicide rates (about 71%). Since there are roughly 50,000 homicides in Central America each year, this means around 35,000 could be accounted for by the North American drug market.
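The back-of-the-envelope count above can be written out in a few lines of Python. The 50,000 total and the 71% gap are the figures from the text; the two function names are hypothetical. The sketch also shows that the result depends on how the 71% is read: as a share of Central American homicides (the reading that gives roughly 35,000), or as Central America's rate being 71% higher than South America's, which implies a smaller excess.

```python
def excess_if_share_of_total(total, share):
    """Excess homicides if `share` is the fraction of the total
    attributed to the regional difference."""
    return total * share

def excess_if_relative_increase(total, rel_increase):
    """Excess homicides if the region's rate is `rel_increase` higher
    than the comparison region's (0.71 means 71% higher)."""
    return total * rel_increase / (1.0 + rel_increase)

TOTAL_HOMICIDES = 50_000  # rough annual figure from the text
GAP = 0.71                # Central vs. South American difference from the text

reading_a = excess_if_share_of_total(TOTAL_HOMICIDES, GAP)
reading_b = excess_if_relative_increase(TOTAL_HOMICIDES, GAP)

print(f"71% as a share of the total: {reading_a:,.0f}")
print(f"71% as a relative increase:  {reading_b:,.0f}")
```

Either way the estimate is in the tens of thousands, which is the order of magnitude that matters for the argument.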

I also stratified the analysis by geography, keeping only the terms that are statistically significant. In this model, I include only wealthier countries (Europe and North America):

homicide4

According to Table 2, absolute wealth doesn’t explain variation in homicide rates between wealthy western countries. Instead, much of the variation (about 51%) can be explained by just two variables: income inequality and the proportion of young people aged 20-24. This seems consistent with the Wilkinson hypothesis; specifically, that income inequality may be an important influence on a country’s mortality rate.

In Table 3 we see that absolute wealth is still important for explaining differences in homicide rates outside of Europe and North America.  I suspect this effect would have been stronger had I included Japan, South Korea, New Zealand and Australia in the North America and Europe group rather than this one.

homicide5

Conclusions?

I think there are two important observations here.  First is the role of geography in helping us understand the magnitude of homicide explained by the drug trade.  Central American homicide rates are very high after controlling for demographics, income, gun ownership, religiosity and income inequality, and the best explanation for this is probably proximity to the U.S.  The difference in homicide rates within Central America may be associated with how drugs move from South America to the U.S. market.  The UNODC has a fairly comprehensive analysis of how some countries have higher levels of organized crime and other gang activity than others.  This could help explain the variations within Central America.  But in any case, this analysis is consistent with the assertion that tens of thousands of lives are lost every year as a result of the international trafficking of illicit drugs, particularly into the North American market.

The second observation is the role of income inequality. While much of the gun violence debate in the U.S. has focused on gun control, it could be that the more proximate cause of higher-than-expected U.S. homicide rates is income inequality. Income inequality has no easy policy fix, or at least not one without major political turmoil. Perhaps some wealth transfer from gun owners (as a social externalities excise tax) could be used to help close the income gap? This could feed two birds with one hand, as my wife likes to say.