Introduction
About seven years ago I had a graduate student working on geographic patterns of arson in Toronto. We published one of the chapters from her work, but the other one still lingers. It lingers because it was pretty clear to me that, although our analysis suggested some interesting processes at work, arson is a black swan.
What do I mean by black swan? Well, I mean it in the sense used by Nassim Taleb, a cranky statistical philosopher who authored three important books: Fooled by Randomness (my favourite), The Black Swan and Antifragile. Taleb’s focus is on black swan events–very rare but highly impactful phenomena that are very hard (if not impossible) to predict, but that we often come up with explanations for after the fact. Examples include stock market collapses and major terror attacks. Taleb argues that we give false authority to experts who claim to understand black swans, and recommends that instead of trying to predict or explain these events, we should learn how to build systems that actually benefit from black swans when they do happen.
Arson is a black swan spatial process because the realization of arson frequency in space is made up of a small amount of explainable variation (population, poverty, housing conditions, street permeability) and a whole lot of hard-to-explain variation. The unexplained variation could be due to many processes, known and unknown. For example, it could be driven by serial behaviour: an arsonist sets a large number of fires in a small area in one year, and then none in the next. We know that serial criminal behaviour occurs, but predicting it is hard (if not impossible). Or perhaps it has to do with some unknown process. In either case, our work on this problem strongly suggested that predicting where small clusters of intense arson activity will occur in the long run is a fool’s errand.
What’s the problem?
It is fairly easy to publish research showing only the explained part of a system, even if the explained part is a small component of the variation of the system overall. This is because any explanation (even a small one) seems to be of some value. If a physician tells you that you need to change your diet to reduce your blood pressure, she’s not making a specific prediction about what will happen to your blood pressure if you don’t change your diet. This is impossible to know. In fact, most variation in blood pressure is not caused by diet. Nevertheless, she’s using information that shows how diet explains some variation in blood pressure in populations as a whole to offer you advice that, on average, is probably helpful.
When we worked on this problem, my student came up with a model that explained some of the geographic variation in arson. She was even able to identify which areas of the city had higher and lower arson frequency in the long run. However, year-to-year black swans (what I suspect are probably clusters of unpredictable serial arson behaviour) made the predictions of arson quite poor, usually leading to major under-predictions. The following figure is illustrative:

Purple dots are the count of arson events across Toronto neighbourhoods–the counts vary widely from place to place, and considerably from year to year.
The purple dots on this figure are the actual number of arson events in a given year across Toronto neighbourhoods. In some neighbourhoods there are over 35 arson events in a year, but the city average is around 2 or 3. Attempts to model these data using things like population, commercial activity, poverty, street permeability and other factors can’t predict these extreme variations, and we never found a term to put into a model that picks up much of this variation in a training data set and can then predict the variation in other data sets.
(Technical note: what’s particularly funny about the example above is that the ‘best’ performing model structure here is the old-fashioned linear model, seemingly because it picks up the possibility of extreme variations better than models that attempt to parameterize it through some specific model link structure or some scaling parameter.)
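To make the idea concrete, here is a minimal simulation sketch in Python. It does not use our Toronto data or my student's model; every number in it (150 neighbourhoods, a single made-up "poverty" score, a 5% chance of a burst of 15 to 35 extra fires) is a hypothetical assumption chosen only to mimic the pattern described above. It fits a one-variable linear model to one simulated year and then checks how well that fit predicts a second simulated year.

```python
import math
import random


def poisson(rng, lam):
    """Draw one Poisson(lam) variate using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_year(poverty, rng, burst_prob=0.05):
    """Hypothetical counts: a modest poverty-driven baseline plus rare bursts."""
    counts = []
    for p in poverty:
        count = poisson(rng, 1.0 + 3.0 * p)   # the explainable part
        if rng.random() < burst_prob:         # rare, unpredictable burst
            count += rng.randint(15, 35)
        counts.append(count)
    return counts


def fit_line(x, y):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def r_squared(y, pred):
    """Share of variance in y accounted for by pred (can be negative out of sample)."""
    my = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def run_demo(seed=1, n=150):
    """Fit on one simulated year, then score the fit against the next year."""
    rng = random.Random(seed)
    poverty = [rng.random() for _ in range(n)]   # made-up covariate in [0, 1)
    year1 = simulate_year(poverty, rng)
    year2 = simulate_year(poverty, rng)
    intercept, slope = fit_line(poverty, year1)
    pred = [intercept + slope * p for p in poverty]
    return {
        "r2_train": r_squared(year1, pred),
        "r2_next_year": r_squared(year2, pred),
        "max_observed": max(year2),
        "max_predicted": max(pred),
    }


if __name__ == "__main__":
    for name, value in run_demo().items():
        print(f"{name}: {value:.2f}")
```

Under these assumptions the fitted line should account for only a small share of the variance, and its largest prediction should sit far below the largest simulated count, because the bursts land in different neighbourhoods each year and nothing in the covariate anticipates them.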
Conclusion
Given enough data, anyone can model some of the pattern in almost any phenomenon. The fundamental question is: how well can your model predict future patterns? For this arson project, the predictions were just not compelling enough; sure, we could predict the relatively higher and lower arson neighbourhoods, and some of the factors that may explain some variation from neighbourhood to neighbourhood. However, the real challenge is being able to predict the extremes–the neighbourhoods that are suddenly targeted by serial arsonists, resulting in a large number of arson events that whip up fear and threaten the safety of a community. This is not a simple task, and we certainly had no luck with it, so the chapter sits unpublished.
This also points to the importance of context; a model that explains a small amount of variation in a system might be very useful if that knowledge can save lives, or save money. In this case, I did not feel that our model was useful for anything–not for arson prevention, policing, urban design, etc.–even if it did explain some variation in arson. However, going back to the hypertension example, there is evidence that a little information about diet and hypertension might be useful at a population scale.