# Problematic statistical maps

A few years ago the Hamilton Spectator published an interactive website on cancer data in Hamilton as part of its Code Red series. A number of stories on the subject highlighted the variation in cancer across the city, and its apparent tendency to cluster in lower Hamilton.

Here is a screen capture of one of the maps:

The map is interactive, in so far as you can hover over the census tracts (roughly equivalent to neighbourhoods) to see the local cancer risks, as well as some other information about the tract.

There are four important things wrong with this graphic.

1. At first glance, everyone should be terrified when they look at this information: these data suggest that the average person over 45 years of age has somewhere between a 12% and 19% chance of getting cancer every year! The problem is that these incidence rates are very wrong. The annual age-standardized all-cancer incidence in Ontario is around 5-6 per 1000. For persons over 45, the rate is around 12-15 per 1000, which corresponds to an annual risk of around 1.5%. This map suggests a risk roughly 10 times that. I had some discussions about this issue with the authors of this map a few years ago, and I recommended changing how the information was displayed, but they chose to leave the maps unchanged.
2. The rates are not age-standardized. Age standardization is used to correct maps for the effect that geographic variation in age can have on disease and death. The primary risk factor associated with almost all cancers is age, and the effect of age on cancer risk is so strong that we usually remove it so we can focus on causes that are modifiable (like smoking). This map does not properly correct for geographic variation in age; it only restricts the data to persons over 45 years of age. This means that some (perhaps even most) of the geographic variation in this map is still due to geographic variation in age.
3. The incidence rate for the Barton Street East location is reported as 340 per 1000. This is almost certainly a statistical anomaly due to the small numbers problem. When population sizes are small, incidence and mortality rates can often appear anomalously high (or low) when a disease is rare (like cancer). Part of the reason I suspect this explains the high rate in this tract is that census indicators are missing from the pop-up table associated with this community, a problem most typical of small population areas. At the very least, this rate should be looked into more carefully.
4. The actual geographic variation in risk in Hamilton is fairly small, but the colour gradient creates a striking visual contrast that is, in my view, misleading. If we divide the rate in the highest rate tract (excluding the Barton East anomaly) by the rate in the lowest rate tract, we get the largest rate ratio, which measures the greatest degree of variation in rates between geographic areas. In this map, the largest rate ratio is around 1.2. This means that the highest rate tract has a cancer incidence rate 20% larger than the lowest rate tract. This is not a huge difference in risk, particularly for cancer, which is fairly rare. It would mean that at most we see a handful more cases of cancer in the highest risk tract compared to the lowest risk tract, and that’s before accounting for differences due to chance or due to uncontrolled variation in age between tracts. The ‘red’ colour compared to the ‘green’ colour suggests a more dire reality, one where people in some regions of the city are at significantly greater risk of cancer.
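
To make points 2 and 3 more concrete, here is a minimal Python sketch. Every number in it is invented for illustration and none comes from the Code Red data; the age bands, tract names, standard weights and function names are all assumptions of mine. It shows two hypothetical tracts with identical age-specific rates whose crude rates still differ because one tract is older, and it shows how wide an approximate 95% interval becomes when a rate rests on only a few cases. This is not a reconstruction of how the map was built.

```python
import math

# All figures below are hypothetical and chosen only to illustrate the ideas.
# Annual age-specific incidence per 1000, assumed identical in both tracts.
AGE_SPECIFIC_RATE = {"45-64": 8.0, "65+": 24.0}

# Hypothetical tract populations by age band.
TRACT_POPULATION = {
    "younger tract": {"45-64": 1600, "65+": 400},
    "older tract": {"45-64": 800, "65+": 1200},
}

# Hypothetical standard population weights (they sum to 1).
STANDARD_WEIGHTS = {"45-64": 0.6, "65+": 0.4}


def crude_rate(population, rates):
    """Crude rate per 1000: expected cases divided by total population."""
    cases = sum(population[age] * rates[age] / 1000 for age in population)
    return 1000 * cases / sum(population.values())


def directly_standardized_rate(rates, weights):
    """Direct standardization: weight age-specific rates by a standard population."""
    return sum(weights[age] * rates[age] for age in weights)


def approx_rate_interval(cases, population, z=1.96):
    """Approximate 95% interval for a rate per 1000.

    Uses a normal approximation on the log scale for a Poisson count,
    which is rough when the number of cases is very small.
    """
    rate = 1000 * cases / population
    factor = math.exp(z / math.sqrt(cases))
    return rate / factor, rate * factor


if __name__ == "__main__":
    for tract, population in TRACT_POPULATION.items():
        crude = crude_rate(population, AGE_SPECIFIC_RATE)
        print(f"{tract}: crude rate {crude:.1f} per 1000")

    standardized = directly_standardized_rate(AGE_SPECIFIC_RATE, STANDARD_WEIGHTS)
    print(f"both tracts: age-standardized rate {standardized:.1f} per 1000")

    # Same underlying rate (15 per 1000), very different certainty.
    for cases, population in [(3, 200), (300, 20000)]:
        low, high = approx_rate_interval(cases, population)
        print(f"{cases} cases in {population} people: "
              f"roughly {low:.1f} to {high:.1f} per 1000")
```

In this toy example the older tract's crude rate is more than 50% higher than the younger tract's, yet the standardized rates are identical because the underlying age-specific risks are the same. Likewise, three cases in 200 people and 300 cases in 20,000 people give the same point estimate, but only the second is a stable one.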

## My conclusion

First, the incidence rates presented on the map are misleading as estimates of absolute risk. I think I know what the authors have done wrong, but since the methods are not readily available, it’s hard to know the source of the problem for sure.

Second, while there appear to be some differences in cancer risk in Hamilton, and these differences may warrant some concern, the differences are not large, and the map should better represent the reality, perhaps with less alarming contrasts in colour, or with some more context for interpreting differences in risk.  I am all for sharing data with the public, but it must be done properly, and I think this is an example of the opposite.  These ‘Code Red’ graphics may make for a sensational story, and may have boosted newspaper sales, but they are problematic, and should be interpreted with great care.