I woke up this morning feeling very frustrated by the public conversation about covid-19 data going on in Ontario. A number of media outlets (CBC and others) and experts seem to be fixated on the need for more sharing of covid-19 data, and in particular, implying that more public data at finer geographic resolutions (such as maps of case counts in neighbourhoods or postal codes) would be useful for resolving the covid-19 crisis. To put it simply: I don’t agree. Here’s some of the reasons why.
1. The data are neither a precise nor an accurate representation of true positive cases
Some studies suggest that current covid-19 case reporting based on non-random PCR tests (which is what most publicly available data are based on) undercounts the true number of cases by as much as an order of magnitude [1,2,3,4]. When error is this high, any small bias in the data collection process can dominate the patterns we see. This is mostly because the variability in the true (unknown) number of positive cases is larger than the number of positive tests identified each day.
Most of the undercounting problem is not an administrative data-sharing issue, or even necessarily the result of insufficient testing; it is a problem of sampling. The current testing protocol relies on physician referrals or self-referrals for testing, not a widespread random selection of the population. This is a big and widely acknowledged problem, and it means that day-to-day variations in cases are not particularly meaningful.
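The point about noisy day-to-day variation can be made concrete with a toy calculation (all numbers below are assumptions chosen for illustration, not real Ontario figures): hold the true daily case count flat, let the fraction of infections that actually get detected drift slightly, and watch the confirmed counts swing even though the epidemic itself does not change.

```python
# Toy illustration with assumed numbers (not real data): the epidemic is
# flat, but the share of infections that get detected drifts day to day.
TRUE_DAILY_CASES = 1000                                   # assumed constant
DETECTION_PROB = {"Mon": 0.08, "Tue": 0.12, "Wed": 0.10}  # small assumed drift

for day, p in DETECTION_PROB.items():
    confirmed = round(TRUE_DAILY_CASES * p)
    print(f"{day}: {confirmed} confirmed cases")  # 80, 120, 100
```

A 50% jump from Monday to Tuesday (80 to 120 confirmed cases) appears with zero change in the underlying epidemic, which is the sense in which a small bias in data collection can dominate the observed pattern.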
This is also a problem when reporting data at small geographic scales. Small idiosyncratic variations that have nothing to do with underlying infection rates can produce dramatic apparent differences in daily cases, particularly in areas with small populations. One source of such variation is differences in physician practice style. In one area a physician might refer 80% of her patients for SARS-CoV-2 testing, while a physician in another area might refer only 40%. Even if the underlying incidence is the same in both areas, the reported case counts could differ substantially–not because of epidemiology, but because of physician practice style.
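The practice-style point is simple arithmetic; here is a minimal sketch (the infection count and referral rates are assumptions taken from the example above, and it further assumes every referred infection tests positive):

```python
# Two areas with identical true incidence; only referral rates differ.
TRUE_INFECTED = 100                    # assumed infections in each area
REFERRAL_RATE = {"A": 0.8, "B": 0.4}   # assumed share of infected people tested

def observed_cases(area: str) -> int:
    """Confirmed cases, assuming every referred infection tests positive."""
    return round(TRUE_INFECTED * REFERRAL_RATE[area])

print(observed_cases("A"))  # 80
print(observed_cases("B"))  # 40: same epidemiology, half the reported 'cases'
```

On a map, area A would look twice as risky as area B, even though the epidemiology in the two areas is identical.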
2. The data may influence behaviour in some unwanted ways
For the sake of illustration, imagine two neighbourhoods next to each other: A and B. They are equal in population, services offered and ease of access, but their reported case numbers differ: say there are 50 cases in A and 5 in B. What might happen once these data become public?
i. People living in neighbourhood A could use this information to spend more time outside neighbourhood A, out of fear of their immediate neighbours. Since these people may themselves have a higher risk of having been infected (given the higher baseline risk in A), they would probably increase spread to other neighbourhoods.
ii. Activity in neighbourhood B will increase overall. People from other parts of the city will be less likely to go to neighbourhood A, so B will absorb more of this activity, which could increase infections in B.
Neither of these outcomes seems particularly desirable, but both are plausible responses to reported case data. Moreover, point 1 above makes this worse: the data are probably wrong anyway.
3. The data offer a misguided sense of certainty and predictability about infectious disease spread, and ignore the role of super-spreaders and the lag in testing
Infectious diseases are very tricky to predict geographically, because a single infected person can change the future geographic distribution dramatically, and the behaviour of that one person is hard to predict. The case count in a neighbourhood could be low, yet one of those cases could be a super-spreader who unwittingly infects dozens of people. The true risk of infection in that neighbourhood could then be considerably higher than a superficial count of cases would indicate (even if the counts were accurate). Low case counts could provide a false sense of security, encourage riskier behaviour, and lead to more infection in the future.
Moreover, the lag between infectiousness and positive test results means the data may often be late to indicate sharp increases in case numbers. A map showing few confirmed cases in a neighbourhood may hide many currently sub-clinical but infectious people. Once again, if people act on these data assuming they are accurate, their behaviour might spread more infection in the future.
What this means, and what to do
Some will argue that maps tell us about populations at risk; a neighbourhood with persistently high case counts could point to a vulnerable population that needs help. It’s true that some contextual factors may cause some neighbourhoods to have persistently high infection rates–such as large older populations or otherwise vulnerable people. But we probably don’t need detailed geographic data to tell us this; we have enough information now to assume that these vulnerable populations should be helped, and care and prevention can be focused on these areas now. Other data already tell us where vulnerable people are; if they need help, we can help them.
Moreover, we as individuals can’t do much about such geographically vulnerable populations–that is the job of the provincial government and the public health units, which have the ability to divert resources to areas in need. That’s why the information is useful to them. It may be a challenge to hold these agencies to account in the short term if they fail to do their job, but that kind of deep and comprehensive scrutiny will probably be more useful later–once the crisis is over and we have a more complete picture of what happened.
The critical question is: how would these data improve any of our decisions as individuals? We all know what we should do about covid-19: wear a mask in public, wash our hands, physically distance, get tested if symptomatic, don’t be a jerk. If data don’t improve decisions, then their cost exceeds their value, and there is no point in clamouring for more. I’d argue that for most of us, neighbourhood-level data are of no value, or worse.
I did not cite the Toronto Star anywhere above, though they seem to be very keen on ‘open government data’. The problem is that their content is behind a paywall. Yeah, that’s a (tiny) bit of irony.