Monthly Archives: April 2020

Bayes’ Theorem and novel coronavirus testing

In previous posts, I have advocated surveillance of the novel coronavirus through testing of random samples of the population (or some quasi-random sampling method that yields equivalent estimates of infection). I still think this is a good idea, though there remain practical concerns.

Some people are not only clamouring for more tests, but are demanding widespread use of serological testing as well. These are blood tests that detect indirect evidence of infection related to our immune system. These tests are important for determining what proportion of the population has been exposed to the novel coronavirus, and could be informative for understanding the impact of the disease, and possibly the level of immunity in the population.

More testing is useful, but it does require a level of caution in interpretation, particularly since all tests are imperfect, both failing to detect true cases and falsely detecting non-cases. Serological tests for the novel coronavirus could have a higher error rate than the polymerase chain reaction (PCR) tests. PCR tests work by detecting actual genetic material from the virus, and serological tests detect antibodies that are signs of past infection. However, for both tests, you might be surprised at how little information the test may actually provide you as an individual, particularly in such uncertain times. The purpose of this post is to show why tests (of all sorts) should be interpreted with considerable care, and how increased testing may not always yield useful information to the person being tested.

Now for a little probability theory

We begin with an assumption about the current incidence rate. The incidence rate tells us the average probability of infection at a moment in time–what the risk is to the average individual. Let’s assume that it’s 0.01. This means that 1% of the population is infected, and a randomly selected person has a 1% chance of being infected absent any other information. I’ll refer to this as the baseline probability of infection, or P(infected).

Next, we will assume that the probability of a tested person testing positive is around 2%. This is in the ballpark for a number of jurisdictions at present. It is higher in some areas (in New York, the number is closer to 25%), and lower in some places (in Alberta, it’s less than 2%). This is P(testing positive).

Finally, we’ll assume that the PCR test has a sensitivity of 0.95. This means that it will correctly identify a true positive case 95% of the time. We’ll call this P(test positive|true infected), which is the probability that a person will test positive given that they are truly infected.

With this information, we can understand the meaning of a positive test issued to a person selected randomly from the population, and this information can contribute to our understanding of the level of infection in a population.

However, for the tested person, and the population in general, this information is tricky to interpret.

The first thing to realize is that in spite of the high sensitivity above, a positive test is not 95% accurate. In other words, someone who tests positive is not 95% likely to have the novel coronavirus.

We’re going to use Bayes’ theorem to figure out what the probability of infection is given a test with a 95% sensitivity. Thomas Bayes made one of the most profound discoveries in probability theory (if not science!) about how information can update our understanding about the state of the world. Most simply, the theorem tells how we can update a prior probability with information to get a posterior probability. The posterior probability is the probability we estimate given the information available to us. In this case, the posterior probability is the probability that a person has SARS-CoV-2 given they received a positive test result. The information is the test result. The prior probability is the current incidence rate.

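We use the following formula:

$$P(\text{true infected} \mid \text{test positive}) = \frac{P(\text{infected}) \times P(\text{test positive} \mid \text{true infected})}{P(\text{testing positive})}$$

to calculate the posterior probability. If we substitute in the values above, we get 0.01 * 0.95 / 0.02 = 0.475. Based on these values, if you get a positive test for SARS-CoV-2, there is less than a 48% chance that you actually are infected!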
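For readers who prefer to see the arithmetic spelled out, here is a minimal Python sketch of the same calculation. The function and variable names are my own choices for illustration; the three input values are the assumptions stated earlier in this post.

```python
def posterior_probability(prior, sensitivity, p_positive):
    """Bayes' theorem: P(infected | test positive).

    prior       -- P(infected), the baseline probability of infection
    sensitivity -- P(test positive | true infected)
    p_positive  -- P(testing positive), among everyone tested
    """
    return prior * sensitivity / p_positive


# Values assumed in this post (illustrative, not measured).
p_infected = 0.01
sensitivity = 0.95
p_testing_positive = 0.02

posterior = posterior_probability(p_infected, sensitivity, p_testing_positive)
print(f"P(infected | test positive) = {posterior:.3f}")
```

Changing any one of the three inputs and re-running shows how sensitive the answer is to the assumptions.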

Interpreting the meaning of a posterior probability is a little tricky, and gets into questions about the very meaning of probability. The posterior is an estimate of subjective probability–the probability assessment we make given what we know. Since what we know varies, this subjective probability can also vary. If you have symptoms typical of covid-19 when you get tested, then we might make adjustments to the formula; for example the baseline incidence of infection with covid-19 is higher among the subset of the population that have a fever and a dry cough. This could increase P(infected) and P(testing positive), and result in a posterior probability much closer to 1.0.

If this all seems overly technical, think of the problem in the following way. If you lived on a desert island and never came into contact with anyone and then took a test for SARS-CoV-2, would you trust the result? Probably not. Why? Well, because the prior knowledge of your circumstances says that your underlying risk is very, very low, almost 0. You also know that the information from the test is not perfect, so your intuition tells you that it is more likely that the test information is wrong than that you actually have a SARS-CoV-2 infection. This is an extreme example, but illustrates the same general idea: when the uncertainty from the information we receive is high compared to the probability of the outcome that the information predicts, the information is not very useful.
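Both the symptomatic case and the desert island case come down to changing the prior. The sketch below shows this in Python, under some assumptions of mine: the test's false positive rate is held fixed at roughly 0.0106, which is the value implied by the numbers used in this post, and the priors for the island dweller and for the person with symptoms are invented purely for illustration.

```python
def posterior_given_positive(prior, sensitivity=0.95, false_positive_rate=0.0106):
    """P(infected | test positive) for a test with fixed error rates.

    The default false positive rate is an assumption: it is what the
    post's numbers imply, (0.02 - 0.95 * 0.01) / 0.99, or about 0.0106.
    """
    true_positives = sensitivity * prior
    false_positives = false_positive_rate * (1.0 - prior)
    return true_positives / (true_positives + false_positives)


# Priors are hypothetical except for the 1% baseline used in the post.
scenarios = {
    "desert island (hypothetical prior)": 1e-6,
    "baseline from the post": 0.01,
    "fever and dry cough (hypothetical prior)": 0.30,
}

results = {label: posterior_given_positive(p) for label, p in scenarios.items()}
for label, value in results.items():
    print(f"{label}: {value:.4f}")
```

The same positive result from the same test means almost nothing on the island, is roughly a coin flip at the baseline, and is close to conclusive for someone who already has symptoms.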


This is important to understand when we consider the prospect of more widespread testing, particularly when using a test with a higher false positive rate. Let’s say that the serological antibody tests have a sensitivity of 0.95, but a higher false positive rate. A higher false positive rate will increase the denominator of the formula above, all else being equal. So it could be that rather than 0.02 (2% of people taking the test are found to be positive) it would be 0.03 (note: I don’t know if this is the right number, so take this example with a grain of salt). This increase is due to the increased number of false positives from serology-based antibody tests. This would yield a posterior probability of around 0.32 (0.01 * 0.95 / 0.03), meaning that a person with a positive test would have a 32% chance of actually having a novel coronavirus infection.
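To see how the false positive rate feeds into the denominator, we can split P(testing positive) into true positives and false positives using the law of total probability. The Python sketch below backs out the false positive rate implied by each value of P(testing positive). It assumes that everyone tested shares the 1% baseline probability of infection, and the function names are mine; the post does not state these false positive rates directly.

```python
def implied_false_positive_rate(prior, sensitivity, p_positive):
    """Solve P(pos) = sensitivity * prior + fpr * (1 - prior) for fpr."""
    return (p_positive - sensitivity * prior) / (1.0 - prior)


def posterior_from_rates(prior, sensitivity, false_positive_rate):
    """P(infected | test positive), with the denominator built explicitly."""
    true_positives = sensitivity * prior
    false_positives = false_positive_rate * (1.0 - prior)
    return true_positives / (true_positives + false_positives)


prior = 0.01        # baseline probability of infection assumed in the post
sensitivity = 0.95  # assumed for both kinds of test

summary = {}
for p_positive in (0.02, 0.03):
    fpr = implied_false_positive_rate(prior, sensitivity, p_positive)
    summary[p_positive] = (fpr, posterior_from_rates(prior, sensitivity, fpr))
    print(f"P(testing positive) = {p_positive}: "
          f"implied false positive rate = {fpr:.4f}, "
          f"posterior = {summary[p_positive][1]:.3f}")
```

Under these assumptions, going from a false positive rate of about 1% to about 2% is enough to pull the posterior from roughly 0.48 down to roughly 0.32.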

Again, from a public health perspective, this issue is not necessarily a problem if experts are careful in their interpretation of data, especially if the purpose of the tests is to estimate past infection (which is typical for serological tests for antibodies). But for tests used to estimate current infections, it means that many people who aren’t infected by SARS-CoV-2 could be told that they are, and could endure all the burdens that come with this diagnosis (quarantine, fear, etc.) unnecessarily. It could even lead to reckless behaviour at some point; people who think they’ve had an infection that they have not had may then go into the community with a false sense of security, and get the infection due to the false belief of immunity.

This should also be a reminder about the dangers of individual people demanding medical tests, particularly if the tests have large risks of false positives. Tests have to be interpreted in the context of the person receiving them; if an asymptomatic person is tested, they should probably be warned that a positive test may not be an indication of infection.

Information from tests is much more complicated than most of us realize, and can be downright misleading in some cases. While some folks may think that the gate-keeping of tests (by physicians or governments) is just another example of big brother trying to save money at the expense of our health, it may sometimes be in our best interests to remain untested, particularly if the tests have high rates of error.

Google mobility reports

Google recently released the results of an analysis of mobility data. These results are very interesting, especially when cross-regional comparisons are made. In brief, these results tell us how much society is using public spaces (like grocery stores, retail outlets and parks) and are an indirect indicator of physical distancing behaviour. Below are a few of my early observations.

Where do we go now?

Regions that have been identified as locations of high levels of infection and mortality appear to have the greatest reduction in visits to retail outlets, workplaces and other public destinations in our communities. Here is a figure for Italy:

A screen capture from one of Google’s Community Mobility Reports

What we see is a 94% reduction in Italian visits to retail and recreation locations as of March 29, 2020 compared to a baseline of activity from a few weeks prior. The decline started in early March, when news of Italy’s health care crisis was first emerging.
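The percentages in these reports are changes relative to a baseline. As a rough sketch of what such a figure means, here is the arithmetic in Python. The visit counts are entirely made up for illustration (Google does not publish raw counts in these reports), and the function name is my own.

```python
def percent_change_from_baseline(current, baseline):
    """Percent change of `current` relative to `baseline` (negative = decline)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (current - baseline) / baseline * 100.0


# Hypothetical visit counts, chosen only to reproduce a 94% reduction.
baseline_visits = 1000
current_visits = 60

change = percent_change_from_baseline(current_visits, baseline_visits)
print(f"{change:+.0f}% compared to baseline")
```

A 94% reduction therefore means that for every 100 visits in the baseline period, only about 6 were recorded on the day in question.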

In contrast, here is a figure for Sweden:

A screen capture from one of Google’s Community Mobility Reports

Note that the drop is much less dramatic, and the decline starts around March 8th. It’s worth noting that Sweden has a different take on the covid-19 crisis. There is little mandated physical distancing. Most public health policy is focussed on vulnerable populations and prohibition of large gatherings.

In Canada, it looks like this:

A screen capture from one of Google’s Community Mobility Reports

Canada saw a later start to the decline than Italy and Sweden, but has seen a dramatic reduction nonetheless. The US data as a whole look very much like Canada’s, though the magnitude of reduction is a bit lower (a 47% decline from baseline as of March 29th). However, this varies from state to state. In New York, for example:

A screen capture from one of Google’s Community Mobility Reports

In Arkansas, on the other hand, the reduction has been less dramatic:

A screen capture from one of Google’s Community Mobility Reports

Also interesting is the change in activity by location type. National parks, public beaches, dog parks and public gardens have seen less of a decline in activity over time. In Canada, we see a smaller drop in the use of these park spaces:

A screen capture from one of Google’s Community Mobility Reports

Perhaps this makes sense: people need to get out for exercise, and view these spaces as safe (given their spaciousness, and our feelings towards the healthiness of nature generally). Moreover, in much of Canada, we are coming out of winter, and many people are clamouring for sun.

In Australia, where climate is warmer generally, and seasons are reversed from those in the northern hemisphere, we see a larger decline in use of parks over this period:

A screen capture from one of Google’s Community Mobility Reports

What does this all look like in countries where novel coronavirus has been around for a while? Well, here are some figures for South Korea:

A screen capture from one of Google’s Community Mobility Reports

We can see here that South Korea did not seem to rely as much on physical distancing for infection control, at least not to the same extent as many countries are doing now. Note that visits to parks and grocery/pharmacy are actually up. I should mention that these comparisons are a little tricky, as we don’t know what happened in early February or January, and I am unsure what Google uses as a baseline for South Korea; if it’s the same for everywhere around the world, it’s hard to know how to interpret the data for countries that experienced the outbreak earlier.

There are no data for mainland China, but there are data for Hong Kong and Taiwan. Here are some figures for Hong Kong:

A screen capture from one of Google’s Community Mobility Reports

Based on these data, residents of Hong Kong seem to be pretty consistent in their mobility behaviour for at least the last 6 weeks or so.

What does this all mean?

At this point, it’s very hard to say much about these data beyond what we see from these figures. I wish Google would put this in a table with day-specific records; as it stands, all we can do is look at the graphs, and can’t really analyze many numbers yet. Still, I think these data could be useful in the next few weeks, when we can compare the trends in mobility to trends in cases, testing and deaths (notwithstanding the very low quality of infection data right now). At some point, we could see some connection between mobility behaviour and testing.

However, one observation that could be important here is that the most extreme physical distancing behaviour is a response to crises already in force. New York state, Italy and other areas with a large number of cases and/or deaths have taken this practice most seriously, but only did so once the infection was well under way. In other areas, physical distancing has been employed less, but (at least in the case of South Korea…maybe), the infection still flattened with less radical levels of physical distancing behaviour. Maybe something else that Koreans were doing (like wearing masks…?) is what really accounted for the decline in infection. This is all still pretty speculative now, but the data (so far) do not seem inconsistent with that hypothesis.