Bayes’ Theorem and novel coronavirus testing

In previous posts, I have advocated surveillance of the novel coronavirus through testing of random samples of the population (or some quasi-random sampling method that yields equivalent estimates of infection). I still think this is a good idea, though there remain practical concerns.

Some people are not only clamouring for more tests, but are demanding widespread use of serological testing as well. These are blood tests that detect indirect evidence of infection related to our immune system. These tests are important for determining what proportion of the population has been exposed to the novel coronavirus, and could be informative for understanding the impact of the disease, and possibly the level of immunity in the population.

More testing is useful, but it does require caution in interpretation, particularly since all tests are imperfect: they both fail to detect some true cases and falsely flag some non-cases. Serological tests for the novel coronavirus could have a higher error rate than the polymerase chain reaction (PCR) tests. PCR tests work by detecting actual genetic material from the virus, whereas serological tests detect antibodies that are signs of past infection. However, for both tests, you might be surprised at how little information the test may actually provide you as an individual, particularly in such uncertain times. The purpose of this post is to show why tests (of all sorts) should be interpreted with considerable care, and how increased testing may not always yield useful information to the person being tested.

Now for a little probability theory

We begin with an assumption about the current incidence rate (strictly speaking, the prevalence: the share of the population that is infected at a given moment). It tells us the average probability of infection at a moment in time, that is, the risk to the average individual. Let’s assume that it’s 0.01. This means that 1% of the population is infected, and a randomly selected person has a 1% chance of being infected absent any other information. I’ll refer to this as the baseline probability of infection, or P(infected).

Next, we will assume that the probability of a tested person testing positive is around 2%. This is in the ballpark for a number of jurisdictions at present. It is higher in some areas (in New York, the number is closer to 25%), and lower in some places (in Alberta, it’s less than 2%). This is P(testing positive).

Finally, we’ll assume that the PCR test has a sensitivity of 0.95. This means that it will correctly identify a true positive case 95% of the time. We’ll call this P(test positive|true infected), which is the probability that a person will test positive given that they are truly infected.

With this information, we can understand the meaning of a positive test issued to a person selected randomly from the population, and this information can contribute to our understanding of the level of infection in a population.

However, for the tested person, and the population in general, this information is tricky to interpret.

The first thing to realize is that in spite of the high sensitivity above, a positive test is not 95% accurate. In other words, someone who tests positive is not 95% likely to have the novel coronavirus.

We’re going to use Bayes’ theorem to figure out what the probability of infection is given a positive result from a test with 95% sensitivity. Thomas Bayes made one of the most profound discoveries in probability theory (if not science!) about how information can update our understanding of the state of the world. Most simply, the theorem tells us how we can update a prior probability with information to get a posterior probability. The posterior probability is the probability we estimate given the information available to us. In this case, the posterior probability is the probability that a person has SARS-CoV-2 given they received a positive test result. The information is the test result. The prior probability is the current incidence rate.

We use the following formula:

P(true infected|test positive) = P(infected) * P(test positive|true infected) / P(testing positive)

to calculate the posterior probability. If we substitute in the values above, we get 0.01 * 0.95 / 0.02 = 0.475. Based on these values, if you get a positive test for SARS-CoV-2, there is less than a 48% chance that you actually are infected!
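For readers who want to check the arithmetic, here is a minimal Python sketch of the calculation. The function and parameter names are my own, chosen for illustration; the three input values are the assumptions stated above.

```python
def posterior_infected(prior, sensitivity, p_positive):
    """Bayes' theorem for a positive test.

    prior       -- P(infected), the baseline probability of infection
    sensitivity -- P(test positive | true infected)
    p_positive  -- P(testing positive), the overall share of positive tests
    """
    if not 0 < p_positive <= 1:
        raise ValueError("p_positive must be in (0, 1]")
    return prior * sensitivity / p_positive


# Values assumed in the text: 1% baseline, 95% sensitivity, 2% of tests positive.
posterior = posterior_infected(prior=0.01, sensitivity=0.95, p_positive=0.02)
print(f"P(infected | test positive) = {posterior:.3f}")  # 0.475
```

Changing any one of the three inputs shows how sensitive the answer is to the assumptions, which is the point of the rest of this post.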

Interpreting the meaning of a posterior probability is a little tricky, and gets into questions about the very meaning of probability. The posterior is an estimate of subjective probability, that is, the probability assessment we make given what we know. Since what we know varies, this subjective probability can also vary. If you have symptoms typical of COVID-19 when you get tested, then we might adjust the inputs to the formula; for example, the baseline incidence of infection is higher among the subset of the population that has a fever and a dry cough. This would increase both P(infected) and P(testing positive), and could result in a posterior probability much closer to 1.0.
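To make the effect of symptoms concrete, here is a rough Python sketch. The numbers are hypothetical: a 30% baseline among people with symptoms and a 1% false positive rate are assumptions chosen for illustration, not figures from any study. Rather than guessing P(testing positive) directly, the sketch builds it from the prior, the sensitivity, and the false positive rate, so that it rises automatically when the prior does.

```python
def posterior_from_rates(prior, sensitivity, false_positive_rate):
    """P(infected | test positive), with P(testing positive) derived
    from the law of total probability:
        P(positive) = sensitivity * prior + false_positive_rate * (1 - prior)
    """
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_positive


SENSITIVITY = 0.95          # from the text
FALSE_POSITIVE_RATE = 0.01  # hypothetical; keeps P(positive) near 2% at a 1% prior

# Hypothetical priors: 1% for a random person, 30% for someone with symptoms.
for label, prior in [("random person", 0.01), ("fever and dry cough", 0.30)]:
    post = posterior_from_rates(prior, SENSITIVITY, FALSE_POSITIVE_RATE)
    print(f"{label}: prior = {prior:.2f}, posterior = {post:.3f}")
```

Under these assumed values the same positive result is far more convincing for the symptomatic person, because the prior, not the test, has changed.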

If this all seems overly technical, think of the problem in the following way. If you lived on a desert island and never came into contact with anyone and then took a test for SARS-CoV-2, would you trust the result? Probably not. Why? Well, because the prior knowledge of your circumstances says that your underlying risk is very, very low, almost 0. You also know that the information from the test is not perfect, so your intuition tells you that it is more likely that the test information is wrong than that you actually have a SARS-CoV-2 infection. This is an extreme example, but illustrates the same general idea: when the uncertainty from the information we receive is high compared to the probability of the outcome that the information predicts, the information is not very useful.
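The desert island intuition can also be expressed as head counts. The sketch below imagines testing a million people and compares the expected number of true positives with the expected number of false positives. The 1-in-100,000 risk for the islander and the 1% false positive rate are made-up values for illustration only.

```python
def expected_positives(population, prior, sensitivity, false_positive_rate):
    """Return the expected (true positives, false positives) if everyone is tested."""
    infected = population * prior
    true_positives = infected * sensitivity
    false_positives = (population - infected) * false_positive_rate
    return true_positives, false_positives


POPULATION = 1_000_000
SENSITIVITY = 0.95          # from the text
FALSE_POSITIVE_RATE = 0.01  # hypothetical

# Hypothetical priors: near zero for the islander, 1% for the general population.
for label, prior in [("desert islander", 0.00001), ("general population", 0.01)]:
    tp, fp = expected_positives(POPULATION, prior, SENSITIVITY, FALSE_POSITIVE_RATE)
    share_true = tp / (tp + fp)
    print(f"{label}: about {tp:.0f} true vs {fp:.0f} false positives "
          f"({share_true:.1%} of positives are real)")
```

When the prior is close to zero, almost every positive result comes from test error, which is exactly why the islander should distrust the result.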

Implications

This is important to understand when we consider the prospect of more widespread testing, particularly when using a test with a higher false positive rate. Let’s say that the serological antibody tests have a sensitivity of 0.95, but a higher false positive rate. A higher false positive rate will increase the denominator of the formula above, all else being equal. So it could be that rather than 0.02 (2% of people taking the test are found to be positive) it would be 0.03 (note: I don’t know if this is the right number, so take this example with a grain of salt). This increase is due to the larger number of false positives from serology-based antibody tests. This would yield a posterior probability of 0.01 * 0.95 / 0.03, or around 0.32, meaning that a person with a positive test would have a 32% chance of actually having a novel coronavirus infection.
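As a rough check on this example, the Python sketch below recomputes the posterior for both values of P(testing positive) and backs out the false positive rate each one implies. It assumes, as the example does, that P(testing positive) describes the same randomly sampled population as the 1% baseline; the helper name is my own.

```python
def implied_false_positive_rate(prior, sensitivity, p_positive):
    """Solve P(positive) = sensitivity * prior + fpr * (1 - prior) for fpr."""
    return (p_positive - sensitivity * prior) / (1 - prior)


PRIOR = 0.01        # baseline P(infected), from the text
SENSITIVITY = 0.95  # from the text

# 0.02 is the value assumed for PCR; 0.03 is the illustrative serology value.
for p_positive in (0.02, 0.03):
    posterior = PRIOR * SENSITIVITY / p_positive
    fpr = implied_false_positive_rate(PRIOR, SENSITIVITY, p_positive)
    print(f"P(positive) = {p_positive:.2f}: posterior = {posterior:.3f}, "
          f"implied false positive rate = {fpr:.4f}")
```

Under these assumptions, moving the false positive rate from roughly 1% to roughly 2% is enough to drop the posterior from 0.475 to about 0.32.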

Again, from a public health perspective, this issue is not necessarily a problem if experts are careful in their interpretation of data, especially if the purpose of the tests is to estimate past infection (which is typical for serological tests for antibodies). But for tests used to estimate current infections, it means that many people who are not infected could be told that they have SARS-CoV-2, and could endure all the burdens that come with this diagnosis (quarantine, fear, etc.) unnecessarily. It could even lead to reckless behaviour at some point; people who think they’ve had an infection that they have not had may then go into the community with a false sense of security, and get the infection because of their false belief in immunity.

This should also be a reminder about the dangers of individual people demanding medical tests, particularly if the tests have large risks of false positives. Tests have to be interpreted in the context of the person receiving them; if an asymptomatic person is tested, they should probably be warned that a positive test may not be an indication of infection.

Information from tests is much more complicated than most of us realize, and can be downright misleading in some cases. While some folks may think that the gate-keeping of tests (by physicians or governments) is just another example of big brother trying to save money at the expense of our health, it may sometimes be in our best interests to remain untested, particularly if the tests have high rates of error.