# Bayes’ Theorem and novel coronavirus testing

In previous posts, I have advocated surveillance of the novel coronavirus through testing of random samples of the population (or some quasi-random sampling method that yields equivalent estimates of infection). I still think this is a good idea, though there remain practical concerns.

Some people are not only clamouring for more tests, but are demanding widespread use of serological testing as well. These are blood tests that detect indirect evidence of infection: antibodies produced by our immune system. These tests are important for determining what proportion of the population has been exposed to the novel coronavirus, and could be informative for understanding the impact of the disease, and possibly the level of immunity in the population.

More testing is useful, but it does require caution in interpretation, particularly since all tests are imperfect: they both fail to detect some true cases and falsely detect some non-cases. Serological tests for the novel coronavirus could have a higher error rate than polymerase chain reaction (PCR) tests. PCR tests work by detecting genetic material from the virus, while serological tests detect antibodies that are signs of past infection. However, for both tests, you might be surprised at how little information the test may actually provide you as an individual, particularly in such uncertain times. The purpose of this post is to show why tests (of all sorts) should be interpreted with considerable care, and how increased testing may not always yield useful information to the person being tested.

Now for a little probability theory

We begin with an assumption about the current incidence rate. The incidence rate (strictly speaking, the prevalence, since we mean the share of people infected right now rather than the rate of new infections) tells us the average probability of infection at a moment in time–what the risk is to the average individual. Let’s assume that it’s 0.01. This means that 1% of the population is infected, and a randomly selected person has a 1% chance of being infected absent any other information. I’ll refer to this as the baseline probability of infection, or P(infected).

Next, we will assume that the probability of a tested person testing positive is around 2%. This is in the ballpark for a number of jurisdictions at present. It is higher in some areas (in New York, the number is closer to 25%), and lower in some places (in Alberta, it’s less than 2%). This is P(testing positive).

Finally, we’ll assume that the PCR test has a sensitivity of 0.95. This means that it will correctly identify a truly infected person 95% of the time. We’ll call this P(testing positive | infected), which is the probability that a person will test positive given that they are truly infected.

With this information, we can understand the meaning of a positive test issued to a person selected randomly from the population, and this information can contribute to our understanding of the level of infection in a population.

However, for the tested person, and the population in general, this information is tricky to interpret.

The first thing to realize is that in spite of the high sensitivity above, a positive test is not 95% accurate. In other words, someone who tests positive is not 95% likely to have the novel coronavirus.

We’re going to use Bayes’ theorem to figure out what the probability of infection is given a test with a 95% sensitivity. Thomas Bayes made one of the most profound discoveries in probability theory (if not science!) about how information can update our understanding about the state of the world. Most simply, the theorem tells us how we can update a prior probability with new information to get a posterior probability. The posterior probability is the probability we estimate given the information available to us. In this case, the posterior probability is the probability that a person has SARS-CoV-2 given they received a positive test result. The information is the test result. The prior probability is the current incidence rate.

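We use the following formula to calculate the posterior probability:

$$P(\text{infected} \mid \text{testing positive}) = \frac{P(\text{testing positive} \mid \text{infected}) \times P(\text{infected})}{P(\text{testing positive})}$$

If we substitute in the values above, we get 0.95 × 0.01 / 0.02 = 0.475. Based on these values, if you get a positive test for SARS-CoV-2, there is less than a 48% chance that you are actually infected!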
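A minimal Python sketch of the calculation above, using the post's assumed values (prior 0.01, sensitivity 0.95, P(testing positive) 0.02). The function name `posterior_infected` is illustrative only.

```python
def posterior_infected(prior: float, sensitivity: float, p_positive: float) -> float:
    """Bayes' theorem: P(infected | positive) = P(positive | infected) * P(infected) / P(positive)."""
    posterior = sensitivity * prior / p_positive
    if not 0.0 <= posterior <= 1.0:
        # P(positive) can never be smaller than P(positive and infected) = sensitivity * prior.
        raise ValueError("inconsistent inputs: p_positive < sensitivity * prior")
    return posterior


# Values assumed in the post: 1% infected, 95% sensitivity, 2% of tests positive.
p = posterior_infected(prior=0.01, sensitivity=0.95, p_positive=0.02)
print(f"P(infected | positive) = {p:.3f}")  # prints 0.475
```

The consistency check is there because the three inputs are not independent: if the assumed share of positive tests were lower than sensitivity × prior, the "probability" would exceed 1.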

Interpreting the meaning of a posterior probability is a little tricky, and gets into questions about the very meaning of probability. The posterior is an estimate of subjective probability–the probability assessment we make given what we know. Since what we know varies, this subjective probability can also vary. If you have symptoms typical of covid-19 when you get tested, then we might make adjustments to the formula; for example, the baseline incidence of infection with covid-19 is higher among the subset of the population who have a fever and a dry cough. This could increase P(infected) and P(testing positive), and result in a posterior probability much closer to 1.0.
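To make that adjustment concrete, here is a sketch that derives P(testing positive) from the law of total probability instead of taking it as given, so that raising P(infected) also raises the denominator. The 1% false positive rate and the 20% prior for people with fever and dry cough are hypothetical numbers chosen for illustration, not figures from any study.

```python
def posterior_from_rates(prior: float, sensitivity: float, false_positive_rate: float) -> float:
    """P(infected | positive), with P(positive) from the law of total probability."""
    # P(positive) = P(pos | infected) P(infected) + P(pos | not infected) P(not infected)
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_positive


SENSITIVITY = 0.95
FALSE_POSITIVE_RATE = 0.01  # hypothetical, for illustration only
groups = {
    "general population": 0.01,
    "fever and dry cough (hypothetical)": 0.20,
}
for label, prior in groups.items():
    post = posterior_from_rates(prior, SENSITIVITY, FALSE_POSITIVE_RATE)
    print(f"{label}: P(infected | positive) = {post:.2f}")
```

Under these assumptions the posterior rises from about 0.49 in the general population to about 0.96 among symptomatic people, even though the test itself is unchanged.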

If this all seems overly technical, think of the problem in the following way. If you lived on a desert island and never came into contact with anyone, and then took a test for SARS-CoV-2, would you trust the result? Probably not. Why? Well, because the prior knowledge of your circumstances says that your underlying risk is very, very low–almost 0. You also know that the information from the test is not perfect, so your intuition tells you that it is more likely that the test information is wrong than that you actually have a SARS-CoV-2 infection. This is an extreme example, but it illustrates the same general idea: when the uncertainty from the information we receive is high compared to the probability of the outcome that the information predicts, the information is not very useful.
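The desert-island intuition can also be expressed as natural frequencies: count the expected true and false positives among many people tested. The one-in-a-million prior and the 1% false positive rate below are hypothetical.

```python
def expected_counts(n_tested: int, prior: float, sensitivity: float, false_positive_rate: float):
    """Expected (true positives, false positives) among n_tested people."""
    infected = n_tested * prior
    true_positives = infected * sensitivity
    false_positives = (n_tested - infected) * false_positive_rate
    return true_positives, false_positives


# Hypothetical desert island: one-in-a-million prior, 1% false positive rate.
tp, fp = expected_counts(n_tested=1_000_000, prior=1e-6, sensitivity=0.95, false_positive_rate=0.01)
print(f"expected true positives:  {tp:.2f}")
print(f"expected false positives: {fp:,.0f}")
print(f"share of positives that are real: {tp / (tp + fp):.5f}")
```

With these assumptions, roughly 10,000 false positives are expected against fewer than one true positive, which is why a positive result from the island is almost certainly wrong.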

Implications

This is important to understand when we consider the prospect of more widespread testing, particularly when using a test with a higher false positive rate. Let’s say that the serological antibody tests have a sensitivity of 0.95, but a higher false positive rate. A higher false positive rate will increase the denominator of the formula above, all else being equal. So it could be that rather than 0.02 (2% of people taking the test are found to be positive) it would be 0.03 (note: I don’t know if this is the right number, so take this example with a grain of salt). This increase is due to the increased number of false positives from serology-based antibody tests. This would yield a posterior probability of around 0.32, meaning that a person with a positive test would have a 32% chance of actually having a novel coronavirus infection.
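The denominator P(testing positive) is tied to the false positive rate by the law of total probability. Assuming, as the example implicitly does, that the tested group has the same 1% prevalence, we can back out the false positive rate each denominator implies. The function name is illustrative.

```python
def implied_false_positive_rate(p_positive: float, prior: float, sensitivity: float) -> float:
    """Solve p_positive = sensitivity * prior + fpr * (1 - prior) for fpr."""
    return (p_positive - sensitivity * prior) / (1 - prior)


PRIOR, SENSITIVITY = 0.01, 0.95
for p_positive in (0.02, 0.03):
    fpr = implied_false_positive_rate(p_positive, PRIOR, SENSITIVITY)
    posterior = SENSITIVITY * PRIOR / p_positive
    print(f"P(testing positive) = {p_positive:.2f}: implied FPR ~ {fpr:.3f}, "
          f"P(infected | positive) ~ {posterior:.3f}")
```

Under these assumptions, moving the denominator from 0.02 to 0.03 corresponds to the false positive rate roughly doubling, from about 1.1% to about 2.1%.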

Again, from a public health perspective, this issue is not necessarily a problem if experts are careful in their interpretation of data, especially if the purpose of the tests is to estimate past infection (which is typical for serological tests for antibodies). But for tests used to estimate current infections, it means that many people who are not infected with SARS-CoV-2 could be told they are, and could endure all the burdens that come with this diagnosis (quarantine, fear, etc.) unnecessarily. It could even lead to reckless behaviour at some point; people who think they’ve had an infection that they have not had may then go into the community with a false sense of security, and get the infection due to the false belief of immunity.

This should also be a reminder about the dangers of individual people demanding medical tests, particularly if the tests have large risks of false positives. Tests have to be interpreted in the context of the person receiving them; if an asymptomatic person is tested, they should probably be warned that a positive test may not be an indication of infection.

Information from tests is much more complicated than most of us realize, and can be downright misleading in some cases. While some folks may think that the gate-keeping of tests (by physicians or governments) is just another example of Big Brother trying to save money at the expense of our health, it may sometimes be in our best interests to remain untested, particularly if the tests have high rates of error.

Google recently released the results of an analysis of mobility data. These results are very interesting, especially when cross regional comparisons are made. In brief, these results tell us how much society is using public spaces–like grocery stores, retail outlets and parks–and is an indirect indicator of physical distancing behaviour. Below are a few of my early observations.

Where do we go now?

Regions that have been identified as locations of high levels of infection and mortality appear to have the greatest reduction in visits to retail outlets, workplaces and other public destinations in our communities. Here is a figure for Italy:

What we see is a 94% reduction in Italian visits to retail and recreation locations as of March 29, 2020 compared to a baseline of activity from a few weeks prior. The decline started in early March, when news of Italy’s health care crisis was first emerging.
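Figures like the 94% reduction are percent changes from a baseline. A minimal sketch of that calculation, assuming for illustration that the baseline is the median of the same weekday over a reference window; the visit counts are made up, and this is not a description of Google's exact method.

```python
from statistics import median


def percent_change_from_baseline(value: float, baseline_values: list) -> float:
    """Percent change of `value` relative to the median of `baseline_values`."""
    baseline = median(baseline_values)
    return 100.0 * (value - baseline) / baseline


# Hypothetical visit counts for the same weekday during a reference window.
baseline_sundays = [1000, 980, 1020, 1010, 990]
print(f"{percent_change_from_baseline(60, baseline_sundays):.0f}%")  # prints -94%
```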

In contrast, here is a figure for Sweden:

Note that the drop is much less dramatic, and the decline starts around March 8th. It’s worth noting that Sweden has a different take on the covid-19 crisis. There is little mandated physical distancing. Most public health policy is focussed on vulnerable populations and prohibition of large gatherings.

In Canada, it looks like this:

Canada saw a later start to the decline than Italy and Sweden, but has seen a dramatic reduction nonetheless. The US data as a whole look very much like Canada’s, though the magnitude of reduction is a bit lower (a 47% decline from baseline as of March 29th). However, this varies from state to state. In New York, for example:

In Arkansas, on the other hand, the reduction has been less dramatic:

Also interesting is the change in activity by location type. National parks, public beaches, dog parks and public gardens have seen less of a decline in activity over time. In Canada, we see a smaller drop in the use of these park spaces:

Perhaps this makes sense–people need to get out for exercise, and view these spaces as safe (given their spaciousness, and our feelings about the healthiness of nature generally). Moreover, in much of Canada, we are coming out of winter, and many people are clamouring for sun.

In Australia, where the climate is generally warmer, and seasons are reversed from those in the northern hemisphere, we see a larger decline in use of parks over this period:

What does this all look like in countries where the novel coronavirus has been around for a while? Well, here are some figures for South Korea:

We can see here that South Korea did not seem to rely as much on physical distancing for infection control, at least not to the same extent as many countries are doing now. Note that visits to parks and grocery/pharmacy are actually up. I should mention that these comparisons are a little tricky, as we don’t know what happened in early February or January, and I am unsure what Google uses as a baseline for South Korea; if it’s the same everywhere around the world, it’s hard to know how to interpret the data for countries that experienced the outbreak earlier.

There are no data for mainland China, but there are data for Hong Kong and Taiwan. Here are some figures for Hong Kong:

Based on these data, residents of Hong Kong seem to be pretty consistent in their mobility behaviour for at least the last 6 weeks or so.

What does this all mean?

At this point, it’s very hard to say much about these data beyond what we see from these figures. I wish Google would put this in a table with day-specific records; as it stands, all we can do is look at the graphs, and can’t really analyze many numbers yet. Still, I think these data could be useful in the next few weeks, when we can compare the trends in mobility to trends in cases, testing and deaths (notwithstanding the very low quality of infection data right now). At some point, we could see some connection between mobility behaviour and testing results.
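Once day-specific numbers are available, one simple exploratory comparison is to find the lag at which mobility and case trends are most strongly correlated. This sketch uses synthetic data and hypothetical helper names (`pearson`, `best_lag`); it is not a causal analysis, and correlations between trending series can be spurious (differencing the series first is a common precaution).

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def best_lag(mobility, cases, max_lag):
    """Return (lag, r) where |corr(mobility[t], cases[t + lag])| is largest."""
    scores = []
    for lag in range(max_lag + 1):
        m = mobility[: len(mobility) - lag]  # drop the last `lag` mobility days
        c = cases[lag:]                       # drop the first `lag` case days
        scores.append((lag, pearson(m, c)))
    return max(scores, key=lambda pair: abs(pair[1]))


# Synthetic data: daily mobility change (%), and a "case" series that simply
# echoes mobility seven days later. Real case data would be far noisier.
rng = random.Random(1)
mobility = [rng.uniform(-90, 0) for _ in range(60)]
cases = [0.0] * 7 + mobility[:-7]
print(best_lag(mobility, cases, max_lag=14))
```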

However, one observation that could be important here is that the most extreme physical distancing behaviour is a response to crises already in force. New York state and Italy and other areas with a large number of cases and/or deaths take this practice most seriously, but only did this once the infection was well under way. In other areas, physical distancing has been employed less, but (at least in the case of South Korea…maybe), the infection curve still flattened with less radical levels of physical distancing behaviour. Maybe something else that Koreans were doing (like wearing masks…?) is what really accounted for the decline in infection. This is all still pretty speculative now, but the data (so far) do not seem inconsistent with that hypothesis.

# Case fatality rate conundrum

One of the most important questions about the coronavirus outbreak today is the case fatality rate. Specifically, what is the probability that an infected person will die? Ideally, we’d like to know the age-specific case fatality rate (the probability that an infected person of a given age will die). However, in spite of all the data routinely published online and freely accessible to all, the case fatality rate remains unclear. In fact, it remains very unclear, and many have discussed how early estimates of the case fatality rate were exaggerated 1,2,3. However, what does this uncertainty mean? Does it mean that we are over-reacting to the threat that coronavirus poses? How should we make decisions given this uncertainty?

The problem

I remember when the outbreak first started spreading outside of China, the media said the case fatality rate was 2-4%. This was an estimate based on the deaths attributed to coronavirus divided by the total positive tests reported by China and other early infected countries. If true, this is a case fatality rate more than 10 times that of typical seasonal influenza. Given the large number of expected infections (the virus is novel and highly transmissible), this suggested millions of people would die from the pandemic.

Since this time, many countries have reported much lower case fatality rates. A month into the pandemic, Germany reported a case fatality rate less than 0.5%. Canada has consistently reported numbers less than 2%, with some provinces (like Alberta) reporting case fatality rates less than 0.5%. Other countries have reported much higher case fatality rates. Italy has reported numbers greater than 10%. This range within Europe alone, from 0.5% to 10%, may be explained by differences in health care, underlying risks or demographics, or differences in testing criteria. I have discussed this latter issue in a previous post.

I looked at some of the case fatality rates over time and plotted it out:

Here is the R code I used to generate the data used in this figure. You can copy and paste it into RStudio or equivalent and it will generate a similar graphic for you.

This is a strange looking figure, and while there is much that can be said about it, not much is definitive. Italy has seen a month of steadily rising case fatality rates. The higher case fatality rate in Italy may be due to how they define a coronavirus death. It could also be due to a lower testing rate; given the overwhelmed health care system, this seems a plausible explanation. The only testing data from Italy I could find is about a week old, and reports some 200,000 tests in total, with around 50,000 positive. That is a very high proportion of positive tests (about 25%), which suggests that testing is fairly restricted, and that a large number of cases are probably undetected in the population. In Canada, less than 2% of tests come back positive for the coronavirus, and the case fatality rate is much lower. A high case fatality rate paired with a large population of untested positive cases is strong evidence that the case fatality rate does not reflect the real risk of mortality.
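A short sketch of the arithmetic behind this argument. The 200,000 tests and 50,000 positives come from the Italian figures in the paragraph above; the death count and the undercount multipliers are hypothetical, chosen only to show how strongly undetected cases can inflate the observed CFR.

```python
# Italy testing figures quoted above (about a week old at the time of writing).
tests, positives = 200_000, 50_000
ppt = positives / tests
print(f"Italy PPT: {ppt:.0%}")


def adjusted_cfr(deaths: int, confirmed_cases: int, undercount_factor: float) -> float:
    """CFR if true infections = confirmed_cases * undercount_factor (a hypothetical multiplier)."""
    return deaths / (confirmed_cases * undercount_factor)


# Hypothetical: a 10% observed CFR among confirmed cases.
deaths, confirmed = 5_000, 50_000
for k in (1, 2, 5, 10):
    print(f"true infections = {k:>2} x confirmed: CFR = {adjusted_cfr(deaths, confirmed, k):.1%}")
```

With these made-up numbers, the CFR falls from 10.0% to 5.0%, 2.0% and 1.0% as the assumed number of true infections grows to 2, 5 and 10 times the confirmed count.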

China has been fairly stable–around 4%. If there were very few cases and very few deaths over the last month, this figure makes sense, even if the real case fatality rate has dropped over time; the 4% is largely due to the high rate of deaths earlier in the outbreak, but may not reflect risk of death today.
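Why a cumulative figure can stay near 4% even if recent risk has fallen: deaths and cases from a large early wave dominate both the numerator and the denominator. The counts below are hypothetical, chosen only to illustrate the effect.

```python
def cfr(deaths: int, cases: int) -> float:
    """Case fatality rate: deaths divided by confirmed cases."""
    return deaths / cases


# Hypothetical counts: a large early wave and a small recent period.
early_cases, early_deaths = 80_000, 3_200    # 4% CFR early in the outbreak
recent_cases, recent_deaths = 1_000, 10      # 1% CFR in the last month

cumulative = cfr(early_deaths + recent_deaths, early_cases + recent_cases)
recent = cfr(recent_deaths, recent_cases)
print(f"cumulative CFR: {cumulative:.2%}, recent-period CFR: {recent:.2%}")
```

Here the cumulative CFR stays at about 3.96% even though the recent-period CFR is 1.00%.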

Germany and Canada have case fatality rates at or below 1.5%. Germany has seen the case fatality rate rise slightly over time. In Canada, the case fatality rate has been fairly stable, though it is creeping up as testing becomes more targeted. Here is a timeline for proportion of positive tests (PPT) and case fatality rate (CFR) for Ontario:

I used different data for the US, from the COVID Tracking Project, which tracks cases as well as the frequency of testing. Here I plot the trend in proportion of positive tests and case fatality rate for the whole US:

The case fatality rate has been dropping in the US as testing has increased, from about 2.5% in early March to about 1.5% in late March. However, the US still has a very high proportion of positive tests (close to 15% right now), which suggests either that the infection is much more widespread than the case counts indicate, or that the US is very selective about whom it tests.

What does this mean?

If we control for 1) access to a well functioning health care system, 2) age and 3) pre-existing health status, it’s hard to see how the case fatality rates would vary this much internationally or over time. Even the differences between the US and Canada/Germany can’t be explained by access to health care or differences in epidemiology. Is the average German really less than half as likely to die from a coronavirus infection than the average American? I highly doubt it.

As I stated at the outset, infection rate alone is not adequate for policy decisions. A high infection rate coupled with a low case fatality rate of 0.01% is not a unique public health crisis, and does not justify the social and economic upheaval we’ve seen. The coronavirus outbreak is a problem if case fatality and infection rates are high enough to cause a serious increase in deaths. Yet, we remain uncertain about what these values actually are. So what do we do?
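The point that infection rate and case fatality rate only matter together can be put as a one-line product. The population size and the share eventually infected below are hypothetical.

```python
def expected_deaths(population: int, share_infected: float, cfr: float) -> float:
    """Expected deaths = population x share eventually infected x case fatality rate."""
    return population * share_infected * cfr


POPULATION = 1_000_000   # hypothetical
SHARE_INFECTED = 0.5     # hypothetical share of the population eventually infected
for cfr in (0.0001, 0.005, 0.01):
    print(f"CFR {cfr:.2%}: about {expected_deaths(POPULATION, SHARE_INFECTED, cfr):,.0f} deaths")
```

With the same infection rate, moving the CFR from 0.01% to 1% multiplies the expected deaths by 100 (from about 50 to about 5,000 here), which is why uncertainty about the CFR matters so much for policy.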

Making decisions under conditions of uncertainty

For the sake of argument, let’s assume that the real case fatality rate is probably less than 1%. By real I mean that over the entire population, in a country where health care services are available, and where cause of death is directly attributed to the infection, the average infected person has less than a 1% chance of dying from an infection. In age specific terms, older patients have a much higher risk, as do people with pre-existing illnesses. In the very young, the risk of death is very low, and the infection may even be less dangerous than the seasonal flu. In healthy middle aged adults, it could be in the 0.5% range on average.

However, there is uncertainty in this estimate, and importantly, the bounds of uncertainty are not symmetrical. These bounds tell us what the expected uncertainty is around this estimate, and acknowledge that there is a range of possible true values that we are not certain about. The graphic below sketches how plausible different values of the true case fatality rate are:

This is a ‘ball-park’ figure; I have no idea what the true probability distribution is (I’ve not even labelled the y-axis). The red line is the location of the best estimate–again pure speculation. However, we can be pretty certain of some things. First, any reasonable estimate of this curve has to put a non-zero probability in the right tail, and the tail stretches out–perhaps to 2 or 3%. There is a very, very small chance that the case fatality rate could be above 4%, particularly if the strain evolves to become more virulent (deadly) over time. However, there is essentially no probability that the case fatality rate is 0%, or even 0.01%, even if it evolves to become less virulent over time. The data, as flawed as they are, seem to show that this virus is killing people at least as often as seasonal influenza.
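To illustrate what an asymmetric distribution like this implies, here is a sketch that stands in a lognormal shape for the unknown curve. The shape, the 0.5% median and the spread (sigma = 0.6) are made-up assumptions, not estimates; the point is only that a right-skewed curve has a mean above its median, a thin but non-zero right tail, and negligible mass near zero.

```python
import math

# Purely illustrative assumptions: a lognormal shape, a 0.5% median and sigma = 0.6.
# None of these are estimates; they just produce a long right tail.
MEDIAN_CFR = 0.005
SIGMA = 0.6
MU = math.log(MEDIAN_CFR)


def prob_cfr_above(x: float) -> float:
    """P(true CFR > x) under the illustrative lognormal distribution."""
    z = (math.log(x) - MU) / SIGMA
    return 0.5 * math.erfc(z / math.sqrt(2))


mean_cfr = math.exp(MU + SIGMA ** 2 / 2)  # lognormal mean sits above the median
print(f"median {MEDIAN_CFR:.2%}, mean {mean_cfr:.2%}")
for x in (0.0001, 0.01, 0.02, 0.04):
    print(f"P(CFR > {x:.2%}) = {prob_cfr_above(x):.4f}")
```

Under these made-up parameters, the chance of exceeding 2% is about 1%, the chance of exceeding 4% is a few hundredths of a percent, and the chance of a CFR below 0.01% is negligible.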

Furthermore, the nature of infectious diseases is that our decisions today directly impact the state of the world in the future. If we ignore infection control today, then at some point there is a good chance we will have to deal with the consequences of that decision; we won’t be able to reverse our policy and ‘uninfect’ ourselves. If we assume today that the case fatality rate is 0.05% (and accordingly, take no action), but it is actually 1.5%, then we live with the consequences of that decision of inaction forever.

So, in spite of the weak data, and the complex and contested world of policy making in a time of crisis, there is good reason for policy makers and the public to act as if the coronavirus situation is serious even with the current uncertainty, at least until more data emerge that clarify what the true case fatality rate is.

# Coronavirus data: good news and bad news

Today I want to discuss novel coronavirus (SARS-CoV-2) testing and the widely available data resulting from this testing. My conclusion from this analysis is that, in spite of the wealth of shared data out there, these data, and much of the analysis done with them, are not useful, because existing data do not give much indication of the infection level in the population. The corollary, however, is that the most widely shared current case fatality rate estimate (ranging between 1 and 3%) is probably too high, and the true case fatality rate could easily be less than 0.5%, particularly in countries with functioning health care systems.

Background

I’ll start by defining the key measures I use in this analysis: proportion of positive tests (PPT), the proportion of the population testing positive (PPTP), the proportion of the population tested (PT) and the case fatality rate (CFR).

The PPT is the ratio of positive tests to all tests conducted, and tells us the fraction of people selected for testing who have tested positive. PPT is an estimate of how likely a test will be positive in a population. PPTP is the ratio of positive tests to the total population, and is an estimate of the fraction of the population known to be infected based on positive test results. PT is the ratio of all tests conducted to the total population, and tells us the fraction of the population that has been tested.

Here is a figure that helps visualize the difference between these concepts:

The large light blue circle is the entire population, the middle blue circle is the number of people tested, and the darkest blue circle is the number of people testing positive. Each of these circles is a known quantity. Population is known from the census, the middle circle is the number of tests conducted by clinicians and health officials, and the smallest circle is the result of these tests (assuming that the tests are themselves accurate, which I think is generally accepted).

The last measure, CFR, is the ratio of all deaths linked to the coronavirus (the purple circle) to the total number of positive cases. In many ways, this is the most important number of all. If CFR were 0.1%, this coronavirus would not have gotten anywhere near the attention it has received; it is the fact that CFR appears to be so high (greater than 1% by most common estimates) that causes so much concern.
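The four measures can be written down directly from the counts. A minimal sketch; the class name `RegionCounts` and the regional counts are hypothetical (chosen to give 13.5 tests per 10,000 people and 2% positive tests).

```python
from dataclasses import dataclass


@dataclass
class RegionCounts:
    population: int
    tests: int
    positives: int
    deaths: int

    @property
    def pt(self) -> float:
        """Proportion of the population tested."""
        return self.tests / self.population

    @property
    def ppt(self) -> float:
        """Proportion of tests that are positive."""
        return self.positives / self.tests

    @property
    def pptp(self) -> float:
        """Proportion of the population testing positive."""
        return self.positives / self.population

    @property
    def cfr(self) -> float:
        """Deaths divided by confirmed (test-positive) cases."""
        return self.deaths / self.positives


# Hypothetical region: 13.5 tests per 10,000 people, 2% of tests positive.
region = RegionCounts(population=10_000_000, tests=13_500, positives=270, deaths=3)
print(f"PT   = {region.pt * 10_000:.1f} per 10,000")
print(f"PPT  = {region.ppt:.1%}")
print(f"PPTP = {region.pptp * 10_000:.2f} per 10,000")
print(f"CFR  = {region.cfr:.1%}")
```

Note that PPTP is simply PPT × PT, so a low PPTP can reflect either little infection or little testing.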

With the exception of PT, which is probably fairly accurate, what the rest of these measures represent is complicated by the fact that the methods of selecting the population for testing are not random. If people were randomly selected for testing, then PPT would be a great estimator of the true proportion of the population infected. If we saw changes in this figure over time, we could be confident that this change reflected a change in the level of infection in the population. Furthermore, we could fairly accurately estimate the CFR as well–we’d simply divide the deaths in the sample by the total number of positive cases.

As it turns out, the decision to test people is not based on a random selection of the population. For practical reasons, the test is administered to a subset of the population that meets some pre-test criteria for testing. Although the rules vary by jurisdiction, in general, testing appears to be increasingly focussed on high risk/vulnerable populations and health care workers. Earlier in the outbreak, tests were targeted at people with a high pre-test probability of infection (such as travellers to high risk countries with symptoms). Now, the testing decisions may have changed. In some jurisdictions, people who have travelled and have symptoms are simply told to self-isolate untested–under the assumption that they probably are an infected case. In other jurisdictions, the testing is becoming more widespread.

Lots of data!

There are mountains of data available on coronavirus, and seemingly thousands of data scientists creating beautiful maps and graphs online. I’ve made a few of my own (though, maybe they’re not that beautiful). Given that the process for selecting people for testing is not random, what are we to make of the data that underlie all this analysis? Are all these nice maps and graphs useful? Are the numbers right?

This is where things get a bit tricky. If the pre-test screening is very accurate at identifying cases (specifically, includes most or all infected people who will test positive and very few uninfected) then PPT is a poor measure of infection. In fact, it will drastically overestimate levels of infection.

However, if we knew this was the case, we could use PPTP as an estimate of infection rate–since the number of screened cases would be close to the real number of cases. Of course, if the screening process were that accurate, then we wouldn’t need the laboratory confirmed test in the first place. The unfortunate reality is that the screening process is effective at identifying some likely cases, but misses many others, and also includes many false positives (in fact, the vast majority are false positives).
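A small simulation makes the contrast concrete. It assumes a perfectly accurate test (as this section does) and hypothetical screening rates: 60% of infected people and 2% of uninfected people meet the pre-test criteria. All names and numbers are illustrative.

```python
import random


def simulate(population=100_000, prevalence=0.01, sample_size=5_000, seed=42):
    """Compare PPT under random sampling with PPT/PPTP under targeted screening."""
    rng = random.Random(seed)
    infected = [rng.random() < prevalence for _ in range(population)]
    true_prevalence = sum(infected) / population

    # 1) Random sample: every person equally likely to be tested.
    sample = rng.sample(infected, sample_size)
    random_ppt = sum(sample) / sample_size

    # 2) Targeted screening (hypothetical rates): 60% of infected people and
    #    2% of uninfected people meet the pre-test criteria and get tested.
    tested = [inf for inf in infected if rng.random() < (0.60 if inf else 0.02)]
    targeted_ppt = sum(tested) / len(tested)
    targeted_pptp = sum(tested) / population
    return true_prevalence, random_ppt, targeted_ppt, targeted_pptp


true_prev, random_ppt, targeted_ppt, targeted_pptp = simulate()
print(f"true prevalence      {true_prev:.4f}")
print(f"PPT, random sample   {random_ppt:.4f}")
print(f"PPT, targeted        {targeted_ppt:.4f}")
print(f"PPTP, targeted       {targeted_pptp:.4f}")
```

With these made-up rates, the random sample's PPT lands close to the 1% prevalence, while targeted screening gives a PPT of roughly 0.23 (a large overestimate) and a PPTP of roughly 0.006 (an underestimate).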

The decision about who to test and not test is influenced by many factors, and unsurprisingly, testing frequency varies considerably around the world. In the US, testing (as measured by the proportion of the population tested, or PT) is lower than in many other countries. Based on data I have found online, it’s around 2.3 per 10,000 people at present. In Canada, PT is around 13.5 per 10,000 people. In South Korea, the number is around 60 per 10,000, and perhaps more. I can’t find any firm data on how many tests have been conducted in Germany, but apparently it’s fewer than in South Korea, and more than in most countries in Europe.

However, as I’ve hopefully made clear above, the number of tests does not necessarily influence the accuracy of our estimates of the proportion of the population infected. It does affect precision, and the ability to drill down into details–more tests mean increasingly local estimates of infection rates are possible. What matters more is the process or protocol for choosing who gets laboratory tests. The more random the process for selecting people, the more likely PPT can be used to accurately estimate the current proportion of infected population. The less random, the more uncertain we become.
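The accuracy-versus-precision point can be shown analytically. Under a fixed screening protocol (hypothetical selection probabilities, perfect test assumed), scaling up testing leaves the expected PPT unchanged, while for a random sample more tests only shrink the margin of error. Function names are illustrative.

```python
import math


def expected_ppt(prevalence: float, p_test_if_infected: float, p_test_if_uninfected: float) -> float:
    """Expected PPT when selection for testing depends on infection status (perfect test assumed)."""
    tested_infected = prevalence * p_test_if_infected
    tested_uninfected = (1 - prevalence) * p_test_if_uninfected
    return tested_infected / (tested_infected + tested_uninfected)


def standard_error(p: float, n: int) -> float:
    """Sampling standard error of a proportion estimated from n randomly selected people."""
    return math.sqrt(p * (1 - p) / n)


PREVALENCE = 0.01
# Same protocol, more tests: scale both selection probabilities by the same factor.
for scale in (1, 2, 4):
    ppt = expected_ppt(PREVALENCE, 0.15 * scale, 0.005 * scale)
    print(f"targeted testing x{scale}: expected PPT = {ppt:.3f}")

# Random sampling: unbiased, and more tests shrink the margin of error.
for n in (1_000, 4_000, 16_000):
    print(f"random sample of {n:>6}: PPT ~ {PREVALENCE:.3f} +/- {1.96 * standard_error(PREVALENCE, n):.4f}")
```

Quadrupling the number of randomly sampled people halves the margin of error, but no amount of extra testing under the targeted protocol moves its expected PPT (about 0.23 here) toward the true 1%.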

So what are we left with? Well, implicitly, people seem to be using the ratio of infected people to total population (PPTP) as an indicator of the level of infection. This would be fine if we knew that the testing process captured every case, but we don’t know that. In fact, we can be pretty sure that many cases will go undetected, and that current and future case counts will be underestimates.

Implications

People (including me) are excitedly making maps, and sharing all sorts of data on coronavirus infections. I have seen many very beautiful interactive online tools of infection counts that are fun to explore. As amusing as these are to play with, I am not sure many have been useful. The number of cases in a region is a product of the level of infection, the population size, the number of people tested and the process for selecting people for testing. At present, the impact of a non-random testing selection process leaves us uncertain of what the risk actually is, pretty well everywhere.

This means that the level of infection globally has enormous uncertainty to it–no surprise there, really. This is even more true in our specific communities. This is not just because some people are yet to be tested because they do not show symptoms, but because they may never be tested based on the screening process. Moreover, this process may vary from place to place, and even over time, so it will be hard to make anything but broad and general comparisons.

This uncertainty probably amounts to an underestimate of the true number of cases. It could be a small underestimate or a large one. It’s an underestimate because the test selection process in most jurisdictions is biased towards people who are vulnerable, have serious symptoms and/or have travelled; asymptomatic infections and infections in non-travellers are probably being missed.

The good news is that if we are indeed under-counting the number of covid-19 cases, then we are overestimating CFR. The case fatality rates in Germany and South Korea, where there is more widespread testing, are less than 1%. However, even in these countries they are still not randomly selecting people for testing, and may not be testing enough to use PPTP as an estimator of the infection level. As a result, it’s still very possible that cases are being under-counted in these regions, and that the true CFR in South Korea and Germany is less than 0.5%, or even less than 0.25%.

None of this changes the real impact of the coronavirus so far–thousands have died worldwide, and these deaths are tragic. Taking aggressive action to curtail the infection–even if the CFR is 0.5% or 0.25%–can still be justified on public health and ethical grounds. Moreover, the collapse of health care systems remains a real threat, and can cause knock-on effects, including deaths from other treatable conditions that go untreated because of health care system failures.

Solutions

If accurate estimates of the proportion of the population affected are important to us, there are possible solutions. For one, there may be some re-sampling options for selecting quasi-random samples from the tested population. To do this would require information about the test subjects, and coordination between testing facilities. But it’s possible some re-sampling process could construct a synthetic ‘sample’ that is more generally representative of the population, and would give a better sense of the underlying proportion of persons infected.

There may also be some post-stratification options. This would involve weighting the tested populations so that under-represented observations are given greater weights, and over-represented observations are given smaller weights. I am not sure if this is possible, but I assume that someone is looking into it.

More testing could help, particularly if it reaches a breadth of the population. Low testing in some places around the world is almost certainly causing problems–in some cases, a false sense of security that could lead to more infection and more death when health care systems get hit with a spike of cases.

Random sampling of the population would solve the problem lickety-split, but that’s probably not going to happen. It would be expensive, particularly if it targeted regions or local areas. Moreover, how many people would subject themselves to a random coronavirus test by a government official knocking on the door? If the rejection rate was high, test refusal would end up biasing the data again.

Conclusion

Testing for coronavirus is important. It can be used for tracing the origin of cases, identifying people for isolation or quarantine, and determining whether or not the infection is present at all in a population. More testing has value, and as tests get easier and more widespread, the information will improve. Indeed, cheaper and easier testing (like home testing kits) could even be the key to getting control of the pandemic.

However, until testing becomes more representative (or we learn that the existing testing sample is already pretty representative) then we should all be wary of much of the data we see and use. The current counts could be close to the mark, could be a small under-estimate, or even a large under-estimate. If this is the case, it also means that current estimates of the case fatality rate could be greatly inflated.

Post script note: the morning I published this post, I read a post by John Ioannidis (published on March 17th) that states similar concerns to the ones I express above. Although I don’t draw the exact same conclusions, I think he raises some important questions, and we both agree that random-sample testing for SARS-CoV-2 in the population could be very useful.

# Coronavirus epi curves

I did some analysis of epidemiology curves for coronavirus. This particular curve plots out the cumulative proportion of cases over time for a number of countries:

Each point on the line is a proportion of the total, which is why they all touch at the far right: all countries are at their cumulative maximum (1.0) as of March 16th.
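The normalization behind these curves is simple: accumulate daily cases and divide by the latest cumulative total, so every country ends at 1.0. A sketch with hypothetical daily counts:

```python
from itertools import accumulate


def cumulative_proportion(daily_cases):
    """Cumulative cases divided by the latest cumulative total, so the last point is 1.0."""
    cumulative = list(accumulate(daily_cases))
    total = cumulative[-1]
    if total == 0:
        raise ValueError("no cases reported")
    return [c / total for c in cumulative]


# Hypothetical daily new-case counts for an early-hit and a late-hit country.
early_outbreak = [50, 200, 400, 300, 100, 30, 10, 5]
late_outbreak = [0, 0, 1, 3, 10, 40, 120, 300]
for name, series in (("early", early_outbreak), ("late", late_outbreak)):
    print(name, [round(p, 2) for p in cumulative_proportion(series)])
```

Because each series is divided by its own total, the curves show timing and shape but say nothing about the absolute number of cases in each country.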

The graphs differ across countries in two important ways. First, they are shifted in time. This shows something we already know–that China and south-east Asia got hit with the infection first, and Western Europe and North America more recently.

More interesting is the shape of the curves. Notice that the rate of increase has been flattening out for China for some time. South Korea is seeing a more recent flattening. Countries in Europe and North America are seeing a large increase now.

The most noteworthy line on this graph is Japan. Japan is seeing slow and steady growth in cases, which is not what infectious disease models typically predict. Usually growth, and often decline, tends to be nonlinear: a fast rise followed by a fast drop (and then a possible return with a lower amplitude). It’s hard to know what to make of this.
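One rough way to tell slow and steady growth from the nonlinear growth that models usually predict is to compare how well a straight line fits the raw cumulative counts (linear growth) versus their logarithms (exponential growth). This is a crude heuristic rather than a formal model comparison, and the counts below are hypothetical.

```python
import math


def r_squared(xs, ys):
    """R^2 of an ordinary least-squares straight-line fit of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)


def growth_shape(cumulative_cases):
    """Crude check: does a line fit raw counts (linear) or log counts (exponential) better?"""
    days = list(range(len(cumulative_cases)))
    linear_fit = r_squared(days, cumulative_cases)
    # Log transform requires strictly positive counts.
    exponential_fit = r_squared(days, [math.log(c) for c in cumulative_cases])
    return "linear" if linear_fit > exponential_fit else "exponential"


# Hypothetical cumulative case counts over 20 days.
steady = [100 + 20 * d for d in range(20)]
growing_20pct_daily = [100 * 1.2 ** d for d in range(20)]
print(growth_shape(steady), growth_shape(growing_20pct_daily))
```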

Is it because Japan is under-testing or under-reporting? Or is it that public health interventions were implemented very quickly and effectively in Japan? Only time will tell… Here’s the code for you to see for yourself.