The importance of good research on tricky policy topics

I teach a graduate class in data analysis, and we are going to critically assess a few papers in our last seminar of the year.  I thought a good starting point would be my own work, since there are myriad flaws in most studies I’ve done.  But then I stumbled on a recently published paper on the topic of perception and immigration.  I decided that this paper would make a much more interesting example for discussion and illustration.

Noah Carl, a graduate student in the UK, published a paper titled “Net opposition to immigrants of different nationalities correlates strongly with their arrest rates in the UK”.  I was struck by a combination of 1) how bad the paper is and 2) the potential impact of the results on public perception and policy.  Nobody is perfect, so it’s reasonable to be at least somewhat forgiving of bad research, especially from early-career scholars.  However, given the ease with which debates about immigration can be hijacked by misinformation, accusation and racism, I think academics should ensure that their contribution to the debate is beyond reproach (or at least close to it).

It’s worth noting that the paper was published in a bottom-tier online journal.  Further, while my review is very critical, my intent is not to pillory the student, but rather to critique his work.  Finally, I have no idea what is true.  His conclusions could be right or wrong; I simply contend that this particular research offers no insight on the matter either way, and that research this bad should never be published in any form.

What does the researcher claim?

The paper looks at the association between British perceptions of immigrants by country of origin and the crime rates of these immigrant populations in the UK.  The paper concludes that since the rate of crimes committed by immigrants correlates with the perception of immigrants, ‘public beliefs about immigrants are more accurate than often assumed.’


The conclusions are inconsistent with the evidence

I will go through the specifics of the paper as I critically assess it.

1. Ecological study design problem

The research design is ecological.  This means that the units of analysis are not individuals, but aggregates (groups) of individuals.  This is probably the weakest study design in the social sciences: it is not only observational, it also measures nothing about individual people, only averages across groups of them.  One consequence of this is that these research designs tend to over-estimate model fit.  That is, any effects estimated tend to fit more poorly in the real world than they do in the model, because averaging hides the variability between individuals.  In this example, had the author fit his models to individual data on the perception of immigrants rather than to averages, he probably would have seen a weaker relationship than he observed.
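
To make the point concrete, here is a minimal simulation sketch.  It uses made-up data, not the data from the paper: every group shares a small group-level factor, while individuals within each group vary a lot.  The function names, the group sizes and the noise levels are all assumptions chosen for illustration.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate(n_groups=23, n_per_group=200, seed=1):
    """Return (individual-level r, group-mean r) for synthetic data."""
    rng = random.Random(seed)
    ind_x, ind_y, grp_x, grp_y = [], [], [], []
    for _ in range(n_groups):
        shared = rng.gauss(0, 1)  # factor common to everyone in the group
        # Individual values: shared factor plus large individual noise.
        xs = [shared + rng.gauss(0, 3) for _ in range(n_per_group)]
        ys = [shared + rng.gauss(0, 3) for _ in range(n_per_group)]
        ind_x.extend(xs)
        ind_y.extend(ys)
        # The ecological analysis only ever sees the group averages.
        grp_x.append(statistics.fmean(xs))
        grp_y.append(statistics.fmean(ys))
    return pearson(ind_x, ind_y), pearson(grp_x, grp_y)


if __name__ == "__main__":
    individual_r, group_r = simulate()
    print(f"individual-level r = {individual_r:.2f}")
    print(f"group-mean r       = {group_r:.2f}")
```

Under these assumptions the correlation between group averages comes out far stronger than the correlation between individuals, even though both are computed from the same underlying data.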

2. Small sample problem

In addition to being a weak study design, the author relies on 23 observations to draw his conclusions.  Statistics can make up for small samples when study designs are strong and variables are measured without systematic error, but the small sample size used in this study is particularly troubling when combined with all the other problems with the study.  Small samples are a multiplier of all other problems.
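
A rough way to see what 23 observations can and cannot tell us is to calculate a confidence interval for a correlation.  The sketch below uses the standard Fisher z-transformation; the correlation of 0.6 is a hypothetical value picked for illustration, not a figure from the paper, and the method assumes a random sample from a roughly bivariate normal population (which this study does not have).

```python
import math


def correlation_ci(r, n, z_crit=1.96):
    """Approximate 95% confidence interval for a Pearson correlation.

    Uses the Fisher z-transformation; assumes a random sample from a
    bivariate normal population.
    """
    z = math.atanh(r)            # transform r to the z scale
    se = 1 / math.sqrt(n - 3)    # standard error on the z scale
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


if __name__ == "__main__":
    for n in (23, 230, 2300):
        low, high = correlation_ci(0.6, n)  # r = 0.6 is hypothetical
        print(f"n = {n:4d}: 95% CI from {low:.2f} to {high:.2f}")
```

With n = 23 the interval for a hypothetical r of 0.6 runs from roughly 0.25 to 0.81, which covers everything from a weak relationship to a very strong one.  And that is the optimistic case, since the interval accounts only for random error.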

3. Bad sample

The researcher did not look at all immigrant data in the UK, but a small non-random sample of 23 countries.  There are a large number of Italian and Portuguese immigrants to the UK, but these data are not included in the study.  If they were included, the results might have looked different.  When the data we use are not exhaustive (complete) and not selected randomly, there is always the possibility that the selection of data used will affect our findings in a systematic way.  This is particularly problematic when the sample is small; a small non-random sample is the holy grail of statistical badness.

4. Missing variable problem

The author uses a multiple regression model to control for the ‘confounding’ effect of things like whiteness, English speaking, being from a Western country, and religion on his observation that crime rates influence perception of immigrants.  He did not control for other potential confounding effects, however, like the economic wealth / productivity of the country of origin, historical tensions or media portrayals.  I added per capita GDP to his data set and observed that the log of per capita GDP correlates more strongly with perception of immigrants than the log of crime rate.  It’s hard to know what variables to include in an analysis like this, but it matters, as the inclusion and exclusion of variables can change how data are interpreted.
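
Here is a small sketch of how an omitted variable can manufacture a strong correlation.  The data are entirely synthetic and the variable names (`wealth`, `crime`, `opposition`) are labels I made up for the illustration; nothing here is estimated from the paper’s data.  In the simulation `crime` has no effect on `opposition` at all; both are driven by `wealth`.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def partial_correlation(xs, ys, zs):
    """Correlation of xs and ys after removing the linear effect of zs."""
    r_xy, r_xz, r_yz = pearson(xs, ys), pearson(xs, zs), pearson(ys, zs)
    return (r_xy - r_xz * r_yz) / ((1 - r_xz ** 2) * (1 - r_yz ** 2)) ** 0.5


def simulate(n=2000, seed=2):
    """Synthetic data in which wealth drives both other variables."""
    rng = random.Random(seed)
    wealth = [rng.gauss(0, 1) for _ in range(n)]
    crime = [-w + rng.gauss(0, 0.5) for w in wealth]
    opposition = [-w + rng.gauss(0, 0.5) for w in wealth]
    return crime, opposition, wealth


if __name__ == "__main__":
    crime, opposition, wealth = simulate()
    print(f"raw r(crime, opposition)        = {pearson(crime, opposition):.2f}")
    print(f"partial r, controlling wealth   = "
          f"{partial_correlation(crime, opposition, wealth):.2f}")
```

The raw correlation is strong, but it mostly disappears once the omitted variable is controlled for.  Whether something like this is happening in the real data is exactly what the paper cannot tell us.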

5. Non sequitur

Carl draws the conclusion that ‘public beliefs about immigrants are more accurate than often assumed’, but the bulk of his analysis does not meaningfully address this claim.  Carl has not defined what ‘accurate’ is, but no reasonable definition can be boiled down only to crime rate, that is, to the negative contributions of immigrants.  If public opinions were really ‘accurate’, they would also correlate with the positive contributions of immigrants, and in fact there would be a strong correlation between the net utility of immigrants and the perception of immigrants.  But Carl focuses only on one possibly useful measure, and ignores the rest.  As such, even if the technical difficulties above were overlooked, his conclusion is a misdirection since he’s not really measuring accuracy of public opinion.

My conclusion

There is more bad to say about this research, but I think I’ve made my point.

The author may respond by saying he did the best he could given the data available, but this is no excuse or justification.  There is a useful saying about putting lipstick on a pig; this research is an attempt to make good use of bad data, but it would have been better to have simply collected better data first rather than trying to dress it up to look pretty.

In short, it is never OK to publish research this bad, even in an inconsequential online journal.  Let me repeat: I have no idea what is true here; the author may or may not be correct.  What I know is that this research leaves so much to be desired that it requires more qualification than the word limits of a journal article would typically allow.

What makes the research worse is its implications for public perception of immigrants and immigration.  This research can (and dare I say will) be read and interpreted in support of a certain perspective on immigration.  Research on fraught subject matter has to be held to an especially high standard of scientific rigour since conversations on these kinds of topics can easily reduce to name calling and hate.  Scientific contributions to such debates are important, but can be undermined by shoddy work.

Conflation of errors in political polls

What is error?

Error is the difference between what we think is true and what is actually the case.  In statistical analysis, error can be classified into two general forms: random and non-random error.

Random error is sampling error, that is, the error that comes from taking a random sample from a population.  This error emerges because samples are not guaranteed to perfectly reflect the populations from which they are selected.  For illustration, imagine I want to know the average height of men in Canada.  I can’t measure the height of every man; it’s just not practical.  So I take a sample, say of 100 men.  Ideally, the sample is taken based on some sort of random process, like a computer program that randomly selects phone numbers from a list of all phone numbers.  A random selection process like this is most likely to cover the breadth of possible heights at their expected frequencies; some men are very tall, some men are very short, but most are around the average.  However, it is possible that through this random sampling process I select, just by chance, a sample that is taller than average.  This is possible because random samples are not guaranteed to look just like the populations they are drawn from.  This is analogous to the expected variability we’d get in a coin flipping experiment: we don’t expect heads to come up 50% of the time in every series of coin flips.  Instead, we expect a little variability; a few more heads than tails (or vice versa) in a series of coin flips is not a surprise.
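
The height example can be sketched as a quick simulation.  The population mean of 175 cm and standard deviation of 7 cm are assumed values for illustration only, not real figures for Canadian men.

```python
import random
import statistics


def sample_means(n_samples=2000, sample_size=100,
                 true_mean=175.0, true_sd=7.0, seed=3):
    """Draw many random samples of heights and return each sample's mean."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_samples):
        heights = [rng.gauss(true_mean, true_sd) for _ in range(sample_size)]
        means.append(statistics.fmean(heights))
    return means


if __name__ == "__main__":
    means = sample_means()
    print(f"lowest sample mean:   {min(means):.1f} cm")
    print(f"highest sample mean:  {max(means):.1f} cm")
    print(f"average of the means: {statistics.fmean(means):.1f} cm")
    # Theory says the spread should be about true_sd / sqrt(sample_size).
    print(f"spread (SD) of means: {statistics.stdev(means):.2f} cm")
```

Each sample of 100 men gives a slightly different average, but the averages cluster around the true value with a spread close to 7 / √100 = 0.7 cm.  That predictability is what makes random error tractable.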

Non-random error is a little more mysterious and can come from many sources.  It can come from instrument errors, calculation errors, observer and participant biases, incorrect assumptions and a host of other problems that can occur throughout the research design process.  Non-random error is (generally) less of a problem in well designed experimental research; indeed, a properly designed randomized controlled trial should, in principle, have only random error.  Non-random error is a big problem for almost everything else humans do, including marketing products, climate models, sports analytics, crime prevention, medicine and public health, urban design…and on and on…

Random error has mathematical properties that allow us to understand it; it is a type of error that we can often estimate.  Statistical inference based on the calculation of ‘p-values’ is the conventional attempt to address random error, but it can’t really help us understand non-random error.


Polling errors are different…

The standard political poll says something like ‘in a survey of 10,000 likely voters, 32% of people plan to vote for candidate A, with a margin of error of 1%, 19 times out of 20’.  In somewhat awkward statistical language, this means that if the poll were repeated many times, intervals calculated this way (here, 31% to 33%) would be expected to include the true % of people who will vote for candidate A 19 times out of 20.  People reading the poll may think that it conveys a high degree of certainty: that the small interval probably contains the true voter support and is therefore a good guess about the likely outcome of the election were it held today.  But unfortunately, many non-random errors are unaccounted for in political polling, and worse, may not be reduced at all by taking large samples.  Just to name a few:

  1. People may be less willing to admit to voting for radical or unsavoury candidates in a phone or in-person poll (the ‘stealth voting’ effect)
  2. People may change their votes at the polling booth due to a desire to vote for the likely winner (the ‘bandwagon effect’)
  3. Certain sampling methods may under-represent certain groups of voters
  4. People are influenced to change their voting behaviour based on polling results

All of these effects (and others) have been discussed in the political science literature on polling, so it’s not a new problem, and it should be of little surprise that so many polls seem to get election results wrong, even when taken shortly before election day.  The problem is that reporting the magnitude of random error (as confidence intervals at a certain confidence level) makes it seem that polling error is known and small.  This ‘margin of error’ information gives too much authority to pollsters, and on occasion journalists have failed to dig deeply enough into the numbers to properly scrutinize the polling information.  Random error and statistical uncertainty are not the main problem with polling data; the main problem is more systematic, more unpredictable, and more difficult to explain.
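
The difference between the two kinds of error can be sketched in a few lines.  The margin of error below is the usual normal approximation for a proportion.  The ‘stealth voting’ simulation is a toy model built on assumptions of my own: true support of 36%, and supporters who admit their preference to the pollster only 90% of the time.  Neither number comes from a real poll.

```python
import math
import random


def margin_of_error(p, n, z_crit=1.96):
    """Margin of error for a proportion, 19 times out of 20 (normal approx.)."""
    return z_crit * math.sqrt(p * (1 - p) / n)


def biased_poll(true_support, n, admit_prob, seed=0):
    """Simulate a poll where supporters admit their choice with admit_prob.

    Supporters who do not admit it are counted as not supporting candidate A
    (a simplification for the sake of the sketch).
    """
    rng = random.Random(seed)
    admitted = 0
    for _ in range(n):
        supports = rng.random() < true_support
        if supports and rng.random() < admit_prob:
            admitted += 1
    return admitted / n


if __name__ == "__main__":
    true_support = 0.36  # assumed true value, unknown to the pollster
    for n in (1_000, 10_000, 100_000):
        estimate = biased_poll(true_support, n, admit_prob=0.9)
        moe = margin_of_error(estimate, n)
        print(f"n = {n:7d}: estimate {estimate:.3f} +/- {moe:.3f} "
              f"(true support {true_support:.2f})")
```

As the sample grows, the margin of error shrinks, but the estimate settles near 32% rather than 36%.  A bigger sample only makes the poll more confident about the wrong number.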

The solution?

One approach is to incorporate other sources of error directly into the margin of error.  This isn’t easy, but some statistical theorists (like Bayesians) have ideas about how this could be done.  Generally speaking, it involves increasing the level of uncertainty in the margin of error based on what we know about the polling method and other factors.  If the current research shows that a certain data collection method has had a very low success rate, this should be factored into the margin of error.  If one of the candidates has a poor public image, this should be factored into the predicted voting results.

It may also be possible to develop a more systematic understanding of how polls are wrong, and eventually improve their accuracy by making better adjustments to poll results. For example, if we know that polls systematically under-predict support for certain types of candidates (e.g., men, blowhards and idiots), then it may be possible to improve poll accuracy by boosting poll numbers for these candidates by a small amount.
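
As a very rough sketch of both ideas (widening the margin of error and adjusting the estimate), suppose we had a record of how far past polls of the same type missed the actual result.  The function below shifts the poll by the average past miss and widens the margin by the spread of past misses.  This is a simple illustration and not the Bayesian approach mentioned above; it assumes that past errors are a fair guide to the current one and that sampling and non-sampling error are independent.  The past errors in the example are made-up numbers.

```python
import statistics


def adjusted_estimate(poll_share, sampling_moe, past_errors, z_crit=1.96):
    """Adjust a poll for historical bias and widen its margin of error.

    past_errors: (actual result - poll estimate) for earlier comparable
    polls, as proportions. Hypothetical input for illustration.
    """
    bias = statistics.fmean(past_errors)     # average past miss
    spread = statistics.stdev(past_errors)   # how variable the misses were
    sampling_se = sampling_moe / z_crit
    # Combine independent error sources by adding their variances.
    total_se = (sampling_se ** 2 + spread ** 2) ** 0.5
    return poll_share + bias, z_crit * total_se


if __name__ == "__main__":
    past_errors = [0.02, 0.03, 0.01, 0.04]   # made-up historical misses
    estimate, moe = adjusted_estimate(0.32, 0.01, past_errors)
    print(f"adjusted estimate: {estimate:.3f} +/- {moe:.3f}")
```

With these invented inputs the estimate moves from 32% to 34.5% and the margin of error nearly triples, which is a more honest (if less marketable) statement of what the poll knows.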

These solutions take a lot of work, and while they have been an important part of modern election prediction, they still have a very mixed record of success.  It seems plausible that there will be some cyclic pattern to the accuracy of polls (periods of high accuracy followed by periods of low accuracy) based on the ability of researchers to figure out the ways in which poll data are wrong over time.  Indeed, it seems plausible that if the effort to improve polling information continues, we may be able to expect better polling information in the near future.