One easy (and not uncommon) mistake in data analysis is to calculate a statistic from a statistic without considering statistical weighting. For illustration purposes, consider the following example.
Say I have data on neighbourhood income and population for a small city. The table of data looks like this:
Perhaps the first thing I want to know is the average income for the city as a whole. It seems pretty natural to simply take the average of the average incomes across these neighbourhoods. This would give me an average income for the city of $68,712. However, this number is incorrect. Taking the average of the averages assumes that each neighbourhood contributes the same amount of information to the city average. This is clearly not the case. Gastown has 305 residents, so the average income of these residents contributes far less information to the city average than that of Zinger Park, which has 3,900 residents.
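To make the equal-weight assumption explicit, here is the simple average written out. The symbols are my own shorthand rather than anything from the data: x_i is the average income of neighbourhood i, and n is the number of neighbourhoods.

```latex
\bar{x}_{\text{simple}}
  = \frac{1}{n} \sum_{i=1}^{n} x_i
  = \sum_{i=1}^{n} \frac{1}{n} \, x_i
```

Every neighbourhood gets the same weight, 1/n, no matter how many people live there.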
The solution is to simply take the weighted average. In this example, the weight is the population of the neighbourhood (perhaps better would be the population of employed people in the neighbourhood). If we sum the products of these weights and the average incomes, and then divide this by the sum of the weights, we get a weighted average. Here is a table that illustrates this visually:
The red cell is the sum of the product of weights and average income. The yellow cell is the sum of the weights (in this case, population). If I divide the red cell by the yellow cell, I get the weighted average (in orange).
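The same calculation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the populations for Gastown and Zinger Park are the ones mentioned above, but the income figures are made-up placeholders, and the function names are my own.

```python
# Illustrative data only. The populations (305 and 3900) come from the text;
# the average incomes are hypothetical placeholders, not the real table values.
neighbourhoods = [
    {"name": "Gastown", "population": 305, "avg_income": 90_000},
    {"name": "Zinger Park", "population": 3900, "avg_income": 50_000},
]


def unweighted_mean(rows):
    """Average of the averages: every neighbourhood counts equally."""
    return sum(r["avg_income"] for r in rows) / len(rows)


def weighted_mean(rows):
    """Population-weighted average of the neighbourhood averages."""
    # Sum of weight * average income (the "red cell").
    weighted_total = sum(r["population"] * r["avg_income"] for r in rows)
    # Sum of the weights (the "yellow cell").
    total_weight = sum(r["population"] for r in rows)
    # Their ratio is the weighted average (the "orange cell").
    return weighted_total / total_weight


print(f"Unweighted: {unweighted_mean(neighbourhoods):,.0f}")
print(f"Weighted:   {weighted_mean(neighbourhoods):,.0f}")
```

With these placeholder numbers the weighted figure sits much closer to Zinger Park's average than the unweighted one does, because Zinger Park holds most of the residents.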
Weights are common in statistical data analysis, and their role is usually to adjust a statistic based on the information it contains. In this example it’s pretty straightforward. None of this is rocket science, but taking an unweighted average of averages (or average of proportions, or average of any statistic) is done all the time. I see it in academic work, public reports and newspaper articles. It’s an easy mistake to make with a (usually) easy fix.