CORRELATIONS: BEWARE OF DANGEROUS LIAISONS
Just as human beings have a tendency to seek connections in life, we’re also prone to making connections when it comes to the analysis of data.
These connections, or correlations as they’re known in our world, can lead us up the garden path of ridiculous conclusions if we don’t tread carefully.
To illustrate this point, Datamine recently investigated the correlation between the location of police stations and where crimes are committed.
Using data from the police detailing the location of crimes and the time they took place, we matched individual crimes to the location in which they occurred. We then gathered all the addresses of police stations and geocoded them to get latitude and longitude for each one. Within each location, we worked out the distance between each crime and the nearest police station.
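For the curious, the distance step can be sketched in a few lines of Python. This is a minimal illustration, not the code we ran: the coordinates are made up, the function names are hypothetical, and it assumes straight-line (great-circle) distance using the haversine formula rather than travel distance by road.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/long points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_station_km(crime, stations):
    """Distance from one crime location to the closest station."""
    return min(haversine_km(crime[0], crime[1], s[0], s[1]) for s in stations)


# Hypothetical (latitude, longitude) pairs, for illustration only.
stations = [(-36.8485, 174.7633), (-36.9000, 174.8000)]
crimes = [(-36.8485, 174.7633), (-36.8600, 174.7700)]

distances = [nearest_station_km(c, stations) for c in crimes]
```

Checking every crime against every station is fine for a sketch; with a large number of crimes and stations, a spatial index would be the more practical choice.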
The purple bars in the graph above show how the number of crimes varies with distance to the nearest cop shop.
Census data gives the population of each location. Using this information, we also calculated the number of crimes per person (yellow line) to account for areas that have a higher population.
As you can see from the graph, it appears that there’s a higher incidence of crime close to police stations. But what can we take from this?
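To show why the per-person adjustment matters, here is a small sketch. The location names, crime list and population figures are invented for illustration, and `crimes_per_person` is a hypothetical helper, not part of our actual analysis.

```python
from collections import Counter


def crimes_per_person(crime_locations, population_by_location):
    """Crime count divided by population, for each location with residents."""
    counts = Counter(crime_locations)
    return {
        location: counts[location] / population
        for location, population in population_by_location.items()
        if population > 0  # skip empty areas to avoid dividing by zero
    }


# Invented data: one entry per recorded crime, tagged with its location.
crime_locations = ["A", "A", "A", "B"]
population_by_location = {"A": 3000, "B": 500}

rates = crimes_per_person(crime_locations, population_by_location)
```

In this made-up case, location A has three times as many crimes as B, yet B has the higher rate per person (1 in 500 against 1 in 1,000). Raw counts and rates can tell different stories.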
If we interpret these correlations at face value, we could draw some interesting conclusions:
- The police have cleverly located their stations in crime areas
- Criminals have deliberately committed brazen acts right under police noses
- The presence of police stations is a cause of crime
While the first conclusion might seem logical (NZ police are smart, right?), the other two are a bit more of a stretch. If we concluded that police stations are a cause of crime, who knows what might ensue – closures, funding reductions, and, ironically, an increase in crime itself!
In this example, further analysis is required to uncover any causal link between police station location and where crimes happen. One way to do this might be to compare crime rates in an area before and after the opening of a new police station. If we observed a decrease in crime rates, we would have more evidence of the effect a police station has on local crime.
Here’s another example, just for fun.
The graph* above suggests that if you were a margarine eater in Maine earlier this century, things could have been very rocky on the marriage front if you’d been overdoing your annual quota. Why? Because the stats show a correlation between the amount of margarine consumed and the rate of divorce between 2000 and 2009.
Some might say this makes complete sense. Eating too much margarine makes you overweight and unattractive to your spouse, who is likely to run away with the pool boy and file for divorce. Or is it because when things are shaky at home, dejected spouses seek comfort in the arms of ‘Marge’?
Of course this is a ridiculous example. But it just goes to show how we can dream up all sorts of cause and effect scenarios from a simple, unsubstantiated correlation.
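It is surprisingly easy to manufacture a correlation like this. The sketch below is an illustration on simulated numbers, not the margarine or divorce figures: it builds two series that each drift downwards over ten years with independent random noise, then measures how strongly they correlate. The slopes, noise levels and seed are arbitrary assumptions.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def year_on_year_changes(values):
    """Differences between consecutive values, which removes a steady trend."""
    return [later - earlier for earlier, later in zip(values, values[1:])]


rng = random.Random(42)  # fixed seed so the run is repeatable
years = range(10)

# Two simulated series: each has its own downward trend plus independent noise.
series_a = [10.0 - 0.5 * t + rng.gauss(0, 0.2) for t in years]
series_b = [5.0 - 0.1 * t + rng.gauss(0, 0.04) for t in years]

r_levels = pearson(series_a, series_b)
r_changes = pearson(year_on_year_changes(series_a), year_on_year_changes(series_b))

print(f"correlation of levels:  {r_levels:.2f}")
print(f"correlation of changes: {r_changes:.2f}")
```

The two series have nothing to do with each other, yet their levels correlate strongly simply because both trend the same way over time. Looking at year-on-year changes instead strips out the shared trend, leaving only whatever relationship the noise has, which here is none by construction.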
So why are we telling you all this?
Because the same mistake can occur in business. We attribute an event or circumstance to a particular cause on the strength of a shallow look at its statistical correlation with something else.
One more example: if we experienced an increase in sales at the same time we were running a media campaign, it would be easy to assume the campaign had been successful. But how do we know the sales increase was not the result of other factors (time of year, a competitor going out of business, etc.)?
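One simple first check for the time-of-year factor is to compare the uplift during the campaign with the uplift over the same months a year earlier, when no campaign ran. The sketch below assumes exactly that setup; the sales figures are invented and the function names are hypothetical.

```python
def uplift(before, during):
    """Proportional change in sales from one period to the next."""
    return during / before - 1


def uplift_beyond_seasonality(before, during, before_last_year, during_last_year):
    """Campaign-period uplift minus the uplift seen a year earlier."""
    return uplift(before, during) - uplift(before_last_year, during_last_year)


# Invented monthly sales figures (any consistent unit).
last_year_nov, last_year_dec = 100.0, 130.0  # no campaign running
this_year_nov, this_year_dec = 110.0, 150.0  # campaign ran in December

naive = uplift(this_year_nov, this_year_dec)
excess = uplift_beyond_seasonality(
    this_year_nov, this_year_dec, last_year_nov, last_year_dec
)

print(f"naive uplift:              {naive:.1%}")
print(f"uplift beyond last year's: {excess:.1%}")
```

With these made-up numbers, sales rose about 36% during the campaign, but they rose 30% over the same months the year before without one. That leaves roughly 6 percentage points to explain, and even that remainder could come from other factors such as a competitor closing, so this narrows the question rather than settling it.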
Remember: ‘The mathematics of statistics is not good at identifying underlying causes, which require some other form of judgement.’**
So what’s the moral of the story? When looking for patterns, connections or correlations within data, dig deeper and beware of interpreting the data at face value.