In a world where fake news and conspiracy theories flood every corner of the internet, graphs are a double-edged sword.
Just as easily as graphs can capture complex trends, they can be intentionally distorted, truncated, or simply used incorrectly. Every graph expresses a wealth of information, and we should always pause to understand what it is trying to tell us and to check that everything is indeed as it seems.
In this article we have put together some examples to show the power of misleading graphs and how to spot them.
Let's get started.
A common form of graph manipulation is excluding sections of the axes from the image. This is often done intentionally to blow small changes out of proportion. Let's look at the examples below.
On the left (Example A) is a chart showing the birth lengths of shark species commonly found in NZ. At first glance, it might appear that newborn basking sharks are 7 to 8 times larger than newborn whale sharks.
A closer inspection reveals that the axis begins at 45, not at zero, and that a baby whale shark is actually only a third of the size of a baby basking shark. The dwarfing effect on the whale shark is further compounded by using shark icons in place of bars. While this makes for a more interesting graphic, it defies intuition because the sizes of the icons do not actually correspond to the sizes of the sharks.
Example B shows how this information should be presented to allow an objective interpretation. It is arguable how harmful misinterpreting shark sizes really is, but axis truncation can also be used maliciously.
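The distortion a truncated axis creates can be quantified in a few lines of Python. This is a minimal sketch using illustrative lengths, not the actual values behind the chart; only the 3-to-1 true ratio is taken from the text above.

```python
def apparent_ratio(tall, short, baseline=0.0):
    """Ratio of two bar heights as drawn when the axis starts at
    `baseline` instead of zero. With baseline=0 this is the true ratio."""
    return (tall - baseline) / (short - baseline)

# Hypothetical birth lengths in cm (true ratio of 3, as in the article):
whale_shark, basking_shark = 60, 180

print(apparent_ratio(basking_shark, whale_shark))      # axis at zero: 3.0
print(apparent_ratio(basking_shark, whale_shark, 45))  # axis at 45: 9.0
```

Starting the axis at 45 turns the same 3x difference into a drawn 9x difference, which is exactly the exaggeration Example A relies on.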
Adding white space to a graph can also play on your preconceptions. The graph on the right shows the number of shark attacks globally since 1980. The shallow incline of the curve, created by adding unnecessary padding above the data, dampens the apparent rise in shark attacks over recent decades, at least to a casual glance.
In the graph below, the vertical axis is much more reasonable, but the horizontal axis has been compressed, giving the reader a sense of dramatic increase. Not enough room has been left to properly show year-to-year changes, drawing the eye to decade-long trends only.
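Both tricks, vertical padding and horizontal compression, change the visual slope of a line without touching the data. A rough sketch of the effect, with made-up numbers (the function and all values here are hypothetical, for illustration only):

```python
def visual_slope(data_rise, data_run, y_axis_span, x_axis_span):
    """Slope of a line segment as drawn on a square plot: the fraction
    of the plot height it climbs divided by the fraction of the plot
    width it covers."""
    return (data_rise / y_axis_span) / (data_run / x_axis_span)

# A rise of 50 attacks over 40 years (illustrative figures):
print(visual_slope(50, 40, y_axis_span=60, x_axis_span=40))   # snug axes: ~0.83
print(visual_slope(50, 40, y_axis_span=200, x_axis_span=40))  # padded y-axis: 0.25
print(visual_slope(50, 40, y_axis_span=60, x_axis_span=120))  # squeezed x-axis: 2.5
```

The same trend is drawn three times flatter with a padded vertical axis, and three times steeper with a compressed horizontal one.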
A good graph should strive to present all information clearly, without bias, in a manner that is easy to interpret, like the chart below. It does not squeeze, stretch, truncate, or pad the timeseries; it is a blank slate of sorts that allows you to draw your own conclusions rather than leading you to them.
As with any statistical media, when looking at a graph you should always consider what information the creator might be keeping from you. What context is not visible? Is this everything, or am I only seeing a snapshot of the bigger picture?
It is possible to lie with genuine data by speaking in half-truths. Data can be intentionally included or excluded, cherry-picked to reinforce a narrative.
If you were considering investing in the cherry growing industry in New Zealand, you might be presented with the following graph to convince you that cherry production has remained stable, following a large spike at the start of the millennium.
Not exactly a lie, but also not the whole truth. Zooming out to include production statistics from as far back as the 1960s, the market appears far more volatile.
Cherry picking is used by politicians, who select the edge cases that support their agendas; by skeptics, to discredit climate change research; and by pharmaceutical companies, to ignore cases with side effects. It was used by the researchers who claimed the now-debunked link between the MMR vaccine and autism.
The presence of a graph is not always a beacon of honesty.
Every graph has a specific role to play in analytics: a line chart for understanding trends over time, a pie chart for breaking down parts of a whole, a bar chart for comparing categories, a map for gauging geographic distributions.
Inexpertly choosing the wrong graph for the job can lead to charts that just don't make any sense.
Datamine employees (Dataminers) were recently surveyed on their favourite foods. The results could be presented as a pie chart like this one.
A keen eye would see that the sum of the parts adds up to much more than the whole: 194%, in fact.
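A quick sanity check catches this class of mistake before a pie chart is ever drawn. The sketch below uses hypothetical per-food percentages; only the 194% total and the 65% Marmite figure come from the article.

```python
def pie_chart_safe(percentages, tolerance=0.5):
    """A pie chart only makes sense when the slices partition the
    whole, i.e. the parts sum to (roughly) 100%."""
    return abs(sum(percentages) - 100.0) <= tolerance

# Illustrative multi-select survey results summing to 194%
# (individual values other than Marmite's 65% are made up):
favourites = {"Pizza": 55, "Chocolate": 44, "Sushi": 30, "Marmite": 65}

print(sum(favourites.values()))             # 194 -- parts exceed the whole
print(pie_chart_safe(favourites.values()))  # False -- not pie-chart material
```

Because respondents could say yes to several foods, the slices overlap, and no re-labelling of the pie can fix that.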
If Dataminers had been asked to select their single favourite from the list of four foods, then a pie chart would have been an excellent way to present this data. In reality, the participants were asked whether they liked each food irrespective of the others, meaning a bar chart, like the following, is a better fit for showing the number of yes responses.
You might interpret from this chart that 35% of Dataminers do not like Marmite. In truth, respondents could choose between “Yes”, “Sorta” and “No”.
This data is best presented with all the possible responses, either as a clustered column or a stacked column chart.
Graphs should be used to tell a story, but not at the expense of the data. Whether intentional or not, misleading charts will continue to appear in all forms. Hopefully this introduction to spotting when good data goes bad has added a few tools to your toolbox for the next time you are reading an article or weighing up your next investment.