In today’s world of fake news, I thought I might write a post about how easy it is to tell different versions of the truth simply by manipulating the same data and/or visualizing it differently. Each of my examples is truthful, yet they tell profoundly different stories.
Let’s start with the data.
For my example, I chose to use the estimated US median earnings in dollars from the US Census website. I also chose to color-code the data using blue for males and pink for females, since these are the traditional colors associated with gender in the US.
There are a few things of note about the data:
- First, the earnings values are the median earnings per year. It’s important to understand that the median is not necessarily equal to the average (mean). The median is literally the middle value, meaning that half of the population earns more than the median and half earns less.
- Second, % Diff is the percentage difference between the male and female median salaries for each year. For example, males earned 23.3% more than females in 2005.
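To make the median-versus-average point and the % Diff calculation concrete, here is a small Python sketch. The earnings list and salary figures are made up for illustration, not taken from the Census data. The post doesn’t say which salary % Diff is divided by, so the hypothetical `pct_diff` helper assumes female earnings as the base, to match the phrasing “males earned X% more than females”; dividing by male earnings instead gives a smaller number for the same two salaries.

```python
from statistics import mean, median

# Hypothetical annual earnings for a small group (NOT Census data).
earnings = [28_000, 31_000, 35_000, 39_000, 250_000]

# One very high earner pulls the mean up; the median stays at the middle value.
print(f"mean:   {mean(earnings):,.0f}")    # mean:   76,600
print(f"median: {median(earnings):,.0f}")  # median: 35,000


def pct_diff(male: float, female: float) -> float:
    """Percent by which male earnings exceed female earnings.

    Assumption: the difference is measured relative to female earnings.
    """
    return (male - female) / female * 100


print(f"{pct_diff(40_000, 32_000):.1f}%")  # 25.0%
```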
Let’s take a look at visualizing the data.
Looking at this chart, we would immediately conclude that the difference between male and female salaries has decreased dramatically over the last 11 years. And despite the title, it’s not clear which salary is higher and which is lower; we’re assuming our audience will know.
Now, look again at the same chart with a simple change.
As you can see, when we extend the vertical axis to the full 0 – 100% scale, there’s been very little change. One of my favorite quotes is from Leon Trotsky: “Everything is relative in this world, where change alone endures.” I usually shorten it to just “everything is relative.” In this example, because most people understand percentages relative to a scale of 0 – 100, the first chart’s truncated scale can be misleading. People absorb charts visually, so the position and movement of the line is what they notice first, and they judge it against their mental picture of a percentage scale. A truncated vertical axis is not necessarily a bad thing, since it lets us see more detail, but it’s important to make sure it doesn’t promote misconceptions about the data. This is a more truthful chart, but it still doesn’t clearly show which gender’s salary is higher and which is lower.
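One way to see why the truncated axis exaggerates the change is to compute how much of the chart’s height the line travels on each scale. This is a hypothetical sketch: the 23.3% starting value is the 2005 figure above, but the 19.7% ending value and the 19 – 24 axis range are assumptions chosen for illustration, and `share_of_axis` is an illustrative helper.

```python
def share_of_axis(start: float, end: float, axis_min: float, axis_max: float) -> float:
    """Fraction of the vertical axis height covered by a change from start to end."""
    return abs(end - start) / (axis_max - axis_min)


# Hypothetical values: a % Diff that falls from 23.3% to 19.7%.
start, end = 23.3, 19.7

truncated = share_of_axis(start, end, axis_min=19, axis_max=24)  # assumed truncated axis
full = share_of_axis(start, end, axis_min=0, axis_max=100)       # full percentage scale

print(f"truncated axis (19-24): line drops {truncated:.0%} of the chart height")  # 72%
print(f"full axis (0-100):      line drops {full:.1%} of the chart height")       # 3.6%
```

The same 3.6-point drop fills almost three quarters of the truncated chart but only a sliver of the full one, which is exactly the visual difference between the two charts.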
Let’s look at a chart which visualizes the median salaries.
This chart’s primary story is the growth of earnings for males and females from 2005 to 2016. Although it does show male earnings as higher, that isn’t going to be the viewer’s primary focus, despite the chart title. There’s a reason for that. Because it’s a line chart, there’s a perception of movement (which is why line charts are so effective for time series). So our primary perception is the increase in earnings over time, and the gap between the two genders is secondary.
By highlighting the gap between the two lines (and adding data labels), the story becomes more about the percentage difference between male and female earnings, as stated in the title. The visualization is still perceived as one item, but the focus is on the gap instead of the lines.
This chart’s story is two simple truths:
- the gap between male and female earnings has decreased 3.6 percentage points in the last 11 years and
- earnings have increased at a steady rate for both genders since 2005.
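When a percentage itself changes, the size of the change can be stated two ways: as a difference in percentage points or as a change relative to the starting value. The sketch below contrasts the two, reusing the 23.3% figure from 2005 with a hypothetical later value of 19.7%; the helper names are illustrative.

```python
def point_change(old_pct: float, new_pct: float) -> float:
    """Change between two percentages, in percentage points."""
    return new_pct - old_pct


def relative_change(old_pct: float, new_pct: float) -> float:
    """Change between two percentages, relative to the old value, in percent."""
    return (new_pct - old_pct) / old_pct * 100


old_gap, new_gap = 23.3, 19.7  # 19.7 is a hypothetical value, not Census data

print(f"{point_change(old_gap, new_gap):+.1f} percentage points")   # -3.6 percentage points
print(f"{relative_change(old_gap, new_gap):+.1f}% relative change")  # -15.5% relative change
```

Both numbers are “true,” but a 15.5% shrink sounds far more dramatic than a 3.6-point one, so it’s worth saying which one you mean.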
By manipulating the data, we can create completely different stories.
This chart gives the impression of a widening gap between male and female earnings, even though the data labels clearly show the percentage getting smaller. That’s because the data is cumulative: each year’s earnings are added to all previous years’ earnings (known as a running total), ending with the total earnings for all 11 years. Since every year adds another year’s dollar gap to the total, the cumulative dollar gap keeps growing even while the percentage difference shrinks. This is a potentially misleading chart, since viewers may not take the time to absorb the details required to understand it.
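The running-total effect is easy to reproduce. In this sketch, with made-up earnings for three years (not the Census values), the yearly dollar gap is a constant 8,000, so the cumulative dollar gap keeps growing while the percentage difference between the running totals actually shrinks. As before, the hypothetical `pct_diff` helper assumes female earnings as the base.

```python
from itertools import accumulate

# Hypothetical median earnings for three consecutive years (NOT Census data).
male = [40_000, 41_000, 42_000]
female = [32_000, 33_000, 34_000]


def pct_diff(m: float, f: float) -> float:
    # Percent by which male earnings exceed female earnings, relative to female earnings.
    return (m - f) / f * 100


# Running totals: each year's value is added to the sum of all previous years.
male_total = list(accumulate(male))      # [40000, 81000, 123000]
female_total = list(accumulate(female))  # [32000, 65000, 99000]

for year, (m, f) in enumerate(zip(male_total, female_total), start=1):
    # Dollar gap grows 8,000 -> 16,000 -> 24,000; percent gap falls 25.0 -> 24.6 -> 24.2.
    print(f"year {year}: dollar gap {m - f:>6,}  percent gap {pct_diff(m, f):.1f}%")
```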
In this chart, it appears that female earnings are doing better than male earnings. That’s because it charts the percentage growth in earnings and uses a truncated vertical axis. Percentage growth is measured from each group’s own starting salary, so a slightly higher growth rate on a lower base says little about the gap itself. So while it’s true that female earnings have increased slightly more than male earnings, the difference in growth is nowhere near enough to bring parity between the genders. This is also a potentially misleading chart.
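Here is a similar sketch for percentage growth, again with made-up numbers rather than the Census data. Because each group’s growth is measured from its own starting salary, female earnings can grow faster in percentage terms while the dollar gap still widens and the female-to-male ratio barely moves. The `growth_pct` helper is illustrative.

```python
# Hypothetical median earnings for a base year and a later year (NOT Census data).
male_start, male_end = 40_000, 50_000
female_start, female_end = 30_000, 38_000


def growth_pct(start: float, end: float) -> float:
    """Percentage growth from start to end."""
    return (end - start) / start * 100


print(f"male growth:   {growth_pct(male_start, male_end):.1f}%")      # 25.0%
print(f"female growth: {growth_pct(female_start, female_end):.1f}%")  # 26.7%

# Faster percentage growth from a lower base can still widen the dollar gap.
print(f"dollar gap: {male_start - female_start:,} -> {male_end - female_end:,}")  # 10,000 -> 12,000
print(f"female/male ratio: {female_start / male_start:.2f} -> {female_end / male_end:.2f}")  # 0.75 -> 0.76
```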
Like I said at the beginning of this article, every one of these charts is “truthful,” but as you can see, there can be many versions of the truth. How you choose to correlate, transform, aggregate, and visualize data has a great impact on how someone perceives the story. And while it might be tempting to show your data in a way that supports your beliefs, isn’t it better (and more ethical) to show it in as clear and unbiased a way as possible?
So how can you tell your data’s story?
- First, consider what your data is about. My example was pretty straightforward: the difference between male and female median salaries over the last 11 years.
- Next, decide how to manipulate the data in relation to your intended audience. In my examples, calculating the percentage difference between the genders was useful, while transforming the data into running totals and percentage growth was potentially misleading. I’m a huge fan of the KISS principle: Keep It Simple, Stupid.
- Finally, consider your visualizations carefully. Start with your main title, then imagine your visual without any other text. Your audience is going to expect the “picture” of the chart to tell the story of your title. Strive for simple truths.