Graphs are used to present information in a visual, summary format. They can be used instead of tables. Used successfully, graphs reduce the amount and complexity of data used in sentences. Hopefully this article gives you additional tools to decide which graphs to (not) use.
Arguably the hardest-working and longest-serving person in the field of graph design is Edward Tufte. I have included a link to his website under Resources.
Anyone who knows me well also knows two important pieces of information. I hate pie charts and I hate poorly made bar charts. I have taken charts from publicly available reports to illustrate my point. I have also taken examples from various disciplines to show that bad chart design is everywhere.
Lastly, I have deliberately selected reports in which the chart designer is not identified, or which have multiple authors. This article is not intended to name and shame individuals, and a publication’s approval process normally involves more than the designer: managers and/or peer reviewers have decided that these graphics were fine to use.
simple pie chart
The purpose of a pie chart is to show how mutually exclusive, related categories each contribute to a whole.
Let’s start with a simple example. Below is a pie chart containing only two categories: male and female. Pie charts are often used to show gender ratios, for example when reporting survey results.
But why use a pie chart for a binary classification? To reiterate, the categories are mutually exclusive. All the chart tells us is that 49% of the books reviewed had female authors. It is easy to infer, and to calculate, that the other 51% had male authors.
The website aims to highlight the lack of reviews of books by women authors. If you visit the link, you will see a series of 14 pie charts, one for each of the newspapers rated by the Stella Count for 2013. Even with a big screen, you’ll be scrolling to see them all. And the categories in The Monthly’s pie chart are inverted: it is hard to keep the formatting consistent across so many charts!
I think the information would be better presented in a bar chart. I have used R for this, with the packages ggplot2 and ggridges. I used ggridges to cycle two colors through the bars. I find that color cycling makes the graph easier to read than using a single color for every bar. There was one hiccup with the color cycling at the bottom that I could not fix, so I have forced the reverse order a couple of times using fillValues.
Important information from those 14 pie charts – the representation of female authors in newspaper book reviews – is now clear at a glance.
For ease of interpretation, I have color coded the bars with shades of pink. (Yes, it’s a stereotype, but pink drives home the point that these results have consequences for women.) The alternating colors make it easy for the eye to trace along each bar. I have ordered the data by descending female representation, reinforcing Stella Count’s point.
While the exact ratio cannot be read from the graph, the grid lines at every 5% provide a sense of the numbers. Important numbers can be mentioned in the text.
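The chart itself was built in R, and that code is not reproduced here. As a rough illustration of the design choices described above (bars sorted by descending share, two fills cycled through the bars, and a reference mark every 5%), here is a minimal Python sketch using only the standard library. The publication names and percentages are made-up placeholders, not the Stella Count figures, and prepare_bars and render_text are hypothetical helper names. Two text glyphs stand in for the two shades of pink.

```python
from itertools import cycle

# Hypothetical placeholder values (percent of reviewed books written by women).
# These are NOT the Stella Count figures; substitute the real data.
FEMALE_SHARE = {
    "Publication A": 38,
    "Publication B": 55,
    "Publication C": 49,
    "Publication D": 21,
}


def prepare_bars(shares, fills=("#", "=")):
    """Sort by descending share, then cycle the two fills through the bars."""
    ordered = sorted(shares.items(), key=lambda item: item[1], reverse=True)
    return [
        {"label": label, "value": value, "fill": fill}
        for (label, value), fill in zip(ordered, cycle(fills))
    ]


def render_text(bars, step=5, label_width=15):
    """Draw one character per percentage point under a ruler ticked every `step`%."""
    ruler = "".join("|" if pct % step == 0 else "." for pct in range(101))
    lines = [" " * label_width + ruler]
    for bar in bars:
        body = bar["fill"] * round(bar["value"])
        lines.append(f"{bar['label']:<{label_width}}{body}")
    return "\n".join(lines)


print(render_text(prepare_bars(FEMALE_SHARE)))
```

Sorting before assigning fills matters: the fills then alternate in the order the reader sees the bars, not in the order the data happened to be stored.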
more complex pie charts
The pie chart below, which deals with gene expression, has a lot of slices. Only three slices are large enough to contain their text. Each category is labeled with its respective proportion.
One category, “Miscellaneous Functions”, had no altered genes and is shown adjacent to the pie chart, hovering in space. However, because its label sits next to a purple slice, a quick glance suggests that it belongs to that slice. The label “nucleic acid regulation” has a line connecting it to its actual slice, but not all labels have such connecting lines.
Additionally, because all bars are forced to be the same length, differences in the counts that underlie the percentages are hidden. Comparing the relative proportions across bars can then be misleading.
A factor that accounts for 30% of cases may not be interesting if the result relates to three out of ten people. Our interpretation of its importance would change if the same percentage were based on 200 people.
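To put rough numbers on that point, here is a minimal Python sketch that compares the uncertainty around 30% when it comes from 3 of 10 people versus 60 of 200. The Wilson score interval is my choice of method for illustration, not something taken from the reports discussed here, and wilson_interval is a hypothetical helper name.

```python
from math import sqrt


def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion (z=1.96)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# Same 30%, very different sample sizes
for successes, n in [(3, 10), (60, 200)]:
    low, high = wilson_interval(successes, n)
    print(f"{successes}/{n}: {low:.1%} to {high:.1%}")
```

With ten people the plausible range runs from roughly 11% to 60%; with 200 people it narrows to roughly 24% to 37%. Two equal-length bars both labeled 30% would show none of this.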
Another, less complicated example is below. There are two main problems with this graphic. First, the bars have percentages printed inside them. This is an admission that people cannot read the values from the lengths of the bar segments. If you click on the link (in the caption), you will find that the percentages for all the years are listed on the same page, below the chart.