
Best Means, Bar None

Although bar charts are the default graph for many statistical packages, they create a number of problems. I think there are better alternatives when you’re graphing mean values with margins of error, and I’ll try to illustrate this with a makeover of a standard bar chart. See if you agree.

btl1.gif

Here is the raw output of the stats package, graphing nine mean values with standard deviations. The biggest problems, for me, are the obtrusive frames and tick marks, and the nasty vibration effect caused by all those high-contrast bars crammed together.

btl2.gif

To improve this, I deleted the axes, greyed out the tick marks, shrank the graph title, and rotated the y-axis label; all just what you’d expect from me if you’ve read other postings. I needed to distinguish between results from the main body of the river and its tributaries, so I tried to make the distinction intuitive by using different shades of a less-raucous color.

All very well. But should we even be using a bar chart at all?

The problem with bars is that they draw attention to a single value (the height of the bar) while minimising information about confidence intervals. Some people don’t even use error bars on their charts, which is a bit unforgivable. Even with error bars, it makes a big difference whether you show those confidence intervals in both directions from the mean, or only one way. Compare these three identical pairs of means: bars can make it harder to see whether two values might actually be the same.

btl_example.gif

Let’s try graphing those values again, starting with the error bars alone, with the mean value knocked out in white. (I created a small horizontal white line and manually aligned a copy with the top of each bar. Illustrator CS4 makes this particularly easy, with little guides popping up to give you visual feedback on whether things are lined up exactly. Then I deleted the bars themselves and thickened the remaining vertical lines.)

btl3.gif
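I did this by hand in Illustrator, but the same look can be generated directly. Here is a minimal sketch in plain Python that writes an SVG, with one thick vertical line per site and a short white tick knocked out at the mean. The function name, colour, dimensions, and the three sample values are all hypothetical, not the data behind these figures.

```python
# Hypothetical (mean, half_width) pairs, one per site; NOT the data from this post.
SITES = [(4.2, 0.6), (4.5, 0.4), (3.9, 0.8)]


def range_chart_svg(sites, width=300, height=200, pad=20):
    """Return SVG markup: a thick vertical line spanning each interval,
    with the mean knocked out as a short white horizontal tick.
    Expects a non-empty list of (mean, half_width) pairs."""
    top = max(m + e for m, e in sites)
    bottom = min(m - e for m, e in sites)
    span = (top - bottom) or 1.0  # guard against a zero-height data range

    def y(value):
        # SVG y grows downward, so larger data values get smaller y.
        return pad + (top - value) / span * (height - 2 * pad)

    step = (width - 2 * pad) / len(sites)
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{width}" height="{height}">'
    ]
    for i, (mean, err) in enumerate(sites):
        x = pad + step * (i + 0.5)  # centre of this site's column
        # The interval itself, drawn as a thick line.
        parts.append(
            f'<line x1="{x:.1f}" y1="{y(mean + err):.1f}" '
            f'x2="{x:.1f}" y2="{y(mean - err):.1f}" '
            f'stroke="#557799" stroke-width="8"/>'
        )
        # The mean, drawn on top as a white tick the same width as the line.
        parts.append(
            f'<line x1="{x - 4:.1f}" y1="{y(mean):.1f}" '
            f'x2="{x + 4:.1f}" y2="{y(mean):.1f}" '
            f'stroke="white" stroke-width="2"/>'
        )
    parts.append("</svg>")
    return "\n".join(parts)


svg = range_chart_svg(SITES)
```

Saving `svg` to a file and opening it in a browser shows the result. Axis labels and tick marks are left out to keep the sketch short.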

Now we’re seeing a much better picture of the uncertainty of our results. We might decide we need better data for site 8 if we want to be sure whether it’s in the low or high group; it may well have the same mean as site 9.
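The eyeball comparison can be made explicit with a rough overlap check. This is only a sketch: the numbers are invented to mimic a site that straddles a low and a high group, and overlapping intervals are a quick screen, not a formal significance test.

```python
def interval(mean, half_width):
    """Return the (low, high) ends of a symmetric interval around the mean."""
    return (mean - half_width, mean + half_width)


def intervals_overlap(a, b):
    """True if two (low, high) intervals share at least one point."""
    (lo_a, hi_a), (lo_b, hi_b) = a, b
    return lo_a <= hi_b and lo_b <= hi_a


# Hypothetical values, not read off the chart above.
low_site = interval(3.0, 0.4)  # a tightly measured site in the low group
site_8 = interval(4.5, 1.5)    # wide interval: could belong to either group
site_9 = interval(6.2, 0.5)    # a tightly measured site in the high group

print(intervals_overlap(site_8, site_9))    # site 8 vs the high group
print(intervals_overlap(site_8, low_site))  # site 8 vs the low group
print(intervals_overlap(low_site, site_9))  # low group vs high group
```

With these made-up numbers the wide interval overlaps both neighbours while the two tight ones stay apart, which is exactly the situation where collecting better data is worth considering.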

The other advantage of using lines rather than bars is that they take up much less room. I can fit almost twice as much information into the same space, which allows me to compare two different sample localities side by side.

btl4.gif

So next time you’re faced with a whole page of tiny bar charts, you may want to consider pulling out the error bars and displaying them on their own. Probably more honest, and certainly more compact.