
Best Means, Bar None

Although bar charts are the default graph for many statistical packages, they create a number of problems. I think there are better alternatives when you’re graphing mean values with margins of error, and will try to illustrate this with a makeover of a standard bar chart. See if you agree.

btl1.gif

Here is the raw output of the stats package, graphing nine mean values with standard deviations. The biggest problems, for me, are the obtrusive frames and tick marks, and the nasty vibration effect caused by all those high-contrast bars crammed together.

btl2.gif

To improve this, I deleted the axes, greyed out the tick marks, shrank the graph title, and rotated the y-axis label; all just what you’d expect from me if you’ve read other postings. I needed to distinguish between results from the main body of the river and its tributaries, so I tried to make the distinction intuitive by using different shades of a less-raucous color.

All very well. But should we even be using a bar chart at all?

The problem with bars is that they draw attention to a single value—the height of the bar—while minimising information about confidence intervals. Some people don’t even use error bars on their charts, which is a bit unforgivable. Even with error bars, it makes a big difference if you show those confidence intervals in both directions from the mean, or only one way. Compare these three identical pairs of means: bars can make it harder to see whether two values might actually be the same.

btl_example.gif

Let’s try graphing those values again, starting with the error bars alone, with the mean value knocked out in white. (I created a small horizontal white line and manually aligned a copy with the top of each bar. Illustrator CS4 makes this particularly easy, with little guides popping up to give you visual feedback on whether things are lined up exactly. Then I deleted the bars themselves and thickened the remaining vertical lines.)
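For anyone who would rather script this than redraw it by hand, here is a minimal sketch in Python that builds the same kind of mark as SVG text: a thick vertical line spanning mean ± error, with a short white tick knocked out at the mean. The function name `range_bar_svg`, the layout constants, the color, and the sample numbers are all invented for illustration; they are not the data or the method behind the charts in this post.

```python
def range_bar_svg(sites, width=360, height=200, pad=20, stroke=8, color="#4a7a8c"):
    """Return an SVG string with one thick mean +/- error line per site.

    sites: non-empty list of (label, mean, error) tuples, where error is
    the half-width of the interval (for example one standard deviation).
    """
    if not sites:
        raise ValueError("need at least one site")

    lows = [mean - err for _, mean, err in sites]
    highs = [mean + err for _, mean, err in sites]
    y_min, y_max = min(lows), max(highs)
    span = (y_max - y_min) or 1.0  # avoid dividing by zero if everything is equal

    def y_px(value):
        # SVG y grows downwards, so the data axis is flipped here.
        return pad + (y_max - value) / span * (height - 2 * pad)

    step = (width - 2 * pad) / len(sites)
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">'
    ]
    for i, (label, mean, err) in enumerate(sites):
        x = pad + step * (i + 0.5)
        # The thick vertical line covers the whole interval.
        parts.append(
            f'<line x1="{x:.1f}" y1="{y_px(mean + err):.1f}" '
            f'x2="{x:.1f}" y2="{y_px(mean - err):.1f}" '
            f'stroke="{color}" stroke-width="{stroke}"/>'
        )
        # The mean is a short white tick drawn across that line.
        parts.append(
            f'<line x1="{x - stroke / 2:.1f}" y1="{y_px(mean):.1f}" '
            f'x2="{x + stroke / 2:.1f}" y2="{y_px(mean):.1f}" '
            f'stroke="white" stroke-width="2"/>'
        )
        # Site label along the bottom edge.
        parts.append(
            f'<text x="{x:.1f}" y="{height - 4}" font-size="10" '
            f'text-anchor="middle">{label}</text>'
        )
    parts.append("</svg>")
    return "\n".join(parts)


# Invented values, for illustration only.
example_sites = [("1", 4.2, 0.6), ("2", 3.9, 0.4), ("3", 6.1, 1.1)]
svg_text = range_bar_svg(example_sites)
```

Saving `svg_text` to a file with an `.svg` extension should give something a browser or a drawing package can open. Because everything is a plain line, the stroke widths and colors stay easy to adjust afterwards.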

btl3.gif

Now we’re seeing a much better picture of the uncertainty of our results. We might decide we need better data for site 8 if we want to be sure whether it’s in the low or high group; it may well have the same mean as site 9.
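The "might these be the same?" reading can be written down as a simple overlap check. This is only a sketch of the visual rule of thumb, not a formal statistical test, and what an overlap implies depends on what the error bars represent. The function name and the numbers are invented for illustration.

```python
def intervals_overlap(mean_a, err_a, mean_b, err_b):
    """Return True if the two mean +/- error intervals share any value."""
    low_a, high_a = mean_a - err_a, mean_a + err_a
    low_b, high_b = mean_b - err_b, mean_b + err_b
    return low_a <= high_b and low_b <= high_a


# Invented numbers: (mean, error half-width).
wide = (5.0, 2.0)    # spans 3.0 to 7.0
narrow = (7.5, 0.4)  # spans 7.1 to 7.9
close = (6.8, 0.5)   # spans 6.3 to 7.3

wide_vs_narrow = intervals_overlap(*wide, *narrow)  # the ranges do not touch
wide_vs_close = intervals_overlap(*wide, *close)    # the ranges share 6.3 to 7.0
```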

The other advantage of using lines rather than bars is that they take up much less room. I can fit almost twice as much information into the same space, which allows me to compare two different sample localities side by side.

btl4.gif

So next time you’re faced with a whole page of tiny bar charts, you may want to consider pulling out the error bars and displaying them on their own. Probably more honest, and certainly more compact.


Thanks again to EOS Ecology for permission to use work I did for them in Pictures of Numbers—the data are real, the locations have been changed.

Comments

great post!
Hope to see them more frequently!

Alessandro

I agree with your main point.

Two other considerations I would have are:
1) is the use of negative space for the mean better than some other indicator (e.g. a thick dash "-" slightly wider than the error bars), and do people readily understand that this is a mean ± error plot?

2) in all of the cases, the order of the sites should be considered. In this case perhaps 1, 2, 3, 4, 5... makes sense because the sites are consecutively further from the river source (or some other metric), but in many cases sorting sites by the value of the mean or some other order than 1, 2, 3, 4, 5 (or a, b, c, d, e...) would make interpretation stronger.

Nice to see your writing again. I think I have colleagues that could benefit from a printout of this discreetly placed on their desks.

Thanks for the suggestions, people.

Scott 1) The white bar for mean seems to work for people; I would include a sample bar with key in the first graph of the series, with a wee explanation about what error bars actually mean (this was for a non-scientific audience).

Scott 2) Rearranging the bars would be nice, but I had no control over the ordering here.

John 1-2) I was given the first chart, error bars and all, as a .wmf generated by a common stats package; the scientists weren't using Excel for their data analysis.

John 3) Excel still can't export charts in an editable, publishable format like EPS. If one has to clean up its charts in a graphics package anyway, then doing this sort of redrafting is far easier there. It's really pretty speedy when you get into the groove.

If, however, one needed to update the chart dynamically as more data was added (though most scholars don't need to), one can create something very similar in off-the-shelf Excel; I did it using the Stock Chart option, manually creating High and Low columns by adding and subtracting the SD from the mean, thickening the vertical lines and turning the marker into a little white bar. But it wasn't perfect: Excel got confused when I tried to use site numbers as x-axis labels, wouldn't let me show tick marks alone on the y axis, and couldn't manage two different line colors (let alone generate the main/tributary legend).
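The High and Low arithmetic described above is simple enough to sketch in a few lines of Python. The function name, the dictionary keys, and the sample values are hypothetical; using the mean as the third column (the one a stock chart would draw as its marker) is an assumption about the layout, not something spelled out above.

```python
def stock_chart_rows(summary):
    """Turn (site, mean, sd) tuples into rows with High and Low columns.

    High is mean + sd and Low is mean - sd; the mean itself is kept as
    a third column, assumed here to be the value drawn as the marker.
    """
    rows = []
    for site, mean, sd in summary:
        rows.append(
            {"site": site, "high": mean + sd, "low": mean - sd, "mean": mean}
        )
    return rows


# Invented summary statistics, for illustration only.
example_rows = stock_chart_rows([("1", 4.0, 0.5), ("2", 6.0, 1.5)])
```

Each row can then be pasted or exported into the spreadsheet columns the chart reads from.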

No doubt some of these limitations are because I'm not quite an Excel expert. John, would you be interested in posting a short tutorial on your blog on how to do this with off-the-shelf Excel?

John 4) Yes, I've "reinvented" the box and whisker plot, although not really as this doesn't show quartiles or outliers. It's a little like Tufte's redrawing of same, but without his offset lines for quartiles, which I never liked.

1. If you want to show spreads (i.e., use error bars), then a bar chart is inappropriate. Use a chart type that encodes the values using markers.

2. For all of its warts, MS Excel shows distinct error bars in both directions on bar charts.

3. For all of its warts, MS Excel could generate charts like your last few, all in Excel (i.e., no exporting from stats package to Illustrator), dynamic (so the chart changes when the data changes), with no more effort than you've spent.

4. Looks to me like you've reinvented the box plot, or at least a box-plot-like display.

Mike -

3. A PDF made from an Excel chart includes a high-res vector graphic, which can readily be modified in Illustrator or whatever.

Also, some third-party programs will accept an Excel chart as a WMF-type vector graphic. Don't ask which, it's been eons (but Corel Draw comes to mind).

Rather than tick marks without an axis, how about no tick marks at all, and light gray gridlines. The axis labels serve as gridline labels.

Alternatively, place a dummy XY series along the axis, with a marker-less point at each tick location, then apply a positive X error bar with a value like 0.25 or 0.5.

Mike -

4. The offset line quartile concept is just one example of Tufte's ink-reductio-ad-absurdum. Sometimes a little extra ink is good, especially if it is actually helping to show data. I purposely called your graph a "box plot" without mention of whiskers. I have some examples of Box and Whisker Plots on my web site, as well as a commercial utility.

Hmm... an unexpected blog posting about two of the better kayaking rivers really makes me want to grab my aging Blitz and head out for a paddle... Cheers Mike!

Hi Mike,

Very nice and, from a scientist's perspective, it gets to the real question: is the within-group variation small enough for us to believe the between-group variation means something?

I don't know what you'll think about it, but another option for similar data is a dot plot. I used one here (that's straight from the very cool ggplot2 package for R). Perhaps not what you'd want for publication, but it's pretty neat to be able to see all your data and the summary stats at the same time.

I've learned a lot from your posts, Mike, but I disagree with you on this one.

If the errors are the main values of interest, then yes, it's a great format. In most cases, however, the point estimates are the main values being compared, and the error bar is simply an accessory that attests to the credibility of the estimate.

My problem with thick error bars is that ironically, they create the impression that the longer ones are somehow more "substantial", when we seasoned graph readers know that the opposite is true.

What I really liked about your chart is the lean y-axis, which I think you should add to your "Better Axes" article. I'll definitely be trying it out on my next chart!
