2006/06/better_axes.html

Better Axes

A good rule when making graphs is to remove needless impediments. Every extra act of interpretation we ask of the reader is a chance for them to misunderstand, be baffled, or get frustrated and move on. There should be as little standing between the reader and the data as possible. One level of interpretation all readers have to grapple with is the humble axis; here are some guidelines.

Label directly

If you can, put units right there on the axis, not in the axis label. In general, getting information out of the label and caption and putting it on the graph, where people can see it, is a good idea.

(By the way, that’s a real degrees sign above; all fonts have one. See Robin Williams for tips on finding it and other special characters. Never try to fake it with a superscripted o!)
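If your graphing software lets you supply tick labels as text, attaching the unit takes one line. The sketch below is only an illustration: the function name `label_ticks_with_unit` is made up for this example, and it assumes your plotting tool accepts a list of strings as tick labels.

```python
# The real degree sign (U+00B0), not a superscripted letter "o".
DEGREE_SIGN = "\u00b0"


def label_ticks_with_unit(ticks, unit=DEGREE_SIGN):
    """Return one text label per tick, with the unit attached to the number."""
    # ":g" drops needless trailing zeros, so 10.0 is shown as "10".
    return [f"{tick:g}{unit}" for tick in ticks]


labels = label_ticks_with_unit([0, 10, 20, 30])
print(labels)
```

With the unit on every tick, the axis label no longer has to carry it.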

Rotate for readability

All the graphing software I know generates vertical labels on the y-axis by default, but these are really almost unreadable. It’s a good idea to make them horizontal wherever possible, moving them to the top if there’s no space to the left.

I’m also a fan of getting axes out of log form. Real units are what we’re used to reading, and forcing people to calculate antilogs in their head increases the risk they’ll misread your numbers. We’re pretty terrible at comparing logarithmic values as it is, so it’s almost deceptive to hide them behind a linear axis.
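One way to do this, sketched below, is to keep the logarithmic spacing but write each tick in real units. The function name `real_unit_log_ticks` is hypothetical, and the sketch assumes the data are plotted as base-10 logs, so a tick's position is simply its exponent.

```python
import math


def real_unit_log_ticks(lo, hi):
    """Return (position, label) pairs for every power of ten in [lo, hi].

    position is log10 of the value, i.e. where the tick sits on a
    log-transformed axis; label is the same value written in real units.
    """
    if lo <= 0 or hi < lo:
        raise ValueError("need 0 < lo <= hi for a log scale")
    first = math.ceil(math.log10(lo))
    last = math.floor(math.log10(hi))
    ticks = []
    for exponent in range(first, last + 1):
        value = 10.0 ** exponent
        # Whole numbers get thousands separators; fractions stay compact.
        label = f"{value:,.0f}" if exponent >= 0 else f"{value:g}"
        ticks.append((exponent, label))
    return ticks


for position, label in real_unit_log_ticks(0.05, 5000):
    print(position, label)
```

The reader then sees 1,000 instead of 3, and no one has to calculate an antilog.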

When you expand units to make them comprehensible, they do take up more room. (For example, see below. But who doesn’t understand what Ma means, I hear you ask? Well, your mother, administrators, your congressman, journalists, and the voting taxpayers or undergraduates who pay your salary perhaps.) One solution is to rotate them a little: -30° in this case. It’s also possible to make the axis smarter; why not show geological periods, for example?

Use sensible units

If you’re using non-decimal units for some reason, don’t use a decimal scale, even if it seems more “scientific”. Why make needless work for the reader?

(The foot and inch marks used here, by the way, are the prime marks, not the typewriter quotes next to the semicolon key—see Robin Williams again.)
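As a rough sketch of a non-decimal scale, the code below puts ticks on whole multiples of a chosen number of inches and labels them in feet and inches with real prime marks. The names `feet_inches_label` and `feet_inches_ticks` are invented for this example, and it assumes heights are stored as whole inches.

```python
PRIME = "\u2032"         # foot mark, not the typewriter apostrophe
DOUBLE_PRIME = "\u2033"  # inch mark, not the typewriter double quote


def feet_inches_label(total_inches):
    """Format a whole number of inches as feet and inches."""
    feet, inches = divmod(int(total_inches), 12)
    if inches == 0:
        return f"{feet}{PRIME}"
    return f"{feet}{PRIME} {inches}{DOUBLE_PRIME}"


def feet_inches_ticks(lo_inches, hi_inches, step_inches=6):
    """Return (position, label) pairs on multiples of step_inches in [lo, hi]."""
    # Round the lower bound up to the next multiple of the step.
    first = -(-lo_inches // step_inches) * step_inches
    return [
        (position, feet_inches_label(position))
        for position in range(first, hi_inches + 1, step_inches)
    ]


for position, label in feet_inches_ticks(58, 74):
    print(position, label)
```

A step of 6 or 12 inches keeps the ticks on values people actually say aloud, rather than on 60.0, 62.5, 65.0.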

When we count days, we think in months and years, not base 10. If you turn the scale into a calendar, it no longer needs a silly axis label like “day of year”. Of course, always identify months with a word or roman numeral, because 01/05/06 can mean Jan 5 or May 1. We’re all used to reading calendars, so a detailed scale is fine—note that a sufficiently-detailed one doesn’t need an axis line.
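Turning a day-of-year scale into a calendar mostly means finding the day number on which each month starts. The sketch below does that with Python's standard library. The function name `month_ticks` is hypothetical, and the month abbreviations are spelled out by hand so the labels do not depend on the computer's locale.

```python
from datetime import date

MONTH_NAMES = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
               "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def month_ticks(year):
    """Return (day_of_year, month_name) for the first day of each month."""
    ticks = []
    for month in range(1, 13):
        # tm_yday counts from 1 on January 1 and accounts for leap years.
        day_of_year = date(year, month, 1).timetuple().tm_yday
        ticks.append((day_of_year, MONTH_NAMES[month - 1]))
    return ticks


for day_of_year, name in month_ticks(2006):
    print(day_of_year, name)
```

The year matters because the tick positions from March onward shift by a day in leap years.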

 

You Don’t Have to Start with Zero

Honestly. In some cases, it’s just meaningless, as there is no “value zero” to graph, such as with the days to the right. And having the y-axis pass through 1 means a data point might get tangled up with the tick marks on the axis, so there’s no reason not to leave a small gap.

There are also cases where beginning at zero would add pointless empty space to the graph; consider how little trend we’d be able to see if the graph on the right’s y-axis went from 0 to 110. So the answer is to eliminate empty space from the axis as much as possible without being actively deceptive.
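A simple way to trim the empty space is to fit the axis to the range of the data and add a small gap at each end. The sketch below is one possible approach, not a rule: the function name `padded_limits` and the default gap of 5% of the data range are assumptions made for this example.

```python
def padded_limits(values, pad_fraction=0.05):
    """Return (low, high) axis limits that hug the data with a small gap."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # All values identical: fall back to a gap based on the value itself.
        pad = abs(lo) * pad_fraction or 1.0
    else:
        pad = span * pad_fraction
    return lo - pad, hi + pad


low, high = padded_limits([90, 100, 110], pad_fraction=0.25)
print(low, high)
```

The gap only keeps points off the axis line. Whether tick labels should extend into it is a separate decision.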

William Cleveland, in The Elements of Graphing Data, often allows the scale to continue below zero, to “avoid interference” between the perpendicular axis and any zero-value data; he uses a dotted reference line to stand in for the dropped axis. Unfortunately, in the following graph this implies that ozone could exist in a concentration of less than zero parts per billion! The space below the reference line is in fact a misleading and uninhabitable no-man’s land.

Leaving a gap so points don’t hit the axis is OK, but extending the scale implies data values also continue. These days, with a better color palette, data on the axis no longer has to be a problem. Giving data points a thin white stroke allows them to intersect lines (and each other) while remaining visible; a little better than the jittered circles Cleveland was forced to use. (In the above makeover, you can see I also added units to the axes, rescued the labels from that horrible ALL-CAPS computer font, and condensed the empty space.)

Comments and suggestions welcomed, as always.

COMMENTS

I just randomly stumbled across your nice blog. Very informative post! Thanks for sharing it!

Just a minor detail about using sensible units: wouldn’t it make more sense to group days into weeks instead of blocks of 10? 😉
And while I’m at it… centimeters seem a lot more sensible than inches, too 😉

Oops, now I got started… please don’t take this the wrong way…

I also think that using sans-serif fonts for diagrams is a good idea. It’s somehow a standard and helps distinguish what is part of the diagram (or figure) from what belongs to the main text. Mixing 3 types of fonts (sans-serif, italic-serif, and serif) for one axis label (Babinet point) might also be questionable. A log scale which is not labeled as such should have at least 3 labeled ticks. (I think not everyone might be aware that the non-linear spacing of the ticks automatically implies log scaling.) In your ozone/wind speed example, using “o” markers instead of dots makes more sense, as overlaps are much easier to recognize. I also don’t understand why there is such a big gap between the y-axis and the data (“pointless empty space”, as you called it).

BTW, what software do you use to create such nice graphs?

Yes, I agree about weeks being a more sensible grouping. My mistake.

Inches are of course a silly unit, but sometimes we have to use them (plotting people’s height in the USA, for example).

Most graphs use sans serif, but there’s no law—Tufte for example happily uses serif faces. As long as they’re distinguishable from the caption and body text, it’s fine, and that’s usually not a problem. It’s also fine to mix two different type families (in the Babinet example, Minion and Myriad) if they contrast enough (and I emphasized the contrast with caps and gray). Italic and roman from the same type family would count as one choice in almost every case.

You’re absolutely right that a log scale needs three labeled ticks; I cropped the axis to fit. And I actually think it’s easier to count the number of overlaps in my version of Cleveland’s graph, but your mileage may vary.

The graphs are started in a variety of programs, and finished in Illustrator. I’ll be posting an entry on my workflow.

Thanks for writing!

Hi, here from Robert Kosara’s recommendation posted in comments to Kaiser’s Junk Charts. There is a circumstance in which the would-be grapher absolutely must start with zero, and that’s when creating a bar graph. If that causes problems, it’s time to consider abandoning the bar graph and adopting something which doesn’t need a zero on the scale. I’ve seen bar graphs where the designer recognised the problem with zero, adopted and defended the solutions, but without getting rid of the bar graph format. Those wavy gaps are the least bad of the abortive compromises resorted to by people who won’t give up their bars.