August 14, 2006

Adding Variables

Two surveys recently looked at public acceptance of evolution in a range of countries. This one arranged the data in order of decreasing acceptance.

evolution1.gif

Note the pretty good color choice: red for rejection, something wishy-washy for “don’t know.” I would have gone with a dark green for acceptance, if only because the red/blue contrast is so over-used. The white spaces between the bars are a little obtrusive. Here’s what the New York Times did with it in the August 15th issue. They dropped the numbers, and left the “don’t know” category the same color as the white space between bars, which breaks up those obtrusive horizontal lines. Fading back the colors also helped.

nyt_evn_cropped.png

How would the graph have been different if the bars had been sorted by increasing rejection of evolution? Greece would bump lower, Japan higher, and there’d be some shuffling in the middle, but the key point I think the authors wanted to make—that the USA is way down with Turkey—would have been preserved. But perusing the list a little raises all sorts of questions. Why are the Czechs and Slovaks more likely to reject evolution than the Bulgarians? Why are the Finns twice as likely as the Swedes and Danes? Poland and Ireland are two of the most religious countries in Europe, but they’re near the middle of this list. The graphic raises more questions than it answers. Part of the problem is that it’s only depicting three variables (not four; one category is just the remainder of the other two, as the Times recognized). And two of those are not really independent, having a roughly inverse relationship. So we don’t have enough information to do any analysis ourselves.
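A small sketch of the re-sorting thought experiment may help here. The percentages below are invented for illustration and are not the survey's figures, and the country names are placeholders. It shows two things: the "don't know" share is just the remainder of the other two categories, and changing the sort key shuffles the middle of the list while leaving the extremes alone.

```python
# Hypothetical percentages, invented purely for illustration;
# they are NOT the survey's figures.
responses = {
    "Country A": {"accept": 80, "reject": 10},
    "Country B": {"accept": 60, "reject": 15},
    "Country C": {"accept": 65, "reject": 30},
    "Country D": {"accept": 40, "reject": 40},
}


def dont_know(row):
    """The third category is whatever is left over from 100%."""
    return 100 - row["accept"] - row["reject"]


# The ordering used in the first chart: decreasing acceptance.
by_acceptance = sorted(responses, key=lambda c: responses[c]["accept"], reverse=True)

# The alternative ordering: increasing rejection.
by_rejection = sorted(responses, key=lambda c: responses[c]["reject"])

remainders = {country: dont_know(row) for country, row in responses.items()}

print(by_acceptance)
print(by_rejection)
print(remainders)
```

With these made-up numbers, B and C swap places in the middle while A stays first and D stays last under either ordering, which is the sense in which the headline comparison survives a different sort.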

Here’s another presentation of similar data (I’ve added a key).

evolution2.gif

Now, this is not a slick graphic. I would certainly have played with the spacing, font size, alignment, ALL CAPS labels, and colors. But by choosing a scatter plot instead of a bar chart, and making the country name into the marker, the designer depicts seven variables in the same space where the previous graph could only manage two-and-a-bit. Moreover, we’re now equipped to evaluate trends ourselves, look at outliers (the churchgoing Irish, the oddly evolution-rejecting Dutch), and examine our preconceptions (there may be very few atheists in the USA, but that alone doesn’t account for the lack of acceptance of evolution). And note that the simple take-home message of the first graph is not being sacrificed, either: the USA stands out each time.
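The name-as-marker idea is easy to try out. What follows is a minimal sketch using only the Python standard library to write an SVG, not a reconstruction of the chart above. The rows, the group names, the colors, and the `label_scatter` helper are all assumptions made up for the example. Each mark carries four things at once: the label, an x value, a y value, and a group shown by color.

```python
from xml.sax.saxutils import escape

# Hypothetical rows: (label, x value, y value, group). Invented for
# illustration only; not taken from either survey.
rows = [
    ("Country A", 10, 80, "west"),
    ("Country B", 35, 60, "east"),
    ("Country C", 70, 40, "west"),
]
GROUP_COLORS = {"west": "#1b5e20", "east": "#b71c1c"}  # arbitrary choices


def label_scatter(rows, width=300, height=200, pad=30):
    """Return an SVG string in which each point is drawn as its own label."""

    def sx(x):  # map 0..100 onto the horizontal plotting range
        return pad + (width - 2 * pad) * x / 100

    def sy(y):  # map 0..100 onto the vertical range, with 0 at the bottom
        return height - pad - (height - 2 * pad) * y / 100

    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">'
    ]
    for name, x, y, group in rows:
        # The text element is the marker: no dot plus a separate legend lookup.
        parts.append(
            f'<text x="{sx(x):.1f}" y="{sy(y):.1f}" text-anchor="middle" '
            f'font-size="10" fill="{GROUP_COLORS[group]}">{escape(name)}</text>'
        )
    parts.append("</svg>")
    return "\n".join(parts)


svg = label_scatter(rows)
print(svg)
```

Saving the string to a `.svg` file and opening it in a browser shows the labels sitting at their data positions. Font size or weight could carry a further variable, at some cost in legibility.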

(The site itself has another 8 sets of charts, each with a different variable on the y-axis, and you can click through to see them appear one at a time in the same space, for easier comparison.)

We sometimes forget just how much information can be depicted in a data graphic. In a previous post I noted that exploratory and educational graphics are the extremes of a single axis. But surely a worthy goal is to do both: make your point, and include enough information for the reader to do their own hypothesis testing. Adding some variables is a good way to start.

References:
Miller, Jon D., Eugenie C. Scott, and Shinji Okamoto. 2006. Public Acceptance of Evolution. Science, vol. 313, August 11, 765–766. PDF online.
Paul, Gregory S. 2005. Cross-National Correlations of Quantifiable Societal Health with Popular Religiosity and Secularism in the Prosperous Democracies. Journal of Religion and Society, vol. 7. Online.

August 7, 2006

Charting HTML

The artist/coder Sala, responsible for the 1000 numbers project, has created an applet that turns a web page into a color-coded diagram that depicts the HTML hierarchy. Here’s the main page of Pictures of Numbers when it was displaying the Tufte Library post.

siteasgraph.gif

Here’s Sala’s key to the tags:

siteasgraphkey.gif

It’s easy enough to pick out the three book reviews and their covers, the lists of links down the side, and the flurry of forms, text, and images in the colophon.

siteasgraphnotes.gif

I know what I’d like: a label beside each node giving the tag class or ID. That would make untangling endless nested DIV tags so much easier. Why doesn’t Dreamweaver have a graphical interface to its code?
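A text-only version of that wish is not hard to sketch. The example below assumes Python's standard `html.parser` module and has nothing to do with how Sala's applet works. The `OutlineParser` class, the `outline` helper, and the sample markup are all made up for illustration. It prints one indented line per element, labelled with its ID and classes.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not deepen the tree.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class OutlineParser(HTMLParser):
    """Collect one indented line per element, labelled with its id and classes."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.lines = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        label = tag
        if attrs.get("id"):
            label += "#" + attrs["id"]
        for cls in (attrs.get("class") or "").split():
            label += "." + cls
        self.lines.append("  " * self.depth + label)
        if tag not in VOID_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        # Naive: assumes every non-void element is explicitly closed.
        if tag not in VOID_TAGS and self.depth > 0:
            self.depth -= 1


def outline(html):
    parser = OutlineParser()
    parser.feed(html)
    parser.close()
    return parser.lines


# Hypothetical markup, loosely shaped like a blog page.
sample = """
<div id="main">
  <div class="post review">
    <img src="cover.gif">
    <p>Text</p>
  </div>
  <ul id="links"><li>One</li></ul>
</div>
"""

print("\n".join(outline(sample)))
```

Real pages often leave tags such as `p` or `li` unclosed, and this sketch would over-indent whatever follows them. A proper tool would need the browser's error-recovery rules.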

Some interesting patterns can be seen on Sala’s site when large sites are compared, and design choices (such as tables vs divs) are revealed—the results reminded me of the haplotype diagrams my officemate generates as part of her research. If you’re interested in other graphical depictions of networks, check out the Visual Complexity site.

May 7, 2006

New York Times Style

The New York Times generally has excellent information graphics, and uses a distinctive house style. Here are three typical graphs, from this article in the May 7 edition.

nyt_margins.gif

  • Note the strong contrast between headings (in bold caps), graph title (upper and lower case) and series labels (caps).
  • The same font is used throughout, the same size in axis labels as in graph titles. This means the labels are larger and the graphs smaller than Excel's defaults.
  • Only two shades of black are used throughout, and no color or patterns.
  • The shades and labels used here are repeated in the next two graphs (see below), consistently referring to the same series. The choice of shades intuitively corresponds to specific (Ohio) and general (U.S.).
  • The bars here sensibly overlap instead of being offset as is usual for bar charts.
  • Direct labelling of sample values is used instead of the little labelled swatches that Excel produces—one less interpretative task for the reader. Note how the sample data is offset, so it can't be confused with real numbers.
  • Explanations are right on the chart: what's a WIN is noted by the data, instead of in a caption or title.

nyt_barline.gif

  • All these graphs are as small as they reasonably can be, yet the bar and especially the line chart contain quite a bit of data.
  • No frame around the charts, and no background colors or decoration.
  • The bars have no stroke, just a fill.
  • No y axis line.
  • Lightly dotted y gridlines.
  • Directly labelling one of the values on the y axis with the unit (in this case, percentage, but it could be dollars or kg or anything else) rather than using a rotated axis label (which is usually hard to read and a waste of space).
  • Abbreviating years as '01 rather than needlessly writing them in full.
  • No tickmarks on the x axis; instead, dividers are used between years.
  • The Ohio series is thinner where it has to be (overlapping bars), but the same width where it doesn't (overlapping lines). All differences should be for a reason; otherwise, keep things the same.
  • Unobtrusive gaps are used for the U.S. series in the bar chart, just thick enough to tell the bars apart, but not so thick the series can't be viewed as a single shape.
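Several of the conventions in the list above are mechanical enough to sketch in code. The example below is a rough approximation written as a hand-rolled SVG using only the Python standard library. It is not the Times's method, and the values, sizes, and the `minimal_bars` helper are all invented for illustration. It covers five of the points: no frame or y axis line, lightly dotted gridlines, bars with a fill but no stroke, the unit attached to a single y value, and years abbreviated as '01.

```python
# Hypothetical yearly percentages, invented for illustration only.
values = {2001: 4.2, 2002: 5.1, 2003: 3.0, 2004: 1.5}


def minimal_bars(values, width=240, height=120, pad=24, y_max=6, step=2):
    """Return an SVG bar chart that follows some of the conventions listed above."""
    plot_w, plot_h = width - 2 * pad, height - 2 * pad
    slot = plot_w / len(values)

    def sy(v):  # data value -> vertical pixel position
        return height - pad - plot_h * v / y_max

    out = [
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" '
        f'height="{height}" font-family="sans-serif" font-size="9">'
    ]
    # Lightly dotted horizontal gridlines; no frame and no y axis line.
    for tick in range(0, y_max + 1, step):
        y = sy(tick)
        out.append(
            f'<line x1="{pad}" x2="{width - pad}" y1="{y:.1f}" y2="{y:.1f}" '
            'stroke="#999" stroke-width="0.5" stroke-dasharray="1 2"/>'
        )
        # Only the top tick carries the unit, replacing a rotated axis label.
        text = f"{tick}%" if tick == y_max else str(tick)
        out.append(
            f'<text x="{pad - 4}" y="{y + 3:.1f}" text-anchor="end">{text}</text>'
        )
    for i, (year, v) in enumerate(values.items()):
        x = pad + i * slot + slot * 0.15
        # Bars get a fill but no stroke.
        out.append(
            f'<rect x="{x:.1f}" y="{sy(v):.1f}" width="{slot * 0.7:.1f}" '
            f'height="{plot_h * v / y_max:.1f}" fill="#444"/>'
        )
        # Years abbreviated as '01 rather than written in full.
        label = "'" + f"{year % 100:02d}"
        out.append(
            f'<text x="{pad + (i + 0.5) * slot:.1f}" y="{height - pad + 11}" '
            f'text-anchor="middle">{label}</text>'
        )
    out.append("</svg>")
    return "\n".join(out)


chart = minimal_bars(values)
print(chart)
```

Written to a `.svg` file, the result is a small, undecorated chart. The overlapping two-series bars and the direct series labels are left out to keep the sketch short.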

nyt_bars_arrows.gif

And note some nice little touches: the % symbol in the y axis label extending into the chart a little, so the numbers align properly; the way the line for the label "OHIO" knocks out the gridline so they don't clash; and how one of the 2004 values is allowed to dip below the x axis.

A very close reading of a well-designed information graphic can yield all sorts of good ideas. Good design should require a close reading to unpack; anything immediately obvious is probably too flashy and getting in the way of the data. These examples show how a chart can be simple, beautiful, and functional at the same time.