
January 20, 2008

Swans and Geese

Ellesmere (Te Waihora) is a huge shallow lake south of Christchurch, which partially connects with the sea—in fact they bulldoze a channel to the ocean when they want to lower the lake level. As part of a conference on the management of Te Waihora, I designed a poster for Ken Hughey of Lincoln University.

One of the things Ken wanted to depict was the population fluctuations of Canada geese (Branta canadensis) and black swans (Cygnus atratus) on the lake. He had several years of historical data, though fewer for swans than geese, and wanted to show how goose numbers had dropped below their optimum population level. This is the goose graph; the swan one looked much the same but with fewer bars.

ken_geese.gif

The first thing I decided to do was place both sets of data on the same axes. Because the population data were continuous, it made more sense to link them in a line graph, which I created in Ken’s Excel file and copied into Illustrator. Normally I spend five minutes with the direct selection tool deleting all the crap that Excel adds to its graphs; in this case it was simpler to select the trend lines alone and pull them into a new layer, then create axes and numbers from scratch, using the original graph in the background layer as a guide.

(By the way, not many people seem to know that you can drag whatever’s selected from one layer to another by dragging that little colored square layer_square.gif near the layer’s name.)

swangoose.gif

To color the goose trend line, I used the eyedropper to sample some brown from a photo of a Canada goose; I just made the swan line black. Those colors were applied directly to the text in the title, as I didn’t want to have a key or legend. Working from Googled photos of each bird, I created silhouettes to label each line, again to avoid a key. It was easy enough to annotate the graph with a level line.

When a graph isn’t working, our temptation is to jazz it up. Excel has any number of ways of making graphs fancier: WordArt, shadows, gradients, 3D, backgrounds and so forth. But this usually makes things worse. Ken had used some of these in his lake level graph, but I thought it needed some simplifying.

ken_levels.gif

I used the same color palette, stroke thicknesses, and font as in the swan and goose graph, so they looked like they belonged together on the poster. It was important, I felt, to translate the rather cryptic numbering scheme for lake openings into English, and to annotate the graph with lines to show the duration of lake openings, rather than just listing dates.

ellesmere_depth.gif

If I were to do this again from scratch, I might use a series of little horizontal lines, one at each sampling time, to represent the lake level, rather than a continuous line.

alt_levels.gif

There’s more information on Ken Hughey’s Waihora research at his group’s web site. Thanks to Ken and to EOS Ecology, for whom I did the design work, for permission to reproduce these graphics. I especially appreciate it when a scientist is brave and altruistic enough to let me post their “before” versions in a forum like this.

My posting frequency has taken a hit since I started working full-time as an information designer, but when the ukulele book is finished I’ll be posting more regularly to Numberpix. The project for 2008 is to finish Pictures of Numbers, my book on data presentation. If readers of this blog have any suggestions for content, you’re welcome to email me: I’m mike, at numberpix.com.

September 4, 2007

Disproportionate Risk

Here’s another graphic from those folks at Catalogtree, published a while ago in the New York Times Magazine, in an article on the influence Shia Islam is likely to have on Middle East elections. So where are all the Shiites then?

nyt_shiites.gif

If you said Iran and Bahrain, you are so, so wrong. Pakistan actually has 60 times as many Shiites as Bahrain, a tiny country with far fewer Shiites than Iraq. What the graphic is showing is percentages, not numbers; percentages of a total population we’re not told. The designers probably decided that percentages were most important when you’re discussing elections, but the convention in these types of graphics is that each little person stands for an actual amount (say, a million people), rather than 1%.

If we go to the Foreign Affairs article cited, we find the following data.

shiite_table.gif

Note that Catalogtree chose just a few of these countries to graph; Bahrain rather than, say, Afghanistan or Syria. Why? Well, it rather looks like they read no further than paragraph 2 of the multi-page article (their choices are the bold ones):

“Shiites account for about 90 percent of Iranians, some 70 percent of the people living in the Persian Gulf region, and approximately 50 percent of those in the arc from Lebanon to Pakistan — some 140 million people in all.… Recent events in Iraq have already mobilized the Shiites of Saudi Arabia (about 10 percent of the population); during the 2005 Saudi municipal elections, turnout in Shiite-dominated regions was twice as high as it was elsewhere. … The Shiites of Lebanon (who amount to about 45 percent of the country’s population) have touted the formula, as have the Shiites in Bahrain (who represent about 75 percent of the population there), who will cast their ballots in parliamentary elections in the fall.”

The main problem with including all the countries in the table is obvious when you graph it: India dwarfs everybody else. In other circumstances, one might log-transform the x axis, but that’s just silly when you’re trying to compare numbers and percentages.

shiite_bars.gif

One solution, and it’s the one used in the Foreign Affairs article, is to indicate percentages on a map rather than on a bar graph. The size of countries is very roughly proportional to their population anyway, and by cropping the map one keeps India from dominating it.

shiitesFA.jpg

This map has a few problems, though. Because it uses no colors, it relies on rather odd patterns to convey the percentage of Shiites. It’s also a bit of a tangle of coastlines, borders, and pointers. I tried a redesign with a more intuitive color palette, scaling back boundaries as much as possible. Of course you’d want to label countries as well.

new_shiite_map.gif

We still don’t know the actual numbers of Shiites, though. That could be done by overlaying little person-markers proportional to numbers, the way the Catalogtree graph seemed to be doing but wasn’t. I intentionally didn’t arrange the Shiites in serried ranks, like an army on a parade ground presumably about to march on the effete West.

new_shiite_map2.gif

Of course, it ends up looking like a certain board game. But I guess when you want to show hordes of figures camped on countries, you just have to run that Risk.

shiites_risk.jpg

Reference: Vali Nasr, When the Shiites Rise, Foreign Affairs, July/August 2006

August 28, 2007

Chameleon Ray Grid

David Shiffman, an undergraduate at Duke University, was working with Dr. Dan Rittschof on stingray feeding behavior for his honors thesis, which required catching several rays of two different species and keeping them in captivity. While they were holding them in two different kinds of tank, they noticed to their surprise a color change in both species—a result completely incidental to research, but interesting enough for David to summarize for me thus:

stingray1.jpg

The problem with this graphic is that it separates different individuals of the same species, and it’s a little confusing to see how the tank acts as a treatment and the ray color is the result. I suggested he rearrange the photos into a matrix, which is often a good way to show the interaction between factors (in this case, ray color and tank color).

stingray2.jpg

I’d suggest some pretty standard changes: removing boxes around things wherever possible, paying attention to the typeface, and showing rather than labeling the tank color.

newrays.jpg

I’ve chosen here to make a swatch of the ray color, which can be an excellent way to show color change, but the sacrifice is the distinctive silhouettes of the two species, which make it much easier for us to follow what’s happened. The ideal situation would be better-quality photos of each ray, showing the whole fish without reflections to distract the reader. The rays should be photographed on a neutral background, or using each of the actual tanks as a background (the photos currently all use the same gray background, which just confuses things). The advantage of using the actual tanks is that you’d no longer need to label the colors—they’d be self-evident. The only text necessary would be the species names. Show, not tell.

February 25, 2007

Deceptive Areas

areas_circle.gif

People are poor at accurately judging areas; they do much better comparing linear measures like the lengths of a bar or the heights of a point. Areas can be useful where precision’s not important—circles can be scattered over a map, for example, to allow readers to scan for trends. But too often designers indicate data with areas because shapes are cooler than lines and you can arrange them in pretty patterns.

Regardless of the shape chosen, because we have a hard time judging areas, it’s vitally important that sizes are calculated accurately: namely, proportional to the value they represent. Otherwise the designer is telling lies.

From the Sunday New York Times magazine for February 25, here’s another mess from Catalogtree. There are plenty of poor choices made here—note the particularly ugly way that circles have been doodled on top of the text, making the chart look like a printing error. Leave aside also the fact that half the wording consists of dull qualifiers that could have been easily turned into footnotes, and that the designer could think of no better way to arrange the text than just dumping it on the page in a block, as if they didn’t actually want anyone to read it. These pale into insignificance beside the circles being the wrong size.

areas_circle_lie.gif

Note the largest value (892) and the fourth largest (436). One is just over twice the size of the other, and a circle twice the size of another should have a diameter √2 times as big: about 1.4 times as wide. The larger circle in the graphic is actually about twice as wide, and it’s about four times as wide as the 204–225 circles. To see the amount of distortion this creates, compare the original proportions of the two largest circles (right, top) with the corrected ones (right, below). I bet the designer just halved or doubled the circle diameters rather than actually calculated the areas required, which is pretty inexcusable.

To see the way it should be done, check out the circles from an earlier New York Times article reporting frequency of the word “Iraq” in different presidential State of the Union addresses. The first circle is twice the area of the second, not twice the width. Note the elegant overlapping too.

areas_iraq.jpg

It’s understandable that some designers might mess up circular areas, as it takes a little algebra to work back from the desired area to calculate the right diameter to use; namely, 2√(area/π). But even simple squares can defeat a designer’s math abilities. Below is another example from a much earlier NYT magazine—the design firm 5W infographic, discussed in a previous posting, is no longer used by the Times.
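For anyone who wants to check the arithmetic, here is a minimal Python sketch of that formula. The function names are my own, and the only data used are the two values (892 and 436) quoted from the Times graphic above.

```python
import math

def diameter_for_area(area):
    """Diameter of a circle with the given area: 2 * sqrt(area / pi)."""
    return 2 * math.sqrt(area / math.pi)

def diameter_ratio(value_a, value_b):
    """How many times wider circle A should be than circle B
    when the circles' areas are proportional to the values."""
    return math.sqrt(value_a / value_b)

# The largest and fourth-largest values in the Times graphic.
ratio = diameter_ratio(892, 436)
print(f"892 vs 436: diameter should be {ratio:.2f} times as wide")

# Same answer via explicit diameters, treating each value as an area.
print(diameter_for_area(892) / diameter_for_area(436))
```

The units of the areas don't matter for the comparison, because any constant factor cancels out of the ratio.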

areas_immigrant.jpg

How do we know if there might be a problem with an information graphic? One clue is weird instructions on how one should interpret it; a good data graphic doesn’t need instructions on how to read it. Note, for example, a key that points out that an area only 25% of the original somehow equals 50% of the value. Again, this looks like a math-phobic designer at work. Illustrator’s Info palette works in widths and heights, not areas, so it’s easy for a designer to drag to draw a square where one pixel of linear distance, not area, equals 1% in value. Actually calculating the correct widths, from √area, was obviously too difficult.
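A rough Python sketch of the two ways to size a square, assuming (my choice, not the original's) that a value of 100% is drawn as a square 100 pixels wide:

```python
import math

def side_for_value(value, full_value=100.0, full_side=100.0):
    """Side length of a square whose AREA is proportional to the value."""
    return full_side * math.sqrt(value / full_value)

def area_share_if_side_scaled(value, full_value=100.0):
    """Fraction of the full square's area you end up with if the SIDE,
    rather than the area, is scaled in proportion to the value."""
    return (value / full_value) ** 2

# A value of 50%: the correct square is about 70.7 px wide, not 50 px.
print(side_for_value(50))

# Scaling the side to 50 px instead gives a square with only a quarter of the area.
print(area_share_if_side_scaled(50))
```

The second function reproduces the oddity in the key: a square drawn half as wide covers 25% of the area while claiming to show 50% of the value.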

areas_immigrant_key.gif

The result is a graphic that makes large values too large and small ones too small. In this case, the designer chose to use only some of the data in the survey, mostly the pro-US results, so the graphic is not only inaccurate but biased.

It’s really not even necessary to use a data graphic at all when there are just a few data points, if you arrange them carefully. I tracked down the original report and summarized the questions in tabular form, arranging the columns in a different and more intuitive order. The margin of error was such that I wasn’t happy pulling out trends with anything more than some boldface type; in fact, that 41 probably shouldn’t be highlighted at all, since it’s not really different from the 37. And I had room to add a couple of extra questions. The whole thing could I’m sure be squeezed into the area the original graphic occupied, but I’m getting tired of fixing the work of highly-paid, supposedly-professional designers, so will spend the next couple of posts looking at something else and give Catalogtree a break.

areas_table.gif

February 22, 2007

Wacky Wheel of Wedges

preacher.gif

Not particularly wanting to harsh on the same design company twice, but the New York Times Magazine included another screwed-up chart on Sunday, February 18th. In this one there are only nine actual data points, which could have been adequately shown with a plain bar chart, but that wouldn’t have looked cool enough, would it? So the designer decided to groove things up by repeating each very thin bar multiple times, and pulling the whole thing into a circle.

Well, it does look exciting and retro. One problem is there’s no room to label the bars directly, so we have to laboriously go back and forth to puzzle out which is which, distinguishing 9 from upside-down 6 (if your chart requires upside-down reading to interpret, you’re probably doing something wrong), making pairwise mental rotations of clumps of bars, and so forth. Quite a bit of work for nine data points; a simple table would be clearer.

But it gets worse. Again the designer helpfully annotated the bars with actual data. The first thing I noticed was that 92% looks rather more than four-and-a-bit times as high as 21%. So I traced over one member of each clump in Illustrator, and measured their lengths. (The easiest way to measure the lengths of angled lines in Illustrator is Window > Document Info > Objects. If I wasn’t intending to reconstruct the graph, I could have just used the Measure tool, of course.)

preacher2.gif

Sure enough, the bars weren’t even remotely to scale. I rotated them all to vertical, turning on the invisible grid to help, then typed the actual data into Excel and produced a quick bar chart, and juxtaposed the two (the Excel bars are flipped to make comparison easier).

We can see straight away that the shorter bars are disproportionately small. This has a pretty serious effect on the political slant of the chart, as it minimizes the amount of time the clergy seem to spend holding forth on immigration and stem-cell research, and overemphasizes their sermonizing on hunger and poverty. (I’d be interested in knowing if clergy really only talked about nine things, and if this is a subset what was left out and why—but another time). What’s caused this distortion? It’s not from stretching the 92% bar too much; there’s progressive distortion of all the other bars.

tinypreacher2.gif

After playing around with scaling—see the inset—it seemed more like an arbitrary fixed chunk has been lopped off each bar… Then I realized that the bars originally ran all the way to the center of the circle. That’s where we’re supposed to be mentally measuring to! Go check the original chart—was this obvious to you? Or were you fooled by the virtual baseline, and the numbers, into thinking the bars stopped there? (Me, I have enough trouble comparing ink I can see without factoring in imaginary ink that I can’t.) The designer felt they had to fill the middle with white so they could arrange their numerical labels there, and the numbers were only required because the chart’s groovy circularity left no room for anything better. So the path to the resulting mess seems clear.
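Here is a small Python sketch of why hiding the same fixed length at the base of every bar hurts the short bars most. The 92 and 21 are the two percentages mentioned above; the hidden lengths are made-up values for illustration, since I never recorded how much of each bar the white center actually covered.

```python
def apparent_ratio(big, small, hidden):
    """Ratio of the VISIBLE bar lengths when a fixed length `hidden`
    is covered at the base of each bar (all in the same units)."""
    return (big - hidden) / (small - hidden)

true_ratio = 92 / 21
print(f"true ratio: {true_ratio:.2f}")

# Hypothetical amounts of each bar hidden under the white center.
for hidden in (0, 5, 10, 15):
    print(f"hidden={hidden:>2}: 92% bar looks "
          f"{apparent_ratio(92, 21, hidden):.1f} times as long as the 21% bar")
```

With nothing hidden the visible ratio matches the data; the more of the base is covered, the more the long bar appears to tower over the short one.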

preacher3.gif

But there’s another problem. If there were just one bar for each value, we’d at least all agree we should be comparing their heights. But using multiple bars creates a sort of exploded pie chart, with a wedge for each datum. Pie charts, clunky as they are, are a type of chart most people recognize, and we’re used to comparing areas, even if we don’t do it very accurately. But look at the exaggeration caused by mistaking the wedges for pie slices. I traced over the largest and smallest wedges and compared their areas; the larger has what Tufte calls a Lie Factor of 6.9 (doesn’t that sound imposing?), meaning it’s nearly seven times as large as it should be.
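A minimal Python sketch of that calculation, using the ratio-of-ratios reading described above (Tufte's own wording is in terms of the size of the effect shown versus the size of the effect in the data). The wedge areas and values below are invented placeholders, not my actual measurements from the chart.

```python
def lie_factor(drawn_big, drawn_small, data_big, data_small):
    """Ratio shown in the graphic divided by the ratio present in the data.
    A value of 1.0 means the graphic is faithful; larger means exaggeration."""
    return (drawn_big / drawn_small) / (data_big / data_small)

# Hypothetical traced areas (arbitrary units) and the values they stand for.
factor = lie_factor(drawn_big=600.0, drawn_small=20.0,
                    data_big=90.0, data_small=30.0)
print(f"lie factor: {factor:.1f}")
```

Because only ratios are compared, the units of the traced areas don't matter, which is why it scarcely matters what units Illustrator reports.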

(An aside for the geeky: there is in fact a way to measure the area of closed paths in Illustrator, though it seems so s00per-seekret that I’m reluctant to share it. Briefly, open the debug window (on a Mac, command-option-shift-F12), click on Objects and Object Tree to expand them, select your closed path and see it become bold in the object tree, click its bold underlined name, and lots of terrifying numbers will appear in the Objects section, including the area—in points, I think, but it scarcely matters.)

So a cascade of bad design choices has led to a needless distortion of the relevant data, for those readers who weren’t so rebuffed by this wacky graphic that they just skipped it. At least the designer left the numbers on there, or we’d never even have known. The whole thing could have been avoided with a simple bar graph—not as sexy, but comprehensible, no larger, and (most importantly) not a big lie.

preacher_final.gif

February 11, 2007

False Advertising

adbuyers.gif

Almost every weekend the New York Times Magazine accompanies their first main story with a relevant infographic. They tend to be commissioned from outside agencies, and sometimes lack the good design one sees in most NYT graphs. I’ve written about bad examples before, on subjects like world conflicts and the threat posed by Iran. One of the worst ones I’ve seen in a while appeared in the Sunday Magazine of February 11th.

What’s so bad about it? Well, there are some pedestrian faults, the sort of things we find in a lot of graphs. The lines are labeled indirectly, with a key, so one has to jump back and forth to interpret them. It’s possible the designer felt there wasn’t room to label the lines directly, yet there’s a lot of wasted horizontal space between each year—he or she could have easily fitted them in. (And what are those lines anyway? What’s the difference between “Automotive Factory and Dealer Associations” and plain old “Auto Dealerships”? Surely one of the designer’s jobs is to communicate, not just recite corporate spin that calls fast-food vendors “Quick-Service Restaurants”.) The colors are pretty, but don’t signify anything, serving only to distinguish the lines. And it’s strange that every point has its value labeled; why not just use a y-axis scale? After all, with a general newspaper readership surely it’s the trends and relative magnitudes that matter here, not the exact values.

adbuyers_notes.gif

A good thing the designer did label the points, though, or we wouldn’t be able to see how misleading the graphic is. Absolute height doesn’t correspond to value, for example (see A). I’m guessing he or she did this to stop lines 2 and 3 from crossing—they did cross in inconvenient old reality, but that messes up the pretty pattern. Note that line 1 should be about three times the height of line 2, but I suppose that would create an ugly gap.

Change in height doesn’t match change in value either. The two lines in B correspond to the two changes in line 5. Since both are 0.3 billion dollars, the lines should be the same height, and they obviously aren’t. Look at the magnitude of change in line 1 as well. It looks like every line is using a different scale, and the designer just made the ends all join up so it looked nice, like a subway map.

And what the heck’s going on with C? 1.2 should equal 1.2! Perhaps it’s 1.25, and the digit was left off so it would match the others, and incidentally make the graph absurd. Doesn’t anyone at the Times proofread these things?

my_adbuyers.gif

To redo this, I first generated a basic chart in Excel. I pasted this into a background layer in Illustrator, locked it, and just traced over all the components in a new layer (the chart is so simple it’s hardly worth ungrouping and deleting all the junk that Excel puts in its graphs). I came up with category names that were a bit more meaningful, and created a y-axis, which really only needs to be anchored by a few values.

One thing very obvious now is the dominance of auto advertising. I’m sure I’ve oversimplified the two auto categories; I’d want to see to what extent they overlap or could be lumped. Another thing to note is how wildly the original graph overemphasized changes; increases and decreases now look much more modest (although I wish we had ten or twenty more years of data; it would easily fit in the same space). The color-coding is still meaningless; perhaps rising and declining categories could be colored differently, or color could encode information, like predominantly-print vs. TV advertising. The main thing, though, is that the graph’s no longer telling lies. (And we got to keep the groovy rounded orange lines.)

January 10, 2007

The Scientist’s Rainbow

How many colors do you need? Color costs money to print, and disappears when you laser print, fax, or photocopy a graphic, and some people (like me) have trouble seeing it. This is why I use grayscale graphics wherever possible, saving color for emphasis (I’ll say more about this in a future post).

The opposite of this approach is a graphic that uses every color in the visible spectrum for no apparent reason. One finds this so often in computer-generated charts in scientific publications that I call it The Scientist’s Rainbow. Once identified, you’ll see it everywhere. Here’s a good example.

rainbow1.jpg

The original had no key to the colors, and they seem to correspond only to the scale on the y-axis. One way of testing whether color is conveying any information is to convert the graphic to grayscale.

rainbow1b.jpg

So this color was just for decoration—yes, rainbows are pretty. But sometimes the Scientist’s Rainbow is actually impeding communication. The visual spectrum isn’t arranged in an intuitive order, and if there’s a scattering of colors it can be pretty hard to extract values from the mess.

rainbow2.jpg

We should start by extracting the meaningful stuff from the empty space. I expanded the vertical scale until the pixels were squares, not rectangles, converted everything to grayscale, blurred away the sharp edges, and messed with the contrast a little.

rainbow2b.jpg

Note that the lightest areas are not the highest intensity, but correspond to yellow and green in the middle of the spectrum. The highest values on the colored graph are reds, which are as dark as the blues at the other end (so the original would be meaningless as soon as it was photocopied). We’d be best to plot the raw data again using a black-to-white scale, although some clever color substitution in Photoshop could also convert a rainbow to shades of gray. That sort of work is best left to interns though, with their love of drudgery.
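To see why the middle of the rainbow comes out lightest, here is a short Python sketch that converts a few pure rainbow colors to gray. It assumes one common weighting (the ITU-R BT.601 luma coefficients); Photoshop's own grayscale conversion may weight the channels differently, and the five colors are idealized stand-ins, not samples from the figure.

```python
def luma(rgb):
    """Approximate gray level (0-255) of an RGB color, using the
    BT.601 weights: green counts most, blue least."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b

# Idealized rainbow, ordered from the low end of the scale to the high end.
RAINBOW = {
    "blue":   (0, 0, 255),
    "cyan":   (0, 255, 255),
    "green":  (0, 255, 0),
    "yellow": (255, 255, 0),
    "red":    (255, 0, 0),
}

for name, rgb in RAINBOW.items():
    print(f"{name:>6}: gray level {luma(rgb):5.1f}")
```

Under this weighting yellow is the lightest of the five, while red and blue at the two ends of the scale both fall in the dark half, so a photocopy can't tell the highest values from the lowest.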

August 28, 2006

Ditch the Tables, NYT!

Tables are often better than charts, especially with just a few data points. But sometimes a graph is what you need. Here’s an example from the Sunday New York Times magazine of August 6th. It shows American opinions on what to do about Iran.

nyt-table.gif

Quick, what’s the message of the table? That most people think we should use diplomacy against Iran? If so, then why show three surveys? There must be some kind of trend here. Wait, why is the timeline going from right to left? OK, there seems to have been a dip in support for military action back in May, or was that a rise in the number of people who don’t think Iran is a threat? But there’s a 4% margin of error, isn’t there? Does that matter?

Let’s try a graphical depiction. To reflect the 4% margin of error (presumably a 95% confidence interval 4 percentage points either side of the reported figure) I used an 8 pt line, where 1 pt = 1 pixel = 1 per cent when I originally constructed the graph in Illustrator. It would have been better to apply a linear gradient to each line that fades from the midpoint to the edges, but Illustrator can’t do that very easily.
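The thick-line idea can be sketched in a few lines of Python: each reported figure becomes a band 4 points either side, and two figures are only worth distinguishing by eye if their bands don't overlap. The poll numbers here are hypothetical, and overlapping bands are a rough visual check, not a formal significance test.

```python
def band(percent, margin=4.0):
    """Lower and upper edge of the band around a reported percentage."""
    return (percent - margin, percent + margin)

def bands_overlap(a, b, margin=4.0):
    """True if the margin-of-error bands around two percentages overlap."""
    lo_a, hi_a = band(a, margin)
    lo_b, hi_b = band(b, margin)
    return lo_a <= hi_b and lo_b <= hi_a

# Hypothetical poll results, in per cent.
print(band(52))               # an 8-point-wide band, like the 8 pt line
print(bands_overlap(52, 47))  # a 5-point gap: the bands still overlap
print(bands_overlap(30, 45))  # a 15-point gap: clearly separate
```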

nyttablelines.gif

I went into the public records and added the poll results for February, which are almost exactly the same as June. For some reason the Times left them out, though they had plenty of room. So is the dip in May significant? Did something happen then that the Times should be telling us about? Who knows?

But before we start over-analyzing the data, here are two different polls, both taken in February 2006. Note the difference in responses to essentially the same questions.

nyttablebars.gif

So the biggest differences are caused by pollsters; any supposed “trend” in the Times data is no greater than the difference between two simultaneous polls with slightly different questions. If the media just put a few polls side by side, maybe we’d lose our ill-founded confidence in them.


Why are we polling random Americans about what to do in Iran, anyway? Here’s the opinion poll I’d like to administer:

  1. Should the United States take military action against Iran?
  2. Who’s the President of Iran, anyway? Starts with A. No googling!
  3. Name two cities in Iran, and two countries that border it.
  4. Out of France, China, Israel, and India, which has the most nuclear weapons, and which the fewest?

Scoring note: No correct answers for questions 2, 3, or 4 scores an automatic “N/A” for question 1.

(Answers: 2, 3, 4)

August 21, 2006

Beach Mouse Pelt Map

UC San Diego biologist Hopi Hoekstra and her co-authors found that the light-colored beach mice of Florida differed from their darker cousin by a single nucleotide in one gene—at least in Western Florida. The Eastern beach mice seem to have evolved their color some other way. She produced a very nice graphic mapping coat color and light/dark allele frequency (her colleague Bill Lynn did the mouse pelts, in Photoshop).

beachmouse.gif

Doesn’t this lay out their argument well? I couldn’t resist making a few changes, of course, because I’m fussy:

  • Fading back the coastline and pointers so the data stood out
  • Choosing different and related colors for the mouse ranges
  • Making the color of the pointers match the ranges
  • Changing the circle fills to a dark brown
  • Extending the coastline and range into adjacent states, and labelling the states
  • Making the state border a little different from the coastline
  • And, in a bit of typographic pickiness, raising the baseline of all the “=” by half a point and putting a thin space either side.

(I notice it all looks very Tufte now, with his patented Tufte beige, but that wasn’t the intention.) If one were to do a serious redesign, my first suggestion would be to move the Oldfield mouse up into Georgia, so that it’s physically separated from the beach mice, and on land while they’re in the ocean. Adding a key to the two allele colors would be nice, and would pretty much remove the need for an explanatory caption. But I think the graphic works fine as is.

beachmousenew.gif

Reference: There’s a nice popular article on the findings, and the original paper is:
Hopi E. Hoekstra, Rachel J. Hirschmann, Richard A. Bundey, Paul A. Insel, Janet P. Crossland. 2006. A Single Amino Acid Mutation Contributes to Adaptive Beach Mouse Color Pattern. Science, 313(5783): 101–104. 7 July 2006, DOI: 10.1126/science.1126121. (PDF)

May 17, 2006

More on the Planets

One problem with depicting the solar system in an information graphic is that the enormous sizes and distances are hard to grasp. The usual solution is to use logarithms to compress things, but these can be hard to decipher, and sometimes it’s just better to show things to scale. Here’s an elegantly minimalist graph that shows the planetary diameters and distances (using two scales, though—otherwise you’d need a somewhat wider monitor), created by someone identified only as “Brian0918”.

albedobrian.gif

Practically every pixel is data, and even the x axis could be grayed out or dropped entirely, giving an unsurpassable data/ink ratio (which would also help Pluto show up).

The problem element here is the sun, which is so gigantic it swamps everything else. If we were to eliminate it, or just show a chunk, the graphic would become much more concise. I’ve done this below; note the two scales, implicit in the original but I think necessary here—one is five thousand times the size of the other.

diamsemimajor.gif

That version’s OK for print, but for the web one needs slightly more contrast and more solid typefaces.

diamsemimajor2.gif

The other solution for dealing with scale is breaking the graphic into chunks and recalibrating the scale by a sensible amount in each. These comparisons do this (note the common reference object in each successive picture), and use lovely 3D depictions as well, letting us go beyond the solar system to compare our sun with other stars.

planetmodels.jpg

References: Wikipedia has Brian0918’s chart: if anyone has more information on its origins, please leave a comment. The models were found via BoingBoing at Rense.com, which goes to show there are jewels in dross on the Web, but I don’t know the site they were taken from originally. And a practical activity that communicates the scale of the solar system to kids is the Earth is a Peppercorn.

April 9, 2006

Reflections on the Planets

Here is a chart from Wilkinson, illustrating the bubble plot method, where a third variable is encoded by the size of the marker. Unfortunately, planets are not a good data set for demonstrating bubble plots; we automatically assume these differently-sized circles are representations of the planets to scale. We’re also not very good at discriminating between the sizes of small circles: is Earth 0.4 or 0.5? Is Mercury 0.2 or 0.1?

albedo_wilkinson.gif

The units are not very friendly, either. Albedo is just the percentage of electromagnetic radiation reflected by a planet, and AU are units equal to the Earth's distance from the sun. Why temperature was chosen as a third variable is unclear (and why use degrees Kelvin?). Sure, it varies linearly with distance from the sun on a log–log scale, but that’s no surprise. The only exception is Venus, whose temperature is a consequence of a carbon dioxide atmosphere, not albedo; if anything, its cloud layer lowers the temperature by reflecting sunlight. So the graph is not really telling a coherent story.

What could be improved? We can start by just plotting albedo against distance, using more intuitive units for both. Now some of the variation that was masked by the bubble plot begins to emerge:

albedochart.gif

This is a fairly basic chart, but it still isn’t telling a story. To flesh it out, I added relative sizes, changed the brightness of the planets to match albedo, and annotated some of the outliers. The graph has become a little too cluttered, because the pattern of data points is being swamped by supplementary information, but at least now there’s some sort of narrative going on. And it prompts questions, like why does the Earth have such a high albedo when it’s mostly ocean?

albedochartannotated2.gif

A little more research, and I realized Earth’s albedo is mostly determined by cloud cover. That was the key to understanding why Venus and the gas giants were so reflective, and dry balls of rock like Mercury and Mars weren’t. So I resimplified the chart to make that point, stripping off some of the irrelevant information.

albedochart2.gif

Now the graph has a point to make, and you can just tell it’s happier.

References: The original chart is from Leland Wilkinson’s The Grammar of Graphics (Springer 1999). What piqued my interest was a fascinating discussion of what color the planets really are. Planetary albedo data are taken from part of NASA’s site. Wikipedia supplied the Earth albedo data, but merely lists “Edward Walker” as its reference (as does every other site on the internet, blithely copying Wikipedia of course). The old Wikipedia page cites, gulp, “Walker, E., 1987: Pictures of Preschoolers Out in the Snow. Dishwasher Picture Publishing, Volume 26, 151–1103.” So you may want to take those figures with a grain of salt.

March 21, 2006

Mountains of War

In the Sunday, March 19th New York Times Magazine, accompanying an article on the decline in global conflict, was the following chart:

hsc05nyt.jpg

Apart from the odd terminology (extrastate vs. interstate? And "war between states" means something quite different here in North Carolina...), I was puzzled by the color choice. Was this a stacked area graph, or one where the areas were superimposed, as the color scheme seemed to suggest? I went to the Human Security Centre website and found the original:

hscoriginal.gif

It is indeed a stacked area graph. The second paragraph of the caption reads:

Figure 1.1 is a ‘stacked graph’, meaning that the number of conflicts in each category is indicated by the depth of the band of colour. The top line indicates the total number of conflicts of all types in each year. Thus in 1946 there were five extrastate conflicts, two interstate conflicts, ten intrastate conflicts, and 17 conflicts in total.
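The caption's arithmetic can be sketched in a few lines. This uses only the 1946 counts quoted above; the stacking order (extrastate at the bottom) is an assumption based on the order the caption lists them, and `stack_bands` is a name I made up.

```python
from itertools import accumulate

# Conflict counts for 1946, as given in the report's caption.
conflicts_1946 = {"extrastate": 5, "interstate": 2, "intrastate": 10}


def stack_bands(counts):
    """Return (name, bottom, top) for each band, stacked in dictionary order."""
    tops = list(accumulate(counts.values()))
    bottoms = [0] + tops[:-1]
    return list(zip(counts.keys(), bottoms, tops))


if __name__ == "__main__":
    for name, bottom, top in stack_bands(conflicts_1946):
        print(f"{name}: drawn from {bottom} to {top}, depth {top - bottom}")
```

Only the bottom band starts at zero. Every other band has to be read as the gap between two lines, which is exactly the step the caption has to spell out.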

A good rule of thumb is that if you need to spell out how to read the graph, there’s a design problem. In this case, we're being misled by the “mountain illusion.” The human eye is used to distant objects being lighter in color, so it sees the area chart as a mountain range, and hence layered.

The New York Times designers, 5W Infographics, exacerbated this illusion by changing the color palette to different percentages of magenta. On the plus side, they simplified the x-axis by only numbering decades, labelled each area directly instead of with a legend, removed the frame, and replaced the x-axis tick marks with pale vertical gridlines.

How to get rid of the mountain illusion? The most obvious solution is to change the color scheme by reversing the light-to-dark direction, but this doesn't seem to help much. Better is a color scheme that flattens out the illusory layering. Making the contrast color as similar as possible to the main color seems to work best.

Another option is to change the stacking order, so the smaller series sits on top of the larger, which also fights the mountain illusion. Unfortunately this makes it much harder to discern trends in the smaller series; they're swamped by the larger. Another complication here is that one series disappears, so the trick is to distinguish between the two that remain. I've used a solid line, but a dotted line would be better, and subtle changes in shade best. The final and probably superior alternative is to just unstack everything in Illustrator, by making three graphs and combining them.

While unstacking this area graph makes it unambiguous, it’s much harder to see the total conflicts for a year. Does this matter? As always, it depends on the point one is trying to make; in this case, that world conflict has declined. The trend is pretty easy to see, as the majority of the data are in just one series. If total conflicts were important, one could add a line:

References: The accompanying article was Wonderful World, by James Traub, NYT Magazine 3/19/2006. The lovely Smoky Mountains photo is by someone called Melissa, who unfortunately has no contact details, or I'd have asked. The folks at the Human Security Centre at the University of British Columbia helpfully supplied the original data from their Human Security Report 2005.

(Update: charts by 5W Infographics no longer appear in the NYT magazine...)

February 15, 2006

March of the Monarchs

Every year, the Journey North Project tracks sightings of the first Monarch butterflies (Danaus plexippus) of spring, and produces maps like this one.

mm_map2000.gif

mm_legend.gif

The legend shows the first sightings in given two-week periods, but uses overlaid dots, which become a little confusing. The choice of colors is not the best; I had to get very close to my monitor to tell early and late May apart. It could do with some improvement, and Leland Wilkinson in The Grammar of Graphics produces a far superior version, though for an earlier year.

mm_wilkinson.gif

Could this be further improved? Wilkinson uses the “Scientist’s Rainbow” in his legend. But since the data are associated with increasing spring temperatures, and the lines are in a definite order, a legend using increasingly warm colors (different saturations of orange, for example) would work better; the original graph does this to some extent. Wilkinson uses colder colors as summer arrives, which seems counter-intuitive. The lines could easily be labelled directly, since the legend is sitting right beside them. Since most of the sightings are Eastern, half the map is blank and could be cropped.
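One way to build such a legend is to hold a single orange hue fixed and step up the saturation. This is only a sketch using Python's standard `colorsys` module; the hue angle of 30 degrees and the saturation range are my assumptions, not values taken from any of the maps.

```python
import colorsys

ORANGE_HUE = 30 / 360  # hue angle for orange; an assumed choice


def warm_palette(n, min_sat=0.2, max_sat=1.0):
    """Return n hex colors of one orange hue, from pale to fully saturated."""
    steps = max(n - 1, 1)  # avoid dividing by zero when n is 1
    colors = []
    for i in range(n):
        sat = min_sat + (max_sat - min_sat) * i / steps
        r, g, b = colorsys.hsv_to_rgb(ORANGE_HUE, sat, 1.0)
        colors.append("#{:02x}{:02x}{:02x}".format(
            round(r * 255), round(g * 255), round(b * 255)))
    return colors


if __name__ == "__main__":
    # One color per date range, earliest (palest) first.
    for color in warm_palette(5):
        print(color)
```

Because only one channel of the color changes, the order of the date ranges can be read straight off the map without consulting the legend.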

mm_mymonarchs2000.gif

Instead of ranges, I used a date for the advancing front. Wilkinson’s lines are nonparametrically smoothed contours through the concentration of points in each date range, rather than across the leading edge, because of “random error in the dataset”. The effect is to retard the contour, and it implies that all the points in advance of it are mistakes. Remember, these points are not a population sample from which we’re discerning a mean value, but the edge of a range.
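The difference between the two approaches shows up even in a toy example. The sightings below are invented for illustration (they are not Journey North data), and a plain mean stands in for Wilkinson's nonparametric smoothing, which is more sophisticated.

```python
from statistics import mean

# Hypothetical first sightings for one date range: (longitude band, latitude).
# These numbers are invented for illustration only.
sightings = [
    ("90W", 31.2), ("90W", 33.5), ("90W", 36.1),
    ("80W", 32.0), ("80W", 34.4), ("80W", 35.0),
]


def summarize_by_band(points, summary):
    """Apply a summary function (such as max or mean) to each band's latitudes."""
    bands = {}
    for band, latitude in points:
        bands.setdefault(band, []).append(latitude)
    return {band: summary(lats) for band, lats in bands.items()}


leading_edge = summarize_by_band(sightings, max)   # northernmost sighting per band
centre_line = summarize_by_band(sightings, mean)   # middle of the cluster per band

if __name__ == "__main__":
    for band in leading_edge:
        print(band, "edge:", leading_edge[band], "centre:", round(centre_line[band], 1))
```

Any line drawn through the middle of the cluster must sit south of the northernmost sighting, so it understates how far the butterflies have actually got.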

This raises a bigger issue: what’s the purpose of this graphic, anyway? If the goal is to answer the question “When will the Monarchs appear in my state?” then averaging the lines for several successive years would be best. Instead of two-week ranges, a less precise time interval like “early April” and “late April” would be better. The data are all online, and the graph is left as an exercise for the reader. Think how useful such a graph will be to our grandchildren, when they’re jaded from seeing Monarchs flitting around in February.
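For anyone attempting that exercise, the averaging and labeling step might look something like this. The dates are hypothetical placeholders rather than real sightings, and the function names are mine.

```python
from datetime import date, timedelta

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

# Hypothetical first-sighting dates for one state in successive years.
first_sightings = [date(1998, 4, 9), date(1999, 4, 14), date(2000, 4, 4)]


def mean_day_of_year(dates):
    """Average the day-of-year across several years, rounded to a whole day."""
    return round(sum(d.timetuple().tm_yday for d in dates) / len(dates))


def half_month_label(day_of_year, reference_year=2001):
    """Turn a day-of-year into a coarse label such as 'early April'.

    A non-leap reference year is assumed; leap years shift dates after
    February by one day, which is well inside the precision of the label.
    """
    d = date(reference_year, 1, 1) + timedelta(days=day_of_year - 1)
    half = "early" if d.day <= 15 else "late"
    return f"{half} {MONTHS[d.month - 1]}"


if __name__ == "__main__":
    print(half_month_label(mean_day_of_year(first_sightings)))
```

Splitting each month at the 15th is an arbitrary choice; any coarse interval would do, as long as it is no more precise than the year-to-year variation in the data.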

There was a very good article in the October 3rd New York Times science section on Monarch migration. They use a color palette similar to one I suggested above (though these butterflies are heading south), and dots for sightings like the Journey North project. The dotted line is not the wave front of butterflies, as you might think, but a 60°F isotherm—Monarchs can’t fly when the temperature drops below 55°F. The isotherm is labeled a little way offscreen, and I missed the label on my first reading of the map, so that would be my only criticism.

mm_nytimes.jpg

References. The Journey North Project archives can be found at Annenberg/CPB Learner Online (www.learner.org). Wilkinson’s thorough and analytical book The Grammar of Graphics (Springer, 1999) is also worth a look if you’re interested in programming graph-generating applications.

December 1, 2005

March of the Ratites

The job was a chart that would display the fossil history of the ratites. (Ratites are the giant flightless birds that include the ostrich, emu and the like.) There was one previous attempt: the diagram of bird fossil histories from Unwin (1993). It’s rather daunting, isn’t it?

unwin.gif

The chart shows the geological timescale up the left (from the Maastrichtian in the late Cretaceous through to the Holocene and the present day), and when different groups of birds occur in the fossil record (the vertical lines). The timescale implies the different geological stages are all the same length, which is certainly not the case, but there is no scale in years to make this clear. The bird fossils only actually occupy half the timescale, though the gridlines and boxes continue all the way to the bottom of the page. The legend takes up almost as much room as the data, and actually reading information back off the chart requires constantly referring to the key, making comparison fairly difficult.

unwinall.gif

How could this be improved? For its size, the chart contains very little information; most of it is furniture. For each group, there are only three data points: first appearance, last appearance, and how much of the fossil record is patchy or continuous. I was interested in showing just the ratites (roughly groups 18 to 29), and some of these have no fossil record to speak of, so there was plenty of opportunity to add more information. I wanted to show actual names, localities, a picture of the birds concerned indicating relative sizes, flying ability, whether they were actually ratites or not, and some indication of how many bones had been discovered from each. So I started sketching.

marchsketch.gif

I realized I could combine the bone information with the outlines, and indicate flying ability by showing the wings of flying ratites. At this point I put together a geological timeline, to scale, in Illustrator. I decided to show the actual boundary dates of geological periods as well as a linear timescale.

marchsketch.jpg

I roughly sketched each ratite (in some cases reconstructing them from partial skeletons, in other cases working from Googled photographs). I then photocopied them up to full-page size and traced the outline with magic marker. Each ratite was then scanned into Photoshop, the outlines filled in, and the contrast adjusted so only the black silhouette remained. This was then placed into Illustrator as a template. Illustrator has autotrace tools for converting a scanned image into an outline, but I found it much easier to click and drag my way around the silhouette with the pen tool. The nice thing about working large like this is that when you shrink the graphic back down it looks pretty sharp.

ostrichsketch.jpg

Here’s a version of the final graphic. I simplified the crowd of ratites clustered around the Holocene by noting in the text which groups did not have a significant fossil record. The shades of gray indicate how closely related each species is to modern ratites.

ratiteslines.gif

Where only partial skeletons are known, bones were added to the silhouettes. Where the fossil record becomes continuous, the line for each group becomes solid. I hoped these conventions would be intuitive enough that the chart would pretty much speak for itself.

ratitesbones.gif

One side effect is that I now have a bunch of scalable silhouettes I can use to label other diagrams, phylogenetic trees, and the like, so it was worth the effort.

tree.gif

Reference: Unwin, D.M., 1993. Aves. In Benton, M.J. (ed.), The Fossil Record 2. London: Chapman & Hall. Criticisms of this chart should in no way be seen as a reflection on the expertise of Dave Unwin, a splendid fellow, and not just because he let me hold the Berlin Archaeopteryx.

November 3, 2005

The Density of Deer Heels

This graph shows the density of osteocytes, or bone cells, in different parts of the calcaneum (heel) of a mule deer. Cranial, caudal, medial, and lateral translate as front, back, inside, and outside.

oldcalc.gif

The biggest problem is that the arrangement of the bars doesn't correspond with the parts of the bone in question. Nor are the colors of the bars meaningful; they're only there to make the legend work. The cryptic y-axis label is just number of osteocytes per square millimeter. The three cases where there are significantly more osteocytes are indicated on the graph with an asterisk (*), the conventional indicator of statistical significance. Note that the asterisks are in a different position in each case; one might even think the three black bars are being referred to, but in young fawns it's in fact the white bar.

It's so confusing that the caption takes up nearly half as much space as the graph itself.

oldcalcstext.gif

How to make this more comprehensible? A good goal would be to reduce the amount of interpretation we were requiring of the reader. I noticed the article had a diagram showing cross-sections of the bone in question, from young fawn to adult.

calcoutlines.gif

I scanned them, placed them as a template in Illustrator, then traced around them with the pen tool. For each bone I made a compound path of the inner and outer circles, and sliced it like a pizza with the knife tool into four chunks, each of which could take a different fill.

I noted significant differences right on the bones themselves, and added a color scale, where shade (40% to 80% black) corresponds intuitively to osteocyte density.
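The color scale itself is just a linear mapping from density to a percentage of black. Here is a rough sketch of that idea; the 40% and 80% endpoints come from the graphic, but the function names are hypothetical, the density limits must be supplied by the caller, and I am assuming that denser bone is drawn darker.

```python
def density_to_black(density, low, high, min_black=40.0, max_black=80.0):
    """Map a density onto a percentage of black, clamped to the scale's ends.

    low and high are the densities at the pale and dark ends of the scale;
    they are parameters here because the real limits depend on the data.
    """
    if high <= low:
        raise ValueError("high must be greater than low")
    fraction = (density - low) / (high - low)
    fraction = min(1.0, max(0.0, fraction))
    return min_black + fraction * (max_black - min_black)


def black_to_hex(percent_black):
    """Convert a percentage of black to a gray hex color for screen use."""
    level = round(255 * (1 - percent_black / 100))
    return "#{0:02x}{0:02x}{0:02x}".format(level)


if __name__ == "__main__":
    # Placeholder densities (osteocytes per square millimeter), not measured values.
    for density in (500, 700, 900):
        shade = density_to_black(density, low=500, high=900)
        print(density, shade, black_to_hex(shade))
```

Keeping the scale between 40% and 80% rather than 0% and 100% means that every chunk stays visibly gray and labels placed on the bones remain legible.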

calcs.gif

The revised graph takes up less room and is pretty much self-explanatory. Once the scales and labels had been given, one could even show multiple small versions of the same four bones side by side, each showing osteocyte density under different conditions.

Reference: J. G. Skedros, K. J. Hunt, and R. D. Bloebaum (1995), in the Journal of Morphology 265(2):244-247.