Critiquing the Visual.ly Divorce Post
Posted: March 13, 2012
Update: Paul Van Slembrouck, the designer of this graphic, has responded to the critique. Be sure to read his comments below!
I am a teaching fellow for a class at Harvard called “The Art of Numbers,” which teaches principles of data presentation to undergraduates from all concentrations. For a recent midterm, students were asked to analyze this graphic from Visual.ly:
For Valentine’s Day, Visual.ly posted a series of visualizations of divorce statistics in the U.S. Several aspects of this graph bothered me, and I thought it would make for a good exam question.
The point of this graph is to show the distribution in education levels of women who got divorced in 2008. The relative proportions are shown as flowers (from Tim Burton’s cottage, apparently) of different sizes.
The class is structured around the design philosophy of Edward Tufte, and in particular his book The Visual Display of Quantitative Information. The graphic above fails several of the design criteria put forth in that book:
Maximize Data Density
There are only 4 numbers behind this graphic. Would the information be any less clear if the four numbers were listed in a table?
Minimize Graphical Ambiguity
Most humans are pretty bad at estimating the relative sizes of 2-dimensional figures (they are even worse with 3 dimensions). This is one reason why pie charts are almost never an optimal representation of data — we are much better at comparing relative lengths than areas. This graph, while not as flashy, is more to the point:
Here the quantity is mapped onto height, making it easier to compare the relative categories.
The number of displayed dimensions should not exceed the number of data dimensions
Most of the graphical elements in Visual.ly’s image are superfluous to the data; that is, they don’t encode any information. What does the height of the flowers mean? The number of leaves? The distance between flowers? Nothing. This extra decoration (Tufte might call it “chart junk”) at best distracts from the data, and at worst misleads the reader into interpreting data that doesn’t exist.
There is a more sinister issue with this graph, one which I believe flirts with violating Visual.ly’s code of ethics. The conclusion drawn from this graph is that “70 percent of divorced women did not finish college … women with higher levels of education wait longer to get married and report higher levels of satisfaction than their less-educated counterparts.”
This conclusion ignores the education distribution for the population as a whole. This is important — were there fewer divorces among the college-educated because such people are less likely to divorce, or simply because there are fewer of them?
To emphasize the problem, consider the category of “women who divorced in 2008 and have won $10,000 or more in the lottery.” There are probably a few such women, and the divorce rate within this category is (obviously) 100%. However, they would represent a vanishingly small flower on the plot, since very few women overall fall into this category. Would Visual.ly conclude from this information that lottery winners have low divorce rates?
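The lottery example can be made concrete with a minimal Python sketch. The counts below are invented purely for illustration (they come from neither Visual.ly nor any real data set), and the function name `share_and_rate` is hypothetical. The sketch shows how a group can have a 100% divorce rate and still account for a vanishing share of all divorces.

```python
def share_and_rate(divorced_in_group, total_in_group, total_divorced):
    """Return (share of all divorces, divorce rate within the group)."""
    share = divorced_in_group / total_divorced   # size of the "flower"
    rate = divorced_in_group / total_in_group    # what we actually care about
    return share, rate


# ASSUMPTION: made-up counts, chosen only to illustrate the argument.
total_divorced = 1_000_000   # hypothetical number of women divorced in 2008
lottery_divorced = 50        # hypothetical divorced lottery winners
lottery_total = 50           # by definition, everyone in this category divorced

share, rate = share_and_rate(lottery_divorced, lottery_total, total_divorced)
print(f"share of all divorces: {share:.5%}")
print(f"divorce rate in group: {rate:.0%}")
```

The share is the only quantity the flower graphic can show; the rate requires knowing how many people are in the category to begin with.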
Looking at the distribution of education for all Americans over 25, we get the following breakdown (note that the Associate’s degree seems to correspond to the “some college education” category in Visual.ly’s graphic):
This is nearly identical to the first plot. Let’s plot the two data sets against each other:
Only after we have both of these pieces of information can we calculate the divorce rate within each category. The most obvious point is that there are disproportionally more divorced women in the “some college education” category compared to the general population. However, people with college degrees are essentially no more or less likely to divorce than people with no high school diploma (in fact, the numbers imply women without high school degrees had a slightly lower divorce rate in 2008 than the other categories). This crucial extra information undermines the conclusion of the first graph. Even if it didn’t, however, the caption under the first graph has no business drawing the conclusions it does, given the data it isn’t taking into account.
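The calculation described above can be sketched in a few lines of Python. By Bayes' rule, P(divorce | category) = P(category | divorce) × P(divorce) / P(category). The overall divorce rate P(divorce) is the same for every category, so dividing each category's share of divorces by its share of the population gives its divorce rate relative to the average. The shares below are invented placeholders, not the figures from either chart, and the function name `relative_divorce_rates` is hypothetical.

```python
def relative_divorce_rates(share_of_divorces, share_of_population):
    """Divorce rate of each category relative to the overall rate.

    A value above 1 means the category is over-represented among
    divorces; a value below 1 means it is under-represented.
    """
    return {
        category: share_of_divorces[category] / share_of_population[category]
        for category in share_of_divorces
    }


# ASSUMPTION: placeholder shares for illustration only. They are NOT the
# numbers behind the Visual.ly graphic or the population plot.
share_of_divorces = {
    "no high school": 0.12,
    "high school": 0.28,
    "some college": 0.32,
    "college degree": 0.28,
}
share_of_population = {
    "no high school": 0.14,
    "high school": 0.30,
    "some college": 0.26,
    "college degree": 0.30,
}

rates = relative_divorce_rates(share_of_divorces, share_of_population)
for category, ratio in rates.items():
    print(f"{category:>15}: {ratio:.2f}x the overall rate")
```

With only the first dictionary, which is all the flower graphic provides, none of these ratios can be computed.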
Part of the “Art of Numbers” class involves students collecting and critiquing visualizations they find on the web. Visual.ly is a frequent source of such graphics, and the class is divided about the website. Some take the Tufte-esque view that data graphics should not be decorated any more than what is required to display the data. Others think that graphics like those on Visual.ly play an important role, in that “pretty” charts grab people’s attention and reach a wider audience (of course, a great graphic should do both!).
I appreciate the tension between aesthetic appeal and information content, but I also think that Visual.ly is sacrificing clarity and efficiency in the name of graphical decoration. And regardless of aesthetic issues, it’s never acceptable to over-interpret what the data actually say.