Critiquing the Visual.ly Divorce Post

Update: Paul Van Slembrouck, the designer of this graphic, has responded to the critique. Be sure to read his comments below!

I am a teaching fellow for a class at Harvard called “The Art of Numbers,” which teaches principles of data presentation to undergraduates from all concentrations. For a recent midterm, students were asked to analyze this graphic from Visual.ly:

Distribution of education levels for women who divorced in 2008

For Valentine’s Day, Visual.ly posted a series of visualizations of divorce statistics in the U.S. Several aspects of this graph bothered me, and I thought it would make for a good exam question.

The point of this graph is to show the distribution in education levels of women who got divorced in 2008. The relative proportions are shown as flowers (from Tim Burton’s cottage, apparently) of different sizes.

Visual Representation

The class is structured around the design philosophy of Edward Tufte, and in particular his book The Visual Display of Quantitative Information. The graphic above fails several of the design criteria put forth in that book:

Maximize Data Density

There are only 4 numbers behind this graphic. Would the information be any less clear if the four numbers were listed in a table?

Minimize Graphical Ambiguity

Most humans are pretty bad at estimating the relative sizes of 2-dimensional figures (they are even worse with 3 dimensions). This is one reason why pie charts are almost never an optimal representation of data — we are much better at comparing relative lengths than areas. This graph, while not as flashy, is more to the point:

Here the quantity is mapped onto height, making it easier to compare the relative categories.
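
To make the area-versus-length point concrete, here is a minimal Python sketch. The two values (40 and 10) are invented for illustration and are not the numbers behind Visual.ly's graphic; the function names are hypothetical too. It assumes each value is drawn as a circle whose area is proportional to the value, and compares the visible width ratio with the ratio a bar chart would show.

```python
import math


def radius_for_area(area):
    """Radius of a circle with the given area (area = pi * r^2)."""
    return math.sqrt(area / math.pi)


def apparent_width_ratio(value_a, value_b):
    """Ratio of circle widths when each value is encoded as circle area."""
    return radius_for_area(value_a) / radius_for_area(value_b)


def bar_height_ratio(value_a, value_b):
    """Ratio of bar heights when each value is encoded as bar length."""
    return value_a / value_b


if __name__ == "__main__":
    # Hypothetical values, chosen only to make the arithmetic easy to follow.
    big, small = 40.0, 10.0
    print("Area encoding, width ratio:", apparent_width_ratio(big, small))
    print("Length encoding, height ratio:", bar_height_ratio(big, small))
```

With area encoding, a fourfold difference in the data shows up as only a twofold difference in width, which is one way to see why readers tend to misjudge areas while bar heights can be compared directly.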

The number of displayed dimensions should not exceed the number of data dimensions

Most of the graphical elements in Visual.ly’s image are superfluous to the data; that is, they don’t encode any information. What does the height of the flowers mean? The number of leaves? The distance between flowers? Nothing. This extra decoration (Tufte might call it “chart junk”) at best distracts from the data, and at worst misleads the reader into interpreting data that doesn’t exist.


Statistical Interpretation

There is a more sinister issue with this graph, one which I believe flirts with violating Visual.ly’s code of ethics. The conclusion drawn from this graph is that “70 percent of divorced women did not finish college …women with higher levels of education wait longer to get married and report higher levels of satisfaction than their less-educated counterparts.”

This conclusion ignores the education distribution for the population as a whole. This is important — were there fewer divorces among the college-educated because such people are less likely to divorce, or simply because there are fewer of them?

To emphasize the problem, consider the category of “women who divorced in 2008 and have won $10,000 or more in the lottery.” There are probably a few such women, and the divorce rate within this category is (obviously) 100%. However, they would represent a vanishingly small flower on the plot, since very few women overall fall into this category. Might Visual.ly conclude from this information that lottery winners have low divorce rates?

Looking at the distribution of education for all Americans over 25, we get the following breakdown (note that the Associate’s degree seems to correspond to the “some college education” category in Visual.ly’s graphic):

Distribution of education for US adults as a whole. Look familiar?

This is nearly identical to the first plot. Let’s plot the two data sets against each other:

Only after we have both of these pieces of information can we calculate the divorce rate within each category. The most obvious point is that there are disproportionately more divorced women in the “some college education” category compared to the general population. However, people with college degrees are essentially no more or less likely to divorce than people with no high school diploma (in fact, the numbers imply women without high school diplomas had a slightly lower divorce rate in 2008 than the other categories). This crucial extra information undermines the conclusion of the first graph. Even if it didn’t, however, the caption under the first graph has no business drawing the conclusions it does, given the data it isn’t taking into account.
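
The calculation described above can be sketched in a few lines of Python. This is only an illustration: the shares below are invented placeholders, not the values from either plot, and the names are hypothetical. The idea is that dividing a category's share among divorced women by its share of the general population gives that category's divorce rate relative to the overall rate. A ratio above 1 means the category is over-represented among the divorced, and a ratio below 1 means it is under-represented. The sketch also assumes, as the comparison in this post does, that the education mix of all adults over 25 is a fair stand-in for that of women.

```python
def relative_divorce_rates(divorced_share, population_share):
    """Return each category's divorce rate relative to the overall rate.

    Both arguments map category name -> fraction of the group (summing to 1).
    A result of 1.0 means the category divorces at exactly the overall rate.
    """
    rates = {}
    for category, share_among_divorced in divorced_share.items():
        rates[category] = share_among_divorced / population_share[category]
    return rates


# Made-up shares for illustration only; NOT the numbers from the graphics.
HYPOTHETICAL_DIVORCED = {
    "no high school diploma": 0.10,
    "high school diploma": 0.30,
    "some college": 0.35,
    "college degree": 0.25,
}
HYPOTHETICAL_POPULATION = {
    "no high school diploma": 0.12,
    "high school diploma": 0.32,
    "some college": 0.26,
    "college degree": 0.30,
}

if __name__ == "__main__":
    rates = relative_divorce_rates(HYPOTHETICAL_DIVORCED, HYPOTHETICAL_POPULATION)
    for category, ratio in rates.items():
        print(f"{category}: {ratio:.2f}x the overall divorce rate")
```

Note that the first distribution alone cannot produce these ratios; without the population shares in the denominator, a small flower could mean a low divorce rate or simply a small group.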

Takeaway

Part of the “Art of Numbers” class involves students collecting and critiquing visualizations they find on the web. Visual.ly is a frequent source of such graphics, and the class is divided about the website. Some take the Tufte-esque view that data graphics should not be decorated any more than what is required to display the data. Others think that graphics like those on Visual.ly play an important role, in that “pretty” charts grab people’s attention and reach a wider audience (of course, a great graphic should do both!).

I appreciate the tension between aesthetic appeal and information content, but I also think that Visual.ly is sacrificing clarity and efficiency in the name of graphical decoration. And regardless of aesthetic issues, it’s never acceptable to over-interpret what the data actually say.

3 Comments on “Critiquing the Visual.ly Divorce Post”

  1. Dan says:

    I think this is a wonderful point and that most people do not treat graphs with the respect (at least I think) they deserve.

  2. From the designer of the Visual.ly “Divorce” infographic:

    Appreciate the critique and I’m pleased that you picked up on the Tim Burton reference.

    You are correct that the number of leaves on the flowers, the shapes of the stems, and the distance between the flowers do not encode information. The percentage values are encoded in the surface areas of the flowers.

    I understand that you structure your class to reflect design principles promoted by Tufte, which I believe is a set of guidelines useful for producing visualization for certain audiences (various academics, professionals, experts, for example), and less useful for producing visualizations that have a broader appeal, for the casual reader.

    On the matter of chart junk, I believe that the best design incorporates both accurate and precise encoding of data AND beautiful aesthetic elements and emotionally compelling concepts. The best visualization not only communicates clearly, but is memorable (http://eagereyes.org/blog/2011/want-to-make-chart-memorable-add-junk). Not all chart junk is equal, however. I’d argue that the junk in this chart is “Harmless Junk” (http://eagereyes.org/blog/2012/three-types-chart-junk).

    You state that “This extra decoration at best distracts from the data, and at worst misleads the reader into interpreting data that doesn’t exist”. Instead, I suggest that at best, the extra decoration makes the content enjoyable and memorable, and at worst reduces the clarity of the raw data without added benefit.

    On the more pressing issue of drawing conclusions:

    1) Divorce versus Education:

    The chart represents the educational attainment of women divorced in 2008 and highlights the observation that 74% of women divorced in 2008 did not have college degrees. The observation is accurate—not a misinterpretation of the data.

    The problem arises when considering the expectations of the reader. This piece opens with the goal of debunking the commonly held notion of a universal 50% divorce rate, and the best way to do this would be to show the divorce rates within as many demographic subsets of the population as possible. The opening question primes the reader to expect the sort of information that starts with a group of people and then shows the divorce rate. In that regard, this chart is mismatched with the desired narrative: it starts with a group of divorced women and then quantifies a demographic attribute, which is the reverse of the chart I would have liked to include–a result of there not being a lot of good data available on this topic.

    In light of reader expectations, the observed 74% implies that divorce is much more prevalent among those without bachelor’s degrees than among those with degrees. While the implication is true, I could have done a better job clarifying that it’s not an accurate interpretation of this data.

    Research does show that divorce is lowest, and declining, among the highest educated segment:

    “From the 1970s to the 1990s, divorce or separation within the first 10 years of marriage became less likely for the highly educated (15 percent down to 11 percent), somewhat more likely for the moderately educated (36 up to 37 percent), and less likely for the least educated (46 down to 36 percent).” (http://stateofourunions.org/2010/SOOU2010.pdf)

    2) Age of first marriage versus education; marital satisfaction versus education:

    Here, I think the issue is that the inclusion of additional commentary (“Statistically, women with higher education wait longer to get married and report higher levels of satisfaction than their less-educated counterparts”) is being misconstrued as an observation drawn from the chart, which it is not. The chart pertains to education. This statement is about age and satisfaction and came from reading the material cited at the bottom of the graphic. In this case, we could have done more to clarify that this information came from elsewhere in the quoted source and is not a conclusion drawn from the chart itself. But it remains a true statement, at least within the cited academic research.

    I would welcome any other questions or comments you may have about the design.

    Paul Van Slembrouck
    Staff Editor / Designer
    Visual.ly

    • Paul,

      Thanks for chiming in! You summarized better than I can the aesthetic philosophy behind Visual.ly and similar groups. I’ll be the first to admit that these kinds of infographics are — at least on first exposure — more “emotionally compelling” and palatable to a broad audience. Most people’s eyes glaze over when they first see a bar chart!

      But I do think this issue runs deeper than aesthetics. I believe (as I suspect do the designers at visual.ly) that the point of data visualization is 1) to communicate the stories behind data, and 2) to expose how data ought to inform how we think about the world.

      It sounds like we agree that the commentary related to this graph (that the highly educated are more immune to divorce) is not supported by the data alone. These conclusions may be true (i.e. supported by OTHER data), but they are not what THESE data say. The reason why this is true is somewhat subtle, and data visualization should function to illuminate and explain these subtleties. Even if your conclusion isn’t false, associating the conclusion with the data strikes me as either dishonest or misleading. Also, as I discuss, these data DO have a story, and it has some twists (like the apparent low divorce rate among the least educated, which deserves some thought given the sources you quote above).

      The greatest power behind beautiful data displays (which yours certainly is!) is that they encourage readers to spend more time thinking about data. This is best exploited when it is used to communicate interesting truths about complex data sets (otherwise, why use data in data art at all?). Unfortunately, this data set seems to be attached to the wrong narrative.

