Not All Data Visualization is Complicated

There's a tension in data visualization between two goals. One is making images that are beautiful: often visually complex, packed with information that the audience can uncover just by moving a cursor over it. The other is presenting data and information in ways that are simple and clear. Those can be beautiful as well, but they balance aesthetics against clarity. The Information is Beautiful Awards have just released their "long list" of submissions (full disclosure: I submitted an entry, but it didn't make the cut), and the entries are extraordinary. You can see the ones that made the list on the awards' website.

It says a great deal about the technological advances of the last twenty years that we can so easily create visualizations that hold so much data. People can interact with them in many interesting ways, with nothing more than an Internet connection. But the more I talk with fellow SABR members and with people interested in baseball, history, analysis, and the intersection of the three, the more I hear skepticism. Some doubt data analysis, and some doubt the importance of visualization generally. For some, it's discomfort with unfamiliar computer applications; for others, it's a strong feeling that stories and strategies, not numbers, drive knowledge.

In any case, while I often work in Tableau for data analysis, it's important to be "tool agnostic" in this or any discipline. The principles of what works and what doesn't are independent of the tools one chooses, and there are four that fall under an important principle: accessibility.

In the data visualization class that I teach at MICA, we often work with two fundamental principles of data visualization in a world where all of us are constantly bombarded by a huge volume of visual information. 

The first is that the way we communicate visually has to balance simplicity, clarity, and beauty. A graphic that is aesthetically beautiful may be so complex that audiences can't easily understand its meaning. (One rule of thumb is the "seven-second test": if a viewer can't fully comprehend a graphic within seven seconds, it's too complicated.) On the other hand, a graphic that is too simple can insult the audience's intelligence or fail to be compelling enough to draw its attention.

This leads to the second principle: as data visualizers, we often make tradeoffs between beauty and clarity. Thinking about these tradeoffs led me to the most exclusive club in professional baseball: the six players who have hit more than seven hundred home runs in their careers.

The 700 Home Run Club

The most exclusive achievement in major league baseball is hitting seven hundred home runs in a career. It's more than four times as rare as a perfect game: 24 pitchers have thrown a perfect game, but only four players in the U.S. major leagues have reached 700 home runs. Two players are not included in that count. Josh Gibson may have hit as many as 800 but is officially credited with only 165. Sadaharu Oh hit 868 home runs in Japan's Central League. I've also included Roger Connor, whose 138 career home runs were the record until Babe Ruth broke it.



How I Made It

I wanted to show how rare and notable the achievement is, in a way that was as accessible as possible and not limited by language or computer skills. I started by thinking about how to visualize these seven home run totals. I wanted bright, bold (but not excessively so) colors. I wanted something easy to understand, especially for people unfamiliar with baseball. And I wanted something understandable without needing to interact with an online tool. I thought of two models. One was W.E.B. Du Bois's data visualizations, which predated online tools by more than a century. The other was Kenneth Noland's bullseye paintings.

I made two sketches with colored pencils, both playing with the idea of showing the number of home runs as the areas of different colored circles:



I chose the version on the right because it shows the relative sizes of the home run totals, from Roger Connor's to Sadaharu Oh's, very well. (Note that there are two bands for Josh Gibson. One represents his official home run total of 165, and the other represents the possibility that he hit as many as 800 during his career.)

Taking this visualization out of Tableau and onto canvas meant that I had to calculate three sets of values in a Google Sheet:

  • The relative area for each player's home run total, as a multiple of Roger Connor's total;
  • The relative radius for each player (using the formula for the area of a circle, A = πr²); and
  • Factors for multiples of 2, 3, and 4 times that radius, so that all of the circles would fit on a 12-inch by 12-inch canvas.
I also created a timeline of the 125 years from 1897 to 2022, scaled to a ten-inch line on the canvas, to show when each player ended his career.
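A short Python sketch can make those sheet columns concrete. It is an approximation under stated assumptions, not the exact spreadsheet. It assumes each circle's area is proportional to a player's total, measured in multiples of Connor's 138. It inverts A = πr² to get the radius. The sheet used multiples of 2, 3, and 4 times the radius, but this sketch swaps in a single scale factor so the largest circle (Oh's 868) just fits the 12-inch canvas. The post doesn't list Hank Aaron's 755 or Barry Bonds's 762; those are their MLB career totals. The helper name `circle_radii` is hypothetical.

```python
import math

# Career home run totals. Aaron (755) and Bonds (762) are their MLB career
# totals; the post itself doesn't list them.
HOME_RUNS = {
    "Roger Connor": 138,
    "Josh Gibson (official)": 165,
    "Albert Pujols": 703,
    "Babe Ruth": 714,
    "Hank Aaron": 755,
    "Barry Bonds": 762,
    "Josh Gibson (possible)": 800,
    "Sadaharu Oh": 868,
}

BASELINE = HOME_RUNS["Roger Connor"]  # every area is a multiple of Connor's total
CANVAS_INCHES = 12.0                  # square canvas, 12 in by 12 in


def circle_radii(totals, canvas=CANVAS_INCHES):
    """Hypothetical helper: radius in inches per player, area proportional to total."""
    # Column 1: relative area, as a multiple of Connor's 138.
    rel_area = {name: hr / BASELINE for name, hr in totals.items()}
    # Column 2: relative radius, from A = pi * r**2, so r = sqrt(A / pi).
    rel_radius = {name: math.sqrt(a / math.pi) for name, a in rel_area.items()}
    # Column 3 (simplified): one scale factor so the largest diameter equals the canvas width.
    scale = (canvas / 2) / max(rel_radius.values())
    return {name: r * scale for name, r in rel_radius.items()}


if __name__ == "__main__":
    for name, r in circle_radii(HOME_RUNS).items():
        print(f"{name:<24} radius {r:5.2f} in")
```

Area tracks the totals, not radius. So Oh's circle is only about 2.5 times as wide as Connor's, even though his total is more than six times larger.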

Here's what those calculations look like:


Next, I made a pencil sketch, using a compass with a pencil point to lay out the circles from the center of the canvas.


With the layout done, I added a neutral wash and began painting.

To make a diluted neutral wash, I water down a mix of complementary colors, meaning colors on opposite sides of the color wheel, such as greens and reds. The wash makes it easier to add color and to tell the painting apart from the white field of the gessoed canvas. It's a practice I learned in high school art class and have kept in my painting ever since.

This shows the painting taking shape. I like to use very vibrant colors, but I try to keep in mind that reds and greens are often hard to tell apart for people with color blindness.

Once I finished the "bullseye" painting, I needed to add a sense of scale to show when these home run totals were set. I used the calculations above to work out where on the ten-inch timeline each player's final season should fall.
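The timeline is a linear scale: 125 years across ten inches, or 0.08 inch per year. Below is a minimal Python sketch that places each player at his final season. The dates for Connor (1897), Ruth (1935), and Pujols (2022) follow from this post. The other final seasons are the players' well-known last years. The helper name `timeline_position` is hypothetical.

```python
START_YEAR = 1897   # Roger Connor's final season, the left end of the line
END_YEAR = 2022     # Albert Pujols's final season, the right end of the line
LINE_INCHES = 10.0  # length of the timeline on the canvas

# Final seasons; Gibson, Aaron, Oh, and Bonds are not dated in the post itself.
FINAL_SEASONS = {
    "Roger Connor": 1897,
    "Babe Ruth": 1935,
    "Josh Gibson": 1946,
    "Hank Aaron": 1976,
    "Sadaharu Oh": 1980,
    "Barry Bonds": 2007,
    "Albert Pujols": 2022,
}


def timeline_position(year, start=START_YEAR, end=END_YEAR, length=LINE_INCHES):
    """Hypothetical helper: inches from the left end of the line for a given year."""
    if not start <= year <= end:
        raise ValueError(f"{year} is outside {start}-{end}")
    # Linear interpolation: the fraction of the span elapsed, times the line length.
    return (year - start) / (end - start) * length


if __name__ == "__main__":
    for name, year in FINAL_SEASONS.items():
        print(f"{name:<14} {year}  {timeline_position(year):5.2f} in")
```

On this scale, Ruth's final season lands about three inches along the line. The 87 years after it take up the remaining seven.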

The end product is below. Two notable (and related) points jump out in the final painting. First, the difference between the highest total in the 700 club (Oh's 868 home runs) and the lowest (Albert Pujols's 703, reached this year) is only 165 home runs. Second, Babe Ruth finished his career thirty-eight years after Roger Connor with more than five times Connor's total. Over the 87 years since, the American major league record has increased by only 48 home runs, and the overall record by 154.
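A few lines of Python can check the figures in those two observations. Bonds's 762 is implied by the 48-home-run increase over Ruth's 714. Ruth's final season of 1935 is implied by the thirty-eight years after Connor's 1897. The helper name `record_gaps` is hypothetical.

```python
# Totals and years used in the two observations about the final painting.
CONNOR, PUJOLS, RUTH, BONDS, OH = 138, 703, 714, 762, 868
CONNOR_LAST, RUTH_LAST, THIS_YEAR = 1897, 1935, 2022


def record_gaps():
    """Hypothetical helper returning the differences and ratios quoted in the text."""
    return {
        "oh_minus_pujols": OH - PUJOLS,                  # spread of the 700 club
        "ruth_over_connor": round(RUTH / CONNOR, 2),     # Ruth's total as a multiple of Connor's
        "years_connor_to_ruth": RUTH_LAST - CONNOR_LAST,
        "years_since_ruth": THIS_YEAR - RUTH_LAST,
        "mlb_gain_since_ruth": BONDS - RUTH,             # American major league record growth
        "overall_gain_since_ruth": OH - RUTH,            # overall record growth
    }


if __name__ == "__main__":
    for label, value in record_gaps().items():
        print(f"{label}: {value}")
```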


At the end of this exercise in visualizing the top home run hitters at the highest levels of professional baseball, what pleases me most is that the visual includes players from outside the traditional American major leagues. Josh Gibson represents the Negro Leagues, whose players were long excluded from the majors. Sadaharu Oh represents Japan's major leagues, which have been an important source of talent for the American majors. It also makes an important point: even in a technologically oriented age, there is a place for visuals that don't require a computer to look at, appreciate, and use to spur thinking.

You'll also have a chance to see this painting in person at the Maryland Institute's faculty exhibition in Baltimore, Maryland, later this fall. More details to come!







