Negro Leagues as Major Leagues: The Data Tell a Story



Baseball took some strides in 2021 towards racial equality in its use of statistics. It redefined the Negro Leagues as a "major league," placing their statistics at the same level as those of the National and American Leagues. ("Negro Leagues" is the period term for the alternative baseball leagues that operated between roughly 1920 and 1960; "Negro" is now an outdated way of referring to Black people.) These leagues featured Black players, and often Black ownership and management, at a time when the rest of professional baseball was White-only, as it remained until 1947.

The Society for American Baseball Research and Baseball Reference have recently released a book of essays celebrating the incorporation of Negro League statistics, The Negro Leagues are Major Leagues. The essays are all well conceived and well written; the highlights for me are former Oriole Adam Jones's reminiscence, Leslie Heaphy's essay about women in the Negro Leagues, and Michael Lomax's piece about the emergence of the Negro Leagues and the entrepreneurship that enabled them to sustain themselves for as long as they did, in some cases beyond the integration of the American and National Leagues.

As a professor of data visualization and an advocate for it in my professional life, I often think about how good visualizations can convey a teaching point more strongly than the data alone. It is also becoming increasingly apparent that data collection carries inherent biases that analysts, custodians, and visualizers of data have to be aware of.

It goes without saying that even if we are not responsible for data collection biases, systemic racism and sexism have existed in every American institution--including baseball. We are often told that Negro League statistics were not well kept because teams tended to play a lot of exhibitions and barnstorming games. However, as Catherine D'Ignazio and Lauren Klein explain in Data Feminism, data collection often embodies power (or lack of it), and the lack of coverage and recordkeeping for Negro League teams is an outcome of their status as outside of the centers of power in sports.

That leaves us with a nice opportunity to use visualization to answer some fundamental questions about the Negro Leagues. From a competitive perspective, the most essential one is whether Negro League play was comparable to that in the White major leagues. Todd Peterson's essay, "Negro Leagues=Major Leagues", broadly compares average aggregate offensive statistics (batting average, on-base percentage, slugging percentage, and on-base plus slugging percentage) and pitching statistics (WHIP and strikeouts per nine innings) for both sets of leagues across the 1920-1948 period. Except for K/9, where MLB outperformed NLB by 24.2 percent, the statistics showed MLB outperforming NLB by only 0.8 percent (for WHIP) to 4.5 percent (for SLG).
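To make "outperformed by X percent" concrete, here is a minimal sketch of one way such a gap could be computed. It is not necessarily the method Peterson used: the function name `pct_gap`, the choice of the NLB figure as the baseline, and the sample numbers are all assumptions made for illustration. Note that for WHIP a lower number is better, so the direction of the comparison has to flip.

```python
def pct_gap(mlb: float, nlb: float, lower_is_better: bool = False) -> float:
    """Percent by which the MLB figure outperforms the NLB figure.

    Assumption: the NLB value is the baseline (denominator).
    Positive result: MLB was better. Negative result: NLB was better.
    """
    if nlb == 0:
        raise ValueError("NLB baseline must be non-zero")
    # For stats like WHIP or ERA, a smaller number is the better one.
    gap = (nlb - mlb) if lower_is_better else (mlb - nlb)
    return 100.0 * gap / nlb


if __name__ == "__main__":
    # Made-up league averages, NOT taken from the essay or from Stathead.
    print(round(pct_gap(0.400, 0.385), 1))                      # a slugging-style stat
    print(round(pct_gap(1.40, 1.42, lower_is_better=True), 1))  # a WHIP-style stat
```

Which league serves as the baseline changes the percentage slightly, so any comparison of this kind should state its denominator.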

As an analyst, though, my preference is to analyze more data rather than less, because doing so can tell us whether our analysis leads to true insights borne out by the statistics, or whether we can only make directional assumptions about the data. To do this, I downloaded data from Stathead, which collects statistics for all major league seasons. My database included season batting statistics for all MLB and NLB teams from 1920 to 1948: a total of 810 rows of data and 27,540 individual data points. I grouped the data into two categories: the White major leagues (in red), comprising the National and American Leagues; and the Black major leagues (in blue), comprising the Eastern Colored League (ECL), the American Negro League (ANL), the Negro American League (NAL), the two iterations of the Negro National League (NNL and NN2), the East-West League (EWL), and the Negro Southern League (NSL).
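The grouping step can be sketched in a few lines. The Negro League abbreviations come from the list above; the codes "AL" and "NL" for the American and National Leagues, and the helper names `league_group` and `columns_per_row`, are assumptions for illustration and may not match the labels in the Stathead export.

```python
# League abbreviations as listed in the text.
BLACK_LEAGUES = {"ECL", "ANL", "NAL", "NNL", "NN2", "EWL", "NSL"}
# Assumed codes for the American and National Leagues.
WHITE_LEAGUES = {"AL", "NL"}


def league_group(code: str) -> str:
    """Map a league abbreviation to one of the two dashboard categories."""
    code = code.strip().upper()
    if code in BLACK_LEAGUES:
        return "Black major leagues"
    if code in WHITE_LEAGUES:
        return "White major leagues"
    raise ValueError(f"Unrecognized league code: {code!r}")


def columns_per_row(rows: int, data_points: int) -> int:
    """Sanity check: data points should divide evenly into rows."""
    cols, remainder = divmod(data_points, rows)
    if remainder:
        raise ValueError("data points do not divide evenly into rows")
    return cols


if __name__ == "__main__":
    print(league_group("NN2"))
    print(columns_per_row(810, 27540))  # fields per team-season row
```

The second helper is just a consistency check on the figures quoted above: 27,540 data points over 810 rows works out to 34 fields per team-season.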

The resulting dashboard is posted as a static picture at the top of this post, but the best way to examine it is through the interactive version on Tableau Public, which lets you mouse over and interact with specific data points. Across the dashboard, the blue elements represent the seven Negro leagues, while the red elements represent the White American and National Leagues. The dashboard offers several visual answers to the questions about Negro League play:

  • Winning percentage: Figure 1 shows the overall winning percentage, across all 28 years, to have been statistically the same across eight of the nine leagues. (The exception is the East-West League, whose winning percentage falls outside the grey band that denotes the 95 percent confidence interval.) Figure 2 tells a slightly different story, looking year by year at the aggregate winning percentage for the Black and White leagues. The winning percentage is much more volatile for the Black leagues, likely because of the ebb and flow of the leagues across the 28 years covered by this analysis.
  • Competitive balance: Figure 3 shows pennant winners for each league from 1920 through 1948. Save for the East-West League, which operated for only one season, each league features multiple teams that won pennants, suggesting that no single team dominated the competition.
  • Batting statistics: To visualize the quality of offensive play, Figure 4 plots on-base percentage (the share of plate appearances in which the batter reached base on a hit, a walk, or a hit by pitch; reaching on an error does not count) against slugging percentage (total bases per at-bat). The figures are averaged by team and season.
  • Pitching statistics (Newly updated): I updated the visualization today with an eye towards submitting it to Tableau's IronViz competition. (This is an annual contest where the three finalists compete in a live final for a $10,000 prize. It's a good chance to get your visualization noticed, but the deadline to submit is tomorrow, February 7th, at midnight Pacific time.) What impresses me about these data, in Figure 5, is that although Negro League pitching had a higher ERA, it also had a higher strikeout-to-walk ratio, suggesting a different approach to pitching and a unique brand of play.
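For readers who want the arithmetic behind Figures 4 and 5, the sketch below uses the standard modern definitions of on-base percentage, slugging percentage, ERA, and strikeout-to-walk ratio. The class and function names are hypothetical, and whether the Stathead export includes hit-by-pitch and sacrifice-fly counts for every 1920-1948 season is an assumption; both default to zero here when missing.

```python
from dataclasses import dataclass


@dataclass
class TeamBatting:
    """Season batting totals for one team (field names are illustrative)."""
    ab: int        # at-bats
    h: int         # hits
    doubles: int
    triples: int
    hr: int        # home runs
    bb: int        # walks
    hbp: int = 0   # hit by pitch (assumed 0 if not recorded)
    sf: int = 0    # sacrifice flies (assumed 0 if not recorded)


def on_base_pct(t: TeamBatting) -> float:
    """OBP = (H + BB + HBP) / (AB + BB + HBP + SF)."""
    denom = t.ab + t.bb + t.hbp + t.sf
    return (t.h + t.bb + t.hbp) / denom if denom else 0.0


def slugging_pct(t: TeamBatting) -> float:
    """SLG = total bases / AB, with total bases = H + 2B + 2*3B + 3*HR."""
    total_bases = t.h + t.doubles + 2 * t.triples + 3 * t.hr
    return total_bases / t.ab if t.ab else 0.0


def era(earned_runs: int, innings_pitched: float) -> float:
    """ERA = 9 * ER / IP. Expects true fractional innings (e.g. 450.333...),
    not the box-score shorthand in which .1 and .2 mean thirds of an inning."""
    return 9.0 * earned_runs / innings_pitched


def strikeout_to_walk(strikeouts: int, walks: int) -> float:
    """K/BB = strikeouts divided by walks."""
    return strikeouts / walks


if __name__ == "__main__":
    # Made-up totals for illustration only.
    team = TeamBatting(ab=100, h=30, doubles=5, triples=1, hr=4, bb=10)
    print(round(on_base_pct(team), 3), round(slugging_pct(team), 3))
    print(era(50, 450.0), strikeout_to_walk(300, 150))
```

Averaging these per team and season, as the dashboard does, weights every team-season equally regardless of how many games that team played, which matters for leagues with short or uneven schedules.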

One of the points a presenter made at this weekend's Data Visualization Society Outlier Conference was that data visualization simplifies and clarifies data: the raw information remains available for analysis, while the essential points are preserved for a broader audience.

That's why I think this visualization works. By examining all the data from 1920 through 1948 for all teams, we can say that the competitive level for the Negro Leagues and the White major leagues was very similar, if not identical. Contemporary accounts show how Major League executives like Clark Griffith and Branch Rickey treated Negro League teams, often dismissing them as "unorganized" and refusing to compensate them for signing their players in the late 1940s. Now that the Negro Leagues are officially major leagues, it's important to give them credit for achieving what they did.
