An Experiment in Baseball Data Visualization: Minor Tests for Major Changes

After the reboot of the minor leagues
in 2021 and rule changes in 2022,
what lies ahead for MLB?

In my "day job" at the National Library of Medicine, and in my teaching, I often advocate using data visualization to help make decisions grounded in data, not just in anecdote and intuition. In fields like baseball (and many others), this can be a challenge because decision makers often have many years of experience that they harness, deservedly, when thinking about tactics and strategy.

To be clear, I strongly support using both data and intuition to guide thinking, because over-reliance on either one can lead to bad outcomes. Rely too much on data, and one can fail to account for results that the data can't anticipate. Rely too much on one's own knowledge and intuition, and decisions can be driven by subjective biases or preferences. But using data to inform decisions and insights helps ensure that beliefs and ideas are grounded in something that can stand up to scrutiny.

Case in point: in 2021, the year after the minor league season was lost to the COVID-19 pandemic, a lot of changes came to professional baseball. After years of a loose relationship with the minor leagues, Major League Baseball took greater control of Minor League Baseball. Some of the changes were quite drastic: MLB contracted about forty minor league teams, and many of those that remained shifted their major league affiliations and the level at which they played. The leagues were consolidated into four levels (Low-A, High-A, AA, and AAA), with three leagues at each of the three lower levels and two leagues at AAA.

MLB began to use the minor leagues as a test bed for rule changes, typically designed to speed up games and to provide more scoring and excitement for fans. In 2022, Low-A was slated for three major changes that are now scheduled to be adopted in the major leagues for the 2023 season:

  • Limits on the infield shift, requiring two infielders on each side of second base;
  • Limits on pitchers' pickoff moves to first base; and
  • Institution of a pitch clock.
Now, let's use data visualization to explore the differences between the 2019 season, the last full season under the "old" rules, and the 2022 season. One challenge for this analysis is that there is only one season of experimental data (that is, games played under the new rules). To maximize the number of data points, I analyzed the data for all three Low-A leagues (the Carolina League, California League, and Florida State League), which include 33 teams.

Note, too, that the rule changes arrived alongside the realignment described above. Ideally, we would compare the same set of teams over multiple seasons under both the old and the new rules. Here, though (as is sometimes the case), we need to make do with what we have. Below is the visualization:

It shows four aspects of team performance:
  • Pitching, shown by ERA versus walks and hits per inning pitched (WHIP).
  • Batting, shown by on-base percentage (OBP) versus slugging percentage (SLG).
  • Fielding, shown by errors and double plays per game.
  • Base stealing, shown by team stolen bases by league and year.
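The four panels rest on standard baseball rate statistics. For readers who want to rebuild the axes from raw team totals, here is a minimal Python sketch of the usual textbook definitions. It is a sketch under stated assumptions: the function names are hypothetical, the sample totals are made up, and the published team stats behind the workbook may have been computed slightly differently (for example, in how partial innings are handled).

```python
# Standard rate stats behind the scatterplot axes.
# Function names are hypothetical; inputs are a team's season totals.

def innings_from_box_score(ip: str) -> float:
    """Convert box-score notation ('45.2' = 45 innings + 2 outs) to true innings."""
    whole, _, outs = ip.partition(".")
    return int(whole) + int(outs or "0") / 3


def era(earned_runs: int, innings: float) -> float:
    """Earned run average: earned runs allowed per nine innings."""
    return 9 * earned_runs / innings


def whip(walks: int, hits: int, innings: float) -> float:
    """Walks plus hits allowed per inning pitched."""
    return (walks + hits) / innings


def obp(hits: int, walks: int, hit_by_pitch: int, at_bats: int, sac_flies: int) -> float:
    """On-base percentage: times on base over the plate appearances OBP counts."""
    return (hits + walks + hit_by_pitch) / (at_bats + walks + hit_by_pitch + sac_flies)


def slg(total_bases: int, at_bats: int) -> float:
    """Slugging percentage: total bases per at-bat."""
    return total_bases / at_bats


def per_game(count: int, games: int) -> float:
    """Turn a season total (errors, double plays) into a per-game rate."""
    return count / games


if __name__ == "__main__":
    # Made-up totals for illustration only; not data from the leagues discussed here.
    ip = innings_from_box_score("45.0")
    print(f"ERA  {era(10, ip):.2f}")
    print(f"WHIP {whip(15, 30, ip):.2f}")
    print(f"OBP  {obp(40, 10, 2, 150, 3):.3f}")
    print(f"SLG  {slg(60, 150):.3f}")
    print(f"Errors/game {per_game(12, 10):.2f}")
```

Box scores write partial innings as .1 and .2 (one or two outs), so converting them to thirds before dividing avoids a small but systematic error in ERA and WHIP.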
The visualization is embedded below, and you can also click here to see the full workbook in Tableau Public. Each dot in the scatterplots represents a single team in the 2019 or 2022 season; hover over a dot to see that team's statistics.

The legends also act as filters: clicking a season or league shows only the data for that year or league. So you can explore the data at your own pace, focusing on your favorite team. (Mine is the Fredericksburg Nationals, who played for 15 years as the Potomac Nationals and who, for a few years, were about five minutes from my house.)


What do the data tell us? The results are pretty interesting given what the rule changes were aimed at.
  • Pitching performance declined, with higher ERA and WHIP in 2022, though the regression lines for the two seasons are virtually identical.
  • Offense improved, with higher OBP and SLG in 2022; as with pitching, the regression lines are virtually identical.
  • What I read in this is that the rule changes, probably the pitch clock and the restrictions on the shift in particular, resulted in more batters reaching base. And as we know from Moneyball, getting on base is a key predictor of scoring runs.
  • Fielding and base stealing are both interesting. The average number of stolen bases across all three leagues rose 54 percent, from about 199 to about 307 per season. Errors and double plays per game both rose in 2022, but while 2019 showed a positive relationship between errors and double plays, in 2022 the relationship was inverse, meaning that teams traded double plays for errors without any significant tradeoff in fielding percentage.
In short, pitching got worse, batting correspondingly got better, and the basepaths got more interesting, with more stolen bases, errors, and double plays. With one season under the new rules, it seems reasonable to say that the changes may have increased the excitement of the game while also shortening it: average game length decreased by about 25 minutes.

In a perfect world, of course, we would have more data. Sometimes as analysts, though, we have to make do with what we have. I hope you enjoy exploring these data; please comment below with what you think they mean!
