In addition to my card paintings, I enjoy working with baseball data to visualize trends within a season or over time. It relates to what I do professionally: I lead a team of researchers and designers at the National Library of Medicine, where we analyze and visualize how effectively our digital products help people find the medical and health information they need.
While data visualization for its own sake is fun, it is most relevant when it's applied to answer questions, whether in business or research. So as Major League Baseball teams have started spring training, I've come to wonder how unusual the 2020 season really was. Of course, it was much shorter -- 60 games compared to the usual 162, with games played in empty stadiums and, in the case of the playoffs, at neutral sites. But was performance really that different?
With the SABR Analytics Conference coming up next week, attendees got a free month of Stathead, the statistical service of Baseball Reference. (Thanks, Stathead!) To take advantage of the opportunity, I downloaded data for the 2017 through 2020 seasons to see if there are statistical differences between 2020 and the previous three seasons, and analyzed them in Tableau Public. All of these visualizations are on my Tableau Public page, where you can view them interactively. You can also follow my page so you will be notified when I post more data.
UPDATE: This morning, I ran another analysis looking at the years 2017 through 2020 separately. What I found confirms the initial analysis: although the 2020 season was only 60 games compared to the usual 162, offensive performances weren't proportionally different from the previous three seasons. Every dot on these two scatterplots represents a single season by a single player. Let's take a closer look:
As in the first analysis, the 2020 offensive statistics are bunched in the lower left-hand corner, which makes sense: with fewer games, players had fewer chances to accumulate hits and home runs. But look at the trend lines (light blue for 2020; dark blue, orange, and red for 2017, 2018, and 2019 respectively). The slopes of the lines for 2018-2020 are all fairly similar, meaning a similar proportion of home runs to overall hits. It actually looks like 2017 is the outlier, with fewer home runs as a share of all hits. I'll leave it to another time to explore that.
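For readers who want to check the slope comparison outside Tableau, here is a minimal Python sketch. It assumes the trend lines are ordinary least-squares fits of home runs against hits, which is one common way to draw them and not necessarily what my charts use. The function names and the sample rows are made up for illustration; they are not the Stathead data.

```python
from collections import defaultdict


def fit_line(xs, ys):
    """Ordinary least-squares fit of y on x; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def slopes_by_season(rows):
    """rows: iterable of (season, hits, home_runs), one per player season.

    Returns {season: slope}, where slope is home runs gained per extra hit.
    """
    grouped = defaultdict(lambda: ([], []))
    for season, hits, home_runs in rows:
        grouped[season][0].append(hits)
        grouped[season][1].append(home_runs)
    return {season: fit_line(xs, ys)[0] for season, (xs, ys) in grouped.items()}


# Hypothetical player seasons, NOT real data.
sample = [
    (2019, 100, 15), (2019, 150, 25), (2019, 180, 31),
    (2020, 30, 4), (2020, 50, 8), (2020, 65, 11),
]

for season, slope in sorted(slopes_by_season(sample).items()):
    print(season, round(slope, 3))
```

If the slopes for two seasons come out close, the share of hits that were home runs was similar, even when one season's dots sit much closer to the origin.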
Now let's look at on base percentage (OBP) and slugging percentage (SLG), which correct for the truncated length of the 2020 season.
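To make concrete why these two rate statistics are not thrown off by the shorter season, here is a small Python sketch using the standard definitions: OBP is times on base divided by the plate appearances that count toward it, and SLG is total bases divided by at-bats. The `BattingLine` class, its field names, and both stat lines are hypothetical, chosen only so that one line is exactly a third of the other.

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    """Counting stats for one player season (field names are illustrative)."""
    ab: int       # at-bats
    h: int        # hits
    doubles: int
    triples: int
    hr: int       # home runs
    bb: int       # walks
    hbp: int      # hit by pitch
    sf: int       # sacrifice flies

    def obp(self) -> float:
        # On-base percentage: (H + BB + HBP) / (AB + BB + HBP + SF)
        return (self.h + self.bb + self.hbp) / (self.ab + self.bb + self.hbp + self.sf)

    def slg(self) -> float:
        # Slugging percentage: total bases / at-bats
        singles = self.h - self.doubles - self.triples - self.hr
        total_bases = singles + 2 * self.doubles + 3 * self.triples + 4 * self.hr
        return total_bases / self.ab


# Made-up lines: the second is the first scaled down to one third.
full_season = BattingLine(ab=540, h=150, doubles=30, triples=3, hr=24, bb=60, hbp=6, sf=6)
short_season = BattingLine(ab=180, h=50, doubles=10, triples=1, hr=8, bb=20, hbp=2, sf=2)

print(round(full_season.obp(), 3), round(short_season.obp(), 3))
print(round(full_season.slg(), 3), round(short_season.slg(), 3))
```

Because both statistics are ratios, scaling every counting stat by the same factor leaves them unchanged, which is what lets a 60-game season sit on the same axes as a 162-game one.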
As with the year-on-year analysis of hits vs. home runs, the trend lines here show that 2020 wasn't much of an outlier either, and that in the case of slugging percentage, 2019 seems to have been a higher-performing season for batters. My next step will be to look at pitching: watch this space, and please comment with your thoughts.
The Basics: Hits vs. Home Runs