2022 Negative BABIP Regression Candidates
by Lucas Kelly
January 13, 2022

Building on Voros McCracken's work on balls in play, it's widely accepted that a batter's previous three-year BABIP is a good predictor of next year's BABIP. Here's a quote from our own glossary:

"...changes in BABIP are to be met with caution. If a batter has consistently produced a .310 BABIP and all of a sudden starts a season with a .370 BABIP, you can likely identify this as an instance in which the batter has been lucky unless there has been a significant change in their style of play."

In this article, I'll point out the players whose 2021 BABIP was at least 3 percentage points (.030) higher than their three-year BABIP from 2017, 2018, and 2019. I'll tackle the positive regression candidates in next week's article. I'm also limiting the pool to players who accumulated at least 300 PAs in each of those seasons. That shrinks the pool considerably, but the players who remain are ones you should certainly draft in 2022. The 2020 season was excluded because no batter reached 300 PAs.

Why 300 PAs? Well, BABIP takes a really long time to stabilize, so long that it doesn't even happen within one season. In fact, counting PAs doesn't really do it justice: a player needs somewhere around 820 balls in play before his BABIP shows stabilization. However, 300 PAs in each of the three averaged seasons is enough to produce a pool of reliable players, and I think you'll see that in the lists. Here's another gem from our glossary:

"In reality, there is no magic threshold at which one's BABIP becomes predictive of future BABIP, but about two seasons worth of data will give you a decent indication of true talent."

Let's add a third season, just to be safe. Here are the league averages from all four seasons used in this exercise:

2021 league average BABIP: .292
2017, 2018, 2019 league average BABIP: .300, .296, .298

Here are the players who accumulated at least 300 PAs in each of 2017, 2018, 2019, and 2021 and are likely due to regress in 2022:
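To make the selection rule concrete, here is a minimal Python sketch. It assumes the standard BABIP formula, (H - HR) / (AB - K - HR + SF), and assumes the three-year figure is a simple mean of the three seasonal BABIPs (pooling all balls in play across the three seasons would be a reasonable alternative). The function names and the sample counting stats are hypothetical, for illustration only.

```python
from typing import Sequence


def babip(h: int, hr: int, ab: int, k: int, sf: int) -> float:
    """Batting average on balls in play: (H - HR) / (AB - K - HR + SF)."""
    balls_in_play = ab - k - hr + sf
    return (h - hr) / balls_in_play


def three_year_babip(seasonal: Sequence[float]) -> float:
    """Simple mean of the 2017, 2018, and 2019 BABIPs (assumed averaging method)."""
    return sum(seasonal) / len(seasonal)


def is_negative_regression_candidate(
    three_year: float,
    one_year: float,
    pa_by_season: Sequence[int],
    min_pa: int = 300,
    min_gap: float = 0.030,
) -> bool:
    """True if the player reached min_pa in every season and his 2021 BABIP
    beat his three-year BABIP by at least min_gap (3 percentage points)."""
    if any(pa < min_pa for pa in pa_by_season):
        return False
    # Round to three places so float noise cannot flip a borderline case
    return round(one_year - three_year, 3) >= min_gap


# Starling Marte's BABIPs from the table; the PA counts are hypothetical
print(is_negative_regression_candidate(0.317, 0.372, [300, 300, 300, 300]))
```

Marte's gap of .055 clears the .030 cutoff easily; the same call with any season under 300 PAs would return False.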
BABIP Negative Regression Candidates

Name               3-Year BABIP   1-Year BABIP   Diff
Starling Marte     0.317          0.372          -0.055
Kevin Kiermaier    0.292          0.345          -0.053
Brandon Crawford   0.290          0.334          -0.044
Bryce Harper       0.317          0.359          -0.042
Kyle Schwarber     0.270          0.306          -0.036
Yuli Gurriel       0.301          0.336          -0.035
Tim Anderson       0.337          0.372          -0.035
A.J. Pollock       0.291          0.326          -0.035
Trea Turner        0.329          0.362          -0.033
Adam Frazier       0.306          0.339          -0.033
Joc Pederson       0.249          0.281          -0.032

3-Year BABIP: 2017, 2018, 2019. 1-Year BABIP: 2021.

Any BABIP close to .380 is simply too high, even for Tim Anderson. But most of these players have posted higher-than-average BABIPs over the three-year span, so it's difficult to distinguish between luck and absolutely smoking the ball. Take Kyle Schwarber, who finished in the 93rd percentile in HardHit% in 2021, according to Statcast. Pair that with a 94th-percentile BB% and you've got what I think is an underrated outfielder, especially in OBP leagues.

But the real question is, what can we project for someone like Schwarber? What do you say, Steamer? The answer: .270, which is exactly Schwarber's three-year average. That's the key here: figuring out which players have changed their profile and which players got lucky.

A good way to understand the differences and similarities among the players in this group is to cluster them, placing them in groups based on what they have in common. However, it can be difficult to visualize data with more than a few dimensions, so I ran a principal component analysis (PCA) on the batted-ball statistics (HardHit%, EV, LA, Barrel%). I then ran a K-means clustering algorithm on just two metrics: the batted-ball PCA score, which tries to explain all the batted-ball data in one number, and Statcast's HP_to_1B metric, which our own Mike Podhorzer wrote about a few years ago as a statistic that correlates well with BABIP.
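The clustering step can be sketched in plain Python as Lloyd's K-means algorithm with k = 4, run on each player's Batted Ball PCA score and HP_to_1B time as listed in the 2021 Statistics of BABIP Overperformers table. Two choices here are assumptions, not necessarily what the author did: both features are converted to z-scores first (otherwise the PCA score, which spans about 30 units, would swamp HP_to_1B, which spans under 0.4 seconds), and the starting centroids are picked deterministically at evenly spaced ranks of the PCA score instead of at random. Cluster numbers are arbitrary labels.

```python
from statistics import fmean, pstdev

# (Batted Ball PCA, HP_to_1B seconds) for each player, from the 2021 table
PLAYERS = {
    "Bryce Harper": (-12.8, 4.38),
    "Kyle Schwarber": (-15.2, 4.43),
    "Trea Turner": (-4.3, 4.13),
    "Brandon Crawford": (-3.8, 4.48),
    "A.J. Pollock": (-7.2, 4.39),
    "Starling Marte": (2.0, 4.22),
    "Yuli Gurriel": (1.2, 4.42),
    "Tim Anderson": (-0.2, 4.32),
    "Adam Frazier": (17.4, 4.42),
    "Joc Pederson": (-7.5, 4.38),
    "Kevin Kiermaier": (7.2, 4.11),
}


def standardize(rows):
    """Convert each column to z-scores (population standard deviation)."""
    cols = list(zip(*rows))
    stats = [(fmean(c), pstdev(c)) for c in cols]
    return [tuple((x - m) / s for x, (m, s) in zip(row, stats)) for row in rows]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; returns one cluster label (0..k-1) per point."""
    # Deterministic start: points at evenly spaced ranks of the first feature
    order = sorted(range(len(points)), key=lambda i: points[i][0])
    centroids = [points[order[j * (len(points) - 1) // (k - 1)]] for j in range(k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = tuple(fmean(col) for col in zip(*members))
    return labels


names = list(PLAYERS)
labels = kmeans(standardize([PLAYERS[n] for n in names]), k=4)
clusters = dict(zip(names, labels))
for name in names:
    print(f"{name:18s} cluster {clusters[name]}")
```

Under these assumptions the sketch keeps Harper and Schwarber together and leaves Frazier on his own, but it also groups the three fastest runners (Turner, Marte, Kiermaier), which does not match the article's clusters. The feature scaling choice clearly matters, so it's worth checking before reading too much into any one grouping.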
Here are each regression candidate's 2021 batted-ball statistics, the metric that tries to encapsulate all of them at once (Batted Ball PCA), and their Statcast HP_to_1B time (measured in seconds):

2021 Statistics of BABIP Overperformers

Name               HP_to_1B (s)  HardHit%  EV (mph)  LA (°)  Barrel%  Batted Ball PCA  Cluster
Bryce Harper       4.38          49.2      92.5      13.3    18.1     -12.8            4
Kyle Schwarber     4.43          52.2      92.3      15.4    17.5     -15.2            4
Trea Turner        4.13          46.0      89.6      11.4    7.4      -4.3             2
Brandon Crawford   4.48          43.1      88.8      14.7    11.5     -3.8             2
A.J. Pollock       4.39          47.1      90.3      12.0    11.1     -7.2             2
Starling Marte     4.22          39.4      87.6      4.6     8.4      2.0              3
Yuli Gurriel       4.42          41.4      89.8      13.4    3.4      1.2              3
Tim Anderson       4.32          41.8      89.7      4.3     7.8      -0.2             3
Adam Frazier       4.42          25.0      85.4      12.6    1.0      17.4             1
Joc Pederson       4.38          47.6      91.0      14.1    10.1     -7.5             2
Kevin Kiermaier    4.11          36.4      86.6      1.1     4.0      7.2              1

While the PCA calculation can be difficult to explain, we can look at players at opposite ends of the number line to see how it's derived. A more negative PCA corresponds to a better batted-ball profile. Take Kyle Schwarber at -15.2: he has the highest HardHit%, the second-highest EV, the highest LA, and the second-highest Barrel% in the group. Marte sits in the middle range of the batted-ball PCA and is one of the three fastest runners in the group based on Statcast's HP_to_1B metric. You can also see that Yuli Gurriel's batted-ball PCA stands near the middle of this group because most of his metrics are solid, brought down somewhat by a 3.4 Barrel%; on top of that, his 4.42-second HP_to_1B time means he doesn't bring much speed. Another interesting player is Frazier: his batted-ball metrics are poor, as his group-high PCA of 17.4 shows, and he's not much of a speedster. I'd like to dig in a bit more with him in particular, but it's possible he is one of our biggest BABIP regression candidates. Overall, however, this gives us a good way to categorize these BABIP overperformers. What do you think? A player's single-year BABIP should be expected to regress to their 3-year average.
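To see how one number can summarize four batted-ball stats, here is a minimal standard-library sketch that computes a first principal component from the HardHit%, EV, LA, and Barrel% columns of the table above. Assumptions: the columns are converted to z-scores before the PCA, the leading eigenvector of their correlation matrix is found by power iteration, and the sign is flipped so that a harder-hitting profile scores more negative, matching the article's convention. The article's exact preprocessing isn't stated, so these scores will not equal the published Batted Ball PCA values; only the general ordering, especially at the extremes, should line up.

```python
from math import sqrt
from statistics import fmean, pstdev

# HardHit%, EV, LA, Barrel% for each player, from the 2021 table
BATTED_BALL = {
    "Bryce Harper": (49.2, 92.5, 13.3, 18.1),
    "Kyle Schwarber": (52.2, 92.3, 15.4, 17.5),
    "Trea Turner": (46.0, 89.6, 11.4, 7.4),
    "Brandon Crawford": (43.1, 88.8, 14.7, 11.5),
    "A.J. Pollock": (47.1, 90.3, 12.0, 11.1),
    "Starling Marte": (39.4, 87.6, 4.6, 8.4),
    "Yuli Gurriel": (41.4, 89.8, 13.4, 3.4),
    "Tim Anderson": (41.8, 89.7, 4.3, 7.8),
    "Adam Frazier": (25.0, 85.4, 12.6, 1.0),
    "Joc Pederson": (47.6, 91.0, 14.1, 10.1),
    "Kevin Kiermaier": (36.4, 86.6, 1.1, 4.0),
}


def first_principal_component(rows, iterations=200):
    """Return (loadings, scores) for the first PC of z-scored columns."""
    cols = list(zip(*rows))
    stats = [(fmean(c), pstdev(c)) for c in cols]
    z = [[(x - m) / s for x, (m, s) in zip(row, stats)] for row in rows]
    n, d = len(z), len(cols)
    # Correlation matrix (covariance of the z-scores)
    corr = [[sum(r[i] * r[j] for r in z) / n for j in range(d)] for i in range(d)]
    # Power iteration converges to the eigenvector with the largest eigenvalue
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Orient the axis so that a higher HardHit% pushes the score negative
    if v[0] > 0:
        v = [-x for x in v]
    scores = [sum(a * b for a, b in zip(row, v)) for row in z]
    return v, scores


names = list(BATTED_BALL)
loadings, scores = first_principal_component([BATTED_BALL[n] for n in names])
print("loadings (HardHit%, EV, LA, Barrel%):", [round(x, 2) for x in loadings])
for name, score in sorted(zip(names, scores), key=lambda t: t[1]):
    print(f"{name:18s} {score:+.2f}")
```

The printed loadings show how much each stat contributes. In this group, HardHit%, EV, and Barrel% move together closely, so they should carry most of the weight, with LA contributing less.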
These players all beat their 2017-2019 BABIP in 2021. Given the parameters of this investigation, we're left with really good players. But we can see that, thanks to a combination of good batted-ball metrics and decent speed, players like Tim Anderson, Starling Marte, and Yuli Gurriel (Cluster 3) are getting on base with balls in play. Players like Kiermaier and Frazier (Cluster 1) are a little one-sided: speed for Kiermaier, and I'm not really sure what it is for Frazier. Cluster 2 is a little suspect. Turner is obviously very quick and has good batted-ball metrics, while Pollock, Crawford, and Pederson rely more on batted-ball metrics alone. Without the speed component, negative regression may be more likely for Cluster 2, Turner excepted. Schwarber and Harper just smoke the ball (Cluster 4).

This is a fun exercise to get a feel for players and how they are producing such high BABIP marks. We can use all these fancy analyses to fill the hours of our day and read about something we love, baseball, or we can just put our money on regression. I'll run the same analysis for the positive regression candidates next week.