2022 Negative BABIP Regression Candidates

Work done by Voros McCracken has shown that a batter’s BABIP over the previous three years is a good predictor of next year’s BABIP. Here’s a quote from our own glossary:

...changes in BABIP are to be met with caution. If a batter has consistently produced a .310 BABIP and all of a sudden starts a season with a .370 BABIP, you can likely identify this as an instance in which the batter has been lucky unless there has been a significant change in their style of play.
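A rough way to size up the glossary's example is to treat every ball in play as an independent trial with the batter's established .310 chance of becoming a hit. The sketch below works under that simplifying assumption; the 100 balls in play are a hypothetical early-season sample, not a figure from the glossary.

```python
import math

def babip_z_score(established_babip, observed_babip, balls_in_play):
    """Standard errors between the observed and established BABIP,
    assuming independent balls in play (a simple binomial model)."""
    std_error = math.sqrt(established_babip * (1 - established_babip) / balls_in_play)
    return (observed_babip - established_babip) / std_error

# Hypothetical sample: 100 balls in play to open the season.
z = babip_z_score(0.310, 0.370, 100)
print(f"{z:.2f} standard errors above the established BABIP")
```

With that sample the .370 start sits roughly 1.3 standard errors above .310, which is the kind of gap ordinary luck can produce over a few weeks.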


In this article, I’ll point out the players whose 2021 BABIP was at least 30 points (.030) higher than their 3-year BABIP (2017, 2018, 2019). I’ll tackle the positive regression candidates in next week’s article. I’m only using players who accumulated at least 300 PAs in each of those three seasons and in 2021. That really limits the pool, but it leaves players you should certainly be drafting in 2022. The 2020 season was excluded because no batter reached 300 PAs.

Why 300 PAs? Well, BABIP takes a really long time to stabilize, so long that it doesn’t even happen within one season. In fact, measuring it in PAs doesn’t really do it justice, as a player needs somewhere around 820 balls in play to show stabilization. However, 300 PAs in each of the seasons being averaged is enough to get a pool of reliable players, and I think you’ll see that in the lists. Here’s another gem from our glossary:

In reality, there is no magic threshold at which one’s BABIP becomes predictive of future BABIP, but about two seasons worth of data will give you a decent indication of true talent.

Let’s add a third season, just to be safe. Here are the league averages for all four seasons used in this exercise:

2021 league average BABIP: .292
2017, 2018, 2019 league average BABIP: .300, .296, .298
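One common way to read a stabilization point (an assumption here, not something the article spells out) is as the amount of league-average "ballast" to blend with a player's observed results. The sketch below applies that idea with the 820 balls in play and the .292 league average for 2021 quoted above; the 400 balls in play for the hitter are hypothetical.

```python
STABILIZATION_BIP = 820  # balls in play cited in the article

def regress_babip(observed, balls_in_play, league_avg, ballast=STABILIZATION_BIP):
    """Blend observed BABIP with the league average, weighting the league
    figure as if it were `ballast` extra balls in play (an assumed model)."""
    return (observed * balls_in_play + league_avg * ballast) / (balls_in_play + ballast)

# Hypothetical hitter: .372 BABIP on 400 balls in play, against the .292 league mark for 2021.
estimate = regress_babip(0.372, 400, 0.292)
print(round(estimate, 3))
```

Under these assumptions a .372 season shrinks to about .318, and a hitter with exactly 820 balls in play would land halfway between his own mark and the league's.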

Here are the players who accumulated at least 300 PAs in each of 2017, 2018, 2019, and 2021 and are likely due to regress in 2022.

BABIP Negative Regression Candidates
Name               3-Year BABIP   1-Year BABIP   Diff
Starling Marte     0.317          0.372          -0.055
Kevin Kiermaier    0.292          0.345          -0.053
Brandon Crawford   0.290          0.334          -0.044
Bryce Harper       0.317          0.359          -0.042
Kyle Schwarber     0.270          0.306          -0.036
Yuli Gurriel       0.301          0.336          -0.035
Tim Anderson       0.337          0.372          -0.035
A.J. Pollock       0.291          0.326          -0.035
Trea Turner        0.329          0.362          -0.033
Adam Frazier       0.306          0.339          -0.033
Joc Pederson       0.249          0.281          -0.032
3-Year BABIP: 2017, 2018, 2019
1-Year BABIP: 2021
Diff: 3-Year BABIP minus 1-Year BABIP
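The screen behind this list can be sketched in a few lines. The values are copied from the table above; the function name and the way the threshold is applied are illustrative assumptions rather than the author's actual workflow.

```python
# (name, 3-year BABIP for 2017-2019, 2021 BABIP), copied from the table above
CANDIDATES = [
    ("Starling Marte", 0.317, 0.372),
    ("Kevin Kiermaier", 0.292, 0.345),
    ("Brandon Crawford", 0.290, 0.334),
    ("Bryce Harper", 0.317, 0.359),
    ("Kyle Schwarber", 0.270, 0.306),
    ("Yuli Gurriel", 0.301, 0.336),
    ("Tim Anderson", 0.337, 0.372),
    ("A.J. Pollock", 0.291, 0.326),
    ("Trea Turner", 0.329, 0.362),
    ("Adam Frazier", 0.306, 0.339),
    ("Joc Pederson", 0.249, 0.281),
]
MIN_GAP = 0.030  # 2021 BABIP must beat the 3-year mark by at least this much

def screen(players, min_gap=MIN_GAP):
    """Return (name, diff) pairs, where diff = 3-year BABIP minus 2021 BABIP,
    for players whose 2021 mark exceeds the 3-year mark by min_gap or more."""
    flagged = []
    for name, three_year, one_year in players:
        diff = round(three_year - one_year, 3)
        if -diff >= min_gap:
            flagged.append((name, diff))
    # Most negative diff (largest overperformance) first.
    return sorted(flagged, key=lambda row: row[1])

flagged = screen(CANDIDATES)
for name, diff in flagged:
    print(f"{name:18s} {diff:+.3f}")
```

Every listed player clears the .030 bar, and the ordering matches the Diff column.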

Any BABIP close to .380 is simply too high, even for Tim Anderson. But most of these players have posted above-average BABIPs over a three-year span, so it’s difficult to distinguish between luck and absolutely smoking the ball. Take Kyle Schwarber, who landed in the 93rd percentile for HardHit% in 2021 according to Statcast. Pair that with a 94th percentile BB% and you’ve got what I think is an underrated outfielder, especially in OBP leagues. But the real question is, what can we project for someone like Schwarber? What do you say, Steamer? The answer: .270, which is exactly Schwarber’s three-year average. That’s the key here: trying to figure out which players have made a profile change and which players got lucky.

A good way to understand the differences and similarities among the players in this group is to cluster them, placing them in groups based on commonalities. However, it can be difficult to visualize data with more than a few dimensions, so I ran a principal component analysis on the batted ball statistics (HardHit%, EV, LA, Barrel%) and then ran a K-means clustering algorithm on just two metrics: the batted ball PCA (which tries to explain all the batted ball data in one number) and Statcast’s HP_to_1B metric, which our own Mike Podhorzer wrote about a few years ago as a statistic that correlates well with BABIP. Here are each of our regression candidates’ batted ball statistics, the metric that tries to encapsulate all of them at once (Batted Ball PCA), and their Statcast HP_to_1B time (measured in seconds):

2021 Statistics of BABIP Overperformers
Name               Statcast HP_to_1B (s)   HardHit%   EV (mph)   LA (deg)   Barrel%   Batted Ball PCA   Cluster
Bryce Harper       4.38                    49.2       92.5       13.3       18.1      -12.8             4
Kyle Schwarber     4.43                    52.2       92.3       15.4       17.5      -15.2             4
Trea Turner        4.13                    46.0       89.6       11.4       7.4       -4.3              2
Brandon Crawford   4.48                    43.1       88.8       14.7       11.5      -3.8              2
A.J. Pollock       4.39                    47.1       90.3       12.0       11.1      -7.2              2
Starling Marte     4.22                    39.4       87.6       4.6        8.4       2.0               3
Yuli Gurriel       4.42                    41.4       89.8       13.4       3.4       1.2               3
Tim Anderson       4.32                    41.8       89.7       4.3        7.8       -0.2              3
Adam Frazier       4.42                    25.0       85.4       12.6       1.0       17.4              1
Joc Pederson       4.38                    47.6       91.0       14.1       10.1      -7.5              2
Kevin Kiermaier    4.11                    36.4       86.6       1.1        4.0       7.2               1

While the PCA calculation can be difficult to explain, we can look at players on opposite ends of the number line to see what it captures. A more negative PCA corresponds to a better batted ball profile. Take Kyle Schwarber at -15.2: he has the highest HardHit%, the second-highest EV, the highest LA, and the second-highest Barrel% in the group. Marte sits in the middle range of the batted ball PCA and is one of the three fastest runners by Statcast’s HP_to_1B metric. Yuli Gurriel’s batted ball PCA also sits near the middle of this group because most of his metrics are good but are dragged down somewhat by a 3.4 Barrel%, and he pairs that with a slow home-to-first time. Another interesting player is Frazier, whose batted ball metrics are the poorest in the group by the principal component, and he’s not much of a speedster either. I’d like to dig in a bit more with him in particular, but he could be one of our biggest BABIP regression candidates. Overall, this gives us a good way to categorize these BABIP overperformers. What do you think?
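To make the single-number idea concrete, here is a minimal sketch that finds the first principal component of the four batted ball columns with plain power iteration. It is fit only on the eleven players in the table, on unscaled values. The article does not say which population or scaling was used, so both are assumptions, and the scores will not match the Batted Ball PCA column exactly.

```python
import math

# (HardHit%, EV, LA, Barrel%) from the 2021 table above
BATTED_BALL = {
    "Bryce Harper": (49.2, 92.5, 13.3, 18.1),
    "Kyle Schwarber": (52.2, 92.3, 15.4, 17.5),
    "Trea Turner": (46.0, 89.6, 11.4, 7.4),
    "Brandon Crawford": (43.1, 88.8, 14.7, 11.5),
    "A.J. Pollock": (47.1, 90.3, 12.0, 11.1),
    "Starling Marte": (39.4, 87.6, 4.6, 8.4),
    "Yuli Gurriel": (41.4, 89.8, 13.4, 3.4),
    "Tim Anderson": (41.8, 89.7, 4.3, 7.8),
    "Adam Frazier": (25.0, 85.4, 12.6, 1.0),
    "Joc Pederson": (47.6, 91.0, 14.1, 10.1),
    "Kevin Kiermaier": (36.4, 86.6, 1.1, 4.0),
}

def first_component_scores(rows, iterations=200):
    """Project each player onto the first principal axis of the centered data."""
    names = list(rows)
    width = len(rows[names[0]])
    means = [sum(rows[p][j] for p in names) / len(names) for j in range(width)]
    centered = {p: [rows[p][j] - means[j] for j in range(width)] for p in names}
    # Scatter matrix: the covariance matrix up to a constant factor,
    # which does not change the direction of its eigenvectors.
    scatter = [[sum(centered[p][a] * centered[p][b] for p in names)
                for b in range(width)] for a in range(width)]
    # Power iteration converges to the leading eigenvector (the first axis).
    axis = [1.0] * width
    for _ in range(iterations):
        axis = [sum(scatter[a][b] * axis[b] for b in range(width)) for a in range(width)]
        norm = math.sqrt(sum(v * v for v in axis))
        axis = [v / norm for v in axis]
    # The sign of a component is arbitrary; flip it so harder contact
    # (HardHit%, index 0) scores negative, matching the article's convention.
    if axis[0] > 0:
        axis = [-v for v in axis]
    scores = {p: sum(c * w for c, w in zip(centered[p], axis)) for p in names}
    return scores, axis

scores, axis = first_component_scores(BATTED_BALL)
for name, score in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{name:18s} {score:6.1f}")
```

Even with this small sample, Schwarber lands at the most negative end and Frazier at the most positive end, the same extremes as in the table.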

A player’s single-year BABIP should be expected to regress toward their 3-year average, and these players all beat their 2017-2019 BABIP in 2021. Given the parameters of this investigation, we’re left with really good players. But we can see that players like Tim Anderson, Starling Marte, and Yuli Gurriel (Cluster 3) are getting on base with balls in play through a combination of good batted ball metrics and decent speed. Players like Kiermaier and Frazier (Cluster 1) are a little one-sided: speed for Kiermaier, and I’m not really sure what for Frazier. Cluster 2 is a little suspect. Turner is obviously very quick and has good batted ball metrics, while Pollock, Crawford, and Pederson rely mostly on batted ball metrics alone, so without the speed component, negative regression may be more likely for Cluster 2, omitting Turner. Schwarber and Harper (Cluster 4) just smoke the ball.
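For readers who want to try the grouping themselves, here is a minimal K-means sketch (Lloyd's algorithm) on the two published inputs, Batted Ball PCA and HP_to_1B. The starting centroids, spread evenly across the players sorted by PCA score, are an assumption made to keep the run repeatable. The article does not describe its initialization or any rescaling, and K-means results depend on both, so this run is not guaranteed to reproduce the Cluster column.

```python
# (Batted Ball PCA, HP_to_1B in seconds) from the 2021 table above
POINTS = {
    "Bryce Harper": (-12.8, 4.38), "Kyle Schwarber": (-15.2, 4.43),
    "Trea Turner": (-4.3, 4.13), "Brandon Crawford": (-3.8, 4.48),
    "A.J. Pollock": (-7.2, 4.39), "Starling Marte": (2.0, 4.22),
    "Yuli Gurriel": (1.2, 4.42), "Tim Anderson": (-0.2, 4.32),
    "Adam Frazier": (17.4, 4.42), "Joc Pederson": (-7.5, 4.38),
    "Kevin Kiermaier": (7.2, 4.11),
}

def kmeans(points, initial_centroids, max_iter=100):
    """Lloyd's algorithm: assign each point to its nearest centroid,
    move each centroid to the mean of its members, repeat until stable."""
    centroids = [tuple(c) for c in initial_centroids]
    labels = {}
    for _ in range(max_iter):
        new_labels = {
            name: min(range(len(centroids)),
                      key=lambda k: sum((x - c) ** 2 for x, c in zip(xy, centroids[k])))
            for name, xy in points.items()
        }
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for k in range(len(centroids)):
            members = [points[n] for n, lab in labels.items() if lab == k]
            if members:  # keep the old centroid if a cluster empties out
                centroids[k] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids

K = 4
ordered = sorted(POINTS.values())  # sorted by PCA score first
starts = [ordered[round(i * (len(ordered) - 1) / (K - 1))] for i in range(K)]
labels, centroids = kmeans(POINTS, starts)
for k in range(K):
    print(k, sorted(n for n, lab in labels.items() if lab == k))
```

In the table, the published clusters occupy non-overlapping ranges of the PCA score, which suggests that score dominates the distance when the inputs are left unscaled: HP_to_1B spans less than half a second while the PCA score spans more than 30 units. Rescaling both inputs first (for example, to z-scores) would give sprint speed more say, and borderline players such as Kiermaier can switch groups depending on the starting centroids.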

This is a fun exercise to get a feel for players and how they are getting such high BABIP marks. We can use all these fancy analyses to fill the hours of our day and read about something that we love, baseball, or we can just put our money on regression. I’ll run the same analysis for the positive regression candidates next week.

Leave a Reply


I might be pulling this out of my @$$, but wasn’t there a fairly significant increase in exit velo, particularly on ground balls, with the ball change this past year?

If that is the case, this analysis might be getting muddied by leaguewide changes in average BABiP on certain batted ball types, the most obvious example would probably be pulled ground balls (especially into the shift).

If, for an example of a hypothesis, Schwarber’s BABiP on pulled ground balls went up simply because the league average increased due to slightly higher average exit velos, that would essentially represent a predictable “skill” change on his part.

An analysis that breaks down BABiP based on batted ball type, and perhaps also incorporates exit velo, could help to identify changes in true talent level or predictable results.

Joe Wilkey

Actually, EV on GB was way down the last two years, from 85.6 in 2018/2019 to 84.4 in 2020/2021. I don’t know if that’s due to the ball or not, since the overall average EV on all batted balls was more or less the same and almost always has been. Other than 2017, which had an average EV of 87.2 mph for all batted balls, every other year in StatCast (2015-2021) has had an overall average EV between 88.0 and 88.5 mph.

Someone should really look into what was going on in 2017…