The Keys to Pitcher BABIP and HR/FB, Perhaps

The relationship between pitcher performance and batted ball metrics has long been dubious. The sabermetric community has a solid understanding of why, fundamentally, a pitcher is good or bad. Strikeouts are good. Walks are bad. Hits by pitch are also bad. Home runs allowed are especially bad. So on, so forth. And by no means are batted ball metrics useless. They’re how we know ground balls allowed are superior to fly balls allowed, for example.

The community had hoped, however, that more granular batted ball metrics would help us better explain some of the more nuanced elements of pitcher performance, including those related to luck, such as batting average on balls in play (BABIP) and the percentage of home runs per fly ball (HR/FB). Since those metrics entered the public sphere in 2015, and even with the inclusion of more granular Statcast data in 2016, any relationships that might exist between the physics and outcomes of batted balls during an individual pitcher’s season remain poorly explained. The following table shows the correlations between pitcher BABIP and various batted ball metrics, sorted by the strength of the relationship (all qualified seasons, 2007-17, n = 898):

BABIP Correlations
Metric    BABIP
BABIP     1.000
LD%       0.348
FB%      -0.329
PU%      -0.326
OFFB%    -0.307
IFFB%    -0.239
GB%       0.204
IFH%      0.183
Hard%     0.179
Soft%    -0.157
BUH%      0.110
HR/FB     0.083
Med%     -0.033
Cent%     0.019
Pull%    -0.017
Oppo%     0.004
PU% = pop-ups = FB% * IFFB%
OFFB% = outfield fly balls = FB% * (1 - IFFB%)

A coefficient of 1.0 indicates a perfect positive relationship (BABIP perfectly correlates with itself), and -1.0 indicates a perfect inverse relationship. There are some adequate correlations here (all of the batted ball types except for ground ball rate, GB%, exhibit weak correlations), but each of them, individually, explains little more than 10% of the variance in pitcher BABIP (variance explained is the square of the coefficient). Regressing BABIP on all of the batted ball types (LD%, GB%, OFFB%, PU%) produces an adjusted r² of 0.22: commendable, but it leaves unexplained almost 80% of the variance in a pitcher’s BABIP in a given season. Meanwhile, none of the contact quality metrics (Hard%, Med%, Soft%) exhibits any semblance of a relationship with BABIP, and regressing BABIP against all three hardly inspires confidence (adjusted r² = 0.14).
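For anyone who wants to check the arithmetic, here is a minimal sketch of how a coefficient like these can be computed and turned into "variance explained." The five pitcher-seasons in `rows` are made up for illustration and are not from the data set above; `pearson_r` and `split_fly_balls` are hypothetical helper names.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def split_fly_balls(fb_pct, iffb_pct):
    """Split FB% into pop-ups (PU%) and outfield fly balls (OFFB%).

    Inputs and outputs are fractions (0.35, not 35).
    """
    pu = fb_pct * iffb_pct
    offb = fb_pct * (1 - iffb_pct)
    return pu, offb

# Hypothetical pitcher-seasons: (FB%, IFFB%, BABIP), invented for illustration.
rows = [
    (0.30, 0.08, 0.310),
    (0.38, 0.12, 0.285),
    (0.42, 0.10, 0.290),
    (0.33, 0.14, 0.295),
    (0.45, 0.15, 0.270),
]
pu_rates = [split_fly_balls(fb, iffb)[0] for fb, iffb, _ in rows]
babips = [babip for _, _, babip in rows]

r = pearson_r(pu_rates, babips)
print(f"toy PU% vs. BABIP: r = {r:.3f}, r^2 = {r * r:.3f}")

# Squaring the single-season LD% coefficient from the table gives its
# share of BABIP variance explained.
ld_r2 = 0.348 ** 2
print(f"LD%: r = 0.348 -> r^2 = {ld_r2:.3f}")
```

The last line is where "little more than 10%" comes from: the strongest single-season coefficient, squared, lands at roughly 0.12.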

The thing is, we know, intuitively, in our hearts, that contact quality matters. The harder a ball is hit, the less time a fielder has to react to and cleanly field it; weak contact suggests the inverse. These relationships play themselves out for hitters more readily, affirming our understanding of the sport. But the lack of similar evidence in support of pitcher contact management is maddening.

Alas, our artificial truncation of per-observation sample sizes, anywhere from 162 innings to 253 innings (CC Sabathia, 2008), in turn truncates our understanding of these relationships. What happens if we expand sample sizes for each pitcher to, say, 500 innings? It may not be immediately helpful from a fantasy perspective (except to those who play in insane three-year leagues, perhaps), but lengthening the window during which each pitcher “collects” batted ball data significantly sharpens our correlation estimates (n = 287):

BABIP Correlations, Pt. II
Metric   162+ IP   500+ IP
BABIP      1.000     1.000
PU%       -0.326    -0.531
IFFB%     -0.239    -0.503
FB%       -0.329    -0.430
OFFB%     -0.307    -0.395
Soft%     -0.157    -0.391
GB%        0.204     0.353
LD%        0.348     0.320
Med%      -0.033     0.238
Pull%     -0.017    -0.180
Cent%      0.019     0.173
HR/FB      0.083     0.138
IFH%       0.183     0.137
Hard%      0.179     0.122
Oppo%      0.004     0.114
BUH%       0.110     0.035
PU% = pop-ups = FB% * IFFB%
OFFB% = outfield fly balls = FB% * (1 - IFFB%)
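One way the longer windows behind the 500+ IP column could be built is sketched below. This is an assumption, not a description of the actual method: it accumulates each pitcher's consecutive seasons until the innings threshold is met, then recomputes rates from summed counts rather than averaging percentages. The field names (`ip`, `bip`, `hits_in_play`, `batted_balls`, `popups`) and the three seasons for "Pitcher A" are hypothetical.

```python
from collections import defaultdict

COUNT_KEYS = ("ip", "bip", "hits_in_play", "batted_balls", "popups")

def pool_windows(seasons, min_ip):
    """Pool each pitcher's consecutive seasons into windows of >= min_ip innings.

    `seasons` is an iterable of dicts with 'pitcher', 'year' and the COUNT_KEYS.
    Returns one pooled observation per completed window; leftover seasons
    that never reach the threshold are dropped.
    """
    by_pitcher = defaultdict(list)
    for season in seasons:
        by_pitcher[season["pitcher"]].append(season)

    pooled = []
    for pitcher, rows in by_pitcher.items():
        rows.sort(key=lambda s: s["year"])
        totals = dict.fromkeys(COUNT_KEYS, 0)
        for season in rows:
            for key in COUNT_KEYS:
                totals[key] += season[key]
            if totals["ip"] >= min_ip:
                pooled.append({
                    "pitcher": pitcher,
                    "ip": totals["ip"],
                    # Rates come from summed counts, so big seasons weigh more.
                    "babip": totals["hits_in_play"] / totals["bip"],
                    "pu_pct": totals["popups"] / totals["batted_balls"],
                })
                totals = dict.fromkeys(COUNT_KEYS, 0)  # start a new window
    return pooled

# Hypothetical counts for one pitcher, invented for illustration.
seasons = [
    {"pitcher": "Pitcher A", "year": 2015, "ip": 200, "bip": 600,
     "hits_in_play": 180, "batted_balls": 620, "popups": 25},
    {"pitcher": "Pitcher A", "year": 2016, "ip": 210, "bip": 620,
     "hits_in_play": 170, "batted_balls": 645, "popups": 30},
    {"pitcher": "Pitcher A", "year": 2017, "ip": 190, "bip": 580,
     "hits_in_play": 175, "batted_balls": 600, "popups": 20},
]
pooled = pool_windows(seasons, min_ip=500)
print(pooled)
```

Here the three seasons combine into a single 600-inning observation. Other reasonable choices (career totals, overlapping rolling windows) would give different sample counts.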

The batted ball types exhibit stronger correlations. Pop-ups, which are effectively automatic outs, explain almost 30% of the variance by themselves (r = -0.531, so r² ≈ 0.28), a legitimately moderate correlation. Moreover, contact quality, namely soft contact (Soft%) and not hard contact (Hard%), now shows a meaningfully non-zero correlation with BABIP. Lengthening the per-pitcher window to 750 innings sharpens our correlations even further (n = 154):

BABIP Correlations, Pt. III
Metric   162+ IP   500+ IP   750+ IP
BABIP      1.000     1.000     1.000
PU%       -0.326    -0.531    -0.582
IFFB%     -0.239    -0.503    -0.549
FB%       -0.329    -0.430    -0.485
OFFB%     -0.307    -0.395    -0.450
Soft%     -0.157    -0.391    -0.447
GB%        0.204     0.353     0.425
Med%      -0.033     0.238     0.368
LD%        0.348     0.320     0.215
Cent%      0.019     0.173     0.213
Pull%     -0.017    -0.180    -0.161
IFH%       0.183     0.137     0.120
HR/FB      0.083     0.138     0.099
Oppo%      0.004     0.114     0.070
Hard%      0.179     0.122     0.022
BUH%       0.110     0.035    -0.009
PU% = pop-ups = FB% * IFFB%
OFFB% = outfield fly balls = FB% * (1 - IFFB%)

Pop-ups now explain a third of BABIP variance on their own. But look! Soft and medium (Med%) contact demonstrate better-than-weak correlations, while the relationship between hard contact and BABIP is essentially nonexistent. This result is not to be conflated with the conclusion that hard contact doesn’t matter (don’t worry, it does), but it’s almost worthless as it relates to pitcher BABIP, whereas soft and medium contact are much more valuable. We must acknowledge the significantly smaller sample size here (154 observations is still fairly small), but it’s also evident that these kinds of relationships, while real, don’t necessarily play out over the course of one season.
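Why would a longer window sharpen a correlation that was there all along? A toy simulation gives one plausible answer: single-season sampling noise swamps a real underlying relationship. Everything in this sketch is an assumption made for illustration (the talent spread, the slope linking soft contact to BABIP, the ball-in-play counts, the normal approximation to binomial noise); none of it is fitted to the data above.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def simulate(n_pitchers, balls_in_play, rng):
    """Observed Soft%-vs-BABIP correlation in a made-up league."""
    soft_obs, babip_obs = [], []
    for _ in range(n_pitchers):
        # Assumed "true talent": more soft contact means a lower BABIP.
        true_soft = rng.gauss(0.18, 0.02)
        true_babip = 0.300 - 0.5 * (true_soft - 0.18) + rng.gauss(0, 0.005)
        # Normal approximation to binomial noise over `balls_in_play` trials.
        soft_sd = sqrt(true_soft * (1 - true_soft) / balls_in_play)
        babip_sd = sqrt(true_babip * (1 - true_babip) / balls_in_play)
        soft_obs.append(rng.gauss(true_soft, soft_sd))
        babip_obs.append(rng.gauss(true_babip, babip_sd))
    return pearson_r(soft_obs, babip_obs)

rng = random.Random(2017)  # fixed seed so the run is repeatable
r_short = simulate(2000, 500, rng)    # roughly one season of balls in play
r_long = simulate(2000, 2500, rng)    # a multi-season window
print(f"short window r = {r_short:.3f}, long window r = {r_long:.3f}")
```

In this toy setup the underlying relationship is identical in both runs. Only the noise shrinks, and the observed correlation gets stronger because of it.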

Using the sample of 750+ inning observations, regressions of BABIP on batted ball types, contact quality, and directions yield the following fits (as measured by adjusted r²):

LD%, GB%, OFFB%, PU%: 0.40
Soft%, Med%, Hard%: 0.23
Pull%, Cent%, Oppo%: 0.03 (this likely bears a stronger correlation from the hitter side, especially when controlling for hitter handedness)
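For reference, adjusted r² discounts the raw r² for the number of predictors in the regression. The sketch below uses the standard formula; the raw value of 0.42 is hypothetical (the raw figures are not reported here), while n = 154 and the four batted ball predictors come from the sample described above.

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Adjusted R^2: shrink raw R^2 for the number of predictors used."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)

# Hypothetical raw R^2 of 0.42 with n = 154 observations and 4 predictors.
large_sample = adjusted_r2(0.42, 154, 4)
# The same raw R^2 is penalized more heavily in a much smaller sample.
small_sample = adjusted_r2(0.42, 30, 4)
print(round(large_sample, 3), round(small_sample, 3))
```

With 154 observations and only three or four predictors, the adjustment is small, so the figures above sit close to their raw counterparts.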

The takeaway? Most everything we’ve known, or, through intuition, thought we’ve known, about baseball is probably correct in the grand scheme of things. It’s just that we may need to look at it through a slightly different lens. Soft and medium contact affect BABIP much more significantly than hard contact does. It helps explain why guys like Dallas Keuchel, Jake Arrieta, Kyle Hendricks, Tanner Roark, and Marco Estrada have a habit of running impressively low BABIPs — and also, in a roundabout way, explains why the jury is still out on Robbie Ray’s insanely volatile annual BABIPs. Maybe now looking at this year’s BABIP leaders, sorted by Soft%, won’t surprise you.

HR/FB

HR/FB, another pitcher metric deeply entangled with luck, also becomes easier to explain as pitcher sample sizes increase. Using all the aforementioned samples:

HR/FB Correlations
Metric   162+ IP   500+ IP   750+ IP
HR/FB      1.000     1.000     1.000
Oppo%     -0.227    -0.418    -0.431
PU%       -0.169    -0.283    -0.377
IFFB%     -0.166    -0.280    -0.376
Pull%      0.255     0.369     0.376
GB%        0.111     0.229     0.346
FB%       -0.139    -0.248    -0.342
OFFB%     -0.122    -0.233    -0.326
Soft%     -0.143    -0.265    -0.261
Hard%      0.374     0.362     0.240
BUH%       0.059     0.234     0.177
LD%        0.073     0.050    -0.099
BABIP      0.083     0.138     0.099
Cent%     -0.086    -0.029    -0.032
IFH%       0.023    -0.005     0.025
Med%      -0.237    -0.140    -0.019
PU% = pop-ups = FB% * IFFB%
OFFB% = outfield fly balls = FB% * (1 - IFFB%)

Pitcher HR/FB behaves quite differently from pitcher BABIP. For example, batted ball direction plays a pretty huge role: pulled batted balls are bad, and oppo batted balls are good. That makes sense; on average, hitters hit for twice as much power to their pull side as to the opposite field. There’s some weirdness in the table that suggests the correlation between hard-hit rate and HR/FB dramatically decreases as pitchers accumulate more innings, but I’m convinced it’s nothing more than a data quirk: when I limited the data to pitchers with 1,000+ innings (n = 99), the correlation coefficient increased to 0.406.

So, hard contact is bad, of course. And so are pulled balls, especially of the fly ball variety. Lastly, a minor interesting note (something Jeff Zimmerman has previously shown, I think): ground ball pitchers tend to allow higher HR/FB rates.

Again, I’m not sure this is anything new to us. This, for HR/FB and for BABIP, is all intuition. These are relationships that some simple manipulation of FanGraphs’ splits leaderboard would reveal to us. But there’s something about seeing these correlations emerge over time at the individual pitcher level that is comforting, reassuring. Sure, your favorite pitcher might still be subject to a ton of good or bad luck in any given season. But there’s also evidence that there’s a method to this madness. There’s evidence there still exists, in some form or another, The Process™.





Two-time FSWA award winner, including 2018 Baseball Writer of the Year, and 8-time award finalist. Featured in Lindy's magazine (2018, 2019), Rotowire magazine (2021), and Baseball Prospectus (2022, 2023). Biased toward a nicely rolled baseball pant.

Jetsy Extrano
5 years ago

It genuinely weirds me out that the correlations with Hard% decrease as data size increases. Especially for HRs. It seems like we should all be banned from using Hard% until we have a satisfying explanation of that.

Jetsy Extrano
5 years ago

That is somewhat reassuring, thanks. But it still feels a bit like we’re cherrypicking an endpoint — do we want to peek at whether correlation bumps down again at 2000?

And even 500 is a lot of innings. If the data is quirky that far out… I don’t know, I’m not a statistician, it just seems like a warning.