Reconciling Pitcher (x)BABIP and Hard Contact Allowed

This is a long one. I appreciate your patience in advance.

Mike Podhorzer, I and a few others (though primarily Mike) have carried the torch on developing ‘expected’ metrics, such as xBABIP (expected batting average on balls in play), xHR/FB (expected home run-to-fly ball ratio) and xK% (expected strikeout rate), all of which, along with the rest, can be found here. For the uninitiated, these xMetrics describe how a hitter or pitcher should have performed based on various measurements of the events that unfolded, and they typically predict future performance better than the original metric does. They’re not perfect, but, like other advanced metrics, they give us a better understanding of player performance and ability.

Each metric — xHR/FB, xK%, etc. — has formulas for both hitters and pitchers, with the hitter metrics typically having stronger correlations than those for pitchers. Unfortunately, pitcher xBABIP has always eluded us. It’s inappropriate to repurpose hitter xBABIP for pitchers, but it’s because the model coefficients (weights) would be different, not because the theory underpinning the model is flawed.

That’s the problem, though: hard hits, line drives and infield fly balls all should affect a pitcher’s BABIP allowed. Our intuition begs it to be true. Yet there’s a resounding lack of evidence to support it. The correlation between BABIP and hard-hit rate (Hard%), line drive rate (LD%) and infield fly ball rate (IFFB%), among others, borders on nonexistent:

  • LD%: Line drive rate correlates very weakly with BABIP, producing a 0.12 R². It’s better than nothing, but, unfortunately, LD% has virtually no correlation from year to year, something that rplunkett97 documented here. In other words, line drives do affect pitcher BABIP, but LD% is worthless as a predictor, and prediction is typically the reason we use these metrics.
  • IFFB%: Infield fly ball rate does not correlate with BABIP, producing a 0.05 R². Using IFFB% in a vacuum ignores how frequently a pitcher allows fly balls in the first place; converting it to pop-up rate (PU%, hypothetically) by multiplying IFFB% by overall fly ball rate (FB%) generates a metric that correlates very weakly with BABIP, producing a 0.10 R². It should have a much stronger correlation, given pop-ups are basically free outs.
  • Hard%: Hard-hit rate does not correlate with BABIP, producing a 0.03 R². This is the impetus for the rest of this post.
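
As a rough illustration of the two calculations in the list above, here is a minimal Python sketch. The function names and the five sample pitchers are made up for illustration (they are not real data), and R² is assumed to be the squared Pearson correlation, which is what a one-variable regression would report.

```python
def pop_up_rate(iffb_pct, fb_pct):
    """Pop-ups per batted ball: the share of fly balls that stay in the
    infield (IFFB%) times the share of batted balls that are flies (FB%)."""
    return iffb_pct * fb_pct


def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)


# Hypothetical pitchers, NOT real data: (IFFB%, FB%, BABIP)
sample = [
    (0.14, 0.42, 0.281),
    (0.08, 0.35, 0.305),
    (0.11, 0.38, 0.296),
    (0.06, 0.30, 0.299),
    (0.13, 0.45, 0.288),
]

pu_rates = [pop_up_rate(iffb, fb) for iffb, fb, _ in sample]
babips = [babip for _, _, babip in sample]

print(f"R^2 of PU% vs. BABIP: {r_squared(pu_rates, babips):.3f}")
```

With a real leaderboard export, the same two functions would apply to each pitcher's IFFB%, FB% and BABIP columns.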

All together, these metrics do very little to tell us anything meaningful about pitcher BABIP allowed. Line drives result in hits almost 70% of the time and infield flies, for all intents and purposes, are automatic outs. Yet we struggle to demonstrate any kind of meaningful relationship between them and pitcher BABIP. Likewise, you’d think a pitcher allowing frequent hard contact should struggle more than one who doesn’t, yet no dice.

Enter Statcast. It has revolutionized, or at least begun to revolutionize, how we consume baseball player data. (You may or may not be aware of the alleged Launch Angle Revolution in progress.) There are so many tools at our disposal (exit velocity, launch angle, barrels and so on) that we can basically predict a batted ball’s hit probability based on the results of previous batted balls with similar exit velocities and launch angles. (Statcast’s expected wOBA, or xwOBA, can tell us not only the probability of a hit but also the probable value of that hit.)
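
To make the "similar exit velocities and launch angles" idea concrete, here is a toy Python sketch that bins batted balls and looks up the share of hits in each bin. This is not Statcast's actual method; the bin widths, function names and the handful of batted balls are all assumptions chosen for illustration.

```python
from collections import defaultdict


def bucket(exit_velo, launch_angle, ev_width=3.0, la_width=5.0):
    """Map a batted ball to a coarse (exit velocity, launch angle) bin."""
    return (int(exit_velo // ev_width), int(launch_angle // la_width))


def build_hit_table(batted_balls):
    """Tally [hits, total] per bin from (exit_velo, launch_angle, was_hit) rows."""
    counts = defaultdict(lambda: [0, 0])
    for exit_velo, launch_angle, was_hit in batted_balls:
        cell = counts[bucket(exit_velo, launch_angle)]
        cell[0] += int(was_hit)
        cell[1] += 1
    return counts


def hit_probability(table, exit_velo, launch_angle):
    """Share of earlier batted balls in the same bin that were hits,
    or None when no earlier batted ball landed in that bin."""
    key = bucket(exit_velo, launch_angle)
    if key not in table:
        return None
    hits, total = table[key]
    return hits / total


# Hypothetical batted balls, NOT real data: (exit velocity in mph, launch angle in degrees, hit?)
history = [
    (101.2, 12.0, True),
    (100.5, 13.5, True),
    (99.8, 11.0, False),
    (84.0, 48.0, False),
    (85.1, 46.0, False),
]

table = build_hit_table(history)
print(hit_probability(table, 100.0, 12.5))  # hard line drive: 2 of 3 similar balls were hits
print(hit_probability(table, 84.5, 47.0))   # soft pop-up: none of the similar balls were hits
```

A real model would smooth across neighboring bins and use far more batted balls, but the lookup logic is the same.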

The granularity of this data has seemingly demystified many lingering sabermetric arguments. Yet… yet. I’m still struggling to grapple with the enigma that is Robbie Ray. (Yeah, it’s [yet] another post about Robbie Ray and BABIP.) Somehow, it always comes back to Ray.

Ray got BABIP’d to death last year, allowing a historically bad .352 mark in a full season’s work. Now, his BABIP is 75 points lower than last season’s (and 50 points lower than his career mark), and Statcast validates it: his .206 expected batting average (xBA) stays true to his actual .203 batting average allowed, even though he allows hard contact at a rate that almost paces the league (as sorted by 95 MPH+ percentage). Statcast, with its robust and granular data, accounting for exit velocities and launch angles and all that, defends Ray’s performance.

I want to defend it, too, but I’m still not totally sold. It’s really hard for me to swallow the fact that it’s somehow OK that he allows as much hard contact as he does. Statcast’s leaderboard, as previously linked, helpfully provides many of its cornerstone metrics in shorthand as percentages or averages. I’ll use a snippet from Tony Blengino’s contact survivors piece a month ago to further my cause:

Well, first off, let’s not get overly excited. Ray has been extremely lucky across all BIP types, with his Unadjusted Contact Score well below his adjusted mark on flies (92 vs. 119), liners (62 vs. 110), grounders (104 vs. 132) and overall (81 vs. 115). Hitters are batting only .500 AVG-.690 SLG on liners, compared to the MLB average of .650 AVG-.869 SLG. Not a lot of skill in that; Ray’s average liner authority of 96.3 mph is second highest of the above hurlers.

I wanted to strip away all of the extra context and focus exclusively on exit velocity. Such focus assumes that pitchers don’t necessarily have control over where hitters put the ball in play. I think this is more truthful than not. Data-driven pitchers probably understand how their offerings fare by hitter handedness, pitch location in or out of the zone, etc., but they can’t reasonably coax a particular batted ball type, let alone a particular batted ball type in a particular direction, at any given moment. Statcast’s xwOBA and xBA credit the pitcher for these phenomena, and I don’t think pitchers necessarily deserve that credit in full.

Statcast’s leaderboard simplifies its robust database, and its shorthand representations made it easy for me to run a quick regression of BABIP on the percentage of batted ball events (BBE) hit 95+ MPH (among pitchers in 2017 who have allowed at least 190 BBE). Nothing revolutionary, just putting what I perceive to be the collective intuition into code. The correlation was weak bordering on moderate, registering a 0.16 R²; incorporating barrel percentage (barrels per BBE, or Brls/BBE) improves the R² to 0.19, which is encroaching upon legitimate “moderate correlation” territory. (There’s a moderate-to-strong correlation between 95+ MPH% and Brls/BBE.) It’s not a lot, but it’s the best I’ve seen. The correlations are weaker for 2016, which suggests to me MLBAM’s tools and technologies have improved (unsurprisingly).
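
For readers who want to try this kind of regression themselves, here is a minimal pure-Python sketch of ordinary least squares with one and then two predictors. It is not the code behind the numbers above: the helper names are my own, and the fit uses only a dozen rows copied from the table in this post, so its coefficients and R² values will not match the full-sample 0.16 and 0.19.

```python
def solve(a, b):
    """Solve the linear system a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def ols(predictors, ys):
    """Least-squares fit via the normal equations; returns [intercept, b1, b2, ...]."""
    design = [[1.0] + list(row) for row in predictors]
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(design, ys)) for i in range(k)]
    return solve(xtx, xty)


def predict(coefs, row):
    """Fitted value for one pitcher's predictor values."""
    return coefs[0] + sum(b * x for b, x in zip(coefs[1:], row))


def r_squared(coefs, predictors, ys):
    """In-sample R^2: 1 minus residual sum of squares over total sum of squares."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - predict(coefs, row)) ** 2 for row, y in zip(predictors, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Subset of the table in this post: (95+ MPH%, Brls/BBE, BABIP)
rows = [
    (0.314, 0.043, 0.217), (0.424, 0.070, 0.275), (0.283, 0.068, 0.248),
    (0.262, 0.021, 0.254), (0.305, 0.062, 0.292), (0.421, 0.100, 0.336),
    (0.198, 0.017, 0.272), (0.431, 0.102, 0.347), (0.287, 0.053, 0.347),
    (0.368, 0.087, 0.371), (0.257, 0.036, 0.289), (0.324, 0.051, 0.288),
]
babip = [r[2] for r in rows]
hard_only = [[r[0]] for r in rows]
hard_and_barrels = [[r[0], r[1]] for r in rows]

coefs_one = ols(hard_only, babip)
coefs_two = ols(hard_and_barrels, babip)
r2_one = r_squared(coefs_one, hard_only, babip)
r2_two = r_squared(coefs_two, hard_and_barrels, babip)
print(f"95+ MPH% only: R^2 = {r2_one:.3f}")
print(f"95+ MPH% and Brls/BBE: R^2 = {r2_two:.3f}")
```

Adding a second predictor can never lower the in-sample R², so the modest bump from barrels should be read with that in mind.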

Accordingly, I calculated new expected BABIPs (hey, xBABIP!) for each pitcher based on both specifications. The former (95+ MPH% only) for Ray: .325 xBABIP. The latter (95+ MPH% and Brls/BBE): .332 xBABIP. These are deliberately simplified models, but they fundamentally contradict what Statcast is telling me, telling us. Ray has the 7th-largest discrepancy between his BABIP and xBABIP; the other six have BABIPs lower than .230, so it’s understandable why Statcast doubts them (looking at you, Ervin Santana).
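
The discrepancy ranking can be reproduced from the table in this post. The sketch below assumes the gap is the average of the two xBABIPs minus actual BABIP, which is what the table's last two columns work out to; the function name is my own, and because the inputs are the rounded table values, a printed gap can differ from the table by a point.

```python
def xbabip_gap(babip, xbabip_1, xbabip_2):
    """Average of the two xBABIP estimates minus actual BABIP.
    Positive means the pitcher's BABIP looks lower than his contact quality suggests."""
    return (xbabip_1 + xbabip_2) / 2 - babip


# Rows copied from the table in this post: (player, BABIP, xBABIP 1, xBABIP 2)
rows = [
    ("Robbie Ray", 0.275, 0.325, 0.332),
    ("Kevin Gausman", 0.371, 0.307, 0.301),
    ("Ervin Santana", 0.217, 0.289, 0.296),
    ("Ian Kennedy", 0.215, 0.293, 0.277),
    ("Matt Harvey", 0.252, 0.300, 0.293),
    ("Ariel Miranda", 0.220, 0.289, 0.285),
    ("Dallas Keuchel", 0.222, 0.286, 0.288),
    ("Lance Lynn", 0.220, 0.285, 0.277),
    ("Max Scherzer", 0.226, 0.284, 0.285),
]

# Largest positive gap first
ranked = sorted(rows, key=lambda r: xbabip_gap(r[1], r[2], r[3]), reverse=True)

for rank, (name, babip, x1, x2) in enumerate(ranked, start=1):
    print(f"{rank}. {name}: BABIP {babip:.3f}, gap {xbabip_gap(babip, x1, x2):+.3f}")
```

Ray lands seventh in this subset, behind six pitchers whose BABIPs all sit below .230.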

I fully support having more data rather than less; that goes without saying. But I think we’ve lost the forest for the trees. This isn’t to say these results override Statcast ‘expected’ metrics. It’s just that these results are tangible and support what I think many of us intuit about pitchers who allow hard contact: that if we see it on one side of the coin (for hitters), we should somehow see it manifest on its other side as well. It can be theoretically divisive, though, depending on whether you think, like I do, that pitchers have less control over where batted balls go upon contact than we may assume. The easy but probably most honest answer is that the truth likely lies somewhere in the middle, blurred by variance.

So, that’s it. It’s not a big revelation; it just makes me feel a little better about believing that contact quality allowed actually means something. Although I had defended Ray and his inflated BABIP previously, I have had a hard time defending his low BABIP (and remarkably high strand rate) now.

All that said, I can’t leave you hanging without a table. Here’s your table of BABIPs versus xBABIPs.

2017 Pitcher BABIP vs. xBABIP
Player  95+ MPH%  Brls/BBE  BABIP  xBABIP 1  xBABIP 2  Avg xBABIP  Diff (Avg xBABIP minus BABIP)
Ervin Santana .314 .043 .217 .289 .296 .292 .075
Ian Kennedy .327 .102 .215 .293 .277 .285 .070
Ariel Miranda .316 .072 .220 .289 .285 .287 .067
Dallas Keuchel .304 .052 .222 .286 .288 .287 .065
Lance Lynn .303 .077 .220 .285 .277 .281 .061
Max Scherzer .299 .054 .226 .284 .285 .284 .058
Robbie Ray .424 .070 .275 .325 .332 .329 .054
Matt Harvey .349 .087 .252 .300 .293 .297 .045
Hector Santiago .388 .116 .263 .313 .298 .306 .043
Jake Odorizzi .346 .083 .256 .299 .293 .296 .040
Jeremy Hellickson .312 .070 .248 .288 .284 .286 .038
Ivan Nova .358 .063 .267 .303 .306 .305 .038
Antonio Senzatela .342 .063 .264 .298 .300 .299 .035
Carlos Martinez .347 .063 .266 .300 .302 .301 .035
Ubaldo Jimenez .376 .089 .272 .309 .304 .307 .035
Mike Leake .356 .052 .272 .303 .310 .306 .034
Dylan Bundy .368 .084 .271 .307 .302 .304 .033
Sean Manaea .391 .047 .289 .314 .327 .321 .032
John Lackey .367 .079 .274 .306 .304 .305 .031
Sonny Gray .388 .058 .287 .313 .321 .317 .030
Nick Martinez .311 .078 .254 .288 .280 .284 .030
Jesse Chavez .401 .089 .286 .318 .314 .316 .030
Yu Darvish .327 .064 .263 .293 .293 .293 .030
Jose Urena .290 .089 .245 .281 .267 .274 .029
Clayton Kershaw .283 .068 .248 .279 .272 .275 .027
Dan Straily .266 .067 .243 .273 .265 .269 .026
Gio Gonzalez .300 .055 .259 .284 .285 .285 .026
Alex Wood .262 .021 .254 .272 .282 .277 .023
Mike Pelfrey .347 .063 .278 .300 .302 .301 .023
Mike Fiers .328 .077 .269 .293 .288 .291 .022
Alex Cobb .343 .051 .282 .298 .305 .302 .020
CC Sabathia .339 .047 .282 .297 .305 .301 .019
Tim Adleman .315 .062 .270 .289 .288 .289 .019
Matt Shoemaker .344 .094 .276 .299 .288 .293 .017
Edinson Volquez .327 .054 .278 .293 .297 .295 .017
Ricky Nolasco .401 .100 .298 .318 .310 .314 .016
Derek Holland .411 .091 .307 .321 .318 .319 .012
Jose Berrios .276 .061 .262 .276 .272 .274 .012
Jaime Garcia .339 .052 .288 .297 .303 .300 .012
Taijuan Walker .355 .048 .295 .302 .311 .307 .012
Andrew Cashner .313 .034 .282 .288 .299 .294 .012
Kyle Freeland .331 .046 .287 .294 .302 .298 .011
Miguel Gonzalez .371 .068 .298 .308 .310 .309 .011
Julio Teheran .295 .070 .269 .283 .277 .280 .011
JC Ramirez .354 .066 .294 .302 .304 .303 .009
Matt Garza .336 .052 .290 .296 .301 .299 .009
Erasmo Ramirez .367 .085 .296 .306 .301 .304 .008
Tyler Chatwood .292 .050 .275 .282 .283 .282 .007
Jason Vargas .288 .040 .276 .280 .286 .283 .007
Luis Severino .354 .061 .297 .302 .306 .304 .007
Corey Kluber .324 .051 .288 .292 .297 .294 .006
Adalberto Mejia .322 .080 .282 .291 .284 .288 .006
R.A. Dickey .275 .060 .269 .276 .272 .274 .005
Jordan Montgomery .317 .075 .282 .290 .284 .287 .005
Andrew Triggs .338 .057 .294 .297 .300 .299 .005
Johnny Cueto .355 .084 .295 .302 .297 .300 .005
Stephen Strasburg .318 .073 .285 .290 .285 .288 .003
Trevor Williams .310 .058 .285 .287 .288 .288 .003
Jose Quintana .348 .055 .301 .300 .305 .303 .002
Carlos Carrasco .333 .082 .290 .295 .288 .292 .002
Zack Greinke .274 .069 .271 .276 .268 .272 .001
Jharel Cotton .304 .043 .288 .286 .291 .288 .000
Chase Anderson .259 .040 .272 .271 .273 .272 .000
Michael Fulmer .271 .042 .277 .275 .278 .276 -.001
Mike Foltynewicz .328 .066 .294 .293 .292 .293 -.001
Aaron Nola .306 .045 .290 .286 .291 .289 -.001
Marcus Stroman .379 .067 .315 .310 .314 .312 -.003
Gerrit Cole .344 .087 .298 .299 .291 .295 -.003
Yovani Gallardo .339 .055 .304 .297 .302 .299 -.005
Michael Pineda .332 .061 .302 .295 .296 .295 -.007
Chris Sale .305 .062 .292 .286 .284 .285 -.007
James Paxton .286 .023 .293 .280 .292 .286 -.007
Jhoulys Chacin .296 .063 .289 .283 .280 .281 -.008
Jake Arrieta .320 .055 .300 .291 .293 .292 -.008
Justin Verlander .373 .082 .316 .308 .305 .307 -.009
Jacob deGrom .298 .066 .292 .284 .280 .282 -.010
Chris Archer .387 .052 .329 .313 .323 .318 -.011
Masahiro Tanaka .373 .097 .316 .308 .299 .304 -.012
Tanner Roark .350 .056 .316 .301 .306 .303 -.013
Ty Blach .297 .046 .298 .283 .287 .285 -.013
Zach Davies .335 .064 .310 .296 .296 .296 -.014
Joe Biagini .318 .047 .307 .290 .296 .293 -.014
Chad Kuhl .368 .054 .325 .307 .314 .311 -.014
German Marquez .362 .068 .320 .305 .306 .305 -.015
Trevor Bauer .421 .100 .336 .324 .319 .321 -.015
Lance McCullers .300 .042 .303 .284 .290 .287 -.016
Jordan Zimmermann .348 .085 .313 .300 .293 .297 -.016
Kenta Maeda .257 .036 .289 .270 .274 .272 -.017
Mike Montgomery .261 .035 .292 .271 .276 .274 -.018
Brandon McCarthy .198 .017 .272 .250 .256 .253 -.019
Robert Gsellman .371 .060 .331 .308 .313 .310 -.021
Jerad Eickhoff .364 .057 .329 .305 .311 .308 -.021
Jesse Hahn .348 .048 .326 .300 .308 .304 -.022
Matt Moore .431 .102 .347 .327 .322 .325 -.022
Drew Pomeranz .354 .071 .325 .302 .302 .302 -.023
Scott Feldman .256 .063 .292 .270 .263 .266 -.026
Kyle Gibson .382 .088 .335 .311 .307 .309 -.026
Bronson Arroyo .309 .102 .305 .287 .270 .278 -.027
Joe Ross .356 .069 .330 .303 .303 .303 -.027
Matt Cain .347 .053 .330 .300 .306 .303 -.027
Jason Hammel .337 .083 .321 .296 .289 .293 -.028
Marco Estrada .348 .086 .325 .300 .293 .296 -.029
Hyun-Jin Ryu .318 .073 .317 .290 .285 .288 -.029
Luis Perdomo .362 .051 .339 .305 .313 .309 -.030
Zack Wheeler .342 .079 .327 .298 .293 .296 -.031
Wade Miley .364 .057 .341 .305 .311 .308 -.033
Joe Musgrove .336 .060 .330 .296 .298 .297 -.033
Francisco Liriano .333 .071 .328 .295 .293 .294 -.034
Jeff Samardzija .309 .053 .323 .287 .290 .288 -.035
Danny Duffy .283 .054 .313 .279 .278 .278 -.035
Patrick Corbin .372 .076 .346 .308 .307 .308 -.038
Jimmy Nelson .325 .043 .337 .292 .300 .296 -.041
Josh Tomlin .363 .069 .347 .305 .306 .306 -.041
Jon Lester .273 .052 .317 .275 .274 .275 -.042
Bartolo Colon .394 .072 .360 .315 .318 .317 -.043
Rick Porcello .360 .082 .346 .304 .300 .302 -.044
Clayton Richard .311 .044 .338 .288 .294 .291 -.047
Adam Wainwright .336 .049 .347 .296 .303 .299 -.048
Daniel Norris .366 .071 .359 .306 .307 .306 -.053
Tyler Anderson .316 .088 .337 .289 .278 .284 -.053
Martin Perez .349 .054 .357 .300 .306 .303 -.054
Michael Wacha .287 .053 .347 .280 .280 .280 -.067
Kevin Gausman .368 .087 .371 .307 .301 .304 -.067
SOURCE: Statcast leaderboard
Min. 190 batted ball events (BBE)
Data extracted during All-Star Break (prior to July 14, 2017 games)
xBABIP 1 = 95+ MPH% only
xBABIP 2 = 95+ MPH% and Brls/BBE






jdbolick
Member

Ray’s exit velocity seems like it’s coming at the same issue from a different angle as the recent discussion of Byung-ho Park. Park was running extremely high exit velocities on contact, so FanGraphs readers were shocked when the Twins released him, but a high average exit velocity doesn’t mean much if you’re rarely making contact. Similarly, Ray allowing a high average exit velocity doesn’t mean all that much when he’s allowing the third lowest rate of contact among qualified pitchers. At extremes like this, it appears that averages are not the best way of describing proficiency, as they lose sight of the number of times that no contact is made.

Mike Podhorzer
Editor
Member

This is all moot though because BABIP only includes balls in play. Maybe you’re arguing that the denominator is low so BABIP is more prone to significant fluctuation in such an instance like for Ray? So perhaps it’s silly to overanalyze his BABIP when the answer is that it’s just random and is less likely to match his high EV allowed or Hard%?

jdbolick
Member

Not random, just more noisy.