New Hitter xBABIP Based on BIS Batted Ball Data

You may have noticed that FanGraphs now feeds batted ball data, courtesy of Baseball Info Solutions, into its leaderboards. The day the data appeared, my mind buzzed with ways they could be useful in improving our understanding of a hitter’s batting average on balls in play (BABIP).

Mike Podhorzer already augmented previous attempts at devising an equation for expected batting average on balls in play (xBABIP) for hitters by incorporating elements of a hitter’s power, speed, plate discipline and batted ball tendencies. So, with fresh numbers in hand, I embarked on a journey to further improve the ever-evolving xBABIP. However, I sought to do so by using only batted ball data. Basically, I intended to develop a convenient xBABIP equation, one that can be computed almost entirely from variables found on the same page.

What I ultimately developed is a hitter xBABIP that is more a complement to Mike’s xBABIP than a substitute. That is, it is arguably neither better nor worse than Mike’s equation, but it is different. I will explain what I mean in due time.

I chose batted ball variables that I thought would correlate well with BABIP (obviously):

  • LD%, True FB% and True IFFB%: In his introduction to the new batted ball data, Tony Blengino demonstrated the frequencies with which certain types of batted balls turn into hits. Thus, optimal proportions of certain batted ball types could maximize BABIP. True IFFB% represents infield fly balls as a percentage of all balls in play, not just fly balls; it is calculated by multiplying IFFB% and FB%. True FB% denotes all fly balls minus infield flies. Econometric note: Because the fractions of every batted ball type sum to one (100 percent), I must omit one of them or else the regression will do it for me. That is why you do not see ground ball rate here.
  • Hard%: Hard%, one of the new statistics, indicates how often a player hit a ball hard. (Who knew?) Eno Sarris was as surprised as I was to find that there is “virtually no correlation” between line drive rate and hard-hit percentage.
  • Oppo%: Oppo%, another new statistic, indicates how often a player hit to the opposite field. One could argue in favor of using Pull%, the percentage of balls pulled, which would likely be negatively correlated with BABIP, especially for guys who encounter a ton of infield shifts. But righties experience shifts less often, so Pull% might not adequately capture the effect we might seek.
  • Spd: I originally wanted to include infield hit rate (IFH%) so the equation would consist entirely of batted-ball variables. The idea was to capture (skilled) hits by bunts and (lucky) hits by dinkers and dribblers. However, per Jeff Zimmerman’s insight, I reconsidered including it because infield hits are not necessarily exogenous to BABIP: they are themselves hits on balls in play. A hitter’s speed score (Spd), on the other hand, is independent of infield hits; in other words, infield hits are a function of speed, not the other way around.
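The two derived rates in the first bullet can be sketched in a few lines of Python. This is only an illustration of the arithmetic described above; the function name and the choice to pass rates as fractions (0.35 rather than 35) are my assumptions, not anything FanGraphs provides.

```python
def true_batted_ball_rates(fb_pct, iffb_pct):
    """Return (true_iffb, true_fb) as fractions of all balls in play.

    fb_pct   -- fly balls as a fraction of balls in play (e.g. 0.35)
    iffb_pct -- infield flies as a fraction of fly balls (e.g. 0.10)
    Both inputs are assumed to be fractions, not whole-number percentages.
    """
    true_iffb = iffb_pct * fb_pct   # infield flies per ball in play
    true_fb = fb_pct - true_iffb    # fly balls excluding infield flies
    return true_iffb, true_fb
```

By construction the two outputs add back up to FB%, which is why ground ball rate remains the one omitted category.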

This model specification could be considered an expansion on Jeff’s work regarding hitter analytics in which he uses the aforementioned Hard% and Spd to generate expected BABIP values.
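For readers who want to fit a specification like this themselves, here is a minimal sketch of an ordinary least squares fit with an adjusted R-squared, using NumPy. It is not the code behind this article; `fit_ols` is a hypothetical helper, and `X` is assumed to hold one row per hitter-season and one column per explanatory variable, with one batted ball share left out.

```python
import numpy as np

def fit_ols(X, y):
    """Fit y = b0 + X @ b by least squares.

    Returns (coef, adj_r2), where coef[0] is the intercept and coef[1:]
    are the slopes in the column order of X.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape                              # observations, regressors
    design = np.column_stack([np.ones(n), X])   # prepend an intercept column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    ss_res = float(resid @ resid)               # residual sum of squares
    ss_tot = float(((y - y.mean()) ** 2).sum()) # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
    return coef, adj_r2
```

A dedicated statistics package would also report standard errors and diagnostics; this sketch only recovers the coefficients and the fit statistic.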

I limited the sample to all qualified hitters from 2002 through 2014, good for 1,971 observations. What follows are the results from the OLS regression:

BABIP vs xBABIP

xBABIP = .1975 - .4383*(True IFFB%) - .0914*(True FB%) + .2594*LD% + .1822*Hard% + .1198*Oppo% + .0042*Spd
Adjusted R-squared = .456
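To apply the equation to a single hitter, a small Python function is enough. The coefficients are copied from the equation above. The function and argument names are mine, and I am assuming every rate is entered as a fraction (LD% of 21 percent as 0.21), which is the scale on which the intercept and coefficients produce BABIP-like values.

```python
# Coefficients copied from the regression equation in this article.
INTERCEPT = 0.1975
COEFFICIENTS = {
    "true_iffb": -0.4383,
    "true_fb": -0.0914,
    "ld": 0.2594,
    "hard": 0.1822,
    "oppo": 0.1198,
    "spd": 0.0042,
}

def xbabip(true_iffb, true_fb, ld, hard, oppo, spd):
    """Expected BABIP from batted ball rates (as fractions) and speed score."""
    inputs = {
        "true_iffb": true_iffb,
        "true_fb": true_fb,
        "ld": ld,
        "hard": hard,
        "oppo": oppo,
        "spd": spd,
    }
    return INTERCEPT + sum(COEFFICIENTS[name] * value
                           for name, value in inputs.items())
```

For a made-up hitter with a True IFFB% of 3 percent, True FB% of 30 percent, LD% of 21 percent, Hard% of 30 percent, Oppo% of 25 percent and a Spd of 4.5, `xbabip(0.03, 0.30, 0.21, 0.30, 0.25, 4.5)` works out to roughly .315.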

In light of Mike’s adjusted R-squared of .424, it’s clear that more granular batted ball data, on their own, barely improve our understanding of BABIP, let alone improve it significantly. (I’ll interject and say it’s unwise to judge a model strictly by its R-squared, as there are a variety of statistical tests one can perform to test a model’s validity. But, alas, it is commonly used and more easily understood.) Even the improvement in year-to-year correlation is only a slight upgrade over Mike’s results:

Y1 BABIP to Y2 BABIP: .4072
Y1 xBABIP to Y2 BABIP: .4712
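A year-to-year check like this one can be sketched as a correlation between a Year 1 metric and the same hitters’ Year 2 BABIP. The snippet assumes a Pearson correlation and two equal-length sequences already paired by hitter; both the function name and that pairing step are assumptions for illustration.

```python
import numpy as np

def year_to_year_correlation(year1_values, year2_babip):
    """Pearson correlation between a Year 1 metric and Year 2 BABIP.

    Both inputs must be ordered so that position i refers to the same hitter.
    """
    x = np.asarray(year1_values, dtype=float)
    y = np.asarray(year2_babip, dtype=float)
    if x.shape != y.shape:
        raise ValueError("inputs must have the same length")
    return float(np.corrcoef(x, y)[0, 1])
```

Calling it once with Year 1 BABIP and once with Year 1 xBABIP, against the same Year 2 BABIP values, gives the two numbers being compared.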

So what can we conclude? For one, there isn’t necessarily a “correct” or “better” way to approach xBABIP, at least not yet. I’m sure if we threw the kitchen sink at the problem, everything would fall into place. But the sink would probably break and it would be really messy and no one would want to clean it up and that’s why we can’t have nice things. From what I observe, the new spray statistics (Pull%, Cent%, Oppo%) replace, rather than augment, absolute average angle, as used by Mike and provided by Baseball Heat Maps, and isolated power (ISO) serves as a proxy for the various degrees of contact quality as represented by Hard%, Med% and Soft%. Ultimately, it appears that despite having more precise batted ball data, we are not much closer to explaining away the luck component of BABIP, at least judging by my attempt here, which is far from the be-all and end-all.

While my equation appears to be “better” at first glance, we won’t know for sure until the xBABIPs from Mike’s and my equations are compared side by side (say, by seeing which one produces the lower root mean squared error, or RMSE). Until then, indulge in the xBABIPs of 2015’s qualified hitters provided below. “Diff” represents the difference between xBABIP and BABIP; I conditionally formatted the cells so that blue indicates an overachiever and red an underachiever.
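The side-by-side comparison mentioned above could be sketched with a plain RMSE function: compute it once per equation against the same actual BABIPs, and the lower value fits better. The function name is hypothetical, and nothing here reproduces either equation’s actual error.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between predicted and actual values."""
    if len(predicted) != len(actual) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

Because RMSE is in the same units as BABIP, a difference between the two equations can be read directly in points of batting average.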





Two-time FSWA award winner, including 2018 Baseball Writer of the Year, and 8-time award finalist. Featured in Lindy's magazine (2018, 2019), Rotowire magazine (2021), and Baseball Prospectus (2022, 2023, 2024, 2025). Biased toward a nicely rolled baseball pant.

33 Comments
TangoAlphaLima
10 years ago

Alex, I’m not sure if it’s a problem on my end, but the Workbook isn’t loading. Error says it can’t be opened.

Jackie T.
10 years ago

Same here.

dj_mosfett
10 years ago

Nope, that was me. It’s still doing it, but again, thank you for taking a look into it!

obsessivegiantscompulsive
10 years ago

I had the problem of it not being opened when my tab first opened, with there being an error noted, but the problem cleared up once I refreshed.

It could just be MS’s Azure cloud infrastructure. After refreshing, I tried to open it in another tab, and that failed too; the tab showed a service unavailability error: “We are currently experiencing technical difficulties. Please try again later.”

I eventually got it open, downloaded it, opened it in Google, then converted to Sheets format, in order to play around with it (using Chromebook).