2009 BABIP-xBABIP Splits

Yesterday, we took a look at the starting pitchers with the biggest difference between their ERAs and their Expected Fielding Independent ERAs, attempting to find which hurlers performed above or below their peripheral stats in 2009.

Today, let’s turn our attention to the hitters. I compiled a list of the batters (minimum 350 plate appearances) with the biggest gap between their batting average on balls in play (BABIP) and their expected batting average on balls in play (xBABIP).
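As a rough sketch of how a list like this could be put together, the snippet below computes BABIP with the standard formula, (H - HR) / (AB - K - HR + SF), applies the 350 plate appearance minimum, and sorts by the gap between BABIP and xBABIP. The stat lines, field names and xBABIP values are made-up placeholders rather than 2009 data, and the xBABIP figure is assumed to come from an outside calculator.

```python
from dataclasses import dataclass

MIN_PA = 350  # plate appearance cutoff used in this article


@dataclass
class HitterLine:
    name: str
    pa: int
    ab: int
    h: int
    hr: int
    k: int
    sf: int
    xbabip: float  # assumed to come from an external xBABIP calculator


def babip(line: HitterLine) -> float:
    """Standard BABIP: (H - HR) / (AB - K - HR + SF)."""
    balls_in_play = line.ab - line.k - line.hr + line.sf
    return (line.h - line.hr) / balls_in_play


def babip_gaps(lines):
    """Return (name, BABIP - xBABIP) pairs, largest positive gap first."""
    qualified = [p for p in lines if p.pa >= MIN_PA]
    gaps = [(p.name, babip(p) - p.xbabip) for p in qualified]
    return sorted(gaps, key=lambda pair: pair[1], reverse=True)


# Hypothetical stat lines for illustration only.
sample = [
    HitterLine("Hitter A", pa=600, ab=540, h=170, hr=20, k=100, sf=5, xbabip=0.310),
    HitterLine("Hitter B", pa=500, ab=450, h=110, hr=10, k=90, sf=4, xbabip=0.305),
    HitterLine("Hitter C", pa=200, ab=180, h=60, hr=5, k=30, sf=1, xbabip=0.300),
]

for name, gap in babip_gaps(sample):
    print(f"{name}: {gap:+.3f}")
```

A positive gap marks a hitter whose BABIP outran his xBABIP, and a negative gap marks the reverse. Hitter C drops out because he falls short of the plate appearance minimum.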

What’s xBABIP? Last winter, Chris Dutton and Peter Bendix sought to find which variables were most strongly correlated with a batter’s BABIP. Using data from the 2002-2008 seasons, Dutton and Bendix found that a hitter’s eye (BB/K ratio), line drive percentage, speed score and pitches per plate appearance had a positive relationship with BABIP (the better a batter rated in those areas, the higher his BABIP). Pitches per extra-base hit, fly ball/ground ball rate, spray (distribution of hits to the entire field) and contact rate had a negative relationship with BABIP. From this research, they created a model for predicting a batter’s BABIP.

Prior to Dutton and Bendix’s work, a lot of people used to calculate a hitter’s expected batting average on balls in play by taking line drive rate and adding .120. It made some sense: line drives have the highest batting average of any batted ball type by far, falling for a hit well over 70 percent of the time.
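The old rule of thumb is simple enough to write out in one line. This is only an illustration of the arithmetic; the function name and the 19 percent line drive rate are invented for the example.

```python
def ld_plus_120(ld_rate: float) -> float:
    """Old rule of thumb: expected BABIP = line drive rate + .120.

    ld_rate is a fraction of batted balls (0.19 means 19 percent).
    """
    return ld_rate + 0.120


# Hypothetical hitter with a 19 percent line drive rate.
print(f"{ld_plus_120(0.19):.3f}")  # prints 0.310
```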

However, line drive rates don’t show a high correlation from year to year. That makes the “LD% plus .120” method unreliable. Dutton and Bendix’s model showed a 59 percent correlation between actual and expected BABIP. The LD +.120 method showed just an 18 percent correlation.
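For readers who want to check year-to-year stability on their own data, here is a minimal sketch of a correlation calculation. It assumes "correlation" means a Pearson correlation coefficient, which the figures above do not specify, and the line drive rates in it are invented for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Note: a sequence with no variation at all would divide by zero here.
    return cov / sqrt(var_x * var_y)


# Made-up line drive rates for the same five hitters in back-to-back seasons.
ld_year_1 = [0.17, 0.21, 0.19, 0.23, 0.18]
ld_year_2 = [0.20, 0.18, 0.19, 0.20, 0.21]

print(f"year-to-year LD% correlation: {pearson_r(ld_year_1, ld_year_2):+.2f}")
```

The same function could be pointed at actual and expected BABIP columns to compare two xBABIP methods against each other.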

Some of the numbers used in Dutton and Bendix’s study are not readily available. However, Derek Carty of The Hardball Times and Slash12 of Beyond the Box Score have both come up with expected batting average on balls in play calculators based on the new findings.

For the purposes of this article, I used Slash12’s calculator. It uses the following variables:
– Line Drive Percentage (LD%)
– Ground Ball Percentage (GB%)
– Fly Ball Percentage (FB%)
– Infield Fly Ball Percentage (IFFB%)
– Home Run/Fly Ball Percentage (HR/FB%)
– Infield Hit Percentage (IFH%)

While not identical to the variables used by Dutton and Bendix, these batted ball numbers do a good job of taking into account the aspects that lead to a higher or lower BABIP.

First, a disclaimer. Like the ERA-xFIP charts from yesterday, these lists of “lucky” and “unlucky” hitters are based on just one year of data. To get a better feel for how a hitter will perform in the future, it’s vital to take a good hard look at multiple seasons’ worth of performance. This is just a quick-and-dirty exercise.

To provide a little more context, I also included each batter’s actual BABIP since 2007, when possible. The three-year averages help us get a better picture of each hitter, and help us figure out which batters might be “tricking” the xBABIP calculator based on one year of aberrant batted ball numbers.

Take Jason Kendall, for instance. Kendall had a 12 percent infield hit rate in 2009, compared to a 7.6 percent career average. The calculator doesn’t know that Kendall’s ankle exploded like a cheap Acme bomb a decade ago, and that he’s a 35-year-old catcher who has posted a BABIP under .270 since 2007. It thinks he has speed due to the infield hit rate. That’s why you need to look at multi-year numbers.

Here are the hitters with actual batting average on balls in play figures exceeding the expected batting average on balls in play numbers. These are the guys who might see their batting averages fall in 2010:

Higher BABIP than xBABIP

And here are the batters with actual BABIPs falling well short of the xBABIP totals. These hitters could experience a bounce-back in 2010:

Lower BABIP than xBABIP

We hoped you liked reading 2009 BABIP-xBABIP Splits by David Golebiewski!

A recent graduate of Duquesne University, David Golebiewski is a contributing writer for Fangraphs, The Pittsburgh Sports Report and Baseball Analytics. His work for Inside Edge Scouting Services has appeared on ESPN.com and Yahoo.com, and he was a fantasy baseball columnist for Rotoworld from 2009-2010. He recently contributed an article on Mike Stanton's slugging to The Hardball Times Annual 2012. Contact David at david.golebiewski@gmail.com and check out his work at Journalist For Hire.

Dan Budreika

One important note about Slash12’s xBABIP calculator: he doesn’t use park factors in it. I prefer The Hardball Times’ calculator because it does consider park factors.

I’d really like to see a study comparing these two xBABIP calculators one day.

Bobby Boden

Slash12 here. Actually, it does take park factors into account, I believe. What does a park affect? The biggest things that come to mind are home run rate, the slowness of the infield, and foul territory. These are all captured in the analysis by HR/FB, IFH%, and IFFB%, respectively.

Still, there’s definitely something missing from this equation, because if you look over a few years, there are a few anomalies that really stick out. For instance, it thinks Ryan Howard and Brandon Phillips should be hitting for much higher BABIPs. I have a feeling it’s something like spray that’s missing, but I am not sure where to get that data.

Spray aside, it still does a pretty decent job of predicting future BABIP as it is (though I suspect it is less accurate for dead pull hitters, or for hitters who go the other way more regularly).

Dan Budreika

I got this from the link of your article introducing the calculator:

“It’s worth noting, that I’m not taking into account ballpark factors (which surely have some kind of effect on BABIP as well)”

I do see how they are kinda incorporated in the %’s from your last post. But you don’t use the ESPN or Firstinning or some other park factor?

And when I try opening the link to your calculator why does it say I do not have permission to access this spreadsheet??

Bobby Boden

Sorry about the Google Docs link. I guess it doesn’t work the way I had hoped (I can’t seem to just make a spreadsheet available to anybody who wants it). The data here is much more interesting than what I’ve got in that spreadsheet anyway (and the download should work, I believe).

I guess I’m contradicting my initial post a little, but in my defense, I’ve done a lot of thinking and research since I wrote the original article. I do currently believe that ballpark factors are worked into the calculator, at least somewhat, but probably not completely. A big outfield probably accounts for more fly balls dropping for hits, for instance, and that’s not factored in anywhere. Park effects are only built in to the extent that the batted ball data I’m working with allows. If I had access to an outfield fly ball hit percentage, that would help incorporate park factors even more.


Actually, according to this study at Hardball Times park factors affect just about everything — including unexpected things like singles and walks and even groundball rates. Intuitively you wouldn’t think that to be the case, and nobody seems to have a good explanation, but the data apparently supports it. I’d love to see a park factors adjustment that incorporates this data (as well as their home run park factors) because our current crude park factors don’t seem to be adequately modelling their effect.

Bobby Boden

I’m not clear on the exact details of their study, but it seems to me like it would be really difficult to factor out the non-ballpark factors that seem much more likely to affect strikeouts in a given ballpark. Some pitching staffs are going to be more likely to record strikeouts, and some teams are going to be less likely to strike out as well. How do you truly factor these out? I don’t think that ESPN’s ballpark factor does a good job of doing so, based on how much so many of their ballpark factors change from year to year.

Home run ballpark factors are the same thing. If you go back and look at previous years’ history, a lot of ballparks will go from a home run ballpark to a pitcher-friendly ballpark (the Twins’ stadium, if I recall, is one).

I believe all they are doing is taking the number of K’s, home runs, or whatever for the visiting team and the home team (and giving more weight to visiting teams). Doing this is going to be highly prone to error, because the home team’s pitching or hitting skill will surely play into those numbers (and that changes every year, unlike the ballpark itself).

Anyway, in summary, I’m just skeptical as to how good these various ballpark factors are, and how much they are really indicative of the ballpark itself, and not just a factor of something else. For instance, if a team has a ton of flyball pitchers on their staff, that’s going to drive the ballpark HR factor way up, no matter what the dimensions are. Same with strikeouts, Walks, etc.

Dan Budreika

Oh so I’m guessing David G. used the formula you had there to compile his own spreadsheet posted above since your link doesn’t work at your post.


Bobby, my understanding of ESPN park factors is that they are derived by comparing home rates vs road rates for the home team, and park rates vs average rates for road teams.

So to use your example, a home team with a lot of flyball pitchers wouldn’t affect the HR park factor, since it’s based on how many more (or fewer) HR’s that pitching staff gives up at home than on the road — not just how many HR’s above (or below) average are hit in that park for the year.

The Red Sox and Mariners serve as decent examples, as their staffs had the 2nd and 3rd highest FB%, respectively, in 2009 but Fenway and Safeco played as the 10th and 3rd worst home run parks. Since the park factor is similar to looking at home-road splits, at least a few seasons of data is required to make a determination of the true park effect, which accounts for the annual swings you mention.
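To make the home-versus-road idea concrete, here is a minimal sketch of a park factor built as a ratio of per-game rates. It is only an illustration of the general approach described above, not a confirmed copy of ESPN’s or The Hardball Times’ method; the function name, argument names and home run counts are all invented.

```python
def home_road_park_factor(home_for, home_against, home_games,
                          road_for, road_against, road_games):
    """Ratio of per-game event rate in a team's home games to its road games.

    Counts both teams' events (for and against), so a fly-ball-heavy staff
    inflates the numerator and the denominator alike.
    """
    home_rate = (home_for + home_against) / home_games
    road_rate = (road_for + road_against) / road_games
    return home_rate / road_rate


# Hypothetical club: 81 home and 81 road games, home run counts invented.
factor = home_road_park_factor(
    home_for=70, home_against=80, home_games=81,
    road_for=90, road_against=110, road_games=81,
)
print(f"HR park factor: {factor:.3f}")  # prints 0.750
```

A value below 1.000 means fewer home runs per game at home than on the road. With only 81 games on each side, a single-season figure like this can swing a good deal, which is why a few seasons of data are needed.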

I agree that it is difficult to say whether the THT study is based on the same methodology.