A New BABIP for a Statcast Era

Last week I wrote about my efforts to build batted ball stats exclusively using data from Statcast. I described my method for classifying launch angles into larger launch windows, and then separating these windows into a series of buckets based upon their launch velocity. Whereas others have used combinations of line drive, opposite field and hard hit rates to construct approximations for launch angles, I am, for better or worse, exclusively using the launch angles and speed, discarding every other facet of the game in the process.

Many have worked towards teasing apart the luck and skill aspects of balls in play. Up until the last calendar year, perhaps the best methods available involved incorporating line drive and opposite field hit rates: line drives because of their significantly higher likelihood of being a base hit, and opposite field hits because they are more likely to be line drives. However, a lot of information is lost in this sort of categorization. For instance, where was the ball hit, and how hard? Using Statcast data, we can build a more granular view of batted balls, and define new types of batted balls, along with their observed characteristics. Now that we are in the Statcast era, the line drive, fly ball, ground ball, and pop up categorical system may become obsolete.

Physics complicates things.

One of the Holy Grail stats in baseball for the past few years has been ‘hang time’, which measures the amount of time available to successfully play a batted ball for an out. With this information, we can find the odds of a given defender successfully playing a ball, given their positioning, range, and arm strength. Using the Statcast data alone, my versions of xOBA and xBABIP cannot directly measure this, but I do attempt to approximate the hang time of the batted ball in a more roundabout manner.

Generally speaking, finding the hang time of a batted ball is a physics problem, and a rather complicated one at that. The number of variables involved is large, and includes air density; wind; the ball’s drag coefficient and spin; and the launch angle and speed. Even knowing these starting conditions isn’t enough, as wind and air density can vary with both the height above the ground and the location within a given ballpark.

The launch angle and speed of the batted ball do not give us enough information to accurately estimate how far the ball will travel or where it will land, but they do give us upper and lower bounds to work with. They give us a basic estimate of where the ball may land, and a general sense of how much time defenders may have to successfully play the ball. It isn’t, by any means, a perfect solution, but it is the best we have right now, and for the foreseeable future.
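To make the idea of a bound concrete, here is a minimal sketch using the textbook drag-free projectile formulas. This is an idealization for illustration only, not the method behind xOBA or xBABIP: it assumes the ball leaves from ground level and ignores drag, spin, and wind. Air drag shortens real flights considerably, so the distance works as a rough ceiling; spin-induced lift can push hang time in either direction, so the time is better read as a reference point than a strict bound. The function name `vacuum_flight` is hypothetical.

```python
import math

G = 9.80665            # standard gravity, m/s^2
MPH_TO_MPS = 0.44704   # miles per hour -> meters per second
M_TO_FT = 3.28084      # meters -> feet


def vacuum_flight(exit_speed_mph, launch_angle_deg):
    """Drag-free hang time (s) and carry distance (ft) for a batted ball.

    Assumption: launched from ground level with no drag, spin, or wind.
    """
    if launch_angle_deg <= 0:
        # A ball hit level or downward has no flight phase in this model.
        return 0.0, 0.0
    v = exit_speed_mph * MPH_TO_MPS
    theta = math.radians(launch_angle_deg)
    hang_time = 2 * v * math.sin(theta) / G
    distance_ft = v * math.cos(theta) * hang_time * M_TO_FT
    return hang_time, distance_ft


hang_time, distance_ft = vacuum_flight(100, 30)
print(f"{hang_time:.2f} s, {distance_ft:.0f} ft")
```

The distance for a 100 mph, 30 degree ball comes out far beyond any real fly ball, which shows how much drag matters and why angle and speed alone only bracket the outcome.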

As I’ve described previously, my system assumes a more naive approach to batted ball categories. I toss out the older GB/FB/LD/PU system entirely, and replace it with one that has, on the other end of the complexity spectrum, roughly 15 thousand ‘types’ of batted balls. These ‘types’ are defined by their launch window, a 5 degree by 5 degree wedge, and exit speed. These 15 thousand ‘types’, I assume, each approximate a range of hang times, landing locations, and time a defender is given to register an out. Any of these assumptions may be wrong, and it all requires a lot more research and refinement, but, I believe, this type of approach offers the greatest ability to tease out skill and luck. Over time, perhaps the 15 thousand batted ball categories will be pruned to a smaller number as certain groups of launch windows and velocities may be combined.
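A minimal sketch of this kind of bucketing follows. It reads the "5 degree by 5 degree wedge" as a vertical launch angle window crossed with a horizontal (spray) angle window, and it assumes 1 mph exit speed bins; the bin width for speed is not stated here, so treat `SPEED_BIN_MPH` as a placeholder. The names `Bucket`, `classify`, and `bucket_hit_rates` are hypothetical.

```python
import math
from collections import defaultdict
from typing import NamedTuple

WINDOW_DEG = 5      # launch window size, per the 5 by 5 degree wedge
SPEED_BIN_MPH = 1   # assumption: the exit speed bin width is not given


class Bucket(NamedTuple):
    vertical_window: int    # lower edge of the vertical angle window (deg)
    horizontal_window: int  # lower edge of the horizontal angle window (deg)
    speed_bin: int          # lower edge of the exit speed bin (mph)


def _floor_to(value, width):
    """Round value down to the nearest multiple of width."""
    return int(math.floor(value / width) * width)


def classify(vertical_deg, horizontal_deg, exit_speed_mph):
    """Assign one batted ball to its launch window and speed bucket."""
    return Bucket(
        _floor_to(vertical_deg, WINDOW_DEG),
        _floor_to(horizontal_deg, WINDOW_DEG),
        _floor_to(exit_speed_mph, SPEED_BIN_MPH),
    )


def bucket_hit_rates(batted_balls):
    """Observed hit rate per bucket.

    batted_balls: iterable of
        (vertical_deg, horizontal_deg, exit_speed_mph, was_hit)
    """
    totals = defaultdict(lambda: [0, 0])  # bucket -> [hits, balls]
    for vertical, horizontal, speed, was_hit in batted_balls:
        counts = totals[classify(vertical, horizontal, speed)]
        counts[0] += int(was_hit)
        counts[1] += 1
    return {bucket: hits / n for bucket, (hits, n) in totals.items()}
```

Keeping the bucket as a plain tuple of lower edges makes it easy to merge neighboring buckets later, should some launch windows and velocities turn out to behave alike.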

The differences between BABIP and xBABIP

There are some pretty significant differences between BABIP and my version of xBABIP. My version of xBABIP is a pure batting stat. It only measures the player’s ability to bat the ball at an angle and speed conducive to reaching base safely, nothing more. BABIP incorporates many other unrelated aspects of the game, such as foot speed, and arbitrarily subtracts certain types of power (balls that happen to be home runs), while including other types of power (near miss home runs, doubles, and triples). However, this leads to an issue I struggle with personally in calculating xBABIP. Every ball has its probability of being a single, double, or triple included, but in order to make xBABIP resemble BABIP numbers, with .300 being roughly average, I am forced to exclude the probability of a home run. This, to me, feels like I am unduly penalizing players for hitting the ball too well. I have been tempted to include this HR% and publish an xBACON stat instead, but I wonder how it will be perceived. Maybe people prefer BABIP due to familiarity. Please let me know how you feel about this particular topic. Is there a reason to keep xBABIP over xBACON that I am overlooking?
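The difference between the two stats can be sketched in a few lines. For reference, traditional BABIP is (H - HR) / (AB - K - HR + SF). The code below is an illustration under assumptions, not the published calculation: it supposes each batted ball already carries per-outcome probabilities, and it assumes the xBABIP denominator removes expected home runs the same way BABIP removes actual ones. The exact denominator is not spelled out above, and sacrifice flies are ignored. `BattedBall`, `x_babip`, and `x_bacon` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class BattedBall:
    # Probabilities for this ball's bucket (hypothetical inputs).
    p_single: float
    p_double: float
    p_triple: float
    p_home_run: float


def x_bacon(balls):
    """Expected batting average on contact: home runs count as hits."""
    expected_hits = sum(
        b.p_single + b.p_double + b.p_triple + b.p_home_run for b in balls
    )
    return expected_hits / len(balls)


def x_babip(balls):
    """Expected BABIP with home run probability excluded.

    Assumption: expected home runs are removed from both the numerator
    and the denominator, mirroring the traditional BABIP formula.
    """
    expected_non_hr_hits = sum(
        b.p_single + b.p_double + b.p_triple for b in balls
    )
    expected_in_play = sum(1.0 - b.p_home_run for b in balls)
    return expected_non_hr_hits / expected_in_play


balls = [
    BattedBall(0.20, 0.05, 0.01, 0.00),  # soft contact, no HR chance
    BattedBall(0.10, 0.10, 0.00, 0.50),  # well struck, often a HR
]
print(f"xBABIP {x_babip(balls):.3f}, xBACON {x_bacon(balls):.3f}")
```

In this toy sample the well-struck ball adds far more to xBACON than to xBABIP, which is the penalty for hitting the ball too well described above.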

Limitations of xBABIP

xBABIP does not currently take shifting data into account, but I am hopeful it will have this ability in the future. I’ve created two simple heat maps (I’m sorry if they look primitive) of the xOBA and xBABIP values with respect to landing location on the field.

[Figures: heat maps of xOBA and xBABIP by landing location on the field]

You can clearly see the defenders’ shadows. In a future iteration of these stats, I hope to further tease apart this data to create a weighted map for the average defender, or perhaps for specific defenders, that I could position on the field for each pitch given the shift data, assuming this shift data includes the starting location of each defender at pitch release or perhaps at the point of contact.

Similarly, I would love to be able to work in running speed in the future. Having everyone’s average speed home to first, home to second, and home to third could help estimate the odds of base hits, especially extra base hits, even more accurately. This is especially true given shift data and defender arm strength, either of which could dramatically influence the in-game value of speed.

I think that pretty much sums up where xBABIP is at the moment. I’ll leave you with an up-to-date spreadsheet of every batter’s BABIP and xBABIP for you to peruse.

You can always find an up-to-date version of my stats in this Google Doc (which you can download).





Andrew Perpetua is the creator of CitiFieldHR.com and xStats.org, and plays around with Statcast data for fun. Follow him on Twitter @AndrewPerpetua.

22 Comments
Ryan Brock
7 years ago

Just gotta say, I’ve been loving the xOBA/xBABIP sheet you posted last week. Wish Fangraphs would host a page featuring all the ‘x’ stats you guys have come up with: wOBA, BABIP, HR/FB, K%, BB%, etc…

cohend1275
7 years ago
Reply to  Ryan Brock

Completely agree Ryan, would love to see it all in a centralized location.

evo34
7 years ago

One question: are all of your “x” stats park-adjusted? That is, do you try to calc. what should have happened for every BIP in the particular park in which it was hit, or rather what should have happened assuming it was hit in an average MLB park?