Eating Crow: xBABIP and the Shift

A few days ago, I looked at the effect the shift may be having on hitters by comparing their BABIP with their xBABIP. The observed drop in the players' BABIP relative to their xBABIP was 41 points. As reader phoenix2042 pointed out, though, I was using a dated formula for xBABIP. With an updated xBABIP formula, I still found a difference, but a smaller one.

The main problem with using the old xBABIP formula is that league-wide BABIP has dropped over the last couple of years. Here are the league BABIP values for every season for which batted ball data is available on FanGraphs:

Season: BABIP
2002: 0.293
2003: 0.294
2004: 0.297
2005: 0.295
2006: 0.301
2007: 0.303
2008: 0.300
2009: 0.299
2010: 0.297
2011: 0.295
2012: 0.295
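As a quick check on how far league BABIP has moved, the sketch below averages the values in the table for 2007 to 2009 (the seasons behind the original formula) and for 2010 to 2012. It uses only the numbers listed above. The helper name `mean_babip` is hypothetical, and the averages are simple unweighted means of the season values, not recomputed from raw batted ball counts.

```python
# League-wide BABIP by season, copied from the table above.
LEAGUE_BABIP = {
    2002: 0.293, 2003: 0.294, 2004: 0.297, 2005: 0.295,
    2006: 0.301, 2007: 0.303, 2008: 0.300, 2009: 0.299,
    2010: 0.297, 2011: 0.295, 2012: 0.295,
}


def mean_babip(first: int, last: int) -> float:
    """Unweighted mean of the league BABIP values from first to last, inclusive."""
    seasons = range(first, last + 1)
    return sum(LEAGUE_BABIP[s] for s in seasons) / len(seasons)


old_window = mean_babip(2007, 2009)  # seasons behind the original xBABIP formula
new_window = mean_babip(2010, 2012)  # the three most recent seasons
print(f"2007-2009: {old_window:.3f}")
print(f"2010-2012: {new_window:.3f}")
print(f"gap: {1000 * (old_window - new_window):.0f} points")
```

This prints 0.301 for 2007-2009 and 0.296 for 2010-2012, a gap of about five points of league BABIP.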

The original xBABIP formula was built on 2007 to 2009 data, which were some of the highest BABIP seasons in the table. I got in touch with Robert Boden (aka slash12) and asked him to re-run the xBABIP formula with recent data. He gladly did, and here are his comments on the new formula:

I know more now than I did when I originally developed my first xBABIP equation. So I decided to go back to the drawing board and do something from scratch. I think the resulting equation should be significantly more accurate at calculating xBABIP. One big improvement is that it now incorporates bunt hits. I also re-worked the regression to use individual batted balls instead of batted ball percentages. This will result in a better, more accurate equation.

The logic of the new equation is a little different. You earn your bunt hits and your infield hits, so you get 100% credit for these in your xBABIP. Likewise, if you hit an infield fly ball, you get 0% credit for it. What remains are line drives, outfield fly balls (non-home runs), and ground balls that weren't infield hits, and the equation assigns an expected BABIP to each of these remaining batted ball types.

He was able to keep the same basic form of the formula and just change the constants from year to year. Here are the formula and the constants:

xBABIP = ((GB – IFH) * (GB-IFH constant) + (FB – HR – IFFB) * (OFFB constant) + LD * (LD constant) + IFH + BUH) / (GB + FB + LD + BU – HR – SH)

Constants 2009 2010 2011 2012 2009-2011 avg
GB-IFH 0.221 0.161 0.182 0.159 0.195
OFFB 0.098 0.156 0.148 0.121 0.134
LD 0.763 0.800 0.710 0.750 0.740
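To make the bookkeeping concrete, here is a minimal Python sketch of the formula above using the constants from the table. The `BattedBalls` class, the `xbabip` function and the season line in the example are hypothetical, made up for illustration rather than taken from Boden's work or from any real player. It assumes the denominator is GB + FB + LD + BU – HR – SH, as written in the formula.

```python
from dataclasses import dataclass

# Constants from the table above: (GB-IFH, OFFB, LD) for each column.
CONSTANTS = {
    "2009": (0.221, 0.098, 0.763),
    "2010": (0.161, 0.156, 0.800),
    "2011": (0.182, 0.148, 0.710),
    "2012": (0.159, 0.121, 0.750),
    "2009-2011 avg": (0.195, 0.134, 0.740),
}


@dataclass
class BattedBalls:
    """Season batted ball counts (hypothetical container, FanGraphs abbreviations)."""
    gb: int    # ground balls
    fb: int    # fly balls, including home runs and infield flies
    ld: int    # line drives
    iffb: int  # infield fly balls
    ifh: int   # infield hits
    bu: int    # bunts
    buh: int   # bunt hits
    hr: int    # home runs
    sh: int    # sacrifice hits


def xbabip(b: BattedBalls, column: str = "2012") -> float:
    """Expected BABIP from batted ball counts, using one column of constants."""
    gb_c, offb_c, ld_c = CONSTANTS[column]
    expected_hits = (
        (b.gb - b.ifh) * gb_c              # grounders that were not infield hits
        + (b.fb - b.hr - b.iffb) * offb_c  # outfield flies that stayed in the park
        + b.ld * ld_c                      # line drives
        + b.ifh + b.buh                    # infield and bunt hits: full credit
    )                                      # infield flies never enter the numerator
    balls_in_play = b.gb + b.fb + b.ld + b.bu - b.hr - b.sh
    return expected_hits / balls_in_play


# Hypothetical season line, not a real player.
season = BattedBalls(gb=200, fb=150, ld=100, iffb=15, ifh=15,
                     bu=5, buh=2, hr=20, sh=1)
print(f"{xbabip(season):.3f}")  # 0.312 with the 2012 constants
```

Passing a different column name, such as `"2009-2011 avg"`, swaps in that set of constants without changing the rest of the calculation.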

I have re-created a spreadsheet that people can use to quickly calculate xBABIP for themselves (see the Appendix).

Using the new xBABIP formula, I re-ran the analysis. In addition to the new formula, I added three new players (Jose Bautista, Josh Hamilton and Adrian Gonzalez) to the data group. The group's average BABIP, weighted by PA, is 13 points lower than the group's xBABIP. That gap is considerably smaller than the 41-point difference I previously calculated.
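A PA-weighted gap like the 13-point figure can be computed along these lines. This is a sketch with made-up numbers: the two players, their PA totals and their BABIP/xBABIP values are hypothetical and are not the players in the study, and `weighted_gap` is an illustrative helper name.

```python
# Hypothetical players: (plate appearances, BABIP, xBABIP). Not real data.
players = [
    (600, 0.280, 0.300),
    (400, 0.300, 0.305),
]


def weighted_gap(rows):
    """PA-weighted average of BABIP minus xBABIP, in points (thousandths)."""
    total_pa = sum(pa for pa, _, _ in rows)
    gap = sum(pa * (babip - xb) for pa, babip, xb in rows) / total_pa
    return 1000 * gap


print(f"{weighted_gap(players):.0f} points")  # -14 points
```

Weighting by PA keeps a part-time player's small sample from counting as much as an everyday player's full season.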

Here is an example to show how much of an effect a shift may have on a player's AVG. Consider the following player:
600 AB
90 K (15% K%)
20 HR
10 SF
0.320 BABIP

The previous BABIP decline of 0.041 dropped the player's AVG from 0.300 to 0.266. Using the new value of 0.013 for the BABIP decline, the player's AVG drops to 0.289.
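The arithmetic behind these AVG figures follows from the BABIP definition, (H – HR) / (AB – K – HR + SF): hold AB, K, HR and SF fixed, solve for hits, and divide by AB. The function name `avg_from_babip` is hypothetical. The sketch assumes, as the example does, that the shift changes only balls in play, leaving strikeouts, home runs and sacrifice flies untouched.

```python
def avg_from_babip(ab: int, k: int, hr: int, sf: int, babip: float) -> float:
    """Batting average implied by a BABIP, holding AB, K, HR and SF fixed.

    BABIP = (H - HR) / (AB - K - HR + SF), so H = BABIP * (AB - K - HR + SF) + HR.
    """
    balls_in_play = ab - k - hr + sf
    hits = babip * balls_in_play + hr
    return hits / ab


# The example player: 600 AB, 90 K, 20 HR, 10 SF, 0.320 BABIP.
for drop in (0.000, 0.041, 0.013):
    avg = avg_from_babip(600, 90, 20, 10, 0.320 - drop)
    print(f"BABIP drop {drop:.3f} -> AVG {avg:.3f}")
```

This prints AVG values of 0.300, 0.266 and 0.289. With 500 balls in play and 600 AB, each point of BABIP is worth 500/600, or about 0.83 points of AVG.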

With the recent drop in league-wide BABIP, the previous xBABIP formula I used was dated. When I used it to calculate the difference between xBABIP and BABIP for players who are getting shifted, I found more of a difference than I would have with an updated formula. Using the new formula, I still found a drop in BABIP, just not as large a one.

Appendix

The following is a procedure for downloading and using the xBABIP spreadsheet. First, download the spreadsheet from Google Docs by going to File, Download As and selecting the desired format (don't select .csv). Open the spreadsheet in Excel or OpenOffice (the only two programs I verified). Next, go to a hitter's Standard data (for example, Dustin Pedroia's). The Minor League data needs to be hidden by selecting the “Minor Leagues” link (red box in image). Select and copy all the yearly data (some funkiness happens with the career data).

Finally, open the downloaded spreadsheet and paste the copied data into it (select/highlight the yellow box that designates where to paste this data). Some of the columns are hidden so that only the data used in the calculations is shown.

Now copy and paste the More Batted Ball data the same way as the Standard data, selecting/highlighting the blue box before pasting.

The xBABIP values will be generated automatically in five different columns. You will need to match up the correct year in the raw data to find the corresponding xBABIP value. Besides xBABIP, the spreadsheet also calculates each season's BABIP. I hope you find this useful; let me know if you have any questions.
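The spreadsheet's cell formulas aren't reproduced here. As an assumption, the sketch below uses the standard BABIP definition, (H – HR) / (AB – SO – HR + SF), applied to Standard-table columns and keyed by season so each year's BABIP can be lined up with the matching xBABIP. The function name and both season rows are hypothetical.

```python
def babip(h: int, hr: int, ab: int, so: int, sf: int) -> float:
    """BABIP = (H - HR) / (AB - SO - HR + SF), from Standard-table columns."""
    return (h - hr) / (ab - so - hr + sf)


# Hypothetical Standard rows keyed by season: (H, HR, AB, SO, SF).
standard = {
    2011: (180, 20, 600, 90, 10),
    2012: (150, 15, 550, 100, 5),
}

for season, row in sorted(standard.items()):
    print(season, f"{babip(*row):.3f}")
```

Keying the rows by season keeps each BABIP next to its year, which is the same matching step the spreadsheet leaves to the reader.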





Jeff, one of the authors of the fantasy baseball guide, The Process, writes for RotoGraphs, The Hardball Times, Rotowire, Baseball America, and BaseballHQ. He has been nominated for two SABR Analytics Research Awards for Contemporary Analysis and won it in 2013 in tandem with Bill Petti. He has won four FSWA Awards, including one for his Mining the News series. He's won Tout Wars three times, LABR twice, and got his first NFBC Main Event win in 2021. Follow him on Twitter @jeffwzimmerman.

7 Comments
phoenix2042
11 years ago

This is awesome. Thanks so much for doing this. I think this should actually be a regular fangraphs post, because it concerns more than just fantasy implications, but a leaguewide strategy of increased shifting. Some commentators have been calling this the “year of the shift,” and I think you have shown here that it makes a clear difference in BABIP. The next thing I wonder is if it affects extra bases: I know some players talk about taking the ball the other way to beat the shift for infield singles that roll down the line where no one is playing. But that takes away the possibility of the extra base hit or HR down the pull side.