Gettin’ Shifty With It — Introducing the New xBABIP
by Mike Podhorzer
February 6, 2017

For years now, we have attempted to better understand the seemingly impossible-to-predict metric we all know and love, batting average on balls in play, or BABIP for us nerds who like acronyms. As far back as 2008, we have tried, tried, and tried again to come up with an xBABIP equation strong enough to play with the other xMetrics. Two years ago, I developed what was, at the time, the best we had. Then, we were given a gift in the form of data collected by Baseball Info Solutions, some of which Alex Chamberlain used to improve my equation and make it easier to calculate. Then Andrew Perpetua developed a Statcast-driven xBABIP. Finally, Alex updated his equation by tacking on a seasonal constant.

However, after all these incremental jumps in equation accuracy, we were still missing a critical piece of data, and we all knew it. But then, we were given the best gift of all, the Splits Leaderboard. And just like that, we suddenly had the ability to incorporate shift data into an xBABIP equation. It was a joyous time indeed. We thought we knew that the shift killed the BABIPs of many a left-handed hitter, but we simply didn’t have the type of data needed to quantify the effect, if any.

I began with the research stage, collecting the average BABIP marks for ground balls and fly balls in a variety of different situations. The MLB average BABIP marks from 2012 to 2015 were as follows:

MLB Batted Ball Type BABIP 2012-2015

| Batted Ball Type | BABIP |
| --- | --- |
| GB – Pull | 0.191 |
| GB – Center | 0.269 |
| GB – Oppo | 0.329 |
| FB – Pull | 0.157 |
| FB – Center | 0.137 |
| FB – Oppo | 0.102 |

You see here that while pulled fly balls are most desirable (likely because a batter can hit with more power to the pull side), there isn’t a significant difference between the best fly ball type (pulled) and the worst (oppo). And they all kinda stink. Grounders, on the other hand, have a much more dramatic slope.
The dropoff to pulled grounders is the most drastic of any batted ball type.

Next, I checked in on those pulled grounders, this time breaking them up by each of the three Shift Splits:

MLB Pulled GB Shift Type BABIP 2012-2015

| Shift Type | BABIP |
| --- | --- |
| Pull – No Shift | 0.197 |
| Pull – Trad Shift | 0.116 |
| Pull – Non-Trad Shift | 0.276 |

BINGO! When a ground ball is pulled into a traditional shift, the likelihood of a hit drops like a rock. I’m not sure why balls hit into non-traditional shifts fall for hits even more often than in no-shift situations, but that’s what has happened, so I left non-traditional shifts out of the remainder of my analysis.

We have now established that pulling ground balls into a traditional shift is a killer for BABIP, and ultimately, batting average. It’s time to put that concept to work and figure out how to incorporate it into xBABIP.

It took me a little while to figure out the correct way to account for pulled grounders hit into the shift. Eventually, it came to me: the shift-related metric that would make its way into my equation would reflect how often the batter is shifted against and, once shifted against, how often he pulls grounders. Then some mathematical gymnastics would take place (really, just multiplication), and voilà, we have our shift component.

So before actually presenting the equation, let’s discuss how I used the Splits Leaderboard to gather the shift data needed to form the newest component in my equation.
It required several filters and data exports, so please follow along:

1) No Shift Balls in Play: under the Shifts split, select No Shift, export the data, and then take AB, subtract SO, and add SF
2) Shift Balls in Play: under the Shifts split, select Shift – Traditional, export the data, and then take AB, subtract SO, and add SF
3) Pull% on GB While Shifted: under the Batted Balls split, select Groundballs, and under the Shifts split, select Shift – Traditional, export the data, and use the Pull%

Once you have these three data points, there is some additional easy math necessary, as follows:

1) % Balls in Play Shifted = Shift Balls in Play / (No Shift Balls in Play + Shift Balls in Play)
2) Pull GB While Shifted% = Overall GB% * Pull% on GB While Shifted * % Balls in Play Shifted

Phew! Tired yet? Our final number is now presented as a percentage of all balls in play, on the same scale as the normal batted ball data you find on the main player page, like GB%.

It’s correlation time! My population data set consisted of all batter seasons from 2012 to 2015 in which at least 400 at-bats were recorded, giving me 705 to work with.

Correlations with BABIP 2012-2015

| Metric | Correlation with BABIP |
| --- | --- |
| LD% | 0.475 |
| GB% | 0.223 |
| FB% | -0.421 |
| True FB% | -0.335 |
| IFFB% | -0.428 |
| True IFFB% | -0.497 |
| Hard% | 0.158 |
| Spd | 0.281 |
| Pull GB While Shifted% | -0.274 |

Look at the correlation of my new metric, Pull GB While Shifted%! True FB% is FB% with infield flies excluded, while True IFFB% is those infield flies expressed as a percentage of all batted balls rather than of fly balls. So we have our gang of components and welcome the newbie.
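The shift-component arithmetic described above can be sketched in a few lines of Python. This is only an illustration of the stated formulas, not the workflow actually used for the article: the `ShiftSplit` container, the function names, and the sample numbers are all hypothetical, and the rates are assumed to be decimal fractions (0.40 rather than 40).

```python
from dataclasses import dataclass


@dataclass
class ShiftSplit:
    """Counting stats from one Splits Leaderboard export (hypothetical container)."""
    ab: int  # at-bats
    so: int  # strikeouts
    sf: int  # sacrifice flies


def balls_in_play(split: ShiftSplit) -> int:
    # Balls in play as defined in the steps above: AB - SO + SF
    return split.ab - split.so + split.sf


def pct_bip_shifted(no_shift: ShiftSplit, trad_shift: ShiftSplit) -> float:
    # Share of balls in play hit against a traditional shift
    shift_bip = balls_in_play(trad_shift)
    total_bip = balls_in_play(no_shift) + shift_bip
    return shift_bip / total_bip if total_bip else 0.0


def pull_gb_while_shifted_pct(gb_pct: float, pull_pct_on_gb_shifted: float,
                              no_shift: ShiftSplit, trad_shift: ShiftSplit) -> float:
    # Overall GB% * Pull% on GB While Shifted * % Balls in Play Shifted
    return gb_pct * pull_pct_on_gb_shifted * pct_bip_shifted(no_shift, trad_shift)


# Made-up inputs, purely for illustration (assumed decimal fractions)
no_shift = ShiftSplit(ab=300, so=60, sf=2)    # 242 balls in play
trad_shift = ShiftSplit(ab=200, so=45, sf=3)  # 158 balls in play
component = pull_gb_while_shifted_pct(0.40, 0.60, no_shift, trad_shift)
print(f"{component:.4f}")
```

With these invented inputs, 158 of 400 balls in play came against a traditional shift, so the component is 0.40 * 0.60 * 0.395, or about 9.5% of all balls in play.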
At last, it’s time to slide these metrics into the blender and cook up our newest xBABIP:

xBABIP = 0.1911 + (LD% * 0.3800) - (True FB% * 0.1502) - (True IFFB% * 0.4173) + (Hard% * 0.25502) + (Spd * 0.0049) + (Pull GB While Shifted% * -0.1492)

Adjusted R-squared = 0.5377

True FB% = FB% - (FB% * IFFB%)
True IFFB% = FB% * IFFB%

This is the first time we have pushed our xBABIP R-squared above the 0.50 barrier. It’s an exciting time, folks, an exciting time.

Before you leave to celebrate, we’ll finish things off with the all-important year-over-year correlations for the components during the 2012 to 2016 period:

YoY Correlations 2012-2016

| Metric | Correlation |
| --- | --- |
| LD% | 0.282 |
| True FB% | 0.677 |
| True IFFB% | 0.541 |
| Hard% | 0.663 |
| Spd | 0.713 |
| Pull GB While Shifted% | 0.866 |
| % BIP Shifted | 0.899 |

With the exception of LD%, the metrics you’re familiar with are rather stable from one season to the next. And then you get to the bottom and what do you see? The two new metrics I created are the most highly correlated from year to year of the bunch. So if a guy gets shifted one year, you better believe he’s going to see a similar rate of shifts the following season. And if a player hits lots of pulled grounders into those shifts, he’s probably going to continue doing so.

Lastly, the most important of the all-important year-over-year correlations:

YoY Correlations 2012-2016

| Metric Pairing | Correlation |
| --- | --- |
| BABIP Yr1 to BABIP Yr2 | 0.274 |
| xBABIP Yr1 to xBABIP Yr2 | 0.509 |
| xBABIP Yr1 to BABIP Yr2 | 0.348 |
| BABIP 2015 to BABIP 2016 | 0.266 |
| xBABIP 2015 to BABIP 2016 | 0.341 |

BABIP in year 1 correlates with BABIP in year 2 at only 0.274, which is disappointingly low. And that’s precisely why we have been so keen on creating an exceptional xBABIP equation. xBABIP correlates with itself at a much higher clip, which speaks to the stability of the metrics driving it. So BABIP set a low baseline of just 0.274 to clear, and clear it we did, as the correlation of xBABIP in year 1 with BABIP in year 2 rose to 0.348, which is a substantial jump.
Even when using the out-of-sample 2016 season, the correlation surges from 0.266 when just using 2015 BABIP to 0.341 when using 2015 xBABIP.

Of course, these correlations are still lower than those of many of our other metrics and my array of xMetrics. But that speaks more to the randomness of BABIP in any given season than to our inability to isolate the right metrics to include in an equation…I think. I’m sure that we’ll continue to take baby steps forward as even more data becomes available, but for now, I’m ecstatic about the progress the shift data has allowed us to make.
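For readers who want to plug numbers into the xBABIP equation presented earlier, here is a minimal Python sketch. The coefficients are copied exactly as printed in the equation; the function names and the sample inputs are hypothetical, and the sketch assumes all rate stats are entered as decimal fractions (0.21 for 21%) with Spd on its usual scale.

```python
def true_fb_pct(fb_pct: float, iffb_pct: float) -> float:
    # True FB% = FB% - (FB% * IFFB%)
    return fb_pct - fb_pct * iffb_pct


def true_iffb_pct(fb_pct: float, iffb_pct: float) -> float:
    # True IFFB% = FB% * IFFB%
    return fb_pct * iffb_pct


def xbabip(ld_pct: float, fb_pct: float, iffb_pct: float,
           hard_pct: float, spd: float, pull_gb_shifted_pct: float) -> float:
    # Coefficients as printed in the article's equation
    return (0.1911
            + ld_pct * 0.3800
            - true_fb_pct(fb_pct, iffb_pct) * 0.1502
            - true_iffb_pct(fb_pct, iffb_pct) * 0.4173
            + hard_pct * 0.25502
            + spd * 0.0049
            + pull_gb_shifted_pct * -0.1492)


# Invented batter line, for illustration only
estimate = xbabip(ld_pct=0.21, fb_pct=0.35, iffb_pct=0.10,
                  hard_pct=0.32, spd=4.5, pull_gb_shifted_pct=0.05)
print(f"{estimate:.3f}")
```

For this made-up hitter the estimate works out to roughly .305. Setting `pull_gb_shifted_pct` to 0 instead raises it by about 7 points, which gives a feel for how much the shift component can move the result.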