Trying to Capture the BABIP Penalty for Lefty Hitters
Where the defensive shift and batting average on balls in play (BABIP) intersect intrigues me, but I’ve had a hard time figuring out a way to quantify it without having some sort of access to shift data. Despite the advances Major League Baseball has made in measuring and collecting data, not all of this information is publicly available or easily accessible, even if you know someone who knows someone (this guy).
But I think I finally had some kind of breakthrough or epiphany or what-have-you. It would be a time-intensive approach — a problem for a lazy person (this guy) — but it would be worth it to, perhaps, chip away at the relatively enigmatic BABIP with only publicly available tools at our disposal.
More than four months ago, I posted an expected BABIP (xBABIP) equation that is not necessarily better than any other that exists but does use strictly publicly available data. Here, I expand.
With a good deal of data manipulation, I was able to determine, for each hitter in each year, the percentage of ground balls hit to the pull side relative to all balls in play as well as his handedness. This enabled me to isolate the effect of pulled ground balls by lefties in particular — currently my best proxy for defensive shifts.
I expand the data set to include all hitters from 2010 through 2014 who, in a given season, pulled at least 100 batted balls and hit at least 50 balls to the opposite field. I admit it’s a strangely specific cross-section, but I wanted to narrow the sample to hitters who not only recorded enough plate appearances but also distributed the ball relatively well in small samples. (Very few hitters did not qualify for the sample given these restrictions, so there’s little to worry about.) The data excludes switch hitters.
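For anyone who wants to replicate the sample, here is a minimal Python sketch of the filter described above. The `HitterSeason` record, its field names, and the example rows are hypothetical stand-ins for illustration, not the layout of any real data export.

```python
from dataclasses import dataclass


@dataclass
class HitterSeason:
    # Hypothetical record layout; field names are assumptions for illustration.
    name: str
    year: int
    bats: str       # "L", "R", or "S" (switch hitter)
    pulled: int     # batted balls hit to the pull side
    opposite: int   # batted balls hit to the opposite field


def in_sample(s: HitterSeason) -> bool:
    """Apply the cutoffs from the text: 2010-2014, no switch hitters,
    at least 100 pulled balls and at least 50 opposite-field balls."""
    return (
        2010 <= s.year <= 2014
        and s.bats in ("L", "R")
        and s.pulled >= 100
        and s.opposite >= 50
    )


# Made-up rows that exercise each rule.
seasons = [
    HitterSeason("Hitter A", 2014, "L", 150, 60),   # kept
    HitterSeason("Hitter B", 2014, "S", 160, 70),   # dropped: switch hitter
    HitterSeason("Hitter C", 2012, "R", 95, 80),    # dropped: too few pulled balls
    HitterSeason("Hitter D", 2011, "R", 120, 45),   # dropped: too few opposite-field balls
    HitterSeason("Hitter E", 2013, "R", 100, 50),   # kept: exactly at both cutoffs
]

sample = [s for s in seasons if in_sample(s)]
```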
I include the same variables used in the previously linked piece, but I replace opposite-field rate (Oppo%) with pull rate (Pull%) because of the focus on pulled ground balls. I specify three separate multiple regression equations, each of which I will explain in due time.
Dummy Variable Approach
The simplest way to quantify the penalty upon a left-handed hitter’s BABIP is to include a dummy variable for handedness. For the uninitiated, a dummy variable (“LHB”) is coded in binary (0 for righties, 1 for lefties), and its sole function is to estimate the average difference in the dependent variable between two subgroups within the sample.
Ultimately, its coefficient will specify how much more or less a hitter’s BABIP should be simply based on his handedness, holding all else constant. Again, it’s an admittedly crude approach, but it’s better than nothing.
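To make the dummy variable concrete, here is a stripped-down sketch. It assumes a regression with the dummy as the only predictor, in which case the ordinary least squares slope is exactly the lefty average minus the righty average. The full model includes other variables, so its LHB coefficient is that difference holding those variables constant. The BABIP values below are invented.

```python
from statistics import mean


def dummy_coefficient(babips, lhb):
    """OLS slope from regressing BABIP on a single 0/1 dummy.

    With no other predictors, this equals
    mean(BABIP of lefties) - mean(BABIP of righties).
    """
    x_bar = mean(lhb)
    y_bar = mean(babips)
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(lhb, babips))
    var = sum((x - x_bar) ** 2 for x in lhb)
    return cov / var


# Invented example: three righties (LHB = 0) and two lefties (LHB = 1).
babips = [0.310, 0.300, 0.290, 0.295, 0.285]
lhb = [0, 0, 0, 1, 1]

coef = dummy_coefficient(babips, lhb)

# The same number computed directly as a difference in group means.
lefty_mean = mean(b for b, d in zip(babips, lhb) if d == 1)
righty_mean = mean(b for b, d in zip(babips, lhb) if d == 0)
difference = lefty_mean - righty_mean
```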
The adjusted R-squared statistic, a commonly used measure of how well the model fits the data, clocks in at .456, an almost-negligible improvement over my previous installment (R^2 = .447)*, indicating the dummy variable is not particularly helpful.
* The adjusted R-squared looks a little different from the original piece due to differences in the data samples.
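For reference, the standard adjusted R-squared formula penalizes plain R-squared for the number of predictors. A small sketch follows; the sample size and predictor count in the example are placeholders, since the article does not report them.

```python
def adjusted_r_squared(r_squared: float, n_obs: int, n_predictors: int) -> float:
    """Standard adjustment: 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    dof = n_obs - n_predictors - 1
    if dof <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1 - (1 - r_squared) * (n_obs - 1) / dof


# Placeholder inputs (not the article's actual n or p).
example = adjusted_r_squared(0.50, n_obs=101, n_predictors=10)
```

Because the penalty grows with each added predictor, a new variable only raises the adjusted figure if it explains enough extra variation to offset that penalty.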
From 2010 through 2014, a left-handed hitter’s BABIP is, on average, seven points lower than a righty’s BABIP. But, again, that doesn’t tell us much given how much we know about the variability between hitters.
Dummy-Interaction Approach
I use the aforementioned dummy variable to isolate, specifically, the Pull% for left-handed hitters. This more closely resembles a proxy for the effectiveness of defensive shifts: teams employ defensive shifts primarily against lefty hitters in order to soak up pull-side balls in play.
The adjusted R-squared bumps up a fraction of a tick to .458. Again, the improvement is so small, it’s hardly worth mentioning. It’s not quite what I hoped to see at this point.
However, the frequency with which teams deploy defensive shifts has increased over time. Thus, we have variability between years in how a hitter’s pull rate might affect his BABIP.
Year-Effects Approach
Instead of treating left-handed Pull% equally among all five years in the sample, I can isolate the effect of each year individually. Because we don’t have aggregate shift measurements per year by team or by hitter, we can use year effects as a proxy for them. Ideally, we will see different coefficients for each year, helping estimate the effectiveness of defensive shifts over time without having specific year-by-year information on them.
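One way to picture the year-effects setup is as a set of columns that are switched on only for the matching season. The sketch below builds those columns for a single hitter-season. The function name and dictionary keys are my own labels, and Pull% is assumed to be a decimal (0.40 for 40 percent).

```python
YEARS = (2010, 2011, 2012, 2013, 2014)


def year_effect_features(year: int, pull_rate: float, is_lefty: bool) -> dict:
    """Build year-specific Pull% terms and their lefty interactions.

    For each season y:
      "<y> Pull%"     = pull_rate if the row is from season y, else 0
      "<y> LHB Pull%" = pull_rate if the row is from season y AND the
                        hitter is left-handed, else 0
    """
    lhb = 1 if is_lefty else 0
    features = {}
    for y in YEARS:
        in_year = 1 if year == y else 0
        features[f"{y} Pull%"] = in_year * pull_rate
        features[f"{y} LHB Pull%"] = in_year * lhb * pull_rate
    return features


lefty_2013 = year_effect_features(2013, 0.40, is_lefty=True)
righty_2013 = year_effect_features(2013, 0.40, is_lefty=False)
```

Only the 2013 columns are nonzero for these two rows, and the righty's "LHB Pull%" columns are all zero, which is what lets the regression estimate a separate lefty slope for each season.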
The adjusted R-squared statistic jumps a few points to .507 — finally, a noteworthy improvement to the model, even if it is still fairly small. I will include the equation here, because I know people will ask for it, but it is less important than the point it conveys.
BABIP = .2181 - .3586*(True IFFB%) - .0852*(True FB%) + .3749*LD% + .2026*Hard% + .0053*Spd - .0607*(2010 Pull%) - .0560*(2011 Pull%) - .0862*(2012 Pull%) - .0976*(2013 Pull%) - .0768*(2014 Pull%) + .0017*(2010 LHB Pull%) + .0042*(2011 LHB Pull%) - .0364*(2012 LHB Pull%) - .0162*(2013 LHB Pull%) - .0373*(2014 LHB Pull%)
A bit unruly, yeah?
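If it helps, the equation can be wrapped in a small function. This is only a sketch: the coefficients are copied from the equation above, but the function name and argument names are mine, all rate inputs are assumed to be decimals (0.21 for 21 percent), and "True IFFB%" and "True FB%" are assumed to follow the definitions in the earlier piece. The example inputs are invented.

```python
# Year-specific Pull% coefficients, copied from the equation in the text.
PULL = {2010: -0.0607, 2011: -0.0560, 2012: -0.0862, 2013: -0.0976, 2014: -0.0768}
# Additional Pull% coefficients that apply only to left-handed hitters.
LHB_PULL = {2010: 0.0017, 2011: 0.0042, 2012: -0.0364, 2013: -0.0162, 2014: -0.0373}


def xbabip(year, is_lefty, true_iffb, true_fb, ld, hard, spd, pull):
    """Evaluate the year-effects equation for one hitter-season.

    Rates are decimals (assumption); spd is the Spd speed score.
    Raises KeyError for seasons outside 2010-2014.
    """
    value = (
        0.2181
        - 0.3586 * true_iffb
        - 0.0852 * true_fb
        + 0.3749 * ld
        + 0.2026 * hard
        + 0.0053 * spd
        + PULL[year] * pull
    )
    if is_lefty:
        value += LHB_PULL[year] * pull
    return value


# Invented batted-ball profile, evaluated for a righty and a lefty in 2014.
profile = dict(true_iffb=0.03, true_fb=0.30, ld=0.21, hard=0.30, spd=4.5, pull=0.40)
righty = xbabip(2014, False, **profile)
lefty = xbabip(2014, True, **profile)
```

With identical inputs, the gap between `lefty` and `righty` is just the 2014 LHB Pull% coefficient times the pull rate.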
The equation is less important because it needs more context. Notice the color-coded portions of the equation above. Green indicates statistical significance at 95-percent confidence, orange at 90 percent, red otherwise. Statistical significance is important because it tells us… well, it tells us if the variable is significant. Statistically!
The equation predicts that Pull% for left-handed hitters had a positive effect on those hitters’ BABIPs in 2010 and 2011. However, the lack of statistical significance indicates those estimates are too volatile to trust.
Meanwhile, the effects of left-handed Pull% on BABIP from 2012 through 2014 are statistically significant, and they all carry negative values, meaning lefties who pull the ball more often are more prone to suppressed BABIPs. It’s an entirely familiar premise, but now intuition can align with something quantifiable.
Moreover, the trend in statistical significance, from nonexistent in 2010 to very strong in 2012, mirrors the rising popularity of the defensive shift over time.
 
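| Year | RHB Pull% effect | LHB Pull% effect | Increase in effect for LHBs |
| 2010* | -.061 | -.061 | 0% |
| 2011* | -.056 | -.056 | 0% |
| 2012 | -.086 | -.123 | 43% |
| 2013 | -.098 | -.114 | 16% |
| 2014 | -.077 | -.113 | 47% |
* The LHB Pull% term is not statistically significant in these years, so the lefty effect is shown as equal to the righty effect.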
Discussion
Despite the increased magnitude of the downward effect of pull rate on BABIP for left-handed hitters, the effect really isn’t that big. The coefficients presented in the table above represent the correlated effects given 100-percentage-point changes in pull rate. Scaling it down a bit, the difference between 25-percent and 35-percent pull rates in 2014, for example, correlates with a mere .008 drop in BABIP (about eight points) for righties and a .011 drop for lefties.
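The scaling is simple arithmetic: multiply the coefficient by the change in pull rate expressed as a decimal. A quick sketch using the 2014 values from the table (the function and variable names are mine):

```python
# 2014 Pull% effects per 100 percentage points, taken from the table.
PULL_EFFECT_2014 = {"RHB": -0.077, "LHB": -0.113}


def babip_change(coefficient: float, pull_from: float, pull_to: float) -> float:
    """Predicted BABIP change when pull rate moves from pull_from to pull_to.

    Pull rates are decimals, so 25 percent -> 35 percent is a change of 0.10.
    """
    return coefficient * (pull_to - pull_from)


rhb_change = babip_change(PULL_EFFECT_2014["RHB"], 0.25, 0.35)
lhb_change = babip_change(PULL_EFFECT_2014["LHB"], 0.25, 0.35)
```

Rounded to three decimals, these work out to the .008 and .011 drops quoted above.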
As John Dewan points out at Bill James Online, it’s not the shift that’s ineffective: shifts suppress batting averages by about 30 points on the particular balls in play affected by them. My model confirms as much: the coefficients for LHB Pull% from 2012 through 2014 (-.036, -.016, -.037) validate the results presented by Dewan.
But frequency is important, and the overall effect of the defensive shift on a player’s BABIP can be relatively small because of it despite its effectiveness.
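A back-of-the-envelope sketch shows why frequency matters. It assumes the roughly 30-point penalty applies only to the share of a hitter's balls in play that the shift actually affects and that everything else is untouched. The baseline BABIP and the shares below are made up for illustration, not measured shift rates.

```python
SHIFT_PENALTY = 0.030  # about 30 points on affected balls in play, per the text


def overall_babip(baseline: float, affected_share: float) -> float:
    """Weighted average of unaffected and shift-affected balls in play.

    Simplifies to baseline - SHIFT_PENALTY * affected_share.
    """
    unaffected = (1 - affected_share) * baseline
    affected = affected_share * (baseline - SHIFT_PENALTY)
    return unaffected + affected


# Hypothetical shares of balls in play affected by the shift.
rarely_shifted = overall_babip(0.300, 0.10)   # 10 percent affected
heavily_shifted = overall_babip(0.300, 0.50)  # 50 percent affected
```

Under these assumptions, a hitter who sees the shift's effect on one ball in ten loses only about three points of BABIP overall, while one affected on half his balls in play loses about 15.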
Ultimately, I think left-handed Pull% appears to serve as an OK proxy for defensive shifts versus lefties given a lack of publicly available shift data for the time being.
More information may help refine left-handed Pull% as a proxy, however, in order to more adequately capture the effects of defensive shifts. For example, power hitters may experience more shifts (do they?), so something like isolated power (ISO) or home runs per fly ball (HR/FB) could help proxy for frequency of defensive shifts employed between hitters (rather than assuming a constant rate for everyone).
Further discrimination by batted ball type for lefties — pulled ground balls, fly balls, line drives — could further benefit the estimates sans defensive shift data.
I’m not sure why a dip in statistical significance occurs for left-handed Pull% in 2013. The total effect on lefties is about the same, so BABIPs on left-handed pulled balls in play may have been uncharacteristically high in 2013 regardless of the number of defensive shifts employed. The reverse could be said about the BABIPs for right-handed pulled balls in play. Still, the trend remains fairly clear.
I would provide a list of expected BABIPs for 2015 hitters, but the equation doesn’t incorporate 2015 year effects. Narrowing the data set to exclude the negligible effects of 2010 and 2011, and consolidating the equation from year-effects back down to the dummy-interaction approach, would produce an equation that yields sample-average estimates for Pull% that could be used to predict BABIPs for 2015 hitters. (I can’t yet attest to its external validity, however.) Unfortunately, I don’t have the 2015 data set ready — didn’t have the foresight — and it takes effort I’m shamefully too lazy to spare at the moment. Besides, it wouldn’t do any of us much good right now anyway. I’ll follow up with a list during the offseason — perhaps with a refined equation, too.
 
								
Are you doing this work in R? Digging the posts and the thorough knowledge of statistics.