Challenge #2 Follow-Up: BABIP and Weak Contact

The second of last week’s challenges asked you to prove that a low BABIP means that the pitcher induced weak contact. I’m tired of reading that Pitcher X has a .220 BABIP, which supposedly means he has “kept hitters off-balance”, “induced weak contact” or made it so “hitters have a difficult time squaring him up”. The opposite is equally annoying: reading that Pitcher Y, sporting a .330 BABIP, is “hittable”. When no evidence is ever presented to support such a conclusion aside from the BABIP itself (and perhaps, if we’re lucky, a mention of the batted ball distribution allowed), the claims are meaningless.

All of those descriptions may very well be true, but we still have no real proof of any of them for a specific pitcher, so it’s all just conjecture. Conjecture stated as if it were fact has no place in fantasy analysis. It’s misleading to the reader and a real disservice.

I’ll climb down from my high horse now and get to discussing the comments.

The very first comment summarized my thoughts so perfectly.

steve says:

Right, this one is even harder… because even if you could show that a pitcher is getting weaker contact, you then have to prove he is actually INDUCING that weaker contact. This seems to fluctuate enough that you have to lean to the hitter on that one.

Insofar as IFFB are weak contact, I think that’s as far as you can go.

BINGO! From listening to broadcasters, it seems as if pitchers are credited with the ultimate batted ball result far more often than they should be. There are exceptions, of course, where the commentator acknowledges that a good pitch was made and the batter just hit it anyway, but those are rare. Is it a human psychology thing, where we think the pitcher is more in control because he’s the one actually throwing the ball, whereas the hitter is simply reacting?

Brad Johnson says:

IMO, this is getting awfully close to asking us to prove a negative. We know what we know about batted ball contact and that obviously plays a big part – esp. IFFBs. Without access to a large set of HITf/x data, I’m not sure we can do what you ask.

Yup, that’s the point. If it cannot be proven, then I don’t think it’s fair to label pitchers as hittable or possessing the skill to induce weak contact.

dl80 says:

Using standard deviations, can’t we figure out that a certain number or percentage of pitchers are likely to have an abnormally low (at least 1 SD) BABIP for 1, 2, 3, even 4 years in a row, even just if BABIP is absolutely all luck-based? We have no way to predict which pitcher it will be, but it’s possible someone like Matt Cain or Kyle Lohse are just outliers because, statistically speaking, there has to be an outlier. It just happens to be them.

BINGO again! I can’t remember where I read it, perhaps Tom Tango’s The Book Blog, but it’s an important reminder: somebody is going to post a low or inflated BABIP several years in a row through randomness alone. It’s a lot like the sports gambling “best bet” scam. The advice site tells 1,000 customers that its Best Bet is the Yankees and another 1,000 that it’s the Red Sox, when the two teams are facing each other. Half of those customers will be happy and pay for another Best Bet. The site repeats the trick, and this time 500 are happy and paying up. It continues over and over until the pool of repeat customers dwindles. But that pool has just won many bets in a row and believes the advice site is brilliant, with no idea that it has been scammed.
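Both dl80’s point and the Best Bet scam come down to simple probability, so here is a minimal sketch of the arithmetic. It assumes, hypothetically, that every pitcher’s seasonal BABIP is an independent draw from the same normal distribution (pure luck, no skill). Under that assumption, the chance of landing at least one standard deviation below the mean in a single season is about 15.9%, and the chance of doing it k seasons in a row is that figure raised to the kth power. The pool of 100 pitchers and the function names are illustrative choices, not anything from the original research.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal CDF, computed with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def expected_streakers(n_pitchers: int, years: int, sd_cutoff: float = -1.0) -> float:
    """Expected number of pitchers whose BABIP is at or below `sd_cutoff`
    standard deviations from the mean in every one of `years` seasons.

    Hypothetical model: each season is an independent normal draw around the
    same league mean, i.e. BABIP is assumed to be pure luck.
    """
    p_one_season = normal_cdf(sd_cutoff)  # about 0.159 for a 1-SD cutoff
    return n_pitchers * p_one_season ** years


def scam_winners(customers: int, rounds: int) -> int:
    """Customers who have won every Best Bet after `rounds` games, when the
    site splits its remaining pool evenly between the two sides each time."""
    for _ in range(rounds):
        customers //= 2
    return customers


if __name__ == "__main__":
    for years in range(1, 5):
        print(years, round(expected_streakers(100, years), 2))
    print(scam_winners(2000, 4))  # 125 customers have now "won" four straight
```

With 100 pitchers, you would expect roughly two or three of them to post a 1-SD-low BABIP two years running by luck alone, even though nobody can say in advance which ones.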

What about pitchers who throw a fastball/sinker/cutter that tails away from the plate? If a righthanded pitcher throws a fastball/sinker/whatever that has glove side run, it will obviously tail away from righties. If this pitcher tends to throw on (and just off) the outer half of the plate a lot, that pitch may be really hard for righties to square up correctly. Obviously, it would be the opposite for lefties, and presumably righties could eventually wait on that pitch and still hit it. But I wonder if it is harder to make good contact on a pitch tailing off and away (and whether that presumably leads to a lower BABIP)?

Does a large differential between a fastball and breaking ball and/or changeup make a difference? Do guys with 10 mph between fastball and breaking ball, and another 10 between breaking ball and changeup, tend to have lower BABIP? Do guys with a smaller difference have a higher BABIP?

The first idea is a pitch movement and location theory. With the data we have available now, could this even be researched? I have no idea, but it’s something to consider. The next idea is also intriguing. I know Eno has found that a smaller differential between fastball and changeup produces more grounders, while a larger differential induces swings and misses. I wonder if the differential also affects BABIP.
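Testing the velocity-differential idea starts with computing the gaps from average pitch velocities and grouping pitchers by them. The sketch below is one hypothetical way to set that up; the `PitcherSeason` record, the field names, the 10 mph threshold (taken from the commenter’s example) and the sample numbers are all illustrative assumptions. We take the absolute value of the breaking ball/changeup gap because which of those two pitches is faster varies from pitcher to pitcher; that is our choice, not the commenter’s.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Tuple


@dataclass
class PitcherSeason:
    label: str       # placeholder name, not a real pitcher
    fb_velo: float   # average fastball velocity (mph)
    bb_velo: float   # average breaking ball velocity (mph)
    ch_velo: float   # average changeup velocity (mph)
    babip: float


def velocity_gaps(p: PitcherSeason) -> Tuple[float, float]:
    """Return (fastball minus breaking ball, |breaking ball minus changeup|) in mph."""
    return p.fb_velo - p.bb_velo, abs(p.bb_velo - p.ch_velo)


def split_by_gap(seasons: List[PitcherSeason],
                 min_gap: float = 10.0) -> Tuple[List[float], List[float]]:
    """Split BABIPs into pitchers whose two gaps are both >= min_gap ("wide")
    and everyone else ("narrow")."""
    wide, narrow = [], []
    for p in seasons:
        fb_bb, bb_ch = velocity_gaps(p)
        if fb_bb >= min_gap and bb_ch >= min_gap:
            wide.append(p.babip)
        else:
            narrow.append(p.babip)
    return wide, narrow


if __name__ == "__main__":
    # Made-up numbers purely to exercise the code; they prove nothing.
    seasons = [
        PitcherSeason("A", 95.0, 84.0, 73.5, 0.291),
        PitcherSeason("B", 92.0, 84.0, 83.0, 0.302),
        PitcherSeason("C", 94.0, 83.0, 72.0, 0.297),
        PitcherSeason("D", 91.0, 85.0, 80.0, 0.288),
    ]
    wide, narrow = split_by_gap(seasons)
    print(len(wide), round(mean(wide), 3), len(narrow), round(mean(narrow), 3))
```

With real data you would want many seasons per group, and a comparison of the group means that accounts for sample size, before reading anything into a difference.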

Does pitcher height matter at all? What about sidearmers or three-quarters guys? What about guys like Carter Capps and his funky motion?

I’m not sure about height, but I suspect a funky motion would be shown to reduce BABIP. It’s lumped into the deception group, and for as long as hitters are deceived, that would be a logical effect. I don’t know if it has ever been studied in depth, though.

Ruki Motomiya says:

My first intuition was high fstrike% + zone % would lead to some lower BABIPs, but when I looked at it, a different pattern seemed to emerge, which is that a high fstrike% with a lower zone %(50% or so) seemed to lead to lower BABIPs.

Buried in the comments is a quick study I performed to determine if F-Strike% or Zone% correlated at all with BABIP. I found:

I quickly ran correlations and a regression using 754 qualified starter seasons from 2006-2014, correlating Zone% with BABIP and F-Strike% with BABIP. Both correlations were tiny: .06 for Zone% and .03 for F-Strike%. Both are positive, meaning more strikes go along with a higher BABIP, but obviously the effect is minuscule.

Combining the two, R-squared is just .005, so nothing to see here.
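To see why correlations that small amount to nothing: a single predictor with r = .06 explains only .06² = .0036 of the variance in BABIP. For two predictors, the R-squared of an ordinary least-squares fit can be computed from the three pairwise correlations. The sketch below does that in plain Python. The data is synthetic (BABIP drawn independently of both predictors, with made-up means and spreads), so it stands in for, and is not, the real 2006-2014 sample.

```python
import math
import random
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def r_squared_two_predictors(x1: Sequence[float], x2: Sequence[float],
                             y: Sequence[float]) -> float:
    """R-squared of the least-squares fit y ~ b0 + b1*x1 + b2*x2,
    computed from the three pairwise correlations."""
    r1, r2, r12 = pearson(x1, y), pearson(x2, y), pearson(x1, x2)
    return (r1 ** 2 + r2 ** 2 - 2 * r1 * r2 * r12) / (1 - r12 ** 2)


if __name__ == "__main__":
    # Synthetic stand-in for 754 starter seasons: BABIP is independent of
    # Zone% and F-Strike% here, so any correlation is pure noise.
    rng = random.Random(2015)
    n = 754
    zone = [rng.gauss(0.48, 0.03) for _ in range(n)]
    fstrike = [rng.gauss(0.60, 0.03) for _ in range(n)]
    babip = [rng.gauss(0.295, 0.015) for _ in range(n)]
    print(round(pearson(zone, babip), 3), round(pearson(fstrike, babip), 3))
    print(round(r_squared_two_predictors(zone, fstrike, babip), 4))
```

Even with no real relationship at all, random samples of this size will routinely produce correlations of a few hundredths, which is roughly the neighborhood the real numbers landed in.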

Capt. Clutch says:

How can you possibly expect to prove that a pitcher induces weak contact without hit/fx data? The closest one can come right now is FB%, GB%, LD%, IFFB%. The data available to the public now cannot be used to definitively prove that some pitchers run low BABIPs because of weak contact, because we can’t even sufficiently measure the quality of said contact.

BAM! There it is again.

Turning back to Zachary Smith, who emailed me after the first challenge, we learn this:

I’ve done massive amounts of research on the topic of pitchers and BABIP, and barring some new data coming to light, it turns out pitchers can’t control anything about their BABIP except, you guessed it, pop ups. I’ve sorted pitchers into groups based on skills and talent level, in addition to doing broad-stroke analysis of “pitchers” generally, and the results stand firm. No matter how nasty a guy’s stuff, one’s BABIP is forever approaching .295 (or whatever the average MLB hitter’s talent level is in a given year). The ability to generate pop ups is the lone skill in a pitcher’s arsenal for the obvious reason that pop ups are almost always automatic outs. I used true IFFB% again (FB% * IFFB%), and over the last 3 years one’s BABIP is most accurately predicted by the following formula:

=0.3086*(2.17828^(-1.742*TRUE IFFB%))
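Here is Zachary’s spreadsheet formula as a small Python function, so you can plug in your own numbers. The function names are hypothetical. We assume FB% and IFFB% are entered as decimals (0.35 rather than 35), since only then does the curve return realistic BABIPs. The base 2.17828 is copied exactly as printed; it sits close to e (2.71828), so treat that constant as given rather than verified, which is why it is exposed as a parameter.

```python
def true_iffb(fb_pct: float, iffb_pct: float) -> float:
    """'True' IFFB% as described above: FB% * IFFB%, both as decimals."""
    return fb_pct * iffb_pct


def predicted_babip(true_iffb_pct: float, base: float = 2.17828) -> float:
    """Zachary Smith's fitted curve: 0.3086 * base ** (-1.742 * TRUE IFFB%).
    The default base is the constant exactly as printed in the formula."""
    return 0.3086 * base ** (-1.742 * true_iffb_pct)


if __name__ == "__main__":
    # A pitcher allowing 35% fly balls, 10% of which are infield flies.
    print(round(predicted_babip(true_iffb(0.35, 0.10)), 3))  # roughly 0.294
```

A pitcher who never allows an infield fly sits at the curve’s ceiling of .3086, and every increase in true IFFB% pulls the prediction down from there.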

As for why a pitcher has no control over his BABIP, the reasons are three-fold. Firstly, hitters largely control the moment of contact (speed, trajectory, and proximity of the bat to the ideal location of contact with the ball). Secondly, the talent level at the major league level is, despite large disparities in results, almost uniform. Thirdly, whether the bat meets the ball at all is what largely determines the outcome of an at bat, so swings and misses are the key skill a pitcher controls. Once contact is made, the result is largely outside anyone’s control: with two round objects approaching each other at two hundred miles an hour, the millimeter adjustments in the contact zone that separate a pop up from a home run are beyond the limits of human perception and fine motor skill.

This is great stuff, but surprising. Given the huge disparity in BABIP between fly balls and ground balls, I would have thought the rest of a pitcher’s batted ball distribution would also matter, but Zachary found that it doesn’t. I don’t know the full details of his research, so perhaps he didn’t cover every base. And he certainly hasn’t looked at data that isn’t publicly available, which could shed more light on this topic.

I thought that such things as location, changing offerings, &c. should affect BABIP, but no matter how hard I looked, I couldn’t find anything of real note. Sure, I found some statistically significant correlations, but they came from extremely large samples and had low R-squared values and small deviations, so they weren’t worth factoring into my heuristic. I feel like there probably is meat left on the bone for BABIP research with pitchers, but it will have to be extremely specific and driven by rigorous categorization. Rather than broad concepts that apply to all pitchers, the results will apply only to pitchers with a certain pitch mix, quality of offering, or location (inside/outside) early in counts or when they need to throw a strike.
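Zachary’s point that a correlation can be “statistically significant” yet worthless is easy to check numerically. The sketch below uses the standard t statistic for a Pearson correlation and, for simplicity, a normal approximation to the t distribution instead of the exact t distribution (reasonable only for large samples). The first case reuses the .06 Zone% correlation from 754 seasons mentioned earlier; the 5,000- and 50,000-observation cases are hypothetical sample sizes chosen for illustration.

```python
import math


def t_stat_for_r(r: float, n: int) -> float:
    """t statistic for testing whether a Pearson correlation r differs from zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


def two_sided_p_normal(t: float) -> float:
    """Two-sided p-value from a normal approximation to the t distribution.
    Only reasonable for large samples (hundreds of observations or more)."""
    return math.erfc(abs(t) / math.sqrt(2))


if __name__ == "__main__":
    for r, n in [(0.06, 754), (0.05, 5000), (0.05, 50000)]:
        t = t_stat_for_r(r, n)
        p = two_sided_p_normal(t)
        print(f"r={r}, n={n}: t={t:.2f}, p~{p:.4f}, R^2={r * r:.4f}")
```

As the sample grows, the p-value shrinks toward zero while R-squared stays stuck at a quarter of one percent: the relationship becomes “significant” without becoming any more useful for predicting a pitcher’s BABIP.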

That’s the next frontier: looking at individual pitches, location, etc. But it’s so hard to isolate all the various things worth testing, because the effectiveness of a pitch is influenced by the pitches that came before it.

Zachary went one step further:

So, I took all starting pitchers with 150 or more innings last year who threw curveballs and analyzed that data set. A small data set, surely, but I was hoping the early returns would allow me to get a direction for multi-year analysis. Annnndddd, success. Here are the results:

Curveball Usage  BABIP   vCurveball   BABIP   X-movement   BABIP   Y-movement   BABIP
<8%              0.293   Top 50%      0.294   Top 50%      0.293   Top 50%      0.290
<12%             0.294   Top 40%      0.296   Top 40%      0.296   Top 40%      0.290
<16%             0.292   Top 30%      0.294   Top 30%      0.296   Top 30%      0.283
<20%             0.291   Top 20%      0.294   Top 20%      0.298   Top 20%      0.280
<24%             0.290   Top 10%      0.295   Top 10%      0.296   Top 10%      0.280
                         Bottom 20%   0.286   Bottom 20%   0.291   Bottom 20%   0.294
                         Bottom 10%   0.289   Bottom 10%   0.284   Bottom 10%   0.296

The data speaks for itself, but generally, the less horizontal movement and the greater the vertical movement on one’s curveball, the lower one’s BABIP. That said, such effects only show up at the extremes of the given skill sets, which is why a simple linear regression (without categorization) will not reveal these underlying forces. As for usage, it appears that pitchers use their curveballs to varying degrees regardless of how good those curveballs are at managing contact; this is likely because of how many swings and whiffs the pitch generates, or because of the other pitches a pitcher has at his disposal and their relative strength or weakness. And finally, there appears to be a “best velocity” for the curveball. I believe this is likely relative to the velocity of one’s primary pitch (four-seamer, two-seamer, sinker): a Goldilocks zone where the pitch is slow enough to maximize movement and differentiate itself from the primary pitch, but fast enough that the hitter cannot pick up on it easily or quickly and lay off it or adjust.
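The categorization approach Zachary describes, ranking pitchers on one trait and averaging BABIP within the top or bottom slice, can be sketched in a few lines. He didn’t say exactly how his buckets were built, so this assumes they are cumulative (the “Top 30%” bucket includes everyone in the top 10% and top 20%). The `share_babip` function, the `y_move` column name and the sample rows are all hypothetical, and the values are invented rather than real pitcher data.

```python
from statistics import mean
from typing import Dict, List


def share_babip(rows: List[Dict[str, float]], metric: str, share: float,
                bottom: bool = False) -> float:
    """Mean BABIP of the top (or, with bottom=True, the bottom) `share` of
    pitchers when ranked by `metric`. Buckets are cumulative, so the top 30%
    bucket also contains the top 10% and top 20%."""
    ranked = sorted(rows, key=lambda r: r[metric], reverse=not bottom)
    k = max(1, round(share * len(ranked)))  # always keep at least one pitcher
    return mean(r["babip"] for r in ranked[:k])


if __name__ == "__main__":
    # Invented placeholder rows; "y_move" stands for vertical curveball movement.
    rows = [
        {"y_move": 4.0, "babip": 0.291},
        {"y_move": 7.5, "babip": 0.302},
        {"y_move": 5.2, "babip": 0.288},
        {"y_move": 9.1, "babip": 0.295},
        {"y_move": 3.3, "babip": 0.297},
    ]
    for share in (0.4, 0.2):
        print(f"Top {share:.0%}: {share_babip(rows, 'y_move', share):.3f}")
    print(f"Bottom 20%: {share_babip(rows, 'y_move', 0.2, bottom=True):.3f}")
```

Comparing slices at the tails this way can surface effects that a single regression line across all pitchers would average away, which is exactly the argument Zachary makes for categorization; the trade-off is that each tail slice contains only a handful of pitchers, so the noise is large.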

Somebody hire this man to do more research!





Mike Podhorzer is the 2015 Fantasy Sports Writers Association Baseball Writer of the Year and three-time Tout Wars champion. He is the author of the eBook Projecting X 2.0: How to Forecast Baseball Player Performance, which teaches you how to project players yourself. Follow Mike on X @MikePodhorzer and contact him via email.

15 Comments
Kris Gardham
9 years ago

I mentioned this in the other post, but there are a couple small correlations (like five percent r²) between speed, movement and BABIP.

For some reason, I thought speed difference from FA->CH came across with the biggest correlation, with like .15-.25, or like 5%.

I also, for some reason, thought X-movement mattered a lot more than y-movement. I didn’t use buckets, though.

Mike, it takes 20 minutes to dump the whole she-bang into a custom report, export it to Excel, and run the correlations. It takes another 20 to calculate speed difference by subtracting pitch a from pitch b (or whatever).

Using custom reports isn’t as great as querying pfx, but it does the trick.