The Best Pitcher Expected BB% Formula Yet

About a month ago, I shared the pitcher expected K% regression equation I came up with, which does a great job of estimating what a pitcher’s strikeout percentage should be given his combination of called, swinging and foul strikes. Since then, I have wanted to do the same for a pitcher’s walk percentage. There have been many attempts at this in the past, but they have had much less success than the attempts at expected strikeout percentage. I believe the highest R-squared attained has been in the low 0.50 range, which isn’t bad, but clearly suggests that there’s a lot more going on that isn’t being accounted for.

Since I still had all the same data from the strikeout percentage study (pitchers with at least 50 innings pitched from 2008-2012, n = 1,629), I decided to make use of it to try devising an equation for walk percentage. Let’s begin by taking a look at some correlations between a pitcher’s BB% and various metrics. The first five are from Baseball Reference, while the rest you should be familiar with.

Metric                                    Correlation w/BB%
Str% (Strike %)                           -0.75
L/Str (Looking Strike %)                   0.09
S/Str (Swinging Strike %)                  0.21
F/Str (Foul Strike %)                      0.03
I/Str (Balls in Play Per Strike %)        -0.29
O-Swing%                                  -0.42
Z-Swing%                                  -0.15
Swing%                                    -0.52
O-Contact%                                -0.28
Z-Contact%                                -0.19
Contact%                                  -0.25
Zone%                                     -0.23
F-Strike%                                 -0.60
SwStr%                                     0.09

As one might expect, the best predictor of walk percentage is strike percentage. Throw more strikes, walk fewer hitters. Simple. Swing% also has a relatively high correlation, which makes sense as well. If a hitter swings, the pitch is going to be either a strike or put into play, neither of which results in a ball.
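For readers who want to build a table like the one above from their own data, here is a minimal sketch of the Pearson correlation in plain Python. The function name `pearson_r` and the four sample pitchers are made up for illustration; they are not taken from the study's data set.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-deviations and sums of squared deviations
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical pitchers (illustrative values only, not real data)
strike_rate = [0.60, 0.62, 0.64, 0.66]
walk_rate = [0.11, 0.09, 0.08, 0.06]
print(round(pearson_r(strike_rate, walk_rate), 2))
```

A negative result means the two rates move in opposite directions, which is the pattern Str% shows against BB%.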

Initially, I decided to once again make use of the strike type rates from Baseball Reference and combine them with Str%. Lo and behold, that resulted in a strong 0.86 correlation, well above previous attempts. However, just before writing this post, I thought that maybe the strike type rates don’t matter all that much and that what matters most is whether the pitch is put into play or not. All of the strike types are pitches not put into play, so together they indirectly measure the pitcher’s ability to prevent non-foul contact. The equation also had four variables, which for some is too many.

So I decided I would instead pare things down and test I/Str along with Str%. I/Str ends up being the remainder after adding up the three strike type rates. If a pitcher’s strike type rates add to 70%, then his I/Str would be 30%. That means that 30% of the strikes this pitcher has thrown have been put into play. Excitingly, the loss in R-squared was negligible, and the equation now requires just two variables.
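The remainder calculation can be sketched in a couple of lines of Python. The function name is my own, and the 27/15/28 split of the 70% is a hypothetical example; only the 70% total and the 30% result come from the paragraph above. Rates are assumed to be expressed as decimals.

```python
def in_play_per_strike(looking, swinging, foul):
    """I/Str as the share of strikes left after the three strike types.

    All inputs are fractions of total strikes (e.g. 0.27 for 27%).
    """
    total = looking + swinging + foul
    if not 0.0 <= total <= 1.0:
        raise ValueError("strike type rates must sum to between 0 and 1")
    return 1.0 - total


# Hypothetical split that adds up to 70% of strikes
i_str = in_play_per_strike(looking=0.27, swinging=0.15, foul=0.28)
print(f"I/Str = {i_str:.0%}")
```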

Now for your graphical viewing pleasure…


xBB% = 0.6583 + (-0.7838 * Str%) + (-0.2685 * I/Str)

The equation makes sense, and the derived coefficients are consistent with what I described above. The biggest factor in a pitcher’s walk percentage is simply the frequency with which he throws strikes, so that’s the largest component of the equation. The more strikes thrown, the lower the walk rate. After that, if a pitcher’s pitches keep getting put into play, then obviously he isn’t going to be walking those batters. So the rate at which the strikes he throws are put into play also plays a role. I am making the assumption that any ball put into play is automatically considered a strike by the metric, even if it was technically outside the strike zone.
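As a quick way to apply the equation, here is a small Python sketch. The coefficients are the ones given above; the function name and the sample pitcher are hypothetical. It assumes both inputs are decimals (0.63 rather than 63), which seems to be what an intercept of 0.6583 implies.

```python
def expected_bb_rate(strike_rate, in_play_per_strike):
    """xBB% from Str% and I/Str, both given as decimals (assumption)."""
    return 0.6583 + (-0.7838 * strike_rate) + (-0.2685 * in_play_per_strike)


# Hypothetical pitcher: 63% strikes, 28% of strikes put into play
xbb = expected_bb_rate(strike_rate=0.63, in_play_per_strike=0.28)
print(f"xBB% = {xbb:.1%}")  # roughly 8.9%
```

Raising either input lowers the result, since both coefficients are negative.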

Though it’s no 0.89 R-squared like I got with the xK% equation, 0.73 is the best I have seen so far. And judging from Matt Klaassen’s post at the beginning of the year that looks at various historical pitcher metric correlations, this equation beats the YoY R-squared for BB% of 0.5184 (0.721 correlation). Furthermore, I tested the equation using 2013 data. I looked at all pitchers with at least 20 innings pitched (n = 445), which is a more relaxed innings pitched requirement than my original population. R-squared on this season’s data was 0.64, less impressive but still better than previous attempts.
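For anyone who wants to run a similar check on a new season, here is a sketch of one common R-squared definition (1 minus residual sum of squares over total sum of squares). I don't know that this is exactly how the figures above were computed; another common approach is to square the correlation between actual and expected values, and the two agree for a least-squares fit evaluated on the sample it was fit to. The function name and the sample numbers are hypothetical.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(actual) != len(predicted) or len(actual) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical actual BB% and xBB% values (illustrative only)
actual_bb = [0.06, 0.08, 0.09, 0.11]
expected_bb = [0.07, 0.08, 0.08, 0.10]
print(round(r_squared(actual_bb, expected_bb), 2))
```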

Of course, there still appears to be room for improvement. I believe that 0.73 is the best we’ve got so far, but we should be able to do better. I’m not sure what else could be looked at with the current slate of metrics available, so it might come down to sequencing. It’s possible that some pitchers really do lose their control mid-game and are prone to bouts of wildness. If a pitcher throws 16 balls all game, but they all come in a row, he has walked four batters. Yet if he pitched seven innings and threw 100 pitches, only 16 balls is one heck of a ratio and would not normally match up with four walks. So sequencing is important, but only if there is a real difference in ability between pitchers.
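The sequencing point can be illustrated with a toy Python sketch. It is heavily simplified: every pitch is either a ball ("B") or a strike ("S"), every strike counts toward a strikeout, and fouls with two strikes and balls in play are ignored. The function name and both pitch sequences are hypothetical.

```python
def count_walks(pitches):
    """Count walks in a string of 'B' (ball) and 'S' (strike) pitches.

    Simplification: a plate appearance ends on ball four (walk) or
    strike three (strikeout); nothing is ever put into play.
    """
    balls = strikes = walks = 0
    for pitch in pitches:
        if pitch == "B":
            balls += 1
        elif pitch == "S":
            strikes += 1
        else:
            raise ValueError(f"unknown pitch code: {pitch!r}")
        if balls == 4:
            walks += 1
            balls = strikes = 0
        elif strikes == 3:
            balls = strikes = 0
    return walks


# Same totals in both cases: 100 pitches, 16 balls, 84 strikes
clustered = "B" * 16 + "S" * 84           # all 16 balls in a row
spread = ("B" + "S" * 5) * 16 + "S" * 4   # one ball every six pitches

print(count_walks(clustered))  # 4
print(count_walks(spread))     # 0
```

Both sequences have an identical strike percentage, yet they produce very different walk totals, which is the kind of variation a rate-based equation cannot see.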

So unlike with my expected strikeout percentage equation, where I was confident that there was little more to do, I end this post with a hope that we keep exploring. If we can continue to develop equations that use underlying skill metrics to project the results, or the surface stats, then I will be a happy man.

Mike Podhorzer is the 2015 Fantasy Sports Writers Association Baseball Writer of the Year. He produces player projections using his own forecasting system and is the author of the eBook Projecting X 2.0: How to Forecast Baseball Player Performance, which teaches you how to project players yourself. His projections helped him win the inaugural 2013 Tout Wars mixed draft league. Follow Mike on Twitter @MikePodhorzer and contact him via email.

labe
9 years ago

How would one calculate ‘balls in play’? Would that just be (gb+fb+ld+iffb) ?

Steve Staude
9 years ago
Reply to  labe

Balls in play, according to the BABIP formula, is AB – K – HR + SF. What you had there is more along the lines of “batted balls,” which is GB + LD + FB. IFFB are included in FB on FanGraphs.