What to Expect When You’re Expecting Strikeouts

I tend to get rather obsessed about starting pitchers, strikeouts, and predictability – three things that go together like Tracy Morgan, funny stuff, and sanity. What keeps me up at night, and what I find particularly unnerving, are sizable, unexplained variations from year to year. Why did Jhoulys Chacin go from 9.04 strikeouts per nine innings pitched in 2010 to 6.96 in 2011? Why did Ricky Nolasco go from 8.39 K/9 to 6.47? Jered Weaver from 9.35 to 7.56? When obvious indicators of age or injury aren’t there, the resulting chaos has me reaching for torches and pitchforks.

My background is mostly in higher education, and doing social science research reveals that a great deal of human behavior is surprisingly predictable. Perhaps what gets me so riled is that measuring athletic performance just isn’t the same as trying to predict, say, how people in Concrete, WA might vote on an initiative. But what we can attempt to do is take the information available to us and partially explain away the outliers. That is, if we have a model that says a pitcher ought to have performed at a certain level, we can then look outside the controlled variables for answers. Or something like that.

About a month ago, Bradley Woodrum nicely demonstrated the relationship between swinging strike rate and strikeout rate, heteroscedasticity be damned. While the fit of the model wasn’t perfect, it demonstrated a pretty interesting relationship. Bradley then went on to present the worst-case scenarios of strikeout rates, making it all the more interesting, but the relationship between the two variables is what got my noodle baking. There have also been a few looks at fastball velocity as it relates to strikeouts, most notably Dave Cameron’s effort from a couple years back.

What I’m after is expected K/9, not as a predictive tool looking forward but as a check, much the way we use expected BABIP. To keep it simple, I looked at the 90 qualified starting pitchers from 2011 and their corresponding K/9 in 2010 (which unfortunately cuts us down to 76 in the sample, as not everyone qualified). In using 2010 K/9, perhaps we can help control for some of the volatility that we see year to year and also capture pitchers who, for one reason or another, don’t fit a model built only on swinging strike rate and fastball velocity (though it does run the risk of multicollinearity).

Performing a correlation on FBv, SwStr%, 2011 K/9, and 2010 K/9, the corresponding correlations look like this:

            K/9 2010   SwStr%    FBv
K/9 2011    0.819**    0.813**   0.506**

All three are statistically significant at the .01 level.
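
As a rough illustration of the calculation, and assuming the figures above are ordinary Pearson coefficients, the sketch below computes a correlation in plain Python for six pitchers copied from the results table later in this article. The function name `pearson_r` and the choice of six rows are assumptions for illustration only, so the subset value will not match the full 76-pitcher figures. The last lines square the three reported correlations, which for a one-variable trend line gives its R-squared.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# (name, 2011 K/9, SwStr% in percent) for six pitchers taken from the article's table.
subset = [
    ("Brandon Morrow", 10.19, 11.5),
    ("Carl Pavano", 4.14, 7.1),
    ("Tim Lincecum", 9.12, 10.7),
    ("Mark Buehrle", 4.78, 6.5),
    ("Justin Verlander", 8.96, 10.2),
    ("Joe Saunders", 4.58, 6.2),
]
k9_2011 = [row[1] for row in subset]
swstr = [row[2] for row in subset]

subset_r = pearson_r(k9_2011, swstr)
print(f"subset r = {subset_r:.3f}")  # illustrative only, not the full-sample value

# Squaring each reported full-sample correlation gives the R-squared
# of the corresponding one-variable trend line.
reported_r = {"K/9 2010": 0.819, "SwStr%": 0.813, "FBv": 0.506}
for name, r in reported_r.items():
    print(f"{name}: r = {r:.3f}, r^2 = {r * r:.3f}")
```

Because correlation is unaffected by rescaling, it makes no difference here whether SwStr% is entered as 11.5 or 0.115.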

The trend line for each, with its corresponding R-squared, looks as follows.

K/9 2010 and 2011 K/9:

Swinging Strike Rate and 2011 K/9:

FBv and 2011 K/9:

The R-squared values for K/9 2010 and SwStr% are right around .67, while the fit for FBv is considerably lower at .256, as expected. Taking all three as independent variables with 2011 K/9 as the dependent variable gives us a model that looks like this (R = .890; R-squared = .792; standard error of the estimate = .689):

K/9 = -7.355 + (.089)*FBv + (39.726)*SwStr% + (.420)*K/9 2010
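
To make the formula concrete, here is a minimal sketch of it as a Python function. The name `expected_k9` is made up for illustration; the coefficients are the ones above. It assumes FBv is in miles per hour and SwStr% is entered as a fraction (11.5% as 0.115), which is the scaling that reproduces the xK/9 values in the table that follows.

```python
def expected_k9(fbv, swstr, k9_2010):
    """Expected 2011 K/9 from the article's three-variable regression.

    fbv      -- average fastball velocity in mph (assumed unit)
    swstr    -- swinging strike rate as a fraction, e.g. 0.115 for 11.5%
    k9_2010  -- the pitcher's 2010 strikeouts per nine innings
    """
    return -7.355 + 0.089 * fbv + 39.726 * swstr + 0.420 * k9_2010

# Brandon Morrow: 93.8 mph fastball, 11.5% swinging strikes, 10.95 K/9 in 2010.
morrow = expected_k9(93.8, 0.115, 10.95)
print(round(morrow, 2))  # 10.16
```

Entering SwStr% as a whole number (11.5 instead of 0.115) would inflate the result by hundreds of strikeouts, so the scaling matters.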

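Running that against all qualified starters for 2011 gives us an expected K/9 (xK/9) for 2011, helping to gauge whether some pitchers pitched above or beneath their predicted ability. The Difference column is 2011 K/9 minus xK/9, so a negative number means the pitcher struck out fewer batters than the model expected.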

Name k92011 k92010 FBv SwStr% xK/9 Difference
Daniel Hudson 6.85 7.93 93.2 9.9% 8.20 -1.35
Edwin Jackson 6.7 7.78 94.5 9.3% 8.02 -1.32
Ricky Nolasco 6.47 8.39 90.5 8.9% 7.76 -1.29
Carl Pavano 4.14 4.76 89.0 7.1% 5.39 -1.25
Luke Hochevar 5.82 6.64 92.7 8.2% 6.94 -1.12
Fausto Carmona 5.2 5.31 92.5 7.9% 6.25 -1.05
Cole Hamels 8.15 9.1 91.7 11.3% 9.12 -0.97
Josh Tomlin 4.84 5.3 87.9 7.7% 5.75 -0.91
Jhoulys Chacin 6.96 9.04 91.0 8.2% 7.80 -0.84
Dan Haren 7.24 8.27 90.0 9.9% 8.06 -0.82
Hiroki Kuroda 7.17 7.29 92.0 10.3% 7.99 -0.82
Ricky Romero 7.12 7.46 92.1 9.6% 7.79 -0.67
Jaime Garcia 7.21 7.27 89.8 10.5% 7.86 -0.65
Joe Saunders 4.58 5.05 89.6 6.2% 5.20 -0.62
Shaun Marcum 7.09 7.6 86.9 10.3% 7.66 -0.57
Jered Weaver 7.56 9.35 89.1 9.1% 8.12 -0.56
Wade Davis 5.14 6.05 91.4 5.9% 5.66 -0.52
Jake Westbrook 5.11 5.68 90.0 6.3% 5.54 -0.43
Bud Norris 8.52 9.25 92.6 10.5% 8.94 -0.42
Mat Latos 8.57 9.21 92.8 10.6% 8.98 -0.41
Josh Beckett 8.16 8.18 93.1 10.5% 8.54 -0.38
James Shields 8.12 8.32 91.0 10.7% 8.49 -0.37
John Lannan 5.17 4.46 89.8 7.6% 5.53 -0.36
Max Scherzer 8.03 8.46 93.1 9.8% 8.38 -0.35
Mike Pelfrey 4.89 5.01 92.1 5.5% 5.13 -0.24
John Danks 7.13 6.85 91.6 9.3% 7.37 -0.24
Justin Masterson 6.57 6.97 92.7 7.5% 6.80 -0.23
Chris Carpenter 7.24 6.86 92.5 9.2% 7.41 -0.17
Tim Lincecum 9.12 9.79 92.3 10.7% 9.22 -0.10
Matt Cain 7.27 7.13 91.2 9.1% 7.37 -0.10
Ervin Santana 7.01 6.83 92.8 8.4% 7.11 -0.10
Gavin Floyd 7.05 7.25 91.2 8.4% 7.14 -0.09
Randy Wolf 5.68 5.93 88.4 6.8% 5.70 -0.02
Rick Porcello 5.14 4.65 90.1 6.3% 5.12 0.02
Matt Harrison 6.17 5.29 92.8 7.6% 6.15 0.02
Brandon Morrow 10.19 10.95 93.8 11.5% 10.16 0.03
Jeremy Guthrie 5.57 5.12 92.5 6.3% 5.53 0.04
Roy Halladay 8.47 7.86 92.0 10.8% 8.42 0.05
Bronson Arroyo 4.88 5.05 87.0 5.8% 4.81 0.07
Jason Vargas 5.87 5.42 87.4 7.8% 5.80 0.07
Colby Lewis 7.59 8.78 89.0 8.2% 7.51 0.08
Chad Billingsley 7.28 8.03 91.5 7.6% 7.18 0.10
Jon Lester 8.55 9.74 92.7 8.7% 8.44 0.11
Justin Verlander 8.96 8.79 95.0 10.2% 8.84 0.12
Kyle Lohse 5.3 5.28 89.4 5.9% 5.16 0.14
Tim Stauffer 6.2 6.64 90.4 6.5% 6.06 0.14
CC Sabathia 8.72 7.46 93.8 11.2% 8.58 0.14
Brett Myers 6.6 7.24 88.4 7.3% 6.45 0.15
Mark Buehrle 4.78 4.24 85.6 6.5% 4.63 0.15
Tim Hudson 6.61 5.47 90.4 8.6% 6.40 0.21
Mike Leake 6.36 5.92 89.1 7.7% 6.12 0.24
Derek Lowe 6.59 6.32 88.0 8.1% 6.35 0.24
Chris Volstad 6.36 5.25 91.3 7.9% 6.11 0.25
R.A. Dickey 5.76 5.35 84.4 7.8% 5.50 0.26
Clayton Kershaw 9.57 9.34 93.4 11.1% 9.29 0.28
Ted Lilly 7.38 7.71 87.4 8.5% 7.04 0.34
A.J. Burnett 8.19 6.99 92.7 10.0% 7.80 0.39
Livan Hernandez 5.08 4.85 83.9 6.4% 4.69 0.39
Wandy Rodriguez 7.82 8.22 89.1 8.5% 7.40 0.42
Yovani Gallardo 8.99 9.73 92.7 9.0% 8.56 0.43
Javier Vazquez 7.57 6.92 90.4 8.9% 7.13 0.44
Ryan Dempster 8.5 8.69 90.3 9.3% 8.03 0.47
Trevor Cahill 6.37 5.4 89.1 7.6% 5.86 0.51
Ian Kennedy 8.03 7.79 90.3 8.8% 7.45 0.58
Felix Hernandez 8.55 8.36 93.3 8.8% 7.96 0.59
Paul Maholm 5.38 4.95 87.4 5.7% 4.77 0.61
Doug Fister 6.08 4.89 90.0 6.7% 5.37 0.71
Matt Garza 8.95 6.62 93.7 11.2% 8.21 0.74
Gio Gonzalez 8.78 7.67 92.5 9.5% 7.87 0.91
David Price 8.75 8.1 94.8 8.4% 7.82 0.93
Ubaldo Jimenez 8.6 8.69 93.5 7.5% 7.60 1.00
Madison Bumgarner 8.4 6.97 91.7 9.2% 7.39 1.01
Anibal Sanchez 9.26 7.25 91.7 10.9% 8.18 1.08
C.J. Wilson 8.3 7.5 91.0 8.3% 7.19 1.11
Cliff Lee 9.21 7.84 91.5 9.3% 7.78 1.43
Zack Greinke 10.54 7.4 92.5 10.6% 8.20 2.34
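
One way to read the Difference column is against the model's standard error of the estimate (.689). The sketch below takes a few rows from the table, recomputes the difference as 2011 K/9 minus xK/9, and flags anyone more than one standard error away. The one-standard-error cutoff and the names `rows` and `flag` are assumptions for illustration, not part of the original analysis.

```python
SEE = 0.689  # standard error of the estimate reported for the model

# (name, 2011 K/9, xK/9) copied from the table above.
rows = [
    ("Daniel Hudson", 6.85, 8.20),
    ("Cole Hamels", 8.15, 9.12),
    ("Brandon Morrow", 10.19, 10.16),
    ("Felix Hernandez", 8.55, 7.96),
    ("Zack Greinke", 10.54, 8.20),
]

def flag(actual, expected, threshold=SEE):
    """Return the residual and a label for how it compares to the threshold."""
    diff = round(actual - expected, 2)
    if diff > threshold:
        label = "above expectation"
    elif diff < -threshold:
        label = "below expectation"
    else:
        label = "within one standard error"
    return diff, label

results = {name: flag(actual, expected) for name, actual, expected in rows}

# Print from the biggest shortfall to the biggest surplus.
for name, (diff, label) in sorted(results.items(), key=lambda kv: kv[1][0]):
    print(f"{name:16s} {diff:+.2f}  {label}")
```

A different cutoff (say, two standard errors) would single out only Greinke among these five, so the threshold is a judgment call rather than a rule.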

I won’t spend a great deal of time here, but it’s worth talking through a few of these. While the more interesting cases exist on the poles, I’d like to point out that the model is in nearly full agreement with what Brandon Morrow did in 2011, predicting a 10.16 K/9 against his actual 10.19 K/9. The model thinks Zack Greinke, Cliff Lee, and C.J. Wilson ought to have looked much more like their 2010 performances, with Greinke being the real outlier. You’ll also notice there are some pretty solid arms at the bottom of that list, and you need to consider cases such as Felix Hernandez and David Price, who experienced rather low swinging strike rates in 2011.

As for pitchers who might see an uptick in K/9 if their velocity holds and they maintain a similar swinging strike rate, the model thought Daniel Hudson, Cole Hamels, Edwin Jackson, and Ricky Nolasco should all have looked more like their 2010 selves than they did in 2011, each falling short of expectation by right around a full strikeout per nine.

This isn’t perfect, and it will fluctuate right along with the fastball velocity and swinging strike rates of each starter. Also, some starting pitchers don’t rely on their fastball for strikeouts, and there may certainly be cases where a pitcher can maintain a higher K/9 rate without a plus fastball (Ian Kennedy, for instance). Where it is at least useful is in looking back at 2011 and trying to understand some of the year-to-year inconsistencies we see in K/9.





Michael was born in Massachusetts and grew up in the Seattle area but had nothing to do with the Heathcliff Slocumb trade although Boston fans are welcome to thank him. You can find him on twitter at @michaelcbarr.

21 Comments
Neal
12 years ago

Would have been interesting to incorporate called strikes and called 3rd strikes. Some pitchers that nibble relentlessly and consistently produce high called 3rd strike rates like Gallardo may be askew in this model. Fun piece though.