xK% Outperformers and Underperformers Through the Years

Yesterday I rolled out an updated version of my pitcher xK% equation, which estimates what a pitcher’s strikeout rate “should be” given various strike and strike type metrics found at Baseball-Reference.com. With my data set, I put together a table calculating historical averages over the period (2011-2016) I compiled data for. I’ll share the top 10 pitchers that have outperformed and underperformed their xK%, ranked by total difference rather than average difference (so if a pitcher outperformed by 2% in 2011 and 3% in 2012, I’ll be looking at the total of 5%, rather than the average of 2.5%), and we’ll try to figure out what, if anything, the pitchers in each group have in common with each other. So let the fun begin!
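As a minimal sketch of the total-versus-average distinction, the snippet below uses only the made-up 2% and 3% gaps from the example above. The names `season_gaps`, `total_gap`, and `average_gap` are hypothetical and are not part of the xK% equation itself.

```python
# Hypothetical per-season K% minus xK% gaps, in percentage points.
# These are the illustrative numbers from the text, not a real pitcher.
season_gaps = {2011: 2.0, 2012: 3.0}


def total_gap(gaps):
    """Sum of the per-season K% - xK% gaps (the ranking metric used here)."""
    return sum(gaps.values())


def average_gap(gaps):
    """Mean of the per-season K% - xK% gaps."""
    return sum(gaps.values()) / len(gaps)


print(total_gap(season_gaps))    # 5.0
print(average_gap(season_gaps))  # 2.5
```

Ranking by total rewards pitchers who beat their xK% over more seasons, which is why Dellin Betances, whose 3.8% average gap is the largest in the outperformers table, still ranks ninth by total.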

xK% Outperformers 2011-2016
Player | Average Season K% | Average Season xK% | Average Season K%-xK% | Total K%-xK%
Craig Kimbrel | 40.5% | 37.2% | 3.3% | 19.7%
Kenley Jansen | 40.1% | 37.0% | 3.1% | 18.4%
Aroldis Chapman | 42.8% | 39.9% | 2.8% | 16.9%
Clayton Kershaw | 29.3% | 26.7% | 2.6% | 15.6%
Stephen Strasburg | 28.9% | 26.3% | 2.6% | 13.1%
David Robertson | 33.1% | 31.0% | 2.1% | 12.6%
Andrew Miller | 35.0% | 33.0% | 2.0% | 11.9%
Cesar Ramos | 18.2% | 16.3% | 1.9% | 11.5%
Dellin Betances | 40.4% | 36.6% | 3.8% | 11.3%
Mike Leake | 16.3% | 14.4% | 1.8% | 11.0%
Group Average | 32.4% | 29.8% | |

Well, this is quite the group of pitchers. On average, they posted a 32.4% strikeout rate each season! Of the 10 pitchers, seven are relievers, and the top three are arguably the best in the game right now. Our top starting pitcher is no surprise, as he’s the best pitcher on the planet.

Are you thinking this is obvious? Of course the best pitchers are going to outperform the formulas, right? They are the outliers. Well, no, I think it’s the reverse. We perceive these pitchers as the best because they do even more than we expect them to, as opposed to knowing they are the best and then learning that they also outperform their xK%. It’s their outperformance that is driving their success, or at least acting as one of the drivers.

Of course, two pitchers seemingly don’t belong: Cesar Ramos and Mike Leake. Ramos has been a journeyman LOOGY who hasn’t exactly racked up the strikeouts, but xK% thinks his strikeout rate should have been even worse. His fastball averaged in the 91 to 92 mph range until 2013, but it has steadily declined since and averaged just 88.2 mph this past season. Smartly, he has moved away from the pitch in favor of his slider and, this year, his changeup.

Leake is another soft-tosser, but unlike Ramos, and the majority of the rest of the human population, he has actually gained fastball velocity. He has essentially been a fastball-cutter pitcher his whole career, mixing in three more pitches for a full five-pitch repertoire.

What do these pitchers have in common? It would be easier to compare their fastball velocities and the pitch each chooses as his secondary weapon:

xK% Overperformers 2011-2016
Player | FBv (mph) | Most Frequently Used Secondary Pitch
Craig Kimbrel | 96.9 | CB
Kenley Jansen | 92.9 | SL
Aroldis Chapman | 98.9 | SL
Clayton Kershaw | 93.2 | SL
Stephen Strasburg | 95.2 | CB
David Robertson | 92.2 | CB
Andrew Miller | 93.9 | SL
Cesar Ramos | 90.5 | SL
Dellin Betances | 96.9 | SL
Mike Leake | 90.3 | CT
Group Average | 94.1 |

So clearly as a group, their fastball velocity is above the MLB average. To avoid arguing whether some of the pitchers really throw a slider, or is it actually a curve ball, or wait, it’s a SLURVE (!!), let’s just say that nine of the ten use a breaking ball to complement their fastball. Only Leake leans on the cutter, which isn’t entirely fair, since that’s a type of fastball. And heck, that’s really what Jansen’s fastball is too. If we ignore Leake’s cutter, we’re left with his curve ball, or slider, since they have almost identical usage (the changeup is also just barely behind).

What’s interesting to me is that none of these pitchers rely on the changeup. It’s all breaking balls, all the time when they aren’t throwing the fastball. Hmmmmm.

Now let’s check the underperformers:

xK% Underperformers 2011-2016
Player | Average Season K% | Average Season xK% | Average Season K%-xK% | Total K%-xK%
Matt Belisle | 18.2% | 20.9% | -2.7% | -16.4%
Luke Gregerson | 22.9% | 25.1% | -2.2% | -13.4%
Fernando Rodney | 24.4% | 26.6% | -2.2% | -13.2%
Louis Coleman | 23.3% | 26.4% | -3.2% | -12.6%
James Russell | 16.4% | 18.6% | -2.2% | -11.1%
Randall Delgado | 19.5% | 21.4% | -1.8% | -11.0%
Hector Noesi | 16.1% | 18.8% | -2.7% | -10.8%
Nathan Eovaldi | 16.8% | 18.6% | -1.8% | -10.5%
R.A. Dickey | 18.2% | 20.0% | -1.8% | -10.5%
Pat Neshek | 22.3% | 24.9% | -2.6% | -10.5%
Group Average | 19.8% | 22.1% | |

Wowzers. That’s one boring group of pitchers. And quite a bit less attractive than the first group. But again, this shouldn’t be too surprising as these pitchers haven’t enjoyed as much success because they have posted results that didn’t quite match up with some of their underlying advanced metrics.

There are really only two full-time starters on the list, with a couple of others having some starts here and there included in their averages. Obviously, the group’s average strikeout rate is far lower than that of the overperformers. The rich get richer and the poor get poorer.

It’s sad to find Nathan Eovaldi, king of the “strikeout rate doesn’t match the stuff” theme, on this list. Of course he’s here! I have even less of an idea of whether there’s a common thread by just looking at the names than I did with the outperformers group, so let’s go through the same exercise:

xK% Underperformers 2011-2016
Player | FBv (mph) | Most Frequently Used Secondary Pitch
Matt Belisle | 91.1 | SL
Luke Gregerson | 89.0 | SL
Fernando Rodney | 95.4 | CH
Louis Coleman | 89.6 | SL
James Russell | 89.0 | SL
Randall Delgado | 92.3 | CH
Hector Noesi | 93.0 | CH
Nathan Eovaldi | 95.8 | SL
R.A. Dickey | 76.2* | FB
Pat Neshek | 89.4 | SL
Group Average | 90.1 |
*Obviously Dickey’s velocity is based on his knuckleball, not his fastball, and I indicated his FB as his secondary pitch (ha!)

So as a group, the underperformers averaged 4 mph less in fastball velocity than the overperformers. Excluding Dickey, however, the gap narrows to 2.5 mph, which is still rather significant.
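For anyone who wants to check the velocity gap, here is a minimal sketch that recomputes it from the FBv columns of the two tables above. The helper name `avg_fbv` and the dictionary layout are my own illustrative choices, not part of any published tool.

```python
from statistics import mean

# Fastball velocities (mph) copied from the two FBv tables in this post.
outperformers = {
    "Craig Kimbrel": 96.9, "Kenley Jansen": 92.9, "Aroldis Chapman": 98.9,
    "Clayton Kershaw": 93.2, "Stephen Strasburg": 95.2, "David Robertson": 92.2,
    "Andrew Miller": 93.9, "Cesar Ramos": 90.5, "Dellin Betances": 96.9,
    "Mike Leake": 90.3,
}
underperformers = {
    "Matt Belisle": 91.1, "Luke Gregerson": 89.0, "Fernando Rodney": 95.4,
    "Louis Coleman": 89.6, "James Russell": 89.0, "Randall Delgado": 92.3,
    "Hector Noesi": 93.0, "Nathan Eovaldi": 95.8, "R.A. Dickey": 76.2,
    "Pat Neshek": 89.4,
}


def avg_fbv(group, exclude=()):
    """Average velocity for a group, optionally skipping named pitchers."""
    return mean(v for name, v in group.items() if name not in exclude)


gap_all = avg_fbv(outperformers) - avg_fbv(underperformers)
gap_no_dickey = avg_fbv(outperformers) - avg_fbv(
    underperformers, exclude=("R.A. Dickey",)
)

print(round(gap_all, 1))        # 4.0
print(round(gap_no_dickey, 1))  # 2.5
```

Dickey's knuckleball reading alone accounts for about 1.5 mph of the raw 4 mph gap, which is why the exclusion matters.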

It’s interesting to suddenly see changeups show up on the list, but three pitchers out of ten doesn’t really suggest anything. Perhaps if the majority of the group featured the changeup, we could theorize that for whatever reason, a breaking ball was leading to outperformance, while the changeup was leading to underperformance. And although that could still be the case, these two groups of just ten pitchers certainly don’t prove it.
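To put a rough number on how little three changeups tell us, the sketch below tallies the secondary pitches from the two tables and then asks a simple question: if three of these 20 pitchers were changeup-first and group membership had nothing to do with pitch choice, how often would all three land in the underperformers group? Treating the labels as randomly assigned is my own simplifying assumption, and the variable names are illustrative.

```python
from collections import Counter
from math import comb

# Most frequently used secondary pitch, in table order, from this post.
outperformer_pitches = ["CB", "SL", "SL", "SL", "CB", "CB", "SL", "SL", "SL", "CT"]
underperformer_pitches = ["SL", "SL", "CH", "SL", "SL", "CH", "CH", "SL", "FB", "SL"]

out_counts = Counter(outperformer_pitches)
under_counts = Counter(underperformer_pitches)

# Counter returns 0 for a pitch type that never appears.
print(out_counts["CH"], under_counts["CH"])  # 0 3
print(out_counts["SL"], under_counts["SL"])  # 6 6

# Assumption: pitch label is independent of group (hypergeometric draw).
# Chance that all 3 changeup pitchers fall among the 10 underperformers.
p_all_three = comb(10, 3) / comb(20, 3)
print(round(p_all_three, 3))  # 0.105
```

A roughly one-in-ten chance under pure randomness is nowhere near rare enough to call the changeup split a pattern, and the slider is equally common in both groups.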

I do recall some previous incarnations of expected strikeout rate equations using fastball velocity as an input. I’m fairly sure that the Steamer projections do; however, I don’t think they use any of the strike type rates like I do. I just always assumed that fastball velocity would simply show up in the strike type rates, with higher velocity leading to a higher S/Str% and lower velocity leading to a lower one. But maybe there’s something else that a harder fastball is doing to allow the overperformers to outperform? Unfortunately, if I try to introduce fastball velocity into my equation, the commenters already concerned about the prospect of multicollinearity are going to have their heads explode. So maybe I should proceed with caution, as there’s no doubt fastball velocity correlates highly with S/Str%.
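One hedged way to gauge the multicollinearity worry before adding velocity is to measure how strongly it tracks S/Str%. The sketch below computes a Pearson correlation and, for the simplified case of only two predictors, the variance inflation factor 1 / (1 - r^2). The function names are hypothetical, the real xK% equation has more than two inputs (so a full check would regress each predictor on all the others), and no actual pitcher data is included here.

```python
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def vif_two_predictors(xs, ys):
    """Variance inflation factor when the model has exactly two predictors.

    With two predictors, each one's VIF is 1 / (1 - r^2), where r is the
    correlation between them. Undefined (division by zero) if |r| == 1.
    """
    r = pearson_r(xs, ys)
    return 1.0 / (1.0 - r * r)
```

Passing per-pitcher lists of fastball velocity and S/Str% to `vif_two_predictors` would return 1.0 for uncorrelated inputs and climb quickly as the correlation approaches 1; a common rule of thumb treats values above roughly 5 to 10 as a warning sign.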





Mike Podhorzer is the 2015 Fantasy Sports Writers Association Baseball Writer of the Year. He produces player projections using his own forecasting system and is the author of the eBook Projecting X 2.0: How to Forecast Baseball Player Performance, which teaches you how to project players yourself. His projections helped him win the inaugural 2013 Tout Wars mixed draft league. Follow Mike on Twitter @MikePodhorzer and contact him via email.

5 Comments
wgmcd
7 years ago

Most of the overachievers are at the extremes (whether it is for starters or relievers). 1. Your model may just break down a bit at the extremes and not capture the impact of extremely high swinging strike rates. The relationship between the predictor variables and K% is probably not truly linear, but close enough for government work for the most part. When you get up into the extraordinarily high values, the difference between a linear relationship and whatever the “true” relationship is gets magnified. 2. At the extremes, there just aren’t that many data points, so the model will likely just not fit as well out there. If I had to guess, if you did a residual plot (residuals on the Y axis, independent variable on the X axis), I bet you’d see a pattern of systematic underestimates on the high end.