Introducing the New New xK%, Featuring the Changeup Adjustment

January 23, 2017

Last week, I reintroduced my xK% equation, this time with updated coefficients. The equation’s components were exactly the same, so there was nothing new or exciting to report. However, on the following day, I published the top 10 over/underperformers from 2011 to 2016 as a fun little exercise to learn who has broken the model. I then attempted to figure out whether there was a common theme among the over/underperformers and, after performing some additional research and calculations, settled on a possible explanation: changeups are bad for strikeouts. Turns out, I was actually onto something, and that something had already been discovered two years ago.

Frequent commenter Josiah Rutledge (whose commenter user name escapes me) emailed me to remind me of Eno’s article from January 2015 on this exact topic. Eno asked whether the changeup has a strikeout problem and the data he presents results in a resounding yes.

Since that validated what I thought might be missing from my xK%, I decided to go back to work. I added CH% to my data set and was encouraged to find that its correlation with K% was -0.184 during the 2011 to 2015 period. So far, so good. So I took the next step by rerunning the regression, and voilà, a new equation.

xK% = -0.8269 + (Str% * 0.2893) + (L/Str * 1.2515) + (S/Str * 1.1526) + (F/Str * 0.9454) + (CH% * -0.0282)

Adjusted R-Squared: 0.933

Str% — Strike Percentage | Strikes / Total Pitches
L/Str — Looking Strike Percentage | Strikes Looking / Total Strikes
S/Str — Swinging Strike Percentage | Strikes Swinging Without Contact / Total Strikes
F/Str — Foul Ball Strike Percentage | Pitches Fouled Off / Total Strikes
CH% — Percentage of changeups thrown
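For anyone who wants to plug numbers in, here is a minimal Python sketch of the equation above. It assumes every input is expressed as a fraction (0.64 rather than 64), which is what the size of the intercept suggests but is not stated outright. The function name and the sample pitcher are hypothetical, made up purely for illustration.

```python
def new_xk_pct(str_pct, l_str, s_str, f_str, ch_pct):
    """Expected strikeout rate from the equation above.

    All inputs are assumed to be fractions (e.g. 0.64, not 64).
    """
    return (
        -0.8269
        + str_pct * 0.2893   # Strikes / Total Pitches
        + l_str * 1.2515     # Strikes Looking / Total Strikes
        + s_str * 1.1526     # Strikes Swinging Without Contact / Total Strikes
        + f_str * 0.9454     # Pitches Fouled Off / Total Strikes
        + ch_pct * -0.0282   # Share of pitches that were changeups
    )


# Hypothetical pitcher, not taken from the data set
sample = new_xk_pct(str_pct=0.64, l_str=0.27, s_str=0.17, f_str=0.28, ch_pct=0.10)
print(f"xK% = {sample:.1%}")
```

Note how small the CH% term is next to the others: going from zero changeups to 10% changeups moves the result by less than three-tenths of a percentage point.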

If you recall (yeah, you definitely don’t), the R-squared from last week’s update was 0.931. So, this equation added all of .002 to the R-squared. It might seem like nothing and pointless, but a) hey, the R-squared did increase, albeit barely and b) the addition of CH% accomplished what I wanted it to, as you’ll see below.

Before getting to the individual players, here are a few more results from the new equation compared to last week’s update:

New xK% Correlations

	New xK%	Last Week’s xK%
xK% Y1 to K% Y2	0.7173	0.7158
xK% 2015 to K% 2016	0.7137	0.7137
xK% Y1 to xK% Y2	0.7492	0.7475

So again, we see the tiniest of gains from this new equation. It predicts K% in year 2 ever so slightly better (though it somehow performed exactly the same from 2015 to 2016), while it’s marginally more stable from year to year.

Furthermore, CH% during the period was remarkably consistent, as its year-over-year correlation was 0.8772. It’s nice to add a component to the equation that doesn’t tend to bounce around each season.
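A year-over-year correlation like this one pairs each pitcher's value in one season with his value in the next, then runs a plain Pearson correlation over the pairs. The sketch below shows those mechanics only. The helper names and the toy numbers are made up for illustration and are not from my data set.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def year_over_year_pairs(seasons):
    """seasons maps (pitcher, year) -> value.

    Returns two lists: year-1 values and the matching year-2 values,
    keeping only pitchers who appear in consecutive seasons.
    """
    y1, y2 = [], []
    for (pitcher, year), value in seasons.items():
        next_value = seasons.get((pitcher, year + 1))
        if next_value is not None:
            y1.append(value)
            y2.append(next_value)
    return y1, y2


# Made-up CH% values, only to show the pairing step
toy_ch_pct = {
    ("Pitcher A", 2014): 0.10, ("Pitcher A", 2015): 0.12,
    ("Pitcher B", 2014): 0.25, ("Pitcher B", 2015): 0.27,
    ("Pitcher C", 2014): 0.02, ("Pitcher C", 2015): 0.04,
    ("Pitcher D", 2015): 0.15,  # no adjacent season, so no pair
}
y1, y2 = year_over_year_pairs(toy_ch_pct)
print(len(y1), "pairs, r =", round(pearson(y1, y2), 4))
```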

Alright, now let’s get to the players and how, as a group, they have been affected.

New xK% vs Last Week’s xK% Group Averages

	FBv	CH%	K%	New xK%	New K%-xK%	Last Week’s xK%	Last Week’s K%-xK%
Overperformer Averages (150)	92.3	7.50%	22.00%	20.53%	1.45%	20.46%	1.53%
Underperformer Averages (150)	91.2	10.80%	20.10%	21.89%	-1.80%	21.92%	-1.83%

So compared to last week’s equation, we see yet again that the new version offers the smallest of gains, but gains nonetheless. Obviously, we’re not seeking a perfect match between K% and xK% for these two groups. There are always going to be outliers, and not every data point should be expected to fit a regression model.

Now for the most important comparison, the changeup guys. On average, the pitchers in the entire data set threw the changeup 9.5% of the time. Below is a comparison of the 50 pitchers who throw changeups most frequently, along with all pitchers that have never thrown a changeup:

New xK% Changeup Group Averages

	FBv	CH%	K%	New xK%	New K%-xK%	Last Week’s xK%	Last Week’s K%-xK%
Heavy Changeup Throwers (50)	90.7	30.2%	19.5%	19.9%	-0.5%	20.5%	-1.0%
Non-Changeup Throwers (103)	92.1	0.0%	21.4%	22.3%	-0.8%	22.0%	-0.5%

Woah, woah, woah, hold my horses. First, the good news: I set out to correct the underperformers after my theory seemed to indicate it was the changeup causing an issue. And that was essentially resolved, as the new xK% for the 50 heaviest changeup throwers is now much closer to the group’s actual K% than it was under last week’s equation. A reduction of 0.6 percentage points in xK% from the old to the new equation is pretty significant. Success!

HOWEVER, now the non-changeup throwers are all screwed up. They received an unearned xK% boost simply because they failed to throw even one changeup, and now xK% overrates the group even more than last week’s version did (even expanding the non-changeup group to include all very low users of the pitch still increased the gap between K% and xK%).
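A rough way to see where that boost comes from is to look at the CH% term alone, measured against a pitcher with the data set's average changeup usage of 9.5%. The sketch below does that arithmetic, assuming CH% enters the equation as a fraction. It is a back-of-the-envelope check, not a full comparison of the two equations, since every other coefficient was refit as well. The function name is hypothetical.

```python
CH_COEF = -0.0282     # CH% coefficient from the new equation
AVG_CH_PCT = 0.095    # average changeup usage in the data set


def ch_shift_vs_average(ch_pct):
    """CH% term relative to an average-usage pitcher, in percentage points."""
    return CH_COEF * (ch_pct - AVG_CH_PCT) * 100


heavy = ch_shift_vs_average(0.302)  # Heavy Changeup Throwers group CH%
none = ch_shift_vs_average(0.0)     # Non-Changeup Throwers group CH%
print(f"heavy changeup throwers: {heavy:+.2f} points")
print(f"non-changeup throwers:   {none:+.2f} points")
```

Under those assumptions the term docks the heavy changeup group a bit under 0.6 points and hands the non-changeup group a bit under 0.3 points, which is in the same ballpark as the moves in the table (20.5% to 19.9%, and 22.0% to 22.3%).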

Check out the FBv (fastball velocity) column. My hope was that the non-changeup throwers would come in below the data set average (91.8 mph), which might suggest the need for incorporating that as a variable. But since the group sits above the average, all that would happen is my xK% would jump again and get even further away.

If nothing else, this table suggests that pitchers throw more changeups when their fastball ain’t so fast. That makes sense, of course. The correlation between CH% and FBv in my data set was -0.2126.

I’m not happy that while fixing the heavy changeup users, I seemingly broke everyone else. It’s kind of silly to feel the need to improve an equation that already sports an R-squared above .90, but hey, I’m a perfectionist.

Thoughts? Advice on how to fix the non-changeup users? Do you feel last week’s (without CH%) or this week’s (with CH%, and assuming no adjustment to fix what I broke) xK% should be the official version for my future posts?

13 Comments


scotman144 (Member since 2016)

8 years ago

Would it be against the spirit of your equation to put a conditional on things that could break it like throwing zero changeups? If you had (for example) an IF CH thrown > 100 threshold for that part of the equation to be included you’d solve the problem without really changing anything.

Reply to scotman144

More concisely: you could just wrap your old and new formulas in an IF statement in excel and hopefully get the best of both worlds.

=IF(CH% > [threshold value], new equation, old equation)

Mike Podhorzer (FanGraphs Staff)

Yeah, I thought about that, but it seems so arbitrary. Like I mentioned in parentheses, it’s not just the no changeup guys. Even if I expanded that group to below 3%, the gap between K% and xK% still increased. So it’s not as easy as just an IF statement.
