Updating My Hitter xK% Metric

A whopping eight years ago, I shared the hitter xK% metric I developed using a couple of our plate discipline metrics. It was quite good: despite using only three variables, it had a strong R-squared of 0.81. Since then, I haven’t discussed it much, but I still use it to help formulate my Pod Projections. However, I have actually been using an updated version that I never shared, and it’s even better. The comments on my recent xwOBA articles inspired me to finally reveal the latest and greatest version of the hitter xK% metric.

If you recall, I’ve shared several versions of my pitcher xK%, with this one being the latest. It then dawned on me that perhaps I should try using the same variables for the hitter version. Sure enough, it worked, leading to an even higher R-squared. Now as I type this, I have come to the realization that I actually updated this pitcher xK% formula as well and haven’t shared it, and the newest hitter xK% uses the same variables as the newest, unshared, version of the pitcher xK%. So let’s just get to the equation and I’ll explain.

My data set included all batter seasons with at least 50 plate appearances from 2014-2018, with a total population of 2,530.

Hitter xK% = -0.65909 + (L/Str * 1.774892871) + (S/Str * 1.989362763) + (F/Str * 1.58423575) + (3-0% * -0.156090822) + (Pit/PA * -0.097239383)

Adjusted R-Squared: 0.941

All metrics are from Baseball-Reference.

L/Str — Looking Strike Percentage | Strikes Looking / Total Strikes
S/Str — Swinging Strike Percentage | Strikes Swinging Without Contact / Total Strikes
F/Str — Foul Ball Strike Percentage | Pitches Fouled Off / Total Strikes
3-0% — 3-0 Count Seen Percentage | 3-0 Counts / Plate Appearances
Pit/PA — Pitches Per Plate Appearance | Pitches Seen / Plate Appearances
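
To make the equation concrete, here is a minimal Python sketch of it. The function name `hitter_xk` and the sample inputs are hypothetical, and the sketch assumes the percentage inputs are entered as decimals (0.17 rather than 17), since that scaling is the one that produces strikeout rates in a believable range.

```python
# Coefficients copied from the Hitter xK% equation above.
INTERCEPT = -0.65909
L_STR_COEF = 1.774892871
S_STR_COEF = 1.989362763
F_STR_COEF = 1.58423575
THREE_OH_COEF = -0.156090822
PIT_PA_COEF = -0.097239383


def hitter_xk(l_str, s_str, f_str, three_oh, pit_pa):
    """Return xK% as a decimal.

    Assumption: l_str, s_str, f_str and three_oh are decimals (0.27, not 27).
    pit_pa is pitches per plate appearance (for example 3.88).
    """
    return (
        INTERCEPT
        + l_str * L_STR_COEF
        + s_str * S_STR_COEF
        + f_str * F_STR_COEF
        + three_oh * THREE_OH_COEF
        + pit_pa * PIT_PA_COEF
    )


# Made-up inputs for illustration only, not any real hitter's line.
example = hitter_xk(l_str=0.27, s_str=0.17, f_str=0.28, three_oh=0.05, pit_pa=3.88)
print(round(example, 3))
```

With these made-up inputs the result is about 0.217, or a 21.7% xK%.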

Can you really get much higher than a 0.941 R-squared? There will always be exceptions and consistent beaters and missers, but this seems to be as slam dunk an equation as it could possibly get.
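
As a side note on the adjusted figure: adjusted R-squared discounts ordinary R-squared for the number of variables used. A quick sketch, assuming the standard adjustment formula and taking the 2,530 batter seasons and five variables from above, suggests the discount is tiny at this sample size. The function name is made up for illustration, and the 0.941 input is itself rounded, so treat the output as approximate.

```python
def r2_from_adjusted(adj_r2, n, p):
    """Invert adj = 1 - (1 - R2) * (n - 1) / (n - p - 1) to recover plain R2.

    n is the number of observations and p is the number of predictors.
    """
    return 1 - (1 - adj_r2) * (n - p - 1) / (n - 1)


# n and p come from the article: 2,530 batter seasons, five variables.
plain_r2 = r2_from_adjusted(0.941, n=2530, p=5)
print(round(plain_r2, 4))
```

The implied unadjusted R-squared is roughly 0.9411, so the adjustment moves the figure by only about 0.0001.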

Like in my original pitcher xK% equation, I am using the three strike type rates as variables, as they perform better split up than overall strike percentage, or just one or two of the trio. I have also added the last two variables, which haven’t been shared in any of my previous equations.

3-0% is important because it accounts for sequencing, which is something that I always knew was missing, but was impossible to factor in with any other variable. If over two plate appearances, a hitter sees four balls and four strikes, the sequencing of those pitches will have a significant effect on the outcomes of those PAs. If those four balls were thrown consecutively during one PA, he’s going to trot to first base. Instead, if those balls and strikes alternated, any outcome is possible. 3-0% might not be a batter skill (I haven’t looked into its YoY correlation), but since xK% is meant to be a backward looking, descriptive metric, and not predictive, it doesn’t matter as much.

Pit/PA is an obvious driver of strikeouts — the more pitches a batter sees, the more opportunities he has to strike out. It has a 0.378 correlation with strikeout rate between 2014 and 2018. Oddly, the variable has a negative coefficient, so in this equation, a higher Pit/PA actually reduces a hitter’s xK%. That doesn’t make any sense to me, so maybe you mathematically smarter readers could help explain. The equation is much better with the metric included though, so I’m not entirely sure what’s going on, but I wanted to keep it in given its high correlation with strikeout rate.

Correlations With K% from 2014-2018
Metric | Correlation With K%
L/Str | -0.095
S/Str | 0.808
F/Str | -0.111
3-0% | -0.137
Pit/PA | 0.378

It’s not just Pit/PA that features a coefficient opposite its correlation with strikeout rate. L/Str and F/Str both correlate negatively, but have positive coefficients in the equation. I assume it’s the way regression equations are derived and the interaction between all the variables that produces coefficients that aren’t always what you expect.
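
One way to see how a sign flip like this can arise is a toy regression with two predictors that move in opposite directions. The sketch below uses made-up numbers, not baseball data, and the helper names are hypothetical. The outcome `y` is built so that both predictors truly push it upward, yet the second predictor still correlates negatively with `y` on its own because it tends to be low whenever the stronger first predictor is high. Whether that is what is happening in the xK% equation is only a guess.

```python
from math import sqrt


def cross(a, b):
    """Sum of products of deviations from the mean (unscaled covariance)."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))


def corr(a, b):
    """Pearson correlation between two equal-length lists."""
    return cross(a, b) / sqrt(cross(a, a) * cross(b, b))


def two_predictor_slopes(x1, x2, y):
    """Least-squares slopes for y ~ x1 + x2, from the normal equations."""
    s11, s22, s12 = cross(x1, x1), cross(x2, x2), cross(x1, x2)
    s1y, s2y = cross(x1, y), cross(x2, y)
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


# Toy data: x2 tends to fall as x1 rises.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [6, 4, 5, 3, 1, 2]
# Both predictors raise y by construction (true slopes 2 and 1).
y = [2 * a + b for a, b in zip(x1, x2)]

print("corr(x2, y):", round(corr(x2, y), 3))  # negative on its own
print("slopes:", two_predictor_slopes(x1, x2, y))  # both positive
```

Here the simple correlation between `x2` and `y` is negative, while the regression recovers a positive slope for `x2` once `x1` is held constant. The three strike rates share a denominator, so a similar tug-of-war among them is at least plausible.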

It’s clear from the correlation table that S/Str has the largest impact on a batter’s strikeout rate. The same is true from the pitcher’s perspective. I’m not sure why it’s so much higher than L/Str, as both strike types could result in strike three, so it probably relates to the types of other skills a batter/pitcher typically owns when they have high/low S/Str vs L/Str marks. In other words, if a hitter has a high S/Str, he likely does other things that correlate with a higher strikeout rate, whereas that’s not the case when a hitter has a high L/Str. That’s just my guess in trying to understand these correlations.

Mike Podhorzer is the 2015 Fantasy Sports Writers Association Baseball Writer of the Year. He produces player projections using his own forecasting system and is the author of the eBook Projecting X 2.0: How to Forecast Baseball Player Performance, which teaches you how to project players yourself. His projections helped him win the inaugural 2013 Tout Wars mixed draft league. Follow Mike on Twitter @MikePodhorzer and contact him via email.


“Oddly, the variable has a negative coefficient, so in this equation, a higher Pit/PA actually reduces a hitter’s xK%.”

I would think it’s likely because, in theory, once you get past 6 pitches in a PA, it means you’re fouling pitches off by definition, and if you get to two strikes before that, you’re extending the AB by either fouling off pitches or taking balls. Over your defined sample, a full 51.3% of PAs had a two-strike count, with batters seeing an average of 2.172 pitches after they got to 2 strikes. The average plate appearance only had 3.878 pitches for the same sample. So, if more than half of your PAs have two strikes, which also means a minimum of two pitches, once you start extending the PA, it means you either have a good eye, are good at fouling off strikes, or both.


Also, I have an xK% equation for hitters based on data from 2016-2020 that has an r^2 of .78 for hitters with 50 PA this year, and it’s only based on 22.3% of the pitches the average batter sees. My pitcher one has a .72 r^2 for pitchers with at least 50 batters faced, based on the same pitches. BB% is tougher, my r^2s are only .64 and .53 for hitters and pitchers respectively, and it’s based on 53.6% of pitches.