Updating My Hitter xK% Metric
by Mike Podhorzer, July 1, 2021

A whopping eight years ago, I shared the hitter xK% metric I developed using a couple of our plate discipline metrics. It was quite good: it used only three variables, yet still posted a strong R-squared of 0.81. Since then, I haven’t discussed it all that much, but I still use it to help formulate my Pod Projections. However, I have actually been using an updated version that I never shared, and it’s even better. The comments on my recent xwOBA articles inspired me to finally reveal the latest and greatest version of the hitter xK% metric.

If you recall, I’ve shared several versions of my pitcher xK%, with this one being the latest. It then dawned on me that perhaps I should try using the same variables for the hitter version. Sure enough, it worked, producing an even higher R-squared. As I type this, I’ve realized that I also updated that pitcher xK% formula and never shared it, and the newest hitter xK% uses the same variables as the newest, unshared version of the pitcher xK%.

So let’s just get to the equation, and then I’ll explain. My data set included all batter seasons with at least 50 plate appearances from 2014-2018, a total population of 2,530.

Hitter xK% = -0.65909 + (L/Str * 1.774892871) + (S/Str * 1.989362763) + (F/Str * 1.58423575) + (3-0% * -0.156090822) + (Pit/PA * -0.097239383)

Adjusted R-Squared: 0.941

All metrics are from Baseball-Reference.

L/Str: Looking Strike Percentage | Strikes Looking / Total Strikes
S/Str: Swinging Strike Percentage | Strikes Swinging Without Contact / Total Strikes
F/Str: Foul Ball Strike Percentage | Pitches Fouled Off / Total Strikes
3-0%: 3-0 Count Seen Percentage | 3-0 Counts / Plate Appearances
Pit/PA: Pitches Per Plate Appearance | Pitches Seen / Plate Appearances

Can you really get much higher than a 0.941 R-squared?
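To make the equation easy to apply, here is a minimal Python sketch that plugs the published coefficients into a function and also derives each rate from raw counts using the definitions listed above. Two assumptions to flag: the rates are entered as decimal fractions (0.27 rather than 27%), which the article doesn’t state explicitly, and the names `HitterSeason`, `hitter_xk`, and `hitter_xk_from_counts` are hypothetical, not part of any Baseball-Reference tool.

```python
from dataclasses import dataclass

# Coefficients as published in the article.
INTERCEPT = -0.65909
COEF_L_STR = 1.774892871
COEF_S_STR = 1.989362763
COEF_F_STR = 1.58423575
COEF_THREE_OH = -0.156090822
COEF_PIT_PA = -0.097239383


@dataclass
class HitterSeason:
    """Hypothetical container for the raw season counts behind each rate."""
    strikes: int            # total strikes
    strikes_looking: int    # called strikes
    strikes_swinging: int   # swinging strikes without contact
    strikes_fouled: int     # pitches fouled off
    three_oh_counts: int    # 3-0 counts seen
    plate_appearances: int
    pitches: int            # pitches seen


def hitter_xk(l_str, s_str, f_str, three_oh_pct, pit_pa):
    """Hitter xK% from the five rates.

    ASSUMPTION: rates are decimal fractions (0.27, not 27); Pit/PA is a plain average.
    """
    return (INTERCEPT
            + l_str * COEF_L_STR
            + s_str * COEF_S_STR
            + f_str * COEF_F_STR
            + three_oh_pct * COEF_THREE_OH
            + pit_pa * COEF_PIT_PA)


def hitter_xk_from_counts(s: HitterSeason):
    """Convert raw counts to the rates defined in the article, then apply the equation."""
    return hitter_xk(
        l_str=s.strikes_looking / s.strikes,
        s_str=s.strikes_swinging / s.strikes,
        f_str=s.strikes_fouled / s.strikes,
        three_oh_pct=s.three_oh_counts / s.plate_appearances,
        pit_pa=s.pitches / s.plate_appearances,
    )


# Made-up illustrative inputs, not a real player's line.
print(round(hitter_xk(0.27, 0.165, 0.27, 0.03, 3.9), 4))  # prints 0.1922
```

The sample inputs are invented for illustration; with them the function returns roughly 0.192, i.e., an xK% of about 19.2%.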
There will always be exceptions, along with hitters who consistently beat or miss their xK%, but this seems to be as slam-dunk an equation as it could possibly get.

As in my original pitcher xK% equation, I am using the three strike-type rates as separate variables, since they perform better split up than overall strike percentage does, or than just one or two of the trio. I have also added the last two variables, which haven’t appeared in any of my previous equations.

3-0% is important because it accounts for sequencing, which I always knew was missing but was impossible to capture with any other variable. If, over two plate appearances, a hitter sees four balls and four strikes, the sequencing of those pitches will have a significant effect on the outcomes of those PAs. If the four balls come consecutively during one PA, he’s going to trot to first base. If instead the balls and strikes alternate, any outcome is possible. 3-0% might not be a batter skill (I haven’t looked into its year-over-year correlation), but since xK% is meant to be a backward-looking, descriptive metric rather than a predictive one, that doesn’t matter as much.

Pit/PA is an obvious driver of strikeouts: the more pitches a batter sees, the more opportunities he has to strike out. It had a 0.378 correlation with strikeout rate over 2014-2018. Oddly, the variable has a negative coefficient, so in this equation, a higher Pit/PA actually reduces a hitter’s xK%. That doesn’t make any sense to me, so maybe you mathematically smarter readers could help explain. The equation is much better with the metric included, though, so while I’m not entirely sure what’s going on, I wanted to keep it in given its sizable correlation with strikeout rate.

Correlations With K%, 2014-2018

L/Str    S/Str    F/Str    3-0%     Pit/PA
-0.095   0.808    -0.111   -0.137   0.378

It’s not just Pit/PA that features a coefficient opposite in sign to its correlation with strikeout rate. L/Str and F/Str both correlate negatively with K%, but have positive coefficients in the equation.
As for coefficients with signs opposite their correlations, I assume they stem from the way regression equations are derived and the interplay among all the variables, which produces coefficients that aren’t always what you’d expect. It’s clear from the correlation table that S/Str has by far the strongest relationship with a batter’s strikeout rate. The same is true from the pitcher’s perspective. I’m not sure why its correlation is so much higher than L/Str’s, since both strike types can result in strike three, so it probably relates to the other skills a batter or pitcher typically owns when he has a high or low S/Str versus L/Str. In other words, a hitter with a high S/Str likely does other things that correlate with a higher strikeout rate, whereas that’s not the case for a hitter with a high L/Str. That’s just my guess in trying to understand these correlations.
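Returning to the sequencing argument for including 3-0%, a small sketch shows how the same number of balls and strikes can yield different 3-0% values depending only on order. Baseball-Reference publishes 3-0% directly; the pitch-by-pitch encoding here (B = ball, C = called strike, S = swinging strike, F = foul, X = ball in play) and the function names `reached_three_oh` and `three_oh_pct` are hypothetical, chosen only for illustration.

```python
def reached_three_oh(pitches: str) -> bool:
    """True if the count ever sits at 3-0 during this plate appearance.

    Hypothetical codes: B = ball, C = called strike, S = swinging strike,
    F = foul, X = ball in play. Unknown letters are ignored.
    """
    balls = strikes = 0
    for p in pitches:
        if p == "B":
            balls += 1
        elif p in ("C", "S"):
            strikes += 1
        elif p == "F" and strikes < 2:
            strikes += 1      # fouls add a strike only before two strikes
        if balls == 3 and strikes == 0:
            return True       # the count just reached 3-0
        if balls == 4 or strikes == 3 or p == "X":
            break             # walk, strikeout, or ball in play ends the PA
    return False


def three_oh_pct(plate_appearances: list[str]) -> float:
    """3-0 Counts / Plate Appearances, per the article's definition."""
    return sum(reached_three_oh(pa) for pa in plate_appearances) / len(plate_appearances)


# Four balls and four strike pitches in each case; only the order differs
# (the alternating PAs end on balls in play, X).
consecutive = ["BBBB", "CSFS"]    # walk, then strikeout
alternating = ["BCBSX", "CBFBX"]  # balls and strikes alternate
print(three_oh_pct(consecutive))  # 0.5
print(three_oh_pct(alternating))  # 0.0
```

The identical ball and strike totals produce a 3-0% of 0.5 in one case and 0.0 in the other, which is the kind of sequencing information no single rate stat captures.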