xK%, History and Speculating on Dellin Betances

by Alex Chamberlain
February 18, 2015

I’d like to talk to you about Dellin Betances.

Wait! Wait. No. No, I wouldn’t. I’d like to talk about Mike Podhorzer first. Mike has published a lot of great work covering the fundamentals of the xK% (and xBB%) metric for pitchers (and hitters), so if you are unfamiliar with or falling behind on his work, I recommend you first click here, here or here.

But if you’re lazy, the short of it is: xK%, or expected strikeout rate, is an equation birthed from a linear regression that measures how a pitcher’s looking, swinging and foul-ball strike rates, as well as his overall strike percentage, correlate with his strikeout rate. It doesn’t predict future strikeout rates so much as it retrospectively adjusts past strikeout rates; thus, it is a good tool for identifying pitchers who potentially benefited (or suffered) from good (or bad) luck in a previous season (say, 2014).

As with many other metrics completely unrelated to xK%, however, there is evidence that certain players consistently out-perform (or under-perform) the K% that their xK% says they should have posted. (Mike alludes to this trend in his quip about Jeremy Hellickson, an xK% underachiever, in one of the articles linked above.) Just as a power hitter will post consistently higher ratios of home runs to fly balls (HR/FB) than a non-power hitter, or Mike Trout will probably post some of the highest batting averages on balls in play (BABIP) in the league for years to come, it appears there is some skill, or perhaps a particular characteristic, inherent to pitchers who consistently best, or fall short of, their xK% rates.

Kevin Correia is a particularly salient example.
He who has underwhelmed fantasy owners for years has also notched significantly positive margins between his K% and xK% (which I will henceforth refer to as the “differential”) for the past four years:

2011: +2.5% (aka 2.5 percentage points better than his xK%)
2012: +2.5%
2013: +2.4%
2014: +2.3%

Disclaimer: Although I am using the same fundamental equation as Mike, I keep on hand a slightly different data set in terms of the years it covers, and I use slightly different qualification thresholds. Thus, my coefficients vary, albeit minimally, from those of Mike’s model.

I’m wary of establishing a definitive measure of consistency, but it’s clear Correia hasn’t left much room for year-to-year variance. There are a handful of elite starters who fit the bill as well. Adam Wainwright and Felix Hernandez have outperformed their annual xK% in every year in which they threw at least 500 pitches dating back to 2005; Cliff Lee has achieved the feat annually since 2008. On the flip side, Homer Bailey has underperformed his xK% in each of his major-league seasons, as has Cole Hamels in every year but one.

Thus, calculating a pitcher’s xK% and comparing it to his K% is not enough; it’s important to know how his K% has fared historically compared to his xK%.

This leaves us in a difficult situation when we evaluate the differential for pitchers who debuted or broke out last year. What does James Paxton’s +1.7% differential actually mean? Or Matt Shoemaker’s +1.3%, or Masahiro Tanaka’s +1.2%? To play it safe, I would expect all of them to regress, as their differentials are relatively small (two-thirds of 2014’s differentials fall within +/-1.5%).

Which brings me back to Dellin Betances. He recorded a +5.8% differential in 2014, the fourth-highest single-season differential for any pitcher who threw at least 500 pitches in the last 10 years. That kind of statistic screams regression, but a closer look at the data may help us better understand what’s going on.
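As a quick aside on the arithmetic: the differential is nothing more than K% minus xK%, and the "has he beaten his xK% every year?" question is a simple sign check across seasons. The sketch below uses only the four Correia differentials quoted above; the function names are illustrative, not anything from the original spreadsheet.

```python
# Differentials (K% minus xK%, in percentage points) quoted in the article
# for Kevin Correia. Only these four seasons appear in the text.
correia = {2011: 2.5, 2012: 2.5, 2013: 2.4, 2014: 2.3}


def differential(k_pct, xk_pct):
    """Return K% minus xK%, both expressed in percentage points."""
    return k_pct - xk_pct


def same_sign_every_year(diffs):
    """True if every seasonal differential is positive, or every one is negative."""
    values = list(diffs.values())
    if not values:
        return False  # no seasons, so no track record to speak of
    return all(v > 0 for v in values) or all(v < 0 for v in values)


def year_to_year_range(diffs):
    """Spread between the largest and smallest seasonal differential."""
    values = list(diffs.values())
    return max(values) - min(values)


print(same_sign_every_year(correia))          # True
print(round(year_to_year_range(correia), 1))  # 0.2
```

A pitcher whose differentials flip between positive and negative would fail the sign check, which is the "no track record" case that most pitchers fall into.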
To start, the rest of the names on the top-10 list, on which Betances finished fourth, look as follows:

1. Craig Kimbrel, +6.6% (2012)
2. Chien-Ming Wang, +6.1% (2005)
3. Aroldis Chapman, +5.9% (2014)
4. Dellin Betances, +5.8% (2014)
5. Brandon League, +5.2% (2005)
6. J.J. Putz, +4.7% (2007)
7. Aroldis Chapman, +4.7% (2012)
8. Craig Kimbrel, +4.6% (2013)
9. Andrew Miller, +4.6% (2014)
10. Neftali Feliz, +4.6% (2009)

The broad trend here is fairly obvious: every entry on the list but Wang is a relief pitcher. (An aside: Wang’s and League’s appearances are especially hilarious, given their year-end strikeout rates were 9.7 percent and 10.5 percent, respectively. Must’ve been something in the water in 2005. The names that follow them that year: prime Mariano Rivera, Roy Halladay and Francisco Rodriguez.)

Moreover, half the spots on the list are owned by elite relievers such as Kimbrel, Chapman and Miller. Whether or not you call Betances elite at this point is a matter of personal preference, but for the sake of argument, I am willing to consider him among the elite for now.

Double-moreover, I ask you to please turn your attention to this sampling of names from 2014’s list of top-20 differentials, which could speak for itself if it knew how to use words:

5. David Robertson, +4.4%
11. Brad Boxberger, +3.2%
13. Wade Davis, +3.1%
14. Ken Giles, +2.8%
16. Zach Duke, +2.8%
18. Sean Doolittle, +2.7%
19. Jake McGee, +2.7%

Whatever you may think about the true abilities of these pitchers, positive differentials seem to favor better-than-average pitchers who get to blow hitters away in brief intervals. So while I think Betances’ actual K% stands to regress in 2015, I wouldn’t expect it to fall by 5.8 percentage points, or maybe even half that much. Similarly, Kimbrel’s 2014 differential of +1.6% is the lowest of his career and a solid 2.4 percentage points below his 4-year average differential. It’s about understanding each pitcher’s history.
It’s a lot to ask to mentally retain performance data for every name, but if you decide to calculate a pitcher’s xK% sometime during next year, I simply recommend you also look back a couple of years to get a feel for his track record. For most, there won’t be one; it’ll be an unintelligible sequence of positive and negative numbers of all magnitudes. But, occasionally, some semblance of uniformity will appear.

With that said, it would behoove me to run the regression separately for starters and relievers, as there appears to be evidence of distinct K%-to-xK% trends between the two groups. It would doubly behoove me to figure out why a certain few pitchers consistently over- or under-perform their xK% while most others do not. What is the common factor there?

In the meantime, and for the sake of not burdening you with long lists of numbers, you can view the data behind the analysis in this Excel document, which lists individual season differentials dating back to 2010 for each pitcher who threw at least 500 pitches in 2014. It is all sorted by the column “5t,” which is my consistency measure for each pitcher with five years’ worth of data; “3t” is the same measure for 2012 through 2014. The higher the score, the more consistent the pitcher (the lowest possible score is zero). Pitchers with only 2014 data will not have a t-score and thus will be listed farther down the list. If a pitcher is missing completely, remember that he may have yet to throw 500 career major-league pitches (looking at you, Aaron Sanchez).
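The exact formula behind the "5t" and "3t" columns isn't spelled out here, so treat the following as one plausible reading rather than the spreadsheet's actual method: the absolute value of a one-sample t-statistic that tests a pitcher's seasonal differentials against zero. That reading fits the description (higher means more consistent, the floor is zero, and a single season yields no score). The function name `consistency_t` is hypothetical.

```python
import math
import statistics


def consistency_t(diffs):
    """Absolute one-sample t-statistic of seasonal differentials against zero.

    Assumption: this is only a guess at what a "t-score" consistency
    measure could look like, not the spreadsheet's confirmed formula.
    """
    n = len(diffs)
    if n < 2:
        return None  # one season has no spread, hence no t-score
    mean = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd == 0:
        # Identical differentials every year: perfectly consistent
        # unless they are all exactly zero.
        return math.inf if mean != 0 else 0.0
    return abs(mean) / (sd / math.sqrt(n))


# Correia's four differentials as quoted earlier in the article.
correia = [2.5, 2.5, 2.4, 2.3]
print(consistency_t(correia))

# A made-up, erratic sequence (hypothetical numbers, for contrast only).
erratic = [2.0, -1.5, 0.5, -1.0]
print(consistency_t(erratic))
```

Under this reading, Correia's tightly bunched differentials produce a very large score, while the erratic made-up sequence, whose differentials average out to zero, scores at the floor. Taking the absolute value means consistent underachievers score just as high as consistent overachievers.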