Pitch Sequencing and Pitcher xBB%: We’re Getting There
by Alex Chamberlain
February 25, 2015

I expected to follow up my xK% differential post from last week with a complementary xBB% differential post. For those who don’t enjoy surprises, I’ll let you know now that that didn’t happen. In its stead, I bring what I hope is good news: news that will not only influence a future xBB% differential post but also may impact general pitcher analysis henceforth and possibly international diplomacy.

The title of this post, however, is a tad misleading. I think I can say, with some degree of certainty (and I hope to demonstrate, with some degree of competency), that pitch sequencing indeed plays a role in a pitcher’s walk rate, as the devilishly handsome Mike Podhorzer has postulated. What I can’t describe, with any degree of certainty, is the magnitude of the role it plays.

In truth, I desperately want to prove Mike wrong: there must be other factors, outside of pitch sequencing (and pitch framing, perhaps), that help explain a pitcher’s walk rate. For example, I have tried incorporating O-Swing% and Zone%, two PITCHf/x metrics provided by FanGraphs that I swore would fill in the cracks, but they offer little in the way of additional explanatory power.

Undeterred, I revisited the data already available to me. And that’s when I saw it: the percentage of plate appearances that reached a count of three balls, no strikes (“3-0%”). It was love at first sight.

To poorly segue, I’ll quote Mike, from his original xK% post back in 2013:

“If a pitcher throws 16 balls all game, but they all come in a row, he has walked four batters. Yet if he pitched seven innings and threw 100 pitches, only 16 balls is one heck of a ratio and would not normally match up with four walks. So sequencing is important, but only if there is a real difference in ability between pitchers.”

This is essentially what 3-0% measures in shorthand.
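
To make Mike’s 16-balls example concrete, here is a minimal Python sketch of a toy count model. Everything in it is an assumption for illustration, not anything from Mike’s or my actual data work: the function name `summarize`, the 'B'/'S' pitch strings, and the simplification that every strike is a called or swinging strike (no fouls, no balls in play), so a plate appearance ends only on ball four or strike three. It feeds two 100-pitch outings with 16 balls apiece through the model and tallies walks and the share of plate appearances that reached 3-0.

```python
def summarize(pitches):
    """Run a string of 'B' (ball) and 'S' (strike) pitches through a toy count model.

    Assumption: no fouls and no balls in play, so a plate appearance
    ends only on ball four (walk) or strike three (strikeout).
    """
    balls = strikes = 0
    walks = plate_appearances = reached_3_0 = 0
    saw_3_0 = False
    for pitch in pitches:
        if pitch == "B":
            balls += 1
        else:
            strikes += 1
        if balls == 3 and strikes == 0:
            saw_3_0 = True
        if balls == 4 or strikes == 3:
            # Plate appearance is over: record the result and reset the count.
            plate_appearances += 1
            if balls == 4:
                walks += 1
            if saw_3_0:
                reached_3_0 += 1
            balls = strikes = 0
            saw_3_0 = False
    return {
        "pitches": len(pitches),
        "balls": pitches.count("B"),
        "plate_appearances": plate_appearances,
        "walks": walks,
        "three_oh_rate": reached_3_0 / plate_appearances,
    }


# Hypothetical outings: same 100 pitches and 16 balls, different sequencing.
clustered = summarize("B" * 16 + "S" * 84)        # all 16 balls in a row
spread = summarize("BSSS" * 16 + "S" * 36)        # one ball per plate appearance

print(clustered)
print(spread)
```

In this toy setup the clustered outing walks four batters while the spread-out outing walks none, even though the ball and strike totals are identical. The strike rate cannot tell the two apart; the 3-0 share can.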
A great pitcher ideally would limit the number of plate appearances in which he sees a 3-0 count, a count that, unsurprisingly, correlates pretty strongly with walk rate (BB%) and strike rate (Str%). However, the multicollinearity (the degree to which variables move with one another) is not as strong as I expected, and not nearly as strong as the collinearity between strikeout rate (K%) and strikes put into play (I/Str), two other components of the equation. Until I have access to (or can compile) better cross-sectional pitch count data, I will have to settle for using 3-0% as a proxy for pitch-sequencing skill, and I’m OK with that.

In an attempt to exactly replicate Mike’s work, I used Baseball Reference pitch data from 2008 through 2012, limiting the sample to pitchers who notched at least 50 innings in a season. For whatever reason, my coefficients and R-squared differed (very slightly) from those produced by his model, despite the data set and model specifications being identical. (Mike and I are stumped by it.) Because of the incongruence, I chose to expand the data to include the 2013 and 2014 seasons, and I changed the threshold to 1,000 pitches thrown. Using total pitches, instead of innings, disregards a pitcher’s efficiency; this is simply a matter of preference on my part. (For reference, 1,000 pitches loosely equates to 60 innings pitched, on average.)

Without further ado:

xBB% = 0.598 - 0.264*K% - 0.595*I/Str - 0.494*Str% + 0.515*(3-0%)

The model’s adjusted R-squared improves from 0.7515* to 0.8209. It’s a marginal improvement (about seven points), but it’s the first time I’ve seen FanGraphs (or anyone) achieve an adjusted R-squared in the .80s, so I’m happy. Increasing the innings-pitched (or pitch) threshold to 75 IP (~1,250 pitches) improves the adjusted R-squared another two points, but it also ignores many individual seasons by relief pitchers, which would deviate from the goal of this exercise.
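
For anyone who wants to plug numbers into the equation, here is a minimal Python sketch. The coefficients are the ones from the model above; the function name, the parameter names and the sample stat line are hypothetical, and I am assuming every input is expressed as a fraction (a 20% strikeout rate is 0.20) rather than as a whole-number percentage.

```python
def expected_walk_rate(k_rate, in_play_per_strike, strike_rate, three_oh_rate):
    """Apply the xBB% equation from this post.

    Assumption: all four inputs are fractions between 0 and 1
    (K%, I/Str, Str% and 3-0%, respectively), and the result is
    a fraction as well.
    """
    return (
        0.598
        - 0.264 * k_rate
        - 0.595 * in_play_per_strike
        - 0.494 * strike_rate
        + 0.515 * three_oh_rate
    )


# Made-up stat line for illustration only; not a real pitcher's season.
example_xbb = expected_walk_rate(
    k_rate=0.20,
    in_play_per_strike=0.28,
    strike_rate=0.63,
    three_oh_rate=0.05,
)
print(f"xBB%: {example_xbb:.1%}")
```

Subtracting the result from a pitcher’s actual BB% gives the differential; a positive gap means he walked more batters than the model expected.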
To reiterate: the xBB% metric could be helpful to a fantasy owner looking to identify pitchers due to regress. Salient examples include 2014’s most extreme outliers, Kevin Gausman (whose xBB% exceeds his 8.0 BB% by 2.4 percentage points) and A.J. Ramos (whose xBB% undercuts his 15.9 BB% by 3.8 percentage points).

There is merit to knowing how a pitcher’s BB% annually performs against his xBB%, however. Consider the previous sentence a teaser for next week’s post, which will pair well with my inaugural work regarding a pitcher’s K%-to-xK% differential and a nice pinot noir.

*Because of the inexplicable incongruence between Mike’s and my models, my adjusted R-squared was slightly lower than his, which was .7697.