Pitch Sequencing and Pitcher xBB%: We’re Getting There

I expected to follow up my xK% differential post from last week with a complementary xBB% differential post. For those who don't enjoy surprises, I'll let you know now that that didn't happen. In its stead, I bring what I hope is good news: news that will not only influence a future xBB% differential post but may also affect general pitcher analysis henceforth and possibly international diplomacy.

The title of this post, however, is a tad misleading. I think I can say, with some degree of certainty — and I hope to demonstrate, with some degree of competency — that pitch sequencing indeed plays a role in a pitcher’s walk rate, as the devilishly handsome Mike Podhorzer has postulated. What I can’t describe, with any degree of certainty, is the magnitude of the role it plays. In truth, I desperately want to prove Mike wrong: there must be other factors, outside of pitch sequencing (and pitch framing, perhaps), that help explain a pitcher’s walk rate. For example, I have tried incorporating O-Swing% and Zone%, two PITCHf/x metrics provided by FanGraphs that I swore would fill in the cracks, but they offer little in the way of additional explanatory power.

Undeterred, I revisited the data already available to me. And that's when I saw it: the percentage of plate appearances that reached a three-ball, no-strike count ("3-0%"). It was love at first sight. To poorly segue, I'll quote Mike, from his original xK% post back in 2013:

If a pitcher throws 16 balls all game, but they all come in a row, he has walked four batters. Yet if he pitched seven innings and threw 100 pitches, only 16 balls is one heck of a ratio and would not normally match up with four walks. So sequencing is important, but only if there is a real difference in ability between pitchers.

This is essentially what 3-0% measures, in shorthand. A great pitcher would ideally limit the number of plate appearances in which he sees a 3-0 count, a count that, unsurprisingly, correlates fairly strongly with walk rate (BB%) and strike rate (Str%). However, the multicollinearity between 3-0% and Str% (the degree to which these explanatory variables move with one another) is not as strong as I expected, and not nearly as strong as the collinearity between strikeout rate (K%) and strikes put into play (I/Str), two other components of the equation. Until I have access to (or can compile) better cross-sectional pitch-count data, I will have to settle for using 3-0% as a proxy for pitch-sequencing skill, and I'm OK with that.
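As a rough illustration of what 3-0% captures, the sketch below counts the share of plate appearances that ever sit at a 3-0 count. It assumes raw pitch-by-pitch sequences in a hypothetical one-letter encoding (B, S, F, X); the post itself works from Baseball Reference's season-level figures, so the function names and encoding are illustrative only.

```python
def reached_three_oh(sequence: str) -> bool:
    """Return True if a plate appearance ever sits at a 3-0 count.

    Hypothetical encoding: B = ball, S = called or swinging strike,
    F = foul, X = ball in play.
    """
    balls = strikes = 0
    for pitch in sequence:
        if pitch == "B":
            balls += 1
        elif pitch == "S":
            strikes += 1
        elif pitch == "F":
            # A foul counts as a strike only with fewer than two strikes.
            if strikes < 2:
                strikes += 1
        elif pitch == "X":
            return False  # ball in play ends the PA before any 3-0 count
        else:
            raise ValueError(f"unknown pitch code: {pitch!r}")
        if balls == 3 and strikes == 0:
            return True
        if balls == 4 or strikes == 3:
            return False  # walk or strikeout ends the PA
    return False


def three_oh_pct(plate_appearances: list[str]) -> float:
    """Share of plate appearances that reached 3-0, as a percentage."""
    if not plate_appearances:
        raise ValueError("need at least one plate appearance")
    reached = sum(reached_three_oh(pa) for pa in plate_appearances)
    return 100 * reached / len(plate_appearances)


print(three_oh_pct(["BBBB", "BBBSX", "SBBBX", "BX"]))  # 50.0
```

Note that a first-pitch strike ("SBBBX") rules out a 3-0 count for the rest of the plate appearance, which is exactly the sequencing effect Mike's quote describes.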

In an attempt to exactly replicate Mike’s work, I used Baseball Reference pitch data from 2008 through 2012, limiting the sample to pitchers who notched at least 50 innings in a season. For whatever reason, my coefficients and R-squared differed (very slightly) from those produced by his model, despite the data set and model specifications being identical. (Mike and I are stumped by it.) Because of the incongruence, I chose to expand the data to include the 2013 and 2014 seasons, and I changed the threshold to 1,000 pitches thrown. Using total pitches, instead of innings, disregards a pitcher’s efficiency; this is simply a matter of preference on my part. (For reference, 1,000 pitches loosely equates to 60 innings pitched, on average.) Without further ado:
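A minimal sketch of the sample restriction described above, assuming pitcher-season rows that carry a year and a pitch count. The `PitcherSeason` type, its fields, and the example rows are hypothetical, not Baseball Reference's schema. The last line checks the rule of thumb that 1,000 pitches is about 60 innings (roughly 16.7 pitches per inning), which puts a 75 IP cutoff near 1,250 pitches.

```python
from dataclasses import dataclass


@dataclass
class PitcherSeason:
    # Hypothetical row layout; not Baseball Reference's actual column names.
    name: str
    year: int
    pitches: int


def qualifying_seasons(seasons, min_pitches=1000, first_year=2008, last_year=2014):
    """Keep seasons inside the study window with at least `min_pitches` thrown."""
    return [
        s
        for s in seasons
        if first_year <= s.year <= last_year and s.pitches >= min_pitches
    ]


# Rule of thumb from the post: 1,000 pitches is roughly 60 innings.
PITCHES_PER_INNING = 1000 / 60  # about 16.7

sample = [
    PitcherSeason("Starter A", 2014, 3100),
    PitcherSeason("Reliever B", 2013, 1040),
    PitcherSeason("Reliever C", 2012, 880),  # below the pitch threshold
    PitcherSeason("Starter D", 2007, 2900),  # outside 2008-2014
]

print([s.name for s in qualifying_seasons(sample)])  # ['Starter A', 'Reliever B']
print(round(75 * PITCHES_PER_INNING))  # 1250: the stricter 75 IP cutoff in pitches
```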

[Chart: pitcher BB% vs. xBB%]

$$\mathrm{xBB\%} = 0.598 - 0.264\,\mathrm{K\%} - 0.595\,\mathrm{I/Str} - 0.494\,\mathrm{Str\%} + 0.515\,(\text{3-0\%})$$
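To make the equation concrete, here is a minimal Python sketch that plugs rates into it. The post does not state units; this assumes every rate is entered as a decimal (0.20, not 20), which is the reading under which the 0.598 intercept yields plausible walk rates. The example pitcher's inputs are invented for illustration.

```python
# Coefficients of the fitted model in the post (2008-2014 seasons,
# pitchers with at least 1,000 pitches).
INTERCEPT = 0.598
COEF_K = -0.264
COEF_I_STR = -0.595
COEF_STR = -0.494
COEF_THREE_OH = 0.515


def expected_bb_rate(k_rate, in_play_per_strike, strike_rate, three_oh_rate):
    """Return xBB% as a decimal; all inputs are assumed to be decimals."""
    return (
        INTERCEPT
        + COEF_K * k_rate
        + COEF_I_STR * in_play_per_strike
        + COEF_STR * strike_rate
        + COEF_THREE_OH * three_oh_rate
    )


# Hypothetical pitcher: 20% K, 28% of strikes in play, 64% strikes,
# 5% of plate appearances reaching 3-0.
xbb = expected_bb_rate(0.20, 0.28, 0.64, 0.05)
print(round(xbb, 4))  # 0.0882, i.e. an xBB% of about 8.8%
```

The signs carry the intuition: more strikeouts, more balls in play, and more strikes all push xBB% down, while reaching 3-0 more often pushes it up.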

The model's adjusted R-squared improves from 0.7515* to 0.8209. It's a marginal improvement (about seven points), but it's the first time I've seen FanGraphs (or anyone) achieve an adjusted R-squared in the .80s, so I'm happy. Raising the threshold to 75 innings pitched (~1,250 pitches) improves the adjusted R-squared by another two points, but it also excludes many relief pitchers' seasons, which would run counter to the goal of this exercise.
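For readers less familiar with adjusted R-squared: it penalizes ordinary R-squared for the number of predictors, so adding a variable such as 3-0% helps only if it explains more than chance would. The standard formula is sketched below in plain Python; the function name and toy numbers are illustrative, not the post's data.

```python
def adjusted_r_squared(observed, predicted, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    n = len(observed)
    if n != len(predicted):
        raise ValueError("observed and predicted must be the same length")
    if n - n_predictors - 1 <= 0:
        raise ValueError("need more observations than predictors + 1")
    mean = sum(observed) / n
    # Residual and total sums of squares.
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted))
    ss_tot = sum((y - mean) ** 2 for y in observed)
    r_squared = 1 - ss_res / ss_tot
    return 1 - (1 - r_squared) * (n - 1) / (n - n_predictors - 1)


# Toy example with one predictor: a near-perfect fit on six points.
obs = [1, 2, 3, 4, 5, 6]
pred = [1.5, 2, 3, 4, 5, 5.5]
print(round(adjusted_r_squared(obs, pred, n_predictors=1), 4))  # 0.9643
```

Because the new model adds a predictor, comparing adjusted rather than raw R-squared keeps the before-and-after comparison honest.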

To reiterate: the xBB% metric could help a fantasy owner identify pitchers due to regress. Salient examples include 2014's most extreme outliers, Kevin Gausman (whose xBB% exceeds his 8.0 BB% by 2.4 percentage points) and A.J. Ramos (whose xBB% undercuts his 15.9 BB% by 3.8 percentage points). There is merit, however, to knowing how a pitcher's BB% compares with his xBB% from year to year. Consider the previous sentence a teaser for next week's post, which will pair well with my inaugural work on a pitcher's K%-to-xK% differential and a nice pinot noir.
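The differential behind those two examples is simple arithmetic. In the sketch below, the xBB% figures (10.4% for Gausman, 12.1% for Ramos) are derived from the gaps quoted above. The sign convention (BB% minus xBB%), the `regression_hint` labels, and its 0.5-point tolerance are assumptions for illustration, not a published rule.

```python
def bb_differential(bb_pct, xbb_pct):
    """BB% minus xBB%, in percentage points.

    Negative: walking fewer batters than the model expects (walks may rise).
    Positive: walking more than expected (walks may fall).
    """
    return bb_pct - xbb_pct


def regression_hint(diff, tolerance=0.5):
    """Hypothetical labeling rule; the 0.5-point tolerance is arbitrary."""
    if diff <= -tolerance:
        return "BB% likely to rise"
    if diff >= tolerance:
        return "BB% likely to fall"
    return "in line with xBB%"


# 2014 figures from the post; xBB% = BB% plus or minus the quoted gap.
pitchers = {
    "Kevin Gausman": (8.0, 8.0 + 2.4),
    "A.J. Ramos": (15.9, 15.9 - 3.8),
}
for name, (bb, xbb) in pitchers.items():
    diff = bb_differential(bb, xbb)
    print(f"{name}: {diff:+.1f} pts, {regression_hint(diff)}")
```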


*Because of the inexplicable incongruence between Mike’s and my models, my adjusted R-squared was slightly lower than his, which was .7697.

Currently investigating the relationship between pitcher effectiveness and beard density. Two-time FSWA award winner, including 2018 Baseball Writer of the Year, and 8-time award finalist. Previously featured in Lindy's Sports' Fantasy Baseball magazine (2018, 2019). Tout Wars competitor. Biased toward a nicely rolled baseball pant.
