Pitch Sequencing and Pitcher xBB%: We’re Getting There

February 25, 2015

I expected to follow up my xK% differential post from last week with a complementary xBB% differential post. For those who don’t enjoy surprises, I’ll let you know now that that didn’t happen. In its stead, I bring what I hope is good news — news that will not only influence a future xBB% differential post but also may impact general pitcher analysis henceforth and possibly international diplomacy.

The title of this post, however, is a tad misleading. I think I can say, with some degree of certainty — and I hope to demonstrate, with some degree of competency — that pitch sequencing indeed plays a role in a pitcher’s walk rate, as the devilishly handsome Mike Podhorzer has postulated. What I can’t describe, with any degree of certainty, is the magnitude of the role it plays. In truth, I desperately want to prove Mike wrong: there must be other factors, outside of pitch sequencing (and pitch framing, perhaps), that help explain a pitcher’s walk rate. For example, I have tried incorporating O-Swing% and Zone%, two PITCHf/x metrics provided by FanGraphs that I swore would fill in the cracks, but they offer little in the way of additional explanatory power.

Undeterred, I revisited the data already available to me. And that’s when I saw it: percentage of counts that reached three balls, no strikes (“3-0%”). It was love at first sight. To poorly segue, I’ll quote Mike, from his original xK% post back in 2013:

If a pitcher throws 16 balls all game, but they all come in a row, he has walked four batters. Yet if he pitched seven innings and threw 100 pitches, only 16 balls is one heck of a ratio and would not normally match up with four walks. So sequencing is important, but only if there is a real difference in ability between pitchers.

This is essentially what 3-0% measures in shorthand. A great pitcher ideally would limit the number of plate appearances in which he sees a 3-0 count — a count that, unsurprisingly, correlates pretty strongly with walk rate (BB%) and strike rate (Str%). However, the multicollinearity (the degree to which variables move with one another) is not as strong as I expected — not nearly as strong as the collinearity between strikeout rate (K%) and strikes put into play (I/Str), two other components of the equation. Until I have access to (or can compile) better cross-sectional pitch count data, I will have to settle for using 3-0% as a proxy for pitch-sequencing skill, and I’m OK with that.

In an attempt to exactly replicate Mike’s work, I used Baseball Reference pitch data from 2008 through 2012, limiting the sample to pitchers who notched at least 50 innings in a season. For whatever reason, my coefficients and R-squared differed (very slightly) from those produced by his model, despite the data set and model specifications being identical. (Mike and I are stumped by it.) Because of the incongruence, I chose to expand the data to include the 2013 and 2014 seasons, and I changed the threshold to 1,000 pitches thrown. Using total pitches, instead of innings, disregards a pitcher’s efficiency; this is simply a matter of preference on my part. (For reference, 1,000 pitches loosely equates to 60 innings pitched, on average.) Without further ado:

xBB% = 0.598 - 0.264*K% - 0.595*I/Str - 0.494*Str% + 0.515*(3-0%)
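
For anyone who wants to plug in numbers, here is a minimal Python sketch of the equation. The coefficients come straight from the line above. The function name, the assumption that every rate is entered as a decimal (0.20 rather than 20), and the sample pitcher are all mine and purely illustrative; the sample inputs do not belong to any real pitcher.

```python
# Coefficients copied from the xBB% equation in this post.
INTERCEPT = 0.598
COEF_K = -0.264       # strikeout rate (K%)
COEF_I_STR = -0.595   # strikes put into play (I/Str)
COEF_STR = -0.494     # strike rate (Str%)
COEF_3_0 = 0.515      # share of counts reaching 3-0 (3-0%)


def expected_bb_rate(k_pct, i_str, str_pct, three_oh_pct):
    """Return xBB% as a decimal.

    Assumption: every input is a decimal rate (0.20, not 20),
    which is what the size of the intercept suggests.
    """
    return (
        INTERCEPT
        + COEF_K * k_pct
        + COEF_I_STR * i_str
        + COEF_STR * str_pct
        + COEF_3_0 * three_oh_pct
    )


# Hypothetical pitcher, invented for illustration only.
example = expected_bb_rate(k_pct=0.20, i_str=0.28, str_pct=0.63, three_oh_pct=0.05)
print(f"xBB% = {example:.1%}")
```

Note the signs: more strikeouts, more strikes in play and more strikes overall all pull the expected walk rate down, while more 3-0 counts push it up.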

The model’s adjusted R-squared improves from 0.7515* to 0.8209. It’s a marginal improvement — about seven points — but it’s the first time I’ve seen FanGraphs (or anyone) achieve an adjusted R-squared in the .80s, so I’m happy. Increasing the innings-pitched (or pitch) threshold to 75 IP (~1,250 pitches) improves the adjusted R-squared another two points, but it also ignores many individual seasons by relief pitchers, which would deviate from the goal of this exercise.

To reiterate: the xBB% metric could be helpful to a fantasy owner looking to identify pitchers due to regress. Salient examples include 2014’s most extreme outliers, Kevin Gausman (whose xBB% exceeds his 8.0 BB% by 2.4 percentage points) and A.J. Ramos (whose xBB% undercuts his 15.9 BB% by 3.8 percentage points). There is merit to knowing how a pitcher’s BB% annually performs against his xBB%, however. Consider the previous sentence a teaser for next week’s post, which will pair well with my inaugural work regarding a pitcher’s K%-to-xK% differential and a nice pinot noir.
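
As a rough sketch of how those gaps could be used to flag regression candidates, the Python below takes the two 2014 examples above and labels the direction each walk rate might move. The xBB% values are backed out from the BB% figures and gaps quoted in this post (8.0 + 2.4 and 15.9 - 3.8). The two-point threshold and the function name are hypothetical choices of mine, not part of the model.

```python
# BB% and xBB% in percentage points, derived from the figures in this post.
pitchers = {
    "Kevin Gausman": {"bb": 8.0, "xbb": 10.4},
    "A.J. Ramos": {"bb": 15.9, "xbb": 12.1},
}

# Hypothetical cutoff: ignore gaps smaller than this many percentage points.
THRESHOLD = 2.0


def regression_flag(bb, xbb, threshold=THRESHOLD):
    """Label the direction BB% would move if it drifted toward xBB%."""
    gap = round(xbb - bb, 1)  # positive means xBB% is higher than BB%
    if gap >= threshold:
        return "may rise"
    if gap <= -threshold:
        return "may fall"
    return "in line"


flags = {name: regression_flag(p["bb"], p["xbb"]) for name, p in pitchers.items()}
for name, flag in flags.items():
    print(f"{name}: BB% {flag}")
```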

*Because of the inexplicable incongruence between Mike’s and my models, my adjusted R-squared was slightly lower than his, which was .7697.

13 Comments

Mike Podhorzer, FanGraphs Staff

10 years ago

Awesome! Do you think the correlation between K% and I/Str is an issue in both our equations?

Alex Chamberlain, FanGraphs Staff

Reply to Mike Podhorzer

Potentially. But I also think it is a necessary evil for now. I think if we somehow effectively incorporate pitch sequencing into the model, it could displace K% altogether. In the meantime, K% sort of serves as a proxy for pitcher effectiveness on the opposite side of the spectrum (it is more impactful than incorporating 0-2%).
