Modeling SwStr% and GB% Using Velocity and Movement

This year, I’ve been caught up on pitching. I investigated the nuance inherent to swinging strikes, indirectly made a case for completely abandoning the sinker with this piece comparing pitch type outcomes, and (maybe) identified the keys to unlocking pitcher BABIP and HR/FB.

Here, I’ve modeled swinging strike and ground ball rates using only pitch velocity and movement. Surely, this work can be improved; my quantitative tool set, while fairly robust compared to the layman’s, is meager compared to that of the professional or even hobbyist statistician. Regardless, I think it’s pretty cool, and I hope it adds to the conversation constructively.

Mostly, this serves to satiate my own curiosity. Unfortunately, it may be denser than I expected — few answers are ever quite as simple as you hope them to be, I guess.

Existing Research

I linked to several of my own pieces above. Dan Lependorf wrote about estimating ground ball rates in 2013 at the Hardball Times, although its conclusions have an anecdotal slant. (It thinks about velocity and movement but doesn’t take the requisite steps to bridge the logic.)

Per the Athletic’s (and formerly FanGraphs’) Eno Sarris, with an assist from Baseball Prospectus’ Harry Pavlidis, “sizzling power curves get the whiffs” (also, “there was a decent, if not robust, relationship between horizontal movement and strikeouts”). Pavlidis’ three-part series on change-up composition can be found here, by the way; it describes how change-pieces that draw whiffs are not the same as those that induce grounders (“a fastball with plus velocity and a sizable gap (10+ MPH) between the heater and the changeup make for more missed bats with the offspeed pitch, while a smaller gap helps the pitcher to induce more ground balls”).

Eno also talked about sliders here (did you know Eno likes pitching?), where he notes that drop and velocity are most important for incurring whiffs and grounders, although the relationships are not super strong. He includes a regression model (unbeknownst to me before engaging in this endeavor) very similar to those that follow. I’m not surprised by this overlap; intense baseball fans understand the importance of these variables.

Jonah Pemstein did some high-quality analysis of spin rates here; it’s somewhat tangential, as spin rate evidently affects movement, which (as I hope to demonstrate) affects whiff rate and batted ball outcomes.

And, lastly, as I write this, Eno is compiling lists of 2018’s best pitches by pitch (FS) (CH) (CU), for your reference.

Background/“Methodology”

The following modeling efforts use PITCHf/x data from 2014 through 2017 rolled up to the year-name-pitch-role level, such that each observation appears as follows:

  • Year, Name, Pitch Type, Role (SP/RP)*…
  • 2017, Clayton Kershaw, slider, SP, …
  • 2017, Clayton Kershaw, curve, SP, …
  • etc.

(*The data is parsed by pitcher role in light of how “stuff” plays up differently in relief.)

Based on a cursory inspection of Jonah and Sean Dolinar’s long-needed update on reliability, I set a cutoff point of 250 pitches to omit egregiously small pitch counts without excessively whittling down my sample (n = 3,652).
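The roll-up and cutoff described above can be sketched roughly as follows. This is a minimal illustration, not the actual pipeline behind the article: the row layout and the key names (`year`, `name`, `pitch_type`, `role`, `velo`, `h_mov`, `v_mov`, `whiff`) are hypothetical, and real PITCHf/x exports are organized differently.

```python
from collections import defaultdict

MIN_PITCHES = 250  # cutoff from the text: pitches thrown at least 250 times


def roll_up(pitches):
    """Aggregate pitch-level rows to year-name-pitch-role observations.

    Each input row is a dict with hypothetical keys:
    year, name, pitch_type, role, velo, h_mov, v_mov, whiff (0 or 1).
    """
    groups = defaultdict(list)
    for p in pitches:
        key = (p["year"], p["name"], p["pitch_type"], p["role"])
        groups[key].append(p)

    observations = []
    for key, rows in groups.items():
        n = len(rows)
        if n < MIN_PITCHES:
            continue  # omit egregiously small pitch counts
        observations.append({
            "key": key,
            "count": n,
            "velo": sum(r["velo"] for r in rows) / n,
            "h_mov": sum(r["h_mov"] for r in rows) / n,
            "v_mov": sum(r["v_mov"] for r in rows) / n,
            "swstr": sum(r["whiff"] for r in rows) / n,
        })
    return observations
```

Each surviving observation carries average velocity and movement plus a whiff rate, which is the shape the regression needs.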

I originally intended to specify separate models for each pitch type but, instead, concluded that formal pitch type designations shouldn’t matter: a pitch is a pitch, and its movement and velocity are its own. It doesn’t matter what it’s called, and it seems to me labels might only serve to limit or dilute the analysis.* How all of a pitch’s components exist and interact should affect its outcomes and nothing else. (*I confirmed with Eno that some pitches are classified simply on grip rather than flight properties, which seems to me a nonideal data quirk for this exercise.)

I settled on the following independent variables for my model:

  • velo_i: velocity (mph)
  • h-mov_i: horizontal movement (in.)
  • v-mov_i: vertical movement (in.)
  • The interaction between velocity * horizontal movement
  • The interaction between velocity * vertical movement
  • The interaction between vertical movement * horizontal movement
  • velo_FB: primary fastball velocity
    • FB denotes the fastball thrown most often by that pitcher in a given season chosen among whichever ones he throws: four-seam, sinker, cutter (for these purposes).
    • I chose this approach to avoid instances in which a pitcher throws, say, only one true fastball but mostly relies on junk (2017 Alex Claudio being the most salient, immediately memorable example). Alternate approaches included (1) velocity of fastest pitch (regardless of how often it’s thrown) and (2) weighted-average velocity of all fastball types (which, frankly, might be better, but I don’t know).
    • Note that Eno looked at velocity differential in his aforementioned 2015 piece on sliders.
  • The interaction of velocity * primary fastball velocity

… where i represents the pitch type in question.
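The variable list above can be expressed as a small feature-building step. This is a sketch under stated assumptions: the dictionary keys and the pitch-type labels (`fourseam`, `sinker`, `cutter`) are illustrative names, not identifiers from the original data.

```python
FASTBALL_TYPES = {"fourseam", "sinker", "cutter"}  # per the text, for these purposes


def primary_fastball_velo(arsenal):
    """Velocity of the fastball type thrown most often.

    `arsenal` is a list of dicts (hypothetical keys: pitch_type, count, velo)
    for one pitcher in one season and role. Returns None if he throws no fastball.
    """
    fastballs = [p for p in arsenal if p["pitch_type"] in FASTBALL_TYPES]
    if not fastballs:
        return None
    return max(fastballs, key=lambda p: p["count"])["velo"]


def build_features(pitch, velo_fb):
    """Main effects plus the four interactions listed in the text."""
    velo, h_mov, v_mov = pitch["velo"], pitch["h_mov"], pitch["v_mov"]
    return {
        "velo": velo,
        "h_mov": h_mov,
        "v_mov": v_mov,
        "velo_x_h_mov": velo * h_mov,
        "velo_x_v_mov": velo * v_mov,
        "h_mov_x_v_mov": h_mov * v_mov,
        "velo_fb": velo_fb,
        "velo_x_velo_fb": velo * velo_fb,
    }
```

Note that the primary fastball is picked by usage, not by speed, so a pitcher who throws a sinker more often than a harder four-seamer gets the sinker's velocity as velo_FB.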

I considered running a second iteration of the model with additional fastball specs and interactions but feared over-fitting the model without adding much explanatory power to it (which turned out to be true).

LET’S GET TO THE GOODS.

Here’s the model for swinging strike rate (SwStr%)…

SwStr% Coefficients
Variable Value Note
velo_i -0.06060 ***
H-mov_i -0.04816 ***
V-mov_i +0.03860 ***
velo_i * H-mov_i +0.00057 ***
velo_i * V-mov_i -0.00044 ***
H-mov_i * V-mov_i +0.00002
velo_FB -0.04188 ***
velo_i * velo_FB +0.00061 ***
constant +4.37107 ***
adjusted R² 0.532
SOURCE: PITCHf/x
Statistically significant at…
* 90% confidence
** 95% confidence
*** 99% confidence

… and ground ball rate (GB%).

GB% Coefficients
Variable Value Note
velo_i -0.04619 ***
H-mov_i -0.03640 ***
V-mov_i +0.03128 ***
velo_i * H-mov_i +0.00030 ***
velo_i * V-mov_i -0.00069 ***
H-mov_i * V-mov_i +0.00041 ***
velo_FB -0.05894 ***
velo_i * velo_FB +0.00065 ***
constant +4.80310 ***
adjusted R² 0.523
SOURCE: PITCHf/x
Statistically significant at…
* 90% confidence
** 95% confidence
*** 99% confidence
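For anyone who wants to plug in their own numbers, here is a minimal sketch that applies the two coefficient tables above as plain linear equations. The coefficients are copied from the tables; the function name, the dictionary keys, and the example pitch are assumptions for illustration. The sign convention for movement follows whatever the underlying PITCHf/x data uses, which the tables do not spell out, so treat any inputs as hypothetical.

```python
# Coefficients copied from the SwStr% and GB% tables above.
SWSTR_COEF = {
    "const": 4.37107, "velo": -0.06060, "h_mov": -0.04816, "v_mov": 0.03860,
    "velo_x_h_mov": 0.00057, "velo_x_v_mov": -0.00044, "h_mov_x_v_mov": 0.00002,
    "velo_fb": -0.04188, "velo_x_velo_fb": 0.00061,
}
GB_COEF = {
    "const": 4.80310, "velo": -0.04619, "h_mov": -0.03640, "v_mov": 0.03128,
    "velo_x_h_mov": 0.00030, "velo_x_v_mov": -0.00069, "h_mov_x_v_mov": 0.00041,
    "velo_fb": -0.05894, "velo_x_velo_fb": 0.00065,
}


def predict(coef, velo, h_mov, v_mov, velo_fb):
    """Estimated rate as a fraction (0.10 means 10%)."""
    return (
        coef["const"]
        + coef["velo"] * velo
        + coef["h_mov"] * h_mov
        + coef["v_mov"] * v_mov
        + coef["velo_x_h_mov"] * velo * h_mov
        + coef["velo_x_v_mov"] * velo * v_mov
        + coef["h_mov_x_v_mov"] * h_mov * v_mov
        + coef["velo_fb"] * velo_fb
        + coef["velo_x_velo_fb"] * velo * velo_fb
    )


# Hypothetical primary fastball: 93 mph, -4 in. horizontal, +9 in. vertical.
x_swstr = predict(SWSTR_COEF, 93, -4, 9, 93)
x_gb = predict(GB_COEF, 93, -4, 9, 93)
```

For that made-up pitch the equations return roughly 7.5% and 37.1%. One caution: the published coefficients are rounded to five decimals, and the velocity interaction multiplies a product near 8,600, so rounding alone could move an estimate by several percentage points (0.000005 × 8,649 ≈ 0.043). Results from the rounded tables will not exactly reproduce the estimates in this article.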

The following table summarizes the mean whiff and ground ball rates before and after the modeling:

Actual and Estimated Means
Pitch Type SwStr% xSwStr% GB% xGB%
Fourseam 8.8% 8.4% 35.8% 37.8%
Sinker 5.9% 7.3% 55.0% 51.8%
Change 17.4% 16.6% 50.6% 48.9%
Slider 17.4% 16.2% 45.7% 47.5%
Curve 13.4% 14.4% 51.6% 51.3%
Splitter 18.7% 15.6% 54.0% 51.9%
Average 11.2% 11.2% 45.2% 45.2%
SOURCE: PITCHf/x
For all pitches thrown at least 250 times in a year from 2014-17.

Note the discrepancies by pitch type. Given the somewhat subjective (and oftentimes confusing or conflicting) nature of classifying pitches, it’s not entirely surprising to me that the regression, blind to pitch classifications, estimates somewhat discrepant rates for each pitch type. The model was pretty forgiving on ground balls: regressed outcomes ranged from 21% to 86%, whereas actual outcomes ranged from 12% to 85%. On the whiffs side, it was less so: regressed outcomes ranged from -3% to 27% compared to actual outcomes of 1% to 35%. But the model must be broken if it produces a negative whiff rate! I disagree; a linear model has no floor at zero, and I blame Joe Saunders for throwing such a spectacularly bad sinker (and Jamey Carroll, too, for his equally miserable four-seamer) in 2014. They probably deserved to give back strikes for throwing those pitches so often.

Interpretation

It’s deliberately difficult to disentangle interpretations of the model coefficients. You shouldn’t evaluate the effects of velocity or movement in isolation because the model specification prohibits it. Every interaction has to be considered jointly. For example, in looking at only primary pitch velocity, fastball velocity (which can also be the primary pitch, by the way), and their interaction (aka their differential), more velocity and higher differentials are better. In this instance, the model rewards 100 mph fastballs most heavily, which we know is fundamentally untrue, given the superiority of off-speed and breaking pitches regarding swinging strikes. That’s why it’s important to look at other interactions such as those of primary pitch velocity and movement, which heavily punish velocity to adjust for the effectiveness of non-fastball offerings. You would need some kind of four-dimensional visualization to understand the full extent of how every interaction cooperates with one another in regard to both whiffs and ground balls. Don’t try to assess any specific variable or interaction in a vacuum.
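To make the "don't read coefficients in isolation" point concrete, here is a small sketch of the marginal effect of a pitch's own velocity in the SwStr% model. It is simply the partial derivative of the linear equation, using the table coefficients; the function name and the example inputs are hypothetical. It holds fastball velocity fixed, so it applies most cleanly to secondary pitches (for a primary fastball, velo and velo_FB move together).

```python
def d_swstr_d_velo(h_mov, v_mov, velo_fb):
    """Change in estimated SwStr% (as a fraction) per +1 mph on the pitch itself.

    Partial derivative of the SwStr% equation with respect to velo:
    the velocity main effect plus every interaction that includes velocity.
    """
    return (
        -0.06060            # velo main effect
        + 0.00057 * h_mov   # velo * horizontal movement
        - 0.00044 * v_mov   # velo * vertical movement
        + 0.00061 * velo_fb  # velo * primary fastball velocity
    )


# Same hypothetical breaking ball (5 in. horizontal, -6 in. vertical)
# thrown behind two different fastballs.
behind_93 = d_swstr_d_velo(5, -6, 93)
behind_88 = d_swstr_d_velo(5, -6, 88)
```

With these made-up inputs the slope is slightly positive behind a 93 mph fastball and slightly negative behind an 88 mph one. The velocity coefficient alone (-0.06060) says nothing useful without the rest of the pitch's profile.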

There’s one insignificant variable, specific to the whiff rate model: the interaction between horizontal and vertical movement. Again, given all the other interactions, it doesn’t mean movement isn’t important — it’s just that this particular interaction effectively lends nothing in the way of explanatory power.

Notes

Correlation matrices suggested that, indeed, velocity and vertical movement are moderately correlated with one another. Variance inflation factors (VIFs) seemed to confirm my suspicions; the variance of the estimated coefficients might’ve ranged anywhere from 50% to 350% wider than in the absence of the multicollinearity. However, interactions also exacerbate multicollinearity. I can’t, in good faith, argue in favor of removing one or more pitch characteristics to alleviate this issue (although I’m open to being convinced otherwise). Even if velocity affects movement (and/or vice versa), no two pitches are consistently exactly alike. Velocity and movement interact differently for every pitch and pitcher. It would do us a disservice to assume velocity or movement is a non-factor because the other correlates strongly with it.
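As a rough illustration of what a VIF measures, the sketch below uses the standard definition VIF = 1 / (1 - R²), where R² comes from regressing one predictor on the others. With only two predictors, that R² is just the squared Pearson correlation, which is the simplified case shown here. The function names are illustrative, and reading "50% to 350% wider" as VIFs of about 1.5 to 4.5 is an interpretation, not something the text states outright.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def vif_from_r2(r2):
    """Variance inflation factor for a predictor whose R² against the others is r2."""
    return 1.0 / (1.0 - r2)


def pairwise_vif(xs, ys):
    """VIF in the two-predictor case, where R² equals the squared correlation."""
    return vif_from_r2(pearson_r(xs, ys) ** 2)
```

Under that reading, a VIF of 1.5 corresponds to an R² of about 0.33 against the other predictors and a VIF of 4.5 to about 0.78. Interaction terms built from the same raw variables tend to push those numbers up, as noted above.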

I considered specifying the model a number of different ways. I considered removing the interactions to simplify the model and prevent over-fitting, but I think omitting the interactions strips the model of depth; I want to know not only how velocity and horizontal movement and vertical movement affect whiff rates separately but also how velocity and a specific type of movement affect whiff rates in tandem. I considered accounting for velocity differentials using actual differentials (i.e. A minus B), again for simplicity, but I wanted to maintain each pitch’s distinct average velocity. (It might, however, benefit from simply using a pitcher’s average velocity as the differential velocity for all of his pitch types.) I considered combining each movement variable into a single vector (Pythagoras would be so proud of me), but it would have reduced four quadrants of movement to one, which is too much condensing for my tastes. I considered year fixed effects, but they added virtually no value to the model despite small incremental changes to whiff rates over recent years. I considered a nonlinear approach, and I considered completely different statistical approaches (partial least squares or principal component analysis).

This model, explaining more than half the variance in whiff and ground ball rates, is clearly missing some important variables — namely, pitch location (i.e. command). Two pitches thrown completely identically will produce different outcomes when spotted perfectly on the outside corner or grooved down the middle. Release point matters (see Eno’s change-up piece linked earlier), as does sequencing, and quality of opponent, and pitcher handedness, and more. Some of these things I omitted by choice, others from a lack of resources and certainly not out of malice.

Ultimately, I settled on this. As stated previously, I expect this to only further the ongoing discussion about how to model outcomes using actual physical pitch characteristics. (Give me your feedback!)

2018 Estimates (They’re Kind of Like “Arsenal Scores”)

Probably the most critical concern to clear up: Applying the coefficients from the SwStr% version of the model (which spans 2014-17) to 2018 inputs, among pitches thrown at least 250 times, actually produces a higher adjusted R² (0.552) than the original fit did (0.532). It produces a much lower adjusted R² (0.198) in 2018 for the GB% version of the model. The denominator is fundamentally different for ground balls: it depends not on the number of pitches but, rather, on the number of balls in play, which happen about once every six pitches. It effectively makes the sample sizes dramatically smaller, especially at this point in the season, when most pitchers are just cresting the 250-pitch mark with some of their offerings. Alas, the small samples destabilize the “xGB%” estimates quite a bit, but I will still include them here for posterity.
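The denominator problem can be put in rough numbers with a binomial standard error. This is a back-of-the-envelope sketch, assuming each pitch (or ball in play) is an independent trial, which is a simplification. The rates are the overall averages from the means table above, and the one-in-six figure is the approximation from the text.

```python
import math


def rate_standard_error(rate, n_trials):
    """Binomial standard error of an observed rate over n_trials."""
    return math.sqrt(rate * (1.0 - rate) / n_trials)


PITCHES = 250              # a pitch just cresting the cutoff
BIP_PER_PITCH = 1.0 / 6.0  # about one ball in play every six pitches

# Whiffs are counted per pitch; ground balls are counted per ball in play.
se_swstr = rate_standard_error(0.112, PITCHES)
se_gb = rate_standard_error(0.452, PITCHES * BIP_PER_PITCH)
```

At 250 pitches, the whiff rate carries a standard error of about 2 percentage points, while the ground ball rate, built on only about 42 balls in play, carries one closer to 8.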

FanGraphs primarily relies on Pitch Info data, which power much of Brooks Baseball. This means it inherently differs from PITCHf/x data in some aspects. However, to my knowledge, Brooks Baseball also relies on some raw PITCHf/x data. For example, as of June 13, Max Scherzer has an 18.8% swinging strike rate according to Baseball Prospectus and Brooks Baseball but a 17.9% rate at FanGraphs, whether you use the “Plate Discipline” or “Pitch Info Plate Discipline” tabs.

Alas, the results presented below more closely mirror those you will see at the Baseball Prospectus leaderboard previously linked, which err on the side of higher whiff rates. (The league-wide whiff rate is 11.0% there compared to 10.7% at FanGraphs, as of my writing this.) Same goes for ball-in-play metrics: the ground ball rates will probably look a little funky if you’re accustomed to relying on FanGraphs’ data. (I know I am.) All told, the general sentiment remains unchanged: the best estimated swinging strike (xSwStr%) and ground ball (xGB%) rates indicate what might be considered the filthiest arsenals using only velocity and movement. Remember, these estimates are far from absolute. They omit many variables (as I mentioned earlier), including command/control, which is obviously a huge factor in how a pitcher’s pitch mix plays up.

Results are presented at the pitcher level — there are thousands of individual pitches to account for — but I’m more than happy to answer questions about specific pitches in the comments, time and sanity permitting. That was kind of the initial goal anyway: to find the best pitch(es). Or! Use the equation coefficients for yourself!
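One plausible way to get from pitch-level estimates to a single pitcher-level number is a pitch-count-weighted average. The text does not say exactly how the pitcher-level figures were aggregated, so treat the weighting below as an assumption; the keys and the example arsenal are hypothetical too.

```python
def pitcher_level_rate(pitch_rows, rate_key="x_swstr"):
    """Pitch-count-weighted average of a per-pitch-type rate.

    `pitch_rows` is a list of dicts with hypothetical keys `count` and `rate_key`.
    This weighting is an assumption, not necessarily the article's method.
    """
    total = sum(r["count"] for r in pitch_rows)
    if total == 0:
        raise ValueError("no pitches to aggregate")
    return sum(r["count"] * r[rate_key] for r in pitch_rows) / total


# Made-up arsenal: 600 fastballs at 8% and 400 sliders at 18%.
arsenal = [
    {"pitch_type": "fourseam", "count": 600, "x_swstr": 0.08},
    {"pitch_type": "slider", "count": 400, "x_swstr": 0.18},
]
overall = pitcher_level_rate(arsenal)
```

For ground balls, the natural weight would be balls in play per pitch type rather than pitches thrown, for the denominator reasons discussed earlier.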

2018 Estimates
Name Count SwStr% xSwStr% diff GB% xGB% diff
Luis Severino 1,286 13.4% 16.9% -3.5% 46.4% 45.6% 0.8%
Garrett Richards 1,156 12.7% 15.9% -3.1% 52.3% 55.2% -2.9%
Shohei Ohtani 804 15.9% 15.2% 0.7% 39.0% 40.7% -1.8%
Noah Syndergaard 1,041 15.5% 14.4% 1.1% 50.3% 47.1% 3.2%
Walker Buehler 711 10.8% 13.7% -2.9% 55.6% 43.7% 11.9%
Chris Archer 1,233 13.2% 13.6% -0.4% 44.2% 44.1% 0.1%
Jacob deGrom 1,229 16.4% 13.5% 2.9% 44.8% 43.5% 1.3%
Blake Snell 1,366 13.8% 13.5% 0.3% 40.4% 40.5% -0.1%
Jon Gray 1,204 13.6% 13.4% 0.2% 46.6% 43.3% 3.3%
German Marquez 1,170 10.3% 13.4% -3.0% 45.5% 44.5% 1.0%
Mike Foltynewicz 1,302 10.8% 13.3% -2.6% 41.5% 42.8% -1.3%
Gerrit Cole 1,339 15.1% 13.3% 1.8% 33.9% 43.8% -9.9%
Reynaldo Lopez 1,118 10.0% 13.3% -3.3% 35.1% 40.1% -5.0%
Marco Estrada 1,131 11.2% 13.3% -2.1% 27.4% 26.2% 1.1%
Michael Fulmer 1,221 11.1% 13.1% -2.1% 48.6% 47.6% 0.9%
Joe Musgrove 251 10.4% 13.0% -2.7% 48.2% 45.4% 2.8%
Daniel Gossett 389 8.7% 12.9% -4.2% 42.0% 40.0% 2.0%
Trevor Bauer 1,421 13.6% 12.9% 0.7% 46.1% 46.8% -0.8%
Domingo German 539 15.4% 12.8% 2.6% 42.2% 50.8% -8.5%
Lance McCullers 1,260 12.8% 12.8% 0.0% 56.3% 56.5% -0.2%
Miles Mikolas 1,139 10.4% 12.7% -2.3% 51.5% 47.4% 4.0%
Stephen Strasburg 1,269 12.8% 12.7% 0.1% 45.2% 45.9% -0.7%
Jameson Taillon 1,125 10.7% 12.6% -1.9% 51.5% 45.8% 5.7%
Yu Darvish 739 11.4% 12.5% -1.1% 39.0% 42.6% -3.6%
Masahiro Tanaka 1,041 14.7% 12.4% 2.3% 47.2% 45.9% 1.2%
Charlie Morton 1,224 12.9% 12.4% 0.6% 52.7% 51.3% 1.5%
Max Scherzer 1,369 18.8% 12.3% 6.5% 35.2% 44.9% -9.7%
Vincent Velasquez 1,159 12.7% 12.2% 0.4% 40.6% 44.2% -3.6%
Carlos Carrasco 1,160 14.7% 12.2% 2.5% 43.5% 49.4% -5.9%
Luis Castillo 1,286 15.3% 12.2% 3.1% 46.4% 52.5% -6.0%
Brandon Woodruff 270 9.3% 12.2% -2.9% 53.8% 43.9% 9.9%
Fernando Romero 648 12.0% 12.1% 0.0% 46.4% 55.9% -9.5%
Zach Eflin 570 10.2% 12.0% -1.9% 33.0% 45.2% -12.1%
Michael Wacha 1,258 10.8% 12.0% -1.2% 44.8% 38.1% 6.6%
David Hess 451 10.4% 12.0% -1.6% 39.4% 35.2% 4.1%
Jacob Faria 803 9.3% 12.0% -2.7% 34.0% 35.9% -1.8%
Jaime Barria 596 13.3% 11.9% 1.3% 40.0% 36.7% 3.3%
Luke Weaver 1,186 9.9% 11.9% -2.0% 42.3% 41.7% 0.6%
Jordan Lyles 550 9.8% 11.8% -2.0% 46.1% 50.7% -4.6%
Anibal Sanchez 423 9.5% 11.8% -2.4% 51.4% 38.4% 12.9%
Robbie Ray 505 14.7% 11.8% 2.9% 38.3% 41.0% -2.7%
Ross Stripling 527 12.3% 11.7% 0.6% 54.7% 41.3% 13.3%
Jose Urena 1,205 9.7% 11.7% -1.9% 51.4% 46.7% 4.6%
Tyler Anderson 1,123 12.3% 11.7% 0.6% 34.3% 35.1% -0.8%
Zack Wheeler 1,035 11.3% 11.6% -0.3% 46.7% 39.9% 6.8%
Blaine Hardy 421 9.0% 11.6% -2.6% 37.1% 37.7% -0.6%
Chad Kuhl 1,213 10.4% 11.6% -1.2% 37.6% 45.9% -8.3%
Tyson Ross 1,287 10.3% 11.6% -1.3% 45.5% 44.7% 0.8%
Kevin Gausman 1,276 13.7% 11.6% 2.1% 49.2% 41.2% 7.9%
Sonny Gray 1,068 10.1% 11.6% -1.5% 46.5% 45.9% 0.6%
James Paxton 1,341 14.8% 11.5% 3.2% 37.2% 43.1% -5.9%
Chris Sale 1,407 16.5% 11.5% 4.9% 42.9% 52.2% -9.4%
Jack Flaherty 673 12.5% 11.5% 1.0% 42.5% 43.1% -0.6%
Matt Boyd 1,121 10.4% 11.5% -1.1% 30.6% 38.4% -7.8%
Corey Kluber 1,248 11.5% 11.5% 0.1% 50.0% 47.3% 2.7%
Danny Duffy 1,388 10.5% 11.5% -1.0% 31.3% 38.9% -7.7%
Jordan Montgomery 453 11.0% 11.4% -0.4% 45.7% 38.4% 7.3%
Mike Minor 1,127 10.9% 11.4% -0.5% 38.7% 37.9% 0.8%
Marcus Stroman 651 9.8% 11.4% -1.6% 60.5% 51.9% 8.6%
Kenta Maeda 784 15.7% 11.4% 4.3% 37.0% 40.8% -3.9%
Nick Kingham 552 12.7% 11.3% 1.4% 39.0% 42.9% -3.9%
Tyler Chatwood 1,089 8.4% 11.3% -2.8% 53.6% 42.5% 11.1%
Kyle Freeland 1,144 9.8% 11.2% -1.4% 49.5% 45.9% 3.6%
Wade LeBlanc 537 10.1% 11.2% -1.2% 33.6% 45.3% -11.7%
Nick Pivetta 1,157 12.6% 11.2% 1.4% 42.0% 42.3% -0.4%
Francisco Liriano 913 11.2% 11.2% 0.0% 45.2% 46.1% -0.9%
Johnny Cueto 477 10.9% 11.2% -0.3% 45.7% 44.3% 1.4%
Brett Anderson 253 8.7% 11.2% -2.5% 56.9% 47.8% 9.1%
Clayton Kershaw 737 12.2% 11.1% 1.1% 45.5% 35.9% 9.6%
Clay Buchholz 332 11.1% 11.1% 0.0% 32.4% 40.6% -8.3%
Frankie Montas 276 8.0% 11.1% -3.2% 41.3% 50.3% -9.0%
Nick Tropeano 813 12.2% 11.1% 1.1% 36.2% 43.4% -7.2%
Justin Verlander 1,429 14.1% 11.1% 3.1% 30.4% 37.0% -6.6%
Bryan Mitchell 519 5.8% 11.1% -5.3% 52.5% 44.5% 8.0%
Daniel Mengden 1,187 9.0% 11.1% -2.1% 40.0% 37.4% 2.6%
Matt Wisler 270 10.0% 11.1% -1.1% 26.9% 38.2% -11.3%
John Gant 264 13.3% 11.0% 2.2% 41.5% 43.1% -1.6%
Mike Clevinger 1,349 11.7% 11.0% 0.7% 44.1% 37.0% 7.0%
Lance Lynn 1,165 10.3% 11.0% -0.7% 50.6% 48.4% 2.1%
Patrick Corbin 1,242 14.7% 11.0% 3.7% 46.3% 43.0% 3.3%
Carson Fulmer 633 6.8% 11.0% -4.2% 33.7% 39.2% -5.5%
Chase Anderson 1,059 9.6% 11.0% -1.3% 35.6% 38.1% -2.5%
Carlos Martinez 852 10.7% 10.9% -0.2% 55.5% 50.3% 5.2%
Jose Berrios 1,133 13.2% 10.9% 2.4% 40.6% 43.7% -3.1%
Chad Bettis 1,181 9.9% 10.8% -0.9% 47.0% 43.9% 3.1%
Homer Bailey 1,036 8.1% 10.8% -2.7% 41.0% 42.9% -1.9%
Dylan Covey 464 9.5% 10.8% -1.4% 61.7% 47.9% 13.8%
Jordan Zimmermann 502 10.8% 10.8% -0.1% 30.9% 41.7% -10.8%
Mike Soroka 254 12.2% 10.8% 1.4% 42.9% 52.5% -9.7%
Jeremy Hellickson 610 10.2% 10.8% -0.6% 48.3% 42.9% 5.4%
Derek Holland 1,080 9.3% 10.8% -1.5% 38.6% 42.1% -3.5%
Jake Junis 1,248 11.4% 10.8% 0.6% 40.9% 44.1% -3.3%
James Shields 1,245 10.1% 10.8% -0.6% 39.5% 46.0% -6.5%
Andrew Suarez 763 8.5% 10.7% -2.2% 49.3% 42.4% 6.8%
Trevor Williams 1,147 7.8% 10.7% -2.9% 41.8% 43.2% -1.5%
Adam Plutko 264 8.7% 10.7% -2.0% 30.9% 32.2% -1.3%
Eduardo Rodriguez 1,210 12.9% 10.7% 2.2% 41.6% 46.5% -4.9%
Yonny Chirinos 354 11.6% 10.7% 0.9% 44.9% 50.5% -5.6%
Sal Romano 1,134 6.7% 10.7% -4.0% 46.3% 47.0% -0.7%
Trevor Cahill 746 13.4% 10.6% 2.8% 60.3% 54.6% 5.8%
David Price 1,114 10.1% 10.6% -0.5% 41.2% 45.4% -4.2%
Aaron Nola 1,261 11.9% 10.5% 1.4% 54.2% 48.1% 6.1%
Kyle Gibson 1,254 12.2% 10.5% 1.7% 49.2% 43.9% 5.4%
Hyun-jin Ryu 462 11.3% 10.5% 0.8% 55.9% 42.4% 13.5%
Alex Wood 1,110 12.1% 10.5% 1.6% 46.6% 48.0% -1.4%
Lucas Giolito 1,084 8.4% 10.5% -2.1% 41.4% 39.5% 1.9%
Matt Harvey 864 8.1% 10.5% -2.4% 42.7% 40.8% 1.8%
Trevor Richards 541 10.5% 10.5% 0.1% 40.5% 40.4% 0.1%
Luis Perdomo 306 10.1% 10.5% -0.3% 40.8% 56.7% -15.9%
Andrew Heaney 931 12.9% 10.4% 2.4% 41.1% 43.3% -2.2%
Dylan Bundy 1,201 15.1% 10.4% 4.6% 36.1% 35.1% 1.0%
Joe Biagini 362 8.6% 10.4% -1.8% 50.8% 40.3% 10.4%
Jake Odorizzi 1,160 13.1% 10.3% 2.8% 26.1% 36.7% -10.6%
Dillon Peters 437 7.6% 10.3% -2.7% 45.0% 43.3% 1.7%
Brandon McCarthy 1,059 7.6% 10.3% -2.7% 49.8% 48.0% 1.8%
Sean Manaea 1,167 10.0% 10.3% -0.3% 44.5% 50.7% -6.2%
Jason Hammel 1,239 9.8% 10.3% -0.4% 38.2% 43.6% -5.5%
Chris Stratton 1,174 9.4% 10.3% -0.9% 39.1% 39.9% -0.8%
Caleb Smith 1,164 13.6% 10.2% 3.3% 29.1% 41.1% -12.0%
Chris Tillman 520 5.4% 10.2% -4.8% 42.3% 39.8% 2.5%
Steven Matz 1,030 8.4% 10.2% -1.8% 53.2% 48.4% 4.8%
Zack Godley 1,193 11.3% 10.2% 1.1% 51.8% 53.4% -1.7%
Jarlin Garcia 507 8.3% 10.2% -1.9% 38.5% 46.6% -8.0%
Brandon Finnegan 392 7.4% 10.2% -2.8% 40.5% 40.1% 0.4%
Marco Gonzales 1,199 9.2% 10.1% -1.0% 46.6% 46.8% -0.2%
Matthew Koch 877 7.9% 10.1% -2.3% 42.6% 41.4% 1.3%
Junior Guerra 967 11.4% 10.1% 1.2% 39.5% 38.3% 1.2%
Kendall Graveman 624 8.0% 10.1% -2.1% 56.0% 47.8% 8.2%
Jeff Samardzija 665 9.3% 10.1% -0.7% 33.9% 39.8% -5.9%
Matt Moore 989 10.9% 10.1% 0.9% 39.8% 42.6% -2.8%
CC Sabathia 940 10.6% 10.1% 0.6% 43.5% 42.1% 1.3%
Wei-Yin Chen 596 8.6% 10.0% -1.5% 34.3% 35.6% -1.3%
Steven Brault 435 9.4% 10.0% -0.6% 46.8% 47.3% -0.5%
Dan Straily 736 12.0% 10.0% 2.0% 32.8% 38.5% -5.7%
Tyler Mahle 1,216 11.3% 10.0% 1.4% 38.4% 39.8% -1.4%
Ty Blach 981 6.9% 9.9% -3.0% 55.2% 47.5% 7.8%
Andrew Triggs 719 10.6% 9.9% 0.7% 47.9% 56.5% -8.7%
Jon Lester 1,236 10.1% 9.9% 0.2% 39.1% 40.6% -1.4%
Sean Newcomb 1,175 11.5% 9.9% 1.6% 51.1% 39.9% 11.2%
Dallas Keuchel 1,371 9.3% 9.9% -0.6% 56.4% 45.5% 10.9%
Elieser Hernandez 293 10.2% 9.8% 0.4% 32.8% 34.6% -1.8%
Brent Suter 1,026 10.6% 9.8% 0.8% 33.5% 35.5% -2.0%
Ian Kennedy 1,212 9.0% 9.8% -0.8% 30.3% 37.4% -7.1%
Andrew Cashner 1,282 8.5% 9.8% -1.3% 39.6% 42.9% -3.2%
Kyle Hendricks 1,101 9.6% 9.8% -0.2% 48.9% 40.8% 8.1%
Jhoulys Chacin 1,165 8.9% 9.8% -0.8% 40.5% 42.0% -1.5%
Eric Skoglund 791 9.0% 9.7% -0.8% 43.7% 42.4% 1.3%
Mike Leake 1,201 8.7% 9.7% -0.9% 47.7% 53.0% -5.2%
Hector Santiago 607 8.9% 9.7% -0.8% 37.9% 40.8% -2.9%
Martin Perez 417 6.2% 9.6% -3.4% 47.8% 46.6% 1.3%
J.A. Happ 1,290 12.1% 9.6% 2.5% 46.7% 39.1% 7.6%
Zack Greinke 1,203 12.3% 9.6% 2.7% 41.3% 39.7% 1.5%
Ben Lively 419 8.6% 9.6% -1.0% 30.0% 35.9% -5.9%
Miguel Gonzalez 257 7.8% 9.6% -1.8% 38.2% 40.5% -2.4%
Joey Lucchesi 650 12.0% 9.5% 2.5% 43.5% 44.0% -0.6%
Jaime Garcia 904 9.5% 9.5% 0.0% 42.2% 45.2% -2.9%
Tyler Skaggs 1,273 11.5% 9.5% 2.1% 47.2% 39.8% 7.4%
Julio Teheran 1,144 11.0% 9.4% 1.6% 39.1% 42.7% -3.6%
Mike Fiers 1,071 9.0% 9.4% -0.5% 39.9% 36.0% 3.9%
Felix Hernandez 1,281 8.4% 9.4% -1.0% 44.0% 48.7% -4.7%
Jason Vargas 536 11.6% 9.4% 2.2% 35.3% 41.0% -5.7%
Rick Porcello 1,307 9.9% 9.4% 0.6% 48.5% 46.6% 2.0%
Jake Arrieta 1,082 7.9% 9.3% -1.4% 56.1% 49.5% 6.6%
Tanner Roark 1,233 10.3% 9.3% 1.0% 47.5% 39.7% 7.8%
Jose Quintana 1,098 9.4% 9.2% 0.1% 44.5% 40.2% 4.3%
Eric Lauer 664 8.0% 9.2% -1.2% 30.6% 35.5% -5.0%
Drew Pomeranz 725 8.1% 9.1% -1.0% 40.5% 45.7% -5.2%
Alex Cobb 909 7.5% 9.0% -1.6% 51.0% 45.3% 5.7%
Ivan Nova 1,025 10.3% 9.0% 1.3% 52.1% 52.1% 0.0%
Samuel Gaviglio 358 10.6% 8.9% 1.8% 56.5% 48.1% 8.5%
Aaron Sanchez 1,231 10.6% 8.8% 1.7% 51.6% 48.5% 3.1%
Zach Davies 730 8.5% 8.8% -0.3% 48.5% 46.6% 1.9%
Clayton Richard 1,281 10.1% 8.6% 1.5% 58.0% 57.8% 0.2%
Gio Gonzalez 1,217 10.5% 8.3% 2.2% 53.6% 46.3% 7.3%
Doug Fister 1,112 5.7% 8.3% -2.6% 50.9% 48.2% 2.7%
Adam Wainwright 350 6.3% 8.2% -1.9% 54.5% 42.4% 12.2%
Bartolo Colon 1,012 5.6% 7.4% -1.7% 46.6% 48.0% -1.4%
Rich Hill 430 8.1% 7.3% 0.9% 32.0% 35.2% -3.2%
Cole Hamels 1,287 12.9% 7.1% 5.8% 42.8% 44.0% -1.2%
Josh Tomlin 503 9.7% 6.9% 2.8% 27.7% 39.6% -11.9%
Sorted descending by xSwStr%.
Data current as of June 10.

Holy Luis Severino.

Garrett Richards is the only pitcher with top-10 xSwStr% and xGB% rates. Maybe if he wasn’t averaging more than one wild pitch per game…

Other names who fall in the top-20 of each: Lance McCullers and… Domingo German.

Double top-30 guys: Charlie Morton, Carlos Carrasco, Luis Castillo.

These lists generally pass the smell test, but it’s obviously worth wondering why guys like Scherzer, Corey Kluber, etc. don’t grade out better here. Don’t ask me! I’m not a computer. But, as aforementioned, using only velocity and movement to “predict” these kinds of things can be limiting. Scherzer’s 12.3% xSwStr% doesn’t mean his whiff rate will fall one-third by year’s end. Take it all with a grain of salt. If anything, I’d consider interpreting the numbers as if they were arsenal scores — but, like, peripherals-based rather than outcomes-based arsenal scores.

And remember, the xGB% estimates are really noisy.

Takeaways

Is this model predictive? I don’t know. It’s hard enough for a pitcher to repeat his own pitch, let alone for another to try to replicate it. The average values used in this model paint a picture of a pitcher’s “typical” performance. It essentially assumes he’s static, which offends my sabermetric sensibilities. But I don’t know how else to approach this, really. It takes a few hundred pitches for any specific pitch type to become statistically reliable. I don’t know how theoretically defensible it is to cherry-pick velocity and movement from April and extrapolate it. Baseball players are not robots.

Fact of the matter is I don’t know how else to use these results other than, hey, this guy’s xSwStr% is way lower than his actual SwStr% for his curve, so maybe it’s overperforming a little bit. In this scenario, I’ll willingly (ignorantly) conflate descriptiveness and predictiveness. But also, admittedly, I have no idea how worthwhile that is. Maybe the only purpose of this is to simply exist as having been pursued. Mostly, I’m using it as a way to reconsider interesting names I had otherwise disregarded in some capacity (Fernando Romero, Domingo German, German Marquez, Reynaldo Lopez) and to keep the faith in others (Luis Castillo, maybe).
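The "actual minus expected" reading described above is easy to script. The sketch below takes four rows from the 2018 table (actual SwStr% and xSwStr%, in percent) and ranks them by the gap; the function name is illustrative.

```python
# (name, SwStr%, xSwStr%) copied from the 2018 Estimates table.
ROWS = [
    ("Luis Severino", 13.4, 16.9),
    ("Max Scherzer", 18.8, 12.3),
    ("Cole Hamels", 12.9, 7.1),
    ("Bryan Mitchell", 5.8, 11.1),
]


def rank_by_gap(rows):
    """Sort pitchers by actual minus expected whiff rate, biggest gap first.

    A positive gap means the actual whiff rate exceeds the velocity-and-movement
    estimate; a negative gap means it falls short.
    """
    gaps = [(name, round(actual - expected, 1)) for name, actual, expected in rows]
    return sorted(gaps, key=lambda item: item[1], reverse=True)


ranked = rank_by_gap(ROWS)
```

Gaps computed from the rounded table values can differ by a tenth of a point from the table's own diff column, which was presumably calculated before rounding. A big gap is a flag worth a closer look, not a forecast.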





Two-time FSWA award winner, including 2018 Baseball Writer of the Year, and 8-time award finalist. Featured in Lindy's magazine (2018, 2019), Rotowire magazine (2021), and Baseball Prospectus (2022, 2023). Biased toward a nicely rolled baseball pant.

5 Comments
jimcalmember
5 years ago

Aaron Nola and Kyle Hendrick look extremely mediocre here.