Exploring Statcast’s Estimated Swing Speed

by Alex Chamberlain
August 11, 2017

My favorite part of this year’s World Baseball Classic, aside from the baseball, obviously, was the television broadcasts’ frequent reference to players’ swing speeds. I was floored, even if only because I didn’t know (but should’ve known) we had the technology capable of measuring it. Regarding Major League Baseball and Statcast’s adoption of such a metric, a little birdy told me I shouldn’t hold my breath. Disappointed, I moved on.

Then yesterday, while fooling around in Baseball Savant’s Statcast database trying to diagnose the misalignment of Miguel Cabrera’s outcomes with his peripherals, I noticed the database query’s “sort by” function offered an option to sort by “estimated swing speed.” A quick Google search indicates to me the Statcast and MLB Advanced Media team(s) has (have) yet to formally announce this; sprint speed has been the more exciting recent development, apparently.

Not to me! I quickly got to work querying the data. I also quickly learned that the raw data files underpinning the swing speed summaries previously linked do not include swing speed, which is unhelpful. In other words, swing speed is not communicated to us from Baseball Savant’s organs on a play-by-play basis. I imagine this is by design. So, I was resigned to running a single query that summarized swing speed data at a high level: the average swing speed for every hitter with at least 100 at-bats in a given season, from 2015 through 2017.

Here’s what I found.

Summary Statistics

The average swing speed from the start of the Statcast Era™ until yesterday is 59.6 mph, with a minimum of 51.3 mph (2017, Mallex Smith) and a maximum of 66.5 mph (2015, take a guess). Despite some minor fluctuations, the yearly averages and standard deviations suggest swing speeds haven’t changed recently.

League Estimated Swing Speed (mph), 2015-17

| Year | Min | Avg | Max | StdDev |
|---|---|---|---|---|
| 2015 | 52.4 | 59.7 | 66.5 | 2.26 |
| 2016 | 51.6 | 59.8 | 65.5 | 2.26 |
| 2017 | 51.3 | 59.3 | 66.1 | 2.21 |
| Overall | 51.3 | 59.6 | 66.5 | 2.25 |

SOURCE: Baseball Savant/Statcast

Here’s a leaderboard detailing an assortment of the fastest and slowest swing speeds of the last three years:

Hitter Average Estimated Swing Speed, 2015-17

| Player | Year | AB | MPH |
|---|---|---|---|
| Giancarlo Stanton | 2015 | 437 | 66.5 |
| Aaron Judge | 2017 | 406 | 66.1 |
| Nelson Cruz | 2016 | 325 | 65.5 |
| Giancarlo Stanton | 2016 | 192 | 64.8 |
| Miguel Cabrera | 2016 | 342 | 64.8 |
| Miguel Sano | 2015 | 197 | 64.8 |
| Joey Gallo | 2017 | 205 | 64.6 |
| Miguel Cabrera | 2015 | 439 | 64.6 |
| Tyler Flowers | 2016 | 347 | 64.5 |
| Miguel Sano | 2017 | 251 | 64.4 |
| Nelson Cruz | 2017 | 297 | 64.2 |
| Gary Sanchez | 2016 | 449 | 64.2 |
| Kendrys Morales | 2016 | 228 | 64.1 |
| David Ortiz | 2015 | 313 | 64.1 |
| Mike Trout | 2015 | 393 | 64.1 |
| David Ortiz | 2016 | 423 | 64.0 |
| Ronald Torreyes | 2017 | 255 | 53.9 |
| Jarrod Dyson | 2017 | 312 | 53.8 |
| Ezequiel Carrera | 2017 | 514 | 53.6 |
| Dee Gordon | 2017 | 468 | 53.1 |
| Pedro Ciriaco | 2015 | 228 | 53.0 |
| Shawn O’Malley | 2016 | 300 | 53.0 |
| Jarrod Dyson | 2015 | 168 | 52.9 |
| Jonathan Herrera | 2015 | 390 | 52.9 |
| Ichiro Suzuki | 2017 | 393 | 52.8 |
| Billy Hamilton | 2017 | 228 | 52.8 |
| Billy Hamilton | 2016 | 346 | 52.8 |
| Billy Hamilton | 2015 | 449 | 52.4 |
| Delino DeShields | 2017 | 206 | 52.0 |
| Mallex Smith | 2016 | 277 | 51.6 |
| Mallex Smith | 2017 | 331 | 51.3 |

SOURCE: Baseball Savant/Statcast

You’ll rarely see a more obvious correlation before ever statistically verifying it. But verify, I will.

Correlations with Other Metrics

Isolated Power

So, yeah, swing speed evidently correlates with power. In a sample of 1,106 hitter-seasons that met the previously specified at-bat threshold, swing speed and isolated power (ISO) exhibited a very strong positive correlation (R = 0.63, on a scale of -1.0 to 1.0). For posterity, its regression coefficients turned up as such:

xISO = 0.01638*mph - 0.81394

I also modeled the correlation as a quadratic (i.e., incorporating a mph² term) in the event there eventually were negative marginal returns on swing speed, such that a swing that’s too fast might result in declining ISO.
Alas, the model turned up no such evidence within the realm of realistic outcomes (a maximum average swing speed of, say, 70 mph).

In terms of swing speed’s predictive capacity, there existed a weak correlation (R = 0.35) between year-to-year changes in swing speed and ISO, e.g., how much a player’s swing speed changed from 2016 to 2017 compared to how much his ISO changed during that time (n = 590).

Contact Rate

This may come as no surprise: swing speed is negatively correlated with contact rate (Contact%), albeit weakly (R = -0.35). When modeled as a quadratic (R = 0.36), the model indicated a swing speed around 53 mph might be ideal for maximizing contact. It also suggests more contact trades off for less power. (The inverse of that statement perfectly captures our general idea of power hitters: home runs at the expense of strikeouts.)

When modeled as year-to-year differences, there existed no meaningful correlation (R = 0.17).

An aside: overall contact rate doesn’t really account for a player’s plate discipline; in hindsight, I think establishing the correlation between swing speed and zone contact (Z-Contact%) might be a more helpful measure, because those swings count much more than the bad ones.

Weighted On-Base Average

Ultimately, my investigation of Cabrera’s wOBA led me to stumble upon this data in the first place. Swing speed exhibited a moderately strong correlation (R = 0.54) with wOBA…

xwOBA = 0.00937*mph - 0.23793

… and the quadratic form of the model indicated no negative marginal returns within the reasonable range of swing speed outcomes. Moreover, the year-over-year model suggests at least a weak correlation (R = 0.33) between changing swing speeds and wOBAs.

Conclusions

It’s cool stuff. It doesn’t do much yet other than reinforce a lot of what we already know (or, at least, what we thought we knew) as well as what we’re continuing to learn in the era of Aaron Judges, Joey Gallos and Miguel Sanos.
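
To make the two linear fits reported above concrete, here is a minimal Python sketch that plugs an average swing speed into each one. The function names are made up for illustration, and the outputs are only what the fits imply for a given average swing speed, not projections for any particular hitter.

```python
# Coefficients copied from the two linear fits reported in this article.
ISO_SLOPE, ISO_INTERCEPT = 0.01638, -0.81394
WOBA_SLOPE, WOBA_INTERCEPT = 0.00937, -0.23793


def expected_iso(mph):
    """ISO implied by the article's linear fit for an average swing speed."""
    return ISO_SLOPE * mph + ISO_INTERCEPT


def expected_woba(mph):
    """wOBA implied by the article's linear fit for an average swing speed."""
    return WOBA_SLOPE * mph + WOBA_INTERCEPT


# Overall minimum, average and maximum from the summary table (2015-17).
for label, mph in [("slowest", 51.3), ("average", 59.6), ("fastest", 66.5)]:
    print(f"{label:>8} {mph:.1f} mph  "
          f"xISO {expected_iso(mph):.3f}  xwOBA {expected_woba(mph):.3f}")
```

At the 59.6 mph overall average, the fits work out to roughly a .162 ISO and a .321 wOBA. Read the slopes the same way: each additional mph of average swing speed is associated with about 16 points of ISO and 9 points of wOBA across hitters, which says nothing certain about what happens when one hitter’s swing speeds up or slows down.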

I think there is substantial opportunity to use swing speed as a diagnostic tool. For example, Cabrera and Jose Bautista, both having very bad years relative to what we expect of them, have seen their swing speeds decline 2.4 mph and 2.6 mph, respectively, this season. Cabrera’s 2017 swing speed ranks in the top 10 percent over the last three years, so it’s still elite, for all intents and purposes. But Bautista’s is barely above average. He’s old; they’re both getting up there.

So the big question is, are they playing through injury or descending somewhat ungracefully into old age, as human bodies inevitably do? It’s hard to say, but estimated swing speed is the glaring indicator that betrays all of Cabrera’s peripherals (although, again, his swing speed is still elite, suggesting he can produce admirably, even if it’s not MVP-caliber production). While helpful, it does little to solve the problem of the mismatch between Cabrera’s peripherals and outcomes this year.

Alas…

Data Issues

We know nothing about this data on a granular level.

Given there’s evidence the technology that measures exit velocity runs hot or cold depending on the ballpark, the same is likely true of swing speed. We don’t know if this data needs recalibration, but it likely does.

We don’t know what the fastest swing is.

We don’t know if these data include check swings, half-assed swings, etc., and how Statcast controls for them, if at all.

We don’t know how the data look on a weekly or monthly basis and how a player’s swing speed might typically change throughout the course of a season.

(Correction: You can find game-level data by choosing “Player & Game Date” as your “group by” function, which is as granular as it gets, but you can’t search for more than one or a few selected players at a time. This single-player query for Curtis Granderson shows single-game swing speeds ranging from 26.9 mph to 73.5 mph.
In my experience, the query will otherwise time out if you try to pull data for every hitter, and setting a minimum number of ABs affects the “group by” specification. This particular query found 1,629 player-games in 2017 of at least five ABs, so there is hope yet, but that leaves you with a spotty data set.)

(Anyway, now that I’m re-learning this at the last minute: Judge’s fastest swing is 84.8 mph; 85.4 mph for Giancarlo Stanton in 2016. So. Wow.)

(Back to your regularly scheduled caveats.)

We have no idea (yet!) how quickly it becomes reliable (“stabilizes”), although I imagine because it’s subject to nothing but the player’s own talents and efforts (as opposed to a ball in play, subject to countless competing factors), it becomes reliable rather quickly.

Good stuff. I’m excited to find a way to incorporate it into projections and player analysis for 2018.
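
On the stabilization question, one common way to gauge reliability is a split-half correlation: split each hitter’s games into two random halves, average each half, and see how well one half predicts the other across hitters. The Python sketch below shows that general technique only; it is not something Statcast or this article actually ran. The data are simulated, every parameter of the simulation is an assumption (the 3 mph of game-to-game noise in particular is a guess), and the helper names are hypothetical.

```python
import random


def pearson_r(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def split_half_r(speeds_by_player, n_games, rng):
    """Correlate two random, non-overlapping n_games samples per player."""
    first, second = [], []
    for speeds in speeds_by_player.values():
        if len(speeds) < 2 * n_games:
            continue  # not enough games to fill both halves
        sample = rng.sample(speeds, 2 * n_games)
        first.append(sum(sample[:n_games]) / n_games)
        second.append(sum(sample[n_games:]) / n_games)
    return pearson_r(first, second)


# Simulated game-level averages: NOT real Statcast data.
rng = random.Random(0)
fake_players = {}
for i in range(200):
    talent = rng.gauss(59.6, 2.25)  # assumed spread of "true" swing speed
    fake_players[f"player_{i}"] = [rng.gauss(talent, 3.0) for _ in range(40)]

reliability = {n: split_half_r(fake_players, n, rng) for n in (1, 5, 20)}
for n, r in reliability.items():
    print(f"{n:>2} games per half: split-half R = {r:.2f}")
```

With real game-level pulls in place of the simulated dictionary, the sample size at which the split-half correlation clears whatever cutoff you choose would serve as a rough stabilization point.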