A Spring Training Stat That Matters (I Swear)
by Alex Chamberlain
March 28, 2017

Edit (3/29/17, 7:55 pm EDT): Brent Hershey of BaseballHQ and Ron Shandler’s Baseball Forecaster (very politely) brought to my attention that this has been done before, by Bill Macey back in 2012. Formerly behind a paywall, it has now been made public for your reading pleasure. I didn’t even know this research existed (so I’m really glad Hershey brought it to my attention); I am always reluctant to claim to break ground in a field that progresses so quickly but also has such a rich history of research. Please consider the following research a companion to and external validation of Macey’s work.

* * *

I welcome all constructive criticism. This research is not especially rigorous, but given the nature of the claim (a legitimately significant spring training statistic!) it merits the disclaimer.

I found a statistically significant spring training statistic.

I’d rather not rehash the history of research and speculation regarding The Spring Training Stat(s) That Matter. Just know that, outside the modest results from this Dan Rosenheck piece in The Economist, it’s generally accepted that spring training statistics mean virtually nothing, and you’ll read all manner of baseball writers bashing the notion that they do.

The big caveat is that most of this research concerns individual players. Mine: team-level statistics. Alas, it’s an inherently different beast with which I’m dealing. Despite small within-year populations (30 teams rather than hundreds of players), the observation-level sample sizes are much larger (hundreds of plate appearances rather than dozens), making the odds of finding meaningful correlations much better despite fewer data points.

Per usual, I buried the lede: a team’s rate of stolen base attempts (calculated from stolen bases [SB] plus caught stealing [CS]) during spring training is actually meaningful. I’ll get to the implications of this later because there are many.
First, let’s dig into the guts of the research. I gathered team-level spring training statistics from 2006 through 2016 and paired them with regular season statistics from the same span plus 2005.

A couple of quick correlations, using the Pearson correlation coefficient:

• Current-year spring training SBs vs. current-year regular season SBs: r = 0.41
• Current-year spring training attempts (SB+CS) vs. current-year regular season attempts: r = 0.48

Both of these results surprised me. A Pearson r of 0.41 is not particularly strong, but the 0.48 for attempts, nearing 0.5 (close to a 0.25 r-squared), indicates a weak-bordering-on-moderate correlation. (Which isn’t promising, when I say it like that, but it is something.)

Then again, maybe they shouldn’t have surprised me. You could probably do the same for many statistics and at least see some sort of statistically significant correlation without scaling for playing time because (1) every team plays 162 regular season games, usually; (2) every team plays 30-something spring games, making any kind of scaling not really necessary; and (3) spring skills in aggregate probably carry over relatively well into the season. In other words, I wouldn’t be surprised to know that teams in spring training at least slightly resemble their regular-season selves.

Ultimately, the whole purpose of this is, for the most part, identifying teams that choose to try to steal more bases. That’s why this is interesting to me in the first place. Stolen base attempts are the most controllable aspect of the game (arguably), so upticks (or decreases) in attempt rates, if meaningful, could have important fantasy implications.

Teams attempt more stolen bases during spring training. I imagine this is a function of the somewhat lax nature of spring training combined with players trying to win jobs or test particular skills in a low-leverage context.
I verified this by calculating attempt rate (att%) as attempts divided by opportunities, the latter of which is computed for simplicity as singles plus walks:

Stolen Base Attempt Rates (att%)

Year    Spring    Season
2006    10.86%     8.56%
2007    10.27%     8.51%
2008    10.80%     8.43%
2009    10.79%     9.05%
2010    10.70%     9.22%
2011    11.54%    10.49%
2012    12.19%    10.26%
2013    10.51%     8.66%
2014    10.62%     8.98%
2015    10.33%     8.48%
2016    10.35%     8.33%

att% = (SB+CS)/(1B+BB)

Knowing this, it’s important to index attempt rates: in other words, to scale them around the league-average attempt rate, as we might with statistics such as ERA+, OPS+, and so on. Then I calculated percentage change the way you normally would, in the form of (Y_t - Y_{t-1}) / Y_{t-1}. This methodology makes the effort more predictive than descriptive, which is important for the sake of trying to predict fantasy performance (obviously).

The correlation I wanted to test involved calculating the following percentage changes:

Spring Δatt%: Y_{t-1} season att% to Y_t spring att%
Season Δatt%: Y_{t-1} season att% to Y_t season att%

The hypothesis is that an increase in indexed attempt rate from last year’s regular season to this year’s spring training will correlate with an increase in indexed attempt rate from last year’s regular season to this year’s regular season. In fewer words, do spring training attempt rate gains carry over into the regular season?

Why, yes, they do.

Season Δatt% = 0.439*(Spring Δatt%) + 0.017
Adjusted r²: 0.34

(Evaluating this relationship using a simple measure of correlation, such as the Pearson correlation coefficient used in the bulleted list above, produces r = 0.58. This exceeds the correlation coefficients from the bulleted list, and its square, about 0.34, is the unadjusted r² for the above equation.)

We might be spoiled by some of the remarkably strong correlations seen in equations such as xBABIP, xISO, and so on. Know, however, that a 0.34 r² is nothing to sneeze at.
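For readers who want to retrace the arithmetic, here is a minimal Python sketch of the chain described above: attempt rate, indexing, percentage change, and a one-variable least-squares fit with its Pearson r. It is not the code behind the study. The function names (attempt_rate, indexed, pct_change, pearson_r, fit_line) are hypothetical, the counting stats and deltas are made-up toy numbers, and the 100-based index is an assumption, chosen because the indexed values published later in the article cluster around 100.

```python
from math import sqrt


def attempt_rate(sb, cs, singles, walks):
    """att% = (SB + CS) / (1B + BB), with 1B + BB as the simplified opportunity count."""
    return (sb + cs) / (singles + walks)


def indexed(rate, league_rate):
    """ASSUMPTION: plus-stat style index where 100 equals the league-average rate."""
    return 100.0 * rate / league_rate


def pct_change(prev, curr):
    """Percentage change (Y_t - Y_{t-1}) / Y_{t-1}, returned as a fraction."""
    return (curr - prev) / prev


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


# Hypothetical counting stats for one team (not real data).
season_rate = attempt_rate(sb=100, cs=30, singles=950, walks=500)
spring_rate = attempt_rate(sb=30, cs=10, singles=200, walks=100)

# League rates: 8.33% is the 2016 regular-season figure from the table.
# The 2017 spring league rate is not given, so the 2016 spring figure
# (10.35%) is used here purely as a stand-in.
season_index = indexed(season_rate, 0.0833)
spring_index = indexed(spring_rate, 0.1035)
spring_delta = pct_change(season_index, spring_index)
print(f"example team: {season_index:.1f} -> {spring_index:.1f} ({spring_delta:+.1%})")

# Toy (made-up) team-season deltas, only to show the fitting step.
spring_deltas = [-0.40, -0.10, 0.05, 0.30, 1.20]
season_deltas = [-0.15, -0.08, 0.06, 0.10, 0.55]
slope, intercept = fit_line(spring_deltas, season_deltas)
r = pearson_r(spring_deltas, season_deltas)
print(f"toy fit: season = {slope:.3f} * spring + {intercept:.3f}")
print(f"toy r = {r:.2f}, r^2 = {r * r:.2f}")
```

In a regression with a single predictor, the unadjusted r² is simply the square of Pearson’s r, which is why an r of 0.58 and an r² of roughly 0.34 describe the same fit.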
However, these results could be interpreted in ways that don’t necessarily align with my hypothesis. I would hope that increased attempts during spring training would indicate a fundamental methodological shift by a particular team. It might instead be that a team steals more bases in spring simply because it now has faster players, at both the major- and minor-league levels. Maybe it’s a combination of both. Maybe it’s one begetting the other: the promotion and/or acquisition of faster players inducing a methodological shift.

So, that’s it. Maybe it’s not actually that exciting. But in light of the bleak landscape of The Spring Training Stat(s) That Actually Matter, this felt at least like a minor breakthrough.

Of course, here are your biggest changes in indexed stolen base attempt rates from the 2016 regular season into 2017 spring training, and how that reflects upon the upcoming season:

Indexed Stolen Base Rates (att%)

Team            2016 Regular Season    2017 Spring Training    % Change
Orioles                 28.4                  110.4              +289%
Cardinals               53.1                  118.0              +122%
Angels                  90.4                  195.2              +116%
Mariners                69.3                  118.6               +71%
Blue Jays               63.7                  107.4               +69%
Rangers                117.3                  160.5               +37%
White Sox               96.9                  130.4               +35%
Rockies                 86.1                  108.4               +26%
Red Sox                 81.6                   95.3               +17%
Yankees                 80.7                   92.7               +15%
Dodgers                 60.3                   66.8               +11%
Braves                  89.8                   93.0                +4%
Cubs                    78.1                   80.3                +3%
Giants                  89.7                   89.1                -1%
Astros                 125.4                  123.5                -2%
Pirates                122.5                  115.2                -6%
Athletics               65.9                   61.7                -6%
Tigers                  71.0                   63.1               -11%
Royals                 135.4                  119.5               -12%
Nationals              134.0                  112.8               -16%
Mets                    52.3                   43.9               -16%
Marlins                 80.7                   65.5               -19%
Padres                 162.0                  129.9               -20%
Twins                  105.9                   78.6               -26%
Rays                    93.8                   68.6               -27%
Indians                137.7                   94.6               -31%
Brewers                198.8                  123.3               -38%
Diamondbacks           143.4                   81.8               -43%
Reds                   165.8                   93.3               -44%
Phillies               130.4                   64.5               -50%

Indexed att% = [(team SB+CS) - (lg avg SB+CS)] / (lg avg SB+CS)
2017 Spring Training stats as of Monday, March 27

Notes:

As someone who pays zero attention to spring training stats, I was floored when I saw the Orioles at the top of this list.
Needless to say, I was disappointed after rushing to see if Manny Machado topped the list of steal attempts. He, in fact, has zero, and the team’s attempts come entirely from current bench bats and minor-league depth. Boo. This kind of (negative) context is important. Still, it doesn’t preclude Baltimore’s regular hitters from running more during the season. (I mean, Machado reached base literally six times this spring. Woof. The opportunities weren’t there to begin with.)

With offseason additions Cameron Maybin, Ben Revere and Eric Young Jr. running wild (as we’ve come to expect them to), Mike Trout living up to his self-appointed goal of running more this season, and even C.J. Cron stealing more bases in 64 plate appearances than he did in six times as many last season, the Angels look like they’re going to run wild in 2017. If there’s any team to watch in this regard, it might be this one. Revere, with seven attempts in 53 PAs (which includes eight walks), looks poised for a bounceback, albeit in a part-time role; Maybin looks similarly poised to carry his career-best 2016 season into this year; and Cron might do his best Paul Goldschmidt impression for all we know. Everyone’s talking about the Mariners’ speedy outfield, but maybe it’ll be Los Angeles of Anaheim’s that ultimately wins our hearts. (But, speaking of the Mariners: Jarrod Dyson, Mitch Haniger and Jean Segura continue to run, run, run.)

The most profoundly disappointing team on this list: the Brewers. They won our hearts with their lukewarm bats and plus legs last year, but their pace has slowed markedly this spring (although it still rests comfortably above average). Keon Broxton fans should be excited to learn (or be reminded) that he is attempting stolen bases about one-third of the time (in the context of this study), and the Hernan Perez doubters will be disappointed to learn he, too, continues to hit dongs and swipe bags, with three apiece in a mere 40 PAs.
All that said, should we be concerned about Jonathan Villar? Three attempts in roughly eight opportunities, but zero successes to show for them. Personally, I’m not concerned, but I understand why someone might sour on the lack of success.

As mentioned, I would love to hear any feedback you might have. This is an exercise in predictiveness, but the conflation of personnel and intent, whether it’s speedy runners moving to new teams (or from old ones) or a team deliberately making more attempts, hugely impacts this study. The evidence suggests the two might beget each other, and it’s hard to tell which one is the chicken and which the egg. When you consider a team like the Angels and the cause of their stolen base surge, it’s a no sh*t moment. I mean, I’ll be the first to admit these aren’t the most revelatory results when subjected to intellectual duress. But knowing someone like Cron might keep running because his teammates are doing so? That’s not bad.

No matter what, the statistical significance of the model suggests there’s validity to mining for extra stolen bases using this method. Simply use your best judgment when investigating further.
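As one rough illustration of that kind of mining, the Python sketch below plugs a few rows of the indexed table into the fitted equation, Season Δatt% = 0.439*(Spring Δatt%) + 0.017. The helper name project is hypothetical, and reading the equation’s output as a projected regular-season index is an assumption rather than something the study reports. Extreme rows such as the Orioles’ presumably sit far outside the range where most of the data lie, so treat those projections with extra suspicion.

```python
# Coefficients as reported in the article's regression.
SLOPE = 0.439
INTERCEPT = 0.017

# (team, 2016 regular-season index, 2017 spring index), copied from the table.
ROWS = [
    ("Orioles", 28.4, 110.4),
    ("Angels", 90.4, 195.2),
    ("Brewers", 198.8, 123.3),
    ("Phillies", 130.4, 64.5),
]


def project(prev_index, spring_index):
    """Return (predicted season change, implied season index).

    ASSUMPTION: the predicted change is applied to last season's index
    to get a rough projected index for the coming regular season.
    """
    spring_delta = (spring_index - prev_index) / prev_index
    season_delta = SLOPE * spring_delta + INTERCEPT
    return season_delta, prev_index * (1.0 + season_delta)


for team, prev_index, spring_index in ROWS:
    season_delta, projected = project(prev_index, spring_index)
    print(f"{team:<10} {prev_index:6.1f} -> {projected:6.1f} ({season_delta:+.0%})")
```

Because the slope is well below 1, the model keeps less than half of any spring change, so even a team that doubles its indexed rate in March projects to a much smaller gain in the regular season.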