A Spring Training Stat That Matters (I Swear)

March 28, 2017

Edit (3/29/17, 7:55 pm EDT): Brent Hershey of BaseballHQ and Ron Shandler’s Baseball Forecaster (very politely) brought to my attention that this has been done before! By Bill Macey back in 2012. Formerly behind a paywall, it has now been made public for your reading pleasure. I didn’t even know this research existed (so I’m really glad Hershey brought it to my attention); I am always reluctant to claim to break ground in this field that progresses so quickly but also has such a rich history of research. Please consider the following research a companion to and external validation of Macey’s work.

* * *

I welcome all constructive criticism. This research is not especially rigorous, but given the nature of the claim — a legitimately significant spring training statistic! — it merits the disclaimer.

I found a statistically significant spring training statistic.

I’d rather not rehash the history of research and speculation regarding The Spring Training Stat(s) That Matter. Just know that, outside the modest results from this Dan Rosenheck piece in The Economist, it’s generally accepted that spring training statistics mean virtually nothing, and you’ll read all manner of baseball writers bashing the notion that they matter.

The big caveat is most of this research concerns individual players. Mine: team-level statistics. Alas, it’s an inherently different beast with which I’m dealing. Despite small within-year populations (30 teams rather than hundreds of players), the observation-level sample sizes are much larger (hundreds of plate appearances rather than dozens), making the odds of finding meaningful correlations much better despite fewer data points.

Per usual, I buried the lede: a team’s rate of stolen base attempts (calculated from stolen bases [SB] plus caught stealing [CS]) during spring training is actually meaningful. I’ll get to the implications of this later because there are many. First, let’s dig into the guts of the research. I gathered team-level spring training statistics from 2006 through 2016 and paired them with regular season statistics from the same span plus 2005.

A couple of quick correlations, using the Pearson correlation coefficient:

Current-year spring training SBs vs current-year regular season SBs: r = 0.41
Current-year spring training attempts (SB+CS) vs current-year regular season attempts: r = 0.48

Both of these results surprised me. A Pearson r of 0.41 is not particularly strong, but nearing 0.5 (close to a 0.25 r-squared) is an indication of a weak-bordering-on-moderate correlation. (Which isn’t promising, when I say it like that, but it is something.) Then again, this isn’t particularly surprising. You could probably do the same for many statistics and at least see some sort of statistically significant correlation without scaling for playing time because (1) every team plays 162 regular season games, usually; (2) every team plays 30-something spring games, making any kind of scaling not really necessary; and (3) spring skills in aggregate probably carry over relatively well into the season. In other words, I wouldn’t be surprised to know that teams in spring training at least slightly resemble their regular-season selves.
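For anyone who wants to reproduce this kind of check, here is a minimal sketch of the Pearson calculation in plain Python. The team totals below are made up for illustration and are not the data behind the correlations quoted above; the function name `pearson_r` is my own.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-deviations and sums of squared deviations
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical team-level attempt totals (SB+CS), illustration only.
spring_attempts = [22, 35, 18, 41, 27, 30]
season_attempts = [95, 140, 110, 150, 88, 131]

r = pearson_r(spring_attempts, season_attempts)
print(f"r = {r:.2f}, r-squared = {r * r:.2f}")
```

In practice you would feed it one pair of values per team-season, so roughly 30 teams times 11 years of observations.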

Ultimately, the whole purpose of this is, for the most part, identifying teams that choose to try to steal more bases; that’s why this is interesting to me in the first place. Stolen base attempts are (arguably) the most controllable aspect of the game, so upticks (or decreases) in attempt rates, if meaningful, could have important fantasy implications.

Teams attempt more stolen bases during spring training — I imagine this is a function of the somewhat lax nature of spring training combined with players trying to win jobs or test particular skills in a low-leverage context. I verified this by calculating attempt rate (att%) as attempts divided by opportunities, the latter of which is computed for simplicity as singles plus walks:

Stolen Base Attempt Rates (att%)

Year	Spring	Season
2006	10.86%	8.56%
2007	10.27%	8.51%
2008	10.80%	8.43%
2009	10.79%	9.05%
2010	10.70%	9.22%
2011	11.54%	10.49%
2012	12.19%	10.26%
2013	10.51%	8.66%
2014	10.62%	8.98%
2015	10.33%	8.48%
2016	10.35%	8.33%

att% = (SB+CS)/(1B+BB)
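
As a quick sketch, the att% definition above translates directly into a small helper. The function name and the example counts are hypothetical; only the formula itself comes from the text.

```python
def attempt_rate(sb, cs, singles, walks):
    """att% = (SB + CS) / (1B + BB), returned as a fraction (0.10 = 10%)."""
    opportunities = singles + walks
    if opportunities <= 0:
        raise ValueError("opportunities (1B + BB) must be positive")
    return (sb + cs) / opportunities


# Hypothetical spring line for one team, illustration only.
example = attempt_rate(sb=20, cs=8, singles=180, walks=90)
print(f"att% = {example:.2%}")
```

Singles plus walks is a deliberately simple proxy for steal opportunities, so the helper inherits that simplification.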

Knowing this, it’s important to index attempt rates, that is, to scale them around the average attempt rate, as we might with statistics such as ERA+, OPS+, and so on. Then I calculated percentage change the way you normally would, in the form of (Y_t – Y_t-1)/Y_t-1. This methodology makes the effort more predictive than descriptive, which is important for the sake of trying to predict fantasy performance (obviously).
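
Here is a rough sketch of both steps. The indexing function assumes an ERA+-style scale where 100 means league average, and that each rate is indexed against the league average for its own context (spring against spring, season against season); those are my assumptions, as are the function names. The percentage change is checked against the Orioles row of the table further down (28.4 to 110.4).

```python
def index_rate(team_rate, league_rate):
    """Scale a rate so that 100 means league average (assumed ERA+-style index)."""
    return 100.0 * team_rate / league_rate


def pct_change(previous, current):
    """Percentage change from previous to current, as a fraction of previous."""
    return (current - previous) / previous


# A team exactly at the 2016 regular-season league rate indexes to 100.
print(index_rate(0.0833, 0.0833))  # 100.0

# Orioles: 2016 regular season index -> 2017 spring training index
orioles = pct_change(28.4, 110.4)
print(f"{orioles:+.0%}")  # +289%
```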

The correlation I wanted to test involved calculating the following percentage changes:

Spring Δatt%: Y_t-1 season att% to Y_t spring att%
Season Δatt%: Y_t-1 season att% to Y_t season att%

The hypothesis is that an increase in indexed attempt rate from last year’s regular season to this year’s spring training will correlate with an increase in indexed attempt rate from last year’s regular season to this year’s regular season. In fewer words, do spring training attempt rate gains carry over into the regular season? Why, yes, they do.

Season Δatt% = 0.439*(Spring Δatt%) + 0.017
Adjusted r²: 0.34

(Evaluating this relationship using a simple measure of correlation, such as the Pearson correlation coefficient used in the list above, produces r = 0.58. This exceeds the correlation coefficients from that list and is equal to the square root of the unadjusted r² for the above equation.)
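
To make the fitted equation concrete, here is a small sketch that applies it. It assumes the changes are expressed as fractions (0.10 means +10%), which is how I read the 0.017 intercept; the function name is hypothetical. The last lines check that an r of 0.58 squares to roughly the 0.34 reported above.

```python
SLOPE = 0.439      # coefficient on Spring Δatt% from the equation above
INTERCEPT = 0.017  # constant term from the equation above


def predict_season_change(spring_change):
    """Predicted Season Δatt% for a given Spring Δatt% (both as fractions)."""
    return SLOPE * spring_change + INTERCEPT


# A team whose indexed attempt rate is up 10% in spring training
print(f"{predict_season_change(0.10):+.1%}")  # +6.1%

# Relationship between the simple correlation and r-squared
r = 0.58
print(round(r * r, 2))  # 0.34
```

Note the slope is well below 1, so the model expects less than half of a spring change to carry into the regular season.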

We might be spoiled by some of the remarkably strong correlations seen in equations such as xBABIP, xISO, and so on. Know, however, that a 0.34 r² is nothing to sneeze at.

However, these results could be interpreted in ways that don’t necessarily align with my hypothesis. I would hope that increased attempts during spring training would indicate a fundamental methodological shift by a particular team. It might be such that a team steals more bases in spring simply because it now has faster players, both at the major- and minor-league levels. Maybe it’s a combination of both. Maybe it’s one begetting the other — the promotion and/or acquisition of faster players inducing a methodological shift.

So, that’s it. Maybe it’s not actually that exciting. But in light of the bleak landscape of The Spring Training Stat(s) That Actually Matter, this felt at least like a minor breakthrough.

Of course, here are your biggest changes in indexed stolen base attempt rates from 2016 regular season into 2017 spring training, and how that reflects upon the upcoming season:

Indexed Stolen Base Attempt Rates (att%)

Team	2016 Regular Season	2017 Spring Training	% Change
Orioles	28.4	110.4	+289%
Cardinals	53.1	118.0	+122%
Angels	90.4	195.2	+116%
Mariners	69.3	118.6	+71%
Blue Jays	63.7	107.4	+69%
Rangers	117.3	160.5	+37%
White Sox	96.9	130.4	+35%
Rockies	86.1	108.4	+26%
Red Sox	81.6	95.3	+17%
Yankees	80.7	92.7	+15%
Dodgers	60.3	66.8	+11%
Braves	89.8	93.0	+4%
Cubs	78.1	80.3	+3%
Giants	89.7	89.1	-1%
Astros	125.4	123.5	-2%
Pirates	122.5	115.2	-6%
Athletics	65.9	61.7	-6%
Tigers	71.0	63.1	-11%
Royals	135.4	119.5	-12%
Nationals	134.0	112.8	-16%
Mets	52.3	43.9	-16%
Marlins	80.7	65.5	-19%
Padres	162.0	129.9	-20%
Twins	105.9	78.6	-26%
Rays	93.8	68.6	-27%
Indians	137.7	94.6	-31%
Brewers	198.8	123.3	-38%
Diamondbacks	143.4	81.8	-43%
Reds	165.8	93.3	-44%
Phillies	130.4	64.5	-50%

Indexed att% = [(team SB+CS) – (lg avg SB+CS)] / (lg avg SB+CS); table values are scaled so that 100 = league average
2017 Spring Training stats as of Monday, March 27

Notes:

As someone who pays zero attention to spring training stats, I was floored when I saw the Orioles at the top of this list. Needless to say, I was disappointed after rushing to see if Manny Machado topped the list of steal attempts. He, in fact, has zero, and their attempts come entirely from current bench bats and minor-league depth. Boo. This kind of (negative) context is important. Still, it doesn’t preclude Baltimore’s regular hitters from running more during the season. (I mean, Machado reached base literally six times this spring, woof, so the opportunities weren’t there to begin with.)
With offseason additions Cameron Maybin, Ben Revere and Eric Young Jr. running wild (as we’ve come to expect them to), Mike Trout living up to his self-appointed goal of running more this season, and even C.J. Cron stealing more bases in 64 plate appearances than he did in six times as many last season, the Angels look like they’re going to run wild in 2017. If there’s any team to watch in this regard, it might be this one. Revere, with seven attempts in 53 PAs (which includes eight walks), looks poised for a bounceback, albeit in a part-time role; Maybin looks similarly poised to carry his career-best 2016 season into this year; and Cron might do his best Paul Goldschmidt impression for all we know. Everyone’s talking about the Mariners’ speedy outfield, but maybe it’ll be Los Angeles of Anaheim’s that ultimately wins our hearts.
(But, speaking of the Mariners: Jarrod Dyson, Mitch Haniger and Jean Segura continue to run, run, run.)
The most profoundly disappointing team on this list: the Brewers. They won our hearts with their lukewarm bats and plus legs last year, but their pace has slowed markedly this spring (although it still rests comfortably above average). Keon Broxton fans should be excited to learn (or be reminded) that he is attempting stolen bases about one-third of the time (in the context of this study), and the Hernan Perez doubters will be disappointed to learn he, too, continues to hit dongs and swipe bags, with three apiece in a mere 40 PAs. All that said, should we be concerned about Jonathan Villar? Three attempts in roughly eight opportunities, but zero successes to show for them. Personally, I’m not concerned, but I understand why someone might sour on the lack of success.

As mentioned, I would love to hear any feedback you might have. This is an exercise in predictiveness, but the conflation of intent (whether it’s speedy runners moving to new teams or from old teams, or a team deliberately making more attempts) hugely impacts this study. The evidence suggests the two might beget each other, and it’s hard to tell which one is the chicken and which the egg. When you consider a team like the Angels and the cause of their stolen base surge, it’s a no sh*t moment; I mean, I’ll be the first to admit these aren’t the most revelatory results when subjected to intellectual duress. But knowing someone like Cron might keep running because his teammates are doing so? That’s not bad.

No matter what, the statistical significance of the model suggests there’s validity to mining for extra stolen bases using this method. Simply use your best judgment when investigating further.
