How Does Batting Order Affect Stolen Bases?

A couple weeks back I provided some (hopefully) useful tables demonstrating how R and RBI are affected by the team’s overall run scoring projection and where the hitter is positioned in the batting order.

At the time I assumed that a similar analysis for stolen bases would be unnecessary. I knew that runs and RBI are largely affected by team context and batting order, but had a feeling that stolen bases were simply a function of player skill. Maybe with a small hint of lineup effect.

Thankfully, our very own Birchwood Brothers questioned that assumption and asked if I had come across any similar research for stolen bases. So here I am to present those findings. Does a player’s position in the batting order affect stolen base frequency?

What I Found

In searching for past analysis, the only article I was able to uncover was this 2009 piece by Tristan Cockcroft, “How do Lineup Spots Affect Steal Attempts?” Ignoring the print-ready format of the article (I can’t locate a normal version), it was intended to provide exactly what the Birchwood Brothers were after. I’ll do my best to summarize Cockcroft’s findings…

He took data from the 2007 and 2008 seasons and located all hitters who amassed 100 plate appearances from more than one spot in the batting order AND who stole more than a total of 20 bases in the season. He looked closely at several individual player examples, but also presented “total” findings for those players as a whole. I thoroughly enjoy all of Tristan’s work, but this piece is odd to me. He seems to come to the conclusion that, if the player in question is a base-stealer, it doesn’t matter where they hit in the lineup.

But that’s not what I see. When I look at the tables of data presented, I see a very clear trend: players steal the most from the 1-, 2-, 8-, and 9-holes. And there’s a noticeable drop-off through the middle of the lineup.

Updating That Test

I could have simply rerun Tristan’s test with new data. But with stolen bases in decline, I decided to lower some of the thresholds. My approach differs slightly from his but follows the same basic idea.

I began by recreating the R and RBI per plate appearance tables, but with a slight twist. Plate appearances are not a great statistic for gauging stolen base opportunities. A player with a low OBP would receive far fewer opportunities to steal than one who gets on base at a high clip. Cockcroft realizes this very late in his article.

A better measure might be Baseball-Reference’s neat “Stolen Base Opportunities” stat that’s on each player page. This stat is the number of times a player is on first or second with a base open in front of them. That would be impossible for me to calculate (it could be done if you’re skilled with Retrosheet data), but I can approximate it by taking the number of singles, doubles, walks, and hit by pitches for each hitter. Cockcroft alludes to this same measure at the very end of his article. And coincidentally, this is very similar to the approach Mike Podhorzer, now of MLB Network fame, uses to project his own stolen base totals in Projecting X 2.0.
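To make that approximation concrete, here is a minimal Python sketch. The function names and the sample stat line are made up for illustration, and this is only my reading of the measure described above (stolen bases divided by singles + doubles + walks + hit by pitches), not the exact spreadsheet behind the tables.

```python
def times_on_base(singles, doubles, walks, hit_by_pitches):
    """Approximate stolen base opportunities as 1B + 2B + BB + HBP."""
    return singles + doubles + walks + hit_by_pitches


def sb_per_time_on_base(stolen_bases, singles, doubles, walks, hit_by_pitches):
    """Stolen bases divided by approximate times on base (0.0 if never on base)."""
    tob = times_on_base(singles, doubles, walks, hit_by_pitches)
    return stolen_bases / tob if tob else 0.0


# Made-up stat line: 20 SB, 100 singles, 25 doubles, 40 walks, 5 HBP.
rate = sb_per_time_on_base(20, 100, 25, 40, 5)
print(f"{rate:.1%}")  # 20 / 170, about 11.8%
```

Triples and home runs are left out on purpose, since neither leaves the runner on first or second with a base to steal.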

Here are the results of that test:

Stolen Bases per Time on Base
Runs Scored      550-599       600-649       650-699       700-749       750-799       800-849       850-899
League           AL     NL     AL     NL     AL     NL     AL     NL     AL     NL     AL     NL     AL     NL
Bat 1st       18.6%  21.9%  13.3%  18.0%  15.9%  15.1%  13.5%  16.5%  14.9%  14.4%  13.1%  20.3%  18.1%  18.4%
Bat 2nd       15.2%  12.6%  10.3%   9.0%   9.9%   9.3%   8.3%   9.3%   7.7%   8.1%   8.8%   9.1%   9.5%  12.6%
Bat 3rd       11.7%   5.7%   5.2%   7.3%   5.8%   6.1%   6.0%   6.7%   5.9%   6.6%   3.6%   6.4%   4.9%   6.9%
Bat 4th        1.9%   5.3%   3.2%   4.9%   3.8%   4.2%   3.9%   4.2%   4.0%   4.3%   2.5%   4.5%   3.2%   0.8%
Bat 5th        4.8%   5.4%   5.0%   5.4%   4.2%   5.5%   4.5%   5.2%   4.4%   4.4%   4.4%   5.5%   4.0%   3.2%
Bat 6th        5.9%   6.1%   5.5%   4.8%   6.3%   5.3%   5.3%   5.9%   5.8%   5.7%   5.3%   6.5%   4.7%   4.0%
Bat 7th        8.5%   6.9%   6.5%   5.1%   6.7%   5.2%   6.0%   5.4%   6.0%   5.7%   5.9%   5.1%   5.2%   3.3%
Bat 8th        8.1%   5.7%   7.0%   4.0%   7.3%   4.2%   7.1%   3.8%   6.3%   4.1%   6.8%   5.0%   6.2%   2.7%
Bat 9th        7.2%   5.2%   9.3%   5.5%  10.7%   4.4%  10.1%   3.7%   7.9%   4.4%   8.8%   4.5%   9.5%   2.5%
SOURCE: Calculated from Baseball-Reference.com Data

You can probably identify the stark drop off after the first and second hitters, but it should help to see that same data plotted out:
[Chart: SB_PER_TOB_600. Stolen bases per time on base by lineup spot, league, and team runs scored]

I’m having a difficult time fitting that chart onto the real estate we have for blog posts. But click on the image to see a larger version.

I have three observations from seeing the data in chart form:

  1. There’s a significant drop off in stolen base frequency as soon as you get past the second spot in the lineup. Things then flatten out pretty evenly until the eight and nine spots.
  2. Team runs scored don’t have a noticeable effect on the frequency. So I can drop team runs scored as a variable for the rest of this study.
  3. The AL and NL don’t seem to differ much (in the chart, AL is mostly blues and greens; NL is reds, oranges, and browns). It does look like leadoff hitters steal more frequently in the NL. Things are consistent until you get to the eight and nine hitters, where the AL steals more frequently. This makes sense anecdotally because AL teams are likely to put speedy, poor-hitting types lower in the order. NL teams do this too, but the presence of the pitcher puts the brakes on guys stealing much down there.

If you’re curious about what the chart looks like with the runs scored strata dropped off:

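[Chart: SB_PER_TOB_600_NO_STRATA. Stolen bases per time on base by lineup spot and league, with the runs scored strata dropped]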

The Weakness in this Test

The glaring weakness here is that player skill level (or speed) is completely ignored. Sure, three-hitters steal less than leadoff hitters. But what if the three-hitter is Mike Trout or Andrew McCutchen? That three-hitter will steal more frequently than Miguel Cabrera. So maybe we just need to look at “fast three-hitters”?

But really, that’s not even good enough. The real questions we want answers to are ones like these:

  • If a fast player is moved from batting second to batting sixth, should we expect fewer steals?
  • Or, if a player with average speed moved from batting seventh to batting leadoff, should we expect more steals?

As Cockcroft pointed out, the only way to determine this is to look specifically at players that hit in more than one spot in the lineup to see how their stolen base frequency was affected by the change. If we look at this within one single season, we can presume the player’s speed didn’t change and the only variable at play is the change in the lineup.

To perform the test, I downloaded the split data for all players “batting first” that had at least 50 plate appearances in 2015 (thank you Fangraphs Leaderboards!). I then repeated that same download for all the other batting positions. After getting all the split data, I identified the players that met the criteria in multiple lineup spots. For example, Billy Hamilton batted leadoff for 208 plate appearances and ninth for 226 more.
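The filtering step described above can be sketched in a few lines of Python. The row layout (player, lineup spot, plate appearances) and the function name are assumptions for illustration, not the actual Fangraphs export format. Only the two Billy Hamilton rows come from the figures quoted above; the other rows are made up.

```python
from collections import defaultdict

MIN_PA = 50  # plate appearance threshold used in the article


def multi_spot_players(split_rows, min_pa=MIN_PA):
    """Return {player: [lineup spots]} for players with at least
    min_pa plate appearances in two or more lineup spots."""
    spots = defaultdict(list)
    for player, spot, pa in split_rows:
        if pa >= min_pa:
            spots[player].append(spot)
    return {p: sorted(s) for p, s in spots.items() if len(s) > 1}


rows = [
    ("Billy Hamilton", 1, 208),  # from the article
    ("Billy Hamilton", 9, 226),  # from the article
    ("Player A", 3, 400),        # made up
    ("Player A", 4, 30),         # made up, below the threshold
    ("Player B", 5, 60),         # made up
    ("Player B", 6, 75),         # made up
    ("Player B", 7, 49),         # made up, just below the threshold
]

qualified = multi_spot_players(rows)
print(qualified)
```

Here Player A drops out because only one of his spots clears 50 plate appearances, while Player B qualifies in the fifth and sixth spots but not the seventh.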

In all, I identified 308 players that had at least 50 plate appearances in more than one lineup spot. Those 308 players account for a total of 861 player and lineup spot combinations. Wow. That sounds like a lot…

Batting Order Count >50 PAs
Lineup Spot Count (Total 861)
Bat 1st 68
Bat 2nd 104
Bat 3rd 67
Bat 4th 96
Bat 5th 126
Bat 6th 132
Bat 7th 130
Bat 8th 92
Bat 9th 46
SOURCE: Fangraphs Leaderboard

Leadoff, three-, and cleanup hitters seem to be more firmly cemented in the lineup than other spots. Pitchers probably screw up the 9-hole.

So now that we’ve isolated this population of players that changed spots in the order, here’s what their stolen base frequency looked like:

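[Chart: SB_PER_TOB_600_CHANGE. Stolen bases per time on base by lineup spot and league, for players who hit in more than one lineup spot]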

In general, I think that chart looks like the chart for all hitters. Billy Hamilton is causing the distortion in the batting ninth data point in the NL. It seems safe to conclude that the typical hitter moving out of the first or second spot in the order is looking at a decline in stolen bases.

Let’s Keep Digging

That last chart above is “all hitters that moved lineup spots”. What would the chart look like if we looked at players with at least 10 stolen bases FOR THE SEASON (not just in their 50 plate appearance stint, for the whole season)? This sets the population to 36 NL hitters and 33 AL hitters (stole 10 bases and also had at least 50 PAs in two lineup spots).
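As a rough sketch of that split, the Python below groups each lineup-spot stint by the player’s season stolen base total and then pools stolen bases and times on base within each group. The tuple layout, names, and numbers are all made up, and pooling (rather than averaging each player’s rate) is my assumption about how such a table could be built.

```python
from collections import defaultdict

SB_THRESHOLD = 10  # season-long stolen bases, per the article


def pooled_rates(stints, season_sb, threshold=SB_THRESHOLD):
    """Pool SB and times on base by (speed group, league, lineup spot)
    and return SB per time on base for each group."""
    totals = defaultdict(lambda: [0, 0])  # key -> [sb, times_on_base]
    for player, league, spot, sb, tob in stints:
        group = "10+" if season_sb[player] >= threshold else "<10"
        totals[(group, league, spot)][0] += sb
        totals[(group, league, spot)][1] += tob
    return {key: sb / tob for key, (sb, tob) in totals.items() if tob > 0}


# Made-up stints: (player, league, lineup spot, SB, times on base)
stints = [
    ("Fast A", "NL", 1, 12, 60),
    ("Fast A", "NL", 9, 10, 40),
    ("Fast B", "NL", 1, 6, 40),
    ("Slow C", "NL", 1, 1, 50),
]
season_sb = {"Fast A": 30, "Fast B": 12, "Slow C": 3}  # made-up season totals

rates = pooled_rates(stints, season_sb)
for key in sorted(rates):
    print(key, f"{rates[key]:.1%}")
```

Note that the grouping uses the season total, so a player stays in the same speed group in every lineup spot he occupied.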

Here’s that data:

Total Stolen Bases   Stole at Least 10  Stole Less than 10
League                    AL        NL        AL        NL
Bat 1st                15.2%     20.8%      6.6%      5.4%
Bat 2nd                14.1%     13.7%      4.2%      3.3%
Bat 3rd                10.0%     11.2%      1.7%      2.7%
Bat 4th                 3.9%     10.4%      2.8%      3.0%
Bat 5th                12.3%     12.7%      2.1%      3.9%
Bat 6th                15.5%     10.2%      3.1%      3.4%
Bat 7th                19.1%     10.1%      3.9%      3.4%
Bat 8th                14.5%      9.3%      4.6%      3.4%
Bat 9th                16.4%     28.9%      4.2%      7.2%
SOURCE: Fangraphs Leaderboard

And charted out:
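[Chart: SB_PER_TOB_600_10SB. Stolen bases per time on base by lineup spot and league, split by whether the player stole at least 10 bases]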

Some interesting things going on here. If you are not a speedster (stole fewer than 10 bases), lineup may not matter much, because you didn’t steal 10 bases… But you’re still better off hitting at the very top or very bottom of the order.

For players with some speed (stole at least 10 bases), being moved into the three or four spot will be a significant drain on their stolen base frequency, but they’ll still steal more frequently than players without speed.

The NL seems to follow the same pattern as we saw in earlier charts. Anywhere outside of first, second, or ninth is the low ground.

The AL is odd. Perhaps this is coincidence or small sample size issues at play, but it seems to matter very little where you hit, as long as it’s not the three- or four-spot. Or maybe it’s not coincidence and it’s the effect of the pitcher in the NL, especially since some teams bat the pitcher 8th now.

Final Conclusions

It’s hard to wade through all of this data. Some of it’s conflicting. There are many variables at play. But where a player hits in the lineup does seem to affect stolen base frequency. You never want to see that a base stealer is going to bat third or fourth. Even players that don’t steal frequently will steal more at the very top or bottom of the order. Outside of the three and four-hole, it may not matter significantly where a base stealer in the AL hits. Batting first, second, or being Billy Hamilton appears to be optimal for stolen base frequency in the NL.

I think there are still two factors to consider that I haven’t even broached: manager preferences and who else is hitting near the player in the lineup. Batting behind someone old and slow like Victor Martinez is going to eliminate many opportunities to steal. Or playing for an aggressive manager may negate some of the trends we spotted above.

One last thought that’s highly applicable to the 2016 season… I’ve heard or read several times that Billy Hamilton’s value may be severely in jeopardy because the Reds won’t bat him leadoff and that means he’ll hit eighth, in front of the pitcher. That doesn’t seem like an option. He only batted first or ninth in 2015. And batting ninth didn’t slow him down.

I could have kept on writing, making colorful charts, and drilling deeper into the data. But I think it’s most useful at this level where I’ve left off. With that said, do you have any specific questions about what you see or what else might be in the data? Fire away in the comments below.





Tanner writes for Fangraphs as well as his own site, Smart Fantasy Baseball. He's the co-author of The Process with Jeff Zimmerman, has written two e-books, Using SGP to Rank and Value Fantasy Baseball Players and How to Rank and Value Players for Points Leagues, and worked with Mike Podhorzer to develop a spreadsheet to accompany Projecting X 2.0. Much of his writing focuses on instructional "how to" topics, Excel, and strategy. Follow him on Twitter @smartfantasybb.

17 Comments
justscha
8 years ago

Good stuff Tanner, you’re becoming one of my favorite writers.

Really enjoyed your piece on developing a cheat sheet and the snake draft prep. I’m not an Excel guy, but I was able to follow the cheat sheet process. For the snake draft, though, how do I use VLOOKUP? Do you have a step-by-step process?

Thanks and keep up the good work.

Ayrahvon
8 years ago
Reply to  Tanner Bell

After this, I recommend people look up INDEX/MATCH. It effectively does the same thing as VLOOKUP, but it runs faster and you can index any item left or right of your lookup, whereas VLOOKUP can only find items to the right. =INDEX(‘what you want to find’, MATCH(‘what you are matching in your table’, ‘what you are matching in the index table’, 0))