# Fitting Running Speed into xOBA and xBABIP.

To date, my various xStats have made no attempt to account for batter speed, and the omission has been one of their most glaring weaknesses. That changes today. As of this morning, I have implemented a method for estimating batter speed. This is my first real crack at the problem and most assuredly a work in progress, but it seems to be producing better results. Allow me to explain.

How I am estimating speed. Since I don’t have access to clocked running times, and my database doesn’t even have base runner data such as stolen bases, I have to be creative in how I estimate speed. Earlier this week I had a eureka moment, if you could call it that, regarding infield ground balls. I had noticed that a lot of players with wildly differing BABIP and xBABIP scores also tend to hit more ground balls and have above average foot speed, so I decided to use this observation to my advantage. It isn’t perfect: slow batters do get infield hits from time to time, but those hits are uncommon and often reliant on misplays, luck, or both. Fast runners, though, do seem to be able to reach base more consistently on infield hits.

I am defining an infield hit as one in which the ball travels no farther than 90 feet. Getting some of the math out of the way, I am creating a simple ratio: the actual number of infield singles divided by the expected number of infield singles based upon launch angle and velocity. I am calculating this ratio both for each player and for the whole league. I then divide the player’s ratio by that of the league to generate the player’s ‘speed’. As you might expect, faster runners like Ben Revere and Billy Burns sit atop this leaderboard, while slower runners like Jarrod Saltalamacchia and David Ross are towards the bottom.
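The ratio described above can be sketched in a few lines of Python. This is only an illustration, not the actual xStats code: the function names, argument names, and sample counts are all hypothetical, and the expected-singles totals are assumed to come from a separate model based on launch angle and velocity.

```python
INFIELD_MAX_DISTANCE_FT = 90  # cutoff from the text: no further than 90 feet


def is_infield_ball(distance_ft):
    """True when a batted ball falls within the 90-foot infield cutoff."""
    return distance_ft <= INFIELD_MAX_DISTANCE_FT


def hit_ratio(actual_hits, expected_hits):
    """Actual hits divided by expected hits."""
    if expected_hits <= 0:
        raise ValueError("expected_hits must be positive")
    return actual_hits / expected_hits


def speed_score(player_actual, player_expected, league_actual, league_expected):
    """Player's actual/expected ratio divided by the league's ratio."""
    player_ratio = hit_ratio(player_actual, player_expected)
    league_ratio = hit_ratio(league_actual, league_expected)
    return player_ratio / league_ratio


# Made-up counts, for illustration only (not real player or league data).
example = speed_score(player_actual=12, player_expected=8.0,
                      league_actual=900, league_expected=1000.0)
print(round(example, 3))
```

In this sketch, a score above 1 means the batter is collecting infield singles at a better rate, relative to expectation, than the league as a whole.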

Next, I wanted to do something similar for doubles and triples. I didn’t feel the need to work in any specific batted ball distance specifications for these, but the calculation is otherwise the same: the ratio of actual hits to expected hits, divided by the league-wide ratio, generates speed values for doubles and triples. Again, the guys you’d probably expect are near the top of these speed categories: Kevin Pillar, Jose Altuve, Jarrod Dyson, and Kevin Kiermaier. However, a few guys you wouldn’t expect are there as well, such as Daniel Murphy and A.J. Pierzynski, so perhaps this method could use some refinement. Below you will find a table including all of the 1B, 2B, and 3B speed numbers for batters with 100 or more plate appearances (as of the morning of June 8th). You will also see their total speed score, which is the sum of the three component scores.

Sorting the players by their total speed scores appears to do a good job sorting the players by their speed. The top few players are Billy Burns, Ben Revere, Jarrod Dyson, Alcides Escobar, and Paulo Orlando. The bottom few players are Dae-Ho Lee, Curt Casali, Chris Davis, David Ross, and Chris Carter. None of these names stick out as being out of place, so it is at least passing the eye test.
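As a rough sketch of how the total speed score and the resulting ranking fit together, assuming the three component scores are already computed: the player labels and values below are placeholders, not numbers from the spreadsheet.

```python
def total_speed(components):
    """Total speed score: the sum of the 1B, 2B, and 3B component scores."""
    return components["1B"] + components["2B"] + components["3B"]


# Placeholder component scores (hypothetical, not real player data).
players = {
    "Player A": {"1B": 1.8, "2B": 1.2, "3B": 2.0},
    "Player B": {"1B": 0.4, "2B": 0.9, "3B": 0.0},
    "Player C": {"1B": 1.0, "2B": 1.0, "3B": 1.0},
}

# Rank from fastest to slowest by total speed score.
ranked = sorted(players, key=lambda name: total_speed(players[name]), reverse=True)
print(ranked)
```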

Integrating Speed into xBABIP. I took the difference between BABIP and xBABIP, ran a regression using the three speed stats I have generated, and ended up with the following coefficients:

Speed Coefficients
| Single Coef | Double Coef | Triple Coef |
| --- | --- | --- |
| 0.0177 | 0.0351 | 0.0061 |

None of these have particularly great p-values (they range from .1 to .01), but they do appear to improve the accuracy of the overall model, so they are a good starting point. I am taking these coefficients and multiplying them by their respective speed values to create an adjustment for each player. These adjustments are then added to the probabilities for each given batted ball. For example, Alcides Escobar has changes of 2.2%, 0.3%, and 2.4% in singles, doubles, and triples respectively, while David Ortiz has changes of -0.8%, 2.1%, and -0.4%.
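The adjustment step might look something like the sketch below. The coefficients are the ones listed above; everything else is an assumption made for illustration, including the function names, the sample speed values, and the sample batted-ball probabilities. The article does not say how the speed values are scaled before multiplying, or how the other outcomes are handled once the hit probabilities change, so this sketch simply adds the adjustments and leaves everything else untouched.

```python
# Coefficients from the Speed Coefficients table in the article.
SPEED_COEFS = {"1B": 0.0177, "2B": 0.0351, "3B": 0.0061}


def speed_adjustments(speed_values):
    """Multiply each coefficient by the player's matching speed value."""
    return {hit: SPEED_COEFS[hit] * speed_values[hit] for hit in SPEED_COEFS}


def adjust_batted_ball(probs, adjustments):
    """Add the player's adjustments to one batted ball's hit probabilities.

    Assumption: other outcomes are left as-is and nothing is renormalized,
    because the article does not describe that step.
    """
    adjusted = dict(probs)
    for hit, delta in adjustments.items():
        adjusted[hit] = adjusted.get(hit, 0.0) + delta
    return adjusted


# Hypothetical speed values and batted-ball probabilities (not real data).
adjustments = speed_adjustments({"1B": 1.0, "2B": 0.5, "3B": -2.0})
ball = adjust_batted_ball({"1B": 0.20, "2B": 0.05, "3B": 0.02, "HR": 0.0},
                          adjustments)
print(adjustments)
print(ball)
```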

The following table is a small part of the second sheet, “Differences”, of the embedded spreadsheet above. You should recognize wOBA, and xOBA. However, you will also see σOBA, which represents the xOBA stats calculated including these new speed variables. You will also see ΔOBA, which is equal to σOBA – xOBA. In this particular table, you see the top eight gainers and losers as a result of using these speed stats.

Sixteen Batters Most Affected by Including an Estimate for Speed in xOBA
| Name | PA | wOBA | σOBA | xOBA | ΔOBA |
| --- | --- | --- | --- | --- | --- |
| Aledmys Diaz | 211 | .373 | .410 | .340 | .070 |
| Daniel Murphy | 231 | .435 | .450 | .389 | .061 |
| Billy Burns | 218 | .263 | .284 | .236 | .048 |
| J. T. Realmuto | 199 | .315 | .349 | .303 | .046 |
| Kevin Pillar | 237 | .274 | .328 | .288 | .040 |
| Alcides Escobar | 266 | .251 | .296 | .257 | .039 |
| Adeiny Hechavarria | 200 | .261 | .316 | .278 | .038 |
| Nori Aoki | 239 | .285 | .309 | .273 | .036 |
| Justin Smoak | 173 | .338 | .357 | .380 | -.023 |
| Nomar Mazara | 215 | .368 | .328 | .351 | -.023 |
| Joey Votto | 240 | .334 | .342 | .365 | -.023 |
| Chris Carter | 233 | .337 | .362 | .387 | -.025 |
| Brian McCann | 175 | .319 | .300 | .325 | -.025 |
| Carlos Gonzalez | 240 | .371 | .316 | .342 | -.026 |
| Mark Reynolds | 196 | .342 | .283 | .310 | -.027 |
| Adam Duvall | 195 | .385 | .340 | .371 | -.031 |

As for the players who are gaining, you see a few speedsters, guys like Billy Burns, Kevin Pillar, and Alcides Escobar. You also see Daniel Murphy, who may seem a bit like an outlier here, but he does possess a very good ability to slap hit when behind in the count. He has a knack for hitting the ball right over the third base bag, or through the 5.5 hole, especially when he is shifted against. That, in addition to his high-contact, line-drive approach, leads to quite a few more doubles than you might expect given the rest of his skill set.

Jumping on down to the losers, we have a bunch of pretty slow base runners, just as you’d expect. Votto, Carter, McCann, and Reynolds are all quite slow. The one potential surprise here might be Carlos Gonzalez, but he has certainly been losing a lot of speed over the years as the result of his laundry list of knee and leg injuries, including his patellar tendon reconstruction in 2014, which sounds like an injury I wouldn’t wish upon an enemy.

Alrighty, so there you have it, my first attempt at bringing batter speed into the equation for my xStats. Currently all stats on my main Google doc are calculated using this method with speed. You can see xAVG, xSLG, xOBP, xBABIP, and xOBA for every player from both 2015 and 2016, batters and pitchers. That main doc has undergone a bit of an overhaul this week, changing in a large number of ways. I have added a bunch of stats, removed a few, and made this big change regarding speed as well. If you haven’t seen the doc in a while, take a look; you might not recognize it. The stats are updated every morning, and you can always find it here.

Andrew Perpetua is the creator of CitiFieldHR.com and xStats.org, and plays around with Statcast data for fun. Follow him on Twitter @AndrewPerpetua.

JosephGanann

I really like what you did here. Very creative. I’ve been writing analysis pieces on your model over at http://www.southpawseam.com.

I wrote Part 3 this morning, where I analyzed batters who have BABIPs that outpace their xBABIPs, which is basically a list full of speedsters. And this came out like two hours later, so great timing!

My suggestion to you would be this: you created a measure of xBABIP as a way to measure the pure hitting statistics of each and every player. I think you should keep it that way. In both my articles where there were large discrepancies between BABIP and xBABIP in either direction, we could point to reasons why. For those with higher BABIPs, they generally are speedy guys and it was good to see that difference. Same thing for those with higher xBABIPs; lots of slow players or shifted left-handers on the list.

xBABIP to me was a hitting statistic that is supposed to tear out the nuances of the game like speed, shift data, etc. Stuff that could explain why a person’s BABIP doesn’t match their xBABIP.

Maybe create a new statistic called zBABIP where you are factoring in both speed and shift data to normalize. Thus we have something to use to say “Hey, based on this player’s speed and the fact they are shifted on to lower their BABIP, this is what their batting average should be.”

Thus you get to the point of asking: what’s the difference between zBABIP and BABIP? Luck and fortune, which I assume is one of the end goals of this measurement.