Fitting Running Speed into xOBA and xBABIP.
To date, my various xStats have made no attempt to account for batter speed, and that omission has been one of their most glaring weaknesses. That changes today. As of this morning, I have implemented a method for estimating batter speed. This is my first real crack at the problem and it is most assuredly a work in progress, but it seems to be producing better results. Allow me to explain.
How I am estimating speed. Since I don’t have access to clocked running times, and my database doesn’t even have base running data such as stolen bases, I have to be creative in how I estimate speed. Earlier this week I had a Eureka moment, if you could call it that, regarding infield ground balls. I noticed that a lot of players whose BABIP and xBABIP differ wildly also tend to hit more ground balls and have above average foot speed. So I have decided to use this observation in my favor. It isn’t perfect: slow batters do get infield hits from time to time, but those hits are uncommon and often reliant on misplays, luck, or both. Fast runners, though, do seem to be able to reach base more consistently on infield hits.
I am defining an infield hit as one in which the ball travels no farther than 90 feet. To get some of the math out of the way, I am creating a simple ratio: the actual number of infield singles divided by the expected number of infield singles based upon launch angle and velocity. I am calculating this ratio both for each player and for the whole league. I then divide the player’s ratio by that of the league to generate the player’s ‘speed’. As you might expect, faster runners like Ben Revere and Billy Burns sit atop this leaderboard, while slower runners like Jarrod Saltalamacchia and David Ross are towards the bottom.
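The infield-single ratio described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the `BattedBall` record, its field names, and the `p_single` value (standing in for the model's single probability from launch angle and velocity) are hypothetical, and the sample batted balls are made up.

```python
from dataclasses import dataclass

INFIELD_MAX_FEET = 90.0  # infield cutoff stated in the article


@dataclass
class BattedBall:
    distance_ft: float  # how far the ball traveled
    was_single: bool    # did it actually go for a single?
    p_single: float     # hypothetical: expected single probability for this ball


def infield_single_ratio(balls):
    """Actual infield singles divided by expected infield singles."""
    infield = [b for b in balls if b.distance_ft <= INFIELD_MAX_FEET]
    actual = sum(1 for b in infield if b.was_single)
    expected = sum(b.p_single for b in infield)
    if expected == 0:
        raise ValueError("no expected infield singles; ratio is undefined")
    return actual / expected


def single_speed(player_balls, league_balls):
    """Player ratio divided by the league ratio."""
    return infield_single_ratio(player_balls) / infield_single_ratio(league_balls)


# Made-up batted balls, purely for illustration.
league_balls = [
    BattedBall(40.0, True, 0.25),
    BattedBall(60.0, False, 0.25),
    BattedBall(80.0, False, 0.25),
    BattedBall(250.0, True, 0.25),  # beyond 90 feet, so it is ignored
]
player_balls = [
    BattedBall(35.0, True, 0.5),
    BattedBall(70.0, True, 0.5),
]

print(round(single_speed(player_balls, league_balls), 2))  # 1.5
```

Under this reading, a value above 1 means the player turns infield contact into singles more often than the league does, and a value below 1 means less often.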
Next, I wanted to do something similar for doubles and triples. I didn’t feel the need to work in any specific batted ball distance specifications for these, but the calculation is otherwise the same: the ratio of actual hits to expected hits, divided by the league-wide ratio, generates speed values for doubles and triples. Again, guys you’d probably expect are near the top of these speed categories: Kevin Pillar, Jose Altuve, Jarrod Dyson, and Kevin Kiermaier. However, a few guys you wouldn’t expect are there as well, such as Daniel Murphy and A.J. Pierzynski, so perhaps this method could use some refinement. Below you will find a table including all of the 1B, 2B, and 3B speed numbers for batters with 100 or more plate appearances (as of the morning of June 8th). You will also see their total speed score, which is the sum of the three component scores.
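Taking that description at face value, the three component scores and their sum might be computed along these lines. The dictionary layout, the function names, and all of the totals are assumptions made for illustration; none of them come from the actual spreadsheet.

```python
HIT_TYPES = ("1B", "2B", "3B")


def component_speeds(player, league):
    """Per-hit-type speed: (player actual/expected) / (league actual/expected).

    `player` and `league` map each hit type to an (actual, expected) pair.
    """
    speeds = {}
    for hit_type in HIT_TYPES:
        p_actual, p_expected = player[hit_type]
        l_actual, l_expected = league[hit_type]
        speeds[hit_type] = (p_actual / p_expected) / (l_actual / l_expected)
    return speeds


def total_speed(speeds):
    """Total speed score: the sum of the three component scores."""
    return sum(speeds[hit_type] for hit_type in HIT_TYPES)


# Made-up totals, purely for illustration.
league = {"1B": (100.0, 100.0), "2B": (100.0, 100.0), "3B": (100.0, 100.0)}
player = {"1B": (6.0, 4.0), "2B": (10.0, 8.0), "3B": (2.0, 1.0)}

speeds = component_speeds(player, league)
print(speeds)               # {'1B': 1.5, '2B': 1.25, '3B': 2.0}
print(total_speed(speeds))  # 4.75
```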
Sorting the players by total speed score appears to do a good job of ordering them by actual speed. The top few players are Billy Burns, Ben Revere, Jarrod Dyson, Alcides Escobar, and Paulo Orlando. The bottom few players are Dae-Ho Lee, Curt Casali, Chris Davis, David Ross, and Chris Carter. None of these names sticks out as being out of place, so it is at least passing the eye test.
Integrating Speed into xBABIP. I took the difference between BABIP and xBABIP, ran a regression on it using the three speed stats I have generated, and ended up with the following coefficients:
Single Coef | Double Coef | Triple Coef |
0.0177 | 0.0351 | 0.0061 |
None of these have particularly great p-values (they range from .1 to .01), but they do appear to improve the accuracy of the overall model, so they are a good starting point. I am taking these coefficients and multiplying them by their respective speed values to create an adjustment for each player. These adjustments are then added to the probabilities for each given batted ball. For example, Alcides Escobar has a 2.2%, 0.3%, and 2.4% increase in singles, doubles, and triples respectively, while David Ortiz has a -0.8%, 2.1%, and -0.4% change in singles, doubles, and triples.
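As a rough sketch of that adjustment step: the coefficients below are the ones from the table, while the speed inputs and batted-ball probabilities are placeholders, since the article does not list individual players' speed values or how they are scaled before being multiplied. Clamping the result to the range 0 to 1 and leaving all other outcomes untouched are also assumptions of this sketch, not something the article states.

```python
# Regression coefficients from the table above.
COEFFICIENTS = {"1B": 0.0177, "2B": 0.0351, "3B": 0.0061}


def speed_adjustments(speeds):
    """Multiply each coefficient by the player's matching speed value."""
    return {hit_type: COEFFICIENTS[hit_type] * speeds[hit_type]
            for hit_type in COEFFICIENTS}


def adjust_probabilities(probs, adjustments):
    """Add the player's adjustments to one batted ball's hit probabilities.

    Clamping to [0, 1] is an assumption of this sketch.
    """
    adjusted = dict(probs)
    for hit_type, delta in adjustments.items():
        adjusted[hit_type] = min(1.0, max(0.0, probs[hit_type] + delta))
    return adjusted


# Placeholder inputs, purely for illustration.
speeds = {"1B": 1.0, "2B": -0.5, "3B": 2.0}
batted_ball = {"1B": 0.20, "2B": 0.05, "3B": 0.01}

adjustments = speed_adjustments(speeds)
adjusted = adjust_probabilities(batted_ball, adjustments)
for hit_type in COEFFICIENTS:
    print(hit_type, f"{adjustments[hit_type]:+.4f}", f"{adjusted[hit_type]:.4f}")
```

The same per-player adjustment is applied to every batted ball, so a player's expected totals shift by roughly the adjustment times the number of balls in play.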
The following table is a small part of the second sheet, “Differences”, of the embedded spreadsheet above. You should recognize wOBA and xOBA. However, you will also see σOBA, which represents the xOBA stats calculated including these new speed variables. You will also see ΔOBA, which is equal to σOBA – xOBA. This particular table shows the eight biggest gainers and the eight biggest losers as a result of using these speed stats.
name | PA | wOBA | σOBA | xOBA | ΔOBA |
Aledmys Diaz | 211 | .373 | .410 | .340 | .070 |
Daniel Murphy | 231 | .435 | .450 | .389 | .061 |
Billy Burns | 218 | .263 | .284 | .236 | .048 |
J. T. Realmuto | 199 | .315 | .349 | .303 | .046 |
Kevin Pillar | 237 | .274 | .328 | .288 | .040 |
Alcides Escobar | 266 | .251 | .296 | .257 | .039 |
Adeiny Hechavarria | 200 | .261 | .316 | .278 | .038 |
Nori Aoki | 239 | .285 | .309 | .273 | .036 |
Justin Smoak | 173 | .338 | .357 | .380 | -.023 |
Nomar Mazara | 215 | .368 | .328 | .351 | -.023 |
Joey Votto | 240 | .334 | .342 | .365 | -.023 |
Chris Carter | 233 | .337 | .362 | .387 | -.025 |
Brian McCann | 175 | .319 | .300 | .325 | -.025 |
Carlos Gonzalez | 240 | .371 | .316 | .342 | -.026 |
Mark Reynolds | 196 | .342 | .283 | .310 | -.027 |
Adam Duvall | 195 | .385 | .340 | .371 | -.031 |
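For anyone who wants to reproduce the ΔOBA column, a minimal sketch using four rows copied from the table above follows. The `ROWS` layout and the `delta_oba` helper are just illustrative names.

```python
# (name, sigma_oba, x_oba) copied from the table above.
ROWS = [
    ("Aledmys Diaz", 0.410, 0.340),
    ("Billy Burns", 0.284, 0.236),
    ("Joey Votto", 0.342, 0.365),
    ("Adam Duvall", 0.340, 0.371),
]


def delta_oba(sigma_oba, x_oba):
    """Change in expected wOBA from adding speed: sigma_oba minus x_oba."""
    return round(sigma_oba - x_oba, 3)


deltas = {name: delta_oba(sigma, x) for name, sigma, x in ROWS}

# Biggest gainers first, biggest losers last.
ranked = sorted(deltas, key=deltas.get, reverse=True)
for name in ranked:
    print(f"{name}: {deltas[name]:+.3f}")
```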
As for the players who are gaining, you see a few speedsters, guys like Billy Burns, Kevin Pillar, and Alcides Escobar. You also see Daniel Murphy, who may seem a bit like an outlier here, but he does possess a very good ability to slap hit when behind in the count. He has a knack for hitting the ball right over the third base bag, or through the 5.5 hole, especially when he is shifted against. That, in addition to his high-contact, line-drive approach, leads to quite a few more doubles than you might expect given the rest of his skill set.
Moving down to the losers, we have a bunch of pretty slow base runners, just as you’d expect. Votto, Carter, McCann, and Reynolds are all quite slow. The one potential surprise here might be Carlos Gonzalez, but he has certainly been losing a lot of speed over the years as the result of his laundry list of knee and leg injuries, including his patellar tendon reconstruction in 2014, which sounds like an injury I wouldn’t wish upon an enemy.
Alrighty, so there you have it: my first attempt at bringing batter speed into the equation for my xStats. Currently all stats on my main Google doc are calculated using this method with speed. You can see xAVG, xSLG, xOBP, xBABIP, and xOBA for every player from both 2015 and 2016, batters and pitchers. That main doc has undergone a bit of an overhaul this week, changing in a large number of ways. I have added a bunch of stats, removed a few, and made this big change regarding speed as well. If you haven’t seen the doc in a while, take a look; you might not recognize it. The stats are updated every morning, and you can always find it here.
Andrew Perpetua is the creator of CitiFieldHR.com and xStats.org, and plays around with Statcast data for fun. Follow him on Twitter @AndrewPerpetua.
I really like what you did here. Very creative. I’ve been writing analysis pieces on your model over at http://www.southpawseam.com.
I wrote Part 3 this morning, where I analyzed batters whose BABIPs outpace their xBABIPs, which is basically a list full of speedsters. And this came out like two hours later, so great timing!
My suggestion to you would be this: you created a measure of xBABIP as a way to measure the pure hitting statistics of each and every player. I think you should keep it that way. In both my articles where there were large discrepancies between BABIP and xBABIP in either direction, we could point to reasons why. For those with higher BABIPs, they generally are speedy guys and it was good to see that difference. Same thing for those with higher xBABIPs; lots of slow players or shifted left-handers on the list.
xBABIP to me was a hitting statistic that is supposed to tear out the nuances of the game like speed, shift data, etc. Stuff that could explain why a person’s BABIP doesn’t match their xBABIP.
Maybe create a new statistic called zBABIP where you are factoring in both speed and shift data to normalize. Thus we have something to use to say “Hey, based on this player’s speed and the fact they are shifted on to lower their BABIP, this is what their batting average should be.”
Thus you get to the point of asking: what’s the difference between zBABIP and BABIP? Luck and fortune, which I assume is one of the end goals of this measurement.
I have read what you wrote over there! I saw it posted on reddit. Anywho, I am thinking about having one speed based BABIP and wOBA in my main doc, and keeping everything else nonspeed based. That is probably going to be the end result of all of this, but that will take me a little time to throw together. I’ll probably do it tomorrow or over the weekend. I agree that the nonspeed based approach has its own merits and uses, but I also feel incorporating speed is a must do exercise and probably the future of how all this stuff will end up working. That is a balance that might take me a while to figure out, but for the moment everything in the doc will incorporate speed, because the code is already written and working.
Definitely. Because I like seeing just pure contact skills measured in terms of BABIP. Just another way to evaluate players. Will continue to follow the developments man. Nice work!