Nearly three years ago, I unveiled my original xHR/FB rate. Back then, Statcast was in its infancy and wouldn’t be installed in all 30 stadiums until the following season. As such, my original equation used metrics that Jeff Zimmerman provided me, scraped from Gameday data, I believe. The equation was solid enough, producing a 0.649 adjusted R-squared. Clearly, there was more work to be done, but sadly, the data required to make improvements simply wasn’t available.
Then Statcast came along and the newest statistical revolution began. I became giddy, playing around with all its fancy new metrics, eventually leading to the introduction of my Statcast-charged xHR/FB rate. Like my original, this equation was composed of just three variables, but the data was far easier to collect. Two of the components were available in our Splits Leaderboard, while the remaining one was found in the Statcast Leaderboard. As a bonus, the adjusted R-squared jumped to 0.682, a small but meaningful improvement.
But then this happened…
…and it broke my equation. Over the last two years as this power surge was ongoing, there has been rampant speculation and research into the cause. We seemingly still don’t have an official definitive answer, but rather some likely culprits, such as less drag/a lower spin rate, thanks to a changed ball. Unfortunately, we don’t have this data, so I can’t just throw spin rate into my regression and call it a day.
Because I was baffled as to how to fix my broken equation, I cried out for help. Then I did so again on Twitter while working on my fix. Thanks to all who provided suggestions.
In my quest to fix my broken equation, I first settled on the suggestion by the majority of commenters of adding some sort of seasonal constant. Whether I simply multiply all xHR/FB rates in a particular season by a constant to force the league average xHR/FB mark to match the actual league HR/FB rate, or add a season constant as a regression variable and let the math determine its value, I thought that was an idea worth investigating.
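Here is a rough sketch of the first option, the simple multiplier. It assumes the constant is the league's actual HR/FB rate divided by the fly-ball-weighted league xHR/FB rate. The function name and the three-hitter "league" are made up for illustration and are not taken from my data.

```python
def season_multiplier(xhr_fb_rates, fly_balls, league_hr_fb):
    """Constant that forces the fly-ball-weighted league xHR/FB to match league HR/FB."""
    # Expected homers per hitter = expected rate * fly balls hit.
    expected_hr = sum(rate * fb for rate, fb in zip(xhr_fb_rates, fly_balls))
    league_xhr_fb = expected_hr / sum(fly_balls)
    return league_hr_fb / league_xhr_fb


# Hypothetical inputs (illustrative numbers only).
rates = [0.10, 0.15, 0.20]   # each hitter's unadjusted xHR/FB
flies = [100, 200, 100]      # each hitter's fly ball count
multiplier = season_multiplier(rates, flies, league_hr_fb=0.165)

# Every hitter in the season gets scaled by the same constant.
adjusted = [rate * multiplier for rate in rates]
```

The alternative, a season constant inside the regression, would let the fit choose the adjustment instead of forcing the league averages to match.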
After about six different regression equations, including throwing in the league average HR/FB rate as a variable, and even experimenting with FB%, I was making dramatic progress. My xHR/FB rates were getting much closer to the actual marks, while the adjusted R-squared peaked at around 0.75, representing an even larger jump than from my non-Statcast xHR/FB rate to Statcast xHR/FB rate.
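For anyone following along at home, the two fit measures quoted throughout can be sketched as below. These are the standard textbook formulas. The function names are just labels for this example, and nothing here reproduces my actual regression output.

```python
import math


def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Penalize R-squared for the number of predictors in the regression."""
    return 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)


def rmse(actual, predicted):
    """Root mean squared error between actual and expected rates."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

Adjusted R-squared is the fairer yardstick when comparing equations with different numbers of variables, since plain R-squared never falls when a variable is added.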
But then, a long Slack conversation ensued with the amazing Andrew Perpetua, who summed up the problem quite succinctly:
MLB changed the ball to have less drag, drag is making the ball go like 5-6 feet further, and that’s enough to change HR probability by like 10%.
Essentially, batters are hitting the ball as hard as usual, but the ball is simply traveling further. As mentioned above, we don’t have the drag (spin rate) stats, so boo.
Then, Andrew suggested I incorporate Avg FB Distance. I had no idea this was a metric available on Statcast, but indeed it is! The thing was, my initial, antiquated equation used Avg FB Distance, a statistic I wanted to move away from because I only wanted skill-based metrics (like Barrels), not results. Nevertheless, I was quickly convinced that it should be an included variable, as it would basically act as a proxy for all the ball-related changes that occurred. This is how I got to that point:
Andrew: you’re using barrels to measure the skill
and the distance to measure the barrel lol
I dunno, its just something I think you should try. maybe it wont work.
Pod: basically you’re saying use the distance to quantify how good the barrel is, because not all barrels produce homers at the same rate, right?
Andrew: well, yeah, kinda. well, look at it this way. the barrel is how hard you hit the ball. but there are other factors that turn hard hits into homers. like what if a batter has higher spin rate? that would mean higher distances, and more homers. but barrel doesn’t measure spin. distance sort of covers variables that barrels doesn’t
barrels gets you most of the way there but there are variables you need to bridge the gap. and if we’re talking about trying to simplify everything as much as possible, it makes sense to start with distance.
Makes perfect sense. Throwing Avg FB Distance into the regression worked just fine (it replaced my season constants, which was a positive move in my mind), but it merely matched the peak adjusted R-squared marks from my other equations that had those season constants.
Then I had an epiphany. Why in the world am I using Barrels per Batted Ball Event (Brls/BBE)?!?!?! Over the entire 2017 season, only four ground balls were classified as Barrels (I’m not even sure how four apparently passed the threshold, probably an error)! Ground balls should never be barrels! Pop-ups? ZERO BARRELS! So with this realization, it dawned on me that a ground ball heavy hitter would very obviously have fewer barrel opportunities, while an extreme fly ball hitter would have greater barrel opportunities. And since the Brls/BBE denominator is all batted balls, clearly the fly ball hitter would have a major advantage in the metric.
I wanted to eliminate that advantage and create a level playing field. So instead of dividing barrels by all batted ball events, I decided to recalculate it as Brls/FB (as I was writing this, I tested Brls/FB+LD, expecting an even better result, but it was actually slightly worse). BOOM, this was the answer, as my adjusted R-squared rose to heights never before seen, RMSE plummeted, and Domingo Santana, a power hitter with a sub-30% fly ball rate, finally got his respect and was no longer punished for his scarcity of flies.
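To see the denominator problem in numbers, here is a small sketch with two invented hitters who barrel the same share of their fly balls but hit very different numbers of flies. For simplicity it assumes every barrel comes on a fly ball, which is not strictly true in real data. The hitters and their counts are hypothetical.

```python
def brls_per_bbe(barrels, batted_balls):
    """Barrels divided by all batted ball events."""
    return barrels / batted_balls


def brls_per_fb(barrels, fly_balls):
    """Barrels divided by fly balls only."""
    return barrels / fly_balls


# Hypothetical hitters: same 400 batted balls, same 30% barrel rate on flies.
ground_baller = {"bbe": 400, "fb": 100, "barrels": 30}  # 25% fly ball rate
fly_baller = {"bbe": 400, "fb": 200, "barrels": 60}     # 50% fly ball rate

for hitter in (ground_baller, fly_baller):
    hitter["brls_bbe"] = brls_per_bbe(hitter["barrels"], hitter["bbe"])
    hitter["brls_fb"] = brls_per_fb(hitter["barrels"], hitter["fb"])
```

Per batted ball, the fly ball hitter looks twice as good. Per fly ball, the two are identical, which is the comparison an HR/FB estimator actually needs.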
After the most fantastical lede burying in history, it’s time to unveil the latest incarnation of the Statcast-fueled xHR/FB rate (feel free to cut the number of decimals):
xHR/True FB = -0.357355596 + (Brls/True FB * 0.470053772) + (FB Pull% * 0.241535642) + (FB Oppo% * 0.05235448) + (Avg FB Distance * 0.000833802) + (HR Park Factor * 0.000630999)
Adjusted R-Squared = 0.792 | RMSE = 1.1159
Equation Population Used = Batters >= 60 BBE, 2015-2017
Brls/True FB = # of barrels / (# fly balls – # of pop-ups) | barrels source: Statcast Leaderboard
FB Pull% & FB Oppo% = Pull% & Oppo% on fly balls | source: FanGraphs Splits Leaderboard
Avg FB Distance = average fly ball distance | source: Statcast Search
HR Park Factor = HR park factor for each season in population, halved (110 HR PF on Statcorner = 105 for equation); switch-hitters received a 69%/31% L/R park factor split after halving | source: Statcorner
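The input definitions above can be sketched in a few helper functions. Two readings here are assumptions: that "halved" means pulling the factor halfway back toward a neutral 100 (which matches the 110 to 105 example), and that the 69%/31% split weights the left-handed factor at 69% and the right-handed factor at 31%. The function names are invented for this example.

```python
def brls_per_true_fb(barrels, fly_balls, pop_ups):
    """Barrels divided by fly balls that were not pop-ups."""
    return barrels / (fly_balls - pop_ups)


def halved_park_factor(raw_pf):
    """Move a raw HR park factor halfway toward neutral (100): 110 becomes 105."""
    return 100 + (raw_pf - 100) / 2


def switch_hitter_park_factor(raw_pf_lhb, raw_pf_rhb, lhb_share=0.69):
    """Blend the halved L and R factors (assumed 69% left, 31% right)."""
    return (lhb_share * halved_park_factor(raw_pf_lhb)
            + (1 - lhb_share) * halved_park_factor(raw_pf_rhb))
```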
True FB is now the denominator, you ask? Yes! A pop-up is never a home run, so why are we using all fly balls as the denominator? I removed pop-ups from all my calculations, so the equation above is going to result in a higher number than you expect and see on FanGraphs. To convert the above into xHR/FB, we need to add back the pop-ups to fly balls:
xHR/FB = [xHR/True FB * (FB – Pop-Ups)] / FB
And everything together now:
xHR/FB = [(-0.357355596 + (Brls/True FB * 0.470053772) + (FB Pull% * 0.241535642) + (FB Oppo% * 0.05235448) + (Avg FB Distance * 0.000833802) + (HR Park Factor * 0.000630999)) * (FB – Pop-Ups)] / FB
It looks like a mess, but all you need to do is plug the formula into Excel once with inputs to create a calculator and you’re all set!
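If Excel isn't your thing, the same calculator can be sketched in Python. The coefficients are copied from the equation above. The input scales are assumptions, since they aren't spelled out: rates as decimals (25% entered as 0.25), distance in feet, and the park factor on a 100-is-neutral scale after halving. The sample hitter is hypothetical.

```python
INTERCEPT = -0.357355596
COEF_BRLS_TRUE_FB = 0.470053772
COEF_FB_PULL = 0.241535642
COEF_FB_OPPO = 0.05235448
COEF_AVG_FB_DIST = 0.000833802
COEF_HR_PARK_FACTOR = 0.000630999


def xhr_per_true_fb(brls_true_fb, fb_pull, fb_oppo, avg_fb_dist, hr_park_factor):
    """Expected homers per non-pop-up fly ball (inputs on the assumed scales)."""
    return (INTERCEPT
            + brls_true_fb * COEF_BRLS_TRUE_FB
            + fb_pull * COEF_FB_PULL
            + fb_oppo * COEF_FB_OPPO
            + avg_fb_dist * COEF_AVG_FB_DIST
            + hr_park_factor * COEF_HR_PARK_FACTOR)


def xhr_per_fb(true_fb_rate, fly_balls, pop_ups):
    """Add the pop-ups back into the denominator to get a FanGraphs-style rate."""
    return true_fb_rate * (fly_balls - pop_ups) / fly_balls


# Hypothetical hitter: 120 fly balls, 20 of them pop-ups, neutral home park.
true_rate = xhr_per_true_fb(brls_true_fb=0.30, fb_pull=0.25, fb_oppo=0.35,
                            avg_fb_dist=320, hr_park_factor=100)
fb_rate = xhr_per_fb(true_rate, fly_balls=120, pop_ups=20)
```

Keeping the two steps as separate functions mirrors the two equations above, so either rate can be checked on its own.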
You may have noticed that in my original equation, I lumped Pulled and Opposite field fly balls together. My thinking was that if we already know a player’s Brls/BBE, it doesn’t matter whether his fly balls went to the pull side or the opposite field, just that they went toward the lines and not to center. Turns out, I was wrong.
Aha moment (thanks again Andrew!) — not all barrels are created equal. Some barrels are better than others and result in a higher percentage chance of flying over the wall. A higher FB Pull% would capture additional potential homers coming from non-barreled fly balls. Put another way, not all homers come from barreled balls. So if a fly ball misses the barrel classification and we know nothing else about the event, we want the ball to be pulled for a better chance of landing on the other side of the fence. So I split the two and it did improve the equation slightly.
As discussed earlier, the Avg FB Distance variable is supposed to capture all the extra stuff happening with the ball that we don’t have the data to quantify yet. Also remember, not all homers result from barreled balls, so perhaps batters are hitting their non-barrels further, which should increase HR/FB rate, but would have been missed in my previous equation.
The HR Park Factor is a carryover from the previous equation, though it was never fully discussed. Let’s get this out of the way: it’s far from perfect. There’s actually no great way to incorporate park factors, but an attempt needs to be made. The problem is many-fold. First, all batters are affected differently by a park’s features, so there is no one-size-fits-all park factor. A park factor is an average, which doesn’t mean any two batters will be affected identically, or at exactly that rate.
Second, what drives a park’s home run factor differs among parks. For example, Chase Field boosts exit velocity (EV), which basically shows up in Barrels, while I believe that Coors Field increases distance given the same EV. Therefore, Coors Field would likely require a true park factor adjustment to the equation, while Chase Field less so since the positive HR factor is mostly reflected in the EV, which is accounted for in Barrels.
I tested a regression including the HR Park Factor inside (which I ultimately chose to use) and also a version whereby I run the regression, but then multiply the final rate by the park factor. The former method was better on the whole, but both had their misses when looking at HR/FB – xHR/FB by team.
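The second approach and the by-team check can be sketched as follows. This assumes the after-the-fact adjustment is a plain multiplication by the park factor on a 100-is-neutral scale, and that "by team" means averaging each hitter's HR/FB minus xHR/FB within a team. The function names and the row layout are hypothetical.

```python
from collections import defaultdict


def post_hoc_park_adjust(xhr_fb_rate, park_factor):
    """Scale a park-neutral expected rate by the park factor (100 = neutral)."""
    return xhr_fb_rate * park_factor / 100


def mean_residual_by_team(rows):
    """Average HR/FB minus xHR/FB per team; rows are (team, hr_fb, xhr_fb)."""
    totals = defaultdict(lambda: [0.0, 0])  # team -> [sum of residuals, count]
    for team, hr_fb, xhr_fb in rows:
        totals[team][0] += hr_fb - xhr_fb
        totals[team][1] += 1
    return {team: total / count for team, (total, count) in totals.items()}
```

A team whose average residual sits well above or below zero is a sign that its park is being under- or over-corrected by whichever method is in use.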
Last is the season data for each of the equation components and HR/FB rates (using my population of 1,347 player seasons total):
| Season | Brls/True FB | FB Pull% | FB Oppo% | Avg FB Dist | HR/True FB | xHR/True FB | HR/FB | xHR/FB |
Quite the improvement in xHR/FB rate versus actual HR/FB rate from my original Statcast xHR/FB equation, eh? It’s interesting to learn that in 2017, batters actually hit fewer Brls/True FB than in 2016, though note the huge spike from 2015. We’re also seeing more fly balls being pulled, a trend that has moved upward for two consecutive seasons, while hitting it the opposite way is going out of fashion. And look at that, despite fewer barrels on fly balls being hit, the average distance on those flies has risen. This is exactly why Andrew suggested I include that metric – fly balls are traveling further despite the same or lower rate of barrels per true fly ball. Essentially, higher quality barrels or fly balls are being hit, which has been the narrative.
Over on the xHR/FB metrics, with and without pop-ups included, we see that the 2017 HR/FB rates really should have been about the same as 2016. So it was either a truly fortunate outcome, or there’s still something that happened in 2017 not being captured in the equation.
Here’s to hoping the 2018 season doesn’t break the equation for a second time, because I’m totally regressioned out.
Once again, a special thanks to Andrew Perpetua who is a genius. Seriously, I want your skills.
Mike Podhorzer is the 2015 Fantasy Sports Writers Association Baseball Writer of the Year. He produces player projections using his own forecasting system and is the author of the eBook Projecting X 2.0: How to Forecast Baseball Player Performance, which teaches you how to project players yourself. His projections helped him win the inaugural 2013 Tout Wars mixed draft league. Follow Mike on Twitter @MikePodhorzer and contact him via email.